Introduction: Multivariate projection methods such as principal component analysis (PCA) and partial least squares (PLS) have been widely applied to the analysis of biological and chemical data. OnPLS is a recent extension of these methods suited to integrative analysis of omics data. With OnPLS it is possible to compare multiple omics datasets and to separate joint variation from variation that is locally unique to each of the studied datasets. OnPLS thus models all datasets together, in contrast to commonly applied approaches that limit the analysis to 1) comparing findings from individually analyzed blocks of data or 2) pairwise comparison of individual probes.

Experimental: A Java-based implementation of OnPLS was used for the statistical modeling. A total of 116 lung squamous cell cancer samples were characterized using gene expression profiling and global proteomics. The OnPLS model was applied to jointly model variation in mRNA and protein expression. Enrichment analysis of factor loadings was performed using Enrichr to identify the biological mechanisms explained by the joint and unique components of the OnPLS model.

Results: Using a cross-validation procedure, the model with the highest predictive ability was found to have two joint components and one locally unique component for each of the proteomics and gene expression datasets. The model explained 21.9% of the variation in the gene expression data and 26.1% of the variation in the proteomics data. The first joint component captures the highest degree of common variation between mRNA and protein activity. In the mRNA data, this component is related to immune infiltrates, especially monocytes and B cells, whereas in the protein data it is related to extracellular matrix activity. This suggests covariance between immune-related mRNA expression and extracellular matrix-related protein expression. As expected, local variation specific to the protein measurements involved regulation of protein activation and processing. mRNA-specific variation was related to keratinization, a key process in squamous cell cancer.
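
To make the notions of joint components and explained variation concrete, the following is a minimal NumPy sketch. It is not the OnPLS algorithm and not the Java implementation used in this study. It is a simplified stand-in that takes shared directions from the SVD of the cross-covariance between two centered blocks and reports the fraction of each block's variation captured by the resulting scores. The function names, the synthetic data, and the noise level are all assumptions made for illustration.

```python
import numpy as np


def joint_scores(X1, X2, n_joint):
    """Simplified stand-in for joint-component extraction (NOT OnPLS).

    Centers both blocks (samples x variables) and takes the leading
    singular vectors of the cross-covariance X1^T X2 as block weights.
    """
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    U, _, Vt = np.linalg.svd(X1c.T @ X2c, full_matrices=False)
    T1 = X1c @ U[:, :n_joint]      # scores for block 1
    T2 = X2c @ Vt[:n_joint].T      # scores for block 2
    return X1c, X2c, T1, T2


def explained_fraction(Xc, T):
    """Fraction of the sum of squares of Xc reproduced by regressing Xc on T."""
    P, *_ = np.linalg.lstsq(T, Xc, rcond=None)   # least-squares loadings
    residual = Xc - T @ P
    return 1.0 - (residual ** 2).sum() / (Xc ** 2).sum()


# Hypothetical data: two blocks sharing one latent variable plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(116, 1))
mrna = latent @ rng.normal(size=(1, 50)) + 0.1 * rng.normal(size=(116, 50))
protein = latent @ rng.normal(size=(1, 30)) + 0.1 * rng.normal(size=(116, 30))

mrna_c, protein_c, t_mrna, t_protein = joint_scores(mrna, protein, n_joint=1)
print("explained fraction, block 1:", explained_fraction(mrna_c, t_mrna))
print("explained fraction, block 2:", explained_fraction(protein_c, t_protein))
print("score correlation:", np.corrcoef(t_mrna[:, 0], t_protein[:, 0])[0, 1])
```

Unlike this sketch, OnPLS also models the locally unique variation of each block separately, so percentages such as those reported above combine joint and unique components.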

Conclusion: OnPLS offers a promising approach for integrative analysis of omics data. Applying this approach to proteogenomic data from lung squamous cell cancers suggests that similar patterns of activity are represented in protein and gene expression data; however, the biological processes associated with this activity may be distinct.

Citation Format: Fredrik Pettersson, Paul A. Stewart, Robbert J. Slebos, Eric A. Welsh, Ling Cen, Yonghong Zhang, Zhihua Chen, Chia-Ho Cheng, Guolin Zhang, Bin Fang, Victoria Izumi, Sean Yoder, Katherine Fellows, Y Ann Chen, Jamie K. Teer, Steven Eschrich, John M. Koomen, Anders Berglund, Eric B. Haura. OnPLS-based integrative proteogenomics analysis of lung squamous cell cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 1565. doi:10.1158/1538-7445.AM2017-1565