Introduction: Breast tumors have 4 well-established intrinsic subtypes defined by transcriptome profiling. However, clusters defined by proteomics often disagree with those defined by transcriptomics. Here, we report the findings of proteogenomic profiling of 118 laser microdissected (LMD) breast tumors using RNA-Seq and mass spectrometry (MS)-based proteomic technologies.

Methods: Cases used in this study were drawn from the Clinical Breast Care Project, with patients consented under an IRB-approved protocol. A total of 118 primary breast tumors embedded in OCT were selected and processed by LMD. Total RNA and protein were extracted using the Illustra triplePrep kit. Paired-end RNA sequencing of the 118 cases was performed on the Illumina HiSeq platform, and the reads were preprocessed using a Perl-based pipeline comprising the preprocessing tool PRINSEQ, the splice-aware aligner GSNAP, and HTSeq for quantifying expression. Quantitative global proteomics analyses were performed on 113 cases using isobaric TMT 6-plex labeling with the “universal reference” strategy. MS data were acquired on a Q-Exactive instrument and analyzed using Proteome Discoverer with the Byonic node. Sample-to-sample normalization was applied to correct for pipetting errors, and ComBat was used to remove batch effects. K-means consensus clustering was performed using the Bioconductor package ConsensusClusterPlus.
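
The reference-based quantification and sample-to-sample normalization described in the Methods can be illustrated with a minimal sketch. The abstract does not state the exact normalization procedure, so median centering of log2 ratios is an assumption made here for illustration. The function names and intensity values are hypothetical, and batch correction with ComBat is not shown.

```python
import math
import statistics

def log2_ratio_to_reference(sample_intensities, reference_intensities):
    """Express each protein as log2(sample / reference) within one TMT plex."""
    return [
        math.log2(s / r)
        for s, r in zip(sample_intensities, reference_intensities)
    ]

def median_center(log_ratios):
    """Subtract the sample's median log-ratio (assumed normalization step)."""
    med = statistics.median(log_ratios)
    return [x - med for x in log_ratios]

# Hypothetical intensities for four proteins in one sample channel and in the
# pooled reference channel of the same plex (illustrative values only).
sample = [200.0, 400.0, 800.0, 1600.0]
reference = [100.0, 100.0, 100.0, 100.0]

ratios = log2_ratio_to_reference(sample, reference)
normalized = median_center(ratios)
```

Expressing every channel relative to a shared reference puts samples from different TMT plexes on a common scale, and centering each sample on its own median removes a global offset such as a difference in the amount of material loaded.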

Results: The number of preprocessed RNA sequencing reads for the 118 cases ranged from over 43 million to 295 million. On average, 83% of reads were mapped, and 24,518 genes with a mean expression of ≥ 10 counts across the 118 tumor samples were identified. The PAM50 algorithm was used for intrinsic subtyping, yielding 37 Basal-like, 16 HER2-enriched, 39 Luminal A and 26 Luminal B calls. Unsupervised clustering of the 3,000 most variable genes reflected the 4 intrinsic subtypes. In the global proteomics data, 840 proteins were identified across all 113 cases. Unsupervised K-means consensus clustering, using either all 840 proteins or only the 210 most variable proteins, indicated that the optimal number of clusters was 3. These 3 clusters were identified as Basal-enriched, Luminal A-enriched and Luminal B-enriched. HER2-enriched cases were distributed among these clusters.

We did not observe a stromal-enriched cluster in this analysis of LMD-prepared samples, in which microdissection selected against the stromal components of the tumor.
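
The feature-selection step mentioned in the Results (keeping the 210 most variable of 840 proteins before clustering) can be sketched as follows. The abstract does not say which variability measure was used, so ranking by sample variance is an assumption. The function name, protein labels, and abundance values are hypothetical, and the consensus clustering itself is not reproduced here.

```python
import statistics

def top_variable_features(matrix, feature_names, n_top):
    """Return the names of the n_top features with the largest sample variance.

    matrix: list of rows, one row per feature and one column per sample.
    """
    variances = [statistics.variance(row) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: variances[i], reverse=True)
    return [feature_names[i] for i in order[:n_top]]

# Hypothetical normalized abundances: 4 proteins x 5 samples.
names = ["P1", "P2", "P3", "P4"]
abundance = [
    [0.0, 0.1, -0.1, 0.0, 0.0],   # nearly constant
    [2.0, -2.0, 1.5, -1.5, 0.0],  # highly variable
    [0.5, -0.5, 0.5, -0.5, 0.0],  # moderately variable
    [1.0, 1.0, 1.0, 1.0, 1.0],    # constant
]

selected = top_variable_features(abundance, names, n_top=2)
```

With the study's dimensions, the same call would use `n_top=210` on an 840-protein matrix. Restricting the input to the most variable features is a common way to reduce noise from uninformative proteins before unsupervised clustering.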

Conclusion: Analysis of LMD breast tumors using proteogenomic technologies yielded 3 clusters in the proteome data: Basal-enriched, Luminal A-enriched and Luminal B-enriched. Unlike a recent report on proteomics clustering using bulk processing of tumors, a stromal-enriched cluster was not observed in this analysis, which excluded stromal components of the samples.

The views expressed in this abstract are those of the author and do not reflect the official policy of the Department of Army/Navy/Air Force, Department of Defense, or U.S. Government.

Citation Format: Praveen-Kumar Raj-Kumar, Tao Liu, Lori A. Sturtz, Albert J. Kovatich, Marina A. Gritsenko, Vladislav A. Petyuk, Brenda Deyarmin, Viswanadham Sridhara, James Craig, Jason E. McDermott, Anil K. Shukla, Ronald J. Moore, Matthew E. Monroe, Bobbie-Jo M. Webb-Robertson, Jeffrey A. Hooke, J. Leigh Fantacone-Campbell, Leonid Kvecher, Jianfang Liu, Jennifer Kane, Jennifer Melley, Stella Somiari, Stephen C. Benz, Justin Golovato, Shahrooz Rabizadeh, Patrick Soon-Shiong, Richard D. Smith, Richard J. Mural, Karin D. Rodland, Craig D. Shriver, Hai Hu. Integrated proteogenomic analysis of laser microdissected primary breast tumors define proteome clusters [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 284.