Background: Thymomas and thymic carcinomas are rare malignancies, with approximately 500 cases per year in the US. Apart from standard chemotherapy, treatment options are limited for patients who become refractory to therapy or present with distant metastasis. Further, histological subtyping of these tumors has proven challenging, resulting in substantial discordance between pathologists and hindering the development of targeted therapy. A major impediment to therapeutic advancement is an inadequate understanding of the transcriptional biology of these cancers. Using next-generation sequencing, we surveyed the transcriptomes of thymic malignancies with the aim of identifying novel biology by analyzing all full-length transcripts expressed in these tissues.

Methods: Frozen thymomas, thymic carcinomas, and normal tissues were obtained from the Indiana University Simon Cancer Center. The WHO (2004 classification) subtypes represented in our sample set were type A (n=4), A/B (n=2), B2 (n=1), B3 (n=5), and C (n=1), along with 3 normal tissues. Tissues were reviewed and classified by one pathologist (S.B.), who was blinded to subsequent analyses. cDNA libraries were prepared and sequenced on an Applied Biosystems (AB) SOLiD3+ sequencer using a 50 bp fragment run. For gene expression, reads were mapped to the genome using the AB BioScope 1.0 Pipeline, and the outputs were imported into Partek Genomics Suite for analysis. In Partek, mapped reads were cross-referenced against known genes from the UCSC database, followed by statistical comparison of RPKM values for each gene between subtypes. Principal component analysis (PCA) and hierarchical clustering were also performed in Partek.
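
For readers unfamiliar with the RPKM values compared above, the following is a minimal sketch of the commonly used definition (reads per kilobase of transcript per million mapped reads). It is not the Partek implementation, whose internals are not described here; the function name `rpkm` and the numbers in the example are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Illustrative helper only (assumed name and signature); it follows the
    standard RPKM definition, not any specific vendor pipeline.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the raw count by 1e9.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical values for illustration only; not data from this study.
example = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(example)
```

Normalizing by both gene length and library size is what makes per-gene values comparable between samples of different sequencing depth, which the between-subtype comparison relies on.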

Results: RNA sequencing of the 16 tissues produced 736 million reads, equaling 37 GB of data, of which 24.4 GB (66%) mapped to the human genome. These initial sequencing outputs represent only a portion of the planned data, as additional paired-end sequencing of these samples is ongoing. In our preliminary analysis, unsupervised hierarchical clustering of RPKM values from 20,600 RefSeq genes revealed 100% concordance between gene expression clusters and WHO subtype. A subsequent unsupervised clustering of 705 pre-miRNAs also showed substantial concordance between clusters and subtype. In differential gene expression analysis of the three subtypes most represented in our data set (A, A/B, B3), we found 318 genes differentially expressed between A and A/B, 799 between A/B and B3, and 1,524 between A and B3.
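
The yield figures above can be checked with simple arithmetic. The sketch below assumes that every read is a full 50 bp fragment and that the reported data volumes denote gigabases of sequence; neither assumption is stated explicitly in the abstract, and the variable names are illustrative.

```python
# Figures taken from the Methods and Results text.
TOTAL_READS = 736_000_000   # reads across the 16 tissues
READ_LENGTH_BP = 50         # fragment run length (assumed uniform for all reads)
REPORTED_TOTAL = 37.0       # reported total data volume
REPORTED_MAPPED = 24.4      # reported mapped data volume

# Total sequenced bases, expressed in billions of bases (assumption: the
# reported volumes are gigabases of sequence).
total_gigabases = TOTAL_READS * READ_LENGTH_BP / 1e9

# Fraction of the reported total that mapped to the human genome.
mapped_fraction = REPORTED_MAPPED / REPORTED_TOTAL

print(f"total: {total_gigabases:.1f} gigabases")
print(f"mapped: {mapped_fraction:.1%}")
```

Under these assumptions the read count and read length agree with the reported total to within rounding, and the mapped fraction agrees with the stated 66%.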

Discussion: We report preliminary data from RNA sequencing of thymic malignancies. Initial analyses suggest that this technology could be used to accurately subtype these tumors. Further paired-end sequencing of these samples and of additional tumors is ongoing. Subsequent analyses of these data will include identifying gene fusions, mutations, alternative splicing, noncoding RNAs, novel transcribed regions, and potential viral genomes.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 102nd Annual Meeting of the American Association for Cancer Research; 2011 Apr 2-6; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2011;71(8 Suppl):Abstract nr 4858. doi:10.1158/1538-7445.AM2011-4858