Cancer–testis (CT) antigens are potential targets for cancer immunotherapy because of their restricted expression in immune-privileged germ cells and various malignancies. Current application of CT-based immunotherapy has focused on CT expression–rich tumors such as melanoma and lung cancers. In this study, we surveyed CT expression using The Cancer Genome Atlas (TCGA) datasets for ten common cancer types. We show that CT expression is specific and enriched within certain cancer molecular subtypes. For example, HORMAD1, CXorf61, ACTL8, and PRAME are highly enriched in the basal subtype of breast cancer; MAGE and CSAG are most frequently activated in the magnoid subtype of lung adenocarcinoma; and PRAME is highly upregulated in the ccB subtype of clear cell renal cell carcinoma. Analysis of CT gene expression and DNA methylation indicates that some CTs are regulated epigenetically, whereas others are controlled primarily by tissue- and subtype-specific transcription factors. Our results suggest that although the expression of some CTs is associated with patient outcome, few are independent prognostic markers. Thus, CTs with a shared expression pattern are heterogeneous molecules with distinct activation modes and functional properties in different cancers and cancer subtypes. These data suggest a cancer subtype–oriented application of CT antigens as biomarkers and immunotherapeutic targets. Cancer Immunol Res; 2(4); 371–9. ©2013 AACR.
The cancer–testis (CT) antigens are characterized by their spontaneous immunogenicity and by distinct expression patterns that are normally restricted to germ cells of the testis and placenta but are frequently activated in tumor cells (1, 2). T cells and antibodies against CT proteins are detectable in patients with cancer (3–7), suggesting that the abnormal expression of CT antigens in tumors can induce an adaptive immune response. More than 100 CT antigen genes have been identified (8). Among these, CT-X genes form clusters on the X chromosome (e.g., MAGE, SSX, and SPANX gene families) and encode the most immunogenic CT proteins. Other CT genes are single-copy genes located on various autosomes. The expression frequency of CT genes varies greatly in cancers. Some cancers, such as colon, renal carcinoma, and glioblastoma, are CT-poor, with detectable CT expression in fewer than 20% of tumors. CT-rich cancer types, such as lung carcinoma and melanoma, can have CT expression frequencies greater than 50%. Within a cancer type, CT expression is heterogeneous in tumor cells and varies among different tumor grades. For example, a higher frequency of CT gene expression has been reported in more advanced stages of non–small cell lung cancer (9, 10). In bladder cancer, the expression of the MAGE gene family is most frequently found in the invasive forms (11), and the expression of NY-ESO-1 is correlated with higher nuclear grade (12).
Whether the reactivation of the CT genes in cancers represents a causal or correlative event is not clear and is under active investigation. Clinicopathologic analyses have linked CT expression frequently with worse prognosis and less frequently with improved outcome in different cancer types (13–17); however, the molecular and cellular function of CT antigens is not well understood. Even within one gene family, the proposed roles diverge: MAGEA11 is proposed to act as an oncogene by inhibiting prolyl hydroxylase 2 (PHD2), which downregulates the tumor-promoting hypoxia-inducible factor alpha (HIF1-α) in prostate cancer cells (18–20), whereas MAGEA4 in the same gene family is suggested to be a tumor suppressor that induces apoptosis in non–small cell lung cancer (21–23).
Because of their limited expression in normal tissues and their wide distribution in tumors, CT antigens are promising targets for cancer immunotherapy. However, clinical trials based on strategies targeting two well-characterized CT antigens (MAGEA3 and NY-ESO-1) have shown limited success in patients with cancer (24). Recently, two parallel phase II studies, using heterologous prime-boost vaccination with rV-NY-ESO-1 and rF-NY-ESO-1, have reported promising clinical benefit for patients with melanoma and ovarian cancer who are at high risk for relapse (25). Moreover, adoptively transferred autologous T cells transduced with a T-cell receptor (TCR) directed against NY-ESO-1 have mediated tumor regression in patients with synovial cell sarcoma (26).
In this study, we conducted a comprehensive survey of the expression of CT antigens in multiple human cancers using The Cancer Genome Atlas (TCGA) RNAseq datasets and identified multiple tumor subtype–specific CT antigens that can be further studied as potential biomarkers and targets for immunotherapy.
Materials and Methods
Molecular profiling datasets and data preprocessing
Level 3 RNAseq data (RNAseq RPKM or RNAseqV2 RSEM), level 3 Agilent microarray data, level 2 DNA methylation data (Infinium Human Methylation 450), and clinical data for multiple cancers were downloaded from the TCGA data portal (ref. 27; https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm). For DNA methylation data, M values were calculated as the log2 ratio of methylated intensity over unmethylated intensity (28). For RNAseq data, RSEM (RNA-Seq by expectation maximization) values were used for most analyses because they produced results similar to those from RPKM (reads per kilobase per million mapped reads) while providing more coverage of tumor samples. RPKM values were used only in the expression heatmaps (as in Fig. 2) for comparing the expression levels between genes. The breast cancer dataset NKI-295 (29) was downloaded from http://microarray-pubs.stanford.edu//wound_NKI/explore.htm. Because reads map ambiguously to sets of genes with nearly 100% identity, only one gene of each set was used as a mapping target in the RNAseq expression calculation. These representative genes are CTAG1B for CTAG1A and CTAG1B; MAGEA2 for MAGEA2 and MAGEA2B; MAGEA9B for MAGEA9 and MAGEA9B; XAGE1D for XAGE1A, XAGE1B, XAGE1C, and XAGE1D; GAGE12F for GAGE12F, GAGE12G, and GAGE12I; GAGE12D for GAGE12C, GAGE12D, and GAGE12E; and SPANXB2 for SPANXB1 and SPANXB2.
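The M-value transformation described above can be illustrated with a minimal Python sketch. The function name and the `offset` parameter are assumptions made here for illustration; the text states only that M values are the log2 ratio of methylated to unmethylated intensity and does not say whether a stabilizing offset was applied.

```python
import math


def m_value(methylated, unmethylated, offset=1.0):
    """Log2 ratio of methylated over unmethylated probe intensity.

    `offset` is a hypothetical stabilizing constant (assumed here) that
    keeps the ratio defined when an intensity is zero; set it to 0 to
    obtain the plain log2 ratio described in the text.
    """
    return math.log2((methylated + offset) / (unmethylated + offset))


# Made-up intensities for a single probe in a single sample.
plain_ratio = m_value(300.0, 100.0, offset=0.0)  # log2(3), methylated-dominant
balanced = m_value(100.0, 100.0)                 # equal intensities give 0
```

Positive values indicate that the methylated signal dominates, negative values that the unmethylated signal dominates, and values near zero a roughly half-methylated site.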
Compilation of CT gene list
A list of 240 CT genes was retrieved from the current CT database (8). A search of the expression database generated from 12 normal somatic tissues by the Illumina BodyMap project (GEO accession: GSE30611) narrowed this list down to 129 genes whose expression is restricted to germ cells, allowing for minor expression in a single somatic tissue, which in most cases was the brain.
Identification of CT gene overexpression in cancers and cancer subtypes
We examined CT gene expression in 318 normal tissue samples collected from autologous sites of 6 types of tumors (see Supplementary Table S2) using RNAseqV2 RSEM values, and set a baseline for each CT gene as the median normal expression level plus 3 times the SD. Any tumor with an RSEM value above the baseline for a particular gene was considered positive for expression of that gene. To identify CT genes that are significantly expressed in cancer subtypes, we estimated the percentage of tumors expressing each CT gene in each cancer subtype as described above. Only genes with >30% frequency of expression within a cancer subtype were used for the study. We used the ANOVA test to identify candidate CT genes with a cancer subtype–specific expression pattern. Because sample sizes differed among datasets, the P value cutoffs were set to 1e-12, 1e-8, 1e-5, and 0.02 for the BRCA, KIRC, LUAD, and COAD datasets, respectively.
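The baseline rule and the per-subtype expression frequency can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names and RSEM values are invented, and the use of the sample (n − 1) standard deviation is an assumption because the text does not specify which SD estimator was used.

```python
import statistics


def baseline(normal_values, n_sd=3.0):
    """Median of normal-tissue expression plus n_sd standard deviations.

    Uses the sample (n - 1) standard deviation, which is an assumption.
    """
    return statistics.median(normal_values) + n_sd * statistics.stdev(normal_values)


def positive_fraction(tumor_values, threshold):
    """Fraction of tumors whose expression lies above the threshold."""
    return sum(v > threshold for v in tumor_values) / len(tumor_values)


# Made-up RSEM values for one hypothetical CT gene.
normal = [0.0, 1.0, 2.0, 3.0, 4.0]
subtype_tumors = [0.5, 3.0, 8.0, 12.0, 40.0, 55.0]

cutoff = baseline(normal)                        # 2 + 3 * sqrt(2.5)
freq = positive_fraction(subtype_tumors, cutoff)
passes_filter = freq > 0.30                      # the >30% frequency filter
```

With these toy numbers, four of the six tumors exceed the cutoff, so the hypothetical gene would pass the 30% frequency filter for this subtype.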
Determination of tumor molecular subtypes using consensus clustering
Cancer molecular subtypes were determined by consensus k-means clustering of gene expression data using either Agilent microarray data or RNAseq RSEM values. Results from these 2 platforms for the same tumor corresponded well with each other. The top 5,000 variably expressed genes with the highest median absolute deviation (MAD) values were used to conduct consensus clustering using the GenePattern website at http://genepattern.broadinstitute.org. The conditions used were k-means with a maximum of 5 clusters and 500 resampling iterations. Validation of clustering results for known breast cancer and glioblastoma subtypes was conducted using clustering analysis of genes from PAM50 (30) and the 840-gene list (31), respectively (see Supplementary Fig. S1).
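The gene-selection step that precedes consensus clustering, ranking genes by MAD and keeping the most variable ones, can be sketched in a few lines. The gene names, values, and function names below are hypothetical, and the consensus clustering itself (run on GenePattern in the study) is not reproduced here.

```python
import statistics


def mad(values):
    """Median absolute deviation of a list of expression values."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_variable_genes(expression, n_top):
    """Return the n_top gene names with the highest MAD.

    `expression` maps a gene name to its values across samples.
    """
    ranked = sorted(expression, key=lambda gene: mad(expression[gene]), reverse=True)
    return ranked[:n_top]


# Toy matrix: three hypothetical genes measured in five samples.
toy = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0, 1.0],  # MAD 0
    "GENE_B": [0.0, 2.0, 4.0, 6.0, 8.0],  # MAD 2
    "GENE_C": [3.0, 3.5, 4.0, 4.5, 5.0],  # MAD 0.5
}
selected = top_variable_genes(toy, n_top=2)
```

In the study the cutoff was the top 5,000 genes; `n_top=2` is used here only to keep the toy example readable.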
Cluster analysis and statistical analysis
Clustering analyses were conducted using the Cluster and TreeView software (http://rana.lbl.gov/EisenSoftware.htm). ANOVA tests were conducted in R (http://www.r-project.org). For survival analysis, patients were stratified into two expression groups (high expressers and low expressers) using the “kmeans” function in R. Kaplan–Meier plots were drawn using the “survival” package in R. Multivariate analysis was conducted in SPSS Statistics 22.
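The stratification of patients into high and low expressers can be illustrated with a one-dimensional, two-cluster k-means. The study used R for this step; the Python sketch below is a stand-in, and its function name, deterministic initialization at the minimum and maximum values, and toy data are all assumptions.

```python
def two_group_kmeans(values, max_iter=100):
    """Split 1-D expression values into "low" and "high" groups (k = 2).

    Lloyd's algorithm with the two centers initialized at the minimum and
    maximum value (an assumption chosen to make the sketch deterministic).
    Returns (labels, (low_center, high_center)).
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; cannot form two groups")
    for _ in range(max_iter):
        # Assign each value to the nearer center; ties go to the low group.
        is_high = [abs(v - hi) < abs(v - lo) for v in values]
        high = [v for v, h in zip(values, is_high) if h]
        low = [v for v, h in zip(values, is_high) if not h]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    labels = ["high" if h else "low" for h in is_high]
    return labels, (lo, hi)


# Made-up expression values for one gene across six patients.
expr = [1.0, 2.0, 3.0, 20.0, 22.0, 24.0]
labels, centers = two_group_kmeans(expr)
```

The resulting labels can then serve as the grouping variable for a Kaplan–Meier comparison of the two groups.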
Identification of DNA methylation sites regulating gene expression
Infinium Human Methylation450 probes covering a gene and its 3-kb promoter region were collected. The Pearson correlation was calculated between the M values for each methylation probe and the log-transformed RNAseq RSEM values. This was done for each gene within each tumor type tested, and for each methylation probe, a median Pearson coefficient was generated across all tumor types. For each gene, the probe with the lowest median Pearson coefficient was chosen as the best probe showing an inverse correlation between methylation and gene expression.
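The probe-selection procedure for a single gene can be sketched as follows. The data layout, probe identifiers, tumor-type labels, and function names are hypothetical, and the expression values are assumed to be log-transformed already, since the text does not state the exact transform.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_probe(m_values, log_expr):
    """Pick the probe with the lowest median correlation across tumor types.

    m_values: {tumor_type: {probe_id: [M value per sample]}}
    log_expr: {tumor_type: [log expression per sample]} for one gene
    Returns (probe_id, median_r).
    """
    per_probe = {}
    for tumor, probes in m_values.items():
        for probe, m in probes.items():
            per_probe.setdefault(probe, []).append(pearson(m, log_expr[tumor]))
    medians = {probe: statistics.median(rs) for probe, rs in per_probe.items()}
    chosen = min(medians, key=medians.get)
    return chosen, medians[chosen]


# Toy data: one gene, two tumor types, two hypothetical probes.
log_expr = {"T1": [1.0, 2.0, 3.0, 4.0], "T2": [2.0, 4.0, 6.0, 8.0]}
m_values = {
    "T1": {"probe_a": [4.0, 3.0, 2.0, 1.0], "probe_b": [1.0, 2.0, 3.0, 4.0]},
    "T2": {"probe_a": [3.0, 2.0, 1.0, 0.0], "probe_b": [0.0, 1.0, 0.0, 1.0]},
}
probe, median_r = best_probe(m_values, log_expr)
```

Here `probe_a` is perfectly anticorrelated with expression in both toy tumor types, so it is selected with a median coefficient of about −1.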
Results
CT gene expression in human cancers
We compiled a set of 129 CT genes, which includes 82 CT-X and 47 non-X CT genes (see Materials and Methods and Supplementary Table S1). RNAseq data on nearly 3,500 tumor samples along with 318 normal tissues were obtained from the TCGA data repository, comprising ten cancer types including breast invasive carcinoma, lung adenocarcinoma, lung squamous carcinoma, colon adenocarcinoma, ovarian serous adenocarcinoma, clear cell renal cell carcinoma (ccRCC), head and neck squamous carcinoma, endometrial carcinoma, skin cutaneous melanoma, and glioblastoma (see Supplementary Table S2).
To identify CT genes with restricted expression in tumors, we used CT gene expression in normal tissue samples to set a baseline level (see Materials and Methods and Supplementary Table S2). Tumors with expression above the baseline for a particular CT gene were used for further analysis. Figure 1 summarizes the frequency of CT gene overexpression in a panel of human cancers, covering genes having a >15% expression frequency in at least one tumor type. This analysis identified 35 CT-X genes and 19 non-X CT genes for further study. PLAC1 is the most frequently expressed CT-X gene and PRAME is the most frequently expressed non-X CT gene in all cancers. These results also confirmed that glioblastoma and ccRCCs are CT-poor tumors, whereas melanomas and lung carcinomas have frequent CT gene expression. A summary of CT gene overexpression frequencies is shown in Supplementary Table S3.
Enriched expression of CT genes in molecular subtypes of cancers
As CT genes are often not prevalently overexpressed in cancers and cancers are known to be intrinsically heterogeneous, we examined whether CT genes are activated within particular cancer subtypes. To this end, we defined molecular subtypes of all ten cancers included in this study by consensus clustering of gene expression using the RNAseq RSEM values (see Materials and Methods). As expected, our analysis identified known molecular subtypes for breast, lung, and brain tumors (Supplementary Fig. S1). For example, our analysis separated breast cancers into luminal A/B, HER2-enriched, and basal subtypes (33); glioblastomas into proneural, neural, classical, and mesenchymal subtypes (31); and lung adenocarcinomas into bronchioid, magnoid, and squamoid subtypes (34). For ccRCCs, besides the previously known subtypes ccA and ccB, we identified a new subtype named ccAB that has a gene expression pattern between those of the ccA and ccB classes (35, 36). Many of these molecular subtypes are known to have prognostic value, which we confirmed in lung, kidney, and endometrial cancers (Supplementary Fig. S1) but not in breast cancer, in which the basal subtype is a well-known poor-prognosis group (data not shown).
We next used the ANOVA test to identify CT gene expression enriched within cancer subtypes (see Materials and Methods). In breast cancer, we confirmed that CXorf61 and HORMAD1 were specific for the basal subtype and PLAC1 for the non-basal subtype (37, 38). We also identified additional subtype-specific genes such as ACTL8 and PRAME for the basal subtype and POTEC for the non-basal subtype (Fig. 2A; Supplementary Table S4). In lung adenocarcinomas, a set of MAGE genes is most frequently overexpressed in the magnoid subtype and least expressed in the bronchioid subtype (Fig. 2C, left). Similarly, SEMG1 overexpression is enriched in the colon cancer COAD-1 subtype; SPACA3 is enriched in the COAD-2 and -3 subtypes; and PRAME is enriched in the ccB subtype of ccRCC (Fig. 2D and E). We validated our findings using external microarray gene expression datasets available for breast cancer and lung adenocarcinomas (34, 39). This confirmed that basal breast cancers have higher expression of CXorf61, PRAME, ACTL8, and MAGEA3 (Fig. 2B), and the expression of MAGEA1, 3, 4, 12, and CSAG1 is higher in magnoid lung adenocarcinomas (Fig. 2C, right).
Figure 3 highlights this updated landscape of CT gene expression in various cancers including subtype information, covering 40 genes with >30% expression frequency in at least one subtype of one cancer. These results indicate that genes such as CXorf61, HORMAD1, and SEMG1 exhibit cancer subtype–specific overexpression. In addition, particular cancer subtypes such as magnoid of lung adenocarcinoma and UCEC-2 of endometrial cancer exhibit overall increased CT gene expression compared with other subtypes of the same cancer. Therefore, CT gene expression may be restricted to particular cancer subtypes even when its expression frequency in a particular cancer is low. These genes represent potential immunotherapy targets for those specific cancer subtypes.
Regulation of CT gene expression by DNA methylation
Previous studies have shown that epigenetic modifications including promoter hypomethylation and histone deacetylation have important roles in CT gene activation (40–44). We analyzed TCGA methylation microarray data (Infinium human Met450) to correlate the DNA methylation status with CT gene expression. This analysis focused on eight tumor types for which sufficient data were available (with >50 samples having both RNAseq and methylation data) on Met450 probes covering the entire gene structure and the 3-kb upstream promoter region for each of the 35 CT genes listed in Fig. 3.
The negative correlations between CT gene methylation and expression across cancer types were estimated and are shown in Fig. 4A, which reports the lowest median Pearson correlation coefficient among all Met450 probes for each CT gene in eight tumors (see Materials and Methods and Supplementary Table S5). This analysis confirmed that the transcription of the CT-X genes (e.g., MAGEs, CXorf61) is regulated primarily by promoter DNA methylation (45). On the other hand, non-X CT gene expression correlated less well with DNA methylation except for PRAME and CTCFL. A detailed analysis of CXorf61 gene expression and DNA methylation in breast cancers confirmed that the gene is more hypomethylated and more highly expressed in basal tumors (Fig. 4B, left). However, there are non-basal breast tumors that were equally methylated yet had lower expression, indicating that other subtype-specific mechanisms inhibit CXorf61 expression in non-basal tumors. CXorf61 gene expression and DNA methylation are also correlated in lung squamous carcinomas, even though the gene is not expressed in a subtype-specific manner.
Similarly, PRAME expression and DNA methylation are correlated in both breast cancer and lung squamous carcinomas, with better subtype specificity in lung squamous carcinomas (Fig. 4B, right). Thus, activation of CT genes that are enriched in certain cancer subtypes is likely to be controlled by both DNA methylation and other subtype-specific mechanisms.
Prognostic value of CT genes in ccRCC
Correlations between CT expression and cancer prognosis have been reported in various studies (46, 47). In our analysis, we examined all 129 CT genes for their prognostic value with TCGA data by univariate Cox proportional hazards regression in eight cancers using the same sample size of 200 randomly selected samples. This analysis revealed that approximately 15% of CT genes are potentially prognostic in ccRCCs, a percentage much higher than those found in other cancer types (Fig. 5A and Supplementary Table S6). Results from Kaplan–Meier survival analysis on the top three candidate prognostic genes are shown in Fig. 5C, confirming that overexpression of SPANXC, C21orf99, and SSX1 is an indicator of poor prognosis for ccRCCs. When the same test was conducted on all known genes, we found that in fact more prognostic genes could be discerned in renal clear cell carcinomas than in other cancers (Fig. 5A and B). Indeed, cluster analysis of about 3,000 prognostic genes found in ccRCCs clearly identified two major types of prognostic genes, those highly expressed in the ccB subtype (poor prognostic genes) and those downregulated in the ccB subtype (Supplementary Fig. S2), which has significantly shortened survival compared with the ccA and ccAB subtypes (Supplementary Fig. S1A). A closer examination of CT gene expression associated with poor prognosis within the ccA, ccAB, and ccB subtypes also identified many CTs including the SPANXC and SSX1/2 genes with enriched expression in the ccB subtype (Fig. 5D). Therefore, a higher percentage of prognostic CTs seen in ccRCC is not unexpected. A previous study on conventional RCC identified a 259-gene prognostic gene expression signature, of which 45% of the genes overlap with our findings (data not shown; ref. 48).
ANOVA tests showed that the expression of the poor prognosis CTs (SPANXC, SSXs, C21orf99, and PRAME) is significantly associated with higher tumor grade/stage, whereas the expression of the better prognosis CT (FATE1) is significantly associated with lower tumor grade/stage (Supplementary Fig. S2B). This is in agreement with a previous report that the ccB subtype comprised higher grade tumors (36). In the multivariate analysis with histologic grade, pathologic stage, and metastatic status added as confounding variables, the association of CTs with prognosis was lost or weakened, although significance remained for some CTs such as SPANXC and SSX1. Similar prognostic patterns were observed in other cancer types (Supplementary Table S7). Therefore, a significant proportion of CT genes show prognostic value in ccRCCs that can be attributed partly to their coincidental overexpression in the poor prognostic ccB subtype. Although these CT genes can be treated as potential prognostic markers, they are unlikely to be the causal factor of poor prognosis in ccRCCs.
Discussion
Although CT genes are increasingly promising as targets for cancer immunotherapy approaches, a limitation is the selection of adequate tumor-specific targets, especially for CT-poor cancers. Previous analyses of CT expression based on reverse transcription PCR (RT-PCR), immunohistochemistry (IHC), or microarray platforms have particular limitations. For example, RT-PCR and IHC could not provide a genome-wide unbiased context, and microarray probes did not cover all CTs and could not discriminate between close members within a CT-X gene cluster. The large TCGA datasets generated by next-generation sequencing technology have enabled better characterization of tumor-specific CT antigens as biomarkers and efficient identification of promising immunotherapy targets within cancers and cancer subtypes. In this study, we analyzed the large datasets of expression, methylation, and clinicopathologic features available from the TCGA database to create a landscape of CT gene reactivation in cancers.
In particular, we refined the previous notion of CT-poor or CT-rich cancers based on the enrichment of CTs in specific subtypes. The first suggestion of subtype-specific CT gene expression in a CT-poor tumor type was described in the estrogen receptor (ER)-negative and basal breast cancer molecular subtypes (38). Although in that study the analysis was limited to CT-X genes that were present on the arrays, it showed the possibility of identifying subsets of patients that could potentially benefit from CT-based immunotherapy approaches. This raises the possibility of using subtype-specific CT antigen information in clinical trial design within the context of several other factors such as the antigenicity of each CT, the specificity of gene expression, and the heterogeneity of CT expression within a tumor. A caveat to this approach is that in tumors heterogeneous for CT expression, antigen-negative tumor cells might escape from immune intervention. Thus, finding additional options for CT-based immunotherapy would expand the reservoir for polyvalent vaccines to enhance immune response and reduce the chance of tumor cell escape from immunotherapy.
The control of CT expression was also examined in this study with the integration of the TCGA methylation data. As expected, CT-X genes cluster at a genomic location and usually are coactivated; for example, the MAGEA (2, 3, 6, 12) and CSAG (1, 2, 3) genes are within 70 kb (151,869,311–151,938,240) at Xq28 and they are frequently coexpressed. A similar pattern was observed for MAGEA4 and MAGEA10, which are about 200 kb apart (151,092,137–151,302,983). Coexpression of CT genes in physical proximity in all cancer types (Figs. 1 and 3) can be partially explained by shared local epigenetic regulation driven by demethylation and histone modification. However, even in regions with high correlation, discordance is observed at the gene level. For example, some tumors express MAGEAs but not CSAGs and vice versa. It is not surprising that FATE1, a CT gene 100 kb upstream of MAGEA4, is rarely coexpressed with MAGEA4, given its R value of −0.2 for expression and methylation.
Understanding the mechanisms of CT upregulation is of clinical importance as correlations between CT expression and epigenetic changes could lead to combinatorial therapies. For example, a combined regimen of demethylation agents and histone deacetylase (HDAC) inhibitors might increase the expression of a number of CT genes, thus leading to improved immune responses.
Expression of many CTs shows association with prognosis in this study. However, our results do not address whether survival effects might relate to any biologic driver effect of CT genes. Instead, our data suggest that this may relate to subtype-specific or grade-associated expression. Among the ten cancers studied, we identified the most significant subtype effects in ccRCCs. For this cancer, patients with subtype ccB have a significantly worse survival rate than those with ccA or ccAB. The largest number of prognostic CTs is identified in ccRCCs and most of them exhibit a subtype-specific expression pattern. Multivariate Cox regression analysis with other clinical parameters shows that the prognostic value of CT expression could be attributed to its association with higher tumor grade/stage. Among the breast cancer molecular subtypes, the HER2-overexpressing and basal-like breast cancers have been shown to link to poorer outcomes (49, 50). In the TCGA dataset, however, this link was not significant for the HER2 or basal subgroups. The median follow-up for the dataset is approximately 430 days (14 months), which is much shorter than that in other existing datasets. Thus, it is not surprising that our data did not reveal prognostic CTs in breast cancer, even though a large number of subtype-specific CTs are identified in the TCGA breast cancer dataset.
In this study, we have used the extensive TCGA RNAseq datasets as the indicator of gene expression. Future serologic experiments will be needed to investigate the correlation between messenger RNA, protein expression, and antibody response in serum. Our data provide support for additional studies toward the development of enhanced approaches for cancer immunotherapy based on subtype-specific expression of the CT genes as either potential diagnostic markers or immunotherapeutic targets.
Disclosure of Potential Conflicts of Interest
W.K.A. Yung has received a research grant from Daiichi Sankyo, and honoraria from Actelion and serves as a consultant to Novartis and Merck. No potential conflicts of interest were disclosed by the other authors.
Conception and design: J. Yao, W.K.A. Yung, Q. Zhao
Development of methodology: J. Yao, Q. Zhao
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Yao, G.J. Riggins
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Yao, O.L. Caballero, J.N. Weinstein, Q. Zhao
Writing, review, and/or revision of the manuscript: J. Yao, O.L. Caballero, G.J. Riggins, R.L. Strausberg, Q. Zhao
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Q. Zhao
Study supervision: Q. Zhao
The authors thank the TCGA research network for providing the cancer genomics data used in this study.
This study was supported in part by the TCGA grant U24CA143883 from NCI/NIH and by funds from the Ludwig Institute for Cancer Research.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.