Abstract
Promoter elements play important roles in isoform and cell type–specific expression. We surveyed the epigenomic promoter landscape of gastric adenocarcinoma, analyzing 110 chromatin profiles (H3K4me3, H3K4me1, H3K27ac) of primary gastric cancers, gastric cancer lines, and nonmalignant gastric tissues. We identified nearly 2,000 promoter alterations (somatic promoters), many deregulated in various epithelial malignancies and mapping frequently to alternative promoters within the same gene, generating potential pro-oncogenic isoforms (RASA3). Somatic promoter–associated N-terminal peptides displaying relative depletion in tumors exhibited high-affinity MHC binding predictions and elicited potent T-cell responses in vitro, suggesting a mechanism for reducing tumor antigenicity. In multiple patient cohorts, gastric cancers with high somatic promoter usage also displayed reduced T-cell cytolytic marker expression. Somatic promoters are enriched in PRC2 occupancy, display sensitivity to EZH2 therapeutic inhibition, and are associated with novel cancer-associated transcripts. By generating tumor-specific isoforms and decreasing tumor antigenicity, epigenomic promoter alterations may thus drive intrinsic tumorigenesis and also allow nascent cancers to evade host immunity.
Significance: We apply epigenomic profiling to demarcate the promoter landscape of gastric cancer. Many tumor-specific promoters activate different promoters in the same gene, some generating pro-oncogenic isoforms. Tumor-specific promoters also reduce tumor antigenicity by causing relative depletion of immunogenic peptides, contributing to cancer immunoediting and allowing tumors to evade host immune attack. Cancer Discov; 7(6); 630–51. ©2017 AACR.
This article is highlighted in the In This Issue feature, p. 539
Introduction
Gastric cancer is the third leading cause of global cancer mortality with high prevalence in many East Asian countries (1). Patients with gastric cancer often present with late-stage disease (2, 3), and clinical management remains challenging, as exemplified by several recent negative phase II and phase III clinical trials (4–7). At the molecular level, studies have identified characteristic gene mutations (8, 9), copy-number alterations, gene fusions (10), and transcriptional patterns in gastric cancer (11, 12). However, few of these have been clinically translated into targeted therapies, with the exception of HER2-positive gastric cancer and trastuzumab (13). There is thus a strong need for additional and more comprehensive explorations of gastric cancer, as these may highlight new biomarkers for disease detection, predicting patient prognosis or responses to therapy, as well as new therapeutic modalities.
Promoter elements are cis-regulatory elements that function to link gene transcription initiation to upstream regulatory stimuli, integrating inputs from diverse signaling pathways (14). Promoters represent an important reservoir of biological, functional, and regulatory diversity, as current estimates suggest that 30% to 50% of genes in the human genome are associated with multiple promoters (15), which can be selectively activated as a function of developmental lineage and cellular state (16). Differential usage of alternative promoters causes the generation of distinct 5′ untranslated regions (5′ UTR) and first exons in transcripts, which in turn can influence mRNA expression levels (17), translational efficiencies (18, 19), and the generation of different protein isoforms through gain and loss of 5′ coding domains (15, 20). In cancer, alternative promoters in genes such as ALK (21), TP53 (22), LEF1 (23), and CYP19A1 (24) have been reported, producing cancer-specific isoform variants with oncogenic properties. To date, promoter alterations in cancer have been largely studied on a gene-by-gene basis, and very little is known about the global extent of promoter-level diversity in gastric cancer and other solid malignancies.
Promoters in the genome can be experimentally identified by various methods. Broadly divided into RNA-based or epigenomic approaches, the former involves technologies such as RNA sequencing (RNA-seq), CAP analysis gene expression (CAGE), and global run-on sequencing (25–27). For the latter, active promoters have been shown to exhibit characteristic chromatin modifications, specifically H3K4me3 positivity, H3K27ac positivity, and H3K4me1 depletion (28–31). Compared with transcriptome sequencing (25), using histone modifications to identify promoters carries certain advantages. First, epigenome-guided promoter identification allows genomic localization of the promoter element itself, rather than the ensuing transcript product. Second, particularly for clinical samples, epigenome-guided promoter identification is less prone to transcript degradation artifacts caused by 5′ RNA exonucleases (32). Epigenome-marked promoters may also highlight transcript classes not easily detectible by other means, such as promoters originating via recapping events, short-lived RNAs (27), or unstable RNAs with greater sensitivity to exosome-mediated decay, such as promoter upstream transcripts (33) and/or enhancer RNAs (34–36).
In this work, we analyzed a gastric cancer cohort of primary samples and cell lines to survey the landscape of altered promoter elements in gastric cancer. Our study applied microscale histone modification profiling [Nano–chromatin immunoprecipitation sequencing (Nano-ChIP-seq)] to profile primary cancers (37, 38), which allows the measurement of epigenomic modifications in vivo, compared with laboratory-cultured cell lines that may harbor epigenomic artifacts due to in vitro culture (39, 40). By comparing the epigenomic promoter profiles of primary gastric cancers and matched normal tissues, we observed pervasive alterations in promoter usage in gastric cancer, and an important role for somatic promoters in enhancing gastric cancer transcript, protein isoform, and immunogenic diversity. We also found that many somatic promoters observed in gastric cancer are deregulated in other cancer types, supporting a generalized role for somatic promoters in solid malignancies. To our knowledge, our study represents one of the largest and most comprehensive surveys of somatic promoters in any single tumor type.
Results
Identifying Epigenomic Promoter Alterations in Gastric Cancer
Using Nano-ChIP-seq (37), we profiled three histone modification marks (H3K4me3, H3K27ac, and H3K4me1) across 17 gastric cancers, matched normal gastric mucosae (34 samples), and 13 gastric cancer cell lines, generating 110 epigenomic profiles (Supplementary Tables S1 and S2 provide clinical and sequencing metrics; Fig. 1A). Quality control of the Nano-ChIP-seq data was performed using two independent methods: ChIP enrichment at known promoters and employing the ChIP-seq quality control and validation tool ChIP-seq analytics and confidence estimation (CHANCE; ref. 41). Comparisons of Nano-ChIP-seq read densities at 1,000 promoters associated with highly expressed protein-coding genes confirmed successful enrichment in all H3K27ac and H3K4me3 libraries. CHANCE analysis also revealed that the large majority (81%) of samples exhibited successful enrichment (Supplementary Table S2). We have previously also shown that Nano-ChIP signals exhibit a good concordance with orthogonal ChIP-qPCR results (38).
To enable accurate promoter identification, we integrated data from multiple histone modifications, selecting H3K4me3 regions simultaneously codepleted for H3K4me1 (“H3K4me3hi/H3K4me1lo regions”; Supplementary Fig. S1; Methods; ref. 42). Comparisons against data from external sources, including GENCODE reference transcripts, ENCODE chromatin state models, and CAGE databases, validated the vast majority of H3K4me3hi/H3K4me1lo regions as true promoter elements (Supplementary Text; Supplementary Fig. S1). Because primary gastric tissues comprise several different tissue types, including epithelial cells, immune cells, and stroma, we further confirmed that our promoter profiles were reflective of bona fide gastric epithelia by comparison against Epigenome Roadmap data for gastric and nongastric tissues. Gastric tumor and matched normal promoter profiles exhibited the highest correlations to Roadmap gastric mucosae and were distinct from other gastrointestinal tissues (small intestine, colon mucosa, colon sigmoid), stomach-associated muscle, skin, and blood (CD14; Supplementary Fig. S2). Primary tissue promoter profiles also showed a significant overlap with promoter profiles of gastric cancer cell lines (87%), which are purely epithelial in origin, compared with gastrointestinal fibroblast lines (58%–69%), and colon carcinoma lines (59%–74%; Supplementary Fig. S2).
In total, we mapped approximately 23,000 promoter elements in the Nano-ChIP-seq cohort. Visual exploration of these promoter elements identified three main promoter categories: unaltered promoters, promoters gained in tumors (gained somatic or tumor-specific promoters), and promoters present in normal gastric tissues but lost or decreased in gastric cancer (lost somatic or normal-specific promoters; Fig. 1A–C). Representative examples of unaltered promoters included RHOA (Fig. 1A), whereas CEACAM6, an intracellular adhesion gene, exhibited somatic promoter gain at the CEACAM6 transcription start site (TSS) in tumor samples and cell lines (Fig. 1B). Conversely, ATP4A, a parietal cell–associated H+/K+ ATPase with decreased expression in gastric cancer (43), exhibited somatic promoter loss (Fig. 1C). Both CEACAM6 and ATP4A promoter alterations were correlated with increased and decreased CEACAM6 and ATP4A gene expression in the same samples, respectively (Fig. 1B and C).
Previous studies have established distinct molecular subtypes of gastric cancer (11, 12, 44). Because of limited sample sizes, however, we elected in the current study to identify promoter alterations (“somatic promoters”; ref. 45–47) present in multiple gastric cancer tissues relative to control tissues irrespective of subtype. Focusing on recurrent alterations also has the benefit of reducing potential artifacts due to “private” epigenomic variation or individual sample-specific technical errors. Using two complementary read count–based algorithms commonly used for analysis of ChIP-seq data (48–50), we identified approximately 2,000 highly recurrent somatic promoters, of which 75% were gained in gastric cancers [fold change (FC) ≥ 1.5, q < 0.1]. Two-dimensional heat map clustering and principal component analysis (PCA) plots based on somatic promoters confirmed a separation of gastric cancer samples from normal samples (Fig. 1D; Supplementary Fig. S3). Somatic promoter H3K4me3 levels were also highly correlated with H3K27ac signals, commonly regarded as a marker of active regulatory activity (42, 51). This correlation was observed across all somatic promoters (r = 0.84, P < 0.001, Fig. 1E), and also when gained somatic promoters and lost somatic promoters were analyzed separately (r = 0.78, P < 0.001 for gained somatic; r = 0.82, P < 0.001 for lost somatic, Supplementary Fig. S3). Pathway analysis revealed that both gained somatic promoters and lost somatic promoters were significantly associated with expression gene sets previously reported to be upregulated and downregulated in gastric cancer, respectively (Fig. 1F). These included upregulated oncogenes (MET, ABL2), cell adhesion genes (CEACAM6), and claudin family members (CLDN7, CLDN3). Fifteen percent to 18% of somatic promoters mapped to noncoding RNAs, including HOTAIR and PVT1, previously associated with gastric cancer (Supplementary Table S3; refs. 52, 53). Additional analyses at increasing thresholds of stringency (FC from 1.5 to 2 and FDR from 0.1 to 0.001) yielded similar results, supporting the robustness of this analysis (Supplementary Fig. S3). These results demonstrate that normal gastric epithelia and gastric cancers can be distinguished on the basis of epigenomic promoter profiles.
Somatic Promoters in Gastric Cancer Exhibit Deregulation in Diverse Cancer Types
To explore relationships between epigenomic promoter alterations and gene expression, we analyzed RNA-seq data from the same discovery cohort (∼106 million reads/sample), quantifying RNA-seq transcript reads mapping to the epigenome-guided promoter regions or directly downstream. Examining somatic promoter regions (Fig. 2A provides an illustrative example of a gained somatic promoter), we observed significantly increased expression at gained somatic promoters in gastric cancers and significantly decreased expression at lost somatic promoters, compared with either all promoters (P < 0.001; Fig. 2B), or unaltered promoters (P < 0.001; Supplementary Fig. S4; refs. 27, 51, 54). Among other types of epigenetic modifications, previous studies have also reported a reciprocal relationship between active regulatory regions and DNA methylation (54, 55). Using Infinium 450K DNA methylation arrays, we identified 7,505 CpG sites overlapping the somatic promoter regions (5,213 sites for gained somatic promoters; 2,292 sites for lost somatic promoters). Promoters gained in gastric cancer were significantly hypomethylated compared with all promoters (P < 0.001, Wilcoxon test), whereas promoters lost in gastric cancer were hypermethylated (P < 0.001, Wilcoxon test; Fig. 2B, bottom). As DNA methylation typically occurs in CpG-rich regions (56), we then repeated the analysis focusing only on CpG island–bearing promoters (Methods). Similar to the original results, CpG island–bearing promoters gained in gastric cancer were significantly hypomethylated compared with all CpG island–bearing promoters (P < 0.001, Wilcoxon test), whereas CpG island–bearing promoters lost in gastric cancer were hypermethylated (P < 0.001, Wilcoxon test; Supplementary Fig. S5).
To validate the somatic promoters in a larger independent gastric cancer cohort and also to examine their behavior in other cancer types, we proceeded to query RNA-seq data of 354 gastric cancer samples from the The Cancer Genome Atlas (TCGA) consortium (n = 321 gastric cancer samples, n = 33 matched normals). To perform this analysis, RNA-seq reads from TCGA samples were mapped against the epigenome-guided somatic promoter regions defined by the discovery samples and normalized to calculate fold-change differences in expression between gastric cancer samples versus normal samples (see Methods). Similar to the discovery series, we observed that TCGA gastric cancer samples also exhibited significantly increased expression at gained somatic promoters, whereas lost somatic promoters exhibited decreased expression, relative to either all promoters (P < 0.001; Fig. 2C) or unaltered promoters (P < 0.001; Supplementary Fig. S4). We further tested the tissue specificity of the gastric cancer somatic promoters by querying RNA-seq data from other tumor types, including colon cancer, kidney renal clear cell carcinoma (ccRCC), and lung adenocarcinoma (LUAD; Fig. 2D). Almost two thirds (n = 1,231, 63%, FC ≥ 1.5) of gastric cancer somatic promoters were also differentially regulated in TCGA colon cancer samples, and similarly, a significant proportion of gastric cancer somatic promoters were also associated with differential RNA-seq expression in TCGA ccRCC (n = 939, 48%, FC ≥ 1.5) and LUAD samples (n = 1,059, 54%, FC ≥ 1.5; Fig. 2D). This result suggests that many gastric cancer somatic promoters are also likely associated with deregulated promoter activity in other solid epithelial malignancies.
Role of Alternative Promoters
By comparing the somatic promoters against the reference GENCODE database (V19; ref. 57), we discovered extensive use of alternative promoters (18%) in gastric cancers, defined as situations where a common unaltered promoter is present in both normal tissues and tumors (canonical promoter) but a secondary tumor-specific promoter is engaged in the latter (alternative promoter). The remaining 82% of somatic promoters corresponded to single major isoforms or unannotated transcripts (see below). Fifty-seven percent of the alternative promoters occurred downstream of the canonical promoter. Using multiple RNA-seq analysis methods, we confirmed that transcript isoforms driven by alternative promoters are overexpressed in gastric cancers to a significantly greater degree than canonical promoters in the same gene (Methods; Supplementary Fig. S6). For example, HNF4A, a transcription factor overexpressed in gastric cancer (58–61), is driven by two promoters (P1 and P2). At the HNF4A canonical promoter (“P2”), we observed equal promoter signals in gastric cancer tissues and normal tissues; however, we also further observed gain of an additional promoter in gastric cancers at a TSS 45 kb downstream (“P1”). Similarly, HNF4A P1 promoter gains were also observed in gastric cancer cell lines (Fig. 3A), with RNA-seq analysis supporting expression of the HNF4A P1 isoform in gastric cancers. Alternative promoter usage was also observed at the EPCAM gene, frequently used to identify circulating tumor cells, causing expression of EPCAM transcript ENST00000263735.4 (Fig. 3B). Notably, both the HNF4A and EPCAM alternative isoforms exhibited significantly greater cancer overexpression compared with their canonical isoforms (Supplementary Fig. S6). Other genes associated with tumor-specific alternative promoters, many reported for the first time, include NKX6-3 (FC = 1.83, q < 0.05) and GRIN2D (FC = 1.9, q < 0.001). A complete list of gastric cancer tumor–specific promoters is provided in Supplementary Table S4.
To explore the influence of alternative promoters on protein diversity, we identified 714 tumor-specific promoter alterations predicted to change N-terminal protein composition and also supported by both H3K4me3 and RNA-seq data. The vast majority of these alterations (>95%) were in-frame to that of the canonical protein. Of these, 47% (n = 338) were predicted to cause gains of new N-terminal peptides in tumors (see Methods). To confirm protein-level expression of these N-terminal peptides in gastrointestinal cancer, we queried publicly available peptide spectral data of 90 TCGA colorectal cancer and 60 normal colon samples (62, 63). Colorectal cancer data were used for this analysis, as large-scale proteomic data of primary gastric cancers are not currently available, and because many gastric cancer somatic promoters are also observed in colorectal cancer (Fig. 2D). Across all proteins detected, proteins with gained promoters were overexpressed in cancer to a significantly greater extent than proteins without gained promoters (P < 0.001; 63% vs. 54%; Fisher test). Then, examining specific N-terminal peptides predicted to be gained in tumors, we confirmed protein expression of 33% (112/338) in the colorectal cancer data (Supplementary Table S5), of which 51.8% were overexpressed in colorectal cancer samples relative to normal colon samples (FDR, 10%). In a separate experiment, we further investigated whether these N-terminal peptides also exhibit tumor overexpression in proteomic data from 3 gastric cancer cell lines and 1 normal gastric epithelial line (GES1; see Methods). Similar to the colorectal cancer data, 48% of the N-terminal peptides were overexpressed in the gastric cancer lines relative to normal GES1 gastric cells, and again, a significantly greater proportion of proteins with gained promoters were overexpressed in cancer compared with proteins without gained promoters (P < 0.001; Fisher test). Taken collectively, these analyses suggest that alternative promoters may contribute significantly toward proteomic diversity in gastrointestinal cancer.
To examine possible functions of somatic promoters in cancer development, we focused on RASA3, which encodes a RAS GTPase-activating protein required for Gαi-induced inhibition of MAPK (64). In both gastric cancers (50%) and gastric cancer lines, we observed gain of promoter activity at an intronic region 127 kb downstream apart from the canonical RASA3 TSS (Fig. 3C, top; Supplementary Fig. S7). RNA-seq and 5′ Rapid Amplification of cDNA Ends (RACE) analysis confirmed expression of this shorter RASA3 isoform (Fig. 3C, bottom), and expression of this shorter RASA3 isoform was also observed in TCGA RNA-seq data (Fig. 3C). Using isoform-specific quantitative PCR, we confirmed that although both the canonical full-length RASA3 and shorter RASA3 isoform are overexpressed in gastric cancer tissues relative to matched normal tissues (P < 0.01 Student one sided t test), the shorter RASA3 isoform is overexpressed to a significantly greater extent (FC = 2.64, P = 0.01, Student one sided t test; Supplementary Fig. S7). Compared with the canonical full-length RASA3 protein (CanT), the shorter 31-kDa RASA3 somatic isoform (SomT) is predicted to lack the N-terminal RASGAP domain (Fig. 3D). Consistent with these predictions, transfection of RASA3 CanT into GES1 normal gastric epithelial cells induced lower levels of active GTP-bound RAS compared with either empty vector or RASA3 SomT transfected cells, indicating that RASA3 CanT has higher RASGAP activity (Supplementary Fig. S7).
To address functions of RASA3 SomT, we transfected the RASA3 CanT and SomT isoforms into SNU1967 gastric cancer cells. Compared with untransfected cells, transfection of RASA3 SomT into SNU1967 cells significantly stimulated migration (P < 0.01) and invasion (P < 0.01), whereas RASA3 CanT significantly suppressed invasion (P < 0.001; Fig. 3E, Supplementary Fig. S7). Similarly, transfection of RASA3 SomT into GES1 cells significantly stimulated migration (P < 0.01, Fig. 3E) and invasion (P < 0.01, Supplementary Fig. S7), whereas RASA3 CanT did not. When tested on KRAS-mutated AGS gastric cancer cells that are innately highly migratory, expression of RASA3 CanT potently suppressed migration, whereas RASA3 SomT exhibited significantly less attenuation (P = 0.03, Supplementary Fig. S7). These results suggest that tumor-specific use of RASA3 SomT is likely to increase gastric cancer cell migration and invasion. Notably, RASA3 CanT and SomT transfections did not alter SNU1967, GES1, or AGS cellular proliferation rates (Supplementary Fig. S7). To confirm that these observations are not due to nonphysiologic in vitro expression levels, we then examined NCC24 gastric cancer cells, which normally express high endogenous levels of RASA3 SomT and minimal RASA3 CanT (Supplementary Fig. S7). Silencing of endogenous RASA3 SomT using two independent siRNA constructs significantly inhibited NCC24 migration and invasion (P < 0.01–0.001; Supplementary Fig. S7), consistent with RASA3 SomT playing a role in promoting cancer migration and invasion.
In an earlier study (38), we reported a transcript isoform of the MET receptor tyrosine kinase driven by an internal alternative promoter, which has been independently confirmed in other cancer types (65). However, functional implications of this MET variant (Var) remain unclear. RNA-seq and 5′ RACE analysis confirmed transcript expression of this shorter isoform, predicted to harbor a truncated SEMA domain (Supplementary Fig. S8). To assess functional differences between wild-type (WT) and Var MET, we performed transient transfections of MET-WT and MET-Var into HEK293 cells. In both untreated and HGF-treated conditions, MET-Var transfected cells exhibited significantly higher levels of pGAB1 (Y627), a key mediator of MET signaling [e.g., 2.48- to 3.95-fold comparing MET-Var vs. MET-WT, P = 0.003 (untreated), P < 0.05 (T15 and T30); ref. 66]. In addition, in HGF-untreated samples, cells transfected with MET-Var also exhibited higher pERK1/2 levels (2.74-fold) and higher pSTAT3 (Y705; refs. 67–70) levels (1.80-fold) compared with MET-WT [P = 0.023 and P = 0.026 for pERK and pSTAT3 (Y705), respectively]. These results suggest that expression of the MET-Var isoform may promote MET downstream signaling kinetics in a manner important for gastric cancer tumorigenesis.
Somatic Promoters Correlate with Tumor Immunity
Cancer immunoediting is a process where developing tumors sculpt their immunogenic and antigenic profile to evade host immune surveillance (71, 72). Mechanisms of cancer immunoediting are diverse, including upregulation of immune checkpoint inhibitors, such as PD-L1 (72). To explore potential contributions of somatic promoters to tumor immunity, we identified somatic promoter–associated N-terminal peptides with high predicted affinity binding to various MHC class I HLA alleles (Supplementary Tables S6 and S7), which are required for antigen presentation to CD8+ cytotoxic T cells (IC50 ≤ 50 nmol/L, Fig. 4A; ref. 73). Analysis of recurrent somatic promoter–associated peptides using the NetMHCpan-2.8 (74) algorithm against patient-specific MHC class I alleles revealed a significant enrichment in high-affinity MHC I binding compared with multiple control peptide populations, including canonical gastric cancer peptides (average 36% vs. 24%; P < 0.01), randomly selected peptides (P < 0.001), and C-terminal peptides (P < 0.01; Fig. 4B shows HLA-A, B, and C combined; Supplementary Fig. S9A depicts data for HLA-A only). The majority of these high-affinity somatic promoter–associated peptides corresponded to situations where the somatic transcript lacking the N-terminal peptide is overexpressed in tumors relative to normal tissues (78% lost; 76/97 high-affinity peptides, Fig. 4C), and the proportion of high-affinity MHC-binding peptides among lost peptides was significantly greater than among gained peptides (37% vs. 21%, P < 0.05, Fisher test). Notably, because transcripts driven by the N-terminal lacking somatic TSSs are also overexpressed in tumors to a significantly greater degree than transcripts driven by the canonical TSS (P < 0.05, Wilcoxon one-sided test; Supplementary Fig. S6), such a scenario would be expected to result in relative depletion of N-terminal immunogenic peptides in tumors. Interestingly, an analogous N-terminal analysis using RNA-seq data alone (in the absence of epigenomic data) revealed that epigenome-guided N-terminal peptides exhibited significantly higher predicted immunogenicity scores compared with RNA-seq–only identified peptides (36.10% vs. 27% for MHC presentation,P = 0.02, Fisher test), suggesting that epigenome-guided promoter identification can provide complementary value to RNA-seq–only guided analyses (Supplementary Fig. S9).
To explore whether somatic promoters might contribute to reducing tumor antigen burden and immunoreactivity in vivo, we proceeded to examine correlations between promoter alterations and intratumor T-cell activity in various primary gastric cancer cohorts. First, to detect promoter alterations in a cohort of 95 gastric cancer–normal pairs [Singapore (SG) cohort], we generated a customized NanoString panel targeting the top 95 recurrent gastric cancer somatic promoters, measuring transcripts associated with either the canonical promoter or the alternative promoter. There was a significant correlation between the NanoString data and RNA-seq (Supplementary Fig. S10, r = 0.65, P < 0.001), with approximately 35% of transcripts driven by alternate promoters upregulated in more than half of the gastric cancers (Fig. 4D). Second, to examine markers of T-cell activity in these same gastric cancer samples, we analyzed previously published microarray data (75) to measure CD8A (a measure of CD8+ tumor-infiltrating lymphocytes), and granzyme A (GZMA) and perforin (PRF1; refs. 76–78), which are both T-cell effectors and validated markers of T-cell cytolytic activity. We confirmed that these three genes (CD8A, GZMA, and PRF1) were not themselves associated with somatic promoters. Comparing the top and bottom quartiles, gastric cancers with high somatic promoter usage exhibited significantly lower GZMA and PRF1 levels (P < 0.001 and P = 0.01, Wilcoxon test) indicating lower T-cell cytolytic activity (Fig. 4E, top left), and also a trend toward lower CD8A levels (P = 0.14, Wilcoxon one-sided test). These findings support a lower level of tumor antigenicity in gastric cancers with high somatic promoter usage, as recurrent N-terminal peptides lost through somatic promoters are predicted to be collectively more immunogenic than peptides gained through somatic promoters (Fig. 4C). Using two different algorithms (ASCAT, ref. 79, and ESTIMATE, ref. 80), we further confirmed that the decreased GZMA and PRF1 levels are independent of tumor purity differences between gastric cancers (Supplementary Fig. S10). Similar results were obtained upon splitting the gastric cancer samples based on median promoter usage score (GZMA, P < 0.001; and PRF1, P = 0.03). Patients with gastric cancers exhibiting high somatic promoter usage (top 25%) also showed poor survival compared with patients with gastric cancers with low somatic promoter usage (bottom 25%; Fig. 4E, top right, HR = 2.56, P = 0.02). Again, dividing patients by their median somatic promoter usage score also showed similar survival differences (Supplementary Fig. S10, HR = 1.81, P = 0.04).
To validate these findings, we then analyzed two other prominent gastric cancer cohorts: one from TCGA and another from the Asian Cancer Research Group (ACRG). In the TCGA cohort, availability of RNA-seq data allowed us to infer somatic promoter usage directly from next-generation sequencing data (Fig. 2C). Similar to the Singapore cohort, TCGA gastric cancers with high somatic promoter usage (top 25%) exhibited decreased CD8A (P = 0.002, Wilcoxon one-sided test), GZMA (P = 0.001, Wilcoxon one-sided test), and PRF1 levels (P = 0.005, Wilcoxon one-sided test, Fig. 4E, bottom left) compared with gastric cancers with low somatic promoter usage (bottom 25%) in a manner independent of tumor purity (Supplementary Fig. S10). Notably, as previous studies have suggested that somatic mutation burden may also correlate with intratumor T-cell cytolytic response (81), we further repeated the analysis after adjusting for the total number of missense mutations in each sample using a regression-based approach. Even after correcting for somatic mutation burden, we still observed decreased CD8A (P = 0.02, Wilcoxon one-sided test), GZMA (P = 0.01, Wilcoxon one-sided test), and PRF1 expression (P = 0.03, Wilcoxon one-sided test) in samples with high somatic promoter usage (top 25% against bottom 25%; Supplementary Fig. S10).
We leveraged a third independent cohort of gastric cancersamples from ACRG (11). Using NanoString to target 89 canonical and alternative promoters along with various immune markers, we profiled 264 primary gastric cancer samples from the ACRG cohort (11). Forty percent of alternative promoter transcripts showed tumor-specific expression in more than half of the samples (Supplementary Fig. S10). Once again, samples with high somatic promoter usage (top 25%) showed significantly lower expression of T-cell cytolytic activity markers, including CD8A (P = 0.035, Wilcoxon one-sided test), CD4A (P = 0.005, Wilcoxon one-sided test), GZMA (P = 0.001, Wilcoxon one-sided test), and PRF1 (P = 0.025, Wilcoxon one-sided test; Fig. 4E, bottom right; Supplementary Fig. S10). Similar results were obtained upon splitting the gastric cancer samples based on median promoter usage score (Supplementary Table S8). Also, after adjusting for mutational burden (for cases where information is available), samples with high somatic promoter usage still showed decreased CD8A (P = 0.167, Wilcoxon one-sided test), GZMA (P = 0.009, Wilcoxon one-sided test), and PRF1 (P = 0.03, Wilcoxon one-sided test) expression (Supplementary Fig. S10). Taken collectively, these results, observed across multiple gastric cancer cohorts and assessed using diverse technologies (microarray, RNA-seq, NanoString), all support a significant association between somatic promoter usage and reduced tumor immunity levels. Importantly, the decreased levels of T-cell cytolytic activity associated with somatic promoter usage are likely independent of tumor purity and mutational load.
Somatic Promoter–Associated Peptides Are Immunogenic In Vitro
To functionally test the ability of N-terminal peptides depleted in gastric cancer to elicit immune responses, we conducted in vitro assays using the high-throughput Epitope Maximum (EPIMAX) platform (82, 83), which allows multiepitope testing for both T-cell proliferation and cytokine production. First, we identified N-terminal peptides predicted to exhibit high HLA-binding affinities across a pool of healthy peripheral blood mononuclear cell (PBMC) donors. Second, selecting 15 alternative promoter–associated peptides for testing, we generated peptide pools for each peptide (Supplementary Tables S9 and S10; Methods), which were then used to stimulate PBMCs from 9 healthy donors. T-cell proliferation and cytokine production levels were measured and benchmarked against control peptides (Supplementary Table S11). Across all 135 exposures (15 peptides across 9 donors), we observed strong cytokine responses for 79 peptide pools (58%; FC ≥ 2 relative to actin peptides; Fig. 4F and G) inducing complex Th1, Th2, and Th17 polarizations in a donor-dependent fashion (Supplementary Fig. S11).
To test the immunogenic capacity of specific N-terminal peptides in a more cellular setting, we then assessed responses of T cells previously primed to recognize either altered or WT peptides, when cocultured with HLA-matched isogenic gastric cancer cells expressing either altered or WT peptides, respectively (Supplementary Fig. S11). Similar approaches have been used by previous investigators to investigate protein and peptide immunogenicity (84–86). By MHC-I affinity screening, a VMCDIFFSL nonamer in the WT RASA3 N-terminus was predicted to exhibit high MHC-I affinity binding for both the HLA-A02:01 (IC50 = 6.93 nm) and HLA-A02:06 (IC50 = 9.74 nm) alleles. Using HLA-A*02:06 T cells that are cross-reactive to HLA-A*02:01–positive AGS cells (87, 88), we tested the release of IFNγ from primed T cells after exposure to AGS lysates expressing either RASA3 CanT or SomT isoforms. ELISA assays demonstrated that T cells primed to recognize RASA3 CanT released significantly more IFNγ when cocultured with RASA3 CanT–expressing AGS cells than when cocultured with RASA3 SomT–expressing AGS cells. In contrast, T cells primed with RASA3 SomT did not exhibit appreciable IFNγ release when cocultured with RASA3 SomT–expressing AGS cells (Supplementary Fig. S11). Thus, under similar in vitro conditions, RASA3 CanT is capable of eliciting a stronger immune response than RASA3 SomT, consistent with the RASA3 CanT N-terminus being more immunogenic. Taken collectively, these in vitro results demonstrate that peptides predicted to exhibit relative depletion in gastric cancers through somatic promoters can produce immunogenic responses, with the magnitude of immune responses depending on both peptide sequence and host immune background.
Somatic Promoters Are Associated with EZH2 Occupancy
To identify potential oncogenic mechanisms driving the somatic promoters, we intersected the genomic locations of the somatic promoters with transcription factor binding sites of 237 transcription factors from 83 different tissues (89). Regions exhibiting somatic promoters were significantly enriched in regions associated with EZH2 (P < 0.01) and SUZ12 (P < 0.01) binding (Fig. 5A; Supplementary Table S12), confirming earlier findings on a smaller cohort (38). Both EZH2 and SUZ12 are components of the Polycomb repressor complex 2 (PRC2) epigenetic regulator complex, which is upregulated in many cancer types, including gastric cancer (90–95). To validate these findings, we then performed EZH2 ChIP-seq on HFE-145 normal gastric epithelial cells (Methods). Concordant with the previous findings, we observed significant enrichment of EZH2 binding sites at somatic promoters compared with all promoters (enrichment score 27 vs. 13 for all promoters, P < 0.01), and this EZH2 enrichment remained significant when the gained somatic (enrichment score 28,P < 0.01) and lost somatic promoters (enrichment score 24,P < 0.01) were analyzed separately (Supplementary Fig. S12).
To experimentally test whether inhibiting EZH2/PRC2 activity might modulate somatic promoter usage in gastric cancer, we treated IM95 gastric cancer cells with GSK126, a highly selective small-molecule inhibitor of EZH2 methyltransferase activity (90, 96). This line was selected because it has previously been shown to be sensitive to EZH2 depletion (Supplementary Fig. S12; ref. 97). RNA-seq analysis of GSK126-treated IM95 cells at two treatment time points (days 6 and 9) confirmed that genes upregulated upon EZH2 inhibition are enriched in previously identified PRC2 target gene sets (Supplementary Fig. S12). GSK126 treatment caused deregulation of 2,134 promoters in total. Of 1,959 promoters exhibiting somatic alterations in primary gastric cancers (Fig. 1D), GSK126 treatment caused deregulation of 251 somatic promoters in IM95 cells (12.8%). This proportion was significantly greater than the proportion of unaltered promoters exhibiting deregulation after GSK126 challenge (8.8%, OR = 1.46, P < 0.001, Fisher test, Fig. 5B), suggesting heightened sensitivity of somatic promoters to EZH2 inhibition. The proportion of somatic promoters deregulated after EZH2 inhibition was also greater than the total proportion of genes (as defined by GENCODE) regulated by GSK126 (1.5%, OR = 9.21, P < 0.001, Fig. 5B). Of those promoters exhibiting both GSK126 deregulation and also mapping to somatic promoters lost in primary gastric cancer, 89.6% were reactivated following GSK126 administration (78/87, FC ≥ 2, q < 0.1; Methods), consistent with EZH2 functioning to repress these promoters. For example, Fig. 5C and D highlights two lost somatic promoters (SLC9A9 and PSCA), exhibiting expression gain after GSK126 treatment (Fig. 5). These results thus suggest a role for EZH2 in regulating somatic promoters in gastric cancer.
Somatic Promoters Reveal Novel Cancer-Associated Transcripts
Finally, when analyzing the altered somatic promoters with respect to proximity to known genes, we found that somatic promoters could be classified into annotated and unannotated categories. Annotated promoters were defined as promoters mapping close (<500 bp) to a known GENCODE TSS, whereas unannotated promoters were those mapping to genomic regions devoid of known GENCODE TSSs. The majority of promoters present in nonmalignant tissues, and also promoters unchanged between tumors and normal tissues, mapped closely to previously annotated TSSs (72%–92%). In contrast, only 41% of somatic promoters mapped to annotated promoter locations, whereas the remaining 59% mapped to “unannotated” locations, distant from GENCODE TSSs and in many cases 2 to 10 kb away (Fig. 6A).
To test the functional relevance of these unannotated promoters, we used GenoCanyon, a nucleotide-level quantification of genomic functional potential that integrates multiple levels of conservation and epigenomic information (98). We observed that 81% of the unannotated promoter regions exhibited a maximum genome-wide functional score of greater than 0.9 (range 0–1), indicating high functional potential. To ascertain tissue-type specificities, we then applied tissue-specific annotations using GenoSkyline (99), an extension of the GenoCanyon framework integrating Roadmap Epigenomics data (54). We observed that gastrointestinal tissues had the third highest median score after embryonic stem cell (ESC) and fetal tissues, consistent with our tumors being gastric in lineage and also dedifferentiated (Fig. 6B). In a separate analysis, recent studies have also suggested that endogenous repeat elements in the human genome may contribute significantly to regulatory element variation (100), and hypomethylation of repeat elements can induce cancer-associated transcription (101). We found that unannotated promoters were also significantly enriched for the repeat elements ERV1 (P < 0.0001 unannotated vs. all) and L1 (P < 0.0001 unannotated vs. all; Supplementary Fig. S13).
Compared with annotated promoters, unannotated promoters exhibited weaker H3K27ac signals, suggesting that the former might have lower activity and decreased gene expression levels (Supplementary Fig. S13). Supporting this, somatic promoters, even those supported by CAGE tags (indicating true promoters), exhibited significantly lower RNA-seq expression levels compared with all CAGE tag–supported promoters (Fig. 6C). We thus hypothesized that unannotated promoters might be associated with low transcript levels, thereby rendering them more challenging to detect by conventional depth transcriptome sequencing given the very wide dynamic range of cellular transcriptomes (10–10,000 transcripts per cell for different genes; Fig. 6D; ref. 102). To test this possibility, we employed both down-sampling and up-sampling analysis. Not surprisingly, decreasing levels of RNA-seq depth caused a concomitant decrease in detected somatic promoter transcripts. For example, down-sampling to approximately 40 M reads caused approximately 250 transcripts (FPKM > 0; Fig. 6E) to be rendered undetectable at somatic promoters. More convincingly, in the reciprocal experiment, we experimentally generated deep RNA-seq data for 5 matched gastric cancer/normal pairs (average read depth 140 M compared with standard 100 M) and confirmed the additional detection of 435 new somatic promoter–associated transcripts (FPKM > 0; Fig. 6E). We estimate that usage of deep RNA-seq data allowed us to discover additional transcripts for 22% of the unannotated promoters, not previously detectable at regular depth RNA-seq (Fig. 6F). These results demonstrate that despite being associated with bona fide cancer-associated transcripts, many somatic promoters defined by epigenomic profiling may have been missed by conventional-depth RNA-seq.
Discussion
Identifying somatically altered cis-regulatory elements and understanding how these elements direct cancer-associated gene expression (103) represents a critical scientific goal (104, 105). Here, we defined close to 2,000 promoters exhibiting altered activity in gastric cancer, indicating that somatic promoters in gastric cancer are pervasive. Promoters are traditionally defined as proximal cis-regulatory elements that recruit general transcription factors to initiate transcription (106, 107). However, selection and activation of TSSs by RNA polymerase at core promoters is dependent on multiple factors. Core promoters are differentially distributed between genes of different functions (15, 106), and chromatin distributions and epigenetic landscapes of core promoter regions can also differ in a tissue-specific manner (15, 108–110). Presence of multiple transcription initiation sites within the same gene can generate distinct transcript isoforms with different 5′ UTRs that can act to regulate gene expression (111–113), and usage of alternative 5′ UTRs can also affect both translation and protein stability of cancer-associated genes such as BRCA1, TGFβ, and ERG (18, 114–117). Such findings demonstrate that specific promoter element activity is complex and cell-context dependent, with impact on downstream transcriptional, translational, and functional processes.
A significant proportion (∼18%) of somatic promoters corresponded to alternative promoters. In cancer, alternative promoter utilization is of major relevance, as increasing numbers of genes (e.g., LEF1, TP53, TGFB3) are now being shown to exhibit distinct alternative promoter–associated isoforms that differentially affect malignant growth (21, 118). In the current study, we identified alternative promoters in genes both known and novel to gastric cancer biology with significant clinical and translational implications. For example, we discovered an alternative promoter at the EPCAM gene locus specifically activated in gastric tumors. In gastric cancer, EPCAM encodes a transmembrane glycoprotein that has been proposed as a marker for circulating tumor cells (119), and EPCAM expression levels have been correlated with prognosis of patients with gastric cancer (120). However, little is known about the specific cellular mechanisms driving high EPCAM expression in gastric cancer. Our finding that EPCAM is regulated in gastric cancer not through its canonical promoter, but instead through a cancer-specific alternative promoter may lend credence to recent reports suggesting that in addition to acting as an experimentally convenient surface marker, EPCAM may actually play a more direct pro-oncogenic role in stimulating cellular proliferation (121).
Another novel example of an alternative promoter–associated gene, identified for the first time in our study, is RASA3. Although a functional role for RASA3 in cancer remains to be definitively established, studies from other biological fields have shown that RASA3 can inhibit RAP1 (122), which in turn has been implicated in invasion and metastasis in various cancers (123, 124). RASA3 depletion can enhance signaling by integrins (125) and MAPK (64), and the possibility that RASA3 can act as a tumor suppressor has also been recently suggested through independent cross-species cancer studies (126). Our results suggest that RASA3 may play a more complex role in cancer, as the expression of WT RASA3 inhibited cell migration and invasion in gastric cancer cell lines, whereas N-terminal Var RASA3 enhanced migration and invasion. A third example of an alternative promoter–driven gene is MET, which has been extensively investigated as a target for cancer therapy (127–129). Although we and others have previously reported (38, 65) the expression of an N-terminal truncated MET-Var in cancer, functional implications of this truncated MET-Var have remained unclear. In this study, experimental assessment of MET WT and Var signaling revealed that truncated MET variants may have different downstream signaling effects compared with full-length MET isoforms. Under the experimental conditions used, we observed significant differences in phosphorylation patterns of ERK, STAT3, and GAB1, in a manner consistent with MET-Var being more pro-oncogenic compared with MET-WT, as both ERK, STAT3, and GAB1 have been shown to facilitate MET-induced signaling (130–132). The MET signaling pathway is known to be particularly complex, with multiple feedback loops (133), and understanding how expression of the N-terminal short MET isoform might promote downstream survival signaling will be an important subject of future research, particularly in light of recent clinical trials targeting MET in lung cancer using antibodies that have been unsuccessful (5).
Our study also revealed an unexpected relationship between somatic promoters and tumor immunity. Specifically, we discovered that alternative promoter isoforms overexpressed in gastric cancer were significantly depleted of N-terminal peptides predicted to be potentially immunogenic, based on computational predictions of high-affinity MHC class I binding and other immunologic assays. We believe that finding is relevant to cancer immunity, as it builds on previous findings from the literature establishing the existence of self-reactive T cells, the potential immunogenicity of overexpressed tumor antigens, and the process of tumor immunoediting. First, although the majority of self-reactive T cells are clonally deleted during early development, numerous groups have also demonstrated the frequent persistence of self-reactive T cells in the periphery (134). For example, analysis of transgenic mice has shown that 25% to 40% of autoreactive T cells are likely to escape clonal deletion even in the presence of the deleting ligand (135), and in humans, Yu and colleagues have demonstrated that clonal deletion prunes the T-cell repertoire but does not fully eliminate self-reactive T-cell clones (136). Importantly, although such self-reactive T cells are typically low-avidity and are not capable of recognizing self-antigens under normal physiologic conditions (137–140), they still retain the ability to become activated and to produce effector and memory cells under conditions of appropriate stimulation, such as infection and the mounting of antitumor responses (141, 142).
Second, in cancer, several studies have shown that self-reactive T cells can exhibit immunologic activity toward overexpressed tumor antigens, even if these antigens are also expressed at lower levels in normal tissues. One well-known example is the melanocyte differentiation antigen Melan-A/MART1, which is both expressed by normal melanocytes and overexpressed in malignant melanoma cells (143–147). T-cell recognition of Melan-A/MART1 has been detected in 50% of patients with melanoma (148), and even healthy individuals have been shown to exhibit a disproportionately high frequency of Melan-A/MART1–specific T cells in the peripheral blood (148). Besides Melan-A/MART1, other examples of tumor-associated self-antigens (149–151) inducing immunologic recognition in both healthy individuals and patients with cancer (152) include tyrosinase-related proteins (TRP1 and TRP2; refs. 153–159) and glycoprotein (gp) 100 (147, 160–163) in melanoma, and P1A in mastocytoma cells (164). Such examples clearly demonstrate that in certain cases, normally expressed proteins can still become immunogenic when overexpressed in cancer. Third, tumor immunoediting, the acquired capacity of developing tumors to escape immune control, is a recognized hallmark of cancer (165–172). Tumor immune escape can occur via different mechanisms, such as through upregulation of immune checkpoint inhibitors (e.g., PD-L1) and altered transcription of antigen-presenting genes (173–176) or tumor-specific antigens. For example, decreased expression of melanoma antigens (e.g., gp100, MART1, and P1A) has been associated with melanoma progression to later disease stages (177). Besides overt downregulation of the entire gene, it is thus highly plausible that transcriptional changes affecting splice forms and promoter variants may also contribute to tumor immunoediting. For example, very recent work (178) in B-cell acute lymphoblastic leukemia has described the production of N-terminally truncated CD19 variants in response to CD19 CART (chimeric antigen receptor–armed T cells) therapy, clearly showing that promoter transcript variants can indeed arise as a consequence of immunologic pressure. Taken collectively, we believe that these previously established findings all point to a plausible role for alternative promoters in reducing the immunogenic potential of tumors. In this regard, our observation that somatic promoter regions exhibit a significant overlap with binding targets of the PRC2 epigenetic regulator complex, and are particularly sensitive to EZH2 inhibition, suggests that pharmacologic approaches for reawakening somatic promoter–associated epitopes might represent an attractive strategy for increasing antitumor T-cell immunoreactivity and antitumor activity (86, 179).
In conclusion, our study indicates an important role for somatic promoters in gastric cancer. We also note that a significant portion (52%) of the somatic promoters localized to unannotated TSSs, consistent with recent studies indicating the existence of hundreds of transcript loci remaining to be annotated (180). Interestingly, a large portion of the human transcriptome has been shown to originate from repetitive elements that can exhibit promoter activity and/or express noncoding RNAs (181, 182). Unannotated promoters activated in our gastric cancer study were found to be enriched in ERV1 and L1 repeat elements that have been shown to be associated with stage-specific transcription in early human embryonic cells (183), suggesting a yet-unknown functional role for these promoters. Analysis of these unannotated promoters is likely to provide fertile ground for new and hitherto unanticipated insights into mechanisms of gastric cancer development and progression.
Methods
Primary Tissue Samples and Cell Lines
Primary patient samples were obtained from the SingHealth tissue repository with approvals from the SingHealth Centralised Institutional Review Board and signed patient informed consent. “Normal” (nonmalignant) samples used in this study refer to samples harvested from the stomach, from sites distant from the tumor and exhibiting no visible evidence of tumor or intestinal metaplasia/dysplasia upon surgical assessment. Tumor samples were confirmed by cryosectioning to contain >60% tumor cells. FU97, IM95, MKN7, OCUM1, and RERF-GC-1B cell lines were obtained from the Japan Health Science Research Resource Bank. AGS, KATOIII and SNU16, Hs 1.Int and Hs 738.St/Int gastrointestinal fibroblast lines were obtained from the ATCC. NCC-59, NCC-24, and SNU-1967 and SNU-1750 were obtained from the Korean Cell Line Bank. YCC3, YCC7, YCC21, and YCC22 were gifts from Yonsei Cancer Centre (Seoul, South Korea). HFE145 cells were a gift from Dr. Hassan Ashktorab (Howard University, Washington, DC). GES1 cells were a gift from Dr. Alfred Cheng, Chinese University of Hong Kong. Cell line identities were confirmed by short tandem repeat DNA profiling using ANSI/ATCC ASN-0002-2011 guidelines in mid-late 2015. All cell lines were negative for Mycoplasma contamination as assessed by the MycoAlert Mycoplasma Detection Kit (Lonza) and the MycoSensor qPCR Assay Kit (Agilent Technologies). PBMCs from healthy donors were collected under protocol CIRB ref no. 2010/720/E.
ChIP-seq
Nano-ChIP-seq was performed as described previously (38) with slight modifications (see Supplementary Text). Eight Nano-ChIP-seq libraries were multiplexed (New England Biolabs) and sequenced on 2 lanes of a HiSeq2500 sequencer (Illumina) to an average depth of 20 to 30 million reads per library. We assessed ChIP library qualities (H3K27ac, H3K4me3, and H3K4me1) using two different methods, ChIP enrichment assessment and CHANCE (see Supplementary Text; ref. 41). For EZH2 ChIP-seq, EZH2 antibodies (catalog #5246, Cell Signaling Technology) were used for ChIP. Thirty nanograms of ChIPed DNA was used for each sequencing library preparation (New England Biolabs).
Promoter Analysis
Promoter (H3K4me3hi/H3K4me1lo) regions were identified by calculating the H3K4me3:H3K4me1 ratio for all H3K4me3 regions merged across normal and gastric cancer samples. We estimated the required sample size to achieve 80% power and 10% type I error (http://powerandsamplesize.com/) based on the average signals of top 100 differential promoters between tumor and normal samples. This result yielded a recommended sample size of 11 (average), which is met in our study (16 N/T). Regions with H3K4me3:H3K4me1 ratios <1 in both normal and gastric cancer samples were excluded from further analysis. For all analyses performed in this study, promoter regions were defined as genomic locations exhibiting H3K4me3hi/H3K4me1lo signals, and for all subsequent analyses, it was only within this predefined H3K4me3hi/H3K4me1lo subset that H3K4me3 signals were compared. H3K27ac data were used for correlative analysis. H3K4me3 data (FASTQs) for colon carcinoma lines were downloaded from public databases: HCT116 and Caco2 from ENCODE and V503 and V400 from GSE36204. To compare promoter signals between gastric cancer and normal samples, we used the DESeq2 (46) and edgeR (47) bioconductor packages using a read count matrix of ChIP-seq signals, adjusting for replicate information. Regions with FCs greater than 1.5 (FDR = 0.1) were selected as significantly different. The criteria of FC = 1.5 and q < 0.1 was based on previous literature comparing ChIP-seq profiles using DESeq2 and edgeR also using similar thresholds (49, 50). Significantly altered promoters identified by DESeq2 overlapped almost completely with altered promoters found by edgeR. A regularized log transformation of the DESeq2 read counts was used to plot PCAs and heat maps.
Transcriptome Analysis
RNA-seq data were obtained from the European Genome-Phenome Archive under accession no. EGAS00001001128. Data were processed by first aligning to GENCODE v19 transcript annotations using TopHat v2.0.12 (184). Cufflinks 2.2.0 was used to generate FPKM abundance measures. For identification of novel transcripts, Cufflinks was used without employing a reference transcript annotation. Transcripts were then merged across all gastric cancer and normal samples and compared against GENCODE annotations to identify novel transcripts using Cuffmerge 2.2.0. Deep-depth strand-specific RNA-seq was also performed on 10 additional primary samples (paired-end 101 bp). TCGA datasets were downloaded from TCGA Data Portal (https://gdc.cancer.gov/) in the form of FASTQ files, which were then aligned to GENCODE v19 transcript annotations using TopHat v2.0.12. To analyze promoter-associated RNA expression, RNA-seq reads from TCGA samples (tumor and normal) were mapped against the genomic locations of promoter regions originally defined by epigenomic profiling in the discovery samples, including all promoters, gained somatic promoters, and lost somatic promoters (see Fig. 1). RNA-seq reads mapping to these epigenome-defined promoter regions were then quantified and normalized by promoter length (kilobases) and by total library size, and FCs in expression were computed between tumor and normal TCGA sample groups. Length of promoter loci was defined as the number of base pairs (bp) between the start and stop genomic coordinate of the H3K4me3 region as identified by the peak caller program CCAT v3.0 (185). Isoform level quantification for alternative promoter–driven transcripts was performed using Cufflinks (FPKM; ref. 186), Kallisto (TPM; ref. 187), and MISO (isoform-centric analysis; ref. 188). Assigned counts for each isoform were normalized by DESeq2.
Other Analyses
Other analyses, including DNA methylation analysis, survival analysis, gene set enrichment analysis, analysis of repetitive elements, functional element analysis using GenoCanyon and GenoSkyline, and analysis of transcription factor binding sites, are presented in the Supplementary Text.
Mass Spectrometry and Data Analysis
Peptide-level spectral data for 90 colon and rectal cancer samples (63) were downloaded from the CPTAC portal (https://cptac-data-portal.georgetown.edu/cptac/s/S016) generated by the Clinical Proteomic Tumor Analysis Consortium (NCI/NIH). Spectral counts were extracted using IDPicker's idQuery tool (189). Differentially expressed peptides were identified by fitting a linear model (limma R; ref. 190) on quantile-normalized and log2-transformed spectral counts. For gastric cancer cell line mass spectrometry, AGS, GES1, SNU1750, and MKN1 proteomic profiles were generated using nanoflow liquid chromatography on an EASY-nLC 1200 system coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific; Supplementary Text). The Q Exactive HF was operated with a TOP20 MS-MS spectra acquisition method per MS full scan. MS scans were conducted with 60,000 resolution and MS-MS scans with 15,000 resolution. For data analysis, raw files were processed with MaxQuant (191) version 1.5.2.8 against the UniProt annotated human protein database (192). Carbamidomethylation was set as a fixed modification, whereas methionine oxidation and protein N-acetylation were considered as variable modifications. Search results were processed with MaxQuant filtered with an FDR of 0.01. The match between run option and Label-Free Quantification (LFQ; ref. 193) was activated. LFQ intensities were filtered for potential contaminants and reverse proteins, and log2 transformed. They were then imputed using the open-source software Perseus (0.5 width, 1.8 downshift; ref. 194) and fitted using linear models (limma R; ref. 190).
Molecular Biology and Biochemistry
Procedures for 5′ rapid amplification of cDNA ends (5′ RACE), gene cloning, Western blotting (MET variants), RASA3 mRNA measurement, and RAS-GTP assays are presented in the Supplementary Text.
Transfection with RASA3 siRNAs
Two RASA3 siRNAs were used to silence the RASA3 SomT transcript in NCC24 cells [hs.Ri.RASA3.13.1 TriFECTa Kit DsiRNA Duplex (Integrated DNA Technologies), and Silencer Select Pre-Designed siRNA s355 (Life Technologies)]. NCC24 cells were transfected either with the above two siRNAs or a nontargeting control (ON-TARGETplus non-targeting pool, Dharmacon) at a final concentration of 100 nmol/L for 48 hours, subsequently followed by qPCR and Western validation and migration/invasion assays.
Cell Proliferation, Migration, and Invasion Assays
For cell proliferation, 3 × 103 GES1, SNU1967, and AGS cells were plated into 96-well plates in media with 10% FBS and left overnight to attach. The next day (day 0), cells were transiently transfected with WT and Var RASA3 constructs using Lipofectamine 3000 (Thermo Fisher Scientific). The amount of the constructs was 40 ng per well for AGS and 100 ng per well for GES1 and SNU1967 cells. Cell proliferation was measured by the WST-8 assay (Cell Counting Kit-8, Dojindo) from 24 to 120 hours posttransfection. WST-8 solution (10 μL) was added per well, and the absorbance reading was measured at 450 nm after 2 hours of incubation in a humidified incubator. To determine cell-migratory capacities, RASA3 WT and Var-transfected GES1, SNU1967, and AGS cells and siRNA-treated NCC24 cells were tested using Corning Costar 6.5-mm Transwell with 8.0-μm Pore Polycarbonate Membrane Inserts (3422, Corning). AGS cells (2.5 × 104), GES1 cells (2 × 104), SNU1967 cells (3 × 104), and NCC24 cells (5 × 104) were suspended in 0.1 mL serum-free RPMI medium and added to the top of the Transwell insert. RPMI (0.6 mL) containing 10% FBS was added into the bottom well as a chemoattractant. After incubation for 24 hours at 37°C in a 5% CO2 incubator, cells were fixed with 3.7% formaldehyde and permeabilized with 100% methanol. Nonmigrated cells were scraped off with cotton swabs from the upper surface of the membrane. Migrated cells were stained with 0.5% crystal violet. The number of migrated cells was represented as the total area of migrated cells versus the area of Transwell membrane calculated using ImageJ software. For cell invasion assays, the above Transwell inserts were coated with 0.1 mL (300 μg/mL) Corning Matrigel matrix (354234, Corning) for 2 to 4 hours at 37°C before use.
Altered Peptide and Antigen Prediction
Altered peptides were defined as variant N-terminal protein sequences arising from somatic alterations in alternative promoter usage. The following filters were applied to select the pool of altered peptides: (i) FC of at least 1.5 for alternate versus canonical RNA-seq expression; (ii) only one canonical and one alternate isoform per gene loci; and (iii) annotated transcripts confirmed as protein coding by GENCODE. Canonical promoters were defined as regions exhibiting unaltered H3K4me3 peaks. Random peptides from the human proteome were generated from amino acid sequences of GENCODE coding transcripts. N-terminal peptide gains were identified as cases in which the alternative transcript was associated with a different 5′ region predicted to result in a different translated protein sequence compared with the canonical transcript. For each N-terminal altered protein, we evaluated binding of 9-mer peptides using the NetMHCpan 2.8 using a strict threshold of IC50 ≤ 50 nmol/L to identify strong MHC binders (74, 195). Antigen predictions were performed against patient-specific HLA types of gastric cancer samples predicted using OptiType (196). OptiType was run using default parameters, except BWA mem was used as an aligner for prefiltering reads aligning to the OptiType-provided reference sequences.
Association of Cytolytic Markers with Alternative Promoter Usage
Local immune cytolytic activity was evaluated using the expression of GZMA and PRF1 as previously used by Rooney and colleagues (81). Tumor content was estimated using two algorithms, ASCAT (aberrant cell fraction; ref. 79) and ESTIMATE (tumor purity; ref. 80). Expression data for the SG series were downloaded (GSE15460) and normalized using the robust multiarray average algorithm in the “affy” R package (197) and log2 transformed. Affymetrix SNP Array 6.0 data for the SG series were downloaded from GSE31168 and GSE85466. Mutation frequencies for TCGA stomach adenocarcinoma (STAD) samples were downloaded from the TCGA STAD publication data (https://tcga-data.nci.nih.gov/docs/publications/stad_2014/; ref. 198) using level 2 curated MAF files (QCv5_blacklist_Pass.aggregated.capture.tcga.uuid.curated.somatic.maf) filtered for “Missense” variant classification. Expression data for TCGA STAD samples (TPM) were computed using the Kallisto algorithm (187). Raw SNP Array 6.0.CEL files for TCGA gastric cancers (STAD) were downloaded from the GDC data portal (https://gdc-portal.nci.nih.gov/). Access to this dataset was obtained using Database of Genotypes and Phenotypes (dbGaP) credentials and an ID issued by eRA commons. Precomputed ESTIMATE scores for TCGA STAD were downloaded from http://bioinformatics.mdanderson.org/estimate/ and converted to tumor purity using the formula cos (0.6049872018 + 0.0001467884 × ESTIMATE score; ref. 80). Preprocessed expression data for the ACRG series were downloaded from GSE62254, and precomputed ASCAT scores were obtained from collaborators (J. Lee). Expression of cytolytic markers was adjusted for missense mutation and tumor purity frequencies using a spline regression model.
EPIMAX Assays
Peptides for 15 representative alternative promoters were synthesized by GenScript (Supplementary Table S10). Control peptide pools for human actin were purchased from JPT [PM-ACTS, PepMix Human (Actin) JPT]. PBMCs were obtained from 9 healthy volunteers, of which 8 PBMC samples were HLA typed (Supplementary Table S9). PBMCs were labeled with 1 μmol/L CFSE (Life Technologies, Thermo Fisher Scientific) and cultured at a density of 2 × 105 cells per well in complete culture medium [cRPMI comprising RPMI-1640 medium (Gibco, Thermo Fisher Scientific), 15 mmol/L HEPES (Gibco), 1% nonessential amino acid (Gibco), 1 mmol/L sodium pyruvate (Gibco), 1% penicillin/streptomycin (Gibco), 2 mmol/L l-glutamine (Gibco), 50 μmol/L β2-mercaptoethanol (Sigma, Merck), and 10% heat-inactivated FCS (HyClone)] for 5 days. Individual peptide pools of each alternative promoter were added at the start of the culture at a concentration of 1 μg/mL for each peptide. At the end of day 5, cells were stained with LIVE/DEAD Fixable Near-IR Dead Cell Stain Kit (Life Technologies) and labeled with CD4-BUV737 (BD Biosciences), CD8-PacificBlue (BD Biosciences), CD3-PE (BioLegend), CD19-PE/TexasRed (Beckman Coulter), and CD56-APC (BD Biosciences). In addition, magnetic bead–based cytokine multiplex analysis (human cytokine panel 1, Millipore, Merck) was performed on cell culture supernatants to measure secreted cytokine levels.
IFNf Assays
To test the immunogenicity of the RASA3 WT and Var protein sequences, CD14+ monocytes were isolated from an HLA-A*02:06 donor by positive selection using magnetic beads (Miltenyi Biotec). Dendritic cells (DC) were generated by GM-CSF (1,000 IU/mL) and IL4 (400 IU/mL) and further matured by TNF (10 ng/mL), IL1b (10 ng/mL), IL6 (10 ng/mL; Miltenyi Biotec), and PGE2 (1 μg/mL; Stemcell Technologies) for 24 hours. The DCs were then primed with AGS cell lysates expressing WT RASA3 or Var RASA3 for 24 hours, before being cocultured with T cells from the same donor at the ratio of 1:5. After 5 days of coculture with DCs, T cells were isolated by positive selection using CD3 magnetic beads (Miltenyi Biotec) and cocultured with AGS cells expressing either WT or Var RASA3 at the ratio of 20:1 for 2 days. Supernatants were harvested and IFNγ release was measured by ELISA (R&D Systems).
NanoString Analysis
NanoString nCounter Reporter CodeSets were designed for 95 genes (83 upregulated in gastric cancer and 11 downregulated) and 5 housekeeping genes (AGPAT1, CLTC, B2M, POL2RL, and TBP covering a broad expression range) on the SG series samples. For each gene, we designed three probes, targeting (i) the 5′ end of the alternate promoter location; (ii) the 5′ end of the canonical promoter (defined by promoter regions of equal enrichment in both GC and normal samples or the longest protein-coding transcript); and (iii) a common downstream probe. A separate NanoString assay was designed for 88 genes on the ACRG cohort, using similar criteria. Vendor-provided nCounter software (nSolver) was used for data analysis. Raw counts were normalized using the geometric mean of the internal positive control probes included in each CodeSet.
EZH2 Inhibition
IM95 cells were treated with GSK126 (Selleck Chemicals), a selective EZH2 inhibitor (96, 199), at a concentration of 5 μmol/L. Cell proliferation was monitored in 96-well plates posttreatment with GSK126 using the CellTiter-Glo Luminescent Cell Viability Assay (Promega) for three independent experiments. For RNA-seq analysis, total RNA was extracted using the Qiagen RNeasy Mini Kit according to the manufacturer's instructions. Cells were treated with GSK126 (Selleck Chemicals; dissolved in DMSO) at a concentration of 5 μmol/L. Control cells were treated with the same concentration of DMSO (0.1%). RNA-seq differential analysis for promoter loci was carried out using edgeR (47) on read counts mapping to H3K4me3 regions estimated using featureCounts (200). RNA-seq gene level differential analysis was performed using cuffdiff2.2.1.
Accession Codes
Genomic data for this study have been deposited in the National Center for Biotechnology GEO database, under accession numbers GSE51776 and GSE75898 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=kfoxqeamzfetpal&acc=GSE75898).
Disclosure of Potential Conflicts of Interest
P. Tan has ownership interest (including patents) in ASTAR. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: A. Qamra, M. Xing, N. Padmanabhan, P.K.H. Chow, B.T. Teh, P. Tan
Development of methodology: M. Xing, J.J.T. Kwok, J.S. Lin, X. Yao, B.T. Teh, P. Tan
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Xing, N. Padmanabhan, S. Zhang, C. Xu, Y.S. Leong, A.P.L. Lim, Q. Tang, X. Yao, X. Ong, M. Lee, S.T. Tay, E.G. Santoso, C.C.Y. Ng, A. Jusakul, D. Smoot, S.Y. Rha, K.G. Yeoh, W.P. Yong, P.K.H. Chow, W.H. Chan, H.S. Ong, K.C. Soo, K.-M. Kim, W.K. Wong, B.T. Teh, D. Kappei, J. Lee, P. Tan
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A. Qamra, M. Xing, J.J.T. Kwok, Q. Tang, W.F. Ooi, J.S. Lin, T. Nandi, X. Yao, A. Ng, S.Y. Rha, S.G. Rozen, D. Kappei, J. Connolly, P. Tan
Writing, review, and/or revision of the manuscript: A. Qamra, M. Xing, J.J.T. Kwok, A. Ng, S.Y. Rha, K.G. Yeoh, W.P. Yong, P.K.H. Chow, S.G. Rozen, B.T. Teh, J. Connolly, P. Tan
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.P.L. Lim, A.T.L. Keng, H. Ashktorab, S.G. Rozen, B.T. Teh, P. Tan
Study supervision: P.K.H. Chow, P. Tan
Other (provided the cell lines): H. Ashktorab
Acknowledgments
We thank the Sequencing and Scientific Computing teams at the Genome Institute of Singapore for providing sequencing services and data management capabilities, and the Duke-NUS Genome Biology Facility for sequencing services. We also thank Dr. Shyam Prabhakar for helpful discussions. We thank Dr. Wanjin Hong for the gift of HEK293 cells and pCI-Puro-HA vector and Dr. Alfred Cheng for the gift of GES1 cells.
Grant Support
This work was supported by a core institutional grant from the Genome Institute of Singapore under the Agency for Science, Technology and Research; core funding from Duke-NUS Medical School; and National Medical Research Council grants TCR/009-NUHS/2013, BnB/0005b/2013 (BnB11dec069), and NMRC/STaR/0026/2015. Other sources of support include the Cancer Science Institute of Singapore, NUS, under the National Research Foundation Singapore and the Singapore Ministry of Education under its Research Centres of Excellence initiative.