To understand the genetic drivers of immune recognition and evasion in colorectal cancer, we analyzed 1,211 colorectal cancer primary tumor samples, including 179 classified as microsatellite instability–high (MSI-high). This set includes The Cancer Genome Atlas colorectal cancer cohort of 592 samples, completed and analyzed here. MSI-high, a hypermutated, immunogenic subtype of colorectal cancer, had a high rate of significantly mutated genes in important immune-modulating pathways and in the antigen presentation machinery, including biallelic losses of B2M and HLA genes due to copy-number alterations and copy-neutral loss of heterozygosity. WNT/β-catenin signaling genes were significantly mutated in all colorectal cancer subtypes, and activated WNT/β-catenin signaling was correlated with the absence of T-cell infiltration. This large-scale genomic analysis of colorectal cancer demonstrates that MSI-high cases frequently undergo an immunoediting process that provides them with genetic events allowing immune escape despite high mutational load and frequent lymphocytic infiltration and, furthermore, that colorectal cancer tumors have genetic and methylation events associated with activated WNT signaling and T-cell exclusion.
Significance: This multi-omic analysis of 1,211 colorectal cancer primary tumors reveals that it should be possible to better monitor resistance in the 15% of cases that respond to immune blockade therapy and also to use WNT signaling inhibitors to reverse immune exclusion in the 85% of cases that currently do not. Cancer Discov; 8(6); 730–49. ©2018 AACR.
This article is highlighted in the In This Issue feature, p. 663
Despite major advances in technologies to monitor complex molecular profiles of cancer, translating molecular observations into treatment decisions remains challenging in routine patient care due to cancer heterogeneity and the rapidly expanding repertoire of treatment options, including immunotherapy (1, 2). Precision medicine advances have been fueled by technological advances, such as targeted sequencing, and also by multi-omic sequencing efforts, such as The Cancer Genome Atlas (TCGA; ref. 3), which are powered to provide guidance for targeting molecular observations in the form of estimates of the frequencies of drivers to consider.
Colorectal cancer in particular represents a heterogenous group of dynamic diseases with differing sets of genetic events, accompanying immune response, and influences of exogenous factors, providing a challenge for personalized therapeutic approaches. The initial TCGA effort to molecularly profile colorectal cancer reported on 276 primary samples, providing a deep look at the common non-hypermutated chromosome unstable subset, but only a preliminary consideration of 35 hypermutated cases due to the low overall number of microsatellite instability–high (MSI-high) cases (4). The NCI Genetics and Epidemiology of Colorectal Cancer Consortium (NCI-GECCO) effort, which relies on the TCGA colorectal cancer data as a basis for integration of epidemiology data and genome-wide association study data from 40,000 participants to make recommendations regarding colorectal cancer prevention (5), pursued the systematic completion of the TCGA colorectal cancer cohort, which now includes 592 cases with tumor exome, transcriptome, methylation, and copy-number alteration (CNA), as well as key subtyping and intermediate data as a resource for the community, including molecular pathologists. We present this resource here.
To further power the study of colorectal cancer, we integrated exome data on 619 previously published primary tumor cases from the Nurses' Health Study (NHS) and the Health Professionals Follow-up Study (HPFS; ref. 6), yielding an unprecedented sample set of 1,211 primary colorectal cancer molecularly characterized cases for identifying significantly mutated genes, pathways, and potential oncogenic hotspots. The large number of specimens, including a high number of hypermutated MSI-high cases (n = 179, or 15%) and POLE (16 cases), make it possible to carefully analyze genetic alterations in these hypermutation subtypes.
We were particularly interested in applying this large multi-omic colorectal cancer cohort to understand drivers of immune infiltration, as recent advances in immunotherapy have necessitated a deeper consideration of the less frequent MSI-high colorectal cancers, which have been shown to respond to immune checkpoint blockade, whereas the much more common microsatellite stable (MSS) colorectal cancers do not usually respond to these immunotherapies (7, 8). The FDA recently approved pembrolizumab and nivolumab, two PD-1 immune checkpoint–blocking antibodies, for the treatment of patients with MSI-high [or mismatch repair deficient (MMR-D)] colorectal cancer and, in the case of pembrolizumab, all MSI-high or MMR-D treatment-refractory solid malignancies. It is hypothesized that a higher somatic mutational load leading to increased presentation of neoepitopes mediates immunotherapy responses in MSI-high tumors (6–8). However, it has been observed previously that MSI-high or immune-infiltrated tumors have evolved mutations that may confer resistance to recognition by the immune system in untreated samples (i.e., immunoediting; ref. 9). On the other hand, it has been postulated that a lower mutational load and poorly characterized effector T-cell exclusion processes underlie immunotherapy resistance in MSS colorectal cancer (10). Here, we demonstrate that there are other factors besides mutation load affecting T-cell infiltration in both MSI-high and MSS. Specifically, we show that MSI-high tumors frequently undergo immunoediting through complete disruption of both alleles of key genes in the MHC–antigen presentation pathway and that both MSS and MSI-high cancers have lower T-cell infiltration when molecular events that upregulate the WNT pathway are present.
Colorectal Cancer Subtypes of MSI-High, POLE, and MSS Analyzed in the TCGA Cohort
MSI-high status was available for the previously published 276 TCGA samples, using the clinical grade classification, which uses PCR on a set of microsatellite markers (4). To assess MSI-high status on the remaining samples for which we did not have coverage on these microsatellite markers, we relied on a benchmarked bioinformatics approach called mSINGS (11, 12), which considers whether there is an increased insertion and deletion (indel) rate at microsatellite tracts throughout the exome (Supplementary Table S1 and Methods). MSI-high samples are driven by disruptive mutations of mismatch repair genes or epigenetic silencing of MLH1, and we confirmed these driving events in all but 11 samples, increasing our confidence in our assessment of MSI-high status (Supplementary Fig. S1A). We also confirmed increased mutation load in MSI-high tumors with increased substitutions (P < 0.001) and indels (P < 0.001), often at microsatellite tracts due to MMR-D in MSI-high cases (Supplementary Fig. S1B). Three Lynch syndrome cases were identified by manual inspection of the germline sequenced reads aligned to the human genome to identify disruptive mutations in MMR genes (Supplementary Table S1). These cases were typical of other MSI-high cases with respect to mutational load for number of substitutions and indels and were treated that way in downstream analysis. The 75 TCGA MSI-high cases presented here included transcriptome, CNA, and methylation data, in addition to exome data, making multi-omic analyses possible for these cases.
An additional rare hypermutation subtype exists in colorectal cancer that could be potentially targetable by immune checkpoint inhibitors based on having an unusually high mutation load (13): POLE mutated cases that have mutations in the exonuclease domain of the catalytic subunit of the polymerase epsilon, and that have an outlier number of substitutions, but not indels (Supplementary Fig. S2). We identified 16 of these cases on the basis of mutations in the gene POLE and an outlier number of substitutions (Fig. 1; Supplementary Tables S1 and S2). Three cases were also identified that were POLE and MSI-high, based on mutations in POLE, mSINGS results, and epigenetic silencing of MLH1. All of these cases had an extremely high mutation load, including both indels and substitutions: The largest mutation load in the series was case TCGA-3821 with both a POLE mutation and MSI-high resulting in 10,088 substitutions and 143 indels.
Next, we subtyped the MSI-high and MSS samples further to validate that the resulting TCGA colorectal cancer 592-sample cohort recapitulated previously reported features of MSS and MSI-high colorectal cancer. CpG island methylator phenotype (CIMP) status was called using an array-based methylation method (14). Genomic subtypes, including BRAF+, KRAS+, APC+, TP53+, and CTNNB1+, were called using somatic mutation data generated as described in Methods. In addition, the colorectal cancer Consensus Molecular Subtype for each sample was generated using the mRNA (Supplementary Table S1; ref. 15). Figure 1 shows that the colorectal cancer data in this cohort were consistent with previously described genetic and molecular subtypes of MSI-high and MSS status (4, 6, 16). As previously reviewed (16), MSI-high tumors were enriched for hypermethylation (CIMP-high, P < 1 × 10−15), mutations in BRAF (P < 1 × 10−15), and RNF43 (P < 1 × 10−15). On the other hand, MSS tumors were enriched for mutations in APC (P < 1 × 10−15) and KRAS (P = 0.004) and increased chromosomal instability, including CNAs (P < 0.001) and recurrent high gains and losses (Supplementary Fig. S1C; Supplementary Table S3). These differences between MSS and MSI-high tumors reflect differences in precursor lesion and etiology (17) that may contribute to responses to immune checkpoint blockade therapy. A complete description of the data analysis methods, including somatic mutation calling, CNA calling, copy-neutral loss of heterozygosity (CN-LOH) calling, and HLA class I typing, along with available data, may be found in Methods.
WNT Signaling and Immune-Related Genes and Pathways Significantly Mutated in the Combined 1,211 Colorectal Cancer Cases
To get a deeper look at the mutational landscape of colorectal cancer, we combined the somatic mutation data from the 592 fresh-frozen exomes from TCGA with the somatic mutations data from the formalin-fixed paraffin-embedded (FFPE) exomes from the 619 cases from the Nurses' Health Study (NHS) and the HPFS (6), yielding a sample set of 1,211 colorectal cancer molecularly characterized cases. The combined large number of specimens, including 179 MSI-high cases (104 from the NHS/HPFS cohort typed for MSI-high status using a clinical grade test), provided sufficient power to identify significantly mutated genes and pathways in MSS and MSI-high tumors. This large dataset also had the power to identify rare mutation hotspots. For example, we identified the known activating mutation IDH1R132H (18) in five cases and AKT1E17K (19) in six cases. Figure 2A and Supplementary Table S4 detail which dataset was used for which analyses: The somatic mutations from the NHS/HPFS cohort were additionally used to identify potential drivers, whereas the multi-omic TCGA data were used to explore further the functional roles of these drivers.
We utilized the MutSigCV algorithm (20) on the combined dataset to identify the significantly mutated genes that were found to be mutated more frequently than expected given the cohort-specific background mutation rate in MSS or MSI-high (6). We identified a total of 62 significantly mutated genes, of which 9 were identified in MSS tumors only, 40 in MSI-high tumors only, and 13 in both (Fig. 2B; Supplementary Table S5). Supplementary Table S6 shows which genes in Fig. 2B were novel with respect to the four previous large-scale whole-exome sequencing studies of colorectal cancer (1, 4, 6, 21). The significantly mutated genes in MSS were all previously reported in TCGA (4) or in the NHS/HPFS study (6) except for B2M, which is more commonly mutated and significant in MSI-high, occurring less commonly in MSS (Supplementary Table S6). On the other hand, 27 of the 53 significantly mutated genes in MSI-high were novel, likely resulting from the large increase in the number of MSI-high tumors in this combined series relative to previous studies (1, 4, 6, 21). WNT signaling genes were well represented as significantly mutated genes in both subtypes, whereas immune-related genes were significantly mutated in MSI-high specifically (Fig. 2B).
We used MutSigCV on the TCGA cohort to identify significantly mutated pathways in the MSS cohort using BioCarta, REACTOME, and KEGG gene sets. We identified 135 pathway gene sets with a Q-value less than 0.01 in the MSS cohort: Only four involved the immune system, specifically interleukin signaling (Supplementary Table S7). Next, we used MutSigCV to identify six significantly mutated pathways in the MSI-high cohort, including two significantly mutated pathways involved in MHC class I antigen-presenting machinery (APM; Supplementary Table S8). The antigen-presenting genes HLA-A, HLA-B, and B2M contribute a large number of the mutations in these pathways.
Immune-Related and WNT Signaling Genes Significantly Mutated in MSI-High Based on MMR-D
It had been previously observed that MMR-D mutations occur at specific microsatellite sites in the colorectal cancer genome (22), but the functional impact of these microsatellite hotspots was not previously known (22). A recent study (23) presented a method to use microsatellite indels, which occur at a high rate in MSI-high tumors, to identify cancer drivers. The study identified a JAK1 microsatellite indel as driving immune evasion in endometrial tumors and redemonstrated RNF43 microsatellite indels as inhibitors of WNT signaling in colon cancer, thereby demonstrating the utility of this approach. We applied a similar approach (see Methods) to the full 179 colorectal cancer MSI-high cohort to see whether each of the 53 significantly mutated genes in MSI-high had a higher than expected number of indels at microsatellite sites. We observed 29,429 total indels at 1 billion possible microsatellite tracts in the entire MSI-high cohort. Aggregating the data by gene, we observed 28 of 53 genes significantly mutated by this MMR-D typical mechanism (Fig. 2B). Eight of these genes were related to WNT signaling, and 11 were related to immune pathways (Fig. 2B), indicating selection pressure on these genes in MSI-high resulting from the MMR-D mutation rate (Supplementary Table S9).
Biallelic Losses of WNT and Immune-Related Genes
To add further evidence to the driver status of the 62 significantly mutated genes, and to overcome concerns about hypermutation rates creating false positives, we integrated data on disruptive mutations, single-copy losses, and CN-LOH in MSS (Fig. 3A) and MSI-high tumors and thereby identified cases with biallelic losses in 26 of the 62 significantly mutated genes (Fig. 3A; Supplementary Table S10; short reads limit our ability to check whether both alleles are lost when there are two disruptive mutations, so these cases are less conclusive). Of these, 13 were WNT signaling genes and eight were immune-related genes. These data indicate selection pressure to disrupt WNT genes across all colorectal cancers, whereas immune-related genes are selected for disruption in MSI-high.
Although MSI-high cancers have relatively few CNAs (Supplementary Fig. S3), we found that CN-LOH events do occur at sites of disruptive mutations, resulting in biallelic loss of a gene and indicating selection pressure to inactivate that gene (Supplementary Fig. S4A shows focal CN-LOH events driving biallelic loss of B2M). Figure 3B shows all the CN-LOH and single-copy loss events overlapping disruptive mutations in the TCGA cohort, revealing recurrent CN-LOH events that occur in the HLA-A, HLA-B, and B2M regions, providing evidence that in MSI-high cases, CN-LOH events cluster in APM genes. Interestingly, a recent study of CN-LOH of the HLA class I region in lung cancer showed that biallelic loss in this region can drive immune escape (24), whereas a recent study in melanoma showed that biallelic losses of B2M are enriched in anti–PD-1 nonresponders relative to responders (25).
Selection for Disruption of Immune-Related Genes in MSI-High Tumors
Figure 4A shows the mutation landscape for the 75 MSI-high samples from the TCGA, which has CNA and gene expression data. Fifteen significantly mutated genes (identified using MutSigCV on the combined cohort) have a documented role in the immune system specifically, including 13 significantly mutated in MSI-high tumors, a subtype shown previously to have a high level of T-cell infiltration (6), and two significantly mutated, immune-related genes (ZFP36L2 and B2M) in both MSS and MSI-high (Figs. 2B and 4A). Of these 15 genes, three had known roles in modulating diverse functions in hematopoietic lineage cells and effects beyond antigen presentation but no previously described roles in cancer cells: XYLT2, a gene necessary for dendritic cell traffic and T-cell response in the lymph nodes, because of its role in the biosynthesis of heparan sulfate (26); ZFP36L2, a gene involved in thymocyte development (27); and KLF3, a gene involved in B-cell development (28). Four additional significantly mutated immune-related genes had been previously described as being expressed in cancer cells: CD58, a tumor surface protein involved in the activation of natural killer (NK) cells through interaction with CD2 (29) that is concurrently lost with B2M in some cancers (30); CASP8, a tumor-expressed gene involved in innate immunity through its role in apoptosis (31); ZBTB20, a tumor Toll-like receptor mediating innate immune response by repressing IκBα gene transcription (32); and RNF128, a tumor-expressed gene that is essential for IFNβ production and innate antiviral immune response (33). Together, these mutations indicate positive selection for immune escape through other mechanisms beyond mutations in APM.
The observation of mutations consistent with cancer immunoediting motivated us to understand the extent of damage to the APM previously reported in MSI-high or more heavily T cell–infiltrated tumors (6–8, 34). We found that 57% of MSI-high cases had at least one mutation in HLA-A, HLA-B, or HLA-C, with 39% having a mutation in HLA-A, 33% in HLA-B, and 21% in HLA-C, using an HLA mutation calling method that accounts for the polymorphic nature of this region (Fig. 4A; Supplementary Table S11). We included in Fig. 4A gene expression data for each HLA class I gene, demonstrating that HLA class I expression levels vary in colorectal cancer primary tumors. We observed both truncating mutations that lower expression of HLA-A and HLA-B (Fig. 4B) and missense mutations in exon 4, which disrupt the alpha 3 domain that binds to the CD8 coreceptor of T-cells (35), potentially disrupting antigen presentation. Because the HLA class I genes are adjacent on chromosome 6, a CN-LOH event in this region, referred to as an HLA LOH event, could simultaneously eliminate up to three maternal or paternal adjacent HLA class I alleles (Supplementary Fig. S5), potentially iteratively reducing the cell's capacity for antigen presentation while expanding the landscape of possible oncogenic mutations available to the tumor (36). Using a similar method to the one used here, HLA LOH events were observed recently in lung cancer metastasis with evidence that these cases can iteratively drive immune escape (24). Here, we found six HLA LOH cases out of 75 primary untreated MSI-high tumors; two of these had additional disruptive mutations, potentially further reducing the number of HLA class I alleles in those cases (Fig. 4A).
Next, we considered significantly mutated genes in MSI-high with a role in disrupting HLA class I presentation. We identified recurrent mutations in the genes NLRC5 and RFX5, which cooperatively regulate the transcription of HLA class I genes (37–40). NLRC5 was mutated in 33% of MSI-high samples and RFX5 was mutated in 16% of TCGA MSI-high samples (Fig. 4A). Mutations in either of these genes were found to be associated with decreased expression of HLA-A, HLA-B, and HLA-C (Fig. 4C), reducing the expression of each gene by approximately 50%. RFX5 was a significantly mutated gene in the TCGA and NHS/HPFS cohorts independently. Although NLRC5 has been previously suggested to be a possible target for immune evasion (41), the size of the gene (1,866 AA) makes it difficult to identify as significantly mutated by MutSigCV. Instead, NLRC5 was identified here because it was significantly enriched for insertions/deletions at homopolymer and dinucleotide sites in MSI-high based on the MMR deficiency significance calculation performed here (P = 1 × 10−4).
Antigen presentation machinery genes have been shown to be enriched for mutations in MSI-high relative to MSS tumors (6), in particular TAP2, a gene that helps transport peptides to be presented by the APM, and B2M, a necessary component of HLA class I complexes on the surface; we found 6 of 75 MSI-high cases (8%) with biallelic disruption of TAP2 (a gene that is not significantly mutated in MSI-high by itself) and 8 of 75 MSI-high cases (11%) with biallelic disruption of B2M (Fig. 4A and D). Six of the MSI-high cases with biallelic disruption of B2M appeared to be clonal based on the tumor purity, the variant allele fraction, and the CNA or CN-LOH data, and all six showed significantly reduced expression of B2M (Supplementary Figs. S4A and S6). Biallelic disruption of B2M has been shown to confer resistance to anti–PD-1 (PDCD1) therapy in a patient with melanoma undergoing therapy (25, 42), as well as lung cancer (43) and colorectal cancer (7).
Figure 4A shows that clustering the samples based on the HLA class I and class II gene expression yields four clusters: Cluster I has both HLA classes upregulated, Cluster II has both clusters downregulated, Cluster III has HLA class I downregulated and HLA class II upregulated, and Cluster IV has HLA class I upregulated and HLA class II downregulated, indicating different subtypes within MSI-high based on HLA gene expression. Cluster II is enriched for NLRC5/RFX5–mutated samples (P = 0.01 by a one-sided Fisher exact test) consistent with the role of these genes in regulating HLA class I expression and the role of RFX5 mutations in decreasing HLA class II expression in MSI-high colorectal cancer (44). Clusters I and IV, which have HLA class I genes upregulated, are enriched for TAP1, TAP2, and B2M biallelic loss cases (P = 0.02 by a one-sided Fisher exact test), indicating that these genes may be selected to be disrupted in cases where HLA class I gene expression is intact.
Overall, 20 of 75 (27%) MSI-high primary tumor cases were found to have biallelic disruptions to one of eight genes important for antigen presentation to CD8 T cells, whereas another 49 of 75 (65%) cases were found to have less severe forms of damage to the same gene set (Fig. 4E). These mutations are predicted to affect MHC class I expression, upstream antigen processing, and antigen presentation itself. The biallelic mutations were predominantly mutually exclusive of one another, indicating that these mutations potentially have similar roles in tumor survival. Beyond the biallelic disruption mutations, however, the majority of these mutations likely help tumor survival only marginally, by, for example, downregulating MHC class I epitope presentation by the tumor. In both of these cases, cross-presentation of tumor-derived antigens by dendritic cells, macrophages, and other cell types may remain an avenue for activating an antitumor CD8 T-cell response, and characterizing these effects in the context of APM mutations is an important area for future work. Together, these mutations suggest the existence of positive selection for immune escape through the disruption of APM, editing the cancer cell genome likely to avoid the presentation of the most immunogenic neoepitopes resultant from high mutational burden.
Mutation Load as a Predictor of T-cell Infiltration
To assess CD8+ T-cell infiltration levels in the TCGA colorectal cancer we applied the expression-based pan T-cell infiltration measurement used on the melanoma TCGA cohort in (45). The T cell average is the average log-expression of 12 canonical T cell–associated genes (CCL2, CCL3, CCL4, CXCL9, CXCL10, CD8A, HLA-DOB, HLA-DMB, HLA-DOA, GZMK, ICOS, and IRF1). T-cell average is correlated with tumor-infiltrating lymphocyte (TIL) score, a pathology-based measure of T-cell infiltration, based on 429 pathology slides available for TCGA samples (see Methods, Supplementary Table S12; Supplementary Fig. S7A). This measurement is highly correlated with CD8A expression by itself (R2 = 0.73) and the “CYT” score (the average of GZMA and PRF1 expression, a measure of immune cytolytic activity based on two genes not present in the T-cell average; R2 = 0.67; Supplementary Fig. S7B; ref. 46). We use the measurement with more genes here for robustness and for consistency with recent analyses (45).
This T-cell infiltration measurement was well supported by 1,346 correlated T-cell inflammation signature genes (Supplementary Table S13; Supplementary Fig. S8A). Also, RNA sequencing (RNA-seq)–based cell deconvolution of tissue-infiltrating immune and stromal populations using MCP-counter (ref. 47; based on 111 genes, including CD8A and ICOS, two genes in the T-cell average measurement) showed that the T-cell average was correlated with all immune cell populations present during immune inflammation, including cytotoxic lymphocytes (R2 = 0.76), monocytic lineage cells such as macrophages (R2 = 0.62), and myeloid dendritic cells (R2 = 0.61; Supplementary Fig. S8B and S8C; Supplementary Table S14). On the other hand, NK cells were not as correlated with the T-cell average (R2 = 0.21) and a group of samples had outlier high amounts of NK-cell expression, one of which was a B2M biallelic loss case (Supplementary Fig. S5) possibly attracting NK cells due to the lack of HLA class I on the surface.
The T-cell average was higher for MSI-high samples, which have on average approximately 1,000 coding mutations, relative to MSS samples, which have on average approximately 100 coding mutations. However, the T-cell average was not directly correlated with mutation load (we use mutation load because accurately predicting neoantigens is still an open research problem) within MSS (R2 = 0.001) or MSI-high (R2 = 0.004; Fig. 5B). We sought to identify other explanations for the variability of T-cell average that is not explained by mutation load.
WNT Signaling Is Inversely Correlated with T-cell Infiltration in the TCGA Colorectal Cancer Cohort
Recently, it has been demonstrated that high WNT signaling excludes T-cell infiltration in melanoma primary lesions (45), and that T cells failed to control a melanoma tumor with overexpression of CTNNB1 (48). Further evidence has demonstrated a role of WNT signaling in shaping immunoresponse in ovarian (49) and prostate (50) cancers. We identified 19 significantly mutated genes using MutSigCV (20) in overall colorectal cancer with a role in WNT signaling (Figs. 2B and 5A). APC, TP53, RNF43, FBXW7, SOX9, ARID1A, and TCF7L2 are significantly mutated genes in all colorectal cancer tumors. On the other hand, AMER1, TGIF1 (51), ELF3 (52), SMAD4, and SMAD2 are significantly mutated in MSS tumors specifically, and ZNRF3 (53), BCL9L, DOCK3 (54), CD58 (55), AXIN2, and WNT16 are significantly mutated in MSI-high tumors. CTNNB1 is recurrently mutated with activating driver mutations (see Methods) in 16 tumors. Using MutSigCV, we also identified KEGG_WNT_SIGNALING_PATHWAY as a significantly mutated pathway in overall colorectal cancer (Q < 1.0 × 10−15) and MSS samples (Q < 3.4 × 10−14). In MSI-high, the WNT signaling pathway was significantly mutated on the basis of the microsatellite indel significance calculation using the BioCarta (Q < 1.0 × 10−15) and REACTOME (Q < 9.8 × 10−4) gene sets, but not by MutSigCV (Supplementary Table S15).
To investigate whether, as in primary melanomas (56), WNT signaling is inversely correlated with TILs in colorectal cancer, we tested the correlation between WNT target gene expression and the T-cell average. Although the role of AXIN2 in cancer is still being understood, here we are using it as a biomarker of WNT signaling expression, as it is a WNT target gene whose gene expression level has been shown to be upregulated in tumors with high nuclear CTNNB1 protein levels (57). In the TCGA cohort, AXIN2 gene expression was significantly inversely correlated with the T-cell average (R = −0.36, Q = 4.8 × 10−18; rank = 1,754, Fig. 5C). Supplementary Figure S9 shows a heat map of the T-cell average and the expression level of 11 other WNT target genes (58–63) that were significantly inversely correlated with the T-cell average: RNF43 (Q = 8.1 × 10−18) and ZNRF3 (Q = 1.5 × 10−16; refs. 53, 64), NKD1 (Q = 1.6 × 10−11), TCF7 (Q = 4.0 × 10−15), APCDD1 (Q = 9.3 × 10−3), ASCL2 (Q = 1.7 × 10−9), NOTUM (Q = 2.9 × 10−8), EPHB2 (Q = 2.7 × 10−14), ETS2 (Q = 2.5 × 10−7), MYB (Q = 7.4 × 10−6), and MYC (Q = 0.57). These findings support our conclusion that WNT signaling, as shown using the expression levels of 12 WNT target genes in colorectal cancer, is anticorrelated with T-cell infiltration.
Nuclear CTNNB1 (β-Catenin) Expression Is Inversely Associated with Pathologically Determined Immune Infiltration in Colorectal Cancer
Given our findings of the inverse association of WNT signaling with activated T-cell transcriptional signatures, we sought to confirm this result using well-validated IHC markers of protein expression and immune infiltration. Specifically, we tested 1,150 colorectal cancer FFPE cases, procured from the NHS/HPFS cohorts, for nuclear CTNNB1 as a marker of activation of the WNT signaling pathway (65). To reduce bias resulting from spatial heterogeneity, we assessed hematoxylin and eosin (H&E) whole-tissue sections and four tissue microarrays (TMA) per tumor. Next, we assessed spatially and functionally distinct immune infiltrates in colorectal cancer through histopathologic examination of four lymphocytic reaction patterns from: TILs, intratumoral periglandular reaction, peritumoral lymphocytic reaction, and Crohn-like lymphoid reaction (see Methods; ref. 66), as well as four T-cell functional subsets [CD3+ cells, CD8+ cells, CD45RO+ (PTPRC protein isoform) cells, and FOXP3+ cells], using TMA and IHC and image analysis (67).
Consistent with our finding of inverse association between WNT signaling and T-cell average, nuclear CTNNB1 protein expression was inversely associated with TILs (P = 0.027; Fig. 5D; Supplementary Table S16). Intratumoral, peritumoral, and Crohn-like lymphoid reactions were not significantly associated with nuclear CTNNB1 expression (P > 0.14). As previously shown (66), TILs were more frequently occurring in MSI-high tumors. However, the inverse association between TILs and nuclear CTNNB1 expression was apparent in both MSI-high and MSS cancers (Supplementary Fig. S10A; Supplementary Table S17). We further examined the correlation between nuclear CTNNB1 and CD3+ cells (pan T cells), CD8+ cells (cytotoxic T cells), CD45RO+ cells (antigen-experienced memory T cells), and FOXP3+ cells (regulatory T cells, which suppress cytotoxic T cells). We observed a significant inverse association between nuclear CTNNB1 expression with CD8+ cells (P = 0.002), CD45RO+ cells (P = 0.008), and CD3+ cells (P = 0.054), but not with FOXP3+ cells (P > 0.1) (Fig. 5E; Supplementary Table S18). Figure 5F shows an example of a colorectal cancer with nuclear CTNNB1-high and a low level of TILs and another colorectal cancer lacking nuclear CTNNB1 expression (cytoplasmic-only) but exhibiting a higher level of TILs. This finding was consistent across both MSI-high and MSS cancers (Supplementary Fig. S10B; Supplementary Table S19) and provides independent evidence for attenuation of the antitumor immune response in tumors with active WNT signaling.
APC Biallelic Mutations Associate with Increased WNT Signaling and Decreased TILs in MSS and MSI-High
Using the TCGA colorectal cancer cohort, we sought to identify specific genomic drivers of increased WNT signaling and decreased T-cell average. We first considered biallelic disruptive mutations in APC, because APC loss results in increased nuclear CTNNB1, thereby driving expression of WNT target genes (68). In samples with biallelic disruptive mutations in APC, AXIN2 expression was significantly increased (P < 2.2 × 10−16) and T-cell average was significantly decreased (P = 4.0 × 10−12) relative to samples with no disruptive mutations in APC (Fig. 6A and B). This was evident for MSI-high and MSS taken separately (Fig. 6C), so that 71% of all colorectal cancer cases having APC biallelic loss have lower T-cell infiltration based on the T-cell average measurement.
Superenhancer Hypomethylation of AXIN2 Is Associated with Decreased TILs Independent of APC Alterations in MSS Tumors
Recent work showed that key WNT signaling genes, including AXIN2, are controlled by superenhancers, which are continuous genomic regions that drive up gene expression when hypomethylated in cancer (69). AXIN2 is overexpressed in tumors relative to normal tissue as a result of hypomethylation of a superenhancer region located within AXIN2 (chr17:63540803-63558867). A methylation driver of T-cell infiltration is of interest because it has the potential to drive changes in T-cell infiltration much faster than somatic mutations. To assess whether the methylation status of this superenhancer might be potentially driving T-cell infiltration, we first showed that in the TCGA multi-omic cohort, which has methylation data, AXIN2 expression was inversely correlated with the methylation status of a set of CpG islands near the AXIN2 superenhancer (Fig. 7A). Although this region was uniformly hypermethylated in MSI-high and normal samples, in MSS samples, the methylation state (β-value, the fraction of methylated reads at that CpG site) of this superenhancer region is bimodally distributed (Fig. 7B; Supplementary Fig. S11A and S11B). AXIN2 was significantly increased (P < 2 × 10−16) and T-cell average was significantly decreased (P = 1 × 10−4) in MSS tumors with hypomethylation of the AXIN2 superenhancer region, even though these samples do not differ significantly in mutation load (P = 0.4; Fig. 7C).
The methylation state of this superenhancer varied within APC biallelic loss cases and cases without APC disruptions (Fig. 7D), indicating the possibility of it being a driver independent of APC mutation status. Figure 7E shows that AXIN2 superenhancer hypomethylation significantly increased AXIN2 expression within the set of cases without APC disruptions and also within the set of cases with APC biallelic losses. However, T-cell average was only significantly decreased within the APC cases with biallelic losses. Together, these observations indicate that there may be multiple pathways to immune resistance in colorectal cancer by way of altering WNT signaling.
At a time when cancer immunotherapy is becoming a mainstream treatment for patients with several cancer types, it is important to understand how cancer cells interact with the immune system and avoid being eliminated by it. As a cancer evolves from its normal cell of origin with progressively accumulated mutations, it needs to evade immune recognition and prevent tumor antigen-specific T cells from infiltrating and attacking it. A main mechanism by which cancers avoid immune attack is by triggering immune checkpoints that are inhibitory pathways crucial for limiting self-attack and minimizing collateral damage in normal tissues (70). Antibody therapy can be used to block the CTLA4 or PD-1 (PDCD1) checkpoints, releasing the T cells to attack the tumor (71, 72). These T cells likely recognize neoepitopes created by nonsynonymous somatic mutations (73) and result in tumors with higher mutational loads being more likely to respond to immunotherapy (74, 75), regardless of tumor type or site of origin (7, 8). Among colorectal cancers, MMR-D tumors are the major molecular subtype with a high mutational load, and these are the patients that preferentially respond to anti–PD-1 therapy (7, 8). However, not all patients with colorectal cancer with high mutational load respond to immune checkpoint inhibitor therapy, suggesting that the cancer cells may have evolved alternate mechanisms to evade immune recognition prior to treatment. Others respond initially and then develop resistance.
Colorectal cancer provides a clear opportunity to identify immune resistance mechanisms present in untreated samples, because MSI-high is a subtype that does respond to immune blockade therapy and MSS does not. As a result of the large number of MSI-high samples in this cohort and the use of analysis techniques that account for the high MMR-D mutation rate, we were able to identify 11 significantly mutated immune-related genes with roles in modulating diverse hematopoietic cell types and effects beyond antigen presentation, suggesting potential novel therapeutic targets. Additional genes with a role in the immune system were found to be frequently mutated by these techniques. Of particular relevance to this study, NLRC5 is a large gene that was only identified in the MMR-D significance calculation. It regulates HLA class I gene expression (40) and has been shown to activate WNT signaling in hepatocellular carcinoma (76), suggesting that master regulators connecting tumor biology and immune response can still be discovered.
In our recent study (6), we observed an increased number of mutations in HLA class I and APM genes in tumors with higher numbers of TILs. Here, we show enrichment of APM and other immune-related genes in MSI-high versus MSS tumors. These findings potentially explain why not all colorectal cancers respond to immune checkpoint blockade therapy or go on to progress later. Similarly, HLA LOH has been recently reported in 40% of mostly smoking-induced non–small cell lung cancers with a high mutational load, being associated with a high subclonal neoantigen burden and increased T-cell infiltration. The frequency of HLA LOH was enriched in metastatic sites (24). Taken together with the data here on MSI-high colorectal cancer, this further supports that highly mutated cancers develop escape mechanisms upon strong immune selection pressures.
Another opportunity presented by large-scale cancer genomics studies like this one is the ability to assess the frequency of established drivers of resistance to immunotherapy in the untreated colorectal cancer population. Our study showed that 11% of untreated MSI-high patients with colorectal cancer harbor biallelic loss of B2M, which is the subunit of MHC required for the stable surface expression of all HLA class I molecules. Loss of B2M, long known to be an escape mechanism to cancer immunotherapy strategies (77–79), has been recently shown to confer resistance to anti–PD-1 antibody therapy (25, 42, 43).
In addition, our study showed that the majority of MSI-high cases had at least one mutation likely to play a role in decreasing antigen presentation. Although in most cases these initial mutations are not predicted to be sufficient to confer resistance to immune checkpoint blockade therapy, they indicate that immune editing is happening prior to treatment and that patient tumors are on a resistance continuum that needs to be monitored in advance of treatment. For example, although a B2M mutation may be sufficient to cause resistance due to a loss of all HLA class I antigen-presenting alleles, a patient whose cancer has four HLA class I parental alleles disrupted may be farther along the path to resistance than a patient with all HLA class I parental alleles intact. In fact, we observed here that 55% of cases (41/75) had multiple alterations in the HLA class I antigen-presenting pathway. This does not suggest that these cases will not respond to immune blockade, when a large percentage do. It means that these cancers are hypermutating and evolving a means of immune escape even before treatment and that we need to consider other approaches. For example, HLA class II antigen-presenting genes were expressed and not significantly mutated in MSI-high tumors, providing evidence that there is a population for which HLA class II–based therapies, currently being pursued (80), might prove effective.
Many colorectal cancers, particularly MSS tumors, are non–T-cell-infiltrated (274/490 MSS cases). A study in melanomas (45) has shown that WNT signaling upregulation may suppress immune infiltration. Because mutations in WNT signaling pathway genes, including APC and CTNNB1, are commonly observed in colorectal cancer, we sought to assess whether there was an anticorrelation between the activation of WNT signaling and T-cell infiltration. Both by transcriptome analyses in the TCGA set and by IHC evaluation of T-cell infiltration and CTNNB1 nuclear localization in independent tumor samples, it was evident that colorectal cancers with high-level WNT signaling exhibit low levels of T-cell infiltration independent of mutation load. In addition, our data show that biallelic loss of APC (a gene that upregulates WNT signaling), occurring in 62% of MSS cases and 20% of MSI-high cases, is associated with downregulation of T-cell infiltration.
In summary, our findings suggest that MSI-high tumors display strong evidence of cancer hypermutation of immune genes to evolve a means of escape even before immune blockade treatment, allowing these highly mutated tumors to grow unimpeded by immune recognition. On the other hand, MSS tumors that harbor a lower mutational load exhibit less cancer immunoediting while exhibiting higher WNT pathway signaling and decreased T-cell infiltration.
TCGA Data Access
The results published here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/. All the raw sequence data for TCGA exomes and transcriptomes, as well as the SNP6 array data, are available through Database of Genotypes and Phenotypes (dbGaP) access. The raw exome data for this study are available through TCGA using the barcodes provided in Supplementary Table S1. Access to methylation array data is available through the TCGA Genomic Data Commons (GDC) at https://portal.gdc.cancer.gov.
Data Harmonization Yields a 592-Sample TCGA Colorectal Cancer Cohort
The original TCGA colorectal cancer publication provided an analysis of 276 colorectal samples (4). In the current study, we added the 242 prior samples that had complete multi-omic data to 350 unpublished TCGA colorectal cancer samples to produce a cohort of 592 TCGA colorectal cancer cases with exome sequencing, DNA copy number, methylation, and RNA-seq for multi-omic analyses (4). As a first step, we assessed tumor purity with InfiniumPurify (81) using methylation array data. We chose this method because the other popular method, ABSOLUTE (82), relies on CNAs, and MSI-high cases (n = 75) have few to no CNAs. To minimize tumor heterogeneity, TCGA assessed the tumor mass area. Next, we called somatic mutations from all 592 whole-exome sequencing data, ensuring consistency with prior calls using a previously published and validated method (83) that has been used in several settings (Supplementary Table S2; refs. 84–86). We benchmarked our variant calling against Mutect2 on the 528 TCGA samples from our cohort to which the GDC has applied their Mutect2 pipeline (GDC results available at https://portal.gdc.cancer.gov). Our overall coverage of Mutect2 results was high, with 95% of that method's reported coding mutations either called by our method or specifically flagged by one of our filters (see Supplementary Table S20 for sample-by-sample details). Finally, we provided additional evidence for the somatic mutations by providing variant allele fractions for the matched tumor transcriptome samples, in addition to the exomes. These data show that missense mutations are typically expressed, whereas disruptive mutations result in reduced expression of the variant due to nonsense mediated decay, consistent with ref. 87 (Supplementary Fig. S12; Supplementary Table S2).
Calling indels continues to present challenges due to insufficient coverage, alignment quality, the presence of repeats, short interspersed elements, and homopolymer/dimers, microsatellite instability, that result in false negatives (88). Prior calling methods filtered out indels at homopolymer runs likely to decrease false positives due to polymerase slip errors during massively parallel sequencing (64). We did not exclude such indels and instead aggregated the indels with respect to the microsatellite runs for downstream analyses, including but not limited to annotating biallelic losses and identifying significantly mutated genes based on MMR-D at the level of microsatellite sites (see below).
We also generated CNA calls using the TCGA SNP array data using a method previously published and validated (83–86). CN-LOH events were called as described in Methods. We call CNA and CN-LOH events conservatively by not adjusting for tumor purity, instead setting uniform tighter cutoffs that decrease the likelihood of false positives. Supplementary Tables S21 and S22 detail all the CNAs and CN-LOH events we generated for all 592 TCGA samples with information on the amplitude of the gain or loss and the genes affected where possible. To call HLA class I typing, we used POLYSOLVER (35) when applicable to the available exome data type and seq2HLA (89). Once the HLA class I type was identified, we remapped the data and called point mutations, CNA events, and CN-LOH for the HLA region using the approaches cited above (Supplementary Table S11). Figure 2A shows the analyses for which we use these data, and Supplementary Table S4 indicates how we integrated cohorts. We assessed our overall accuracy by checking whether the data recapitulates known subtype relationships and frequencies (Fig. 1).
TCGA Sample Collection
The TCGA cohort was collected and molecularly characterized by the NCI (Rockville, MD) as part of a multi-institute effort. The colorectal cancer cohort includes 592 primary colorectal adenocarcinomas, 48 normal adjacent tissues used for gene expression and methylation controls, and 592 germline DNA used for somatic mutation, CNA, and CN-LOH calling. All patients were untreated by chemotherapy or radiotherapy and provided informed consent; tissue collection was approved by the Institutional Review Board.
MSI-High Status Calling
We compared the mSINGS calls and the clinical grade calls on the previously annotated and published 276 samples using the default settings and 0.27 cutoff for the score. This gave us agreement with the MSI-PCR results, where they existed, for all but three samples. For the samples with disagreements, we checked the other indicators, which consistently pointed toward the clinical classifications being correct: case TCGA-AA-3715 (NEG by mSINGS; but called MSI-H by clinical grade testing, with MLH1 epigenetic silencing, high mutation rate, and MMR mutations), case TCGA-AA-A02R (NEG by mSINGS; but called MSI-H with clinical grade testing, with MLH1 epigenetic silencing, high mutation rate, and MMR mutations), and case TCGA-AA-A00L (POS by mSINGS; but MSS by clinical grade testing, high MLH1 expression, low mutation rate, no MMR mutations). We ran mSINGS on the remaining 316 TCGA samples using 0.27 as a cutoff for the score, verifying the resulting classification with MLH1 epigenetic status, MMR-D mutation status, and mutation load for missense mutations and indels. As an independent verification, we were able to identify confirming MMR gene disruptions in all but 11 cases in the samples classified as MSI-high, and none of the MSS cases had MMR gene disruptions.
Somatic Mutation Calling
We called somatic mutations for the whole TCGA cohort from the alignments available from TCGA (4). We used the published and validated methodology in ref. 83 checking to ensure consistency with the TCGA somatic mutation calls available in ref. 4. Mapped tumor and matched normal whole-exome sequencing data were obtained from TCGA, with reads mapped to the human genome (either hg18 or hg19). Functional annotation was performed relative to the full set of Refseq transcripts, as obtained from the UCSC Genome Browser database on March 14, 2016. During the process of calling somatic mutations, we identified 38 samples that were excluded to yield the final set of 592 samples, because they did not pass QC measures due to missing files, low coverage, or contamination; details are provided in Supplementary Table S23. The final set of barcodes for accessing the raw sequence data through TCGA are in Supplementary Table S1.
HLA Typing and Somatic Mutation Calling of HLA Class I Genes
HLA allele types were first determined for the HLA class I genes for each sample, using either POLYSOLVER (35) or seq2HLA (89). For those samples already analyzed in ref. 35, we used the HLA types provided by S. Shukla and C.J. Wu; for the remaining samples, we applied seq2HLA to RNA-seq data when correctly matepair-labeled paired-end RNA-seq data were available, and applied POLYSOLVER to whole-exome data in all other cases. The TCGA colorectal cancer cohort had to be analyzed using both approaches because some of the data were generated using the SOLiD sequencing platform, which cannot be run using POLYSOLVER. We then remapped all reads that were originally mapped to the HLA region of chromosome 6 to the consensus sequence for the HLA alleles determined by HLA typing (consensus sequences were obtained from the IMGT/HLA Database; see http://www.hla-alleles.org), using bowtie2 (90) to perform the realignment. Somatic variants, CNA events, and CN-LOH events for HLA-A, HLA-B, and HLA-C were then called on the basis of the realigned reads using the method described above.
We evaluated the presence or absence of CNAs on the basis of SNP6 arrays, when available, or whole-exome sequencing data using the published and validated method in ref. 83. To assess whether the segment is gained or lost, we calculate the copy-number ratio for the segment by taking the average of the log2 copy-number ratio of the individual SNP6 probes (resp., exon depths) for that segment. Segments for which the copy-number ratio increased (decreased) by at least 30% are called single-copy gains (losses); similarly, increases (decreases) of 60% are defined as two-copy gains (losses). We did not account for tumor purity directly; instead, we set cutoffs sufficiently high to ensure that gains and losses were present. This approach increases the number of false negatives relative to methods that account for tumor purity. However, CNAs that are identified are more likely to be real. Two-copy gains and losses were considered to be focal when they included 10 or fewer genes. This method was applied to HLA class I genes after typing and remapping data to the correct allele(s).
To identify CN-LOH events, we first restricted attention to SNPs in the SNP6 panel that appeared heterozygous in the matched normal sample (B-allele frequency at least 25%). When SNP6 data were not available, we instead used probable coding SNPs identified in exome-sequencing data using the method described above for calling variants and identifying SNP variants that had a B-allele frequency at least 25% in the tumor and the normal. For each such SNP, we calculated its expected B-allele frequency in the tumor sample based on its observed copy number and assuming a monoallelic gain or loss. We call the difference between this quantity and the actual B-allele frequency the “B-allele deficit.” The B-allele deficit, defined in this way, should be positive for actual instances of CN-LOH, zero for monoallelic gains or losses and negative for gains involving both alleles. We segmented the B-allele deficits for each of the samples using the Circular Binary Segmentation Algorithm (91). This resulted in a set of segments for each sample, each corresponding to a piece of DNA with consistent B-allele deficit; we averaged the log2 copy-number ratio and the B-allele deficit over each such segment. Segments that were copy neutral (i.e., less than 30% gain or loss) and had a mean B-allele deficit of at least 0.15 were called as CN-LOH. Once a segment is assessed as having CN-LOH, then all the genes on that segment are considered to be CN-LOH. This method was applied to HLA class I genes after typing and remapping data to the correct allele(s).
Subtyping Using Somatic Mutation Data
To identify a sample as positive for a particular genomic subtype, we used somatic mutation data. For TP53, APC, PTEN, RNF43, ZNRF3, AXIN2, AXIN1, ATM, MSH6, TGFBR2, SMAD4, MLH1, MSH2, MSH6, PMS2, MLH3, and MSH3, any disruptive mutation was used as a positive identification. For KRAS, any mutations at codons 12, 13, 59, 61, 117, and 146 were used. For BRAF, any mutations at codons 600, 601, 466, and 469 were used. For PIK3CA, any mutations at codons 88, 110, 111, 345, 542, 545, 546, 1043, 1046, and 1047 were used. For CTNNB1, any mutations at codons 31–35, 37, 40, 41, 45, 383, and 387 were used.
Identification of MLH1 Epigenetically Silenced Cases
We assessed the MLH1 DNA methylation status of each sample following the method used by ref. 4, which relied on the specific probe (cg00893636) in the promoter of MLH1.
Tumor Purity Assessment Using Methylation Data
To assess tumor purity, we used DNA methylation levels from a specific set of CpG probes that are differentially methylated between cancer and normal cells, as described in ref. 81 and published as InfiniumPurify. We used this approach because copy number–based approaches, like ABSOLUTE (82), are not effective on samples with few CNAs, like MSI-high cases. The published methodology contains a list of approximately 1,000 differentially methylated probes (DMP) on the Illumina 450 k platform, each specific to a different cancer type, including colorectal cancer–specific DMPs that we used for this analysis. To apply InfiniumPurify to the maximum number of TCGA samples, including the cases with 27 k methylation arrays, we restricted the colorectal cancer DMP list to the 46 probes shared between the 27 k and 450 k methylation microarrays and confirmed, using samples where both 27 k and 450 k data was available, that this restriction had a minimal effect on the estimated tumor purity relative to running the full set of colorectal cancer DMPs. We then used this restricted set of DMPs for the cases with only 27 k methylation arrays and the full set of DMPs on the cases with 450 k methylation arrays. Other than this modification, we calculated tumor purities using the published method, making a final linear adjustment suggested in ref. 81 to improve agreement between InfiniumPurify and ABSOLUTE.
Significantly Mutated Genes and Pathways Using MutSigCV
We ran MutSigCV on the set of coding variants using the MutSigCV package as described in ref. 20 with minor adjustments. We used the covariates file provided by the Broad Institute that accounts for the effect that DNA replication time, chromatin state (open/closed), and general level of transcription activity have on the background mutation rate for each gene. To generate an accurate coverage file, we used the exome kits Roche v2 and VCRome 2.1, used on the samples run on Solid and Illumina, respectively, taking the union of the exons included in each kit to define the gene, and the reading frame for each exon provided by the UCSC Genome Browser for the corresponding Refseq transcript. A gene was included in the list of significantly mutated genes if it was significantly mutated in the TCGA cohort and the combined cohort, the NHS/HPFS cohort and the combined cohort, or the TCGA cohort, the NHS/HPFS and the combined cohort. This was done to ensure that we did not include genes for which increasing the number of samples did not decrease the evidence that the gene is significantly mutated. Also, to avoid artifacts, the cohorts were considered independently because the TCGA samples were fresh-frozen and the NHS/HPFS cohort was FFPE, and each was done on a different exome kit. To identify significantly mutated pathways, we did that same analysis after treating each pathway as a union of genes. A few aspects of the MutSigCV that are gene specific will not be run in this mode, including DNA replication time, chromatin state, and transcription activity.
Significantly Mutated Genes and Pathways by MSI Resulting from MMR-D
To assess significantly mutated genes on the basis of MMR-D–induced mutations alone, we developed a modified version of the original MutSig algorithm (20). We located all homopolymer runs at least 3 bases long (about 4.2 million) and all dinucleotide repeats at least 4 bases long (about 2.5 million) within the coding exons of any Refseq genes. These constitute our potential MMR-D mutation sites of interest. Because the base MMR-D mutation rate differs significantly depending on the run length and content, we partitioned the sites into 48 different contexts (length 3, 4, 5, 6, 7, 8, 9, or ≥ 10, for homopolymer base A/T or C/G or dinucleotide pair AT, AC/GT, AG/CT, or CG) and determined the background mutation rate (BMR) for each context as the number of run-lengthening or run-shortening indels observed in that context divided by the number of candidate sites within a particular set of samples, in this case all MSI-high cases in TCGA and NHS/HPFS.
If each site were mutated independently at random with the appropriate BMR, then the number of MMR-D indels within gene G would be Poisson-distributed with expected value EG = sum_c BMR_c * sites_c(G), where the sum is over context. We assign each gene a z-score on the basis of its actual observed number of run-lengthening or run-shortening somatic indels, NG, as compared with the expected number: ZG = (NG − EG)/sqrt(EG). To determine the unlikeliness of a given z-score, we repeatedly shuffle the observed MMR-D indels at random through the available sites, preserving their contexts, and recompute z-scores for all genes; doing this 1,000 times gives an empirical z-score distribution with >107 samples, and we assign P values with respect to this empirical distribution. In other words, the P value for gene G is the probability that an arbitrary gene would achieve a z-score as high as ZG if the indels in each context were randomly distributed among the available sites. To identify significantly mutated pathways, we did that same analysis after treating each pathway as a union of genes. A similar method for using microsatellites to identify drivers was published recently (23).
T-cell Inflammation Score and Identification of T-cell Inflammation Signature Genes
We denoted the average log-expression of 12 canonical T cell–associated genes (CCL2, CCL3, CCL4, CXCL9, CXCL10, CD8A, HLA-DOB, HLA-DMB, HLA-DOA, GZMK, ICOS, and IRF1) as the “T-cell average” for each sample with expression data (45, 92, 93). To identify T-cell inflammation signature genes that are correlated with T-cell average, we followed the method in ref. 56. We partitioned the samples into eight classes of T-cell inflammation, with divisions at the 12.5th, 25th, 37.5th, etc., percentiles of the T-cell average within the set of MSS and MSI-high samples. Samples with T-cell average below (above) the median are designated as T-cell noninflamed (inflamed). We restricted attention to protein-coding genes that were expressed with at least 1.0 FPKM in at least 80% of the MSS and MSI-H samples; this constituted around 15,000 genes. We fit the log-expression of each gene to its T-cell class with a linear model; we reported those genes that changed expression by at least 2-fold between the inflamed and noninflamed sets and for which the relation had a P value of <1 × 10−4 by ANOVA as T cell–inflamed signature genes (Supplementary Table S13).
Prospective Cohort Study Dataset
We utilized incident colorectal cancer cases identified within two prospective cohort studies in the NHS (which has followed 121,701 women ages 30–55 years at 1976) and the HPFS (which has followed 51,529 men ages 40–75 years at 1986). We collected FFPE tumor tissue blocks from hospitals throughout the United States where patients with colorectal cancer had undergone surgical resection (94). Cases with available tissue showed characteristics similar to cases without available tissue (94). Among participants diagnosed with colorectal cancer until 2012 in the NHS and HPFS, we analyzed 1,150 cases with available data on nuclear CTNNB1 status and immune response in tumor tissue samples. After whole-exome sequencing on DNA from tumor and normal FFPE tissue pairs underwent an assay validation using 185 colorectal cancer cases in the NHS and HPFS cohorts (64), we conducted whole-exome sequencing in additional cases with available ample tissue materials. We did not have gene expression data for these cases, because they were FFPE.
Pathologic Evaluation of Tumor Immunity Status
In the NHS and HPFS, a single pathologist (S. Ogino), blinded to other data, reviewed H&E–stained tissue sections, confirmed diagnosis of colorectal carcinoma, and recorded pathologic features, including four components of lymphocytic reaction (TILs, intratumoral periglandular reaction, peritumoral lymphocytic reaction, and Crohn-like lymphoid reaction) as described previously (66). In the TCGA dataset, a single pathologist (K. Inamura), blinded to other data, reviewed publicly available images of H&E-stained tissue sections and recorded pathologic features, including TILs. TILs were precisely defined as lymphocytes on top of neoplastic epithelial cells, as has been defined in the literature. Intratumoral periglandular reaction was defined as lymphocytic reaction in tumor stroma within a tumor mass. Peritumoral lymphocytic reaction was defined as discrete lymphoid reaction surrounding a tumor mass. Crohn-like lymphoid reaction was defined as transmural lymphoid reaction. Each of these features was graded as negative/low, intermediate, or high (66).
In the NHS and HPFS, we constructed TMAs to include up to four cores from colorectal cancer and up to two cores from normal tissue blocks. We conducted IHC analysis for CTNNB1 using an anti-CTNNB1 antibody (clone 14, dilution 1:400; BD Transduction Laboratories), as described previously (95). A single pathologist (T. Morikawa) graded cytoplasmic, nuclear, and membrane CTNNB1 expression status separately as no expression, weak expression, or moderate/strong expression. Cytoplasmic and nuclear CTNNB1 positivity was defined as the presence of moderate/strong expression, and loss of membrane CTNNB1 as no or weak membrane expression. A selected set of tumors (n = 292) was independently examined by a second pathologist (S. Ogino), and the concordance between the two pathologists was 0.90 for nuclear CTNNB1 (κ = 0.80), 0.78 for cytoplasmic CTNNB1 (κ = 0.54), and 0.86 for membrane CTNNB1 (κ = 0.72), indicating good to substantial agreement (all P < 0.0001; ref. 95). We performed IHC for CD3, CD8, CD45RO, and FOXP3 as described previously (67). We used an automated scanning microscope and the Ariol image analysis system (Genetix) to measure densities (cells/mm2) of CD3+ cells, CD8+ cells, CD45RO+ cells, and FOXP3+ cells in colorectal cancer tissue.
Analyses of MSI, DNA Methylation, and KRAS, BRAF, and PIK3CA Mutations in the NHS and HPFS
DNA was extracted from FFPE tissue blocks using QIAamp DNA FFPE Tissue Kit (Qiagen). MSI status was determined using 10 microsatellite markers (D2S123, D5S346, D17S250, BAT25, BAT26, BAT40, D18S55, D18S56, D18S67, and D18S487), and MSI-high was defined as the presence of instability in ≥30% of the markers, as described previously (96). Using bisulfite-treated DNA, methylation status of eight CIMP-specific promoters (CACNA1G, CDKN2A, CRABP1, IGF2, MLH1, NEUROG1, RUNX3, and SOCS1) and long interspersed nucleotide element-1 (LINE-1) was analyzed (97, 98). CIMP-high was defined as methylation in ≥6 of eight promoters (97). PCR and pyrosequencing were performed for KRAS (codons 12, 13, 61, and 146; ref. 99), BRAF (codon 600; ref. 96), and PIK3CA (exons 9 and 20; ref. 94).
Statistical Analysis NHS and HPFS IHC Cohort
We used logistic regression analyses to assess the association of nuclear CTNNB1 status (positive vs. negative) with immune response to tumor. We assessed four lymphocytic reaction patterns (TILs, intratumoral periglandular reaction, peritumoral lymphocytic reaction, and Crohn-like lymphoid reaction) and four T-cell subsets (CD3+ cells, CD8+ cells, CD45RO+ cells, and FOXP3+ cells) in tumor tissue. Multivariable logistic regression models initially included age (continuous), sex, year of diagnosis (continuous), family history of colorectal cancer (present vs. absent), tumor location (proximal colon vs. distal colon vs. rectum), MSI status (high vs. low/MSS), CIMP-specific promoter status (high vs. low/negative), LINE-1 methylation level (continuous), KRAS (mutant vs. wild-type), BRAF (mutant vs. wild-type), and PIK3CA (mutant vs. wild-type). A backward elimination with a threshold of P = 0.05 was used to select variables for the final model. The cases with missing data [family history of colorectal cancer (0.9%), tumor location (0.4%), MSI status (4.0%), CIMP status (7.7%), KRAS (3.7%), BRAF (3.0%), and PIK3CA (10.5%)] were included in the majority category of a given categorical covariate to limit the degrees of freedom of the models. For cases with missing data on LINE-1 methylation level (5.0%), we assigned a separate indicator variable. We confirmed that excluding cases with missing data in any of the variables did not substantially alter our results. We assessed statistical interactions between nuclear CTNNB1 status (positive vs. negative) and MSI status (high vs. low/MSS) in relation to tumor immunity status using the Wald test for the cross-product in logistic regression models. The proportional odds assumption was assessed using the ordinal logistic regression model. We observed evidence for violation of this assumption in two of four lymphocytic reaction patterns (intratumoral periglandular reaction and Crohn-like lymphoid reaction) and, therefore, used binary histologic lymphocytic reaction variables as outcome variables for logistic regression analyses. To compare characteristics between subgroups according to nuclear CTNNB1 status, we used the χ2 test for categorical variables, and the unpaired t test for continuous variables.
All statistical analyses were performed using SAS software (version 9.4, SAS Institute), and all P values were two-sided.
Disclosure of Potential Conflicts of Interest
S. Hu-Lieskovan is a consultant/advisory board member for Amgen, Genmab, and Merck. M.D.M. Leiserson is a postdoctoral researcher and consultant/advisory board member for Microsoft. B.J. Raphael has ownership interest (including patents) in Medley Genomics and is a consultant/advisory board member for the same. C.J. Wu is a consultant/advisory board member for Neon Therapeutics. E.S. Lander is a consultant/advisory board member for Codiak BioSciences, Neon Therapeutics, and Third Rock Ventures. L.A. Garraway is senior vice president at Eli Lilly and Company and has ownership interest (including patents) in Foundation Medicine and Tango Therapeutics. C.S. Fuchs is a consultant/advisory board member for CytomX, Eli Lilly, Entrinsic Health, Genentech, Merck, Sanofi, and Taiho. No potential conflicts of interest were disclosed by the other authors.
Conception and design: C.S. Grasso, M. Giannakis, D.K. Wells, X.J. Mu, D.A. Wheeler, L.A. Garraway, T.J. Hudson, C.S. Fuchs, A. Ribas, S. Ogino, U. Peters
Development of methodology: C.S. Grasso, D.K. Wells, X.J. Mu, B. Shirts, D.A. Wheeler, J.M. Zaretsky, L.A. Garraway, C.S. Fuchs, S. Ogino, U. Peters
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.S. Grasso, M. Giannakis, T. Hamada, J.A. Nowak, R. Nishihara, Z.R. Qian, K. Inamura, T. Morikawa, K. Nosho, D.A. Wheeler, S.H. Zaidi, S.B. Gabriel, C.S. Fuchs, S. Ogino, U. Peters
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.S. Grasso, M. Giannakis, D.K. Wells, T. Hamada, X.J. Mu, M. Quist, J.A. Nowak, R. Nishihara, Z.R. Qian, G. Abril-Rodriguez, M.S. Geybels, W.M. Grady, L. Hsu, J.R. Huyghe, Y.J. Kim, M.D.M. Leiserson, D.J. Montoya, B.B. Nadel, M. Pellegrini, C.C. Pritchard, C. Puig-Saus, B.J. Raphael, S.J. Salipante, D.S. Shin, E. Shinbrot, B. Shirts, S. Shukla, W. Sun, J. Tsoi, A. Upfill-Brown, D.A. Wheeler, C.J. Wu, M. Yu, S.H. Zaidi, J.M. Zaretsky, E.S. Lander, C.S. Fuchs, A. Ribas, S. Ogino, U. Peters
Writing, review, and/or revision of the manuscript: C.S. Grasso, M. Giannakis, D.K. Wells, T. Hamada, X.J. Mu, J.A. Nowak, H. Escuin-Ordinas, W.M. Grady, S. Hu-Lieskovan, P. Krystofinski, M. Pellegrini, C.C. Pritchard, D.S. Shin, B. Shirts, J.L. Stanford, J. Tsoi, M. Yu, L.A. Garraway, T.J. Hudson, C.S. Fuchs, A. Ribas, S. Ogino
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.S. Grasso, D.K. Wells, M. Quist, Z.R. Qian, C. Connolly, E.H. Quist, E.S. Lander, S. Ogino
Study supervision: C.S. Grasso, A. Ribas, S. Ogino, U. Peters
We would like to thank the participants and staff of the Nurses' Health Study and the Health Professionals Follow-up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, IA, ID, IL, IN, KY, LA, MA, MD, ME, MI, NC, ND, NE, NH, NJ, NY, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. The authors assume full responsibility for analyses and interpretation of these data. This work was supported by NIH grants U01 CA137088 to U. Peters; U54 HG003067 to E.S. Lander and S.B. Gabriel; P01 CA87969 to M.J. Stampfer; UM1 CA186107 to M.J. Stampfer; P01 CA55075 to W.C. Willett; UM1 CA167552 to W.C. Willett; P50 CA127003 to C.S. Fuchs; R35 CA197735 to S. Ogino; R01 CA151993 to S. Ogino; K07 CA190673 to R. Nishihara; 1R01CA194663-01 W.M. Grady; R35 CA197633 and P01 CA168585 to A. Ribas and C.S. Grasso; the Nodal Award from the Dana-Farber Harvard Cancer Center to S. Ogino; The Parker Institute for Cancer Immunotherapy, the Ressler Family Fund, the Samuels Family Fund, and the Garcia-Corsini Family Fund to A. Ribas and C.S. Grasso; and by grants from the Project P Fund, The Friends of the Dana-Farber Cancer Institute, Bennett Family Fund, and the Entertainment Industry Foundation through the National Colorectal Cancer Research Alliance. M. Giannakis was supported by a KL2/Catalyst Medical Research Investigator Training award NIH Award KL2 TR001100. M. Giannakis and C.S. Fuchs were supported by the Stand Up To Cancer Colorectal Cancer Dream Team Translational Research Grant (grant number SU2C-AACR-DT22-17). Stand Up To Cancer (SU2C) is a program of the Entertainment Industry Foundation and the research grant is administered by the American Association for Cancer Research, a scientific partner of SU2C. S. Hu-Lieskovan was supported by a Career Development Award from the American Society of Clinical Oncology (ASCO), a Tower Cancer Research Foundation Grant, a Dr. Charles Coltman Fellowship Award from the Hope Foundation, and a UCLA KL2 Award. D.S. Shin was supported by the Tumor Immunology 4T32CA009120-40 training grant and 2016 Conquer Cancer Foundation ASCO Young Investigator Award. T.J. Hudson and S.H. Zaidi were supported by the Ontario Institute for Cancer Research, through generous support from the Ontario Ministry of Research and Innovation. B.B. Nadel was supported by the Biomedical Big Data training grant from the NIH-NLM National Cancer Institute; PHS grant number 5T32LM012424-03. D.J. Montoya was supported by NIAMS 3P50AR063020-03S1. W.M. Grady was supported by R.A.C.E. Charities and the Gensch family, STTR Translational Research grant. This research was supported by funds from the Parker Institute for Cancer Immunotherapy (PICI), grant number 20163828.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.