Using 48,627 samples from the Center for Cancer Genomics and Advanced Therapeutics (C-CAT), we present a pan-cancer landscape of driver alterations and their clinical actionability in Japanese patients. Comparison with White patients in the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) demonstrates higher TP53 mutation frequencies in Asian patients across multiple cancer types. Integration of C-CAT, GENIE, and The Cancer Genome Atlas data reveals many cooccurring and mutually exclusive relationships between driver mutations. At the pathway level, mutations in epigenetic regulators frequently cooccur with those in PI3K pathway molecules. Furthermore, we found significant cooccurrence of mutations within the epigenetic pathway. Accumulation of mutations in epigenetic regulators increases proliferation-related transcriptomic signatures. Loss of function of many epigenetic drivers inhibits the proliferation of cell lines that are wild-type for these genes, but this effect is attenuated in cell lines harboring mutations not only in the same but also in different epigenetic drivers. Our analyses dissect various genetic properties and provide valuable resources for precision medicine in cancer.

Significance:

We present a genetic landscape of 26 principal cancer types/subtypes, including Asian-prevalent ones, in Japanese patients. Multicohort data integration unveils numerous cooccurring and exclusive relationships between driver mutations, identifying cooccurrence of multiple mutations in epigenetic regulators, which coordinately cause transcriptional and phenotypic changes. These findings provide insights into epigenetic regulator–driven oncogenesis.

This article is featured in Selected Articles from This Issue, p. 695

With the advent of next-generation sequencing (NGS), large-scale pan-cancer genome projects, such as The Cancer Genome Atlas (TCGA), have identified numerous driver aberrations in cancer genomes (1, 2). These efforts have facilitated the elucidation of functional pathways, the development of molecularly targeted therapies, and the proposal of molecular classifications across cancer types (3–5). Owing to these advances, precision medicine, which optimizes therapy based on NGS-based genomic profiling, has become routine clinical practice. Genomic data linked to clinical outcomes have been collected by several cancer registries, such as the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE; refs. 6–10). Pan-cancer genome projects assessed all coding genes through whole-exome and/or whole-genome sequencing and collected data from other omics modalities, including the transcriptome and epigenome, enabling unbiased integrated multi-omics analysis (1, 2). In contrast, real-world clinico-genomic registries contain sequencing data for only a limited number of cancer-related genes, but their sample sizes are much larger than those of pan-cancer genome projects, which may provide opportunities for significant discoveries through increased statistical power (6–10).

A number of genetic studies have demonstrated extensive genomic diversity and clinical utility among patients of different races or ancestries (11, 12). Several studies, mainly investigating differences between White and Black populations, have documented distinct genetic patterns, including increased alterations involving the PI3K pathway and decreased TP53 and FBXW7 mutations and CCNE1 amplifications in White populations (13, 14). However, because of limited sample numbers in non-White populations, few meaningful comparisons have been conducted across races or ancestries (15). In particular, because only sporadic reports have focused on Asian ancestries (8, 9), it remains unclear whether and how the cancer genomic landscape differs between Asian and White populations. Moreover, Asian-prevalent cancer types, including stomach adenocarcinoma (STAD) and intrahepatic cholangiocarcinoma (IHCH), have not been fully characterized genetically. To facilitate cancer precision medicine, several clinico-genomic registries have also been established in East Asian countries, such as the K-MASTER program (8, 9). In particular, the Japanese registry, the Center for Cancer Genomics and Advanced Therapeutics (C-CAT), has collected real-world clinico-genomic data from more than 48,000 tumors, offering a valuable resource for investigating the cancer genome in Asian populations (10).

Pairwise relationships between driver alterations can provide valuable functional insights. For instance, mutual exclusivity may reflect either functional redundancy or synthetic lethality, whereas cooccurrence suggests functional synergy, potentially reflecting resistance to targeted therapy. Previous studies using the TCGA cohort have identified hundreds of cooccurring and mutually exclusive relationships, including the cooccurrence of PIK3CA and NFE2L2 mutations and mutually exclusive mutations within the RAS pathway (3, 4, 16, 17). However, because of the scarcity of independent large-scale datasets, it remains uncertain whether such pairwise relationships have been fully elucidated.

Here we describe the largest pan-cancer genomic dataset of the Japanese population, including Asian-prevalent cancer types, and identify genomic differences between Asian and White populations using C-CAT and other clinico-genomic data (GENIE and TCGA). In addition, we integrated these multicohort data and evaluated cooccurring and mutually exclusive relationships between driver mutations. We identified significant cooccurrence of multiple mutations involving epigenetic regulators and characterized its effects on functional and clinical properties.

Overview of the C-CAT Cohort

Between June 2019 and August 2023, genomic and clinical information on 48,891 advanced solid tumor samples from 30 organs/tissues was registered in the C-CAT cohort. We removed duplicated samples and those with missing data, and rigorously filtered out putative single-nucleotide polymorphisms (SNP; including population-specific SNPs), noncoding mutations, synonymous mutations, and mutations with low variant allele frequencies (Supplementary Fig. S1A). After removing ultra-hypermutator samples, a total of 264,585 nonsynonymous coding mutations in 48,627 samples were analyzed (Fig. 1A). Twenty-six principal cancer types/subtypes were each represented by more than 250 samples, including not only colon (COAD), rectal (READ), pancreatic (PAAD), lung (LUAD), and prostate (PRAD) adenocarcinomas and invasive ductal carcinoma (IDC) of the breast, but also Asian-prevalent cancers, such as esophageal squamous cell carcinoma (ESCC), STAD, and IHCH (Supplementary Fig. S1B and S1C; Supplementary Table S1). Two targeted gene panels have been used in Japan: FoundationOne CDx [F1CDx, n = 42,389 (87.2%)] and the NCC Oncopanel [n = 6,238 (12.8%)]. F1CDx and NCC Oncopanel cover all coding exons of 309 and 124 genes, respectively (Supplementary Table S2). Of these, 164 genes are shared between F1CDx and all targeted gene panels included in the GENIE cohort (described in detail below) (6), of which 70 genes are not included in the NCC Oncopanel (Supplementary Fig. S2A). Of these 164 genes, we focused on 69 genes that have been reported as (i) a cancer type-specific driver or (ii) a pan-cancer driver mutated in ≥10% of samples in at least one cancer type/subtype in the C-CAT or GENIE cohort (as described below; Supplementary Fig. S2B; Supplementary Table S3).
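The variant-level curation described above can be pictured as a chain of simple filters. The sketch below is only an illustration under stated assumptions, not the C-CAT pipeline: the `Variant` record, the consequence labels, and the cutoffs `MIN_VAF`, `MAX_POP_AF`, and `hypermutator_cutoff` are hypothetical placeholders, because the actual criteria appear only in Supplementary Fig. S1A.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cutoffs for illustration; the real C-CAT criteria are in Supplementary Fig. S1A.
MIN_VAF = 0.05       # assumed minimum variant allele frequency
MAX_POP_AF = 0.01    # assumed maximum population allele frequency (putative SNP filter)
# Assumed consequence labels treated as nonsynonymous coding changes.
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "inframe_indel", "start_lost", "stop_lost"}


@dataclass
class Variant:
    sample_id: str
    gene: str
    consequence: str  # e.g., "missense", "synonymous", "intron"
    vaf: float        # variant allele frequency in the tumor, 0 to 1
    pop_af: float     # highest allele frequency in population databases (0 if absent)


def passes_filters(v: Variant) -> bool:
    """True for nonsynonymous coding variants that do not look like germline SNPs."""
    if v.consequence not in NONSYNONYMOUS:
        return False  # removes synonymous and noncoding variants
    if v.pop_af > MAX_POP_AF:
        return False  # removes putative SNPs, including population-specific ones
    return v.vaf >= MIN_VAF  # removes low-VAF calls


def filter_variants(variants, hypermutator_cutoff=None):
    """Apply per-variant filters, then optionally drop ultra-hypermutated samples."""
    kept = [v for v in variants if passes_filters(v)]
    if hypermutator_cutoff is not None:
        per_sample = Counter(v.sample_id for v in kept)
        kept = [v for v in kept if per_sample[v.sample_id] <= hypermutator_cutoff]
    return kept
```

Ordering matters in this sketch: the hypermutator check counts only variants that survived the per-variant filters, which is one plausible reading of the text but an assumption.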

Figure 1.

Overview of the C-CAT cohort data. A, Distribution of cancer types among 48,627 samples in the C-CAT cohort. The first and second inner circles indicate targeted gene panel and organs/tissues. Twenty-six principal cancer types/subtypes are shown in bold and underlined. Others include 10 organs/tissues. See Supplementary Table S1 for cancer type abbreviations of principal cancer types/subtypes. BRCA, invasive breast carcinoma; CHOL, cholangiocarcinoma; COADREAD, colorectal adenocarcinoma; DIFG, diffuse glioma; EGC, esophagogastric adenocarcinoma; GBC, gallbladder cancer; NSCLC, non-small cell lung cancer; OVT, ovarian epithelial tumor; SACA, salivary carcinoma; SOC, serous ovarian cancer; UCEC, endometrial carcinoma; USARC, uterine sarcoma/mesenchymal; USMT, uterine smooth muscle tumor. B, Frequency of samples harboring recurrent driver mutations across 26 principal cancer types/subtypes. Genes with a cohort-level mutation frequency (shown in right bars) of ≥5% or a cancer type-specific mutation frequency of ≥10% are displayed. Gene function and related functional pathways are also shown. TF, transcription factor. C, Proportion of samples harboring driver mutations belonging to each functional pathway in the C-CAT cohort. Color indicates the number of driver mutations in each functional pathway in a sample.

In total, 151,875 mutations were found in these driver genes; 44,349 (91.2%) samples carried at least one driver mutation, with a mean of 3.1 per sample. Of the 69 driver genes, 11 were mutated in ≥5% of samples in the entire C-CAT cohort, and an additional 27 were mutated in ≥10% of samples in at least one principal cancer type/subtype. The most frequently mutated gene in the C-CAT cohort was TP53 (55.9%), followed by KRAS (24.8%), APC (16.7%), PIK3CA (11.9%), ARID1A (10.4%), and KMT2D (8.9%; Fig. 1B). TP53 mutations were the most frequent lesion in 18 cancer types/subtypes, including urothelial bladder carcinoma (BLCA), COAD, IDC, and LUAD. Activating KRAS mutations were the second most frequent genomic aberration and were found mainly in PAAD, COAD, and READ. The following mutations were highly enriched in specific cancer types/subtypes: GATA3 mutations in IDC; CDKN2A, KRAS, and SMAD4 mutations in PAAD; VHL mutations in renal cell carcinoma (RCC); APC mutations in COAD; IDH1 mutations in IHCH; NFE2L2 mutations in ESCC; and EGFR mutations in LUAD. There were no significant differences in driver mutation frequency between F1CDx and NCC Oncopanel (Supplementary Fig. S2C). Among canonical functional pathways, the most frequently altered was genome integrity (63.3%), followed by receptor tyrosine kinase (RTK)–RAS signaling (46.0%), epigenetic regulators (38.7%), developmental pathways (33.2%), and PI3K signaling (23.4%; Fig. 1C).
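The gene-level frequencies (Fig. 1B) and pathway-level tallies (Fig. 1C) reduce to counting over a sample-by-mutation table. The sketch below assumes a simple input layout (sample ID mapped to a list of mutated driver genes) and uses a hypothetical, heavily truncated gene-to-pathway map; it is not the authors' code.

```python
from collections import Counter

# Hypothetical, heavily truncated gene-to-pathway map (the full annotation is in Fig. 1B).
PATHWAY = {
    "TP53": "genome integrity",
    "KRAS": "RTK-RAS",
    "PIK3CA": "PI3K",
    "ARID1A": "epigenetic",
    "KMT2D": "epigenetic",
}


def gene_frequencies(sample_muts):
    """Fraction of samples with >=1 driver mutation in each gene.

    sample_muts maps sample ID -> list of mutated driver genes, one entry per
    mutation (a gene may repeat); samples without drivers map to [].
    """
    n = len(sample_muts)
    counts = Counter(g for genes in sample_muts.values() for g in set(genes))
    return {g: c / n for g, c in counts.items()}


def pathway_burden(sample_muts, pathway):
    """Number of samples carrying 0, 1, 2, ... driver mutations in one pathway."""
    return dict(Counter(
        sum(1 for g in genes if PATHWAY.get(g) == pathway)
        for genes in sample_muts.values()
    ))
```

Gene frequencies count each gene once per sample (via `set`), whereas the pathway tally counts individual mutations, mirroring the color scale in Fig. 1C.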

Gene rearrangements, many of which produced putative driver fusions, were detected in 1,115 (2.6%) and 51 (0.8%) samples by F1CDx and NCC Oncopanel, which evaluate rearrangements in 36 and 13 genes, respectively (Fig. 2A; Supplementary Table S4). Among driver fusions found in the 26 principal cancer types/subtypes (n = 492 and 26 fusions by F1CDx and NCC Oncopanel, respectively), the most frequent were ERG fusions (n = 129), followed by those involving RET (n = 62), ALK (n = 51), FGFR3 (n = 43), and FGFR2 (n = 41; Fig. 2B). In addition, highly recurrent fusions, such as BRAF (n = 100) and FLI1 (n = 76) fusions, were detected in rare cancer types/subtypes. Moreover, 7,887 driver copy-number alterations (CNA) were identified in 6,915 patients, with ERBB2 amplification being the most common, followed by FGFR1, PIK3CA, and MET amplifications (Fig. 2C; Supplementary Table S5). High tumor mutation burden (TMB) was observed in 8.3% of patients, particularly those with BLCA, LUAD, and cervical squamous cell carcinoma (CESC; Fig. 2D). Overall, at least one clinically actionable genetic lesion (evidence level 1, 2, or 3A) annotated using OncoKB (18), a precision oncology knowledge base, was found in 15.3% of patients, with the highest proportion in well-differentiated thyroid cancer (WDTC), followed by IDC, LUAD, and IHCH (Fig. 3A). Compared with previous reports (8, 9), several alterations were observed more frequently, including ERBB2 amplification in esophagogastric cancer (including STAD), IDH1 R132 mutations in cholangiocarcinoma (including IHCH), and mutations in homologous recombination repair genes (BRCA2 and ATM) in prostate cancer (including PRAD; Fig. 3B; Supplementary Table S6).
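TMB is conventionally expressed as mutations per megabase (Mb) of sequenced territory, and the TMB-high fraction per cancer type (Fig. 2D) follows from a cutoff. The sketch below assumes a cutoff of 10 mutations/Mb, a commonly used threshold that this section does not actually state, and takes panel sizes as inputs; commercial panels such as F1CDx apply their own variant-counting rules, which this conceptual sketch does not reproduce.

```python
# Assumed TMB-high cutoff in mutations per Mb; the threshold behind Fig. 2D is not stated here.
TMB_HIGH_CUTOFF = 10.0


def tmb_per_mb(n_mutations: int, panel_size_mb: float) -> float:
    """Mutations per megabase of sequenced coding territory."""
    if panel_size_mb <= 0:
        raise ValueError("panel_size_mb must be positive")
    return n_mutations / panel_size_mb


def is_tmb_high(n_mutations: int, panel_size_mb: float, cutoff: float = TMB_HIGH_CUTOFF) -> bool:
    return tmb_per_mb(n_mutations, panel_size_mb) >= cutoff


def tmb_high_fraction(samples):
    """samples: list of (n_mutations, panel_size_mb) tuples, e.g., one cancer type."""
    return sum(is_tmb_high(n, size) for n, size in samples) / len(samples)
```

Normalizing by panel size is what makes counts from panels of different sizes (309 vs. 124 genes) comparable in principle.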

Figure 2.

Gene fusion and TMB in the C-CAT cohort. A, Distribution of 1,140 and 64 gene rearrangements forming putative driver fusions among 1,115 and 51 samples in the F1CDx (top) and NCC Oncopanel (bottom), respectively. B, Number of driver gene rearrangements forming putative driver fusions across 26 principal cancer types/subtypes in the C-CAT cohort according to targeted gene panels used. C, Number of driver copy-number alterations (CNA) across 26 principal cancer types/subtypes in the C-CAT cohort according to targeted gene panels used. D, Proportion of tumor mutation burden (TMB)-high samples across 26 principal cancer types/subtypes. Values in parentheses indicate numbers of samples examined.

Figure 3.

Clinically actionable alterations in the C-CAT cohort. A, Fraction of samples with clinically actionable alterations [including TMB-high and microsatellite instability (MSI)-high] according to the OncoKB knowledge base in the entire cohort (left; pie chart) and in each principal cancer type/subtype (right; bar graph). B, Details of clinically actionable alterations with evidence level 1. Number of patients with evidence level 1 in corresponding cancer types (left) and cancer type, clinically actionable alteration, and their potential sensitive drugs (right) are shown. PIK3CA major hotspots include C420R, E542K, E545A, E545D, E545G, E545K, H1047L, H1047R, H1047Y, Q546E, and Q546R. ESR1 hotspots include D538, E380, L469V, L536, S463P, and Y537. See Supplementary Table S6 for details.

Comparison of Genetic Landscape Between Asian and White Populations

Next, we aimed to clarify the genetic differences between Asian and White populations across multiple cancer types. We used genomic data captured by MSK-IMPACT (n = 65,469), DFCI-ONCOPANEL (n = 33,220), and F1CDx (DUKE; n = 1,970) deposited in GENIE (version 14.0) and curated them in a similar manner to the C-CAT cohort (Supplementary Fig. S1A). We obtained 613,468 nonsynonymous coding mutations in 84,531 samples, which consisted of White (84.6%), Asian (6.2%), Black (6.2%), and other populations (Supplementary Fig. S1B). In the GENIE cohort, 252,994 mutations were found in the 69 driver genes, and 71,223 (84.3%) samples carried at least one driver mutation, with a mean of 3.0 per sample. We compared mutation frequencies in 352 recurrent driver gene–cancer type combinations between the C-CAT cohort and the White population of the GENIE cohort (Fig. 4A; Supplementary Table S7). Compared with the White subcohort, 4 driver gene–cancer type combinations showed a significantly lower frequency and 14 a significantly higher frequency in the C-CAT cohort (>10% difference with q-value < 0.01). Almost identical results were obtained after adjusting for primary or metastatic status with the Cochran–Mantel–Haenszel test (Supplementary Fig. S3A). Importantly, TP53 mutations accounted for 10 of the 14 combinations with higher frequency. Although TP53 mutation frequency differed across these cancer types/subtypes, it was consistently higher in the C-CAT cohort, by 10.7%–30.4%, with head and neck squamous cell carcinoma (HNSC) showing the largest difference (Fig. 4B). The same results were observed when the analysis was limited to tumor-only sequencing, to primary or metastatic samples, or to more detailed subtypes (Supplementary Fig. S3B–S3D).
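The comparison above tests each driver gene-cancer type combination with a two-sided Fisher exact test, applies Benjamini-Hochberg correction (Fig. 4 legend), and calls a combination significant when the absolute frequency difference exceeds 10% and q < 0.01. A dependency-free sketch of that decision rule follows; the function names and the input layout are hypothetical, and in practice `scipy.stats.fisher_exact` would normally replace the hand-written test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):  # probability that the top-left cell equals x, given the margins
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # The 1e-7 relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def compare_cohorts(counts, min_diff=0.10, max_q=0.01):
    """counts: key -> (mutated_A, total_A, mutated_B, total_B).

    Returns key -> (frequency difference A - B, q-value, significant?).
    """
    keys = list(counts)
    diffs, pvals = [], []
    for k in keys:
        ma, na, mb, nb = counts[k]
        diffs.append(ma / na - mb / nb)
        pvals.append(fisher_exact_two_sided(ma, na - ma, mb, nb - mb))
    qvals = benjamini_hochberg(pvals)
    return {k: (d, q, abs(d) > min_diff and q < max_q) for k, d, q in zip(keys, diffs, qvals)}
```

Requiring both an effect-size threshold and a q-value threshold keeps very large cohorts from flagging tiny but statistically significant differences.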
Even within the GENIE cohort, the Asian population (including East and South Asian patients) showed a higher TP53 mutation frequency than the White population in seven of eight cancer types/subtypes with ≥30 samples in both populations, although the differences rarely reached significance because of the small size of the Asian population (Supplementary Fig. S4A). Mutation types and distributions were largely similar between the two populations (Supplementary Fig. S4B and S4C). Apart from TP53 mutations, only a limited number of combinations showed consistent results when White and Asian populations were compared within the GENIE cohort (Supplementary Fig. S4A). These results suggest that an increased frequency of TP53 mutations in Asian populations is widespread across multiple cancer types, whereas further studies are needed to confirm differences in the mutation frequencies of other driver genes.

Figure 4.

Differences in the mutational landscape between Asian and White patients. A, Volcano plot showing differentially mutated genes between the C-CAT cohort and the GENIE White subcohort. Genes with >10% difference and q-value < 0.01 are considered significant. B, Comparison of TP53-mutated sample frequency between the C-CAT cohort and the GENIE White subcohort in 10 cancer types/subtypes. C, Comparison of samples with clinically actionable alterations (focusing on somatic mutations, driver CNAs, and driver fusions covered by all 11 panels) between the C-CAT cohort and the GENIE White subcohort in each principal cancer type/subtype. Cancer types with >10% difference and q-value < 0.01 are considered significant. D, Fraction of samples with clinically actionable alterations (focusing on somatic mutations, driver CNAs, and driver fusions covered by all 11 panels) when combining 26 principal cancer types/subtypes in the C-CAT cohort (left) and GENIE White subcohort (right). A, B, and C, Two-sided Fisher exact test with Benjamini–Hochberg correction.

When comparing clinical actionability between the C-CAT cohort and the GENIE White subcohort, focusing on somatic mutations, driver CNAs, and driver fusions covered by all 11 panels (Supplementary Tables S3–S5), the proportion of patients harboring at least one clinically actionable genetic lesion was comparable in each cancer type/subtype, although some differences between the two cohorts were observed in IHCH and WDTC (Fig. 4C). When the 26 principal cancer types/subtypes were combined, the overall proportion was higher in the White subcohort than in the C-CAT cohort (26.8% vs. 18.3%; Fig. 4D). This is primarily because the C-CAT cohort has a higher prevalence of cancer types with few targetable alterations, such as pancreatobiliary cancers, underscoring the need to explore alternative therapeutic avenues for these cancer types.
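The gap between comparable per-cancer-type actionability and a lower pooled proportion follows from the fact that a pooled proportion is a case-mix-weighted average of per-type proportions. The toy example below uses made-up rates and counts (not C-CAT or GENIE data) to show how two cohorts with identical per-type rates can differ overall; re-weighting one cohort's rates to the other's case mix (direct standardization) removes this composition effect.

```python
def pooled_fraction(rates, case_mix):
    """Case-mix-weighted average of per-cancer-type fractions.

    rates: cancer type -> fraction of patients with an actionable alteration
    case_mix: cancer type -> number of patients (the weights)
    """
    total = sum(case_mix.values())
    return sum(rates[t] * n / total for t, n in case_mix.items())


# Hypothetical numbers for illustration only.
rates = {"LUAD": 0.40, "PAAD": 0.05}   # identical per-type rates in both cohorts
mix_a = {"LUAD": 800, "PAAD": 200}     # lung-heavy cohort
mix_b = {"LUAD": 200, "PAAD": 800}     # pancreas-heavy cohort

print(round(pooled_fraction(rates, mix_a), 3))  # 0.33
print(round(pooled_fraction(rates, mix_b), 3))  # 0.12
```

Passing cohort B's per-type rates with cohort A's `case_mix` gives B's proportion standardized to A's cancer type distribution.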

Cooccurrence and Exclusivity Across Driver Mutations by Multicohort Analysis

To increase the power to detect cooccurring and mutually exclusive relationships between driver mutations in each cancer type, we combined genomic data from multiple cohorts, including C-CAT, GENIE, and TCGA. We first extracted and curated mutation data from TCGA, obtaining 26,840 mutations in the 69 driver genes. We then limited the analysis to 22 cancer types/subtypes shared across the three cohorts and to 66 driver genes mutated in ≥1% of samples of the corresponding cancer type/subtype in all cohorts (253 recurrent driver gene–cancer type combinations in total). Using the Fisher exact test, we confirmed that previously reported cooccurring and mutually exclusive relationships (n = 7 and 25 driver gene pair–cancer type combinations, respectively; refs. 4, 16, 17) were consistent across the C-CAT, GENIE, and TCGA cohorts, establishing the basis for genomic data integration (Supplementary Fig. S5A). We therefore performed a meta-analysis of the three cohorts, which have different cancer type distributions, and identified significant cooccurring (n = 568) and mutually exclusive (n = 152) relationships among 1,790 combinations (Fig. 5A; Supplementary Table S8). These significant pairs included not only previously reported relationships (n = 215; refs. 3, 4, 16, 17) but also many novel ones (n = 484). The C-CAT cohort contributed mainly to relationships in Asian-prevalent cancer types, including CESC and IHCH (Supplementary Fig. S5B).
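This section does not specify how the three cohorts were combined. One standard way to pool 2x2 tables across strata is the Cochran-Mantel-Haenszel test, which the text already uses to adjust for primary or metastatic status; the sketch below applies it with cohorts as strata and is an assumed illustration, not the authors' meta-analysis method. Each table counts samples mutated in both genes, only gene A, only gene B, or neither; a pooled odds ratio above 1 indicates cooccurrence and below 1 indicates exclusivity.

```python
from math import erfc, sqrt


def cmh_test(tables, continuity=True):
    """Cochran-Mantel-Haenszel test for 2x2 tables pooled across strata.

    Each table is (a, b, c, d) = (A+B+, A+B-, A-B+, A-B-) counts for one stratum
    (here, one cohort). Returns (Mantel-Haenszel odds ratio, chi-square, p-value).
    """
    or_num = or_den = 0.0   # numerator and denominator of the pooled odds ratio
    dev = var = 0.0         # sum of (observed - expected) for cell a, and its variance
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 samples is uninformative
        or_num += a * d / n
        or_den += b * c / n
        r1, r2, c1, c2 = a + b, c + d, a + c, b + d
        dev += a - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    if var == 0:
        raise ValueError("no informative strata")
    # Optional 0.5 continuity correction, applied only when |dev| >= 0.5
    shift = 0.5 if continuity and abs(dev) >= 0.5 else 0.0
    chi2 = (abs(dev) - shift) ** 2 / var
    p = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square distribution with 1 df
    odds_ratio = or_num / or_den if or_den > 0 else float("inf")
    return odds_ratio, chi2, p
```

Stratifying by cohort compares samples only within the same cohort, so differences in panel design and cancer type distribution between cohorts do not by themselves create spurious associations.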

Figure 5.

Cooccurring and mutually exclusive relationships across driver mutations. A, Pairwise cooccurring and exclusive relationships across 66 driver genes by conducting a meta-analysis of C-CAT, GENIE, and TCGA cohorts. Numbers of cancer types/subtypes showing significant cooccurring and exclusive relationships are shown in red and blue gradient colors, respectively. Driver gene pairs in gray are not evaluated. TF, transcription factor. B, Proportion of significant cooccurring and exclusive relationships among driver gene pair–cancer type combinations belonging to the same or different functional pathways. Two-sided Fisher exact test. C, Proportion of significant cooccurring and exclusive relationships among driver gene pair–cancer type combinations belonging to the same pathway, according to pathway function. Functional pathways with ≥5 combinations are shown. D, Proportion of significant cooccurring and exclusive relationships among driver gene pair–cancer type combinations belonging to different pathways, according to pathway function. B and C, Values in parentheses indicate numbers of driver gene pair–cancer type combinations examined. C and D, The binomial test was used to estimate the deviation from the expected proportions of cooccurring or exclusive relationships. Yellow and dark blue asterisks indicate proportions that are statistically higher and lower than expected, respectively.

Among 137 significant pairs observed in more than one cancer type/subtype, 114 (83.2%) showed the same direction of association. For example, well-known pairs, such as ARID1A and PTEN, ARID1A and PIK3CA, CDKN2A and TP53, and RB1 and TP53, consistently cooccurred across five or more cancer types/subtypes (refs. 3, 4, 16, 17; Supplementary Fig. S6A). Such consistently cooccurring relationships also included several novel pairs, such as FBXW7 and KMT2D, FBXW7 and PIK3CA, and KMT2D and PIK3CA, although FBXW7 and PIK3CA mutations have previously been reported to be mutually exclusive (19). These results suggest that most cooccurring and mutually exclusive relationships are shared across cancer types. However, there were several exceptions, such as KRAS and TP53 and TP53 and PTEN, which showed different interdependencies depending on cancer type/subtype (Supplementary Fig. S6B).
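Checking whether a pair's association points the same way in every cancer type reduces to comparing the signs of per-type log odds ratios. A minimal sketch follows, assuming per-cancer-type 2x2 tables in the layout (both mutated, only A, only B, neither) and a 0.5 pseudocount (Haldane-Anscombe correction) to keep the log odds ratio finite when a cell is zero; the input structure is hypothetical.

```python
from math import log


def direction(table, pseudo=0.5):
    """+1 if cooccurring, -1 if exclusive, 0 if neutral, from the sign of the log odds ratio.

    table is (a, b, c, d) = (A+B+, A+B-, A-B+, A-B-); the pseudocount keeps the
    ratio finite when a cell is zero.
    """
    a, b, c, d = (x + pseudo for x in table)
    lor = log(a * d / (b * c))
    return (lor > 0) - (lor < 0)


def consistent_pairs(per_type_tables):
    """Split gene pairs seen in >=2 cancer types into consistent and discordant ones.

    per_type_tables maps (gene_a, gene_b) -> list of 2x2 tables, one per cancer type
    in which the pair was significant.
    """
    consistent, discordant = [], []
    for pair, tables in per_type_tables.items():
        if len(tables) < 2:
            continue  # direction consistency needs at least two cancer types
        signs = {direction(t) for t in tables}
        (consistent if len(signs) == 1 else discordant).append(pair)
    return consistent, discordant
```

The share of consistent pairs is then `len(consistent) / (len(consistent) + len(discordant))`, the quantity behind the 114 of 137 figure above.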

In a single tumor, multiple mutations may target one pathway, or different pathways may be comutated. We therefore mapped the identified pairwise relationships to functional pathways and found that mutations in driver genes within the same pathway were significantly exclusive (Fig. 5B). Among pathways, a significantly larger proportion of mutually exclusive pairs was observed within the RTK-RAS pathway, consistent with previous reports (ref. 4; Fig. 5C). In contrast, the epigenetic pathway more frequently harbored multiple driver mutations per sample (described in detail below). We also identified significant cooccurring and mutually exclusive relationships between different pathways. For example, mutations in genome integrity-associated genes were mutually exclusive with those in oncogenic signaling pathways (Fig. 5D). Importantly, these relationships mostly reflected the exclusivity of TP53 mutations with mutations in the PI3K (including PIK3CA), RTK-RAS (BRAF, HRAS, and KRAS), developmental (CTNNB1), and other (GNAS) signaling pathways across multiple cancer types/subtypes (Supplementary Fig. S6C). These results not only confirm previously reported exclusive relationships involving TP53 mutations (3, 4, 16, 17) but also extend them to a wide range of signaling pathways involved in cancer cell survival.
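The pathway-level summary compares the proportion of exclusive (or cooccurring) relationships among same-pathway pairs with an expected proportion using a binomial test (Fig. 5 legend). In the sketch below, the expected proportion is assumed to be the share across all tested pairs, which may differ from the authors' choice; the pathway labels, relationship labels, and pair list are toy inputs, and only same-pathway pairs are summarized.

```python
from collections import defaultdict
from math import comb


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7)))


def within_pathway_test(pairs, pathway_of, label="exclusive"):
    """Compare the share of `label` relationships inside each pathway with the overall share.

    pairs: list of (gene_a, gene_b, relationship), one entry per gene pair-cancer type
    combination, where relationship is "cooccurring", "exclusive", or "ns".
    Returns pathway -> (observed proportion, binomial p-value).
    """
    expected = sum(rel == label for _, _, rel in pairs) / len(pairs)  # assumed null proportion
    counts = defaultdict(lambda: [0, 0])  # pathway -> [hits, total]
    for g1, g2, rel in pairs:
        if pathway_of[g1] != pathway_of[g2]:
            continue  # only same-pathway pairs are summarized here
        counts[pathway_of[g1]][0] += rel == label
        counts[pathway_of[g1]][1] += 1
    return {pw: (k / n, binom_test_two_sided(k, n, expected)) for pw, (k, n) in counts.items()}
```

Between-pathway summaries (Fig. 5D) would follow the same pattern, keyed on the unordered pair of pathways instead of a single pathway.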

Mutations in cell-cycle regulators, such as RB1 and CDKN2A, also significantly cooccurred with those in genome integrity-associated genes, particularly TP53, in various cancer types/subtypes; these relationships are well known, and their synergistic effects have been experimentally validated (20, 21). We also identified novel cooccurring pairs at the pathway level. Mutations in epigenetic regulators significantly cooccurred with those in the PI3K pathway (Fig. 5D). Although cooccurring mutations in ARID1A and PI3K pathway genes were previously reported (3), mutations in other epigenetic regulators, including KMT2D, also cooccurred with PIK3CA or PTEN mutations. These results are consistent with CRISPR-Cas9 screening results showing that loss of many epigenetic drivers cooperates with PIK3CA-activating mutations (22). Furthermore, several epigenetic regulators, such as KMT2D and EP300, are direct phosphorylation targets of AKT downstream of PI3K (23, 24), suggesting that such molecular mechanisms may underlie their genetic relationships. A permutation approach yielded consistent findings of cooccurrence and mutual exclusivity at both the gene and pathway levels, confirming the validity of our framework (Supplementary Fig. S7A–S7D). Taken together, these observations point to similar functional interdependencies across driver mutations involving various epigenetic regulators.
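The permutation check mentioned above can take several forms, and this section does not describe the scheme used. One simple version, sketched below as an assumption, shuffles one gene's mutation labels across samples and asks how often random data produce at least as many doubly mutated samples as observed. This naive scheme preserves each gene's mutation frequency but ignores per-sample mutation burden and cancer type, so practical analyses usually constrain the shuffles further (for example, within cancer type).

```python
import random


def cooccurrence_permutation_test(mut_a, mut_b, n_perm=1000, seed=0):
    """One-sided permutation test for cooccurrence of two binary mutation vectors.

    Gene B's labels are shuffled across samples, which keeps each gene's mutation
    frequency but ignores per-sample mutation burden and cancer type.
    Returns (observed number of doubly mutated samples, permutation p-value).
    """
    rng = random.Random(seed)
    observed = sum(1 for a, b in zip(mut_a, mut_b) if a and b)
    shuffled = list(mut_b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(1 for a, b in zip(mut_a, shuffled) if a and b) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed data as one permutation, so p is never 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

For mutual exclusivity, the same loop applies with the comparison reversed (`<= observed`).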

Mutations in Epigenetic Regulators Have Common Impacts on Epigenetic and Transcriptomic Profiles

The strongest relationship in our analysis was the cooccurrence of multiple driver mutations within the epigenetic pathway, suggesting that these cooccurring lesions mediate synergistic activation of the pathway (Fig. 5C). Although most of these relationships were weak to moderate, mutations in 13 epigenetic regulators, including those involved in DNA methylation (IDH1 and IDH2), histone acetylation (EP300 and CREBBP), histone methylation (KMT2A, KMT2D, KDM6A, and SETD2), histone ubiquitination (BAP1 and BCOR), and chromatin remodeling (ARID1A, ATRX, and SMARCA4), significantly cooccurred in 10 cancer types/subtypes (Fig. 5A; Supplementary Fig. S8A; Supplementary Table S8). In addition, a considerable proportion of patients harbored more than two driver mutations in epigenetic regulators, particularly in uterine endometrial carcinoma (UEC) and BLCA (Fig. 6A). These relationships were consistently observed across the C-CAT, GENIE, and TCGA cohorts (Supplementary Fig. S8B and S8C). Thus, mutations in distinct but functionally related proteins may converge on similar, functionally advantageous phenotypes that are clonally selected during carcinogenesis.

Figure 6.

Epigenetic and transcriptomic influences shared across multiple driver mutations in epigenetic regulators. A, Proportion of samples with driver mutations in epigenetic regulators in each cancer type/subtype in the C-CAT cohort. Color indicates the number of driver mutations in epigenetic regulators in a sample. B, DNA methylation level (calculated as average beta values) in promoter-associated CpG island shores for samples with 0, 1, or ≥2 driver mutations in epigenetic regulators in six cancer types/subtypes (with ≥200 samples). C, ssGSEA scores for proliferation-related signatures (E2F TARGETS and G2M CHECKPOINT) for samples with 0, 1, or ≥2 driver mutations in epigenetic regulators in six cancer types/subtypes (with ≥200 samples). D, Proliferation-related signatures in human 12Z uterine endometrial cells transfected with control siRNA, siARID1A, siEP300, or both (n = 3 per group), assessed by GSEA. NES, normalized enrichment score. B and C, Jonckheere–Terpstra trend test (n = 1,000 permutations). Values in parentheses indicate numbers of samples examined.

A recent study has reported that disruption of epigenetic regulators with diverse functions confers a common competitive advantage on cancer cells, such as broad resistance to environmental stress (25). We therefore hypothesized that cooccurring mutations in various epigenetic regulators have shared effects on cancer phenotype. We first assessed methylation array data from 2,213 TCGA samples across six cancer types/subtypes (each with ≥200 samples; Fig. 6B). Interestingly, although most driver mutations in epigenetic regulators decreased or tended to decrease DNA methylation levels in promoter-associated CpG island shores in BLCA, HNSC, and LUAD, all driver mutations in epigenetic regulators increased DNA methylation levels in COAD, STAD, and UEC. Regardless of the direction of change, the more driver mutations in epigenetic regulators accumulated, the more prominent the change in DNA methylation level became in all analyzed cancer types/subtypes. These results suggest that multiple driver mutations in epigenetic regulators induce similar epigenetic alterations within a cancer type.

Next, to identify common functional consequences of driver mutations in epigenetic regulators across cancer types, we assessed RNA-sequencing (RNA-seq) data from 2,440 TCGA samples across six cancer types/subtypes (each with ≥200 samples). We performed gene-set enrichment analysis (GSEA) with 50 hallmark gene sets, comparing samples with and without mutations in each epigenetic regulator within each cancer type/subtype (21 driver gene–cancer type combinations). This analysis identified 187 significantly upregulated and 45 significantly downregulated signatures (q-value < 0.25; Supplementary Table S9). Importantly, the most frequently upregulated signatures were proliferation-related (E2F TARGETS and G2M CHECKPOINT), and these were consistently increased in samples with driver mutations in epigenetic regulators in six cancer types/subtypes (12 driver gene–cancer type combinations; Supplementary Fig. S9A). This upregulation is consistent with previous reports of the role of epigenetic regulators in cell-cycle control (26). We then evaluated the effect of accumulating driver mutations in epigenetic regulators on these signatures using single-sample GSEA (ssGSEA). As the number of driver mutations in epigenetic regulators increased, proliferation-related signatures were enhanced in these cancer types/subtypes (Fig. 6C). These results suggest that cooccurring driver mutations in epigenetic regulators coordinately augment the malignant phenotype in various cancer types.

To confirm our observations in TCGA patients, we searched for publicly available transcriptome data that would allow us to evaluate the effect of targeting multiple epigenetic drivers. We used RNA-seq data from 12Z cells (a human uterine endometrial cell line) transfected with siRNAs targeting ARID1A, EP300, or both, or with a nontargeting control siRNA (27). Knockdown of ARID1A or EP300 alone upregulated proliferation-related signatures compared with the nontargeting control (Fig. 6D). Remarkably, combined knockdown of both genes augmented these signatures further than either single knockdown. These results functionally support the notion that disruptions of multiple epigenetic drivers have a common effect on proliferative properties and that their combination acts in an additive manner.

Functional and Clinical Implications of Cooccurring Driver Mutations in Epigenetic Regulators

To further investigate the functional relevance of cooccurring mutations in epigenetic regulators, we leveraged gene essentiality scores from CRISPR-Cas9 screening in the Cancer Dependency Map (DepMap) project (28). By combining these scores with the genetic profiles of cancer cell lines, we compared the effect of CRISPR targeting of a given gene in its wild-type and mutant cell lines. As previously reported (17), ARID1A knockout significantly diminished cell proliferation/survival in ARID1A wild-type lung cancer cell lines compared with ARID1A-mutant ones (Fig. 7A). These results suggest that ARID1A inactivation is harmful to cell fitness in vitro, even though ARID1A loss-of-function mutations are recurrent in human cancers and have been shown to contribute to carcinogenesis in vivo (26). Interestingly, the effect of ARID1A knockout tended to be reduced in cell lines harboring mutations in other epigenetic drivers (SETD2 and SMARCA4), suggesting that the negative effect of inactivating a given epigenetic driver can be mitigated by disruption of other epigenetic drivers. Similar results were obtained for SMARCA4 (Fig. 7A). We therefore examined whether this phenomenon is shared by multiple epigenetic regulators across organs by performing a meta-analysis of driver gene–organ combinations. The meta-analysis showed that inactivating a given epigenetic driver caused a significant decrease in cell proliferation/survival in its wild-type cell lines, even though all of the analyzed epigenetic drivers were tumor suppressor genes (Fig. 7B). In contrast, cell lines harboring mutations in an epigenetic driver exhibited reduced or almost no dependency on the same gene compared with its wild-type cell lines (Fig. 7C). More importantly, the effect of inactivating a given epigenetic driver was diminished in cell lines harboring mutations in other epigenetic drivers (Fig. 7D). These results suggest that although loss of function of epigenetic regulators commonly impairs cell fitness, deficiencies in other epigenetic drivers render this effect tolerable, potentially promoting the cooccurrence of multiple driver mutations in epigenetic regulators in human cancers.

Figure 7.

Functional relevance of cooccurring driver mutations in epigenetic regulators. A, Gene essentiality score for ARID1A in lung cancer cell lines harboring no mutations in epigenetic regulators (wild-type; blue), mutations in ARID1A (dark orange), and mutations in non-ARID1A epigenetic regulators (light orange) (left). Same analysis for SMARCA4 (right). Wilcoxon rank sum test. Values in parentheses indicate numbers of cell lines examined. B, Forest plot of pooled mean gene essentiality scores showing the effect of inactivating an epigenetic driver in cell lines with no mutations in epigenetic regulators (wild-type). C, Forest plot of pooled mean difference of gene essentiality scores showing the effect of inactivating an epigenetic driver in its mutant cell lines compared with those with no mutations in epigenetic regulators (wild-type). D, Forest plot of pooled mean difference of gene essentiality scores showing the effect of inactivating an epigenetic driver in cell lines harboring mutations in other epigenetic regulators compared with those with no mutations in epigenetic regulators (wild-type). B–D, Epigenetic driver genes (mutated in >10 samples) in each organ (with >30 cell lines) were analyzed. The I² and P values of the χ² test for heterogeneity are shown.

Finally, we clinically characterized patients with cooccurring driver mutations in epigenetic regulators and evaluated their shared effects on prognosis. Mutations in several epigenetic drivers have been reported to be associated with better prognosis, such as ARID1A mutation in UEC (29), IDH1 mutation in glioblastoma (30), and KDM6A mutation in BLCA (3), whereas SMARCA4 mutation predicts worse survival in LUAD (31). The remaining driver mutations in epigenetic regulators have been reported to have no significant prognostic impact. To increase statistical power, we performed a meta-analysis of the C-CAT, GENIE, and TCGA cohorts, focusing on LUAD because prognostic information was available for a sufficient number of patients in all three cohorts. This meta-analysis showed that all three epigenetic drivers (ARID1A, SETD2, and SMARCA4) had or tended to have a negative prognostic impact on overall survival (Supplementary Fig. S9B). In addition, overall survival was or tended to be worse for patients with LUAD harboring one or more driver mutations in epigenetic regulators in all three cohorts (Supplementary Fig. S9C). The negative prognostic impact of these mutations was validated in a multivariable analysis incorporating clinical factors, such as age and stage at diagnosis, in the TCGA cohort (Supplementary Fig. S9D). Taken together, driver mutations in epigenetic regulators confer a shared negative prognostic impact, at least in some contexts.

Through the analysis of large-scale clinico-genomic data in the C-CAT cohort, we have elucidated driver aberrations and their clinical actionability in Japanese patients across multiple cancer types, including Asian-prevalent ones. This analysis provides highly reliable resources for cancer precision medicine, with important implications for appropriate patient selection for molecularly targeted therapies and immunotherapies. More importantly, the population-level comparative analysis systematically clarifies similarities and differences in the driver landscape between Asian and other populations, potentially pointing to the influences of genetic background and environment.

The integration of multi-cohort data from different populations uncovers numerous cooccurring and mutually exclusive relationships between driver mutations across multiple cancer types, not only validating previous findings (3, 4, 16, 17) but also substantially expanding the catalogue of pairwise relationships. Because previous studies explored such relationships within a single cancer type and/or in a pan-cancer manner in which many cancer types were combined, their similarities and differences across cancer types have remained elusive (3, 4, 16, 17). Here we found that many pairwise relationships are shared across multiple cancer types, suggesting that functional relationships between cancer drivers extend beyond specific organs/tissues, although some mutually exclusive relationships may reflect genetic differences across subtypes within a cancer type. Importantly, several relationships are common to many genes within a functional pathway, including the frequent cooccurrence of mutations in epigenetic regulators and PI3K pathway genes. Together with recent in vivo CRISPR-Cas9 screening data (22), our observations suggest that loss of function of various epigenetic regulators synergizes with other functional pathways in a similar manner.

Remarkably, the strongest interdependency in our analysis is the cooccurrence of multiple mutations in epigenetic regulators. Whereas previous studies have mainly focused on specific vulnerabilities caused by combinatorial loss of function of epigenetic regulators, such as ARID1A/ARID1B and EP300/CREBBP (32, 33), the integration of transcriptomic, epigenetic, and functional analyses here provides compelling evidence for the functional interplay of multiple driver mutations in epigenetic regulators in cancer pathogenesis. Although many studies have investigated the oncogenic mechanisms of individual epigenetic regulator alterations, it remains unknown how different alterations can induce cancers with similar phenotypes. Together with a recent report showing a common effect on stress resistance of cancer cells across many epigenetic regulators (25), our findings provide an important clue to understanding the phenotypic homogeneity observed across diverse epigenetic driver alterations. Given the reversibility of epigenetic changes, the many epigenetic modulating agents currently available or under development may offer a promising therapeutic strategy against cancers with accumulating driver mutations in epigenetic regulators.

Another notable finding is the increased TP53 mutation frequency in Asian populations. Although previous reports have documented genetic alterations enriched in Asian populations in individual cancer types, such as HRAS mutations in BLCA (14), NFE2L2 mutations in esophageal carcinoma, EGFR mutations in LUAD (34), and FOXA1 mutations in PRAD (35), our study demonstrates that increased TP53 mutation frequency is widespread across multiple cancer types. Because the frequency of rare pathogenic germline TP53 variants is comparable between Japanese and White populations (36), common germline variants associated with ethnicity or ancestry may contribute to these differences. Clinically, the different TP53 mutation frequency may not only affect overall therapeutic efficacy and the proportion of molecular subtypes within a cancer type but also influence the response to targeted therapy, depending on the genomic and epigenetic context. Because differences between the C-CAT and GENIE cohorts in primary versus metastatic status, patient selection for gene panel testing, and sample collection may bias the results, further investigation is warranted.

In conclusion, we have constructed the driver landscape of Japanese patients with cancer and clarified its differences from that of White patients. Moreover, we have comprehensively identified numerous cooccurring and mutually exclusive relationships between driver mutations, including those in Asian-prevalent cancer types, which underscores the power of multicohort pan-cancer studies. Notably, we found prominent cooccurrence of multiple driver mutations in epigenetic regulators, which cause common epigenetic, transcriptional, and phenotypic changes. These observations genetically dissect cancer pathogenesis, especially epigenetic regulator–driven oncogenesis, and provide a molecular framework for improving individualized therapeutic strategies in patients with cancer.

C-CAT Dataset Preparation

Since June 2019, data from almost all targeted sequencing tests with F1CDx and NCC Oncopanel performed under Japan's national health insurance system have been collected by C-CAT (10), with written informed consent from each patient. This study was conducted in accordance with the Declaration of Helsinki and was reviewed and approved by the institutional review board at the National Cancer Center. F1CDx is a tumor-only sequencing panel, whereas NCC Oncopanel uses matched blood samples as a germline control. Somatic mutation, CNA, and gene rearrangement calls for the targeted-sequencing data (version 20230821) were downloaded from the C-CAT portal (https://www.ncc.go.jp/en/c_cat/use/index.html). Genomic coordinates of mutations from alignments to GRCh38 were converted to GRCh37 (hg19) using LiftOver. Samples with missing data and duplicate samples from the same patient were excluded from the analysis. Next, mutation calls were annotated to gene transcripts in GENCODE (release 19), and a single canonical effect per mutation was reported using Variant Effect Predictor (version 105) and vcf2maf (version 1.6.16). Noncoding mutations (other than splice-site mutations), mutations with low variant allele frequency (<0.05), and synonymous mutations were excluded.

Mutations were further filtered with a strict population frequency cutoff by removing variants observed at a frequency ≥0.0001 in any of the following: (i) the National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI-ESP) 6500, (ii) the 1000 Genomes Project October 2014 release, (iii) the gnomAD exome collection (version 2.1.1), (iv) the 14KJPN data available from the Japanese Multi Omics Reference Panel (jMorp) by the Tohoku Medical Megabank Organization (ToMMo), the largest Japanese-specific SNP database (37), and (v) the Genome Medical Alliance (GEM) Japan Whole Genome Aggregation (GEM-J WGA) Panel, another Japanese-specific SNP database released by the GEM Japan project. Variants listed ≥30 times in the COSMIC database (version 70) or included in known somatic sites (https://github.com/mskcc/vcf2maf/blob/main/data/known_somatic_sites.bed) were exempt from this filter. Annovar (38) was used to incorporate data from the COSMIC database, NHLBI-ESP, 1000 Genomes Project, and gnomAD exome collection. Finally, we removed ultra-hypermutator samples, defined as those with ≥70 and ≥50 mutations for F1CDx and NCC Oncopanel, respectively. Putative driver fusions were selected from gene rearrangement calls if both partner genes were listed in the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (accessed 14 July 2023; ref. 39) or in ChimerKB from ChimerDB 4.0 (ref. 40; Supplementary Table S4). Putative driver CNAs were selected on the basis of the OncoKB knowledge base (ref. 18; Supplementary Table S5). Samples with a TMB of ≥10 mutations per megabase were classified as TMB-high samples.
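
The mutation-level filtering rules above can be summarized as a minimal Python sketch. It assumes each call has already been annotated with a VEP-style consequence term, its VAF, its COSMIC count, a known-somatic-site flag, and per-database population frequencies; the `Mutation` record, its field names, and the exact set of retained consequence terms are hypothetical illustrations, not the actual C-CAT pipeline.

```python
from dataclasses import dataclass, field


# Hypothetical record type: field names are illustrative, not the C-CAT or MAF schema.
@dataclass
class Mutation:
    gene: str
    consequence: str                 # VEP-style consequence term
    vaf: float                       # variant allele frequency
    cosmic_count: int = 0            # times the variant is listed in COSMIC
    known_somatic_site: bool = False
    pop_freqs: dict = field(default_factory=dict)  # database name -> population frequency


# Assumed set of retained consequences (coding changes plus splice sites).
RETAINED_CONSEQUENCES = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost",
    "frameshift_variant", "inframe_insertion", "inframe_deletion",
    "splice_acceptor_variant", "splice_donor_variant",
}
MIN_VAF = 0.05
POP_FREQ_CUTOFF = 0.0001
HYPERMUTATOR_CUTOFF = {"F1CDx": 70, "NCC Oncopanel": 50}  # mutations per sample


def passes_filters(m: Mutation) -> bool:
    """True if a call survives the consequence, VAF, and population-frequency filters."""
    if m.consequence not in RETAINED_CONSEQUENCES:  # drops synonymous and noncoding calls
        return False
    if m.vaf < MIN_VAF:
        return False
    # Recurrent somatic variants are exempt from the population-frequency filter.
    exempt = m.cosmic_count >= 30 or m.known_somatic_site
    if not exempt and any(f >= POP_FREQ_CUTOFF for f in m.pop_freqs.values()):
        return False
    return True


def is_ultra_hypermutator(panel: str, n_mutations: int) -> bool:
    return n_mutations >= HYPERMUTATOR_CUTOFF[panel]


def is_tmb_high(tmb_per_megabase: float) -> bool:
    return tmb_per_megabase >= 10
```

For example, a missense call seen at 0.2% in gnomAD is discarded by this sketch, whereas the same call listed 50 times in COSMIC is kept.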

GENIE Dataset Preparation

Somatic mutation, CNA, and gene rearrangement calls for targeted-sequencing data from MSK-IMPACT, DFCI-ONCOPANEL, and DUKE-F1-DX1 were downloaded from Synapse (Release 14.0; syn52433756). Sample exclusion, mutation annotation, and mutation and fusion filtering were performed as in the C-CAT cohort. Samples with unknown race information (n = 4,955) were also excluded. We removed ultra-hypermutator samples (with ≥120, ≥120, and ≥70 mutations for MSK-IMPACT, DFCI-ONCOPANEL, and DUKE-F1-DX1, respectively). Self-reported race information was used to classify patients as White, Black, Asian, or other races. Survival information for MSK-IMPACT was obtained from cBioPortal for Cancer Genomics at https://www.cbioportal.org/study/summary?id=msk_impact_2017 (7).

Cancer Type Classification

Cancer type/subtype was classified according to OncoTree codes (version oncotree_2020_04_01; https://www.cbioportal.org/oncotree/), which are mainly based on tissue location and histology (41). In this hierarchical structure, the most detailed cancer type/subtype category that included ≥250 patients in the C-CAT cohort and ≥100 patients in the GENIE cohort was defined as a “principal cancer type/subtype.” Adenocarcinoma of unknown primary (ADNOS) and ovarian cancer, other (OOVC) were excluded from the analysis because of the strong heterogeneity within these categories. In addition, melanoma (MEL) was excluded because of different subtype distributions between Asian and White populations (42). OncoTree-based cancer type classifications were available for the C-CAT and GENIE cohorts, whereas TCGA samples were manually classified into OncoTree codes (Supplementary Table S1).
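
One way to read the "most detailed category" rule is as a walk up the OncoTree from each sample's code, stopping at the first node whose patient counts (including all descendant codes) meet both cohort thresholds. The sketch below assumes that reading; the four-node tree is a hypothetical excerpt, and the counting convention is an assumption rather than the authors' code.

```python
from collections import Counter

# Hypothetical miniature OncoTree excerpt: code -> parent code (None at the tissue root).
PARENT = {"LUNG": None, "NSCLC": "LUNG", "LUAD": "NSCLC", "LUSC": "NSCLC"}


def lineage(code):
    """Path from a code up to its tissue root, most detailed level first."""
    path = []
    while code is not None:
        path.append(code)
        code = PARENT[code]
    return path


def node_counts(codes):
    """Patients per node, counting each patient at its own code and at every ancestor."""
    counts = Counter()
    for code in codes:
        counts.update(lineage(code))
    return counts


def principal_type(code, ccat_counts, genie_counts, min_ccat=250, min_genie=100):
    """Most detailed level of a code's lineage that meets both cohort-size thresholds."""
    for node in lineage(code):
        if ccat_counts[node] >= min_ccat and genie_counts[node] >= min_genie:
            return node
    return None  # no level of the hierarchy is large enough
```

With 300 LUAD and 100 LUSC patients in C-CAT, for instance, LUAD would stand on its own while LUSC would be rolled up into NSCLC under this reading.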

Driver Genes and Functional Pathways

Genes were defined as drivers for a specific cancer type/subtype if they were listed as cancer type–specific drivers in the TCGA PanCan Atlas (5). In addition, pan-cancer drivers were considered drivers if their mutation frequency was ≥10% in the corresponding cancer type/subtype in the GENIE cohort. For cancer types/subtypes with ≤30 samples in the TCGA cohort or those not included in the TCGA Pan-Cancer Atlas driver list, driver genes were defined according to several genetic studies. In total, 352 driver gene–cancer type combinations were analyzed, and the mutations in these combinations that remained after rigorous filtering were considered driver mutations (Supplementary Table S3). Driver genes were classified into 9 functional pathways, namely PI3K (n = 7 genes), RTK-RAS (n = 14), epigenetic (n = 16), cell cycle (n = 5), genomic integrity (n = 6), developmental pathway (n = 8), transcription factor (n = 4), other signaling (n = 4), and other (n = 5), based on previous pan-cancer studies (4, 5).
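
The two main driver-definition rules, and the pathway sizes listed above, can be written down directly. This is an illustrative sketch: the literature-based fallback for cancer types with ≤30 TCGA samples is not modeled, and the argument names (`tcga_specific`, `pancan_drivers`, `genie_freq`) are hypothetical. Note that the nine pathway sizes add up to 69, the number of driver genes cited in the actionability analysis.

```python
# Pathway sizes as listed in the text.
PATHWAY_SIZES = {
    "PI3K": 7, "RTK-RAS": 14, "epigenetic": 16, "cell cycle": 5,
    "genomic integrity": 6, "developmental pathway": 8,
    "transcription factor": 4, "other signaling": 4, "other": 5,
}


def is_driver(gene, cancer_type, tcga_specific, pancan_drivers, genie_freq, min_freq=0.10):
    """Driver if TCGA lists it for this cancer type, or if it is a pan-cancer driver
    mutated in at least 10% of GENIE samples of this cancer type.

    tcga_specific: dict cancer type -> set of cancer type-specific driver genes
    pancan_drivers: set of pan-cancer driver genes
    genie_freq: dict (gene, cancer type) -> mutation frequency in GENIE
    """
    if gene in tcga_specific.get(cancer_type, set()):
        return True
    return gene in pancan_drivers and genie_freq.get((gene, cancer_type), 0.0) >= min_freq
```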

Evaluation of Clinical Actionability

We used the OncoKB Therapeutic Level of Evidence V2 (data version: v4.9; ref. 18) to assess the clinical actionability of somatic mutations in 69 driver genes, driver CNAs, driver fusions, and other biomarkers including microsatellite instability and TMB (thus excluding KRAS and NRAS wild-type status in colorectal cancer). For the comparison of clinical actionability (evidence levels 1–3A) between the C-CAT cohort and the GENIE White subcohort, we focused on somatic mutations, driver CNAs, and driver fusions covered by all 11 panels (thus excluding microsatellite instability and TMB). The Fisher exact test with Benjamini–Hochberg correction was used to compare the fraction of samples with clinically actionable alterations in each principal cancer type/subtype, considering q-value < 0.01 and a difference in frequency >10% as significant. OncoKB cancer type classification was used when counting the number of patients with evidence level 1 clinically actionable alterations (Fig. 3B).
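
The comparison rule (Fisher exact test, Benjamini–Hochberg adjustment, then q < 0.01 together with a >10% frequency difference) is sketched below using only the Python standard library. It assumes the 10% threshold is an absolute difference in proportions and follows the common convention of summing all tables no more probable than the observed one for the two-sided Fisher P value; the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, with margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def is_significant(q, freq_ccat, freq_genie, q_cut=0.01, min_diff=0.10):
    """Significance rule from the text: q < 0.01 and an absolute difference > 10%."""
    return q < q_cut and abs(freq_ccat - freq_genie) > min_diff
```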

Comparison of Mutation Frequency Across Different Races

In 26 principal cancer types/subtypes, mutation frequency was compared between the C-CAT cohort and the White population of the GENIE cohort. For 16 driver genes not included in NCC Oncopanel, only the F1CDx dataset was used. In total, 352 driver gene–cancer type combinations were evaluated using the Fisher exact test or the Cochran–Mantel–Haenszel test stratified by primary or metastatic status, with Benjamini–Hochberg correction applied separately to each set of results. Driver gene–cancer type combinations with a q-value < 0.01 and a difference in mutation frequency >10% were considered significant. Similar comparisons using the Fisher exact test were performed separately for primary and metastatic samples and for more detailed subtypes, as well as between Asian and White populations in the GENIE cohort for cancer types/subtypes with ≥30 samples. Furthermore, the mutation type and position within the TP53 gene were compared between the C-CAT cohort and the White population of the GENIE cohort using the Fisher exact test with Benjamini–Hochberg correction.
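
For the stratified comparison, a Cochran–Mantel–Haenszel test pools the 2 × 2 tables of primary and metastatic samples. The sketch below computes the classical CMH statistic without continuity correction and its 1-df chi-square P value; whether the original analysis applied a continuity correction (R's `mantelhaen.test` does so by default) is not stated, so that choice is an assumption.

```python
from math import erfc, sqrt


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test (1 df, no continuity correction).

    strata: list of 2 x 2 tables ((a, b), (c, d)), one per stratum such as primary
    and metastatic samples; rows are cohorts, columns are mutated / not mutated.
    Returns (statistic, P value).
    """
    num = 0.0
    var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two samples carries no information
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        num += a - row1 * col1 / n  # observed minus expected top-left count
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = num ** 2 / var
    return stat, erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
```

With a single stratum, this statistic equals the Pearson chi-square multiplied by (n − 1)/n, which is a quick sanity check.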

TCGA Dataset Preparation

Somatic mutation data in Mutation Annotation Format (compiled by TCGA PanCanAtlas MC3 Working Group; mc3.v0.2.8.PUBLIC.maf.gz), FPKM (fragments per kilobase of transcript per million mapped reads) normalized values of RNA-seq data (EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv), prognostic data (TCGA-CDR-SupplementalTableS1.xlsx), DNA methylation data (jhu-usc.edu_PANCAN_HumanMethylation450.betaValue_whitelisted.tsv), and sample quality data (merged_sample_quality_annotations.tsv) were downloaded from NCI Genomic Data Commons (GDC). In addition, raw read counts of RNA-seq data were obtained from the NCI GDC Data Portal. Sample exclusion, mutation annotation, and mutation filtering were performed as in the C-CAT cohort. Samples that failed the quality control standards of the PanCan Atlas Project were also removed. We removed ultra-hypermutator samples (with ≥2,500 mutations).

Analysis of Cooccurrence and Mutual Exclusivity Across Driver Mutations

We limited the analysis to 22 principal cancer types/subtypes shared across the C-CAT, GENIE, and TCGA cohorts and to 66 driver genes that were present in ≥1% of samples in each corresponding cancer type/subtype in all cohorts. Thus, 253 recurrent driver gene–cancer type combinations were evaluated for cooccurring and mutually exclusive relationships using the Fisher exact test. The ORs and SEs were then combined across cohorts through fixed-effect meta-analysis to derive a final P value using the R package meta (version 6.1–0). Finally, P values from multiple comparisons were adjusted using Benjamini–Hochberg correction, and relationships with q-value < 0.05 were considered significant. The χ² test and the Higgins and Thompson's I² statistic were used to measure heterogeneity.
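
The per-cohort inputs and the pooling step can be sketched as follows, assuming the usual inverse-variance fixed-effect model on the log-OR scale with a 0.5 continuity correction for tables containing a zero cell. This mirrors what meta-analysis packages typically do but is not the R meta package itself; the P value of the χ² heterogeneity test (Q on k − 1 degrees of freedom) is left out to keep the sketch dependency-free.

```python
from math import erfc, log, sqrt


def log_or_se(a, b, c, d, correction=0.5):
    """Log odds ratio and its SE for [[a, b], [c, d]], adding 0.5 to all cells if any is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return log(a * d / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def fixed_effect(estimates, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and Higgins-Thompson I^2."""
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    pooled_se = sqrt(1 / total_w)
    p_value = erfc(abs(pooled / pooled_se) / sqrt(2))  # two-sided normal P value
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, p_value, q, i2
```

In use, each cohort's 2 × 2 table for a gene pair would go through `log_or_se`, and the three resulting estimates and SEs would be passed together to `fixed_effect`.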

To validate cooccurring and mutually exclusive relationships, we used a permutation approach. For each cancer type/subtype, we constructed a binary matrix of driver genes and samples, encoding the presence or absence of a somatic mutation using all samples from the three cohorts. We then performed random permutations in which somatic mutation status was shuffled across samples. For each permuted matrix, we calculated the number of samples harboring mutations in both driver genes for all possible driver gene pairs. By repeating the permutation procedure 1,000,000 times, we generated an empirical null distribution representing the expected number of samples with cooccurring driver mutations for all driver gene pairs. Subsequently, we compared the null distribution with the observed number of samples with cooccurring driver mutations to calculate nominal P values. For driver gene pairs involving genes not included in NCC Oncopanel, we removed samples analyzed by NCC Oncopanel and conducted the same analysis. Finally, P values from multiple comparisons were adjusted using Benjamini–Hochberg correction.
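
A single-pair version of the permutation test is sketched below. Shuffling one gene's mutation vector across samples keeps both genes' mutation counts fixed, so the permuted overlap counts form the empirical null described above. The original procedure permutes the full gene × sample matrix once per iteration and evaluates all pairs together; the add-one correction and the default of 10,000 permutations here are assumptions for illustration.

```python
import random


def cooccurrence_pvalues(x, y, n_perm=10_000, seed=0):
    """Empirical P values for cooccurrence and mutual exclusivity of two 0/1 mutation vectors.

    Returns (observed overlap, P for cooccurrence, P for mutual exclusivity).
    """
    rng = random.Random(seed)
    observed = sum(1 for xi, yi in zip(x, y) if xi and yi)
    shuffled = list(y)
    n_ge = n_le = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps each gene's mutation count fixed
        overlap = sum(1 for xi, yi in zip(x, shuffled) if xi and yi)
        n_ge += overlap >= observed
        n_le += overlap <= observed
    # The add-one correction keeps empirical P values strictly above zero.
    return observed, (n_ge + 1) / (n_perm + 1), (n_le + 1) / (n_perm + 1)
```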

Significant cooccurring and mutually exclusive relationships were then classified at the pathway level. Using the binomial test, we assessed whether the proportions of cooccurring and mutually exclusive relationships deviated from the expected frequencies for each combination of pathways. The expected frequencies (26.0% and 19.7% for cooccurring and exclusive relationships within the same pathway, and 32.5% and 6.9% between different pathways) were calculated on the basis of the proportions across driver gene–cancer type combinations in the same and different pathways.
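
The pathway-level test compares an observed count of relationships with an expected proportion (for example, 26.0% for cooccurrence within the same pathway). A minimal exact two-sided binomial test, summing the probabilities of all outcomes no more likely than the observed one, is sketched below; the function name is hypothetical and any counts passed to it here are illustrative, not the study's data.

```python
from math import comb


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test of k successes in n trials against proportion p."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance so ties in probability are included
    return min(1.0, sum(q for q in pmf if q <= cutoff))
```

For instance, `binom_test_two_sided(12, 20, 0.26)` asks whether 12 cooccurring relationships among 20 tested pairs are more than a 26.0% background rate would produce.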

DNA Methylation Analysis in TCGA

We analyzed the Illumina Infinium HumanMethylation450 BeadChip array data from 2,213 samples from six cancer types/subtypes (with ≥200 samples) in the TCGA cohort. Annotation for Illumina probes was obtained with the R package minfi (version 1.40.0; ref. 43), specifically utilizing the annotation file IlluminaHumanMethylation450kanno.ilmn12.hg19 (version 0.6.0). Probes with missing values, probes on sex chromosomes, SNP-associated probes at the CpG interrogation or the single nucleotide extension, and cross-reactive probes (44) were removed. Then, probes annotated as promoter-associated island shores (n = 18,099; ref. 45) were selected for further analysis. Jonckheere–Terpstra trend test with the R package DescTools (version 0.99.48) was used to assess the trend of average beta-values according to the number of driver mutations in epigenetic regulators (1,000 permutations).
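
The Jonckheere–Terpstra statistic counts, over every pair of ordered groups (0, 1, and ≥2 driver mutations), how often a value in the earlier group is smaller than a value in the later group, with ties counted as one half. The permutation sketch below shuffles group labels to build the null; reporting twice the smaller one-sided tail as the two-sided P value is an assumption and may differ from the rule used by DescTools.

```python
import random
from itertools import combinations


def jt_statistic(groups):
    """Jonckheere-Terpstra statistic for groups listed in hypothesized order."""
    stat = 0.0
    for earlier, later in combinations(groups, 2):  # preserves group order
        for x in earlier:
            for y in later:
                stat += 1.0 if x < y else 0.5 if x == y else 0.0
    return stat


def jt_permutation_test(groups, n_perm=1000, seed=0):
    """Permutation P value for a monotone trend in either direction."""
    rng = random.Random(seed)
    observed = jt_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    n_ge = n_le = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign values to groups at random, keeping group sizes
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        stat = jt_statistic(permuted)
        n_ge += stat >= observed
        n_le += stat <= observed
    p_up = (n_ge + 1) / (n_perm + 1)
    p_down = (n_le + 1) / (n_perm + 1)
    return observed, min(1.0, 2 * min(p_up, p_down))
```

In the methylation analysis, each group would hold per-sample average beta values for samples with 0, 1, or ≥2 driver mutations in epigenetic regulators.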

Transcriptome Analysis in TCGA

To identify significantly enriched pathways, raw read counts of RNA-seq data were processed with the R package edgeR (version 3.36.0), normalized by the trimmed mean of M-values (TMM) method, and used as input for GSEA (version 4.1.0; ref. 46). Samples with and without mutations in each epigenetic regulator were compared in six cancer types/subtypes (21 driver gene–cancer type combinations in total) using the Molecular Signatures Database hallmark gene sets (version 2022.1.Hs; ref. 47). Statistical significance was calculated by permutation of phenotype labels (1,000 times). Gene sets with a false discovery rate (FDR) q-value < 0.25 and ranking among the top 20 were considered significant. Plots were made using the R package Rtoolbox (version 1.4).
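
TMM normalization estimates a scaling factor for each sample by trimming genes with extreme log fold changes (M) and extreme average abundance (A) relative to a reference sample, then taking a precision-weighted mean of the remaining M values. The sketch below follows that scheme with edgeR's default trims (30% for M, 5% for A) but omits edgeR's choice of reference sample and the final rescaling of factors across samples, so it is a simplified illustration rather than a reimplementation of `calcNormFactors`; the function names are hypothetical.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share their average rank (as R's rank() does by default)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref` (raw counts)."""
    n_obs, n_ref = sum(obs), sum(ref)
    m_vals, a_vals, v_vals = [], [], []
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # M and A are undefined when either count is zero
        p_o, p_r = y_o / n_obs, y_r / n_ref
        m_vals.append(math.log2(p_o / p_r))  # log fold change (M)
        a_vals.append(0.5 * math.log2(p_o * p_r))  # average log abundance (A)
        v_vals.append((n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r))
    n = len(m_vals)
    lo_m = math.floor(n * logratio_trim) + 1
    lo_a = math.floor(n * sum_trim) + 1
    rank_m, rank_a = _average_ranks(m_vals), _average_ranks(a_vals)
    keep = [
        i for i in range(n)
        if lo_m <= rank_m[i] <= n + 1 - lo_m and lo_a <= rank_a[i] <= n + 1 - lo_a
    ]
    if not keep:
        raise ValueError("no genes left after trimming")
    # Precision-weighted mean of retained M values (weights = inverse approximate variance).
    weighted = sum(m_vals[i] / v_vals[i] for i in keep) / sum(1 / v_vals[i] for i in keep)
    return 2 ** weighted
```

A single extremely overexpressed gene inflates a sample's library size; trimming removes that gene, so the factor reflects the shift of all the other genes.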

FPKM-normalized values of RNA-seq data from TCGA PanCanAtlas were used as input for ssGSEA (ssGSEA2.0, https://github.com/broadinstitute/ssGSEA2.0) with default parameters (48). Signature enrichment scores for proliferation-related signatures (E2F TARGETS and G2M CHECKPOINT) were generated for each sample. Jonckheere–Terpstra trend test with the R package DescTools was used to assess the trend of ssGSEA scores according to the number of driver mutations in epigenetic regulators (1,000 permutations).
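
ssGSEA scores one sample at a time: genes are ranked by expression, and a running sum accumulates rank-weighted steps for genes in the set and uniform steps for genes outside it; unlike GSEA, the score is the sum of the running differences rather than their maximum. The sketch below follows the original ssGSEA formulation as commonly described; the weight exponent of 0.25 and the absence of any final normalization are assumptions and may not match the ssGSEA2.0 defaults.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Single-sample enrichment score for one gene set.

    expression: dict gene -> expression value for one sample.
    """
    genes = sorted(expression, key=expression.get, reverse=True)  # highest first
    n = len(genes)
    rank = {g: n - i for i, g in enumerate(genes)}  # the most highly expressed gene gets rank n
    members = [g for g in genes if g in gene_set]
    n_out = n - len(members)
    if not members or n_out == 0:
        raise ValueError("gene set must overlap the sample but not cover every gene")
    norm = sum(rank[g] ** alpha for g in members)
    p_in = p_out = score = 0.0
    for g in genes:  # walk down the ranked list
        if g in gene_set:
            p_in += rank[g] ** alpha / norm
        else:
            p_out += 1 / n_out
        score += p_in - p_out  # sum of running differences, not the maximum
    return score
```

A positive score indicates that the set's genes sit near the top of the sample's expression ranking, as expected for proliferation signatures in highly proliferative tumors.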

Transcriptome Analysis in Cell Lines

RNA-seq data of human 12Z uterine endometrial cells transfected with siRNAs targeting ARID1A, EP300, or both, or with nontargeting control siRNAs, were obtained from Gene Expression Omnibus (GSE148473). RNA-seq reads were mapped to the reference genome (GRCh37) using STAR (version 2.5.2a; ref. 49), and gene expression counts were quantified using featureCounts (version 1.6.5; ref. 50). Raw read counts were TMM-normalized using the R package edgeR and used for GSEA. Statistical significance was calculated by permutation of gene-set labels (1,000 times), and P < 0.05 was considered significant. Plots were made using the R package Rtoolbox.

Gene Dependency Analysis with DepMap Data

Sample information data (sample_info.csv) for 1,840 cell lines, mutation call data (CCLE_mutations.csv) for 1,771 cell lines, and batch-corrected genome-wide CRISPR-Cas9 knockout screen data (CRISPR_gene_effect.csv) for 1,086 cell lines were obtained from the DepMap portal (Public 22Q2; https://depmap.org/portal/; ref. 28). Among them, 1,039 cell lines had both mutation and CRISPR-Cas9 knockout screen data. In the CRISPR-Cas9 knockout screen data, negative gene essentiality scores indicate cell growth inhibition and/or death resulting from gene knockout. Cell lines reported as contaminated or misclassified were removed. Cell lines were categorized according to their organ of origin, and organs with >30 cell lines that were associated with cancer types/subtypes showing significant cooccurrence of multiple driver mutations in epigenetic regulators were analyzed. First, the effect of inactivating a given epigenetic driver on gene essentiality scores was evaluated in its wild-type cell lines in each cancer type. Then, the differences in this effect between its wild-type and mutant cell lines, as well as between its wild-type cell lines and those harboring mutations in other epigenetic drivers, were evaluated. To combine the results, the mean or mean difference of gene essentiality scores was pooled across epigenetic driver–organ combinations using fixed-effect meta-analysis as described above.
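
The three comparison groups in Fig. 7A and the per-organ mean differences fed into the meta-analysis can be sketched as below. Two choices here are assumptions: a cell line carrying both the targeted gene and another epigenetic driver is assigned to the target-mutant group, and the SE of a mean difference uses the unpooled (Welch-type) formula. The function names and group labels are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def dependency_group(mutated_genes, target, epigenetic_drivers):
    """Assign a cell line to one of the three groups compared for a target gene."""
    hits = mutated_genes & epigenetic_drivers
    if not hits:
        return "wild-type"  # no mutation in any epigenetic regulator
    if target in hits:
        return "target-mutant"  # assumption: takes precedence over other hits
    return "other-epigenetic-mutant"


def mean_difference(group_scores, wild_type_scores):
    """Mean difference in essentiality scores versus wild-type, with an unpooled SE."""
    diff = mean(group_scores) - mean(wild_type_scores)
    se = sqrt(variance(group_scores) / len(group_scores)
              + variance(wild_type_scores) / len(wild_type_scores))
    return diff, se
```

A positive mean difference means the knockout is less deleterious in the mutant group than in wild-type cell lines, because essentiality scores are negative for harmful knockouts.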

Survival Analysis

Survival data for patients with LUAD were available for 825, 1,076, and 503 patients in the C-CAT, GENIE, and TCGA cohorts, respectively. For the C-CAT and GENIE cohorts, overall survival was defined as the time from the registration date (C-CAT) or the procedure date (GENIE) to the date of death or last follow-up. For the TCGA cohort, overall survival from the date of diagnosis was used. Survival analyses were performed using the R packages survival (version 3.2–13) and survminer (version 0.4.9). Survival curves were estimated with the Kaplan–Meier method, and univariable and multivariable analyses were performed with the Cox proportional hazards model. To conduct the meta-analysis of survival across cohorts, we first evaluated the effects of driver mutations in epigenetic regulators using the univariable Cox proportional hazards model, comparing samples with and without mutations in a given epigenetic regulator. The log HRs and SEs of each cohort were then pooled through fixed-effect meta-analysis as described above.
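
For reference, the Kaplan–Meier estimator multiplies, at each distinct event time, the conditional probability of surviving past that time given survival up to it. The sketch below handles tied times and censoring; it is an illustration only, the function name is hypothetical, and the Cox models used for the hazard ratios are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as (time, survival) pairs at each event time.

    events[i] is 1 if a death was observed at times[i] and 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve
```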

Statistical Analysis

Statistical analyses were performed with R 4.1.3 (The R Foundation for Statistical Computing). Categorical and continuous data were compared using the Fisher exact test and the two-sided Wilcoxon rank-sum test, respectively, unless otherwise specified.
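For a 2×2 table, such as the mutation status of one gene (mutated/wild-type) in two groups, the two-sided Fisher exact test can be sketched in Python as below. This follows the definition documented for R's fisher.test, in which "more extreme" tables are those whose probability is no larger than that of the observed table; the function name and example tables are hypothetical, and this is an illustration rather than the study's code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The two-sided p-value sums the probabilities of every
    table whose probability is no larger than that of the observed table.
    """
    row1 = a + b          # first row total
    col1 = a + c          # first column total
    n = a + b + c + d     # grand total
    denom = comb(n, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = table_prob(a)
    # Feasible values of the top-left cell given the fixed margins
    x_min = max(0, col1 - (n - row1))
    x_max = min(row1, col1)
    probs = [table_prob(x) for x in range(x_min, x_max + 1)]
    # Small relative tolerance so tables tied with the observed one are not
    # dropped because of floating-point rounding
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)

p = fisher_exact_two_sided(3, 1, 1, 3)   # 34/70, about 0.486
```

Because the margins are fixed, the whole table is determined by its top-left cell, so the test only needs to enumerate the feasible values of that one cell.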

Data and Code Availability

Our findings are supported by data that are available from public online repositories or upon request from the data providers. The data analyzed in this study were obtained from C-CAT (https://www.ncc.go.jp/en/c_cat/about), GENIE (https://www.aacr.org/professionals/research/aacr-project-genie), TCGA Research Network (https://www.cancer.gov/tcga), and the DepMap Project (https://depmap.org/portal). Gene expression data analyzed in this study were obtained from Gene Expression Omnibus at GSE148473. The custom code used in this study is available at https://github.com/nccmo/C-CAT_analysis_2024.

Y. Saito reports grants from JSPS KAKENHI and grants from Takeda Science Foundation during the conduct of the study. Y. Kogure reports personal fees from Takeda, Daiichi Sankyo, Nippon Shinyaku, and personal fees from Kyowa Kirin outside the submitted work. M. Tabata reports personal fees from Pfizer Inc. outside the submitted work. T. Kanai reports grants from Abbvie GK, Mochida Pharmaceutical Co., Ltd., Kyorin Pharmaceutical Co., Ltd., Taiho Pharmaceutical Co., Ltd., Daiichisankyo Co., Ltd., Mitsubishi Tanabe Pharma, Takeda Pharmaceutical Co., Ltd., JIMRO Co., Ltd., EA Pharma Co., Ltd., ZERIA Pharmaceutical Co., Ltd., Chugai Pharmaceutical Co., Ltd., and grants from Miyarisan Pharmaceutical Co., Ltd., outside the submitted work. J. Koya reports personal fees from Eisai Co., Ltd., and personal fees from Tomy Digital Biology Co., Ltd., outside the submitted work. K. Kataoka reports grants from the Japan Agency for Medical Research and Development, Japan Science and Technology Agency Moonshot R&D Program, Takeda Science Foundation, Chordia Therapeutics, Asahi Kasei Pharma, Shionogi, Teijin Pharma, Japan Blood Products Organization, Mochida Pharmaceutical, JCR Pharmaceuticals, Nippon Kayaku, and grants from the Uehara Memorial Foundation during the conduct of the study; grants and personal fees from Otsuka Pharmaceutical, Chugai Pharmaceutical, Takeda Pharmaceuticals, Meiji Seika Pharma, Eisai, Ono Pharmaceutical, Kyowa Kirin, Daiichi Sankyo, Sumitomo Pharma, Nippon Shinyaku; personal fees from Astellas Pharma, Novartis, AstraZeneca, Janssen Pharmaceutical, SymBio Pharmaceuticals, Bristol Myers Squibb, Pfizer, Alexion Pharmaceuticals, AbbVie, Sanofi, Sysmex, Mundipharma, Incyte Corporation, and personal fees from Kyorin Pharmaceutical outside the submitted work; in addition, K. Kataoka has a patent for genetic alterations as a biomarker in T-cell lymphomas issued and licensed and a patent for PD-L1 abnormalities as a predictive biomarker for immune checkpoint blockade therapy issued and licensed. No disclosures were reported by the other authors.

S. Horie: Resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. Y. Saito: Resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. Y. Kogure: Validation, writing–review and editing. K. Mizuno: Validation, writing–review and editing. Y. Ito: Investigation, writing–review and editing. M. Tabata: Investigation, writing–review and editing. T. Kanai: Investigation, writing–review and editing. K. Murakami: Investigation, writing–review and editing. J. Koya: Investigation, writing–review and editing. K. Kataoka: Conceptualization, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing.

This work was supported by the Japan Society for the Promotion of Science KAKENHI (JP 22K20808, to Y. Saito), the Japan Agency for Medical Research and Development [Project for Promotion of Cancer Research and Therapeutic Evolution (JP22ama221510), to K. Kataoka], Japan Science and Technology Agency Moonshot R&D Program (JPMJMS2022, to K. Kataoka), Takeda Science Foundation (to Y. Saito), and the Uehara Memorial Foundation (to K. Kataoka). The results published here are in part based upon data generated by C-CAT (https://www.ncc.go.jp/en/c_cat/about), TCGA Research Network (https://www.cancer.gov/tcga), and the DepMap Project (https://depmap.org/portal). The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. The supercomputing resources were provided by the Human Genome Center within the Institute of Medical Science at The University of Tokyo.

Note: Supplementary data for this article are available at Cancer Discovery Online (http://cancerdiscovery.aacrjournals.org/).

1. Ding L, Bailey MH, Porta-Pardo E, Thorsson V, Colaprico A, Bertrand D, et al. Perspective on oncogenic processes at the end of the beginning of cancer genomics. Cell 2018;173:305–20.
2. The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020;578:82–93.
3. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature 2013;502:333–9.
4. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell 2018;173:321–37.
5. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;173:371–85.
6. AACR Project GENIE Consortium. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov 2017;7:818–31.
7. Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 2017;23:703–13.
8. Park KH, Choi JY, Lim AR, Kim JW, Choi YJ, Lee S, et al. Genomic landscape and clinical utility in Korean advanced pan-cancer patients from prospective clinical sequencing: K-MASTER program. Cancer Discov 2022;12:938–48.
9. Wu L, Yao H, Chen H, Wang A, Guo K, Gou W, et al. Landscape of somatic alterations in large-scale solid tumors from an Asian population. Nat Commun 2022;13:4264.
10. Kohno T, Kato M, Kohsaka S, Sudo T, Tamai I, Shiraishi Y, et al. C-CAT: the national datacenter for cancer genomic medicine in Japan. Cancer Discov 2022;12:2509–15.
11. Tan DS, Mok TS, Rebbeck TR. Cancer genomics: diversity and disparity across ethnicity and geography. J Clin Oncol 2016;34:91–101.
12. Arora K, Tran TN, Kemel Y, Mehine M, Liu YL, Nandakumar S, et al. Genetic ancestry correlates with somatic differences in a real-world clinical cancer sequencing cohort. Cancer Discov 2022;12:2552–65.
13. Yuan J, Hu Z, Mahal BA, Zhao SD, Kensler KH, Pi J, et al. Integrated analysis of genetic ancestry and genomic alterations across cancers. Cancer Cell 2018;34:549–60.
14. Carrot-Zhang J, Chambwe N, Damrauer JS, Knijnenburg TA, Robertson AG, Yau C, et al. Comprehensive analysis of genetic ancestry and its molecular correlates in cancer. Cancer Cell 2020;37:639–54.
15. Jiagge E, Jin DX, Newberg JY, Perea-Chamblee T, Pekala KR, Fong C, et al. Tumor sequencing of African ancestry reveals differences in clinically relevant alterations across common cancers. Cancer Cell 2023;41:1963–71.
16. Mina M, Raynaud F, Tavernari D, Battistello E, Sungalee S, Saghafinia S, et al. Conditional selection of genomic alterations dictates cancer evolution and oncogenic dependencies. Cancer Cell 2017;32:155–68.
17. Mina M, Iyer A, Tavernari D, Raynaud F, Ciriello G. Discovering functional evolutionary dependencies in human cancers. Nat Genet 2020;52:1198–207.
18. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol 2017;2017:PO.17.00011.
19. Ge Z, Leighton JS, Wang Y, Peng X, Chen Z, Chen H, et al. Integrated genomic analysis of the ubiquitin pathway across cancer types. Cell Rep 2018;23:213–26.
20. Donehower LA, Lozano G. 20 years studying p53 functions in genetically engineered mice. Nat Rev Cancer 2009;9:831–41.
21. Ku SY, Rosario S, Wang Y, Mu P, Seshadri M, Goodrich ZW, et al. Rb1 and Trp53 cooperate to suppress prostate cancer lineage plasticity, metastasis, and antiandrogen resistance. Science 2017;355:78–83.
22. Langille E, Al-Zahrani KN, Ma Z, Liang M, Uuskula-Reimand L, Espin R, et al. Loss of epigenetic regulation disrupts lineage integrity, induces aberrant alveogenesis, and promotes breast cancer. Cancer Discov 2022;12:2930–53.
23. Toska E, Osmanbeyoglu HU, Castel P, Chan C, Hendrickson RC, Elkabets M, et al. PI3K pathway regulates ER-dependent transcription in breast cancer through the epigenetic regulator KMT2D. Science 2017;355:1324–30.
24. Huang WC, Chen CC. Akt phosphorylation of p300 at Ser-1834 is essential for its histone acetyltransferase and transcriptional activity. Mol Cell Biol 2005;25:6592–602.
25. Loukas I, Simeoni F, Milan M, Inglese P, Patel H, Goldstone R, et al. Selective advantage of epigenetically disrupted cancer cells via phenotypic inertia. Cancer Cell 2023;41:70–87.
26. Mathur R. ARID1A loss in cancer: towards a mechanistic understanding. Pharmacol Ther 2018;190:15–23.
27. Wilson MR, Reske JJ, Holladay J, Neupane S, Ngo J, Cuthrell N, et al. ARID1A mutations promote P300-dependent endometrial invasion through super-enhancer hyperacetylation. Cell Rep 2020;33:108366.
28. Ghandi M, Huang FW, Jané-Valbuena J, Kryukov GV, Lo CC, McDonald ER III, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 2019;569:503–8.
29. Shen J, Ju Z, Zhao W, Wang L, Peng Y, Ge Z, et al. ARID1A deficiency promotes mutability and potentiates therapeutic antitumor immunity unleashed by immune checkpoint blockade. Nat Med 2018;24:556–62.
30. Brat DJ, Verhaak RG, Aldape KD, Yung WK, Salama SR, Cooper LA, et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med 2015;372:2481–98.
31. Schoenfeld AJ, Bandlamudi C, Lavery JA, Montecalvo J, Namakydoust A, Rizvi H, et al. The genomic landscape of SMARCA4 alterations and associations with outcomes in patients with lung cancer. Clin Cancer Res 2020;26:5701–8.
32. Helming KC, Wang X, Roberts CWM. Vulnerabilities of mutant SWI/SNF complexes in cancer. Cancer Cell 2014;26:309–17.
33. Ogiwara H, Sasaki M, Mitachi T, Oike T, Higuchi S, Tominaga Y, et al. Targeting p300 addiction in CBP-deficient cancers causes synthetic lethality by apoptotic cell death due to abrogation of MYC expression. Cancer Discov 2016;6:430–45.
34. Shigematsu H, Lin L, Takahashi T, Nomura M, Suzuki M, Wistuba II, et al. Clinical and biological features associated with epidermal growth factor receptor gene mutations in lung cancers. J Natl Cancer Inst 2005;97:339–46.
35. Li J, Xu C, Lee HJ, Ren S, Zi X, Zhang Z, et al. A genomic and epigenomic atlas of prostate cancer in Asian populations. Nature 2020;580:93–9.
36. Kumamoto T, Yamazaki F, Nakano Y, Tamura C, Tashiro S, Hattori H, et al. Medical guidelines for Li-Fraumeni syndrome 2019, version 1.1. Int J Clin Oncol 2021;26:2161–78.
37. Tadaka S, Hishinuma E, Komaki S, Motoike IN, Kawashima J, Saigusa D, et al. jMorp updates in 2020: large enhancement of multi-omics data resources on the general Japanese population. Nucleic Acids Res 2021;49:D536–44.
38. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.
39. Mitelman F, Johansson B, Mertens F. Mitelman database of chromosome aberrations and gene fusions in cancer 2023. Available from: https://mitelmandatabase.isb-cgc.org.
40. Jang YE, Jang I, Kim S, Cho S, Kim D, Kim K, et al. ChimerDB 4.0: an updated and expanded database of fusion genes. Nucleic Acids Res 2020;48:D817–24.
41. Kundra R, Zhang H, Sheridan R, Sirintrapun SJ, Wang A, Ochoa A, et al. OncoTree: a cancer classification system for precision oncology. JCO Clin Cancer Inform 2021;5:221–30.
42. Hayward NK, Wilmott JS, Waddell N, Johansson PA, Field MA, Nones K, et al. Whole-genome landscapes of major melanoma subtypes. Nature 2017;545:175–80.
43. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 2014;30:1363–9.
44. Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 2013;8:203–9.
45. Timp W, Feinberg AP. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat Rev Cancer 2013;13:497–510.
46. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
47. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417–25.
48. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 2009;462:108–12.
49. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15–21.
50. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 2014;30:923–30.
