Abstract
Ras is frequently mutated in cancer; however, there is a lack of consensus in the literature regarding the cancer mutation frequency of Ras, with quoted values ranging from 10% to 30%. This variability is at least partly due to the selective aggregation of data from different databases and the dominant influence of particular cancer types and Ras isoforms within these datasets. To provide a more definitive figure for Ras mutation frequency in cancer, we cross-referenced the data in all major publicly accessible cancer mutation databases to determine reliable mutation frequency values for each Ras isoform in all major cancer types. These percentages were then applied to current U.S. cancer incidence statistics to estimate the number of new patients each year who have Ras-mutant cancers. We find that approximately 19% of patients with cancer harbor Ras mutations, equivalent to approximately 3.4 million new cases per year worldwide. We discuss the Ras isoform- and mutation-specific trends evident within the datasets that are relevant to current Ras-targeted therapies.
Introduction
Ras proteins activate signaling networks controlling cell proliferation, differentiation, and survival (1). They are encoded by three ubiquitously expressed genes, HRAS, KRAS, and NRAS, that share significant sequence homology and largely overlapping functions (2). Ras proteins cycle between an inactive GDP-bound conformation and an active GTP-bound conformation. Activation is facilitated by guanine nucleotide exchange factors (GEF), and inactivating GTP hydrolysis is enhanced by GTPase-activating proteins (GAP). Ras activation causes a conformational change that allows engagement with more than 20 different proteins from 10 effector families (3). The most intensively studied of these from a cancer and therapeutic perspective have been the RAF and PtdIns-3 kinase families. Activated Ras concentrates effector proteins into plasma membrane signaling nanoclusters where they can interact with necessary proteins and lipids to control downstream pathways (4). Mutations of Ras that render the protein constitutively active are widely observed in cancer; however, there are distinctive patterns in the mutation frequencies associated with each Ras gene and cancer type (5).
Oncogenic Ras
Gain-of-function missense mutations drive oncogenesis, and almost all of those detected in patients cluster in three hotspots at codons 12, 13, and 61 (5). These mutations increase the GTP-bound fraction of Ras through fast nucleotide exchange and/or impaired GAP-stimulated GTP hydrolysis (6). Although these mutants are all activating, they are not equal in their oncogenic potential, and different Ras mutants are associated with differences in patient survival (7–11). Mutation-specific oncogenesis was clearly demonstrated in vivo using a library-based CRISPR gene-editing approach that allowed 12 different activating codon 12 and 13 mutations of KRAS to be compared simultaneously in each mouse; only five of these mutants resulted in the development of lung tumors (12). The frequency of individual mutations in patients also varies across tissue types and between isoforms, suggesting contextual influences that determine which isoform–mutation combinations have a selective advantage in different cancer types (5).
The interplay between three main factors determines whether conditions are permissive for the initiation and progression of Ras-dependent oncogenesis and might explain why specific Ras isoforms and mutations are associated with distinct cancer types (5). The first is Ras dosage, defined by expression levels and relative activation state (13). The proportion of a Ras population that is GTP-bound varies from 30% to 90% depending on which mutation is present (14). Furthermore, the stability of the active state can vary depending on whether the mutant is fast-cycling or GAP-insensitive (6). Because Ras expression levels also vary more than 100-fold between isoforms and across tissues (15), Ras signaling capacity can differ substantially depending on the tissue/isoform/mutation combination. Only a subset of these combinations will be optimal, because too much Ras signaling promotes senescence or cell death while too little fails to initiate tumorigenesis (16–18). Importantly, the narrow range of permissive signaling capacity can shift over time to facilitate tumor progression and resistance to therapeutics (19).
The second factor is the signaling specificity associated with each Ras isoform and mutation. The extent to which Ras isoforms display differential coupling to effector pathways is not well understood and is confounded by differences in expression and dosage; however, in vivo experiments that expressed Ras isoforms from the same genetic locus to avoid dosage effects still revealed that Ras isoforms cannot fully recapitulate each other's functions (20, 21). Isoform-specific signaling is thought to be mediated by differential intracellular localisation that favors preferential coupling to specific effector pathways (4, 22, 23), and by distinct biochemical properties imparted by sequence variation in the allosteric lobe of each isoform (24). Recent in vitro analysis revealed distinct binding preferences in Ras–Raf interactions, with BRAF binding being highly selective for KRAS, whereas CRAF was critical for HRAS-mediated MAPK signaling (25). Mutation specificity is also important for Ras biology (12, 26–29), and the structural and biochemical features underpinning mutational differences in nucleotide cycling, allosteric regulation, and GEF, GAP, and effector interactions are now being defined (14, 25, 28, 30–32).
The final factor is cellular and tissue context, which provides the genetic, epigenetic, and proteomic landscapes in which Ras networks operate. This heterogeneity can result in different proliferative potential depending on the capacity of the oncogene to engage the subset of drivers present in that cell or tissue (28, 33, 34). Ras dosage and signaling specificity, titrated against these backgrounds, will favor the selection of different Ras variant combinations in each tissue. Context matters even within a single tissue: for example, MAPK activation in KRASG12V-mutated mouse colon showed spatial and cell type–specific variation that was shaped by cell-type differences in the expression of regulatory proteins of the MAPK pathway (35).
Our understanding of cancer is facilitated by observing patterns found in cancer mutation databases. These data have highlighted the isoform/mutation combinations most frequently seen in each tissue, which has helped to inform the development and testing of potential Ras-targeted therapies in appropriate patient groups. However, it is also important to note that the lack of consensus among these datasets can result in incorrect estimates of the true disease burden associated with each Ras isoform.
Cancer Mutation Databases
We incorporate four leading cancer mutation databases into our analyses. The largest of these is the Catalogue of Somatic Mutations in Cancer (COSMIC), which contains manually curated data from the cancer literature comprising approximately 9.7 million coding mutations from approximately 1.4 million samples (including ∼34,000 whole genomes; ref. 36). The most refined database is The Cancer Genome Atlas (TCGA), which has molecularly characterised tumor samples from approximately 11,300 patients representing 33 cancer types. All samples in TCGA have been subjected to comprehensive genomic, epigenomic, transcriptomic, and proteomic analysis to better understand the oncogenic systems biology of molecular subtypes of cancer (37). Because the TCGA Program is centrally coordinated, verification and quality control of sample type are also likely to be more consistent than in larger datasets derived from multiple independent studies. The Memorial Sloan Kettering Cancer Center (MSKCC) cBioPortal facilitates meta-analysis of TCGA and ∼130 other datasets comprising approximately 40,000 samples (38). The International Cancer Genome Consortium (ICGC) data portal performs a similar function, with multi-omic data for 22 cancer types currently curated from approximately 24,000 samples derived from patients around the world (39). The databases overlap: almost all TCGA data are also found within cBioPortal, approximately 50% of TCGA data are present within the ICGC data portal, and all of the data found in TCGA, cBioPortal, and ICGC are collated within COSMIC (Fig. 1).
Figure 1. Overview of Ras data in cancer genetics databases. A, The sampling relationship, the number of samples tested, and the number of Ras-mutant samples identified in each of the publicly accessible databases and data portals. B, The sample tissue composition of each database is not equivalent. Data sources: COSMIC v85, cBioPortal v2.2.0, ICGC release 27, TCGA release 12.0, all accessed contemporaneously.
With such rich and integrated datasets available, it might seem surprising that there is still no consensus on the disease burden associated with a major oncogene such as mutant Ras. For example, 11.6% of TCGA samples, 17.5% of cBioPortal samples, 19.3% of ICGC samples, and 24.8% of COSMIC samples are Ras mutant (Fig. 1). These differences are driven by the different priorities underpinning sample curation. COSMIC collates widely from the cancer literature and includes a high proportion of tissues such as colon and lung that contain high percentages of KRAS mutations (Fig. 1B). In contrast, the other datasets consist of studies where Ras mutation status was not a factor in their collection and consequently contain larger proportions of breast, brain, kidney, and liver samples where Ras mutations are rarely observed.
In fact, none of these percentages accurately reflects Ras disease burden, because none of the datasets accurately recapitulates the relative frequency of each disease in the patient population. For example, in the TCGA dataset the ten rarest cancers, which represent 2.7% of new cases per year in the United States, account for approximately 20% of TCGA samples. Given this, how are the datasets best used to understand Ras mutation patterns? It seems likely that the sheer volume of Ras-mutant samples within the COSMIC dataset (Fig. 1) means that, for many cancer types, broadly accurate conclusions can be drawn regarding the association of particular Ras isoforms with each cancer and the types of mutations observed. In contrast, the smaller datasets are particularly suited to comparative analysis of genome-wide changes and genetic associations with mutant Ras.
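The effect of sample composition on a pooled frequency can be illustrated with a small weighted-average sketch. The two tissue types and every number below are hypothetical placeholders chosen only to show the arithmetic; they are not taken from COSMIC, TCGA, cBioPortal, or ICGC. The same per-tissue mutation frequencies give very different overall figures depending on whether they are weighted by the number of samples in a dataset or by each tissue's share of new patient cases.

```python
"""Toy illustration: a pooled mutation frequency depends on the tissue mix.

All tissue names and numbers are hypothetical placeholders, not database values.
"""


def weighted_frequency(weights: dict[str, float], freq: dict[str, float]) -> float:
    """Return sum_t(w_t * f_t) / sum_t(w_t) over tissue types t."""
    total = sum(weights.values())
    return sum(weights[t] * freq[t] for t in weights) / total


# Hypothetical per-tissue Ras mutation frequencies.
ras_freq = {"tissue_A": 0.40, "tissue_B": 0.02}

# Hypothetical dataset composition in which tissue_A is heavily over-sampled.
dataset_samples = {"tissue_A": 800, "tissue_B": 200}

# Hypothetical share of new patient cases contributed by each tissue.
incidence_share = {"tissue_A": 0.20, "tissue_B": 0.80}

dataset_level = weighted_frequency(dataset_samples, ras_freq)
patient_level = weighted_frequency(incidence_share, ras_freq)

print(f"sample-weighted frequency:    {dataset_level:.3f}")
print(f"incidence-weighted frequency: {patient_level:.3f}")
```

Weighting by incidence rather than by sample count is the step that turns database frequencies into an estimate of patient burden; in this toy case the same inputs give about 32% or about 10% depending on the weights.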
Ras Patterns across Datasets
Comparison of KRAS mutation data for major KRAS-associated cancers illustrates the challenge of reaching a consensus across the datasets (Supplementary Table S1). For example, colorectal cancer exhibits a KRAS mutation frequency of approximately 33% in the COSMIC dataset, which comprises approximately 75,000 tested samples. The frequencies of 40%–45% suggested by the other datasets are based on small sample sizes of fewer than 500. Notably, however, the private Foundation Medicine (FM) dataset comprising 13,336 colorectal samples reports a KRAS mutation frequency of approximately 50% (40). This may be due to the higher sensitivity of the recent genetic screening methods employed by FM compared with the long-term aggregate data in COSMIC. A second point of difference is that the samples within the FM dataset were from patients who presented with advanced metastatic disease, in contrast to the heterogeneous samples collated in COSMIC, which were derived from a wide range of studies. For lung cancer, similar disparities are evident: COSMIC reports a lower percentage of approximately 21%, and the small-scale datasets give intermediate values, compared with a recent large-scale FM study that found that 31% of 5,749 lung adenocarcinoma samples were KRAS mutant (41).
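To put these sample sizes in perspective, a Wilson score interval gives a rough sense of how much of the spread could be explained by sampling noise alone. This is a sketch that assumes each sample is an independent draw from a single population, which the heterogeneous databases do not strictly satisfy, and it uses n = 500 as an assumed size for the smaller public datasets (the text only states "fewer than 500", so real intervals would be somewhat wider).

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# (label, observed colorectal KRAS frequency, number of samples).
# n=500 for the small datasets is an assumption, not a reported figure.
examples = [
    ("COSMIC", 0.33, 75_000),
    ("small public dataset (assumed n=500)", 0.40, 500),
    ("Foundation Medicine", 0.50, 13_336),
]

for label, p, n in examples:
    lo, hi = wilson_interval(p, n)
    print(f"{label}: {p:.0%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under these assumptions the COSMIC interval is well under one percentage point wide and a 500-sample dataset carries roughly ±4 percentage points of uncertainty, so sampling error alone is unlikely to account for a gap between 33% and 40%–50%; this is consistent with the methodological and cohort differences described in the paragraph above.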
The collated pancreatic data reveal a consistent anomaly: according to literature sources, 90%–98% of pancreatic cancers are KRAS mutated (1, 3, 42), whereas most datasets contain lower values of 70%–75% (Supplementary Table S1). The stroma of pancreatic ductal adenocarcinoma (PDAC) is extensively infiltrated with cancer-associated fibroblasts that are not Ras mutant (43). It seems likely that the lower estimates across the datasets are confounded by stromal content in the samples, which reduces the sensitivity for positively identifying KRAS mutations in the small subpopulations of cancer cells present. Consistent with this, a recent large-scale cancer genetics study by Foundation Medicine of 3,594 primary and metastatic PDAC samples, in which a required threshold of cancer cell content was rigorously verified in each sample, reported that 88% of samples contain KRAS mutations (44).
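The stromal-dilution argument can be made concrete with the expected variant allele fraction (VAF) of a bulk sample. This sketch assumes a heterozygous KRAS mutation in diploid tumor cells and diploid, mutation-free stroma, so the expected VAF is about half the tumor cell fraction (purity); copy-number changes and assay-specific detection limits are not modeled. It also back-calculates the detection rate that would reconcile the 70%–75% database values with the 90%–98% literature range, under the assumption that stromal dilution is the only cause of the gap.

```python
def expected_vaf(purity: float, mutant_copies: int = 1, tumor_copies: int = 2) -> float:
    """Expected variant allele fraction (VAF) in a bulk tumor sample.

    Assumes stromal cells are diploid and mutation-free, and that each tumor
    cell carries `tumor_copies` copies of the locus, `mutant_copies` of them mutant.
    """
    mutant_alleles = purity * mutant_copies
    total_alleles = purity * tumor_copies + (1 - purity) * 2
    return mutant_alleles / total_alleles


def implied_detection_rate(observed_freq: float, true_freq: float) -> float:
    """Fraction of truly mutant tumors that must be detected to give observed_freq."""
    return observed_freq / true_freq


# A sample that is 20% tumor cells gives an expected VAF of about 10%.
print(f"VAF at 20% purity: {expected_vaf(0.20):.2f}")

# Extremes of the detection rates consistent with the quoted frequency ranges.
lowest = implied_detection_rate(0.70, 0.98)
highest = implied_detection_rate(0.75, 0.90)
print(f"implied detection rate: {lowest:.0%} to {highest:.0%}")
```

On this simple model the discrepancy would correspond to roughly 17% to 29% of KRAS-mutant tumors going undetected in the public datasets.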
Where publicly available datasets such as TCGA excel is in multi-gene comparisons and in the integrated analysis of a wide range of types of genetic change. The Pan-Cancer Atlas used TCGA data to define the molecular subtypes of cancer and to generate a deep understanding of the genetic programmes associated with driving different cancer types (45). More recently, this has been extended to the noncoding part of the genome in 2,500 tumors, allowing the identification of an average of approximately 1.2 noncoding and approximately 2.6 coding driver mutations per tumor (46). Analysis of 85 genes within the immediate receptor tyrosine kinase (RTK)–Ras network (37) reveals a wide range of Ras pathway dependencies, affecting 30% to 96% of samples across the TCGA cancer types (Supplementary Fig. S1). Some of these, such as pancreatic cancer (PAAD in TCGA terminology), are highly linked to Ras mutation; however, most cancers exhibit activating genetic changes in the Ras network independently of mutant Ras. Targeting Ras and Ras network genes is therefore potentially relevant in nearly all cancer types, even those where Ras mutations are rarely found.
Amplification of nonmutated Ras is a feature of a subset of cancers, including esophageal (ESCA), stomach (STAD), ovarian (OV), and testicular (TGCT) cancers, where it represents the dominant type of Ras genetic change (Supplementary Fig. S1). In each of these cases, KRAS is by far the most frequently amplified and the most frequently mutated Ras isoform. We have not formally checked whether it is the wild-type or mutant allele that is amplified; however, in these specific cancer types, where amplification is seen more often than mutation, at least a subset of these events will involve the wild-type allele. Changes in Ras dosage are associated with progression and response to therapy (19), and the deletions and amplifications observed in some cancer types may reflect this. Copy number analysis reveals that the Ras isoforms display distinct patterns, with KRAS typically amplified whereas HRAS is more often deleted when copy number changes occur (Supplementary Table S2B). Some of these changes appear to be reciprocal; for example, in squamous cell lung cancer (LUSC), HRAS and NRAS exhibit copy number losses whereas KRAS copy number increases (Supplementary Table S2B). In contrast, in bladder (BLCA), ovarian (OV), and testicular (TGCT) cancers, HRAS copy number is decreased but both NRAS and KRAS show a clear tendency toward increased copy number. Together, these observations suggest isoform-specific biology that may be worth further investigation.
Thus, the lack of consensus at the tissue level remains problematic; however, the major themes within each dataset, in terms of the preferential coupling of particular Ras isoforms to specific cancer types and the patterns of mutations, are consistent across all datasets. In the absence of access to large private cancer genetics datasets with the benefits of scale, consistently high-quality curation, and comprehensive genomic screening, we use the COSMIC dataset together with selected publicly available data from the Foundation Medicine database to collate Ras isoform mutation patterns across a wide range of cancer types.
Ras Mutation Frequencies in Cancer
To estimate Ras disease burden, it is necessary to convert the Ras mutation frequencies found in cancer genetics databases into patient numbers based on current cancer incidence data. We have collated frequencies for a wide range of cancers from all four databases (COSMIC, cBioPortal, ICGC, and TCGA; Supplementary Table S2). The data are derived from formally verified cancer types rather than from samples with only a generic tissue-based categorization, and we have used the TCGA naming system where relevant to facilitate cross-comparison between databases. Data from COSMIC, together with publicly available Foundation Medicine data for all three Ras isoforms in colorectal adenocarcinoma (COAD, READ) and for KRAS mutation frequencies in lung (LUAD) and pancreatic (PAAD) adenocarcinomas, are presented in Table 1. In 2018, the American Cancer Society estimated that approximately 1.7 million new cases of cancer would be diagnosed in the United States (47). The 29 cancer types presented in Table 1 cover approximately 80% of U.S. cancer cases. Note that we have not included nonmelanoma skin cancers in Table 1 because they rarely metastasize, are typically under-reported in cancer statistics, and are not included in global cancer incidence reports.
We estimate that approximately 19% of patients with cancer harbor a Ras mutation; this is equivalent to approximately 260,000 new cases per year in the United States. Globally, there are currently approximately 18 million new cancer diagnoses per year (48). Although the incidence of different cancer types varies around the world, a simple extrapolation of our observations suggests that there are approximately 3.4 million new cancer cases with a Ras mutation worldwide each year. KRAS is the most frequently mutated of the three Ras isoforms in 19 of the 29 cancer types in Table 1 and accounts for 75% of Ras-mutant cancers. NRAS (17% of Ras-mutant patients) and HRAS (7%) show strong coupling to only a small subset of cancer types. Isoform-specific coupling is particularly evident for the major KRAS-associated cancer types and for NRAS in melanoma (SKCM; Table 1). In contrast, thyroid cancer subtypes (THCAA, THCAF) are notable for displaying high levels of mutation in all three Ras isoforms.
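The headline figures can be reproduced approximately with simple arithmetic. This sketch assumes one reading that is consistent with the quoted numbers: that the 19% is the incidence-weighted Ras mutation frequency across the Table 1 cancer types, which cover about 80% of the 1.7 million U.S. cases, and that the same 19% applies unchanged to the 18 million global diagnoses. The isoform shares are the rounded 75%, 17%, and 7% given above; the constant names are illustrative.

```python
US_NEW_CASES = 1_700_000       # ACS estimate for 2018 (from the text)
TABLE1_COVERAGE = 0.80         # share of U.S. cases covered by Table 1 cancer types
RAS_MUTANT_FRACTION = 0.19     # headline Ras mutation frequency
GLOBAL_NEW_CASES = 18_000_000  # worldwide new cancer diagnoses per year

# Rounded share of Ras-mutant patients attributed to each isoform.
ISOFORM_SHARE = {"KRAS": 0.75, "NRAS": 0.17, "HRAS": 0.07}

# Assumption: 19% applies to the covered U.S. cases, and to all global cases.
us_ras = US_NEW_CASES * TABLE1_COVERAGE * RAS_MUTANT_FRACTION
global_ras = GLOBAL_NEW_CASES * RAS_MUTANT_FRACTION
global_by_isoform = {iso: global_ras * share for iso, share in ISOFORM_SHARE.items()}

print(f"U.S. Ras-mutant cases/year:   {us_ras:,.0f}")
print(f"Global Ras-mutant cases/year: {global_ras:,.0f}")
for iso, n in global_by_isoform.items():
    print(f"  {iso}: {n:,.0f}")
```

With these rounded inputs the sketch gives about 258,000 U.S. cases, about 3.42 million global cases, and roughly 240,000 HRAS and 580,000 NRAS cases worldwide. The last two are slightly above the approximately 230,000 and 560,000 quoted in the following paragraph, a difference that is likely due to rounding of the isoform percentages.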
Although KRAS is the major cancer-causing isoform, patient numbers for the other Ras isoforms are still substantial, translating into approximately 230,000 patients globally for HRAS and approximately 560,000 for NRAS. This highlights the importance of targeting all isoforms rather than concentrating only on KRAS-targeted therapies, as is currently the case. Farnesyl transferase inhibitors (FTI), which failed in clinical trials more than 20 years ago (48), are enjoying a renaissance due to personalized medicine approaches. KRAS and NRAS do not respond to FTIs because they can be alternatively prenylated by geranylgeranyltransferase; HRAS, however, is sensitive (49). Ras mutation profiling means that suitable patients can now be identified, and the FTI tipifarnib is currently progressing through phase II clinical trials for use in HRAS-mutant head and neck, leukemia, lymphoma, and thyroid cancers (ClinicalTrials.gov). While most of these cancers represent obvious choices for FTI treatment (Table 1), there are appreciable numbers of potential beneficiaries across a wide range of other cancer types. These include breast cancer, where we estimate that approximately 12,000 new cases per year globally will harbor mutant HRAS. In addition to the targeted use of existing FTIs, a new FTI has recently been developed that overcomes KRAS resistance and potentially opens the way to pan-Ras inhibition (50).
Mutation-level analysis also reveals patterns associated with distinct tissue types (Supplementary Table S3). For example, KRAS codon 13 mutations are particularly associated with the gastrointestinal tract and some blood cancers, which are also where the rare A146 mutations are mostly observed. There are 19 different activating codon 12, 13, or 61 mutations that can be created in each isoform by a single base change. Five mutations (G12D, G12V, G12C, G13D, and Q61R) account for 70% of all Ras-mutant patients. G12C mutations are frequently found in lung cancer because of G:C>T:A transversions associated with the bulky adducts generated by mutagens in tobacco smoke (51). Chemical inhibitors of G12C-mutated KRAS have been developed (52–54). These compounds preferentially bind to GDP-bound KRAS, preventing its exchange of GDP for GTP and its interaction with effectors. Compounds developed by Mirati, Wellspring, Janssen, and Amgen have now entered phase I/II clinical trials, and Amgen recently reported initial results from its trials with AMG 510, which was tolerated by patients and stabilized or partially regressed their non–small cell lung tumors (55). Associated in vivo studies revealed synergy with immunotherapy: 9 of 10 mice showed complete and curative tumor regression when AMG 510 was used in combination with anti-PD-1. While lung cancer has been the focus of these trials because of the smoking-associated prevalence of KRASG12C, many other cancer types contain appreciable numbers of potential beneficiaries of treatment (Supplementary Table S3), especially when considered on a global scale. It is also important to be able to target other mutations, such as G12V and G12D, each of which has a 2- to 3-fold higher disease burden than G12C. Several compounds targeting these other mutations are now in development and entering trials.
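The figure of 19 single-base changes at the hotspot codons can be checked by enumeration. This sketch assumes the KRAS reference codons GGT (G12), GGC (G13), and CAA (Q61) and counts every single-nucleotide change that alters the amino acid without creating a stop codon; it does not model which of these changes are actually activating. Under these assumptions there are 6 + 6 + 7 = 19 nucleotide changes, which encode 18 distinct amino-acid substitutions because two different changes at codon 61 (CAA to CAC or CAT) both give Q61H.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed KRAS reference codons at the three hotspots.
HOTSPOTS = {12: "GGT", 13: "GGC", 61: "CAA"}


def missense_changes(position: int, codon: str) -> list[tuple[str, str]]:
    """Return (mutant codon, protein change) for each single-base missense change."""
    ref_aa = CODON_TABLE[codon]
    changes = []
    for i in range(3):
        for base in BASES:
            if base == codon[i]:
                continue
            mutant = codon[:i] + base + codon[i + 1:]
            alt_aa = CODON_TABLE[mutant]
            # Skip synonymous changes and nonsense (stop) codons.
            if alt_aa != ref_aa and alt_aa != "*":
                changes.append((mutant, f"{ref_aa}{position}{alt_aa}"))
    return changes


all_changes = [c for pos, codon in HOTSPOTS.items() for c in missense_changes(pos, codon)]
protein_changes = sorted({label for _, label in all_changes})

print(len(all_changes), "nucleotide changes")
print(len(protein_changes), "protein changes:", ", ".join(protein_changes))
```

The same count is obtained for other reference codons that encode glycine at codons 12 and 13 and glutamine at codon 61, because the third codon position is synonymous for glycine and gives histidine by two routes for glutamine.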
Concluding Remarks and Future Directions
We have estimated the global disease burden associated with Ras mutations for different cancer types. Approximately 19% of patients with cancer harbor Ras mutations, with KRAS responsible for 75% of these. Our meta-analysis revealed the differences in sampling and interpretation that underlie the lack of consensus to date. Converting the frequencies into patient numbers also helps to refocus attention on the substantial populations of patients who might benefit from anti-Ras therapeutics in cancers where Ras is not frequently mutated. After a prolonged period in which Ras seemed undruggable, we are now entering an era in which Ras-targeted precision therapy options, tailored to individual mutations and cancers, appear feasible. The patient number estimates give an indication of the size of the pool potentially available for clinical trials. Given the isoform-, mutation-, and tissue-specific differences in Ras biology that are now evident, it will be important to have access to even larger databases with high-quality sample curation and genome-wide profiling to develop the deeper understanding that will inform these precision medicine approaches.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
This work was supported by funding from North West Cancer Research (CR1166) and NWCR endowment funding for I.A. Prior, and by federal funds from the National Cancer Institute, National Institutes of Health, under contract no. HHSN261200800001E (to J.L. Hartley). We gratefully acknowledge the contributions of the National Cancer Institute, MSKCC, ICGC, the Sanger Centre and others for generating the cancer genetics databases that made this work possible.