Carrot-Zhang and colleagues describe associations between Native American ancestry and the somatic mutational landscape in lung cancer, including tumor mutation burden and specific driver mutations in EGFR, KRAS, and STK11. Local ancestry analysis suggests that specific germline loci, and not environment, underlie these associations.
See related article by Carrot-Zhang et al., p. 591.
Cancer is driven by complex interactions between environmental and genetic factors. There are at least three distinct types of genetic variation that interact with the environment: (i) somatic variants, genomic alterations that arise in nongerm cells and in certain circumstances contribute to tumorigenesis; (ii) rare germline variants, pathogenic alterations in cancer predisposition genes that are individually rare but in aggregate affect a substantial fraction of cancers; and (iii) common germline risk alleles, often referred to as polymorphisms, which are more prevalent in the general population and have relatively modest individual effects but may work together with other common variants to increase the risk of developing certain types of cancer. Disparities between different ancestral groups in terms of cancer prevalence and outcomes have long been observed. These disparities can be explained, at least in part, by major socioeconomic factors. However, a genetic component is also likely. Groups that have interrogated the factors leading to cancer disparities, such as the Multiethnic Cohort study, have shown that significant variability in the incidence of cancers between ancestral groups cannot be fully explained by nongenetic factors (1).
The Cancer Genome Atlas (TCGA) has provided a wealth of information for systematic surveys of cancer genomes. One of the most important contributions of TCGA is that it has enabled the comprehensive characterization of somatic alterations that occur across a multitude of cancer types (2). TCGA also serves as a modern resource to advance our understanding of how germline genetic variation can be a causative force in cancer. In a recent study of 10,389 cases from 33 cancer types in TCGA, it was shown that 4.1% of cases had pathogenic variants, and another 3.8% carried likely pathogenic variants, suggesting that although these variants are not always clinically identified, they do play an important role in driving cancer (3). Large consortia have also played an important role in defining the numerous common genetic variants that incrementally increase (or decrease) risk for acquiring cancer. Rashkin and colleagues recently performed a large genome-wide association study across 18 cancer types and 2 large independent cohorts (4). This study included >400,000 individuals and >60,000 cancer cases. In total, 21 previously unreported significant genome-wide associations were observed, indicating that despite the decades of work dedicated to the identification of common variation that increases risk for complex diseases, there are still associations to be discovered.
In addition to comprehensive surveys of different types of variation in cancer, some efforts have systematically investigated the interaction between these types of variation. The most established understanding of the interaction between germline variation and somatic variation is commonly referred to as the “two-hit” hypothesis, where a germline pathogenic variant precedes a second somatic mutation in that same gene. Lu and colleagues showed in a large-scale study of rare germline cancer susceptibility loci that BRCA1 germline truncations tended to co-occur with TP53 somatic mutations, and BRCA1/BRCA2 germline truncations are often mutually exclusive with PIK3CA somatic mutations in breast cancer (5). In a systematic study of common germline polymorphism and somatic events, Carter and colleagues identified and validated 17 associations between germline variants and genes with somatic mutations in known cancer genes (6). Associations between ancestry and pathogenic cancer-predisposing variants have also been observed. For example, in a study of 9,899 cases across 33 cancers, Oak and colleagues identified an association in patients with African ancestry between BRCA2 pathogenic mutations and lung squamous cell carcinoma, and in patients with East Asian ancestry between BRIP1 pathogenic mutations and stomach adenocarcinoma (7). Yuan and colleagues showed at a pan-cancer level that African Americans had more TP53 somatic mutations when compared with patients with European ancestry. This trend also held in breast cancer, where African Americans showed a higher frequency of TP53 mutations. In addition, African Americans had significantly lower levels of chromosomal instability than patients with European American ancestry (8). Taken together, these results suggest that specific trends related to ancestry can be seen in both the prevalence of cancer-predisposing variants and patterns of somatic variation.
In this issue of Cancer Discovery, Carrot-Zhang and colleagues present an interesting study of how genetic ancestry may contribute to somatic mutational burden and specific somatic driver mutations in lung cancers from admixed populations (9). This group has assembled an impressive set of samples from understudied populations to address this question. The cohort includes more than 1,100 individuals representing an approximately equal mix from Mexico and Colombia profiled with a combination of sequencing approaches. For most individuals in the cohort, only tumor tissue (no matched normal) was profiled, and these data were used for both germline and somatic variant calling. The authors estimated ancestry of each individual using principal component analysis. They describe a general method to assess associations between ancestry and key driver somatic mutations and apply this approach to their Latin American cohort. They found an anticorrelation between tumor mutation burden and Native American ancestry that appears to depend on EGFR mutation status. They also observed that Native American ancestry is correlated with EGFR driver mutations (but not passenger mutations) and anticorrelated with KRAS and STK11 mutations, a pattern that would seem to mirror previous observations in individuals of East Asian ancestry (10). These correlations with Native American ancestry were observed in both the Mexican and Colombian cohorts and were independent of smoking status. The authors further explored the correlation of local ancestry with specific somatic mutations and developed a polygenic local ancestry risk score. They conclude that genetic loci specific to Native American ancestry modulate EGFR and KRAS mutation status in lung cancers.
They posit that identification of the underlying germline alleles, associated with somatic EGFR and KRAS mutations in these populations, will improve our understanding of the pathway biology and evolution of the subset of lung cancers driven by these alterations.
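To make the notion of a polygenic local ancestry risk score more concrete, the following is a minimal sketch of one common way such a score can be formed: a weighted sum, across loci, of the number of haplotypes (0, 1, or 2) that an individual carries from the ancestry of interest. This is an illustration only and not the authors' actual procedure, which is not detailed in this commentary. The function name, locus names, weights, and sample data are all hypothetical.

```python
from typing import Dict


def local_ancestry_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of per-locus local ancestry dosages.

    dosages: hypothetical map from locus name to the number of haplotypes
             (0, 1, or 2) assigned to the ancestry of interest at that locus.
    weights: hypothetical per-locus effect sizes, for example log odds ratios
             from an association test against a somatic mutation.
    """
    score = 0.0
    for locus, weight in weights.items():
        dosage = dosages[locus]  # raises KeyError if a locus is missing
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage at {locus} must be 0, 1, or 2, got {dosage}")
        score += weight * dosage
    return score


# Hypothetical weights; the sign indicates the direction of association.
weights = {"locus_A": 0.5, "locus_B": -0.25, "locus_C": 0.125}

# Hypothetical individuals with local ancestry dosages at each locus.
individuals = {
    "sample_1": {"locus_A": 2, "locus_B": 0, "locus_C": 1},
    "sample_2": {"locus_A": 0, "locus_B": 2, "locus_C": 0},
}

scores = {name: local_ancestry_score(d, weights) for name, d in individuals.items()}
for name, value in scores.items():
    print(name, value)
```

In this toy setting, an individual carrying more haplotypes of the ancestry of interest at positively weighted loci receives a higher score. In a real analysis the weights would need to be estimated in data independent of the samples being scored.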
This excellent study highlights the need for more research focused on understanding the interaction of somatic variation with ancestry and, more generally, with germline variation. Different ancestral groups may be characterized by both rare and common germline variation. Some rare germline variants may be more prevalent in certain populations due to founder effects or genetic drift. Similarly, certain risk alleles may be linked with ancestry-associated alleles. These three types of variation (somatic, rare germline, and common germline) have to date been studied primarily in isolation from each other, with robust research communities individually focused on: (i) identifying common germline cancer risk alleles and polygenic risk scores from genome-wide association studies (GWAS); (ii) identifying rare germline cancer-predisposing variants in linkage studies; and (iii) conducting surveys of somatic variants driving sporadic cancers. These groups rarely interact, and few approach all three types of variation in a holistic fashion. Groups engaged in germline variant interpretation are generally interested in a single gene/disease etiology and exclusively focus on germline variants, with somatic variant interpretation guidelines in relatively early development stages. Groups focused on GWAS may not have had ready access to somatic tumor tissues. In contrast, groups that completed large somatic mutational landscapes generally had necessary materials to also complete germline variant profiling but perhaps underappreciated its importance in cancer. Initially, these separate efforts may have been driven by a sensible division of labor, playing to individual strengths, with each chasing the low-hanging fruit. For groups outside of large consortia, somatic and germline variant data (or raw data) may not have been readily available. In some cases, this may be driven by patient data privacy concerns, especially related to genome-wide germline data. 
In all cases, cohorts with sufficient ancestral diversity are lacking. The time has come to address these limitations.
Although there are excellent examples of individual cancer and pan-cancer analyses that have systematically integrated somatic and germline variation, some of which are cited above, we can do more to facilitate such studies in the future. Both germline and somatic tumor databases need representation of more diverse populations. We also must make more effort to analyze and make available integrated somatic and germline call sets. It will be particularly important to build larger and more diverse databases of tumor–normal paired samples. Even the excellent study highlighted here, which profiled large numbers of admixed individuals, was likely somewhat limited by primarily profiling only tumor samples. To avoid false positives, the authors took sensible steps to limit their conclusions to known activating mutations in drivers of lung cancer. However, if we are truly to understand the interaction between germline variation and somatic variation in different populations, future studies will need high-quality genome-wide calls for both germline and somatic variants. Only tumors with matched normals can provide this. To this day, the number of comprehensively sequenced tumors with matched normal samples remains relatively small for most tumor types and is severely biased in terms of ancestral diversity. As these datasets grow, we hope that a more holistic approach to understanding the genetic etiology of cancer will become increasingly commonplace.
No disclosures were reported.