Abstract
Cancer development is a complex process driven by inherited and acquired molecular and cellular alterations. Prevention is the holy grail of cancer elimination, but making this a reality will take a fundamental rethinking and deep understanding of premalignant biology. In this Perspective, we propose a national concerted effort to create a Precancer Atlas (PCA), integrating multi-omics and immunity – basic tenets of the neoplastic process. The biology of neoplasia caused by germline mutations has led to paradigm-changing precision prevention efforts, including: tumor testing for mismatch repair (MMR) deficiency in Lynch syndrome establishing a new paradigm, combinatorial chemoprevention efficacy in familial adenomatous polyposis (FAP), signal of benefit from imaging-based early detection research in high-germline risk for pancreatic neoplasia, elucidating early ontogeny in BRCA1-mutation carriers leading to an international breast cancer prevention trial, and insights into the intricate germline-somatic-immunity interaction landscape. Emerging genetic and pharmacologic (metformin) disruption of mitochondrial (mt) respiration increased autophagy to prevent cancer in a Li-Fraumeni mouse model (biology reproduced in clinical pilot) and revealed profound influences of subtle changes in mt DNA background variation on obesity, aging, and cancer risk. The elaborate communication between the immune system and neoplasia includes an increasingly complex cellular microenvironment and dynamic interactions between host genetics, environmental factors, and microbes in shaping the immune response. Cancer vaccines are in early murine and clinical precancer studies, building on the recent successes of immunotherapy and HPV vaccine immune prevention. Molecular monitoring in Barrett's esophagus to avoid overdiagnosis/treatment highlights an important PCA theme. Next generation sequencing (NGS) discovered age-related clonal hematopoiesis of indeterminate potential (CHIP). 
Ultra-deep NGS reports over the past year have redefined the premalignant landscape, remarkably identifying tiny clones in the blood of up to 95% of women in their 50s, suggesting that potentially premalignant clones are ubiquitous. Similar data from eyelid skin and peritoneal and uterine lavage fluid provide unprecedented opportunities to dissect the earliest phases of stem/progenitor clonal (and microenvironment) evolution/diversity with new single-cell and liquid biopsy technologies. Cancer mutational signatures reflect exogenous or endogenous processes imprinted over time in precursors. Accelerating the prevention of cancer will require a large-scale, longitudinal effort, leveraging diverse disciplines (from genetics, biochemistry, and immunology to mathematics, computational biology, and engineering), initiatives, technologies, and models in developing an integrated multi-omics and immunity PCA – an immense national resource to interrogate, target, and intercept events that drive oncogenesis. Cancer Res; 77(7); 1510–41. ©2017 AACR.
Introduction
In 2016, a national commitment to eradicate cancer by investing in greatly accelerating progress within diverse fields of cancer research, including prevention and early detection, was heralded with the creation of the Cancer Moonshot initiative. Funding for this effort (and other cancer-related innovative initiatives, e.g., stem cell research) received strong bipartisan support and was signed into law as part of the 21st Century Cures Act (1). The rate-limiting step for developing and implementing precision prevention approaches has been our limited understanding of precancer biology in contrast to the extensive study of advanced disease. For example, The Cancer Genome Atlas (TCGA), with volumes of omics data from >11,000 patients across 33 tumor types, has transformed our understanding of cancer biology, identifying hundreds of driver mutations, molecular subgroups, immense tumor heterogeneity, and become a tremendous national resource for discovery, catalyzing the development of novel computational tools. In contrast, although the seminal genetic model of tumorigenesis was defined in the colon nearly 30 years ago (2), limited numbers of adenomas have undergone NGS (3). The interaction between immunity and neoplasia is now established as a fundamental principle of cancer development and progression. A diverse array of engineered models, single-cell technologies, and broad disciplines are beginning to be leveraged to probe early oncogenesis and malignant transformation. Large-scale longitudinal and systematic mapping is critical to develop an integrated omics and immune PCA, allowing dissection of the sequential molecular and cellular events that promote oncogenesis, to drive novel prevention and interception (4–6) (Fig. 1).
Figure 1. An integrated multi-omics and immunity PCA. Inherited and acquired molecular alterations and their interaction with the microenvironment and immune system influence oncogenic progression to invasive carcinoma. Three types of at-risk tissue fields exist from a neoplasia perspective: normal, carcinogen-exposed, and germline-predisposed (see text). Normal cells (light orange, far left) that have nuclear or mitochondrial germline mutations (green nuclei) acquire somatic alterations (dark orange), including those due to viral infection (purple). Driven by genomic instability, these events can alter the oncogenic state with loss of cell growth control, immune evasion, and other hallmarks of cancer (6), which can then result in the development of advanced precancers (multicolored cell mass with subclonal diversity/heterogeneity) that immediately precede invasive cancer (red; far right). Molecular alterations, depicted by symbols in the nucleus, include mutations, SNPs, or epigenetic alterations. The accumulation of mutations (e.g., UV, APOBEC3, DNA repair defects) during life creates signatures (shown by the chromosomal insets and colored dots in gradient from cancer to normal; right to left). These mutational signatures, often identified in cancers, may give clues about etiology and precursors. Functional assays in model systems (e.g., C. elegans and mouse embryonic fibroblasts) will help confirm genotoxic exposures and identify new ones. Mixed mutational processes are also active in normal tissues from conception, varying in strength over space and time.
Multi-omic alterations interact (bidirectionally) with the tissue microenvironment of well-established immune cells (tumor-associated macrophages/fibroblasts, MDSCs, NK and cytotoxic/regulatory T-cells), more recently identified and less well understood cells in this context (e.g., adipocytes, myocytes, pericytes, chondrocytes, osteoblasts, neutrophils, fibroblasts, and vascular, epithelial, neuronal, mast and B-cells), microbes, and other cells/events that influence oncogenesis (see text). The continuum between immune state modulated by cytokines and growth factors includes immune surveillance, composed of the antigenic repertoire, innate and adaptive immune cells (upper left box), and immune suppression/escape (upper right box) along with the cells and markers that can lead to immune escape. These complex multi-omic and immunity fields are evolving rapidly, requiring a PCA mechanism/process of continuous updating.
Translating Inherited Risk into Precision Prevention
Germline cancer susceptibility
The genetics of various hereditary forms of cancer risk have been investigated extensively and have long been used to aid our understanding of sporadic neoplasia. Historically, most cancer susceptibility genes were identified through linkage studies in cancer families (7). Furthermore, the biology of germline mutations in certain cancer genes (e.g., BRCA1 carriers) is much better understood than in the somatic setting and is now providing tremendous insight into precision interventions by leveraging the biology of the mutated gene itself. Study of the biology of tumors that develop in BRCA1/2-mutation carriers has led to paradigm-changing precision therapy with PARP inhibitors (7), which have demonstrated preventive activity in BRCA1-deficient mice (8). Similarly, Lynch syndrome is serving as a model of immune oncology for sporadic tumors with high-level microsatellite instability (MSI-H) (9, 10). Understanding the convergence of Wnt and EGFR signaling in FAP, characterized by germline APC mutations, led to a breakthrough trial of combination chemoprevention with erlotinib and sulindac, which reduced duodenal neoplasia, the leading cause of cancer death following standard prophylactic colectomy for this devastating syndrome (11). Multigene NGS is broadening the spectrum of cancer risk linked to various hereditary syndromes, frequently identifying individuals with high-penetrance germline mutations that are unexpected based on clinical history, for example, colorectal cancer (CRC) in BRCA1/2- and TP53-mutation carriers and breast cancer in Lynch syndrome (7, 12, 13). Furthermore, broad NGS identified germline mutations in cancer-predisposing genes in ∼10% of >1,100 pediatric cancer patients, most of whom lack suggestive family histories (14).
A fundamental question regarding inherited cancer susceptibility is why germline mutations, which affect all cells, generally predispose to a particular spectrum and pattern of cancers (7). Although some genes have organ-specific effects (e.g., mutations leading to hepatic overload which predispose to liver cancer), most cancer susceptibility genes have a broad range of functions (e.g., BRCA1 essential functions include homologous recombination-type double-strand break repair [HR-DSBR]). It is unclear, therefore, why different inherited DNA repair defects predispose to specific and different cancers: mismatch repair (MMR) gene defects in Lynch syndrome predispose mostly to CRC and endometrial cancer, and HR-DSBR defects in BRCA1/2-mutation carriers associate mostly with breast and ovarian cancer, whereas others, such as p53 mutations in Li-Fraumeni syndrome, are more general and associated with many different cancers. Even defects in different MMR genes and the location of mutations in the same gene can be associated with differing clinical effects (7, 15). For example, mutations in MSH2 are linked to urothelial cancer, distinct BRCA1 mutations differentially reduce hematopoietic stem cell (HSC) function, and central mutations in BRCA2 confer higher risks of ovarian (vs. breast) cancer and differ by context (e.g., family history, lifestyle, and other modifying factors; refs. 7, 15, 16). Interestingly, most cancer DNA repair genes are germline mutated. The basis of tissue specificity in this context remains a mystery after decades of study and is still under active investigation (16).
BRCA1-mutation carriers
Perhaps the most historically intriguing and enigmatic issue in this field, and a focus of increasingly intensive research, is the fact that BRCA1-mutation/haploinsufficient carriers develop estrogen receptor (ER)-negative cancers in estrogen-regulated tissues, cancers that may be prevented by tamoxifen and related agents (which do not prevent ER-negative cancer in non-BRCA1, high-risk populations). Experimental evidence of tissue specificity in this setting demonstrated that BRCA1-haploinsufficient breast epithelial (but not fibroblast) cells exhibit an increased DNA damage response, genomic instability, telomere erosion, and premature senescence (17). Work on telomere-BRCA1 interactions in genome instability also reports synergy between an ovarian cancer risk SNP in OGG1 glycosylase (base excision repair pathway gene) and BRCA1-mutation carrier status on telomere instability (18), as well as profound alterations in telomere homeostasis in BRCA1-silenced nonmalignant breast epithelial cells (19).
Emerging data are now revealing potential underlying mechanisms, primarily involving hormonal/DNA repair interplay in the early ontogeny of neoplasia in BRCA1-mutation carriers (20). BRCA1-haploinsufficient normal mammary epithelial cells (MECs) exhibit all the usual BRCA1 functions except for stalled replication fork (SRF) repair, which is defective and produces replication stress, among the earliest defects in this setting; this defect can be rescued by reconstituting cells with WT BRCA1 (21). Interacting factors relevant to tissue specificity include BRCA1 haploinsufficiency (heterozygosity) associated with high titers of circulating estrogen and progesterone and end-organ hormonal dysfunction (20). Furthermore, normal BRCA-mutant breast cells upregulate transcription of CYP1A1 (estrogen-metabolizing gene) and CYP19A1 (estrogen-producing aromatase gene), the latter contributing to estrogen levels in normal/benign breast and ovary tissue that are 6–7-fold higher than the already high titers in the circulation (vs. noncarriers). BRCA1-deficient, ERα-negative breast cells were more susceptible to estrogen-induced DNA damage, DSB (via defective SRF repair), and genomic instability (not seen with BRCA2 loss); that is, BRCA1-mutant breast cells have increased rates of estrogen metabolism, and both ERα activation by estradiol and the conversion of estrogen to genotoxic metabolites can cause DSBs (22). Haploinsufficiency in replication stress suppression is also a feature of apparently normal MECs with a mutation in PALB2 (which encodes a BRCA1/2-interacting suppressor protein), which also causes DNA replication and damage response defects (23). Taken together, persistent replication stress in BRCA1-mutant breast cells promotes defective HR-DSBR and genomic instability, which drives later tumorigenic events of p53 mutation and WT BRCA1 loss of heterozygosity.
Bolstering these findings are experiments in which BRCA1 haploinsufficiency: represses PR/ER transcriptional activity, disrupts receptor turnover in normal tissue adjacent to BRCA1-mutant breast cancer and mammary epithelium of Brca1/p53-deficient mice, and cooperates with estrogen signaling to promote ERα-negative mammary tumorigenesis; enables targeted, conditional ERα overexpression (with p53 insufficiency) in MECs leading to ER-negative neoplasia; stimulates estrogen-dependent proliferation of premalignant cells in Brca1-knockout mice; causes impaired ovarian hormone regulation, signaling, and growth factor response and DNA repair in normal human BRCA1-mutant breast tissue and a genetically engineered mouse model (GEMM) with disrupted Brca1 alleles (24); and protects MECs from oxidative stress-induced cell death by estrogen regulation of the NRF2 pathway via PI3K-AKT signaling, which promotes survival of BRCA1-deficient cells in estrogen-responsive tissues (25). The interplay between BRCA1 and ESR1 (which encodes ERα) includes SWI/SNF subunit BRD7 recruiting BRCA1 to the ESR1 promoter to transcriptionally regulate ERα (26) and BRCA1/BARD1-mediated ubiquitination of ERα, which increases estrogen signaling and breast tumorigenesis (27). Genome-wide association studies (GWAS) are also leading to a better understanding of the biology of tumor development in BRCA1 carriers, including identifying SNPs at ESR1 (6q25.1) associated with breast cancer risk and linking low-frequency BRCA1-coding variants to reproductive aging and breast cancer susceptibility, likely mediated by prolonged estrogen exposure/signaling and DNA repair defects (28). BRCA1 mutation has also been associated with accelerated ovarian aging in preclinical, GWAS, and epidemiologic data. Germline BRCA1 mutations can also reprogram breast metabolism towards mitochondrial-dependent biosynthetic intermediates (29), guiding metabolomic-epigenetic prevention strategies.
Finally, a report of ER-negative ovarian cancer found that ERα-dependent estrogen signaling promoted the JAK2-STAT3 pathway, which stimulated immunosuppressive myeloid-derived suppressor cells (MDSCs; ref. 30), providing a provocative perspective into how augmented estrogenic activity could contribute to ER-negative tumor development (e.g., in BRCA1-mutation carriers).
Large randomized, controlled trials of selective estrogen receptor modulators (SERMs; e.g., tamoxifen and raloxifene) and aromatase inhibitors (e.g., anastrozole) show benefit in preventing ER-positive breast cancer (31). Observational studies, however, unexpectedly indicate that tamoxifen can also reduce breast cancer risk, especially contralateral breast cancer, in BRCA1-mutation carriers (32). Further insight into this issue is coming from studies in which tamoxifen: inhibited proliferation in normal breast epithelial tissue from BRCA1 carriers, similar to noncarriers at increased or population risk of breast cancer; prevented mammary preneoplasia in relation to BRCA1 gene dosage (MECs from conditional GEMMs with only one disrupted BRCA1 allele, which is more relevant to early human ontogeny because loss of WT BRCA1 is a late event, were more sensitive to tamoxifen than MECs with both alleles disrupted; ref. 33); and decreased MDSC and increased cytotoxic T-cell infiltration in ER-negative ovarian cancer, whereas the opposite effects were seen in response to estradiol (see above; ref. 30). BRCA1/2 defects have complex immune interactions, including BRCA1/interferon-γ cooperation to activate a subset of innate immune response genes (ref. 34; see Fanconi Anemia and Big Genomics below). Tamoxifen resistance in BRCA1-deficient GEMMs was associated with gene dosage and an immune signature and was reversed by the aromatase inhibitor letrozole (33, 35).
The earliest steps of BRCA1-mutant breast neoplasia may involve defects in lineage commitment influencing tissue specificity (36–38). Elegant studies of luminal progenitor biology (36, 37) have created a transformative potential to prevent/delay BRCA1-associated breast cancer, a disease for which the best current preventive option is prophylactic surgery. A highly proliferative subset of luminal progenitor cells (which lack ERα and PR) constitutively expresses RANK and is hyper-responsive to RANK-L (produced by mature luminal cells that express PR), a key mediator of progestin-driven mammary tumorigenesis; these cells give rise to basal-like breast cancer. In basal cells, progesterone stimulates Wnt to promote basal progenitor expansion, proliferation, and cancer precursor cells. These progenitor cell effects could explain the clinical hormone replacement findings suggesting that progesterone increases breast cancer incidence (31). RANK+ cells were exquisitely sensitive to DNA damage (e.g., stalled fork repair) in the BRCA1 haploinsufficient state, with aberrant activation of RANK and NFκB (via double-stranded DNA break activation of ataxia-telangiectasia checkpoint kinase and NFκB essential modulator) pathways and neoplastic transformation (39). Remarkably, RANK+ (but not RANK−) BRCA1-mutant luminal progenitor cells shared a molecular profile more closely aligned with basal-like breast tumors than any other subtype (36). Analyses of human breast tissue found that cells from the luminal lineage contained precursors to basal-like breast cancer, and targeted loss of BRCA1 in luminal cells (but not basal cells) produced basal-like tumors. Aberrant expression of the transcription factor SLUG can promote cell fate-switching to a basal cell identity in BRCA1-mutation carriers (40). BRCA1 mutation stabilizes the SLUG protein, while mutant Slug in murine MECs prevents tumors. Both the mutation of origin and order of mutations influence the progenitor/preneoplastic path (41).
Pharmacologic RANK-L inhibition or RANK deletion (in mammary epithelium) in several GEMMs markedly inhibits Brca1-driven mammary tumorigenesis (36, 37). High-level RANK-L with nonfunctional BRCA1 drives epithelial-mesenchymal transition with invasion and correlates with high-grade ER/PR-negative disease and mammary stem cell number (in young breast cancer patients; ref. 42). RANK-L/RANK signaling can also influence innate and adaptive immunity (43, 44). RANK-L produced by regulatory T-cells (Treg) promotes mammary cancer and T-cell tolerance to intestinal bacteria. Finally, NFκB hyperactivation in BRCA-mutant breast cancer cells appears to be associated with an immune signature (43). A second RANK-L receptor, LGR4, was recently discovered and implicated in the regulation of multiple developmental pathways. Serum levels of osteoprotegerin (OPG), the endogenous RANK-L inhibitor, are significantly lower in BRCA1-mutation carriers (vs. controls), and levels correlated inversely with germline BRCA1-mutation locations known to confer highest breast cancer risk. OPG levels in premenopausal women are also inversely associated with breast cancer risk (45). Certain SNPs in the TNFRSF11A locus have been associated with increased RANK expression and breast cancer risk in BRCA1 carriers (37). Denosumab (a RANK-L mAb inhibitor FDA approved in 2010 with a well-established safety record for treating postmenopausal osteoporosis and preventing skeletal events in patients with bone metastases, with efficacy and toxicity profiles superior to those of bisphosphonates, which also have immune effects in the breast; ref. 43) blocked progesterone-induced proliferation in BRCA1-mutant human organoids and reduced breast epithelial cell proliferation and progenitor cell clonogenic potential in small pilot window trials of BRCA1 carriers (36, 37).
Also of interest, improved disease-free survival was recently reported in a randomized trial in which postmenopausal women received low-dose denosumab with an aromatase inhibitor as adjuvant therapy for hormone receptor–positive breast cancer. Targeting RANK-L directly is more selective and less toxic than targeting ER or NFκB (which can cause immunosuppression) to prevent mammary cancer in the BRCA1-mutation setting (43). Tamoxifen may further increase the risk of endometrial cancer associated with BRCA1 mutation (46). In contrast, PR modulators (e.g., FDA-approved ulipristal acetate) have the opposite effect, suppressing endometrial proliferation in carriers and endometrial cancer development in mice (24), and can also block RANK-L signaling.
Together, these studies suggest that steroid hormone regulation is perturbed in very early stem/progenitor cells of BRCA1 carriers, creating a milieu in which breast epithelial cells are hyper-responsive to hormonal stimuli before transitioning to a hormone-independent (e.g., NFκB activated) state involving p53 mutation and WT BRCA1 loss. RANK-L inhibition, therefore, should work best to prevent/delay tumor onset for premenopausal BRCA1-mutation carriers based on the above science and recent clinical data indicating that risk-reducing salpingo-oophorectomy is ineffective in this setting (47). Based on these data, denosumab is in late-stage development for a large-scale international breast cancer prevention trial in BRCA-mutation carriers (43), with potential secondary endpoints of ovarian and pancreatic cancer.
Fanconi Anemia (FA) is a rare hereditary syndrome characterized by genomic/chromosomal instability, bone marrow failure, and cancer (7), and is now known to involve 21 genes (including BRCA2/FANCD1 and BRCA1/FANCS). Defects in the FA pathway, which is required for efficient repair of stress-induced DNA damage, can lead to bone marrow failure and have been linked to aging, HSC dysfunction, and reservoir depletion. Defects in BRCA1/FANCD1/BRG1-mediated DNA repair and FANCM-nonsense mutations contribute to hormone-negative breast cancer. Cross-talk between the FA pathway (which repairs DNA interstrand crosslinks and regulates the cellular response to replication stress) and the homologous recombination pathway (e.g., BRCA1/2, which repair replication-associated DNA damage and protect stalled replication forks) is critical for genomic integrity. BRCA1/2-deficient tumors have a compensatory increase in FANCD2 activity, which orchestrates DNA pathway choice at replication forks (48). FA patients have a high risk of immune-related complications (related to B- and T-cell defects) such as infection. Loss of Fancc impairs B-cell function and antibody-secreting cell differentiation in mice through deregulating Wnt signaling. T-cell-specific conditional BRCA2/FANCD1 knockout mice develop T-cell loss and immune dysfunction, which may involve p53 activation (34). Interestingly, p53 downregulates the FA DNA repair pathway. FANCC and certain other FA pathway genes are linked to oxidative stress, cytokine sensitivity, and innate viral immunity. The above immune defects and B-cell dysfunction may account for susceptibility to HPV (and other viral) infections and variable immune responses to HPV vaccines, which have implications for preventive cancer vaccine development. Somatic mutations or epigenetic silencing of FA genes are common in sporadic breast, ovary, lung, and pancreatic cancer and leukemia. The FA pathway has also been linked to mitochondrial dysfunction (49).
Failure to remove damaged mitochondria can lead to increased mt-ROS, genotoxic stress, and tumorigenesis in FA, and somatic FA driver mutations in the sporadic setting (50).
Other hereditary forms of cancer risk
Mouse models of Lynch syndrome have begun to unravel key mechanisms of the CRC predisposition by finding that butyrate, generated by gut microbiota from dietary carbohydrates, can act as an oncometabolite (51). Interestingly, butyrate can have opposite effects in different CRC models, likely reflecting different germline backgrounds; for example, reduced dietary butyrate (and antibiotics altering microbiota) markedly decreased adenoma and CRC development in an ApcMin/+Msh2−/− mouse model. This butyrate effect was seen only in MMR-deficient, not MMR-proficient, mice. Butyrate can suppress carcinogenesis in other colon models (see microbiota discussion later). In ApcMin/+ mice, genetically predisposed to intestinal neoplasia, celecoxib (a COX-2 inhibitor known to reduce intestinal adenoma burden) increased gut Coriobacteriaceae, which suppressed production of oncogenic metabolites (e.g., glycine and serine; ref. 52). CRC development was reduced in germ-free ApcMin/+ mice (compared to conventionally housed controls) but increased after enteral Fusobacterium nucleatum or enterotoxigenic Bacteroides fragilis (ETBF). Fusobacterium species have been discovered to be highly enriched in human colorectal adenomas and potentiate intestinal tumorigenesis in ApcMin/+ mice, possibly through a mechanism involving macrophage recruitment and immune evasion from NK cells (53). In this model, ETBF has been shown to induce T-helper 17 cells in a B. fragilis toxin-dependent manner to promote neoplasia. Furthermore, IL-17 promotes a pathologic myeloid inflammatory signature and colon tumorigenic response to ETBF, whose secreted metalloprotease toxin cleaves E-cadherin and induces epithelial signals to recruit Tregs that allow IL-17 polarization for early transformation in colonized Min mice. IL-17 monoclonal antibody (mAb) and Treg cell depletion suppressed tumorigenesis at the adenoma stage (54; see microbiota section below).
Autophagy is activated in the intestinal epithelium in murine and human CRC and conditional inactivation of Atg7 in intestinal epithelial cells inhibits the formation of precancers in Apc+/− mice, an effect requiring dysbiosis. This study reveals that inhibition of autophagy in cooperation with microbiota may prevent CRC development in genetically predisposed patients (55).
The first real signal of benefit in early detection research in pancreatic cancer, imaging people with high-penetrance mutations, was recently reported (56). Unselected patients may have a very high (>15% in one clinic-based cohort) prevalence of germline mutations (particularly of BRCA1/2 and among Ashkenazi Jewish individuals), most without suspicious family histories (57). Furthermore, precursor lesions in people with high-penetrance germline mutations may have a higher malignant potential (than other pancreas high-risk groups; ref. 58). These data are leading some centers to recommend germline testing for all new pancreatic cancer patients. Precedent for such an approach already exists in NCCN guidelines for ovarian cancer, where a substantial fraction of unselected patients will carry BRCA1/2 mutations (most without consistent clinical/family histories) and screening has limited benefit (59). The development of molecular imaging techniques to detect high-grade lesions (pancreatic intraepithelial neoplasia; PanIN-3) may further improve prevention and early detection of this fatal disease.
Immune interception may be effective in certain inherited neoplasia settings characterized by immunogenic antigen production (Table 1). Cancers that arise in Lynch syndrome with inherited DNA MMR gene defects display MSI-H and widespread accumulation of somatic frameshift mutations/neoantigens (60), thought to underlie the success of immune checkpoint inhibitors in this disease (9, 10). A recent immune profiling study of hereditary and sporadic MSI-H endometrial cancer found that although both had stromal CD8+ T-cells, there were important differences in the microenvironment (e.g., only sporadic cases had PD-L1+ cells; ref. 61). A high degree of MSI in CRC is associated with intense T-cell immunity/infiltrates due to numerous frameshift mutations and truncated proteins. Translating this breakthrough advance in immunotherapy to the prevention setting is supported by evidence of early immunosurveillance (T-cells specific to MSI-related neoantigens; ref. 62) in “healthy” Lynch syndrome carriers. Prophylactic HPV vaccine success could translate to this mutation carrier setting with vaccines targeting predictable frameshift mutation-derived peptides. Immune escape in CRC and adenomas associated with Lynch syndrome (but not with sporadic MSI-H neoplasia) can be due to β-2 microglobulin mutations (see below).
Universal CRC tumor testing for MMR deficiency to screen for Lynch syndrome (LS), a paradigm-changing approach for identifying inherited cancer risk, has become standard practice and the main trigger for confirmatory germline testing. 2015 US estimates are that universal tumor screening will identify ∼21,000 people with LS, producing substantial public health benefits (cancers prevented, lives saved and cost effective). This benefits both the patient and at-risk family members for intensive and early screening and aspirin or potentially other NSAID prevention (63–65). The profound activity of immune checkpoint blockade in MMR deficient tumors (11) has added to the enthusiasm of universal screening in this setting. The estimated prevalence of LS in the general U.S. population is 1:280 (1.1 million). NGS germline testing, however, has created a genetic conundrum and major clinical challenge by identifying and reporting missense, small insertions, deletions, or splice variants of uncertain significance (VUS) in ∼30% (∼300,000 people in the U.S.). As chemo- and immune-prevention continues advancing in this disease, the development of new tools to clarify the functional status of VUS in LS is a pressing unmet need. In silico systems biology analyses have produced encouraging preliminary results (63), but a national concerted effort involving MMR and biochemistry experts will be required to resolve the VUS issue, which is also affecting other germline mutation testing (e.g., BRCA1/2). Leveraging the Lynch syndrome demonstration project recommended by the Blue Ribbon Panel (BRP) for the Cancer Moonshot will be important.
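The population figures quoted above follow from simple arithmetic, which the Python sketch below reproduces. It assumes a U.S. population of roughly 320 million (an illustrative assumption, not a figure given in the text) and reads the ∼30% VUS rate as a fraction of the estimated LS population, which is one possible interpretation. The function name is hypothetical.

```python
# Back-of-envelope check of the Lynch syndrome (LS) figures quoted in the text.
# ASSUMPTION: a U.S. population of ~320 million; this number is not from the
# source and is used only to illustrate the arithmetic.
US_POPULATION = 320_000_000
LS_PREVALENCE = 1 / 280   # estimated LS prevalence (from the text)
VUS_FRACTION = 0.30       # approximate VUS rate (from the text); applying it to
                          # the LS population is an interpretive assumption


def estimate_carriers(population, prevalence):
    """Expected number of carriers for a given population and prevalence."""
    return population * prevalence


ls_carriers = estimate_carriers(US_POPULATION, LS_PREVALENCE)
vus_carriers = ls_carriers * VUS_FRACTION

print(f"Estimated LS carriers: {ls_carriers:,.0f}")
print(f"Estimated people with a VUS: {vus_carriers:,.0f}")
```

Under these assumptions the results land close to the rounded values in the text (about 1.1 million and about 300,000).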
This tumor testing approach is being evaluated in newly diagnosed lung cancer, where the ∼1% of patients with a tumor EGFR T790M mutation have a very high risk of carrying a germline T790M mutation (66). Unaffected germline carriers routinely have lung nodules, even as young nonsmokers, and may benefit from T790M inhibitor chemoprevention. These high-risk families could be a valuable complement to population-based cohorts for studying lung cancer development in nonsmokers (67). Pan-TCGA analysis of tumor sequencing data (>4,000 tumors, 12 cancer types) revealed rare germline mutations (e.g., BRCA1/2, FANCM, MSH6) in 4–19% of cases, depending on cancer type, unselected for family history and associated with increased somatic mutation frequencies (68). Germline variants and somatic events are also intricately linked, as shown by specific haplotypes associated with JAK2 V617F in myeloproliferative neoplasms (MPN; ref. 69) and with EGFR exon 19 microdeletions in non-small cell lung cancer (70), and by germline variation influencing gene expression at breast cancer risk loci (71).
Translating GWAS to the clinic has been challenged by the small effect sizes of risk variants (see below). A recent pan-cancer study (22 tumor types, 5,954 tumors) integrating common genotypes (412 germline loci) with somatic changes found that inherited variation influenced somatic evolution of neoplasia by directing where (organ site) and how (which genes are impacted transcriptionally) cancer develops, highlighting the remarkable prospect of anticipating and intercepting key early events during tumor development (72). This genome-wide analysis of germline-somatic interactions among cancer patients identified common germline alleles that had very large effects on somatic events. New associations included a 15q22.2 allele with >10-fold increased somatic alteration of GNAQ and an intronic SNP in RBFOX1 with a >8-fold increase in the mutation rate of SF3B1, affecting RNA splicing. Interestingly, some frequently somatically mutated genes (e.g., TP53 and PIK3CA) were not found to be influenced by common germline variants. Recent studies, however, found a germline variant that strongly influenced somatic PIK3CA-mutation frequency (73) and an African American-specific SNP in the TP53 gene that impairs p53 tumor suppressor function (74). As broad NGS of blood and neoplasias becomes more widely used and germline-somatic relationships are comprehensively mapped, shared and distinct germline and/or somatic events can be integrated into the PCA and exploited as prevention targets (7, 64).
Table 1. Immunologic features of three distinct germline neoplastic pathways
Germline mechanism of hypermutability | Characteristics of cancers | Somatic mutations, signatures in cancers | Evidence of immunogenic cancer phenotype | Precancer immune biology |
---|---|---|---|---|
DNA MMR mutations: MLH1, MSH2, MSH6, PMS2; BMMR-D | CRC, endometrial, and others with MSI-H; pediatric CRC/adenomas, gliomas, lymphomas in BMMR-D | Hotspot microsatellites in driver genes, e.g., TGFBR2, BAX in LS; POLE, POLD1 in BMMR-D; biallelic MSH3 | MSI-induced FSPs found in associated cancers; PD-1 inhibitors active in LS, BMMR-D, sporadic MSI-H | MSI-H in precancers, e.g., adenomas/crypt foci, IPMNs, DCIS; microbiota; FSP T-cell immunity; B2M immune escape |
BRCA1/2 mutations^a (HR deficiency) | Breast, ovarian, pancreatic, prostate, and other cancers | Specific signatures of tandem duplications and/or deletions | Markers of immunity in BRCA1/2: breast, ovary, and pancreatic cancers | LPs, RANKL/RANK, NFκB, immunity; wild-type loss in some precancers |
APOBEC3A/B chimeric deletion polymorphism | Modest increased breast cancer risk, somatic hypermutation | APOBEC-mutational burden; hotspots within driver genes, e.g., PIK3CA | Cytokine response, immunity genes; penetrance of immune effects appears high | APOBEC3 mutagenesis in sporadic pre-invasive bladder carcinoma and CIN; innate immunity to infection |
NOTE: Immune biology of precancers in three distinct inherited neoplastic pathways: high-penetrance MMR deficiency and BRCA1/2 mutations, and low penetrance polymorphisms of APOBEC3A/B. Cancers associated with all three pathways are associated with distinct, predictable forms of somatic hypermutability and have various features, suggesting an immunogenic phenotype. The PCA will be key to devising immune interception (see text).
Abbreviations: BMMR-D, biallelic mismatch repair deficiency; B2M, β-2 microglobulin; CIN, cervical intraepithelial neoplasia; CRC, colorectal cancer; FSP, frameshift peptide; HR, homologous recombination; IPMN, intraductal papillary mucinous neoplasm; LP, luminal progenitor; LS, Lynch syndrome; MMR, mismatch repair; MSI(-H), (high-level) microsatellite instability.
^a Fanconi anemia pathway (including BRCA1/2) defects associated with impaired immunity (see text).
GWAS have contributed to expanding catalogs of implicated genes and pathways for many multifaceted human diseases and are beginning to shed light on shared and unique etiological and pathological disease components. Combining large-scale GWAS findings across cancer types (breast, ovarian, and prostate) with fine-mapping, pathway analysis, and polygenic risk scores (PRS) uncovered cross-cancer risk loci and has the potential to shed light on shared biology underlying these hormone-related cancers (75). While the low penetrance of most GWAS loci has limited clinical translation, the combined effects of such SNPs in robust PRS (64) may be more useful clinically, for example as modifiers of risk in high-risk BRCA1- and BRCA2-mutation carriers, given that even small relative risk changes translate into large absolute effects. Most general population breast cancer SNPs associated with ER-negative or ER-positive disease are also associated with risk in BRCA1- or BRCA2-mutation carriers, respectively. Similarly, only general population loci (e.g., 19p13.11) associated with high-grade serous ovarian cancer modify risk of ovarian cancer in BRCA1- and BRCA2-mutation carriers, where tumors are of this high-grade subtype (64). The new OncoArray approach to investigate the genetic architecture of common cancers, with dense mapping of loci associated with single or multiple cancers, pleiotropic effects, and cancer-related traits, will also build on first-generation GWAS findings (76). The first genome-wide copy-number variant (CNV) association study of BRCA1 carriers validated a CNV deletion at 19q13.2 associated with decreased ovarian cancer risk, matching a region displaying strong regulatory potential in ovarian (but not breast) tissue (77). Finally, the ability of GWAS to discover novel cancer genes/pathways underlying the observed risk (78) is now being exploited for future drug development or repositioning, aided by experiments in relevant preclinical models. These GWAS-initiated efforts have already led to prevention-relevant drug targets, IL-17 and ESR1 (64).
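The two quantitative ideas in the paragraph above, combining many small-effect SNPs into a PRS and the larger absolute effect of a given relative risk in high-risk carriers, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the SNP names, odds ratios, baseline risks, and function names are all hypothetical, and multiplying a baseline risk by a relative risk is a deliberate simplification of how absolute risk is modeled in practice.

```python
import math

# HYPOTHETICAL per-allele odds ratios for three SNPs; illustrative values only,
# not taken from any study cited in the text.
ODDS_RATIOS = {"snp_a": 1.10, "snp_b": 1.05, "snp_c": 0.95}


def polygenic_risk_score(genotype):
    """Sum of risk-allele counts (0, 1, or 2) weighted by log odds ratios."""
    return sum(math.log(ODDS_RATIOS[snp]) * count for snp, count in genotype.items())


def absolute_risk(baseline_risk, relative_risk):
    """Scale a baseline absolute risk by a relative risk, capped at 1.0."""
    return min(1.0, baseline_risk * relative_risk)


genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
prs = polygenic_risk_score(genotype)
combined_or = math.exp(prs)  # combined odds ratio implied by the score
print(f"PRS = {prs:.3f}, combined odds ratio = {combined_or:.3f}")

# The same relative risk shifts absolute risk far more at a high baseline.
# Baselines are hypothetical: 10% (general population) vs. 60% (mutation carrier).
rr = 1.2
for label, baseline in [("general population", 0.10), ("mutation carrier", 0.60)]:
    shift = absolute_risk(baseline, rr) - baseline
    print(f"{label}: absolute risk change = {shift:.2f}")
```

The additive log-odds form is the simplest common way to build a PRS; published scores typically add calibration and population-specific weights that are omitted here.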
A key challenge is that many GWAS-identified loci are not near known coding or regulatory regions, leaving associated mechanisms and functions unclear. Linking susceptibility variants to their respective causative genes and cell-specific regulatory elements thus remains a high priority in order to realize the full potential of association studies. For example, sections of DNA called “enhancers” are increasingly found to be mutated or disrupted as major causes of human cancer, though they have been difficult to locate. A recently developed computational method greatly improves regulatory element prediction based on tissue-specific local epigenomic signatures, a major advance for cancer genomics (79). Deep characterization of transcriptional regulation of molecular events will greatly advance functional studies of human disease variants, identifying novel disease mechanisms and prevention opportunities. Certain SNPs underlying cancer risk are linked to hypermutability and immune activation (64; Table 1) and intestinal barrier function (discussed below). Germline variants can affect endogenous mutational signatures such as an APOBEC3 copy number polymorphism. An APOBEC3A/3B-deletion allele (discussed more later) confers breast and ovarian cancer risk and innate immunity to infection, suggesting strikingly disparate effects of this SNP on infection, mutagenesis, and cancer (80, 81).
Mitochondrial biology and genetics in cancer predisposition
Mitochondrial DNA (mtDNA) with its mutations and polymorphisms is a relatively underappreciated field in cancer research. Human mtDNA is maternally inherited (and mapped initially from Africa; ref. 82) and encodes 37 genes: 22 transfer RNAs, 2 ribosomal RNAs and 13 protein subunits of the electron transport chain (ETC) complexes and ATP synthase (mtOXPHOS proteins), essential for respiration. There are several mtDNA copies per mitochondrion and hundreds of mitochondria per cell. Generally, neoplastic cells possess functional mitochondria and retain the ability to conduct oxidative phosphorylation. In fact, targeted depletion of mitochondrial DNA can reduce tumorigenic potential in vivo (83). While it has long been known that somatic mtDNA alterations are frequently acquired during oncogenesis (84), emerging data indicate that methylation and inherited mtDNA variants influence multiple innate mitochondrial functions, including reactive oxygen species (ROS) production and redox control, signal transduction and epigenome systems, autophagy, apoptosis, and immunity (82). Furthermore, mtDNA genes are intimately linked with ∼2000 nuclear genes encoding proteins that function within mitochondria, which can produce Mendelian patterns of inheritance.
Inherited deleterious missense alterations in mtDNA genes, such as ND6 and COI (cytochrome oxidase subunit I), which code for subunits of OXPHOS complexes I and IV, have been associated with risks of various cancers. Inherited mutations in the COI gene (associated with increased ROS production, which contributes to tumorigenesis) are associated with prostate cancer risk and may explain increased risks in African American men; certain African mtDNA lineages harbor COI gene variants that may contribute to risk of other cancers among African Americans (85–87). mtDNA variants have been associated with risks of ovarian, bladder, breast, endometrial, and HPV-infection and cervical cancers, among others (82, 84, 88–92). The progression of nonalcoholic fatty liver disease to nonalcoholic steatohepatitis (NASH), which predisposes to hepatocellular carcinoma, has been shown to be associated with a mtDNA SNP in the mt-Atp8 gene (a subunit of OXPHOS V, ATP synthase). This variant has profound effects on hepatic lipid and acylcarnitine metabolism and susceptibility to high-fat diet-induced NASH (93). Generating preclinical models of these and other inherited mtDNA mutations (94) will be critical to probing the contribution of mitochondrial biology to inherited cancer risk.
Deleterious alterations in mtDNA are inherently heteroplasmic (harboring a mixture of mutant and wild-type mtDNA) with high levels of these severe mutations being lethal. Since mtDNA transmission during mitosis is the result of stochastic distribution into daughter cells, milder mtDNA polymorphisms can shift to become predominantly enriched within individual cells and closer to pure mutant (homoplasmy), potentially contributing to neoplastic transformation. The importance of this phenomenon for cancer predisposition has been demonstrated in a pedigree in which a mtDNA complex I ND5 m.12425delA frameshift mutation is present in homoplasmy in a nasopharyngeal oncocytic tumor, which was inherited as a germline mutation, transmitted at lower heteroplasmy levels (5–10% mutant), and thus masked by wild-type mtDNAs. Shift to homoplasmic ND5 mutation occurred exclusively in tumor cells and correlated with lack of the ND6 subunit, indicating that complex I mutations may have a selective advantage. Thus, while the transmission of the mutant mtDNA in this pedigree was phenotypically silent, the chance increase of the mutant mtDNA in somatic cells caused oncocytic transformation (95).
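The stochastic shift from low-level heteroplasmy toward homoplasmy described above can be illustrated with a simple random-sampling model. The Python sketch below is a toy model, not a method from the cited work: it assumes a fixed mtDNA copy number per cell, purely neutral segregation (no selective advantage, although the text notes one may exist), and hypothetical parameter values and function names.

```python
import random


def segregate(mutant_fraction, copies_per_cell, rng):
    """One division: a daughter cell draws its mtDNA copies at random
    from the parent's pool, so its mutant fraction can drift."""
    if mutant_fraction in (0.0, 1.0):  # already homoplasmic; nothing can change
        return mutant_fraction
    mutant = sum(1 for _ in range(copies_per_cell) if rng.random() < mutant_fraction)
    return mutant / copies_per_cell


def follow_lineage(start_fraction, copies_per_cell, divisions, seed):
    """Track the mutant fraction along a single cell lineage."""
    rng = random.Random(seed)
    trajectory = [start_fraction]
    for _ in range(divisions):
        trajectory.append(segregate(trajectory[-1], copies_per_cell, rng))
    return trajectory


# HYPOTHETICAL parameters: 5% starting heteroplasmy (within the 5-10% range
# quoted in the text), 100 mtDNA copies per cell, 300 divisions, 100 lineages.
finals = [follow_lineage(0.05, 100, 300, seed)[-1] for seed in range(100)]
fixed = sum(1 for f in finals if f == 1.0)
lost = sum(1 for f in finals if f == 0.0)
print(f"lineages reaching mutant homoplasmy: {fixed}; lineages losing the variant: {lost}")
```

Even without selection, some lineages drift to pure mutant by chance alone, which is the behavior the pedigree example attributes to somatic cells.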
In contrast to Mendelian genetics, the heteroplasmic mtDNA genotype changes continuously over successive cell divisions, generating cells with varying oncogenic potential (82). Widespread heterogeneity has been reported in the mtDNA of normal human cells, with heteroplasmic variants differing among tissues within the same individual, highlighting the existence of composite mixtures of related genotypes rather than a single genotype. Mechanistic study of the regulation of mtDNA heteroplasmy may yield novel prevention insights. mtDNA is highly polymorphic due to its very high mutation rate, more than an order of magnitude higher than that of the nuclear genome, and harbors functional variants that can be beneficial or deleterious depending on context. A subset of mtDNA variants causes subtle changes in OXPHOS, which can modulate inflammation, stress, autophagy, and oncogenic responses to diet and other factors (82, 86, 96). Functional mtDNA variants that were beneficial (adaptive) in a particular environment increased in number and gave rise to descendant mtDNAs that share the founder's beneficial variant. Migration (or other major changes in diet, stress, etc.) can make previously adaptive variants maladaptive, predisposing to a wide range of common diseases (82).
Altered energy metabolism is now regarded as one of the hallmarks of cancer (6, 83). Recent murine work revealed profound influences of subtle changes in mtDNA genetic background (haplotype/variation) on obesity, aging, and cancer (97–101). Mitochondria are essential for cancer progression; however, their role in tumor development has only recently been confirmed. A murine study highlighted the effects of mtDNA background on tumorigenesis by examining PyMT transgenic mice (inherently predisposed to developing breast cancer) with identical nuclear genomes but varying mtDNA backgrounds. The mtDNA background influenced both the latency and progression of the primary breast tumors (97). This concept was confirmed and extended mechanistically (in different mouse strains), further indicating the substantial influence of mtDNA haplotype on obesity and aging, two major cancer risk factors (100). In normal mice, a mitochondrial catalase transgene (mtCAT) reduces mitochondrial ROS production and somatic mtDNA mutations and increases longevity (98). When the mouse mtCAT transgene was crossed with PyMT, the incidence of invasive breast cancer was greatly reduced (99). Genetic or pharmacologic (metformin) disruption of mitochondrial respiration increased autophagy and prevented cancer development in a mouse model of Li-Fraumeni syndrome. Furthermore, in a pilot study of Li-Fraumeni patients, metformin was shown to decrease tissue mitochondrial activity and reproduced cell-signaling results observed in the mouse model, effects known to give rise to rhabdomyosarcoma, a component cancer of this syndrome (102). Finally, a recent study established a mechanistic link between tumor formation and mitochondrial respiration, and showed that mtDNA acquisition occurs via trafficking of whole mitochondria (103).
Adding to the tissue- and geographic-specific contextual basis of the mt effects discussed above, emerging data also suggest intricate communication between the nucleus and mitochondria (96). For example, nuclear BRCA1 has been found in the mitochondria, where it may play a role in maintenance of mt genome integrity and DNA repair. The beneficial effects of metformin on mtCOI in BRCA1 carriers were discussed above (29). Murine models with germline mutations in the nuclear gene SUV3, which encodes a mtRNA helicase, are characterized by marked somatic mtDNA instability, hypermutability, shortened lifespan, and various cancers, providing a unique model to assess mitochondrial genomic instability in cancer predisposition. Clinical relevance was shown by reduced SUV3 expression in two independent cohorts of human breast cancer (104). Germline mutations in nuclear DNA-encoded SOD (mt antioxidant function) act in the mitochondria, and heterozygous Sod2 mice develop pulmonary fibrosis and lung cancer. Mutations in nuclear DNA genes influencing transformation involve some of the same targets/mechanisms affected by mtDNA: TETs, succinate, fumarate, NRF2, and α-ketoglutarate dioxygenases (83, 96, 97, 104).
Mitochondria may also be intimately involved with T-cell immune surveillance, since T-effector cells are more glycolytic while Treg cells are more oxidative. Within neoplastic cells, glucose is converted to lactate (which promotes inflammation, angiogenesis, and tumorigenesis), thus inhibiting T-effector function. Treg function is enhanced, which further inhibits the T-effector cells, and suggests that immune rejection of neoplasia might be boosted by mild complex I inhibitors such as metformin, whose effectiveness should be increased in neoplastic cells with partial OXPHOS dysfunction (105). Mitochondria can also influence the inflammasome, innate immunity, and the IL-1β and NFκB inflammatory pathways (96, 106). Certain mtDNA alterations modify (lower) risks of breast cancer in germline BRCA2-mutation carriers (107). mt ETC gene variation may help explain breast cancer risk in BRCA1/2-negative women with strong family history (108). Future GWAS integrating both nuclear and mitochondrial data will provide a more comprehensive germline-somatic landscape.
Big Genomics Data of Premalignant Somatic Tissues
The collection and analyses of NGS big data are beginning to provide biological insights into cancer prevention and early detection. It is worth noting that, in addition to comprehensive analyses of “big genomics data,” several recent studies have also examined cancer microbiomes (reviewed in (109, 110)), transcriptomes (111, 112) and epigenomes (reviewed in (113, 114)). In addition to big data generated from somatic sequencing efforts, GWAS has involved over one million cancer patients worldwide across most organ sites identifying ∼3000 cancer-related genetic associations (115) and has been studied in some precancers such as Barrett's esophagus (116–118), colorectal adenomas (119, 120), DCIS (121) and hematologic premalignancies (below). A recently discovered germline mutation in familial Barrett's esophagus may be involved in esophageal maturation, consistent with GWAS findings (118).
The genome of a malignancy can be examined as an archeological record bearing the cumulative imprints of all mutational processes that have been operative throughout the cellular lineage between the fertilized egg and cancer. Each mutational process leaves a characteristic imprint, termed a mutational signature, which can change over time, and almost all mutational signatures detected in a cancer genome have been imprinted during the precursor phase of a cancer cell (122; Fig. 1). Examination of the cancer genomes from >12,000 patients has revealed more than 30 distinct mutational signatures, related to endogenous processes and environmental exposures, such as UV light, aflatoxin, and tobacco (cancer.sanger.ac.uk/cosmic/signatures). Some of these signatures have already been used for identifying the presence of specific carcinogens, including aristolochic acid, one of the most potent known human carcinogens (a chemical present in certain plants still in use today, especially in China), with global public health risks of urologic and hepatic cancers (122). Mutational processes can cause clonal evolution in normal, at-risk tissue, including skin, lung, kidney, brain, breast, fallopian tube, and intestinal epithelium (111, 123–127; see below). Somatic mutation burden was age- and tissue-dependent and much greater in the mitochondrial vs. nuclear genome (123). The recent discovery of a novel mutational process linking cellular lineage and cancer has major implications for the study of field carcinogenesis, prevention, and early detection research, including liquid biopsies (124). It will be important for the PCA to incorporate both premalignant and normal-aged tissues with the goal of identifying relevant mutational signatures that may provide important clues into premalignant oncogenesis, prevention, and interception. However, about half of the currently known cancer signatures have unknown etiology. To address this and identify new signatures, a large-scale initiative recently funded by a UK Grand Challenge Award will use NGS to identify mutational signatures from well-annotated samples from >5,000 cancer patients from five continents, with validation in experimental systems. This work will provide new preventive and global public health opportunities (122).
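To make the notion of a mutational signature more concrete, the Python sketch below shows how single-base substitutions are commonly tabulated before signatures are extracted: each substitution is reported relative to the pyrimidine of the mutated base pair, together with its flanking bases, which yields 96 possible channels. This is a widely used convention rather than a procedure described in the text, and the function names and example calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(context, alt):
    """Map a substitution to a pyrimidine-centred label such as 'A[C>T]G'.

    `context` is the reference trinucleotide (5'->3') with the mutated base
    in the middle; `alt` is the new base at that middle position.
    """
    context, alt = context.upper(), alt.upper()
    valid = len(context) == 3 and all(b in COMPLEMENT for b in context)
    if not valid or alt not in COMPLEMENT or alt == context[1]:
        raise ValueError("expected an ACGT trinucleotide and a different alternate base")
    if context[1] in "AG":  # purine reference: report on the opposite strand
        context, alt = reverse_complement(context), COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# HYPOTHETICAL variant calls: (reference trinucleotide, alternate base).
calls = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
catalog = Counter(mutation_channel(ctx, alt) for ctx, alt in calls)
print(catalog)
```

A per-sample catalog of such counts is the usual input to signature extraction or fitting methods, which are beyond the scope of this sketch.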
Another set of widespread and extensively reported endogenous mutational signatures are those attributed to ectopic activity of the APOBEC family of deaminases (122). Examination of more than 10,000 specimens from 36 distinct cancer types revealed that signatures are found in more than 30% of cancer samples and account for approximately 15% of all somatic mutations across these cancers. APOBEC3B expression has been linked to oncogene-induced replication stress and mutational signatures are especially strong in bladder and cervical cancer where they account for more than 75% of all somatic mutations. In cancers of the cervix and oropharynx, these APOBEC mutational signatures are triggered early by HPV infection (122). A recent study of APOBEC3A/B enzyme structures revealed conserved features that may be useful for designing APOBEC inhibitors that could reduce mutation burden and cancer (e.g., bladder) risk caused by these enzymes (128). The importance of APOBEC3B (A3B) has been questioned by finding A3 signature mutations and cancer (e.g., breast) risk in patients carrying an APOBEC3A/3B polymorphism characterized by a 30-kb deletion that eliminates A3B and creates an A3A–A3B chimera (ref. 81, discussed above). Paradoxically, this A3B-deletion SNP leads to increased A3A activity thought to offset this deletion SNP and underlie breast cancer risk, A3 mutation signatures, germline copy number, and immune activation. A recent report investigating this enigma, however, strongly implicated APOBEC3H haplotype I (and not A3A) and suggested that A3B and A3H-I together explain the bulk of A3 signatures in cancers globally (129).
Multiple mutational signatures reflect failure of different DNA repair pathways. Disruptive germline polymorphisms in MC1R, contributor to phosphorylation of DNA repair proteins, have been associated with increased UV somatic mutation burden and skin cancer risk (130). A specific mutational signature, associated with both germline and somatic BRCA1 or 2 mutations, is related to marked increases in the rate of point mutations (131) and observed in breast, ovarian, pancreatic, gastric, and esophageal cancers (even those without BRCA1 or 2 mutations; refs. 132–137), associated with markers of immunity in subsets of the former three cancers (Table 1), suggesting a potential role for immune-based prevention against such cancers. A mutational signature reflecting the accumulation of unrepaired reactive oxygen species, mainly 8-oxoguanine, has been identified in CRC and adenomas arising in individuals with pathogenic germline MUTYH mutations, which cause base excision repair (BER) defects (138). Additionally, a failure of transcription-coupled nucleotide excision repair (NER) due to ERCC2 somatic mutations in preinvasive bladder neoplasia has been shown to exhibit a specific mutational signature (139). Germline defects in other NER genes can cause Xeroderma pigmentosum, a rare autosomal recessive genetic disorder associated with UV-induced DNA damage, mutational signatures, and very high risk of non-melanoma skin cancer, which can be reduced using bacterial DNA repair enzymes or nicotinamide (140, 141), which can prevent UV-induced immune suppression and enhance DNA repair.
Mutational signatures responsible for the unavoidable background mutation rate in somatic cells have been identified. Notably, two unrelated mutational signatures have been found to act as endogenous mutational “clocks,” characterized by accumulating somatic mutations within all normal somatic cells of the human body with the progression of age (142, 143). One of these mutational signatures has been attributed to spontaneous deamination of 5-methylcytosine in the context of CpG (its rate of “ticking” appears to be influenced by cellular division), while the etiology of the second clock-like signature remains unknown. Interestingly, a recent study demonstrated the increased rate of one mutational clock to be mechanistically linked to tobacco smoking (144). The somatic mutation loads in single-cell lineages provide information about an individual's lifetime history of mutagenic exposure and the impact of intrinsic factors on mutagenesis. Expanding this work to precancers, more cell types and larger populations in the PCA would further refine estimates of the rates of somatic changes in human genomes. Understanding the contributions of environmental and endogenous mutagenic processes to somatic mutation loads is fundamental to develop preventive strategies.
Analyses of omics data from precancers and field defects are beginning to emerge. Despite the relatively small sample sizes within such analyses, cross-sectional, precancer/cancer pair studies suggest that many precancers share genomic alterations with their respective invasive cancers, including prostate (145), ductal and lobular breast cancer (146, 147), pancreas (148), head and neck cancer (149), non-melanoma skin (126), melanoma (150), lung adenocarcinoma (151), and colorectal neoplasia and suggest that early lesions are often polyclonal (3,138). Genomic, transcriptomic, and epigenomic alterations have been detected in normal tissue fields as discussed above (123–127) and precancers. In-depth deep sequencing analysis of ongoing clonal competition in histologically normal-appearing eyelid epidermis (extensive somatic mutations, largely of UV signature) suggested initial exponential growth of a clonal population, due to acquisition of a driver mutation, followed by density-dependent or immune growth constraint (126), an issue relevant to other sites discussed below (e.g., clonal hematopoiesis). Another study demonstrated that actinic keratosis, a skin precancer, was closely related to squamous skin cancer by multiple genomic measures. Analyses of small adenocarcinomas within high-grade adenomas found substantially more chromosome level changes in the cancer and suggested that gain of 20q was associated with transformation, consistent with NGS of FAP adenomas (3). NGS revealed high mutation burdens in morphologically normal prostate tissue distant from the cancer, reflecting clonal expansions, and indicating that the underlying mutational processes in normal tissue were also operative in cancer (145). Extensive data from many sites link transformation with DNA repair defects, genomic instability, and replication stress (152). 
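The clonal dynamics suggested above for normal eyelid epidermis, initial exponential growth followed by a density-dependent or immune constraint, have the shape of a logistic growth curve. The Python sketch below illustrates that shape only; it is not the model used in the cited study, and the function name and all parameter values are hypothetical.

```python
def simulate_clone(initial_cells, growth_rate, carrying_capacity, steps):
    """Discrete-time logistic growth: near-exponential expansion while the
    clone is small, slowing as it approaches the carrying capacity."""
    sizes = [initial_cells]
    for _ in range(steps):
        n = sizes[-1]
        sizes.append(n + growth_rate * n * (1 - n / carrying_capacity))
    return sizes


# HYPOTHETICAL parameters chosen only to show the shape of the curve:
# one founder cell, 50% growth per step, a ceiling of 1,000 cells.
trajectory = simulate_clone(initial_cells=1, growth_rate=0.5,
                            carrying_capacity=1000, steps=40)

for step in (0, 5, 10, 20, 40):
    print(f"step {step:2d}: {trajectory[step]:8.1f} cells")
```

Here the carrying capacity stands in for whatever limits clone size in tissue (space, competition with neighboring clones, or immune control); distinguishing among those would require data the sketch does not model.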
Barrett's esophagus is the best-studied epithelial precancer from an NGS, genomics, transcriptomics, and big data perspective (132, 152–156), with five GWAS (116, 117), including the most comprehensive evaluation of inflammation-related germline variation, which suggested that variants in MGST1 influence disease susceptibility (117). Furthermore, a recent study revealed that gram-negative microbiota in the esophagus produce lipopolysaccharide (a TLR4 ligand), which both primes and activates the NOD-like receptor 3 inflammasome to secrete IL-1β, IL-18, and other proinflammatory cytokines contributing to inflammation-mediated oncogenesis in Barrett's esophagus (157). Analysis of Barrett's esophagus/esophageal cancer pairs revealed that the overall somatic mutation rate (single-nucleotide variations) was lower in Barrett's esophagus than in esophageal cancer but higher than that of several invasive cancers (e.g., breast cancer, multiple myeloma [MM]), and exome sequencing found that most (∼80%) recurrently mutated genes in esophageal cancer were remarkably similar to those in the matched Barrett's esophagus (132). A subsequent study assessed mutation order in the Barrett's esophagus-esophageal cancer sequence (155). Only two genes, TP53 and SMAD4, displayed a stage-specific mutational pattern, with TP53 mutations present in high-grade dysplasia and cancer and SMAD4 mutations only in cancer. Another study (153) performed whole genome sequencing on paired Barrett's esophagus and cancer samples with a Barrett's esophagus component that included biopsies over space and time. The analyses revealed numerous somatic mutations, small insertions, and deletions at all disease stages and that copy number increases appeared to drive neoplastic transformation to esophageal cancer (156). In addition, the results indicate that Barrett's esophagus is polyclonal with evidence of branched evolution co-occurring with distinct localized clones. See longitudinal section below for Barrett's esophagus SCNA data.
Two recent large-scale NGS reports of mtDNA in cancer (>2,000 human cancer specimens, 30 tumor types) identified a mutational signature with unique heavy strand-specific C > T transition. More importantly, this missense mutational signature was considered neutral (analogous to passenger mutations in nuclear DNA), not compromising mitochondrial function (158). One of these studies further refined the mtDNA mutational map with matched transcriptome sequencing (RNA-seq) data (159). Although DNA/RNA allelic ratios generally were consistent, some mutations in mt-tRNAs displayed strong allelic imbalances caused by accumulation of unprocessed tRNA precursors, indicative of impaired tRNA folding and maturation, which underlie a range of diseases. Both studies found a selective pressure against deleterious coding mutations affecting oxidative phosphorylation, indicating that tumors require functional mitochondria. Unexpectedly, known dominant mutagens, such as cigarette smoke or UV light, had a negligible effect on mtDNA mutations. Another new study has reported significant correlations between mtRNA-Seq and mtDNA copy number, with some important exceptions (e.g. MT-ND5 and MT-ND6) (160). mtDNA mutations are frequent in Barrett's esophagus without dysplasia and have been used to characterize genetic lineages that assess clonal relationships, e.g., between Barrett's esophagus and esophageal squamous cells (161). Clonal expansion of mtDNA mutations can result in mitochondrial dysfunction, such as decreased ETC enzyme activity and impaired cellular respiration. NGS of mtDNA of oncocytomas, which are rare benign tumors of epithelial cells defined by excessive amounts of mitochondria, has identified a pathogenic mutation signature that compromises the overall function of the mitochondria, proposed to serve as a metabolic barrier for these benign tumors, and perhaps precancers, progressing to malignancy (162). 
A complex biochemical shift encompassing intra-lesional lactate metabolism helps drive Barrett's neoplastic progression, possibly involving a change in microbiota to predominantly gram-negative flora (163).
In addition to the results above, a systematic approach to classify cancers using transcription profiles at both bulk-tumor and single-cell resolution has been described (see below). These profiles not only provide molecular bases for classifying cancers with shared transcriptional programs across different cancer types but also characterize heterogeneity that exists within individual neoplasias (164, 165). Whole transcriptome profiling using RNA-seq, pathway enrichment, and functional assays of Barrett's esophagus found novel cell-cell communication, with normal epithelial cells sensing and dramatically suppressing dysplastic cell behavior (e.g., motility, signaling pathways). These effects are distinct from the stromal and immune cell microenvironment effects. Downregulated TGFβ, EGF, and Wnt pathways were associated with the differential transcriptional profiles observed in the dysplastic cells in co-culture (normal and dysplastic Barrett's epithelial cells, which often coexist in vivo; ref. 156). Single-cell approaches would allow analysis of different subpopulations of cells within highly variable epithelial, immune, stromal, and other microenvironment cellular components surrounding the neoplasias (Figure 1; refs. 156, 164, 165). Furthermore, oncogenic pathways and developmental- and immune-based gene expression signatures can be used for “pathway/phenotype”-based molecular characterization (166, 167).
Recently, a novel analytic approach to define oncogenic states and produce functional maps of cancer has been established. This serves as a framework for combining experimental and computational strategies to deconvolve oncogenic pathways/signatures derived from oncogene activation into transcriptional components that can be used to determine oncogenic states. By mapping precancers and cancers onto distinct oncogenic states, the resulting functional map can be used to characterize how these states relate to omics features of NGS mutations, copy number alterations, gene and protein expression, gene dependencies, and biological phenotypes; and to predict which interventions are more likely to have a significant effect (168). This approach has been used to effectively map cancers with altered KRAS/MAPK pathways into divergent functional states. Studies in pancreatic oncogenesis highlight the need for big data approaches to interpret elaborate KRAS mutation subtype, Hippo and other pathway interactions, profound effects on cell metabolism, DNA repair, immunity, mitochondrial biology, and distinct precursor pathways (169). Mutant Kras in pancreatic acinar cells induces expression of ICAM-1 to attract macrophages and drive PanIN development: direct early cooperative mechanisms between driver mutations and inflammatory environments (170). Even mast and B-cells can initiate and promote pancreatic tumorigenesis (171, 172). For example, primary human and mouse models found B-cell infiltration in proximity of PanIN lesions, due to stromal secretion of the B-cell chemoattractant CXCL13 (171). Another study found highly expressed HIF1a in PanINs; deletion of HIF1a increased secretion of CXCL13 and related attractants and accelerated malignant transformation. Depletion of B-cells reduced PanIN progression (172). 
These dynamic maps can continually integrate new data, be generalized to consider germline and somatic networks and interactions (173), and dissect the close interchange with the immune microenvironment. Integrating functional genomics data, such as those described above, into the maps of oncogenic states will provide further insight into the cellular contexts in which genomic alterations contribute to malignant transformation.
Epigenetics
Previous work has yielded only a limited big data perspective of the neoplastic epigenome, primarily in hematologic neoplasia, where chromatin modifiers are in general among the most frequently mutated genes in cancer (114, 174). Most studies have focused on performing functional analysis on a few genes in a limited number of samples, as reviewed below. Widespread epigenetic field defects have been observed in apparently normal breast tissue located adjacent to breast cancer (175) and have also been associated with inflammation-related cancers, such as Helicobacter pylori-induced neoplasia (in which AID has been implicated; ref. 127), where NGS has revealed more cancer pathway-related genes affected by DNA methylation than by genetic alterations (176). Field defects (involving methylation of MGMT and ADAMTS14 and cohesin SA1) in normal colon tissue are more frequent in African-Americans than Caucasians, possibly providing insight into CRC racial disparities (177). The ten-eleven translocation (TET) enzymes oxidize 5-methylcytosines (5mCs) and promote locus-specific reversal of DNA methylation (discussed below).
An epigenetic mitotic clock was developed using a novel mathematical approach. A key feature underlying the construction of this clock is the focus on Polycomb group target promoter CpGs, which are unmethylated in many different fetal tissue types, thus allowing definition of a proper ground state from which to assess deviations in aged tissue. By correlating the tick rate predictions of this model to the rate of stem cell divisions in normal tissue, as well as to an mRNA expression-based mitotic index in cancer tissue, this model approximates a mitotic-like clock. The epigenetic mitotic clock-like signature exhibits a consistent, universal pattern of acceleration in cancer and in normal epithelial cells exposed to a major carcinogen. Epigenetic clonal mosaicism is maximal before cancer emerges. Unlike the mutational clock-like signatures discussed above, this epigenetic clock is based on clinical DCIS and lung CIS progression to cancer and normal at-risk tissue; it is a concrete example of a molecular mitotic-like clock that predicts universal acceleration in precancer (178). Smoking was associated with an increased rate of this mutational clock. Another approach to intratumoral heterogeneity analyzed DNA methylation patterns at two genomic loci that were assumed to have no role in gene regulation, in contrast to driver methylation changes. Methylation at such neutral loci was unlikely to be under selective pressure and could therefore serve as a “molecular clock” to measure mitotic divisions based on the higher error rate of DNA methylation maintenance relative to the error rate of DNA polymerase (179).
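The ground-state idea behind the epigenetic mitotic clock can be sketched in a few lines: select CpGs that are unmethylated across fetal samples, then score a tissue by its average methylation at those sites. This is a minimal illustration, not the published model (178); the CpG identifiers, the beta values, and the `max_fetal_beta` cutoff are all hypothetical.

```python
from statistics import mean


def select_ground_state_cpgs(fetal_beta, max_fetal_beta=0.01):
    """Return CpGs whose methylation (beta) is below the cutoff in every fetal sample.

    fetal_beta maps a CpG identifier to a list of beta values, one per fetal sample.
    The cutoff is an assumed value chosen for illustration.
    """
    return [
        cpg
        for cpg, betas in fetal_beta.items()
        if all(b < max_fetal_beta for b in betas)
    ]


def mitotic_clock_score(sample_beta, clock_cpgs):
    """Average beta over the clock CpGs measured in one sample.

    A higher score means a larger deviation from the unmethylated ground state.
    """
    values = [sample_beta[cpg] for cpg in clock_cpgs if cpg in sample_beta]
    if not values:
        raise ValueError("sample has no measurements at the clock CpGs")
    return mean(values)


if __name__ == "__main__":
    # Toy data with invented identifiers and values.
    fetal = {"cpg_a": [0.000, 0.005], "cpg_b": [0.000, 0.200], "cpg_c": [0.002, 0.001]}
    clock = select_ground_state_cpgs(fetal)
    normal = {"cpg_a": 0.02, "cpg_b": 0.30, "cpg_c": 0.04}
    precancer = {"cpg_a": 0.10, "cpg_b": 0.35, "cpg_c": 0.20}
    print(clock, mitotic_clock_score(normal, clock), mitotic_clock_score(precancer, clock))
```

Restricting the score to sites with a near-zero fetal baseline is what lets any later methylation be read as accumulated deviation rather than tissue-specific regulation.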
Aberrations of the epigenetic modulator TET2 are among the first alterations identified in several hematologic premalignancies. The TET family of proteins was originally named for the TET1 fusion partner in mixed-lineage leukemia (MLL)-rearranged AML (see “biochemistry” subsection below); however, this translocation is rare, and its significance in leukemogenesis is unclear. In contrast, TET2 mutations are found in premalignant HSCs and aged healthy individuals (180) with propensity to transform (see clonal hematopoiesis below). Disruption of TET2 in mouse models increases HSC proliferation and clonal expansion, yielding clones prone to the additional oncogenic events generally required for malignant transformation (177). Mouse models found that co-occurring Tet2 disruption with Asxl, Ezh2, or Jak2 V617F results in MDS or MPN. Recurrent dominant point mutations in IDH1 and IDH2 are early events in some hematologic neoplasias that lead to loss of TET activity and other epigenetic changes (181). In addition, TET2, IDH1, and IDH2 mutations are frequently observed in lymphoma precursors (182–184), and the frequency of TET loss-of-function (which can drive hematologic transformation) in these settings supports testing TET activators. TET modulators (185) can enhance antigen presentation, increase IL-6 production by macrophages (186), affect Tregs (187), and alter expression of endogenous retroviruses, cancer testis antigens, and stem cell antigens in premalignant lesions, resulting in enhanced immunogenicity.
Another epigenetic mechanism found to be important in premalignant biology involves RNA editing by ADAR enzymes (188–190), which results in adenosine-to-inosine conversion of RNA thereby inducing virtual adenosine-to-guanine mutations (since inosine bears molecular resemblance to guanine; ref. 189). Depending on whether the editing events occur in coding regions or 3′ UTRs, ADAR-mediated editing of mRNAs can result in post-transcriptional protein coding mutations or altered susceptibility to microRNAs (190). Germline variation of ADAR genes may influence ovarian cancer susceptibility (188). ADAR1 editase activity has been implicated in the oncogenic transformation of premalignant progenitors that harbor clonal self-renewal, survival, and cell cycle-altering mutations (191, 192), such as in hepatocellular carcinoma precursors, where aberrant RNA editing of AZIN1 has been found to be a key oncogenic driver (189–193). ADAR1 is associated with CIN development in transformation (194) and appears to play a critical role in cancer stem cell-related diseases (195). Inflammatory cytokine networks and JAK2/STAT signaling activate ADAR1 during relapse/progression in leukemia stem cell renewal, linking RNA editing to the development of innate immunity and potential preventive activity (112). Finally, study of ADAR1 regulation of APOBEC3 in neoplasia will be critical, potentially suppressing hypermutation and immunity (196).
Emerging data also suggest that some premalignant lesions may progress to cancer via fundamental epigenetic/transcriptional reprogramming to a progenitor-like state required for driver mutations to induce tumorigenesis (197). The role of BRAF mutations in benign nevi is a major historical conundrum (198). BRAF V600E and other mutations in the MAPK pathway are very early events in melanoma development and are substantially enriched in benign lesions. Alterations in other common driver genes, such as CDKN2A loss, TERT promoter mutations, or mutations in the SWI/SNF chromatin modifiers ARID1A, ARID2, or SMARCA4, are found only in intermediate or later stages of disease. In the BRAF V600E zebrafish model of melanoma, deletion of p53 promotes the nevus-to-melanoma transformation, but melanomas remain surprisingly infrequent considering that all of the cells bear both the oncogene and tumor suppressor loss (199) – a feature that replicates the phenomenon of “field defect” in human tumors. Studies in BRAF V600E/p53-null zebrafish now suggest that initiation of malignant transformation within such a “cancerized field” requires fundamental epigenetic reprogramming of these premalignant cells into an embryonic state via transcription factor-mediated reactivation of genes typically expressed only in neural crest progenitor cells (197). This reprogramming involves binding of multiple transcription factors and generation of “superenhancer” regions. Engineered models and epigenetic mechanistic work suggest key roles of p15 loss and autophagy (overcoming senescence) in promoting BRAF V600E-mutant transformation to melanoma (200). Deletion of Atg7 inhibits tumorigenesis, likely via a mitochondrial mechanism (201).
Similarly, mouse model research demonstrated that basal cell carcinomas, known to be driven by oncogenic signaling in the hedgehog pathway, only originate from stem cells located in very specific areas of the murine epidermis, which are reprogrammed back into a more progenitor-like state before progressing to cancer possibly involving EMT (202, 203). In the pancreas, the transcription factor KLF4 is a ductal fate determinant, inducing cellular identity change to a duct-like lineage and initiating acinar-cell reprogramming during early premalignancy (204). In Kras GEMM, Klf4 deletion dramatically reduced PanIN number, while Klf4 overexpression did the opposite. Acinar cell reprogramming can also involve Notch, YAP, TAZ, and c-MYC. Genetic ablation of Smoothened in stromal fibroblasts in a Kras GEMM increased the early premalignant acinar-to-ductal metaplasia and PanINs (205). Mutant Kras cooperates with impaired JNK signaling and induces mitochondrial oxidative stress in acinar cells to accelerate PanIN development. Like the zebrafish melanoma model, these experiments provide further evidence that the earliest stages of tumorigenesis are characterized by reprogramming to a more embryonic cell state. Such data suggest that tumor-initiating cells can be identified – and potentially targeted for early destruction – through their ability to reactivate an “embryonic” epigenetic state, highlighting one of the PCA's key missions: probing premalignant cells and model systems to better understand when epigenomic changes arise and how stable they are over time.
The Power of Immunology and Biochemistry
Immune oncology
The integration of multiple omics analysis platforms with immune-informatics analysis can be the foundation of a more effective framework for precision prevention. There is now a wealth of evidence from both animal models and cancer patients of how the immune system can survey and recognize peptides encoded by certain genetic mutations when such peptides are presented on the surface of the cancer cell bound to MHC-Class I and Class II molecules. Immunogenic neoantigens, including oncogenic drivers, may also be targets of immunosurveillance (206, 207). For example, T-cells specific to mutated RAS peptides have been found in cancer patients and may be a viable target for immune approaches to treatment and even prevention. Proof-of-principle experiments of vaccine targeting mutant Kras (with Treg depletion) in a pancreas mouse model induced CD8+ T-cells specific for the Kras mutation and showed preventive efficacy in the early PanIN setting (207). In addition to KRAS and other predicted mutations in known oncogenes (e.g., PI3K, β-catenin, and Myc), cancer cells and their precursors can harbor tens to hundreds of random mutations throughout their genome. GEMM studies have identified mechanisms beginning with these genetic drivers and other somatic alterations, which progressively recruit inflammatory/immunosuppressive cells and induce changes in normal cells to create and interact with the premalignant tumor microenvironment (TME) to promote oncogenesis and immune evasion (see Kras map above in Big Genomics). This complexity is highlighted by some preclinical data, surprisingly suggesting that immunogenicity of premalignant lesions can cause early immune suppression/tolerance (206). These studies are currently limited in humans, however, to only a few lesion types and patients. The role of tumor suppressors (e.g., p53, ARF, Rb, PTEN) in immunity is strongly linked to maintenance of genomic integrity. 
p53 can modulate innate and adaptive immunity, including macrophage function and production of certain cytokines, chemokines, and ICAM1. p53 can also influence various immune checkpoints, e.g., DD1α is a direct transcriptional target of p53, while p53 regulates PD-L1 via miR34, which directly binds to the 3′ UTR of the gene encoding PD-L1 (208).
Vaccine-based approaches hold particular promise since they are a form of precision prevention with few side effects. Viral vaccines (e.g., to HPV) given before exposure can provide long-term protection from cancer development after only one or two treatments (140). HPV vaccines have modest activity, however, in treating chronic viral infections and related precancers, and far less activity in treating established cancers. Therapeutic vaccine studies targeting viral E6/E7 oncogenic drivers in CIN2/3, including results from single-cell T-cell receptor (TCR) sequencing, suggest that inducing efficient trafficking of functional effector T-cells to the epithelial disease site is critical to eliminate both CIN and virus (209). HPV induces APOBEC3B (which mutates chromosomal DNA) and tumor-associated stromal fibroblasts (see APOBEC above). The mechanisms of viral immune evasion vary by pathogen (seven viruses have been linked with human neoplasia) but are strikingly similar to those used by non-viral neoplasias. HPV examples of immune evasion include: E6 inactivates p53, which induces PD-L1 (as above) and cervical Tregs, and integrates into/disrupts the PD-L1 3′ UTR enhancing PD-L1 expression (210) and certain HLA genes have been linked to infection and CIN transformation, and immune evasion (209, 210). Increased diversity/dysbiosis of vaginal microbiota combined with reduced abundance of Lactobacillus spp. is involved in HPV acquisition and persistence in the evolution of cervical precancer and cancer (211). These provocative data suggesting that vaginal microbiota may influence the host's innate immune response, susceptibility to infection and neoplasia, will require future longitudinal, multi-omic studies (212) as proposed in the PCA to prove causality.
Evidence for immune surveillance has been reported in healthy people and associated with lowered lifetime cancer risk. Childhood febrile viral infections have been associated with reduced cancer risk, consistent with an influenza mouse model in which the virus infection elicited protective antibodies and T-cells specific for host and some tumor-associated antigens (213). These data suggest that infection-induced immunity and immune memory could provide long-term immune surveillance of cancer, important properties for vaccine targets. T-cells are likely the main effector cells in preventing most forms of cancer. The immune system has the ability to recognize precancers and generate immune responses to potentially intercept and prevent cancer in the elimination phase of immunoediting (58, 62, 214). Precancers that are not eliminated proceed to the equilibrium phase, in which the immune system holds the lesion in check; avoiding immune elimination is a hallmark of cancer (6). We must learn how to both strengthen T-cell immunity – either through immunization, drugs, or engineering – and concurrently overcome a dynamic hostile tumor microenvironment that prevents T-cell activation and infiltration into early neoplasia. The latter involves multiple factors, including intricate signaling networks and metabolic reprogramming of the microenvironment: high utilization of extracellular glucose and glutamine results in extracellular lactate, which attenuates dendritic and T-cell activation, stimulates macrophage polarization to an M2 state, induces VEGF secretion by stromal cells, and activates NFκB. The microenvironment can in turn have profound effects on the metabolism of neoplastic cells (83). Emerging data suggest that microenvironment barriers develop early in precursor lesions but are likely qualitatively different from more established cancer-associated barriers.
The progressive accumulation of somatic changes that leads to neoplasia also co-opts neighboring vascular, neuronal, and other normal cells to support/promote oncogenesis by subverting the immune system to escape detection and elimination.
Remarkable new data revealed that high-level arm- and whole-chromosome SCNAs drive immune evasion (215). This unanticipated effect is in striking contrast to mutational burden effects on immunity, and can override immune response even in high mutational/neoantigen burden settings (e.g., MSI-H/MMR-deficient tumors; refs. 9, 10), revealing an increasingly intricate interplay between SCNA levels, gene alterations, and immunity. Critical to vaccine development, therefore, is the identification of potent immune enhancers/adjuvants (e.g., metformin, toll-like receptors [TLRs], STING, stromal inhibitors) that can specifically target one or more innate pathways and alter the developing inflammation that promotes immune suppression. For example, cancer vaccines can increase tumor-infiltrating CD8+ T-cells, but these cells produce interferon-γ, leading to upregulation of PD-L1 and other immunosuppressive pathways. Furthermore, experience with therapeutic cancer vaccines shows that targeting a single antigen or a single mutated peptide invariably leads to outgrowth of cancer cells that have lost that mutation (5). This may happen in the precancer setting as well, requiring a vaccine that elicits a polyclonal and polyspecific immune response and is monitored by T-cell receptor sequencing to assess clonality and clone expansion, liquid biopsy detection of low levels of specific mutations or mutational load, SCNAs and imaging of the microenvironment, immune response (216, 217), and high-grade precancer (e.g., PanIN-3; ref. 218).
In a clinical feasibility trial of advanced adenoma patients, lack of immune response to a MUC1 cancer vaccine correlated with increased levels of circulating MDSCs responsible for inhibiting adaptive immunity (219), suggesting that these may be useful biomarkers to identify individuals unlikely to benefit from preventive cancer vaccines. As above, research into drugs that could help overcome such immune resistance will be critical. Metformin has many key properties in this context, enhancing T-cell immunity and immune memory, and influencing the microbiome, by affecting mitochondrial biology and RANK-L (see above; refs. 29, 220–222). Furthermore, prospective cohort data suggest that aspirin prevention of CRC is related to its effects on T-cell immunity (223). A large study of adenomas reported a highly inflammatory microenvironment (224), which varied by histology and location (225). Two NGS reports of inflammatory bowel disease (IBD)-associated CRC suggested that, compared to their sporadic counterparts, IBD-associated CRCs have a distinct mutational profile associated with cell-to-cell signaling, cell adhesion, and epigenetic regulators/chromatin modifiers, all linked to IBD inflammatory mediators (226, 227). IDH1 mutation (extremely rare in sporadic CRC) was found only in Crohn's-associated CRC. Because IBD is a precursor to CRC, ongoing IBD microbiome studies will continue to inform the biology of cancer development, as shown by recent findings of immune-protozoa-microbiota interactions in colitis and CRC (228).
Every human body contains ∼40 trillion microbes (microbiota), which have exponentially more genes (the microbiome) than do human cells; 99% of microbiota reside in the gut. The roles of colonizing viruses, fungi, and parasites are reviewed elsewhere (228, 229). Insights from ongoing cancer research are elucidating the interplay and crosstalk between microbiota, innate immunity (myeloid and lymphoid), genetics, environmental exposures, diet, drugs, and lifestyle, which centers on the metabolites generated from host and multibiome activities and interactions (109, 110). Gut microbiota influence the shape and quality of the immune system, the immune system guides the composition and localization of the microbiota, and both can influence oncogenesis. Transcriptional reprogramming through epigenetic modifications is a prominent mechanism by which microbiota influence innate immunity. Conflicting clinical results of high-fiber diets have not controlled for microbiota, which ferment fiber into butyrate and other short chain fatty acids (SCFAs) that have energetic (via β-oxidation in the mitochondria) and epigenetic functions in colonocytes, histone acetylation, and tumor suppressor effects. The microbiota-epigenetic crosstalk can be illustrated when histone deacetylase 3 (HDAC3) is specifically deleted from intestinal cells: gene expression is massively altered and the integrity of the epithelial barrier is lost. Microbiota undergo rhythmic oscillations in composition and function and induce TLR signaling in intestinal epithelial cells to drive hormone production and metabolic activity through a coordinated circadian clock (110). Intestinal barrier function (and bacterial translocation) is also regulated by GWAS-identified laminin nuclear and mtDNA variants (119, 120) and inflammatory cytokines, such as IL-1β and IL-18 (230), autophagy (96), p53 loss, and carbohydrates (which affect gut mucous layer and microbiota spatial organization; ref. 231).
A mouse model of impaired intestinal barrier function, where immune cells are exposed to the microbial product lipopolysaccharide, favors intestinal tumorigenesis through the action of IL-23/IL-17. Low-grade gut inflammation due to microbiota weakens epithelial tight junctions and may contribute to the cancer risk factors obesity and Type 2 diabetes (232).
Intricate community structures, such as matrix-enclosed polymicrobial biofilms, can protect luminal microbiota from host factors and antibiotics, and are immunogenic (233). New metabolite imaging techniques, such as nanostructure imaging mass spectrometry, have identified polyamine metabolites that can enhance the activity of SLC3A2, which imports nitric oxide and can induce biofilm formation. The metabolome reflects both host and microbial activities. New tools are needed to assign strain specificity and assemble individual genomes from hybrid capture and single-cell approaches to isolate and sequence rare species and single microbial cells (110, 234). Gut microbiota and many other factors (from germline to diet to metabolite, etc.) can intrinsically affect the epithelial/mucus barrier, including a protective role against CRC with barrier maintenance, largely through pattern recognition receptors (e.g., TLRs and NOD-like receptors), mucin glycan composition, and production of protective metabolites, such as SCFAs (e.g., butyrate can suppress colon tumorigenesis, mediated by Gpr109a and discussed earlier). Nod2 loss (a mouse model of innate immune deficiency) disrupts the gut microbial balance (dysbiosis), induces IL-6, and gives rise to colitis-associated carcinogenesis in mice (110). Thymic stromal lymphopoietin (TSLP) is a cytokine expressed mainly by epithelial cells at barrier surfaces (skin, gut, and lung). Extensive sequencing and computational tools were used to study skin metabolites and microbiota, producing a molecular map of the skin surface and identifying chemical drivers of skin cancer (235). Short-course calcipotriol, a topical TSLP-specific inducer FDA approved for psoriasis, durably suppressed skin cancer development in GEMMs consistent with an immune memory response and markedly reduced actinic keratosis (mediated by T-cell adaptive immunity) in a randomized clinical trial (236).
A surprising source of exogenous DNA damage (double-stranded breaks) is microbiota such as H. pylori, which acts via the NER pathway to cause genomic instability (4-fold increase in mutation frequency in mice) of gastric nuclear and mitochondrial DNA, intraepithelial neoplasia, and adenocarcinoma (237); in addition, Mycobacterium tuberculosis can inhibit T-cell activity. This immunosuppressive effect may be similar to the potential immunosuppressive effect of F. nucleatum (and its virulence factor FadA, which modulates E-cadherin/β-catenin signaling and increases colonic permeability) in mice, recently validated clinically in CRC. The F. nucleatum protein Fap2 can mediate F. nucleatum attachment to and invasion of adenomas and CRC, as well as immune evasion. Mucosal microbial communities show distinct alterations across stages of colorectal carcinogenesis and seem more involved in the conventional chromosomal instability pathway than in serrated precursors. F. nucleatum levels have also been associated with MSI and colonic tumorigenesis (238). Correlations of bacterial taxa indicate early signs of dysbiosis in adenoma, and co-exclusive relationships are subsequently more common in cancer. Microbiota influence lesion localization, with proximal (right-sided) colon tumors much more likely to be associated with biofilm-producing bacteria than distal (left-sided) tumors. Emerging data suggest a link between ETBF, IBD, and CRC. ETBF can trigger IL-17-dependent colon tumorigenesis characterized by immune infiltrate and myeloid signature (239). Depletion of Tregs in a GEMM enhanced colitis but suppressed tumorigenesis, associated with a shift in the mucosal cytokine profile from IL-17 to interferon-γ. Impaired epithelial barrier function, induced by alarmins, recruits and activates Tregs.
It has been shown that gut microbes and metabolites modulate whole host immune and hormonal factors influencing the fate of distant precancers (e.g., breast), possibly by interacting with broader systemic microbial-immune networks.
Leveraging the recently launched US National Microbiome Initiative could help identify potential microbiota-related prevention factors including lifestyle (108, 232, 240), antibiotics, diet, and microbial reprogramming (241, 242). Diets rich in whole grains and fiber are associated with a lower risk for F. nucleatum-positive (but not -negative) CRC, supporting a potential role for intestinal microbiota in regulating host immunity mediating the association between diet and colorectal neoplasia (243). High-protein diet can reduce beneficial microbiota and metabolites, downregulating immune protection (110). High-fat diet was found to induce intestinal progenitor cells to adopt a more stem cell-like fate, increasing tumor incidence. These effects were not the result of obesity but were caused by certain fatty acids in the diet (244). In contrast, calorie restriction has the opposite effect, associated with reduced tumor initiation. Short-term diet changes in rural Africans and African Americans resulted in large changes in bacterial species, metabolites and cancer risk (241). Gut microbiota-induced metabolites and components were recently shown to promote obesity-linked immune escape in hepatic tumorigenesis (245). Finally, bacterial composition in Barrett's esophagus is dynamic and associated with genomic instability (246).
Analyses of NGS genomic data are critical to develop vaccines that target specific epitopes derived from mutations, copy number alterations, or other variants common to precancers. However, this direct strategy is especially challenging given the large number of alterations, low penetrance of driver mutations, so-called “long tail” problem (low frequency mutations; ref. 247), and corresponding mutant peptides, which do not lead to effective antigen presentation and response. Mass spectrometry-based analysis and immunogenic assays in mice (248), coupled with precursor NGS genomic data, will help select better precancer vaccine targets. Computational methodologies using existing resources and databases that catalog potential antigens (249) and machine-learning classification approaches to predict peptide binding affinity to HLA I and II have also been developed (250, 251). Computational analyses of neoantigen-epitope prediction algorithms have shown that only a very small proportion of predicted neo-epitopes are actually presented on MHC class I as targets of endogenous T-cell responses (10, 252, 253). Analyses of NGS genomic data from precancers could use two strategies to nominate candidate neoantigens: 1) mutation-calling algorithms to identify frequently occurring antigens and 2) for the low frequency events, functional maps described above to identify complementary antigens that associate with oncogenic states (168). These filtered lists of neoantigens could be used to predict/prioritize T-cell neoepitope candidates/vaccine targets in silico using algorithms and analytic pipelines based on stabilized peptide p–MHC-I binding affinity (252, 254), although their utility to predict MHC class II antigens, gene fusions, splicing variants, and posttranslational modifications is limited, since they rely on whole-exome sequencing (which is best for point mutations and small insertions/deletions).
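The two-strategy nomination described above, followed by in silico prioritization on predicted binding affinity, can be sketched as a simple filter-and-rank step. This is a hedged illustration only: the `Candidate` fields, the peptide labels, the frequency cutoff, and the 500 nM affinity cutoff are assumptions made for the example, and the affinities would in practice come from an external prediction tool that is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str             # placeholder label, not a real sequence
    lesion_frequency: float  # fraction of precancers carrying the alteration
    affinity_nm: float       # predicted peptide-MHC-I affinity (lower is stronger)
    state_associated: bool   # linked to an oncogenic state on a functional map


def nominate(candidates, min_frequency=0.10, max_affinity_nm=500.0):
    """Nominate candidates by either strategy, then rank by predicted affinity.

    Strategy 1 keeps frequently occurring alterations.
    Strategy 2 rescues low-frequency alterations tied to an oncogenic state.
    Both cutoffs are hypothetical defaults for illustration.
    """
    nominated = [
        c
        for c in candidates
        if c.lesion_frequency >= min_frequency or c.state_associated
    ]
    binders = [c for c in nominated if c.affinity_nm <= max_affinity_nm]
    return sorted(binders, key=lambda c: c.affinity_nm)


if __name__ == "__main__":
    pool = [
        Candidate("PEP_A", 0.30, 120.0, False),  # frequent, predicted binder
        Candidate("PEP_B", 0.02, 45.0, True),    # rare but state-associated
        Candidate("PEP_C", 0.01, 30.0, False),   # rare, no state link: dropped
        Candidate("PEP_D", 0.25, 2500.0, False), # frequent, weak binder: dropped
    ]
    print([c.peptide for c in nominate(pool)])
```

Keeping nomination and affinity ranking as separate steps makes it easy to swap in a different predictor or cutoff without touching the nomination logic.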
Novel approaches using T-cells from healthy donors may broaden neoantigen-specific immune responses (255). Leveraging the PCA to develop high-throughput approaches and improved predictive algorithms to identify (and quantify) immunogenicity of the precancer antigenic repertoire will be critical to identifying potential immunosurveillance and vaccine targets (256–258).
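The two nomination strategies described above can be sketched as a simple partition of called mutations by how often they recur across precancer samples. This is a minimal illustration, not the pipeline used in the cited work: the function name `partition_by_recurrence`, the recurrence cutoff, the input format, and the toy sample and mutation labels are all assumptions made for this example.

```python
from collections import Counter


def partition_by_recurrence(calls_per_sample, min_fraction=0.10):
    """Split called mutations into recurrent and long-tail sets.

    calls_per_sample: dict mapping a sample ID to the set of mutation
        labels called in that sample (hypothetical input format).
    min_fraction: assumed cutoff; a mutation seen in at least this
        fraction of samples is treated as "frequently occurring".
    """
    n_samples = len(calls_per_sample)
    if n_samples == 0:
        return set(), set()
    counts = Counter()
    for mutations in calls_per_sample.values():
        counts.update(set(mutations))  # count each mutation once per sample
    recurrent = {m for m, c in counts.items() if c / n_samples >= min_fraction}
    long_tail = set(counts) - recurrent
    return recurrent, long_tail


# Toy, made-up calls for illustration only (not real data).
toy_calls = {
    "lesion_1": {"KRAS_G12D", "GENE_A_x"},
    "lesion_2": {"KRAS_G12D", "GENE_B_y"},
    "lesion_3": {"KRAS_G12D"},
    "lesion_4": {"GENE_C_z"},
}
recurrent, long_tail = partition_by_recurrence(toy_calls, min_fraction=0.5)
```

In this sketch the recurrent set would feed strategy 1 directly, whereas the long-tail set would still need functional maps or binding-affinity prediction before any candidate could be prioritized.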
As noted above, the MHC/HLA complex (located on 6p21) is critical to immune oncology and vaccine development. HLA loci are highly polymorphic and implicated in cancer risk by multiple studies in a variety of tumor types, including via search of the NHGRI GWAS catalog (259). Highlighting the importance of biochemistry in this context, a recent analysis found somatic mutations that disrupt β-2 microglobulin (a component of class I MHC located on chromosome 15) protein-protein interactions, with a striking enrichment for mutations at protein interfaces involving β-2 microglobulin's binding partners (260). It has been shown that disruption of β-2 microglobulin can minimize immunogenicity of human embryonic stem cells (261). Such mechanisms may be employed by precancers and cancers to escape immune surveillance (262, 263). In human Lynch syndrome-associated CRC, β-2 microglobulin mutations and, accordingly, HLA class I loss or downregulation are frequent (up to 40%), most likely resulting from an active immune selection/editing process (264). FOXP3-positive T-cell infiltration was significantly lower in normal mucosa adjacent to mutant (mt)-β-2 microglobulin tumors than in mucosa adjacent to WT-β-2 microglobulin tumors, suggesting that in the absence of Tregs, the outgrowth of less immunogenic mt-β-2 microglobulin tumor cells is favored (see above).
Biochemistry: understanding the molecular basis of neoplasia
TCGA and related studies have demonstrated that a large number of genetic and epigenetic factors, such as chromatin modifiers and remodelers, are highly mutated in a large number of solid tumors and in hematologic malignancies. Recurrent mutations in genes that encode regulators of chromatin structure and function highlight the central role that aberrant epigenetic regulation plays in the pathogenesis of these neoplasms. Deciphering the molecular mechanisms for how alterations in epigenetic modifiers, specifically histone and DNA methylases and demethylases, drive hematopoietic transformation could provide avenues for developing novel targeted epigenetic prevention for hematologic neoplasia and could also inform future research in solid tumors. Many such protein complexes – including the MLL family and polycomb components PRC1 and PRC2, which contain EZH2, ASXL1, and BAP1 (265), and the SWI/SNF (266) – contain genes that are frequently mutated in human cancers but were initially identified in simple model systems, such as Drosophila and yeast, emphasizing the importance of model organisms in any large-scale efforts in cancer prevention. While genomic deletions and nonsense, frameshift, and splice site mutations that introduce a premature stop codon or alter protein structure can be obvious loss-of-function events, missense mutations can be hard to classify unless they alter enzymatic function or disrupt protein–protein interfaces.
A large number of hematologic malignancies harbor translocations of the N-terminal region of MLL1 to diverse fusion partners that share very little sequence or functional similarity. To understand how these diverse MLL translocations result in leukemogenesis, biochemical and enzymological studies were essential. First, MLL and its yeast homolog SET1 have been shown to be present in COMPASS (Complex of Proteins Associated with Set1), which catalyzes methylation at histone H3K4 (265, 267), although our understanding of the function of this important enzyme is continuing to evolve (268). COMPASS dysregulation of gene expression can give rise to various cancers (267). Second, AFF4, itself a fusion partner of MLL in leukemia, has been found to be a common factor among all purified MLL translocations (269). Third, ELL, one of the frequent translocation partners of MLL in leukemia, has been found to function as an RNA Pol II elongation factor that increases the catalytic rate of transcription elongation by RNA Pol II by suppressing transient pausing (270). Finally, it has been shown that many MLL translocation partners are found in association with ELL and the positive transcription elongation factor (P-TEFb), within the Super Elongation Complex (SEC; refs. 265, 270, 271). The translocation of MLL into SEC is involved in the misrecruitment of SEC to MLL target genes, perturbing transcription elongation checkpoints at these loci and resulting in leukemia (271). MLL-induced leukemogenesis highlights the role of deregulated histone methylation in tumorigenesis (272).
Another example of the importance of biochemistry is deciphering the molecular role of an observed genetic link of EZH2 in cancer. EZH2 encodes the catalytic subunit of PRC2, which is responsible for methylating lysine 27 of histone H3 (H3K27). Trimethylation at this site is associated with closed chromatin and silencing of neighboring gene expression. In neoplasia, EZH2 can influence T-cell biology (273) and function as either an oncogene or a tumor suppressor gene depending on the cellular context; for example, EZH2 is sufficient to transform lung cells in transgenic mouse models overexpressing EZH2 (274), and loss-of-function EZH2 mutations occur in MDS and chronic myelomonocytic leukemia (CMML; ref. 275). In germinal center diffuse large B-cell lymphomas, recurrent mutations essentially of only one codon (Y641) create a protein with reduced affinity for unmethylated H3K27 but highly increased affinity for monomethylated H3K27, resulting in higher levels of H3K27 trimethylation overall. In contrast, pre-AML syndromes like MDS and CMML do not develop Y641 mutations but instead recurrently develop nonsense, frameshift, and other loss-of-function mutations in EZH2, resulting in low levels of H3K27 trimethylation (276). Ezh2 loss synergizes with JAK2-V617F in hematopoietic cells to contribute to the development and progression of MPN. The MPN phenotype induced by JAK2-V617F was accentuated in Jak2-V617F;Ezh2(-/-) mice, resulting in expansion of the stem cell and progenitor cell compartments and severe disease progression, including more advanced myelofibrosis and reduced survival (277). These results, which support a tumor suppressor function of EZH2 in patients with MPN, have important clinical implications for EZH2 inhibitor development in this setting. 
It is possible that EZH2 inhibition will mimic malignancy-associated, loss-of-function EZH2 mutations in normal myeloid cells leading to dysregulated growth or differentiation in these cells, highlighting the need for future context-dependent studies.
SWI/SNF is also a critical regulator of nucleosome remodeling conserved from yeast to humans. Biochemical investigation combined with bioinformatic assessments has demonstrated widespread genomic alterations across the members of the complex in 19.6% of all human tumors reported in 44 studies (278). In liver, genetic suppression of SWI/SNF complex member ARID1B was shown to overcome oncogene-induced senescence and lead to liver neoplastic progression (279). The finding that ARID1A deficiency disrupts SWI/SNF-mediated control of enhancers (including H3K27 reduction) provides a mechanism by which ARID1A mutations/deletions may promote colorectal and mammary tumorigenesis (280, 281). While these studies suggest an emerging role for SWI/SNF in cancer development, further delineation of the role of SWI/SNF complexes in precancers will help drive preventive efforts. Furthermore, prior studies demonstrate an antagonistic relationship between the SWI/SNF and PRC2 complexes in mediating oncogenic transformation (265, 267).
A transformative example of biochemistry's importance in premalignant biology involves the discovery of recurrent mutations in IDH1 and IDH2 in glioblastoma, AML, and their precursor cells. Such mutations were found through broad sequencing efforts (282), although their role at the molecular level was not clear until the advent of modern metabolomics profiling (283), which found that mutant IDH enzymes convert the normal intracellular metabolite alpha-ketoglutarate into 2-hydroxyglutarate. A competitive inhibitor of a large class of dioxygenase enzymes that utilize alpha-ketoglutarate, 2-hydroxyglutarate accumulates to very high levels in IDH-mutated cancers, potently inhibiting many important intracellular dioxygenases, including the TET family, prolyl hydroxylases, and several histone demethylases (284–286). Thus, biochemistry and metabolomics have illustrated how 2-hydroxyglutarate contributes to carcinogenesis in a hitherto unprecedented way by acting as a novel “oncometabolite” generated by somatic IDH1/IDH2 mutations, which can potentially serve as vaccine targets for cancer prevention (287).
Biochemical approaches have also focused on the significance of metabolism and its link to epigenetic factors, such as the TET family, in the regulation of cell-lineage specification and the development of cancer. These discoveries are only a few examples among a large number of biochemical approaches in neoplasia and are a testimony to the power of biochemistry in understanding neoplasia and the design of its targeted prevention, for example, by highlighting the importance of epigenetic regulation. High information content mass spectrometry to profile global histone modifications in human cancers (113), when combined with DNA-sequencing data, can be used to identify novel variants that drive epigenetic changes leading to oncogenic transformation. Chromatin immunoprecipitation combined with NGS (ChIP-seq) can provide systematic information regarding the architecture of the chromatin cell states of cancers. New technological advances have demonstrated that ChIP-seq can be carried out in human tissues (289). Interestingly, examination of the chromatin landscape was able to fully distinguish normal tissue from cancers. These results suggest the possibility of gaining additional insights into precancers by systematic assessment of chromatin states using key histone acetylation and methylation patterns, super-enhancers, as well as TET, SWI/SNF, and PRC2, all of which are critical for chromatin regulation (289).
It is essential that the PCA incorporate detailed biochemical and enzymological studies on purified protein complexes to decipher the precise, context-dependent function of chromatin and other epigenetic modifiers and somatic mutations in precancer development and progression (268). This will also allow the profiles to be cross-referenced with the landscapes in primary tumors, as well as with the corresponding transcriptomic data, to identify critical epigenetic changes that are necessary for malignant transformation. The intricate roles of EZH2, PRC2, and SWI/SNF in chromatin regulation in normal development and neoplasia require further elucidation, especially in precancers. Finally, studies of metabolic alterations in neoplasia continue to uncover novel connections between nutrient utilization and oncogenic state (83), which are critical to precancer progression.
Single-Cell Analyses
The natural history of precancers is heavily influenced by the multi-omic heterogeneity (229, 290; also see Big Genomics section) of neoplastic cells, multibiome, and tissue microenvironment. Single-cell RNA- or DNA-sequencing technologies can be specifically leveraged to unravel the elaborate cellular relationships within these lesions and stem cell identity that cannot be addressed by assaying bulk tissue (291, 292). In the case of mRNA profiles, downstream analyses can characterize known populations and novel subpopulations of cells and assess how these populations change in abundance as disease progresses or regresses. These data also can be used to more accurately infer important disease-associated gene regulatory and immune cell networks (293), because the gene expression variability has not been averaged across all sampled cells as in bulk tissue. In addition, single-cell sequencing can reveal and monitor lesion heterogeneity in somatic alterations and complex clonal dynamics among epithelial cells sampled at different geographic locations and over time to complement existing multi-region bulk sequencing approaches (294). These data will provide a high-resolution picture of cell types present in precancers and their surrounding microenvironment and the transcriptional programs active within each cell type that drive disease progression.
Several technical limitations need to be overcome, however, to realize the full potential of single-cell sequencing of precancers. These lesions are relatively small and frequently only diagnosed in formalin-fixed, paraffin-embedded (FFPE) tissues previously precluding comprehensive genome-sequencing studies using current methods (295). Furthermore, information regarding the location of neoplastic cells with particular mutations within a given lesion is especially important for early lesions, as this often defines the boundary between preinvasive and invasive neoplasia. Therefore, the development and application of methods that allow assessing the genetic and phenotypic features in situ using intact FFPE tissue samples is especially critical for the improved understanding of preinvasive lesions. Several technologies enable copy number alteration and gene expression analyses at the single-cell level from FFPE slides. These include FISH and immuno-FISH (combination of FISH with immunofluorescence; refs. 296, 297), mRNA in situ hybridization, in situ PCR, and STAR-FISH (298–300). The application of immuno-FISH for the analysis of cellular phenotypic heterogeneity and genetic features revealed extensive intratumor diversity in DCIS and clear expansion of minor subclones in DCIS to dominant clones in invasive ductal carcinoma (297). A shortcoming of these methods is the limited set of markers that can be assessed on a single slide and the need for a priori knowledge of the changes to be analyzed.
These methods are beginning to be applied to precancers, including DCIS, with massively parallel single-cell sequencing for copy number analysis. This proof-of-principle analysis established technical feasibility and demonstrated intra-lesion genetic heterogeneity in DCIS, suggesting complex and distinct evolutionary processes from early DCIS to subclonal selection in invasive disease. Multicolor FISH to evaluate clonal evolution at single-cell resolution in Barrett's esophagus found extensive genetic diversity in progressors (301). A whole-exome single-cell sequencing method was developed to assess genetic heterogeneity and tested on a premalignant JAK2-negative MPN (essential thrombocythemia) patient (302). Such profiling of a different MPN, myelofibrosis, revealed substantial heterogeneity in cytokine production (303). Single-cell genomic research of childhood ALL provided a high-resolution view of the preleukemic sequence of events leading to ALL, from early ETV6-RUNX1 translocations in hematopoietic precursors to later APOBEC-mediated transformation to leukemia (304). Somatic mutation loads in established single-cell clonal lineages provide information about lifetime exposures and intrinsic factors (164). Current single-cell sequencing approaches, however, have important technological limitations, especially in epithelial precancers. First, the methods available are labor- and cost-intensive. Second, it is currently not possible to obtain accurate detailed copy number and mutational data from the same cell, given that some whole-genome amplification methods yield templates optimal for copy number analysis, whereas others are optimal for mutation profiling. Therefore, efforts are required to develop less labor-intensive and more cost-effective methods for sequencing and clonal lineage tracing, which are essential for a detailed analysis of the evolutionary paths of in situ disease and its progression to invasive cancer. 
Two more technologies, FISSEQ (fluorescent in situ sequencing; ref. 292) and “spatial transcriptomics” (305), allow for complete transcriptome analysis of single cells in intact tissue sections. Recently, a highly sensitive and quantitative single-cell RNA-seq assay to detect transcriptional variation and subtle differences in gene expression between related cells or cell states (306) has been developed. These techniques will help elucidate chromatin maps/signatures and epigenetic heterogeneity in neoplasia (307) and the multibiome (229, 308), including imaging of host–microbiota interchanges (309). Leveraging Human Cell Atlas advances, including novel single-cell technology, will be critical.
Cost reduction and advances in cell sequencing (and cfDNA technology) could theoretically allow temporal monitoring of blood and epithelial premalignancies on a population scale within the PCA (310–312). Periodic single-cell DNA sequencing of multiple cells from an individual will be invaluable for cancer prevention, as it will allow one to assess the overall baseline accumulation of somatic mutations over time in a person, to survey and monitor multiple different endogenous processes and exogenous exposures through the use of mutational signatures, and to reveal the existence of progenitor cells, premalignant clones, and clonal evolution over time (313–315). The unprecedented resolution of sequencing single cells comes with a hefty computational and data price. Monitoring even a single individual will require sequencing that person's genome multiple times every year, resulting in several terabytes of data per person. As such, population-scale examinations will generate millions of whole-genome sequences, resulting in exabyte-scale data (>10^18 bytes) and requiring a next generation of computational infrastructure (316) and novel computational frameworks (e.g., to take into account the relatively low signal resolution of single-cell data when compared with traditional bulk tissue sequencing). Challenges for whole-genome analyses with single-cell resolution will be highly amplified with multi-omic sequencing (317). More than ever, the rate-limiting step will be data analysis.
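The storage scale described above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the per-genome file size (100 GB), the number of genomes sequenced per person per year (20), and the cohort size (one million people) are round figures assumed here, not values taken from the text or the cited references.

```python
TB = 10**12  # bytes in one terabyte (decimal convention)
EB = 10**18  # bytes in one exabyte (decimal convention)


def annual_bytes_per_person(genomes_per_year, bytes_per_genome):
    """Data generated for one person in one year of monitoring."""
    return genomes_per_year * bytes_per_genome


def cohort_bytes(n_people, years, genomes_per_year, bytes_per_genome):
    """Total data for a cohort monitored for a number of years."""
    per_person = annual_bytes_per_person(genomes_per_year, bytes_per_genome)
    return n_people * years * per_person


# Assumed inputs (hypothetical, chosen only to show the order of magnitude).
ASSUMED_BYTES_PER_GENOME = 100 * 10**9  # ~100 GB per whole-genome data set
ASSUMED_GENOMES_PER_YEAR = 20           # e.g., 20 single-cell genomes per year

per_person = annual_bytes_per_person(ASSUMED_GENOMES_PER_YEAR,
                                     ASSUMED_BYTES_PER_GENOME)
total = cohort_bytes(n_people=1_000_000, years=1,
                     genomes_per_year=ASSUMED_GENOMES_PER_YEAR,
                     bytes_per_genome=ASSUMED_BYTES_PER_GENOME)

print(f"per person per year: {per_person / TB:.1f} TB")
print(f"cohort per year:     {total / EB:.1f} EB")
```

Under these assumptions a single person generates a few terabytes per year and a million-person cohort crosses the exabyte threshold within one year; multi-omic assays would multiply these figures further.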
Liquid Biopsies for Early Detection and Intervention
By virtue of the clonal nature of tumor cells, somatic changes are present in many copies that are continuously released and can be detected in the blood as cell-free (cf) circulating tumor DNA (318). In cancer patients, alterations in cfDNA can be detected using sequencing and bioinformatic approaches, although such alterations can be difficult to detect as they often represent a minute fraction (<1%) of cfDNA. A variety of both targeted and whole-genome approaches have been developed to detect such alterations in cfDNA (319, 320). These have been used for early detection of recurrence in colorectal cancer, pancreatic cancer, neuroblastoma, hematopoietic malignancies, and others (321–323). Importantly, cfDNA detection of pancreatic cancer recurrence appears to precede radiographic detection of recurrence by over six months in some cases, providing a larger window for potential intervention in this challenging disease (322). Ultra-deep NGS identified somatic mutations in cfDNA from uterine lavage in patients with stage IA cancer and, in an age-related manner, in ∼50% of normal women (321). A new phylogenetic multiplex-PCR NGS platform approach to ctDNA profiling provides sufficient sensitivity to identify lung cancer patients destined to relapse within a year of subclonal detection. Of importance to early detection research, cfDNA was recently found in plasma in patients with premalignant lung and bladder disease (324, 325). Differentially methylated loci in multiple HPV and host genes detected by NGS in urine cfDNA could lead to precision screening in cervical premalignancy (326). Somatic mutations in cfDNA among individuals without any precancer diagnosis, however, pose even more serious challenges for the development of ctDNA screening tests (327).
Achieving acceptable levels of sensitivity and specificity for early cancer detection will require further technical advances to identify possible combinations of cancer-specific mutations and define potential quantitative thresholds to avoid overdiagnosis. Enrichment steps can be based on biological properties (e.g., surface expression markers) or physics (e.g., size, density, deformability). Based on findings to date, it can already be envisaged that DNA sequencing needs to be broad, to encompass considerable tumor heterogeneity, and deep, to detect minute amounts of ctDNA fragments in the milieu of extensive genetically normal cfDNA. Nonmalignant conditions that can lead to the death of normal cells may also lead to a further dilution of ctDNA molecules and hamper quantitative evaluations. Very sensitive technologies that allow the detection of less than 0.1% of ctDNA in blood plasma (e.g., digital droplet PCR or cancer personalized profiling [CAPP] by deep sequencing methods) have been developed and applied to cfDNA analyses in patients with various forms of cancer (328), but the key biological limitation might be the number of genome equivalents present in blood samples from early-stage cancer patients.
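The point about genome equivalents can be made concrete with a simple sampling calculation. The sketch below assumes that one haploid human genome weighs roughly 3.3 pg, that mutant fragments at a given locus are sampled as a Poisson process, and that extraction losses and sequencing errors can be ignored; the 10 ng cfDNA input and the two mutant fractions are hypothetical values chosen for illustration.

```python
import math

# Approximate mass of one haploid human genome, in picograms.
PG_PER_HAPLOID_GENOME = 3.3


def genome_equivalents(cfdna_ng):
    """Number of haploid genome copies in a given mass of cfDNA (ng)."""
    return cfdna_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def prob_at_least_one_mutant(cfdna_ng, mutant_fraction):
    """Chance that the sample holds >= 1 mutant copy of a given locus.

    Poisson approximation: P = 1 - exp(-GE * f), where GE is the number
    of genome equivalents and f is the mutant allele fraction.
    """
    expected_mutant_copies = genome_equivalents(cfdna_ng) * mutant_fraction
    return 1.0 - math.exp(-expected_mutant_copies)


# Hypothetical example: 10 ng of cfDNA analyzed from one blood draw.
ge = genome_equivalents(10.0)
p_at_0_1_percent = prob_at_least_one_mutant(10.0, 1e-3)    # 0.1% ctDNA
p_at_0_001_percent = prob_at_least_one_mutant(10.0, 1e-5)  # 0.001% ctDNA

print(f"genome equivalents: {ge:.0f}")
print(f"P(>=1 mutant copy) at 0.1%:   {p_at_0_1_percent:.2f}")
print(f"P(>=1 mutant copy) at 0.001%: {p_at_0_001_percent:.2f}")
```

With only a few thousand genome equivalents in the tube, a mutation present at a very low fraction is often absent from the sample altogether, so no increase in assay sensitivity can recover it. Interrogating many loci in parallel or analyzing larger plasma volumes are ways to raise the expected number of mutant molecules.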
Much work remains to be done in improving methodologies for detection of circulating tumor DNA. In addition to the need for higher sensitivity, the specificity of cfDNA measurements also faces serious challenges (329). Several emerging reports have shown that cancer-associated mutations are not restricted to cancer patients. Very low levels of TP53-mutated cfDNA were observed in plasma of matched non-cancer controls in lung cancer early detection research (327) and in the peritoneal fluid and peripheral blood of women with benign ovarian lesions (319). Likewise, a potential confounding issue is the detection of clonal alterations that arise in blood cells of healthy individuals that may be associated with aging and clonal hematopoiesis (see below). Distinguishing between alterations in genes associated with MDS/AML versus solid tumors may be one way to overcome this issue. Additionally, even when molecular alterations are identified, determining the tissue of origin of the incipient neoplastic lesion can be extremely challenging. Combining sequencing of cfDNA with epigenetic markers (330, 331), mutational signatures, imaging, and mathematical modeling can be used for pinpointing the most likely tissue in which the clone originated. Even greater issues challenge the promise of ctDNA analysis for early cancer detection. Genomic alterations, such as those in the BRAF, RAS, EGFR, HER2, FGFR3, PIK3CA, TP53, CDKN2A, and NF1/2 genes, all considered hallmark drivers of specific cancers, can also be identified in benign and premalignant conditions, occasionally at frequencies higher than in their malignant counterparts.
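The consequence of imperfect specificity for a screening test can be illustrated with Bayes' rule. In the sketch below the sensitivity (90%), specificity (99%), and the two prevalence values are hypothetical numbers chosen only to show the effect; they are not performance figures for any assay discussed here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive tests that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return float("nan")  # no positive results are possible
    return true_pos / (true_pos + false_pos)


# Hypothetical assay: 90% sensitivity, 99% specificity.
ppv_low_prevalence = positive_predictive_value(0.90, 0.99, 0.005)
ppv_high_risk = positive_predictive_value(0.90, 0.99, 0.05)

print(f"PPV at 0.5% prevalence: {ppv_low_prevalence:.2f}")
print(f"PPV at 5% prevalence:   {ppv_high_risk:.2f}")
```

Even with 99% specificity, most positive results in a low-prevalence population would be false positives under these assumptions, whereas the same assay performs far better in a group with higher underlying risk. Cancer-associated mutations in individuals without cancer erode specificity and therefore weigh heavily on predictive value.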
Analysis of other blood/fluid components, such as exosomes, platelets (332), urine, peritoneal fluid, and circulating tumor cells (CTCs), may help increase the sensitivity of detection (318, 324–326). CTCs are released early during tumor development (318) and have been found in patients with small primary tumors. Thus, detection of CTCs might be an alternative approach to early cancer identification. CTCs have been detected in 3% of patients with chronic obstructive pulmonary disease (COPD), who have an elevated risk of developing lung cancer, 1–4 years before annual CT screening detected lung nodules (333), but were not detected in control individuals (both smokers and nonsmokers) with normal lung function. CTC detection could possibly complement a CLIA-approved bronchial airway genomic classifier (FDA Laboratory Developed Test [LDT]) validated for lung cancer detection (334). The airway classifier approach was recently extended into the less invasive, more accessible nasal epithelium, where gene-expression alterations were associated with lung cancer detection (335). The other FDA-approved omic test, a stool DNA test, is for CRC screening (111). Technical advances have established the feasibility of detecting and sequencing CTCs collected from blood at the very early steps of tumor invasion (336).
Focusing on population groups at an elevated risk of developing cancer (e.g., COPD patients or individuals with inherited cancer susceptibility) is a good strategy to speed up the process of testing and validation of emerging approaches and technologies before considering large population-based screening efforts as indicated above. CTCs have been detected in ∼10% of CRC precancers (adenomas) possibly stimulated by cytokine-driven epithelial migration. Enormous resources are required to identify genomic aberrations specific and sensitive enough to detect advanced precancers and early malignant lesions in large cohorts. Finally, verification of the findings of approaches involving liquid biopsy will call for acquisition of information on the putative location of the occult lesion to select the appropriate imaging modalities or other diagnostic means prior to planning appropriate therapies. In this context, tissue-specific multiplex transcriptome profiling of single CTCs (337) might be a promising approach. With these approaches, it is possible to imagine a time when individuals at high risk of developing cancer due to either genetic or environmental risk factors could be serially monitored using a blood-based test. A potential advantage of this approach would be the relative ease of compliance compared with other more invasive screening technologies. Ultimately, the specific methodology would be determined by practical considerations such as cost, sensitivity, specificity, and robustness of the assays, but these approaches may forever change screening for cancers that are currently incurable unless diagnosed at an early stage.
Longitudinal Analysis of Premalignancies
It is likely that synchronous precancer/cancer pair studies will not always accurately reflect the temporal clonal evolution underlying neoplastic transformation, i.e., genetic alterations found in precancer adjacent to cancer are not comparable to those found in precancer in a patient who never progressed (as clearly shown in Barrett's esophagus). To fully appreciate such mechanisms, systematic spatial and temporal sampling of neoplastic cells will be essential. To date, such analyses of epithelial premalignancies, with the exception of Barrett's esophagus, have been extremely limited, with reports of a relatively small number of patients with lung squamous and adenocarcinoma premalignancy/field defect followed longitudinally (111, 338, 339). Whole transcriptome and genome analyses of preinvasive and early invasive squamous lung carcinoma in a small number of patients have identified molecular alterations in both epithelial cell signaling pathways and immune cell pathways that associate with progression of these precancers over time. Several cross-sectional studies of lung squamous carcinogenesis, however, have been reported. Abnormal upregulation of chemokine genes such as CXCL1, CXCL8, CXCL9, or CXCL10 is frequently detected in preinvasive lung lesions. Other differentially expressed genes include SOX2, CEACAM5, SLC2A1, RNF20, SSBP2, RASGRP3, and PTTG1, which is a known proto-oncogene. Differential gene expression was modest between low- and high-grade precancers, but substantial changes occurred between preinvasive and invasive squamous cell carcinoma samples (analogous to the pattern in Barrett's). The majority of the differentially expressed precancer genes, however, were shared with those in early invasive lesions. Pathway analysis showed that these shared genes are associated with DNA damage response, DNA/RNA metabolism, and inflammation. PI3K/AKT signaling was an early event during squamous carcinogenesis (338, 340). 
Homozygous inactivation of Keap1 or Trp53 promoted airway basal stem cell self-renewal, suggesting expansion of mutant stem cell clones. Deletion of Trp53 and Keap1 in these stem cells recapitulated the human setting (341).
Genomic instability and driver genes promote clonal evolution/diversity as illustrated below (150, 342). Surprising relationships between SCNA types and levels, mutation burden, cell cycle markers, and immunity (215) have major implications for malignant transformation. Large longitudinal studies of Barrett's esophagus established SCNAs as essential oncogenic drivers. Non-progressors largely maintained stable genomes, in contrast to patients with high levels of SCNAs, genome doubling, genetic diversity, and chromosomal instability/catastrophe, which were associated with rapid (<2 years) progression to esophageal cancer (343). Another Barrett's esophagus report of clonal evolution at single-cell resolution using multicolor FISH found that baseline genetic diversity predicts progression (to cancer) and remains in stable dynamic equilibrium over time, suggesting that the clonal makeup and evolutionary trajectory of the lesion are predetermined from the outset (301). Clonal expansions were rare, often involving p16. Importantly, this Barrett's work has established the feasibility and model of longitudinal study in epithelial premalignancy. Focusing on premalignancies of the blood has several advantages, including the ease of repeatedly acquiring neoplastic cells to interrogate clonal evolution and the potential for discoveries that inform single-cell sequencing studies in epithelial neoplasia. Study of MPNs provides the only direct data that somatic mutation order (JAK2 and TET2) can greatly influence disease features (183). The overall malignant transformation rate for clonal hematopoiesis, monoclonal B-cell lymphocytosis (MBL), and monoclonal gammopathy of undetermined significance (MGUS) is about 1–2% per year, but individual risk is highly variable. Comprehensive single-cell and cfDNA omics research will play a key role in dissecting disease pathogenesis. 
We now have the ability to monitor hundreds of individual cells, thus overcoming bulk-cell/tissue limitations and allowing precise analyses of intraclonal and microenvironment architecture and crosstalk in the process and timing of transformation.
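To put the quoted overall transformation rate of about 1–2% per year in perspective, the annual rate can be converted into a cumulative risk. The sketch below assumes a constant, independent annual risk, which is a simplification: as noted above, individual risk is highly variable, so these figures describe a hypothetical average rather than any one person. The 10-year horizon is also an arbitrary choice for illustration.

```python
def cumulative_risk(annual_rate, years):
    """Probability of transformation within `years`, assuming a constant
    and independent annual rate (a simplifying assumption)."""
    return 1.0 - (1.0 - annual_rate) ** years


# Lower and upper ends of the quoted 1-2% per year range, over 10 years.
risk_low = cumulative_risk(0.01, 10)
risk_high = cumulative_risk(0.02, 10)

print(f"10-year risk at 1% per year: {risk_low:.1%}")
print(f"10-year risk at 2% per year: {risk_high:.1%}")
```

Under this simple model, roughly one in ten to one in five carriers would transform over a decade, which leaves most carriers unaffected and underlines why markers that stratify individual risk are needed.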
The unexpectedly high prevalence of clonal hematopoiesis was recently characterized (344, 345) by NGS identification of somatic mutations in genes (mutant clones of mostly single driver mutations), similar to the mutational spectrum seen in MDS (notably DNMT3A, TET2, and ASXL1). Clonal hematopoiesis is age-related in the general population and associated with increased risk of MPN, MDS, and AML (346, 347). In one study using ultrasensitive NGS techniques, tiny hematopoietic clones defined by somatic mutations could be identified in as many as 95% of women in their 50s, arguing that potentially premalignant clones are ubiquitous (348). Although the mechanism is unclear, most people with more macroscopic clonal hematopoiesis of indeterminate potential (CHIP) can stably harbor small hematopoietic clones for long periods of time. These clones must have initially expanded to the point of being detectable but then were held in check (347). One possibility is that the somatically mutated clones are able to persist in a finite number of pseudo-niches not available to normal stem cells. Once these pseudo-niches are filled, the clone cannot expand further until additional mutations are acquired or the microenvironment is altered to create additional areas hospitable to these cells. This phenomenon has been observed in some other sites (e.g., eyelid skin, BRAF-mutant moles; see above), where clones only grow so much before they become stable, until epigenetic reprogramming or some other event promotes transformation (see epigenetic section).
Recent studies have demonstrated how cells with a wide range of somatic mutations typical of myeloid malignancies (e.g., in U2AF1, SF3B1, SRSF2, ASXL1, or TET2) can induce inflammatory conditions and innate immune responses that then favor the growth of these mutant cells. Whether adaptive immune surveillance plays a role in limiting the expansion of premalignant clones is unclear. T-cell-mediated autoimmunity is well documented in some forms of MDS, suggesting that interactions with the adaptive immune system likely play a role in clonal evolution. It appears that the aging bone marrow microenvironment promotes outgrowth of clones (e.g., driven by mutations in splicing factor genes SF3B1, SRSF2) and is associated with innate and adaptive immune attrition. Murine findings suggest that aberrant splicing may also produce neoantigens capable of eliciting an immune response that would need to be suppressed or evaded for progression to occur. Drugs targeting the inflammasome and innate immune responses implicated in remodeling of the microenvironment are potential preventive approaches under investigation. The strong age dependency for clonal hematopoiesis occurs in the context of age-related changes in the bone marrow microenvironment and in normal HSCs with accumulation of DNA damage, myeloid skewing, altered DNA methylation patterns, and relative quiescence. The same genetic and epigenetic changes that drive oncogenesis may be selected for in this context and are not necessarily inevitable consequences of aging (349). Novel models using human disease-derived iPSCs and CRISPR/Cas9 editing tools have been used to map (and reverse) the evolution of the full spectrum of myeloid neoplasia from CHIP to MDS to AML (350). These models can be used to test compounds that might have stage-specific effects capable of altering the fitness or preventing the evolution of premalignant clones.
Chronic infection with persistent inflammation depletes hematopoietic stem and progenitor cells primarily through replication stress-induced terminal differentiation mediated by the transcription factor BATF2. In a murine model of M. avium infection, inflammatory stimulation was a greater contributor to pancytopenia than myelofibrosis. Further study of these effects will provide mechanistic insight into diseases of progenitor and stem cell attrition/malfunction such as clonal hematopoiesis, MDS, and aplastic anemia (351). Aplastic anemia is a slightly different scenario, with two distinct processes occurring – immune-mediated destruction of hematopoietic cells and a constant stimulus for the regeneration of these cells. Two kinds of clonal hematopoietic cells may be selected for in that environment – those that can evade the immune system (reverting to a more normal stem cell state) and those that acquire mutations promoting clonal expansion. Clonal hematopoiesis is highly prevalent among patients with aplastic anemia, in which two broad types of genetic alterations are identified: (1) age-related mutations and CNAs commonly seen in myeloid malignancies (e.g., DNMT3A, RUNX1, ASXL1) and (2) those not age related and highly specific to aplastic anemia – PIGA and BCOR/BCORL1 mutations and uniparental disomy (UPD) in 6p (6pUPD) identified by SNP array karyotyping, which, in contrast to the first type, are associated with stable or decreasing clone size over time and much lower rates of progression to MDS/AML (352). 6pUPD functionally results in deletion of half of the class I HLA locus, providing a mechanism for evasion of an immune response. Outside of this context, this event would not be predicted to confer a clonal advantage. In aplastic anemia, this form of immune evasion would allow stem cells to function as they might were the immune response not present.
Clones with driver mutations in MDS-related genes, on the other hand, may gain proliferative or niche-independent growth capabilities that are more oncogenic. This is borne out by observations that 6pUPD is more common in children with aplastic anemia, an age group in which CHIP mutations are rare, and that 6pUPD is exceedingly rare in elderly patients with MDS and AML. This suggests that 6pUPD is not an oncogenic event but only allows stem cells to persist in the setting of autoimmunity.
Nearly 40% of individuals with unexplained cytopenias harbor detectable mutations, many with clones having more than one driver mutation and higher risk of transformation to MDS/AML and all-cause mortality (353). Cooperating mutations also can be identified during periods of clonally skewed hematopoiesis in sporadic and hereditary settings that precede myeloid transformation. DNMT3A haploinsufficiency transformed FLT3-mutant myeloproliferative disease into AML in a mouse model (354). The frequent co-occurrence of germline GATA2 and somatic ASXL1 events (355) and germline SNPs associated with somatic mutations of JAK2 have uncovered several targetable cooperative mutations (356) driving premalignant progression (357). Examples of hereditary mutated transcription factors that predispose to hematologic neoplasia include mutations in CEBPA, RUNX1, ETV6, and PAX5 (287, 357). A recent GWAS identified germline variants that predispose to both JAK2 V617F clonal hematopoiesis and MPN (356). Four genes (JAK2, SH2B3, CHEK2, and TET2), altered in both inherited and somatic settings, contribute to V617F clonal hematopoiesis and/or MPN development. This identification of a predisposition allele associated with TET2 is intriguing since somatic TET2 mutations are common early events in myeloid precursors, including clonal hematopoiesis and MPN, and can be identified in HSCs, either preceding or following the acquisition of V617F, with the mutational order of these two genes influencing the clinical and biologic behavior of these neoplasms.
MBL is an asymptomatic expansion of clonal B-cells in the peripheral blood present in roughly 4% of all U.S. individuals over the age of 40 years (358). Genetic predisposition to MBL is suggested by the finding that the incidence of MBL is 3-fold higher for individuals within familial chronic lymphocytic leukemia (CLL) kindreds (defined as families with at least two first-degree relatives with CLL). A large GWAS of CLL and MBL families found significant germline variant associations in two out of eight regions tested (359). NGS has shown that most mutations and intraclonal heterogeneity found in CLL are already present in MBL years before progression (360, 361). Furthermore, longitudinal MBL studies, including those from patients who progressed to CLL, have begun to elucidate the sequence, timing, and impact of subclonal expansion and T-cell exhaustion on malignant transformation (362–364). Risks of serious bacterial infections in individuals with MBL are similar to those in individuals with CLL (365–367) and are linked to MBL transformation (368, 369), and solid tumor risk is 3–4-fold higher in MBL and CLL (versus healthy controls), all thought to be due to defects in immune surveillance (366, 367). Antibody responses to primary and secondary antigen challenges are typically inefficient among patients with early-stage CLL. Preliminary data have shown that the immunologic T-cell synapse is defective in individuals with MBL as well (358). Efforts to generate efficient vaccine responses will, therefore, be challenging. Enhancement of cytolytic T-cell function in MBL via vaccine therapy should be a long-term challenge since graft-versus-leukemia is highly effective in eradicating leukemic B-cells. Both in vitro and in vivo data suggest that lenalidomide can repair defects in the T-cell immune synapse and reduce Tregs in CLL patients (370, 371), an approach currently being tested clinically in MBL.
Of note, mouse model data suggest that age and inflammatory status of the host microenvironment promote selection for adaptive oncogenic events in B-cell progenitors (372), analogous to clonal hematopoiesis. Related work involves a hereditary syndrome of susceptibility to pre–B-cell neoplasia caused by inherited mutations of PAX5. Mechanistic data indicated that inherited susceptibility and aberrant immune responses to postnatal infections drive clonal evolution of premalignant B-cells and transformation to leukemia and lymphoma, by showing that pre-B-ALL was initiated in Pax5 heterozygous mice only when exposed to common infections (373).
The basis of MGUS as a precursor state is strikingly similar to other, recently characterized precursors of hematologic malignancies (e.g., clonal hematopoiesis and MBL; ref. 374). Studies of MGUS have provided some of the best evidence that the immune system has the capacity to recognize precursor states. A search for shared targets of the immune response led to the finding that T-cells against stem cell antigens (such as SOX2) are particularly enriched in MGUS (versus MM). Prospective data demonstrate that baseline SOX2 T-cell immunity correlates with risk of transformation (214). Loss of T, NK, and NKT cells (possibly due to Tregs, MDSCs, suppressive cytokines) has been associated with progression to MM. Prevention strategies include boosting pre-existing T-cell immunity, for example, with SOX2 vaccines and immune-modulatory drugs. Much of the genomic instability and somatic hypermutation events in MGUS are thought to originate/occur in the germinal center where IgH translocations involving AID induced by tumor-dendritic cell cross-talk appear to be early events in myelomagenesis. Immunoglobulin IGHV genes carry imprints of clonal tumor history. Deep sequencing of the Ig locus indicates ongoing somatic hypermutation in a subset of neoplastic cells, suggesting neoplastic transformation is initiated in a germinal center B-cell. These data parallel IGHV clonal sequences in some MGUS with ongoing somatic hypermutation imprints. Since MGUS precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational intraclonal variation from the same germinal center B-cell (375). Whole-epigenome NGS with extensive and unbiased analysis of the DNA methylome, including promoters, gene bodies, and intergenic regions in normal plasma cells, MGUS, and MM patient samples, identified DNA methylation of B-cell-specific enhancers, a new phenomenon in MM pathogenesis (376).
This pattern may reveal new potential insights into the biology of the disease, representing a de novo epigenetic reprogramming or reflecting an epigenetic imprint of initial premalignant phases of the disease in progenitor cells. MGUS was less heterogeneous than MM but shared a similar hypermethylation signature. In contrast, hypomethylation in MM was much more extensive and heterogeneous than in MGUS, suggesting that it may be involved in transformation. Genomic studies, including small NGS reports, have highlighted that MGUS is a genetically advanced lesion (with many of the genetic changes found in MM cells) with intraclonal heterogeneity in longitudinal studies (377). The mutational landscape reflects the biologic continuum of plasma cell dyscrasias from a low-complexity mutational pattern in MGUS and immunoglobulin light-chain amyloidosis to a high-complexity pattern in MM (378). A genomic study of light-chain amyloidosis did not find a unifying mutation (by whole-exome sequencing) and found CNA profiles similar to those of MM but, surprisingly, a transcriptome very similar to that of normal plasma cells (379).
Strikingly similar to clonal hematopoiesis and MBL, MGUS cells demonstrate clinical dormancy despite these extensive genomic alterations. Interestingly, new humanized models developed to grow precursor cells in vivo indicate that MGUS cells have the capacity for progressive growth, clearly indicating that the clinical stability/dormancy of these cells is in part mediated by features extrinsic to tumor cells, such as the non-cellular matrix and immune, bone, endothelial, stromal, and other cells in the bone marrow niche (shared with hematopoietic stem cells) where signals derived from osteoblasts may be important for mediating dormancy of MGUS cells (377). These humanized models to grow premalignant cells in vivo should greatly advance the study of clonal evolution and malignant transformation in this setting (380). Inherited genetic variation in specific SNPs increases MGUS predisposition and risk of transforming to MM (381). The loci identified also point to a role for chronic antigen-driven stimulation in driving clonal origins and evolution in MM and other B-cell tumors. The risk of MM is increased >30-fold in the inherited glucosphingolipid storage disorder Gaucher disease (GD), characterized by germline GBA mutations, due in part to lysolipid-induced chronic inflammation and genomic instability shown to promote the development of gammopathy in mice and patients. Lysolipid substrate reduction (especially given early) in Gba1-deficient mice decreased the risk of gammopathies (382, 383). This led to the recent discovery that in nearly 25% of all cases of MGUS/MM, the underlying clone may be driven by lipid antigens such as inflammation-associated bioactive lipids, which has important prevention implications (382). Black individuals are another high-risk group. Studies of GD and African cohorts (in Ghana) have found an increased incidence of polyclonal gammopathies (377), suggesting that polyclonal B-cell activation may be a less genetically complex pre-MGUS phase.
Summary
While a number of interventions are already FDA approved for cancer prevention (detailed in ref. 384), this is only the tip of the iceberg. New precision prevention approaches will be needed, with novel agents and combinations (11, 43, 385), trial designs (e.g., involving molecular/immune risk selection; refs. 386–388), surrogate (e.g., PI3K signature; ref. 389), and predictive markers (e.g., somatic mutations and germline variants of PIK3CA, BRAF, and HLA class I antigen expression in aspirin efficacy in CRC; promising leads are being evaluated prospectively; refs. 389–391). Emerging studies/models of progenitors and mutational processes linking cell lineage to clonal evolution include: iPSCs and CRISPR/Cas9 editing to map the evolution of myeloid neoplasia; and cell-fate dynamics, reprogramming, and lineage-specific regulation to progenitor cells, potentially at the single-cell level, that can be identified and targeted for early destruction (124, 314, 350). For example, elegant studies of luminal progenitor biology and RANK signaling initiated a transformative prevention trial in BRCA1 carriers. Recent in-depth mechanistic work involving farnesoid X nuclear receptor ligand (392) adds further biological plausibility to the first real signal of clinical benefit in NASH (393). As documented in this Perspective, Barrett's esophagus is the most developed model for the PCA indicating feasibility and major discovery, including: germline (rare mutations and GWAS; refs. 116–118), mitochondrial (161), big genomics (153–155), epigenetics/transcriptomics (156), biochemical (163), immune (157), microbiome (246), single-cell (301), and longitudinal (383) studies and molecular monitoring to avoid overdiagnosis (394).
Developing cancer vaccines as potent as polio, diphtheria, and rubella vaccines would protect future generations from developing cancer. In cervical cancer, prophylactic vaccination against HPV can virtually eliminate the disease (395), although immune evasion is an issue in CIN2/3 (396). The development of cancer vaccines to stimulate T-cells (with strong adjuvants) to recognize precancer antigens as foreign may prevent cancer even at the premalignant stage (264). The communication between the immune system and neoplasia reflects a fundamental principle of oncogenesis in which tumors orchestrate an intricate signaling network involving different immune cells and metabolic pathways to induce immune tolerance. This increasingly complex interplay involves an elaborate microenvironment (Fig. 1; refs. 215, 397); autophagy, aging, mitochondrial dynamics, metabolic activation, epigenetic reprogramming, and HSC fate (83, 96, 372, 398); and an intricate systemic microbial landscape (109, 110). Repurposing proven preventive agents such as tamoxifen is supported by striking off-target innate and adaptive immune effects (30, 399). A rigorous, integrated approach recently confirmed the widespread genetic regulation of immune and inflammatory pathways, extending experiments to primary human cells in disease-relevant contexts to unravel the cell- and context-specific regulatory effects of intricate disease variants (400). Finally, systematic, genome-wide transcriptional and epigenetic studies have begun to uncover the role of heterogeneity and variability across multiple immune cell types (401).
Age-related CHIP, now established by NGS, was found in 10% of over 40,000 healthy people >65 years (e.g., 344, 345, 347). Several ultra-deep NGS studies emerging over the past year are redefining the premalignant landscape, finding minute somatic mutation-driven clones in normal tissue, including blood cells ([micro-CHIP] in 95% of 50- to 60-year-old women), eyelid skin (32% of 234 biopsies from four middle-aged healthy people), and age-related peritoneal (100% in 20 women) and uterine (49% in 95 women) lavage fluid (126, 321, 348, 402). These clones seem to expand to the point of being detectable but are then held in check until an epigenetic, immune (equilibrium phase), or other transforming event occurs (264, 347, 403). Diet, lifestyle, environmental, metabolomic factors, and the human genome and multibiome (including viruses and protozoa), which share an intricate set of interdependencies (109, 110, 228, 229, 232–234, 404), have important implications for prevention strategies through targeting immune cells and microbiota mechanistically rooted in gnotobiotic mouse models. Probing germline (rare mutations and common variants)-somatic interactions is leading to novel prevention targets and approaches. Single-cell tools are beginning to reveal immense heterogeneity of DCIS and Barrett's esophagus. A major challenge is to develop single-cell technology to study spatial proximity and temporal dynamics between cells, integrating individual cellular states into models of functioning tissues, including interactions of precancer and immune cells and other components of the microenvironment, which will revolutionize our fundamental understanding of neoplasia biology (405).
In BRCA1/2-mutation carriers, high-grade serous ovarian carcinoma (the most common and severe subtype) originates from the fallopian tubes rather than the ovaries. This shift in understanding highlights the remarkable prospect of practice-changing prevention advances; however, the scarcity of clinically annotated precancers to interrogate limits the essential biologic insight needed to drive novel interception strategies (127). This is one of many examples where a national PCA effort could transform a devastating disease. Based on recent recommendations from the NCI Blue Ribbon Panel associated with the Cancer Moonshot initiative, the NCI has assembled a precancer think tank to develop a national roadmap for these types of studies. Establishing a PCA that integrates multi-omics and immunity (Fig. 1) will be critical to better dissect and disrupt clonal evolution/diversity and the immunosuppressive microenvironment to identify targets/mechanisms and immunogenic antigens, including the design of vaccines, to detect, prevent, and reject precancers.
As indicated in this Perspective, the complexities and challenges of this project are daunting, but the disruptive tools to make the PCA feasible are coming online, and the potential public health benefits of preventing cancer are immeasurable. To accelerate the prevention of cancer, this field needs a large-scale, longitudinal effort, leveraging major initiatives and infrastructure, diverse disciplines, technologies, and models (164, 350, 380, 406–412) to develop a multi-omics and immunity PCA. This unified Atlas will provide a vast national resource for discovery to interrogate, target, and intercept events that drive oncogenesis.
In summary, to fully achieve cancer prevention, we must build teams with multiple areas of expertise from the NIH, academia, Food and Drug Administration, private foundations, philanthropic partners, and industry. The best analogy of assembling such a multidisciplinary team is the Manhattan Project—one goal, multiple experts.
Disclosure of Potential Conflicts of Interest
A.E. Spira reports receiving a commercial research grant from Janssen Pharmaceuticals and is a consultant/advisory board member for Janssen Pharmaceuticals and Veracyte Inc. M.B. Yurgelun reports receiving a commercial research grant from Myriad Genetic Laboratories, Inc. R. Bejar is a consultant/advisory board member for Genoptix, Celgene, Foundation Medicine, and Alexion. J.E. Garber reports receiving other commercial research support from Novartis and Ambry Genetics and is a consultant/advisory board member for Biogen, Novartis, and GTx. V.E. Velculescu has ownership interest (including patents) in Personal Genome Diagnostics and is a consultant/advisory board member for Personal Genome Diagnostics. M.L. Disis reports receiving a commercial research grant from VentiRx, Celgene, Janssen, Seattle Genetics, and EMD Serono and has ownership interest (including patents) in VentiRx and Epithany. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: A.E. Spira, M.B. Yurgelun, R. Bejar, E. Vilar, T.R. Rebbeck, V.E. Velculescu, S.M. Lippman
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.M. Lippman
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M.B. Yurgelun, T.R. Rebbeck, D. Wallace, S.M. Lippman
Writing, review, and/or revision of the manuscript: A.E. Spira, M.B. Yurgelun, L. Alexandrov, A. Rao, R. Bejar, K. Polyak, M. Giannakis, A. Shilatifard, O.J. Finn, M. Dhodapkar, N.E. Kay, E. Braggio, E. Vilar, S.A. Mazzilli, T.R. Rebbeck, J.E. Garber, V.E. Velculescu, M.L. Disis, D. Wallace, S.M. Lippman
Study supervision: S.M. Lippman
Acknowledgments
The authors thank Jennifer Beane, PhD, for her input on the single-cell sequencing section and Leona Flores, PhD, for editorial assistance with this article.
Grant Support
A.E. Spira was supported by NIH/NCI 1U01CA214182 and 5U01CA196408. S.M. Lippman was supported for this work by NCI P30-CA023100-29. AS and SML co-chair the PremalignanT Cancer Genome Atlas (PreTCGA) Demonstration Project, NCI BRP.