Abstract
Tissue profiling technologies present opportunities for understanding the transition from precancerous lesions to malignancy, which may impact risk stratification, prevention, and even cancer treatment. A human precancer atlas building effort is ongoing to tackle the significant challenge of decoding the heterogeneity among cells, specimens, and patients. Here, we discuss the findings resulting from atlases built across precancer types, including those found in colon, breast, lung, stomach, cervix, and skin, using bulk, single-cell, and spatial profiling strategies. We highlight two main themes that emerge across precancer types: the ordering of molecular events that occur during tumor progression and the fluctuation of microenvironmental response during precancer progression. We further highlight the key challenges of data integration across large cohorts of patients, and the need for computational tools to reliably annotate and quality control high-volume, high-dimensional data.
Introduction
Cancers are dynamic and complex diseases, which undergo constant alteration in the genome, epigenome, and tumor microenvironment (TME). More than 200 types of cancers have been described, each of which can be further classified into subtypes. Modern technologies enable subtypes to be defined through molecular data describing the genome, epigenome, and TME, which provide characterizations previously unrevealed by histologic classifications. These technologies fueled The Cancer Genome Atlas (TCGA), and subsequently, the Clinical Proteomic Tumor Analysis Consortium (CPTAC), which generated large-scale molecular data on cancer specimens for 33 human cancer types (1). These rich datasets are transformative in cancer research, in that they define key molecular aberrations with clinical implications over various cancer types, such as TP53 and IDH1/2 mutations and 1p and 19q deletion in subtyping diffuse glioma with distinct diagnoses (2–4). Assessment of recurrent fusion events’ druggability across cancer types is also enabled by these data (5). Expanded application of therapies to a myriad of cancer types is made possible by the identification of ubiquitous driver genes, such as TP53 in breast, head and neck, and ovarian cancers (1, 6). Systematic examinations of cancers at the molecular level rapidly expand our knowledge of the underlying mechanisms driving cancers, ultimately contributing to patient stratification and intervention strategies.
Despite the tremendous progress in understanding cancer, successful control of advanced cancers remains a significant challenge in part due to tumor heterogeneity (7). Tumor heterogeneity presents as molecular and cellular diversity at various levels, including within a tumor, between different tumors, and over time. In cancer, tumor heterogeneity is thought to arise mostly from stochastic genomic alterations with selection favoring tumor growth and plasticity (8, 9). Nongenetic, but inheritable, epigenetic mechanisms can also play a role (10). Molecular subtyping efforts essentially attempt to find generalizable and actionable principles among heterogeneous cancers that arose from independently complex evolutionary trajectories. An illustrative example of this challenge can be found in colorectal cancer, where collective efforts have been conducted to establish consensus molecular subtypes that integrate multidimensional bulk-profiling data (11). More recently, single-cell resolution data established a new classification system for colorectal cancer (12). However, these data also show that individual cells exhibit more similarity within a tumor than across tumors, demonstrating that each tumor potentially can be classified as its own subtype. This can be attributed to individualized genetic variation and somatic evolutionary patterns that give rise to intertumoral variability in advanced cancers. Clinical outcomes also demonstrate the impact of tumor heterogeneity. Almost all advanced cancers develop resistance to targeted therapies, even those that exhibit drastic initial response. Evidence points to intratumoral and temporal heterogeneity as the main forces fostering therapy resistance (13). These instances highlight significant obstacles posed by tumor heterogeneity in decoding cancers with individual variabilities.
Lesions at premalignant stages can provide alternative windows for cancer intervention. Precancerous lesions often precede the development of invasive carcinoma (14). Precancers are usually less heterogeneous than cancers (13, 15), as premalignant lesions (PML) usually start with mutations in well-known driver genes that initiate neoplastic growth, as opposed to mutations arising later in tumor evolution that are harbored by subclones that contribute to intratumoral heterogeneity and patient-to-patient variation (16, 17). Given that higher levels of tumor heterogeneity render worse outcomes to anticancer treatments (13), it is conceivable that early intervention is the key to controlling cancer.
Studying precancers also benefits our understanding of the disease from the perspective of prevention. Extrinsic factors, such as diets, smoking, alcohol consumption, and microbiome differences, have profound influences on cancer development (18–20). Understanding the mechanisms, such as disrupted pathways modulated by extrinsic factors, would offer valuable information for developing preventative strategies. For example, high-fat diet-induced intestinal stem cell adaptation through the peroxisome proliferator-activated receptor signaling pathway is associated with colorectal cancer risk, and the inhibition of key proteins in this pathway can reduce the protumorigenic effects of a high-fat diet on tumor initiation (21). This understanding of the high-fat diet-induced tumorigenesis pathway opens new opportunities for diet-mediated cancer prevention. Furthermore, because not all PMLs will progress to cancer, it is also crucial to understand the features distinguishing lesions at high risk (16, 17) from those that are likely to regress (14). Understanding precancer biology may, in some ways, be more effective in controlling cancer, because this knowledge is key to delivering timely intervention, guiding appropriate follow-up screening, preventing overtreatment, and identifying potential targets for early intervention.
Given the measurable value of understanding PMLs, many groups are now engaged in constructing human precancer atlases to characterize molecular changes during the early stages towards malignancy. We provide an overview of various molecular and spatial profiling technologies (Fig. 1) applied in precancer atlas studies that focus on precancers originating in the colon, breast, lung, stomach, cervix, and skin, and more importantly, of the insights gained through these atlasing efforts.
Characterization of Precancerous Lesions Using Bulk Approaches
Being important and prevalent pillars of molecular profiling, bulk sequencing analyses have been extensively used in characterizing advanced human cancers (Fig. 1A; refs. 1, 22). Likewise, these approaches have been applied to PMLs of several cancer types, although at a lower rate compared with malignant lesions. Colonic precancerous lesions are among the most profiled human precancers. Outside of canonical APC mutations (23, 24), a myriad of mutations in WNT pathway genes such as LRP1B, SOX9, FAT4, TCF7L2, FBXW7, ARID1A, CTNNB1 (25–28), as well as early onset epigenetic silencing of WNT negative regulators, were also observed in conventional colonic adenomas (25, 29–32). In contrast to conventional adenomas that are mainly found in the descending colon, serrated colonic polyps that arise from the ascending colon lack APC mutations and frequently exhibit CpG island methylator phenotype (CIMP) and epigenetic silencing of CDKN2A/p16 (23, 33–37), highlighting diverging genetic events initiating conventional and serrated polyps. Notably, a high mutational burden characteristic of microsatellite instable (MSI) colorectal cancers was not yet observed in sessile serrated lesion precursors (15).
New technologies such as multirestriction enzymes digested Hi-C (mHi-C) revealed progressive decay of chromatin microstructures, such as strips and loops originating from active cis-regulatory elements, particularly at promoters in a progression spectrum of familial adenomatous polyposis (FAP) adenomas (bioRxiv 2022.08.26.505505). Another recently developed technology, Ultra High-throughput Whole Genome Methylation Sequencing (WGMS), enabled comprehensive and untargeted measurements of methylation status in large-scale studies. Its application to the same specimen set revealed a gradual decrease in average genome-wide methylation, although specific regulatory elements had divergent patterns of methylation alterations, highlighting the complex and dynamic nature of the methylation landscape during precancer progression (bioRxiv 2022.05.30.494076). Bulk microbiome profiling (16S rRNA-sequencing and metagenomic sequencing) of human colorectal cancer slurries has also identified tumor initiation roles of key bacterial species, including colibactin-expressing Escherichia coli, enterotoxigenic Bacteroides fragilis, and more recently, Clostridioides difficile (38, 39). In parallel, bacterial FISH is used to probe the presence of microorganisms in tissues and lesions (40–42). The high data quality and cost-effectiveness of bulk profiling render a global view of genetic, epigenetic, and microbial alterations between different samples representing colonic polyp progression.
Molecular profiling applied to lung adenocarcinoma (LUAD) research has shed light on precursor lesions at various stages, including atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), and minimally invasive adenocarcinoma (MIA). With whole-exome sequencing and more refined multiregional exome sequencing, key mutations in driver genes such as EGFR, RBM10, BRAF, ERBB2, TP53, KRAS, MAP2K1, and MET are already observed in PMLs. Although these well-known cancer driver events appear early, a rather gradual increase in other events such as APOBEC-associated mutations, total mutational burden, arm and focal copy-number alteration, allelic imbalance, TP53, and HLA loss of heterozygosity accompanies malignant progression (43, 44). Using reduced representation bisulfite sequencing, gradual increases in methylation aberrations and intratumoral heterogeneity in the methylation landscape were also observed from early to late lesions (45). These tumor cell-specific changes were accompanied by a gradual decrease in antitumor response and an increase of suppressive or dysfunctional immune cell subtypes along the progression axis (46). Combining targeted immune-related mRNA profiling and TCR profiling with genomic profiling, an association of summed effects of genomic/epigenetic changes and immune alterations was established, although the association of the immune contexture with any single tumor feature was weak, indicating heterogeneous mechanisms that converge on the progression towards immune evasion (46). Dysregulated immunity was also observed in the precursor bronchial PMLs to lung squamous cell cancer (LUSC) through large-scale sequencing (47). Examination of TCR diversity in PML samples revealed a decreased diversity associated with regressed and non-progressed PMLs within the proliferative subtype, indicative of the encroaching of a few dominant T-cell clones in suppressing transformation (48). TCR diversity is negatively associated with IFN signaling scores and antigen processing signatures. Bulk molecular characterization of lung precancer lesions clarified genetic events that occur early versus later in malignant transformation and implicated tumor microenvironmental changes occurring during tumor progression.
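To make the notion of TCR repertoire diversity concrete, the following is a minimal sketch of one commonly used summary, Shannon entropy over clone counts, together with a normalized clonality score. The metric choice, the function names, and the toy clone counts are illustrative assumptions and are not taken from the cited studies, which may have used different diversity measures.

```python
import math

def shannon_diversity(clone_counts):
    """Shannon entropy (natural log) of a list of per-clone read counts."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("clone_counts must contain at least one positive count")
    entropy = 0.0
    for count in clone_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy

def clonality(clone_counts):
    """1 - normalized entropy: 0 for a perfectly even repertoire, near 1 when one clone dominates."""
    n_clones = sum(1 for count in clone_counts if count > 0)
    if n_clones <= 1:
        return 1.0
    return 1.0 - shannon_diversity(clone_counts) / math.log(n_clones)

# Hypothetical repertoires (not real data): an even one and one dominated by a single clone.
even_repertoire = [25, 25, 25, 25]
dominated_repertoire = [97, 1, 1, 1]

for name, counts in [("even", even_repertoire), ("dominated", dominated_repertoire)]:
    print(name, round(shannon_diversity(counts), 3), round(clonality(counts), 3))
```

Under this toy setup, a repertoire taken over by a few dominant clones yields lower entropy and higher clonality than an even one, which is the sense in which a decrease in diversity can reflect expansion of a few T-cell clones.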
In breast cancer, the value of bulk characterization has been demonstrated in characterizing ductal carcinoma in situ (DCIS) pre-invasive lesions. The profiling of two well-annotated cohorts of DCIS specimens with matched recurrence/invasion data identified 812 genes and an early onset copy-number profile that can be used as a classifier for recurrence and invasive progression with high accuracy. Biological programs underlying these molecular profiles include cell-cycle progression, growth factor signaling, increased metabolism, and elevated immune response. The latter was further probed by multiplexed ion beam imaging (MIBI) to reveal associations between CD4+ T cells, myeloid and plasmacytoid dendritic cells, monocytes, and macrophages, with DCIS recurrence (49). Classification efforts in other precancer types produce similar translational value. Transcriptomic profiling produced two nevi/melanoma subtypes: type 1 characterized by pigmentation-type and MITF gene signature and type 2 characterized by inflammatory-type and AXL gene signature, with the former predicted to confer resistance to BRAF/MEK inhibitor and the latter to anti-PD1 treatment (50). In cervical cancer, integration of premalignant cervical intraepithelial neoplasia sequencing with HPV status classified high- versus low-risk subtypes (51). These multidimensional precancer characterization efforts not only enhance our understanding of the biology of tumor progression but also lay a solid foundation for translational applications such as risk prediction.
Precancer Profiling With Single-Cell and Spatial Resolution
Technologies with single-cell and spatial resolution enable several levels of heterogeneity within precancerous lesions to be assessed, including heterogeneity between tumor and microenvironmental cells, and between tumor cells of different states (Fig. 1B). Although the number of samples profiled for each study has been historically small, larger-scale studies of these types are beginning to emerge. Chen and colleagues performed scRNA-seq on 128 human colonic specimens, identifying two different precancerous cell states within polyps (15). High-level gene program analysis of these cell states revealed that conventional adenomas originate from aberrant WNT-driven stem cell expansion and that serrated lesions originate from pyloric metaplasia of non-stem cells, potentially due to damage. Analysis of the TME using scRNA-seq and spatially resolved multiplex immunofluorescence revealed a cytotoxic immune microenvironment preceding high tumor mutational burden in serrated polyps that are thought to be precursors of MSI-H colorectal cancers. These human data are supported by a proximal serrated tumorigenesis mouse model driven by mutant Braf and enterotoxigenic Bacteroides fragilis (52). The earliest events of the enterotoxigenic response occur in epithelial cells at the colonic mucosal surface prior to tumor formation, and resultant tumors are hypermethylated, infiltrated with CD8+ cytotoxic T cells, but again, not hypermutated. Association studies of human carcinogenic microbes in mice combined with scRNA-seq also showed a serrated-like damage response in differentiated colonic cells compared with WNT-activation in stem/progenitor cells (39). Microbial influence in serrated tumorigenesis is evidenced by the detection of polymicrobial biofilms in ∼90% of right-sided colorectal cancers, which are enriched for serrated tumors (42). 
Remarkably, the transition of serrated pre-cancerous lesions to MSI-H colorectal cancers maintained some metaplastic character, but portions of the tumor gain stem properties accompanied by non-APC WNT pathway mutations; these regions were devoid of cytotoxic T cells, further demonstrating the heterogeneous nature of both tumor cells and associated immune cells (15). Cellular plasticity between regenerative and stem states in colonic tumors modulates cytotoxic immunity, an observation supported mechanistically by mouse models. The mapping of these transitions proposes the reclassification of the consensus molecular subtypes (CMS) to only include two tumor cell subtypes based on their adenomatous (iCMS2) or serrated origins (iCMS3; ref. 12). These two pathways dichotomize WNT-dependent and WNT-independent mouse models of tumorigenesis that produce different tumor cell populations downstream (53). Single-cell characterization specifically of serrated lesions further uncovered two subgroups of tumor cells: the sessile serrated lesion subtype featuring Notch signaling and the traditional serrated adenoma subtype featuring Paneth cell metaplasia (54). In addition, the damage response was further characterized as stimulating abnormal ROS, and a subset of CD103+ CD8+ tissue-resident memory T cells was found to be associated with enhanced cytotoxicity in serrated lesions. Single nuclei transcriptomic and chromatin accessibility analyses on a progression spectrum of FAP samples, which are largely driven by APC mutations, further revealed a coordinated epigenomic and transcriptional trajectory during progression (55). Consistent with the single-cell studies above, tumor progression is associated with a gain in stem features and enrichment of Treg cells, exhausted T cells, and precancer-associated fibroblasts, all pointing towards immunosuppression. Key cell compositional changes were validated with co-detection by indexing (CODEX) multiplex imaging, and potential biomarkers for progression staging were proposed (GPX2 expression, LEK, TCF, and HNF4A accessibility). Mid-scale single-cell and spatial studies of human tissues with confirmatory mouse experiments are richly delineating different subtypes of tumor cell states, tumor-immune modulation mechanisms, and alterations associated with progression in colonic precancerous lesions.
Single-cell transcriptomic profiling has also been applied to gastric precancerous lesions spanning a spectrum of malignant stages. Pyloric metaplasia towards intestinal stem-like cells from glandular secretory cells was observed, followed by two groups of goblet cells that emerge from intestinal metaplasia (IM) with one featuring metabolism alteration and the other proliferation (56). HES6, a transcription factor of early goblet differentiation, as well as early gastric markers KLK10 and SULT2B1, were proposed as markers indicating the risk of IM progression. Another study that performed trajectory analyses on matched malignant and PMLs from patients diagnosed with either intestinal type (IGC) or diffuse type (DGC) gastric cancer revealed a potential path of neoplastic progression from IM to IGC and a de novo path toward DGC characterized by a higher expression of stemness and inflammatory CAF interaction (57).
In other atlasing efforts, more emphasis has been placed on spatially resolved profiling (Fig. 1C). Cell densities, cell neighborhoods, and collagen structure were identified as key predictors of DCIS progression using multiplexed ion beam imaging by time of flight (MIBI-TOF) data generated from DCIS specimens with matched progression to invasive breast cancer (IBC). Coupled with laser-capture microdissection RNA-seq, myoepithelial and stromal cell features were determined to be more predictive of progression compared with cancer cell features, emphasizing the important role of TME interactions. The hypothesis is that a compromised myoepithelial barrier surrounding the precancer enables infiltration of cells, such as immune cells and CAFs, which restrain tumor progression and protect against the incidence of invasive relapse (58). On the other hand, invasive transformation of DCIS to IBC generates hypoxia, marked by CA9 in tumor cells, which plays a role in recruiting FOXP3+ Treg cells to promote DCIS progression (59). Single-cell transcriptomic profiling also uncovered differential expression of the long noncoding RNAs SNHG6 and SNHG29 between invasive breast cancer and DCIS (60). The value of an experimental design that profiles pure DCIS precancer lesions as well as synchronous DCIS and IBC specimens is demonstrated in these atlasing studies.
Lastly, high-plex immunofluorescence imaging (CyCIF), 3D high-resolution microscopy, and spatially resolved microregion transcriptomics revealed molecular evidence of progression both within and across specimens of cutaneous melanoma spanning various degrees of malignancy (61). Imaging analyses revealed higher order cellular patterns including immune synapses, PD1-PDL1 colocalization, and other juxtacrine ligand–receptor interactions; they also revealed complex patterns of cytokine and receptor expression leading to immune cell polarization. These discoveries, together with recurrent neighborhood analysis, revealed a number of immunosuppression mechanisms along the progression spectrum, especially near invasive borders of lesions. Moreover, expression gradients of melanoma-related proteins such as MITF, S100A, and S100B were observed across invasive regions, highlighting the potential of morphogen gradients acting in tumor progression (61).
Emerging Themes from Precancer Atlases
Several common themes have emerged from profiling efforts across precancer types (Fig. 2). The first is the clarification of the ordering of molecular events that occur during tumor progression. Although substantial effort has been exerted on studying the effects of high-frequency driver genes, several precancer studies showed that these highly recurrent mutations mainly exert their effects at the precancer stage. Although these mutations may play a synergistic role with other events during later stages of cancer, progression is usually triggered by less prevalent molecular events conferring fitness advantages, which are more diverse among individuals. High-frequency driver events are targets of study because they are found in large majorities of different cancers. However, these events are shown to be already prevalent in premalignancy, although very few PMLs actually progress to cancer. These observations call into question the nature of events that determine paths of progression towards malignancy, cancer heterogeneity, and cancer cell plasticity in response to treatments.
The second major theme is the fluctuation of microenvironmental response during the course of precancer progression. Although cancers are classified as “immune hot” or “immune cold,” precancer profiling offers a glimpse into the pathway by which a TME arrives at this final state. A cytotoxic environment is already established in precancers despite their lower mutation burden compared with later stage tumors. Other extrinsic factors such as damage might have facilitated this favorable microenvironment to enable immune surveillance. A cytotoxic immune microenvironment restrains tumor progression, but this environment is progressively replaced by an immunosuppressive one as a precancer evolves into cancer. The transition is modulated by tumor-dependent microenvironmental changes, such as hypoxia, tissue physical architecture that excludes immune cells, and altered cytokine and fibroblast profiles. For instance, recent characterization of cervical lesions of various stages unveiled a low but active immune surveillance response featuring infiltration of CD8+ T cells, effector NK cells, and M1-like macrophages in PMLs. This is in contrast to malignant lesions, where the TME is characterized by enrichment of immunosuppressive cell types including exhausted T cells and M2-like macrophages (62). These findings suggest the potential of boosting immune cytotoxicity at early stages of malignancies as a preventative approach against tumor progression, although the applications of such strategies need to be carefully monitored to prevent immune-related adverse events such as autoimmunity (63).
Opportunities and Challenges for Profiling Precancerous Lesions Across Time and Space
Heterogeneity among cells, specimens, and patients is a substantial obstacle for understanding and treating cancer. For understanding tumor progression, variability from individual specimens or patients within a stage can dampen the differential signals across the stages. In addition, many PMLs do not progress, leading to false signals when pooled together with progressing lesions. Using a longitudinal experimental design by sampling within the same patient can somewhat alleviate interpatient heterogeneity (Fig. 1, yellow axes). Yet, this is tremendously challenging for large-scale studies given patient compliance, the unpredictability of disease progression, and the long time scale of cancer development. In a majority of cases, precancers are completely removed to prevent cancer development, leaving even less chance for additional sampling from the same lesion over time. In the colorectal precancer space, every polyp arises from a unique sequence of genetic events. Thus, polyps metachronously collected from the same patient share only a limited degree of similarity in their progression histories and cannot be safely referenced as precursors to later tumors (13, 64). Mapping an accurate progression path from a collection of independent precancer specimens characterized in bulk therefore presents significant challenges, although metachronous tumor characterization can shed light on the effects of shared genetic predisposition and environmental exposure.
Precancer atlas building requires datasets from large numbers of human specimens, which also brings up logistical challenges. Precancerous lesions are usually very limited in size compared with surgical resections of cancers, which makes customization of next-generation assays to work well with small numbers of cells all the more important. Many assays also generate the highest quality data with freshly collected samples, which puts increased demands on a coordinated tissue collection/processing pipeline. Furthermore, many high-resolution data types still require meticulous manual annotation to manage technical and batch variation, making them challenging to process in high throughput.
On the bright side, newer technologies with increasing resolution (single-cell/spatial) present interesting opportunities (Fig. 1, green axes). Single-cell technologies enable the investigation of differences between cell types and states within a sample, which are usually masked by signal averaging in bulk profiling technologies. Spatially resolved profiling technologies enable the evaluation of differences between regions within a lesion, changes in organization of different cell types, and alterations in cellular neighborhoods and tissue architecture (bioRxiv 2023.03.09.530832). These additional layers of information facilitate deconvolution of cellular heterogeneity within precancers and cancers. Using these technologies, it is possible to identify cells or regions featuring various degrees of malignancy within a tumor, opening new ways to dissect molecular and cellular alterations along the tumor progression axis. The generalizability of these types of analyses as they apply across tumors should be noted, and the importance of data integration across large cohorts of patients should be emphasized alongside technological advancement. Even though much work remains to develop computational tools to reliably integrate, annotate, and quality control high-volume and high-dimensional data, three-dimensional single-cell atlases are now emerging (65), representing a promising next step of precancer characterization.
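The masking of cell-state differences by signal averaging can be illustrated with a small numerical sketch. The two lesions, their expression values, and the threshold below are hypothetical and chosen only for illustration; they do not correspond to any dataset discussed in this review.

```python
from statistics import mean, pstdev

# Hypothetical per-cell expression of a single marker gene (arbitrary units).
lesion_uniform = [5.0] * 8             # every cell expresses the marker moderately
lesion_mixed = [0.0] * 4 + [10.0] * 4  # two distinct cell states, half low and half high

def bulk_signal(cells):
    """Bulk profiling is approximated here as the average over all cells."""
    return mean(cells)

def fraction_high(cells, threshold):
    """Single-cell view: fraction of cells whose expression exceeds a threshold."""
    return sum(1 for value in cells if value > threshold) / len(cells)

for name, cells in [("uniform", lesion_uniform), ("mixed", lesion_mixed)]:
    print(name,
          "bulk:", bulk_signal(cells),
          "per-cell spread:", pstdev(cells),
          "fraction high:", fraction_high(cells, threshold=7.5))
```

In this toy case both lesions give the same bulk value, whereas the per-cell spread and the fraction of high-expressing cells separate them, which is the kind of distinction single-cell resolution is meant to recover.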
Authors' Disclosures
K.S. Lau reports grants from NIDDK and NCI during the conduct of the study. No disclosures were reported by the other authors.
Acknowledgments
Z. Chen and K.S. Lau are partly funded by R01DK103831 from the National Institute of Diabetes and Digestive and Kidney Diseases and P50CA236733 and U54CA274367 from the NCI.