Abstract
Epithelial stem cells accumulate mutations throughout life. Some of these mutants increase competitive fitness and may form clones that colonize the stem cell niche and persist to acquire further genome alterations. After a transient expansion, mutant stem cells must revert to homeostatic behavior so normal tissue architecture is maintained. Some positively selected mutants may promote cancer development, whereas others inhibit carcinogenesis. Factors that shape the mutational landscape include wild-type and mutant stem cell dynamics, competition for the niche, and environmental exposures. Understanding these processes may give new insight into the basis of cancer risk and opportunities for cancer prevention.
Recent advances in sequencing have found somatic mutations in all epithelial tissues studied to date. Here we review how the mutational landscape of normal epithelia is shaped by clonal competition within the stem cell niche combined with environmental exposures. Some of the selected mutant genes are oncogenic, whereas others may inhibit transformation. Discoveries in this area leave many open questions, such as the definition of cancer driver genes, the mechanisms by which tissues constrain a high proportion of oncogenic mutant cells, and whether clonal fitness can be modulated to decrease cancer risk.
INTRODUCTION
Aging is accompanied by mutation (1). From the first division after conception, somatic cells acquire mutations (2–4). This progressive increase in the number of mutations in human stem cells is unavoidable but on its own cannot explain the diverse mutational landscapes that develop across normal tissues. Differences in the structure of the stem cell niche, wild-type and mutant stem cell dynamics, and environmental exposures all play a part in determining the prevalence of particular mutant genes in normal tissues. Understanding these processes sheds light on the biology of stem cells, the impact of environmental exposures, and the nature of the precancer state. In turn, these may guide interventions to reduce cancer risk, as it is from this mutated landscape that the founder clones of cancers emerge over years or decades (5, 6).
Progress in transgenic mouse and primary cell culture research has given insights into the behavior of wild-type and mutant stem cells that inform the interpretation of mutational data (7, 8). Here we first briefly consider the approaches used to detect mutations and then consider insights into epithelial stem cells and mutational selection that apply across tissues. Epithelia that have been studied in depth are then reviewed, and finally we draw together common principles in this nascent field and consider the complex relationship between normal tissue mutations and carcinogenesis.
THE CHALLENGE OF FINDING NORMAL EPITHELIAL MUTANTS
Cancers are clonal with subclones within them, so that as long as a sample is of sufficient purity, sequencing at normal depth will detect the founder clone and large subclones (9). An equivalent mass of normal epithelia is highly polyclonal, containing many different mutant clones (Fig. 1A). For a given gene, almost all reads will be wild-type, so that most mutants are likely to be below the lower limit of reliable detection of standard sequencing and hence be missed. Several strategies have been developed to get around this challenge.
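The detection problem can be made concrete with a back-of-envelope calculation. The sketch below assumes a heterozygous mutation in a diploid region, so that a clone making up a fraction f of the cells in a sample has an expected variant allele fraction (VAF) of f/2. The sample size, clone sizes, and the 5% detection floor are illustrative assumptions, not figures from the studies cited.

```python
def expected_vaf(clone_cells, total_cells, mutant_copies=1, ploidy=2):
    """Expected variant allele fraction (VAF) of one clone in a mixed sample."""
    if total_cells <= 0 or not 0 <= clone_cells <= total_cells:
        raise ValueError("need total_cells > 0 and 0 <= clone_cells <= total_cells")
    return (clone_cells / total_cells) * (mutant_copies / ploidy)


# Illustrative numbers only (assumptions, not taken from the cited studies).
DETECTION_LIMIT = 0.05   # assumed VAF floor for standard-depth sequencing
SAMPLE_CELLS = 100_000   # assumed number of cells in the sample

examples = {
    "tumor founder clone (80% purity)": 80_000,
    "mutant clone in normal epithelium": 200,
}
for label, clone_cells in examples.items():
    vaf = expected_vaf(clone_cells, SAMPLE_CELLS)
    status = "detectable" if vaf >= DETECTION_LIMIT else "below limit"
    print(f"{label}: VAF = {vaf:.4f} ({status})")
```

Under these assumptions the tumor founder clone sits at a VAF of 0.4, whereas the small clone in normal tissue sits at 0.001, far below the assumed floor.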
A simple approach is to generate single cell–derived clonal cultures from the tissue of interest (refs. 1, 10, 11; Fig. 1B). This allows amplification of single genomes in live cells to generate enough DNA for reliable whole-genome sequencing (WGS). The disadvantages are a loss of spatial information as the tissue is disaggregated when cultured, the potential for biased sampling as some wild-type or mutant cells may not grow in culture, the introduction of new mutations while cells are being cultured, and the substantial labor involved.
An alternative strategy is to use laser-capture microdissection (LCM) to remove a microscopic piece of tissue and perform either exome sequencing or WGS (12). Laser capture of a histologically defined stem cell niche such as a colonic crypt is quite likely to yield an oligoclonal or clonal population (ref. 13; Fig. 1C). Crucially, LCM retains spatial information, a major advantage allowing mutant clones to be mapped across an epithelium by taking multiple samples from serial sections. Optimization of multiple steps in small sample sequencing protocols has enabled WGS of as few as 100 cells with minimal sequencing errors (12). LCM overcomes the potential selective bias of the cell culture approach, has an acceptable technical failure rate, and has become the most widely used approach in the field. It has yielded a rich harvest of information on single-nucleotide variants, structural alterations, burden of synonymous mutations, and mutational signatures in mutant clones in normal tissues. However, LCM remains labor intensive, and the area of tissue and number of individual donors that can be analyzed in such studies are small, limiting the statistical robustness of studies.
More recent methods for the analysis of small samples detect somatic mutations in single DNA molecules using an extremely accurate protocol called duplex sequencing (14). The latest iteration of this approach achieved an error rate of less than five errors per billion bases and has been used to map the mutational burden and signatures across multiple human tissues (15). The technique randomly samples about a third of the genome, so it cannot give information about specific mutant genes, but it is scalable, allowing large numbers of individuals to be sampled in epidemiologic studies, for example to assess links between cancer risk and mutational burden or signatures. A key insight from duplex sequencing is that there is little relationship between the rate of cell division in a tissue and somatic mutation burden; for example, postmitotic neurons accumulate somatic mutations at a similar rate to proliferating tissues (15).
To map mutations in epithelia without a clearly defined niche, an alternative method relies on deep targeted sequencing (DTS; refs. 16–18; Fig. 1D). Sheets of epithelia, such as skin or esophagus, can be detached from the underlying stroma and dissected into a grid of samples (1–2 mm² each in the case of the skin or esophagus; refs. 19, 20). These are then sequenced at several hundredfold coverage. To identify rare variant alleles, a “reference library” of very high depth (over 10,000-fold) sequence data of normal DNA is used to measure the technical sequencing error rate at each base in each gene. This is then used to estimate the probability that the variant observed in the experimental sample is a true mutation (20). By collecting samples in a grid, clones spanning adjacent samples can be merged and low spatial resolution maps of mutant clones generated (19, 20). The advantage of DTS is that large areas can be rapidly surveyed at low cost, and large numbers of mutant clones collected, orders of magnitude more than with the LCM approach. However, in most cases the combinations of mutants within a clone are not revealed, copy-number information is limited, and estimates of synonymous mutation burden and signature may be biased in targeted data.
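The logic of calling a variant against a site-specific error rate can be sketched as follows. The text does not specify the statistical model, so a plain binomial test is assumed here as a stand-in for the callers used in the cited studies. The function names, the pseudocount, and the read counts are hypothetical.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log1p(-p)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the tail."""
    if k <= 0:
        return 1.0
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


def site_error_rate(ref_alt_reads, ref_depth):
    """Per-site error rate estimated from the reference library.

    The +1/+2 pseudocount is an assumption made here so that a site with
    no observed errors does not receive an error rate of exactly zero.
    """
    return (ref_alt_reads + 1) / (ref_depth + 2)


def variant_p_value(alt_reads, depth, ref_alt_reads, ref_depth):
    """Probability of seeing at least alt_reads variant reads from error alone."""
    return upper_tail(alt_reads, depth, site_error_rate(ref_alt_reads, ref_depth))


# Hypothetical site: 2 variant reads in 10,000 reference-library reads.
for alt in (1, 5):
    p = variant_p_value(alt_reads=alt, depth=500, ref_alt_reads=2, ref_depth=10_000)
    print(f"{alt} variant reads at 500x: p = {p:.2e}")
```

In this toy example a single variant read is compatible with sequencing error, whereas five reads at the same site are very unlikely to arise from error alone. A real pipeline would also need to handle overdispersion and multiple testing across sites.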
Finally, mutational information may be extracted from RNA sequencing data. An analysis of the GTEx study, which took a single sample from 30 tissues in hundreds of donors, revealed somatic mutants in multiple epithelia, with skin, lung, and esophagus the most highly mutated and colon, stomach, and bladder the least (21). Mutational signatures and copy-number alterations were detected and agreed with DNA sequencing results where these were available. However, the samples contain a complex mixture of cell types and varying proportions of wild-type and mutant cells, and the expression level of mutant genes may vary across lineages, so the sensitivity of mutant detection and ability to reliably estimate proportions of mutant cells in tissues are unclear. Nevertheless, this study confirms that somatic mutations are widespread across all tissues and suggests their abundance may vary substantially between different epithelia.
In summary, there is no single solution to mapping mutant clones in epithelia. The structure of the tissue and knowledge about the stem cell niche are invaluable in guiding which approach should be selected. WGS of clones is clearly the gold standard for many analyses but is limited to small areas of tissue in small numbers of donors. DTS mapping has revealed much about the nature of competitive selection of mutants at a scale unmatched by any other method. Duplex sequencing has great promise for studying small biopsies in large numbers of individuals. In exploring a previously unstudied tissue, a hybrid approach seems wise, using DTS to assess the prevalence and size of mutant clones, guiding more costly but informative WGS sampling (22).
NORMAL STEM CELL DYNAMICS AND SOMATIC MUTATIONS
We now turn to key principles for interpreting the effects of somatic mutations on normal epithelia. These are stem cell dynamics and how mutations that alter cell behavior may be identified.
A critical feature of normal tissue is cellular homeostasis, so that the number of cells in the tissue as a whole and in the proliferating compartment is stable over time. Even in highly mutated normal epithelia, the rate of cell loss by processes such as shedding or apoptosis must match cell production. Furthermore, within the stem cell niche, on average, each division must produce one stem cell and one cell that will leave the niche to differentiate, so that the number of stem cells remains constant. In transgenic mice, a specific reporter mutation can be induced in scattered single cells to allow clones to be tracked. When combined with statistical modeling or more recently live imaging, this has shown how epithelial stem cells achieve this homeostatic balance (23–27).
There are two distinct cellular mechanisms to achieve homeostasis used by stem cell populations in epithelia. The first, exemplified by the intestinal crypt, is likely to operate in most epithelia organized into spatially defined proliferative compartments. Stem cells divide to generate two stem cell daughters, but division is limited by the finite space within the niche. Stem cells can only divide when a neighboring stem cell leaves the niche as it differentiates (23, 26, 28, 29). In other epithelia where the stem cells lie within an uninterrupted cell layer, the probability of generating stem cell and differentiating daughter cells is balanced so that across the stem cell population, the average division results in 50% stem cells and 50% differentiating cells, ensuring homeostasis (24, 27, 28, 30–32). Both of these mechanisms result in neutral drift in lineage-tracing experiments (23, 25, 28).
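Both mechanisms can be illustrated with a toy simulation of a single labelled neutral clone. This is a minimal sketch under simplifying assumptions: the niche is treated as well mixed, the sheet model counts division events within the clone rather than real time, and the niche size (7), symmetric-division probability (0.1), and number of trials are arbitrary illustrative values.

```python
import random


def niche_drift(n_stem, rng, max_steps=100_000):
    """Neutral competition in a fixed-size niche.

    At each step one stem cell leaves the niche and one of the remaining
    cells divides to fill the gap, so the stem cell number stays constant.
    Returns the final labelled count (0 = lost, n_stem = monoclonal).
    """
    labelled = 1
    for _ in range(max_steps):
        if labelled in (0, n_stem):
            break
        lost_is_labelled = rng.random() < labelled / n_stem
        remaining = labelled - lost_is_labelled
        replacer_is_labelled = rng.random() < remaining / (n_stem - 1)
        labelled = remaining + replacer_is_labelled
    return labelled


def sheet_drift(r, rng, n_divisions):
    """Balanced stochastic fate in an uninterrupted cell layer.

    Each division yields two proliferating daughters (probability r),
    two differentiating daughters (probability r), or one of each.
    Returns the number of proliferating cells left in the clone.
    """
    proliferating = 1
    for _ in range(n_divisions):
        if proliferating == 0:
            break
        u = rng.random()
        if u < r:
            proliferating += 1
        elif u < 2 * r:
            proliferating -= 1
    return proliferating


rng = random.Random(0)  # fixed seed so the sketch is reproducible
trials = 10_000

fixed = sum(niche_drift(7, rng) == 7 for _ in range(trials))
print(f"niche: clone took over in {fixed / trials:.3f} of trials (1/7 = {1 / 7:.3f})")

sizes = [sheet_drift(0.1, rng, 200) for _ in range(trials)]
lost = sum(s == 0 for s in sizes) / trials
print(f"sheet: mean clone size {sum(sizes) / trials:.2f}, fraction lost {lost:.2f}")
```

In both models the average size of a neutral clone stays constant while individual clones either disappear or expand by chance, which is the hallmark of neutral drift.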
Given the conservation of stem cell behavior across evolution, it is to be expected that human epithelial stem cells will also fall into one or the other of these two classes of homeostatic regulation (28, 33). An implication of these dynamics is that single-nucleotide variants that do not alter cell behavior may be lost by neutral drift before the second allele is disrupted. A further constraint on mutant clones in tissues organized into clonal units, such as the colon, is that once a clone fully occupies its niche, further expansion is restricted unless the clone can transgress the normal boundaries of the compartment (23, 29). In squamous epithelia, on the other hand, the niche is a two-dimensional sheet, and there are no limits to clonal expansion within the plane of proliferating cells (24, 31). As well as setting a potential upper bound on clone size, the niche defines the dynamics of competition between mutants and mutant selection, as discussed below.
A key challenge that parallels the analysis of mutations in cancer genomes is to discriminate “driver” mutations that alter cell behavior from neutral “passenger” mutants. A simple method to resolve the two that is widely used in somatic mutation studies is to compute the ratio of protein-altering (dN) to silent (dS) mutations in the coding region of a gene. This approach avoids pitfalls such as the variation in mutation rates across the genome and variations in sequencing coverage and has recently been enhanced by using the mutational spectrum to estimate the expected frequency for every possible nucleotide substitution in the gene (17, 34). The dN:dS ratio under the assumption of neutrality is 1, whereas positively selected mutant genes in epithelia may have dN:dS ratios of up to 50 or more. Negative selection is much harder to detect, requiring a much larger sample size, but has been observed in highly competitive tissues such as the skin (20).
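The core of the calculation is a ratio of ratios: observed protein-altering to silent mutations, normalized by the ratio expected under neutrality. The sketch below takes the expected counts as inputs rather than deriving them from a mutational spectrum, so it is a simplification of the spectrum-aware methods cited in the text. The function name and all counts are hypothetical.

```python
def dn_ds(n_obs, s_obs, n_expected, s_expected):
    """dN/dS for one gene.

    n_obs, s_obs: observed protein-altering and silent mutations.
    n_expected, s_expected: counts expected under neutrality (only their
    ratio matters); in practice these come from a substitution model.
    """
    if s_obs <= 0 or n_expected <= 0 or s_expected <= 0:
        raise ValueError("silent count and expected counts must be positive")
    return (n_obs / n_expected) / (s_obs / s_expected)


# Hypothetical gene in which neutral mutations are expected to be
# protein-altering three times as often as silent.
neutral = dn_ds(n_obs=30, s_obs=10, n_expected=3.0, s_expected=1.0)
selected = dn_ds(n_obs=150, s_obs=5, n_expected=3.0, s_expected=1.0)
print(f"neutral-looking gene: dN/dS = {neutral:.1f}")
print(f"positively selected gene: dN/dS = {selected:.1f}")
```

With so few silent mutations the estimate is noisy, which is one reason the approach needs large numbers of mutations; a real analysis would attach a confidence interval to each ratio.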
It should be noted that dN:dS ratios have some limitations. For example, a mutant gene such as PIK3CA with both inactivating nonsense and missense mutations and gain-of-function “hotspot” mutations can have a net dN:dS ratio close to 1 (20). Some synonymous mutations may have functional consequences and so cannot be assumed to be neutral (35). Simple comparisons of dN:dS ratios across tissues or studies cannot be relied upon as indicators of the strength of selection of a mutant gene in different contexts. Furthermore, sufficient numbers of mutations are required so this approach is not suitable for small studies or sparsely mutated tissues. Nevertheless, the approach is a simple and robust method to identify mutant genes likely to have functional impact on cell dynamics.
As we will discuss below, there are several examples of positive selection of mutant genes in humans that have been studied in mouse models, which show the mutant gene confers a proliferative advantage on stem cells. In each case, the competitive advantage of mutant stem cells is transient, as following colonization of a niche, such as a colonic crypt, or a region of squamous epithelium, the mutant cells revert from expansion toward homeostasis. This behavioral reversion is critical for the maintenance of normal histology and function of aging tissues, but its molecular basis is poorly understood.
In epithelia, a mutant cell immediately comes into direct contact with wild-type neighbors and must compete with them for space in the tissue. Wild-type cells are able to sense cells with a markedly abnormal phenotype, such as those expressing high levels of oncogenic mutants from strong synthetic promoters, and extrude them from the epithelium (36–38). However, the phenotype of the same mutants expressed from their own promoter is subtler, allowing the mutant cell to persist (39, 40). Positively selected mutants confer a proliferative advantage on the mutant cell itself, but some “super competitors” also act on adjacent wild-type cells, inducing their differentiation. For example, in mice, Apc-null stem cells in the intestinal crypt secrete the WNT inhibitor NOTUM, which drives adjacent wild-type stem cells to differentiate and leave the crypt, creating space for mutant clone expansion (41, 42). This increases the chances that the Apc-mutant cells will completely occupy the crypt and go on to found an adenoma. Interestingly, the efficiency of crypt takeover by Apc-null cells is reduced by a calorie-restricted diet that increases the number of wild-type cells and level of competition in the niche (43). A further example is that of Notch1-null cells in the mouse esophagus, which induce the differentiation and upward migration of adjacent wild-type cells in the proliferative layer, apparently by activation of Notch signaling in the wild-type cell (44–46).
We now explore these themes in individual epithelia, firstly considering those organized into discrete clonal units where mutants compete within a niche, such as the colon, and then tissues where there is no defined niche, such as squamous epithelia and the bladder.
COLON
The colonic epithelium is a classic example of a tissue organized into spatially discrete proliferative units, defined by the colonic crypt and an area of adjacent epithelium that it supports, each maintained by a separate population of stem cells (47, 48). Wild-type stem cells compete neutrally for space in the niche, with stem cell division being linked to the exit of a cell as it begins the process of differentiation (Fig. 2A). Lineage tracing in mice shows crypts becoming monoclonal through neutral competition, with the colonizing clone being no fitter than the stem cells it displaces (ref. 49; Fig. 2B).
More recently, spontaneous mutations have been used as lineage markers to infer the dynamics of wild-type stem cells in normal human colonic crypts. Somatic mutations in the enzyme O-acetyl transferase can be detected by histochemical staining, allowing the visualization of clones within colonic crypts (50). A mixed population of clonal, partially colonized, and nonmutated crypts is seen. Analysis of this mixture leads to estimates of seven stem cells per crypt that divide every 9 months on average and a median time for a crypt to drift to monoclonality of 6 years; other studies have reached similar conclusions (13, 50, 51). The rate of stem cell turnover is far slower than in the mouse colon, though other aspects of stem cell competition appear to be conserved (39, 49, 50). Human colonic epithelium is thus an array of small spatially separated stem cell pools undergoing slow turnover and a process of neutral drift.
The insight that many colonic crypts in aging humans will carry clonal mutations motivated the large-scale sequencing of normal human colonic crypts by LCM (13). Over 500 clonal crypts from middle-aged or older donors were subjected to WGS. Signature analysis indicated the bulk of mutations were due to cell-intrinsic processes (signatures SBS1 and 5) or oxidative DNA damage (SBS18), which were found in all samples (13, 52). In addition, some mutational signatures were restricted to either individuals or crypts. For example, a patient who had received chemotherapy had a unique signature reflecting this and two crypts had the signature of the APOBEC cytidine deaminases. The WGS analysis was augmented by targeted sequencing for known colon cancer drivers. Mutant AXIN2 and STAG2 were identified as under positive selection and gain-of-function hotspot mutations detected in ERBB2, ERBB3, PIK3CA, and FBXW7. Overall, these mutant genes are found in about 1% of crypts of a typical 50-year-old (13). About 50% of crypts harbor mutations likely to have an impact on protein function in genes of the Cancer Gene Census, but the absence of selection argues that these do not substantially alter the behavior of colon stem cells (53).
How do mutants alter stem cell dynamics? STAG2 lies on the X chromosome, so a protein-truncating mutant will lead to loss of protein expression (50). This allows the visualization of mutant cells in crypts, revealing a 10-fold higher ratio of monoclonal to partially colonized crypts than is seen with neutral mutations. A STAG2-mutant stem cell has a decisive competitive advantage over wild-type cells, as when a differentiated cell leaves the crypt, it is much more likely to be replaced by a STAG2-mutant than a wild-type cell, with the former having a 99% probability of taking over its crypt (50).
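The effect of a replacement bias on crypt takeover can be illustrated with a gambler's-ruin calculation. This is a sketch under stated assumptions, not the model fitted in the cited study: the clone is assumed to gain or lose one stem cell at each contested replacement, the mutant wins each contest with a fixed probability p_win, and the crypt holds seven stem cells as estimated above. The bias values are illustrative.

```python
def fixation_probability(p_win, n_stem, start=1):
    """Probability that a clone of `start` cells takes over a niche of n_stem cells.

    p_win is the assumed probability that the mutant, rather than the
    wild-type cell, wins each contested replacement (0.5 = neutral).
    """
    if not 0 < p_win < 1:
        raise ValueError("p_win must lie strictly between 0 and 1")
    if not 0 < start <= n_stem:
        raise ValueError("start must be between 1 and n_stem")
    if abs(p_win - 0.5) < 1e-12:
        return start / n_stem  # neutral drift
    ratio = (1 - p_win) / p_win
    return (1 - ratio**start) / (1 - ratio**n_stem)


N_STEM = 7  # stem cells per crypt, from the estimate quoted in the text
for p_win in (0.5, 0.6, 0.8, 0.99):  # illustrative bias values
    prob = fixation_probability(p_win, N_STEM)
    print(f"p_win = {p_win:.2f}: takeover probability = {prob:.3f}")
```

Under these assumptions a neutral clone takes over its crypt one time in seven, and the takeover probability rises steeply with even a moderate bias.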
Copy-number changes and/or structural variants were much more common than positively selected mutations in normal colon, being detected in about a fifth of evaluable crypts (13). The commonest events were large deletions and tandem duplications; whole-chromosome copy-number increases were seen more rarely. The events observed were not recurrent, are not known to be linked to cancer, and were confined to single crypts. The level of copy-number alterations in normal epithelium is far lower than that in cancer (13).
Almost all clonal genomic events are confined within a single crypt. However, crypts may occasionally split into two, a process termed crypt fission, which was first demonstrated in the normal human colon by visualizing clones carrying somatic mitochondrial mutations (54). Mouse models have argued that oncogenic mutants such as Kras may spread by accelerating the rate of crypt fission, allowing them to break out of the imprisonment of a single crypt (Fig. 2C; ref. 55). In humans, the fission rate of wild-type crypts is low, estimated at 0.7%/year. STAG2 mutants increase this rate 3-fold, and for mutant KRAS the rate of fission may be up to 10-fold higher (50). The effect is to create areas of multiple clonal crypts within the normal epithelium. As well as crypt fission, recent studies in the mouse small intestine hint that oncogenic Kras mutants secrete short-range signals that may have detrimental effects on wild-type stem cells in adjacent crypts, though it remains to be seen if such mechanisms operate to promote clone spread in humans (56).
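To see what these fission rates imply for patch size, the sketch below treats fission as a pure-birth process in which every crypt in a patch splits independently at a constant rate, so the expected patch size after t years is exp(rate × t). Constant rates, no crypt loss or fusion, and the 50-year time span are all simplifying assumptions made here.

```python
from math import exp


def expected_patch_size(fission_rate_per_year, years):
    """Expected number of crypts descended from one crypt (pure-birth process)."""
    if fission_rate_per_year < 0 or years < 0:
        raise ValueError("rate and time must be non-negative")
    return exp(fission_rate_per_year * years)


WILD_TYPE_RATE = 0.007  # 0.7% per year, as quoted in the text
YEARS = 50              # illustrative time span (assumption)

fold_changes = {"wild type": 1, "STAG2 mutant (3-fold)": 3, "KRAS mutant (10-fold)": 10}
for label, fold in fold_changes.items():
    size = expected_patch_size(WILD_TYPE_RATE * fold, YEARS)
    print(f"{label}: about {size:.1f} crypts after {YEARS} years")
```

Under these assumptions a wild-type crypt yields roughly 1.4 crypts on average after 50 years, a 3-fold increase in fission rate roughly 2.9, and a 10-fold increase roughly 33, which shows how a modest change in fission rate compounds into large clonal patches.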
Although our focus is on normal epithelium, it is worth noting that recent studies have shown the dramatic impact of nonmalignant disease, specifically inflammatory bowel disease (IBD), on the selection and dynamics of mutant clones in the colon. These disorders are characterized by episodes of inflammation, ulceration, and healing, lasting over decades. An LCM-based study found that IBD increased mutational burden, substantially elevated the proportion of crypts with copy-number alterations, and generated multiple clones extending over millimeters, presumably due to crypt fission (57). Genes under positive selection in IBD epithelium included two known cancer drivers, ARID1A and FBXW7, as well as PIGR and ZC3H12A, genes implicated in inflammation that were specifically selected in IBD epithelium. These findings were extended by studies using both single and bulk crypt sequencing and clonal organoid sequencing, confirming the selection of multiple mutant genes in the IL17 pathway that drives ulcerative colitis (58, 59). Functional studies confirm that the selected mutants protect against IL17-driven apoptosis (58). In an elegant study of patients with Crohn disease, LCM of crypts was performed on biopsies of normal tissue over 8 years prior to the development of a tumor requiring resection (60). These revealed the dramatic expansion of TP53-mutant clones from the ascending to descending colon. Interestingly, in mouse models, whereas Apc- and Kras-mutant cells outcompete wild-type cells to take over crypts, Trp53-mutant cells compete neutrally in normal epithelium, only gaining a competitive advantage when the intestine is inflamed (39). Thus, strong selective pressures can lead to the selection of specific somatic mutants unrelated to cancer in inflamed bowel.
Normal colon thus tolerates both copy number–altered and mutant crypts, and particular oncogenic mutants have the potential to expand across large areas of epithelium by driving crypt fission in response to selection driven by environmental cues such as inflammation. Common cancer driver mutants such as APC and KRAS are rarely found in normal epithelium, whereas some mutant genes that are more common in normal epithelium, such as ERBB2, are comparatively infrequent in colonic cancer. The ability of a mutant to alter stem cell dynamics to increase the likelihood of crypt colonization is thus separable from oncogenicity.
STOMACH AND SMALL INTESTINE
Less information is available on somatic mutations in the stomach, small intestine, and rectum. Recent LCM sequencing of multiple organs from five human donors gives a preliminary view of the mutational landscape of the stomach and small intestine in which stem cells are also arranged into glands (stomach) or crypts (intestine; ref. 22). The mutational burden at both sites is similar to the colon, despite the extreme rarity of cancer in the small intestine. Signatures are also similar, although the signature of the carcinogen aristolochic acid (AA) was identified in the stomach (22).
ENDOMETRIUM
A second tissue that contains histologically restricted clonal units but differs dramatically from the colon is the endometrium, which consists of an epithelial sheet punctuated by glands. Between menarche and menopause, the human endometrium undergoes cyclical apoptosis, shedding, regeneration, and remodeling of its functionalis layer (refs. 61–63; Fig. 2D). The regeneration of the epithelium is thought to depend on stem cells that persist in the highly branched endometrial glands that lie in the basalis layer (64). LCM and WGS of human basalis glands show that over 90% are clonal (65). As many clonal gland sections do not carry mutants under positive selection, this may reflect a drift to monoclonality due to neutral competition, as seen in the colonic crypt. The mutation burden was proportional to age, in keeping with the predominant mutational signatures being the clock-like SBS1, SBS5, and SBS40 (which is similar to SBS5), together with SBS18, which reflects oxidative damage (65). Twelve mutant genes were under positive selection in normal endometrial glands. These included the growth factor receptors ERBB2 and ERBB3; signal transduction components KRAS, PIK3CA, PIK3R1, ARHGAP35, and PPP2R1A; and steroid hormone response genes ZFHX3 and FOXA2 and also FBXW7, CHD4, and SPOP. Sixty percent of glands had one or more selected mutant genes (65). Consistent results have been obtained in other studies using LCM of epithelium and glands with a targeted sequencing approach (66–68). Somatic copy-number changes and structural variants were rarer and found in about a sixth of normal endometrial glands, almost all of which had only a single alteration (65). Some clones appeared to extend across multiple glands, consistent with the interconnected branching structure of the glands (64, 65, 68). The mutants that are commonly selected in normal epithelium differ from those frequently mutated in cancer. Endometrial cancer driver genes are rarely mutated in glands and are not under selection in the endometrium. Only 2% of glands harbored cancer drivers (heterozygous TP53 and ARID1A mutants; ref. 65). As with the colon, the normal tissue does not provide conditions that favor strong selection of oncogenic mutant clones.
It is interesting to speculate if the same mutants would be selected in the same tissue even if it grows in a different body site. Endometriosis, which occurs in about 10% of women, is a condition in which endometrial-like epithelium grows outside of the uterus. LCM followed by targeted sequencing has revealed recurrent PIK3CA-, PIK3R1-, KRAS-, FBXW7-, and PPP2R1A-mutant clones in endometriosis lesions consistent with convergent selection of mutants in the same environment, albeit at a different location (66, 69).
EPIDERMIS
The outermost layer of the skin, the epidermis, consists of a sheet of keratinocytes, interspersed with hair follicles and sweat ducts (33). The keratinocytes are organized into layers. Proliferating cells are confined to the deepest basal layer. Dividing cells generate daughters that go on to either divide themselves or differentiate, exiting the basal layer and migrating through the overlying layers until they reach the surface of the skin from which they are shed (Fig. 3A). Shedding and proliferation continue throughout life. One consequence of the structure of the epidermis is that there is no barrier to limit the lateral spread of clones, which can extend over a centimeter in diameter (17). Transgenic mouse research and live imaging of primary human cell cultures indicate that in the normal epidermis, proliferating cells are a single population that has a simple pattern of behavior (27, 30, 45). The average cell division generates dividing and differentiating daughter cells with equal probability (Fig. 3A), achieving cellular homeostasis across the population of proliferating cells (Fig. 3A). A consequence of these normal cell dynamics is that while most cells that acquire a neutral mutation generate short-lived clones that are lost by differentiation within a few rounds of cell division, by chance, a minority of clones will expand and persist longer term in the tissue (Fig. 3B).
Early studies on somatic mutation in the epidermis used immunostaining for TP53 to detect clusters of cells in which the protein was stabilized by mutation, which could then be sequenced. TP53-mutant clones were identified in sun-exposed normal skin (70). More recently, DTS has been used to map mutations in sheets of normal skin in multiple body sites, uncovering a high density of mutant clones particularly in sites regularly or intermittently exposed to sunlight (16–18, 20). The majority of these mutations are C to T (mutational signature SBS7) and CC to TT substitutions consistent with UV-induced mutation; the remaining mutations are attributable to the clock-like SBS5 (20). The density of mutant clones varies widely between individuals at the same body site and across the body.
Positively selected genes in the epidermis include NOTCH1, NOTCH2, NOTCH3, FAT1, TP53, TP63, ARID1A, AJUBA, KMT2D, RB1, and RBM10 (ref. 20; Fig. 3C). In addition, canonical hotspot activating mutations were found in the receptor tyrosine kinases EGFR, ERBB2, ERBB3, and FGFR3 and the downstream signaling components KRAS, HRAS, AKT1, and PIK3CA, along with the transcription factor NFE2L2, which regulates the oxidative stress response. The skin is also the first tissue in which evidence of negative selection, of missense CUL3 and DICER1 mutations and nonsense PIK3CA mutations, has emerged. This may reflect the fierce clonal selection in the skin (20).
How do mutant genes drive clonal expansion? In the case of Trp53, insight comes from a mouse model in which the equivalent of the commonest missense mutant in human skin (TP53 R248W) can be induced in single cells and the resultant clones tracked by virtue of expressing a fluorescent reporter. Proliferating mutant cells produce slightly more dividing than differentiating daughters in each cell division on average (ref. 40; Fig. 3D). This gives mutant cells a proliferative advantage over wild-type cells, and an increased chance of persisting and spreading through the epidermis compared with a neutral mutant. The mutant clones expand and colonize large areas of epidermis, which soon appear thickened and express stress markers. However, as months pass, the behavior of the mutant cells changes, reverting toward balanced cell production, and mutant and wild-type epidermis become histologically indistinguishable (ref. 40; Fig. 3D). A similar mechanism of a transient competitive advantage followed by reversion to homeostatic behavior in driver mutant clones in human epidermis would help to explain how the epidermis can carry a high burden of positively selected mutants within normal epithelium.
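The consequence of a small fate imbalance can be quantified with a simple branching-process calculation. The parametrization below is an assumption made for illustration: each division yields two dividing daughters with probability r(1 + delta), two differentiating daughters with probability r(1 - delta), and one of each otherwise. With the imbalance held constant, the long-run chance that a single mutant cell founds a surviving clone is 2·delta / (1 + delta), independent of r. The delta values are hypothetical, and the later reversion toward balance described above is not modeled.

```python
def long_run_survival(delta):
    """Probability that a clone founded by one cell is never lost.

    delta is the assumed fate imbalance: 0 means balanced (neutral) fates,
    and positive values favor dividing over differentiating daughters.
    The clone is treated as a linear birth-death process, for which the
    extinction probability from one cell is (1 - delta) / (1 + delta).
    """
    if not -1 < delta < 1:
        raise ValueError("delta must lie strictly between -1 and 1")
    if delta <= 0:
        return 0.0  # balanced or disadvantaged clones are eventually lost
    return 2 * delta / (1 + delta)


for delta in (0.0, 0.02, 0.05, 0.1):  # illustrative imbalance values
    print(f"delta = {delta:.2f}: long-run survival = {long_run_survival(delta):.3f}")
```

Even a few percent bias toward dividing daughters turns near-certain eventual loss into an appreciable chance of persistence, which is consistent with the modest imbalance described in the text being sufficient for clonal expansion.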
Surveying skin across the body reveals intriguing differences between sites. The UV mutational signature is subtly different in facial skin (SBS7d), which is exposed to UV light on a daily basis, than in other locations, reflecting distinct DNA damage and/or repair processes (20, 52, 71). Selection of mutant genes also varies. Mutant TP53 is preferentially selected, and mutant FAT1 is less competitive in facial skin compared with other locations. In contrast, in the lower leg, mutant NOTCH1 and NOTCH2 are more strongly selected than other mutant genes (20). These differences in selection may result from the frequency and intensity of UV light exposure, as UV light may alter the behavior of existing mutant clones. For example, in mice, repeated exposure to sub-sunburn doses of UV light dramatically expands Trp53-mutant clones compared with unirradiated skin (40, 72, 73). As a result, the bulk of the Trp53-mutant population in sun-exposed skin is generated by UV light–induced growth of preexisting clones rather than de novo mutations (72). The impact of UV light on other mutant genes remains to be studied, but even if Trp53 was the only gene affected, the landscape may be changed as other clones are displaced by mutant Trp53 clonal expansion.
By old age, WGS of microbiopsies reveals normal epidermis to be a dense patchwork of mutant clones, carrying up to 30,000 to 40,000 mutations per genome (20). Single clones may have several driver mutations, with one or more having loss of heterozygosity (LOH). The most frequent gene undergoing LOH is NOTCH1, followed by PTCH1, the driver of basal cell carcinoma, which lies close to NOTCH1. LOH of FAT1 and TP53 is also frequent.
Collectively, mutant NOTCH1 and FAT1 clones each occupy about a third of aged sun-exposed epidermis, about twice the area colonized by TP53 and NOTCH2 mutants (20). It is striking that the proportion of keratinocyte cancers carrying NOTCH1 and FAT1 mutants is similar to that in normal skin, whereas these tumors are substantially enriched in mutant TP53 and NOTCH2 compared with normal tissue. This suggests that although mutant TP53 and NOTCH2 promote cancer development, mutant NOTCH1 and FAT1 may make little contribution to transformation (20).
In summary, normal epidermis tolerates a remarkably high burden of clones carrying multiple mutations under strong positive selection. UV light both generates the bulk of mutations and shapes the mutational landscape, particularly by expanding the TP53-mutant population.
ESOPHAGUS
Like the epidermis, the squamous esophagus consists of layers of keratinocytes but differs in several respects. The lower cell layers contain dividing cells; cells exit the cell cycle and migrate toward the surface but retain their nuclei until they are shed (74, 75). Continual cell turnover is required to maintain tissue integrity, with the majority of cell divisions occurring in the two to three layers immediately above the basal cell layer (76). The esophagus has rare glands, which appear to be almost entirely quiescent (77). As in the epidermis, there are no barriers to restrict the lateral expansion of clones within the proliferative compartment, which can expand to millimeter scale (19). In terms of stem cell dynamics, mouse studies argue that the proliferating cells are a single population with similar properties to those in the epidermis, so that cells with a neutral mutation will follow neutral drift (24, 78). In human esophagus, evidence for stem cell behavior is indirect, inferred from proliferation marker expression and cell culture, but all proliferating cells seem to have similar potential to generate cultures and reconstitute esophageal epithelium in xenograft studies, consistent with what would be expected from mouse studies (74, 75).
It might be expected that a lower proportion of esophageal epithelium would be mutated compared with epidermis given the lifelong exposure of the latter to mutagenic UV light. However, this is not the case (10, 19, 21, 22). Human esophagus progressively acquires mutations with age—the predominant mutational signatures being the clock-like SBS1 and 5, with the addition of the alcohol mutational signature SBS16 in some individuals. Mutant genes under positive selection include NOTCH1, NOTCH2, NOTCH3, TP53, FAT1, ARID1A, KMT2D, CUL3, AJUBA, PIK3CA, ARID2, TP63, NFE2L2, CCND1, and PPM1D (refs. 10, 19; Fig. 4A). WGS shows copy-neutral LOH of NOTCH1 is frequent, but other genome alterations are rare. By old age, esophageal epithelium is one of the most mutated tissues in the body, with mutant clones occupying the majority of the epithelium.
Mouse models argue the mechanism of clonal selection in the esophagus is competition for space within the proliferative compartment (79). Lineage tracing in mutagen-treated mouse esophagus with a patchwork of mutations very similar to that in humans indicates that mutant clones expand, displacing wild-type and less fit mutants until they encounter mutants of similar fitness (Fig. 4B). At this point, clones revert toward homeostatic behavior and compete neutrally, explaining both selection and the ability of the tissue to retain normal structure and cell dynamics with such a high burden of mutant cells (79). The intense competition for space in highly mutated epithelia such as the esophagus also poses a challenge for early tumors, as highly competitive clones within the normal epithelium may remove microscopic lesions before they can progress further (ref. 80; Fig. 4C).
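The idea of competition for space can be sketched with a toy one-dimensional lattice model. This is an illustrative simplification, not the model used in ref. 79: it assumes a ring of sites, that a fitter neighbor always replaces a less fit cell, and that cells of equal fitness replace each other with probability 0.5. The clone labels and fitness values are hypothetical.

```python
import random


def compete(fitness_by_clone, layout, steps, seed=0):
    """Toy ring model of clonal competition for space.

    layout: list of clone labels, one per site on the ring.
    fitness_by_clone: dict mapping each label to a (hypothetical) fitness.
    Returns the layout after `steps` random replacement attempts.
    """
    rng = random.Random(seed)
    sites = list(layout)
    n = len(sites)
    for _ in range(steps):
        i = rng.randrange(n)                  # site that may be replaced
        j = (i + rng.choice((-1, 1))) % n     # random neighbor on the ring
        here, there = sites[i], sites[j]
        if here == there:
            continue                          # same clone: nothing changes
        f_here = fitness_by_clone[here]
        f_there = fitness_by_clone[there]
        if f_there > f_here:
            sites[i] = there                  # fitter neighbor takes the site
        elif f_there == f_here and rng.random() < 0.5:
            sites[i] = there                  # equal fitness: neutral drift
    return sites


if __name__ == "__main__":
    # Hypothetical setup: wild-type tissue with two equally fit mutant cells.
    layout = ["wt"] * 60
    layout[10] = "A"
    layout[40] = "B"
    fitness = {"wt": 1.0, "A": 2.0, "B": 2.0}
    final = compete(fitness, layout, steps=10000)
    print({label: final.count(label) for label in set(final)})
```

Under these assumptions, the two mutant clones expand at the expense of wild-type cells until they meet, after which their shared boundaries move only by neutral drift.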
In terms of the area of the esophagus colonized, NOTCH1 mutants are predominant, and in combination with LOH, this means that by middle age the majority of the esophagus has lost both alleles of NOTCH1 (20, 44). In mouse models, transgenic inhibition of Notch signaling confers a strong competitive advantage on clones in normal esophagus, and Notch1 is haploinsufficient, so mutation of a single allele confers a competitive advantage, increased by loss of the second allele (44, 45). However, despite being very competitive in normal esophagus, NOTCH1 mutants are poorly oncogenic, being found in less than 10% of cancers (ref. 81; Fig. 4D). Indeed, the depletion of mutants in cancer compared with normal tissue argues that NOTCH1 loss may protect against transformation. In contrast to NOTCH1, TP53 mutation with LOH is found in almost all squamous carcinomas of the esophagus. In normal tissue, heterozygous mutant clones are found in 10% of the tissue by middle age, rising up to 30% for those in their 70s (refs. 10, 19; Fig. 4D). This suggests that TP53 mutants confer a competitive advantage on the cells that carry them, but unlike NOTCH1, TP53-mutant clones with LOH are very rare in the normal epithelium (19, 82).
Less is known about how environmental factors shape the normal esophageal landscape. High alcohol intake is a risk factor for squamous esophageal cancer and is associated with an increased mutational burden with an alcohol mutational signature and a higher density of clones carrying TP53 and NOTCH1 mutations (10). Almost all squamous esophageal cancers carry TP53 mutations, and in drinkers they are likely to be caused by alcohol generating a TP53-mutant clone in the normal epithelium, as evidenced by the alcohol mutational signature (83). In mice, it has been shown that exposure to low-dose ionizing radiation—50 mGy, equivalent to three to four CT scans—is able to promote the expansion of preexisting Trp53-mutant clones by a DNA damage–independent mechanism (84). The exposure results in redox stress, causing wild-type cells to differentiate. Mutant cells express high levels of antioxidant genes and are protected, and so are able to expand into the space vacated by wild-type cells (84). This is an example of an environmental exposure that leaves no mutational signature but might potentially increase cancer risk by expanding the population of oncogenic mutants in a normal tissue. Such factors might explain the absence of different mutational signatures in esophageal squamous cancers from high- and low-incidence parts of the world (83).
UROTHELIUM
The urothelium that lines the bladder and ureters is a continuous, multilayered sheet of cells that normally have a very low rate of cell division with less than one in a thousand cells expressing proliferation markers (refs. 85, 86; Fig. 5A). However, in response to injury, basal and possibly the overlying intermediate layer cells rapidly proliferate to reconstitute the epithelial surface (87–89). Such proliferative potential may be exploited by mutations and as with stratified epithelia, there are no barriers to restrict the expansion of mutant clones, so although most mutant clones are submillimeter in scale, they can extend beyond a centimeter (90–92).
Two recent studies have used LCM to examine mutations in the urothelium. The first used a combination of targeted sequencing and whole-exome sequencing, subsequently performing standard-depth WGS on samples found to be clonal in normal bladder urothelium from transplant donors and patients with cancer from the United Kingdom (91). The mutation burden rose with age and was similar to that in other tissues. Mutational signature analysis was complicated by marked variation between donors, so de novo signature discovery was performed. This revealed the presence of APOBEC cytidine deaminase mutagenesis, which is rare in other normal tissues (91). The most frequently positively selected mutant genes were the epigenetic regulators KMT2D, KDM6A, and ARID1A, whereas common bladder cancer drivers were rarely mutated (Fig. 5B; refs. 91, 93). Copy-number alterations were absent from the majority of clones, with the most common changes being gains of whole chromosomes or chromosome arms.
A second study examined both ureters and bladders from a Chinese population, sampling larger areas of urothelium and performing whole-exome sequencing on histologically normal epithelium distant from urothelial cancers (92). The larger area of urothelium sampled and relatively shallow 140-fold coverage meant smaller mutant clones were not detected. De novo mutational signature analysis revealed an “aging” SBS1- and SBS5-like signature, an APOBEC cytidine deaminase signature resembling SBS2 and SBS13, and most surprisingly an SBS22-like signature similar to that produced by aristolochic acid (AA), which was present in 60% of female and 25% of male samples. AA is a powerful mutagen present in traditional herbal medicines that is associated with an increased risk of urological cancers (94, 95). The mutational burden of the normal epithelium was low in patients without the AA signature but significantly higher in those with AA exposure, dramatically so in some individuals. Copy-number alterations were rare. Mutant clone sizes were larger in the AA-exposed subjects. Positively selected mutant genes in the normal urothelium were again KMT2D and KDM6A but also TP53. AA emerges as not just a mutagen but an agent able to alter mutant clonal dynamics, with the largest clone found in AA-exposed tissues extending over several square centimeters (92).
BRONCHIAL EPITHELIUM
Finally, we consider the bronchial epithelium, which contains basal cells and multiple specialized cell types. Mitochondrial mutant clones demonstrate the potential for extensive lateral expansion involving all cell types extending at least 1 mm in diameter and are argued to exhibit features of neutral competition, although this is controversial (82, 96). The mutational landscape has been studied by WGS of clonal cultures of cells from brush biopsies of never, current, and former smokers (Fig. 5C; ref. 11). The efficiency of culture generation was 15% to 40%, but the extent of selection during the establishment of cultures is not known. All subjects accumulated mutations with age at a low rate, circa 20 mutations/year, but this is dwarfed by the effect of smoking, with over 5,000 mutations/cell in current smokers and half this in former smokers, in whom about half the cells had a mutational burden close to that of nonsmokers (11).
Mutational signatures included the age-correlated SBS1 and SBS5, particularly dominant in nonsmokers, and tobacco-linked signatures SBS4 and SBS16 in current smokers and ex-smokers. Of particular interest, the cells with normal mutation burden in ex-smokers had little SBS4 (11).
Mutant genes under positive selection were TP53 and NOTCH1, both present in over 30% of colonies, and more rarely PTEN, ARID1A, and ARID2, which are selected in carcinoma of the lung, and also FAT1 and CHEK2 (11, 34, 97). As the brush biopsy samples cells in a small area, some mutations were shared between colonies from a given donor, and 75% shared the same TP53 mutation. The proportion of colonies carrying a selected mutant was under 10% in never smokers but ranged from 25% to over 50% in current smokers, a few of which carried two or three selected genes (11).
These findings demonstrate the huge impact of smoking on the mutational landscape of bronchial epithelium, which is to be expected in light of the link between smoking and lung cancer risk (98). Less expected, however, and something that was detected by the single-cell genome analysis that is a feature of this study, is the emergence of a population of cells in ex-smokers that carry few tobacco-induced mutations (11). These cells seem to outcompete their heavily mutated neighbors in the absence of the selective pressure exerted by smoke exposure and may contribute to the decrease in cancer risk that follows the cessation of smoking, though their nature and the mechanism by which they evade mutation remain to be determined (11, 98).
DISCUSSION
All cells age and mutate, but the mutant genes that increase cell fitness and found clones that colonize the stem cell niche vary between tissues. For example, mutant NOTCH1 and TP53 are strongly selected and colonize a large proportion of skin, esophagus, and lung but are comparatively rare in other epithelia (10, 11, 19, 20). Both these genes are keratinocyte stem cell regulators, promoting differentiation, and mutant clones gain a proliferative advantage through a bias in cell fate from differentiation to proliferation and spread widely (40, 45, 46, 79). It is tempting to hypothesize that many normal tissue mutants are similarly key parts of the regulatory networks that control stem cell dynamics in the tissues in which they are selected.
Epithelial resilience, continuing to function normally while carrying clones with multiple driver mutants, is widespread but is currently unexplained. Candidate mechanisms include the highly conserved ability of epithelial cells to sense local density, which in squamous epithelia can trigger cell differentiation and exit from the proliferative compartment and is observed coincident with a return toward normal cell behavior in mouse models (40, 45, 99). Epithelial cells balance division and cell extrusion via mechanosensitive ion channels such as Piezo1, alterations in the dynamics of MAPK signaling, epithelial calcium waves, and cell–cell junctions, all of which may play a role in density-dependent regulation of mutant cells (100–104). The reversion from clonal expansion toward homeostasis occurs for every mutant clone in our normal tissues, barring the one that escapes to cause cancer, so this is a critical area for future research.
Mutational signatures are compelling evidence of environmental exposures, such as AA and tobacco, that may alter normal tissue landscapes, and they can also identify cell populations not exposed to mutagenesis (11, 92). However, as shown with UV light, a mutagen can have a major impact by driving clonal expansion independent of its effect on generating mutations (40, 72). Other factors, such as low-dose radiation, may cause clonal expansion and leave no mutational signature (84). Only 3 of 20 known or suspected human chemical carcinogens were found to be mutagenic when administered to mice (105). Thus, the main effects of environmental factors on cancer risk may be by reshaping the mutational landscape of normal tissues rather than by mutagenesis. A combination of human and model system studies will be needed to resolve the mechanisms of action of potential carcinogens on aging mutated tissues.
Little is known about how germline variation affects normal tissue landscapes. A recent study of normal intestinal crypts from patients with germline POLE/POLD1 mutations that cause cancer predisposition found an increased mutational burden but no evidence of other genome changes or abnormal tissue function (106). An outstanding task is to extend normal tissue studies into diverse populations that vary in their genetics, environmental exposures, and risk of cancer.
It is noteworthy how many mutant genes that are selected in cancer are not enriched in normal tissue and vice versa, indicative of the different processes of competition that operate in the spatial zero-sum game of normal tissue compared with an expanding tumor (107, 108). It seems clear that in the future, cancer driver genes should not be defined in terms of their frequency in cancer alone but rather their relative frequency in tumors versus normal tissues (20). A mutant gene with the same prevalence in normal tissue as a tumor may have no role in carcinogenesis, whereas a mutant depleted in cancer compared with a normal tissue may inhibit transformation (19, 20). Other key differences between normal tissue clones and cancer are a vast increase in mutational burden in most cancers, additional mutational signatures indicating mutagenic processes not present in normal tissue, and a great increase in copy-number alterations.
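The proposal to judge mutant genes by their relative frequency in tumors versus normal tissue can be expressed as a simple ratio. The sketch below is only an illustration of that reasoning: the function name, the twofold threshold, and the example prevalences are hypothetical assumptions, and a real analysis would need to account for sampling depth, clone size, and statistical uncertainty.

```python
def classify_mutant(tumor_prevalence, normal_prevalence, fold=2.0):
    """Compare the prevalence of a mutant gene in tumors and normal tissue.

    Prevalences are fractions between 0 and 1. `fold` is an arbitrary,
    hypothetical threshold for calling enrichment or depletion.
    Returns a (ratio, label) tuple.
    """
    if normal_prevalence <= 0:
        raise ValueError("normal_prevalence must be positive")
    ratio = tumor_prevalence / normal_prevalence
    if ratio >= fold:
        label = "enriched"   # more common in tumors: may promote cancer
    elif ratio <= 1 / fold:
        label = "depleted"   # rarer in tumors: may inhibit transformation
    else:
        label = "similar"    # comparable prevalence: may play little role
    return ratio, label


if __name__ == "__main__":
    # Hypothetical prevalences, chosen only to exercise each branch.
    for gene, tumor, normal in [("geneA", 0.9, 0.3),
                                ("geneB", 0.1, 0.5),
                                ("geneC", 0.3, 0.3)]:
        print(gene, classify_mutant(tumor, normal))
```

A fixed fold threshold is the simplest possible design choice here; it is used only to make the three categories in the text concrete.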
There seems no simple way to predict cancer risk from the normal tissue landscape. Some very low-risk tissues, such as the small intestine, seem to have a similar mutational burden to the comparatively high-risk colon (13, 22). A very high prevalence of mutant clones such as in the esophagus need not translate into a high cancer risk, as the most prevalent mutant, NOTCH1, may even be antioncogenic (19, 80). By old age, epithelia harbor billions of cells carrying mutations associated with cancer, and yet in most cases no cancers form within a given tissue (53). Learning to decipher the metrics that predict cancer risk within the normal landscape of each tissue is a key challenge for the future.
Finally, can the somatic mutational landscape be manipulated to reduce cancer risk? Data from mouse models suggest this may be feasible. Treatment of mice with the WNT activator lithium chloride reduced the competitive advantage of Apc-null cells in the intestine and hence the number of adenomas they generate (42). Manipulating redox stress can deplete the population of Trp53-mutant cells in the mouse esophageal epithelium, and treatment with the antidiabetes drug metformin reduces the fitness of Pik3ca-mutant cells in the same tissue (84, 109). The challenges of designing long-term studies to test such interventions in humans are considerable, but rapid progress in this field gives hope that both candidate agents and the means to validate them may soon be developed.
Authors’ Disclosures
P.H. Jones reports grants from Wellcome Trust and Cancer Research UK during the conduct of the study, as well as grants from Cancer Research UK and Wellcome Trust outside the submitted work. No disclosures were reported by the other author.
Acknowledgments
We thank Peter Campbell, Mike Stratton, and Inigo Martincorena for insightful discussions. This work was supported by a grant from the Wellcome Trust to the Wellcome Sanger Institute (296194) and a Cancer Research UK Programme Grant to P.H. Jones (C609/A27326).