Mosaic mutations in normal tissues can occur early in embryogenesis and be associated with hereditary cancer syndromes when affecting cancer susceptibility genes (CSG). Their contribution to apparently sporadic cancers is currently unknown. Analysis of paired tumor/blood sequencing data of 35,310 patients with cancer revealed 36 pathogenic mosaic variants affecting CSGs, most of which were not detected by prior clinical genetic testing. These CSG mosaic variants were consistently detected at varying variant allelic fractions in microdissected normal tissues (n = 48) from distinct embryonic lineages in all individuals tested, indicating their early embryonic origin, likely prior to gastrulation, and likely asymmetrical propagation. Tumor-specific biallelic inactivation of the CSG affected by a mosaic variant was observed in 91.7% (33/36) of cases, and tumors displayed the hallmark pathologic and/or genomic features of inactivation of the respective CSGs, establishing a causal link between CSG mosaic variants arising in early embryogenesis and the development of apparently sporadic cancers.

Significance:

Here, we demonstrate that mosaic variants in CSGs arising in early embryogenesis contribute to the oncogenesis of seemingly sporadic cancers. These variants can be systematically detected through the analysis of tumor/normal sequencing data, and their detection may affect therapeutic decisions as well as prophylactic measures for patients and their offspring.

See related commentary by Liggett and Sankaran, p. 889.

This article is highlighted in the In This Issue feature, p. 873.

Normal cells undergo mutagenesis throughout an individual's life span (1), resulting in somatic mosaicism, a phenomenon by which an individual has two or more cell populations with different genotypes (2). Various normal tissues, such as skin, bladder, and esophagus, have been shown to comprise mosaics of evolving clones harboring a panoply of somatic mutations (3–5). RNA-sequencing analyses of 29 different tissue types from over 400 individuals have revealed somatic clonal expansions in normal tissues (6). Notably, mosaic variants acquired as early as in embryogenesis may be present in adult normal cells, as evidenced by whole-genome sequencing analyses of normal blood from 241 adults where early embryonic somatic mutations were identified (7).

Mosaicism has been reported to affect cancer susceptibility genes (CSG) in individuals meeting clinical criteria for hereditary predisposition syndromes, such as tuberous sclerosis and neurofibromatosis (8, 9). The contribution of mosaic variants to the development of apparently sporadic cancers, however, has yet to be determined. Beyond the potential cancer predisposition risk conferred by mosaicism affecting CSGs, these variants may be transmitted to offspring if present in gonadal tissue. Moreover, some of the CSGs affected by mosaic variants constitute indications for targeted therapies. Hence, their detection and the definition of their role in cancer predisposition are critical for genetic counseling and clinical management. The low variant allelic fraction (VAF) of mosaic variants in blood poses challenges to their detection using current clinical genetic testing methodologies (10). In the absence of matched tumor/normal samples, distinguishing them from sequencing artifacts, clonal hematopoiesis (CH), and circulating tumor cells (CTC) is difficult.

Here, through the analysis of >35,000 unselected patients with cancer undergoing tumor/normal sequencing, we sought to determine whether mosaic variants affecting CSGs arising in embryogenesis might contribute to the development of seemingly sporadic cancers. Our analyses revealed that mosaic variants affecting CSGs can occur early in embryogenesis, likely contribute to tumorigenesis, and result in increased risk of cancer development.

Detection of Cancer-Causative Mosaic Variants in CSGs

To detect candidate patients harboring mosaic variants affecting CSGs, following Institutional Review Board (IRB) approval from Memorial Sloan Kettering Cancer Center (MSKCC), we applied a set of filtering criteria to tumor/blood sequencing data from 35,310 unselected patients with cancer using the FDA-cleared MSK-Integrated Mutation Profiling of Actionable Targets assay (MSK-IMPACT; ref. 11; Fig. 1A; Supplementary Table S1). We sought to identify mosaic pathogenic/likely pathogenic (P/LP) variants in the 61 CSGs included in the MSK-IMPACT panel (Supplementary Table S2). A total of 3,113,225 variants affecting any of the 61 CSGs in the MSK-IMPACT panel were detected in the blood at any VAF in the 35,310 patients included in our study (Fig. 1A; Supplementary Tables S1 and S2). To select mosaic variants that would be causative of cancer, we applied a set of filters to identify candidate variants with a VAF of 1.5% to 30% in blood, ≥10% VAF in the tumor with a VAF at least 1.5× higher in the tumor as compared with the matched blood, total depth of sequencing coverage ≥50 for both tumor and blood, ≥6 reads of the alternate allele in blood, and an allele frequency <1% in any Genome Aggregation Database (gnomAD; v2.1.1; Fig. 1A; Supplementary Figs. S1A–S1E, S2A–S2D, and S3A and S3B; see Methods and Supplementary Methods). To distinguish true mosaic variants from CTC variants and CH, we excluded patients with >1 variant meeting these criteria and applied established CH criteria (12), respectively (Fig. 1A; Supplementary Methods). One hundred twenty-three CTC variants, including 48 affecting CSGs, were identified in 25 individuals, with an average of 4.9 (range, 2–24) CTC variants/case that largely (71.4%) displayed mutational signatures matching those of their respective tumors; only 23.6% of CTC variants were bona fide loss-of-function (LOF) mutations (Supplementary Tables S3 and S4).

Figure 1.

Identification of patients with cancer with candidate early mosaic variants. A, Schematic representation of the methodology for selection of patients with candidate mosaic variants in CSGs, sequencing methods, selection algorithm, filtering and exclusion criteria, and selected variants. B, VAF, sequencing depth, and variant type of the candidate P/LP mosaic variants identified by the set of filters in blood and tumor. C, Tumor type and germ layer derivation, CSG affected, variant type, biallelic inactivation of P/LP candidate mosaic variants (n = 36). Phenobars (right) depict clinical characteristics. Assoc, association; MPNST, malignant peripheral nerve sheath tumor; Mult, multiple; PANET, pancreatic neuroendocrine tumor; SCCOHT, small cell carcinoma of the ovary, hypercalcemic type.

This analysis revealed 53 candidate mosaic variants affecting CSGs, including 36 P/LP variants and 17 variants of uncertain significance (VUS) in 36 and 17 individuals, respectively (Fig. 1A; Supplementary Table S5). The median VAFs of the P/LP mosaic variants detected in blood and tumors were 7.8% (range, 1.7%–28.3%) and 49.4% (range, 22.8%–79.7%), respectively, and the median blood and tumor sequencing depths at the mosaic variant sites were 474 (range, 256–923) and 629 (range, 192–1,520), respectively (Fig. 1B; Supplementary Table S5). Upon normalization by sequencing depth, the VAF remained increased in tumors relative to blood (Supplementary Fig. S4A–S4E). When taking tumor purity and ploidy into consideration, all mosaic variants were found to be clonal in the tumors analyzed (Supplementary Table S5). Upon applying the same algorithm to non-CSGs included in the MSK-IMPACT panel (n = 280; see Methods), we detected 79 mosaic variants meeting these criteria, all of which would be classified as VUS for hereditary cancer susceptibility (Supplementary Fig. S5). These findings demonstrate a significant enrichment for P/LP variants in CSGs as compared with non-CSGs taking into account their corresponding genomic footprints in the MSK-IMPACT panel (P = 5 × 10⁻³¹; two-tailed Fisher exact test), supporting the adequacy of the approach used for the detection of cancer-causative mosaic variants.

Clinical and Genetic Characterization of Cancer-Causative Mosaic Variants

The CSGs most frequently affected by P/LP mosaic variants were TP53 (n = 16) and RB1 (n = 5; Fig. 1C; Supplementary Table S5). Most (27/36; 75%) P/LP mosaic variants were bona fide LOF mutations [i.e., truncating, frameshifting, and splice-site single-nucleotide variants (SNV); Fig. 1B and C; Supplementary Table S5]. Tumor-specific biallelic inactivation of the CSGs affected by the mosaic variants was observed in 91.7% (33/36) of cases as loss of heterozygosity of the wild-type allele (28/36, 77.8%) or as a second inactivating somatic mutation (5/36, 13.9%; Fig. 1C; Supplementary Table S5). Tumors from patients with MSH2 (MOS14 and MOS36) and MSH6 (MOS9 and MOS38) mosaic variants displayed loss of MSH2 and MSH6 protein expression, respectively, as well as a dominant microsatellite instability (MSI) mutational signature, high tumor mutation burden, and enrichment for short indels, and/or were MSI-high by PCR analysis (Fig. 2A–D; Supplementary Table S6; Supplementary Figs. S6A and S6B and S7A–S7D). The ovarian (MOS1) and breast (MOS8) cancers in BRCA2 P/LP mosaic variant carriers harbored genomic features indicative of homologous recombination DNA repair deficiency (HRD), such as dominant HRD mutational signature, genomic instability, high fraction of genome altered, increased number of large-scale state transitions, and/or increased indel length (Fig. 3A–C; Supplementary Table S7). These findings support the notion that the mosaic variants identified here likely played an etiologic role in cancer development in these patients.

Figure 2.

Validation of candidate mosaic variants affecting mismatch repair genes by targeted sequencing. A, Schematic representation of the validation method of candidate mosaic variants and representative micrographs of hematoxylin and eosin (H&E)–stained tumors and normal tissues of case MOS14. Scale bars, 1 mm. B, VAF of the mosaic variants and tumor-derived nonsynonymous somatic mutations in microdissected tumors and normal tissues according to germ layer. ***, P < 0.001, Mann–Whitney U test. C and D, VAF of the mosaic variants affecting mismatch repair genes (red) and of tumor-derived nonsynonymous somatic mutations (gray) in tumor and normal tissues and tumor mutational signatures. Error bars, 95% confidence interval. Representative photomicrographs of H&E-stained slides and IHC analysis of mismatch repair proteins are depicted. Scale bars, 50 μm. AP, appendix; C1, histologic component 1; C2, histologic component 2; CO, colon; comp, component; EM, endometrium; FT, fallopian tube epithelium; MSI, microsatellite instability; N, normal; SigMA, Signature Multivariate Analysis; SMM, smooth muscle; STR, fallopian tube stroma; T, tumor.

Figure 3.

Validation of candidate mosaic variants affecting HRD genes by targeted sequencing. A and B, VAF of the mosaic variants affecting BRCA2 (red) and of tumor-derived nonsynonymous somatic mutations (gray) in tumor and normal tissues and tumor mutational signatures. Copy-number plots depicting segmented log2 ratios (y-axis) according to genomic position (x-axis). C, Number of state transitions according to segment size. Each line corresponds to a tumor (n = 15) of the 10 individuals in the validation cohort. DCIS, ductal carcinoma in situ; IDC, invasive ductal carcinoma.

Tumors harboring candidate CSG mosaic variants were of various histologic types, the most frequent being breast cancer and sarcoma (6/36, each; Fig. 1C; Supplementary Fig. S8 and Supplementary Table S5), and stemmed from the different germ layers, including 13 tumors derived from the endoderm and mesoderm each and 10 tumors from the ectoderm (Fig. 1C; Supplementary Table S5). The CSGs affected by the mosaic variants were detected in cancer types expected in the syndromes caused by germline P/LP variants of the same genes in 80.6% (29/36) of cases. For instance, mosaic variants affecting BRCA2 were identified in breast or ovarian cancers, whereas those affecting APC were found in colorectal carcinoma or its precursor, tubular adenoma. Notably, in five of seven cases with a nonclassic tumor type–gene association, the tumor type has been reported in association with the germline variants in the genes affected, albeit at a lower frequency, such as prostate and gastric cancers in TP53 germline carriers (13) and sarcomas in Lynch syndrome (14). Our analysis also revealed the novel observation of SMARCA4 mosaic variants in two patients with small cell carcinomas of the ovary, hypercalcemic type (SCCOHT; Fig. 1C; Supplementary Table S8), indicating that in addition to somatic or germline SMARCA4 variants, SCCOHTs may also be underpinned by P/LP mosaic variants affecting SMARCA4. Akin to patients with germline variants in CSGs (15, 16), 27.8% (10/36) of individuals carrying a CSG mosaic variant had multiple tumor types (median = 2; range, 2–3; Supplementary Table S8) and an age of onset intermediate between sporadic and germline cases (Supplementary Figs. S9 and S10).

Out of the 24 patients who had previous germline genetic testing, including next-generation sequencing (n = 23) and Sanger sequencing (n = 1), only six (25%) had been reported as mosaic, whereas the remaining 18 (75%) were reported as negative for germline genetic testing. In the context of tumor/normal sequencing, 10 of 18 were misreported as tumor-derived mutations, whereas eight of 18 were not reported at all due to not meeting filtering criteria for somatic variant detection (Fig. 1C; Supplementary Tables S8 and S9), highlighting the need for the development of a systematic methodology for the detection of mosaic variants. Only approximately half (51.4%, 18/35) of the patients with evaluable medical history met the clinical criteria for germline genetic testing for the gene affected by the mosaic variant based on personal history, whereas only 2.9% (1/35) of cases met those criteria based on family history (Supplementary Tables S8 and S9). Hence, when a germline susceptibility is suspected, yet routine germline clinical genetic testing yields a negative result, assessment for mosaicism may be a reasonable approach, given that detection of these important variants would allow gene-specific cancer screening and prophylactic measures, screening of the patient's offspring, and even the potential use of specific therapies (e.g., PARP inhibitors in the case of BRCA2 mosaic P/LP variants or immune-checkpoint inhibitors in the case of mosaic P/LP variants affecting mismatch repair genes).

Validation of Mosaic Variants

To validate the mosaic nature of the candidate variants identified, we interrogated their presence in normal tissues deriving from different embryonic lineages in 10 patients with available formalin-fixed, paraffin-embedded (FFPE) material (Fig. 2A). To ensure adequate purity of the different tissues, following histopathologic evaluation and assessment of leukocyte infiltration, we conducted laser capture microdissection of normal (n = 48) and tumor (n = 15) FFPE tissues (Fig. 2A; Supplementary Table S10; Supplementary Figs. S11A–S11L and S12), and subjected them to targeted sequencing using MSK-IMPACT. A median of four (range, 3–9) different normal tissues were analyzed per case (Supplementary Table S10). We identified the CSG mosaic variants at varying VAFs (median, 7.6%; range, 1.5%–25.8%) in all normal samples interrogated, which included normal tissues of mesodermal, endodermal, and/or ectodermal lineages (Fig. 2B; Supplementary Table S10). The CSG mosaic variants were found to be enriched in the respective tumor tissues with a median VAF of 53.4% (range, 25.6%–92.9%; P < 0.001, Mann–Whitney U test; Figs. 2B–D and 3A and B; Supplementary Fig. S13A–S13F and Supplementary Table S10). The tumors of eight of 10 cases analyzed harbored additional nonsynonymous somatic mutations (n = 76; median number of nonsynonymous mutations = 4.5/case; range, 1–34), none of which were detected in the normal tissues interrogated (Figs. 2B–D and 3A and B; Supplementary Fig. S13C–S13F and Supplementary Table S10). Moreover, we conducted an additional orthogonal validation of the mosaic variants in 39 normal and 13 tumor tissues, respectively, of individuals with available FFPE material (n = 10) using amplicon sequencing. Our analyses validated the mosaic nature of the variants showing a strong positive correlation (r = 0.93; P = 2.2 × 10⁻¹⁶) in the VAF detected by MSK-IMPACT and amplicon sequencing (Supplementary Fig. S14A). In all the cases analyzed, the detection of the candidate mosaic variants in multiple tissue lineages in the absence of tumor-derived somatic mutations provides further evidence that the candidate variants interrogated were mosaic in nature.

Cancer-Causative Mosaic Variants Arise Early in Embryogenesis

Next, we sought to determine the time during development when mosaic variants arose. Assuming (i) a symmetrical cell contribution model, in which the genetic material of two daughter cells of a dividing progenitor cell contributes equally to adult tissues, and (ii) that these mosaic variants are heterozygous, variants occurring in the first five cell divisions would be expected to have a VAF in normal tissues ranging from 25% (first division) to 1.6% (fifth division; Fig. 4A; ref. 7). Given that ancestral clones of blood emerge early in embryogenesis, before gastrulation (17, 18), and the limits of detection of the tumor/normal sequencing assay used, mosaic variants arising after the fifth cell division, which would have an expected VAF lower than 1%, are unlikely to be detectable. We observed that although the VAFs of the 36 mosaic variants in blood fell within this range, their VAF distribution did not show peaks at the values expected for each of the first five cell divisions (Fig. 4A and B), suggesting an asymmetry in cell contribution, in agreement with previous studies (7, 18, 19). Using a log-likelihood model (ref. 7; Supplementary Methods), we sought to investigate the cell contribution asymmetry by introducing an asymmetry factor. We opted for a heuristic approach in which we modified the expected VAF of two cell divisions at a time while maintaining other cell divisions symmetric, as previously described (7). Our analyses revealed that the best-fitting asymmetric cell contribution model was a better representation of the data compared with the symmetric cell contribution model for both cell divisions 1 and 2 (P = 5.1 × 10⁻⁶, likelihood ratio test) and cell divisions 3 and 4 (P = 6.5 × 10⁻⁶, likelihood ratio test; Fig. 4C). These findings indicate that cell contribution during early embryogenesis is likely asymmetrical, as previously reported (7, 18, 19).
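
The expected VAFs quoted for the symmetric model follow from simple arithmetic: after division n there are 2^n cells, one of which carries the variant on one of its two alleles. The Python sketch below reproduces that arithmetic. The function names and the `contribution` parameter are illustrative assumptions introduced here; they are not the log-likelihood model of ref. 7 or the asymmetry factor fitted in this study.

```python
from typing import Tuple


def expected_vaf_symmetric(division: int) -> float:
    """Expected VAF of a heterozygous variant arising at a given cell division.

    Under symmetric contribution, 1 of 2**division cells carries the variant,
    and only one of its two alleles is mutated, hence the extra factor of 0.5.
    """
    if division < 1:
        raise ValueError("division must be >= 1")
    return 0.5 / (2 ** division)


def expected_vafs_first_division(contribution: float) -> Tuple[float, float]:
    """Illustrative asymmetric case for the first division only (an assumption).

    `contribution` is the hypothetical fraction of adult cells descending from
    one daughter cell; the other daughter contributes 1 - contribution.
    Returns the expected VAF if the variant arose in either daughter.
    """
    if not 0.0 < contribution < 1.0:
        raise ValueError("contribution must be between 0 and 1")
    return contribution / 2.0, (1.0 - contribution) / 2.0
```

With these definitions, `expected_vaf_symmetric(1)` is 0.25 and `expected_vaf_symmetric(5)` is 0.015625 (about 1.6%), matching the range stated above, whereas a sixth-division variant would sit at roughly 0.8%, below the 1.5% resolution of the assay.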

Figure 4.

Timing of occurrence and mutational processes of mosaic variants. A, Schematic representation of the first cell divisions of embryogenesis along with the expected VAF of mosaic variants per cell generation in adult normal tissues assuming a symmetrical cell contribution. B, VAF distribution of the 36 CSG mosaic variants. Expected VAF distribution from symmetric cell contribution model (red line) and best-fitting cell contribution mixture model (blue line). C, Contour plots depicting the log likelihoods of symmetric and asymmetric cell contributions. The x-axis and y-axis display the expected VAFs of the first and second cell divisions, respectively (left), and of the third and fourth cell divisions, respectively (right), given different cell contribution asymmetry levels (right x- and top y-axes). The dotted lines represent the expected VAFs of the respective cell divisions as per a symmetric cell contribution model. X, symmetric model; +, best-fitting asymmetric model. D, Assignment of the 36 mosaic variants to the VAF clusters of the best-fitting mixture model. The means of four VAF clusters are shown in red (cluster 1), blue (cluster 2), green (cluster 3), and orange (cluster 4). The expected VAFs of mosaic variants occurring in the first five cell generations assuming a symmetric cell contribution are shown as black dotted lines. Error bars, 95% confidence interval. E, Posterior probabilities of the 36 mosaic variants to belong to each of the four VAF clusters identified using the beta-binomial mixture model. F, SNV and indel mutational signatures of the 36 mosaic variants identified. sig, signature.

In agreement with these findings, we observed that although the VAF of mosaic variants in blood and in other normal tissues was similar when aggregated across patients (P > 0.05, Mann–Whitney U test), the VAF of mosaic variants showed differences across normal tissues of the same individual (Figs. 2C and D and 3A and B; Supplementary Figs. S13A–S13F and S14B). In some individuals (MOS1 and MOS6), the VAF of the mosaic variant was relatively similar in the different normal tissues interrogated, whereas in others, such as MOS9, a wider VAF distribution across different tissues was observed, even across tissues of the same germ layer (Supplementary Fig. S15A). These findings provide further support to the notion that the daughter cells of a given cell division might contribute to adult tissues in an asymmetrical manner and suggest that the degree of cell contribution asymmetry has interindividual variability, consistent with previous reports (18, 19).

Using a clustering model based on beta-binomial mixture distributions, without considering expected VAFs from the different specific cell divisions, we sought to determine mosaic variant clusters based on their VAF in blood. The number of mixture distributions was determined by bootstrapping, whereby four mosaic variant VAF clusters were the simplest model that maximized the log likelihood of the model (Fig. 4B and D; Supplementary Fig. S15B). Based on the posterior probability distribution, we assigned the mosaic variants to one of the four VAF clusters and observed that the majority (72.2%; 26/36) belonged to the third and fourth clusters (Fig. 4E), which would suggest that most of these variants were acquired during the third or fourth cell divisions of embryogenesis. Nonetheless, asymmetry in cell contribution was observed, which showed marked interindividual variability in agreement with prior studies (18, 19). Due to the FFPE nature of our samples, single-cell sequencing that would allow individual phylogenetic inference per patient and incorporation of mutation rate, ideal for the assessment of mutation timing, could not be conducted. Therefore, we can conclude that the variants developed within the first five cell divisions; however, the exact timing could not be fully ascertained.
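
To make the posterior assignment step concrete, the sketch below computes the posterior probability that a variant with a given alternate-read count and depth belongs to each component of a beta-binomial mixture. It is a minimal illustration only: the component means, weights, and the shared `concentration` parameter are hypothetical placeholders supplied by the caller, not the values fitted in this study, and the fitting and bootstrapping steps are not shown.

```python
import math
from typing import List, Sequence


def log_betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """Log probability of k alternate reads out of n under BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_beta_post = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_beta_prior = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_post - log_beta_prior


def cluster_posteriors(
    alt_reads: int,
    depth: int,
    means: Sequence[float],
    weights: Sequence[float],
    concentration: float = 200.0,  # hypothetical shared a + b; not from the study
) -> List[float]:
    """Posterior probability of each mixture component for one variant."""
    log_terms = []
    for mean, weight in zip(means, weights):
        a = mean * concentration
        b = (1.0 - mean) * concentration
        log_terms.append(math.log(weight) + log_betabinom_pmf(alt_reads, depth, a, b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_terms)
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

A variant would then be assigned to the component with the largest posterior, for example `max(range(4), key=posteriors.__getitem__)` for a four-component model.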

To define the biological processes involved in their genesis, we explored the mutational spectra and mutational signatures that shaped the mosaic variants in our cohort. The 13 mosaic indels identified had heterogeneous profiles, whereas the 23 mosaic SNVs were frequently C>T substitutions predominantly at CpG sites, consistent with a dominant clock-like/aging mutational signature (ref. 20; Fig. 4F), akin to what was reported by Ju and colleagues for early embryonic mutations (7), and as observed in the context of germline variants (21).

Here, we detected P/LP mosaic variants affecting CSGs in patients with apparently sporadic cancers subjected to tumor/normal sequencing using an FDA-cleared assay. These mosaic variants were present in tumors whose phenotypes are typical of the cognate syndromes caused by germline variants affecting the respective genes. Analysis of the tumor samples of the 36 patients with detectable P/LP mosaic variants targeting CSGs revealed biallelic inactivation of the respective CSG in 91.7% of cases, and that the tumors displayed pathologic and/or genomic features consistent with inactivation of the respective CSG. Given the identification of the P/LP mosaic variants in tissues derived from the mesoderm, endoderm, and ectoderm, the most parsimonious explanation is that the mutational process resulting in mosaicism occurred early in embryogenesis, before gastrulation, when the different primary germ layers are established (22). This finding is consistent with their detection in our initial screen, which required the mosaic variants to be present in both blood (mesoderm-derived) and tumors (mesoderm-, ectoderm-, or endoderm-derived). Our findings not only confirm previous reports of early mosaic variants in healthy human and mouse tissues (7, 23, 24), including those affecting cancer genes (3, 4, 6), but also provide a causative link between mosaic variants arising in early embryogenesis and seemingly sporadic cancers.

Through analyses of normal tissues of different lineages, we observed a likely asymmetry in the contribution of mosaic variants to different tissues, in agreement with studies reporting that cell contribution to postnatal tissues may not be symmetrical (7, 23). These differences could stem from evolutionary bottlenecks, distinct rates of proliferation and involution taking place in embryogenesis and postnatally, and a potential lineage-dependent positive or negative selection effect of mosaic variants affecting CSGs.

Our study has important limitations. The approach we used allowed only for the detection of patients with CSG mosaic variants present in blood, a tissue that differentiates early during embryogenesis, and our analysis was restricted to CSGs included in the MSK-IMPACT panel. Hence, the approximate 1/1,000 frequency of CSG P/LP mosaic variants in patients with cancer we identified likely constitutes a conservative estimate of their impact on cancer predisposition. Furthermore, the validation of mosaic variants was restricted to the samples available for each case. The study of a wider spectrum of tissues would require postmortem analyses with a priori identification of patients with somatic mosaic variants. Finally, owing to the FFPE nature of the samples, single-cell sequencing, which would constitute an orthogonal validation of the observations made and would allow the individual phylogenetic inference per patient, ideal for the assessment of mutation timing given the vast interindividual variability in terms of the asymmetry we observed and reported by others (18, 19), could not be conducted. Despite these limitations, our study demonstrates that mosaic variants affecting CSGs can be detected using a clinical-grade tumor/normal targeted sequencing assay, which is an additional advantage of this sequencing methodology in the clinical setting. Our study also provides a comprehensive analysis of mosaicism affecting CSGs in a large population of patients with cancer and demonstrates that some of these individuals harbor mosaic variants in CSGs occurring in early embryogenesis that likely contributed to cancer development.

Subjects and Samples

This study was approved by the MSKCC IRB, protocols 12-245 (genomic profiling in patients with cancer) and 19-154 (prevalence of somatic mosaicism in advanced cancer patients). Written informed consent was obtained according to IRB protocols. Deidentified tumor/blood MSK-IMPACT sequencing data of 35,310 patients with cancer enrolled on the institutional protocol 12-245 (NCT01775072) who underwent sequencing between January 2014 and December 2019 were retrieved.

Filtering Criteria for Detection of Cancer-Causative Mosaic Variants

Sixty-one of 341 genes present across MSK-IMPACT versions were determined to be associated with increased cancer susceptibility and a dominant mode of inheritance. These 61 CSGs were analyzed for candidate mosaic variants, and the variant pathogenicity was reviewed by a board-certified molecular pathologist (D. Mandelker) according to the American College of Medical Genetics and Genomics (ACMG) criteria (25).

To define candidate mosaic variants affecting CSGs (Supplementary Methods), we selected variants with a blood VAF of ≥1.5% to 30%, given that mosaic variants are expected to have a VAF below the 50% seen for heterozygous variants and that the MSK-IMPACT sequencing resolution is limited at VAFs <1.5%. To detect cancer-causative mosaic variants, which would be enriched in tumors, we selected variants with a tumor VAF ≥10% and a tumor/blood VAF ratio ≥1.5, reasoning that although the individual is mosaic, the cell giving rise to the tumor was heterozygous for the variant at the outset. To minimize artifacts, only reads with a mapping quality >20 were included, as were only cases with a sequencing depth ≥50 for both tumor and blood samples and ≥6 reads of the alternate allele in blood. Detected variants had to have an allele frequency <1% in gnomAD (v2.1.1). In addition, we excluded variants in highly repetitive regions and variant calls with strand bias, as described (12). To distinguish mosaic variants from CTC variants present in the blood samples, cases with ≥1 variant in blood targeting CSGs or non-CSGs meeting the above criteria were excluded. To exclude circulating malignancies, variants from patients with hematologic malignancies were removed. To distinguish mosaic variants from CH, established CH criteria were applied (12).
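As an illustration only, the per-variant thresholds described above could be expressed as a single predicate. The sketch below is not the pipeline used in this study: the `VariantCall` record, its field names, and the function name are hypothetical, the upper blood VAF bound is assumed to be inclusive, and the read-level mapping-quality filter, the strand-bias and repetitive-region exclusions, and the CTC, hematologic malignancy, and CH exclusions are assumed to be applied elsewhere.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-variant summary; VAFs and allele frequency are fractions (0-1)."""
    gene: str
    blood_vaf: float
    tumor_vaf: float
    blood_depth: int
    tumor_depth: int
    blood_alt_reads: int
    gnomad_af: float


def is_candidate_mosaic(v: VariantCall) -> bool:
    """Apply the numeric thresholds from the text to one variant call."""
    return (
        0.015 <= v.blood_vaf <= 0.30             # blood VAF 1.5% to 30% (upper bound assumed inclusive)
        and v.tumor_vaf >= 0.10                  # tumor VAF >= 10%
        and v.tumor_vaf / v.blood_vaf >= 1.5     # enrichment in tumor relative to blood
        and v.blood_depth >= 50                  # sequencing depth >= 50 in blood
        and v.tumor_depth >= 50                  # sequencing depth >= 50 in tumor
        and v.blood_alt_reads >= 6               # >= 6 alternate-allele reads in blood
        and v.gnomad_af < 0.01                   # population allele frequency < 1%
    )
```

Because the blood VAF range is checked first, the ratio is never computed for a variant with a blood VAF of zero.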

The pathogenicity of mosaic variants targeting non-CSGs (n = 280) with known gene–disease associations was determined according to the ACMG criteria.

IHC

Expression of MSH2, MSH6, and CD45 was assessed by IHC on a Bond-3 automated stainer platform (Leica Biosystems; Supplementary Methods).

Statistical Analysis

Statistical analyses were conducted using R v3.1.2. VAF 95% confidence intervals were calculated using the Wilson procedure. The Mann–Whitney U test and the Fisher exact test were used for continuous and categorical variables, respectively. P values were two-sided and adjusted for multiple testing wherever appropriate. P values < 0.05 were considered statistically significant.
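For readers unfamiliar with the Wilson procedure, the following minimal sketch shows how a 95% confidence interval for a VAF could be computed from read counts. It is an illustrative Python version rather than the R code used in this study; the function name and arguments are hypothetical, and it assumes the interval is computed without continuity correction, treating alternate-allele reads as binomial successes out of the total depth.

```python
import math


def wilson_interval(alt_reads: int, depth: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a proportion (default z gives a two-sided 95% interval)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    p = alt_reads / depth                      # observed VAF
    denom = 1 + z * z / depth
    center = (p + z * z / (2 * depth)) / denom
    half_width = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return center - half_width, center + half_width


# Example: 10 alternate-allele reads at a depth of 100 (observed VAF 10%)
low, high = wilson_interval(10, 100)
print(f"{low:.4f} - {high:.4f}")
```

Unlike the simple normal approximation, this interval stays within 0 to 1 and remains informative at the low VAFs typical of mosaic variants.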

Data Availability

Somatic mutations and copy-number alterations for the 36 mosaic cases identified in this study are available on cBioPortal (https://www.cbioportal.org/study/summary?id=mixed_mos_msk_2021). Targeted sequencing data (aligned BAM files) of tumor and normal tissues included in the validation cohort supporting the findings of this study have been deposited in the Sequence Read Archive under accession number SUB10801254.

F. Pareja reports grants from the NIH/NCI during the conduct of the study. K. Breen has an immediate family member who is on the scientific advisory board for Emendo Biotherapeutics, Karyopharm Therapeutics, Imago BioSciences, and DarwinHealth; is cofounder of Isabl Technologies; and has equity interest in Imago BioSciences, Emendo Biotherapeutics, and Isabl Technologies. M.F. Berger reports personal fees from Eli Lilly outside the submitted work. E. Comen reports personal fees from Pfizer and Novartis outside the submitted work. N. Riaz reports grants from Repare Therapeutics and Bristol Myers Squibb, and personal fees from Illumina outside the submitted work. B. Weigelt reports personal fees from Repare Therapeutics outside the submitted work. Z.K. Stadler reports personal fees from Alcon, Adverum, Gyroscope Therapeutics, Neurogene, and RegenexBio outside the submitted work. M.E. Robson reports grants from AstraZeneca, Merck, and Pfizer and personal fees from Change Healthcare outside the submitted work, as well as uncompensated advisory for Artios Pharma, AstraZeneca, Daiichi Sankyo, Epic Sciences, Merck, Pfizer, Tempus Lab, and Zenith Pharma. J.S. Reis-Filho reports personal fees from Paige, Repare Therapeutics, Goldman Sachs, Grupo Oncoclinicas, Roche Tissue Diagnostics, Genentech, Roche, In Vicro, Eli Lilly, Personalis, and Volition Rx outside the submitted work. D. Mandelker reports grants from the NIH/NCI during the conduct of the study. No disclosures were reported by the other authors.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

F. Pareja: Conceptualization, resources, data curation, formal analysis, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. R.N. Ptashkin: Data curation, software, formal analysis, investigation, methodology, writing–review and editing. D.N. Brown: Software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. F. Derakhshan: Validation, investigation, writing–review and editing. P. Selenica: Data curation, formal analysis, investigation, writing–review and editing. E.M. da Silva: Investigation, writing–review and editing. A.M. Gazzo: Formal analysis, validation, investigation, visualization, writing–review and editing. A. Da Cruz Paula: Formal analysis, validation, investigation, writing–review and editing. K. Breen: Data curation, formal analysis, investigation, writing–review and editing. R. Shen: Investigation, writing–review and editing. A. Marra: Data curation, formal analysis. A. Zehir: Investigation, writing–review and editing. R. Benayed: Investigation, writing–review and editing. M.F. Berger: Investigation, writing–review and editing. O. Ceyhan-Birsoy: Investigation, writing–review and editing. S. Jairam: Investigation, writing–review and editing. M. Sheehan: Investigation, writing–review and editing. U. Patel: Investigation, writing–review and editing. Y. Kemel: Investigation, writing–review and editing. J. Casanova-Murphy: Formal analysis, investigation. C.J. Schwartz: Validation, investigation, writing–review and editing. M. Vahdatinia: Investigation, writing–review and editing. E. Comen: Investigation, writing–review and editing. L. Borsu: Investigation, writing–review and editing. X. Pei: Formal analysis, investigation. N. Riaz: Formal analysis, investigation. D.H. Abramson: Investigation, writing–review and editing. B. Weigelt: Investigation, writing–review and editing. M.F. Walsh: Investigation, writing–review and editing. A.-K. Hadjantonakis: Investigation, writing–review and editing. M. Ladanyi: Investigation, writing–review and editing. K. Offit: Investigation, writing–review and editing. Z.K. Stadler: Investigation, writing–review and editing. M.E. Robson: Conceptualization, investigation, writing–review and editing. J.S. Reis-Filho: Conceptualization, resources, supervision, funding acquisition, investigation, writing–original draft, writing–review and editing. D. Mandelker: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, writing–review and editing.

This study was partially funded by the Breast Cancer Research Foundation and by the Sarah Leigh Fund. Research reported in this study was partly funded by a Cancer Center Support Grant of the NIH/NCI (P30CA008748). F. Pareja is partially funded by an NIH K12 grant (CA184746), and B. Weigelt is partially funded by a Cycle for Survival grant. F. Pareja, R. Shen, N. Riaz, B. Weigelt, and J.S. Reis-Filho are funded in part by an NIH/NCI P50 grant (CA247749 01). Y. Kemel, K. Breen, and Z.K. Stadler are partially supported by the Robert and Kate Niehaus Center for Inherited Cancer Genomics and the Sharon Corzine Research Foundation.

1. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. Cancer genome landscapes. Science 2013;339:1546–58.
2. Biesecker LG, Spinner NB. A genomic view of mosaicism and human disease. Nat Rev Genet 2013;14:307–20.
3. Martincorena I, Roshan A, Gerstung M, Ellis P, Van Loo P, McLaren S, et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 2015;348:880–6.
4. Lawson ARJ, Abascal F, Coorens THH, Hooks Y, O'Neill L, Latimer C, et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 2020;370:75–82.
5. Martincorena I, Fowler JC, Wabik A, Lawson ARJ, Abascal F, Hall MWJ, et al. Somatic mutant clones colonize the human esophagus with age. Science 2018;362:911–7.
6. Yizhak K, Aguet F, Kim J, Hess JM, Kubler K, Grimsby J, et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 2019;364.
7. Ju YS, Martincorena I, Gerstung M, Petljak M, Alexandrov LB, Rahbari R, et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 2017;543:714–8.
8. Verhoef S, Bakker L, Tempelaars AM, Hesseling-Janssen AL, Mazurczak T, Jozwiak S, et al. High rate of mosaicism in tuberous sclerosis complex. Am J Hum Genet 1999;64:1632–7.
9. Kehrer-Sawatzki H, Cooper DN. Mosaicism in sporadic neurofibromatosis type 1: variations on a theme common to other hereditary cancer syndromes? J Med Genet 2008;45:622–31.
10. Mandelker D, Ceyhan-Birsoy O. Evolving significance of tumor-normal sequencing in cancer care. Trends Cancer 2020;6:31–9.
11. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn 2015;17:251–64.
12. Bolton KL, Ptashkin RN, Gao T, Braunstein L, Devlin SM, Kelly D, et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat Genet 2020;52:1219–26.
13. Kratz CP, Freycon C, Maxwell KN, Nichols KE, Schiffman JD, Evans DG, et al. Analysis of the Li-Fraumeni spectrum based on an international germline TP53 variant data set: an International Agency for Research on Cancer TP53 database analysis. JAMA Oncol 2021.
14. Nilbert M, Therkildsen C, Nissen A, Akerman M, Bernstein I. Sarcomas associated with hereditary nonpolyposis colorectal cancer: broad anatomical and morphological spectrum. Fam Cancer 2009;8:209–13.
15. Vogt A, Schmid S, Heinimann K, Frick H, Herrmann C, Cerny T, et al. Multiple primary tumours: challenges and approaches, a review. ESMO Open 2017;2:e000172.
16. Cybulski C, Nazarali S, Narod SA. Multiple primary cancers as a guide to heritability. Int J Cancer 2014;135:1756–63.
17. Lee-Six H, Obro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 2018;561:473–8.
18. Park S, Mali NM, Kim R, Choi JW, Lee J, Lim J, et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 2021;597:393–7.
19. Coorens THH, Moore L, Robinson PS, Sanghvi R, Christopher J, Hewinson J, et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 2021;597:387–92.
20. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Ng AWT, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101.
21. Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Turki SA, et al. Timing, rates and spectra of human germline mutation. Nat Genet 2016;48:126–33.
22. Ghimire S, Mantziou V, Moris N, Arias AM. Human gastrulation: the embryo and its models. Dev Biol 2021;474:100–8.
23. Behjati S, Huch M, van Boxtel R, Karthaus W, Wedge DC, Tamuri AU, et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 2014;513:422–5.
24. Bae T, Tomasini L, Mariani J, Zhou B, Roychowdhury T, Franjic D, et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 2018;359:550–5.
25. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–24.
