Mismatch repair (MMR)–deficient cancers have been found to be highly responsive to immune therapies such as PD-1 checkpoint blockade, making their identification in patients, in whom they may be relatively rare, paramount for treatment decisions. In this study, we used patterns of mutagenesis known as mutational signatures, which are imprints of the mutagenic processes associated with MMR deficiency, to identify MMR-deficient breast tumors in a whole-genome sequencing dataset from a cohort of 640 patients. We identified 11 of 640 tumors as MMR deficient, but only 2 of the 11 carried germline MMR gene mutations indicative of Lynch syndrome. Two of the 11 tumors had a substantially reduced proportion of mutations attributed to MMR deficiency, with predominant mutational signatures related to APOBEC enzymatic activity. Overall, 6 of the 11 MMR-deficient cases in this cohort were confirmed genetically or epigenetically as having abrogation of MMR genes. However, IHC analysis of MMR-related proteins identified all but one of the 10 samples available for testing as MMR deficient. Thus, the mutational signatures reported MMR deficiency more faithfully than sequencing of MMR genes, because they represent a direct pathophysiologic readout of repair pathway abnormalities. As whole-genome sequencing becomes more affordable, it could be used to identify individual MMR-deficient tumors in tissue types where MMR deficiency has rarely been detected, but also rarely sought. Cancer Res; 77(18); 4755–62. ©2017 AACR.
Mismatch repair (MMR) deficiency in human cancer arises from heterozygous germline mutations in components of the MMR machinery (MSH2, MSH6, MLH1, PMS2), a condition known as Lynch syndrome, or from somatic mutations in these genes and/or promoter hypermethylation of MLH1 (1–9). As classical tumor suppressors, these genes lose the remaining wild-type allele in the tumor, resulting in biallelic gene inactivation. The MMR pathway plays a pivotal role in reducing error rates during replication, and MMR deficiency is therefore associated with high levels of mutagenesis: 10- to 100-fold more mutations than in tumors with intact MMR pathways (9, 10).
Recognition of MMR-deficient tumors is clinically important. MMR-deficient colorectal cancers, although associated with poor differentiation and intense lymphocytic infiltration, have a better prognosis. More recently, MMR-deficient tumors have shown clinical responses to therapies that interfere with T-cell immune checkpoints (11–14), particularly the PD-1/PD-L1 axis. In a phase II clinical trial (15), objective clinical responses to anti-PD-1 treatment were seen in MMR-deficient tumors (4/10 colorectal and 5/7 of other tissue types), in contrast to 0 of 18 MMR-proficient tumors. This was recently reinforced by an expanded study evaluating anti-PD-1 efficacy across 12 tumor types (14). Sensitivity to immune therapy in MMR-deficient tumors is believed to be mediated by T-cell reactivity against “neoantigens,” novel epitopes that are formed as a consequence of the excessive somatic mutation in these cancers (16). Intriguingly, this phenomenon is not restricted to colon cancer; identification of all MMR-deficient tumors, regardless of tissue type, could therefore be beneficial (17).
Diagnostic methods for identifying MMR-deficient tumors have evolved over the last 25 years. Histopathologic criteria were followed by PCR-based testing of microsatellite loci and then by IHC for loss of MMR protein expression, which has become routine in clinical diagnostics. Sequencing of DNA repair genes and methylation assays are also used to confirm pathogenic somatic and/or germline mutations and epigenetic silencing. However, somatic MMR deficiency is sought routinely only in cancers where it is frequent, such as colorectal (∼20%) and endometrial (∼20%–30%) cancers. Clinical testing is not customarily performed where it is less common [e.g., cervical cancer (8%) or breast cancer (0%–2%); refs. 12, 15]. Indeed, the rare occurrence of MMR-deficient breast tumors has been attributed to germline mutations (Lynch syndrome) and is frequently assessed in that context, although the true relationship between MMR-deficient breast tumors and inherited predisposition remains controversial (18).
Modern cancer resequencing experiments, particularly whole-genome sequencing (WGS) approaches, have revealed much more than the handful of driver mutations that are causally implicated in carcinogenesis. They have also exposed the many thousands of passenger mutations that are the products of mutational processes operative throughout tumorigenesis (19–22). Each mutational process leaves its own characteristic imprint, or mutational signature, on the genome (23), providing a record of past and ongoing exposures, whether to environmental insults such as ultraviolet radiation or to endogenous processes such as biochemical degradation and deficiencies of DNA repair pathways, of which MMR deficiency is one example.
Previously, substitution signatures 6, 20, and 26 (among others) were associated with MMR deficiency in colorectal, stomach, and uterine carcinomas (http://cancer.sanger.ac.uk/cosmic/signatures). Here, we find that 11 of 640 breast tumors (1.7%) in a WGS study show variable quantities of these substitution signatures. Critically, this subset of tumors would not have been detected as MMR deficient using current clinical criteria for the assessment of breast cancer. We describe their distinctive, highly individual, pathognomonic genomic profiles, confirm the diagnosis using genetic, epigenetic, and protein IHC methods, and emphasize how genome profiling using mutational signatures could be a powerful additional tool for tumor stratification.
Materials and Methods
Detailed methods are provided in Supplementary Materials and Methods.
The overarching accession number for the data on the 560 breast cancers previously described in Nik-Zainal and colleagues (24) is EGAS00001001178. For the 80 additional breast cancers described in Davies and colleagues (25), data are deposited under the following EGA accession numbers: BAM files in EGAD00001002740 and SNP6 CEL files in EGAD00010001079.
The institutional review boards of each participating institution approved the collection and use of samples from all patients in this study. Informed consent was obtained by the relevant participating institution. The study was conducted in accordance with the Declaration of Helsinki.
DNA was extracted from 640 breast cancers and corresponding normal tissue and subjected to WGS as described previously (24, 25). Sequencing reads were aligned to the reference human genome (GRCh37) using the Burrows-Wheeler Aligner (BWA, v0.5.9), producing BAM files.
Mutation calling was performed as described previously (24). Briefly, CaVEMan (Cancer Variants through Expectation Maximization; http://cancerit.github.io/CaVEMan/) was used to call somatic substitutions. Indels in the tumor and normal genomes were called using a modified version of Pindel 2.0 (http://cancerit.github.io/cgpPindel/) on the NCBI37 genome build. Structural variants were detected using a bespoke algorithm, BRASS (BReakpoint AnalySiS; https://github.com/cancerit/BRASS), which identifies discordantly mapping paired-end reads and then performs de novo local assembly using Velvet to determine the exact coordinates and features of breakpoint junction sequences.
In total, 3,808,160 somatic base substitutions, 399,466 small indels, and 83,191 rearrangements were detected in the 640 samples (24, 25).
Mutational signature analysis based on nonnegative matrix factorization (NMF) was performed as described previously (22, 24). In that analysis of the 560 breast whole genomes, a primary signature extraction using NMF was followed by reintroduction of the consensus signatures as currently reported on COSMIC (http://cancer.sanger.ac.uk/cosmic/signatures), identifying 12 base substitution signatures: signatures 1, 2, 3, 5, 6, 8, 13, 17, 18, 20, 26, and 30. Where this manuscript describes previously published data, we used the mutational signatures that were extracted and assigned previously using Alexandrov's method. The activity of these 12 signatures was also estimated in a further 80 breast cancer whole genomes (25).
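For readers unfamiliar with NMF, the factorization at the core of signature extraction can be sketched as follows. This is a minimal, generic illustration using the Lee-Seung multiplicative updates for the Frobenius loss; it is not Alexandrov's method, which adds resampling, repeated factorizations, and clustering of solutions, and the function name nmf_multiplicative and its defaults are hypothetical. The input matrix V holds mutation counts with one row per mutation channel (for example, the 96 substitution contexts) and one column per sample; W returns the signatures and H the per-sample exposures.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a nonnegative matrix V (channels x samples) as V ~ W @ H.

    W (channels x k) holds k signatures; H (k x samples) holds exposures.
    Uses Lee-Seung multiplicative updates minimizing the Frobenius norm.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    # Strictly positive random start (entries that start at zero stay zero).
    W = rng.random((n_channels, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative and, up to eps,
        # do not increase the reconstruction error.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    # Normalize each signature to sum to 1 and move the scale into H,
    # so W @ H is unchanged and H is expressed in mutation counts.
    scale = W.sum(axis=0)
    W /= scale
    H *= scale[:, None]
    return W, H


# Usage: recover two synthetic signatures mixed across 20 samples.
rng = np.random.default_rng(1)
W_true = rng.dirichlet(np.ones(96), size=2).T           # 96 x 2, columns sum to 1
H_true = rng.integers(50, 500, size=(2, 20)).astype(float)
V = W_true @ H_true
W, H = nmf_multiplicative(V, k=2, n_iter=3000)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))     # relative reconstruction error
```

Because the overall scale of W and H is not identifiable, the sketch fixes it by making each signature a probability distribution over channels; the number of signatures k must be chosen by the user, which in practice is one of the harder parts of signature extraction.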
The Alexandrov method was not available to us for the new analyses. To maintain consistency, all new analyses therefore used the same new iterative algorithm (SigFit; Morganella and colleagues, manuscript submitted) to identify the set of COSMIC signatures (cancer.sanger.ac.uk/cosmic/signatures) active in each sample and the number of mutations attributed to each (the so-called “exposure”). Each sample was described by a vector containing the number of substitutions observed for each mutation type in its flanking sequence context (defined by the bases immediately 5′ and 3′ to the mutated base and by the mutated base itself). Each mutation was oriented with respect to the pyrimidine of the mutated base pair, so each vector contained 96 elements (6 substitution classes × 16 combinations of flanking bases). The algorithm started from an initial solution estimated using a simulated annealing-based method. Mutations were then iteratively reassigned to alternative signatures: for each potential reassignment, the reconstructed 96-element vector was compared with the observed 96-element vector by cosine similarity, with the aim of identifying the highest possible cosine similarity. The algorithm stopped when no reassignment improved the cosine similarity. We ran the algorithm while controlling the sparsity of the solution (the number of signatures simultaneously active in each sample), using three values of the parameter alpha: 0, 0.01, and 0.02 (sparsity increases with alpha). In addition, we assessed the consistency of the results for the breast data using two predefined sets of signatures: one containing all 30 COSMIC signatures and one containing only the 12 signatures previously associated with breast cancer. As expected, different results were obtained for the different sets of a priori signatures (12 or 30).
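The 96-element representation described above can be illustrated with a short sketch. It assumes each substitution is supplied as its reference trinucleotide on the + strand (mutated base in the middle) plus the mutant base; purine-centred mutations are reverse-complemented so that the mutated base is always C or T. The channel ordering (grouped by substitution class, then 5′ base, then 3′ base) and the helper names channel_of and profile_vector are illustrative assumptions, not SigFit's internals.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Six substitution classes referenced to the pyrimidine (C or T) of the base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 classes x 4 possible 5' bases x 4 possible 3' bases = 96 channels, e.g. "A[C>T]G".
CHANNELS = [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, BASES)]
CHANNEL_INDEX = {label: i for i, label in enumerate(CHANNELS)}


def channel_of(ref_trinuc, alt):
    """Return the 96-channel label of one substitution.

    ref_trinuc: + strand reference context, mutated base in the middle (e.g. "TGA").
    alt: mutant base on the + strand (e.g. "A").
    """
    ref_trinuc, alt = ref_trinuc.upper(), alt.upper()
    if ref_trinuc[1] in "AG":
        # Purine-centred: describe the same event on the opposite strand.
        ref_trinuc = ref_trinuc.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    five, ref, three = ref_trinuc
    return f"{five}[{ref}>{alt}]{three}"


def profile_vector(mutations):
    """Count (ref_trinuc, alt) pairs into a 96-element list."""
    counts = [0] * len(CHANNELS)
    for ref_trinuc, alt in mutations:
        counts[CHANNEL_INDEX[channel_of(ref_trinuc, alt)]] += 1
    return counts


print(channel_of("TGA", "A"))  # T[C>T]A: G>A in TGA is C>T in TCA on the other strand
```

Collapsing each mutation onto its pyrimidine strand is what reduces the 12 possible base changes to 6 classes, so a G>A in one context and the equivalent C>T on the opposite strand are counted together.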
Within each set of a priori signatures, however, results for the different settings of alpha were largely consistent, with minor variation reflecting the degree of constraint imposed by each alpha (Supplementary Table S1). To maintain consistency in all downstream analyses, alpha = 0.01 was used for all analyses shown in this publication.
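The reassignment loop described in this section (moving mutations between signatures and keeping a change only if the cosine similarity between the reconstructed and observed 96-element vectors improves) can be sketched as a simple greedy hill-climb. This is an illustrative stand-in rather than SigFit itself: it takes a caller-supplied starting solution instead of a simulated annealing-based one, moves a fixed number of mutations per step (the hypothetical parameter step), and does not implement the alpha sparsity control, whose exact form is not described here.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def greedy_reassign(observed, signatures, exposures, step=1, max_iter=10000):
    """Hill-climb exposures by moving `step` mutations between signatures.

    observed: (96,) mutation counts; signatures: (96, k), columns summing to 1;
    exposures: (k,) initial number of mutations assigned to each signature.
    Stops when no single move increases the cosine similarity.
    """
    e = np.asarray(exposures, dtype=float).copy()
    best = cosine(signatures @ e, observed)
    k = len(e)
    for _ in range(max_iter):
        best_move = None
        for i in range(k):
            if e[i] < step:
                continue  # cannot take mutations from an (almost) empty signature
            for j in range(k):
                if i == j:
                    continue
                trial = e.copy()
                trial[i] -= step
                trial[j] += step
                sim = cosine(signatures @ trial, observed)
                if sim > best + 1e-12:  # keep the best improving move seen so far
                    best, best_move = sim, trial
        if best_move is None:
            break  # no reassignment improves the fit
        e = best_move
    return e, best


# Usage: two non-overlapping toy signatures, true exposures 30 and 70.
sigs = np.zeros((96, 2))
sigs[:48, 0] = 1 / 48
sigs[48:, 1] = 1 / 48
observed = sigs @ np.array([30.0, 70.0])
e, sim = greedy_reassign(observed, sigs, [50, 50])
print(e, sim)
```

Each move conserves the total of the starting exposures, so the solution always accounts for the same number of mutations; a greedy search of this kind can, however, halt at a local optimum, which is one reason a global method such as simulated annealing is useful for the starting point.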
Two indel signatures, based on the presence of either short tandem repeats or short stretches of identical sequence at the breakpoints (termed overlapping microhomology), were also extracted.
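The distinction between repeat-mediated and microhomology-mediated indels can be sketched for deletions as follows. This is a simplified illustration under stated assumptions: the deleted bases are given as a 0-based, half-open slice of a reference string; a deletion counts as repeat-mediated if the same unit lies immediately 5′ or 3′ of it, and as microhomology-mediated if only part of the deleted sequence matches the flanking sequence. The function name and rules are hypothetical, and the classification used in the cited work may differ (for example, in how insertions or long repeats are handled).

```python
def classify_deletion(ref_seq, start, length):
    """Classify the deletion of ref_seq[start:start + length] (0-based, half-open).

    Returns "repeat", "microhomology", or "other" under simplified rules.
    """
    deleted = ref_seq[start:start + length]
    after = ref_seq[start + length:start + 2 * length]
    before = ref_seq[max(0, start - length):start]
    # Repeat-mediated: the deleted unit is tandemly repeated next to the deletion.
    if deleted in (after, before):
        return "repeat"
    # Microhomology: a partial (shorter than full-length) match between the deleted
    # sequence and a flank; its start matches the 3' flank or its end the 5' flank.
    for n in range(1, length):
        matches_3p = deleted[:n] == ref_seq[start + length:start + length + n]
        matches_5p = start >= n and deleted[-n:] == ref_seq[start - n:start]
        if matches_3p or matches_5p:
            return "microhomology"
    return "other"


print(classify_deletion("GGATATATCC", 2, 2))    # "AT" lost from an AT repeat
print(classify_deletion("TTACGTTACAAA", 2, 5))  # "ACGTT" lost, "AC" shared with the 3' flank
```

Under these rules, a 1-bp deletion can only be classified as repeat-mediated (within a homopolymer) or other, since a single base cannot show partial overlap with its flank.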
ASCAT copy number analysis
Single-nucleotide polymorphism (SNP) array hybridization using the Affymetrix SNP6.0 platform was performed according to Affymetrix protocols. Allele-specific copy number analysis of tumors was performed using ASCAT (v2.1.1) as described previously (24). ASCAT takes non-neoplastic cellular infiltration and overall tumor ploidy into consideration to generate integer-based, allele-specific copy number profiles for the tumor cells. ASCAT was also applied directly to next-generation sequencing data, with highly comparable results. Copy number values and estimates of aberrant tumor cell fraction provided by ASCAT are input into the CaVEMan substitution algorithm. In addition, ASCAT segmentation profiles were used to establish the presence of loss of heterozygosity across MMR genes and to search for homozygous deletion of these genes.
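Using segmentation profiles to flag loss of heterozygosity (LOH) and homozygous deletion across a gene can be sketched as an interval-overlap check. This assumes segments carrying major and minor allele copy numbers, modeled loosely on ASCAT's allele-specific output; the Segment class, the status labels, and the coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One copy-number segment; coordinates assumed 1-based inclusive."""
    chrom: str
    start: int
    end: int
    n_major: int  # copies of the more abundant parental allele
    n_minor: int  # copies of the less abundant parental allele


def gene_copy_status(segments, chrom, gene_start, gene_end):
    """Summarize copy-number status across a gene interval."""
    overlapping = [s for s in segments
                   if s.chrom == chrom and s.start <= gene_end and s.end >= gene_start]
    if not overlapping:
        return "no_data"
    if any(s.n_major + s.n_minor == 0 for s in overlapping):
        return "homozygous_deletion"  # no copies of either allele remain
    if any(s.n_minor == 0 for s in overlapping):
        return "LOH"                  # one parental allele has been lost
    return "retained"                 # both parental alleles present


# Hypothetical segments on one chromosome.
segments = [Segment("3", 1, 1000, 2, 1),
            Segment("3", 1001, 2000, 2, 0),
            Segment("3", 2001, 3000, 0, 0)]
print(gene_copy_status(segments, "3", 900, 1100))  # gene spans an LOH segment
```

A gene spanning several segments takes the most severe status among them, so a partial homozygous deletion is reported even if the rest of the gene is retained.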
Detection of variants in MMR genes
We sought to discover germline and somatic mutations of all classes in the following genes involved in MMR: MLH1, MSH2, MSH6, PMS2, PMS1, SETD2, MYH11, EPCAM, TGFBR2, and MLH3, as well as in MUTYH, in the 640 breast cancer cases. Single-base substitutions and small insertions/deletions were interrogated using the CaVEMan and Pindel algorithms, respectively, whereas large deletions were investigated using a combination of ASCAT copy number data and BRASS rearrangement calls. Variants affecting the coding regions of these genes were verified by visual inspection to remove common sequencing artefacts and were cross-referenced against dbSNP and ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) to identify benign polymorphisms. Variants with evidence in both the tumor and the corresponding non-neoplastic tissue were deemed germline, whereas those restricted to the tumor were deemed somatic. ASCAT copy number data were used to determine whether the other (wild-type) allele had been lost in the tumor sample.
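The rule above (variants seen in both tumor and matched normal are germline, those confined to the tumor are somatic) can be sketched from read counts. The minimum read support and normal allele fraction below are hypothetical thresholds chosen for illustration only; the text does not specify the criteria used, and ambiguous cases are returned as "uncertain" for manual review rather than forced into a class.

```python
def classify_origin(tumor_alt_reads, normal_alt_reads, normal_depth,
                    min_alt_reads=3, min_normal_vaf=0.1):
    """Label a variant 'germline', 'somatic' or 'uncertain'.

    Thresholds are hypothetical placeholders, not those used in the study.
    """
    if tumor_alt_reads < min_alt_reads or normal_depth == 0:
        return "uncertain"  # weak tumor evidence, or no coverage in the normal
    normal_vaf = normal_alt_reads / normal_depth
    if normal_alt_reads >= min_alt_reads and normal_vaf >= min_normal_vaf:
        return "germline"   # clear evidence in non-neoplastic tissue
    if normal_alt_reads == 0:
        return "somatic"    # restricted to the tumor
    return "uncertain"      # a few normal reads: possible contamination or artefact


print(classify_origin(20, 15, 30))  # variant present in about half of normal reads
print(classify_origin(20, 0, 30))   # variant absent from the normal
```

Keeping an explicit "uncertain" class reflects the manual verification step described in the text; tumor cells contaminating the normal sample, for example, can produce a few supporting reads without the variant being germline.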
IHC staining of formalin-fixed, paraffin-embedded sections was performed by the center responsible for recruiting the patient. Staining for the MLH1, PMS2, MSH2, and MSH6 proteins was performed according to the standard clinical diagnostic laboratory procedures of the relevant center.
Methylation analyses for identification of hypermethylated genes
DNA methylation in cancer samples was assayed using Infinium HumanMethylation450 BeadChips (Illumina) according to the manufacturer's protocols. Analysis was performed as described previously (24).
Detection of MMR-deficient cancers in other datasets
See Supplementary Materials and Methods for details of the analysis of an MMR-deficient ovarian cancer sample and of whole-exome sequencing (WES) breast cancer samples.
Genomic profiles of presumptive MMR-deficient breast tumors
Substitution signature analysis of 640 WGS breast tumors (24, 25) revealed 11 samples (1.7%) with variable quantities of substitution signatures 6, 20, and 26 (Supplementary Table S2; Supplementary Fig. S1). In addition, all 11 had strikingly high numbers of insertions/deletions (indels), roughly two orders of magnitude more than other breast cancers (mean of 20,870, range 2,535–66,764, in these 11, compared with a mean of 270, range 19–1,512, in the other 629 breast cancers). These indels occurred at polynucleotide repeat tracts, consistent with the pattern of microsatellite instability associated with MMR deficiency. Altogether, these 11 tumors had a constellation of genomic features quite distinct from those of the other 629 breast cancers (Fig. 1); they are described in more detail below (Fig. 2).
Of the 11 tumors, 5 were ER-positive and 6 were ER-negative. Nine of the 11 breast cancers were hypermutated by MMR substitution signatures, as reflected in both the absolute number (range 9,232–84,432, average 31,166) and the proportion (79%–100%) of Signature 6, 20, and 26 mutations (Fig. 2). The two remaining tumors had a large number of mutations attributed to the activity of APOBEC cytidine deaminases in addition to MMR-related signatures: MMR signatures accounted for 34% of mutations in PD5937a but only 13.7% in PD23561a. These two samples thus showed greater heterogeneity of mutational signatures. The predominance of the APOBEC-related signatures likely indicates that this mutational process began earlier in the clonal evolution of these tumors, before the MMR signatures were acquired. See Supplementary Fig. S2 for genome plots of all 11 samples.
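The proportions quoted above are the share of each tumor's substitutions assigned to Signatures 6, 20, and 26. A minimal sketch, assuming exposures are available as a mapping from COSMIC signature number to mutation count; the function name and the example exposures are hypothetical and do not correspond to any sample in this study.

```python
MMR_SIGNATURES = frozenset({6, 20, 26})


def mmr_fraction(exposures, mmr_signatures=MMR_SIGNATURES):
    """Fraction of substitutions attributed to MMR-related signatures.

    exposures: mapping of COSMIC signature number -> number of substitutions.
    """
    total = sum(exposures.values())
    if total == 0:
        return 0.0  # no substitutions assigned at all
    mmr = sum(n for sig, n in exposures.items() if sig in mmr_signatures)
    return mmr / total


# Hypothetical tumor with MMR and APOBEC-related (Signatures 2 and 13) activity.
print(mmr_fraction({6: 700, 26: 100, 2: 120, 13: 80}))  # 0.8
```

Reporting both this fraction and the absolute MMR-attributed count, as the text does, distinguishes tumors dominated by MMR mutagenesis from those, like the two APOBEC-rich cases, in which MMR signatures form a minority.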
Genetic/epigenetic and IHC evidence supporting MMR deficiency in the breast tumors
To seek support for our hypothesis that these genomic profiles were indicative of MMR deficiency, we searched for genetic and/or epigenetic evidence of inactivation of MMR-related genes (MLH1, MSH2, MSH6, PMS2, PMS1, SETD2, MYH11, EPCAM, TGFBR2, MLH3) and of MUTYH. Inactivation was confirmed in 6 of the 11 cancers (55%). Of these 6 cases, two had somatic MLH1 mutations, one had a heterozygous germline truncating MLH1 mutation, and one had a heterozygous germline PMS2 mutation associated with Lynch syndrome (c.137G>T p.S46I, rs121434629; ref. 26). All four tumors had lost the other parental allele, rendering these genes null at the cellular level. The fifth case had promoter hypermethylation of MLH1 at a level consistent with complete inactivation of the gene, and the sixth had somatic compound heterozygous mutations in MSH2. Genetic or epigenetic confirmation was therefore not achieved in the remaining 5 of the 11 (45%) putatively MMR-deficient tumors, although methylation data were not available for these tumors (see Supplementary Table S3 for details of the mutations).
To obtain independent, corroborative evidence of MMR deficiency, IHC for MMR proteins was performed on 10 of the 11 cases (see Fig. 3 for examples). Reassuringly, 9 of the 10 samples showed concomitant loss of MLH1 and PMS2 staining (n = 7) or of MSH2 and MSH6 staining (n = 2), in keeping with expectations for MMR-deficient tumors in clinical diagnostics (Fig. 2). The tenth sample, PD23561a, was the exception. This was the tumor that showed only a small contribution of MMR signatures (13.7% of substitutions), so the IHC result may be a false negative because MMR deficiency is restricted to a subpopulation of cells in this tumor. These results emphasize two points. First, knowledge of the precise MMR driver mutation is not essential, as mutational signatures appear to be a dependable read-out of MMR abrogation. Second, where MMR deficiency has arisen later in cancer evolution or is present only at subclonal levels, genomic signature profiling may be a more faithful reporter than IHC, outperforming the protein-based methods prevailing in clinical practice today.
Notably, monoallelic MMR mutations did not produce MMR signatures: somatic PMS2 (PD22361a), germline PMS2 (PD13608a), germline MSH6 (PD18006a), and germline MLH3 (PD5936a) mutations were observed elsewhere in the breast cancer dataset but did not yield genomic features of MMR abrogation. Biallelic inactivation therefore appears to be a prerequisite for a tumor to exhibit MMR deficiency.
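The two-hit logic used in this section (a gene is considered null only when both alleles are affected) can be written down as a small rule set. The event labels are hypothetical, and the rule that two mutations in one gene inactivate both alleles assumes they lie in trans, which the sketch does not check; complete promoter methylation is treated as sufficient on its own, mirroring the MLH1 case described above.

```python
from collections import Counter

MUTATION_EVENTS = ("germline_mutation", "somatic_mutation")


def is_biallelic(events):
    """Decide whether one gene in one tumor is inactivated on both alleles.

    events: iterable of hypothetical labels such as "germline_mutation",
    "somatic_mutation", "LOH", "homozygous_deletion",
    "complete_promoter_methylation".
    """
    counts = Counter(events)
    n_mut = sum(counts[e] for e in MUTATION_EVENTS)
    if counts["homozygous_deletion"] or counts["complete_promoter_methylation"]:
        return True   # both copies removed or silenced
    if n_mut >= 2:
        return True   # compound heterozygous; assumes the mutations are in trans
    if n_mut == 1 and counts["LOH"]:
        return True   # one mutated allele plus loss of the other allele
    return False      # monoallelic: no MMR signature expected


print(is_biallelic(["germline_mutation", "LOH"]))  # True
print(is_biallelic(["germline_mutation"]))         # False
```

Under this rule set the monoallelic PMS2, MSH6, and MLH3 cases listed above would be classed as not biallelic, consistent with their lack of MMR signatures.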
Mutational signatures of MMR deficiency are associated with specific gene defects
We also observed correlations between mutational signatures and particular MMR defects. MLH1-inactivated breast cancers showed predominantly C>T/G>A and T>C/A>G transitions (classified as Signatures 6 and 26, respectively) together with overwhelming indel mutagenesis, particularly deletions at polynucleotide repeat tracts. In contrast, PMS2 inactivation appeared to be enriched mainly for T>C/A>G transitions, with some contribution of T>G/A>C transversions (previously classified as Signature 26 in breast cancer), and for insertions at polynucleotide repeat tracts. This observation was upheld in another cancer type: an ovarian cancer from a patient with Lynch syndrome carrying the identical germline PMS2 c.137G>T p.S46I mutation had a strikingly similar mutation profile, with enrichment of insertions (Supplementary Fig. S3; ref. 27). The field of mutational signatures is still in its infancy, and there is some variability in how signatures are extracted and assigned in different tumor types; Signature 26, for example, had not previously been described in ovarian tumors. To avoid bias, we performed an unsupervised mutational signature extraction and found probable contributions from Signatures 9 and 12 (of the 30 mutational signatures currently available at COSMIC; http://cancer.sanger.ac.uk/cosmic/signatures), rather than Signature 26, in both tumors with the germline PMS2 mutation (see Supplementary Table S1 and Supplementary Information for details). Indeed, inspection of the 96-element mutational profiles of these two tumors suggests that this alternative assignment may be more appropriate than the previous assignment of Signature 26. However, the numbers are small and the field is still developing, so further investigation is warranted before definitive conclusions can be drawn.
Regardless of which signature(s) are ultimately assigned, there are early hints of associations between mutational signatures and specific genetic defects, at least within the mismatch repair pathway.
Population frequency estimates of MMR deficiency in breast cancer
To investigate whether MMR deficiency in breast cancer could be detected using alternative sequencing strategies, we restricted the analysis to mutations that fell within coding sequences, mimicking a WES experiment. Although MMR signatures were detectable in all 11 MMR-deficient cases, in the absence of any post hoc filtering many other samples were erroneously assigned MMR signatures as well (Supplementary Table S4). We therefore applied strict criteria for classifying tumors as MMR deficient (see Supplementary Materials and Methods for details) to a separate dataset of approximately 1,100 WES breast cancers (http://cancer.sanger.ac.uk/cosmic). Approximately 1% of these tumors (14 of 1,097) could be classified as MMR deficient. In contrast, an alternative method that uses WES data to analyze instability at microsatellite loci (28), applied to an overlapping subset of this dataset, did not detect any MMR-deficient breast cancers. WES-based classification is almost certainly conservative (Supplementary Fig. S4; Supplementary Table S4), but here it provides an approximate population frequency. These 14 samples had not previously been highlighted as MMR deficient in mutational signature analyses (21), partly because MMR-related signatures were less well understood at the time. Advances in analysis, together with insights gained from the large collection of more recently sequenced WGS breast cancers, permit a reappraisal of older data.
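Restricting WGS calls to coding sequence in order to mimic a WES experiment amounts to an interval-overlap filter. A minimal sketch, assuming exon coordinates are supplied as 1-based inclusive (start, end) pairs per chromosome and mutations as (chrom, pos, ...) tuples; the helper names are hypothetical, and the coding annotation actually used is not specified here.

```python
from bisect import bisect_right


def build_index(exons):
    """Merge exon intervals per chromosome into sorted, non-overlapping lists.

    exons: dict chrom -> iterable of (start, end), assumed 1-based inclusive.
    """
    index = {}
    for chrom, intervals in exons.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_coding(index, chrom, pos):
    """True if pos falls inside a merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def restrict_to_coding(mutations, index):
    """Keep (chrom, pos, ...) records that fall within coding intervals."""
    return [m for m in mutations if in_coding(index, m[0], m[1])]


idx = build_index({"1": [(100, 200), (150, 300), (500, 600)]})
print(restrict_to_coding([("1", 120, "C>T"), ("1", 450, "T>C"), ("2", 150, "C>A")], idx))
```

Merging intervals first means each lookup is a single binary search, which matters when filtering millions of WGS substitutions; the filtered calls can then be fed to the same signature-fitting step as the full genomes.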
Two important questions arise from this study. First, what is the clinical significance of these observations? The frequency of MMR-deficient breast cancers in this study is approximately 1%–2%. In this cohort, most MMR-deficient breast tumors were not attributable to Lynch syndrome, as only 2 of the 11 cases carried a germline mutation in an MMR gene. Forty-five percent of cases would not have been identified by sequencing DNA repair genes in the tumor or in the germline, but were readily recognizable as MMR deficient through mutational signature profiling, increasing the yield of tumors that could be selectively sensitive to specific immune therapies. Second, is WGS really necessary to detect MMR-deficient tumors? Where the increase in mutational burden is so great, exome sequencing alone could expose tumors in which MMR deficiency arose early and is clonal, although we show that specificity is reduced with WES in breast cancer. Furthermore, WGS may still be necessary to detect MMR deficiency that has arisen only in subclones (20): precisely the subpopulations that are potentially selectively targetable, but that could go unseen and untreated and may be the source of future recurrence.
Finally, dramatic improvements in cancer outcomes from targeted therapeutics have not always come from new drugs. Indeed, reappraisal of existing drugs, combined with finer stratification for more specific application, has sometimes been at the heart of improved outcomes. Two recent clinical trials investigated sensitivity to PD-1 checkpoint inhibitors and used PD-L1 expression, assessed by IHC, as a biomarker (29). The cutoffs applied in these two trials were startlingly different: Opdivo (nivolumab; CheckMate-026 trial, Bristol-Myers Squibb) used a cutoff of PD-L1 expression exceeding 5% of cells, whereas Keytruda (pembrolizumab; KEYNOTE-024, Merck) used a 50% cutoff to categorize tumors. Progression-free survival outcomes differed between the two studies, a difference speculated to reflect the choice of biomarker and cutoff and a consequent lack of specificity in tumor classification. Mutational signatures could perhaps make a difference here. Our analyses suggest that because mutational signatures are a direct pathophysiologic read-out of MMR pathway abrogation, they confer a higher degree of sensitivity and specificity for tumor classification and could possibly outperform current biomarkers of MMR deficiency.
Disclosure of Potential Conflicts of Interest
H. Davies has ownership interest (including patents) in applied patents. D. Glodzik has ownership interest (including patents) in applied patents. S. Nik-Zainal has ownership interest (including patents) in a patent application and is a consultant/advisory board member for Artios Pharma Ltd. No potential conflicts of interest were disclosed by the other authors.
Conception and design: H. Davies, A.L. Richardson, A.-L. Børresen-Dale, A. Thompson, M. Stratton, S. Nik-Zainal
Development of methodology: S. Morganella, D. Glodzik, S. Nik-Zainal
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.A. Purdie, S.J. Jang, E. Borgen, H.G. Russnes, A. Viari, A.L. Richardson, A.-L. Børresen-Dale, A. Thompson, J.E. Eyfjord, G. Kong, S. Nik-Zainal
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Davies, S. Morganella, D. Glodzik, X. Zou, A.L. Richardson, S. Nik-Zainal
Writing, review, and/or revision of the manuscript: H. Davies, S. Morganella, C.A. Purdie, S.J. Jang, E. Borgen, A.L. Richardson, A.-L. Børresen-Dale, A. Thompson, J.E. Eyfjord, S. Nik-Zainal
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.A. Purdie, S. Nik-Zainal
Study supervision: S. Nik-Zainal
This work was performed on previously published data. Key principal investigators whose vision led to this very fruitful collaboration include Mike Stratton, Alain Viari, Gu Kong, Henk Stunnenberg, Marc van der Vijver, Ewan Birney, Ake Borg, and Anne-Lise Børresen-Dale.
S. Nik-Zainal and X. Zou were supported by the Wellcome Trust via a Wellcome Trust Strategic Award (WT101126/B/13/Z); S. Nik-Zainal is personally funded by a Cancer Research UK Advanced Clinician Scientist Award (C60100/A23916). A.L. Richardson is partially supported by the Dana-Farber/Harvard Cancer Center SPORE in Breast Cancer (NIH/NCI 5 P50 CA168504-02). G. Kong is supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (NRF2015R1A2A1A10052578).