A breast cancer genome is a record of the historic mutagenic activity that has occurred throughout the development of the tumor. Indeed, every mutation may be informative. Although driver mutations were the main focus of cancer research for a long time, passenger mutational signatures, the imprints of DNA damage and DNA repair processes that have been operative during tumorigenesis, are also biologically illuminating. This review is a chronicle of how the concept of mutational signatures arose and brings the reader up-to-date on this field, particularly in breast cancer. Mutational signatures have now been advanced to include mutational processes that involve rearrangements, and novel cancer biological insights have been gained through studying these in great detail. Furthermore, there are efforts to take this field into the clinical sphere. If validated, mutational signatures could thus form an additional weapon in the arsenal of cancer precision diagnostics and therapeutic stratification in the modern war against cancer. Clin Cancer Res; 23(11); 2617–29. ©2017 AACR.

See all articles in this CCR Focus section, “Breast Cancer Research: From Base Pairs to Populations.”

The central tenet of cancer research has for decades been the identification of somatic driver mutations that are causally implicated in tumorigenesis (1). Thus, a host of breast cancer driver events are now known (2), including copy number aberrations (3–8), such as the ERBB2 and CCND1 amplification loci and homozygous deletions of CDKN2A/B and PTEN, and high-frequency substitution and insertion/deletion (indel) driver mutations in cancer genes like TP53 (frequency ∼53%), PIK3CA (8%–26%), CDH1 (21%), AKT1 (8%), and GATA3 (4%; refs. 9–12). Separately, extensive germline exploration has led to documentation of rare, high-penetrance (BRCA1, BRCA2, TP53; refs. 13, 14), moderate-penetrance (PTEN, STK11, CDH1, ATM, CHEK2, BRIP1, PALB2; refs. 15–19), and common, low-penetrance risk alleles (20–24) for developing breast cancer (25). In short, enormous effort has been devoted to breast cancer classification based on somatic and germline mutation information, histopathologic markers, copy number, and expression profiles (9, 26, 27), all aimed at improving diagnostic, prognostic, and therapeutic stratification.

When massively parallel sequencing arrived in the late 2000s (28), sequencing speed increased by orders of magnitude, permitting access to large swathes of the human genome that had not previously been accessible at a reasonable cost. In a striking testament to this technology, five back-to-back breast cancer articles were published in 2012 (9–12), providing a thorough view of the molecular foundations of breast cancer and saturating driver discovery in coding sequences (29). Quite apart from the mere handful of driver mutations present in each tumor, modern sequencing technologies enabled us to access the many thousands of passenger mutations present in each cancer as well. Herein lies a significant realization: passenger mutations are not simply random manifestations or mutational debris. They represent the scars of biological processes that have gone awry during cancer development and are, therefore, a rich historical record of tumorigenesis (30).

The following model was previously proposed: At the point of a patient's cancer diagnosis, the set of somatic mutations revealed through sequencing of the tumor is the aggregate outcome of one or more mutational processes (30–32). Each process, defined by the mechanisms of DNA damage and DNA repair that constitute it, leaves a characteristic imprint or mutational signature on the cancer genome (Fig. 1). The final catalog of mutations is also determined by the intensity and duration of exposure to each mutational process (Fig. 1). Some processes may be weak or moderate in intensity, whereas others may be very strong. In addition, some exposures may be ongoing through the entire lifetime of the patient, even preceding the formation of the cancer, and some may commence late or become dominant later in tumorigenesis (Fig. 1). Furthermore, cancers comprise subclonal populations, which may be variably exposed to each mutational process (33, 34), adding to the complexity of the final landscape of somatic mutations in a cancer genome.
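One way to read this model is as a simple sum: the expected catalog is each signature weighted by the number of mutations its process contributed. The sketch below is purely illustrative. The process names, the four mutation types, and all numbers are hypothetical and are not taken from the cited studies.

```python
# Toy illustration: a tumor's mutation catalog as the exposure-weighted
# sum of signatures. All names and numbers are hypothetical.

# Each signature is a probability distribution over mutation types
# (only four illustrative types here, not a real classification).
signatures = {
    "process_A": [0.70, 0.10, 0.10, 0.10],
    "process_B": [0.05, 0.05, 0.80, 0.10],
}

# Exposure = number of mutations each process contributed to this tumor,
# reflecting both the intensity and the duration of the process.
exposures = {"process_A": 1000, "process_B": 4000}


def expected_catalog(signatures, exposures):
    """Return expected mutation counts per type, summed over processes."""
    n_types = len(next(iter(signatures.values())))
    catalog = [0.0] * n_types
    for name, profile in signatures.items():
        for i, probability in enumerate(profile):
            catalog[i] += exposures[name] * probability
    return catalog


catalog = expected_catalog(signatures, exposures)
print(catalog)
```

In this toy case the stronger process B dominates the third mutation type, whereas the weaker process A remains visible in the first. This is the sense in which the final catalog is a composite.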

Figure 1.

Somatic mutational processes in human cancer. Each mutational process leaves a characteristic imprint, or mutational signature, on the cancer genome, comprising DNA damage and DNA repair components. The arrows indicate the duration and intensity of exposure to a specific mutational process. The amount of exposure to each mutational process could vary from one person to another. Mutational processes A, B, C, and D represent hypothetical mutational processes that have occurred through the lifetime of the developing tumor. A could represent a normal mutational process that happens in all our cells (including normal cells), hence it occurs at a low level throughout life. B could represent a mutational process caused by an environmental insult, such as an occupational exposure to a carcinogen. C could represent a mutational process that occurs in bursts through tumorigenesis, such as intermittent exposure to a chemical or to an intermittent disease process. D could represent the acquisition of a defect in a gene involved in normal DNA repair. The final mutational portrait is a composite of all the mutational processes that have been active over the lifetime of the cancer patient. A different patient could have all of these mutational processes occurring in their tumor or could have some of the same mutational processes as well as other mutational processes present.

In 2012, the 183,016 substitutions present in 21 whole breast cancer genomes were used in a proof-of-principle exercise to demonstrate the existence of mutational signatures (30, 33). Critically, sequence context immediately 5′ and 3′ to each mutated base was taken into consideration in classifying each substitution. As there are six classes of base substitution and 16 possible sequence contexts for each mutated base (A, C, G, or T at the 5′ base and A, C, G, or T at the 3′ base), there are 96 possible mutated trinucleotide classes into which the substitutions of each tumor can be sorted. Various mathematical methods were explored and, finally, nonnegative matrix factorization was used to extract five substitution signatures present in these tumors (signatures A–E, now known as signatures 1B, 2, 3, 8, and 13; refs. 30, 33; Fig. 2).
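The 96-class scheme can be made concrete with a short sketch. The label format (for example, "T[C>T]A") and the function names are assumptions made for illustration. The only biology relied upon is that each substitution is described from the pyrimidine of the mutated base pair, which is what reduces twelve possible base changes to six classes.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Six substitution classes, written from the pyrimidine of the base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 classes x 4 possible 5' bases x 4 possible 3' bases = 96 channels.
# Within each class, channels are ordered by 5' base first, then 3' base.
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product(BASES, BASES)
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify(trinucleotide, alt):
    """Map a reference trinucleotide and mutant base to one of 96 channels."""
    ref = trinucleotide[1]
    if ref in "AG":
        # Purine reference base: describe the event on the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


print(len(CHANNELS))
print(classify("TCA", "T"), classify("TGA", "A"))
```

Here a C>T change in a TCA context and a G>A change in a TGA context fall into the same channel, because they are the same event seen from opposite strands. Counting the mutations of a tumor per channel gives the 96-element vector that serves as input to signature extraction.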

Figure 2.

Currently known extracted substitution mutational signatures in human breast cancers. A, Table of 12 mutational signatures extracted using nonnegative matrix factorization. Each signature is ordered by mutation class (C>A/G>T, C>G/G>C, C>T/G>A, T>A/A>T, T>C/A>G, T>G/A>C), taking immediate flanking sequence into account, resulting in 96 triplets. For each class, mutations are ordered by 5′ base (A, C, G, T) first, before 3′ base (A, C, G, T). Y-axis reports the probability of a signature generating each of the 96 triplets. Signature extraction was performed separately in 17 cancer types. The bars report the results of the extraction on the 560 breast cancers (37) using a widely available algorithm run with default parameters (38), and the error bars demonstrate the variability (of the presumptive same signatures) between cancers of different tissue types. The table also contains the associated etiologies of each signature, the prevalence of these signatures in breast cancer, and whether the signature is also seen in other tumor types. HR, homologous recombination. B, Absolute numbers of mutations of each signature in each sample (top) and proportion of each signature in each sample (bottom). Panel B reprinted by permission from Macmillan Publishers Ltd.: Nature 534:47–54, copyright 2016.

Subsequently, a methods article (35) and a landmark article (32) were published where this mathematical approach was applied across 30 cancer types involving 7,042 samples [507 whole-genome sequencing (WGS) and 6,535 whole-exome sequencing (WES)] and revealed 21 substitution signatures altogether (http://cancer.sanger.ac.uk/cosmic/signatures). The number of breast cancers available for analysis had increased considerably to 100 WGS and 800 WES tumors. Reassuringly, the same five substitution signatures that were recognized previously were consistently identified in this larger dataset (30, 32), reinforcing conviction in the concept of mutational signatures and in the methods applied to extract them.
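The factorization at the heart of this approach can be illustrated with a minimal sketch. It uses plain multiplicative updates for nonnegative matrix factorization on a toy matrix, which is a swapped-in textbook technique and not the published extraction framework. All names and numbers are hypothetical, and four mutation types stand in for the 96 channels.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factor nonnegative v (types x samples) into w (types x k) and h (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update exposures h with signatures w held fixed.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update signatures w with exposures h held fixed.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def relative_error(v, w, h):
    """Frobenius norm of (v - w h) relative to the norm of v."""
    wh = matmul(w, h)
    num = sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))
    den = sum(a ** 2 for rv in v for a in rv)
    return (num / den) ** 0.5


# Toy catalog: two hypothetical signatures mixed across six samples.
true_w = [[0.70, 0.05], [0.10, 0.05], [0.10, 0.80], [0.10, 0.10]]
true_h = [[100, 20, 300, 50, 10, 200], [10, 400, 30, 250, 500, 60]]
v = matmul(true_w, true_h)

w, h = nmf(v, k=2)
print(round(relative_error(v, w, h), 4))
```

The columns of w play the role of signatures and the rows of h the per-sample exposures. In practice the number of signatures k has to be chosen, and different choices or algorithms can split or merge closely correlated signatures.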

In a recent endeavor exploring 560 WGS breast tumors (36), the largest cohort of WGS cancers of a single tissue type to date, a total of 12 substitution signatures were identified from 3,479,652 mutations (Fig. 2A). This may superficially appear to be a substantial surge in signature discovery in breast tumors. On close inspection, however, many of the new signatures are relatively rare and present in few samples (36). Thus, in a similar paradigm to that of drivers, we have likely saturated the discovery of high-frequency, common mutational signatures in breast cancer, and sequencing further primary breast tumors is unlikely to yield new, major signatures. The increase in power may instead permit disambiguation of closely correlated signatures: signatures 1 and 5, hitherto classified as signature 1B, were only just separated by this analysis. Many different algorithms are available today for mutational signature extraction (37–41). Some may reveal 11 (with signature 1B) or 12 signatures (signatures 1 and 5) from this dataset or, with more relaxation of parameters, even 13 (Supplementary Data). Regardless, that five signatures were consistently seen when as few as 21 samples were studied reveals that these early signatures are robust and common, and report ubiquitously present mutagenic processes in breast cells.

Of the 12 signatures now documented in breast cancer (ref. 36; Fig. 2A), signature 1B or signatures 1 and 5 are associated with age of diagnosis; signatures 2 and 13 are associated with the activity of the APOBEC cytidine deaminases; signature 3 is associated with BRCA1/BRCA2 deficiency; signature 8 appears to be increased in tumors with BRCA1/BRCA2 deficiency, although also present at lower levels in other tumors; signatures 6, 20, and 26 are associated with mismatch repair deficiency; and signatures 17, 18, and 30 are of unknown etiology (36). Of note, these mutational signatures do not appear to demonstrate specificity to breast cancer subtype whether classified by estrogen receptor (ER) status or other systems such as PAM50 or AIMS.

Most breast tumors have fewer than 20,000 substitutions in total (fewer than 6.5 mutations per Mb; Fig. 2B). Only a handful of samples have a very large number of mutations (up to 94,000 substitutions; Fig. 2B). Irrespective of mutation burden, the vast majority of samples comprise multiple mutational signatures (36). A subset of samples may be composed predominantly of specific signatures, and may even be overwhelmed by a very large number of mutations from these signatures and termed “hypermutators” (42). This trait is associated with certain mutational processes: signatures 2, 13, 6, 20, 26, and 17 in breast cancers (36). Indeed, some of these signatures (signatures 8, 13, and 17) appear to dominate later in breast tumorigenesis (43), being observed late in cancer evolution (33, 36) and in metastatic disease (44). Perhaps, in time, these associations will be definitively verified as harbingers of poorer outcomes.

It was previously observed that substitution signatures had particular relationships with classes of indels (30). Patients with germline BRCA1/BRCA2 mutations exhibited an excess of larger indels (>3 bp) with microhomology present at breakpoint junctions (30). Moreover, tumors with signature 6, 20, or 26, which are associated with mismatch repair deficiency, have a large number of indels at polynucleotide repeat tracts, consistent with a label of microsatellite instability in these cancers (32, 36). Thus, correlations are observable between substitution signatures and crude indel patterns.

Mutational processes in human somatic cells are not restricted to producing base substitutions. Indeed, DNA damage and DNA repair processes can generate patterns of indels and large-scale chromosomal aberrations or structural variation as well (31). Thus, the basic premise of mutational signatures was recently extended to structural variation in breast cancer (36).

Genomic instability is a broad concept that encompasses a wide range of chromosomal level abnormalities. Some tumors have a large number of rearrangements (several hundred) that are focused or “clustered” at specific loci reporting driver amplicons (e.g., CCND1, ERBB2) or are simply sites of chromothripsis (45), for example. In contrast, other tumors could have an equivalent number of rearrangements but have them widely distributed throughout the genome instead. Intuitively, different mutational processes are likely to underpin these disparate genomic outcomes (36).

Rearrangements were thus separated according to whether they were clustered or dispersed (Fig. 3A), and then by rearrangement class (tandem duplication, deletion, inversion, or translocation; Fig. 3B) and by size (36). Following this classification, we applied the same mathematical framework, as described previously, and extracted six rearrangement signatures (RS; ref. 36; Fig. 3C). This exercise of defining rearrangement signatures was not simply academic—unsupervised hierarchical clustering yielded seven major subgroups (groups A–G) that exhibited distinct associations with other genomic, histologic, gene expression, and clinical features (ref. 36; Fig. 4).
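The classification steps described above amount to assigning each rearrangement a categorical label before signature extraction. The sketch below is one hypothetical encoding. The size-bin boundaries are illustrative assumptions consistent with the <10 kb and >100 kb thresholds mentioned in this review, and the clustered/dispersed call is taken as an input because the procedure for making it is not described here.

```python
from collections import Counter

CLASSES_WITH_SIZE = ("tandem_duplication", "deletion", "inversion")

# Upper bounds in base pairs; bin boundaries are illustrative assumptions.
SIZE_UPPER_BOUNDS = [
    (10_000, "<10kb"),
    (100_000, "10-100kb"),
    (1_000_000, "100kb-1Mb"),
    (10_000_000, "1-10Mb"),
]


def size_bin(size_bp):
    """Return the label of the size bin that contains size_bp."""
    for upper, label in SIZE_UPPER_BOUNDS:
        if size_bp < upper:
            return label
    return ">10Mb"


def rearrangement_channel(sv_class, clustered, size_bp=None):
    """Label a rearrangement by distribution, class, and (if defined) size."""
    distribution = "clustered" if clustered else "dispersed"
    if sv_class == "translocation":
        # Interchromosomal events have no meaningful size.
        return f"{distribution}:translocation"
    if sv_class not in CLASSES_WITH_SIZE:
        raise ValueError(f"unknown rearrangement class: {sv_class}")
    return f"{distribution}:{sv_class}:{size_bin(size_bp)}"


# Hypothetical rearrangements from one tumor: (class, clustered?, size in bp).
events = [
    ("tandem_duplication", False, 250_000),
    ("tandem_duplication", False, 400_000),
    ("deletion", False, 5_000),
    ("translocation", True, None),
]
catalog = Counter(rearrangement_channel(*event) for event in events)
print(catalog)
```

With three sized classes in five bins plus translocations, each split into clustered and dispersed, this encoding yields 32 possible labels. The resulting per-tumor counts can then be factorized in the same way as substitution catalogs.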

Figure 3.

Extracting rearrangement mutational signatures in human breast cancers. A, Whole genome Circos plots were adapted from the R Circos package. Features depicted in Circos plots from outermost rings heading inwards: Karyotypic ideogram outermost. Base substitutions next, plotted as rainfall plots (log10 intermutation distance on radial axis, dot colors: blue = C>A; black = C>G; red = C>T; gray = T>A; green = T>C; pink = T>G). Ring with short green lines = insertions; ring with short red lines = deletions. Major copy number allele ring (green = gain), minor copy number allele ring (pink = loss); central lines represent rearrangements (green = tandem duplications; pink = deletions; blue = inversions; and gray = interchromosomal events). Note the difference in the nature of the distribution of rearrangements between the two tumors depicted. The whole genome profile on the left has >300 rearrangements that are clustered at distinct loci in specific chromosomes. In contrast, the >300 rearrangements present in the profile on the right-hand side are uniformly dispersed through the genome. The mutational processes underpinning the differing distributions in these two tumors are most likely to be different. Thus, separating rearrangements into whether they are clustered or dispersed represents a first step in the rearrangement classification. B, Types of rearrangements that can be ascertained easily. The hypothetical pieces of reference DNA from two different chromosomes on the left can be rearranged to form four main classes of rearrangements, as shown on the right. This is a second step in the classification of rearrangements prior to rearrangement signature extraction. The rearrangements are also divided by size before extraction. C, Six rearrangement signatures extracted using nonnegative matrix factorization. Probability of rearrangement element on y-axis. Rearrangement size on x-axis. 
Chr, chromosome; Del, deletion; Inv, inversion; Tds, tandem duplication; Trans, translocation.

Figure 4.

The spectrum of signatures within 560 breast cancers and individual patient whole genome profiles. The panels in the middle represent, from top to bottom: BRCA1- or BRCA2-null samples (dark purple) versus what are believed to be non-BRCA1/BRCA2–mutated samples (light purple), ER status (black = positive; gray = negative), proportions of substitution signatures, rearrangement signatures, and indel patterns present in the 560 patients. Figure legends are provided at the top of the figure. Samples are ordered according to hierarchical clustering performed on rearrangement mutational signatures. Six whole genome profiles of individual patients are shown to demonstrate how individualized each cancer genome is per patient. Note the striking differences between the six patients, even within the same “group” (groups B and G). Group D is enriched with BRCA1-null tumors, group G is enriched with BRCA2-null tumors, and group F is enriched with tumors that are never genetically BRCA1 null but are BRCA-like (although different). Ins, insertions; Mh, microhomology mediated; Rep, polynucleotide repeat-tract mediated.

Three of the signatures are featured in homologous recombination (HR)–deficient tumors. RS1, dominated by long (>100 kb) tandem duplications, characterized many HR-deficient tumors and defined group F tumors, which were associated with older age at diagnosis and poorer outcome in this small cohort. RS3, characterized by short (<10 kb) tandem duplications, was specific to BRCA1-mutant tumors (group D). RS5, defined by deletions (<10 kb), was present in both BRCA1- and BRCA2-deficient samples and typified group G BRCA2-mutated samples (36). Hence, we were able to differentiate BRCA1- from BRCA2-null tumors, as well as a BRCA-like (but different) cohort with distinct clinical features (36). These diverse groups would have simply been labeled as having “genomic instability” in the past and been indistinguishable (Fig. 4).

Of the remaining rearrangement signatures, RS2, characterized by large (>100 kb) nonclustered deletions, inversions, and interchromosomal translocations, defined group E ER-positive tumors (36). In contrast, RS4 and RS6 were both characterized by clustered rearrangements and were enriched in groups A, B, and C, which were of mixed ER status but frequently had large driver amplicons, for example, ERBB2 and CCND1 (36).

Remarkably, deep analysis of individual rearrangement signatures has unearthed a novel, if somewhat disturbing, biological insight. Very recently, 33 loci were identified as sites that are rearranged by long RS1 tandem duplications more frequently than expected in independent tumors from different patients, even if by only a single tandem duplication (46). Interestingly, these hotspots are enriched for breast cancer germline susceptibility loci, breast-specific super-enhancer regulatory elements, and oncogenes (46). These loci have high transcriptional activity in breast tissue and are susceptible to double-strand break (DSB) damage and, following DSB repair, to formation of rearrangements. Yet, not all classes of rearrangements are represented at these sites—only long RS1 tandem duplications. It was hypothesized that long tandem duplications are more likely to effectively increase whole copies of these regulatory elements/genes, and that this could confer some degree of secondary selective pressure, even if incrementally (46). Indeed, corroborative transcriptomic evidence was observed to support this postulate, providing a devastating insight into this mutational process of HR deficiency: It may commence as a passenger mutational signature but, unwittingly, creates secondary driver events. RS1 is, therefore, a particularly deleterious genetic mechanism—an injurious mutational signature that perpetuates carcinogenesis (46).

The field of rearrangement signatures may only be in its infancy, but several important insights are already emerging, although their clinical significance requires further evaluation. An exciting future awaits as the field matures and other tissue types are incorporated into these analyses.

The substitution signatures described thus far report mutagenesis distributed throughout the human genome. Intriguingly, localized mutagenesis has also been reported (30). By calculating an intermutation distance, or the distance from a substitution to the one immediately preceding it in the reference genome, we were able to appreciate focal substitution hypermutation (30). Although most mutations in a cancer genome would exhibit an intermutation distance of approximately 10^5 bp to approximately 10^6 bp, localized regions of hypermutation or “kataegis” presented as clusters of substitutions with shorter intermutation distances (defined as six or more substitutions with an average intermutation distance of <1,000 bp; refs. 30, 32, 36). These focal mutation showers had striking characteristics: an excess of cytosine mutations at a TpC sequence context, and colocalization with a different class of mutation altogether, rearrangements.
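The working definition of kataegis given above (six or more substitutions with an average intermutation distance below 1,000 bp) can be turned into a simple scan. The sketch below is a greedy heuristic written for illustration and is not necessarily the procedure used in the cited studies. Function names and the example coordinates are hypothetical, and positions are assumed to lie on a single chromosome.

```python
def intermutation_distances(positions):
    """Distance from each substitution to the one preceding it."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def find_kataegis(positions, min_mutations=6, max_mean_distance=1000):
    """Return (start, end, n_mutations) for each putative kataegis focus."""
    pos = sorted(positions)
    foci = []
    i = 0
    while i < len(pos):
        best = None
        # Look for the longest run starting at i that meets the definition.
        for j in range(i + min_mutations - 1, len(pos)):
            mean_distance = (pos[j] - pos[i]) / (j - i)
            if mean_distance < max_mean_distance:
                best = j
        if best is None:
            i += 1
        else:
            foci.append((pos[i], pos[best], best - i + 1))
            i = best + 1
    return foci


# Hypothetical chromosome: sparse background plus one dense shower.
background = [100_000, 250_000, 400_000, 2_000_000, 3_500_000]
shower = [1_000_000 + 200 * n for n in range(7)]
print(find_kataegis(background + shower))
```

Because the criterion is an average, a very dense run can absorb a more distant neighboring mutation, so a production analysis would need a more careful segmentation. The same intermutation distances, plotted on a log10 scale against genomic order, give the rainfall plots referred to in Fig. 3A.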

Kataegis mutations bear a strong resemblance to those of genome-wide signatures 2 and 13, which are associated with APOBEC enzymatic activity (47–49). APOBECs are a family of cytidine deaminases that evolved to restrict retroviruses and retrotransposon elements. APOBECs require single-stranded DNA (ssDNA) as a substrate for deamination of cytosine to uracil. Notably, experimental studies in yeast suggest that DSBs and end resection are a source of ssDNA required for APOBECs to generate kataegis (47). In contrast, alternative cellular processes such as replication or transcription have been hypothesized as a potential fount of ssDNA for APOBEC activity–generating signatures 2 and 13 (31). Thus, although APOBEC enzymes are involved in kataegis, and genome-wide signatures 2 and 13, they are believed to be mechanistically distinct mutational processes likely arising at different instances of cellular stress (Fig. 5).

Figure 5.

Mechanistic insights from mutagenesis: the APOBEC family of enzymes in genome-wide (signatures 2 and 13) and localized mutational signatures (kataegis). On the basis of the predominant cytosine mutagenesis at a TpC sequence context, the APOBEC family of enzymes has been implicated in causing both localized kataegis and genome-wide signatures 2 and 13. A, APOBECs cause DNA damage, particularly on ssDNA, by deaminating cytosine into uracil. Uracil-N-glycosylase (UNG) first removes uracil before other components of the Base Excision Repair pathway restore the damaged DNA to its original state. If DNA is uncorrected and enters replication as uracil or an abasic site, then the possibilities are of generating C>T transition or C>G and C>A transversion mutations. B, Although APOBECs are involved in both localized and genome-wide mutagenesis, there is mounting experimental and analytic evidence to support the hypotheses that these signatures arise by different mechanisms. Kataegis is believed to require a DSB to arise first, before end resection of the DSB leaves ssDNA exposed for APOBEC deamination (left). In contrast, APOBEC deamination that gives rise to signatures 2 and 13 requires long stretches of ssDNA that could occur during uncoupling of the leading and lagging replication strands (right).

Interestingly, an alternative form of kataegis was also observed, although rarely (0.9% of all kataegis foci identified in breast cancer; ref. 36). Also colocalizing with rearrangements, this version of kataegis exhibited a different base substitution pattern of T>G and T>C mutations predominantly at NTT and NTA sequences. The etiology of this form of kataegis is unknown.

The distribution of somatic mutations through cancer genomes is uneven. It has been extensively studied and found to be largely influenced by replication time domains and histone epigenetic marks (50, 51). Predicated on being able to probabilistically assign every mutation in a cancer to a mutational signature, similar analyses have now been performed per mutational signature (36). Because mutational signatures are proxies for specific biological processes, the advantage of performing these analyses by mutational signature is that one can interpret the influence of dynamic cellular events, such as replication, transcription, and nucleosome occupancy, on the associated biological processes (ref. 36; Table 1).
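One common way to express such a probabilistic assignment is through Bayes' rule: the probability that a mutation of a given type arose from a signature is proportional to that signature's exposure in the tumor multiplied by the probability of the signature generating that mutation type. The sketch below illustrates this formulation only. It is an assumption that it resembles the cited analysis, and the signature names and numbers are hypothetical excerpts rather than real profiles.

```python
def assignment_probabilities(mutation_type, signatures, exposures):
    """Posterior probability of each signature given one mutation's type."""
    weights = {
        name: exposures[name] * profile[mutation_type]
        for name, profile in signatures.items()
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no active signature can generate this mutation type")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical excerpts of two signature profiles (two channels shown only).
signatures = {
    "sig_X": {"T[C>T]A": 0.30, "A[C>T]G": 0.02},
    "sig_Y": {"T[C>T]A": 0.01, "A[C>T]G": 0.20},
}
# Hypothetical number of mutations attributed to each signature in a tumor.
exposures = {"sig_X": 2000, "sig_Y": 1000}

for channel in ("T[C>T]A", "A[C>T]G"):
    print(channel, assignment_probabilities(channel, signatures, exposures))
```

Once each mutation carries such probabilities, genomic features such as replication timing or strand can be summarized per signature, for example by weighting each mutation by its assignment probability.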

Table 1.

Summary of relationships between each mutational signature and various genomic features. The 20 mutational signatures are noted in the left-most column. This is followed by information on mutation classes, features that predominantly characterize each signature, and associated etiologies, if known. Relationships to transcriptional strand, replicative strand, replication time, and chromatin organization are also noted.

| Mutational signature | Mutation type | Predominant features of signature | Associated mutational process | Transcriptional strand | Replicative strand | Replication time | Chromatin organization |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Sub | C>T at CpG | Deamination of methyl-cytosine (age associated) | | Some bias | Enriched late | |
| 5 | Sub | T>C | Uncertain (age associated) | Some bias | Some bias | Enriched late | Slight enrichment at linker |
| 2 | Sub | C>T at TpCpN | APOBEC related | Some bias | Strong lagging strand bias | Enriched late | |
| 13 | Sub | C>G at TpCpN | APOBEC related | Some bias | Strong lagging strand bias | Flat | |
| 6 | Sub | C>T (and C>A and T>C) | MMR deficient | | Some bias | Flat | |
| 20 | Sub | C>A (and C>T and T>C) | MMR deficient | | Some bias | Enriched late | |
| 26 | Sub | T>C | MMR deficient | Some bias | Strong bias | Enriched late | Enriched at linker |
| 3 | Sub | | HR deficient | Some bias | Some bias | Enriched late | |
| 8 | Sub | C>A | Amplified by HR deficiency? | Some bias | | Enriched late | |
| 18 | Sub | C>A | Uncertain | Some bias | Some bias | Enriched late | Enriched at nucleosomes and periodic |
| 17 | Sub | T>G | Uncertain | | Some bias | Enriched late | Enriched at nucleosomes and periodic |
| 30 | Sub | C>T | Uncertain | | | Flat | |
| RS1 | Rearr | Large tandem duplications (>100 kb) | Uncertain type of HR deficiency? | NA | NA | Enriched early | |
| RS2 | Rearr | Dispersed translocations | | NA | NA | Enriched early | |
| RS3 | Rearr | Small tandem duplications (<10 kb) | HR deficiency (BRCA1) | NA | NA | Enriched early | |
| RS4 | Rearr | Clustered translocations | | NA | NA | Enriched early | |
| RS5 | Rearr | Deletions | HR deficient | NA | NA | Enriched early | |
| RS6 | Rearr | Other clustered rearrangements | | NA | NA | Enriched early | |
| Repeat-med | Indel | <3 bp indel at polynuc tract | MMR deficient | NA | NA | Enriched late | Enriched at linker and periodic |
| Microhom | Indel | ≥3 bp indel with MMEJ-junctions | HR deficient | NA | NA | Enriched late | |

Abbreviations: MMEJ, microhomology-mediated end joining; NA, not available; polynuc, polynucleotide; Rearr, rearrangement; Sub, substitution.

Reprinted by permission from Macmillan Publishers Ltd.: Nature Communications 7:11383, copyright 2016.

For example, one of the most noteworthy insights obtained from this analysis was the degree of asymmetry observed between replication strands for particular signatures. For approximately 100,000 mutations on the leading replicative strand, approximately 140,000 mutations were observed on the lagging strand specifically for APOBEC-related signatures 2 and 13 (36). This level of asymmetry implies that replication has a mechanistic role in the generation of signatures 2 and 13 (Fig. 5). APOBECs demand ssDNA as a deamination substrate, and replication is a perfect physiologic source of ssDNA. Indeed, in 2016, four other publications supported this observation through in vivo (52, 53) and in vitro (54, 55) studies. Replication strand asymmetry was also observed for signature 26 (36), one of the four mutational signatures associated with deficiency of mismatch repair. Had these analyses been performed on all mutations combined, the specific behaviors (Table 1) would not have been appreciable—the signal diluted by aggregation. Thus, these vignettes demonstrate the value of performing analyses as mutational signatures.
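The arithmetic behind this asymmetry, and the way aggregation dilutes it, can be sketched as follows. The leading and lagging counts are the approximate figures quoted above; the symmetric background of other mutations is a hypothetical number chosen only to illustrate dilution, and the z-score rests on the simplifying assumptions that mutations are independent and that equal amounts of leading- and lagging-strand template are available.

```python
import math


def strand_asymmetry(leading, lagging):
    """Return (lagging/leading ratio, lagging fraction, z-score against a 50:50 null)."""
    total = leading + lagging
    ratio = lagging / leading
    lagging_fraction = lagging / total
    # Normal approximation to a binomial with p = 0.5.
    z_score = (lagging - 0.5 * total) / math.sqrt(0.25 * total)
    return ratio, lagging_fraction, z_score


# Approximate counts quoted in the text for signatures 2 and 13.
apobec = strand_asymmetry(leading=100_000, lagging=140_000)

# Hypothetical strand-symmetric background contributed by all other signatures.
background_per_strand = 1_000_000
pooled = strand_asymmetry(
    leading=100_000 + background_per_strand,
    lagging=140_000 + background_per_strand,
)

print(f"APOBEC signatures only: lagging/leading = {apobec[0]:.2f}")
print(f"All mutations pooled:   lagging/leading = {pooled[0]:.2f}")
```

Under these assumptions, the ratio of 1.4 seen for the APOBEC-related signatures shrinks toward 1 once strand-symmetric mutations are pooled in, which is the dilution the paragraph describes.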

Ultimately, a profound theme has crystallized. Different signatures exhibit different relationships with replication, transcription, and chromatin organization, fortifying how mutational signatures must be true biological phenomena and are not simply theoretical, mathematical constructs.

Some mutational signatures are a direct pathophysiologic read-out of the abrogation of a DNA repair gene/pathway and could be used as a biomarker to report DNA repair deficiency in a tumor (31, 56). Somatic nullness of a single gene, such as BRCA1, however, does not simply produce one mutational signature; it produces a multitude of mutational patterns (36). On the one hand, this complicates an already burdened mutational landscape. On the other hand, this could be used to our advantage for potential clinical applications.

Very recently, a supervised Lasso logistic regression model was used to learn the multiple substitution, indel, and rearrangement mutational signatures that distinguish germline BRCA1/BRCA2–mutated cancers from sporadic tumors (57). Six mutational patterns were found to be discriminatory and were weighted to create a mutational signature–based predictor of BRCA1/BRCA2 deficiency called HRDetect (57).
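How a weighted, logistic predictor of this kind turns several mutational patterns into a single probability can be sketched as below. This is not the published HRDetect model: the pattern names, weights, and intercept are hypothetical placeholders, the feature values are assumed to be already normalized, and `hr_deficiency_score` is an illustrative function name. The zero weight stands in for the effect of a lasso penalty, which shrinks uninformative coefficients to exactly zero.

```python
import math


def hr_deficiency_score(features, weights, intercept):
    """Logistic model: probability = sigmoid(intercept + sum(weight * feature))."""
    linear = intercept + sum(
        weights[name] * features.get(name, 0.0) for name in weights
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients; pattern_3 has been dropped (weight 0) by the lasso penalty.
weights = {"pattern_1": 2.0, "pattern_2": 1.5, "pattern_3": 0.0, "pattern_4": 0.8}
intercept = -3.0

# Hypothetical normalized feature values for two tumors.
tumor_low = {"pattern_1": 0.1, "pattern_2": 0.0, "pattern_3": 2.0, "pattern_4": 0.2}
tumor_high = {"pattern_1": 1.8, "pattern_2": 1.2, "pattern_3": 0.0, "pattern_4": 1.0}

score_low = hr_deficiency_score(tumor_low, weights, intercept)
score_high = hr_deficiency_score(tumor_high, weights, intercept)
print(f"tumor_low:  {score_low:.3f}")
print(f"tumor_high: {score_high:.3f}")
```

Because the score combines several patterns, a tumor can still score highly when one pattern is weak or noisy, provided the others are present.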

HRDetect (AUC = 0.98) outperforms both customary copy number–based approaches (e.g., HRD index; refs. 58–60) and any individual signature on its own for detecting BRCA1/BRCA2 deficiency. This is unsurprising, as a predictor that draws on a combination of many signatures would be expected to be more sensitive and specific than a predictor that depends on only a single signal (57). Thus, HRDetect works extraordinarily well even in situations of reduced mutation information secondary to low tumor cellularity, low sequencing depth (e.g., low-coverage WGS of ∼10-fold rather than 30-fold), or increased noise (e.g., in cancer specimens that have artefactual genetic changes arising from formalin fixation; ref. 57). This observation could have immediate potential applications.
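For readers less familiar with the AUC statistic quoted here, it is the probability that a randomly chosen deficient tumor receives a higher score than a randomly chosen proficient tumor. A minimal sketch of that calculation follows; the scores are hypothetical toy values, not HRDetect output, and `auc` is an illustrative helper.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for positive in positive_scores:
        for negative in negative_scores:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predictor scores.
deficient = [0.9, 0.8, 0.4]         # tumors known to be BRCA1/BRCA2 deficient
proficient = [0.7, 0.3, 0.2, 0.1]   # tumors known to be proficient

toy_auc = auc(deficient, proficient)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 1.0 means every deficient tumor outscores every proficient one, and 0.5 corresponds to chance.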

Of particular clinical importance, HRDetect revealed a larger proportion of patients with BRCA1/BRCA2 deficiency than expected: up to 22%, many more than the 3.9% of patients knowingly recruited to the study as germline mutation carriers (57). More than half of these tumors would not have been detected as BRCA1/BRCA2 null using targeted sequencing of these genes alone. BRCA1/BRCA2–null tumors are selectively sensitive to compounds such as PARP inhibitors (61–64), which are currently reserved, in theory, for the approximately 1% to 5% of patients who are germline mutation carriers. Profoundly, if in fact one in every five breast cancer patients has the equivalent of a BRCA1/BRCA2–null tumor, could they be similarly selectively sensitive to PARP inhibitors? This is unknown, and it is now necessary to embark on experiments and/or clinical trials to seek conclusive evidence. The message to the community is this: We need clinical trials of drugs like PARP inhibitors that are not restricted to germline mutation carriers and that extend to sporadic breast and possibly other tumors in the general populace.

Beyond driver mutations, mutational signatures could contribute a powerful, additional spoke in the wheel of cancer diagnostics and therapeutic stratification. There is likely to be scope for identifying other pathophysiologic processes with sensitivities to different therapies (e.g., replication stress with WEE1/ATR inhibitors or perhaps stratifying sensitivity to immunotherapies; refs. 65, 66). The academic abstraction of mutational signatures takes a step closer toward the clinic.

Although it is a fast-paced and exciting field, the mutational signatures model does warrant critical scrutiny. No matter how sophisticated the analyses of in vivo mutagenesis, there are limitations to studying tumors—it is an uncontrolled and noisy system, and even the best clinical metadata collections will, at best, provide associations.

First, we acknowledge that the model requires validation

Experiments that show how different signatures can be generated by different exposures will contribute toward reinforcing the concept. The field of environmental mutagenesis (67–71) will argue that historic TP53 and HPRT reporter assays, and experiments exposing mouse embryonic fibroblasts to external exposures (34), such as ultraviolet light and tobacco carcinogens, already provide evidence that mutations generated through exogenous exposures generate mutation patterns that are similar to those observed in human cancers. However, there have been limited efforts to demonstrate similarly clear relationships for endogenous mutational processes. Perhaps, systematic surveys of mutational signatures of DNA-damaging agents and from abrogation of DNA repair genes will be required to truly convince the scientific community that mutational signatures observed in human cancers arise from both external and internal sources of DNA damage and DNA repair. Experimental evidence showing that the amount of exposure (whether to a chemical compound or endogenous exposure) is correlated with the degree of mutagenesis will also help to strengthen conviction in this model. The final demonstration of being able to turn on a signature (through gene knockout) and turn it off again (through reversing the mutation) could definitively authenticate this model.

Second, what is the mathematical rigor of this concept?

The principle of factorizing or reducing a complex, multidimensional dataset into simpler parts is not unusual. Multiple different mathematical methods have been developed for precisely this purpose (37–41). Although showing striking similarity, the results obtained through these different methods are not identical (Supplementary Data, Supplementary Figs. S1–S3). This has raised concerns regarding reproducibility.
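As an illustration of the factorization principle, the sketch below decomposes a small mutation catalog V (channels × samples) into signatures W and exposures H using nonnegative matrix factorization with multiplicative updates. This is a generic textbook method chosen for brevity, not the implementation of any tool cited above; the catalog is a hypothetical toy built from two made-up signatures, and all function names are illustrative.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius distance between V and its reconstruction W x H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize V (channels x samples) into W (channels x k) and H (k x samples)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Update exposures: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update signatures: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Hypothetical catalog: 4 mutation channels (rows) x 4 tumors (columns).
catalog = [
    [70, 0, 35, 56],
    [30, 0, 15, 24],
    [0, 40, 20, 8],
    [0, 60, 30, 12],
]

w_start, h_start = nmf(catalog, k=2, iterations=0)
w_fit, h_fit = nmf(catalog, k=2, iterations=200)
print("error before fitting:", round(squared_error(catalog, w_start, h_start), 2))
print("error after fitting: ", round(squared_error(catalog, w_fit, h_fit), 2))
```

Because the optimization is not convex, different starting points, stopping rules, or algorithms can return similar but not identical factors, which is one concrete source of the reproducibility concern raised here.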

There are signatures that are staunchly similar, for example, the signatures related to the activity of APOBEC enzymes (signatures 2 and 13), which are pervasive, robust across algorithms (Supplementary Data, Supplementary Fig. S3), and undisputed as mutational signatures in human tumors. Related signatures are admittedly sometimes less clearly distinguishable (Supplementary Note, Supplementary Fig. S3). Various post hoc processing methods are reported to be used to tease these apart, and these do result in differences in the final extracted signatures. For example, signatures 6, 20, and 26 (all related to mismatch repair deficiency) are historically more difficult to disentangle because of commonalities in their 96-element profile. That they are more challenging to disambiguate from one another does not mean that they do not exist, of course, and may even reflect biological interactions between them.

The assignment of the amount of each signature present in individual tumors is also a source of variation between algorithms. Invariably, these algorithms assign a small proportion of every signature to every sample examined. This is unlikely to be biologically true, so penalties may be, or have been, introduced to increase the “sparsity” of mutation assignments, resulting in variation in final signature contributions to individual samples. Thus, mutational signature extraction and the assignments of these signatures are currently not fully deterministic. Of course, a balance needs to be struck between precise signature analysis and overfitting of data through post hoc processing.
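One simple way to picture a sparsity step is a threshold that removes signatures contributing less than a minimum fraction of a sample's mutations and redistributes those mutations among the signatures that remain. The sketch below is a deliberately crude stand-in for the penalties mentioned above: published tools generally refit rather than rescale, and the exposures, the thresholds, and the name `sparsify_exposures` are all hypothetical.

```python
def sparsify_exposures(exposures, min_fraction=0.05):
    """Zero out signatures below `min_fraction` of the total and rescale the rest.

    The sample's total mutation count is preserved.
    """
    total = sum(exposures.values())
    if total <= 0:
        raise ValueError("exposures must sum to a positive number")
    kept = {name: count for name, count in exposures.items()
            if count / total >= min_fraction}
    if not kept:
        raise ValueError("threshold removes every signature")
    kept_total = sum(kept.values())
    return {
        name: (count * total / kept_total if name in kept else 0.0)
        for name, count in exposures.items()
    }


# Hypothetical raw assignment for one tumor with 1,000 mutations.
raw = {"sig_A": 600, "sig_B": 380, "sig_C": 15, "sig_D": 5}

strict = sparsify_exposures(raw, min_fraction=0.05)
lenient = sparsify_exposures(raw, min_fraction=0.01)
print("strict: ", strict)
print("lenient:", lenient)
```

The two thresholds report different sets of signatures for the same tumor, which illustrates how a post hoc choice of this kind feeds directly into the final result.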

Another potential source of muddied results comes from pooling of data across tumor types. At a first approximation, increasing the size of a cohort would provide greater statistical power for analysis. However, mixing of tumor types, particularly if they are not of equivalent numbers (e.g., 500 breast cancers with 25 leukemias) and have differing mutational burdens, could result in signal dilution or interference. This can be difficult to disentangle; therefore, pooled analyses should be undertaken with a very clear declaration of methods, including what post hoc processing steps are used. In the community, it remains controversial whether pooled analyses should be used. Such analyses imply that we expect to extract mutational signatures that are identical across all tissues. This may be true for some signatures, but not for all. There is no reason to expect that genes involved in HR repair perform precisely the same functions at the same time in the cell cycle to the same degree, in breast tissue as well as in colonic tissue. Indeed, the likelihood is that they almost certainly do not.

Even when studying a specific tissue type such as breast cancer, we acknowledge that there are genuine biological differences between cohorts of samples (see Supplementary Figs. S1–S3 for comparison of two cohorts). Rare signatures present in only 1% to 2% of tumors may not have been detected previously because they were simply not present in any prior dataset examined (Supplementary Figs. S4–S5). Thus, getting a different result such as a novel mutational signature in a new dataset may be a genuine new finding, provided, of course, that many of the canonical signatures are also detected.

It is very likely that mutational signatures in human tumors do indeed exist, but how analyses are performed could affect the results of a signature extraction. Therefore, for any given analysis, it is vital to report how it was performed with absolute clarity, and for reviewers to critically assess whether the method applied is appropriate to the biological question being asked. What is described in this review is what we have seen in breast cancers to date, although the possibility of change is there. Mathematical extractions of mutational signatures have their limitations and should not be considered as deterministic. Intertissue variation is expected (Supplementary Table S1). Perhaps one way of presenting data is that of an average signal with error bars indicative of the intertissue variation (Fig. 2A).

Thus, there is variability in mathematical extraction of signatures depending on the algorithm used, on how data are used (whether analyzed as a pool of multiple tumor types or analyzed as separate tumor types), and even on whether the data are derived from whole genome or from exome sequencing experiments. How best to handle these issues remains uncertain and will likely be resolved in time.

Today, we can demonstrate and quantify mutational signatures in breast and other cancers; we can gain novel biological insights and potentially exploit signature properties for clinical applications. As noted above, some thoughtfulness is still required in the interpretation of any cancer-based analysis, and experimental work remains the bastion for substantiating proposed etiologies or mechanisms underpinning mutational signatures.

Notwithstanding, we are able to thoroughly profile cancer genomes per patient (Fig. 4 shows six strikingly different whole genome profiles). Soberingly, for the approximately 700 WGS and approximately 1,500 WES breast cancers that have already been scrutinized, no two patients shared the same set of drivers or the same quantities of signatures (Fig. 4). Personalized genomics is, therefore, not an option for us to debate; it is a fact of life and a challenge we must embrace.

Applying comprehensive genomic approaches judiciously (72), particularly within the context of clinical trials, could prove to be most rewarding. If we had access to informative cohorts with outcome data available, this would indeed help to accelerate translation into the clinic (72, 73).

Such efforts should also go beyond resequencing primary cancers. Precursors of breast cancer such as ductal carcinoma in situ (DCIS) and metastatic lesions (66) should be targeted for similarly detailed levels of driver and mutational signature investigation. Likewise, tumors separated temporally and spatially in individual patients could provide useful perspectives on tumorigenesis. There also remains more to explore through integration with other modalities, such as expression (74) and methylation, and assessments of surrounding tissue microenvironment.

Last but not least, the insights on mutational signatures have only transpired because data generated through many sequencing studies, from many academic and clinical centers, have been shared with the wider community. Thus, any future sequencing endeavor, be it within a clinical trial or otherwise, should be committed to data sharing. This is because the opportunity to learn new things from data resources not just immediately, but subsequently, is huge, particularly if thorough genomic profiling is available.

S. Nik-Zainal is listed as a co-inventor on multiple patent filings related to the application of mutational signatures that are owned by Genome Research Limited, and is a consultant/advisory board member for Artios Pharma Ltd. No potential conflicts of interest were disclosed by the other author.

The authors thank Esther Lips (NKI, Holland, the Netherlands), Shelley Hwang (Duke University, Durham, NC), and Alastair Thompson (MD Anderson Cancer Center, Houston, TX) for critical assessment of the manuscript. The authors also thank the ICGC Breast Cancer Working Group and the BASIS Consortium funded by the Seventh EU Programme for having the foresight to see the potential and conceive the idea of these extensive resequencing experiments in breast cancer.

S. Nik-Zainal was a Wellcome-Beit Fellow and personally funded by a Wellcome Trust Intermediate Clinical Research Grant (WT100183MA) at the start of writing this review, and subsequently funded by a CRUK Advanced Clinician Scientist Award (C60100/A23916). S. Morganella is funded by core funds from the Wellcome Trust Sanger Institute.

1. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature 2009;458:719–24.
2. Stephens PJ, Tarpey PS, Davies H, Van Loo P, Greenman C, Wedge DC, et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 2012;486:400–4.
3. Ching HC, Naidu R, Seong MK, Har YC, Taib NA. Integrated analysis of copy number and loss of heterozygosity in primary breast carcinomas using high-density SNP array. Int J Oncol 2011;39:621–33.
4. Fang M, Toher J, Morgan M, Davison J, Tannenbaum S, Claffey K. Genomic differences between estrogen receptor (ER)-positive and ER-negative human breast carcinoma identified by single nucleotide polymorphism array comparative genome hybridization analysis. Cancer 2011;117:2024–34.
5. Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, et al. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res 2006;16:1465–79.
6. Hicks J, Muthuswamy L, Krasnitz A, Navin N, Riggs M, Grubor V, et al. High-resolution ROMA CGH and FISH analysis of aneuploid and diploid breast tumors. Cold Spring Harb Symp Quant Biol 2005;70:51–63.
7. King CR, Kraus MH, Aaronson SA. Amplification of a novel v-erbB-related gene in a human mammary carcinoma. Science 1985;229:974–6.
8. Leary RJ, Lin JC, Cummins J, Boca S, Wood LD, Parsons DW, et al. Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proc Natl Acad Sci U S A 2008;105:16224–9.
9. Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012;486:346–52.
10. Ellis MJ, Ding L, Shen D, Luo J, Suman VJ, Wallis JW, et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature 2012;486:353–60.
11. Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature 2012;486:395–9.
12. Banerji S, Cibulskis K, Rangel-Escareno C, Brown KK, Carter SL, Frederick AM, et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 2012;486:405–9.
13. Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994;266:66–71.
14. Wooster R, Bignell G, Lancaster J, Swift S, Seal S, Mangion J, et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 1995;378:789–92.
15. Thompson D, Duedal S, Kirner J, McGuffog L, Last J, Reiman A, et al. Cancer risks and mortality in heterozygous ATM mutation carriers. J Natl Cancer Inst 2005;97:813–22.
16. Masciari S, Larsson N, Senz J, Boyd N, Kaurah P, Kandel MJ, et al. Germline E-cadherin mutations in familial lobular breast cancer. J Med Genet 2007;44:726–31.
17. Meijers-Heijboer H, van den Ouweland A, Klijn J, Wasielewski M, de Snoo A, Oldenburg R, et al. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat Genet 2002;31:55–9.
18. Litman R, Peng M, Jin Z, Zhang F, Zhang J, Powell S, et al. BACH1 is critical for homologous recombination and appears to be the Fanconi anemia gene product FANCJ. Cancer Cell 2005;8:255–65.
19. Chen J, Lindblom P, Lindblom A. A study of the PTEN/MMAC1 gene in 136 breast cancer families. Hum Genet 1998;102:124–5.
20. Cox A, Dunning AM, Garcia-Closas M, Balasubramanian S, Reed MW, Pooley KA, et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 2007;39:352–8.
21. Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 2007;447:1087–93.
22. Hunter DJ, Kraft P, Jacobs KB, Cox DG, Yeager M, Hankinson SE, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet 2007;39:870–4.
23. Stacey SN, Manolescu A, Sulem P, Rafnar T, Gudmundsson J, Gudjonsson SA, et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 2007;39:865–9.
24. Stacey SN, Manolescu A, Sulem P, Thorlacius S, Gudjonsson SA, Jonsson GF, et al. Common variants on chromosome 5p12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet 2008;40:703–6.
25. Thompson WD. Genetic epidemiology of breast cancer. Cancer 1994;74:279–87.
26. Bergamaschi A, Kim YH, Wang P, Sorlie T, Hernandez-Boussard T, Lonning PE, et al. Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer 2006;45:1033–40.
27. Vincent-Salomon A, Lucchesi C, Gruel N, Raynal V, Pierron G, Goudefroye R, et al. Integrated genomic and transcriptomic analysis of ductal carcinoma in situ of the breast. Clin Cancer Res 2008;14:1956–65.
28. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008;456:53–9.
29. Pereira B, Chin SF, Rueda OM, Vollan HK, Provenzano E, Bardwell HA, et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat Commun 2016;7:11479.
30. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, et al. Mutational processes molding the genomes of 21 breast cancers. Cell 2012;149:979–93.
31. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet 2014;15:585–98.
32. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21.
33. Nik-Zainal S, Van Loo P, Wedge DC, Alexandrov LB, Greenman CD, Lau KW, et al. The life history of 21 breast cancers. Cell 2012;149:994–1007.
34. Nik-Zainal S, Kucab JE, Morganella S, Glodzik D, Alexandrov LB, Arlt VM, et al. The genome as a record of environmental exposure. Mutagenesis 2015;30:763–70.
35. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Rep 2013;3:246–59.
36. Nik-Zainal S, Davies H, Staaf J, Ramakrishna M, Glodzik D, Zou X, et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 2016;534:47–54.
37. Kim J, Mouw KW, Polak P, Braunstein LZ, Kamburov A, Tiao G, et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat Genet 2016;48:600–6.
38. Gehring JS, Fischer B, Lawrence M, Huber W. SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Bioinformatics 2015;31:3673–5.
39. Fischer A, Illingworth CJ, Campbell PJ, Mustonen V. EMu: probabilistic inference of mutational processes and their localization in the cancer genome. Genome Biol 2013;14:R39.
40. Roberts ND, Wedge DC, Campbell PC. HDP. https://github.com/nicolaroberts/hdp; 2015.
41. Shiraishi Y, Tremmel G, Miyano S, Stephens M. A simple model-based approach to inferring and visualizing cancer mutation signatures. PLoS Genet 2015;11:e1005657.
42. Nik-Zainal S, Wedge DC, Alexandrov LB, Petljak M, Butler AP, Bolli N, et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat Genet 2014;46:487–91.
43. Swanton C, McGranahan N, Starrett GJ, Harris RS. APOBEC enzymes: mutagenic fuel for cancer evolution and heterogeneity. Cancer Discov 2015;5:704–12.
44. Lefebvre C, Bachelot T, Filleron T, Pedrero M, Campone M, Soria JC, et al. Mutational profile of metastatic breast cancers: a retrospective analysis. PLoS Med 2016;13:e1002201.
45. Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 2011;144:27–40.
46. Glodzik D, Morganella S, Davies H, Simpson PT, Li Y, Zou X, et al. A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers. Nat Genet 2017;49:341–8.
47. Taylor BJ, Nik-Zainal S, Wu YL, Stebbings LA, Raine K, Campbell PJ, et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. eLife 2013;2:e00534.
48. Lada AG, Kliver SF, Dhar A, Polev DE, Masharsky AE, Rogozin IB, et al. Disruption of transcriptional coactivator sub1 leads to genome-wide re-distribution of clustered mutations induced by APOBEC in active yeast genes. PLoS Genet 2015;11:e1005217.
49. Walker BA, Wardell CP, Murison A, Boyle EM, Begum DB, Dahir NM, et al. APOBEC family mutational signatures are associated with poor prognosis translocations in multiple myeloma. Nat Commun 2015;6:6997.
50. Polak P, Karlic R, Koren A, Thurman R, Sandstrom R, Lawrence MS, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 2015;518:360–4.
51. Schuster-Bockler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 2012;488:504–7.
52. Haradhvala NJ, Polak P, Stojanov P, Covington KR, Shinbrot E, Hess JM, et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 2016;164:538–49.
53. Seplyarskiy VB, Soldatov RA, Popadin KY, Antonarakis SE, Bazykin GA, Nikolaev SI. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res 2016;26:174–82.
54. Hoopes JI, Cortez LM, Mertz TM, Malc EP, Mieczkowski PA, Roberts SA. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep 2016;14:1273–82.
55. Kanu N, Cerone MA, Goh G, Zalmas LP, Bartkova J, Dietzen M, et al. DNA replication stress mediates APOBEC3 family mutagenesis in breast cancer. Genome Biol 2016;17:185.
56. Lord CJ, Ashworth A. BRCAness revisited. Nat Rev Cancer 2016;16:110–20.
57. Davies H, Glodzik D, Morganella S, Yates LR, Staaf J, Zou X, et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat Med 2017;23:517–25.
58. Joosse SA, van Beers EH, Tielen IH, Horlings H, Peterse JL, Hoogerbrugge N, et al. Prediction of BRCA1-association in hereditary non-BRCA1/2 breast carcinomas with array-CGH. Breast Cancer Res Treat 2009;116:479–89.
59. Vollebergh MA, Lips EH, Nederlof PM, Wessels LF, Schmidt MK, van Beers EH, et al. An aCGH classifier derived from BRCA1-mutated breast cancer and benefit of high-dose platinum-based chemotherapy in HER2-negative breast cancer patients. Ann Oncol 2011;22:1561–70.
60. Watkins JA, Irshad S, Grigoriadis A, Tutt AN. Genomic scars as biomarkers of homologous recombination deficiency and drug response in breast and ovarian cancers. Breast Cancer Res 2014;16:211.
61. Fong PC, Boss DS, Yap TA, Tutt A, Wu P, Mergui-Roelvink M, et al. Inhibition of poly(ADP-ribose) polymerase in tumors from BRCA mutation carriers. N Engl J Med 2009;361:123–34.
62. Prakash R, Zhang Y, Feng W, Jasin M. Homologous recombination and human health: the roles of BRCA1, BRCA2, and associated proteins. Cold Spring Harb Perspect Biol 2015;7:a016600.
63. Bryant HE, Schultz N, Thomas HD, Parker KM, Flower D, Lopez E, et al. Specific killing of BRCA2-deficient tumours with inhibitors of poly(ADP-ribose) polymerase. Nature 2005;434:913–7.
64. Farmer H, McCabe N, Lord CJ, Tutt AN, Johnson DA, Richardson TB, et al. Targeting the DNA repair defect in BRCA mutant cells as a therapeutic strategy. Nature 2005;434:917–21.
65. Vonderheide RH, Domchek SM, Clark AS. Immunotherapy for breast cancer: what are we missing? Clin Cancer Res 2017;23:2640–6.
66. Yates LR, Desmedt C. Translational genomics: practical applications of the genomic revolution in breast cancer. Clin Cancer Res 2017;23:2630–9.
67. Hainaut P, Pfeifer GP. Somatic TP53 mutations in the era of genome sequencing. Cold Spring Harb Perspect Med 2016;6. pii: a026179.
68. Besaratinia A, Kim SI, Hainaut P, Pfeifer GP. In vitro recapitulating of TP53 mutagenesis in hepatocellular carcinoma associated with dietary aflatoxin B1 exposure. Gastroenterology 2009;137:1127–37.
69. Hainaut P, Pfeifer GP. Patterns of p53 G–>T transversions in lung cancers reflect the primary mutagenic signature of DNA-damage by tobacco smoke. Carcinogenesis 2001;22:367–74.
70. Bouaoun L, Sonkin D, Ardin M, Hollstein M, Byrnes G, Zavadil J, et al. TP53 variations in human cancers: new lessons from the IARC TP53 database and genomics data. Hum Mutat 2016;37:865–76.
71. Olivier M, Weninger A, Ardin M, Huskova H, Castells X, Vallee MP, et al. Modelling mutational landscapes of human cancers in vitro. Sci Rep 2014;4:4482.
72. Reeder-Hayes KE, Anderson BO. Breast cancer disparities at home and abroad: a review of the challenges and opportunities for system-level change. Clin Cancer Res 2017;23:2655–64.
73. Freedman RA, Partridge AH. Emerging data and current challenges for young, old, obese, or male patients with breast cancer. Clin Cancer Res 2017;23:2647–54.
74. Ferrari A, Vincent-Salomon A, Pivot X, Sertier AS, Thomas E, Tonon L, et al. A whole-genome sequence and transcriptome perspective on HER2-positive breast cancers. Nat Commun 2016;7:12222.