Background: Our understanding of the biological processes that generate somatic mutations in breast cancer has increased markedly over the past five years. Using the catalog of somatic mutations present in cancer genomes, over 30 “mutational signatures” have been produced. While these provide important insights into the processes responsible for somatic mutation, gaps remain, and the etiology of several signatures remains unknown.

Methods: We have developed a new method that considers, for each mutation, the specific nucleotide change (e.g., C>T), the codon in which the mutation falls (e.g., GCT), the position within the codon (e.g., 2), and the nucleotides immediately 5' and 3' of the mutation (e.g., 5': C; 3': G). The summary of these mutation characteristics forms a mutational profile for each tissue sample. Combining the profiles of multiple samples yields a sparse matrix with samples as rows and mutation characteristics as columns. Nonsmooth nonnegative matrix factorization was then applied to discover intrinsic patterns in this sparse matrix.
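The profile construction described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Mutation` record, the `feature_key` label format, the `build_profile_matrix` helper, and the toy input are all hypothetical, and the sketch covers only the construction of the sample-by-characteristic count matrix.

```python
from collections import Counter
from typing import NamedTuple


class Mutation(NamedTuple):
    """Hypothetical record holding the characteristics named in the Methods."""
    sample: str       # sample identifier (assumed naming)
    change: str       # nucleotide change, e.g. "C>T"
    codon: str        # reference codon containing the mutation, e.g. "GCT"
    position: int     # 1-based position of the mutated base within the codon
    five_prime: str   # base immediately 5' of the mutated base
    three_prime: str  # base immediately 3' of the mutated base


def feature_key(m: Mutation) -> str:
    """Collapse one mutation's characteristics into a single column label."""
    ref = m.change.split(">")[0]
    # Sanity check: the reference base must sit at the stated codon position.
    if m.codon[m.position - 1] != ref:
        raise ValueError(
            f"reference base {ref} does not match codon {m.codon} "
            f"at position {m.position}"
        )
    return f"{m.change}|{m.codon}|{m.position}|{m.five_prime}|{m.three_prime}"


def build_profile_matrix(mutations):
    """Return (samples, features, matrix): samples as rows, features as columns."""
    counts = {}
    for m in mutations:
        counts.setdefault(m.sample, Counter())[feature_key(m)] += 1
    samples = sorted(counts)
    features = sorted({f for c in counts.values() for f in c})
    # A Counter returns 0 for unseen features, which fills the sparse cells.
    matrix = [[counts[s][f] for f in features] for s in samples]
    return samples, features, matrix


# Toy input, invented purely for illustration.
mutations = [
    Mutation("S1", "C>T", "GCT", 2, "G", "T"),
    Mutation("S1", "C>T", "GCT", 2, "G", "T"),
    Mutation("S1", "G>T", "GGA", 2, "G", "A"),
    Mutation("S2", "G>T", "GGA", 2, "G", "A"),
]
samples, features, matrix = build_profile_matrix(mutations)
```

Because most samples carry only a small fraction of all possible characteristic combinations, most cells in the resulting matrix are zero, which is consistent with the sparsity noted in the Methods.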

Results: Using somatic mutations identified in 1017 breast cancer tissues from The Cancer Genome Atlas (TCGA), we have identified four mutational signatures. Signature A correlates with the well-defined APOBEC signatures and Signature B with the “aging” signature, which is the result of 5-methylcytosine hydrolysis. Signature C and Signature D are potentially new signatures. Signature D is enriched for C:G>A:T mutations; these mostly occur in the middle position of codons and are enriched for GG (CC) in the sequence context immediately 5' or 3' of the mutation. G>T mutations are known to occur as a consequence of oxidative damage that is not repaired. Guanines are vulnerable because they have the lowest oxidation potential of the nucleobases. The 5' guanines in GG sites are especially reactive. We hypothesize that Signature D results from oxidative mutagenesis.

When correlated with clinical phenotypes, the basal subtype is clearly enriched for tumors with the Signature D mutation pattern (exposure level is in 169 basal tumors and in 797 non-basal tumors, p < 0.01), suggesting an etiologic link with basal-like breast cancer. G>T somatic mutations in breast cancer mainly take place during cell replication rather than during transcription. In the normal breast, epithelial cell replication occurs during the luteal phase of the menstrual cycle and during pregnancy, primarily under the direction of progesterone (P). P binds to its receptor (PR) in a subpopulation of PR-positive cells, where it initiates the transcription of genes including RANKL, with resultant paracrine stimulation (through RANK) of the NF-κB signaling pathway in neighboring cells. The four RANKL genes (TOP2A, MKI67, PBK, CDK1), defined by Nolan et al., are all positively associated with Signature D (p < 0.05), suggesting that this type of mutagenesis is associated with RANKL pathway upregulation.

Conclusions: We have identified a potentially new somatic mutational signature, designated Signature D, which appears to result from exposure of DNA to oxidative stress during replication. It is associated with the basal subtype of breast cancer as well as with RANKL-NF-κB pathway upregulation.

Citation Format: Zeng Z, Vo AH, Luo Y, Khan SA, Clare SE. Novel breast cancer mutational signatures [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr P3-06-06.