Abstract
Purpose: Interval breast cancer is of clinical interest, as it exhibits an aggressive phenotype and evades detection by screening mammography. A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying these aggressive tumors.
Experimental Design: Initiated in April 2013, Clinical Sequencing of Cancer in Sweden (Clinseq) is a scientific and clinical platform for the genomic profiling of cancer. The breast cancer pilot study consisted of women diagnosed with breast cancer between 2001 and 2012 in the Stockholm/Gotland regions. A subset of 307 breast tumors was successfully sequenced, of which 113 were screen-detected and 60 were interval cancers. We applied targeted deep sequencing of cancer-related genes; low-pass, whole-genome sequencing; and RNA sequencing technology to characterize somatic differences in the genomic and transcriptomic architecture by interval cancer status. Mammographic density and PAM50 molecular subtypes were considered.
Results: In the univariate analyses, TP53, PPP1R3A, and KMT2B were significantly more frequently mutated in interval cancers than in screen-detected cancers. Acquired somatic copy number aberrations with a frequency difference of at least 15% between the two groups included gains in 17q23-q25.3 and losses in 16q24.2. Gene expression analysis identified 447 significantly differentially expressed genes, of which 120 were replicated in an independent microarray dataset. After adjusting for PAM50, most differences were no longer significant.
Conclusions: Molecular differences by interval cancer status were observed, but they were largely explained by PAM50 subtypes. This work offers new insights into the biological differences between the two tumor groups. Clin Cancer Res; 23(10); 2584–92. ©2016 AACR.
With clinical sequencing gaining ground, it is time to aggressively pursue the genomic structure of interval breast cancers, as these tumors carry a high mortality burden. Although screen-detected cancers are biologically distinct from interval cancers in terms of somatic mutations, copy number aberrations, and gene expression, most of the differences are no longer significant after adjusting for breast cancer intrinsic subtypes (PAM50). We also show that the molecular differences appear to form a spectrum from less aggressive (screen-detected) to more aggressive (interval) manifestations of the disease, which can be characterized by PAM50 subtypes, namely, luminal A, luminal B, HER2-enriched, and basal-like, in that order. A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying this aggressive subset of breast cancer.
Introduction
Interval breast cancer is of clinical interest as it is diagnosed within the time interval between screening examinations and evades detection by mammography. Interval cancers have been claimed to represent a more aggressive subset of breast cancer (1). They are typically larger in size, of higher grade, more frequently node-positive at diagnosis, more likely to be negative for estrogen receptor (ER) status, and more often associated with a triple-negative phenotype (2–5). Even after adjusting for known tumor characteristics, interval cancers are more fatal than screen-detected cancers (6, 7). This group of tumors represents a problem in a screening program, and a necessary condition for effective screening is that the total incidence of interval cancers is kept low (8).
On the basis of the PAM50 gene signature, which measures expression profiles for 50 genes, breast cancer can be classified into 4 highly reproducible and robust intrinsic subtypes (luminal A, luminal B, HER2-enriched, and basal-like) with distinct biology and clinical outcome (9, 10). While previous studies have reported an overrepresentation of aggressive immunohistochemistry-defined subtypes among interval cancers (3, 5), to our knowledge, no study has characterized the somatic differences between screen-detected and interval cancers while at the same time taking PAM50 molecular subtypes into consideration.
We have previously shown that interval cancers exhibit features of more aggressive tumor behavior than screen-detected cancers, especially in mammographically nondense breasts (5). In another study, we have also shown that there are germline genetic differences between screen-detected and interval cancers, quantified by summing the effects of multiple breast cancer risk variants in a polygenic risk score (11). As these tumors carry a high mortality burden, it is time to aggressively pursue the genomic structure of interval breast cancers with the advent of clinical sequencing. To clarify the biologic nature and malignant potential of interval cancers, we apply targeted deep sequencing of cancer-related genes; low-pass, whole-genome sequencing; and RNA sequencing (RNA-seq) technology to characterize 113 screen-detected and 60 interval cancer tumors. Our study is the first large-scale experiment to detail somatic differences in the genomic and transcriptomic architecture of screen-detected and interval cancers, taking percent mammographic density (PMD) and PAM50 subtypes into account.
Materials and Methods
Detailed methodology is presented as Supplementary Methods in the Supplement.
Study populations
Initiated in April 2013, Clinical Sequencing of Cancer in Sweden (Clinseq, http://clinseq.org/) is a scientific and clinical platform for the genomic profiling of cancer. The breast cancer pilot study consisted of women with a primary breast cancer diagnosed in 2001 to 2012 in the Stockholm/Gotland regions. A subset of 307 breast tumors was successfully sequenced (12). The validation study is a nested case–control study consisting of women diagnosed with a primary breast cancer in 1997 to 2005 in the same regions and has been described in detail previously (13, 14). The cohort included 621 individuals with fresh-frozen tumors. Further details about both datasets are available in Supplementary Methods in the Supplement.
The discovery study was approved by the ethical committee at Karolinska Institutet. The validation study set has been previously approved for gene expression analyses by the same committee.
Clinical data
We assessed the screening history for all women in Clinseq and the validation study. Dates of mammographic screening visits and information about the outcome of each visit were obtained from the mammography screening database kept at the Stockholm-Gotland Regional Cancer Center (5). The database contains attendance and outcome of all visits undertaken within the population-based mammography screening program for Stockholm County. All Stockholm women ages 50 to 69 years have been invited to be screened at 24-month intervals since 1989, whereas women ages 40 to 49 years were included from mid-2005 and screened at 18-month intervals. Participation rate was 70%, recall rate was 3%, and detection rate was 0.5% for the study period (15). Full details of the organizational and quality aspects of the Stockholm mammography screening program are described in the publication by Lind and colleagues (15). Screen-detected breast cancer was defined as a breast cancer diagnosis made after a positive screen finding but before the next visit or end of a normal screening interval. Interval breast cancer was defined as a breast cancer diagnosis made after a negative screen but before the next visit or end of a normal screening interval.
Tumor characteristics were manually retrieved from medical records for both the Clinseq and validation studies. Mammographic density was measured with an area-based method previously described by Li and colleagues (16). Valid images were retrieved for 164 participants in Clinseq (94.8%).
Exclusions
From the Clinseq study (n = 307), we excluded 7 noninvasive cancers, 1 duplicate, 61 breast cancers diagnosed outside of a normal screening interval, and 65 breast cancers diagnosed in women not attending screening. The primary analysis dataset included 113 screen-detected and 60 interval cancers (see flow diagram leading to analytical cohort in Supplementary Fig. S1 in the Supplement).
In the validation study (n = 621), breast cancers diagnosed outside a normal screening interval (n = 91) and breast cancers diagnosed in women not attending screening (n = 310) were excluded. The final dataset for validation included 109 screen-detected and 111 interval cancers (diagnosed within 24 months after a negative screen).
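The exclusion counts reported for both studies can be checked with simple arithmetic. The Python sketch below is illustrative only; the helper name `remaining` is ours and is not part of the study pipeline.

```python
# Cohort bookkeeping for the exclusions described above (counts taken from the text).
def remaining(total, exclusions):
    """Return the number of tumors left after removing every exclusion category."""
    return total - sum(exclusions)

# Clinseq: 7 noninvasive, 1 duplicate, 61 outside a normal interval, 65 non-attenders.
clinseq_final = remaining(307, [7, 1, 61, 65])
# Validation study: 91 outside a normal interval, 310 non-attenders.
validation_final = remaining(621, [91, 310])

print(clinseq_final, validation_final)
```

Both totals agree with the analytical cohorts stated in the text: 113 + 60 tumors in Clinseq and 109 + 111 tumors in the validation study.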
Sample preparation, sequencing, and microarrays
Isolation of total RNA and DNA from fresh-frozen tumor samples in both the discovery and validation studies was performed according to standard methods (AllPrep DNA/RNA/Protein Mini and RNeasy Mini Kit from Qiagen, respectively, see Supplementary Methods in the Supplement for details). DNA sequencing libraries were used for low-pass, whole-genome sequencing as well as deep sequencing of a custom panel of 516 cancer-related genes in Clinseq (Supplementary Table S1 in Supplement). The validation study was profiled using NuGEN amplification protocol and hybridized using the HRSTA-2.0 custom human Affymetrix array (13, 14). Details of the custom array are available at NCBI GEO depository as GPL10379. The corresponding array data have been deposited at the Gene Expression Omnibus Database under accession numbers GSE48091 and GSE81954.
Bioinformatic processing of sequencing data in parent Clinseq study
Details on methods for this section are described in the study by Rantalainen and colleagues (12) and our complete analysis pipeline is described in Supplementary Methods in the Supplement.
Assigning intrinsic (PAM50) subtypes
For the Clinseq dataset, intrinsic subtyping was performed on the basis of RNA-seq data classifying tumors into luminal A, luminal B, HER2-enriched, or basal-like. Subtypes were assigned using the research-based 50-gene prediction analysis of microarray (PAM50) gene set (13, 17, 18). For the validation dataset, intrinsic subtyping was carried out using microarray data.
Statistical analysis
Somatic mutation profiling.
The Fisher exact test was used to examine frequency differences in somatic mutations between screen-detected and interval cancer. Genes found to be associated with nominal P < 0.05 were further analyzed in exact logistic regression models (elrm package in R) adjusting for PMD in quartiles and PAM50 subtype.
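For readers unfamiliar with the Fisher exact test, a minimal Python sketch follows. It assumes the common two-sided definition (summing the hypergeometric probabilities of all tables that are no more likely than the observed one) and uses only the standard library; it is not the code used in the study, and the function name is hypothetical. The counts are the TP53 and PPP1R3A mutation counts reported in Table 2.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    as likely as, or less likely than, the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Rows: screen-detected, interval; columns: mutated, not mutated.
p_tp53 = fisher_exact_two_sided(27, 113 - 27, 25, 60 - 25)
p_ppp1r3a = fisher_exact_two_sided(0, 113, 3, 60 - 3)
print(round(p_tp53, 3), round(p_ppp1r3a, 3))
```

The adjusted analyses in the text use exact logistic regression, which this sketch does not attempt to reproduce.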
Somatic copy number aberrations.
Copy number loss and gain were determined from the BICseq output as segments having “log2.copyRatio” < log2(0.75) with “log10.pvalue” < log10(0.0001) and “log2.copyRatio” ≥ log2(1.25) with “log10.pvalue” < log10(0.0001), respectively. In total, there were 27,542 valid segments. The impact of common copy number aberrations (CNA; frequency ≥ 5%) was assessed by burden analysis of segmental copy number variation (CNV) data in Plink. To identify CNAs at specific genomic locations, the segments were mapped to Ensembl stable identifiers (referred to as probes). The Fisher exact test was used to compare copy number events by interval cancer status. Multiple testing corrections were performed via false discovery rate (FDR) estimation using the Benjamini–Hochberg procedure (19). The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) breast invasive carcinoma segmented CNV profiles for 1,088 tumors before and after removing common germline CNV were used as reference.
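The segment-calling thresholds and the Benjamini–Hochberg adjustment described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study pipeline: the function names are hypothetical, each segment is represented only by its log2 copy ratio and log10 P value, and the example P values are made up.

```python
import math

LOSS_CUTOFF = math.log2(0.75)        # about -0.415
GAIN_CUTOFF = math.log2(1.25)        # about 0.322
LOG10_P_CUTOFF = math.log10(0.0001)  # -4

def classify_segment(log2_copy_ratio, log10_pvalue):
    """Label a segment as 'loss', 'gain', or 'neutral' using the stated cutoffs."""
    if log10_pvalue >= LOG10_P_CUTOFF:
        return "neutral"  # not significant enough to call
    if log2_copy_ratio < LOSS_CUTOFF:
        return "loss"
    if log2_copy_ratio >= GAIN_CUTOFF:
        return "gain"
    return "neutral"

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(classify_segment(-0.5, -6), classify_segment(0.4, -6), classify_segment(0.4, -2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # hypothetical P values
```

A probe would then be reported as significant after correction only if its adjusted value falls at or below the chosen FDR threshold.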
Differential gene expression analysis.
Differential analysis of count data features between screen-detected and interval cancers in Clinseq was performed using the DESeq2 package in R. We first carried out univariate analyses to identify genes which are differentially expressed between screen-detected and interval cancers (without PAM50 subtype information). Differences in gene expression were considered significant if FDR < 0.05 and absolute fold change ≥ 1.5. We subsequently carried out multivariable analyses to see whether any of the selected genes were differentially associated with screen-detected and interval cancers independently of PAM50 subtype.
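The significance rule above combines an FDR cutoff with a fold-change cutoff. Because DESeq2 reports fold changes on the log2 scale (`log2FoldChange`) alongside adjusted P values (`padj`), an absolute fold change of at least 1.5 corresponds to |log2 fold change| ≥ log2(1.5) ≈ 0.585. The Python sketch below shows the rule only; the function name and the example values are hypothetical.

```python
import math

def is_sde(log2_fold_change, fdr, fdr_cutoff=0.05, min_fold_change=1.5):
    """True if a gene meets both the FDR and the absolute fold-change criteria."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= math.log2(min_fold_change)

# Hypothetical genes: (log2 fold change, FDR).
examples = {"geneA": (0.60, 0.01), "geneB": (-0.60, 0.01),
            "geneC": (0.50, 0.01), "geneD": (1.00, 0.20)}
for name, (lfc, fdr) in examples.items():
    print(name, is_sde(lfc, fdr))
```

Applying the cutoff to the absolute value treats up- and downregulation in interval cancers symmetrically.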
The limma package in R was used for differential expression analysis of the validation data. As the validation study does not have information on mammographic density, only PAM50 subtype was included in the multivariable model. A significantly differentially expressed (SDE) gene in the Clinseq analysis was considered to be replicated if the nominal P value associated with the differentially expressed gene in the validation dataset is less than 0.05 and the log2 fold change is in the same direction. To additionally investigate whether SDE genes in the Clinseq analysis collectively show a statistically significant, concordant difference between screen-detected (reference) and interval cancer tumors in the replication dataset, we analyzed the enrichment of a gene set containing 447 SDE genes identified in Clinseq in the validation dataset using Gene Set Enrichment Analysis (GSEA) using default parameters (20). We also carried out cluster analysis to group/classify tumor tissues, based on the identified SDE genes. Principal component analysis (PCA) was performed to examine sample relations. To evaluate cumulative changes in the expression of groups of multiple genes defined on the basis of prior biological knowledge (i.e., pathway analysis), we used GSEA Preranked (classic scoring scheme; ref. 20).
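The per-gene replication criterion described above (nominal P < 0.05 in the validation dataset and a log2 fold change in the same direction as in Clinseq) can be written as a simple predicate. The Python sketch below is illustrative; the function name and the gene-level values are hypothetical and are not results from either study.

```python
def is_replicated(discovery_log2fc, validation_log2fc, validation_p, alpha=0.05):
    """True if the validation estimate is nominally significant and concordant in sign."""
    same_direction = discovery_log2fc * validation_log2fc > 0
    return validation_p < alpha and same_direction

# Hypothetical genes: (Clinseq log2FC, validation log2FC, validation P).
genes = {
    "gene1": (1.2, 0.8, 0.010),    # concordant and significant
    "gene2": (1.2, -0.8, 0.010),   # significant but opposite direction
    "gene3": (-0.9, -0.3, 0.200),  # concordant but not significant
}
replicated = [g for g, vals in genes.items() if is_replicated(*vals)]
print(replicated)
```

This gene-by-gene check complements the GSEA analysis, which asks whether the SDE genes shift in a concordant direction as a set.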
Results
From the parent Clinseq study, we identified 113 screen-detected and 60 interval cancers. Seven percent (n = 8) of the screen-detected cancers were prevalent screens (i.e., first mammogram). Sixty percent (n = 36) of the interval cancers were diagnosed at least 1 year after their latest (negative) screening mammogram. There was no evidence that interval cancers diagnosed closer in time to a negative screen were correlated with more aggressive phenotypes. Similar to what was observed in a larger study on interval cancers and tumor characteristics of which this study is a subset (5), neither PAM50 subtype nor cellular proliferation level was significantly associated with time since negative screen among interval cancers (P > 0.05).
Interval cancers exhibit a more aggressive phenotype
Interval cancers were more common in women with higher mammographic density (Table 1). The PAM50 subtype distribution differed by interval cancer status, with the luminal A signature being less common and subtypes associated with worse prognosis (luminal B and basal-like) being more frequent among interval cancers (Table 1). Interval cancers were also significantly more often associated with higher grade, larger size, higher proliferation levels, and negative ER and progesterone receptor (PR) status (Table 1). Because of the enrichment for deadly metastatic breast cancers, the validation study has a comparatively larger proportion of interval cancers. However, we did not observe any large difference when comparing tumor characteristics between screen-detected and interval cancers in the 2 studies. Interval cancers in the validation dataset were similarly significantly associated with more aggressive tumor characteristics (Supplementary Table S2 in Supplement).
Characteristic | Screen-detected | Interval | OR (95% CI) | P (Wald) | P (trend) |
---|---|---|---|---|---|
Age, y | 59.1 (7.7) | 57.9 (6.6) | 0.98 (0.94–1.02) | 0.300 | |
Percent mammographic density (quartiles, Q) | |||||
Q1 (0.20–10.4) | 31 (27.4) | 10 (16.7) | Reference | 0.048 | |
Q2 (>10.4–18.8) | 29 (25.7) | 12 (20) | 1.28 (0.48–3.42) | 0.619 | |
Q3 (>18.8–29.3) | 26 (23) | 15 (25) | 1.79 (0.69–4.65) | 0.233 | |
Q4 (>29.3–79.2) | 23 (20.4) | 18 (30) | 2.43 (0.95–6.23) | 0.065 | |
Missing | 4 (3.5) | 5 (8.3) | |||
PAM50 | |||||
Luminal A | 73 (64.6) | 23 (38.3) | Reference | ||
Luminal B | 20 (17.7) | 15 (25) | 2.38 (1.05–5.39) | 0.038 | |
HER2-enriched | 16 (14.2) | 11 (18.3) | 2.18 (0.89–5.36) | 0.089 | |
Basal-like | 4 (3.5) | 11 (18.3) | 8.73 (2.53–30.06) | 0.001 | |
ER | |||||
Positive | 101 (89.4) | 45 (75) | Reference | ||
Negative | 11 (9.7) | 15 (25) | 3.14 (1.29–7.61) | 0.011 | |
Missing | 1 (0.9) | 0 (0) | |||
PR | |||||
Positive | 80 (70.8) | 30 (50) | Reference | ||
Negative | 32 (28.3) | 29 (48.3) | 2.72 (1.37–5.39) | 0.004 | |
Missing | 1 (0.9) | 1 (1.7) | |||
HER2 | |||||
Negative | 88 (77.9) | 48 (80) | Reference | ||
Positive | 24 (21.2) | 11 (18.3) | 0.84 (0.38–1.86) | 0.668 | |
Missing | 1 (0.9) | 1 (1.7) | |||
Elston–Ellis grade | |||||
Well-differentiated | 22 (19.5) | 4 (6.7) | Reference | 0.023 | |
Moderately differentiated | 51 (45.1) | 25 (41.7) | 2.70 (0.84–8.67) | 0.096 | |
Poorly differentiated | 40 (35.4) | 28 (46.7) | 3.85 (1.20–12.4) | 0.024 | |
Missing | 0 (0) | 3 (5) | |||
Largest tumor size, mm | |||||
<20 | 53 (46.9) | 18 (30) | Reference | 0.014 | |
20–49 | 56 (49.6) | 36 (60) | 1.89 (0.96–3.73) | 0.066 | |
≥50 | 4 (3.5) | 6 (10) | 4.42 (1.12–17.44) | 0.034 | |
Proliferation index (Ki-67) | |||||
Low (<20%) | 54 (47.8) | 18 (30) | Reference | ||
High (≥20%) | 57 (50.4) | 40 (66.7) | 2.11 (1.08–4.11) | 0.029 | |
Missing | 2 (1.8) | 2 (3.3) |
NOTE: Mean and SD are shown within parentheses for age at diagnosis in years; count and percent proportion are shown for categorical variables. Association with interval cancer status was tested using binomial logistic regression for each characteristic separately. OR and corresponding 95% confidence intervals (CI) of Wald tests and P values for trend tests (where appropriate) were reported.
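For a single categorical characteristic, the odds ratio from logistic regression against the reference category equals the cross-product ratio of the corresponding counts, and the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts. The Python sketch below illustrates this for luminal B versus luminal A using the counts in Table 1; the function name is hypothetical and 1.96 is used as the normal quantile for a 95% interval.

```python
import math

def odds_ratio_wald_ci(exposed_cases, exposed_controls, ref_cases, ref_controls, z=1.96):
    """Odds ratio and Wald confidence interval from four cell counts."""
    odds_ratio = (exposed_cases * ref_controls) / (exposed_controls * ref_cases)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / ref_cases + 1 / ref_controls)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

# "Cases" are interval cancers and "controls" are screen-detected cancers.
# Luminal B: 15 interval, 20 screen-detected; luminal A (reference): 23 and 73.
or_lumb, ci_low, ci_high = odds_ratio_wald_ci(15, 20, 23, 73)
print(round(or_lumb, 2), round(ci_low, 2), round(ci_high, 2))
```

The result reproduces the luminal B row of the table, 2.38 (1.05–5.39).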
Differences in somatic mutation frequencies
The most frequently mutated genes included PIK3CA, TP53, GATA3, MAP3K1, CHD1, and KMT2C (Supplementary Table S1 in Supplement), which is in agreement with previous reports on genes mutated in breast cancer (21). Three genes, namely, TP53, PPP1R3A, and KMT2B, were found to be significantly more often mutated in interval cancers compared with screen-detected cancers (P < 0.05, Fig. 1 and Table 2). Improved statistical significance (smaller P values) was observed for all 3 genes in the model adjusted for PMD to reduce masking effect (Table 2). Only KMT2B remained significantly associated (P = 0.017) after further adjustment for PAM50 subtype (Table 2).
Gene | Screen-detected (n, %) | Interval (n, %) | Fisher P | +PMD | +PMD, PAM50 |
---|---|---|---|---|---|
TP53 | 27 (23.9) | 25 (41.7) | 0.023 | 0.020 | 0.710 |
PPP1R3A | 0 (0) | 3 (5.0) | 0.040 | 0.015 | 0.053 |
KMT2B | 1 (0.9) | 4 (6.7) | 0.050 | 0.033 | 0.017 |
NOTE: Exact logistic regression was performed adjusting for percent mammographic density (+PMD) and both PMD and PAM50 subtype (+PMD, PAM50).
Differences in CNAs
The general distribution of CNA was similar between Clinseq tumors and TCGA data (Supplementary Fig. S2 in Supplement), which was measured experimentally using another technology (Affymetrix Genome-Wide Human SNP Array 6.0). The global burden of common CNAs in interval compared with screen-detected cancers was found to be significantly different with respect to the number of CNAs per sample (P = 0.050, 10,000 permutations). When CNAs were examined at the probe level, copy number gain and loss frequencies in 1,704 and 276 probes, respectively, were found to be different by interval cancer status; however, none would survive correction for multiple testing (FDR ≤ 0.05, Fig. 2A). CNA (uncorrected P < 0.05) with a difference in frequency of at least 15% between the 2 tumor groups (n = 429) in our data included gains in 17q23-q25.3 (Fig. 2B) and losses in 16q24.2 (Fig. 2C). The proportion of interval cancer tumors with a gain event in the significant 17q region ranged from 25.0% to 38.3%, compared with 8.0% to 20.4% in screen-detected tumors (smallest P = 0.002). Loss events were more frequently observed among screen-detected cancers at 16q24.3 (22.1%) than among interval cancers (6.7%, smallest P = 0.01). After adjusting for PMD and PAM50 subtype, 312 of the 429 CNA (72.7%) with a difference in frequency of at least 15% between the 2 tumor groups remained significant (P < 0.05; data not shown). However, none of the individual CNA probes exhibited differences that survive correction for multiple testing.
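The burden comparison above relies on a permutation P value. As a rough illustration of the idea, the Python sketch below permutes group labels and compares the absolute difference in mean CNA count per sample. It is a simplification and not the Plink procedure used in the study; the function name, the seed, and the per-sample counts are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical CNA counts per tumor.
interval_counts = [10, 12, 11, 13, 12, 14, 11, 13]
screen_counts = [3, 4, 2, 5, 3, 4, 2, 5]
print(permutation_p_value(interval_counts, screen_counts, n_perm=2000))
```

With 10,000 permutations, as in the analysis above, the smallest attainable P value under this add-one scheme is 1/10,001.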
Differences in gene expression
In total, 17,136 genes passed default independent filtering by DESeq2, 447 of which were shown to be significantly differentially expressed between screen-detected and interval cancers in the univariate analysis. With these 447 SDE genes, we performed an analysis to group tumors with similar biology in our primary data. Four stable clusters were revealed (Fig. 3A), which were more associated with PAM50 subtype, ER, PR, HER2, grade, and proliferation level than interval cancer status (Fig. 3B and Supplementary Table S3 in Supplement). From the PCA plot in Fig. 3C, screen-detected and interval cancers were found to be largely overlapping on the basis of their expression profiles of the 447 SDE genes identified, but visible separation by interval cancer status can be observed at the extreme ends (left and right) of the plot, suggesting that there are transcriptomic differences between the 2 tumor groups. From the same plot, we observed a continuum of less aggressive to more aggressive PAM50 subtypes, namely, luminal A, luminal B, HER2-enriched, and basal-like, in that order.
External validation in independent dataset
For 334 of the 447 SDE genes, gene expression analysis results were available from an external gene expression dataset (Supplementary Fig. S3 in Supplement). In the independent validation dataset, 120 of these genes (35.9%) were individually associated with an estimate in the same direction (P < 0.05, data not shown). Collectively, the 334 overlapping genes considered as one gene set were found to be significantly enriched in the validation dataset (GSEA, P = 0.005, FDR = 0.024; Fig. 3D). Six SDE genes replicated in the validation dataset belonged to the PAM50 gene set. After adjusting for PMD and PAM50 subtype, only one replicated SDE gene, IGF2BP3, remained significantly associated with interval cancer status in both studies.
Pathway analysis
In the univariate analysis, 19 gene signatures were found to be significantly associated with interval cancer status (FWER < 0.05) in Clinseq, of which 15 (78.9%) were replicated with nominal P < 0.05 in the independent study (Supplementary Table S4 in Supplement). After adjusting for PMD and PAM50 subtype, only one subgroup of genes [Hallmark_Estrogen_Response_Early, systematic name in the Molecular Signatures Database (MSigDB, v5.0): M5906] remained significantly downregulated in interval cancers compared with screen-detected cancers in both datasets.
Discussion
In this study, we profiled 113 screen-detected and 60 interval cancer tumors to reveal the spectrum of genomic sequence, architecture, and transcriptomic alterations that can distinguish between the 2 groups of breast cancer. A unique aspect of our study was that mammographic density was controlled for, suggesting that many of the interval tumors were not missed during screenings, but were indeed aggressive and arose to become detectable between screens. In the univariate analyses, TP53, PPP1R3A, and KMT2B were significantly more frequently mutated in interval cancers than in screen-detected cancers. CNAs with a frequency difference of at least 15% between the 2 groups included gains in 17q23-q25.3 and losses in 16q24.2. Gene expression analysis identified 447 SDE genes. Of these genes, 334 could be tested in an independent microarray dataset, out of which 120 were replicated. After adjusting for PMD and PAM50 subtype, the majority of differences between screen-detected and interval cancers in our data were, however, no longer significant, implying that they were not independent of breast cancer subtype as classified by PAM50.
Compared with screen-detected cancers, interval cancers were significantly associated with PMD and more aggressive clinical features in our study, replicating earlier work by us and others (2–5). The aggressiveness of interval cancer highlights the need to understand biologic properties that distinguish it from screen-detected cancer. The highly sensitive targeted next-generation sequencing approach used in Clinseq detected a larger proportion of acquired mutations in interval cancer tumors for TP53, PPP1R3A, and KMT2B.
TP53 is a tumor suppressor that is commonly mutated in many cancer types, including breast cancer (22). Mutations in this gene have been reported to be more common in basal-like and HER2-enriched tumors (22–24). Adjusting for PMD and PAM50 subtype removed the significant association between somatic mutations in TP53 and interval cancer, suggesting that the aggressive nature of interval cancer may be largely explained by aggressive subtypes associated with higher proliferation.
As Clinseq is a pan-cancer initiative, the cancer gene list targeted is not exclusive to breast cancer. Breast is not the most frequently mutated cancer site for PPP1R3A and KMT2B. PPP1R3A has been implicated in tumor progression in colorectal cancer, and KMT2B mutations are common in tumors of the pancreas, liver, lung, stomach, brain, bladder, endometrium, and large intestine (23, 25, 26). Knockdown of KMT2B has been shown to decrease the proliferation of prostate cancer cells in vitro (27). As in the case of TP53, adjustment for PAM50 subtype removed the significant association for PPP1R3A. In contrast, KMT2B remained significantly associated with interval cancer status, suggesting the presence of a mechanism underlying the aggressive nature of such tumors that is independent of subtype. It may be interesting to examine whether interval breast cancers have more biology in common with other cancers (i.e., cancers in which KMT2B mutations are more frequent) than screen-detected breast cancers.
Overall, interval cancers were significantly associated with an increased CNA burden. The close-knit relationship between interval cancer status and PAM50 subtype is also implied in certain CNAs. Noteworthy regions that showed up in our analyses included gains in 17q23-q25.3 and losses in 16q24.2, both of which have been previously found to be differentially associated with breast cancer intrinsic subtypes (28–30). A majority of these CNAs remained significant after adjusting for PAM50 in our data, suggesting that they may contribute substantially to the different genetic etiologies of screen-detected and interval cancers, over and beyond effects of PAM50 subtype. However, failing to survive multiple testing, our results on specific CNAs at 17q23-q25.3 and 16q24.2 will need independent confirmation in larger datasets.
Although distinct clusters were obtained from the expression of genes which were differentially expressed between screen-detected and interval cancers, they were more correlated with PAM50 subtype and other tumor features than interval cancer status itself. On the basis of the transcriptomic profiles of the 447 SDE genes identified, the molecular differences between screen-detected and interval cancers appear to form a spectrum from less aggressive to more aggressive manifestations of the disease, which can be characterized by PAM50 subtypes. Consequently, only one gene (IGF2 mRNA–binding protein 3, IGF2BP3) was found to be significantly differentially expressed between screen-detected and interval cancer in both Clinseq and a validation dataset after adjusting for PMD and PAM50 subtype. In line with our findings that intrinsic subtypes explain most of the biologic differences between screen-detected and interval cancers, it has been proposed that IGF2BP3, a protein coding gene that is highly expressed in cancer (31), may be an additional basal-type marker in breast carcinoma (32).
The only similar work for gene expression differences between screen-detected and interval cancers by Rojo and colleagues was performed in 10 samples (33). The authors found the mTOR signaling pathway to be significantly upregulated in interval cancers and concluded that this pathway may mediate their aggressiveness. In agreement, when multiple genes are considered en masse, 2 hallmark gene sets related to mTOR were found to be upregulated among interval cancers compared with screen-detected cancers in both Clinseq and the validation dataset. However, the associations were no longer significant after adjusting for PMD and PAM50 subtype in our study. After controlling for PAM50 subtype, the only hallmark gene set that remained significantly downregulated in interval cancers compared with screen-detected cancers in both datasets was a subgroup of genes defining early response to estrogen. This result is not surprising, as a large proportion of interval cancers is typically ER-negative. Considering that the gene set consists of 200 members, the overlap of effects between ER status captured by PAM50 subtype and estrogen early responsive genes may not be complete.
A limitation of this study is the definition of interval cancers, which depends on the duration between 2 screens and therefore on the recommendations of each screening program. For example, the new 2015–2016 American Cancer Society guidelines suggest getting annual mammograms between ages 45 and 54 and every 2 years thereafter. In the United Kingdom NHS Breast Screening Programme, women are invited for screening every 3 years. It is important to note that if the screening interval is sufficiently long, all cancers will ultimately become interval cancers. However, most of the previous studies reporting a more aggressive nature of interval cancers have defined them using the same biennial interval used in this study.
A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying this aggressive subset of breast cancer. In summary, we observed that molecular differences between screen-detected and interval cancers were present but were largely explained by PAM50 subtypes. This work clarifies the picture on what type of breast cancer we are likely to identify through population-based screening and what type of cancer we are likely to miss. Future work looking into within-subtype differences may help clarify whether there are specific genomic differences that were masked by aggregating all subtypes together.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J. Li, D. Klevebring, J. Holm, H. Grönberg, K. Czene
Development of methodology: J. Li, D. Klevebring, J. Lindberg
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Li, D. Klevebring, N.P. Tobin, L.S. Lindström, J. Holm, G. Prochazka, S. Törnberg, J. Hartman, J. Frisell, J. Lindberg, P. Hall, J. Bergh, H. Grönberg, K. Czene
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Li, E. Ivansson, D. Klevebring, N.P. Tobin, L.S. Lindström, C. Cristando, J. Palmgren, K. Humphreys, M. Rantalainen, J. Lindberg, K. Czene
Writing, review, and/or revision of the manuscript: J. Li, E. Ivansson, N.P. Tobin, L.S. Lindström, J. Holm, C. Cristando, J. Palmgren, S. Törnberg, K. Humphreys, J. Hartman, J. Frisell, M. Rantalainen, J. Bergh, K. Czene
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Holm, J. Frisell, P. Hall
Study supervision: H. Grönberg, K. Czene
Acknowledgments
We thank Gustaf Rosin for collecting data from medical records and John Lövrot for help in processing the validation dataset.
Disclaimer
These funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of this article; and decision to submit the manuscript for publication.
Grant Support
This work was financed by the Swedish Research Council (grant no: 2014-2271, 521-2014-2057); Swedish Cancer Society (grant no: CAN 2013/469); Stockholm County Council (grant no: LS 1211-1594); the Cancer Society in Stockholm (grant no: 141092); and Breast Cancer Theme Centre Consortium (BRECT). J. Li is a UNESCO-L'Oréal International Fellow and a recipient of an award from the Alex and Eva Wallström Foundation. This study was supported by the Cancer Risk Prediction Center (CRisP; www.crispcenter.org) and a Linneus Centre (Contract ID: 70867902) financed by the Swedish Research Council.