Purpose: Interval breast cancer is of clinical interest, as it exhibits an aggressive phenotype and evades detection by screening mammography. A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying these aggressive tumors.

Experimental Design: Initiated in April 2013, Clinical Sequencing of Cancer in Sweden (Clinseq) is a scientific and clinical platform for the genomic profiling of cancer. The breast cancer pilot study consisted of women diagnosed with breast cancer between 2001 and 2012 in the Stockholm/Gotland regions. A subset of 307 breast tumors was successfully sequenced, of which 113 were screen-detected and 60 were interval cancers. We applied targeted deep sequencing of cancer-related genes; low-pass, whole-genome sequencing; and RNA sequencing technology to characterize somatic differences in the genomic and transcriptomic architecture by interval cancer status. Mammographic density and PAM50 molecular subtypes were considered.

Results: In the univariate analyses, TP53, PPP1R3A, and KMT2B were significantly more frequently mutated in interval cancers than in screen-detected cancers. Acquired somatic copy number aberrations with a frequency difference of at least 15% between the two groups included gains in 17q23-q25.3 and losses in 16q24.2. Gene expression analysis identified 447 significantly differentially expressed genes, of which 120 were replicated in an independent microarray dataset. After adjusting for PAM50, most differences were no longer significant.

Conclusions: Molecular differences by interval cancer status were observed, but they were largely explained by PAM50 subtypes. This work offers new insights into the biological differences between the two tumor groups. Clin Cancer Res; 23(10); 2584–92. ©2016 AACR.

Translational Relevance

With clinical sequencing gaining ground, it is time to aggressively pursue the genomic structure of interval breast cancers, as these tumors carry a high mortality burden. Although screen-detected cancers are biologically distinct from interval cancers in terms of somatic mutations, copy number aberrations, and gene expression, most of the differences are no longer significant after adjusting for breast cancer intrinsic subtypes (PAM50). We also show that the molecular differences appear to form a spectrum from less aggressive (screen-detected) to more aggressive (interval) manifestations of the disease, which can be characterized by PAM50 subtypes, namely, luminal A, luminal B, HER2-enriched, and basal-like, in that order. A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying this aggressive subset of breast cancer.

Interval breast cancer is of clinical interest as it is diagnosed within the time interval between screening examinations and evades detection by mammography. Interval cancers have been claimed to represent a more aggressive subset of breast cancer (1). They are typically larger in size, of higher grade, more frequently node-positive at diagnosis, more likely to be negative for estrogen receptor (ER) status, and more often associated with a triple-negative phenotype (2–5). Even after adjusting for known tumor characteristics, interval cancers are more fatal than screen-detected cancers (6, 7). This group of tumors represents a problem in a screening program, and a necessary condition for effective screening is that the total incidence of interval cancers is kept low (8).

On the basis of the PAM50 gene signature, which measures expression profiles for 50 genes, breast cancer can be classified into 4 highly reproducible and robust intrinsic subtypes (luminal A, luminal B, HER2-enriched, and basal-like) with distinct biology and clinical outcome (9, 10). While previous studies have reported an overrepresentation of aggressive immunohistochemistry-defined subtypes among interval cancers (3, 5), to our knowledge, no study has characterized the somatic differences between screen-detected and interval cancers while at the same time taking PAM50 molecular subtypes into consideration.

We have previously shown that interval cancers exhibit features of more aggressive tumor behavior than screen-detected cancers, especially in mammographically nondense breasts (5). In another study, we have also shown that there are germline genetic differences between screen-detected and interval cancers, quantified by summing the effects of multiple breast cancer risk variants in a polygenic risk score (11). As these tumors carry a high mortality burden, it is time to aggressively pursue the genomic structure of interval breast cancers with the advent of clinical sequencing. To clarify the biologic nature and malignant potential of interval cancers, we apply targeted deep sequencing of cancer-related genes; low-pass, whole-genome sequencing; and RNA sequencing (RNA-seq) technology to characterize 113 screen-detected and 60 interval cancer tumors. Our study is the first large-scale experiment to detail somatic differences in the genomic and transcriptomic architecture of screen-detected and interval cancers, taking percent mammographic density (PMD) and PAM50 subtypes into account.

Detailed methodology is presented as Supplementary Methods in the Supplement.

Study populations

Initiated in April 2013, Clinical Sequencing of Cancer in Sweden (Clinseq, http://clinseq.org/) is a scientific and clinical platform for the genomic profiling of cancer. The breast cancer pilot study consisted of women with a primary breast cancer diagnosed in 2001 to 2012 in the Stockholm/Gotland regions. A subset of 307 breast tumors was successfully sequenced (12). The validation study is a nested case–control study consisting of women diagnosed with a primary breast cancer in 1997 to 2005 in the same regions and has been described in detail previously (13, 14). The cohort included 621 individuals with fresh-frozen tumors. Further details about both datasets are available in Supplementary Methods in the Supplement.

The discovery study was approved by the ethical committee at Karolinska Institutet. The validation study set has been previously approved for gene expression analyses by the same committee.

Clinical data

We assessed the screening history for all women in Clinseq and the validation study. Dates of mammographic screening visits and information about the outcome of each visit were obtained from the mammography screening database kept at the Stockholm-Gotland Regional Cancer Center (5). The database contains attendance and outcome of all visits undertaken within the population-based mammography screening program for Stockholm County. All Stockholm women ages 50 to 69 years have been invited to be screened at 24-month intervals since 1989, whereas women ages 40 to 49 years were included from mid-2005 and screened at 18-month intervals. Participation rate was 70%, recall rate was 3%, and detection rate was 0.5% for the study period (15). Full details of the organizational and quality aspects of the Stockholm mammography screening program are described in the publication by Lind and colleagues (15). Screen-detected breast cancer was defined as a breast cancer diagnosis made after a positive screen finding but before the next visit or end of a normal screening interval. Interval breast cancer was defined as a breast cancer diagnosis made after a negative screen but before the next visit or end of a normal screening interval.
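The two definitions above can be sketched as a small classification rule. This is only an illustration and not the study's actual procedure: the function names, the day-of-month clamping, and the use of only the latest screening visit (ignoring an earlier "next visit") are assumptions made for the example.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months (day clamped to 28 for simplicity)."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def detection_mode(last_screen: date, screen_positive: bool,
                   diagnosis: date, interval_months: int = 24) -> str:
    """Classify a diagnosis relative to the latest screening visit.

    interval_months is the normal screening interval: 24 months for
    ages 50 to 69 and 18 months for ages 40 to 49 in the text above.
    """
    window_end = add_months(last_screen, interval_months)
    if not (last_screen <= diagnosis <= window_end):
        # Diagnosed outside a normal screening interval (excluded in the study).
        return "outside interval"
    return "screen-detected" if screen_positive else "interval"
```

For instance, a diagnosis 15 months after a negative screen would be labeled "interval" under the 24-month schedule, whereas the same diagnosis after a positive screen would be labeled "screen-detected".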

Tumor characteristics were manually retrieved from medical records for both the Clinseq and validation studies. Mammographic density was measured with an area-based method previously described by Li and colleagues (16). Valid images were retrieved for 164 participants in Clinseq (94.8%).

Exclusions

From the Clinseq study (n = 307), we excluded 7 noninvasive cancers, 1 duplicate, 61 breast cancers diagnosed outside of a normal screening interval, and 65 breast cancers diagnosed in women not attending screening. The primary analysis dataset included 113 screen-detected and 60 interval cancers (see flow diagram leading to analytical cohort in Supplementary Fig. S1 in the Supplement).

In the validation study (n = 621), breast cancers diagnosed outside a normal screening interval (n = 91) and breast cancers diagnosed in women not attending screening (n = 310) were excluded. The final dataset for validation included 109 screen-detected and 111 interval cancers (diagnosed within 24 months after a negative screen).

Sample preparation, sequencing, and microarrays

Isolation of total RNA and DNA from fresh-frozen tumor samples in both the discovery and validation studies was performed according to standard methods (AllPrep DNA/RNA/Protein Mini and RNeasy Mini Kit from Qiagen, respectively, see Supplementary Methods in the Supplement for details). DNA sequencing libraries were used for low-pass, whole-genome sequencing as well as deep sequencing of a custom panel of 516 cancer-related genes in Clinseq (Supplementary Table S1 in Supplement). The validation study was profiled using NuGEN amplification protocol and hybridized using the HRSTA-2.0 custom human Affymetrix array (13, 14). Details of the custom array are available at NCBI GEO depository as GPL10379. The corresponding array data have been deposited at the Gene Expression Omnibus Database under accession numbers GSE48091 and GSE81954.

Bioinformatic processing of sequencing data in parent Clinseq study

Details on methods for this section are described in the study by Rantalainen and colleagues (12) and our complete analysis pipeline is described in Supplementary Methods in the Supplement.

Assigning intrinsic (PAM50) subtypes

For the Clinseq dataset, intrinsic subtyping was performed on the basis of RNA-seq data classifying tumors into luminal A, luminal B, HER2-enriched, or basal-like. Subtypes were assigned using the research-based 50-gene prediction analysis of microarray (PAM50) gene set (13, 17, 18). For the validation dataset, intrinsic subtyping was carried out using microarray data.

Statistical analysis

Somatic mutation profiling.

The Fisher exact test was used to examine frequency differences in somatic mutations between screen-detected and interval cancer. Genes found to be associated with nominal P < 0.05 were further analyzed in exact logistic regression models (elrm package in R) adjusting for PMD in quartiles and PAM50 subtype.
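To make the first step concrete, a two-sided Fisher exact test on a 2 × 2 table of mutated versus nonmutated tumors can be written with the standard library alone. This is a minimal sketch, not the code used in the study (which was run in R); the function name is hypothetical, and the example counts for TP53 are taken from Table 2 under the assumption that no tumors had missing mutation calls.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability (exact integer arithmetic).
        return comb(col1, x) * comb(col2, row1 - x)

    lo, hi = max(0, row1 - col2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# TP53: 25 of 60 interval cancers and 27 of 113 screen-detected cancers mutated.
p_tp53 = fisher_exact_two_sided(25, 35, 27, 86)
```

Working with integer weights avoids floating-point ties when deciding which tables count as "at least as extreme" as the observed one.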

Somatic copy number aberrations.

Copy number loss and gain were determined from the BICseq output as segments having “log2.copyRatio” < log2(0.75) with “log10.pvalue” < log10(0.0001) and “log2.copyRatio” ≥ log2(1.25) with “log10.pvalue” < log10(0.0001), respectively. In total, there were 27,542 valid segments. The impact of common copy number aberrations (CNA; frequency ≥ 5%) was assessed by burden analysis of segmental copy number variation (CNV) data in Plink. To identify CNAs at specific genomic locations, the segments were mapped to Ensembl stable identifiers (referred to as probes). The Fisher exact test was used to compare copy number events by interval cancer status. Multiple testing corrections were performed via false discovery rate (FDR) estimation using the Benjamini–Hochberg procedure (19). The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) breast invasive carcinoma segmented CNV profiles for 1,088 tumors before and after removing common germline CNV were used as reference.
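The gain and loss thresholds stated at the start of this paragraph can be expressed as a simple rule applied to each segment. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes each segment is summarized by the two BICseq fields named in the text.

```python
import math

LOSS_RATIO = math.log2(0.75)       # log2 copy ratio below this is a candidate loss
GAIN_RATIO = math.log2(1.25)       # log2 copy ratio at or above this is a candidate gain
MAX_LOG10_P = math.log10(0.0001)   # segment must have log10 P below this (about -4)


def call_segment(log2_copy_ratio: float, log10_pvalue: float) -> str:
    """Label one segment as 'loss', 'gain', or 'neutral'."""
    if log10_pvalue >= MAX_LOG10_P:
        return "neutral"           # not significant enough to call
    if log2_copy_ratio < LOSS_RATIO:
        return "loss"
    if log2_copy_ratio >= GAIN_RATIO:
        return "gain"
    return "neutral"
```

Segments whose copy ratio lies between 0.75 and 1.25 stay "neutral" regardless of their P value, which mirrors the asymmetric pair of conditions in the text.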

Differential gene expression analysis.

Differential analysis of count data features between screen-detected and interval cancers in Clinseq was performed using the DESeq2 package in R. We first carried out univariate analyses to identify genes which are differentially expressed between screen-detected and interval cancers (without PAM50 subtype information). Differences in gene expression were considered significant if FDR < 0.05 and absolute fold change ≥ 1.5. We subsequently carried out multivariable analyses to see whether any of the selected genes were differentially associated with screen-detected and interval cancers independently of PAM50 subtype.
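The significance rule above (FDR < 0.05 and absolute fold change ≥ 1.5) can be illustrated with a plain implementation of the Benjamini–Hochberg adjustment followed by a fold-change filter. This is a sketch under stated assumptions rather than a reproduction of the DESeq2 workflow: DESeq2 also applies independent filtering before adjustment, the function names are hypothetical, and fold changes are assumed to be supplied on the log2 scale.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_sde(fdr: float, log2_fold_change: float,
           fdr_cutoff: float = 0.05, fold_cutoff: float = 1.5) -> bool:
    """Apply the FDR and absolute fold-change thresholds to one gene."""
    return fdr < fdr_cutoff and abs(log2_fold_change) >= math.log2(fold_cutoff)
```

Because the fold-change cutoff is applied to the absolute log2 value, a gene that is 1.5-fold lower in interval cancers passes the filter in the same way as one that is 1.5-fold higher.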

The limma package in R was used for differential expression analysis of the validation data. As the validation study does not have information on mammographic density, only PAM50 subtype was included in the multivariable model. A significantly differentially expressed (SDE) gene in the Clinseq analysis was considered to be replicated if the nominal P value associated with the differentially expressed gene in the validation dataset is less than 0.05 and the log2 fold change is in the same direction. To additionally investigate whether SDE genes in the Clinseq analysis collectively show a statistically significant, concordant difference between screen-detected (reference) and interval cancer tumors in the replication dataset, we analyzed the enrichment of a gene set containing 447 SDE genes identified in Clinseq in the validation dataset using Gene Set Enrichment Analysis (GSEA) using default parameters (20). We also carried out cluster analysis to group/classify tumor tissues, based on the identified SDE genes. Principal component analysis (PCA) was performed to examine sample relations. To evaluate cumulative changes in the expression of groups of multiple genes defined on the basis of prior biological knowledge (i.e., pathway analysis), we used GSEA Preranked (classic scoring scheme; ref. 20).
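The replication criterion described in this paragraph reduces to two checks per gene. The following sketch uses a hypothetical function name and assumes that log2 fold changes from both datasets use the same reference group (screen-detected).

```python
def is_replicated(discovery_log2fc: float, validation_log2fc: float,
                  validation_p: float, alpha: float = 0.05) -> bool:
    """A gene replicates if the validation P value is nominally significant
    and the log2 fold change has the same sign in both datasets."""
    same_direction = discovery_log2fc * validation_log2fc > 0
    return validation_p < alpha and same_direction
```

A gene with a nominally significant P value but an opposite-direction estimate in the validation data would not count as replicated under this rule.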

From the parent Clinseq study, we identified 113 screen-detected and 60 interval cancers. Seven percent (n = 8) of the screen-detected cancers were prevalent screens (i.e., first mammogram). Sixty percent (n = 36) of the interval cancers were diagnosed at least 1 year after their latest (negative) screening mammogram. There was no evidence that interval cancers diagnosed closer in time to a negative screen were correlated with more aggressive phenotypes. Similar to what was observed in a larger study on interval cancers and tumor characteristics of which this study is a subset (5), neither PAM50 subtype nor cellular proliferation level was significantly associated with time since negative screen among interval cancers (P > 0.05).

Interval cancers exhibit a more aggressive phenotype

Interval cancers were more common in women with higher mammographic density (Table 1). The PAM50 subtype distribution differed by interval cancer status, with the luminal A signature being less common and subtypes associated with worse prognosis (luminal B and basal-like) being more frequent among interval cancers (Table 1). Interval cancers were also significantly more often associated with higher grade, larger size, higher proliferation levels, and negative for ER and progesterone receptor (PR; Table 1). Because of the enrichment for deadly metastatic breast cancers, the validation study has a comparatively larger proportion of interval cancers. However, we did not observe any large difference when comparing tumor characteristics between screen-detected and interval cancers in the 2 studies. Interval cancers in the validation dataset were similarly significantly associated with more aggressive tumor characteristics (Supplementary Table S2 in Supplement).

Table 1.

Clinicopathologic characteristics of screen-detected (n = 113) versus interval breast cancers (n = 60) in Clinseq (discovery dataset)

| Characteristic | Screen-detected | Interval | OR (95% CI) | P (Wald) | P (trend) |
| --- | --- | --- | --- | --- | --- |
| Age, y | 59.1 (7.7) | 57.9 (6.6) | 0.98 (0.94–1.02) | 0.300 | |
| Percent mammographic density (quartiles, Q) | | | | | |
| Q1 (0.20–10.4) | 31 (27.4) | 10 (16.7) | Reference | | 0.048 |
| Q2 (>10.4–18.8) | 29 (25.7) | 12 (20) | 1.28 (0.48–3.42) | 0.619 | |
| Q3 (>18.8–29.3) | 26 (23) | 15 (25) | 1.79 (0.69–4.65) | 0.233 | |
| Q4 (>29.3–79.2) | 23 (20.4) | 18 (30) | 2.43 (0.95–6.23) | 0.065 | |
| Missing | 4 (3.5) | 5 (8.3) | | | |
| PAM50 | | | | | |
| Luminal A | 73 (64.6) | 23 (38.3) | Reference | | |
| Luminal B | 20 (17.7) | 15 (25) | 2.38 (1.05–5.39) | 0.038 | |
| HER2-enriched | 16 (14.2) | 11 (18.3) | 2.18 (0.89–5.36) | 0.089 | |
| Basal-like | 4 (3.5) | 11 (18.3) | 8.73 (2.53–30.06) | 0.001 | |
| ER | | | | | |
| Positive | 101 (89.4) | 45 (75) | Reference | | |
| Negative | 11 (9.7) | 15 (25) | 3.14 (1.29–7.61) | 0.011 | |
| Missing | 1 (0.9) | 0 (0) | | | |
| PR | | | | | |
| Positive | 80 (70.8) | 30 (50) | Reference | | |
| Negative | 32 (28.3) | 29 (48.3) | 2.72 (1.37–5.39) | 0.004 | |
| Missing | 1 (0.9) | 1 (1.7) | | | |
| HER2 | | | | | |
| Negative | 88 (77.9) | 48 (80) | Reference | | |
| Positive | 24 (21.2) | 11 (18.3) | 0.84 (0.38–1.86) | 0.668 | |
| Missing | 1 (0.9) | 1 (1.7) | | | |
| Elston–Ellis grade | | | | | |
| Well-differentiated | 22 (19.5) | 4 (6.7) | Reference | | 0.023 |
| Moderately differentiated | 51 (45.1) | 25 (41.7) | 2.70 (0.84–8.67) | 0.096 | |
| Poorly differentiated | 40 (35.4) | 28 (46.7) | 3.85 (1.20–12.4) | 0.024 | |
| Missing | 0 (0) | 3 (5) | | | |
| Largest tumor size, mm | | | | | |
| <20 | 53 (46.9) | 18 (30) | Reference | | 0.014 |
| 20–49 | 56 (49.6) | 36 (60) | 1.89 (0.96–3.73) | 0.066 | |
| ≥50 | 4 (3.5) | 6 (10) | 4.42 (1.12–17.44) | 0.034 | |
| Proliferation index (Ki-67) | | | | | |
| Low (<20%) | 54 (47.8) | 18 (30) | Reference | | |
| High (≥20%) | 57 (50.4) | 40 (66.7) | 2.11 (1.08–4.11) | 0.029 | |
| Missing | 2 (1.8) | 2 (3.3) | | | |

NOTE: Mean and SD are shown within parentheses for age at diagnosis in years; count and percent proportion are shown for categorical variables. Association with interval cancer status was tested using binomial logistic regression for each characteristic separately. OR and corresponding 95% confidence intervals (CI) of Wald tests and P values for trend tests (where appropriate) were reported.
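For a single categorical characteristic, the OR and Wald interval described in the note can be reproduced from the counts alone, because the logistic regression estimate for one level versus the reference equals the log of the cross-product ratio. The sketch below is an illustration with a hypothetical function name, using the luminal B versus luminal A counts from Table 1 (interval cancers as cases, screen-detected cancers as the comparison group).

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_wald(level_cases: int, level_controls: int,
                    ref_cases: int, ref_controls: int, level: float = 0.95):
    """Odds ratio and Wald confidence interval from a 2x2 table of counts."""
    odds_ratio = (level_cases * ref_controls) / (level_controls * ref_cases)
    # Standard error of the log odds ratio.
    se = sqrt(1 / level_cases + 1 / level_controls + 1 / ref_cases + 1 / ref_controls)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Luminal B: 15 interval, 20 screen-detected; luminal A: 23 interval, 73 screen-detected.
or_lumb, lo_lumb, hi_lumb = odds_ratio_wald(15, 20, 23, 73)
```

With these counts, the estimate rounds to the tabulated luminal B entry of 2.38 (1.05–5.39). The same shortcut does not apply to continuous predictors such as age, which require fitting the regression model.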

Differences in somatic mutation frequencies

The most frequently mutated genes included PIK3CA, TP53, GATA3, MAP3K1, CHD1, and KMT2C (Supplementary Table S1 in Supplement), which is in agreement with previous reports on genes mutated in breast cancer (21). Three genes, namely, TP53, PPP1R3A, and KMT2B, were found to be significantly more often mutated in interval cancers compared with screen-detected cancers (P < 0.05, Fig. 1 and Table 2). Improved statistical significance (smaller P values) was observed for all 3 genes in the model adjusted for PMD to reduce masking effect (Table 2). Only KMT2B remained significantly associated (P = 0.017) after further adjustment for PAM50 subtype (Table 2).

Figure 1.

Distribution of mutations for 3 genes found to have significant frequency differences (P < 0.05) by interval cancer status. Each column denotes one subject. The corresponding distribution of PAM50 subtype among subjects with mutations is shown.

Table 2.

Results for tests of association between somatic mutations in cancer genes and interval cancer status (P < 0.05)

| Gene | Screen-detected (n, %) | Interval (n, %) | Fisher P | +PMD | +PMD, PAM50 |
| --- | --- | --- | --- | --- | --- |
| TP53 | 27 (23.9) | 25 (41.7) | 0.023 | 0.020 | 0.710 |
| PPP1R3A | 0 (0) | 3 (5.0) | 0.040 | 0.015 | 0.053 |
| KMT2B | 1 (0.9) | 4 (6.7) | 0.050 | 0.033 | 0.017 |

NOTE: Exact logistic regression was performed adjusting for percent mammographic density (+PMD) and both PMD and PAM50 subtype (+PMD, PAM50).

Differences in CNAs

The general distribution of CNA was similar between Clinseq tumors and TCGA data (Supplementary Fig. S2 in Supplement), which was measured experimentally using another technology (Affymetrix Genome-Wide Human SNP Array 6.0). The global burden of common CNAs in interval compared with screen-detected cancers was found to be significantly different with respect to the number of CNAs per sample (P = 0.050, 10,000 permutations). When CNAs were examined at the probe level, copy number gain and loss frequencies in 1,704 and 276 probes, respectively, were found to be different by interval cancer status; however, none would survive correction for multiple testing (FDR ≤ 0.05, Fig. 2A). CNA (uncorrected P < 0.05) with a difference in frequency of at least 15% between the 2 tumor groups (n = 429) in our data included gains in 17q23-q25.3 (Fig. 2B) and losses in 16q24.2 (Fig. 2C). The proportion of interval cancer tumors with a gain event in the significant 17q region ranged from 25.0% to 38.3%, compared with 8.0% to 20.4% in screen-detected tumors (smallest P = 0.002). Loss events were more frequently observed among screen-detected cancers at 16q24.3 (22.1%) than among interval cancers (6.7%, smallest P = 0.01). After adjusting for PMD and PAM50 subtype, 312 of the 429 CNA (72.7%) with a difference in frequency of at least 15% between the 2 tumor groups remained significant (P < 0.05; data not shown). However, none of the individual CNA probes exhibited differences that survive correction for multiple testing.

Figure 2.

Copy number profile of screen-detected (n = 113) and interval cancers (n = 60). A, Manhattan plot of the P values obtained from Fisher exact test comparing gain or loss of copy number events between screen-detected and interval cancers in Clinseq. Blue line denotes P < 0.05, and red line denotes P < 0.01. Green dots denote a minimum of 15% difference in proportion between screen-detected and interval cancers. Chromosomal plots for chromosomes 17 (B) and 16 (C) containing significant CNAs with a difference in proportion of at least 15% between the 2 tumor groups. Proportions (%) of loss and gain events are shown for screen-detected (black line) and interval cancers (red line). TCGA breast carcinoma data including common germline CNV (n = 1,088, light blue line) and TCGA breast carcinoma data excluding common germline CNV (dark blue line). Dotted green lines denote P < 0.05.

Differences in gene expression

In total, 17,136 genes passed default independent filtering by DESeq2, 447 of which were shown to be significantly differentially expressed between screen-detected and interval cancers in the univariate analysis. With these 447 SDE genes, we performed an analysis to group tumors with similar biology in our primary data. Four stable clusters were revealed (Fig. 3A), which were more associated with PAM50 subtype, ER, PR, HER2, grade, and proliferation level than interval cancer status (Fig. 3B and Supplementary Table S3 in Supplement). From the PCA plot in Fig. 3C, screen-detected and interval cancers were found to be largely overlapping on the basis of their expression profiles of the 447 SDE genes identified, but visible separation by interval cancer status can be observed at the extreme ends (left and right) of the plot, suggesting that there are transcriptomic differences between the 2 tumor groups. From the same plot, we observed a continuum of less aggressive to more aggressive PAM50 subtypes, namely, luminal A, luminal B, HER2-enriched, and basal-like, in that order.

Figure 3.

Transcriptome profile of 60 interval breast versus 113 screen-detected breast cancers in Clinseq. A, Cluster analysis of 447 significantly differentially expressed genes (FDR < 0.05 and ≥1.5-fold change) identified 4 stable clusters. Dark blue regions indicate cluster partitions for samples that always cluster together (high consensus), and white areas indicate partitions with low consensus. Cluster membership is depicted above the heatmap. B, Heatmap of gene expression profiles. Samples are ordered by cluster membership. C, PCA plot to visualize sample-to-sample distances. D, Ranks of 334 overlapping genes in the validation study—an independent set of 109 screen-detected and 111 interval cancer samples profiled with expression arrays. The enrichment score (ES) reflects the degree to which significantly differentially expressed genes in Clinseq are overrepresented at the top or bottom of a ranked list of genes in the validation study. The middle portion of the plot shows where the candidate genes appear in the ranked list of genes. The bottom portion of the plot shows the value of the ranking metric as one moves down the list of ranked genes. The ranking metric measures a gene's correlation with a phenotype. A positive value indicates correlation with interval cancer, and a negative value indicates correlation with screen-detected cancer. An accumulation at the extremes indicates an agreement between the 2 lists.

External validation in independent dataset

For 334 of the 447 SDE genes, gene expression analysis results were available from an external gene expression dataset (Supplementary Fig. S3 in Supplement). In the independent validation dataset, 120 of these genes (35.9%) were individually associated with an estimate in the same direction (P < 0.05, data not shown). Collectively, the 334 overlapping genes considered as one gene set were found to be significantly enriched in the validation dataset (GSEA, P = 0.005, FDR = 0.024; Fig. 3D). Six SDE genes replicated in the validation dataset belonged to the PAM50 gene set. After adjusting for PMD and PAM50 subtype, only one replicated SDE gene, IGF2BP3, remained significantly associated with interval cancer status in both studies.

Pathway analysis

In the univariate analysis, 19 gene signatures were found to be significantly associated with FWER < 0.05 in Clinseq, of which 15 (78.9%) were replicated with nominal P < 0.05 in the independent study (Supplementary Table S4 in Supplement). After adjusting for PMD and PAM50 subtype, only one subgroup of genes [Hallmark_Estrogen_Response_Early, systematic name in the Molecular Signatures Database (MSigDB, v5.0): MM5906] remained significantly downregulated in interval cancers compared with screen-detected cancers in both datasets.

In this study, we profiled 113 screen-detected and 60 interval cancer tumors to reveal the spectrum of genomic sequence, architecture, and transcriptomic alterations that can distinguish between the 2 groups of breast cancer. A unique aspect of our study was that mammographic density was controlled for, suggesting that many of the interval tumors were not missed during screenings, but were indeed aggressive and arose to become detectable between screens. In the univariate analyses, TP53, PPP1R3A, and KMT2B were significantly more frequently mutated in interval cancers than in screen-detected cancers. CNAs with a frequency difference of at least 15% between the 2 groups included gains in 17q23-q25.3 and losses in 16q24.2. Gene expression analysis identified 447 SDE genes. Of these genes, 334 could be tested in an independent microarray dataset, out of which 120 were replicated. After adjusting for PMD and PAM50 subtype, the majority of differences between screen-detected and interval cancers in our data were, however, no longer significant, implying that they were not independent of breast cancer subtype as classified by PAM50.

Compared with screen-detected cancers, interval cancers were significantly associated with PMD and more aggressive clinical features in our study, replicating earlier work by us and others (2–5). The aggressiveness of interval cancer highlights the need to understand biologic properties that distinguish it from screen-detected cancer. The highly sensitive targeted next-generation sequencing approach used in Clinseq detected a larger proportion of acquired mutations in interval cancer tumors for TP53, PPP1R3A, and KMT2B.

TP53 is a tumor suppressor that is commonly mutated in many cancer types, including breast cancer (22). Mutations in this gene have been reported to be more common in basal-like and HER2-enriched tumors (22–24). Adjusting for PMD and PAM50 subtype removed the significant association between somatic mutations in TP53 and interval cancer, suggesting that the aggressive nature of interval cancer may be largely explained by aggressive subtypes associated with higher proliferation.

As Clinseq is a pan-cancer initiative, the cancer gene list targeted is not exclusive to breast cancer. Breast is not the most frequently mutated cancer site for PPP1R3A and KMT2B. PPP1R3A has been implicated in tumor progression in colorectal cancer, and KMT2B mutations are common in tumors of the pancreas, liver, lung, stomach, brain, bladder, endometrium, and large intestine (23, 25, 26). Knockdown of KMT2B has been shown to decrease the proliferation of prostate cancer cells in vitro (27). As was the case for TP53, adjustment for PAM50 subtype removed the significant association for PPP1R3A. In contrast, KMT2B remained significantly associated with interval cancer status, suggesting the presence of a mechanism underlying the aggressive nature of such tumors that is independent of subtype. It may be interesting to examine whether interval breast cancers have more biology in common with other cancers (i.e., cancers in which KMT2B mutations are more frequent) than screen-detected breast cancers.

Overall, interval cancers were significantly associated with an increased CNA burden. The close relationship between interval cancer status and PAM50 subtype is also reflected in certain CNAs. Noteworthy regions in our analyses included gains in 17q23-q25.3 and losses in 16q24.2, both of which have previously been found to be differentially associated with breast cancer intrinsic subtypes (28–30). A majority of these CNAs remained significant after adjusting for PAM50 in our data, suggesting that they may contribute substantially to the different genetic etiologies of screen-detected and interval cancers, over and above the effects of PAM50 subtype. However, because they did not survive correction for multiple testing, our results on specific CNAs at 17q23-q25.3 and 16q24.2 will need independent confirmation in larger datasets.
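As an illustration of how nominally significant regions can fail to survive multiple-testing correction, the sketch below applies the Benjamini-Hochberg step-up adjustment (19) to a handful of hypothetical P values. The P values and the 0.05 threshold are assumptions chosen for illustration, and the exact correction applied to the CNA results is not restated in this section.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal P values for five regions (not results from the study).
nominal = [0.04, 0.03, 0.02, 0.50, 0.045]
q_values = benjamini_hochberg(nominal)

for p, q in zip(nominal, q_values):
    print(f"nominal P = {p:.3f}  adjusted = {q:.4f}")
```

Four of the five hypothetical regions are below 0.05 before adjustment, and none is below 0.05 afterward, which is the situation described for the specific CNAs above.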

Although distinct clusters were obtained from the expression of genes that were differentially expressed between screen-detected and interval cancers, they were more strongly correlated with PAM50 subtype and other tumor features than with interval cancer status itself. On the basis of the transcriptomic profiles of the 447 SDE genes identified, the molecular differences between screen-detected and interval cancers appear to form a spectrum from less aggressive to more aggressive manifestations of the disease, which can be characterized by PAM50 subtypes. Consequently, only one gene (IGF2 mRNA-binding protein 3, IGF2BP3) was found to be significantly differentially expressed between screen-detected and interval cancers in both Clinseq and a validation dataset after adjusting for PMD and PAM50 subtype. In line with our finding that intrinsic subtypes explain most of the biologic differences between screen-detected and interval cancers, it has been proposed that IGF2BP3, a protein-coding gene that is highly expressed in cancer (31), may be an additional basal-type marker in breast carcinoma (32).

The only comparable study of gene expression differences between screen-detected and interval cancers, by Rojo and colleagues, was performed in 10 samples (33). The authors found the mTOR signaling pathway to be significantly upregulated in interval cancers and concluded that this pathway may mediate their aggressiveness. In agreement, when multiple genes were considered en masse, 2 hallmark gene sets related to mTOR were found to be upregulated among interval cancers compared with screen-detected cancers in both Clinseq and the validation dataset. However, the associations were no longer significant after adjusting for PMD and PAM50 subtype in our study. After controlling for PAM50 subtype, the only hallmark gene set that remained significantly downregulated in interval cancers compared with screen-detected cancers in both datasets was a subgroup of genes defining early response to estrogen. This result is not surprising, as a large proportion of interval cancers are typically ER-negative. Considering that the gene set consists of 200 members, the overlap of effects between ER status captured by PAM50 subtype and estrogen early responsive genes may not be complete.

A limitation of this study is that the definition of interval cancer depends on the duration between 2 screens, which in turn depends on the recommendations of each screening program. For example, the new 2015–2016 American Cancer Society guidelines suggest getting annual mammograms between ages 45 and 54 and every 2 years thereafter. In the United Kingdom NHS Breast Screening Programme, women are invited for screening every 3 years. It is important to note that if the screening interval is sufficiently long, all cancers will ultimately become interval cancers. However, most of the previous studies reporting a more aggressive nature of interval cancers have defined them using the same biennial interval used in this study.

A comprehensive picture of somatic changes that drive tumors to become symptomatic in the screening interval can improve understanding of the biology underlying this aggressive subset of breast cancer. In summary, we observed that molecular differences between screen-detected and interval cancers were present but were largely explained by PAM50 subtypes. This work clarifies what type of breast cancer we are likely to identify through population-based screening and what type of cancer we are likely to miss. Future work looking into within-subtype differences may help clarify whether there are specific genomic differences that were masked by aggregating all subtypes together.

No potential conflicts of interest were disclosed.

Conception and design: J. Li, D. Klevebring, J. Holm, H. Grönberg, K. Czene

Development of methodology: J. Li, D. Klevebring, J. Lindberg

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Li, D. Klevebring, N.P. Tobin, L.S. Lindström, J. Holm, G. Prochazka, S. Törnberg, J. Hartman, J. Frisell, J. Lindberg, P. Hall, J. Bergh, H. Grönberg, K. Czene

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Li, E. Ivansson, D. Klevebring, N.P. Tobin, L.S. Lindström, C. Cristando, J. Palmgren, K. Humphreys, M. Rantalainen, J. Lindberg, K. Czene

Writing, review, and/or revision of the manuscript: J. Li, E. Ivansson, N.P. Tobin, L.S. Lindström, J. Holm, C. Cristando, J. Palmgren, S. Törnberg, K. Humphreys, J. Hartman, J. Frisell, M. Rantalainen, J. Bergh, K. Czene

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Holm, J. Frisell, P. Hall

Study supervision: H. Grönberg, K. Czene

We thank Gustaf Rosin for collecting data from medical records and John Lövrot for help in processing the validation dataset.

This work was financed by the Swedish Research Council (grant no: 2014-2271, 521-2014-2057), the Swedish Cancer Society (grant no: CAN 2013/469), the Stockholm County Council (grant no: LS 1211-1594), the Cancer Society in Stockholm (grant no: 141092), and the Breast Cancer Theme Centre Consortium (BRECT). J. Li is a UNESCO-L'Oréal International Fellow and a recipient of an award from the Alex and Eva Wallström Foundation. This study was supported by the Cancer Risk Prediction Center (CRisP; www.crispcenter.org) and a Linnaeus Centre (Contract ID: 70867902) financed by the Swedish Research Council.

These funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of this article; and decision to submit the manuscript for publication.

1. DeGroote R, Rush BF Jr, Milazzo J, Warden MJ, Rocko JM. Interval breast cancer: a more aggressive subset of breast neoplasias. Surgery 1983;94:543–7.
2. Rayson D, Payne JI, Abdolell M, Barnes PJ, MacIntosh RF, Foley T, et al. Comparison of clinical-pathologic characteristics and outcomes of true interval and screen-detected invasive breast cancer among participants of a Canadian breast screening program: a nested case-control study. Clin Breast Cancer 2011;11:27–32.
3. Domingo L, Sala M, Servitja S, Corominas JM, Ferrer F, Martinez J, et al. Phenotypic characterization and risk factors for interval breast cancers in a population-based breast cancer screening program in Barcelona, Spain. Cancer Causes Control 2010;21:1155–64.
4. Porter PL, El-Bastawissi AY, Mandelson MT, Lin MG, Khalid N, Watney EA, et al. Breast tumor characteristics as predictors of mammographic detection: comparison of interval- and screen-detected cancers. J Natl Cancer Inst 1999;91:2020–8.
5. Holm J, Humphreys K, Li J, Ploner A, Cheddad A, Eriksson M, et al. Risk factors and tumor characteristics of interval cancers by mammographic density. J Clin Oncol 2015;33:1030–7.
6. Shen Y, Yang Y, Inoue LY, Munsell MF, Miller AB, Berry DA. Role of detection method in predicting breast cancer survival: analysis of randomized screening trials. J Natl Cancer Inst 2005;97:1195–203.
7. Mook S, Van 't Veer LJ, Rutgers EJ, Ravdin PM, van de Velde AO, van Leeuwen FE, et al. Independent prognostic value of screen detection in invasive breast cancer. J Natl Cancer Inst 2011;103:585–97.
8. Tabar L, Faberberg G, Day NE, Holmberg L. What is the optimum interval between mammographic screening examinations? An analysis based on the latest results of the Swedish two-county breast cancer screening trial. Br J Cancer 1987;55:547–51.
9. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
10. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.
11. Li J, Holm J, Bergh J, Eriksson M, Darabi H, Lindstrom LS, et al. Breast cancer genetic risk profile is differentially associated with interval and screen-detected breast cancers. Ann Oncol 2015;26:517–22.
12. Rantalainen M, Klevebring D, Lindberg J, Ivansson E, Rosin G, Kis L, et al. Is sequencing-based breast cancer diagnostics ready to replace current routine biomarkers? Sci Rep 2016. Under Review.
13. Cunha SI, Bocci M, Lovrot J, Eleftheriou N, Roswall P, Cordero E, et al. Endothelial ALK1 is a therapeutic target to block metastatic dissemination of breast cancer. Cancer Res 2015;75:2445–56.
14. Lindstrom LS, Jauhiainen A, Wilking U, Foukakis T, Åström G, Czene K, et al. Gene signature model predicts metastatic onset better than standard clinical markers – Nested case-control design uniquely enables enrichment for biologically relevant features. Cancer Res 2013;73(24 Suppl):Abstract nr P6-06-17.
15. Lind H, Svane G, Kemetli L, Tornberg S. Breast cancer screening program in Stockholm county, Sweden - aspects of organization and quality assurance. Breast Care 2010;5:353–7.
16. Li J, Szekely L, Eriksson L, Heddson B, Sundbom A, Czene K, et al. High-throughput mammographic-density measurement: a tool for risk prediction of breast cancer. Breast Cancer Res 2012;14:R114.
17. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009;27:1160–7.
18. Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 2002;99:6567–72.
19. Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J R Stat Soc B Met 1995;57:289–300.
20. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
21. Stephens PJ, Tarpey PS, Davies H, Van Loo P, Greenman C, Wedge DC, et al. The landscape of cancer genes and mutational processes in breast cancer. Nature 2012;486:400–4.
22. Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature 2013;502:333–9.
23. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.
24. Schneider BP, Winer EP, Foulkes WD, Garber J, Perou CM, Richardson A, et al. Triple-negative breast cancer: risk factors to potential targets. Clin Cancer Res 2008;14:8010–8.
25. Hayashida Y, Goi T, Hirono Y, Katayama K, Urano T, Yamaguchi A. PPP1R3 gene (protein phosphatase 1) alterations in colorectal cancer and its relationship to metastasis. Oncol Rep 2005;13:1223–7.
26. Rao RC, Dou Y. Hijacked in cancer: the KMT2 (MLL) family of methyltransferases. Nat Rev Cancer 2015;15:334–46.
27. Malik R, Khan AP, Asangani IA, Cieslik M, Prensner JR, Wang X, et al. Targeting the MLL complex in castration-resistant prostate cancer. Nat Med 2015;21:344–52.
28. Natrajan R, Lambros MB, Geyer FC, Marchio C, Tan DS, Vatcheva R, et al. Loss of 16q in high grade breast cancer is associated with estrogen receptor status: Evidence for progression in tumors with a luminal phenotype? Genes Chromosomes Cancer 2009;48:351–65.
29. Jonsson G, Staaf J, Vallon-Christersson J, Ringner M, Holm K, Hegardt C, et al. Genomic subtypes of breast cancer identified by array-comparative genomic hybridization display distinct molecular and clinical characteristics. Breast Cancer Res 2010;12:R42.
30. Toffoli S, Bar I, Abdel-Sater F, Delree P, Hilbert P, Cavallin F, et al. Identification by array comparative genomic hybridization of a new amplicon on chromosome 17q highly recurrent in BRCA1 mutated triple negative breast cancer. Breast Cancer Res 2014;16:466.
31. Mueller-Pillasch F, Lacher U, Wallrapp C, Micha A, Zimmerhackl F, Hameister H, et al. Cloning of a gene highly overexpressed in cancer coding for a novel KH-domain containing protein. Oncogene 1997;14:2729–33.
32. Vranic S, Gurjeva O, Frkovic-Grazio S, Palazzo J, Tawfik O, Gatalica Z. IMP3, a proposed novel basal phenotype marker, is commonly overexpressed in adenoid cystic carcinomas but not in apocrine carcinomas of the breast. Appl Immunohistochem Mol Morphol 2011;19:413–6.
33. Rojo F, Domingo L, Sala M, Zazo S, Chamizo C, Menendez S, et al. Gene expression profiling in true interval breast cancer reveals overactivation of the mTOR signaling pathway. Cancer Epidemiol Biomarkers Prev 2014;23:288–99.