Background: Smoking is a possible risk factor for breast cancer and has been linked to increased risk of estrogen receptor–positive (ER+) disease in some epidemiologic studies. It is unknown whether smoking has quantitative effects on ER expression.

Methods: We examined relationships between smoking and ER expression from tumors of 1,888 women diagnosed with invasive breast cancer from a population-based study in North Carolina. ER expression was characterized using binary (±) and continuous measures for ER protein, ESR1 mRNA, and a multigene luminal score (LS) that serves as a measure of estrogen signaling in breast tumors. We used logistic and linear regression models to estimate temporal and dose-dependent associations between smoking and ER measures.

Results: The odds of ER+, ESR1+, and LS+ tumors among current smokers (at the time of diagnosis), those who smoked 20 or more years, and those who smoked within 5 years of diagnosis were nearly double those of nonsmokers. Quantitative levels of ESR1 were highest among current smokers compared with never smokers overall [mean (log2) = 9.2 vs. 8.7, P < 0.05] and among ER+ cases; however, we did not observe associations between smoking measures and continuous ER protein expression.

Conclusions: In relationship to breast cancer diagnosis, recent smoking was associated with higher odds of the ER+, ESR1+, and LS+ subtype. Current smoking was associated with elevated ESR1 mRNA levels and an elevated LS, but not with altered ER protein.

Impact: A multigene LS and single-gene ESR1 mRNA may capture tumor changes associated with smoking. Cancer Epidemiol Biomarkers Prev; 27(1); 67–74. ©2017 AACR.

Epidemiologic studies have demonstrated distinct risk factor profiles for breast cancer subtypes classified according to estrogen receptor (ER) status (1, 2). In a previous analysis from the Carolina Breast Cancer Study (CBCS), we reported a modest increased risk of ER+ breast cancer in association with prediagnostic smoking (3). Several contemporary epidemiologic studies have reported similar associations (3–5). These findings raise the question of whether smoking exposure could be linked to altered ER expression or pathway activity.

If prediagnostic smoking modulates ER expression, quantitative levels of ER may differ between breast tumors of smokers and nonsmokers. This hypothesis could be tested at the protein level. However, clinical IHC assays are tuned to be maximally sensitive for the detection of ER, which leads to saturated signals and suppression of ER's dynamic expression range (6). RNA assays may have a wider dynamic range and therefore may better capture quantitative changes in ESR1 expression. In addition, multigene scores, such as the PAM50 luminal gene signature (6, 7), capture estrogen signaling across multiple targets and may offer improved resolution when examining smoking in relation to ER expression. By evaluating quantitative variation in ER expression in relation to smoking, we seek to more directly assess a link between smoking and estrogen-related pathways to breast cancer.

In this large population-based study, we evaluated smoking in association with binary classifications and quantitative measures for ER protein, ESR1 mRNA, and a multigene luminal score (LS) that captures estrogen signaling in tumors; we examined temporal and dose-dependent measures of smoking in relation to each biomarker.

Study population

Phase III of the CBCS (CBCS III) is a population-based case-only study that combines epidemiology and molecular biology to examine environmental and genetic risk factors for molecular subtypes of breast cancer (8, 9). To be eligible for inclusion, patients must have been female and received a first and primary diagnosis of breast cancer between May 1, 2008, and October 31, 2013. Patients also must have resided in the 44-county North Carolina study region and been between the ages of 20 and 74 at the time of diagnosis. To examine potential differences by age and race, the CBCS employed a randomized recruitment strategy that was designed to oversample young and African American women (10).

Breast cancer cases were identified by a rapid case ascertainment system, implemented through collaboration between Lineberger Comprehensive Cancer Center (Chapel Hill, NC) and the North Carolina Central Cancer Registry. Briefly, CBCS contacted the patient's primary physician to obtain permission to invite the patient into the study, yielding an overall response rate of 70% and a total of 2,998 women. Study participants were asked to consent to a nurse-administered in-person interview that took place in the study participant's home or another prearranged location. The average time between study enrollment and interview was 6 months. The nurse-administered questionnaire included items on family and personal medical history, reproductive history, smoking, alcohol, diet, medication use, and occupational history. Upon consent, the nurse also collected a blood sample and objective anthropometric measurements of height (m), weight (kg), waist (m), and hip (m) circumference. All study activities and protocols were approved by the Institutional Review Board at the University of North Carolina at Chapel Hill School of Medicine (Chapel Hill, NC). Study participants provided written informed consent, and all research activities were conducted in accordance with the U.S. Common Rule.

Study design

Tumor gene expression analysis.

The CBCS includes protein and RNA expression data on genes involved in estrogen signaling. At the time of interview, investigators asked study participants for written permission to obtain formalin-fixed, paraffin-embedded tumor blocks or tissue slides from the hospital where the diagnostic surgery was to be performed. Tumor blocks were used to construct tissue microarrays for IHC staining, where each patient's tumor was represented by 1 to 4 cores on the microarray. To enrich for tumor cellularity, cores were taken from within a tumor region that was annotated on the tumor block by a pathologist. Hematoxylin and eosin slides were constructed for the top, middle, and bottom portion of each block. Cores were excluded if tumor was not included on top and bottom slides. RNA was extracted using the Qiagen RNeasy FFPE Kit and protocol applied to separate cores or sections from the same tumor block. The current analysis includes data for 1,888 women analyzed for ER protein level by IHC and 993 women analyzed for RNA expression (Table 1).

Table 1.

Age, race, and smoking characteristics of CBCS III study participants included in IHC analysis (Overall: N = 1,888) and the subset of study participants sampled for NanoString analysis (NanoString sampled: n = 993)

| Characteristics | Overall, n (%) | NanoString sampled, n (%) |
|---|---|---|
| Race | | |
| AA | 957 (50.7) | 488 (49.1) |
| Non-AA | 931 (49.3) | 505 (50.9) |
| Age | | |
| <50 | 994 (52.7) | 506 (51) |
| ≥50 | 894 (47.3) | 487 (49) |
| Smoking history | | |
| Never | 1,036 (54.9) | 523 (52.7) |
| Ever | 851 (45.1) | 469 (47.3) |
| Missing | | |
| Smoking status | | |
| Never | 1,036 (54.9) | 523 (52.7) |
| Former | 502 (26.6) | 260 (26.2) |
| Current | 349 (18.5) | 209 (21.1) |
| Missing | | |
| Duration of smoking | | |
| Never | 1,036 (54.9) | 523 (52.7) |
| ≤10 years | 243 (12.9) | 143 (14.4) |
| 11–20 years | 202 (10.7) | 98 (9.88) |
| >20 years | 405 (21.5) | 228 (23) |
| Missing | | |
| Amount smoked | | |
| Never | 1,036 (54.9) | 523 (52.7) |
| <1/2 pack | 332 (17.6) | 184 (18.5) |
| 1/2–1 pack | 340 (18.0) | 187 (18.9) |
| >1 pack | 179 (9.5) | 98 (9.88) |
| Missing | | |
| Time since smoking cessationᵃ | | |
| Never | 1,036 (54.9) | 523 (52.7) |
| <5 years | 440 (23.3) | 260 (26.2) |
| 5–10 years | 63 (3.3) | 31 (3.13) |
| 11–20 years | 136 (7.2) | 67 (6.75) |
| >20 years | 212 (11.2) | 111 (11.2) |
| Missing | | |

Abbreviation: AA, African American.

ᵃTime since smoking cessation with respect to date of diagnosis.

ER protein.

IHC staining of CBCS3 was described by Allott and colleagues (11). Automated quantification of ER protein was performed using a Genie classifier and the Aperio nuclear v9 algorithm (Aperio Technologies; ref. 11). We calculated percent positivity for ER as the percentage of positively stained tumor cells in each core, multiplied by its core-specific weight and summed across all cores per patient [ER weighted (WT) %]. We assigned a cutoff point of ≥10% for “ER+” tumors; 1% to <10% for “ER borderline” tumors; and <1% for “ER−” tumors. For the ER binary classification, “ER borderline” tumors were combined with “ER−” tumors based on our previous observations that borderline tumors shared other molecular features with ER− disease (11).
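The weighting and cutoff points described above can be illustrated with a minimal sketch. The function names, the example cores, and the use of per-core tumor cell counts as weights are assumptions made for illustration; the source does not specify how the core-specific weights were derived.

```python
from typing import Sequence


def weighted_er_percent(percent_positive: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted percent positivity across a patient's cores (0-100 scale).

    Weights are normalised to sum to 1 (an assumption of this sketch).
    """
    if len(percent_positive) != len(weights) or not percent_positive:
        raise ValueError("need one weight per core and at least one core")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(percent_positive, weights)) / total


def classify_er(er_wt_percent: float) -> str:
    """Three-level classification using the cutoff points stated in the text."""
    if er_wt_percent >= 10.0:
        return "ER+"
    if er_wt_percent >= 1.0:
        return "ER borderline"
    return "ER-"


def binary_er(er_wt_percent: float) -> str:
    """Binary classification: borderline tumors are grouped with ER-."""
    return "ER+" if classify_er(er_wt_percent) == "ER+" else "ER-"


# Hypothetical patient with three cores; weights are tumor cell counts.
cores = [80.0, 60.0, 5.0]
tumor_cells = [1000, 500, 500]
er_wt = weighted_er_percent(cores, tumor_cells)
print(er_wt, classify_er(er_wt), binary_er(er_wt))
```

In this sketch a borderline tumor (for example, 5% weighted positivity) is reported as "ER borderline" by the three-level function and as "ER-" by the binary one, mirroring the grouping described in the text.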

ESR1 mRNA.

ESR1 was quantified using NanoString technology (12). Briefly, total ESR1 mRNA counts were assayed using an ESR1-specific molecular probe, which hybridizes to RNA fragments in solution. Hybrids were then fixed to a solid matrix and counted using microscopic imaging, yielding raw mRNA counts. Quality control and data normalization were performed using the NanoStringNorm R package (13). Data were first normalized to the geometric means of 6 internal positive controls and subsequently to the geometric means of 5 reference genes. Normalized ESR1 counts were log2 transformed, yielding a bimodal Gaussian distribution of the data. We used the mclust R package and an unsupervised analysis to classify the two distributions as ESR1− or ESR1+, reflecting low and high expression, respectively (14). ESR1− tumors had log2 values ranging between 0 and 8.35 and ESR1+ tumors had log2 values ranging between 8.38 and 15.64.
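To make the unsupervised classification step concrete, the following is a minimal sketch of a two-component, one-dimensional Gaussian mixture fitted by expectation–maximization. It is written in plain Python for illustration and is not the mclust implementation used in the study; the function names, the initialization scheme, and the example log2 values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values: Sequence[float], n_iter: int = 200
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds). Component 0 is initialised on the lower
    half of the sorted data and component 1 on the upper half.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    half = n // 2
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    sd0 = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n) or 1.0
    sds = [sd0, sd0]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in xs:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in (0, 1)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: update mixing weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a degenerate fit
    return weights, means, sds


def classify_esr1(x: float, weights: Sequence[float],
                  means: Sequence[float], sds: Sequence[float]) -> str:
    """Assign a log2 value to the more probable (low or high) component."""
    hi = 0 if means[0] > means[1] else 1
    lo = 1 - hi
    p_hi = weights[hi] * normal_pdf(x, means[hi], sds[hi])
    p_lo = weights[lo] * normal_pdf(x, means[lo], sds[lo])
    return "ESR1+" if p_hi > p_lo else "ESR1-"


# Hypothetical log2-normalised ESR1 values with a low and a high mode.
log2_esr1 = [5.0, 5.5, 6.0, 6.5, 7.0, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
w, m, s = fit_two_gaussians(log2_esr1)
print([classify_esr1(x, w, m, s) for x in log2_esr1])
```

Classifying by the more probable component, rather than by a fixed cutoff, is what allows the boundary between low and high expression to be learned from the data.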

PAM50 subtype and LS.

Breast cancer–intrinsic subtype was measured using the RNA-based “PAM50 signature” (7). Differential expression of the 50-gene signature was used to categorize breast cancers into 4 intrinsic subtypes: Luminal A, Luminal B, HER2E, and basal-like. Each case was classified based upon highest Pearson correlation with a centroid defined for each subtype. The PAM50 Luminal gene signature is embedded within the larger signature and includes 8 highly correlated genes associated with Luminal-type breast cancers, which are characterized by high ER expression (6, 7). The 8 genes include: BAG1, ESR1, FOXA1, GPR160, NAT1, MAPT, MLPH, and PGR. Each gene was quantified and normalized according to procedures for ESR1, as described above. To calculate the LS, we took the average of the normalized values of the 8 genes. Normalized and transformed values for LS followed a bimodal Gaussian distribution. We used the mclust R package to classify the LS as LS− or LS+, reflecting low and high scores, respectively. LS− tumors had log2 values ranging between 3.26 and 7.57 and LS+ tumors had log2 values ranging between 7.58 and 11.37. ESR1 mRNA and the 8 genes embedded in the LS were assayed along with other genes included in 1 of 3 NanoString batches or code sets. Samples were randomized to batch and all NanoString analyses were adjusted for “code set” to minimize potential batch effects.
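As a small illustration of the score calculation, the sketch below averages normalized log2 values over the 8 luminal genes named above. The function names and expression values are hypothetical. The fixed boundary of 7.58 is taken from the reported LS+ range purely for illustration; the study itself assigned classes with a mixture model rather than a cutoff.

```python
from typing import Mapping

LUMINAL_GENES = ("BAG1", "ESR1", "FOXA1", "GPR160", "NAT1", "MAPT", "MLPH", "PGR")


def luminal_score(log2_expr: Mapping[str, float]) -> float:
    """Mean of normalised log2 expression over the 8 luminal genes."""
    missing = [g for g in LUMINAL_GENES if g not in log2_expr]
    if missing:
        raise KeyError(f"missing luminal genes: {missing}")
    return sum(log2_expr[g] for g in LUMINAL_GENES) / len(LUMINAL_GENES)


def classify_ls(score: float, boundary: float = 7.58) -> str:
    """Illustrative cutoff only: the lower end of the reported LS+ range."""
    return "LS+" if score >= boundary else "LS-"


# Hypothetical normalised log2 values for one tumor.
tumor = {"BAG1": 7.0, "ESR1": 10.0, "FOXA1": 9.0, "GPR160": 8.0,
         "NAT1": 8.0, "MAPT": 7.0, "MLPH": 9.0, "PGR": 6.0}
score = luminal_score(tumor)
print(score, classify_ls(score))
```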

Smoking exposure assessment.

Prediagnostic history of smoking was obtained during the nurse-administered in-person interview and includes data on smoking duration, frequency, and dose. Women in CBCS were considered ever smokers if they smoked at least 100 cigarettes during their lifetimes. Smoking history was defined as “ever” or “never” (history); smoking status was defined as “current,” “former,” or “never” (status); age at smoking initiation was measured in years (initiation); smoking duration was measured as the total number of years of smoking between initiation and current use or cessation (duration); dose was measured as the number of cigarettes smoked per day (dose); and age at smoking cessation was recorded where applicable. Pack-years were defined as a cumulative measure of the number of cigarette packs smoked per day, multiplied by smoking duration in years. Similarly, pack-decades were defined as cumulative measures of cigarette packs smoked per day, over 10-year intervals.
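The cumulative exposure measures can be worked through with a short sketch. It takes pack-years as packs per day multiplied by years smoked (the standard definition) and treats one pack-decade as 10 pack-years. The 20-cigarettes-per-pack conversion and the function names are assumptions of this example, not details given in the source.

```python
CIGARETTES_PER_PACK = 20  # assumed conversion; not stated in the source


def is_ever_smoker(lifetime_cigarettes: int) -> bool:
    """Ever smoker: at least 100 cigarettes smoked over the lifetime."""
    return lifetime_cigarettes >= 100


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Packs smoked per day multiplied by duration of smoking in years."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def pack_decades(cigarettes_per_day: float, years_smoked: float) -> float:
    """Same cumulative exposure expressed per 10-year interval."""
    return pack_years(cigarettes_per_day, years_smoked) / 10.0


# Hypothetical participant: half a pack per day for 20 years.
print(pack_years(10, 20), pack_decades(10, 20))
```

Under these assumptions, half a pack per day for 20 years corresponds to 10 pack-years, or 1 pack-decade, which is the unit used for the per-unit ORs reported later.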

Covariate assessment.

Potential confounders include: first-degree family history of breast cancer defined as breast cancer diagnosis for mother or a full female sibling (15); alcohol consumption defined as having any history of alcohol use (16–18); ever having breast fed (1); body mass index (BMI: kg/m2; ref. 1); parity defined as number of full-term births (1, 16); history of oral contraceptive use (19); hormone replacement therapy use (19); menopausal status; meeting physical activity guidelines; age; and race.

Participants were also asked for permission to obtain pathology reports and medical records from the treating facilities. Clinical and pathologic data abstracted from medical records and pathology reports included tumor size, stage, and node status; these tumor characteristics were considered as potential confounders of the relationship between smoking and ER expression. For all cases, a single pathologist (J. Geradts) determined tumor grade.

Statistical analysis

For binary ER, ESR1, and LS-defined subtype variables, we used generalized logit models to estimate ORs and 95% confidence intervals (CIs) for categorical measures of smoking. To evaluate temporal and dose-dependent associations between smoking and subtype, we first estimated the associations via logistic regression for a 1-unit increase in pack-decades. We compared this cumulative exposure model with an exposure time–windows model (i.e., piecewise logistic regression model) for three time intervals for time since smoking cessation: 0 to 10 years; 11 to 20 years; and >20 years. We used a likelihood ratio test (LRT) to compare the deviances between the two models, the difference of which follows a χ2 distribution with 2 degrees of freedom (df).
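The model comparison can be sketched as follows. For a χ2 distribution with 2 df, the upper-tail probability has the closed form exp(−x/2), so no statistical library is needed. The function names and the deviance values are hypothetical; the three LRT statistics are those reported in Table 2.

```python
import math


def lrt_statistic(deviance_cumulative: float, deviance_windows: float) -> float:
    """Difference in deviance between the nested and the larger model."""
    return deviance_cumulative - deviance_windows


def chi2_pvalue_2df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 df."""
    if statistic < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.exp(-statistic / 2.0)


# Hypothetical deviances for a cumulative model and a time-windows model.
stat = lrt_statistic(1210.4, 1204.0)
print(round(stat, 2), round(chi2_pvalue_2df(stat), 3))

# LRT statistics reported in Table 2 for ER, ESR1, and LS.
for label, value in (("ER", 0.94), ("ESR1", 2.59), ("LS", 6.39)):
    print(label, round(chi2_pvalue_2df(value), 3))
```

Applied to the Table 2 statistics, this calculation gives P values of approximately 0.63, 0.27, and 0.04, in agreement with the reported values.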

To evaluate the hypothesis that odds of ER, ESR1, and LS-positive subtypes vary with time since smoking exposure, we used a generalized logit model with a log normal latency function to calculate time-weighted exposure estimates for the 40-year period preceding breast cancer diagnosis. The latency period between smoking exposure and breast cancer occurrence is thought to be as much as 40 years, and the log normal distribution has been used to describe variation in disease risk with time since exposure (20). Specifically, the lognormal latency function can be used to describe the rise, peak, and decline in risk or log-odds with respect to time since exposure. The highest weights are assigned during the time interval where smoking is associated with the greatest odds of ER+, ESR1+, or LS+ breast cancer and may signify the most etiologically relevant time interval for smoking exposure. The macro used to model the log normal latency function is described in Richardson (2009; ref. 21).
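The idea of time-weighted exposure can be illustrated with a simplified sketch in which each year before diagnosis receives a weight from a lognormal density and the weights are normalized over the 40-year window. This is not the macro described by Richardson (21); the parameter values (mu = ln 5, sigma = 0.5), the normalization, and the function names are assumptions chosen only to show the rise, peak, and decline of the weights.

```python
import math
from typing import List, Sequence


def lognormal_pdf(t: float, mu: float, sigma: float) -> float:
    """Lognormal density at t years before diagnosis (0 for t <= 0)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def latency_weights(mu: float, sigma: float, horizon: int = 40) -> List[float]:
    """Weights for years 1..horizon before diagnosis, normalised to sum to 1."""
    raw = [lognormal_pdf(t, mu, sigma) for t in range(1, horizon + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_exposure(packs_per_day_by_year: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Time-weighted exposure; index i corresponds to i + 1 years before diagnosis."""
    return sum(d * w for d, w in zip(packs_per_day_by_year, weights))


weights = latency_weights(mu=math.log(5.0), sigma=0.5)

# Two hypothetical smokers with the same cumulative dose (10 pack-years).
recent = [1.0] * 10 + [0.0] * 30   # smoked in the 10 years before diagnosis
remote = [0.0] * 30 + [1.0] * 10   # smoked 31 to 40 years before diagnosis
print(weighted_exposure(recent, weights) > weighted_exposure(remote, weights))
```

With these illustrative parameters the weights peak a few years before diagnosis, so two women with identical cumulative exposure receive different weighted exposures depending on when they smoked.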

We used linear regression to model the relationship between continuous measures of ER, ESR1, and LS and categorical measures of smoking, adjusted for age, race, and NanoString batch (where applicable). We calculated the estimated value of continuous biomarker expression for each individual, based on that individual's pattern of smoking exposure and covariates (age, race, and NanoString batch). Expression levels for each biomarker were described according to interquartile range and visualized using boxplots within categories of smoking.

All analyses were conducted using SAS 9.4 (SAS Institute Inc.) and R version 3.3.3.

Figure 1 illustrates the relationships between categorical and continuous measures for LS, ESR1 mRNA, and ER protein. Figure 1A shows distinct clusters for ESR1 and LS, reflecting low and high expression for each. We compared binary classifications for ER protein as measured by IHC with those for ESR1 and LS and observed moderate to good values for sensitivity (se) and specificity (sp; ER vs. ESR1: se = 92%, sp = 86% and ER vs. LS: se = 89%, sp = 85%). Weighted percent ER protein (ER WT%) was positively correlated with log2 values for ESR1 mRNA (r = 0.70, P < 0.01; Fig. 1B); however, ESR1 mRNA appeared to have a greater dynamic range compared with ER WT%. Among ER+ tumors, quantitative protein values tended to saturate the upper end of the percentage range.
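For readers less familiar with these agreement measures, the sketch below computes sensitivity and specificity for an RNA-based binary call against ER IHC status. Treating IHC as the reference standard is an assumption of this example, and the counts are hypothetical rather than study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(reference: Sequence[bool],
                            test: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of `test` calls against `reference` calls."""
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    tp = sum(1 for r, t in zip(reference, test) if r and t)
    fn = sum(1 for r, t in zip(reference, test) if r and not t)
    tn = sum(1 for r, t in zip(reference, test) if not r and not t)
    fp = sum(1 for r, t in zip(reference, test) if not r and t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference call")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls: 10 ER+ and 5 ER- tumors by IHC (True = positive).
er_ihc = [True] * 10 + [False] * 5
esr1_call = [True] * 9 + [False] * 1 + [False] * 4 + [True] * 1
se, sp = sensitivity_specificity(er_ihc, esr1_call)
print(se, sp)
```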

Figure 1.

Correlations between ER protein, ESR1 mRNA, and the multigene LS. A, Includes a scatterplot showing the relationship between ER IHC status, ESR1 mRNA expression (log2), and LS (median centered; n = 993). ER+ breast tumors are colored yellow (≥10% weighted positivity); ER borderline tumors are colored gray (1%–<10% weighted positivity); and ER− tumors are colored dark blue (<1% weighted positivity). B, Includes a scatterplot showing the relationship between ER weighted percent positivity (%), ESR1 mRNA expression (log2), and LS binary classifications (i.e., LS+ and LS−). ER IHC and ESR1 mRNA values were positively correlated (r = 0.70, P < 0.01). An expectation–maximization (EM) algorithm identified two distinct clusters for ESR1 expression (ESR1−, dark blue; ESR1+, yellow).


Categorical measures of prediagnostic smoking history, dose, and duration were associated with increased odds of ER+, ESR1+, and LS+ subtypes (Supplementary Figs. S1–S3). ER+ breast tumors were more common among ever smokers than among never smokers (OR = 1.51; 95% CI, 1.15–1.97), as shown in Supplementary Fig. S1. When stratified by smoking status at the time of diagnosis, current smokers had nearly twice the odds of ER+ breast cancer compared with never smokers (OR = 1.89; 95% CI, 1.33–2.70); former smokers had slightly elevated odds of ER+ breast cancer (OR = 1.25; 95% CI, 0.91–1.73). Women who smoked for 20 years or more had elevated odds of ER+ breast cancer (OR = 1.79; 95% CI, 1.26–2.56). We observed more modestly elevated odds of the ER+ subtype for shorter duration of smoking. Women who smoked <1/2 or 1/2 to 1 packs of cigarettes per day had increased odds of ER+ breast cancer [(OR = 1.48; 95% CI, 1.04–2.10) and (OR = 1.57; 95% CI, 1.09–2.26), respectively]. However, for the highest category of smoking dose (>1 pack/day), we observed a slightly weaker association with ER+ tumors (OR = 1.44; 95% CI, 0.87–2.37). With respect to “time since smoking cessation,” smoking within 5 years of breast cancer diagnosis was associated with approximately 60% higher odds of having ER+ breast cancer (OR = 1.59; 95% CI, 1.15–2.20). In general, we observed similar patterns of association between smoking measures and the ESR1+ and LS+ subtypes (Supplementary Figs. S2 and S3). Notably, the magnitudes of the ORs were slightly higher for the RNA-based measures, particularly for smoking duration and time since smoking exposure.

We also evaluated the distribution of Luminal A, Luminal B, and basal-like intrinsic subtypes with respect to smoking status at the time of diagnosis. We observed a higher frequency of Luminal A versus basal-like tumors among current and former smokers [current: 64.3% vs. never: 54.2%, frequency difference (95% CI) = 10.2% (3.0–20.1) and former: 66.3% vs. never: 54.2%, frequency difference (95% CI) = 12.1% (3.2–21.1)]. When adjusted for age and race, relative frequency of Luminal A tumors remained elevated, but the difference estimates were slightly attenuated. We observed no substantial difference in the proportion of Luminal B versus basal-like breast cancer according to smoking status.

Recency of smoking appeared to alter several of the estrogen-related biomarkers we examined (Table 2). Our cumulative exposure models suggest that a 1-unit increase in pack-decades was associated with a 9% to 18% increase in the odds of having a “positive” subtype: ER+ (OR = 1.09; 95% CI, 0.99–1.20), ESR1+ (OR = 1.18; 95% CI, 1.04–1.34), and LS+ (OR = 1.18; 95% CI, 1.04–1.35). Moreover, for the exposure time–windows models, total pack-decades smoked within 10 years of a breast cancer diagnosis was associated with the greatest odds of having ER+, ESR1+, or LS+ breast cancer when compared with exposure accumulated between 10 and 20 or greater than 20 years prior to diagnosis. However, results from our LRT suggest that the exposure time–windows model provided improved fit over the cumulative exposure model only for LS-defined subtypes (P = 0.04); it did not substantially improve data fit for the ER (P = 0.63) or ESR1 subtypes (P = 0.27).
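A brief worked example may help in reading the per-unit ORs. The sketch converts an OR to a percent change in odds, compounds it over several pack-decades under the log-linear assumption of a logistic model, and recovers the approximate log-odds coefficient and standard error from a reported 95% CI. The function names are hypothetical, and the z value of 1.96 assumes Wald-type intervals, which the source does not state.

```python
import math
from typing import Tuple


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def or_for_units(or_per_unit: float, units: float) -> float:
    """OR for a multi-unit increase, assuming a constant OR per unit."""
    return or_per_unit ** units


def log_odds_and_se(odds_ratio: float, lower: float, upper: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Approximate coefficient and standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Cumulative exposure ORs per pack-decade, as reported in Table 2.
for label, or_, lo, hi in (("ER", 1.09, 0.99, 1.20),
                           ("ESR1", 1.18, 1.04, 1.34),
                           ("LS", 1.18, 1.04, 1.35)):
    beta, se = log_odds_and_se(or_, lo, hi)
    print(label, round(percent_change_in_odds(or_)), round(or_for_units(or_, 2), 2),
          round(beta, 3), round(se, 3))
```

Under the constant-OR assumption, an OR of 1.18 per pack-decade corresponds to an OR of about 1.39 for 2 pack-decades.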

Table 2.

Estimated ORs and 95% CIs for cumulative smoking exposure and ER-defined breast cancer subtypes (associations are described as the trend per pack-decadeᵃ overall and within intervals for time since smoking cessation)

| | ER, n = 1,163/537 (±) | ESR1, n = 608/304 (±) | LS, n = 619/293 (±) |
|---|---|---|---|
| | OR per pack-decade (95% CI) | OR per pack-decade (95% CI) | OR per pack-decade (95% CI) |
| Cumulative exposure | 1.09 (0.99–1.20) | 1.18 (1.04–1.34) | 1.18 (1.04–1.35) |
| Time since smoking cessationᵇ | | | |
| 0–10 years prior | 1.54 (0.74–3.19) | 2.15 (0.82–5.64) | 2.99 (1.11–8.08) |
| 11–20 years prior | 0.83 (0.37–1.89) | 0.87 (0.30–2.55) | 0.79 (0.26–2.35) |
| >20 years prior | 1.08 (0.88–1.32) | 1.07 (0.83–1.39) | 1.01 (0.78–1.31) |
| Test of heterogeneity | | | |
| LRT, 2 dfᶜ | 0.94 | 2.59 | 6.39 |
| P | 0.63 | 0.27 | 0.04 |

NOTE: ORs were estimated using cases with complete covariate data.

Abbreviations: +, positive subtype; –, negative subtype.

ᵃSmoking exposure was modeled as the number of packs smoked per decade. ORs and 95% CIs were derived from unconditional logistic regression models, adjusted for: NanoString batch, age, race, menopausal status, parity, breastfeeding, family history of breast cancer, alcohol use, BMI (kg/m2), physical activity, oral contraceptive use, hormone replacement therapy use, node status, stage, tumor size, and tumor grade.

ᵇTime since smoking cessation with respect to date of diagnosis.

ᶜLRT comparing cumulative and exposure time–windows model, with 2 degrees of freedom.

Current smoking was associated with the greatest odds of the LS+ breast cancer subtype. Figure 2 illustrates variation with time of exposure for the association between pack-decades and LS+ breast cancer for the 40-year period preceding breast cancer diagnosis. Our latency model with log normal weighted exposures demonstrated increased odds of the LS+ subtype for prediagnostic smoking proximal to the time of diagnosis. An LRT comparing the log normal latency model to the cumulative exposure model for the same 40-year period did not suggest that our latency model provided improved fit for the data (LRT = 4.2, 2 df). However, the dose–response parameter estimate in our latency model was statistically significant, thereby suggesting the peak in odds proximal to diagnosis may be the most etiologically relevant time point for smoking and ER+ breast cancer occurrence.

Figure 2.

Odds of LS+ breast cancer with time since smoking cessation. This figure displays variation with time since smoking cessation in the association between pack-decades of cigarettes smoked and LS+ breast cancer. Logistic regression models were adjusted for age and race (n = 993). The dashed blue line indicates the estimated OR for cumulative smoking exposure (pack-decades) for the model described in Table 2 [OR and 95% CI = 1.18 (1.04–1.35)]. The solid dark gray dots indicate point estimates for the association between pack-decades and LS+ breast cancer for each year preceding breast cancer diagnosis over a period of 40 years, with exposure time points weighted using a log normal distribution. The light gray bands represent 95% CIs surrounding point estimates.


RNA levels were also more quantitatively sensitive to differences in smoking history. Supplementary Tables S1 and S2 present estimated biomarker expression values for ER protein, ESR1 mRNA, and the LS, adjusted for age, race, and NanoString batch (where applicable). In general, ER protein levels did not vary across smoking exposures for breast cancer cases overall or when restricted to ER+ cases. Compared with never smokers, we observed the highest levels of ESR1 mRNA and the highest LSs among current smokers [(mean (log2) = 9.2 vs. 8.7, P < 0.05) and (mean (log2) = 8.3 vs. 7.9, P < 0.05), respectively]. When restricted to ER+ breast cancer cases, we still observed higher levels of ESR1 among current smokers; however, the LS association was attenuated. Figures 3 and 4 visualize estimated expression values for ESR1 and LS among “never,” “former,” and “current” smokers. We explored whether the LS and ESR1 levels varied in association with smoking after stratification by age. We found that although ESR1 and LSs were slightly higher among older women, consistent with higher rates of ER+ disease in older women, the general patterns of association with smoking status were similar by age. Likewise, we did not see evidence of effect modification or confounding by race.

Figure 3.

Prediagnostic smoking status and the distribution of ESR1 mRNA. This figure includes boxplots displaying estimated expression values for ESR1 among all cases (A; n = 986) and among ER+ cases (B; n = 644), by prediagnostic smoking status. ESR1 values were estimated from a linear regression model adjusted for age, race, and NanoString batch. *, P < 0.05, where “never” smokers serve as the referent group. Estimated expression values are based on cases with complete covariate data.

Figure 4.

Prediagnostic smoking status and the distribution of LS values. This figure includes boxplots displaying the distribution of LS among all cases (A; n = 986) and among those who were ER+ (B; n = 644), by smoking status. LS values were estimated from a linear regression model adjusted for age, race, and NanoString batch. *, P < 0.05, where “never” smokers serve as the referent group. Estimated expression values are based on cases with complete covariate data.


Findings from our study lend quantitative support to the hypothesis that smoking could be linked to estrogen-mediated pathways in breast tumors. In our case-only study of nearly 2,000 patients, we observed increased odds of the ER+ subtype for temporal and dose-dependent measures of smoking. We also demonstrate quantitative changes in ER-related tumor subtypes characterized by ESR1 mRNA and a multigene LS. Increased odds of ER+, ESR1+, and LS+ subtypes were most apparent among women who were self-reported current smokers at the time of diagnosis. Logistic regression models with latency parameters allowed us to simultaneously model dose, duration, and time of exposure to demonstrate that the most etiologically relevant period for smoking and ER-defined breast cancer may be during prediagnostic smoking closest in time to diagnosis. In addition, we observed that current smoking was associated with increased quantitative levels for ESR1, but not ER protein, which may suggest that RNA more sensitively captures biological differences when compared with ER protein expression.

Contemporary epidemiologic studies have demonstrated positive associations between smoking and ER+ breast cancer with estimates ranging between 10% and 50% increased risk among current or former smokers (3, 4, 22, 23). Our case-only analysis in CBCS phase III demonstrated that, relative to nonsmokers, the odds of having ER+ versus ER− breast cancer were approximately double among current smokers. These findings are consistent with our previous case–control analysis in the CBCS phase I and II, which also showed increased risk of ER+ disease among smokers and heterogeneity of ORs for the luminal (ER+) and basal-like (ER−) subtypes (3, 8). In that study, we observed a positive association between smoking and ER+ risk but no association between smoking and the ER− subtype, a pattern that has been observed in other studies performed in U.S. populations (4, 22). In contrast, other studies of smoking and breast cancer risk in Swedish, Swiss, and Australian populations have demonstrated positive associations between smoking and the ER− breast cancer subtype (24–26). These conflicting observations may reflect temporal differences in exposure or behavioral patterns, or may be an artifact of differing methods used to assay ER expression (e.g., ligand-binding, immunoreactivity). Varying methods used to assay ER protein expression may also result in different thresholds for ER positivity. Thus, a careful investigation of the relationship between smoking and ER-defined breast cancer subtypes should consider era, methodologic approaches, and characteristics of the population of interest.

In both clinical and research settings, IHC has been used as the standard for ER quantification in breast tumors (27). IHC is highly sensitive for the detection of ER protein, serving as an excellent marker to guide clinical decision making. However, protein saturation may preclude studying subtler, quantitative differences in association with etiologic factors. Our study addresses this potential limitation by using ESR1 mRNA counts to characterize breast tumors as ESR1+ and ESR1−. Unlike ER protein expression values for percent positivity, the log2-transformed ESR1 mRNA counts in our study follow a bimodal Gaussian distribution, identifying two distinct classes of breast tumors reflecting low and high expression of ESR1. On the basis of this ESR1 classification, current smoking, smoking duration of more than 20 years, and smoking within 5 years of a breast cancer diagnosis were associated with 2 to 3 times the odds of having an ESR1+ tumor.
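
To show how a bimodal distribution of log2 expression values can yield a binary (+/−) classification, the sketch below fits a two-component Gaussian mixture by expectation-maximization and assigns each value to the more likely component. It is a minimal illustration on simulated data; the function names, starting values, and simulated means are assumptions and do not reproduce the software or parameters used in this study.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """Fit a 1-D, two-component Gaussian mixture by EM; returns (weights, means, sds)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Start from the means of the lower and upper halves of the data
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (len(ordered) - half)]
    spread = (ordered[-1] - ordered[0]) / 4 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, sds


def classify_high(x, weights, means, sds):
    """True if x is more likely under the higher-mean ("+") component."""
    high = 0 if means[0] > means[1] else 1
    dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    return dens[high] > dens[1 - high]


# Simulated log2 counts (hypothetical): a low-expression and a high-expression group
rng = random.Random(0)
values = [rng.gauss(6.0, 0.8) for _ in range(150)] + [rng.gauss(10.0, 0.8) for _ in range(350)]
weights, means, sds = fit_two_component_mixture(values)
print("component means:", sorted(round(m, 2) for m in means))
print("log2 count 10.5 classified as +:", classify_high(10.5, weights, means, sds))
```

The classification threshold here falls where the two weighted component densities cross, so it is driven by the data rather than by a fixed cutoff.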

In addition, our study benefits from the incorporation of a multigene LS embedded in the PAM50 signature, which is used to classify breast tumors according to intrinsic subtype (6, 7). The 8 genes included in the LS are highly correlated with luminal subtypes, which are characterized by high ER expression. Multigene scores may offer improved resolution over single-gene markers: they are often predictive and prognostic, and they may have etiologic relevance by capturing dimensions of estrogen response that a single gene does not. We observed similar patterns of association between smoking and the ER+, ESR1+, and LS+ subtypes. Notably, however, the magnitudes of association were slightly higher for the ESR1 and LS mRNA classifications.

Although the prevalence of cigarette smoking has steadily decreased since the 1950s, approximately 50% of women in the United States report a history of ever smoking and 14% are self-reported current smokers (28). For protracted exposures in studies of etiology, it is important to evaluate measures of dose, duration, and temporality to fully evaluate associations with the outcome. Women with the longest smoking histories in our study were older compared with never smokers and were also most likely to be self-reported current smokers at the time of diagnosis. As such, traditional metrics for smoking in studies of cancer etiology are confounded by age, dose, and duration of exposure, thereby creating a challenge in understanding how a combination of dose and timing influences biomarker expression in breast tumors.

In the current study, we use cumulative and time-varying (latency) models to simultaneously evaluate dose, duration, and timing of exposure; we observed that prediagnostic smoking proximal to time of diagnosis may be associated with increased odds of ER+, ESR1+, and LS+ subtypes. We also observed higher quantitative levels of ESR1 among current smokers and women who smoked within 5 years of breast cancer diagnosis. Thus, the timing of a woman's smoking exposure relative to date of diagnosis may be key to understanding the relationship between smoking and ER-defined breast tumors. Future studies of smoking and breast cancer risk may benefit from statistical methods that can be used to elucidate associations for exposures confounded by time.
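
One simple way to see how timing can be separated from duration is to split each woman's smoking history into windows defined relative to diagnosis. The sketch below does this with an interval-overlap calculation; the window boundaries, ages, and function name are hypothetical and serve only to illustrate the idea, not to reproduce the latency models used in this study.

```python
def years_smoked_in_window(age_start, age_stop, age_dx, window_start, window_end):
    """Years of smoking that fall between window_start and window_end years before diagnosis.

    age_start, age_stop : ages at which smoking began and ended (age_stop <= age_dx)
    age_dx              : age at diagnosis
    window_start/end    : window bounds in years before diagnosis (window_end may be inf)
    """
    # Convert the window from "years before diagnosis" to ages
    window_low_age = age_dx - window_end
    window_high_age = age_dx - window_start
    overlap = min(age_stop, window_high_age) - max(age_start, window_low_age)
    return max(0.0, overlap)


# Hypothetical woman: smoked from age 20 to 52, diagnosed at age 55
windows = [(0, 5), (5, 20), (20, float("inf"))]
exposure_by_window = [years_smoked_in_window(20, 52, 55, lo, hi) for lo, hi in windows]
print(exposure_by_window)
```

Window-specific exposure terms of this kind could then enter a regression model together, so that smoking close to diagnosis is estimated separately from earlier smoking. The windows always partition the full history, so the window-specific years sum to total duration.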

There may also be clinical implications of quantitative changes in estrogen pathway genes. Prospective studies of breast cancer survivors have suggested that smoking exposure prior to diagnosis may influence survival outcomes, presumably through reduced efficacy of ER-targeted therapies (29–31). Researchers have also suggested that fluctuations in endogenous estrogens may influence intrinsic subtyping in premenopausal women (32), so it is plausible that an exogenous exposure like smoking, which could modulate ER expression, may also have implications for intrinsic subtyping and subsequent treatment decisions. At present, IHC biomarkers have the greatest clinical application in breast cancer, although high-sensitivity IHC assays may have somewhat saturated signals, limiting our ability to assess quantitative changes in protein. Notably, we did not observe associations between smoking and quantitative ER protein expression. RNA measures for ESR1 may prove useful in evaluating quantitative changes; however, the feasibility of implementing such measures in clinical settings remains unknown. The current findings suggest that quantitative levels of RNA may have value in research settings designed to understand breast cancer heterogeneity in relation to exposure history. As genomic tests become more widely used, it will be important to understand how sensitive these tests are to smoking behavior and other exposures.

With the unique compilation of population-based observational and biomarker data, CBCS is an ideal resource to examine the association between smoking and ER-defined breast cancer subtypes. Findings from our study add a unique contribution to the body of literature by considering multiple methods to characterize ER-defined breast tumors and by incorporating measures of time, duration, and dose to identify etiologically relevant exposure periods. We also suggest that RNA measures may provide improved resolution of gene expression for studies seeking to evaluate the etiology of ER+ breast tumors. Future work should seek to examine smoking in relation to other proposed biomarkers of breast carcinogenesis and should evaluate other patient exposures as possible modulators of clinical and/or genomic tumor characteristics.

No potential conflicts of interest were disclosed.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Conception and design: E.N. Butler, D.B. Richardson, X. Sun, A.F. Olshan, M.A. Troester

Development of methodology: E.N. Butler, M.A. Troester

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Geradts, A.F. Olshan, M.A. Troester

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E.N. Butler, J.T. Bensen, M. Chen, D.B. Richardson, X. Sun, A.F. Olshan, M.A. Troester

Writing, review, and/or revision of the manuscript: E.N. Butler, J.T. Bensen, K. Conway, D.B. Richardson, X. Sun, J. Geradts, A.F. Olshan, M.A. Troester

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E.N. Butler, X. Sun

Study supervision: K. Conway, A.F. Olshan, M.A. Troester

Research reported in this publication was supported by P50-CA058223, U01-CA179715, and the University Cancer Research Fund, University of North Carolina at Chapel Hill. The work was also supported by F31CA200336 from the NCI of the NIH.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Millikan RC, Newman B, Tse CK, Moorman PG, Conway K, Dressler LG, et al. Epidemiology of basal-like breast cancer. Breast Cancer Res Treat 2008;109:123–39.
2. Trivers KF, Lund MJ, Porter PL, Liff JM, Flagg EW, Coates RJ, et al. The epidemiology of triple-negative breast cancer, including race. Cancer Causes Control 2009;20:1071–82.
3. Butler EN, Tse CK, Bell ME, Conway K, Olshan AF, Troester MA. Active smoking and risk of luminal and basal-like breast cancer subtypes in the Carolina Breast Cancer Study. Cancer Causes Control 2016;27:775–86.
4. Kawai M, Malone KE, Tang MT, Li CI. Active smoking and the risk of estrogen receptor-positive and triple-negative breast cancer among women ages 20 to 44 years. Cancer 2014;120:1026–34.
5. Nyante S, Gierach G, Dallal C, Freedman ND, Park Y, Danforth KN, et al. Cigarette smoking and postmenopausal breast cancer risk in a prospective cohort. Br J Cancer 2014;110:2339–47.
6. Nielsen TO, Parker JS, Leung S, Voduc D, Ebbert M, Vickery T, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res 2010;16:5222–32.
7. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009;27:1160–7.
8. Newman B, Moorman PG, Millikan R, Qaqish BF, Geradts J, Aldrich TE, et al. The Carolina Breast Cancer Study: integrating population-based epidemiology and molecular biology. Breast Cancer Res Treat 1995;35:51–60.
9. McGee SA, Durham DD, Tse CK, Millikan RC. Determinants of breast cancer treatment delay differ for African American and White women. Cancer Epidemiol Biomarkers Prev 2013;22:1227–38.
10. Weinberg CR, Sandler DP. Randomized recruitment in case-control studies. Am J Epidemiol 1991;134:421–32.
11. Allott EH, Cohen SM, Geradts J, Sun X, Khoury T, Bshara W, et al. Performance of three-biomarker immunohistochemistry for intrinsic breast cancer subtyping in the AMBER consortium. Cancer Epidemiol Biomarkers Prev 2016;25:470–8.
12. Geiss GK, Bumgarner RE, Birditt B, Dahl T, Dowidar N, Dunaway DL, et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol 2008;26:317–25.
13. Waggott DM. NanoStringNorm: Normalize NanoString miRNA and mRNA Data. Vienna, Austria: R Core Team; 2015.
14. Fraley C, Raftery AE, Murphy TB, Scrucca L. Normal mixture modeling for model-based clustering, classification, and density estimation. Seattle, WA: University of Washington; 2012.
15. Yang XR, Sherman ME, Rimm DL, Lissowska J, Brinton LA, Peplonska B, et al. Differences in risk factors for breast cancer molecular subtypes in a population-based study. Cancer Epidemiol Biomarkers Prev 2007;16:439–43.
16. Tamimi RM, Colditz GA, Hazra A, Baer HJ, Hankinson SE, Rosner B, et al. Traditional breast cancer risk factors in relation to molecular subtypes of breast cancer. Breast Cancer Res Treat 2012;131:159–67.
17. Williams LA, Olshan AF, Tse CK, Bell ME, Troester MA. Alcohol intake and invasive breast cancer risk by molecular subtype and race in the Carolina breast cancer study. Cancer Causes Control 2016;27:259–69.
18. Park SY, Kolonel LN, Lim U, White KK, Henderson BE, Wilkens LR. Alcohol consumption and breast cancer risk among women from five ethnic groups with light to moderate intakes: the multiethnic cohort study. Int J Cancer 2014;134:1504–10.
19. Kwan ML, Kushi LH, Weltzien E, Maring B, Kutner SE, Fulton RS, et al. Epidemiology of breast cancer subtypes in two prospective cohort studies of breast cancer survivors. Breast Cancer Res 2009;11:R31.
20. Terry PD, Miller AB, Rohan TE. Cigarette smoking and breast cancer risk: a long latency period? Int J Cancer 2002;100:723–8.
21. Richardson DB. Latency models for analyses of protracted exposures. Epidemiology 2009;20:395–9.
22. Gaudet MM, Gapstur SM, Sun J, Diver WR, Hannan LM, Thun MJ. Active smoking and breast cancer risk: original cohort data and meta-analysis. J Natl Cancer Inst 2013;105:515–25.
23. Kabat GC, Kim M, Phipps AI, Li CI, Messina CR, Wactawski-Wende J, et al. Smoking and alcohol consumption in relation to risk of triple-negative breast cancer in a cohort of postmenopausal women. Cancer Causes Control 2011;22:775–83.
24. Manjer J, Malina J, Berglund G, Bondeson L, Garne JP, Janzon L. Smoking associated with hormone receptor negative breast cancer. Int J Cancer 2001;91:580–4.
25. Cooper JA, Rohan TE, Cant EL, Horsfall DJ, Tilley WD. Risk factors for breast cancer by oestrogen receptor status: a population-based case-control study. Br J Cancer 1989;59:119–25.
26. Morabia A, Bernstein M, Ruiz J, Heritier S, Diebold Berger S, Borisch B. Relation of smoking to breast cancer by estrogen receptor status. Int J Cancer 1998;75:339–42.
27. Yaziji H, Taylor CR, Goldstein NS, Dabbs DJ, Hammond EH, Hewlett B, et al. Consensus recommendations on estrogen receptor testing in breast cancer by immunohistochemistry. Appl Immunohistochem Mol Morphol 2008;16:513–20.
28. Agaku IT, King BA, Dube SR, Centers for Disease Control and Prevention. Current cigarette smoking among adults - United States, 2005-2012. MMWR Morbidity Mortality Weekly Report 2014;63:29–34.
29. Daniell HW. Estrogen receptors, breast cancer, and smoking. N Engl J Med 1980;302:1478.
30. Kakugawa Y, Kawai M, Nishino Y, Fukamachi K, Ishida T, Ohuchi N, et al. Smoking and survival after breast cancer diagnosis in Japanese women: a prospective cohort study. Cancer Sci 2015;106:1066–74.
31. Persson M, Simonsson M, Markkula A, Rose C, Ingvar C, Jernstrom H. Impacts of smoking on endocrine treatment response in a prospective breast cancer cohort. Br J Cancer 2016;115:382–90.
32. Bernhardt SM, Dasari P, Walsh D, Townsend AR, Price TJ, Ingman WV. Hormonal modulation of breast cancer gene expression: implications for intrinsic subtyping in premenopausal women. Front Oncol 2016;6:241.