Abstract
We investigated the extent to which estrogen receptor (ER) and progesterone receptor (PR) status results from a centralized pathology laboratory agree with ER and PR results from community pathology laboratories reported to two Surveillance, Epidemiology and End Results (SEER) registries (Los Angeles County and Detroit) and whether statistical estimates for the association between reproductive factors and breast cancer receptor subtypes differ by the source of data. The agreement between the centralized laboratory and SEER registry classifications was substantial for ER (κ = 0.70) and nearly so for PR status (κ = 0.60). Among the four subtypes defined by joint ER and PR status, the agreement between the two sources was substantial for the two major breast cancer subtypes (ER−/PR−, κ = 0.69; ER+/PR+, κ = 0.62) and poor for the two rarer subtypes (ER+/PR−, κ = 0.30; ER−/PR+, κ = 0.05). Estimates for the association between reproductive factors (number of full-term pregnancies, age at first full-term pregnancy, and duration of breastfeeding) and the two major subtypes (ER+/PR+ and ER−/PR−) differed minimally between the two sources of data. For example, parous women with at least four full-term pregnancies had 40% lower risk for ER+/PR+ breast cancer than women who had never been pregnant [centralized laboratory, odds ratio, 0.60 (95% confidence interval, 0.39-0.92); SEER, odds ratio, 0.57 (95% confidence interval, 0.38-0.85)]; no association was observed for ER−/PR− breast cancer (both Ptrend > 0.30). Our results suggest that conclusions based on SEER registry data are reasonably reliable for ER+/PR+ and ER−/PR− subtypes. (Cancer Epidemiol Biomarkers Prev 2009;18(8):2214–20)
Introduction
We previously showed that multiparity and early age at first birth protected against estrogen receptor–positive (ER+)/progesterone receptor–positive (PR+) tumors but not against ER-negative (ER−)/PR-negative (PR−) tumors among women ages 35 to 64 years who participated in the Women's Contraceptive and Reproductive Experiences (CARE) Study, a multicenter population-based case-control study on invasive breast cancer (1, 2). In those analyses, breastfeeding decreased breast cancer risk, regardless of ER and PR status. Like most epidemiologic studies on this topic (3, 4), we used ER and PR status obtained from the Surveillance, Epidemiology and End Results (SEER) registries supplemented by pathology reports at one non-SEER registry.
SEER registries routinely collect laboratory results on ER and PR status from patients' medical records, documenting their breast cancer diagnosis. These reports come from many pathology laboratories throughout the SEER regions. These laboratories may use different methods and/or different cut points for a positive receptor status, raising concern among epidemiologists who use these data to define breast cancer subtypes that receptor status is not classified consistently. Thus far, however, no data have been published on the extent to which potential inaccuracies in reporting receptor subtypes affect the statistical estimates of risk factor associations with breast cancer.
Here, we compare ER and PR status from a centralized pathology laboratory assessment of tumor samples to the status obtained from community pathology testing as recorded in two SEER registries: the Los Angeles County and Detroit registries. We assessed whether different sources of ER and PR results affect the statistical estimates for the associations between subtypes of breast cancer and the fairly well-established reproductive breast cancer risk factors: number of full-term pregnancies, age at first full-term pregnancy, and duration of breastfeeding.
Materials and Methods
Subject Identification
The Women's CARE Study was a population-based case-control study on invasive breast cancer conducted among U.S.-born White and African-American women ages 35 to 64 y who resided in one of five areas of the United States (Atlanta, Detroit, Los Angeles County, Philadelphia, or Seattle; ref. 5). This study is restricted to women who participated in the Los Angeles County or Detroit components of the Women's CARE Study.
Case participants in the Women's CARE Study were diagnosed with a first primary invasive breast cancer between June 1994 and August 1998. Control participants were women with no history of invasive or in situ breast cancer who were identified by random digit dialing. Control participants were frequency matched to the expected distribution of cases in strata defined by 5-y age groups, ethnicity (White or African American) and residence located in the same geographic region. The Women's CARE Study recruited 1,921 cases (1,072 White and 849 African American) and 2,034 control participants (1,161 White and 873 African American) from Los Angeles County and Detroit. The interview response rates were 73.3% for cases in Los Angeles County, 73.7% for controls in Los Angeles County, 74.7% for cases in Detroit, and 74.1% for controls in Detroit.
The Women's CARE Study collected demographic characteristics, complete histories of menstrual and reproductive factors, family history of breast cancer, and information pertaining to other factors from each participant during an in-person interview.
Assessment of ER and PR Status
As part of the Women's CARE Study, paraffin-embedded tumor blocks were obtained for 1,333 cases (Los Angeles County, 919; Detroit, 414), 80% of those requested. All paraffin-embedded tumor blocks were carefully reviewed and evaluated in the laboratory of Dr. Press at the University of Southern California. This laboratory served only as the centralized pathology laboratory and, as a specialized research laboratory, did not contribute to results ascertained by the Los Angeles County SEER registry. We excluded 127 case samples because the material we received contained only carcinoma in situ (n = 56); no tumor tissue (n = 46); only paraffin-embedded, H&E-stained tissue sections (n = 8); insufficient tissue or blocks for the assays (n = 3); or other problems with the tissue (n = 14). We therefore successfully determined ER and PR expression status for 1,206 case subjects (Los Angeles County, 839; Detroit, 367) in the centralized laboratory. In a previous study on the association between percent mammographic density and subtypes of breast cancer, we reported on 352 of the Los Angeles County cases for whom we also had mammograms (6).
The status of ER and PR was determined using previously published immunohistochemical methods (7-9). Immunostaining results for ER and PR expression were interpreted in a blinded fashion and scored semiquantitatively on the basis of the visually estimated percentage of positively stained tumor cell nuclei. The intensity of nuclear staining was scored for individual tumor cell nuclei as negative (-)/no staining, plus one (+1)/weak intensity, plus two (+2)/intermediate intensity, or plus three (+3)/strong intensity. A minimum of 100 tumor cells were scored with the percentage of tumor cell nuclei in each category recorded. In this article, an overall score of ≥1% immunostained tumor cell nuclei was considered as ER+ or PR+ status.
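The ≥1% cut point and the four joint subtypes it produces can be written as a small classification rule. The Python sketch below is illustrative only: the function names and the example percentages are assumptions introduced here, not part of the laboratory protocol, and the plain ASCII "-" stands in for the minus sign used in the text.

```python
def receptor_status(percent_positive, cutoff=1.0):
    """Return '+' when the percentage of immunostained tumor cell nuclei
    meets the cut point (>=1% in this article), otherwise '-'."""
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return "+" if percent_positive >= cutoff else "-"


def joint_subtype(er_percent, pr_percent, cutoff=1.0):
    """Combine the ER and PR calls into one of the four joint subtypes."""
    er = receptor_status(er_percent, cutoff)
    pr = receptor_status(pr_percent, cutoff)
    return f"ER{er}/PR{pr}"


# Hypothetical percentages, for illustration only.
print(joint_subtype(35.0, 0.0))
print(joint_subtype(0.5, 0.0))
```

Because the cut point is a parameter, the same rule could be rerun with a different threshold to see how many tumors change subtype, which is the kind of between-laboratory variation the comparison with registry data is meant to probe.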
Detailed information about the collection of ER and PR status from SEER registries in the Women's CARE Study has been described elsewhere (1). For the 1,206 cases whose ER and PR status were determined in the centralized pathology laboratory, 1,048 (86.9%) had ER status, 926 (76.8%) had PR status, and 919 (76.2%) had status for ER and PR available in the SEER registry record.
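The case counts in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are ours.

```python
# Tumor blocks received, by study site.
blocks_received = {"Los Angeles County": 919, "Detroit": 414}

# Reasons for excluding case samples before the assays.
exclusions = {
    "carcinoma in situ only": 56,
    "no tumor tissue": 46,
    "H&E-stained sections only": 8,
    "insufficient tissue or blocks": 3,
    "other tissue problems": 14,
}

total_blocks = sum(blocks_received.values())
total_excluded = sum(exclusions.values())
assayed = total_blocks - total_excluded

# Cases among those assayed that also had receptor status in the SEER record.
seer_available = {"ER": 1048, "PR": 926, "ER and PR": 919}
coverage = {name: round(100 * count / assayed, 1)
            for name, count in seer_available.items()}

print(total_blocks, total_excluded, assayed)
print(coverage)
```

The totals reproduce the 1,333 blocks, 127 exclusions, and 1,206 assayed cases reported above, and the coverage percentages match the 86.9%, 76.8%, and 76.2% given for the SEER records.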
All Women's CARE participants signed an informed consent before their recruitment; the study was approved by the institutional review boards at the University of Southern California, the Karmanos Comprehensive Cancer Center, the Centers for Disease Control and Prevention, and the City of Hope.
Statistical Analysis
We calculated Cohen's κ statistics and corresponding 95% confidence intervals (95% CI; refs. 10, 11) to evaluate the agreement between ER and PR status from the centralized pathology laboratory and the data from the SEER registries. Landis and Koch (12) provide a benchmark for interpreting the values of κ: values below 0.00 indicate poor agreement, 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect agreement. Furthermore, we calculated κ statistics by study site, tumor characteristics, and cases' demographic characteristics. Homogeneity of κ statistics by study site, tumor characteristics, and cases' demographic characteristics was tested by the homogeneity χ2 test of Fleiss (13).
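For readers who want to see the agreement statistic concretely, the following Python sketch computes Cohen's κ from a square agreement table and maps it onto the Landis and Koch categories. It is a minimal illustration, not the SAS code used in the study: the standard error is a simple large-sample approximation that we assume here (the exact variance formula of refs. 10 and 11 may differ), and the example table is hypothetical.

```python
import math


def cohen_kappa(table):
    """Cohen's kappa and an approximate standard error for a square
    agreement table; table[i][j] counts cases placed in category i by
    one source and in category j by the other."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_p, col_p))
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    # Simple large-sample approximation; an assumption of this sketch.
    se = math.sqrt(p_obs * (1.0 - p_obs) / (n * (1.0 - p_exp) ** 2))
    return kappa, se


def landis_koch(kappa):
    """Label a kappa value with the Landis and Koch benchmark category."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2x2 table (rows: centralized laboratory, columns: registry).
table = [[40, 10], [10, 40]]
kappa, se = cohen_kappa(table)
low, high = kappa - 1.96 * se, kappa + 1.96 * se
print(round(kappa, 2), round(low, 2), round(high, 2))
print(landis_koch(0.70), landis_koch(0.60))
```

The same function accepts a 4x4 table, so it could also be applied to the four joint ER/PR subtypes.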
We assessed the association of breast cancer subtypes defined by the combinations of ER and PR status, with number of full-term (>26-wk gestation) pregnancies, age at first full-term pregnancy (defined for each woman as the age at which that pregnancy ended), and duration of breastfeeding, estimating odds ratios and corresponding 95% CIs using multivariable polychotomous unconditional logistic regression for case-control comparisons. Tests for trend were conducted by fitting ordinal values corresponding to categories of exposure in our models and testing whether the coefficient (slope of the dose response) differed from zero. Separate models estimating the odds ratios (95% CIs) and tests for trend were fit to data from the centralized pathology laboratory and to data from SEER registry reports.
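The structure of the polychotomous comparison, in which every case subtype is contrasted with the same control group, can be illustrated with crude odds ratios. With a single categorical exposure and no covariates, a polychotomous logistic model reproduces these crude subtype-specific odds ratios; the sketch below therefore omits the covariate adjustment and the trend test used in the study. All counts are hypothetical and the function name is ours.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald 95% CI computed on the log scale."""
    odds_ratio = ((exposed_cases * unexposed_controls)
                  / (unexposed_cases * exposed_controls))
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical (exposed, unexposed) counts; one shared control group.
controls = (400, 100)
cases_by_subtype = {"ER+/PR+": (120, 50), "ER-/PR-": (80, 20)}

for subtype, (exposed, unexposed) in cases_by_subtype.items():
    odds_ratio, low, high = odds_ratio_ci(exposed, unexposed, *controls)
    print(subtype, round(odds_ratio, 2), round(low, 2), round(high, 2))
```

Running the same loop once with subtypes assigned by the centralized laboratory and once with subtypes assigned from registry records mirrors the two sets of models described above.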
We included the following factors selected a priori as potential confounders in all multivariable polychotomous logistic regression models: age (35-39, 40-44, 45-49, 50-54, 55-59, or 60-64 y), race (White or African American), family history of breast cancer [no first-degree family history, first-degree (mother, sister, or daughter) family history, unknown or adopted], age at menarche (≤11, 12, 13, >13 y), study site (Los Angeles County or Detroit), and education (high school or lower level of education, technical school or some college, or college graduate). When restricting analyses to parous women, a single model that additionally adjusted for number of full-term pregnancies (1, 2, 3, ≥4) was fitted to assess the joint effects of age at first full-term pregnancy and breastfeeding duration.
We excluded five controls who had missing information on parity when assessing whether the source of ER and PR status (centralized laboratory or SEER) affected receptor-subtype specific risk estimates for the reproductive factors studied. This resulted in 2,029 control and 919 case participants available for analysis.
In reporting the results of trend tests, we considered a two-sided P < 0.05 as statistically significant. All analyses were done using the SAS statistical package (version 9.1; SAS Institute).
Results
Agreement of ER and PR Status
Among 1,048 case participants who had known ER status from the centralized laboratory and from SEER, tumors from 898 cases (86%) were classified the same by both sources [ER− (n = 316) or ER+ (n = 582)], whereas 150 (14%) cases were classified into different categories (Table 1). Overall, there was a substantial agreement for ER status between the centralized laboratory results and the SEER registry reports (κ = 0.70).
Among 926 case participants who had known PR status from the two sources, tumors from 745 (80%) cases were classified the same by both [PR− (n = 296) or PR+ (n = 449)], whereas 181 (20%) cases were classified into different categories (Table 1). Overall, there was a moderate agreement for PR status between the centralized laboratory results and SEER registry reports (κ = 0.60).
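The percentages above are observed agreement, which κ then corrects for chance. Rearranging κ = (p_obs − p_exp)/(1 − p_exp) gives the chance agreement implied by each reported κ. The Python sketch below applies this to the counts in the text; because κ is reported to two decimals, the implied chance agreement is approximate, and the function name is ours.

```python
def implied_chance_agreement(p_obs, kappa):
    """Solve kappa = (p_obs - p_exp) / (1 - p_exp) for p_exp."""
    return (p_obs - kappa) / (1.0 - kappa)


# Concordant counts and totals taken from the text.
er_p_obs = (316 + 582) / 1048
pr_p_obs = (296 + 449) / 926

er_p_exp = implied_chance_agreement(er_p_obs, 0.70)
pr_p_exp = implied_chance_agreement(pr_p_obs, 0.60)

print(round(er_p_obs, 3), round(er_p_exp, 3))
print(round(pr_p_obs, 3), round(pr_p_exp, 3))
```

In both cases roughly half of the raw agreement would be expected by chance alone, which is why 86% raw agreement for ER corresponds to κ = 0.70 rather than a value closer to 1.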
Among 919 case participants who had known ER and PR status from the two sources, the overall agreement for four subtypes (ER−/PR−, ER+/PR+, ER+/PR−, ER−/PR+) between the two sources was moderate (κ = 0.55). Furthermore, we found that the agreement between the two sources was substantial for the two major breast cancer subtypes (ER−/PR−, κ = 0.69; ER+/PR+, κ = 0.62) and poor for the two rarer subtypes (ER+/PR−, κ = 0.30; ER−/PR+, κ = 0.05; Table 1).
κ Statistics increased with increasing calendar year from 1994 to 1998, especially for PR status. κ Statistics were lower for women whose breast cancer was diagnosed in 1994 (ER, κ = 0.54; PR, κ = 0.39) compared with those whose breast cancer was diagnosed from 1995 through 1998 (ER, κ ≥ 0.66; PR, κ ≥ 0.59; Table 2). κ Statistics for ER status varied by tumor size; the highest κ statistic was observed among women whose tumors measured between 2 and 4.9 cm (κ = 0.75), but there was no such variation in κ statistics for PR. κ Statistics for ER status were relatively constant across 5-year groupings for age at diagnosis from 35 to 44 years through 60 to 64 years (P = 0.60); the κ statistics for PR status were lower for women whose breast cancer was diagnosed at age 60 to 64 years (κ = 0.36) than for those in other age groups. κ Statistics were not associated with study site, tumor stage, or case patient's race (all P > 0.10).
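The stratum comparisons reported above rest on a homogeneity test for κ. A minimal Python sketch of such a test follows, assuming the usual inverse-variance form in which each stratum's κ is weighted by the reciprocal of its squared standard error. The stratum values and standard errors are hypothetical, since the text does not report stratum-specific standard errors.

```python
def kappa_homogeneity(kappas, ses):
    """Chi-square statistic for equality of independent kappa estimates,
    using inverse-variance weights. Returns (statistic, degrees of
    freedom, pooled kappa)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    chi2 = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    return chi2, len(kappas) - 1, pooled


# Hypothetical kappas and standard errors for three strata.
chi2, df, pooled = kappa_homogeneity([0.50, 0.70, 0.70], [0.10, 0.05, 0.05])
print(round(chi2, 2), df, round(pooled, 3))
```

The statistic is referred to a χ2 distribution with one fewer degree of freedom than the number of strata; with 2 degrees of freedom the 5% critical value is 5.99, so the hypothetical strata in this example would not be declared heterogeneous.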
Associations of Reproductive Factors with Breast Cancer Subtypes by Source of ER/PR
Results for the association of parity with risk for ER and PR subtypes of breast cancer were similar whether the source of data on subtype came from the centralized laboratory or from SEER (Table 3). Parity was associated with risk reductions in ER+/PR+ (Ptrend ≤ 0.008 for centralized laboratory and for SEER classification) and ER−/PR+ breast cancer (Ptrend ≤ 0.05 for centralized laboratory and for SEER classification) but was not statistically associated with ER−/PR− breast cancer (Ptrend > 0.30 for centralized laboratory and for SEER classification). Parous women who had four or more full-term pregnancies had 40% lower risk for ER+/PR+ breast cancer than women who had never been pregnant [centralized laboratory, odds ratio, 0.60 (95% CI, 0.39-0.92); SEER, odds ratio, 0.57 (95% CI, 0.38-0.85)].
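One way to compare the two reported estimates is on the log-odds scale. Assuming the intervals are Wald intervals computed on that scale (an assumption, not stated in the text), the geometric midpoint of each interval should recover the odds ratio, and the interval width gives the standard error of the log odds ratio. The Python sketch below uses only the confidence limits reported above; the function name is ours.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Geometric midpoint of a CI and the implied SE of the log odds ratio,
    assuming a symmetric Wald interval on the log scale."""
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2.0)
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return midpoint, se_log


reported = {
    "centralized laboratory": (0.39, 0.92),  # reported odds ratio 0.60
    "SEER": (0.38, 0.85),                    # reported odds ratio 0.57
}

for source, (lower, upper) in reported.items():
    midpoint, se_log = log_scale_summary(lower, upper)
    print(source, round(midpoint, 2), round(se_log, 2))
```

The midpoints agree with the reported odds ratios of 0.60 and 0.57, and the implied standard errors are similar in size. Because the two estimates come from largely the same women, they are not independent, so this comparison is descriptive and should not be turned into a formal test of difference.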
Among parous women, early age at first full-term pregnancy was not statistically significantly associated with any subtype when SEER data were used (Ptrend ≥ 0.11; Table 3). However, we observed a marginally positive association between age at first full-term pregnancy and the risk for ER+/PR+ subtype using the data from the centralized laboratory (Ptrend = 0.05). Duration of breastfeeding was negatively associated with breast cancer risk, regardless of ER and PR status (Ptrend ≤ 0.05 for centralized laboratory and Ptrend ≤ 0.10 for SEER classifications).
Discussion
To our knowledge, this study is the first to investigate the extent to which ER and PR status results from a centralized pathology laboratory agreed with data from SEER registries and whether statistical estimates for the association between reproductive factors and breast cancer receptor subtypes differ according to the source of data.
We found that the agreement between the centralized laboratory and SEER registry classifications was substantial for ER and moderate for PR status. The agreement seemed to increase with increasing calendar year from 1994 through 1998 (especially for PR status), and agreement was clearly poorer among women whose breast cancer was diagnosed in 1994 than among those whose breast cancer was diagnosed from 1995 through 1998. Historically, ER and PR expression status in breast tissue was measured by the dextran-coated charcoal biochemical assay, and measurement of ER expression began earlier than measurement of PR expression (14, 15). The central laboratory used for these investigations was among the first to validate use of monoclonal ER antibodies (7, 16-19) and monoclonal PR antibodies (20-22) for localization of ER and PR in tissue sections by immunohistochemistry and subsequently adapted these methods to analysis of archival tissues (9, 23). With the development of antibodies to ER and PR, immunohistochemical assays have replaced the dextran-coated charcoal biochemical assay in the United States since the mid-1990s (24). During the transition period, it is possible that some Women's CARE Study patient participants had ER and PR status measured by the dextran-coated charcoal assay in some community laboratories. In addition, when use of immunohistochemistry methods was first initiated, some assays did not meet the guidelines for technical and clinical validation (25, 26). These reasons might account for the poorer agreement we observed for women diagnosed in 1994 compared with those whose breast cancer was diagnosed from 1995 through 1998.
Among the subtypes of breast cancer defined by joint ER and PR status, we found that the agreement between the centralized laboratory and SEER registry was substantial for the two major breast cancer subtypes (ER+/PR+ and ER−/PR−). Estimates for the associations between reproductive factors and risk for ER−/PR− and ER+/PR+ breast cancer also agreed whether results from a centralized pathology laboratory or SEER data were used.
Animal and human data have shown that the endogenous or exogenous ovarian hormones estrogen and progesterone play an important role in breast cancer development (27-29). These hormones act through their respective receptors, ER and PR. Clinical data indicate that breast cancer patients with tumors classified as ER+/PR+ are, on average, more likely to be responsive to hormone treatment and have a better prognosis than those with ER−/PR− tumors (30, 31). Using ER and PR status from SEER registries, a number of epidemiologic studies have found that multiparity (1, 32-34) and early age at first full-term pregnancy (34, 35) are associated with lower risk for ER+/PR+ but not ER−/PR− breast cancer, whereas a longer duration of breastfeeding is associated with lower breast cancer risk, regardless of ER and PR status (1, 33). In this article, we obtained supportive evidence for these previous findings by using ER and PR status from the centralized pathology laboratory.
In our data, the agreement between the centralized laboratory and SEER registry was poor for the two rarer breast cancer subtypes (ER+/PR− and ER−/PR+). Furthermore, we found that the risk estimates for the associations between reproductive factors and these two subtypes were inconsistent between the two data sources. Inconsistencies also exist in the findings from previous epidemiologic studies, although the ER and PR data were obtained directly from community pathology laboratories or from SEER registries, which collect information from community pathology laboratories (1, 32, 34). We previously reported that relationships between reproductive factors and these two subtypes were similar to those for ER+/PR+ tumors (1), but other studies have provided different results. In the prospective Iowa Women's Health Study, ER+/PR− tumors differed from the other three ER/PR subtypes in the direction and strength of associations with parity and age at first birth (34). In contrast, a report from the prospective Nurses' Health Study showed that the adverse effect of delayed childbearing was observed only for ER+/PR− and ER−/PR− subtypes (32).
Compared with SEER data, the centralized laboratory had a lower detection rate for ER+ or PR+ status. Although ER status and PR status were determined in this laboratory, many outside laboratories were involved in the preservation and storage of tissue, and the centralized laboratory and one in Detroit were involved in collecting archived tumor tissue. The Detroit laboratory prepared sections for evaluation, storing them while tissue for a number of participants was being collected, and then shipped them to the centralized laboratory wherein all tissue sections were paraffinized. Despite the delay in preserving Detroit samples, we observed no study site differences in prevalence rates by receptor status. Nevertheless, the centralized laboratory performed assays on tissue that had been stored longer than was true when evaluations were done in the community laboratories at the time of diagnosis (resulting in the data recorded within the SEER registries). It is possible that this longer time frame resulted in a lower detection rate for ER+ and PR+ by the laboratory (36).
Another limitation of our study is that we did not obtain tissue from all eligible case participants. We compared demographic characteristics, family history of breast cancer, reproductive factors, and tumor size across three subgroups of cases: cases without ER and PR status from either a centralized pathology laboratory or SEER registries; cases with ER and PR status from the centralized laboratory, but no information for ER and PR from SEER registries; and cases with ER and PR status from both sources (results not shown). This analysis indicated that cases who had ER and PR status from the centralized laboratory but no data from SEER registries had characteristics that were similar to those without ER and PR status from either of these two sources, except that they were 0.4 years younger at first full-term pregnancy, on average. Cases who had ER and PR status from the centralized laboratory and SEER registries were more likely to have been diagnosed in Los Angeles County than in Detroit; to have been younger at diagnosis and better educated; to have been, on average, 0.8 years older at first full-term pregnancy; and to have had larger tumors than the other two groups. There was no difference in either number of full-term pregnancies or duration of breastfeeding across these three subgroups. Thus, it is unlikely that the statistically significant associations between the reproductive factors and subtypes of breast cancers reported here would be different if we had obtained pathology samples from all eligible case participants.
In summary, epidemiologic studies often abstract ER and PR status from diverse medical records or use population-based cancer registry reports, such as SEER registry reports. Our results from this comparison with data from a centralized pathology laboratory suggest that conclusions based on SEER registry data are reasonably reliable for ER+/PR+ and ER−/PR− subtypes.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Dr. Karen Petrosyan, Armine Arakelyan, Hasmik Toumaian, and Judith Udove for the technical assistance in the performance of the immunohistochemical assays for this study.