Abstract
Risk prediction models may be useful for precision breast cancer screening. We aimed to evaluate the performance of breast cancer risk models developed in European-ancestry studies in a Korean population.
We compared discrimination and calibration of three multivariable risk models in a cohort of 77,457 women from the Korean Cancer Prevention Study (KCPS)-II. The first incorporated U.S. breast cancer incidence and mortality rates, U.S. risk factor distributions, and relative risk (RR) estimates from European-ancestry studies. The second recalibrated the first by using Korean incidence and mortality rates and Korean risk factor distributions, while retaining the European-ancestry RR estimates. Finally, we derived a Korea-specific model incorporating the RR estimates from KCPS.
The U.S. European-ancestry breast cancer risk model was well calibrated among Korean women <50 years [expected-to-observed ratio (E/O) = 1.124 (0.989, 1.278)] but markedly overestimated the risk for those ≥50 years [E/O = 2.472 (2.005, 3.049)]. Recalibrating absolute risk estimates using Korean breast cancer rates and risk factor distributions markedly improved the calibration in women ≥50 years [E/O = 1.018 (0.825, 1.255)]. The model incorporating Korean-based RRs had similar but not clearly improved performance relative to the recalibrated model.
The poor performance of the U.S. European-ancestry breast cancer risk model among older Korean women highlights the importance of tailoring absolute risk models to specific populations. Recalibrating the model using Korean incidence and mortality rates and risk factor distributions greatly improved performance.
The data will provide valuable information to plan and evaluate actions against breast cancer focused on primary prevention and early detection in Korean women.
Introduction
Breast cancer is the most common nonskin cancer in women worldwide. Although Korea has had a lower incidence than Western countries, the incidence of breast cancer is rapidly increasing. It is now the second most common cancer in Korean women after thyroid cancer, with 21,402 new cases diagnosed in 2014 (1). This increasing trend likely reflects changes in reproductive factors in Korean women, such as early menarche, late menopause, and having fewer children at an older age (2), all of which are secondary to the rapid development and westernization of Korea. However, the age distribution of breast cancer incidence in Korea is still markedly different from that in Western countries, with a peak at 45–49 years and a higher proportion of premenopausal women (3–5).
Mammography and other screening modalities can reduce breast cancer morbidity and mortality (6, 7). In Korea, mammography is recommended every 2 years for women ages 40 or older. Breast cancer susceptibility, however, largely depends on multiple risk factors, and it is crucial to identify high-risk women who may benefit from more intensive screening strategies. Thus, building risk prediction models and improving their predictive performance is an important step toward targeted screening and prevention.
A number of breast cancer risk models have been developed in European-ancestry populations (8). These models use information on reproductive factors, family history of disease, mammographic density, and measured genetic factors to estimate a woman's absolute risk of disease. Recent work has focused on developing and validating a “synthetic” multivariable risk model that can include a comprehensive set of risk factors (9, 10). The absolute risks from this model are well calibrated across U.S. and European cohorts, but they are unlikely to provide accurate risk estimates for Korean women without further modification. It is unknown whether risk estimates recalibrated using Korean population incidence rates, but retaining relative risk estimates from European-based studies, will perform well among Korean women.
This study aims to evaluate the discrimination and calibration of three breast cancer risk models among Korean women: a model based on risk factor RRs estimated from European-ancestry studies, U.S. incidence and mortality rates, and the distribution of risk factors among U.S. non-Hispanic white women; a model using European-ancestry RR estimates but Korean incidence and mortality rates and Korean risk factor distributions; and a model that combines Korean incidence and mortality rates and Korean risk factor distributions with Korean RR estimates. Although the last model, in principle, should perform best, in practice, if the risk factor RRs are similar across European-ancestry and Korean populations, the recalibrated model using RRs estimated among Europeans may perform well, especially if the sample sizes used to estimate the Korean RRs are relatively small.
We calculated absolute risk estimates and evaluated their performance using data from the Korean Cancer Prevention Study-II (KCPS-II) Biobank and the Individualized Coherent Absolute Risk Estimation (iCARE) software. iCARE builds and validates risk prediction models for a population by combining RR estimates, age-specific incidence and mortality rates, and risk factor distributions from multiple data sources (11).
Materials and Methods
Study population for discrimination and calibration analyses
We used the KCPS-II Biobank to evaluate the discrimination and calibration of breast cancer absolute risk models. The KCPS-II includes 78,282 women who underwent routine health assessments at health promotion centers between 2004 and 2013. The study design and recruitment have been described in detail previously (12). All participants gave written informed consent before participation. The Institutional Review Board of Yonsei University approved this study protocol (IRB approval number 4-2011-0277). Exclusion criteria were missing information on height and weight, a history of breast cancer, and age at entry below 20 or above 80 years. The final analytic sample included 77,457 women (Supplementary Fig. S1; Supplementary Table S1).
Data collection
All participants were asked to complete a structured questionnaire covering age at menarche, age at menopause, parity, age at first birth, oral contraceptive (OC) use (never, ever), hormone replacement therapy (HRT) use, alcohol intake, history of benign breast disease (BBD), and family history of breast cancer. Height and weight were measured while participants wore light clothing. Body mass index (BMI) was calculated as weight (kg) divided by height squared (m²).
Follow-up for breast cancer
The principal outcome was incident breast cancer (ICD-10 code C50). Because all participants have a unique identification number assigned at birth, which allows linkage with the national cancer registry and hospital admission records, follow-up was almost 100% complete. Cancer diagnoses are based on histologic confirmation, resulting in high accuracy.
Study population for RR estimation
We used the KCPS to independently estimate the relative risks for breast cancer risk factors (Supplementary Table S2). The KCPS is a prospective cohort study of 1.3 million members, designed to assess risk factors for cancer mortality, incidence, and hospital admission, with 25 years of follow-up (13). The KCPS cohort includes 443,627 women ages 20–80 years who received health insurance from the Korean Medical Insurance Corporation and had biennial medical evaluations between 1992 and 1995. Risk factors were collected similarly to the KCPS-II. Because women in the KCPS were not asked about history of BBD, we defined history of BBD on the basis of ICD-10 code D24. In the KCPS cohort, incident breast cancer was identified on the basis of a hospital admission with a cancer diagnosis.
Statistical analysis
We evaluated the performance of a recently published breast cancer absolute risk model (9) in the KCPS-II Biobank. We compared three models: (i) the U.S.-based European-ancestry model, using incidence, mortality, and risk factor distributions among U.S. non-Hispanic white women and European-ancestry RRs (USEA); (ii) a recalibrated model, using Korean incidence, mortality, and risk factor distributions but European-ancestry RRs (KREA); and (iii) a fully Korean-based model using Korean incidence, mortality, and risk factor distributions and RR estimates from the KCPS (KRKR).
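The three models differ only in which data sources feed each model input. The hypothetical Python sketch below records that structure as plain data (the class and field names are illustrative and are not part of the iCARE package) and lists which inputs change from one model to the next: the recalibrated model swaps the rates and the reference distribution, and the Korean-based model additionally swaps the RRs.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record of the data sources combined by one absolute risk model."""
    name: str
    rates: str       # age-specific incidence and competing mortality rates
    reference: str   # individual-level reference sample of risk factors
    log_rr: str      # source of the log-relative-risk estimates

USEA = ModelSpec("USEA", "United States (SEER)", "NHANES", "European-ancestry studies")
KREA = ModelSpec("KREA", "Korea", "KNHANES", "European-ancestry studies")
KRKR = ModelSpec("KRKR", "Korea", "KNHANES", "KCPS")

def changed_inputs(old: ModelSpec, new: ModelSpec) -> list:
    """Names of the inputs (other than the model name) that differ between two models."""
    return [f.name for f in fields(ModelSpec)
            if f.name != "name" and getattr(old, f.name) != getattr(new, f.name)]

print(changed_inputs(USEA, KREA))  # ['rates', 'reference']
print(changed_inputs(KREA, KRKR))  # ['log_rr']
```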
The models include data on reproductive, anthropometric, behavioral, and clinical risk factors: age at menarche (≤10, 11, 12, 13, 14, 15, ≥16 years), age at menopause (<40, 40–44, 45–49, 50–54, ≥55 years), parity (0, 1, 2, ≥3 births), age at first birth (<20, 20–24, 25–29, ≥30 years), OC use (never, ever), HRT use (never, ever), BMI (<18.5, 18.5–24.9, 25.0–29.9, ≥30.0 kg/m²), height (per 10 cm), alcohol intake (0, 1–4, 5–14, 15–24, 25–34, 35–44, ≥45 g/day), history of BBD (no, yes), and family history of breast cancer (no, yes).
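As a rough illustration of how raw questionnaire values map onto these categories, the sketch below bins values using the cut-points listed above. The dictionary keys, the ASCII labels, and the assumption that ages, births, and alcohol intake are recorded as whole numbers are hypothetical choices for illustration, not the study's actual coding.

```python
from bisect import bisect_right

# Hypothetical category definitions mirroring the list above:
# (lower cut-points of the 2nd..kth categories, labels).
# Assumes integer years/births and alcohol intake in whole grams per day.
CATEGORIES = {
    "age_menarche": ([11, 12, 13, 14, 15, 16], ["<=10", "11", "12", "13", "14", "15", ">=16"]),
    "age_menopause": ([40, 45, 50, 55], ["<40", "40-44", "45-49", "50-54", ">=55"]),
    "parity": ([1, 2, 3], ["0", "1", "2", ">=3"]),
    "age_first_birth": ([20, 25, 30], ["<20", "20-24", "25-29", ">=30"]),
    "bmi": ([18.5, 25.0, 30.0], ["<18.5", "18.5-24.9", "25.0-29.9", ">=30.0"]),
    "alcohol_g_day": ([1, 5, 15, 25, 35, 45], ["0", "1-4", "5-14", "15-24", "25-34", "35-44", ">=45"]),
}

def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def categorize(factor: str, value: float) -> str:
    """Map a raw value to its category label; bisect_right counts cut-points <= value."""
    cuts, labels = CATEGORIES[factor]
    return labels[bisect_right(cuts, value)]

def encode_woman(raw: dict) -> dict:
    """Encode one participant; factors absent from `raw` (e.g., menopause) are skipped."""
    out = {f: categorize(f, raw[f]) for f in CATEGORIES if f != "bmi" and f in raw}
    out["bmi"] = categorize("bmi", bmi(raw["weight_kg"], raw["height_cm"] / 100))
    out["height_per_10cm"] = raw["height_cm"] / 10  # entered as a continuous variable
    return out

print(encode_woman({"age_menarche": 14, "parity": 2, "age_first_birth": 27,
                    "alcohol_g_day": 0, "weight_kg": 55, "height_cm": 160}))
```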
Due to the effect of estrogen, postmenopausal women have a greater risk of developing breast cancer than premenopausal women. Moreover, several factors such as obesity and HRT use have been linked to a higher risk of breast cancer only for postmenopausal women (14, 15). Therefore, we assessed the breast cancer risk models separately for women younger than 50 years and women ages 50 or older.
For the USEA and KREA models, we used literature-based RRs of the risk factors (9). For the KRKR model, the RR estimates were obtained from multivariable Cox regression models fitted in a Korean cohort (KCPS). Supplementary Table S2 describes in detail the RR estimates and population distributions included in the models.
iCARE uses average age-specific incidence rates to calibrate the predicted risks (16, 17). We used age-specific breast cancer incidence rates (Supplementary Fig. S2) and mortality rates from population-based registries in the United States and Korea: the 2008–2012 U.S. Surveillance, Epidemiology, and End Results (SEER) data and 2010 data from the Korea National Statistical Office, respectively.
To obtain information on risk factor distributions, iCARE uses an additional individual-level reference dataset of risk factors representing each population (11). The reference datasets were the 2010 National Health and Nutrition Examination Survey (NHANES) for the U.S.-based model and the 2010–2012 Korean NHANES (KNHANES) for the recalibrated and Korean-based models. To account for missing data on continuous factors in KNHANES, we performed conditional mean imputation: we used multivariate imputation by chained equations (MICE) to draw multiple samples (m = 10) of the missing factors conditional on the observed data and then averaged the factor values over the samples. For two factors not measured in KNHANES, history of BBD and family history of breast cancer, we used single random draw imputation based on the prevalence of the corresponding factors in the validation cohort. Using this imputation and simulation, we created a complete KNHANES dataset with no missing information on risk factors.
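The two imputation steps can be sketched as follows. This is a simplified stand-in, not the study's code: instead of the chained regression models that MICE fits, each missing continuous value is replaced by the mean of m = 10 random draws from observed values in the same stratum (a stratified hot-deck draw, which is our assumption), and each unmeasured binary factor receives a single Bernoulli draw at the prevalence observed in the validation cohort. Function names and the stratum variable are hypothetical.

```python
import random
from statistics import mean

def conditional_mean_impute(rows, target, stratum, m=10, seed=0):
    """Replace missing `target` values (None) by the mean of m random draws from
    observed values in the same stratum. Assumption: a stratified hot-deck draw
    stands in for the conditional draws that MICE would produce."""
    rng = random.Random(seed)
    observed = [r[target] for r in rows if r[target] is not None]
    donors = {}
    for r in rows:
        if r[target] is not None:
            donors.setdefault(r[stratum], []).append(r[target])
    completed = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is None:
            pool = donors.get(r[stratum]) or observed  # fall back to all observed values
            r[target] = mean(rng.choice(pool) for _ in range(m))
        completed.append(r)
    return completed

def random_draw_impute(rows, factor, prevalence, seed=0):
    """Single random Bernoulli draw for a factor not measured in the reference
    survey, using its prevalence in the validation cohort."""
    rng = random.Random(seed)
    return [dict(r, **{factor: int(rng.random() < prevalence)}) for r in rows]
```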
Model performance was evaluated in terms of discrimination and calibration. For discrimination, we assessed the area under the receiver operating characteristic curve (AUC). For calibration, KCPS-II Biobank participants were categorized into deciles of 5-year absolute risk predicted by the iCARE Lit model, and the predicted and observed incidence in each decile were compared using the expected-to-observed (E/O) ratio and the Hosmer-Lemeshow χ² test. We also estimated cumulative and 10-year absolute risks in the Korean-based model using the current probability method (16).
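A minimal sketch of these performance measures follows, assuming each woman has a predicted 5-year risk and a binary indicator of breast cancer within 5 years (censoring is ignored, which is a simplification). The AUC is computed from ranks (the Mann-Whitney form, with tied risks given average ranks); calibration uses deciles of predicted risk, the overall E/O ratio, and the usual Hosmer-Lemeshow statistic, the sum over groups of (O − E)² / [n p̄ (1 − p̄)]. Function names are hypothetical, and the degrees of freedom for the test p-value are left unspecified.

```python
def auc(risks, events):
    """Rank-based AUC: probability that a random case has a higher predicted risk
    than a random noncase, with tied risks receiving average ranks."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    ranks = [0.0] * len(risks)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and risks[order[j + 1]] == risks[order[i]]:
            j += 1
        for k in range(i, j + 1):  # average 1-based rank within the tie block
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_cases = sum(events)
    n_noncases = len(events) - n_cases
    case_rank_sum = sum(r for r, e in zip(ranks, events) if e)
    return (case_rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_noncases)


def decile_calibration(risks, events, groups=10):
    """Group women by quantiles of predicted risk; return per-group rows
    (expected, observed, n), the overall E/O ratio (assumes at least one case),
    and the Hosmer-Lemeshow statistic."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rows, hl = [], 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        expected = sum(risks[i] for i in idx)
        observed = sum(events[i] for i in idx)
        p_bar = expected / len(idx)
        hl += (observed - expected) ** 2 / (len(idx) * p_bar * (1 - p_bar))
        rows.append((expected, observed, len(idx)))
    return rows, sum(risks) / sum(events), hl
```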
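The absolute risk of developing breast cancer over the age interval (a, a + s] for a woman of age a with risk factors Z can be calculated as

$$AR(a, a+s \mid Z) = \int_a^{a+s} \lambda_0(t)\exp(\beta^{\top}Z)\exp\left\{-\int_a^{t}\left[\lambda_0(u)\exp(\beta^{\top}Z) + m(u)\right]du\right\}dt \tag{A}$$

Formula (A) holds under the assumption that the risk factors Z act multiplicatively, through the log-relative risks β, on the baseline hazard function ${\lambda _0}( t )$. It accounts for competing risks due to mortality from other causes through the age-specific mortality rate function m(t).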
Cumulative risk is evaluated as absolute risk between age 20 years and a specific age. The 10-year risk is evaluated as absolute risk over the next 10 years for a woman who has attained a specific age without developing breast cancer. To apply this method, we combined each KCPS-II woman's risk factors Z with the log-relative risks β estimated in the KCPS, the age-specific mortality rates in Korea, and the risk factor distribution in KNHANES. iCARE uses the log-relative risks, the risk factor distribution, and the population average age-specific incidence rates to calculate the baseline hazard; it then calculates absolute risk for each woman using formula (A).
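Formula (A) can be evaluated numerically by treating the hazards as constant within each single year of age, in which case each year's contribution has a closed form. The sketch below illustrates that idea under stated assumptions and is not iCARE's implementation; in particular, the baseline hazard is derived by simply dividing the population incidence by the mean relative risk exp(β'Z) in the reference sample, which ignores how differential survival shifts the risk factor distribution with age. The function names and the flat rates in the usage lines are hypothetical.

```python
import math

def baseline_hazard(incidence, reference_log_rr):
    """Simplified baseline hazard: population incidence at each age divided by the
    mean relative risk exp(beta'Z) in the reference sample. Assumption: ignores the
    survival reweighting of the risk factor distribution with age."""
    mean_rr = sum(math.exp(x) for x in reference_log_rr) / len(reference_log_rr)
    return {age: rate / mean_rr for age, rate in incidence.items()}

def absolute_risk(a, s, log_rr, lambda0, mortality):
    """Formula (A) with hazards held constant within each single year of age.

    a: current age; s: projection length in years; log_rr: beta'Z for this woman;
    lambda0, mortality: dicts mapping integer age -> annual rate.
    Returns P(breast cancer in [a, a + s) | alive and cancer-free at age a),
    treating death from other causes as a competing risk."""
    rr = math.exp(log_rr)
    risk, event_free = 0.0, 1.0
    for age in range(a, a + s):
        h = lambda0[age] * rr          # this woman's breast cancer hazard
        total = h + mortality[age]     # hazard of leaving the risk set
        if total > 0:
            # P(breast cancer during this year | event-free at its start)
            risk += event_free * (h / total) * (1 - math.exp(-total))
        event_free *= math.exp(-total)
    return risk

# Usage with hypothetical flat rates (not Korean registry data):
ages = range(20, 90)
incidence = {age: 0.001 for age in ages}
mortality = {age: 0.002 for age in ages}
lam0 = baseline_hazard(incidence, [0.0, 0.0, math.log(2.0)])
cumulative_to_80 = absolute_risk(20, 60, 0.0, lam0, mortality)          # cumulative risk
ten_year_at_45 = absolute_risk(45, 10, math.log(1.5), lam0, mortality)  # 10-year risk
```

Cumulative risk to a given age corresponds to a projection starting at age 20, and the 10-year risk to a projection of s = 10 years from the attained age, matching the definitions above.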
All statistical tests were two-sided with a significance level of 0.05. Descriptive statistics and relative risks were calculated using SAS version 9.4 (SAS Institute, Cary, NC), and absolute risks were evaluated with R 3.5.0 using the iCARE package 1.0.0.
Results
Baseline risk factors
A total of 680 breast cancer cases were diagnosed during follow-up in the KCPS-II Biobank, 322 of them within 5 years. Baseline risk factor distributions stratified by age (<50 vs. ≥50 years) are displayed in Supplementary Table S1. Compared with women ages 50 years or older, women younger than 50 years tended to have an earlier age at menarche, fewer births, and a later age at first birth, and were more likely to drink alcohol.
Population distribution and RRs
The population distributions and RRs of breast cancer risk factors differed between U.S. non-Hispanic white and Korean women (Supplementary Table S2). U.S. non-Hispanic white women tended to have an earlier age at menarche, a later age at menopause, and an earlier age at first birth than Korean women. The proportions of women who had used OC or HRT were markedly higher among U.S. non-Hispanic white women than among Korean women, and U.S. non-Hispanic white women, on average, had a higher BMI.
The relative risks did not differ greatly between European-ancestry and Korean women for most risk factors: the RR comparing the highest versus the lowest risk category among Korean women was generally 0.40 to 1.90 times that among European-ancestry women. One notable exception was BBD, which had a larger effect on breast cancer among Korean women than among European-ancestry women (RR = 5.05 vs. 1.68). This may reflect the narrower definition of BBD in the KCPS, which was restricted to women with a history of benign neoplasm of unspecified breast. In addition, the inverse association between BMI and breast cancer risk seen among premenopausal European-ancestry women was not seen among Korean women.
Risk projections
Figure 1 shows cumulative and 10-year risks of breast cancer among Korean women between age 20 and 80 years by percentiles of absolute risk estimated in the KRKR model. The cumulative risk at 80 years for women in the 95th percentile of risk was 7.56%, while the average cumulative risk was 2.06%. The 10-year risk of breast cancer for women in the 95th percentile of risk peaked at 2.61% at age 45. The 10-year risks increased from age 20 to 45 and decreased thereafter.
Figure 1. Cumulative and 10-year breast cancer risk for Korean women in the KCPS-II Biobank, stratified by percentiles of risk estimated in the Korean-based model. Cumulative risk is evaluated as absolute risk between age 20 years and a specific age shown on the x-axis. The 10-year risk is evaluated as absolute risk over the next 10 years for a woman who has attained a specific age (shown on the x-axis) without developing breast cancer.
Predictive capacities
The AUCs of the USEA model in the KCPS-II were 71.8% [95% confidence interval (CI), 68.8–74.8] for women ages <50 years and 57.1% (95% CI, 51.2–62.9) for women ages ≥50 years, indicating better ability to distinguish cases from noncases among younger women (Table 1). The USEA model was well calibrated among Korean women ages <50 years [E/O (95% CI) = 1.12 (0.99–1.28); Fig. 2] but overestimated the risk for those ages ≥50 years [E/O (95% CI) = 2.47 (2.01–3.05); Fig. 3]. Recalibrating absolute risk estimates using Korean age-specific incidence rates and risk factor distributions markedly improved calibration in women ages ≥50 years [E/O (95% CI) = 1.02 (0.83–1.26)]. Recalibrating using Korean age-specific incidence rates while keeping the U.S. risk factor reference distribution underestimated risk in the KCPS-II [E/O (95% CI) = 0.74 (0.65–0.84)], whereas recalibrating using the Korean risk factor reference distribution while keeping U.S. incidence rates overestimated risk [E/O (95% CI) = 1.36 (1.19–1.54); Supplementary Table S3]. Incorporating Korean-based RR estimates also improved calibration relative to the USEA model [<50 years, E/O (95% CI) = 0.96 (0.85–1.09); ≥50 years, E/O (95% CI) = 0.94 (0.76–1.16)]. In discrimination, however, the AUC of the Korean-based model was slightly lower than that of the recalibrated model among women ages <50 years [69.7% (66.7–72.6) vs. 70.7%] and ≥50 years [58.4% (52.9–63.8) vs. 61.5%]. For all models, miscalibration was most evident in the extreme risk deciles (Supplementary Table S4; Supplementary Fig. S3).
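The text does not state how the E/O confidence intervals were obtained. One common approach, assumed here rather than taken from the study, treats the observed case count O as Poisson, so that var(log E/O) ≈ 1/O and the 95% CI is E/O × exp(±1.96/√O). Applied to the Table 1 values it comes close to the reported intervals; for example, E/O = 1.124 with 233 cases gives about 0.989 to 1.278. The function name is hypothetical.

```python
import math

def eo_confidence_interval(eo: float, observed_cases: int, z: float = 1.96):
    """Approximate CI for an expected-to-observed ratio, assuming the observed
    count is Poisson so that the standard error of log(E/O) is 1/sqrt(O)."""
    half_width = z / math.sqrt(observed_cases)
    return eo * math.exp(-half_width), eo * math.exp(half_width)

# E/O values and case counts from Table 1 (USEA model, women <50 and >=50 years).
for eo, cases in [(1.124, 233), (2.472, 87)]:
    low, high = eo_confidence_interval(eo, cases)
    print(f"E/O {eo:.3f} with {cases} cases: ({low:.3f}, {high:.3f})")
```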
Table 1. Discrimination and calibration for the breast cancer risk prediction models validated using the Korean Cancer Prevention Study-II Biobank.

| Age group | Model | AUC, % (95% CI) | E/O ratio (95% CI) |
|---|---|---|---|
| <50 years of age (233 cases, 57,206 noncases) | U.S.-based European-ancestry (A) | 71.8 (68.8–74.8) | 1.124 (0.989–1.278) |
| | Recalibrated (B) | 70.7 (67.7–73.7) | 0.894 (0.787–1.017) |
| | Korean-based (C) | 69.7 (66.7–72.6) | 0.960 (0.845–1.091) |
| ≥50 years of age (87 cases, 18,680 noncases) | U.S.-based European-ancestry (A) | 57.1 (51.2–62.9) | 2.472 (2.005–3.049) |
| | Recalibrated (B) | 61.5 (56.2–66.9) | 1.018 (0.825–1.255) |
| | Korean-based (C) | 58.4 (52.9–63.8) | 0.941 (0.763–1.161) |
Note: The AUCs reported in Table 1 are defined on the basis of predicted absolute risk and incorporate the variation due to age.
(A) The U.S.-based European-ancestry model, using incidence, mortality, and risk factor distributions among U.S. non-Hispanic white women and European-ancestry relative risk (RR) estimates; (B) a recalibrated model, using Korean incidence, mortality, and risk factor distributions but European-ancestry RR estimates; and (C) a fully Korean-based model using Korean incidence, mortality, and risk factor distributions and RR estimates from the Korean Cancer Prevention Study.
Abbreviations: CI, confidence interval; E, expected 5-year absolute risk; O, observed 5-year incidence.
Figure 2. Absolute risk calibration of breast cancer risk prediction models in the KCPS-II among women less than 50 years of age. The risk categories are based on absolute risk. KCPS-II, Korean Cancer Prevention Study-II Biobank; HL, Hosmer-Lemeshow test statistic.
Figure 3. Absolute risk calibration of breast cancer risk prediction models in the KCPS-II among women 50 years of age or older. The risk categories are based on absolute risk. KCPS-II, Korean Cancer Prevention Study-II Biobank; HL, Hosmer-Lemeshow test statistic.
Discussion
In this study, we evaluated the performance of the breast cancer risk models originally developed for U.S. women to predict the 5-year breast cancer risk in a Korean population, directly and after recalibration to account for Korean age-specific incidence rates, risk factor distributions, and relative risks. To the best of our knowledge, our study is the first to assess these breast cancer risk models in an East Asian population.
The discrimination of the unrecalibrated USEA model and the two recalibrated models was similar for women <50 years (AUCs between 69.7% and 71.8%), but the recalibrated models performed better for women ≥50 years (AUC, 57.1% vs. 58.4% and 61.5%). The differences in AUCs indicate that recalibration changes the rank ordering of women according to their predicted risk. This reordering occurs because iCARE uses the distribution of risk factors in the population not only to define the baseline incidence rate but also to estimate risks for women who are missing data on one or more risk factors. Our results suggest that using a reference distribution that better matches the target population can improve discrimination. In the KCPS-II Biobank, missing data on factors with very different distributions in the United States and Korea (e.g., age at menarche, parity, age at first birth, and alcohol intake) likely account for the improvement in discrimination between the USEA and recalibrated models.
The USEA breast cancer risk model was well calibrated among Korean women <50 years but overestimated the risk for those ≥50 years. Further recalibrations of the model showed appreciably improved calibration, especially among older women. This underscores the general importance of recalibrating absolute risk models to reflect the age-specific incidence rates, distribution of risk factors, and relative risks in the target population.
Consistent with previous reports, we found lower breast cancer incidence in the KCPS-II Biobank than among U.S. non-Hispanic whites (2, 18). Relative to U.S. non-Hispanic whites, women in the KCPS-II Biobank less often had risk factors such as early menarche and OC or HRT use, and they had a lower average BMI.
The RRs for most risk factor categories differed modestly between Korean and European-ancestry women; the RR comparing extreme risk factor categories among Korean women was generally 0.40 to 1.90 times that among European-ancestry women. The largest exception was BBD, whose RR was about three times larger in Korean than in European-ancestry women (5.05 vs. 1.68). Moreover, the effect of BBD was larger among Korean women ≥50 years than among those <50, whereas among U.S. women the effect was similar in both age groups. This may reflect our definition of BBD in the KCPS, where the RRs were estimated: women with any ICD-10 code of D24 (“benign neoplasm of unspecified breast”) at baseline were considered to have a history of BBD. This is a more restrictive definition than that used in the European-ancestry studies (“atypical hyperplasia of the breast”; ref. 19) and may identify a smaller, more homogeneous group of women at higher risk of breast cancer. We chose ICD-10 D24 to define BBD because we believe this insurance claims code is more accurate than other codes capturing more heterogeneous forms of BBD.
Risk factor distributions in our study were consistent with distributions of primary risk factors for breast cancer observed in previous Korean studies: early menarche, late menopause (20, 21), later and fewer births (2), taller height (22), obesity (23), history of BBD (24), alcohol intake (25), family history (20), and OC use (26). A systematic review reported that HRT had no significant effect on breast cancer in Korean women (27).
The literature-based absolute risk model for European-ancestry women that we assessed here has recently been validated in European-ancestry populations in the United States and the United Kingdom, showing good calibration (10, 28). Risk prediction models used in one country need careful evaluation before they are adopted and incorporated into the guidelines of other countries, and this evaluation must account for differences in disease epidemiology across populations. Indeed, when we applied the original iCARE model, which uses breast cancer incidence rates and competing all-cause mortality rates among U.S. non-Hispanic white women, the 5-year absolute risk was overpredicted among Korean women 50 years or older (E/O = 2.472; Table 1). This might be due to differences in age-specific breast cancer incidence between U.S. non-Hispanic whites and Koreans. Consistent with previous findings, we found that breast cancer incidence in Korean women increased up to age 50 and decreased thereafter, whereas incidence in U.S. non-Hispanic whites rose with age (3–5).
The recalibrated KREA model, which applied Korean incidence and mortality data and the RR estimates from European-ancestry studies, showed markedly improved calibration among women 50 years or older (E/O = 1.02 vs. 2.47 for the USEA model). When RR estimates from the Korean population were further incorporated, the E/O ratios remained close to 1 in both age groups (0.96 for <50 years and 0.94 for ≥50 years), although the AUC decreased somewhat. The unexpected decrease in model discrimination may be chance fluctuation, or it could reflect relatively imprecise estimates of the Korean RRs from the KCPS. This highlights the importance of considering the bias-variance tradeoff when developing risk models for specific target populations. Estimates of RRs from the target population are expected to be unbiased, but if the available sample sizes are small, those estimates may be highly variable and the resulting risk model may have poor out-of-sample performance. If RR estimates from large samples from a nontarget population are available, they may perform relatively well in the target population because of their greater precision, provided the true RRs in the target and nontarget populations are not too different. In this specific case, considering the large size of the KCPS and the small differences in AUC between the recalibrated models with European-ancestry and Korean RRs, which are likely due to chance, we believe the fully recalibrated KRKR model using Korean RRs is most appropriate for Korean women.
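The bias-variance argument can be made concrete with the mean squared error of a log-RR estimate, MSE = bias² + variance. In the hypothetical sketch below, the target-population estimate is assumed unbiased with standard error se_target, while the external estimate has bias bias_external (the true difference in log-RRs between the populations) and a smaller standard error se_external. All numbers in the usage lines are illustrative and are not estimates from this study.

```python
def mse(bias: float, se: float) -> float:
    """Mean squared error of an estimator: squared bias plus variance."""
    return bias ** 2 + se ** 2

def prefer_external(se_target: float, bias_external: float, se_external: float) -> bool:
    """True when a precise but possibly biased external log-RR estimate has lower
    MSE than an unbiased but noisier estimate from the target population."""
    return mse(bias_external, se_external) < mse(0.0, se_target)

# Illustrative values only: a small target study (SE 0.30 on the log-RR scale)
# versus a large external study (SE 0.05) whose true log-RR differs by 0.10 or 0.40.
print(prefer_external(0.30, 0.10, 0.05))  # prints True  (0.0125 < 0.09)
print(prefer_external(0.30, 0.40, 0.05))  # prints False (0.1625 > 0.09)
```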
The striking age-incidence curve of breast cancer in Korea, rising into the mid-40s and declining thereafter, has been consistently observed over the last few decades (18). Korea is becoming an aging society, and there is a strong birth cohort effect in breast cancer occurrence among Korean women. Reproductive factors such as early age at menarche, late age at menopause, delayed first pregnancy, and changes in breastfeeding patterns have been reported to be associated with this cohort effect in breast cancer incidence among Korean women (18). The peak among middle-aged women may also reflect the rapid increase in breast cancer screening in that age group, that is, higher screening rates among women in their 40s and 50s, which is compatible with the age-incidence curve (29). This may also be responsible for the larger effect of BBD observed among Korean women ≥50 years than among those <50 in our study. According to previous projections, breast cancer incidence in Korea will increase to as much as 100 per 100,000 women, and the age-incidence curve will come to resemble the curve currently observed in Western women (2).
Several breast cancer risk assessment tools have been proposed in Korea. Previous case-control studies attempted to identify high-risk groups using a breast cancer probability model with relevant risk factors (30–32). A model developed from a prospective cohort study in Korea with 8 years of follow-up was internally validated in the same source population (33). However, the model did not differentiate between premenopausal and postmenopausal women and included only three risk factors (age, age at menarche, and lactation). A more recent study developed a Korean risk prediction model for breast cancer by modifying the Breast Cancer Risk Assessment Tool (BCRAT) and validated it in two Korean cohorts, showing better validity than the original BCRAT (34). Similar to our study, that study calculated risks separately for two age groups (<50 and ≥50 years) and included several reproductive factors and modifiable lifestyle factors such as OC use and BMI. However, it could not assess model calibration across levels of risk because of the small number of breast cancer cases.
Matsuno and colleagues developed the Asian American Breast Cancer Study (AABCS) model using ethnicity-specific data to estimate absolute risks for Asian and Pacific Islander American women and found that, for Chinese and Filipino women, projected absolute risks were lower with the AABCS model than with the BCRAT, which uses data from white women (35). However, because the AABCS model is designed for American women, it may not generalize to women in Asian countries, who have historically had lower breast cancer risk than Asian women in the United States or European countries (36).
The limitations of our study include the small number of breast cancer cases in the validation cohort, especially among women 50 years or older. With additional follow-up, as the cohort ages, we expect to accrue a larger number of events. We also acknowledge that the RR model might differ between populations because the distributions of breast cancer subtypes differ. For example, a recent study found higher proportions of estrogen receptor–positive breast cancer at younger ages among Asian women than among non-Hispanic white women, which was not considered in this study (37). Another limitation was the use of simulated data for risk factors not measured in KNHANES, which may yield different results than actual data would. Estimates of the Korea-specific RRs from the KCPS may be inaccurate for some risk factors because of a high proportion of missing data. Finally, the risk models we adapted here do not include several important risk factors, which may have reduced discriminatory accuracy. Of particular importance, the model does not include history of breastfeeding, which has been found to be the strongest protective factor in Korean women (38), whereas in European-ancestry women the protective effect is relatively small (39, 40). Model fit might also be improved by tailoring the categories of available data to the Korean population, for example, by using the Asian-specific World Health Organization cutoffs for overweight and obesity (41). The relative risk models also do not include interactions among the risk factors, for example, between BMI and hormone therapy (9, 42–44). Further research should consider more comprehensive models including history of breastfeeding and other risk factors, such as breast density, bone mineral density, genetic markers, and other biomarkers. One advantage of the iCARE package is that it allows incorporation of a polygenic risk score derived from single-nucleotide polymorphisms.
In future research, we plan to evaluate whether and how genetic information improves the performance of breast cancer risk prediction models in the Korean population.
We included modifiable risk factors such as parity, age at first birth, OC use, HRT use, BMI, and alcohol intake in the breast cancer risk models, allowing policy makers to quantify the risk reduction achievable through risk factor modification and encouraging the general population to modify their behaviors. Moreover, we evaluated model calibration stratified by level of risk, which can be useful for risk-based prevention and screening that classifies individuals at the extremes of risk. The success of recalibrating existing breast cancer risk models in this Korean cohort suggests that recalibration could be of great value for assessing breast cancer risk in other countries.
In conclusion, although the original USEA breast cancer risk model, using incidence rates and risk factor distributions from U.S. non-Hispanic white women and European-ancestry relative risks, showed relatively good discrimination and calibration among Korean women younger than 50, it had lower discrimination and poor calibration among Korean women 50 or older. Recalibrated models using Korean breast cancer incidence rates and RRs had good discrimination and improved calibration. The data from this study will provide valuable information to plan and evaluate actions against breast cancer focused on primary prevention and early detection in Korean women. Future work to improve model discrimination should incorporate additional risk factors, including history of breastfeeding, genetic risk markers, and breast density.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: Y.H. Jee, C. Gao, S.H. Jee, P. Kraft
Development of methodology: Y.H. Jee, C. Gao, S.H. Jee, P. Kraft
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.H. Jee
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Y.H. Jee, C. Gao, J. Kim, S. Park, S.H. Jee, P. Kraft
Writing, review, and/or revision of the manuscript: Y.H. Jee, C. Gao, J. Kim, S. Park, S.H. Jee, P. Kraft
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y.H. Jee, S.H. Jee, P. Kraft
Study supervision: S.H. Jee, P. Kraft
Acknowledgments
This work was supported by a grant of the NCI (P30CA00651654; to P. Kraft).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.