Background:

To develop a breast cancer prediction model for Korean women using published polygenic risk scores (PRS) combined with nongenetic risk factors (NGRF).

Methods:

Thirteen PRS models generated from single or multiple combinations of the Asian and European PRSs were evaluated among 20,434 Korean women. The AUC and increase in OR per SD were compared for each PRS. The PRSs with the highest predictive power were combined with NGRFs; then, an integrated prediction model was established using the Individualized Coherent Absolute Risk Estimation (iCARE) tool. The absolute breast cancer risk was stratified for 18,142 women with available follow-up data.

Results:

PRS38_ASN+PRS190_EB, a combination of Asian and European PRSs, had the highest AUC (0.621) among PRSs, with an OR per SD increase of 1.45 (95% confidence interval: 1.31–1.61). Compared with the average risk group (35%–65%), women in the top 5% had a 2.5-fold higher risk of breast cancer. Incorporating NGRFs yielded a modest increase in the AUC of women ages >50 years. For PRS38_ASN+PRS190_EB+NGRF, the average absolute risk was 5.06%. The lifetime absolute risk at age 80 years for women in the top 5% was 9.93%, whereas that of women in the lowest 5% was 2.22%. Women at higher risks were more sensitive to NGRF incorporation.

Conclusions:

Combined Asian and European PRSs were predictive of breast cancer in Korean women. Our findings support the use of these models for personalized screening and prevention of breast cancer.

Impact:

Our study provides insights into genetic susceptibility and NGRFs for predicting breast cancer in Korean women.

Breast cancer incidence is increasing substantially in Asian countries (1, 2). In Korea, it has increased over the last 25 years (3) and is expected to continue to rise in the coming decades with the continued adoption of Western lifestyles and changes in reproductive patterns (4). Alongside these changes, there has been considerable interest in developing risk models that estimate and stratify an individual's susceptibility to breast cancer (5).

Breast cancer has a multifactorial etiology resulting from a complex interaction of genetic and nongenetic risk factors (NGRF), such as environmental, reproductive, and lifestyle factors. Therefore, it is essential to develop a prediction model that accurately captures risk factors (RF). Although earlier breast cancer prediction models were based exclusively on NGRFs, increasing efforts have been made to incorporate genetic factors, such as polygenic risk scores (PRS), into breast cancer prediction models (6–9). Emerging evidence suggests that PRSs, which provide a joint effect of numerous common genetic susceptibility variants, may explain a significant portion of genetic susceptibility to breast cancer (10, 11). However, the majority of PRSs developed to date are based on European ancestry, and PRSs for Asian women have been underevaluated (12–14). Consequently, studies have suggested integrating Asian-specific SNPs and Asian-specific weights and combining different ethnic PRSs using diverse statistical adjustments to examine the transferability of European PRSs to Asian women (15, 16).

In 2013, the Korean Breast Cancer Risk Assessment Tool was developed on the basis of the Gail model and Korean RF distribution (16). However, risk prediction models incorporating genetic factors for Korean women are lacking (16, 17). A Korean-specific risk prediction model is in high demand because the incidence of breast cancer, reproductive RFs, and genetic characteristics may differ significantly among ethnic groups.

In this study, we aimed to: (i) develop a breast cancer PRS for Korean women using previously published PRSs for those of Asian and European ancestry and (ii) investigate whether the integration of NGRFs based on Korean data could improve its predictive performance.

Study design overview

We conducted the study in two steps, corresponding to its two aims. First, previously reported breast cancer SNPs were validated among Korean women using various published PRSs, and absolute breast cancer risks were evaluated using the PRS with the highest accuracy (Fig. 1, left). Second, we evaluated the performance of the prediction models incorporating PRSs and NGRFs and estimated the absolute breast cancer risks (Fig. 1, right).

Figure 1.

Study flow chart. Previously reported breast cancer SNPs were validated among Korean women using various published PRSs, and absolute breast cancer risks were evaluated using the PRS with the highest accuracy (left). Prediction models incorporating PRSs and NGRFs were constructed using a different cohort (right). After evaluating the predictive performance of the models, the absolute breast cancer risk was estimated.


Study populations

All participants in this study belonged to the Health Examinee (HEXA) or Korean Association Resource (KARE) cohort of the Korean Genome and Epidemiology Study (KoGES) or to the Breast Cancer Case-Cohort (BCCC). The detailed design of KoGES, a large cohort study with publicly available data, has been described elsewhere (18). HEXA, initiated in 2004, recruited 173,357 participants ages >40 years from 38 health examination centers and training hospitals located in eight regions of Korea. Among them, 58,697 participants who had genotype data and met the sample quality control criteria were selected for the analyses. Samples with a low genotype call rate (<97%), cryptic relatedness, or gender discrepancy were excluded. We selected women who had not been diagnosed with cancer for further analysis (Fig. 1). Cases were defined as those who were diagnosed with breast cancer but not with other types of cancers. Controls were defined as those who were cancer-free at baseline and at the time of the follow-up surveys. The participants of HEXA were followed up using active and passive methods (18). The first follow-up of HEXA, conducted at a median of 4.6 ± 1.5 years, was denoted as HEXA1st. KARE, initiated in 2001, recruited 10,038 participants ages 40 to 69 years from two cities in Korea, Ansan and Ansung. Among them, 5,493 participants who had genotype data and met the sample quality criteria used for HEXA were selected. The KARE cohort was used as a reference dataset for RF distributions when constructing an absolute risk estimation model. The BCCC was initiated in 2008 and recruited only patients with breast cancer (N = 2,165) from Seoul National University Hospital. One BCCC participant whose age at onset was >80 years was excluded before the analysis. Participants in the BCCC had genotype data but lacked information about NGRFs. Therefore, genetic information from the BCCC was used for PRS validation only.
To perform PRS validation, we used 378 cases detected during the baseline HEXA survey. To construct the PRS+NGRF model, we used 153 cases detected at the time of the follow-up survey. Individuals who were not included in the follow-up survey (N = 4,097) or had missing NGRF data (N = 2,097) were not included in the prediction model construction process. Instead, they were used as controls for PRS validation. To ensure a balanced distribution, we generated two independent subsets from the HEXA controls at a 1:1 ratio by randomization. Finally, we analyzed 2,542 cases and 17,892 controls for PRS validation (Fig. 1). The PRS+NGRF model was constructed using 153 cases and 17,989 controls based on the aforementioned conditions (Fig. 1).
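The sample sizes quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch that assumes all BCCC cases remaining after the single exclusion entered PRS validation together with the 378 baseline HEXA cases; the variable names are illustrative and not taken from the study.

```python
# Counts quoted in the text.
bccc_recruited = 2165
bccc_excluded = 1            # age at onset > 80 years
hexa_baseline_cases = 378    # cases detected at the baseline HEXA survey
validation_controls = 17892

# Assumption: every remaining BCCC case is used for PRS validation.
validation_cases = (bccc_recruited - bccc_excluded) + hexa_baseline_cases
validation_total = validation_cases + validation_controls

# PRS+NGRF model: incident cases at follow-up plus their controls.
followup_cases = 153
model_controls = 17989
model_total = followup_cases + model_controls

print(validation_cases, validation_total, model_total)
```

Under that assumption the totals agree with the 2,542 cases, 20,434 women in PRS validation, and 18,142 women in the PRS+NGRF analysis reported in the text.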

Genetic data acquisition

The HEXA and KARE participants were genotyped using the Korean Chip (K-CHIP), which was designed by the Center for Genome Science, Korea National Institute of Health, based on the UK Biobank Axiom Array, and manufactured by Affymetrix (19). BCCC participants were genotyped using the Affymetrix Genome-wide Human SNP array 6.0. We used the Michigan imputation server for phasing (via Eagle v2.4) and imputation (via minimac4) using 1000 Genomes Phase III data as the reference panel (20). After imputation, we excluded SNPs with a low imputation quality score (INFO < 0.3), a minor allele frequency ≤0.01, a low genotype call rate (<95%), or a Hardy–Weinberg equilibrium P ≤ 1E-06.
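The post-imputation exclusion criteria can be expressed as a simple per-SNP filter. This is only a sketch of the thresholds stated above; the `SnpStats` record, its field names, and the example SNP identifiers are hypothetical, and a real pipeline would apply these filters with dedicated genetics tooling.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    info: float        # imputation quality score
    maf: float         # minor allele frequency
    call_rate: float   # genotype call rate as a fraction
    hwe_p: float       # Hardy-Weinberg equilibrium P value


def passes_qc(snp: SnpStats) -> bool:
    """Keep a SNP only if it meets none of the exclusion criteria in the text."""
    return (
        snp.info >= 0.3
        and snp.maf > 0.01
        and snp.call_rate >= 0.95
        and snp.hwe_p > 1e-6
    )


# Hypothetical SNPs, one passing and three failing a single criterion each.
snps = [
    SnpStats("rs_example_1", info=0.95, maf=0.20, call_rate=0.99, hwe_p=0.40),
    SnpStats("rs_example_2", info=0.25, maf=0.20, call_rate=0.99, hwe_p=0.40),
    SnpStats("rs_example_3", info=0.95, maf=0.005, call_rate=0.99, hwe_p=0.40),
    SnpStats("rs_example_4", info=0.95, maf=0.20, call_rate=0.99, hwe_p=1e-8),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```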

Construction of breast cancer PRS models

A total of 376 breast cancer–associated SNPs, including 313 SNPs selected from Mavaddat and colleagues, 17 novel SNPs from Zhang and colleagues, and 46 Asian-specific SNPs from Ho and colleagues, were investigated in our study (21–23). We examined seven single PRS models according to previously published β weights of SNPs (Supplementary Table S1). Single PRSs were constructed using equation (1):

$$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k = \sum_{i=1}^{k} \beta_i x_i \tag{1}$$

where β_i is a coefficient representing the association between SNP i and breast cancer, x_i is the number of risk alleles carried at SNP i, and k is the number of SNPs used. Of the 376 SNPs, 239 were available in the imputed genotype data of the three Korean cohorts. Accordingly, seven single PRS models were constructed. Depending on the numbers of incorporated SNPs and types of β weights used, PRSs were denoted as PRS38_ASN, PRS190_EUR, PRS190_ASN, PRS190_EB, PRS201_EUR, PRS201_ASN, and PRS201_META. The suffix _ASN or _EUR indicates that the β weights of the PRS were inferred from Asian or European weights, respectively (10, 21, 22). The suffix _EB indicates β weights based on a combination of Asian and European weights using the Empirical Bayes approach (23). The suffix _META indicates β weights generated by a meta-analysis of European and Asian weights reported by a previous study (ref. 15; see Supplementary Table S1 for details).
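A single PRS of this kind is a weighted sum of allele counts. The sketch below assumes the usual additive form, with one published β weight and one risk-allele dosage (0 to 2, possibly fractional after imputation) per SNP; the weights and dosages shown are made-up values, not those in Supplementary Table S1.

```python
def polygenic_risk_score(betas, dosages):
    """Weighted sum of risk-allele dosages over the k SNPs in the score."""
    if len(betas) != len(dosages):
        raise ValueError("need exactly one dosage per SNP weight")
    return sum(beta * dosage for beta, dosage in zip(betas, dosages))


# Hypothetical three-SNP score for one woman.
example_betas = [0.10, -0.05, 0.20]
example_dosages = [2.0, 1.0, 0.0]
example_prs = polygenic_risk_score(example_betas, example_dosages)
print(round(example_prs, 4))
```

Swapping in Asian, European, Empirical Bayes, or meta-analysis weights for the same SNP set changes only `betas`, which is how the differently suffixed scores relate to each other.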

Six multiple PRSs were constructed as linear combinations of the Asian and European PRSs, using equation (2):

$$\mathrm{PRS}_{\mathrm{multiple}} = \alpha_0 + \alpha_1\,\mathrm{PRS}_1 + \alpha_2\,\mathrm{PRS}_2 \tag{2}$$

where PRS_1 and PRS_2 are the two single PRSs being combined (PRS38_ASN and one of the other single PRSs; Table 1). α0, α1, and α2 were obtained by fitting a logistic regression model with breast cancer incidence as the outcome. PRSs were standardized to the respective standard deviations of the HEXA controls. Ten-fold cross-validation of the regression was conducted for the multiple PRS models. The relative contributions of each PRS to the multiple PRS models are shown in Supplementary Table S2.
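The combination step can be sketched as follows: scale each single PRS by the SD among controls, fit a logistic regression of case status on the two scaled scores, and use the fitted linear predictor as the combined PRS. This is an illustration only. The toy data are invented, the plain gradient-ascent fitter stands in for whatever statistical software the authors used, scaling without mean-centering is an assumption, and the ten-fold cross-validation is not shown.

```python
import math
import statistics


def scale_by_control_sd(scores, control_scores):
    """Express scores in units of the SD observed among controls."""
    sd = statistics.stdev(control_scores)
    return [s / sd for s in scores]


def fit_logistic(rows, labels, learning_rate=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood; returns [alpha0, alpha1, ...]."""
    n, p = len(rows), len(rows[0])
    alpha = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, label in zip(rows, labels):
            z = alpha[0] + sum(a * x for a, x in zip(alpha[1:], row))
            resid = label - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        alpha = [a + learning_rate * g / n for a, g in zip(alpha, grad)]
    return alpha


def combined_prs(alpha, prs_1, prs_2):
    """Linear predictor alpha0 + alpha1 * PRS_1 + alpha2 * PRS_2."""
    return alpha[0] + alpha[1] * prs_1 + alpha[2] * prs_2


# Invented toy data: first four women are controls, last four are cases.
status = [0, 0, 0, 0, 1, 1, 1, 1]
raw_asn = [-0.4, -0.2, 0.0, 0.2, -0.2, 0.2, 0.4, 0.4]
raw_eur = [-0.2, -0.4, 0.2, -0.2, 0.2, 0.4, 0.0, 0.4]

asn = scale_by_control_sd(raw_asn, raw_asn[:4])
eur = scale_by_control_sd(raw_eur, raw_eur[:4])
alpha = fit_logistic(list(zip(asn, eur)), status)
combined = [combined_prs(alpha, a, e) for a, e in zip(asn, eur)]
print([round(a, 3) for a in alpha])
```

Because the intercept α0 is part of the linear predictor, the combined score sits on the log-odds scale of the fitted model, which would be consistent with the negative means reported for the multiple PRSs in Table 1.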

Models incorporating NGRFs

For the PRS models developed during step one, NGRFs were incorporated to establish an integrated risk prediction model. In Korea, the incidence of breast cancer differs by menopausal status, and RFs linked to the development of breast cancer exert varying effects (16, 24). Thus, the PRS+NGRF models were constructed separately using a cut-off age of 50 years by applying different relative risks (RR) and RFs. Information about estrogen-dependent NGRFs in HEXA and KARE was taken from the survey data. The body mass index (BMI) measured at the time of enrollment (average age, 53 ± 8.37 years) was used. Breast cancer–associated NGRFs and respective RRs were obtained from external studies (16, 24). For women ages <50 years, age at menarche, family history of breast cancer, menopausal status, age at first full-term pregnancy, height, and BMI were included. For women ages ≥50 years, age at menopause and pregnancy experience (nullipara or para; Supplementary Table S3) were additionally included, whereas age at first full-term pregnancy and menopausal status were excluded. Supplementary Table S3 provides a description of the RR estimates used in this study. In all prediction models incorporating NGRF scores, equation (3) was used, where F_k and w_k are the value and corresponding weight of factor k, respectively:

$$\mathrm{NGRF\ score} = \sum_{k} w_k F_k \tag{3}$$
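The age-specific NGRF score can be sketched as a weighted sum over the factor set that applies to a woman's age group. The factor lists follow the description above. Everything else is an assumption made for illustration: the binary 0/1 coding of each factor, the use of log-RR as the weight, and the RR values themselves, which are invented and are not those in Supplementary Table S3.

```python
import math

# Factor sets as described in the text for the two age groups.
FACTORS_UNDER_50 = (
    "age_at_menarche", "family_history", "menopausal_status",
    "age_at_first_full_term_pregnancy", "height", "bmi",
)
FACTORS_50_AND_OVER = (
    "age_at_menarche", "family_history", "height", "bmi",
    "age_at_menopause", "pregnancy_experience",
)


def ngrf_score(age, factor_values, weights):
    """Sum of w_k * F_k over the factors used for this age group."""
    factors = FACTORS_UNDER_50 if age < 50 else FACTORS_50_AND_OVER
    return sum(weights[k] * factor_values[k] for k in factors)


# Hypothetical relative risks for the "at-risk" level of each factor.
hypothetical_rr = {
    "age_at_menarche": 1.2, "family_history": 2.0, "menopausal_status": 1.1,
    "age_at_first_full_term_pregnancy": 1.3, "height": 1.1, "bmi": 1.3,
    "age_at_menopause": 1.4, "pregnancy_experience": 1.2,
}
weights = {k: math.log(rr) for k, rr in hypothetical_rr.items()}

# One hypothetical woman: positive family history, high BMI, late menopause.
woman = {k: 0 for k in hypothetical_rr}
woman.update({"family_history": 1, "bmi": 1, "age_at_menopause": 1})

score_if_45 = ngrf_score(45, woman, weights)
score_if_55 = ngrf_score(55, woman, weights)
print(round(math.exp(score_if_45), 2), round(math.exp(score_if_55), 2))
```

The same woman receives a different score in each model because age at menopause contributes only in the ≥50 years factor set.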

Statistical analysis

PRS association analysis and evaluation of predictive performance

For PRS association analyses, we used logistic regression adjusted for age and study. We estimated the OR per SD of the PRS and the ORs for seven percentile groups (0%–5%, 5%–15%, 15%–35%, 35%–65%, 65%–85%, 85%–95%, 95%–100%), with the 35%–65% group serving as the average-risk reference.
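Assigning women to the seven percentile groups amounts to computing cut points from a reference distribution and binning each score. The sketch below is one simple way to do this; which distribution the authors used to define the cut points is not stated here, so `reference_scores` is a placeholder, and the nearest-rank rule is an arbitrary choice.

```python
import bisect

BOUNDS = [5, 15, 35, 65, 85, 95]  # percentile boundaries between groups
LABELS = ["0-5%", "5-15%", "15-35%", "35-65%", "65-85%", "85-95%", "95-100%"]


def percentile_cutpoints(reference_scores, bounds=BOUNDS):
    """Score values at the given percentiles of the reference distribution."""
    ordered = sorted(reference_scores)
    n = len(ordered)
    return [ordered[min(n - 1, int(n * b / 100))] for b in bounds]


def assign_group(score, cutpoints):
    """Label of the percentile group that a score falls into."""
    return LABELS[bisect.bisect_right(cutpoints, score)]


# Toy reference distribution: the integers 0..99.
reference_scores = list(range(100))
cutpoints = percentile_cutpoints(reference_scores)
groups = [assign_group(s, cutpoints) for s in (0, 50, 99)]
print(cutpoints, groups)
```

Group membership would then enter the logistic regression as a categorical variable with the 35%–65% group as the reference level.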

The predictive performance of the PRS was measured by the area under the ROC curve (AUC) using logistic regression. To compare the NGRF-only and PRS-only models with the integrated (PRS+NGRF) models, the AUC and expected-to-observed (E/O) ratios were evaluated.
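Both metrics are straightforward to compute from model output. The sketch below uses the rank interpretation of the AUC (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half) and defines the E/O ratio as the sum of predicted risks divided by the number of observed cases. The scores and risks are invented.

```python
def auc(case_scores, control_scores):
    """Fraction of case-control pairs in which the case has the higher score."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def expected_to_observed(predicted_risks, observed_cases):
    """E/O ratio: values near 1 indicate good overall calibration."""
    return sum(predicted_risks) / observed_cases


# Invented example values.
auc_value = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
eo_ratio = expected_to_observed([0.02] * 50, observed_cases=1)
print(round(auc_value, 3), round(eo_ratio, 3))
```

The pairwise loop is quadratic in sample size; for cohorts of this scale a rank-based implementation from a statistics library would be the practical choice.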

Estimation of the absolute risk of breast cancer according to PRS percentiles

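For the PRS showing the highest prediction accuracy, the lifetime absolute risks of breast cancer were estimated. Furthermore, using an integrated PRS+NGRF model, the lifetime and 5-year absolute breast cancer risks were recalculated. The absolute risk of breast cancer for a woman of age α over the time interval (α, α+τ) was defined according to equation (4):

$$P(\alpha, \alpha+\tau \mid Z) = \int_{\alpha}^{\alpha+\tau} \lambda_0(t)\, e^{\beta^{\top} Z} \exp\left\{ -\int_{\alpha}^{t} \left[ \lambda_0(u)\, e^{\beta^{\top} Z} + m(u) \right] du \right\} dt \tag{4}$$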

Equation (4) assumes that RF Z acts in a multiplicative fashion on the baseline hazard function λ0(t). It accounts for competing risks originating from mortality due to other causes through m(t), the age-specific mortality rate function (2). The lifetime absolute risk was evaluated as the risk between the age of 20 years and a specific age with a maximum of 80 years. The 5-year absolute risk was defined as the risk within the next 5 years for a woman who has reached a specific age. The Individualized Coherent Absolute Risk Estimation (iCARE) tool requires the RRs of RFs (Z), log-relative risks (β), age-specific incidence rate of all-cause mortality excluding breast cancer mortality, incidence rates of breast cancer, and RF distributions within a population. For this study, RRs were obtained from external studies (16, 24), and RF distributions were derived from KARE, which was used as a reference cohort. The age-specific breast cancer incidence and mortality rates of Korean women in 2010 were obtained from the Korean Statistical Information Service. Absolute risks were evaluated with R 4.2.1 using the iCARE R package (version 1.18.0; ref. 25). P < 0.05 was considered significant.
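The structure of this absolute risk calculation can be illustrated with a one-year discretization: in each year of age, the breast cancer hazard (baseline hazard multiplied by the relative risk) is weighted by the probability of having so far avoided both breast cancer and death from other causes. This is a simplified sketch rather than the iCARE implementation. The constant toy rates are invented, and the calibration of the baseline hazard to population incidence using the reference RF distribution, which iCARE performs, is omitted.

```python
import math


def absolute_risk(start_age, end_age, baseline_hazard, other_cause_mortality, log_rr):
    """Discrete-time approximation of absolute risk between two ages.

    baseline_hazard(age) and other_cause_mortality(age) return yearly rates;
    log_rr is the woman's log relative risk (beta' Z).
    """
    relative_risk = math.exp(log_rr)
    cumulative_hazard = 0.0  # total hazard (both causes) accumulated so far
    risk = 0.0
    for age in range(start_age, end_age):
        hazard = baseline_hazard(age) * relative_risk
        risk += hazard * math.exp(-cumulative_hazard)
        cumulative_hazard += hazard + other_cause_mortality(age)
    return risk


def toy_incidence(age):
    return 0.001  # hypothetical constant yearly breast cancer rate


def toy_mortality(age):
    return 0.005  # hypothetical constant yearly other-cause mortality


lifetime_average = absolute_risk(20, 80, toy_incidence, toy_mortality, 0.0)
lifetime_high = absolute_risk(20, 80, toy_incidence, toy_mortality, math.log(2.5))
five_year_at_40 = absolute_risk(40, 45, toy_incidence, toy_mortality, 0.0)
print(round(lifetime_average, 4), round(lifetime_high, 4), round(five_year_at_40, 4))
```

Because of the competing-risk term, a 2.5-fold relative risk yields somewhat less than a 2.5-fold lifetime absolute risk, which is one reason absolute risks are reported separately from ORs.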

To investigate the associated effect of PRS and NGRF according to the magnitude of risks, absolute lifetime risks using multiple PRS and NGRF risk strata were analyzed. PRSs were classified into three risk groups (0%–20%: low; 20%–80%: mid; 80%–100%: high), and NGRF scores were classified into two groups divided using a median distribution (0%–50%: low; 50%–100%: high).

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of National Biobank of Korea (P01-202108-31-001) and Seoul National University Hospital (1507-132-689). Written informed consent was obtained from all subjects.

Data availability

The data underlying this article are available in the article and in its online Supplementary Materials and Methods. Raw data were generated from the Korea Genome and Epidemiology Study. The age-specific breast cancer incidence and mortality rates of Korean women were obtained from the Korean Statistical Information Service (SCR_023565). The derived data supporting the findings of this study are available from the corresponding author upon request.

Study population

PRS validation was performed among 20,434 Korean women (Fig. 1). The PRS+NGRF model was evaluated among 18,142 women, of whom 153 developed breast cancer during the follow-up period. Among these 153 incident cases, 68 occurred in women ages <50 years and 85 occurred in those ages ≥50 years.

Performance of PRS in the Korean population

Thirteen PRSs were constructed using previously reported Asian and European SNPs (Table 1). In general, the multiple PRS models performed better than the single PRS models for Korean women. Among the PRS models, the most predictive was PRS38_ASN+PRS190_EB (AUC: 0.621), although the overall AUC differences between PRS models were marginal. The contribution of PRS38_ASN to PRS38_ASN+PRS190_EB was approximately 30% (Supplementary Table S2). We did not observe a significant interaction between PRS38_ASN+PRS190_EB and age (Supplementary Table S4). The density plot of PRS38_ASN+PRS190_EB is shown in Supplementary Fig. S1. The distribution curve for cases was shifted to the right compared with that of the controls. The percentile association of PRS38_ASN+PRS190_EB stratified into seven percentile groups is presented in Table 2. Compared with the average risk group (35%–65%), women in the top 5% had a 2.5-fold higher risk of breast cancer, and women in the lowest 5% had a 0.61-fold risk. The risk distributions were well distinguished between the risk percentile groups, although not all associations were statistically significant. Lifetime and 5-year absolute risks of PRS38_ASN+PRS190_EB are shown in Supplementary Fig. S2. The lifetime absolute risk at age 80 years for women in the highest 5% was 9.91%, and that of women in the lowest 5% was 2.18% (average lifetime absolute risk: 4.89%).

Table 1.

Mean, SD, and the associations of the PRS with the breast cancer risk of Korean women.

HEXA^a + BCCC^b

| PRS | Case (N = 2,542), mean ± SD | Control (N = 17,892), mean ± SD | OR per SD (95% CI)^c | AUC^d (95% CI) |
|---|---|---|---|---|
| Single PRS | | | | |
| PRS38_ASN | −0.10 ± 0.41 | −0.24 ± 0.41 | 1.37 (1.24–1.52) | 0.592 (0.581–0.604) |
| PRS190_EUR | 0.68 ± 0.51 | 0.47 ± 0.50 | 1.38 (1.24–1.53) | 0.611 (0.599–0.623) |
| PRS190_ASN | −0.06 ± 0.56 | −0.29 ± 0.54 | 1.41 (1.27–1.56) | 0.612 (0.600–0.624) |
| PRS190_EB | 0.24 ± 0.46 | 0.05 ± 0.45 | 1.41 (1.27–1.56) | 0.616 (0.604–0.627) |
| PRS201_EUR | 0.46 ± 0.52 | 0.25 ± 0.50 | 1.37 (1.23–1.51) | 0.612 (0.600–0.624) |
| PRS201_ASN | 0.03 ± 0.55 | −0.19 ± 0.54 | 1.41 (1.28–1.56) | 0.612 (0.600–0.624) |
| PRS201_META | 0.61 ± 0.59 | 0.37 ± 0.57 | 1.41 (1.28–1.57) | 0.614 (0.603–0.626) |
| Multiple PRS | | | | |
| PRS38_ASN+PRS190_EUR | −1.85 ± 0.21 | −1.94 ± 0.20 | 1.44 (1.30–1.59) | 0.619 (0.607–0.631) |
| PRS38_ASN+PRS190_ASN | −2.07 ± 0.22 | −2.17 ± 0.21 | 1.44 (1.30–1.59) | 0.615 (0.604–0.627) |
| PRS38_ASN+PRS190_EB | −1.98 ± 0.20 | −2.07 ± 0.19 | 1.45 (1.31–1.61) | 0.621 (0.609–0.633) |
| PRS38_ASN+PRS201_EUR | −1.92 ± 0.21 | −2.01 ± 0.21 | 1.43 (1.29–1.58) | 0.620 (0.608–0.631) |
| PRS38_ASN+PRS201_ASN | −2.04 ± 0.21 | −2.13 ± 0.21 | 1.44 (1.30–1.59) | 0.615 (0.603–0.627) |
| PRS38_ASN+PRS201_META | −1.85 ± 0.23 | −1.95 ± 0.23 | 1.45 (1.31–1.60) | 0.618 (0.607–0.630) |

^a HEXA: Health Examinee.

^b BCCC: Breast Cancer Case-Cohort.

^c OR: odds ratio, estimated using a logistic regression model adjusted for age and study; SD: standard deviation; CI: confidence interval.

^d AUC: area under the curve.

Table 2.

Percentile association of PRS38_ASN+PRS190_EB and participant distributions.

| PRS percentile | BCCC^a cases (n) | HEXA^b cases (n) | HEXA controls (n) | OR (95% CI)^c | P |
|---|---|---|---|---|---|
| 0%–5% | 47 | 11 | 964 | 0.61 (0.31–1.09) | 1.20E-01 |
| 5%–15% | 125 | 20 | 1,898 | 0.57 (0.34–0.90) | 2.11E-02 |
| 15%–35% | 290 | 63 | 3,734 | 0.91 (0.66–1.24) | 5.39E-01 |
| 35%–65% | 610 | 101 | 5,419 | Reference | - |
| 65%–85% | 534 | 94 | 3,459 | 1.46 (1.10–1.94) | 9.35E-03 |
| 85%–95% | 320 | 54 | 1,669 | 1.74 (1.24–2.42) | 1.17E-03 |
| 95%–100% | 238 | 35 | 749 | 2.50 (1.67–3.67) | 4.52E-06 |

^a BCCC: Breast Cancer Case-Cohort.

^b HEXA: Health Examinee.

^c ORs were estimated using a logistic regression model adjusted for age and study; CI: confidence interval.
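For orientation, a crude (unadjusted) OR can be computed directly from the counts in Table 2 by pooling BCCC and HEXA cases within each percentile group. This is only a worked arithmetic illustration. It ignores the adjustment for age and study, so it is not expected to reproduce the adjusted OR of 2.50 reported for the top 5%.

```python
def crude_odds_ratio(cases_group, controls_group, cases_ref, controls_ref):
    """Unadjusted OR of a percentile group versus the reference group."""
    return (cases_group / controls_group) / (cases_ref / controls_ref)


# Counts from Table 2 (BCCC cases + HEXA cases, HEXA controls).
top_cases, top_controls = 238 + 35, 749        # 95%-100% group
ref_cases, ref_controls = 610 + 101, 5419      # 35%-65% reference group

crude_or_top = crude_odds_ratio(top_cases, top_controls, ref_cases, ref_controls)
print(round(crude_or_top, 2))
```

The crude value comes out at roughly 2.8, somewhat above the adjusted estimate, as might be expected when all BCCC participants are cases and study is one of the adjustment covariates.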

Performance of the prediction model incorporating NGRFs

The prediction model established using NGRFs alone had limited predictive power (Fig. 2; Supplementary Table S5). NGRF models were more predictive for women ages ≥50 years than for those ages <50 years (AUC: 0.564 vs. 0.503). The change in AUC when NGRFs were added to PRS models differed noticeably by age group. For women ages ≥50 years, the addition of NGRFs increased the AUC, although the PRS alone had a lower AUC than in younger women. For participants ages <50 years, the incorporation of NGRFs had only a small effect on the AUC, whereas the PRS alone had better predictive performance. This suggests that risk prediction for women ages ≥50 years depends more on NGRFs, whereas prediction for women ages <50 years depends more on genetic predisposition. PRS38_ASN+PRS201_META+NGRF had the highest predictive power for women ages <50 years, whereas PRS38_ASN+PRS190_EB+NGRF was the most predictive model for women ages ≥50 years. Nevertheless, because the overall AUC differed little (0.012) between the best and worst multiple PRSs across age groups, we used PRS38_ASN+PRS190_EB, which had the highest accuracy during step one (hereafter referred to as the multiple PRS model), and its NGRF-extended counterpart PRS38_ASN+PRS190_EB+NGRF (hereafter referred to as the integrated model) to further estimate the absolute breast cancer risk.

Figure 2.

AUC for various PRS models and NGRFs predicting the breast cancer risk. The AUC was compared among NGRFs, PRSs, and integrated (PRS+NGRF) models for women. A, Age <50 years. B, ≥50 years.


Absolute risk of breast cancer according to PRS percentiles

Figure 3 depicts the lifetime and 5-year absolute risks of the integrated model. The lifetime absolute risk of breast cancer ranged from 2% to 10%, with an average of 5.06% (Fig. 3A). The absolute risk at age 80 years for women in the highest 5% was 9.93%, whereas that for women in the lowest 5% was 2.22%. The 5-year absolute risk for women in the top 5% peaked at 1.47% at age 48 years and declined thereafter (Fig. 3B). The 5-year absolute risk of the average risk group at 40 years, the age at which the first breast cancer screening is recommended in Korea, was 0.6%. However, women in the top 5% reached this level of risk much earlier, at age 33 years. This outcome may support the need for individualized screening strategies for high-risk women, particularly those ages <40 years.

Figure 3.

Estimation of the absolute breast cancer risk by seven percentiles. The absolute risk of developing breast cancer is predicted using the integrated model (PRS38_ASN+PRS190_EB+NGRF). The dotted lines represent the average risks. A, Lifetime absolute risk. B, 5-year absolute risk.


To compare the changes induced by incorporating NGRFs, the lifetime and 5-year absolute risks of the multiple PRS model were also estimated using HEXA1st (Supplementary Fig. S3). The average absolute risks of the integrated model (5.06%) and multiple PRS model (4.81%) were similar. However, the density distributions of the two models differed. Figure 4 depicts the density plot of the multiple PRS and integrated models at age 80 years. Adding NGRFs to the multiple PRS model increased the SD of the absolute risk. A greater increase in the SD was observed for the higher-risk groups, especially the top 5% (Supplementary Table S6). This suggests that the estimated risk of women at higher genetic risk is more sensitive to NGRFs and, potentially, to risk-reducing interventions.

Figure 4.

Density plot of breast cancer absolute risk at age 80. The absolute risk was stratified by seven PRS percentiles. A, Multiple PRS model. B, Integrated model.


To investigate the associated effect of PRS and NGRF, we analyzed the absolute lifetime risks using different PRS and NGRF risk levels (Supplementary Fig. S4). In the model, the curves for mid PRS+high NGRF versus high PRS+low NGRF as well as the curves for mid PRS+low NGRF versus low PRS+high NGRF nearly overlapped. This supports the idea that risk modifications might reduce the breast cancer risk of some individuals despite their inherited genetic risks. The difference in the absolute risk with high NGRF and low NGRF levels was greater for women at higher risk, indicating that there is greater potential for risk reduction among this group of women.

In our study, we developed a breast cancer prediction model for Korean women based on 13 PRSs and NGRFs. We demonstrated that (i) the combined Asian and European PRS was predictive of breast cancer among Korean women and (ii) the incorporation of NGRFs improved breast cancer risk stratification for women ages ≥50 years. Our findings provide essential insights into genetic susceptibility and NGRFs for predicting breast cancer among Korean women.

To our knowledge, very few studies of the Korean PRS have been conducted, and most of them were conducted as a part of large-scale Asian studies (14, 22). In a previous analysis, a PRS constructed with 44 SNPs among East Asian women in the Breast Cancer Association Consortium was examined among Korean women. In that study, although the Korean-specific AUC was not provided, the overall AUC was 0.606 (14), which is consistent with the AUC of PRS38_ASN in our study (AUC: 0.592). In addition, one recent study explored a combined Asian and European PRS (PRS46+PRS287_EB) for Asian women, and an AUC of 0.630 (OR per SD: 1.59) was reported for a subset of Korean women (22). Our study is one of the few studies to establish a breast cancer risk prediction model incorporating PRSs for Korean women only. In our study, the AUC of the PRS with the highest predictive power was 0.621. This is compatible with those of published studies that examined PRSs for Asian women (13, 22, 26, 27). In addition, the increase in AUC resulting from combining NGRFs with multiple PRSs was comparable with the findings of previous studies (Supplementary Table S5; refs. 7, 9, 15).

We established two separate models incorporating NGRFs according to age using different RFs and RRs. In Korea, menopause is an important RF contributing to a distinctive breast cancer incidence curve that peaks at age 50 years and declines thereafter, as shown in this study (Fig. 3; refs. 5, 24). For this reason, Korean prediction models have often analyzed age groups separately (16, 17, 24, 28). In our study, we observed that the contributions of NGRFs and PRSs to the prediction of breast cancer were markedly different among age groups (Fig. 2; Supplementary Table S5). For women ages <50 years, the AUC of the PRS alone was higher than that for women ages ≥50 years, and adding NGRFs contributed little to the AUC in this group. Our findings suggest that young women have higher genetic susceptibility to breast cancer than their older counterparts (26). An analysis that evaluated the interactions between the PRS and NGRFs showed a stronger association between the PRS and breast cancer among premenopausal women (OR: 2.46), thus supporting the findings of our study (29).

In our study, the performance of the prediction models among women ages <50 years was greater than it was among those ages ≥50 years. Presumably, risk prediction for women ages ≥50 years is further complicated by menopause and confounding factors such as BMI (30–32). In a previous study, a prediction model constructed using iCARE based on Korean incidence and mortality rates and Korean-based risk distributions showed an AUC of 0.584 for women ages ≥50 years and an AUC of 0.697 for women ages <50 years (24). The Korean Breast Cancer Risk Assessment Tool incorporated RFs such as age at menopause, pregnancy experience, BMI, oral contraceptive usage, and exercise for women ages ≥50 years, unlike its model for premenopausal women, yielding an AUC of 0.65 for women ages ≥50 years (17). In this study, we selected reproductive RFs as the main NGRFs. The effects of estrogen-dependent reproductive factors, including age at menarche and number of pregnancies, on breast cancer development have been clearly established (4), whereas the effects of lifestyle factors (alcohol intake, hormone use, and exercise) are complicated and still controversial (33–36). In addition, owing to the characteristics of questionnaire-based surveys, self-reported lifestyle RFs that change over time carry a high risk of recall bias or misclassification (32).

The results of our study support the use of personalized screening and prevention strategies based on PRS risk stratification. To date, most guidelines have been updated to implement personalized screening for those at high risk, notably those with a family history or identified pathogenic variants (37, 38). However, the current Korean national breast cancer screening program follows a one-size-fits-all approach, encouraging women ages 40 to 69 years in the general population to undergo biennial mammography based on the evidence that screening reduces breast cancer mortality in this age group (39). Our study suggests that earlier screening should be considered for women in the high-risk group, particularly for those ages <40 years who are currently ineligible for screening programs. In addition, preventive interventions and lifestyle modifications should be prioritized for women with higher risk scores to yield maximum risk reductions (40). For instance, lifestyle changes affecting BMI may modify the breast cancer risk. However, the best screening tool for breast cancer in young women and its effect on reducing real-world breast cancer mortality should be carefully evaluated before implementation (41).

Our study had several limitations. First, the small sample size limited the optimization of the prediction model. A larger sample might have yielded significant P values for all risk estimates in the multiple PRS model (Supplementary Table S4). In addition, because few participants were ages >70 years (1.57%), risk estimates for women in this age group may be poorly represented. The average life expectancy of Koreans is increasing and reached 84 years in 2021 (42). Further accumulation of follow-up data may enable us to develop a more accurate model that covers all age strata. Second, because the BCCC and HEXA cohorts used for the case–control analyses were recruited from either a teaching hospital or health examination centers in urban areas, their characteristics may not be representative of the entire Korean population, which may introduce selection bias. Further external validation among the general population is warranted. Third, our model does not include a modifiable RF. Further adjustments are needed to include information about modifiable RFs; other RFs associated with breast cancer risk, such as mammographic density, might also improve the predictive power (43, 44). Finally, we included only common variants in our model; pathogenic variants that are known to confer a higher risk of breast cancer may be included in future prediction models.

In conclusion, we showed that the combined PRS is predictive of breast cancer in Korean women. The incorporation of NGRFs further enhanced the predictive power for women ages >50 years. These models can be helpful in developing optimal screening strategies and effective preventive measures for breast cancer.

T.-W. Ha reports personal fees from DCGen Co., Ltd. outside the submitted work and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending. H.-M. Choi reports personal fees from DCGen Co., Ltd. outside the submitted work and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending. H.-B. Lee reports grants from the Korea Health Industry Development Institute (KHIDI) during the conduct of the study; is the chief medical officer of DCGen Co., Ltd.; and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending. H.-C. Shin is the chief executive officer of DCGen Co., Ltd. and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending. W. Chung reports personal fees from DCGen Co., Ltd.; is the chief technical officer of DCGen Co., Ltd.; and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending. W. Han reports grants from the Korea Health Industry Development Institute (KHIDI) during the conduct of the study; is the chairman of DCGen Co., Ltd.; and has a patent for 10-2022-0144476 (Methods for predicting cancer prognosis and compositions thereof) pending.

J. Choi: Formal analysis, writing–original draft, writing–review and editing. T.-W. Ha: Data curation, formal analysis, methodology, writing–original draft, writing–review and editing. H.-M. Choi: Data curation, methodology, writing–review and editing. H.-B. Lee: Conceptualization, methodology, writing–review and editing. H.-C. Shin: Conceptualization, methodology, writing–review and editing. W. Chung: Conceptualization, supervision, methodology, writing–review and editing. W. Han: Conceptualization, supervision, methodology, writing–review and editing.

W. Han and H.-B. Lee received a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant no.: HR14C0003).

This study was conducted with bioresources from the National Biobank of Korea, the Korea Disease Control and Prevention Agency, Republic of Korea (KBN-2021-504).

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394–424.
2. Ghoncheh M, Mahdavifar N, Darvishi E, Salehiniya H. Epidemiology, incidence and mortality of breast cancer in Asia. Asian Pac J Cancer Prev 2016;17:47–52.
3. Hong S, Won YJ, Park YR, Jung KW, Kong HJ, Lee ES. Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2017. Cancer Res Treat 2020;52:335–50.
4. Lee JE, Lee SA, Kim TH, Park S, Choy YS, Ju YJ, et al. Projection of breast cancer burden due to reproductive/lifestyle changes in Korean women (2013–2030) using an age-period-cohort model. Cancer Res Treat 2018;50:1388–95.
5. Park HL. Breast cancer risk prediction in Korean women: review and perspectives on personalized breast cancer screening. J Breast Cancer 2020;23:331–42.
6. Rudolph A, Song M, Brook MN, Milne RL, Mavaddat N, Michailidou K, et al. Joint associations of a polygenic risk score and environmental risk factors for breast cancer in the Breast Cancer Association Consortium. Int J Epidemiol 2018;47:526–36.
7. van Veen EM, Brentnall AR, Byers H, Harkness EF, Astley SM, Sampson S, et al. Use of single-nucleotide polymorphisms and mammographic density plus classic risk factors for breast cancer risk prediction. JAMA Oncol 2018;4:476–82.
8. Lakeman IM, Rodríguez-Girondo M, Lee A, Ruiter R, Stricker BH, Wijnant SR, et al. Validation of the BOADICEA model and a 313-variant polygenic risk score for breast cancer risk prediction in a Dutch prospective cohort. Genet Med 2020;22:1803–11.
9. Shieh Y, Hu D, Ma L, Huntsman S, Gard CC, Leung JW, et al. Breast cancer risk prediction using a clinical risk model and polygenic risk score. Breast Cancer Res Treat 2016;159:513–25.
10. Mavaddat N, Antoniou AC, Easton DF, Garcia-Closas M. Genetic susceptibility to breast cancer. Mol Oncol 2010;4:174–91.
11. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 2013;45:353–61.
12. Chan CHT, Munusamy P, Loke SY, Koh GL, Yang AZY, Law HY, et al. Evaluation of three polygenic risk score models for the prediction of breast cancer risk in Singapore Chinese. Oncotarget 2018;9:12796–804.
13. Lee CPL, Irwanto A, Salim A, Yuan JM, Liu J, Koh WP, et al. Breast cancer risk assessment using genetic variants and risk factors in a Singapore Chinese population. Breast Cancer Res 2014;16:R64.
14. Wen W, Shu XO, Guo X, Cai Q, Long J, Bolla MK, et al. Prediction of breast cancer risk based on common genetic variants in women of East Asian ancestry. Breast Cancer Res 2016;18:124.
15. Yang Y, Tao R, Shu X, Cai Q, Wen W, Gu K, et al. Incorporating polygenic risk scores and nongenetic risk factors for breast cancer risk prediction among Asian women. JAMA Netw Open 2022;5:e2149030.
16. Park B, Ma SH, Shin A, Chang MC, Choi JY, Kim S, et al. Korean risk assessment model for breast cancer risk prediction. PLoS One 2013;8:e76736.
17. Lee C, Lee JC, Park B, Bae J, Lim MH, Kang D, et al. Computational discrimination of breast cancer for Korean women based on epidemiologic data only. J Korean Med Sci 2015;30:1025–34.
18. Kim Y, Han BG, KoGES Group. Cohort profile: the Korean genome and epidemiology study (KoGES) consortium. Int J Epidemiol 2017;46:1350.
19. Moon S, Kim YJ, Han S, Hwang MY, Shin DM, Park MY, et al. The Korea Biobank Array: design and identification of coding variants associated with blood biochemical traits. Sci Rep 2019;9:1382.
20. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–7.
21. Mavaddat N, Michailidou K, Dennis J, Lush M, Fachal L, Lee A, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet 2019;104:21–34.
22. Ho WK, Tai MC, Dennis J, Shu X, Li J, Ho PJ, et al. Polygenic risk scores for prediction of breast cancer risk in Asian populations. Genet Med 2022;24:586–600.
23. Zhang H, Ahearn TU, Lecarpentier J, Barnes D, Beesley J, Qi G, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet 2020;52:572–81.
24. Jee YH, Gao C, Kim J, Park S, Jee SH, Kraft P. Validating breast cancer risk prediction models in the Korean Cancer Prevention Study-II Biobank. Cancer Epidemiol Biomarkers Prev 2020;29:1271–7.
25. Pal Choudhury P, Maas P, Wilcox A, Wheeler W, Brook M, Check D, et al. iCARE: an R package to build, validate and apply absolute risk models. PLoS One 2020;15:e0228198.
26. Hsieh YC, Tu SH, Su CT, Cho EC, Wu CH, Hsieh MC, et al. A polygenic risk score for breast cancer risk in a Taiwanese population. Breast Cancer Res Treat 2017;163:131–8.
27. Zheng W, Wen W, Gao YT, Shyr Y, Zheng Y, Long J, et al. Genetic and clinical predictors for breast cancer risk assessment and stratification among Chinese women. J Natl Cancer Inst 2010;102:972–81.
28. Anastasiadi Z, Lianos GD, Ignatiadou E, Harissis HV, Mitsis M. Breast cancer in young women: an overview. Updates Surg 2017;69:313–7.
29. Shi M, O'Brien K, Weinberg C. Interactions between a polygenic risk score and non-genetic risk factors in young-onset breast cancer. Sci Rep 2020;10:3242.
30. Zhang X, Rice M, Tworoger SS, Rosner BA, Eliassen AH, Tamimi RM, et al. Addition of a polygenic risk score, mammographic density, and endogenous hormones to existing breast cancer risk prediction models: a nested case–control study. PLoS Med 2018;15:e1002644.
31. Hsieh CC, Trichopoulos D, Katsouyanni K, Yuasa S. Age at menarche, age at menopause, height and obesity as risk factors for breast cancer: associations and interactions in an international case-control study. Int J Cancer 1990;46:796–800.
32. Liu R, Kitamura Y, Kitamura T, Sobue T, Sado J, Sugawara Y, et al. Reproductive and lifestyle factors related to breast cancer among Japanese women: an observational cohort study. Medicine 2019;98:e18315.
33. Park SY, Kolonel LN, Lim U, White KK, Henderson BE, Wilkens LR. Alcohol consumption and breast cancer risk among women from five ethnic groups with light to moderate intakes: the Multiethnic Cohort Study. Int J Cancer 2014;134:1504–10.
34. Gao Y, Huang YB, Liu XO, Chen C, Dai HJ, Song FJ, et al. Tea consumption, alcohol drinking and physical activity associations with breast cancer risk among Chinese females: a systematic review and meta-analysis. Asian Pac J Cancer Prev 2013;14:7543–50.
35. Irwin ML, Varma K, Alvarez-Reeves M, Cadmus L, Wiley A, Chung GG, et al. Randomized controlled trial of aerobic exercise on insulin and insulin-like growth factors in breast cancer survivors: the Yale Exercise and Survivorship Study. Cancer Epidemiol Biomarkers Prev 2009;18:306–13.
36. Chen WY. Postmenopausal hormone therapy and breast cancer risk: current status and unanswered questions. Endocrinol Metab Clin North Am 2011;40:509–18.
37. Evans DG, Graham J, O'Connell S, Arnold S, Fitzsimmons D. Familial breast cancer: summary of updated NICE guidance. BMJ 2013;346:f3829.
38. Monticciolo DL, Newell MS, Moy L, Niell B, Monsees B, Sickles EA. Breast cancer screening in women at higher-than-average risk: recommendations from the ACR. J Am Coll Radiol 2018;15:408–14.
39. Lee EH, Park BY, Kim NS, Suh HJ, Koh KR, Min JW, et al. The Korean guideline for breast cancer screening. J Korean Med Assoc 2015;58:408–19.
40. Arthur RS, Wang T, Xue X, Kamensky V, Rohan TE. Genetic factors, adherence to healthy lifestyle behavior, and risk of invasive breast cancer among women in the UK Biobank. J Natl Cancer Inst 2020;112:893–901.
41. Roberts MC. Implementation challenges for risk-stratified screening in the era of precision medicine. JAMA Oncol 2018;4:1484–5.
42. Organisation for Economic Co-operation and Development. OECD health statistics 2021; 2021. Available from: https://www.oecd.org/els/health-systems/Table-of-Content-Metadata-OECD-Health-Statistics-2021.pdf.
43. Brentnall AR, Harkness EF, Astley SM, Donnelly LS, Stavrinos P, Sampson S, et al. Mammographic density adds accuracy to both the Tyrer-Cuzick and Gail Breast Cancer Risk Models in a prospective UK screening cohort. Breast Cancer Res 2015;17:147.
44. Kim EY, Chang Y, Ahn J, Yun JS, Park YL, Park CH, et al. Mammographic breast density, its changes, and breast cancer risk in premenopausal and postmenopausal women. Cancer 2020;126:4687–96.
This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
