Abstract
To develop a breast cancer prediction model for Korean women using published polygenic risk scores (PRS) combined with nongenetic risk factors (NGRF).
Thirteen PRS models generated from single or multiple combinations of the Asian and European PRSs were evaluated among 20,434 Korean women. The AUC and increase in OR per SD were compared for each PRS. The PRSs with the highest predictive power were combined with NGRFs; then, an integrated prediction model was established using the Individualized Coherent Absolute Risk Estimation (iCARE) tool. The absolute breast cancer risk was stratified for 18,142 women with available follow-up data.
PRS38_ASN+PRS190_EB, a combination of Asian and European PRSs, had the highest AUC (0.621) among PRSs, with an OR per SD increase of 1.45 (95% confidence interval: 1.31–1.61). Compared with the average risk group (35%–65%), women in the top 5% had a 2.5-fold higher risk of breast cancer. Incorporating NGRFs yielded a modest increase in the AUC for women ages ≥50 years. For PRS38_ASN+PRS190_EB+NGRF, the average absolute risk was 5.06%. The lifetime absolute risk at age 80 years for women in the top 5% was 9.93%, whereas that of women in the lowest 5% was 2.22%. Women at higher risk were more sensitive to NGRF incorporation.
Combined Asian and European PRSs were predictive of breast cancer in Korean women. Our findings support the use of these models for personalized screening and prevention of breast cancer.
Our study provides insights into genetic susceptibility and NGRFs for predicting breast cancer in Korean women.
Introduction
Breast cancer incidence is increasing substantially in Asian countries (1, 2). In Korea, it has increased over the last 25 years (3). Furthermore, it is expected to continue to grow over the coming decades with the continued adoption of Western lifestyles and changes in reproductive patterns (4). Along with these changes, there has been considerable interest in the development of a risk model to estimate and stratify an individual's susceptibility to breast cancer (5).
Breast cancer has a multifactorial etiology resulting from a complex interaction of genetic and nongenetic risk factors (NGRF), such as environmental, reproductive, and lifestyle factors. Therefore, it is essential to develop a prediction model that accurately captures risk factors (RF). Although earlier breast cancer prediction models were based exclusively on NGRFs, increasing efforts have been made to incorporate genetic factors, such as polygenic risk scores (PRS), into breast cancer prediction models (6–9). Emerging evidence suggests that PRSs, which provide a joint effect of numerous common genetic susceptibility variants, may explain a significant portion of genetic susceptibility to breast cancer (10, 11). However, the majority of PRSs developed to date are based on European ancestry, and PRSs for Asian women have been underevaluated (12–14). Consequently, studies have suggested integrating Asian-specific SNPs and Asian-specific weights and combining different ethnic PRSs using diverse statistical adjustments to examine the transferability of European PRSs to Asian women (15, 16).
In 2013, the Korean Breast Cancer Risk Assessment Tool was developed on the basis of the Gail model and Korean RF distribution (16). However, risk prediction models incorporating genetic factors for Korean women are lacking (16, 17). A Korean-specific risk prediction model is in high demand because the incidence of breast cancer, reproductive RFs, and genetic characteristics may differ significantly among ethnic groups.
In this study, we aimed to: (i) develop a breast cancer PRS for Korean women using previously published PRSs for those of Asian and European ancestry and (ii) investigate whether the integration of NGRFs based on Korean data could improve its predictive performance.
Materials and Methods
Study design overview
We conducted the study using two steps according to the aims of our study. First, previously reported breast cancer SNPs were validated among Korean women using various published PRSs, and absolute breast cancer risks were evaluated using the PRS with the highest accuracy (Fig. 1, left). Second, we evaluated the performance of the prediction models incorporating PRSs and NGRFs and estimated the absolute breast cancer risks (Fig. 1, right).
Study flow chart. Previously reported breast cancer SNPs were validated among Korean women using various published PRSs, and absolute breast cancer risks were evaluated using the PRS with the highest accuracy (left). Prediction models incorporating PRSs and NGRFs were constructed using a different cohort (right). After evaluating the predictive performance of the models, the absolute breast cancer risk was estimated.
Study populations
All participants in this study belonged to either the Health Examinee (HEXA) and Korean Association Resource (KARE) cohorts of the Korean Genome and Epidemiology Study (KoGES) or the Breast Cancer Case-Cohort (BCCC). The detailed design of the KoGES study, a large cohort study with publicly available data, has been described elsewhere (18). HEXA, initiated in 2004, recruited 173,357 participants ages >40 years from 38 health examination centers and training hospitals located in eight regions of Korea. Among them, 58,697 participants who had genotype data and met the sample quality control criteria were selected for the analyses. Samples with a low genotype call rate (<97%), cryptic relatedness, or gender discrepancy were excluded. We selected women who had not been diagnosed with cancer for further analysis (Fig. 1). Cases were defined as those who were diagnosed with breast cancer but not with other types of cancers. Controls were defined as those who were cancer-free at baseline and at the time of the follow-up surveys. The participants of HEXA were followed up using active and passive methods (18). The first follow-up cohort of HEXA, at a median of 4.6 ± 1.5 years, was denoted as HEXA1st. KARE, initiated in 2001, recruited 10,038 participants ages 40 to 69 years from two cities in Korea, Ansan and Ansung. Among them, 5,493 participants who had genotype data and met the sample quality criteria used for HEXA were selected. The KARE cohort was used as a reference dataset for RF distributions when constructing an absolute risk estimation model. The BCCC was initiated in 2008 and recruited only patients with breast cancer (N = 2,165) from Seoul National University Hospital. One BCCC participant whose age at onset was >80 years was excluded before the analysis. Participants in the BCCC had genotype data but lacked information about NGRFs. Therefore, genetic information from the BCCC was used for PRS validation only.
To perform PRS validation, we used 378 cases detected during the baseline HEXA survey. To construct the PRS+NGRF model, we used 153 cases detected at the time of the follow-up survey. Individuals who were not included in the follow-up survey (N = 4,097) or had missing NGRF data (N = 2,097) were not included in the prediction model construction process. Instead, they were used as controls for PRS validation. To ensure a balanced distribution, we generated two independent subsets from the HEXA controls at a 1:1 ratio by randomization. Finally, we analyzed 2,542 cases and 17,892 controls for PRS validation (Fig. 1). The PRS+NGRF model was constructed using 153 cases and 17,989 controls based on the aforementioned conditions (Fig. 1).
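The 1:1 random split of the HEXA controls described above could be carried out along the following lines. This is a minimal sketch rather than the authors' procedure: the function name `split_controls`, the seed value, and the use of integer IDs are assumptions made for illustration.

```python
import random

def split_controls(control_ids, seed=2023):
    """Randomly split control IDs into two disjoint, near-equal subsets.

    The seed and function name are hypothetical; the source only states
    that controls were divided at a 1:1 ratio by randomization.
    """
    shuffled = list(control_ids)           # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Toy usage with ten made-up control IDs.
validation_controls, model_controls = split_controls(range(10))
```

Fixing the seed keeps the two subsets reproducible, so the same women serve as validation controls each time the analysis is rerun.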
Genetic data acquisition
The HEXA and KARE participants were genotyped using the Korean Chip (K-CHIP), which was designed by the Center for Genome Science, Korea National Institute of Health, based on the UK Biobank Axiom Array, and manufactured by Affymetrix (19). BCCC participants were genotyped using the Affymetrix Genome-wide Human SNP array 6.0. We used the Michigan imputation server for phasing (via Eagle v2.4) and imputation (via minimac4) using 1000 Genomes Phase III data as the reference panel (20). After imputation, we excluded SNPs with low imputation quality scores (INFO < 0.3), minor allele frequency ≤0.01, genotype call rates (<95%), and Hardy–Weinberg equilibrium P ≤ 1E-06.
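The post-imputation SNP filters listed above can be expressed as a single predicate. The sketch below assumes each SNP is summarized as a dictionary; the key names (`info`, `maf`, `call_rate`, `hwe_p`) and the toy records are hypothetical, while the thresholds are those stated in the text.

```python
def passes_qc(snp):
    """Return True if a SNP survives the post-imputation filters.

    `snp` is a dict with hypothetical keys: info, maf, call_rate, hwe_p.
    """
    return (
        snp["info"] >= 0.3             # exclude imputation quality INFO < 0.3
        and snp["maf"] > 0.01          # exclude minor allele frequency <= 0.01
        and snp["call_rate"] >= 0.95   # exclude genotype call rate < 95%
        and snp["hwe_p"] > 1e-6        # exclude Hardy-Weinberg P <= 1E-06
    )

# Made-up records: only snp_a passes every filter.
snps = [
    {"id": "snp_a", "info": 0.95, "maf": 0.20, "call_rate": 0.99, "hwe_p": 0.4},
    {"id": "snp_b", "info": 0.25, "maf": 0.20, "call_rate": 0.99, "hwe_p": 0.4},
    {"id": "snp_c", "info": 0.95, "maf": 0.01, "call_rate": 0.99, "hwe_p": 0.4},
]
kept = [s["id"] for s in snps if passes_qc(s)]
```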
Construction of breast cancer PRS models
where β is a coefficient representing the association between each SNP and breast cancer and k is the number of SNPs used. Of the 376 SNPs, 239 were available in the imputed genotype data of the three Korean cohorts. Accordingly, seven single PRS models were constructed. Depending on the number of incorporated SNPs and the type of β weights used, the PRSs were denoted as PRS38_ASN, PRS190_EUR, PRS190_ASN, PRS190_EB, PRS201_EUR, PRS201_ASN, and PRS201_META. The suffix _ASN or _EUR indicates that the β weights of the PRS were inferred from Asian or European weights, respectively (10, 21, 22). The suffix _EB indicates β weights based on a combination of Asian and European weights using the Empirical Bayes approach (23). The suffix _META indicates β weights generated by a meta-analysis of European and Asian weights reported by a previous study (ref. 15; see Supplementary Table S1 for details).
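In the conventional weighted-sum form, which matches the symbols β and k defined above, a PRS can be written as follows. The dosage notation x_i (the number of risk alleles carried at SNP i, from 0 to 2) is an assumption introduced here for illustration.

```latex
\mathrm{PRS} = \sum_{i=1}^{k} \beta_i x_i
             = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k
```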
α0, α1, and α2 were obtained by fitting a logistic regression model with breast cancer incidence as the outcome. PRSs were standardized to the respective standard deviations of the HEXA controls. Ten-fold cross-validation by regression was conducted for multiple PRS models. The relative contributions of each PRS to multiple PRS models are shown in Supplementary Table S2.
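The steps described in this subsection (a weighted sum of SNP dosages, scaling by the control SD, and a linear combination of two PRSs) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the toy dosages, and the α values are hypothetical, and scaling by the SD without centering is an assumption about how the standardization was done.

```python
from statistics import stdev

def weighted_prs(dosages, betas):
    """Weighted sum of risk-allele dosages (0-2) and per-SNP weights."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(b * x for b, x in zip(betas, dosages))

def standardize(scores, control_scores):
    """Express scores in units of the control-group sample SD (no centering)."""
    sd = stdev(control_scores)
    return [s / sd for s in scores]

def combine(prs_a, prs_b, alpha0, alpha1, alpha2):
    """Linear predictor alpha0 + alpha1 * PRS_a + alpha2 * PRS_b for one woman."""
    return alpha0 + alpha1 * prs_a + alpha2 * prs_b

# Toy example: 3 SNPs with hypothetical weights and coefficients.
score = weighted_prs([0, 1, 2], [0.10, -0.05, 0.20])
combined = combine(score, 0.50, alpha0=-2.0, alpha1=0.3, alpha2=0.7)
```

In practice the α coefficients would come from the fitted logistic regression rather than being set by hand.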
Models incorporating NGRFs
Statistical analysis
PRS association analysis and evaluation of predictive performance
For PRS association analyses, we used logistic regression adjusted for covariates. We estimated the OR per SD of the PRS and the ORs for seven percentile groups (0%–5%, 5%–15%, 15%–35%, 35%–65%, 65%–85%, 85%–95%, 95%–100%), with the 35%–65% group serving as the average risk (reference) group.
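Assigning a woman to one of the seven percentile groups can be sketched as below. The cut points come from the text; the choice of reference distribution used to define percentiles (for example, the control scores) and the handling of scores that fall exactly on a boundary are assumptions, as are the function names.

```python
from bisect import bisect_left, bisect_right

CUTS = [5, 15, 35, 65, 85, 95]   # percentile boundaries from the text
LABELS = ["0-5%", "5-15%", "15-35%", "35-65%", "65-85%", "85-95%", "95-100%"]

def percentile_rank(score, reference_sorted):
    """Percentage of reference scores that are less than or equal to score."""
    return 100.0 * bisect_right(reference_sorted, score) / len(reference_sorted)

def prs_group(score, reference_sorted):
    """Map a PRS to its percentile group; a rank on a boundary goes to the lower group."""
    rank = percentile_rank(score, reference_sorted)
    return LABELS[bisect_left(CUTS, rank)]

# Toy reference distribution: the integers 1..100, already sorted.
reference = list(range(1, 101))
groups = [prs_group(s, reference) for s in (3, 50, 96)]
```

The 35-65% group then serves as the reference category when the group indicator is entered into the logistic regression.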
The prediction performance of the PRS was measured by the area under the ROC curve (AUC) using logistic regression. To compare the predictive function of either NGRF-based or PRS-based models and integrated models (PRS+NGRF), the AUC and expected to observed (E/O) ratios were evaluated.
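Both performance measures named above are simple to compute from model outputs. The sketch below uses the rank-based (Mann-Whitney) definition of the AUC, which equals the area under the ROC curve for a single score; the study's exact procedure (AUC from logistic regression, possibly with covariates) may differ. The function names and toy numbers are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def expected_to_observed(predicted_risks, observed_cases):
    """E/O ratio: sum of predicted risks divided by the observed case count."""
    return sum(predicted_risks) / observed_cases

# Toy data: two cases, two controls, and four predicted risks.
toy_auc = auc([0.9, 0.8], [0.1, 0.85])
toy_eo = expected_to_observed([0.1, 0.2, 0.3, 0.4], observed_cases=2)
```

An AUC of 0.5 corresponds to no discrimination, and an E/O ratio near 1 indicates that the model is well calibrated overall.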
Estimation of the absolute risk of breast cancer according to PRS percentiles
Equation (4) assumes that RF Z acts in a multiplicative fashion on the baseline hazard function λ0(t). It accounts for competing risks originating from mortality due to other causes through m(t), the age-specific mortality rate function (2). The lifetime absolute risk was evaluated as the risk between the age of 20 years and a specific age with a maximum of 80 years. The 5-year absolute risk was defined as the risk within the next 5 years for a woman who has reached a specific age. The Individualized Coherent Absolute Risk Estimation (iCARE) tool requires the RRs of RFs (Z), log-relative risks (β), age-specific incidence rate of all-cause mortality excluding breast cancer mortality, incidence rates of breast cancer, and RF distributions within a population. For this study, RRs were obtained from external studies (16, 24), and RF distributions were derived from KARE, which was used as a reference cohort. The age-specific breast cancer incidence and mortality rates of Korean women in 2010 were obtained from the Korean Statistical Information Service. Absolute risks were evaluated with R 4.2.1 using the iCARE R package (version 1.18.0; ref. 25). P < 0.05 was considered significant.
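For reference, the absolute risk model implemented in iCARE, to which the description above corresponds, has the following general form. Here a denotes the current age and s the length of the risk interval; this notation and typesetting are assumptions rather than a reproduction of the authors' Equation (4).

```latex
P\{T \le a + s \mid T > a, Z\}
  = \int_{a}^{a+s} \lambda_0(t)\, e^{\beta^{\top} Z}
    \exp\!\left\{ -\int_{a}^{t}
      \left[ \lambda_0(u)\, e^{\beta^{\top} Z} + m(u) \right] du \right\} dt
```

Under this form, the lifetime risk described above corresponds to a = 20 with a + s equal to the target age (at most 80 years), and the 5-year risk corresponds to s = 5.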
To investigate the associated effect of PRS and NGRF according to the magnitude of risks, absolute lifetime risks using multiple PRS and NGRF risk strata were analyzed. PRSs were classified into three risk groups (0%–20%: low; 20%–80%: mid; 80%–100%: high), and NGRF scores were classified into two groups divided using a median distribution (0%–50%: low; 50%–100%: high).
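The stratified lifetime risks described above can be illustrated numerically. The sketch below treats yearly rates as constant within each year of age and applies a relative risk per stratum while accounting for competing mortality. Every number in it (rates and relative risks) is made up for illustration, and, unlike iCARE, it takes the baseline hazard as given instead of calibrating it to population incidence and the RF distribution.

```python
import math

def absolute_risk(start_age, end_age, incidence, mortality, relative_risk):
    """Risk of breast cancer between start_age and end_age with competing mortality.

    `incidence` and `mortality` map each integer age to a yearly rate that is
    treated as constant within that year. All rates used below are hypothetical.
    """
    survival = 1.0   # probability of being alive and cancer-free so far
    risk = 0.0
    for age in range(start_age, end_age):
        h = incidence[age] * relative_risk   # breast cancer hazard in this stratum
        total = h + mortality[age]           # hazard of leaving the risk set
        if total > 0:
            # Probability of exiting this year, times the share due to breast cancer.
            risk += survival * (h / total) * (1.0 - math.exp(-total))
        survival *= math.exp(-total)
    return risk

ages = range(20, 80)
incidence = {a: 0.0008 for a in ages}   # hypothetical flat baseline rate
mortality = {a: 0.0030 for a in ages}   # hypothetical competing mortality rate

# Hypothetical relative risks for PRS (low/mid/high) x NGRF (low/high) strata.
strata_rr = {
    ("low", "low"): 0.5, ("low", "high"): 0.6,
    ("mid", "low"): 0.9, ("mid", "high"): 1.1,
    ("high", "low"): 1.6, ("high", "high"): 2.0,
}
lifetime = {key: absolute_risk(20, 80, incidence, mortality, rr)
            for key, rr in strata_rr.items()}
```

Because the relative risk multiplies the hazard, the absolute gap between high and low NGRF levels widens as the PRS stratum rises, which is the pattern the stratified analysis is designed to reveal.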
Ethics approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of National Biobank of Korea (P01-202108-31-001) and Seoul National University Hospital (1507-132-689). Written informed consent was obtained from all subjects.
Data availability
The data underlying this article are available in the article and in its online Supplementary Materials and Methods. Raw data were generated from the Korea Genome and Epidemiology Study. The age-specific breast cancer incidence and mortality rates of Korean women were obtained from the Korean Statistical Information Service (SCR_023565). The derived data supporting the findings of this study are available from the corresponding author upon request.
Results
Study population
PRS validation was performed among 20,434 Korean women (Fig. 1). The PRS+NGRF model was evaluated among 18,142 women. In this subset, 153 breast cancer cases were detected during the follow-up period. Among the 153 newly developed cases, 68 occurred in women ages <50 years and 85 occurred in those ages ≥50 years.
Performance of PRS in the Korean population
Thirteen PRSs were constructed using previously reported Asian and European SNPs (Table 1). In general, the multiple PRS models performed better than the single PRS models for Korean women. Among the PRS models, the most predictive was PRS38_ASN+PRS190_EB (AUC: 0.621), although the overall AUC differences between PRS models were marginal. The contribution of PRS38_ASN to PRS38_ASN+PRS190_EB was approximately 30% (Supplementary Table S2). We did not observe a significant interaction between PRS38_ASN+PRS190_EB and age (Supplementary Table S4). The density plot of PRS38_ASN+PRS190_EB is shown in Supplementary Fig. S1. The distribution curve for women with breast cancer was shifted to the right compared with that of the controls. The percentile association of PRS38_ASN+PRS190_EB stratified into seven percentile groups is presented in Table 2. Compared with the average risk group (35%–65%), women in the top 5% had a 2.5-fold higher risk of breast cancer (OR: 2.50), whereas women in the lowest 5% had a lower risk (OR: 0.61). The risk estimates were well separated across the percentile groups, although not all associations were statistically significant. Lifetime and 5-year absolute risks of PRS38_ASN+PRS190_EB are shown in Supplementary Fig. S2. The lifetime absolute risk at age 80 years for women in the highest 5% was 9.91%, and that of women in the lowest 5% was 2.18% (average lifetime absolute risk: 4.89%).
Table 1. Mean, SD, and associations of the PRSs with the breast cancer risk of Korean women (HEXA^a + BCCC^b).
PRS | Case (N = 2,542), mean ± SD | Control (N = 17,892), mean ± SD | OR per SD (95% CI)^c | AUC^d (95% CI) |
---|---|---|---|---|
Single PRS | | | | |
PRS38_ASN | −0.10 ± 0.41 | −0.24 ± 0.41 | 1.37 (1.24–1.52) | 0.592 (0.581–0.604) |
PRS190_EUR | 0.68 ± 0.51 | 0.47 ± 0.50 | 1.38 (1.24–1.53) | 0.611 (0.599–0.623) |
PRS190_ASN | −0.06 ± 0.56 | −0.29 ± 0.54 | 1.41 (1.27–1.56) | 0.612 (0.600–0.624) |
PRS190_EB | 0.24 ± 0.46 | 0.05 ± 0.45 | 1.41 (1.27–1.56) | 0.616 (0.604–0.627) |
PRS201_EUR | 0.46 ± 0.52 | 0.25 ± 0.50 | 1.37 (1.23–1.51) | 0.612 (0.600–0.624) |
PRS201_ASN | 0.03 ± 0.55 | −0.19 ± 0.54 | 1.41 (1.28–1.56) | 0.612 (0.600–0.624) |
PRS201_META | 0.61 ± 0.59 | 0.37 ± 0.57 | 1.41 (1.28–1.57) | 0.614 (0.603–0.626) |
Multiple PRS | | | | |
PRS38_ASN+PRS190_EUR | −1.85 ± 0.21 | −1.94 ± 0.20 | 1.44 (1.30–1.59) | 0.619 (0.607–0.631) |
PRS38_ASN+PRS190_ASN | −2.07 ± 0.22 | −2.17 ± 0.21 | 1.44 (1.30–1.59) | 0.615 (0.604–0.627) |
PRS38_ASN+PRS190_EB | −1.98 ± 0.20 | −2.07 ± 0.19 | 1.45 (1.31–1.61) | 0.621 (0.609–0.633) |
PRS38_ASN+PRS201_EUR | −1.92 ± 0.21 | −2.01 ± 0.21 | 1.43 (1.29–1.58) | 0.620 (0.608–0.631) |
PRS38_ASN+PRS201_ASN | −2.04 ± 0.21 | −2.13 ± 0.21 | 1.44 (1.30–1.59) | 0.615 (0.603–0.627) |
PRS38_ASN+PRS201_META | −1.85 ± 0.23 | −1.95 ± 0.23 | 1.45 (1.31–1.60) | 0.618 (0.607–0.630) |
^a HEXA: Health Examinee.
^b BCCC: Breast Cancer Case-Cohort.
^c OR: odds ratios were estimated using a logistic regression model adjusted for age and study; SD: standard deviation; CI: confidence interval.
^d AUC: area under the curve.
Table 2. Percentile association of PRS38_ASN+PRS190_EB and participant distributions (sample sizes by cohort).
PRS percentile | BCCC^a cases (n) | HEXA^b cases (n) | HEXA controls (n) | OR (95% CI)^c | P |
---|---|---|---|---|---|
0–5% | 47 | 11 | 964 | 0.61 (0.31–1.09) | 1.20E-01 |
5–15% | 125 | 20 | 1,898 | 0.57 (0.34–0.90) | 2.11E-02 |
15–35% | 290 | 63 | 3,734 | 0.91 (0.66–1.24) | 5.39E-01 |
35–65% | 610 | 101 | 5,419 | Reference | - |
65–85% | 534 | 94 | 3,459 | 1.46 (1.10–1.94) | 9.35E-03 |
85–95% | 320 | 54 | 1,669 | 1.74 (1.24–2.42) | 1.17E-03 |
95–100% | 238 | 35 | 749 | 2.50 (1.67–3.67) | 4.52E-06 |
^a BCCC: Breast Cancer Case-Cohort.
^b HEXA: Health Examinee.
^c ORs were estimated using a logistic regression model adjusted for age and study; CI: confidence interval.
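As a rough cross-check of the table, a crude (unadjusted) odds ratio can be computed directly from the case and control counts. The sketch below is not the analysis reported in the table, which used logistic regression adjusted for age and study; the function name is hypothetical, and the Woolf log-scale interval is a choice made here for illustration.

```python
import math

def crude_odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval."""
    odds_ratio = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    se = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                   + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper

# Top 5% versus the 35%-65% reference group, pooling BCCC and HEXA cases.
top = crude_odds_ratio(238 + 35, 749, 610 + 101, 5419)
```

The crude estimate for the top 5% is about 2.78, somewhat higher than the adjusted OR of 2.50 in the table, a gap that is expected when no adjustment is made for age and study.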
Performance of the prediction model incorporating NGRFs
The prediction model established using NGRFs alone had limited predictive power (Fig. 2; Supplementary Table S5). NGRF models were more predictive for women ages ≥50 years than for those ages <50 years (AUC: 0.564 vs. 0.503). The change in AUC when NGRFs were added to the PRS models differed noticeably by age group. For women ages ≥50 years, the addition of NGRFs increased the AUC, although the PRS alone had a lower AUC in this group. For women ages <50 years, the incorporation of NGRFs had only a small effect on the AUC, whereas the PRS alone had better predictive performance. This suggests that risk prediction for women ages ≥50 years depends more on NGRFs, whereas prediction for women ages <50 years depends more on genetic predisposition. PRS38_ASN+PRS201_META+NGRF had the highest predictive power for women ages <50 years, whereas PRS38_ASN+PRS190_EB+NGRF was the most predictive model for women ages ≥50 years. Nevertheless, the difference in overall AUC between the best and worst multiple PRSs across all age groups was small (0.012). We therefore took PRS38_ASN+PRS190_EB (hereafter referred to as the multiple PRS model), which had the highest accuracy during step one, and used PRS38_ASN+PRS190_EB+NGRF (hereafter referred to as the integrated model) to further estimate the absolute breast cancer risk.
AUC for various PRS models and NGRFs predicting the breast cancer risk. The AUC was compared among NGRFs, PRSs, and integrated (PRS+NGRF) models for women. A, Age <50 years. B, ≥50 years.
Absolute risk of breast cancer according to PRS percentiles
Figure 3 depicts the lifetime and 5-year absolute risks of the integrated model. The lifetime absolute risk of breast cancer ranged from 2% to 10%, with an average of 5.06% (Fig. 3A). The absolute risk at age 80 years for women in the highest 5% was 9.93%, whereas that for women in the lowest 5% was 2.22%. The 5-year absolute risk for women in the top 5% peaked at 1.47% at age 48 years and declined thereafter (Fig. 3B). The 5-year absolute risk of the average risk group at 40 years, which is the age when the first breast cancer screening is recommended in Korea, was 0.6%. However, women in the top 5% reached this level of risk much earlier, at age 33 years. This outcome may support the need for individualized screening strategies for high-risk women, particularly those ages <40 years.
Estimation of the absolute breast cancer risk by seven percentiles. The absolute risk of developing breast cancer is predicted using the integrated model (PRS38_ASN+PRS190_EB+NGRF). The dotted lines represent the average risks. A, Lifetime absolute risk. B, 5-year absolute risk.
To compare the changes induced by incorporating NGRFs, the lifetime and 5-year absolute risks of the multiple PRS model were also estimated using HEXA1st (Supplementary Fig. S3). The average absolute risks of the integrated model (5.06%) and the multiple PRS model (4.81%) were similar. However, the density distributions of the two models differed. Figure 4 depicts the density plot of the multiple PRS and integrated models at age 80 years. Adding NGRFs to the multiple PRS model increased the SD of the absolute risk in the integrated model. A greater increase in the SD was observed for the higher-risk groups, especially the top 5% (Supplementary Table S6). This suggests that women at higher risk are more sensitive to NGRF incorporation and, by extension, may be more responsive to risk-reducing interventions.
Density plot of breast cancer absolute risk at age 80. The absolute risk was stratified by seven PRS percentiles. A, Multiple PRS model. B, Integrated model.
To investigate the associated effect of PRS and NGRF, we analyzed the absolute lifetime risks using different PRS and NGRF risk levels (Supplementary Fig. S4). In the model, the curves for mid PRS+high NGRF versus high PRS+low NGRF as well as the curves for mid PRS+low NGRF versus low PRS+high NGRF nearly overlapped. This supports the idea that risk modifications might reduce the breast cancer risk of some individuals despite their inherited genetic risks. The difference in the absolute risk with high NGRF and low NGRF levels was greater for women at higher risk, indicating that there is greater potential for risk reduction among this group of women.
Discussion
In our study, we developed a breast cancer prediction model for Korean women based on 13 PRSs and NGRFs. We demonstrated that (i) the combined Asian and European PRS was predictive of breast cancer among Korean women and (ii) the incorporation of NGRFs improved breast cancer risk stratification for women ages ≥50 years. Our findings provide essential insights into genetic susceptibility and NGRFs for predicting breast cancer among Korean women.
To our knowledge, very few studies of the Korean PRS have been conducted, and most of them were conducted as a part of large-scale Asian studies (14, 22). In a previous analysis, a PRS constructed with 44 SNPs among East Asian women in the Breast Cancer Association Consortium was examined among Korean women. In that study, although the Korean-specific AUC was not provided, the overall AUC was 0.606 (14), which is consistent with the AUC of PRS38_ASN in our study (AUC: 0.592). In addition, one recent study explored a combined Asian and European PRS (PRS46+PRS287_EB) for Asian women, and an AUC of 0.630 (OR per SD: 1.59) was reported for a subset of Korean women (22). Our study is one of the few studies to establish a breast cancer risk prediction model incorporating PRSs for Korean women only. In our study, the AUC of the PRS with the highest predictive power was 0.621. This is compatible with those of published studies that examined PRSs for Asian women (13, 22, 26, 27). In addition, the increase in AUC resulting from combining NGRFs with multiple PRSs was comparable with the findings of previous studies (Supplementary Table S5; refs. 7, 9, 15).
We established two separate models incorporating NGRFs according to age using different RFs and RRs. In Korea, menopause is an important RF contributing to a distinctive breast cancer incidence curve that peaks at age 50 years and declines thereafter, as shown in this study (Fig. 3; refs. 5, 24). For this reason, Korean prediction models have often analyzed age groups separately (16, 17, 24, 28). In our study, we observed that the contributions of NGRFs and PRSs to the prediction of breast cancer were markedly different among age groups (Fig. 2; Supplementary Table S5). For women ages <50 years, the AUC of the PRS alone was higher than that for women ages ≥50 years, and adding NGRFs did not increase the AUC in this group. Our findings suggest that young women have higher genetic susceptibility to breast cancer than their older counterparts (26). An analysis that evaluated the interactions between the PRS and NGRFs showed a stronger association between the PRS and breast cancer among premenopausal women (OR: 2.46), thus supporting the findings of our study (29).
In our study, the performance of the prediction models among women ages <50 years was greater than it was among those ages ≥50 years. Presumably, risk prediction for women ages ≥50 years is further complicated by menopause and confounding factors such as BMI (30–32). In a previous study, a prediction model constructed using iCARE based on Korean incidence and mortality rates and Korean-based risk distributions showed an AUC of 0.584 for women ages ≥50 years and an AUC of 0.697 for women ages <50 years (24). Unlike its model for premenopausal women, the Korean Breast Cancer Risk Assessment Tool incorporated RFs such as age at menopause, pregnancy experience, BMI, oral contraceptive usage, and exercise for women ages ≥50 years, yielding an AUC of 0.65 (17). In this study, we selected reproductive RFs as the main NGRFs. The effects of risks related to estrogen-dependent reproductive factors, including age at menarche and number of pregnancies, on breast cancer development have been clearly established (4), whereas the effects of lifestyle factors (alcohol intake, hormone use, and exercise) are complicated and still controversial (33–36). In addition, owing to the characteristics of questionnaire-based surveys, self-reported lifestyle RFs that change over time carry a high risk of recall bias or misclassification (32).
The results of our study support the usage of personalized screening and prevention strategies based on PRS risk stratification. To date, most guidelines have been updated to implement personalized screening for those at high risk, notably those with a family history or identified pathogenic variants (37, 38). However, the current Korean national breast cancer screening program follows a one-size-fits-all approach, encouraging women ages 40 to 69 years in the general population to undergo a biennial mammography based on the evidence that screening reduces breast cancer mortality in this age group (39). Our study suggests that earlier screening should be implemented for women in the high-risk group, particularly for those ages <40 years who are currently ineligible for screening programs. In addition, preventive interventions and lifestyle modifications for women with higher risk scores should be focused on to yield maximum risk reductions (40). For instance, lifestyle changes affecting BMI may modify the breast cancer risk. However, the best screening tool for breast cancer in young women and its effect on reducing real-world breast cancer mortality should be carefully evaluated before implementation (41).
Our study had several limitations. First, the small sample size limited the optimization of the prediction model. A sufficient sample size may have provided significant P values for all risk estimates in the multiple PRS model (Supplementary Table S4). Also, the inclusion of few participants ages >70 years (1.57%) may have underrepresented the risk estimates of women in this age group. The average life expectancy of Koreans is increasing and reached 84 years in 2021 (42). Further accumulation of follow-up data may enable us to develop a more accurate model that covers all age strata. Second, because the BCCC and HEXA cohorts of case–control studies were recruited from either a teaching hospital or a health examination center in an urban area, their characteristics may not be representative of the entire Korean population, thus inducing selection bias. Further external validation among the general population is warranted. Third, our model does not include a modifiable RF. Further adjustments are needed to include information about modifiable RFs, and other RFs associated with breast cancer risk, such as mammographic density, might improve the predictive power (43, 44). Finally, although we only included common variants in our model, pathogenic variants that are known to confer a higher risk of breast cancer susceptibility may be included in future prediction models.
In conclusion, we showed that the combined PRS is predictive of breast cancer in Korean women. The incorporation of NGRFs further enhanced the predictive power for women ages ≥50 years. These models can be helpful for developing optimal screening strategies and effective preventive measures against breast cancer.
Authors' Disclosures
T.-W. Ha reports personal fees from DCGen Co., Ltd. outside the submitted work and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof). H.-M. Choi reports personal fees from DCGen Co., Ltd. outside the submitted work and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof). H.-B. Lee reports grants from the Korea Health Industry Development Institute (KHIDI) during the conduct of the study, is the chief medical officer of DCGen Co., Ltd., and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof). H.-C. Shin is the chief executive officer of DCGen Co., Ltd. and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof). W. Chung reports personal fees from DCGen Co., Ltd., is the chief technical officer of DCGen Co., Ltd., and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof). W. Han reports grants from the Korea Health Industry Development Institute (KHIDI) during the conduct of the study, is the chairman of DCGen Co., Ltd., and has a patent pending (10-2022-0144476; Methods for predicting cancer prognosis and compositions thereof).
Authors' Contributions
J. Choi: Formal analysis, writing–original draft, writing–review and editing. T.-W. Ha: Data curation, formal analysis, methodology, writing–original draft, writing–review and editing. H.-M. Choi: Data curation, methodology, writing–review and editing. H.-B. Lee: Conceptualization, methodology, writing–review and editing. H.-C. Shin: Conceptualization, methodology, writing–review and editing. W. Chung: Conceptualization, supervision, methodology, writing–review and editing. W. Han: Conceptualization, supervision, methodology, writing–review and editing.
Acknowledgments
W. Han and H.-B. Lee received a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant no.: HR14C0003).
This study was conducted with bioresources from National Biobank of Korea, the Korea Disease Control and Prevention Agency, Republic of Korea (KBN-2021-504).
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).