Abstract
Screening digital breast tomosynthesis (DBT) aims to identify breast cancer early when treatment is most effective, leading to reduced mortality. In addition to early detection, the information contained within DBT images may also inform subsequent risk stratification and guide risk-reducing management. Using transfer learning, we refined a model in the Joanne Knight Breast Health Cohort at Washington University, a cohort of 5,066 women with DBT screening (mean age, 54.6), among whom 105 were diagnosed with breast cancer (26 ductal carcinoma in situ). We applied the model to external data from the Emory Breast Imaging Dataset, a cohort of 7,017 women free from cancer (mean age, 55.4), among whom 111 pathology-confirmed breast cancer cases were diagnosed more than 6 months after initial DBT (17 ductal carcinoma in situ). We obtained a 5-year AUC of 0.75 [95% confidence interval (CI), 0.73–0.78] in the internal validation. The model validated in external data gave an AUC of 0.72 (95% CI, 0.69–0.75). The AUC was unchanged when age and Breast Imaging-Reporting and Data System density were added to the model with synthetic DBT images. The model significantly outperforms the Tyrer-Cuzick model, which had a 5-year AUC of 0.56 (95% CI, 0.54–0.58; P < 0.01). Our model extends risk prediction applications to synthetic DBT, provides 5-year risk estimates, and is readily calibrated to national risk strata for clinical translation and guideline-driven risk management. The model could be implemented within any digital mammography program.
Prevention Relevance: We develop and externally validate a 5-year risk prediction model for breast cancer using synthetic DBT and demonstrate clinical utility by calibrating to the national risk strata as defined in breast cancer risk management guidelines.
Introduction
To date, applications of radiomic data have largely focused on improved diagnosis at the time of screening. The widespread use of digital mammography has supported this focus for both full-field digital mammograms (FFDM) and digital breast tomosynthesis (DBT). Concurrent with diagnosis, breast cancer risk prediction has an increasing role in routine screening. Prediction models have moved from using demographic variables that approximate changes in breast tissue to using digitized film mammograms and now to digital images (1) to predict long-term risk. DBT has been evaluated and internally validated to predict up to 2-year risk (2). Given that current guidelines in the United States refer to 5-year risk as a guide for risk management, including chemoprevention, other risk reduction strategies (3, 4), and additional screening (5), it is imperative that long-term risk be evaluated. Furthermore, these long-term prediction models must be evaluated in a diverse population to ensure generalizability (6).
DBT was approved by the FDA for all women in 2011 (7). It improves the cancer detection rate in screening (8) and has demonstrated usefulness in screening and diagnostic settings (9). The uptake of tomosynthesis for breast screening has varied over time by insurance status and other population characteristics (7, 10). It was approved by Medicare in 2015, but through 2017, women from underrepresented racial/ethnic groups, lower education levels, lower income, and rural residences had been slower to access this technology (10). This is consistent with other advanced imaging technologies, including MRI (11).
Given the differential uptake over time and yet now widespread use, it is imperative that the long-term risk of breast cancer also be evaluated. With sufficient time from implementation to identify cases of breast cancer in a broadly screened population, a prediction tool that works across populations is urgently needed for clinical translation. We draw on a comprehensive breast imaging center accredited and designated by the American College of Radiology (ACR), providing routine breast screening in a diverse population [27% non-Hispanic Black (NHB) women; ref. 12], to develop and apply a risk prediction model from the DBT screening images. For the first time, we illustrate the model’s performance in an external validation cohort in which 46% of women are NHB (13). We use a 5-year risk horizon to assess prediction performance following DBT screening mammography and demonstrate clinical utility by calibrating to the national risk strata to better manage risk.
Materials and Methods
Analytic dataset
The Joanne Knight Breast Health Cohort at Washington University (WashU cohort) is used as training data in this study with up to 10 years of mammograms (14). This cohort of women undergoing routine opportunistic mammography screening in St. Louis includes over 10,000 women, of whom 27% are NHB (14). Women free from breast cancer, ages 30 to 94 (mean age, 54.6) and attending routine screening, were included in the cohort. Incident breast cancers (invasive and in situ) are identified through record linkage to pathology and tumor registries (14). Eligibility included consent for follow-up and attendance at a routine screening visit. In this academic center, approximately 3% of women are sufficiently high risk to qualify for supplemental imaging, which includes annual MRI, typically offset by 6 months from annual screening mammography. We excluded women whose entry examination resulted in a diagnosis of breast cancer and those with a diagnosis within the first 6 months of their mammogram to reflect the time for diagnostic workup and pathology confirmation. We began follow-up from the first DBT, typically from 2013 when it was introduced to screening services, and followed women through December 2020. We identified 105 breast cancer cases (26 ductal carcinoma in situ), excluding those with a diagnosis in the first 6 months of their eligibility DBT, among 5,066 women free from breast cancer at their first screening DBT. All mammograms were uniformly processed on Hologic machines. Upon entry to the cohort, women self-reported breast cancer risk factors using established and validated measures (15). The radiologists’ readings of mammograms for Breast Imaging-Reporting and Data System (BI-RADS) breast density were retrieved along with the images. From these data, we estimated the Tyrer-Cuzick (TC) model risk of breast cancer (16, 17).
External validation cohort
The external validation dataset is drawn from the Emory Breast Imaging Dataset (EMBED) with up to 8 years of follow-up (13). This cohort represents a 20% random sample of deidentified mammograms of diverse women undergoing screening or diagnostic mammograms at four hospitals (two community hospitals, one large inner-city hospital, and one private academic hospital) from January 2013 to December 2020. The age range for women in this cohort was 25.4 to 89, with a median age of 55.5 and an IQR of 46.9 to 64.5. Similar to the WashU cohort, DBT was in use from the inception of the cohort in 2013, and we included women from their first DBT and excluded women diagnosed with breast cancer within the first 6 months since entry to the cohort, those who have no follow-up information since the baseline mammogram, and those who had a history of breast cancer prior to their first DBT. The baseline screening mammograms’ BI-RADS categories were 83.1% BI-RADS category 1 and 16.9% BI-RADS category 2. Follow-up was from the time of the first DBT to the last screening mammogram or 5 years (whichever came first). From the cohort of 7,017 women, we identified 111 pathology-confirmed breast cancer cases (including 17 in situ cases) beyond 6 months of entry into the cohort and 6,906 women (46.3% NHB) who remained free from breast cancer during follow-up. As described for the EMBED cohort, the mammography service is linked to the pathology department and medical records to confirm breast cancer cases during follow-up (13). Data included age, race, and time from the initial digital screening mammogram to breast cancer diagnosis. DBTs were performed on Hologic machines.
Statistical analysis
The risk model for synthetic DBT, a convolutional neural network, was developed in the WashU cohort using transfer learning (18), a machine learning technique in which a model developed for one task is reused or adapted as the starting point for a model on a second, related task (18). We have previously used this approach for density estimation (19). Here, we started from a convolutional neural network pretrained on FFDM images from over 10,000 women within WashU (20). This model had learned to extract features from FFDM images, such as patterns and structures, that are predictive of breast cancer risk. Rather than training a new model from scratch on synthetic DBT images, we fine-tuned the pretrained FFDM model to accommodate synthetic DBT images, which are derived from a series of raw projection images. Transfer learning allowed the model to adapt the previously learned FFDM features to the synthetic DBT data with fewer computational resources and less training time. This approach also improved the performance of the model by allowing it to generalize more effectively across both imaging modalities.
Our model takes the synthetic DBT mediolateral oblique and craniocaudal images with the option to input clinical risk factors. The outputs of the model include a mammogram risk score (MRS), the probability of 5-year breast cancer onset, and relative risk for each woman that can be used for risk calibration (see Fig. 1).
Model overview. The four views of synthetic DBT screening mammograms are the inputs of the deep learning algorithm. The model has the option of using clinical risk factors (i.e., age and BI-RADS density). The model output includes an MRS and breast cancer onset probability for each woman. The proposed framework also allows for personalized risk stratification calibrated to population risk, such as that provided by SEER. BC, breast cancer.
Performance of the following models is examined: (i) age and BI-RADS density, (ii) synthetic DBT mammogram only, and (iii) synthetic DBT plus age and BI-RADS density.
We performed both an internal validation and an external validation in assessing the prediction performance. The internal validation uses a fivefold cross-validation and involves randomly partitioning women in the WashU cohort into five subsamples. The external EMBED data are used solely for validation. The 95% confidence intervals (CI) were estimated using 5,000 bootstraps.
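The bootstrap step described above can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the study: a binary 5-year outcome with complete follow-up (no censoring) and a percentile bootstrap. The function names `auc` and `bootstrap_auc_ci` are hypothetical, and the study's own software may differ.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling women with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only cases or only non-cases.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_auc_ci(scores, labels)` with the default `n_boot=5000` mirrors the number of bootstraps reported here; a time-dependent AUC that handles censoring would need a survival-analysis estimator instead.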
The performance of the risk prediction model was assessed in terms of discrimination and risk stratification. We used the AUC to assess the discrimination performance (21). We report the 5-year AUC from the first screening synthetic DBT. We generated a distribution plot to visualize the MRS separation between women who subsequently develop breast cancer and women who remain cancer-free in the external validation cohort. In the external validation, we also report risk stratification based on the MRS calibrated to 2016 US SEER 22 age-specific absolute risk (https://surveillance.cancer.gov/devcan/). Additionally, we show calibration via predicted versus observed 5-year risk in the external validation. Because prediction performance has varied in population groups, we stratified subanalyses by race (NHB vs. non-Hispanic White) and density (dense vs. nondense).
This prospective cohort study’s creation and follow-up were supported by WashU and the Breast Cancer Research Foundation. Ethical approval was obtained from the Institutional Review Board of Washington University in St. Louis, MO. Informed written consent was obtained for study participation, and the study was conducted in accordance with the Declaration of Helsinki. The EMBED cohort deidentified data were shared following Institutional Review Board approval (13).
Data availability
Development data, mammogram images, and covariates at WashU are available with a data use agreement. Apply to the corresponding author.
External validation data from Emory are publicly available at https://github.com/Emory-HITI/EMBED_Open_Data.
Results
Cohort characteristics
We draw on a cohort of 5,066 women who consented to linkage to medical records, including pathology reports, to study the risk of breast cancer in a diverse population (WashU cohort). We developed the model in this diverse population, which includes uninsured women and those covered through the Breast and Cervical Screening Program as well as other forms of insurance. The external validation (Emory cohort) is in a similarly diverse population of 7,017 women.
Breast cancer risk factors were assessed at entry to the cohort in this prospective study (Table 1). The mean age at entry was 54.6 for women who remained cancer-free. Of the population, 26.5% self-identified as NHB women. Cases more often had dense breasts (BI-RADS C and D) and had a higher prevalence of family history of breast cancer than women who remained cancer-free (25.5% vs. 17.1%). The Emory external validation cohort had a comparable BI-RADS distribution and ethnic diversity (Table 1). In the Emory cohort, the mean age at first DBT was 55.9 for women who remained cancer-free, and 46.3% were NHB. Cases of breast cancer were diagnosed consistently through follow-up across all 5 years.
Table 1. Baseline patient characteristics by case status of the WashU mammography screening cohort and the external validation Emory Breast Imaging Dataset (EMBED). Breast cancer cases exclude diagnostic images confirmed within 6 months of screening mammograms.
Characteristic | WashU derivation cohort: breast cancer (n = 105) | WashU derivation cohort: cancer-free (n = 4,961) | Emory validation cohort: breast cancer (n = 111) | Emory validation cohort: cancer-free (n = 6,906) |
---|---|---|---|---|
Mean (SD) | | | | |
Age (years) | 56.7 (8.5) | 54.6 (8.7) | 60.5 (10.7) | 55.9 (11.1) |
Body mass index (kg/m²) | 29.0 (6.2) | 29.2 (7.3) | — | — |
Number (%) | | | | |
BI-RADS | | | | |
A | 2 (2.2%) | 453 (9.7%) | 9 (8.1%) | 775 (11.2%) |
B | 47 (44.4%) | 2,303 (49.1%) | 41 (36.9%) | 2,810 (40.7%) |
C | 51 (48.9%) | 1,683 (35.9%) | 56 (50.5%) | 2,891 (41.9%) |
D | 5 (4.4%) | 237 (5.0%) | 5 (4.5%) | 430 (6.2%) |
Not reported | 0 (0%) | 15 (0.3%) | — | — |
Nulliparous | 23 (22.0%) | 982 (20.9%) | — | — |
Family history of breast cancer (mother, sister, or both) | 27 (25.5%) | 804 (17.1%) | — | — |
Race | | | | |
Non-Hispanic White | 77 (73.3%) | 3,288 (70.1%) | 54 (48.7%) | 2,672 (38.7%) |
NHB | 26 (24.4%) | 1,240 (26.5%) | 49 (44.1%) | 3,195 (46.3%) |
Asian | 0 (0%) | 25 (0.5%) | 5 (4.5%) | 506 (7.3%) |
Others | 2 (2.2%) | 45 (0.9%) | 1 (0.9%) | 56 (0.8%) |
Not reported | 0 (0%) | 93 (2.0%) | 2 (1.8%) | 477 (6.9%) |
Time to cancer (years) | | | | |
0.5–1 | 28 (26.6%) | — | 22 (19.8%) | — |
1–2 | 28 (26.6%) | — | 32 (28.8%) | — |
2–3 | 15 (14.4%) | — | 24 (21.6%) | — |
3–4 | 13 (12.4%) | — | 20 (18.1%) | — |
4–5 | 21 (20%) | — | 13 (11.7%) | — |
5-Year risk prediction performance
All results here exclude women who have developed breast cancer within the first 6 months of their screening mammogram and apply to a constant sample size for each cohort. We first assessed a model using age and BI-RADS density at entry. We observed a 5-year AUC of 0.55 (95% CI, 0.53–0.57) in both internal and external validation. We then assessed the model with the MRS derived from synthetic DBT mammogram only. We observed a 5-year AUC of 0.75 (95% CI, 0.73–0.77) in the internal validation and 0.72 (95% CI, 0.69–0.75) when applied to the external validation data. There was no change in the AUC when age and BI-RADS density were added to the MRS (Table 2).
Table 2. Model performance by screening history for risk over 1 to 5 years based on baseline synthetic DBT MRS. Performance is reported as AUC (95% CI). Data are presented for the internal cross-validation cohort and external EMBED validation cohort (a).
Model | 2-year AUC | 3-year AUC | 4-year AUC | 5-year AUC |
---|---|---|---|---|
Internal validation cohort WashU (n = 5,066) | | | | |
Age + BI-RADS | 0.59 (0.57–0.61) | 0.58 (0.56–0.60) | 0.56 (0.54–0.58) | 0.55 (0.53–0.57) |
MRS (b) only | 0.81 (0.76–0.84) | 0.80 (0.76–0.83) | 0.78 (0.75–0.80) | 0.75 (0.73–0.77) |
MRS (b) + age + BI-RADS | 0.81 (0.76–0.84) | 0.80 (0.77–0.83) | 0.78 (0.75–0.80) | 0.75 (0.73–0.78) |
External validation cohort EMBED (n = 7,017) | | | | |
Age + BI-RADS | 0.58 (0.56–0.60) | 0.57 (0.55–0.59) | 0.56 (0.54–0.58) | 0.55 (0.53–0.57) |
MRS (b) only | 0.77 (0.71–0.81) | 0.77 (0.73–0.80) | 0.73 (0.70–0.76) | 0.72 (0.69–0.75) |
MRS (b) + age + BI-RADS | 0.77 (0.71–0.82) | 0.77 (0.74–0.80) | 0.73 (0.70–0.77) | 0.72 (0.70–0.75) |
(a) Diagnostic images and cases confirmed within 6 months of screening mammograms are excluded as future risk prediction is not applicable to these women.
(b) Baseline synthetic DBT MRS.
While many demographic risk factor models are available, we chose the TC model as a representative demographic model in our development cohort. Comparison with the TC model was done in the WashU cohort only because the Emory EMBED public access data lack the required risk factors. The distribution of risk factors input to the TC model is presented in Supplementary Table S1. Using a fixed sample size of 4,962 women, we obtained a 5-year AUC of 0.56 (95% CI, 0.54–0.58) using TC alone and an AUC of 0.75 (95% CI, 0.73–0.78) when we combined TC with the MRS derived from synthetic DBTs (Supplementary Table S2). The model using the MRS statistically significantly outperforms the TC model (P < 0.01).
Synthetic DBT MRS and risk calibration
The distribution plot of synthetic DBT MRS in the external validation cohort shows good separation between women who developed breast cancer and those who remained cancer-free, consistent with the observed AUC (Fig. 2A). The estimated HR for a 1 SD increase in MRS was 2.1 (95% CI, 1.8–2.4).
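As a reading aid for the HR above, the following sketch shows how a per-SD hazard ratio scales to other differences in MRS. It assumes the usual log-linear form of a Cox model, in which the log hazard changes linearly with the score; this is an assumption of the sketch, and the function name `hazard_ratio_for_difference` is hypothetical.

```python
import math

# Reported estimate: HR of 2.1 for a 1 SD increase in MRS.
HR_PER_SD = 2.1


def hazard_ratio_for_difference(delta_sd, hr_per_sd=HR_PER_SD):
    """Hazard ratio implied for a difference of `delta_sd` standard deviations.

    Assumes log-linearity: HR(delta) = exp(beta * delta), beta = ln(HR per SD).
    """
    beta = math.log(hr_per_sd)
    return math.exp(beta * delta_sd)


# Under this assumption, a woman 2 SD above another has about 2.1 ** 2 = 4.41
# times the hazard.
two_sd_ratio = hazard_ratio_for_difference(2)
```

The same scaling can be applied to the CI bounds (1.8 and 2.4) to get an approximate interval, again only under the log-linearity assumption.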
Distribution of breast cancer risk at clinic visit estimated from mammograms in the EMBED external validation cohort, stratified by breast cancer status at the end of the 5-year follow-up from today. A, The DBT MRS distribution for women who remain cancer-free and for those who develop breast cancer and (B) the distribution of women who remain cancer-free and those who develop breast cancer by the SEER calibration and cut points defined by the National Institute for Health and Care Excellence, UK and US guidelines.
Calibration to the population risk
The MRS is also used to calibrate to the US SEER 5-year expected incidence of breast cancer in the external validation cohort (Fig. 2B). Using the SEER 5-year cut points, the high-risk category (4% or higher 5-year risk) includes 20% of breast cancer cases and 6% of the cancer-free women (Supplementary Table S3). The very-low-risk category (lower than 1% 5-year risk) includes 16% of the breast cancer cases diagnosed in the next 5 years and 50% of the women who remained breast cancer–free (Supplementary Table S3). A risk ratio of 11.1 was observed when comparing the high-risk group (5-year risk >4%) with the very-low-risk group (5-year risk <1%; Fig. 2B).
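The risk ratio between two calibrated strata is the ratio of their observed 5-year risks. The sketch below illustrates the arithmetic, using the two cut points stated above (4% or higher for high risk, lower than 1% for very low risk). The counts in the example are hypothetical and are not the study's data, which are in Supplementary Table S3. The "intermediate" label and the function names are also assumptions of this sketch.

```python
def risk_category(five_year_risk):
    """Assign a stratum from a calibrated 5-year absolute risk (as a proportion).

    Only the two extreme cut points come from the text; everything between
    them is lumped into one 'intermediate' stratum for illustration.
    """
    if five_year_risk >= 0.04:
        return "high"
    if five_year_risk < 0.01:
        return "very low"
    return "intermediate"


def risk_ratio(cases_a, total_a, cases_b, total_b):
    """Observed risk in group A divided by observed risk in group B."""
    if total_a <= 0 or total_b <= 0 or cases_b <= 0:
        raise ValueError("totals and reference-group cases must be positive")
    return (cases_a / total_a) / (cases_b / total_b)


# Hypothetical counts: 20 cases among 400 high-risk women (5.0%) versus
# 10 cases among 2,000 very-low-risk women (0.5%).
example_ratio = risk_ratio(20, 400, 10, 2000)
```

With these hypothetical counts the ratio is about 10, which is the same kind of comparison as the reported risk ratio of 11.1 but does not reproduce it.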
Calibration of observed and predicted risk
We present predicted versus observed 5-year risk by decile with their 95% CI in Supplementary Fig. S1. It shows good calibration across all levels of risk. We did not observe any over- or underestimation of risk across all quintiles of risk (Supplementary Table S4).
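A predicted-versus-observed comparison by risk group can be sketched as follows. This minimal version assumes a binary 5-year outcome with complete follow-up (no censoring), which is a simplification and not taken from the study; the function name `calibration_by_group` is hypothetical.

```python
def calibration_by_group(predicted, outcomes, n_groups=10):
    """Compare mean predicted risk with observed event rate in ordered groups.

    Women are sorted by predicted risk and split into `n_groups` groups of
    near-equal size (deciles by default). Returns one tuple per group:
    (group number, group size, mean predicted risk, observed proportion).
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer women than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((g + 1, len(chunk), mean_pred, observed))
    return rows
```

Setting `n_groups=5` gives quintiles. Good calibration corresponds to the mean predicted risk and the observed proportion being close in every group.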
Sensitivity analyses
We next assessed model performance in predicting risk for varying time intervals after the current clinic visit. As shown in Table 1, the diagnosis of breast cancer cases is well distributed across time from the first DBT in both datasets. In the external validation data, performance was steady with an AUC of 0.77 for 2- to 3-year risk and 0.72 for 5-year risk (Table 2).
Stratified subanalyses of performance in the external validation data by race/ethnicity and BI-RADS breast density are presented in Supplementary Table S5. Because breasts with high mammographic density hinder the detection of tumors, we stratified women into dense (BI-RADS C/D) and nondense (BI-RADS A/B) categories. Among women with dense breasts at the first DBT, we obtained a 5-year risk AUC of 0.69 (95% CI, 0.65–0.74) in the external validation data. An AUC of 0.74 (95% CI, 0.69–0.79) was obtained for the nondense subgroup. Additionally, we obtained a 5-year risk AUC of 0.73 (95% CI, 0.68–0.76) for non-Hispanic White women and 0.72 (95% CI, 0.68–0.76) for NHB women. When we limit the analysis to invasive breast cancers in the external validation data, we observe a 5-year AUC of 0.72 (95% CI, 0.69–0.75).
Discussion
For the first time, we developed and externally validated the performance of a 5-year breast cancer risk prediction model using synthetic DBT images in a diverse external validation population that is 46.3% NHB women. The synthetic DBT image-based prediction model performed equally well in non-Hispanic White and NHB women and across women with dense and nondense breasts. We extended prediction from synthetic DBT images to 5-year risk and reached a 5-year AUC of 0.72 in the diverse external validation cohort. This study shows the feasibility of risk estimation from synthetic DBT images in routine breast screening services for risk prediction intervals consistent with current US risk management guidelines.
Risk stratification is an essential first step to either implementing risk reduction strategies among high-risk women (4, 22) or moving to precision prevention, where screening modality and frequency might vary, as well as the use of lifestyle and chemoprevention approaches, to reduce risk and improve risk-benefit outcomes for the screened population (23). In the United States in 2021, 76% of women over age 50 reported being screened within the past 2 years (24). Mammograms offer a unique opportunity to advance clinical prevention in parallel with the screening benefit of early diagnosis and reduced breast cancer mortality (25). With over 80% of women screened with DBT (10), it is imperative that risk models now incorporate this imaging modality. Furthermore, many clinical services have stopped the simultaneous use of FFDM with DBT screening to reduce radiation exposure for women, increasing the imperative for prediction based on DBT alone.
Research to improve breast cancer risk prediction beyond the Gail model (26) has included established risk factors, hormones, mammographic density, and polygenic risk scores as summarized in comprehensive reviews (1). These models, based on varying combinations of demographic risk factors with the addition of polygenic risk scores and mammographic breast density, perform modestly at best with AUCs of 0.54 to 0.68 for the prediction of 5-year risk (27). Advances in AI methods have also been applied to FFDM with comparable performance when using a baseline mammogram (28, 29) and relying less on clinical factors. With broad population access and adherence to screening recommendations, and the uptake of DBT to over 80% of breast screening (10), efficiency may be increased using these prediction methods over clinical factor–based models such as TC (20). Furthermore, participation in breast screening is a goal for all racial and ethnic groups, so models must perform across the diverse US population. The need for the models to be evaluated in diverse populations and now include DBT is pressing (6). To date, only one evaluation of DBT for risk prediction has been identified (2). That study limits prediction to a 2-year time horizon, shorter than the 5-year horizon used to guide clinical practice, and includes only internal validation. We overcome this methodologic gap and use a diverse external validation data source from an urban Atlanta clinical service where 46% of women are NHB (13).
Importantly, risk management guidelines move beyond the United States Preventive Services Task Force recommendation for routine screening for women ages 40 to 74 to identify women at elevated risk who may be offered genetic risk testing, risk reduction approaches (NCCN and ASCO; refs. 3, 4), or modified screening strategies as defined by the ACR (5). While the United States uses a 5-year time horizon, in the UK, a 10-year horizon is used (30). We note that applying the TC model in our population had relatively modest performance in separating high versus low risk as seen in other settings. In other studies, it has identified a high-risk population that gives rise to some 39% of cancers (31). Many of the demographic models use risk factors that have been shown to modify breast density (32), which itself is summarized or contained in the whole breast image. Thus, these crude ways to estimate the image and breast cancer risk do not contribute to estimation once the whole image is incorporated into the prediction model. For clinical translation, our calibration shows good agreement between predicted and observed values and identifies 6% of the population at high risk over the next 5 years, from whom 20% of total cases are diagnosed.
While the sample size for our study is somewhat limited by the relatively recent widespread use of DBT for breast cancer screening (10), the performance and calibration offer opportunities to apply this approach to risk prediction in routine clinical settings, consistent with current US risk management guidelines. Further evaluation of performance across additional race and ethnicity groups will strengthen the applicability of this approach as will broader evidence from community-based screening programs.
Follow-up of cohorts is important for the development and validation of risk models. In both cohorts, the follow-up exams are used for case ascertainment in the cohort of women undergoing routine screening. The mammograms used for risk prediction stop at the time of prediction. Complete ascertainment of cases is a priority, and both systems are ACR-accredited breast screening services. Thus, they follow high standards of quality assurance.
Given the opportunistic nature of mammography screening in the United States, there is potential for loss to follow-up in both the derivation cohort at WashU and in the external validation cohort assembled at Emory. At WashU, the cohort is also linked to the electronic health record to expand surveillance for death and lack of contact with the system. We previously reported follow-up noting that through 2020, 74% of cohort participants have had a medical center visit within the past year and 80% within the past 2 years. Emory links their data to the hospital tumor registry to augment cancer detection and confirmation, although follow-up for death is less rigorous to date. In both academic medical centers, there are disproportionate rates of supplemental screening that includes MRI at 6 months, which accounts for the cases of breast cancer diagnosed in the 6- to 12-month follow-up interval. Regardless of follow-up, the Cox proportional hazards model accounts for the comparison of cases to control women who are known to be alive and free from cancer at that time. Finally, we note that the overall follow-up is somewhat curtailed by the varied start date of the first DBT screening to enter follow-up for this analysis.
This study has important strengths. It uses a diverse external validation population drawn from two community hospitals, a large inner-city hospital, and a private academic hospital. Our 5-year risk model is calibrated to SEER, accommodates the use of accepted risk management cut points, and allows direct application for routine clinical risk management across clinical settings. We limit the analysis to the use of synthetic DBT and begin follow-up of women free from breast cancer from that first DBT exam.
Our results should be considered in the context of limitations. Additional racial and ethnic diversity of the screened populations will strengthen applicability and address gaps in evidence (6). Reflecting the longer use of DBT in routine screening, additional long-term follow-up will become available and should provide further evidence on its utility for long-term risk prediction in routine care.
Conclusions
Our 5-year risk model using synthetic DBTs accurately classifies women in opportunistic screening programs according to the absolute risk of breast cancer and can facilitate guideline-driven risk management and support more equitable screening programs.
Authors’ Disclosures
S. Jiang reports a patent for risk prediction using radiomic features pending. G.A. Colditz reports a patent for risk prediction using radiomic features pending. No disclosures were reported by the other author.
Authors’ Contributions
S. Jiang: Conceptualization, resources, software, formal analysis, validation, investigation, methodology, writing–original draft, writing–review and editing. D.L. Bennett: Conceptualization, investigation, writing–review and editing. G.A. Colditz: Conceptualization, resources, data curation, funding acquisition, validation, investigation, writing–original draft, writing–review and editing.
Acknowledgments
This research is supported by Washington University School of Medicine.
Note: Supplementary data for this article are available at Cancer Prevention Research Online (http://cancerprevres.aacrjournals.org/).