Abstract
We updated algorithms to identify breast cancer recurrences from administrative data, extending previously developed methods.
In this validation study, we evaluated pairs of breast cancer recurrence algorithms (vs. individual algorithms) to identify recurrences. We generated algorithm combinations that categorized discordant algorithm results as no recurrence [High Specificity and PPV (positive predictive value) Combination] or recurrence (High Sensitivity Combination). We compared individual and combined algorithm results to manually abstracted recurrence outcomes from a sample of 600 people with incident stage I–IIIA breast cancer diagnosed between 2004 and 2015. We used Cox regression to evaluate risk factors associated with age- and stage-adjusted recurrence rates using different recurrence definitions, weighted by inverse sampling probabilities.
Among 600 people, we identified 117 recurrences using the High Specificity and PPV Combination, 505 using the High Sensitivity Combination, and 118 using manual abstraction. The High Specificity and PPV Combination had good specificity [98%, 95% confidence interval (CI): 97–99] and PPV (72%, 95% CI: 63–80) but modest sensitivity (64%, 95% CI: 44–80). The High Sensitivity Combination had good sensitivity (80%, 95% CI: 49–94) and specificity (83%, 95% CI: 80–86) but low PPV (29%, 95% CI: 25–34). Recurrence rates using combined algorithms were similar in magnitude for most risk factors.
By combining algorithms, we identified breast cancer recurrences with greater PPV than individual algorithms, without additional review of discordant records.
Researchers should consider tradeoffs between accuracy and manual chart abstraction resources when using previously developed algorithms. We provided guidance for future studies that use breast cancer recurrence algorithms with or without supplemental manual chart abstraction.
Introduction
Breast cancer recurrence is an important outcome in cancer treatment and survivorship research. Most population-based tumor registries do not capture recurrences. Recurrences are not documented in a systematic and clearly structured manner in electronic health records (EHR); thus, accurate identification usually requires resource-intensive chart abstraction (1). Several studies have developed and validated algorithms to identify breast cancer recurrences from EHR data, potentially reducing costs and time required for data collection (2–10). One meta-analysis reported a median sensitivity of 87% (range: 44%–97%) and median positive predictive value (PPV) of 73% (range: 14%–94%) across 14 studies with algorithms for breast cancer recurrence (11).
When using previously developed recurrence algorithms, researchers must select an algorithm, adapt it to their data, and address potential false positives (FP) and false negatives (FN). In addition, algorithms may need to be updated (e.g., to reflect new billing and coding practices). Validating an existing algorithm against chart abstraction requires decisions about which patients to include and extent of manual abstraction, particularly in a setting with constrained resources. Only a few studies have validated or described how they adapted a previously developed algorithm to a new population (6, 8). As use of recurrence algorithms becomes more common in epidemiologic research, following standard practices to evaluate recurrence algorithms will help increase comparability and accuracy of data across studies.
We describe our evaluation of previously developed algorithms for breast cancer recurrence (3, 6) in a population of people diagnosed with incident stage I–IIIA breast cancer. The size of our population prohibited manually collecting gold-standard chart abstraction recurrence data on all people. Thus, we sought to identify an algorithm-based approach that would yield high sensitivity and specificity for recurrence in the absence of chart abstraction. In this study, we describe algorithm selection, adaptation, and analytic considerations to provide scientific direction for future studies.
Materials and Methods
We used data from the Optimal Breast cancer Chemotherapy Dosing (OBCD) study. The OBCD study included adults ≥18 years of age identified as female diagnosed with primary incident stage I–IIIA breast cancer and enrolled at Kaiser Permanente Washington (KPWA) or Kaiser Permanente Northern California (KPNC). For this validation study, we only included participants from KPWA who had definitive breast surgery for their primary breast cancer diagnosis. We obtained Institutional Review Board approval from KPWA, KPNC, and collaborating sites [Memorial Sloan Kettering (New York, NY) and Rutgers University (New Brunswick, NJ)] with a waiver of consent to collect and analyze data in accordance with the U.S. Federal Policy for the Protection of Human Subjects (Common Rule).
Algorithm choice
We identified algorithms that had been developed and validated using data from KP or similar health systems to increase implementation feasibility and potential accuracy in the same settings (3, 6, 8). We selected a set of recurrence algorithms previously developed and published by Chubak and colleagues (3) among people with stage I–II breast cancer, and later extended by Kroenke and colleagues (6), who evaluated the performance of algorithm pairs with select medical record review to resolve discordant results (triangulation), and included patients with stage I–IIIA breast cancer. We chose one algorithm with high sensitivity (“Algorithm 7” in Kroenke and colleagues) and one algorithm with high specificity and PPV (“Algorithm 9” in Kroenke and colleagues) to identify recurrences (6).
Recurrence algorithm adaptation
We used the selected algorithms to identify breast cancer recurrences that occurred before each participant's death, health plan disenrollment, or end of follow-up (December 31, 2019)—whichever came first. As described previously, the algorithms used procedure codes [Current Procedural Terminology (CPT) and Healthcare Common Procedure Coding System (HCPCS) codes] and National Drug Codes (NDC; ref. 3). Specifically, Algorithm 7 first identifies recurrence based on codes for a secondary non-breast malignant neoplasm >180 days after primary breast cancer, then mastectomy >180 days after primary breast cancer, and then radiotherapy >365 days after primary breast cancer. Algorithm 9 first identifies recurrence based on two visits with a code for a secondary malignant neoplasm within 60 days of each other and >365 days after primary breast cancer, and then a non-breast cancer record in the Surveillance Epidemiology and End Results (SEER) registry after the primary breast cancer. Because our study included cases diagnosed after the implementation of ICD-10 codes in 2015, we adapted the Chubak and Kroenke algorithms to include ICD-10 diagnosis codes that mapped to the ICD-9 codes used in the original algorithms (available upon request). In addition, we extended the algorithm code to identify recurrence dates. When a recurrence was identified, the recurrence date was defined as the first date associated with an algorithm-identified recurrence (e.g., an ICD code), following previous methods (12). Different algorithms used different indicators; therefore, the first associated date varied by algorithm.
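To illustrate the sequential rules and the recurrence date definition described above, the following Python sketch applies an Algorithm 7-style rule list to one person's coded events. It is one possible reading of the description, not the published algorithm code: the event labels, the function name `algorithm7_recurrence_date`, and the choice to return the earliest qualifying date from the first rule that fires are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# (hypothetical event label, minimum days after primary diagnosis),
# listed in the order the rules are described in the text.
ALGORITHM_7_RULES = [
    ("secondary_nonbreast_malignancy", 180),
    ("mastectomy", 180),
    ("radiotherapy", 365),
]


def algorithm7_recurrence_date(
    primary_date: date,
    events: Iterable[Tuple[str, date]],
) -> Optional[date]:
    """Return an algorithm-identified recurrence date, or None.

    Assumption: rules are checked in order, and the recurrence date is the
    earliest qualifying event date for the first rule that fires.
    """
    events = list(events)
    for kind, min_days in ALGORITHM_7_RULES:
        cutoff = primary_date + timedelta(days=min_days)
        # "> N days after primary" means strictly later than the cutoff date.
        qualifying = [d for k, d in events if k == kind and d > cutoff]
        if qualifying:
            return min(qualifying)
    return None
```

In this sketch a mastectomy recorded 59 days after diagnosis does not count, whereas radiotherapy recorded more than 365 days after diagnosis does, and its date becomes the recurrence date.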
We used three methods to combine algorithms based on how we treated algorithm-discordant pairs between Algorithm 7 and Algorithm 9. First, we categorized all discordant pairs as no recurrence in the “High Specificity and PPV Combination.” Second, we categorized all discordant pairs as recurrence in the “High Sensitivity Combination.” Third, we used triangulation to categorize discordant pairs as recurrence if a confirmed recurrence was identified in chart abstraction and no recurrence if a recurrence was not identified in chart abstraction. Triangulation aligns with Kroenke's prior work and may be an option for studies that have additional chart abstraction resources (6).
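The three combination rules can be written as a small decision function. The Python sketch below is illustrative only; the function name `combine` and the rule labels are hypothetical and do not come from the study code.

```python
from typing import Optional


def combine(alg7: bool, alg9: bool, rule: str, chart: Optional[bool] = None) -> bool:
    """Combine two algorithm results into one recurrence classification.

    rule is one of "high_specificity_ppv", "high_sensitivity", "triangulation"
    (hypothetical labels). chart is the chart abstraction result and is only
    consulted for discordant pairs under triangulation.
    """
    if alg7 == alg9:
        # Concordant pairs keep the shared result under every rule.
        return alg7
    if rule == "high_specificity_ppv":
        return False  # discordant pairs count as no recurrence
    if rule == "high_sensitivity":
        return True  # discordant pairs count as recurrence
    if rule == "triangulation":
        if chart is None:
            raise ValueError("triangulation requires a chart result for discordant pairs")
        return chart
    raise ValueError(f"unknown rule: {rule}")
```

Under this formulation the High Specificity and PPV Combination is a logical AND of the two algorithms and the High Sensitivity Combination is a logical OR.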
Validation sample
Abstraction was undertaken among a sample of the 2,859 people with incident breast cancer diagnosed at KPWA between 2004 and 2015 with EHRs available. We abstracted the following: all discordant records in which one algorithm indicated a recurrence while the other did not (N = 404); a random sample of concordant records where both algorithms indicated no recurrence (N = 100); a random sample of concordant records where both algorithms indicated recurrence (N = 100); and all additional concordant records where both algorithms indicated a recurrence, but corresponding recurrence dates differed by ≥6 months (N = 24). We included concordant records because we did not want to assume that concordance between the two algorithms represented a true positive (TP) or true negative (TN). None of the people included in the original validation studies by Chubak and Kroenke were included in this validation sample. Out of 628 records abstracted, we excluded 28 from the final analysis. Four were ineligible for this study (upon further review they were not stage I–IIIA breast cancer or had incomplete EHRs), six had recurrence dates beyond the end of follow-up (December 31, 2019), and 18 did not have definitive surgery for the primary breast cancer diagnosis. We included 600 remaining records in this validation analysis, which were weighted back to the overall population using inverse sampling weights based on the probability of being selected for medical record abstraction after accounting for exclusions. The inverse sampling weights were based entirely on the algorithm results (e.g., all discordant or random samples of concordant records) and were not based on any population characteristics.
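As a rough illustration of how inverse sampling weights of this kind can be derived, the Python sketch below divides the number of people in each algorithm-result stratum by the number of analyzed records from that stratum. The function name and the stratum counts are hypothetical toy values, not study data; the study's own weight calculation may have differed in detail.

```python
from typing import Dict


def inverse_sampling_weights(
    population_counts: Dict[str, int],
    analyzed_counts: Dict[str, int],
) -> Dict[str, float]:
    """Weight for each stratum = 1 / (analyzed / population)."""
    weights = {}
    for stratum, n_population in population_counts.items():
        n_analyzed = analyzed_counts[stratum]
        if n_analyzed <= 0:
            raise ValueError(f"no analyzed records in stratum {stratum!r}")
        weights[stratum] = n_population / n_analyzed
    return weights


# Toy counts for illustration only (not the study's stratum sizes).
toy_population = {"discordant": 40, "concordant_no_recurrence": 500, "concordant_recurrence": 60}
toy_analyzed = {"discordant": 40, "concordant_no_recurrence": 50, "concordant_recurrence": 30}
toy_weights = inverse_sampling_weights(toy_population, toy_analyzed)
```

A stratum abstracted in full receives a weight of 1, whereas each record from a 10% random sample stands in for 10 people in the overall population.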
Medical record abstraction and gold-standard recurrence definition
We abstracted information on breast cancer recurrences from pathology and imaging reports. We provided chart abstractors with information on the primary breast cancer diagnosis (including date) but blinded them to recurrence algorithm results. We defined recurrence as a tumor in the ipsilateral breast, lymph nodes, or any distant site >120 days after the definitive surgery date with the same histologic type as the primary breast tumor (e.g., ductal, lobular, tubular, mucinous, etc.), following methods described previously (13). Rather than relying on ICD diagnosis codes, the abstractor read through text of oncology notes, pathology reports, and imaging reports to identify recurrences. We included metastatic diagnoses found on clinical imaging with unknown histology. We abstracted information on recurrence date, histology, method of diagnosis (pathology or imaging reports), and tumor markers.
Additional data collection
We collected demographic and clinical characteristics [e.g., age, race, ethnicity, body mass index (BMI), menopausal status, Charlson comorbidity index] at the time of initial breast cancer diagnosis from EHR data. We categorized people as postmenopausal if they were 55 years or older, or younger than 55 with evidence of a bilateral oophorectomy. Charlson comorbidity index was calculated using administrative data from the 12 months prior to diagnosis (14). We collected information on initial breast cancer tumor characteristics [stage, grade, estrogen receptor (ER), progesterone receptor (PR), HER2 status], treatment (surgery type, chemotherapy, radiotherapy, endocrine therapy), and second primary breast cancers by linking to the Seattle-Puget Sound SEER tumor registry.
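The menopausal status rule above reduces to a single condition. A minimal Python sketch, with a hypothetical function name and argument names, is:

```python
def is_postmenopausal(age_at_diagnosis: float, bilateral_oophorectomy: bool) -> bool:
    """Postmenopausal if 55 years or older, or younger with a bilateral oophorectomy."""
    return age_at_diagnosis >= 55 or bilateral_oophorectomy
```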
Statistical analyses
We described the chart-abstracted study population, stratified by abstracted recurrence status. We then described the characteristics of the overall study population using inverse sampling weights. We calculated performance characteristics for each algorithm individually and combined, compared with gold-standard chart abstraction. We calculated the number of TP, FP, FN, and TN in the chart-abstracted study population and weighted to the overall study population. For each individual algorithm and combination, we calculated weighted sensitivity, specificity, PPV, and negative predictive value (NPV) using inverse sampling probabilities.
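The weighted performance measures can be sketched as weighted sums over the four cells of the 2 × 2 table. The Python example below is a minimal illustration with hypothetical names and toy records; it gives point estimates only and does not reproduce the confidence intervals, which in this study came from survey-weighted analyses.

```python
import math
from typing import Dict, Iterable, Tuple


def weighted_performance(records: Iterable[Tuple[bool, bool, float]]) -> Dict[str, float]:
    """Compute weighted sensitivity, specificity, PPV, and NPV.

    Each record is (algorithm_positive, chart_positive, sampling_weight),
    with chart abstraction treated as the gold standard.
    """
    tp = fp = fn = tn = 0.0
    for algorithm_positive, chart_positive, weight in records:
        if algorithm_positive and chart_positive:
            tp += weight
        elif algorithm_positive:
            fp += weight
        elif chart_positive:
            fn += weight
        else:
            tn += weight

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy records for illustration only (not study data).
toy_records = [(True, True, 2.0), (True, False, 2.0), (False, True, 1.0), (False, False, 10.0)]
toy_result = weighted_performance(toy_records)
```

Because concordant-negative records carry larger weights than discordant records, weighted specificity and NPV can differ noticeably from their unweighted counterparts.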
We sought to understand how algorithm performance might change based on study design decisions. Therefore, we evaluated weighted performance measures among subpopulations: people without a second primary breast cancer diagnosis because these diagnoses may be incorrectly identified as recurrences in an algorithm (remaining N = 523); people with follow-up data truncated at the end of 2014 to mimic an algorithm that uses ICD-9 codes only (N = 556); and people with recurrences identified by both algorithms excluding any with recurrence dates ≥6 months apart because these recurrences may have been incorrectly identified by one or both algorithms (remaining N = 549).
We were interested in how different recurrence algorithm approaches might influence associations between risk factors and recurrence. Therefore, we estimated associations between demographic, clinical, tumor, and treatment characteristics and recurrence. Using Cox proportional hazards models, we modeled recurrence using each definition (from chart abstraction, each algorithm alone, and each combined algorithm) in a separate model. We calculated HRs for recurrence with 95% confidence intervals (CI) using linearized variance estimators to compute SEs, adjusted for age and stage of the incident breast cancer, weighting results using inverse sampling probabilities. For these analyses, we additionally censored people at the date of a second primary breast cancer diagnosis. People with missing data for specific risk factors were excluded from analyses for that risk factor. For cases with different recurrence dates in each algorithm, we used the earliest date for analysis. All analyses were conducted using StataMP.
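For orientation, a sampling-weighted Cox model is commonly fitted by maximizing a weighted partial log-likelihood. The form below is a standard textbook expression (Breslow form, ignoring tied event times) and is offered as a sketch; the source does not state which exact form or tie-handling method was used. Here $w_i$ is the inverse sampling weight, $\delta_i$ indicates an observed recurrence, $x_i$ holds the risk factor plus the age and stage adjustment terms, and $R(t_i)$ is the set of people still at risk at time $t_i$.

```latex
\ell_w(\beta) = \sum_{i=1}^{n} w_i \, \delta_i
  \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} w_j \exp\!\left( x_j^{\top}\beta \right) \right],
\qquad
\mathrm{HR}_k = \exp(\beta_k)
```

Censoring at a second primary breast cancer diagnosis corresponds to setting $\delta_i = 0$ and ending follow-up at that date.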
Data availability
Data contain potentially identifiable information (e.g., dates of diagnoses and exams) that cannot be shared openly without appropriate human subjects approval and data use agreements, but may be available upon reasonable request from the corresponding author.
Results
Among 600 people with primary breast cancer whose charts were abstracted, 118 (20%) had an abstracted recurrence, which, when weighted to the overall population (N = 2,756), corresponded to 224 (8%) estimated recurrences (Table 1). The mean age at primary breast cancer diagnosis was 61 years among the chart abstracted sample and 62 years among the overall population and did not meaningfully vary by recurrence status. Figure 1 shows the number of recurrences identified by Algorithm 7 and Algorithm 9, stratified by chart abstraction recurrence results. Shading indicates whether the recurrences were treated as TP, FN, FP, or TN. Results for Algorithm 7 alone ignore recurrence results from Algorithm 9, and vice versa. The results for the combined algorithms and triangulation show how discordant pairs were classified as FN or TN for the High Specificity and PPV Combination, FP or TP for the High Sensitivity Combination, and TP or TN depending on the chart abstraction results for the Triangulation Combination.
Table 1. Descriptive characteristics of people with stage I–IIIA breast cancer, stratified by gold-standard recurrence outcomes from manually abstracted medical records, unweighted, and weighted to the overall study population.
Characteristic | Chart abstracted sample: all (N = 600) | Chart abstracted sample: recurrence (N = 118) | Chart abstracted sample: no recurrence (N = 482) | Overall study populationᵃ: all (N = 2,756) | Overall study populationᵃ: recurrence (N = 224) | Overall study populationᵃ: no recurrence (N = 2,532) |
---|---|---|---|---|---|---|
Age at primary diagnosis, mean (SD) | 61 (13) | 60 (12) | 61 (13) | 62 (13) | 59 (18) | 63 (12) |
Year of primary diagnosis | N (%) | N (%) | N (%) | N (%) | N (%) | N (%) |
2004–2007 | 180 (30) | 29 (25) | 151 (31) | 904 (33) | 45 (20) | 859 (34) |
2008–2011 | 218 (36) | 51 (43) | 167 (35) | 1,010 (37) | 77 (34) | 933 (37) |
2012–2015 | 202 (34) | 38 (32) | 164 (34) | 842 (31) | 102 (45) | 740 (29) |
Race | ||||||
AI/AN | 6 (1) | 0 (0) | 6 (1) | 29 (1) | 0 | 29 (1) |
Asian | 42 (7) | 6 (5) | 36 (8) | 177 (6) | 29 (13) | 148 (6) |
Black | 24 (4) | 5 (4) | 19 (4) | 93 (3) | 6 (3) | 87 (3) |
NH/PI | 3 (1) | 1 (1) | 2 (0) | 4 (0) | 2 (1) | 2 (0) |
White | 498 (83) | 104 (88) | 394 (82) | 2,360 (86) | 184 (82) | 2,176 (86) |
Other | 12 (2) | 0 (0) | 12 (3) | 56 (2) | 0 | 56 (2) |
Multiple | 15 (2) | 2 (2) | 13 (3) | 37 (1) | 2 (1) | 35 (1) |
Ethnicity | ||||||
Not Hispanic | 563 (94) | 112 (95) | 451 (94) | 2,671 (97) | 216 (97) | 2,455 (97) |
Hispanic | 37 (6) | 6 (5) | 31 (6) | 85 (3) | 7 (3) | 77 (3) |
BMI (kg/m2) at diagnosis | ||||||
<25.0 | 155 (26) | 23 (20) | 132 (27) | 656 (24) | 38 (17) | 618 (24) |
25.0–29.9 | 168 (28) | 37 (31) | 131 (27) | 822 (30) | 74 (33) | 747 (30) |
30.0+ | 181 (30) | 42 (36) | 139 (29) | 737 (27) | 89 (40) | 648 (26) |
Missing | 96 (16) | 16 (14) | 80 (17) | 541 (20) | 23 (10) | 519 (21) |
Menopausal status at diagnosis | ||||||
Premenopausal | 182 (30) | 31 (26) | 151 (31) | 746 (27) | 65 (29) | 681 (27) |
Postmenopausal | 418 (70) | 87 (74) | 331 (69) | 2,010 (73) | 159 (71) | 1,852 (73) |
Stage | ||||||
I | 226 (38) | 37 (31) | 189 (39) | 1,562 (57) | 83 (37) | 1,479 (58) |
IIA | 145 (24) | 31 (26) | 114 (24) | 731 (27) | 48 (21) | 684 (27) |
IIB/C | 132 (22) | 29 (25) | 103 (21) | 286 (10) | 62 (28) | 224 (9) |
IIIA | 97 (16) | 21 (18) | 76 (16) | 177 (6) | 31 (14) | 146 (6) |
Grade | ||||||
1 | 128 (21) | 17 (14) | 111 (23) | 705 (26) | 24 (11) | 682 (27) |
2 | 228 (38) | 47 (40) | 181 (38) | 1,096 (40) | 121 (54) | 976 (39) |
3 | 234 (39) | 52 (44) | 182 (38) | 879 (32) | 77 (35) | 802 (32) |
Missing | 10 (2) | 2 (2) | 8 (2) | 75 (3) | 2 (1) | 73 (3) |
ER status | ||||||
Positive | 487 (81) | 93 (79) | 394 (82) | 2,274 (83) | 186 (83) | 2,089 (83) |
Negative | 106 (18) | 21 (18) | 85 (18) | 473 (17) | 32 (15) | 441 (17) |
Missing | 7 (1) | 4 (3) | 3 (1) | 9 (0) | 5 (2) | 3 (0) |
PR status | ||||||
Positive | 453 (76) | 88 (75) | 365 (76) | 2,170 (79) | 177 (79) | 1,993 (79) |
Negative | 134 (22) | 25 (21) | 109 (23) | 547 (20) | 40 (18) | 507 (20) |
Missing | 13 (2) | 5 (4) | 8 (2) | 39 (1) | 7 (3) | 32 (1) |
HER2 status | ||||||
Positive | 99 (17) | 10 (9) | 89 (19) | 457 (17) | 14 (6) | 443 (18) |
Borderline | 13 (2) | 3 (3) | 10 (2) | 60 (2) | 5 (2) | 55 (2) |
Negative | 476 (79) | 100 (85) | 376 (78) | 2,182 (79) | 199 (89) | 1,983 (78) |
Missing | 12 (2) | 5 (4) | 7 (2) | 57 (2) | 5 (2) | 52 (2) |
Surgery type | ||||||
Lumpectomy | 287 (48) | 52 (44) | 235 (49) | 1,635 (59) | 102 (46) | 1,533 (61) |
Mastectomy | 313 (52) | 66 (56) | 247 (51) | 1,121 (41) | 122 (54) | 1,000 (40) |
Chemotherapy | ||||||
No | 297 (50) | 55 (47) | 242 (50) | 1,926 (70) | 110 (49) | 1,816 (72) |
Yes | 303 (51) | 63 (53) | 240 (50) | 831 (30) | 114 (51) | 717 (28) |
Radiotherapy | ||||||
No | 234 (39) | 52 (44) | 182 (38) | 1,204 (44) | 81 (36) | 1,123 (44) |
Yes | 366 (61) | 66 (56) | 300 (62) | 1,552 (56) | 142 (64) | 1,410 (56) |
Endocrine therapy | ||||||
No | 186 (31) | 31 (26) | 155 (32) | 1,142 (42) | 45 (20) | 1,097 (43) |
Yes | 414 (69) | 87 (74) | 327 (68) | 1,614 (59) | 178 (80) | 1,435 (57) |
Charlson comorbidity index in year before diagnosis | ||||||
0 | 464 (77) | 97 (82) | 367 (76) | 2,103 (76) | 193 (86) | 1,910 (75) |
1 | 71 (12) | 12 (10) | 59 (12) | 340 (12) | 17 (8) | 323 (13) |
2 | 31 (5) | 2 (2) | 29 (6) | 207 (8) | 3 (2) | 203 (8) |
3+ | 34 (6) | 7 (6) | 27 (6) | 106 (4) | 10 (4) | 97 (4) |
Second breast cancer | ||||||
No | 523 (87) | 106 (90) | 417 (87) | 2,647 (96) | 209 (93) | 2,438 (96) |
Yes | 77 (13) | 12 (10) | 65 (14) | 109 (4) | 15 (7) | 94 (4) |
Abbreviations: AI/AN, American Indian/Alaska Native; AJCC, American Joint Committee on Cancer; BMI, body mass index; ER, estrogen receptor; NH/PI, Native Hawaiian/Pacific Islander; PR, progesterone receptor; SD, standard deviation.
ᵃ Numbers and %s for the overall study population were calculated using inverse probability sampling weights. Weighted Ns may not add to total due to rounding.
Figure 1. Methods for defining recurrence versus no recurrence based on algorithms and chart abstraction. This figure shows how each individual algorithm and each combination of algorithms for breast cancer recurrence was used to define true positives, false positives, true negatives, and false negatives compared with chart abstraction. We reviewed medical records for 600 people to obtain our gold-standard recurrence outcome. This included 388 discordant records between algorithms, 95 records where the algorithms agreed there was no recurrence, 66 records where the algorithms agreed there was a recurrence, and 51 additional records where the algorithms agreed there was a recurrence but the recurrence dates from each algorithm were >6 months apart. When combining algorithms, we categorized recurrence-discordant pairs as negative in one combination and positive in another. In the High Specificity and PPV Combination, if both algorithms identified a recurrence, then this was defined as a recurrence for this combination. If one or both algorithms did not identify a recurrence, then this was defined as no recurrence for this combination. In the High Sensitivity Combination, if one or both algorithms identified a recurrence, then this was defined as a recurrence for this combination. If neither algorithm identified a recurrence, then this was defined as no recurrence for this combination. For the Triangulation Combination, we used chart abstraction results to classify discordant algorithm results. If the chart abstraction identified a recurrence, then the discordant result was defined as a recurrence; if the chart abstraction did not identify a recurrence, then the discordant result was defined as no recurrence.
Among chart abstracted records, we identified 388 (66% of 600 records) recurrences using Algorithm 7 alone and 234 (39%) recurrences using Algorithm 9 alone (Table 2). The High Specificity and PPV Combination identified 117 (20%) recurrences and had the highest weighted specificity (98%, 95% CI: 97–99) and PPV (72%, 95% CI: 63–80), with a sensitivity of 64% (95% CI: 44–80). The High Sensitivity Combination identified 505 (84%) recurrences and had the highest weighted sensitivity (80%, 95% CI: 49–94), with a specificity of 83% (95% CI: 80–86) and PPV of 29% (95% CI: 25–34). Algorithms had similar NPVs whether used individually or in combination.
Table 2. Comparison of weighted algorithm performance for each algorithm alone and combined compared with gold-standard chart abstraction, and among subpopulations.
Measure | Algorithm 7 | Algorithm 9 | High Specificity and PPV Combinationᵃ | High Sensitivity Combinationᵇ |
---|---|---|---|---|
N recurrences (%) | 388 (66%) | 234 (39%) | 117 (20%) | 505 (84%) |
Overall (N=600)ᶜ | % (95% CI) | % (95% CI) | % (95% CI) | % (95% CI) |
Sensitivity | 78% (49–93) | 65% (45–82) | 64% (44–80) | 80% (49–94) |
Specificity | 88% (85–90) | 93% (92–95) | 98% (97–99) | 83% (80–86) |
PPV | 36% (31–74) | 46% (39–53) | 72% (63–80) | 29% (25–34) |
NPV | 98% (93–99) | 97% (93–99) | 97% (93–99) | 98% (92–100) |
No second breast cancer (N=523) | ||||
Sensitivity | 77% (47–93) | 66% (43–83) | 65% (43–82) | 78% (47–85) |
Specificity | 90% (88–92) | 94% (92–95) | 98% (97–99) | 85% (82–88) |
PPV | 40% (34–46) | 47% (40–55) | 75% (66–83) | 31% (27–37) |
NPV | 98% (92–99) | 97% (93–99) | 97% (93–99) | 98% (92–100) |
Censored in 2014 (N=556) | ||||
Sensitivity | 96% (89–98) | 79% (68–87) | 78% (67–86) | 100% (96–100) |
Specificity | 91% (89–93) | 96% (95–97) | 98% (97–99) | 89% (86–91) |
PPV | 29% (23–36) | 43% (34–53) | 58% (46–69) | 25% (20–31) |
NPV | 100% (100–100) | 99% (99–100) | 99% (99–99) | 100% (100–100) |
Recurrence algorithm dates ≤6 months apart (N=549) | ||||
Sensitivity | 76% (44–92) | 59% (38–78) | 58% (37–76) | 76% (44–93) |
Specificity | 88% (86–90) | 94% (92–95) | 99% (98–99) | 84% (80–87) |
PPV | 32% (27–38) | 42% (34–50) | 74% (62–83) | 26% (21–31) |
NPV | 98% (93–99) | 97% (93–99) | 97% (93–99) | 98% (92–100) |
Abbreviations: NPV, negative predictive value; PPV, positive predictive value.
ᵃ High Specificity and PPV Combination categorized discordant algorithm pairs as negative.
ᵇ High Sensitivity Combination categorized discordant algorithm pairs as positive.
ᶜ The overall population includes the entire study sample with 388 discordant records between algorithms, 95 records where the algorithms agreed there was no recurrence, 66 records where the algorithms agreed there was a recurrence, and 51 additional records where the algorithms agreed there was a recurrence but the recurrence dates from each algorithm were >6 months apart.
When we excluded people diagnosed with a second primary breast cancer, none of the weighted performance measures changed substantially (Table 2). When we censored follow-up at the end of 2014 (before ICD-10 codes were implemented), sensitivity increased to 100% (95% CI: 96–100) for the High Sensitivity Combination and to 78% (95% CI: 67–86) for the High Specificity and PPV Combination. Specificity also increased slightly, while PPV decreased compared with overall results. Sensitivity decreased slightly and specificity was unchanged or slightly higher across all algorithms when we excluded people with recurrence algorithm dates >6 months apart.
There were very few FNs from Algorithm 7 alone (N = 5, 1%) and the High Sensitivity Combination (N = 2, 0%; Supplementary Table S1). In weighted analyses, we estimated 48 FNs from Algorithm 7 alone (2%), and 45 FNs from the High Sensitivity Combination (2%) among the overall population. The High Specificity and PPV Combination had the fewest FPs (N = 34, 6%) compared with other algorithms (Supplementary Table S1). In weighted analyses, this equated to an estimated 55 FPs among the overall population. Using triangulation, we identified 150 recurrences (25% of abstracted cases; Supplementary Table S2). Triangulation resulted in a higher sensitivity (80%, 95% CI: 49–94), specificity (98%, 95% CI: 97–99), and PPV (77%, 95% CI: 68–83) than either algorithm individually or combined.
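The gap between unweighted and weighted FN counts above follows from weighting each record by its inverse sampling probability. The following is a minimal sketch of that calculation, not the study's analysis code: the `Record` class, the `weighted_performance` function, and the example weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Record:
    algorithm_positive: bool   # algorithm (or combination) flagged a recurrence
    abstracted_positive: bool  # chart abstraction (reference standard) found one
    weight: float              # inverse sampling probability (hypothetical values)


def weighted_performance(records: Iterable[Record]) -> Dict[str, float]:
    """Return weighted sensitivity, specificity, PPV, and NPV as fractions."""
    tp = fp = fn = tn = 0.0
    for r in records:
        if r.algorithm_positive and r.abstracted_positive:
            tp += r.weight
        elif r.algorithm_positive:
            fp += r.weight
        elif r.abstracted_positive:
            fn += r.weight
        else:
            tn += r.weight

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator > 0 else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical sample: algorithm-negative records were sampled at 1 in 4,
# so each one stands in for four people in the underlying population.
example = (
    [Record(True, True, 1.0)] * 2      # true positives
    + [Record(False, True, 4.0)] * 1   # one sampled FN represents four
    + [Record(True, False, 1.0)] * 1   # false positive
    + [Record(False, False, 4.0)] * 3  # true negatives
)
metrics = weighted_performance(example)
print(metrics)
```

In this toy sample, a single sampled FN contributes a weight of 4 to the denominator of sensitivity, which is the same mechanism by which a handful of observed FNs can translate into a much larger weighted estimate.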
Abstracted recurrence rates declined as age increased, resulting in an HR as low as 0.3 (95% CI: 0.1–1.0) for people 80 years or older versus people ages 50–59 (Table 3). We noted similar inverse associations between age and recurrence rates based on individual and combined algorithms. Higher BMI was associated with increased abstracted recurrence rates (HR = 2.3, 95% CI: 1.0–4.9 for BMI ≥30 vs. BMI <25). When using algorithms, HRs for the associations between BMI ≥30 and recurrence ranged from 1.0 to 1.2 and none reached statistical significance. People with stage IIB/C or IIIA primary breast cancer had a higher recurrence rate than people with stage I breast cancer based on abstracted results (HR = 5.5, 95% CI: 2.0–14.9 for stage IIB/C, and HR = 3.3, 95% CI: 1.2–9.2 for stage IIIA). These HRs were also elevated when using individual algorithms to define recurrence but varied widely, ranging from 4.3 to 12.1 for stage IIB/C and from 3.9 to 13.5 for stage IIIA. There was less variation in these results among the combined algorithms (HR range 6.5–7.1 for stage IIB/C and 5.7–7.6 for stage IIIA).
Table 3. Weighted, adjusted^a HRs for breast cancer recurrence risk associated with demographic, tumor, and treatment characteristics of the primary cancer using chart abstraction, each algorithm alone, and algorithm combinations to define recurrence.
Characteristic | Chart abstraction, HR (95% CI) | Algorithm 7 alone, HR (95% CI) | Algorithm 9 alone, HR (95% CI) | High Specificity and PPV Combination^b, HR (95% CI) | High Sensitivity Combination^c, HR (95% CI) |
---|---|---|---|---|---|
Age at diagnosis (ref: 50–59) | |||||
<50 | 0.6 (0.2–1.9) | 0.8 (0.4–1.5) | 0.6 (0.3–1.4) | 0.8 (0.4–1.6) | 0.7 (0.3–1.4) |
60–69 | 0.4 (0.2–1.0) | 0.8 (0.5–1.5) | 0.7 (0.3–1.4) | 0.8 (0.4–1.6) | 0.7 (0.4–1.4) |
70–79 | 0.3 (0.1–0.8) | 0.6 (0.3–1.1) | 0.5 (0.3–1.1) | 0.6 (0.2–1.2) | 0.5 (0.3–1.0) |
80+ | 0.3 (0.1–1.0) | 0.5 (0.2–1.3) | 0.4 (0.1–1.3) | 0.5 (0.2–1.5) | 0.4 (0.2–1.1) |
Race (ref: White) | |||||
American Indian/Alaska Native | — | 0.7 (0.1–4.8) | 3.7 (0.7–18.5) | 1.3 (0.1–13.0) | 1.5 (0.3–7.5) |
Asian | 1.9 (0.4–10.0) | 1.0 (0.4–2.3) | 1.5 (0.7–3.5) | 0.9 (0.3–2.6) | 1.3 (0.6–2.8) |
Black | 0.6 (0.2–2.6) | 1.4 (0.5–4.0) | 1.1 (0.3–4.0) | 1.7 (0.5–5.6) | 1.1 (0.3–3.8) |
NH/PI | 1.2 (0.4–4.0) | 1.3 (0.5–3.3) | 1.5 (0.5–4.2) | 1.8 (0.8–4.2) | 1.1 (0.3–3.7) |
All others | — | 0.8 (0.2–3.2) | 1.3 (0.3–6.1) | 0.3 (0.0–2.7) | 1.0 (0.2–4.8) |
Multiple | 0.7 (0.1–3.2) | 2.2 (0.9–5.8) | 1.0 (0.4–2.9) | 0.7 (0.1–3.3) | 2.4 (1.0–5.7) |
Year of diagnosis (ref: 2010–2015) | |||||
2000–2004 | 0.4 (0.1–1.4) | 0.9 (0.4–1.9) | 0.6 (0.2–1.4) | 1.0 (0.4–2.6) | 0.7 (0.3–1.5) |
2005–2009 | 0.5 (0.3–0.9) | 1.1 (0.7–1.7) | 0.5 (0.3–1.0) | 0.8 (0.5–1.4) | 0.9 (0.5–1.4) |
Ethnicity (ref: not Hispanic) | |||||
Hispanic | 1.4 (0.5–4.0) | 3.8 (2.0–7.2) | 2.1 (0.9–4.9) | 1.2 (0.3–4.2) | 4.6 (2.6–8.1) |
BMI (ref: <25.0 kg/m2) | |||||
25.0–29.9 | 1.7 (0.6–4.9) | 0.9 (0.5–1.5) | 0.8 (0.4–1.5) | 0.9 (0.4–1.9) | 0.9 (0.5–1.6) |
30.0+ | 2.3 (1.0–4.9) | 1.2 (0.7–2.1) | 1.0 (0.5–1.9) | 1.2 (0.6–2.5) | 1.1 (0.6–2.0) |
Menopausal status (ref: premenopausal) | |||||
Postmenopausal | 2.5 (0.9–7.3) | 0.9 (0.4–2.2) | 0.9 (0.4–2.2) | 1.4 (0.5–3.9) | 0.8 (0.3–1.8) |
AJCC stage (ref: I) | |||||
IIA | 1.1 (0.5–2.5) | 1.3 (0.7–2.2) | 2.5 (1.3–4.9) | 1.8 (0.9–3.8) | 1.5 (0.9–2.6) |
IIB/C | 5.5 (2.0–14.9) | 4.3 (2.3–8.0) | 12.1 (6.0–24.1) | 6.5 (3.2–13.3) | 7.1 (3.5–14.2) |
IIIA | 3.3 (1.2–9.2) | 3.9 (1.8–8.4) | 13.5 (6.3–29.1) | 5.7 (2.5–13.0) | 7.6 (3.3–17.5) |
Grade (ref: 1) | |||||
2 | 3.2 (1.3–8.2) | 1.1 (0.6–1.9) | 1.5 (0.7–3.2) | 1.7 (0.8–3.7) | 1.1 (0.6–2.0) |
3 | 2.2 (1.0–4.9) | 1.6 (0.9–2.7) | 2.3 (1.1–4.7) | 2.2 (1.0–4.7) | 1.8 (1.0–3.4) |
ER status (ref: ER positive) | |||||
ER negative | 0.9 (0.4–2.0) | 1.7 (1.0–2.9) | 1.6 (0.9–2.7) | 1.8 (1.0–3.2) | 1.7 (1.1–2.8) |
PR status (ref: PR positive) | |||||
PR negative | 1.0 (0.5–2.1) | 2.0 (1.3–3.1) | 1.6 (1.0–2.7) | 1.7 (1.0–3.0) | 2.0 (1.3–3.2) |
HER2 status (ref: HER2 positive) | |||||
HER2 borderline | 1.7 (0.2–16.4) | 0.7 (0.1–4.8) | 0.2 (0.0–1.8) | 1.0 (0.1–8.7) | 0.3 (0.0–1.9) |
HER2 negative | 3.5 (1.4–9.2) | 1.2 (0.7–2.1) | 0.6 (0.3–1.1) | 1.3 (0.6–2.7) | 0.8 (0.5–1.4) |
Surgery type (ref: lumpectomy) | |||||
Mastectomy | 1.3 (0.7–2.6) | 1.6 (1.1–2.6) | 1.5 (0.9–2.5) | 1.5 (0.9–2.6) | 1.8 (1.1–2.8) |
Chemotherapy (ref: no) | |||||
Yes | 1.1 (0.5–2.5) | 1.5 (0.8–2.6) | 2.3 (1.1–4.5) | 1.7 (0.8–3.4) | 1.8 (1.0–3.3) |
Radiotherapy (ref: no) | |||||
Yes | 1.1 (0.5–2.2) | 0.8 (0.5–1.3) | 0.9 (0.5–1.4) | 0.8 (0.5–1.5) | 0.8 (0.5–1.2) |
Endocrine therapy (ref: no) | |||||
Yes | 1.6 (0.7–3.3) | 0.8 (0.5–1.3) | 1.0 (0.6–1.7) | 1.0 (0.6–2.0) | 0.8 (0.5–1.2) |
Charlson score (ref: 0) | |||||
1 | 0.7 (0.3–1.5) | 1.1 (0.6–1.9) | 1.2 (0.6–2.2) | 0.9 (0.4–1.9) | 1.2 (0.7–2.1) |
2 | 0.2 (0.0–1.4) | 0.7 (0.3–1.8) | 0.3 (0.1–1.3) | 0.1 (0.0–1.4) | 0.7 (0.3–1.8) |
3+ | 1.9 (0.7–5.3) | 3.1 (1.3–7.1) | 2.9 (1.2–7.3) | 2.4 (0.8–7.0) | 3.4 (1.5–7.4) |
Abbreviations: AI/AN, American Indian/Alaska Native; AJCC, American Joint Committee on Cancer; BMI, body mass index; ER, estrogen receptor; NH/PI, Native Hawaiian/Pacific Islander; PR, progesterone receptor; SD, standard deviation.
^a Adjusted for age (categorical) and stage (categorical).
^b High Specificity and PPV Combination categorized discordant algorithm pairs as negative.
^c High Sensitivity Combination categorized discordant algorithm pairs as positive.
Discussion
We extended previous approaches (3, 6) to identify breast cancer recurrences in people with early-stage breast cancer using administrative data. We evaluated several options for identifying recurrences: abstraction, individual algorithms, combined algorithms without chart review to categorize discordant results, and combined algorithms with triangulation to categorize discordant results. We noted wide variation in the number of recurrences identified from each method and performance depending on how we categorized discordant algorithm results. In addition, Algorithms 7 and 9 used different codes to identify recurrences, which may have led to different people being classified as FP and FN. For example, Algorithm 7 included ICD codes for secondary non-breast malignant neoplasms, which may or may not have accurately represented a breast cancer recurrence and could have been coded as a FP (3). Alternatively, Algorithm 9 did not require any treatment codes, which may have missed some recurrences and potentially explains why this algorithm resulted in more FN. These differences underscore the tradeoffs made when choosing a method to ascertain recurrence and highlight the importance of understanding how algorithm choice affects recurrence risk estimations (15).
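The three ways of handling discordant algorithm results can be written as simple decision rules. The sketch below is illustrative only and is not the study's code; the function names are hypothetical. The concordant and discordant totals come from the Table 2 footnote, while the split of discordant records into Algorithm 7 only and Algorithm 9 only is derived arithmetically from the recurrence counts in Table 2 (388 and 234 positives, 117 positive on both) rather than reported directly.

```python
from typing import Dict, Optional, Tuple


def high_specificity_ppv(alg7: bool, alg9: bool) -> bool:
    """Discordant pairs count as no recurrence (both algorithms must agree)."""
    return alg7 and alg9


def high_sensitivity(alg7: bool, alg9: bool) -> bool:
    """Discordant pairs count as recurrence (either algorithm is enough)."""
    return alg7 or alg9


def triangulate(alg7: bool, alg9: bool, abstracted: Optional[bool] = None) -> bool:
    """Accept concordant results; resolve discordant ones by chart abstraction."""
    if alg7 == alg9:
        return alg7
    if abstracted is None:
        raise ValueError("discordant record: chart abstraction result required")
    return abstracted


# (Algorithm 7 result, Algorithm 9 result) -> number of records.
# The 271/117 split of discordant records is derived, not reported directly.
counts: Dict[Tuple[bool, bool], int] = {
    (True, True): 117,    # 66 + 51 concordant positives
    (True, False): 271,   # 388 Algorithm 7 positives minus 117 concordant
    (False, True): 117,   # 234 Algorithm 9 positives minus 117 concordant
    (False, False): 95,   # concordant negatives
}

n_high_spec = sum(n for (a7, a9), n in counts.items() if high_specificity_ppv(a7, a9))
n_high_sens = sum(n for (a7, a9), n in counts.items() if high_sensitivity(a7, a9))
n_discordant = sum(n for (a7, a9), n in counts.items() if a7 != a9)
print(n_high_spec, n_high_sens, n_discordant)  # 117 505 388
```

The discordant count is also the number of records that triangulation would send to chart abstraction, which is why its resource cost depends so heavily on how often the two algorithms disagree.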
Overall, our performance results for the combined algorithms showed improvements in specificity and PPV over the individual algorithms, particularly for the High Specificity and PPV Combination (98% specificity and 72% PPV). These results were well within the range of results noted in prior studies (specificity range 78%–99% and PPV range 14%–94%; ref. 11). Studies that may benefit from using the High Specificity and PPV Combination are those that have limited or no chart review resources and wish to minimize potential FP recurrences, which could alter associations between risk factors and recurrence. Reviewing all records with a positive result would help reduce FP. However, using the High Specificity and PPV Combination would miss more recurrences than either algorithm alone or the High Sensitivity Combination. Triangulation, which uses combined algorithms followed by chart abstraction to reclassify discordant algorithm results, resulted in higher sensitivity, specificity, and PPV than either algorithm individually or combined. The sensitivity and PPV for triangulation (80% and 98%, respectively) were similar to those previously published by Kroenke and colleagues [81% and 98%, respectively, in the Life After Cancer Epidemiology (LACE) cohort and 73% and 92%, respectively, in the Women's Health Initiative (WHI) cohort; ref. 6]. However, triangulation would require additional chart abstraction (65% of records in our study), which may not be feasible in a larger study.
Ideally, the true association between risk factors and recurrence in our overall study would be in the same direction as the results from chart review and combined algorithms. The fact that most of the HRs for recurrence were similar in magnitude between the two algorithm combinations suggests our study results will be robust, whichever combination of algorithms we choose. One exception was the association between BMI and recurrence, which was elevated when using the abstraction-based definition for recurrence but not the algorithm-based definitions. This difference may reflect unique characteristics of the chart abstraction sample and/or reduced precision in our estimates. Several prior studies demonstrated small increased recurrence risks associated with increasing BMI, with HRs between 1.10 and 1.30, similar to the HRs we noted for BMI with the combined algorithms (16–19). In addition, while all models indicated an association between increasing cancer stage and recurrence, the HRs differed between models, with Algorithm 9 producing the highest HRs. This may be related to FP records that were identified as a recurrence in Algorithm 9 but not in chart review; if they were diagnosed at higher stages, this might alter the association between stage and recurrence for Algorithm 9. Future studies may consider using additional statistical methods to account for potential misclassification in both predictors and outcomes (20).
Researchers should consider potential impacts of study design decisions on algorithm results. For example, specificity and PPV improved slightly after removing people diagnosed with second primary breast cancers from the analysis, suggesting some second primary cancers resulted in algorithm FP. Though this improvement was small, it may be worth censoring records at the time of a second primary breast cancer diagnosis when using a recurrence algorithm if they can be easily identified from a tumor registry. In addition, censoring follow-up in 2014 improved sensitivity while decreasing PPV because truncating follow-up reduced the total number of positives (both TP and FP) by recoding any recurrences diagnosed after 2014 from positive to negative. Thus, ICD-10 codes, which had not been included in previous validation studies for these algorithms, did not substantially alter algorithm performance in this population, but instead reduced the overall number of recurrences. One previous study that evaluated expansion from ICD-9 to ICD-10 codes noted similar results (2). Algorithm development should continue to incorporate and evaluate new codes as they are implemented and used in clinical practice.
When combining algorithms, we had to decide how to handle different recurrence dates; 51 records in our study had recurrences in both algorithms with dates >6 months apart. Excluding these records reduced sensitivity because many of them were TP recurrences despite the difference in dates. When evaluating associations between risk factors and recurrences based on combined algorithm results, we selected the earlier of the two algorithm recurrence dates for our outcome. However, one could alternatively use the later date, as this might indicate recurrence completeness, or a midpoint between the two. Understanding potential biases associated with selecting an earlier versus later recurrence date from algorithms is important work for future study. Focusing chart abstraction on records with different recurrence dates from each algorithm could be another approach to improving accuracy with limited resources. While validating recurrence date was beyond the scope of this analysis, this is worth examining in future research.
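The date-handling options described above (earlier date, later date, or midpoint, plus flagging pairs that are far apart) could be sketched as follows. This is a hypothetical illustration rather than the study's implementation: the function names are invented here, and treating "6 months" as 183 days is an assumption, since the text does not state how the interval was computed.

```python
from datetime import date

# Assumption: "6 months" approximated as 183 days.
MAX_GAP_DAYS = 183


def combined_recurrence_date(alg7_date: date, alg9_date: date,
                             rule: str = "earlier") -> date:
    """Pick one recurrence date when both algorithms flag a recurrence."""
    first, last = min(alg7_date, alg9_date), max(alg7_date, alg9_date)
    if rule == "earlier":
        return first
    if rule == "later":
        return last
    if rule == "midpoint":
        # Floor division of a timedelta by an int yields a timedelta.
        return first + (last - first) // 2
    raise ValueError(f"unknown rule: {rule!r}")


def dates_far_apart(alg7_date: date, alg9_date: date,
                    max_gap_days: int = MAX_GAP_DAYS) -> bool:
    """Flag concordant positives whose dates differ by more than the gap."""
    return abs((alg7_date - alg9_date).days) > max_gap_days


# Hypothetical pair of algorithm dates one year apart.
alg7_date = date(2012, 1, 1)
alg9_date = date(2013, 1, 1)
print(combined_recurrence_date(alg7_date, alg9_date))           # 2012-01-01
print(combined_recurrence_date(alg7_date, alg9_date, "later"))  # 2013-01-01
print(dates_far_apart(alg7_date, alg9_date))                    # True
```

Records for which `dates_far_apart` returns `True` are natural candidates for targeted chart abstraction when review resources are limited.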
A strength of our study was our ability to validate two algorithms for breast cancer recurrence and calculate performance measures using different combinations of these algorithms. Although the algorithms we validated were based on work previously done at KPWA (3) and KPNC (6), we conducted this validation in a sample of breast cancer records that were not previously included in either publication. Nonetheless, we acknowledge these algorithms may perform better in our population than in a non-KP population. We can also be reasonably confident that we had access to all diagnosis, procedure, and drug codes for each person (including codes from external claims) up to the date of disenrollment because the KPWA health plan relies on accurate coding for billing (21).
One important limitation of our study is the limited racial and ethnic diversity in our sample. The demographics, while reflective of the KP populations used for the analysis, resulted in wide CIs when analyzing associations between race or ethnicity and risk of recurrence. We were unable to make inferences about how these variables may be associated with algorithm performance. We recognize the importance of conducting future analyses using delivery system data with more proportionate racial/ethnic representation to ensure that we understand the impact of using recurrence algorithms for populations who have historically been excluded from or mistreated by research and subjected to disparities in health care delivery. In addition, we were not able to account for death as a competing event in our analyses while also accounting for sampling weights. Our performance analyses also excluded 28 cases (4%) of the medical record review based on study eligibility criteria. We recognize that a study that did not have resources for medical review would not know to exclude these cases, which may affect algorithm performance. Finally, we acknowledge that the improved accuracy we saw from combining algorithms only applies to the two algorithms we included in this study. Several breast cancer recurrence algorithms exist, and performance of different algorithm combinations will likely differ from our results.
In conclusion, our study advances previous approaches to identifying recurrences using two previously developed algorithms without additional chart abstraction, and highlights issues to consider when applying algorithms to EHR data to identify breast cancer recurrences. We suggest potential analyses to evaluate algorithm performance and describe tradeoffs in deciding which approach to use. Accurate recurrence identification via algorithms could be incredibly useful in large, observational studies to further research on risk factors and treatments that affect recurrence rates. Continued testing, refinement, and standardization of recurrence algorithms across different populations should help improve performance, increase generalizability, and further reduce the need for resource-intensive medical record abstraction to identify breast cancer recurrences (1, 11).
Authors' Disclosures
E.J. Aiello Bowles reports grants from NCI during the conduct of the study. C.H. Kroenke reports grants from NCI during the conduct of the study. J. Chubak reports grants from NIH during the conduct of the study. J. Bhimani reports grants from Memorial Sloan Kettering Cancer Center during the conduct of the study. K. O'Connell reports grants from NCI of the National Institutes of Health and Geoffrey Beene Cancer Research Center during the conduct of the study. S. Brandzel reports grants from NCI during the conduct of the study. M.K. Theis reports grants from NCI during the conduct of the study. E.V. Bandera reports grants from NIH during the conduct of the study; personal fees from Pfizer, Inc outside the submitted work. L.H. Kushi reports grants from NCI during the conduct of the study. E.D. Kantor reports grants from NCI and other support from the Geoffrey Beene Cancer Research Center during the conduct of the study. No disclosures were reported by the other authors.
Disclaimer
The funder did not play a role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the article; and the decision to submit the article for publication.
Authors' Contributions
E.J. Aiello Bowles: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. C.H. Kroenke: Conceptualization, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. J. Chubak: Conceptualization, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. J. Bhimani: Investigation, methodology, writing–review and editing. K. O'Connell: Data curation, formal analysis, validation, methodology, writing–review and editing. S. Brandzel: Supervision, investigation, project administration, writing–review and editing. E. Valice: Data curation, software, writing–review and editing. R. Doud: Data curation, software, writing–review and editing. M.K. Theis: Data curation, software, writing–review and editing. J.M. Roh: Project administration, writing–review and editing. N. Heon: Project administration, writing–review and editing. S. Persaud: Project administration, writing–review and editing. J.J. Griggs: Validation, investigation, writing–review and editing. E.V. Bandera: Conceptualization, funding acquisition, validation, investigation, methodology, project administration, writing–review and editing. L.H. Kushi: Conceptualization, funding acquisition, validation, investigation, methodology, project administration, writing–review and editing. E.D. Kantor: Conceptualization, resources, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.
Acknowledgments
This work was supported by grants from the NCI [R37CA222793 (all authors received), P30CA008748, P01CA154292, R50CA211115 (E.J. Aiello Bowles), and R01CA230440, R01CA253028 (C.H. Kroenke)] as well as the Geoffrey Beene Cancer Research Center at Memorial Sloan Kettering Cancer Center. This work was also supported by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which is funded by contract nos. N01-PC-35142, N01-PC-2010-00029, HHSN261201300012I, and N01 PC-2013-00012 from the SEER Program of the NCI with additional support from the Fred Hutchinson Cancer Research Center and the State of Washington.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).