Background:

We updated algorithms to identify breast cancer recurrences from administrative data, extending previously developed methods.

Methods:

In this validation study, we evaluated pairs of breast cancer recurrence algorithms (vs. individual algorithms) to identify recurrences. We generated algorithm combinations that categorized discordant algorithm results as no recurrence [High Specificity and PPV (positive predictive value) Combination] or recurrence (High Sensitivity Combination). We compared individual and combined algorithm results to manually abstracted recurrence outcomes from a sample of 600 people with incident stage I–IIIA breast cancer diagnosed between 2004 and 2015. We used Cox regression to evaluate risk factors associated with age- and stage-adjusted recurrence rates using different recurrence definitions, weighted by inverse sampling probabilities.

Results:

Among 600 people, we identified 117 recurrences using the High Specificity and PPV Combination, 505 using the High Sensitivity Combination, and 118 using manual abstraction. The High Specificity and PPV Combination had good specificity [98%, 95% confidence interval (CI): 97–99] and PPV (72%, 95% CI: 63–80) but modest sensitivity (64%, 95% CI: 44–80). The High Sensitivity Combination had good sensitivity (80%, 95% CI: 49–94) and specificity (83%, 95% CI: 80–86) but low PPV (29%, 95% CI: 25–34). Recurrence rates using combined algorithms were similar in magnitude for most risk factors.

Conclusions:

By combining algorithms, we identified breast cancer recurrences with greater PPV than individual algorithms, without additional review of discordant records.

Impact:

Researchers should consider tradeoffs between accuracy and manual chart abstraction resources when using previously developed algorithms. We provided guidance for future studies that use breast cancer recurrence algorithms with or without supplemental manual chart abstraction.

This article is featured in Selected Articles from This Issue, p. 345

Breast cancer recurrence is an important outcome in cancer treatment and survivorship research. Most population-based tumor registries do not capture recurrences. Recurrences are not documented in a systematic and clearly structured manner in electronic health records (EHR); thus, accurate identification usually requires resource-intensive chart abstraction (1). Several studies have developed and validated algorithms to identify breast cancer recurrences from EHR data, potentially reducing costs and time required for data collection (2–10). One meta-analysis reported a median sensitivity of 87% (range: 44%–97%) and median positive predictive value (PPV) of 73% (range: 14%–94%) across 14 studies with algorithms for breast cancer recurrence (11).

When using previously developed recurrence algorithms, researchers must select an algorithm, adapt it to their data, and address potential false positives (FP) and false negatives (FN). In addition, algorithms may need to be updated (e.g., to reflect new billing and coding practices). Validating an existing algorithm against chart abstraction requires decisions about which patients to include and extent of manual abstraction, particularly in a setting with constrained resources. Only a few studies have validated or described how they adapted a previously developed algorithm to a new population (6, 8). As use of recurrence algorithms becomes more common in epidemiologic research, following standard practices to evaluate recurrence algorithms will help increase comparability and accuracy of data across studies.

We describe our evaluation of previously developed algorithms for breast cancer recurrence (3, 6) in a population of people diagnosed with incident stage I–IIIA breast cancer. The size of our population prohibited manually collecting gold-standard chart abstraction recurrence data on all people. Thus, we sought to identify an algorithm-based approach that would yield high sensitivity and specificity for recurrence in the absence of chart abstraction. In this study, we describe algorithm selection, adaptation, and analytic considerations to provide scientific direction for future studies.

We used data from the Optimal Breast cancer Chemotherapy Dosing (OBCD) study. The OBCD study included adults ≥18 years of age identified as female diagnosed with primary incident stage I–IIIA breast cancer and enrolled at Kaiser Permanente Washington (KPWA) or Kaiser Permanente Northern California (KPNC). For this validation study, we only included participants from KPWA that had definitive breast surgery for their primary breast cancer diagnosis. We obtained Institutional Review Board approval from KPWA, KPNC, and collaborating sites [Memorial Sloan Kettering (New York, NY) and Rutgers University (New Brunswick, NJ)] with a waiver of consent to collect and analyze data in accordance with the U.S. Federal Policy for the Protection of Human Subjects (Common Rule).

Algorithm choice

We identified algorithms that had been developed and validated using data from KP or similar health systems to increase implementation feasibility and potential accuracy in the same settings (3, 6, 8). We selected a set of recurrence algorithms previously developed and published by Chubak and colleagues (3) among people with stage I–II breast cancer, and later extended by Kroenke and colleagues (6), who evaluated the performance of algorithm pairs with select medical record review to resolve discordant results (triangulation), and included patients with stage I–IIIA breast cancer. We chose one algorithm with high sensitivity (“Algorithm 7” in Kroenke and colleagues) and one algorithm with high specificity and PPV (“Algorithm 9” in Kroenke and colleagues) to identify recurrences (6).

Recurrence algorithm adaptation

We used the selected algorithms to identify breast cancer recurrences that occurred before each participant's death, health plan disenrollment, or end of follow-up (December 31, 2019)—whichever came first. As described previously, the algorithms used procedure codes [Current Procedural Terminology (CPT) and Healthcare Common Procedure Coding System (HCPCS) codes] and National Drug Codes (NDC; ref. 3). Specifically, Algorithm 7 first identifies recurrence based on codes for a secondary non-breast malignant neoplasm >180 days after primary breast cancer, then mastectomy >180 days after primary breast cancer, and then radiotherapy >365 days after primary breast cancer. Algorithm 9 first identifies recurrence based on two visits with a code for a secondary malignant neoplasm within 60 days of each other and >365 days after primary breast cancer, and then a non-breast cancer record in the Surveillance Epidemiology and End Results (SEER) registry after the primary breast cancer. Because our study included cases diagnosed after the implementation of ICD-10 codes in 2015, we adapted the Chubak and Kroenke algorithms to include ICD-10 diagnosis codes that mapped to the ICD-9 codes used in the original algorithms (available upon request). In addition, we extended the algorithm code to identify recurrence dates. When a recurrence was identified, the recurrence date was defined as the first date associated with an algorithm-identified recurrence (e.g., an ICD code), following previous methods (12). Different algorithms used different indicators; therefore, the first associated date varied by algorithm.
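The time-window logic described above for Algorithm 7 can be illustrated with a minimal Python sketch. It is not the published algorithm code: it assumes that any one qualifying indicator is sufficient to flag a recurrence, it does not reproduce the actual code lists or the order in which the indicators are checked, and the event labels and function name are hypothetical. The recurrence date is taken as the earliest qualifying event date, in line with the date definition given above.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Minimum days after the primary breast cancer for each indicator
# (thresholds as stated in the text; the event labels are hypothetical).
MIN_DAYS_AFTER_PRIMARY = {
    "secondary_malignancy": 180,
    "mastectomy": 180,
    "radiotherapy": 365,
}


def algorithm7_sketch(
    primary_date: date, events: Iterable[Tuple[str, date]]
) -> Optional[date]:
    """Return the earliest qualifying event date, or None if no event qualifies.

    Assumption: one qualifying indicator is enough to flag a recurrence.
    """
    qualifying = [
        event_date
        for kind, event_date in events
        if kind in MIN_DAYS_AFTER_PRIMARY
        and (event_date - primary_date).days > MIN_DAYS_AFTER_PRIMARY[kind]
    ]
    return min(qualifying) if qualifying else None
```

In this sketch, a mastectomy 59 days after the primary diagnosis would not qualify, whereas radiotherapy 516 days after it would.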

We used three methods to combine algorithms based on how we treated algorithm-discordant pairs between Algorithm 7 and Algorithm 9. First, we categorized all discordant pairs as no recurrence in the “High Specificity and PPV Combination.” Second, we categorized all discordant pairs as recurrence in the “High Sensitivity Combination.” Third, we used triangulation to categorize discordant pairs as recurrence if a confirmed recurrence was identified in chart abstraction and no recurrence if a recurrence was not identified in chart abstraction. Triangulation aligns with Kroenke's prior work and may be an option for studies that have additional chart abstraction resources (6).
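The three combination rules can be summarized in a short Python sketch. The function and key names are hypothetical and chosen for illustration only; the logic follows the definitions above (both algorithms positive, either algorithm positive, and chart review for discordant pairs).

```python
from typing import Dict, Optional


def combine_algorithms(
    alg7: bool, alg9: bool, chart: Optional[bool] = None
) -> Dict[str, Optional[bool]]:
    """Classify one person under each combination rule.

    alg7, alg9: recurrence flags from the two algorithms.
    chart: chart abstraction result, needed only for discordant pairs
           under triangulation (None if the chart was not reviewed).
    """
    # Discordant pairs count as no recurrence: both must be positive.
    high_spec_ppv = alg7 and alg9
    # Discordant pairs count as recurrence: either positive is enough.
    high_sens = alg7 or alg9
    # Concordant pairs keep the shared result; discordant pairs defer to the chart.
    triangulation = alg7 if alg7 == alg9 else chart
    return {
        "high_specificity_ppv": high_spec_ppv,
        "high_sensitivity": high_sens,
        "triangulation": triangulation,
    }
```

For a discordant pair without chart review, the triangulation result is left as None rather than guessed.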

Validation sample

Abstraction was undertaken among a sample of the 2,859 people with incident breast cancer diagnosed at KPWA between 2004 and 2015 with EHRs available. We abstracted the following: all discordant records in which one algorithm indicated a recurrence while the other did not (N = 404); a random sample of concordant records where both algorithms indicated no recurrence (N = 100); a random sample of concordant records where both algorithms indicated recurrence (N = 100); and all additional concordant records where both algorithms indicated a recurrence, but corresponding recurrence dates differed by ≥6 months (N = 24). We included concordant records because we did not want to assume that concordance between the two algorithms represented a true positive (TP) or true negative (TN). None of the people included in the original validation studies by Chubak and Kroenke were included in this validation sample. Out of 628 records abstracted, we excluded 28 from the final analysis. Four were ineligible for this study (upon further review they were not stage I–IIIA breast cancer or had incomplete EHRs), six had recurrence dates beyond the end of follow-up (December 31, 2019), and 18 did not have definitive surgery for the primary breast cancer diagnosis. We included 600 remaining records in this validation analysis, which were weighted back to the overall population using inverse sampling weights based on the probability of being selected for medical record abstraction after accounting for exclusions. The inverse sampling weights were based entirely on the algorithm results (e.g., all discordant or random samples of concordant records) and were not based on any population characteristics.
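As an illustration of the weighting step, the Python sketch below computes an inverse sampling weight for each sampling stratum as the number of people in the stratum divided by the number abstracted. The stratum names and counts in the usage note are hypothetical and are not the study's values, and the sketch does not reproduce the adjustment for exclusions.

```python
from typing import Dict


def inverse_sampling_weights(
    population_counts: Dict[str, int], sampled_counts: Dict[str, int]
) -> Dict[str, float]:
    """Weight for each stratum = 1 / P(selected) = population size / sample size."""
    weights = {}
    for stratum, n_sampled in sampled_counts.items():
        n_population = population_counts[stratum]
        if n_sampled <= 0 or n_sampled > n_population:
            raise ValueError(f"Invalid sample size for stratum {stratum!r}")
        weights[stratum] = n_population / n_sampled
    return weights
```

With hypothetical strata in which every discordant record is abstracted (400 of 400) and 100 of 2,000 concordant-negative records are sampled, each discordant record represents one person and each sampled concordant-negative record represents 20 people.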

Medical record abstraction and gold-standard recurrence definition

We abstracted information on breast cancer recurrences from pathology and imaging reports. We provided chart abstractors with information on the primary breast cancer diagnosis (including date) but blinded them to recurrence algorithm results. We defined recurrence as a tumor in the ipsilateral breast, lymph nodes, or any distant site >120 days after the definitive surgery date with the same histologic type as the primary breast tumor (e.g., ductal, lobular, tubular, mucinous, etc.), following methods described previously (13). Rather than relying on ICD diagnosis codes, the abstractor read through text of oncology notes, pathology reports, and imaging reports to identify recurrences. We included metastatic diagnoses found on clinical imaging with unknown histology. We abstracted information on recurrence date, histology, method of diagnosis (pathology or imaging reports), and tumor markers.

Additional data collection

We collected demographic and clinical characteristics [e.g., age, race, ethnicity, body mass index (BMI), menopausal status, Charlson comorbidity index] at the time of initial breast cancer diagnosis from EHR data. We categorized people as postmenopausal if they were 55 years or older, or younger than 55 with evidence of a bilateral oophorectomy. Charlson comorbidity index was calculated using administrative data from the 12 months prior to diagnosis (14). We collected information on initial breast cancer tumor characteristics [stage, grade, estrogen receptor (ER), progesterone receptor (PR), HER2 status], treatment (surgery type, chemotherapy, radiotherapy, endocrine therapy), and second primary breast cancers by linking to the Seattle-Puget Sound SEER tumor registry.

Statistical analyses

We described the chart-abstracted study population, stratified by abstracted recurrence status. We then described the characteristics of the overall study population using inverse sampling weights. We calculated performance characteristics for each algorithm individually and combined, compared with gold-standard chart abstraction. We calculated the number of TP, FP, FN, and TN in the chart-abstracted study population and weighted to the overall study population. For each individual algorithm and combination, we calculated weighted sensitivity, specificity, PPV, and negative predictive value (NPV) using inverse sampling probabilities.
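The weighted performance measures can be sketched in Python as follows. Each abstracted record contributes its inverse sampling weight to one cell of the 2 × 2 table, and the four measures are ratios of the weighted cells. The function name and record layout are hypothetical, and the sketch omits confidence intervals, which require survey variance estimation.

```python
import math
from typing import Dict, Iterable, Tuple


def weighted_performance(
    records: Iterable[Tuple[bool, bool, float]]
) -> Dict[str, float]:
    """records: (algorithm_positive, chart_positive, sampling_weight) per person."""
    cells = {"TP": 0.0, "FP": 0.0, "FN": 0.0, "TN": 0.0}
    for algorithm_positive, chart_positive, weight in records:
        if algorithm_positive:
            key = "TP" if chart_positive else "FP"
        else:
            key = "FN" if chart_positive else "TN"
        cells[key] += weight

    def ratio(numerator: float, denominator: float) -> float:
        # Undefined measures (empty denominator) are returned as NaN.
        return numerator / denominator if denominator > 0 else math.nan

    return {
        **cells,
        "sensitivity": ratio(cells["TP"], cells["TP"] + cells["FN"]),
        "specificity": ratio(cells["TN"], cells["TN"] + cells["FP"]),
        "PPV": ratio(cells["TP"], cells["TP"] + cells["FP"]),
        "NPV": ratio(cells["TN"], cells["TN"] + cells["FN"]),
    }
```

Because concordant-negative records carry large weights, a single FN in that stratum can lower weighted sensitivity far more than its unweighted count suggests.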

We sought to understand how algorithm performance might change based on study design decisions. Therefore, we evaluated weighted performance measures among subpopulations: people without a second primary breast cancer diagnosis because these diagnoses may be incorrectly identified as recurrences in an algorithm (remaining N = 523); people with follow-up data truncated at the end of 2014 to mimic an algorithm that uses ICD-9 codes only (N = 556); and people with recurrences identified by both algorithms excluding any with recurrence dates ≥6 months apart because these recurrences may have been incorrectly identified by one or both algorithms (remaining N = 549).

We were interested in how different recurrence algorithm approaches might influence associations between risk factors and recurrence. Therefore, we estimated associations between demographic, clinical, tumor, and treatment characteristics and recurrence. Using Cox proportional hazards models, we modeled recurrence using each definition (from chart abstraction, each algorithm alone, and each combined algorithm) in a separate model. We calculated HRs for recurrence with 95% confidence intervals (CI) using linearized variance estimators to compute SEs, adjusted for age and stage of the incident breast cancer, weighting results using inverse sampling probabilities. For these analyses, we additionally censored people at the date of a second primary breast cancer diagnosis. People with missing data for specific risk factors were excluded from analyses for that risk factor. For cases with different recurrence dates in each algorithm, we used the earliest date for analysis. All analyses were conducted using StataMP.

Data availability

Data contain potentially identifiable information (e.g., dates of diagnoses and exams) that cannot be shared openly without appropriate human subjects approval and data use agreements, but may be available upon reasonable request from the corresponding author.

Among 600 people with primary breast cancer whose charts were abstracted, 118 (20%) had an abstracted recurrence, which, when weighted to the overall population (N = 2,756), corresponded to 224 (8%) estimated recurrences (Table 1). The mean age at primary breast cancer diagnosis was 61 years among the chart abstracted sample and 62 years among the overall population and did not meaningfully vary by recurrence status. Figure 1 shows the number of recurrences identified by Algorithm 7 and Algorithm 9, stratified by chart abstraction recurrence results. Shading indicates whether the recurrences were treated as TP, FN, FP, or TN. Results for Algorithm 7 alone ignore recurrence results from Algorithm 9, and vice versa. The results for the combined algorithms and triangulation show how discordant pairs were classified as FN or TN for the High Specificity and PPV Combination, FP or TP for the High Sensitivity Combination, and TP or TN depending on the chart abstraction results for the Triangulation Combination.

Table 1.

Descriptive characteristics of people with stage I–IIIA breast cancer, stratified by gold-standard recurrence outcomes from manually abstracted medical records, unweighted, and weighted to the overall study population.

Columns (left to right): chart abstracted sample (All, N = 600; Recurrence Yes, N = 118; Recurrence No, N = 482), then overall study populationᵃ (All, N = 2,756; Recurrence Yes, N = 224; Recurrence No, N = 2,532)
Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD)
Age at primary diagnosis 61 (13) 60 (12) 61 (13) 62 (13) 59 (18) 63 (12) 
Year of primary diagnosis N (%) N (%) N (%) N (%) N (%) N (%) 
 2004–2007 180 (30) 29 (25) 151 (31) 904 (33) 45 (20) 859 (34) 
 2008–2011 218 (36) 51 (43) 167 (35) 1010 (37) 77 (34) 933 (37) 
 2012–2015 202 (34) 38 (32) 164 (34) 842 (31) 102 (45) 740 (29) 
Race 
 AI/AN 6 (1) 0 (0) 6 (1) 29 (1) 29 (1) 
 Asian 42 (7) 6 (5) 36 (8) 177 (6) 29 (13) 148 (6) 
 Black 24 (4) 5 (4) 19 (4) 93 (3) 6 (3) 87 (3) 
 NH/PI 3 (1) 1 (1) 2 (0) 4 (0) 2 (1) 2 (0) 
 White 498 (83) 104 (88) 394 (82) 2,360 (86) 184 (82) 2,176 (86) 
 Other 12 (2) 0 (0) 12 (3) 56 (2) 56 (2) 
 Multiple 15 (2) 2 (2) 13 (3) 37 (1) 2 (1) 35 (1) 
Ethnicity 
 Not Hispanic 563 (94) 112 (95) 451 (94) 2,671 (97) 216 (97) 2,455 (97) 
 Hispanic 37 (6) 6 (5) 31 (6) 85 (3) 7 (3) 77 (3) 
BMI (kg/m2) at diagnosis 
 <25.0 155 (26) 23 (20) 132 (27) 656 (24) 38 (17) 618 (24) 
 25.0–29.9 168 (28) 37 (31) 131 (27) 822 (30) 74 (33) 747 (30) 
 30.0+ 181 (30) 42 (36) 139 (29) 737 (27) 89 (40) 648 (26) 
 Missing 96 (16) 16 (14) 80 (17) 541 (20) 23 (10) 519 (21) 
Menopausal status at diagnosis 
 Premenopausal 182 (30) 31 (26) 151 (31) 746 (27) 65 (29) 681 (27) 
 Postmenopausal 418 (70) 87 (74) 331 (69) 2,010 (73) 159 (71) 1,852 (73) 
Stage 
 I 226 (38) 37 (31) 189 (39) 1,562 (57) 83 (37) 1,479 (58) 
 IIA 145 (24) 31 (26) 114 (24) 731 (27) 48 (21) 684 (27) 
 IIB/C 132 (22) 29 (25) 103 (21) 286 (10) 62 (28) 224 (9) 
 IIIA 97 (16) 21 (18) 76 (16) 177 (6) 31 (14) 146 (6) 
Grade 
 1 128 (21) 17 (14) 111 (23) 705 (26) 24 (11) 682 (27) 
 2 228 (38) 47 (40) 181 (38) 1,096 (40) 121 (54) 976 (39) 
 3 264 (44) 52 (44) 182 (38) 879 (32) 77 (35) 802 (32) 
 Missing 10 (2) 2 (2) 8 (2) 75 (3) 2 (1) 73 (3) 
ER status 
 Positive 487 (81) 93 (79) 394 (82) 2,274 (83) 186 (83) 2,089 (83) 
 Negative 106 (18) 21 (18) 85 (18) 473 (17) 32 (15) 441 (17) 
 Missing 7 (1) 4 (3) 3 (1) 9 (0) 5 (2) 3 (0) 
PR status 
 Positive 453 (76) 88 (75) 365 (76) 2,170 (79) 177 (79) 1,993 (79) 
 Negative 134 (22) 25 (21) 109 (23) 547 (20) 40 (18) 507 (20) 
 Missing 13 (2) 5 (4) 8 (2) 39 (1) 7 (3) 32 (1) 
HER2 status 
 Positive 99 (17) 10 (9) 89 (19) 457 (17) 14 (6) 443 (18) 
 Borderline 13 (2) 3 (3) 10 (2) 60 (2) 5 (2) 55 (2) 
 Negative 476 (79) 100 (85) 376 (78) 2,182 (79) 199 (89) 1,983 (78) 
 Missing 12 (2) 5 (4) 7 (2) 57 (2) 5 (2) 52 (2) 
Surgery type 
 Lumpectomy 287 (48) 52 (44) 235 (49) 1,635 (59) 102 (46) 1,533 (61) 
 Mastectomy 313 (52) 66 (56) 247 (51) 1,121 (41) 122 (54) 1,000 (40) 
Chemotherapy 
 No 297 (50) 55 (47) 242 (50) 1,926 (70) 110 (49) 1,816 (72) 
 Yes 303 (51) 63 (53) 240 (50) 831 (30) 114 (51) 717 (28) 
Radiotherapy 
 No 234 (39) 52 (44) 182 (38) 1,204 (44) 81 (36) 1,123 (44) 
 Yes 366 (61) 66 (56) 300 (62) 1,552 (56) 142 (64) 1,410 (56) 
Endocrine therapy 
 No 186 (31) 31 (26) 155 (32) 1,142 (42) 45 (20) 1,097 (43) 
 Yes 414 (69) 87 (73) 327 (68) 1,614 (59) 178 (80) 1,435 (57) 
Charlson comorbidity index in year before diagnosis 
 0 464 (77) 97 (82) 367 (76) 2,103 (76) 193 (86) 1,910 (75) 
 1 71 (12) 12 (10) 59 (12) 340 (12) 17 (8) 323 (13) 
 2 31 (5) 2 (2) 29 (6) 207 (8) 3 (2) 203 (8) 
 3+ 34 (6) 7 (6) 27 (6) 106 (4) 10 (4) 97 (4) 
Second breast cancer 
 No 523 (87) 106 (90) 417 (87) 2647 (96) 209 (93) 2,438 (96) 
 Yes 77 (13) 12 (10) 65 (14) 109 (4) 15 (7) 94 (4) 

Abbreviations: AI/AN, American Indian/Alaska Native; AJCC, American Joint Committee on Cancer; BMI, body mass index; ER, estrogen receptor; NH/PI, Native Hawaiian/Pacific Islander; PR, progesterone receptor; SD, standard deviation.

ᵃNumbers and %s for the overall study population were calculated using inverse probability sampling weights. Weighted Ns may not add to total due to rounding.

Figure 1.

Methods for defining recurrence versus no recurrence based on algorithms and chart abstraction. This figure shows how each individual algorithm and each combination of algorithms for breast cancer recurrence was used to define true positives, false positives, true negatives, and false negatives compared with chart abstraction. We reviewed medical records for 600 people to obtain our gold-standard recurrence outcome. This included 388 discordant records between algorithms, 95 records where the algorithms agreed there was no recurrence, 66 records where the algorithms agreed there was a recurrence, and 51 additional records where the algorithms agreed there was a recurrence but the recurrence dates from each algorithm were >6 months apart. When combining algorithms, we categorized recurrence-discordant pairs as negatives in one combination and positives in another. In the High Specificity and PPV Combination, if both algorithms identified a recurrence, then this was defined as a recurrence for this combination. If one or both algorithms did not identify a recurrence, then this was defined as no recurrence for this combination. In the High Sensitivity Combination, if one or both algorithms identified a recurrence, then this was defined as a recurrence for this combination. If neither algorithm identified a recurrence, then this was defined as no recurrence for this combination. For the Triangulation Combination, we used chart abstraction results to classify discordant algorithm results. If the chart abstraction identified a recurrence, then the discordant result was defined as a recurrence; if the chart abstraction did not identify a recurrence, then the discordant result was defined as no recurrence.


Among chart abstracted records, we identified 388 (66% of 600 records) recurrences using Algorithm 7 alone and 234 (39%) recurrences using Algorithm 9 alone (Table 2). The High Specificity and PPV Combination identified 117 (20%) recurrences and had the highest weighted specificity (98%, 95% CI: 97–99) and PPV (72%, 95% CI: 63–80), with a sensitivity of 64% (95% CI: 44–80). The High Sensitivity Combination identified 505 (84%) recurrences and had the highest weighted sensitivity (80%, 95% CI: 49–94), with a specificity of 83% (95% CI: 80–86) and PPV of 29% (95% CI: 25–34). Algorithms had similar NPVs whether used individually or in combination.

Table 2.

Comparison of weighted algorithm performance for each algorithm alone and combined compared with gold-standard chart abstraction, and among subpopulations.

Columns (left to right): Algorithm 7, Algorithm 9, High Specificity and PPV Combinationᵃ, High Sensitivity Combinationᵇ
N recurrences (%) 388 (66%) 234 (39%) 117 (20%) 505 (84%) 
Overall (N=600)ᶜ % (95% CI) % (95% CI) % (95% CI) % (95% CI) 
 Sensitivity 78% (49–93) 65% (45–82) 64% (44–80) 80% (49–94) 
 Specificity 88% (85–90) 93% (92–95) 98% (97–99) 83% (80–86) 
 PPV 36% (31–74) 46% (39–53) 72% (63–80) 29% (25–34) 
 NPV 98% (93–99) 97% (93–99) 97% (93–99) 98% (92–100) 
No second breast cancer (N=523) 
 Sensitivity 77% (47–93) 66% (43–83) 65% (43–82) 78% (47–85) 
 Specificity 90% (88–92) 94% (92–95) 98% (97–99) 85% (82–88) 
 PPV 40% (34–46) 47% (40–55) 75% (66–83) 31% (27–37) 
 NPV 98% (92–99) 97% (93–99) 97% (93–99) 80% (92–100) 
Censored in 2014 (N=556) 
 Sensitivity 96% (89–98) 79% (68–87) 78% (67–86) 100% (96–100) 
 Specificity 91% (89–93) 96% (95–97) 98% (97–99) 89% (86–91) 
 PPV 29% (23–36) 43% (34–53) 58% (46–69) 25% (20–31) 
 NPV 100% (100–100) 99% (99–100) 99% (99–99) 100% (100–100) 
Recurrence algorithm dates ≤6 months apart (N=549) 
 Sensitivity 76% (44–92) 59% (38–78) 58% (37–76) 76% (44–93) 
 Specificity 88% (86–90) 94% (92–95) 99% (98–99) 84% (80–87) 
 PPV 32% (27–38) 42% (34–50) 74% (62–83) 26% (21–31) 
 NPV 98% (93–99) 97% (93–99) 97% (93–99) 98% (92–100) 

Abbreviations: NPV, negative predictive value; PPV, positive predictive value.

ᵃHigh Specificity and PPV Combination categorized discordant algorithm pairs as negative.

ᵇHigh Sensitivity Combination categorized discordant algorithm pairs as positive.

ᶜThe overall population includes the entire study sample with 388 discordant records between algorithms, 95 records where the algorithms agreed there was no recurrence, 66 records where the algorithms agreed there was a recurrence, and 51 additional records where the algorithms agreed there was a recurrence but the recurrence dates from each algorithm were >6 months apart.

When we excluded people diagnosed with a second primary breast cancer, none of the weighted performance measures changed substantially (Table 2). When we censored follow-up at the end of 2014 (before ICD-10 codes were implemented), the sensitivity increased to 100% (95% CI: 96–100) for the High Sensitivity Combination, and to 78% (95% CI: 67–86) for the High Specificity and PPV Combination. Specificity also increased slightly while the PPV decreased compared with overall results. Sensitivity decreased slightly and specificity increased across all algorithms when we excluded people with recurrence algorithm dates ≥6 months apart.

There were very few FNs from Algorithm 7 alone (N = 5, 1%) and the High Sensitivity Combination (N = 2, 0%; Supplementary Table S1). In weighted analyses, we estimated 48 FNs from Algorithm 7 alone (2%), and 45 FNs from the High Sensitivity Combination (2%) among the overall population. The High Specificity and PPV Combination had the fewest FPs (N = 34, 6%) compared with other algorithms (Supplementary Table S1). In weighted analyses, this equated to an estimated 55 FPs among the overall population. Using triangulation, we identified 150 recurrences (25% of abstracted cases; Supplementary Table S2). Triangulation resulted in a higher sensitivity (80%, 95% CI: 49–94), specificity (98%, 95% CI: 97–99), and PPV (77%, 95% CI: 68–83) than either algorithm individually or combined.

Abstracted recurrence rates declined as age increased, resulting in an HR as low as 0.3 (95% CI: 0.1–1.0) for people 80 years or older versus people ages 50–59 (Table 3). We noted similar inverse associations between age and recurrence rates based on individual and combined algorithms. Higher BMI was associated with increased abstracted recurrence rates (HR = 2.3, 95% CI: 1.0–4.9 for BMI ≥30 vs. BMI <25). When using algorithms, HRs for the associations between BMI ≥30 and recurrence ranged from 1.0 to 1.2 and none reached statistical significance. People with stage IIB/C or IIIA primary breast cancer had a higher recurrence rate than people with stage I breast cancer based on abstracted results (HR = 5.5, 95% CI: 2.0–14.9 for stage IIB/C, and HR = 3.3, 95% CI: 1.2–9.2 for stage IIIA). These HRs were also elevated when using individual algorithms to define recurrence but varied widely, ranging from 4.3 to 12.1 for stage IIB/C and from 3.9 to 13.5 for stage IIIA. There was less variation in these results among the combined algorithms (HR range 6.5–7.1 for stage IIB/C and 5.7–7.6 for stage IIIA).

Table 3.

Weighted, adjustedᵃ HRs for breast cancer recurrence risk associated with demographic, tumor, and treatment characteristics of the primary cancer using chart abstraction, each algorithm alone, and algorithm combinations to define recurrence.

| Characteristic | Chart abstraction, HR (95% CI) | Algorithm 7 alone, HR (95% CI) | Algorithm 9 alone, HR (95% CI) | High Specificity and PPV Combinationᵇ, HR (95% CI) | High Sensitivity Combinationᶜ, HR (95% CI) |
|---|---|---|---|---|---|
| Age at diagnosis (ref: 50–59) | | | | | |
| <50 | 0.6 (0.2–1.9) | 0.8 (0.4–1.5) | 0.6 (0.3–1.4) | 0.8 (0.4–1.6) | 0.7 (0.3–1.4) |
| 60–69 | 0.4 (0.2–1.0) | 0.8 (0.5–1.5) | 0.7 (0.3–1.4) | 0.8 (0.4–1.6) | 0.7 (0.4–1.4) |
| 70–79 | 0.3 (0.1–0.8) | 0.6 (0.3–1.1) | 0.5 (0.3–1.1) | 0.6 (0.2–1.2) | 0.5 (0.3–1.0) |
| 80+ | 0.3 (0.1–1.0) | 0.5 (0.2–1.3) | 0.4 (0.1–1.3) | 0.5 (0.2–1.5) | 0.4 (0.2–1.1) |
| Race (ref: White) | | | | | |
| American Indian/Alaska Native | — | 0.7 (0.1–4.8) | 3.7 (0.7–18.5) | 1.3 (0.1–13.0) | 1.5 (0.3–7.5) |
| Asian | 1.9 (0.4–10.0) | 1.0 (0.4–2.3) | 1.5 (0.7–3.5) | 0.9 (0.3–2.6) | 1.3 (0.6–2.8) |
| Black | 0.6 (0.2–2.6) | 1.4 (0.5–4.0) | 1.1 (0.3–4.0) | 1.7 (0.5–5.6) | 1.1 (0.3–3.8) |
| NH/PI | 1.2 (0.4–4.0) | 1.3 (0.5–3.3) | 1.5 (0.5–4.2) | 1.8 (0.8–4.2) | 1.1 (0.3–3.7) |
| All others | — | 0.8 (0.2–3.2) | 1.3 (0.3–6.1) | 0.3 (0.0–2.7) | 1.0 (0.2–4.8) |
| Multiple | 0.7 (0.1–3.2) | 2.2 (0.9–5.8) | 1.0 (0.4–2.9) | 0.7 (0.1–3.3) | 2.4 (1.0–5.7) |
| Year of diagnosis (ref: 2010–2015) | | | | | |
| 2000–2004 | 0.4 (0.1–1.4) | 0.9 (0.4–1.9) | 0.6 (0.2–1.4) | 1.0 (0.4–2.6) | 0.7 (0.3–1.5) |
| 2005–2009 | 0.5 (0.3–0.9) | 1.1 (0.7–1.7) | 0.5 (0.3–1.0) | 0.8 (0.5–1.4) | 0.9 (0.5–1.4) |
| Ethnicity (ref: not Hispanic) | | | | | |
| Hispanic | 1.4 (0.5–4.0) | 3.8 (2.0–7.2) | 2.1 (0.9–4.9) | 1.2 (0.3–4.2) | 4.6 (2.6–8.1) |
| BMI (ref: <25.0 kg/m²) | | | | | |
| 25.0–29.9 | 1.7 (0.6–4.9) | 0.9 (0.5–1.5) | 0.8 (0.4–1.5) | 0.9 (0.4–1.9) | 0.9 (0.5–1.6) |
| 30.0+ | 2.3 (1.0–4.9) | 1.2 (0.7–2.1) | 1.0 (0.5–1.9) | 1.2 (0.6–2.5) | 1.1 (0.6–2.0) |
| Menopausal status (ref: premenopausal) | | | | | |
| Postmenopausal | 2.5 (0.9–7.3) | 0.9 (0.4–2.2) | 0.9 (0.4–2.2) | 1.4 (0.5–3.9) | 0.8 (0.3–1.8) |
| AJCC stage (ref: I) | | | | | |
| IIA | 1.1 (0.5–2.5) | 1.3 (0.7–2.2) | 2.5 (1.3–4.9) | 1.8 (0.9–3.8) | 1.5 (0.9–2.6) |
| IIB/C | 5.5 (2.0–14.9) | 4.3 (2.3–8.0) | 12.1 (6.0–24.1) | 6.5 (3.2–13.3) | 7.1 (3.5–14.2) |
| IIIA | 3.3 (1.2–9.2) | 3.9 (1.8–8.4) | 13.5 (6.3–29.1) | 5.7 (2.5–13.0) | 7.6 (3.3–17.5) |
| Grade (ref: 1) | | | | | |
| 2 | 3.2 (1.3–8.2) | 1.1 (0.6–1.9) | 1.5 (0.7–3.2) | 1.7 (0.8–3.7) | 1.1 (0.6–2.0) |
| 3 | 2.2 (1.0–4.9) | 1.6 (0.9–2.7) | 2.3 (1.1–4.7) | 2.2 (1.0–4.7) | 1.8 (1.0–3.4) |
| ER status (ref: ER positive) | | | | | |
| ER negative | 0.9 (0.4–2.0) | 1.7 (1.0–2.9) | 1.6 (0.9–2.7) | 1.8 (1.0–3.2) | 1.7 (1.1–2.8) |
| PR status (ref: PR positive) | | | | | |
| PR negative | 1.0 (0.5–2.1) | 2.0 (1.3–3.1) | 1.6 (1.0–2.7) | 1.7 (1.0–3.0) | 2.0 (1.3–3.2) |
| HER2 status (ref: HER2 positive) | | | | | |
| HER2 borderline | 1.7 (0.2–16.4) | 0.7 (0.1–4.8) | 0.2 (0.0–1.8) | 1.0 (0.1–8.7) | 0.3 (0.0–1.9) |
| HER2 negative | 3.5 (1.4–9.2) | 1.2 (0.7–2.1) | 0.6 (0.3–1.1) | 1.3 (0.6–2.7) | 0.8 (0.5–1.4) |
| Surgery type (ref: lumpectomy) | | | | | |
| Mastectomy | 1.3 (0.7–2.6) | 1.6 (1.1–2.6) | 1.5 (0.9–2.5) | 1.5 (0.9–2.6) | 1.8 (1.1–2.8) |
| Chemotherapy (ref: no) | | | | | |
| Yes | 1.1 (0.5–2.5) | 1.5 (0.8–2.6) | 2.3 (1.1–4.5) | 1.7 (0.8–3.4) | 1.8 (1.0–3.3) |
| Radiotherapy (ref: no) | | | | | |
| Yes | 1.1 (0.5–2.2) | 0.8 (0.5–1.3) | 0.9 (0.5–1.4) | 0.8 (0.5–1.5) | 0.8 (0.5–1.2) |
| Endocrine therapy (ref: no) | | | | | |
| Yes | 1.6 (0.7–3.3) | 0.8 (0.5–1.3) | 1.0 (0.6–1.7) | 1.0 (0.6–2.0) | 0.8 (0.5–1.2) |
| Charlson score (ref: 0) | | | | | |
| 1 | 0.7 (0.3–1.5) | 1.1 (0.6–1.9) | 1.2 (0.6–2.2) | 0.9 (0.4–1.9) | 1.2 (0.7–2.1) |
| 2 | 0.2 (0.0–1.4) | 0.7 (0.3–1.8) | 0.3 (0.1–1.3) | 0.1 (0.0–1.4) | 0.7 (0.3–1.8) |
| 3+ | 1.9 (0.7–5.3) | 3.1 (1.3–7.1) | 2.9 (1.2–7.3) | 2.4 (0.8–7.0) | 3.4 (1.5–7.4) |

Abbreviations: AI/AN, American Indian/Alaska Native; AJCC, American Joint Committee on Cancer; BMI, body mass index; ER, estrogen receptor; NH/PI, Native Hawaiian/Pacific Islander; PR, progesterone receptor; SD, standard deviation.

ᵃ Adjusted for age (categorical) and stage (categorical).

ᵇ High Specificity and PPV Combination categorized discordant algorithm pairs as negative.

ᶜ High Sensitivity Combination categorized discordant algorithm pairs as positive.

We extended previous approaches (3, 6) to identify breast cancer recurrences in people with early-stage breast cancer using administrative data. We evaluated several options for identifying recurrences: abstraction, individual algorithms, combined algorithms without chart review to categorize discordant results, and combined algorithms with triangulation to categorize discordant results. We noted wide variation in the number of recurrences identified from each method and in performance, depending on how we categorized discordant algorithm results. In addition, Algorithms 7 and 9 used different codes to identify recurrences, which may have led to different people being classified as FPs and FNs. For example, Algorithm 7 included ICD codes for secondary non-breast malignant neoplasms, which may or may not have accurately represented a breast cancer recurrence and could have been coded as an FP (3). In contrast, Algorithm 9 did not require any treatment codes, which may have missed some recurrences and potentially explains why this algorithm resulted in more FNs. These differences underscore the tradeoffs made when choosing a method to ascertain recurrence and highlight the importance of understanding how algorithm choice affects recurrence risk estimations (15).

Overall, our performance results for the combined algorithms showed improvements in specificity and PPV over the individual algorithms, particularly for the High Specificity and PPV Combination (98% specificity and 72% PPV). These results were well within the range of results noted in prior studies (specificity range 78%–99% and PPV range 14%–94%; ref. 11). Studies that may benefit from using the High Specificity and PPV Combination are those that have limited or no chart review resources and wish to minimize potential FP recurrences, which could alter any associations between risk factors and recurrence. Reviewing all records with a positive result would help reduce FPs. However, using the High Specificity and PPV Combination would miss more recurrences than either algorithm alone or the High Sensitivity Combination. Triangulation, which uses combined algorithms followed by chart abstraction to reclassify discordant algorithm results, resulted in higher sensitivity, specificity, and PPV than either algorithm individually or combined. The sensitivity and PPV for triangulation (80% and 98%, respectively) were similar to those previously published by Kroenke and colleagues [81% and 98%, respectively, in the Life After Cancer Epidemiology (LACE) cohort and 73% and 92%, respectively, in the Women's Health Initiative (WHI) cohort; ref. 6]. However, triangulation would require additional chart abstraction (15% of records in our study), which may not be feasible in a larger study.
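The triangulation step described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the procedure used in the study: `triangulate`, the person identifiers, and the `review` callback standing in for manual chart abstraction are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def triangulate(
    flags: Dict[str, Tuple[bool, bool]],
    review: Callable[[str], bool],
) -> Tuple[Dict[str, bool], List[str]]:
    """Accept concordant algorithm results; send discordant ones to chart review.

    flags maps a person identifier to (Algorithm 7 result, Algorithm 9 result).
    review is a hypothetical stand-in for chart abstraction of one record.
    Returns the final classification and the identifiers that needed review.
    """
    final: Dict[str, bool] = {}
    reviewed: List[str] = []
    for person_id, (alg7, alg9) in flags.items():
        if alg7 == alg9:
            # Concordant pair: keep the shared algorithm result.
            final[person_id] = alg7
        else:
            # Discordant pair: reclassify using the abstracted outcome.
            final[person_id] = review(person_id)
            reviewed.append(person_id)
    return final, reviewed


# Toy example (not study data).
toy_flags = {
    "p1": (True, True),
    "p2": (True, False),
    "p3": (False, False),
    "p4": (False, True),
}
toy_chart = {"p2": True, "p4": False}

final_status, needed_review = triangulate(toy_flags, lambda pid: toy_chart[pid])
review_fraction = len(needed_review) / len(toy_flags)
```

The `review_fraction` value corresponds to the share of records requiring additional abstraction, which is the resource cost weighed against the accuracy gain.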

Ideally, the true association between risk factors and recurrence in our overall study would be in the same direction as the results from chart review and combined algorithms. The fact that most of the HRs for recurrence were similar in magnitude between the two algorithm combinations suggests our study results will be robust, whichever combination of algorithms we choose. One exception was the association between BMI and recurrence, which was elevated when using the abstraction-based definition for recurrence but not the algorithm-based definitions. This difference may reflect unique characteristics of the chart abstraction sample and/or reduced precision in our estimates. Several prior studies demonstrated small increased recurrence risks associated with increasing BMI, with HRs between 1.10 and 1.30, similar to the HRs we noted for BMI with the combined algorithms (16–19). In addition, while all models indicated an association between increasing cancer stage and recurrence, the HRs differed between models, with Algorithm 9 producing the highest HRs. This may be related to FP records that were identified as a recurrence in Algorithm 9 but not in chart review; if they were diagnosed at higher stages, this might alter the association between stage and recurrence for Algorithm 9. Future studies may consider using additional statistical methods to account for potential misclassification in both predictors and outcomes (20).

Researchers should consider potential impacts of study design decisions on algorithm results. For example, specificity and PPV improved slightly after removing people diagnosed with second primary breast cancers from the analysis, suggesting some second primary cancers resulted in algorithm FP. Though this improvement was small, it may be worth censoring records at the time of a second primary breast cancer diagnosis when using a recurrence algorithm if they can be easily identified from a tumor registry. In addition, censoring follow-up in 2014 improved sensitivity while decreasing PPV because truncating follow-up reduced the total number of positives (both TP and FP) by recoding any recurrences diagnosed after 2014 from positive to negative. Thus, ICD-10 codes, which had not been included in previous validation studies for these algorithms, did not substantially alter algorithm performance in this population, but instead reduced the overall number of recurrences. One previous study that evaluated expansion from ICD-9 to ICD-10 codes noted similar results (2). Algorithm development should continue to incorporate and evaluate new codes as they are implemented and used in clinical practice.
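The effect of truncating follow-up can be illustrated with a short sketch. It assumes, hypothetically, that each record stores a recurrence date (or none) and that the cutoff is December 31, 2014; the function name and toy dates are illustrative only.

```python
from datetime import date
from typing import List, Optional

# Assumed cutoff: end of 2014, before ICD-10 codes were implemented.
CUTOFF = date(2014, 12, 31)


def status_at_cutoff(recurrence_date: Optional[date],
                     cutoff: date = CUTOFF) -> bool:
    """Return True only if a recurrence was identified on or before the cutoff.

    Recurrences dated after the cutoff are recoded from positive to negative.
    """
    return recurrence_date is not None and recurrence_date <= cutoff


# Toy recurrence dates (not study data); None means no recurrence identified.
toy_dates: List[Optional[date]] = [
    date(2013, 5, 1),
    date(2015, 3, 1),
    None,
    date(2014, 12, 31),
]

positives_before = sum(d is not None for d in toy_dates)
positives_after = sum(status_at_cutoff(d) for d in toy_dates)
```

Applying the same cutoff to both the algorithm-based and abstracted dates lowers the count of positives in each, which is the mechanism described above.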

When combining algorithms, we had to decide how to handle different recurrence dates; 51 records in our study had recurrences in both algorithms with dates ≥6 months apart. Excluding these records reduced sensitivity because many of them were TP recurrences despite the difference in dates. When evaluating associations between risk factors and recurrences based on combined algorithm results, we selected the earlier of the two algorithm recurrence dates for our outcome. However, one could alternatively use the later date, as this might indicate recurrence completeness, or a midpoint between the two. Understanding potential biases associated with selecting an earlier versus later recurrence date from algorithms is important work for future study. Focusing chart abstraction on records with different recurrence dates from each algorithm could be another approach to improving accuracy with limited resources. While validating recurrence date was beyond the scope of this analysis, this is worth examining in future research.
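The date-handling choices discussed above can be sketched as follows. The helper names are hypothetical, and the 183-day threshold is an assumed operationalization of "6 months" that the text does not specify.

```python
from datetime import date
from typing import Optional

# Assumption: "6 months" approximated as 183 days.
SIX_MONTHS_DAYS = 183


def combined_recurrence_date(date7: Optional[date],
                             date9: Optional[date]) -> Optional[date]:
    """Return the earlier of the two algorithm recurrence dates.

    If only one algorithm produced a date, return that date; if neither did,
    return None.
    """
    dates = [d for d in (date7, date9) if d is not None]
    return min(dates) if dates else None


def dates_far_apart(date7: Optional[date], date9: Optional[date],
                    threshold_days: int = SIX_MONTHS_DAYS) -> bool:
    """Flag records whose two algorithm dates differ by at least the threshold."""
    if date7 is None or date9 is None:
        return False
    return abs((date7 - date9).days) >= threshold_days


# Toy dates (not study data).
earlier = combined_recurrence_date(date(2012, 1, 15), date(2012, 9, 1))
flag_wide = dates_far_apart(date(2012, 1, 15), date(2012, 9, 1))
flag_close = dates_far_apart(date(2012, 1, 15), date(2012, 3, 1))
```

Swapping `min` for `max` would implement the later-date alternative, and records flagged by `dates_far_apart` are candidates for the focused chart abstraction suggested above.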

A strength of our study was our ability to validate two algorithms for breast cancer recurrence and calculate performance measures using different combinations of these algorithms. Although the algorithms we validated were based on work previously done at KPWA (3) and KPNC (6), we conducted this validation in a sample of breast cancer records that were not previously included in either publication. Nonetheless, we acknowledge these algorithms may perform better in our population than in a non-KP population. We can also be reasonably confident that we had access to all diagnosis, procedure, and drug codes for each person (including codes from external claims) up to the date of disenrollment because the KPWA health plan relies on accurate coding for billing (21).

One important limitation of our study is the limited racial and ethnic diversity in our sample. The demographics, while reflective of the KP populations used for the analysis, resulted in wide CIs when analyzing associations between race or ethnicity and risk of recurrence. We were unable to make inferences about how these variables may be associated with algorithm performance. We recognize the importance of conducting future analyses using delivery system data with more proportionate racial/ethnic representation to ensure that we understand the impact of using recurrence algorithms for populations who have historically been excluded from or mistreated by research and subjected to disparities in health care delivery. In addition, we were not able to account for death as a competing event in our analyses while also accounting for sampling weights. Our performance analyses also excluded 28 cases (4%) of the medical record review based on study eligibility criteria. We recognize that a study that did not have resources for medical review would not know to exclude these cases, which may affect algorithm performance. Finally, we acknowledge that the improved accuracy we saw from combining algorithms only applies to the two algorithms we included in this study. Several breast cancer recurrence algorithms exist, and performance of different algorithm combinations will likely differ from our results.

In conclusion, our study advances previous approaches to identifying recurrences using two previously developed algorithms without additional chart abstraction, and highlights issues to consider when applying algorithms to EHR data to identify breast cancer recurrences. We suggest potential analyses to evaluate algorithm performance and describe tradeoffs in deciding which approach to use. Accurate recurrence identification via algorithms could be incredibly useful in large, observational studies to further research on risk factors and treatments that affect recurrence rates. Continued testing, refinement, and standardization of recurrence algorithms across different populations should help improve performance, increase generalizability, and further reduce the need for resource-intensive medical record abstraction to identify breast cancer recurrences (1, 11).

E.J. Aiello Bowles reports grants from NCI during the conduct of the study. C.H. Kroenke reports grants from NCI during the conduct of the study. J. Chubak reports grants from NIH during the conduct of the study. J. Bhimani reports grants from Memorial Sloan Kettering Cancer Center during the conduct of the study. K. O'Connell reports grants from NCI of the National Institutes of Health and Geoffrey Beene Cancer Research Center during the conduct of the study. S. Brandzel reports grants from NCI during the conduct of the study. M.K. Theis reports grants from NCI during the conduct of the study. E.V. Bandera reports grants from NIH during the conduct of the study; personal fees from Pfizer, Inc outside the submitted work. L.H. Kushi reports grants from NCI during the conduct of the study. E.D. Kantor reports grants from NCI and other support from the Geoffrey Beene Cancer Research Center during the conduct of the study. No disclosures were reported by the other authors.

The funder did not play a role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the article; and the decision to submit the article for publication.

E.J. Aiello Bowles: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. C.H. Kroenke: Conceptualization, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. J. Chubak: Conceptualization, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. J. Bhimani: Investigation, methodology, writing–review and editing. K. O'Connell: Data curation, formal analysis, validation, methodology, writing–review and editing. S. Brandzel: Supervision, investigation, project administration, writing–review and editing. E. Valice: Data curation, software, writing–review and editing. R. Doud: Data curation, software, writing–review and editing. M.K. Theis: Data curation, software, writing–review and editing. J.M. Roh: Project administration, writing–review and editing. N. Heon: Project administration, writing–review and editing. S. Persaud: Project administration, writing–review and editing. J.J. Griggs: Validation, investigation, writing–review and editing. E.V. Bandera: Conceptualization, funding acquisition, validation, investigation, methodology, project administration, writing–review and editing. L.H. Kushi: Conceptualization, funding acquisition, validation, investigation, methodology, project administration, writing–review and editing. E.D. Kantor: Conceptualization, resources, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.

This work was supported by grants from the NCI [R37CA222793 (all authors received), P30CA008748, P01CA154292, R50CA211115 (E.J. Aiello Bowles), and R01CA230440, R01CA253028 (C.H. Kroenke)] as well as the Geoffrey Beene Cancer Research Center at Memorial Sloan Kettering Cancer Research. This work was also supported by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which is funded by contract nos. N01-PC-35142, N01-PC-2010-00029, HHSN261201300012I, and N01 PC-2013-00012 from the SEER Program of the NCI with additional support from the Fred Hutchinson Cancer Research Center and the State of Washington.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

1. Warren JL, Yabroff KR. Challenges and opportunities in measuring cancer recurrence in the United States. J Natl Cancer Inst 2015;107:djv134.
2. Carroll NM, Ritzwoller DP, Banegas MP, O'Keeffe-Rosetti M, Cronin AM, Uno H, et al. Performance of cancer recurrence algorithms after coding scheme switch from international classification of diseases 9th revision to international classification of diseases 10th revision. JCO Clin Cancer Inform 2019;3:1–9.
3. Chubak J, Yu O, Pocobelli G, Lamerato L, Webster J, Prout MN, et al. Administrative data algorithms to identify second breast cancer events following early-stage invasive breast cancer. J Natl Cancer Inst 2012;104:931–40.
4. Cronin-Fenton D, Kjærsgaard A, Nørgaard M, Amelio J, Liede A, Hernandez RK, et al. Breast cancer recurrence, bone metastases, and visceral metastases in women with stage II and III breast cancer in Denmark. Breast Cancer Res Treat 2018;167:517–28.
5. Hassett MJ, Ritzwoller DP, Taback N, Carroll N, Cronin AM, Ting GV, et al. Validating billing/encounter codes as indicators of lung, colorectal, breast, and prostate cancer recurrence using 2 large contemporary cohorts. Med Care 2014;52:e65–73.
6. Kroenke CH, Chubak J, Johnson L, Castillo A, Weltzien E, Caan BJ. Enhancing breast cancer recurrence algorithms through selective use of medical record data. J Natl Cancer Inst 2016;108:djv336.
7. Pedersen RN, Öztürk B, Mellemkjær L, Friis S, Tramm T, Nørgaard M, et al. Validation of an algorithm to ascertain late breast cancer recurrence using Danish medical registries. Clin Epidemiol 2020;12:1083–93.
8. Ritzwoller DP, Hassett MJ, Uno H, Cronin AM, Carroll NM, Hornbrook MC, et al. Development, validation, and dissemination of a breast cancer recurrence detection and timing informatics algorithm. J Natl Cancer Inst 2018;110:273–81.
9. Zeng Z, Espino S, Roy A, Li X, Khan SA, Clare SE, et al. Using natural language processing and machine learning to identify breast cancer local recurrence. BMC Bioinformatics 2018;19:498.
10. Aagaard Rasmussen L, Jensen H, Flytkjaer Virgilsen L, Jellesmark Thorsen LB, Vrou Offersen B, Vedsted P. A validated algorithm for register-based identification of patients with recurrence of breast cancer-based on Danish Breast Cancer Group (DBCG) data. Cancer Epidemiol 2019;59:129–34.
11. Izci H, Tambuyzer T, Tuand K, Depoorter V, Laenen A, Wildiers H, et al. A systematic review of estimating breast cancer recurrence at the population level with administrative data. J Natl Cancer Inst 2020;112:979–88.
12. Chubak J, Onega T, Zhu W, Buist DSM, Hubbard RA. An electronic health record-based algorithm to ascertain the date of second breast cancer events. Med Care 2017;55:e81–e7.
13. Boudreau DM, Yu O, Chubak J, Wirtz HS, Bowles EJ, Fujii M, et al. Comparative safety of cardiovascular medication use and breast cancer outcomes among women with early stage breast cancer. Breast Cancer Res Treat 2014;144:405–16.
14. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992;45:613–9.
15. Chubak J, Pocobelli G, Weiss NS. Tradeoffs between accuracy measures for electronic health care data algorithms. J Clin Epidemiol 2012;65:343–9.
16. Cespedes Feliciano EM, Kwan ML, Kushi LH, Chen WY, Weltzien EK, Castillo AL, et al. Body mass index, PAM50 subtype, recurrence, and survival among patients with nonmetastatic breast cancer. Cancer 2017;123:2535–42.
17. Feigelson HS, Bodelon C, Powers JD, Curtis RE, Buist DSM, Veiga LHS, et al. Body mass index and risk of second cancer among women with breast cancer. J Natl Cancer Inst 2021;113:1156–60.
18. Pang Y, Wei Y, Kartsonaki C. Associations of adiposity and weight change with recurrence and survival in breast cancer patients: a systematic review and meta-analysis. Breast Cancer 2022;29:575–88.
19. Petrelli F, Cortellini A, Indini A, Tomasello G, Ghidini M, Nigro O, et al. Association of obesity with survival outcomes in patients with cancer: a systematic review and meta-analysis. JAMA Netw Open 2021;4:e213520.
20. Lyles RH, Lin J. Sensitivity analysis for misclassification in logistic regression via likelihood methods and predictive value weighting. Stat Med 2010;29:2297–309.
21. Ross TR, Ng D, Brown JS, Pardee R, Hornbrook M, Hart G, et al. The HMO Research Network Virtual Data Warehouse: a public model to support collaboration. EGEMS 2014;2:1049.