Purpose:

Cancer treatments can paradoxically appear to reduce the risk of noncancer mortality in observational studies, due to residual confounding. Here we introduce a method, Bias Reduction through Analysis of Competing Events (BRACE), to reduce bias in the presence of residual confounding.

Experimental Design:

BRACE is a novel method for adjusting for bias from residual confounding in proportional hazards models. Using standard simulation methods, we compared BRACE with Cox proportional hazards regression in the presence of an unmeasured confounder. We examined estimator distributions, bias, mean squared error (MSE), and coverage probability. We then estimated treatment effects of high versus low intensity treatments in 36,630 prostate cancer, 4,069 lung cancer, and 7,117 head/neck cancer patients, using the Veterans Affairs database. We analyzed treatment effects on cancer-specific mortality (CSM), noncancer mortality (NCM), and overall survival (OS), using conventional multivariable Cox and propensity score (adjusted using inverse probability weighting) models, versus BRACE-adjusted estimates.

Results:

In simulations with residual confounding, BRACE uniformly reduced both bias and MSE. In the absence of bias, BRACE introduced bias toward the null, albeit with lower MSE. BRACE markedly improved coverage probability, but with a tendency toward overcorrection for effective but nontoxic treatments. For each clinical cohort, more intensive treatments were associated with significantly reduced hazards for CSM, NCM, and OS. BRACE attenuated OS estimates, yielding results more consistent with findings from randomized trials and meta-analyses.

Conclusions:

BRACE reduces bias and MSE when residual confounding is present and represents a novel approach to improve treatment effect estimation in nonrandomized studies.

Translational Relevance

Observational comparative effectiveness studies are at risk for bias from residual confounding due to treatment selection bias. This bias can lead to erroneous inferences regarding the efficacy of various treatment strategies across disciplines in oncology. Moreover, even if residual confounding is suspected, it is not typically quantified and its impact is generally imprecisely estimated. Using three large nationwide observational cohorts of cancer patients, we show that intensive treatment is associated with improvement in overall survival, cancer-specific survival, and noncancer mortality. The associations between intensive treatment and reduced noncancer mortality, even after multivariable Cox modeling and propensity score weighting, suggest the presence of treatment selection bias. The novel Bias Reduction through Analysis of Competing Events (BRACE) method was designed to mitigate this bias and is shown here in simulation studies and in large cohorts of cancer patients to reduce bias and improve treatment effect estimation, which can facilitate more accurate inferences in the absence of randomized data.

Although treatment decisions and trial designs ideally are informed by preexisting high-quality randomized data, there are many scenarios where no relevant trial exists, or existing studies are flawed, outdated, underpowered, or poorly generalizable, necessitating observational designs to estimate treatment effects. Bias due to residual confounding (often called treatment selection bias) is an important problem, however, that complicates inferences from nonrandomized comparative effectiveness studies (1, 2). In observational data, multivariable regression models and propensity scores are common approaches to reduce bias from measured confounders (3). However, residual confounding from unmeasured or unknown confounders remains a pernicious problem that can undermine conclusions from such analyses and cannot be overcome by adjustment, scoring, or weighting methods (4–8). Importantly, biased inferences from observational data can mislead the medical field, resulting in patients receiving toxic, costly, and ineffective therapies (9, 10). Therefore, novel methods to mitigate treatment selection bias are of considerable significance to both outcomes research and clinical practice.

Competing event analysis allows for the identification of residual confounding problems in observational data, particularly when the effect of a treatment on competing events can be bounded a priori (11). For example, although the addition of a novel cancer treatment to a standard regimen may have no effect on or even increase mortality from noncancer health events, such as cardiac disease, it should not intrinsically reduce the incidence of such noncancer events. Despite this, in nonrandomized data, competing event analysis can reveal a lower incidence of competing health events in the group receiving intensified treatment, due to unmeasured confounding by more favorable health characteristics in this group, even after appropriately controlling for measurable confounders (12). When present, this phenomenon typically indicates the presence of residual confounding, assuming that more intensive treatment does not truly reduce the risk for competing events.

Although diagnosing residual confounding with a competing events analysis is helpful (12), there remains no consensus on how to address it (2, 4). Here we propose a method, Bias Reduction through Analysis of Competing Events (BRACE), to mitigate this bias (13). We demonstrate the method's performance in simulated data and three prospective observational studies of U.S. veterans treated for prostate, lung, or head/neck cancer, and compare our findings with results from landmark randomized trials and/or meta-analyses. We also applied BRACE to externally collected data for further validation.

BRACE technique

BRACE-corrected estimates of the effect of treatment on overall survival (OS, | $\widehat{\Theta}_{\rm c}$ |⁠) were obtained as the sum of the (adjusted) effect estimate on OS (⁠| ${\widehat{\Theta}}$ |⁠) and the product (1 − | $\widehat{\Theta}_2$ |⁠) (1 − | $\widehat{\omegar}_0$ |⁠), where | $\widehat{\Theta}_2$ | is the adjusted effect estimate on the competing event (NCM) and | $\widehat{\omegar}_0$ | is the estimated proportion of the overall event hazard attributable to the primary event (cancer-specific mortality; CSM; refs. 11–17), i.e.

formula

More details regarding BRACE derivation are given in the Supplementary Methods.

Simulation methods

Our simulation analyses followed recommended guidelines (ADEMP; ref. 18). The hypothetical data were generated to simulate the presence of a confounder that introduces heterogeneous risk for the baseline primary and competing event hazards (λ01 and λ02, respectively), without influencing the effects of treatment on the baseline cause-specific hazards (Θ1 and Θ2, respectively). In the scenarios presented, the bias parameter (κ) increases the log odds of treatment in the presence of the confounder, while decreasing λ02 by a factor of 1/3. We used a standard unadjusted Cox proportional hazards model (18) to estimate the treatment effect (⁠| $\widehat{\Theta}_{\rm u}$ |⁠) on OS, assuming the simulated confounding variable is unobserved. We then compared this estimate with the corrected estimate (⁠| $\widehat{\Theta}_{\rm c}$ |⁠) obtained via BRACE, based on the correction shown in equation (1). As a validation step, we also compared both methods to a hypothetical estimate obtained by reweighting the Cox model with inverse probability of treatment weighting (IPTW), treating the confounding factor as measurable.

For each method, we estimated and compared bias, mean squared error (MSE), and coverage probability using a parametric model under varying inputs for effect size, distribution parameter(s), sample size, and magnitude of bias. Effect sizes and bias were varied factorially, whereas the other model inputs were varied one at a time. We compared three effect size conditions: no treatment effect (i.e., Θ1 = Θ2 = Θ = 1), effective but nontoxic treatment (i.e., Θ1 < 1, Θ2 = 1, Θ < 1), and effective but toxic treatment with negating effects on the composite endpoint (i.e., Θ1 < 1, Θ2 > 1, Θ = 1). Importantly, in both the case of no treatment effect and effective but toxic treatment, Θ = 1, the underlying reasons are different, and therefore the impact of BRACE is expected to differ. The null hypothesis was Θ = 1 (alternative hypothesis Θ ≠ 1). We applied BRACE when | $\widehat{\Theta}_2$ | < 1 and the upper limit of the 95% confidence interval (CI) for | $\widehat{\Theta}$ | < 1; otherwise, | $\widehat{\Theta}_{\rm c}$ | defaulted to | $\widehat{\Theta}_{\rm u}$ |⁠. Note that when | $\widehat{\Theta}_2$ | < 1, this indicates that treatment reduces risk for competing events relative to the comparison group; in the case of intensive treatments, this implies the potential for residual confounding and bias.

For the parametrization of event times, we assumed a Weibull distribution for each of the cause-specific event times, with random censoring, under varying Weibull constants (γ < 1, γ = 1, γ > 1). We tested models using two sample sizes (large, N = 1,500 per trial; small, N = 250 per trial). For the bias parameter, we tested three conditions: none, moderate, and severe). For moderate and severe bias conditions, the bias parameter was drawn from a Bernoulli distribution. The biasing factor increases the log odds of being assigned to the treatment group and decreases (by a constant multiple of 1/3) the baseline hazard for event 2 (λ02). Note that this factor neither directly influences the effects of treatment (because Θ1 and Θ2 do not vary with the factor) nor indirectly influences the composite treatment effect (Θ) due to its effect on λ02; for further discussion see Mell & Jeong (11).

Simulated data were generated in SAS v9.4 (SAS Institute Inc.). For each scenario, we generated a sample of 500 trials. Random number seeds were set once per repetition and random number states were stored at the start of each repetition. For outcome regression modeling and bias corrections, we used R version 4.0.2, coxph function (survival package; ref. 19). To estimate | $\widehat{\omegar}_0$ |⁠, we used the Nelson–Aalen method, as described elsewhere (20). For graphical analysis of results, we used ridge, boxplot, zip, and forest plots (rsimsum and ggplot2 packages in R).

Population and sampling methods

We applied BRACE to observational cohorts of patients treated for prostate cancer, lung cancer, and head/neck cancer, sampled from the Veterans Affairs (VA) Informatics and Computing Infrastructure (VINCI) database. VINCI contains detailed electronic medical records for veterans treated across the United States with tumor registry data collected by trained registrars according to standardized protocols (21). Further details on each cohort are given below. Each cohort was previously analyzed for separate purposes but was adapted as described below to adhere as close as possible to previously published sampling methodologies (22–24). This study was conducted in accordance with recognized ethical guidelines including the Declaration of Helsinki and was approved by our institutional and local VA institutional review boards. Waiver of informed consent was obtained.

Outcomes

The primary outcome of interest was OS. For competing risks analysis, we analyzed two events: CSM and noncancer (competing) mortality. Patients with documented follow-up visits and no death event were coded as alive at last follow-up with event times censored. Date of death and cause of death were obtained via the National Death Index from the Department of Defense for deaths through 2014 consistent with work from other groups (25) and tumor registry data for deaths after 2014, which are linked to the VA data by social security numbers. Survival times (days) were measured from the date of diagnosis.

Statistical methods

Unadjusted and multivariable Cox proportional hazards models (MVA) were fit for each outcome and each cohort. Missing data were imputed using iterative model-based imputation (IRMI; ref. 26). Adjustment variables were determined using backward selection, retaining covariates found to be associated with any outcome (threshold: P < 0.20). The proportional hazards assumption was checked for treatment (cox.zph function in the survival package in R), and when violated, treatment was modeled as a time-varying covariate. For propensity score adjustment, we implemented IPTW with multivariable Cox models using stabilized weights derived from the same sets of covariates. An average treatment effect approach was used for estimation of treatment effects. Each IPTW model was checked for covariate balance across treatment groups, with a standardized mean difference (SMD) threshold of 0.05. BRACE was then applied to the IPTW model estimates to obtain bias-corrected estimates (⁠| $\widehat{\Theta}_{\rm c}$ |⁠). Bootstrapped CIs for | $\widehat{\Theta}_{\rm c}$ | were estimated with 500 replicates. Monte Carlo estimates of | $\widehat{\Theta}_{\rm c}$ | were obtained by randomly cosampling values of | $\widehat{\Theta}_2$ | and | $\widehat{\omegar}_0$ | from their respective distributions (1,000 replicates). CIs were defined by the 2.5th and 97.5th percentiles of the sampling distributions. The BRACE method does not generate P values; CIs were compared between methods. Statistical analyses were performed using R version 4.0.2 (R Foundation for Statistical Computing).

Prostate cohort

The prostate cancer cohort included 36,630 patients with T1-T2N0M0 prostate cancer, with prostate-specific antigen (PSA) < 20, who were diagnosed between 2000 and 2015 and received radical prostatectomy and/or radiotherapy (RT; ref. 22). The treatment effect of interest was radical prostatectomy versus RT. Covariables were treatment, age (categorical in 10-year increments), race (White/Black/Other), NCI-adapted Charlson comorbidity index (CCI; 0/1/≥2), body mass index (BMI; <18.5, 18.5–25, 25–30, >30), smoking status (nonsmoker/current smoker), employment status (not employed/employed), marital status (not married/married), alcohol use (nondrinker vs. current use), baseline PSA (<10 vs. 10–20), Gleason score (6/7/8–10), and T category. The cohort analyzed here excluded patients with missing data, following the previously published sampling procedure (22). Each Cox model was stratified by use of hormonal therapy. Use of hormonal therapy was recorded as a binary yes/no based on whether hormonal therapy was given within 6 months of RT/surgery. Alcohol use was excluded from each model after backward selection.

For additional external validation (i.e., in a cohort where we did not determine the sampling methods), we applied the BRACE method to a cohort of patients from SEER data with low-risk prostate cancer (cT1–T2a, PSA <10, and Gleason 6) who received radical prostatectomy, brachytherapy, or external beam RT from 2005 to 2015, as described in detail elsewhere (27). We were provided the extracted and curated data set. Brachytherapy and external beam RT were combined in a single RT group and were compared with radical prostatectomy. For multivariable analysis, initial variables included age at diagnosis (continuous), race (White/Black/Other/Unknown), insurance status (Insured/Uninsured/Unknown), and year of diagnosis (continuous). Year of diagnosis was excluded from each model after backward selection. On balance check after propensity score weighting, all covariates had an SMD of <0.05 except for age (0.11).

Lung cohort

The lung cohort included 4,069 patients with biopsy-proven clinical stage I (T1–T2a N0 M0) non–small cell lung cancer (NSCLC) diagnosed between 2006 and 2015 and treated definitively with surgery (lobectomy or sublobar resection) or RT. The primary treatment effect of interest for this cohort was lobectomy versus sublobar resection or definitive RT, as previously described (23). Covariables were treatment, age (categorical in 10-year increments), sex, race (White/Black/Other), smoking status (never/current/past), CCI (0/1/2/3+), pretreatment forced expiratory volume in one second (FEV1; pretreatment percent predicted, categorical, <30%, 31%–50%, 51%–80%, >80%), T category (T1a vs. T1b vs. T2a), grade, and histology. Missing values were imputed using IRMI.

Head/neck cohort

The head and neck cohort included 7,117 patients with locoregionally advanced, nonmetastatic (AJCC 7th edition stage III–IVB) squamous cell carcinoma of the oropharynx, oral cavity, larynx, and hypopharynx diagnosed between 2005 and 2015 and treated with definitive (at least 5 weeks of) RT with or without chemotherapy. The primary treatment effect of interest for this cohort was intensive therapy (defined as RT with concurrent cisplatin or with multiagent induction chemotherapy) versus alternative therapy (defined as RT alone or RT with other systemic therapies not qualifying as intensive), as previously described (24). Covariables in each model were intensive treatment, age (categorical), sex, race (White/Black/Other), Eastern Cooperative Oncology Group performance status (2 vs. 0–1), CCI (2 vs. 0–1), BMI (<18.5, 18.5–25, 25–30, >30), smoking status (never/past/current), marital status (currently employed vs. not employed), employment status (currently employed vs. not employed), primary tumor site (oral cavity vs. hypopharynx/larynx vs. p16-negative oropharynx vs. p16-positive oropharynx), T category (1–2 vs. 3–4), and N category (N0–1 vs. 2–3) per American Joint Committee on Cancer (AJCC) 7th edition. Unknown p16 status was considered positive for oropharyngeal cancer patients and negative for nonoropharyngeal cancer patients, consistent with other studies (28, 29). Other missing values were imputed with IRMI.

Data availability statement

Simulation data sets are available from the corresponding author on reasonable request. VINCI and SEER data supporting the findings of this study are available from the U.S. VA Administration and SEER, respectively, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of these authorities.

Simulation studies

Table 1 shows summary statistics for the analysis of simulated data comparing treatment effect estimates with the standard Cox model versus BRACE in the presence of unmeasured confounding. In general, BRACE uniformly reduced both bias and MSE compared with the standard Cox model. For example, in the case of moderate bias, with effective but nontoxic treatment (Table 1, row 5), the MSE was 3.51 × 10–2 with the standard model versus 0.259 × 10–2 with BRACE. In some cases, BRACE even reduced MSE relative to a hypothetical method that was given the advantage of treating the confounding factor as though it were measured, likely due its incorporation of prior knowledge about Θ2. For example, with an effective but nontoxic treatment (Table 1, row 5), the bias (1.92 × 10−2) and MSE (0.259 × 10–2) were both lower with BRACE. Results were similar when the standard Cox model was replaced with a propensity score model; hence, only the former are shown.

Table 1.

Summary statistics comparing the standard Cox model vs. BRACE method.

Summary statistics comparing the standard Cox model vs. BRACE method.
Summary statistics comparing the standard Cox model vs. BRACE method.

Our conclusions were similar under different assumptions about the Weibull parameter (γ), magnitude of bias, relative baseline event hazards (ω0), and sample size [except for standard errors (SEs) as expected in the small sample simulation; Supplementary Results]. In the scenario with no treatment selection bias and homogeneous risk for the baseline event hazards for each sample (Supplementary Table S1, rows 1–3), each method returns (as expected) unbiased estimates with low error. In the case of an effective but nontoxic treatment (Supplementary Table S1, row 2), BRACE introduced bias toward the null (2.90 × 10–2) compared with the standard method (0.331 × 10–2), albeit with lower MSE (0.341 × 10–2 vs. 0.484 × 10–2, respectively).

Figure 1 compares various plots corresponding to the results of our simulations presented in Table 1. As seen in the ridge plots, when the effect of treatment on the composite event was null (Fig. 1A and C), the distribution of the BRACE-adjusted estimates falls between the standard estimator and the hypothetical (i.e., “gold standard”). In the scenario with an effective but nontoxic treatment (Fig. 1B), BRACE overcorrected the bias slightly, but still resulted in lower bias and MSE, as seen in box plots (Fig. 1DF) and forest plots (Fig. 1GI). In all cases, both bias and MSE were lower with BRACE when residual confounding was present.

Figure 1.

Ridge plots (AC), box plots (DF), forest plots (GI), and zip plots (JL) comparing standard vs. BRACE methods in the presence of residual confounding. The hypothetical case where the confounder is measurable is also shown for comparison. A, D, G, and J correspond to the simulation from Table 1, row 4 (null treatment effect). B, E, H, and K correspond to the simulation from Table 1, row 5 (effective but nontoxic treatment). C, F, I, and L correspond to the simulation from Table 1, row 6 (effective and toxic treatment, with negating effects on the composite event). Treatment effects are shown on the log scale. For ridge plots, the dark vertical line represents the true value of the treatment effect (theta). For box plots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. For forest plots: center point, mean; horizontal lines, 95% CI limits.

Figure 1.

Ridge plots (AC), box plots (DF), forest plots (GI), and zip plots (JL) comparing standard vs. BRACE methods in the presence of residual confounding. The hypothetical case where the confounder is measurable is also shown for comparison. A, D, G, and J correspond to the simulation from Table 1, row 4 (null treatment effect). B, E, H, and K correspond to the simulation from Table 1, row 5 (effective but nontoxic treatment). C, F, I, and L correspond to the simulation from Table 1, row 6 (effective and toxic treatment, with negating effects on the composite event). Treatment effects are shown on the log scale. For ridge plots, the dark vertical line represents the true value of the treatment effect (theta). For box plots: center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. For forest plots: center point, mean; horizontal lines, 95% CI limits.

Close modal

BRACE also markedly improved coverage probability (i.e., proportion of intervals containing the true Θ) relative to the standard approach, but with a tendency toward overcorrection in the case of the effective but nontoxic treatment (Fig. 1JL). For all scenarios, BRACE attenuated the tendency of the standard model toward extreme downward bias and improved inference regarding the likely true effect. In general, changing the input parameters had little effect on the ability of BRACE to improve estimation, as shown in Supplementary Table S1. Further description and results of sensitivity analysis are provided in Supplementary Results.

In the primary analysis, we only applied the BRACE correction to trials for which | $\widehat{\Theta}_2$ | < 1 and the upper limit of the 95% CI for | $\widehat{\Theta}$ | < 1. As an alternative approach, we applied BRACE to trials for which | $\widehat{\Theta}_2$ | < 1 and | $\widehat{\Theta}$ | < 1, regardless of whether the null hypothesis is rejected. When we applied this less conservative approach using the same parameter inputs as in rows 4 to 6 from Table 1, we found that when all the treatment effects were null, expected values for bias (0.664 × 10–2) and MSE (0.219 × 10–2) were still low, with | $\widehat{\Theta}_{\rm c}= 1.01$ |⁠. For the situations with an effective and nontoxic treatment and an effective but toxic treatment, the corresponding values were, respectively: bias: 1.92 × 10–2 and −9.43 × 10–2; MSE: 0.26 × 10–2 and 1.12 × 10–2; | $\widehat{\Theta}_{\rm c}$ |⁠: 0.714 and 0.911. All these expected values compared favorably to the standard approach. However, it should be noted that in the absence of treatment selection bias or effect heterogeneity, in the scenario with all null treatment effects, this strategy led to slight upward bias: 2.16 × 10–2; MSE: 0.318 × 10−2; | $\widehat{\Theta}_{\rm c}= 1.022$ |⁠.

Application to clinical cohorts

We next applied BRACE to actual observational patient cohorts. Characteristics of the prostate cancer (N = 36,630), lung cancer (N = 4,069), and head/neck cancer (N = 7,117) cohorts are presented in Supplementary Tables S2–S4. Covariate balance was achieved for all IPTW models at SMD < 0.05 (Supplementary Fig. S1). Results for standard approaches using either multivariable Cox models or IPTW models were largely similar and are presented in tabular form, with comparison of IPTW versus BRACE-corrected estimates emphasized in the text.

For prostate cancer patients, prostatectomy was associated with significantly reduced CSM compared with RT after adjusting for covariates using IPTW [hazard ratio (HR) 0.82, 95% CI): 0.64, 0.92; P = 0.02; Table 2]. Prostatectomy was also associated with significantly reduced noncancer mortality (HR, 0.71; 95% CI, 0.65–0.74; P < 0.001) and improved OS (HR, 0.73; 95% CI, 0.66–0.75; P < 0.001) with IPTW (Fig. 2; Table 2). After correction using BRACE, however, the effect on OS was attenuated substantially (HR, 0.98; 95% CI, 0.95–1.00) with the upper bound of the 95% CI including the null (Table 2). For additional validation, we analyzed a similar cohort of early-stage prostate cancer patients from a recently published analysis of 50,804 low-risk prostate cancer patients from SEER data (courtesy, Dr. Benjamin Kann and Dr. Joseph Miccio; ref. 27); the adjusted effects of prostatectomy relative to RT before (with IPTW) and after BRACE correction were HR 0.54 (95% CI, 0.50–0.61) and 0.97 (95% CI, 0.95–0.99), respectively.

Table 2.

Treatment effects for each cohort.

OutcomeProstate cancerLung cancerHead and neck cancer
Cancer-specific mortality 
 Unadjusted 0.65 (0.57—0.75)** 0.66 (0.56—0.76)** 0.58 (0.53—0.63)** 
 MVA 0.85 (0.73—0.99)* 0.78 (0.66—0.92)* 0.65 (0.59—0.72)** 
 IPTW 0.82 (0.64—0.92)* 0.76 (0.61—0.88)* 0.64 (0.58—0.71)** 
Noncancer mortality 
 Unadjusted 0.54 (0.51—0.57)** 0.63 (0.48—0.83)** 0.64 (0.59—0.70)** 
 MVA 0.73 (0.68—0.78)** 0.83 (0.62—1.11) 0.76 (0.69—0.84)** 
 IPTW 0.71 (0.65—0.74)** 0.87 (0.64—1.17) 0.76 (0.69—0.84)** 
Overall survival 
 Unadjusted 0.55 (0.52—0.59)** 0.65 (0.57—0.74)** 0.61 (0.57—0.65)** 
 MVA 0.75 (0.71—0.80)** 0.79 (0.68—0.91)* 0.71 (0.66—0.76)** 
 IPTW 0.73 (0.66—0.75)** 0.79 (0.66—0.89)* 0.70 (0.66—0.76)** 
 BRACE 0.98 (0.95—1.00) 0.81 (0.65—0.94) 0.81 (0.76—0.86) 
Reference values‡ 
 0.93 (0.65—1.35)a 0.72 (0.48—1.09)c 0.88 (0.85—0.92)d 
 0.94 (0.65—1.36)b   
OutcomeProstate cancerLung cancerHead and neck cancer
Cancer-specific mortality 
 Unadjusted 0.65 (0.57—0.75)** 0.66 (0.56—0.76)** 0.58 (0.53—0.63)** 
 MVA 0.85 (0.73—0.99)* 0.78 (0.66—0.92)* 0.65 (0.59—0.72)** 
 IPTW 0.82 (0.64—0.92)* 0.76 (0.61—0.88)* 0.64 (0.58—0.71)** 
Noncancer mortality 
 Unadjusted 0.54 (0.51—0.57)** 0.63 (0.48—0.83)** 0.64 (0.59—0.70)** 
 MVA 0.73 (0.68—0.78)** 0.83 (0.62—1.11) 0.76 (0.69—0.84)** 
 IPTW 0.71 (0.65—0.74)** 0.87 (0.64—1.17) 0.76 (0.69—0.84)** 
Overall survival 
 Unadjusted 0.55 (0.52—0.59)** 0.65 (0.57—0.74)** 0.61 (0.57—0.65)** 
 MVA 0.75 (0.71—0.80)** 0.79 (0.68—0.91)* 0.71 (0.66—0.76)** 
 IPTW 0.73 (0.66—0.75)** 0.79 (0.66—0.89)* 0.70 (0.66—0.76)** 
 BRACE 0.98 (0.95—1.00) 0.81 (0.65—0.94) 0.81 (0.76—0.86) 
Reference values‡ 
 0.93 (0.65—1.35)a 0.72 (0.48—1.09)c 0.88 (0.85—0.92)d 
 0.94 (0.65—1.36)b   

Note: Effects of intensive treatment approaches on CSM, noncancer mortality, and OS for each clinical cohort. Results are presented from unadjusted, Cox multivariable (MVA), and IPTW models. The BRACE correction was applied to IPTW estimates.

*, Statistically significant with P < 0.05; **, statistically significant with P < 0.001.

†, Note that the BRACE method does not result in a P value. ‡, Reference values were obtained from randomized trials and meta-analyses (asurgery vs. active monitoring; bradiotherapy vs. active monitoring; clobectomy vs. segmentectomy; dlocoregional therapy with chemotherapy vs. locoregional therapy alone; refs. 30–32).

Figure 2.

Kaplan–Meier survival curves showing unadjusted associations between treatment strategies for the prostate cancer (top), lung cancer (middle), and head and neck cancer (bottom) cohorts. CSM, cancer-specific mortality; OS, overall survival; NCM, noncancer mortality. Prostate cancer cohort compares radical prostatectomy (blue) vs. definitive radiotherapy (red). Lung cancer cohort compares lobectomy (blue) vs. sublobar resection or definitive radiotherapy (red). Head and neck cancer cohort compares definitive radiotherapy with induction and/or concurrent cisplatin-based chemotherapy (blue) vs. definitive radiotherapy alone or with alternative systemic therapy (red).

Figure 2.

Kaplan–Meier survival curves showing unadjusted associations between treatment strategies for the prostate cancer (top), lung cancer (middle), and head and neck cancer (bottom) cohorts. CSM, cancer-specific mortality; OS, overall survival; NCM, noncancer mortality. Prostate cancer cohort compares radical prostatectomy (blue) vs. definitive radiotherapy (red). Lung cancer cohort compares lobectomy (blue) vs. sublobar resection or definitive radiotherapy (red). Head and neck cancer cohort compares definitive radiotherapy with induction and/or concurrent cisplatin-based chemotherapy (blue) vs. definitive radiotherapy alone or with alternative systemic therapy (red).

Close modal

For the lung cancer cohort, lobectomy was similarly associated with significantly improved CSM (HR, 0.76; 95% CI, 0.61–0.88; P = 0.02), NCM (HR, 0.87; 95% CI, 0.64–1.17; P = 0.43), and OS (HR, 0.79; 95% CI, 0.66–0.89; P < 0.01), after adjusting for covariates (Fig. 2; Table 2). After BRACE application, the estimated effect on survival was slightly attenuated (HR, 0.81; 95% CI, 0.65–0.94). Note that in the meta-analysis by Zheng and colleagues (32), studies with higher levels of evidence showed an attenuated effect of lobectomy (HR 0.91), nearer to the BRACE estimate. For the head/neck cancer cohort, intensive chemotherapy was also associated with significantly improved CSM (HR, 0.64; 95% CI, 0.58–0.71; P < 0.001), NCM (HR, 0.76; 95% CI, 0.69–0.84; P < 0.001), and OS (HR, 0.70; 95% CI, 0.66–0.76; P < 0.001; Fig. 2; Table 2), with an attenuated effect on OS with BRACE (HR 0.81; 95% CI, 0.76–0.86; Fig. 1; Table 2). As a sensitivity analysis, we conducted the same analyses in the lung and head and neck cohorts excluding observations with missing data, but this did not affect the results or conclusions.

When the proportional hazards assumption was not met, we modeled treatment as a time-varying covariate. Supplementary Table S5 shows results stratified by time period over which the assumption held, following the approach used by Jones and colleagues (33, 34). In general, the corrected OS estimates were nearly identical, but in the prostate cancer cohort, the null hypothesis was not rejected using BRACE, and in the lung cancer cohort, the standard and BRACE estimates diverged more, indicating sensitivity to the proportional hazards assumption. Using Monte Carlo methods for CI estimation, the BRACE-adjusted effects of intensive treatment on OS for the prostate, head/neck, and lung cancer cohorts were HR 0.98 (95% CI, 0.90–1.07), HR 0.81 (95% CI, 0.74–0.91), and HR 0.81 (95% CI, 0.76–0.89), respectively.

Typical strategies to address treatment selection bias in observational data include multivariable Cox proportional hazards regression and propensity score modeling (35–38). Although valuable, these methods may be insufficient to eliminate residual confounding, leading to erroneous inferences (1–8). Competing risks analysis can diagnose residual confounding by identifying mechanistically implausible effects of treatment on competing health events (11, 12). In each cohort we examined, although there were strong associations between intensive treatment and improved survival, competing risks analysis revealed these effects were driven in part by associations with reduced noncancer mortality, even after adjusting for numerous measurable confounders. The likely explanation for this is the selective use of intensive treatment in patients with more favorable baseline health characteristics, thus leading to reduced noncancer mortality, rather than effects on the competing event per se. However, simply identifying this problem does not inherently provide a method to address it.

Here we derived and extensively validated a novel method to attenuate bias, resulting in lower model error compared with standard approaches in simulated data. Although true effects are not directly observable in nonrandomized studies, multiple applications of BRACE yielded attenuated treatment effect estimates more consistent with high-level evidence than uncorrected estimates. Such comparisons should be viewed with caution, given methodological and population differences across studies, but they can lend insight when comparing potentially biased results.

For example, the randomized ProtecT trial found no difference in OS by treatment in 1,643 patients with predominantly low-risk prostate cancer (30), which our results generally support. Although ProtecT did not directly quantify the effect of prostatectomy versus RT, both were compared with a common control group (active monitoring), with nearly identical effects (Table 2). Similarly, the MACH-NC meta-analysis, which investigated the effect of chemotherapy in addition to RT for 16,485 patients across 87 randomized trials, reported an effect of chemotherapy on survival (HR 0.88), close to our BRACE-adjusted estimate (31). Although evidence regarding the comparative effectiveness of treatments for early-stage NSCLC is conflicting (32, 39–46), our results indicated a survival advantage to lobectomy after BRACE correction. Though large randomized trials are lacking, in the meta-analysis by Zheng and colleagues, the effect was attenuated in trials with higher levels of evidence (32). Of note, in some studies, lobectomy has been associated with higher 90-day mortality compared with stereotactic ablative radiotherapy (SABR; ref. 45); if true, BRACE would undercorrect for treatment selection bias.

Our findings, thus, have important implications regarding analyses using observational data. For example, the National Cancer Database lacks cause-specific event data, precluding the application of BRACE and leaving many analyses vulnerable to undiagnosed bias. Missing or inadequate comorbidity data may also contribute to residual confounding; application of BRACE in databases that include cause-specific event data could help mitigate this problem proactively.

This study has several limitations. Notably, the proportional relative hazards model treats several key quantities as independent, a strong condition that is not always verifiable. Although we did not observe critical differences in the performance of BRACE under varying parameters, due to practical limitations we could not examine all permutations or many variables that could theoretically impact our findings. In our analysis of clinical data, it is important to note the corrected estimates could still be biased. Furthermore, inferences can be sensitive to the proportional hazards assumption or to the method used to estimate confidence intervals, especially when close to the null. We noted that in the absence of residual confounding, BRACE increased type II error (bias toward the null), emphasizing the importance of this assumption. We did not take on complex model specifications, such as multiple measured confounders, multiarm comparisons, dependence between events, or Bayesian approaches; future work could examine these questions, including covariate effects on the BRACE estimate. Although we did not observe different results when BRACE was tested in smaller samples, its application may be limited in very small observational cohorts, especially if no significant treatment effects are observable. Note, however, that sensitivity analyses for unmeasured bias are often limited in underpowered studies, and not usually warranted in small studies with inconclusive results (47). For small sample size scenarios, the assumptions for testing hypothesis on parameters of the proportional hazards model may also not hold, and the variances of the estimators may differ from the theoretical ones. Bias adjustment methods can be studied for log-rank methods (48), and this could be a topic for further research. It is important to note that gains using BRACE depend on leveraging a critical assumption: namely, that intensifying treatment does not reduce the hazard for noncancer events. Although this assumption is generally justified when comparing more vs. less intensive treatments (e.g., A vs. A + B designs), in other contexts it may not be possible to bound the effects of a treatment on competing events, such as when comparing two systemic therapies). In this case, BRACE correction would not be recommended.

In summary, we present a novel method (BRACE) to mitigate bias from residual confounding. Appropriate application in observational, nonrandomized data would likely improve effect estimation and inferences.

A.B. Sharabi reports grants from Varian Medical Systems and Pfizer, as well as personal fees from AstraZeneca, Merck, and Jounce Therapeutics outside the submitted work; in addition, A.B. Sharabi is a scientific founder of Toragen Inc with equity outside the submitted work. No disclosures were reported by the other authors.

C.W. Williamson: Conceptualization, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. T.J. Nelson: Data curation, software, formal analysis, investigation, methodology, writing–review and editing. C.A. Thompson: Conceptualization, supervision, investigation, methodology, writing–review and editing. L.K. Vitzthum: Conceptualization, investigation, methodology, writing–review and editing. K. Zakeri: Conceptualization, methodology, writing–review and editing. P.J. Riviere: Data curation, software, visualization, writing–review and editing. A.K. Bryant: Data curation, software, writing–review and editing. A.B. Sharabi: Investigation, writing–review and editing. J. Zou: Software, formal analysis, supervision, validation, investigation, visualization, methodology, writing–review and editing. L.K. Mell: Conceptualization, resources, formal analysis, supervision, validation, investigation, visualization, methodology, writing–review and editing.

The authors wish to thank James Murphy for facilitating access to VINCI data and Joseph Miccio and Benjamin Kann for facilitating access to SEER data.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Becher
H
.
The concept of residual confounding in regression models and some applications
.
Stat Med
1992
;
11
:
1747
58
.
2.
Jatoi
I
.
Residual confounding threatens the validity of observational studies on breast cancer local therapy
.
Cancer
2020
;
126
:
2317
020
.
3.
Harrell
FE
Jr
,
Lee
KL
,
Mark
DB
.
Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors
.
Stat Med
1996
;
15
:
361
87
.
4.
Bosco
JL
,
Silliman
RA
,
Thwin
SS
,
Geiger
AM
,
Buist
DS
,
Prout
MN
, et al
.
A most stubborn bias: no adjustment method fully resolves confounding by indication in observational studies
.
J Clin Epidemiol
2010
;
63
:
64
74
.
5.
Giordano
SH
,
Kuo
YF
,
Duan
Z
,
Hortobagyi
GN
,
Freeman
J
,
Goodwin
JS
.
Limits of observational data in determining outcomes from cancer therapy
.
Cancer
2008
;
112
:
2456
66
.
6.
McGale
P
,
Cutter
D
,
Darby
S
,
Henson
KE
,
Jagsi
R
,
Taylor
CW
.
Can observational data replace randomized trials
?
J Clin Oncol
2016
;
34
:
3355
7
.
7.
Kumar
A
,
Guss
ZD
,
Courtney
PT
,
Nalawade
V
,
Sheridan
P
,
Sarkar
RR
, et al
.
Evaluation of the use of cancer registry data for comparative effectiveness research
.
JAMA Netw Open
2020
;
3
:
e2011985
.
8.
Soni
PD
,
Hartman
HE
,
Dess
RT
,
Abugharib
A
,
Allen
SG
,
Feng
FY
, et al
.
Comparison of population-based observational studies with randomized trials in oncology
.
J Clin Oncol
2019
;
37
:
1209
16
.
9.
Korn
EL
,
Freidlin
B
.
Methodology for comparative effectiveness research: potential and limitations
.
J Clin Oncol
2012
;
30
:
4185
7
.
10.
Hahn
OM
,
Schilsky
RL
.
Randomized controlled trials and comparative effectiveness research
.
J Clin Oncol
2012
;
30
:
4194
201
.
11.
Mell
LK
,
Jeong
JH
.
Pitfalls of using composite primary end points in the presence of competing risks
.
J Clin Oncol
2010
;
28
:
4297
9
.
12.
Mell
LK
,
Carmona
R
,
Gulaya
S
,
Lu
T
,
Wu
J
,
Saenz
CC
, et al
.
Cause-specific effects of radiotherapy and lymphadenectomy in stage I-II endometrial cancer: a population-based study
.
J Natl Cancer Inst
2013
;
105
:
1656
66
.
13.
Mell
LK
,
Nelson
TJ
,
Thompson
CA
,
Williamson
CW
,
Vitzthum
LK
,
Zou
J
.
Bias reduction through analysis of competing events (BRACE): a novel method to mitigate bias from residual confounding in observational data
.
medRxiv
2020 Jan 1
.
14.
Carmona
R
,
Gulaya
S
,
Murphy
JD
,
Rose
BS
,
Wu
J
,
Noticewala
S
, et al
.
Validated competing event model for the stage I-II endometrial cancer population
.
Int J Radiat Oncol Biol Phys
2014
;
89
:
888
98
.
15.
Zakeri
K
,
Rose
BS
,
Gulaya
S
,
D'Amico
AV
,
Mell
LK
.
Competing event risk stratification may improve the design and efficiency of clinical trials: Secondary analysis of SWOG 8794
.
Contemp Clin Trials
2013
;
34
:
74
9
.
16.
Carmona
R
,
Zakeri
K
,
Green
G
,
Hwang
L
,
Gulaya
S
,
Xu
B
, et al
.
Improved method to stratify elderly patients with cancer at risk for competing events
.
J Clin Oncol
2016
;
34
:
1270
7
.
17.
Mell
LK
,
Shen
H
,
Nguyen-Tân
PF
,
Rosenthal
DI
,
Zakeri
K
,
Vitzthum
LK
, et al
.
Nomogram to predict the benefit of intensive treatment for locoregionally advanced head and neck cancer
.
Clin Cancer Res
2019
;
25
:
7078
88
.
18.
Morris
TP
,
White
IR
,
Crowther
MJ
.
Using simulation studies to evaluate statistical methods
.
Stat Med
2019
;
38
:
2074
102
.
19.
Therneau
TM
,
Grambsch
PM
.
The Cox Model
. In:
Therneau
TM
,
Grambsch
PM
, editors.
Modeling survival data: extending the cox model. Statistics for biology and health
.
New York
:
Springer
;
2009
. p.
39
77
.
20.
Mell
LK
,
Zakeri
K
,
Shen
H
.
Time-to-event analysis
. In:
Mell
LK
,
Tran
PT
,
Yu
J
,
Zhang
Q
, editors.
Principles of clinical cancer research
.
New York
:
Demos Medical Publishing
;
2019
. p.
276
303
.
21.
American College of Surgeons
.
Facility oncology registry data standards
;
2016
.
Available from
: https://www.facs.org/-/media/files/quality-programs/cancer/ncdb/fords-2016.ashx.
22.
Riviere
P
,
Luterstein
E
,
Kumar
A
,
Vitzthum
LK
,
Deka
R
,
Sarkar
RR
, et al
.
Survival of African American and non-Hispanic white men with prostate cancer in an equal-access health care system
.
Cancer
2020
;
126
:
1683
90
.
23.
Bryant
AK
,
Mundt
RC
,
Sandhu
AP
,
Urbanic
JJ
,
Sharabi
AB
,
Gupta
S
, et al
.
Stereotactic body radiation therapy versus surgery for early lung cancer among US veterans
.
Ann Thorac Surg
2018
;
105
:
425
31
.
24.
Vitzthum
LK
,
Park
H
,
Zakeri
K
,
Bryant
AK
,
Feng
C
,
Shen
H
, et al
.
Selection of head and neck cancer patients for intensive therapy
.
Int J Radiat Oncol Biol Phys
2020
;
106
:
157
66
.
25.
Maynard
C
,
Trivedi
R
,
Nelson
K
,
Fihn
SD
.
Disability rating, age at death, and cause of death in US veterans with service-connected conditions
.
Mil Med
2018
;
183
:
e371
6
.
26.
Templ
M
,
Kowarik
A
,
Filzmoser
P
.
Iterative stepwise regression imputation using standard and robust methods
.
Comput Stat Data Anal
2011
;
55
:
2793
806
.
27.
Miccio
JA
,
Talcott
WJ
,
Jairam
V
,
Park
HS
,
James
BY
,
Leapman
MS
, et al
.
Quantifying treatment selection bias effect on survival in comparative effectiveness research: findings from low-risk prostate cancer patients
.
Prostate Cancer Prostatic Dis
2021
;
24
:
414
22
.
28.
Stein
AP
,
Saha
S
,
Yu
M
,
Kimple
RJ
,
Lambert
PF
.
Prevalence of human papillomavirus in oropharyngeal squamous cell carcinoma in the United States across time
.
Chem Res Toxicol
2014
;
27
:
462
9
.
29.
Fakhry
C
,
Westra
WH
,
Wang
SJ
,
van Zante
A
,
Zhang
Y
,
Rettig
E
, et al
.
The prognostic role of sex, race, and human papillomavirus in oropharyngeal and nonoropharyngeal head and neck squamous cell cancer
.
Cancer
2017
;
123
:
1566
75
.
30.
Hamdy
FC
,
Donovan
JL
,
Lane
J
,
Mason
M
,
Metcalfe
C
,
Holding
P
, et al
.
10-year outcomes after monitoring, surgery, or radiotherapy for localized prostate cancer
.
N Engl J Med
2016
;
375
:
1415
24
.
31.
Pignon
JP
,
le Maître
A
,
Maillard
E
,
Bourhis
J
.
Meta-analysis of chemotherapy in head and neck cancer (MACH-NC): an update on 93 randomised trials and 17,346 patients
.
Radiother Oncol
2009
;
92
:
4
14
.
32.
Zheng
YZ
,
Zhai
WY
,
Zhao
J
,
Luo
RX
,
Gu
WJ
,
Fu
SS
, et al
.
Oncologic outcomes of lobectomy vs. segmentectomy in non-small cell lung cancer with clinical T1N0M0 stage: a literature review and meta-analysis
.
J Thorac Dis
2020
;
12
:
3178
87
.
33.
Jones
CU
,
Pugh
SL
,
Sandler
HM
,
Chetner
MP
,
Amin
MB
,
Bruner
DW
, et al
.
Adding short-term androgen deprivation therapy to RT in men with localized prostate cancer: long-term update of the NRG/RTOG 9408 randomized clinical trial
.
Int J Radiat Oncol Biol Phys
2022
;
112
:
294
303
.
34.
Mell
LK
,
Pugh
S
,
Jones
CU
,
Nelson
TJ
,
Morginstin
M
,
Gore
EM
, et al
.
Effect of hormone therapy within risk groups defined by generalized competing event model: ancillary analysis of NRG Oncology's RTOG 9408 (abstr).
Int J Radiat Oncol Biol Phys
2020
;
108
:
E863
4
.
35.
Rosenbaum
PR
,
Rubin
DB
.
The central role of the propensity score in observational studies for causal effects
.
Biometrika
1983
;
70
:
41
55
.
36.
Austin
PC
.
An introduction to propensity score methods for reducing the effects of confounding in observational studies
.
Multivariate Behav Res
2011
;
46
:
399
424
.
37.
Austin
PC
.
A tutorial and case study in propensity score analysis: an application to estimating the effect of in-hospital smoking cessation counseling on mortality
.
Multivariate Behav Res
2011
;
46
:
119
51
.
38.
Austin
PC
.
The performance of different propensity-score methods for estimating relative risks
.
J Clin Epidemiol
2008
;
61
:
537
45
.
39.
Razi
SS
,
John
MM
,
Sainathan
S
,
Stavropoulos
C
.
Sublobar resection is equivalent to lobectomy for T1a non–small cell lung cancer in the elderly: a Surveillance, Epidemiology, and End Results database analysis
.
J Surg Res
2016
;
200
:
683
9
.
40.
Ginsberg
RJ
,
Rubinstein
LV
,
Group
LCS
.
Randomized trial of lobectomy versus limited resection for T1 N0 non-small cell lung cancer
.
Ann Thorac Surg
1995
;
60
:
615
22
.
41.
Khullar
OV
,
Liu
Y
,
Gillespie
T
,
Higgins
KA
,
Ramalingam
S
,
Lipscomb
J
, et al
.
Survival after sublobar resection versus lobectomy for clinical stage IA lung cancer: an analysis from the National Cancer Data Base
.
J Thorac Oncol
2015
;
10
:
1625
33
.
42.
El-Sherif
A
,
Gooding
WE
,
Santos
R
,
Pettiford
B
,
Ferson
PF
,
Fernando
HC
, et al
.
Outcomes of sublobar resection versus lobectomy for stage I non-small cell lung cancer: a 13-year analysis
.
Ann Thorac Surg
2006
;
82
:
408
15
.
43.
Chang
JY
,
Senan
S
,
Paul
MA
,
Mehran
RJ
,
Louie
AV
,
Balter
P
, et al
.
Stereotactic ablative radiotherapy versus lobectomy for operable stage I non-small-cell lung cancer: a pooled analysis of two randomised trials
.
Lancet Oncol
2015
;
16
:
630
7
.
44.
Zhang
B
,
Zhu
F
,
Ma
X
,
Tian
Y
,
Cao
D
,
Luo
S
, et al
.
Matched-pair comparisons of stereotactic body radiotherapy (SBRT) versus surgery for the treatment of early stage non-small cell lung cancer: a systematic review and meta-analysis
.
Radiother Oncol
2014
;
112
:
250
5
.
45.
Zheng
X
,
Schipper
M
,
Kidwell
K
,
Lin
J
,
Reddy
R
,
Ren
Y
, et al
.
Survival outcome after stereotactic body radiation therapy and surgery for stage I non-small cell lung cancer: a meta-analysis
.
Int J Radiat Oncol Biol Phys
2014
;
90
:
603
11
.
46.
Shirvani
SM
,
Jiang
J
,
Chang
JY
,
Welsh
J
,
Likhacheva
A
,
Buchholz
TA
, et al
.
Lobectomy, sublobar resection, and stereotactic ablative radiotherapy for early-stage non-small cell lung cancers in the elderly
.
JAMA Surg
2014
;
149
:
1244
53
.
47.
Lash
TL
,
Fox
MP
,
MacLehose
RF
,
Maldonado
G
,
McCandless
LC
,
Greenland
S
.
Good practices for quantitative bias analysis
.
Int J Epidemiol
2014
;
43
:
1969
85
.
48.
Mehrotra
DV
,
Roth
AJ
.
Relative risk estimation and inference using a generalized logrank statistic
.
Stat Med
2001
;
20
:
2099
113
.

Supplementary data