Abstract
Previous studies indicate that the benefit of therapy depends on patients' risk for cancer recurrence relative to noncancer mortality (ω ratio). We sought to test the hypothesis that patients with head and neck cancer (HNC) with a higher ω ratio selectively benefit from intensive therapy.
We analyzed 2,688 patients with stage III–IVB HNC undergoing primary radiotherapy (RT) with or without systemic therapy on three phase III trials (RTOG 9003, RTOG 0129, and RTOG 0522). We used generalized competing event regression to stratify patients according to ω ratio and compared the effectiveness of intensive therapy as a function of predicted ω ratio (i.e., ω score). Intensive therapy was defined as treatment on an experimental arm with altered fractionation and/or multiagent concurrent systemic therapy. A nomogram was developed to predict patients' ω score on the basis of tumor, demographic, and health factors. Analysis was by intention to treat.
Decreasing age, improved performance status, higher body mass index, node-positive status, P16-negative status, and oral cavity primary predicted a higher ω ratio. Patients with ω score ≥0.80 were more likely to benefit from intensive treatment [5-year overall survival (OS), 70.0% vs. 56.6%; HR of 0.73, 95% confidence interval (CI): 0.57–0.94; P = 0.016] than those with ω score <0.80 (5-year OS, 46.7% vs. 45.3%; HR of 1.02, 95% CI: 0.92–1.14; P = 0.69; P = 0.019 for interaction). In contrast, the effectiveness of intensive therapy did not depend on risk of progression.
Patients with HNC with a higher ω score selectively benefit from intensive treatment. A nomogram was developed to help select patients for intensive therapy.
The effectiveness of intensive therapy for patients with head and neck cancer (HNC) who are older, who have comorbidities, or who have favorable-risk human papillomavirus–associated disease is unclear. Traditional risk stratification models pool patients at high risk for cancer events with patients at high risk for competing events, even though these groups have different expected benefit from intensive therapy. Studies indicate that the hazard for cancer recurrence relative to competing mortality (ω ratio) is a key determinant of treatment benefit, and newer regression methods have been developed to quantify effects on this ratio. This is the first study to examine the effectiveness of intensive therapy for HNC as a function of ω ratio. Using pooled data from three randomized controlled trials, we found that patients with a predicted ω score ≥0.80 had improved overall survival with intensive therapy.
Introduction
Although the effectiveness of intensive therapy [e.g., concurrent chemotherapy or altered fractionation (AFX)] for locoregionally advanced head and neck cancer has been established, there is considerable controversy surrounding which subsets of patients are most likely to benefit from this approach. In particular, the effectiveness of intensive therapy in patients who are older, or who have comorbidities, or who have relatively favorable risk disease [e.g., human papillomavirus (HPV)-associated disease, nonsmokers] is unclear.
Traditionally, risk stratification models used in cancer outcomes research have focused on the effects of treatments and risk factors on endpoints such as overall survival (OS) or progression-free survival (PFS). A problem is that these endpoints do not differentiate effects on primary events, such as disease recurrence or cancer mortality, from competing events, such as death from comorbid illness. As a result, such models are suboptimal, because they pool patients at high risk for cancer events with patients at high risk for competing events, even though these groups have different expected benefit from intensive therapy (1–9). Thus, staging systems and nomograms that predict for OS and PFS are likely to be suboptimal for selecting patients with head and neck cancer (HNC) for intensive therapeutic regimens.
Previous studies indicate that in patients with competing risks, the hazard for cancer relative to competing mortality events (i.e., ω ratio) is a key determinant of treatment benefit (7, 9–11). In particular, older patients with higher ω ratios may be good candidates for more intensive therapy; conversely, younger patients with lower ω ratios may not be. Further work is needed to define factors that critically affect ω ratios and correlate them with treatment effects. Correspondingly, newer methods have been developed to quantify effects on the ω ratio, with considerable improvement in risk stratification compared with standard models (10–13). However, it is not known whether the benefit of more intensive treatment varies according to ω ratio and, in particular, whether this is a more effective method to predict which patients are most likely to benefit from intensive treatment. The goal of this study was to develop a model to identify patients with locally advanced HNC with a higher ω ratio and to test the hypothesis that such patients selectively benefit from treatment intensification.
Materials and Methods
Population, sampling methods, and treatment
We studied 2,688 patients with locoregionally advanced (stage III–IVB) HNC treated on three clinical trials: RTOG 9003 (NCT00771641), RTOG 0129 (NCT00047008), and RTOG 0522 (NCT00265941). Details of these protocols have been published previously (14–17). Written informed consent was obtained for all patients. The study was conducted in accordance with recognized ethical guidelines and was approved by the institutional review boards at all participating institutions.
Briefly, patients on RTOG 9003 were randomized to one of four arms: hyperfractionated radiotherapy (HFX: 81.6 Gy in 68 fractions twice a day over 7 weeks), delayed concomitant boost radiotherapy (DCB: 72 Gy in 42 fractions over 6 weeks), split course radiotherapy (SC: 67.2 Gy in 42 fractions over 6 weeks), or standard fractionation (SFX: 70 Gy in 35 fractions over 7 weeks). For the purpose of this analysis, HFX and DCB were considered AFX, whereas SC and SFX were not. Patients on RTOG 9003 did not receive chemotherapy. Patients on RTOG 0129 were randomized to either AFX or SFX and received chemotherapy (two cycles of cisplatin 100 mg/m² in weeks 1 and 4 of chemoradiotherapy for patients receiving AFX, and three cycles at the same dose in weeks 1, 4, and 7 for patients receiving SFX). Patients on RTOG 0522 were randomized to either cetuximab (400 mg/m² loading dose, followed by 250 mg/m² weekly) or no cetuximab, and all patients received AFX (six fractions per week) along with concurrent cisplatin (two cycles of cisplatin 100 mg/m² in weeks 1 and 4 of chemoradiotherapy). All human investigations were performed after approval by a local human investigations committee and in accordance with an assurance filed with and approved by the Department of Health & Human Services.
Outcomes
Progression-free survival time was defined as the time from randomization to the first recurrence of disease, or death from any cause, or censoring. Overall survival time was defined as the time from randomization to death from any cause, or censoring. Time to recurrence and time to cancer-specific mortality were defined as the time from randomization to first recurrence (or cancer-related mortality), with competing mortality events treated as censored. Time to competing mortality for recurrence was defined as time from randomization to death from any cause, in the absence of a recurrence event, with recurrence events treated as censored. Correspondingly, time to competing mortality for cancer mortality was defined as time from randomization to death from any cause, in the absence of a cancer mortality event, with cancer mortality events treated as censored.
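The endpoint definitions above can be illustrated with a small sketch in R. The data frame `pts` and its column names are hypothetical and are not taken from the trial datasets; the point is only to show how each event indicator follows from the definitions, under the assumption that recurrence and death times are recorded separately.

```r
# Hypothetical toy data: times in years from randomization.
# recur_time / death_time are NA when the event was not observed.
pts <- data.frame(
  id         = 1:4,
  recur_time = c(1.2, NA, NA, 2.5),
  death_time = c(2.0, 3.1, NA, NA),
  last_fu    = c(2.0, 3.1, 5.0, 4.0)
)

# Progression-free survival: first of recurrence, death from any cause, or censoring.
pts$pfs_time  <- pmin(pts$recur_time, pts$death_time, pts$last_fu, na.rm = TRUE)
pts$pfs_event <- as.integer(!is.na(pts$recur_time) | !is.na(pts$death_time))

# Overall survival: death from any cause, or censoring.
pts$os_time  <- pmin(pts$death_time, pts$last_fu, na.rm = TRUE)
pts$os_event <- as.integer(!is.na(pts$death_time))

# Cause-specific recurrence: deaths without recurrence are treated as censored.
pts$rec_event <- as.integer(!is.na(pts$recur_time))

# Competing mortality: death without prior recurrence; recurrences are censored.
pts$cm_event <- as.integer(is.na(pts$recur_time) & !is.na(pts$death_time))

print(pts)
```

In this layout every progression-free event is either a recurrence or a competing death, so `pfs_event` equals `rec_event + cm_event`, and both cause-specific analyses share `pfs_time` as the time variable.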
Statistical analysis
The study followed TRIPOD guidelines (18). The statistical approach involved two main steps: (i) development of a model to separate patients by ω ratio and (ii) validation of the model as a method to predict treatment effects (i.e., variation in treatment effects as a function of ω). Overall survival was used as the primary outcome assessment for model validation, because this endpoint was not used in model development and represents an outcome of clear clinical benefit to patients.
Kaplan–Meier functions were used to plot PFS and OS and cumulative incidence functions were used to plot competing events with respect to time. The basehaz function in R (version 3.4.2) was used to estimate cumulative hazards. Forest plots were used to analyze treatment effects within risk strata, according to intention to treat. Proportional hazards assumptions were tested using the Grambsch–Therneau method (cox.zph function in R).
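As a rough illustration of the functions named above, the following sketch calls them on the `lung` demonstration dataset bundled with the R `survival` package. The dataset and the covariates `age` and `ph.ecog` are stand-ins chosen only so the example runs; they are not the trial data or the study's model.

```r
library(survival)  # recommended package distributed with R

# 'lung' is a demonstration dataset from the survival package (not HNC trial data).
km  <- survfit(Surv(time, status) ~ 1, data = lung)            # Kaplan-Meier estimate
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)  # Cox PH model

H0  <- basehaz(fit, centered = TRUE)  # cumulative baseline hazard over time
zph <- cox.zph(fit)                   # Grambsch-Therneau test of proportional hazards

head(H0)
print(zph)
```

A small P value in the `cox.zph` output for a covariate (or for the global test) would suggest that the proportional hazards assumption is questionable for that term.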
We trained risk scores for recurrence, competing mortality, and PFS using data from the control arms from the three studies (Supplementary Fig. S1), based on the linear predictor from a multivariable Cox proportional hazards regression (19). For RTOG 9003, the SC and SFX arms were collectively considered the control group. For the multivariable models, we selected the following candidate variables for inclusion, based on their availability in all three trials and potential association with disease recurrence (15–17, 20, 21) and/or competing mortality (2–4, 11, 19, 22, 23): age (per 10 years; continuous), female sex, black/African–American race (vs. other), white/Caucasian race (vs. other), body mass index (BMI; ≤20 kg/m² vs. >20 kg/m²), ECOG performance status (0 vs. 1–2), marital status (married vs. other/unknown), anemia (yes/no), education history (any college vs. other/unknown; used as a proxy for socioeconomic status, SES), primary site (oral cavity vs. oropharynx vs. hypopharynx vs. larynx), T stage (0–2 vs. 3 vs. 4), and N stage (0 vs. 1–2a vs. 2b–2c vs. 3). Anemia was defined for males as a baseline hemoglobin ≤13.5 g/dL and for females as a baseline hemoglobin ≤12.5 g/dL. For patients with known smoking and tumor P16 status, we included pack-years (≤10 vs. >10) and P16 (positive vs. negative) as covariates. P16 was analyzed as a prognostic factor for both oropharyngeal and nonoropharyngeal sites, based on several studies that have found differences in outcomes by P16 or HPV status in both oropharyngeal (20, 21) and nonoropharyngeal HNC (24–26).
All variables were normalized by subtracting the sample mean and dividing by the sample standard deviation. The mean BMI value was imputed for 194 patients with missing data, using single imputation. Risk scores based on the linear predictor were generated taking the inner product of the coefficient vector with the individual patient's data vector, as described previously (11). Risk strata were defined according to quantiles of the risk score distribution. We compared results with a standard model developed to stratify patients with oropharyngeal cancer (20, 21). Note that this model also stratifies patients with nonoropharyngeal cancer (Supplementary Fig. S2).
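The normalization and risk score steps described above can be sketched in base R as follows. The covariate values and the coefficient vector `beta` are hypothetical and chosen only for illustration; two strata are used here because the toy sample has four patients.

```r
# Hypothetical covariates: age per 10 years and ECOG 1-2 indicator.
X <- cbind(age10  = c(5.1, 6.3, 7.0, 5.8),
           ecog12 = c(0, 1, 1, 0))

# Normalize: subtract the sample mean and divide by the sample SD, column by column.
Z <- scale(X)

# Hypothetical Cox coefficients on the normalized scale (not the study estimates).
beta <- c(age10 = 0.30, ecog12 = 0.25)

# Risk score = inner product of the coefficient vector with each patient's data vector.
risk <- as.vector(Z %*% beta)

# Risk strata defined by quantiles of the risk score distribution (median split here).
strata <- cut(risk,
              breaks = quantile(risk, probs = c(0, 0.5, 1)),
              include.lowest = TRUE,
              labels = c("low", "high"))

data.frame(risk = round(risk, 3), strata = strata)
```

Because every column is centered before the inner product is taken, the risk scores average to zero in the training sample, so the sign of a score indicates whether a patient sits above or below the sample-average risk.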
For modeling effects of covariates, we used generalized competing event (GCE) regression based on a proportional relative hazards model (11, 27, 28). A detailed description of the GCE modeling approach is provided below. In brief, the ratio of the cause-specific hazard for recurrence (λ1) to the cause-specific hazard for competing mortality (λ2) is represented as ω+, whereas the ratio of the cause-specific hazard for recurrence (λ1) to the hazard for any progression-free event (λ1 + λ2) is represented as ω. We use the terms ω ratio and ω+ ratio to refer to observed values, whereas ω score and ω+ score refer to values of ω and ω+, respectively, predicted by the GCE model.
For GCE regression, the same variables were used as in the Cox proportional hazards models after normalization. Separate regression models were built for both cause-specific events (i.e., disease recurrence) and competing mortality (i.e., death in the absence of disease recurrence, with the cause-specific event treated as censored). Treatment-related deaths were classified as competing mortality events. To test the sensitivity of our conclusions to model specification, reduce overfitting, and facilitate clinical implementation, we generated a parsimonious GCE model using backward stepwise regression to exclude variables from the regression if P > 0.20 and by consolidating N stage (0 vs. 1–3). In this model, only age, performance status, BMI, oral cavity site, N stage, and P16 status had P < 0.20 and thus were retained in the final nomogram, which was trained on the subset of controls with known P16 status (N = 602).
GCE risk scores were generated by taking the inner product of the (normalized) individual patient's data vector with the difference of the coefficient vector for cause-specific events and competing mortality. For 95% confidence intervals (CIs) of estimates, we employed the gcerisk package in R (27). Risk strata were defined according to quantiles of the GCE risk score distribution. Tests of treatment effects and interactions included random effects for study and age (29). All P values are two-sided.
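A minimal sketch of the GCE risk score calculation follows. The coefficient vectors `beta_rec` and `beta_cm` and the patient vector `d` are hypothetical numbers chosen for illustration; in practice they would come from the two cause-specific Cox models fitted on normalized covariates.

```r
# Hypothetical cause-specific Cox coefficients on normalized covariates.
beta_rec <- c(age10 = -0.05, ecog12 = 0.10)  # recurrence
beta_cm  <- c(age10 =  0.35, ecog12 = 0.60)  # competing mortality

# GCE coefficients: difference of the two coefficient vectors.
beta_gce <- beta_rec - beta_cm

# exp(beta_gce) is the relative hazard ratio (omega+ ratio) per 1-SD covariate change.
rhr <- exp(beta_gce)

# Hypothetical normalized data vector for one patient.
d <- c(age10 = 1.0, ecog12 = -0.5)

# GCE risk score: inner product of the data vector with the GCE coefficients.
gce_score <- sum(d * beta_gce)

print(rhr)
print(gce_score)
```

In this toy example both covariates raise the hazard for competing mortality more than the hazard for recurrence, so both relative hazard ratios fall below 1 even though one of them increases recurrence risk in absolute terms.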
GCE model
For mutually exclusive events of type k, we posit the following proportional relative hazards model:

$$\frac{\lambda_k(t \mid X)}{\sum_{j \ne k} \lambda_j(t \mid X)} = \omega_{k0}^{+}(t)\,\exp\!\left(\beta_{k\,GCE}^{+\prime} X\right) \qquad (A)$$

where

$$\omega_{k0}^{+}(t) = \frac{\lambda_{k0}(t)}{\sum_{j \ne k} \lambda_{j0}(t)} \qquad (B)$$

Here, $\lambda_{k0}(t)$ is the baseline hazard for an event of type k, $\sum_{j \ne k} \lambda_{j0}(t)$ is the baseline cause-specific hazard for the set of events competing with event type k, $X$ is a vector of covariates, and $\beta_{k\,GCE}^{+}$ is the vector of effects (coefficients) on the covariates. From this model, it can be shown that

$$\beta_{k\,GCE}^{+} = \beta_k - \beta_{j \ne k} \qquad (C)$$

where $\beta_k$ and $\beta_{j \ne k}$ represent effects on the baseline hazard for event type k and competing events, respectively, from the Cox proportional hazards model. We use $\hat{\beta}_{k\,GCE}^{+} = \hat{\beta}_k - \hat{\beta}_{j \ne k}$ as the estimator for $\beta_{k\,GCE}^{+}$ and $\hat{\omega}_{k0}^{+}(t) = \hat{\Lambda}_{k0}(t) / \hat{\Lambda}_{j \ne k\,0}(t)$, where $\hat{\Lambda}_{k0}(t)$ and $\hat{\Lambda}_{j \ne k\,0}(t)$ represent the Nelson–Aalen estimators (18) for the cumulative hazard for event type k and the set of competing events at time t, respectively. We estimate the predicted value of $\hat{\omega}_k^{+}(t \mid d)$ for an individual with given data vector $d$ as:

$$\hat{\omega}_k^{+}(t \mid d) = \hat{\omega}_{k0}^{+}(t)\,\exp\!\left(\hat{\beta}_{k\,GCE}^{+\prime} d\right) \qquad (D)$$

Note then that $\exp(\hat{\beta}_{k\,GCE}^{+})$ is the estimate of the ω+ ratio, which quantifies how the relative hazards for primary and competing events change in response to changes in covariates.

We define the omega value as the ratio of the hazard for an event of type k to the hazard for all events:

$$\omega_k(t \mid X) = \frac{\lambda_k(t \mid X)}{\sum_{j} \lambda_j(t \mid X)} \qquad (E)$$

and estimate the predicted omega value as:

$$\hat{\omega}_k(t \mid d) = \frac{\hat{\omega}_k^{+}(t \mid d)}{1 + \hat{\omega}_k^{+}(t \mid d)} \qquad (F)$$

Note that while $\hat{\omega}_k$ ranges from 0 to 1 inclusive, $\hat{\omega}_k^{+}$ ranges from 0 to ∞. With two event types (k = 1, 2), a value of $\hat{\omega}_1^{+} = 1$ means the hazard for event type 1 equals the hazard for event type 2, and therefore $\hat{\omega}_1 = \hat{\omega}_2 = 0.5$. For the purpose of this study, we defined ω+ as the ratio of the hazard for disease recurrence to the hazard for competing mortality in the absence of recurrence, and ω as the ratio of the hazard for disease recurrence to the hazard for recurrence or death from any cause. All values of ω are unscaled unless otherwise specified. Scaled estimates were obtained by factoring out the baseline ω+ values.
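As a short worked example of the relationship between the two quantities, and assuming only the two-event setting used in this study (recurrence and competing mortality), ω can be obtained from ω+ as ω = ω+/(1 + ω+). The value 2.56 below is the observed 3-year ω+ ratio reported for the whole cohort in the Results; the second line simply inverts the relationship at the 0.80 threshold.

```latex
\begin{align*}
\omega &= \frac{\omega^{+}}{1 + \omega^{+}} = \frac{2.56}{1 + 2.56} \approx 0.719, \\
\omega^{+} &= \frac{\omega}{1 - \omega} = \frac{0.80}{1 - 0.80} = 4 .
\end{align*}
```

Read this way, an ω score of 0.80 corresponds to a predicted hazard for recurrence that is about four times the hazard for competing mortality.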
Sample size estimates
We used the power calculator described by Pintilie (30) to estimate sample sizes for hypothetical randomized trials with a primary endpoint of PFS, assuming balanced randomization, accrual time of 3 years, follow-up time of 2 years, two-sided α = 0.05, and β = 0.20. We considered two events, cancer recurrence (k = 1) and competing mortality (k = 2), and assumed an HR for cancer recurrence (θ1) of 0.5 and an HR for competing mortality (θ2) of 1. Under varying $\omega_1$ values, we allowed the HR for any event (θ) to vary according to the equation:

$$\theta = \omega_1\,\theta_1 + (1 - \omega_1)\,\theta_2 \qquad (G)$$
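The dependence of the projected HR on ω1 can be sketched numerically. The sketch assumes that θ is the ω1-weighted average of θ1 and θ2, and it approximates ω1 by the ratio of 3-year cumulative incidences from Table 2; both are assumptions of this illustration, and the function names are hypothetical. Under these assumptions the sketch reproduces the tabulated projected HRs for the rows checked.

```r
# Projected HR for any event as a weighted average of the event-specific HRs
# (assumed form; theta1 = 0.5 and theta2 = 1 as stated in the text).
projected_hr <- function(omega1, theta1 = 0.5, theta2 = 1) {
  omega1 * theta1 + (1 - omega1) * theta2
}

# Approximate omega1 from 3-year cumulative incidences (%) of recurrence and
# competing mortality (an approximation made for this illustration).
omega_from_incidence <- function(recurrence, competing) {
  recurrence / (recurrence + competing)
}

# Whole cohort, highest tertile (incidences taken from Table 2).
hr_cox_os <- projected_hr(omega_from_incidence(48.4, 25.3))  # Cox model for OS
hr_gce    <- projected_hr(omega_from_incidence(36.6, 6.4))   # GCE model

round(c(cox_os = hr_cox_os, gce = hr_gce), 3)
```

A group with a larger share of recurrence events has a projected HR closer to θ1, which is why a GCE-defined high-risk group can require a similar sample size despite a lower overall event rate.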
Final GCE risk score
R functions to define the GCE risk score and scaled predicted ω are:
# exp(linear predictor); each covariate is centered and scaled by its training-sample mean and SD
exp.risk.score = function(AGE, BMI, ECOG12, OC, N0, P16) {
  exp(-0.3693 * ((0.1 * AGE - 5.72) / 0.9126) +
       0.2044 * (((BMI > 20) - 0.88538) / 0.31883) -
       0.2262 * ((ECOG12 - 0.377076) / 0.4851) +
       0.1684 * ((OC - 0.03488) / 0.18364) -
       0.1274 * ((N0 - 0.14452) / 0.3519) -
       0.2147 * ((P16 - 0.488372) / 0.5))
}

# Scaled predicted omega: baseline omega+ factored out
scaled.omega.predicted = function(AGE, BMI, ECOG12, OC, N0, P16) {
  e = exp.risk.score(AGE, BMI, ECOG12, OC, N0, P16)
  e / (e + 1)
}

# Omega score: risk score multiplied by the mean baseline omega+ (2.6)
omega.score = function(AGE, BMI, ECOG12, OC, N0, P16) {
  e = 2.6 * exp.risk.score(AGE, BMI, ECOG12, OC, N0, P16)
  e / (e + 1)
}
For this calculation, AGE is in years, BMI is in kg/m², ECOG12 is 1 if ECOG performance status is >0 and 0 otherwise, OC is 1 for oral cavity tumors and 0 otherwise, N0 is 1 if there is no nodal involvement and 0 otherwise, and P16 is 1 for P16-positive tumors and 0 otherwise. The factor 2.6 is the mean baseline ω+ estimate from the control sample.
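To make the structure of the score explicit, the sketch below restates the same calculation as an inner product of normalized covariates with a coefficient vector. The coefficients, means, and standard deviations are transcribed from the functions above; the wrapper name `omega_score_vec` and the two example patients are hypothetical.

```r
# Coefficients, training-sample means, and SDs transcribed from exp.risk.score.
b   <- c(AGE10 = -0.3693, BMI_GT20 = 0.2044, ECOG12 = -0.2262,
         OC = 0.1684, N0 = -0.1274, P16 = -0.2147)
mu  <- c(5.72, 0.88538, 0.377076, 0.03488, 0.14452, 0.488372)
sdv <- c(0.9126, 0.31883, 0.4851, 0.18364, 0.3519, 0.5)

# Hypothetical wrapper: omega score for one patient.
omega_score_vec <- function(AGE, BMI, ECOG12, OC, N0, P16, baseline = 2.6) {
  x  <- c(0.1 * AGE, as.numeric(BMI > 20), ECOG12, OC, N0, P16)
  lp <- sum(b * (x - mu) / sdv)   # linear predictor on normalized covariates
  w_plus <- baseline * exp(lp)    # predicted omega+ (recurrence vs. competing mortality)
  w_plus / (w_plus + 1)           # predicted omega score, between 0 and 1
}

# Two hypothetical patients.
younger <- omega_score_vec(AGE = 50, BMI = 26, ECOG12 = 0, OC = 0, N0 = 0, P16 = 0)
older   <- omega_score_vec(AGE = 75, BMI = 19, ECOG12 = 1, OC = 0, N0 = 1, P16 = 1)

c(younger = younger, older = older)
```

With these inputs the younger, node-positive, P16-negative patient with good performance status falls above the 0.80 cutoff, whereas the older, node-negative, P16-positive patient with low BMI falls well below it, in line with the direction of each coefficient.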
Results
Sample characteristics are provided in Supplementary Table S1. Comparisons of model estimates for the entire control group and the subset with known smoking history and P16 status appear in Table 1. Factors predicting a higher ω ratio were decreasing age, improved performance status, higher BMI, node-positive status, P16-negative status, and oral cavity primary. Effect estimates from the Cox and GCE models differ in informative ways. Although patients with poorer OS or PFS are typically identified as candidates for more intensive treatment, the GCE model indicates that patients with advanced age, poorer performance status, hypopharynx site, and advanced T category, for example, have a reduced hazard for cancer events relative to competing mortality, implying that such patients are relatively less likely to benefit from treatment intensification. Moreover, the effects of some factors, such as N3 category and marital, education, and smoking status, are attenuated in the GCE model due to offsetting effects on recurrence and competing mortality.
Table 1. Comparison of Cox versus GCE models in all controls (left columns) and complete cases with known smoking and P16 status (right columns)
Characteristics | All controls (N = 1352): Cox PH regression, HR^a (95% CI) | All controls (N = 1352): GCE regression, ω+ ratio (RHR)^b (95% CI) | Known smoking and P16 status (N = 527): Cox PH regression, HR^a (95% CI) | Known smoking and P16 status (N = 527): GCE regression, ω+ ratio (RHR)^b (95% CI) |
---|---|---|---|---|
Age at diagnosis, per 10 years^c | 1.42 (1.31–1.53) | 0.66 (0.57–0.77) | 1.37 (1.19–1.58) | 0.67 (0.51–0.88) |
Sex | | | | |
Female vs. male | 0.84 (0.70–1.01) | 1.06 (0.74–1.51) | 0.94 (0.68–1.29) | 0.94 (0.51–1.75) |
Race | | | | |
Black | 1.04 (0.74–1.47) | 0.51 (0.25–1.06) | 0.65 (0.35–1.22) | 0.88 (0.25–3.10) |
White | 0.80 (0.60–1.09) | 0.42 (0.22–0.80) | 0.59 (0.35–1.00) | 0.83 (0.29–2.41) |
Nonblack/nonwhite | Reference | Reference | Reference | Reference |
Body mass index^c | | | | |
≤20 kg/m² vs. >20 kg/m² | 0.55 (0.46–0.67) | 1.03 (0.71–1.49) | 0.76 (0.54–1.09) | 1.66 (0.83–3.32) |
ECOG performance status^c | | | | |
1–2 vs. 0 | 1.35 (1.16–1.58) | 0.49 (0.36–0.67) | 1.54 (1.19–2.01) | 0.59 (0.34–1.00) |
Anemia | | | | |
Yes vs. no/unknown | 1.11 (0.94–1.30) | 0.79 (0.58–1.08) | 0.92 (0.69–1.21) | 0.97 (0.56–1.67) |
Married | | | | |
Yes vs. no/unknown | 0.75 (0.66–0.88) | 0.99 (0.74–1.32) | 0.77 (0.59–1.00) | 1.16 (0.70–1.93) |
Education history | | | | |
Any college/vocational/technical vs. none/unknown | 0.60 (0.51–0.71) | 0.98 (0.71–1.36) | 0.57 (0.43–0.77) | 0.90 (0.52–1.55) |
Anatomic subsite | | | | |
Oropharynx | Reference | Reference | Reference | Reference |
Larynx | 1.16 (0.96–1.39) | 1.06 (0.73–1.53) | 0.93 (0.66–1.30) | 1.15 (0.59–2.24) |
Hypopharynx | 1.65 (1.33–2.04) | 0.77 (0.50–1.18) | 1.72 (1.16–2.55) | 0.87 (0.39–1.91) |
Oral cavity^c | 1.47 (1.13–1.91) | 2.55 (1.40–4.62) | 2.11 (1.22–3.64) | 2.46 (0.62–9.77) |
T stage | | | | |
0–2 | Reference | Reference | Reference | Reference |
3 | 1.06 (0.88–1.28) | 0.85 (0.59–1.23) | 0.87 (0.63–1.21) | 0.84 (0.44–1.61) |
4 | 1.53 (1.26–1.86) | 0.77 (0.52–1.13) | 1.45 (1.04–2.01) | 0.84 (0.44–1.62) |
N stage | | | | |
0^c | Reference | Reference | Reference | Reference |
1–2a | 1.26 (1.01–1.57) | 1.36 (0.88–2.10) | 1.23 (0.82–1.84) | 1.53 (0.69–3.39) |
2b–2c | 1.29 (1.06–1.57) | 1.39 (0.94–2.06) | 1.54 (1.07–2.21) | 1.37 (0.67–2.82) |
3 | 2.19 (1.65–2.91) | 1.07 (0.60–1.91) | 2.93 (1.77–4.84) | 1.00 (0.36–2.81) |
Smoking history, pack-years | | | | |
≤10 vs. >10 | – | – | 0.50 (0.36–0.70) | 1.05 (0.56–1.94) |
P16 status^c | | | | |
Positive vs. negative | – | – | 0.53 (0.39–0.72) | 0.66 (0.37–1.18) |
Abbreviations: ECOG, Eastern Cooperative Oncology Group; GCE, generalized competing event; PH, proportional hazards; RHR, relative hazard ratio.
^a A value >1 indicates increased HR for progression-free survival.
^b A value >1 indicates increased HR for cancer recurrence relative to competing mortality.
^c Retained in parsimonious GCE model (nomogram).
Compared with standard models, GCE models improved stratification according to ω ratio within each risk group, with increasing ω from low risk to high risk according to both model predictions and observations (Supplementary Table S2). Agreement between predicted ω (i.e., ω score) and observed ω ratios was high, indicating excellent model fit and validity. The observed 3-year ω and ω+ ratios for the whole cohort were 0.719 and 2.56, respectively. The observed 3-year ω and ω+ ratios for the subset with known P16 and smoking status were 0.738 and 2.87, respectively.
As shown in Fig. 1, OS differed markedly across risk groups defined by standard models (Fig. 1A and C), whereas GCE models show little correspondence between OS and risk level when risk is defined by ω score (Fig. 1B and D). This suggests that, paradoxically, patients with a better predicted survival (and higher ω score) could be more likely to benefit from intensive treatment (by virtue of being much less likely to die from noncancer causes). This is further shown in Fig. 2, which plots the cumulative incidences of cancer recurrence and competing mortality within risk groups. Note that with standard risk stratification models, the probability of both cancer and noncancer mortality is increased in the highest risk strata relative to the GCE model, whereas the converse is true of the lower risk strata, further supporting GCE model validity. This is because while standard models are designed to separate groups according to PFS and OS, GCE models are designed to optimize the ratio of competing events in order to favor a particular event of interest.
Overall survival by risk strata. A, Cox model in the whole cohort. B, Generalized competing event (GCE) model in the whole cohort. C, Fakhry and colleagues (21) nomogram in patients with known smoking history and P16 status. D, GCE nomogram in patients with known P16 status.
Competing event incidences by risk score. A, Cox model in the whole cohort. B, Generalized competing event (GCE) model in the whole cohort. C, Fakhry and colleagues (21) nomogram in patients with known smoking history and P16 status. D, GCE model in patients with known P16 status.
Patients with the highest ω score (≥0.80)—representing the highest quintile—were more likely to benefit from intensive treatment (5-year OS, 70.0% vs. 56.6%; HR of 0.73, 95% CI: 0.57–0.94; Wald P = 0.016) than those with ω score <0.80 (5-year OS, 46.7% vs. 45.3%; HR of 1.02, 95% CI: 0.92–1.14; Wald P = 0.69; P = 0.019 for interaction). For patients with known P16 status, the GCE nomogram similarly identified a statistically significant benefit from treatment intensification in patients with ω score ≥0.80 (HR of 0.67; 95% CI: 0.47–0.95, Wald P = 0.027); in contrast, we did not find statistically significant treatment effects in the high-risk subgroups defined by standard models overall (Fig. 3), or in any of the trials separately. Treatment intensification was also associated with statistically significant improvement in OS in patients with ω score ≥0.80 in the RTOG 9003 trial separately (Supplementary Fig. S3). These results appeared robust over a range of potential cut-off points near the ω score of 0.80 (Supplementary Fig. S4A and S4B). Calibration plots showed excellent discriminatory ability, with better fitting at higher predicted ω values (Supplementary Fig. S4C). A nomogram for calculating an individual's ω score appears in Fig. 4.
Interaction between experimental therapy and ω score. A, Whole cohort. Left: ω score <0.80; right: ω score ≥0.80. B, Patients with known P16 status. Left: ω score <0.80; right: ω score ≥0.80.
Nomogram to predict patients' relative hazard for recurrence based on GCE regression model. ECOG, Eastern Cooperative Oncology Group.
Model estimates and performance were similar when patients with missing BMI data were omitted from the analysis. We found evidence of efficiency gains with the GCE model relative to standard models under varying definitions of “high risk” (Table 2), due to the higher ratio of primary to competing events. However, this analysis does not account for efficiency loss that could result from a lower event rate. Although the incidence of competing mortality was lower in the high-risk group using the GCE model, the lower incidence of cancer recurrence offsets some of the efficiency gains, indicating correlation between primary and competing events. As such, GCE models could be less efficient than models designed to predict recurrence, but this conclusion was sensitive to the lack of P16 status for the majority of the cohort. It is noteworthy, however, that sample size estimates were similar with the various approaches, despite a marked reduction in the overall event rate in the “high-risk” group defined by the GCE model.
Table 2. Comparison of sample size estimates within variously defined high-risk groups
Risk group and model | Cancer recurrence, 3-year cumulative incidence (%) | Competing mortality, 3-year cumulative incidence (%) | HR^a | N |
---|---|---|---|---|
Whole cohort | | | | |
Highest tertile | | | | |
Cox model for OS | 48.4 | 25.3 | 0.672 | 442 |
Cox model for recurrence | 50.7 | 21.9 | 0.651 | 364 |
GCE model^a | 36.6 | 6.4 | 0.574 | 307 |
Highest quintile | | | | |
Cox model for OS | 51.9 | 26.7 | 0.670 | 409 |
Cox model for recurrence | 54.6 | 24.3 | 0.654 | 346 |
GCE model^a | 36.6 | 5.5 | 0.565 | 293 |
Subset with known P16 and smoking status | | | | |
Highest tertile | | | | |
Fakhry model for OS | 52.3 | 20.7 | 0.642 | 331 |
Cox model for recurrence | 53.4 | 18.5 | 0.629 | 298 |
GCE model | 39.7 | 8.9 | 0.592 | 315 |
Highest quintile | | | | |
Fakhry model for OS | 54.4 | 24.6 | 0.655 | 351 |
Cox model for recurrence | 58.9 | 19.6 | 0.625 | 264 |
GCE model | 34.6 | 11.0 | 0.600 | 301 |
Abbreviations: GCE, generalized competing event; OS, overall survival.
^a Projected HR for recurrence or death from any cause from Eq. (G), based on observed ω ratios.
Discussion
In this study we found that intensive treatment differentially benefits patients with a higher relative recurrence risk (ω score ≥0.80). Previous studies involving HNC and other disease sites have found that ω scores could be used to identify patients with a greater likelihood to benefit from intensive therapy (3, 10–13). This study is the first to examine treatment effects within risk groups defined by this factor. We found evidence to support the hypothesis that relative recurrence risk is an important predictor of treatment effectiveness in the HNC population.
Advantages of this study were its large sample size and large number of known predictors of both cancer-related and competing events. Randomization also mitigated the impact of selection bias, which presents a problem studying treatment effects in other data sources. A limitation of this study, however, is the heterogeneity in the treatment and populations across the three trials. “Intensive treatment” was defined relative to the baseline (control) group and thus included AFX (with or without concurrent chemotherapy), or chemoradiotherapy with concurrent targeted therapy, depending on the trial. We were also unable to control directly for some predictors, such as comorbidity and SES, that likely would have helped optimize the model relative to standard approaches. For example, income was prognostic for both cancer recurrence and competing mortality but had to be omitted because it was not collected for all trials. Although previous studies have found increased survival for patients undergoing treatment at high-volume centers (31–33), radiotherapy quality in RTOG trials is considered to be high. The incidence of noncancer mortality in this cohort was also lower than has been observed in prior studies (2), indicating the exclusion of many patients at risk for competing events.
Variation in the definition of “intensive treatment” is a potential limitation of our study; however, the intent was to compare the effectiveness of intensity with respect to the control arm, and our results were unaffected whether we applied fixed or random-effects models. Data from large randomized trials or meta-analyses involving homogeneous treatments (in particular, with an established survival benefit) will be important for further model validation. Note that both RTOG 0129 and RTOG 0522 failed to reject the null hypothesis; in the absence of effective therapy, it is not possible to identify subpopulations that would benefit. Future studies involving more trials that met their primary endpoint would be helpful to determine how treatment effects and toxicity vary with the ω ratio. However, we did observe a survival advantage with AFX in the high-risk group from RTOG 9003, lending support to the hypothesis that ω score is a useful predictive marker.
The effects of particular covariates in this study should be interpreted with caution, since CIs were fairly wide (leading to some differences in interpretation across samples). Lack of consistency and incomplete collection of key prognostic variables hamper efforts to compare risk models, requiring us to retrain multivariable models in new samples; however, GCE models have previously been validated in population-based studies (10, 11). It should also be noted that age cutoffs of ≤50 (and >70) years have been previously associated with a selective benefit (or lack thereof) of treatment intensification in HNC, including in the RTOG 0522 trial included in this analysis (16, 34, 35). However, age as a sole criterion for treatment selection is generally not favored (36), because other health factors can influence the appropriate intensity of therapy. In this study, age ≤50 years was not predictive of a treatment benefit in the whole cohort.
GCE regression is a modeling approach with clear differences relative to standard risk stratification methods. It contrasts with other nomograms (21, 37, 38) in that instead of predicting patients' risk for event-free survival, which is preferable for prognostication, the GCE model seeks to predict the ratio of cancer events to competing events, which is considered preferable as a predictive model. Further studies are required to establish its advantages over standard methods, especially in the postoperative setting and the larger population not participating in trials, who we expect would have differing risk for competing events. An important limitation is that the cutoff of 0.80 for the ω score, although robust, was not chosen a priori; our results should thus be considered hypothesis generating and should be validated in future studies. Ascertaining optimal cutoffs to define “high-risk” groups remains an area of investigation, especially with models controlling for comorbidity and other geriatric/frailty assessments. Perhaps most interestingly, our findings suggest that a higher absolute risk for recurrence/progression does not necessarily confer a higher likelihood to benefit from intensive therapy (or greater power to detect treatment effects). This is because patients with a low risk for both recurrence and competing mortality may benefit as much from aggressive treatment approaches as patients with high risk for both events.
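The distinction between absolute and relative risk can be illustrated with a minimal sketch. It assumes that the ω ratio is the cancer-event hazard divided by the total (cancer plus competing) hazard, which is consistent with the interpretation that a value above 50% means the cancer hazard exceeds the competing hazard. It further assumes, purely for illustration, that treatment scales only the cancer hazard by a fixed factor. The function names and all hazard values are hypothetical and are not taken from the trial data or from the GCE model fitted here.

```python
def omega_ratio(cancer_hazard: float, competing_hazard: float) -> float:
    """Share of the total event hazard attributable to cancer events
    (assumed definition: cancer / (cancer + competing))."""
    total = cancer_hazard + competing_hazard
    if total <= 0:
        raise ValueError("total hazard must be positive")
    return cancer_hazard / total


def projected_all_event_hr(omega: float, cancer_hr: float) -> float:
    """Hazard ratio for any event if treatment multiplies only the
    cancer hazard by cancer_hr and leaves the competing hazard unchanged."""
    return omega * cancer_hr + (1.0 - omega)


# Hypothetical hazards (events per person-year), for illustration only.
low_risk = omega_ratio(0.04, 0.01)          # low risk of both event types
high_risk = omega_ratio(0.40, 0.10)         # high risk of both event types
mostly_competing = omega_ratio(0.05, 0.15)  # competing mortality dominates

for label, omega in [
    ("low absolute risk", low_risk),
    ("high absolute risk", high_risk),
    ("mostly competing risk", mostly_competing),
]:
    hr = projected_all_event_hr(omega, cancer_hr=0.7)
    print(f"{label}: omega = {omega:.2f}, projected all-event HR = {hr:.3f}")
```

Under these assumptions, the first two hypothetical patients share the same ω ratio and therefore the same projected all-event hazard ratio, despite a tenfold difference in absolute risk, whereas the third patient's projected benefit is attenuated by competing mortality.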
In summary, we propose a method to predict an underreported but meaningful quantity for individual patients (i.e., relative recurrence risk, or ω ratio), with a clinically relevant interpretation (i.e., a value >50% means the individual's hazard for cancer recurrence exceeds the hazard for competing mortality). Our findings indicate that patients with a higher relative recurrence risk, indicated by an ω score ≥0.80, selectively benefit from intensive therapy. This approach is being implemented prospectively in the NRG-HN004 trial, along with a nomogram to inform clinical practice and trial design (comogram.org). Further research, however, is needed to optimize GCE models and to ascertain which patients derive the greatest benefit from intensive therapy.
Disclosure of Potential Conflicts of Interest
D.I. Rosenthal is an employee of Merck. S.J. Frank is an employee of Boston Scientific and Varian Medical; reports receiving commercial research grants from Hitachi, Eli Lilly, and Elekta; and holds ownership interest (including patents) in C4 Imaging. J.A. Bonner is an employee of Bristol-Myers Squibb, Eli Lilly, Merck Serono, and Cel-Sci. S.S. Yom is an employee of Galera; reports receiving commercial research grants from Bristol-Myers Squibb, Merck, Genentech, and BioMimetix; and reports receiving other remuneration from Springer and UpToDate. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: L.K. Mell, H. Shen, Q.-T. Le
Development of methodology: L.K. Mell, H. Shen, K. Zakeri, S.J. Wong
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L.K. Mell, P.F. Nguyen-Tân, D.I. Rosenthal, A.M. Trotti III, J.A. Bonner, C.U. Jones, S.S. Yom, W.L. Thorstad, S.J. Wong, G. Shenouda, J.A. Ridge, Q.E. Zhang
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L.K. Mell, H. Shen, D.I. Rosenthal, K. Zakeri, L.K. Vitzthum, S.J. Frank, P.B. Schiff, A.M. Trotti III, J.A. Bonner, S.J. Wong, G. Shenouda, J.A. Ridge, Q.E. Zhang
Writing, review, and/or revision of the manuscript: L.K. Mell, H. Shen, P.F. Nguyen-Tân, D.I. Rosenthal, K. Zakeri, L.K. Vitzthum, S.J. Frank, P.B. Schiff, A.M. Trotti III, J.A. Bonner, C.U. Jones, S.S. Yom, W.L. Thorstad, S.J. Wong, G. Shenouda, J.A. Ridge, Q.E. Zhang, Q.-T. Le
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): L.K. Mell, L.K. Vitzthum, J.A. Bonner, S.J. Wong
Study supervision: L.K. Mell
Acknowledgments
This project was supported by grants U10CA180868 (NRG Oncology Operations) and U10CA180822 (NRG Oncology SDMC) from the National Cancer Institute (NCI; NRG Oncology/RTOG 9003, NCT00771641, https://clinicaltrials.gov/ct2/show/NCT00771641; NRG Oncology/RTOG 0129, NCT00047008, https://clinicaltrials.gov/ct2/show/NCT00047008; and NRG Oncology/RTOG 0522, NCT00265941, https://clinicaltrials.gov/ct2/show/NCT00265941), and by Eli Lilly.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.