In recent years, several clinical studies have investigated deintensified treatments in human papillomavirus (HPV)-associated head and neck squamous cell carcinoma. Two large phase III trials, RTOG 1016 and De-ESCALaTE, attempted to reduce toxicity by replacing cisplatin with cetuximab in combination with radiotherapy. Both recently suggested that radiotherapy + cetuximab leads to inferior survival compared with standard therapy (observed HRs for overall survival of 1.45 and 5 in RTOG 1016 and De-ESCALaTE, respectively), as well as to increased rates of locoregional failure. These unexpected results should prompt a careful examination of deintensification trials, both in HPV-associated oropharyngeal cancer and in other contexts. Statistical designs for deintensification studies should be consistent with the study aims of reducing toxicities while maintaining survival nearly identical to that of the standard of care. We suggest criteria to design future deintensification trials and discuss important operating characteristics, including tradeoffs between power and stringent early stopping rules that reduce the number of patients exposed to inferior treatments. Using retrospective analyses of previous clinical studies, we compared designs with different operating characteristics. As an example, using outcomes data from RTOG 1016 and De-ESCALaTE, we quantified the advantages of (i) stringent futility early-stopping rules and (ii) study designs that leverage both toxicity and efficacy endpoints at interim analyses. We show that increasing the frequency of interim-futility analyses has little impact on power, whereas the average study duration and the number of subjects enrolled before the trial is closed for inferiority can decrease substantially (from 57.8 to 18 months, and from 764 to 645 subjects). Moreover, the number of observed deaths during the study can be reduced by up to 68%.

Translational Relevance

In recent years, several clinical studies have investigated deintensified treatments in human papillomavirus (HPV)-associated head and neck squamous cell carcinoma (HNSCC). Statistical designs for deintensification studies should be consistent with the study aims of reducing toxicities while maintaining survival nearly identical to that of the standard of care. We suggest criteria to design future deintensification trials and discuss important operating characteristics, including tradeoffs between power and stringent early stopping rules that reduce the number of patients exposed to inferior treatments. Using retrospective analyses of two large phase III clinical studies in HPV-associated HNSCC, we evaluate designs for future deescalation studies under realistic scenarios, including enrollment rates and survival distributions for the standard of care and the deintensified treatment. These retrospective analyses illustrate how the suggested criteria can be relevant for future studies.

The incidence of human papillomavirus (HPV)-associated head and neck squamous cell carcinoma (HNSCC) is increasing rapidly in many high-income countries (1–4). HPV-associated HNSCCs are a distinct subtype compared with HPV-negative HNSCCs (5, 6). In most countries, intensity-modulated radiotherapy in combination with cisplatin (RT+CP) is a current standard of care for advanced oropharyngeal squamous cell carcinoma (7). HPV-associated HNSCC affects a younger and healthier cohort of patients than tobacco- and alcohol-associated cancers, and treatment with RT+CP has a high success rate in HPV-associated cancers, with 3-year survival rates close to 90% (8, 9). However, the addition of cisplatin to radiotherapy is associated with a substantial increase in acute and late toxicity compared with radiotherapy alone (10). Young patients with HPV-associated HNSCC therefore experience significant morbidity from treatment with RT+CP, which can adversely impact their quality of life for decades (11).

There is general agreement on the need for alternative, possibly deescalated, therapies for patients with HPV-associated oropharyngeal cancers that have reduced toxicity profiles without sacrificing response to treatment relative to RT+CP (12). In recent years, several clinical studies have investigated deintensified treatments in HPV-associated HNSCC, for instance OPTIMA (13), E1308 (14), RTOG 1016 (9), and De-ESCALaTE (15). Results of several more studies (PATHOS, ECOG3311, ADEPT, Quarterback trial, and NCT02048020) are eagerly awaited (12, 16).

Unfortunately, both RTOG 1016 and De-ESCALaTE, two large phase III trials that attempted to reduce toxicity by replacing concurrent RT+CP with the use of the EGFR inhibitor cetuximab in combination with radiotherapy (RT+cetuximab), showed inferior overall survival and progression-free survival (OS and PFS) for RT+cetuximab compared with standard therapy, as well as increased rates of locoregional failure. These results were unexpected and should encourage a reevaluation of ongoing deintensification HNSCC trials enrolling similar populations, as well as statistical designs of future deescalation trials.

Deintensification trials may expose study subjects to inferior and toxic treatments. The reported HRs for OS for RT+cetuximab compared with RT+CP were 1.45 and 5 in RTOG 1016 and De-ESCALaTE, respectively, and neither trial provided evidence that the experimental treatment decreases the rates of acute and late toxicities.

Here, we suggest several criteria for future deintensification trial designs and describe desirable operating characteristics.

  • (i) The study design should balance a high probability of detecting noninferior treatments (i.e., survival similar to the standard of care) against the aim of minimizing the risk of exposing a large number of patients, both on the trial and in the future, to inferior and/or toxic treatments.

  • (ii) De-escalation trials can consider the use of both toxicity and efficacy coprimary outcomes to evaluate reductions of adverse events and to test noninferior survival of the deintensified treatment. Explicit early stopping rules for both early evidence of inferior survival and insufficient reduction of toxicities should be included in the study design.

  • (iii) Early stopping rules for inferior survival should be sufficiently stringent (i.e., they should terminate inferior treatments during the trial with high probability, e.g., >60%) in settings where large OS inferiority margins (i.e., the cutoff that separates inferior from noninferior survival) are used. Conservative stopping rules for inferiority and/or insufficient reduction of toxicity may preserve power but can expose a large number of patients, during the study, to toxic and/or inferior treatments.

  • (iv) Explicit and reproducible early stopping and testing procedures should be provided with the study publication and should be included in the study protocol. This is necessary to set good practices, to define guidelines for future designs, and for informed discussions and comparisons of trial designs.

These criteria need to be considered together with other important factors relevant to clinical trial design, such as the resources available to conduct multiple interim analyses, the variability of treatment effect estimates, the use of appropriate noninferiority margins, and the importance of secondary endpoints. Although beyond the scope of this perspective, these aspects are more comprehensively discussed elsewhere (17–20).

We use retrospective analyses to evaluate designs for future deescalation studies under realistic scenarios, including enrollment rates and OS and PFS distributions for the standard of care and the deintensified treatment. These retrospective analyses illustrate how the above criteria (i–iv) can be relevant for future studies.

The recent De-ESCALaTE study used conservative tests for toxicity (the primary endpoint) at interim analyses (with P value thresholds of 0.001 for the null hypothesis of equal toxicity between arms) and reported an HR between RT+cetuximab and RT+CP of 5.0 for OS (15). No explicit stopping rules for inferior survival were reported (15). Moreover, the authors reported in the Supplementary Materials and Methods of the study article (15) that several amendments were made to the toxicity testing rules, including an extension of the overall sample size due to an unexpectedly high rate of toxicities and a change of the statistical testing procedure for toxicity from a Poisson test to a Mann–Whitney U test, although a t test was ultimately used in the study article to report results. The publication indicates that the study protocol and analysis plan will be shared (upon request) from January 1, 2020 onwards.

RTOG 1016 was stopped early owing to insufficient evidence of noninferior survival. Interestingly, neither the estimated HR between RT+cetuximab and RT+CP (1.45, nearly identical to the noninferiority margin specified in RTOG 1016) nor the upper limit of its confidence interval crossed the original per-protocol stopping boundary, which in retrospect appeared to be too conservative (9); an adjusted post hoc stopping rule was instead applied to terminate the study.

Would more stringent early futility stopping criteria prevent patients from being exposed to inferior treatments in future deescalation trials while maintaining appropriate power? Using outcomes data from RTOG 1016 (sampled from digitized Kaplan–Meier curves extracted with DigitizeIt; ref. 21), we conducted a simulation analysis to quantify the advantages of stringent futility stopping rules.
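Sampling event times from a digitized Kaplan–Meier curve can be done by inverse-transform sampling. The following is a minimal Python sketch, assuming the digitized curve is available as paired arrays of time points and survival probabilities; the function name `sample_from_km` and the example curve values are hypothetical, and the authors' exact procedure (described in their Supplementary Materials and Methods) may differ, for example by interpolating between digitized points.

```python
import numpy as np


def sample_from_km(times, surv, n, rng):
    """Draw n event times by inverse-transform sampling from a digitized KM curve.

    times: increasing time points (e.g., months) read off the digitized curve.
    surv:  survival probabilities S(t) at those points (non-increasing, in [0, 1]).
    Draws that fall beyond the last digitized point are returned as np.inf,
    meaning "still event-free at the end of the digitized follow-up".
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    u = rng.uniform(size=n)
    # T = first digitized time t with S(t) <= u. Because S is non-increasing,
    # -S is non-decreasing, so a binary search on -S finds that index.
    idx = np.searchsorted(-surv, -u, side="left")
    draws = np.full(n, np.inf)
    observed = idx < len(times)
    draws[observed] = times[idx[observed]]
    return draws


rng = np.random.default_rng(2019)
# Hypothetical digitized points (months, survival), for illustration only.
t = sample_from_km([6, 12, 24, 36], [0.97, 0.93, 0.88, 0.85], n=1000, rng=rng)
```

Draws returned as infinity correspond to patients who would still be event-free beyond the digitized follow-up; in a trial simulation they would simply be censored at each analysis time.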

We considered the original design of RTOG 1016 (see Gillison and colleagues, 2019; ref. 9), including the null (inferior OS) and alternative (noninferior OS) hypotheses, specified using HRs for OS ($\mathrm{HR}_{\mathrm{OS}}$) equal to 1.45 and 1.00, the early stopping rules, the observed accrual rate, and the sample size of 800 randomized patients. By protocol, futility interim analyses (early stopping for the null hypothesis) using PFS started after 2 years and were conducted every 12 months thereafter. Toxicity data are not considered in this initial set of simulations.

We compared this design with an alternative Bayesian design with identical null and alternative hypotheses ($H_0: \mathrm{HR}_{\mathrm{OS}} = 1.45$ and $H_A: \mathrm{HR}_{\mathrm{OS}} < 1.45$), similar noninferiority (alternative hypothesis) interim monitoring rules (as in RTOG 1016, OS interim and final analyses after 45, 90, 135, and 180 events to stop the study for the alternative hypothesis), but more stringent early stopping criteria for inferiority (null hypothesis). Specifically, we used a Bayesian Cox model (22) for PFS to estimate the HR between the experimental and control arms. Inferiority analyses (futility early stopping) using PFS, as described below, are repeated every 1, 3, 6, or 12 months.

We considered two summaries to define these stopping rules for inferiority: the posterior probability that (i) $\mathrm{HR}_{\mathrm{PFS}} > 1$ or (ii) $\mathrm{HR}_{\mathrm{PFS}} > 1.2$. The choice of an appropriate HR cutoff $h$, such as $h = 1$ or $h = 1.2$, depends on the tradeoff between acceptable survival reduction and expected improvement in toxicity compared with the standard of care. The trial is stopped at an interim analysis if the posterior probability of $\mathrm{HR}_{\mathrm{PFS}} > 1$ (or of $\mathrm{HR}_{\mathrm{PFS}} > 1.2$) exceeds a predefined futility threshold.
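The inferiority rule can be illustrated with a normal approximation. The sketch below is not the Bayesian Cox model used in the analysis; it assumes that an interim analysis yields an estimated log-HR for PFS with a standard error, that the corresponding likelihood is approximately normal, and that the prior on the log-HR is normal with mean zero and unit variance (our reading of footnote a of Table 1). Under these assumptions the posterior of the log-HR is normal and the probability of $\mathrm{HR}_{\mathrm{PFS}} > h$ has a closed form. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def posterior_prob_hr_above(log_hr_hat, se, h, prior_mean=0.0, prior_sd=1.0):
    """Approximate posterior P(HR > h) for a hazard ratio.

    Assumes a normal approximation to the likelihood of the log-HR
    (estimate log_hr_hat, standard error se) and a normal prior on the log-HR.
    By conjugacy the posterior of the log-HR is normal as well.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * log_hr_hat)
    posterior = NormalDist(mu=post_mean, sigma=math.sqrt(post_var))
    # HR > h  <=>  log-HR > log(h)
    return 1.0 - posterior.cdf(math.log(h))


def stop_for_inferiority(log_hr_hat, se, h, futility_threshold):
    """Interim rule: stop if P(HR_PFS > h | data) exceeds the futility threshold."""
    return posterior_prob_hr_above(log_hr_hat, se, h) > futility_threshold


# Hypothetical interim estimate: HR_PFS = 1.45 with SE 0.2 on the log scale.
p_h1 = posterior_prob_hr_above(math.log(1.45), 0.2, h=1.0)
p_h12 = posterior_prob_hr_above(math.log(1.45), 0.2, h=1.2)
```

With these hypothetical inputs, the posterior probability of $\mathrm{HR}_{\mathrm{PFS}} > 1$ is about 0.97 and that of $\mathrm{HR}_{\mathrm{PFS}} > 1.2$ is about 0.81, which shows why the two summaries call for different futility thresholds.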

Assuming HRs (for OS and PFS) equal to the noninferiority margin in RTOG 1016 (1.45), we determined (separately for the posterior probability of $\mathrm{HR}_{\mathrm{PFS}} > 1$ and of $\mathrm{HR}_{\mathrm{PFS}} > 1.2$) futility thresholds such that the trial would stop early for inferiority with probability $P_0 = 0.8$ (or 0.9). In practice, $P_0$ should be selected considering the tradeoff between protecting patients from inferior treatments and maintaining power, and may thus be informed by prior data on the expected toxicity reduction associated with the experimental treatment. Larger $P_0$ values lead to faster termination of inferior experimental therapies but also reduce power. If large noninferiority margins are used (for instance, an HR of 1.45 as in RTOG 1016), investigators may consider large values of $P_0$ (0.8–0.9), whereas smaller values, such as 0.6 or 0.7, may be sufficient if smaller noninferiority margins are applied.
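Futility thresholds of this kind are typically calibrated by simulation. The sketch below, written under strongly simplifying assumptions, picks the threshold so that a simulated trial with HR = 1.45 is stopped for inferiority at some interim analysis with probability $P_0$. Because the trial stops at the first interim analysis where the posterior probability exceeds the threshold, it stops at all if and only if the maximum posterior probability across interim analyses exceeds it, so the threshold is the $(1 - P_0)$ quantile of that maximum. Assumptions not taken from the source: exponential PFS with a hypothetical control hazard, uniform accrual at a hypothetical rate, an exponential-model estimate of the log-HR in place of a Cox model, and no efficacy stopping or final analysis. All names and default values are illustrative.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def max_interim_posterior(rng, hr, n=800, accrual_per_month=20.0, lam_c=0.003,
                          h=1.0, every=3, horizon=72, prior_sd=1.0):
    """Largest posterior P(HR_PFS > h) over the interim looks of one simulated trial.

    Simplifying assumptions (not from the source): uniform accrual, 1:1
    randomization, exponential PFS with control hazard lam_c (per month) and
    proportional hazards, and a N(0, prior_sd^2) prior on the log-HR combined
    with the normal approximation to the exponential-model MLE of the log-HR.
    """
    entry = np.arange(n) / accrual_per_month           # enrollment times (months)
    experimental = rng.integers(0, 2, size=n) == 1      # True = experimental arm
    hazard = np.where(experimental, lam_c * hr, lam_c)
    event_time = entry + rng.exponential(1.0 / hazard)  # calendar time of PFS event
    best = 0.0
    for t in np.arange(every, horizon + 1e-9, every):   # interim looks at t months
        on_study = entry < t
        event = on_study & (event_time <= t)
        follow_up = np.minimum(event_time, t) - entry   # only used for on_study
        d_e = np.sum(event & experimental)
        d_c = np.sum(event & ~experimental)
        if d_e == 0 or d_c == 0:
            continue                                    # log-HR not estimable yet
        t_e = np.sum(follow_up[on_study & experimental])
        t_c = np.sum(follow_up[on_study & ~experimental])
        log_hr = np.log(d_e / t_e) - np.log(d_c / t_c)  # ratio of event rates
        var_hat = 1.0 / d_e + 1.0 / d_c
        post_var = 1.0 / (1.0 / prior_sd ** 2 + 1.0 / var_hat)
        post_mean = post_var * log_hr / var_hat         # prior mean is 0
        z = (np.log(h) - post_mean) / np.sqrt(post_var)
        best = max(best, 1.0 - STD_NORMAL.cdf(float(z)))
    return best


def calibrate_threshold(p0, n_sims=2000, seed=1, **design):
    """Futility threshold giving early stopping with probability ~p0 when HR = 1.45."""
    rng = np.random.default_rng(seed)
    maxima = np.array([max_interim_posterior(rng, hr=1.45, **design)
                       for _ in range(n_sims)])
    # The trial stops iff the max posterior probability exceeds the threshold,
    # so the threshold is the (1 - p0) quantile of the simulated maxima.
    return float(np.quantile(maxima, 1.0 - p0)), maxima
```

For example, `calibrate_threshold(0.8)` and `calibrate_threshold(0.9)` return thresholds for $P_0 = 0.8$ and $0.9$ under these toy settings. A larger $P_0$ yields a lower threshold and hence earlier stopping, at the cost of more early stops when the experimental arm is in fact noninferior.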

We ran 10,000 simulations of the two designs (Bayesian and RTOG 1016) across four scenarios. In scenario 1, the OS and PFS distributions of the control and experimental arms are identical to the corresponding distributions observed in RTOG 1016. In the remaining three scenarios (scenarios 2–4), the control OS and PFS distributions are as observed in RTOG 1016, and the HRs between the experimental and control arms, for both OS and PFS, are equal to 1.45, 1.1, or 1, respectively. The Supplementary Materials and Methods provide additional details on the Bayesian design, including the control of type I error rates, and on the implementation of the simulation study.
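In scenarios 2–4, the experimental-arm curves are defined through an HR relative to the observed control curves. Under proportional hazards this fixes the experimental survival function, and one way to sample from it by inverse transform, assuming a continuous and strictly decreasing control survival function $S_C$ with cumulative hazard $\Lambda_C$, is the following (the source does not state that this exact construction was used):

$$
\begin{aligned}
S_E(t) &= \exp\{-\mathrm{HR}\,\Lambda_C(t)\} = S_C(t)^{\mathrm{HR}},\\
T_E &= S_C^{-1}\!\left(U^{1/\mathrm{HR}}\right), \qquad U \sim \mathrm{Uniform}(0,1),\\
\Pr(T_E > t) &= \Pr\!\left(U^{1/\mathrm{HR}} < S_C(t)\right) = \Pr\!\left(U < S_C(t)^{\mathrm{HR}}\right) = S_E(t).
\end{aligned}
$$

With a digitized, step-shaped $S_C$, the inverse is replaced by the generalized inverse $\inf\{t : S_C(t) \le v\}$.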

Table 1 and Supplementary Table S1 report selected operating characteristics from the simulation study. Importantly, although increasing the frequency of interim-futility analyses has little impact on power, the average study duration and the number of subjects enrolled before the trial is closed for inferiority can decrease (from 57.8 to 18–30.5 months, and from 764 to 645–690 subjects; scenario 1). The average number of deaths during the study is reduced by up to 68% in scenario 1. The simulations also show the importance of selecting a suitable statistic for inferiority interim analyses (e.g., the posterior probability of $\mathrm{HR}_{\mathrm{PFS}} > h$, with $h = 1$ as in Table 1 or $h = 1.2$ as in Supplementary Table S1). Although these statistics perform similarly well in inferiority scenarios, using $h = 1$ instead of $h = 1.2$ substantially reduces the probability of stopping for inferiority when the deintensified therapy is noninferior (scenario 4). Supplementary Table S2 presents additional simulation analyses with sample sizes chosen to achieve approximately 85% and 90% power. In these simulations as well, increasing the frequency of interim analyses has minimal effects on power (<1% power reduction) but can substantially reduce the study duration when the experimental treatment is inferior.

Table 1.

Operating characteristics

Columns 2–9 refer to the Bayesian designᵃ, labeled by the target proportion of trials that should be stopped early under HR = 1.45 (80% or 90%) and by the frequency of futility-interim analyses (every 1, 3, 6, or 12 months); the last column refers to the per-protocol RTOG 1016 design. A dash (–) marks a value that is not available.

| | 80%, 1 month | 80%, 3 months | 80%, 6 months | 80%, 12 months | 90%, 1 month | 90%, 3 months | 90%, 6 months | 90%, 12 months | Per-protocol RTOG 1016 design |
|---|---|---|---|---|---|---|---|---|---|
| Scenario 1: OS and PFS curves as observed in RTOG 1016 | | | | | | | | | |
| H0 not rejected, stopped early for inferiority (%)ᵇ | 97.3 | 97.2 | 97.1 | 97.1 | 98.8 | 98.8 | 98.8 | 98.8 | 78.8 |
| H0 not rejected, not stopped early (%)ᶜ | 2.4 | 2.5 | 2.6 | 2.6 | – | – | – | – | 19.9 |
| H0 rejected (type I error, %)ᵈ | 0.2 | 0.3 | 0.3 | 0.3 | 0.1 | 0.1 | 0.2 | 0.2 | 1.3 |
| H0 rejected at final analysis (%)ᵉ | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 |
| H0 rejected at interim analyses (%)ᶠ | 0.2 | 0.3 | 0.3 | 0.3 | 0.1 | 0.1 | 0.2 | 0.2 | 1.3 |
| Average study duration (months)ᵍ | 32.1 | 33.4 | 35.1 | 38 | 26.4 | 27.6 | 29 | 31.8 | 57.8 |
| Patients randomized (average) | 579.8 | 599.5 | 622.6 | 662.8 | 499.4 | 520.5 | 545 | 591.1 | 763.9 |
| Deaths on study (average) | 43.6 | 46.6 | 50.3 | 57.4 | 31.2 | 33.6 | 36.7 | 43.1 | 103.1 |
| Scenario 2: OS (and PFS) HR = 1.45 between experimental and control arm | | | | | | | | | |
| H0 not rejected, stopped early for inferiority (%)ᵇ | 80.8 | 80.5 | 80.2 | 80.2 | 89.8 | 89.9 | 89.8 | 89.8 | 38.7 |
| H0 not rejected, not stopped early (%)ᶜ | 18 | 18.2 | 18.5 | 18.5 | 9.5 | 9.3 | 9.3 | 9.3 | 57.9 |
| H0 rejected (type I error, %)ᵈ | 1.2 | 1.3 | 1.3 | 1.4 | 0.7 | 0.8 | 0.9 | 0.9 | 3.4 |
| H0 rejected at final analysis (%)ᵉ | 0.4 | 0.4 | 0.5 | 0.4 | 0.2 | 0.2 | 0.2 | 0.2 | 1.3 |
| H0 rejected at interim analyses (%)ᶠ | 0.8 | 0.8 | 0.8 | 0.9 | 0.6 | 0.6 | 0.7 | 0.7 | 2.1 |
| Average study duration (months)ᵍ | 57.8 | 59.2 | 60.9 | 63 | 44.7 | 45.8 | 47.6 | 50 | 97 |
| Patients randomized (average) | 689.8 | 702.8 | 716.8 | 734.8 | 617.1 | 632 | 652.6 | 680.3 | 763.9 |
| Deaths on study (average) | 89.5 | 92.7 | 96.3 | 101.3 | 66.1 | 68.6 | 72.5 | 78.1 | 103.1 |
| Scenario 3: OS (and PFS) HR = 1.1 between experimental and control arm | | | | | | | | | |
| H0 not rejected, stopped early for inferiority (%)ᵇ | 13.5 | 12.8 | 12.5 | 12.3 | 25.5 | 24.8 | 24.4 | 23.6 | 0.9 |
| H0 not rejected, not stopped early (%)ᶜ | 36.3 | 36.7 | 36.7 | 36.7 | 30.5 | 30.8 | 31.1 | 31.4 | 41.7 |
| H0 rejected (power, %)ᵈ | 50.2 | 50.5 | 50.8 | 51 | 44 | 44.4 | 44.6 | 45 | 57.4 |
| H0 rejected at final analysis (%)ᵉ | 19.5 | 19.5 | 19.6 | 19.6 | 16.9 | 16.9 | 16.8 | 16.9 | 22.9 |
| H0 rejected at interim analyses (%)ᶠ | 30.8 | 31 | 31.2 | 31.5 | 27.2 | 27.6 | 27.8 | 28.1 | 34.6 |
| Average study duration (months)ᵍ | 119.6 | 120.5 | 121 | 121.6 | 108 | 109.3 | 110.3 | 111.6 | 130.4 |
| Patients randomized (average) | 784.3 | 786.8 | 788.6 | 790.1 | 756.8 | 763.2 | 769.3 | 777.2 | 791.2 |
| Deaths on study (average) | 146.9 | 148.2 | 148.8 | 149.9 | 133 | 135 | 136.4 | 138.7 | 152.8 |
| Scenario 4: Identical OS (and PFS) distributions in both arms, HR = 1 | | | | | | | | | |
| H0 not rejected, stopped early for inferiority (%)ᵇ | – | 4.1 | 3.7 | 3.3 | 9.9 | 9.5 | 9.1 | 8.6 | 0.1 |
| H0 not rejected, not stopped early (%)ᶜ | 19.7 | 19.7 | 19.8 | 19.8 | 18.6 | 18.7 | 18.8 | 18.8 | 22.7 |
| H0 rejected (power, %)ᵈ | 76.1 | 76.3 | 76.5 | 76.9 | 71.5 | 71.8 | 72.1 | 72.6 | 77.2 |
| H0 rejected at final analysis (%)ᵉ | 21.5 | 21.6 | 21.6 | 21.8 | 20.2 | 20.1 | 20.1 | 20.2 | 21.2 |
| H0 rejected at interim analyses (%)ᶠ | 54.6 | 54.7 | 54.8 | 55.1 | 51.3 | 51.7 | 52 | 52.5 | 56 |
| Average study duration (months)ᵍ | 118.4 | 118.7 | 119 | 119.4 | 113.4 | 113.8 | 114.3 | 114.9 | 121.9 |
| Patients randomized (average) | 792.8 | 794.1 | 795.2 | 796.7 | 778.8 | 781.8 | 784.8 | 787.9 | 799.6 |
| Deaths on study (average) | 140.9 | 141.3 | 141.6 | 142.2 | 134.9 | 135.5 | 136.2 | 137.2 | 145.3 |

NOTE: The Bayesian design conducts interim analyses for inferiority (using the probability of HR > 1) every 1, 3, 6, or 12 months, starting from months 1, 3, 6, or 12. The far-right column reports operating characteristics for the RTOG 1016 protocol design.

ᵃThe Bayesian design uses a normal prior with mean zero and unit variance.

ᵇPercentage of simulated trials that were stopped early for inferiority.

ᶜPercentage of simulated trials that were not stopped early and declared the experimental treatment inferior to the control arm at final analysis.

ᵈPercentage of simulated trials that declared the experimental treatment noninferior to the control arm at interim or final analyses (sum of ᵉ and ᶠ).

ᵉPercentage of simulated trials that declared the experimental treatment noninferior to the control arm at final analysis.

ᶠPercentage of simulated trials that declared the experimental treatment noninferior to the control arm at interim analyses.

ᵍStudy duration is defined as the period between trial activation and either the time of early stopping or the final analysis.

We repeated the simulations using the reported Kaplan–Meier curves of De-ESCALaTE (15). Supplementary Table S3 summarizes, as Table 1 does for RTOG 1016, the operating characteristics of the two designs (Bayesian and RTOG 1016) with outcome data generated from digitized Kaplan–Meier curves of De-ESCALaTE. The results are in strong concordance with those in Table 1.

We also considered the utility of using both efficacy and toxicity endpoints for early stopping (for inferior survival or insufficient reduction in toxicity). This is relevant because neither De-ESCALaTE nor RTOG 1016 demonstrated reductions in toxicity events. We used a Bayesian model (see Supplementary Materials and Methods) to estimate the average numbers of adverse events (grade 3–5 events) per patient, $E_E$ and $E_C$, in the experimental and control arms, and we computed the posterior probability of a reduction in toxicities, $\Pr(E_E - E_C < 0)$, at futility-interim analyses. If this probability falls below a predefined threshold, the trial is stopped for futility. In addition, the study may be stopped early for inferior survival (as previously described) or for noninferiority (see Supplementary Materials and Methods). The availability of sufficient information to compare toxicities is a prerequisite for early stopping for noninferiority (see Supplementary Materials and Methods).
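The posterior probability of a toxicity reduction can be sketched with a conjugate model. The code below assumes, as an illustration only, that the number of grade 3–5 adverse events per patient is Poisson with arm-specific mean ($E_E$ or $E_C$) and that each mean has an independent Gamma prior; the model actually used is specified in the Supplementary Materials and Methods and may differ. The probability of $E_E - E_C < 0$ is estimated by Monte Carlo, and the function names, prior values, and simulated interim data are hypothetical.

```python
import numpy as np


def prob_toxicity_reduction(counts_exp, counts_ctl, a0=1.0, b0=1.0,
                            n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(E_E - E_C < 0 | data).

    Illustrative model: per-patient grade 3-5 AE counts ~ Poisson(E_arm), with an
    independent Gamma(shape=a0, rate=b0) prior on each arm's mean E_arm. The
    posterior is then Gamma(a0 + total AEs, rate = b0 + number of patients).
    """
    rng = np.random.default_rng(seed)

    def posterior_draws(counts):
        counts = np.asarray(counts)
        shape = a0 + counts.sum()
        rate = b0 + counts.size
        return rng.gamma(shape, 1.0 / rate, size=n_draws)  # NumPy takes scale = 1/rate

    e_exp = posterior_draws(counts_exp)
    e_ctl = posterior_draws(counts_ctl)
    return float(np.mean(e_exp - e_ctl < 0))


def stop_for_insufficient_toxicity_reduction(counts_exp, counts_ctl, threshold):
    """Futility rule: stop if P(E_E - E_C < 0 | data) falls below the threshold."""
    return prob_toxicity_reduction(counts_exp, counts_ctl) < threshold


# Hypothetical interim data: 150 patients per arm, 3.2 vs 1.6 AEs per patient on average.
sim = np.random.default_rng(7)
ctl = sim.poisson(3.2, size=150)
exp_ = sim.poisson(1.6, size=150)
p_reduction = prob_toxicity_reduction(exp_, ctl)
```

The conjugate Gamma–Poisson choice keeps the sketch short; a model that accounts for overdispersion or patient-level correlation of adverse events would replace `posterior_draws` with draws from that model's posterior.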

We considered all combinations of three efficacy and three toxicity scenarios. The average number of toxicities per patient in the experimental arm is either identical to that in the control arm or reduced by 25% or 50% (toxicity scenarios 1–3). The control and experimental OS and PFS distributions are either as observed in RTOG 1016 (efficacy scenario 1), or the HR (for OS and PFS) between the experimental and control arms is set equal to 1.45 or 1 (efficacy scenarios 2–3).

Figure 1 and Supplementary Figs. S1–S6 show selected operating characteristics from the simulation study. If the deintensified therapy reduces toxicity as intended, the additional stopping rules for toxicity have only a moderate impact on power (Supplementary Fig. S3 for RTOG 1016 and Supplementary Fig. S6 for De-ESCALaTE). In contrast, when the deintensified therapy fails to reduce toxicity, the study is terminated for futility (insufficient reduction of toxicities or inferiority) in >99% of the simulations (Fig. 1). This translates into a reduction of up to 54% in the average number of enrolled patients compared with a Bayesian design without toxicity monitoring (Fig. 1B).

Figure 1.

Operating characteristics of the Bayesian design testing both (i) noninferior survival and (ii) reduction of adverse events (AE) for the deintensified therapy. Dots and crosses correspond to a 0% and a 50% reduction of adverse events under the experimental treatment (an average of 3.2 and 1.6 adverse events per patient, respectively). Results refer to a design that stops the study early for inferiority with probability of approximately 0.8 when $\mathrm{HR}_{\mathrm{OS}} = \mathrm{HR}_{\mathrm{PFS}} = 1.45$, and terminates the study early for insufficient reductions of toxicities with probability 0.6 when the control and experimental arms have identical adverse event distributions. The red horizontal lines indicate the operating characteristics of the RTOG 1016 design, which does not consider adverse events at interim analyses.

In summary, the recently published RTOG 1016 and De-ESCALaTE studies indicate the need for prospective studies before deintensification strategies can be adopted in clinical practice, and show that deintensified experimental treatments in HPV-associated HNSCC can lead to inferior outcomes. Even with the benefit of hindsight, we do not suggest that these impactful studies were flawed. Rather, these studies offer opportunities to evaluate statistical designs for future deintensification trials. Flexible statistical designs, such as Bayesian designs with multiple primary outcomes (23–25), and well-tuned stopping criteria may help align clinical studies with their primary aims, including the control of toxicity events (23) and of the number of patients who receive an inferior treatment. There is a rich literature on statistical designs that leverage data on multiple endpoints to improve early decisions and minimize the risk of exposing patients to inferior or toxic treatments (23–26).

We used the recently published RTOG 1016 and De-ESCALaTE studies to describe the impact of early stopping parameters in deintensification trials. Evidence of noninferiority and of reduced toxicities is captured by established Bayesian models (22, 27). The Bayesian stopping rules are specified using threshold parameters on easy-to-interpret probabilities of inferiority and toxicity reduction (27). We illustrated the use of simulation studies (such as those summarized in Table 1 and Fig. 1) to choose early stopping parameters and to balance the tradeoff between power and the potential number of patients exposed to a treatment that is inferior or that does not reduce toxicities. Interpretable and well-justified early stopping rules are valuable in the setting of deintensification trials, where standard treatment often leads to a favorable outcome in a high proportion of patients. The arguments set forth here could in principle be applied to any deescalation or noninferiority trial in which overall survival needs to be monitored to minimize the number of patients exposed to a potentially inferior treatment.

L. Trippa is an employee/paid consultant for Galera Therapeutics Inc. J.D. Schoenfeld is an employee/paid consultant for Tilos, LEK, and Catenion, and reports receiving other commercial research support from Merck and Bristol-Myers Squibb. No potential conflicts of interest were disclosed by the other author.

Conception and design: S. Ventz, L. Trippa, J.D. Schoenfeld

Development of methodology: S. Ventz, L. Trippa, J.D. Schoenfeld

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Ventz, L. Trippa

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Ventz, L. Trippa, J.D. Schoenfeld

Writing, review, and/or revision of the manuscript: S. Ventz, L. Trippa, J.D. Schoenfeld

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Ventz

Study supervision: S. Ventz, J.D. Schoenfeld

1. Chaturvedi AK, Engels EA, Pfeiffer RM, Hernandez BY, Xiao W, Kim E, et al. Human papillomavirus and rising oropharyngeal cancer incidence in the United States. J Clin Oncol 2011;29:4294–301.
2. Chaturvedi AK, Anderson WF, Lortet-Tieulent J, Curado MP, Ferlay J, Franceschi S, et al. Worldwide trends in incidence rates for oral cavity and oropharyngeal cancers. J Clin Oncol 2013;31:4550–9.
3. Mehanna H, Beech T, Nicholson T, El-Hariry I, McConkey C, Paleri V, et al. Prevalence of human papillomavirus in oropharyngeal and nonoropharyngeal head and neck cancer - Systematic review and meta-analysis of trends by time and region. Head Neck 2013;35:747–55.
4. Boscolo-Rizzo P, Zorzi M, Del Mistro A, Da Mosto MC, Tirelli G, Buzzoni C, et al. The evolution of the epidemiological landscape of head and neck cancer in Italy: is there evidence for an increase in the incidence of potentially HPV-related carcinomas? PLoS One 2018;13:e0192621.
5. Hay A, Ganly I. Targeted therapy in oropharyngeal squamous cell carcinoma: the implications of HPV for therapy. Rare Cancers Ther 2015;3:89–117.
6. Clump DA, Bauman JE, Ferris RL. Cancer of the oropharynx. Surg Oncol Clin N Am 2015;24:509–20.
7. Marur S, Burtness B. Oropharyngeal squamous cell carcinoma treatment: current standards and future directions. Curr Opin Oncol 2014;26:252–8.
8. Ang KK, Harris J, Wheeler R, Weber R, Rosenthal DI, Nguyen-Tân PF, et al. Human papillomavirus and survival of patients with oropharyngeal cancer. N Engl J Med 2010;363:24–35.
9. Gillison ML, Trotti AM, Harris J, Eisbruch A, Harari PM, Adelstein DJ, et al. Radiotherapy plus cetuximab or cisplatin in human papillomavirus-positive oropharyngeal cancer (NRG Oncology RTOG 1016): a randomised, multicentre, non-inferiority trial. Lancet 2019;393:40–50.
10. Munker R, Purmale L, Aydemir Ü, Reitmeier M, Pohlmann H, Schorer H, et al. Advanced head and neck cancer: Long-term results of chemo-radiotherapy, complications and induction of second malignancies. Onkologie 2001;24:553–8.
11. Langendijk JA, Doornaert P, Verdonck-de Leeuw IM, Leemans CR, Aaronson NK, Slotman BJ. Impact of late treatment-related toxicity on quality of life among patients with head and neck cancer treated with radiotherapy. J Clin Oncol 2008;26:3770–6.
12. Mirghani H, Blanchard P. Treatment de-escalation for HPV-driven oropharyngeal cancer: where do we stand? Clin Transl Radiat Oncol 2017;8:4–11.
13. Seiwert TY, Foster CC, Blair EA, Karrison TG, Agrawal N, Melotek JM, et al. OPTIMA: a phase II dose and volume de-escalation trial for human papillomavirus-positive oropharyngeal cancer. Ann Oncol 2019;30:297–302.
14. Marur S, Li S, Cmelak AJ, Gillison ML, Zhao WJ, Ferris RL, et al. E1308: phase II trial of induction chemotherapy followed by reduced-dose radiation and weekly cetuximab in patients with HPV-associated resectable squamous cell carcinoma of the oropharynx - ECOG-ACRIN Cancer Research Group. J Clin Oncol 2017;35:490–7.
15. Mehanna H, Robinson M, Hartley A, Kong A, Foran B, Fulton-Lieuw T, et al. Radiotherapy plus cisplatin or cetuximab in low-risk human papillomavirus-positive oropharyngeal cancer (De-ESCALaTE HPV): an open-label randomised controlled phase 3 trial. Lancet 2019;393:51–60.
16. Chen AM, Felix C, Wang PC, Hsu S, Basehart V, Garst J, et al. Reduced-dose radiotherapy for human papillomavirus-associated squamous-cell carcinoma of the oropharynx: a single-arm, phase 2 study. Lancet Oncol 2017;18:803–11.
17. Mauri L, D'Agostino RB. Challenges in the design and interpretation of noninferiority trials. N Engl J Med 2017;377:1357–67.
18. Regan MM, Barry WT. Trial designs and results supporting treatment de-escalation and escalation. Breast 2017;34:S10–2.
19. Hung HMJ, Wang SJ, O'Neill R. A regulatory perspective on choice of margin and statistical inference issue in non-inferiority trials. Biometrical J 2005;47:28–36.
20. D'Agostino RB, Massaro JM, Sullivan LM. Non-inferiority trials: design concepts and issues - the encounters of academic consultants in statistics. Stat Med 2003;22:169–86.
21. DigitizeIt. Available from: https://www.digitizeit.de/.
22. Ibrahim JG, Chen M-H, Sinha D. Bayesian survival analysis. Switzerland: Springer; 2001.
23. Thall PF, Simon RM, Estey EH. Bayesian sequential monitoring designs for single-arm clinical trials with multiple outcomes. Stat Med 1995;14:357–79.
24. Murray TA, Thall PF, Yuan Y, McAvoy S, Gomez DR. Robust treatment comparison based on utilities of semi-competing risks in non-small-cell lung cancer. J Am Stat Assoc 2017;112:11–23.
25. Cai C, Liu S, Yuan Y. A Bayesian design for phase II clinical trials with delayed responses based on multiple imputation. Stat Med 2014;33:4017–28.
26. Thall PF, Nguyen HQ, Braun TM, Qazilbash MH. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics 2013;69:673–82.
27. Cellamare M, Ventz S, Baudin E, Mitnick CD, Trippa L. A Bayesian response-adaptive trial in tuberculosis: the endTB trial. Clin Trials 2017;14:17–28.

Supplementary data