Successful completion of the Biomarker-integrated Approaches of Targeted Therapy for Lung Cancer Elimination (BATTLE) trial, reported in this issue of Cancer Discovery, is an important advance in the effort to improve clinical trial approaches to the simultaneous development of new therapeutics with matching diagnostic tests so that patients most likely to benefit from these therapies can be identified. Cancer Discovery; 1(1); 17–20. ©2011 AACR.

Commentary on Kim et al., p. 44

Advances in basic cancer research have led to a widely used discovery and development approach for drugs designed to inhibit specific cancer pathways. However, clinical trial designs have not kept pace with basic research advances, and use of traditional, histology-based, “all-comers” phase I and II trial designs for these drugs has led typically to failure in phase III studies or demonstration of “success” based on statistically significant but clinically questionable benefit in an all-comers population. For example, Table 1 lists 9 drugs that failed in 13 phase III trials of unselected (i.e., in the absence of a diagnostic test that predicts tumor responsiveness to a drug) patients with non–small cell lung cancer (NSCLC) during or after the completion of the BATTLE study (1). Indeed, only 2 drugs that target signaling pathways have been approved by the FDA for the treatment of unselected NSCLC patients: erlotinib [small molecule inhibitor of the epidermal growth factor receptor (EGFR) tyrosine kinase] and bevacizumab [monoclonal antibody targeting vascular endothelial growth factor (VEGF)]. Clearly, the approach to developing these kinds of drugs for NSCLC and other cancers needs to change, or we will continue to waste precious clinical trial resources in futile studies.

Table 1.

Signaling pathway–targeting compounds that failed in phase III trials involving unselected NSCLC patients

| Agent | Target | Trial design |
| --- | --- | --- |
| Bexarotene | RXR | Add-on to vinorelbine/cisplatin |
| Sorafenib | Multikinase | Add-on to carboplatin/paclitaxel |
| | | Add-on to gemcitabine/cisplatin |
| Vandetanib | Multikinase | Add-on to erlotinib |
| | | Add-on to docetaxel^a |
| | | Add-on to pemetrexed |
| Bevacizumab | VEGF | Add-on to erlotinib |
| Cediranib | VEGFR | Add-on to carboplatin/paclitaxel |
| Figitumumab | IGF1R | Add-on to carboplatin/paclitaxel |
| | | Add-on to erlotinib |
| Lonafarnib | Farnesyltransferase | Add-on to carboplatin/paclitaxel |
| PF-3512676 | TLR9 | Add-on to carboplatin/paclitaxel |
| Vadimezan | Tumor vasculature | Add-on to carboplatin/paclitaxel |

NOTE: In the original table, shading indicates compounds that were studied in the BATTLE trial (bexarotene, sorafenib, and vandetanib).

Abbreviations: RXR, retinoid X receptor; TLR9, Toll-like receptor 9.

^a Met primary progression-free survival end point of a hazard ratio <0.80; no difference in overall survival.

Although this problem is well recognized and frequently discussed, in practice little has changed in the use of traditional phase I and II trial designs in cancer drug development. In part, this lack of change has resulted from the difficulty of selecting a diagnostic test to identify responsive patient subgroups in early clinical trials, and even when a test has been selected, it has often been incorrect. A good example is the selection of EGFR protein expression to predict responsiveness to the EGFR-targeting antibody cetuximab. This choice was rational, based on preclinical studies of the antibody–receptor interaction (2), but subsequent studies indicated that EGFR protein expression, as assessed by immunohistochemical (IHC) analyses, is not a useful predictor for responsiveness to cetuximab in the clinic (3). Similarly, although intuitively appealing and supported by preclinical experiments (4), insulin-like growth factor 1 receptor (IGF1R) protein expression alone has not been useful in selecting patients who will benefit from treatment with IGF1R-targeting antibodies (5). These kinds of errors are, in some measure, due to the frequent depiction of cancer pathways as relatively simple, well-understood linear networks, when in reality our understanding of these pathways, and their perturbation by therapeutics, remains poor. In addition, agnostic, systems biology approaches to identifying “molecular signatures” for responsive patient subgroups have been hampered by the requirement for relatively large clinical datasets for signature validation, which are not available in early phase I and II trials of a new anticancer agent. Further, molecular signatures derived from preclinical studies have not yet proved reliable for predicting benefit in the clinic (6, 7). Thus, despite major advances in understanding cancer biology over the past 30 years, only 8 diagnostic tests used to select responsive patient subgroups are included in cancer drug labels (estrogen receptor IHC, HER-2 IHC/DNA hybridization assay, EGFR IHC, C-KIT IHC, BCR-ABL chromosome, PML-RAR chromosome, 5 del chromosome, and RAS mutation), with only 3 of these FDA approved (HER-2 IHC/DNA hybridization assay, EGFR IHC, and C-KIT IHC).

The BATTLE trial investigators must be recognized and congratulated as bold innovators in their efforts to move beyond traditional clinical trial designs that are ineffective in simultaneously developing a new therapeutic and a matching diagnostic test. The investigators showed that treatment allocation based on results from multiple assays performed on computed tomography–guided biopsy specimens is feasible and associated with minimal safety risk. The study used an adaptive randomization design for this allocation, which was based on ongoing analyses of the rate of 8-week disease control obtained for 20 biomarker-treatment groups (4 treatments, with 5 biomarker groups, yields 20 combinations; the number of patients in some groups was small, as shown in Table 2 of ref. 1). The results indicate that 8 of the 20 biomarker–treatment matches met the predefined criterion for efficacy: a >80% probability of achieving a >30% 8-week disease control rate (DCR; ref. 1). Some matches that met the efficacy criterion are consistent with our current understanding of markers predictive for drug efficacy, such as KRAS/BRAF mutations predicting for response to sorafenib (8). However, other matches are more difficult to understand, including the finding that the highest DCR for erlotinib (40%) was in the VEGF/VEGFR-2 biomarker group. One might have expected the highest DCR for erlotinib to have been in the EGFR biomarker group (which had a 35% DCR that did not meet the efficacy criterion).
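The efficacy criterion described above (a >80% probability of a >30% 8-week DCR) can be illustrated with a deliberately simplified calculation. The sketch below assumes an independent uniform Beta(1, 1) prior for each biomarker–treatment match, which is an assumption made here for illustration and not the statistical model reported for BATTLE; the function names and the example counts are hypothetical.

```python
from math import comb


def prob_dcr_exceeds(controlled, patients, threshold=0.30):
    """Posterior P(DCR > threshold) under a uniform Beta(1, 1) prior (an assumption).

    With s patients achieving disease control out of n, the posterior is
    Beta(s + 1, n - s + 1). For integer parameters its upper tail equals a
    binomial sum: P(p > x) = P(Binomial(n + 1, x) <= s).
    """
    n = patients + 1
    return sum(
        comb(n, j) * threshold**j * (1 - threshold) ** (n - j)
        for j in range(controlled + 1)
    )


def meets_efficacy_criterion(controlled, patients, threshold=0.30, required=0.80):
    """True if the posterior probability of DCR > threshold exceeds `required`."""
    return prob_dcr_exceeds(controlled, patients, threshold) > required


# Hypothetical counts, not taken from the BATTLE results.
print(meets_efficacy_criterion(controlled=9, patients=15))
print(meets_efficacy_criterion(controlled=2, patients=15))
```

Under this simplification, small groups can satisfy or miss the criterion on the basis of one or two patients, which is one reason the small group sizes noted above matter for interpretation.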

Table 2.

Alternative approach to BATTLE

| Traditional single-arm phase II trial | Biomarker eligibility requirement | Estimated proportion of eligible patients (%) | Sample size needed without early stopping | Expected sample size with early stopping |
| --- | --- | --- | --- | --- |
| Erlotinib | EGFR mutation | 15 | 133 | 100 |
| Vandetanib | VEGFR-2 overexpression (IHC score >100) | 40 | 50 | 38 |
| Erlotinib + bexarotene | RXRα overexpression (nuclear IHC score >30) | 80 | 25 | 19 |
| Sorafenib | KRAS or BRAF mutation | 22 | 91 | 68 |
| Total | | | 299 | 225 |

NOTE: This alternative approach uses 4 separate Simon 2-stage phase II trials that treat only biomarker-positive patients.

Another consideration is that issues with the BATTLE study design complicate interpretation of results. First, as noted by the investigators, partial exclusion of patients with previous erlotinib treatment confounded the adaptive randomization process because these patients could be randomized to only 2 of the 4 treatment arms. Second, technical qualifications of the assays used for biomarker grouping were not reported; thus, the choice of cutoffs for these assays (to determine whether a patient's tumor was “positive” or “negative” for a given biomarker group) could be questioned. These issues make it difficult to conclude that predictive biomarkers have been identified for the treatments. In addition, although the authors assert that the study “validated prespecified hypotheses regarding predictive biomarkers,” the precise biomarker hypotheses, as well as associated type I and type II statistical errors, are not clear. Thus, the study should be considered as generating a hypothesis rather than as confirming a particular biomarker hypothesis.

From a drug developer perspective, it is of interest to ask whether BATTLE results will influence development plans for the investigational drugs included in the study, or whether BATTLE results might have altered development plans if the results had been available before phase III studies were initiated for bexarotene, sorafenib, or vandetanib, all of which yielded negative results in an unselected NSCLC population (Table 1). With regard to future development, although it is not clear whether the manufacturers of bexarotene, sorafenib, or vandetanib will initiate confirmatory studies in biomarker-defined patient populations identified in BATTLE as potentially responsive to these drugs, 2 BATTLE-like studies have been initiated recently, with support from multiple pharmaceutical companies: BATTLE-FL (front line; NCT01263782) and BATTLE-2 (NCT01248247). BATTLE-FL involves lung cancer patients who are chemotherapy naïve for metastatic disease and includes the combination of pemetrexed and carboplatin as a “control group,” with other treatment arms involving addition of anti-VEGF (bevacizumab), anti-EGFR (cetuximab), or anti-IGF1R (cixutumumab) antibodies to the pemetrexed + carboplatin backbone. Similar to BATTLE, BATTLE-2 involves previously treated lung cancer patients and includes erlotinib and sorafenib treatment arms as well as 2 other investigational treatments: a combination of the AKT inhibitor MK-2206 with the MEK inhibitor AZD6244 and a combination of MK-2206 with erlotinib. As noted in the article by Kim and colleagues (1), this study involves an approach to biomarker selection and tumor classification that is different from the approach of the BATTLE trial. In the first half of the study, clinically validated biomarkers (such as KRAS mutation) will define biomarker groups to be used in adaptive treatment allocation. A limited set of additional prespecified biomarkers will be evaluated in tumor biopsies. After analysis of results, biomarkers considered to be potentially predictive for response to each experimental treatment will be selected for use in treatment allocation decisions in the second half of the study.

Although it is not difficult to point out flaws in the BATTLE trial and to question the significance of results in terms of subsequent development of drugs and biomarkers included in the trial, it is more problematic to suggest an alternative, more efficient approach to codevelopment of new therapeutics with matching predictive biomarkers.

Two major innovations of the BATTLE trial are its operational and statistical approaches. From an operational perspective, BATTLE successfully pioneered an ambitious goal of incorporating 4 different treatment arms (requiring cooperation of 4 different pharmaceutical companies) and 5 different biomarker classifiers within a single study, with treatment allocation based on results of a diagnostic biopsy. Considerable operational efficiency is gained by the lack of “screen failures” versus a traditional approach to selecting only biomarker-positive patients for separate phase II studies. For example, consider a traditional single-arm phase II Simon 2-stage approach with null and alternative hypotheses that match those of BATTLE (a null hypothesis DCR of 30% and an alternative hypothesis DCR of 50%, with a 20% type I error rate and 80% power), in which each phase II study treats only biomarker-positive patients. Using the data in Supplementary Table S1 of ref. 1 to calculate the frequency of biomarker-positive patients, up to 299 patients would need to be enrolled and undergo biopsy to obtain 20 biomarker-positive patients for each treatment, compared with the 200 patients required for BATTLE (ref. 9; Table 2). Even with an assumption that all separate phase II studies would stop early, 225 patients (on average) would be needed (Table 2).
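The screening arithmetic behind these totals can be reproduced in a few lines. The sketch below divides the number of biomarker-positive patients to be treated by the estimated eligible proportion from Table 2. Two points are assumptions rather than statements from the text: the rounding convention (round half up, which happens to reproduce the tabulated values) and the figure of 15 expected treated patients per trial under early stopping, which is inferred here from the ratios in Table 2.

```python
# Estimated percentage of biomarker-positive patients, taken from Table 2.
ELIGIBLE_PCT = {
    "erlotinib (EGFR mutation)": 15,
    "vandetanib (VEGFR-2 overexpression)": 40,
    "erlotinib + bexarotene (RXR alpha overexpression)": 80,
    "sorafenib (KRAS or BRAF mutation)": 22,
}


def patients_to_screen(treated, eligible_pct):
    """Patients to enroll and biopsy to find `treated` biomarker-positive patients.

    Computes treated / (eligible_pct / 100), rounded half up, using integer
    arithmetic to avoid floating-point rounding surprises.
    """
    return (2 * treated * 100 + eligible_pct) // (2 * eligible_pct)


def total_screened(treated):
    """Total across the 4 separate single-arm trials."""
    return sum(patients_to_screen(treated, pct) for pct in ELIGIBLE_PCT.values())


print(total_screened(20))  # 299, without early stopping
print(total_screened(15))  # 225, assuming 15 treated patients per trial on average
```

The rarest biomarker dominates the total: the EGFR mutation trial alone accounts for 133 of the 299 screened patients.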

Furthermore, among the 4 treatment arms in the BATTLE trial, erlotinib was the only one FDA approved for lung cancer before initiation of the trial. As described previously (10), if the erlotinib arm is viewed as a control group, then additional operational efficiency is gained by inclusion of 3 experimental treatments with 1 control treatment in a single study, as opposed to the traditional approach of separate randomized phase II trials, each comparing one experimental treatment to erlotinib.

The second major innovation of the BATTLE trial is the statistical approach, using adaptive rather than equal randomization. Adaptive randomization allowed selection of the best treatment arm for each enrolled patient based on accumulating knowledge of the DCR for each of the 20 biomarker–treatment matches evaluated in the study. Simulations performed by the BATTLE statisticians indicate that, even with a requirement for equal randomization among approximately the first 90 of 200 patients (to avoid early skewing of the adaptive randomization process), the adaptive randomization approach provides higher expected overall DCRs than does an equal randomization approach (9). The adaptive approach is attractive to both patients and physicians because it epitomizes the idea of “personalized medicine.” Notably, a similar approach is used in the Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis 2 (I-SPY 2) trial, which involves patients with newly diagnosed breast cancer who are eligible for neoadjuvant treatment with a taxane (11).
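The general idea of outcome-adaptive randomization can be sketched as follows. This is a minimal illustration, not the BATTLE algorithm: it assumes that, within one biomarker group, each treatment is assigned with probability proportional to its posterior mean DCR under an independent Beta(1, 1) prior. The function names and the example counts are hypothetical, and the initial equal-randomization period described above is not modeled.

```python
import random


def allocation_probabilities(outcomes):
    """Return a randomization probability for each treatment.

    `outcomes` maps treatment -> (patients with 8-week disease control,
    patients treated) within a single biomarker group. Weights are the
    Beta(1, 1) posterior means (s + 1) / (n + 2), normalized to sum to 1.
    """
    means = {arm: (s + 1) / (n + 2) for arm, (s, n) in outcomes.items()}
    total = sum(means.values())
    return {arm: mean / total for arm, mean in means.items()}


def assign_treatment(outcomes, rng=random):
    """Randomly assign the next patient according to the current probabilities."""
    probs = allocation_probabilities(outcomes)
    arms = list(probs)
    return rng.choices(arms, weights=[probs[arm] for arm in arms], k=1)[0]


# Hypothetical accumulating data for one biomarker group.
observed = {
    "erlotinib": (2, 8),
    "vandetanib": (3, 8),
    "erlotinib + bexarotene": (1, 8),
    "sorafenib": (6, 8),
}
print(allocation_probabilities(observed))
print(assign_treatment(observed, random.Random(1)))
```

With no accumulated data, every arm receives the same probability, so the scheme reduces to equal randomization at the start of a trial.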

A caveat regarding the use of adaptive randomization is that the differences in expected DCRs for adaptive versus equal randomization in the BATTLE trial simulations were relatively small (9), and that Korn and Freidlin (12) have reported similar simulations for 2-arm studies, in which the advantages of adaptive versus equal randomization (or alternate fixed randomization, such as 2:1) in the probability of disease control were found to be quite small and of questionable advantage from a trial design perspective.

An alternative approach to BATTLE, using fixed randomization with a similar number of patients, can be used to determine whether codevelopment of an experimental agent with a matching diagnostic test should move forward. For example, we consider a 240-patient trial with equal randomization comparing 3 experimental groups with a common control, with 5 distinct biomarker subpopulations equally distributed across the treatment groups (yielding 20 biomarker–treatment matches, as in BATTLE). A set of decision rules may be constructed on the basis of observed P values, comparing each experimental treatment with the control. If the test in the all-comers population is not significant at a prespecified threshold (e.g., P > 0.2), then further development may not be considered; conversely, further development of an experimental treatment in all comers might be considered if the test is significant (e.g., P < 0.05). An observed trend toward statistical significance (e.g., 0.05 ≤ P < 0.20) could trigger a comparison of the experimental treatment with the control in each biomarker subpopulation, with additional development in a specific subpopulation considered if P < 0.05. Using this example and assuming a DCR of 30% in all but one biomarker subpopulation for a given experimental treatment, we see that an underlying 80% DCR would be required to have a >50% probability of moving an experimental treatment forward, either in a biomarker subpopulation or in an all-comers population. Although an 80% DCR may seem high, precedent for this kind of efficacy exists in well-matched biomarker–treatment combinations (such as BRAF mutation and the BRAF inhibitor PLX4032) and is arguably an appropriate expectation for further development of new therapeutics with matching diagnostic tests.
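The example decision rules can be written out explicitly. The sketch below encodes only the thresholds given in the text. Two choices are assumptions made here: a P value of exactly 0.20 is treated as not significant, and a trend with no subpopulation reaching P < 0.05 leads to no further development. The function name, return format, and example P values are hypothetical.

```python
def development_decision(p_all_comers, p_by_subpopulation):
    """Apply the example decision rules for one experimental treatment.

    `p_all_comers` is the P value versus control in all patients;
    `p_by_subpopulation` maps biomarker subpopulation -> P value versus control.
    Returns (decision, list of subpopulations selected for development).
    """
    if p_all_comers < 0.05:
        return "develop in all comers", []
    if p_all_comers < 0.20:
        # Trend toward significance: examine each biomarker subpopulation.
        selected = sorted(
            name for name, p in p_by_subpopulation.items() if p < 0.05
        )
        if selected:
            return "develop in biomarker subpopulation", selected
    return "no further development", []


# Hypothetical P values for illustration only.
subgroups = {"EGFR": 0.40, "KRAS/BRAF": 0.03, "VEGF/VEGFR-2": 0.25}
print(development_decision(0.12, subgroups))
print(development_decision(0.30, subgroups))
```

Note that under these rules the subpopulation tests are consulted only when the all-comers result falls in the intermediate range, so a strong effect confined to one small subpopulation can be missed if it does not move the all-comers P value below 0.20.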

Statistical designs different from those of BATTLE and I-SPY 2 have been proposed for randomized trials that include predictive biomarker hypotheses (13, 14). These designs do not require a prespecified biomarker test used for treatment assignment, which avoids the screen failure problem described above. The cross-validated adaptive signature design (14) is particularly attractive because it can test multiple biomarkers in a large trial with many clinical end points in the true population of interest. Statistical validity of selected subgroups is characterized by evaluating the signature selection procedure across multiple (e.g., 10) nonoverlapping subsamples: for each 10% subsample, the complementary 90% of patients is used for training, and the 10% subsample is used to test how well the resulting predictive model works. Combining variability across these samples gives a composite predictive value of the signature selection procedure. Although signatures in the 10 subsets evaluated will vary, they presumably will predict patient outcomes and treatment benefit similarly well; that is, they are all different, but similar, “versions of the truth.” Not having to prespecify a set of biomarkers to test before enrollment has the major advantage of accommodating how little we often know at the beginning of a pivotal trial about which subgroups may benefit from a novel treatment regimen.
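The partitioning step of such a cross-validation scheme can be sketched as follows. This illustrates only how patients might be split into 10 nonoverlapping subsamples with complementary training sets; it does not implement signature development or the significance testing of the design in ref. 14. The function name, the seed, and the 240-patient example are hypothetical.

```python
import random


def cross_validation_folds(n_patients, n_folds=10, seed=0):
    """Yield (train_ids, test_ids) pairs for nonoverlapping subsamples.

    Patients are shuffled once and dealt into `n_folds` disjoint test sets;
    each training set is the complement of its test set.
    """
    ids = list(range(n_patients))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for test_ids in folds:
        held_out = set(test_ids)
        train_ids = [i for i in ids if i not in held_out]
        yield train_ids, test_ids


# Each patient is held out exactly once, so every patient receives a
# prediction from a signature developed without that patient's data.
for train_ids, test_ids in cross_validation_folds(240):
    print(len(train_ids), len(test_ids))
```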

In summary, successful completion of the BATTLE study is an important milestone in the war against cancer. Biomarker-based approaches like those of BATTLE and I-SPY 2, as well as fixed randomization alternatives, represent major advances in trial design that should accelerate identification of predictive biomarkers for novel therapeutics.

No potential conflicts of interest were disclosed.

1. Kim ES, Herbst RS, Wistuba II, Lee JJ, Blumenschein GR Jr, Tsao A, et al. The BATTLE Trial: personalizing therapy for lung cancer. Cancer Discovery 2011;1:OF42–OF51.
2. Goldenberg A, Masui H, Divgi C, Kamrath H, Pentlow K, Mendelsohn J. Imaging of human tumor xenografts with an indium-111-labeled anti-epidermal growth factor receptor monoclonal antibody. J Natl Cancer Inst 1989;81:1616–25.
3. Chung KY, Shia J, Kemeny NE, Shah M, Schwartz GK, Tse A, et al. Cetuximab shows activity in colorectal cancer patients with tumors that do not express the epidermal growth factor receptor by immunohistochemistry. J Clin Oncol 2005;23:1803–10.
4. Gong Y, Yao E, Shen R, Goel A, Arcila M, Teruya-Feldstein J, et al. High expression levels of total IGF-1R and sensitivity of NSCLC cells in vitro to an anti-IGF-1R antibody (R1507). PLoS One 2009;4:e7273.
5. Gualberto A, Dolled-Filhart M, Gustavson M, Christiansen J, Wang YF, Hixon ML, et al. Molecular analysis of non-small cell lung cancer identifies subsets with different sensitivity to insulin-like growth factor I receptor inhibition. Clin Cancer Res 2010;16:4654–65.
6. Baggerly K, Coombes K. Deriving chemosensitivity from cell lines: forensic bioinformatics and reproducible research in high-throughput biology. Ann Appl Stat 2009;3:1309–34.
7. Potti A, Dressman HK, Bild A, Riedel RF, Chan G, Sayer R, et al. Genomic signatures to guide the use of chemotherapeutics. Nat Med 2006;12:1294–300.
8. Wilhelm SM, Carter C, Tang L, Wilkie D, McNabola A, Rong H, et al. BAY 43-9006 exhibits broad spectrum oral antitumor activity and targets the RAF/MEK/ERK pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis. Cancer Res 2004;64:7099–109.
9. Zhou X, Liu S, Kim ES, Herbst RS, Lee JJ. Bayesian adaptive design for targeted therapy development in lung cancer—a step toward personalized medicine. Clin Trials 2008;5:181–93.
10. Freidlin B, Korn EL, Gray R, Martin A. Multi-arm clinical trials of new agents: some design considerations. Clin Cancer Res 2008;14:4368–71.
11. Barker AD, Sigman CC, Kelloff GJ, Hylton NM, Berry DA, Esserman LJ. I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clin Pharmacol Ther 2009;86:97–100.
12. Korn EL, Freidlin B. Outcome-adaptive randomization: is it useful? J Clin Oncol 2011;29:771–6.
13. Baker SG, Sargent DJ. Designing a randomized clinical trial to evaluate personalized medicine: a new approach based on risk prediction. J Natl Cancer Inst 2010;102:1756–9.
14. Freidlin B, Jiang W, Simon R. The cross-validated adaptive signature design. Clin Cancer Res 2010;16:691–8.