We compiled and analyzed a database of cooperative group trials in advanced pancreatic cancer to develop historical benchmarks for overall survival (OS) and progression-free survival (PFS). Such benchmarks are essential for evaluating new therapies in a single-arm setting. The analysis included patients with untreated metastatic pancreatic cancer receiving regimens that included gemcitabine, between 1995 and 2005. Prognostic baseline factors were selected by their significance in Cox regression analysis. Outlier trial arms were identified by comparing individual 6-month OS and PFS rates against the entire group. The dataset selected for the generation of OS and PFS benchmarks was then tested for intertrial arm variability using a logistic-normal model with the selected baseline prognostic factors as fixed effects and the individual trial arm as a random effect. A total of 1,132 cases from eight trials qualified. Performance status and sex were independently significant for OS, and performance status was prognostic for PFS. Outcomes for one trial (NCCTG-034A) were significantly different from the other trial arms. When this trial was excluded, the remaining trial arms were homogeneous for OS and PFS outcomes after adjusting for performance status and sex. Benchmark values for 6-month OS and PFS are reported along with a method for using these values in future study design and analysis. The benchmark survival values were generated from a dataset that was homogeneous between trials. The benchmarks can be used to enable single-arm phase II trials using a gemcitabine platform, especially under certain circumstances. Such circumstances might be when a randomized control arm is not practically feasible, an early signal of activity of an experimental agent is being explored such as in expansion cohorts of phase I studies, and in patients who are not candidates for combination cytotoxic therapy. Clin Cancer Res; 20(16); 4176–85. ©2014 AACR.

Phase II clinical trials in cancer have, in recent years, focused increasingly on “targeted” agents that are “cytostatic” rather than “cytotoxic.” Although most agents that ultimately prove to be useful in the clinic demonstrate at least some disease stability, many authors feel that a traditional treatment response endpoint for phase II trials in solid tumors is less relevant for testing the newer targeted agents (1). Researchers therefore frequently prefer to measure treatment success in terms of overall survival (OS) or progression-free survival (PFS) rather than clinical response. For survival and PFS endpoints in the phase II setting, one may choose between a single-arm approach, which compares trial results with some historical benchmark, or a randomized phase II trial with two or more arms, where the “control” arm provides the benchmark for judging success. The Clinical Trial Design Task Force of the NCI Investigational Drug Steering Committee has recommended the randomized approach in the phase II setting, especially when evaluating combinations of agents (2). However, the single-arm approach is deemed appropriate for the evaluation of single-agent experimental therapies, and where a well-defined historical control database is available (2, 3).

Single-arm designs have the advantage of requiring fewer patients, all of whom receive the experimental treatment. The conduct of trials requires patients, funding, and effort. With a multitude of candidate treatments and limitations on funding and time, an expedited result through a single-arm trial is desirable when feasible. However, researchers may have difficulty arriving at an appropriate historical benchmark against which to compare their results (4).

To address the problem of reliable historical benchmarks for single-arm phase II trials, efforts have already been made in specific disease sites, such as stage IV melanoma (3), to amass historical databases and derive historical control data for future trials. The current effort, part of the aforementioned NCI-sponsored task force, has resulted in the compilation of clinical trial data in two specific diseases: advanced pancreatic cancer and advanced non–small cell lung cancer. We report here on the advanced pancreatic cancer database and the benchmarks derived for previously untreated advanced pancreatic cancer. All trials were conducted by cooperative groups in the United States from 1995 to 2005. These clinical trial data were compiled and analyzed specifically to provide the appropriate benchmarks for the planning and analysis of future phase II trials in this disease.

Historically, certain trials in advanced pancreatic cancer included locally advanced unresectable disease. More recently, and certainly for the future, trials will select exclusively for either locally advanced or metastatic disease so that these two patient populations can be studied separately (5). Therefore, in accordance with the primary objective to provide benchmarks for future trials in advanced, metastatic pancreatic cancer, the decision was made to focus initially on cases with metastatic disease and no prior chemotherapy for pancreatic cancer. Cases with locally advanced disease will be considered separately. In addition, because the use of gemcitabine for advanced pancreatic cancer represented a change in treatment standard for the disease (6), and continues to be a treatment standard, we include for this analysis only those trial arms in which gemcitabine is part of the treatment regimen. The recent use of FOLFIRINOX for some patient groups is presented in the discussion.

Data sources and selection

Patient-level and trial-level data from eligible cooperative group trial participants were collected from U.S. Cooperative Group phase II and phase III trials in advanced pancreatic cancer, accrued during the time period of January, 1995, through December, 2005. Four cooperative groups submitted data for pancreatic cancer: SWOG, the North Central Cancer Treatment Group (NCCTG), the Eastern Cooperative Oncology Group (ECOG), and the Radiation Therapy Oncology Group (RTOG). The study was approved by the institutional review board (IRB) for the organization compiling and analyzing the data (Cancer Research and Biostatistics) and by the IRBs serving each participating cooperative group data center. Waiver of consent was approved due to the deidentified and retrospective nature of the data.

Patient-level data elements requested were age at registration, sex, race, Zubrod performance status (PS), location and extent of disease at baseline, prior treatment, laboratory values, and assigned treatment arm. The extent to which the data element requests were fulfilled varied by trial, so that certain baseline factors were missing from various subsets of the full dataset. Trial-level variables included activation and closure dates, the treatment setting (first line vs. second or more), details on the treatment regimen, type of trial (phase II vs. III), objective tumor response and disease progression criteria in use, and type of disease evaluation required at baseline. Patient-level outcome data included best response to treatment, time to progression, and time to death or last contact. Trials activating before the 2000-to-2004 time frame used bidimensional response and progression criteria as described below. Those activated in the year 2000 or later used unidimensional RECIST (7).

Statistical methods

OS was calculated as the time from registration until death or last contact, with censoring at last contact if the patient was still living. PFS was calculated as the time from registration until documented progression of disease (according to individual trial criteria), or death, whichever occurred first. Cases alive without progression were censored at the date of last contact.
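The endpoint definitions above can be sketched in code. This is a minimal illustration, not the authors' SAS implementation: the `Case` record, its field names, and the average month length used for conversion are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average month length used to report months


@dataclass
class Case:
    # Hypothetical field names; the contributed datasets may be laid out differently.
    registered: date
    last_contact: date
    died: Optional[date] = None
    progressed: Optional[date] = None


def months_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH


def os_endpoint(c: Case) -> Tuple[float, bool]:
    """Return (months, event); event=False means censored at last contact."""
    if c.died is not None:
        return months_between(c.registered, c.died), True
    return months_between(c.registered, c.last_contact), False


def pfs_endpoint(c: Case) -> Tuple[float, bool]:
    """Progression or death, whichever occurs first; otherwise censored."""
    events = [d for d in (c.progressed, c.died) if d is not None]
    if events:
        return months_between(c.registered, min(events)), True
    return months_between(c.registered, c.last_contact), False
```

Each function returns a (time, event) pair, which is the input form expected by Kaplan–Meier and Cox regression routines.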

The selection of baseline patient-level factors to carry forward as part of the determination of OS and PFS benchmarks was achieved by the combined information from statistical modeling and practical considerations. Practical considerations included the availability of a given data element in a sufficient proportion of the submitted data, and the practicality of measuring or predicting the distribution of a given factor in a real trial population. The selection of independently prognostic baseline covariates was achieved via Cox regression (8) with stepwise variable selection, applied separately to OS and PFS. Univariately significant patient-level factors with sufficient representation in the database were subjected to multiple factor Cox regression with stepwise selection to determine independent prognostic significance. Factors that were not univariately significant (but sufficiently represented) were initially included as well, to explore for the possibility of confounding. Factors that were insufficiently represented in the data were excluded from consideration. As a rule, a factor that was represented in less than 80% of the cases available for analysis, or was missing completely from an individual contributing group in the final analysis set, could not be considered. Survival curves and estimates of medians and rates were estimated by the Kaplan–Meier method (9).
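The 80% availability rule can be illustrated with the "Not reported" counts from Table 4. This sketch covers only the overall-availability part of the rule; the second condition (a factor missing completely from one contributing group) needs per-group counts that are not tabulated here, so it is not modeled. The function name is ours.

```python
TOTAL_CASES = 1132

# "Not reported" counts as listed in Table 4 (sex has no missing values).
not_reported = {"sex": 0, "PS": 16, "race": 18, "bilirubin": 540, "AST/ALT": 343}

MIN_FRACTION = 0.80  # threshold stated in the text


def sufficiently_represented(missing: int, total: int = TOTAL_CASES) -> bool:
    """True when the factor is available in at least 80% of cases."""
    return (total - missing) / total >= MIN_FRACTION


eligible = [name for name, miss in not_reported.items() if sufficiently_represented(miss)]
print(eligible)
```

Under this screen sex, PS, and race remain candidates, while bilirubin (about 52% available) and AST/ALT (about 70% available) fall below the threshold.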

Between-trial variability in the 6-month OS rate and the 6-month PFS rate was assessed in two ways: (i) the OS and PFS rate for each trial arm was compared with the overall rate using a Bonferroni correction; (ii) once prognostic baseline factors were selected, a logistic-normal model with individual trial arm as a random effect was used to estimate between-trial variability not accounted for by differing distributions of the baseline factors. These methods for assessing between-trial variability were also described in the context of historical melanoma trials by Korn and colleagues (3). This model was applied to the entire set of trials, as well as to the reduced set where outliers identified in the first method were eliminated. OS and PFS benchmarks were calculated for prognostic groups according to the selected baseline factors, eliminating any trial that was found to be an outlier in step 1. All analyses were performed using SAS version 9.2.
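One common way to write the logistic-normal model in step (ii) is shown below. The notation is ours, not the authors': y_ij indicates that patient i in trial arm j is alive (or progression free) at 6 months, x_ij holds the selected baseline factors, and u_j is the trial-arm random effect.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid u_j)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim N(0, \sigma^2)
```

In this form, homogeneity across trial arms after covariate adjustment corresponds to the null hypothesis σ² = 0, which is what a test of the variance component addresses.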

Characteristics of the study population

A complete listing of the trials involving 2,341 cases that were contributed by four cooperative groups is given in Table 1. After screening to include only those cases with previously untreated metastatic disease enrolled to trial arms that incorporated gemcitabine as part of the regimen, there were 1,132 cases available for this analysis. The included cases were from eight trials (10 trial arms) with activation dates ranging from 1996 to 2004, as characterized in Table 2. The three ECOG trials that were activated before the year 2000 used ECOG Solid Tumor Response Criteria (10). With respect to determining progression, the ECOG criteria are comparable with standard World Health Organization (WHO) bidimensional progression criteria (11), in which a 25% increase in the product of perpendicular diameters in any single lesion (or the appearance of a new lesion) qualified as disease progression. The NCCTG trial activated before the year 2000 used comparable bidimensional criteria. All other trials used RECIST, which for progression requires a 20% increase in the sum of unidimensional diameters over the smallest sum observed during treatment, or the observation of new lesions. Assessment intervals ranged from 6 weeks to 8 weeks (or two cycles of gemcitabine). Reported outcomes for each included trial arm are given in Table 3.

Table 1.

Complete listing of data contributed by group and protocol

Protocol | Type of trial | Protocol Tx | Setting | N
ECOG-1202 | Phase II | Chemotherapy | First line | 10
ECOG-1298 | Phase II | Chemotherapy | First line | 32
ECOG-2297 | Phase III | Chemotherapy | First line | 321
ECOG-3292 | Phase II | Chemo/immunotherapy | First line | 24
ECOG-3296 | Phase II | Chemotherapy | First line | 36
NCCTG-0043 | Phase II | Chemo/biologic | First line | 48
NCCTG-014C | Randomized phase II | Biologic | First line | 42
NCCTG-014C | Randomized phase II | Chemo/biologic | First line | 39
NCCTG-034A | Phase II | Chemo/biologic | First line | 79
NCCTG-894352 | Phase III | Chemotherapy | Either | 94
NCCTG-924352 | Phase II | Chemotherapy | Either | 46
NCCTG-964351 | Phase II | Chemotherapy | First line | 13
NCCTG-984351 | Phase II | Chemotherapy | Either | 58
NCCTG-9942 | Phase II | Concurrent chemo/RT | First line | 46
RTOG-0020 | Randomized phase II | Concurrent chemo/RT | First line | 91
RTOG-0411 | Phase II | Chemo/biologic/RT | First line | 82
RTOG-9102 | Phase III | Concurrent chemo/RT | First line | 27
RTOG-9209 | Phase II | Concurrent chemo/RT | First line | 50
RTOG-9812 | Phase II | Concurrent chemo/RT | Either | 109
SWOG-0107 | Phase II | Chemotherapy | First line | 60
SWOG-0205 | Phase III | Chemotherapy | First line | 347
SWOG-0205 | Phase III | Chemo/biologic | First line | 348
SWOG-8916 | Phase II | Chemotherapy | First line | 25
SWOG-8933 | Phase II | Chemotherapy | First line | 35
SWOG-9100 | Phase II | Chemo/potentiator | First line | 26
SWOG-9135 | Phase II | Chemotherapy | First line | 39
SWOG-9413 | Phase II | Chemo/immunotherapy | First line | 55
SWOG-9629 | Phase II | Chemo/potentiator | First line | 58
SWOG-9629 | Phase II | Chemo/potentiator | Second line or more | 48
SWOG-9924 | Phase II | Biologic | First line | 53

Abbreviation: RT, radiotherapy.

Table 2.

Trial characteristics for primary analysis cases (N = 1,132)

Protocol | N[a] | Activation year | Type of trial | Treatment | Progression criteria | Assessment interval
ECOG-1298 (18) | 30 | 1999 | Phase II | Gem/docetaxel | ECOG[b] | 8 weeks
ECOG-2297 Arm 1 (19) | 156 | 1998 | Phase III | Gemcitabine | ECOG[b] | 8 weeks
ECOG-2297 Arm 2 (20) | 149 | 1998 | Phase III | Gem/5-FU | ECOG[b] | 8 weeks
ECOG-3296 (20) | 36 | 1996 | Phase II | Gem/5-FU | ECOG[b] | 8 weeks
NCCTG-0043 (21) | 40 | 2001 | Phase II | Gem/ISIS-2503 | RECIST[c] | 6 weeks
NCCTG-014C Arm B (22) | 36 | 2002 | Rand. phase II | Gem/PS-341 | RECIST[c] | 6 weeks
NCCTG-034A (23) | 70 | 2005 | Phase II | Gem/oxali/bev | RECIST[c] | 2 cycles/2 months
NCCTG-984351 (24) | 54 | 1999 | Phase II | Gem/oxaliplatin | WHO[d] | 6 weeks
SWOG-0205 Arm 1 (25) | 282 | 2004 | Phase III | Gem/cetuximab | RECIST[c] | 2 cycles/8 weeks
SWOG-0205 Arm 2 (25) | 279 | 2004 | Phase III | Gemcitabine | RECIST[c] | 2 cycles/8 weeks

Abbreviations: bev, bevacizumab; 5-FU, 5-fluorouracil; Gem, gemcitabine; oxali, oxaliplatin; Rand., randomized.

[a] Number of cases included in the current analysis.

[b] ECOG Solid Tumor Response Criteria (10).

[c] Response Evaluation Criteria In Solid Tumors (7).

[d] World Health Organization Guidelines (11).

Table 3.

Study population and published results for included trials

Protocol | N | M/F (%) | PS 0/1/2 (%) | PFS | OS
ECOG-1298 | 32 | 50/50 | 12/72/16 | 2.1 months median PFS | 4.7 months median OS
ECOG-2297 Arm 1 | 162 | 54/46 | 35/52/14 | 2.2 months median PFS | 5.4 months median OS
ECOG-2297 Arm 2 | 160 | 52/48 | 23/64/14 | 3.4 months median PFS | 6.7 months median OS
ECOG-3296 | 36 | 69/31 | 28/53/19 | 2.4 months median TTF[a] | 4.3 months median OS
NCCTG-0043 | 48 | 37/63 | 34/58/8 | 3.8 months median PFS | 6.7 months median OS
NCCTG-014C Arm B | 36 | 53/47 | 35/58/7 | 2.4 months | 4.8 months
NCCTG-034A | 82 | 70/27[b] | 40/54/6[b] | 5.7 months | 8.1 months
NCCTG-984351 | 46 | 56/44 | 35/41/20 | 4.5 months | 6.2 months
SWOG-0205 Arm 1 | 371 | 54/46 | 87/13[c] | 3.0 months | 5.9 months
SWOG-0205 Arm 2 | 372 | 51/49 | 87/13[c] | 3.4 months | 6.3 months

[a] Time to treatment failure.

[b] Percentages generated from data, not published report.

[c] PS 0–1/2.

Of the 1,132 cases, 1,123 had died and 9 were alive at last contact. The minimum, median, and maximum survival follow-up for the 9 living patients were 2, 43, and 74 months, respectively. All but 3 of the 1,132 patients had progressive disease at the time of death or last contact, or had died due to their cancer. Published results of individual trials are shown in Table 3.

Baseline characteristics for the 1,132 cases are shown in Table 4. Forty-six percent of patients were female, and a majority of patients (87%) had a PS (translated to Zubrod scale) of 0 or 1. Ninety percent of patients were Caucasian, 7% black, <1% Asian, and the remaining 2% Native American, Pacific Islander, or not reported. Seventy-seven percent of cases originated from randomized phase III trials. Thirty-eight percent of cases were from trials activated between 1995 and 2000, and 62% were from trials activated between 2000 and 2005.

Table 4.

Patient characteristics and statistics from univariate and multivariate Cox proportional hazards regression models for 1,132 patients (N = 1,116 cases with complete data for multivariate models)

Factor | N (%) | Comparison | OS, single factor: HR (P), N = 1,132 | OS, multivariate: HR (P), N = 1,116 | PFS, single factor: HR (P), N = 1,132
Sex
 Female | 518 (46%) | | | |
 Male | 614 (54%) | Male vs. female | 1.16 (0.017) | 1.15 (0.02) | 1.09 (0.16)
PS
 0 | 316 (28%) | | | |
 1 | 650 (57%) | PS 1 vs. PS 0 | 1.28 (<0.0001) | 1.27 (<0.0006) | 1.12 (0.09)
 2 | 150 (13%) | PS 2 vs. PS 1 | 1.86 (<0.0001) | 1.87 (<0.0001) | 1.58 (<0.0001)
 Not reported | 16 (1%) | None | NA | NA | NA
Race
 White | 1,020 (90%) | White vs. rest | 0.91 (0.33) | NA | 0.85 (0.10)
 Black | 76 (7%) | Black vs. rest | 1.18 (0.16) | NA | 1.12 (0.35)
 Asian/other | 18 (2%) | Asian/other vs. rest | 0.95 (0.78) | NA | 1.29 (0.13)
 Not reported | 18 (2%) | None | NA | NA | NA
Bilirubin
 Normal | 511 (48%) | | | NA |
 >Normal | 81 (7%) | >Normal vs. normal | 1.13 (0.046) | NA | 1.06 (0.38)
 Not reported | 540 (47%) | None | | |
AST/ALT
 Normal | 474 (42%) | | | |
 >Normal | 315 (28%) | >Normal vs. normal | 1.35 (<0.001) | NA | 1.31 (<0.001)
 Not reported | 343 (30%) | None | NA | NA |

Abbreviation: NA, not available.

Prognostic baseline factors

Univariate survival statistics for the considered baseline factors with respect to OS and PFS are shown in Table 4. Baseline factors considered were sex, race (white vs. black vs. Asian vs. other), PS, bilirubin > normal, and serum aspartate aminotransferase (AST)/alanine aminotransferase (ALT) > normal. Location of metastatic disease was not considered due to insufficient data availability. Certain laboratory values such as albumin and platelet counts were also not consistently available across the submitted databases.

OS findings were as follows. Race was not univariately significant, as no single race was significantly different from the others, possibly because the distribution was overwhelmingly white. The race factor was explored in the multivariate setting initially, to assess for the possibility of effect modification with other factors, and was then eliminated from consideration. AST/ALT was significant univariately (HR, 1.35; P < 0.001 for abnormal AST/ALT) but was not entered into multivariate analyses because of insufficient numbers with available data (available for only 839 cases). Serum bilirubin level was significantly prognostic for survival (HR, 1.52 for bilirubin above normal; P < 0.001), but was missing in 540 (nearly half) of the cases. Imputation of missing laboratory values was considered, but good surrogates for an imputation model were lacking. The remaining factors (sex and PS) were entered into a Cox regression analysis with stepwise elimination, and the final model included both factors as independently associated with survival.

For PFS, race, sex, and bilirubin were not significantly prognostic. AST/ALT was significant (HR, 1.34; P < 0.001), but again, the factor was not considered further due to lack of sufficient data. PS was independently prognostic (Table 4).

Variability between trial arms

The overall 6-month survival rate for the combined 10 trial arms was 48% [95% confidence interval (CI) 44%–50%]. When comparing the individual 6-month OS rates of each of the 10 trial arms with the overall rate of 48% with correction for multiple comparisons, one trial arm, the single-arm phase II trial NCCTG-034A (gemcitabine + bevacizumab + oxaliplatin) with a 6-month survival rate of 67% for 70 patients, differed substantially from the group. A logistic-normal model for the binary 6-month OS outcome was applied with PS and male sex included as fixed effects, and a random effect with an assumed normal distribution to represent the residual variance component not explained by the two baseline covariates. The P value for the variance component (by likelihood ratio test) was 0.032, indicating significant inter-trial-arm variability. The same model after excluding the outlier trial NCCTG-034A yielded a nonsignificant P value of 0.14 for the inter-trial variance component. Thus, with respect to variability between trial arms in 6-month OS rates, removing NCCTG-034A achieves the homogeneity needed for reliable benchmarks.
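The outlier screen for NCCTG-034A can be approximated with an exact binomial test and a Bonferroni adjustment. The specific test used for the arm-level comparison is not stated here, so the two-sided exact test (doubling the smaller tail) is an assumption, as is the count of 47 survivors, which is back-calculated from the reported 67% of 70 patients and ignores any censoring before 6 months.

```python
from math import comb


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided_p(k: int, n: int, p: float) -> float:
    """Two-sided exact P value by doubling the smaller tail (assumed convention)."""
    upper = binom_tail_ge(k, n, p)
    lower = 1.0 - binom_tail_ge(k + 1, n, p)  # P(X <= k)
    return min(1.0, 2.0 * min(upper, lower))


N_ARMS = 10          # number of trial arms compared with the overall rate
OVERALL_RATE = 0.48  # overall 6-month OS rate reported in the text

# Assumed count: 67% of 70 patients is roughly 47 alive at 6 months.
p_raw = two_sided_p(47, 70, OVERALL_RATE)
p_bonferroni = min(1.0, N_ARMS * p_raw)
print(f"raw P = {p_raw:.4f}, Bonferroni-adjusted P = {p_bonferroni:.4f}")
```

Under these assumptions the adjusted P value falls below 0.05, in line with the arm being flagged as an outlier.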

For the 6-month PFS rates, a comparison of each arm to the overall 6-month PFS rate of 24% (95% CI, 22%–27%) identified the same outlier, NCCTG-034A, with a 6-month PFS rate of 44%. There was a significant variance component in the logistic-normal model even after adjusting for the covariate of PS (P = 0.02), representing considerable between-trial-arm variability. When the outlier trial was removed from the dataset, the between-trial-arm variance component was no longer significant (P is close to 1 when adjusted for PS). On the basis of these findings, a suitable benchmark for PFS might be derived using the reduced dataset, which, again, excludes the outlying trial arm from NCCTG-034A.

Survival and PFS

The median OS for all trials combined, henceforth excluding NCCTG-034A, was 5.7 months (95% CI, 5.3–6.0 months), and the median PFS was 2.9 months (95% CI, 2.6–3.4 months). There were no differences in OS or PFS when comparing the data from phase II versus phase III trials (P = 0.70 for OS and P = 0.67 for PFS, by log-rank test; Fig. 1A and B). Likewise, there were no significant differences in OS or PFS when comparing trials activated from 1995 to 2000 against those activated from 2000 to 2004 (Fig. 1C and D). Benchmarks for OS and PFS according to the chosen factors (sex and PS) are given in Tables 5 and 6, respectively. The subgroups with good PS (0 or 1 on the Zubrod scale) had the better prognoses for OS and PFS, and females had a slight OS advantage over males. Although sex was not a statistically significant factor in the Cox regression analysis for PFS, we chose to include this factor in the derivation of benchmarks for PFS to maintain consistency with the OS benchmarks. Predicted survival rates for cases classified by sex and PS are also provided in these tables and are used in the study planning and analysis examples (Examples 1 and 2). Figure 2A and B show OS and PFS curves for the same benchmark categories.

Figure 1.

Cooperative group trials in advanced pancreatic cancer with regimens containing gemcitabine. N = 1,062 (NCCTG-034A excluded). A, OS according to type of clinical trial (phase II vs. phase III). B, PFS according to type of clinical trial (phase II vs. phase III). C, OS according to time period of trial activation. D, PFS according to time period of trial activation.

Figure 2.

Cooperative group trials in advanced pancreatic cancer with regimens containing gemcitabine. N = 1,046 cases with complete data for PS (NCCTG-034A excluded). A, OS according to sex and PS. B, PFS according to PS and sex.

Table 5.

OS statistics for advanced pancreatic cancer according to sex and PS

PS | Female: N | Female: predicted[a] 6-month/12-month OS | Female: observed 6-month/12-month OS | Female: median OS, months (95% CI) | Male: N | Male: predicted[a] 6-month/12-month OS | Male: observed 6-month/12-month OS | Male: median OS, months (95% CI)
PS 0 | 130 | 61%/26% | 64%/30% | 7.62 (6.41–9.4) | 158 | 60%/20% | 57%/17% | 6.65 (5.91–7.2)
PS 1 | 286 | 47%/18% | 45%/16% | 5.37 (5.06–6.05) | 326 | 46%/14% | 47%/16% | 5.73 (4.76–6.34)
PS 2 | 68 | 27%/4.0% | 26%/4.4% | 3.81 (2.66–4.86) | 78 | 26%/2.9% | 27%/2.6% | 2.51 (1.71–3.25)

[a] Predicted rates are conditional on the levels of the covariates (sex and PS) and are derived from a logistic regression model.

Table 6.

PFS statistics for advanced pancreatic cancer according to sex and PS

PS | Female: N | Female: predicted[a] 6-month PFS | Female: observed 6-month PFS | Female: median PFS, months (95% CI) | Male: N | Male: predicted[a] 6-month PFS | Male: observed 6-month PFS | Male: median PFS, months (95% CI)
PS 0 | 130 | 26% | 27% | 3.60 (2.66–4.73) | 158 | 25% | 23% | 3.63 (2.83–3.94)
PS 1 | 286 | 25% | 24% | 3.45 (2.66–3.68) | 326 | 24% | 24% | 2.51 (2.07–3.38)
PS 2 | 68 | 15% | 13% | 2.14 (1.58–3.15) | 78 | 14% | 15% | 1.64 (1.41–2.76)

[a] Predicted rates are conditional on the levels of the covariates (sex and PS) and are derived from a logistic regression model.

Application of the benchmark algorithm for future phase II trial designs

The variation in PFS or OS based on the prognostic factors indicated above (sex and PS) can be combined to arrive at benchmarks (or null hypothesis values) for future clinical trials. Estimates from Tables 5 and 6 above can be used to predict survival for patients registered to a new study, depending on the specific proportions with respect to the prognostic factors. Example 1 shows the procedure for designing a study with the 6-month OS rate as primary endpoint. One can estimate (or guess) the potential frequency of patient groups to come up with an appropriate benchmark for the expected study sample. The guesses will not dramatically affect the appropriate sample size. The expected fraction of patients in each of the (sex and PS) groupings is multiplied by the 6-month OS estimates from Table 5 [following the algorithm of Korn and colleagues (3)] to generate a null predicted 6-month (or 12-month) survival rate:

Example 1: Designing a single-arm phase II trial with 6-month OS rate as primary endpoint.

  1) Estimate expected frequencies of patient groups to be accrued, to determine πP0, the benchmark null hypothesis:
     Example:
     Female, PS 0 (5%); female, PS 1 (30%); female, PS 2 (10%)
     Male, PS 0 (5%); male, PS 1 (40%); male, PS 2 (10%)

  2) Use predicted rates in Table 5 to calculate the predicted 6-month OS rate for this sample.
     Using the above frequencies:
     πP0 = 0.05 × 0.64 + 0.30 × 0.45 + 0.10 × 0.26 + 0.05 × 0.57 + 0.40 × 0.47 + 0.10 × 0.27 = 0.44 or 44%.

  3) Specify the alternative πPA.
     Example: πPA = πP0 + 0.15.

  4) Use usual binomial sample size calculators with assumption of complete follow-up at 6 months.
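The steps of Example 1 can be sketched as follows, using the Predicted column of Table 5. The arithmetic printed in step 2 uses values that match the Observed column; both sets of rates give about 0.44 for this patient mix. The exact single-stage binomial search stands in for the "usual binomial sample size calculators" and is only one possible choice. The type 1 error of 0.10 is borrowed from the illustrative threshold in Example 2, and the 90% power target, the search limit, and all function names are assumptions.

```python
from math import comb

# Predicted 6-month OS rates from Table 5, keyed by (sex, PS).
PREDICTED_6MO_OS = {
    ("F", 0): 0.61, ("F", 1): 0.47, ("F", 2): 0.27,
    ("M", 0): 0.60, ("M", 1): 0.46, ("M", 2): 0.26,
}


def benchmark(mix, rates=PREDICTED_6MO_OS):
    """Weighted average of subgroup rates; mix maps (sex, PS) -> expected fraction."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(frac * rates[group] for group, frac in mix.items())


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def single_stage_design(p0, p1, alpha=0.10, power=0.90, n_max=300):
    """Smallest n and critical count r with exact size <= alpha and power >= power."""
    for n in range(1, n_max + 1):
        # Smallest r such that rejecting when X >= r keeps the type 1 error <= alpha.
        r = next(k for k in range(n + 2) if upper_tail(k, n, p0) <= alpha)
        if upper_tail(r, n, p1) >= power:
            return n, r
    raise ValueError("no design found up to n_max")


# Step 1: expected patient mix from Example 1.
mix = {("F", 0): 0.05, ("F", 1): 0.30, ("F", 2): 0.10,
       ("M", 0): 0.05, ("M", 1): 0.40, ("M", 2): 0.10}
p0 = benchmark(mix)        # step 2: null rate, about 0.44
p1 = p0 + 0.15             # step 3: alternative
n, r = single_stage_design(p0, p1)  # step 4
print(f"null = {p0:.4f}, alternative = {p1:.4f}, n = {n}, reject if >= {r} alive at 6 months")
```

Because the exact binomial test is discrete, the required n does not decrease smoothly as the design parameters change, so results from other calculators may differ by a few patients.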

Example 2 shows the procedure for the analysis of a completed trial given the actual fraction of patients in each grouping. The calculated predicted OS rate is compared with the observed 6-month survival estimated from the new phase II trial. A new treatment could be declared worthy of additional study if the OS rate πP can be rejected at some type-1 error level (for instance, P < 0.10):

Example 2: Analyzing a completed single-arm phase II trial with 6-month OS rate as primary endpoint:

  • 1) Use the actual frequencies of patient groups to determine πP, the average of predicted outcomes specific to the trial.

  •  Example:

  •  Female, PS 0 (5%); female, PS 1 (30%); female, PS 2 (10%)

  •  Male, PS 0 (5%); male, PS 1 (40%); male, PS 2 (10%)

  • 2) Use predicted rates in Table 5 to calculate the historical null rate prediction for OS.

  •  Using the example expected frequencies:

  •  πP = 0.05 × 0.64 + 0.30 × 0.45 + 0.10 × 0.26 + 0.05 × 0.57 + 0.40 × 0.47 + 0.10 × 0.27 = 0.44 or 44%.

  • 3) Compare the predicted rate with the observed 6-month OS rate in the completed trial. Declare the treatment worthy of additional study if the predicted rate (0.44) can be rejected at the desired type 1 error level (e.g., P < 0.10).

These examples represent a trial patient population with a slightly worse PS distribution than that seen in the historical dataset, and the 6-month predicted OS is adjusted to that patient mixture.
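The comparison in step 3 of Example 2 can be sketched as a one-sided exact binomial test. The sketch below assumes complete follow-up at 6 months, as in Example 1; the accrual counts and the number of patients alive at 6 months are hypothetical values invented for illustration, and the function and variable names are not from the paper.

```python
from math import comb

# Predicted 6-month OS rates by (sex, PS), as quoted in Example 2 from Table 5.
OS6_RATE = {
    ("F", 0): 0.64, ("F", 1): 0.45, ("F", 2): 0.26,
    ("M", 0): 0.57, ("M", 1): 0.47, ("M", 2): 0.27,
}

def predicted_rate(counts, rates=OS6_RATE):
    """Average predicted rate for the patients actually accrued."""
    n = sum(counts.values())
    return sum(c * rates[group] for group, c in counts.items()) / n

def exact_upper_p(n, successes, p_null):
    """One-sided exact binomial P value: P(X >= successes) under p_null."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )

# Hypothetical accrual of 80 patients matching the example frequencies
# (5%, 30%, 10%, 5%, 40%, 10%).
counts = {
    ("F", 0): 4, ("F", 1): 24, ("F", 2): 8,
    ("M", 0): 4, ("M", 1): 32, ("M", 2): 8,
}
n_patients = sum(counts.values())
alive_at_6_months = 45  # hypothetical observed result

pi_p = predicted_rate(counts)
p_value = exact_upper_p(n_patients, alive_at_6_months, pi_p)

# Treatment is declared worthy of further study if the null rate is rejected.
worthy = p_value < 0.10
print(round(pi_p, 2), p_value, worthy)
```

Because `pi_p` is recomputed from the patients actually enrolled, the null rate follows the trial's real sex and PS mix rather than the mix guessed at the design stage.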

With the advent of newer targeted therapies, there is potential for improvement over existing cytotoxic therapies in the treatment of pancreatic cancer (12, 13). The Clinical Trial Design Task Force of the NCI Investigational Drug Steering Committee recommended the implementation of a randomized approach for the phase II setting in general, but acknowledged the existence of situations where a single-arm approach is allowable (2). Specifically, if there exists a reliable historical database, the single-arm approach may be a way to reduce the number of patients required to complete the study (4).

The availability of appropriate historical control data provides a mechanism to help design and assess the addition of these new agents, initially in a phase II setting, when a single-arm trial is desired or when multiple treatment arms all consist of experimental therapies. Accrual may even proceed more quickly if randomization would otherwise diminish some patients' desire to participate where existing therapies have marginal benefit. In fact, a large national survey of patients with cancer showed that the most frequent patient-reported barrier to trial accrual was concern about randomized treatment (14).

With a modern, relevant historical control database for untreated, metastatic pancreatic cancer, the next step would be to create online tools, based on the historical benchmarks, to be used for study planning and analysis of single-arm or multiple-arm “pick the winner” phase II trials.

There was complete overlap in survival between the phase II and phase III trial settings in the benchmark population, which supports the use of phase III trial data to supplement the available phase II data in establishing benchmarks for phase II trials. Progression criteria varied across the study populations, with either some version of bidimensional response and progression criteria or RECIST in use on any given trial. RECIST is generally thought to be less sensitive to disease progression because the increase in tumor volume required to call progressive disease is larger, and patients satisfying the criteria for progression under WHO (with a 25% increase in the bidimensional product in any single lesion) may not satisfy the criteria for progression under RECIST. One might therefore expect shorter PFS times in the trials activated before the year 2000. Assessment intervals varied as well, with the shortest protocol-specified interval being 6 weeks and the longest 8 weeks, which corresponded to two cycles of gemcitabine. For this reason, OS remains the most reliable endpoint in pancreatic cancer trials in general, where PFS times are frequently short enough to be biased by differing assessment intervals (5). In addition, because survival times are very short, OS is a reasonable endpoint for advanced pancreatic cancer even in the phase II setting, where results are expected in a shorter time frame.

Despite the potential effects on progression times based on progression criteria, in this dataset there was no significant difference in PFS between older trials, activated during the era of bidimensional progression criteria, versus newer trials using RECIST. This would suggest that the PFS endpoint is comparable even when criteria differ. However, this topic deserves further exploration, including a study of the comparability of solid tumor response endpoints, and this database will enable that exploration as well.

Estimates of the OS and PFS rates across all of the trials would be misleading if there were significant between-trial variance that could not be explained by differences in the population with respect to the chosen baseline factors (15). In this case, the elimination of one outlying trial resulted in a population without a significant between-trial variance component.

There are several compelling reasons to further this effort, once the necessary tools are in place, by continuing to build a database as trials are completed. If the database can be constantly updated with new trials, it will remain an invaluable tool for planning and analysis. Changes in treatment standards and improvements in first-line and second-line therapy will ultimately result in benchmarks in need of continuous updates. Although there was no difference in survival over the past 10 years in the dataset used for the primary analysis, this was not so for the entire database. For the data initially contributed to this project, the decision was made to include only those trials that administered gemcitabine as part of the regimen, because this represented a change in treatment standard and an accompanying advance in the survival prognosis (6, 12).

The “outlier” trial NCCTG-034A was excluded from the benchmark calculations because its results differed markedly from the rest of the database in the positive direction. The recently developed non–gemcitabine-based cytotoxic combination regimen of oxaliplatin, irinotecan, 5-fluorouracil, and leucovorin (FOLFIRINOX; ref. 16) demonstrated improved survival over gemcitabine in patients with metastatic disease. However, the applicability of the FOLFIRINOX regimen, usually in a modified form, is limited to the 20% to 25% of patients with metastatic disease who have a favorable PS (0–1) and adequate liver function and are generally younger. For now, gemcitabine-based regimens remain the standard of care for the majority of patients with advanced pancreatic cancer, and the database reported upon here remains relevant for the future. A recent phase III study demonstrated the superiority of the nab-paclitaxel/gemcitabine combination over gemcitabine alone, which supports the continued use of gemcitabine-based therapies in advanced pancreatic cancer for the foreseeable future (17). Nevertheless, future additions to a database of historical controls will be key to maintaining its relevance. Further work evaluating the performance of the model on recently completed and future phase II trial results would also be of interest. Expanding these analyses to other, more comprehensive datasets, such as the Aide et Recherche en Cancérologie Digestive (ARCAD) pancreatic database, would also be an important consideration.

The discovery of factors that are prognostic for survival in pancreatic cancer was not the purpose of this study. However, there are other patient-level baseline factors in addition to PS and sex that are undoubtedly prognostic for survival and PFS, and it may be practical to use some of these factors to refine the historical benchmarks. For example, although lactate dehydrogenase was requested, it was not available from the contributing groups. Based upon an initial review of availability, factors such as percentage weight loss, serum albumin concentration, serum CA19–9 level, and tumor grade were not requested, although there is some reported evidence that they might be prognostic. Location of the primary tumor and metastatic sites could also be considered as factors in the future. With a more extensive database, those factors could be identified and used to develop historical benchmarks tailored more closely to a specific phase II trial's patient population. This would increase the advantage of this covariate-adjusted approach over the simpler method of comparing overall outcomes with historical benchmarks. Furthermore, other groups represented in the database in smaller proportions, such as those with locally advanced disease treated with or without radiotherapy in conjunction with chemotherapy, are worthy of investigation to develop benchmarks applicable to those populations.

Continued development of a historical control database would lend itself well to the development of Web-based study planning and analysis tools. These freely accessible tools would enable researchers to design an appropriately powered single-arm or multiple-arm experimental study in much the same way as currently available tools do. The enhancement would be the ability to specify the expected patient population in terms of the baseline factors. These tools would also enable the researcher to perform analyses at the conclusion of the trial that account for the actual makeup of the trial population with respect to the important baseline factors, allowing for a comparison against an appropriately matched historical control. It is worth noting that these target values could be used to help develop trial designs other than single-arm studies. For example, they could be used to provide baseline survival calculations to aid in the design of randomized selection or screening studies that do not incorporate a control arm. If the database continues to grow, these tools can be continuously refined. As these tools are developed and used, it is hoped that the experience gained can be successfully applied to other disease settings in phase II cancer clinical trials.

No potential conflicts of interest were disclosed.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Conception and design: P.A. Philip, M. LeBlanc, L. Rubinstein, S.P. Ivy, J. Crowley

Development of methodology: P.A. Philip, K. Chansky, M. LeBlanc, L. Rubinstein, J. Crowley

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): P.A. Philip, S.R. Alberts

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): P.A. Philip, K. Chansky, M. LeBlanc, L. Seymour, J. Crowley

Writing, review, and/or revision of the manuscript: P.A. Philip, K. Chansky, M. LeBlanc, L. Rubinstein, L. Seymour, S.P. Ivy, S.R. Alberts, J. Crowley

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. LeBlanc, P.J. Catalano

Study supervision: M. LeBlanc, J. Crowley

This work was supported by the NCI of the NIH under award number U10CA038926 (to J. Crowley).

1. Dhani N, Tu D, Sargent DJ, Seymour L, Moore MJ. Alternate endpoints for screening phase II studies. Clin Canc Res 2009;15:1873–82.

2. Seymour L, Ivy SP, Sargent D, Spriggs D, Baker L, Rubinstein L, et al. The design of phase II clinical trials testing cancer therapeutics: consensus recommendations from the clinical trial design task force of the National Cancer Institute Investigational Drug Steering Committee. Clin Canc Res 2010;16:1764–9.

3. Korn EL, Liu PY, Lee SJ, Chapman JA, Niedzwiecki D, Suman VJ, et al. Meta-analysis of phase II cooperative group trials in metastatic stage IV melanoma to determine progression-free and overall survival benchmarks for future phase II trials. J Clin Oncol 2008;26:527–34.

4. Rubinstein L, Crowley J, Ivy P, LeBlanc M, Sargent D. Randomized phase II designs. Clin Canc Res 2009;15:1883–90.

5. Philip P, Mooney M, Jaffe D, Eckhardt G, Moore M, Meropol N, et al. Consensus report of the National Cancer Institute clinical trials planning meeting on pancreas cancer treatment. J Clin Oncol 2009;27:5660–9.

6. Burris HA, Moore MJ, Andersen J, Green MR, Rothenberg ML, Modiano MR, et al. Improvements in survival and clinical benefit with gemcitabine as first-line therapy for patients with advanced pancreas cancer: a randomized trial. J Clin Oncol 1997;15:2403–13.

7. Therasse P, Arbuck S, Eisenhauer E, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 2000;92:205–16.

8. Cox D. Regression models and life-tables. J R Stat Soc Series B Stat Methodol 1972;34:187–220.

9. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457–81.

10. Oken MM, Creech RH, Tormey DC, Horton J, Davis TE, McFadden ET, et al. Toxicity and response criteria of the Eastern Cooperative Oncology Group. American J Clin Oncol 1982;5:649–55.

11. WHO. WHO handbook for reporting results of cancer treatment. Geneva, Switzerland: World Health Organization Offset Publication; 1979.

12. Burris H, Rocha-Lima C. New therapeutic directions for advanced pancreatic cancer: targeting the epidermal growth factor and vascular endothelial growth factor pathways. Oncologist 2008;13:289–98.

13. Heinemann V, Boeck S, Hinke A, Labianca R, Louvet C. Meta-analysis of randomized trials: evaluation of benefit from gemcitabine-based combination chemotherapy applied in advanced pancreatic cancer. BMC Cancer 2008;8:82.

14. Unger JM, Hershman DL, Albain KS, Moinpour CM, Petersen JA, Burg K, et al. Patient income level and cancer clinical trial participation. J Clin Oncol 2013;31:536–42.

15. Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A Stat Soc 2008;172:137–59.

16. Conroy T, Desseigne F, Ychou M, Bouche O, Guimbaud R, Becouarn Y, et al. FOLFIRINOX versus gemcitabine for metastatic pancreatic cancer. N Engl J Med 2011;364:1817–25.

17. Von Hoff DD, Ervin T, Arena FP, Chiorean EG, Infante JR, Moore MJ, et al. Randomized phase III study of weekly nab-Paclitaxel plus gemcitabine vs gemcitabine alone in patients with metastatic adenocarcinoma of the pancreas. J Clin Oncol 31, 2013 (suppl 4; abstr LBA148).

18. Shepard RC, Levy DE, Berlin JD, Stuart K, Harris JE, Aviles V, et al. Phase II study of gemcitabine in combination with docetaxel in patients with advanced pancreatic carcinoma (E1298). Oncology 2004;66:303–9.

19. Berlin JD, Catalano P, Thomas JP, Kugler JW, Haller DG, Bowen Benson III A. Phase III study of gemcitabine in combination with fluorouracil versus gemcitabine alone in patients with advanced pancreatic carcinoma: Eastern Cooperative Oncology Group trial E2297. J Clin Oncol 2002;20:3270–5.

20. Berlin JD, Adak S, Vaughn DJ, Flinker D, Blaszkowsky L, Harris JE, et al. A phase II study of gemcitabine and 5-fluorouracil in metastatic pancreatic cancer: an Eastern Cooperative Oncology Group study (E3296). Oncology 2000;58:215–8.

21. Alberts SR, Schroeder M, Erlichman C, Steen PD, Foster NR, Moore DF Jr, et al. Gemcitabine and ISIS-2503 for patients with locally advanced or metastatic pancreatic adenocarcinoma: a North Central Cancer Treatment Group Phase II Trial. J Clin Oncol 2004;22:4944–50.

22. Alberts S, Foster N, Morton R, Kugler J, Schaefer P, Wiesenfeld M, et al. PS-341 and gemcitabine in patients with metastatic pancreatic adenocarcinoma: a North Central Cancer Treatment Group (NCCTG) randomized phase II study. Ann Oncol 2005;16:1654–61.

23. Kim GP, Oberg A, Kabat B, Sing A, Hedrick E, Campbell S, et al. NCCTG phase II trial of bevacizumab, gemcitabine, oxaliplatin in patients with metastatic pancreatic adenocarcinoma. J Clin Oncol 2006;24(suppl. 18s).

24. Alberts S, Townley P, Goldberg R, Cha SS, Sargent DJ, Moore DF, et al. Gemcitabine and oxaliplatin for metastatic pancreatic adenocarcinoma: a North Central Cancer Treatment Group phase II study. Ann Oncol 2003;14:3605–10.

25. Philip PA, Benedetti J, Corless C, Wong R, O'Reilly EM, Flynn PJ, et al. Phase III study comparing gemcitabine plus cetuximab versus gemcitabine in patients with advanced pancreatic adenocarcinoma: Southwest Oncology Group-directed intergroup trial S0205. J Clin Oncol 2010;28:3505–10.