Abstract
We compiled and analyzed a database of cooperative group trials in advanced pancreatic cancer to develop historical benchmarks for overall survival (OS) and progression-free survival (PFS). Such benchmarks are essential for evaluating new therapies in a single-arm setting. The analysis included patients with untreated metastatic pancreatic cancer who received gemcitabine-containing regimens between 1995 and 2005. Prognostic baseline factors were selected by their significance in Cox regression analysis. Outlier trial arms were identified by comparing individual 6-month OS and PFS rates against the entire group. The dataset selected for the generation of OS and PFS benchmarks was then tested for intertrial arm variability using a logistic-normal model with the selected baseline prognostic factors as fixed effects and the individual trial arm as a random effect. A total of 1,132 cases from eight trials qualified. Performance status and sex were independently significant for OS, and performance status was prognostic for PFS. Outcomes for one trial (NCCTG-034A) were significantly different from the other trial arms. When this trial was excluded, the remaining trial arms were homogeneous for OS and PFS outcomes after adjusting for performance status and sex. Benchmark values for 6-month OS and PFS are reported along with a method for using these values in future study design and analysis. The benchmark survival values were generated from a dataset that was homogeneous between trials. The benchmarks can be used to enable single-arm phase II trials using a gemcitabine platform, especially in three circumstances: when a randomized control arm is not practically feasible, when an early signal of activity of an experimental agent is being explored (such as in expansion cohorts of phase I studies), and when patients are not candidates for combination cytotoxic therapy. Clin Cancer Res; 20(16); 4176–85. ©2014 AACR.
Introduction
Phase II clinical trials in cancer have, in recent years, focused increasingly on “targeted” agents that are “cytostatic” rather than “cytotoxic.” Although most agents that ultimately prove to be useful in the clinic demonstrate at least some disease stability, many authors feel that a traditional treatment response endpoint for phase II trials in solid tumors is less relevant for testing the newer targeted agents (1). Researchers therefore frequently prefer to measure treatment success in terms of overall survival (OS) or progression-free survival (PFS) rather than clinical response. For survival and PFS endpoints in the phase II setting, one may choose between a single-arm approach, which compares trial results with some historical benchmark, or a randomized phase II trial with two or more arms, where the “control” arm provides the benchmark for judging success. The Clinical Trial Design Task Force of the NCI Investigational Drug Steering Committee has recommended the randomized approach in the phase II setting, especially when evaluating combinations of agents (2). However, the single-arm approach is deemed appropriate for the evaluation of single-agent experimental therapies, and where a well-defined historical control database is available (2, 3).
Single-arm designs have the advantage of requiring fewer patients, all of whom receive the experimental treatment. The conduct of trials requires patients, funding, and effort. With a multitude of candidate treatments and limitations on funding and time, an expedited result through a single-arm trial is desirable when feasible. However, researchers may have difficulty arriving at an appropriate historical benchmark against which to compare their results (4).
To address the problem of reliable historical benchmarks for single-arm phase II trials, efforts have already been made in specific disease sites, such as stage IV melanoma (3), to amass historical databases and derive historical control data for future trials. The current effort, part of the aforementioned NCI-sponsored task force, has resulted in the compilation of clinical trial data in two specific diseases: advanced pancreatic cancer and advanced non–small cell lung cancer. We report here on the advanced pancreatic cancer database and the benchmarks derived for previously untreated advanced pancreatic cancer. All trials were conducted by cooperative groups in the United States from 1995 to 2005. These clinical trial data were compiled and analyzed specifically to provide the appropriate benchmarks for the planning and analysis of future phase II trials in this disease.
Historically, certain trials in advanced pancreatic cancer included locally advanced unresectable disease. More recently, and certainly for the future, trials will select exclusively for either locally advanced or metastatic disease so that these two patient populations can be studied separately (5). Therefore, in accordance with the primary objective to provide benchmarks for future trials in advanced, metastatic pancreatic cancer, the decision was made to focus initially on cases with metastatic disease and no prior chemotherapy for pancreatic cancer. Cases with locally advanced disease will be considered separately. In addition, because the use of gemcitabine for advanced pancreatic cancer represented a change in treatment standard for the disease (6), and continues to be a treatment standard, we include for this analysis only those trial arms in which gemcitabine is part of the treatment regimen. The recent use of FOLFIRINOX for some patient groups is presented in the discussion.
Materials and Methods
Data sources and selection
Patient-level and trial-level data were collected from participants in U.S. Cooperative Group phase II and phase III trials in advanced pancreatic cancer that accrued from January 1995 through December 2005. Four cooperative groups submitted data for pancreatic cancer: SWOG, the North Central Cancer Treatment Group (NCCTG), the Eastern Cooperative Oncology Group (ECOG), and the Radiation Therapy Oncology Group (RTOG). The study was approved by the institutional review board (IRB) for the organization compiling and analyzing the data (Cancer Research and Biostatistics) and by the IRBs serving each participating cooperative group data center. Waiver of consent was approved due to the deidentified and retrospective nature of the data.
Patient-level data elements requested were age at registration, sex, race, Zubrod performance status (PS), location and extent of disease at baseline, prior treatment, laboratory values, and assigned treatment arm. The extent to which the data element requests were fulfilled varied by trial, so that certain baseline factors were missing from various subsets of the full dataset. Trial-level variables included activation and closure dates, the treatment setting (first line vs. second or more), details on the treatment regimen, type of trial (phase II vs. III), objective tumor response and disease progression criteria in use, and type of disease evaluation required at baseline. Patient-level outcome data included best response to treatment, time to progression, and time to death or last contact. Trials activating before the 2000-to-2004 time frame used bidimensional response and progression criteria as described below. Those activated in the year 2000 or later used unidimensional RECIST (7).
Statistical methods
OS was calculated as the time from registration until death or last contact, with censoring at last contact if the patient was still living. PFS was calculated as the time from registration until documented progression of disease (according to individual trial criteria), or death, whichever occurred first. Cases alive without progression were censored at the date of last contact.
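The endpoint definitions above can be sketched in a few lines. This is a minimal illustration only: the function names and the days-per-month conversion are assumptions, not details taken from the source.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumed conversion; the source does not state one


def months_between(start: date, end: date) -> float:
    """Elapsed time in months between two calendar dates."""
    return (end - start).days / DAYS_PER_MONTH


def os_time(registration: date, death: Optional[date],
            last_contact: date) -> Tuple[float, bool]:
    """Return (months, event) for OS; event=False means censored at last contact."""
    if death is not None:
        return months_between(registration, death), True
    return months_between(registration, last_contact), False


def pfs_time(registration: date, progression: Optional[date],
             death: Optional[date], last_contact: date) -> Tuple[float, bool]:
    """Return (months, event) for PFS: first of progression or death,
    otherwise censored at last contact."""
    event_dates = [d for d in (progression, death) if d is not None]
    if event_dates:
        return months_between(registration, min(event_dates)), True
    return months_between(registration, last_contact), False


# Hypothetical patient: progressed after 2 months, died after 6 months.
reg = date(2000, 1, 1)
print(os_time(reg, date(2000, 7, 1), date(2000, 7, 1)))
print(pfs_time(reg, date(2000, 3, 1), date(2000, 7, 1), date(2000, 7, 1)))
```

A patient who is alive without progression yields `event=False` for both endpoints, with time measured to the date of last contact.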
The baseline patient-level factors carried forward into the OS and PFS benchmarks were selected by combining statistical modeling with practical considerations. Practical considerations included the availability of a given data element in a sufficient proportion of the submitted data, and the practicality of measuring or predicting the distribution of a given factor in a real trial population. Independently prognostic baseline covariates were selected via Cox regression (8) with stepwise variable selection, applied separately to OS and PFS. Univariately significant patient-level factors with sufficient representation in the database were subjected to multiple-factor Cox regression with stepwise selection to determine independent prognostic significance. Factors that were not univariately significant (but sufficiently represented) were initially included as well, to explore the possibility of confounding. Factors that were insufficiently represented in the data were excluded from consideration. As a rule, a factor that was represented in less than 80% of the cases available for analysis, or was missing completely from an individual contributing group in the final analysis set, could not be considered. Survival curves, medians, and rates were estimated by the Kaplan–Meier method (9).
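As an illustration of the Kaplan–Meier step, the sketch below computes a product-limit survival curve and reads off a 6-month rate. It is a from-scratch teaching version with made-up data, not the SAS procedure used for the analysis; all names are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup of the estimated survival probability at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up in months; False marks a censored observation.
times = [2, 3, 5, 7, 8, 9]
events = [True, False, True, True, False, True]
curve = kaplan_meier(times, events)
print(survival_at(curve, 6))  # estimated 6-month survival rate
```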
Between-trial variability in the 6-month OS rate and the 6-month PFS rate was assessed in two ways: (i) the OS and PFS rate for each trial arm was compared with the overall rate using a Bonferroni correction; (ii) once prognostic baseline factors were selected, a logistic-normal model with individual trial arm as a random effect was used to estimate between-trial variability not accounted for by differing distributions of the baseline factors. These methods for assessing between-trial variability were also described in the context of historical melanoma trials by Korn and colleagues (3). This model was applied to the entire set of trials, as well as to the reduced set where outliers identified in the first method were eliminated. OS and PFS benchmarks were calculated for prognostic groups according to the selected baseline factors, eliminating any trial that was found to be an outlier in step 1. All analyses were performed using SAS version 9.2.
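One common way to write a logistic-normal model of the kind described above is given below. The notation is an assumption made for illustration (the source does not give the formula): $Y_{ij}$ indicates that patient $i$ in trial arm $j$ is alive (or alive and progression-free) at 6 months, $\mathbf{x}_{ij}$ holds the selected baseline factors, and $u_j$ is the arm-level random effect.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j,
\qquad u_j \sim N(0, \sigma^2),
\qquad H_0\colon \sigma^2 = 0 .
```

Under this formulation, a nonsignificant test of $H_0$ is read as no detectable between-arm variability beyond what the baseline factors explain.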
Results
Characteristics of the study population
A complete listing of the trials contributed by the four cooperative groups, involving 2,341 cases, is given in Table 1. After screening to include only those cases with previously untreated metastatic disease enrolled to trial arms that incorporated gemcitabine as part of the regimen, there were 1,132 cases available for this analysis. The included cases were from eight trials (10 trial arms) with activation dates ranging from 1996 to 2004, as characterized in Table 2. The three ECOG trials that were activated before the year 2000 used ECOG Solid Tumor Response Criteria (10). With respect to determining progression, the ECOG criteria are comparable with standard World Health Organization (WHO) bidimensional progression criteria (11), in which a 25% increase in the product of perpendicular diameters in any single lesion (or the appearance of a new lesion) qualified for disease progression. The NCCTG trial activated before the year 2000 used comparable bidimensional criteria. All other trials used RECIST, which for progression requires a 20% increase in the sum of unidimensional diameters over the smallest sum observed during treatment, or the observation of new lesions. Assessment intervals ranged from 6 weeks to 8 weeks (or two cycles of gemcitabine). Reported outcomes for each included trial arm are given in Table 3.
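The two families of progression rules described above can be contrasted with a short sketch. The thresholds come from the text; the function names, the choice of reference measurement for the bidimensional rule, and the lesion sizes are illustrative assumptions.

```python
from typing import Sequence, Tuple


def bidimensional_progression(lesions: Sequence[Tuple[float, float]],
                              new_lesion: bool = False) -> bool:
    """WHO/ECOG-style rule: a 25% increase in the product of perpendicular
    diameters in any single lesion, or a new lesion.
    Each lesion is (reference_product, current_product) in mm^2; which
    measurement serves as the reference is an assumption left to the caller."""
    return new_lesion or any(cur >= 1.25 * ref for ref, cur in lesions if ref > 0)


def recist_progression(previous_sums: Sequence[float], current_sum: float,
                       new_lesion: bool = False) -> bool:
    """RECIST-style rule: a 20% increase in the sum of unidimensional diameters
    over the smallest sum observed during treatment, or a new lesion."""
    nadir = min(previous_sums)
    return new_lesion or current_sum >= 1.20 * nadir


# Hypothetical single lesion growing from 20 x 20 mm to 23 x 23 mm.
print(bidimensional_progression([(20 * 20, 23 * 23)]))  # product rises about 32%
print(recist_progression([20.0], 23.0))                 # diameter rises 15%
```

In this made-up case the same lesion growth meets the bidimensional threshold but not the unidimensional one, which is one reason progression criteria are tracked as a trial-level variable.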
Table 1. Complete listing of data contributed by group and protocol
Protocol | Type of trial | Protocol Tx | Setting | N |
---|---|---|---|---|
ECOG-1202 | Phase II | Chemotherapy | First line | 10 |
ECOG-1298 | Phase II | Chemotherapy | First line | 32 |
ECOG-2297 | Phase III | Chemotherapy | First line | 321 |
ECOG-3292 | Phase II | Chemo/immunotherapy | First line | 24 |
ECOG-3296 | Phase II | Chemotherapy | First line | 36 |
NCCTG-0043 | Phase II | Chemo/biologic | First line | 48 |
NCCTG-014C | Randomized phase II | Biologic | First line | 42 |
NCCTG-014C | Randomized phase II | Chemo/biologic | First line | 39 |
NCCTG-034A | Phase II | Chemo/biologic | First line | 79 |
NCCTG-894352 | Phase III | Chemotherapy | Either | 94 |
NCCTG-924352 | Phase II | Chemotherapy | Either | 46 |
NCCTG-964351 | Phase II | Chemotherapy | First line | 13 |
NCCTG-984351 | Phase II | Chemotherapy | Either | 58 |
NCCTG-9942 | Phase II | Concurrent chemo/RT | First line | 46 |
RTOG-0020 | Randomized phase II | Concurrent chemo/RT | First line | 91 |
RTOG-0411 | Phase II | Chemo/biologic/RT | First line | 82 |
RTOG-9102 | Phase III | Concurrent chemo/RT | First line | 27 |
RTOG-9209 | Phase II | Concurrent chemo/RT | First line | 50 |
RTOG-9812 | Phase II | Concurrent chemo/RT | Either | 109 |
SWOG-0107 | Phase II | Chemotherapy | First line | 60 |
SWOG-0205 | Phase III | Chemotherapy | First line | 347 |
SWOG-0205 | Phase III | Chemo/biologic | First line | 348 |
SWOG-8916 | Phase II | Chemotherapy | First line | 25 |
SWOG-8933 | Phase II | Chemotherapy | First line | 35 |
SWOG-9100 | Phase II | Chemo/potentiator | First line | 26 |
SWOG-9135 | Phase II | Chemotherapy | First line | 39 |
SWOG-9413 | Phase II | Chemo/immunotherapy | First line | 55 |
SWOG-9629 | Phase II | Chemo/potentiator | First line | 58 |
SWOG-9629 | Phase II | Chemo/potentiator | Second line or more | 48 |
SWOG-9924 | Phase II | Biologic | First line | 53 |
Abbreviation: RT, radiotherapy.
Table 2. Trial characteristics for primary analysis cases (N = 1,132)
Protocol | Na | Activation year | Type of trial | Treatment | Progression criteria | Assessment interval |
---|---|---|---|---|---|---|
ECOG-1298 (18) | 30 | 1999 | Phase II | Gem/docetaxel | ECOGb | 8 weeks |
ECOG-2297 Arm 1 (19) | 156 | 1998 | Phase III | Gemcitabine | ECOGb | 8 weeks |
ECOG-2297 Arm 2 (20) | 149 | 1998 | Phase III | Gem/5-FU | ECOGb | 8 weeks |
ECOG-3296 (20) | 36 | 1996 | Phase II | Gem/5-FU | ECOGb | 8 weeks |
NCCTG-0043 (21) | 40 | 2001 | Phase II | Gem/ISIS-2503 | RECISTc | 6 weeks |
NCCTG-014C Arm B (22) | 36 | 2002 | Rand. phase II | Gem/PS-341 | RECISTc | 6 weeks |
NCCTG-034A (23) | 70 | 2005 | Phase II | Gem/oxali/bev | RECISTc | 2 cycles/2 months |
NCCTG-984351 (24) | 54 | 1999 | Phase II | Gem/oxaliplatin | WHOd | 6 weeks |
SWOG-0205 Arm 1 (25) | 282 | 2004 | Phase III | Gem/cetuximab | RECISTc | 2 cycles/8 weeks |
SWOG-0205 Arm 2 (25) | 279 | 2004 | Phase III | Gemcitabine | RECISTc | 2 cycles/8 weeks |
Table 3. Study population and published results for included trials
Protocol | N | M/F (%) | PS 0/1/2 (%) | PFS | OS |
---|---|---|---|---|---|
ECOG-1298 | 32 | 50/50 | 12/72/16 | 2.1 months median PFS | 4.7 months median OS |
ECOG-2297 Arm 1 | 162 | 54/46 | 35/52/14 | 2.2 months median PFS | 5.4 months median OS |
ECOG-2297 Arm 2 | 160 | 52/48 | 23/64/14 | 3.4 months median PFS | 6.7 months median OS |
ECOG-3296 | 36 | 69/31 | 28/53/19 | 2.4 months median TTFa | 4.3 months median OS |
NCCTG-0043 | 48 | 37/63 | 34/58/8 | 3.8 months median PFS | 6.7 months median OS |
NCCTG-014C Arm B | 36 | 53/47 | 35/58/7 | 2.4 months | 4.8 months |
NCCTG-034A | 82 | 70/27b | 40/54/6b | 5.7 months | 8.1 months |
NCCTG-984351 | 46 | 56/44 | 35/41/20 | 4.5 months | 6.2 months |
SWOG-0205 Arm 1 | 371 | 54/46 | 87/13c | 3.0 months | 5.9 months |
SWOG-0205 Arm 2 | 372 | 51/49 | 87/13c | 3.4 months | 6.3 months |
aTime to treatment failure.
bPercentages generated from data, not published report.
cPS 0–1/2.
Of the 1,132 cases, 1,123 had died and 9 were alive at last contact. The minimum, median, and maximum survival follow-up for the 9 living patients were 2, 43, and 74 months, respectively. All but 3 of the 1,132 patients had progressive disease at the time of death or last contact, or had died due to their cancer. Published results of individual trials are shown in Table 3.
Baseline characteristics for the 1,132 cases are shown in Table 4. Forty-six percent of patients were female, and a majority of patients (87%) had a PS (translated to Zubrod scale) of 0 or 1. Ninety percent of patients were Caucasian, 7% black, <1% Asian, and the remaining 2% Native American, Pacific Islander, or not reported. Seventy-seven percent of cases originated from randomized phase III trials. Thirty-eight percent of cases were from trials activated between 1995 and 2000, and 62% were from trials activated between 2000 and 2005.
Table 4. Patient characteristics and statistics from univariate and multivariate Cox proportional hazards regression models for 1,132 patients (N = 1,116 cases with complete data for multivariate models)
Factor | N (%) | Comparison | OS, single factor: HR (P), N = 1,132 | OS, multivariate: HR (P), N = 1,116 | PFS, single factor: HR (P), N = 1,132 |
---|---|---|---|---|---|
Sex | |||||
Female | 518 (46%) | ||||
Male | 614 (54%) | Male vs. female | 1.16 (0.017) | 1.15 (0.02) | 1.09 (0.16) |
PS | |||||
0 | 316 (28%) | ||||
1 | 650 (57%) | PS 1 vs. PS 0 | 1.28 (<0.0001) | 1.27 (<0.0006) | 1.12 (0.09) |
2 | 150 (13%) | PS 2 vs. PS 1 | 1.86 (<0.0001) | 1.87 (<0.0001) | 1.58 (<.0001) |
Not reported | 16 (1%) | None | NA | NA | NA |
Race | |||||
White | 1,020 (90%) | White vs. rest | 0.91 (0.33) | NA | 0.85 (0.10) |
Black | 76 (7%) | Black vs. rest | 1.18 (0.16) | NA | 1.12 (0.35) |
Asian/other | 18 (2%) | Asian/other vs. rest | 0.95 (0.78) | NA | 1.29 (0.13) |
Not reported | 18 (2%) | None | NA | NA | NA |
Bilirubin | |||||
Normal | 511 (48%) | NA | |||
>Normal | 81 (7%) | > Normal vs. normal | 1.13 (0.046) | NA | 1.06 (0.38) |
Not reported | 540 (47%) | None | |||
AST/ALT | |||||
Normal | 474 (42%) | ||||
>Normal | 315 (28%) | > Normal vs. normal | 1.35 (<.001) | NA | 1.31 (<0.001) |
Not reported | 343 (30%) | None | NA | NA |
Abbreviation: NA, not available.
Prognostic baseline factors
Univariate survival statistics for the considered baseline factors with respect to OS and PFS are shown in Table 4. Baseline factors considered were sex, race (white vs. black vs. Asian vs. other), PS, bilirubin > normal, and serum aspartate aminotransferase (AST)/alanine aminotransferase (ALT) > normal. Location of metastatic disease was not considered due to insufficient data availability. Certain laboratory values such as albumin and platelet counts were also not consistently available across the submitted databases.
OS findings were as follows. Race was not univariately significant, as no single race was significantly different from the others, possibly because the distribution was overwhelmingly white. The race factor was explored in the multivariate setting initially, to assess for the possibility of effect modification with other factors. This factor was then eliminated from consideration. AST/ALT was significant univariately (HR, 1.35; P < 0.001 for abnormal AST/ALT) but was not entered into multivariate analyses because of insufficient numbers with available data (available for only 839 cases). Serum bilirubin level was significantly prognostic for survival (HR, 1.52 for bilirubin above normal; P < 0.001), but was missing in 540 (nearly half) of cases. Imputation of missing laboratory values was considered, but good surrogates for an imputation model were lacking. The remaining factors (sex and PS) were entered into a Cox regression analysis with stepwise elimination, with the final model including both factors as independently associated with survival.
For PFS, race, sex, and bilirubin were not significantly prognostic. AST/ALT was significant (HR, 1.34; P < 0.001), but again, the factor was not considered further due to lack of sufficient data. PS was independently prognostic (Table 4).
Variability between trial arms
The overall 6-month survival rate for the combined 10 trial arms was 48% [95% confidence interval (CI) 44%–50%]. When comparing the individual 6-month OS rates of each of the 10 trial arms with the overall rate of 48% with correction for multiple comparisons, one trial arm—the single-arm phase II trial NCCTG-034A (gemcitabine + bevacizumab + oxaliplatin), with a 6-month survival rate of 67% for 70 patients—differed substantially from the group. A logistic-normal model for the binary 6-month OS outcome was applied with PS and male sex included as fixed effects, and a random effect with an assumed normal distribution to represent the residual variance component not explained by the two baseline covariates. The P value for the variance component (by likelihood ratio test) was 0.032, indicating significant inter-trial-arm variability. The same model after excluding the outlier trial NCCTG-034A yielded a nonsignificant P value of 0.14 for the inter-trial variance component. Thus, with respect to variability between trial arms in 6-month OS rates, the removal of NCCTG-034A serves to achieve the homogeneity for reliable benchmarks.
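The arm-versus-overall comparison can be illustrated with a small sketch. The source does not name the test used, so the exact binomial test below, the two-sided 0.05 family-wise level, and the count of 47 of 70 patients alive at 6 months (back-calculated from the reported 67%) are all assumptions made for illustration.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def two_sided_p(k: int, n: int, p: float) -> float:
    """Two-sided exact binomial P value, taken as twice the smaller tail."""
    upper = upper_tail(k, n, p)
    lower = 1.0 - upper_tail(k + 1, n, p)  # P(X <= k)
    return min(1.0, 2.0 * min(upper, lower))


OVERALL_RATE = 0.48   # pooled 6-month OS rate reported in the text
N_ARMS = 10           # number of trial arms compared
ALPHA = 0.05          # assumed family-wise level (not stated in the source)

# NCCTG-034A: 67% of 70 patients, taken here as 47 alive at 6 months (assumption).
p_value = two_sided_p(47, 70, OVERALL_RATE)
flagged = p_value < ALPHA / N_ARMS  # Bonferroni-corrected threshold
print(f"P = {p_value:.4f}, outlier after Bonferroni correction: {flagged}")
```

Treating the pooled rate as a fixed reference value is a simplification; the arm itself contributes to that pooled estimate.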
For the 6-month PFS rates, a comparison of each arm to the overall 6-month PFS rate of 24% (95% CI, 22%–27%) identified the same outlier, NCCTG-034A, with a 6-month PFS rate of 44%. There was a significant variance component in the logistic-normal model even after adjusting for the covariate of PS (P = 0.02), representing considerable between-trial-arm variability. When the outlier trial was removed from the dataset, the between-trial-arm variance component was no longer significant (P is close to 1 when adjusted for PS). On the basis of these findings, a suitable benchmark for PFS might be derived using the reduced dataset, which, again, excludes the outlying trial arm from NCCTG-034A.
Survival and PFS
The median OS for all trials combined, henceforth excluding NCCTG-034A, was 5.7 months (95% CI, 5.3–6.0 months), and the median PFS was 2.9 months (95% CI, 2.6–3.4 months). There were no differences in survival or PFS when comparing the data from phase II versus phase III trials (P = 0.70 for OS and P = 0.67 for PFS by log-rank test; Fig. 1A and B). Likewise, there were no significant differences in OS or PFS when comparing the 1995 to 2000 activation period against later 2000 to 2004 activations (Fig. 1C and D). Benchmarks for OS and PFS according to the chosen factors (sex and PS) are given in Tables 5 and 6, respectively. The subgroups with good PS (0 or 1 on the Zubrod scale) had the better prognoses for OS and PFS, and females had a slight overall survival advantage over males. Although sex was not a statistically significant factor in the Cox regression analysis for PFS, we chose to include this factor in the derivation of benchmarks for PFS to maintain consistency with the OS benchmarks. Predicted survival rates for cases classified by sex and PS are also provided and are used in the study planning and analysis examples (Examples 1 and 2). Figure 2A and B show OS and PFS curves for the same benchmark categories.
Figure 1. Cooperative group trials in advanced pancreatic cancer with regimens containing gemcitabine. N = 1,062 (NCCTG-034A excluded). A, OS according to type of clinical trial (phase II vs. phase III). B, PFS according to type of clinical trial (phase II vs. phase III). C, OS according to time period of trial activation. D, PFS according to time period of trial activation.
Figure 2. Cooperative group trials in advanced pancreatic cancer with regimens containing gemcitabine. N = 1,046 cases with complete data for PS (NCCTG-034A excluded). A, OS according to sex and PS. B, PFS according to PS and sex.
Table 5. OS statistics for advanced pancreatic cancer according to sex and PS
PS | Female: N | Female: 6-month/12-month OS rates, Predicteda | Female: 6-month/12-month OS rates, Observed | Female: median OS, months (95% CI) | Male: N | Male: 6-month/12-month OS rates, Predicteda | Male: 6-month/12-month OS rates, Observed | Male: median OS, months (95% CI) |
---|---|---|---|---|---|---|---|---|
PS 0 | 130 | 61%/26% | 64%/30% | 7.62 (6.41–9.4) | 158 | 60%/20% | 57%/17% | 6.65 (5.91–7.2) |
PS 1 | 286 | 47%/18% | 45%/16% | 5.37 (5.06–6.05) | 326 | 46%/14% | 47%/16% | 5.73 (4.76–6.34) |
PS 2 | 68 | 27%/4.0% | 26%/4.4% | 3.81 (2.66–4.86) | 78 | 26%/2.9% | 27%/2.6% | 2.51 (1.71–3.25) |
aPredicted rates are conditional on the levels of the covariates (sex and PS) and are derived from a logistic regression model.
Table 6. PFS statistics for advanced pancreatic cancer according to sex and PS
PS | Female: N | Female: 6-month PFS rate, Predicteda | Female: 6-month PFS rate, Observed | Female: median PFS, months (95% CI) | Male: N | Male: 6-month PFS rate, Predicteda | Male: 6-month PFS rate, Observed | Male: median PFS, months (95% CI) |
---|---|---|---|---|---|---|---|---|
PS 0 | 130 | 26% | 27% | 3.60 (2.66–4.73) | 158 | 25% | 23% | 3.63 (2.83–3.94) |
PS 1 | 286 | 25% | 24% | 3.45 (2.66–3.68) | 326 | 24% | 24% | 2.51 (2.07–3.38) |
PS 2 | 68 | 15% | 13% | 2.14 (1.58–3.15) | 78 | 14% | 15% | 1.64 (1.41–2.76) |
aPredicted rates are conditional on the levels of the covariates (sex and PS) and are derived from a logistic regression model.
Application of the benchmark algorithm for future phase II trial designs
The variation in PFS or OS based on the prognostic factors indicated above (sex and PS) can be combined to arrive at benchmarks (or null hypothesis values) for future clinical trials. Estimates from Tables 5 and 6 above can be used to predict survival for patients registered to a new study, depending on the specific proportions with respect to the prognostic factors. Example 1 shows the procedure for designing a study with the 6-month OS rate as primary endpoint. One can estimate (or guess) the potential frequency of patient groups to come up with an appropriate benchmark for the expected study sample. The guesses will not dramatically affect the appropriate sample size. The expected fraction of patients in each of the (sex and PS) groupings is multiplied by the 6-month OS estimates from Table 5 [following the algorithm of Korn and colleagues (3)] to generate a null predicted 6-month (or 12-month) survival rate:
Example 1: Designing a single-arm phase II trial with 6-month OS rate as primary endpoint.
1) Estimate expected frequencies of patient groups to be accrued, to determine πP0, the benchmark null hypothesis:
Example:
Female, PS 0 (5%); female, PS 1 (30%); female, PS 2 (10%)
Male, PS 0 (5%); male, PS 1 (40%); male, PS 2 (10%)
2) Use predicted rates in Table 5 to calculate the predicted 6-month OS rate for this sample.
Using the above frequencies:
πP0 = 0.05 × 0.64 + 0.30 × 0.45 + 0.10 × 0.26 + 0.05 × 0.57 + 0.40 × 0.47 + 0.10 × 0.27 = 0.44 or 44%.
3) Specify the alternative πPA.
Example: πPA = πP0 + 0.15.
4) Use a standard binomial sample size calculation, assuming complete follow-up at 6 months.
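As an illustration only, the four steps of Example 1 might be sketched in Python as follows. The predicted rates and expected frequencies are those quoted in the example. The one-sided type 1 error of 0.10, the power of 0.90, the single-stage exact binomial design, and all function names are assumptions made for this sketch and are not specified in the text.

```python
from math import comb

# Predicted 6-month OS rates by (sex, PS), as quoted in Example 1 (from Table 5).
PREDICTED_OS6 = {
    ("female", 0): 0.64, ("female", 1): 0.45, ("female", 2): 0.26,
    ("male", 0): 0.57, ("male", 1): 0.47, ("male", 2): 0.27,
}

# Step 1: expected accrual fractions for each group (the guesses from Example 1).
EXPECTED_MIX = {
    ("female", 0): 0.05, ("female", 1): 0.30, ("female", 2): 0.10,
    ("male", 0): 0.05, ("male", 1): 0.40, ("male", 2): 0.10,
}


def benchmark_rate(mix, predicted):
    """Step 2: weighted average of predicted rates over the expected patient mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("group fractions must sum to 1")
    return sum(fraction * predicted[group] for group, fraction in mix.items())


def upper_tail(n, r, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def single_stage_design(p0, pa, alpha=0.10, power=0.90, max_n=300):
    """Step 4 (assumed design): smallest n and cutoff r such that rejecting the
    null when at least r of n patients survive 6 months has exact type 1 error
    <= alpha under p0 and power >= `power` under pa. Returns None if not found."""
    for n in range(1, max_n + 1):
        # Smallest cutoff that controls the type 1 error for this n.
        for r in range(n + 2):
            if upper_tail(n, r, p0) <= alpha:
                break
        if upper_tail(n, r, pa) >= power:
            return n, r
    return None


p0 = benchmark_rate(EXPECTED_MIX, PREDICTED_OS6)  # null rate, about 0.44
pa = p0 + 0.15                                    # Step 3: alternative rate
design = single_stage_design(p0, pa)
print(f"null rate = {p0:.4f}, alternative = {pa:.4f}, (n, cutoff) = {design}")
```

The sketch uses the unrounded weighted average rather than the rounded 44%. Changing the entries of `EXPECTED_MIX` shows how sensitive the benchmark and the resulting sample size are to the guessed patient mix.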
Example 2 shows the procedure for the analysis of a completed trial given the actual fraction of patients in each grouping. The calculated predicted OS rate is compared with the observed 6-month survival estimated from the new phase II trial. A new treatment could be declared worthy of additional study if the predicted null OS rate πP can be rejected at some type 1 error level (for instance, P < 0.10):
Example 2: Analyzing a completed single-arm phase II trial with 6-month OS rate as primary endpoint:
1) Use the actual frequencies of patient groups to determine πP, the average of predicted outcomes specific to the trial.
Example:
Female, PS 0 (5%); female, PS 1 (30%); female, PS 2 (10%)
Male, PS 0 (5%); male, PS 1 (40%); male, PS 2 (10%)
2) Use predicted rates in Table 5 to calculate the historical null rate prediction for OS.
Using the example frequencies above:
πP = 0.05 × 0.64 + 0.30 × 0.45 + 0.10 × 0.26 + 0.05 × 0.57 + 0.40 × 0.47 + 0.10 × 0.27 = 0.44 or 44%.
3) Compare the predicted rate with the observed 6-month OS rate in the completed trial. Declare the treatment worthy of additional study if the predicted rate (0.44) can be rejected at the desired type 1 error level (e.g., P < 0.10).
These examples represent a trial patient population with a slightly worse PS distribution than that seen in the historical dataset, and the 6-month predicted OS is adjusted to that patient mixture.
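A minimal sketch of the analysis in Example 2 is given below, under stated assumptions. The trial size (60 patients), the group counts, and the number of patients alive at 6 months are hypothetical values chosen to match the example fractions. The one-sided exact binomial test is one reasonable choice of test, not one prescribed by the text, and it assumes complete follow-up at 6 months (no censoring).

```python
from math import comb

# Predicted 6-month OS rates by (sex, PS), as quoted in Example 2 (from Table 5).
PREDICTED_OS6 = {
    ("female", 0): 0.64, ("female", 1): 0.45, ("female", 2): 0.26,
    ("male", 0): 0.57, ("male", 1): 0.47, ("male", 2): 0.27,
}

# Hypothetical accrued counts per group (60 patients, matching the example fractions).
ACTUAL_COUNTS = {
    ("female", 0): 3, ("female", 1): 18, ("female", 2): 6,
    ("male", 0): 3, ("male", 1): 24, ("male", 2): 6,
}


def trial_specific_null(counts, predicted):
    """Average of the predicted rates over the patients actually enrolled."""
    total = sum(counts.values())
    return sum(count / total * predicted[group] for group, count in counts.items())


def binomial_upper_p(successes, n, p_null):
    """One-sided exact P value: P(X >= successes) for X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


n_patients = sum(ACTUAL_COUNTS.values())
pi_p = trial_specific_null(ACTUAL_COUNTS, PREDICTED_OS6)  # about 0.44
alive_at_6_months = 34                                    # hypothetical observed count
p_value = binomial_upper_p(alive_at_6_months, n_patients, pi_p)

print(f"null rate = {pi_p:.4f}, observed rate = {alive_at_6_months / n_patients:.3f}")
print(f"one-sided P = {p_value:.4f}, worthy of further study: {p_value < 0.10}")
```

With censored follow-up, the observed 6-month rate would instead come from a Kaplan-Meier estimate, and the exact binomial test shown here would no longer apply directly.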
Discussion
With the advent of newer targeted therapies, there is potential for improvement over existing cytotoxic therapies in the treatment of pancreatic cancer (12, 13). The Clinical Trial Design Task Force of the NCI Investigational Drug Steering Committee recommended the implementation of a randomized approach for the phase II setting in general, but acknowledged the existence of situations where a single-arm approach is allowable (2). Specifically, if there exists a reliable historical database, the single-arm approach may be a way to reduce the number of patients required to complete the study (4).
The availability of appropriate data to serve as a historical control provides a mechanism to help design and assess the addition of these new agents initially in a phase II setting when a single-arm trial is desired, or when multiple treatment arms all consist of experimental therapies. Accrual may even proceed more quickly without randomization, if the prospect of randomization diminishes the desire to participate for some patients where existing therapies have marginal benefit. In fact, a large national survey of patients with cancer showed that the most frequent barrier to trial accrual, on the part of the patients, was their concern about randomized treatment (14).
With a modern, relevant historical control database for untreated, metastatic pancreatic cancer, the next step would be to create online tools, based on the historical benchmarks, to be used for study planning and analysis of single-arm or multiple-arm “pick the winner” phase II trials.
There was complete overlap in terms of survival between the phase II and phase III trial setting in the benchmark population, which supports the use of phase III trial data to supplement the available phase II data in the establishment of benchmarks for phase II trials. Progression criteria varied across the study populations: any given trial used either some version of bidimensional response and progression criteria or RECIST. RECIST is generally thought to be less sensitive to disease progression because the volume required to call progressive disease is larger, and patients satisfying the criteria for progression under WHO (with a 25% increase in the bidimensional product in any single lesion) may not satisfy the criteria for progression under RECIST. One might therefore expect shorter PFS times in the trials activated before the year 2000. Assessment intervals varied as well, with the shortest protocol-specified interval being 6 weeks and the longest being 8 weeks, which corresponded to two cycles of gemcitabine. For this reason, OS remains the most reliable endpoint in pancreatic cancer trials in general, where PFS times are frequently short enough to be biased by differing assessment intervals (5). In addition, with very short survival times, OS is a reasonable endpoint for advanced pancreatic cancer even in the phase II setting where results are expected in a shorter time frame.
Despite the potential effects on progression times based on progression criteria, in this dataset there was no significant difference in PFS between older trials, activated during the era of bidimensional progression criteria, versus newer trials using RECIST. This would suggest that the PFS endpoint is comparable even when criteria differ. However, this topic deserves further exploration, including a study of the comparability of solid tumor response endpoints, and this database will enable that exploration as well.
Estimates of the OS and PFS rates across all of the trials would be misleading if there were significant between-trial variance that could not be explained by differences in the population with respect to the chosen baseline factors (15). In this case, the elimination of one outlying trial resulted in a population without a significant between-trial variance component.
There are several compelling reasons to further this effort, once the necessary tools are in place, by continuing to build a database as trials are completed. If the database can be constantly updated with new trials, it will remain an invaluable tool for planning and analysis. Changes in treatment standards and improvements in first-line and second-line therapy will ultimately result in benchmarks in need of continuous updates. Although there was no difference in survival over the past 10 years in the dataset used for the primary analysis, this was not so for the entire database. For the data initially contributed to this project, the decision was made to include only those trials that administered gemcitabine as part of the regimen, because this represented a change in treatment standard and an accompanying advance in the survival prognosis (6, 12).
The “outlier” trial NCCTG-034A was excluded from the benchmark calculations because its results differed markedly from the rest of the database in the positive direction. The recent development of the non–gemcitabine-based cytotoxic combination regimen of oxaliplatin, irinotecan, 5-fluorouracil, and leucovorin (FOLFIRINOX; ref. 16) demonstrated improved survival of patients with metastatic disease over gemcitabine. However, the applicability of the FOLFIRINOX regimen, usually in a modified form, is limited to 20% to 25% of patients with metastatic disease who have a favorable PS (0–1) and adequate liver function and are generally younger. For now, gemcitabine-based regimens remain the standard of care for the majority of patients with advanced pancreatic cancer, and the database reported upon here remains relevant for the future. A recent phase III study demonstrated the superiority of the nab-paclitaxel/gemcitabine combination over gemcitabine alone, which would support the use of gemcitabine-based therapies in advanced pancreatic cancer for the foreseeable future (17). Nevertheless, future additions to a database of historical controls would be key to maintaining its relevance. Further work on evaluating the performance of the model on recently completed and future phase II trial results would also be of interest. Expanding these analyses to other more comprehensive datasets such as the Aide et Recherche en Cancérologie Digestive (ARCAD) pancreatic database would also be a very important consideration.
The discovery of factors that are prognostic for survival in pancreatic cancer was not the purpose of this study. However, there are other patient-level baseline factors in addition to PS and sex that are undoubtedly prognostic for survival and PFS, and it may be practical for some of these factors to be used in the refinement of historical benchmarks. For example, although lactate dehydrogenase was requested, it was not available from the contributing groups. Based upon initial review of availability, factors such as percentage weight loss, serum albumin concentration, serum CA19–9 level, and tumor grade were not requested, although there is some reported evidence that they might be prognostic. Location of the primary tumor and metastatic sites could also be considered as factors in the future. With a more extensive database, those factors could be identified and used for the further development of historical benchmarks tailored more closely to a specific phase II trial's patient population. This would increase the advantage of this covariate-adjusted approach, as compared with the simpler method of comparing overall outcomes with historical benchmarks. Furthermore, other groups represented in the database in smaller proportions, such as those with locally advanced disease treated with or without radiotherapy in conjunction with chemotherapy, are worthy of investigation to develop benchmarks applicable to those populations.
Continued development of a historical control database would lend itself well to the development of Web-based study planning and analysis tools. These freely accessible tools would enable researchers to design an appropriately powered single-arm or multiple-arm experimental study in much the same way as currently available tools allow. The enhancement would be the ability to specify the expected patient population in terms of the baseline factors. These tools would also enable the researcher to perform analyses at the conclusion of the trial that account for the actual makeup of the trial population with respect to the important baseline factors, allowing for a comparison against an appropriately matched historical control. It is worth noting that these target values could be used to help develop trial designs other than single-arm studies. For example, they could be used to provide baseline survival calculations to aid in the design of randomized selection or screening studies that do not incorporate a control arm. If the database continues to grow, these tools can be continuously refined. As these tools are developed and used, it is hoped that the experience gained can be successfully applied to other disease settings in phase II cancer clinical trials.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Authors' Contributions
Conception and design: P.A. Philip, M. LeBlanc, L. Rubinstein, S.P. Ivy, J. Crowley
Development of methodology: P.A. Philip, K. Chansky, M. LeBlanc, L. Rubinstein, J. Crowley
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): P.A. Philip, S.R. Alberts
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): P.A. Philip, K. Chansky, M. LeBlanc, L. Seymour, J. Crowley
Writing, review, and/or revision of the manuscript: P.A. Philip, K. Chansky, M. LeBlanc, L. Rubinstein, L. Seymour, S.P. Ivy, S.R. Alberts, J. Crowley
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. LeBlanc, P.J. Catalano
Study supervision: M. LeBlanc, J. Crowley
Grant Support
This work was supported by the NCI of the NIH under award number U10CA038926 (to J. Crowley).