Classical phase II trial designs assume a homogeneous tumor type and yield an estimate of a single probability of tumor response. Clinically, however, oncology is moving towards tumor subtyping based upon predictive markers. For a given phase II trial, predictive markers may be defined prospectively (on the basis of previous results) or identified retrospectively by analyzing responding and non-responding tumors. Retrospective analysis has two advantages: the analysis can be “supervised” by grouping responding and non-responding tumors, and hypotheses not formulated at the time of trial design can be tested. We propose that phase II trials should be powered to permit the retrospective (or prospective) identification of responding tumor subtypes. We propose three approaches to such power calculations: (i) the trial is powered to detect the presence of a responding subtype of the smallest prevalence of interest and to estimate the response rate in the total population; (ii) the trial is powered to detect the presence of a responding subtype of the smallest prevalence of interest and to estimate the response rate within that subtype; and (iii) the trial is powered to detect a hypothesized difference between the response rate in the smallest subtype of interest and that in the total population. These calculations can be applied to both single-stage and two-stage designs. Relevant parameters include the smallest prevalence of interest of a responding subtype, the hypothesized response rate within that subtype, the hypothesized total response rate, and the probabilities of type I and type II errors. Extensions to allow multiple searches for subtypes and to account for imperfect marker assays will be considered. Sample size calculations for the different scenarios will be presented.
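To give a feel for how subtype prevalence drives sample size, the following is a minimal sketch and not the calculation used in this work. It assumes patients are independent, and that "detecting" a responding subtype means observing at least one responding tumor that belongs to the subtype. The function names, the default power of 0.90, and the detection criterion are illustrative assumptions.

```python
def detection_probability(n, prevalence, subtype_response_rate):
    """Probability that at least one of n patients has a responding
    tumor of the subtype (assumed, simplified detection criterion)."""
    # Chance that a single patient is both in the subtype and a responder.
    p = prevalence * subtype_response_rate
    return 1.0 - (1.0 - p) ** n


def min_sample_size(prevalence, subtype_response_rate, power=0.90, n_max=10_000):
    """Smallest n whose detection probability reaches the requested power."""
    for n in range(1, n_max + 1):
        if detection_probability(n, prevalence, subtype_response_rate) >= power:
            return n
    raise ValueError("power not reached within n_max patients")


if __name__ == "__main__":
    # Hypothetical inputs: subtype prevalence 8%, within-subtype response rate 89%.
    n = min_sample_size(prevalence=0.08, subtype_response_rate=0.89, power=0.90)
    print(n, round(detection_probability(n, 0.08, 0.89), 3))
```

Approaches (i) to (iii) also require estimating or comparing response rates, so they would call for larger samples than this simple detection criterion suggests.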

Example: In a retrospective analysis of non-small cell lung cancers treated with the epidermal growth factor receptor (EGFR)-targeted agent gefitinib, Lynch et al. (NEJM 350:2129) discovered a new set of EGFR mutations that conferred sensitivity to gefitinib. In a sample of 275, the overall response rate was 9%, but the response rate within tumors with EGFR mutations (about 8% of the total) was 89%. Had these investigators anticipated these facts at the outset, a power calculation according to (iii) would have yielded a sample size of 80. Fortunately, at the time they conducted their analysis, they had an excess of cases. On the other hand, had this investigation been structured as an optimal two-stage phase II trial with typical parameters (a response rate of interest of 20% or greater and a response rate of no interest of 5% or less), the probability of termination at stage one (the probability of no responses in the first 10 patients) would have been 60%, the maximum sample size would have been 29, and the responding subtype would most likely have remained undiscovered.
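The stage-one termination probability quoted above is a binomial tail and can be checked directly. The sketch below assumes the stopping rule stated in the example (stop if there are no responses among the first 10 patients); the function and parameter names are hypothetical. Under a true response rate of 5%, the rate of no interest, the result is about 60%; under the 9% overall rate observed in the example, the same expression gives roughly 39%.

```python
from math import comb


def prob_early_termination(true_response_rate, n_stage1=10, max_responses_to_stop=0):
    """Probability of stopping after stage one: at most
    max_responses_to_stop responses among n_stage1 patients."""
    p = true_response_rate
    return sum(
        comb(n_stage1, k) * p**k * (1.0 - p) ** (n_stage1 - k)
        for k in range(max_responses_to_stop + 1)
    )


if __name__ == "__main__":
    # Evaluate at the "no interest" rate and at the observed overall rate.
    for rate in (0.05, 0.09):
        print(rate, round(prob_early_termination(rate), 3))
```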

98th AACR Annual Meeting-- Apr 14-18, 2007; Los Angeles, CA