Early phase cancer prevention trials are designed to demonstrate safety, tolerability, feasibility, and signals of efficacy of preventive agents. Yet many such trials fail to detect intervention effects. We conducted a systematic review and pooled analyses of recently completed early phase chemoprevention trials to gain in-depth insight into the failure to detect efficacy signals by comparing hypothesized effect sizes with the corresponding observed effect sizes.

Single- or multi-arm efficacy chemoprevention trials conducted under the phase 0/I/II Cancer Prevention Clinical Trials Program of the Division of Cancer Prevention, NCI between 2003 and 2019 were evaluated. A total of 59 chemoprevention trials were reviewed. Twenty-four studies were efficacy or biomarker trials with complete information on hypothesized and observed effect sizes and were included in this analysis. The majority of the trials (n = 18) were multi-arm randomized studies, of which 15 were blinded. The pooled estimate of the observed to hypothesized effect size ratio was 0.57 (95% confidence interval: 0.42–0.73, P < 0.001) based on a random-effects model. No significant differences in the ratio of observed to hypothesized effect sizes were detected in any of the subgroup analyses.

The results demonstrate that the majority of early phase cancer chemoprevention trials have substantially smaller observed effect sizes than hypothesized effect sizes. Sample size calculations for early phase chemoprevention trials need to balance the potential detectable effect sizes with realistic and cost-effective accrual of study populations, thereby detecting only intervention effects large enough to justify subsequent large-scale confirmatory trials.

Prevention Relevance:

The results of this systematic review and pooled analyses demonstrate that for early chemoprevention trials, there are substantial differences between hypothesized and observed effect sizes, regardless of study characteristics. The conduct of early phase chemoprevention trials requires careful planning of study design, primary endpoint, and sample size determination.

Early phase chemoprevention trials in cancer are designed to evaluate safety and tolerability and to identify efficacy signals of chemopreventive agents. The phase 0/I/II Cancer Prevention Clinical Trials Program (referred to as the “Consortia” Program) of the Division of Cancer Prevention, NCI, was established in 2003, and renewed in 2012 with five major academic medical centers. The goals of the Consortia Program were as follows: (i) to efficiently design and conduct early phase clinical trials necessary to assess the potential of cancer preventive agents of various classes, many of which are directed at molecular targets which have been shown to be expressed in intraepithelial neoplasia; (ii) to characterize the biological effects of new cancer preventive agents on their defined molecular targets as well as on multiple endpoints associated with carcinogenesis, such as proliferation, apoptosis, growth factor expression, oncogene expression, and others (correlation of these effects with clinically relevant endpoints is required); and (iii) to develop further scientific insights into the mechanisms of cancer prevention by the agents examined and to continue to develop novel potential markers as determinants of response.

Unlike the drug development process for therapeutic agents, there is no uniform approach to how early phase cancer prevention trials are conducted. Although the paradigm is evolving with targeted and immunotherapeutic agents, clinical trials for therapeutic agents follow a relatively standard drug evaluation paradigm, that is, a phase I dose-escalation trial to evaluate safety and to identify the MTD, followed typically by a single-arm nonrandomized study to evaluate efficacy signals using accepted surrogate clinical endpoints, such as objective response or progression-free survival, in a relatively small number of patients. For early phase prevention trials, on the other hand, there is no uniform approach for safety and early efficacy evaluation. Agents that are evaluated for prevention range from new entities requiring full phase I evaluation to repurposed agents for which safety is well established but for which the optimal preventive dose or schedule may need additional study. Furthermore, while early phase clinical trials of therapeutic agents are designed to evaluate efficacy signals using a limited number of surrogate clinical endpoints, the efficacy signals of preventive agents are typically evaluated in a large variety of putative surrogate biomarkers which are potentially related to the prevention or delay of the carcinogenic process and targeted by the mechanism of action of the preventive agents (1). This heterogeneity in study designs and endpoints poses challenges when determining hypothesized effect sizes and, consequently, sample sizes for early phase prevention trials. This has resulted in a large number of studies which have not been adequately powered and, consequently, negative trials with inconclusive results (2).

We conducted a systematic review and pooled analyses of recently completed trials conducted under the phase 0/I/II Cancer Prevention Clinical Trials Program to gain in-depth insight into the failure to detect early chemopreventive efficacy signals by comparing hypothesized effect sizes with the corresponding observed effect sizes.

The systematic review was conducted following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement (3).

Rationale and objectives

We performed a systematic review and pooled analyses of early phase chemoprevention clinical trials conducted under the phase 0/I/II Cancer Prevention Clinical Trials Program to examine reasons for failure to detect chemoprevention intervention effects in such trials by comparing planned effect sizes to observed effect sizes.

Inclusion criteria for systematic review

All chemoprevention trials conducted under the phase 0/I/II Cancer Prevention Clinical Trials Program were evaluated in the systematic review. This included phase 0/I/II single-arm and multi-arm efficacy, safety, feasibility, or biomarker trials that were conducted and completed or terminated between 2003 and 2019 and for which a final report or published article was available.

Data collection process and data items

Three co-authors (J. Eickhoff, G. Chen, and J. Zaborek) reviewed study protocols, articles, and study reports to collect study information. The following study information was obtained from the study protocols: study agent, disease site, principal investigator, study status, study type, study phase, number of study arms, randomization status, analysis population, primary endpoint(s), primary endpoint type (continuous or binary), anticipated dropout rate, planned sample size, and specifics regarding the sample size/power calculation, including hypothesized effect size, significance level, power, and statistical methods/tests. Published articles and final study reports were reviewed to collect the following study results information: observed sample size for primary endpoint evaluation, study duration, statistical methods and tests which were used for the analysis of primary endpoint, observed effect size for primary endpoint, P value for primary endpoint, and whether the study was completed or terminated early.

Analysis dataset of evaluable studies

Studies where the primary endpoint was efficacy or modulation of a biomarker measured on a continuous or binary scale were considered evaluable for inclusion in the analysis. Studies where the primary endpoint was safety, dose finding, bioequivalence, noninferiority, or pharmacokinetics were excluded from the analysis. Furthermore, studies with incomplete information regarding planned sample sizes and studies without adequate results information regarding the primary endpoints, primary comparisons, or observed effect sizes were considered unevaluable and excluded from the analyses.

Hypothesized effect sizes

Hypothesized effect sizes were quantified as standardized differences for all studies. Specifically, for nonrandomized single-arm trials with the primary endpoint measured on a continuous scale and a primary efficacy evaluation based on the change from baseline (preintervention) to the postintervention assessment, the hypothesized effect size was defined as ESH = |μPost − μPre|/σ, where μPost and μPre denote the anticipated postintervention and preintervention means and σ the SD of the change. For binary outcomes (proportions), the standard normal approximation was used to define the hypothesized effect sizes, that is, μ = p and σ = √(p(1 − p)), where p denotes the assumed population proportion. Analogously, for randomized two-arm studies whose primary objective was a between-arm comparison of a continuous endpoint, the standardized absolute difference was defined as ESH = |μ1 − μ2|/σp, where μ1 and μ2 denote the hypothesized means for the two arms and σp the pooled SD. For randomized two-arm studies with binary outcomes (proportions), the standardized absolute difference was defined as ESH = |p1 − p2|/√(p̂(1 − p̂)), where p1 and p2 denote the hypothesized proportions for the two arms and p̂ the pooled proportion across the two arms. For randomized studies with more than two arms, the standardized absolute difference was defined on the basis of the primary pairwise comparison as defined in the protocol.
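The standardized differences above are straightforward to compute; a minimal sketch in Python (function names are illustrative, not from the study's analysis code; the binary two-arm helper assumes equal allocation so that the pooled proportion is the simple average):

```python
from math import sqrt

def es_h_continuous(mu_post, mu_pre, sd_change):
    """Hypothesized effect size for a single-arm pre/post continuous endpoint:
    ESH = |muPost - muPre| / sigma, with sigma the SD of the change."""
    return abs(mu_post - mu_pre) / sd_change

def es_h_two_arm_continuous(mu1, mu2, sd_pooled):
    """ESH = |mu1 - mu2| / pooled SD for a randomized two-arm continuous endpoint."""
    return abs(mu1 - mu2) / sd_pooled

def es_h_two_arm_binary(p1, p2):
    """ESH = |p1 - p2| / sqrt(p_hat * (1 - p_hat)), with p_hat the pooled
    proportion (simple average here -- equal allocation is an assumption)."""
    p_hat = (p1 + p2) / 2
    return abs(p1 - p2) / sqrt(p_hat * (1 - p_hat))
```

For instance, hypothesizing an increase in a response proportion from 0.2 to 0.5 corresponds to a standardized difference of about 0.63.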

Observed effect sizes

Observed effect sizes (ESO) were calculated when means/SDs (continuous endpoints) or proportions (binary endpoints) for the primary endpoints were reported. Specifically, for a single-arm trial with a continuous endpoint, the observed effect size was calculated as ESO = |x̄ − μ0|/s, where x̄, μ0, and s denote the sample mean, the reference mean, and the SD; for a single-arm preintervention versus postintervention trial, the observed effect size was calculated as ESO = |x̄Post − x̄Pre|/s, where x̄Post and x̄Pre denote the sample postintervention and preintervention means and s the SD of the preintervention-to-postintervention change. For a randomized two-arm study where the primary endpoint was measured on a continuous scale and the primary objective was a between-arm comparison, the observed effect size was calculated as ESO = |x̄1 − x̄2|/sp, where x̄1 and x̄2 denote the sample means for the two arms and sp denotes the pooled SD. Analogously, for randomized studies with more than two arms, ESO was calculated on the basis of the primary pairwise comparison between arms as defined in the study protocol.
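For the two-arm case, the pooled SD must typically be reconstructed from the per-arm sample SDs; a sketch using the standard two-sample pooled-variance formula (helper names are illustrative):

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Pooled SD from per-arm sample SDs and sizes (standard two-sample formula)."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def es_o_two_arm(xbar1, xbar2, s1, n1, s2, n2):
    """Observed effect size ESO = |xbar1 - xbar2| / pooled SD."""
    return abs(xbar1 - xbar2) / pooled_sd(s1, n1, s2, n2)
```

With equal per-arm SDs the pooled SD reduces to that common SD, so a mean difference of half the SD gives ESO = 0.5.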

For some studies, the primary endpoint was analyzed using nonparametric tests and only medians and the corresponding P values were reported so that the standardized ESO, as described above, could not be determined. For other studies, only the P values were reported without any information regarding the observed differences. In those situations, the ESO were estimated on the basis of the reported P values and sample size information for the primary endpoint evaluations. Unless noted otherwise, two-sided P values were assumed for all comparisons.

For example, for one-arm studies with a two-sided test, ESO was estimated as ESO = z(1 − p/2) √(1/N), where N denotes the actual number of evaluable subjects, p the reported P value, and z(1 − p/2) the 1 − p/2 quantile of the standard normal distribution. In cases where only an upper bound of the P value was provided, for example, P < 0.001, the P value for the ESO estimation was truncated at that bound. The confidence interval (CI) transformation principle approximation method was used to estimate SEs and construct 95% CIs for the observed effect sizes (4).
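This back-calculation can be sketched with the standard normal quantile from Python's stdlib `statistics.NormalDist` (the function name is illustrative; pass the reported upper bound when only a bound such as P < 0.001 was given, per the truncation rule above):

```python
from math import sqrt
from statistics import NormalDist

def es_o_from_p(p_value, n, two_sided=True):
    """Estimate ESO = z_{1 - p/2} * sqrt(1/N) for a one-arm study from a
    reported P value and the actual number of evaluable subjects N."""
    q = 1 - p_value / 2 if two_sided else 1 - p_value
    z = NormalDist().inv_cdf(q)  # standard normal quantile
    return z * sqrt(1 / n)
```

For example, a study reporting a two-sided P = 0.05 with N = 25 evaluable subjects yields ESO ≈ 0.39.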

Analysis of hypothesized versus observed effect sizes

To quantify the differences between hypothesized and observed effect sizes, the ratios of ESO to ESH were calculated and reported. Consistency between ESO and ESH across studies was quantified by calculating the intraclass correlation coefficient (ICC), which was reported along with the corresponding 95% CI.

Pooled estimates of the ESO to ESH ratios across studies were obtained using a random-effects linear model or, in the absence of heterogeneity among studies, a fixed-effects linear model (5). Weighting of individual studies was performed using the inverse variance method (6). Heterogeneity of the ratios among studies was quantified by calculating the I² statistic (7). A forest plot, showing the ratios of the individual studies and pooled estimates along with the corresponding 95% CIs, was generated to summarize the results. Because only the numerator of each ratio represented a random variable, with no evidence of skewness in the distribution, no transformation was applied when analyzing the ratios. If a lower confidence bound for an individual study was negative, the corresponding confidence bound was truncated at zero in the forest plot.
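The random-effects pooling step can be sketched with the DerSimonian–Laird moment estimator (5) under inverse-variance weighting (6), with I² computed per Higgins and Thompson (7). This is a generic illustration, not the study's SAS/R code:

```python
from math import sqrt

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled estimate, SE, between-study variance tau^2, I^2 in %)
    for per-study effects (here: ESO/ESH ratios) and their variances.
    """
    w = [1 / v for v in variances]                    # fixed-effects weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                     # moment estimator, floored at 0
    w_star = [1 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, tau2, i2
```

When the studies are homogeneous (τ² = 0), the estimate reduces to the fixed-effects inverse-variance pooled estimate, mirroring the model choice described above.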

Statistical analyses were conducted using SAS (SAS Institute), version 9.4 and R software (R Foundation for Statistical Computing), version 3.5.1. All reported P values were two sided and P < 0.05 was used to define statistical significance.

Subgroup analyses

Subgroup analyses were conducted by comparing ESO versus ESH for the following subgroups: (i) studies where the accrual goal was achieved, defined as enrolling at least 80% of the accrual target, versus studies where the accrual goal was not achieved, (ii) studies where the primary analysis was based on the intent-to-treat (ITT) analysis population versus studies where it was based on the per-protocol analysis population as defined in the corresponding study protocols or study reports, (iii) nonrandomized single-arm versus randomized multi-arm studies, (iv) large studies, defined as an actual total (across arms) sample size of at least 50 subjects, versus small studies, defined as an actual total sample size of less than 50 subjects, and (v) studies where the effect size was reported or could be calculated on the basis of reported mean differences and SDs versus studies where the effect sizes were not reported but could be derived on the basis of P value and sample size information as described above. Pooled estimates for each subgroup were obtained using a random-effects or fixed-effects linear model based on inverse variance weighting as described earlier. Comparisons of pooled estimates between subgroups were conducted with a two-sample t test based on the differences of the pooled estimates and the SE estimates obtained from the random-effects or fixed-effects models.
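The between-subgroup comparison can be sketched as a test on the difference of two pooled estimates using their model-based SEs. A normal approximation is used here for simplicity, where the analysis described above used a t test (function name and inputs are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def compare_pooled(est1, se1, est2, se2):
    """Two-sided P value (normal approximation) for the difference between
    two pooled subgroup estimates with model-based standard errors."""
    z = (est1 - est2) / sqrt(se1 ** 2 + se2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Identical estimates give P = 1, and the P value shrinks as the difference grows relative to the combined SE.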

Data availability statement

The data underlying this article will be shared on reasonable request to the corresponding author.

Study characteristics

A total of 59 studies conducted and completed between 2003 and 2019 were reviewed. Eighteen studies did not meet the inclusion/exclusion criteria for the analysis, including two pharmacokinetic, 10 safety, three bioequivalence, and three noninferiority studies (Fig. 1). Seventeen additional studies were excluded as they were not evaluable for the analysis due to insufficient information on study results or hypothesized effect size, resulting in a total of 24 evaluable studies (Table 1).

Figure 1.

Flow diagram showing study selection process based on PRISMA 2020 guidelines. A total of 24 evaluable studies were identified.

Table 1.

Study characteristics of 24 trials included in the analysis.

NCT number  | Target organ | Agents                                    | Number of arms | Planned sample size | Actual sample size | Effect size reported^a
NCT01447927 | Esophagus    | Metformin, Hydrochloride                  |                | 64                  | 69                 | No
NCT03300570 | Colorectal   | Dolcanatide                               |                | 24                  | 24                 | Yes
NCT00438464 | Prostate     | Finasteride                               |                | 200                 | 183                | Yes
NCT00321893 | Lung         | Budesonide                                |                | 202                 | 198                | Yes
NCT00637481 | Breast       | Atorvastatin, Calcium                     |                | 60                  | 61                 | No
NCT01021215 | Lung         | Celecoxib, Zileuton                       |                | 66                  | 70                 | Yes
NCT01849250 | Breast       | Doconexent (DHA)                          |                | 60                  | 57                 | No
NCT00468910 | Colon        | Aspirin                                   |                | 60                  | 72                 | No
NCT00828984 | Colon        | Polyethylene Glycol                       |                | 60                  | 32                 | No
NCT00245024 | Breast       | Sulindac                                  |                | 30                  | 26                 | No
NCT00721877 | Other        | Resveratrol                               |                | 42                  | 40                 | Yes
NCT01097304 | Esophagus    | Ursodiol                                  |                | 30                  | 28                 | No
NCT01370889 | Breast       | Resveratrol                               |                | 30                  | 34                 | No
NCT01433913 | Prostate     | Metformin, Hydrochloride                  |                | 46                  | 19                 | No
NCT01447355 | Skin         | Cholecalciferol                           |                | 25                  | 25                 | Yes
NCT00462280 | Skin         | Lovastatin                                |                | 60                  | 49                 | No
NCT00513461 | Liver        | S-Adenosyl-l-methionine                   |                | 110                 | 83                 | No
NCT00754494 | Colon        | Erlotinib.HCl                             |                | 45                  | 42                 | Yes
NCT01312467 | Colorectal   | Metformin, Hydrochloride                  |                | 39                  | 32                 | No
NCT00118040 | Bladder      | Genistein                                 |                | 60                  | 51                 | No
NCT00853996 | Breast       | Acolbifene.HCl                            |                | 40                  | 25                 | Yes
NCT00450229 | Prostate     | 3,3′ Diindolylmethane                     |                | 45                  | 39                 | No
NCT00666562 | Bladder      | Green tea, Catechins                      |                | 33                  | 29                 | Yes
NCT01325311 | Prostate     | Cholecalciferol, G-2535 (Isoflavones 100) |                | 50                  | 15                 | Yes

^a Effect size was reported as a standardized difference or could be derived from reported means and SDs.

The target organ was colon/colorectum for five (21%) studies, breast for five (21%) studies, prostate for four (17%) studies, and other for 10 (42%) studies (Fig. 2A). There were six (25%) nonrandomized single-arm and 18 (75%) randomized multi-arm studies, of which 15 were single- or double-blinded. The median planned sample size was 48 (range, 24–202) participants while the median actual sample size was 40 (range, 19–198; Table 1). The ITT analysis population was used for the primary endpoint evaluation in 18 (75%) of the studies while the per-protocol analysis population was used in six (25%) studies. For the majority of the studies (N = 21), the planned sample sizes were calculated using 0.80–0.89 power at the one- or two-sided 0.05 significance level (Fig. 2A). Only one study (4%) was terminated early. Figure 2B shows the target sample sizes versus the corresponding actual sample sizes. The observed dropout rate exceeded the planned dropout rate in 11 (46%) studies (Fig. 2C). The R² value when regressing the actual sample size on the planned sample size was 0.94. The actual sample sizes, however, did not exceed the corresponding planned sample sizes for the majority of studies (79%).

Figure 2.

Characteristics of the 24 trials included in the analysis. A, Study characteristics. B, Target versus observed sample sizes. C, Planned versus observed dropout rates.


Analysis of hypothesized versus observed effect sizes

Figure 3 shows the scatter plot of ESH versus ESO. The mean ESH was 0.72 (SD 0.26) while the mean ESO was 0.43 (SD 0.29). The ESO was less than the corresponding ESH in 20 (83%) of the studies. The results of the pooled analysis are shown in the forest plot (Fig. 4). The pooled mean ratio of ESO to ESH based on the random-effects model was 0.57 (95% CI: 0.42–0.73, P < 0.001). Heterogeneity in the ratio of ESO to ESH was large, with an I² of 59%. A low level of agreement between ESO and ESH was observed, with an ICC of 0.19 (95% CI: −0.2 to 0.5).

Figure 3.

Scatter plot of hypothesized versus observed effect sizes. The observed effect sizes were less than the corresponding hypothesized effect sizes for the majority of studies.

Figure 4.

Forest plot of the ratios ESO/ESH across studies. The overall pooled mean ratio of ESO to ESH based on the random-effects model was 0.57 (95% CI: 0.42–0.73, P < 0.001).


Subgroup analyses

Pooled ESO to ESH ratios within subgroups were calculated and compared (Table 2). For studies where the accrual goal was achieved (N = 19), the pooled ESO to ESH ratio was 0.54 (95% CI: 0.39–0.71) versus 0.70 (95% CI: 0.23–1.18) for studies where the accrual goal was not achieved (P = 0.43). There was no difference observed in the ESO to ESH ratio when comparing studies where the primary endpoint was evaluated using the ITT analysis population (0.56, 95% CI: 0.37–0.75) to studies where the primary endpoint was evaluated using the per-protocol population (0.61, 95% CI: 0.30–0.93; P = 0.79). Furthermore, there was no significant difference detected when comparing the pooled ESO to ESH ratios between large (0.45, 95% CI: 0.24–0.65) and small (0.67, 95% CI: 0.45–0.89) studies (P = 0.20). Analogously, no significant difference was detected when comparing the pooled ESO to ESH ratios between studies where the effect size was reported (0.67, 95% CI: 0.41–0.92) and studies where the effect size was not reported and therefore estimated (0.45, 95% CI: 0.32–0.58; P = 0.11). When comparing nonrandomized single-arm studies (N = 6) with randomized multi-arm studies (N = 18), the pooled ESO to ESH ratio was 0.75 (95% CI: 0.35–1.15) for nonrandomized single-arm studies versus 0.48 (95% CI: 0.37–0.60) for randomized multi-arm studies (P = 0.09).

Table 2.

Subgroup analyses of the evaluation of ESO versus ESH.

Subgroup                                      | Number of studies | Pooled ratio ESO/ESH (95% CI) | P    | I²
Accrual: achieved (≥80% of accrual goal)      | 19                | 0.54 (0.38–0.71)^a,c          | 0.43 | 59%
Accrual: not achieved (<80% of accrual goal)  | 5                 | 0.70 (0.23–1.18)^a            |      | 58%
Analysis population: intent-to-treat          | 18                | 0.56 (0.37–0.75)^a,c          | 0.79 | 57%
Analysis population: per protocol             | 6                 | 0.61 (0.30–0.93)^a,c          |      | 66%
Study design: nonrandomized single-arm        | 6                 | 0.75 (0.35–1.15)^a            | 0.09 | 29%
Study design: randomized multi-arm            | 18                | 0.48 (0.37–0.60)^a,c          |      | 83%
Study size: large (n^b ≥ 50)                  | 9                 | 0.45 (0.24–0.65)^a,c          | 0.20 | 56%
Study size: small (n^b < 50)                  | 15                | 0.67 (0.45–0.89)^a,c          |      | 59%
Reported effect size: yes^d                   | 10                | 0.67 (0.41–0.92)^a,c          | 0.11 | 69%
Reported effect size: no^e                    | 14                | 0.45 (0.32–0.58)^c            |      | 45%

^a Random-effects model.

^b Actual total sample size.

^c P < 0.05.

^d Effect size was reported as a standardized difference or could be calculated from reported means and SDs.

^e Effect size was estimated on the basis of reported P values and sample size information.

The findings summarized above indicate that the majority of early phase chemoprevention trials conducted under the phase 0/I/II Cancer Prevention Clinical Trials Program of the Division of Cancer Prevention between 2003 and 2019 had much smaller observed effect sizes than hypothesized effect sizes, ultimately resulting in a large proportion of negative trials. Specifically, for 83% of the trials, the observed effect size was smaller than the hypothesized effect size, and the mean observed effect size was only approximately 60% of the hypothesized effect size across all studies. It was anticipated that the observed effect size would be closer to the hypothesized effect size for studies where the accrual goal was met or for trials where the primary analysis was based on the per-protocol population. However, no significant differences were detected when comparing ratios between studies where the accrual goal was met versus not met or when comparing studies where the primary analysis was based on the ITT versus the per-protocol population. When comparing nonrandomized single-arm trials with randomized multi-arm trials, the observed to hypothesized effect size ratio was larger for nonrandomized single-arm studies (0.75 vs. 0.48; P = 0.09). This is not surprising, as determining a realistic effect size requires less information at study planning and is subject to less uncertainty for nonrandomized single-arm trials than for randomized multi-arm trials. For example, for a typical preintervention versus postintervention single-arm study evaluating the intervention effect on a surrogate biomarker endpoint measured on a quantitative scale, determining a hypothesized effect size requires only an estimate of the absolute or percentage difference in the biomarker endpoint from the preintervention to postintervention assessment and an estimate of the corresponding SD. Determining the hypothesized effect sizes for a placebo-controlled trial, on the other hand, would require the same information for the treatment arm in addition to information regarding potential differences and variability in the control arm.

The primary endpoints for early phase therapeutic trials are typically surrogate clinical endpoints, for example, objective response or progression-free survival, while for prevention trials, they are generally putative (but oftentimes not yet validated) surrogate biomarker endpoints. Determining realistic hypothesized effect sizes for clinical endpoints, based on considerations regarding clinically important differences and results from other clinical trials, is considerably easier than for surrogate biomarker endpoints in early phase prevention trials. Specifically, for prevention trials with surrogate biomarker endpoints, preliminary studies evaluating the intervention effect are oftentimes lacking. Furthermore, biomarker endpoints are often subject to large variability, resulting in smaller observed effect sizes. Precision of the biomarker measurements from diverse biologic samples is often not well established prior to their application in clinical trials. The precision is affected by both methodologic and biologic variations. For example, the proliferation marker Ki67, which was used as a primary endpoint in seven of the 59 reviewed trials, is subject to substantial interlaboratory variability (8, 9). Therefore, analyses are generally batched and performed at study end. In addition, biomarkers such as proliferation may be subject to spontaneous variation and may be sensitive to unmeasured stressors experienced by the study participant, in addition to the intervention itself. It is difficult to control for such confounding factors, short of increasing the number of enrolled participants and using a randomized placebo-controlled trial design.

For these reasons, it is common practice to utilize randomized controlled designs in early phase prevention trials. In the drug development process of cancer therapeutic agents, however, early phase efficacy trials have traditionally been designed as noncontrolled single-arm trials (10). The advantages and disadvantages of utilizing randomized controlled designs for early phase therapeutic efficacy trials have been debated in recent years (11–13). The main advantage of a randomized controlled design for an early efficacy evaluation is that it minimizes bias. As there are no prevention biomarkers analogous to CT-measured tumor shrinkage used in therapeutic trials, biomarkers used in cancer prevention trials are less well standardized, and randomized trial designs help to minimize the effects of this variability. On the other hand, a randomized controlled design requires a substantially larger sample size, resulting in prolonged study duration, which is undesirable and oftentimes impractical for trials where the primary objective is to detect an efficacy signal. The increased utilization of single-arm study designs for early efficacy evaluation in prevention trials deserves further study, as it may be a practical and efficient way to identify promising prevention agents which could then be confirmed in a properly powered, large randomized controlled trial setting. The utilization of blinded interim analyses for sample size re-estimation or interim analyses for futility assessments could be a useful tool for evaluating early efficacy signals in prevention trials where feasible, although this requires biomarker analysis during trial conduct, which could introduce additional variability due to batch-to-batch variations for laboratory measurements performed at different times.

Limitations of the analysis presented here include the relatively small number of evaluable studies and the large heterogeneity between studies with respect to study design and primary endpoint. Comparisons between subgroups were considered exploratory due to the small number of studies. Furthermore, the small number of evaluable studies did not allow for additional subgroup analyses or meta-regression analyses to identify additional factors which could be associated with the discrepancy between hypothesized and observed effect sizes. Nevertheless, the trials included in this systematic review provide important experience regarding expectations for the biomarkers that have been tested thus far, allowing future effect size planning to be more realistic.

Early phase chemoprevention trials require careful planning in terms of study design and primary endpoints. Utilization of the classical randomized controlled design may not be the optimal choice in every situation. Furthermore, sample size and hypothesized effect sizes should be determined under realistic assumptions with respect to anticipated intervention effects and variability in surrogate biomarkers. The goal of these early phase trials is to identify efficacy signals to justify further agent development. Sample size calculations for such early phase cancer prevention trials need to balance the potential detectable effect sizes with the capacity for realistic and cost-effective accrual. The overarching need is to detect only effect sizes large enough to justify subsequent phase III trials.

J. Eickhoff reports grants from NIH during the conduct of the study. No disclosures were reported by the other authors.

J. Eickhoff: Data curation, formal analysis, methodology, writing–original draft. J. Zaborek: Data curation, methodology, writing–review and editing. G. Chen: Data curation, methodology, writing–review and editing. V.V. Sahasrabuddhe: Conceptualization, resources, methodology, writing–review and editing. L.G. Ford: Conceptualization, resources, methodology. E. Szabo: Conceptualization, resources, investigation, methodology, writing–review and editing. K. Kim: Conceptualization, supervision, funding acquisition, investigation, methodology, project administration, writing–review and editing.

This article is dedicated to the memory of Zhuming Zhang, who provided invaluable contribution to this work and passed away suddenly in November of 2022. The work was supported in part by U24 CA242637 for the CP-CTNet Coordinating Center (K. Kim, J. Eickhoff, J. Zaborek, and G. Chen) and P30 CA014520 for the University of Wisconsin Carbone Cancer Center Support Grant from the NIH (K. Kim and J Eickhoff).

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

1. Szabo E. Biomarkers in phase I-II chemoprevention trials: lessons from the NCI experience. Ecancermedicalscience 2015;9:599.

2. Follen M, Vlastos AT, Meyskens FL Jr, Atkinson EN, Schottenfeld D. Why phase II trials in cervical chemoprevention are negative: what have we learned? Cancer Causes Control 2002;13:855–73.

3. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71.

4. Kelley K. Constructing confidence intervals for standardized effect sizes: theory, application, and implementation. J Stat Softw 2007;20:1–24.

5. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177–88.

6. Fleiss JL. The statistical basis of meta-analysis. Stat Methods Med Res 1993;2:121–45.

7. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.

8. Mengel M, von Wasielewski R, Wiese B, Rüdiger T, Müller-Hermelink HK, Kreipe H. Inter-laboratory and inter-observer reproducibility of immunohistochemical assessment of the Ki-67 labelling index in a large multi-centre trial. J Pathol 2002;198:292–9.

9. Focke CM, Bürger H, van Diest PJ, Finsterbusch K, Gläser D, Korsching E, et al.; German Breast Screening Pathology Initiative. Interlaboratory variability of Ki67 staining in breast cancer. Eur J Cancer 2017;84:219–27.

10. Rubinstein L. Phase II design: history and evolution. Chin Clin Oncol 2014;3:48.

11. Lee JJ, Feng L. Randomized phase II designs in cancer clinical trials: current status and future directions. J Clin Oncol 2005;23:4450–7.

12. Mandrekar SJ, Sargent DJ. Randomized phase II trials: time for a new era in clinical trial design. J Thorac Oncol 2010;5:932–4.

13. Ratain MJ, Sargent DJ. Optimising the design of phase II oncology trials: the importance of randomisation. Eur J Cancer 2009;45:275–80.