Abstract
At major annual oncology meetings, many presentations about phase III clinical trials that failed to hit their primary endpoints emphasized seemingly positive results of questionable statistical rigor or unclear clinical relevance, a study concludes.
The results of some clinical trials presented at oncology meetings are explained in potentially misleading ways, according to work led by Massimo Di Maio, MD, of the University of Turin, Italy (JAMA Oncol 2020 April 16 [Epub ahead of print]). Of 91 randomized, phase III trials that were formally negative with regard to their primary endpoints, 29% were presented at major annual oncology meetings from 2017 to 2019 with not-negative conclusions.
“Many times, there was no clear message that the primary analysis was negative and, more or less explicitly, the authors suggested that the experimental treatment could be an option for clinical practice,” Di Maio says. “This is dangerous,” he adds, because clinicians could treat patients using unproven therapies.
Half of the abstracts for formally negative trials with not-negative conclusions cited a numeric improvement in the primary endpoint, even though the trial failed to reach the established P < 0.05 threshold for statistical significance. It's important to note, Di Maio says, that "we are aware that it's not a dichotomic distinction between positive and negative."
“The real interpretation is that, the smaller the P value, the stronger the evidence that you seem to have a real effect going on,” says Stuart Pocock, PhD, of the London School of Hygiene and Tropical Medicine in the UK. He notes it is possible that a trial with nonsignificant results could have been conducted on a treatment that actually works—perhaps the trial was simply too small—although glossing over high P values may still be problematic.
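Pocock's point that a trial can be "simply too small" to detect a real effect can be illustrated with a quick simulation. The sketch below is illustrative only: the response rates, arm sizes, and the two-proportion z-test are hypothetical choices, not drawn from any trial in the study.

```python
import math
import random

def two_sided_p(successes_a, successes_b, n):
    """Two-proportion z-test p-value (normal approximation)."""
    p1, p2 = successes_a / n, successes_b / n
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def power(n_per_arm, rate_treat=0.50, rate_control=0.35, sims=2000, seed=1):
    """Fraction of simulated trials with a *real* effect that reach p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = sum(rng.random() < rate_treat for _ in range(n_per_arm))
        b = sum(rng.random() < rate_control for _ in range(n_per_arm))
        if two_sided_p(a, b, n_per_arm) < 0.05:
            hits += 1
    return hits / sims
```

With these assumed rates, the very same real effect is "nonsignificant" in most 40-patient-per-arm trials but detected almost every time with 300 per arm, which is why a high P value alone does not prove a treatment fails to work.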
Negative primary endpoints were also skirted by focusing on secondary endpoints, a practice Di Maio's group noted in 38% of the abstracts—and secondary endpoints are sometimes of questionable clinical relevance. For example, if the primary endpoint is overall survival and progression-free survival (PFS) is a secondary endpoint, does a 2-month increase in PFS merit discussion if the primary endpoint fails?
Focusing on secondary endpoints may sometimes be fraught with statistical concerns, too. “Any time we perform a statistical analysis, there is a risk of a false-positive result,” Di Maio says, so hunting for positives among a litany of secondary endpoints hazards coming across an invalid hit.
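The arithmetic behind Di Maio's warning is straightforward: if each test independently carries a 5% false-positive risk, the chance of at least one spurious "hit" grows quickly with the number of endpoints examined. A minimal sketch (the function name and the assumption of independent tests are mine):

```python
def familywise_error(alpha=0.05, n_tests=1):
    """Probability of at least one false positive across
    n_tests independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests
```

Under these assumptions, one endpoint carries a 5% false-positive risk, but scanning ten secondary endpoints pushes the chance of at least one spurious positive to roughly 40%.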
False positives are also a worry when researchers conduct numerous post hoc subgroup analyses, in which patients are divided based on characteristics such as treatment history or the presence or absence of a given mutation. Di Maio's group noted an overemphasis on positive results in subgroups in 46% of the trials.
A final issue was post hoc noninferiority claims, appearing in 27% of the abstracts. Trials designed to test for superiority often lack a sufficient number of participants to make claims about noninferiority, which require tight confidence intervals. For instance, if the hazard ratio is 1 but the confidence interval is 0.6 to 1.4, the study treatment could be 40% worse than the treatment it was tested against, making noninferiority statements shaky.
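The link between trial size and the width of that confidence interval can be sketched with the standard log-scale approximation for a hazard ratio CI. The event counts and the noninferiority margin of 1.25 below are hypothetical illustrations (margins are prespecified per trial), not values from the study:

```python
import math

def hr_confidence_interval(hr, events_a, events_b, z=1.96):
    """Approximate 95% CI for a hazard ratio, using the usual
    log-scale standard error based on event counts per arm."""
    se = math.sqrt(1 / events_a + 1 / events_b)
    return (hr * math.exp(-z * se), hr * math.exp(z * se))

def supports_noninferiority(ci_upper, margin=1.25):
    """A noninferiority claim needs the CI's upper bound below the margin."""
    return ci_upper < margin
```

With only 35 events per arm, a hazard ratio of exactly 1.0 still yields a CI stretching past 1.5, so the noninferiority claim fails; with 400 events per arm, the interval tightens enough to fall below the assumed 1.25 margin. A superiority trial sized for the first scenario simply cannot support the second claim post hoc.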
Despite the inherent perils of highlighting results with high P values, shifting focus to secondary endpoints, and conducting post hoc subgroup or noninferiority analyses, such findings can be framed appropriately. “I think it's a good idea to do post hoc analyses,” Pocock says. Exploratory analyses can generate hypotheses, which “opens a pathway to say, ‘Let's not give up yet and explore these ideas in a fresh, maybe larger, trial.’” –Nicole Haloupek
For more news on cancer research, visit Cancer Discovery online at http://cancerdiscovery.aacrjournals.org/CDNews.