Abstract
Purpose: Phase I studies rely on investigators to accurately attribute adverse events as related or unrelated to study drug. This information is ultimately used to help establish a safe dose. Attribution in the phase I setting has not been widely studied and assessing the accuracy of attribution is complicated by the lack of a gold standard. We examined dose–toxicity relationships as a function of attribution and toxicity category to evaluate for evidence of toxicity misattribution.
Experimental Design: Individual patient records from 38 phase I studies activated between 2000 and 2010 were used. Dose was defined as a percentage of maximum dose administered on each study. Relationships between dose and patient-level toxicity were explored graphically and with logistic regression. All P values were two-sided.
Results: 11,909 toxicities from 1,156 patients were analyzed. Unrelated toxicity was not associated with dose (P = 0.0920 for grade ≥3, P = 0.4194 for grade ≥1), whereas related toxicity increased with dose (P < 0.0001, both grade ≥3 and ≥1). Similar results were observed across toxicity categories. In the five-tier system, toxicities attributed as “possibly,” “probably,” or “definitely” related were associated with dose (all P < 0.0001), whereas toxicities attributed as “unlikely” or “unrelated” were not (all P > 0.1).
Conclusions: Reassuringly, we did not observe an association between unrelated toxicity rate and dose, an association that could only have been explained by physician misattribution. Our findings also confirmed our expectation that related toxicity rate increases with dose. Our analysis supports simplifying attribution to a two-tier system by collapsing “possibly,” “probably,” and “definitely” related. Clin Cancer Res; 22(3); 553–9. ©2015 AACR.
See related commentary by Sharma and Ratain, p. 527
Attribution in phase I studies has not been well studied, but accurate classification is critical to generating reliable safety data. Furthermore, most studies utilize a five-tier attribution system, but empiric data are limited on how best to translate this into the binary “related” or “unrelated” classification required to define dose-limiting toxicities. We analyzed data from 11,909 toxicities on 38 phase I trials to understand the relationship between dose and toxicity by attribution category. As expected, we found that the rate of drug-related toxicity increases with dose. Reassuringly, we did not observe an association between unrelated toxicity rate and dose, a finding that does not rule out misattribution at the individual patient level. Importantly, the similarity of dose–toxicity relationships between “unrelated” and “unlikely” related toxicities and “possibly,” “probably,” and “definitely” related toxicities supports using a simplified binary attribution system when collecting the relatedness of adverse events.
Introduction
Phase I trials define the maximum tolerated dose (MTD) and safety of the study drug(s) being evaluated. To do so, investigators must attribute adverse events as related or unrelated to the study drug. Typically, only adverse events considered related to study drug are counted as “dose limiting” and used to help define the MTD. Accurate attribution of adverse events is therefore essential to the integrity of the safety data generated by these studies. Further complicating matters, attribution is collected using a five-tier system, but little empiric data exist on how best to translate this into the binary “related” or “unrelated” classification required to define dose-limiting toxicities. Previous work by our group and others has shown that misattribution of even a small number of serious unrelated adverse events as drug-related can lead to significant errors in MTD estimation (1, 2). Still other analyses have found that nearly one third of phase III studies adopt doses that are at least 20% different than those recommended in the initial phase I study (3). The degree to which toxicity misattribution in phase I studies contributes to these errors in the recommended dose is unknown.
There are important reasons to suspect that physicians may frequently misattribute the cause of adverse events on studies. One analysis of two large, randomized, placebo-controlled phase III studies found that nearly 50% of adverse events on the placebo arms were misattributed as study drug related (4). Similarly, a recent randomized, placebo-controlled, phase III study in prostate cancer reported a higher rate of adverse events in the placebo arm than in the treatment arm, possibly reflecting a degree of misattribution of disease-related toxicity as drug related (5). Given that the safety profile of drugs that reach phase III studies is generally well understood, it is possible that errors in toxicity attribution could be even more common in phase I studies, where new drugs or even drug classes are being tested. Attribution may also be more challenging with targeted therapy, where nonhematologic adverse events predominate and common toxicities, such as fatigue, are also frequent sequelae of patients' underlying disease (6, 7).
Unfortunately, little is known about toxicity attribution on phase I studies. Directly measuring the frequency of attribution errors is not possible due to the lack of a gold standard with which to determine true causality. Physicians instead rely on a number of factors to aid in this decision making, including temporal relationships to dosing and prior knowledge, when it exists, of the common toxicities associated with the agent or drug class. Causality can sometimes be further inferred by the abrogation of symptoms with dose modification and reoccurrence, or lack thereof, when patients are challenged if medically appropriate. Although these methods are helpful, it can never be known with certainty which individual adverse events are caused by a drug. Phase I trials present further challenges because ethical and practical considerations prevent use of placebos that could otherwise be utilized to compare toxicity rates.
To address these limitations, we utilized a large prospectively maintained database of phase I studies to evaluate the effect of drug dose on the frequency of unrelated and related adverse events. Our goals were two-fold. First, we aimed to determine whether our clinical expectations of dose–toxicity relationships for related and unrelated toxicities were confirmed by empiric data. Our hypothesis was that the rate of unrelated toxicities would remain constant across all dose levels, whereas the rate of drug-related toxicities would increase with dose. Second, we aimed to evaluate whether the current practice paradigm of simplifying the five-tier attribution system to a binary system was supported by the observed dose–toxicity relationships. Both lines of questioning support our larger goal of providing an evidence-based framework for the interpretation of toxicity data generated by phase I studies.
Materials and Methods
Study design and patient eligibility
Individual treatment records from a multicenter cohort of all patients treated on NCI-sponsored (8) phase I trials activated between 2000 and 2010 who met the following inclusion and exclusion criteria were analyzed. Data were provided from the Clinical Trials Monitoring System (CTMS) database, which is managed by Theradex Systems, Inc. The CTMS database is prospectively maintained with robust data management and auditing practices (9). Written informed consent was obtained from all patients according to the protocol to which they were enrolled. Trials of vaccines, immunotherapy, radiotherapy, locoregional therapies, and autologous or allogeneic stem cell transplant were excluded. Studies of these agents were excluded as the adverse events associated with these unique therapeutic approaches are typically not generalizable to the broader population of phase I studies utilizing systemically administered cytotoxic or molecularly targeted therapy. Similarly, organ dysfunction studies were also excluded because these studies have unique toxicity profiles that may not be generalizable to the broader phase I population that typically does not include these patients. Trials with more than one investigational agent and trials with induction cycles were excluded to allow for more direct comparison of study drug dose between trials. Study drug dose was expressed as a percentage of maximum administered dose (%MAD) evaluated on a trial-specific basis. Eligible patients were adults (≥18 years) with solid tumors, excluding lymphoma. All patients had regular follow-up visits as specified by the protocol to which they were enrolled. Patients were required to have received at least one dose of study drug to be included in the analysis.
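The %MAD normalization described above can be illustrated with a minimal sketch. The record layout, identifiers, and dose values below are hypothetical and chosen only for illustration; this is not the code used in the study.

```python
from collections import defaultdict

def percent_mad(records):
    """Return each patient's dose as a percentage of the maximum
    administered dose (MAD) on that patient's own trial."""
    max_dose = defaultdict(float)
    for trial, _patient, dose in records:
        max_dose[trial] = max(max_dose[trial], dose)
    return {
        (trial, patient): 100.0 * dose / max_dose[trial]
        for trial, patient, dose in records
    }

# Hypothetical records: (trial_id, patient_id, dose in trial-specific units)
records = [
    ("T1", "p1", 10.0), ("T1", "p2", 20.0), ("T1", "p3", 40.0),
    ("T2", "p1", 150.0), ("T2", "p2", 300.0),
]
pct = percent_mad(records)
```

Because the maximum is taken within each trial, doses from trials with different agents and units end up on a common 0%–100% scale.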
Toxicity definition
Due to the varying length of patient participation on phase I studies, only cycle 1 toxicities were considered. Adverse event data were collected in accordance with CTMS guidelines that require that all adverse events (both clinical and laboratory) be recorded beginning from the start of treatment. Cycle 1 was defined using protocol-specific cycle length. Toxicities were grouped according to System Organ Classification (SOC) using CTCAE category (cardiovascular, constitutional, gastrointestinal, hematologic, metabolic, and other) based on the recorded description of the toxicity. The “other” category represents a combination of renal, neurologic, respiratory, and dermatologic toxicities, as these groups were too small to evaluate individually, and toxicities that could not be placed into any of the categories (e.g., insomnia, fever, depression, and anxiety). Toxicities were also classified as either unrelated or related to study drug using a simplified two-tier version adapted from the current five-tier physician attribution system following common clinical convention as follows: unrelated (“unrelated” or “unlikely”) and related (“possibly,” “probably,” or “definitely”).
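As a rough sketch of the two-tier collapse and the cycle 1 restriction described above, the following shows one way to derive patient-level flags from toxicity-level rows. The row layout, function names, and example data are assumptions made for illustration and do not reflect the CTMS data format.

```python
RELATED = {"possibly", "probably", "definitely"}
UNRELATED = {"unrelated", "unlikely"}

def two_tier(attribution):
    """Collapse a five-tier attribution label to 'related' or 'unrelated'."""
    label = attribution.strip().lower()
    if label in RELATED:
        return "related"
    if label in UNRELATED:
        return "unrelated"
    raise ValueError(f"unknown attribution: {attribution!r}")

def patient_flags(toxicities, min_grade):
    """For each patient, record whether any cycle 1 toxicity of at least
    min_grade was related and whether any was unrelated."""
    flags = {}
    for patient, cycle, grade, attribution in toxicities:
        entry = flags.setdefault(patient, {"related": False, "unrelated": False})
        if cycle == 1 and grade >= min_grade:
            entry[two_tier(attribution)] = True
    return flags

# Hypothetical rows: (patient_id, cycle, grade, five-tier attribution)
toxicities = [
    ("p1", 1, 1, "Unlikely"),
    ("p1", 1, 3, "Possibly"),
    ("p1", 2, 4, "Definitely"),  # ignored: not cycle 1
    ("p2", 1, 2, "Unrelated"),
]
grade3_flags = patient_flags(toxicities, min_grade=3)
grade1_flags = patient_flags(toxicities, min_grade=1)
```

A patient can carry both flags at once, which is why related and unrelated outcomes can be analyzed separately on the same set of patients.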
Statistical analysis
All analysis of dose–toxicity was done with the patient as the unit of analysis. To investigate the association of dose and toxicity rate overall and for each toxicity category, patients were divided into four dose groups based on %MAD they received as follows: 0%–25%, 26%–50%, 51%–75%, and 76%–100%. The proportions of patients experiencing grade ≥1 and grade ≥3 toxicity at each dose group were assessed graphically and via logistic regression. Related and unrelated toxicities (using the two-tier classification scheme defined above) were analyzed separately. The toxicity outcomes for the logistic regression models were whether or not a patient experienced grade ≥1 or grade ≥3 toxicity. Dose was assessed both as a continuous covariate (%MAD) and as a categorical covariate (four dose groups, as defined above). Fitted logistic regression curves based on continuous %MAD were superimposed on the toxicity-by-dose group plot, and P values from these models were used to test for associations between dose and toxicity. Other functions for the dose–toxicity curve were investigated, including local regression with the locfit function in R and a method based on constrained maximum likelihood estimation (10), but logistic models appeared to fit the data best. Dose–toxicity relationships for each level of the five-tier attribution system were investigated similarly and comparisons with the two-tier system were assessed graphically.
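The patient-level modeling described above can be sketched as follows. This is an illustrative, dependency-free implementation, not the SAS or R code used in the study: the function names and data are hypothetical, the fit uses plain Newton-Raphson, and the two-sided P value comes from a Wald test, which is an assumption because the text does not state which test was used. Continuous %MAD values that fall between the stated group boundaries (for example 25.5%) are assigned to the higher group here, which is also an assumption.

```python
import math

def dose_group(pct_mad):
    """Assign a %MAD value to one of the four dose groups."""
    if pct_mad <= 25:
        return "0-25%"
    if pct_mad <= 50:
        return "26-50%"
    if pct_mad <= 75:
        return "51-75%"
    return "76-100%"

def _score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for logit P(y=1) = b0 + b1*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_logistic(x, y, iterations=25):
    """Fit by Newton-Raphson; return (b0, b1, two-sided Wald P for b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return b0, b1, math.erfc(abs(z) / math.sqrt(2.0))

def expand(summary):
    """Turn (dose, n_patients, n_with_toxicity) rows into per-patient lists."""
    x, y = [], []
    for dose, n, events in summary:
        x += [dose] * n
        y += [1] * events + [0] * (n - events)
    return x, y

# Hypothetical data: toxicity rate rises with dose vs. stays flat
rising = fit_logistic(*expand([(10, 10, 2), (30, 10, 4), (60, 10, 6), (90, 10, 8)]))
flat = fit_logistic(*expand([(10, 10, 3), (30, 10, 3), (60, 10, 3), (90, 10, 3)]))
```

In the rising example the fitted slope is positive, whereas in the flat example the slope is essentially zero and the test gives no evidence of a dose effect. This mirrors the contrast the analysis looks for between related and unrelated toxicities.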
Toxicity characteristics were summarized and differences in the distribution of physician attribution (five-tier scale) across toxicity categories were explored using histograms of attribution of all grade ≥3 toxicities for each category of toxicity. All statistical analysis was performed in SAS 9.2 (SAS Institute) and R 3.0.1 (R Foundation). All P values were two-sided, and P values less than 0.05 were considered significant.
Results
Data from 1,156 patients treated on 38 phase I trials were analyzed. Baseline patient characteristics are presented in Table 1. A broad range of tumor types was represented. Median age was 57 years (range, 18–84). The median number of prior systemic therapies was 3, and most patients (68%) had performance status 1. Many different classes of experimental agents were represented, including microtubule inhibitors (24% of patients), topoisomerase inhibitors (11% of patients), and HSP90 inhibitors (10% of patients). Two hundred twenty-three patients were treated at ≤25% of MAD on the trial in which they participated, 311 were treated at 26%–50% of the maximum, 343 were treated at 51%–75% of the maximum, and 279 were treated at 76%–100% of the maximum.
Characteristic | Number or median | % or range |
---|---|---|
Primary tumor site | ||
Gastrointestinal | 379 | 33% |
Genitourinary | 159 | 14% |
Thoracic | 123 | 11% |
Breast | 108 | 9% |
Gynecologic | 138 | 12% |
Sarcoma | 74 | 6% |
Head and neck | 80 | 7% |
Melanoma and skin | 74 | 6% |
Brain and unknown | 21 | 2% |
Sex | ||
Male | 557 | 48% |
Female | 599 | 52% |
Age (y) | 57 | 18–84 |
ECOG performance status | ||
0 | 284 | 25% |
1 | 788 | 68% |
≥2 | 81 | 7% |
Missing | 3 | |
Number of prior systemic therapies | ||
0–2 | 410 | 35% |
3 | 230 | 20% |
≥4 | 516 | 45% |
Study drug dose level | ||
≤25% of MAD | 223 | 19% |
26%–50% of MAD | 311 | 27% |
51%–75% of MAD | 343 | 30% |
>75% of MAD | 279 | 24% |
Drug class | ||
Alkylator | 33 | 3% |
Aminoflavone | 52 | 5% |
Antiangiogenic agent | 86 | 7% |
Antimetabolite | 9 | 1% |
DNA intercalator | 44 | 4% |
ERBB/KRAS inhibitor | 46 | 4% |
Epigenetic modifier | 99 | 9% |
Hsp90 inhibitor | 117 | 10% |
Microtubule inhibitor | 275 | 24% |
Multi-TKI | 107 | 9% |
Other | 65 | 6% |
PARP inhibitor | 65 | 6% |
PI3K/mTOR inhibitor | 11 | 1% |
Ribonucleotide reductase inhibitor | 20 | 2% |
Topoisomerase inhibitor | 127 | 11% |
In total, 11,909 unique toxicities were observed during cycle 1. An overview of these toxicities is presented in Table 2. The most commonly assigned attribution was “possibly” related, which accounted for 30% of all toxicities. Only 4% of toxicities were assigned “definitely” related, and only 16% of toxicities were assigned either “probably” or “definitely” related. Sixty-four percent of toxicities were grade 1. Only 1% of toxicities were grade 4, and only 0.2% of toxicities were grade 5. Grade 5 toxicities occurred in only 2% of patients (22 of 1,156), which is in line with previous reports that death on phase I studies is quite rare (11). The most common categories of toxicity were metabolic (25%), constitutional (18%), gastrointestinal (18%), and hematologic (15%).
Characteristic | N (%) |
---|---|
Category | |
Cardiovascular | 560 (5%) |
Constitutional | 2,152 (18%) |
Gastrointestinal | 2,137 (18%) |
Hematologic | 1,808 (15%) |
Metabolic | 2,919 (25%) |
Other | 2,333 (20%) |
Grade | |
1 | 7,651 (64%) |
2 | 2,869 (24%) |
3 | 1,189 (10%) |
4 | 178 (1%) |
5 | 22 (0.2%) |
Attribution | |
Unrelated | 3,187 (27%) |
Unlikely related | 3,175 (27%) |
Possibly related | 3,581 (30%) |
Probably related | 1,446 (12%) |
Definitely related | 520 (4%) |
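As a quick arithmetic check, the attribution counts in Table 2 can be summed and collapsed into the two-tier scheme. The counts are taken from the table; the two-tier totals are derived here by simple addition and are not reported as such in the text.

```python
# Attribution counts as listed in Table 2
attribution_counts = {
    "unrelated": 3187,
    "unlikely": 3175,
    "possibly": 3581,
    "probably": 1446,
    "definitely": 520,
}

total = sum(attribution_counts.values())

# Two-tier collapse following the convention described in the Methods
unrelated_total = attribution_counts["unrelated"] + attribution_counts["unlikely"]
related_total = total - unrelated_total

# Share of all toxicities in each five-tier category, in percent
shares = {label: 100.0 * n / total for label, n in attribution_counts.items()}
```

The five counts add up to the 11,909 toxicities reported in the text.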
Almost all patients (96%) had at least one grade ≥1 toxicity and nearly half (44%) had at least one grade ≥3 toxicity. The dose–toxicity relationship of related and unrelated toxicities is shown in Fig. 1. There was a statistically significant relationship between dose and toxicity among related toxicities (grade ≥1 P < 0.0001; grade ≥3 P < 0.0001) but not unrelated toxicities (grade ≥1 P = 0.4194; grade ≥3 P = 0.0920). Similar results were observed when patients on cytotoxic agents (N = 560) and molecularly targeted agents (N = 596) were analyzed separately (Supplementary Fig. S1).
Next, we evaluated the dose–toxicity relationships of all grade ≥1 toxicities and all grade ≥3 toxicities as a function of physician attribution using the five-tier system (“unrelated,” “unlikely,” “possibly,” “probably,” or “definitely”). The results are shown in Fig. 2. Neither “unrelated” nor “unlikely” related toxicities were associated with dose (P = 0.4058 and 0.8677 for grade ≥1 and grade ≥3 “unrelated”; P = 0.8663 and 0.1351 for grade ≥1 and grade ≥3 “unlikely” related). In contrast, “possibly,” “probably,” and “definitely” related toxicities were each statistically significantly associated with dose (all P < 0.0001).
Finally, we hypothesized that some toxicities could be less likely to occur as a complication of cancer or other underlying medical comorbidities and might therefore be easier to accurately attribute as drug related (or vice versa). In fact, the distribution of physician attribution assignment in grade ≥3 toxicities did vary substantially by toxicity category (Supplementary Fig. S2). We therefore evaluated the association between dose and toxicity within multiple CTCAE categories, as shown in Fig. 3. There was no statistically significant relationship between dose and unrelated toxicity rate within any category (all P > 0.05). Conversely, among related toxicities, the toxicity rate increased as dose increased (P range, <0.0001–0.0294), and the rate of increase appeared steeper for constitutional, hematologic, and gastrointestinal toxicities (Fig. 3A, C, and E).
Discussion
In this article, we evaluated how dose affects the rate of related and unrelated toxicity on phase I studies graphically and with logistic regression. Our analyses reveal that unrelated toxicities occur at a constant rate across dose levels and that there are no statistically significant changes in their rate as dose increases, empirically confirming our clinical intuition about unrelated toxicities. This finding remained consistent when we looked at specific categories of toxicities and whether we grouped attribution levels of “unrelated” and “unlikely” related together or investigated them separately (Figs. 1 and 2). These results are reassuring and suggest that although some drug-related toxicities may be misattributed to other factors such as underlying disease, this is unlikely to be occurring at a high rate on phase I studies. On the other hand, we did observe an increase in the rate of related adverse events as study drug dose increased, and logistic regression analysis confirmed that this increase was statistically significant both overall and across specific toxicity categories. These latter findings were expected and support the general clinical experience that increasing drug dose is associated with increasing toxicity. Interestingly, the degree of increase in the rate of toxicity with increasing dose varied by category, with apparently larger effects of dose (expressed as a steeper slope) for constitutional, hematologic, and gastrointestinal toxicities (Fig. 3). We noted that even at the lowest doses, some level of drug-related toxicity was present; we hypothesize that this is due to a small amount of idiosyncratic (i.e., dose independent), drug-related toxicity that occurs across all doses.
Although clinical trial reports typically group toxicities as either related or unrelated to study drug for the purpose of defining DLT, physicians are required to attribute toxicities using a five-tier system based on their degree of confidence in relatedness. Our analysis provides insight into how physicians utilize this system. Interestingly, among the nearly 12,000 individual toxicities analyzed here, only 16% were labeled as “probably” or “definitely” related to the study drug. In comparison, 30% of the toxicities were attributed as “possibly” related to study drug, making this the most commonly assigned attribution. This finding suggests reluctance on the part of treating physicians to more definitely attribute a particular toxicity as drug related, even when the impact at the study level (i.e., in what constitutes a DLT) would be the same. This finding may also reflect caution on the part of physicians; when physicians are uncertain about drug causality, they may call it “possibly” related for safety reasons, knowing that “possibly” related will ultimately be counted as related by standard clinical convention on phase I studies (12). In comparison, similar numbers of toxicities were attributed as “unrelated” (27%) and “unlikely” (27%) related, suggesting that physicians are not similarly hesitant to definitely attribute toxicities as unrelated to the study drug.
The number of different attribution categories has led some to think that attribution in its current form is not clinically useful and that it is a burden for data collection, and indeed most phase I trials condense attribution to a binary covariate for defining DLTs (12–14). Importantly, we found that dose–toxicity relationships were flat among toxicities attributed as “unrelated” or “unlikely” related whereas they were increasing among those attributed as “possibly,” “probably,” or “definitely.” Assessing the five-tier system graphically, we see that dose–toxicity relationships for “unrelated” or “unlikely” related toxicities look similar to one another and dose–toxicity relationships for “possibly,” “probably,” and “definitely” related toxicities look similar. Taken together, these data indicate that simplifying attribution to a two-tier system by combining “possibly,” “probably,” and “definitely” related toxicities, and specifically grouping “possibly” related toxicities with the related category, is appropriate, given the necessity of defining DLTs as either present or absent. Moreover, these findings and the current practice in phase I trials of simplifying attribution to a binary variable support implementing a binary attribution system for classification of adverse events rather than a five-tier system in the data collection phase, which would simplify real-time data collection. Although further research about how physicians would utilize such a system is needed, we feel that it could be implemented without compromising the safety data generated by phase I studies. One potential justification for maintaining the current five-tier system is if attribution were to be used as part of a model-based dose escalation design that weights toxicities differently by how likely they are to be drug related. 
In contrast, attribution itself may not be relevant to studies where the primary interest lies in evaluating rates of dose interruption, reduction, or permanent discontinuation for any reason in order to determine the tolerability of an agent or dose.
Strengths of our current study include the diversity of institutions, study drugs/classes, and protocols included in our cohort as well as the large number of patients and toxicities analyzed. We also had access to complete individual patient-level data, meaning our analysis was not biased by incomplete or inconsistent adverse event reporting between individual study articles (15). Our analysis does, however, have certain limitations. First, the gold standard, which is knowledge of whether a particular toxicity is truly caused by the study drug or not, is never known and thus misattribution cannot be directly measured in the phase I setting. Because of this, the lack of evidence of misattribution in this analysis should not be interpreted as confirmation that physician attribution is accurate. In addition, because we aggregated thousands of toxicities, we would not have been able to identify attribution errors that occur at a very low rate, or that occur at a similar rate across dose levels, or errors at the toxicity level that do not affect patient-level outcomes. Although these errors could not be detected, they could still have serious implications as prior work by our group has shown that even low rates of misattribution can impact the accuracy of MTD estimation (10). Similarly, our analysis cannot account for any differences in the way physicians might attribute toxicities based on their perception of where individual patients were within the planned dose escalation. Finally, we cannot determine the degree to which the systematic misattribution of unrelated toxicities as drug related across all doses may have increased the rate of drug-related toxicities at all dose levels without altering the slope of the dose–toxicity relationship (a hypothetical scenario is illustrated in Supplementary Fig. S3). The effect of this type of systematic attribution error would be reflected in our results as an upward shift of the dose–toxicity curve without a change in the slope in Figs. 1 and 2.
In conclusion, we found that in phase I studies, dose–toxicity relationships are consistent with our clinical intuition that unrelated toxicities occur at a constant and low rate regardless of study drug dose and that related toxicities increase with increasing dose. These results confirm our hypothesis about expected relationships between dose and toxicity and provide indirect evidence that physician attribution on phase I studies may not be as systematically error-prone as previously reported in the phase III randomized setting (4). Our data also suggest that moving to a binary attribution system could simplify data collection and the process of defining DLTs on phase I studies. Overall, our results should be reassuring to physicians, patients, and regulators who rely on the safety data generated by these important studies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: A. Eaton, A. Iasonos, D. Vulih, S.P. Ivy, D.R. Spriggs, D.M. Hyman
Development of methodology: A. Eaton, A. Iasonos, S.P. Ivy, D.M. Hyman
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): G.L. Smith, S.P. Ivy, D.M. Hyman
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A. Eaton, A. Iasonos, M.M. Gounder, A. Drilon, D. Vulih, S.P. Ivy, D.M. Hyman
Writing, review, and/or revision of the manuscript: A. Eaton, A. Iasonos, M.M. Gounder, A. Drilon, D. Vulih, S.P. Ivy, D.R. Spriggs, D.M. Hyman
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E.G. Pamer, G.L. Smith, D.M. Hyman
Study supervision: A. Iasonos, S.P. Ivy, D.M. Hyman
Grant Support
This work was supported in part by the Cancer Center core grant P30 CA008748. The core grant provides funding to institutional cores such as Biostatistics, which was used in this study. A. Iasonos was partially funded by The Translational and Integrative Medicine Research Fund at Memorial Sloan Kettering Cancer Center, New York, NY.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.