Abstract
CS16-01
The implementation of a clinical trial requires the development of a protocol document that specifies numerous aspects of operations. Among the items that must be defined prior to the initiation of a trial are the primary hypothesis of the study, the endpoint that will be used to evaluate the primary hypothesis, and a plan for interim monitoring of the trial. Among other aspects, the key objectives of interim monitoring of study findings are to ensure that the results of the trial are not released prematurely and that the effect of treatment is not proceeding in an unanticipated manner which may result in some unforeseen harm that would jeopardize the safety of individuals participating in the trial. The body charged with the interim review of study findings is a group of individuals, independent of those conducting or sponsoring the trial, referred to as the Data Monitoring Committee (DMC). It is their role to determine if the trial should continue to the point of planned definitive analysis before releasing findings from the trial, or if the findings from the trial should be released early. In clinical trials involving the evaluation of pharmaceutical agents for the treatment of patients diagnosed with cancer, the formal interim monitoring plan usually involves a sequential analysis of the observed number of events for the primary endpoint of the study. With the advent of clinical trials involving the evaluation of pharmaceutical agents for use as chemopreventive agents, there has been a need for additional forms of sequential analysis in the interim monitoring to include consideration of endpoints other than the one identified as the primary endpoint. This can be an issue in any clinical trial, but has become an issue in prevention trials for two reasons.
First, prevention trials involve healthy individuals, for whom the need to consider the full extent of risks and benefits of treatment can be more critical than among those who already have disease and require therapy. Second, the agents being evaluated in cancer prevention trials have anticipated effects (known or theorized) on the incidence of several diseases, not just the endpoint being tested under the primary hypothesis. Breast cancer prevention trials evaluating selective estrogen receptor modulators (SERMs) as chemopreventive agents are good illustrations of the complexity that can be involved in monitoring multiple endpoints in a study. In these situations there can be more than ten separate endpoints that can be affected by treatment, some endpoints affected in a beneficial manner, and some affected in a detrimental manner. The Breast Cancer Prevention Trial (BCPT) (J Natl Cancer Inst 1998; 90:1371-88) and the Study of Tamoxifen and Raloxifene (JAMA 2006; 295:2727-41), also known as the STAR trial, are two examples of such studies. For example, the protocol-specified interim monitoring for the STAR included a plan for the monitoring of the primary endpoint, invasive breast cancer, and also a supplemental plan for global monitoring of the composite effect on invasive breast cancer, in situ breast cancer, endometrial cancer, ischemic heart disease, hip fracture, spine fracture, Colles' fracture, stroke, transient ischemic attack, pulmonary embolism, deep vein thrombosis and death. A statistically significant finding for either of these aspects of monitoring (primary endpoint or global) would trigger a discussion by the DMC to decide if it would be appropriate to recommend the early release of the trial findings. The basis for the supplemental global monitoring used in the STAR (and also for the BCPT) was the methodology developed by Freedman et al. (Control Clin Trials 1996; 17:509-525) for use in the Women's Health Initiative study.
The details of the statistical algorithms and procedures for determining the statistical significance of the global monitoring of multiple endpoints are defined in the Freedman et al. paper. The methods are relatively straightforward. Essentially, global monitoring involves the development of a composite index of net-effect among all of the endpoints and a calculation of a p-value to determine if the net-effect is significantly in favor of one treatment group or the other. The global index of net-effect is found by taking the difference between the treatment arms in the number of events observed for each specific endpoint, and then summing the differences. A decision regarding the statistical significance of the index of net-effect is based on a p-value at the traditional level of 0.05. While the statistical algorithms involved in global monitoring are not complex, there is one issue involved in global monitoring that can be difficult to resolve. This pertains to the weighting of the different endpoints being assessed. In the simple form described above, there is no weighting applied to the different disease endpoints. Thus, when determining the index of net-effect, a case of invasive breast cancer counts the same as a case of in situ breast cancer, a case of wrist fracture, or a death. In situations like this, where there are obvious differences in the severity of the endpoints being evaluated, it would be appropriate to incorporate a weighting of the different endpoints. The difficulty in doing this lies in the determination of the correct scale to use for the weighting. While in most cases one can rank the diseases being studied into categories of severity without too much difficulty, the weights that should be given to each category of severity are not always readily apparent. Should an in situ breast cancer count one-half of an invasive breast cancer or two-thirds?
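The index of net-effect and the effect of the weighting choice can be illustrated with a minimal Python sketch. It is not the Freedman et al. procedure: the event counts are made up for illustration, the function name `global_index` is hypothetical, and the test statistic assumes equal randomization, independent endpoints and a large-sample normal approximation, so that conditional on the total number of events for an endpoint the difference between arms has variance equal to that total.

```python
from math import erfc, sqrt

# Hypothetical event counts (arm A, arm B) per endpoint; NOT trial data.
counts = {
    "invasive breast cancer": (40, 60),
    "in situ breast cancer": (20, 30),
    "endometrial cancer": (15, 5),
    "Colles' fracture": (10, 12),
}


def global_index(counts, weights=None):
    """Return (index, z, two-sided p) for a weighted index of net-effect.

    index = sum over endpoints of w * (events in A - events in B).
    Assumption: under the null each event is equally likely to fall in
    either arm, so Var(A - B | A + B = n) = n for each endpoint, and
    endpoints are treated as independent.
    """
    index = 0.0
    variance = 0.0
    for endpoint, (a, b) in counts.items():
        w = 1.0 if weights is None else weights[endpoint]
        index += w * (a - b)
        variance += w * w * (a + b)
    z = index / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return index, z, p


# Sensitivity analysis over candidate weights for in situ breast cancer.
unweighted = global_index(counts)
results = {}
for in_situ_weight in (0.5, 2 / 3, 1.0):
    weights = {name: 1.0 for name in counts}
    weights["in situ breast cancer"] = in_situ_weight
    results[in_situ_weight] = global_index(counts, weights)

for w, (index, z, p) in results.items():
    print(f"in situ weight {w:.2f}: index={index:.2f}, z={z:.2f}, p={p:.3f}")
```

A negative index here means fewer weighted events in arm A than in arm B. Running the same calculation across several pre-agreed weight sets shows how much the conclusion depends on the subjective weighting.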
How much weight relative to invasive breast cancer should be given to an endometrial cancer or to a Colles' fracture? These are subjective aspects that are difficult to define in quantitative terms. In practice, the global index of net-effect is calculated using a set of different weightings agreed upon a priori by the DMC, ranging from no weighting to an extreme level of weighting for the most severe diseases. This provides the DMC with a sensitivity analysis of the index for their consideration in conjunction with the other findings from the trial. Among several possibilities, one example of a scale with heavy weighting of the most severe events would be one based on the five-year case-fatality rate for each disease. The global monitoring conducted in a trial is actually a form of benefit/risk assessment for the total population of individuals participating in the trial. This trial-based assessment can be adapted for use as an assessment for individuals in the general population. Such an adaptation has been developed by Gail and his colleagues (J Natl Cancer Inst 1999; 91:1829-46). As there are a large number of endpoints potentially affected by tamoxifen, it is essential to have a visual presentation tool to facilitate the communication of benefit/risk information to those contemplating the use of tamoxifen for prevention. This type of tool, based on the Gail et al. benefit/risk assessment method, was used to assist in describing the potential benefits and risks of treatment to those contemplating participation in both the BCPT and the STAR, where over 280,000 women were screened. The tool used in these trials could also be used in general practice as an aid to describe the potential benefits and risks of treatment to any woman who may be considering taking tamoxifen for breast cancer prevention.
Although not currently available, the tool could be modified using information from the STAR to also provide a risk/benefit assessment for raloxifene use. The data presentation in the tool is based on describing the effects of treatment on each endpoint potentially affected in terms of the number of cases expected to occur over the next five years among a population of 10,000 women similar in demographic characteristics to the individual being considered. Included for each endpoint are the number of cases expected in the next five years if none of the 10,000 women were treated and, depending on whether the effect is expected to be beneficial or detrimental, the number of cases that may be prevented or the number of additional cases that may be caused if all of the 10,000 women were treated. There are data tables included in the Gail et al. benefit/risk assessment paper which can be abstracted into the tool for women of any particular category of age, race, five-year projected breast cancer risk and hysterectomy status. This tool can be a useful aid for communicating information to women in one's practice. However, it is time-consuming to construct. More efficient methods are needed for generating individualized benefit/risk assessment in the clinical setting. One possible method to achieve this would be to develop and distribute PC-based software to automatically generate patient-specific benefit/risk information in a manner similar to that which was done with the distribution of the NCI Breast Cancer Risk Assessment Tool.
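The per-10,000 presentation described above can be sketched in a few lines of Python. This is a simplified stand-in and not the Gail et al. method: the five-year risks and relative risks are made-up illustrative values, the function name `benefit_risk_rows` is hypothetical, and a real tool would look up its inputs from published tables by age, race, projected breast cancer risk and hysterectomy status.

```python
POPULATION = 10_000

# Hypothetical inputs per endpoint: (five-year risk if untreated,
# relative risk on treatment). Illustrative values only, NOT trial data.
endpoints = {
    "invasive breast cancer": (0.0200, 0.50),
    "endometrial cancer": (0.0010, 2.50),
}


def benefit_risk_rows(endpoints, population=POPULATION):
    """Return one row per endpoint:
    (name, expected cases if untreated, 'prevented' or 'caused', cases).
    """
    rows = []
    for name, (risk, relative_risk) in endpoints.items():
        expected = population * risk
        # Negative change means cases prevented; positive means cases caused.
        change = expected * (relative_risk - 1.0)
        label = "prevented" if change < 0 else "caused"
        rows.append((name, round(expected), label, round(abs(change))))
    return rows


for name, expected, label, cases in benefit_risk_rows(endpoints):
    print(f"{name}: {expected} expected untreated, {cases} {label} if treated")
```

Keeping the calculation separate from the lookup of patient-specific inputs would allow the same presentation to be generated for any category of woman once the corresponding table values are supplied.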
[Fifth AACR International Conference on Frontiers in Cancer Prevention Research, Nov 12-15, 2006]