Abstract
Purpose: Gastrointestinal stromal tumor (GIST) is a relatively rare tumor that is treated with targeted therapies in advanced stages. Randomized clinical trials (RCT) often require long follow-up and large sample sizes to evaluate overall survival (OS), the gold-standard measure of treatment efficacy. However, changes in therapy following disease progression may complicate survival assessments. Establishing surrogate endpoints may facilitate the drug approval and availability of new efficacious treatments; however, no published studies have investigated this topic in unresectable and/or metastatic GIST.
Experimental Design: A systematic literature review identified 14 RCTs and five observational studies of sufficient methodologic quality published between January 1995 and December 2013 (29 treatment arms; 2,189 patients). Weighted linear regression was used to evaluate the relation between median OS and median progression-free survival (PFS) for all arms combined and stratified by treatment line, treatment type, and quality score.
Results: Median OS and PFS were positively related with a correlation of 0.91. The association was still moderate (correlation 0.72) after eliminating four influential data points. In stratified analyses, correlation of OS and PFS was greater in later lines of therapy (first line = 0.52; second line = 0.80; third- and later-line = 0.70) and imatinib showed a stronger association (0.91) than other evaluated treatments (−0.26 to 0.69).
Conclusion: This analysis identified a strong relationship between median OS and PFS, especially in later lines of therapy. Findings suggest that PFS could serve as a surrogate marker for OS; however, analyses of patient-level data are needed to establish its validity in GIST. Clin Cancer Res; 21(2); 295–302. ©2014 AACR.
This arm-level meta-analysis pertains directly to the evaluation of survival endpoints in oncology clinical trials. We evaluated the relationship between median overall survival and median progression-free survival in gastrointestinal stromal tumor (GIST) trials. Establishing the validity of PFS as a surrogate marker for OS in advanced GIST may allow for shorter follow-up periods in trials, and help to reduce the costs of clinical trials and facilitate the drug approval and availability of new and efficacious targeted therapies. No published studies have investigated this topic in unresectable and/or metastatic GIST. This analysis identified a strong relationship between median OS and PFS, especially in later lines of therapy. Findings suggest that PFS could serve as a surrogate marker for OS; however, analyses of patient-level data are needed to establish its validity in GIST.
Introduction
Gastrointestinal stromal tumor (GIST) is the most common mesenchymal tumor of the gastrointestinal (GI) system, although GIST is a relatively rare cancer (1, 2). The annual incidence of GIST may range from 6.8 to 19.7 per million across several countries, including the United States (2, 3). This range may be an underestimate due to the challenges of diagnosis and the under-representation of small GIST cases in cancer registries (4). In early stages, GIST is treated by surgical resection, whereas advanced and unresectable forms are treated with targeted therapies: tyrosine kinase inhibitors (TKI) such as imatinib (5) or, in the case of imatinib resistance, sunitinib (6). Another targeted therapy, regorafenib, is used in patients with GIST with resistance to both imatinib and sunitinib (7). Several new therapies are also being evaluated for later lines of treatment, including nilotinib and the combination of imatinib with the mTOR inhibitor everolimus (8–10).
Targeted therapies have shown a profound effect on survival outcomes. On the basis of estimates from 1995 to 2000 and from 2001 to 2004 after the introduction of imatinib, survival in unresectable and metastatic GIST increased from 12 months to 33 months (11). More recent studies suggest that survival today may be as much as 60 months (12). The approval of the current targeted therapies in GIST is based on data measured by surrogate endpoints. Although overall survival (OS) is the gold-standard measure of treatment efficacy in most cancers, the evaluation of OS in clinical trials in metastatic or unresectable GIST requires a lengthy follow-up due to the improvements in survival time with targeted therapies. In addition, estimates of OS in GIST are increasingly impacted by treatment cross-over and the subsequent administration of effective later-line therapies.
There has been a significant interest in establishing and validating surrogate endpoints for OS in advanced cancers. Utilized as a substitute for another clinical endpoint, a surrogate is a measure or sign that is expected to sufficiently predict clinical benefit or decline. Correlation analyses have validated surrogate endpoints for OS in metastatic colorectal, ovarian, and other cancers (13–16). The use of surrogate endpoints in advanced cancers may facilitate earlier analysis of trial data and provide more straightforward estimates of efficacy, eliminating the impact of cross-over in the event of disease progression.
Typically evaluated as a secondary or coprimary endpoint in oncology trials, progression-free survival (PFS) reflects the time from beginning the treatment to initial disease progression or to death by any cause. PFS is an attractive candidate for a surrogate endpoint as it measures only the effect of the study drug and is not impacted by subsequent treatments patients receive, as OS may be in trial settings where cross-over is offered to participating patients (17). In addition, in disease settings such as advanced GIST, with incidence rates of only 11 to 20 cases per million per year (18), PFS is reported to be a commonly accepted primary endpoint (19).
The purpose of this study was to evaluate the relationship between median PFS and median OS in GIST to inform consideration of PFS as a surrogate endpoint for OS. A systematic review and an arm-level analysis evaluated the linear association between median OS and median PFS in clinical trials and observational studies of targeted therapies for GIST. Additional analyses explored the impact of treatment line, treatment type, and study quality on the strength of the relationship.
Materials and Methods
Literature review
Two systematic literature searches were conducted in PubMed and Embase according to Cochrane guidelines (20); an initial search from January 1, 1995 through July 1, 2010 was conducted in PubMed, and an updated search through October 28, 2013 was later conducted in both databases. Search terms included “GIST,” “advanced or metastatic,” and “unresectable or stage IV.” The searches were limited to clinical trials, prospective observational studies, and retrospective observational studies published in English. A manual search of the annual proceedings of the American Society of Clinical Oncology and the European Society for Medical Oncology was conducted from 2011 through 2013. Two researchers reviewed each abstract and text against the study inclusion and exclusion criteria. Studies were included if they evaluated targeted therapies for GIST, reported both median OS and median PFS values, and enrolled 15 or more patients.
Quality assessment
Two researchers evaluated all included studies using the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) approach, as outlined and adopted by the Cochrane Collaboration (ref. 20; chapter 12: Interpreting results and drawing conclusions). This approach defines the quality of a body of evidence on the basis of study design and the directness, precision, and consistency of the results. Each study was assigned a score of low (1 to <2), moderate (2 to <3), or high (3 to ≤4), with high scores reserved for randomized controlled trials with high statistical power and consistent reporting. Studies with an average score of at least 1.75 were considered to be of sufficient quality for inclusion in the analysis; others were excluded to limit the impact of studies with critical problems and unsystematic clinical observations on the study results. Sensitivity analyses were conducted using different groups of quality scores.
Data extraction
For each included study of sufficient quality, data were extracted for study design, year of publication, sample size per treatment arm, treatment, treatment line, and Eastern Cooperative Oncology Group (ECOG; ref. 21) performance status. Data on the median OS and median PFS and outcome definitions were collected; reported point values took precedence over data from Kaplan–Meier curves. Confidence intervals were also collected to elucidate the variation in median outcome. In the event that a study was published in multiple articles or abstracts, the most recent data were used. A second researcher fully validated the extracted data.
Data analysis
Analyses were defined prospectively in a statistical analysis plan. To evaluate the relationship between median OS and median PFS, linear regression analyses were performed. Each treatment arm was weighted in the least-squares regression models by its respective sample size. In addition, the corresponding coefficient of determination (R2) and Pearson correlation coefficient were recorded for each model fit. The scaled difference in betas (DFBETAS; ref. 22) was used to identify influential data points within the key regression analyses, and additional models were then run to show the effect of excluding all highly influential points. Furthermore, to explore the impact of other study variables, additional analyses were performed stratifying treatment arms by (i) treatment line (first-line; second-line and line 2.5; third- and later-line); (ii) treatment type (imatinib, sunitinib, sorafenib, combination, and other TKIs); and (iii) assigned study quality score (1.75, 2–2.75, 3–3.75). Studies designated as line 2.5 (2+) were those that allowed patients with second-line and later treatments but were largely considered to be second-line studies. All reported P values reflect two-sided tests, with a value of P < 0.05 indicative of statistical significance. Analyses were performed using R version 3.0.2.
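As an illustration of the sample-size-weighted regression described in this section, the following minimal Python sketch fits a weighted least-squares line to the four first-line arms listed in Table 1. The analysis itself was performed in R; the Python translation, the function names (`weighted_linear_fit`, `pearson_r`), and the choice to compute the correlation coefficient without weights are assumptions made for this sketch and are not taken from the authors' code. Under these assumptions, the output is in line with the first-line row of Table 2.

```python
from math import sqrt


def weighted_linear_fit(x, y, w):
    """Fit y = intercept + slope * x by least squares with case weights w."""
    n = len(x)
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy  # weighted residual sum of squares
    r2 = 1.0 - sse / syy
    return {
        "slope": slope,
        "intercept": y_bar - slope * x_bar,
        "slope_se": sqrt(sse / (n - 2) / sxx),
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - 2),
    }


def pearson_r(x, y):
    """Unweighted Pearson correlation coefficient (an assumption of this sketch)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# First-line arms from Table 1: Blanke (800 mg), Blanke (400 mg), McAuliffe, Ryu
n_patients = [349, 345, 53, 47]
median_pfs = [20.0, 18.0, 27.0, 40.0]  # months
median_os = [51.0, 55.0, 41.2, 65.0]   # months

fit = weighted_linear_fit(median_pfs, median_os, n_patients)
print(
    round(fit["slope"], 2),
    round(fit["slope_se"], 2),
    round(fit["r2"], 3),
    round(fit["adj_r2"], 2),
    round(pearson_r(median_pfs, median_os), 2),
)
# prints: 0.25 0.59 0.082 -0.38 0.52
```

In this sketch R2 comes from the weighted model, whereas r is the ordinary Pearson coefficient of the arm-level medians, so the square of r need not equal R2.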
Results
Following the systematic literature review (Fig. 1), a total of 51 studies were evaluated for quality. Nineteen studies were considered to be of sufficient quality for inclusion in analyses (5–10, 23–35). Most studies were phase II or III clinical trials, though three retrospective and one prospective observational study were included (25, 29, 32, 36). These studies included 29 treatment arms and 2,189 patients (Table 1). Across study treatment arms, there were between 15 and 349 patients with unresectable and/or metastatic GIST. The mean and the median of the reported median OS were 18.7 and 11.8 months, respectively; the mean and the median of the reported median PFS were 7.9 and 4.1 months, respectively. Evaluated targeted therapies included imatinib, sunitinib, regorafenib, and nilotinib. Treatment line and ECOG status of evaluated patients varied across the studies.
Figure 1. Systematic review yield. ^a These publications did not provide any additional data for the study of interest.
Table 1. Therapies and study characteristics
First author and year (name, study design) | Treatment (daily dose) | N | Line | ECOG 0 / 1 / 2, proportion (%) | Median OS (mo) | Median PFS (mo) | Quality score |
---|---|---|---|---|---|---|---|
Blanke et al. 2008 (S0033, phase III) | Imatinib (800 mg) | 349 | 1 | 96%^a | 51.0 | 20.0 | H (3.75) |
 | Imatinib (400 mg) | 345 | 1 | | 55.0 | 18.0 | |
McAuliffe et al. 2007 (retrospective observational) | Imatinib (400–800 mg) | 53 | 1 | — | 41.2 | 27.0 | L (2.00) |
Ryu et al. 2009 (phase II) | Imatinib (400 mg) | 47 | 1 | 27.7% / 68.1% / 4.3% | 65.0^b | 40.0^b | L (1.75) |
George et al. 2009 (phase II) | Sunitinib (37.5 mg morning or evening) | 60 | 2 | 56.7% / 41.7% / 1.7% | 24.6 | 7.8 | L (1.75) |
Heinrich et al. 2008 (phase I–II) | Sunitinib (25–75 mg) | 97 | 2 | 51.5% / 42.3% / 6.2% | 19.0 | 7.8 | L (2.00) |
Raut et al. 2010 (retrospective observational) | Sunitinib (NR) | 50 | 2 | — | 26.0 | 15.6 | L (1.75) |
Schoffski et al. 2010^c (phase I–II) | Everolimus + imatinib (2.5 mg + 600 mg) | 28 | 2 | 39% / 57% / 4% | 14.9 | 1.9 | M (2.50) |
Benjamin et al. 2011^c (phase II) | Motesanib (125 mg) | 138 | 2 | — | 14.7 | 3.7 | M (2.50) |
Li et al. 2012^c (prospective observational) | Imatinib (600 mg) | 52 | 2 | 100% | 18.6 | 3.9 | L (2.00) |
Maurel et al. 2010^c (phase I–II) | Imatinib + doxorubicin (400 mg) | 26 | 2+ | 20.8% / 61.5% / 7.7% | 13.0 | 3.3 | L (2.00) |
Kindler et al. 2011^c (U. Chicago Phase II Consortium) | Sorafenib (800 mg) | 38 | 2+ | 47%^d / 47%^d / 5%^d | 11.6 | 5.2 | M (2.50) |
Schoffski et al. 2010^c (phase I–II) | Everolimus + imatinib (2.5 mg + 600 mg) | 47 | 3 | 45% / 49% / 6% | 10.7 | 3.5 | M (2.50) |
Montemurro et al. 2013^c (retrospective observational) | Sorafenib (400 mg) | 56 | 3 | 63.9%^e / 36.1%^e | 17.9 | 6.0 | L (2.00) |
Italiano et al. 2012^c (retrospective observational) | Imatinib (doses NR) | 40 | 3 | 80%^f / 9%^f | 7.5 | 2.9 | L (2.00) |
 | Imatinib + other agent (not “ib”; dose NR) | 27 | 3 | | 8.7 | 3.0 | |
 | Nilotinib (dose NR) | 67 | 3 | | 11.8 | 4.1 | |
 | Sorafenib (dose NR) | 55 | 3 | | 10.7 | 4.9 | |
George et al. 2013^c (phase II) | Regorafenib (160 mg) | 33 | 3+ | 70.0% / 30.0% / 0% | 27.0 | 13.0 | M (2.50) |
Kang et al. 2013^c,g (RIGHT, phase III) | Imatinib (400 mg) | 81 | 3+ | 68%^h / 32%^h | 8.2 | 1.8 | M (3.00) |
Park et al. 2012^c (Korean GIST Study, phase II) | Sorafenib (800 mg) | 31 | 3+ | 29% / 61% / 10% | 9.7 | 4.9 | M (2.50) |
Reichardt et al. 2012^c (ENESTg3, phase III) | Nilotinib (800 mg) | 165 | 3+ | 54.5% / 37.6% / 7.9% | 11.9 | 3.6 | M (3.00) |
 | BSC, BSC + imatinib, or BSC + sunitinib^i | 83 | 3+ | 39.8% / 49.4% / 9.6% | 9.2 | 3.6 | |
Kang et al. 2012^c (phase II) | Dovitinib (500 mg) | 30 | 3+ | — | 6.2 | 3.6 | M (2.50) |
Trent et al. 2011^c (phase II) | Dasatinib (140 mg) | 50 | 3+ | — | 19.0 | 2.0 | L (2.00) |
Montemurro et al. 2013^c (retrospective observational) | Sorafenib (400 mg) | 68 | 4 | 63.9% / 36.1%^e | 11.0 | 7.1 | L (2.00) |
Italiano et al. 2012^c (retrospective observational) | Imatinib (dose NR) | 37 | 4 | 80%^f / 9%^f | 7.4 | 4.5 | L (2.00) |
 | Imatinib + other agent (not “ib”; dose NR) | 15 | 4 | | 7.4 | 2.5 | |
 | Nilotinib (dose NR) | 21 | 4 | | 5.4 | 3.8 | |
Abbreviations: BSC, best supportive care; H, high; L, low; M, moderate; NR, not reported.
^a This study also enrolled patients with performance status of 3 (4%).
^b Data were visually estimated from a Kaplan–Meier curve.
^c Indicates study identified in the systematic review update.
^d Performance status tool not specified.
^e Proportions include ECOG of 3. Reported value is for the overall study population.
^f Proportion includes ECOG ≥2. Reported value is for the overall study population. 11% of patients had unknown performance status.
^g Within the RIGHT randomized trial, the placebo arm was not included in the analysis.
^h Proportions include ECOG of 3.
^i This treatment arm was not included in the analyses by treatment as it represents a mixture of sunitinib and imatinib along with BSC.
Overall analysis
Median OS was strongly associated with median PFS (R2 = 0.84, r = 0.91; Fig. 2A and Table 2). When analyzing influential data points, any DFBETAS value larger than 2/sqrt(n) or smaller than −2/sqrt(n) was flagged as influential, where n is the number of points in the model (ref. 22); for the 29-arm model, this cutoff is approximately ±0.37. The arms from Ryu and colleagues (35) and McAuliffe and colleagues (31), and both the low- and high-dose treatment arms from Blanke and colleagues (5), stood out as having a high degree of influence on the fitted model. Figure 2B shows the modest change in slope of the regression with removal of the influential data points. Although the strength of the relationship was also reduced in this sensitivity analysis, the result still reflects a strong linear association between median OS and median PFS (R2 = 0.52, r = 0.72; Table 2).
Figure 2. Weighted linear regression analyses for the relationship between median OS and median PFS in 29 treatment arms evaluating patients (n = 2,189) with unresectable and/or metastatic GIST in observational studies (n = 4) and clinical trials (n = 15) of targeted therapies. Influential data points were identified using DFBETAS. A, regression for all arms. B, regression for all arms with four influential points removed. C, regression stratified by treatment line. D, regression stratified by treatment line with five influential points removed.
Table 2. Weighted linear regression analyses for the relationship between median OS and median PFS in 29 treatment arms evaluating patients (n = 2,189) with unresectable and/or metastatic GIST in observational studies (n = 4) and clinical trials (n = 15) of targeted therapies
Group | Correlation (95% CI) | R2 | Adjusted R2 | Slope (SE), P |
---|---|---|---|---|
Overall | | | | |
All arms | 0.91 (0.82–0.96) | 0.84 | 0.83 | 2.08 (0.18), P < 0.0001 |
All arms, with removal of four influential points | 0.72 (0.46–0.87) | 0.52 | 0.50 | 1.33 (0.27), P < 0.0001 |
Stratified by treatment line | | | | |
First line | 0.52 (−0.88–0.99) | 0.082 | −0.38 | 0.25 (0.59), P = 0.71 |
First line, with removal of influential point | −1.0 (not available^a) | 0.98 | 0.97 | −1.57 (0.20), P = 0.082 |
Second and 2.5-line | 0.80 (0.21–0.96) | 0.66 | 0.60 | 0.98 (0.29), P = 0.015 |
Second and 2.5-line, with removal of influential point | 0.66 (−0.19–0.94) | 0.51 | 0.41 | 1.32 (0.58), P = 0.072 |
Third and later lines | 0.70 (0.33–0.88) | 0.39 | 0.35 | 1.27 (0.41), P = 0.0074 |
Third and later lines, with removal of three influential points | 0.62 (0.14–0.87) | 0.44 | 0.40 | 1.77 (0.57), P = 0.0093 |
Stratified by treatment | | | | |
Imatinib | 0.91 (0.56–0.98) | 0.72 | 0.67 | 1.75 (0.45), P = 0.008 |
Sunitinib | 0.65 (not available^a) | 0.44 | −0.13 | 0.62 (0.71), P = 0.54 |
Sorafenib | 0.29 (−0.80–0.93) | 0.03 | −0.29 | 0.58 (1.90), P = 0.78 |
Combination therapy | −0.26 (−0.93–0.81) | 0.16 | −0.13 | −1.62 (2.18), P = 0.51 |
Other TKIs | 0.69 (−0.13–0.95) | 0.39 | 0.27 | 1.28 (0.71), P = 0.13 |
Stratified by quality score | | | | |
1.75 | 0.98 (not available^a) | 0.96 | 0.91 | 1.32 (0.28), P = 0.13 |
2–2.75 | 0.85 (0.66–0.94) | 0.73 | 0.72 | 1.22 (0.17), P < 0.0001 |
3–3.75 | 0.99 (0.83–1.0) | 0.96 | 0.95 | 2.66 (0.31), P = 0.0032 |
Abbreviation: CI, confidence interval.
^a The confidence interval was not generated by the linear regression model due to inclusion of only three data points.
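The footnote on three-point models can be illustrated with a short sketch. One common way to obtain a confidence interval for a Pearson correlation is the Fisher z transformation, whose standard error, 1/sqrt(n − 3), is undefined when n = 3. Whether the intervals in Table 2 were computed this way is an assumption of this sketch (the source does not state the method), and the function name `pearson_ci` is hypothetical. Under that assumption, the interval for the all-arms model agrees with the tabulated value.

```python
from math import atanh, sqrt, tanh


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r via the Fisher z transformation.

    Returns None when the interval is undefined (n <= 3 or |r| >= 1).
    """
    if n <= 3 or abs(r) >= 1.0:
        return None
    z = atanh(r)                      # Fisher z transform of r
    half_width = z_crit / sqrt(n - 3)  # z_crit times the standard error
    return tanh(z - half_width), tanh(z + half_width)


# All arms (r = 0.91, n = 29), as reported in Table 2
low, high = pearson_ci(0.91, 29)
print(round(low, 2), round(high, 2))  # prints: 0.82 0.96

# Sunitinib model (r = 0.65, three arms): no interval can be formed
print(pearson_ci(0.65, 3))  # prints: None
```

The same function applied to the first-line model (r = 0.52, n = 4) gives a very wide interval, reflecting how little information four arms carry.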
Analysis by treatment line
Four treatment arms evaluated a first-line population (5, 31, 35), eight arms evaluated a second- or 2.5-line population [refs. 6, 9, 23, 24, 28–30, 33; treatment arms designated as line 2.5 (2+) were those that allowed patients receiving second-line and later treatments], and 17 arms evaluated a third- and later-line population (7–10, 25–27, 32, 34). In a stratified analysis, the linear association was strongest for second-line therapy (R2 = 0.66, r = 0.80), followed by third- and later-line therapy (R2 = 0.39, r = 0.70; Fig. 2C and Table 2). The association was weaker in first-line therapy (R2 = 0.08, r = 0.52), which may be related to the variable treatment options available to patients with GIST following progression on first-line treatment.
The arm from Ryu and colleagues (35) was identified as an influential point in the first-line therapy model, and the arm from Raut and colleagues (33) was identified for second line. In the third- and later-line regression model, the arms from George and colleagues (7) and Trent and colleagues (10) and the sorafenib fourth-line arm of Montemurro and colleagues (32) each contributed highly influential data points. The sensitivity analysis with removal of these data points showed a weaker relationship in second line but still reflected moderate correlation (R2 = 0.51, r = 0.66; Fig. 2D and Table 2). This was also true of the third- and later-line analysis (R2 = 0.44, r = 0.62; Table 2). For first-line therapy, removal of the influential data point from Ryu and colleagues (35) changed the slope from positive to negative (R2 = 0.98, r = −1), suggesting that no significant association between median OS and median PFS is present in first-line therapy.
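The first-line influence analysis can be sketched in a few lines of Python. The code below refits the weighted line with each arm left out in turn and scales the change in slope by the leave-one-out residual standard deviation, which is the textbook DFBETAS definition; the cutoff of 2/sqrt(n) follows the rule stated in the overall analysis. The function names (`wls_summary`, `slope_dfbetas`) and the pure-Python leave-one-out approach are assumptions of this sketch; the authors' R code is not shown in the source. With the four first-line arms from Table 1, only the arm from Ryu and colleagues exceeds the cutoff, and dropping it yields the negative slope reported in Table 2.

```python
from math import sqrt


def wls_summary(x, y, w):
    """Return (slope, weighted Sxx, residual SD) for a weighted straight-line fit."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    sse = max(syy - slope * sxy, 0.0)  # guard against tiny negative rounding
    return slope, sxx, sqrt(sse / (len(x) - 2))


def slope_dfbetas(x, y, w):
    """DFBETAS for the slope, computed by explicit leave-one-out refits."""
    slope_all, sxx_all, _ = wls_summary(x, y, w)
    values = []
    for i in range(len(x)):
        keep = [j for j in range(len(x)) if j != i]
        slope_i, _, sigma_i = wls_summary(
            [x[j] for j in keep], [y[j] for j in keep], [w[j] for j in keep]
        )
        # Change in slope, scaled by leave-one-out SD times sqrt of 1/Sxx
        values.append((slope_all - slope_i) / (sigma_i / sqrt(sxx_all)))
    return values


# First-line arms from Table 1
arms = ["Blanke 2008 (800 mg)", "Blanke 2008 (400 mg)", "McAuliffe 2007", "Ryu 2009"]
n_patients = [349, 345, 53, 47]
median_pfs = [20.0, 18.0, 27.0, 40.0]  # months
median_os = [51.0, 55.0, 41.2, 65.0]   # months

cutoff = 2 / sqrt(len(arms))  # 2/sqrt(n) rule from the text
dfbetas = slope_dfbetas(median_pfs, median_os, n_patients)
flagged = [arm for arm, d in zip(arms, dfbetas) if abs(d) > cutoff]
print(flagged)  # prints: ['Ryu 2009']

# Slope after removing the flagged arm
slope_without_ryu = wls_summary(median_pfs[:3], median_os[:3], n_patients[:3])[0]
print(round(slope_without_ryu, 2))  # prints: -1.57
```

With only four arms, each leave-one-out fit rests on three points and a single residual degree of freedom, so the scaled values are unstable and are best read qualitatively.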
Analysis by treatment type
Eight treatment arms evaluated single-agent imatinib (5, 25, 27, 29, 31, 35), three evaluated sunitinib (6, 24, 33), five evaluated sorafenib (8, 25, 28, 32), five evaluated imatinib combinations (9, 25, 30), and seven evaluated other TKIs (7, 10, 23, 25, 26, 34), including regorafenib and nilotinib. (These analyses did not include the control arm from Reichardt and colleagues (34), as it could not be categorized as a single therapy; patients received BSC, BSC + imatinib, or BSC + sunitinib.) In a stratified analysis, imatinib showed the strongest linear association between median OS and median PFS (R2 = 0.72, r = 0.91), followed by other TKIs (R2 = 0.39, r = 0.69) and sunitinib (R2 = 0.44, r = 0.65; Fig. 3A and Table 2). Sorafenib (R2 = 0.03, r = 0.29) and imatinib combination therapy (R2 = 0.16, r = −0.26) demonstrated little or no association.
Figure 3. Weighted linear regression analyses for the relationship between median OS and median PFS in 29 treatment arms evaluating patients (n = 2,189) with unresectable and/or metastatic GIST in observational studies (n = 4) and clinical trials (n = 15) of targeted therapies. All studies were scored for quality, with higher scores representing a higher level of evidence. A, regressions stratified by therapy (combination included imatinib + doxorubicin and imatinib + everolimus; “other TKI” included regorafenib, nilotinib, motesanib, dovitinib, and dasatinib). B, regressions stratified by quality score. Note that the analysis in A does not include the control arm from Reichardt et al. (34), as it could not be categorized as a single therapy. Patients received best supportive care or its combination with imatinib or sunitinib.
Analysis by quality score
The impact of quality grading on the relation between median OS and median PFS was evaluated by analyzing data in three groups: studies with scores of 3 to 3.75, 2 to 2.75, and 1.75. Three treatment arms were graded with a score of 1.75 (6, 33, 35). The majority of the treatment arms (n = 21) received scores of 2 to 2.75 (7–10, 23–26, 28–32). Five treatment arms were given scores of 3 to 3.75 (5, 27, 34).
In a stratified analysis, the higher-quality studies demonstrated the strongest linear relationship between median OS and median PFS (R2 = 0.96, r = 0.99), though the analyses of moderate-quality (R2 = 0.73, r = 0.85) and lower-quality (R2 = 0.96, r = 0.98) studies also showed strong association (Fig. 3B and Table 2).
Discussion
Targeted therapies have markedly improved survival in patients with unresectable and metastatic GIST. Clinical trials in GIST have been characterized by long periods of follow-up for survival endpoints; for example, the pivotal S0033 trial of first-line imatinib required a median follow-up of 4.5 years (5). Receipt of additional therapies after disease progression further complicates the evaluation of OS in clinical trials. PFS offers an advantage over OS because it requires patients to be followed only until their disease progresses and therefore measures only the effect of the study drug; unlike OS, it is not diluted by subsequent treatments that patients receive (17). This analysis demonstrates potential for PFS as a valid surrogate for OS based on a strong relationship between median OS and median PFS in 29 study arms of targeted agents used to treat advanced or metastatic GIST.
The strong correlation between median OS and median PFS persisted even with the removal of four highly influential treatment arms (5, 31, 35). Two of these influential arms were from the S0033 trial, which evaluated the largest number of patients with GIST (n = 746) and carried the greatest weight within the model (5). The other two removed arms had relatively high estimates of median OS and PFS compared with the others; each had a median PFS longer than 2 years. Moreover, these four arms are the only first-line GIST populations evaluated in this analysis, and their corresponding estimates of OS are therefore the most affected by any later therapies given after disease progression.
In a stratified analysis, the associations between median OS and median PFS differed by increasing line of therapy. Second-line (eight arms) and third- and later-line (17 arms) therapies all demonstrated a moderate to strong relationship, whereas first-line therapy (four arms) showed only moderate correlation. The first-line sensitivity analysis with removal of an influential data point reflected the absence of a relationship (35). This finding and the nonsignificant slope indicate that there is insufficient evidence for the association between median OS and PFS within first-line therapy alone. The moderate correlation in the main analysis may be related to the variability in treatment options following progression. The power to detect a survival benefit using derivations from PFS declines substantially with increased survival postprogression (37). Thus, since survival after GIST progression in the first-line setting may be long, reduced correlation would be expected in this model.
Analyses stratified by study quality score reinforced the result from the analysis of all treatment arms, suggesting that the relationship between median OS and median PFS is not impacted by differences in study quality as assessed by researchers. In the analysis stratified by therapy, imatinib was the most-represented treatment (eight arms: 4 first-line, 1 second-line, 1 third-line, 2 later than third-line) and expected to show the strongest association between median OS and median PFS. This assumption was confirmed, and the model in “other TKIs” (seven arms) demonstrated moderate correlation. It should be noted that “other TKIs” consisted of arms from studies of second- or later-line therapies. As these showed strong association in the analysis stratified by treatment line, this may confound the interpretation of the analysis by therapy type.
Before this analysis, it was assumed that there would be few high-quality studies of patients with unresectable and/or metastatic GIST, as it is a relatively rare cancer. This expectation was confirmed by the results of the systematic review. Included studies were characterized by limitations in design, the most notable of which was small sample size. To improve the strength of this arm-level analysis, only higher-quality studies with at least 15 patients per arm were included. The GRADE quality assessment and the strict inclusion cutoff were intended to improve the level of evidence. A relatively low sample-size cutoff was chosen because several key GIST trials enrolled few patients and it was necessary to include these data. Nonetheless, this analysis still considered four observational studies, which differs from analyses in more common cancers that include only data from clinical trials (14, 15, 38, 39).
One limitation of this analysis is the potential for a high degree of heterogeneity among the analyzed studies. Studies used differing inclusion and exclusion criteria, progression criteria (e.g., RECIST and SWOG), and time intervals between radiologic and clinical assessments. These differences may have contributed to a wider range of survival outcomes, which could bias the findings toward a stronger relationship between endpoints. However, the results of this study were similar in magnitude to estimates reported in other cancers (39, 40). Future analyses should confirm the strength of the association in GIST using patient-level data from clinical trials, an approach widely regarded as necessary for the validation of endpoint surrogacy (41).
The inherent relationship between OS and PFS should also be considered. By definition, PFS is contained within OS; thus, median OS is always expected to be at least as long as median PFS. Overlapping definitions of PFS and OS may account for part of the relationship between the values, especially if the length of survival after progression is short (37). Part of this problem may be solved by performing a study-level analysis with relative estimates of survival efficacy (i.e., HR). However, few randomized trials have evaluated a GIST population, which precludes an analysis of this type.
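The structural overlap between the two endpoints can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not part of the analysis: OS is written as PFS plus postprogression survival (PPS), both components are independent and exponentially distributed, every patient progresses before death, and censoring is ignored. The function name and the chosen means are hypothetical.

```python
import numpy as np

def pfs_os_correlation(mean_pfs, mean_pps, n=100_000, seed=0):
    """Simulate patient-level times with OS = PFS + PPS and return the
    Pearson correlation between PFS and OS.

    Assumes independent exponential PFS and PPS and no censoring."""
    rng = np.random.default_rng(seed)
    pfs = rng.exponential(mean_pfs, n)   # time to progression (months)
    pps = rng.exponential(mean_pps, n)   # survival after progression (months)
    os_ = pfs + pps                      # PFS is contained within OS
    return np.corrcoef(pfs, os_)[0, 1]

# Hypothetical means in months; the PFS mean is held fixed while PPS lengthens.
for mean_pps in (3.0, 12.0, 36.0):
    r = pfs_os_correlation(6.0, mean_pps)
    print(f"mean PPS = {mean_pps:>4} months: corr(PFS, OS) = {r:.2f}")
```

Under these assumptions, the correlation equals SD(PFS) / sqrt(Var(PFS) + Var(PPS)), so it is high when survival after progression is short and falls as postprogression survival lengthens, even though PFS is always a component of OS.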
Relying on data from observational studies and clinical trials in patients with GIST, this analysis demonstrated a strong correlation between median OS and PFS. The association was more apparent in later treatment lines compared with first-line therapy. These findings provide some insight into the use of PFS as a surrogate marker for OS, which may be strengthened by future trial results; however, further patient-level analyses are needed to fully establish its validity in GIST.
This study highlights the need for further research to establish the validity of PFS as a surrogate marker, which may ultimately lead to earlier treatment decisions, help to reduce the considerable costs of clinical trials, and facilitate the approval, reimbursement, and availability of new efficacious treatments for GIST.
Disclosure of Potential Conflicts of Interest
J. Chang and A. Mohamed are employees of Bayer HealthCare Pharmaceuticals. I. Özer-Stillman is an employee of and a consultant/advisory board member for Evidera. L. Strand is an employee of Evidera. K. Tranbarger-Freier was an employee of Evidera. No other conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: I. Özer-Stillman, J. Chang
Development of methodology: I. Özer-Stillman, J. Chang, K.E. Tranbarger-Freier
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L. Strand
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Strand, A.F. Mohamed, K.E. Tranbarger-Freier
Writing, review, and/or revision of the manuscript: I. Özer-Stillman, L. Strand, J. Chang, A.F. Mohamed, K.E. Tranbarger-Freier
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): I. Özer-Stillman, L. Strand, J. Chang, A.F. Mohamed
Study supervision: I. Özer-Stillman, J. Chang
Acknowledgments
The authors thank Dr. Kyle Fahrbach of Evidera for statistical advising and review of the manuscript and Mina Jeong, Emily Shore, and Christopher Ngai for review of the manuscript.
Grant Support
This work was supported by Bayer HealthCare Pharmaceuticals.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.