Many promising new cancer drugs proceed through preclinical testing and early-phase trials only to fail in late-stage clinical testing. Thus, improved models that better predict survival outcomes and enable the development of biomarkers are needed to identify patients most likely to respond to and benefit from therapy. Here, we describe a comprehensive approach in which we incorporated biobanking, xenografting, and multiplexed phospho-flow (PF) cytometric profiling to study drug response and identify predictive biomarkers in acute myeloid leukemia (AML) patients. To test the efficacy of our approach, we evaluated the investigational JAK2 inhibitor fedratinib (FED) in 64 patient samples. FED robustly reduced leukemia in mouse xenograft models in 59% of cases and was also effective in limiting the protumorigenic activity of leukemia stem cells as shown by serial transplantation assays. In parallel, PF profiling identified FED-mediated reduction in phospho-STAT5 (pSTAT5) levels as a predictive biomarker of in vivo drug response with high specificity (92%) and strong positive predictive value (93%). Unexpectedly, another JAK inhibitor, ruxolitinib (RUX), was ineffective in 8 of 10 FED-responsive samples. Notably, this outcome could be predicted by the status of pSTAT5 signaling, which was unaffected by RUX treatment. Consistent with this observed discrepancy, PF analysis revealed that FED exerted its effects through multiple JAK2-independent mechanisms. Collectively, this work establishes an integrated approach for testing novel anticancer agents that captures the inherent variability of response caused by disease heterogeneity and in parallel, facilitates the identification of predictive biomarkers that can help stratify patients into appropriate clinical trials. Cancer Res; 76(5); 1214–24. ©2016 AACR.

Historically, improvements in long-term survival of cancer patients due to new therapeutic approaches have been incremental. Promising preclinical studies or early-phase clinical trials frequently do not translate into survival benefits in phase III trials (1), implying that traditional preclinical models and endpoints in early-phase trials are insufficient surrogates for predicting long-term outcomes. Moreover, the mechanistic basis for variable treatment responses in clinical trials is often unknown and can result in rejection of a drug that may be of benefit to a subset of patients. Thus, a re-examination of traditional drug development models and parallel identification of drug response biomarkers for patient selection are required in order to improve the success rate in bringing forward new effective oncologic drugs.

Current preclinical models seldom reflect the disease state within humans. For example, new drugs are frequently screened for their antiproliferative activity against cancer cell lines in vitro. However, cell lines and in vitro cultures do not fully capture the intrinsic and extrinsic diversity of human disease. Moreover, proliferation in culture measures drug effects on the bulk population and not the cancer stem cells (CSC), which in many tumors have been linked to therapy failure and disease recurrence (2). Tumor heterogeneity is also not well modeled by (frequently nonorthotopic) injection of human cancer cell lines into mice, or even by engineered mouse models; the low variability and good reproducibility of the latter are actually disadvantageous for drug testing as they do not reflect intratumor and interpatient heterogeneity (3, 4). Xenotransplantation of primary cancer cells is currently the best functional assay for both normal and malignant adult human stem cells, and in the context of human acute myeloid leukemia (AML) reads out clinically relevant properties of repopulating cells (5–7). Numerous previous studies have evaluated the efficacy of antileukemia drugs in the setting of xenotransplantation assays (8–12); however, the number of primary patient samples tested has generally been small, thus precluding biomarker development.

Here, we describe a comprehensive approach that combines drug testing of a large cohort of primary patient samples in xenotransplantation assays with parallel phospho-flow (PF) cytometric single-cell profiling of short-term drug responsiveness in vitro to develop companion drug response biomarkers. To test this approach, we studied the efficacy of fedratinib (FED, also known as SAR302503 or TG101348), an investigational Janus kinase 2 (JAK2) inhibitor, against leukemia stem cells (LSC) in AML. JAK2 inhibitors including ruxolitinib (RUX) and FED have demonstrated efficacy in clinical trials for the treatment of myeloproliferative neoplasms (MPN; refs. 13–15), but have not been employed in AML, where activating JAK2 mutations are rare. Nevertheless, downstream STAT transcription factors are activated in the majority of AML cases (16, 17). Furthermore, high levels of phosphorylated JAK2 (pJAK2) expression have been associated with worse outcome in AML, and in vitro studies suggest that JAK2 could be a therapeutic target in this disease (18). We demonstrate here that our approach effectively captures the variability in treatment response that is generally seen in patient cohorts and allowed identification of a PF signature that correlated with drug responses in xenografts.

Xenotransplantation assay and drug studies

Xenotransplantation and in vivo drug treatment experiments were carried out in Toronto (T) and Vancouver (V) using local optimized protocols. Employed protocols yielded similar engraftment results at both sites for patient samples tested in pilot studies. NOD.SCID (NS) and NOD.SCID-IL2Rγnull (NSG) mice were bred and housed at the University Health Network (UHN) Animal Facility (T) or the BC Cancer Research Centre Animal Resource Centre (V). Eight- to 10-week-old mice were sublethally irradiated (T: 225 cGy; V-NS: 325 cGy; V-NSG: 315 cGy) 24 hours before transplantation. NS mice (T) received 200 μg anti-CD122 mAb by subcutaneous injection immediately after irradiation. For NSG experiments, T-cell depletion was carried out by treating mice with 12.5 μg/kg anti–CD3-diphtheria toxin by i.p. injection 24 hours after AML transplantation for 2 consecutive days (V), or by using the EasySep CD3 Positive Selection Kit (StemCell Technologies) prior to intrafemoral (IF) transplantation (T). AML samples were injected IF except for two Vancouver samples that were transplanted intravenously as indicated in Table 1. AML samples were transplanted at a dose of 2 to 5 × 10^6 cells/mouse. Treatment with FED (Sanofi) or RUX (Selleck Chemicals), both at a dose of 60 mg/kg, or vehicle (0.5% methylcellulose) was given twice daily by oral gavage for 14 days starting 2 to 3 weeks after transplantation. For serial transplantation studies, equal numbers of human CD45+ cells harvested from the pooled bone marrow of FED-treated or vehicle-treated mice were injected into untreated secondary recipients and engraftment evaluated 10 to 12 weeks after transplantation. For cytarabine combination studies, mice were treated with cytarabine 80 mg/kg/d i.p. ×5 days prior to FED or vehicle treatment. For DAS combination studies, mice received 60 mg/kg FED twice daily, 50 mg/kg DAS once daily, or both for 2 weeks by oral gavage. For all drug studies, mice were sacrificed the day after the final dose, and the level of human leukemic engraftment in the injected femur and noninjected bones (other femur plus two tibias) was evaluated by flow cytometry using human-specific mAbs (for details, see Supplementary Methods).

Table 1.

Efficacy of FED against xenotransplanted AML samples (initial cohort)

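Mean engraftment (%) is shown for vehicle- and FED-treated mice in the injected RF and noninjected BM.

| Site | Patient ID | FAB | Sample | RF vehicle | RF FED | RF P value | RF RR (%) | BM vehicle | BM FED | BM P value | BM RR (%) | In vivo response^a |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 1 | M1 | Relapse | 90.7 | 7.2 | 0.001 | 92 | 67.9 | 5.8 | 0.001 | 91 | R |
|  | 2 | M0 | Diagnosis | 21.6 | 1.2 | 0.001 | 94 | 5.9 | 0.3 | 0.004 | 95 | R |
|  | 3 | Unclassified | Diagnosis | 50.9 | 7.4 | 0.002 | 85 | 19.9 | 3.6 | 0.001 | 82 | R |
|  | 4 | M4 | Relapse | 34.3 | 6.3 | 0.008 | 82 | 7.7 | 1.5 | 0.002 | 81 | R |
|  | 5 | Unclassified | Diagnosis | 36.4 | 8.6 | 0.002 | 76 | 15.5 | 4.1 | 0.001 | 74 | R |
|  | 6 | M4 | Relapse | 17.1 | 4.2 | 0.001 | 75 | 6.8 | 0.5 | 0.001 | 92 | R |
|  | 7 | ND | Diagnosis | 76.9 | 19.3 | 0.001 | 75 | 46.8 | 7.9 | 0.001 | 83 | R |
|  | 8 | M4 | Relapse | 75.7 | 19.4 | 0.001 | 74 | 50.9 | 9.3 | 0.001 | 82 | R |
|  | 9 | M5 | Diagnosis | i.v. | i.v. |  |  | 27.4 | 6.7 | 0.0001 | 76 | R |
|  | 10 | M4 | Diagnosis | i.v. | i.v. |  |  | 60.5 | 18.3 | 0.0001 | 70 | R |
|  | 11 | M5a | Diagnosis | 22.6 | 7.3 | 0.004 | 68 | 13.5 | 3.7 | 0.001 | 73 | R |
|  | 12 | ND | Diagnosis | 86.9 | 28.3 | 0.001 | 67 | 40.4 | 9.6 | 0.001 | 76 | R |
|  | 13 | M5 | Diagnosis | 36.6 | 12.7 | 0.02 | 65 | 3.6 | 1.8 | 0.05 | 50 | R |
|  | 14 | M0 | Diagnosis | 18.7 | 7.1 | 0.0001 | 62 | 8.5 | 1.8 | 0.004 | 79 | R |
|  | 15 | M5b | Diagnosis | 25.6 | 10.0 | 0.01 | 61 | 11.5 | 3.6 | 0.01 | 69 | R |
|  | 16 | ND | Diagnosis | 66.1 | 27.4 | 0.0008 | 59 | 6.7 | 4.7 | NS | 30 | R |
|  | 17 | M1 | Diagnosis | 27.9 | 12.4 | 0.05 | 56 | 0.6 | 0.2 | NS | 67 | R |
|  | 18 | M5 | Diagnosis | 28.0 | 15.2 | 0.001 | 46 | 15.2 | 8.1 | 0.01 | 47 | PR |
|  | 19 | M2 | Diagnosis | 55.7 | 32.5 | NS | 42 | 12.3 | 5.3 | 0.044 | 57 | PR |
|  | 20 | M4 | Diagnosis | 84.9 | 64.1 | 0.009 | 24 | 57.1 | 33.7 | 0.045 | 41 | PR |
|  | 21 | M2 | Diagnosis | 86.1 | 84.5 | NS |  | 53.7 | 37.1 | 0.03 | 31 | PR |
|  | 22 | M0 | Diagnosis | 93.6 | 93.7 | NS |  | 90.9 | 32.3 | 0.0001 | 64 | PR |
|  | 23 | M5b | Diagnosis | 17.9 | 3.3 | NS | 82 | 2.8 | 1.3 | NS | 54 | NR |
|  | 24 | ND | Relapse | 41.0 | 26.5 | NS | 35 | 9.6 | 1.7 | NS | 82 | NR |
|  | 25 | M4 | Diagnosis | 24.6 | 19.4 | NS | 21 | 5.2 | 2.5 | 0.017 | 52 | NR |
|  | 26 | M4Eo | Diagnosis | 16.0 | 13.6 | NS | 15 | 1.0 | 1.5 | NS | −56 | NR |
|  | 27 | M4 | Diagnosis | 25.7 | 22.7 | NS | 12 | 2.8 | 1.3 | NS | 54 | NR |
|  | 28 | M2 | Diagnosis | 68.0 | 64.9 | NS |  | 25.1 | 5.6 | NS | 78 | NR |
|  | 29 | M4 | Diagnosis | 20.7 | 20.2 | NS |  | 2.3 | 3.0 | NS | −30 | NR |
|  | 30 | Unclassified | Diagnosis | 82.2 | 80.9 | NS |  | 84.3 | 81.0 | NS |  | NR |
|  | 31 | M4 | PD | 96.4 | 95.0 | NS |  | 92.3 | 80.7 | 0.001 | 13 | NR |
|  | 32 | M4 | Diagnosis | 98.4 | 97.8 | NS |  | 98.7 | 87.8 | 0.014 | 11 | NR |
|  | 33 | M1 | Diagnosis | 79.8 | 82.2 | NS | −3 | 55.8 | 32.4 | NS | 42 | NR |
|  | 34 | M4Eo | Diagnosis | 46.2 | 52.0 | NS | −13 | 14.8 | 19.3 | NS | −30 | NR |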

Abbreviations: FAB, French-American-British; ND, not determined; PD, persistent disease; NS, not statistically significant.

^a In vivo response criteria: R, >50% RR in RF; PR, 20% to 50% RR in RF or >20% RR in BM only; NR, no significant difference between FED- and vehicle-treated mice (NS) or <20% RR in both RF and BM.

Definition of drug response for xenotransplantation studies

For in vivo drug studies, the definition of response was based on the relative reduction (RR) in human leukemic engraftment in drug-treated versus vehicle-treated mice. RR was calculated as [(mean % engraftment of vehicle-treated mice) − (mean % engraftment of drug-treated mice)]/(mean % engraftment of vehicle-treated mice). We distinguished effects in the injected right femur (RF) from those in noninjected bones (BM) because leukemic burden is usually higher in the injected RF, and a significant reduction in leukemic engraftment is therefore more difficult to achieve in the RF than in noninjected BM. Patient samples were classified as responders (R) if RR in the RF was >50%, partial responders (PR) if we observed 20% to 50% RR in the RF or >20% RR in the BM only, and nonresponders (NR) if there was no statistically significant difference in engraftment levels between vehicle- and drug-treated mice or RR was <20% in both RF and BM.
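The RR calculation and response categories described above can be sketched in a few lines of Python. This is an illustrative reading only: the function names are hypothetical, and the way statistical significance is combined with the RR thresholds (a compartment counts only if its vehicle-versus-drug difference is significant) is an assumption that may not reproduce every call in the published tables, for example for samples transplanted intravenously or sitting exactly on a threshold.

```python
def relative_reduction(vehicle_mean, drug_mean):
    """Relative reduction (RR) as a fraction: (vehicle - drug) / vehicle."""
    if vehicle_mean <= 0:
        raise ValueError("vehicle-treated mean engraftment must be positive")
    return (vehicle_mean - drug_mean) / vehicle_mean


def classify_response(rr_rf, sig_rf, rr_bm, sig_bm):
    """Assign R / PR / NR from RR fractions and significance flags.

    Assumption (not stated explicitly in the text): an RR value only
    counts toward R or PR when the difference in that compartment is
    statistically significant.
    """
    if sig_rf and rr_rf > 0.50:
        return "R"   # >50% RR in the injected right femur
    if sig_rf and rr_rf >= 0.20:
        return "PR"  # 20% to 50% RR in the injected right femur
    if sig_bm and rr_bm > 0.20:
        return "PR"  # >20% RR in noninjected bones only
    return "NR"      # not significant, or <20% RR in both compartments


# Example with engraftment values of 90.7% (vehicle) and 7.2% (drug) in the RF
# and 67.9% versus 5.8% in noninjected bones, both significant.
rr_rf = relative_reduction(90.7, 7.2)
rr_bm = relative_reduction(67.9, 5.8)
print(round(rr_rf * 100), classify_response(rr_rf, True, rr_bm, True))
```

Keeping RR as a fraction internally and converting to a percentage only for display avoids mixing the two scales when the thresholds are applied.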

PF cytometric analysis

AML patient samples tested in vivo were subjected to PF analysis following short-term drug treatment in vitro. Viably frozen samples were thawed and serum starved for 1 hour at 37°C, then treated with DMSO (vehicle), FED (100 nmol/L), AC220 (5 nmol/L; Selleck Chemicals), RUX (300 nmol/L), or DAS (100–200 nmol/L; Toronto Research Chemicals) for another hour. During the last 30 minutes, cells were incubated with viability dye, then fixed, washed, and permeabilized. Phosphomarker and extracellular staining was carried out for 30 minutes with optimized concentrations of antibodies (for details, see Supplementary Methods). Data were acquired on a BD LSRFortessa and analyzed using FlowJo and Cytobank software (http://cytobank.org/; ref. 19). Ba/F3 cells were obtained from R. Rottapel in 2006. This IL3-dependent hematopoietic cell line remains exquisitely IL3 dependent for survival and proliferation, as assessed by cell viability assays following IL3 withdrawal (ongoing). OCI-AML5 cells were obtained from M.D. Minden in 2010. This patient-derived AML cell line was authenticated by short-tandem repeats analysis in 2014 at the Centre for Applied Genomics (Hospital for Sick Children, Toronto, Canada).

Statistical analysis

Sixty-four independent patient samples were used in the study to capture the diversity of AML. Comparison of engraftment in drug- versus vehicle-treated mice was performed using two-tailed t tests. For PF analysis, two-group comparisons were performed using the Mann–Whitney U test. Correlations were assessed by two-tailed Spearman correlation. All data were analyzed with GraphPad Prism software, version 5.0, for Mac OS X.

Study approval

Peripheral blood cells were collected from patients with newly diagnosed or relapsed AML at the Princess Margaret Cancer Centre or Vancouver General Hospital after obtaining informed consent according to procedures approved by the UHN and University of British Columbia (UBC) Research Ethics Boards. Thawed viably frozen samples were prescreened for engraftment ability in xenotransplanted mice, and engrafting samples were used in drug studies. All animal experiments were performed in accordance with institutional guidelines approved by the UHN or UBC Animal Care Committee.

FED targets LSCs in primary AML xenografts with heterogeneous responses

We tested the potential efficacy of FED in AML in an initial cohort of 34 patient samples obtained at diagnosis or relapse and representing multiple cytogenetic and molecular subtypes (Supplementary Table S1). Samples were transplanted into cohorts of immune-deficient mice (n = 5–8/group); following a 2- to 3-week engraftment period, mice were treated with FED or vehicle control for another 2 weeks (Fig. 1A). For 17 of 34 samples, leukemic engraftment in the injected femurs was 56% to 94% lower in FED-treated relative to vehicle-treated mice (P < 0.05, Table 1 and Fig. 1B). These samples also showed a 30% to 95% RR in leukemic engraftment of noninjected bones (P < 0.05). Five additional samples responded less robustly (<50% RR in injected femur and 31%–64% RR in noninjected bones; P < 0.05; Table 1). These 22 samples were classified as xenograft responders (X-R). By contrast, FED had a small or negligible effect on the leukemic graft in 12 of 34 samples (<20% RR, P < 0.05 or any RR, P > 0.05); these were classified as xenograft nonresponders (X-NR; Table 1 and Fig. 1C).

Figure 1.

Heterogeneous responses to FED treatment in primary AML xenografts. A, schematic illustrating the experimental protocol for in vivo drug testing. B and C, flow cytometric analysis of human CD45+CD33+ AML engraftment in the injected femur of mice transplanted with AML cells followed by treatment with FED or vehicle control (left) and summary of human CD45+CD33+ AML engraftment in the injected femur and noninjected bones of engrafted mice after FED or vehicle treatment (right). B, representative responsive samples. C, representative nonresponsive samples. D, summary of human leukemic engraftment in untreated secondary recipients 10 to 12 weeks posttransplantation of equal numbers of human CD45+ cells harvested from the pooled bone marrow of FED or vehicle-treated mice. Representative X-NR (left) and X-R (right) patient samples are shown. Data from injected RF and noninjected bones were combined for the analysis. Bars indicate mean values. *, P < 0.05; **, P < 0.01; ***, P < 0.001.

We also evaluated whether treatment with a standard agent such as cytarabine could potentiate the effects of FED, using samples from 3 AML patients that were partial- or nonresponders to FED alone (Supplementary Fig. S1A). In 2 of 3 samples tested, cytarabine+FED treatment significantly reduced leukemia burden in treated mice compared with either drug alone (Supplementary Fig. S1B). These findings suggest that combining the targeted therapeutic FED with cytarabine may increase overall efficacy.

To evaluate whether FED targets LSCs, AML cells were harvested from primary mice and transplanted into untreated secondary recipients; two X-NR and seven X-R samples were evaluable (cells from vehicle-treated primary mice generated >10% mean engraftment levels in secondary mice). For the two X-NR samples, the FED-treated and control groups yielded similar engraftment levels in secondary mice (Fig. 1D, left and Supplementary Fig. S1C). By contrast, in five of seven X-R samples, the FED-treated group gave rise to smaller grafts compared with controls (Fig. 1D, right and Supplementary Fig. S1C), although this only reached statistical significance in two samples due to small numbers of transplanted mice. These results suggest that in responding samples, FED treatment may impair the function and/or survival of LSCs exposed to drug in the primary mice, although variable sensitivity was observed as in the primary mice. Overall, by testing a large cohort of patient samples in a clinically relevant model, we evaluated drug effects against LSCs and captured the heterogeneous treatment response that is often seen in clinical trials.

FED-sensitive pSTAT5 provides a biomarker of in vivo response to FED

Given the observed heterogeneity of response in xenograft assays, we sought to identify a biomarker of FED responsiveness. As FED is a tyrosine kinase inhibitor, we used a multiplexed PF cytometry assay (20) to profile, with single-cell resolution, the impact of short-term (30–60 minutes) in vitro FED treatment on the basal activity of the SYK, BCR, JAK/STAT, MAPK, PI3K/mTOR, and NF-κB signaling pathways in primary AML blasts (Supplementary Fig. S2A–S2C). We used 100 nmol/L FED in these studies, because this concentration greatly reduced IL3-induced STAT5 phosphorylation in the FLT3L-responsive OCI-AML5 cell line (Supplementary Fig. S2D). As expected (5), the AML samples in this cohort displayed highly variable expression of CD34, CD45, CD123, and CD33 (Supplementary Fig. S3A and S3B). Therefore, we costained each sample with antibodies directed against these markers to evaluate signaling in immunophenotypic cell subsets within individual samples. Basal levels of MAPK p38, pSTAT5, pAKT(T), pSRC, and IκBα showed the highest variance across this cohort (Fig. 2A, top). As would be expected, a 60-minute treatment with FED in vitro decreased basal levels of pSTAT5 to a greater extent than other phosphoproteins (Fig. 2A, bottom). Interestingly, not all samples with high basal pSTAT5 levels were FED-responsive in the PF assay (R² = 0.54), but samples with the highest basal levels of pSTAT5 showed the greatest FED-mediated decrease in pSTAT5 (Supplementary Fig. S3C). In most cases, only a fraction of the AML blast population showed elevated basal pSTAT5; the level of CD34 expression on pSTAT5hi cells was highly variable (Supplementary Fig. S3D). Thus, we identified considerable signaling heterogeneity within and between AML samples in the cohort.

Figure 2.

pSTAT5 levels predict response to FED treatment in the initial AML patient cohort. A, box and whisker plots of normalized basal median fluorescence intensities (MFI; top) and log2 FC ratios (bottom) showing the impact of FED treatment on phosphoprotein levels analyzed by flow cytometry. Horizontal lines in boxes indicate medians, boxes span interquartile range, and whiskers extend to the minimum and maximum values. n = 34 for all markers except pSTAT1 (n = 26), pSTAT3(S) (n = 27), pSRC (n = 26), pAkt(T) (n = 33), pAkt(S) (n = 27), IκB (n = 26), p4EBP1 (n = 14), pNFkB (n = 15), pSHP2 (n = 19). Normalized basal MFI was calculated as the MFI of the phospho-antibody–stained sample minus the MFI of the fluorescence-minus-one control (stained with surface markers but without phospho-antibodies). The log2 FC ratio was calculated as the log2 of the ratio of (MFI of drug-treated sample)/(MFI of vehicle-treated sample). B, basal and post-FED treatment changes in pSTAT5 levels in patient samples classified as nonresponders (X-NR) or responders (X-R) in xenograft assays. C, unsupervised hierarchical clustering analysis (Euclidean distance) of AML patient samples based on log2 FC ratios of FED-treated/vehicle-treated samples for each phosphoprotein as described above. White areas indicate missing values.
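The two quantities defined in the Figure 2 legend can be written out as a minimal Python sketch. The function names and the numeric MFI values are hypothetical, chosen only to illustrate the arithmetic; whether the published log2 FC ratios were computed from raw or background-subtracted MFIs is not stated in the legend, so the example makes no claim either way.

```python
import math


def normalized_basal_mfi(stained_mfi, fmo_mfi):
    """MFI of the phospho-antibody-stained sample minus the MFI of the
    fluorescence-minus-one (FMO) control."""
    return stained_mfi - fmo_mfi


def log2_fold_change(drug_mfi, vehicle_mfi):
    """log2 of (MFI of drug-treated sample) / (MFI of vehicle-treated sample).

    Negative values indicate a drug-mediated decrease in signal.
    """
    if drug_mfi <= 0 or vehicle_mfi <= 0:
        raise ValueError("MFI values must be positive to take a log ratio")
    return math.log2(drug_mfi / vehicle_mfi)


# Hypothetical MFI values: a halving of signal gives a log2 FC of -1.
print(normalized_basal_mfi(1500.0, 300.0))
print(log2_fold_change(600.0, 1200.0))
```

On this scale a log2 FC of −1 corresponds to a twofold decrease, and 0 to no change between drug- and vehicle-treated cells.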

By comparing the in vivo drug response and PF data for each patient sample, we found that basal pSTAT5 levels were significantly higher in X-R compared with X-NR samples (Fig. 2B). FED treatment also led to a significantly greater decrease in basal pSTAT5 levels in X-R relative to X-NR samples (Fig. 2B), suggesting that the FED-mediated decrease in pSTAT5 measured by the PF assay represents a biomarker of AML responsiveness to FED in vivo. This conclusion was supported by unsupervised clustering and heatmap analysis of FED-mediated decreases in basal levels of 12 phosphoproteins, which divided the cohort into two major groups (Fig. 2C). One cluster contained primarily X-R samples (14/15) that showed robust FED-mediated decreases in pSTAT5 [0.47 to 2.78 fold change (FC), log2 scale]; there were inconsistent but significant decreases in several other phosphoproteins in this sample cluster following FED treatment. The second cluster contained a mixture of X-R and X-NR samples where FED treatment did not cause changes in the measured phosphoproteins. Considering all of the X-R samples in this initial patient cohort, 64% (14/22) exhibited ≥0.4-fold (log2) FED-mediated decrease in pSTAT5 (Supplementary Fig. S3E), whereas the remaining X-R samples lacked this biomarker, suggesting at least two different mechanisms of response to FED. Importantly, 92% (11/12) of X-NR samples did not show loss of pSTAT5 when treated with FED. This assay thus provides a highly specific predictor of treatment sensitivity to FED in vivo with high positive predictive value (14/15 = 93%).

To confirm these findings, we tested the suitability of FED-sensitive decrease in pSTAT5 as a biomarker of in vivo response to FED in an independent validation cohort of 30 additional AML patient samples. In this cohort, 16 and 14 samples were classified as X-R and X-NR, respectively, based on RR of leukemic engraftment in treated xenotransplanted mice (Table 2 and Supplementary Fig. S4). A FED-mediated reduction in pSTAT5 signaling was observed in 9 of 16 X-R samples and was absent in 12 of 14 X-NR samples, giving a sensitivity of 56% and specificity of 86% for this response biomarker in the validation cohort. Collectively, these data indicate that FED-sensitive pSTAT5 may provide a phosphoproteomic biomarker to identify AML patients whose leukemia cells are likely to respond to FED treatment in the in vivo model.
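The sensitivity, specificity, and positive predictive value quoted for the two cohorts follow directly from the reported counts. The sketch below recomputes them; the function name and the mapping of counts to true/false positives (biomarker-positive X-R samples as true positives, biomarker-positive X-NR samples as false positives) are our own framing of the numbers given in the text.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and positive predictive value (PPV)
    from confusion-matrix counts, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Initial cohort: 14 of 22 X-R samples were biomarker positive,
# and 11 of 12 X-NR samples were biomarker negative.
initial = diagnostic_metrics(tp=14, fn=8, tn=11, fp=1)

# Validation cohort: 9 of 16 X-R samples were biomarker positive,
# and 12 of 14 X-NR samples were biomarker negative.
validation = diagnostic_metrics(tp=9, fn=7, tn=12, fp=2)

for name, metrics in (("initial", initial), ("validation", validation)):
    print(name, {key: round(value * 100) for key, value in metrics.items()})
```

The modest sensitivity alongside high specificity reflects the observation that a subset of X-R samples responds in vivo without a measurable pSTAT5 decrease.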

Table 2.

Efficacy of FED against xenotransplanted AML samples (validation cohort)

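Mean engraftment (%) is shown for vehicle- and FED-treated mice in the injected RF and distal BM.

| Site | Patient ID | FAB | Sample | RF vehicle | RF FED | RF P value | RF RR (%) | BM vehicle | BM FED | BM P value | BM RR (%) | In vivo response^a |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 35 | M4 | Diagnosis | 45.3 | 8.6 | 0.001 | 81 | 24.8 | 7.3 | 0.01 | 71 | R |
|  | 36 | M5a | Diagnosis | 20.4 | 4.0 | 0.001 | 80 | 6.9 | 3.3 | NS | 52 | R |
|  | 37 | Unclassified | Diagnosis | 39.6 | 8.6 | 0.001 | 78 | 23.4 | 4.0 | 0.001 | 83 | R |
|  | 38 | M5 | Diagnosis | 42.4 | 9.7 | 0.001 | 77 | 15.4 | 3.7 | 0.02 | 76 | R |
|  | 39 | M4 | Diagnosis | 51.4 | 12.6 | 0.0001 | 75 | 4.3 | 0.7 | 0.029 | 84 | R |
|  | 40 | M4 | Diagnosis | 30.0 | 8.8 | 0.003 | 71 | 16.2 | 5.6 | 0.001 | 65 | R |
|  | 41 | M5a | Diagnosis | 27.1 | 10.9 | 0.005 | 60 | 6.2 | 2.0 | 0.03 | 68 | R |
|  | 42 | M5a | Diagnosis | 21.5 | 9.1 | 0.001 | 58 | 16.0 | 8.0 | 0.01 | 50 | R |
|  | 43 | Unclassified | Diagnosis | 79.5 | 39.8 | 0.001 | 50 | 52.2 | 9.3 | 0.001 | 82 | R |
|  | 44 | M4Eo | Diagnosis | 25.5 | 13.9 | 0.024 | 46 | 12.9 | 4.5 | 0.001 | 65 | PR |
|  | 45 | M5a | Diagnosis | 72.4 | 45.1 | 0.06 | 38 | 50.2 | 16.6 | 0.001 | 67 | PR |
|  | 46 | Unclassified | Diagnosis | 56.8 | 35.8 | 0.0057 | 37 | 15.5 | 6.4 | 0.013 | 59 | PR |
|  | 47 | M1 | Diagnosis | 48.7 | 30.9 | 0.03 | 37 | 13.8 | 9.8 | NS | 29 | PR |
|  | 48 | M0 | Diagnosis | 42.6 | 28.4 | NS | 33 | 9.8 | 5.1 | 0.05 | 48 | PR |
|  | 49 | M4 | Diagnosis | 97.8 | 78.9 | 0.001 | 19 | 88.4 | 55.9 | 0.001 | 37 | PR |
|  | 50 | Unclassified | Diagnosis | 72.7 | 63.4 | NS | 13 | 63.8 | 49.1 | 0.04 | 23 | PR |
|  | 51 | M5 | Diagnosis | 44.2 | 19.5 | NS | 56 | 71.8 | 61.7 | NS | 14 | NR |
|  | 52 | M2 | Diagnosis | 19.0 | 10.4 | NS | 45 | 5.4 | 2.0 | 0.017 | 64 | NR |
|  | 53 | M5b | Diagnosis | 54.3 | 43.4 | NS | 20 | 36.8 | 24.8 | NS | 33 | NR |
|  | 54 | Unclassified | Diagnosis | 24.4 | 20.0 | NS | 18 | 24.1 | 17.1 | NS | 29 | NR |
|  | 55 | Unclassified | Diagnosis | 78.0 | 68.3 | NS | 12 | 55.5 | 49.5 | NS | 11 | NR |
|  | 56 | M1 | Diagnosis | 92.2 | 81.0 | NS | 12 | 55.9 | 53.7 | NS |  | NR |
|  | 57 | Unclassified | Diagnosis | 46.9 | 42.2 | NS | 10 | 39.8 | 31.0 | NS | 22 | NR |
|  | 58 | M5 | Diagnosis | 75.1 | 71.0 | NS |  | 53.5 | 39.9 | NS | 25 | NR |
|  | 59 | M4Eo | Diagnosis | 43.4 | 41.2 | NS |  | 8.6 | 7.6 | NS | 11 | NR |
|  | 60 | M1 | Relapse | 93.1 | 90.1 | NS |  | 76.7 | 65.2 | NS | 15 | NR |
|  | 61 | M1 | Diagnosis | 75.6 | 73.6 | NS |  | 20.4 | 25.6 | NS | −25 | NR |
|  | 62 | Unclassified | PD | 95.8 | 93.3 | 0.001 |  | 89.4 | 79.6 | 0.05 | 11 | NR |
|  | 63 | M1 | Diagnosis | 74.1 | 72.9 | NS |  | 69.7 | 58.6 | 0.014 | 16 | NR |
|  | 64 | M4 | Diagnosis | 30.0 | 36.7 | NS | −22 | 38.4 | 49.4 | NS | −29 | NR |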

Abbreviations: FAB, French-American-British; PD, persistent disease; NS, not statistically significant.

^a In vivo response criteria: R, >50% RR in RF; PR, 20% to 50% RR in RF or >20% RR in BM only; NR, no significant difference between drug- and vehicle-treated mice (NS) or <20% RR in both RF and BM.

To examine whether the effects of FED against LSCs in AML were indeed mediated by JAK2 inhibition, we tested the efficacy of RUX, a JAK1/2 inhibitor approved for the treatment of MPNs, against 10 AML samples classified as FED X-R. Surprisingly, eight of ten samples did not respond to RUX treatment in xenograft assays, and the remaining two showed only a partial response (Fig. 3A and Table 3). Furthermore, in vitro treatment with RUX had minimal or no impact on pSTAT5 levels, in marked contrast with strong FED-mediated effects. As cytokine-induced JAK2 activation is accompanied by autophosphorylation of Y1007 and Y1008 (Fig. 3B; ref. 21), we also examined the abundance of pJAK2 (Y1007/Y1008) in samples from both cohorts. Interestingly, when detected, pJAK2 was predominantly expressed in CD34CD45hi AML blasts, and levels were not affected by in vitro treatment with FED (Fig. 3C). Samples with FED-sensitive pSTAT5 either had little pJAK2 (Fig. 3C, AML43), or expressed pJAK2 and pSTAT5 in phenotypically distinct cellular subsets (Fig. 3C, AML50), suggesting that STAT5 phosphorylation was JAK2 independent in most cases. Finally, samples with the highest basal pJAK2 typically had low pSTAT5 (Fig. 3C, AML53), suggesting that pJAK2 did not induce STAT5 phosphorylation in these samples. Collectively, these data suggest that FED exerts its effects in AML primarily through JAK2-independent mechanisms.

Figure 3.

Discordant responses to FED and RUX treatment in AML xenograft assays. A, summary of human CD45+CD33+ AML engraftment in the injected femur and noninjected bones of engrafted mice treated with vehicle, FED, or RUX. Results are shown for two representative samples of ten tested. Each symbol represents one mouse. Bars indicate mean values. **, P < 0.01; ***, P < 0.001. B, histogram showing pJAK2 levels in Ba/F3 cells (a pro-B cell line) stimulated with mIL-3 (10 ng/mL) for 5 minutes. C, PF analysis of pSTAT5 and pJAK2 levels in combination with surface CD34 or CD45 expression in representative AML samples after in vitro treatment with vehicle (VEH) or FED.

Table 3.

Summary of in vivo and PF response of AML samples treated with FED or RUX

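Sample ID | FLT3-ITD | In vivo response, FED | In vivo response, RUX | Log2 FC drug/veh pSTAT5, FED | Log2 FC drug/veh pSTAT5, RUX
AML48 | | PR | NR | −1.22 | −0.2
AML37 | | | NR | −1.52 | −0.48
AML38 | | | PR | −0.52 | −0.07
AML40 | | | NR | −0.76 | 0.15
AML2 | | | NR | −2.09 | ND
AML12 | | PR | NR | −3.03 | −0.44
AML43 | | | NR | −1.6 | −0.21
AML1 | − | | NR | −0.35 | −0.01
AML36 | − | | NR | −0.03 | 0.04
AML7 | − | | PR | −0.56 | −0.64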

Abbreviation: ND, not determined.
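To help read the log2 FC values in Table 3, the sketch below converts a log2 fold change into a percent change in signal. It is an illustration only and rests on an assumption: that the log2 FC is the base-2 logarithm of the drug-treated pSTAT5 signal divided by the vehicle-treated signal. The exact calculation is described with Fig. 2 and is not reproduced here; the function names are hypothetical.

```python
import math


def log2_fold_change(drug_signal, vehicle_signal):
    """Log2 ratio of drug-treated to vehicle-treated signal (assumed definition)."""
    return math.log2(drug_signal / vehicle_signal)


def percent_change(log2_fc):
    """Convert a log2 fold change into a percent change relative to vehicle.

    Negative results are reductions; for example, -50.0 means the signal halved.
    """
    return (2.0 ** log2_fc - 1.0) * 100.0


# Value listed in Table 3 for AML48 treated with FED
aml48_fed_change = percent_change(-1.22)
```

Under this reading, a log2 FC of −1 corresponds to a 50% reduction in pSTAT5 signal, and the −1.22 listed for AML48 with FED corresponds to a reduction of roughly 57%, whereas the −0.2 listed for RUX in the same sample is a much smaller change.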

Although FED was developed as a JAK2 inhibitor, like all other tyrosine kinase inhibitors, it has significant effects on other tyrosine kinases. Specifically, FED also inhibits FLT3 (Supplementary Fig. S5A; ref. 22), albeit with 5-fold lower potency (23). Therefore, we examined whether the observed FED effects might be mediated by FLT3 inhibition, because activating FLT3-ITD mutations are common in AML. In accordance with studies showing that FLT3-ITD causes STAT5 activation (24), basal pSTAT5 levels were significantly higher in FLT3-ITD+ compared with FLT3-ITD− AML samples (Supplementary Fig. S5B), as previously reported (25). Treatment in vitro with the FLT3 inhibitor AC220 or FED decreased pSTAT5 to a greater extent in FLT3-ITD+ samples (Supplementary Fig. S5B and S5C). Nonetheless, FED also robustly decreased pSTAT5 levels in 3 FLT3-ITD− AML samples (Supplementary Fig. S5C), and 15/38 X-R samples lacked FLT3-ITD (Supplementary Fig. S5D). These data argue that in vitro and in vivo responses to FED are not restricted to FLT3-ITD+ samples.

Identification of subpopulations with distinct pSRC/pSTAT5 signaling and drug sensitivity

Many AML samples from both cohorts exhibited small FED-mediated reductions in basal pSRC levels (Fig. 2A and Supplementary Fig. S4A), suggesting that inhibition of SRC signaling might partly account for FED's efficacy in vivo. Indeed, prior reports have suggested that several SRC family kinases are aberrantly activated in AML (26, 27). In vitro treatment with dasatinib (DAS), a dual SRC/ABL kinase inhibitor, robustly decreased SRC phosphorylation in our initial patient cohort (Fig. 4A). Interestingly, most AML samples contained phenotypically defined cellular subsets that exhibited distinct signaling profiles, including a discrete pSRChi subset that was typically CD45hiCD34−/lo (Fig. 4B) and CD33hi (data not shown), and a pSTAT5hi subset that was CD45lo (CD34+ or CD34−). In most cases, pSTAT5 levels were robustly decreased by in vitro treatment with FED but not DAS, whereas pSRC was reduced to a greater extent by DAS than FED (Fig. 4B). One exception was sample AML21, in which pSTAT5 and pSRC, although clearly present in different subsets, were both more sensitive to inhibition by DAS than by FED (Supplementary Fig. S6). Indeed, DAS treatment reduced global tyrosine phosphorylation in this sample; of note, this sample contains a BCR-ABL1 gene rearrangement. This unusual PF profile was also observed in six chronic phase CML samples tested (Supplementary Fig. S6 and data not shown).

Figure 4.

PF analysis reveals subpopulations with distinct pSRC/pSTAT5 signaling and drug sensitivity. A, log2 FC ratios (calculated as described in Fig. 2) showing impact of treatment with FED or DAS on each phosphoprotein in initial cohort of AML patient samples, analyzed by flow cytometry. B, PF analysis of pSRC and pSTAT5 levels in combination with CD34 or CD45 expression in representative AML samples after in vitro treatment with vehicle, FED, or DAS. C, schematic illustrating the experimental protocol for FED/DAS testing. D, summary of human CD45+CD33+ AML engraftment in the injected femur and noninjected bones of engrafted mice treated with vehicle, FED, DAS, or FED+DAS. Results from two representative patient samples are shown. E, summary of human CD45+CD33+ AML engraftment in the injected femur and noninjected bones of untreated secondary recipients 10 to 12 weeks posttransplantation of equal numbers of human CD45+ cells from the pooled bone marrow of primary mice transplanted with AML8 and treated as indicated. For D and E, each symbol represents one mouse. Bars indicate mean values. **, P < 0.01; ***, P < 0.001.

The identification of distinct cell populations with differential pSTAT5/pSRC signaling and the prediction from in vitro analysis that FED and DAS might be efficacious when used together prompted us to test the efficacy of FED+DAS combination therapy in vivo with 3 patient samples (Fig. 4C–E and data not shown). Response to single-agent therapy (either FED or DAS) in xenotransplanted mice was concordant with drug sensitivity predicted by PF analysis of in vitro treatment effects (Fig. 4B and D), and the combination of FED+DAS was more effective in reducing leukemic engraftment in treated mice compared with either agent alone. In addition, although a limited number of mice were tested, serial transplantation of AML8 (a relapse sample) suggested that combination therapy may more effectively impair LSC function compared with the single agents (Fig. 4E). Thus, combining multiparameter PF analysis with in vivo drug testing of primary patient samples enabled the rational design and evaluation of a combination regimen to target LSCs in an animal model of AML.

Here, we developed an approach that combines drug testing of large numbers of primary patient samples with multiparameter PF analysis. Using FED as a prototype, we demonstrated heterogeneity of treatment response in our xenograft model, with efficacy observed in 38 of 64 (59%) samples. In parallel, we identified a drug response biomarker that predicts which patients are likely to benefit from FED treatment based on the responses observed in xenografts. To date, this is the largest patient cohort tested with a single drug. This scale of testing allowed inclusion of patients with refractory/relapsed disease and heterogeneous molecular and cytogenetic abnormalities. The similar concordance of in vivo drug response with PF response in Toronto and Vancouver points to the robustness and feasibility of our approach for future preclinical drug development in AML.

Although xenograft models are often employed in anticancer drug development, timing and sample availability generally preclude contemporaneous analysis of patients in a clinical trial and xenografts derived from the same patients' samples, leading to uncertainty as to the validity of the latter for predicting treatment response. In the absence of such paired analyses, characterization of a predictive biomarker derived from xenograft treatment responses represents an ideal means to correlate the heterogeneous responses seen in the two settings. Validation of predicted drug responses in patients would strongly support the utility and applicability of large-scale xenografting for preclinical drug development, which if carried out in parallel with studies to identify response biomarkers as described here, will increase the success rate of anticancer drugs that proceed to trial, especially for drugs that may be beneficial to only a subset of patients.

In the case of AML, this approach can only be applied to the approximately 50% of patient samples that can generate xenografts using current methods, which may raise questions about the universality of drug-testing results. However, AML patients whose cells are engraftment-capable have worse outcomes (Kennedy and colleagues; unpublished data; ref. 7), and thus are in greatest need. Furthermore, there is substantive evidence that the properties of LSCs assayed in AML xenotransplant models have clinical relevance across a broader set of samples beyond those that can generate xenografts (5–7). For example, a functionally defined LSC-specific gene expression signature is highly prognostic in multiple independent AML cohorts (5, 6), suggesting that common pathways that are operative within LSC-enriched (i.e., xenograft-initiating) cell fractions are linked to outcome in all patients. This link also implies that, regardless of a patient's mutational spectrum and even when the subclonal composition of a xenograft does not reflect the dominant leukemic clone in a patient (28), the common stemness pathways that govern engrafting cells represent potentially relevant therapeutic targets. Thus, drugs that impair leukemic engraftment may affect outcomes, a possibility that must ultimately be tested in clinical trials.

Patient-derived tissues that have been serially propagated as xenografts (distinct from primary patient samples) have been used previously by the Pediatric Preclinical Testing Program (29) and others (30) to test a panel of therapeutic agents against a limited number of individual tumors. Although effective in screening potential new drugs, this approach does not capture interpatient tumor heterogeneity nor allow for identification of drug response biomarkers. Furthermore, xenografts that have been extensively propagated may not mirror the patient's tumor accurately. The largest study in a single tumor type examined the efficacy of Sagopilone against a panel of 22 primary non–small cell lung cancer (NSCLC) xenografts (31). Fifty percent of samples exhibited partial responses defined by RECIST (32), but only 5% of patients had a partial response at clinical trial (33). The incongruence between success at preclinical stages and failure in clinical trials seen in this and other studies may be explained by the inability of previous preclinical models of NSCLC and other tumors both to reflect disease heterogeneity and to read out CSC function (2). Drugs that reduce tumor bulk may not necessarily have activity against CSCs, and vice versa. As evaluation of drug effects against CSCs in clinical trials is challenging, it is important to evaluate CSC function in well-characterized preclinical models of human cancer, of which the AML xenograft is the best example.

The use of a large number of primary AML samples in our study enabled us to capture the response heterogeneity that is commonly seen in the clinic. The mechanistic basis for variability in treatment responses in clinical trials is often unknown but is likely related at least in part to disease heterogeneity among patients. Both inter- and intrapatient tumor heterogeneity exist on functional and genetic levels, and cell lines or even a small number of patient-derived xenografts do not sufficiently capture this heterogeneity. This precludes identification of response biomarkers and confounds efforts to understand mechanisms of action of candidate drugs and correlate these to response. For example, treatment with AZD1480, an ATP-competitive inhibitor of JAK kinases, reduced leukemic engraftment in xenograft studies of a small number of AML samples (n = 7; ref. 34). In vitro analysis of 48 AML samples demonstrated inhibition of both pSTAT3 and pSTAT5 in the phenotypically defined CD34+ LSC/progenitor blast population, but the degree of inhibition was quite variable (0%–100%). Thus, it is difficult to draw firm conclusions about whether inhibition of pSTAT3/5 signaling is the critical mechanism by which AZD1480 mediates its in vivo effects against LSCs. In our study, PF profiling provided some mechanistic insights into how FED exerts its anti-LSC effects in vivo; specifically, that FED is not acting solely through inhibition of JAK2 or FLT3 activity. Unexpectedly, we observed very few in vivo responses to another JAK inhibitor (RUX), highlighting the fact that tyrosine kinase inhibitors usually have multiple specificities (23), and that the mechanism of action against LSCs may not be related to their most potent inhibitor activity. The discordance between the in vivo and in vitro responses observed with FED and RUX also demonstrates that response biomarkers are likely tied to specific drug actions and cannot be extrapolated to a drug class. Parallel assessment of in vivo response and in vitro analysis of affected pathways also provides opportunity for uncovering drug synergy, as demonstrated here for FED+DAS.

Predictive response biomarkers are increasingly being sought to guide patient selection in clinical trials to maximize the chances of demonstrating true drug effectiveness. Even when patients are selected for a trial based on the presence of a putative drug target (mutation and/or pathway), variability in response may be related to off-target effects, as seen in the current study, and thus unpredictable. The drug-responsive pSTAT5 biomarker that we identified based on correlation with treatment responses in xenografts was highly specific (few false positives) with strong positive predictive value in an independent validation cohort. Although this biomarker correlated with FLT3-ITD positivity, it also correctly predicted in vivo response to FED in 3 FLT3-ITD− xenografted patient samples. The biomarker did not identify all patients who exhibited drug responsiveness in xenograft assays; however, reserving treatment for patients who have the response biomarker avoids exposing those unlikely to derive therapeutic benefit to potential drug toxicities. Indeed, when our study was initiated, FED was in clinical trials for the treatment of MPNs. However, Sanofi recently halted these trials after reports of Wernicke's encephalopathy in a small number of treated patients (35).
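The terms used above for biomarker performance (specificity, positive predictive value, and the observation that not all responders are identified) follow from a standard two-by-two comparison of biomarker status against xenograft response. The Python sketch below shows the definitions only; the function name is hypothetical and the counts in the example are arbitrary placeholders, not data from this study.

```python
def biomarker_metrics(tp, fp, tn, fn):
    """Standard performance metrics for a binary predictive biomarker.

    tp: biomarker-positive samples that responded in vivo
    fp: biomarker-positive samples that did not respond
    tn: biomarker-negative samples that did not respond
    fn: biomarker-negative samples that responded
    """
    return {
        # Fraction of nonresponders correctly called negative (few false positives)
        "specificity": tn / (tn + fp),
        # Fraction of biomarker-positive samples that truly respond
        "ppv": tp / (tp + fp),
        # Fraction of responders that the biomarker identifies
        "sensitivity": tp / (tp + fn),
    }


# Arbitrary placeholder counts, chosen only to illustrate the arithmetic
example = biomarker_metrics(tp=8, fp=2, tn=18, fn=12)
```

With these placeholder counts, specificity and positive predictive value are high while sensitivity is lower, which is the pattern described for a biomarker that rarely misclassifies nonresponders but does not capture every responder.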

Our study provides a modern paradigm for preclinical drug development that improves the chances of correctly identifying drugs with LSC activity to move forward into clinical trials. The evidence that xenotransplantation assays detect properties of AML patient samples that correlate with outcome (5–7) increases the confidence that candidate drugs that ablate LSCs in this preclinical model will also be effective when administered to patients. Clinical trials evaluating novel anticancer therapies are expensive, time consuming, and expose patients to risk. We anticipate that adoption of this approach for preclinical evaluation and biomarker development will lead to improved patient selection and clinical trial outcomes.

M.D. Minden is a consultant/advisory board member for Celgene. No potential conflicts of interest were disclosed by the other authors.

Conception and design: W.C. Chen, M.D. Minden, C. Guidos, J.E. Dick, J.C.Y. Wang

Development of methodology: W.C. Chen, J.S. Yuan, A. Mitchell, C. Guidos, J.E. Dick, J.C.Y. Wang

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): W.C. Chen, Y. Xing, N. Mbong, G. Gerhard, G. Bogdanoski, S. Lauriault, Y. Merkulova, M.D. Minden, D.E. Hogge, C. Guidos, J.C.Y. Wang

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): W.C. Chen, J.S. Yuan, Y. Xing, A. Mitchell, J.A. Kennedy, G. Bogdanoski, S. Lauriault, Y. Merkulova, C. Guidos

Writing, review, and/or revision of the manuscript: W.C. Chen, J.S. Yuan, A. Mitchell, M.D. Minden, D.E. Hogge, C. Guidos, J.E. Dick, J.C.Y. Wang

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): W.C. Chen, J.S. Yuan, Y. Xing, A.C. Popescu, J.A. Kennedy, G. Bogdanoski, S. Perdu, Y. Merkulova, D.E. Hogge

Study supervision: D.E. Hogge, C. Guidos, J.E. Dick, J.C.Y. Wang

Other (performed experiments): J. McLeod

The authors thank Jaime Claudio and Amanda Kotzer for project management support and Sherry Zhao and members of the SickKids-UHN Flow Facility for technical support.

All authors were supported by the Cancer Stem Cell Consortium with funding from the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-047), and through the Canadian Institutes of Health Research (CSC-105367). J.E. Dick is also supported by grants from the Canadian Cancer Society, Terry Fox Foundation, Ontario Institute for Cancer Research with funds from the province of Ontario, a Canada Research Chair and the Ontario Ministry of Health and Long Term Care (OMOHLTC). The views expressed do not necessarily reflect those of the OMOHLTC.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Lieu CH, Tan AC, Leong S, Diamond JR, Eckhardt SG. From bench to bedside: lessons learned in translating preclinical studies in cancer drug development. J Natl Cancer Inst 2013;105:1441–56.
2. Wang JC. Evaluating therapeutic efficacy against cancer stem cells: new challenges posed by a new paradigm. Cell Stem Cell 2007;1:497–501.
3. Wartha K, Herting F, Hasmann M. Fit-for purpose use of mouse models to improve predictivity of cancer therapeutics evaluation. Pharmacol Ther 2014;142:351–61.
4. Aparicio S, Hidalgo M, Kung AL. Examining the utility of patient-derived xenograft mouse models. Nat Rev Cancer 2015;15:311–6.
5. Eppert K, Takenaka K, Lechman ER, Waldron L, Nilsson B, van Galen P, et al. Stem cell gene expression programs influence clinical outcome in human leukemia. Nat Med 2011;17:1086–93.
6. Metzeler KH, Maharry K, Kohlschmidt J, Volinia S, Mrozek K, Becker H, et al. A stem cell-like gene expression signature associates with inferior outcomes and a distinct microRNA expression profile in adults with primary cytogenetically normal acute myeloid leukemia. Leukemia 2013;27:2023–31.
7. Pearce DJ, Taussig D, Zibara K, Smith LL, Ridler CM, Preudhomme C, et al. AML engraftment in the NOD/SCID assay reflects the outcome of AML: implications for our understanding of the heterogeneity of AML. Blood 2006;107:1166–73.
8. Jin L, Hope KJ, Zhai Q, Smadja-Joffe F, Dick JE. Targeting of CD44 eradicates human acute myeloid leukemic stem cells. Nat Med 2006;12:1167–74.
9. Jin L, Lee EM, Ramshaw HS, Busfield SJ, Peoppl AG, Wilkinson L, et al. Monoclonal antibody-mediated targeting of CD123, IL-3 receptor alpha chain, eliminates human acute myeloid leukemic stem cells. Cell Stem Cell 2009;5:31–42.
10. Schenk T, Chen WC, Gollner S, Howell L, Jin L, Hebestreit K, et al. Inhibition of the LSD1 (KDM1A) demethylase reactivates the all-trans-retinoic acid differentiation pathway in acute myeloid leukemia. Nat Med 2012;18:605–11.
11. Skrtic M, Sriskanthadevan S, Jhas B, Gebbia M, Wang X, Wang Z, et al. Inhibition of mitochondrial translation as a therapeutic strategy for human acute myeloid leukemia. Cancer Cell 2011;20:674–88.
12. Sukhai MA, Prabha S, Hurren R, Rutledge AC, Lee AY, Sriskanthadevan S, et al. Lysosomal disruption preferentially targets acute myeloid leukemia cells and progenitors. J Clin Invest 2012;123:315–28.
13. Verstovsek S, Mesa RA, Gotlib J, Levy RS, Gupta V, DiPersio JF, et al. A double-blind, placebo-controlled trial of ruxolitinib for myelofibrosis. N Engl J Med 2012;366:799–807.
14. Mascarenhas J, Hoffman R. A comprehensive review and analysis of the effect of ruxolitinib therapy on the survival of patients with myelofibrosis. Blood 2013;121:4832–7.
15. Pardanani A, Gotlib JR, Jamieson C, Cortes JE, Talpaz M, Stone RM, et al. Safety and efficacy of TG101348, a selective JAK2 inhibitor, in myelofibrosis. J Clin Oncol 2011;29:789–96.
16. Steensma DP. JAK2 V617F in myeloid disorders: molecular diagnostic techniques and their clinical utility: a paper from the 2005 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn 2006;8:397–411; quiz 526.
17. Vicente C, Vazquez I, Marcotegui N, Conchillo A, Carranza C, Rivell G, et al. JAK2-V617F activating mutation in acute myeloid leukemia: prognostic impact and association with other molecular markers. Leukemia 2007;21:2386–90.
18. Ikezoe T, Kojima S, Furihata M, Yang J, Nishioka C, Takeuchi A, et al. Expression of p-JAK2 predicts clinical outcome and is a potential molecular target of acute myelogenous leukemia. Int J Cancer 2011;129:2512–21.
19. Kotecha N, Krutzik PO, Irish JM. Web-based analysis and publication of flow cytometry experiments. Curr Protoc Cytom 2010;Chapter 10:Unit10 17.
20. Perova T, Grandal I, Nutter LM, Papp E, Matei IR, Beyene J, et al. Therapeutic potential of spleen tyrosine kinase inhibition for treating high-risk precursor B cell acute lymphoblastic leukemia. Sci Transl Med 2014;6:236ra62.
21. Feng J, Witthuhn BA, Matsuda T, Kohlhuber F, Kerr IM, Ihle JN. Activation of Jak2 catalytic activity requires phosphorylation of Y1007 in the kinase activation loop. Mol Cell Biol 1997;17:2497–501.
22. Pardanani A, Hood J, Lasho T, Levine RL, Martin MB, Noronha G, et al. TG101209, a small molecule JAK2-selective kinase inhibitor potently inhibits myeloproliferative disorder-associated JAK2V617F and MPLW515L/K mutations. Leukemia 2007;21:1658–68.
23. Zhou T, Georgeon S, Moser R, Moore DJ, Caflisch A, Hantschel O. Specificity and mechanism-of-action of the JAK2 tyrosine kinase inhibitors ruxolitinib and SAR302503 (TG101348). Leukemia 2014;28:404–7.
24. Choudhary C, Brandts C, Schwable J, Tickenbrock L, Sargin B, Ueker A, et al. Activation mechanisms of STAT5 by oncogenic Flt3-ITD. Blood 2007;110:370–4.
25. Rosen DB, Minden MD, Kornblau SM, Cohen A, Gayko U, Putta S, et al. Functional characterization of FLT3 receptor signaling deregulation in acute myeloid leukemia by single cell network profiling (SCNP). PLoS One 2010;5:e13543.
26. Dos Santos C, McDonald T, Ho YW, Liu H, Lin A, Forman SJ, et al. The Src and c-Kit kinase inhibitor dasatinib enhances p53-mediated targeting of human acute myeloid leukemia stem cells by chemotherapeutic agents. Blood 2013;122:1900–13.
27. Li XY, Jiang LJ, Chen L, Ding ML, Guo HZ, Zhang W, et al. RIG-I modulates Src-mediated AKT activation to restrain leukemic stemness. Mol Cell 2014;53:407–19.
28. Klco JM, Spencer DH, Miller CA, Griffith M, Lamprecht TL, O'Laughlin M, et al. Functional heterogeneity of genetically defined subclones in acute myeloid leukemia. Cancer Cell 2014;25:379–92.
29. Houghton PJ, Morton CL, Tucker C, Payne D, Favours E, Cole C, et al. The pediatric preclinical testing program: description of models and early testing results. Pediatr Blood Cancer 2007;49:928–40.
30. Malaney P, Nicosia SV, Dave V. One mouse, one patient paradigm: New avatars of personalized cancer therapy. Cancer Lett 2014;344:1–12.
31. Hammer S, Sommer A, Fichtner I, Becker M, Rolff J, Merk J, et al. Comparative profiling of the novel epothilone, sagopilone, in xenografts derived from primary non-small cell lung cancer. Clin Cancer Res 2010;16:1452–65.
32. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009;45:228–47.
33. Heigener DF, von Pawel J, Eschbach C, Brune A, Schmittel A, Schmelter T, et al. Prospective, multicenter, randomized, independent-group, open-label phase II study to investigate the efficacy and safety of three regimens with two doses of sagopilone as second-line therapy in patients with stage IIIB or IV non-small-cell lung cancer. Lung Cancer 2013;80:319–25.
34. Cook AM, Li L, Ho Y, Lin A, Li L, Stein A, et al. Role of altered growth factor receptor-mediated JAK2 signaling in growth and maintenance of human acute myeloid leukemia stem cells. Blood 2014;123:2826–37.
35. Zhang Q, Zhang Y, Diamond S, Boer J, Harris JJ, Li Y, et al. The Janus kinase 2 inhibitor fedratinib inhibits thiamine uptake: a putative mechanism for the onset of Wernicke's encephalopathy. Drug Metab Dispos 2014;42:1656–62.