Abstract
Molecular subtyping for pancreatic cancer has made substantial progress in recent years, facilitating the optimization of existing therapeutic approaches to improve clinical outcomes in pancreatic cancer. With advances in treatment combinations and choices, it is becoming increasingly important to determine ways to place patients on the best therapies upfront. Although various molecular subtyping systems for pancreatic cancer have been proposed, consensus regarding proposed subtypes, as well as their relative clinical utility, remains largely unknown and presents a natural barrier to wider clinical adoption.
We assess three major subtype classification schemas in the context of results from two clinical trials and by meta-analysis of publicly available expression data to assess statistical criteria of subtype robustness and overall clinical relevance. We then developed a single-sample classifier (SSC) using penalized logistic regression based on the most robust and replicable schema.
We demonstrate that a tumor-intrinsic two-subtype schema is most robust, replicable, and clinically relevant. We developed Purity Independent Subtyping of Tumors (PurIST), a SSC with robust and highly replicable performance on a wide range of platforms and sample types. We show that PurIST subtypes have meaningful associations with patient prognosis and have significant implications for treatment response to FOLIFIRNOX.
The flexibility and utility of PurIST on low-input samples such as tumor biopsies allows it to be used at the time of diagnosis to facilitate the choice of effective therapies for patients with pancreatic ductal adenocarcinoma and should be considered in the context of future clinical trials.
Molecular subtyping for pancreatic cancer has made substantial progress in recent years, facilitating the optimization of existing therapeutic approaches to improve clinical outcomes in pancreatic cancer. We show that a tumor-intrinsic two-subtype schema is the most replicable and clinically robust across different subtype schemas, with basal-like subtype tumors showing resistance to FOLFIRINOX-based regimens in two independent clinical trials. Our results strongly support the need to evaluate molecular subtyping in treatment decision-making for patients with PDAC in the context of future clinical trials. We present PurIST, a clinically usable single-sample classifier that is robust and highly replicable across different gene expression platforms and sample collection types, and may be utilized in future clinical trials.
Introduction
Recent treatment advances, including FOLFIRINOX (1), gemcitabine plus nab-paclitaxel (2), and olaparib for BRCA-mutant patients (3), have provided patients and providers with better options. With the substantial progress in molecular subtyping for pancreatic cancer (4–9), there is now an opportunity to determine the optimal choice of therapy given a patient's molecular subtype and other biomarker information, enabling “precision medicine” approaches in pancreatic cancer (10, 11).
Transcriptomic molecular subtyping in pancreatic cancer is currently an area of active development, where multiple subtyping schemas for pancreatic cancer have been proposed. For example, three molecular subtypes with potential clinical and therapeutic relevance were first described by Collisson and colleagues (5), leveraging a combination of cell line, bulk, and laser capture microdissected (LCM) patient samples: Collisson (i) quasi-mesenchymal (QM-PDA), (ii) classical, and (iii) exocrine-like. A subsequent study of patients with pancreatic cancer, based on more diverse pancreatic cancer histologies in addition to the most common pancreatic ductal adenocarcinoma (PDAC), found four molecular subtypes (4): Bailey (i) squamous, (ii) pancreatic progenitor, (iii) immunogenic, and (iv) aberrantly differentiated endocrine exocrine (ADEX). More recently, Puleo and colleagues describe five subtypes that are based on features specific to tumor cells and the local microenvironment (7). Maurer and colleagues performed LCM of both tumor and stroma and showed the contribution of each to the three schemas above (8). Finally, we have previously shown two tumor-intrinsic subtypes of PDAC (6), which we called Moffitt (i) basal-like, given the similarities with basal breast and basal bladder cancer, and (ii) classical, given the overlap with the Collisson classical subtype.
However, consensus regarding proposed subtypes for clinical decision making in PDAC has been elusive. In addition, each proposed schema utilized independent cohorts of patients to demonstrate clinical relevance. As a result, the generalizability, robustness, and relative clinical utility of each proposed subtyping schema remains unclear. Comparative evaluations of these proposed subtyping systems have been limited, partially due to the difficulty in curating and applying these diverse subtyping approaches in new datasets.
Toward this end, we perform, for the first time, a systematic interrogation of the aforementioned subtyping schemas, based on a meta-analysis of their clinical utility across a large number of independent cohorts in PDAC including two clinical trials with treatment response data. We demonstrate that a tumor-intrinsic two-subtype schema from Moffitt and colleagues (6) is robust and best explains overall survival (OS) and treatment response across multiple validation datasets. Given the clinically replicable performance of this tumor-intrinsic two-subtype schema, we have developed a single-sample classifier (SSC) that we call Purity Independent Subtyping of Tumors (PurIST) to perform subtype calling for clinical use. We show that PurIST performs well on multiple gene expression platforms including microarray, RNA sequencing (RNAseq), and NanoString. In addition, given the preponderance of nonsurgical biopsies in the neoadjuvant and metastatic settings, we demonstrate its clinical utility for small sample volumes using a matched cohort of patients with bulk, archival, and fine-needle aspirations (FNA) samples. Finally, we show the stability of PurIST-predicted subtypes before and after treatment, and that PurIST basal-like subtype tumors are associated with treatment resistance to FOLFIRINOX, strongly supporting the need to incorporate subtyping into clinical trials of patients with PDAC.
Materials and Methods
Public datasets
Archival data were obtained from public sources (Supplementary Table S1; Fig. 1). For the public datasets, expression was used “as-is” with respect to the original publication, that is, RNAseq data were not realigned and gene-level expression estimates were provided in terms of fragments per kilobase per million reads (FPKM) or transcripts per million (TPM), depending on the study.
Sample collection
Deidentified bulk and FNA samples (“Yeh_Seq” dataset; Supplementary Table S1) were collected from the Institutional Review Board (IRB)-approved University of North Carolina Lineberger Comprehensive Cancer Center Tissue Procurement Core Facility after IRB exemption in accordance with the U.S. Common Rule and were flash frozen in liquid nitrogen. FNA samples were collected ex vivo at the time of resection. The FNA technique used mirrors standard cytopathology procedures, where three passes were performed using a 22-gauge needle. Palpation was used to localize the tumor. Samples were frozen in either PBS or RNAlater (Millipore Sigma). FFPE samples were prepared, hematoxylin and eosin stained, and reviewed by a single pathologist (K.E. Volmar) who was blinded to the results. See Supplementary Materials and Methods for data processing and analysis of Yeh_Seq samples. RNAseq (GSE131050) and NanoString (GSE131051) data generated from these samples are deposited in Gene Expression Omnibus (GEO).
Sample inclusion for consensus clustering analysis and PurIST training
Each collected public dataset was subjected to sample filtering to retain samples for consensus clustering (CC)–based subtype calling by the Collisson, Bailey, and Moffitt schemas, with criteria specified in Supplementary Table S1. We implemented the methods utilized in the original publication pertaining to each schema (Supplementary Materials and Methods). When prior subtype calls were available, the original published calls were used. Specifically, in PACA_AU_seq and PACA_AU_array, the original Bailey subtype calls were used; in TCGA_PAAD, the Collisson and Bailey calls were used.
For treatment response and survival analysis, samples with available clinical and RNAseq data were used. Specifically, for the pooled survival analysis, samples from the following datasets with RNAseq data and CC calls were utilized: Linehan, Moffitt_ GEO_array, PACA_AU_seq, PACA_AU_array, and TCGA_PAAD (survival group, Supplementary Table S1). Duplicated samples in PACA_AU_seq and PACA_AU_array datasets were only used once, with the subtypes called in PACA_AU_array used when mismatches of subtype calls were found between the two datasets. To train PurIST, Moffitt schema CC calls from the datasets in the training group (Aguirre, Moffitt_GEO_array, and TCGA_PAAD; Supplementary Table S1) were utilized. These samples were further filtered to provide final training labels for the PurIST algorithm (Supplementary Tables S1 and S2) by dropping poorly clustered samples on the clustered dendrogram in each dataset based on visual inspection. We considered these filtered calls as “training labels.” Model training for PurIST is described in the Supplementary Materials and Methods.
Results
The Moffitt tumor-intrinsic two-subtype schema has important implications for treatment response
To evaluate the potential impact of molecular subtypes on treatment response, we utilize transcriptomic and treatment response data from two independent clinical trials, and perform a systematic analysis of treatment response with respect to CC calls from each of the three different subtyping schemas (Supplementary Materials and Methods, Supplementary Fig. S1) for PDAC: Collisson, Bailey, and Moffitt (4–6). We first examined the association of the subtypes from each schema with treatment response using patient samples from a promising phase Ib trial by Nywening and colleagues (“Linehan,” Linehan_seq dataset; Supplementary Tables S1 and S2) of FOLFIRINOX in combination with a CCR2 inhibitor (PF-04136309) in patients with locally advanced PDAC, where an objective response was seen in 49% of patients (12). Enrolled patients had no prior treatment, and underwent core biopsies prior to the start of therapy. Twenty-eight patients with RNAseq and treatment data were available for analysis.
We found a significant overall association between categorical treatment response (based on RECIST 1.1 criteria) and pretreatment subtype classifications from the Moffitt schema (P = 0.0117; Supplementary Table S3), where basal-like tumors showed no response to FOLFIRINOX alone or FOLFIRINOX plus PF-04136309 after stratifying by arm [overall response rate (ORR) = 0%; disease control rate (DCR) = 33%; Supplementary Table S3, Fig. 2A, generalized Cochran–Mantel–Haenszel test], whereas classical tumors showed a much stronger response overall (ORR = 40%; DCR = 100%). In contrast, we were unable to identify a relationship between subtype and treatment response under the Collisson (P = 0.428) and Bailey (P = 0.113) schemas (Fig. 2A; Supplementary Table S3). As the sample size in this phase Ib trial (n = 28 patients) was small, we similarly reanalyzed the COMPASS trial results (n = 40 patients) in the context of the three subtyping schemas.
Patients enrolled in COMPASS underwent core-needle biopsies and were treated with one of two standard first-line therapies, modified-FOLFIRINOX or gemcitabine plus nab-paclitaxel. Collected patient samples in COMPASS underwent laser capture microdissection (LCM) followed by whole genome sequencing and RNAseq. Subtypes for each schema were determined as mentioned previously. Similar to our findings in the Linehan phase Ib trial, we found a significant association between the Moffitt two subtype schema with categorical treatment response stratifying by arm (P = 0.00098, generalized Cochran–Mantel–Haenszel test), where the basal-like subtype had much lower response to either treatment (ORR = 10%; DCR = 50%) relative to the classical subtype (ORR = 36.7%; DCR = 100%). We also found significant associations between treatment response and the subtypes from the Collisson (P = 0.0024) and Bailey (P = 0.0067) schemas. However, we notably observe that the Bailey squamous subtype strongly overlaps with the Moffitt basal-like subtype, and the remaining nonsquamous Bailey subtypes appear to overlap strongly with the Moffitt classical subtype (Fig. 2B, Cohen Kappa = 1.0, P = 2.54 × 10−10). We similarly found that the Collisson QM-PDA and the remaining non–QM-PDA subtypes correspond strongly with the Moffitt basal-like and classical subtypes, respectively (Fig. 2B, Cohen Kappa = 0.875, P = 2.44 × 10−8), a fact also mirrored in the Linehan trial.
Given these observations, we formally evaluated the relative clinical utility of each subtyping system using nonnested model selection criteria such as Bayesian information criterion (BIC; ref. 13). Briefly, such criteria evaluate model fit relative to the complexity of the model, as models with more predictors (subtypes) may simply have better fit due to overfitting, and also may contain excess predictors (additional subtypes) that do not contribute meaningfully in differentiating clinical outcomes. The model with the lowest BIC in a series of competing candidate models is preferred in statistical applications, and is agnostic to the magnitude of the difference (14). Considering response as a continuous outcome (% change in tumor volume), we find that the Moffitt schema had the best (lowest) BIC score in both datasets (Linehan BIC = 247.37, COMPASS BIC = 378.75, two-way ANOVA model; Supplementary Table S3), compared with the Collisson (Linehan BIC = 254.63, COMPASS BIC = 382.8) and Bailey (Linehan BIC = 250.75, COMPASS BIC = 385.66) schemas. This result similarly held if we considered response as a categorical variable (ordinal regression model; Supplementary Table S3). This finding is also reflected among the non–QM-PDA and nonsquamous subtypes (Fig. 2A and B; Supplementary Table S3), where little difference in response can be seen between these subtypes. Our results using BIC suggest that the additional subtypes found in the Collisson and Bailey schemas do not demonstrate additional benefit in differentiating treatment response over the Moffitt two-subtype schema. Taken together, these results suggest that the Moffitt basal-like and classical subtypes strongly and parsimoniously explains treatment response relative to other schemas in both clinical trials.
The Linehan phase Ib trial captured both pre- and posttreatment biopsies, providing a unique opportunity to evaluate the stability of molecular subtypes after treatment. As pre- and posttreatment biopsies are unlikely to be obtained from the same location, these samples may also provide an opportunity to evaluate intrapatient tumor heterogeneity. Interestingly, we found strong stability in the Moffitt schema subtypes in pre- and posttreatment biopsies (Fig. 2C; Cohen Kappa = 1.0; P = 2.54 × 10−10), suggesting that not only may there be less tumor-intrinsic subtype heterogeneity within a tumor, but also that the Moffitt schema subtypes are not affected by treatment, either with FOLFIRINOX or with the addition of the CCR2 inhibitor. In contrast, we found higher rates of switching in Collisson subtypes pre- to posttreatment (Fig. 2C; Supplementary Fig. S2), where changes in the exocrine-like and classical subtypes were more common. Similarly, the nonsquamous Bailey subtypes appeared to show the highest rate of subtype switching pre- and posttreatment, with the ADEX subtype demonstrating the highest rate of switching among these subtypes (Supplementary Fig. S2). It is unclear whether there is any clinical significance to such subtype transitions. Prior studies have suggested that the Bailey ADEX, Bailey immunogenic, and Collisson exocrine–like subtypes are confounded by tumor purity in contrast to the Moffitt subtypes (7–9), which may explain some of the increased heterogeneity in subtypes pre- and posttreatment in these schemas. In contrast, the Collisson QM-PDA and Bailey squamous subtypes, which were shown to overlap strongly with the Moffitt basal-like subtype, were observed to be much more stable between the two time points.
The tumor-intrinsic two-subtype schema strongly and replicably differentiates patient survival across multiple studies
Given the paucity of available genomic data in the context of treatment response in PDAC, we also perform a meta-analysis of five independent patient cohorts with OS data available: Linehan_seq, Moffitt GEO array (GSE71729), ICGC PACA_AU array, ICGC PACA_AU seq, and TCGA PAAD (survival group; Supplementary Tables S1 and S2). To determine the potential replicability of the different subtyping schemas (Collisson, Bailey, Moffitt) in differentiating clinical outcomes, we utilized CC subtype calls from each schema.
We find that the Moffitt tumor-intrinsic two-subtype schema reliably differentiates survival across individual datasets (Supplementary Fig. S1; Supplementary Table S4), showing significant associations with OS in the majority of individual studies in contrast to other schemas. After pooling datasets, we found that patients with Moffitt basal-like subtype tumors have significantly worse prognosis compared with the Moffitt classical subtype (Fig. 3C, stratified HR = 1.98, P < 0.0001, stratified Cox proportional hazards model). We also observed similar trends in the Bailey squamous and Collisson QM-PDA subtypes relative to other subtypes in the same schemas (Fig. 3A and B), mirroring our treatment response results from the previous section (Fig. 2A and B). However, overall subtype-specific survival differences were most pronounced within the two-subtype schema across studies (Supplementary Table S4), compared with the Collisson (P = 0.069) and Bailey (P = 0.076) schemas. Moreover, we find that that nonsquamous subtypes in the Bailey schema have very similar OS to one another (Fig. 3B), where a direct overall comparison of these subtypes showed no statistically significant differences in survival in our pooled dataset (immunogenic vs. ADEX stratified HR = 1.07, pancreatic progenitor vs. ADEX HR = 1.01, overall P = 0.82). We find a similar result when comparing survival among patients from the non–QM-PDA subtypes in the Collisson schema in the pooled data (Fig. 3A; exocrine-like vs. classical stratified HR = 1.17; P = 0.344).
In our pooled dataset, strong correspondence was again found between the Bailey squamous, Collisson QM-PDA, and Moffitt basal-like subtypes, and between the Moffitt classical subtype and the remaining subtypes in the Bailey (Cohen Kappa = 0.56, P = 0; Supplementary Fig. S3A) and Collisson (Cohen Kappa = 0.4, P = 0; Supplementary Fig. S3B) schemas. In TCGA PAAD, where estimates of tumor purity were available, Moffitt classical patients that were also classified as QM-PDA in the Collisson schema had much lower tumor purity than other samples (P = 0.0016; Supplementary Fig. S3C). The Bailey ADEX and immunogenic samples also had lower tumor purity, regardless of whether they were called Moffitt classical or basal-like (Supplementary Fig. S3D). These findings are similar to other studies (7–9), and suggest that the discordance in subtype assignment between schemas may be driven by tumor purity.
To determine the best fitting model for OS, we calculate BIC with respect to the stratified Cox proportional hazards model pertaining to each schema. Similar to our analysis of treatment response, we find that the Moffitt two-subtype schema has the best (lowest) BIC and therefore has the best and most parsimonious fit to the pooled survival data (Fig. 3; Supplementary Table S4). We also find this to be the case in the majority of individual studies, replicated across each of our validation datasets (Supplementary Table S4). These results reflect our finding that no difference in OS can be observed among the Collisson non–QM-PDA and Bailey nonsquamous subtypes in our pooled analysis. In total, these findings support the conclusion that the Moffitt two-subtype schema strongly and parsimoniously explains differences in OS, compared with alternate subtyping schemas. Our results further suggest that the additional subtypes found in the Collisson and Bailey schemas do not demonstrate additional clinical benefit in terms of predicting OS relative to the simpler Moffitt two-subtype schema, based on BIC and direct statistical comparison of the Collisson non–QM-PDA and Bailey nonsquamous subtypes. Given the robustness and highly replicable clinical utility of the Moffitt schema, we next developed a SSC based on this tumor-intrinsic two-subtype schema to avoid reliance on CC-based analysis.
PurIST SSC
The ability to resolve and assign subtypes via clustering is limited when applied to individual patients. Reclustering new samples with existing training samples may also change existing subtype assignments. Thus, we developed a robust SSC, PurIST, to predict subtype in individual patients, based on our three largest bulk gene expression datasets (TCGA PAAD, Aguirre Biopsies, and Moffitt GSE71729, training group; Supplementary Table S1). A key element of our method includes the utilization of tumor-intrinsic genes previously identified (6) to avoid the possible confounding of tumor gene expression with those from other tissue types. For model training, we designated training labels (Supplementary Materials and Methods). We use rank-derived quantities as predictors in our final SSC model instead of the raw expression values, utilizing the k Top Scoring Pair (kTSP) approach to generate these predictors (Supplementary Materials and Methods). The motivation of this approach is that while the raw values of gene expression may be on different scales in different studies, their relative magnitudes can be preserved by ranks.
We find that this type of rank transformation of the raw expression data has several advantages. First, a single predictor (TSP) only depends on the ranks of raw gene expression of a gene pair in a sample. Hence, its value is robust to overall technical shifts in raw expression values (i.e., due to variation in sequencing depth), and, as a result, is less sensitive to common between-sample normalization procedures of data preprocessing (15–17). Second, it simplifies data integration over different training studies as data are on the same scale. Finally, prediction in new patients is also simplified, as normalizing new patient data to the training set is no longer necessary, which may further affect the accuracy of model predictions (16).
Development and external validation of PurIST classifier.
We apply a systematic procedure (Supplementary Materials and Methods) implementing the above approach to derive our PurIST model for prediction in the tumor-intrinsic two-subtype schema given the training labels (Supplementary Materials and Methods) and rank transformed predictors for each training samples. The selected eight gene pairs (TSP), fitted model, and model coefficients are given in Supplementary Table S5. Figure 4A (Supplementary Materials and Methods) describes the validation that is performed in a hypothetical new patient by computing the values of each of the eight selected TSPs in that patient, where a value of 1 is assigned if the first gene in a TSP, gene A, has greater expression than the second gene, gene B, in that patient (and assigned 0 value otherwise). These values are then multiplied by the corresponding set of estimated TSP model coefficients, summing these values to get the patient “TSP Score” after correction for estimated baseline effects. This score is then converted to a predicted probability of belonging to the basal-like subtype, where values greater than 0.5 suggest basal-like subtype membership and the classical subtype otherwise.
To assess the quality of our prediction model, we evaluate the cross-validation error of the final model in our training group. We find that the internal leave-one-out cross-validation error for PurIST on the training group is low (3.1%). To validate this model, we apply it to the validation group datasets (Supplementary Table S1; Fig. 4) and determine whether PurIST predictions recapitulate the CC subtypes in each study. We find that pooled validation samples strongly segregate by CC subtype when sorted by their predicted basal-like probability, despite diverse studies of origin (Fig. 4B). These suggest that our methodology avoids potential study-level batch effects. The relative expression of classifier genes within each classifier TSP (paired rows, Fig. 4B) strongly discriminates between subtypes in each sample, forming the basis of our robust TSP-oriented approach for subtype prediction (Supplementary Fig. S4). We also find that, visually, predicted subtypes from PurIST have strong correspondence with independently determined CC subtypes. Overall, the PurIST classifier predicted subtypes with high levels of confidence (Fig. 4C), with most basal-like subtype predictions having predicted basal-like probabilities >0.9 (strong basal-like), and most classical subtype predictions with predicted basal probabilities of <0.1 (strong classical). Among these high confidence predictions, the majority of these calls corresponded with subtypes obtained independently via CC (Fig. 4D). Lower confidence calls (likely/lean basal-like/classical categories of prediction) had higher rates of misclassification, although these less confident calls were more rare in our validation datasets (Fig. 4C).
To evaluate the overall classification performance of PurIST across studies, we apply a nonparametric meta-analysis approach to obtain a consensus ROC curve based on the individual ROC curves from each validation study (18). We found that the overall consensus AUC is high, with a value of 0.993. ROC curves from individual studies were also consistent (Fig. 4E). In addition, we find that the estimated interstudy variability of these ROC curves with respect to predicted basal-like probability threshold t is low overall, with relatively higher variance at low thresholds and almost no variability at our standard threshold of 0.5 or greater (Fig. 4F). These reflect the similarity of individual ROC curves seen in Fig. 4E. We find that within our validation datasets, the prediction accuracy rates were in general 90% or higher, and individual study AUCs were 0.95 or greater (Fig. 4G). Furthermore, sensitivities and specificities were often high and in some cases equal to 1, reflecting near perfect classification accuracy. These results suggest that PurIST is robust across multiple datasets and platforms and recapitulates the subtypes independently obtained via CC, which we have shown to have high clinical utility.
Replicability of PurIST in archival formalin-fixed and paraffin embedded and FNA samples
Because frozen bulk tumor samples are not commonly available in routine clinical practice, we next looked at the replicability of PurIST predictions across sample types that are more widely collected in clinical practice. Notably, nearly all preoperative and metastatic biopsies are obtained using either FNA or core biopsy techniques. Prior studies have shown the feasibility of performing RNAseq on core biopsies (11) and endoscopic ultrasound guided FNAs, both of which are commonly utilized in the diagnosis of pancreatic cancer (19). We therefore evaluated the performance of PurIST in both formalin-fixed and paraffin embedded (FFPE) and FNA samples.
Among 47 pairs of matched FNA and bulk samples that passed quality control (Yeh_Seq dataset; Supplementary Materials and Methods), we found significant agreement between the PurIST subtype calls of the matched FNA and bulk samples (Cohen Kappa = 0.544; P = 2.8e−05; Fig. 5A; Supplementary Table S1). Only three pairs of samples (6.4%) show disagreement in subtype calling results using PurIST. CC calls of the bulk samples are also shown as a comparison. We performed a similar evaluation with tumors that we had matched FFPE, FNA, and bulk samples available (Fig. 5B; Supplementary Table S1). We found complete agreement among PurIST subtype predictions among FFPE, FNA, and bulk samples in patients that had all three sample types available (five sets total), further supporting that PurIST is robust across different sample preparations. We also found that the genes pertaining to PurIST TSPs are comparatively less variable than genes not designated as tumor-intrinsic (Supplementary Fig. S5). For example, PurIST TSP genes, originally selected from our tumor-intrinsic gene list, have significantly higher Spearman correlation between sample types than Bailey immunogenic (P = 0.0149) or ADEX genes (P = 0.0083; Supplementary Fig. S5), using a permutation test (Supplementary Materials and Methods). The stability of TSP genes across sample types, support their robustness and their ability to identify tumor-intrinsic signals in samples that may be confounded by low-input or degradation.
Replicability of PurIST predictions on a NanoString platform
RNAseq assays in Clinical Laboratory Improvement Amendments (CLIA)–certified laboratories are still in their infancy. Thus, we evaluated the performance of PurIST on samples using NanoString nCounter, a gene expression quantification system that directly quantifies molecular barcodes. This platform has been widely used in cancer molecular subtyping (20), and is more widely available in CLIA-certified laboratories. In samples with both RNAseq and NanoString platform expression data available, we evaluated the consistency between subtype calls based on their RNAseq and NanoString expression data using PurIST-n (Fig. 5C; Supplementary Table S5; Supplementary Materials and Methods). This updated classifier is trained in a manner similar to PurIST, with the exception that genes were limited to those in common between the two platforms, as a more limited set of genes were available for our NanoString probeset. We found that there was strong agreement between PurIST-n calls in 51 patients with matched RNAseq/NanoString samples (Cohen Kappa = 0.879; P = 2.25e−11), where only one sample showed disagreement in its PurIST-n call. This discrepancy may be due to the relatively lower read count in the RNAseq sample for this patient. In addition, it is noteworthy that the PurIST-n call for this sample is a low confidence call (“lean classical”). These results support the replicability of PurIST on the NanoString platform and suggest that NanoString may be more robust at overcoming the hurdles of low input or degraded samples.
Applicability of PurIST to treatment decision making
We next evaluated the potential utility of using PurIST for clinical decision making. In basal-like and classical samples that were classified by PurIST, we found significant survival differences in both the pooled public (with all training group samples removed) and the Yeh_Seq FNA datasets, with basal-like samples showing shorter OS (Fig. 6A and B; Supplementary Fig. S6; Supplementary Table S4). We then looked at the relevance of PurIST to treatment response in the COMPASS and Linehan trials (Fig. 6C and D). PurIST recapitulated 48 of 49 PDAC subtype calls compared with the previous CC-based calls in the COMPASS dataset, and 66 of 66 subtype calls in the Linehan dataset (Supplementary Tables S1 and S2). Only one patient with a CC classical tumor was called basal-like by PurIST and had stable disease (SD, % change >−30% and <20%) in the COMPASS trial. Notably, the only PR seen in a PurIST basal-like tumor was in a patient with an unstable DNA subtype (10). In agreement with our CC analysis (Fig. 2B), we find that PurIST-predicted subtype tumors had similar associations with treatment response (Fig. 6C and D; Supplementary Table S3). We also found no change in PurIST subtype or the confidence of the call after treatment, suggesting that PurIST tumor subtypes are unchanged after treatment with FOLFIRINOX ± PF-04136300 (Fig. 6D and E). Finally, after excluding the sample with an unstable DNA subtype, we show a positive correlation between PurIST basal-like predicted class probabilities and worse treatment response in basal-like tumors (Fig. 6F). No association of PurIST classical confidence and treatment response was seen (Fig. 6G).
Discussion
Several subtyping systems for pancreatic cancer have now been proposed. Despite this, several limitations remain before they can be clinically usable. Here we leverage the wealth of transcriptomic studies that have been performed in pancreatic cancer to determine the molecular subtypes that may be most clinically useful and replicable across studies. Our results show that while multiple molecular subtypes may be used to characterize patient samples, the two tumor-intrinsic subtypes from the Moffitt schema: basal-like (overlaps with Bailey squamous/Collisson QM-PDA) and classical (overlaps with non-Bailey squamous/non-Collisson QM-PDA) are the most concordant and clinically robust. The compelling findings of basal-like tumors showing resistance to FOLFIRINOX and the lack of objective studies comparing current first-line therapies FOLFIRINOX versus gemcitabine plus nab-paclitaxel strongly support the need to evaluate the role of molecular subtyping in treatment decision making for patients with PDAC. Therefore, we have developed a SSC based on the two tumor-intrinsic subtypes that avoids the instability associated with current strategies of clustering multiple samples and the low tumor purity issues in PDAC samples.
Prior studies have shown that merging samples from multiple studies (horizontal data integration) can improve the performance of prediction models, relative to training on individual studies (21). However, systematic differences in the scales of the expression values in each dataset are often observed, as some may have been separately normalized prior to their publication or were generated from a variety of expression platforms. Complicated cross-platform normalizations are often employed in such situations prior to model training. Furthermore, new samples must be normalized to the training dataset prior to prediction to obtain relevant predicted values. This often results in a “test-set bias" (16), where predictions may change due to the samples in the test set or the normalization approach used. In addition, prediction models may change with the addition of new training samples, as renormalizations may be warranted among training samples. In all, this leads to potential complications for data merging, stability of prediction, and model accuracy (22, 23). We present PurIST, which is not dependent on cross-study normalization, and is robust to platform type and sample collection differences. We show that the sensitivity and specificity of PurIST calls are high across multiple independent studies, demonstrating that the PurIST classifier recapitulates the tumor-intrinsic subtype calling obtained initially by CC. Given the significant clinical relevance of the two tumor-intrinsic subtypes for both prognosis and treatment response, and the high accuracy of predicted subtype calls in our validation datasets, PurIST may have tremendous clinical value. Specifically, we show that PurIST works for gene expression data assayed across multiple platforms, including microarrays, RNAseq, and NanoString. Furthermore, the algorithm provides replicable classification for matched samples from snap-frozen bulk tissue as well as FNA, core biopsies, and archival tissues.
Thus, PurIST may be flexibly used on low input and more degraded samples and may be performed with targeted gene expression platforms such as NanoString, avoiding the need for a CLIA RNAseq assay. Our enduring findings that basal-like subtype tumors are significantly less likely to respond to FOLFIRINOX-based regimens strongly supports the need for the incorporation of molecular subtyping in future clinical trials to determine the association of molecular subtypes with this and other therapies. In addition, the stability of PurIST subtypes after treatment is a noteworthy finding and may point to fundamental biological differences in the tumor subtypes. However, larger clinical trials with pre- and posttreatment biopsies will be needed to determine whether this is a treatment-dependent observation. Our ability to subtype based on either core or FNA biopsies considerably increases the flexibility and practicality of integrating PDAC molecular subtypes into future clinical trials in the metastatic and neoadjuvant setting where bulk specimens are rarely available.
In summary, we present a clinically usable SSC that may be used on any type of gene expression data including RNAseq, microarray, and NanoString, and on diverse sample types including FFPE, core biopsies, FNAs, and bulk frozen tumors. Although results of the association of FOLFIRINOX resistance in patients with basal-like subtype tumors is compelling, future prospective clinical trials in patients with PDAC will be needed to evaluate the utility of PurIST in treatment decision making, and in the context of different therapies.
Disclosure of Potential Conflicts of Interest
J.J. Yeh, N.U. Rashid, X.L. Peng, and R.A. Moffit submitted US Provisional Application No.: 62/827473 on "Purity Independent Subtyping of Tumors (PurIST), A Platform and Sample Type Independent Single Sample Classifier for Treatment Decision Making in Pancreatic Cancer." No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: J.J. Yeh, N.U. Rashid, R.A. Moffitt
Development of methodology: J.J. Yeh, N.U. Rashid, X.L. Peng, R.A. Moffitt
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.J. Yeh, R.A. Moffitt, K.E. Volmar, B.A. Belt, R.Z. Panni, T.M. Nywening, S.G. Herrera, S.G. Hennessey, A.B. Morrison, A. Nayyar, B. Schmidt, H.J. Kim, D.C. Linehan
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.J. Yeh, N.U. Rashid, X.L. Peng, C. Jin, R.A. Moffitt, R. Kawalerski, D.C. Linehan
Writing, review, and/or revision of the manuscript: J.J. Yeh, N.U. Rashid, X.L. Peng, C. Jin, R.A. Moffitt, K.E. Volmar, B.A. Belt, K.J. Moore, R. Kawalerski, A. Nayyar, B. Schmidt, H.J. Kim, D.C. Linehan
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.J. Yeh, X.L. Peng, C. Jin, B.A. Belt, R.Z. Panni, T.M. Nywening, S.G. Herrera, K.J. Moore, R. Kawalerski, A. Nayyar, A.E. Chang, D.C. Linehan
Study supervision: J.J. Yeh, N.U. Rashid
Acknowledgments
We thank the following UNC Lineberger Comprehensive Cancer Center Core Facilities for their excellent technical assistance: Translational Genomics Laboratory, Animal Pathology, Translational Pathology, Tissue Procurement. We thank the University of Rochester Genomics Research Center for their excellent technical assistance. This study used the COMPASS data (https://www.ebi.ac.uk/ega/studies/EGAS00001002543), which was originally generated with the support of the Ontario Institute for Cancer Research through funding provided by the Government of Ontario. This study was funded by R01-CA199064 and R01-CA199064-03S1 (to N.U. Rashid and J.J. Yeh), U24-CA211000 (to X.L. Peng and J.J. Yeh), T32CA009621 (to R.Z. Panni), R01-CA168863 (to D.C. Linehan), and PANCAN RAN-2 16-95-LINE (to D.C. Linehan and J.J. Yeh).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.