Abstract
Early detection of pancreatic ductal adenocarcinoma (PDAC) remains elusive. Precursor lesions of PDAC, specifically intraductal papillary mucinous neoplasms (IPMNs), represent a bona fide pathway to invasive neoplasia, although the molecular correlates of progression remain to be fully elucidated. Single-cell transcriptomics provides a unique avenue for dissecting both the epithelial and microenvironmental heterogeneities that accompany multistep progression from noninvasive IPMNs to PDAC.
Single-cell RNA sequencing was performed through droplet-based sequencing on 5,403 cells from 2 low-grade IPMNs (LGD-IPMNs), 2 high-grade IPMNs (HGD-IPMN), and 2 PDACs (all surgically resected).
Analysis of single-cell transcriptomes revealed heterogeneous alterations within the epithelium and the tumor microenvironment during the progression of noninvasive dysplasia to invasive cancer. Although HGD-IPMNs expressed many core signaling pathways described in PDAC, LGD-IPMNs harbored subsets of single cells with a transcriptomic profile that overlapped with invasive cancer. Notably, a proinflammatory immune component was readily seen in low-grade IPMNs, composed of cytotoxic T cells, activated T-helper cells, and dendritic cells, which was progressively depleted during neoplastic progression, accompanied by infiltration of myeloid-derived suppressor cells. Finally, stromal myofibroblast populations were heterogeneous and acquired a previously described tumor-promoting and immune-evading phenotype during invasive carcinogenesis.
This study demonstrates the ability to perform high-resolution profiling of the transcriptomic changes that occur during multistep progression of cystic PDAC precursors to cancer. Notably, single-cell analysis provides an unparalleled insight into both the epithelial and microenvironmental heterogeneities that accompany early cancer pathogenesis and might be a useful substrate to identify targets for cancer interception.
See related commentary by Hernandez-Barco et al., p. 2027
Introduction
Pancreatic ductal adenocarcinoma (PDAC) is the third leading cause of cancer-related deaths in the United States, and most patients present with unresectable disease due to the lack of effective early-detection strategies (1). This reiterates the critical need for understanding the pathogenesis of early neoplasia with the goal of developing biomarkers and molecular targets for cancer interception. The most common cystic neoplasm that is a bona fide precursor to PDAC is intraductal papillary mucinous neoplasms (IPMNs), which comprise roughly 40% to 50% of resected lesions that are initially diagnosed as asymptomatic pancreatic cysts (2). Although most IPMNs harbor low-grade dysplasia (LGD), it is imperative to distinguish IPMNs that have progressed to high-grade dysplasia (HGD), or harbor an outright invasive component (PDAC). To guide clinicians with identifying IPMNs harboring HGD or PDAC, several radiologic “high risk” or “worrisome features” (so-called Sendai and Fukuoka criteria) have been proposed (3). Although the rate of overdiagnosis and overtreatment has been significantly reduced, these criteria still lack optimal sensitivity and specificity (4). Although patients with noninvasive IPMNs have an excellent prognosis upon surgical resection, once an IPMN develops an invasive component, the probability of long-term survival drops significantly (5). Although we have now elucidated the signature driver mutations (KRAS and GNAS) that distinguish IPMNs from other pancreatic cysts, these do not reliably distinguish between indolent versus aggressive IPMNs (6, 7). In fact, the overall state of knowledge remains rudimentary, especially with regard to molecular metrics that can identify “aggressive” precancerous lesions that are likely to progress to carcinoma and require intervention from those that are “indolent” and will naturally regress or remain stable. Thus, much remains to be elucidated in terms of biomarkers of IPMN progression and the underlying molecular features of dysplastic cells that predicate to invasive neoplasia.
The application of single-cell DNA and RNA sequencing to cancers has provided unprecedented insights into tumor and microenvironmental heterogeneity present in established cancers. However, the extrapolation of these technologies to the compendium of precursor lesions has been scarce. In the specific context of the pancreas, prior studies have established the transcriptional profiles of normal cell types, such as islet cells, using single-cell approaches (8–10). In this study, we perform the first reported single-cell transcriptomic profiling of cystic precursor lesions of PDAC spanning histologic grades of dysplastic epithelium. Specifically, we demonstrate our ability to generate transcriptomic libraries from >5,400 single cells from surgically resected pancreatic tissues, including 2 IPMNs with LGD, 2 IPMNs with HGD, and 2 PDAC lesions utilizing a droplet-based single-cell RNA-seq methodology (11). Our results demonstrate that epithelial and stromal heterogeneity is evident even within precursor lesions during multistep carcinogenesis and reflect the progressive co-option of the microenvironment toward a tumor-promoting milieu.
Materials and Methods
Cell line and fresh tissue material from surgically resected IPMNs and PDAC
A total of 6 patients were recruited at MD Anderson Cancer Center (MDACC) and University of Pittsburgh Medical Center (UPMC) through informed written consent following institutional review board (IRB) approval at both institutions (PA15-0014, Lab08-0098, Lab05-0080, and Lab00-396). This study was conducted in accordance with Good Clinical Practices concerning medical research in humans per the Declaration of Helsinki. Two PDACs and 1 HGD-IPMN were profiled from MDACC and 2 LGD-IPMNs and 1 HGD-IPMN were profiled from UPMC. A previously established pancreatic tumoroid line (MP81) was also profiled (12). Tumoroid line source was validated through STR DNA fingerprinting and confirmed to be mycoplasma-free. Histologic subtypes of IPMN included in this study were gastric (1 LGD-IPMN), intestinal (1 LGD-IPMN), and pancreatobiliary (2 HGD-IPMNs). Tumor cellularity was determined by pathologic review of tissue and compared with yield of cell phenotypes from single-cell RNA-seq.
Preparation of fresh tissue material and dissociation into single cells
Pancreatic tissue was delivered to the laboratory on ice after surgical resection in DMEM, high-glucose, GlutaMAX Supplement, HEPES (Thermo Fisher, 10564011) in 1% bovine serum albumin (Thermo Fisher, B14) in a 15-mL conical tube. Tissue was then transferred to a 35- × 12-mm Petri Dish (Thermo Fisher, #150318), and minced with sterile surgical scalpel to 0.5- to 1.0-mm fragments in approximately 1 mL of the media.
Tissue digestion was performed with Liberase TH Research Grade (Sigma-Aldrich, 5401135001) alone for IPMN samples, or with both Liberase TH Research Grade and Accutase solution for PDAC tissues (Sigma-Aldrich, A6964). For warm digestion with Liberase TH Research Grade, pancreatic tissue fragments were incubated to a final concentration of 10 mg/mL and placed on an incubated orbital shaker at 37°C, 225 RPM for 20 minutes and gently pipetted every 10 minutes. At the end of the digestion period, the fragments (tissue slurry) were gently pipetted and washed to maximize the release of single cells. The tissue slurry was passed through a 100-μm cell strainer followed by a 35-μm cell strainer. The single-cell suspension was transferred to a new 15-mL conical tube and centrifuged for 5 minutes at 400 RCF at 4°C. The supernatant was discarded, and the cell pellet was resuspended in 400 μL of PBS for downstream cell viability analysis and cell counting.
For warm digestion of PDAC tissues, we followed the identical procedure above as the IPMNs, followed by a second digestion period using sterile-filtered Accutase solution, and placed on a shaker at 37°C, 225 RPM for 30 minutes, with gentle pipetting every 10 minutes.
ddSEQ SureCell library preparation
Single-cell transcriptomic amplification and library prep was performed using the SureCell WTA 3′ Library Prep Kit for the ddSEQ System following the manufacturer's recommendations (Illumina, cat. #20014279). Samples were processed across 6 sequential ddSEQ runs. Briefly, single cells were individually partitioned into subnanoliter droplets. Single cells were then lysed and barcoded inside individual droplets, with subsequent first-strand synthesis. Individual droplets containing barcoded sample cells were disintegrated, purified, and subjected to second strand cDNA synthesis. Purified amplified cDNA was subsequently quantified on a 2200 TapeStation System (Agilent) in order to validate quality of amplification by quantifying the area under the curve in regions between 200- and 5,000-bp range on a D5000 high-sensitivity screentape (Supplementary Fig. S5A). Successful reactions (>2 ng yield) were then fragmented and amplified using Nextera technology for low input cDNA material. Double-sided size selection and library purification was performed utilizing purification beads per the manufacturer's protocol to remove short library fragments. Final library quality and concentration was measured by TapeStation (Supplementary Fig. S5B) as the area under the curve. Libraries were sequenced on across 6 Illumina NextSeq 500 runs.
Single-cell clustering and transcriptomic analyses
Resulting FastQ files were run through the single-cell RNA-seq app on BaseSpace. Briefly, reads were aligned against a reference genome using Spliced Transcripts Alignment to a Reference (STAR), followed by barcode tagging and BAM indexing (13). Count files are generated using gene unique molecular identifier (UMI) counter and with cells passing quality filter based on cells above background and passing knee filter.
To perform the T-SNE clustering and additional downstream analysis, the UMI count files were compiled into sparse matrices and subsequently filtered based on the criteria that each cell must express a minimum of 300 genes and be composed of less than 15% mitochondrial genes. Following filtering, the data are log normalized, scaled, and centered. The output is then analyzed by principal component analysis (PCA). After identifying the number of dimensions to use, the result is clustered in a shared nearest-neighbor algorithm and T-SNE before plotting. The resulting T-SNE plots were colored according to specific features to visually present expression of genes in various clusters. These annotated matrices were also used to find differentially expressed genes between subpopulations. Heat maps organized by annotated clusters were generated by plotting the top 20 differentially expressed genes associating with each cluster. Functions for analysis were provided through the recommended workflow in the SEURAT package (11).
The correlation matrices were created based on the filtered list of cells used in the T-SNE. Matrices were formed through the function “cor” from the stats package. The Pearson correlation coefficients were then plotted in ComplexHeatmap (14). Cells are ordered through hierarchical clustering. For cell-cycle analysis, we defined a cell-cycle characteristic coefficient as the average of the log2(TPM + 1) across genes for a cell from the gene ontology set (version 6.2 MSigDB) related to the G1, S, G2, and M phases, respectively (15, 16). These coefficients were then plotted in ComplexHeatmap (13) and hierarchically clustered by cells. All T-SNE and heat maps were run in R v3.4.2.
Results
Concordance of single-cell RNA sequencing to bulk-cell lines
Prior to tissue profiling, we investigated the concordance of bulk and single-cell RNA-seq profiles from a PDAC tumoroid line (MP81) using the droplet-based sequencing technology. Single-cell and bulk-cell RNA libraries were made using Nextera library preparation chemistry. Single-cell RNA-seq was performed at 670 million reads, resulting in 30.4% of the reads mapping to the coding sequence (CDS) regions, and 37.7% mapping to UTR regions with 91,032 reads, 10,800 UMI counts, and a median of 3,293 unique genes detected per cell passing filter. Correlations in gene-expression levels between 2,022 single cells and bulk cell suspension were excellent (rs > 0.9), with total coverage gene counts of 17,507 and 15,185 for single cells and bulk cells, respectively (Supplementary Fig. S1A). We also verified the concordance of gene counts across 2 independent replicates of single-cell library preparation from the same cell line sample. This revealed a high correlation in gene-expression levels (rs > 0.9) with almost identical levels of gene coverage (14,994 and 14,488; Supplementary Fig. S1B).
Preneoplastic epithelium of IPMNs demonstrates both unique and shared transcriptomic signatures with PDAC
We subsequently applied droplet-based single-cell RNA sequencing to study the diverse transcriptional profiles that exist within surgically resected preneoplastic (IPMN) and invasive (PDAC) pancreatic lesions (Supplementary Figs. S1C and S1D, S2 and Supplementary Table S1). Cumulatively, among all of the tissue samples, 5,403 single cells were sequenced. Cells with low expression of genes (<300 genes) and high percentage of mitochondrial genes expressed (>10%) were digitally filtered out, resulting in 3,343 single cells used for the subsequent analysis. Scatter plots of number of UMIs compared with number of genes and abundance of mitochondrial transcripts revealed consistent read depth across single cells between lesions and absence of apoptosis induced transcript batch effects (Supplementary Fig. S1C). The mean number of genes and UMIs detected per cell was 1,101 and 3,194, respectively. After identifying the top variable genes, we performed PCA and determined which principal components (Supplementary Fig. S1D) to use for unsupervised clustering using t-distribution stochastic neighbor embedding (t-SNE; ref. 17), which was implemented using the SEURAT package (Fig. 1A; ref. 11). This analysis identified 10 distinct subpopulations (“clusters”) composed of unique stromal and epithelial components classified defined by characteristic gene-expression patterns (Fig. 1B and C; Supplementary Table S2; Supplementary Fig. S3).
tSNE plots of all 3,343 cells from 6 lesions included in this study, (A) annotated by different tissue samples, (B) annotated by unique cell types characterized by gene expression (ductal epithelium = Ep). C, Feature plots demonstrating expression of specified genes among clusters on the tSNE. D, Clustered heat map of cell-cycle characteristic coefficients per cell within the subpopulations indicated by the header demonstrating a greater proliferative state of PDAC-derived cells.
tSNE plots of all 3,343 cells from 6 lesions included in this study, (A) annotated by different tissue samples, (B) annotated by unique cell types characterized by gene expression (ductal epithelium = Ep). C, Feature plots demonstrating expression of specified genes among clusters on the tSNE. D, Clustered heat map of cell-cycle characteristic coefficients per cell within the subpopulations indicated by the header demonstrating a greater proliferative state of PDAC-derived cells.
We first identified the neoplastic single-cell “clusters” through previously defined signature transcripts for pancreatic epithelial lesions, including KRT19 and MUC1, which were present among all samples types (IPMNs and PDAC) irrespective of grade if dysplasia or invasion (refs. 18, 19; Fig. 2A). We subsequently sought to correlate the histologic grading of the designated “clusters” with known biomarkers of dysplastic progression to cancer. This revealed high expression of transcripts such as CEACAM6 within subpopulations of HGD-IPMN and PDAC samples, which confirmed previously published data on this marker in bulk RNA analysis and IHC of intact tissues (20, 21). Conversely, we observed high expression of MUC5AC, which encodes for an apomucin mostly seen in LGD and downregulated during histologic progression, within the LGD-IPMNs compared with the other sample types (18).
A, Violin plots of gene expression across lesion types confirm expression of characteristic PDAC and cystic preneoplasia-related markers. B, Heat map of the top 20 differentially expressed genes used to identify cell phenotypes across each of the 10 discrete clusters indicated by the header. C, Sankey diagram demonstrating epithelial cells profiled from LGD-IPMNs, HGD-IPMNS, and PDAC tissue and where they reside within annotated tSNE clusters. D, Correlation heat map of Pearson correlation coefficients of hierarchically clustered individual cells across all lesions, identified by originating lesion type and tSNE cluster.
A, Violin plots of gene expression across lesion types confirm expression of characteristic PDAC and cystic preneoplasia-related markers. B, Heat map of the top 20 differentially expressed genes used to identify cell phenotypes across each of the 10 discrete clusters indicated by the header. C, Sankey diagram demonstrating epithelial cells profiled from LGD-IPMNs, HGD-IPMNS, and PDAC tissue and where they reside within annotated tSNE clusters. D, Correlation heat map of Pearson correlation coefficients of hierarchically clustered individual cells across all lesions, identified by originating lesion type and tSNE cluster.
In an effort to identify “cluster-defining” signatures, we profiled the top 25 differentially expressed genes within each of the 10 distinct single-cell “cluster” (Fig. 2B; Supplementary Table S3). Annotation of the resulting transcripts revealed aberrant expression of multiple cancer-related genes even within the LGD-IPMN cells designated as “clusters” LG.Ep1, 2, and 3 (Fig. 1B). These include overexpression of transcripts such as TFF3 and REG4 that have been previously described as upregulated during cancer progression (22, 23). On the contrary, the LGD-IPMN clusters demonstrated persistent expression of putative tumor suppressor genes (TSG) such as RAP1GAP that have been shown to suppress tumor invasion and metastases across various cancers (including RAS-mutant neoplasms; refs. 24), while HGD-IPMNs (HG.Ep) revealed downregulation of the aforementioned TSGs, concomitant with higher expression of oncogenic transcripts previously associated with progression to PDAC such as S100P and S100A10, among others (25–27).
We then investigated whether biological differences could be observed through gene ontology and pathway analyses (IPA: Ingenuity Pathways Analysis) of differentially expressed genes between neoplastic epithelial clusters (Fig. 1B). Several aberrant canonical pathways were identified during the transition from LGD to HGD-IPMNs that are related to previously published core signaling pathways in PDAC, including integrin signaling, signaling by small GTPases, Wnt/B-catenin signaling, axonal guidance signaling, apoptosis, and G1–S phase regulation (Supplementary Table S4; refs. 28, 29). When comparing the PDAC and LGD lesions, aberrant expression of the same pathways was observed, with the additional deregulation of DNA damage response, TGF-B signaling, and SAPK/JNK signaling. Other aberrant canonical pathways identified during transition from LGD to HGD and PDAC lesions include metabolism related pathways such as oxidative phosphorylation and mitochondrial dysfunction, as well as mTOR signaling (Supplementary Table S5).
Cell-cycle analysis of epithelial cells across lesion types revealed higher proliferative states (G2–M and S phase) in cells derived from PDAC compared with LGD and HGD lesions (Fig. 1D). Generally, there was a significant proportion of LGD cells with low expression of cell-cycle genes likely representing cells within low proliferative states (G0 phase). On the other hand, there is a small population of cells derived from LGD lesions (LG.Ep3) with high expression of proliferation-related genes, which may represent a more aggressive subpopulation within this lesional type. This may suggest that even within lesions histologically identified as low grade, there exist rare populations of cells with varying degrees of putative malignant potential. Further, it is important to note that even “adenocarcinoma”-designated clusters contained cells originating from LGD-IPMNs with associated expression of PDAC core signaling pathways and deregulation of TSGs (Fig. 2C). In our particular data set, 1.2% of epithelial cells from LGD-IPMNs were found in adenocarcinoma-related clusters. Additionally, up to 8.9% of cells from LGD-IPMNs tended to cluster with HGD-IPMNs with their respective changes in gene-expression pathways. This suggests that even within IPMNs with otherwise LGD histology, there are cells that phenocopy the transcriptomic features of invasive neoplasia (Supplementary Fig. S2). The expression profiles of such low frequency cells within LGD-IPMNs would likely be missed during bulk RNA sequencing, and further underscores the utility of the single-cell sequencing approach in elucidating the epithelial heterogeneity that exists even within early precursor lesions of PDAC.
Analysis of the cell-to-cell correlations for gene-expression of the 3,343 cells demonstrated relatively higher intratumoral coherence among cells from LGD and HGD lesions compared with those from PDAC (Fig. 2D), a not surprising finding suggesting an increase in intratumoral epithelial heterogeneity during the progression from IPMNs to PDAC (30). Interlesion correlation was better observed in cells derived from stromal components including myeloid and lymphocytic populations, whereby a significant number of populations showed similarities across tissue types (Fig. 1A–C). This suggests the presence of common cancer-associated immune components among lesion types. On the other hand, even though these stromal components tended to cluster with one another (Fig. 1B), correlation heat maps suggested the presence of multiple unique subtypes within the stroma and nonrandom variations during histologic progression to cancer (Supplementary Fig. S4 and results below).
Virtual microdissection of stromal and immune heterogeneity during IPMN progression
In an effort to better identify unique subpopulations of stromal components across lesion types, we opted to perform single-cell digital microdissection of only stromal cells and exclude epithelial components. This resulted in the identification of 7 unique clusters with varying degrees of enrichment across lesions types (Fig. 3A and B). Differential expression of the top 20 genes across clusters allowed us to identify distinct immune and myofibrolast-derived phenotypes within each lesional subtype (Fig. 3C and D).
A, tSNE plot of all stromal cells that were virtually microdissection from entire lesions. Different colors represent annotation of unique cell phenotypes. B, Proportion of cell phenotypes enriched in each lesion (PDAC, HG IPMN, and LG IPMN); colors refer to unique cell types in A. C, Heat map of the top 20 differentially expressed genes used to identify cell phenotypes across clusters. D, Feature plots demonstrating expression of specific genes among clusters to identify respective cell types.
A, tSNE plot of all stromal cells that were virtually microdissection from entire lesions. Different colors represent annotation of unique cell phenotypes. B, Proportion of cell phenotypes enriched in each lesion (PDAC, HG IPMN, and LG IPMN); colors refer to unique cell types in A. C, Heat map of the top 20 differentially expressed genes used to identify cell phenotypes across clusters. D, Feature plots demonstrating expression of specific genes among clusters to identify respective cell types.
A high proportion of cytotoxic T cells (measured by CD8, and presence of granzyme- and perforin-related transcripts) were observed in LGD-IPMNs compared with HGD-IPMNs and PDAC. In proportion to other immune subtypes, CD4 T cells also appear to be more highly enriched in LGD-IPMN compared with others and present with generalized activation as defined by expression of CD69. We also detected the presence of rare B-cell populations (expressing CD20 and CD19) that are present in both HGD-IPMNs and in LGD-IPMNs, but are completely absent in PDACs. Presence of tumor-infiltrating B cells has recently been described in PanIN lesions with an immunosuppressive role during the initiating stages of PDAC multistep progression and may, in fact, have a similar role in the context of IPMNs based on these findings (10, 31, 32).
Notably, we observed a significantly enriched proportion of myeloid-derived suppressor cells (MDSCs), within the stromal component of PDAC, representing 51% (277/544) of single stromal cells profiled, compared with 2.3% (3/131) and 3.5% (10/281) within LGD-IPMNs and HGD-IPMNs, respectively. These MDSCs resemble the protumorigenic polymorphonuclear MDSCs (PMN-MDSCs) phenotype based on expression of CD11b (ITGAM), S100A9, CCL3, and APOE, which has been previously described to be prevalent during cancer progression (33). Among myeloid-derived populations, we also observed cDC2-type dendritic cells (DC), characterized by expression of CD1C, THBD, and FCER1A (34). These cells have been shown to have T-cell stimulatory potential and are critical mediators of cross-presentation of tumor antigens mediated through the high-affinity IgE receptor FcϵRI (FCER1A; ref. 35). This proinflammatory subpopulation appears in greater proportions among LGD and HGD-IPMNs, suggesting a more prevalent predication for antitumor immune response within preneoplastic lesions.
Heterogeneous fibroblast populations across histologic subtypes were also identified, potentially representing distinct stromal functions during tumorigenesis. For example, the fibroblast subtype known as “inflammatory” CAF (iCAF) is characterized by expression of VIM, FAP, COL3A1, DES, IL6, and CXCL12 and reduced expression of a-SMA (ACTA2; refs. 36, 37). This subpopulation has been shown to be involved in immunosuppression, growth factor secretion, and promotion of protumorigenic mechanisms facilitating invasion and metastasis (36), which might correlate with its exclusive representation in PDAC-derived clusters, representing 10.5% (58/544) of single stromal cells profiled, and absence within noninvasive IPMNs. A separate compartment of CAFs, described as myofibroblasts (“myCAFs”), with increased alpha-SMA expression and reduced expression of CXCL12 and DES, was also identified. These myCAFs have been implicated in distinct functions from iCAFs, including secretion of autocrine stromal and endothelial growth factors (36). This population is rare in LGD-IPMNs, but is highly represented in HGD-IPMNs, suggesting that activation of fibroblasts to the myCAF phenotype tends to occur even within noninvasive dysplastic lesions (Fig. 4).
Schematic representation of evolving molecular and phenotypic signatures during preneoplastic progression of pancreatic cancer.
Schematic representation of evolving molecular and phenotypic signatures during preneoplastic progression of pancreatic cancer.
Discussion
Histologically well-established precancerous lesions precede the onset of PDAC by years, if not decades. With an increasing use of diagnostic cross-sectional imaging, incidence of precancerous lesions in the form of pancreatic cysts is rising even in clinically healthy individuals (38–40). In general, approximately 2.4% of the general population undergoing MRI present with a cystic pancreatic lesion with individuals over 70 years having an incidence of up to 10% (41). To guide clinicians, several radiologic guidelines have been reported and progressively updated (3, 42, 43). In general, large, symptomatic main duct IPMNs with enhancing mural nodules show the highest probability of malignant transformation; frequency of invasive IPMNs is 43.1% (range, 11%–81%) and should be surgically treated (3). Yet several other case series of resected IPMNs report a wide range of IPMN-related pancreatic adenocarcinoma (6%–81%; refs. 44–46). A balance between diagnosis of potentially invasive IPMNs and harmful overtreatment of patients is thus crucial in our approach stratification of cystic lesions. In the current study, we describe an approach at profiling the molecular events that occur during pancreatic carcinogenesis in the context of these cystic precursors (IPMNs) with the goal of understanding the heterogeneity of epithelial and stromal components that might delineate lesions with an aggressive potential. We do this through high-resolution single-cell RNA sequencing of LGD-IPMNs, HGD-IPMNs, and PDACs, which allows us to specifically profile aberrant pathways across multiple cell types. Interestingly, progression from LGD and HGD lesions to invasive PDAC revealed shifts in distinct cell populations encompassing both epithelial and stromal/immune compartments.
Among epithelial populations, we detected expression of transcripts of gastric lineage that have been previously described during IPMN progression and have been correlated with better prognosis, such as MUC5AC, as differentially overexpressed in our LGD lesions (47). Although oncogenic transcripts are also expressed at the LGD stage, there appears to be retained expression of tumor suppressor–related pathways, which may help counteract further dysplastic progression of these lesions. Within HGD lesions, differential expression of these tumor suppressor pathways is no longer detected, and we begin to find previously described classic core signaling pathways in PDAC (29, 48).
As a stroma-rich cancer, multiple studies have shown how the diverse tumor microenvironment (TME) plays a crucial role in the development, progression, and immune evasion that exemplifies the biology of PDAC (49, 50). As previously shown by Moffit and colleagues, “virtual microdissection” of stromal genes revealed an “activated” subpopulation of stroma characterized by overexpression of inflammatory pathways (51). Although these findings are further validated within our own study, Moffit and colleagues were not able to distinguish specific contributions of immune and myofibroblast components to this signature. By dissecting specific cell types at the single-cell level, we further delineate this signature to describe the dichotomy of “myCAF” versus “iCAF” fibroblast populations during multistep progression, with the former cells observed (albeit rarely) even in LGD-IPMNs and the latter cells only identified in PDAC samples. Within our analysis, besides expression of many typical CAF markers (FAP, THY1, and DES), the iCAF cells also show a unique expression pattern of the C-X-C motif chemokine ligand 12 (CXCL12; ref. 52). This chemokine and its corresponding receptor (CXCR4) are believed to play a crucial role in many solid cancers and are associated with metastatic spread in breast cancer, lung cancer, and melanoma (53–56). Accumulating evidence also points to a role in PDAC progression, with higher expression of CXCL12 correlating with metastasis, likely by facilitating immune evasion, and increased levels of matrix metalloproteinases leading to cellular invasion (57–60). Within our own study, we find similar trends in the presence of other cellular populations that parallel the emergence of CXCL12-expressing iCAFs, such as a decrease in cytotoxic T-cell and increase in myeloid-suppressive proportions, creating the notorious immune-suppressive TME well described in PDAC.
Myeloid-derived subpopulations also tend to evolve during tumor progression. Composed of multiple clusters expressing CD11b, ITGB2, CD13, CD18, and S100A9, a clear separation based on their individual transcriptomes remains difficult. This high overlap of myeloid-derived subpopulations mirrors recent findings in breast cancer where previously proposed M1 and M2 macrophage-associated genes are frequently expressed within the same cells between these 2 clusters (61). As Azizi and colleagues describe, this suggests that distinct prototypical macrophages may not be prevalent within tumors and may in fact exist within the spectrum of these 2 phenotypes. Among other myeloid-derived populations, we do identify PMN-MDSCs that are known surrogates of prognosis and contribute to immune evasion and tumor progression through neoangiogenesis, migration and invasion, and thus metastatic spread (62, 63). Similar to the elevated proportion of CAFs experienced in more advanced disease, we describe a striking increase in MDSC populations compared with preneoplastic lesions (51% of nonepithelial cell types in PDAC, versus <5% in noninvasive IPMNs). This resembles findings from Kumar and colleagues whereby CAFs are able to actively recruit PMN-MDSCs to tumors, further supporting the role of an “inflammatory CAF” subpopulation in promoting immunosuppression (64). Additional work has described the importance of immune populations during tumorigenesis in the context of IPMNs specifically (65). Spatial characteristics as related to epithelial and cytotoxic T cell, as well as myeloid cell and cytotoxic T-cell proximity, have been shown to be a predictive feature of the presence of HGD. This is believed to be caused by immunosuppressive interactions derived from tumor cells and myeloid cells against cytotoxic T cells, regulating tumor development. With further data supporting the role of myeloid-derived cytotoxic T-cell suppression, immunotherapeutic approaches targeting this cross-talk may serve as a novel strategy at impeding tumor progression (10, 66).
The identification of demonstrable DC populations within LGD and HGD-IPMN microenvironment suggests a proinflammatory phase that is present during the noninvasive precursor states. This is particularly true in LGD-IPMNs where a higher proportion of cytotoxic T cells exist in comparison with other lesion types that may be supported by cDC2 cells through cross-presentation of tumor antigens and T-cell stimulation. Additionally, the presence of these DCs in IPMNs may also provide opportunities for cancer “immune interception,” because DC-targeted vaccines have previously shown effectiveness in the context of an myeloid immunosuppressive environment through reduction of PMN-MDSCs (67).
Although single-cell RNA sequencing provides new tools for high-resolution profiling of cell populations, it is not without its limitations, particularly in the context of this study. Although these methodologies are accurate in determining relative abundance of transcripts, they suffer when attempting to capture lowly expressed genes (68). This is mostly of symptom of droplet-based approaches as currently described. Although they allow for high-throughput sequencing of thousands of single cells from a single sample by counting the 3′ end of transcripts, sensitive detection of low to moderately expressed mRNA transcripts is forfeited compared with those methodologies providing full-length transcript data (68–70). As we aimed to profile the complex heterogeneity of these preneoplastic tissues within this study, we prioritized maximizing throughput through a droplet-based approach at the cost of observing an average of 1,101 genes per cell. Sequencing depth is another important consideration, as low read depth would not be optimal for providing detailed transcript information from a single cell that may contain subtle changes gene-expression signatures. But even with increasing sequencing depth, transcript dropout remains a significant limitation where poor mRNA capture and amplification may result in up to 72% of loss transcripts, particularly in genes expressed at low levels (68). It is also important to note that single-cell dissociation of tissues remains a challenging protocol, for 1 thing to maintain viability of cells throughout the process and the other to maintain relative cell representation as isolation protocols often lead to selective degradation of specific cell types. There also remains the potential for these methodologies to alter expression states and mRNA levels during this process. Further work has demonstrated that this can be ameliorated through RNA-seq of single nuclei which may result in less of a bias of cell populations (19, 71). Further limitations within our own study include the low number of samples profiled. This is mostly a consequence of the relative paucity of freshly resected surgical samples from pancreatic cysts of appropriate cellularity (low-grade cysts are particularly challenging in being paucicellular), as well as the difficulty of obtaining adequate viability following single-cell dissociation in order to create viable cDNA and sequencing libraries. This further exacerbates the fact we could not obtain consistent histologic subtypes in our preneoplastic lesions, which can have a direct impact on transcriptomic heterogeneity.
In conclusion, we describe how the TME may evolve during the multistep progression of IPMNs to PDAC, whereby the noninvasive precursor lesions begin to experience a loss of cytotoxic T cells during dysplastic progression and begin to assimilate immune components such as PMN-MDSCs, with immune-suppressive properties. Additionally, we demonstrate this permissive microenvironment is correlated with appearance of tumor-promoting iCAF stromal cell populations that facilitate immune evasion. Notably, single-cell analysis of IPMNs reveals, for the first time, that even in otherwise histologically innocuous LGD-IPMNs, we find minor populations of cells that transcriptionally phenocopy HGD and PDAC (∼9% and ∼1% of LGD-IPMN cells, respectively). It is possible that future single-cell analyses on a larger series of LGD-IPMNs might establish a “threshold” that portends aggressive natural history even in the absence of radiologically detectable worrisome features. Although it is important to stress the limited generalizations that can be made from such a small subset of lesions, the ability to detect these gene-expression patterns among single cells provides a primer for uncovering how heterogeneous cell types contribute to tumor carcinogenesis. Leveraging this strategy may thus facilitate elucidation of molecular biomarkers for disease stratification.
Disclosure of Potential Conflicts of Interest
C.M. Taniguchi reports receiving commercial research grants from Galera Therapeutics. A.D. Singhi is a consultant/advisory board member for Foundation Medicine. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: V. Bernard, F.C. Mulu, B.M. Stephens, M.H. Katz, A. Maitra, H.A. Alvarez
Development of methodology: V. Bernard, A. Semaan, F.A. San Lucas, F.C. Mulu, B.M. Stephens, C.-W. Tzeng, M.H. Katz, H.A. Alvarez
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): V. Bernard, A. Semaan, F.C. Mulu, B.M. Stephens, J. Zhao, N. Kamyabi, C.M. Taniguchi, M.P. Kim, C.-W. Tzeng, M.H. Katz, A.D. Singhi, H.A. Alvarez
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): V. Bernard, A. Semaan, J. Huang, F.A. San Lucas, Y. Huang, N. Kamyabi, P.A. Scheet, M.H. Katz, A. Maitra, H.A. Alvarez
Writing, review, and/or revision of the manuscript: V. Bernard, A. Semaan, J. Huang, F.A. San Lucas, B.M. Stephens, P.A. Guerrero, N. Kamyabi, S. Sen, C.M. Taniguchi, M.P. Kim, C.-W. Tzeng, M.H. Katz, A. Maitra, H.A. Alvarez
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): V. Bernard, B.M. Stephens, H.A. Alvarez
Study supervision: C.M. Taniguchi, H.A. Alvarez
Acknowledgments
This work was supported by the MD Anderson Moonshot Program and the Khalifa Bin Zayed Al-Nahyan Foundation, the NIH (U01CA196403 and U01CA200468 to A. Maitra and in part by U54CA096300 and U54CA096297 to V. Bernard), the Cancer Prevention Research Institute of Texas (RP140106 to V. Bernard and RP170067 to N. Kamyabi and V. Bernard), and the DFG (German Research Foundation; SE-2616/2-1 to A. Semaan).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
References
Supplementary data
Patient characteristics
Cell phenotype gene list
Pathway analysis among lesions
Supplemental legend