Abstract
There is a need to individualize assays for tumor molecular phenotyping, given variations in the differentiation status of tumor and normal tissues in different patients. To address this, we performed single-cell genomics of breast tumors and adjacent normal cells propagated for a short duration under growth conditions that enable epithelial reprogramming. Cells analyzed were either unselected for a specific subpopulation or phenotypically defined as undifferentiated and highly clonogenic ALDH+/CD49f+/EpCAM+ luminal progenitors, which express both basal cell and luminal cell–enriched genes. We analyzed 420 tumor cells and 284 adjacent normal cells for expression of 93 genes that included a PAM50-intrinsic subtype classifier and stemness-related genes. ALDH+/CD49f+/EpCAM+ tumor and normal cells clustered differently compared with unselected tumor and normal cells. PAM50 gene-set analyses of ALDH+/CD49f+/EpCAM+ populations efficiently identified major and minor clones of tumor cells, with the major clone resembling clinical parameters of the tumor. Similarly, a stemness-associated gene set identified clones with divergent stemness pathway activation within the same tumor. This refined expression profiling technique distinguished genes truly deregulated in cancer from genes that identify cellular precursors of tumors. Collectively, the assays presented here enable more precise identification of cancer-deregulated genes, allow for early identification of therapeutically targetable tumor cell subpopulations, and ultimately provide a refinement of precision therapeutics for cancer treatment. Cancer Res; 77(10); 2759–69. ©2017 AACR.
Introduction
Gene expression–based molecular subclassification of tumors has gained clinical acceptance over the years and several tools have been commercialized for clinical use. Oncotype DX, ProSigna (PAM50) and MammaPrint (70-gene signature) are few such assays used in breast cancer management (1–4). A recent study suggested that MammaPrint assay aids in treatment decisions in early-stage breast cancer, particularly to identify patients who may not need chemotherapy (5). Superiority of few of these assays in tumor classification compared with traditional IHC based tumor classification is under debate. For example, although an earlier report claimed that PAM50 gathers more clinical information than IHC of hormone receptors or ki67 (6), a recent study disputed such a claim (7).
Although tumor classification based on gene-expression patterns has been valuable clinically, further progress in these assays are needed to address two clinically important issues. First, it has been difficult to discern whether gene expression patterns in tumors that led to subtype classification are acquired due to genome aberrations or reflect cell type origin of tumors. Recent discovery of enormous interindividual variation in gene expression in healthy tissues due to single-nucleotide polymorphism in the regulatory regions of genomes makes it even harder to identify mutation-driven gene expression changes when normal cells from the same individual are not available for comparison (8, 9). Second, tumor heterogeneity is a major clinical concern and the gene expression–based assays may identify only major clones of the tumor. Therefore, an ideal assay should be able to identify cancer-specific aberration in gene expression and identify both major and minor clones of tumor cells.
As an initial step to address the above issues, we combined the latest progress in propagating normal and tumor cells from the same patient using an epithelial-reprogramming assay (10) and single-cell genomics of PAM50/stem cell–associated genes (11). Unlike previously reported mammary epithelial growth conditions, which favors outgrowth of basal epithelial cells, reprogramming assay allows growth of stem, luminal progenitor and mature cells (12–14). Assays that allow growth of breast epithelial cells of different differentiation state are essential because most breast cancers, including basal-like breast cancers, are suggested to originate from luminal progenitors and then differentiate/dedifferentiate into specific subtypes (15–18). We have recently demonstrated that tumor and adjacent normal cells are in different differentiation state, which complicates our ability to distinguish mutation-driven gene expression changes in tumor from changes due to differences in differentiation state (14). In normal breast, >2,000 genes are differentially expressed between stem/progenitor and differentiated cells (19) and these differences alone can account for tumor to normal tissue gene expression variations noted in large scale studies. To partially overcome this limitation, comparison between normal and tumors were done two ways. Assays included either bulk populations of epithelial cells or flow cytometrically enriched ALDH+/CD49f+/EpCAM+ adjacent normal and tumor cells. In normal breast, these cells are considered to be undifferentiated highly clonogenic luminal progenitors that express both basal cell and luminal cell–enriched genes (20). These assays, performed with cells from four tumors and adjacent normal tissues, enabled us to identify major and minor tumor clones and distinguish genes aberrantly expressed in tumors from genes whose expression pattern in tumor mirrored expression pattern in a subset of normal cells, which are likely the cellular precursors of tumors.
Materials and Methods
Primary tissues, culturing by reprogramming assay, and flow cytometry
Breast tissues used were de-identified and the Indiana University Institutional Review Board considered the protocol non-human subjects. Freshly obtained or cryopreserved tissues were minced, digested, and subjected to culturing under modified reprogramming assay condition, as we have described recently (14). All cases used in the study were from mastectomy such that adjacent normal tissues were from regions as distant as possible from the tumor. Breast epithelial cells were collected by trypsinization and subjected to flow cytometry using antibodies described previously (14).
Single-cell qRT-PCR
We used the C1 Single-Cell Auto Prep System (Fluidigm) for single-cell capture and preamplification according to the manufacturer's instructions (protocol 100-4904 K1). Both adjacent normal and tumor cells were grown for similar duration and in the same media and collected by flow cytometry before loading into the integrated fluidic circuit (IFC). Briefly, a pool of all primers was prepared (100 μmol/L). A lysis final mix, a reverse-transcriptase (RT) final mix and a preamplification (PreAmp) final mix were prepared and stored on ice. Next, the IFC for normal medium-size single cells (10 to 17 μm in diameter; Fluidigm, part number 100-5479), or the IFC for tumor large-size single cells (17–25 μm in diameter; Fluidigm, part number 100-5758) was primed by adding C1 collection reagent, preloading reagent, blocking reagent and wash buffer into the IFC, and the IFC was placed into the C1 Single-Cell Auto Prep System and the script ‘STA:Prime’ was run. Priming lasted 20 minutes, during which time cells were prepared for loading. Cells were pelleted and media were removed to create a concentration of 300,000 cells/mL. The single-cell suspension was mixed with a 3:2 ratio of Suspension Reagent (Fluidigm). After priming was completed, blocking and priming solutions were removed and 6 μL of cell mix was loaded onto the IFC. The IFC was placed back into the C1, and the script “STA:Cell Load” was run. After cell loading was complete, the IFC was removed and single-cell capture sites were viewed using a microscope. Empty capture sites were noted, and the IFC was loaded with harvest reagent, lysis final mix, RT final mix, and PreAmp final mix. The IFC was placed back into the C1, and the script “STA:PreAmp” was run. After preamplification, the IFC was removed from the C1 and 3 μL of cDNA from each single cell was removed from individual collection wells and diluted in 25 μL of DNA suspension buffer (Fluidigm). The BioMark HD System (Fluidigm, protocol 68000088 K1) performs 96 individual qPCR reactions on the preamplified cDNA from every single cell by utilizing 96 × 96 Dynamic Array Gene Expression IFCs (Fluidigm, part number BMK-M-96.96). For the qPCR reactions, SsoFast EvaGreen Supermix with Low ROX (Bio-Rad, part number 172-5211) and DNA Binding Dye (Fluidigm, part number 100-7609) were combined with individual cDNA samples. The cDNA mix was loaded into one side of the IFC, and primer pairs were loaded into the other side. The Juno System (Fluidigm) was used to prime the IFC and to distribute cDNA mix and primer pairs into reaction chambers inside the IFC. The IFC was then transferred into the BioMark HD System where qPCR reactions were controlled. Data from wells that contained more than one cell or non-viable cells were excluded from the analyses.
Data analyses
Real Time PCR Analysis software is a data analysis tool available from Fluidigm as part of the Singular Analysis Toolset and can be found at https://www.fluidigm.com/software. This toolset was built on the open source software R (Version 3.0.2). These software packages were used together to annotate and analyze data files generated from the BioMark HD. Hierarchical clustering was used to identify co-expressed genes. Heatmaps and violin plots were generated to compare sample groups. ANOVA identified differentially expressed genes (Supplementary Table S1). Principal component analysis plots were generated to analyze outliers. Differentially expressed genes were subjected to Ingenuity pathway analyses (Ingenuity.com) to identify signaling networks uniquely active in tumor cells.
IHC
IHC procedure for MMP2 (antibody from Fisher Scientific #35-130-0Z) and quantitation have been described previously (21).
Results
Single-cell genomics in cancer are mostly being used to decipher tumor heterogeneity, to develop evolutionary tree, and to map cellular hierarchy (11, 22, 23). This technique has also been used to detect clonal selection in patient-derived breast cancer xenografts (24). Although DNA-based single-cell genomics, which only compares copy number variations and mutations, have seen enormous progress with respect to detection and data analysis, there are several limitations in the use of this technology for RNA-based studies, including interindividual variation in gene expression (9, 25) and differences in differentiation state of tumor and normal (14). Although perfection in elucidating cancer-specific gene-expression changes may be impossible to achieve, refinement is possible by co-adapting several latest technologies. Figure 1A provides a schematic view of our approach toward achieving these goals.
We propagated tumor and the adjacent normal tissues from four patients for a short duration using epithelial-reprogramming assay (10). In our previous study, we have shown that in vitro propagation does not introduce any mutations and potentially allows expansion of minor clones (14). Tumors in patient 1 and patient 2 were ER+/progesterone receptor + (PR+), whereas patient 3 had an ER+/PR−/HER2+ tumor. Patient 4 had ER+/PR+ tumor with extensive metastasis. Patient 2 tumor also expressed HER2 at moderately higher levels (2+). We first performed phenotypic characterization of tumor and adjacent normal cells using CD49f and EpCAM antibodies. CD49f+/EpCAM−, CD49f+/EpCAM+ and CD49f−/EpCAM+ cells are enriched for cells with stem/basal, luminal progenitor, and mature/differentiated/non-clonogenic properties, respectively (26, 27). As expected, tumors of patients 1 and 2 had higher levels of differentiated cells compared with their corresponding adjacent normal cells, whereas both tumor and the adjacent normal cells of patient 3 were enriched for luminal progenitor cells (Fig. 1B). Because of these differences in stem/progenitor and differentiated cell hierarchy between tumor and normal cells, we adapted two strategies for characterizing primary cells. The first series included comparison of gene expression between randomly selected epithelial cells of tumor with normal, irrespective of their differentiation state. Epithelial cells were sorted using EpCAM and Jam-A/CD321 antibodies to avoid contamination from any non-epithelial cells (Fig. 2A). The second comparison included a phenotypically defined population in which gated ALDEFLUOR+ stem/progenitor cells (28) were fractionated based on the expression of CD49f and EpCAM into CD49f+/EpCAM+ luminal progenitor cells (Fig. 2B). These cells in normal breast are considered to be undifferentiated luminal progenitors with overlapping luminal and basal cell gene expression pattern (20). Therefore, this selection procedure should enhance our ability to detect relevant cancer-specific gene expression differences in undifferentiated tumor cells. Between 27 and 93 cells per sample from isolated tumor or normal cells were subjected to qRT-PCR at the single-cell level with stemness-associated and PAM50 gene primers.
Results of unselected tumor and normal epithelial cell comparison
Unsupervised hierarchical clustering of qRT-PCR results of stemness gene set showed clear separation of tumor from normal cells in both patient samples (Fig. 3A and B). Tumor cells in both cases formed 2–3 minor clusters. However, tumor-enriched signaling networks differed between patient samples. A cluster of tumor cells in patient 1 showed enrichment of genes in the Wnt/β-catenin pathway (Fig. 3C). Major signals responsible for cancer cell dedifferentiation and germ cell fate, including SOX9 and SOX17 were upregulated in tumor cells compared with normal cells (29, 30). A cluster of tumor cells in patient 2 likely have an activated Wnt/βCatenin pathway but it is most likely due to downregulation of APC and AXIN1, which are negative regulators of the Wnt/β-Catenin pathway (Fig. 3D; ref. 31).
PAM50 gene set analyses enabled characterization of tumor and normal cells at multiple levels. First, Jam-A±EpCAM+ cells used in the analyses are epithelial because these cells expressed variable levels of keratin 5, keratin 17, EGFR, and ERBB2 (Fig. 4A). Keratin 5 and 17, although expressed predominantly in basal cells, are expressed in luminal cells, particularly cells with luminal A breast cancer gene expression pattern (32). Second, most often cultured breast epithelial cells are enriched for basal gene expression. Although our previous phenotypic analyses have shown that reprogramming assay conditions allow growth of both luminal and basal cells, this has not been proven at transcriptome level (14). In case of patient 1, normal cells were randomly distributed with one relatively large cluster with interspersed tumor cells and a minor cluster. The major cluster of normal cells expressed both basal and luminal cell-enriched genes, whereas the minor cluster expressed mostly luminal cell-enriched genes (Fig. 4A). Tumor cells formed two well-separated clusters, one expressing predominantly luminal cell-enriched genes and the other expressing both luminal and basal cell-enriched genes. Similar results in patient 2 with majority of tumor cells forming a single cluster that predominantly expressed luminal genes (Fig. 4B). Majority of normal cells expressed both luminal and basal genes. Third, PAM50 gene set analyses confirmed heterogeneity in gene-expression pattern in both tumor and normal cells of the same patient.
Tumor classification based on differential gene expression in phenotypically defined tumor and normal cells
To overcome difference in differentiation state between two cell types as a confounding factor in identification of cancer-specific gene expression changes, we performed the analyses in a phenotypically defined ALDH+/CD49f+/EpCAM+ subpopulation. CD49f+/EpCAM− and/or CD44+/CD24− cells, although considered much more stem/basal like, were not selected for the analysis because these cells constituted a minor population in both normal and tumors (Fig. 1B and data not shown).
With stemness-associated gene primer sets, normal and tumor cells segregated into two distinct groups in patient 1 (Fig. 5A). This analysis distinguished tumor cells from normal cells much more clearly than the analysis that involved unselected epithelial cells shown in Fig. 3A. Ingenuity pathway analyses of genes that are expressed at higher levels in tumor cells compared with normal cells identified two signaling networks; one involving VEGFA, WNT5A, SMAD3, and MMP2 and the other involving DNMT3A (Fig. 5B). In patient 2, tumor cells formed two unique clusters; one being very similar to normal cells and other showing very little expression of stemness-associated genes (Fig. 5C). Pathways that involved β-Catenin signaling and stemness-associated genes SOX9 and SOX17 were dominant in the major tumor clone (Fig. 5D). In patient 3, tumor cells formed one major cluster, which differed from normal cells through differential expression of NGFR and SERPINA1 (Fig. 5E). Tumor cells also lacked the expression of CDH2, which further indicates luminal nature of tumor cells. Similar analysis of tumor and normal of patient 4 is shown in Supplementary Fig. S1. Tumor cells were enriched for the expression of Wnt/β-Catenin pathway genes.
PAM50 gene set analyses provided additional insights. Keratin 14 and keratin 5 expression pattern in these phenotypically defined cells was similar to previously reported expression pattern in freshly prepared un-cultured CD49f+/EpCAM+ cells (16), suggesting limited introduction of culturing artifacts in gene expression analyses. In patient 1, although normal cells grouped into one cluster demonstrating basal/luminal hybrid gene expression pattern, tumor cells subgrouped into three clusters (Fig. 6A). One major cluster is most likely luminal B subtype as it expressed mostly luminal genes. Two other minor clusters were enriched for basal or basal/luminal hybrid gene expression pattern. Tumor samples of patient 2 also formed two distinct clusters; one expressing mostly luminal genes, whereas the other expressing both luminal and basal genes (Fig. 6B). Tumor cells in patient 3 are mostly homogenous expressing predominantly luminal genes and few basal genes (Fig. 6C). However, normal cells in this case clustered into two clusters; one leaning toward luminal and the other being luminal/basal hybrid. Note that basal/luminal hybrid gene expression pattern is not unique to cultured cells and is observed in primary tumor samples (33).
Clustering analysis of combined unselected and phenotypically defined cell population
It is often difficult to distinguish tumor cells from normal cells that may be embedded in the tumor and such cross contamination could impact data interpretation in single-cell RNA analysis. To address this issue, we performed unsupervised clustering of all cells from a patient sample. In the PAM50 analyses of cells from patient 1, the majority of tumor cells separated into two large clusters with very few tumor cells interspersed within two clusters of normal cells (Supplementary Fig. S2). In stemness gene set analysis, normal cells formed two distinct clusters with very few interspersed tumor cells, whereas tumor cells formed one large cluster (Supplementary Fig. S3). Results are similar with cells from patient 2. With PAM50 gene set, normal cells formed one cluster expressing both basal and luminal genes, whereas tumor cells formed one major cluster and one minor cluster with both clusters expressing mostly luminal genes compared with normal cells (Supplementary Fig. S4). With stemness gene set, normal cells formed one large cluster expressing both luminal and basal genes, whereas tumor cells formed two distinct clusters (Supplementary Fig. S5). As with patient 1, there were very few normal cells clustered within the tumor cell cluster, suggesting minimum normal cell contamination within the tumor. Overall, results presented clearly show the ability of the assay presented in this study to document heterogeneity within primary tumor at single-cell level. In addition, this clustering analysis further documents heterogeneity in gene expression in both normal and tumor cells.
Gene expression differences between tumor and normal: cell-type-origin of tumor versus transformation induced
It has often been difficult to determine whether the observed gene-expression pattern in a tumor is reflection of cancer-specific gene aberration or cell-type-origin of tumor. Towards this end, we organized results of qRT-PCR into violin plots, which show dynamic range of expression in cells and depict heterogeneity in expression. With phenotypically defined population of cells, tumor-specific overexpression of MMP2, CCND1, BAG1, and CTNNA1 was clearly evident in patient 1 (Fig. 7A). Analyses of data from unselected cells demonstrated a wide range of heterogeneity in expression patterns in both tumor and adjacent normal cells and the number of genes that could be considered differentially expressed between tumor and normal cells dropped significantly (Fig. 7B). For example, although overall expression of NGFR appeared to be lower in tumor cells compared with adjacent normal cells, tumor contained two populations of cells, one with NGFR levels similar to adjacent normal cells and the other population expressing lower levels of NGFR. CDH2 expression appears to be elevated in only a fraction of tumor cells as evident from both phenotypically defined and unselected cell analyses. Data from patient 2 (Fig. 7C and D) also revealed the ability of single-cell analyses to identify genes that are truly differentially expressed in tumors. MMP2 was only gene that showed overexpression in both tumors compared with normal irrespective of cell types used for comparison. We confirmed overexpression of MMP2 in tumors by IHC of tumor and adjacent normal tissues (Supplementary Fig. S6). Overall, the single-cell gene expression studies presented in this study enabled identification of major and minor subclones of tumor cells as well as genes truly differentially expressed in tumor compared with normal cells.
Discussion
In this study, we used single-cell gene expression analyses to further refine methods that can be used to decipher tumor heterogeneity at individual patient level, to identify minor tumor clones as well as to distinguish genes truly differentially expressed in tumor from genes whose expression pattern in tumor suggests cellular precursor of tumor. Although several previously used techniques of culturing normal and tumor cells from the breast are often biased toward outgrowth of cells with basal cell characteristics, single-cell analyses clearly showed that the method used in our study enabled growth of both normal and tumor cells with luminal gene expression pattern. Therefore, we believe that any artifacts introduced due to culturing are minimum and is the same for both normal and tumor cells. Also, note that there is no contamination of other cell types because all cells selected for the analyses expressed keratins, ERBB2 and EGFR at variable levels. As multiple studies including the recently published results of Human Functional Genome Project clearly show enormous functional variation in human genome that affects host–environment interaction at individual levels (8, 9, 25, 34), it is critical to develop assays that compare tumor with normal from the same patient. Data presented here demonstrate feasibility of comparing tumor and normal at individual level.
Ideally, the studies reported here need to be conducted with cells from fresh tissues with limited in vitro manipulation. Our attempts to conduct such analyses were unsuccessful due to limited number of epithelial cells from fresh tissues. Such studies are feasible only when large tumors are available and larger tumors often tend to have significant number of necrotic areas. Adjacent normal tissue also tends to have higher levels of fibroblasts and adipocytes and yield very few epithelial cells. Therefore, our analyses required enrichment of epithelial cells through short-term cultures. Because data were obtained from cultured cells, the effect of tumor microenvironment on tumor cell gene expression was not taken into consideration. Nonetheless, cell-intrinsic gene expression differences alone were sufficient to distinguish tumor from adjacent normal.
Precision therapeutics programs at multiple institutions rely mostly on testing gene aberrations at DNA levels to identify cancer-specific signaling networks (35). These assays are reliable in characterizing tumors compared with assays that use mRNA-based tumor characterization although recent studies have revealed limitations even with mutation-based assays because selected tumor tissue for DNA analysis may not be representative of entire tumor (36). mRNA-based cancer classification suffers from several limitations, including lack of appropriate normal controls, inter-individual heterogeneity in normal cell gene expression, differences in differentiation state between normal and tumors affecting gene expression, and the effect of stroma on gene expression in tumor cells. Greater than 2,000 genes are differentially expressed between luminal/progenitor and mature cells of the breast and we have previously demonstrated interindividual variability in differentiation state of normal breast epithelial cells (14, 19). Comparative analyses of phenotypically defined cells versus unselected cells described in this study clearly showed the need to refine cancer-specific pathway discovery by excluding confounding factors such as differences in differentiation state between normal and tumor cells and to use normal tissue from the same individual for comparison. Stemness-related signaling networks that appear to be active in tumor cells differed based on types of cells used for comparison. Lack of consideration to these aspects may be responsible for limited reproducibility of mRNA-based gene-expression signatures and even randomly selected gene sets demonstrating prognostic values in breast cancer (37).
It has been technically challenging to differentiate genes that are truly differentially expressed in tumor compared with normal from genes whose expression pattern in tumor suggests cellular precursors of tumor. This knowledge is critical if the focus is to develop therapies based on differentially expressed genes. Data presented in Fig. 7 clearly indicate the power of the analyses presented to identify truly differentially expressed genes in tumors. Number of genes differentially expressed in tumor compared with normal is much higher in patient 1 compared with patient 2. Further studies with additional samples are needed to link results of this type of studies with clinical parameters.
We were able to detect minor clones of tumor cells using the assay system. Although bioinformatics approaches have allowed dissection of genome and RNA sequencing data to build tumor evolutionary maps and detect minor tumor clones, functional evaluation of these minor clones for tumor recurrence or therapy resistance has not been possible. With cryopreserved tumor and normal cells from the same patients, the assay presented here provides an opportunity to test various drugs for their tumor cell-specific affects and residual cells after treatment can then be subjected to single-cell genomic studies to identify cancer cell types that are de novo resistant to treatment.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: E.F. Srour, H. Nakshatri
Development of methodology: S. Mohamad, G.E. Sandusky, E.F. Srour, H. Nakshatri
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Anjanappa, S. Mohamad, A. Gunawan, S. Rice, E.F. Srour, H. Nakshatri
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A. Cardoso, L. Cheng, S. Mohamad, Y. Dong, L. Li, G.E. Sandusky, H. Nakshatri
Writing, review, and/or revision of the manuscript: A. Gunawan, Y. Dong, G.E. Sandusky, H. Nakshatri
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. Anjanappa, H. Nakshatri
Study supervision: H. Nakshatri
Other (performance and analysis of single-cell experiments): A. Cardoso
Acknowledgments
We thank Mr. Luke Stewart of Fluidigm for the analyses of samples from patient 4. We also thank IU Simon Cancer Center Flow cytometry, Immunohistochemistry and Tissue Procurement Cores for their help.
Grant Support
This work was supported by Susan G. Komen for the Cure SAC110025 and Indiana Clinical Translational Sciences core pilot grant (H. Nakshatri). Walther Cancer Foundation provided support for the Bioinformatics core.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.