Glioblastoma (GBM), an incurable tumor, remains difficult to model and more importantly to treat due to its genetic/epigenetic heterogeneity and plasticity across cellular states. The ability of current tumor models to recapitulate the cellular states found in primary tumors remains unexplored. To address this issue, we compared single-cell RNA sequencing of tumor cells from 5 patients across four patient-specific glioblastoma stem cell (GSC)–derived model types, including glioma spheres, tumor organoids, glioblastoma cerebral organoids (GLICO), and patient-derived xenografts. We find that GSCs within the GLICO model are enriched for a neural progenitor–like cell subpopulation and recapitulate the cellular states and their plasticity found in the corresponding primary parental tumors. These data demonstrate how the contribution of a neuroanatomically accurate human microenvironment is critical and sufficient for recapitulating the cellular states found in human primary GBMs, a principle that may likely apply to other tumor models.
It has been unclear how well different patient-derived GBM models are able to recreate the full heterogeneity of primary tumors. Here, we provide a complete transcriptomic characterization of the major model types. We show that the microenvironment is crucial for recapitulating GSC cellular states, highlighting the importance of tumor–host cell interactions.
See related commentary by Luo and Weiss, p. 907.
This article is highlighted in the In This Issue feature, p. 890
Glioblastoma (GBM) is an incurable and malignant primary brain tumor. Despite the discovery more than a decade ago of a primitive tumor/glial “stem-like” cell (GSC) that propagates the disease following a hierarchical developmental process, no clinically effective GSC-targeted therapies have been developed to date (1, 2). Further elucidation of GSC biology and hierarchy will be pivotal for developing more effective treatment (3), but the diffuse and heterogeneous nature of the disease makes it exceedingly difficult to model. Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of the intratumoral heterogeneity of human gliomas (4) and the mapping of developmental lineages across multiple glioma subtypes (5–7). Recent work has identified plasticity between at least four cellular states [astrocyte (AC)-like, mesenchymal (MES)-like, oligodendrocyte progenitor cell (OPC)–like, and neural progenitor cell (NPC)–like] in isocitrate dehydrogenase (IDH)–wild-type GBMs that recapitulate the whole tumor heterogeneity. Each of these states is enriched for but not defined by specific genomic aberrations. The proportion of cells in each of these states in a given tumor accounts for the overall transcriptomic type assigned to that tumor (MES, proneural, or classic). Unfortunately, successful application of this knowledge to new treatments remains elusive to date.
GBM models serve as an indispensable tool for studying the disease and screening potential therapies. Serum-free glioma spheres established the importance of GSCs as an accurate in vitro model (8). Recently, expanding glioma spheres to three-dimensional culture systems has enabled improved modeling of GBM cellular heterogeneity (9). In addition, patient-derived xenografts (XE) provide a microenvironment for cells to not only develop heterogeneous populations as three-dimensional tumor organoids (TO), but also facilitate cross-talk with other nonneoplastic normal host cells while avoiding the lack of human tumor genomic complexity inherent in genetically engineered mouse models. Nevertheless, recent work has demonstrated that XE can follow mouse-specific evolutionary trajectories which potentially jeopardize cancer modeling studies (10). Patient-derived tumor cells grown in embryonic or induced pluripotent stem cell–derived human cerebral organoids are a promising model system that has been shown to recapitulate tumor-like structures while maintaining serial transplantation competency (11–13).
Because no one tumor model is perfectly representative of the human disease, different models will likely be most appropriate depending on the question being asked. To characterize tumor cell subpopulations in different GBM models, we profiled the transcriptomes of more than 60,000 single cells across five primary GBM patients/tumors (IDH–wild-type), each used to derive four different GBM models: two-dimensional glioma sphere cultures (2-D), three-dimensional TO, glioblastoma cerebral organoids (GLICO), and patient-derived XE. We compared the models' transcriptional profiles with each patient's own tumor and found higher correlation with GLICO as compared with the other model types. Furthermore, by combining cells across all models, we can recapitulate all four GSC cellular states—NPC-like, OPC-like, AC-like, and MES-like (7)—recapitulating the makeup of the original tumor. We found that GLICOs have a higher percentage of GSCs in the NPC-like cell state and, along with the XE model, have a higher proportion of OPC-like cells. Importantly, both NPC and OPC compartments were associated with stemness and presented a similar enrichment in the single-cell transcriptomes of primary GBM samples (14). We also see relevant GBM and neural developmental/stem cell genes that are differentially expressed in GLICO as compared with 2-D and TO, including SOX4, BCAN, and genes encoding members of the NOTCH pathway. Finally, we replated GLICO cells in 2-D culture conditions and show that a significant component of the NPC-like and AC-like compartments is lost without the organoid microenvironment. Overall, these data reveal important differences between patient-derived models, including key GBM features present in the GLICO model, at a single-cell level useful for future studies.
GLICOs Correlate with Patient Tumors and XE
We generated four models from each of five GBM patient tumors as shown previously (12). Figure 1A shows an overview of the generation of the different models. Hematoxylin and eosin staining of the patient tumor, the GLICO model, and the patient-derived XE reveals the characteristic microscopic invasiveness of GBM (Fig. 1B and C; Supplementary Fig. S1A) as is consistent with previous observations (11, 12). The primary tumors tested were representative of the different GBM genetic landscape and transcriptomic subtypes (Supplementary Table S1; Supplementary Fig. S1B). We performed 10x Genomics Chromium scRNA-seq on sorted GFP+ GSCs, and 19 of the 20 models and a total of 62,885 cells passed stringent quality-control filters (see Table 1; Methods). We inferred copy-number alteration (CNA) profiles from the scRNA-seq data as described previously to characterize samples across models (ref. 4; Supplementary Fig. S1C). As is common in GBM, all of the tumors exhibited chromosomal amplification and deletions, including the hallmark chr7AMP/chr10DEL which was present in all but one patient.
|Patient cell line name .||Model .||Number of cells .||Average number of genes per cell .||Average % mitochondrial genes .||Average number of counts .|
|Patient cell line name .||Model .||Number of cells .||Average number of genes per cell .||Average % mitochondrial genes .||Average number of counts .|
aBulk RNA-seq of patient tumor available.
In patients with bulk RNA-seq data available (WCM1, WCM2, WCM3, and WCM5), we sought to quantify the similarity between the models' single-cell transcriptomes and patients' bulk transcriptomes. To visualize the models at a single-cell level, we performed a combined principal component analysis (PCA) of all of the cells' gene-expression data along with the bulk transcriptional data from each patient (Fig. 1D). We observe all of the models exhibit single-cell heterogeneity, consistent with the intratumoral heterogeneity of GBM (4). We see that 2-D and TO cluster exclusively together in all 5 patients, and GLICO and XE tend to cluster closer together, which was further demonstrated using unsupervised hierarchical clustering of the models' average transcriptomes (Fig. 1E). This indicates GLICO may be able to model the same transcriptional programs as XE, such as interactions with nonmalignant brain cells, whereas 2-D and TO models cannot. Of note, none of the models were able to accurately model tumor WCM1 when compared with the 4 other patients. CNA showed evidence of subclones in the copy-number profiles of the 2-D and TO models that were not present in either the XE or GLICO CNA profiles, which may contribute to the global principal component (PC) differences in this patient. We also see from the unsupervised clustering that none of the models clustered with the bulk patient tumor, which confirms our intuition that no model, including XE, will perfectly recapitulate the patient tumor.
To see whether the principal components were capturing relevant biological information and not library size or housekeeping genes, we visualized the genes with the highest correlation to the principal components that separate 2-D/TO from GLICO and the bulk transcriptional profile, when applicable (Supplementary Fig. S1D). When we do this, we see genes including SOX4, EGFR, and NFIA, which are known to be important in neural and glioma development, thus indicating biological signals are driving the difference between the various models (15, 16).
Next, we computed the correlation of every cell with its corresponding bulk patient tumor and visualized the distributions by model. Although the correlation coefficients differ between individual tumors, on average we see that across all of the patients, GSCs cultured in GLICO are significantly more correlated to the corresponding patient's bulk transcriptome than the other models (P < 0.0001, Fig. 1F). When broken up by patient, we see cells from GLICO and XE have consistently higher correlation to the patient tumor than 2-D and TO (Supplementary Fig. S1E). The violin distributions in Fig. 1F also show a subset of cells in GLICO with high correlation values, suggesting the presence of a particular patient-related state of this model. Lastly, all of the models show heterogeneity in their correlations, indicating that although more GLICO cells match the bulk patient tumor, the heterogeneity in the patient tumor cannot be described by bulk RNA-seq alone.
Finally, in order to extend the comparison and include primary GBM single-cell transcriptomic data, we incorporated eight primary scRNA-seq samples into our analysis, restricting to malignant cells with high variation across the dataset, as in ref. 14 (see Methods). When we performed hierarchical clustering between the aggregated transcriptomes of each model type, we found that GLICO and XE clustered closer to the primary samples than 2-D and TO (Fig. 1G).
Unsupervised Cluster Analysis Characterizes GSC Subpopulations
Following comparison with patient tumors, we next sought to characterize the intratumoral heterogeneity of our cell populations across the models in an unsupervised manner. To do this, we performed Louvain clustering, which employs a community-based clustering algorithm (17), on all of the batch-corrected cells to define clusters (C0–C6) and used a Uniform Manifold Approximation and Projection (UMAP) projection to visualize the data (ref. 18; Fig. 2A; Supplementary Fig. S2A). We observed that certain clusters express well-known marker genes, notably genes encoding cell-cycle and proliferation markers in C1 and C4 (e.g., MKI67), AP1 subunits and AC markers in C0 (e.g., GFAP), hypoxia markers in C5 (e.g., VEGFA), and proneural and pluripotency markers in C2 (e.g., SOX4; Fig. 2A and B).
To further characterize each cluster, we computed enrichment scores for a set of GBM-related gene sets from MSigDB (19) and used hierarchical clustering to define the relationship between Louvain clusters (see Methods; Fig. 2C). C1 and C4 cluster together as proliferative clusters with C1 containing G1–S cells and C4 containing G2–M cells (Supplementary Fig. S2B). Together, we label these two clusters as the “cell cycle” cluster. C5 and C6 clustered close to each other with strong scores for hypoxia (strongest in C5), apoptosis, and amino acid deprivation genes (strongest in C6; Supplementary Fig. S2C). The presence of a cluster of rapidly proliferating tumor cells, a cluster of a more quiescent, stem-like cell population, and a core of hypoxic cells has been well characterized in the literature (4, 7, 9, 20, 21). Clusters C0 and C2 both show expression of genes important in glial development; however, C0 shows a classic and AC-like GBM phenotype, whereas C2 shows a stronger proneural and NPC/OPC-like phenotype as defined in previous studies (22). To test the validity of the batch correction methods, we also performed Louvain clustering on a per-sample basis (N = 220 clusters across 19 samples) and then looked for recurrent patterns across these clusters. This resulted in eight meta-cluster groups with recurrent patterns nearly identical to those obtained from batch correction (Supplementary Fig. S2D; see Methods).
After defining the cluster identities of the cells in our study, we next looked at the composition of each model as defined by these clusters (Fig. 2D). Strikingly, we find the GLICO model has a higher proportion of cells in the proneural cluster (C2) than the other models (P < 2.2 × 10−16), whereas XE has a higher proportion of cells in the classic cluster (C0, C3). In contrast, the 2-D model has a higher proportion of cycling cells (C1, C4), as reported previously for this in vitro model in comparison with its patient-derived transcriptome (4), as well as a lower proportion of cells in the proneural cluster (C2). Finally, the TO model demonstrated a much larger fraction of hypoxic cells (C5) consistent with the known central hypoxia and subsequent necrosis that forms in many TO compared with the nonhypoxic environment of the normal cerebral organoids and the in vivo mouse brains, the tumor microenvironments utilized in the GLICO and XE models, respectively (9).
We next sought to relate the clusters to the traditional GBM subtypes in each of the models. In order to infer the subtype of each cell, we correlated each cell's transcriptional profile to bulk RNA-seq data from reference GBMs from The Cancer Genome Atlas (TCGA; refs. 23, 24). We restricted our reference set to the three GBM subtypes—proneural, classic, and MES—due to recent literature questioning the validity of the neural subtype (25) as well as our own findings that the neural subtype was not separable from the other three subtypes (Fig. 2E; Supplementary Fig. S2E and S2F). We find that the GLICO models have a higher percentage of cells labeled as the proneural subtype and a lower percentage of MES cells than the other models, consistent with the cluster analysis. This is consistent with previous single-cell analyses of primary GBMs that have shown a proneural subpopulation of cells with a strong stemness signature in all GBMs regardless of the assigned bulk tumor transcriptional subtype of the tumor (4, 7).
GLICOs Are Enriched for NPC-Like and OPC-Like Cellular States
After characterizing the intratumoral heterogeneity across models, we next analyzed specific glial lineages in this dataset. Using stemness, AC, and OPC lineage marker genes, we calculated lineage scores of the cells as in refs. 5 and 26. We visualized the lineage scores on the UMAP projections as well as key lineage-specific genes to distinguish between more stem-like, OPC-like, and AC-like populations (Fig. 3A and B).
In addition to SOX4 expression, an essential factor in maintaining stemness (15), high expressions of NFIA and SOX11 have been implicated in neural stem cell development and GSC identity, and these were used as stem-like marker genes (5, 16). OLIG1 and OLIG2 are among the first markers of the oligodendroglial lineage and are both highly expressed in GSCs. OLIG2 in particular was one of four neurodevelopmental transcription factors shown to be essential for GBM propagation (27). GFAP and APOE are well-known AC markers with APOE marking astroglial lineage and GFAP marking more differentiated AC (26). Additional marker gene expression visualization can be found in Supplementary Fig. S3A. Based on lineage score and marker gene criteria, we see in the UMAP projection that C2 is made up of both NPC-like and OPC-like cells and C0 is made up of AC-like cells.
We next compared lineage scores of the cells across models (Fig. 3C) and see the GLICO model has significantly higher stem-like and OPC-like scores than the 2-D or TO models (P < 0.001, one-way ANOVA). GLICO also has a higher stem-like score than the XE model, but comparable OPC composition. Finally, XE has higher AC lineage scores than the other models, consistent with its classic GBM phenotype.
To compare these results with the recent findings by Neftel and colleagues (7) that GBMs are made up of transitional states of NPC-, OPC-, AC-, and MES-like cells, we assigned cell types using their four cell type signatures and plotted the location on our UMAP projection (Fig. 3D; Supplementary Fig. S3B). Using the two-dimensional visualization of the Neftel meta-module scores, we see GLICO has a broader spread of the GBM cells, highlighting the increased cell type diversity seen in the cerebral organoid model compared with the other models (Fig. 3E). We see that the clusters we found in our unsupervised clustering analysis correspond strongly to the four Neftel cellular states (Fig. 3F). Our proneural cluster (C2) is comprised of both NPC-like cells in the lower group of cells and OPC-like cells above. In line with previous studies (7, 21), our hypoxia cluster (C5) corresponds with the MES-like compartment. Finally, our classic phenotype cluster (C0) matches the AC-like compartment. We also observed that cycling cells were located at intermediate states as the Neftel data suggested.
Next, we sought to compare the cellular compositions of primary tumors with the different model types. Using scRNA-seq of primary GBMs (ref. 14; Supplementary Fig. S3C and S3D), we observe the same main cellular states after including the patient data (Fig. 3G). Interestingly, we see that although the 2-D, TO, and XE models are again less enriched for the NPC/OPC cluster, the primary tumor cells share a similar NPC-like subpopulation to GLICO (P < 2.2 × 10−16 for primary and GLICO cells, hypergeometric enrichment test, Fig. 3H; Supplementary Fig. S3E). This confirms that GLICO maintains an NPC/OPC subpopulation present in primary tumors that is largely absent in 2-D, TO, and XE models. Moreover, GLICO was most similar to the primary GBM samples with respect to cellular states composition (Supplementary Fig. S3F). Overall, these results show that GLICO, although not identical to primary tumors, recapitulates the GBM cellular states and partially mirrors essential aspects of primary GBM heterogeneity.
GLICOs Express NOTCH Pathway Members and GBM Invasiveness Markers
We next performed differential expression analysis using MAST, a package that adjusts for the high rate of dropout in scRNA-seq data, to find which genes were differentially expressed in GLICO as compared with the other in vitro models, 2-D and TO (ref. 28; Fig. 4A; Supplementary Fig. S4A and S4B). Prominent GBM genes SOX4, BCAN, and KPNA2 are among the significant genes with the highest positive log fold change expression in the GLICO model. BCAN has been shown to be a marker of GBM invasiveness, and KPNA2 promotes metabolic reprogramming in GBM (20, 29). Notably, three NOTCH pathway members are also among the highest differentially expressed in the GLICO model—the Delta-like NOTCH ligand DLL3 and the NOTCH effector genes HES1 and HES6. HES6 was also found to be a part of the core GSC-specific transcription factor network found by ref. 27. We then performed gene set enrichment analysis (GSEA) of the differentially expressed genes, which confirmed enrichment of the proneural gene set (P < 2.2 × 10−16, FDR < 0.05), as well as significant enrichment of a gene set enriched for GBM cell lines growing with a spherical phenotype over an adherent growth monolayer, further confirming an enriched stem-like phenotype in the GLICO model (P < 2.2 × 10−16, FDR = 0.08, Fig. 4B; Supplementary Fig. S4C). We compared expression of these NOTCH pathway members and GBM genes across models (Fig. 4C; Supplementary Fig. S4D) and found that GLICO had more cells with higher DLL3, SOX4, BCAN, and HES6 expression than the 2-D and TO models.
Ingenuity Pathway Analysis for cellular functions based on genes differentially expressed in GLICO versus 2-D and TO revealed an inactivation of necrosis and apoptosis pathways in GLICO and the activation of cell proliferation (Fig. 4D). This inactivation of cell death pathways in GSCs grown in GLICO compared with those in vitro may in part account for the greater resistance to cytotoxic chemotherapy seen when matched GSCs are grown in GLICO compared with those same cells grown in vitro (12).
Expression of the top three genes—SOX4, BCAN, and DLL3—in GLICO was confirmed using immunofluorescence staining (Fig. 4E). In addition, SOX4 is present in both 2-D and GLICO, but the expression is localized to the nucleus only in the GLICO model (Supplementary Fig. S4E). This highlights the existence of additional gene-regulatory mechanisms that cannot be captured with scRNA-seq alone but are likely important in GSC biology. In addition, we see that the NOTCH ligand DLL3 is expressed not only in GLICO GSCs, but also in the cerebral organoid cells of GLICO (Fig. 4E). This suggests a cross-talk between tumor and organoid cells via the NOTCH pathway that can be further studied and targeted using the GLICO model.
Given the identification of these cellular states, particularly the proneural NPC/OPC-like component of GLICO cells, we explored whether we could identify a root node through lineage reconstruction with RNA velocity analysis (ref. 30; Fig. 4F; Supplementary Fig. S4F). Recently, it has been shown with RNA velocity analysis that when looking at only cycling GBM cells, lineage tracing supports the hypothesis that MES GSCs progress to proneural GSCs (21). When using our dataset, we chose to focus on noncycling cells that span the spectrum of the four GSC cellular states. Using RNA velocity analysis to label high-density regions as starting points, or “root cells,” we found that all four of the cellular states are labeled as root cells in different patient and model types. A wide range of genes from multiple cellular states, for example AC-like genes (FABP7, APOD), NPC-like genes (SOX4, STMN2), and MES-like genes (ENO2), are differentially expressed in these root cells across samples (Fig. 4G). Due to the high concordance of our modules with published work (7), we believe our data support the conclusion that there exists plasticity between all of the cellular states instead of a defined MES to proneural hierarchy.
Cerebral Organoid Influences GBM Cellular States
In order to explore whether the specific intratumoral transcriptomic heterogeneity observed in GLICO was a consequence of the model's growth conditions rather than an irreversible selection of particular cell subpopulations, we disaggregated the GLICO cells for 2 patients—WCM3 and WCM5—and replated the cells in 2-D culture conditions. Thirty days after replating, we submitted the cells for scRNA-seq and compared the results with their corresponding 2-D and GLICO datasets. We visualized genes that are differentially expressed between 2-D and GLICO models to determine whether the replated GSCs resemble the 2-D or GLICO phenotype (Fig. 5A). Unsupervised clustering analysis of these gene signatures revealed that the replated GSCs more closely resemble the 2-D phenotype. We see that SOX4, NFIA in WCM3 and BCAN, ASCL1 in WCM5 are among the genes highly expressed in only the GLICO and not the 2-D or replated GSCs.
Furthermore, PCA showed a significant overlap between replated GSCs and the 2-D cells as compared with GLICO cells (Fig. 5B; Supplementary Fig. S5A and S5B). Strikingly, when we label cells by GBM cellular state (7), we clearly see a subgroup of NPC-like cells that are present only in the GLICO model and lost upon replating. Similarly, a group of AC-like cells in WCM3 and to a lesser extent WCM5 are present only in GLICO conditions (Supplementary Fig. S5C). Furthermore, plotting the cell-cycle categories onto the principal components, we observe that the NPC-like and AC-like cell types present only in the GLICO model are noncycling. We can verify these results via microscopy, where after 30 days the isolated cells started to form neurospheres, once again resembling the 2-D phenotype. We also observed that after 7 days the GBM cells adhered to the substrate and interconnected with each other forming an intricate cell network (Fig. 5C). Interestingly, we observe a similar phenotype at the first steps of GSC isolation from freshly dissected patient tumor tissues. These data all suggest that these GBM cellular states are present only in the GLICO model as a consequence of its microenvironment (Fig. 5D).
Despite the rapid development of powerful new technologies to explore the genomic and epigenomic states of tumor cells isolated directly from primary human tumors, tumor models remain critical for the experimental integration of tumor biology and for preclinical screening of therapeutic agents. Nevertheless, few novel human cancer–derived models have been developed over the last decade, and even fewer of the available models have been interrogated and compared with the corresponding human primary tumors using modern-day genomic profiling. In this study, we used single-cell analyses of the transcriptomes of four models in five GBM patient tumors to elucidate the cellular heterogeneity of glioma spheres, TO, XE, and GLICO. We confirmed previous work that the GLICO model accurately phenocopies the patient tumor (12, 13), recapitulating the GBM heterogeneity with an enrichment in a stem-like cellular state, a balanced proportion of cycling cells, and an increase in survival mechanisms. Furthermore, we characterized the cellular landscape of all cells across all patients and models. Cells exhibiting more hypoxic, proneural, or classic signatures occupy a separate cluster from cycling cells, confirming their identity. Our results agree with the recent work from the Suva lab (7) by confirming the existence with unsupervised clustering of four distinctive meta-modules of GSCs: NPC-, OPC-, AC-, and MES-like. These large single-cell gene expression studies show us a more comprehensive description of the intratumor heterogeneity in GBM and particularly in GSCs. We also show that GLICO cells are enriched for NPC/OPC-like signatures with a strong stemness capacity. Strikingly, the same signatures are enriched in primary GBM cells, which makes GLICO an attractive model system for studying these hard-to-treat subpopulations that cannot be obtained in the rest of the models.
From comparing the single-cell transcriptomes of the models with the bulk patient tumor, the advantage of glioma cerebral organoids is apparent. In addition to having the strongest correlation with the patient transcriptome, we see that GLICO also clusters separately from 2-D and TO models and tends toward clustering with the XE model and the primary samples. This suggests that GLICO not only provides comparable modeling of glioma cell interactions with nonmalignant brain cells as XE at a fraction of the cost, time, and resources, but also has the advantages of a human brain microenvironment. The relevance of this human brain/GSC interdependence is further manifested in the replating experiment where we see glioma cells forming dendrite-like structures in the GLICO model, and then losing these structures and NPC/AC compartments, to then resume a glioma sphere program in the two-dimensional culture conditions.
Not only does GLICO differentially express the stemness marker SOX4 and glioma invasiveness marker BCAN, but we see prominent NOTCH pathway members (DLL3, HES1, and HES6) differentially expressed in GLICO model cells. DLL3, a NOTCH ligand that has recently been suggested as a therapeutic target for IDH-mutant gliomas (31), is expressed only in the GLICO and XE models, pointing to its importance only in the presence of a tumor microenvironment. This conclusion is further amplified by the high expression of DLL3 in surrounding cerebral organoid cells in the GLICO model. These results provide support for NOTCH inhibition as a potential avenue for treatment (32) but more importantly highlight the potential benefits of using GLICO as an ex vivo model over traditional 2-D cell lines for more accurately identifying clinically active agents in various high-throughput drug screens.
Like all tumor models, GLICO has its own limitations such as its inability to model tumor cell interactions with host blood vessels or immunologic cells. Nevertheless, our data demonstrate how the contribution of a neuroanatomically accurate human microenvironment is critical and sufficient for recapitulating the cellular states, and their plasticity, found in the corresponding parental human primary tumor. We speculate that this principle will likely apply to other tumors, although proof awaits the development of new model systems that similarly recapitulate normal host organ–tumor interactions.
GBM is a devastating disease that has seen scant progress in treatment options over the last few decades. Our in-depth characterization of the heterogeneous landscape of cellular states across different GBM models will hopefully help guide the optimal use of each model for the study and treatment of GBM. In a more general way, the demonstration of the importance of the human microenvironment on tumor cell biology, as seen in the GLICO model, should help inform the development of other novel ex vivo human cancer models.
Lead Contact and Materials Availability
Further information and requests for resources and reagents should be directed to and will be fulfilled by Howard A. Fine (email@example.com).
Experimental Model and Subject Details
Following written informed consent, tumor samples classified as GBM, based on the WHO criteria, were obtained from patients undergoing surgical treatment at the NIH or from Weill Cornell Medicine/New York–Presbyterian Hospital in accordance with the appropriate Institutional Review Boards. Within 1 to 3 hours after surgical removal, tumors were washed in PBS and enzymatically dissociated into single cells. Tumor cells were cultured in Neurobasal medium supplemented with N2, B27 and bFGF and EGF (NBE medium; Thermo Fisher Scientific), N2 and B27 supplements (Thermo Fisher Scientific), and human recombinant bFGF and EGF (25 ng/mL each; R&D Systems) plus Heparin sodium and l-glutamine. Regular Mycoplasma screening was performed using the MycoAlert Detection Kit (Lonza Inc.). Detailed information for all resources is provided in Supplementary Table S2.
Human Embryonic Stem Cells.
NIH-registered human H1 (WA01) or H9 (WA09) embryonic stem cells were purchased from WiCell Research Institute, Inc., and maintained in mTeSR1 medium (STEMCELL Technologies).
An overview of the experiment is provided in Fig. 1A. Cerebral organoids and coculture with GSCs were generated as described in ref. 12. Briefly, GSCs were transduced with pLenti PGK GFP Blast (w510-5; Addgene) and were cultured under blasticidin selection in NBE medium. Organoids were plated one per well in a 96-well round-bottom plate, excess medium was removed, and 20,000 GSCs that stably express GFP in 150 μL of NBE were added to each well. After stationary incubation at 37°C for 24 hours, GLICOs were washed in PBS and transferred to a new 6-well plate with 4 mL of organoid differentiation medium and filtered for tumor formation. Organoids were maintained on an orbital shaker for up to 14 days at 37°C.
TO and XE Generation.
Intracranial tumor cell injection into SCID mice was performed as described in ref. 8. Briefly, 200,000 GSC GFP-expressing cells were orthotopically xenografted into the right cortex of neonatal SCID mice. Mice were sacrificed upon display of overt phenotypic or neurologic signs. Protocols were reviewed and approved by the WCM Institutional Animal Care and Use Committee board. TOs were obtained as described in ref. 9. Briefly, 20,000 GSC GFP-expressing cells were suspended in Matrigel droplets (Corning, #354277) on parafilm molds prior to culture. TOs were cultured in 6-well plates, and shaken in NBE medium.
Single-Cell Dissociation and Sorting.
Mouse brain dissected areas with evident presence of tumor GFP+ cells, GLICOs, and TOs were dissociated into single-cell suspensions using the Papain Dissociation System following the manufacturer's instructions (Worthington Biochemical, LK003150). Two-dimensional cultures were dissociated following standard passaging protocols using Trypsin. Cells were filtered through a 40 μm strainer and resuspended in ice-cold PBS/2%BSA. GFP-positive/DAPI-negative cells were sorted by FACS (Sony, MA900) and collected in PBS/2%BSA for scRNA-seq. Cell viability was checked at every step of the process by Trypan Blue Exclusion test. For the replating experiments, dissociated GLICOs were cultured in NBE medium for 30 days, renewing it twice per week.
Immunofluorescence for GLICOs and 2-D Cultures.
WCM5 GFP-expressing GLICOs were fixed in 4% paraformaldehyde for 45 minutes at room temperature followed by three PBS washes for 10 minutes each and then embedded in paraffin. Sections of 10 μm were obtained using microtome on poly-lysine–coated slides. Following deparaffinization and rehydration, antigen retrieval was performed by submerging the slides in Trilogy solution (Sigma, 920P) by heat in a pressure cooker for 15 minutes. Sections were permeabilized for 20 minutes with PBS/0.5% Triton X-100 at room temperature and blocked 1 hour with PBS/3% BSA. Each section was incubated overnight at 4°C with primary antibodies against GFP (Abcam, ab13970, 1:200) and SOX4 (SIGMA, HPA029901, 1:100), BCAN (SIGMA, H00063827, 1:100), or DLL3 (Thermo, 703623, 1:100) followed by an incubation with the secondary antibodies coupled to Alexa 488 (Invitrogen, A11039, 1:200) and Alexa 568 (Invitrogen, A11036, 1:200) or Alexa 555 (Thermo, A28180, 1:200) for 1 hour at room temperature. Nuclei were counterstained with DAPI. Images were obtained using an epifluorescence microscope, processed, and analyzed using the Fiji software. Results were obtained for three independent GLICOs. Alternatively, WCM5 GFP-expressing GSCs were grown as 2-D and then cultured in Matrigel-coated Lab-Tek chambered coverglass (Thermo, 155409) overnight. Cells were fixed in 4% paraformaldehyde for 10 minutes at room temperature and then permeabilized, blocked, stained, and imaged in identical conditions as GLICOs.
Quantification and Statistical Analysis
scRNA-seq data were processed through 10x Genomics Chromium Single Cell Platform, and count matrices were generated using their Cell Ranger pipeline (10x Genomics). scRNA-seq data were preprocessed and largely analyzed using scanpy version 1.4 (33). For quality control, genes detected in less than 3 cells and cells with fewer than 200 genes were excluded. Cells with number of genes, number of unique molecular identifiers, or percentage of mitochondrial genes outside of 3 SDs were removed. Expression values were further library size corrected to 10,000 reads per cell and log1p transformed. Ribosomal and mitochondrial genes were subsequently removed due to their tendency to drive cluster formation.
Principal Component and Correlation Analyses.
PCA of individual patients was performed using scikit-learn version 0.20.3 (https://scikit-learn.org/stable/) in Python. PCA was performed per patient on combined data across the four models and the processed bulk RNA-seq data using the set of genes present in both bulk and single-cell data. Prior to PCA, bulk RNA-seq data were similarly log1p transformed, and the combined data were scaled to unit variance and zero mean to avoid gene abundance driving the principal component signal.
Unsupervised hierarchical clustering (Ward's method, Euclidean distance) was done using the expression signatures of all genes in the bulk patient tumor with the average expression values of each available model type. Clustering was done using the cluster.hierarchy package in scipy version 1.2.1 (https://docs.scipy.org/).
Box plots comparing each model with bulk patient RNA-seq data were done by calculating the Spearman correlation between the expression profile of each cell in each model and the bulk profile using both (i) all genes and (ii) genes marked as highly variable in the bulk patient tumor. Both versions found GLICO to have a significantly higher correlation than the other three models (P < 0.0001, Wilcoxon rank sum test). Violin plots broken up by patients were computed with the same method.
Correlation matrices in Supplementary Fig. S1 were calculated using Pearson correlation between the top 100 gene PCs and averaged across cells per sample and model on nonscaled, log-transformed data. Correlation results were robust to changing the number of gene principal components.
CNA Inference from Single-Cell Data.
CNAs were inferred using InferCNV (https://github.com/broadinstitute/infercnv), which works by sorting gene expression values by chromosomal position and subtracting expression values by the average CNV values of a nonmalignant reference population. We used an external reference of normal brain RNA-seq data downloaded from GTEX (http://www.gtexportal.org/) as described in ref. 4.
Single-Sample GSEA of Bulk Transcriptomes.
The subtypes of the bulk transcriptomes in Supplementary Fig. S1B were determined using single-sample GSEA (GSVA R package 1.34.0) of the four primary patient tumors and gene sets of Proneural, Classic, MES, and Neural subtypes from ref. 22.
Combined Dimensionality Reduction and Clustering.
UMAP projections and Louvain clustering with SCANPY were used to visualize and cluster the data. Combined dimensionality reduction and cluster analysis was performed with a filter for high-variability genes (default mean and dispersion cutoffs; 20,254 genes total with 10,980 marked as highly variable taken as the outer join of all highly variable genes in each sample). Upon dimensionality reduction, batch effects between patients and models separated and drove cluster assignments. Therefore, we corrected for this batch effect using the batch balanced k-nearest neighbors (BBKNN) method to better find cell populations across patients and models. BBKNN is a graph-based batch correction method that connects cells that have similar nearest neighbors within their batch. We found that after batch correction and subsequent embedding and clustering, functionally relevant clusters that were no longer dominated by batch were found. We also performed a per-sample clustering analysis not reliant on batch correction to test the validity of our clusters (below).
Gene set scores per Louvain cluster were calculated by taking the average expression of each gene set and subtracting a binned reference set with scanpy's score_genes function as detailed in ref. 5. Gene sets for cluster identification (e.g., HALLMARK_HYPOXIA) were downloaded from MSigDB (http://software.broadinstitute.org/gsea/msigdb/), and their gene set scores were plotted and subsequent cluster profiles were grouped using unsupervised hierarchical clustering.
GBM subtypes (proneural, classic, and MES) per cell were determined using reference GBM bulk transcriptomes from TCGA and computed using SingleR (24), a method that finds a Spearman coefficient for the correlation of the single-cell expression profiles to the reference transcriptomes and aggregates and refines the results per cell type.
Per-Sample Clustering Analysis.
We sought to find recurrent patterns across patients without relying on batch correction to test the validity of our clusters. In a similar manner to ref. 7, we performed Louvain clustering on each of the 19 samples individually, resulting in 220 clusters. We then aggregated the transcriptomes for each cluster using the average expression to get a cluster by gene matrix. Next, we performed hierarchical clustering of these clusters to identify patterns across all clusters, with a distance metric of one minus the Pearson correlation. Finally, we cut the dendrogram into “meta-clusters” and scored the meta-clusters using the gene signatures from our main clusters in Fig. 2A. The scoring, shown in the top plot of Supplementary Fig. S2D, shows the high concordance between these meta-clusters and the patterns found from batch correction.
Lineage and Cell Type Assignment.
Stem oligodendrocyte progenitor and AC lineage gene sets were taken from ref. 5, as well as cell-cycle scoring genes, and calculated using the aforementioned scoring method. Scores for each lineage were assigned to cells, and the distributions were compared across models using a one-way ANOVA test (*, P < 0.05). To quantify the distances between lineage cells, diffusion pseudotime analysis was conducted by setting a root node as the cell with highest combined expression of SOX4 and SOX11 (representing a cell from the NPC-like cluster) and computed by inferring the progression of cells using geodesic distance along a graph with the dpt function in scanpy; however, similar results are achieved with different cells from cluster 7.
To confirm these results, we also calculated meta-module scores as in ref. 7 for NPC1, NPC2, MES1, MES2, AC, and OPC scores. To assign cells to one of these four cell types, we took the maximum score for NPC2, MES2, AC, and OPC after finding higher enrichment of NPC2 and MES2 in our dataset. If the maximum score for a cell was less than a minimum evidence threshold of 0.3, we marked the assignment as ambiguous. To visualize the meta-module scores, we reimplemented the two-dimensionality visualization of ref. 7 by separating cells as OPC/NPC versus AC/MES and plotting max(OPC score, NPC score) – max(AC score, MES score) on the y axis and log2(|OPC score – NPC score| = 1) or log2(|AC score – MES score| = 1) on the x axis. Cells were scored with cycling gene sets and determined to be cycling if they had a phase designation of S or G2_M using the score_genes_cell_cycle function. The normalized confusion matrix was calculated by comparing the cluster assignment and cell type assignment for each cell and normalizing per cell type assignment.
Comparison with Primary Single-Cell Transcriptomes.
Primary scRNA-seq data were obtained from ref. 14. Removal of nonmalignant cells was done as they described (Supplementary Fig. S3C). Preprocessing steps prior to batch correction and integration with our data were performed in the same manner as our samples. Primary cells were assigned to one of the four cellular states as described above. We restricted our analysis to these cell populations in order to enrich for cell types with high variation, in concordance with the original analysis. The proportions of each cellular state per model were calculated and then plotted, as well as a cosine similarity score based on these proportions (Supplementary Fig. S3F). To cluster aggregated transcriptomes with the different model types in Fig. 1G, the average expression values across the primary GSCs were clustered with the transcriptomes of each model type, using cosine distance as the distance metric. The same analysis but separating each of the eight primary samples, labeled by the reported subtype, is included in Supplementary Fig. S3D.
Differential Expression and Gene Set Enrichment.
Differential expression analysis was performed using MAST (28) between GLICO and 2-D/TO cells, excluding genes that were detected in at least 20% of cells in either population. Similar analysis was done with XE versus 2-D/TO cells. GSEA was conducted using the log fold changes of MAST as the ranked list and input into GSEAPreranked (http://software.broadinstitute.org/gsea/index.jsp). We tested for enrichment in C2 and C5 gene sets (N = 10,679) from MSigDB. Ingenuity Pathway Analysis for Cellular Functions (https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/) was performed on the resulting MAST differentially expressed genes.
Lineage Reconstruction via RNA Velocity.
Velocity streams and inference of root cells were generated using scvelo version 0.1.23 (30). Using cycling assignments, we performed the lineage reconstruction with and without cycling cells and found root nodes present in noncycling cells. Root and terminal points were found using the terminal_states function in scvelo, which uses Markov processes on the transition probability matrix of the cells to find root and end points.
Ethics Approval and Consent to Participate.
Study protocols were in accordance with the Weill Cornell Medicine Institutional Review Board. All clinical samples were analyzed in a deidentified fashion. All experiments were carried out in conformity to the principles set out in the WMA Declaration of Helsinki. Written informed consent was provided by all patients (tumor acquisition protocols: Universal Database protocol, IRB #1302013582; and Precision Medicine protocol, IRB #1305013903).
Data and Code Availability
Data generated for this study have been deposited in the NCBI Sequence Read Archive (PRJNA595375). The code supporting the current study is available from the corresponding author upon request.
Disclosure of Potential Conflicts of Interest
O. Elemento is a consultant at Volastra Therapeutics, reports receiving commercial research support from Eli Lilly, and has ownership interest (including patents) in Volastra Therapeutics and One Three Biotech. No potential conflicts of interest were disclosed by the other authors.
Conception and design: A.R. Pine, S.M. Cirigliano, A. Linkous, K. Miyaguchi, M. Snuderl, H.A. Fine
Development of methodology: A.R. Pine, S.M. Cirigliano, A. Linkous, K. Miyaguchi, O. Elemento, H.A. Fine
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.M. Cirigliano, A. Linkous, K. Miyaguchi, L. Edwards, R. Singhania, T.H. Schwartz, R. Ramakrishna, D.J. Pisapia, O. Elemento, H.A. Fine
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.R. Pine, S.M. Cirigliano, J.G. Nicholson, Y. Hu, M. Snuderl, O. Elemento, H.A. Fine
Writing, review, and/or revision of the manuscript: A.R. Pine, S.M. Cirigliano, J.G. Nicholson, A. Linkous, R. Ramakrishna, D.J. Pisapia, M. Snuderl, O. Elemento, H.A. Fine
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R. Singhania, D.J. Pisapia, H.A. Fine
Study supervision: O. Elemento, H.A. Fine
Other (preparation of GFP-expressing GSC lines): K. Miyaguchi
Other (experimental support): L. Edwards
We would like to thank the patients whose sample donations were used in this study. We also thank the WCM Applied Bioinformatics Core, the WCM Flow Cytometry Core, the WCM Epigenomics Core, and the WCM Microscopy and Image Analysis Core Facility for their support. Work in H.A. Fine's laboratory is supported by an NIH Director Pioneer Award (1DP1CA228040-01). A.R. Pine was supported by the Tri-Institutional Training Program in Computational Biology and Medicine funded by NIH grant 1T32GM083937.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.