Gliomas are primary brain tumors with high mortality and heterogeneous biology that is insufficiently understood. In this study, we performed a systematic analysis of the intrinsic organization of complex glioma transcriptome to gain deeper knowledge of the tumor biology. Gene coexpression relationships were explored in 790 glioma samples from 5 published patient cohorts treated at different institutions. We identified 20 coexpression modules that were common to all the data sets and associated with proliferation, angiogenesis, hypoxia, immune response, genomic alterations, cell differentiation phenotypes, and other features inherent to glial tumors. A collection of high-quality signatures for the respective processes was obtained using cross‐data set summarization of the modules' gene composition. Individual modules were found to be organized into higher order coexpression groups, the two largest of them associated with glioblastoma and oligodendroglioma, respectively. We identified a novel prognostic gene expression signature (185 genes) linked to a proastrocytic pattern of tumor cell differentiation. This “proastrocytic” signature was associated with long survival and defined a subgroup of the previously established “proneural” class of gliomas. A strong negative correlation between proastrocytic and proneural markers across differentiated tumors underscored the distinction between these subtypes of glioma. Interestingly, one further novel signature in glioma was identified that was associated with EGFR (epidermal growth factor receptor) gene amplification and suggested that EGF signaling in glioma may be a subject to regulation by Sprouty family proteins. In summary, this integrated analysis of the glioma transcriptome provided several novel insights into molecular heterogeneity and pathogenesis of glial tumors. Cancer Res; 70(24); 10060–70. ©2010 AACR.

Gliomas are the most common primary brain tumors in adults (1). Despite advances in their surgery, radiotherapy, and chemotherapy, the majority of gliomas remain lethal within 1 to 3 years from the time of diagnosis (2). A key challenge for diagnosing and treatment of glial tumors is their high interindividual heterogeneity (3). Histologically, gliomas are classified into 2 major subtypes: astrocytoma and oligodendroglioma (1), which are further divided into grades ranging from I (benign tumors) to IV (malignant glioblastoma) according to presence of anaplastic features such as necrosis, mitotic figures, vascular proliferation and others (1). However, tumors falling within the same category often substantially differ in their clinical behavior which reflects their diversity at the molecular level (4). Therefore, molecular approaches for investigation of the tumors heterogeneity and pathogenesis are required (3).

Microarray gene expression profiling offers an opportunity for genome-scale, quantitative evaluation of glioma biology by simultaneously measuring expression levels for thousands of genes (5). Molecular subtypes of glioma determined according to tumor expression profiles were shown to correlate with survival more accurately than does conventional histology (6, 7). Due to this higher accuracy in capturing the tumor phenotype, expression-based classifications offer an opportunity to improve clinical diagnostics of gliomas (6–8). Gene expression analyses have also yielded insights into molecular aspects of tumor progression (9), recurrence (8), and differentiation (10). However, a key challenge remains the high intrinsic complexity of the glioma transcriptome, which encompasses expression profiles of thousands of genes driven by a wide range of factors. This complexity hinders development of a standard expression-based classification of gliomas, different studies suggesting from 3 to 6 transcriptional glioma subtypes (6, 8, 11, 12), and leads to the need for an integrated view of expression variation in the tumors. We hypothesized that a more structured approach may facilitate understanding of the glioma transcriptome and provide insight into molecular diversity and pathogenesis of glial tumors.

Analysis of gene coexpression relationships is a powerful method for exploring intrinsic organization of a transcriptome (13). In contrast to the conventional differential expression analysis which searches for expression changes of individual genes between given conditions, coexpression networks explore gene-to-gene relationships and reveal intrinsic modules of coordinately expressed genes in an unsupervised way (14). This delineates expression changes that are driven by distinct cellular processes and provides a functional framework for understanding gene expression patterns of the tumors (13, 15). Several studies based on this approach yielded insights into oncogenic signaling pathways (15), chemotherapy resistance (16), and molecular heterogeneity of glioma (17, 18). Although resulting in important findings, these studies were limited in their ability to provide an integrated view of the structure of the tumor transcriptome due to methodologic constraints: 1) incomplete genome coverage (17, 18), 2) restriction of the analysis to only glioblastoma tumors (15, 16, 18), 3) use of algorithms that produce “tight” coexpression modules with a limited gene content (16, 17).

Here, we apply improved methods based on weighted gene coexpression network analysis (WGCNA; ref. 14) to publicly available microarray data covering multiple histologic types of glioma and the majority of genes in the genome. We identified genome-scale modules of coexpressed genes with clear functional interpretations and obtained an overview of the higher order organization of the transcriptome based on module-to-module relationships. The analysis revealed a novel prognostic signature which was associated with a proastrocytic pattern of tumor cell differentiation and provides potential gene markers for tumor subtype diagnostics. Our results also suggest a functional link between Sprouty proteins and oncogenic epidermal growth factor receptor (EGFR) signaling in glioma.

Microarray data

We analyzed 5 microarray data sets from 4 gene expression studies in glioma (6, 8, 11, 12). The data sets were originally generated using Affymetrix arrays (U133 Plus 2.0 and U133A platforms). We labeled the data sets according to the names of the first authors in the studies: “Gravendeel” (6), “Li_Train” (12), “Li_Test” (12), “Phillips” (8), and “Freije” (11). The primary analysis was focused on the Gravendeel data set as it contained the largest number of samples. Normalization of the raw data sets was performed using MAS5 algorithm (19) followed by quantile normalization. See Supplementary Methods for further information on the data sets and their normalization.

Coexpression module detection

WGCNA (14) was performed in the Gravendeel data set. A coexpression network was constructed on the basis of 4,000 genes with highest expression variance (20). For all possible pairs of the variable genes, Pearson correlation coefficients were calculated across all samples. The correlations matrix was raised to the power 9, thus producing a weighted network (weighted correlation = correlation9, Supplementary Methods; ref. 14). The weighted network was transformed into a network of topological overlap (TO)—an advanced coexpression measure that considers not only the correlation of 2 genes with each other, but also the extent of their shared correlations across the weighted network (14). Genes were hierarchically clustered on the basis of their TO. Modules were identified on the dendrogram using the Dynamic Tree Cut algorithm (21). The resulting modules were tested for sensitivity to random removal of samples, and the stable modules were retained (see Supplementary Methods for details).

For each gene, we determined its connectivity within its module of residence by summing up the TOs of the gene with all the other genes in the module. By definition, highly connected (hub) genes display expression profiles highly characteristic for their module of residence (22). To obtain a condensed representative expression profile of each module (module eigengene, ME; ref. 22), we summarized expression levels of the top 10 hub genes in the module (Supplementary Methods; ref. 14). The resulting MEs were used to extend the modules' gene composition from the 4,000 network genes up to the genome scale (∼19,000 genes) using a newly developed procedure (Supplementary Methods).

Functional annotation of the modules

Functional annotation of the modules was performed on the basis of analysis of their gene composition. We used DAVID (http://david.abcc.ncifcrf.gov/) to test each module for enrichment in genes with particular GO, SwissProt, and InterPro terms compared with the background list of all genes on the array (23). Association of modules with genomic rearrangements was detected using DAVID on the basis of overrepresentation of genes encoded at neighboring chromosomal locations (23). Relation of modules to differentiation programs and cell types was detected on the basis of enrichment in genes preferentially expressed in purified populations of neurons, astrocytes or oligodendrocytes (Fisher's exact test; ref. 24). In addition, we related modules to processes in glioma on the basis of the literature data.

Classification of glioma samples

For each gene, data were normalized into Z-scores before clustering. In the Gravendeel data set, we performed a k-means clustering of samples into 5 groups on the basis of the top 25 hub genes from the consensus modules M8, M11, M15, M20 (a total of 100 genes, Euclidean distance). For each cluster, a centroid was obtained that consisted of average expression values of the 100 genes. Tumors in the 5 independent data sets were assigned to classes according to the nearest centroid as measured by Euclidean distance between the centroids and expression levels of the 100 marker genes.

Reproducing of the Phillips subtypes

To compare our subtypes of glioma with those proposed by Phillips and colleagues (8), we reproduced the centroids (Mes, Prolif, and PN) described in the original study by calculating mean expression levels of the published centroid genes (8) in the published groups of samples (8) within the Phillips data set. Samples in each of the 5 microarray datasets were assigned to a class on the basis of the nearest centroid (Euclidean distance).

Gene set enrichment analysis

Gene set enrichment analysis (GSEA; ref. 25) was used to compare tumors in the “cluster 17” (38 samples) versus all the other tumors (238 samples) in the Gravendeel data set. The analyzed gene set (424 genes) was compiled of genes that showed at least a 5-fold upregulation in astrocytes compared to neurons and oligodendrocytes (24). The analysis was performed by randomly permuting phenotype labels 1,000 times (25).

Assessment of correlation between the proastrocytic and proneural markers

From each data set, we selected differentiated tumors, that is those where the proastrocytic (M15) and/or neurogenesis (M20) modules were upregulated (Z-score normalized expression values of the proastrocytic ME and/or proneural ME higher than 1.0). Pearson correlation coefficients were calculated between the proastrocytic ME and the neurogenesis ME across these tumors in each data set.

Detection of gene coexpression modules

To investigate functional organization of the glioma tran scriptome, we analyzed the largest-to-date public microarray glioma data set generated by Gravendeel and colleagues (Gravendeel data set; ref. 6). The data set consisted of 276 tumor samples including glioblastoma, anaplastic astrocytoma, anaplastic oligodendroglioma, low-grade astrocytoma, and low-grade oligodendroglioma (Supplementary Table S1). We applied WGCNA algorithm to explore transcriptional relationships between genes in this data set (14). A weighted gene coexpression network was constructed on the basis of 4,000 highly informative genes (14, 20). Because WGCNA evaluates coexpression for a pair of genes based on their topological overlap (TO), which considers not only the correlation of the 2 genes with each other, but also the degree of their shared neighbors across the entire network, the analysis highlighted consistent patterns of gene coexpression in the transcriptome (see Methods). Hierarchical clustering of the TO network followed by a screening for cluster stability revealed 20 stable modules (Fig. 1; modules M1–M20), each of them containing coordinately expressed genes potentially involved in shared cellular processes.

Figure 1.

Network analysis of gene expression in glioma identifies modules of coexpressed genes. The dendrogram was produced by unsupervised hierarchical clustering of genes on the basis of topological overlap (TO). Coexpression modules were detected by applying a dynamic branch cutting algorithm to the dendrogram. The analysis revealed 22 modules marked by colors on the horizontal bar. Modules M' and M" were unstable (see Methods section) and therefore were excluded from further analysis.

Figure 1.

Network analysis of gene expression in glioma identifies modules of coexpressed genes. The dendrogram was produced by unsupervised hierarchical clustering of genes on the basis of topological overlap (TO). Coexpression modules were detected by applying a dynamic branch cutting algorithm to the dendrogram. The analysis revealed 22 modules marked by colors on the horizontal bar. Modules M' and M" were unstable (see Methods section) and therefore were excluded from further analysis.

Close modal

To determine gene composition of the modules at a genome scale, we first obtained a condensed expression profile for each module (i.e., ME) by summarizing expression patterns of the module hub genes into a single characteristic expression profile (14, 22). The resulting MEs captured the major expression trends of the modules and served to represent each module in a summarized form. We used the MEs in a newly developed procedure to extend the modules from the 4,000 seed genes to the entire data set of about 19,000 genes (Supplementary Methods), which increased the modules' size on average 3-fold. Thus, the analysis resulted in determination of 20 genome-scale coexpression modules in the Gravendeel data set.

Coexpression modules are highly preserved across independent patient cohorts

Microarrays are notorious for differences in gene expression profiles across data sets (26). To test whether the identified modules could be found in independent patient cohorts, we analyzed 4 additional public expression data sets from distinct institutions (see Supplementary Methods for details on the datasets). We first projected MEs from the Gravendeel data set onto each of the 4 additional patient cohorts and next used the ME projections to determine gene composition of the modules in these data sets (Supplementary Methods). A striking cross–data set similarity of the resulting modules was observed (Fig. 2), with P values ranging from 10−11 to 10−619 (see Supplementary Table S2). Furthermore, projections of a given module were always more similar to each other than to projections of the other modules. This confirmed that the modules are common across the data sets generated at distinct medical institutions and represent a consistent feature of the glioma transcriptome organization.

Figure 2.

Modules are highly reproducible across independent patient cohorts. Modules were compared in all possible pairs of the 5 data sets. For each module, a horizontal bar depicts mean percentage of overlapping genes across the 5 data sets. Statistical significance of the overlap is denoted by the Fisher's exact test P values (median P values across all possible pairs of the data sets). The “Size” column shows the number of genes in the consensus modules, which were compiled of genes belonging to the same module in at least 3 of the 5 data sets.

Figure 2.

Modules are highly reproducible across independent patient cohorts. Modules were compared in all possible pairs of the 5 data sets. For each module, a horizontal bar depicts mean percentage of overlapping genes across the 5 data sets. Statistical significance of the overlap is denoted by the Fisher's exact test P values (median P values across all possible pairs of the data sets). The “Size” column shows the number of genes in the consensus modules, which were compiled of genes belonging to the same module in at least 3 of the 5 data sets.

Close modal

Despite the high overall consistency, we found some genes to be data set–specific. To determine the “core” gene composition of the modules, we compiled consensus modules by including genes that belonged to the same module in at least 3 of the 5 patient cohorts (FDR, false discovery rate, < 2%; Supplementary Methods). This resulted in 20 modules with a refined gene composition that are provided in Supplementary Table S3. Although similar meta-analysis approaches have been used in other cancer types (27, 28), this is, to our knowledge, the first meta-analysis of gene coexpression modules in glioma. The consensus modules represent a collection of high-quality gene expression signatures (Supplementary Table S3) and transcriptionally characterize the key processes inherent to glial tumors (see in the following text).

Coexpression modules correspond to distinct cellular processes and structures

To determine which processes and structures underlie the observed gene coexpression, we statistically analyzed gene composition of the consensus modules. The modules were found to be associated with a wide range of functions that can be grouped into several categories: processes commonly involved in cancer (e.g., proliferation, hypoxia, angiogenesis, and immune response; ref. 29), glioma differentiation phenotypes (mesenchymal, proneural, and proastrocytic; refs. 1, 8), genetic alterations (e.g., 12q13 amplification and 10q deletion; refs. 3, 30), normal brain functions (infiltrated neurons and myelination), and others (Table 1). A detailed functional characterization of the modules based on their gene composition is provided in the Supplementary Table S4.

Table 1.

Functional annotation of the modules

ModuleTop overrepresented termaCharacteristic genesbOverall annotationc
General cancer functions 
M11 Mitosis (10−55CDK2, CENPA, E2F1, MCM2, TOP2A Proliferation 
M4 Glycolysis (10−4ADM, ANG, ENO1, VEGFA Hypoxia 
M1 Vasculature development (10−7CDH5, ENG, ESAM, FLT1, PECAM1 Endothelium 
M6 Collagen (10−16COL1A1, COL3A1, COL5A1, LAMB1 Blood vessels 
M7 Immune response (10−48CD4, CD8, MHC, TLR1, TLR2 Immune response 
M3 Protein kinase regulation (10−9NES, DUSP4, DUSP6, SPRY2 Protein kinase regulation 
M2 Response to virus (10−18IFI35, IFI44, IRF7, IRF9, MX1 IFN-induced genes 
Glioma differentiation phenotypes 
M8 Mesenchymal markers (10−29CHI3L1, FAS, TAGLN, TNC, VIM Mesenchymal differentiation 
M15 Astrocytic markers (10−26APOE, ID4, LIFR, PAX6, PEA15 Proastrocytic differentiation 
M19 Neuronal markers (10−44INA, L1CAM, MYT1L, NRXN1, SEPT3 Proneural differentiation 
M20 Neurogenesis (10−6ASCL1, BMP2, DLL3, HES6, NLGN1 Neurogenesis 
Genomic alterations 
M5 19 chromosome (10−49BAX, ERCC1 19q deletion 
M9 1 chromosome (10−89AK2, EFNA4, HDAC1, MYCBP 1p deletion 
M10 12q13 cytoband (10−8MDM2 12q13 amplification 
M16 10 chromosome (10−19ABI1, MXI1 10q deletion 
Normal brain functions 
M13 Neuron markers (10−95GABRA1, GRIN1, KCNA1, SYN1 Infiltrated neurons 
M14 Axon ensheathment (10−5MBP, MAG, MOBP, MOG Myelination 
Others 
M12 Nucleus (10−12TOP1, SMC3 Nucleus 
M17 Nucleus (10−16CCNJ, CTCF, TBP Nucleus 
M18 Protein synthesis (10−28EEF2, EIF2A, RPL4, RPS6 Protein synthesis 
ModuleTop overrepresented termaCharacteristic genesbOverall annotationc
General cancer functions 
M11 Mitosis (10−55CDK2, CENPA, E2F1, MCM2, TOP2A Proliferation 
M4 Glycolysis (10−4ADM, ANG, ENO1, VEGFA Hypoxia 
M1 Vasculature development (10−7CDH5, ENG, ESAM, FLT1, PECAM1 Endothelium 
M6 Collagen (10−16COL1A1, COL3A1, COL5A1, LAMB1 Blood vessels 
M7 Immune response (10−48CD4, CD8, MHC, TLR1, TLR2 Immune response 
M3 Protein kinase regulation (10−9NES, DUSP4, DUSP6, SPRY2 Protein kinase regulation 
M2 Response to virus (10−18IFI35, IFI44, IRF7, IRF9, MX1 IFN-induced genes 
Glioma differentiation phenotypes 
M8 Mesenchymal markers (10−29CHI3L1, FAS, TAGLN, TNC, VIM Mesenchymal differentiation 
M15 Astrocytic markers (10−26APOE, ID4, LIFR, PAX6, PEA15 Proastrocytic differentiation 
M19 Neuronal markers (10−44INA, L1CAM, MYT1L, NRXN1, SEPT3 Proneural differentiation 
M20 Neurogenesis (10−6ASCL1, BMP2, DLL3, HES6, NLGN1 Neurogenesis 
Genomic alterations 
M5 19 chromosome (10−49BAX, ERCC1 19q deletion 
M9 1 chromosome (10−89AK2, EFNA4, HDAC1, MYCBP 1p deletion 
M10 12q13 cytoband (10−8MDM2 12q13 amplification 
M16 10 chromosome (10−19ABI1, MXI1 10q deletion 
Normal brain functions 
M13 Neuron markers (10−95GABRA1, GRIN1, KCNA1, SYN1 Infiltrated neurons 
M14 Axon ensheathment (10−5MBP, MAG, MOBP, MOG Myelination 
Others 
M12 Nucleus (10−12TOP1, SMC3 Nucleus 
M17 Nucleus (10−16CCNJ, CTCF, TBP Nucleus 
M18 Protein synthesis (10−28EEF2, EIF2A, RPL4, RPS6 Protein synthesis 

aFunctional terms overrepresented with highest statistical significance in the given module (Fisher's exact test P values are given in brackets).

bGenes that are functionally typical for their module of residence.

cOverall annotation was obtained by combining results of enrichment analysis performed on multiple information sources (Supplementary Table S3).

Higher order organization of the transcriptome

Distinct processes, such as angiogenesis and hypoxia, are known to interact in cancer (29). We hypothesized that individual modules may, therefore, be organized into a higher order structure which describes functional relationships between them. To test this, we clustered modules and patients in the Gravendeel data set. The modules were clustered on the basis of Pearson correlations between their MEs (Fig. 3, dendrogram of modules on the left), whereas the patients were clustered on the basis of expression levels of the top 25 hub genes from each module (Fig. 3, dendrogram of patients at the top). The clustering was robust to the choice of the number of hub genes (Supplementary Fig. S1) and reproducible across the data sets (Supplementary Fig. S2). To cross-reference the expression patterns with clinical behavior of the tumors, we displayed clinical characteristics, such as tumor grade, histopathologic diagnosis, and survival below the clustered heatmap (Fig. 3, panels at the bottom). We also analyzed the clinical data quantitatively on the basis of correlations of these parameters with the MEs (Table 2).

Figure 3.

Higher order organization of the glioma transcriptome. Columns, patients; rows, genes (25 genes per module). Red denotes higher expression relative to green. The figure was obtained on the Gravendeel data set. Modules were hierarchically clustered on the basis of correlations between their eigengenes (MEs); the resulting module clusters are marked with bars (clusters A, B, C, D and E). Tumors (276 samples) were hierarchically clustered on the basis of the top 25 hub genes from each module. Module annotations are provided on the right of the heatmap. Underlined are the modules that were used for classifying the glioma samples into prognostic subtypes (see Results). Patient characteristics are shown below the heatmap: tumor grade, histology, patient survival, and tumor expression-based subtype determined in the present study.

Figure 3.

Higher order organization of the glioma transcriptome. Columns, patients; rows, genes (25 genes per module). Red denotes higher expression relative to green. The figure was obtained on the Gravendeel data set. Modules were hierarchically clustered on the basis of correlations between their eigengenes (MEs); the resulting module clusters are marked with bars (clusters A, B, C, D and E). Tumors (276 samples) were hierarchically clustered on the basis of the top 25 hub genes from each module. Module annotations are provided on the right of the heatmap. Underlined are the modules that were used for classifying the glioma samples into prognostic subtypes (see Results). Patient characteristics are shown below the heatmap: tumor grade, histology, patient survival, and tumor expression-based subtype determined in the present study.

Close modal
Table 2.

Module expression levels are correlated with the patients' clinical characteristics

ModuleGroupaAnnotationGradebDiagnosiscSurvivaldCOX P value
M1 Endothelium 0.37 GBM Short 1.7 × 10−2 
M2  IFN-induced genes 0.39 GBM Short 1.3 × 10−7 
M3  Protein kinase regulation 0.44 GBM Short <10−16 
M4  Hypoxia 0.59 GBM Short 1.8 × 10−14 
M5  19q deletion 0.48 GBM Short 6.7 × 10−9 
M6  Blood vessels 0.52 GBM Short 1.8 × 10−6 
M7  Immune response 0.38 GBM Short 2.6 × 10−7 
M8  Mesenchymal differentiation 0.58 GBM Short <10−16 
M9  1p deletion 0.52 GBM Short 6.6 × 10−16 
M10 12q13 amplification 0.25 GBM Short 3.7 × 10−3 
M11  Proliferation 0.40 GBM Short 2.1 × 10−3 
M12  Nucleus 
M13 Normal neurons −0.28 Normal 
M14  Myelination Normal 
M15 Proastrocytic differentiation −0.34 Astro, Normal Long 1.8 × 10−4 
M16 10q deletion −0.59 Astro, Oligo, Normal Long <10−16 
M17  Nucleus −0.42 Oligo Long 1.7 × 10−10 
M18  Protein synthesis −0.40 Oligo Long 2.0 × 10−12 
M19  Proneural differentiation −0.54 Oligo, normal Long 2.0 × 10−11 
M20  Neurogenesis −0.57 Oligo Long 6.0 × 10−14 
ModuleGroupaAnnotationGradebDiagnosiscSurvivaldCOX P value
M1 Endothelium 0.37 GBM Short 1.7 × 10−2 
M2  IFN-induced genes 0.39 GBM Short 1.3 × 10−7 
M3  Protein kinase regulation 0.44 GBM Short <10−16 
M4  Hypoxia 0.59 GBM Short 1.8 × 10−14 
M5  19q deletion 0.48 GBM Short 6.7 × 10−9 
M6  Blood vessels 0.52 GBM Short 1.8 × 10−6 
M7  Immune response 0.38 GBM Short 2.6 × 10−7 
M8  Mesenchymal differentiation 0.58 GBM Short <10−16 
M9  1p deletion 0.52 GBM Short 6.6 × 10−16 
M10 12q13 amplification 0.25 GBM Short 3.7 × 10−3 
M11  Proliferation 0.40 GBM Short 2.1 × 10−3 
M12  Nucleus 
M13 Normal neurons −0.28 Normal 
M14  Myelination Normal 
M15 Proastrocytic differentiation −0.34 Astro, Normal Long 1.8 × 10−4 
M16 10q deletion −0.59 Astro, Oligo, Normal Long <10−16 
M17  Nucleus −0.42 Oligo Long 1.7 × 10−10 
M18  Protein synthesis −0.40 Oligo Long 2.0 × 10−12 
M19  Proneural differentiation −0.54 Oligo, normal Long 2.0 × 10−11 
M20  Neurogenesis −0.57 Oligo Long 6.0 × 10−14 

Note: The results were obtained on the basis of the Gravendeel data set. Dash (-) means “not significant”.

aModule groups are based on Figure 3.

bSpearman correlation coefficient between MEs and the tumor histologic grade.

cHistologic glioma subtypes in which the modules are upregulated (P < 0.05, Wilcoxon test based on MEs).

dAssociation of MEs with survival as determined by univariate COX regression analysis.

Notable in this structured view was the presence of 2 large module coexpression groups (groups A and E) that were expressed in a mutually exclusive way (Fig. 3). These groups consisted of the modules associated with poor and favorable prognoses, respectively (Table 2). Specifically, group A contained the modules associated with cancer progression and was linked to glioblastoma, the most aggressive glioma type (Fig. 3). In contrast, group E included modules related to neurogenesis and cell proneural differentiation and was activated in oligodendroglioma which is less aggressive (Fig. 3). The large scope and strong intensity of this mutually exclusive expression pattern of the 2 module groups underscores biological divergence between glioblastoma and oligodendroglioma. In line with this, a recent study of molecular heterogeneity in brain tumors suggested that gliomas fall into 2 major molecular classes: G type (glioblastoma-like) and O type (oligodendroglioma-like; ref. 12).

The structured view of the transcriptome also illuminates cell differentiation programs in glioma. As shown in the Figure 3, the functionally related modules M19 and M20 associated with proneural differentiation and neurogenesis, respectively, showed a notable distinction in their expression patterns: M20 was upregulated in oligodendroglioma only, whereas M19 was activated both in oligodendroglioma and in the samples containing a large amount of normal neurons (Table 2, Fig. 3). Together with the module M13 which was expressed selectively in the neuron-containing samples, the trio of these modules (M13, M19, M20) thus describes the shared and unique transcriptional features of normal neurons and proneural oligodendroglioma. These results are consistent with the previous reports claiming that although tumor cells in oligodendroglioma tend to morphologically resemble oligodendrocytes (1, 3), they are typically marked by a proneural pattern of gene expression including activation of multiple genes characteristic for mature neurons (M19, e.g., INA, L1CAM, MYT1L, NRXN1, SEPT3; refs. 10, 31). At the same time, oligodendroglioma is clearly distinguished from nonneoplastic brain tissue due to low expression levels of other neuronal genes (M13, e.g., GABRA1, GRIN1, KCNA1, SYN1) and elevated expression of genes characteristic for immature progenitors of brain cells (M20, e.g., ASCL1, BMP2, DLL3, HES6, NLGN1; Fig. 3; ref. 10). Taken together, individual modules are organized into a higher order structure that correlates with clinical characteristics and provides insight into biology of glioma.

The module of proastrocytic differentiation provides novel markers of longer survival

Gliomas are notorious for their molecular and clinical heterogeneity (1, 3). To determine molecular subtypes of glioma predictive of its clinical behavior, we examined the patients' clustering in the Gravendeel data set (Fig. 3). Patients with distinct diagnoses (glioblastoma, lower grade astrocytoma, and oligodendroglioma) tended to segregate into separate clusters. Specifically, the glioblastoma-enriched cluster was marked by upregulation of the modules related to tumor mesenchymal differentiation (M8) and proliferation (M11), whereas the clusters enriched in astrocytoma and oligodendroglioma showed an upregulation of the proastrocytic (M15) and neurogenesis (M20) modules, respectively (Fig. 3).

The mesenchymal proliferation and neurogenesis modules are known to define prognostic subtypes of glioma (8, 11). In contrast, the proastrocytic module represents a novel signature, as only a few individual expression markers of glioma proastrocytic differentiation have been reported so far [e.g., APOE (32), ID4 (33), PEA15 (34)]. To evaluate prognostic relevance of the proastrocytic signature in the context of the known glioma subtypes, we reclustered tumors on the basis of hub genes from the mesenchymal (M8), proliferation (M11), neurogenesis (M20), and proastrocytic (M15) modules only. The k-means clustering defined the 4 tumor subclasses (mesenchymal, proliferative, proneural, and proastrocytic) each marked by upregulation of the respective module, as well as the “other” subclass in which all of these modules were expressed at a low level (Supplementary Fig. S3). For each subclass, we calculated a centroid by averaging expression values of the marker genes (Supplementary Table S5) and the centroids were used to classify tumors in the 5 independent data sets (see Methods section). The resulting stratification of samples into classes and the underlying expression patterns are presented in Supplementary Table S6 and Figure S5, respectively.

We compared survival between the obtained patient groups (Fig. 4). In agreement with the previous studies (8, 11), the mesenchymal and proliferative subtypes were associated with poor, whereas the proneural subtype with favorable prognosis (Fig. 4). The “other” subtype was characterized by short survival. Notably, in all the datasets, patients with proastrocytic tumors survived on average several fold longer than those with tumors of the mesenchymal, proliferative, and “other” subtypes (P = 0.005 for the least significant data set, log-rank test; Fig. 4). Survival rates in the proastrocytic and proneural subtypes were highly similar (P > 0.1). Thus, the proastrocytic subtype is strongly and reproducibly associated with favorable prognosis.

Figure 4.

Proastrocytic module defines a glioma subtype reproducibly associated with long survival. Kaplan‐Meier survival plots for the molecular subtypes of glioma in the independent patient cohorts: Gravendeel (A), Li Train (B), Phillips (C), Freije (D). The mesenchymal, proliferative, and “other” subtypes were reproducibly associated with short survival; the proneural and proastrocytic subtypes, with long survival. P values refer to the difference in mean survival time between the proastrocytic subtype and the union of the mesenchymal, proliferative, and “other” subtypes (log-rank test). The “Li Test” data set was excluded from the analysis due to absence of accurate survival data.

Figure 4.

Proastrocytic module defines a glioma subtype reproducibly associated with long survival. Kaplan‐Meier survival plots for the molecular subtypes of glioma in the independent patient cohorts: Gravendeel (A), Li Train (B), Phillips (C), Freije (D). The mesenchymal, proliferative, and “other” subtypes were reproducibly associated with short survival; the proneural and proastrocytic subtypes, with long survival. P values refer to the difference in mean survival time between the proastrocytic subtype and the union of the mesenchymal, proliferative, and “other” subtypes (log-rank test). The “Li Test” data set was excluded from the analysis due to absence of accurate survival data.

Close modal

We compared our subtypes with those from the earlier studies (6, 8, 11, 12), by calculating proportions of patients shared between them in the respective data sets (Supplementary Methods). Because we observed a substantial overall agreement of our results with those from the previous studies (Supplementary Table S7), we further focused on a detailed comparison of the proastrocytic subtype.

The proastrocytic subtype was found similar to “cluster 17” that was identified by Gravendeel and colleagues as a biologically uncharacterized group of tumors with favorable prognosis (ref. 6; P < 3 × 10−7, Fisher's exact test). Using GSEA (25), we found that this cluster is characterized by upregulation of known astrocyte-lineage markers (24) compared with the rest of the tumors (P < 0.001, Supplementary Methods). This provides a functional annotation to “cluster 17” and suggests that tumors marked by upregulation of proastrocytic genes have previously been observed as a distinct cluster (6); however, it was not recognized as related to proastrocytic differentiation. These results, thus, for the first time, implicate proastrocytic differentiation as a feature that substantially contributes to expression-based heterogeneity of glioma.

To accurately compare our subtypes with the classification introduced by Phillips and colleagues (8), we classified tumors in each of the data sets into the Mes, Prolif, and PN classes using the respective centroids from the original publication (ref. 8; Supplementary Methods). We found that, in all the data sets, proastrocytic tumors represented strictly a subgroup of the PN class (Supplementary Table S6). In line with this, proastrocytic and proneural genes were jointly upregulated in many tumors which indicates their mixed differentiation phenotype (Supplementary Fig. S4). However, a strong and reproducible negative correlation between expression levels of the proastrocytic and proneural modules in the differentiated gliomas was observed (Supplementary Fig. S5). This inverse relationship between the 2 differentiation programs suggests that the proastrocytic and proneural tumors represent biologically distinct subtypes of glioma (see Supplementary Discussion).

Because proastrocytic and proneural tumors have similar survival rates, addition of the proastrocytic subtype into the classification scheme does not improve the overall strength of association between gene expression and survival. However, as a distinct group of markers, the proastrocytic genes may be important for confirmation of favorable survival prognosis in ambiguous cases during clinical diagnostics. Further research may also identify a distinction in clinical behavior between tumors with proastrocytic and proneural differentiation. Within the proastrocytic module M15 (Supplementary Table S3), genes with the highest mean value of kME (see Supplementary Methods; e.g., DAAM2, NRTK2, AGXT2L1, ALDH2, ADD3, and others) are expected to be the most robust markers of the proastrocytic subtype for potential use in diagnostics.

Taken together, the previously unrecognized proastrocytic module defines a molecular subtype of glioma associated with favorable prognosis. This module complements the known prognostic signatures (“mesenchymal”, “proliferative”, and “proneural”) and offers potential novel markers for diagnostics of the glioma subtypes.

Sprouty genes appear to be involved in regulation of EGFR signaling in glioma

Molecular heterogeneity of gliomas represents a challenge for development of therapies targeted at signaling pathways with aberrant activity in the tumors (2). The EGFR signaling pathway is one of those known to be constitutively activated in a large fraction of highly aggressive gliomas, often via EGFR gene amplification (35). Although this oncogenic pathway is therapeutically promising, molecular mechanisms by which it is regulated are still insufficiently understood (36). We, therefore, analyzed how expression levels of the transcriptomic modules are correlated with the amplification status of the EGFR gene (Supplementary Table S8). The module of protein kinase regulation M3 was found to be, by far, the most significantly and strongly upregulated module upon EGFR amplification (P = 4.6 × 10−12, Wilcoxon test). Furthermore, this module was also the one whose expression profile was most strongly correlated with the expression level of EGFR (Pearson correlation = 0.37, P = 1.3 × 10−10; Supplementary Table S8).

We analyzed gene composition of the protein kinase regulation module (M3) in more detail and found it to contain a number of genes known to play significant roles in glioma such as stem cell marker NES, platelet-derived growth factor PDGFA and a poor prognosis marker FABP7 (Supplementary Table S3). The module also contained genes related to cell adhesion, such as cadherins (CDH2, CDH4, CDH6) and integrins (ITGA7, ITGB8), which suggests its potential upregulation in more invasive tumors (37, 38). Meanwhile, a notable feature of the module M3 was overrepresentation of genes involved in regulation of the mitogen-activated protein kinase signaling cascade (MAPK cascade regulation, P = 1.8 × 10−4) which is known to act downstream of EGFR (39). Specifically, the module included dual specificity phosphatases (DUSP4, DUSP6), Sprouty genes (SPRY1, SPRY2, SPRY4) and genes of the Sprouty-related family (SPRED1, SPRED2; Supplementary Table S3).

Sprouty proteins were shown to regulate EGFR signaling in normal human cells (40, 41), but it is unknown whether they also play a role in glioma. The observed coexpression of Sprouty and Sprouty-related genes (P = 2.8 × 10−8) suggests that they are coordinately activated in glial tumors with amplified EGFR and may regulate EGFR signaling in these tumors. Although Sprouty were originally discovered as negative regulators of MAPK signaling (42, 43), SPRY2 has been shown to inhibit ubiquitination and endocytosis of EGFR and consequently enhance EGFR signaling in non–brain human cells (40, 41). This raises the possibility that Sprouty genes that are activated in tumors with amplified EGFR act to sustain constitutively increased EGFR signaling, thus functioning as oncogenes in glioma. While establishing the role of Sprouty in pathogenesis of glioma requires further research, our data indicate that these genes may play an important role in biology of glial tumors and may represent potential targets for antiglioma therapy.

In summary, we explored the structure of glioma transcriptome and found it to be highly similar across multiple tumor sets from distinct medical institutions. The microarray data meta-analysis allowed determination of refined expression signatures for multiple processes and provided a systems-level view of gene expression variation in glioma. The structured view of the transcriptome uncovered a proastrocytic gene expression signature associated with favorable prognosis and revealed that Sprouty proteins which modify EGF signaling likely participate in glioma pathogenesis.

No potential conflicts of interest were disclosed.

We thank Dr. P.J. French for providing us the Gravendeel data set before it became accessible through the GEO database.

This research was supported by a grant from the Centre for Medical Systems Biology within the framework of the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) and grant RFBR-10-04-01385-аa from the Russian Foundation for Basic Research. AEI, MGS, Russian Foundation for Basic Research (grant RFBR-10-04-01385-аa); PA 't Hoen, grant from the Centre for Medical Systems Biology within the framework of the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Louis
DN
,
Ohgaki
H
,
Wiestler
OD
, et al
The 2007 WHO classification of tumours of the central nervous system
.
Acta Neuropathol
2007
;
114
:
97
109
.
2.
Omuro
AM
,
Faivre
S
,
Raymond
E
. 
Lessons learned in the development of targeted therapy for malignant gliomas
.
Mol Cancer Ther
2007
;
6
:
1909
19
.
3.
Louis
DN
,
Holland
EC
,
Cairncross
JG
. 
Glioma classification: a molecular reappraisal
.
Am J Pathol
2001
;
159
:
779
86
.
4.
Cairncross
JG
,
Ueki
K
,
Zlatescu
MC
, et al
Specific genetic predictors of chemotherapeutic response and survival in patients with anaplastic oligodendrogliomas
.
J Natl Cancer Inst
1998
;
90
:
1473
9
.
5.
Mischel
PS
,
Cloughesy
TF
,
Nelson
SF
. 
DNA-microarray analysis of brain cancer: molecular classification for therapy
.
Nat Rev Neurosci
2004
;
5
:
782
92
.
6.
Gravendeel
LA
,
Kouwenhoven
MC
,
Gevaert
O
, et al
Intrinsic gene expression profiles of gliomas are a better predictor of survival than histology
.
Cancer Res
2009
;
69
:
9065
72
.
7.
Nutt
CL
,
Mani
DR
,
Betensky
RA
, et al
Gene expression-based classification of malignant gliomas correlates better with survival than histological classification
.
Cancer Res
2003
;
63
:
1602
7
.
8.
Phillips
HS
,
Kharbanda
S
,
Chen
R
, et al
Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis
.
Cancer Cell
2006
;
9
:
157
73
.
9.
Van Den Boom
J
,
Wolter
M
,
Kuick
R
, et al
Characterization of gene expression profiles associated with glioma progression using oligonucleotide-based microarray analysis and real-time reverse transcription-polymerase chain reaction
.
Am J Pathol
2003
;
163
:
1033
43
.
10.
Ducray
F
,
Idbaih
A
,
de Reynies
A
, et al
Anaplastic oligodendrogliomas with 1p19q codeletion have a proneural gene expression profile
.
Mol Cancer
2008
;
7
:
41
.
11.
Freije
WA
,
Castro-Vargas
FE
,
Fang
Z
, et al
Gene expression profiling of gliomas strongly predicts survival
.
Cancer Res
2004
;
64
:
6503
10
.
12.
Li
A
,
Walling
J
,
Ahn
S
, et al
Unsupervised analysis of transcriptomic profiles reveals six glioma subtypes
.
Cancer Res
2009
;
69
:
2091
9
.
13.
Oldham
MC
,
Konopka
G
,
Iwamoto
K
, et al
Functional organization of the transcriptome in human brain
.
Nat Neurosci
2008
;
11
:
1271
82
.
14.
Zhang
B
,
Horvath
S
. 
A general framework for weighted gene co-expression network analysis
.
Stat Appl Genet Mol Biol
2005
;
4
:
article17
.
15.
Horvath
S
,
Zhang
B
,
Carlson
M
, et al
Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target
.
Proc Natl Acad Sci USA
2006
;
103
:
17402
7
.
16.
Murat
A
,
Migliavacca
E
,
Gorlia
T
, et al
Stem cell-related “self-renewal” signature and high epidermal growth factor receptor expression associated with resistance to concomitant chemoradiotherapy in glioblastoma
.
J Clin Oncol
2008
;
26
:
3015
24
.
17.
Godard
S
,
Getz
G
,
Delorenzi
M
, et al
Classification of human astrocytic gliomas on the basis of gene expression: a correlated group of genes with angiogenic activity emerges as a strong predictor of subtypes
.
Cancer Res
2003
;
63
:
6613
25
.
18.
Liang
Y
,
Diehn
M
,
Watson
N
, et al
Gene expression profiling reveals molecularly and clinically distinct subtypes of glioblastoma multiforme
.
Proc Natl Acad Sci USA
2005
;
102
:
5814
9
.
19.
Lim
WK
,
Wang
K
,
Lefebvre
C
,
Califano
A
. 
Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks
.
Bioinformatics
2007
;
23
:
i282
8
.
20.
Carlson
MR
,
Zhang
B
,
Fang
Z
,
Mischel
PS
,
Horvath
S
,
Nelson
SF
. 
Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks
.
BMC Genomics
2006
;
7
:
40
.
21.
Langfelder
P
,
Zhang
B
,
Horvath
S
. 
Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R
.
Bioinformatics
2008
;
24
:
719
20
.
22.
Horvath
S
,
Dong
J
. 
Geometric interpretation of gene coexpression network analysis
.
PLoS Comput Biol
2008
;
4
:
e1000117
.
23.
Huang
da W
,
Sherman
BT
,
Lempicki
RA
. 
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
.
Nat Protoc
2009
;
4
:
44
57
.
24.
Cahoy
JD
,
Emery
B
,
Kaushal
A
, et al
A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function
.
J Neurosci
2008
;
28
:
264
78
.
25.
Subramanian
A
,
Tamayo
P
,
Mootha
VK
, et al
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci USA
2005
;
102
:
15545
50
.
26.
Wang
H
,
He
X
,
Band
M
,
Wilson
C
,
Liu
L
. 
A study of inter-lab and inter-platform agreement of DNA microarray data
.
BMC Genomics
2005
;
6
:
71
.
27.
Rhodes
DR
,
Chinnaiyan
AM
. 
Integrative analysis of the cancer transcriptome
.
Nat Genet
2005
;
37
Suppl
:
S31
7
.
28.
Rhodes
DR
,
Yu
J
,
Shanker
K
, et al
Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression
.
Proc Natl Acad Sci USA
2004
;
101
:
9309
14
.
29.
Hanahan
D
,
Weinberg
RA
. 
The hallmarks of cancer
.
Cell
2000
;
100
:
57
70
.
30.
Furnari
FB
,
Fenton
T
,
Bachoo
RM
, et al
Malignant astrocytic glioma: genetics, biology, and paths to treatment
.
Genes Dev
2007
;
21
:
2683
710
.
31.
Mukasa
A
,
Ueki
K
,
Ge
X
, et al
Selective expression of a subset of neuronal genes in oligodendroglioma with chromosome 1p loss
.
Brain Pathol
2004
;
14
:
34
42
.
32.
Rousseau
A
,
Nutt
CL
,
Betensky
RA
, et al
Expression of oligodendroglial and astrocytic lineage markers in diffuse gliomas: use of YKL-40, ApoE, ASCL1, and NKX2–2
.
J Neuropathol Exp Neurol
2006
;
65
:
1149
56
.
33.
Liang
Y
,
Bollen
AW
,
Nicholas
MK
,
Gupta
N
. 
Id4 and FABP7 are preferentially expressed in cells with astrocytic features in oligodendrogliomas and oligoastrocytomas
.
BMC Clin Pathol
2005
;
5
:
6
.
34.
Petalidis
LP
,
Oulas
A
,
Backlund
M
, et al
Improved grading and survival prediction of human astrocytic brain tumors by artificial neural network analysis of gene expression microarray data
.
Mol Cancer Ther
2008
;
7
:
1013
24
.
35.
Ekstrand
AJ
,
James
CD
,
Cavenee
WK
,
Seliger
B
,
Pettersson
RF
,
Collins
VP
. 
Genes for epidermal growth factor receptor, transforming growth factor alpha, and epidermal growth factor and their expression in human gliomas in vivo
.
Cancer Res
1991
;
51
:
2164
72
.
36.
Nicholas
MK
,
Lukas
RV
,
Jafri
NF
,
Faoro
L
,
Salgia
R
. 
Epidermal growth factor receptor – mediated signal transduction in the development and therapy of gliomas
.
Clin Cancer Res
2006
;
12
:
7261
70
.
37.
Guo
W
,
Giancotti
FG
. 
Integrin signalling during tumour progression
.
Nat Rev Mol Cell Biol
2004
;
5
:
816
26
.
38.
Kohutek
ZA
,
diPierro
CG
,
Redpath
GT
,
Hussaini
IM
. 
ADAM-10-mediated N-cadherin cleavage is protein kinase C-alpha dependent and promotes glioblastoma cell migration
.
J Neurosci
2009
;
29
:
4605
15
.
39.
Roberts
PJ
,
Der
CJ
. 
Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer
.
Oncogene
2007
;
26
:
3291
310
.
40.
Egan
JE
,
Hall
AB
,
Yatsula
BA
,
Bar-Sagi
D
. 
The bimodal regulation of epidermal growth factor signaling by human Sprouty proteins
.
Proc Natl Acad Sci USA
2002
;
99
:
6041
6
.
41.
Wong
ES
,
Fong
CW
,
Lim
J
, et al
Sprouty2 attenuates epidermal growth factor receptor ubiquitylation and endocytosis, and consequently enhances Ras/ERK signalling
.
EMBO J
2002
;
21
:
4796
808
.
42.
Gross
I
,
Bassit
B
,
Benezra
M
,
Licht
JD
. 
Mammalian sprouty proteins inhibit cell growth and differentiation by preventing ras activation
.
J Biol Chem
2001
;
276
:
46460
8
.
43.
Hacohen
N
,
Kramer
S
,
Sutherland
D
,
Hiromi
Y
,
Krasnow
MA
. 
Sprouty encodes a novel antagonist of FGF signaling that patterns apical branching of the Drosophila airways
.
Cell
1998
;
92
:
253
63
.