Glioblastoma multiforme (GBM) can be clustered by gene expression into four main subtypes associated with prognosis and survival, but enhancers and other gene-regulatory elements have not yet been identified in primary tumors. Here, we profiled six histone modifications and CTCF binding as well as gene expression in primary gliomas and identified chromatin states that define distinct regulatory elements across the tumor genome. Enhancers in mesenchymal and classical tumor subtypes drove gene expression associated with cell migration and invasion, whereas enhancers in proneural tumors controlled genes associated with a less aggressive phenotype in GBM. We identified bivalent domains marked by activating and repressive chromatin modifications. Interestingly, the gene interaction network from common (subtype-independent) bivalent domains was highly enriched for homeobox genes and transcription factors and dominated by SHH and Wnt signaling pathways. This subtype-independent signature of early neural development may be indicative of poised dedifferentiation capacity in glioblastoma and could provide potential targets for therapy.
Significance: Enhancers and bivalent domains in glioblastoma are regulated in a subtype-specific manner that resembles gene regulation in glioma stem cells. Cancer Res; 78(10); 2463–74. ©2018 AACR.
Glioblastoma multiforme (GBM) is an aggressive primary brain tumor that accounts for 52% of all malignant primary brain neoplasias. The median time of survival with treatment is 14.6 months, and only 5% of diagnosed individuals survive 5 years from diagnosis (1). Given the dismal prognosis of GBM, many studies have focused on analysis of whole-genome/exome sequencing and gene expression data from primary GBM tumors to identify common gene mutations and expression profiles. These studies identified 4 molecular subtypes of GBM, classical, mesenchymal, neural, and proneural. These data have been invaluable in identifying genes and gene pathways that drive the development of GBM, and subtypes predict some aspects of patient prognosis and response to treatment (2). However, the underlying chromatin context that regulates gene expression programs in primary GBM tumors is largely unknown. As GBM lesions are developmentally plastic, and can change certain aspects of their cellular identity, understanding how they vary with regard to chromatin structure will enable identification of key genes and regulatory motifs controlling differentiation capacity in GBM.
Although several studies have quantified single histone modifications in GBM-derived cell lines, none of these studies have been performed in uncultured primary tissue, and few have looked at patterns derived from multiple histone modifications in the same cell line or tumor (3). These studies established the general trend that repressive modifications (particularly polycomb silencing) are globally reduced, and active modifications (such as H3K4me3, H3K9ac, H3K27ac) are generally increased across the GBM genome. This lack of data hinders efforts to conclusively identify and characterize the initiating cell type for glioblastoma tumors. Indeed, a recent chromatin profiling study to identify enhancers revealed cell-type of origin in medulloblastoma, but no comparable dataset currently exists for GBM (4).
In this study, we sought to categorize regulatory regions of the genome in primary GBM tumors by profiling six posttranslational modifications of histone H3, and the multifunctional insulator binding protein CTCF, in conjunction with gene expression profiling of the same tumors. We used a HMM-based approach (5) and identified combinations of chromatin marks that defined distinct regulatory elements across the genome (Fig. 1A). The resulting model encompassed 21 chromatin states that identified known regulatory elements such as enhancers and promoters and also identified bivalent regions in tumors for the first time. We were able to annotate any state in this model with matched expression data, generating a context-dependent view of gene expression that identified regulatory regions that may control gene expression indirectly.
Materials and Methods
Chromatin immunoprecipitation in solid tumors and cell lines
All patients provided written informed consent. This study was approved by the Institutional Review Boards of St. David's Medical Center and of the University of Texas at Austin (Austin, TX), and studies were conducted in accordance with the ethical guidelines of the Belmont Report. Cell lines were obtained from ATCC. Cell lines were not subsequently authenticated or tested for mycoplasma except for the T98G cell line, which was verified to be mycoplasma free by a PCR assay. Tumors were collected during planned surgical resections, and only excess tissue that was not used for pathologic analysis was used in this study. Thirteen samples were collected: two meningiomas, two grade 3 anaplastic astrocytomas (AA1 and AA2), and nine grade 4 glioblastoma multiforme tumors (GBM1–GBM9). Tumor samples were flash frozen in liquid nitrogen and homogenized in a liquid nitrogen cooled Biopulverizer mortar and pestle (BioSpec Products) until particles were submillimeter size. Homogenized tumor tissue was aliquoted by weight into 15 ml conical tubes, and suspended in PBS + 10 μg/mL PMSF in isopropyl alcohol, with 1% formaldehyde for cross-linking. Samples were crosslinked for 15 minutes, rocking at room temperature, washed twice with PBS + PMSF, flash frozen in liquid nitrogen, and stored at −80°C until processing for chromatin immunoprecipitation sequencing (ChIP-seq).
Briefly, samples were lysed by douncing in hypotonic buffer, followed by centrifugation and resuspension in RIPA buffer with protease inhibitors. Lysate was sonicated in an ice bath, 30 seconds on, 60 seconds off, high intensity (Bioruptor, Diagenode). Sonication continued for four 10-minute cycles; then, samples were centrifuged to pellet-insoluble particles. We used protein A beads to “preclear” the sample before overnight antibody incubation. Samples plus beads were rocked for 30 minutes (4°C), and supernatant was transferred to a new tube. An input sample was removed and antibodies were added for overnight incubation. We performed the following immunoprecipitations for each tumor: CTCF (EMD Millipore, 07-729), H3K4me3 (EMD Millipore, 07-473), H3K4me1 (EMD Millipore, 07-436), H3K27me3 (EMD Millipore, 07-449), H3K9ac (EMD Millipore, 07-352), H3K9me3 (Abcam, ab8898), and H3K27ac (Abcam, ab4729), using 10 μg of antibody per immunoprecipitation.
After the overnight incubation, we added 30 μL beads to each immunoprecipitation, samples plus beads rocked for an hour (4°C), followed by salt/detergent washes (4°C) to reduce nonspecific binding. We eluted the antibody/DNA complexes from the beads twice, using 1% SDS and 0.1 mol/L NaHCO3 buffer. To reverse cross-linking, we added 5 mol/L NaCl to each sample and then incubated at 65°C for >4 hours. For DNA extraction, samples were sequentially digested with RNase A and Proteinase K and then extracted with phenol–chloroform and precipitated with ethanol; DNA was resuspended in sterile water.
Library preparation and sequencing
After assessing enrichment at positive control sites using qPCR, we prepared libraries using the New England Biolabs NEBNext Library Prep Kit (E6240L, New England Biolabs), with some changes. We performed adapter ligation before size selection, used Bioo adapters (514103, Bioo Scientific) and Ampure XP beads (A63881, Beckman-Coulter) instead of columns for all purifications and size selections. Sequencing was performed on a HiSeq 2500 in either the Genome Sequencing Facility at MD Anderson Cancer Center at Science Park (Smithville, TX) or the UT Austin Genome Sequencing and Analysis Facility.
Alignment and peak calling in ChIP-seq data
Sequencing of input libraries was performed for each tumor. Available datasets and their sequencing details are summarized in the “Overview” and “ChIP-seq” sheets of Supplementary File S1. Alignment of all ChIP-seq and input sequence datasets was performed with BWA (v0.7.12-r1039; ref. 6) aln and sampe functions after hard trimming all reads to 50 bases. The reference used was the primary GRCh38 (hg38) assembly (GENCODE v24). Duplicate sequences were flagged using the Picard MarkDuplicates tool (v1.123; http://broadinstitute.github.io/picard). Statistics for these datasets are summarized in the “ChIP-seq” sheet of Supplementary File S1, which groups datasets by mark.
We performed initial identification of enriched regions (peaks) using the MACS2 (v188.8.131.5260309; ref. 7) callpeak function specifying a relaxed P value threshold of 0.01, allowing one duplicate read per locus (keep-dup=2), pairing each ChIP-seq sample with its input control. We removed data aligning to genomic regions with aberrantly high signal due to copy number differences (8), and we defined low complexity regions using the “Duke Excluded Regions” and “DAC Blacklisted regions” tracks at the UCSC Genome Browser, which were lifted over to hg38.
Statistics were gathered for each peak set, including peak counts at a wide range of MACS2-reported Q value and fold enrichment (FE) levels and total base coverage for selected levels. Because P and Q values are strongly affected by sequencing depth, FE provided the most consistent measure for thresholds across chromatin mark sets. Utilizing FE also insulates these peak calls from the effects of copy number alterations in the tumor genome. We could detect large copy number alterations containing amplifications and deletions in the ChIP input libraries (Supplementary Fig. S1A and S1B). However, as FE was calculated relative to input, this provides an inherent normalization of any amplified regions, preventing them from dominating called peaks with the subsequent use of those peaks in identifying chromatin states. From examination of peak counts and base coverage at various FE levels, we selected three significance levels: high (FE 6+), target (FE 4.5+), and low (FE 3.5 or 4). MACS2-generated narrowPeak files were converted to a custom BED9+ format, including fields for P and Q values, FE, rank, and a significance level designator. All reported results are based on peaks at the target level. In preparation for model building, target-level peaks were extracted and, for each mark, peaks from all tumors were merged using bedtools (v2.25.0; ref. 9) in such a way as to preserve sample identity.
Chromatin states: systematic identification of histone colocalization
From the above merged consensus peaks for each experiment, we used ChromHMM v1.12 (5) to build a 21-state model, using only tumor ChIP-seq data. To map the distribution of enhancers across elements in the genome, annotation coordinates were defined on the basis of UCSC RefFlat annotations from January 2015. Any genomic region outside of the above annotations was defined as intergenic. Enhancer coordinates were intersected with these elements using bedtools. To understand which genes are regulated by which states, we used the GENCODE (10) annotation, version 24, and used bedtools closest to identify the distance between chromatin states and the closest protein-coding gene(s). For enhancers, we analyzed the closest gene even if it was >1 Mb away, but for bivalent regions, the closest gene (and associated bivalent domain) was not analyzed unless the distance was <500 bp.
Assessing gene enrichments for common enhancer and bivalent states using DAVID
To assess enriched pathways and terms for our common enhancer and bivalent domains, we utilized Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8 (11) and identified enriched terms using functional annotation clustering. The DAVID terms in Figs. 5A and 6A were derived as follows. For each cluster, the term encompassing the largest number of genes while still having a significant P value [false discover rate (FDR) corrected] was selected; an additional term could be chosen from any cluster if the terms did not describe redundant features. Annotation clusters were skipped if they appeared completely redundant to any previous cluster, or nonsignificant by FDR-adjusted P value of <0.05. This process was continued until at most 10 significant terms were identified.
RNA sequencing in solid tumors and cell lines
RNA was isolated from homogenized tumor tissue using TRIzol (15596-026, Thermo Fisher Scientific). The Ribo-Zero rRNA Removal Kit (MRZH116, Illumina) was used to remove ribosomal RNAs, and the resulting RNA was used to prepare single-end or paired-end libraries with the NEBNext Small RNA Kit for Illumina (E7300S, New England Biolabs). The resulting libraries were run on an Illumina HiSeq 2500 as above.
We used cutadapt to remove 3′ Illumina RNA sequencing (RNA-seq) adapters from fastq reads (v1.10; https://github.com/marcelm/cutadapt), discarding sequences <36 bases after trimming. We filtered reads from rRNA and tRNA by aligning to a reference composed of human rRNA and tRNA sequences, retaining sequences that did not align. Finally, transcriptome-aware alignment was performed with TopHat2 (v2.1.0; ref. 12). We used the primary GRCh38 (hg38) assembly from GENCODE release 24, and the corresponding GENCODE comprehensive gene annotation set for the transcriptome. Duplicate sequences were flagged using Picard as described above. Alignment statistics for all RNA-seq datasets are summarized in the “RNA-seq” sheet of Supplementary File S1. We used the Tuxedo suite to perform gene fragments per kilobase of transcript per million (FPKM) quantification on RNA-seq data (13). The cuffnorm table of tumor-only FPKM and cuffdiff (see below) output, merged based on the unique ENSEMBL identifier is provided in Supplementary File S2.
Assignment of The Cancer Genome Atlas subtypes to tumors
ENSEMBL Gene IDs from cuffnorm-derived FPKM tables were extracted and matched against 841 gene names from The Cancer Genome Atlas (TCGA) subtyping study (14), resulting in 743 direct matches. Another 92 matches were identified using HGNC aliases, with 836 TCGA genes matched. FPKM values for the genes were extracted in TCGA-gene order. The resulting matrices had missing or 0 values replaced by the smallest R double precision value, counts log2 transformed; rows median centered; complete Spearman pairwise correlation of columns performed to construct a distance matrix; and finally, average-linkage hierarchical clustering performed on columns represented by the distance matrix. Assignment of TCGA mesenchymal, classical, or proneural subtypes to the tumors was performed on the basis of the three main branches in the resulting dendrogram. We used these subtypes to perform differential gene expression analysis across the proneural and the mesenchymal/classical group. Group assignments were as follows: mesenchymal/classical: AA1, GBM3, GBM5, GBM7, GBM8, GBM9; proneural: AA2, GBM1, GBM2, GBM4, GBM6. Genes described as significantly differentially expressed between the two groups are those marked as significant at Q ≤ 0.05 by cuffdiff, included in Supplementary File S2.
We used R version 3.3.1 (The R Project for Statistical Computing, http://www.r-project.org/) software for calculations. Statistical tests, sample number, and data representation are indicated in the main text or figure legends.
The raw fastq files from the GBM datasets generated and analyzed in this study are available in the dbGaP repository (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001389.v1.p1).
Gene expression programs in tumors recapitulate clinically distinct GBM subtypes
GBM tumors are highly heterogeneous (15), and tumor samples were homogenized before processing, so the data are representative of bulk tumor, as opposed to any specific population of cells. We performed RNA-seq from all primary tumors, as well as several commonly used GBM-derived cell lines and two independent lines of primary normal human astrocytes. After alignment, gene expression was quantified over protein-coding genes (Materials and Methods; ref. 10). We plotted gene expression data for previously identified GBM subtype classifier genes (14) and found expression patterns similar to TCGA (Fig. 1B). Thus, the tumors that we used for chromatin profiling represent authentic and clinically relevant GBM subtypes. We did not identify a neural subtype among our tumors, and there is some uncertainty regarding whether this subtype is a distinct molecular subtype of GBM (2). IDH1 is known to be frequently mutated in lower grade gliomas and in the G-CIMP subclass of the proneural subtype of GBM (16, 17). In accordance, we found that 2 of the 5 proneural tumors in our sample set showed the characteristic R132H mutation of IDH1. GBM cell lines differed widely from the tumors in their gene expression patterns and thus are not an accurate model of primary tumor lesions for genome-wide profiling studies (Supplementary Fig. S2A). When we clustered the tumors on the basis of all protein-coding genes, three groups were evident: tumors, normal human astrocytes, and GBM-derived cell lines (Supplementary Fig. S2B).
Global profiling of histone modifications reveals biologically relevant patterns
To profile regulatory chromatin states in GBM, we developed a protocol to perform ChIP-seq in primary GBM tumor samples (Materials and Methods) and concurrently sequenced the RNA derived from these samples. We profiled four active histone modifications (H3K4me1, H3K4me3, H3K9ac, and H3K27ac), two repressive histone modifications (H3K27me3 and H3K9me3), and the multifunctional insulator binding protein CTCF. Transcribed genes such as calmodulin (CALM1) displayed active histone modifications (Fig. 2A), whereas transcriptionally silent genes such as keratin 72 (KRT72) showed broad repressive marks (Fig. 2B). Genome-wide, samples showing a given mark generally clustered together, with active marks, repressive marks, and CTCF binding forming distinct groups (Fig. 2C and D). Thus, the biological state of the chromatin rather than tumor identity determined the clustering of datasets, indicating that the profiling data were reflective of the underlying chromatin state.
To systematically identify distinct chromatin states in the tumor genome, we called ChIP-seq peaks for each sample and then used ChromHMM (5) to build a 21-state model of histone modification co-localizations across the genome (Materials and Methods). On the basis of known associations of histone marks and CTCF binding with regulatory activities, we identified several functionally distinct chromatin states in the tumor genome (18), including an active enhancer, polycomb, and heterochromatin silenced state (Supplementary Table S1). Interestingly, the 21-state model revealed the existence of a bivalent state marked by active H3K4me3 and repressive H3K27me3 modifications. Such bivalent states were first identified in embryonic stem cells (19) and have been identified in glioblastoma-derived cell lines (20), but to our knowledge, this is the first time they have been seen to exist in primary GBM tumors. A view of the ChIP-seq signal surrounding a given chromatin state in a tumor showed that globally, the identified states faithfully reflected the underlying combinations of histone marks (Fig. 3A). At individual loci, the identified states captured the appropriate combination of marks corresponding to different types of functional elements, such as promoters, enhancers, and silenced heterochromatin regions (Fig. 3B). Although our analysis was not geared toward identifying superenhancers, there was good overlap between our identified enhancers in GBM and superenhancers reported in medulloblastoma and in the U87MG GBM-derived cell line based on ChIP-seq for MED1, BRD4, and EP300 (Fig. 3C; refs. 4, 21).
Although expression was variable across states, any state with a repressive mark was associated with a statistically significant reduction in expression compared with other states (Fig. 4A; Supplementary Fig. S3A–S3D). We used whole-genome bisulfite sequencing (WGBS) data from TCGA GBM tumors to examine DNA methylation levels corresponding to each state. The polycomb and heterochromatin silenced states were highly methylated, reflecting their transcriptional inactivity. Interestingly, although the genes nearest enhancers were highly expressed, the enhancers themselves were highly methylated (Fig. 4B; Supplementary Fig. S4A–S4D). Although methylation is generally considered to be silencing (22), WGBS is unable to distinguish between methylation and 5-hydroxymethylation (5hmC) of cytosine residues. Enhancers and gene bodies are specifically marked by 5hmC in ES cells (23), and enhancers were recently reported to be targeted by 5hmC in GBM (24). By comparing our enhancer annotations to 5hmC data from GBM tumors (24), we found that GBM enhancers were enriched for 5hmC compared with promoters (P = 6.212e−07, t test), bivalent regions (P = 4.08e−09), polycomb silenced regions (P = 1.866e−09), and background 5hmC levels in the genome (P = 2.778e−12; Supplementary Fig. S5A). Genes corresponding to bivalent states were not expressed, and bivalent loci showed lower levels of methylation than polycomb and heterochromatin silenced regions. CTCF-binding sites were associated with high levels of methylation, consistent with previous reports (25).
GBM enhancers are located predominantly within introns and intergenic regions and contain degenerate STAT/Klf/SP family motifs
Active enhancers (state 12) varied widely in number among tumors, with a median number of 3,640 (Supplementary Table S2). Generally, enhancers localized within or upstream of gene bodies. The vast majority of enhancers in any given tumor were located in introns. The genomic distribution of enhancers that we identified in GBM showed small differences, particularly for introns and intergenic regions, compared with enhancers identified in clinical medulloblastoma tumors (4) or in cell lines (Supplementary Table S3; ref. 26). Although the difference is small, the reduced representation of GBM enhancers in intergenic regions, and increased representation in intronic regions, was consistent across tumor samples (Supplementary Fig. S5B). Although expression levels for genes associated with enhancers varied across tumors, there were no statistically significant changes in expression across tumors or subtypes, or in the average distance to a gene (Supplementary Fig. S5C and S5D).
We used MEME-ChIP (27) to identify de novo motifs overrepresented in enhancers in each tumor. These motifs were largely degenerate and bore resemblance to motifs for several families of transcription factors, such as KLF and STAT-like binding motifs (Supplementary Fig. S6). Some motifs from 6-8 nucleotides in length strongly resembled primary and secondary binding sites for the AP2 transcription factor family, as well as TCF3, ASCL2, and TCF5. From a functional perspective, the observed motifs indicate that enhancers are enriched for binding of transcription factors controlling cellular proliferation and immune response.
Enhancers are subtype specific and control genes involved in cell–cell contacts
The enhancer state in our model is defined by colocalization of H3K27ac and H3K4me1 (Fig. 3A; Supplementary Table S2). Some transcriptional activity originated from these regions, but in a less defined manner than from promoter regions (Fig. 3A). A total of 1,817 enhancers, which covered 1,227 genes, were present in at least 3 of 11 tumors, and we defined this set as “common enhancers.” A total of 307 of these genes were strongly enriched for pathways that mediate cell–cell interactions, such as cell adhesion and cell–cell adherens junctions (Fig. 5A; Supplementary File S3).
These 307 genes with a DAVID (11) annotation from Fig. 5A were split between proneural and mesenchymal/classical (MES/CL) tumors, with genes expressed in proneural tumors being minimally expressed in the MES/CL tumors and vice versa (Fig. 5B). Of these 307 genes, 33 were on the TCGA list of subtype genes (2). This includes five genes significantly upregulated in the proneural tumors (EPHB1, MAPT, NCAM1, KIF21B, STMN1) and two genes that were significantly upregulated in the MES/CL tumors (EGFR, OSBPL3). In total, we identified 274 novel genes associated with GBM that are controlled by an enhancer in at least 3 tumors (Supplementary File S3). The subtype specificity of enhancers was often visually evident, as with the gene PODXL, which showed strong enhancer signals and expression in the MES/CL tumors, but much weaker enhancer signals and expression in proneural tumors (Fig. 5C).
Bivalent regions are enriched for hedgehog signaling and embryonic development genes
The model identified a bivalent state (state 16; Supplementary Table S4) predominantly defined by colocalization of H3K4me3 and H3K27me3 (Fig. 3A). Transcription from this state was slightly higher than a polycomb silenced state, but lower than genes near active promoters or enhancers (Fig. 4A). There were 1,435 frequently bivalent regions present in at least 5 tumors, and these regions covered 1,511 distinct genes (Materials and Methods). A total of 840 of these genes showed strong enrichment for pattern specification, Wnt signaling, embryonic development, transcription factor DNA-binding domains (Fig. 6A; Supplementary File S4), and revealed a proneural versus MES/CL division in gene expression, similar to enhancers. Frequently bivalent regions were divided in expression by subtype and fell into two groups: Group 1 contained genes expressed in MES/CL tumors that were often bivalent in proneural tumors, and Group 2 exhibited largely the reciprocal pattern (Fig. 6B; Supplementary Fig. S7A–S7D). For example, Group 1 contained many genes within the HOXB locus that were bivalent in proneural tumors but expressed in MES/CL tumors (Fig. 6C). When compared with existing data from adult brain (28), the GBM samples showed distinct epigenetic profiles (Supplementary Fig. S8).
To identify regions that were bivalent in a subtype-independent manner in GBM, we examined bivalent regions common to at least 8 of 11 tumors and identified 349 regions, which covered 381 unique genes (Supplementary File S5). Sixty-eight of these 381 genes (17.8%) were homeobox genes, a highly significant enrichment as only 1.25% of all genes are homeobox genes (P < 2.2e−16, Fisher exact test). Moreover, there were 127 transcription factors (33.3%) among the 381 commonly bivalent genes, an equally significant enrichment (P < 2.2e−16, Fisher exact test) compared with the 10% of all genes that are transcription factors. The occurrence of homeobox genes and transcription factors is thus likely to represent a functional attribute of bivalent chromatin domains in GBM. The bivalent domains were marked by punctate H3K4me3, with H3K27me3 more broadly distributed across the region (Fig. 6C).
Commonly bivalent genes were highly interconnected, with 192 of the genes connected through StringDb (29). Thirty genes were not connected to the main network, so the primary network comprises 162 nodes. Using HumanNet, 176 of the nodes were connected (AUC = 0.612; P = 1.54e−12; ref. 30). The resulting network is dominated by SHH and IHH, with WNT1, GATA family transcription factors, and the growth factor FGF10 forming additional hubs (Fig. 6D). The presence of bivalent chromatin domains in cancer may indicate dedifferentiation toward a more stem cell–like phenotype.
Although gene expression profiling of primary tumors suggests 4 molecular subtypes, classical, mesenchymal, neural, and proneural, detailed phenotypic and molecular characterization of glioma stem cells (GSC), which are thought to be the tumor-initiating cells in glioblastoma, reveal two distinct subtypes of GSCs, corresponding to the mesenchymal and proneural types (31). Strikingly, the genes targeted by the enhancers and bivalent chromatin states that we identified in primary tumors also separate them into two groups corresponding to the GSC-based classification of GBM. Moreover, genes targeted by enhancers appear to regulate pathways that are differentially active in the two GSC subtypes. Many enhancer-associated genes that were significantly upregulated in MES/CL tumors promote cellular invasion and angiogenesis, a hallmark of mesenchymal GSCs (32, 33). For example, PODXL (Podocalyxin-like) promotes cell migration, and its overexpression in GSCs is associated with a poor outcome (34); MMP11 (matrix metalloprotease 11) cleaves ECM, promotes tumorigenesis, and cellular invasion (35); S100A16 (a Ca++ binding protein) promotes the epithelial-to-mesenchymal transition (EMT) in breast cancer (36); the protein kinase FAM20C is a marker of mesenchymal GSCs and promotes proliferation in triple-negative breast cancer (37); LMO2 (LIM Domain Only 2) promotes angiogenesis (38), and is a marker of GSCs (32, 33). Conversely, many enhancer-associated genes significantly upregulated in proneural tumors were associated with increased survival. This set included, for example, AKT3 and DNM1 (dynamin-1), which are associated with survival in GBM (39), and NCAM1 (neural cell adhesion molecule 1), which is involved in neuron–neuron interactions and is negatively correlated with invasion (40). Many of these genes have not been previously associated with GBM.
Genes adjacent to frequently bivalent regions could be placed into two groups showing the same reciprocal relationship in expression between MES/CL and proneural tumors as observed with enhancer-associated genes (Figs. 5B and 6B; Supplementary Fig. S7A–S7D). Thus, Group 1 genes active in MES/CL tumors, such as COL6A2, SMOC2, ITGB2, FOXC2, and HOXB3, have been associated with angiogenesis, cellular migration, and invasive growth (41–43). Some Group 2 genes active in proneural tumors were protective. For example, ICAM5 is an intracellular adhesion molecule that regulates interactions between neurons and microglia, and it is often repressed in colon cancer (44). Another Group 2 gene, SLIT2, provides axon guidance in the developing forebrain, and patients with SLIT2-positive gliomas show better survival (45). However, other Group 2 genes highly expressed in proneural tumors were strongly suggestive of GSCs. Notable among these was OLIG2, a lineage-specific transcription factor for oligodendrocytes that is required for proliferation of GSCs (46). OLIG2 regulates PDGFRA (47), which was also a Group 2 bivalent gene highly expressed in proneural samples, and although it is required for gliomagenesis (48), its expression is associated with an improved prognosis (49).
Genes marked by bivalent chromatin in 70% of GBM (8/11 tumors) were highly interconnected and formed a network dominated by Wnt (WNT1, WNT2B, WNT6), and hedgehog (SHH, IHH) signaling, as well as HOX and homeobox genes, and transcription factors. This bears strong similarity to signatures of bivalent chromatin both in embryonic stem cells, where the opposing active and polycomb-repressed marks poise genes for developmental expression, as well as in cancer stem cells (19, 50). WNT5B was recently identified as vital for differentiation and cell growth in GSCs (51). Bivalent loci marked by H3K4me3 and H3K27me3 have not been identified previously in primary GBM tumors; unexpectedly, the bivalent signature associated with GSCs, which comprise a small fraction of the overall tumor, was instead observable in the bulk tumor.
One potential issue with identifying bivalent regions in a population of primary tumor cells is that active and repressive marks could be present at the same locus, in distinct subpopulations of cells in heterogeneous tumors. Performing ChIP-reChIP to conclusively identify active and repressive marks in the same cells could resolve this ambiguity, but the large number of cells required makes such experiments infeasible in clinical tissue samples. To establish whether bivalent regions could be discerned in a homogeneous population of GBM-derived cells, we performed ChIP-seq for H3K4me3 and H3K27me3 in the T98G cell line. Interestingly, we identified 1073 loci showing both active and repressive marks in these cells. These loci overlapped significantly (P < 4.257e-201; Bedtools Fisher) with the set of bivalent loci present in at least 8 tumors (Supplementary Fig. S9A and S9B) and included key genes like HOXA3-7, SHH, and WNT6. In addition, if active and repressive marks occurred in different populations of cells, the expectation is that such loci would show expression levels that are intermediate between the active and repressed state. However, the bivalent regions we identified in tumors showed undetectable expression levels, similar to polycomb-repressed regions (Figs. 3A and 4; Supplementary Fig. S2). Thus, although we cannot formally exclude the possibility that the active and repressive states occur in distinct cells in tumors, our data suggest that the bivalent state is not dispersed across active and repressed subpopulations of cells.
Wnt and hedgehog signaling regulate EMT, invasion, and proliferation in cancers; certain components of these pathways were expressed in the GBM tumors profiled here. The master regulators IHH, SHH, and WNT1 were nearly always silent, but poised for expression. The large number of commonly bivalent transcription factors and homeobox genes suggests a rapidly deployable program controlling a Hh and Wnt-mediated transcriptional response with the capacity to drive the production of multipotent stem cells from more differentiated bulk tumor cells. Many genes expected to be specific to GSCs were highly expressed in unsorted tumor in a subtype-specific manner, indicating the genetic pathways necessary for GSC programming are present in any GBM tumor cell. These findings corroborate other recent studies in cultured GSCs, indicating that stemness in GBM is controlled in part by Wnt signaling (52) and by epigenetic regulation of HOX genes (53). Our study raises the possibility that targeting epigenetic states with small-molecule modulators, in combination with agents that target Wnt and hedgehog signaling pathways, might be a fruitful approach for exploring subtype-specific therapy in GBM.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: A.W. Hall, M.C. Cowperthwaite, V.R. Iyer
Development of methodology: A.W. Hall, A.M. Battenhouse, V.R. Iyer
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.W. Hall, H. Shivram, A.R. Morris, M.C. Cowperthwaite, M. Shpak
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.W. Hall, A.M. Battenhouse, M.C. Cowperthwaite, V.R. Iyer
Writing, review, and/or revision of the manuscript: A.W. Hall, A.M. Battenhouse, H. Shivram, M.C. Cowperthwaite, M. Shpak, V.R. Iyer
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.M. Battenhouse, M.C. Cowperthwaite, M. Shpak, V.R. Iyer
Study supervision: V.R. Iyer
This work was funded in part by grants from the Cancer Prevention Research Institute of Texas (RP120194) and NIH (HG004563, CA130075, and CA198648 to V.R. Iyer). We are grateful to the patients for consenting to donate tissue. This study would not be possible without them. We thank the staff at NeuroTexas Institute at St. David's Medical Center (Austin, TX), the Next Generation Sequencing Core Facility at the University of Texas MD Anderson Cancer Center Science Park, which was supported by CPRIT Core Facility Support Grant RP120348, as well as the Genome Sequencing and Analysis Facility at the University of Texas at Austin for sequencing, and the Texas Advanced Computing Center at The University of Texas at Austin for HPC resources.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.