Abstract
Glioblastoma is the most aggressive brain tumor in adults and has few therapeutic options. The study of molecular subtype classifications may lead to improved prognostic classification and identification of new therapeutic targets. The Cancer Genome Atlas (TCGA) subtype classification has mainly been applied in U.S. clinical trials, while the intrinsic glioma subtype (IGS) has mainly been applied in European trials.
From paraffin-embedded tumor samples of 432 patients with uniformly treated, newly diagnosed glioblastoma, we built tissue microarrays for IHC analysis and applied RNA sequencing to the best samples to classify them according to TCGA and IGS subtypes.
We obtained transcriptomic results from 124 patients. There was a lack of agreement among the three TCGA classificatory algorithms employed, which was not solely attributable to intratumoral heterogeneity. There was overlapping of TCGA mesenchymal subtype with IGS cluster 23 and of TCGA classical subtype with IGS cluster 18. Molecular subtypes were not associated with prognosis, but levels of expression of 13 novel genes were identified as independent prognostic markers in glioma-CpG island methylator phenotype–negative patients, independently of clinical factors and MGMT methylation. These findings were validated in at least one external database. Three of the 13 genes were selected for IHC validation. In particular, high ZNF7 RNA expression and low ZNF7 protein expression were strongly associated with longer survival, independently of molecular subtypes.
TCGA and IGS molecular classifications of glioblastoma have no higher prognostic value than individual genes and should be refined before being applied to clinical trials.
When classifying glioblastoma tumors into The Cancer Genome Atlas (TCGA) subtypes, there is high variability depending on the algorithm used. Our findings indicate a lack of prognostic impact for both TCGA and intrinsic glioma subtype molecular subtypes compared with known prognostic factors, such as type of surgery, age or MGMT methylation status. In contrast, we found that the RNA expression levels of several individual genes were strong independent prognostic markers that were validated in external databases (Gravendeel, TCGA, and/or Rembrandt). Targeting a few of these genes with IHC revealed that both the RNA and protein expression of zinc finger protein 7 had a remarkable independent effect on survival in glioma-CpG island methylator phenotype negative, a surrogate of wild-type IDH glioblastomas.
Introduction
The standard therapy for glioblastoma is surgery followed by radiation therapy with concomitant and adjuvant temozolomide, but median survival remains around 14–16 months, depending on clinical factors, such as extent of surgery, patient age, and Karnofsky performance status (KPS), and on molecular factors, including IDH mutation status and MGMT promoter methylation (1, 2).
Several systems have been proposed for classifying glioblastoma tumors in molecular subtypes (3–16). In 2006, Phillips and colleagues (6) used DNA microarray to identify three glioma subtypes with different gene expression patterns, prognoses, and biological processes: proneural, which was enriched with markers of neurogenesis; proliferative, with markers of cell proliferation; and mesenchymal, with markers of mesenchymal origin. Verhaak and colleagues (7) used an unsupervised classification approach to data from The Cancer Genome Atlas (TCGA) and reported four subtypes: proneural, characterized by IDH1 mutations and/or PDGFRA amplification; classical, with EGFR amplification and/or mutations; mesenchymal, with NF1 loss and/or mutations; and neural, characterized by the expression of neuron markers. However, they could not confirm the proliferative subtype described by Phillips and colleagues (6, 17) The proneural subtype was redefined by Noushmehr and colleagues (18) with the identification of a glioma-CpG island methylator phenotype (G-CIMP). This phenotype was more prevalent among lower grade gliomas, associated with IDH somatic mutations and younger age, and mostly present in the proneural subtype. A later report by Brennan and colleagues (8) confirmed the survival advantage for the proneural subtype, which was mainly conferred by the G-CIMP–positive phenotype, identified MGMT methylation as a predictive factor for response to temozolomide, and described several novel gene mutations, some of which were targetable (19). Wang and colleagues (9) later explored changes in the microenvironment of different glioblastoma gene expression signatures before and after therapeutic intervention and concluded that the neural subtype described in previous reports (7, 8) lacked characteristic gene abnormalities and was, in fact, a contamination of normal neural tissue. Therefore, the classification was refined into only three subtypes (proneural, classical, mesenchymal). In addition, using the single sample gene set enrichment analysis (ssGSEA) of 50 gene signatures, they found that 8% of cases were enriched in multiple ssGSEA signatures, suggesting that cases can activate more than one transcriptional signature. A survival analysis of the 74 patients (20%) who activated only one signature (highest simplicity scores >0.99), they found that mesenchymal patients had shorter survival than classical and proneural patients (P = 0.03; ref. 9). On the basis of these studies, TCGA-based subtypes have been used to explore sensitivity to treatment in several clinical trials (20, 21).
Other molecular classifications of glioblastoma subtypes have been developed and used in clinical trials, mainly in Europe (22–24). Gravendeel and colleagues (12) proposed the intrinsic glioma subtypes (IGS), which segregated gliomas of all histologic grades into seven different clusters. Glioblastoma tumors were distributed mainly in clusters 18, 22, and 23 but they could also be seen in other clusters, where prognosis was better, leading the authors to suggest that molecular alterations may drive prognosis better than morphology. In a later study comparing 55 paired fresh-frozen (FF) and formalin-fixed paraffin-embedded (FFPE) tumor samples, they confirmed the validity of FFPE samples for molecular profiling (25).
Nevertheless, when evaluating the prognostic role of molecular alterations, we cannot ignore the impact of clinical variables. In fact, one of the drawbacks in studies of molecular profiling is precisely the partial omission of clinical variables from the analyses, which makes it difficult to separate the effect of such factors as extent of surgery and MGMT methylation from that of the molecular subtype.
We have explored the glioblastoma molecular subtypes and other molecular prognostic factors by analyzing FFPE samples from a uniformly treated population of newly diagnosed patients with glioblastoma. Our objectives were to evaluate the impact of subtypes on overall survival and to identify other molecular alterations, such as individual genes, related to survival in our patient population.
Materials and Methods
Patients and study design
This study was approved by the Institutional Review Board of the Hospital Germans Trias i Pujol (PI-14-016) and by the Ethics Committees of all the participating institutions and their Biobanks and was conducted in accordance with the ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments. All patients or their representatives gave their written informed consent to participate in this study.
From 2004 to 2014, the GLIOCAT project (26) collected clinical data from 432 consecutive patients with glioblastoma from six institutions, all of whom had received the standard first-line treatment (surgery followed by radiotherapy with concurrent and adjuvant temozolomide). The pathologic diagnosis confirming glioblastoma was performed by pathologists before patients were included in the project. Once selected for inclusion, each case was anonymized and given a number to identify it across all data.
The following data were recorded: age, sex, symptoms, tumor characteristics, type of surgery, postsurgical performance, and mini mental status examination (MMSE) score, details of radiotherapy and temozolomide treatments and treatment at relapse, date of progression, subsequent treatments, date and status at last control, and date and status of death or last control alive. Presurgical and postsurgical resonance images were uploaded on a shared platform. Once patients were included in this study, MGMT methylation status was determined if it had not previously been assessed.
DNA extraction and assessment of MGMT methylation
DNA was extracted from two 15-μm sections of FFPE tissue using the QIAamp DNA Mini Kit (QIAGEN GmbH), following the manufacturer's protocol. In cases with less than 50% of tumor cells, the tumor tissue was macrodissected manually. Then 500 ng of extracted DNA was subjected to bisulfite treatment using the EZ DNA Methylation-Gold Kit (Zymo Research Corporation). DNA methylation patterns in the CpG island of the MGMT gene were determined by methylation-specific PCR using primers specific for either methylated or modified nonmethylated DNA, as described previously (27).
RNA sequencing
RNA was extracted and assessed for quality. The highest-quality RNA samples were further anonymized and sent to the Centro Nacional de Análisis Genómico (CNAG-CRG) for analysis by RNA sequencing (RNA-seq). Before performing RNA-seq in the FFPE samples, we compared results in four cases of paired FFPE and FF samples and confirmed the similarity of results in the two types of tissue samples (26).
RNA extraction and quality assessment
The RNA extraction of FFPE tumor specimens was performed on five 15-μm deep tissue cuts using the RNeasy FFPE Kit (Qiagen) according to the manufacturer's recommendations. RNA quantity and purity were measured with the NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific) and the Qubit RNA HS Assay Kit (Invitrogen) used with the Qubit Fluorometer (Thermo Fisher Scientific). RNA was analyzed with the 2100 Bioanalyzer (Agilent) and quality was assessed by Eukaryote Total RNA Nano Bioanalyzer Assay (Agilent) according to the “DV200” standard—the percentage of RNA fragments >200 nucleotides—as recommended by Illumina for FFPE samples.
FFPE samples were sequenced in different batches and we first selected the highest-quality RNAs from patients with the most complete clinical and radiological data. However, as many samples did not have RNA with the minimal quality requirements, we subsequently included cases without available MRIs and, finally, those without informative results from the MGMT methylation analyses.
RNA library construction and sequencing
The starting input material for the construction of the RNA library was DNA-free total RNA from FFPE samples, obtained using the KAPA RNA HyperPrep Kit with RiboErase (HMR; Roche Kapa Biosystems), according to the manufacturer's protocol. Briefly, the ribosomal RNA (rRNA) was depleted from 0.8 to 1 μg of total RNA using RiboErase. rRNA-depleted RNA samples were fragmented in the presence of divalent ions at 65°C for 1 minute due to the low quality of the initial total RNA. Following RNA fragmentation (first- and second-strand synthesis), the Illumina-compatible bar-coded adapters were ligated at a concentration of 1.5 μmol/L. Libraries were enriched with 15 cycles of PCR using KAPA HiFi HotStart ReadyMix (Kapa-Roche). The size and quality of the libraries were assessed in a High Sensitivity DNA Bioanalyzer Assay (Agilent).
The libraries were sequenced on HiSeq2000 (Illumina) in paired-end mode with a read length of 2 × 76 bp using TruSeq SBS Kit v4. Each sample was sequenced in a fraction of a sequencing v4 flow cell lane, following the manufacturer's protocol. Sequencing was performed with a minimum of 65 mol/L paired-end reads (2 × 75 bp) per RNA-seq sample. Image analysis, base calling, and quality scoring of the run were processed using Real Time Analysis (RTA 1.18.66.3) software and followed by the generation of FASTQ sequence files by CASAVA (RRID:SCR_001802).
RNA-seq processing and data analysis
RNA-seq paired-end reads were mapped against the human reference genome (GRCh38) using STAR (RRID:SCR_015899, version 2.5.2a; ref. 28) with ENCODE parameters for long RNA. Genes were quantified using RSEM (RRID:SCR_013027, version 1.2.28; ref. 29) and GENCODE (RRID:SCR_014966, version 24). Mapping quality metrics were calculated with GEMtools (http://gemtools.github.io/) and RSeQC (RRID:SCR_005275; ref. 30). Normalization and transformation of the expression data for plotting and analysis were done with the variance stabilizing transformation function of DESeq2 (RRID:SCR_015687; ref. 31). Principal component analysis (PCA) was performed with the R prcomp function and the R ggplot2 library (RRID:SCR_014601).
Classification of glioblastoma molecular subtypes
TCGA classification of glioblastoma molecular subtypes (7–9) was performed with the GlioVis portal (32), which contains over 6,500 tumor samples of approximately 50 expression datasets and facilitates classification with three algorithms: Support Vector Machine (SVM), K-nearest neighbor (KNN) and ssGSEA. G-CIMP status was predicted using the GlioVis SVM algorithm. The GlioVis glioblastoma TCGA cohort was selected as training dataset for both glioblastoma molecular subtype and G-CIMP predictions. The IGS classification of glioblastoma molecular subtypes was done with the R clusterRepro package (9, 12), using the centroids for IGS0, IGS9, IGS16, IGS17, IGS18, IGS22, and IGS23 subtypes as described by Gravendeel (12, 23, 24).
Tissue microarray construction and IHC analyses
Tissue microarrays (TMAs) were built for IHC analyses of all samples with sufficient tissue. Between two and four cores of each case were obtained, depending on the amount of tissue available. TMAs were constructed using a Veridiam Tissue Array Instrument (El Cajon), model VTA-100, using a 1-mm-diameter needle. Consecutive 4-μm-thick sections were obtained and hematoxylin–eosin staining was done in sections 1, 20, and 40 to evaluate the persistence of the tumor at each spot.
Samples of normal nonbrain tissue were included in the TMAs as a guide for their reading and as positive/negative controls for IHC staining.
BenchmarK XT (Ventana Medical System Inc.), Bond Max (Leica Microsystems GmbH), and Dako (Dako Cytomation) automated immunolabeling systems were used. Cores with less than 50% of tumor were considered nonevaluable, and samples with more than 50% of nonevaluable cores were considered noninformative. Several antibodies were studied. For IDH1-R132H, we used the antibody Dianova catalog No. DIA-H09, RRID:AB_2335716. For H3.3 and H3.3 G34R, the antibodies Millipore catalog No. ABE419, RRID:AB_2728728 and RevMAb Biosciences catalog No. 31-1120-00, RRID:AB_2716433, were used, respectively. In all cases, results were scored as positive or negative expression.
Gene fusions in glioblastoma molecular subtypes
We also screened for fusions that could characterize each subtype. For fusion detection, RNA-seq reads were mapped against the reference human genome with STAR-fusion version 0.7.0 with default options using GENCODE version 24. Previously described fusions were selected as final candidates to describe subtypes (33).
Prognostic value of individual genes and related IHC protein expression
Finally, we looked for individual genes associated with overall survival. As IDH mutation status and G-CIMP status determine populations with different prognosis, we excluded G-CIMP–positive tumors from this analysis to assure an IDH–wild-type population (18, 34). Genes that were significant in our dataset (P < 0.05) and in at least one external glioblastoma gene expression dataset (Gravendeel, TCGA, and/or Rembrandt) were selected for a multivariate survival analysis including clinical variables. The model included MGMT methylation status, patient age, extent of surgery, and gene expression (one gene at a time). We identified those genes that were strongly independently related to survival in this G-CIMP–negative population (HR < 0.7 or >1.5, P < 0.05) and selected three for further validation by IHC: YWHAG (ENSG00000170027.6); UBXN7 (ENSG00000163960.11); and zinc finger protein 7 (ZNF7; ENSG00000147789.15). A FDR approach was used to account for multiple testing.
YWHAG, located at chromosome 7q11.23, has only one transcript variant (ENST00000307630.4), which was the isoform detected with RNA-seq. For IHC analysis, we selected the antibody catalog No. HPA026918, RRID: AB_10610566, which had previously been validated with Western blot analysis and protein array (https://www.proteinatlas.org/ENSG00000170027-YWHAG/antibody) and matched 100% with the gene transcript.
UBXN7, located at chromosome 3q29, has six transcript variants, four of which are protein coding. In our data, we observed the expression of only one protein-coding transcript [ENST00000296328.8 (UBXN7–201)]. For IHC, we used the antibody Anti-UBXN7 (ab185085), which targeted the sequence of the isoform even though it had not previously been validated with Western blot analysis, protein array, or orthogonal testing.
ZNF7, located at chromosome 8q24.3, has 18 transcript variants, 10 of which are protein coding. In our data, the most expressed variants were ZNF7 203, 205, 206, 207, 214, and 218, only four of which (203, 206, 207, 218) are protein coding. For IHC, we used the antibody Thermo Fisher Scientific catalog No. PA5-38583, RRID:AB_2555182, which recognizes a motif in the N-terminal region of the protein (aminoacids 70-83) that is not present in variant ZNF7 218 (Supplementary Fig. S1).
Details on IHC antibodies and scoring systems are summarized in Supplementary Data S1A.
Statistical analyses
Categorical variables were compared with the χ2 or Fisher exact test and continuous variables with the Kruskal–Wallis test. Overall survival was calculated form the date of surgery to the date of death or last control alive. Patients who were still alive at the date of last contact were censored. Median overall survival was calculated with the Kaplan–Meier method and compared using the log-rank test. The Cox proportional hazards model was used to calculate HRs with their 95% confidence intervals (CI). Several multivariate analyses of overall survival were performed, including different variables. Clinical data were merged with molecular and IHC results for statistical analyses. The Pearson ρ coefficient was used to estimate the correlation between gene and protein expression. The optimal cut-off to predict the maximal difference in overall survival according to IHC results was calculated with the Maximally Selected Rank Statistics (Maxstat package of R, version 1.1.442). Analyses were performed with R software v3.4.2 and SPSS v24 (IBM; RRID:SCR_002865).
Data availability
The datasets and full information on quality control assessments of sequenced samples generated during this study are available from the corresponding author on reasonable request. Molecular data underlying the findings described in the article are fully available without restriction from the Bioproject Sequence Read Archive http://www.ncbi.nlm.nih.gov/bioproject/PRJNA613395.
Results
Tumor tissue samples were obtained from 329 of 432 patients with glioblastoma registered in the GLIOCAT project (26). After multiple RNA extractions from each sample, 357 RNA libraries were prepared and RNA-seq results were obtained for 151 tumor samples. Twenty-seven of these samples were not included in subsequent analyses: 18 because the RNA was degraded and/or contaminated and nine because they were obtained at second surgeries at recurrence. Quality control assessments and mapping statistics are summarized in Supplementary Data S2. In addition, 17 TMAs were prepared and 280 samples were analyzed by IHC.
RNA-seq characterization of glioblastoma molecular subtypes
On the basis of the RNA-seq results, the tumor samples were classified into the three TCGA molecular subtypes (9), using the three facilitated algorithms by GlioVis: SVM, KNN, and ssGSEA (Supplementary Data S1B; ref. 32). There was agreement among all three algorithms on the subtype classification of only 82 samples, which were considered a separate subgroup in further analyses (Table 1).
. | All patients with RNA-seq results . | Non-GCIMP patients with RNA-seq results . | All patients with IHC results . |
---|---|---|---|
. | n = 124 (%) . | n = 118 (%) . | n = 280 (%) . |
Age, yrs | |||
median (range) | 64.0 (55.0–71.0) | 64.5 (56.0–71.0) | 62.0 (53.0–69.0) |
≤65 yrs | 70 (56.5) | 64 (54.2) | 186 (66.4) |
≤50 yrs | 15 (12.1) | 11 (9.32) | 47 (16.8) |
Sex | |||
Male | 76 (61.3) | 71 (60.2) | 164 (58.6) |
Female | 48 (38.7) | 47 (39.8) | 116 (41.4) |
Glioblastoma type | |||
Primary | 120 (96.8) | 116 (98.3) | 273 (97.5) |
Secondary | 4 (3.23) | 2 (1.69) | 7 (2.50) |
Extent of surgerya | |||
Total resection | 27 (23.9) | 26 (24.3) | 74 (28.1) |
Partial resection | 71 (62.8) | 66 (61.7) | 171 (65.0) |
Biopsy | 15 (13.3) | 15 (14.0) | 18 (6.8) |
MGMT methylation statusa,b | |||
Methylated | 64 (51,6) | 60 (50.8) | 132 (47.1) |
Unmethylated | 52 (41.9) | 50 (42.4) | 138 (49.3) |
Unknown | 8 (6.5) | 8 (6.8) | 10 (3.6) |
KPSc | |||
≤70% | 31 (25.0) | 30 (25.4) | 58 (20.7) |
>70% | 64 (51.6) | 61 (51.7) | 169 (60.4) |
Unknown | 29 (23.4) | 27 (22.9) | 53 (18.9) |
MMSE score | |||
<27 | 9 (7.3) | 9 (7.8) | 40 (14.3) |
≥27 | 31 (25.0) | 29 (24.5) | 77 (27.5) |
Unknown | 84 (67.7) | 80 (67.7) | 163 (58.2) |
IDH1 statusd | |||
Wild-type | 107 (86.3) | 106 (189.8) | 264 (94.3) |
Mutated | 5 (4.0) | - | 11 (3.9) |
Unknown | 12 (9.7) | 12 (10.2) | 5 (1.8) |
G-CIMP+ | 6 (4.8) | 0 (0.0) | 6 (2.1) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by SVMe | |||
Classical | 46 (37.1) | 45 (38.1) | 40 (14.3) |
Mesenchymal | 40 (32.3) | 40 (33.9) | 36 (12.9) |
Proneural | 38 (30.6) | 33 (28.0) | 37 (13.2) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by KNNe | |||
Classical | 80 (64.5) | 80 (67.8) | 70 (25.0) |
Mesenchymal | 14 (11.3) | 14 (11.9) | 14 (5.0) |
Proneural | 30 (24.2) | 24 (20.3) | 29 (10.4) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by ssGSEAe | |||
Classical | 53 (42.7) | 53 (44.9) | 45 (16.1) |
Mesenchymal | 32 (25.8) | 32 (27.1) | 30 (10.7) |
Proneural | 39 (31.5) | 33 (28.0) | 38 (13.6) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by SVM+KNN+ssGSEA (n = 82)e | |||
Classical | 42 (51.2) | 42 (54.5) | 36 (12.9) |
Mesenchymal | 13 (15.9) | 13 (16.9) | 13 (4.6) |
Proneural | 27 (32.9) | 22 (28.6) | 26 (9.3) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
IGS classification | |||
0 | 6 (4.8) | 6 (5.0) | 5 (1.8) |
9 | 14 (11.3) | 12 (10.2) | 14 (5.0) |
16 | 2 (1.6) | 2 (1.69) | 2 (0.7) |
17 | 12 (9.6) | 8 (6.78) | 11 (3.9) |
18 | 68 (54.8) | 68 (57.6) | 59 (21.1) |
22 | 4 (3.2) | 4 (3.3) | 4 (1.4) |
23 | 18 (14.5) | 18 (15.3) | 18 (6.4) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
. | All patients with RNA-seq results . | Non-GCIMP patients with RNA-seq results . | All patients with IHC results . |
---|---|---|---|
. | n = 124 (%) . | n = 118 (%) . | n = 280 (%) . |
Age, yrs | |||
median (range) | 64.0 (55.0–71.0) | 64.5 (56.0–71.0) | 62.0 (53.0–69.0) |
≤65 yrs | 70 (56.5) | 64 (54.2) | 186 (66.4) |
≤50 yrs | 15 (12.1) | 11 (9.32) | 47 (16.8) |
Sex | |||
Male | 76 (61.3) | 71 (60.2) | 164 (58.6) |
Female | 48 (38.7) | 47 (39.8) | 116 (41.4) |
Glioblastoma type | |||
Primary | 120 (96.8) | 116 (98.3) | 273 (97.5) |
Secondary | 4 (3.23) | 2 (1.69) | 7 (2.50) |
Extent of surgerya | |||
Total resection | 27 (23.9) | 26 (24.3) | 74 (28.1) |
Partial resection | 71 (62.8) | 66 (61.7) | 171 (65.0) |
Biopsy | 15 (13.3) | 15 (14.0) | 18 (6.8) |
MGMT methylation statusa,b | |||
Methylated | 64 (51,6) | 60 (50.8) | 132 (47.1) |
Unmethylated | 52 (41.9) | 50 (42.4) | 138 (49.3) |
Unknown | 8 (6.5) | 8 (6.8) | 10 (3.6) |
KPSc | |||
≤70% | 31 (25.0) | 30 (25.4) | 58 (20.7) |
>70% | 64 (51.6) | 61 (51.7) | 169 (60.4) |
Unknown | 29 (23.4) | 27 (22.9) | 53 (18.9) |
MMSE score | |||
<27 | 9 (7.3) | 9 (7.8) | 40 (14.3) |
≥27 | 31 (25.0) | 29 (24.5) | 77 (27.5) |
Unknown | 84 (67.7) | 80 (67.7) | 163 (58.2) |
IDH1 statusd | |||
Wild-type | 107 (86.3) | 106 (189.8) | 264 (94.3) |
Mutated | 5 (4.0) | - | 11 (3.9) |
Unknown | 12 (9.7) | 12 (10.2) | 5 (1.8) |
G-CIMP+ | 6 (4.8) | 0 (0.0) | 6 (2.1) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by SVMe | |||
Classical | 46 (37.1) | 45 (38.1) | 40 (14.3) |
Mesenchymal | 40 (32.3) | 40 (33.9) | 36 (12.9) |
Proneural | 38 (30.6) | 33 (28.0) | 37 (13.2) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by KNNe | |||
Classical | 80 (64.5) | 80 (67.8) | 70 (25.0) |
Mesenchymal | 14 (11.3) | 14 (11.9) | 14 (5.0) |
Proneural | 30 (24.2) | 24 (20.3) | 29 (10.4) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by ssGSEAe | |||
Classical | 53 (42.7) | 53 (44.9) | 45 (16.1) |
Mesenchymal | 32 (25.8) | 32 (27.1) | 30 (10.7) |
Proneural | 39 (31.5) | 33 (28.0) | 38 (13.6) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
TCGA subtype by SVM+KNN+ssGSEA (n = 82)e | |||
Classical | 42 (51.2) | 42 (54.5) | 36 (12.9) |
Mesenchymal | 13 (15.9) | 13 (16.9) | 13 (4.6) |
Proneural | 27 (32.9) | 22 (28.6) | 26 (9.3) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
IGS classification | |||
0 | 6 (4.8) | 6 (5.0) | 5 (1.8) |
9 | 14 (11.3) | 12 (10.2) | 14 (5.0) |
16 | 2 (1.6) | 2 (1.69) | 2 (0.7) |
17 | 12 (9.6) | 8 (6.78) | 11 (3.9) |
18 | 68 (54.8) | 68 (57.6) | 59 (21.1) |
22 | 4 (3.2) | 4 (3.3) | 4 (1.4) |
23 | 18 (14.5) | 18 (15.3) | 18 (6.4) |
Unknown | 0 (0.0) | 0 (0.0) | 167 (59.6) |
Note: Supplementary Table S3 shows the details of the survival analysis.
Abbreviations: IHC, immunohistochemistry; MMSE, mini mental status examination; yrs, years.
aPrognostic factor for overall survival in all patient subpopulations.
bMGMT methylation status was not available for all patients.
cPrognostic factor for overall survival in patients with IHC results.
dPrognostic factor for overall survival in patients with RNA-seq or IHC results.
eNo prognostic value for overall survival.
There was no significant association between clinical characteristics and TCGA subtypes (Supplementary Table S2). However, tumors classified as proneural by any of the algorithms were more likely to have IDH1 mutations and to be G-CIMP positive, while absence of MGMT methylation was more frequent in those classified as mesenchymal by SVM (Fig. 1; Supplementary Table S1). There was no case with a histone H3.3 (K27 or G34 mutant) mutation in our series. There was also no significant association between clinical characteristics and IGS clusters. IDH1-mutated and G-CIMP–positive tumors were mostly classified as clusters 9 and 17 (Fig. 1; Supplementary Table S2). Most tumors classified as IGS cluster 18 were TCGA classical subtype and those classified as IGS cluster 23 were TCGA mesenchymal subtype (Fig. 1).
Prognostic factors
The log-rank test identified age ≤65, greater extent of resection, MGMT methylation, G-CIMP positive, and IDH1 mutations—but not TCGA subtypes—as associated with longer overall survival (Supplementary Table S3). The multivariate analyses identified age, extent of resection, and MGMT methylation status as independent prognostic factors for overall survival. In addition, the mesenchymal subtype identified by KNN—but not by other algorithms—was associated with shorter survival in the multivariate analysis (HR, 2.53; 95% CI, 1.05–6.12; P = 0.038). Patients with tumors classified as IGS 17 had longer survival, but these tumors were enriched with other favorable markers: 83.3% had MGMT methylation; 27.3% were IDH1-mutated; and 33.3% were G-CIMP positive.
Gene fusions in TCGA subtypes
Among the 124 samples with informative RNA-seq results, 25 (20.1%) had actionable gene fusions, which were mainly in TCGA classical subtype and IGS cluster 18 (Fig 1).
Prognostic value of individual genes in G-CIMP–negative tumors
When we explored the prognostic value of individual genes from our RNA-seq data, in order to ensure an IDH–wild-type subgroup, we eliminated G-CIMP–positive cases and examined gene expression in the remaining 118 cases (Table 1). Through a univariate analysis and a comparative study with three external molecular data bases, we identified 139 genes associated with overall survival in both our univariate analysis and at least one other database (Fig. 2). A multivariate model including these genes, along with age, MGMT methylation status, and extent of surgery, identified 47 genes as independent markers of overall survival (Fig. 1; Supplementary Table S4). Five of these 47 genes were strong independent factors of longer survival (HR < 0.7; P < 0.05) and four of shorter survival (HR > 1.5; P < 0.05). An additional four genes were related less strongly with survival but were validated in two external databases in G-CIMP–negative tumors (Table 2). We then narrowed our selection to three genes for further validation by IHC and immune phenotyping: YWHAG, UBXN7, and ZFN7. Higher median expression levels of YWHAG and UBXN7 were observed in the proneural subtype, while there were no differences in ZNF7 expression levels across the three subtypes (Fig. 3).
Genea . | HR (95% CI) . | P . | External database for validation . | HUGO gene nomenclature committee gene name and ID . |
---|---|---|---|---|
High gene expression associated with longer overall survival | ||||
ZNF7 | ||||
ENSG00000147789 | 0.47 (0.26–0.83) | 0.009 | Rembrandt | zinc finger protein 7 (13139) |
UBN1 | ||||
ENSG00000118900 | 0.53 (0.28–0.98) | 0.042 | Graveendel | ubinuclein 1 (12506) |
LZTS2 | ||||
ENSG00000123684 | 0.62 (0.41–0.95) | 0.027 | Rembrandt | leucine zipper tumor suppressor 2 (29381) |
SLC25A25 | ||||
ENSG00000148339 | 0.66 (0.46–0.95) | 0.024 | Graveendel | solute carrier family 25 member 25 (20663) |
High gene expression associated with shorter overall survival | ||||
LPGAT1 ENSG00000123684 | 1.8 (1.2–2.7) | 0.007 | Graveendel | lysophosphatidylglycerol acyltransferase 1 (28985) |
YWHAG | ||||
ENSG00000170027 | 1.8 (1.2–2.8) | 0.010 | Graveendel and TCGA | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma (12852) |
DYNLL1 | ||||
ENSG00000088986 | 1.8 (1.1–3) | 0.021 | Graveendel | dynein light chain LC8-type 1 (15476) |
DNAJC10 | ||||
ENSG00000077232 | 1.8 (1.1–2.9) | 0.024 | Rembrandt | DnaJ heatshock protein family (Hsp40) member C10 (24637) |
UBXN7 | ||||
ENSG00000163960 | 2 (1.2–3.5) | 0.0092 | Rembrandt | UBX domain protein 7 (29119) |
High gene expression less strongly associated with shorter overall survival but validated in two external databases | ||||
SLC17A6 | ||||
ENSG00000091664 | 1.1 (1–1.2) | 0.012 | Graveendel and TCGA | solute carrier family 17 member 6 (16703) |
CNDP1 | ||||
ENSG00000150656 | 1.1 (1–1.2) | 0.031 | Graveendel and TCGA | carnosine dipeptidase 1 (20675) |
B3GALNT1 | ||||
ENSG00000169255 | 1.5 (1.1–2.1) | 0.02 | Graveendel and Rembrandt | β-1,3-N-acetylgalactosaminyltransferase 1 (globoside blood group; 918) |
KIAA1715 (new name LNPK) | ||||
ENSG00000144320 | 1.5 (1–2.1) | 0.027 | Graveendel and Rembrandt | lunapark, ER junction formation factor (21610) |
Genea . | HR (95% CI) . | P . | External database for validation . | HUGO gene nomenclature committee gene name and ID . |
---|---|---|---|---|
High gene expression associated with longer overall survival | ||||
ZNF7 | ||||
ENSG00000147789 | 0.47 (0.26–0.83) | 0.009 | Rembrandt | zinc finger protein 7 (13139) |
UBN1 | ||||
ENSG00000118900 | 0.53 (0.28–0.98) | 0.042 | Graveendel | ubinuclein 1 (12506) |
LZTS2 | ||||
ENSG00000123684 | 0.62 (0.41–0.95) | 0.027 | Rembrandt | leucine zipper tumor suppressor 2 (29381) |
SLC25A25 | ||||
ENSG00000148339 | 0.66 (0.46–0.95) | 0.024 | Graveendel | solute carrier family 25 member 25 (20663) |
High gene expression associated with shorter overall survival | ||||
LPGAT1 ENSG00000123684 | 1.8 (1.2–2.7) | 0.007 | Graveendel | lysophosphatidylglycerol acyltransferase 1 (28985) |
YWHAG | ||||
ENSG00000170027 | 1.8 (1.2–2.8) | 0.010 | Graveendel and TCGA | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma (12852) |
DYNLL1 | ||||
ENSG00000088986 | 1.8 (1.1–3) | 0.021 | Graveendel | dynein light chain LC8-type 1 (15476) |
DNAJC10 | ||||
ENSG00000077232 | 1.8 (1.1–2.9) | 0.024 | Rembrandt | DnaJ heatshock protein family (Hsp40) member C10 (24637) |
UBXN7 | ||||
ENSG00000163960 | 2 (1.2–3.5) | 0.0092 | Rembrandt | UBX domain protein 7 (29119) |
High gene expression less strongly associated with shorter overall survival but validated in two external databases | ||||
SLC17A6 | ||||
ENSG00000091664 | 1.1 (1–1.2) | 0.012 | Graveendel and TCGA | solute carrier family 17 member 6 (16703) |
CNDP1 | ||||
ENSG00000150656 | 1.1 (1–1.2) | 0.031 | Graveendel and TCGA | carnosine dipeptidase 1 (20675) |
B3GALNT1 | ||||
ENSG00000169255 | 1.5 (1.1–2.1) | 0.02 | Graveendel and Rembrandt | β-1,3-N-acetylgalactosaminyltransferase 1 (globoside blood group; 918) |
KIAA1715 (new name LNPK) | ||||
ENSG00000144320 | 1.5 (1–2.1) | 0.027 | Graveendel and Rembrandt | lunapark, ER junction formation factor (21610) |
aSupplementary Table S5 shows the Gene Ontology terms for these genes.
High RNA expression levels of YWHAG and UBXN7 were markers of shorter overall survival (Table 2; Supplementary Table S5). There was only a trend toward a correlation between YWHAG RNA and protein expression (P = 0.067), while there was no correlation between UBXN7 RNA and protein expression (P = 0.195). There was no association between protein expression of either YWHAG or UBXN7 and overall survival.
High ZNF7 RNA expression was a marker of longer survival (Table 2; Fig. 4A). Median expression of ZNF7 was 15.0 (percentile 25; 75%: 5.00–36.3), Supplementary Fig. S2 and Supplementary Data S1A. There was no correlation between RNA and protein expression (P = 0.979). According to a cut-off of 2.3, calculated by the Maximally Selected Rank statistics method, among the selected G-CIMP–negative population, median overall survival for patients with low ZNF7 protein expression was 27.2 months (95% CI, 11.7–42.7), compared with 12.9 months (95% CI, 10.4–15.5) for those with high expression (HR, 0.44; 95% CI, 0.21–0.92; P = 0.029; Fig. 4B). Low protein expression was also identified as an independent factor of longer overall survival when adjusted by age, extent of surgery, and MGMT methylation status (HR, 0.30; 95% CI, 0.129–0.723; P = 0.007) and maintained its prognostic value when we successively included TCGA subtypes by each of the classification methods in the multivariate model. Among all patients with IHC results (Table 1), that included unselected G-CIMP or IDH-status patients, there was a trend toward association with survival in the univariate analysis: low ZNF7 protein expression, 20.6 months (95% CI, 15.2–26.1); high ZNF7 protein expression, 16.0 months (95% CI, 13.6–18.3). The multivariate analysis identified ZNF7 protein expression as a prognostic marker (HR, 0.62; 95% CI, 0.39–1.0; P = 0.053; Supplementary Data S1A).
Discussion
We have examined the prognostic role of TCGA (9) and IGS (12) molecular subtypes in FFPE tumor samples from a uniformly treated cohort of newly diagnosed patients with glioblastoma. We found that samples were classified differently depending on the GlioVis algorithm (32) used and that there was no clear association between molecular subtype and survival. Finally, we identified individual genes that may have a greater impact on survival than the molecular subtypes themselves.
Only 66.1% of cases were classified into the same subtype by all three GlioVis algorithms (32). Using ssGSEA, which is based on 50 gene signatures strongly associated with the different subtypes, TCGA network found a 77%–93% concordance of unique calls to a specific subtype (7, 9) and attributed discordant cases to intratumoral heterogeneity and the activation of more than one transcriptional signature. However, this explanation cannot by itself account for our divergent findings because our raw data used for classification was the same for all three algorithms.
Nevertheless, we observed a high overlapping between TCGA mesenchymal subtype and the IGS cluster 23 and between TCGA classical subtype and the IGS cluster 18, which was especially evident for samples that called only one subtype. Both TCGA and the IGS classifications have been used in clinical trials (20–24) to explore treatment susceptibilities. But the variability we have observed in our study would preclude their use, unless the methods used for classification were strengthened to find a reproducible and interchangeable classification (35)
Regardless of the algorithm used for classifying the subtypes, the univariate and multivariate analyses did not identify an association between molecular TCGA subtypes and survival, nor among the glioblastoma IGS subtype independently of the individual status of IDH1 or MGMT. The mesenchymal subtype as identified by KNN, that seemed to be more stringent than the other algorithms by classifying only 11.2% of samples as mesenchymal, was associated with shorted survival but only when the analysis was adjusted for other prognostic factors.
Independently of the molecular subtypes, we identified 13 genes whose expression positively or negatively influenced outcome, even when adjusted for established prognostic factors, in G-CIMP–negative tumors as surrogates of IDH–wild-type glioblastomas. Moreover, the prognostic value of these genes was validated in TCGA, Gravendeel, and/or Rembrandt in similar population datasets, thus reinforcing our results. Our findings are in line with other reports indicating that individual genes seem to predict outcome more accurately than either morphologic or genomic subtyping (36–39). We selected three of these genes for validation by IHC (YWHAG, UBXN7, and ZNF7). We were able to detect a prognostic role for ZNF7. Interestingly, ZNF7 gene expression levels were independent of the molecular subtypes. Multivariate analyses showed that low protein expression was an independent marker of longer survival (HR, 0.30; P = 0.007) even when adjusted by the other known prognostic factors and the molecular subtypes, while conversely, high gene expression was a marker of longer survival (HR, 0.47; P = 0.009). The trend was maintained in the unselected cohort of patients with IHC but not RNA-seq assessment.
In an attempt to account for this inverse relationship between gene and protein expression with survival, we first reviewed the lack of correlation between the antibody used for IHC and gene expression. ZNF7 is a C2H2 ZNF protein that belongs to the large Krüppel-associated box (KRAB) ZNF subfamily of ZNF transcription factors ((https://www.uniprot.org/uniprot/P17097). They are characterized by having a KRAB repressor domain in the N-terminus and several C2H2 ZNF motifs that recognize specific DNA sequences and are thought to be involved in KRAB-mediated epigenetic silencing activities. Although there is little experimental data on the KRAB C2H2 family, some of the members play an important role in various processes of embryonic development, cell differentiation and proliferation, and regulation of the cell cycle and apoptosis (40). Among the highest expressed coding variants in our samples, only the isoform 207 has both the KRAB domain and the C2H2 ZNF motifs, while variants 203 and 206 have only the N-terminal KRAB transcriptional repressor domain (without DNA-binding activity) and variant 218 has only the C2H2 ZNF motif. It would thus be expected that isoforms 203 and 206, lacking ZNF motifs, would not be able to bind DNA, although they could potentially bind corepressor proteins, and that variant 218 would not recruit transcription repression complexes to ZNF7 DNA-binding sites. Therefore, the lack of correlation between gene and protein expression could be in part explained by the inability of the antibody to detect the most expressed transcript variant, ZNF7 218.
Other factors must be considered, however. Discordances between RNA and protein expression are well documented (41, 42). In a comprehensive cross-tissue analysis of gene and protein expression in normal and cancer tissues (43), there was a significant correlation between mRNA expression and protein abundance in only 6.1% of the studied genes. Intriguingly, higher levels of correlation were observed in normal tissues compared with tumor specimens. Current data demonstrate a substantial role for regulatory processes occurring after mRNA is made in this lack of concordance. For example, translation rates can be influenced by the mRNA sequence, thus causing a difference among ZNF7 transcript variants. In addition, protein half-life can be modulated by several mechanisms, including post-transcriptional modifications, and in fact, several SUMOylation and ubiquitination sites are predicted in the ZNF7 reference sequence. It is unknown how differences in protein interactions and posttranslational modifications can affect the half-life or subcellular localization of each protein variant. Given the complexity of ZNF7 alternative splicing and the great differences among ZNF7 protein isoforms, in-depth in vivo and in vitro studies and the use of isoform-specific antibodies would be necessary to clarify the contradictory effects on survival of mRNA and protein expression. Functional studies to explore the mechanisms of this lack of concordance are warranted but beyond the scope of this study.
An important limitation of this study was the dropout of samples when they were processed for RNA-seq. We obtained FFPE samples of 329 patients and 357 libraries were prepared in different batches and multiple extractions. However, we obtained good-quality results for only 124 cases, which highlights both the feasibility of using FFPE samples for RNA-seq and also the difficulties involved in chemical modification, cross-linking, and degradation over time due to the fixation and archiving methods (26). Nevertheless, this is the first study of RNA-seq analysis of FFPE samples in a sizeable cohort of newly diagnosed, uniformly treated patients with glioblastoma. Consequently, we were able to include clinical and molecular variables, such as MGMT methylation, in multivariate models. This allowed us to demonstrate a lack of prognostic impact of the molecular subtypes in the first-line setting and, conversely, the emergence of several genes independently related to survival in non-GCIMP patients with glioblastoma. These findings, especially those on the ZNF7 gene and its protein product, warrant validation in experimental models and larger cohorts of patients.
Authors’ Disclosures
C. Carrato reports personal fees from Abbvie outside the submitted work. M. Martinez Garcia reports other from Roche, Pfizer, and Celgene outside the submitted work. J. Capellades reports other from UCB Pharma, Eisai, and Medtronic outside the submitted work. M.J. Gil-Gil reports personal fees from Pfizer, Roche Pharma, Eisai, and Agendia, and nonfinancial support from Novartis, Daiichi-Sankyo, and Kern outside the submitted work. B. Bellosillo reports personal fees from Biocartis outside the submitted work. A. Muñoz-Marmol reports personal fees from AbbVie Inc, Roche, Pfizer, and Merck Sharp & Dohme de España and other from Amgen and Novartis outside the submitted work. N. de la Iglesia reports grants from Marató TV3 Foundation during the conduct of the study. C. Balana reports grants from FUNDACIO LA MARATO TVE during the conduct of the study and personal fees from Karyopharm, Abbvie, and Celgene outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
A. Esteve-Codina: Resources, data curation, software, formal analysis, methodology, writing-original draft, writing-review and editing. F. Alameda: Conceptualization, resources, data curation, formal analysis, funding acquisition, investigation, methodology, writing-original draft. C. Carrato: Formal analysis, investigation, methodology, writing-review and editing. E. Pineda: Conceptualization, data curation, formal analysis, supervision, investigation, writing-original draft. O. Arpí: Formal analysis, investigation, methodology, writing-original draft. M. Martinez García: Conceptualization, data curation, supervision, investigation, methodology, writing-original draft. M. Mallo: Conceptualization, resources, data curation, software, writing-original draft. M. Gut: Conceptualization, resources, formal analysis, validation, investigation, visualization, methodology, writing-original draft. M. Dabad: Data curation, formal analysis, investigation, methodology, writing-original draft. A. Tortosa: Conceptualization, data curation, investigation, writing-original draft. S. Del Barco: Resources, data curation, supervision, investigation, writing-original draft. J. Capellades: Conceptualization, resources, data curation, formal analysis, supervision, investigation, methodology, writing-original draft. J. Puig: Software, formal analysis, supervision, investigation, visualization, writing-original draft. O. Gallego: Data curation, investigation, writing-original draft. T. Pujol: Data curation, investigation, visualization, writing-original draft. L. Oleaga: Data curation, investigation, visualization, writing-original draft. M. Gil-Gil: Resources, data curation, investigation, writing-original draft. C. de Quintana-Schmidt: Data curation, investigation, writing-original draft. I. Valduvieco: Data curation, investigation, writing-original draft. A. Martinez-Cardus: Data curation, investigation, methodology, writing-original draft. B. Bellosillo: Resources, data curation, investigation, methodology, writing-original draft. A.M. Muñoz-Marmol: Data curation, formal analysis, methodology, writing-original draft, writing-review and editing. A. Esteve: Data curation, software, formal analysis, investigation, methodology, writing-original draft, writing-review and editing. M. Domenech: Data curation, investigation, writing-original draft. A. Camins: Data curation, investigation, visualization. J. Craven-Bartle: Resources, data curation, investigation. S. Villa: Data curation, investigation. J. Marruecos: Resources, investigation. S. Domenech: Data curation, investigation, visualization. N. de la Iglesia: Conceptualization, resources, data curation, supervision, validation, investigation, methodology, writing-review and editing. C. Balana: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, investigation, visualization, methodology, writing-original draft, project administration, writing-review and editing.
Acknowledgments
This study was funded by the Fundació La Marató TV3 (665/C/2013; http://www.ccma.cat/tv3/marato/projectes-financats/2012/231/) PIs: Carmen Balana, Francesc Alameda, and Nuria de la Iglesia. Anna Esteve-Codina is funded by ISCIII/MINECO (PT17/0009/0019) and co-funded by FEDER. Marta Domenech is funded by ISCIII/AES (CM19/00068). Carmen Balana is funded by ISCIII/AES (INT19/00032).
The authors thank the biobanks of the participating institutions: Fundació Institut Mar d’Investigacions Mèdiques, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Institut d’Investigació en Ciències de la Salut Germans Trias i Pujol (IGTP), Fundació Institut de Recerca de l’Hospital de la Santa Creu i St. Pau (IIB Sant Pau), and the Xarxa de Bancs de Tumors sponsored by Pla Director d'Oncologia de Catalunya (XBTC). We also thank Orieta Celiku from the NCI for providing useful guidelines for TGCA glioblastoma data retrieval and Natalia Garcia-Balaña for her role as data manager and in coordinating the group. We are grateful for the collaboration of all the investigators of the Glioma Catalonia (GLIOCAT) Project.
Other investigators of the GLIOCAT Project:
Julie Marie-Blanc1, Montserrat Arumí2, Raquel Lopez-Martos3, Silvia Menendez4, Ivan Archilla5, Teresa Ribalta5, Carlos Mesia6, Silvia Bagué7, Rafael Fuentes8, Noemí Vidal9, Iban Aldecoa5,10
1CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08028 Barcelona, Spain;2Pathology Department, Neuropathology Unit, Hospital del Mar, Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Barcelona, Spain;3Pathology Department, Hospital Universitari Germans Trias i Pujol, Badalona, Spain; 4Cancer Research Program, Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Barcelona, Spain; 5Pathology Department (Neuropathology), Hospital Clínic, Barcelona, Spain; 6Neuro-Oncology Unit & Medical Oncology Department, Institut Catala d’Oncologia (ICO), Institut de Investigació Bellvitge (IDIBELL), L'Hospitalet, Barcelona, Spain; 7Pathology Department, Hospital de Sant Pau, Barcelona, Spain; 8Radiation Therapy Department, Institut Catala d’Oncologia (ICO), Girona, Spain; 9Pathology Department, Hospital de Bellvitge. Bellvitge, Spain; 10Neurological Tissue Bank, Biobanc-Hospital Clínic-IDIBAPS, Barcelona, Spain
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.