Abstract
Gliomas are the most common and malignant intracranial tumors. The standard therapy is surgical resection combined with radiotherapy and chemotherapy. However, the emergence of radioresistance and chemoresistance, which is largely due to DNA damage repair, limits the therapeutic efficacy. Therefore, we identified a high-efficiency DNA damage repair–related risk signature as a predictor for prognosis in lower grade glioma.
The signature was developed and validated in two independent datasets of the Chinese Glioma Genome Atlas (172 samples) and The Cancer Genome Atlas (451 samples). The time-dependent ROC curve, Cox regression, Nomogram, and Kaplan–Meier analyses were performed to evaluate the prognostic performance of the risk signature. The Metascape and IHC staining were performed to reveal the potential biological mechanism. GraphPad prism, SPSS, and R language were used for statistical analysis and graphical work.
This signature could distinguish the prognosis of patients, and patients with high-risk scores exhibited short survival time. The time-dependent ROC curve, Cox regression, and Nomogram model indicated the independent prognostic performance and high prognostic accuracy of the signature for survival. Combined with the IDH mutation status, this risk signature could further subdivide patients with distinct survival. Functional analysis of associated genes revealed signature-related biological process of cell cycle and DNA repair. These mechanisms were confirmed in patient samples.
The DNA damage repair–related signature was an independent and powerful prognostic biomarker in lower grade glioma.
The signature may potentially improve risk stratification of patients and provide a more accurate assessment of personalized treatment in clinic.
Introduction
Gliomas are the most common and malignant primary tumors in the central nervous system, with standard treatment involving surgical resection, radiotherapy, and chemotherapy (1–3). According to the 2016 World Health Organization (WHO) Classification of Tumors of the central nervous system, molecular biomarkers have been incorporated with histologic features to refine glioma classification, which can improve the accuracy of diagnosis and guide individualized treatment (4, 5). The application of molecular classification has driven targeted therapies and several approaches are used in clinical trials, such as PDGFR inhibitor, AKT inhibitor, and IDH1 inhibitor (6, 7). Moreover, with the success of mAbs against PD-1 or PD-L1 in treatment of melanoma and non–small cell lung cancer, immunotherapy has been proposed as a promising treatment of glioma (8–11). At present, approaches of immunotherapy are not limited to immune checkpoints blockade. Peptide vaccines, dendritic cell vaccines, and regulation of innate immune cells in the glioma environment are currently evaluated (12, 13). Despite numerous efforts to increase efficacy of glioma treatment, no substantial improvement in survival has been obtained.
As the standard regimens, both radiotherapy and alkylating agent temozolomide kill cancer cells by inducing DNA damage, which interferes with DNA replication and transcription and finally activates cell death pathways. However, DNA damage response (DDR), which is essential to maintain genomic stability in non-neoplastic cells, can remove these lesions, contributing to chemoresistance and radioresistance (14, 15). DDR is a complex network of signaling, involving DNA damage signaling, cell-cycle checkpoints induction and cell-cycle arrest, and DNA damage repair. Hence, an approach targeting DDR offers a hopeful adjuvant therapy to improve clinical prognosis of glioma (16). For example, NVP-BEZ235, a dual PI3K/mTOR inhibitor, attenuates the repair of ionizing radiation–induced DNA damage and enhances tumor radiosensitization in mouse glioblastoma model (17). PARP inhibitors combined with radiotherapy and/or temozolomide have demonstrated antiglioblastoma efficacy in animal models (18, 19). Therefore, the incorporation of DDR-related genes into the studies of cancer progression and prognosis will provide new ideas for clinical treatment of patients. Recently, a DNA repair–related signature has been identified in glioblastoma to predict prognosis (20).
In this study, we systemically analyzed the expression profile of RNA sequencing data and developed an integrated DNA damage repair–related risk signature to predict the survival of patients with lower grade glioma (LGG). We further characterized and evaluated the four-gene signature in Chinese Glioma Genome Atlas (CGGA, training set) and The Cancer Genome Atlas (TCGA, validation set) datasets. By applying Cox regression analysis, time-dependent ROC (time ROC) curve, and nomogram plots, we ascertained the risk signature as an independent prognostic biomarker, which could accurately predict the 1-, 3-, and 5-year overall survival for patients with LGG. The underlying mechanism of risk signature was also elucidated. We believed that this robust prognostic model would improve risk stratification and provide a more accurate assessment of personalized clinical management in patients with LGG.
Materials and Methods
Datasets
A total of 172 LGG samples with RNA-sequencing data and corresponding clinical information were obtained from CGGA (http://www.cgga.org.cn) as training/discovery dataset. The establishment and management of our CGGA dataset have been introduced in our previous article (21). Another 451 glioma samples with RNA-sequencing data and clinical information were obtained from TCGA (http://cancergenome.nih.gov/) as the validation set. The characteristics of patients with glioma from these datasets are summarized in Supplementary Table S1. We obtained written informed consent from the patients and the studies were conducted in accordance with the Declaration of Helsinki. The studies were approved by the Institutional Review Boards of Beijing Tiantan Hospital.
Analytic approach and bioinformatics
To obtain prognostic genes, we applied univariate Cox proportional regression model in LGG samples from CGGA dataset with a list of 513 putative DNA repair genes that were obtained from previous study (22). P < 0.05 was considered statistically significant. A total of 266 genes within the list were identified to be associated with survival in patients with LGGs. To identify the most useful predictive features, we employed the least absolute shrinkage and selection operator (LASSO), which has been proved suitable for regression analysis of high dimensional data (23, 24), to develop an optimal risk signature. The risk score for each individual was calculated as follows:
where |{{\rm{\beta }}_{\rm{i}}}$| is the coefficient and xi is the z-score–transformed relative expression value of each selected gene.
Patients in both the training and validation dataset were then divided into high risk and low risk groups based on the median cutoff of risk score. The Kaplan–Meier survival curves were used to estimate survival distributions. Cox regression and time ROC curve analysis were conducted to evaluate the prognostic performance of our prognostic model. A web-based tool, Metascape (http://metascape.org/), was employed to functionally annotate the pathways and biological functions of our risk signature. Figures were generated by several R packages, such as “pheatmap,” “time ROC,” and “survival.”
Construction of an individualized prediction model
To predict patients' overall survival, a nomogram was established as an individualized prediction model on the basis of the integrated analysis of signature risk score and clinicopathologic characteristics using the “rms” package in R language software (25). The final model was constructed using a backward stepdown selection process based on the Akaike information criterion (26). Concordance index (C-index) and calibration curves were used to measure the predictive accuracy and discriminative ability of the nomograms. The prognostic nomogram was estimated in the training cohort and then tested in the validation cohort.
IHC
To confirm the biological function of signature genes, we analyzed IHC protein staining data of Cyclin A2 and FANCD2 in the IDH-mutant (mut) LGG samples from CGGA dataset. The IHC expression levels were compared among patients with glioma in the low-, medium-, and high risk score groups with a nonparametric test. In brief, sections were deparaffinized, boiled with EDTA antigen retrieval buffer, and then incubated with the primary antibodies overnight at 4°C (anti-Cyclin A2 antibody, 1:500 dilution, Abcam and anti-FANCD2, 1:100, Abcam). The sections were then incubated with the appropriate secondary antibodies (1:100, ZSGB-Bio) at room temperature for 1 hour. IHC staining was quantified by analysis of images from at least five high-power fields per section. The images were evaluated independently by two investigators. The expression levels of Cyclin A2 and FANCD2 in tumor were defined as the portion of positively stained cells against total counted cells. The difference was assessed by Student t test.
Statistical analysis
One-way ANOVA was performed to compare differences among at least three groups. The Student t test was used to determine differences in each two-group comparison. The statistical analyses were conducted using R language (version 3.4.1), GraphPad Prism (version 7.0), and SPSS (version 16.0). P < 0.05 was regarded as statistically significant.
Results
Prognostic signature construction
By performing univariate Cox regression analysis with the 513 putative DNA repair genes, we identified 266 genes that were significantly associated with survival of patients with LGG in CGGA dataset. The heatmap allowed visualization of the association between these differently expressed genes and patients' survival (Fig. 1A). We further applied the LASSO Cox regression algorithm to select the most useful predictive features and identified four genes (CRY2, HDAC1, DCLRE1B, and KPNA2) with nonzero regression coefficients (Fig. 1B and C). Ultimately, a four-gene risk signature was built and the risk score of each patient was calculated using the following formula for further analysis: signature risk score = (−0.030424186 × CRY2 expression) + (0.020665793 × HDAC1 expression) + (0.088606238 × DCLRE1B expression) + (0.262949057 × KPNA2 expression).
To comprehensively investigate the four identified genes, we evaluated the association of risk genes with risk score and patient survival. Figure 2A shows the survival overview of each patient with LGG in CGGA dataset and patients with high risk score had poor prognosis. The heatmap showed that patients in the high risk group had a tendency to have higher expression of HDAC1, DCLRE1B, and KPNA2 and lower expression of CRY2. Kaplan–Meier analysis found that these four individual genes could well distinguish the prognosis of patients using median expression as a cutoff in patient with LGG (Fig. 2B–E). Moreover, the similar trend was observed in patients further stratified by IDH mutation and 1p/19q codeletion status according to the new WHO criteria although no significant difference was found in IDH-mut and 1p/19q codel patients (Supplementary Fig. S1). These findings were verified in the validation cohort (Supplementary Figs. S2 and S3).
Molecular characteristics and survival analysis of the risk signature
To get an overview of the risk signature in LGG, we investigated the correlation between risk score and clinical features in both CGGA and TCGA datasets. As shown in Fig. 3A, the risk score distributed differently with respect to the IDH mutation and 1p/19q codeletion status. The highest risk score was concentrated in IDH wild-type (wt) stratified patients. On the basis of the median value of risk score, patients were divided into high and low risk groups. Kaplan–Meier analysis found that patients in high risk group had shorter survival time than those in low risk group (Fig. 3B; Supplementary Fig. S4). Except LGG IDH-mut and 1p/19q codel patients (probably due to the small sample size), we observed the similar trend in LGG IDH-mut and 1p/19q noncodel and LGG IDH-wt patients (Fig. 3C–E). We then evaluated the prediction accuracy of risk score for the overall survival. The risk score showed high time-dependent AUC in CGGA (1 year: 0.8449, 3 years: 0.8901, and 5 years: 0.9115) and TCGA (1 year: 0.7973, 3 years: 0.7821, and 5 years: 0.6958; Fig. 3F) cohorts.
Prognostic validity of the risk signature in LGGs
To assess the prognostic value of risk signature, Cox regression analysis was conducted in CGGA and TCGA cohorts. We first selected risk score and several clinical factors for univariate Cox analysis. As shown in Fig. 4A, factors including age, IDH mutation status, 1p/19q codeletion status, chemotherapy, and risk score were significantly associated with patient survival in CGGA cohort (P < 0.05). After further multivariate Cox regression of these factors, risk score was still associated with survival significantly (P < 0.001; Fig. 4B). These results indicated that risk score was an independent prognostic factor for the overall survival of patients with LGG (P < 0.001). Because IDH mutation is a central marker to implicate glioma progression and prognosis and the mutation status was proved to be an independent prognostic indicator in our study, we then combined it with risk score to predict patients' prognosis (27). Patients were further divided into four subgroups based on IDH mutation status and risk score, and survival analysis was conducted. Among them, patients with wild-type IDH and high risk score exhibited the worst prognosis (Fig. 4C). It has been reported that IDH-wt LGGs are heterogeneous in prognosis and should be further stratified (28). Our results demonstrated that the risk signature could be a potential marker to refine the substratification of LGG. We conducted the same study in the TCGA cohort. Results showed that risk score was an independent prognostic biomarker, and the patients were remarkably stratified with distinct survival when combined with the IDH mutation status (Supplementary Fig. S5).
Construction of an individualized prediction model
The independent prognostic parameters for overall survival of patients were selected and integrated into a nomogram model (Fig. 5A). The C-indices were 0.841 [95% confidence interval (CI), −0.0847–1.7675] in the training cohort and 0.866 (95% CI, −0.0799–1.8114) in the validation cohort, indicating satisfactory concordance. Meanwhile, the calibration plot for the probability of survival showed optimal agreement in the prediction of 1-, 3-, and 5-year overall survival in both cohorts (Fig. 5B and C).
Functional annotation and pathway enrichment
To explore the potential functional characteristics, we identified genes that were significantly positively (R > 0.5) correlated with the risk score by Pearson correlation analysis, and then annotated their functions using Metascape. Results indicated that these correlated genes were mainly involved in pathways related to cell cycle and DNA repair, which participated in tumor initiation and progression. Similar results were verified in TCGA cohort (Fig. 6; Supplementary Fig. S6). To validate the results of bioinformatics analyses, we conducted IHC protein staining experiments in the CGGA cohort, focusing on proteins involved in cell cycle (Cyclin A2) and DNA repair (FANCD2). The results showed that the expression of Cyclin A2 and FANCD2 (Fig. 6C and D) were the lowest in tumors in the low risk group. Although there was no statistically significant difference in the percentage of FANCD2-positive cells between the high risk and medium risk group, the positive cell staining was significantly deeper in the high risk group than that in the medium risk group when viewed under the microscope. This result was consistent with that of biological function analysis. Collectively, we proved that this risk signature had a strong predictive effect on the prognosis of patients and could guide the individual treatment. To better apply this method to the clinic, we evaluated the risk gene KPNA2 within the signature and found that KPNA2 was significantly correlated with the risk score (r = 0.9589, CGGA; r = 0.9413, TCGA; Supplementary Fig. S7). To some extent, we could use the expression of KPNA2 instead of the risk signature to predict patient prognosis. As reported, KPNA2 was highly expressed in many types of malignancies, such as hepatocellular carcinoma, bladder cancer, etc., and it could be used as a biomarker for prognosis of patients (29–31). These studies indicated the importance of KPNA2 and its potential application in glioma.
Discussion
Advances in high-throughput sequencing techniques have improved our understanding of transcriptional alterations in glioma (32–34). According to the survival estimation of patients by specific biomarkers, physicians can better develop individualized treatment in clinical practice (35). Compared with a single gene, a multigene set can predict survival more robustly for patients (23, 36). In this study, we identified a risk signature as an independent prognostic indicator to stratify patients with LGG. The risk signature was subsequently integrated into a nomogram along with clinicopathologic characteristics, and an effective tool was constructed for individualized assessment of patient survival with LGGs. Moreover, further functional analysis revealed that the signature was associated with DNA repair and cell cycle.
Both internal factors and external environment of an organism can cause DNA damage, which may lead to cell-cycle arrest, apoptosis, aging, or malignant transformation of cells (37, 38). Compelling evidence has revealed that DNA repair is closely related to drug resistance of cancer cells and prognosis of patients (39–41). In this study, we combined DNA repair genes with patient prognosis and integrated multiple genes into a single signature using LASSO modeling, and our novel risk signature was proved to be a significant predictor both in CGGA and TCGA cohorts. Furthermore, based on IDH mutation status and risk score, patients with LGG were divided into subgroups with distinct survival time, which was conducive to the development of individualized treatment.
The four genes included in our signature were CRY2, HDAC1, DCLRE1B, and KPNA2. We have built the heatmap to investigate the risk of identified genes on the basis of gene expression profile. As shown in the heatmap, the expression of four genes increased along with risk score except CRY2 in both training and validation cohorts. The CRY2 gene is a core component of the circadian clock and plays a key role in cell-cycle proliferation, DNA repair, and DNA damage checkpoints (42, 43). Recent studies found that dysregulated CRY2 might be associated with pathologic development of breast cancer by mediating hormone signaling, and CRY2 degradation is involved in the chemoresistance of colorectal cancer (44). KPNA2 is a protein involved in nucleocytoplasmic transport and has been reported highly expressed in endometrial, prostate, colorectal, and brain cancers, suggesting its oncogenic effects (29, 45, 46). HDAC1 is one of the core members of class I histone deacetylases (HDAC), which serve functional roles including modification of histone and nonhistone proteins, repression or activation of gene expression, regulation of chromatin structure, and cell metabolism (47, 48). Recent studies have reported that HDAC1 might play a significant role in placental breast cancer resistance protein regulation, which controls the transfer rates of various drugs (49). In mammalian cells sensitive to nitrogen mustard, DCLRE1B plays an important role in the repair system for DNA interstrand cross-link–mediated DNA damage (50). Deficiency or inhibition of DCLRE1B in mouse fibroblast and human lymphoma cells reduces cell viability after cisplatin treatment (51). These findings suggest the four identified genes might play an important role in cancer development and tumor progression.
Although maximal safe resection combined with adjuvant radiotherapy and/or chemotherapy is the recommended treatment for patients with LGG (52), the clinical efficacy of patients still varies greatly with the same treatment regimen. The goal of precision medicine is to build a model that offers a more tailored treatment for individual patients considering their individual difference. In this study, our signature was established to predict the prognosis based on the differences in gene expression of individual patients. There are several advantages of our risk signature, including large population datasets, systematic data analysis, and potential clinical applications. In addition, combining this risk signature with IDH mutation status could further divide patients into subgroups with different survival time, which enabled more individualized treatment regimens. However, there are several limitations of our study. It is a retrospective study and the predictive value of the risk signature needs to be further characterized and validated in prospective studies. In addition, the four genes within the signature are not the star genes in the detection and treatment of glioma, which may limit the popularization and application of this method to some extent.
In conclusion, we identified and reliably validated a four-gene risk signature associated with survival time in two cohorts of patients with LGGs. This novel signature was validated to be a useful tool for risk stratification and individual prognostic assessment. Further biological functional analysis of these genes provided new insights into the malignant progression of LGGs and might be an important reference in personalized clinical treatment for patients with glioma.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: F. Zeng, X. Liu
Development of methodology: X. Liu
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): G. Li
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): X. Liu, K. Wang
Writing, review, and/or revision of the manuscript: F. Zeng
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): K. Wang
Study supervision: F. Zeng, Z. Zhao
Acknowledgments
This work was supported by grants from National Natural Science Foundation of China (81802994) and National Natural Science Foundation of China (NSFC)/Research Grants Council (RGC) Joint Research Scheme (81761168038).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.