Abstract
Background: Lung cancer is associated with the highest mortality rate of all cancer types, and the most common histologic subtype of lung cancer is adenocarcinoma. To apply more effective therapeutic treatment, molecular markers that are able to predict the recurrence risk of patients with adenocarcinoma are critically needed. Mutations in the TP53 tumor suppressor gene have been found in approximately 50% of lung adenocarcinoma cases, but the presence of a TP53 mutation is not always associated with increased mortality.
Methods: The Cancer Genome Atlas RNA sequencing data of lung adenocarcinoma were used to define a novel gene signature for P53 deficiency. This signature was then used to calculate a sample-specific P53 deficiency score based on a patient's transcriptomic profile and tested in four independent lung adenocarcinoma microarray datasets.
Results: In all datasets, P53 deficiency score was a significant predictor for recurrence-free survival where high P53 deficiency score was associated with poor survival. The score was prognostic even after adjusting for several key clinical variables including age, tumor stage, smoking status, and P53 mutation status. Furthermore, the score was able to predict recurrence-free survival in patients with stage I adenocarcinoma and was also associated with smoking status.
Conclusions: The P53 deficiency score was a better predictor of recurrence-free survival compared with P53 mutation status and provided additional prognostic values to established clinical factors.
Impact: The P53 deficiency score can be used to stratify early-stage patients into subgroups based on their risk of recurrence, helping physicians choose personalized therapeutic treatment. Cancer Epidemiol Biomarkers Prev; 27(1); 86–95. ©2017 AACR.
This article is featured in Highlights of This Issue, p. 1
Introduction
Lung cancer is the leading cause of all cancer-associated deaths in the United States, accounting for approximately a quarter of all cancer mortalities (1). It is associated with poor prognosis, with only 17.4% of lung cancer patients surviving over 5 years after diagnosis (2). Lung adenocarcinoma (LUAD) makes up 40% of lung cancers, making adenocarcinoma the most common histologic subtype of lung cancers (3). Around 10% to 25% of LUAD patients are never-smokers (4). A primary reason for the high mortality of lung cancer is that only 25% of cases are diagnosed at an early stage, when surgery is most effective (5). Indeed, surgical resection of stage I non–small cell lung cancer (NSCLC) results in 5-year survival rates of approximately 70% (6–9). As screening for lung cancer becomes more widely adopted in medical practice, a larger proportion of lung cancer cases will be detected at an early stage (10–12).
LUAD cases are heterogeneous in many aspects, including histopathologic patterns, molecular features, and driver mutations. For example, approximately 30% to 60% of LUAD cases are associated with mutations in EGFR, KRAS, or ALK genes. This heterogeneity likely explains the significant variation in prognosis among LUAD patients, with survival time ranging from a few months to more than 7 years (13–16). Interestingly, about 15.7% of stage I lung cancer patients with complete surgical resection have reported cancer recurrence (17). Thus, it is critical to develop methods for predicting patient-specific prognosis so that the most effective therapeutic and management strategies can be designed for distinct subsets of LUADs. For example, some early-stage lung cancers are highly aggressive at the time of diagnosis and should be treated more aggressively (18–20). Indeed, it has been suggested that some stage I lung cancer patients have developed occult micro-metastases, via which they would develop disease recurrence (21).
Gene expression profiling has been widely applied to investigate the transcriptional regulation underlying lung cancer development and progression. Several gene signatures have been defined to predict the recurrence of patients with early-stage NSCLC (22–27). For example, Beer and colleagues identified 50 genes most of which are highly correlated with survival of 86 lung cancer cases, and found that a risk index calculated based on these 50 genes predicted the risk of patients in stage I cancers (26). In another study, Chen and colleagues developed a five-gene signature that was predictive of survival outcome in NSCLC, but the risk strata were highly correlated with stage, and stage was far more associated with survival than the expression profile (28). Importantly, a multisite blinded validation study was performed to examine the performance of several prognostic models based on gene expression alone or in combination with clinical variables (25). This study found that most models achieved better prognosis prediction accuracy when using samples containing all stages compared to using just stage I samples, suggesting their ability to discriminate stages even when stage information was not included in the model. It should be noted that, without including clinical variables, none of the models achieved a significant hazard ratio (HR) in the two validation datasets for stage I lung cancer samples.
The tumor suppressor gene TP53, which encodes tumor suppressor protein P53, is the most frequently mutated gene in human cancers, and is associated with more than 50% of tumor cases (29). In lung cancer, somatic mutation frequency of TP53 varies between different histopathologic subtypes. The highest rates of TP53 mutation have been observed in small cell lung cancer (>90%; ref. 30) and squamous cell carcinomas (81%; ref. 31), which are the subtypes that are most consistently associated with long-term smoking. In LUAD, 46% of cases have somatic mutations in TP53 (31). Notably, LUAD patients who are current or former smokers have been shown to have higher TP53 mutation rates and significantly higher somatic mutation burdens in their tumors compared to patients who are never-smokers (32, 33). In addition, very different TP53 mutation frequency and mutation types have been observed between current or former smokers and never-smokers in LUAD (33). Despite the high frequency of this mutation, the prognostic value of TP53 mutation status is still not clear in lung cancer, even though such information is commonly available for lung cancer patients (30). Previous studies have reported inconsistent or even controversial findings on the prognostic value of TP53 mutation status (34–36).
In this study, we developed a gene signature to quantify P53 pathway activity in LUAD samples by comparing gene expression between P53-mutant and wild-type samples. Our rationale is that P53 pathway activity can more accurately reflect the aggressiveness of cancer cells and is thus a better prognostic indicator than P53 mutation status. Although most nonsynonymous mutations of TP53 result in aberrant P53 activity in a dominant manner (37), the effects of different mutations vary significantly. Thus, the full impact of mutations cannot be captured by a binary indicator like P53 mutation status. In addition, loss of P53 activity can be caused by other mechanisms, including DNA hypermethylation in the promoter of TP53 (38, 39), deletion of TP53 (40), or indirectly, by the dysregulation of P53 regulators (41, 42). Here, we use a TP53 nonsynonymous mutations-based gene signature to calculate P53 deficiency scores (PDS) in LUAD samples. Our results indicate that PDS can reliably and significantly predict the rate of recurrence for early-stage LUAD patients.
Materials and Methods
LUAD gene expression datasets
The RNA sequencing (RNA-seq) data for LUAD samples generated by The Cancer Genome Atlas (TCGA) project were used to define a gene signature for characterizing P53 pathway activity. The data were downloaded from FireHose (http://gdac.broadinstitute.org/) as Level 3 processed RNA-seq data in November 2016. The data contain gene expression profiles for a total of 515 tumor samples, and provide the Reads Per Kilobase per Million mapped reads (RPKM) for 20,502 genes. In addition, somatic mutation and clinical information associated with these samples were also downloaded.
We used six lung cancer gene expression datasets from microarray experiments to validate the effectiveness of the P53-based gene signatures in predicting survival of patients with LUAD. All these datasets are available from the public Gene Expression Omnibus (GEO) database with the following accession IDs: GSE31210, GSE8894, GSE68465, GSE13213, GSE3141, and GSE42127. The numbers of adenocarcinoma samples in these datasets are 226, 63, 443, 117, 58, and 133, respectively. Among them, the first four datasets provide recurrence-free survival (RFS) of patients, whereas the other two datasets provide only overall survival (OS). Patient smoking information is available for the GSE31210, GSE68465, and GSE13213 datasets. The mutation status of P53 is available only for the GSE13213 dataset. A summary of these six datasets is provided in Supplementary Table S1. These datasets were all generated from one-channel microarray platforms, and were downloaded as a matrix containing the expression levels of all probesets. Probeset-level expression was converted into gene-level expression: for a gene with multiple probesets, the probeset with the maximum average expression was chosen to represent the gene.
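The probeset-to-gene conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary-based data layout, and the probeset and gene identifiers are all hypothetical, and probesets without a gene annotation are assumed to be dropped.

```python
from statistics import mean


def collapse_probesets(expr, probe_to_gene):
    """Keep, for each gene, the probeset with the highest mean expression.

    expr: dict mapping probeset ID -> list of expression values (one per sample).
    probe_to_gene: dict mapping probeset ID -> gene symbol (hypothetical annotation).
    """
    best = {}  # gene -> (mean expression, probeset ID)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # assumption: unannotated probesets are skipped
        avg = mean(values)
        if gene not in best or avg > best[gene][0]:
            best[gene] = (avg, probe)
    return {gene: expr[probe] for gene, (_, probe) in best.items()}


# Toy example with made-up identifiers and values.
expr = {
    "probe_1": [5.0, 7.0],
    "probe_2": [1.0, 2.0],
    "probe_3": [3.0, 3.0],
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
gene_expr = collapse_probesets(expr, probe_to_gene)
```

Here GENE_A is represented by probe_1, because its average expression across samples is higher than that of probe_2.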
Define P53-deficiency gene signature based on TCGA LUAD RNA-seq data
The P53-deficiency gene signature was defined by comparing gene expression between P53-mutant and wild-type samples in the TCGA LUAD RNA-seq data while adjusting for confounding variables. Samples containing synonymous TP53 mutations were assigned to the wild-type group. First, for each gene, a logistic regression model was constructed using patient class as the response variable (Y = 1 for P53-mutant samples, and Y = 0 for P53 wild-type samples).
The predictor variables include the expression level of the gene under consideration (X1), age at the time of diagnosis (X2), gender (X3), tumor stage (I, II, III, or IV, labeled as X4), and smoking status (X5). Gene expression level was represented as log(RPKM+1) to avoid extreme values. By applying these models to the TCGA LUAD data, we estimated the coefficients (β values) and their statistical significance (p values) for all genes. Second, given (β, p) values for all genes, we defined the P53-deficiency gene signature using a pair of weight profiles, w+ and w−, that assigned all genes two values in the following way: For gene i, $w_i^{+} = -\log(p_i)\, I(\beta_i > 0)$ and $w_i^{-} = -\log(p_i)\, I(\beta_i < 0)$, where $I(\cdot)$ is the indicator function. To avoid extreme values, the weights were trimmed at 10, and then transformed into a value within [0,1], by subtracting the minimum value and then dividing by the range. If a gene i is more significantly upregulated in P53-mutant versus wild-type samples, it will be associated with a higher $w_i^{+}$ and a $w_i^{-}$ of zero. Conversely, a more significantly downregulated gene will be associated with a higher $w_i^{-}$ and a $w_i^{+}$ of zero.
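The weighting scheme can be illustrated with a short sketch. It assumes a base-10 logarithm (the base is not stated in the text) and assumes that per-gene β and p values have already been estimated; the function name and the toy numbers are hypothetical.

```python
import math


def signature_weights(betas, pvals, cap=10.0):
    """Return (w_plus, w_minus), each trimmed at `cap` and scaled to [0, 1]."""

    def scale(raw):
        lo, hi = min(raw), max(raw)
        if hi == lo:
            return [0.0 for _ in raw]
        return [(x - lo) / (hi - lo) for x in raw]

    # -log10(p), trimmed at the cap. The log base is an assumption, and p is
    # floored at a tiny positive value so that log10 is always defined.
    neg_log_p = [min(-math.log10(max(p, 1e-300)), cap) for p in pvals]
    # Indicator functions: w+ is nonzero only if beta > 0, w- only if beta < 0.
    w_plus = [s if b > 0 else 0.0 for s, b in zip(neg_log_p, betas)]
    w_minus = [s if b < 0 else 0.0 for s, b in zip(neg_log_p, betas)]
    return scale(w_plus), scale(w_minus)


# Toy values for four genes (not taken from the study).
betas = [1.2, -0.8, 0.1, -2.0]
pvals = [1e-12, 1e-4, 0.5, 1e-10]
w_plus, w_minus = signature_weights(betas, pvals)
```

In this toy input, the first gene is strongly upregulated and receives the maximum w+ weight with a w− of zero, whereas the fourth gene is strongly downregulated and receives the maximum w− weight.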
Calculate patient-specific PDS based on their expression profiles
Given the expression profiles for a number of LUAD samples, sample-specific PDSs were calculated for all samples based on the P53-deficiency gene signature as described above. Specifically, we applied a modified version of a statistical method called BASE (43) with the following steps: First, gene expression data were converted into relative expression of genes by comparing with a calculated reference profile that contained the median expression of genes across all samples. Second, genes were sorted in descending order based on their relative expression to obtain an expression profile ($e_1, e_2, \ldots, e_g$), where g is the total number of genes. The biased distribution of upregulated (with large values in $w^{+}$) and downregulated (with large values in $w^{-}$) genes in P53-mutant samples was examined by comparing two cumulative functions, a foreground f(i) and a background b(i).
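The displayed definitions of the two functions are not reproduced in this text. As a hedged reconstruction, assuming the general form used in the BASE method (43), with $w_k$ denoting the weight (from w+ or w−) of the gene at rank k, they can be written as:

```latex
f(i) = \frac{\sum_{k=1}^{i} \lvert e_k w_k \rvert}{\sum_{k=1}^{g} \lvert e_k w_k \rvert},
\qquad
b(i) = \frac{\sum_{k=1}^{i} \lvert e_k (1 - w_k) \rvert}{\sum_{k=1}^{g} \lvert e_k (1 - w_k) \rvert},
\qquad 1 \le i \le g
```

Under this form, both functions rise from 0 to 1 as i runs over the ranked genes, so their difference measures how strongly the highly weighted genes are concentrated at one end of the expression ranking.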
If genes with large weights in w ($w_i^{+}$ for upregulated genes and $w_i^{-}$ for downregulated genes in P53-mutant samples) tend to have large values in tumor expression profile e, $f(i)$ will increase in value more rapidly than $b(i)$ as $i$ increases. Third, the maximum deviation between the two functions was calculated and normalized against a null distribution that was estimated by permutation to obtain PDS+ (if w = w+) or PDS− (if w = w−). Finally, the two scores were combined by taking their difference (PDS+ − PDS−) to obtain the final PDS for this sample. Patients with high PDSs are more likely to have P53 mutation, whereas patients with low PDSs are likely to have wild-type P53. Therefore, high PDS refers to low P53 pathway activity and vice versa. After accomplishing this procedure, the PDSs were calculated for all samples in the lung cancer expression datasets.
Predict patient survival using PDSs
Cox proportional hazards models were constructed to investigate the effectiveness of patient-specific PDSs in predicting patient survival (RFS or OS). Patient samples were dichotomized into two groups by using an indicator function $I(\mathrm{PDS} \ge t)$, where t is a prespecified threshold. By default, we set t = 0. If this threshold resulted in no samples or only a small number of samples in one group, we set t to the median of the PDSs in the dataset. A univariate Cox regression model was used to determine the association between dichotomized PDSs and patient survival. A multivariate Cox regression model was used to determine the effect of PDSs on survival after adjusting for potential confounding variables such as age, tumor stage, and smoking status. The Kaplan–Meier method was used to plot survival curves (44), and the significance of the difference between the survival curves of different groups was estimated with a log-rank test. The R package “survival” was used to implement the statistical analyses. Specifically, the “coxph” function was used to construct Cox proportional hazards models, the “survfit” function was used to create Kaplan–Meier survival curves, and the “survdiff” function was used to compare two survival curves.
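The thresholding rule and the two-group comparison can be illustrated with a self-contained sketch. It is not a translation of the R code used in the study: the function names are hypothetical, the `min_group` cutoff for "a small number of samples" is an assumed parameter (the text gives no value), and the log-rank statistic is the standard two-group form with a chi-square approximation on one degree of freedom.

```python
import math
from statistics import median


def dichotomize(scores, t=0.0, min_group=5):
    """Return (groups, threshold); group 1 means PDS >= threshold.

    Falls back to the median when t leaves fewer than `min_group`
    samples in either group (`min_group` is an assumed cutoff).
    """
    groups = [int(s >= t) for s in scores]
    n_high = sum(groups)
    if min(n_high, len(groups) - n_high) < min_group:
        t = median(scores)
        groups = [int(s >= t) for s in scores]
    return groups, t


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({ti for ti, ev in zip(times, events) if ev})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ev in zip(times, events) if ev and ti == t)
        d1 = sum(1 for ti, ev, g in zip(times, events, groups)
                 if ev and ti == t and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy cohort (made-up values): event = 1 for recurrence, 0 for censored.
scores = [1.2, 0.8, 0.5, 0.3, -0.2, -0.6, -0.9, -1.1]
times = [5, 8, 12, 20, 30, 35, 40, 42]
events = [1, 1, 1, 1, 0, 1, 0, 0]
groups, threshold = dichotomize(scores, t=0.0, min_group=3)
chi2, p_value = logrank_test(times, events, groups)
```

For adjusted analyses, a Cox model with the dichotomized score and clinical covariates would be fitted instead; that step is omitted here because it requires a dedicated survival library.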
P53 mutation types in TCGA
The TCGA LUAD patients were separated into three groups (P53 gain of function, P53 loss of function, and P53 wild-type) based on their P53 mutation types. The P53 gain of function group was defined by the presence of an R248Q, R273H, or R175H mutation in the protein sequence (45, 46). The P53 loss of function group was defined by the presence of P53 nonsense or frameshift mutations in the transcript sequence.
P53 target genes
The target genes of P53 were downloaded from the ChIP Enrichment Analysis (CHEA) database (47), which provides P53 targets in four different human cell lines, HCT116, U2OS, IMR90, and HFKS, identified from ChIP-chip or ChIA-PET experiments. None of these four cell lines carries a P53 mutation. Genes identified in at least two cell lines were selected, resulting in a total of 627 P53 target genes.
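The "at least two cell lines" filter amounts to counting, for each gene, the number of cell lines in which it was reported as a target. A minimal sketch follows; the function name is hypothetical and the gene lists are placeholders, not the actual CHEA content.

```python
from collections import Counter


def recurrent_targets(targets_by_line, min_lines=2):
    """Return genes reported as targets in at least `min_lines` cell lines."""
    counts = Counter(
        gene
        for genes in targets_by_line.values()
        for gene in set(genes)  # count each gene once per cell line
    )
    return sorted(gene for gene, n in counts.items() if n >= min_lines)


# Placeholder target lists (illustrative only).
targets_by_line = {
    "HCT116": ["GENE_A", "GENE_B", "GENE_C"],
    "U2OS": ["GENE_A", "GENE_D"],
    "IMR90": ["GENE_B", "GENE_A"],
    "HFKS": ["GENE_E"],
}
selected = recurrent_targets(targets_by_line)
```

Genes seen in only one cell line (GENE_C, GENE_D, and GENE_E in this toy input) are dropped.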
Results
Overview of this study
To assess whether a gene signature that reflects P53 activity is a better prognostic marker than P53 mutation status, we performed a series of analyses in LUAD as diagrammed in Fig. 1. We used TCGA RNA-seq data for LUAD to define a P53-deficient gene signature by comparing gene expression between P53-mutant and wild-type samples. In contrast with traditional gene signatures, our signature consisted of a pair of weighted profiles that indicate the magnitude of up- and downregulation of genes in P53-mutant samples relative to wild-type samples after adjusting for clinical variables such as age at time of diagnosis and tumor stage. This whole-transcriptome signature was applied to infer P53 pathway activity in samples based on their gene expression profiles. Patients whose transcriptomic profiles are highly similar to the P53-deficient gene signature have high PDSs, which indicate low P53 pathway activity, and vice versa. After calculating PDSs, we evaluated the ability of PDS to discriminate samples with or without P53 mutation, and examined the association of PDS with tumor stage and smoking status. Moreover, we examined the effectiveness of PDS in predicting the RFS of patients in four independent LUAD datasets. In particular, we investigated its predictive power in early-stage samples. We demonstrated that PDS is prognostic even after adjusting for clinical variables including age, stage, P53 mutation status, and smoking status.
Association of PDS with P53 mutation and OS
Statistical analysis of TCGA LUAD RNA-seq data resulted in the identification of 1,376 and 1,194 genes that were up- and downregulated, respectively, in P53-mutant versus wild-type samples at a false discovery rate of 0.001 (Supplementary Table S2). Gene set enrichment analysis indicated that upregulated genes were enriched for cell cycle and DNA replication genes, whereas downregulated genes were enriched for ribosome genes (Supplementary Table S3). We investigated the overlap of these genes with the 621 P53 target genes that were available from the LUAD RNA-seq data (Fig. 2A). Out of these target genes, 51 and 53 were up- and downregulated in P53-mutant samples, representing a significant enrichment of 1.3-fold (P = 0.05) and 1.5-fold (P = 0.002), respectively. This indicates that P53 target genes are more likely to be differentially expressed between P53-mutant and wild-type LUAD samples. Moreover, P53 may primarily function as a transcriptional activator, because its target genes are more significantly enriched among the genes downregulated in P53-mutant samples. As shown, P53 targets represent only a small fraction (∼4%) of the differentially expressed genes, suggesting that the majority of them are indirectly regulated by the P53 pathway. Our previous studies have shown the importance of using the indirect targets of a transcription factor to estimate its pathway activity (48, 49). Therefore, even though most of the differentially expressed genes are only indirectly regulated by the P53 pathway, it is reasonable to include them in the P53 gene signature to enhance the statistical power for inferring P53 pathway activity in subsequent analyses.
To further validate that PDS is associated with P53 pathway activity, we stratified TCGA LUAD patients into P53 gain of function, P53 loss of function, and wild-type P53 groups and compared their PDSs. As shown in Supplementary Fig. S1, patients with P53 loss of function mutations showed significantly higher PDSs than patients with wild-type P53 (P = 2e−7). Because of the limited sample size of the P53 gain of function group (N = 4), we did not observe a significant difference in PDSs between the P53 gain of function group and the P53 loss of function group. However, PDSs in the P53 loss of function group tended to be higher than those in the P53 gain of function group. These observations further indicate that our PDS is associated with P53 pathway activity in LUAD.
Next we examined whether PDSs of samples inform their P53 mutation status. As shown in Fig. 2B, we observed significantly higher PDSs in P53-mutant samples compared with wild-type samples (P = 1e−40). The mutation status of P53 was not predictive of patient OS as shown in Fig. 2C. However, when patients were dichotomized into two groups with either high or low PDS, we observed significantly shorter OS of the high-PDS (PDS-Hi) group compared to the low-PDS (PDS-Lo) group (Fig. 2D). This suggested that P53 deficiency was associated with poor prognosis. We next separated patients into two subsets based on their P53 mutation status and examined the predictive power of PDS in each subset (Fig. 2E). As shown in Fig. 2E, we observed a significant difference between PDS-Hi and PDS-Lo patients in the P53 wild-type subset (P = 0.003, solid curves), but not in the P53-mutant subset (P > 0.1, dotted curves). For wild-type patients, the PDS-Hi group had overall mortality rates 2.67-fold higher than the PDS-Lo group. These results suggested that in P53 wild-type samples there might exist other mechanisms that result in defective P53 pathway activity. However, most P53-mutant samples have defective P53 pathway activity, and therefore PDS does not provide further prognostic significance in these patients.
Predicting RFS in four LUAD datasets
Having shown the association of PDS with patient OS in TCGA LUAD, we subsequently examined its ability to predict RFS in four independent datasets. These datasets were generated by using different microarray platforms, and varied in the number of samples from 63 to 442. The samples in the GSE68465 dataset were mostly from white American patients, whereas samples in the other three datasets were from East Asian patients. As shown in Fig. 3, results in the four datasets consistently showed that high-PDS is associated with significantly shorter RFS in LUAD.
Although we showed that there was an association between PDS and P53 mutation status in the TCGA LUAD data, the underlying P53-deficient gene signature was defined based on the same data, and the prognostic analysis was based on OS instead of RFS. Thus, we further validated the association between PDS and P53 mutation status in the GSE13213 dataset. As shown in Fig. 4A, P53-mutant samples had significantly higher PDSs compared to wild-type samples (P = 5e−4). In addition, PDSs were negatively correlated with TP53 expression levels (Fig. 4B), although the correlation was moderate (R = −0.36). Neither TP53 expression level (Fig. 4C) nor P53 mutation status (Fig. 4D) was predictive of RFS in this dataset. However, consistent with Fig. 2E, PDS was significantly associated with RFS in P53 wild-type but not in P53-mutant patients (Fig. 4E). Altogether, these results indicate that PDS correctly reflects P53 pathway activity and provides a better prognostic marker than TP53 expression and P53 mutation status in LUAD.
In addition to these analyses, we also examined the prognostic value of PDS in two other LUAD microarray datasets in which OS but not RFS is available. Our results indicated that high-PDS was associated with poor OS, but less predictive power was observed (Supplementary Fig. S2).
Predicting RFS in early-stage LUAD
We next investigated whether PDS was prognostic in early-stage LUAD after adjusting for clinical variables. To do this, we constructed a multivariate Cox regression model on the GSE31210 dataset that included dichotomized PDSs, gender, age at diagnosis, tumor stage, and smoking status as predictor variables. As shown in the forest plot, PDS remained significant (P = 0.002) for predicting RFS even after considering key clinical variables (Fig. 5A). However, stage II samples had significantly higher PDSs than stage I samples (Fig. 5B). Stage-specific survival analysis indicated that PDS was a significant predictor in stage I samples (P = 0.008) with HR = 2.33 (Fig. 5C). In stage II samples, high PDS was also associated with poor survival with HR = 2.19, but was not significant (P = 0.07), likely due to the small sample size (Fig. 5D).
Association of PDS with smoking status
A large fraction of lung cancer cases are associated with long-term smoking. Therefore, we investigated the association between PDS and the smoking status of patients. As shown, patients who were current or former smokers showed significantly higher PDSs than those who were never-smokers (P = 7e−5) in the GSE31210 dataset (Fig. 6A). However, smoking status was not a significant predictor for RFS (Fig. 6B). We separated patients into ever- and never-smokers, and performed survival analysis in each subset. We found that high PDS was significantly associated with patient RFS with a high HR of 4.16 (P = 7e−6) for ever-smokers, but only moderate significance was observed for never-smokers (P = 0.09). The same analysis was performed in the GSE68465 and GSE13213 datasets, which yielded similar observations but less significant associations between PDS and survival in both ever- and never-smoker subgroups (Supplementary Fig. S3). These results suggest that P53 might play more critical roles in the progression of LUAD associated with tobacco exposure.
Association of PDS with survival in lung squamous cell carcinoma
We extended our prognostic analyses to patients with lung squamous cell carcinoma using the P53-deficient gene signature derived from TCGA LUAD RNA-seq data. Specifically, we performed this analysis using the squamous cell carcinoma samples available from the TCGA LUSC dataset (n = 501, OS), the GSE8894 dataset (n = 75, OS), the GSE3141 dataset (n = 53, OS), the GSE4573 dataset (n = 130, OS), and the GSE14814 dataset (n = 52, DSS: disease-specific survival). Interestingly, the inferred PDS was not predictive of OS or DSS of patients in these datasets (Supplementary Fig. S4). This might be explained by the fact that the P53-deficient gene signature was defined based on LUAD data and was not able to accurately reflect P53 activity in squamous cell carcinoma. However, a squamous cell carcinoma-specific P53-deficient gene signature defined based on TCGA LUSC RNA-seq data did not predict patient survival either (Supplementary Fig. S5). This may be because the TP53 gene has a very high mutation frequency (>80%) in lung squamous cell carcinoma (31). Thus, it is possible that nearly all cases of this lung cancer subtype are associated with a defective P53 pathway, and therefore the P53 pathway activity is not of prognostic value.
Discussion
Several studies have reported prognostic gene signatures for lung cancer, especially NSCLC. However, none of those gene signatures remained significant for prognosis prediction when clinical variables were taken into consideration. The genes in those signatures often overlapped poorly, suggesting a lack of reproducibility. Moreover, TP53 mutations, the most common mutations in human cancers, have been investigated in recent years as a predictive biomarker for prognosis in lung cancer patients. Because of the different mutation types of TP53, the prognostic value of TP53 in lung cancer has not been clearly determined.
In this study, we defined a P53-deficient gene signature by comparing gene expression between P53-mutant and wild-type samples in TCGA LUAD RNA-seq data. This signature was used to calculate the PDSs of patients based on their expression measured by microarray platforms. Our results indicated that PDS was associated with P53 mutation status and smoking status, and was predictive of patient RFS with high consistency in multiple LUAD datasets. Furthermore, its predictive ability was independent of clinical variables including tumor stage, suggesting it has the potential to be used for stratifying early-stage patients based on their prognosis to adopt personalized treatment.
We showed that the PDS outperforms TP53 expression and P53 mutation status in terms of prognostic prediction in LUAD (Figs. 2 and 4). The activity of the P53 protein is determined by posttranscriptional regulation and posttranslational modification, and as a result, the mRNA level of TP53 does not accurately reflect its protein activity. As shown, TP53 expression is only weakly correlated with PDS in the GSE13213 dataset (Fig. 4B), and does not predict patient survival (Fig. 4C). More than 75% of TP53 mutations result in an abnormal P53 protein that deactivates the P53 pathway via a dominant-negative regulation of wild-type P53 (37); however, the severity of distinct P53 mutations varies substantially. Moreover, the P53 pathway can also be deactivated by other biological mechanisms, including epigenetic regulation and deletion of the TP53 gene, or through alterations of other genes in this pathway. For these reasons, somatic mutation status does not fully capture the P53 pathway activity, and is not a significant predictor for patient survival (Fig. 4D). In contrast, the inferred PDS provides a quantitative measurement of P53 pathway activity, and is predictive of patient prognosis, especially in P53 wild-type samples (Figs. 2E and 4E).
The P53-deficient gene signature is defined by comparing transcriptome profiles between P53-mutant and wild-type LUAD samples from TCGA. However, smoking status, which is the most important distinction in LUAD classification, might confound the differential gene expression analysis used to define this signature. Nonsmokers tend to have fewer somatic mutations compared to current or former smokers. To ensure that our P53-deficient gene signature solely picked up differences in P53 activity and was not confounded by smoking status and other clinical variables, we adjusted for these factors during the differential expression analysis used to create our signature. We verified that this adjustment was sufficient with regard to smoking status through two follow-up analyses. First, a multivariate Cox regression model found that PDS was the most significant predictor of patient survival even when adjusting for smoking status in the GSE31210 dataset (Fig. 5A). Second, when patients were stratified based on smoking status, the PDS was predictive of survival with high significance in the ever-smoker group and moderate significance in the never-smoker group in the GSE31210 dataset (Fig. 6C).
In breast cancer, a 32-gene signature has been proposed by Miller and colleagues to predict P53 mutation status and patient prognosis (50). In this study, we showed in LUAD that a P53-deficient gene signature defined based on RNA-seq provides a significant prognostic predictor that is applicable to both RNA-seq and microarray platforms. In addition, we applied a novel method to calculate P53 deficiency in tumor samples, which utilized the whole gene signature instead of selecting a small set of genes. The P53-deficient gene signature consists of all genes, and each gene is assigned a weight on the basis of its ability to discriminate P53-mutant from wild-type samples. This whole-transcriptome strategy is easy to implement and achieves high statistical power.
In summary, we have defined a gene signature that captures P53 pathway activity in LUAD samples and predicts patient prognosis. The computational framework introduced in this study can be applied to define prognostic signatures for any cancer type based on matched gene expression and somatic mutation data.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: C. Cheng
Development of methodology: C. Cheng
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.I. Amos, C. Cheng
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Y. Zhao, F.S. Varn, G. Cai, C.I. Amos, C. Cheng
Writing, review, and/or revision of the manuscript: Y. Zhao, F.S. Varn, G. Cai, F. Xiao, C.I. Amos, C. Cheng
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): F. Xiao, C. Cheng
Study supervision: C. Cheng
Acknowledgments
This work was supported by the National Center for Advancing Translational Sciences of the NIH under award number KL2TR001088, the Center of Biomedical Research Excellence grant under award number GM103534, and the start-up funding package provided to Chao Cheng by the Geisel School of Medicine at Dartmouth College.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.