A new database that combines gene expression measurements from 166 data sets enables researchers to identify prognostic variables across 39 types of cancer. For multiple cancer types, high expression of the cell-cycle regulator FOXM1 and the presence of neutrophils in the tumor suggest a poor prognosis. The immune cell marker KLRB1 and the presence of plasma cells in tumors are strong indicators that patients are likely to survive.

A new database that merges information on gene expression and survival might help researchers identify genes that forecast cancer patients' prognoses.

Although scientists have amassed vast amounts of data about tumors, this information hasn't been as helpful as they had hoped in pinpointing factors that doctors could use to make prognoses or choose therapies. One obstacle is that the data are diverse: Researchers gathered them with a variety of techniques and analysis platforms, and the data sets may reflect the outcomes of different treatments.

“What we haven't had is a large compendium that exists across cancers for linking genomic profiles and cancer outcomes,” says Ash Alizadeh, MD, PhD, of Stanford University in California.

A team led by Alizadeh and his Stanford colleagues Andrew Gentles, PhD, and Aaron Newman, PhD, created a database that integrates 166 survival and microarray data sets from multiple sources, including the National Center for Biotechnology Information's Gene Expression Omnibus and the European Bioinformatics Institute's ArrayExpress. Called PRECOG, the database includes nearly 30,000 expression profiles covering 39 different cancers.

To uncover differences in gene expression that might be valuable for determining prognoses, the researchers standardized data from about 18,000 patients so that results obtained from various microarrays would be comparable, and then determined an overall survival score for each gene and cancer type. Using RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA), they confirmed the prognostic value of the genes they had identified.
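The article does not spell out how the survival scores were calculated. As a rough illustration of the general idea, the Python sketch below rescales expression values within each data set and then pools per-data-set scores for one gene using Stouffer's weighted z-score method, a common way to combine evidence across studies. The function names, the choice of Stouffer's method, and the example numbers are assumptions made for illustration and are not taken from PRECOG.

```python
import math
import statistics


def standardize(values):
    """Rescale one gene's expression values within a single data set
    to mean 0 and standard deviation 1, so platforms are comparable."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A gene with no variation carries no information in this data set.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def combine_z(z_scores, weights=None):
    """Pool per-data-set z-scores for one gene (assumed method: Stouffer).

    Each z-score is a hypothetical measure of how strongly the gene's
    expression tracks with survival in one data set. Weights could be,
    for example, the square root of each data set's sample size."""
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical z-scores for one gene in four data sets of one cancer type.
example_scores = [2.0, 2.0, 2.0, 2.0]
pooled = combine_z(example_scores)
```

In this sketch, consistent moderate signals in several data sets add up to a stronger pooled score, whereas signals pointing in opposite directions cancel out.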

High expression of FOXM1, which spurs cell-cycle progression, was the strongest predictor of a poor prognosis, the researchers reported last month in Nature Medicine. Although previous studies showed that FOXM1 promotes specific tumor types, these results suggest it plays a role across many cancers, says Alizadeh.

In contrast, high expression of KLRB1 suggests greater likelihood of survival. This gene is a marker of immune cell infiltration of tumors, and “it's a dominant determinant of cancer outcome across many tumor types,” Alizadeh says.

The researchers also asked what the abundance of specific types of tumor-infiltrating immune cells reveals about patients' prospects. They used a computational approach called CIBERSORT, developed by Newman and Alizadeh, to determine the proportions of different immune cells in bulk tumors from gene expression patterns. A large number of plasma cells in a tumor was a strong indicator of a positive prognosis, whereas abundant neutrophils were a strong indicator of a poor prognosis. How these different cell types help or hinder tumors remains unclear.
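To give a sense of how cell proportions can be inferred from a bulk expression profile, the Python sketch below treats the tumor's profile as a weighted mix of reference profiles for each cell type and solves for non-negative weights. This is a deliberately simplified stand-in: the published CIBERSORT method relies on support vector regression, whereas this sketch uses a basic non-negative least-squares fit by projected gradient descent. The signature values, gene count, and cell-type labels are invented for illustration.

```python
def estimate_fractions(signature, mixture, iterations=500):
    """Estimate cell-type fractions from a bulk expression profile.

    signature: rows are genes, columns are cell types (reference expression).
    mixture:   bulk expression of the same genes in one tumor sample.
    Minimizes the squared error between signature * weights and mixture,
    keeping weights non-negative, then rescales the weights to sum to 1."""
    n_genes = len(signature)
    n_types = len(signature[0])
    # The sum of squared entries bounds the curvature of the objective,
    # so this step size is small enough for the descent to converge.
    step = 1.0 / sum(x * x for row in signature for x in row)
    weights = [0.0] * n_types
    for _ in range(iterations):
        residual = [
            sum(signature[g][t] * weights[t] for t in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][t] * residual[g] for g in range(n_genes))
            for t in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative values.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Hypothetical reference profiles: 4 genes x 3 cell types
# (columns: plasma cells, neutrophils, T cells).
example_signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [2.0, 2.0, 2.0],
]
# Bulk profile built from fractions 0.6, 0.3, and 0.1 of the three cell types.
example_mixture = [6.4, 3.7, 1.9, 2.0]
example_fractions = estimate_fractions(example_signature, example_mixture)
```

Because the example mixture was constructed without noise, the fit should recover fractions close to the ones used to build it. Real tumor data are noisy, which is one reason more robust regression methods are used in practice.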

“I think it's great that they are putting together and harmonizing these data,” says Jean C. Zenklusen, PhD, director of TCGA. However, he believes the database “is of limited use” because most of the data it contains come from microarrays, which researchers are abandoning in favor of the more powerful technique RNA-seq.