Tumor metastasis is a major contributor to mortality of cancer patients, but the process remains poorly understood. Molecular comparisons between primary tumors and metastases can provide insights into the pathways and processes involved. Here, we systematically analyzed and cataloged molecular correlates of metastasis using The Cancer Genome Atlas (TCGA) datasets across 11 different cancer types, involving 4,473 primary tumor samples and 395 tumor metastasis samples (including 369 from melanoma). For each cancer type, widespread differences in gene transcription between primary and metastasis samples were observed. For several cancer types, metastasis-associated genes from TCGA comparisons were found to overlap extensively with external results from independent profiling datasets of metastatic tumors. Although some differential expression patterns associated with metastasis were found to be shared across multiple cancer types, by and large each cancer type showed a metastasis signature that was distinct from those of the other cancer types. Functional categories of genes enriched in multiple cancer type–specific metastatic overexpression signatures included cellular response to stress, DNA repair, oxidation–reduction process, protein deubiquitination, and receptor activity. The TCGA-derived prostate cancer metastasis signature in particular could define a subset of aggressive primary prostate cancer. Transglutaminase 2 protein and mRNA were both elevated in metastases from breast cancer and melanoma. Alterations in miRNAs and in DNA methylation were also identified.
Our findings suggest that there are different molecular pathways to metastasis involved in different cancers. Our catalog of alterations provides a resource for future studies investigating the role of specific genes in metastasis.
This article is featured in Highlights of This Issue, p. 335
Metastases are formed by cancer cells that have left the primary tumor mass to form new colonies at sites throughout the human body (1). Tumor metastasis remains a major contributor to deaths of cancer patients (2). Metastasis is a multi-step process, which includes localized invasion, intravasation into lymphatic or blood vessels, traversal of the bloodstream, extravasation from the bloodstream, formation of micrometastasis, and colonization (1, 2). The process of metastasis and the factors governing cancer spread and establishment at secondary locations remain poorly understood (3). Only a small fraction of cancer cells from the primary tumor may go on to successfully establish distant, macroscopic metastasis, and although the tumor microenvironment is understood to play an important role (3), the molecular state of the cancer cells in a macroscopic metastasis may differ widely from that of the cancer cells in the associated primary tumor.
Molecular comparisons between primary tumors and metastases can potentially provide insights into the pathways and processes involved with cancer disease progression (4, 5). Numerous independent studies have carried out gene expression profiling of metastasis versus primary cancer for individual cancer types (4–18). In addition to individual studies by cancer type, “pancancer” molecular analyses would allow for examining similarities and differences among the molecular alterations that may be associated with metastasis across diverse cancer types. The recently published “MET500” dataset includes transcriptome profiling data for metastasis samples from approximately 500 patients, involving over 30 primary sites, and biopsied from over 22 organs (19); however, the MET500 dataset does not include any data on primary cancers. The Cancer Genome Atlas (TCGA), a large-scale initiative to comprehensively profile over 10,000 cancer cases at the molecular level, includes data on some metastasis samples as well as on primary samples. Other than the TCGA-sponsored melanoma marker study (20), the metastasis samples were not featured in the respective marker analyses by cancer type that were led by TCGA network, as the project as a whole was focused on primary disease. The advantages of analyzing TCGA data for metastasis-associated molecular correlations include the multiple cancer types having been profiled on a common platform that involves multiple levels of molecular data in addition to mRNA expression.
In this study, we systematically analyzed and cataloged molecular correlates of metastasis using TCGA datasets, across 11 different cancer types for which metastasis versus primary data were available. Molecular profiling data platforms analyzed included mRNA expression, protein expression, miRNA expression, and DNA methylation. Significantly altered genes, as identified in a given cancer type, were compared across the other cancer types, as well as across results from other profiling datasets from studies outside of TCGA.
Materials and Methods
TCGA patient cohort
Results are based upon data generated by TCGA Research Network (https://gdc.cancer.gov/). Molecular data were aggregated from public repositories. Tumors analyzed in this study spanned 11 different TCGA projects, each project representing a specific cancer type, listed as follows: Breast invasive carcinoma (BRCA); Cervical squamous cell carcinoma (CESC) and endocervical adenocarcinoma; Colorectal adenocarcinoma (CRC, combining COAD and READ projects); Esophageal carcinoma (ESCA); Head and Neck squamous cell carcinoma (HNSC); Pancreatic adenocarcinoma (PAAD); Pheochromocytoma and Paraganglioma (PCPG); Prostate adenocarcinoma (PRAD); Sarcoma (SARC); Skin Cutaneous Melanoma (SKCM); and Thyroid carcinoma (THCA). Cancer molecular profiling data were generated through informed consent as part of previously published studies and analyzed in accordance with each original study's data use guidelines and restrictions. Metastasis versus primary samples were inferred using the TCGA sample code (“06” vs. “01,” respectively), which is the two digit code following the TCGA legacy sample name (e.g., metastasis sample “TCGA-V1-A9O5-06” and primary sample “TCGA-ZG-A9L9-01”).
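As an illustration of the sample classification described above, the following minimal Python sketch reads the sample-type code from a TCGA barcode. The function name `tcga_sample_type` is hypothetical, and the handling of a trailing vial letter (e.g., "06A" in full-length barcodes) is an assumption about inputs beyond the two examples given in the text.

```python
def tcga_sample_type(barcode: str) -> str:
    """Classify a TCGA sample barcode by its two-digit sample-type code."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    # The fourth field starts with the two-digit code; full-length barcodes
    # may append a vial letter (assumed here), so keep only two characters.
    code = fields[3][:2]
    if code == "01":
        return "primary"
    if code == "06":
        return "metastasis"
    return "other"  # any other sample type is outside this comparison


print(tcga_sample_type("TCGA-V1-A9O5-06"))  # metastasis
print(tcga_sample_type("TCGA-ZG-A9L9-01"))  # primary
```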
RNA sequencing (RNA-seq) data were obtained from The Broad Institute Firehose pipeline (http://gdac.broadinstitute.org/). All RNA-seq samples were aligned using the UNC RNA-seq V2 pipeline (21). Expression of coding genes was quantified for 20,531 features based on the gene models defined in the TCGA Gene Annotation File (GAF). Gene expression was quantified by counting the number of reads overlapping each gene model's exons and converted to reads per kilobase per million mapped reads (RPKM) values by dividing by the transcribed gene length defined in the GAF and by the total number of reads aligned to genes. Proteomic data generated by reverse-phase protein array (RPPA) across 7,663 patient tumors (“level 4” data) were obtained from The Cancer Proteome Atlas (http://tcpaportal.org/tcpa/; ref. 22). The miRNA-seq dataset was obtained from TCGA PanCanAtlas project (https://gdc.cancer.gov/about-data/publications/pancanatlas; ref. 23), which involved batch correction according to Illumina GAIIx or HiSeq 2000 platforms. DNA methylation profiles for the 450K Illumina array platform were obtained from The Broad Institute Firehose pipeline (http://gdac.broadinstitute.org/).
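The RPKM conversion described above can be sketched as follows. This is a generic illustration of the standard RPKM formula, not the UNC pipeline itself; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_reads_in_genes: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (length in kb) / (total reads in millions)
         = reads * 1e9 / (length in bp * total reads)
    """
    if gene_length_bp <= 0 or total_reads_in_genes <= 0:
        raise ValueError("gene length and total read count must be positive")
    return read_count * 1e9 / (gene_length_bp * total_reads_in_genes)


# Hypothetical gene: 500 reads, 2,000 bp transcribed length,
# 20 million reads aligned to genes in the sample.
print(rpkm(500, 2_000, 20_000_000))  # 12.5
```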
Differential analyses by molecular feature
For mRNA, miRNA, and RPPA data platforms, differential expression between comparison groups was assessed using Pearson correlation on log-transformed values (base 2). For cancer types with more than one metastasis profile, the Pearson correlation P value is equivalent to a t test; for cancer types with just one metastasis profile, significant genes in effect represented outliers with large differences at the edge or outside of the distribution as defined by the primary samples. Differential analyses between metastasis and primary by alternate methods for RNA-seq data were found to be largely concordant with results by the Pearson method (Supplementary Fig. S1). For the DNA methylation platform, differential methylation between comparison groups was assessed using Pearson correlation on logit-transformed values (natural log). For SKCM datasets, a linear regression model was also carried out for each gene, with dependent variable (continuous variable) of expression and with independent variables: metastasis/primary (categorical variable) + estimated tumor purity (ref. 24; continuous variable). FDRs were estimated using the method of Storey and Tibshirani (25). For selecting top features for a given data platform and cancer type, FDR < 10% was used as a cutoff; for SKCM datasets, top features were also significant with P < 0.05 for the linear model incorporating tumor purity as a covariate. Visualization using heat maps was performed using both JavaTreeview (version 1.1.6r4; ref. 26) and matrix2png (version 1.2.1; ref. 27). R software (version 3.1.0) was used for generation of box plots.
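The equivalence between the Pearson correlation test and a t test noted above can be illustrated with a short sketch. When one variable is a 0/1 group label, the statistic t = r·sqrt((n − 2)/(1 − r²)) equals the pooled-variance two-sample t statistic, and both have n − 2 degrees of freedom, so the P values coincide. The code below is a minimal pure-Python illustration under that assumption; the function names and the expression values are hypothetical and are not taken from the study's data.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_pearson(values, labels):
    """t statistic (n - 2 df) for the correlation of values with 0/1 labels."""
    r = pearson_r(labels, values)
    n = len(values)
    return r * math.sqrt((n - 2) / (1 - r * r))


def pooled_t(values, labels):
    """Two-sample t statistic with pooled variance (label 1 minus label 0)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    m1, m0 = mean(g1), mean(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    pooled_var = ss / (len(g1) + len(g0) - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / len(g1) + 1 / len(g0)))


# Hypothetical log2 expression values for one gene:
# six primary samples (label 0) followed by two metastasis samples (label 1).
values = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 6.4, 6.9]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
print(t_from_pearson(values, labels), pooled_t(values, labels))
```

The same identity holds when only one sample carries label 1, in which case the pooled variance is estimated from the primary samples alone; this corresponds to the single-metastasis case described above, where a significant gene is effectively an outlier relative to the primary distribution.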
Pathway and network analyses
Enrichment of gene ontology (GO) annotation terms within the sets of differentially expressed genes was evaluated using SigTerms software (28) and one-sided Fisher exact tests, with FDRs estimated using the method of Storey and Tibshirani (25). Protein interaction network analysis used the entire set of human protein–protein interactions cataloged in Entrez Gene (downloaded June 2017). Entrez Gene interactions for which yeast two-hybrid experiments provided the only support were not included in the analysis. Graphical visualization of networks was generated using Cytoscape (29).
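For readers unfamiliar with the enrichment test, a one-sided Fisher exact test for gene-set overlap is the upper tail of a hypergeometric distribution. The sketch below is a generic pure-Python illustration, not the SigTerms implementation; the function name and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p(overlap: int, signature_size: int, term_size: int,
                 universe_size: int) -> float:
    """One-sided Fisher exact P value for gene-set enrichment.

    Probability of observing at least `overlap` term-annotated genes when
    `signature_size` genes are drawn without replacement from a universe of
    `universe_size` genes, of which `term_size` carry the annotation.
    """
    max_overlap = min(signature_size, term_size)
    total = comb(universe_size, signature_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, signature_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: universe of 10 genes, 4 annotated with a term,
# signature of 3 genes, 2 of which carry the term.
print(enrichment_p(2, 3, 4, 10))  # 40/120, i.e. one third
```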
Analysis of external expression profiling datasets
We examined the following external gene expression profiling datasets of metastasis versus primary samples (listed by Gene Expression Omnibus or ArrayExpress accession number): BRCA studies E-MTAB-4003 (8), GSE100534 (9), and GSE110590 (5); CRC studies GSE50760 (10), GSE22834 (11), and GSE41258 (12); PAAD studies GSE42952 (13) and GSE19281 (14); PRAD studies GSE21034 (7), GSE3933 (6), and GSE6099 (4); SKCM studies GSE65904 (15), GSE17275 (16), and GSE46517 (17); and THCA study GSE60542 (18). Differential expression between comparison groups was assessed using t test on log-transformed values (base 2). For the purposes of comparing results of external datasets with TCGA metastasis signatures, where multiple expression array features referred to the same gene, the feature with the smallest P value for differences between metastasis and primary tumors (either direction) was used to represent the gene. For patient survival associations involving the TCGA PRAD metastasis signature, we examined external gene expression profiling datasets of primary prostate cancer from Taylor and colleagues (GSE21034; ref. 7), Sboner and colleagues (GSE16560; ref. 30), and Nakagawa and colleagues (GSE10645; ref. 31), assigning a metastasis signature score to each external tumor profile using our previously described “t score” metric (21); log2-transformed values within each dataset were normalized to SDs from the median across the primary sample profiles. In the same way, the t score metric was also used in applying the TCGA metastasis gene signature for a given cancer type to the primary sample mRNA profiles in TCGA for that cancer type.
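The rule described above for representing a gene measured by multiple array features can be sketched as follows. This is a minimal illustration only; the function name, the tuple layout, and the feature identifiers and statistics in the example are all hypothetical.

```python
def collapse_to_genes(features):
    """Keep, for each gene, the array feature with the smallest P value.

    `features` is an iterable of (gene, feature_id, p_value, t_statistic)
    tuples. The P value is for metastasis versus primary in either
    direction, so the sign of the retained t statistic gives the direction.
    """
    best = {}
    for gene, feature_id, p_value, t_stat in features:
        if gene not in best or p_value < best[gene][1]:
            best[gene] = (feature_id, p_value, t_stat)
    return best


# Hypothetical features: two map to the same gene, one to another gene.
example = [
    ("GENE_A", "feature_1", 0.20, 1.3),
    ("GENE_A", "feature_2", 0.003, -3.4),
    ("GENE_B", "feature_3", 0.04, 2.1),
]
for gene, (feature_id, p_value, t_stat) in collapse_to_genes(example).items():
    print(gene, feature_id, p_value, t_stat)
```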
We also examined tissue-specific mRNA signatures, to determine whether these might overlap with the cancer metastasis–specific mRNA signatures that were identified. Gene expression data (TPM values) from GTEx Analysis version 7 release were obtained from the GTEx Portal (https://www.gtexportal.org). Genes with average TPM values greater than 5 units across the normal tissue samples were used in this analysis, which involved 12,769 unique genes in total. Using log-transformed values, for each tissue in GTEx dataset that would be associated with one of the cancer types analyzed in the present study (breast, BRCA; skin, SKCM; cervix/uteri, CESC; colon, CRC; esophagus, ESCA; muscle, SARC; nerve, PCPG; pancreas, PAAD; prostate, PRAD; and thyroid, THCA), the top 500 genes positively correlated with that tissue as compared with all other tissues were determined (t test using log-transformed data). For a given cancer type, both the genes overexpressed in metastasis and the genes underexpressed in metastasis were each compared with the set of tissue-specific mRNA markers from GTEx corresponding to that cancer type, with the significant overlap determined using one-sided Fisher exact tests. In the same way, we examined GTEx-derived markers of tissues representing common sites of metastasis (adrenal gland, brain, liver, and lungs) for significant overlap with TCGA-derived metastasis overexpressed genes.
All P values were two-sided unless otherwise specified.
Results
TCGA cohort of primary and metastasis samples
Our study utilized 4,473 primary tumor samples and 395 tumor metastasis samples, involving 4,839 human cancer cases representing 11 different major types, for which TCGA generated data on one or more of the following molecular characterization platforms (Supplementary Data S1): RNA-seq (4,446 primaries and 393 metastases), RPPA (3,194 and 267), miRNA sequencing (4,350 and 378), and DNA methylation arrays (3,913 and 391). Of the cancer types studied, TCGA SKCM data involved the most metastasis samples (n = 369), followed by THCA (n = 8) and BRCA (n = 7); CESC, HNSC, and PCPG cancer types each involved two metastasis samples; CRC, ESCA, PAAD, PRAD, and SARC each involved one metastasis sample. Just 29 of the 395 metastasis samples had a paired primary sample from the same patient, and so unpaired analyses between primary and metastasis were made the focus of this study. In terms of somatic DNA copy number by SNP array platform, only SKCM metastasis samples had available data, with no data generated on primaries. Somatic mutation calls by whole-exome sequencing were considered too sparse for carrying out comparisons within each cancer type, with the exception of SKCM, whose data have been studied previously (20).
Differential mRNA patterns associated with metastasis by TCGA cancer type
We first set out to define differentially expressed mRNAs (based on RNA-seq platform) between primary and metastasis samples for each cancer type. For each cancer type, the number of top differentially expressed mRNAs (genes) in metastasis greatly exceeded chance expectation (Fig. 1; Supplementary Data S2 and S3). Using an FDR cutoff of 10%, the numbers of top significant genes ranged from 43 for PCPG to 10,084 for SKCM, with the other cancer types having between 178 and 1,205 top genes. For cancer types with only one metastasis sample, significant genes in effect represented outliers with large differences at the edge or outside of the distribution as defined by the primary samples (Supplementary Fig. S1). The limitations of a metastasis signature defined by a single sample would include false negatives (e.g., in cases where the distributions between primary and metastasis would overlap) and questions as to the generalizability of the signature to other metastasis cases, where the latter may be partially addressable by comparisons with results from external datasets (see below). We examined differences involving estimated tumor purities (24), as gene expression patterns in cancer can reflect noncancer as well as cancer cells (32). Of all the cancer types examined, only SKCM showed significantly lower tumor purity in metastasis versus primary (P = 7.6E-7, t test, Supplementary Fig. S2). Using linear models incorporating purity as a covariate, 8,038 of the above 10,084 genes remained significantly differentially expressed in SKCM (Fig. 1).
Although some differential expression patterns associated with metastasis were found to be shared across multiple cancer types, by and large each cancer type showed a metastasis signature that was distinct from those of the other cancer types. In comparing the respective expression signatures of metastasis from each cancer type to each other, some amount of gene-set overlap was observed (Fig. 2A). In a number of cases, the overlap in signatures between any two cancer types was statistically significant, even if the overlap itself involved only a fraction of genes (e.g., on the order of 10%). A set of 821 genes was found significant (FDR < 10%) with the same direction of change for two or more cancer types (Fig. 2B). Of these genes, 65 were significant for three or more cancer types, including genes with previously demonstrated functional roles in metastasis such as EPL3 (33), MYCNOS (34), and FOXF2 (35). Just 8 genes (BEND4, CD5L, CELA1, CLEC4M, CYP17A1, DCAF8L2, FAM151A, and SPIC) were overexpressed in metastasis (FDR < 10%) for four or more cancer types. We furthermore examined whether any of the metastasis signature genes (considering overexpressed and underexpressed gene sets separately) would be enriched for normal tissue-specific mRNA markers associated with the given cancer type (as obtained using GTEx data). Of 10 different tissue-specific marker gene sets, only one nominally significant association (P < 0.001, one-sided Fisher exact test) was observed, between SKCM metastasis underexpressed genes and GTEx mRNA markers of normal skin tissues.
Functional categories of genes represented by the cancer type–specific metastasis expression signatures were examined using the GO annotation terms (Supplementary Data S4). Specific GO term categories were found enriched within the corresponding metastasis signatures of multiple cancer types (Fig. 3A). Significantly enriched GO terms (FDR < 10% using one-sided Fisher exact tests) found with the metastasis overexpressed genes for at least three cancer types included “cellular response to stress,” “DNA repair,” “oxidation–reduction process,” “protein deubiquitination,” and “receptor activity,” and significant GO terms within the underexpressed genes for at least three cancer types included “extracellular region,” “proteolysis,” and “regulation of locomotion.” We took the genes related to receptor activity and genes high in metastasis (FDR < 10%) for at least one cancer type, and we integrated these with public databases of protein–protein interactions to generate a protein interaction network (Fig. 3), which allowed us to visualize the potential relationships involving these genes. Although most of the genes in this network involved SKCM, a number of other genes involved a trend (P < 0.05, Pearson on log-transformed data) of higher expression in metastasis in two or more cancer types, and 10 genes in the network were high (P < 0.05) in three or more cancer types: CR1, CR2, GP1BA, GRID2, GRM7, LHCGR, LRP2, MED14, P2RX2, and PTPRH. Similar types of interaction networks were also generated involving genes related to oxidation–reduction process or protein deubiquitination (Supplementary Fig. S3). Genes involved in the immune checkpoint pathway were also examined in TCGA metastasis profiles (Supplementary Fig. S4), with these being elevated across SKCM metastasis samples as expected (20), as well as elevated in a portion of metastasis samples from other cancer types.
Metastasis-associated mRNA patterns as observed in datasets external to TCGA
To help assess their generalizability, we compared the gene expression signatures of metastasis, as defined for each cancer type using TCGA data, with metastasis expression signatures obtained from external datasets made available by previously published studies. We examined 15 external gene expression profiling datasets of metastasis versus primary samples, involving six cancer types (BRCA, CRC, PAAD, PRAD, SKCM, and THCA). For each of the cancer types surveyed, a significant number of genes were found to overlap with the results of at least one external dataset of the given cancer type, for either the metastasis overexpressed or underexpressed genes (Fig. 4A). Perhaps, in part, because the CRC and THCA metastasis signatures each involved fewer genes, the CRC overexpressed genes showed some overlap but not a significant overlap with CRC overexpressed genes from external datasets, and THCA underexpressed genes by TCGA did not show significant overlap with external dataset results. For each cancer type, on the order of 35%–70% of genes comprising the corresponding TCGA metastasis signature showed a similar significant trend (P < 0.05) in at least one external dataset of that cancer type (Fig. 4B).
Notably, the external datasets often involved different sites of metastasis for a given cancer type; for example, the external PRAD datasets involved samples taken from various sites including lymph node, bone, lungs, testes, and brain (4, 6, 7), implying that the TCGA PRAD signature, while derived from a single metastasis sample, would not be specific to a single site. Similarly, breast metastasis in the GSE110590 dataset (5) involved a number of different sites, with the TCGA BRCA metastasis signature being manifested in samples from most of these sites (Fig. 4C). Furthermore, we examined GTEx-derived markers of tissues representing common sites of metastasis (adrenal gland, brain, liver, and lung) for significant overlap with TCGA-derived metastasis overexpressed genes; after multiple testing correction (25), only the GTEx liver signature was found to significantly overlap with metastasis genes associated with BRCA (P < 1E-7, one-sided Fisher exact test, with 20 of the 342 BRCA metastasis overexpressed genes also included in the top 500 genes highly expressed in normal liver), but not with genes from the other cancer types.
Previous studies have suggested that a subset of primary tumors resemble metastatic tumors with respect to gene expression patterns (36). For each cancer type in our TCGA cohort, we investigated the corresponding metastasis expression signatures in primary tumors. TCGA expression profiles of primary tumors were each scored for manifestation of the metastasis signature. Out of nine cancer types for which pathologic stage or grade information was provided, five (CESC, HNSC, PRAD, SKCM, and THCA) showed some statistical trend for positive correlation between the signature score and stage or grade across primary cancers (one-sided P ≤ 0.05, Pearson, Fig. 5A). This association was notably strongest for the PRAD cancer type (P < 1E-30), to the extent that clear differences in time to adverse events between patients with primary prostate tumors manifesting the PRAD metastasis signature and the rest of the patients were observable when applying the signature to profiles from multiple external cohorts (Fig. 5B). In addition, in another prostate cancer dataset, consisting of primary prostate cancer samples from patients for whom the early onset of metastasis following radical prostatectomy was recorded (37), PRAD metastasis signature scores were significantly elevated (P < 1E-9) in the early onset group (Fig. 5C).
Molecular patterns associated with metastasis involving protein, miRNA, and methylation
We went on to examine the protein, miRNA, and DNA methylation datasets in TCGA, to define the differentially expressed features between primary and metastasis samples for each cancer type. RPPA proteomic data involved 218 features and four cancer types (BRCA, PCPG, SKCM, and THCA) with metastasis profiles. For SKCM, a large portion of RPPA features examined were differentially expressed in metastasis (94 features at FDR < 10%, Pearson correlation on log-transformed data, 83 features significant after corrections for tumor purity; Supplementary Data S5), analogous to results from mRNA expression. No globally significant RPPA features (FDR < 10%) were found for PCPG or THCA, likely due in part to limited statistical power. For BRCA, one protein feature, transglutaminase 2, was elevated in metastasis and globally significant at FDR < 10% (FDR < 1E-12), corresponding to mRNA-level differences (Fig. 6A). Transglutaminase 2 protein and mRNA were also elevated in SKCM (Fig. 6A), and the protein is known to promote metastasis (38). For most cancer types, widespread differences in miRNA expression between metastasis and primary were observed (Fig. 6B; Supplementary Data S5). Most of the significant miRNAs detected were overexpressed rather than underexpressed in metastasis, with 85 overexpressed miRNAs and 12 underexpressed miRNAs significant (FDR < 10%) in two or more cancer types, and with 17 overexpressed miRNAs significant in three or more cancer types (Fig. 6C). For a number of cancer types, mRNA–miRNA pairings, as defined by both a previously identified miRNA–target interaction (as cataloged by miRTarBase; ref. 39) and significant differential expression in metastasis for both mRNA and miRNA (in opposite directions), could also be identified (Supplementary Data S5).
Using TCGA data from DNA methylation arrays, we examined 150,253 CpG Island probes, finding widespread differences in methylation between metastasis and primary samples for each cancer type studied (Fig. 6D; Supplementary Data S6). The numbers of top significant methylation features (FDR < 10%, Pearson on logit-transformed data) ranged from 163 for THCA to 27,530 for SKCM (after corrections for tumor purity), with the other cancer types having between 441 and 6,611 top features. As increased methylation of regulatory regions in proximity to genes can lead to epigenetic silencing, we integrated DNA methylation results with mRNA expression results, defining sets of genes associated with both altered methylation and expression (Fig. 6E; Supplementary Data S6). For all cancer types except SARC and THCA, significant inverse correspondences between methylation and expression results were observed (P < 0.05, one-sided Fisher exact test or χ2 test), either involving genes overexpressed and with lower associated methylation in metastasis or involving genes underexpressed and with higher associated methylation in metastasis. The significantly overlapping results involved, for example, 2,730 genes for SKCM (both overexpressed and underexpressed genes, with inverse patterns of DNA methylation), 66 genes for PRAD, 43 genes for HNSC, and 33 genes for CESC.
Discussion
Our study of TCGA data on cancer metastasis samples had three overall objectives: (i) to obtain a preliminary global view of metastasis versus primary molecular differences across several cancer types, (ii) to provide a resource for future studies investigating the role of specific genes in metastasis, and (iii) to help provide direction for future genomics studies of metastasis, for example, by showcasing the utility of examining molecular differences across cancer types and across other molecular profiling platforms, in addition to RNA-seq. A clear limitation of this study involves the limited number of metastasis samples profiled as part of TCGA consortium, as the main focus of TCGA was to examine genomic and molecular patterns of primary rather than of metastatic cases. For several of the cancer types examined, this limitation is mitigated somewhat by comparing the results from TCGA transcriptomic data to existing data from other studies, thereby demonstrating the relevance of differential gene patterns as observed across sample cohorts. Our results would support the need for future multiplatform-based and pancancer genomic studies profiling larger numbers of metastasis with primary samples, which would allow us to further define and refine the molecular signatures of metastasis as put forth in this study. Nevertheless, our study demonstrates that even on the basis of a single metastasis sample, there would be molecular information contained here, representing real biological differences that may involve at least some metastasis cases for a given cancer type.
Our study has identified widespread molecular differences in metastasis versus primary tumors for 11 different cancer types, with each cancer type having a signature of metastasis that is distinct from that of the other cancer types. This would suggest that there are different molecular pathways to metastasis involved in different cancers. Our findings would seemingly differ from those of two early studies of gene expression patterns of metastasis, one from Ramaswamy and colleagues (36), which defined a single 128-gene signature of metastasis across multiple cancer types (lung, breast, prostate, colorectal, uterus, ovary, etc.), and one from Weigelt and colleagues (40), which could not find any globally significant differences over chance expectation between primary and metastatic breast cancer samples. Studies subsequent to that of Weigelt and colleagues have been able to define widespread differences associated with breast cancer metastasis versus primary tumors (5, 8, 9). Interestingly, when surveying TCGA data, none of the Ramaswamy signature genes showed consistent high or low expression patterns in metastasis across the different cancer types (Supplementary Data S2). Although the Ramaswamy study found that a subset of primary tumors from various cancer types expressing the 128-gene metastasis signature were associated with worse outcome, we find in this study that aggressive prostate cancers, in particular, appear to express a metastasis signature pattern, but other cancer types, such as breast cancer, do not show a similar phenomenon. One salient feature of this study was to survey available data from multiple external sources, in addition to TCGA data. Where genes are found to show consistent patterns across multiple datasets and studies, we may place the most confidence in these gene patterns, at least given the currently available data.
The results of this study (e.g., as provided in the Supplementary Materials and Methods) may serve as a resource for future studies investigating the role of specific genes in metastasis. The various gene signatures of metastasis, as identified in each cancer type by TCGA data, may be mined to help identify candidates for functional studies. For cancer types with only a single metastasis sample in TCGA, there would be potential limitations with the associated metastasis signature, including questions as to the generalizability of the signature to other metastasis cases. Integration of TCGA results with results of external public datasets can considerably strengthen the metastasis associations as identified for specific genes. For example, the TCGA PRAD metastasis signature was based on a single metastasis profile, but this signature showed highly significant overlap with results from each of the three independent profiling datasets of prostate cancer metastasis versus primary disease (4, 6, 7). The TCGA PRAD signature could also define a subset of aggressive primary prostate cancer. Integration between mRNA data and data from other platforms in TCGA may also be used to select genes of particular interest, such as the genes showing concordant alterations involving both expression and DNA methylation. Genes that appear significant in multiple cancer types, including genes encoding cell receptors, may also be of interest for further investigation.
Although successfully defining molecular signatures of metastasis across several different cancer types, this study points to the need for more molecular data on metastasis in human tumors. Much could be gained by generating molecular data on larger numbers of metastasis and primary cancers, using multiple “omics” data platforms, in addition to mRNA expression profiling. The global molecular patterns involved in metastasis would entail proteomic and DNA methylation levels, in addition to transcriptomic levels. For many cancer types with metastasis data in TCGA, few or no relevant external molecular profiling datasets were found to be available. For cancer types where a large number of expression outliers could be associated with a single metastasis sample profile, profiling more metastasis cases would enable us to define more robust molecular signatures that would presumably be generalizable to the disease as a whole. Profiling larger numbers of cases would also allow for paired analyses by patient between primary and metastasis samples, as well as offering the possibility of subtype discovery within metastatic tumors, according to differential patterns being found within some but not all metastasis cases. Molecular data from human tumors may be combined with molecular data from experimental models of metastasis (41) to identify genes common to both, which may help pinpoint critical targets relevant in both the laboratory and human disease settings. The top gene correlates of metastasis by and large do not appear to represent canonical oncogenes (32, 42) or frequent targets of point mutation (43), but rather appear indicative of complex processes at work involving multiple internal and external factors. The molecular signatures of metastasis for each cancer type have the potential to lead to new discoveries into the disease process.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: C.J. Creighton
Development of methodology: C.J. Creighton
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): F. Chen, Y. Zhang, C.J. Creighton
Writing, review, and/or revision of the manuscript: S. Varambally, C.J. Creighton
Study supervision: C.J. Creighton
This work was supported, in part, by NIH grant P30CA125123 (to C.J. Creighton).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.