Genes that are commonly deregulated in cancer are clinically attractive as candidate pan-diagnostic markers and therapeutic targets. To globally identify such targets, we compared Cap Analysis of Gene Expression profiles from 225 different cancer cell lines and 339 corresponding primary cell samples to identify transcripts that are deregulated recurrently in a broad range of cancer types. Comparing RNA-seq data from 4,055 tumors and 563 normal tissues profiled in The Cancer Genome Atlas and FANTOM5 datasets, we identified a core transcript set with theranostic potential. Our analyses also revealed enhancer RNAs, which are upregulated in cancer, defining promoters that overlap with repetitive elements (especially SINE/Alu and LTR/ERV1 elements) that are often upregulated in cancer. Lastly, we documented for the first time upregulation of multiple copies of the REP522 interspersed repeat in cancer. Overall, our genome-wide expression profiling approach identified a comprehensive set of candidate biomarkers with pan-cancer potential, and extended the perspective and pathogenic significance of repetitive elements that are frequently activated during cancer progression. Cancer Res; 76(2); 216–26. ©2015 AACR.

Successful cancer treatment depends heavily on early detection and diagnosis. Despite decades of research, relatively few biomarkers are routinely used in clinics (e.g., CA-125 and PSA in ovarian and prostate cancers, respectively; refs. 1, 2). There is a need for reliable and clinically applicable new cancer biomarkers for early detection. Cancers originating in the same tissue can be very heterogeneous, often being derived from different cell types and having drastically different mutation profiles (3). At the same time, cancers from different tissues can share common features; for example, The Cancer Genome Atlas (TCGA) has found genes and pathways, DNA copy number alterations, mutations, methylation, and transcriptome changes that recur across 12 different primary tumor types (4).

Here, using Cap Analysis of Gene Expression (CAGE) data collected for the Functional ANnoTation Of Mammalian genome (FANTOM5) project (5), we identified mRNAs, long-noncoding RNAs (lncRNA), enhancer RNAs (eRNA), and RNAs initiating from within repeat elements, which are recurrently perturbed in cancer cell lines. To confirm that these transcripts are relevant to tumors, we compared their expression in 4,055 primary tumors and 563 matching normal tissues profiled by RNA-seq in TCGA (6) and in a set of colorectal tumor samples profiled proteomically (7). Finally, for the most promising biomarker candidates we performed qRT-PCR validations in cancer cell lines and tumor cDNA panels. Taken together, our analyses identified a set of robust pan-cancer biomarker candidates, which have the potential for development as blood biomarkers for early detection and for histological screening of biopsies.

This work is part of the FANTOM5 project. Data download, genomic tools, and copublished manuscripts have been summarized at the FANTOM5 website (8).

FANTOM5 data

We used the cap analysis of gene expression (CAGE) data from the FANTOM5 project (libraries sequenced to a median depth of 4 million mapped tags; ref. 5). We used 564 CAGE profiles: 225 cancer cell lines and 339 primary cell samples. We split the data into three datasets: (i) matched solid, (ii) unmatched solid, and (iii) matched blood (see Supplementary Table S1A and S1C for the list of cancer types and sample annotation). The CAGE tag counts under 184,827 robust decomposition-based peak identification (DPI) clusters (5) were used to represent promoter-level expression. For enhancer activity, we used the CAGE tag counts under 43,011 enhancer regions identified in ref. 9.

FANTOM5 differential expression analysis

To identify up- and downregulated transcripts in cancer cell lines versus normal primary cells, we used Genewise Negative Binomial Generalized Linear Models as implemented in edgeR (10). The cancer versus normal comparison was performed using the glmLRT function. In the matched solid comparison, we assigned equal weight to each solid cancer type, so that each type contributed equally to the overall comparison. In the unmatched solid and matched blood datasets, a simple cancer versus normal comparison was performed.

The P-values were adjusted for multiple testing by the Benjamini–Hochberg method. Thresholds of fold change >4 and FDR <0.01 were used.
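The adjustment and thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration only: the model fitting itself was done in edgeR and is not reproduced here, and the function names and the input format (a list of name, log2 fold change, and raw P-value per feature) are assumptions made for this example.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def select_de(features, min_fold=4.0, max_fdr=0.01):
    """Split features into up/down lists.

    features: list of (name, log2_fold_change, raw_p_value); hypothetical format.
    """
    fdr = benjamini_hochberg([p for _, _, p in features])
    cutoff = math.log2(min_fold)  # fold change > 4 means |log2FC| > 2
    up = [name for (name, lfc, _), q in zip(features, fdr)
          if q < max_fdr and lfc > cutoff]
    down = [name for (name, lfc, _), q in zip(features, fdr)
            if q < max_fdr and lfc < -cutoff]
    return up, down

if __name__ == "__main__":
    toy = [("p1", 3.0, 1e-6), ("p2", -2.5, 1e-5), ("p3", 1.0, 1e-6), ("p4", 3.0, 0.5)]
    print(select_de(toy))
```

Applying the fold-change filter on the log2 scale keeps the up and down cutoffs symmetric.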

ON/OFF analysis

For each feature, we determined the expression status in a binary fashion: ON (expressed, count > 0) or OFF (not detected, count = 0). We then calculated the frequency of expression in cancer and normal samples. Features expressed four times more frequently in cancer than in normal samples were selected as “ON in cancer,” whereas features not expressed/lost four times more often in cancer than in normal samples were selected as “OFF in cancer.” The procedure was applied to each dataset (matched solid, unmatched solid, and matched blood). The significance of the association (contingency) between ON/OFF status and cancer/normal status was tested by a two-sided Fisher exact test with adjustment for multiple testing by the Benjamini–Hochberg method. The threshold of FDR < 0.01 was used. The differential expression pipeline described above was applied separately to the DPI/promoter counts and enhancer counts. The features found differentially expressed in all three datasets were selected as “pan” cancer features, whereas features differentially expressed in matched and unmatched solid datasets only were selected as “solid only” cancer features.
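The ON/OFF call for a single feature can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the function names are hypothetical, the "four times more frequently" rule is read as a ratio of detection frequencies, and the multiple-testing adjustment across features is left out.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-7))

def on_off_status(cancer_counts, normal_counts, ratio=4.0):
    """Classify one feature as ON/OFF in cancer and return the raw P-value."""
    on_c = sum(c > 0 for c in cancer_counts)
    on_n = sum(c > 0 for c in normal_counts)
    off_c = len(cancer_counts) - on_c
    off_n = len(normal_counts) - on_n
    p_value = fisher_exact_two_sided(on_c, off_c, on_n, off_n)

    f_on_c, f_on_n = on_c / len(cancer_counts), on_n / len(normal_counts)
    f_off_c, f_off_n = 1.0 - f_on_c, 1.0 - f_on_n
    if f_on_c > 0 and f_on_c >= ratio * f_on_n:
        status = "ON in cancer"
    elif f_off_c > 0 and f_off_c >= ratio * f_off_n:
        status = "OFF in cancer"
    else:
        status = "unchanged"
    return status, p_value

if __name__ == "__main__":
    print(on_off_status([5, 3, 0, 7, 2, 9, 1, 4], [0, 0, 0, 0, 0, 0, 0, 1]))
```

In a full analysis, the raw P-values from all features would then be adjusted by the Benjamini–Hochberg method before applying the FDR < 0.01 threshold.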

TCGA RNA-seq data

We obtained RNA-seq profiling data for 4,055 cancer samples and 563 normal tissue samples from The Cancer Genome Atlas (TCGA) Data Portal (data status as of Aug 5, 2013, origin listed in Supplementary Table S1B; ref. 6). The profiles represented 14 solid cancer types for which both tumor and normal tissue samples were available. We downloaded level 3 RNASeqV2, upper quartile normalized RSEM count estimates with expression profiles of 20,531 genes in 4,618 samples.

The counts were log2 transformed and used as input expression data for LIMMA.

The cancer versus normal comparison was performed using equal weight for each solid cancer type, each type contributing equally to overall comparison. The P-values were adjusted for multiple testing by Benjamini–Hochberg method. The thresholds of fold change >2 and FDR <0.01 were used.
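The idea of giving each cancer type equal weight can be illustrated with a small sketch for a single gene. It is not the LIMMA model itself: it only shows how per-type log2 differences are averaged so that a tumor type with many samples does not dominate. The function name, the input tuple format, and the pseudocount added before the log2 transform are assumptions made for this example.

```python
from math import log2
from statistics import mean

def equal_weight_log2fc(samples, pseudocount=1.0):
    """Average the tumor-vs-normal log2 difference over cancer types.

    samples: iterable of (cancer_type, is_tumor, count) for one gene
             (hypothetical format). The pseudocount is an assumption that
             avoids log2(0); the source does not state how zeros were handled.
    """
    by_type = {}
    for cancer_type, is_tumor, count in samples:
        groups = by_type.setdefault(cancer_type, {True: [], False: []})
        groups[bool(is_tumor)].append(log2(count + pseudocount))

    # One log2 fold change per cancer type that has both tumor and normal samples.
    per_type = {
        t: mean(g[True]) - mean(g[False])
        for t, g in by_type.items()
        if g[True] and g[False]
    }
    # Each type contributes once, regardless of its sample count.
    overall = mean(list(per_type.values()))
    return overall, per_type

if __name__ == "__main__":
    toy = [
        ("BRCA", True, 7), ("BRCA", True, 7), ("BRCA", False, 1),
        ("LIHC", True, 3), ("LIHC", False, 3),
    ]
    print(equal_weight_log2fc(toy))
```

A pooled comparison of all tumors against all normals would instead be weighted toward the most heavily sampled tumor types.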

Enrichment for cancer-related genes

We tested for the enrichment by applying a hypergeometric test, using the significance threshold of P < 0.05. The list of oncogenes was a union of oncogenes listed in MSigDB (11) and UniProt (12) databases. For tumor suppressors, we considered genes listed in at least two of three sources: MSigDB (11), UniProt (12), and TSGene (13) tumor suppressor list. For the list of genes frequently mutated in cancer we used the high confidence drivers mutated across 12 cancer types from Tamborero and colleagues (14). We also tested for enrichment of cancer-related genes listed in COSMIC: Cancer Gene census (15).
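A hypergeometric enrichment test of this kind can be sketched with the standard library alone. The function name and argument names are hypothetical, and the choice of gene universe (for example, all annotated genes with a CAGE promoter) is an assumption that the text above does not specify.

```python
from math import comb

def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric P-value, P(X >= overlap).

    universe:  number of genes in the background (assumed by the caller)
    annotated: genes in the universe that belong to the category (e.g., oncogenes)
    selected:  differentially expressed genes drawn from the universe
    overlap:   selected genes that belong to the category
    """
    upper = min(annotated, selected)
    denom = comb(universe, selected)
    # Exact integer arithmetic; a single division at the end limits rounding error.
    numer = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom

if __name__ == "__main__":
    # Toy numbers: 10 genes, 4 in the category, 3 selected, 2 of them in the category.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A category would be called enriched when this value falls below the stated threshold of P < 0.05.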

Chromatin Interaction Analysis Paired-End Tag enhancer–promoter pairs

We obtained the Chromatin Interaction Analysis Paired-End Tags (ChIA-PET) data from the ENCODE/GIS-Ruan project (GSE39495, April 21, 2014). Data files from 15 experiments covered five cell lines (Hct-116, Helas3, K562, Mcf7, and Nb4) and three transcription factors (Pol2, Ctcf, and ERalpha). We merged the interactions from all experiments. We then extracted the ChIA-PET interaction clusters overlapping the genomic locations of enhancers and checked whether the linked genomic locations overlapped promoters of annotated genes.
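The enhancer-to-promoter linking step can be sketched as a simple interval overlap search. This is a minimal illustration, not the pipeline used in the study: the function names, the half-open (chrom, start, end) interval convention, and all coordinates in the example are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def link_enhancers_to_promoters(interactions, enhancers, promoters):
    """Return sorted (enhancer, gene) pairs connected by an interaction.

    interactions: list of (anchor1, anchor2) interval pairs (merged experiments)
    enhancers:    dict of enhancer name -> interval
    promoters:    dict of gene name -> promoter interval
    """
    pairs = set()
    for left, right in interactions:
        # An interaction is undirected, so test both anchor orientations.
        for near, far in ((left, right), (right, left)):
            hit_enh = [e for e, iv in enhancers.items() if overlaps(iv, near)]
            hit_pro = [g for g, iv in promoters.items() if overlaps(iv, far)]
            pairs.update((e, g) for e in hit_enh for g in hit_pro)
    return sorted(pairs)

if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    interactions = [
        (("chr1", 100, 200), ("chr1", 5000, 5200)),
        (("chr2", 10, 20), ("chr2", 900, 950)),
    ]
    enhancers = {"enh1": ("chr1", 150, 180)}
    promoters = {"GENE_A": ("chr1", 5100, 5600), "GENE_B": ("chr2", 900, 1000)}
    print(link_enhancers_to_promoters(interactions, enhancers, promoters))
```

The nested scan is quadratic; for genome-scale inputs an interval tree or a sorted sweep would be the usual replacement.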

Quantitative PCR for cell line samples and human cancer/normal tissue cDNA

Five hundred nanograms of total RNA from K562, HepG2, MCF7, and HDF was reverse-transcribed using oligo dT primer, and the product was then diluted 12.5 times with DNA/RNA free water. Primers for real-time PCR were designed with the Primer3 web tool (Supplementary Table S12). The housekeeping gene ACTB was used to normalize the expression levels. Quantitative PCR (qPCR) was carried out with the ABI 7500 Fast Real-Time PCR System using Power SYBR Green PCR Master Mix. For validation in tumor samples, we performed qPCR reactions on the TissueScan Cancer Survey Panel 96 – I cDNA panel (CSRT501, OriGene, MD). Each reaction was run in triplicate for cell line samples and as a single reaction for the human cancer/normal tissue cDNA panel.
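Normalization to a housekeeping gene is commonly done with the 2^(-ΔΔCt) calculation, sketched below. The text above states only that ACTB was used for normalization, so the specific formula, the assumption of 100% amplification efficiency, the function name, and the Ct values in the example are all assumptions made for illustration.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct,
                        control_target_ct, control_reference_ct):
    """Fold change of a target transcript by the 2^(-ddCt) method.

    target_ct / reference_ct: replicate Ct values in the sample of interest
        for the target and the housekeeping gene (e.g., ACTB).
    control_*: the same measurements in the normal reference sample.
    Assumes about 100% amplification efficiency for both primer pairs.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_control = mean(control_target_ct) - mean(control_reference_ct)
    return 2.0 ** -(delta_sample - delta_control)

if __name__ == "__main__":
    # Hypothetical triplicate Ct values for a cancer cell line and a normal reference.
    fold = relative_expression([24, 24, 24], [18, 18, 18],
                               [27, 27, 27], [18, 18, 18])
    print(fold)
```

A value above 1 indicates higher expression in the sample than in the normal reference.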

Identification of transcripts recurrently up- or downregulated in cancer cell lines

Using CAGE data collected for the FANTOM5 (5, 9) project, we compared expression levels of transcripts from 184,827 promoter and 43,011 enhancer regions between a panel of 225 cancer cell lines and a panel of 339 primary cell samples (sample IDs and their annotations are listed in Supplementary Table S1C).

First, the cancer cell line and primary cell datasets were divided into three subsets (see Supplementary Table S1A); cell lines and primary cells from solid tissues or blood lineages that could be matched are referred to as matched-solid or matched-blood. The remaining samples from solid tissue are referred to as unmatched-solid.

In each subset, we identified promoters that were differentially expressed between cancer and normal (edgeR; ref. 10, >4-fold change, FDR < 0.01). We also performed an alternative binary analysis (we refer to it as an ON/OFF analysis) to identify transcripts that were consistently switched off or switched on in cancer [four times more often expressed (switched ON) or not detected (switched OFF) in the cancer group compared to the normal group, using a significance level of FDR < 0.01 by Fisher exact test (examples in Fig. 1B)]. The results of the ON/OFF and edgeR analyses were then merged to obtain a final selection of up- and downregulated promoters (Fig. 1A and Supplementary Table S2).
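Combining the per-dataset calls into “pan” and “solid only” sets, as defined in the methods, amounts to a few set operations. The sketch below assumes that merging the edgeR and ON/OFF results means taking their union within each dataset; the function names and feature IDs are hypothetical.

```python
def merge_calls(edger_calls, on_off_calls):
    """Union of the two call sets for one dataset (assumed merge rule)."""
    return set(edger_calls) | set(on_off_calls)

def classify_recurrence(matched_solid, unmatched_solid, matched_blood):
    """Split features into 'pan' and 'solid only' sets.

    Each argument is the set of feature IDs called in the same direction
    (up or down) in that dataset.
    """
    solid = matched_solid & unmatched_solid
    pan = solid & matched_blood          # called in all three datasets
    solid_only = solid - matched_blood   # called in both solid datasets only
    return pan, solid_only

if __name__ == "__main__":
    matched_solid = merge_calls({"p1", "p2"}, {"p3"})
    unmatched_solid = {"p1", "p3", "p4"}
    matched_blood = {"p1"}
    print(classify_recurrence(matched_solid, unmatched_solid, matched_blood))
```

Upregulated and downregulated features would be classified separately so that a feature changing in opposite directions across datasets is not counted as recurrent.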

Figure 1.

Summary of comparisons carried out to identify recurrently perturbed transcripts in the FANTOM5 cell line dataset. A, differential expression (DE) pipeline applied to the FANTOM5 data. B, examples of differentially expressed promoters showing expression switching (ON and OFF) and expression shift (UP and DOWN). C, comparison between promoter and gene level differential expression (based on CAGE data). Note: Although the majority of differentially expressed promoters reflect gene-wise differential expression, a significant fraction behave differently, for example, MPP2 or BCAT1. D, table summarizing the number of promoters and genes showing differential expression. Numbers in parentheses indicate numbers of unique genes.


In total, 2,108 promoters were differentially expressed in cancer cell lines. Seven hundred and eighty-one were consistently upregulated in all three comparisons and a further 814 were up only in solid cancers. Conversely, 99 were consistently downregulated in all three datasets and a further 414 were down only in solid cancers (Table 1). Sixty-three percent of the differentially expressed peaks overlapped protein-coding genes, 12% overlapped long noncoding genes (GENCODE v19; ref. 16), and 25% were not associated with any known genes (Supplementary Table S3).

Table 1.

The numbers of differentially expressed promoters (DPI clusters) and the type of genomic region they overlap, based on Gencode 19

Columns: Type of genomic region; Upregulated, pan cancer; Upregulated, solid only; Downregulated, pan cancer; Downregulated, solid only; Total; %
Protein coding 434 455 92 354 1,335 63 
LincRNA 45 38 92 
Antisense 37 28 69 
Pseudogene 12 27 
Other ncRNAs 20 33 58 
Unannotated 233 251 40 527 25 
Total 781 814 99 414 2,108 100 

In some cases, the CAGE analysis identified alternative promoters. Comparing the gene-wise differential expression (total CAGE signal for the same gene) to the differential expression of individual promoters (Fig. 1C), we found that for 23% of differentially expressed protein coding genes, at least one alternative promoter behaved differently to that of the whole gene, whereas for lncRNAs (which have fewer alternative promoters) it was only 5% (Fig. 1D).

Differentially expressed protein coding genes are enriched in cancer-associated genes

Focusing on CAGE peaks unambiguously at the 5′ end of protein coding genes (±500 bp from the 5′ end of annotated transcripts or located in 5′ UTRs), we identified 911 promoters corresponding to 656 unique genes that were differentially expressed: 435 upregulated and 221 downregulated (Supplementary Table S9). The gene set was significantly enriched for oncogenes (hypergeometric test P = 7.5e−05, 33 genes), tumor suppressors (P = 0.0043, 13 genes), genes frequently mutated in cancer (P = 0.034, 18 genes; ref. 14), and genes listed in the Cancer Gene Census (P = 0.01, 28 genes; see Supplementary Table S3F; ref. 15). Interestingly, eight oncogenes were downregulated and five tumor suppressors were upregulated, changing in the opposite direction to what one would expect (Supplementary Fig. S1A–S1C). This may be caused by regulatory feedback loops responding to neoplastic changes.

We next performed an analogous cancer versus normal analysis on RNA-seq data from 14 tumor-normal pairs (4,055 primary cancer samples and 563 normal tissue samples; Supplementary Table S1B) from The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/). The fold changes observed for the TCGA analysis were considerably weaker than those seen for the FANTOM5 analysis (Fig. 2), presumably because the mixture of cells in a tumor dilutes the cancer cell signal. To recover similar numbers of genes from both the TCGA and FANTOM5 analyses, we therefore applied a weaker threshold (abs FC > 2, FDR < 0.01) to identify 490 upregulated genes and 1,661 downregulated genes (Supplementary Table S4). The upregulated genes were enriched for those listed in the Cancer Gene Census (hypergeometric test P = 0.03, 18 genes). Of particular note, many more genes were downregulated in the tumor-normal comparison than we observed for the cancer cell line-primary cell comparison.

Figure 2.

Pan-cancer biomarker candidates. Genes aberrantly expressed in both FANTOM5 cancer cell lines (>4 fold change or four times expression gain/loss, FDR < 0.01) and in the TCGA primary tumors (fold change > 2, FDR < 0.01). The heat maps show the expression fold changes from cancer versus normal comparisons. The right panels represent 10 solid tumor origins and blood cancer based on FANTOM5 cancer cell lines. The left panels represent 14 tumor types from TCGA. The TCGA tumor type abbreviations are BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; COAD, colon adenocarcinoma; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; THCA, thyroid carcinoma; UCEC, uterine corpus endometrial carcinoma.


Potential pan-cancer biomarkers

We found that 76 (17%) of the upregulated genes identified in the cancer cell lines analysis were also upregulated in primary tumors from TCGA (Fig. 2). Among them we find oncogenes (HOXC13, MYEOV, MNX1, and CASC5), cancer antigens (PRAME, CD70, CASC5, and IGF2BP3) and, somewhat unexpectedly, tumor suppressors (TP73, BLM, and BUB1B). The upregulated genes were also enriched in genes involved in cell cycle, DNA metabolism, biopolymer metabolism, and homeobox genes involved in development. This included well-known pan-cancer genes such as TERT, PRAME, and TOP2A (17, 18) and MYEOV and MNX1, which are implicated in blood malignancies (19, 20) and FAM111B in prostate cancer (21).

For the downregulated genes, 52 (19%) genes from the FANTOM5 cancer cell lines analysis were also downregulated in primary tumors (Fig. 2). Interestingly, the list was enriched for genes related to oxidoreductase activity (five genes: AOX1, PTGS1, ACOX2, COX7A1, and the tumor suppressor GPX3; ref. 22). Because the downregulation is seen in both cancer cell lines and primary tumors, we infer that the changes are caused by a permanent reprogramming of metabolism in cancer cells rather than a response to the tumor microenvironment or cell culture conditions. We also observed seven discrepancies: CDKN2A, COL1A1, COL5A2, GJB2, HIST1H2BH, MMP9, and TNFRSF6B were downregulated in the cancer cell lines but upregulated in the primary tumor analysis.

Finally, we used recent proteome data from 90 colorectal cancers and 30 normal tissues published by Zhang and colleagues (7). The spectral count data were available for 239 of our 656 differentially expressed genes. Twenty mRNAs/proteins were upregulated in both the cancer cell lines (CAGE) and colorectal tumors (mass spec data) whereas 16 were upregulated in both the RNA-seq and mass spec data (Supplementary Fig. S2A and Supplementary Table S9C). Notably, four genes were robustly upregulated in all three comparisons: MCM2, TOP2A, ASNS, and MKI67.

There were 108 genes that were downregulated at the protein level and in at least one transcriptome analysis (CAGE or RNA-seq). Strikingly, the top 10 enriched terms within those genes were all related to metabolic processes, either to oxidative processes or lipid metabolism (Supplementary Fig. S2B), thereby confirming the metabolic pathway changes that we have observed from the RNA data.

Pan-cancer long-noncoding RNAs

From the cancer cell line analysis we identified 271 differentially expressed lncRNAs (181 lncRNAs annotated with GENCODE 19, plus a further 90 with the miTranscriptome annotation; ref. 23). The majority (247 lncRNAs) were upregulated whereas 24 were downregulated (Supplementary Table S10A). In total, 39 and five of these were up- and downregulated, respectively, in both the cancer cell line analysis and at least one tumor type in the miTranscriptome study (23). Of those, 21 were consistently upregulated and two consistently downregulated in cancer cell lines and at least two tumor types (Supplementary Table S10B).

For two of these lncRNAs (ENST00000448869 and FOXP4-AS1), we performed qRT-PCR validation in cancer cell lines versus primary cells and also in a cDNA panel covering eight tumor types and normal matching tissues. In both cases, the targets were highly significantly upregulated in both cancer cell lines and tumors (Fig. 4).

Figure 3.

Genomic neighborhood of pan-cancer–associated noncoding transcripts. A, RP11-124N14.3 lncRNA is consistently downregulated in cancer cell lines and is positioned antisense to VIM (EMT marker). The downregulation has been confirmed by qPCR (Fig. 4A) and in the miTranscriptome data (Supplementary Table S10A). B, we observed that the REP522 repeat becomes activated in cancer, giving rise to bidirectional transcription. The example here shows the promoters of the protein coding CCDC144NL and its antisense CCDC144NL-AS1 overlapping a REP522 element. The upregulation of both genes was validated in cancer cell lines by qPCR (Fig. 4A). C, in another example, BRCAT95, which is upregulated in breast cancer (Supplementary Table S10A), is initiated from a REP522 element and is confirmed to be upregulated by qPCR. The other transcript could not be confirmed by qPCR, likely due to a low expression level and/or an incorrectly assembled transcript. Interestingly, the promoter pair overlaps a FANTOM5-defined enhancer region. D, ChIA-PET data show that the pan-cancer enhancer chr1:46575103-46575175 is physically associated with the promoter of the PIK3R3 gene, which is reported to increase tumor migration and invasion when overexpressed in colorectal cancer (29) and is identified as a potential therapeutic target in ovarian cancer (41). The visualizations were performed in the ZENBU genome browser (42).

Figure 4.

Validation of pan-cancer biomarkers by qRT-PCR. A, summary of the qRT-PCR validations in three cancer cell lines and dermal fibroblasts as normal reference. The table shows the significant upregulation of eight REP522-associated transcripts and six potential biomarkers, and downregulation of three lncRNAs (potential tumor suppressors). B, for the most promising candidates, we performed qRT-PCR validation across a cDNA panel of 65 tumors, seven lesions and 24 normal tissues. Note: BLM, a known tumor suppressor, and two selected lncRNAs (ENST00000448869 and FOXP4-AS1) showed highly significant upregulation in cancer. C, as in B, but showing the upregulation of ENST00000448869 across multiple cancer types.


We also compared our list of pan-cancer lncRNAs with the 229 “onco-lncRNAs” identified by Cabanski and colleagues (24), which allowed us to confirm three additional upregulated lncRNAs (Supplementary Table S10B and S10D). Our analysis of preprocessed TCGA RNA-seq data also allowed us to confirm deregulation of four lncRNAs: two already confirmed by miTranscriptome and Cabanski (MEG3 and DGCR5) and two that were missed by other reports, namely downregulation of the MT1L pseudogene and, most notably, upregulation of PVT1, which is a well-known lncRNA oncogene (25).

Deregulated long-noncoding RNAs located near cancer-related genes

We next looked at the genomic neighborhood of the differentially expressed lncRNAs. For 27 of the 181 (GENCODE19) differentially expressed lncRNAs, we found 33 cancer-related genes within 100 kb (Table 2, example in Supplementary Fig. S3). For example, PVT1 neighbors MYC; these are consistently cogained in cancer. We also observe RP11-1070N10.5 neighboring the TCL6 (lincRNA), TCL1A, and TCL1B oncogenes (located in a breakpoint cluster region on chromosome 14q32 in T-cell leukemia; ref. 26), and HOXA11-AS neighboring HOXA13 and HOXA9 and overlapping the HOXA11 oncogene. Notably, five out of six cancer-related genes located within 1 kb of upregulated lncRNAs were also upregulated; these include the MCF2L, GATA2, and MNX1 oncogenes and the BSG and CSAG1 cancer antigens (Table 2). Possibly linked to cancer metabolism, the upregulated PCAT7 is located antisense to fructose-1,6-bisphosphatase-2 (FBP2; Supplementary Fig. S3), whose decreased expression promotes glycolysis and growth in gastric cancer cells (27).

Table 2.

The differentially expressed lncRNAs located within 100 kb from known cancer-related genes

Columns: lncRNA; lncRNA DE summary; Neighbor gene name; Neighbor DE summary; Neighbor gene info; Distance from lncRNA; Overlap; Strand
MCF2L-AS1 Solid UP MCF2L Solid UP Oncogene <1 kb Yes Opposite 
RP11-475N22.4 (GATA2-AS1) Solid ON GATA2 Solid ON Oncogene, cancer gene census <1 kb Yes Opposite 
MNX1-AS1 Pan ON MNX1 Solid ON Oncogene <1 kb No Opposite 
AC009005.2 Solid ON BSG Pan UP Cancer antigen <1 kb Yes Opposite 
CSAG4 Pan ON CSAG1 Pan ON Cancer antigen <1 kb No Opposite 
HOXA11-AS Pan UP HOXA11  Oncogene, cancer gene census <1 kb Yes Opposite 
RHPN1-AS1 Pan UP MAFA Solid UP Tumor suppressor, oncogene <100 kb No Same 
LINC00624 Pan ON BCL9 Solid UP Oncogene, cancer gene census <100 kb No Opposite 
IFITM9P Solid DOWN MYEOV Pan UP Oncogene <100 kb Yes Opposite 
RP11-435O5.2 Pan UP PTCH1 Pan ON Tumor suppressor <100 kb No Same 
LIFR-AS1 Solid ON LIFR  Oncogene, cancer gene census, Mut Driver <100 kb Yes Opposite 
AC079767.4 Pan ON CREB1  Oncogene, cancer gene census, Mut Driver <100 kb No Same 
RP11-460N16.1 Pan ON MITF  Oncogene, cancer gene census <100 kb No Same 
RP11-1070N10.5 Solid ON TCL6  Oncogene, cancer gene census <100 kb No Same 
RP11-1070N10.5 Solid ON TCL1A  Oncogene, cancer gene census <100 kb No Opposite 
HOXA11-AS Pan UP HOXA9  Oncogene, cancer gene census <100 kb No Opposite 
PVT1 Solid UP MYC  Oncogene, cancer gene census <100 kb No Same 
LAMTOR5-AS1 Solid UP RBM15  Oncogene, cancer gene census <100 kb No Same 
RNU6-781P Pan UP ZNF384  Oncogene, cancer gene census <100 kb No Opposite 
RP11-284F21.7 Pan UP PRCC  Oncogene, cancer gene census <100 kb No Opposite 
HOXA11-AS Pan UP HOXA13  Oncogene, cancer gene census <100 kb No Opposite 
RP11-1070N10.5 Solid ON TCL1B  Oncogene <100 kb No Same 
TSPY26P Solid OFF HCK  Oncogene <100 kb No Opposite 
RP5-884M6.1 Solid ON PIK3CG  Mut Driver <100 kb No Same 
RP11-405F3.4 Pan ON KIFC3  Mut Driver <100 kb No Same 
RP5-991G20.1 Pan UP ZFHX3  Mut Driver <100 kb Yes Opposite 
CTA-714B7.5 Pan UP TOM1  Mut Driver <100 kb No Opposite 
LINC00243 Pan UP MDC1  Mut Driver <100 kb No Same 
CTA-714B7.5 Pan UP HMGXB4  Mut Driver <100 kb No Opposite 
CTC-338M12.9 Pan ON TRIM7  Mut Driver <100 kb No Opposite 
SCARNA14 Solid ON MAP2K1  Cancer gene census <100 kb No Opposite 
AC034193.5 Solid UP FANCD2  Cancer gene census <100 kb No Same 
RNU6-781P Pan UP ING4  Tumor suppressor <100 kb No Opposite 

NOTE: For the complete list of genes that are located near differentially expressed lncRNAs, see Supplementary Table S7.

Abbreviation: DE, differentially expressed.
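The neighborhood search summarized in Table 2 can be sketched as a distance calculation between an lncRNA and nearby genes, reporting the same kind of fields as the table (distance, overlap, relative strand). The function names, the half-open (chrom, start, end, strand) convention, and the coordinates in the example are hypothetical.

```python
def gene_distance(a, b):
    """Gap in bp between two (chrom, start, end, strand) features.

    Returns 0 when the features overlap and None when they are on
    different chromosomes.
    """
    if a[0] != b[0]:
        return None
    return max(0, max(a[1], b[1]) - min(a[2], b[2]))

def neighbors_within(lncrna, genes, max_distance=100_000):
    """List (gene, distance, overlaps, strand relation) for nearby genes."""
    rows = []
    for name, gene in genes.items():
        d = gene_distance(lncrna, gene)
        if d is None or d > max_distance:
            continue
        overlap = lncrna[1] < gene[2] and gene[1] < lncrna[2]
        strand = "Same" if gene[3] == lncrna[3] else "Opposite"
        rows.append((name, d, overlap, strand))
    # Closest genes first.
    return sorted(rows, key=lambda r: r[1])

if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    lnc = ("chr8", 1000, 2000, "+")
    genes = {
        "G1": ("chr8", 1500, 3000, "-"),
        "G2": ("chr8", 52000, 60000, "+"),
        "G3": ("chr8", 500000, 510000, "+"),
        "G4": ("chr9", 1000, 2000, "+"),
    }
    for row in neighbors_within(lnc, genes):
        print(row)
```

Filtering the returned rows against a list of cancer-related genes would give a table of the same shape as Table 2.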

Activation of repeat elements in cancer

Globally, about 20% of all FANTOM5 promoters initiate from within repetitive elements and low complexity DNA sequences annotated by RepeatMasker. We observed a simple relationship for promoters that overlapped a repetitive element: the higher the fold change (upregulation in cancer), the higher the probability that the promoter overlapped a repetitive element (Supplementary Fig. S5, see Supplementary Table S13 for the promoter–repeat associations). A more detailed analysis revealed that the upregulated promoters are enriched in retrotransposons (SINE/Alu, LINE/L1, LTR/ERV1, LTR/ERVL). The SINE/Alu and LINE/L1 overlapping promoters tended to be located in intronic regions of protein coding genes (49% and 32%, respectively) and not associated with known RNA transcripts, whereas the upregulated promoters overlapping LTR/ERV1 often initiated the expression of lncRNAs (31 GENCODE lncRNAs and 48 miTranscriptome lncRNAs; Table 3).

Table 3.

Promoters overlapping repetitive elements

Column groups: Repeat overlapping promoters | Differentially expressed repeat overlapping promoters | Location of differentially expressed repeat (including Protein coding)
Repeat family | Total | # down | Odds ratio | P-value | # up | Odds ratio | P-value | 5′UTR | Intron | Exon | 3′UTR | lncRNA | Pseudogene | Not annotated
REP522 72 25 62.05 2.2E−16 12 
Low_complexity 2,013 13 2.37 4.7E−03 18 1.04 0.81 15 
Simple_repeat 11,982 44 1.35 0.06 204 2.13 2.2E−16 86 70 17 63 
SINE/Alu 3,961 2.4E−05 138 4.44 2.2E−16 67 11 50 
LINE/L1 3,426 0.1 1.5E−03 67 2.35 1.8E−09 22 12 32 
LINE/L2 3,220 0.22 0.01 25 0.9 0.7 17 
LTR/ERVL-MaLR 3,613 7.8E−05 31 0.99 10 11 
LTR/ERV1 3,932 0.18 3.0E−03 133 4.3 2.2E−16 12 31 83 
LTR/ERVL 1,488 0.04 20 1.57 0.049 

NOTE: The table shows the numbers of upregulated and downregulated promoters that overlap nine families of repetitive elements (≥20 promoters) as well as Fisher exact statistics of the enrichments (odds ratios and P values, two-sided test). The right side of the table shows the available information about the annotation of those promoters.
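To illustrate the kind of enrichment statistic reported in Table 3, the following minimal Python sketch computes a sample odds ratio and a two-sided Fisher exact P value for a 2 × 2 table. The function names and the background counts are hypothetical; only the 72 REP522-overlapping promoters, 25 of them upregulated, are taken from the text, so the output is not expected to reproduce the published values. Statistical packages may also report a conditional estimate of the odds ratio rather than the simple cross-product ratio used here.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided P value). The P value sums the
    hypergeometric probabilities of all tables that are no more likely
    than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_obs = log_p(a)
    p_value = sum(
        exp(log_p(x))
        for x in range(lowest, highest + 1)
        if log_p(x) <= log_obs + 1e-7  # tolerance for floating-point ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Rows: promoters overlapping REP522 vs. all other promoters.
# Columns: upregulated in cancer vs. not upregulated.
rep522_up, rep522_not_up = 25, 72 - 25   # counts from the text
other_up, other_not_up = 1_000, 99_000   # HYPOTHETICAL background counts

odds, p = fisher_exact_two_sided(rep522_up, rep522_not_up, other_up, other_not_up)
print(f"odds ratio = {odds:.2f}, two-sided P = {p:.3g}")
```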

In contrast, the majority of promoters overlapping simple repeats and low complexity sequences were associated with protein coding transcripts. Simple repeats were enriched among upregulated promoters, whereas low complexity sequences were enriched among downregulated promoters (Table 3).

Bidirectional transcription from REP522 satellite repeat is activated in cancer

Interestingly, a specific repeat family, REP522, was strongly enriched in the most upregulated promoters. REP522, originally called a telomeric satellite, is a largely palindromic, unclassified interspersed repeat of ~1.8 kb in size (28). We observed that out of 72 promoters overlapping REP522, 25 were upregulated in cancer (odds ratio, 62.05). Fifteen of these 25 promoters were associated with a known transcript (five pseudogenes, nine lncRNAs, and one protein coding gene), including the pseudogene BAGE2 (B melanoma antigen family, member 2) and the lncRNAs PCAT7 and BRCAT95, which were previously implicated in cancer (23). In most cases, transcription is initiated bidirectionally, and in five cases it overlaps regions previously annotated as enhancers. To show that the observed activation of REP522 elements was not due to a mapping artifact, we performed qPCR validation for 11 upregulated, REP522-initiated transcripts from different genomic regions in three cancer cell lines, with dermal fibroblast cells as a control. For eight of these, we confirmed upregulation in the cancer cell lines compared with normal fibroblasts (Fig. 3A). In one case, we confirmed the bidirectional transcription of CCDC144NL and CCDC144NL-AS1 from one REP522 element (Fig. 3B). The three transcripts for which the qPCR validations did not yield any results were novel, computationally assembled transcripts from miTranscriptome with very low expression, suggesting that they were either expressed at levels too low to detect or not correctly assembled (Fig. 3C). To our knowledge, this is the first report implicating REP522 activation in cancer.

Enhancer activation in cancer

Taking advantage of the fact that CAGE data can be used to estimate the activity of enhancers from balanced bidirectional capped transcription (9), we performed differential expression analysis based on CAGE tag counts within 43,011 CAGE-defined enhancers (9), using the same differential expression pipeline as for the promoter regions. We found 28 pan-cancer enhancers upregulated in solid and blood cancers and a further 62 upregulated in solid cancers only (Supplementary Table S5). Enhancers tend to be highly cell-type specific (9); accordingly, we found no broadly downregulated enhancers in cancer.
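The text does not spell out how "balanced" bidirectional transcription is scored. As a minimal sketch, one simple way to express the idea is a directionality score computed from the CAGE tag counts on the two strands; the function names and the 0.8 cutoff below are illustrative assumptions, not the published enhancer definition.

```python
def directionality(forward_tags, reverse_tags):
    """Strand bias of CAGE tags at a candidate enhancer.

    Returns a value in [-1, 1]: 0 means perfectly balanced transcription,
    +1 or -1 means all tags come from a single strand. Returns None when
    no tags are observed.
    """
    total = forward_tags + reverse_tags
    if total == 0:
        return None
    return (forward_tags - reverse_tags) / total


def is_balanced(forward_tags, reverse_tags, max_abs_d=0.8):
    """True when transcription is bidirectional and roughly balanced.

    The cutoff max_abs_d is an ASSUMED, illustrative threshold.
    """
    d = directionality(forward_tags, reverse_tags)
    return d is not None and abs(d) < max_abs_d


# Toy tag counts (hypothetical): (forward strand, reverse strand)
for counts in [(30, 10), (100, 2), (0, 0)]:
    print(counts, directionality(*counts), is_balanced(*counts))
```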

We found that 23 of the 90 upregulated enhancers could be associated with a miTranscriptome transcript (5′ end within 500 bp of the enhancer; Supplementary Table S11A) and that four of those transcripts were reported to be upregulated in at least one cancer type (Supplementary Table S11B).
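The 500 bp association rule can be sketched as a simple interval lookup. In this illustrative Python example, the class names, coordinates, and the choice to measure distance from the enhancer boundaries (rather than, say, its midpoint) are assumptions; the text only states that the transcript 5′ end lies within 500 bp of the enhancer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enhancer:
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    five_prime: int  # genomic position of the transcript 5' end


def distance_to_enhancer(pos, enhancer):
    """Distance in bp from a position to the enhancer (0 if inside it)."""
    if pos < enhancer.start:
        return enhancer.start - pos
    if pos > enhancer.end:
        return pos - enhancer.end
    return 0


def link_transcripts(enhancers, transcripts, max_dist=500):
    """Return (enhancer name, transcript name, distance) for each pair whose
    transcript 5' end lies within max_dist bp of the enhancer."""
    links = []
    for enh in enhancers:
        for tx in transcripts:
            if tx.chrom != enh.chrom:
                continue
            dist = distance_to_enhancer(tx.five_prime, enh)
            if dist <= max_dist:
                links.append((enh.name, tx.name, dist))
    return links


# Hypothetical toy coordinates, for illustration only.
enhancers = [Enhancer("enh1", "chr1", 1000, 1400)]
transcripts = [
    Transcript("tx_near", "chr1", 1600),      # 200 bp downstream: linked
    Transcript("tx_far", "chr1", 2000),       # 600 bp downstream: not linked
    Transcript("tx_inside", "chr1", 1200),    # inside the enhancer: linked
    Transcript("tx_other_chr", "chr2", 1200), # different chromosome: not linked
]
print(link_transcripts(enhancers, transcripts))
```

For tens of thousands of enhancers, a sorted or interval-tree lookup would replace the nested loop; the quadratic version is kept here only for readability.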

We next used Chromatin Interaction Analysis Paired-End Tags (ChIA-PET) data from the ENCODE project to associate these pan-cancer enhancers with their target genes. We found that 55 of the 90 upregulated enhancers can be physically linked to promoters of known genes (228 unique enhancer–gene links; Supplementary Table S6). Seventeen of the enhancers were linked to cancer-related genes, including seven oncogenes (BCL9, CREB1, ZNF384, SALL4, TFRC, BTG1, and the oncomir MIR21), two tumor suppressors (ING4 and KCTD11), and five Mut-Drivers (PIK3CB, CLIP1, KIFC3, GPS2, and CARM1; Supplementary Table S6). In addition, eight of the upregulated enhancers were linked to promoters found to be upregulated in cancer cell lines, including cancer-linked genes such as TNFSF12 and PIK3R3 (Fig. 3D; ref. 29).

By using both the FANTOM5 CAGE expression data from cancer cell lines and primary cells, and the TCGA RNA-seq and TCGA proteome expression datasets from TCGA tumors and normal tissues, we have built an overview of recurrent expression changes in cancer.

These datasets have their own strengths and weaknesses. In the TCGA analysis, both tumors and normal tissues are complex mixtures of cell types (cancer cells, infiltrating lymphocytes, stroma, and blood vessels), so differences in gene expression between normal and cancer samples may simply reflect differences in cell composition. To minimize this issue, the TCGA (3) required that profiled tumor samples contain at least 60% tumor cells and less than 20% necrosis. The FANTOM5 cell line and primary cell data avoid this complication because relatively homogeneous, pure cell cultures were profiled. Conversely, artifacts from the long-term culture of cell lines and their artificial in vitro culture conditions could affect our FANTOM5 analysis. The TCGA avoids this by directly profiling fresh tissue.

As expected, there are differences in the gene sets identified by the two datasets. Despite this, we identified a core set of 128 markers that are consistently perturbed in both the FANTOM5 cell and TCGA tissue analyses. Four of the markers, TOP2A, MKI67, MCM2, and ASNS, are also upregulated at the protein level in a colon cancer dataset and are among the most studied cancer biomarkers and drug targets. TOP2A is targeted by etoposide (30), ASNS is targeted in asparaginase therapy of acute lymphoblastic leukemia (31), and both MKI67 and MCM2 have been studied as biomarkers (32, 33) and potential drug targets (34, 35). Targeting these genes is likely to bring therapeutic value to many patients, as they are recurrently upregulated across many cancer types. Our pan-cancer markers also appear to be mostly novel, as comparison with prior work [Rhodes and colleagues, a multicancer meta-signature of 67 genes upregulated in cancer from a meta-analysis of 40 published microarray experiments (18); Xu and colleagues, 46 genes upregulated across 21 microarray data sets (36)] found little overlap (Supplementary Table S8).
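The notion of a core set that is "consistently perturbed" in both analyses can be sketched as an intersection that also requires the direction of change to agree. The following Python example is a hypothetical illustration: the function name, the placeholder genes (GENE_X, GENE_Y, GENE_Z), and the toy calls are assumptions, and the actual selection criteria used in the study may include further filters.

```python
def consistent_markers(calls_a, calls_b):
    """Return {gene: direction} for genes that are called differentially
    expressed in both datasets AND change in the same direction."""
    return {
        gene: direction
        for gene, direction in calls_a.items()
        if calls_b.get(gene) == direction
    }


# Toy differential expression calls ("up" or "down" in cancer).
# TOP2A and MKI67 are named in the text; the GENE_* entries are placeholders.
fantom5_calls = {"TOP2A": "up", "MKI67": "up", "GENE_X": "down", "GENE_Y": "up"}
tcga_calls = {"TOP2A": "up", "MKI67": "up", "GENE_X": "up", "GENE_Z": "down"}

core = consistent_markers(fantom5_calls, tcga_calls)
print(sorted(core))
```

GENE_X is excluded because its direction disagrees between the two datasets, and GENE_Y and GENE_Z are excluded because each is called in only one dataset.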

The FANTOM5 CAGE data also allowed us to look at transcript types rarely studied in prior efforts (long noncoding RNAs, enhancer RNAs, and repetitive element–derived RNAs). We report 271 pan-cancer lncRNAs, including well-known cancer-associated lncRNAs such as PVT1 and many novel cases. Public datasets confirmed the upregulation of 25 and the downregulation of three of these lncRNAs in at least two primary tumor types (23, 24), and we further validated the upregulation of two novel lncRNAs by qRT-PCR in a cDNA panel covering eight tumor types. We also identified 90 enhancer RNA-producing regions that are recurrently activated in cancer cell lines. For four of them, a corresponding lncRNA transcript model is upregulated in the TCGA dataset.

The observation that promoters overlapping repetitive elements are often upregulated in cancer is of particular interest, and the link between the little-known REP522 element and cancer is novel. One instance of REP522 near the B melanoma antigen (BAGE) locus has been reported to be marked with H3K9me3 and actively transcribed (37), perhaps suggesting that REP522 transcriptional activation is responsible for the upregulation of BAGE in cancer. Other, better studied elements such as LTR elements have previously been reported to act as alternative promoters of host genes in mouse embryos (38) and to contribute to the complexity of the transcriptome of iPS and stem cells (39). Thus, the reactivation of these elements and the eRNAs identified above suggests the acquisition of stem cell–like properties by cancer cells. This reactivation may occur because repetitive sequences are usually suppressed by methylation in somatic cells, whereas in cancer they are frequently hypomethylated (40).

In conclusion, our results, which highlight the transcriptome changes in cancer and cover protein coding genes, non-protein coding transcripts, unannotated promoters, and enhancer RNAs, represent an important step towards the discovery of potentially useful cancer biomarkers and therapeutic targets. Technologies developed to detect and target these molecules have great potential to be applicable to a broad range of cancers. One last note is that we identify molecules that are consistently up or down in cancer versus normal comparisons but are not necessarily higher in all cancers relative to all normal tissues (although a subset are). Such molecules may not be suitable for plasma/serum-based diagnostics but would be useful in screening biopsies in a histopathologic setting.

P. Carninci is founder and CEO for TransSINE Technologies. No potential conflicts of interest were disclosed by the other authors.

Conception and design: B. Kaczkowski, Y. Hayashizaki, P. Carninci, A.R.R. Forrest

Development of methodology: B. Kaczkowski, M. Itoh, P. Carninci

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): Y. Tanaka, M. Itoh, The FANTOM5 Consortium, P. Carninci, A.R.R. Forrest

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Kaczkowski, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann

Writing, review, and/or revision of the manuscript: B. Kaczkowski, R. Andersson, P. Carninci, A.R.R. Forrest

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y. Tanaka, H. Kawaji, A. Sandelin, T. Lassmann, The FANTOM5 Consortium, P. Carninci

Study supervision: A.R.R. Forrest

Other (supported experimental design of qPCR validation of pan-cancer marker candidates and data analysis, as well as performing experiments): Y. Tanaka

The authors thank Erik Arner, Efthymios Motakis, Kosuke Hashimoto, Dave Tang, Chung-Chau Hon, Jordan Ramilowski, and Giovani Pascarella for valuable discussions and comments on the manuscript, and Yuri Ishizu for technical assistance.

B. Kaczkowski was supported by Postdoctoral Fellowship Program from Japan Society for the Promotion of Science (JSPS) and Foreign Postdoctoral Researcher (FPR) program from RIKEN, Japan. Y. Tanaka was supported by Grants-in-Aid for Scientific Research (KAKENHI) from the Ministry of Education, Culture, Sports, Science, and Technology. R. Andersson was supported by funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (grant agreement No. 638273). A. Sandelin was supported by the Novo Nordisk Foundation and the Lundbeck Foundation. FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from MEXT to Y. Hayashizaki and a Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan to Y. Hayashizaki. This study is also supported by Research Grants from the Japanese Ministry of Education, Culture, Sports, Science and Technology through RIKEN Preventive Medicine and Diagnosis Innovation Program to Y. Hayashizaki and RIKEN Centre for Life Science, Division of Genomic Technologies to P. Carninci. A.R.R. Forrest is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust and funds raised by the MACA Ride to Conquer Cancer.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Felder M, Kapur A, Gonzalez-Bosquet J, Horibata S, Heintz J, Albrecht R, et al. MUC16 (CA125): tumor biomarker to cancer therapy, a work in progress. Mol Cancer 2014;13:129.
2. Makarov DV, Loeb S, Getzenberg RH, Partin AW. Biomarkers for prostate cancer. Annu Rev Med 2009;60:139–51.
3. Cancer Genome Atlas Research Network, Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, et al. Integrated genomic characterization of endometrial carcinoma. Nature 2013;497:67–73.
4. Cancer Genome Atlas Research Network, Genome Characterization Center, Chang K, Creighton CJ, Davis C, Donehower L, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
5. FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 2014;507:462–70.
6. The Cancer Genome Atlas - Data Portal [Internet]. [cited 2015 Jul 14]. Available from: https://tcga-data.nci.nih.gov/
7. Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature 2014;513:382–7.
8. FANTOM5 project [Internet]. [cited 2015 Jul 14]. Available from: http://fantom.gsc.riken.jp/5/
9. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature 2014;507:455–61.
10. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139–40.
11. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
12. Magrane M, UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011:bar009.
13. Zhao M, Sun J, Zhao Z. TSGene: a web resource for tumor suppressor genes. Nucleic Acids Res 2013;41:D970–6.
14. Tamborero D, Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Kandoth C, Reimand J, et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 2013;3:2650.
15. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.
16. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res 2012;22:1760–74.
17. Fratta E, Coral S, Covre A, Parisi G, Colizzi F, Danielli R, et al. The biology of cancer testis antigens: putative function, regulation and therapeutic potential. Mol Oncol 2011;5:164–82.
18. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A 2004;101:9309–14.
19. Janssen JW, Vaandrager JW, Heuser T, Jauch A, Kluin PM, Geelen E, et al. Concurrent activation of a novel putative transforming gene, myeov, and cyclin D1 in a subset of multiple myeloma cell lines with t(11;14)(q13;q32). Blood 2000;95:2691–8.
20. Taketani T, Taki T, Sako M, Ishii T, Yamaguchi S, Hayashi Y. MNX1-ETV6 fusion gene in an acute megakaryoblastic leukemia and expression of the MNX1 gene in leukemia and normal B cell lines. Cancer Genet Cytogenet 2008;186:115–9.
21. Akamatsu S, Takata R, Haiman CA, Takahashi A, Inoue T, Kubo M, et al. Common variants at 11q12, 10q26 and 3p11.2 are associated with prostate cancer susceptibility in Japanese. Nat Genet 2012;44:426–9, S1.
22. Barrett CW, Ning W, Chen X, Smith JJ, Washington MK, Hill KE, et al. Tumor suppressor function of the plasma glutathione peroxidase gpx3 in colitis-associated carcinoma. Cancer Res 2013;73:1245–55.
23. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 2015;47:199–208.
24. Cabanski CR, White NM, Dang HX, Silva-Fisher JM, Rauck CE, Cicka D, et al. Pan-cancer transcriptome analysis reveals long noncoding RNAs with conserved function. RNA Biol 2015;12:628–42.
25. Tseng Y-Y, Moriarity BS, Gong W, Akiyama R, Tiwari A, Kawakami H, et al. PVT1 dependence in cancer with MYC copy-number increase. Nature 2014;512:82–6.
26. Saitou M, Sugimoto J, Hatakeyama T, Russo G, Isobe M. Identification of the TCL6 genes within the breakpoint cluster region on chromosome 14q32 in T-cell leukemia. Oncogene 2000;19:2796–802.
27. Li H, Wang J, Xu H, Xing R, Pan Y, Li W, et al. Decreased fructose-1,6-bisphosphatase-2 expression promotes glycolysis and growth in gastric cancer cells. Mol Cancer 2013;12:110.
28. Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res 2013;41:D70–82.
29. Wang G, Yang X, Li C, Cao X, Luo X, Hu J. PIK3R3 induces epithelial-to-mesenchymal transition and promotes metastasis in colorectal cancer. Mol Cancer Ther 2014;13:1837–47.
30. Johnson CA, Padget K, Austin CA, Turner BM. Deacetylase activity associates with topoisomerase II and is necessary for etoposide-induced apoptosis. J Biol Chem 2001;276:4539–42.
31. Hawkins DS, Park JR, Thomson BG, Felgenhauer JL, Holcenberg JS, Panosyan EH, et al. Asparaginase pharmacokinetics after intensive polyethylene glycol-conjugated L-asparaginase therapy for children with relapsed acute lymphoblastic leukemia. Clin Cancer Res 2004;10:5335–41.
32. Dudderidge TJ, Stoeber K, Loddo M, Atkinson G, Fanshawe T, Griffiths DF, et al. Mcm2, Geminin, and KI67 define proliferative state and are prognostic markers in renal cell carcinoma. Clin Cancer Res 2005;11:2510–7.
33. Wharton SB, Chan KK, Anderson JR, Stoeber K, Williams GH. Replicative Mcm2 protein as a novel proliferation marker in oligodendrogliomas and its relationship to Ki67 labelling index, histological grade and prognosis. Neuropathol Appl Neurobiol 2001;27:305–13.
34. Liu Y, He G, Wang Y, Guan X, Pang X, Zhang B. MCM-2 is a therapeutic target of Trichostatin A in colon cancer cells. Toxicol Lett 2013;221:23–30.
35. Rahmanzadeh R, Rai P, Celli JP, Rizvi I, Baron-Lühr B, Gerdes J, et al. Ki-67 as a molecular target for therapy in an in vitro three-dimensional model for ovarian cancer. Cancer Res 2010;70:9234–42.
36. Xu L, Geman D, Winslow RL. Large-scale integration of cancer microarray data identifies a robust common cancer signature. BMC Bioinformatics 2007;8:275.
37. Ward MC, Wilson MD, Barbosa-Morais NL, Schmidt D, Stark R, Pan Q, et al. Latent regulatory potential of human-specific repetitive elements. Mol Cell 2013;49:262–72.
38. Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, et al. Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 2004;7:597–606.
39. Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet 2014;46:558–66.
40. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene 2002;21:5400–13.
41. Zhang L, Huang J, Yang N, Greshock J, Liang S, Hasegawa K, et al. Integrative genomic analysis of phosphatidylinositol 3′-kinase family identifies PIK3R3 as a potential therapeutic target in epithelial ovarian cancer. Clin Cancer Res 2007;13:5314–21.
42. Severin J, Lizio M, Harshbarger J, Kawaji H, Daub CO, Hayashizaki Y, et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nat Biotechnol 2014;32:217–9.
