Esophageal adenocarcinoma (EAC) has one of the fastest increases in incidence of any cancer, along with poor five-year survival rates. Barrett's esophagus (BE) is the main risk factor for EAC; however, the mechanisms driving EAC development remain poorly understood. Here, transcriptomic profiling was performed using RNA-sequencing (RNA-seq) on premalignant and malignant Barrett's tissues to better understand this disease. Machine-learning and network analysis methods were applied to discover novel driver genes for EAC development. Identified gene expression signatures for the distinction of EAC from BE were validated in separate datasets. An extensive analysis of the noncoding RNA (ncRNA) landscape was performed to determine the involvement of novel transcriptomic elements in Barrett's disease and EAC. Finally, transcriptomic mutational investigation of genes that are recurrently mutated in EAC was performed. Through these approaches, novel driver genes were discovered for EAC, which involved key cell cycle and DNA repair genes, such as BRCA1 and PRKDC. A novel 4-gene signature (CTSL, COL17A1, KLF4, and E2F3) was identified, externally validated, and shown to provide excellent distinction of EAC from BE. Furthermore, expression changes were observed in 685 long noncoding RNAs (lncRNA) and a systematic dysregulation of repeat elements across different stages of Barrett's disease, with wide-ranging downregulation of Alu elements in EAC. Mutational investigation revealed distinct pathways activated between EAC tissues with or without TP53 mutations compared with Barrett's disease. In summary, transcriptome sequencing revealed altered expression of numerous novel elements, processes, and networks in EAC and premalignant BE.

Implications: This study identified opportunities to improve early detection and treatment of patients with BE and esophageal adenocarcinoma. Mol Cancer Res; 15(11); 1558–69. ©2017 AACR.

Esophageal adenocarcinoma (EAC) has one of the fastest increases in incidence rates of all cancers since the 1970s in white populations of all age groups in many countries (1, 2). The strongest risk factor for EAC is the presence of Barrett's esophagus (BE), a condition in which columnar intestinal-type mucosa (IM) replaces the normal squamous lining of the esophagus in response to gastroesophageal reflux. Barrett's IM without dysplasia (non-dysplastic BE, NDBE) can progress through the stages of low-grade dysplasia (LGD) to high-grade dysplasia (HGD) to invasive EAC (3). Most patients with EAC present with late stages of disease, contributing to the poor 5-year population survival rates of ∼15% (4, 5). Even after highly aggressive chemoradiotherapy followed by esophagectomy for treatment of potentially curable disease, 5-year survival rates are typically below 50% (6).

The mechanisms driving EAC development are poorly understood, and the successes of early detection screening programs remain modest (7). Exome- and whole-genome sequencing studies have recently contributed greatly to the identification of driver mutations and frequent catastrophic, chromothriptic events (8). Further efforts have been made to investigate the order of genomic alterations occurring in the development of EAC by studying the mutations in Barrett's IM, HGD, and EAC (9).

Whereas previous expression studies of Barrett's disease have mainly used microarray-based technologies relying on predefined, mostly coding genes, RNA-sequencing (RNA-seq) offers an unbiased method of analyzing the whole transcriptome. Most microarray-based transcriptome analyses will for example miss a large proportion of long noncoding RNAs (lncRNAs), which are involved in almost all known cellular processes (10–14), and are important in cancer biology, as shown by the number of functional characterized lncRNAs involved in carcinogenesis (15). In EAC, only two lncRNAs have been reported (16, 17), and no systematic approach identifying lncRNAs dysregulation at different stages of Barrett's disease and EAC development has been published so far.

Despite having potentially important roles in disease, other subsets of the transcriptome are equally difficult to characterize without sequencing based technologies and have therefore rarely been explored. For example, repeat elements, which make up two thirds of the human genome (18), have been associated with cancer (19) and double stranded breaks (20), and long-interspersed nuclear element 1 (LINE-1) retrotransposition is increased in EAC and Barrett's (21).

Here we report a characterization of the transcriptomic landscape at different stages of the Barrett's disease spectrum by performing whole transcriptome RNA-seq. Utilizing machine learning and network analysis, we identify novel genes and networks in EAC development. This analysis provides the one of the first systematic studies of aberrations in lncRNAs and repeat element expression in EAC and identifies a novel, non-TP53 mutation-based pathway for EAC formation through transcriptional mutation analysis.

Study population, specimen collection and analysis, and RNA extraction

Institutional review board approval for this study was obtained at all collaborating institutions, and all patients provided written informed consent.

Fifty-one tissue samples [normal squamous esophagus (NE) = 17, NDBE = 14, BE with LGD = 8, EAC = 12] were collected at endoscopy from 44 patients for RNA-seq. Four patients provided matched normal squamous tissues. For the immunohistochemistry analysis, an additional 15 patients (NE = 5, NDBE = 5, EAC = 5) were added for external validation of protein findings. Demographic details of all patients included in this study are provided in Supplementary Table S1.

All cancer specimens were pretreatment biopsies from radio-chemotherapy-naïve patients.

Specimen collection

All tissue samples were collected from the tubular esophagus at or above the gastroesophageal junction, which was defined as the proximal border of the gastric rugal folds. In patients with NDBE or LGD, a columnar mucosal segment of at least 1 cm was required for inclusion. Following endoscopic removal, research specimens were immediately placed in a solution for RNA stabilization and stored for 24 to 48 hours at 4°C.

Determining tissue cellularity and dysplasia

Prior to RNA isolation specimens were cut into three pieces using surgical scalpels treated with an RNAse and DNA decontamination reagent. The central section of the specimen was formalin-fixed and paraffin-embedded with subsequent representative sections taken for H&E staining. The two remaining sections of the specimens were stored at −80°C until further use. The corresponding H&E sections were reviewed by two experienced gastrointestinal pathologists (Y.V. Bobryshev and M. Edwards) to verify the correct diagnosis and determine tissue cellularity as well as the degree and extent of dysplasia. Median microscopically estimated percentages of diagnosis defining epithelium in the specimens used for RNA extraction were as follows: NE 100% (IQR 100%–100%), NDBE 55% (IQR 50%–88%), LGD 50% (IQR 40%–79%), and EAC 73% (IQR 70%–100%).

RNA sequencing, bioinformatic analysis, gene and lncRNA validation

A detailed description of the RNA sequencing, the bioinformatic pipelines, and qPCR validation of lncRNAs is provided in Supplementary Methods. Briefly, trimming, mapping, and counting were performed using trimgalore, STAR (22), and HTSeq (23). Differential gene expression analysis was performed with EdgeR28. Network analysis was performed using WGCNA (24). The SNV calling pipeline consisted of Samtools (25) mpileup, bcftools, vcfutilis, and vcffilter. The mutations were analyzed using VEP. The method to generate, and validate, the 12- and 4-gene signature is described in Supplementary Methods.

A detailed description of lncRNA qPCR validation can be found in Supplementary Methods. Raw expression data has been deposited to http://www.ebi.ac.uk/arrayexpress/(E-MTAB-4054).

Next-generation RNA sequencing of Barrett's disease tissues

A summary of RNA-seq run metrics and patient demographics is provided in Supplementary Table S1 and in the Supplementary Methods.

Computational tissue purity estimation resulted in assessed purities > 80% for all analyzed samples (Supplementary Fig. S1A) higher than estimated microscopically. This lack of correlation of computational purity and histopathologic estimates is consistent with previous reports (26). The median stromal compartment was higher in EAC samples compared with NDBE and LGD and lowest in NE (Supplementary Fig. S1B), which may reflect tumor-induced desmoplasia. Enrichment for the immune cell score was only observed in EAC, possibly indicating a higher percentage of immune cell infiltration in EAC compared with BE (Supplementary Fig. S1C).

Whole transcriptome sequencing characterizes disease-specific patient cohorts

Correlating gene expression profiles revealed high correlation between NDBE, LGD, and EAC (Supplementary Fig. S1D), with closest similarity between NDBE and LGD (R = 0.97). All tissues are almost equally dissimilar to NE, indicating their non-squamous tissue histology. The correlation between EAC and NDBE and EAC and LGD was R = 0.94 and 0.93, respectively (Supplementary Fig. S1D).

Principal component analysis corroborated these findings, revealing that the major transcriptomic differences found between disease entities are between normal squamous tissues and NDBE/LGD/EAC (Fig. 1A). However, in the second principal component analysis, EAC showed distinct clustering characteristics with a subset of cancers grouping with the respective, tightly correlated precursor lesions (Fig. 1A).

Figure 1.

RNA-seq of esophageal tissues. A, Principal component analysis of the first two components, with EAC (red), non-dysplastic Barrett's esophagus (NDBE, blue), BE with LGD (green), and normal squamous esophagus (NE, purple). B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes when comparing LGD/NDBE/NE to EAC with the percentage of change in each direction described on the x-axis. C, Venn diagram of the uniquely differentially expressed genes from B between LGD/NDBE/NE and EAC. D, Hierarchal clustering heatmap of the top 100 differentially expressed genes between each condition and EAC normalized as z-score per gene. E, Volcano plot comparing gene expression between LGD and NDBE. Genes with an absolute log2 fold change over ≥ 2 and FDR ≤ 0.05 are colored red.

Figure 1.

RNA-seq of esophageal tissues. A, Principal component analysis of the first two components, with EAC (red), non-dysplastic Barrett's esophagus (NDBE, blue), BE with LGD (green), and normal squamous esophagus (NE, purple). B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes when comparing LGD/NDBE/NE to EAC with the percentage of change in each direction described on the x-axis. C, Venn diagram of the uniquely differentially expressed genes from B between LGD/NDBE/NE and EAC. D, Hierarchal clustering heatmap of the top 100 differentially expressed genes between each condition and EAC normalized as z-score per gene. E, Volcano plot comparing gene expression between LGD and NDBE. Genes with an absolute log2 fold change over ≥ 2 and FDR ≤ 0.05 are colored red.

Close modal

One sample, LNC_B13, originally classified as NDBE, showed a gene expression signature much closer to normal squamous esophagus. Upon microscopic re-inspection, the specimen contained a larger proportion of squamous esophagus than intestinal metaplasia, confirming the accuracy of RNA-seq–based tissue classification.

Differential gene expression analysis identifies specifically differentially expressed genes across Barrett's disease stages

Differential gene expression (DE) analysis showed that the highest number of DE genes (absolute log2 FC ≥ 1, FDR ≤ 0.05) was identified when comparing EAC to NE (n = 6,766; Fig. 1B), with almost equal numbers of DE genes present when EAC was compared with LGD (n = 2,885) and to NDBE (2,466; Fig. 1B). Of the top 10 DE genes between EAC and NE, 4 genes (HNF4A [log2FCEAC/NE 8.5, FDR = 9e−125], ANKS4B [log2FCEAC/NE 8,7, FDR = 2e−104], LGALS4 [log2FCEAC/NE 7.9, FDR = 6e−104], and BCL2L14 [log2FCEAC/NE 6.1, FDR = 1e−97], Supplementary Table S2) were also differentially expressed between NDBE and LGD versus NE, suggesting a tissue specific expression of these genes.

Of the 7,803 uniquely DE genes between EAC and the other groups (Fig. 1C), 4,029 were unique to EAC versus NE showing enrichment for developmental processes (P = 5.6e−22; Supplementary Fig. S1E). Shared DE genes between all groups versus EAC (n = 1,260) showed enrichment for cell motility (P = 7.2e−14), response to stress (P = 2.9e−16), and inflammatory response (P = 1.1e−17) in EAC (Supplementary Fig. S2), while the 316 genes dysregulated between EAC and NDBE and LGD were enriched for biological processes typical of highly replicative malignant tissues, such as mitotic cell cycle (P = 1.3e−11) and DNA-replication initiation (P = 2.5e−7; Supplementary Fig. S1E). Hierarchical clustering of the top 100 DE genes (based on FDR) between EAC and all groups illustrated a perfect classification of histologic tissue types, with EAC and NE displaying perfect separation, whereas NDBE and LGD clustered together, but distinctly from EAC (Fig. 1D). However, despite the similarities between NDBE and LGD compared with EAC, we observed 214 significantly differentially expressed genes between NDBE and LGD (Fig. 1E). Although these lacked significant gene enrichment signatures, the top 10 DE genes included 6 transcription factors (FOSB [log2FCLGD/NDBE 4.5, FDR = 3.8e−7], NR4A1 [log2FCLGD/NDBE 2.4, FDR = 6e−5], EGR1 [log2FCLGD/NDBE 2.6, FDR = 6e−5], FOS [log2FCLGD/NDBE 2.4, FDR = 1.3e−4], EGR3 [log2FCLGD/NDBE 2.4, FDR = 1e−3], and ATF3 [log2FCLGD/NDBE 1.8, FDR = 1e−3]), which showed similar dysregulation in EAC (Supplementary Fig. S2A). Most of these transcription factors are involved in cell proliferation, differentiation, and transformation processes. Thus, these identified novel alterations could function as biomarkers for dysplasia in patients with histopathology that is difficult to interpret, an important and not uncommon clinical problem for LGD (27).

A list of top differentially expressed genes is provided in Supplementary Table S2.

Network analysis reveals EAC specific coexpression networks

Most transcriptomic studies focus on deriving lists of differentially expressed genes. However, the information captured by transcriptomic experiments is essentially much richer as the relationship between measured transcripts can also be considered through pair-wise correlation of gene expression profiles. To this end, we used weighted gene coexpression network analysis (WGCNA) to identify gene coexpression networks that may be important for EAC carcinogenesis. WGCNA identifies clusters of highly interconnected genes, which are termed eigenmodules or in short, modules, which can then be explored for their biological plausibility through gene ontology, pathway analysis and similar. Through WGCNA we identified three modules specifically dysregulated in EAC compared with NDBE, LGD, and NE samples.

Module 1 (Fig. 2) showed increased expression in EAC and was highly enriched for cell-cycle genes (Fig. 2A and B). The most highly connected genes (Fig. 2C) included many significantly differential expressed genes between EAC and NDBE as well as EAC and LGD (labeled red) such as PRKDC (log2FCEAC/NDBE 1.1 FDREAC/BE 4.8e−09, log2FCEAC/LGD 1.5 FDREAC/LGD 5.0e−09) and BRCA1 (log2FCEAC/NDBE 2.3 FDREAC/NDBE 2.4e−9, log2FCEAC/LGD 1.5 FDREAC/LGD 1.2e−6; Supplementary Fig. S2B), while some were only specifically differentially expressed between EAC and LGD (labeled green, e.g., XPO1 [log2FCEAC/LGD 1.2 FDREAC/LGD 3.6e−09]), or uniquely differentially expressed between EAC and NDBE (labeled blue, e.g., PLK1 [log2FCEAC/LGD 1.1 FDREAC/LGD 1.3e−5]; Fig. 2B). This identification of BRCA1 and PRKDC in EAC is striking, as these genes are involved in DNA editing and repair mechanisms, which could be explained by the high degree of genomic rearrangement and chromothriptic events seen in EAC (8, 28). PRKDC was further validated as being upregulated in EAC at the protein level by IHC (Fig. 2D and E).

Figure 2.

Top identified weighted gene coexpression network in EAC. A, Heatmap (top) of all genes in the identified EAC specific module, and median eigengene expression profile (bottom) of the brown module identified by WGCNA, sorted by each condition. B, Top 10 gene ontology enrichment processes of all genes in the brown module. C, The top connected mRNA genes (≥10 connections) protein–protein interaction from the EAC-specific module (A). Size corresponds to number of connections (small ≥10, medium ≥20, large ≥30). Nodes differentially expressed in EAC versus NDBE/LGD (red), EAC versus NDBE (blue), and EAC versus LGD (green). D, IHC intensity score for PRKDC. ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. E, Representative IHC slide of PRKDC between NDBE (left) and EAC (right).

Figure 2.

Top identified weighted gene coexpression network in EAC. A, Heatmap (top) of all genes in the identified EAC specific module, and median eigengene expression profile (bottom) of the brown module identified by WGCNA, sorted by each condition. B, Top 10 gene ontology enrichment processes of all genes in the brown module. C, The top connected mRNA genes (≥10 connections) protein–protein interaction from the EAC-specific module (A). Size corresponds to number of connections (small ≥10, medium ≥20, large ≥30). Nodes differentially expressed in EAC versus NDBE/LGD (red), EAC versus NDBE (blue), and EAC versus LGD (green). D, IHC intensity score for PRKDC. ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. E, Representative IHC slide of PRKDC between NDBE (left) and EAC (right).

Close modal

WGCNA also confirmed the increased immune cell score observed in EAC by identifying an EAC-specific module highly expressed and enriched for genes involved in immune system functions (Supplementary Fig. S2C). Another specific module shown to be specifically downregulated in EAC and LGD compared with NDBE mainly included genes involved in protein transport functions (Supplementary Fig. S2D).

Identification of a novel, EAC-specific gene expression signature

Following the identification of EAC-specific gene coexpression networks, we attempted to discover an EAC specific gene signature by applying four machine-learning classification methods (Fig. 3A). Following our selection criteria (Supplementary Fig. S3A), we identified 39 genes showing complete separation of the conditions upon unsupervised hierarchical clustering (Fig. 3B). The identified genes were significantly enriched for biosynthetic processes, inflammatory responses, and cell proliferation (Fig. 3C). Interestingly, the gene signature included four lncRNAs not previously associated with EAC, such as DLGAP2-AS1 (log2FCEAC/NE 4.9 FDREAC/NE = 1.7e−49, log2FCEAC/NDBE 2.9 FDREAC/NDBE 8.3e−22, log2FCEAC/LGD 2.0 FDREAC/LGD 2.4e−6; Fig. 3B).

Figure 3.

Identification of EAC-specific gene signature through machine learning. A, Venn diagram of the four different cross-validations methods used to identify EAC-specific genes. Five-CV (five-cross-validation), MCCV (Monte–Carlo cross-validation), LOOCV (leave-one-out cross-validation). B, Heatmap of genes present in ≥2 cross-validation methods, and differentially expressed between EAC versus NDBE. C, Gene enrichment analysis of the genes from B. D, Conditional classification tree for sub setting of 12-gene signature in combined external microarrays (D top left) with resulting 4 genes signature and corresponding RNA-seq expression (D, bottom left). AUROC for combined microarray datasets comparing EAC to NDBE (D, top right) and for validation data sets of gastric (green) and colorectal (blue) cancer (D, bottom right) compared with corresponding normal tissues. E, IHC intensity score for KLF4, E2F3, COL17A1, and E2F3. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. F, Representative IHC slide of COL17A1 and E2F3 with NDBE on the left and EAC to the right.

Figure 3.

Identification of EAC-specific gene signature through machine learning. A, Venn diagram of the four different cross-validations methods used to identify EAC-specific genes. Five-CV (five-cross-validation), MCCV (Monte–Carlo cross-validation), LOOCV (leave-one-out cross-validation). B, Heatmap of genes present in ≥2 cross-validation methods, and differentially expressed between EAC versus NDBE. C, Gene enrichment analysis of the genes from B. D, Conditional classification tree for sub setting of 12-gene signature in combined external microarrays (D top left) with resulting 4 genes signature and corresponding RNA-seq expression (D, bottom left). AUROC for combined microarray datasets comparing EAC to NDBE (D, top right) and for validation data sets of gastric (green) and colorectal (blue) cancer (D, bottom right) compared with corresponding normal tissues. E, IHC intensity score for KLF4, E2F3, COL17A1, and E2F3. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. F, Representative IHC slide of COL17A1 and E2F3 with NDBE on the left and EAC to the right.

Close modal

Next, 12 genes identified in all four methods and differentially expressed between EAC and NDBE (Supplementary Fig. S3B) were evaluated for their capacity to differentiate between benign and malignant Barrett's tissues in two publically available microarray datasets [GSE37203, ref. (29); GSE26886, ref. (30)] comparing 87 EAC to non-dysplastic BE tissues (GSE37203, n = 46; GSE26866, n = 41). All corresponding probes of the 12 genes were identified in GSE26886, whereas probes mapping to PM20D2 could not be found in GSE37203. Each gene's discrimination characteristics were then assessed by plotting ROC curves and calculating corresponding areas under the curve (AUC), sensitivities, and specificities. These results are summarized in Supplementary Table S3. Our random forest classification analysis of combined external datasets identified four of the 12 genes (KLF4, E2F3, COL17A, CTSL) as main drivers of differences, thus contributing the most to classification accuracies (Fig. 3D, top and bottom). Near-perfect distinction of disease states could be achieved, when combining these four genes into an EAC specific gene signature (AUROC of combined validation sets: 0.965 [0.931–1.000]; Fig. 3D, right; Supplementary Fig. S3C). As this 4-gene signature had been derived through combining data from both our internal RNA-seq and the two external microarray datasets, we subsequently tested this novel 4-gene signature in a third, separate cohort of 84 BE and EAC tissues from 73 patients (GSE13898, ref. 31). In this cohort, we observed an AUC of 0.89 (95% CI 0.78–1.00), which translates to a sensitivity of 93% (95% CI 0.85–0.97) and specificity of 78% (95% CI 0.45–0.93). Intriguingly, these 4 genes also provided excellent separation of both gastric [AUROC: 0.917 (0.810–1.000), GSE19826, ref. 32] and colonic [AUROC: 0.974 (0.933–1.000), GSE23878, ref. 33] cancers from corresponding normal tissues (Fig. 3D, bottom right; Supplementary Fig. S3D), indicating a central role of these genes for epithelial gastrointestinal cancer formation.

To assess cell viability effects of individual shRNA-mediated gene knockdowns of the identified gene signature and EAC specific coexpression network, two EAC cell lines (OE33 and JHESOAD1) were accessed through the Broad Institute's Project Achilles Data Portal (see Methods). In accordance with the identified expression patterns, knockdown of EAC downregulated genes tended to result in increased EAC cell proliferation rates. Accordingly, knockdown of EAC-specific upregulated genes, such as CTSL (log2FCEAC/NE 4.0 FDREAC/NE = 7.7e−57, log2FCEAC/NDBE 2.0 FDREAC/NDBE 1.9e−15, log2FCEAC/LGD 2.0 FDREAC/LGD 5.5e−8), resulted in marked decrease in EAC cell viability (Supplementary Fig. S4A). Similar effects were observed in other GI-cancer cell lines indicating that the identified genes may be important for GI malignancy formation and perhaps not specific to EAC (Supplementary Fig. S4B). Despite this, some of the genes identified as producing lowest cell viabilities if knocked down, such as PRKDC, may potentially be exploited as therapeutic targets (34).

Expression of the gene signature proteins

Protein expression of the genes identified in our 4-gene EAC-specific signature (CTSL, COL17A1, E2F3, and KLF4) was investigated by IHC. Following quantitative analysis, significant protein overexpression of CTSL, E2F3 in EAC compared with NDBE could be confirmed (Fig. 3E and F; Supplementary Fig. S4C). Equally, the significant downregulation of COL17A1 in EAC compared with NDBE, as implied through our gene-expression analyses, was also confirmed in our IHC analysis (Fig. 3E and F), whereas there was no significant difference in protein expression levels for KLF4.

Long noncoding RNAs show characteristic differential expression across different stages of Barrett's disease

Our machine-learning approach identified lncRNAs that were specifically expressed in EAC, and thus potentially driving EAC development. Therefore, we sought to systematically characterize lncRNA dysregulation at different stages of Barrett's disease. Previously, only two lncRNAs (HNF1A-AS1 and AFAP-AS1) have been reported as upregulated in EAC (16, 17). Although these were upregulated in EAC compared with NE, we observed no difference between EAC and NDBE/LGD (Supplementary Fig. S5A and S5B). In total, our whole transcriptome analyses identify 685 significantly dysregulated lncRNAs in EAC compared with NE, NDBE, and LGD (Supplementary Table S4).

Of the 685 identified dysregulated lncRNAs, 19 overlapped Fantom5 enhancer RNAs (eRNA), suggesting that these are more involved in regulating gene expression than exerting individual molecular functions.

Unsupervised hierarchical clustering of the top 20 DE lncRNAs between EAC and the other groups showed highly accurate tissue classification (Fig. 4A; Supplementary Table S4), which represents the first of its kind for lncRNAs.

Figure 4.

lncRNA expression changes in EAC and premalignant lesions. A, Hierarchal clustered heatmap of the top 20 differentially expressed lncRNAs between each condition and EAC normalized as z-score per gene. B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes between EAC and LGD/NDBE/NE. The respective lncRNA class is shown in the pie chart to the right. C, Representative location of each lncRNA class as classified in Gencode. D, Pearson correlation of differentially expressed neighbor lncRNA-mRNA pair subgrouped per class. E, K-means clustering of all differentially expressed Gencode genes between EAC and NE/NDBE/LGD with mean cluster expression (blue/red), number of genes in each cluster (white/green), number of lncRNAs per cluster (white/black), and the top GO biological process enrichment term to the right. F, IGV coverage plot (left) and qPCR (right) validation of (top-to-bottom) DLGAP1-AS2, DLEU2, AC108488.3, and PCAT7. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM.

Figure 4.

lncRNA expression changes in EAC and premalignant lesions. A, Hierarchal clustered heatmap of the top 20 differentially expressed lncRNAs between each condition and EAC normalized as z-score per gene. B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes between EAC and LGD/NDBE/NE. The respective lncRNA class is shown in the pie chart to the right. C, Representative location of each lncRNA class as classified in Gencode. D, Pearson correlation of differentially expressed neighbor lncRNA-mRNA pair subgrouped per class. E, K-means clustering of all differentially expressed Gencode genes between EAC and NE/NDBE/LGD with mean cluster expression (blue/red), number of genes in each cluster (white/green), number of lncRNAs per cluster (white/black), and the top GO biological process enrichment term to the right. F, IGV coverage plot (left) and qPCR (right) validation of (top-to-bottom) DLGAP1-AS2, DLEU2, AC108488.3, and PCAT7. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM.

Close modal

Comparing EAC to its precursor lesions and NE shows that the majority of lncRNAs tend to be upregulated in EAC (Fig. 4B). The two largest classes of DE lncRNAs are antisense and lincRNA, which also represent the two largest classes of annotated lncRNAs (Fig. 4C).

Furthermore, we observed significant expression correlation of 93% (129 of 139) of DE lncRNA–mRNA neighboring pairs (Fig. 4D). This was validated for 15 available lncRNA–mRNA pairs in the EAC TCGA dataset (Supplementary Fig. S5C). Guilt-by-association analysis, through k-means clustering, further revealed 53 and 102 lncRNAs in clusters enriched for biological terms associated with cell cycle and immune system, respectively, suggesting these lncRNAs may play a role in those processes of EAC development (Fig. 4E).

The expression of four top DE lncRNAs was validated by qRT-PCR in 41 patients (NE = 13, BE = 13, LGD = 6, EAC = 9), confirming the significant overexpressed levels of DLGAP1-AS2, DLEU2, AC108488.3, and PCAT7 compared with control tissues (Fig. 4F). With DLGAP1-AS2 also having been identified as a potential driver gene for EAC formation in our machine-learning analysis, its role as an important biomarker for EAC development warrants further exploration.

Taken together, these results indicate an important role for lncRNAs in esophageal adenocarcinogenesis.

Repeat elements are detectable by RNA-seq and show differential expression patterns across different stages of Barrett's disease

A novel endogenous retrovirus-associated long noncoding RNA (EVADR) has recently been reported in various human adenocarcinomas but EAC was not included in the analysis (35). Here, we report elevated expression levels of EVADR in EAC compared with NE (Fig. 5C, [log2FCEAC/NE 4.9 FDREAC/NE = 1.2e−11]), confirming the previously reported overexpression of EVADR specifically occurring in adenocarcinomas. However, we also detected substantially elevated levels of EVADR both in NDBE and LGD (log2FCNDBE/NE 3.3 FDRNDBE/NE 2.7e−06, log2FCLGD/NE 2.0 FDRLGD/NE 1.4e−2) indicating the preneoplastic nature of these metaplastic conditions and a potential role for EVADR in step-wise malignancy formation (Fig. 5C).

Figure 5.

Dysregulation of repeat elements and centromere proteins at different stages of EAC development. A, Expression of EVADR, a long-terminal repeat (LTR) enriched lncRNAs previously described in other human adenocarcinomas but uncharacterized in Barrett's disease. B, Number of differentially expressed (FDR ≤ 0.05) repeat elements from the largest four classes (DNA, LINE, LTR, and SINE elements). Y-axis indicates number of differentially expressed repeat elements. C, Heatmap of differentially expressed (FDR ≤ 0.05) Alu/SINE elements between LGD/NDBE/NE and EAC illustrating wide-ranging downregulation of Alu elements in EAC. D, Significant differential expression (FDR ≤ 0.05) and upregulation of centromere satellites in EAC (red) compared with NE (purple), NDBE (blue) and LGD (green). E, Log-fold change of all centromere protein encoding genes in EAC compared with NE/NDBE/LGD. *, FDR ≤ 0.05.

Figure 5.

Dysregulation of repeat elements and centromere proteins at different stages of EAC development. A, Expression of EVADR, a long-terminal repeat (LTR) enriched lncRNAs previously described in other human adenocarcinomas but uncharacterized in Barrett's disease. B, Number of differentially expressed (FDR ≤ 0.05) repeat elements from the largest four classes (DNA, LINE, LTR, and SINE elements). Y-axis indicates number of differentially expressed repeat elements. C, Heatmap of differentially expressed (FDR ≤ 0.05) Alu/SINE elements between LGD/NDBE/NE and EAC illustrating wide-ranging downregulation of Alu elements in EAC. D, Significant differential expression (FDR ≤ 0.05) and upregulation of centromere satellites in EAC (red) compared with NE (purple), NDBE (blue) and LGD (green). E, Log-fold change of all centromere protein encoding genes in EAC compared with NE/NDBE/LGD. *, FDR ≤ 0.05.

Close modal

Seeing that many lncRNAs, including EVADR, are repeat enriched (36), we explored the expression of repeat elements at different stages of EAC development. DE analysis of the four largest repeat types revealed wide dysregulation between EAC, NDBE, LGD, and normal samples with the largest repeat element differential expression patterns found between EAC and normal tissues (Fig. 5A; Supplementary Table S5). Overall, long terminal repeats (LTR), the class of repeats enriched in EVADR, and LINE elements show the highest expression in EAC, while SINE elements are mostly downregulated in EAC (Fig. 5B). This latter finding is especially interesting as we also found that APOBEC3G, an inhibitor of SINE elements (37), is upregulated in EAC compared with other tissues (log2FCEAC/NE = 0.94 FDREAC/NE = 1.1e−4, log2FCEAC/NDBE = 0.90 FDREAC/NDBE = 5.3e−4, log2FCEAC/LGD = 1.3 FDREAC/LGD 4.7e−4).

Previous studies have shown that the REP522 family is upregulated in most promoters in a variety of cancers (38). In our data we also found an increase of REP522 expression between EAC and NE, NDBE, and LGD. (log2FCEAC/NE 0.89 FDREAC/NE = 4.0e−4, log2FCEAC/NDBE 1.6 FDREAC/NDBE 1.4e−8, log2FCEAC/LGD 1.0 FDREAC/LGD 5.9e−3).

Centromere satellites and mRNA are dysregulated patterns in EAC

Through our repeat element analysis, we also observed increased expression of three centromere satellites in EAC compared with other patient groups (Fig. 5D). α-Satellites, found in the centromere regions of all chromosomes, have been linked to aneuploidy, a well-known feature of EAC (39, 40). In light of these observations, we investigated centromere protein (CENP) expression. CENP are involved in cell-cycle control and are upregulated in several cancers, where they instigate chromosomal instability (41, 42). We found the majority of all CENP to be upregulated in EAC compared with NDBE, LGD, and NE (Fig. 5E). This feature has not been reported previously in EAC; it may be involved in the frequent chromothriptic events and aneuploidy observed in this malignancy (8).

Whole transcriptome sequencing successfully identifies mutations prevalent in EAC and allows for the discovery of a RNA-seq mutational signature of different stages of Barrett's disease

Although RNA-seq is mainly employed to study gene expression, it has been shown that it is possible to identify mutations in expressed genes (43, 44).

To identify commonly occurring mutational patterns occurring in expressed transcripts, we investigated the overall transcriptional mutation signatures of each sample (Supplementary Methods). Based on five discovered signatures, we saw that the majority of EAC cluster closely together as most EAC tend to have reduced frequencies of signature 2, which mainly consists of C>T variants (Supplementary Fig. S5D and S5E). Interestingly, out of the two LGD and one NDBE patients clustering closely to EAC, one LGD patient later progressed to develop EAC.

Next, we focused our analysis on recurrently mutated SNPs in EAC (9, 28, 40). In concordance with previous studies (9), we observed a high percentage (50%) of EAC patients with missense mutation in TP53, whereas no TP53 mutation was detected in NDBE or LGD (Fig. 6A). All TP53 mutations had previously been reported in the COSMIC database. Further, mutations in SMARCA4 and CDNK2A were restricted to EAC and LGD patients (Fig. 6A), suggesting that these could be important drivers in tumor development.

Figure 6.

Identification of transcribed SNPs present in EAC and premalignant lesions of the esophagus. A, Variants (missense, synonymous, frame shift, stop gained, inframe insertions, splice variant) in known genomically mutated genes across EAC, LGD, and NDBE with aggregated percentage of patients with mutations (bar chart left), and the corresponding frequency of mutations as determined in Broad/TCGA genomic EAC data on the top of each bar. Mutations found in COSMIC were marked with an asterisk. B, MA plot of gene expression between TP53 wild-type (+) EAC, TP53 mutated (−) EAC and Barrett's intestinal metaplasia. Significantly differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes are colored red (upregulated) or blue (downregulated).

Figure 6.

Identification of transcribed SNPs present in EAC and premalignant lesions of the esophagus. A, Variants (missense, synonymous, frame shift, stop gained, inframe insertions, splice variant) in known genomically mutated genes across EAC, LGD, and NDBE with aggregated percentage of patients with mutations (bar chart left), and the corresponding frequency of mutations as determined in Broad/TCGA genomic EAC data on the top of each bar. Mutations found in COSMIC were marked with an asterisk. B, MA plot of gene expression between TP53 wild-type (+) EAC, TP53 mutated (−) EAC and Barrett's intestinal metaplasia. Significantly differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes are colored red (upregulated) or blue (downregulated).

Close modal

RNA-seq identifies severe dysregulation of the transcriptome in TP53 mutated EACs and wide-ranging metabolic transcriptional dysregulation in TP53 wild-type tumors compared with BE

With 50% of EAC patients having either wild-type TP53 (TP53+) or mutant TP53 (TP53) transcripts, we compared the gene expression profile of TP53+ and TP53 patient groups to NDBE. We found that TP53 EACs have approximately twice as many DE genes (n = 3000) as TP53+ tumors (n = 1776) when compared with BE (Fig. 6C). Moreover, we found that TP53 tumors showed significant enrichment for cell cycle processes whereas TP53+ tumors showed enrichment for cell metabolic functions when compared with BE (Supplementary Fig. S6A). Genes commonly differentially expressed between both types of EACs and BE showed enrichment for inflammatory response processes indicating two highly distinct pathways of EAC formation with inflammation—potentially triggered by gastroesophageal reflux—as a common pathway in malignancy formation. This was further supported by our selective analysis of genes involved in the TP53 pathway, where we found that TP53 and TP53+ EACs show different expression of TP53 pathway genes, with MDM2 upregulation and TP53AIP downregulation being a characteristic trait of TP53 wild-type tumors compared with BE (Supplementary Fig. S6B).

In this study, we characterized transcriptomic alterations occurring at different stages of Barrett's disease through RNA-seq. We have identified key players in EAC-specific gene networks and describe a novel EAC specific 4-gene expression signature. Our investigation of the noncoding RNA landscape and assessment of expressed gene mutations have also identified new processes involved in the development of this disease.

We identified upregulated transcription factors (e.g., EGR1, EGR3, FOSB, FOS, NR4A1, ATF3) in Barrett's with LGD and in EAC that have not previously been associated with early stage Barrett's disease progression and could thus be tested as novel biomarkers.

Our investigation of the mRNA landscape through weighted coexpression gene network analysis (WCGNA) revealed a plethora of highly connected, differentially expressed genes mostly involved in cell-cycle control and DNA repair mechanisms. Prominent members of this network included genes not previously characterized in EAC such as BRCA1 and PRKDC. Protein levels of PRKDC were also overexpressed, and previous PRKDC knockdown data produced some of the lowest measured viabilities in the two tested EAC as well as multiple other gastrointestinal cancer cell lines. The protein kinase, DNA-activated, catalytic polypeptide (PRKDC, or DNA-PKcs) is critical for DNA double strand break repair and recombination (45, 46). As such, PRKDC activity in EAC is in line with the genomic rearrangement recently described as a central feature of EAC (8, 28). However, PRKDCs is also a central regulator of transcriptional networks, facilitating tumor formation and progression (47). PRKDC may also have clinical relevance as a predictor of sensitivity to chemotherapy, as reported for esophageal SCC (48), and it has been reported as an attractive novel chemotherapeutic target (34).

Through machine learning, we discovered and validated a novel gene signature that displayed near-perfect distinction of EAC and BE tissues in two public datasets (GSE37203, ref. 29; GSE26886, ref. 30) and in a third, completely independent validation set (GSE13898, ref. 31). Although, to our knowledge, the derived gene list lacks genes previously associated with EAC and its development, many are associated with other cancers (Supplementary Table S6). Furthermore, our random forest analysis determined four genes (CTSL, COL17A1, E2F3, and KLF4) that could accurately differentiate normal from malignant Barrett's tissues both at the transcript and protein level for CTSL, COL17A1, and E2F3, while only at the RNA expression level for KLF4. E2F3 and KLF4 were equally important for distinguishing benign and malignant gastric and colorectal tissues, suggesting possible roles as onco- and tumor suppressor genes in epithelial GI cancers. E2F family member overexpression and loss of KLF4 expression are known phenomena in both colon and gastric cancer, whereas in colorectal cancer, KLF4 is regarded as a tumor suppressor gene that may also mediate epithelial mesenchymal transition (49). We surmised that E2F family member overexpression may be the result of defective RB1 signaling, which is known to occur in EAC (3). Moreover, of the other protein coding genes identified through this machine learning and network analysis, many displayed characteristics in public shRNA-mediated gene knockdown data, indicating that they may be drivers of EAC formation and propagation.

Using whole-transcriptome sequencing we discovered 685 lncRNAs differentially expressed between EAC and NE, NDBE, and LGD. Some of these were also identified in our EAC-specific gene signature. lncRNA-mRNA coexpression analysis suggested that lncRNAs function in malignancy relevant processes such as mitosis, development, and extracellular matrix organizational processes.

Although the identified eRNAs only make up ∼3% of the total dysregulated lncRNAs, these elements point to the potential involvement of dysregulated enhancers in EAC. Enhancers regulate the transcription of target gene/genes; however, identifying the targets can be difficult because enhancers can act upon large areas of the genome and not just regulating the gene in cis. The use of chromatin configuration capture methods, such as Hi-C, will help identifying the genomic targets of the enhancers in EAC.

Further studies should now aim at establishing the functional role of the identified lncRNAs in EAC development and how these may be exploited from a prognostic and therapeutic perspective.

We found many classes of repeat elements to be upregulated in EAC. We observed increased expression of LINE elements, consistent with recent reports describing increased LINE-1 retrotransposition activity in both BE and EAC (21). Interestingly, however, Alu elements, primate-specific short interspersed nuclear elements (SINE), showed almost uniform downregulation in EAC compared with premalignant Barrett's and normal squamous tissues. Alu elements are known regulators of gene expression (50), and thus their downregulation in EAC could indicate loss of important regulatory elements facilitating malignancy formation.

Previous studies have identified similar patterns of upregulation in repeat elements in a variety of cancers. In hepatocellular carcinoma, LTR promoters show increased activation and tumors with high LTR expression were less differentiated compared with tumors with low LTR expression (51).

Furthermore, a study conducting a pan-cancer investigation noticed that promoters containing SINE, LINE, and LTRs were upregulated in cancers (38). In our study and in concordance with previous studies, we observed that the majority of LTR and LINE elements were upregulated in EAC. Our study also confirms the upregulation of REP522, which has previously been indicated in cancer (38). What differs from previous studies is that we observed a downregulation of SINE in EAC. Further studies should aim to elucidate the role of repeat elements in EAC.

Intriguingly, comparing gene expression profiles of EAC tumors with or without defective TP53 transcripts to NDBE identified two highly distinct gene enrichment profiles: TP53-mutated tumors were enriched for cell-cycle division and mitotic processes, whereas TP53 wild-type tumors enriched for metabolic processes. While the acquisition of TP53 mutations has been identified as a key step in Barrett's disease progression (9), approximately 30% of EACs harbor wild-type TP53 (28). Thus, we propose severe dysregulation in metabolic pathways as an alternative, non-TP53 mutation dependent form of EAC development.

There are limitations to our findings. First, this study is descriptive by design and cannot provide information regarding risk of Barrett's progression, as we studied samples from different individuals at different disease stages. Therefore, it is clear that spatiotemporal gene expression variability or biologic variance due to sampling of different regions of the esophagus or capturing of particular BE cell clones/subpopulations cannot be ruled out as explanations for our findings. A second limitation of our study is the lack of HGD Barrett's tissues (HGD), although we note that most recent genomic (9) and recent gene expression studies (52) indicate that the alterations occurring in HGD are very similar to those identified in EAC.

In summary, by performing whole transcriptome sequencing we have described novel genes and transcription elements and elucidated signaling pathways previously not observed or under-characterized in the Barrett's metaplasia–dysplasia–neoplasia sequence. Through external validation of our findings, we identified a novel four gene signature, which provides almost complete differentiation of EAC from BE but requires further prospective testing. Our results also suggest separate pathways to EAC in TP53 mutant and nonmutant tumors, with inflammation, presumably due to the severe gastroesophageal reflux that is involved in initiating Barrett's disease, as a common factor.

No potential conflicts of interest were disclosed.

Conception and design: J.L.V. Maag, O.M. Fisher, A. Levert-Mignon, M.L. Thomas, Y.V. Bobryshev, M.E. Dinger, R.V. Lord

Development of methodology: J.L.V. Maag, O.M. Fisher, D.C. Kaczorowski, R.V. Lord

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): O.M. Fisher, A. Levert-Mignon, M.L. Thomas, D.J. Hussey, D.I. Watson, A. Wettstein, Y.V. Bobryshev, M. Edwards, R.V. Lord

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.L.V. Maag, O.M. Fisher, Y.V. Bobryshev, R.V. Lord

Writing, review, and/or revision of the manuscript: J.L.V. Maag, O.M. Fisher, M.L. Thomas, D.J. Hussey, D.I. Watson, Y.V. Bobryshev, M.E. Dinger, R.V. Lord

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): O.M. Fisher, A. Levert-Mignon, D.C. Kaczorowski, M.L. Thomas, R.V. Lord

Study supervision: D.I. Watson, M.E. Dinger, R.V. Lord

The authors would like to thank Alice Boulghoujian and Anaiis Zaratzian of the Garvan Institute for Medical Research Histopathology Department for their excellent technical assistance with all immunohistochemical analyses.

This work was supported by the Progression of Barrett's Esophagus to Cancer Network (PROBE-Net), the Australian National Health and Medical Research Council (NHMRC1040947), Cancer Council New South Wales (SRP 08-04 and RG 13-03), and St Vincent's Clinic Foundation, Sydney. O.M. Fisher is supported by the Swiss Cancer League (BIL-KLS 133-02-2013), the Australian National Health & Medical Research Council (GNT1094423) and the Swiss National Science Foundation (P1SKP3_161806).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Pohl
H
,
Welch
HG
. 
The role of overdiagnosis and reclassification in the marked increase of esophageal adenocarcinoma incidence
.
J Natl Cancer Inst
2005
;
97
:
142
6
.
2.
Edgren
G
,
Adami
HO
,
Weiderpass
E
,
Nyrén
O
. 
A global assessment of the oesophageal adenocarcinoma epidemic
.
Gut
2013
;
62
:
1406
14
.
3.
Clemons
NJ
,
Phillips
WA
,
Lord
RV
. 
Signaling pathways in the molecular pathogenesis of adenocarcinomas of the esophagus and gastroesophageal junction
.
Cancer Biol Ther
2013
;
14
:
782
95
.
4.
Enzinger
PC
,
Mayer
RJ
. 
Esophageal cancer
.
N Engl J Med
2003
;
349
:
2241
52
.
5.
Rustgi
AK
,
El-Serag
HB
. 
Esophageal carcinoma
.
N Engl J Med
2014
;
371
:
2499
509
.
6.
van Hagen
P
,
Hulshof
MC
,
van Lanschot
JJ
,
Steyerberg
EW
,
van Berge Henegouwen
MI
,
Wijnhoven
BP
, et al
Preoperative chemoradiotherapy for esophageal or junctional cancer
.
N Engl J Med
2012
;
366
:
2074
84
.
7.
Weaver
JM
,
Ross-Innes
CS
,
Fitzgerald
RC
. 
The “-omics” revolution and oesophageal adenocarcinoma
.
Nat Rev Gastroenterol Hepatol
2014
;
11
:
19
27
.
8.
Nones
K
,
Waddell
N
,
Wayte
N
,
Patch
AM
,
Bailey
P
,
Newell
F
, et al
Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis
.
Nat Commun
2014
;
5
:
5224
–.
9.
Weaver
JMJ
,
Ross-Innes
CS
,
Shannon
N
,
Lynch
AG
,
Forshew
T
,
Barbera
M
, et al
Ordering of mutations in preinvasive disease stages of esophageal carcinogenesis
.
Nat Genet
2014
;
46
:
837
43
.
10.
Gupta
RA
,
Shah
N
,
Wang
KC
,
Kim
J
,
Horlings
HM
,
Wong
DJ
, et al
Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis
.
Nature
2011
;
464
:
1071
6
.
11.
Kretz
M
,
Siprashvili
Z
,
Chu
C
,
Webster
DE
,
Zehnder
A
,
Qu
K
, et al
Control of somatic tissue differentiation by the long non-coding RNA TINCR
.
Nature
2012
;
493
:
231
5
.
12.
Wang
P
,
Xue
Y
,
Han
Y
,
Lin
L
,
Wu
C
,
Xu
S
, et al
The STAT3-binding long noncoding RNA lnc-DC controls human dendritic cell differentiation
.
Science
2014
;
344
:
310
3
.
13.
Lin
N
,
Chang
KY
,
Li
Z
,
Gates
K
,
Rana
ZA
,
Dang
J
, et al
An evolutionarily conserved long noncoding RNA TUNA controls pluripotency and neural lineage commitment
.
Mol Cell
2014
;
53
:
1005
19
.
14.
Engreitz
JM
,
Pandya-Jones
A
,
McDonel
P
,
Shishkin
A
,
Sirokman
K
,
Surka
C
, et al
The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome
.
Science
2013
;
341
:
1237973
.
15.
Quek
XC
,
Thomson
DW
,
Maag
JLV
,
Bartonicek
N
,
Signal
B
,
Clark
MB
, et al
lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs
.
Nucleic Acids Res
2015
;
43
:
D168
73
.
16.
Yang
X
,
Song
JH
,
Cheng
Y
,
Wu
W
,
Bhagat
T
,
Yu
Y
, et al
Long non-coding RNA HNF1A-AS1 regulates proliferation and migration in oesophageal adenocarcinoma cells
.
Gut
2014
;
63
:
881
90
.
17.
Wu
W
,
Bhagat
TD
,
Yang
X
,
Song
JH
,
Cheng
Y
,
Agarwal
R
, et al
Hypomethylation of noncoding DNA regions and overexpression of the long noncoding RNA, AFAP1-AS1, in Barrett's esophagus and esophageal adenocarcinoma
.
Gastroenterology
2013
;
144
:
956
966
.
e4
.
18.
de Koning
AP
,
Gu
W
,
Castoe
TA
,
Batzer
MA
,
Pollock
DD
. 
Repetitive elements may comprise over two-thirds of the human genome
.
PLoS Genet
2011
;
7
:
e1002384
.
19.
Thibodeau
S
,
Bren
G
,
Schaid
D
. 
Microsatellite instability in cancer of the proximal colon
.
Science
1993
;
260
:
816
9
.
20.
Gasior
SL
,
Wakeman
TP
,
Xu
B
,
Deininger
PL
. 
The human LINE-1 retrotransposon creates DNA double-strand breaks
.
J Mol Biol
2006
;
357
:
1383
93
.
21.
Doucet-O'Hare
TT
,
Rodić
N
,
Sharma
R
,
Darbari
I
,
Abril
G
,
Choi
JA
, et al
LINE-1 expression and retrotransposition in Barrett's esophagus and esophageal carcinoma
.
Proc Natl Acad Sci U S A
2015
;
112
:
E4894
900
.
22.
Dobin
A
,
Davis
CA
,
Schlesinger
F
,
Drenkow
J
,
Zaleski
C
,
Jha
S
, et al
STAR: ultrafast universal RNA-seq aligner
.
Bioinformatics
2012
;
29
:
15
21
.
23.
Anders
S
,
Pyl
PT
,
Huber
W
. 
HTSeq–a Python framework to work with high-throughput sequencing data
.
Bioinformatics
2015
;
31
:
166
9
.
24.
Langfelder
P
,
Horvath
S
. 
WGCNA: an R package for weighted correlation network analysis
.
BMC Bioinformatics
2008
;
9
:
559
.
25.
Li
H
,
Handsaker
B
,
Wysoker
A
,
Fennell
T
,
Ruan
J
,
Homer
N
, et al
The Sequence Alignment/Map format and SAMtools
.
Bioinformatics
2009
;
25
:
2078
9
.
26.
Yoshihara
K
,
Shahmoradgoli
M
,
Martínez
E
,
Vegesna
R
,
Kim
H
,
Torres-Garcia
W
, et al
Inferring tumour purity and stromal and immune cell admixture from expression data
.
Nat Commun
2013
;
4
.
27.
Lao-Sirieix
P
,
Fitzgerald
RC
. 
Screening for oesophageal cancer
.
Nat Rev Clin Oncol
2012
;
9
:
278
87
.
28.
Dulak
AM
,
Stojanov
P
,
Peng
S
,
Lawrence
MS
,
Fox
C
,
Stewart
C
, et al
Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity
.
Nat Genet
2013
;
45
:
478
86
.
29.
Silvers
AL
,
Lin
L
,
Bass
AJ
,
Chen
G
,
Wang
Z
,
Thomas
DG
, et al
Decreased selenium-binding protein 1 in esophageal adenocarcinoma results from posttranscriptional and epigenetic regulation and affects chemosensitivity
.
Clin Cancer Res
2010
;
16
:
2009
21
.
30.
Wang
Q
,
Ma
C
,
Kemmner
W
. 
Wdr66 is a novel marker for risk stratification and involved in epithelial-mesenchymal transition of esophageal squamous cell carcinoma
.
BMC Cancer
2013
;
13
:
137
.
31.
Kim
SM
,
Park
YY
,
Park
ES
,
Cho
JY
,
Izzo
JG
,
Zhang
D
, et al
Prognostic biomarkers for esophageal adenocarcinoma identified by analysis of tumor transcriptome
.
PLoS One
2010
;
5
:
e15074
.
32.
Wang
Q
,
Wen
YG
,
Li
DP
,
Xia
J
,
Zhou
CZ
,
Yan
DW
, et al
Upregulated INHBA expression is associated with poor survival in gastric cancer
.
Med Oncol
2010
;
29
:
77
83
.
33.
Uddin
S
,
Ahmed
M
,
Hussain
A
,
Abubaker
J
,
Al-Sanea
N
,
AbdulJabbar
A
, et al
Genome-wide expression analysis of middle eastern colorectal cancer reveals FOXM1 as a novel target for cancer therapy
.
Am J Pathol
2011
;
178
:
537
47
.
34.
Riabinska
A
,
Daheim
M
,
Herter-Sprie
GS
,
Winkler
J
,
Fritz
C
,
Hallek
M
, et al
Therapeutic targeting of a robust non-oncogene addiction to PRKDC in ATM-defective tumors
.
Sci Transl Med
2013
;
5
:
189ra78
8
.
35.
Gibb
EA
,
Warren
RL
,
Wilson
GW
,
Brown
SD
,
Robertson
GA
,
Morin
GB
, et al
Activation of an endogenous retrovirus-associated long non-coding RNA in human adenocarcinoma
.
Genome Med
2015
;
7
:
101
.
36.
Kapusta
A
,
Kronenberg
Z
,
Lynch
VJ
,
Zhuo
X
,
Ramsay
L
,
Bourque
G
, et al
Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs
.
PLoS Genet
2013
;
9
:
e1003470
.
37.
Hulme
AE
,
Bogerd
HP
,
Cullen
BR
,
Moran
JV
. 
Selective inhibition of Alu retrotransposition by APOBEC3G
.
Gene
2007
;
390
:
199
205
.
38.
Kaczkowski
B
,
Tanaka
Y
,
Kawaji
H
,
Sandelin
A
,
Andersson
R
,
Itoh
M
, et al
Transcriptome analysis of recurrently deregulated genes across multiple cancers identifies new pan-cancer biomarkers
.
Cancer Res
2016
;
76
:
216
26
.
39.
Dulak
AM
,
Schumacher
SE
,
van Lieshout
J
,
Imamura
Y
,
Fox
C
,
Shim
B
, et al
Gastrointestinal adenocarcinomas of the esophagus, stomach, and colon exhibit distinct patterns of genome instability and oncogenesis
.
Cancer Res
2012
;
72
:
4383
93
.
40.
Stachler
MD
,
Taylor-Weiner
A
,
Peng
S
,
McKenna
A
,
Agoston
AT
,
Odze
RD
, et al
Paired exome analysis of Barrett's esophagus and adenocarcinoma
.
Nat Genet
2015
;
47
:
1047
55
.
41.
Guo
XZ
,
Zhang
G
,
Wang
JY
,
Liu
WL
,
Wang
F
,
Dong
JQ
, et al
Prognostic relevance of Centromere protein H expression in esophageal carcinoma
.
BMC Cancer
2008
;
8
:
233
.
42.
Tomonaga
T
. 
Centromere protein H is up-regulated in primary human colorectal cancer and its overexpression induces aneuploidy
.
Cancer Res
2005
;
65
:
4683
9
.
43.
Blachly
JS
,
Ruppert
AS
,
Zhao
W
,
Long
S
,
Flynn
J
,
Flinn
I
, et al
Immunoglobulin transcript sequence and somatic hypermutation computation from unselected RNA-seq reads in chronic lymphocytic leukemia
.
Proc Natl Acad Sci U S A
2015
;
112
:
4323
7
.
44.
Network TCGAR
. 
Comprehensive molecular profiling of lung adenocarcinoma
.
Nature
2014
;
511
:
543
50
.
45.
Kurimasa
A
,
Kumano
S
,
Boubnov
NV
,
Story
MD
,
Tung
CS
,
Peterson
SR
, et al
Requirement for the kinase activity of human DNA-dependent protein kinase catalytic subunit in DNA strand break rejoining
.
Mol Cell Biol
1999
;
19
:
3877
84
.
46.
Zhao
Y
,
Thomas
HD
,
Batey
MA
,
Cowell
IG
,
Richardson
CJ
,
Griffin
RJ
, et al
Preclinical evaluation of a potent novel DNA-dependent protein kinase inhibitor NU7441
.
Cancer Res
2006
;
66
:
5354
62
.
47.
Goodwin
JF
,
Kothari
V
,
Drake
JM
,
Zhao
S
,
Dylgjeri
E
,
Dean
JL
, et al
DNA-PKcs-mediated transcriptional regulation drives prostate cancer progression and metastasis
.
Cancer Cell
2015
;
28
:
97
113
.
48.
Noguchi
T
,
Shibata
T
,
Fumoto
S
,
Uchida
Y
,
Mueller
W
,
Takeno
S
. 
DNA-PKcs expression in esophageal cancer as a predictor for chemoradiation therapeutic sensitivity
.
Ann Surg Oncol
2002
;
9
:
1017
22
.
49.
Wei
D
. 
Emerging role of KLF4 in human gastrointestinal cancer
.
Carcinogenesis
2005
;
27
:
23
31
.
50.
Julien Häsler
KS
. 
Alu elements as regulators of gene expression
.
Nucleic Acids Res
2006
;
34
:
5491
7
.
51.
Hashimoto
K
,
Suzuki
AM
,
Dos Santos
A
,
Desterke
C
,
Collino
A
,
Ghisletti
S
, et al
CAGE profiling of ncRNAs in hepatocellular carcinoma reveals widespread activation of retroviral LTR promoters in virus-induced tumors
.
Genome Res
2015
;
25
:
1812
24
.
52.
Varghese
S
,
Newton
R
,
Ross-Innes
CS
,
Lao-Sirieix
P
,
Krishnadath
KK
,
O'Donovan
M
, et al
Analysis of dysplasia in patients with Barrett's esophagus based on expression pattern of 90 genes
.
Gastroenterology
2015
;
149
:
1511
5
.

Supplementary data