Epigenetic mechanisms such as aberrant DNA methylation (DNAme) are known to drive esophageal squamous cell carcinoma (ESCC), yet they remain poorly understood. Here, we studied tumor-specific DNAme in ESCC cases from nine high-incidence countries of Africa, Asia, and South America. Infinium MethylationEPIC array was performed on 108 tumors and 51 normal tissues adjacent to the tumors (NAT) in the discovery phase, and targeted pyrosequencing was performed on 132 tumors and 36 NAT in the replication phase. Top genes for replication were prioritized by weighting methylation results using RNA-sequencing data from The Cancer Genome Atlas and GTEx and validated by qPCR. Methylome analysis comparing tumor and NAT identified 6,796 differentially methylated positions (DMP) and 866 differential methylated regions (DMR), with a 30% methylation (Δβ) difference. The majority of identified DMPs and DMRs were hypermethylated in tumors, particularly in promoters and gene-body regions of genes involved in transcription activation. The top three prioritized genes for replication, PAX9, SIM2, and THSD4, had similar methylation differences in the discovery and replication sets. These genes were exclusively expressed in normal esophageal tissues in GTEx and downregulated in tumors. The specificity and sensitivity of these DNAme events in discriminating tumors from NAT were assessed. Our study identified novel, robust, and crucial tumor-specific DNAme events in ESCC tumors across several high-incidence populations of the world. Methylome changes identified in this study may serve as potential targets for biomarker discovery and warrant further functional characterization.

Significance:

This largest genome-wide DNA methylation study on ESCC from high-incidence populations of the world identifies functionally relevant and robust DNAme events that could serve as potential tumor-specific markers.

Esophageal cancer is the seventh most common cancer worldwide, with an estimated 572,000 new cases and 509,000 deaths in 2018 (1). Esophageal squamous cell carcinoma (ESCC; the major subtype of esophageal cancer) is one of the most aggressive and lethal forms of cancer, which is often diagnosed at later stages, resulting in high mortality rates. ESCC arises from squamous epithelial cells of the esophagus, whereas, esophageal adenocarcinoma (EAC—the subtype commonly occurring in high-income countries of North America and Europe) originates from glandular cells near the lower esophageal sphincter. ESCC exhibits remarkable geographic differences in its incidence rate, the reasons for which are poorly understood. The highest incidences of ESCC are observed in many of the low- and middle-income countries (LMIC) of East Africa, Asia, and South America, and this disparity in incidence rates may be attributed to differences in environmental and dietary factors that are prevalent in the high-risk areas (1, 2).

Unlike other gastrointestinal cancers, there is limited understanding of ESCC etiology and molecular mechanisms of carcinogenesis especially from high-incidence countries (1, 3). In addition to the high incidence of ESCC in these LMICs, this cancer is difficult to diagnose at early stages and treat successfully (3), emphasizing the necessity for a better understanding of the underlying molecular mechanisms and the identification of molecular markers for early detection and theranostics for better management of the condition (4).

Epigenetic deregulation and resulting gene expression changes are among the major hallmarks of cancer (4–6). Although many studies have identified the presence of aberrant DNA methylation (DNAme) in ESCC, these studies are often limited by genomic coverage, absence of control tissues, sample size, high ethnic diversity, and lack of follow-up functional characterization of epigenetic changes driving ESCC development (6, 7). The largest study to date in terms of sample size was from The Cancer Genome Atlas (TCGA), which included methylome data of 96 ESCC tumors and was aimed at comparing the molecular features of ESCC and EAC (8). In addition, all of these studies focused on individual CpGs and did not address differential methylation across possible functional genomic regions involved in gene transcriptional regulation. The functional significance of the identified target genes was not analyzed either. Moreover, the majority of the genome-wide studies focused on genetic alterations predominantly from Chinese populations and the representation of other high-incidence regions like East Africa or South America is scant. Therefore, methylome studies on paired tumor and adjacent nontumor tissues from neglected high-incidence populations of ESCC with sufficient power and coverage are of obvious importance for both identifying robust DNAme markers and studying epigenetic mechanisms driving ESCC development.

To address these limitations, we conducted the first methylome-wide study, using the latest HM850K methylation array with the highest methylome coverage, on a large set of ESCC tumor and non-tumor adjacent tissues from high-incidence understudied populations of East Africa and South America. To account for the functional significance, we selected candidate genes/regions enriched for differentially methylated CpGs, looked into their differential expression in cancer tissues and expression patterns in normal mucosa using in-house data as well as data mining approaches from various publicly available databases. We validated the robustness of our top candidates on an additional set of ESCC samples from other high-incidence regions across the four continents.

Study design, study population, and sample preparation

The overall study design is summarized in Fig. 1A. Briefly, it comprised a standard discovery-replication design and included 108 tumors and 51 normal tissues adjacent to the tumor (NAT) in the discovery series and an independent set of 132 tumors and 36 NAT in the replication set. Samples were collected from different collaborating centers from 9 countries and the associated details are summarized in Fig. 1B and C. High incidence and lack of molecular studies were the primary criteria for the inclusion of populations for discovery and replication sets. Samples from Malawi, Brazil, South Africa, Ethiopia, Sudan, and Kenya were included in the discovery phase and samples from Brazil, Iran, Ethiopia, India, Kenya, and Portugal were included in the replication phase.

Figure 1.

Overview of study design and sample characteristics. A, Outline of the study design. B, Country-wise sample distribution in percentages. C, Dots in the map showing sample collection sites and their respective countries are colored.

Figure 1.

Overview of study design and sample characteristics. A, Outline of the study design. B, Country-wise sample distribution in percentages. C, Dots in the map showing sample collection sites and their respective countries are colored.

Close modal

Patients with ESCC included in the present study were recruited in case–control and complete case series studies (9–11) from various collaborating centers in association with major public-sector hospitals in the respective locations. Details of the sample collection sites and characteristics of the study participants are provided in Supplementary Tables S1 and S2, respectively. Each patient provided written informed consent to the study, the study was approved by the local Institutional Review Boards of all the collaborating centers, and by the International Agency for Research on Cancer (IARC) Ethical Committee (IEC; approval number: 16–25). All the included cases were histologically confirmed ESCC.

Tumor tissues were collected from either endoscopic biopsy or were surgically excised tissues processed into formalin-fixed paraffin-embedded (FFPE) tissues and/or freshly frozen tissues, whereas NATs were collected from pathologically confirmed negative margins of the ESCC samples. One section of 5 μm from each of these tissues was processed for pathologic examination by hematoxylin and eosin staining and the pathology was confirmed by experienced pathologists before designating them as a tumor or normal tissue for their inclusion in the current study, according to the Digestive System Tumors (5th edition) of the World Health Organization (WHO) series on the classification of human tumors. The inclusion criteria of the samples were as follows: The presence of more than 50% neoplastic cells in a sample was included as tumor tissue and the presence of 100% normal esophageal mucosa from negative margins of ESCC samples was considered as NAT.

Sample processing for HM850K array

Genomic DNA from tumor and NAT were extracted using the AllPrep DNA/RNA Mini Kit (Qiagen). The isolated DNA was then quantified with Qubit 3 Fluorometer using the Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific) as per the manufacturer's protocol. Samples with more than 500-ng DNA were included. The DNA from FFPE samples was subjected to Illumina FFPE QC Kit for quality control (QC) checks, as per the manufacturer's protocol. All DNA that passed the preliminary QC steps was then processed for bisulfite conversion with 500 ng of the isolated DNA from each sample using the EZ DNA Methylation Kit (Zymo Research) as described previously by the manufacturer. Only the FFPE-isolated DNA was further restored with the Illumina Infinium HD FFPE Restoration kit as per the protocol (12) before being processed for HM850K as per the manufacturer's instructions (13).

Genome-wide DNAme analysis

Infinium MethylationEPIC BeadChip microarray (HM850K; Illumina; ref. 13) was used to determine the genome-wide DNAme profile at single-base resolution covering over 850,000 CpG sites. Raw data files were preprocessed and normalized using the “minfi” Bioconductor package in R programming language with the R-Studio (14). The CpG sites exhibiting low detection P value (detection P > 0.05) or missing data for >10% of samples were excluded. Similarly, samples with missing data or overall low confidence for >10% of CpG sites were also removed from the analysis. Probes with single-nucleotide polymorphisms (15), crossreactive probes (16), and probes from X and Y chromosomes were also excluded. Finally, a total of 706,476 probes were retained for downstream analysis. Each CpG site was assigned with a specific β value that is defined as the ratio of signal intensities between methylated (M) and unmethylated (U) probe that is β = M/(M + U). The β value ranges from 0 to 1, with 0 being unmethylated, and 1 fully methylated. In some analyses, β values were converted to M values that are calculated as log (M/U) for normalization (17, 18). Batch correction and adjustment for potential confounding variables (age, sample type, country, etc.) and other unknown sources of variation were performed using the “SVA” package implemented in R (19). The additional raw datasets generated in this study are available in the Gene Expression Omnibus (GEO) repository, with the accession number GSE164083.

Differential methylation analysis and statistical methods

The analysis consists of two sequential elements starting with SVA followed by robust regression analysis while controlling for multiple testing. The “SVA” package corrects known and unknown surrogate variables, including batch effects. For analysis, 6 known (batch, sentrix-position, country, sample type, tobacco, alcohol) and 17 unknown significant variables were corrected by SVA. Robust linear regression was used to find differential methylation between tumor and normal tissues after adjusting for surrogate variables using the “Limma” R package (20). In addition, variables such as batch, country, sample type, age, and gender were included in the regression model for further adjustment during the regression analysis. We set a stringent filter with 30% Δβ (at least 30% difference in methylation) to identify differentially methylated positions (DMP) after correcting for multiple testing with FDR <0.05 (21). The differential methylated regions (DMR) were obtained using the DMRcate package by assuming ≥2 CpGs within a 500-bp window between tumor and normal (22). Furthermore, to rank the DMRs, we assigned each DMR a score (SDMR) as:

formula

where mΔβ is the absolute mean beta difference of CpGs in a DMR between tumor and normal; −log10P is the log-transformed P value of difference and NCpG is the number of CpGs in each DMR.

Gene expression analysis from online databases

We used normalized RNA-sequencing (RNA-seq) expression data from 96 ESCC tumors and 509 normal esophageal samples from the TCGA and the GTEx projects, respectively, using a standard processing pipeline used in RNAseqDB (23) that processes and unifies RNA-seq data from both platforms after uniform realignment, gene expression quantification, and batch effect removal. Briefly, the pipeline uses STAR to align sequencing reads, RSEM, and Feature Count to quantify gene expression, mRIN to evaluate sample degradation, RSeQC to measure sample staidness and quality, and SVAseq for batch correction.

For differential gene expression (DGE) between TCGA and GTEx, we limited our analysis to include only the genes associated with significant DMRs in the methylation analysis. We used linear models to quantify the difference in expression after correcting for multiple testing.

We assigned each gene a differential expression score (SDGE) taking into account both the magnitude and significance of the difference as:

formula

where AbsFC is the absolute fold change between tumor and normal, and −log10P is the log-transformed P value of difference. We also downloaded RNA-seq expression data across all sites from GTEx and compared them with the esophageal RNA-seq data from GTEx to identify tissue-specific expression. For each of the selected genes, a tissue-specific expression score (STSE) was calculated simply as

formula

For weighting DMR statistics, we calculated an expression score (ES) by the linear combination of differential expression and a tissue-specific ES for each gene as:

formula

Prioritization of targets: criteria and selection of top targets

To prioritize targets and genes with probable functional relevance for replication and expression studies, we ranked the targets by SDMR scores weighted by their respective ES as:

formula

Targeted methylation by pyrosequencing

The selected top three genes were then subjected to targeted DNAme replication using pyrosequencing on an additional 132 tumors and 36 NAT samples (Supplementary Table S2). The 100 ng of DNA samples were treated with sodium bisulfite using the EZ-96 DNA Methylation-Gold Kit (Zymo Research) as per the manufacture's protocol. The bisulfite-treated DNA was then processed as per the standard protocol described previously (24). The primer details for the pyrosequencing are provided in Supplementary Table S3.

TCGA methylation analysis

Infinium Human Methylation 450K BeadChip (HM450K) IDAT files and clinical data files for 96 ESCC tumors and 13 NAT (3 paired with tumor, 10 unpaired) were downloaded from the GDC data portal (NIH) using the TCGAbiolinks R package (23). From 96 tumors, 5 samples were excluded due to missing clinical data. The IDAT files were subjected to preprocessing, normalization, and QC steps similar to the discovery set HM850K files. Methylation β values of tumor and normal were compared using an unpaired t test for each of the selected CpGs for replication. We performed two regression analyses of TCGA HM450K data. The first analysis was conducted on the tumor (n = 91) versus NAT (n = 13) to determine tumor-specific DNAme. The second analysis was performed between early-stage ESCC cases consisting of stage I and II patients (n = 62) comparing with NAT (n = 13).

Gene-specific qPCR for functional validation of the targets

From the discovery set of samples, RNA was also isolated using the AllPrep DNA/RNA Mini Kit (Qiagen), and 56 tumors and 29 NAT samples gave decent RNA yield for further use. The inclusion criteria were the same for each set of samples and the isolated RNA was used to validate the impact of DNAme on gene expression (Supplementary Table S4). The RNA was assessed using a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific). 500 ng of RNA was then used to synthesize cDNA using an M-MLV Reverse Transcriptase kit (Promega Corporation) as per the mentioned protocol. The cDNA was used to measure the gene expression of the top three genes (primer details in Supplementary Table S5) relative to GAPDH (housekeeping gene) as per the standard protocol described previously (25).

Marker assessment to discriminate ESCC from normal tissue

To assess the potential of the top three prioritized DMRs as markers for ESCC, partial least square-discriminant analysis (PLS-DA) was performed to predict tumor or NAT tissue based on beta values of the 48 differentially methylated CpGs from these genes. PLS is a supervised process as it uses the independent beta values from DNAme and the dependent variables as the outcome (here tumor and NAT; ref. 26). The discriminant analysis determines the best combination of the CpGs, and the final values are chosen based on the least error rate and lowest number of predictors (CpGs) possible. The accuracy of performance of the identified predictor CpGs was estimated by plotting receiver-operating characteristic curves (ROC). Then the performance was validated with TCGA samples by plotting additional ROC curves.

Patient characteristics

The study design overview and sample characteristics are shown in Fig. 1. The mean age of patients with ESCC was 54.93 years (ranging from 25 to 90) in the discovery set and 60.64 years (ranging from 29 to 87) in the replication set for pyrosequencing. There were 47% male and 53% female ESCC cases in the discovery set and 63% males and 37% females in the replication set. A total of 54% and 50% of the cases from the discovery set were tobacco and alcohol consumers, respectively. Moreover, 59% and 47% of ESCC cases in the replication set consumed tobacco and alcohol, respectively. The details of the patient characteristics are shown in Supplementary Table S2. This study was approved by the Institutional Review Boards of all local collaborating centers and the IEC.

Identifying tumor-specific DMPs

A total of 6,796 tumor-specific DMPs spanning 2,737 unique genes were identified with a 30% Δβ DNAme difference (FDR <0.05) between tumors and NATs (Fig. 2A). Principal component analysis (PCA) was performed and plotted on the basis of DNAme status of the identified DMPs separated tumors from NATs, with tumors being more heterogeneous. No country or population-specific clustering was seen in the PCA plot using the identified DMPs (Fig. 2B). There was an increased number of hypermethylated CpGs in 1500 bp upstream of the transcription start sites and gene body, whereas intergenic regions, particularly open-sea regions, had more hypomethylated CpGs (Fig. 2C and D). The distribution of DMPs was relatively uniform across all autosomes with 5,618 hypermethylated CpGs (82.6%) and 1,178 (17.3%) hypomethylated CpGs in tumors compared with NATs (Fig. 2E; Supplementary Table S6). Unsupervised hierarchal clustering with the significant DMPs separated tumors from NAT markedly well with little misclassification (Fig. 2F). Among the 2,737 differentially methylated genes, there were several known tumor suppressors or oncogenes, including ADMTS9, ADMTS18, RNASET2, EPAS1, FHIT, RUNX3, RASSF1, ZNF382, FGFR1, KDM2A, etc. (Supplementary Table S6). Top hypermethylated CpGs were cg10752421 (SLC7A1, Δβ = 54%, P = 1.8 × 10−43), cg19126169 (NUMA1, Δβ = 52%, P = 4.8 × 10−48), cg04415798 (PAX9, Δβ = 54%, P = 1.8 × 10−43), cg11634930 (MKNK2, Δβ = 52%, P = 5.7 × 10−58). Enrichment analysis of the genes containing differentially methylated CpGs identified several important functional pathways and ontologies. The hypermethylated CpG–containing genes were enriched for several cancer-related pathways in Kyoto Encyclopedia of Genes and Genomes (KEGG), such as pathways in cancer, hippo signaling pathways, and WNT signaling pathways among others (Fig. 2G; Supplementary Table S7). Gene ontology for the molecular function was enriched particularly for DNA binding and transcription activity and regulation functions that might contribute to gene expression alterations. The hypomethylated CpG–containing genes were enriched for the nervous system, neurotransmitter, and addiction-associated pathways like circadian entrainment, dopaminergic synapse, cholinergic synapse, morphine, and amphetamine addiction pathways.

Figure 2.

Tumor-specific DMPs. A, Circos plot showing the hyper- (purple) and hypomethylated (gold) DMPs across the autosomes. B, PCA performed using all DMPs. Samples from different countries are shown in various colors. BR, Brazil; ET, Ethiopia; IR, Iran; KN, Kenya; ML, Malawi; SA, South Africa; SD, Sudan. Tumors (T) and NAT (N) are shown as triangle and round-dot shapes, respectively. C, Genomic distribution of DMPs. D, CpG island distribution of DMPs. E, Volcano plot representing DMPs. The significantly hypermethylated probes are shown in purple, whereas hypomethylated probes are shown in gold. F, Heatmap based on identified tumor-specific DMPs. The row represents individual probes, and the column represents individual samples. Red, tumor; green, NAT samples. The color in the heatmap represents the methylation level of the genes. Purple, hypermethylated probes; gold, hypomethylated probes. G, Pathway enrichment analysis of the hyper- and hypomethylated DMPs.

Figure 2.

Tumor-specific DMPs. A, Circos plot showing the hyper- (purple) and hypomethylated (gold) DMPs across the autosomes. B, PCA performed using all DMPs. Samples from different countries are shown in various colors. BR, Brazil; ET, Ethiopia; IR, Iran; KN, Kenya; ML, Malawi; SA, South Africa; SD, Sudan. Tumors (T) and NAT (N) are shown as triangle and round-dot shapes, respectively. C, Genomic distribution of DMPs. D, CpG island distribution of DMPs. E, Volcano plot representing DMPs. The significantly hypermethylated probes are shown in purple, whereas hypomethylated probes are shown in gold. F, Heatmap based on identified tumor-specific DMPs. The row represents individual probes, and the column represents individual samples. Red, tumor; green, NAT samples. The color in the heatmap represents the methylation level of the genes. Purple, hypermethylated probes; gold, hypomethylated probes. G, Pathway enrichment analysis of the hyper- and hypomethylated DMPs.

Close modal

Identifying tumor-specific DMRs

Genome-wide differential methylation analysis in specific regions (DMRs) with at least 2 CpGs within the 500-bp window and 30% Δβ (FDR <0.05) between tumor and normal tissues identified 866 DMRs in autosomes. Hypermethylated DMRs vastly outnumbered the hypomethylated DMRs with 788 (91%) hypermethylated and 78 (9%) hypomethylated DMRs in tumors compared with NATs (Fig. 3A). A PCA was plotted using DNAme status of the identified CpGs from the DMR list separated tumors from NATs, with tumors being more heterogeneous. No country or population-specific clustering was seen in the PCA plot (Supplementary Fig. S1). A total of 45 (5%) DMRs consisted of 6 or more CpGs within the region up to a maximum of 21 CpGs, 244 (28%) consisted of 3 to 5 CpGs, and the rest contained at least 2 CpGs in the DMR regions (Supplementary Table S8). Among the 866 DMRs, 735 were associated with gene loci and involved 634 unique genes (564 were unique gene–DMR pairs and 70 genes contained 171 DMRs) and 131 nongenic DMRs failed to map to any gene (Supplementary Table S8). Because around 67% of the DMRs consisted of 2 CpGs with 30% Δβ, we tested the possibilities of the presence of other differentially methylated CpGs in these DMRs by reducing the methylation difference to 10% and increasing the number of CpGs to ≥5 (Supplementary Table S9). We found 58% of the DMRs with 2 CpGs and 30% Δβ overlap with the DMRs when the Δβ is reduced to 10% and the number of CpGs was increased to ≥5 CpGs (Supplementary Fig. S2). DNA binding and transcription activator activity pathway was significantly enriched among the DMR genes (Fig. 3B) and approximately 50% to 60% of the hypo- and hypermethylated DMRs were in promoters and enhancers (Fig. 3C). The top 4 DMRs ranked by DMR score (SDMR) were PAX9 (mean Δβ = 41%; P value = 1.83 X−300) and SIM2 (mean Δβ = 40%; P value = 1.83 X−300), having 21 CpGs each, followed by MEIS1 (mean Δβ = 44%; P value = 1.83 X−300) involving 18 CpGs, a microRNA gene MIR23B (mean Δβ = 42%; P value = 1.83 X−300) and stretch of 555-bp long nongenic region on chromosome 6 with 12 CpGs (Supplementary Table S8).

Figure 3.

Tumor-specific DMRs. A, Circos plot of the tumor-specific DMR distribution. B, Pathway enrichment analysis using DMR-associated genes. C, Genomic distribution of the tumor-specific DMRs.

Figure 3.

Tumor-specific DMRs. A, Circos plot of the tumor-specific DMR distribution. B, Pathway enrichment analysis using DMR-associated genes. C, Genomic distribution of the tumor-specific DMRs.

Close modal

Tumor-specific DMRs from TCGA HM450K data analysis

We next evaluated the overall overlap of the DMRs identified in our discovery set with the tumor-specific DMRs identified from 96 ESCC and 13 NAT samples available in TCGA. Although limited by much lesser genome coverage in HM450K array data with almost half the number of CpGs from HM850K arrays and lesser NAT samples for comparison, we identified 5,063 DMRs spanning 2,935 unique genes (Supplementary Table S10). Comparison of these 2,935 genes with 634 DMR-associated genes generated from our discovery set revealed a significant overlap (323/634, 51%), indicating that more than half of these DMR-associated genes are frequently aberrantly methylated across different populations and largely consistent across HM450K and HM850K array datasets (Supplementary Fig. S3). Because the probes on the HM850K array cover >90% of the sites on the HM450, it is less likely that the nonoverlapping probes are due to the differences in the CpG probes in both the arrays (27).

To predict whether our identified tumor-specific DNAme are early events during the development of ESCC, we compared early-stage ESCC (stage I and II; n = 62) versus NAT (n = 13) samples from TCGA. A total of 1,647 DMRs spanning 1,038 unique genes were observed between early-stage tumors and NAT, of which, 165 (∼26%) were in common with previously identified DMRs from our discovery set (866 DMRs spanning 634 genes; Supplementary Table S11). Therefore, these 165 of the tumor-specific DMRs obtained in the discovery phase could be early events that might play a crucial role in ESCC carcinogenesis (Supplementary Fig. S4A and S4B).

These results provide convincing evidence of frequent and robust DNAme alterations in the esophageal cancer tissues in comparison with the normal mucosa that is consistent across tumors from various populations. Because almost one fourth of these alterations were found in early stages after comparing with TCGA early-stage ESCC DMRs, these DNAme alterations seem to be crucial in the initiation of ESCC development.

Evaluation of the impact of differential methylation on associated gene expression

To understand the functional impact of the differentially methylated genes in ESCC, we next studied the expression of these genes in ESCC and normal esophageal tissues from TCGA and GTEx, respectively. The expression for 559/634 genes was available from TCGA and GTEx and Fig. 4A depicts average methylation of DMRs associated with expression of these genes, differential and tissue-specific expressions, and the weighting scores for each of them. A total of 430 genes were differentially expressed between tumor and normal esophageal tissues (Fig. 4A; Supplementary Table S12). A total of 334 top DMR genes exhibited an inverse correlation with expression patterns, that is, genes having hypermethylation in tumors when compared with normal tissues were downregulated in tumor tissues and hypomethylated genes having upregulation in tumors (Supplementary Table S13 and Supplementary Fig. S5). Further comparing tissue-specific expression of genes between esophageal tissues and tissues from other body sites in GTEx, 56 genes had at least a 2-fold higher expression in esophageal mucosa as compared with other sites. Among the top genes with esophagus-specific expression were IL1RN, PAX9, KRT32, SIM2, and TRIM29 (Supplementary Table S14). DNAme and altered RNA expression in the exclusively or highly expressing esophageal genes suggest the crucial role of tumor-specific DNAme patterns in ESCC carcinogenesis. Furthermore, SLC9A3, HOXA13, EPN3, THSD4, and PHYHD1 were among the top genes ranked by the ES, a measure for combining the effects of differential expression and tissue-specific expression for each gene (Fig. 4A; Supplementary Table S15).

Figure 4.

Comparison of DMR genes with RNA-seq data from TCGA-GTEx and target prioritization. A, Heatmap with DMR from discovery set analysis of HM850K data correlated with gene expression from TCGA-GTEx–derived RNA-seq data. B, Mean DNAme values from discovery set, mean RNA expression from TCGA-GTEx–derived RNA-seq data, and esophagus-specific RNA expression of the three selected genes (PAX9, SIM2, and THSD4) for validation. CS, combined score; DEG, differentially expressed genes; ES (expression score), SDGE +STSE; FPKM, fragments per kilobase million reads; SDMR (Score DMR), mean Δβ × −log10P × number of CpGs; TSE, tissue-specific expression, here esophagus.

Figure 4.

Comparison of DMR genes with RNA-seq data from TCGA-GTEx and target prioritization. A, Heatmap with DMR from discovery set analysis of HM850K data correlated with gene expression from TCGA-GTEx–derived RNA-seq data. B, Mean DNAme values from discovery set, mean RNA expression from TCGA-GTEx–derived RNA-seq data, and esophagus-specific RNA expression of the three selected genes (PAX9, SIM2, and THSD4) for validation. CS, combined score; DEG, differentially expressed genes; ES (expression score), SDGE +STSE; FPKM, fragments per kilobase million reads; SDMR (Score DMR), mean Δβ × −log10P × number of CpGs; TSE, tissue-specific expression, here esophagus.

Close modal

Prioritization and replication of candidate genes

We next prioritized candidate genes for replication as well as for understanding functional significance in terms of expression changes associated with DNA methylation deregulation. To this end, we combined gene-based methylation statistics (SDMR), information regarding tissue-specific expression patterns, and differential expression between tumor and normal tissues (ES) in a score (CS; Fig. 4A; Supplementary Table S15). On the basis of the score, the top three genes were SIM2, PAX9, and THSD4, and the top 20 prioritized gene list is shown in Table 1. From the list, we selected the top three target DMRs that were hypermethylated in tumors and downregulated in TCGA ESCC tumors as compared with normal esophageal tissues from GTEx. SIM2 had a mean methylation difference of 41% (P < 1.83 × 10−300) between tumor and NAT tissues, around 11-fold lower expression (P = 2.83 × 10−68) in the tumor as compared with normal tissues and 24.6-fold higher expression in esophageal tissues as compared with other organs. Similarly, PAX9 and THSD4 had, respectively, 40% (P < 1.83 × 10−300) and 38% (P = 8.35 × 10−246) higher methylation, 3.6-fold (P = 2.32 × 10−16) and 21.4-fold (P = 1.33 × 10−112) lower expression in tumors with very high esophagus-specific expression patterns (PAX9: 93.2-fold and THSD4: 6.1-fold higher expression in esophageal tissues compared with other organs; Fig. 4B).

Table 1.

List of top 20 DMR-associated genes based on combined score after the target prioritization of DMRs.

GeneCombined score (CS)hg19 genomic coordinatesWidth (bp)Probes (N)Mean ΔβP value
SIM2a 688,878.52 chr21:38076709:38083099 6,391 21 0.40 1.83 × 10−300 
PAX9 329,274.46 chr14:37125531:37130313 4,783 21 0.41 1.83 × 10−300 
THSD4 293,401.57 chr15:71838235:71840098 1,864 0.39 8.35 × 10−246 
HOXA3a 172,195.13 chr7:27160276:27162766 2,491 12 0.37 1.83 × 10−300 
PHYHD1 145,078.68 chr9:131683857:131684923 1,067 0.39 9.63 × 10−176 
GPT 129,918.28 chr8:145728138:145728543 406 0.41 1.09 × 10−217 
HOXB3 129,143.23 chr17:46627848:46629804 1,957 0.41 1.01 × 10−273 
KCNJ15 124,430.48 chr21:39643353:39644262 910 0.35 4.71 × 10−188 
TP53AIP1 121,633.01 chr11:128812462:128813688 1,227 0.40 1.83 × 10−300 
NFIXa 117,796.80 chr19:13135318:13135808 491 10 0.40 1.83 × 10−300 
EN1 70,812.55 chr2:119606302:119607885 1,584 0.39 1.25 × 10−189 
FAM83F 68,656.14 chr22:40417285:40417869 585 0.40 2.78 × 10−204 
SHOX2 62,780.05 chr3:157821536:157821994 459 0.38 2.90 × 10−136 
LAMB4 52,733.81 chr7:107771104:107771214 111 0.38 8.75 × 10−112 
ITPRIP 51,282.25 chr10:106093778:106094976 1,199 0.38 4.22 × 10−266 
EXPH5 49,297.09 chr11:108422663:108423292 630 0.37 3.12 × 10−172 
ZNF471 45,717.96 chr19:57019005:57019279 275 0.41 2.13 × 10−133 
MKNK2 42,887.85 chr19:2046085:2046350 266 0.52 7.67 × 10−190 
SPARC 37,744.97 chr5:151066460:151066486 27 0.39 3.71 × 10−103 
NDST1 37,215.63 chr5:149887039:149887716 678 0.35 2.36 × 10−189 
GeneCombined score (CS)hg19 genomic coordinatesWidth (bp)Probes (N)Mean ΔβP value
SIM2a 688,878.52 chr21:38076709:38083099 6,391 21 0.40 1.83 × 10−300 
PAX9 329,274.46 chr14:37125531:37130313 4,783 21 0.41 1.83 × 10−300 
THSD4 293,401.57 chr15:71838235:71840098 1,864 0.39 8.35 × 10−246 
HOXA3a 172,195.13 chr7:27160276:27162766 2,491 12 0.37 1.83 × 10−300 
PHYHD1 145,078.68 chr9:131683857:131684923 1,067 0.39 9.63 × 10−176 
GPT 129,918.28 chr8:145728138:145728543 406 0.41 1.09 × 10−217 
HOXB3 129,143.23 chr17:46627848:46629804 1,957 0.41 1.01 × 10−273 
KCNJ15 124,430.48 chr21:39643353:39644262 910 0.35 4.71 × 10−188 
TP53AIP1 121,633.01 chr11:128812462:128813688 1,227 0.40 1.83 × 10−300 
NFIXa 117,796.80 chr19:13135318:13135808 491 10 0.40 1.83 × 10−300 
EN1 70,812.55 chr2:119606302:119607885 1,584 0.39 1.25 × 10−189 
FAM83F 68,656.14 chr22:40417285:40417869 585 0.40 2.78 × 10−204 
SHOX2 62,780.05 chr3:157821536:157821994 459 0.38 2.90 × 10−136 
LAMB4 52,733.81 chr7:107771104:107771214 111 0.38 8.75 × 10−112 
ITPRIP 51,282.25 chr10:106093778:106094976 1,199 0.38 4.22 × 10−266 
EXPH5 49,297.09 chr11:108422663:108423292 630 0.37 3.12 × 10−172 
ZNF471 45,717.96 chr19:57019005:57019279 275 0.41 2.13 × 10−133 
MKNK2 42,887.85 chr19:2046085:2046350 266 0.52 7.67 × 10−190 
SPARC 37,744.97 chr5:151066460:151066486 27 0.39 3.71 × 10−103 
NDST1 37,215.63 chr5:149887039:149887716 678 0.35 2.36 × 10−189 

Note: Δβ (greater than or equal to) 30% was the cutoff for the DMR analysis [FDR (less than or equal to) 0.05] using DMRcate R package. Mean Δβ for each DMR is the mean across all probes within the DMR. Δβ values were calculated as the difference between the mean β for tumor and NAT.

aTwo DMRs of the same gene were present in the top 20 list of DMRs (Supplementary Table S14). Only the first DMR based on the combined score is included in the table.

To validate the results obtained from the discovery phase analysis of the HM850K array, we performed pyrosequencing analysis on the 3 prioritized genes (PAX9, SIM2, and THSD4) and obtained DNAme status at the target region in an independent set of 168 samples (132 tumors and 36 NAT). All three genes showed significantly higher DNAme in tumors compared with the NAT (Fig. 5A), which further reinforced the robustness of the results generated in the discovery phase. The results showed a 20% to 60% DNAme difference across the CpGs within the pyrosequenced region of all three genes (Supplementary Fig. S6A–S6C). The two CpGs (cg00762160 from HM850K array) in the PAX9 region showed at least 20% higher methylation in tumors compared with NAT (P = 0.0008). Moreover, the DNAme difference in the 5 CpGs (cg21697851 from HM850K) of SIM2 was around 30% (P = 3.38 × 10−08) that is almost identical to the beta difference obtained from the discovery phase analysis (Fig. 5B). Results obtained from the pyrosequencing of the THSD4 region with two CpGs (cg05337779 from HM850K) showed somewhat higher DNAme levels (53% in replication vs. 38% in the discovery phase) between tumor and NAT (P = 2.49 × 10−26).

Figure 5.

Replication of the three target genes. A, Mean beta value of the CpGs within the DMR region of PAX9, SIM2, and THSD4. B, Replication of the target CpG by pyrosequencing PAX9 (CpG ID: cg00762160), SIM2 (CpG ID: cg2169785), and THSD4 (CpG ID: cg05337779). C, RNA expression of PAX9, SIM2, and THSD4 normalized to GAPDH (housekeeping gene).

Figure 5.

Replication of the three target genes. A, Mean beta value of the CpGs within the DMR region of PAX9, SIM2, and THSD4. B, Replication of the target CpG by pyrosequencing PAX9 (CpG ID: cg00762160), SIM2 (CpG ID: cg2169785), and THSD4 (CpG ID: cg05337779). C, RNA expression of PAX9, SIM2, and THSD4 normalized to GAPDH (housekeeping gene).

Close modal

We further validated the gene expression results for these 3 selected genes by qPCR (Fig. 5C). The results showed a significantly lower PAX9 (−2.7-fold; P = 0.0006) and SIM2 (−11.14; P < 0.0001) expression in ESCC tumor (n = 56) than NAT (n = 29) samples. The other gene THSD4 also followed the same trend of higher expression in NAT although the difference in expression was marginally significant (−2.2-fold; P = 0.07).

Performance assessment of the prioritized DMRs as potential markers

To identify the best combinations of CpG markers from the three prioritized DMR genes consisting of 48 CpGs, we explored all combinations of CpGs that could have discriminatory power to distinguish between tumor and NAT samples using PLS-DA. Seven CpGs showed the best predictive value in our models (CpG IDs) using the discovery set samples with AUC value 0.98 [95% confidence interval (CI), 0.97–0.99; P < 0.00001] shown in Fig. 6A and B. To test the effectiveness of these potential markers in another independent set, we tested this in TCGA samples. There were 5 CpGs (CpG IDs) common in the HM450K array (used in TCGA) that were used in the analysis (Supplementary Table S16). Because stage information was available in TCGA, we performed one analysis, including stage I and II samples (n = 62) versus normal (n = 13) to assess the effectiveness of the markers in early-stage tumors and the other analysis with all stage tumors (n = 91) versus NAT (n = 13). The analysis with TCGA early-stage and TCGA all-stage tumors showed AUC 0.90 (95% CI, 0.80–1.0; P < 0.00001) and 0.89 (95% CI, 0.79–1.0; P < 0.00001), respectively (Fig. 6CE). There was no difference in DNAme of the marker panel by age, gender, and country of origin in both the datasets (Supplementary Fig. S7A–S7E).

Figure 6.

Specificity and sensitivity of CpG panel as markers for ESCC. A, Mean methylation of the proposed 7-CpG marker in tumor and NAT from discovery set. B, ROC curve of the 7-CpG marker from discovery set: 108 tumors versus 51 NAT. C, Mean methylation of the proposed 5-CpG (common in HM450K) marker in TCGA early-stage and all-stage versus NAT. D, ROC curve of the 5-CpG marker from TCGA early-stage: 62 tumors versus 13 NAT. E, ROC curve of the 5-CpG marker from TCGA-all stage: 91 tumors versus 13 NAT.

Figure 6.

Specificity and sensitivity of CpG panel as markers for ESCC. A, Mean methylation of the proposed 7-CpG marker in tumor and NAT from discovery set. B, ROC curve of the 7-CpG marker from discovery set: 108 tumors versus 51 NAT. C, Mean methylation of the proposed 5-CpG (common in HM450K) marker in TCGA early-stage and all-stage versus NAT. D, ROC curve of the 5-CpG marker from TCGA early-stage: 62 tumors versus 13 NAT. E, ROC curve of the 5-CpG marker from TCGA-all stage: 91 tumors versus 13 NAT.

Close modal

In the present study, we characterized the genome-wide methylation profiles of a large and unique set of ESCC tumors from patients living in high incidence, underrepresented populations. This was combined with gene expression data from ESCC tumor and normal samples from TCGA and GTEx and targeted validation in independent series of samples, to identify functionally relevant DNAme events and associated genes and pathways involved in ESCC carcinogenesis. We found widespread DNA methylation changes and highlighted frequent, early, and functionally relevant novel differentially methylated target genes consistent across populations that might be potential drivers involved in the ESCC development and progression.

We observed extensive hypermethylation across ESCC genomes, particularly at the promoters and within the gene body, and little hypomethylation changes that were mainly enriched in the intergenic regions and open sea regions of the genome. This differential CpG methylation change according to the genomic location is consistent with prevalent literature in several cancers (28, 29). The genes mapped to the differentially methylated CpGs included genes with known oncogenic or tumor-suppressor functions. For example, promoter methylation of RASSF1, FHIT, ADMTS1, ADMTS18, ZNF382, and RUNX3 has been previously implicated in ESCC (30). As such, several pathways important for cancer development like WNT and hippo signaling pathways, inflammatory, and cell communication pathways were enriched. These are in line with our previous study, including ESCC tumor versus NAT pairs from Brazil using one of the earliest high-throughput Illumina arrays with limited coverage of around 1500 CpGs (6) as well as two other studies exploring ESCC tumor DNAme using HM450K arrays with small sample sizes (n ≤ 10) from specific Asian population that found similar enrichment of the WNT signaling pathway (7, 31).

In contrast with most of the previous studies that identified methylation of candidate genes associated with single CpGs in ESCC (32), we focused on differentially methylated regions associated genes and identified a number of DMR genes across the cancer genome. By focusing on DMRs, we intended to identify methylation events that might have functional relevance in gene expression regulation. In addition to genic DMRs in promoters and enhancers, our study also found substantial methylation in introns and gene body supporting the presence and regulatory functions of gene body DNAme in many cancers (33). As such, several pathways involved in proximal promoter sequence-specific DNA binding and transcription activator activity were enriched among the DMRs. The only study reporting DMRs in ESCC found similar enrichments of gene body methylation and transcription regulation, though using sequencing techniques in only 4 tumors and NAT pairs (34). In addition, more than half of the tumor-specific DMRs from the discovery set replicated in TCGA DMRs despite the different population and methylation array type (HM450K array). Around 26% of the tumor-specific DMRs were found to be early events as identified through comparison with DMRs identified from early-stage tumors of TCGA. These findings are consistent with the DNAme events acting as early aberrations in ESCC development, although further studies are needed to substantiate this notion. The robustness and reproducibility of our identified tumor-specific DNAme events also represent attractive avenues for the development of assays targeting these methylation-based markers for improved early detection of ECSCC.

Because our study identified tumor-specific DNAme alterations consistently across all tumors and populations, we reasoned that they may play a role in the ESCC development process. To further explore this possibility, we investigated the expression patterns of the DMR genes in silico from tumor and normal esophageal tissues. Among the DMR genes, 88% were differentially expressed between normal and tumor tissues and 60% showed directionally opposite expression patterns compared with methylation changes, supporting the notion of an inverse correlation of DNAme and gene expression (35, 36). These genes are likely to be functionally important for esophageal cells and disruption of regulatory control by aberrant DNA methylation might act as driver events in the malignant transformation of these cells. Therefore, to identify functionally relevant methylation events, we weighted the DMRs with their corresponding expression patterns and prioritized genes for driver identification and future functional characterization. SIM2, PAX9, and THSD4 topped the list of prioritized genes, and several novel genes that were not previously reported in ESCC such as THSD4, PHYHD1, GPT, KCNJ15, and TP53AIP1 were among the top 10 prioritized DMR-associated genes. To further support these findings, we performed qPCR on candidate genes to check whether the results are concordant with the in silico analysis. Among these DMR-associated differentially expressed genes, we validated three genes PAX9, SIM2, and THSD4, showing a consistent trend of RNA expression as generated from GTEx RNA-seq data. Many of these genes also showed esophagus-specific expression, including some of the top DMR genes such as SIM2 and PAX9.

DNAme patterns of the top three target genes were successfully replicated in an independent set of ESCC samples from Brazil, Portugal, Iran, Ethiopia, Kenya, and India. Our top DMR and one of the target genes is PAX9, the transcription factor well known for its role in tooth agenesis or hypodontia (37), and several studies suggest that tooth agenesis could be linked to cancer susceptibility (38, 39). Our findings of hypermethylation and downregulation of PAX9 in ESCC as well as its exclusive expression in esophageal mucosa, coupled with previous reports of downregulation of PAX9 in esophageal adenocarcinoma, Barrett's esophagus (40), and head and neck cancers suggest an important role in regulating squamous cell differentiation in the oro-esophageal epithelium. Therefore, epigenetic silencing of this gene could be a frequent event in upper aerodigestive tract cancers contributing to oro-esophageal carcinogenesis (39, 41, 42). Similarly, SIM2 is also a transcription factor from the family of the basic helix–loop–helix–PER–ARNT–SIM. Supporting our findings, DNAme-associated suppression of SIM2 expression in ESCC is considered as a frequent event (43) and reported in a previous study in as many as 90% of the ESCC cases (n = 60). SIM2 downregulation is also involved in chemoradiotherapy sensitivity in ESCC, suggesting this gene as a possible therapeutic target (43, 44). It is noteworthy that one study identified four significantly differentially expressed genes in ESCC compared with NAT, among which, two of them were SIM2 and THSD4 underscoring their roles in tumorigenesis (45). Although the functional role of THSD4 in ESCC is not known, it is a member of the extracellular calcium–binding protein family involved in cell adhesion and migration that was previously linked to tumorigenesis of hemangioblastoma (46) and neoadjuvant chemoradiotherapy response in rectal cancer (47). Other top novel genes PHYHD1 and GPT were not previously linked to carcinogenesis. However, loss of KCNJ15 expression in renal carcinoma promoted malignant phenotypes and associated with poorer prognosis (48), and loss of TP53AIP1 expression was associated with the inhibition of cancer cell apoptosis in lung adenocarcinoma (49). All these studies on SIM2, PAX9, THSD4, KCNJ15, and TP53AIP1 suggested their alteration in cancer is a common event that might be regulated by aberrant DNAme. Therefore, upstream and downstream signaling pathways of these genes may provide potential therapeutic targets for ESCC and other related cancers.

One of the goals of this study was to discover non-random cancer-specific changes in DNAme that could discriminate ESCC tissue from normal samples. We identified a combination of 7 CpGs that could identify tumors with high sensitivity and specificity with the potential to be used as ESCC markers. The marker performance on TCGA clinical samples (AUC 0.89) is comparable with the performance on the discovery data (AUC 0.98), given that both were independent sets of tumor and normal tissues, analyzed by different microarray platforms (450K vs. 850K) and the TCGA dataset having only 5 CpGs in the marker panel. However, one limitation was the unavailability of additional independent cohorts of blood samples or other minimally invasive cell types from patients with ESCC and healthy controls for marker validation.

In conclusion, our study is the first and largest applying a methylome-profiling approach to ESCC samples from unique high-incidence populations of the world and further replicated our results with cases from other populations using TCGA data. We identified novel, robust, early, and functionally relevant tumor-specific DNAme events in ESCC across tumors. Aberrant DNAme in tumors and subsequent gene expression alterations of these genes (notably PAX9, SIM2, THSD4, and the additional genes) that are exclusively expressed or highly expressed in normal esophageal mucosa might be crucial events in the initiation of ESCC development. Our findings could serve as a reference for the functional characterization of these genes to explore deeper into their role in ESCC initiation and progression. Furthermore, the potential use of the identified CpGs as minimally invasive DNAme-based ESCC biomarkers such as those employed on minimally invasively collected samples using Cytosponge (50) or circulating tumor DNA (ctDNA) from blood, should be prioritized in future early detection efforts. Because esophageal sponge cytology sampling was shown feasible in African settings (51), the identified DNAme markers could be even tested and implemented for early detection in various high-incidence populations of ESCC in low resource settings.

S.C. Soares Lima reports grants from Conselho Nacional de Desenvolvimento Científico e Tecnológico and Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro—FAPERJ during the conduct of the study. M.L. Roux reports grants from American Cancer Society during the conduct of the study. No disclosures were reported by the other authors.

Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.

F.R. Talukdar: Conceptualization, data curation, formal analysis, validation, investigation, visualization, methodology, writing–original draft, project administration. S.C. Soares Lima: Resources, data curation, methodology. R. Khoueiry: Methodology. R.S. Laskar: Conceptualization, data curation, formal analysis, visualization. C. Cuenin: Methodology. B.P. Sorroche: Formal analysis, methodology. A.-C. Boisson: Methodology. B. Abedi-Ardekani: Methodology. C. Carreira: Methodology. D. Menya: Resources. C.P. Dzamalala: Resources. M. Assefa: Resources. A. Aseffa: Resources. V. Miranda-Gonçalves: Resources, methodology. C. Jerónimo: Resources. R.M. Henrique: Resources. R. Shakeri: Resources. R. Malekzadeh: Resources. N. Gasmelseed: Resources. M. Ellaithi: Resources. N. Gangane: Resources. D.R.S. Middleton: Resources. F. Le Calvez-Kelm: Resources. A. Ghantous: Methodology. M.L. Roux: Resources. J. Schüz: Resources, supervision. V. McCormack: Resources, data curation, supervision. M.I. Parker: Resources. L.F.R. Pinto: Resources, supervision. Z. Herceg: Conceptualization, resources, supervision, funding acquisition, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.

The work reported in this article was undertaken by F.R. Talukdar partly during the tenure of a Postdoctoral Fellowship from the International Agency for Research on Cancer (IARC), partially supported by the EC FP7 Marie Curie Actions—People—Co-funding of regional, national, and international programs (COFUND). The work in the Epigenetics Group at IARC is supported by grants from the Institut National du Cancer (INCa, France), the European Commission (EC) Seventh Framework Program (FP7) Translational Cancer Research (TRANSCAN) Framework, the Foundation ARC pour la Recherche sur le Cancer (France), Plan Cancer-Eva-Inserm research grant, La direction 1énérale de l'offre de soins (DGOS), and INSERM (SIRIC LYriCAN, INCa-DGOS-Inserm_12563; to Z. Herceg). V. Miranda-Gonçalves was supported by NORTE-01-0145-740 FEDER-000027 (ESTIMA). M.I. Parker was jointly supported by the SAMRC with funds received from the National Department of Health and the Medical Research Council (MRC) UK with funds from the UK Government's Newton Fund and GlaxoSmithKline (GSK). Kenyan case–control study was supported by the IARC and NIH/NCI (grant number R21CA191965). The Ethiopian case–control study was funded by the American Cancer Society (ACS) and IARC.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Bray
F
,
Ferlay
J
,
Soerjomataram
I
,
Siegel
RL
,
Torre
LA
,
Jemal
A
. 
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
.
CA Cancer J Clin
2018
;
68
:
394
424
.
2.
Arnold
M
,
Laversanne
M
,
Brown
LM
,
Devesa
SS
,
Bray
F
. 
Predicting the future burden of esophageal cancer by histological subtype: international trends in incidence up to 2030
.
Am J Gastroenterol
2017
;
112
:
1247
55
.
3.
McCormack
VA
,
Menya
D
,
Munishi
MO
,
Dzamalala
C
,
Gasmelseed
N
,
Leon Roux
M
, et al
Informing etiologic research priorities for squamous cell esophageal cancer in Africa: a review of setting-specific exposures to known and putative risk factors
.
Int J Cancer
2017
;
140
:
259
71
.
4.
Ma
K
,
Cao
B
,
Guo
M
. 
The detective, prognostic, and predictive value of DNA methylation in human esophageal squamous cell carcinoma
.
Clin Epigenetics
2016
;
8
:
43
.
5.
Kanwal
R
,
Gupta
S
. 
Epigenetic modifications in cancer
.
Clin Genet
2012
;
81
:
303
11
.
6.
Lima
SC
,
Hernandez-Vargas
H
,
Simao
T
,
Durand
G
,
Kruel
CD
,
Le Calvez-Kelm
F
, et al
Identification of a DNA methylome signature of esophageal squamous cell carcinoma and potential epigenetic biomarkers
.
Epigenetics
2011
;
6
:
1217
27
.
7.
Li
X
,
Zhou
F
,
Jiang
C
,
Wang
Y
,
Lu
Y
,
Yang
F
, et al
Identification of a DNA methylome profile of esophageal squamous cell carcinoma and potential plasma epigenetic biomarkers for early diagnosis
.
PLoS One
2014
;
9
:
e103162
.
8.
Cancer Genome Atlas Research Network; Analysis Working Group: Asan University; BC Cancer Agency; Brigham and Women's Hospital; Broad Institute; Brown University
, 
Integrated genomic characterization of oesophageal carcinoma
.
Nature
2017
;
541
:
169
75
.
9.
Menya
D
,
Oduor
M
,
Kigen
N
,
Maina
SK
,
Some
F
,
Kibosia
C
, et al
Cancer epidemiology fieldwork in a resource-limited setting: experience from the western Kenya ESCCAPE esophageal cancer case–control pilot study
.
Cancer Epidemiol
2018
;
57
:
45
52
.
10.
Nicolau-Neto
P
,
Da Costa
NM
,
de Souza Santos
PT
,
Gonzaga
IM
,
Ferreira
MA
,
Guaraldi
S
, et al
Esophageal squamous cell carcinoma transcriptome reveals the effect of FOXM1 on patient outcome through novel PIK3R3 mediated activation of PI3K signaling pathway
.
Oncotarget
2018
;
9
:
16634
47
.
11.
Leon
ME
,
Assefa
M
,
Kassa
E
,
Bane
A
,
Gemechu
T
,
Tilahun
Y
, et al
Qat use and esophageal cancer in Ethiopia: a pilot case–control study
.
PLoS ONE
2017
;
12
:
e0178911
.
12.
Serrano
J
,
Snuderl
M
. 
Whole genome DNA methylation analysis of human glioblastoma using Illumina BEADARRAYS
.
Methods Mol Biol
2018
;
1741
:
31
51
.
13.
Kling
T
,
Wenger
A
,
Beck
S
,
Caren
H
. 
Validation of the MethylationEPIC BeadChip for fresh-frozen and formalin-fixed paraffin-embedded tumours
.
Clin Epigenetics
2017
;
9
:
33
.
14.
Fortin
JP
,
Triche
TJ
 Jr
,
Hansen
KD
. 
Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi
.
Bioinformatics
2017
;
33
:
558
60
.
15.
Zhou
W
,
Laird
PW
,
Shen
H
. 
Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes
.
Nucleic Acids Res
2017
;
45
:
e22
.
16.
Chen
YA
,
Lemire
M
,
Choufani
S
,
Butcher
DT
,
Grafodatskaya
D
,
Zanke
BW
, et al
Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium human methylation 450 microarray
.
Epigenetics
2013
;
8
:
203
9
.
17.
Merid
SK
,
Novoloaca
A
,
Sharp
GC
,
Kupers
LK
,
Kho
AT
,
Roy
R
, et al
Epigenome-wide meta-analysis of blood DNA methylation in newborns and children identifies numerous loci related to gestational age
.
Genome Med
2020
;
12
:
25
.
18.
Du
P
,
Zhang
X
,
Huang
CC
,
Jafari
N
,
Kibbe
WA
,
Hou
L
, et al
Comparison of beta-value and M-value methods for quantifying methylation levels by microarray analysis
.
BMC Bioinformatics
2010
;
11
:
587
.
19.
Zhang
S
,
Wu
Z
,
Xie
J
,
Yang
Y
,
Wang
L
,
Qiu
H
. 
DNA methylation exploration for ARDS: a multi-omics and multi-microarray interrelated analysis
.
J Transl Med
2019
;
17
:
345
.
20.
Ritchie
ME
,
Phipson
B
,
Wu
D
,
Hu
Y
,
Law
CW
,
Shi
W
, et al
limma powers differential expression analyses for RNA-sequencing and microarray studies
.
Nucleic Acids Res
2015
;
43
:
e47
.
21.
Benjamini
Y
,
Drai
D
,
Elmer
G
,
Kafkafi
N
,
Golani
I
. 
Controlling the false discovery rate in behavior genetics research
.
Behav Brain Res
2001
;
125
:
279
84
.
22.
Peters
TJ
,
Buckley
MJ
,
Statham
AL
,
Pidsley
R
,
Samaras
K
,
V Lord
R
, et al
De novo identification of differentially methylated regions in the human genome
.
Epigenetics Chromatin
2015
;
8
:
6
.
23.
Mounir
M
,
Lucchetta
M
,
Silva
TC
,
Olsen
C
,
Bontempi
G
,
Chen
X
, et al
New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx
.
PLoS Comput Biol
2019
;
15
:
e1006701
.
24.
Busato
F
,
Dejeux
E
,
El Abdalaoui
H
,
Gut
IG
,
Tost
J
. 
Quantitative DNA methylation analysis at single-nucleotide resolution by pyrosequencing(R)
.
Methods Mol Biol
2018
;
1708
:
427
45
.
25.
Li
YY
,
Wang
K
,
Chen
LH
,
Zhu
XX
,
Zhou
J
. 
Quantification of mRNA levels using real-time polymerase chain reaction (pCR)
.
Breast Cancer
2016
;
1406
:
73
9
.
26.
Perez-Enciso
M
,
Tenenhaus
M
. 
Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach
.
Hum Genet
2003
;
112
:
581
92
.
27.
Pidsley
R
,
Zotenko
E
,
Peters
TJ
,
Lawrence
MG
,
Risbridger
GP
,
Molloy
P
, et al
Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling
.
Genome Biol
2016
;
17
:
208
.
28.
Blattler
A
,
Yao
L
,
Witt
H
,
Guo
Y
,
Nicolet
CM
,
Berman
BP
, et al
Global loss of DNA methylation uncovers intronic enhancers in genes showing expression changes
.
Genome Biol
2014
;
15
:
469
.
29.
Scala
G
,
Federico
A
,
Palumbo
D
,
Cocozza
S
,
Greco
D
. 
DNA sequence context as a marker of CpG methylation instability in normal and cancer tissues
.
Sci Rep
2020
;
10
:
1721
.
30.
Li
JS
,
Ying
JM
,
Wang
XW
,
Wang
ZH
,
Tao
Q
,
Li
LL
. 
Promoter methylation of tumor suppressor genes in esophageal squamous cell carcinoma
.
Chin J Cancer
2013
;
32
:
3
11
.
31.
Kishino
T
,
Niwa
T
,
Yamashita
S
,
Takahashi
T
,
Nakazato
H
,
Nakajima
T
, et al
Integrated analysis of DNA methylation and mutations in esophageal squamous cell carcinoma
.
Mol Carcinog
2016
;
55
:
2077
88
.
32.
Talukdar
FR
,
Ghosh
SK
,
Laskar
RS
,
Mondal
R
. 
Epigenetic, genetic and environmental interactions in esophageal squamous cell carcinoma from northeast India
.
PLoS One
2013
;
8
:
e60996
.
33.
Arechederra
M
,
Daian
F
,
Yim
A
,
Bazai
SK
,
Richelme
S
,
Dono
R
, et al
Hypermethylation of gene body CpG islands predicts high dosage of functional oncogenes in liver cancer
.
Nat Commun
2018
;
9
:
3164
.
34.
Chen
C
,
Peng
H
,
Huang
X
,
Zhao
M
,
Li
Z
,
Yin
N
, et al
Genome-wide profiling of DNA methylation and gene expression in esophageal squamous cell carcinoma
.
Oncotarget
2016
;
7
:
4507
21
.
35.
Belanger
AS
,
Tojcic
J
,
Harvey
M
,
Guillemette
C
. 
Regulation of UGT1A1 and HNF1 transcription factor gene expression by DNA methylation in colon cancer cells
.
BMC Mol Biol
2010
;
11
:
9
.
36.
Xu
W
,
Xu
M
,
Wang
L
,
Zhou
W
,
Xiang
R
,
Shi
Y
, et al
Integrative analysis of DNA methylation and gene expression identified cervical cancer-specific diagnostic biomarkers
.
Signal Transduct Target Ther
2019
;
4
:
55
.
37.
Bonczek
O
,
Balcar
VJ
,
Sery
O
. 
PAX9 gene mutations and tooth agenesis: a review
.
Clin Genet
2017
;
92
:
467
76
.
38.
Paranjyothi
MV
,
Kumaraswamy
KL
,
Begum
LF
,
Manjunath
K
,
Basheer
S
. 
Tooth agenesis: a susceptible indicator for colorectal cancer?
J Cancer Res Ther
2018
;
14
:
527
31
.
39.
Gawron-Jakubek
W
,
Spaczynska
J
,
Pitynski
K
,
Loster
BW
. 
Coexistence of tooth agenesis and ovarian cancer—a systematic literature review
.
Ginekol Pol
2019
;
90
:
707
10
.
40.
Lv
J
,
Guo
L
,
Wang
JH
,
Yan
YZ
,
Zhang
J
,
Wang
YY
, et al
Biomarker identification and trans-regulatory network analyses in esophageal adenocarcinoma and Barrett's esophagus
.
World J Gastroenterol
2019
;
25
:
233
44
.
41.
Xiong
Z
,
Ren
S
,
Chen
H
,
Liu
Y
,
Huang
C
,
Zhang
YL
, et al
PAX9 regulates squamous cell differentiation and carcinogenesis in the oro-oesophageal epithelium
.
J Pathol
2018
;
244
:
164
75
.
42.
Gerber
JK
,
Richter
T
,
Kremmer
E
,
Adamski
J
,
Hofler
H
,
Balling
R
, et al
Progressive loss of PAX9 expression correlates with increasing malignancy of dysplastic and cancerous epithelium of the human oesophagus
.
J Pathol
2002
;
197
:
293
7
.
43.
Tamaoki
M
,
Komatsuzaki
R
,
Komatsu
M
,
Minashi
K
,
Aoyagi
K
,
Nishimura
T
, et al
Multiple roles of single-minded 2 in esophageal squamous cell carcinoma and its clinical implications
.
Cancer Sci
2018
;
109
:
1121
34
.
44.
Takashima
K
,
Fujii
S
,
Komatsuzaki
R
,
Komatsu
M
,
Takahashi
M
,
Kojima
T
, et al
CD24 and CK4 are upregulated by SIM2, and are predictive biomarkers for chemoradiotherapy and surgery in esophageal cancer
.
Int J Oncol
2020
;
56
:
835
47
.
45.
Su
P
,
Wen
S
,
Zhang
Y
,
Li
Y
,
Xu
Y
,
Zhu
Y
, et al
Identification of the key genes and pathways in esophageal carcinoma
.
Gastroenterol Res Pract
2016
;
2016
:
2968106
.
46.
Ma
D
,
Yang
J
,
Wang
Y
,
Huang
X
,
Du
G
,
Zhou
L
. 
Whole exome sequencing identified genetic variations in Chinese hemangioblastoma patients
.
Am J Med Genet A
2017
;
173
:
2605
13
.
47.
Frydrych
LM
,
Ulintz
P
,
Bankhead
A
,
Sifuentes
C
,
Greenson
J
,
Maguire
L
, et al
Rectal cancer sub-clones respond differentially to neoadjuvant therapy
.
Neoplasia
2019
;
21
:
1051
62
.
48.
Liu
Y
,
Wang
H
,
Ni
B
,
Zhang
J
,
Li
S
,
Huang
Y
, et al
Loss of KCNJ15 expression promotes malignant phenotypes and correlates with poor prognosis in renal carcinoma
.
Cancer Manag Res
2019
;
11
:
1211
20
.
49.
Fang
H
,
Liu
Y
,
He
Y
,
Jiang
Y
,
Wei
Y
,
Liu
H
, et al
Extracellular vesicle delivered miR5055p, as a diagnostic biomarker of early lung adenocarcinoma, inhibits cell apoptosis by targeting TP53AIP1
.
Int J Oncol
2019
;
54
:
1821
32
.
50.
Januszewicz
W
,
Tan
WK
,
Lehovsky
K
,
Debiram-Beecham
I
,
Nuckcheddy
T
,
Moist
S
, et al
Safety and acceptability of esophageal cytosponge cell collection device in a pooled analysis of data from individual patients
.
Clin Gastroenterol Hepatol
2019
;
17
:
647
56
.
51.
Middleton
DRS
,
Mmbaga
BT
,
O'Donovan
M
,
Abedi-Ardekani
B
,
Debiram-Beecham
I
,
Nyakunga-Maro
G
, et al
Minimally invasive esophageal sponge cytology sampling is feasible in a Tanzanian community setting
.
Int J Cancer
2021
;
148
:
1208
18
.