Oral tongue squamous cell carcinomas (OTSCC) are a homogenous group of aggressive tumors in the head and neck region that spread early to lymph nodes and have a higher incidence of regional failure. In addition, there is a rising incidence of oral tongue cancer in younger populations. Studies on functional DNA methylation changes linked with altered gene expression are critical for understanding the mechanisms underlying tumor development and metastasis. Such studies also provide important insight into biomarkers linked with viral infection, tumor metastasis, and patient survival in OTSCC. Therefore, we performed genome-wide methylation analysis of tumors (N = 52) and correlated altered methylation with differential gene expression. The minimal tumor-specific DNA 5-methylcytosine signature identified genes near 16 different differentially methylated regions, which were validated using genomic data from The Cancer Genome Atlas cohort. In our cohort, hypermethylation of MIR10B was significantly associated with the differential expression of its target genes NR4A3 and BCL2L11 (P = 0.0125 and P = 0.014, respectively), which was inversely correlated with disease-free survival (P = 9E−15 and P = 2E−15, respectively) in patients. Finally, differential methylation in FUT3, TRIM5, TSPAN7, MAP3K8, RPS6KA2, SLC9A9, and NPAS3 genes was found to be predictive of certain clinical and epidemiologic parameters.

Implications: This study reveals a functional minimal methylation profile in oral tongue tumors with associated risk habits, clinical, and epidemiologic outcomes. In addition, NR4A3 downregulation and correlation with patient survival suggests a potential target for therapeutic intervention in oral tongue tumors. Data from the current study are deposited in the NCBI Geo database (accession number GSE75540). Mol Cancer Res; 14(9); 805–19. ©2016 AACR.

Head and neck squamous cell carcinomas (HNSCC) are a heterogeneous group of tumors with different incidences, mortalities, and prognosis for different subsites. HNSCCs are the sixth leading cause of cancer worldwide (1) and account for almost 30% of all cancer cases in India (2). Oral cancer is the most common subtype of head and neck cancers with a worldwide incidence of greater than 300,000 cases. The disease is an important cause of morbidity and mortality with a 5-year survival of less than 50% (1, 2). Unlike other subsites of oral cavity like gingivo-buccal, tumors originating in the anterior part of tongue or oral tongue have an increased association with younger patients (3, 4), spread early to lymph nodes (3), and have a higher regional failure (5). Tobacco and alcohol are common risk factors for this group of cancer. Although the role of human papilloma virus (HPV) is clearly understood in the prognosis of oropharyngeal tumors (6), the role of HPV as an etiologic agent and/or a prognostic marker in oral tongue squamous cell carcinoma (OTSCC) is not yet established.

In the last 5 years, studies have identified somatic mutations, insertions, deletions and amplifications, copy number alterations, loss of heterozygosity, gene expression, and DNA methylation changes for different tumor subsites in HNSCC (7–12). In addition, few studies (13, 14) have demonstrated the role of epigenetic modifications and associated pathways in HPV-positive HNSCC patients.

Epigenetic changes are known biomarkers for cancer initiation and progression (15). DNA methylation at cytosine residues [5-methylcytosine (5mC)] is one of the best studied epigenetic changes in cancer (16). Cytosine methylation alters gene expression and thus plays an important role in other biologic processes, such as embryonic development, imprinting, and X-chromosome inactivation (17, 18). Cancer development is characterized by two major DNA methylation changes, hypermethylation of CpG islands located in the promoter regions of tumor suppressor genes (making them inactive) and global hypomethylation leading to activation of oncogenes and transposons (19). By conducting genome-wide methylation studies, one can identify potential DNA methylation markers for early cancer detection. Previous studies profiling DNA methylation changes in HNSCC primarily used low-throughput assays and identified changes in DCC, DAPK1, SPEN, E-cadherin, CDKN2A, RASSF1, and MGMT genes (20–22). Pickering and colleagues (2013) studied methylation in oral squamous cell carcinomas using low-density HumanMethylation27K arrays and found 3,694 loci to be differentially methylated in tumor tissues (9). Oral cancer studies by Shaw and colleagues (2013) used the Illumina GoldenGate assay that comprises 1,505 CpG island loci and found 48 loci to be differentially methylated in tumors (23).

In this study, we profiled tumor-specific 5mC changes using high-density arrays comprising 485,512 probes in 52 pairs of OTSCCs. We identified differentially methylated loci or probes (DMP) between tumors and their matched normals and studied their functional impact by comparing the levels of differential methylation with differential gene expression in tumors. We found varying extents of differential methylation in various subregions. These differentially methylated regions (DMRs) often encompass multiple genes and were associated with differential gene expression, thus representing functional DMRs. We found aberrant methylation in a total of 27,276 hypermethylated and 21,231 hypomethylated loci with an average Δβ (level of differential methylation) of at least 0.2. We discovered a minimal methylation signature comprising 16 different DMRs that harbored GPER1, TTLL8, RHPN1, OR2T6, MIR10B, ENAH, EMX2OS, BTK, C11orf53, CENPVL1, COL9A1, DEFA1, ODF3, RARRES2, SHF, and SP6 genes with significant differential methylation (quantitative and categorical, see Materials and Methods) using a machine learning–based random forest approach (24), which were validated using methylation data (450K) from the head and neck cohort in the The Cancer Genome Atlas (TCGA) project. In our study, hypermethylation in a DMR-harboring miRNA, MIR10B, was linked to downregulation of two of its validated gene targets, NR4A3 and BCL2L11, which correlated well with disease-free survival. Furthermore, the machine-learning algorithm identified DMPs within TSPAN7, FUT3, TRIM5, SLC9A9, NPAS3, RPS6KA2, MAP3K8, TIMM8A, and RNF113A genes to be predictive of risk habits, nodal status, tumor staging, prognosis, and HPV infection.

Informed consent, ethics approval, and patient samples used in the study

Informed consent was obtained voluntarily from each patient, and ethics approval was obtained from the Institutional Ethics Committees of the Mazumdar Shaw Medical Centre (Bangalore, India). Tumor and matched control (blood and/or adjacent normal tissue) specimens were collected and used in the study. Only those patients whose histologic sections confirmed the presence of squamous cell carcinoma with at least 70% tumor cells in the specimen were used in this study. At the time of admission, patients were asked about their habits (chewing, smoking, and/or alcohol consumption). Fifty-two treatment-naïve patients who underwent staging according to AJCC criteria and curative intent treatment as per NCCN guidelines involving surgery with or without postoperative adjuvant radiation or chemoradiation at the Mazumdar Shaw Medical Centre were accrued for the study (Supplementary Table S1).

Whole-genome methylation assay

Genomic DNA was extracted from tissues using DNeasy Blood and Tissue Kit (Qiagen) as per the manufacturer's specifications. Genomic DNA quality was assessed by agarose gel electrophoresis, and samples with intact, high molecular weight DNA bands were used for bisulfite conversion. Five hundred nanograms of genomic DNA was used for bisulfite conversion using the Zymo EZ DNA Methylation Kit (Zymo Research Corp.) following the manufacturer's instructions and eluted in 12 μL of elution buffer. Four microliters of the bisulfite-converted DNA was used as template for target preparation to hybridize on the BeadChip and was processed as per the manufacturer's instructions following Infinium HD and Infinium protocol for HumanMethylation450K BeadChip (Illumina), respectively. Data were collected using the HiScan (Illumina) reader and analyzed with GenomeStudio V2011.1 methylation module1.1.1 (Illumina).

Methylation450 BeadChip probe alignment and removal of ambiguous probes from analyses

The probe sequences were aligned using bowtie2 (25), while allowing up to 2 mismatches to the human reference hg19 sequence to detect and exclude probes mapping ambiguously to multiple locations in the reference genome (Supplementary Text S1). We found 17,649 probes (3.6% of the total number of probes) to be ambiguously mapping in this manner. These probes mapped to multiple locations, ranging from 2 to 6265 (Supplementary Table S2).

Whole-genome gene expression assay

Whole-genome gene expression profiling was carried out with Illumina HumanHT-12 v4 expression BeadChip (Illumina) with 21 tumors and their matched adjacent normal tissues. Total RNA was extracted using PureLink RNA Mini Kit (Invitrogen), and RNA quality was checked on the bioanalyzer using RNA Nano6000 chip (Agilent). RNA samples with poor RNA integrity numbers (RIN; ≤7) on the bioanalyzer chip, indicating partial degradation of RNA, were processed using Illumina WGDASL assay and the ones with good RIN numbers (>7) were labeled using Illumina TotalPrep RNA Amplification Kit (Ambion) as per the manufacturer's recommendations. Targets were used to hybridize arrays, and arrays were processed according to the manufacturer's recommendations. Arrays were scanned using HiScan, Illumina, and the data collected were analyzed with GenomeStudio V2011.1 Gene Expression module 1.9.0 (Illumina) to check for the assay quality control probes. Raw signal intensities were exported from GenomeStudio for transformation, normalization, and differential expression analyses using R.

Data preprocessing and differential methylation analyses

Raw data for 52 tumor:matched control sample pairs obtained from the Illumina Infinium Methylation450 BeadChip scanned using the HiScan system were analyzed in GenomeStudio Methylation Module v1.9.0 (Illumina) for assay controls, before exporting the β values for analysis using R. For each CpG site, ratio of fluorescent signal was measured by that of a methylated probe relative to the sum of the methylated and unmethylated probes, termed as a β value. β values, ranging from 0 (no methylation) to 1.0 (100% methylation of both alleles), were recorded for each probe on the array. The β value is calculated using the formula mentioned below:

where Pm is the signal intensity of the methylation detection probe and Pum is the signal intensity of the nonmethylation detection probe. Infinium methylation assay has built-in sample-dependent and -independent controls, which were used to test the overall efficiency of the assay for each sample. Raw signal intensities were exported as .idat files, preprocessed (transformed and normalized), and analyzed using R Bioconductor packages, wateRmelon, minfi, and ComBat (26–28), to estimate differential methylation. The Illumina 450K data were preprocessed using a functional normalization procedure implemented by the "preprocessFunnorm" function within the R Bioconductor package minfi (Supplementary Text S2). This algorithm was recently developed and found to outperform other 450K data normalization and batch correction methods (29). After functional normalization, we identified DMRs using "bumphunter" function in minfi, retaining significant ones with P ≤ 0.05 and fwer ≤ 0.05. Dasen normalization was also performed using wateRmelon to obtain differentially methylated loci using minfi (Supplementary Text S3). A clear difference before and after normalization was seen in the density plots (Supplementary Fig. S1). The normalized data were further batch corrected using ComBat (28) using empirical Bayes methods. Dasen-normalized DMPs corresponding to these DMRs were extracted. Postprocessing of the differentially methylated data for visualization was performed as per the method provided in the Supplementary Material (Supplementary Text S4–S6).

Data preprocessing and differential expression analyses

Whole-genome gene expression data were analyzed using final reports exported from Genome Studio, using the R Bioconductor packages lumi (30). The VST transformation and LOESS normalization methods in lumi performed best in preprocessing data from the DASL and WG assays. They were chosen based on their highest mean IACs (interarray correlation coefficients), among various combinations of three types of transformations (VST, log2, and cubicRoot) and six types of normalizations (SSN, quantile, RSN, VSN, RankInvariant, and LOESS). The preprocessed data was further batch corrected using ComBat, as the experiments had been performed in multiple batches. The batch-corrected gene expression data were then analyzed for differential expression using limma (Supplementary Text S7). This method of transformation, normalization, and batch correction was performed as described previously (31).

Correlation of methylation and expression data

All probes were annotated using the Ensemble v75 database and classified on the basis of their locations to genic subregions, such as 5′ and 3′ untranslated regions (UTR), north and south shelves and shores, TSS1500, promoters, CpG islands, gene bodies, and first exons. The relationship between differential expression of genes and differential methylation of genic subregions was visualized (Supplementary Text S8).

Minimal differentially methylated loci using machine-learning algorithm

Sample-wise dasen-normalized and batch-corrected intensities of probes corresponding to DMRs, identified postfunctional normalization using minfi, along with the tumor:matched control tissue type for each sample was used as a training set (training set #1) for the random forest analyses.

Variable selection

The method employed here follows the method described before (32). Briefly, the algorithm performs both backward elimination of variables and selection based on the importance spectrum. The Bioconductor package VarSelRF (33) was used to perform this function. About 20% of the least important variables are eliminated iteratively until the current out-of-bag (OOB) error rate becomes larger than the initial OOB error rate or the OOB error rate in the previous iteration. We computed a metric by multiplying the prediction accuracy (sensitivity and specificity) and repeatability, and dividing the product by the number of parameters contained within the predicted set, and reported the set with the highest value for this metric. Five hundred of such iterations were performed, and each of the iterations was permuted across 3,000 random forest trees (Supplementary Text S9). A score was computed for predictive parameter sets across all iterations, for each sample, which factored in the prediction accuracy of the tissue type of that sample, the repeatability (number of instances out of 500) of the predictive parameter set, presence of other predictive parameter sets, and the parameter set size. The first two weighed the score positively, and the latter two, negatively. The R commands used in our method are provided below:

For all iterations of all RF analyses, we confirmed that the rankings of variable importances remained highly correlated (Supplementary Table S3) before and after correcting for multiple hypotheses comparisons using pre- and post- Benjamin–Hochberg FDR-corrected P values (Supplementary Text S10). The R commands used to recompute importance values after multiple comparisons testing for the six somatic mutation category RF analyses are provided below:

The random forest algorithm was also implemented to predict tumor-specific minimal methylation signature, this time training with a categorical input for the probes, instead of the actual intensity values (training set #2). This training set was created on the basis of the normalized Δβ (tumor-matched control β) values falling into predefined categories. Following the Δβ density distributions across probes (Supplementary Fig. S2), for −0.2 < Δβ < 0.2, bins were determined at an interval of 0.032, and for Δβ ≥ 0.2, the binning frequency was reduced to 0.08. For example, Δβ = 0.02 for a probe corresponding to a tumor:matched control sample pair results in categories A for the tumor and B for the matched control, whereas Δβ = −0.02 would result in reversal of the categories, that is, B for the tumor and A for the matched control. In addition to the above two DMP-based training sets, we created another categorical set (training set #3), based on hypo- or hypermethylation of DMRs housing only CpG island, TSS1500, or promoter-associated probes. In this scenario, DMRs housing heterogeneously methylated probes (i.e., a mixture of hyper- and hypomethylated probes) were discarded, and only the homogeneous DMRs (with all hypo- or all hypermethylated probes) were considered.

0.632+ bootstrapping.

To understand the specificity of the best minimalistic predictors of survival, we estimated the 0.632+ error rate. We performed bootstrapping for the tissue type, predicting random forest analyses using 100 replicates (Supplementary Text S11), and the errors were reported for prediction of matched control, tumor, and both tissues. 0.632+ bootstrapping is a method to estimate prediction errors in ensemble machine-learning approaches, such as random forests (34). This method is relatively free from biases (upward and downward), unlike the leave-one-out cross-validation and 0.632 bootstrapping methods (35–37). The 0.632+ method is described by the following formula:

where Err(0.632+), Err(0.632), Err(1), and err are errors estimated by the 0.632+ method, the original 0.632 method, leave-one-out bootstrap method, and err represents the error. R represents a value between 0 and 1.

Visualization of variants

We visualized all the variants and the corresponding genes using Circos v0.66. We integrated information on LOHs, somatic SNPs and indels, and copy number insertions and deletions from 48 patients described previously (13) with ≥10% frequency of patients bearing the variants, genes undergoing significant expression, and methylation changes and visualized the changes using the circular genomic representation. The DNAse1 hypersensitivity tracks, methylation, histone and chromatin modifications, and TFBS tracks were loaded and visualized using the UCSC Genome Browser. Genome-wide profiles of differential expression and differential methylation of DMRs housing island/TSS1500/promoter probes were visualized using GenomeGraphs, a tool in the UCSC Genome Browser.

Validation of minimal methylation signature in TCGA HNSCC data

The best-scoring signatures identified from the random forest analyses on the OTSCC data, using three training sets, were examined in the 50 tumor:matched control pairs (various subsites in head and neck region) from the TCGA cohort. TCGA data were preprocessed and normalized exactly the same way as described above for the OTSCC data.

Prediction of minimal differential methylation signature in OTSCC for various clinical parameters

Various epidemiologic and clinical parameters were used for predictive analyses using the importance-based variable elimination by the random forest approach (24), where the Δβ of DMPs corresponding to DMRs were used as the training sets. Minimal differential methylation signatures with the best scores were visualized for patients with different states for the selected epidemiologic and clinical parameters.

Validation of NR4A3 and BCL2L11 expression using qPCR

Total RNA was isolated using RNeasy Mini Kits (Qiagen) following the manufacturers' instructions with on column DNaseI treatment (Qiagen) to remove contaminating genomic DNA. The RNA integrity was analyzed by electrophoresis, and samples were preserved at −80°C until further use. Total RNA (500 ng) was reverse transcribed using SuperScript III First-Strand Synthesis Kit, following the manufacturer's instructions (Thermo Fisher Scientific). qPCR was performed using KAPPA SYBR Fast qPCR Master Mix (Kapa Biosystems) on NR4A3 using primers: forward 5′-GGCTGCAAGGGCTTTTTCAA-3′, reverse 5′-TCGGTTTCGACGTCTCTTGT-3′; and on BCL2L11 using primers: forward 5′-CAACACAAACCCCAAGTCCT-3′, reverse 5′-TCTTGGGCGATCCATATCTC-3′. GAPDH was amplified in parallel using primers: forward 5′-TCGACAGTCAGCCGCATCTTCTTT-3′, reverse 5′-GCCCAATACGACCAAATCCGTTGA-3′ as an internal housekeeping control. qPCR was performed using initial denaturation at 95°C for 5 minutes, followed by 40 cycles of 95°C for 10 seconds, 60°C for 30 seconds. The amplification was followed by dissociation curve analysis using Stratagene MX300P instrument (Agilent Technologies) and were analyzed by MxPro Mx-3000p software (Agilent Technologies). All amplification reactions were done in triplicates, using nuclease-free waster as negative controls. The analysis of gene expression data was performed using the comparative Ct method of relative quantification as described by Schmittgen and Livak, 2008 (38).

Validation of miR-10B methylation using quantitative methylation-specific PCR

The genomic DNA was isolated from paired oral tongue tumor and the adjacent normal tissue samples using DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's instructions. Genomic DNA (total 500 ng) was bisulfite converted using EZ DNA Methylation Kit (Zymo Research) following the manufacturer's guidelines. For the specific amplification of CpG islands of MIR10B promoter, methylation-specific primers (forward, 5′-GTGAGTAGTTTTTCGGAGGTAGC-3′ and reverse, 5′-CGAACTAACTATAAACCCGAACG-3′) were designed using MethPrimer software (http://www.urogene.org/methprimer/), quantitative methylation-specific PCR (qMSP) was carried out using KAPA SYBR Fast qPCR Master Mix (Kapa Biosystems). Quantitative methylation in each sample was normalized using the methylated primers for ACTB (forward, 5′-AACCAATAAAACCTACTCCTCCCTTAA-3′ and reverse, 5′-TGGTGATGGAGGAGGTTTAGTAAGT-3′). The reaction mix was denatured initially at 95°C for 5 minutes and amplified for 40 cycles at 95°C for 10 seconds, 53°C for 30 seconds (59°C for ACTB primer), followed by extension at 72°C for 1 minute. The amplification was followed by dissociation curve analysis. PCR reactions were performed using Stratagene MX300P (Agilent Technologies) and were analyzed by MxPro Mx-3000p software (Agilent Technologies). Human HCT116 DKO methylated and nonmethylated DNA set standards (Zymo Research) were used as positive and negative assay controls, respectively, for qMSP. The relative level of methylated DNA was calculated for each tumor versus the adjacent normal using the method described by Schmittgen and Livak, 2008 (38).

Differential expression analysis of NR4A3 in TCGA cohort

Level 3 RNA-seq expression data on TCGA HNSCC (N = 42 for all subsites and N = 12 for oral tongue, where expression data were available for tumor:normal pairs), along with survival and living status, were downloaded from the UCSC Cancer Browser (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/#). Tumor-specific ratios of the normalized expression values for NR4A3 gene were log transformed before further analyses.

Survival analysis

Event-free survival probability distributions were estimated using the Kaplan–Meier method, and significance was assessed using the log-rank test. We calculated event-free survival for patients as the time from diagnosis to the event (recurrence, disease progression, or death) or last examination if the patient had no event. Significance was calculated as two-tailed P values, on association with event-free survival (months), of habits alone, NR4A3 expression status, and habits and NR4A3 expression, together. The R commands used for Kaplan–Meier analyses are provided in Supplementary Text S12.

Analysis workflow and differentially methylated loci in OTSCC

The analytic pipeline for genome-wide methylation and gene expression data analyses along with linking methylation with gene expression, machine-learning approach to discover methylation signature, and follow-up validation is provided in Fig. 1. The methylation data were normalized by multiple methods (see Materials and Methods), batch corrected, and then used for the identification of DMPs and regions encompassing multiple probes (DMRs) in the tumors. Scanning of the preprocessed data in tumor samples before normalization, batch correction, and pairing with corresponding normal tissues revealed DMPs in seven genes, UBE4B, CCDC13, LRP5L, BCL3, MIR4260, FOXK2, and COL18A1, to be hypermethylated, and one in gene GSTM2, to be hypomethylated in nearly 95% of the tumors. We found more DMRs to be hypomethylated than hypermethylated (Fig. 2A), which is consistent with the data from the TCGA cohort, including when data from the oral tongue subsite alone was used (Fig. 2A). The fraction of hypermethylated probes located within the CpG islands, TSS1500, and promoter subregions, combined, were higher, whereas hypomethylated probes were more frequent in the gene bodies for OTSCC (Fig. 2B). The average Δβ of the island DMRs had a narrow distribution for the OTSCC and HNSCC (all and oral tongue subsites from the TCGA cohort), with median values around 0.30 and 0.40, respectively, indicative of greater hypermethylation events (Fig. 2C). The interquartile ranges for average Δβ distributions of TSS1500 DMRs spanned both hyper- and hypomethylation ranges, for the OTSCC and HNSCC cancer types, respectively. The median, however, was more indicative of hypermethylation in the OTSCC and hypomethylation in the HNSCC. The average Δβ distributions were identical for island DMRs between the OTSCC and the TCGA HNSCC cohort for early- and late-stage tumors, with their medians suggestive of hypermethylation (Fig. 2D). The Δβ distribution for TSS1500 DMRs in OTSCC was much wider in the late-stage patient samples, as compared with those in early stages of the disease and their medians (both ∼0.3–0.4) were representative of hypermethylation. Overall, the intraregion differences (within a dataset and across tumor stages) for the Δβ distributions were significant (P < 0.0001; unpaired t test) over the interregion differences across datasets for the same subregions (TSS1500, P = 0.0751 or CpG island, P = 0.2758). A comprehensive representation of all 5mC changes, along with previously identified somatic variants in OTSCC (37), is presented as a Circos graph (Supplementary Fig. S3; Supplementary Table S4).

Figure 1.

Analysis workflow schema. The figure illustrates the tools, Bioconductor packages, and functions in R used for data preprocessing, differential methylation and expression analyses, correlation between them, identifying a tumor-specific minimal methylation signature using an ensemble machine-learning algorithm and its association with clinical and epidemiologic factors, and validation.

Figure 1.

Analysis workflow schema. The figure illustrates the tools, Bioconductor packages, and functions in R used for data preprocessing, differential methylation and expression analyses, correlation between them, identifying a tumor-specific minimal methylation signature using an ensemble machine-learning algorithm and its association with clinical and epidemiologic factors, and validation.

Close modal
Figure 2.

DMR statistics. A, DMR (q ≤ 0.05) numbers compared between inhouse oral tongue tumors (OTSCC) and data from the TCGA cohort (TCGA_OralTongue: data for oral tongue subsites of the TCGA cohort and TCGA_HNSCC: data for all tumor subsites in the TCGA cohort). B, region-wise characterization thereof. C, distribution of magnitude of differential methylation in CpG islands and TSS1500. D, distribution of DMRs in early- and late-stage tumors

Figure 2.

DMR statistics. A, DMR (q ≤ 0.05) numbers compared between inhouse oral tongue tumors (OTSCC) and data from the TCGA cohort (TCGA_OralTongue: data for oral tongue subsites of the TCGA cohort and TCGA_HNSCC: data for all tumor subsites in the TCGA cohort). B, region-wise characterization thereof. C, distribution of magnitude of differential methylation in CpG islands and TSS1500. D, distribution of DMRs in early- and late-stage tumors

Close modal

Correlation between expression and methylation

We wanted to identify the functional relevance of 5mC changes by correlating methylation changes with changes in gene expression. To do this, we extracted gene level information using probes spanning hyper- or hypomethylated regions that were up- or downregulated, respectively, for the same patient. First, we separated probes in gene bodies, CpG islands, 5′ and 3′ UTRs, north and south shelves and shores, promoters, 1.5 kb flanking the transcription start sites (TSS1500), and first exons. For gene expression analysis, whole-genome expression data were processed posttransformation; normalization and batch correction (see Materials and Methods) and tumor-specific up- and downregulated genes were considered for comparison. We found a maximum of 27 genes satisfying these criteria in the gene bodies, closely followed by 26 genes each, in the CpG islands and 5′ UTR and 22 genes in the first exon (Supplementary Fig. S4). Finally, we obtained 12 genes across all the regions where the differential methylation across all tumors was significantly linked (2-tailed t test, P < 0.05) with gene expression changes in the same tumor:normal paired samples (Fig. 3). The magnitude of correlation between expression and methylation varied widely across genes for various subregions (Fig. 3 and Supplementary Fig. S4), with a significant negative correlation (P < 0.05, two-tailed t test) observed for most subregions: 5′ UTR, R = −0.51; promoter, R = −0.5; TSS1500, R = −0.56; body, R = −0.35; S shelf, R = −0.39; first exon, R = −0.46; S shore, −0.66; and N shelf, R = −0.36. The island region harbored the largest fraction of hypermethylated and downregulated genes, whereas the promoter and TSS1500 subregions displayed a strong inverse relationship between differential expression and differential methylation (Supplementary Fig. S4). This observation formed a basis for considering DMRs housed in these three subregions as one of our training sets to identify a minimal methylation signature.

Figure 3.

Region-wise correlation between expression and methylation. The relationship between preprocessed differential methylation (tumor MINUS normal) values for probes from specific regions, such as gene bodies, CpG islands, TSS1500, N shore, S shore, N shelf, S shelf, 5′ UTR, and first exon, and the log2 fold change (tumor/normal fold change) of the corresponding gene expression for the same patient. Blue and orange colors, hypo- and hypermethylation, respectively. Filled circles indicate a correlation in the expected direction, that is, hypermethylation and downregulation or hypomethylation and upregulation. Empty circles indicate a counterintuitive correlation, namely, hypermethylation and up-regulation or hypomethylation and downregulation. The sizes of the circles indicate the magnitude of gene expression. Only genes with significant associations (P < 0.05) across all tumors are shown.

Figure 3.

Region-wise correlation between expression and methylation. The relationship between preprocessed differential methylation (tumor MINUS normal) values for probes from specific regions, such as gene bodies, CpG islands, TSS1500, N shore, S shore, N shelf, S shelf, 5′ UTR, and first exon, and the log2 fold change (tumor/normal fold change) of the corresponding gene expression for the same patient. Blue and orange colors, hypo- and hypermethylation, respectively. Filled circles indicate a correlation in the expected direction, that is, hypermethylation and downregulation or hypomethylation and upregulation. Empty circles indicate a counterintuitive correlation, namely, hypermethylation and up-regulation or hypomethylation and downregulation. The sizes of the circles indicate the magnitude of gene expression. Only genes with significant associations (P < 0.05) across all tumors are shown.

Close modal

Tumor-specific DMPs and DMRs

We used three different training sets (Materials and Methods section) to discover minimal methylation signature using variable selection by elimination analyses implemented by the random forest approach. The scores for the discovered signatures are presented in Supplementary Table S5. We found two sets of two probes in four different DMRs (encompassing GPER1 and OR2T6, TTLL8 and RHPN1 genes), each to be the best diagnostic predictor for tumors (Fig. 4A, top) with training set #1 (see Materials and Methods). These probes along with other loci housed within the same DMRs showed a hypomethylation signature for most tumor samples when compared with the normal samples (Fig. 4A). The signature was strongest for the predicted probe (DMP) within each DMR. Using the categorical training set based on binning Δβ of paired tumor and matched control samples (training set #2; see Materials and Methods), we further identified three more DMRs that encompassed MIR10B, ENAH, and EMX2OS genes (Fig. 4A). For MIR10B and EMX2OS, probes were distinctly hypermethylated for the majority of tumors (94% and 88%, respectively), while ENAH exhibited a mixed trend.Analyzing homogeneous DMRs in CpG islands/TSS1500 and promoters, alone (training set #3; see Materials and Methods), we obtained DMRs encompassing 10 genes (BTK, C11orf53, CENPVL1, COL9A1, DEFA1, ODF3, RARRES2, RHPN1, SHF, and SP6) with the highest sensitivity and specificity (Fig. 4A). We found RHPN1 gene showed up in multiple training sets. The DMRs corresponding to BTK, C11orf53, COL9A1, DEFA1, ODF3 along with RHPN1 were hypomethylated, while the remaining were relatively hypermethylated. The results from our 0.632+ bootstrapping errors were low, with a median error of approximately 0.025, for predictions of normal, tumor, or either tissues (Supplementary Fig. S5) indicative of a high overall prediction accuracy.

Figure 4.

Discovery of a minimal DNA methylation signature. A, minimal differential methylation profiles of methylation loci from random forest analyses using three different training sets (see Materials and Methods). B, validation of specific DMPs and DMRs using the same training sets in the TCGA HNSCC data.

Figure 4.

Discovery of a minimal DNA methylation signature. A, minimal differential methylation profiles of methylation loci from random forest analyses using three different training sets (see Materials and Methods). B, validation of specific DMPs and DMRs using the same training sets in the TCGA HNSCC data.

Close modal

We further validated the tumor-specific DMRs discovered in our RF analysis in the TCGA cohort. As presented in Fig. 4B (bottom), the differential methylation trends observed in our OTSCC cohort were also observed in both oral tongue and other subsites in the TCGA HNSCC cohort, except in the case of ENAH gene, where the majority of tumors were hypomethylated for the particular probe in the TCGA cohort (Fig. 4B; training set #2). In addition, we also looked at the cell line data from ENCODE consortium that revealed DNAse I hypersensitivity and/or CpG methylation in these genes, including specific locations within the DMRs housing the tissue type predictive probes (Supplementary Fig. S6), possibly indicating the role of these CpG sites in regulation of gene expression.

Hypermethylation of MIR10B and downregulation of its target gene expression

We scanned upstream and downstream of the DMR that housed MIR10B gene in an attempt to understand its differential methylation relative to its neighborhood. We found a DMR of nearly 1.31 MB to be differentially methylated in all the tumor samples (Fig. 5A). We found that the sizes of such functional DMRs discovered for the MIR10B gene to be exaggerated relative to the stringent DMR boundaries determined by minfi, also for ENAH and EMX2OS, codiscovered with MIR10B, as the minimal set (Supplementary Fig. S7).

Figure 5.

Differential methylation of MIR10B, linking it to the expression of its downstream target genes and disease-free survival. A, median differential methylation of chromosomal regions flanking MIR10B gene. The actual scatter is in gray, exponential smoothing of the scatter in red, and MIR10B is highlighted in blue. B and C, correlation between methylation (MIR10B) and expression (NR4A3 and BCL2L11). FC, fold change. D and E, correlation of NR4A3 and BCL2L11 expression with disease-free survival. Green filled circles, patients without any risk habits; red filled circles, patients with at least one risk habit (smoking, chewing tobacco, or alcohol).

Figure 5.

Differential methylation of MIR10B, linking it to the expression of its downstream target genes and disease-free survival. A, median differential methylation of chromosomal regions flanking MIR10B gene. The actual scatter is in gray, exponential smoothing of the scatter in red, and MIR10B is highlighted in blue. B and C, correlation between methylation (MIR10B) and expression (NR4A3 and BCL2L11). FC, fold change. D and E, correlation of NR4A3 and BCL2L11 expression with disease-free survival. Green filled circles, patients without any risk habits; red filled circles, patients with at least one risk habit (smoking, chewing tobacco, or alcohol).

Close modal

We observed negative correlations between the differential methylation of MIR10B gene and the differential expression of two of its validated gene targets, BCL2L11 and NR4A3(P = 0.0125 and P = 0.014, respectively; Fig. 5B and C). NR4A3 expression was downregulated in all except one tumor studied (>95% of the tumors). In addition to miR-10B hypermethylation in 60% of the tumors, and very frequent downregulation of NR4A3, we found that 50% to 60% of the tumors have hypermethylation in the NR4A3 gene itself, near the first exon and first intron, spanning a CpG island (Supplementary Fig. S8). Similarly, about 60% of the tumors had hypermethylation in the promoter region of BCL2L11 gene, also spanning a CpG island (Supplementary Fig. S8). We found the extent of NR4A3 downregulation was correlated, although weakly, in patients with no risk habits than those patients with at least one of the risk habits, smoking, alcohol consumption, and chewing tobacco. The tumor-specific differential expression of BCL2L11 and NR4A3 genes was inversely correlated with disease-free survival (P = 9E−15 and P = 2E−15, respectively; Fig. 5D and E), with absence of risk habits associating more with improved disease-free survival.

We validated the inverse association between differential methylation of MIR10B and differential expression in its downstream targets, NR4A3 (R = −0.66; P = 0.04) and BCL2L11 (R = −0.78; P = 0.01), experimentally by performing qMSP for MIR10B and qPCR for BCL2L11 and NR4A3 (see Materials and Methods), in 12 additional tumor:matched control pairs (Supplementary Fig. S9). We found that MIR10B is hypermethylated in nearly 60% of the samples, with 83% and 42% of the tumors showing downregulation for NR4A3 and BCL2L11, respectively (Supplementary Fig. S9), validating the findings from the arrays.

NR4A3 expression and patient survival

We studied the effect of MIR10B target gene expression on patient survival and found that the extent of NR4A3 downregulation was correlated, although weakly, in patients with no risk habits than those patients with at least one of the risk habits, smoking, alcohol consumption, and chewing tobacco. The tumor-specific differential expression of BCL2L11 and NR4A3 genes was inversely correlated with disease-free survival (P = 9E−15 and P = 2E−15, respectively; Fig. 5D and E), with absence of risk habits associating more with improved disease-free survival. In our discovery set, we found 95% of the tumors with NR4A3 downregulation that we validated in 83% and 42% of the tumors in an independent validation set using qPCR and by using data from TCGA oral tongue cohort (Supplementary Fig. S10). We extended our earlier observations by performing Kaplan–Meier survival analyses on the effects of habits alone, NR4A3 downregulation alone, or their combined effects on survival. We observed that the habits alone do not have significant impact on patient survival (Fig. 6A, P = 0.494; Fig. 6D, P = 0.283), whereas NR4A3 expression does (Fig. 6B, P = 0.034; Fig. 6E, P = 0.019). We further observed that the combination of no risk habits and NR4A3 expression has a significant effect on survival (Fig. 6C, P = 0.046 and Fig. 6F, P = 0.021). Overall, patients without habits (habits negative) expressed NR4A3 at a lower level compared with those with habits (Fig. 5C).

Figure 6.

Effect of NR4A3 expression on survival probability distributions. A and D, Kaplan–Meier survival probability curves for habits negative (habits-ve) versus habits positive (habits +ve). B–E,NR4A3 downregulated (NR4A3-Down) versus NR4A3 upregulated (NR4A3-Up). C–F, habits negative OR NR4A3 downregulated versus habits positive and NR4A3 upregulated, in the discovery and validation sets (A–C), or in the discovery, validation, and the TCGA oral tongue dataset, combined together (D–F).

Figure 6.

Effect of NR4A3 expression on survival probability distributions. A and D, Kaplan–Meier survival probability curves for habits negative (habits-ve) versus habits positive (habits +ve). B–E,NR4A3 downregulated (NR4A3-Down) versus NR4A3 upregulated (NR4A3-Up). C–F, habits negative OR NR4A3 downregulated versus habits positive and NR4A3 upregulated, in the discovery and validation sets (A–C), or in the discovery, validation, and the TCGA oral tongue dataset, combined together (D–F).

Close modal

DMPs predicting various epidemiologic and clinical parameters

Differential methylation (Δβ) for all tumors used as training sets along with five parameters, namely HPV infection, nodal status, risk habits, tumor stage, and prognosis, resulted in best-scoring DMPs corresponding to TSPAN7, FUT3, TRIM5, SLC9A9, NPAS3, RPS6KA2, MAP3K8, TIMM8A, and RNF113A genes. The set of predictive DMPs displayed contrasting patterns of differential methylation among the various categories of each parameter (Fig. 7).

Figure 7.

Discovery of differentially methylated loci linked with clinical and epidemiologic parameters. A–E, differential methylation profiles of HPV, habits, node, prognosis, and stage, respectively, predicting DMPs and DMRs from random forest analyses. Blue circles indicate hypomethylation and orange circles indicate hypermethylation, with the sizes indicative of the magnitude of methylation changes (Δβ).

Figure 7.

Discovery of differentially methylated loci linked with clinical and epidemiologic parameters. A–E, differential methylation profiles of HPV, habits, node, prognosis, and stage, respectively, predicting DMPs and DMRs from random forest analyses. Blue circles indicate hypomethylation and orange circles indicate hypermethylation, with the sizes indicative of the magnitude of methylation changes (Δβ).

Close modal

Epigenetic changes like DNA methylation play important roles in cancer initiation and progression. One of the first steps in studying DNA methylation is to catalog the loci where methylation changes take place and link those to the disease. Previous studies in head and neck cancer were primarily aimed at identifying 5mC changes in a handful of loci in the genome or using low-throughput methylation arrays. In the current study, we have used >482, 000 probes to profile genome-wide changes in 5mC patterns in oral tongue tumors (OTSCC) and defined functional regions of importance by comparing DNA methylation changes with those in gene expression. We term these regions of the genome as functional DMRs. We also linked global DNA methylation changes with various clinical and epidemiologic parameters in OTSCC. Our observation of the presence of genome-wide hypomethylated regions alongside promoter hypermethylation (Fig. 2) corroborates pervious study in other cancers (19). Although the role of global hypomethylation is unclear, several animal experiments using DNA methylation inhibitors (39, 40) are indicative of its involvement in oncogenesis and tumor progression. The hypermethylation of CpG islands observed in our study (Fig. 2B–D) was supportive of the theory that CpGs located within islands in normal cells are relatively unmethylated, whereas those located outside of the islands are more methylated, with the reverse pattern occurring in tumor cells (41). We observed a shift of the global hypermethylation in TSS1500 toward a relative hypomethylation, from early to advanced stages of the disease (Fig. 2D), a trend also found in hepatocellular carcinoma (42). We found some uniquely differentially methylated loci in our study, different from other studies so far, a possible indication of specific epigenetic marks linked with our cohort and patient geography as the epigenome is known to be dynamic and highly dependent on environmental and dietary factors (43).

In our study, we define the importance of functional and differentially methylated regions or DMRs in tumors based on two factors. First, we identified genes that were encompassed in DMRs by correlating differential methylation in a number of probes within specific genomic loci, especially in the CpG islands, promoter, and TSS1500, with the differential expression in these genes, between the same tumor:matched normal pairs, and estimating the significance of the association (2-tailed t test, P < 0.05) between those two events (Fig. 3). Second, we identified the entire stretch of DMPs and assigned importance to all the genes in that region. It is vital to assign function to the entire DMR, rather than a single probe or a gene housed within it, as the probe hybridization patterns vary widely, even within a given subregion, and methylation changes are not necessarily localized single-gene events. Both these criteria led us to identify functional DMRs. A genome-wide representation of such functional DMRs is shown in Supplementary Fig. S11. We found that methylation changes at certain regions do not go hand-in-hand with gene expression changes in the expected direction (i.e., hyper- and hypomethylated loci giving rise to up- and downregulation of gene expression, respectively; Supplementary Fig. S4). Even though such phenomena were previously reported (44), their significance is currently not known.

Our analyses using an ensemble machine-learning algorithm followed by estimation of errors picked several genes predictive of clinical and epidemiologic factors (Fig. 7). Several of those are shown in the past to have a role in cancer. For example, TSPAN7, a tetraspanin family member upregulated in multiple myeloma patients (45) in human epithelial malignancies (46), FUT3 in gastric cancer (47), SLC9A9, overexpressed in most cancers (http://www.proteinatlas.org/ENSG00000181804-SLC9A9/cancer), NPAS3, a neuronal PAS transcriptional factor family member as a tumor suppressor in astrocyomal progression (48), RPS6KA2, a ribosomal kinase and as a putative tumor suppressor in sporadic epithelial ovarian cancer (49) and a modifier of the EGFR signaling pathway in pancreatic cancer (50), and MAP3K8 as a proto-oncogene that plays a role in lung (51), prostrate (52), and colorectal (53) cancers.

We identified a signature containing differentially methylated loci encompassing 16 genes using a machine-learning algorithm using three different training sets (Fig. 4A) that was further validated using data from TCGA cohort (Fig 4B). Although the machine-learning algorithm identified DMPs, we could identify the functional DMRs encompassing these DMPs (Fig. 5A and Supplementary Fig. S12). Many of the genes within those DMRs are cancer-associated genes. Among those, GPER1 (G-protein–coupled estrogen receptor 1) is known to act as a tumor suppressor in ovarian cancer (54) and has low expression in breast cancer (55), RHPN1 (rhophilin, Rho GTPase-binding protein 1) is reported to be highly expressed in most cancers (http://www.proteinatlas.org/ENSG00000158106-RHPN1/cancer) and CpG islands adjacent to the RPHN1 gene is reported to be linked with survival of colorectal cancer patients (56). In addition, TTLL8 (tubulin tyrosine ligase-like family, member 8) gene is differentially methylated in diseases such as myelodysplastic syndrome (57), and OR2T6 shows copy number gains in many cancers, including HNSCC (http://ow.ly/RniAf).

MIR10B has been implicated in the past in multiple cancers, including in head and neck (58–62). Unlike in some cancers, its role in HNSCC is unclear (63). We found a large functional hypermethylated DMR around miR-10B spanning several megabases (Fig. 5A and Supplementary Fig. S7). A similar effect in MIR10B has been reported previously in gastric cancers (64). We found downregulation in one of the MIR10B target genes, NR4A3, in significant number of oral tongue tumors both in our array data and also in a smaller independent validation set of tumors (Fig. 5C and Supplementary Fig. S8). NR4A3 (NOR1) is an experimentally validated target of MIR10B (http://mirtarbase.mbc.nctu.edu.tw/), is an orphan nuclear receptor, a member of the nuclear receptor superfamily, and is one of the primary classes of therapeutic drug targets (65, 66). The gene is epigenetically silenced in gastric cancers and promotes migration and invasion in gastric cancer (67). Although data from the TCGA HNSCC cohort displayed a hypermethylation trend for the DMP in 95% of the tumors and associated functional DMR in all the tumors (Fig. 4B), we could not test the downregulation of its target genes, NR4A3 and BCL2L11, as gene expression data from the same sets of tumors were not available from the TCGA cohort. BCL2L11, another MIR10B target gene is a potential downstream target of NR4A. We found hypermethylation in miR-10B is moderately associated with the downregulation of BCL2L11 (Fig. 5B). It is possible that there are multiple mechanisms through which the expressions of NR4A3 and BCL2L11 are modulated in OTSCC, and hypermethylation of miR-10B is one such path. Low expression in NR4A1, another member from the nuclear receptor NR4A family, is linked with poor overall survival in aggressive lymphomas (65) and NR4A1 and NR4A3 double-knockout mice rapidly develop acute myeloid leukemia (65). Interestingly, NR4A3 was found to act as an oncogene in neuroblastoma, where its overexpression lowered the survival rate in patients (68). Like in neuroblastoma, it is possible that in solid tumors like HNSCC, NR4A3 acts as an oncogene. However, our data do not provide mechanistic details, and future functional biology experiments are needed to precisely define the role(s) and mechanism of action of NR4A3 in OTSCC. We found tumor-specific differential expression of BCL2L11 and NR4A3 genes to be inversely correlated with disease-free survival, especially in habits negative patients, from our array data. Downregulation of NR4A3 and hence BCL2L11, may explain such an association with improved disease-free survival. Despite the fact that the hypermethylation of MIR10B and downregulation of two of its target genes were validated in an independent validation set, due to the small sample size (N = 7 for habits positive, N = 4 for habits negative, N = 1 habits not known) and therefore lack of statistical power, it was not possible to associate survival with the expression of these genes linked with MIR10B hypermethylation. Therefore, we combined the discovery and validation sets to accurately ascertain such a relationship.

We found that NR4A3 was downregulated in a large fraction of tumors, both in the discovery set as well as an independent validation cohort assayed using qPCR and in TCGA sample cohort (Supplementary Fig. S10) and that the downregulation of this gene was significantly linked with better survival (Fig. 6). Kaplan–Meier survival analyses revealed that the downregulation of NR4A3 expression alone (Fig. 6B and E), but not the absence of habits (Fig. 6A and D), is sufficient for providing survival advantage to the patients. It is possible that habits upregulate NR4A3 expression (as most habits positive patients have lowered extent of NR4A3 downregulation; Fig. 5C), thereby working as a negative factor for survival (Fig. 6C and F). However, future work is needed to define mechanistic details of the mode of action and the implications of the key genes like NR4A3 in OTSCC biology.

In summary, we have identified a set of differential methylation signatures with functional relevance in oral tongue tumors. The association between the differentially methylated signatures with clinical and epidemiologic parameters points to their potential application in diagnosis and disease stratification. In addition, our data on NR4A3 present potential avenues of exploiting the molecule as a target for therapeutic intervention in head and neck cancer.

No potential conflicts of interest were disclosed.

Conception and design: N.M. Krishnan, B. Panda

Development of methodology: N.M. Krishnan, B. Panda

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K. Dhas, J, Nair, V. Palve, J. Bagwan, G. Siddappa, V.D. Kekatpure, M.A. Kuriakose

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): N.M. Krishnan, B. Panda

Writing, review, and/or revision of the manuscript: N.M. Krishnan, A. Suresh, M.A. Kuriakose, B. Panda

Study supervision: B. Panda

Other (data visualization and presentation): N.M. Krishnan

Other (patient cohort selection, follow up): A. Suresh

Research presented in this article is funded by the Department of Electronics and Information Technology, Government of India [ref nr.: 18(4)/2010-E-Infra., 31-03-2010] and the Department of IT, BT and ST, Government of Karnataka, India (ref nr.: 3451-00-090-2-22).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Ferlay
J
,
Shin
HR
,
Bray
F
,
Forman
D
,
Mathers
C
,
Parkin
DM
. 
Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008
.
Int J Cancer
2010
;
127
:
2893
917
.
2.
Mishra
A
,
Meherotra
R
. 
Head and neck cancer: global burden and regional trends in India
.
Asian Pac J Cancer Prev
2014
;
15
:
537
50
.
3.
Llewellyn
CD
,
Johnson
NW
,
Warnakulasuriya
KA
. 
Risk factors for squamous cell carcinoma of the oral cavity in young people–a comprehensive literature review
.
Oral Oncol
2001
;
37
:
401
18
.
4.
Kuriakose
M
,
Sankaranarayanan
M
,
Nair
MK
,
Cherian
T
,
Sugar
AW
,
Scully
C
, et al
Comparison of oral squamous cell carcinoma in younger and older patients in India
.
Eur J Cancer B Oral Oncol
1992
;
28B
:
113
20
.
5.
Pathak
KA
,
Das
AK
,
Agarwal
R
,
Talole
S
,
Deshpande
MS
,
Chaturvedi
P
, et al
Selective neck dissection (I-III) for node negative and node positive necks
.
Oral Oncol
2006
;
42
:
837
41
.
6.
Fakhry
C
,
Zhang
Q
,
Nguyen-Tan
PF
,
Rosenthal
D
,
El-Naggar
A
,
Garden
AS
, et al
Human papillomavirus and overall survival after progression of oropharyngeal squamous cell carcinoma
.
J Clin Oncol
2014
;
32
:
3365
73
.
7.
Agrawal
N
,
Frederick
MJ
,
Pickering
CR
,
Bettegowda
C
,
Chang
K
,
Li
RJ
, et al
Exome sequencing of head and neck squamous cell carcinoma reveals inactivating mutations in NOTCH1
.
Science
2011
;
333
:
1154
7
.
8.
Stransky
N
,
Egloff
AM
,
Tward
AD
,
Kostic
AD
,
Cibulskis
K
,
Sivachenko
A
, et al
The mutational landscape of head and neck squamous cell carcinoma
.
Science
2011
;
333
:
1157
60
.
9.
Pickering
CR
,
Zhang
J
,
Yoo
SY
,
Bengtsson
L
,
Moorthy
S
,
Neskey
DM
, et al
Integrative genomic characterization of oral squamous cell carcinoma identifies frequent somatic drivers
.
Cancer Discov
2013
;
3
:
770
81
.
10.
India Project Team of the International Cancer Genome Consortium
. 
Mutational landscape of gingivo-buccal oral squamous cell carcinoma reveals new recurrently-mutated genes and molecular subgroups
.
Nat Commun
2013
;
4
:
2873
.
11.
The Cancer Genome Atlas Network
. 
Comprehensive genomic characterization of head and neck squamous cell carcinomas
.
Nature
2015
;
517
:
576
82
.
12.
Krishnan
NM
,
Gupta
S
,
Palve
V
,
Varghese
L
,
Pattnaik
S
,
Jain
P
, et al
Integrated analysis of oral tongue squamous cell carcinoma identifies key variants and pathways linked to risk habits, HPV, clinical parameters and tumor recurrence
.
F1000Res
2015
;
4
:
1215
.
13.
Lechner
M
,
Fenton
T
,
West
J
,
Wilson
G
,
Feber
A
,
Henderson
S
, et al
Identification and functional validation of HPV-mediated hypermethylation in head and neck squamous cell carcinoma
.
Genome Med
2013
;
5
:
15
.
14.
Wilson
GA
,
Lechner
M
,
Koferle
A
,
Caren
H
,
Butcher
LM
,
Feber
A
, et al
Integrated virus-host methylome analysis in head and neck squamous cell carcinoma
.
Epigenetics
2013
;
8
:
953
61
.
15.
Herceg
Z
,
Hainaut
P
. 
Genetic and epigenetic alterations as biomarkers for cancer detection, diagnosis and prognosis
.
Mol Oncol
2007
;
1
:
26
41
.
16.
Dawson
MA
,
Kouzarides
T
. 
Cancer epigenetics: from mechanism to therapy
.
Cell
2012
;
150
:
12
27
.
17.
Goldberg
AD
,
Allis
CD
,
Bernstein
E
. 
Epigenetics: a landscape takes shape
.
Cell
2007
;
128
:
635
8
.
18.
Bird
A
. 
Perceptions of epigenetics
.
Nature
2007
;
447
:
396
8
.
19.
Ehrlich
M
. 
DNA hypomethylation in cancer cells
.
Epigenomics
2009
;
1
:
239
59
.
20.
Yeh
KT
,
Chang
JG
,
Lin
TH
,
Wang
YF
,
Tien
N
,
Chang
JY
, et al
Epigenetic changes of tumor suppressor genes, P15, P16, VHL and P53 in oral cancer
.
Oncol Rep
2003
;
10
:
659
63
.
21.
Gasche
JA
,
Goel
A
. 
Epigenetic mechanisms in oral carcinogenesis
.
Future Oncol
2012
;
8
:
1407
25
.
22.
Mascolo
M
,
Siano
M
,
Ilardi
G
,
Russo
D
,
Merolla
F
,
De Rosa
G
, et al
Epigenetic disregulation in oral cancer
.
Int J Mol Med Sci
2012
;
13
:
2331
53
.
23.
Shaw
RJ
,
Hobkirk
AJ
,
Nikolaidis
G
,
Woolgar
JA
,
Triantafyllou
A
,
Brown
JS
, et al
Molecular staging of surgical margins in oral squamous cell carcinoma using promoter methylation of p16(INK4A), cytoglobin, E-cadherin, and TMEFF2
.
Ann Surg Oncol
2013
;
20
:
2796
802
.
24.
Breiman
L
. 
Random Forests
.
Dordrecht, The Netherlands
:
Kluwer Academic Publishers
; 
2001
.
25.
Langmead
B
,
Salzberg
SL
. 
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
2012
;
9
:
357
9
.
26.
Pidsley
R
,
CC
YW
,
Volta
M
,
Lunnon
K
,
Mill
J
,
Schalkwyk
LC
. 
A data-driven approach to preprocessing Illumina 450K methylation array data
.
BMC Genomics
2013
;
14
:
293
.
27.
Aryee
MJ
,
Jaffe
AE
,
Corrada-Bravo
H
,
Ladd-Acosta
C
,
Feinberg
AP
,
Hansen
KD
, et al
Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays
.
Bioinformatics
2014
;
30
:
1363
9
.
28.
Johnson
WE
,
Li
C
,
Rabinovic
A
. 
Adjusting batch effects in microarray expression data using empirical Bayes methods
.
Biostatistics
2007
;
8
:
118
27
.
29.
Fortin
JP
,
Labbe
A
,
Lemire
M
,
Zanke
BW
,
Hudson
TJ
,
Fertig
EJ
, et al
Functional normalization of 450k methylation array data improves replication in large cancer studies
.
Genome Biol
2014
;
15
:
503
.
30.
Du
P
,
Kibbe
WA
,
Lin
SM
. 
lumi: a pipeline for processing Illumina microarray
.
Bioinformatics
2008
;
24
:
1547
8
.
31.
Chow
ML
,
Winn
ME
,
Li
HR
,
April
C
,
Wynshaw-Boris
A
,
Fan
JB
, et al
Preprocessing and quality control strategies for illumina DASL assay-based brain gene expression studies with semi-degraded samples
.
Front Genet
2012
;
3
:
11
.
32.
Krishnan
NM
,
Gupta
S
,
Khyriem
C
,
Palve
V
,
Suresh
A
,
Siddappa
G
, et al
Integrated analysis links TP53, NOTCH, SLC38A and 11p with survival in patients with oral tongue squamous cell carcinoma
.
bioRxiv
2015
;
doi
: .
33.
Diaz-Uriarte
R
. 
GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest
.
BMC Bioinformatics
2007
;
8
:
328
.
34.
Tibshirani
R
,
Efron
B
. 
Improvements on cross-validation: the .632 + bootstrap method
.
J Am Stat Assoc
1997
;
92
:
13
.
35.
Boulesteix
A-L
,
Janitza
S
,
Kruppa
J
,
König
IR
. 
Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics
.
WIREs Data Mining Knowl Discov
2012
;
2
:
493
507
.
36.
Boulesteix
AL
,
Janitza
S
,
Hapfelmeier
A
,
Van Steen
K
,
Strobl
C
. 
Letter to the editor: on the term ‘interaction’ 647 and related phrases in the literature on Random Forests
.
Brief Bioinform
2015
;
16
:
338
45
.
37.
Efron
B
. 
Estimating the error rate of a prediction rule
.
J Am Stat Assoc
1983
;
78
:
316
331
.
38.
Schmittgen
TD
,
Livak
KJ
. 
Analyzing real-time PCR data by the comparative CT method
.
Nat Protocols
2008
;
3
:
1101
08
.
39.
Carr
BI
,
Reilly
JG
,
Smith
SS
,
Winberg
C
,
Riggs
A
. 
The tumorigenicity of 5-azacytidine in the male Fischer rat
.
Carcinogenesis
1984
;
5
:
1583
90
.
40.
Denda
A
,
Rao
PM
,
Rajalakshmi
S
,
Sarma
DS
. 
5-azacytidine potentiates initiation induced by carcinogens in rat liver
.
Carcinogenesis
1985
;
6
:
145
6
.
41.
Baylin
SB
,
Herman
JG
. 
DNA hypermethylation in tumorigenesis: epigenetics joins genetics
.
Trends Genet
2000
;
16
:
168
74
.
42.
Lin
CH
,
Hsieh
SY
,
Sheen
IS
,
Lee
WC
,
Chen
TC
,
Shyu
WC
, et al
Genome-wide hypomethylation in hepatocellular carcinogenesis
.
Cancer Res
2001
;
61
:
4238
43
.
43.
Hardy
TM
,
Tollefsbol
TO
. 
Epigenetic diet: impact on the epigenome and cancer
.
Epigenomics
2011
;
3
:
503
18
.
44.
Pogribny
IP
,
Rusyn
I
. 
Role of epigenetic aberrations in the development and progression of human hepatocellular carcinoma
.
Cancer Lett
2014
;
342
:
223
30
.
45.
Cheong
CM
,
Chow
AW
,
Fitter
S
,
Hewett
DR
,
Martin
SK
,
Williams
SA
, et al
Tetraspanin 7 (TSPAN7) expression is upregulated in multiple myeloma patients and inhibits myeloma tumour development in vivo
.
Exp Cell Res
2015
;
332
:
24
38
.
46.
Romanska
HM
,
Berditchevski
F
. 
Tetraspanins in human epithelial malignancies
.
J Pathol
2011
;
223
:
4
14
.
47.
Serpa
J
,
Mesquita
P
,
Mendes
N
,
Oliveira
C
,
Almeida
R
,
Santos-Silva
F
, et al
Expression of Lea in gastric cancer cell lines depends on FUT3 expression regulated by promoter methylation
.
Cancer Lett
2006
;
242
:
191
7
.
48.
Moreira
F
,
Kiehl
TR
,
So
K
,
Ajeawung
NF
,
Honculada
C
,
Gould
P
, et al
NPAS3 demonstrates features of a tumor suppressive role in driving the progression of astrocytomas
.
Am J Pathol
2011
;
179
:
462
76
.
49.
Bignone
PA
,
Lee
KY
,
Liu
Y
,
Emilion
G
,
Finch
J
,
Soosay
AE
, et al
RPS6KA2, a putative tumour suppressor gene at 6q27 in sporadic epithelial ovarian cancer
.
Oncogene
2007
;
26
:
683
700
.
50.
Milosevic
N
,
Kuhnemuth
B
,
Muhlberg
L
,
Ripka
S
,
Griesmann
H
,
Lolkes
C
, et al
Synthetic lethality screen identifies RPS6KA2 as modifier of epidermal growth factor receptor activity in pancreatic cancer
.
Neoplasia
2013
;
15
:
1354
62
.
51.
Clark
AM
,
Reynolds
SH
,
Anderson
M
,
Wiest
JS
. 
Mutational activation of the MAP3K8 protooncogene in lung cancer
.
Genes Chromosomes Cancer
2004
;
41
:
99
108
.
52.
Jeong
JH
,
Bhatia
A
,
Toth
Z
,
Oh
S
,
Inn
KS
,
Liao
CP
, et al
TPL2/COT/MAP3K8 (TPL2) activation promotes androgen depletion-independent (ADI) prostate cancer growth
.
PLoS One
2011
;
6
:
e16205
.
53.
Tunca
B
,
Tezcan
G
,
Cecener
G
,
Egeli
U
,
Zorluoglu
A
,
Yilmazlar
T
, et al
Overexpression of CK20, MAP3K8 and EIF5A correlates with poor prognosis in early-onset colorectal cancer patients
.
J Cancer Res Clin Oncol
2013
;
139
:
691
702
.
54.
Ignatov
T
,
Modl
S
,
Thulig
M
,
Weissenborn
C
,
Treeck
O
,
Ortmann
O
, et al
GPER-1 acts as a tumor suppressor in ovarian cancer
.
J Ovarian Res
2013
;
6
:
51
.
55.
Ignatov
T
,
Weissenborn
C
,
Poehlmann
A
,
Lemke
A
,
Semczuk
A
,
Roessner
A
, et al
GPER-1 expression decreases during breast cancer tumorigenesis
.
Cancer Invest
2013
;
31
:
309
15
.
56.
Beggs
AD
,
Dilworth
MP
,
Domingo
E
,
Midgley
R
,
Kerr
D
,
Tomlinson
IP
, et al
Methylation changes in the TFAP2E promoter region are associated with BRAF mutation and poorer overall & disease free survival in colorectal cancer
.
Oncoscience
2015
;
2
:
508
16
.
57.
Maegawa
S
,
Gough
SM
,
Watanabe-Okochi
N
,
Lu
Y
,
Zhang
N
,
Castoro
RJ
, et al
Age-related epigenetic drift in the pathogenesis of MDS and AML
.
Genome Res
2014
;
24
:
580
91
.
58.
Lu
YC
,
Chen
YJ
,
Wang
HM
,
Tsai
CY
,
Chen
WH
,
Huang
YC
, et al
Oncogenic function and early detection potential of miRNA-10b in oral cancer as identified by microRNA profiling
.
Cancer Prev Res
2012
;
5
:
665
74
.
59.
Tian
Y
,
Luo
A
,
Cai
Y
,
Su
Q
,
Ding
F
,
Chen
H
, et al
MicroRNA-10b promotes migration and invasion through KLF4 in human esophageal cancer cell lines
.
J Biol Chem
2010
;
285
:
7986
94
.
60.
Ma
L
,
Teruya-Feldstein
J
,
Weinberg
RA
. 
Tumour invasion and metastasis initiated by microRNA-10b in breast cancer
.
Nature
2007
;
449
:
682
8
.
61.
Bourguignon
LY
,
Wong
G
,
Earle
C
,
Krueger
K
,
Spevak
CC
. 
Hyaluronan-CD44 interaction promotes c-Src-mediated twist signaling, microRNA-10b expression, and RhoA/RhoC up-regulation, leading to Rho-kinase-associated cytoskeleton activation and breast tumor cell invasion
.
J Biol Chem
2010
;
285
:
36721
35
.
62.
Jiang
J
,
Gusev
Y
,
Aderca
I
,
Mettler
TA
,
Nagorney
DM
,
Brackett
DJ
, et al
Association of MicroRNA expression in hepatocellular carcinomas with hepatitis infection, cirrhosis, and patient survival
.
Clin Cancer Res
2008
;
14
:
419
27
.
63.
Severino
P
,
Bruggemann
H
,
Andreghetto
FM
,
Camps
C
,
Klingbeil Mde
F
,
de Pereira
WO
, et al
MicroRNA expression profile in head and neck cancer: HOX-cluster embedded microRNA-196a and microRNA-10b dysregulation implicated in cell proliferation
.
BMC Cancer
2013
;
13
:
533
.
64.
Li
Z
,
Lei
H
,
Luo
M
,
Wang
Y
,
Dong
L
,
Ma
Y
, et al
DNA methylation downregulated mir-10b acts as a tumor suppressor in gastric cancer
.
Gastric Cancer
2015
;
18
:
43
54
.
65.
Deutsch
AJ
,
Angerer
H
,
Fuchs
TE
,
Neumeister
P
. 
The nuclear orphan receptors NR4A as therapeutic target in cancer therapy
.
Anticancer Agents Med Chem
2012
;
12
:
1001
14
.
66.
Kojetin
DJ
,
Burris
TP
. 
REV-ERB and ROR nuclear receptors as drug targets
.
Nat Rev Drug Discov
2014
;
13
:
197
216
.
67.
Wang
YY
,
Li
L
,
Ye
ZY
,
Zhao
ZS
,
Yan
ZL
. 
MicroRNA-10b promotes migration and invasion through Hoxd10 in human gastric cancer
.
World J Surg Oncol
2015
;
13
:
259
.
68.
Uekusa
S
,
Kawashima
H
,
Sugito
K
,
Yoshizawa
S
,
Shinojima
Y
,
Igarashi
J
, et al
Nr4a3, a possibile oncogenic factor for neuroblastoma associated with CpGi methylation within the third exon
.
Int J Oncol
2014
;
44
:
1669
77
.