Abstract
Differentially methylated oral squamous cell carcinoma (OSCC) biomarkers, identified in vitro and validated in well-characterized surgical specimens, have shown poor clinical correlation in cohorts with different risk profiles.
To overcome this lack of relevance, we used the HumanMethylation27 BeadChip, publicly available methylation and expression array data, and quantitative methylation specific PCR to uncover differential methylation in OSCC clinical samples with heterogeneous risk profiles.
A two stage design consisting of discovery and prevalence screens was used to identify differential promoter methylation and deregulated pathways in patients diagnosed with OSCC and head and neck squamous cell carcinoma.
Promoter methylation of KIF1A (κ = 0.64), HOXA9 (κ = 0.60), NID2 (κ = 0.60), and EDNRB (κ = 0.60) had a moderate to substantial agreement with clinical diagnosis in the discovery screen. HOXA9 had 68% sensitivity, 100% specificity, and a 0.81 Area Under the Curve (AUC). NID2 had 71% sensitivity, 100% specificity, and a 0.79 AUC. In the prevalence screen, HOXA9 (κ = 0.82) and NID2 (κ = 0.80) had an almost perfect agreement with histologic diagnosis. HOXA9 had 85% sensitivity, 97% specificity, and a 0.95 AUC. NID2 had 87% sensitivity, 95% specificity, and a 0.91 AUC. A HOXA9 and NID2 gene panel had 94% sensitivity, 97% specificity, and a 0.97 AUC. In saliva, from OSCC cases and controls, HOXA9 had 75% sensitivity, 53% specificity, and a 0.75 AUC. NID2 had 87% sensitivity, 21% specificity, and a 0.73 AUC.
This phase I Biomarker Development Trial identified a panel of differentially methylated genes in normal and OSCC clinical samples from patients with heterogeneous risk profiles. This panel may be useful for early detection and cancer prevention studies. Cancer Prev Res; 4(7); 1061–72. ©2011 AACR.
Introduction
There are an estimated half a million cases of oral and oropharyngeal cancer worldwide (1). Oral cavity mortality rates have remained unchanged or have decreased in some countries around the world from 1995 to 2005 (2). It is well established that oral cancer incidence and mortality are higher in regions where tobacco habits, in the form of chewing and/or smoking, with or without alcohol intake, are common (3). It is also known that the distribution and occurrence of oral cancer cases vary by age, ethnic group, culture and lifestyle, and level of country development (4). For example, the population attributable risk (PAR) because of the effects of tobacco and alcohol on oral cavity cancers is lower in the United States than in Europe and Latin America (5).
Although incidence and mortality rates are relatively low compared with other cancers, oral cancer patients are usually diagnosed at an advanced stage, which is associated with worse prognosis and higher radio- and chemotherapy morbidity. Moreover, the quality of life of oral cavity cancer patients is disproportionately compromised, because surgical therapy can be mutilating and often has significant effects on swallowing, speech, and physical appearance (6). Evidently, improved oral cancer prevention, early detection, diagnostic, and clinical management tools are needed to identify high-risk patients, such as patients with smoking and alcohol exposures, patients without adequate access to health care, and patients with high-risk lesions such as leukoplakia, which may progress to carcinoma (7).
Quantitative methylation specific PCR (qMSP) has been proposed as a platform to develop early detection, diagnostic, and clinical management biomarkers in head and neck squamous cell carcinoma (HNSCC) (8–10). However, previous efforts at identifying epigenomic biomarkers in HNSCC have been limited by candidate gene or cell culture-based discovery approaches and validation technologies on well-characterized pathology specimens from homogeneous cohorts, all of which have limited the clinical relevance of the results (11–13). Numerous genes in oral squamous cell carcinoma (OSCC) tissues have been studied for promoter methylation status. It has also been shown that histologically normal tissue adjacent to tumors and premalignant lesions can also have high levels of methylation of some genes, suggesting that methylation is an early event in oral carcinogenesis (14). Hypermethylated genes in OSCC have been associated with alterations in proliferation, DNA repair, apoptosis, cell–cell adhesion, and angiogenesis. Clinically, they have been associated with tumor aggressiveness, invasiveness, and the malignant transformation of oral epithelial dysplasia (14). Recently, our laboratory showed that promoter hypermethylation is present in OSCC premalignant lesions, is useful for HNSCC detection, and plays a role in the development of resistance to cytotoxic chemotherapeutic agents (15–18). Other laboratories have shown that promoter methylation of genes in saliva may serve as potential biomarkers for early detection of primary and relapsing OSCC/HNSCC (19, 20).
Pharmacologic unmasking in cell lines with subsequent validation in surgical specimens has been the canonical pseudogenome-wide discovery approach used to identify differentially methylated genes in OSCC (21). This approach has provided a description of hypermethylated and silenced tumor suppressor genes or hypomethylated candidate proto-oncogenes, in well-characterized and carefully dissected samples from, in large part, North American patients (22, 23). The results obtained with the pharmacologic unmasking approach in cell lines, however, have shown poor clinical application, probably because of pharmacologic bias and methylation changes associated with cell line passage, as well as the high degree of cellular heterogeneity in tumor tissue and saliva samples.
The advent of high-throughput genomic platforms provides the opportunity to examine novel approaches. We set out to overcome the limitations of previous methods by using a novel study design in clinically defined samples from populations with different risk profiles (24–26). We used high-density promoter methylation platforms, publicly available expression arrays, and qMSP in a phase I Biomarker Development Trial (BDT) to identify differentially methylated genes that can distinguish between OSCC/HNSCC tumor and normal tissue in study populations with different risk factors (8, 27, 28). A novel feature of this project, which facilitates the heterogeneous risk factor approach, is the 2-stage design of the study. In the first stage, called the discovery screen, we used clinical samples obtained from Spain, a population with high OSCC risk associated with tobacco smoking and alcohol consumption (29, 30). In the second stage, called the “prevalence screen,” we analyzed the promoter methylation status of the best performing hypermethylated genes identified in the discovery screen on DNA isolated from a separate cohort of HNSCC tumor samples from North America with well-characterized histopathology. Markers that perform well in a population with a heterogeneous risk profile in a clinical setting in the discovery screen have a higher probability of performing well in a well-characterized set of confirmed cases and controls in the prevalence screen. This novel study design maximizes resource investment in phase I BDT, and ensures that more robust biomarkers are tested in phase II. Bioinformatics, biostatistical, and pathway analyses were used to identify relevant genes, and qMSP was used to determine the differential methylation identified in particular genes or pathways.
Methods
Patient selection
Patients for this study were consented at hospitals in Baltimore (n = 143) and Madrid, Spain (n = 36). Normal, premalignant, and OSCC tumor tissue was collected from patients who visited the outpatient clinics of Hospital Gregorio Marañón in Madrid. OSCC tumor and adjacent tissue samples were collected at Hospital Ramón y Cajal in Madrid. Tumor, normal tissue, and saliva rinses from HNSCC and healthy patients were collected at Johns Hopkins Hospital in Baltimore. Salivary rinses were obtained by rinse and gargle of 20 mL saline solution. All participants signed a consent form that clearly explained the risks and benefits of the study. The study was approved by the Ethics Committee of each participating hospital, as well as by the Johns Hopkins Institutional Review Board.
Samples
Tissue samples were frozen in liquid nitrogen and stored at −80°C. The salivary rinses were centrifuged, the supernatant was discarded, and DNA was isolated from the pellet. Tissue samples were shipped to our laboratory at the Head and Neck Cancer Research Division of the Department of Otolaryngology at Johns Hopkins School of Medicine. Tissue (5 mg) and saliva pellets were digested with 1% SDS and 50 μg/mL proteinase K (Boehringer Mannheim) at 48°C overnight, followed by phenol/chloroform extraction and ethanol precipitation of DNA as previously described.
Discovery screen
Bisulfite modification of genomic DNA (2 μg) was carried out with EpiTect Bisulfite Kit (QIAGEN) according to the manufacturer's protocol. We hybridized bisulfite converted DNA from normal (n = 4), leukoplakia (n = 4), and OSCC tissue (n = 4) samples to the HumanMethylation27 DNA Analysis BeadChip assay, which quantitatively interrogates 27,578 CpG sites from 14,495 protein-coding gene promoters and 110 microRNA gene promoters at single-nucleotide resolution. The Infinium Methylation Assay detects cytosine methylation at CpG islands based on highly multiplexed genotyping. The assay interrogates these chemically differentiated loci by using 2 site-specific probes, 1 designed for the methylated locus (M bead type) and another for the unmethylated locus (U bead type). Single-base extension of the probes incorporates a labeled ddNTP, which is subsequently stained with a fluorescence reagent. A β value was used to denote the methylation level of each CpG locus as the ratio of the methylated signal to the total signal at that locus: β value = methylated intensity/(methylated intensity + unmethylated intensity).
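The β value definition above can be illustrated with a minimal Python sketch. The function name `beta_value` and the probe intensities are hypothetical and are not taken from the study data; the sketch follows the ratio as stated in the text and omits any stabilizing offset that array software may add to the denominator.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Return beta = M / (M + U) for a single CpG locus."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total probe intensity must be positive")
    return methylated / total


# Hypothetical (M bead, U bead) intensities for three loci; illustration only.
intensities = {
    "locus_a": (9000.0, 1000.0),  # mostly methylated
    "locus_b": (500.0, 4500.0),   # mostly unmethylated
    "locus_c": (2500.0, 2500.0),  # half methylated
}

betas = {name: beta_value(m, u) for name, (m, u) in intensities.items()}
for name, beta in betas.items():
    print(f"{name}: beta = {beta:.2f}")
```

A β value near 0 indicates an unmethylated locus and a value near 1 a fully methylated locus.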
Hierarchical clustering analysis and heatmap creation
The average β values of all probes on the Illumina Infinium arrays were subjected to log10 transformation and used to generate a heatmap based on unsupervised hierarchical clustering with Spotfire DecisionSite. This clustering was based on the unweighted average method by using correlation as the similarity measure and ordering by average values. The color red was selected to represent hypermethylated genes and the color blue to represent hypomethylated genes.
Differential methylation bioinformatics
Bioinformatics strategies were used for background correction, normalization, and data analysis of differentially methylated genomic regions between tumor, leukoplakia, and normal tissue. The gene selection from the Illumina Infinium assay data was carried out in a stepwise manner. An F test was carried out across all 12 samples to identify genes with a significant difference in β values between normal, leukoplakia, and malignant tissue. Because the empirical P values were calculated genome-wide, adjustment for multiple testing was carried out. Rather than using a Bonferroni correction, which is very stringent, the P values were transformed into q-values, which are measures of significance in terms of the false discovery rate instead of the false positive rate normally associated with P values (31). q-Values were computed from the empirical P values by using the Benjamini–Hochberg correction. Probes with q-values less than 0.05 were deemed statistically significant and were included in the final gene list. We then selected only those genes that showed a methylation difference of at least 0.2 between cancer and normal tissues and a β value of at least 0.3 in cancer. All bioinformatics analyses were carried out by using R version 2.11.1.
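The stepwise selection described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study: the function names, probe identifiers, P values, and β values are all hypothetical, and the filter is written for the hypermethylation direction (cancer minus normal), which is an assumption about how the 0.2 difference was applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_hypermethylated(probes, q_cutoff=0.05, min_delta=0.2, min_cancer_beta=0.3):
    """Keep probes passing the q-value, beta difference, and cancer beta filters."""
    q_values = benjamini_hochberg([p["p"] for p in probes])
    selected = []
    for probe, q in zip(probes, q_values):
        delta = probe["beta_cancer"] - probe["beta_normal"]
        if q < q_cutoff and delta >= min_delta and probe["beta_cancer"] >= min_cancer_beta:
            selected.append(probe["id"])
    return selected


# Hypothetical probes; P values stand in for the per-probe F test results.
probes = [
    {"id": "probe_1", "p": 0.01, "beta_cancer": 0.70, "beta_normal": 0.20},
    {"id": "probe_2", "p": 0.04, "beta_cancer": 0.60, "beta_normal": 0.10},
    {"id": "probe_3", "p": 0.03, "beta_cancer": 0.25, "beta_normal": 0.02},
    {"id": "probe_4", "p": 0.20, "beta_cancer": 0.80, "beta_normal": 0.30},
]

selected = select_hypermethylated(probes)
print(selected)
```

In this toy input only the first probe survives all three filters; the second has a raw P value below 0.05 but an adjusted value above it, which shows why the multiple-testing step changes the gene list.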
Comparison to existing databases of known methylation events in cancer
The list of differentially hypermethylated genes in tumor tissue identified in the discovery screen was first compared against existing databases of known methylation events in cancer (32, 33). We generated a list of genes that have been previously shown to be hypermethylated in OSCC/HNSCC and in other tumor tissues.
Cross reference of hypermethylated genes with publicly available OSCC gene expression arrays
We then searched the Gene Expression Omnibus database (National Center for Biotechnology Information) for published expression analyses, which compared favorably with our Infinium Methylation Assay in terms of analyzing OSCC tissues against normal oral tissues from healthy patients, and not to adjacent normal oral tissues. GSE10121 hybridized 35 OSCC samples and 6 oral mucosa tissues from healthy participants to a whole-transcriptome spotted array that contains 35,035 gene-specific oligos (Human OligoSet 4.0; ref. 34). The expression of genes identified to be hypermethylated in tumor tissue was examined for evidence of tumor downregulation in the expression arrays, and a list of genes showing both hypermethylation and downregulation was generated (Fig. 1).
Flowchart of the data analysis and integration tasks carried out to identify HOXA9 and NID2 as 2 novel hypermethylated genes in OSCC and HNSCC.
Ingenuity pathway analysis
Pathway and ontology analyses were conducted to identify how differential methylation alters cellular networks and signaling pathways in OSCC. A list of RefSeq identifiers for hypermethylated/downregulated genes was uploaded to the Ingenuity Pathway Analysis (IPA) program, enabling exploration of gene ontology and molecular interaction. Each uploaded gene identifier was mapped to its corresponding gene object (focus genes) in the Ingenuity Pathways Knowledge Base. Core networks were constructed for both direct and indirect interactions by using default parameters, and the focus genes with the highest connectivity to other focus genes were selected as seed elements for network generation. New focus genes with high-specific connectivity (overlap between the initialized network and gene's immediate connections) were added to the growing network until the network reached a default size of 35 nodes. Nonfocus genes (those that were not among our differentially methylated input list) that contained a maximum number of links to the growing network were also incorporated. The ranking score for each network was then computed by a right-tailed Fisher's exact test as the negative log of the probability that the number of focus genes in the network arose by random chance. Similarly, significances for functional enrichment of specific genes were also determined by the right-tailed Fisher's exact test, by using all input genes as a reference set.
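The right-tailed Fisher's exact test behind the network score can be sketched generically as a hypergeometric tail probability. This is an illustrative Python sketch, not IPA's own computation, which is not described in the text; the function names and the gene counts in the example are hypothetical, and the use of a base-10 logarithm is an assumption.

```python
from math import comb, log10


def right_tailed_fisher(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K are focus genes."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k + 1, ..., upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def network_score(k: int, n: int, K: int, N: int) -> float:
    """Negative log10 of the right-tailed P value (assumed log base)."""
    return -log10(right_tailed_fisher(k, n, K, N))


# Hypothetical example: a 35-node network containing 12 focus genes, drawn
# from a reference set of 1,000 genes that includes 140 focus genes.
p_value = right_tailed_fisher(k=12, n=35, K=140, N=1000)
score = network_score(k=12, n=35, K=140, N=1000)
print(f"P = {p_value:.3g}, score = {score:.2f}")
```

A larger score means that the observed number of focus genes in the network is less likely to have arisen by chance.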
Validation of in silico findings with quantitative methylation specific PCR
qMSP was used to validate the candidate genes identified in the discovery screen on a cohort of oral cavity tissue samples from noncancer and OSCC patients from Spain and the United States. Briefly, bisulfite-modified DNA was used as template for fluorescence-based real-time PCR, as previously described (35). Fluorogenic PCR reactions were carried out in a reaction volume of 20 μL consisting of 600 nmol/L of each primer; 200 μmol/L probe; 0.75 units platinum Taq polymerase (Invitrogen); 200 μmol/L each of dATP, dCTP, dGTP, and dTTP; 200 nmol/L ROX dye reference (Invitrogen); 16.6 mmol/L ammonium sulfate; 67 mmol/L Trizma (Sigma); 6.7 mmol/L magnesium chloride; 10 mmol/L mercaptoethanol; and 0.1% dimethylsulfoxide. Duplicates of 3 μL of bisulfite-modified DNA solution were used in each real-time MSP amplification reaction. Primers and probes were designed to amplify a segment of a CpG island in the promoter of genes of interest and of a reference gene, actin-B (ACTB) as previously described. Primers and probes were tested on positive (genomic methylated bisulfite converted DNA) and negative controls (genomic unmethylated bisulfite converted DNA) to ensure amplification of the desired product and nonamplification of unmethylated DNA, respectively. Primer and probe sequences and annealing temperatures are provided in Supplementary Table S1.
Amplification reactions were carried out in 384-well plates in a 7900 Sequence Detector (Perkin-Elmer Applied Biosystems) and were analyzed by SDS 2.2.1 (Sequence Detector System; Applied Biosystems). Thermal cycling was initiated with a first denaturation step at 95°C for 3 minutes, followed by 50 cycles at 95°C for 15 seconds, and annealing temperature for 1 minute. Each plate included patient DNA samples, positive controls (bisulfite converted hypermethylated universal DNA standard, Zymo Research), and multiple water blanks as nontemplate controls. Serial dilutions (60–0.006 ng) of this DNA standard were used to construct a calibration curve for each plate. The relative level of methylated DNA for each gene in each sample was determined as a ratio of qMSP for the amplified gene to ACTB and then multiplied by 100 for easier tabulation. The samples were categorized as unmethylated or methylated based on detection of methylation above a threshold set for each gene. This threshold was determined by using receiver operating characteristic (ROC) curves analyzing the levels and distribution of methylation, if any, in normal tissues.
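The relative methylation calculation and the threshold call can be illustrated with a short Python sketch. The function names and the duplicate quantities are hypothetical; the cutoff of 13.11 is the value reported for HOXA9 in the discovery screen (Table 2) and is used here only as an example.

```python
from statistics import mean


def relative_methylation(gene_duplicates, actb_duplicates):
    """(mean gene quantity / mean ACTB quantity) x 100 for one sample."""
    actb = mean(actb_duplicates)
    if actb <= 0:
        raise ValueError("ACTB quantity must be positive")
    return mean(gene_duplicates) / actb * 100


def is_methylated(value, cutoff):
    """Call a sample methylated when its value lies above the gene's cutoff."""
    return value > cutoff


# Hypothetical duplicate quantities read off the plate's calibration curve.
gene_quantities = [1.2, 1.4]
actb_quantities = [5.0, 5.4]

value = relative_methylation(gene_quantities, actb_quantities)
call = is_methylated(value, cutoff=13.11)
print(f"relative methylation = {value:.2f}, methylated = {call}")
```

Normalizing to ACTB corrects for differences in the amount of amplifiable bisulfite-converted DNA between samples before the cutoff is applied.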
Prevalence screen
We then analyzed the promoter methylation status of the best performing hypermethylated genes found in the discovery screen on DNA from a set of well-characterized HNSCC tumor samples and healthy patients. This allowed the validation of the hypermethylated genes in an independent set of tumors, as well as provided an estimation of the hypermethylation prevalence among a larger number of tumors with well-characterized pathology. Furthermore, to examine the feasibility of creating a diagnostic panel we examined the promoter methylation status of the best performing candidate tumor suppressor genes in saliva samples from HNSCC and healthy patients.
Biostatistics
Cohen's kappa (κ) was used to test the reliability of qMSP results in identifying tumor tissue. Because there is no clear-cut “gold standard” for the qMSP results, equal weight was applied to promoter methylation for all genes. A κ statistic less than 0 would suggest poor agreement, 0 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 almost perfect. CIs were calculated for the κ statistic by using the Stata command “kapci.” The command uses an analytic method for dichotomous variables (simple 2-by-2 comparisons) and a bootstrap method for more complex cases. ROC curves were drawn to obtain sensitivity and specificity in the discovery screen, and the most suitable cutoff values were chosen. To measure the association between 2 qMSP results and tumor status, the χ² statistic and OR were calculated. The inverse-logit function logit⁻¹(x) was used to transform the qMSP results from continuous methylation values to probabilities in single predictor logistic regression models. Scatterplots, volcano plots, and ROC curves were drawn to describe the results. All biostatistics analyses were conducted by using Stata 11 and R 2.11.1 (www.r-project.org).
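The agreement and accuracy measures described above can be sketched for a single 2-by-2 table. This Python sketch is illustrative only: the counts are hypothetical and do not reproduce any result in this study, the function names are assumptions, and CIs (computed with kapci in the study) are not included.

```python
from math import exp


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2-by-2 table of test result versus diagnosis."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement computed from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to the agreement bands listed in the text."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def inverse_logit(x):
    """logit^-1(x) = 1 / (1 + e^-x), mapping a linear predictor to (0, 1)."""
    return 1 / (1 + exp(-x))


# Hypothetical counts: methylated tumors, methylated normals,
# unmethylated tumors, unmethylated normals.
tp, fp, fn, tn = 40, 5, 10, 45

kappa = cohen_kappa(tp, fp, fn, tn)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
print(f"sensitivity = {sensitivity(tp, fn):.2f}, specificity = {specificity(tn, fp):.2f}")
print(f"inverse_logit(0) = {inverse_logit(0.0):.2f}")
```

Unlike percentage agreement, κ discounts the agreement expected by chance from the marginal totals, which is why the two columns of an agreement table can rank genes differently.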
Results
Discovery screen: methylation progression in OSCC
The bisulfite converted DNA from normal (n = 4), leukoplakia (n = 4), and OSCC tissue (n = 4) samples was hybridized to the HumanMethylation27 DNA Analysis BeadChip assay, which quantitatively interrogates 27,578 CpG sites from 14,495 protein-coding gene promoters and 110 microRNA gene promoters at single-nucleotide resolution. The average β values of all probes were log-transformed and used to generate a heatmap based on unsupervised hierarchical clustering. The clustering of all CpG loci (p < 0.05) clearly distinguishes between methylation events in normal, leukoplakia, and oral cancer tissue (Fig. 2A). Hypomethylated genes can be seen in blue and hypermethylated genes in red. A closer examination of differential methylation in a subset of genes shows a progression to hypermethylation in OSCC samples when compared with normal and leukoplakia samples (Fig. 2B).
A, unsupervised hierarchical clustering based on the unweighted average method by using correlation as the similarity measure and ordering by average values. The color red was selected to represent hypermethylated genes and the color green to represent hypomethylated genes. Normal mucosa, leukoplakia, and OSCC samples were examined by using the HumanMethylation27 DNA Analysis BeadChip assay, which interrogates 27,578 CpG sites from 14,495 protein-coding gene promoters and 110 microRNA gene promoters. B, a closer examination of differential methylation in a subset of genes shows a progression to hypermethylation in OSCC samples when compared with normal and leukoplakia samples. C, Venn diagrams show the overlaps of significantly hypermethylated genes in cancer when compared with normal tissue, genes hypermethylated in leukoplakia when compared with normal mucosa, and hypermethylated genes in tumor when compared with leukoplakia tissue. D, Venn diagrams show the overlaps of significantly hypomethylated genes in tumor when compared with normal samples, genes hypomethylated in leukoplakia when compared with normal tissue, and genes hypomethylated in leukoplakia when compared with tumor tissue. E, the progression of differential methylation events between the 3 groups of tissue examined with the Infinium Methylation Assay (normal oral mucosa, leukoplakia, and OSCC tissues) is depicted. The numbers of potential tumor suppressor genes and potential proto-oncogenes are shown for every 2-way comparison between the 3 histology groups. F, a bar graph showing the chromosomal frequency of hypermethylated genes in OSCC when compared with normal mucosa. The majority of the 301 hypermethylated genes are clustered from chromosome 1 to chromosome 11. G, a bar graph showing the chromosomal frequency of hypomethylated genes in OSCC when compared with normal mucosa. The majority of the 62 hypomethylated genes are clustered between chromosome 8 and chromosome 19.
The bioinformatics strategy identified a progression of differential methylation events between the 3 tissue sample groups examined with the Infinium Methylation Assay. We observed 301 potential tumor suppressor genes that were significantly hypermethylated in cancer when compared with normal tissue, 92 genes hypermethylated in leukoplakia when compared with normal mucosa, and 143 hypermethylated genes in tumor when compared with leukoplakia tissue. There were 86 hypermethylated genes overlapping the 143 tumor/leukoplakia and 301 tumor/normal differentially methylated genes (Fig. 2C). We also observed 62 potential proto-oncogenes hypomethylated in tumor when compared with normal samples, 168 genes hypomethylated in leukoplakia when compared with normal tissue, and 47 genes hypomethylated in leukoplakia when compared with tumor tissue. Ten genes overlapped the 62 cancer/normal and the 47 cancer/leukoplakia differentially methylated genes. Four genes overlapped the 62 cancer/normal and the 168 leukoplakia/normal differentially methylated genes (Fig. 2D). In addition, 92 potential tumor suppressor genes were hypermethylated in leukoplakia when compared with normal tissue and 168 potential proto-oncogenes were hypomethylated in leukoplakia when compared with normal tissue. Furthermore, 143 potential tumor suppressor genes were hypermethylated in cancer when compared with leukoplakia tissue and 47 potential proto-oncogenes were hypomethylated in leukoplakia when compared with tumor tissue (Fig. 2E).
We then examined the chromosomal distribution of hypermethylated (Fig. 2F) and hypomethylated (Fig. 2G) genes in OSCC. There is a nonstochastic distribution for hypermethylation and hypomethylation events, which may be a reflection of both driving oncogenic transformative events and phenotypic changes resultant from the oncogenic transformation. Most of the 301 hypermethylated genes (Supplementary Table S2a) are clustered from chromosome 1 to 11, whereas the majority of the 62 hypomethylated (Supplementary Table S2b) genes are clustered between chromosome 8 and 19.
Gene ontology and IPA of hypermethylated genes
The 301 differentially hypermethylated genes were analyzed for their biological significance by using Geneontology (Spotfire) and IPA. These genes were found to be associated with the following pathways intimately related to oncogenic transformation: cell adhesion, cell proliferation, growth regulation, and cell death (P < 0.05).
The top associated network functions in IPA, shown in Supplementary Table S3, are pathways directly related to tumorigenesis, such as cell signaling and interaction; nucleic acid metabolism; DNA replication, recombination, and repair; cellular assembly and organization; and cellular function and maintenance.
Analysis of correlation of promoter hypermethylation with gene expression
From the list of 301 hypermethylated genes in cancer, we were able to find expression values for 275 of them in the expression arrays selected for methylation-transcriptional silencing analysis, 140 of which were downregulated. A volcano plot illustrating the correlation between the expression array results and our list of hypermethylated and hypomethylated genes is shown in Figure 3A. A Venn diagram showing the relationship of downregulated genes in the expression array to the hypermethylated gene list is depicted in Figure 3B.
A, volcano plot of P values versus effect size in the expression arrays. The log10 of the fold change P value of tumor versus control for the 9,441 upregulated genes is plotted to the right of the zero effect size value in the center, and that for the 8,356 downregulated genes is plotted to the left. The 301 significantly hypermethylated genes in OSCC are in red: 140 genes are plotted to the left and 161 are plotted to the right of the zero effect size value in the center. The y axis represents the log10 of the fold change P value of tumor versus control. The x axis represents the log2 fold change of tumor versus control. B, a Venn diagram shows the overlapping 140 genes, significantly hypermethylated in the methylation platforms and significantly downregulated in the expression arrays.
Comparison with existing databases of known methylation events in cancer
We generated a list of genes that have been previously shown to be hypermethylated in OSCC/HNSCC and in other tumor tissues.
Target gene selection
By using multiple selection criteria, we chose 8 genes to validate on OSCC and normal oral mucosa tissue samples. The 8 genes were among the 140 hypermethylated genes identified in the discovery screen that were also downregulated in the public OSCC expression data set we examined. Four of those genes (EDNRB, HOXA9, GATA4, and NID2) were identified as hypermethylated in non-OSCC/HNSCC tumor tissues in a PubMed search (36–40). Another 4 genes (MCAM, KIF1A, DCC, and CALCA) were also found to be hypermethylated in existing databases of known methylation events in cancer (17, 18, 41). Hypermethylation of EDNRB, KIF1A, and DCC has been previously shown to be associated with HNSCC histology. We thus chose these genes as benchmarks against which we could evaluate our genomic approach. CALCA, HOXA9, GATA4, and NID2 had never been shown to be hypermethylated in OSCC.
Promoter methylation association with oral cavity malignancy
Differential promoter methylation status was confirmed with qMSP in the discovery screen. The discovery screen cohort consisted of 24 OSCC samples and 12 normal oral cavity mucosa tissues obtained from the Madrid OSCC study. Six of the 8 genes (EDNRB, HOXA9, GATA4, NID2, KIF1A, and DCC) showed differential methylation between cases and controls (Fig. 4). The promoter methylation status of the remaining 2 genes, CALCA and MCAM, however, did not seem to differ between tumor tissue and normal mucosa.
Scatterplots of qMSP analysis of candidate genes promoters in the discovery screen cohort, which consisted of 24 OSCC samples and 12 normal oral cavity mucosa tissues. The relative level of methylated DNA for each gene in each sample was determined as a ratio of MSP for the amplified gene to ACTB and then multiplied by 100 [(average value of duplicates of gene of interest÷average value of duplicates of ACTB) × 100] for EDNRB, HOXA9, GATA4, NID2, MCAM, KIF1A, DCC, and CALCA. Red line denotes cutoff value.
Promoter methylation of KIF1A (κ = 0.64), HOXA9 (κ = 0.60), NID2 (κ = 0.60), and EDNRB (κ = 0.60) had a moderate to substantial agreement with clinical diagnosis in the discovery screen. All 4 genes also had a percentage agreement of 79% or more (Table 1).
Kappa statistic (κ) of interclassification agreement, 95% CIs, and agreement percentage between classification of samples by promoter methylation of EDNRB, HOXA9, GATA4, NID2, MCAM, KIF1A, DCC, CALCA, and by clinical diagnosis (discovery screen − n = 36) and by histology (prevalence screen − n = 92)
Variable | κ (95% CI) | Percentage agreement (%) |
---|---|---|
Discovery screen | | |
HOXA9 | 0.60 (0.36–0.84) | 79 |
NID2 | 0.60 (0.36–0.84) | 80 |
GATA4 | 0.37 (0.09–0.65) | 67 |
KIF1A | 0.64 (0.39–0.89) | 82 |
EDNRB | 0.60 (0.36–0.84) | 79 |
DCC | 0.44 (0.18–0.70) | 71 |
MCAM | −0.12 (−0.42–0.18) | 41 |
CALCA | 0.24 (−0.02–0.51) | 59 |
Prevalence screen | | |
HOXA9 | 0.82 (0.70–0.94) | 91 |
NID2 | 0.80 (0.68–0.92) | 90 |
Sensitivity, specificity, Area Under the Curve (AUC), and methylation cutoff values for each of the genes evaluated in the discovery screen are shown in Table 2. Of the 4 candidate genes that had an AUC value of 0.75 or more, only NID2 and HOXA9 had 100% specificity and sensitivity more than 70%. We selected these 2 genes for further testing with the prevalence screen.
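The quantities reported in Table 2 can be illustrated with a short sketch. It computes sensitivity and specificity at a fixed methylation cutoff, and the AUC as the probability that a randomly chosen case has a higher methylation value than a randomly chosen control (ties counted as one half). The sample values are hypothetical, and the strict "greater than cutoff" rule is an assumption. The sketch does not reproduce how the cutoffs themselves were optimized.

```python
def sensitivity_specificity(case_values, control_values, cutoff):
    """Assumed rule: a sample is test-positive when its value > cutoff."""
    true_positives = sum(v > cutoff for v in case_values)
    true_negatives = sum(v <= cutoff for v in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


def auc(case_values, control_values):
    """Rank-based AUC: P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical relative methylation values (not study data).
cases = [0.0, 5.2, 20.1, 35.7, 60.3]
controls = [0.0, 0.4, 1.2, 14.0]

sens, spec = sensitivity_specificity(cases, controls, cutoff=13.11)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, "
      f"AUC = {auc(cases, controls):.3f}")
```

Unlike sensitivity and specificity, the AUC does not depend on the cutoff, which is why a marker can have a respectable AUC and still perform poorly at a cutoff carried over from another cohort.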
Predictive accuracy of EDNRB, HOXA9, GATA4, NID2, MCAM, KIF1A, DCC, and CALCA with oral squamous cell carcinoma. Discovery screen (n = 36); prevalence screen (n = 92)
Predictor | Sensitivity (%) | Specificity (%) | AUC | Methylation cutoff value |
---|---|---|---|---|
Discovery screen | | | | |
HOXA9 | 68 | 100 | 0.81 | 13.11 |
NID2 | 71 | 100 | 0.79 | 11.48 |
GATA4 | 57 | 89 | 0.72 | 0.96 |
KIF1A | 77 | 92 | 0.79 | 7.14 |
EDNRB | 68 | 100 | 0.83 | 17.42 |
DCC | 55 | 92 | 0.74 | 9.04 |
MCAM | 36 | 58 | 0.45 | 13.44 |
CALCA | 46 | 83 | 0.67 | 12.85 |
Prevalence screen | | | | |
HOXA9 | 85 | 97 | 0.91 | 13.11 |
NID2 | 87 | 95 | 0.92 | 11.48 |
HOXA9 and NID2 | 94 | 97 | 0.97 | a |
a Classified positive (+) if predicted probability for positive outcome (tumor) ≥ 0.5.
Prevalence screen
We examined HOXA9 and NID2 promoter methylation status in 55 HNSCC tumor tissue samples and 37 normal tissue samples obtained from uvulopalatopharyngoplasty (UPPP) procedures carried out in noncancer patients. Minimal or no promoter methylation of HOXA9 and NID2 was observed in the normal oral cavity mucosa samples, whereas varying degrees of hypermethylation were present in the OSCC samples (Fig. 5A).
A, scatterplots of qMSP analysis of candidate genes promoters in the prevalence screen cohort, which consisted of 55 HNSCC tumor tissue samples and 37 normal tissue samples obtained from UPPP procedures carried out in noncancer patients. The relative level of methylated DNA for each gene in each sample was determined as a ratio of MSP for the amplified gene to ACTB and then multiplied by 100 [(average value of duplicates of gene of interest÷average value of duplicates of ACTB) × 100] for HOXA9 and NID2. Red line denotes cutoff value. B, ROC curve of HOXA9 (solid line) and NID2 (dashed line) methylation in a HNSCC prevalence cohort (n = 92).
HOXA9 (κ = 0.82; 95% CI, 0.70–0.94) and NID2 (κ = 0.80; 95% CI, 0.68–0.92) had an almost perfect agreement with histologic diagnosis in the prevalence screen (Table 1). ROC analyses using the cutoff values optimized for the discovery screen revealed that HOXA9 had 85% sensitivity, 97% specificity, and a 0.95 AUC. NID2 had 87% sensitivity, 95% specificity, and a 0.91 AUC (Fig. 5B). A HOXA9 and NID2 gene panel had 94% sensitivity, 97% specificity, and a 0.97 AUC (Table 2).
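The footnote to Table 2 states that the two-gene panel was classified as positive when the predicted probability of tumor was ≥ 0.5. One way such a rule could work is sketched below, assuming the panel is a logistic model with both methylation values as predictors. The model form and the coefficients are hypothetical placeholders introduced for illustration. The fitted coefficients are not given in the text.

```python
import math


def inverse_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def panel_probability(hoxa9, nid2, b0, b_hoxa9, b_nid2):
    """Predicted tumor probability from two relative methylation values."""
    return inverse_logit(b0 + b_hoxa9 * hoxa9 + b_nid2 * nid2)


def panel_call(hoxa9, nid2, coefficients, threshold=0.5):
    """Positive when the predicted probability is >= threshold (Table 2, note a)."""
    return panel_probability(hoxa9, nid2, *coefficients) >= threshold


# Hypothetical coefficients (intercept, HOXA9 slope, NID2 slope); not fitted values.
HYPOTHETICAL_COEFFICIENTS = (-2.0, 0.10, 0.12)

print(panel_call(20.0, 5.0, HYPOTHETICAL_COEFFICIENTS))
print(panel_call(2.0, 1.0, HYPOTHETICAL_COEFFICIENTS))
```

Under a model of this kind, a sample can be called positive when either gene is strongly methylated, which would be consistent with the panel reaching higher sensitivity than either gene alone in tissue.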
Diagnostic panel in saliva–oral and oropharyngeal squamous cell carcinoma
To test the feasibility of creating a diagnostic panel in saliva, we examined the promoter methylation status of HOXA9 and NID2 in saliva samples from 16 patients with OSCC, 16 patients with oropharyngeal squamous cell carcinoma (OPSCC), and 19 noncancer patients. Methylation of HOXA9 and NID2 promoters in the 51 saliva samples can be seen in Supplementary Figure S1A. Promoter methylation of NID2 (κ = 0.55) and HOXA9 (κ = 0.52) had a moderate agreement with histologic diagnosis (Table 3).
Kappa statistic (κ) of interclassification agreement, 95% CIs, and agreement percentage between sample classification by promoter methylation of HOXA9 and NID2 and by histology on head and neck cancer saliva samples (n = 51)
Variable | κ (95% CI) | Percentage agreement (%) |
---|---|---|
Oral and oropharyngeal cancer | | |
HOXA9 | 0.52 (−0.12–0.42) | 59 |
NID2 | 0.55 (−0.18–0.37) | 59 |
Oral cancer | | |
HOXA9 | 0.21 (−0.11–0.52) | 60 |
NID2 | 0.23 (−0.03–0.49) | 60 |
ROC analyses of the saliva samples, using the methylation cutoff values optimized for the discovery screen, revealed that HOXA9 had a sensitivity of 63%, a specificity of 53%, and an AUC of 0.65. NID2 had a sensitivity of 72%, a specificity of 21%, and an AUC of 0.57 (Table 4).
Predictive accuracy of HOXA9 and NID2 on head and neck cancer saliva samples
Predictor | Sensitivity (%) | Specificity (%) | AUC | Methylation cutoff value |
---|---|---|---|---|
Oral and oropharyngeal cancer (n = 51) | | | | |
HOXA9 | 63 | 53 | 0.65 | 13.11 |
NID2 | 72 | 21 | 0.57 | 11.48 |
Oral cancer (n = 35) | | | | |
HOXA9 | 75 | 53 | 0.75 | 13.11 |
NID2 | 87 | 21 | 0.73 | 11.48 |
HOXA9 and NID2 | 50 | 90 | 0.75 | a |
a Classified positive (+) if predicted probability for positive outcome (tumor) ≥ 0.5.
Diagnostic panel in saliva–oral squamous cell carcinoma
We then examined the use of promoter methylation status of HOXA9 and NID2 for cancer detection in saliva samples from 16 patients with OSCC and 19 noncancer patients. Methylation of HOXA9 and NID2 promoters in the 35 saliva samples can be seen in Supplementary Figure S1B. Promoter methylation of NID2 (κ = 0.23) and HOXA9 (κ = 0.21) had a fair agreement with histologic diagnosis (Table 3). ROC analyses using the cutoff values optimized for the discovery screen revealed that HOXA9 had a sensitivity of 75%, a specificity of 53%, and an AUC of 0.75. NID2 had a sensitivity of 87%, a specificity of 21%, and an AUC of 0.73. A panel of HOXA9 and NID2 had a sensitivity of 50%, a specificity of 90%, and an AUC of 0.77 (Table 4).
The numbers of cancer/noncancer patients, and unmethylated/methylated samples in each category for the discovery, prevalence, and saliva screens are described in Supplementary Table S4. The table also describes the thresholds of promoter methylation levels (cutoff levels). Supplementary Table S5 shows the summary statistics for HOXA9 and NID2 promoter methylation values in tissues and saliva samples.
Discussion
To our knowledge, this is the first study that utilizes an unbiased genome-wide DNA methylation platform to uncover differentially methylated genes in OSCC. We successfully implemented a novel approach that combines high-density promoter methylation platforms with publicly available methylation and gene expression array data to identify novel hypermethylated genes in OSCC and HNSCC. These candidate genes can be used in diagnostic panels for early detection of OSCC and HNSCC in tissue and saliva from patients in countries with different PAR for HNSCC.
We used clinical rather than pathologic characterization of our samples in the discovery screen and well-characterized surgical samples in the prevalence screen. By testing our candidate genes in samples from populations with different risk profiles, we aimed to increase their clinical usefulness to both surgical and dental practitioners. (See Supplementary Table S4 for a description of cancer/noncancer patients and unmethylated/methylated samples described in each category.)
Our results suggest that patient heterogeneity in PAR to the 2 main risk factors for OSCC strengthens this molecular study, and may lead to a better reproducibility of the results in other populations. The focus of this study was not, however, to explore the potential associations of the novel hypermethylated genes we identified in this study with tobacco and alcohol consumption. We first wanted to establish the usefulness of using clinical samples from OSCC patients with heterogeneous risk profiles to identify hypermethylated genes that could distinguish between normal and tumor samples in phase I Biomarker Development studies, utilizing unbiased genome-wide arrays.
Precancerous lesions of the upper aerodigestive tract include leukoplakia, erythroplakia, and leukoerythroplakia. These are clinically defined lesions that have a higher degree of oncogenic risk when compared with normal oral mucosa. When these lesions show evidence of cellular atypia without evidence of invasion, they are defined as dysplastic. The presence of dysplasia increases the oncogenic risk of these lesions. Our focus in this project was to identify markers that may be useful for early diagnosis in populations with different risk profiles, and thus our interest in using samples from Europe in the discovery screen and from North America in the prevalence screen.
The significant differentially hypermethylated genes identified by our approach were found to be associated with oncogenic transformation pathways and cellular functions that are deregulated in cancer. The changes identified by pathway and gene ontology analysis underlie the progressive acquisition of a malignant phenotype in OSCC progression, which should be looked into utilizing a pathway driven approach that is beyond the scope of this article.
The heatmap revealed differential DNA methylation progression, distinguishing methylation or demethylation events unique to each of the following tissue types: normal, leukoplakia, and tumor. Our integrative approach identified HOXA9 and NID2 as novel differentially methylated genes in OSCC in the discovery screen. These genes then proved to have significantly higher methylation levels in tumor than normal mucosa in the prevalence screen, as evidenced by ROC analysis of HNSCC samples. This finding may have clinical applicability in OSCC/HNSCC resection margin assessment, and in situations when an OSCC/HNSCC biopsy is nonconclusive such as in postchemoradiation scenarios.
The evaluation of promoter methylation of HOXA9 and NID2, in separate cohorts of salivary rinses from noncancer individuals and patients with OSCC, revealed high specificity, limited sensitivity, and an AUC of 0.77 for a panel of both genes. These results compare favorably with previous hypermethylated markers evaluated in saliva (17). The fact that we observed much better sensitivity and AUC results in saliva from OSCC patients compared with saliva from OPSCC patients might result from 2 possible causes. The first is the different risk factors associated with OSCC and OPSCC. A large proportion of OPSCC is human papillomavirus related, whereas OSCC is smoking and alcohol related. It may be that viral-related tumors have different methylation patterns than chemical-related tumors. Another plausible cause is that saliva does not bathe the oropharynx to the same extent as it does the oral cavity, and thus there is less representation of tumor tissue cells in the saliva compartment.
The functional role of NID2 and HOXA9 has not been investigated in OSCC/HNSCC. Nidogens are believed to connect laminin and collagen IV networks, hence stabilizing the basement membrane structure. Nidogens are also important for cell adhesion, as they establish contacts with various cellular integrins (36). Loss of nidogen expression in OSCC may thus favor invasion and metastasis of tumor cells by loosening cell interaction with basal membrane and by weakening the strength of the basement membrane itself.
Functional studies have revealed that loss of HOXA9 promotes mammary epithelial cell growth and survival, as well as altered tissue morphogenesis. Restoring HOXA9 expression represses growth and survival, and inhibits the malignant phenotype of breast cancer cells in culture and in xenograft mouse models. HOXA9 has been shown to restrict breast tumor behavior by directly modulating the expression of BRCA1, a DNA repair gene (38). Therefore, HOXA9 hypermethylation may lead to diminished DNA repair capacity in OSCC/HNSCC, thus increasing cancer risk particularly in those patients that smoke tobacco, which has been shown to lead to the formation of oncogenic DNA–polycyclic aromatic hydrocarbon adducts (42).
We used the 2 most common statistical approaches utilized in biomarker studies: modeling disease risk with logistic regression models and evaluating biomarker performance by measuring sensitivity, specificity, ROC, and the AUC (43). We carried out logistic regression modeling with 1 predictor to draw the predictive probability plots for each gene in the discovery, the prevalence, and the saliva screens. Graphical expression of the logistic regression, Pr(HNSCC) = logit⁻¹(b0 + b1 × methylation), with data overlain can be seen in Supplementary Figure S2A–D. The predictor methylation is the qMSP value for each case (1) and each control (0). Cutoff methylation values for each gene are shown by a vertical dotted line. The classification performance for HOXA9 and NID2, in both HNSCC tissue and OSCC saliva, was highly satisfactory and the best published so far for hypermethylated biomarkers in HNSCC tissue and OSCC saliva, respectively.
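A one-predictor model of the form Pr(HNSCC) = logit⁻¹(b0 + b1 × methylation) can be fitted by maximum likelihood in a few lines. The sketch below uses Newton-Raphson updates in plain Python. The fitting routine and the data are illustrative assumptions, since the statistical software used in the study is not named in this section, and the values are not study data. The example data overlap between cases and controls, which keeps the maximum likelihood estimate finite.

```python
import math


def inverse_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_logistic(methylation, outcome, iterations=25):
    """Fit Pr(y = 1) = inverse_logit(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [inverse_logit(b0 + b1 * x) for x in methylation]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - pi for y, pi in zip(outcome, p))
        g1 = sum((y - pi) * x for x, y, pi in zip(methylation, outcome, p))
        # Entries of the 2 x 2 information matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, methylation))
        h11 = sum(wi * x * x for wi, x in zip(w, methylation))
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # Information matrix is close to singular; stop updating.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical qMSP values; 1 = case, 0 = control.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, "
      f"Pr(case | x = 6) = {inverse_logit(b0 + b1 * 6.0):.2f}")
```

In a model of this form, the methylation value at which the predicted probability equals 0.5 is −b0/b1.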
The nonstochastic chromosomal location of hypermethylated and hypomethylated genes in the comparison between normal and OSCC tissue, as well as the similar comparisons with leukoplakia tissue, also deserves further research, which is beyond the scope of this article. The chromosomal location of the significant differential methylation events can be utilized to unravel the interplay between genetic and differential methylation in the progression from normal to OSCC tissue, contribute to the identification of novel therapeutic strategies for this malignancy, and help us understand whether OSCC exhibits oncogene addiction (44, 45), and/or network/pathway addiction. This multidimensional knowledge may provide opportunities for the diagnosis of premalignant squamous lesions, and the development of novel molecular-targeted strategies for the prevention and treatment of OSCC (46).
In summary, our preliminary results suggest HOXA9 and NID2 are promising OSCC biomarkers that should be studied in tissue, saliva, and serum from larger cohorts. These OSCC biomarkers may be useful in oral cancer prevention, early detection, diagnostic, and clinical management studies that target high-risk patients, patients without adequate access to health care, and patients with high-risk lesions such as leukoplakia, which may progress to carcinoma. Our results also suggest that a phase I BDT can be carried out with a small number of genome-wide methylation arrays and subsequent validation in larger independent sample sets. A major feature facilitating our approach is the 2-stage design of the study, which buffers the impact of sample-to-sample variance by using experimental and publicly available data sets. The 2-stage design provides a sensitive approach for differential methylation and deregulated pathway detection in OSCC, while dramatically lowering the overall cost of a phase I Biomarker Development project. Resources can then be utilized on the validation of the initial findings in a larger number of samples from independent cohorts.
Disclosure of Potential Conflicts of Interest
D. Sidransky owns Oncomethylome Sciences, SA stock, which is subject to certain restrictions under University policy. D. Sidransky is a paid consultant to Oncomethylome Sciences, SA, and is a paid member of the company's Scientific Advisory Board. J.A. Califano is the Director of Research of the Milton J. Dance Head and Neck Endowment. The terms of this arrangement are being managed by the Johns Hopkins University in accordance with its conflict of interest policies.
Acknowledgments
This research used a web database application provided by Research Information Technology Systems (RITS), https://www.rits.onc.jhmi.edu/. The funding agencies had no role in the design of the study, data collection or analysis, the interpretation of the results, the preparation of the manuscript, or the decision to submit the manuscript for publication. The authors wish to thank Connover Talbot, from Johns Hopkins Medical Institutions Deep Sequencing and Microarray Core (http://www.microarray.jhmi.edu/member.cgi), for his assistance in microarray and pathway data visualization and integration.
Grant Support
This research was supported in part by the following grant awards: National Cancer Institute (NCI) Early Detection Research Network grant U01 CA84986; an NCI Supplement to Promote Diversity Award to U01 CA84986; a National Institute of Dental and Craniofacial Research (NIDCR) and NIH Specialized Program of Research Excellence grant (SPORE) P50DE019032; NIDCR grant RC2 DE20957; NCI grant 5T32CA009529-20; and National Center on Minority Health and Health Disparities (NCMHD) grant number 5S21MD008130-02. The funding agencies had no role in the design of the study, data collection or analysis, the interpretation of the results, the preparation of the manuscript, or the decision to submit the manuscript for publication.
E. Soudry is the recipient of a fellowship grant from the American Physicians Fellowship for Medicine in Israel.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.