We have validated differences in DNA methylation levels of candidate genes previously reported to discriminate between normal colon mucosa of patients with colon cancer and normal colon mucosa of individuals without cancer. Here, we report that CpG sites in 16 of the 30 candidate genes selected show significant differences in mean methylation level in normal colon mucosa of 24 patients with cancer and 24 controls. A support vector machine trained on these data and data for an additional 66 CpGs yielded an 18-gene signature, composed of ten of the validated candidate genes plus eight additional candidates. This model exhibited 96% sensitivity and 100% specificity in a 40-sample training set and classified all eight samples in the test set correctly. Moreover, we found a moderate–strong correlation (Pearson coefficients r = 0.253–0.722) between methylation levels in colon mucosa and methylation levels in peripheral blood for seven of the 18 genes in the support vector model. These seven genes, alone, classified 44 of the 48 patients in the validation set correctly and five CpGs selected from only two of the seven genes classified 41 of the 48 patients in the discovery set correctly. These results suggest that methylation biomarkers may be developed that will, at minimum, serve as useful objective and quantitative diagnostic complements to colonoscopy as a cancer-screening tool. These data also suggest that it may be possible to monitor biomarker methylation levels in tissues collected much less invasively than by colonoscopy. Cancer Prev Res; 7(7); 717–26. ©2014 AACR.

Colorectal cancer is the second largest cause of cancer deaths in both men and women, despite the availability of an effective screening test (1). In fact, only about half of adults recommended to undergo a screening colonoscopy (those older than 50 years) comply with these guidelines (2). Although highly effective, colonoscopy is both invasive and subjective, depending on the unaided eye of the endoscopist to detect cancer and precancerous lesions. Up to 12% of precancerous lesions are not detected, either as a result of polyp morphology (so-called “flat” or “serrated” polyps) or failure to visualize the entire colon (3), and approximately 10% of colorectal cancers occur in individuals within 3 years of a screening colonoscopy (4).

Because epigenetic changes play a strong role in colorectal cancer (e.g., refs. 5–10; reviewed in refs. 11, 12), we (13, 14) and others (15–17) have suggested that it may be possible to use quantitative, objective measures of epigenetic change in normal tissues to detect colorectal cancer or precancerous lesions. In fact, we identified significant differences in methylation level at CpGs in 114 to 874 genes between the normal colon mucosa of 30 patients with cancer and 18 controls by array-based methylation profiling (13). The practical utility of such differences as an independent screening tool or as a complement to current screening methods depends on whether they are reproducible across test populations. In this report, we describe our validation study of methylation differences across 30 candidate genes selected from our previous study in an independent population of patients with cancer and controls. We have also used a combination of validated candidates and additional small-scale methylation profiling to build support vector machines (SVM) that are effective at discriminating patients with cancer from controls.

Description of control patients

We collected biologic specimens from patients undergoing routine screening colonoscopy at Temple University Medical Center (Philadelphia, PA) to serve as the control arm of the study. We excluded patients with a personal or first-degree family history of cancer of any kind. We also excluded any patients with a previous colonoscopic finding of polyps. Patients who were not excluded at this point underwent a complete colonoscopic evaluation by a board-certified gastroenterologist. If the colonoscope could not be passed to the appendiceal orifice, the patient was excluded. If the complete colon was visualized, cold forceps biopsies were performed. Two biopsies of normal colonic epithelium from the ascending colon (proximal to the hepatic flexure) were pooled as “right colon,” and two biopsies of normal colonic epithelium from the descending colon (distal to the splenic flexure, but proximal to the rectum) were pooled as “left colon.” Specimens were placed into RNALater RNA Stabilization Reagent (Ambion) and stored at 4°C before DNA isolation. Peripheral blood samples were also collected at this time and DNA was extracted by standard procedures (13).

Description of patients with cancer

We also collected biologic specimens from patients undergoing colon resection for presumed or biopsy-proven colon cancers. Patients were considered eligible if they had no personal or family history of colon cancer before this encounter. Patients with known or clinical features of hereditary cancer syndromes (specifically, hereditary nonpolyposis colorectal cancer or familial adenomatous polyposis syndrome) were excluded. In addition, patients with any personal history of chemotherapy or radiation therapy were also excluded. Patients who remained eligible underwent colon resection at a single National Cancer Institute designated Comprehensive Cancer Center (Temple/Fox Chase Cancer Center). Specimens were processed immediately, and normal appearing colon mucosa, well away (∼10 cm) from the lesion in question, was obtained. These specimens were classified “right colon” and “left colon” accordingly.

Patient data for samples used in this study are presented in Supplementary Table S1. Patients were matched for sex and as closely as possible for age, although patients with cancer were, on average, almost 6 years older than controls. None of the patients examined in the present study was examined in the previous study (13). All patient materials were collected with the approval of Temple University Institutional Review Board (IRB) protocol 11910 or Fox Chase Cancer Center IRB protocol 11–866.

Sample preparation, bisulfite conversion

Tissue samples were rinsed with sterile saline and blotted dry before nucleic acid extraction. DNA was extracted using standard phenol-chloroform techniques. The isolated DNA was dissolved in 10 mmol/L TrisCl (pH 8.0). Samples were quantified by spectrophotometry and stored at −80°C until ready for use. The EZ DNA Methylation-Gold Kit (Zymo Research) was used to convert unmethylated genomic DNA cytosine to uracil. Site-specific CpG methylation was analyzed in the converted DNA template (5 μL at 50 ng/μL) as described below.
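The chemistry behind the conversion step can be illustrated in silico. The sketch below is only a toy model, not part of the laboratory protocol: the function name, the example sequence, and the methylated position are all hypothetical, and converted cytosines are written as T because uracil is read as thymine after amplification.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite conversion and PCR.

    Unmethylated C is deaminated to uracil (read as T after amplification);
    methylated C is protected and remains C.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical 12-base template; the C at index 2 (part of a CpG) is methylated.
template = "ACCGTTCGACCA"
converted = bisulfite_convert(template, methylated_positions=[2])
print(converted)  # ATCGTTTGATTA
```

Only the protected cytosine survives as C, which is what allows the methylated fraction at a CpG to be read out downstream.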

Veracode array analysis

Site-specific CpG methylation was analyzed in the converted DNA template (5 μL at 50 ng/μL) using a custom Veracode Array (Illumina, Inc.) at the Center for Applied Genomics of the Children's Hospital of Philadelphia (Philadelphia, PA). Methylation levels (β-values: the fraction of methylated CpG at each site tested) were assessed at 96 CpGs. Thirty of the CpG sites were selected because they met the statistical criteria for a significant difference between patients with cancer and controls in our original study (13). The additional 66 CpGs were selected because they were of interest for other studies (18–20), but many of these sites also differed significantly between patients with cancer and controls in our original study (13).

Statistical analysis

Because the present study is a validation experiment, in which we have a prior expectation for the direction of the difference between the mean methylation levels of patients with cancer and controls, one-sided, paired t tests were performed. A P value < 0.05 was considered significant for the validation study because the candidates were identified after correction for multiple testing in the discovery study (13). Solely for comparison purposes, we also assessed which candidates would remain significant after a second correction for multiple testing, using the Benjamini–Hochberg procedure at an FDR of 0.05.
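For readers unfamiliar with the Benjamini–Hochberg adjustment, the following is a minimal sketch of the standard step-up procedure. It is not the authors' analysis code; the function name and the five raw P values are hypothetical and serve only to show how adjusted values are compared with an FDR of 0.05.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw one-sided P values for five CpGs.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p for p in adjusted if p < 0.05]
print(len(significant))  # number of CpGs passing an FDR of 0.05
```

In this toy input, four CpGs have raw P < 0.05 but only two remain below 0.05 after adjustment, which is the kind of attrition the second correction is meant to expose.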

Binary classification of cancer versus control samples

A SVM with recursive feature elimination (21) was used to classify samples. Random, 10-fold cross-validation was repeated 10 times and a score was calculated for each tested sample at each reduction step. Average accuracy was calculated at each step, and all the genes at the point of maximal accuracy were used as the initial discriminator. A subsequent step was then applied to reduce the final discriminator to the minimum number of genes for which the accuracy remained the same (22).
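The resampling scheme described above can be sketched as follows. This illustrates only the repeated random 10-fold splitting; the SVM training and the feature-elimination loop that would run inside each split are omitted. The function name, the seed, and the sample count of 40 are assumptions made for the example.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into folds of near-equal size.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for fold, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, fold, train, sorted(test)


# Hypothetical training set of 40 samples.
splits = list(repeated_kfold(n_samples=40))
print(len(splits))  # 10 repeats x 10 folds = 100 splits
```

Within each repeat every sample is held out exactly once, so each sample receives one test score per repeat and the scores can be averaged across repeats.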

Validation of the SVM classifier

Twenty percent of the original dataset was not included in the creation of the discriminator, but we used these samples, together with other samples coming from an independent platform, to test the quality of the signature.

The signature was applied as an equation of the form:

X = a[A] + b[B] + c[C]… + z[Z] + constant

where A, B, C…Z are the methylation levels and a, b, c…z are the coefficients associated with each value. If the classification score (X) calculated for a sample is higher than 0, the sample is declared cancer; if it is less than 0, the sample is declared control. The higher the score, the greater the confidence that the sample is cancer; the lower and more negative the score, the greater the confidence that the sample is control (22).
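The scoring rule can be written out as a short sketch. The coefficients, constant, CpG names, and methylation levels below are hypothetical placeholders rather than values from the fitted signature, and treating a score of exactly 0 as control is an assumption, because the text does not specify that case.

```python
def classification_score(methylation, coefficients, constant):
    """Compute X = sum(coefficient * methylation level) + constant."""
    weighted = sum(coefficients[cpg] * methylation[cpg] for cpg in coefficients)
    return weighted + constant


def classify(score):
    """Scores above 0 are called cancer; all other scores are called control.

    Treating a score of exactly 0 as control is an assumption.
    """
    return "cancer" if score > 0 else "control"


# Hypothetical coefficients and methylation levels, for illustration only.
coefficients = {"cpg_A": 4.0, "cpg_B": -2.5}
constant = -1.0
sample = {"cpg_A": 0.60, "cpg_B": 0.20}

score = classification_score(sample, coefficients, constant)
print(round(score, 2), classify(score))
```

The sign of the score gives the call, and its magnitude can be read as the confidence in that call.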

Validation of previously identified candidates

In our previous study (13), we profiled methylation levels across 27,578 CpG sites in DNA extracted from normal colon mucosa (see Materials and Methods) of 30 patients with colon cancer and 18 controls. We identified significant differences in mean methylation level between patients with cancer and controls using three different statistical thresholds (13): 119 CpGs in 114 genes differed after Bonferroni correction for multiple testing; 909 sites in 873 genes differed after applying the Benjamini–Hochberg FDR of 0.05; and 299 sites in 65 genes differed after applying the ad hoc criterion that three or more CpGs in a gene differ significantly at P < 0.05 (for a nominal significance of 0.05 × 0.05 × 0.05 = 1.25 × 10⁻⁴). From these gene/CpG lists, we selected 30 CpGs in 30 genes (Table 1) to test in an independent sample of normal colon mucosa from 24 cancer cases and 24 case-matched controls (see Materials and Methods and Supplementary Table S1). The 30 CpGs were selected from those having the largest magnitude of difference between means in the original study (Tables 1 and 2 in ref. 13), as well as a selection of CpGs in genes of additional interest that were significantly different in the original study (13).

Table 1.

The 30 CpGs selected for independent validation whose methylation levels differed significantly between normal colon mucosa of patients with cancer versus controls in ref. 13 

| Gene selected for independent validationᵃ | Cancer mucosa hypo (<), hyper (>), or no diff (−) | CpG ID | Selection criterion from Silviera et al. (13) | P in validation setᵇ | FDR P |
| --- | --- | --- | --- | --- | --- |
| AHNAK | </> | cg19902569 | Bonferroni | 0.17 | 0.255 |
| BCDIN3 | </< | cg17607973 | Bonferroni | 0.21 | 0.3 |
| BCL2 | </> | cg08554462 | 3 CpGs<0.05 | 0.32 | 0.3778 |
| **CASP8** | >/> | cg05130485 | 3 CpGs<0.05 | 5.2 × 10⁻⁴ | 0.0039 |
| DDX49 | >/− | cg14757492 | Bonferroni | 0.41 | 0.4241 |
| **ENPEP** | </< | cg17854440 | Bonferroni | 0.041 | 0.0863 |
| FXD7 | >/> | cg22392666 | Bonferroni | 0.22 | 0.3 |
| GABRR2 | </> | cg06445611 | Bonferroni | 0.1 | 0.1579 |
| **GALR1** | </< | cg15343119 | 3 CpGs<0.05 | 0.046 | 0.0863 |
| GATA4 | </< | cg24646414 | 3 CpGs<0.05 | 0.33 | 0.3778 |
| **GP1BB** | >/> | cg07359545 | Bonferroni | 0.04 | 0.0863 |
| **GPX4** | </< | cg18061485 | Benjamini–Hochberg | 0.041 | 0.0863 |
| **GRB10**ᵇ | >/> | cg01720588 | 3 CpGs<0.05 | 0.002 | 0.0086 |
| HSPA5 | </− | cg01733783 | Benjamini–Hochberg | 0.44 | 0.44 |
| **IGF1** | </< | cg01305421 | Benjamini–Hochberg | 0.023 | 0.0627 |
| IGF2ᵇ | </< | cg10649864 | 3 CpGs<0.05 | 0.058 | 0.1024 |
| **INS** | >/> | cg03366382 | 3 CpGs<0.05 | 0.019 | 0.0627 |
| KCNQ1 | </< | cg06719391 | Bonferroni | 0.08 | 0.1334 |
| **KRTHB6** | </< | cg04123507 | Bonferroni | 0.009 | 0.0338 |
| MEST | >/< | cg01888566 | 3 CpGs<0.05 | 0.26 | 0.3391 |
| **MGC9712** | >/> | cg06194808 | Bonferroni | 9 × 10⁻⁴ | 0.0054 |
| MTHFR | >/− | cg03427831 | Benjamini–Hochberg | 0.39 | 0.4179 |
| **OSBPL5**ᵇ | </< | cg06282660 | 3 CpGs<0.05 | 0.044 | 0.0863 |
| **RASSF5** | >/> | cg17558126 | Bonferroni | 0.022 | 0.0627 |
| SERPINB5 | >/> | cg20837735 | 3 CpGs<0.05 | 0.34 | 0.3778 |
| **SLC16A3** | >/> | cg18345635 | Bonferroni | 1.7 × 10⁻⁴ | 0.0017 |
| **SULT1C2** | </< | cg17966192 | Bonferroni | 0.002 | 0.0086 |
| **VAV1** | >/> | cg13470920 | Bonferroni | 1.7 × 10⁻⁴ | 0.0017 |
| **VHL** | >/> | cg16869108 | 3 CpGs<0.05 | 5 × 10⁻⁵ | 0.0015 |
| ZNF512 | </> | cg18611281 | 3 CpGs<0.05 | 0.29 | 0.3625 |

NOTE: Genes/CpGs in bold differed significantly in the validation.

ᵃNote that only five of 30 CpGs tested varied in the direction opposite that expected from the discovery profile (13) and none of the five variant CpGs was significant in the validation.

ᵇOne-tailed t test was used because there was an expected direction of difference in this validation. Eight of 14 Bonferroni candidates, two of four Benjamini–Hochberg candidates, and six of 12 “3 CpGs<0.05” candidates validated (16/30 validated overall).

Table 2.

The top 18 CpGs/genes selected by a SVM to classify cancer samples and controls

| Gene name | CpG ID | P cancer vs. control in Veracode array | P in Infinium array |
| --- | --- | --- | --- |
| ANKRD15 | cg17694279 | 0.007 | 0.196 |
| **CASP8** | cg05130485 | 5.2 × 10⁻⁴ | 0.007 |
| EDA2R | cg14372520 | 0.003 | 0.045 |
| **ENPEP** | cg17854440 | 0.041 | 1.26 × 10⁻⁶ |
| **GRB10** | cg01720588 | 0.002 | N/A |
| IGFBP5 | cg24617085 | 0.016 | 0.262 |
| **INS** | cg03366382 | 0.019 | 1.28 × 10⁻⁵ |
| ITGB4 | cg12146151 | 0.006 | 0.451 |
| LGALS2 | cg11081833 | 0.045 | 0.994 |
| **MGC9712** | cg06194808 | 9 × 10⁻⁴ | 4 × 10⁻⁴ |
| NMUR1 | cg10642330 | 0.017 | 0.72 |
| **RASSF5** | cg17558126 | 0.022 | 1.19 × 10⁻⁵ |
| **SLC16A3** | cg18345635 | 1.7 × 10⁻⁴ | 7.28 × 10⁻⁵ |
| **SULT1C2** | cg17966192 | 0.002 | 0.016 |
| TIMP4 | cg25982743 | 0.004 | 0.019 |
| **VAV1** | cg13470920 | 1.7 × 10⁻⁴ | 1.76 × 10⁻⁵ |
| **VHL** | cg16869108 | 5 × 10⁻⁵ | 5 × 10⁻⁴ |
| VMD2 | cg09726693 | 0.001 | N/A |

NOTE: The 10 CpGs/genes in bold are from the validated candidates in Table 1.

Methylation levels at the individual CpG sites in the normal colon mucosa of the 24 patients with cancer and 24 matched controls (Supplementary Table S1) were assayed on bisulfite-converted DNA using a custom-designed Illumina high-throughput “Veracode” array (see Materials and Methods). Comparison of mean methylation levels between the normal colon mucosa of patients with cancer and controls revealed that mean methylation levels of 16 of the 30 CpG sites tested (53%) differed significantly in the independent population (Table 1, next to last column). In this analysis, a P value < 0.05 was considered significant because the candidate genes were selected for validation on the basis of having passed multiple statistical selection criteria in the discovery population, including correction for multiple testing (13). The success rate for validation was approximately the same regardless of whether the original statistical threshold was very stringent (Bonferroni; 8/14 candidates validated), moderately stringent (Benjamini–Hochberg; 2/4 candidates validated), or our ad hoc statistical threshold requiring a modest level of significance (P < 0.05) for at least three CpGs in the candidate gene (nominal P < 1.25 × 10⁻⁴; 6/12 candidates validated). If the candidates are subjected to correction for multiple testing, yet again, in the present study, eight of the 16 candidates that are significant at P < 0.05 are also significant at an FDR of 0.05 (Table 1). Notably, the FDR-adjusted P values for the other eight candidates are all <0.087 (Table 1).

Support vector machine model

The Illumina Veracode array used for validation contained 66 CpGs in addition to the 30 shown in Table 1. Although these CpGs were not selected by using the criteria outlined in our previous study, almost all of the genes containing these additional CpGs were profiled on the Infinium array in our original study and a number of them exhibited statistically significant mean methylation differences between patients with cancer and controls (see below).

A SVM was trained on the 96 CpG array data using 20 of the 24 patients with cancer and 20 of the 24 controls (Supplementary Table S1). The SVM identified 18 CpGs with optimum performance (96% sensitivity, 100% specificity; Fig. 1A) in classifying patients with cancer and controls correctly (one patient with cancer in the training set was misclassified as a control). These 18 CpGs (Table 2) consist of 10 of the original 16 validated candidates (Table 1) and eight additional candidates. Three of the eight additional candidates (TIMP4, NMUR1, and EDA2R) also exhibited significant methylation differences between patients with cancer and controls in our original study (Table 2, last column). Methylation levels at these 18 CpGs classified all eight patients in the test set correctly (Fig. 1B).

Figure 1.

A, performance of 18 CpG SVM in validation population training set of 20 cancer cases and 20 controls. B, performance of 18 CpG SVM in validation population test set of four cancer cases and four controls. C, performance of the optimum 39 CpG SVM in classifying patients with cancer and controls in the discovery population of patients from our original study (13). This SVM has been selected from the 66 CpGs interrogated (13) in the 18 candidate genes described in Table 2.

We devised a further test of the utility of this 18-gene model by selecting all 65 CpGs interrogating these 18 genes on the original 27K discovery array (on which 30 patients with cancer and 18 controls were profiled; ref. 13) from which the 10 validated genes were selected (Table 1). The SVM selected 39 CpGs in 16 of the genes (Supplementary Table S2) as the optimum model (93% sensitivity, 94% specificity; Fig. 1C). Of note, only four CpGs in the INS gene were interrogated on the array and all four were selected for the SVM model. Multiple CpGs in RASSF5 (five selected of nine interrogated), VHL (five selected of seven interrogated), GRB10 (four selected of 12 interrogated), and CASP8 (three selected of six interrogated) were also selected. In addition, seven genes (VAV1, MGC9712, SULT1C2, SLC16A3, ITGB4, ANKRD15, and ENPEP) were interrogated by only two CpGs on the array and both CpGs in each of the seven genes were selected for the SVM.

Correlation between methylation levels in colon mucosa and methylation levels in peripheral blood

We had the opportunity to profile the methylation levels of the 96 CpGs on the Veracode array in DNA extracted from peripheral blood on 15 of the patients without cancer, as well as normal colon mucosa on the same 15 patients. We compared the CpG site-specific methylation levels between the two tissues and identified a number of genes in which methylation levels between the two tissues were correlated strongly. In fact, 14 of the 96 CpGs showed strong positive correlation (Pearson correlation, r>0.5) between methylation levels in normal colon mucosa and methylation levels in peripheral blood. Of the CpGs selected by the original 18 CpG SVM (Fig. 1; Table 2), seven exhibit moderate to strong methylation correlations between tissues (Table 3; Fig. 2), and use of colon methylation levels of these seven CpGs, only, results in correct classification of 40 of 48 patients (sensitivity 92%, specificity 87%; Fig. 3A) profiled on the Veracode array. Use of a seven CpG SVM (data for the identical CpG or nearest CpG interrogated on the Infinium 27 K array) to classify patients profiled on the discovery array (13) results in a sensitivity of 83% and a specificity of 61% (not shown). Allowing an SVM to be optimized from all 31 CpGs interrogated in these seven genes resulted in a model with 87% sensitivity and 83% specificity (Fig. 3B).
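As an illustration of the between-tissue comparison, the sketch below computes a Pearson correlation coefficient for one CpG measured in paired blood and colon samples. The methylation values are invented for the example and the function name is hypothetical; the threshold of r > 0.5 is the one quoted above for a strong positive correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation levels for one CpG in five individuals.
blood = [0.10, 0.20, 0.30, 0.40, 0.50]
colon = [0.15, 0.22, 0.28, 0.45, 0.48]

r = pearson_r(blood, colon)
is_strong = r > 0.5  # threshold used in the text for strong correlation
print(round(r, 2), is_strong)
```

A CpG passing this threshold is one where the blood measurement tracks the colon measurement across individuals, which is the property needed for blood to act as a proxy tissue.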

Figure 2.

Correlation between methylation levels in normal colon mucosa (y-axis) and peripheral blood (x-axis) of 15 of the 24 control patients. A, INS cg03366382. B, LGALS2 cg11081833. C, ANKRD15 cg17694279. D, VHL cg16869108. Trend lines were drawn by “lm” function in R.

Figure 3.

A, performance of SVM using seven CpGs showing correlation between methylation levels in normal colon and peripheral blood in classifying patients with cancer and controls in the validation population. B, performance of SVM using seven CpGs showing correlation between methylation levels in normal colon and peripheral blood in classifying patients with cancer and controls in the discovery population (13).

Table 3.

The seven CpGs from the optimum 18 CpG SVM model (Table 2; Fig. 1A) in which there is a moderate to strong correlation between methylation levels in normal colon mucosa and peripheral blood

| Gene name | CpG ID | Pearson r | SVM score |
| --- | --- | --- | --- |
| INS | cg03366382 | 0.72 | 0.79 |
| LGALS2 | cg11081833 | 0.68 | 0.54 |
| ANKRD15 | cg17694279 | 0.64 | 0.84 |
| VHL | cg16869108 | 0.61 | 0.93 |
| EDA2R | cg14372520 | 0.57 | 0.30 |
| NMUR1 | cg10642330 | 0.30 | 0.34 |
| GRB10 | cg01720588 | 0.25 | 0.26 |

We have validated multiple site-specific DNA methylation differences in normal colon mucosa between patients with colon cancer and patients without cancer. Our success rate of validation was 53% in a sample of 48 patients, none of whom were examined in the original discovery study (13). The fact that approximately half of the CpGs tested exhibit significant differences in mean methylation level in an independent set of samples is a substantial success rate, even for discovery genes judiciously selected (Table 1) from whole-genome profiles. Methylation differences at two additional candidate CpGs, one in IGF2 and one in KCNQ1 (Table 1), approached but did not achieve significance (P = 0.06 and P = 0.08, respectively).

Whether the failure to validate additional CpGs reflects a spurious result in the original study or true differences between small subpopulations of patients with colon cancer and controls sampled from the overall population in which true differences in means exist cannot be determined from only two independent samples. However, we observed that the degree of overlap in CpG site methylation level between cancer and control populations was substantial for many loci (ref. 13; data not shown). As a result, it is expected that many statistically significant differences between population means will be difficult to validate in independent populations if they have similar variance, even if there are true differences in population mean.

There are two potential weaknesses that could affect the conclusions of our study: (i) the average age of the patients with cancer (mean age = 65.4) is greater than the average age of the controls (mean age = 59.6, P = 0.03) and (ii) variability in the anatomical site of biopsy within the colon between patients with cancer and controls. Both of these differences are potentially important. Methylation levels at some CpGs have been demonstrated to change with age (23–28). Overall, the percentage of CpGs that change in an age-related way has been estimated as between 15% (using the Illumina HumanMethylation450 Bead Chip array; ref. 26) and 28% (using the Illumina HumanMethylation27 Bead Chip array; ref. 28). The major concern, with respect to our candidates, is whether they are more likely to be in the age-related group than CpGs selected at random and whether the differences we observe are the result of differences in age between the cancer population and the controls. Because our original 30 candidates were selected from the same Illumina HumanMethylation27 Bead Chip array (13) used in one of the aging studies (28), we queried whether any of the candidates in Table 1 corresponded to the more than 700 aging-related CpGs identified in that study. Only one CpG (in SLC16A3) in Table 1 appears among the aging-related CpGs in ref. 28. Of note, in addition, is that the estimated average rate of methylation change is 0.07% to 0.2% per year in blood (26, 28) and 0.2% per year in colon (29). Given the 6-year average difference between the patients with cancer and the controls, we might expect age to account for an approximately 1.2% average difference in colon methylation between the two groups. Only three of the 16 validated candidate genes in Table 1 (GALR1, GPX4, and RASSF5) differ by less than this amount between patients with cancer and controls and none of these three candidates was identified as having age-related changes (28). Of the three genes in which CpGs have been demonstrated to incur age-related changes (INS, OSBPL5, and SLC16A3), the differences between groups are substantially larger than 1.2% (5% at INS, 3% at OSBPL5, and 10% at SLC16A3). Therefore, we do not believe that the 6-year difference in average ages of patients with cancer and controls could be the explanation for the larger and consistent differences we see in methylation levels in two independent populations.
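The age argument reduces to a small calculation, sketched here with the figures quoted in the text: a rate of 0.2% per year in colon, an age gap of about 6 years, and the observed group differences for the three age-related genes. The variable names are hypothetical.

```python
RATE_PER_YEAR = 0.2   # percent methylation change per year in colon (29)
AGE_GAP_YEARS = 6     # approximate mean age difference, cancer vs. control

# Expected methylation difference attributable to age alone, in percent.
expected_age_effect = RATE_PER_YEAR * AGE_GAP_YEARS

# Observed between-group differences (percent) quoted in the text.
observed = {"INS": 5, "OSBPL5": 3, "SLC16A3": 10}

for gene, difference in observed.items():
    ratio = difference / expected_age_effect
    print(f"{gene}: observed {difference}% is {ratio:.1f}x the expected age effect")
```

Each observed difference is at least 2.5 times the expected age effect of about 1.2%, which is the basis for concluding that age alone is unlikely to explain these differences.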

The second source of potential bias in our study is variability in the site of the normal colon biopsy being compared between the patients with cancer and controls. In our previous study, we compared right side control biopsies with right side biopsies from patients with cancer who had right side tumors (13). In this study, we used left side biopsies from the patients with cancer because all of the cancers were left side cancers and only left side biopsies were available. However, we used right side biopsies from the controls because we were attempting to validate candidate genes identified from right side biopsies, also reasoning that biomarkers that could distinguish cancer, independent of site within the colon would be more valuable, clinically. Although site-specific DNA methylation has been reported by us and others (14, 25, 27) to vary as a function of biopsy site within the colon for a number of genes, biopsy site does not seem to be a major factor at most CpG sites. Genome-wide comparison of methylation in right-side biopsies (from the ascending colon) and biopsies from the left side or rectum has identified both hyper- and hypomethylation differences between the two anatomical sites (27). Although some of these differences were substantial in magnitude, significant differences between the two anatomical sites were found in <2% of CpG sites (8,388 sites out of >430,000) examined. Only three of the CpGs among our 30 candidates or the eight additional CpGs comprising the SVM (Table 2) are present among the 8,388 right side/rectum differing CpGs in Kaz and colleagues (27). One of these was among the 14 CpGs that were not validated in Table 1, one was among the 16 validated CpGs (in GP1BB) but not selected for the SVM and one (in LGALS2) was not among the original candidates but was selected by the SVM. Leaving this CpG out of the SVM does not substantially alter the sensitivity or specificity of the model. 
In summary, neither age differences nor biopsy site differences within the colon are likely to account for the reproducible differences we observe between patients with cancer and controls in two independent population samples.

Looking forward, it is likely that multigene models like those presented here will be required to classify patients with the level of accuracy required to be useful in the clinic. The number of methylation biomarker genes that will be required will be dependent on the discriminatory power of the markers but clinically useful distinctions for some clinical outcomes are made currently on the basis of measuring transcript levels of only 12 to 23 genes (30). With respect to methylation biomarkers, there are significant advantages to using DNA, rather than RNA, as a diagnostic molecule (31). In addition to the relative stability of DNA compared with RNA, DNA methylation level, like mRNA level, is a continuous variable. However, it exhibits considerably lower variance than population mRNA levels (e.g., 32, reviewed in ref. 33), in part, because DNA methylation levels are constrained between 0 and 1.

When assembling a panel of methylation biomarkers, an additional consideration is whether it is better to interrogate a single or a small number of sites in a larger number of genes (as was the case with early array-based methods; ref. 34; methylation-sensitive restriction endonuclease-based methods, e.g., ref. 35; and most bisulfite pyrosequencing assays; ref. 36) or a greater number of sites in a smaller number of genes (larger or custom arrays; refs. 37, 38; or multiple bisulfite pyrosequencing assays). Although we have observed that methylation levels of different CpG sites within the same CpG island are often highly correlated (13, 18, 39) and would, therefore, be predicted to add little additional information, interrogating additional CpGs may add predictive power if the additional information is not completely redundant. We are able to make a preliminary assessment of whether single CpG sites in our 18-gene model perform as well as the 39 CpG sites in 16 of these genes. Supplementary Figure S1A shows the result of using a single CpG in 17 of the 18 genes (VMD2/BEST1 is not interrogated on the Illumina 27 K array used in the original study) to classify the 30 patients with cancer and 18 controls in our original study. Comparing this result with that in Fig. 1C, we see that the specificity is the same (94% success in classifying controls) but the sensitivity drops from 93% to 83% (five patients with cancer misclassified vs. two patients with cancer misclassified). Thus, for this particular set of candidate genes, additional precision is gained by assessing methylation at more than one site per gene. Even if we expand the number of individual CpGs/individual genes (38 CpGs/38 individual genes; again, VMD2/BEST1 is not present) to the same number used in Fig. 1C, specificity remains at 94% but sensitivity rises to only 86% (Supplementary Fig. S1B), suggesting that interrogating multiple CpGs per gene is a superior approach to interrogating individual CpGs in a larger number of genes.
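The sensitivity and specificity figures in this comparison follow directly from the misclassification counts. The sketch below reproduces the arithmetic for the multi-CpG model (Fig. 1C) and the single-CpG model (Supplementary Fig. S1A). The helper `rate` and the variable names are illustrative, and the count of one misclassified control is an assumption inferred from the reported 94% specificity among 18 controls.

```python
def rate(correct: int, total: int) -> float:
    """Return the fraction of a group that was classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


# Group sizes stated in the text for the original study.
N_CANCER = 30
N_CONTROL = 18

# Sensitivity: patients with cancer classified correctly.
# Multi-CpG model: two patients with cancer misclassified.
sens_multi = rate(N_CANCER - 2, N_CANCER)
# Single-CpG-per-gene model: five patients with cancer misclassified.
sens_single = rate(N_CANCER - 5, N_CANCER)

# Specificity: controls classified correctly.
# Assumption: 94% of 18 controls corresponds to 17 correct, 1 misclassified.
spec = rate(N_CONTROL - 1, N_CONTROL)

print(f"sensitivity, multiple CpGs per gene: {sens_multi:.1%}")
print(f"sensitivity, single CpG per gene:    {sens_single:.1%}")
print(f"specificity (both models):           {spec:.1%}")
```

With only 30 patients with cancer, each additional misclassification shifts sensitivity by about 3.3 percentage points, so differences of this size should be read as preliminary.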

An additional consideration in deciding whether candidate methylation biomarkers such as those identified here will be clinically useful is whether they can be assessed in tissues collected less invasively than by colonoscopy. Although multiple factors are associated with uptake of screening colonoscopy, including education, insurance coverage, and ethnicity (40), the fact that fewer than half of those patients recommended to have a screening colonoscopy are compliant (2) suggests that uptake of a less invasive test, such as might be performed on peripheral blood or saliva, might be substantially greater. There is much interest and some progress in developing such biomarkers (41, 42, reviewed in ref. 43). As to whether such biomarkers give a realistic picture of methylation levels in the organ of interest, it is often assumed that methylation levels are tissue specific, but there are many sites for which methylation levels vary between individuals but do not vary substantially between tissues of the same individual (44, 45). Our finding that a significant fraction of candidate biomarkers (seven out of 18; Tables 2 and 3; Fig. 2) shows a strong correlation between methylation levels in colon mucosa and methylation levels in peripheral blood offers the possibility that methylation levels of candidate biomarkers in tissues collected less invasively could serve as a proxy not only for colon, but also for any other tissue or organ that might be difficult or impossible to biopsy.
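The tissue-to-blood comparison described above rests on a Pearson correlation between paired measurements from the same individuals. The sketch below shows one way such a coefficient could be computed for a single CpG. The function `pearson_r` is an illustrative helper, and the methylation values are hypothetical placeholders, not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical methylation levels (fraction methylated, 0 to 1) at one CpG,
# measured in colon mucosa and in peripheral blood of the same six people.
colon = [0.42, 0.55, 0.61, 0.48, 0.70, 0.52]
blood = [0.40, 0.50, 0.66, 0.45, 0.64, 0.58]

r = pearson_r(colon, blood)
print(f"Pearson r between colon and blood: {r:.3f}")
```

A coefficient near 1 for a given CpG would indicate that blood methylation tracks colon methylation across individuals, which is the property required of a proxy tissue.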

One clear advantage of cancer screening by colonoscopy is that premalignant lesions that are detected can be removed before they have a chance to progress to cancer. We have described two methylation biomarkers that show some promise in distinguishing individuals without cancer or polyps from cancer-free individuals who have polyps (14). In this study, we attempted to define an SVM to classify which individuals in our group of controls carried polyps, but its performance was mediocre. Twelve of the controls had polyps (two with hyperplastic polyps, eight with tubular adenomas, and two with polyps that were not described histopathologically) and 12 did not (Supplementary Table S1). An SVM using six CpGs selected from the 96 on the array was able to classify eight of the 12 polyp-carrying individuals correctly. The two individuals with hyperplastic polyps (which are not thought to give rise to malignancy) were classified as controls, but four of the controls were also classified with the polyp group. It seems likely that additional large-scale screens such as that performed by Lange and colleagues (41) will be necessary to identify markers of sufficient discriminatory power to make the more subtle distinction between individuals without cancer but with premalignant lesions and individuals who are cancer free and polyp free. However, the potential benefits of noninvasive, large-scale screening for relative cancer risk could be highly significant in reducing disease burden.

No potential conflicts of interest were disclosed.

Conception and design: C. Sapienza

Development of methodology: M. Cesaroni, C. Sapienza

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Sapienza

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Cesaroni, J. Powell, C. Sapienza

Writing, review, and/or revision of the manuscript: M. Cesaroni, C. Sapienza

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Powell, C. Sapienza

Study supervision: C. Sapienza

The authors thank Dr. Andrew Kaz for providing information on the 8,388 CpGs whose methylation levels vary between right side colon and rectal biopsies (cited in Kaz and colleagues; ref. 27) and Drs. Noor Dawany and Andrew Kossenkov for providing SVM scripts and for the helpful discussion about statistical methods.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

This work was supported by a grant from the National Institutes of Health, NIH R03 CA180533-01.

3. Millan MS, Gross P, Manilich E, Church JM. Adenoma detection rate: the real indicator of quality in colonoscopy. Dis Colon Rectum 2008;51:1217–20.
4. Baxter NN, Sutradhar R, Forbes SS, Paszat LF, Saskin R, Rabeneck L. Analysis of administrative data finds endoscopist quality measures associated with postcolonoscopy colorectal cancer. Gastroenterology 2011;140:65–72.
5. Cui H, Cruz-Correa M, Giardiello FM, Hutcheon DF, Kafonek DR, Brandenburg S, et al. Loss of IGF2 imprinting: a potential marker of colorectal cancer risk. Science 2003;299:1753–5.
6. Nakagawa H, Chadwick RB, Peltomaki P, Plass C, Nakamura Y, de La Chapelle A. Loss of imprinting of the insulin-like growth factor II gene occurs by biallelic methylation in a core region of H19-associated CTCF-binding sites in colorectal cancer. Proc Natl Acad Sci U S A 2001;98:591–6.
7. Hinoue T, Weisenberger DJ, Pan F, Campan M, Kim M, Young J, et al. Analysis of the association between CIMP and BRAF in colorectal cancer by DNA methylation profiling. PLoS ONE 2009;4:e8357.
8. Weisenberger DJ, Siegmund KD, Campan M, Young J, Long TI, Faasse MA, et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat Genet 2006;38:738–40.
9. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009;41:178–86.
10. Estécio MR, Gallegos J, Vallot C, Castoro RJ, Chung W, Maegawa S, et al. Genome architecture marked by retrotransposons modulates predisposition to DNA methylation in cancer. Genome Res 2010;20:1369–82.
11. Kondo Y, Issa JP. Epigenetic changes in colorectal cancer. Cancer Metastasis Rev 2004;23:29–39.
12. Lao VV, Grady WM. Epigenetics and colorectal cancer. Nat Rev Gastroenterol Hepatol 2011;8:686–700.
13. Silviera ML, Smith BP, Powell J, Sapienza C. Epigenetic differences in normal colon mucosa of cancer patients suggests altered dietary metabolic pathways. Cancer Prev Res 2012;5:374–84.
14. Leclerc D, Lévesque N, Cao Y, Deng L, Wu Q, Powell J, et al. Genes with aberrant expression in murine neoplastic intestine show epigenetic and expression changes in normal mucosa of colon cancer patients. Cancer Prev Res 2013;6:1171–81.
15. Hiraoka S, Kato J, Horii J, Saito S, Harada K, Fujita H, et al. Methylation status of normal background mucosa is correlated with occurrence and development of neoplasia in the distal colon. Hum Pathol 2010;41:38–47.
16. Kim MS, Louwagie J, Carvalho B, Terhaar Sive Droste JS, Park HL, Chae YK, et al. Promoter DNA methylation of oncostatin m receptor-beta as a novel diagnostic and therapeutic marker in colon cancer. PLoS ONE 2009;4:e6555.
17. Yi JM, Dhir M, Guzzetta AA, Iacobuzio-Donahue CA, Heo K, Yang KM, et al. DNA methylation biomarker candidates for early detection of colon cancer. Tumor Biol 2012;33:363–72.
18. Turan N, Katari S, Gerson L, Chalian R, Foster M, Gaughan J, et al. Inter- and intra-individual variation in allele-specific DNA methylation and gene expression in children conceived using assisted reproductive technology. PLoS Genet 2010;6:e1001033.
19. Turan N, Ghalwash MF, Katari S, Coutifaris C, Obradovic Z, Sapienza C. DNA methylation differences at growth related genes correlate with birth weight: a molecular signature linked to developmental origins of adult disease? BMC Med Genomics 2012;5:10.
20. Sapienza C, Lee J, Powell J, Erinle O, Yafai F, Reichert J, et al. DNA methylation profiling identifies epigenetic differences between diabetes patients with ESRD and diabetes patients without nephropathy. Epigenetics 2011;6:20–8.
21. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46:389–422.
22. Showe MK, Vachani A, Kossenkov AV, Yousef M, Nichols C, Nikonova EV, et al. Gene expression profiles in peripheral blood mononuclear cells can distinguish patients with non-small cell lung cancer from patients with nonmalignant lung disease. Cancer Res 2009;69:9202–10.
23. Maegawa S, Hinkal G, Kim HS, Shen L, Zhang L, Zhang J, et al. Widespread and tissue specific age-related DNA methylation changes in mice. Genome Res 2010;20:332–40.
24. Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, et al. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet 2009;5:e1000602.
25. Worthley DL, Whitehall VL, Buttenshaw RL, Irahara N, Greco SA, Ramsnes I, et al. DNA methylation within the normal colorectal mucosa is associated with pathway-specific predisposition to cancer. Oncogene 2010;29:1653–62.
26. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell 2013;49:359–67.
27. Kaz AM, Wong CJ, Dzieciatkowski S, Luo Y, Schoen RE, Grady WM. Patterns of DNA methylation in the normal colon vary by anatomical location, gender, and age. Epigenetics 2014;9:492–502.
28. Xu Z, Taylor JA. Genome-wide age-related DNA methylation changes in blood and other tissues relate to histone modification, expression and cancer. Carcinogenesis 2014;35:356–64.
29. Ahuja N, Li Q, Mohan AL, Baylin SB, Issa JP. Aging and DNA methylation in colorectal mucosa and cancer. Cancer Res 1998;58:5489–94.
30.
31. Issa JP. DNA methylation as a clinical marker in oncology. J Clin Oncol 2012;30:2566–8.
32. Cheung VG, Nayak RR, Wang IX, Elwyn S, Cousins SM, Morley M, et al. Polymorphic cis- and trans-regulation of human gene expression. PLoS Biol 2010;8:e1000480.
33. Turan N, Katari S, Coutifaris C, Sapienza C. Explaining inter-individual variability in phenotype: is epigenetics up to the challenge? Epigenetics 2010;5:16–9.
35. Herman JG, Graff JR, Myöhänen S, Nelkin BD, Baylin SB. Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci U S A 1996;93:9821–6.
36. Dupont JM, Tost J, Jammes H, Gut IG. De novo quantitative bisulfite sequencing using the pyrosequencing technology. Anal Biochem 2004;333:119–27.
39. Katari S, Turan N, Bibikova M, Erinle O, Chalian R, Foster M, et al. DNA methylation and gene expression differences in children conceived in vitro or in vivo. Hum Mol Genet 2009;18:3769–78.
40. Gellad ZF, Provenzale D. Colorectal cancer: national and international perspective on the burden of disease and public health impact. Gastroenterology 2010;138:2177–90.
41. Lange CP, Campan M, Hinoue T, Schmitz RF, van der Meulen-de Jong AE, Slingerland H, et al. Genome-scale discovery of DNA-methylation biomarkers for blood-based detection of colorectal cancer. PLoS ONE 2012;7:e50266.
42. Tóth K, Sipos F, Kalmár A, Patai AV, Wichmann B, Stoehr R, et al. Detection of methylated SEPT9 in plasma is a reliable screening method for both left- and right-sided colon cancers. PLoS ONE 2012;7:e46000.
43. Hong L, Ahuja N. DNA methylation biomarkers of stool and blood for early detection of colon cancer. Genet Test Mol Biomarkers 2013;17:401–6.
44. Waterland RA, Kellermayer R, Laritsky E, Rayco-Solon P, Harris RA, Travisano M, et al. Season of conception in rural Gambia affects DNA methylation at putative human metastable epialleles. PLoS Genet 2010;6:e1001252.