Aberrant DNA methylation at CpG islands is thought to contribute to cancer initiation and progression, but mechanisms that establish and maintain DNA methylation status during tumorigenesis or normal development remain poorly understood. In this study, we used methyl-CpG immunoprecipitation to generate comparative DNA methylation profiles of healthy and malignant cells (acute leukemia and colorectal carcinoma) for human CpG islands across the genome. While searching for sequence patterns that characterize DNA methylation states, we discovered several nonredundant sequences in CpG islands that were resistant to aberrant de novo methylation in cancer and that resembled consensus binding sites for general transcription factors (TF). Comparing methylation profiles with global CpG island binding data for specific protein 1, nuclear respiratory factor 1, and yin-yang 1 revealed that their DNA binding activity in normal blood cells correlated strictly with an absence of de novo methylation in cancer. In addition, global evidence showed that binding of any of these TFs to their consensus motif depended on their co-occurrence with neighboring consensus motifs. In summary, our results have two major implications. First, they point to a major role for cooperative binding of TFs in maintaining the unmethylated status of CpG islands in health and disease. Second, they suggest that the majority of de novo methylated CpG islands are characterized by the lack of sequence motif combinations and the absence of activating TF binding. Cancer Res; 70(4); 1398–407

The epigenetic status of a normal cell can be drastically altered during aging and, even more markedly, during malignant transformation (1). A commonly observed alteration is the aberrant DNA methylation of CpG islands, which often targets tumor suppressor genes and may play a role in disease initiation and progression (1). Exactly how and why certain CpG islands are prone to methylation, whereas others remain unmethylated, is largely unknown.

In the past, different mechanisms for cancer-dependent, aberrant de novo methylation have been proposed largely based on the behavior of individual CpG islands. One possibility includes an initial random methylation event that is selected for during progressive proliferation (1). A second possibility comprises the targeted recruitment of DNA methyltransferases to methylation targets by cis-acting factors (2, 3), histone methyltransferases such as G9a (4, 5), or EZH2 (6). The third possibility includes the loss of chromatin boundaries or the absence of “protective” transcription factors (TF), leading to the spreading of DNA methylation into affected CpG islands (7).

The latter possibility is supported mainly by anecdotal evidence (2, 6, 8–15), showing that the unmethylated state of chromosomal DNA is maintained or even established by DNA-binding proteins. Although several previous studies identified specific nucleotide sequences (16–20) or structural features (21) that correlated with methylation-prone or methylation-resistant CpG islands in cancer samples, they did not identify sequence motifs resembling consensus sites for known TFs, arguing against a protective role of TFs.

In the present study, we show that CpG islands that remain unmethylated in normal and malignant cells (leukemia and colon cancer) contain specific sequence motifs that are identical to consensus sequences for general TFs. Sites that are stably bound by these factors in normal cells are highly resistant to de novo methylation. We also show that the stable binding of TFs in normal cells to their respective motifs requires the presence of neighboring consensus sites for other cis-acting factors, highlighting the importance of combinatorial interactions in defining TF-bound and TF-unbound regions as well as in conferring resistance to de novo methylation.

DNA preparation from normal cells and clinical samples

Colorectal cancer samples were collected from 10 patients who underwent colon resection for biopsy-proven invasive colorectal adenocarcinoma. The study was approved by the local ethical committee. Each resection specimen was staged and graded by routine pathology analysis according to the tumor-node-metastasis classification by the American Joint Committee on Cancer. DNA from frozen colon tissues was isolated using the Puregene DNA Purification kit (Gentra) according to the supplier's recommendation. Three normal colon DNAs (male, ages 50–63 y) were purchased from BioChain Institute, Inc. Leukemic blasts and bone marrow cells from acute leukemia patients were collected during routine diagnostic bone marrow aspirations. Patients had given informed consent to additional sample collection and analyses according to a protocol approved by the local ethical committee. The human myeloid leukemia cell lines THP-1 and U937 (American Type Culture Collection) were grown in RPMI 1640 (Biochrom KG) supplemented with 10% FCS (Life Technologies). Human peripheral blood monocytes from several healthy donors (male, ages 20–28 y) were purified by Ficoll gradient centrifugation and subsequent elutriation. DNA from hematopoietic cell types was prepared using the Blood & Cell Culture DNA kit (Qiagen). DNA concentration was determined with the NanoDrop spectrophotometer, and quality was assessed by agarose gel electrophoresis.

Methyl-CpG immunoprecipitation

The recombinant MBD-Fc protein was produced as previously described (22). Methyl-CpG immunoprecipitation (MCIp) was performed as described with slight modifications. In brief, genomic DNA was sonicated to a mean fragment size of 350 to 400 bp. Each sample (2 μg) was incubated with 150 μL of protein A–Sepharose 4 Fast Flow beads (GE Healthcare) coated with 60 μg of purified MBD-Fc protein in 2 mL Ultrafree-MC centrifugal filter devices (Amicon/Millipore) for 3 h at 4°C in buffer containing 300 mmol/L NaCl. Beads were centrifuged to recover unbound DNA fragments (300 mmol/L fraction) and subsequently washed with buffers containing increasing NaCl concentrations (400, 500, and 550 mmol/L). Densely CpG-methylated DNA was eluted with a high-salt buffer (1,000 mmol/L NaCl), and all fractions were desalted using the MinElute PCR Purification kit (Qiagen). The separation of CpG methylation densities of individual MCIp fractions was controlled by quantitative PCR using primers covering the imprinted SNRPN.

Chromatin immunoprecipitation and ligation-mediated PCR

Chromatin immunoprecipitation (ChIP) analysis of purified peripheral blood monocytes was performed essentially as described (23). Precipitation of precleared chromatin from 10 × 10^6 cells was done overnight at 4°C using 5 μg of anti–yin-yang 1 (YY1; Santa Cruz Biotechnology), anti–nuclear respiratory factor 1 (NRF1; Abcam), anti–specific protein 1 (Sp1; Upstate), and anti-rabbit IgG (Upstate). After reversion of cross-links, enriched DNA fragments were recovered using the QIAquick PCR Purification kit (Qiagen). The quality of each ChIP was controlled at known target sites by quantitative PCR. For ChIP-on-Chip analysis, all samples as well as an aliquot of equally treated input DNA were amplified by ligation-mediated PCR for subsequent labeling as described in Supplementary Materials and Methods.

Microarray handling and analysis

Enriched methylated DNA fragments of the high-salt MCIp fractions were labeled with Alexa Fluor 5-dCTP (cancer cells) and Alexa Fluor 3-dCTP (normal cells) using the BioPrime Total Genomic Labeling System (Invitrogen) according to the manufacturer's instructions. Amplified ChIP material was labeled with Alexa Fluor 5-dCTP and the genomic input with Alexa Fluor 3-dCTP. Comparative MCIp- and ChIP-versus-input hybridizations on 244K CpG island oligonucleotide microarrays (Agilent) were performed using the recommended stringent protocol (Agilent). Images were scanned immediately after washing using a DNA microarray scanner (Agilent) and processed using Feature Extraction Software 9.5.1 (Agilent) and the standard comparative genomic hybridization protocol. Processed signal intensities were further normalized using GC-dependent regression and imported into Microsoft Office Excel 2007 for further analysis. Microarray data have been submitted and are available from the National Center for Biotechnology Information (NCBI)/Gene Expression Omnibus repository (comparative MCIp hybridizations: GSE17455, GSE17510, and GSE17512; ChIP-on-Chip hybridizations: GSE16078). MCIp microarray data for cell lines as well as ChIP-on-Chip data for TFs are also available as Genome Browser tracks in Supplementary Materials and Methods.

Algorithm for de novo motif finding

Motif discovery was performed using a comparative algorithm similar to those previously described (24). An in-depth description and benchmarking of the software suite HOMER (Hypergeometric Optimization of Motif EnRichment), which was developed for motif discovery, will be published elsewhere (C. Benner et al., in preparation).

Briefly, sequences were divided into target and background sets for each application of the algorithm. Background sequences were then selectively weighted to equalize the distributions of CpG content in target and background sequences, which avoids comparing sequences of different sequence content. Motifs were found separately by first exhaustively screening all oligo sequences for enrichment in the target set compared with the background set using the cumulative hypergeometric distribution. Up to two mismatches were allowed in oligo sequences to increase the sensitivity of the method. The top 50 sequences of each length with the lowest P values were then converted into probability matrices and heuristically optimized to maximize hypergeometric enrichment of each motif in the given data set. As optimized motifs were found, they were removed from the data set to facilitate the identification of additional motifs.
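The exhaustive screening step described above can be illustrated with a minimal Python sketch. It is not the HOMER implementation: it only counts exact oligo matches and ranks them by the upper tail of the hypergeometric distribution, and it omits the CpG-content weighting of background sequences, the mismatch tolerance, and the matrix optimization. The function names `hypergeom_sf` and `screen_oligos` and the toy sequences are assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail hypergeometric P value, P(X >= k).

    N: total number of sequences (target + background)
    K: sequences that contain the oligo
    n: number of target sequences
    k: target sequences that contain the oligo
    """
    start = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(start, upper + 1))
    return tail / total


def screen_oligos(targets, backgrounds, length):
    """Rank every exact oligo of the given length by enrichment in the targets."""

    def kmers(seq):
        return {seq[i:i + length] for i in range(len(seq) - length + 1)}

    target_sets = [kmers(s) for s in targets]
    all_sets = target_sets + [kmers(s) for s in backgrounds]
    N, n = len(all_sets), len(target_sets)

    results = []
    for oligo in set().union(*target_sets):
        k = sum(oligo in s for s in target_sets)  # hits in the target set
        K = sum(oligo in s for s in all_sets)     # hits overall
        results.append((hypergeom_sf(k, N, K, n), oligo, k, K))
    return sorted(results)  # lowest P value first


# Toy data (hypothetical): "GCGC" occurs in every target and in no background.
targets = ["ACGCGCAT", "TTGCGCAA", "GGGCGCTT"]
backgrounds = ["ATATATAT", "TTTTAAAA", "CATCATCA", "GGATTACA"]
ranked = screen_oligos(targets, backgrounds, length=4)
best_p, best_oligo, _, _ = ranked[0]
print(best_oligo, best_p)
```

Each sequence is counted once per oligo, so the test asks whether an oligo occurs in more target sequences than expected by chance given its overall frequency.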

Reference sequence

Genomic locations are based on the March 2006 human reference sequence (NCBI Build 36.1) that was produced by the International Human Genome Sequencing Consortium.

Statistical testing

All statistical testing of enrichment data (motifs or attributes) was performed using a cumulative hypergeometric distribution (or Fisher's exact test, referred to as the hypergeometric test). Statistical testing of differences in mRNA level distributions was done using the two-sided Mann-Whitney U test.
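As an illustration of the second test, the following Python sketch computes a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. The source does not state which software or variant of the test was used, so this choice, the function name `mann_whitney_u`, and the example values are assumptions.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value from the normal approximation)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = avg_rank
        size = j - i
        tie_term += size ** 3 - size
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / sigma
    return u, erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal


# Hypothetical expression values for two gene groups.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_stat, p_value)
```

For small samples an exact P value would be preferable to the normal approximation used here.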

Additional methods are provided in Supplementary Materials and Methods.

Comparative DNA methylation profiles of normal and leukemia cells

To globally define methylation-prone and methylation-resistant CpG islands, we initially analyzed the methylation status of 23,000 CpG islands of the human genome in acute leukemia cell lines as well as normal blood monocytes using our previously described MCIp technique (22, 25) as shown in Supplementary Fig. S1A. A typical scatter plot of a comparative hybridization of MCIp-enriched material (Fig. 1) highlights the three types of hybridization behavior: probes that show low signal intensities in both samples (absence of DNA methylation), probes indicating specific enrichment (aberrant DNA methylation) in the leukemia samples, and probes that show high signal intensities but low signal ratios in both samples (methylated in both samples). We initially performed comparative MCIp hybridizations of two well-established leukemia cell lines (human acute monocytic leukemia cell line THP-1 and human histiocytic lymphoma cell line U937) and their normal myeloid counterpart (peripheral human blood monocytes) and extensively validated microarray results by bisulfite conversion and subsequent matrix-assisted laser desorption/ionization–time-of-flight mass spectrometric (MALDI-TOF MS) analysis (26) to show the reproducibility of this method. Three independent replicates of each cell line were highly similar (mean r² = 0.79 and 0.87 for log10 ratios of the THP-1/monocyte and U937/monocyte comparisons, respectively), and on region or single-probe level, the microarray data correlated well with methylation ratios obtained by MALDI-TOF MS analyses. A detailed description and analysis of the validation set (1,150 amplicons covering 140 genes and ∼13,500 individual CpG dinucleotides) is available in Supplementary Materials and Methods. Confirming previous observations (27), the positional annotation of microarray probes showed that regions around known transcription start sites (TSS) are less often targeted by de novo methylation in leukemia cells than promoter-distal sites (Supplementary Fig. S2A). For further global analyses, individual probe signals were combined to cover and assign methylation ratios to whole CpG island regions. The comparative analysis on the region level also separated the three different classes of CpG island regions: unmethylated in both, aberrantly hypermethylated, and methylated in both (Supplementary Fig. S2C). We were unable to identify specific properties associated with the latter type of CpG islands, which is heterogeneous and includes both monoallelic (e.g., imprinted regions) and biallelic (tissue- or soma-specific) DNA methylation events (28), and therefore concentrated on properties of unmethylated or de novo methylated genes. Confirming earlier observations (19, 22), the comparison of global mRNA expression data of normal and leukemia cells between the two major CpG island classes showed that the majority of de novo methylated genes is characterized by low or absent transcription irrespective of the position of CpG islands relative to TSS (Supplementary Fig. S2D).

Figure 1.

Comparative DNA methylation analysis of U937 cells and normal human blood monocytes using MCIp. Representative scatter plot of a comparison of MCIp-enriched material from 2 μg of genomic DNA of a leukemia cell line (U937) and a pool of normal human blood monocytes from healthy donors on human 244K CpG island arrays. Signal intensity ratios are plotted against average signals (MvA plot, log10 scale).

Sequence motifs associate with CpG island regions that remain unmethylated or become hypermethylated in cancer

We next used the de novo motif discovery algorithm HOMER to search for sequence patterns that are associated with CpG island regions that are either specifically and highly methylated in leukemia cell lines or not methylated in any sample (also see Supplementary Materials and Methods) and were able to identify a set of eight nonredundant sequence motifs that were highly enriched in either population in comparison with all CpG island regions on the array (Fig. 2A). Two repetitive motifs were highly enriched in the hypermethylated CpG island population. More strikingly, our de novo motif search revealed several sequences highly enriched in unmethylated CpG island regions that corresponded to consensus binding sites for known TFs, including nuclear TF Y (NFY), GA-binding protein (GABP), Sp1, NRF1, and YY1. These motifs were enriched with high significance and showed a clear enrichment/depletion in unmethylated or methylated CpG island regions, respectively (Fig. 2B). We next obtained comparative methylation profiles of samples from acute leukemia (n = 8, compared with normal monocytes) and colorectal carcinoma patients (n = 10, compared with normal colon) and analyzed the distribution of the above-identified motifs. All sequence motifs were significantly enriched in either unmethylated or methylated CpG island regions in both primary tumor types (Fig. 2B).

Figure 2.

Sequence motifs associated with aberrantly DNA methylated (mCpG) and commonly unmethylated CpG island regions (CpG). A, P values (hypergeometric) for the enrichment of the indicated sequence motifs were assigned based on motif-centered methylation data (mean signal intensities of all microarray probes in the range of ±150 bp around each motif; for a detailed description, see Supplementary Materials and Methods). Motifs identified de novo are shown in comparison with known matrices from the TRANSFAC database. B, the two left diagrams depict ratios of observed versus expected motif occurrences in CpG island regions that are aberrantly DNA methylated specifically in cell lines (mCpG, blue columns) or unmethylated in monocytes and cell lines (CpG, red columns). The distribution of sequence motifs was also analyzed in acute leukemia samples (AL; n = 8) or colorectal carcinomas (CRC; n = 10). Here, median ratios of observed versus expected motif occurrences are shown as described above. Error bars mark the interquartile range. Hypergeometric P values for individual enrichments are listed in Supplementary Table S3.

Motifs isolated from unmethylated CpG islands were previously described as prominent constituents of proximal promoters (29, 30). Indeed, all six sequence motifs identified in unmethylated regions were enriched within the proximal promoter regions of known genes (Supplementary Fig. S3A and B), whereas repeat sequences showed no specific enrichment around TSSs. Motif searches with groups of unmethylated or methylated CpG island regions that were classified according to their genomic position (promoter/intragenic/intergenic) additionally identified a CTCF consensus motif specifically enriched in unmethylated intergenic CpG island regions (Supplementary Fig. S3A and B; Supplementary Materials and Methods). Despite the significant overrepresentation of the protective motifs in promoters, they were also enriched in unmethylated CpG island regions that were located in intergenic or intragenic regions (Supplementary Fig. S3C). By plotting average MCIp signal intensities against motif distance for each of the protective motifs, we showed that signal ratios were lowest at the center and progressively increased with distance. Distance-related differences in signal ratios markedly increased in leukemia cells, suggesting that these motifs are indeed associated with lower methylation levels and that this association depends on motif distance (Supplementary Fig. S4A). This concurs with the preferential de novo methylation of CpG island shores detected in colon cancer (27). Averaged DNA methylation ratios of individual CpGs derived from high-throughput reduced representation bisulfite sequencing of mouse primary tissues (31) show a similar motif distance–dependent distribution (Supplementary Fig. S4B and C).
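The distance-dependent analysis described above can be sketched in Python as follows. This is a simplified illustration, not the procedure used for Supplementary Fig. S4: it assumes all probes and motifs lie on a single chromosome, and the function name `distance_profile`, the bin size, the maximum distance, and the example values are hypothetical.

```python
from collections import defaultdict


def distance_profile(motif_positions, probes, bin_size=50, max_dist=500):
    """Average probe signal as a function of distance to the nearest motif.

    motif_positions: genomic coordinates of motif centers (one chromosome)
    probes: list of (position, log_ratio) pairs
    Returns {bin_start_distance: mean log_ratio}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, signal in probes:
        dist = min(abs(pos - m) for m in motif_positions)
        if dist > max_dist:
            continue  # probe is too far from any motif to be informative
        bin_start = dist // bin_size * bin_size
        sums[bin_start] += signal
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical probes around one motif at position 1000.
profile = distance_profile(
    [1000],
    [(1000, -1.0), (1020, -0.8), (1060, -0.2), (1140, 0.4), (900, 0.0), (3000, 2.0)],
)
print(profile)
```

A profile whose values rise with distance from the motif center would correspond to the pattern reported in the text.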

Sequence motifs and TF binding in normal cells correlate with CpG methylation status in leukemia

To study the correlation between motif appearance, TF binding in normal cells, and aberrant DNA methylation in the tumor cell lines, we performed ChIP-on-chip experiments for the TFs Sp1, NRF1, and YY1 in normal monocytes. Like their consensus sites, these factors preferentially bound to promoter regions (Supplementary Fig. S5A), often bound in the vicinity of each other (Supplementary Fig. S5B), and showed enrichment of the other protective motifs around their binding sites (Supplementary Fig. S5C). Some motifs showed preferences in terms of orientation or distance to each other (Supplementary Fig. S6A). In most cases, motif distances showed periodic preferences, which is in line with steric preferences caused by the helical structure of DNA (Supplementary Fig. S6A). Genes associated with TF-bound CpG islands generally showed significantly higher mRNA levels in normal progenitors (CD34+ cells), normal blood monocytes (CD14+ cells), or a leukemia cell line as compared with all genes (Supplementary Fig. S6B), and binding of more than one factor generally increased the overall expression level of associated genes (Supplementary Fig. S6C).

The direct comparison of TF binding patterns in normal cells with aberrant methylation profiles of leukemia cell lines showed that both events were mutually exclusive in all three cases (Fig. 3; Supplementary Fig. S7). We also observed that TF binding was not detected at every motif. The comparison of bound and nonbound motifs using the de novo motif-finding algorithm revealed that TF-bound motifs were coenriched for the consensus motifs of the other protective factors (Fig. 4A). A sequence motif was more likely to be bound if at least one or two other motifs were located in close proximity (Fig. 4B), and genes associated with TF-bound motifs showed significantly higher mRNA levels than genes associated with nonbound motifs (Fig. 4C). The data suggest that the stable binding of these general TFs (as measured by ChIP) to their consensus motif depends on the presence of neighboring motifs that are cooperatively bound by other general TFs. Thus, the combinatorial presence of two or more of the identified consensus sequences may serve to stabilize TF binding and to confer the resistance of certain CpG islands (preferably those acting as promoters) to aberrant methylation.
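The relationship between neighboring motifs and binding can be sketched in Python as a simple tally. The sketch only reports the bound fraction of motif sites grouped by the number of other consensus motifs within 250 bp; it does not reproduce the observed-versus-expected ratios or the hypergeometric tests of Fig. 4B. The function name `bound_fraction_by_neighbors`, the single-chromosome coordinates, and the example sites are hypothetical.

```python
def bound_fraction_by_neighbors(sites, window=250):
    """Fraction of bound motif sites, grouped by neighboring-motif count.

    sites: list of (position, motif_name, is_bound) on one chromosome
    window: maximum distance in bp for a motif to count as a neighbor
    Returns {0: fraction, 1: fraction, 2: fraction}, where key 2 means
    two or more neighbors; groups without sites are omitted.
    """
    tallies = {}
    for pos, name, bound in sites:
        neighbors = sum(
            1
            for other_pos, other_name, _ in sites
            if other_name != name and abs(other_pos - pos) <= window
        )
        key = min(neighbors, 2)
        total, hits = tallies.get(key, (0, 0))
        tallies[key] = (total + 1, hits + bool(bound))
    return {k: hits / total for k, (total, hits) in sorted(tallies.items())}


# Hypothetical motif sites: a dense cluster, an isolated site, and a pair.
sites = [
    (100, "Sp1", True),
    (200, "NRF1", True),
    (300, "YY1", True),
    (5000, "Sp1", False),
    (9000, "YY1", False),
    (9100, "NRF1", True),
]
fractions = bound_fraction_by_neighbors(sites)
print(fractions)
```

Only motifs of a different type are counted as neighbors here, which mirrors the focus on co-occurrence with other consensus sites.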

Figure 3.

Correlation between TF binding in normal cells and aberrant de novo methylation in leukemia cells. The three TFs—Sp1, NRF1, and YY1—were analyzed using ChIP-on-Chip on human 244K CpG island arrays. Signal intensity ratios of ChIP enrichment of each TF are plotted against the MCIp enrichment of the leukemia cell line (THP-1) versus normal human blood monocytes. Probes associated with the corresponding consensus motifs are colored; all other probes are in gray. Corresponding diagrams for U937 are shown in Supplementary Fig. S7.

Figure 4.

Properties of consensus sequences that are bound or nonbound by the corresponding TF. A, based on ChIP-on-Chip data, the motifs for Sp1, NRF1, and YY1 could be subdivided into bound and nonbound CpG island sequences. De novo motif searches of bound motifs against nonbound motifs revealed a highly significant association of bound motifs with other general consensus sites within ±250 bp around each motif (hypergeometric P values are given next to the motif). B, ratios of observed versus expected motif occurrences are shown for sequence motifs that are either bound by the corresponding factor (blue columns) or not bound (green columns) and had at least one (top) or two other consensus sites (bottom) within a 250-bp distance. Enrichment in the bound fraction and depletion in the unbound fraction were highly significant (hypergeometric test: P < 0.001), except for the cases marked with a hash. C, the box plots show the distribution of mRNA expression ratios (CD34+ progenitor cells, CD14+ monocytes, and U937 cells) conditional on the binding status of the associated motif. Red lines, medians; boxes, interquartile ranges; whiskers, 5th and 95th percentiles. Pairwise comparisons of mRNA expression ratios associated with bound and nonbound motifs are significant (P < 0.001, Mann-Whitney U test, two-sided).

Properties of CpG island–associated genes in conjunction with CpG island methylation status and TF binding

Finally, we asked whether DNA methylation status or TF binding events at CpG islands are associated with attributes or properties of the corresponding genes or their products. Thirteen databases were analyzed for enrichment of specific terms or properties, including gene ontology terms, pathway association, protein domains or interactions, chromosomal localization, and predicted miRNA targets in regions that were associated with a DNA methylation status or bound by any of the three TFs Sp1, NRF1, or YY1 (Fig. 5; Supplementary Table S4). Hierarchical clustering of enrichment P values clearly separated the three classes of CpG islands into functional groups. DNA methylation–free and TF-bound regions included terms that were associated with basic cellular functions required for cell survival and proliferation. In line with earlier observations (32), CpG island regions that are commonly targeted by aberrant DNA methylation in both myeloid cell lines showed highly significant associations with gene ontology terms related to developmental processes, TF or receptor functions, as well as homeobox proteins, which are often targeted by Polycomb group repressors. Interestingly, these associations were also found in regions that contained unbound consensus motifs for at least one of the three general TFs described above and, to a lesser extent, in regions that were also methylated in normal somatic cells (human blood monocytes; Supplementary Table S4).

Figure 5.

Hierarchical clustering of significance values for gene ontology enrichment. Enrichment or depletion was calculated for gene attributes and properties, including gene ontology (GO) terms, pathway association, protein domains or interactions, chromosomal localization, and predicted miRNA targets (a complete list of databases is given in Supplementary Materials and Methods) in regions that were associated with a DNA methylation status (mCpG, methylated; CpG, unmethylated), motif presence without DNA binding of the respective factor, or TF binding [any of the three TFs (Sp1, NRF1, or YY1) in total (All), alone (Only), or in combination]. P values for enrichment or depletion of each attribute were calculated using the hypergeometric test (the complete list of P values is given in Supplementary Table S3), and attributes with P < 10^−10 were used to perform hierarchical clustering (Pearson centered, average linkage). Data are presented as a heat map, where red coloring indicates the significant depletion and blue coloring the significant enrichment of an attribute. Main clusters of attributes are indicated and top terms are given for each group.

The hypothesis that a TF provides methylation protection dates back to the reports of two independent groups in 1994, showing that an Sp1-binding site is necessary to protect the APRT gene from de novo methylation (9, 13) in humans and mice. Because Sp1-deficient animals had no obvious “methylation defects,” the concept of methylation protection by TFs has remained controversial. Anecdotal evidence clearly supports a role of specific DNA-binding proteins in establishing and maintaining DNA methylation patterns, but it is unclear whether the reported observations represent isolated cases or whether methylation protection represents a general mechanism. Earlier computational studies largely failed to identify defined consensus motifs for known TFs. Only one recent survey of methylation states at CpG islands in normal human tissues described the association of unmethylated CpG islands with the consensus motif for Sp1 (28).

Using a powerful de novo motif analysis, our study shows that several defined sequence motifs are strongly enriched in CpG islands that are generally resistant to de novo methylation in cancer. These sequence motifs were previously shown to represent the most conserved motifs in mammalian promoters (30), but the observed correlation is also evident at intergenic, promoter-distal CpG islands that are not directly associated with transcription. We also show that the sole presence of a consensus motif for any of the general factors is not sufficient to confer “protection” from de novo methylation. In fact, protection from de novo methylation requires the stable binding of these factors to their binding sites, which in turn requires the presence of neighboring motifs that are cobound by at least one other ubiquitous (or in some cases cell type–specific) TF (a schematic model describing the methylation protection hypothesis is shown in Fig. 6).

Figure 6.

A model for DNA methylation protection by the combinatorial action of general TFs. If two or more consensus sites for general TFs are located in close proximity, these sites are likely to be bound stably by the corresponding factors. Stably bound factors likely recruit cofactors that in turn create a protective chromatin environment (e.g., by introducing protective histone marks such as H3K4 methylation). These regions are only rarely methylated during neoplastic transformation or aging. A single isolated motif is less likely to be bound by its corresponding factor, will have a less protective chromatin environment, and is more likely targeted by de novo methylation in cancer.

Most resistant CpG islands were bound by combinations of ubiquitous TFs and were also associated with basic cellular functions, whereas “methylation-prone” CpG islands were generally associated with organismal development, differentiation, and cell communication, which are frequently regulated by cell type–specific TFs. Interestingly, genes associated with CpG islands that were commonly methylated in normal and cancer cells were enriched for predicted targets of specific (mostly uncharacterized) miRNAs; however, the relevance of this observation is uncertain and requires functional validation. We also observed that methylation-prone regions are significantly enriched for certain repeat motifs (GAGA and CACA), implying that these may also act as cis-acting sequences and direct de novo DNA methylation. GAGA resembles the consensus motif for Drosophila GAGA-binding factor, a trithorax group member that has been implicated in preventing heterochromatin spreading (33); however, a mammalian homologue has not been described thus far. CA repeats have not been previously linked to DNA methylation or chromatin structure.

With the exception of the Sp1/Sp3 motif (8, 9, 28), none of the motifs has previously been associated with the establishment or maintenance of DNA methylation, but all are known to recruit epigenetic modifiers to their binding sites. NFY, a regulator of many cell cycle control genes, actively recruits coactivators (such as p300) that induce histone acetylation at NFY-bound promoters (34). Ubiquitously expressed NRF1 and GABP (also called NRF2) are able to recruit coactivators (PGC-1 and p300/CBP) that create a chromatin environment favoring transcription (35, 36). YY1 has been shown to recruit Polycomb group proteins that control H3K27 methylation, a mark that has previously been implicated in aberrant silencing mechanisms during tumorigenesis (6, 37). However, a recent study by Lindroth and colleagues (38) elegantly showed that H3K27 methylation (recruited by YY1) and CpG DNA methylation at the murine Rasgrf1 locus are mutually exclusive, suggesting that the two epigenetic marks are interdependent and antagonistic. This is also consistent with a recent study globally mapping key histone modifications and subunits of Polycomb-repressive complexes 1 and 2 (PRC1 and PRC2) in embryonic stem cells (39), which identified a YY1-like motif enriched in CpG islands that were not targeted by PRC2. Additional motifs identified in that study (ETS, NFY, AP-1, MYC, and NRF1; ref. 39) partially overlapped with those observed in the present study, further corroborating the negative correlation between repressive epigenetic marks and cis-acting sequences conferring transcriptional activity. In line with several recent observations showing that the DNA methylation status correlates with histone modifications (31, 40, 41), the factors binding the identified sequences likely share the ability to recruit RNA polymerase II (Pol II) and to create an “active” chromatin environment that may prevent or at least impede de novo CpG methylation at particular CpG islands.

An analogous study recently showed that the presence of RNA Pol II, active or stalled, predicts the epigenetic fate of promoter CpG islands in cancer (42). Because the recruitment of RNA Pol II requires cis-acting factors such as Sp1 (43), a large overlap between TF and Pol II binding is expected and the association of Pol II with resistance to de novo methylation is likely a consequence of its interaction with TFs present at the promoter. However, the fact that TF-bound, promoter-distal sites were equally resistant to de novo methylation in our study suggests that cis-acting factors may have a protective role independent of Pol II binding.

In conclusion, our data provide strong experimental and computational evidence that specific sequence motifs are associated with the DNA methylation states of CpG islands in normal and malignant cells. Most of these motifs are identical to consensus motifs for known general TFs, and our data suggest that the combinatorial binding of these factors plays a dominant role in regulating the DNA methylation status at a large set of CpG islands. Our findings also imply that the aberrant methylation patterns in cancer cells may at least in part result from a “loss of protection.”

M. Ehrich is a shareholder and employee of Sequenom, Inc.

Grant Support: Wilhelm Sander Stiftung and Deutsche Krebshilfe (M. Rehli).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Jones PA, Baylin SB. The epigenomics of cancer. Cell 2007;128:683–92.
2. Metivier R, Gallais R, Tiffoche C, et al. Cyclical DNA methylation of a transcriptionally active promoter. Nature 2008;452:45–50.
3. Suzuki M, Yamada T, Kihara-Negishi F, et al. Site-specific DNA methylation by a complex of PU.1 and Dnmt3a/b. Oncogene 2006;25:2477–88.
4. Feldman N, Gerson A, Fang J, et al. G9a-mediated irreversible epigenetic inactivation of Oct-3/4 during early embryogenesis. Nat Cell Biol 2006;8:188–94.
5. Tachibana M, Matsumura Y, Fukuda M, Kimura H, Shinkai Y. G9a/GLP complexes independently mediate H3K9 and DNA methylation to silence transcription. EMBO J 2008;27:2681–90.
6. Vire E, Brenner C, Deplus R, et al. The Polycomb group protein EZH2 directly controls DNA methylation. Nature 2006;439:871–4.
7. Turker MS. Gene silencing in mammalian cells and the spread of DNA methylation. Oncogene 2002;21:5388–93.
8. Boumber YA, Kondo Y, Chen X, et al. An Sp1/Sp3 binding polymorphism confers methylation protection. PLoS Genet 2008;4:e1000162.
9. Brandeis M, Frank D, Keshet I, et al. Sp1 elements protect a CpG island from de novo methylation. Nature 1994;371:435–8.
10. Han L, Lin IG, Hsieh CL. Protein binding protects sites on stable episomes and in the chromosome from de novo methylation. Mol Cell Biol 2001;21:3416–24.
11. Kress C, Thomassin H, Grange T. Active cytosine demethylation triggered by a nuclear receptor involves DNA strand breaks. Proc Natl Acad Sci U S A 2006;103:11112–7.
12. Lin IG, Hsieh CL. Chromosomal DNA demethylation specified by protein binding. EMBO Rep 2001;2:108–12.
13. Macleod D, Charlton J, Mullins J, Bird AP. Sp1 sites in the mouse aprt gene promoter are required to prevent methylation of the CpG island. Genes Dev 1994;8:2282–92.
14. Tagoh H, Melnik S, Lefevre P, Chong S, Riggs AD, Bonifer C. Dynamic reorganization of chromatin structure and selective DNA demethylation prior to stable enhancer complex formation during differentiation of primary hematopoietic cells in vitro. Blood 2004;103:2950–5.
15. Thomassin H, Flavin M, Espinas ML, Grange T. Glucocorticoid-induced DNA demethylation and gene memory during development. EMBO J 2001;20:1974–83.
16. Feltus FA, Lee EK, Costello JF, Plass C, Vertino PM. Predicting aberrant CpG island methylation. Proc Natl Acad Sci U S A 2003;100:12253–8.
17. Feltus FA, Lee EK, Costello JF, Plass C, Vertino PM. DNA motifs associated with aberrant CpG island methylation. Genomics 2006;87:572–9.
18. Das R, Dimitrova N, Xuan Z, et al. Computational prediction of methylation status in human genomic sequences. Proc Natl Acad Sci U S A 2006;103:10713–6.
19. Keshet I, Schlesinger Y, Farkash S, et al. Evidence for an instructive mechanism of de novo methylation in cancer cells. Nat Genet 2006;38:149–53.
20. Illingworth R, Kerr A, Desousa D, et al. A novel CpG island set identifies tissue-specific methylation at developmental gene loci. PLoS Biol 2008;6:e22.
21. Bock C, Paulsen M, Tierling S, Mikeska T, Lengauer T, Walter J. CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure. PLoS Genet 2006;2:e26.
22. Gebhard C, Schwarzfischer L, Pham TH, et al. Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res 2006;66:6118–28.
23. Metivier R, Penot G, Hubner MR, et al. Estrogen receptor-α directs ordered, cyclical, and combinatorial recruitment of cofactors on a natural target promoter. Cell 2003;115:751–63.
24. Barash Y, Bejerano G, Friedman N. A simple hyper-geometric approach for discovering putative transcription factor binding sites. WABI '01: Proceedings of the First International Workshop on Algorithms in Bioinformatics, vol. 2149. London (UK): Springer-Verlag; 2001, p. 278–293.
25. Schilling E, Rehli M. Global, comparative analysis of tissue-specific promoter CpG methylation. Genomics 2007;90:314–23.
26. Ehrich M, Nelson MR, Stanssens P, et al. Quantitative high-throughput analysis of DNA methylation patterns by base-specific cleavage and mass spectrometry. Proc Natl Acad Sci U S A 2005;102:15785–90.
27. Irizarry RA, Ladd-Acosta C, Wen B, et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 2009;41:178–86.
28. Straussman R, Nejman D, Roberts D, et al. Developmental programming of CpG island methylation profiles in the human genome. Nat Struct Mol Biol 2009;16:564–71.
29. Rozenberg JM, Shlyakhtenko A, Glass K, et al. All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues. BMC Genomics 2008;9:67.
30. Xie X, Lu J, Kulbokas EJ, et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 2005;434:338–45.
31. Meissner A, Mikkelsen TS, Gu H, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 2008;454:766–70.
32. Bracken AP, Dietrich N, Pasini D, Hansen KH, Helin K. Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes Dev 2006;20:1123–36.
33. Nakayama T, Nishioka K, Dong YX, Shimojima T, Hirose S. Drosophila GAGA factor directs histone H3.3 replacement that prevents the heterochromatin spreading. Genes Dev 2007;21:552–61.
34. Faniello MC, Bevilacqua MA, Condorelli G, et al. The B subunit of the CAAT-binding factor NFY binds the central segment of the co-activator p300. J Biol Chem 1999;274:7623–6.
35. Izumi H, Ohta R, Nagatani G, et al. p300/CBP-associated factor (P/CAF) interacts with nuclear respiratory factor-1 to regulate the UDP-N-acetyl-α-d-galactosamine: polypeptide N-acetylgalactosaminyltransferase-3 gene. Biochem J 2003;373:713–22.
36. Wu Z, Puigserver P, Andersson U, et al. Mechanisms controlling mitochondrial biogenesis and respiration through the thermogenic coactivator PGC-1. Cell 1999;98:115–24.
37. Schlesinger Y, Straussman R, Keshet I, et al. Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet 2007;39:232–6.
38. Lindroth AM, Park YJ, McLean CM, et al. Antagonism between DNA and H3K27 methylation at the imprinted Rasgrf1 locus. PLoS Genet 2008;4:e1000145.
39. Ku M, Koche RP, Rheinbay E, et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet 2008;4:e1000242.
40. Brunner AL, Johnson DS, Kim SW, et al. Distinct DNA methylation patterns characterize differentiated human embryonic stem cells and developing human fetal liver. Genome Res 2009;19:1044–56.
41. Schmidl C, Klug M, Boeld TJ, et al. Lineage-specific DNA methylation in T cells correlates with histone methylation and enhancer activity. Genome Res 2009;19:1165–74.
42. Takeshima H, Yamashita S, Shimazu T, Niwa T, Ushijima T. The presence of RNA polymerase II, active or stalled, predicts epigenetic fate of promoter CpG islands. Genome Res 2009;19:1974–82.
43. Lemon B, Tjian R. Orchestrated response: a symphony of transcription factors for gene control. Genes Dev 2000;14:2551–69.