Abstract
We present a straightforward and comprehensive approach for DNA methylation analysis in mammalian genomes. The methylated-CpG island recovery assay (MIRA), which is based on the high affinity of the MBD2/MBD3L1 complex for methylated DNA, has been used to detect cell type–dependent differences in DNA methylation on a microarray platform. The procedure has been verified and applied to identify a series of novel candidate lung tumor suppressor genes and potential DNA methylation markers that contain methylated CpG islands. One gene of particular interest was DLEC1, located at a commonly deleted area on chromosome 3p22-p21.3, which was frequently methylated in primary lung cancers and melanomas. Among the identified methylated genes, homeodomain-containing genes were unusually frequent (11 of the top 50 hits) and were targeted on different chromosomes. These genes included LHX2, LHX4, PAX7, HOXB13, LBX1, SIX2, HOXD3, DLX1, HOXD1, ONECUT2, and PAX9. The data show that MIRA-assisted microarray analysis has a low false-positive rate and has the capacity to catalogue methylated CpG islands on a genome-wide basis. The results support the hypothesis that cancer-associated DNA methylation events do not occur randomly throughout the genome but at least some are targeted by specific mechanisms. (Cancer Res 2006; 66(16): 7939-47)
Introduction
In mammalian cells, the DNA base 5-methylcytosine occurs at 5′-CpG dinucleotides and provides the basis for a common mode of epigenetic inheritance. Changes in DNA methylation patterns occur in a developmental stage– and tissue-specific manner and often accompany tumor development, most notably in the form of CpG island hypermethylation (1–10). During tumorigenesis, both alleles of a tumor suppressor gene need to be inactivated by genomic changes such as chromosomal deletions or loss-of-function mutations in the coding region of a gene. As an alternative mechanism, transcriptional silencing by hypermethylation of CpG islands spanning the promoter regions of tumor suppressor genes is a common and important process in carcinogenesis. Because hypermethylation generally leads to inactivation of gene expression, this epigenetic alteration is considered to be a key mechanism for long-term silencing of tumor suppressor genes. The importance of promoter methylation in functional inactivation of lung cancer suppressor genes is becoming increasingly recognized. It is estimated that between 0.5% and 3% of all genes carrying CpG-rich promoter sequences (so-called CpG islands) may be silenced by DNA methylation in lung cancer (1, 11). This means that there are most likely several hundred genes that are incapacitated by this pathway. Some of these genes may be bona fide tumor suppressor genes, but in other cases, the methylation event may be a consequence of gene silencing or may somehow be associated with tumor formation rather than being a cause of tumorigenesis. Several specific genes are methylated in lung cancer, including CDKN2A, RASSF1A, RARβ, MGMT, GSTP1, CDH13, APC, DAPK, TIMP3, and several others (12–16). The methylation frequency (the percentage of tumors analyzed that carry methylated alleles) ranges from <10% to >80% but these numbers differ widely depending on the histologic type of tumor, the study population, and/or the methods used to assess methylation. To improve the sensitivity of screening tools for the detection of early lung cancer, DNA methylation markers have shown great promise (7, 17, 18). However, many more markers that could have improved specificity in discriminating tumor from normal tissue and are methylated at a high frequency in lung tumors have likely not yet been discovered.
To analyze DNA methylation patterns on a genome-wide scale, several techniques have been developed, but none of them has yet reached wide acceptance. Most methods currently available are labor-intensive and use methylation-sensitive restriction endonucleases, and thus are limited by the occurrence of the respective sites within the target sequence. Another way to find methylated genes is by using expression microarrays to identify genes reactivated by treatment with DNA methylation inhibitors (e.g., `5-aza-deoxycytidine; refs. 19–21). This approach is effective but can only be used with cell lines. Recently, genomic tiling and BAC microarrays have been introduced to map methylation patterns (22, 23). These methods are also limited, both in terms of their level of resolution and in terms of the requirements for restriction endonuclease recognition sites. An antibody against 5-methylcytosine has been used in immunoprecipitation experiments combined with microarrays (6, 23). However, this antibody requires ssDNA for recognition, which is sometimes difficult to achieve in CpG-rich DNA regions. Here we describe a new genome-wide DNA methylation detection method that depends neither on restriction endonucleases nor on specific antibodies. This method is based on the methylated-CpG island recovery assay (MIRA), which we previously applied for testing the methylation status of specific genes (24). It makes use of the high affinity of the MBD2b/MBD3L1 complex for methylated DNA (24, 25). Here we show that MIRA can be used to analyze the DNA methylation status of a large number of genes simultaneously using a microarray approach.
Materials and Methods
MIRA and microarray analysis. DNA obtained from normal human bronchial epithelial (NHBE) cells and from the lung cancer cell line A549 was digested with MseI (5′-TTAA), which produces small (∼200-300 bp) fragments and generally cuts outside of CpG islands. Linkers (upper strand sequence, 5′-TAGAATTCAGATCTCCCG-3′; lower strand sequence, 3′-CTTAAGTCTAGAGGGCCCAGTGGCG-5′) were ligated to the MseI-digested DNA and enrichment of the methylated fraction was done by MIRA as previously described (24). Briefly, 1 μg of purified GST-tagged MBD2b protein and 1 μg of purified His-tagged MBD3L1 protein were preincubated and bound to a glutathione sepharose CL-4B matrix (Amersham Biosciences, Piscataway, NJ). The plasmids used to produce these proteins are available on request. This matrix was incubated with 500 ng of MseI-cut and linker-ligated genomic DNA in 400 μL of a binding reaction mixture [10 mmol/L Tris-HCl (pH 7.5), 50 mmol/L NaCl, 1 mmol/L EDTA, 1 mmol/L DTT, 3 mmol/L MgCl2, 0.1% Triton-X100, 5% glycerol, 25 μg/mL bovine serum albumin, and 1.25 μg/mL sonicated JM110 (dcm minus) bacterial DNA] for 240 minutes at 4°C on a rocking platform. After washing the pelletted sepharose beads thrice with binding buffer containing 700 mmol/L NaCl, the methylated DNA–enriched genomic DNA fraction was eluted by addition of guanidinium hydrochloride–containing buffer and purified using Qiaquick PCR purification kits according to the instructions of the manufacturer (Qiagen, Valencia, CA). The MIRA-captured DNA was then PCR amplified using 4.5 μmol/L linker primers in reaction buffer (50 μL) containing deoxynucleotide triphosphates (250 μmol/L each) and Titanium Taq polymerase (Clontech, Palo Alto, CA). Twenty-two cycles of amplification at 94°C for 1 minute, 60°C for 1 minute, and 68°C for 1 minute were done. The amplified fragments were labeled with either Cy3-dCTP or Cy5-dCTP (Amersham) by random priming. In parallel, the MIRA-input DNA from normal and tumor DNA was labeled with either Cy3-dCTP or Cy5-dCTP by random priming (see scheme in Fig. 2). Then the dye-labeled DNA samples (500 ng each) were mixed and hybridized to human CpG island microarrays (UHN Microarray Centre, University of Toronto, Toronto, Canada). The sequences present on this array are derived from a CpG island library in which 68% of the unique sequences map to the 5′ end of known or putative genes (26, 27). The dye-labeled DNA samples were ethanol precipitated overnight at −70°C in the presence of human Cot-1 DNA (7.2 μg Cot-1 DNA/μg sample). After ethanol precipitation, the samples were resuspended in 22 μL of hybridization buffer 1 (2.2× SSC, 0.22% SDS). Then, 20 μL of hybridization buffer 2 (70% formamide/3× SSC, 14.3% dextran sulfate) were added to the mixture, and the tube was first heated at 95°C for 5 minutes to denature the DNA and then incubated at 42°C for 2 minutes. At this step, 4 μg of yeast tRNA (Sigma) and 3 μL of 2% BSA were added to the mixture, which was then spotted onto the DNA microarray slides. A 25 × 60 mm coverslip was then gently placed on top of the samples and the hybridization was carried out in a hybridization chamber (Corning, Acton, MA) at 60°C overnight in a water bath. After the hybridization, the microarray slides were washed and dried by a spinning in a tabletop centrifuge. Two independent MIRA assays were done for tumor and normal tissue, and the input and MIRA-enriched fractions were labeled using either Cy3 or Cy5. A total of eight arrays were labeled as dye-swap pairs and hybridized. After washing, the slides were scanned using an Axon GenePix 4000b scanner and images were quantified by GenePix pro v.6 software. Preprocessing of raw data and statistical analysis were done using Bioconductor packages in R programming environment. The spots marked as “bad” or “not found” by the GenePix software were excluded. Background correction was done using the “normexp” method implemented in the Bioconductor LIMMA package to adjust the local median background estimates. The background-corrected intensity data were normalized using the Print-tip group Lowess method to remove the bias within each array. The dye bias was then further corrected by averaging the log 2 ratios between the dye swap pairs. Based on our experience, the combined Lowess and dye swap normalization approach can best reduce variability. CpG island methylation profiles were determined by ratios between MIRA-enriched and unenriched samples (enrichment factor) for both tumor and normal tissues. The ratios of the enrichment factors between cancer and normal DNA samples will measure the methylation difference between cancer and normal tissue. To identify the CpG islands that are differentially methylated between normal and tumor cell DNA, methylation profiles were compared using statistical linear model in LIMMA. For target gene selection, unadjusted P values were set at a level of 0.05, and the fold change between cancer MIRA/Input versus normal MIRA/Input (difference factor) was set at >2. Direct comparison of MIRA-enriched fractions from tumor and normal tissue DNA provided independent confirmation for the methylation differences observed although the latter analysis may be affected by differences in gene copy numbers between normal and tumor tissue.
DNA methylation analysis using combined bisulfite restriction analysis and bisulfite sequencing. DNA was isolated from cell lines or frozen tumors and matched normal tissue by standard phenol-chloroform extraction and ethanol precipitation. Non–small-cell lung carcinoma tumor tissue samples and matching normal tissues removed with surgery were obtained from the frozen tumor bank of the City of Hope National Medical Center. The combined bisulfite restriction analysis (COBRA) assays were done using the method of Xiong and Laird (28). DNA was treated with sodium bisulfite and purified as described (24). PCR primers for amplification of specific targets in bisulfite-treated DNA are listed in Supplementary Table S1. For sequence analysis, the PCR products obtained after bisulfite conversion were cloned into the pDrive PCR cloning vector (Qiagen) and 6 to 10 individual clones were sequenced.
5-Aza-deoxycytidine treatment. A549 cells were treated with 5-aza-2′-deoxycytidine (Sigma) at 1, 5, and 10 μmol/L concentrations. Cells, 2 × 106 each, were grown for 4 days in the presence of different concentrations of 5-aza-deoxycytidine. The medium was changed daily and 5-aza-deoxycytidine was replenished each day. RNA was isolated and reverse transcription-PCR (RT-PCR) was done using forward primer 5′-GATATCTCGCACTTGCTCACC and reverse primer 5′-ATCCAGCCGCTGCTTATAGA.
Gel mobility shift assays. To test the affinity of the MBD2b/MBD3L1 complex for CpG-methylated DNA, a double-stranded 55-mer oligonucleotide (sequence 5′-GATCCGCACCGAGGCGGGC-CGCTCCGTAGCGCTACGGGACGCCCCGGGCCGCAGG-3′) containing varying numbers of symmetrically methylated CpG dinucleotides was labeled at the 5′-end and incubated with recombinant GST-tagged MBD2b alone (100 ng of protein) or with the GST-MBD2b/His-MBD3L1 complex (100 ng of each protein). The protein DNA complexes were incubated for 5 minutes at room temperature in a buffer containing 10 mmol/L Tris-HCl (pH 7.5), 50 mmol/L NaCl, 1 mmol/L EDTA, 1 mmol/L DTT, 3 mmol/L MgCl2, 0.1% Triton-X100, 5% glycerol, 25 μg/mL BSA, and 1.25 μg/mL sonicated JM110 (dcm minus) bacterial DNA. Mobility shift assays were done using 5% polyacrylamide gels.
Results
Among all methyl-CpG binding proteins known, MBD2b has the highest affinity for methylated DNA and displays the greatest capacity to differentiate between methylated and unmethylated DNA (29). It recognizes a wide range of methylated CpG sequences with little sequence specificity, unlike MeCP2, which prefers AT-rich neighboring sequence contexts (30). MBD3L1, a protein with homology to MBD2b, lacks the NH2-terminal methyl-CpG binding domain but directly interacts with MBD2b (25, 31). Using a randomly designed 55-mer oligonucleotide with different numbers of methylated CpGs, we determined that the MBD2b/MBD3L1 complex has a much higher affinity for methylated DNA than MBD2b alone (Fig. 1A). Depending on the number of methylated CpGs, the presence of MBD3L1 leads to enhanced binding of the complex and reduced mobility, suggesting the formation of multiple MBD2b/MBD3L1 complexes in the presence of more than one methylated CpG dinculeotide. The MIRA assay, which is based on the strong affinity of the MBD2b/MBD3L1 complex for methylated DNA, has a high specificity for enriching methylated DNA, and unmethylated DNA fragments stay in the supernatant (24). We initially determined that the pulldown of methylated DNA by MIRA is dependent on the number of methylated CpGs present in genomic DNA fragments. Recovery of fragments containing 13 methylated CpGs was more efficient than recovery of fragments containing two methylated CpGs and much more efficient than recovery of fragments with only one or zero methylated CpG sites (Fig. 1B). Thus, the efficiency of the MIRA enrichment depends on CpG density and the approach seems to be ideally suited for pulling down methylated CpG islands.
Affinity of the MBD2b/MBD3L1 complex for CpG-methylated DNA and dependence of the MIRA pulldown on the number of methylated CpGs. A, gel mobility shift assay with CpG-methylated oligonucleotides. A 55-mer oligonucleotide, containing varying numbers of symmetrically methylated CpG dinucleotides, was incubated with recombinant MBD2b alone (100 ng of protein) or with the MBD2b/MBD3L1 complex (100 ng of each protein). The protein DNA complexes were incubated for 5 minutes at room temperature and then a mobility shift assay was done using a 5% polyacrylamide gel. B, efficiency of the MIRA pull-down is dependent on the number of methylated CpGs. Restriction-cut human genomic DNA was methylated with different DNA methylases (SssI, HpaII, and/or HhaI) to introduce different numbers of methylated CpGs into the unmethylated CpG island promoter of the TBP gene. After MIRA, the fragments were amplified by quantitative real-time PCR using TBP-specific primers.
Affinity of the MBD2b/MBD3L1 complex for CpG-methylated DNA and dependence of the MIRA pulldown on the number of methylated CpGs. A, gel mobility shift assay with CpG-methylated oligonucleotides. A 55-mer oligonucleotide, containing varying numbers of symmetrically methylated CpG dinucleotides, was incubated with recombinant MBD2b alone (100 ng of protein) or with the MBD2b/MBD3L1 complex (100 ng of each protein). The protein DNA complexes were incubated for 5 minutes at room temperature and then a mobility shift assay was done using a 5% polyacrylamide gel. B, efficiency of the MIRA pull-down is dependent on the number of methylated CpGs. Restriction-cut human genomic DNA was methylated with different DNA methylases (SssI, HpaII, and/or HhaI) to introduce different numbers of methylated CpGs into the unmethylated CpG island promoter of the TBP gene. After MIRA, the fragments were amplified by quantitative real-time PCR using TBP-specific primers.
Figure 2A outlines the MIRA microarray approach. Genomic DNA, isolated from different sources, is cleaved with MseI (5′-TTAA), which produces small 200- to 300-bp fragments and cuts outside of CpG islands, and compatible linkers are ligated to the ends. Then the MIRA pulldown is done to isolate the methylated DNA fraction. Input and MIRA-enriched fractions are labeled with different fluorescent dyes, mixed, and hybridized to CpG island microarrays containing 12,192 CpG islands, of which >60% map to the 5′ promoter sequences of genes (26, 27). We applied this technique to identify CpG islands methylated in the lung cancer cell line A549 relative to NHBE cells (Fig. 2B and C; see Materials and Methods). MIRA enrichment factors indicating differences in levels of methylation are calculated and compared between the two cell lines. Using the data obtained from such arrays, a list of genes was compiled, which shows hypermethylation in A549 cells relative to NHBE cells (Table 1). Most of the differentially methylated CpG islands mapped either close to the 5′ ends of known or predicted genes or to the exons/introns within genes and may represent regulatory elements.
MIRA-assisted CpG island microarray analysis. A, schematic diagram of the MIRA microarray method. Input and MIRA-enriched fractions are labeled with different dyes, mixed, and hybridized to the slides and the relative enrichment factors between different cell types and tissues are determined. For confirmation, MIRA-enriched DNA from normal and tumor cells can be mixed and hybridized directly. B, representative data for MIRA microarrays. Left and middle, red, MIRA-enriched fractions; green, input fractions. Right, green, MIRA-enriched A549 DNA; red, NHBE cell DNA. C, pairwise comparison. Enrichment factors obtained from NHBE cells (vertical axis) and A549 cells (horizontal axis) were compared. The dots in the blue circle are the targets selectively methylated in tumor cell DNA.
MIRA-assisted CpG island microarray analysis. A, schematic diagram of the MIRA microarray method. Input and MIRA-enriched fractions are labeled with different dyes, mixed, and hybridized to the slides and the relative enrichment factors between different cell types and tissues are determined. For confirmation, MIRA-enriched DNA from normal and tumor cells can be mixed and hybridized directly. B, representative data for MIRA microarrays. Left and middle, red, MIRA-enriched fractions; green, input fractions. Right, green, MIRA-enriched A549 DNA; red, NHBE cell DNA. C, pairwise comparison. Enrichment factors obtained from NHBE cells (vertical axis) and A549 cells (horizontal axis) were compared. The dots in the blue circle are the targets selectively methylated in tumor cell DNA.
Methylated target genes identified by MIRA microarrays
Target no. . | ID . | FC difference* . | Genome location . | Distance† . | Gene symbol . | Description . |
---|---|---|---|---|---|---|
1 | 1_A_12 | 4.72 | chr2:175033605-175034788 | 5 kb 3′ | CIR1 | CBF1 interacting corepressor isoform 1 |
2 | 18_E_17 | 4.34 | chr3:38055590-38056657 | 0 | DLEC1 | Deleted in lung and esophageal cancer 1 |
3 | 1_B_19 | 3.69 | chr1:176933895-176934835 | 0 | LHX4 | LIM homeobox 4 |
4 | 15_G_6 | 3.58 | chr1:114111744-114111847 | 0 | PTPN22 | Protein tyrosine phosphatase |
5 | 8_E_7 | 3.54 | chr12:14237814-14238207 | >30 kb | ||
6 | 25_H_18 | 3.53 | chr14:23904920-23905609 | 1650 bp 5′ | NFATC4 | Cytoplasmic nuclear factor of activated T-cells |
7 | 22_N_10 | 3.31 | chr6:100542633-100542762 | 0 | GRP145 | G protein-coupled receptor 145 |
8 | 31_L_22 | 3.28 | chr6:115099013-115099186 | >30 kb | ||
9 | 25_N_11 | 3.24 | chr10:42569409-42570107 | 0 | hmm7851 | Hypothetical protein |
10 | 13_P_8 | 3.16 | chr6:99402311-99402598 | 0 | hmm33765 | Hypothetical protein |
11 | 31_L_21 | 3.11 | chr6:115099013-115099186 | >30 kb | ||
12 | 26_M_12 | 3.00 | chr13:49163567-49163905 | 0 | EBPL | Emopamil binding related protein |
13 | 17_F_9 | 2.92 | chr6:45327693-45327793 | 0 | SUPT3H | SUPT3H protein |
14 | 25_O_20 | 2.78 | chrX:48441921-48442544 | 2.5 kb 5′ | ERAS | Small GTPase protein E-Ras |
15 | 17_F_20 | 2.76 | chr9:123856765-123857465 | 0 | LHX2 | LIM homeobox 2 |
16 | 7_L_22 | 2.65 | chr5:72775997-72776155 | 3 kb 3′ | FOXD1 | Forkhead transcription factor |
17 | 23_B_15 | 2.63 | chr8:91679368-91679403 | >30 kb | ||
18 | 29_A_11 | 2.63 | chr1:18716744-18718019 | 0 | PAX7 | Paired box gene 7 |
19 | 26_M_11 | 2.62 | chr12:55167740-55168302 | 0 | GLS2 | Glutaminase 2 (liver, mitochondrial) |
20 | 7_K_7 | 2.58 | chr17:44187113-44188088 | 26 kb 5′ | HOXB13 | Homeobox gene |
21 | 29_A_6 | 2.58 | chr4:136284675-136284784 | >30 kb | ||
22 | 2_B_9 | 2.54 | chr10:102972869-102973961 | 2.5 kb 3′ | LBX1 | Homeobox transcription factor |
23 | 21_D_20 | 2.45 | chr2:45139996-45140793 | 3.6 kb 3′ | SIX2 | Sine oculis homeobox homologue 2 |
24 | 21_E_12 | 2.45 | chr12:48026625-48026800 | 0 | FLJ13236 | Hypothetical protein FLJ13236 |
25 | 2_A_10 | 2.40 | chr10:102972869-102973961 | 0 | BRACE2016602 | Hypothetical protein |
26 | 14_J_18 | 2.38 | chr5:83154024-83154192 | 0 | hmm32907 | Hypothetical protein |
27 | 12_B_24 | 2.35 | chr1:16757837-16758127 | 0 | EIF1AP1 | Translation initiation factor 1A pseudogene 1 |
28 | 22_L_21 | 2.34 | chr20:28164999-28165149 | 0 | hmm117175 | Hypothetical protein |
29 | 7_F_8 | 2.32 | chr6:91594928-91595364 | >30 kb | ||
30 | 25_C_14 | 2.30 | chr4:109450308-109450511 | 3.1 kb 5′ | LEF1 | Lymphoid enhancer binding factor-1 |
31 | 25_I_22 | 2.28 | chr19:4279686-4280325 | 0 | STAP2 | Signal-transducing adaptor protein 2 |
32 | 14_C_16 | 2.26 | chr6:144647977-144648462 | 0 | hmm33914 | Hypothetical protein |
33 | 32_G_18 | 2.26 | chr5:139908213-139908494 | 0 | EIF4EBP3 | Eukaryotic translation initiation factor 4E |
34 | 15_M_17 | 2.25 | chr22:27793134-27793763 | 0 | KREMEN1 | Inhibitor of Wnt signaling |
35 | 30_E_8 | 2.24 | chr5:54551833-54552394 | 0 | CR626610 | Hypothetical protein |
36 | 14_N_16 | 2.21 | chr2:176847561-176847721 | 6 kb 5′ | HOXD3 | Homeobox gene HOXD3 |
37 | 18_E_9 | 2.19 | chr10:102972869-102973961 | 2.5 kb 3′ | LBX1 | Homeobox transcription factor |
38 | 20_O_10 | 2.16 | chr2:172770466-172770678 | 4 kb 5′ | DLX1 | Homeobox protein DLX-1 |
39 | 15_J_24 | 2.15 | chr4:88700602-88701596 | 0 | NUDT9 | Nucleoside diphosphate linked moiety X-type 9 |
40 | 1_O_12 | 2.15 | chr2:176802127-176803414 | 40 kb 3′ | HOXD1 | Homeobox gene |
41 | 14_L_15 | 2.13 | chr7:154669328-154670208 | 0 | hmm78131 | Hypothetical protein |
42 | 2_D_17 | 2.10 | chr6:134258454-134259172 | 0.3 kb 3′ | TCF21 | Helix-loop-helix transcription factor |
43 | 20_L_5 | 2.07 | chr15:39592659-39593447 | 0 | LTK | Leukocyte tyrosine kinase |
44 | 23_H_10 | 2.07 | chr8:145671696-145672153 | 1.4 kb 5′ | KIFC2 | Kinesin motor |
45 | 12_C_14 | 2.02 | chr11:73167747-73168609 | 18 kb 5′ | RAB6A | Small GTPase |
46 | 31_G_9 | 2.00 | chr17:70443461-70443780 | 0 | OTOP3 | Otopetrin-3 |
47 | 26_N_12 | 2.00 | chr1:148695088-148695652 | 0 | THEM4 | Negative regulator of AKT |
48 | 11_C_15 | 2.00 | chr18:53259181-53259461 | 0 | ONECUT2 | Homeobox gene |
49 | 26_N_12 | 2.00 | chr1:148695088-148695652 | 0 | KIAA0460 | KIAA0460 protein |
50 | 2_D_9 | 2.00 | chr14:36197384-36198099 | 2.6 kb 5′ | PAX9 | Paired box gene 9 |
Target no. . | ID . | FC difference* . | Genome location . | Distance† . | Gene symbol . | Description . |
---|---|---|---|---|---|---|
1 | 1_A_12 | 4.72 | chr2:175033605-175034788 | 5 kb 3′ | CIR1 | CBF1 interacting corepressor isoform 1 |
2 | 18_E_17 | 4.34 | chr3:38055590-38056657 | 0 | DLEC1 | Deleted in lung and esophageal cancer 1 |
3 | 1_B_19 | 3.69 | chr1:176933895-176934835 | 0 | LHX4 | LIM homeobox 4 |
4 | 15_G_6 | 3.58 | chr1:114111744-114111847 | 0 | PTPN22 | Protein tyrosine phosphatase |
5 | 8_E_7 | 3.54 | chr12:14237814-14238207 | >30 kb | ||
6 | 25_H_18 | 3.53 | chr14:23904920-23905609 | 1650 bp 5′ | NFATC4 | Cytoplasmic nuclear factor of activated T-cells |
7 | 22_N_10 | 3.31 | chr6:100542633-100542762 | 0 | GRP145 | G protein-coupled receptor 145 |
8 | 31_L_22 | 3.28 | chr6:115099013-115099186 | >30 kb | ||
9 | 25_N_11 | 3.24 | chr10:42569409-42570107 | 0 | hmm7851 | Hypothetical protein |
10 | 13_P_8 | 3.16 | chr6:99402311-99402598 | 0 | hmm33765 | Hypothetical protein |
11 | 31_L_21 | 3.11 | chr6:115099013-115099186 | >30 kb | ||
12 | 26_M_12 | 3.00 | chr13:49163567-49163905 | 0 | EBPL | Emopamil binding related protein |
13 | 17_F_9 | 2.92 | chr6:45327693-45327793 | 0 | SUPT3H | SUPT3H protein |
14 | 25_O_20 | 2.78 | chrX:48441921-48442544 | 2.5 kb 5′ | ERAS | Small GTPase protein E-Ras |
15 | 17_F_20 | 2.76 | chr9:123856765-123857465 | 0 | LHX2 | LIM homeobox 2 |
16 | 7_L_22 | 2.65 | chr5:72775997-72776155 | 3 kb 3′ | FOXD1 | Forkhead transcription factor |
17 | 23_B_15 | 2.63 | chr8:91679368-91679403 | >30 kb | ||
18 | 29_A_11 | 2.63 | chr1:18716744-18718019 | 0 | PAX7 | Paired box gene 7 |
19 | 26_M_11 | 2.62 | chr12:55167740-55168302 | 0 | GLS2 | Glutaminase 2 (liver, mitochondrial) |
20 | 7_K_7 | 2.58 | chr17:44187113-44188088 | 26 kb 5′ | HOXB13 | Homeobox gene |
21 | 29_A_6 | 2.58 | chr4:136284675-136284784 | >30 kb | ||
22 | 2_B_9 | 2.54 | chr10:102972869-102973961 | 2.5 kb 3′ | LBX1 | Homeobox transcription factor |
23 | 21_D_20 | 2.45 | chr2:45139996-45140793 | 3.6 kb 3′ | SIX2 | Sine oculis homeobox homologue 2 |
24 | 21_E_12 | 2.45 | chr12:48026625-48026800 | 0 | FLJ13236 | Hypothetical protein FLJ13236 |
25 | 2_A_10 | 2.40 | chr10:102972869-102973961 | 0 | BRACE2016602 | Hypothetical protein |
26 | 14_J_18 | 2.38 | chr5:83154024-83154192 | 0 | hmm32907 | Hypothetical protein |
27 | 12_B_24 | 2.35 | chr1:16757837-16758127 | 0 | EIF1AP1 | Translation initiation factor 1A pseudogene 1 |
28 | 22_L_21 | 2.34 | chr20:28164999-28165149 | 0 | hmm117175 | Hypothetical protein |
29 | 7_F_8 | 2.32 | chr6:91594928-91595364 | >30 kb | ||
30 | 25_C_14 | 2.30 | chr4:109450308-109450511 | 3.1 kb 5′ | LEF1 | Lymphoid enhancer binding factor-1 |
31 | 25_I_22 | 2.28 | chr19:4279686-4280325 | 0 | STAP2 | Signal-transducing adaptor protein 2 |
32 | 14_C_16 | 2.26 | chr6:144647977-144648462 | 0 | hmm33914 | Hypothetical protein |
33 | 32_G_18 | 2.26 | chr5:139908213-139908494 | 0 | EIF4EBP3 | Eukaryotic translation initiation factor 4E |
34 | 15_M_17 | 2.25 | chr22:27793134-27793763 | 0 | KREMEN1 | Inhibitor of Wnt signaling |
35 | 30_E_8 | 2.24 | chr5:54551833-54552394 | 0 | CR626610 | Hypothetical protein |
36 | 14_N_16 | 2.21 | chr2:176847561-176847721 | 6 kb 5′ | HOXD3 | Homeobox gene HOXD3 |
37 | 18_E_9 | 2.19 | chr10:102972869-102973961 | 2.5 kb 3′ | LBX1 | Homeobox transcription factor |
38 | 20_O_10 | 2.16 | chr2:172770466-172770678 | 4 kb 5′ | DLX1 | Homeobox protein DLX-1 |
39 | 15_J_24 | 2.15 | chr4:88700602-88701596 | 0 | NUDT9 | Nucleoside diphosphate linked moiety X-type 9 |
40 | 1_O_12 | 2.15 | chr2:176802127-176803414 | 40 kb 3′ | HOXD1 | Homeobox gene |
41 | 14_L_15 | 2.13 | chr7:154669328-154670208 | 0 | hmm78131 | Hypothetical protein |
42 | 2_D_17 | 2.10 | chr6:134258454-134259172 | 0.3 kb 3′ | TCF21 | Helix-loop-helix transcription factor |
43 | 20_L_5 | 2.07 | chr15:39592659-39593447 | 0 | LTK | Leukocyte tyrosine kinase |
44 | 23_H_10 | 2.07 | chr8:145671696-145672153 | 1.4 kb 5′ | KIFC2 | Kinesin motor |
45 | 12_C_14 | 2.02 | chr11:73167747-73168609 | 18 kb 5′ | RAB6A | Small GTPase |
46 | 31_G_9 | 2.00 | chr17:70443461-70443780 | 0 | OTOP3 | Otopetrin-3 |
47 | 26_N_12 | 2.00 | chr1:148695088-148695652 | 0 | THEM4 | Negative regulator of AKT |
48 | 11_C_15 | 2.00 | chr18:53259181-53259461 | 0 | ONECUT2 | Homeobox gene |
49 | 26_N_12 | 2.00 | chr1:148695088-148695652 | 0 | KIAA0460 | KIAA0460 protein |
50 | 2_D_9 | 2.00 | chr14:36197384-36198099 | 2.6 kb 5′ | PAX9 | Paired box gene 9 |
NOTE: Target information was verified using the University of California Santa Cruz Genome Browser (May 2004 assembly) and GenBank.
FC difference is the ratio (fold change) of MIRA-enriched over unenriched (input) A549 DNA versus MIRA-enriched over unenriched (input) NHBE DNA.
A distance of “0” indicates that the target falls within a gene including 500 bp upstream of its transcription start site.
We confirmed cancer cell–specific methylation and lack of methylation in NHBE cells for several of the targets identified by the microarray analysis using a BstUI COBRA assay (Fig. 3A). In this assay, a CpG-containing restriction endonuclease cleavage site is conserved after bisulfite treatment when the DNA is methylated (28). COBRA assays using BstUI (5′-CGCG) initially confirmed the methylation of targets ranked number 1, 4, 10, and 20 on the list of differentially methylated genes, indicating the robustness and specificity of the MIRA microarray approach. Several target genes were of particular interest. One methylated gene (target 2 in Table 1) is designated as DLEC1 (Deleted in Lung and Esophageal Cancers) and maps to chromosome 3p22-p21.3, a common hotspot for loss of heterozygosity and deletions in lung cancer and other solid tumors. DLEC1 encodes a protein of 1755 amino acids with unknown function (32). To investigate if methylation of DLEC1 is present in human primary lung tumors, we analyzed a series of 30 primary non–small-cell lung cancers by the COBRA assay (examples are shown in Fig. 3B) and by bisulfite sequencing (Fig. 3C). Eight of the 30 (27%) undissected tumor samples tested (e.g., T4, T6, T8, T9, T10, and T15; Fig. 3B) showed clear evidence of methylation. None of the adjacent, tumor-matched, normal tissues had methylation of DLEC1 (Fig. 3B and C). Using sodium bisulfite sequencing (33), we verified that the DLEC1-associated CpG island was highly methylated in A549 cells and in a primary lung tumor but was completely unmethylated in NHBE cells or normal lung tissue (Fig. 3C). Methylation encompassed the entire CpG island of DLEC1 (Fig. 3C). Methylation of DLEC1 correlated with a lack of expression in A549 cells and the gene could be reactivated by treatment of the cell line with 5-aza-deoxycytidine (Fig. 3D). In addition, the DLEC1 promoter was methylated in 3 of 15 (20%) primary esophageal cancers and in 4 of 10 (40%) primary melanomas tested (see Supplementary Fig. S1).
Confirmation of methylation differences and identification of DLEC1 as a target for tumor-specific methylation. A, confirmation of tumor cell–specific methylation of four candidate target genes identified by the MIRA-assisted microarray approach (targets 1, 4, 10, and 20 in Table 1). A549 and NHBE cell DNA was treated with sodium bisulfite and the target CpG island sequences were amplified using specific primers. Methylation was confirmed by a BstUI COBRA assay producing cleavage products (arrows) when methylation at 5′-CGCG sequences is present. B, methylation of the DLEC1 gene in primary lung cancers. DNA was isolated from primary non–small-cell lung carcinoma tumors (T) and their adjacent normal tissues (N). After sodium bisulfite treatment, the DLEC1 promoter CpG island was PCR amplified. Cutting with BstUI indicates methylation of 5′-CGCG sequences within this CpG island sequence, which contains three BstUI sites (see samples T4, T6, T8, T9, T10, and T15). C, determination of tumor cell–specific methylation of the CpG island of the DLEC1 gene by bisulfite sequencing. Bisulfite genomic sequencing was done on DNA isolated from A549 cells, NHBE cells, a primary lung tumor, and adjacent normal lung tissue. Primers specific for the DLEC1 CpG island (gray shading) were used. In NHBE and A549 cells, almost the entire CpG island was sequenced (long bar). In the primary tissues, only the area proximal to the transcription start site (arrow) was sequenced (short bar). Sequencing results of several independent clones are shown. Black squares, methylated CpG dinucleotides. D, expression of DLEC1 in NHBE and A549 cells and reactivation by 5-aza-deoxycytidine (5′-AzadC) treatment.
Confirmation of methylation differences and identification of DLEC1 as a target for tumor-specific methylation. A, confirmation of tumor cell–specific methylation of four candidate target genes identified by the MIRA-assisted microarray approach (targets 1, 4, 10, and 20 in Table 1). A549 and NHBE cell DNA was treated with sodium bisulfite and the target CpG island sequences were amplified using specific primers. Methylation was confirmed by a BstUI COBRA assay producing cleavage products (arrows) when methylation at 5′-CGCG sequences is present. B, methylation of the DLEC1 gene in primary lung cancers. DNA was isolated from primary non–small-cell lung carcinoma tumors (T) and their adjacent normal tissues (N). After sodium bisulfite treatment, the DLEC1 promoter CpG island was PCR amplified. Cutting with BstUI indicates methylation of 5′-CGCG sequences within this CpG island sequence, which contains three BstUI sites (see samples T4, T6, T8, T9, T10, and T15). C, determination of tumor cell–specific methylation of the CpG island of the DLEC1 gene by bisulfite sequencing. Bisulfite genomic sequencing was done on DNA isolated from A549 cells, NHBE cells, a primary lung tumor, and adjacent normal lung tissue. Primers specific for the DLEC1 CpG island (gray shading) were used. In NHBE and A549 cells, almost the entire CpG island was sequenced (long bar). In the primary tissues, only the area proximal to the transcription start site (arrow) was sequenced (short bar). Sequencing results of several independent clones are shown. Black squares, methylated CpG dinucleotides. D, expression of DLEC1 in NHBE and A549 cells and reactivation by 5-aza-deoxycytidine (5′-AzadC) treatment.
Interestingly, 11 of the top 50 hits, with a calculated >2-fold enrichment of the methylated fraction in A549 versus NHBE cell DNA, were mapped to homeobox genes (LHX2, LHX4, PAX7, HOXB13, LBX1, SIX2, HOXD3, DLX1, HOXD1, ONECUT2, and PAX9; see Table 1). We confirmed the methylation difference for these homeobox genes using standard bisulfite-based approaches. Methylation levels were higher in cancer cells for all nine homeobox gene targets tested (see Fig. 4A). Methylation was either absent in NHBE cells or was partial as seen for targets 3, 15, 38, and 48. The identified methylated sequences at homeobox genes did not always coincide with the 5′ ends of the genes, but several targets were localized either to intronic CpG islands (LHX4, PAX7, and ONECUT2) or to CpG islands near the 3′ end of the gene (LBX1, SIX2, and HOXD1). For SIX2, we determined that the CpG island spanning the promoter of this gene was also methylated, in addition to the CpG island near its 3′ end (data not shown).
Methylation of homeobox genes in lung cancer. A, methylation differences of homeobox gene targets between NHBE and A549 cells. Several targets contain multiple BstUI sites and the smaller cleavage fragments are not visible on the gel. B, methylation of the LHX2 gene in primary lung tumors (T) and matching normal tissues (N). C, methylation of the LHX4 gene in primary lung tumors (T) and matching normal tissues (N).
Methylation of homeobox genes in lung cancer. A, methylation differences of homeobox gene targets between NHBE and A549 cells. Several targets contain multiple BstUI sites and the smaller cleavage fragments are not visible on the gel. B, methylation of the LHX2 gene in primary lung tumors (T) and matching normal tissues (N). C, methylation of the LHX4 gene in primary lung tumors (T) and matching normal tissues (N).
To extend the analysis of homeobox gene methylation to primary lung tumors, we analyzed the methylation status of the CpG islands associated with the LHX2 and LHX4 genes in primary lung tumors (Fig. 4B and C). For LHX2, the methylation target spans intron 2 of the gene, and for LHX4, the target is a CpG island spanning intron 1. Although low levels of methylation were detected in some normal tissues removed with tumor surgery, methylation of LHX2 and LHX4 was more pronounced in tumor samples. LHX2 was methylated in 7 of 12 (58%) of primary lung tumors and LHX4 was methylated in 6 of 8 (75%) of the tumors analyzed. The methylation differences were confirmed by sodium bisulfite sequence analysis (Fig. 5). Of a total of 210 CpG sequences analyzed for LHX2, 32 were methylated in normal tissue and 95 were methylated in the tumor (P < 0.001, χ2 test). Of 260 CpG sequences analyzed for LHX4, 105 were methylated in normal tissue and 187 were methylated in the tumor (P < 0.001, χ2 test).
Determination of the methylation differences for LHX2 and LHX4 by sodium bisulfite sequencing. Sodium bisulfite sequencing was done for the methylation target in the LHX2 gene (T/N pair #9; Fig. 4B) and the LHX4 gene (T/N pair #3; Fig. 4C).
We have also looked for hypomethylation events in the cancer cell line. We had to use less stringent cutoffs (1.5-fold difference factor) to be able to pick up any hypomethylation targets. Of only four such targets identified, two had no BLAST hits on the genome and one did not map near a known gene. The one remaining hit mapped to the 3′ end of the TAF6L gene. Thus, it is clear that hypomethylation of CpG-rich sequences relative to normal bronchial epithelial cells is not a common event in this cancer cell line.
Discussion
We present a new method for analysis of DNA methylation patterns on a genome-wide scale. The MIRA method is based on the selective enrichment of methylated DNA fragments by a matrix containing the methylated-CpG binding protein MBD2b in the presence of its binding partner, MBD3L1. MBD3L1 strongly enhances the binding of MBD2b to methylated DNA (Fig. 1A). The specificity of the MIRA method for enriching methylated DNA forms the basis of this approach (24). The binding of MBD2b to methylated DNA is relatively sequence independent, in contrast to another methylated-CpG binding protein, MeCP2 (29, 30). We have confirmed a lack of sequence bias (beyond that for methylated CpGs) by sequence analysis of the CpG island targets identified on the arrays (Table 1). No specific sequence motif spanning CpG sites, either in all targets combined or specifically in the homeobox genes, was present in a majority of the targets identified. The specificity of the MIRA microarray approach has been confirmed. Of the 18 targets identified by microarrays and tested for methylation differences using standard sodium bisulfite-based techniques, all 18 have been confirmed as differentially methylated, indicating that the number of false positives identified by this method is very low. It is likely that fold difference factors <2 will still indicate differential methylation. We have confirmed tumor cell–specific methylation of another homeodomain gene, RAX, in A549 cells although the fold difference factor was only 1.6. We expect that a 1.5-fold difference factor will in many cases be sufficient to indicate differential methylation, although this will need to be verified for more genes.
The MIRA microarray has several advantages over existing techniques for analyzing DNA methylation patterns. It does not depend on the occurrence of specific methylation-sensitive restriction sites within the target sequence. It also does not depend on the use of antibodies against 5-methylcytosine, which require ssDNA for recognition. The recombinant proteins, GST-tagged MBD2b and His-tagged MBD3L1, are easy to prepare and can be stored in large quantities as a frozen stock. No complicated manipulations need to be done. The technique can easily be adopted using a variety of array platforms including promoter arrays and genomic tiling arrays. Differentiation between methylated and unmethylated DNA is best achieved when two or more methylated CpG sites are present in the MseI fragment. The method will be most sensitive for identifying differentially methylated CpG islands due to the high frequency of methylated CpGs in these targets. The technique should be suitable, for example, for the identification of new imprinted genes in mammalian genomes. Imprinted genes often are associated with CpG islands that carry differential methylation marks in the germ line. The remodeling of genome-wide methylation patterns during development and differentiation can be analyzed. Hypermethylation of CpG islands is a hallmark of cancer cells. Thus, the MIRA microarray approach will have important use for a more comprehensive characterization of cancer-associated methylation changes. New tumor suppressor genes or new DNA methylation markers for the early detection of cancer may be identified. Using genomic tiling arrays, it should become possible to more precisely localize genomic areas that undergo hypomethylation in cancer. A complete understanding of the range of CpG islands methylated within a tumor and the occurrence of tumor-specific or even tumor subtype–specific methylation patterns may aid in our understanding of pathways leading to tumorigenesis. MIRA microarray technology may allow the classification of tumors based on their methylation patterns.
The MIRA microarray analysis has generated a list of genes methylated in a human lung cancer cell line. Many of them require follow-up investigations to determine the frequency of methylation in primary tumors and the potential biological significance of methylation silencing for tumor progression. Among the methylated genes identified were several genes involved in signal transduction pathways and transcriptional regulation (Table 1). Of particular interest was one gene, DLEC1, which is localized within a common area of loss of heterozygosity and deletion on the short arm of chromosome 3 (34). This chromosomal area is thought to harbor one or several important lung tumor suppressor genes, but the exact identity of these genes has remained unclear. Search for mutational changes in genes mapped to this location has not identified any gene that is mutated frequently. However, several genes at 3p22-p21.3 are silenced epigenetically in lung cancers (16, 35). DLEC1, which is not expressed in a fraction of lung and esophageal cancers and inhibits the growth of cancer cell lines on overexpression (32), may be an important target for methylation silencing in several types of malignancies (Fig. 3).
Our search for genes methylated in lung cancer has uncovered an unexpectedly frequent occurrence of homeobox genes as methylation targets (11 of the top 50 hits). This was a much larger number than expected because the human genome contains only a few hundred homeodomain genes. Simply by chance, <1 of 50 genes would be expected to be a homeodomain-containing gene. Many homeobox genes are aberrantly expressed in a variety of hematologic malignancies and solid tumors, and this change includes more commonly up-regulation and less commonly down-regulation of the gene (36, 37). Methylation of regulatory elements in CpG islands associated with homeobox genes may result in silencing of homeobox gene expression if the gene is expressed in adult tissues or stem cells. Individual homeobox genes have been reported to be methylated in tumors of various histologic origins. For example, methylation of HOXB13 has been reported to occur in 30% of renal cell carcinomas (38) and methylation of genes in the HOXA and HOXD clusters was reported in lung cancer (39). The HOXA5 promoter region was methylated in 16 of 20 p53-negative breast tumor specimens (40) and the homeobox gene NKX3.1 is commonly methylated in prostate cancers (41). Our data indicate that simultaneous methylation of a series of homeobox genes located on different chromosomes occurs in lung cancer cells.
Most homeobox genes are not expressed in adult tissues and the lack of their expression may promote de novo methylation during malignant progression. For example, LHX2 and LHX4 are involved in the development of the central nervous system (42, 43) and are not known to be expressed in normal lung tissue. These genes were not expressed in normal human bronchial epithelial cells as determined by RT-PCR analysis (data not shown) although they were only partially methylated (Fig. 4A). LHX2 and LHX4 could not be reactivated by 5-aza-deoxycytidine treatment. Thus, the mechanism of their primary repression is likely independent of DNA methylation. Progressive methylation of LHX2 and LHX4 occurred during lung cancer development (Figs. 4B and 5).
Although we have not identified specific sequence similarities in the homeobox gene-associated CpG islands that undergo methylation, a common mechanistic pathway may exist in cancer cells, which promotes de novo methylation of these targets. Polycomb repressive complexes are involved in silencing of homeobox genes (44), although it has been difficult to identify specific polycomb response elements in mammalian systems. A recent genome-wide survey for localization of polycomb components including SUZ12, which is required for the histone methyltransferase activity and silencing function of the EED-EZH2 complex, has identified a large fraction of the targets as homeobox genes (45, 46). These included most of the homeotic genes found in the HOX clusters and the majority of the homeodomain genes. Several of these genes were identical to the ones we identified as targets for DNA methylation in this study. The polycomb component EZH2 associates with DNA methyltransferases (47, 48) and EZH2 expression is increased in tumors of different histologic types including precancerous tissue (49, 50). Thus, although speculative at present, these data suggest a potential mechanism by which homeobox genes (and perhaps other genes) may become targets for de novo methylation in cancer cells.
In summary, our data indicate that the MIRA microarray approach has the ability to identify genes methylated in human tumors on a genome-wide basis. This technology is expected to be widely applicable for comprehensive analysis of DNA methylation patterns using available spotted microarrays and new-generation genome-scanning arrays currently under development. Because this technique is straightforward and does not require complicated manipulations, it should easily be applicable in clinical settings and should become useful as a diagnostic tool to classify tumors according to DNA methylation patterns on a genomic scale. In addition, this technology may aid in the identification of new candidate tumor suppressor genes and potential DNA methylation markers.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
Acknowledgments
Grant support: NIH grant CA104967 (G.P. Pfeifer).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Stella Tommasi for bisulfite-treated DNA from melanomas, Maricela Covarrubias for hybridization of microarray slides, and Steven Bates for help with cell culture experiments.