Abstract
In the post-genome era, attention has focused on the functions of genome sequences and how they are regulated. Emerging epigenomic changes and the interactions between cis-acting elements and protein factors may play a central role in gene regulation. To understand the crosstalk between DNA and protein on a genome-wide scale, one emerging technique, called ChIP-chip, combines chromatin immunoprecipitation with microarray. This high-throughput strategy helps screen the targets of critical transcription factors and profile the genome-wide distribution of histone modifications, making large-scale studies such as the Human Epigenome Project feasible. (Cancer Res 2006; 66(14): 6899-902)
Introduction
The completion of the Human Genome Project provides a road map for thorough interrogation of gene functions. This new endeavor has also made possible further technological advances. In addition to identifying novel transcription factor targets, current studies may shift our attention to genome-wide characterization of histone modifications and DNA methylation. The importance of this type of study is further echoed by a recent proposal for the Human Epigenome Project (1). This requires genome-wide technologies with high-throughput capability. One of these techniques, combining chromatin immunoprecipitation (ChIP) with microarray (ChIP-chip), has been used extensively as a genome-wide tool to screen the binding position of protein factors (2). Because the use of ChIP-chip is beginning to gain popularity in the research community, the present review provides a timely overview of recent developments and applications.
The Strategy and Development of ChIP-chip
ChIP-chip combines ChIP and microarray techniques (Fig. 1A; refs. 3–5). Briefly, DNA and protein are cross-linked in vivo with formaldehyde, and the cross-linked chromatin is sheared by sonication to fragments of ∼0.2 to 2 kb. Sonication conditions must be kept consistent so that the sheared DNA falls within a similar size range across samples. The protein-DNA complexes are then immunoprecipitated with specific antibodies against the protein of interest. The pulled-down DNA and appropriate controls are fluorescently labeled and applied to microscope slides for microarray analysis. With input DNA as background, comparing the immunoprecipitated DNA with this control profiles the binding positions of specific proteins in the genome. Ren et al. (3) first used ChIP-chip, on a yeast intergenic DNA array, to identify novel targets of the yeast transcription factors Gal4 and Ste12. A subsequent study profiled the binding sites of the cell-cycle transcription factors SBF and MBF in yeast (4). Compared with PCR amplification of specific target sequences from immunoprecipitated material, ChIP-chip is a genome-wide “reverse-genetic” approach. Another advantage of ChIP-chip is that it identifies genes directly bound by the protein factor, whereas classic expression arrays cannot distinguish directly regulated genes from those changed secondarily.
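The comparison of immunoprecipitated DNA with the input control can be sketched as a per-probe log2 ratio. This is a minimal illustration only, not the analysis used in the cited studies: the function names, the pseudocount, the 2-fold cutoff, and the probe intensities are all assumptions chosen for the example.

```python
import math

def log2_enrichment(ip_signal, input_signal, pseudocount=1.0):
    """Return per-probe log2(IP / input) ratios.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and damps ratios for probes with very low intensity.
    """
    if len(ip_signal) != len(input_signal):
        raise ValueError("IP and input channels must have the same number of probes")
    return [
        math.log2((ip + pseudocount) / (inp + pseudocount))
        for ip, inp in zip(ip_signal, input_signal)
    ]

def enriched_probes(probe_ids, ratios, min_log2=1.0):
    """Return probes whose log2 ratio meets a cutoff (1.0 means 2-fold; illustrative)."""
    return [pid for pid, r in zip(probe_ids, ratios) if r >= min_log2]

# Hypothetical intensities for three probes (not real data).
probes = ["probe_A", "probe_B", "probe_C"]
ip = [799.0, 99.0, 49.0]
inp = [99.0, 99.0, 199.0]

ratios = log2_enrichment(ip, inp)
hits = enriched_probes(probes, ratios)
print(ratios, hits)
```

In practice the cutoff would be chosen from replicate experiments and negative controls rather than fixed in advance.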
Figure 1. Brief comparison of the ChIP-chip procedure with other high-throughput techniques for studying DNA-protein or RNA-protein interactions. A-C, general principles of ChIP-chip and the other techniques. Key steps are highlighted with red text boxes. DpnI (star) is a methylation-sensitive restriction enzyme.
The extent of ChIP-chip application depends, in part, on the development of DNA microarray technology, especially the availability of arrayed slides for the organism under study. In humans, one of the first ChIP-chip experiments used a CpG island array to screen for novel E2F4 targets (6). Other arrays were then developed for ChIP-chip, including selected promoter arrays and DNA tiling arrays (2). The advantages and limitations of these arrays have been discussed elsewhere (2). Recently, a human genome-wide array printed with PCR amplicons has been improved to cover >90% of human nonrepetitive DNA sequences (7). Unlike the PCR amplicon–based strategy, a new oligonucleotide array technique helped build another human array representing all human nonrepetitive regions at 100-bp resolution (8). A microarray containing the entire human genome sequence, which is not currently available, would be ideal for ChIP-chip.
Critical for ChIP-chip experiments is the amount of starting material required for successful microarray hybridization. The amount is highly variable, depending on the quality of the antibody, the binding frequency of protein to DNA, and other possibly unknown factors. In one previous report, 1 to 10 ng of ChIP DNA were pulled down from a total of 30 to 60 μg of Drosophila DNA (9). Up to 50 individual ChIP DNA samples have been pooled to acquire enough DNA for array hybridization (5). An alternative approach to enrich the starting material is PCR-based amplification (2–4). An RNA polymerase–based method is known to amplify ChIP DNA linearly (10). However, PCR-based amplification may introduce bias, especially in mammalian systems, where large amounts of repeat sequences may skew the data.
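The yield figures above can be put in perspective with simple arithmetic. The sketch below computes the fraction of starting DNA recovered for the reported ranges and the mass obtained by pooling 50 samples. Combining the two figures is purely illustrative, because they come from different reports (9, 5), and the 200-ng hybridization target is a hypothetical value chosen for the example.

```python
import math

NG_PER_UG = 1000.0  # nanograms per microgram

def recovered_fraction(chip_ng, input_ug):
    """Fraction of the starting DNA mass recovered after ChIP."""
    return chip_ng / (input_ug * NG_PER_UG)

def pooled_yield_ng(chip_ng_per_sample, n_samples):
    """Total DNA mass (ng) after pooling n_samples equivalent ChIP reactions."""
    return chip_ng_per_sample * n_samples

def samples_needed(target_ng, chip_ng_per_sample):
    """Number of ChIP reactions to pool to reach a target mass (rounded up)."""
    return math.ceil(target_ng / chip_ng_per_sample)

# Bounds from the reported ranges: 1 to 10 ng recovered from 30 to 60 ug.
low_fraction = recovered_fraction(1.0, 60.0)    # worst case
high_fraction = recovered_fraction(10.0, 30.0)  # best case

# Pooling 50 samples at 1 ng and at 10 ng per sample.
pooled_low = pooled_yield_ng(1.0, 50)
pooled_high = pooled_yield_ng(10.0, 50)

# Hypothetical hybridization requirement of 200 ng at 10 ng per ChIP.
n_needed = samples_needed(200.0, 10.0)

print(low_fraction, high_fraction, pooled_low, pooled_high, n_needed)
```

Under these assumptions the recovery is on the order of 0.002% to 0.03% of the starting DNA, which is why pooling or amplification is usually needed.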
Another concern is background DNA that is pulled down by nonspecific interactions of protein and DNA. In one typical ChIP-chip experiment based on pull-down by Suz12, >50% of the targets with 3-fold enrichment compared with the control group were false positives (11). Recently, optimization of ChIP-chip decreased the false-positive rate to <1% and the false-negative rate to 20% to 25% (12). Attention should be paid to several key basics of ChIP-chip, such as antibody quality, immunoprecipitation handling, optimization of array hybridization conditions, and data normalization and analysis. It is also necessary to establish appropriate controls. Generally, genomic DNA is used as an input control, and samples from no-antibody or immunoglobulin G groups are used as negative controls. Other control designs, such as transformed cell lines versus empty-vector cell lines, wild-type target versus mutant target, and with versus without drug treatment, can also be considered.
ChIP-chip Applications in Genome-wide Functional Analysis
The ChIP-chip studies discussed above illustrate two strategies (3, 4). One is to screen for and identify the binding targets of a protein factor without prior knowledge; specific targets are then confirmed by ChIP-PCR. The other is to map protein binding locations to provide a genome-wide binding profile. This second strategy may require more accurate raw data for statistical analysis, because confirming the most enriched targets does not guarantee that the final map reflects the real spectrum of target distribution. Two major areas of research can be studied using ChIP-chip: first, the identification of sequences with specific histone modifications and, second, the identification of binding targets for nuclear protein factors.
Two ChIP-chip strategies are used in histone modification studies. One is to detect the distribution of histone modifications using antibodies that specifically target these modifications (13). The other is to use ChIP-chip to locate not the modified histones but the enzymes that catalyze the histone modification reactions (14). In yeast, when a specific histone-modifying enzyme is mutated, the resulting changes in histone modifications can identify its targets (13, 14). This kind of study can also combine ChIP-chip with other high-throughput strategies, especially expression profiling, to establish the correlation of histone markers with transcription activity (13, 15). In humans, three different arrays have been applied in studying histone modifications with ChIP-chip. The first is the CpG island array, which showed a strong correlation between CpG methylation and histone modifications (16). The second is the cDNA array, which provided new information on the distribution of histone methylation patterns in the coding regions of human genes (17). Finally, the tiling array mapped H3 markers (dimethyl-K4 and trimethyl-K4, acetyl-H3K9, and acetyl-H3K14) to nonrepetitive regions of human chromosomes 21 and 22 (18). In these studies, ChIP-chip provided a wider view than gene-by-gene studies.
Conventional methods to identify DNA binding elements, such as footprinting and gel shift assays, can be laborious. Various algorithms, based on comparative “phylogenetic footprinting” or coregulation clustering, identify regulatory elements in silico (19). Significantly, the genome-wide binding data from ChIP-chip, which are obtained directly and in vivo, can be used in bioinformatic analysis to identify binding elements (20). One can further combine data from ChIP-chip, expression profiling, comparative genomics, and the published literature to reduce false findings and to identify true interactions (21). The information on binding elements identified by ChIP-chip leads to the discovery of modules that can be used to build a global view of the regulation network (12). In ChIP-chip, formaldehyde can cross-link not only interacting protein and DNA but also protein and protein. To avoid interference from indirect DNA-protein binding caused by formaldehyde, two alternative methods, DNA immunoprecipitation with microarray detection and protein binding microarrays, may be considered (22, 23).
ChIP-chip has also been extended or modified for other purposes. Some custom arrays can be applied to check the binding positions of regulators or histone modifications pertaining to a single gene of interest (11). To study gene expression, DNA methylation, and histone acetylation in parallel, a progeny array panel, called expressed CpG island sequence tags, was designed from CpG island arrays (24). Another technique used antibodies directed against 5-methylcytidine for immunoprecipitation and subsequent hybridization onto arrays for DNA methylation profiling (25). ChIP-chip with antibodies against methyl-CpG binding proteins can also be applied to identify methylated targets (26).
Other High-Throughput Methods Beyond ChIP-chip
Because the ideal array covering all human chromosomes is not available, alternative options have been used to address this limitation (Fig. 1A). One way is to digest genomic DNA with restriction enzymes that create blunt ends and to clone the DNA fragments precipitated by ChIP into a plasmid vector (27). Drawbacks of this cloning strategy are a high proportion of false-positive targets and high labor and cost. Another possible way is to establish a SELEX (systematic evolution of ligands by exponential enrichment) genomic library for sequencing, although similar questions of nonspecificity remain (Fig. 1A; ref. 28). A combination of ChIP and arbitrarily primed PCR, called ChAP, is similar to ChIP-SELEX and uses DNA sequencing to identify protein-binding sites (29). Additionally, ChIP can be combined with modified serial analysis of gene expression, an approach that can be termed ChIP-SAGE, although the requirement for extensive DNA sequencing makes it less convenient than ChIP-chip (Fig. 1A; ref. 30).
Other similar high-throughput techniques include DNA adenine methyltransferase identification (DamID) coupled with microarray and RNA-IP-chip, which combines RNA immunoprecipitation with microarray (Fig. 1B and C). DamID uses the DNA methyltransferase to “mark” the positions of DNA-protein interactions and, unlike ChIP-chip, does not require an antibody (31). However, DamID-chip has limitations when compared with ChIP-chip. First, expressing the fusion protein requires more time than formaldehyde cross-linking of endogenous protein. Second, DamID cannot be applied to detect posttranslational changes, such as histone modifications. Third, the resolution of DamID is lower than that of ChIP-chip, although both methods can produce similar binding maps (32). RNA-IP-chip is applied to understand RNA-protein interactions (Fig. 1C; ref. 33).
Perspective on Future Development of ChIP-chip
Generally, the ChIP-chip protocol can be completed in three steps: ChIP, post-ChIP DNA handling, and microarray analysis. Some modifications can be made. For example, UV and other covalent trapping of protein-DNA complexes, such as the covalent binding of DNA methyltransferase and 5-aza-2′-deoxycytidine, may be modified to provide another marker for in vivo mapping (34, 35). Unbiased amplification of ChIP DNA would be beneficial to data quality from ChIP-chip. Efforts to amplify genomic DNA homogeneously include carefully designed amplification processes, a chemical additive, and application of more efficient enzymes, such as Φ29 DNA polymerase (36).
Currently, data analysis for ChIP-chip is largely adopted from expression microarray studies, and the same assumptions are carried over, although they may not always hold. For instance, normalization of expression microarray data commonly assumes that the total signals from the test sample and the control are equal. This assumption is not valid when the immunoprecipitated DNA differs substantially from the control, as can happen with antibodies against widespread histone modifications. Currently, attention is focused on creating algorithms to use the ChIP-chip data, to combine ChIP-chip with other platforms, and finally to establish predictive models of transcription regulation. In addition, the basic work of design, application, and analysis of ChIP-chip itself may be equally important to improve the reliability of these models.
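The normalization concern can be illustrated with a toy calculation. The sketch below contrasts scaling the immunoprecipitated channel so that total signals match the control with scaling based on a set of probes assumed to be unbound. The control-probe approach, the function names, and the intensities are assumptions made for illustration; they are not a method described in the studies cited here.

```python
def scale_to_equal_totals(ip, inp):
    """Scale the IP channel so its total signal equals the input total."""
    factor = sum(inp) / sum(ip)
    return [v * factor for v in ip]

def scale_by_control_probes(ip, inp, control_idx):
    """Scale the IP channel using only probes assumed to be unbound (hypothetical)."""
    factor = sum(inp[i] for i in control_idx) / sum(ip[i] for i in control_idx)
    return [v * factor for v in ip]

def fold_ratios(ip, inp):
    """Per-probe IP / input ratios."""
    return [a / b for a, b in zip(ip, inp)]

# Toy data: four probes carry a widespread modification (4-fold enriched);
# the last probe is an unbound control. These values are not real data.
input_signal = [100.0, 100.0, 100.0, 100.0, 100.0]
ip_signal = [400.0, 400.0, 400.0, 400.0, 100.0]

scaled_total = scale_to_equal_totals(ip_signal, input_signal)
global_ratios = fold_ratios(scaled_total, input_signal)
control_ratios = fold_ratios(
    scale_by_control_probes(ip_signal, input_signal, [4]), input_signal
)
print(global_ratios)
print(control_ratios)
```

In this toy case, equalizing totals compresses the 4-fold enrichment to below 1.2-fold, whereas scaling on the control probe preserves it.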
The combination of ChIP-chip with other high-throughput methods may be beneficial. This entails both combining multiple ChIP-chip experiments and integrating ChIP-chip with other high-throughput methods, such as expression arrays. Recently, the yeast two-hybrid system has been applied to create a network of protein-protein interactions, and this proteome map can be connected to the transcriptome (37). Theoretically, the information on protein-DNA crosstalk from ChIP-chip, combined with that on protein-protein crosstalk, can provide input signals to reconstruct the regulation network. The expression array may indicate the output signals from this network. It is promising that, along with an appropriate computational strategy, emerging high-throughput methods from different platforms can work cooperatively to produce a clearer picture of the regulation network and disease-related changes.
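One way to picture this integration is as a small graph in which protein-DNA edges and protein-protein edges form the inputs and expression changes form the output. The sketch below is a simplified illustration under assumed data structures; the regulator and gene names are hypothetical, and real network reconstruction would need statistical modeling well beyond this.

```python
from collections import defaultdict

def build_network(protein_dna_edges, protein_protein_edges):
    """Index protein-DNA edges (regulator -> gene) and undirected protein-protein edges."""
    targets = defaultdict(set)
    partners = defaultdict(set)
    for regulator, gene in protein_dna_edges:
        targets[regulator].add(gene)
    for a, b in protein_protein_edges:
        partners[a].add(b)
        partners[b].add(a)
    return targets, partners

def candidate_targets(regulator, targets, partners):
    """Return (direct, indirect) targets; indirect ones are bound by interacting partners."""
    direct = set(targets.get(regulator, set()))
    indirect = set()
    for partner in partners.get(regulator, set()):
        indirect |= targets.get(partner, set())
    return direct, indirect - direct

def supported_by_expression(genes, expression_changed):
    """Keep genes whose expression also changed (the network 'output')."""
    return sorted(g for g in genes if g in expression_changed)

# Hypothetical inputs standing in for ChIP-chip, two-hybrid, and expression array data.
protein_dna = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")]
protein_protein = [("TF1", "TF2")]
expression_changed = {"geneA", "geneC", "geneD"}

targets, partners = build_network(protein_dna, protein_protein)
direct, indirect = candidate_targets("TF1", targets, partners)
direct_supported = supported_by_expression(direct, expression_changed)
indirect_supported = supported_by_expression(indirect, expression_changed)
print(direct_supported, indirect_supported)
```

Separating direct from partner-mediated targets mirrors the distinction between directly bound genes and those changed secondarily.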
Conclusion on ChIP-chip
New biological questions continue to drive the development of new techniques. DNA sequencing, expression microarrays, proteomic two-dimensional gel electrophoresis, and other systems tools have helped to characterize the structure and abundance of cell components. New genome-wide, high-throughput tools, such as ChIP-chip, are necessary to study the activities of key components, such as epigenetic modifications and DNA-protein binding in cells. ChIP-chip has been used frequently in basic biological studies and may be modified further and extended to other areas, such as human diseases. Lastly, the large number of discoveries made by ChIP-chip and other high-throughput techniques may be connected with emerging bioinformatics to add to our knowledge of life and disease.
Acknowledgments
Grant support: National Cancer Institute grants U54 CA113001 and R01 CA-69065 (T.H-M. Huang) and P30 CA16058 (C. Plass and T.H-M. Huang); The Leukemia and Lymphoma Society of America (C. Plass); funds from The Ohio State University Comprehensive Cancer Center-Arthur G. James Cancer Hospital and Richard J. Solove Research Institute (T.H-M. Huang); and Leukemia and Lymphoma Society Scholarship (C. Plass). This investigation was partially supported by the National Institutes of Health under Ruth L. Kirschstein, National Research Service Award 5T32CA009338 (LTS) from the National Cancer Institute.
We thank Benjamin Rodriguez for helpful discussions and feedback on this manuscript. We apologize to many colleagues for not citing their work because of space limitations.