ChIP-on-chip (also known as ChIP-chip) is a technique that combines chromatin immunoprecipitation ("ChIP") with microarray technology ("chip"). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA in vivo. Specifically, it allows the identification of binding sites of DNA-binding proteins on a genome-wide basis. Whole-genome analysis can be performed to determine the locations of binding sites for almost any protein of interest. As the name of the technique suggests, such proteins are generally those operating in the context of chromatin. The most prominent representatives of this class are transcription factors, replication-related proteins like ORC, histones and their variants, and histone modifications.
The goal of ChIP-on-chip is to localize protein binding sites, which may help in identifying functional elements in the genome. For example, in the case of a transcription factor as a protein of interest, one can determine its transcription factor binding sites throughout the genome. Other proteins allow the identification of promoter regions, silencing elements, boundary elements, and sequences that control DNA replication. If histones are the subject of interest, it is believed that the distribution of modifications and their localizations may offer new insights into the mechanisms of regulation.
One of the long-term goals ChIP-on-chip was designed for is to establish a catalogue of (selected) organisms that lists all protein-DNA interactions under various physiological conditions. This knowledge would ultimately help in understanding the machinery behind gene regulation, cell proliferation, and disease progression. Hence, ChIP-on-chip has great potential to complement our knowledge of how the genome is orchestrated, not only at the nucleotide level but also at the higher levels of information and regulation studied in epigenetics research.
The technical platforms used to conduct ChIP-on-chip experiments are DNA microarrays, or "chips". They can be classified and distinguished according to various characteristics:
- Probe type: DNA arrays can comprise either mechanically spotted cDNAs or PCR-products, mechanically spotted oligonucleotides, or oligonucleotides that are synthesized in situ. The early versions of microarrays were designed to detect RNAs from expressed genomic regions (open reading frames). Although such arrays are perfectly suited to study gene expression profiles, they have limited importance in ChIP experiments since most "interesting" proteins with respect to this technique bind in intergenic regions. Nowadays, even custom-made arrays can be designed and fine-tuned to match the requirements of an experiment. Also, any sequence of nucleotides can be synthesized to cover genic as well as intergenic regions.
- Probe size: Early versions of cDNA arrays had a probe length of about 200 bp. As of February 2007, the latest arrays use oligos ranging from 70-mers (Microarrays, Inc.) down to 25-mers (Affymetrix).
- Probe composition: There are tiled and non-tiled DNA arrays. Non-tiled arrays use probes that are selected according to non-spatial criteria, i.e. the DNA sequences used as probes have no fixed distances in the genome. Tiled arrays, by contrast, select a genomic region (or even a whole genome) and divide it into equal chunks. Such a region is called a tiled path. The average distance between each pair of neighboring chunks (measured from the center of each chunk) gives the resolution of the tiled path. A path can be overlapping, end-to-end, or spaced.
- Array size: The first microarrays used for ChIP-on-chip contained about 13,000 spotted DNA segments representing all ORFs and intergenic regions of the yeast genome. As of February 2007, Affymetrix offers whole-genome tiled yeast arrays with a resolution of 5 bp (3.2 million probes in all). Tiled arrays for the human genome are becoming more powerful as well. To name one example, Affymetrix offers a set of seven arrays with about 90 million probes, spanning the complete non-repetitive part of the human genome with about 35 bp spacing.
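The definition of tiled-path resolution given under probe composition can be made concrete with a short sketch. The Python below is illustrative only: the function name `tiled_path_summary` and the half-open `(start, end)` coordinate convention are assumptions made for this example and do not correspond to any array vendor's software.

```python
from statistics import mean


def tiled_path_summary(probes):
    """Summarize a tiled path.

    `probes` is a list of (start, end) genomic coordinates, half-open and
    sorted by start (an assumed convention for this sketch).
    Returns (resolution, layout), where resolution is the average distance
    between the centers of neighboring probes and layout is one of
    "overlapping", "end-to-end", "spaced", or "mixed".
    """
    if len(probes) < 2:
        raise ValueError("a tiled path needs at least two probes")

    # Resolution: mean center-to-center distance of neighboring probes.
    centers = [(start + end) / 2 for start, end in probes]
    resolution = mean(b - a for a, b in zip(centers, centers[1:]))

    # Gap between the end of one probe and the start of the next:
    # negative = overlap, zero = end-to-end, positive = spaced.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(probes, probes[1:])]
    if all(g < 0 for g in gaps):
        layout = "overlapping"
    elif all(g == 0 for g in gaps):
        layout = "end-to-end"
    elif all(g > 0 for g in gaps):
        layout = "spaced"
    else:
        layout = "mixed"
    return resolution, layout


# Three 25-mers placed every 35 bp, loosely modeled on the spacing quoted above.
print(tiled_path_summary([(0, 25), (35, 60), (70, 95)]))  # (35.0, 'spaced')
```

With 25-mers placed every 5 bp instead, the same function reports a resolution of 5 bp and an overlapping layout.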
Besides the actual microarray, other hardware and software equipment is necessary to run ChIP-on-chip experiments. One company’s microarrays generally cannot be analyzed by another company’s processing hardware. Hence, buying an array also requires buying the associated workflow equipment. The most important elements include hybridization ovens, chip scanners, and software packages for subsequent numerical analysis of the raw data.
Workflow of a ChIP-on-chip experiment
Starting with a biological question, a ChIP-on-chip experiment can be divided into three major steps: The first is to set up and design the experiment by selecting the appropriate array and probe type. Second, the actual experiment is performed in the wet-lab. Last, during the dry-lab portion of the cycle, gathered data are analyzed to either answer the initial question or lead to new questions so that the cycle can start again.
Wet-lab portion of the workflow
- In the first step, the protein of interest (POI) is cross-linked with the DNA site it binds to in an in vivo environment. Usually this is done by a gentle formaldehyde fixation that is reversible with heat.
- Then, the cells are lysed and the DNA is sheared by sonication or with micrococcal nuclease. This results in double-stranded DNA fragments, normally 1 kb or less in length. Those that were cross-linked to the POI form a POI-DNA complex.
- In the next step, only these complexes are filtered out of the set of DNA fragments, using an antibody specific to the POI. The antibodies may be attached to a solid surface, may carry a magnetic bead, or may have some other physical property that allows cross-linked complexes to be separated from unbound fragments. This procedure is essentially an immunoprecipitation (IP). There are two alternative ways to implement this filtering step:
- immunoprecipitation of the tagged protein with an antibody against the tag (e.g. FLAG, HA, c-myc)
- affinity purification that does not require antibodies, such as the Tandem Affinity Purification (TAP)
- The cross-links in the POI-DNA complexes are reversed and the DNA is purified. For the rest of the workflow, the POI is no longer necessary.
- After an amplification and denaturation step, the single-stranded DNA fragments are labeled with a fluorescent tag such as Cy5 or Alexa 647.
- Finally, the fragments are poured over the surface of the DNA microarray, which is spotted with short, single-stranded sequences that cover the genomic portion of interest. Whenever a labeled fragment "finds" a complementary fragment on the array, the two hybridize and again form a double-stranded DNA fragment.
Dry-lab portion of the workflow
- After sufficient time has been allowed for hybridization, the array is illuminated with light that excites the fluorescent labels. Probes on the array that have hybridized to one of the labeled fragments emit a light signal that can be captured by a camera. This image contains all raw data for the remaining part of the workflow.
- These raw data, encoded as a false-color image, need to be converted to numerical values before the actual analysis can be done. The analysis and information extraction of the raw data often remains the most challenging part of ChIP-on-chip experiments. Problems arise throughout this portion of the workflow, ranging from the initial chip read-out, to suitable methods for subtracting background noise, and finally to appropriate algorithms that normalize the data and make them available for subsequent statistical analysis, which then hopefully leads to a better understanding of the biological question being asked. Furthermore, because of the different array platforms and the lack of standardization between them, data storage and exchange is a major problem as well. Generally speaking, the data analysis can be divided into three major steps:
- During the first step, the captured fluorescence signals from the array are normalized using control signals, which can be derived from the same or a second chip. Such control signals make it possible to tell which probes on the array hybridized correctly and which bound nonspecifically.
- In the second step, numerical and statistical tests are applied to the control data and the IP fraction data to identify POI-enriched regions along the genome. Three methods are widely used: median percentile rank, single-array error, and sliding window. These methods generally differ in how low-intensity signals are handled, how much background noise is accepted, and which trait of the data is emphasized during the computation. Recently, the sliding-window approach appears to be favored and is often described as the most powerful.
- In the third step, these regions are analyzed further. If, for example, the POI was a transcription factor, such regions would represent its binding sites. Subsequent analysis may then infer nucleotide motifs and other patterns to allow functional annotation of the genome.
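The first two analysis steps can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not the procedure of any particular analysis package: normalization is reduced to a median-centered log2 ratio of IP to control intensity, and the sliding-window test is reduced to a median over nearby probes compared against a fixed cutoff. The function names, the 500 bp window, the cutoff of 1.0, and the minimum of three probes per window are all hypothetical choices made for the example.

```python
import math
from statistics import median


def log_ratios(ip, control):
    """Step 1 (simplified): per-probe log2(IP / control), median-centered.

    Assumes all intensities are positive, e.g. after background subtraction.
    """
    raw = [math.log2(i / c) for i, c in zip(ip, control)]
    center = median(raw)
    return [r - center for r in raw]


def sliding_window_hits(positions, ratios, window=500, threshold=1.0, min_probes=3):
    """Step 2 (simplified): report regions whose windowed median is enriched.

    `positions` are probe center coordinates in ascending order. For each
    probe, the median log ratio of all probes within window/2 of it is
    compared with `threshold`; neighboring hits are merged into regions.
    The scan is quadratic in the number of probes, which is acceptable for
    a sketch but not for a whole-genome array.
    """
    regions = []
    for pos in positions:
        nearby = [r for p, r in zip(positions, ratios) if abs(p - pos) <= window / 2]
        if len(nearby) >= min_probes and median(nearby) >= threshold:
            if regions and pos - regions[-1][1] <= window:
                regions[-1] = (regions[-1][0], pos)  # extend the previous region
            else:
                regions.append((pos, pos))  # start a new region
    return regions


# Synthetic data: 20 probes every 100 bp, with an 8-fold IP signal at 800-1200 bp.
positions = list(range(0, 2000, 100))
control = [100.0] * len(positions)
ip = [800.0 if 800 <= p <= 1200 else 100.0 for p in positions]
print(sliding_window_hits(positions, log_ratios(ip, control)))  # [(800, 1200)]
```

Using a median over the window rather than a single-probe cutoff means that one isolated bright probe does not produce a hit, which is one reason window-based approaches tolerate noisy probes.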
Strengths and Weaknesses
Using tiled arrays, ChIP-on-chip allows for high-resolution genome-wide maps. These maps can determine the binding sites of many DNA-binding proteins, such as transcription factors, as well as chromatin modifications.
Although ChIP-on-chip can be a powerful technique in genomics, it is very expensive. Most published studies using ChIP-on-chip repeat their experiments at least three times in order to obtain biologically meaningful maps. The cost of the DNA microarrays is often the limiting factor in whether a laboratory proceeds with a ChIP-on-chip experiment. Another limitation is the size of DNA fragments that can be achieved. Most ChIP-on-chip protocols use sonication to break DNA into small pieces, but sonication is limited to a minimal fragment size of 200 bp. To obtain higher-resolution maps, this limitation must be overcome to achieve smaller fragments, preferably at single-nucleosome resolution. As mentioned previously, the statistical analysis of the huge amount of data generated from arrays is a challenge, and normalization procedures should aim to minimize artifacts and determine what is biologically significant. So far, application to mammalian genomes has been limited, for example because a significant percentage of the genome is occupied by repeats. However, as ChIP-on-chip technology advances, high-resolution maps of whole mammalian genomes should become achievable.
Antibodies can be an important limiting factor for ChIP-on-chip. The technique requires highly specific antibodies that recognize their epitope both in free solution and under fixed conditions. An antibody that has been shown to successfully immunoprecipitate cross-linked chromatin is termed "ChIP-grade". Companies that provide ChIP-grade antibodies include Abcam, Santa Cruz, and Upstate. To overcome the problem of specificity, the protein of interest can be fused to a tag such as FLAG or HA that is recognized by antibodies. Antibodies against specific histone modifications, such as H3 trimethyl K4, are also available. As mentioned before, the combination of these antibodies and ChIP-on-chip has become extremely powerful for whole-genome analysis of histone modification patterns and will contribute substantially to our understanding of the histone code and epigenetics.
A study demonstrating the non-specific nature of DNA-binding proteins has been published in PLoS Biology. It indicates that independent confirmation of functional relevance is a necessary step in any ChIP-chip experiment.
The ChIP-on-chip technique was first applied successfully in three papers published in 2000 and 2001. The authors identified binding sites for individual transcription factors in the budding yeast Saccharomyces cerevisiae. In 2002, Richard Young’s group determined the genome-wide positions of 106 transcription factors using a c-Myc tagging system in yeast. Other applications for ChIP-on-chip include DNA replication and chromatin structure. Since then, ChIP-on-chip has become a powerful tool for determining genome-wide maps of histone modifications and many more transcription factors. ChIP-on-chip in mammalian systems has been difficult because of the large and repetitive genomes. Thus, many studies in mammalian cells have focused on selected promoter regions that are predicted to bind transcription factors and have not analyzed the entire genome. However, whole mammalian genome arrays have recently become commercially available from companies like Nimblegen. In the future, as ChIP-on-chip arrays become more advanced, high-resolution whole-genome maps of DNA-binding proteins and chromatin components for mammals will be analyzed in more detail.
ChIP-sequencing is a recently developed technology that still uses chromatin immunoprecipitation to isolate the DNA bound by the proteins of interest but then, instead of a microarray, uses the more accurate, higher-throughput method of sequencing to localize interaction points.