Abstract
Background: MicroRNAs (miR) are endogenous, noncoding RNAs involved in many cellular processes and have been associated with the development and progression of cancer. There are many different ways to evaluate miRs.
Methods: We describe some of the most commonly used and promising miR detection methods.
Results: Each miR detection method has benefits and limitations. Microarray profiling and quantitative real-time reverse-transcription PCR are the two most common methods to evaluate miR expression. However, the results from microarray and quantitative real-time reverse-transcription PCR do not always agree. High-throughput, high-resolution next-generation sequencing of small RNAs may offer the opportunity to quickly and accurately discover new miRs and confirm the presence of known miRs in the near future.
Conclusions: All of the current and new technologies have benefits and limitations to consider when designing miR studies. Results can vary across platforms, requiring careful and critical evaluation when interpreting findings.
Impact: Although miR detection and expression analyses are rapidly improving, there are still many technical challenges to overcome. The old molecular epidemiology tenet of rigorous biomarker validation and confirmation in independent studies remains essential. Cancer Epidemiol Biomarkers Prev; 19(4); 907–11. ©2010 AACR.
In 1993, Rosalind Lee, Rhonda Feinbaum, and Victor Ambros reported the discovery of a gene, lin-4, which coded for small RNAs rather than a protein (1). This discovery led to the identification of an entirely new class of RNA: microRNA (miR). Mature miRs are small, single-stranded RNAs ∼22 nucleotides in length that are highly conserved across species (2). By degrading mRNA transcripts or inhibiting protein translation, miRs negatively regulate gene expression for a variety of fundamental biological processes, such as apoptosis, development, differentiation, and proliferation (2, 3). It is estimated that miRs regulate ∼30% of human genes (4), and miR dysregulation has been associated with the development and progression of cancer (5, 6). In fact, the same miR can act as an oncogene in some tissues and as a tumor suppressor in others (5).
These discoveries have sparked a great deal of interest in miR research. Because of their unique posttranscriptional and protein translation regulatory functions, miRs are important epigenetic modulators. For example, because miRs can inhibit protein translation, expression of a gene may be high whereas expression of the encoded protein is low (7). In addition, although mRNA is not stable in formalin-fixed paraffin-embedded tissue, miR expression profiles seem to correlate well between fresh and formalin-fixed paraffin-embedded samples, possibly because miRs are small and resistant to RNA degradation (8-13). Stable miRs have also been detected in serum, plasma, urine, and other biological fluids and may be associated with cancer (14-22). These features make miRs extremely attractive for epidemiologic research, where archived formalin-fixed paraffin-embedded tissue, blood, or other biological fluids are most often available.
MiRs have been evaluated through several different methods, and each method has its own limitations. Some of the most commonly used and promising methods are listed in Table 1. The cloning method was originally used to discover miRs, which were subsequently confirmed with Northern blot (1, 23-26). Although most new miRs are still discovered through cloning, these methods are time consuming, low throughput, and biased toward the discovery of highly abundant miRs (27, 28). Similarly, other miR profiling methods have benefits and limitations. For example, although in situ hybridization is low throughput and has limited sensitivity and specificity, it shows the cellular localization of miRs, which is useful for characterizing the biology of individual miRs (28, 29). More generally, methods that directly detect miRs have low sensitivity because of the extremely short sequences and relatively low copy numbers of miRs. For these reasons, methods that do not involve miR amplification require more input total RNA. However, methods that do use amplification can be error prone due to the extremely short and inflexible template characteristics and similarity in sequences within miR families. Amplified samples are also more greatly affected by handling errors (30).
Technology | Throughput | Sensitivity | Specificity | Use* | Cost | Time requirement | Limitations |
---|---|---|---|---|---|---|---|
Cloning | Low | Low | High | Discovery and confirmation | High/miR | High | Need sequence confirmation. Limited quantitative ability. High cost and time. |
In situ hybridization | Low | Low | Low | Confirmation | High/miR | 1 d† | High background, low sensitivity and specificity. Can only detect high abundance miRs. |
Microarray (oligonucleotide microchip) | High | Low | Low for subfamily | Confirmation | Low/miR | 1 d | Requires 0.2-2 μg total RNA. Potential cross-hybridization of related miRs. Can only measure relative abundance. |
Bead-based flow cytometry | High | Medium | High | Confirmation | Low/miR | Low | High complexity: requires removal of genomic DNA and recovery of small RNA, amplification, hybridization, and flow cytometry. More prone to external contamination due to amplification. |
Northern blot | Low | Relatively low | High | Confirmation | Low/miR | High | High complexity: requires many labor-intensive steps, including radiolabeled oligonucleotide probes. Not all laboratories are certified to handle radioactive probes. Requires large amounts of total RNA (5-25 μg/sample). |
qRT-PCR | Semihigh | High | High | Confirmation | High/miR | 1 d | Primer for cDNA is based on complementarity over short sequences at the 3′-end. More prone to external contamination due to amplification. |
Next-generation sequencing | High | High | High | Discovery and confirmation | Low/miR, High/sample | 2-5 d | High complexity: requires several gel purifications. Requires at least 2-10 μg very high quality total RNA and costly, specialized equipment. Potential underrepresentation of lower copy miRs. |
*Discovery of new miRs and/or confirmation of the presence of already known miRs.
†Requires 1 d for testing after the assays have been optimized.
Although there is currently no gold standard for measuring miR expression (29), oligonucleotide microarray (microchip) and quantitative real-time reverse-transcription PCR (qRT-PCR) are two of the most common methods for evaluating known miRs (7, 27, 29-31). Although some studies of cancer cell lines (32, 33) or human tissue (34) found good correlation between microarray and qRT-PCR for selected miRs, one recent study compared semihigh-throughput microarray and qRT-PCR in proliferating murine myoblast cells and concluded that there was low correlation across platforms (27). Similarly, we found poor overall correlation between microarray- and qRT-PCR–based miR expression in 49 samples from lung cancer cases in the Environmental And Genetics in Lung cancer Etiology population-based case-control study. Microarray and qRT-PCR miR expression were significantly correlated for only 4 of 9 (44%) human miRs evaluated. Other studies of miR expression in cancer have also reported a relatively poor replication of microarray miR expression by qRT-PCR (35-37), and studies with 100% validation often report only one to three miRs (38-43).
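To make this kind of cross-platform comparison concrete, the following is a minimal Python sketch that computes a Spearman rank correlation for a single miR measured on both platforms in the same samples. The choice of Spearman correlation, the function names, and the example values are illustrative assumptions; the studies cited above do not necessarily use this statistic. Because a lower qRT-PCR cycle threshold (Ct) indicates higher abundance, the sketch assumes that qRT-PCR values are entered as negative ΔCt so that both measures increase with expression.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for one miR in six samples (not data from any study).
microarray_log2_intensity = [7.1, 8.4, 6.9, 9.2, 7.8, 8.0]
qrt_pcr_negative_delta_ct = [-5.2, -3.9, -4.8, -3.1, -5.0, -4.1]

rho = spearman(microarray_log2_intensity, qrt_pcr_negative_delta_ct)
print(f"Spearman rho across platforms: {rho:.2f}")
```

A rank-based statistic is used here because the two platforms report on different scales; in a real study, a significance test and a correction for the number of miRs compared would also be needed.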
There are several reasons why the results from qRT-PCR and microarray might differ. First, the larger dynamic range of stem-loop qRT-PCR (seven logs versus three to four logs for microarray) may provide greater sensitivity (27). qRT-PCR may also have higher specificity compared with microarray in distinguishing miRs with bases that differ at the 3′-end because stem-loop primers can distinguish between miRs that differ by one nucleotide (27, 44). In addition, because miRs vary slightly in length and guanine-cytosine (GC) content, they have different melting temperatures (30). Yet all miR probes on a microarray must undergo the same hybridization conditions because they are all on the same microchip. These homogenized hybridization conditions can lead to sequence-dependent differential hybridization affinities that may result in either false positives due to nonspecific hybridization or false negatives due to hybridization signals that do not exceed the background threshold (32). Dual-channel (color) hybridization is less affected by this limitation than single-channel hybridization because the ratio between the two channels is used for data analysis rather than signal intensity. On the other hand, qRT-PCR relies exclusively on the success of cDNA synthesis, which is initiated by a stem-loop primer primed to short sequences at the 3′-end of the miR. Failure to initiate cDNA synthesis could result in false negatives. Readers should take care when reviewing older studies because stem-loop primers designed based on older versions of miR sequence databases, such as miRBase 9.2 (45), may not correctly prime to natural miR sequences due to inaccuracies in miR sequences from earlier versions compared with the current version, miRBase 14 (46). However, most modern commercially available stem-loop primers are designed based on later versions of the miRBase. 
In addition, qRT-PCR requires extreme care to avoid contamination or other technical errors and can produce variable results even in expert laboratories, suggesting that it is not the ideal gold standard (29).
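The point above about GC content and melting temperature can be illustrated with a short Python sketch. It uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough rule of thumb for short DNA oligonucleotides. This rule is an assumption chosen for simplicity; it is not the method used in the cited studies and is not accurate for real array probes. The two 22-nucleotide sequences are made up for illustration and are not real miRs.

```python
def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule.

    U is treated like T so that RNA sequences can be passed directly.
    This is only a rule of thumb for short oligonucleotides.
    """
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 22-nt sequences (invented for this example, not real miRs).
au_rich = "UAUUGCAUUAUAGUAAUUGAUA"
gc_rich = "CGGGCUGCCGAGGCCUGCGGCA"

for name, seq in [("AU-rich", au_rich), ("GC-rich", gc_rich)]:
    print(name, f"GC={gc_fraction(seq):.2f}", f"approx Tm={wallace_tm(seq)} C")
```

Even under this crude approximation, two sequences of identical length can differ substantially in estimated melting temperature, which is why a single hybridization condition cannot be optimal for every probe on a chip.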
It is considered good practice to profile miRs by microarray followed by validation with qRT-PCR (5). However, there are no standard guidelines for conducting and reporting such validation. For example, some authors report validation by qRT-PCR for some miRs and by Northern blot for other miRs, or report validation of precursor miRs but not mature miRs, without any explanation as to why these tests were chosen (47-49). In addition, when authors report that a few miRs were validated by qRT-PCR, it is often unclear if other miRs were also tested but not validated by qRT-PCR. Standardized guidelines would aid the interpretation of miR data by creating transparency in reporting. Furthermore, relative quantification of miR expression by qRT-PCR depends on the small nuclear RNA used as an internal control. There is no standard as to which internal control should be used for the normalization of qRT-PCR data, and inappropriate normalization can result in erroneous conclusions (50). Clarity in describing how normalization controls are chosen would also aid data interpretation.
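The dependence of relative quantification on the internal control can be sketched in Python with the widely used comparative Ct calculation, fold change = 2^(−ΔΔCt). This is an assumption for illustration: the text above does not state which calculation the cited studies used, and the formula assumes roughly 100% amplification efficiency for both the miR and the control. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target miR in a sample versus a calibrator.

    delta Ct       = Ct(target) - Ct(internal control), within each specimen
    delta-delta Ct = delta Ct(sample) - delta Ct(calibrator)
    fold change    = 2 ** (-delta-delta Ct)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the control is equally abundant in both specimens.
stable_control = relative_expression(
    ct_target_sample=24, ct_ref_sample=20,
    ct_target_calibrator=26, ct_ref_calibrator=20,
)

# Same target Ct values, but the control itself is one cycle later in the
# sample, as could happen if the control is not stably expressed.
shifted_control = relative_expression(
    ct_target_sample=24, ct_ref_sample=21,
    ct_target_calibrator=26, ct_ref_calibrator=20,
)

print(stable_control, shifted_control)
```

In this sketch, a one-cycle shift in the internal control alone doubles the apparent fold change of the target miR, which illustrates how an unstable control can produce erroneous conclusions.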
Because the full complement of human miRs has not been ascertained (29), platforms such as microarray and qRT-PCR that can only identify known sequences are limited. Emerging sequencing technologies provide a new discovery approach and have already been used to study small RNA, of which miR is one of the main components. Next-generation high-resolution deep sequencing allows both discovery of new miRs and confirmation of known miRs (7) in a high-speed, high-throughput fashion without the need for gels (51) or the ambiguity in data interpretation inherent in other methods. These new methods primarily include three platforms: the Roche (454) Genome Sequencer, which uses pyrosequencing to simultaneously sequence over 1 million reads in excess of 400 bp (52); the Illumina (Solexa) Genome Analyzer, which uses sequencing-by-synthesis to produce ∼200 million 36- to 100-bp reads (53); and the Applied Biosystems SOLiD system, which uses sequencing by oligo ligation and detection to produce 400 million 50-bp reads (54).
In brief, these methods determine the nucleotide sequence by imaging the light emitted each time a new nucleotide is added to the growing strand (51). To ensure sufficient light signal intensity for accurate detection of each added nucleotide, these methods typically amplify the fragments through emulsion PCR or library generation followed by PCR-based cluster amplification. However, amplification can result in sequence errors and some sequences may be preferentially amplified, limiting the ability to accurately quantify relative abundance. These methods can also be less accurate in homopolymeric regions (runs of identical bases). New techniques to read the sequence derived from a single molecule are currently under development. Limitations of next-generation sequencing include bioinformatic challenges due to large quantities of data and the high cost of instruments and reagents, although each sample can be bar coded, allowing samples to be mixed and run simultaneously to reduce cost. The third generation of sequencing technologies currently under development could eventually provide lower cost options (51, 55).
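Because accuracy can drop in runs of identical bases, one simple quality-control step is to flag reads that contain such runs. The following Python sketch is one possible way to do this; the function name, the minimum run length of 3, and the example read are hypothetical and are not taken from any of the platforms described above.

```python
from itertools import groupby


def homopolymer_runs(read, min_len=3):
    """Return (base, start_index, length) for each run of identical bases
    that is at least min_len long. Indices are 0-based."""
    runs = []
    position = 0
    for base, group in groupby(read.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((base, position, length))
        position += length
    return runs


# Hypothetical read, invented for illustration.
read = "ACGTTTTAGGG"
print(homopolymer_runs(read))
```

Reads flagged in this way could be inspected more closely or down-weighted, although the appropriate threshold would depend on the platform and chemistry.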
In summary, miR research is an exciting and growing field. Accurate and quantitative estimation of miR profiles or specific miR expression levels and their correlation with a given condition is the key to fully understanding the function of miR biological processing. All of the current and new technologies have benefits and limitations to consider when designing miR studies. Results can vary across platforms, requiring careful and critical evaluation when interpreting findings. When costs come down as they have for genotyping, next-generation sequencing may allow fast and possibly more accurate miR profiling in a way that could greatly enhance epidemiologic research.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We thank Dr. Aaron Schetter for his critical review of this manuscript.