Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and capture functionally related differential expression of miRNAs in the disease. Mol Cancer Ther; 10(5); 752–60. ©2011 AACR.

MicroRNAs (miRNA) compose a class of small endogenous noncoding RNA capable of regulating gene expression, either by inhibiting translation or promoting mRNA degradation (1). It has been suggested that differential expression of miRNA is involved in the initiation and progression of human cancer and thus may serve as a disease marker to improve diagnosis, prognosis and therapy for cancer (2–5). On the basis of miRNA arrays, numerous studies have been conducted to identify differentially expressed miRNAs between cancer and normal samples (3, 6–10). However, for a particular cancer, differentially expressed miRNA generated from multiple studies are often highly inconsistent. For example, no differentially expressed miRNAs detected from 6 studies for prostate cancer were shared by all of the studies (11). Because the reproducibility of discovered biomarkers is of fundamental importance for validation (12, 13), this problem must be resolved in a rational way before full use of miRNA arrays in biological researches is possible.

The irreproducibility problem may not reflect lack of reliability of the technology platform used but may reflect a lack of understanding regarding a correct validation process for biomarkers (14). It is becoming clear that, in high-throughout experiments for finding biomarkers for a disease, the irreproducibility problem could be induced by, among other factors, diverse biological factors such as biological variation and heterogeneity of molecular changes in the disease (15–17). On the other hand, it is also becoming clear that the diverse molecular changes (18, 19) including expression changes of miRNAs in cancer tend to be redundant regarding function (20). Specifically, diverse expression changes of miRNAs in cancer may primarily affect cancer-associated pathways by targeting cancer genes in these pathways (9, 21, 22). If this is true, different differentially expressed miRNAs generated from different studies for a cancer may be consistent in function, as seen for biomarker discovery based on the mRNA array platform (16, 23). Thus, to evaluate the reproducibility of differentially expressed miRNA discovery in different studies for a particular cancer, 1 reasonable approach involves taking into account the functional relationship (16, 24), rather than simply counting the overlaps. That is, we can design scores to evaluate the reproducibility of biomarker discovery using a biological assumption on functional relations such as expression correlation (16) or functionally similarity (24) between biomarkers. Notably, the underlying biological assumption is statistically testable: if a score is significantly higher than expected by chance, then it can provide evidence supporting that the assumption can partially explain the apparently inconsistent biomarkers.

Here, on the basis of multiple miRNA data sets for 2 cancer types, we showed that the regulation direction (upregulated or downregulated in cancer samples compared with normal samples) of differentially expressed miRNA detected in different data sets for each cancer were highly consistent. This result indicates that the miRNA arrays could be applied to reliably capture signals of the regulation direction of differentially expressed miRNA in cancer (25). Importantly, this result also suggests that the differentially expressed miRNA in a particular cancer has a consistent up- or downregulation pattern. On the analysis of different samples for each cancer, the miRNA displaying the most pronounced differential expression varied greatly, which could be attributed to biological variation of the molecular change of miRNA in the samples (26). On the basis of the biological assumption that 2 differentially expressed miRNAs with significantly overlapped targets may have a similar function (27), we proposed a score entitled percentage of overlaps of function-related miRNAs (POF) to evaluate the functional consistency of differentially expressed miRNAs for a cancer. For each of the 2 cancers, respectively, we found that most of the POF scores between top-ranked differentially expressed miRNAs detected from different studies were rather high, suggesting that the differentially expressed miRNAs detected from different studies may point to functionally important differential expression of miRNA in the disease. Then, using another independent data set for colon cancer, we validated the model of functional links between differentially expressed miRNAs extracted from 2 other data sets for this cancer.

Datasets

We analyzed 5 large data sets for 2 cancer types (colon and gastric). For each data set, the number of samples of each state (cancer or normal) was not less than 20. The detailed information was described in Table 1. We referred to each data set using the following nomenclature: cancer type followed by the total number of samples. Three data sets were collected for colon cancer: 2 data sets [Colon99 (6) and Colon168 (7)] were used to analyze the consistency of differentially expressed miRNAs and extract the functional link models, and another data set [Colon108 (8)] was used to validate the functional link models. The Colon108 data set included 38 technical replicates with highly reproducible signals (8). For each sample with replicates in this data set, the expression value of each miRNA was represented by the average of quantile normalized values for this miRNA across the replicates (8).

Table 1.

Summary of the 5 data sets analyzed in this study

Dataseta(T:N)bPlatformSource of normal sample
Colon99 (6) 78:21 OSU_CCCc 11k v2 Normal colonic mucosad 
Colon168 (7) 84:84 OSU_CCC v2.0 Adjacent nontumor tissue 
Colon108 (8) 80:28 Illumina v1 Adjacent nontumor tissue 
Gastric41 (9) 20:21 OSU_CCC 11k v2 Adjacent nontumor tissue 
Gastric353 (10) 184:169 OSU_CCC v3.0 Adjacent nontumor tissue 
Dataseta(T:N)bPlatformSource of normal sample
Colon99 (6) 78:21 OSU_CCCc 11k v2 Normal colonic mucosad 
Colon168 (7) 84:84 OSU_CCC v2.0 Adjacent nontumor tissue 
Colon108 (8) 80:28 Illumina v1 Adjacent nontumor tissue 
Gastric41 (9) 20:21 OSU_CCC 11k v2 Adjacent nontumor tissue 
Gastric353 (10) 184:169 OSU_CCC v3.0 Adjacent nontumor tissue 

aEach data set was denoted by the following nomenclature: cancer type followed by the total number of samples.

bT, the number of tumor samples; N, the number of normal samples.

cOSU_CCC, Ohio State University custom miRNA microarray chip.

dNormal samples were described as being from the normal colonic mucosa, thus it is unclear whether they were from the patients.

miRNA targets and cancer-associated genes

Currently, Targetscan (28), miRanda (29), and PicTar (30) are among the most widely used miRNA–target interaction prediction algorithms. Thus, we used them for analyses separately. Considering that miRNA targets predicted by multiple algorithms might be more reliable (31), we also used miRNA–targets interactions appearing in at least 2 of the 8 data sources (see Supplementary Data).

A total of 2,104 cancer-associated genes were collected from F-Census which compiles cancer genes from 8 data sources (24).

Data preprocessing and differentially expressed miRNA selection

For the data sets of Colon99, Colon168, Gastric41, and Gastric353, the raw data were preprocessed as follows: (i) the median background signal intensity was subtracted from the signal for each probe; (ii) signal values less than 1 were replaced with 1; (iii) log2-transformation was applied; (iv) probes absent in more than 20% of samples were deleted and other missing values were imputed using the R package kNN imputation function (32); (v) signal values of replicate probes of a miRNA were averaged; and (vi) quantile normalization was carried out (33). For Colon108, the quantile-normalized data of the original study were used. For all platforms, we annotated probes to mature miRNA using miRbase version 16 (34).

A two-sample t-test was carried out to select the differentially expressed miRNAs. The P values were adjusted by the Benjamini and Hochberg (BH) correction procedure to account for multiple tests with the false discovery rate (FDR) < 5% (35). For comparing the top n differentially expressed miRNAs from different data sets, we ranked miRNAs according to the t-test P values to select the most significant n ones. The differentially expressed miRNAs selected for each data set were listed in Supplementary Table S1–S5.

Consistency of regulation direction between two lists of differentially expressed miRNA

For each cancer, when comparing results found from multiple data sets generated on different platforms, we only analyzed the miRNAs presented in all of the data sets. Using the binomial distribution model, we calculated the probability of observing from N randomly selected miRNAs at least m non-differentially expressed miRNAs with consistent regulation directions across 2 data sets as follows:

$$P=\sum^N_{i=m}C^i_N(P_e)^i(1-P_e)^{N-i}$$

in which Pe is the random probability that 1 non-differentially expressed miRNA with the same regulation direction exists across 2 data sets. For each cancer, we roughly defined non-differentially expressed miRNA as miRNA that was not detected as differentially expressed miRNA at the FDR level of 10% in either of the data sets for this cancer. The Pe for each cancer was close to 0.5.

The PO score

The PO (percentage of overlaps) metric is frequently used to evaluate the consistency of different lists (26, 36, 37). Clearly, to be used as a disease marker for a disease, a differentially expressed miRNA should have a steady regulation pattern across different data sets. In this study we strictly defined differentially expressed miRNA, shared by 2 lists of differentially expressed miRNA only to be thus named if it also exhibited the same direction change across the corresponding 2 data sets (16). Suppose O miRNAs are shared by list 1 with length l1 and list 2 with length l2, then the PO score from list 1 to list 2 is PO12 = O/l1, and the score from list 2 to list 1 is PO21 = O/l2.

To assess the significance of a PO score for 2 differentially expressed miRNA lists with lengths l1 and l2, respectively, we did a random experiment to test the null hypothesis that the observed PO score would be expected by chance for 2 random miRNA lists (with lengths of l1 and l2) extracted from miRNAs presented in both data sets. This process was repeated 10,000 times. The significance level of an observed PO score was defined as the percentage of the 10,000 random scores no less than the observed score.

The POF score

We considered 2 miRNAs to be functionally similar if they (i) had the same regulation direction, (ii) shared significantly more target genes than expected by chance, and (iii) shared at least 1 cancer-associated target gene. The significance of the number of targets shared by 2 miRNAs was calculated according to the cumulative hypergeometric distribution model:

$$P=1-\sum^{k-1}_{i=0}\,{{C^i_nC^{M-i}_{N-n}}\over {C^M_N}}$$

in which N is the number of all targets of all miRNAs, M and n are the numbers of targets for the 2 miRNAs, respectively, and k is the number of targets shared by these 2 miRNAs. The P values were corrected by the Benjamini and Hochberg method for multiple testing (35).

Next, we proposed the POF score to evaluate the functional consistency between 2 miRNA lists. Suppose Of12 (or Of21) denotes the number of miRNAs in list 1 (or list 2) not shared but functionally similar to at least 1 miRNA in list 2 (or list 1); the POF score from list 1 to list 2 (or from list 2 to list 1) is then POF12 = (O + Of12)/l1 (or POF21 = (O + Of21)/l2). POF is then determined as POF = (POF12 + POF21)/2.

We tested 2 null hypotheses for an observed POF score by random experiments. The first random experiment, referred to as targets randomization, was used to test the null hypothesis that the score can be expected by chance when no prior biological knowledge of miRNA–target interaction is used. Here, we assigned genes randomly derived from the human genome to miRNAs while keeping the number of targets for each miRNA unchanged. The second random experiment, referred to as miRNAs randomization, was applied to test the null hypothesis that the observed POF score can be expected for non-differentially expressed miRNAs lists with the same lengths as the differentially expressed miRNAs lists. Here, approximately, we defined non-differentially expressed miRNAs for a cancer as the remaining miRNAs after excluding the miRNAs detected to be potentially significant with a rather loose FDR cutoff of 10% in either data set for the cancer. Then, we randomly selected 2 non-differentially expressed miRNA lists with the same lengths as the differentially expressed miRNAs lists and calculated their POF score. This process was repeated 10,000 times. The significance level for an observed POF score was defined as the percentage of the randomization scores no less than the observed value.

Functional link model of differentially expressed miRNAs

For 2 lists of differentially expressed miRNAs separately selected from 2 data sets for a cancer, we constructed a network model by linking every 2 miRNAs between the lists if they are functionally similar according the criteria described above.

For a new list of differentially expressed miRNAs detected from a new data set for a cancer, we can test whether it is associated with a previously extracted model for the same disease. We can do this validation on the basis of the same assumption for defining functionally similar miRNAs. That is, for each miRNA in the new list, if it shows the same regulation direction across all data sets and shares significantly more targets with at least 1 set of common targets of 2 linked miRNAs in the model, including at least 1 cancer-associated gene, then it is defined as a functionally similar miRNA with the model.

Pathway enrichment analysis

The hypergeometic distribution model was used to find the Kyoto Encyclopedia of Genes and Genomes (KEGG; ref. 38) pathways enriched with common targets of 2 differentially expressed miRNAs. If we suppose thats miRNAs M is the total number of the common targets of 2 miRNA, N is the total number of genes in human genome, n is the number of genes annotated in a pathway, and k is the number of the common targets annotated in this pathway, we then calculate the probability of observing by chance at least k targets annotated in this KEGG pathway as follows:

$$P=1-\sum^{k-1}_{i=0}\,{{C^i_nC^{M-i}_{N-n}}\over{C^M_N}}>$$

The raw P values were adjusted by the BH correction procedure (FDR < 5%; ref. 35).

Consistent regulation direction of differentially expressed miRNA in a particular cancer

To investigate the consistency of differentially expressed miRNAs across different data sets, we analyzed 2 data sets for each of the 2 solid cancers described in Table 1. In each data set, we selected differentially expressed miRNAs by t-test at the FDR level of 5%. Considering that different platforms have different degrees of coverage of all human miRNAs, we focused on analyzing the miRNAs presented in both data sets for a cancer.

In the 2 data sets for colon cancer, 22 and 49 differentially expressed miRNAs were selected from the 110 miRNAs presented in both data sets, respectively, and they shared 14 differentially expressed miRNAs (Fig. 1A), in which 13 have the same regulation direction (upregulated or downregulated in cancer samples compared with normal samples) across these 2 data sets. Thus, the PO score (see Materials and Methods) was 0.59 from the short list to the long list and was 0.27 from the opposite direction. These 2 scores were not high, but were significantly larger than the expected scores of 0.32 and 0.15 in the 2 directions (P = 0.0010). In the 2 data sets for gastric cancer, 78 and 18 differentially expressed miRNAs were identified from the 155 miRNAs presented in both data sets and they shared 15 differentially expressed miRNAs (Fig. 1B). The PO score was 0.83 from the short list to the long list, whereas it was only 0.19 from the opposite direction; both values were significantly larger than expected by chance (P < 0.0001). Notably, the longer list of differentially expressed miRNAs included approximately half of all miRNAs detected in these 2 cancer types, indicating that miRNAs are widely altered in cancer.

Figure 1.

Consistence analysis between different differentially expressed miRNA lists for each cancer. A, overlap between the 2 lists of differentially expressed miRNAs extracted from Colon99 and Colon168. In the overlap, 1 miRNA (miR-192, italic) showed inconsistent deregulation directions across these 2 data sets. B, overlap between the 2 lists of differentially expressed miRNAs extracted from Gastric41 and Gastric353. The differentially expressed miRNAs were selected by t-test followed by BH correction (FDR < 5%).

Figure 1.

Consistence analysis between different differentially expressed miRNA lists for each cancer. A, overlap between the 2 lists of differentially expressed miRNAs extracted from Colon99 and Colon168. In the overlap, 1 miRNA (miR-192, italic) showed inconsistent deregulation directions across these 2 data sets. B, overlap between the 2 lists of differentially expressed miRNAs extracted from Gastric41 and Gastric353. The differentially expressed miRNAs were selected by t-test followed by BH correction (FDR < 5%).

Close modal

We then investigated the consistency of the regulation direction of the differentially expressed miRNAs identified from various studies for colon and gastric cancers, respectively. For the 2 colon cancer data sets, 13 (92.9%) of the 14 differentially expressed miRNAs detected in both data sets exhibited the same up- or downregulation across the data sets (P = 9.16E-04, binominal distribution test). For the 49 differentially expressed miRNAs found in the data set Colon99, after excluding the 14 overlapping differentially expressed miRNAs, 13 exhibited tentative evidence of differential expression with P values less than 0.1 in the data set Colon168, and all of them exhibited the same direction change across the 2 data sets (P = 1.22E-04, binominal distribution test). This result indicates that many miRNAs with tentative evidence of differential expression in the data set Colon168 may actually be differentially expressed in colon cancer. Similarly, for gastric cancer, all of the 15 overlapping differentially expressed miRNAs showed the same regulation directions across the 2 data sets (P = 3.05E-05, binominal distribution test). For the 78 differentially expressed miRNAs found in the data set Gastric353, after excluding the 15 overlapping differentially expressed miRNAs, 28 exhibited P values less than 0.1 in the data set Gastric41 and 27 (96.4%) of them showed the same regulation directions across the 2 data sets (P = 1.08E-07, binominal distribution test). The results showed that the regulation direction of the differentially expressed miRNAs detected in both data sets for each of the 2 cancers were highly consistent, indicating that miRNA arrays could reliably capture the signals of the regulation direction of differentially expressed miRNAs in a particular cancer.

Functional consistency of top-ranked differentially expressed miRNA for a particular cancer

Researchers are often interested in the most significant differentially expressed miRNAs. However, for a particular cancer, the top-ranked n most significant differentially expressed miRNAs separately detected from various studies are usually highly inconsistent. For example, when n was equal to 10, only 2 (hsa-miR-106b and -10b) were shared by 2 lists for colon cancer and 3 (hsa-miR-181b, -21, and -218) were shared for gastric cancer. The corresponding PO scores were only 0.2 and 0.3. When n was equal to 20, the PO scores were 0.2 and 0.45 for colon and gastric cancer, respectively. Then, we proposed the POF score (see Materials and Methods) to evaluate the functional consistency of 2 lists of the top-ranked n differentially expressed miRNAs for each cancer on the basis of the assumption that different differentially expressed miRNAs may disturb the same cancer-associated pathways through their common targets. We calculated the POF scores using the miRNA targets predicted by TargetScan (28), miRanda (29), and PicTar (30), respectively. Also, we calculated the scores using miRNA targets documented in at least 2 of the 8 data sources (see Materials and Methods). We found that all the results were quite similar (Table 2), indicating that functional analysis between miRNAs on the basis of their nonrandom target overlap is rather robust against a certain level of false targets predicted for miRNAs. Hence, we only described the results on the basis of the targets predicted by TargetScan in the following text.

Table 2.

Scores for the top n (10 or 20) most significant differentially expressed miRNAs

DatasetMethodPOFe-POF1ae-POF2bPOFe-POF1e-POF2
Colon168 vs. Colon99 TOP 10 TOP 20 
 TargetScan 0.90 0.20 (<0.0001) 0.43 (0.0004) 0.83 0.20 (<0.0001) 0.46 (0.0004) 
 miRanda 0.90 0.20 (<0.0001) 0.50 (0.0006) 0.85 0.20 (<0.0001) 0.50 (<0.0001) 
 PicTar 0.80 0.20 (<0.0001) 0.48 (0.0039) 0.78 0.20 (<0.0001) 0.47 (0.0003) 
 Integratedc 0.90 0.20 (<0.0001) 0.47 (0.0004) 0.85 0.20 (<0.0001) 0.49 (<0.0001) 
Gastric353 vs. Gastric41   
 TargetScan 0.95 0.30 (<0.0001) 0.44 (<0.0001) 0.98 0.45 (<0.0001) 0.47 (<0.0001) 
 miRanda 0.95 0.30 (<0.0001) 0.50 (0.0004) 0.98 0.45 (<0.0001) 0.50 (<0.0001) 
 PicTar 0.95 0.30 (<0.0001) 0.44 (0.0004) 0.95 0.45 (<0.0001) 0.48 (<0.0001) 
 Integratedc 0.95 0.30 (<0.0001) 0.47 (0.0004) 0.98 0.45 (<0.0001) 0.49 (<0.0001) 
DatasetMethodPOFe-POF1ae-POF2bPOFe-POF1e-POF2
Colon168 vs. Colon99 TOP 10 TOP 20 
 TargetScan 0.90 0.20 (<0.0001) 0.43 (0.0004) 0.83 0.20 (<0.0001) 0.46 (0.0004) 
 miRanda 0.90 0.20 (<0.0001) 0.50 (0.0006) 0.85 0.20 (<0.0001) 0.50 (<0.0001) 
 PicTar 0.80 0.20 (<0.0001) 0.48 (0.0039) 0.78 0.20 (<0.0001) 0.47 (0.0003) 
 Integratedc 0.90 0.20 (<0.0001) 0.47 (0.0004) 0.85 0.20 (<0.0001) 0.49 (<0.0001) 
Gastric353 vs. Gastric41   
 TargetScan 0.95 0.30 (<0.0001) 0.44 (<0.0001) 0.98 0.45 (<0.0001) 0.47 (<0.0001) 
 miRanda 0.95 0.30 (<0.0001) 0.50 (0.0004) 0.98 0.45 (<0.0001) 0.50 (<0.0001) 
 PicTar 0.95 0.30 (<0.0001) 0.44 (0.0004) 0.95 0.45 (<0.0001) 0.48 (<0.0001) 
 Integratedc 0.95 0.30 (<0.0001) 0.47 (0.0004) 0.98 0.45 (<0.0001) 0.49 (<0.0001) 

Abbreviation: POF, percentage of overlaps of function.

aExpected POF score (P-value) estimated by doing targets randomization 10,000 times.

bExpected POF score (P-value) estimated by doing miRNAs randomization 10,000 times.

cIntegrated, miRNA–target interactions were in at least 2 of 8 miRNA target sources.

Between the lists of the top 10 (or 20) differentially expressed miRNAs detected separately from the 2 data sets for colon cancer, the POF score was 0.90 (or 0.83). We tested 2 null hypotheses for these observed high POF scores by random experiments (see Materials and Methods). For the targets randomization experiment, the average of 10,000 random POF scores for either the top 10 or 20 miRNAs was 0.20. In 10,000 of the random POF scores, no one exceeded the observed scores (P < 10E-4, Table 2), we thus rejected the null hypothesis that the observed POF score can be expected by chance when no prior biological knowledge of miRNA–target interaction is used. For the miRNAs randomization experiment, we approximately defined non-differentially expressed miRNAs as those miRNAs not detected as differentially expressed miRNA using a loose FDR control of 10% in either data set. By randomly extracting 2 lists of 10 or 20 non-differentially expressed miRNAs 10,000 times, the average of random POF scores was 0.44 or 0.47, significantly smaller than the corresponding observed scores. Thus, we rejected the null hypothesis that the observed POF score can be expected for non-differentially expressed miRNAs lists with the same lengths as the differentially expressed miRNAs lists (Table 2). Similarly, the POF scores for the lists of the top 10 or 20 miRNAs separately detected from the 2 data sets for gastric cancer was 0.95 or 0.98, both of which were significantly higher than those obtained using the above 2 kinds of random experiments (Table 2).

Functional link model of differentially expressed miRNAs for a particular cancer

We constructed the model of functional relationships between the lists of the top 10 differentially expressed miRNAs (Fig. 2A) separately selected from the 2 data sets for colon cancer (see Materials and Methods). On the basis of this model, we explained some functional links. For example, hsa-miR-106a, and -20a were detected as upregulated differentially expressed miRNAs in Colon168 but not in Colon99. However, they may play similar function with hsa-miR-106b which was detected as an upregulated differentially expressed miRNA in both data sets. All these 3 miRNAs belong to the miR-17 family and regulate the same set of targets including 15 known tumor suppressor genes such as CDKN1A, PIK3R1, RB1, STK11 and TGFBR2. These common targets were significantly enriched in some cancer-associated pathways such as TGF-β signaling pathway (P = 2.47E-5), mTOR signaling pathway (P = 3.16E-5), and cell cycle (P = 4.91E-3). For another example, hsa-miR-199b-5p was detected as an upregulated differentially expressed miRNA only in Colon99, but it may play the same role with hsa-miR-135b upregulated in Colon168 as they cotargeted significantly more targets including 1 tumor suppressor gene (MAP2K4). These common targets were significantly enriched in some cancer-associated pathways such as ErbB signaling pathway (P = 2.90E-4). Similarly, hsa-miR-30a was detected as a downregulated miRNA only in Colon99, but it may play the same role with hsa-miR-30c detected as a downregulated differentially expressed miRNA in Colon168. Both of them were from the miR-30 family and shared the same targets including 36 known oncogenes such as ABL1, BCL2, KRAS and PDGFRB. These common targets were significantly enriched in some cancer-associate pathways such as axon guidance (P = 2.48E-6), focal adhesion (P = 1.19E-3) and MAPK signaling pathway (P = 3.28E-3).

Figure 2.

The function link model between the 2 lists of the top-ranked 10 differentially expressed miRNAs for each cancer. For 2 lists of differentially expressed miRNAs for a cancer, we linked every 2 functionally similar miRNAs between the lists to construct the model (see Materials and Methods). A, the model between the top-ranked 10 differentially expressed miRNAs from Colon99 (cycle and diamond) and the top-ranked 10 differentially expressed miRNAs from Colon168 (rectangle and diamond); B, the model between the top-ranked 10 differentially expressed miRNAs from Gastric353 (cycle and diamond) and the top-ranked 10 differentially expressed miRNAs from Gastric41 (rectangle and diamond). For a cancer, the upregulated module consists of upregulated miRNAs (left) and the downregulated module consists of downregulated miRNAs (right).

Figure 2.

The function link model between the 2 lists of the top-ranked 10 differentially expressed miRNAs for each cancer. For 2 lists of differentially expressed miRNAs for a cancer, we linked every 2 functionally similar miRNAs between the lists to construct the model (see Materials and Methods). A, the model between the top-ranked 10 differentially expressed miRNAs from Colon99 (cycle and diamond) and the top-ranked 10 differentially expressed miRNAs from Colon168 (rectangle and diamond); B, the model between the top-ranked 10 differentially expressed miRNAs from Gastric353 (cycle and diamond) and the top-ranked 10 differentially expressed miRNAs from Gastric41 (rectangle and diamond). For a cancer, the upregulated module consists of upregulated miRNAs (left) and the downregulated module consists of downregulated miRNAs (right).

Close modal

Similarly, we constructed the model of functional relationships between the 2 lists of the top 10 differentially expressed miRNAs (Fig. 2B) separately selected from the 2 data sets for gastric cancer. In this model, hsa-miR-93 was detected as an upregulated differentially expressed miRNA exclusively in data set Gastric353. However, it might play a similar regulatory role with hsa-miR-25 which is in the same cluster with hsa-miR-93 and was detected as an upregulated differentially expressed miRNA in data set Gastric41 as they shared 127 targets (P = 8.72E-12). Their common targets were significantly enriched in the cancer-associated TGF-β signaling pathway (P = 2.28E-8). For another example, hsa-miR-181c was upregulated in Gastric353 but they may play the same function with hsa-miR-181b detected to be upregulated in Gastric41. These 2 miRNAs belong to the miR-181 family and they regulate the same set of targets including 14 known tumor suppressor genes such as ATM, CACNA2D2, and CYLD. The common targets were significantly enriched in some cancer-related pathways such as T cell receptor signaling pathway (P = 1.26E-3) and apoptosis (P = 4.89E-2). Similarly, hsa-miR-148b and hsa-miR-212, identified as downregulated differentially expressed miRNAs separately in data set Gastric353 and Gastric41, might also play a similar regulatory role in cancer as they shared significantly more common targets (P = 2.40E-10) including 10 cancer-associated genes such as DDX6, SMAD2, and NFAT5. These common targets were significantly enriched in some cancer-related pathways such as the Wnt signaling pathway (P = 2.51E-5) and the TGF-β signaling pathway (P = 4.80E-6).

We were able to collect another independent data set (Colon108) for colon cancer. Here, we used this data set to validate the functional link model constructed for the 2 lists of the top 10 differentially expressed miRNAs separately detected from the previous 2 data sets (Fig. 2A). We only analyzed the 101 miRNAs presented in all of the 3 data sets, among which 62 miRNAs were differentially expressed in Colon108. For the 16 miRNAs in the model, 11 were differentially expressed in Colon108. Among these 11 differentially expressed miRNAs, 10 (hsa-miR-1, -106a, -106b, -10b, -135b, -20a, -21, -30a, -30c, and -92a) exhibited the same regulation directions across the 3 data sets, and this was highly unlikely to happen by chance (P = 5.86E-3, binominal distribution test). In the opposite direction, 8 (80%) of the top most significant 10 miRNAs detected from Colon108 were included in or functionally linked with the model, which was significantly more than expected by chance according to both the targets randomization and the miRNAs randomization experiments (both values of P < 0.0001). Among these 8 differentially expressed miRNAs, 3 are the members of this model, including 1 upregulated miRNA (hsa-miR-135b) and 2 downregulated miRNAs (hsa-miR-10b and -30a). The other 5 miRNAs (hsa-miR-7, -147, -195, -9, and -95) were functionally associated with the model (see Materials and Methods). In Parallel, we constructed the functional link model for the 2 lists of top 20 miRNAs separately from the previous 2 data sets (see Supplementary Fig. S1). Similar validation results were observed for this model. Twenty of the 29 miRNAs in the model were differentially expressed in Colon108. Among these 20 differentially expressed miRNAs, 18 exhibited the same regulation directions across the 3 data sets (P = 2.01E-4, binominal distribution test). For the top most significant 20 differentially expressed miRNAs detected from Colon108, 15 (75%) were included in or functionally linked with the model, which was also significant according to both 2 kinds of experiments (both values of P < 0.0001). Therefore, for this cancer type, the differentially expressed miRNAs from the 3 data sets were rather reproducible. Note that for gastric cancer, we were unable to find another large independent data sets to do the same analysis.

Our results showed that miRNA arrays could reliably detect the signals of the regulation direction of differentially expressed miRNA in cancer, in a manner similar to the study of differential gene expression in diseases on the basis of microarrays (25). Our results also suggest that miRNAs are widely altered in a particular cancer and show a consistent up- or downregulation pattern, as a part of the primary pathogenesis or subsequent response in cancer progression. However, for a particular cancer, the top-ranked differentially expressed miRNAs detected from different data sets varied greatly. It has been suggested that, due to the limited statistical powers in the presence of large variations of gene expression in a complex disease, lists of differentially expressed genes detected from various microarray experiments may be highly inconsistent but may generally comprise true discoveries (16, 39). Also, on the basis of miRNA arrays, the apparently irreproducible differentially expressed miRNAs detected from different experiments may all partially capture some important differential expression of miRNAs in the disease. Nevertheless, on the basis of the POF score, we have obtained evidence that top-ranked differentially expressed miRNAs separately detected from different studies for a cancer tend to regulate genes in the same cancer-associated pathways. This result is consistent with the observation that most human miRNA tends to play a principal role in important functions in a redundant manner (20). Notably, when comparing data produced from different platforms with different degrees of coverage of all human miRNAs, we only analyzed the miRNAs commonly detected by the platforms. We could expect that results produced by the same platform could reach higher reproducibility than results produced by different platforms which may have extra between-platform variation.

Notably, we have also investigated the 6 data sets for prostate cancer reviewed by Gandellini and colleagues (11). Among the 5 publicly available data sets, 3 data sets included only 8, 4, and 7 normal samples, respectively, which might be too small to get reliable results. Thus, we only analyzed the data set (denoted as Prostate76) from (40) and the data set (denoted as Prostate80) from (41). From the 83 miRNAs presented in both data sets, 22 and 8 differentially expressed miRNAs were detected for the 2 data sets, respectively and they shared only 1 miRNA (hsa-miR-221; see Supplementary Results). We could not observe significant consistency between these 2 differentially expressed miRNA lists (see Supplementary Results). As suggested by Gandellini and colleagues (11), discrepancy between the 2 data sets could be explained by the difference in their approach of dissection, preservation of specimens. In addition, the phenotypic heterogeneity within each data set might contribute to the inconsistence. Among the 40 stage T2a/b prostate cancer samples in Prosate80, 20 showed chemical relapse within 2 years and 20 did not show chemical relapse within 10 years and 16 miRNAs were differentially expressed between these 2 groups (41). Among the 60 tumors in Prostate76, 35 were organ confined and 17 showed extraprostatic extension and changed miRNA abundance was also observed across these 2 groups of patients (40). Thus, when we find inconsistent results from different studies for a disease, we need to carefully consider possible influences of diverse factors such as difference in experimental design, biological variation and heterogeneity of the disease.

One limitation of our analysis is that we studied the function of miRNAs through their targets predicted by tools that face the problem of a high false positive rate (42). However, we found our results were rather robust against a certain level of false targets predicted for miRNAs. Nevertheless, it is generally acknowledged that all targets of miRNA comprise a large battery for all biological conditions (43). Thus, further study is needed to achieve a more comprehensive understanding of targets active in carcinogenesis using large-scale matched miRNA and mRNA arrays (40, 44–46). One issue frequently of concern to researchers is the desire to find differentially expressed miRNAs shared by multiple cancer types as well as cancer type-specific differentially expressed miRNAs. For example, Volinia and colleagues (9) compared 6 cancer types and picked 21 upregulated miRNAs associated with more than 3 cancer types, suggesting the complexity and potential for the understanding of common mechanisms apparent in different cancers. Various groups have attempted to identify differentially expressed miRNAs unique to 1 or several cancer types. For example, Bandyopadhyay and colleagues (47) defined cancer type-specific miRNA to be differentially expressed miRNA found in fewer than 2 cancer types. Johnson and colleagues (48) concluded that let-7 was lung cancer specific as it was found to be differentially expressed in lung cancer but not in colon and breast cancer. However, as shown in this study, for a particular cancer various studies also tend to generate different yet reliable differentially expressed miRNAs. Thus, it is possibly misleading to define cancer type-specific differentially expressed miRNAs in such a simple way, as the samples used in current miRNA array experiments would be insufficient to generate a full list of differentially expressed miRNAs for a particular cancer. In our future studies we plan to analyze arrays for additional cancer types, and moreover attempt to identify the cancer type-specific differentially expressed miRNAs by observing differentially expressed miRNAs detected for a variety of cancer types while taking into consideration the opposite regulation direction for different cancer types. For example, in this study, we found that hsa-miR-221 consistently upregulated in both of the 2 gastric cancer data sets while it consistently downregulated in both of the 2 prostate cancer data sets.

In the past few years, miRNA-expression profiling of human tumors has identified many miRNA biomarkers associated with diagnosis, progression, prognosis and treatment for cancers (3). Evaluating the reproducibility of the differentially expressed miRNAs found from different studies for a cancer should be an indispensable step toward the preclinical research. For example, because a specific miRNA can targets multiple genes involved in multiple cancer-associated pathways, it has been proposed that miRNAs could be drug targets in cancer therapy (5). Certainly, the best drug target is the kind of miRNAs consistently disturbed in different cohorts of patients. However, differentially expressed miRNAs identified from different cohorts of patients for a particular cancer are often highly inconsistent and thus hold back the use of miRNAs biomarkers. Our study suggested that miRNAs playing roles in 1 cohort of patients may have functionally similar miRNAs playing the same roles in other cohorts of patients. In other words, each of the functional similar miRNAs may potentially play an important role in tumorigenesis for a part of patients. In this situation, multiplex RNA inhibition targeting strategy is expected for efficient drug design (49). Thus, the reproducibility evaluation of the differentially expressed miRNAs in different cohort of patients could provide a valuable guide for the drug target design. Future work is warranted to suggest combinations of differentially expressed miRNAs for efficient drug design by considering their overall sample coverage for a cancer.

The authors thank the two anonymous referees for their constructive advices and comments to improve this work.

This work was supported by National Natural Science Foundation of China (Grant Nos. 30770558, 30970668, and 81071646), Excellent Youth Foundation of Heilongjiang Province (Grant No. JC200808), Natural Science Foundation of Heilongjiang Province of China (Grant No. QC2010012), and Scientific Research Fund of Heilongjiang Provincial Education Department (Grant No. 11541156).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Engels
BM
,
Hutvagner
G
. 
Principles and effects of microRNA-mediated post-transcriptional gene regulation
.
Oncogene
2006
;
25
:
6163
9
.
2.
Croce
CM
. 
Causes and consequences of microRNA dysregulation in cancer
.
Nat Rev Genet
2009
;
10
:
704
14
.
3.
Calin
GA
,
Croce
CM
. 
MicroRNA signatures in human cancers
.
Nat Rev Cancer
2006
;
6
:
857
66
.
4.
Calin
GA
,
Croce
CM
. 
MicroRNA-cancer connection: the beginning of a new tale
.
Cancer Res
2006
;
66
:
7390
4
.
5.
Garzon
R
,
Marcucci
G
,
Croce
CM
. 
Targeting microRNAs in cancer: rationale, strategies and challenges
.
Nat Rev Drug Discov
2010
;
9
:
775
89
.
6.
Calin
GA
,
Liu
CG
,
Ferracin
M
,
Hyslop
T
,
Spizzo
R
,
Sevignani
C
, et al
Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas
.
Cancer Cell
2007
;
12
:
215
29
.
7.
Schetter
AJ
,
Leung
SY
,
Sohn
JJ
,
Zanetti
KA
,
Bowman
ED
,
Yanaihara
N
, et al
MicroRNA expression profiles associated with prognosis and therapeutic outcome in colon adenocarcinoma
.
JAMA
2008
;
299
:
425
36
.
8.
Sarver
AL
,
French
AJ
,
Borralho
PM
,
Thayanithy
V
,
Oberg
AL
,
Silverstein
KA
, et al
Human colon cancer profiles show differential microRNA expression depending on mismatch repair status and are characteristic of undifferentiated proliferative states
.
BMC Cancer
2009
;
9
:
401
.
9.
Volinia
S
,
Calin
GA
,
Liu
CG
,
Ambs
S
,
Cimmino
A
,
Petrocca
F
, et al
A microRNA expression signature of human solid tumors defines cancer gene targets
.
Proc Natl Acad Sci U S A
2006
;
103
:
2257
61
.
10.
Ueda
T
,
Volinia
S
,
Okumura
H
,
Shimizu
M
,
Taccioli
C
,
Rossi
S
, et al
Relation between microRNA expression and progression and prognosis of gastric cancer: a microRNA expression analysis
.
Lancet Oncol
2010
;
11
:
136
46
.
11.
Gandellini
P
,
Folini
M
,
Zaffaroni
N
. 
Towards the definition of prostate cancer-related microRNAs: where are we now?
Trends Mol Med
2009
;
15
:
381
90
.
12.
Marshall
E
. 
Getting the noise out of gene arrays
.
Science
2004
;
306
:
630
1
.
13.
Ransohoff
DF
. 
Rules of evidence for cancer molecular-marker discovery and validation
.
Nat Rev Cancer
2004
;
4
:
309
14
.
14.
Ransohoff
DF
. 
Bias as a threat to the validity of cancer molecular-marker research
.
Nat Rev Cancer
2005
;
5
:
142
9
.
15.
Ein-Dor
L
,
Kela
I
,
Getz
G
,
Givol
D
,
Domany
E
. 
Outcome signature genes in breast cancer: is there a unique set?
Bioinformatics
2005
;
21
:
171
8
.
16.
Zhang
M
,
Zhang
L
,
Zou
J
,
Yao
C
,
Xiao
H
,
Liu
Q
, et al
Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes
.
Bioinformatics
2009
;
25
:
1662
8
.
17.
Yao
C
,
Li
H
,
Zhou
C
,
Zhang
L
,
Zou
J
,
Guo
Z
. 
Multi-level reproducibility of signature hubs in human interactome for breast cancer metastasis
.
BMC Syst Biol
2010
;
4
:
151
.
18.
Hanahan
D
,
Weinberg
RA
. 
The hallmarks of cancer
.
Cell
2000
;
100
:
57
70
.
19.
Bild
AH
,
Yao
G
,
Chang
JT
,
Wang
Q
,
Potti
A
,
Chasse
D
, et al
Oncogenic pathway signatures in human cancers as a guide to targeted therapies
.
Nature
2006
;
439
:
353
7
.
20.
Ivanovska
I
,
Cleary
MA
. 
Combinatorial microRNAs: working together to make a difference
.
Cell Cycle
2008
;
7
:
3137
42
.
21.
Dalmay
T
,
Edwards
DR
. 
MicroRNAs and the hallmarks of cancer
.
Oncogene
2006
;
25
:
6170
5
.
22.
Osada
H
,
Takahashi
T
. 
MicroRNAs in biological processes and carcinogenesis
.
Carcinogenesis
2007
;
28
:
2
12
.
23.
Yang
D
,
Li
Y
,
Xiao
H
,
Liu
Q
,
Zhang
M
,
Zhu
J
, et al
Gaining confidence in biological interpretation of the microarray data: the functional consistence of the significant GO categories
.
Bioinformatics
2008
;
24
:
265
71
.
24.
Gong
X
,
Wu
R
,
Zhang
Y
,
Zhao
W
,
Cheng
L
,
Gu
Y
, et al
Extracting consistent knowledge from highly inconsistent cancer gene data sources
.
BMC Bioinformatics
2010
;
11
:
76
.
25.
Wang
D
,
Wang
C
,
Zhang
L
,
Xiao
H
,
Shen
X
,
Ren
L
, et al
Evaluation of cDNA microarray data by multiple clones mapping to the same transcript
.
Omics
2009
;
13
:
493
9
.
26.
Zhang
M
,
Yao
C
,
Guo
Z
,
Zou
J
,
Zhang
L
,
Xiao
H
, et al
Apparently low reproducibility of true differential expression discoveries in microarray studies
.
Bioinformatics
2008
;
24
:
2057
63
.
27.
Jiang
Q
,
Hao
Y
,
Wang
G
,
Juan
L
,
Zhang
T
,
Teng
M
, et al
Prioritization of disease microRNAs through a human phenome-microRNAome network
.
BMC Syst Biol
2010
;
4
Suppl 1
:
S2
.
28.
Lewis
BP
,
Burge
CB
,
Bartel
DP
. 
Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets
.
Cell
2005
;
120
:
15
20
.
29.
Betel
D
,
Wilson
M
,
Gabow
A
,
Marks
DS
,
Sander
C
. 
The microRNA.org resource: targets and expression
.
Nucleic Acids Res
2008
;
36
:
D149
53
.
30.
Krek
A
,
Grun
D
,
Poy
MN
,
Wolf
R
,
Rosenberg
L
,
Epstein
EJ
, et al
Combinatorial microRNA target predictions
.
Nat Genet
2005
;
37
:
495
500
.
31.
Creighton
CJ
,
Nagaraja
AK
,
Hanash
SM
,
Matzuk
MM
,
Gunaratne
PH
. 
A bioinformatics tool for linking gene expression profiling results with public databases of microRNA target predictions
.
RNA
2008
;
14
:
2290
6
.
32.
Troyanskaya
O
,
Cantor
M
,
Sherlock
G
,
Brown
P
,
Hastie
T
,
Tibshirani
R
, et al
Missing value estimation methods for DNA microarrays
.
Bioinformatics
2001
;
17
:
520
5
.
33.
Pradervand
S
,
Weber
J
,
Thomas
J
,
Bueno
M
,
Wirapati
P
,
Lefort
K
, et al
Impact of normalization on miRNA microarray expression profiling
.
Rna
2009
;
15
:
493
501
.
34.
Griffiths-Jones
S
,
Grocock
RJ
,
van Dongen
S
,
Bateman
A
,
Enright
AJ
. 
miRBase: microRNA sequences, targets and gene nomenclature
.
Nucleic Acids Res
2006
;
34
:
D140
4
.
35.
Benjamini
Y
,
Hochberg
Y
. 
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J Roy Stat Soc B
1995
;
57
:
289
300
.
36.
Ein-Dor
L
,
Zuk
O
,
Domany
E
. 
Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer
.
Proc Natl Acad Sci U S A
2006
;
103
:
5923
8
.
37.
Irizarry
RA
,
Warren
D
,
Spencer
F
,
Kim
IF
,
Biswal
S
,
Frank
BC
, et al
Multiple-laboratory comparison of microarray platforms
.
Nat Methods
2005
;
2
:
345
50
.
38.
Kanehisa
M
,
Goto
S
. 
KEGG: kyoto encyclopedia of genes and genomes
.
Nucleic Acids Res
2000
;
28
:
27
30
.
39.
Pan
W
,
Lin
J
,
Le
CT
. 
How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach
.
Genome Biol
2002
;
3
:
RESEARCH 0022
.
40.
Ambs
S
,
Prueitt
RL
,
Yi
M
,
Hudson
RS
,
Howe
TM
,
Petrocca
F
, et al
Genomic profiling of microRNA and messenger RNA reveals deregulated microRNA expression in prostate cancer
.
Cancer Res
2008
;
68
:
6162
70
.
41.
Tong
AW
,
Fulgham
P
,
Jay
C
,
Chen
P
,
Khalil
I
,
Liu
S
, et al
MicroRNA profile analysis of human prostate cancers
.
Cancer Gene Ther
2009
;
16
:
206
16
.
42.
Baek
D
,
Villen
J
,
Shin
C
,
Camargo
FD
,
Gygi
SP
,
Bartel
DP
. 
The impact of microRNAs on protein output
.
Nature
2008
;
455
:
64
71
.
43.
Makeyev
EV
,
Maniatis
T
. 
Multilevel regulation of gene expression by microRNAs
.
Science
2008
;
319
:
1789
90
.
44.
Joung
JG
,
Fei
Z
. 
Computational identification of condition-specific miRNA targets based on gene expression profiles and sequence information
.
BMC Bioinformatics
2009
;
10
Suppl 1
:
S34
.
45.
Liu
B
,
Li
J
,
Tsykin
A
. 
Discovery of functional miRNA-mRNA regulatory modules with computational methods
.
J Biomed Inform
2009
;
42
:
685
91
.
46.
Liu
H
,
Kohane
IS
. 
Tissue and process specific microRNA-mRNA co-expression in mammalian development and malignancy
.
PLoS One
2009
;
4
:
e5436
.
47.
Bandyopadhyay
S
,
Mitra
R
,
Maulik
U
,
Zhang
MQ
. 
Development of the human cancer microRNA network
.
Silence
2010
;
1
:
6
.
48.
Johnson
SM
,
Grosshans
H
,
Shingara
J
,
Byrom
M
,
Jarvis
R
,
Cheng
A
, et al
RAS is regulated by the let-7 microRNA family
.
Cell
2005
;
120
:
635
47
.
49.
Zhang
S
,
Chen
L
,
Jung
EJ
,
Calin
GA
. 
Targeting microRNAs with small molecules: from dream to reality
.
Clin Pharmacol Ther
2010
;
87
:
754
8
.