Abstract
Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and capture functionally related differential expression of miRNAs in the disease. Mol Cancer Ther; 10(5); 752–60. ©2011 AACR.
Introduction
MicroRNAs (miRNA) compose a class of small endogenous noncoding RNA capable of regulating gene expression, either by inhibiting translation or promoting mRNA degradation (1). It has been suggested that differential expression of miRNA is involved in the initiation and progression of human cancer and thus may serve as a disease marker to improve diagnosis, prognosis and therapy for cancer (2–5). On the basis of miRNA arrays, numerous studies have been conducted to identify differentially expressed miRNAs between cancer and normal samples (3, 6–10). However, for a particular cancer, differentially expressed miRNA generated from multiple studies are often highly inconsistent. For example, no differentially expressed miRNAs detected from 6 studies for prostate cancer were shared by all of the studies (11). Because the reproducibility of discovered biomarkers is of fundamental importance for validation (12, 13), this problem must be resolved in a rational way before full use of miRNA arrays in biological researches is possible.
The irreproducibility problem may not reflect lack of reliability of the technology platform used but may reflect a lack of understanding regarding a correct validation process for biomarkers (14). It is becoming clear that, in high-throughout experiments for finding biomarkers for a disease, the irreproducibility problem could be induced by, among other factors, diverse biological factors such as biological variation and heterogeneity of molecular changes in the disease (15–17). On the other hand, it is also becoming clear that the diverse molecular changes (18, 19) including expression changes of miRNAs in cancer tend to be redundant regarding function (20). Specifically, diverse expression changes of miRNAs in cancer may primarily affect cancer-associated pathways by targeting cancer genes in these pathways (9, 21, 22). If this is true, different differentially expressed miRNAs generated from different studies for a cancer may be consistent in function, as seen for biomarker discovery based on the mRNA array platform (16, 23). Thus, to evaluate the reproducibility of differentially expressed miRNA discovery in different studies for a particular cancer, 1 reasonable approach involves taking into account the functional relationship (16, 24), rather than simply counting the overlaps. That is, we can design scores to evaluate the reproducibility of biomarker discovery using a biological assumption on functional relations such as expression correlation (16) or functionally similarity (24) between biomarkers. Notably, the underlying biological assumption is statistically testable: if a score is significantly higher than expected by chance, then it can provide evidence supporting that the assumption can partially explain the apparently inconsistent biomarkers.
Here, on the basis of multiple miRNA data sets for 2 cancer types, we showed that the regulation direction (upregulated or downregulated in cancer samples compared with normal samples) of differentially expressed miRNA detected in different data sets for each cancer were highly consistent. This result indicates that the miRNA arrays could be applied to reliably capture signals of the regulation direction of differentially expressed miRNA in cancer (25). Importantly, this result also suggests that the differentially expressed miRNA in a particular cancer has a consistent up- or downregulation pattern. On the analysis of different samples for each cancer, the miRNA displaying the most pronounced differential expression varied greatly, which could be attributed to biological variation of the molecular change of miRNA in the samples (26). On the basis of the biological assumption that 2 differentially expressed miRNAs with significantly overlapped targets may have a similar function (27), we proposed a score entitled percentage of overlaps of function-related miRNAs (POF) to evaluate the functional consistency of differentially expressed miRNAs for a cancer. For each of the 2 cancers, respectively, we found that most of the POF scores between top-ranked differentially expressed miRNAs detected from different studies were rather high, suggesting that the differentially expressed miRNAs detected from different studies may point to functionally important differential expression of miRNA in the disease. Then, using another independent data set for colon cancer, we validated the model of functional links between differentially expressed miRNAs extracted from 2 other data sets for this cancer.
Materials and Methods
Datasets
We analyzed 5 large data sets for 2 cancer types (colon and gastric). For each data set, the number of samples of each state (cancer or normal) was not less than 20. The detailed information was described in Table 1. We referred to each data set using the following nomenclature: cancer type followed by the total number of samples. Three data sets were collected for colon cancer: 2 data sets [Colon99 (6) and Colon168 (7)] were used to analyze the consistency of differentially expressed miRNAs and extract the functional link models, and another data set [Colon108 (8)] was used to validate the functional link models. The Colon108 data set included 38 technical replicates with highly reproducible signals (8). For each sample with replicates in this data set, the expression value of each miRNA was represented by the average of quantile normalized values for this miRNA across the replicates (8).
Dataseta . | (T:N)b . | Platform . | Source of normal sample . |
---|---|---|---|
Colon99 (6) | 78:21 | OSU_CCCc 11k v2 | Normal colonic mucosad |
Colon168 (7) | 84:84 | OSU_CCC v2.0 | Adjacent nontumor tissue |
Colon108 (8) | 80:28 | Illumina v1 | Adjacent nontumor tissue |
Gastric41 (9) | 20:21 | OSU_CCC 11k v2 | Adjacent nontumor tissue |
Gastric353 (10) | 184:169 | OSU_CCC v3.0 | Adjacent nontumor tissue |
Dataseta . | (T:N)b . | Platform . | Source of normal sample . |
---|---|---|---|
Colon99 (6) | 78:21 | OSU_CCCc 11k v2 | Normal colonic mucosad |
Colon168 (7) | 84:84 | OSU_CCC v2.0 | Adjacent nontumor tissue |
Colon108 (8) | 80:28 | Illumina v1 | Adjacent nontumor tissue |
Gastric41 (9) | 20:21 | OSU_CCC 11k v2 | Adjacent nontumor tissue |
Gastric353 (10) | 184:169 | OSU_CCC v3.0 | Adjacent nontumor tissue |
aEach data set was denoted by the following nomenclature: cancer type followed by the total number of samples.
bT, the number of tumor samples; N, the number of normal samples.
cOSU_CCC, Ohio State University custom miRNA microarray chip.
dNormal samples were described as being from the normal colonic mucosa, thus it is unclear whether they were from the patients.
miRNA targets and cancer-associated genes
Currently, Targetscan (28), miRanda (29), and PicTar (30) are among the most widely used miRNA–target interaction prediction algorithms. Thus, we used them for analyses separately. Considering that miRNA targets predicted by multiple algorithms might be more reliable (31), we also used miRNA–targets interactions appearing in at least 2 of the 8 data sources (see Supplementary Data).
A total of 2,104 cancer-associated genes were collected from F-Census which compiles cancer genes from 8 data sources (24).
Data preprocessing and differentially expressed miRNA selection
For the data sets of Colon99, Colon168, Gastric41, and Gastric353, the raw data were preprocessed as follows: (i) the median background signal intensity was subtracted from the signal for each probe; (ii) signal values less than 1 were replaced with 1; (iii) log2-transformation was applied; (iv) probes absent in more than 20% of samples were deleted and other missing values were imputed using the R package kNN imputation function (32); (v) signal values of replicate probes of a miRNA were averaged; and (vi) quantile normalization was carried out (33). For Colon108, the quantile-normalized data of the original study were used. For all platforms, we annotated probes to mature miRNA using miRbase version 16 (34).
A two-sample t-test was carried out to select the differentially expressed miRNAs. The P values were adjusted by the Benjamini and Hochberg (BH) correction procedure to account for multiple tests with the false discovery rate (FDR) < 5% (35). For comparing the top n differentially expressed miRNAs from different data sets, we ranked miRNAs according to the t-test P values to select the most significant n ones. The differentially expressed miRNAs selected for each data set were listed in Supplementary Table S1–S5.
Consistency of regulation direction between two lists of differentially expressed miRNA
For each cancer, when comparing results found from multiple data sets generated on different platforms, we only analyzed the miRNAs presented in all of the data sets. Using the binomial distribution model, we calculated the probability of observing from N randomly selected miRNAs at least m non-differentially expressed miRNAs with consistent regulation directions across 2 data sets as follows:
in which Pe is the random probability that 1 non-differentially expressed miRNA with the same regulation direction exists across 2 data sets. For each cancer, we roughly defined non-differentially expressed miRNA as miRNA that was not detected as differentially expressed miRNA at the FDR level of 10% in either of the data sets for this cancer. The Pe for each cancer was close to 0.5.
The PO score
The PO (percentage of overlaps) metric is frequently used to evaluate the consistency of different lists (26, 36, 37). Clearly, to be used as a disease marker for a disease, a differentially expressed miRNA should have a steady regulation pattern across different data sets. In this study we strictly defined differentially expressed miRNA, shared by 2 lists of differentially expressed miRNA only to be thus named if it also exhibited the same direction change across the corresponding 2 data sets (16). Suppose O miRNAs are shared by list 1 with length l1 and list 2 with length l2, then the PO score from list 1 to list 2 is PO12 = O/l1, and the score from list 2 to list 1 is PO21 = O/l2.
To assess the significance of a PO score for 2 differentially expressed miRNA lists with lengths l1 and l2, respectively, we did a random experiment to test the null hypothesis that the observed PO score would be expected by chance for 2 random miRNA lists (with lengths of l1 and l2) extracted from miRNAs presented in both data sets. This process was repeated 10,000 times. The significance level of an observed PO score was defined as the percentage of the 10,000 random scores no less than the observed score.
The POF score
We considered 2 miRNAs to be functionally similar if they (i) had the same regulation direction, (ii) shared significantly more target genes than expected by chance, and (iii) shared at least 1 cancer-associated target gene. The significance of the number of targets shared by 2 miRNAs was calculated according to the cumulative hypergeometric distribution model:
in which N is the number of all targets of all miRNAs, M and n are the numbers of targets for the 2 miRNAs, respectively, and k is the number of targets shared by these 2 miRNAs. The P values were corrected by the Benjamini and Hochberg method for multiple testing (35).
Next, we proposed the POF score to evaluate the functional consistency between 2 miRNA lists. Suppose Of12 (or Of21) denotes the number of miRNAs in list 1 (or list 2) not shared but functionally similar to at least 1 miRNA in list 2 (or list 1); the POF score from list 1 to list 2 (or from list 2 to list 1) is then POF12 = (O + Of12)/l1 (or POF21 = (O + Of21)/l2). POF is then determined as POF = (POF12 + POF21)/2.
We tested 2 null hypotheses for an observed POF score by random experiments. The first random experiment, referred to as targets randomization, was used to test the null hypothesis that the score can be expected by chance when no prior biological knowledge of miRNA–target interaction is used. Here, we assigned genes randomly derived from the human genome to miRNAs while keeping the number of targets for each miRNA unchanged. The second random experiment, referred to as miRNAs randomization, was applied to test the null hypothesis that the observed POF score can be expected for non-differentially expressed miRNAs lists with the same lengths as the differentially expressed miRNAs lists. Here, approximately, we defined non-differentially expressed miRNAs for a cancer as the remaining miRNAs after excluding the miRNAs detected to be potentially significant with a rather loose FDR cutoff of 10% in either data set for the cancer. Then, we randomly selected 2 non-differentially expressed miRNA lists with the same lengths as the differentially expressed miRNAs lists and calculated their POF score. This process was repeated 10,000 times. The significance level for an observed POF score was defined as the percentage of the randomization scores no less than the observed value.
Functional link model of differentially expressed miRNAs
For 2 lists of differentially expressed miRNAs separately selected from 2 data sets for a cancer, we constructed a network model by linking every 2 miRNAs between the lists if they are functionally similar according the criteria described above.
For a new list of differentially expressed miRNAs detected from a new data set for a cancer, we can test whether it is associated with a previously extracted model for the same disease. We can do this validation on the basis of the same assumption for defining functionally similar miRNAs. That is, for each miRNA in the new list, if it shows the same regulation direction across all data sets and shares significantly more targets with at least 1 set of common targets of 2 linked miRNAs in the model, including at least 1 cancer-associated gene, then it is defined as a functionally similar miRNA with the model.
Pathway enrichment analysis
The hypergeometic distribution model was used to find the Kyoto Encyclopedia of Genes and Genomes (KEGG; ref. 38) pathways enriched with common targets of 2 differentially expressed miRNAs. If we suppose thats miRNAs M is the total number of the common targets of 2 miRNA, N is the total number of genes in human genome, n is the number of genes annotated in a pathway, and k is the number of the common targets annotated in this pathway, we then calculate the probability of observing by chance at least k targets annotated in this KEGG pathway as follows:
The raw P values were adjusted by the BH correction procedure (FDR < 5%; ref. 35).
Results
Consistent regulation direction of differentially expressed miRNA in a particular cancer
To investigate the consistency of differentially expressed miRNAs across different data sets, we analyzed 2 data sets for each of the 2 solid cancers described in Table 1. In each data set, we selected differentially expressed miRNAs by t-test at the FDR level of 5%. Considering that different platforms have different degrees of coverage of all human miRNAs, we focused on analyzing the miRNAs presented in both data sets for a cancer.
In the 2 data sets for colon cancer, 22 and 49 differentially expressed miRNAs were selected from the 110 miRNAs presented in both data sets, respectively, and they shared 14 differentially expressed miRNAs (Fig. 1A), in which 13 have the same regulation direction (upregulated or downregulated in cancer samples compared with normal samples) across these 2 data sets. Thus, the PO score (see Materials and Methods) was 0.59 from the short list to the long list and was 0.27 from the opposite direction. These 2 scores were not high, but were significantly larger than the expected scores of 0.32 and 0.15 in the 2 directions (P = 0.0010). In the 2 data sets for gastric cancer, 78 and 18 differentially expressed miRNAs were identified from the 155 miRNAs presented in both data sets and they shared 15 differentially expressed miRNAs (Fig. 1B). The PO score was 0.83 from the short list to the long list, whereas it was only 0.19 from the opposite direction; both values were significantly larger than expected by chance (P < 0.0001). Notably, the longer list of differentially expressed miRNAs included approximately half of all miRNAs detected in these 2 cancer types, indicating that miRNAs are widely altered in cancer.
We then investigated the consistency of the regulation direction of the differentially expressed miRNAs identified from various studies for colon and gastric cancers, respectively. For the 2 colon cancer data sets, 13 (92.9%) of the 14 differentially expressed miRNAs detected in both data sets exhibited the same up- or downregulation across the data sets (P = 9.16E-04, binominal distribution test). For the 49 differentially expressed miRNAs found in the data set Colon99, after excluding the 14 overlapping differentially expressed miRNAs, 13 exhibited tentative evidence of differential expression with P values less than 0.1 in the data set Colon168, and all of them exhibited the same direction change across the 2 data sets (P = 1.22E-04, binominal distribution test). This result indicates that many miRNAs with tentative evidence of differential expression in the data set Colon168 may actually be differentially expressed in colon cancer. Similarly, for gastric cancer, all of the 15 overlapping differentially expressed miRNAs showed the same regulation directions across the 2 data sets (P = 3.05E-05, binominal distribution test). For the 78 differentially expressed miRNAs found in the data set Gastric353, after excluding the 15 overlapping differentially expressed miRNAs, 28 exhibited P values less than 0.1 in the data set Gastric41 and 27 (96.4%) of them showed the same regulation directions across the 2 data sets (P = 1.08E-07, binominal distribution test). The results showed that the regulation direction of the differentially expressed miRNAs detected in both data sets for each of the 2 cancers were highly consistent, indicating that miRNA arrays could reliably capture the signals of the regulation direction of differentially expressed miRNAs in a particular cancer.
Functional consistency of top-ranked differentially expressed miRNA for a particular cancer
Researchers are often interested in the most significant differentially expressed miRNAs. However, for a particular cancer, the top-ranked n most significant differentially expressed miRNAs separately detected from various studies are usually highly inconsistent. For example, when n was equal to 10, only 2 (hsa-miR-106b and -10b) were shared by 2 lists for colon cancer and 3 (hsa-miR-181b, -21, and -218) were shared for gastric cancer. The corresponding PO scores were only 0.2 and 0.3. When n was equal to 20, the PO scores were 0.2 and 0.45 for colon and gastric cancer, respectively. Then, we proposed the POF score (see Materials and Methods) to evaluate the functional consistency of 2 lists of the top-ranked n differentially expressed miRNAs for each cancer on the basis of the assumption that different differentially expressed miRNAs may disturb the same cancer-associated pathways through their common targets. We calculated the POF scores using the miRNA targets predicted by TargetScan (28), miRanda (29), and PicTar (30), respectively. Also, we calculated the scores using miRNA targets documented in at least 2 of the 8 data sources (see Materials and Methods). We found that all the results were quite similar (Table 2), indicating that functional analysis between miRNAs on the basis of their nonrandom target overlap is rather robust against a certain level of false targets predicted for miRNAs. Hence, we only described the results on the basis of the targets predicted by TargetScan in the following text.
Dataset . | Method . | POF . | e-POF1a . | e-POF2b . | POF . | e-POF1 . | e-POF2 . |
---|---|---|---|---|---|---|---|
Colon168 vs. Colon99 | TOP 10 | TOP 20 | |||||
TargetScan | 0.90 | 0.20 (<0.0001) | 0.43 (0.0004) | 0.83 | 0.20 (<0.0001) | 0.46 (0.0004) | |
miRanda | 0.90 | 0.20 (<0.0001) | 0.50 (0.0006) | 0.85 | 0.20 (<0.0001) | 0.50 (<0.0001) | |
PicTar | 0.80 | 0.20 (<0.0001) | 0.48 (0.0039) | 0.78 | 0.20 (<0.0001) | 0.47 (0.0003) | |
Integratedc | 0.90 | 0.20 (<0.0001) | 0.47 (0.0004) | 0.85 | 0.20 (<0.0001) | 0.49 (<0.0001) | |
Gastric353 vs. Gastric41 | |||||||
TargetScan | 0.95 | 0.30 (<0.0001) | 0.44 (<0.0001) | 0.98 | 0.45 (<0.0001) | 0.47 (<0.0001) | |
miRanda | 0.95 | 0.30 (<0.0001) | 0.50 (0.0004) | 0.98 | 0.45 (<0.0001) | 0.50 (<0.0001) | |
PicTar | 0.95 | 0.30 (<0.0001) | 0.44 (0.0004) | 0.95 | 0.45 (<0.0001) | 0.48 (<0.0001) | |
Integratedc | 0.95 | 0.30 (<0.0001) | 0.47 (0.0004) | 0.98 | 0.45 (<0.0001) | 0.49 (<0.0001) |
Dataset . | Method . | POF . | e-POF1a . | e-POF2b . | POF . | e-POF1 . | e-POF2 . |
---|---|---|---|---|---|---|---|
Colon168 vs. Colon99 | TOP 10 | TOP 20 | |||||
TargetScan | 0.90 | 0.20 (<0.0001) | 0.43 (0.0004) | 0.83 | 0.20 (<0.0001) | 0.46 (0.0004) | |
miRanda | 0.90 | 0.20 (<0.0001) | 0.50 (0.0006) | 0.85 | 0.20 (<0.0001) | 0.50 (<0.0001) | |
PicTar | 0.80 | 0.20 (<0.0001) | 0.48 (0.0039) | 0.78 | 0.20 (<0.0001) | 0.47 (0.0003) | |
Integratedc | 0.90 | 0.20 (<0.0001) | 0.47 (0.0004) | 0.85 | 0.20 (<0.0001) | 0.49 (<0.0001) | |
Gastric353 vs. Gastric41 | |||||||
TargetScan | 0.95 | 0.30 (<0.0001) | 0.44 (<0.0001) | 0.98 | 0.45 (<0.0001) | 0.47 (<0.0001) | |
miRanda | 0.95 | 0.30 (<0.0001) | 0.50 (0.0004) | 0.98 | 0.45 (<0.0001) | 0.50 (<0.0001) | |
PicTar | 0.95 | 0.30 (<0.0001) | 0.44 (0.0004) | 0.95 | 0.45 (<0.0001) | 0.48 (<0.0001) | |
Integratedc | 0.95 | 0.30 (<0.0001) | 0.47 (0.0004) | 0.98 | 0.45 (<0.0001) | 0.49 (<0.0001) |
Abbreviation: POF, percentage of overlaps of function.
aExpected POF score (P-value) estimated by doing targets randomization 10,000 times.
bExpected POF score (P-value) estimated by doing miRNAs randomization 10,000 times.
cIntegrated, miRNA–target interactions were in at least 2 of 8 miRNA target sources.
Between the lists of the top 10 (or 20) differentially expressed miRNAs detected separately from the 2 data sets for colon cancer, the POF score was 0.90 (or 0.83). We tested 2 null hypotheses for these observed high POF scores by random experiments (see Materials and Methods). For the targets randomization experiment, the average of 10,000 random POF scores for either the top 10 or 20 miRNAs was 0.20. In 10,000 of the random POF scores, no one exceeded the observed scores (P < 10E-4, Table 2), we thus rejected the null hypothesis that the observed POF score can be expected by chance when no prior biological knowledge of miRNA–target interaction is used. For the miRNAs randomization experiment, we approximately defined non-differentially expressed miRNAs as those miRNAs not detected as differentially expressed miRNA using a loose FDR control of 10% in either data set. By randomly extracting 2 lists of 10 or 20 non-differentially expressed miRNAs 10,000 times, the average of random POF scores was 0.44 or 0.47, significantly smaller than the corresponding observed scores. Thus, we rejected the null hypothesis that the observed POF score can be expected for non-differentially expressed miRNAs lists with the same lengths as the differentially expressed miRNAs lists (Table 2). Similarly, the POF scores for the lists of the top 10 or 20 miRNAs separately detected from the 2 data sets for gastric cancer was 0.95 or 0.98, both of which were significantly higher than those obtained using the above 2 kinds of random experiments (Table 2).
Functional link model of differentially expressed miRNAs for a particular cancer
We constructed the model of functional relationships between the lists of the top 10 differentially expressed miRNAs (Fig. 2A) separately selected from the 2 data sets for colon cancer (see Materials and Methods). On the basis of this model, we explained some functional links. For example, hsa-miR-106a, and -20a were detected as upregulated differentially expressed miRNAs in Colon168 but not in Colon99. However, they may play similar function with hsa-miR-106b which was detected as an upregulated differentially expressed miRNA in both data sets. All these 3 miRNAs belong to the miR-17 family and regulate the same set of targets including 15 known tumor suppressor genes such as CDKN1A, PIK3R1, RB1, STK11 and TGFBR2. These common targets were significantly enriched in some cancer-associated pathways such as TGF-β signaling pathway (P = 2.47E-5), mTOR signaling pathway (P = 3.16E-5), and cell cycle (P = 4.91E-3). For another example, hsa-miR-199b-5p was detected as an upregulated differentially expressed miRNA only in Colon99, but it may play the same role with hsa-miR-135b upregulated in Colon168 as they cotargeted significantly more targets including 1 tumor suppressor gene (MAP2K4). These common targets were significantly enriched in some cancer-associated pathways such as ErbB signaling pathway (P = 2.90E-4). Similarly, hsa-miR-30a was detected as a downregulated miRNA only in Colon99, but it may play the same role with hsa-miR-30c detected as a downregulated differentially expressed miRNA in Colon168. Both of them were from the miR-30 family and shared the same targets including 36 known oncogenes such as ABL1, BCL2, KRAS and PDGFRB. These common targets were significantly enriched in some cancer-associate pathways such as axon guidance (P = 2.48E-6), focal adhesion (P = 1.19E-3) and MAPK signaling pathway (P = 3.28E-3).
Similarly, we constructed the model of functional relationships between the 2 lists of the top 10 differentially expressed miRNAs (Fig. 2B) separately selected from the 2 data sets for gastric cancer. In this model, hsa-miR-93 was detected as an upregulated differentially expressed miRNA exclusively in data set Gastric353. However, it might play a similar regulatory role with hsa-miR-25 which is in the same cluster with hsa-miR-93 and was detected as an upregulated differentially expressed miRNA in data set Gastric41 as they shared 127 targets (P = 8.72E-12). Their common targets were significantly enriched in the cancer-associated TGF-β signaling pathway (P = 2.28E-8). For another example, hsa-miR-181c was upregulated in Gastric353 but they may play the same function with hsa-miR-181b detected to be upregulated in Gastric41. These 2 miRNAs belong to the miR-181 family and they regulate the same set of targets including 14 known tumor suppressor genes such as ATM, CACNA2D2, and CYLD. The common targets were significantly enriched in some cancer-related pathways such as T cell receptor signaling pathway (P = 1.26E-3) and apoptosis (P = 4.89E-2). Similarly, hsa-miR-148b and hsa-miR-212, identified as downregulated differentially expressed miRNAs separately in data set Gastric353 and Gastric41, might also play a similar regulatory role in cancer as they shared significantly more common targets (P = 2.40E-10) including 10 cancer-associated genes such as DDX6, SMAD2, and NFAT5. These common targets were significantly enriched in some cancer-related pathways such as the Wnt signaling pathway (P = 2.51E-5) and the TGF-β signaling pathway (P = 4.80E-6).
We were able to collect another independent data set (Colon108) for colon cancer. Here, we used this data set to validate the functional link model constructed for the 2 lists of the top 10 differentially expressed miRNAs separately detected from the previous 2 data sets (Fig. 2A). We only analyzed the 101 miRNAs presented in all of the 3 data sets, among which 62 miRNAs were differentially expressed in Colon108. For the 16 miRNAs in the model, 11 were differentially expressed in Colon108. Among these 11 differentially expressed miRNAs, 10 (hsa-miR-1, -106a, -106b, -10b, -135b, -20a, -21, -30a, -30c, and -92a) exhibited the same regulation directions across the 3 data sets, and this was highly unlikely to happen by chance (P = 5.86E-3, binominal distribution test). In the opposite direction, 8 (80%) of the top most significant 10 miRNAs detected from Colon108 were included in or functionally linked with the model, which was significantly more than expected by chance according to both the targets randomization and the miRNAs randomization experiments (both values of P < 0.0001). Among these 8 differentially expressed miRNAs, 3 are the members of this model, including 1 upregulated miRNA (hsa-miR-135b) and 2 downregulated miRNAs (hsa-miR-10b and -30a). The other 5 miRNAs (hsa-miR-7, -147, -195, -9, and -95) were functionally associated with the model (see Materials and Methods). In Parallel, we constructed the functional link model for the 2 lists of top 20 miRNAs separately from the previous 2 data sets (see Supplementary Fig. S1). Similar validation results were observed for this model. Twenty of the 29 miRNAs in the model were differentially expressed in Colon108. Among these 20 differentially expressed miRNAs, 18 exhibited the same regulation directions across the 3 data sets (P = 2.01E-4, binominal distribution test). For the top most significant 20 differentially expressed miRNAs detected from Colon108, 15 (75%) were included in or functionally linked with the model, which was also significant according to both 2 kinds of experiments (both values of P < 0.0001). Therefore, for this cancer type, the differentially expressed miRNAs from the 3 data sets were rather reproducible. Note that for gastric cancer, we were unable to find another large independent data sets to do the same analysis.
Discussion
Our results showed that miRNA arrays could reliably detect the signals of the regulation direction of differentially expressed miRNA in cancer, in a manner similar to the study of differential gene expression in diseases on the basis of microarrays (25). Our results also suggest that miRNAs are widely altered in a particular cancer and show a consistent up- or downregulation pattern, as a part of the primary pathogenesis or subsequent response in cancer progression. However, for a particular cancer, the top-ranked differentially expressed miRNAs detected from different data sets varied greatly. It has been suggested that, due to the limited statistical powers in the presence of large variations of gene expression in a complex disease, lists of differentially expressed genes detected from various microarray experiments may be highly inconsistent but may generally comprise true discoveries (16, 39). Also, on the basis of miRNA arrays, the apparently irreproducible differentially expressed miRNAs detected from different experiments may all partially capture some important differential expression of miRNAs in the disease. Nevertheless, on the basis of the POF score, we have obtained evidence that top-ranked differentially expressed miRNAs separately detected from different studies for a cancer tend to regulate genes in the same cancer-associated pathways. This result is consistent with the observation that most human miRNA tends to play a principal role in important functions in a redundant manner (20). Notably, when comparing data produced from different platforms with different degrees of coverage of all human miRNAs, we only analyzed the miRNAs commonly detected by the platforms. We could expect that results produced by the same platform could reach higher reproducibility than results produced by different platforms which may have extra between-platform variation.
Notably, we have also investigated the 6 data sets for prostate cancer reviewed by Gandellini and colleagues (11). Among the 5 publicly available data sets, 3 data sets included only 8, 4, and 7 normal samples, respectively, which might be too small to get reliable results. Thus, we only analyzed the data set (denoted as Prostate76) from (40) and the data set (denoted as Prostate80) from (41). From the 83 miRNAs presented in both data sets, 22 and 8 differentially expressed miRNAs were detected for the 2 data sets, respectively and they shared only 1 miRNA (hsa-miR-221; see Supplementary Results). We could not observe significant consistency between these 2 differentially expressed miRNA lists (see Supplementary Results). As suggested by Gandellini and colleagues (11), discrepancy between the 2 data sets could be explained by the difference in their approach of dissection, preservation of specimens. In addition, the phenotypic heterogeneity within each data set might contribute to the inconsistence. Among the 40 stage T2a/b prostate cancer samples in Prosate80, 20 showed chemical relapse within 2 years and 20 did not show chemical relapse within 10 years and 16 miRNAs were differentially expressed between these 2 groups (41). Among the 60 tumors in Prostate76, 35 were organ confined and 17 showed extraprostatic extension and changed miRNA abundance was also observed across these 2 groups of patients (40). Thus, when we find inconsistent results from different studies for a disease, we need to carefully consider possible influences of diverse factors such as difference in experimental design, biological variation and heterogeneity of the disease.
One limitation of our analysis is that we studied the function of miRNAs through their targets predicted by tools that face the problem of a high false positive rate (42). However, we found our results were rather robust against a certain level of false targets predicted for miRNAs. Nevertheless, it is generally acknowledged that all targets of miRNA comprise a large battery for all biological conditions (43). Thus, further study is needed to achieve a more comprehensive understanding of targets active in carcinogenesis using large-scale matched miRNA and mRNA arrays (40, 44–46). One issue frequently of concern to researchers is the desire to find differentially expressed miRNAs shared by multiple cancer types as well as cancer type-specific differentially expressed miRNAs. For example, Volinia and colleagues (9) compared 6 cancer types and picked 21 upregulated miRNAs associated with more than 3 cancer types, suggesting the complexity and potential for the understanding of common mechanisms apparent in different cancers. Various groups have attempted to identify differentially expressed miRNAs unique to 1 or several cancer types. For example, Bandyopadhyay and colleagues (47) defined cancer type-specific miRNA to be differentially expressed miRNA found in fewer than 2 cancer types. Johnson and colleagues (48) concluded that let-7 was lung cancer specific as it was found to be differentially expressed in lung cancer but not in colon and breast cancer. However, as shown in this study, for a particular cancer various studies also tend to generate different yet reliable differentially expressed miRNAs. Thus, it is possibly misleading to define cancer type-specific differentially expressed miRNAs in such a simple way, as the samples used in current miRNA array experiments would be insufficient to generate a full list of differentially expressed miRNAs for a particular cancer. In our future studies we plan to analyze arrays for additional cancer types, and moreover attempt to identify the cancer type-specific differentially expressed miRNAs by observing differentially expressed miRNAs detected for a variety of cancer types while taking into consideration the opposite regulation direction for different cancer types. For example, in this study, we found that hsa-miR-221 consistently upregulated in both of the 2 gastric cancer data sets while it consistently downregulated in both of the 2 prostate cancer data sets.
In the past few years, miRNA-expression profiling of human tumors has identified many miRNA biomarkers associated with diagnosis, progression, prognosis and treatment for cancers (3). Evaluating the reproducibility of the differentially expressed miRNAs found from different studies for a cancer should be an indispensable step toward the preclinical research. For example, because a specific miRNA can targets multiple genes involved in multiple cancer-associated pathways, it has been proposed that miRNAs could be drug targets in cancer therapy (5). Certainly, the best drug target is the kind of miRNAs consistently disturbed in different cohorts of patients. However, differentially expressed miRNAs identified from different cohorts of patients for a particular cancer are often highly inconsistent and thus hold back the use of miRNAs biomarkers. Our study suggested that miRNAs playing roles in 1 cohort of patients may have functionally similar miRNAs playing the same roles in other cohorts of patients. In other words, each of the functional similar miRNAs may potentially play an important role in tumorigenesis for a part of patients. In this situation, multiplex RNA inhibition targeting strategy is expected for efficient drug design (49). Thus, the reproducibility evaluation of the differentially expressed miRNAs in different cohort of patients could provide a valuable guide for the drug target design. Future work is warranted to suggest combinations of differentially expressed miRNAs for efficient drug design by considering their overall sample coverage for a cancer.
Acknowledgments
The authors thank the two anonymous referees for their constructive advices and comments to improve this work.
Grant Support
This work was supported by National Natural Science Foundation of China (Grant Nos. 30770558, 30970668, and 81071646), Excellent Youth Foundation of Heilongjiang Province (Grant No. JC200808), Natural Science Foundation of Heilongjiang Province of China (Grant No. QC2010012), and Scientific Research Fund of Heilongjiang Provincial Education Department (Grant No. 11541156).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.