Abstract
Despite their extensive clinical and pathologic heterogeneity, all malignant germ cell tumors (GCT) are thought to originate from primordial germ cells. However, no common biological abnormalities have been identified to date. We profiled 615 microRNAs (miRNA) in pediatric malignant GCTs, controls, and GCT cell lines (48 samples in total) and re-analyzed available miRNA expression data in adult gonadal malignant GCTs. We applied the bioinformatic algorithm Sylamer to identify miRNAs that are of biological importance by inducing global shifts in mRNA levels. The most significant differentially expressed miRNAs in malignant GCTs were all from the miR-371–373 and miR-302 clusters (adjusted P < 0.00005), which were overexpressed regardless of histologic subtype [yolk sac tumor (YST)/seminoma/embryonal carcinoma (EC)], site (gonadal/extragonadal), or patient age (pediatric/adult). Sylamer revealed that the hexamer GCACTT, complementary to the 2- to 7-nucleotide miRNA seed AAGUGC shared by six members of the miR-371–373 and miR-302 clusters, was the only sequence significantly enriched in the 3′-untranslated region of mRNAs downregulated in pediatric malignant GCTs (as a group), YSTs and ECs, and in adult YSTs (all versus nonmalignant tissue controls; P < 0.05). For the pediatric samples, downregulated genes containing the 3′-untranslated region GCACTT showed significant overrepresentation of Gene Ontology terms related to cancer-associated processes, whereas for downregulated genes lacking GCACTT, Gene Ontology terms generally represented metabolic processes only, with few genes per term (adjusted P < 0.05). We conclude that the miR-371–373 and miR-302 clusters are universally overexpressed in malignant GCTs and coordinately downregulate mRNAs involved in biologically significant pathways. Cancer Res; 70(7); 2911–23
Introduction
Germ cell tumors (GCT) are clinically and pathologically complex neoplasms that occur from the neonatal period up to late adulthood (1). Benign forms show extensive somatic differentiation and are referred to as mature teratomas (MT) or immature teratomas (IT), whereas malignant GCTs are classified into seminomas and nonseminomatous tumors [yolk sac tumors (YST) and embryonal carcinomas (EC); ref. 2].
Despite their heterogeneity, the germ cell theory of tumorigenesis states that all GCTs arise from totipotent primordial germ cells (3). However, biological abnormalities that are conserved across the age and histologic range of malignant GCTs have not yet been identified. Here, we investigate the patterns and consequences of microRNA (miRNA) expression across the spectrum of GCTs. miRNAs regulate gene expression via translational repression and mRNA destabilization (4–6), the latter being detectable from mRNA expression changes (7, 8). This regulation is principally determined by the miRNA seed region, which binds to the seed complementary region (SCR) in the 3′-untranslated region (3′UTR) of mRNA targets (9). The seed region comprises nucleotides 2 to 8 of the miRNA, with nucleotides 2 to 7 being most critical for binding specificity (10). Importantly, miRNAs play a key role in cancer development, both as oncogenes and as tumor suppressor genes (4, 8, 11–13).
Little information is available concerning miRNA profiles in GCTs. One study used quantitative reverse transcription PCR (qRT-PCR) to determine the levels of a restricted range of 156 miRNAs in adult gonadal GCTs, compared with three cases of normal testis (patient age uncertain; ref. 14). This work suggested that the miR-371–373 gene cluster is highly expressed in adult malignant GCTs, a view supported by a genetic screen of primary human cells that implicated miR-372–373 as oncogenes in testicular GCTs (TGCT), acting through inhibition of large tumor suppressor homologue 2 (LATS2; ref. 15).
To date, there is no published miRNA profiling data for GCTs of pediatric patients, nor of those arising at extragonadal sites in any age group. It is therefore unknown whether particular changes in miRNA expression represent a fundamental feature of malignant GCTs. This study had two principal aims. First, we sought to determine global miRNA profiles in pediatric GCTs arising at both gonadal and extragonadal sites, and to compare the changes observed with those reported for adult gonadal malignant GCTs. Second, we applied the bioinformatic algorithm Sylamer (16) to identify miRNA changes that are of biological significance by inducing global shifts of mRNA expression. Our data indicate that the miR-371–373 and miR-302 clusters, of which six members (miR-372–373 and miR-302a–302d) share the identical 2- to 7-nucleotide seed region AAGUGC, have a fundamental role in the pathogenesis of malignant GCTs by downregulating functionally significant target genes.
Materials and Methods
Tumor samples
The study received Multicenter Research Ethics Committee (ref: 02/4/071) and Local Research Ethics Committee (ref: 01/128) approval. We performed miRNA expression profiling on 48 samples, representing 32 pediatric GCTs from 22 female and 10 male patients (12 YSTs, 11 seminomas, 3 ECs, 3 MTs, and 3 ITs), 2 testicular seminomas from young adults, 8 control samples, and 6 GCT cell lines (authenticated using short tandem repeat profiling; ref. 17; Supplementary Fig. S1). Further clinicopathologic details are provided in Supplementary Table S1. To avoid confusion with data from our re-analysis of miRNA expression in adult GCTs (14), all of these samples are henceforth referred to as “pediatric.” We use “seminoma” to refer to all tumors with seminomatous histology, regardless of site (i.e., testicular seminoma, ovarian dysgerminoma, and extragonadal germinoma; refs. 18, 19). The eight control tissues represented four normal gonadal specimens (one case each of prepubertal and postpubertal male and female gonad) and four developmental samples (two fetal yolk sacs and two fetal female gonads).
miRNA microarray expression profiling
Total RNA was isolated as described previously (2). Sample and human reference (20) RNA were hybridized to the miRCURY LNA array platform v9.2 (Exiqon). The miRNA GAL file was updated to miRBase v13.0, which annotated 615 probes on the array. The 48 raw miRNA (.txt) data files (Gene Expression Omnibus accession no. GSE18155) were processed using the BioConductor packages limma and ArrayQualityMetrics in R (21, 22). The median expression value of the quadruplicate spots for each miRNA was calculated after subtraction of background intensities. Within-array (global loess) and between-array (Aquantile method) normalization was performed (23) before a contrast matrix defining all pairwise comparisons was fitted (24). The data were filtered to exclude low variability probes (median expression value interquartile range, <0.6) and subsequently used for unsupervised hierarchical clustering, using a distance measure of 1 minus the Pearson correlation coefficient between samples. For heatmaps, values for each probe were centered by subtracting the mean expression value across samples.
Differential expression was assessed using a moderated t statistic and P values adjusted for multiple testing using Benjamini and Hochberg's method (25). miRNAs with adjusted P < 0.01 were considered statistically significant and differentially expressed. Lists of differentially expressed miRNAs generated for four different comparisons (pediatric malignant GCTs, YSTs, seminomas, and EC versus nonmalignant control tissue) were subsequently used for Sylamer analysis. Only the most significantly differentially expressed miRNAs (adjusted P < 1 × 10⁻⁵) were represented on heatmaps to enhance the visualization of key miRNAs. TaqMan qRT-PCR validation of miRNA levels, normalized to RNU24, was performed as previously described (20).
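The Benjamini and Hochberg adjustment used above can be illustrated with a minimal Python sketch. This is not the limma implementation used in the study; the function name `benjamini_hochberg` and the four raw P values are hypothetical and serve only to show how the adjusted values and the adjusted P < 0.01 threshold interact.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four probes (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

# Apply the significance threshold stated in the text (adjusted P < 0.01).
significant = [a < 0.01 for a in adj]
```

In this toy case no probe passes the threshold after adjustment, although two raw P values are at or below 0.01, which is the behavior the correction is designed to produce.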
We compared our findings with published miRNA expression data for adult gonadal GCTs, as obtained by qRT-PCR (14). The raw cycle threshold (Ct) data file was downloaded from the journal web site. After removal of the four spermatocytic seminoma (SS) samples, which do not occur in the pediatric population, miRNA Ct values for the remaining 60 adult tissue samples and five GCT cell lines were normalized to let-7a (which displayed the least variable expression across all samples), to obtain ΔCt values. The mean of all ΔCt values for all samples was then subtracted to obtain ΔΔCt values, which were used to perform unsupervised hierarchical clustering analyses and to generate lists of differentially expressed genes, employing the criteria used for the pediatric samples.
mRNA expression analysis
Matching global mRNA expression profiles were available for 21 of the 42 pediatric tissue samples examined by miRNA microarray. These represented 17 malignant GCTs (10 YSTs, 6 seminomas, and 1 EC) and 4 nonmalignant controls, comprising 1 MT and 3 normal gonads (1 prepubertal, 1 postpubertal testis, and 1 postpubertal ovary; Supplementary Table S1). Profiling had previously been performed using the HG-U133A GeneChip (Affymetrix), comprising 22,283 probe sets corresponding to 13,042 genes. Data for 16 samples have been previously published (2); the EC, MT, and three normal controls were previously unreported (Gene Expression Omnibus accession no. GSE18155). In addition, we re-analyzed published data from a study of adult TGCTs that also used the HG-U133A GeneChip (ref. 26; Gene Expression Omnibus accession no. GSE3218), excluding two suboptimal YST samples (K14 and K18; ref. 2). We used data from 25 such specimens, representing 8 pure YSTs, 12 pure seminomas, and 5 normal adult testis controls (26).
Raw mRNA (.CEL) files were processed and quantile normalized using Robust Multi-Array Average in R (6, 21, 27), using the Affymetrix annotation of March 2009. Robust Multi-Array Average–transformed expression values were analyzed for differential expression (24), with significance assessed by t test and adjusted for multiple testing (25). Pathway enrichment analysis was performed using the Gene Ontology (GO) algorithm, as it permitted the comparison of differentially expressed genes (log2 fold change ≤−1.5 and adjusted P < 0.01) grouped by the presence or absence of the SCR corresponding to the common 2- to 7-nucleotide seed of the miR-372–373 and miR-302a–302d clusters. National Center for Biotechnology Information Entrez Gene identifiers were evaluated for biological process category overrepresentation within a total gene universe defined by the HG-U133A annotation library, using the hyperGTest function within the BioConductor GOstats package (28). GO terms with adjusted P < 0.01 (25) were considered statistically significant.
Sylamer algorithm
For each ranked gene list derived from the mRNA expression data, we undertook Sylamer analysis for the six SCR elements of increasing size: three hexamers (corresponding to miRNA seed positions 1–6, 2–7, and 3–8), two heptamers (positions 1–7 and 2–8), and one octamer (position 1–8). Due to overrepresentation of conserved adenosines flanking SCRs in mRNAs (10), the complementarity criterion was discarded for SCR position 8 (seed position 1), where the nucleotide was always set to be adenosine, irrespective of the actual nucleotide at that position. For each comparison analyzed, the mRNA gene list was first ranked from downregulated (to the left) to upregulated (to the right). For each SCR under consideration, an enrichment/depletion P value was computed at different cutoffs in the ranked gene list. At each cutoff, an SCR was either enriched in the 3′UTRs of the genes to the left and accordingly depleted in the 3′UTRs of the genes to the right, or conversely depleted on the left and enriched on the right. Enrichment on one side and corresponding depletion on the other side of the cutoff is associated with a single P value. Varying the cutoff resulted in a set of P values for each SCR (Y-axis) visualized on a landscape plot (16), in which the log10-transformed P values were sign-adjusted and plotted against the ranked gene list (X-axis). Sign adjustment depended on the specific enrichment/depletion status of the pertinent SCR. A point plotted along the positive Y-axis signifies that the SCR is enriched in the genes to the left and depleted in the genes to the right, whereas a point plotted along the negative Y-axis conversely signifies depletion to the left and enrichment to the right. The displacement along the Y-axis identifies the significance of the joint enrichment/depletion P value for the SCR at that cutoff, according to the sign-adjusted log10 transformation.
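The six SCR elements described above can be sketched in Python for a single miRNA. This is an illustrative reading of the text, not the Sylamer source code: the helper names `reverse_complement_dna` and `scr_words` are hypothetical, and the rule that the nucleotide opposite seed position 1 is always written as adenosine is implemented as a fixed trailing "A". The miR-372 sequence is taken from Table 1B.

```python
# Watson-Crick complement of an RNA base, written in the DNA alphabet
# used for the 3'UTR words.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_complement_dna(rna):
    """Reverse complement of an RNA string, returned as DNA."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def scr_words(mirna):
    """Return the six SCR words keyed by (first, last) seed position."""
    words = {}
    for start, end in [(1, 6), (2, 7), (3, 8), (1, 7), (2, 8), (1, 8)]:
        if start == 1:
            # Complement seed positions 2..end, then append the fixed
            # adenosine that sits opposite seed position 1.
            words[(start, end)] = reverse_complement_dna(mirna[1:end]) + "A"
        else:
            words[(start, end)] = reverse_complement_dna(mirna[start - 1:end])
    return words


# hsa-miR-372 mature sequence as listed in Table 1B.
mir372 = "AAAGUGCUGCGACAUUUGAGCGU"
words = scr_words(mir372)

for positions, word in sorted(words.items()):
    print(positions, word)
```

For miR-372 the word for seed positions 2 to 7 is GCACTT, the hexamer discussed throughout this study.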
For the present study, we combined Sylamer significance scores for different elements of each SCR to obtain a single summed significance score for each group of miRNAs sharing the same seed region. To do this, miRNAs were assigned to groups, defined by a common seed, and a single score was produced for the combined Sylamer results for the set of SCR hexamers, heptamers, and octamer particular to each group. This approach integrated signals from different word lengths and increased method sensitivity compared with standard Sylamer analysis. The same analysis was applied to all possible words that were eight nucleotides in length (all with an adenosine at position 8), including those that did not represent SCRs. The resulting scores followed an extreme value distribution by the nature of the scoring criteria employed. By fitting this distribution, P values were assigned to the scores, with values of <0.01 considered to be significant. As a filtering step, we only considered miRNA groups that contained at least one significantly differentially expressed miRNA.
Results
miRNA expression profiles in malignant GCTs
In initial unsupervised hierarchical clustering analysis, normalized microarray expression data for the 246 miRNAs that showed variable expression in the 48 pediatric samples and cell lines were used to generate a heatmap (Fig. 1). The dendrogram was divided into two main branches, one containing the pediatric malignant GCT tissues and cell lines, the other containing the nonmalignant tissue, i.e., the teratomas (MT and IT) and the normal and developmental control samples. Only one nonmalignant sample clustered with the malignant GCTs—MT tissue (MT-34) from a mixed GCT that also contained a malignant element. The pediatric malignant GCTs subdivided principally by histologic subtype, with dendrogram subdivisions comprising seminomas, cell lines, YSTs and ECs. The nonmalignant samples subdivided into two branches, with developmental controls (fetal yolk sac and gonads) in one branch and normal gonadal tissue (prepubertal and postpubertal ovary and testis) with the teratomas in the other.
In further analyses of our data set, we focused on the 42 tissue specimens only, removing the six cell lines. No miRNA showed significant differential expression (adjusted P < 0.01) between the MT and IT samples. When comparing the six teratoma samples and eight normal control specimens, only 2 of the 615 miRNAs were differentially expressed, with P values that only just reached significance (miR-9 and miR-9*; adjusted P = 0.0098 and 0.0091, respectively). Consequently, all 14 nonmalignant samples were combined for subsequent comparisons with the pediatric malignant GCT tissues. Comparing all malignant GCTs versus the nonmalignant samples produced a list of 170 significantly differentially expressed miRNAs, of which 44 (25.9%) were overexpressed in malignant GCTs and 126 (74.1%) were underexpressed (Supplementary Table S2A). A heatmap based on the most significantly differentially expressed miRNAs (adjusted P < 1 × 10⁻⁵; n = 65) showed complete segregation between the pediatric malignant GCT and nonmalignant samples (Supplementary Fig. S2). The top 10 differentially expressed miRNAs in this comparison are shown in Table 1A. The top nine were from just two miRNA clusters, miR-371–373 and miR-302 (the latter including miR-367), and all were overexpressed in the malignant GCTs.
(A)
Pediatric rank | Pediatric miRNA | Pediatric log2 fold change | Pediatric adjusted P | Adult rank | Adult miRNA | Adult ΔΔCt | Adult adjusted P
---|---|---|---|---|---|---|---
1 | miR-302a | +4.41 | 6.28E-15 | 1 | miR-371 (-371-3p) | −11.18 | 1.18E-24
2 | miR-373 | +5.40 | 2.55E-14 | 2 | miR-373 | −10.26 | 1.66E-22
3 | miR-367 | +5.10 | 2.55E-14 | 3 | miR-372 | −10.79 | 4.87E-22
4 | miR-302c | +5.07 | 2.55E-14 | 4 | miR-302b | −12.45 | 1.65E-20
5 | miR-371-3p | +4.31 | 2.55E-14 | 5 | miR-302d | −12.41 | 1.65E-20
6 | miR-372 | +4.91 | 4.09E-14 | 6 | miR-373* | −8.72 | 2.85E-20
7 | miR-302d | +5.08 | 6.85E-14 | 7 | miR-367 | −12.38 | 8.08E-20
8 | miR-302b | +4.80 | 4.65E-11 | 8 | miR-302a | −12.14 | 1.30E-19
9 | miR-373* | +1.65 | 5.26E-11 | 9 | miR-302c | −11.41 | 5.62E-17
10 | miRPlus_17892 | +1.07 | 1.38E-10 | 10 | miR-302b* | −9.61 | 8.97E-16

(B)
miRNA | Chromosome location | miRBase accession | 5′ to 3′ sequence
---|---|---|---
hsa-miR-371-3p | 19q13.41 | MIMAT0000723 | AAGUGCCGCCAUCUUUUGAGUGU
hsa-miR-372 | 19q13.41 | MIMAT0000724 | AAAGUGCUGCGACAUUUGAGCGU
hsa-miR-373 | 19q13.41 | MIMAT0000726 | GAAGUGCUUCGAUUUUGGGGUGU
hsa-miR-302a | 4q25 | MIMAT0000684 | UAAGUGCUUCCAUGUUUUGGUGA
hsa-miR-302b | 4q25 | MIMAT0000715 | UAAGUGCUUCCAUGUUUUAGUAG
hsa-miR-302c | 4q25 | MIMAT0000717 | UAAGUGCUUCCAUGUUUCAGUGG
hsa-miR-302d | 4q25 | MIMAT0000718 | UAAGUGCUUCCAUGUUUGAGUGU
hsa-miR-367 | 4q25 | MIMAT0000719 | AAUUGCACUUUAGCAAUGGUGA

(C)
Transcription factor | Chromosome location | Pediatric overall rank (n = 347) | Pediatric overall log2 fold change | Pediatric overall adjusted P | Pediatric GCT subtype rank, n = 523 (Sem)/n = 647 (YST) | Adult overall rank (n = 1,019) | Adult overall log2 fold change | Adult overall adjusted P | Adult GCT subtype rank, n = 1,104 (Sem)/n = 1,196 (YST)
---|---|---|---|---|---|---|---|---|---
NANOG | 12p13.31 | 5 | +3.23 | 7.00E-08 | 1 (Sem) | 250 | +1.85 | 6.54E-09 | 1 (Sem)
TEAD4 (TEF-3) | 12p13.33 | 8 | +3.06 | 1.00E-04 | 34 (Sem); 276 (YST) | 42 | +2.77 | 2.71E-06 | 19 (Sem); 352 (YST)
POU5F1 (OCT3/4) | 6p21.33 | 26 | +2.35 | 1.52E-07 | 13 (Sem) | 254 | +1.85 | 3.26E-07 | 23 (Sem)
TFAP2C | 20q13.2 | 45 | +2.13 | 5.96E-07 | 11 (Sem) | 195 | +1.96 | 3.38E-08 | 9 (Sem)
SOX17 | 8q11.22 | 114 | +1.67 | 3.01E-04 | 48 (Sem); 39 (YST) | 10 | +3.60 | 1.44E-11 | 41 (Sem); 42 (YST)
SOX15 | 17p12.3 | 260 | +1.21 | 2.11E-03 | 198 (Sem) | 570 | +1.35 | 1.10E-04 | 86 (Sem)
NOTE: A, the top 10 differentially expressed miRNAs segregating pediatric and adult malignant GCTs from nonmalignant tissue, ranked by adjusted P value. miRNAs in boldface are members of the miR-371–373 and miR-302 clusters. Note that miR-371, reported in the adult study, is now annotated as miR-371-3p. B, chromosomal location and seed sequence of the miR-371–373 and miR-302 clusters. The common 2- to 7-nucleotide seed region is underlined. C, transcription factors overexpressed in malignant GCTs overall, and at least one histologic subtype (all versus nonmalignant tissues), in both the pediatric and adult data sets. The columns show the rankings, fold change, and P values of genes differentially expressed in malignant GCTs overall. Also shown is the ranking in the main histologic subtypes of GCT, seminoma (Sem) and YST. In all cases, where no ranking is given, the transcription factor was not on the relevant list of differentially expressed genes. Although all six transcription factors were overexpressed in seminomas, SOX17 and TEAD4 were also overexpressed in YSTs.
Our re-analysis of published qRT-PCR profiling of miRNA expression in adult gonadal GCTs is described in Supplementary Results (Supplementary Fig. S3, Supplementary Table S2B). The top 10 differentially expressed miRNAs (Table 1A) were exclusively overexpressed miRNAs from the miR-371–373 and miR-302 clusters. Due to the different platforms used, data for the pediatric and adult samples could not be combined for direct comparison. However, parallel plots of expression values for each pediatric and adult specimen of the eight main members of the miR-371–373 and miR-302 clusters (i.e., non-miR* sequences; ref. 29) confirmed differential expression between the malignant GCTs and the nonmalignant (teratoma and control) samples (Fig. 2A). Expression patterns were similar within histologic subtypes, regardless of patient age.
Hierarchical clustering analysis using just the eight main members of the miR-371–373 and miR-302 clusters showed complete segregation of malignant GCTs from nonmalignant samples for both the pediatric and adult data [a single outlier (an EC sample) for the latter notwithstanding; Fig. 2B]. Interestingly, six of these eight miRNAs (miR-372–373 and miR-302a–302d) share a common key 2- to 7-nucleotide seed region AAGUGC (Table 1B), which corresponds to the SCR hexamer GCACTT in mRNA targets. There was no evidence that expression of these miRNA clusters was DNA copy number–driven in pediatric malignant GCTs, as assessed by 1 Mb interval array-based comparative genomic hybridization (refs. 30, 31; data not shown). For details on miRNA expression profiles in malignant GCT subtypes and cell lines, and associations between miRNA expression and tumor site, see Supplementary Results, Supplementary Tables S2 to S4, and Supplementary Figs. S4 and S5.
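The statement that six of the eight main cluster members share the 2- to 7-nucleotide seed AAGUGC can be checked directly against the sequences in Table 1B. The short Python sketch below does only that; the variable and function names are illustrative and are not part of the original analysis.

```python
# Mature miRNA sequences (5' to 3') as listed in Table 1B.
CLUSTER_MIRNAS = {
    "hsa-miR-371-3p": "AAGUGCCGCCAUCUUUUGAGUGU",
    "hsa-miR-372": "AAAGUGCUGCGACAUUUGAGCGU",
    "hsa-miR-373": "GAAGUGCUUCGAUUUUGGGGUGU",
    "hsa-miR-302a": "UAAGUGCUUCCAUGUUUUGGUGA",
    "hsa-miR-302b": "UAAGUGCUUCCAUGUUUUAGUAG",
    "hsa-miR-302c": "UAAGUGCUUCCAUGUUUCAGUGG",
    "hsa-miR-302d": "UAAGUGCUUCCAUGUUUGAGUGU",
    "hsa-miR-367": "AAUUGCACUUUAGCAAUGGUGA",
}

COMMON_SEED = "AAGUGC"


def seed_2_to_7(sequence):
    """Nucleotides 2 to 7 of a mature miRNA (1-based, inclusive)."""
    return sequence[1:7]


# Names of the miRNAs whose 2-7 seed equals the common seed.
sharing = sorted(
    name for name, seq in CLUSTER_MIRNAS.items()
    if seed_2_to_7(seq) == COMMON_SEED
)
# Remaining members and the 2-7 seed each one carries instead.
others = {
    name: seed_2_to_7(seq)
    for name, seq in CLUSTER_MIRNAS.items()
    if seed_2_to_7(seq) != COMMON_SEED
}

print(sharing)
print(others)
```

In miR-371-3p the string AAGUGC occupies positions 1 to 6 rather than 2 to 7, which is why it falls outside the group of six despite its sequence similarity.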
Transcriptional regulation of miRNA clusters
To identify transcription factors which may be responsible for miR-371–373 and miR-302 cluster overexpression, we examined gene expression profiles in our pediatric data set and in the published adult GCT data set (26). For this screening exercise, we applied less stringent criteria of log2 fold change >1.0 and adjusted P < 0.01. We identified six transcription factors that were overexpressed in malignant GCTs overall, and in at least one malignant subtype analysis, in both the pediatric and adult data sets (Table 1C). Although NANOG, POU5F1, TFAP2C, and SOX15 were specifically overexpressed in seminomas, SOX17 and TEAD4 were overexpressed in both seminomas and YSTs. The pediatric EC overexpressed all transcription factors except SOX17. For the 21 pediatric samples for which matched miRNA and mRNA expression data were available, linear regression analysis showed a positive correlation between the median expression value for the eight main miRNAs from the miR-371–373 and miR-302 clusters, and the expression levels of SOX17 and TEAD4 (P < 0.0005 and P = 0.06, respectively; Fig. 3A).
Sylamer analysis of mRNA profiles in malignant GCTs
Having validated the expression levels of selected miRNAs using qRT-PCR (Supplementary Results; Supplementary Fig. S5), we next performed Sylamer analysis on complete mRNA gene lists ranked according to differential expression between pediatric malignant GCTs and nonmalignant tissues. We focused on SCRs corresponding to seeds in miRNAs that we had identified as being differentially expressed, as the majority of pediatric malignant GCT samples in which miRNA profiling had been performed had matched mRNA profiling data. For further details of the analysis model, and the demonstration of the advantages of miRNA target identification using Sylamer rather than standard prediction methods, see Supplementary Results and Supplementary Figs. S6 and S7.
We observed that the SCR hexamer GCACTT, complementary to the common 2- to 7-nucleotide seed region AAGUGC of miR-372–373 and miR-302a–302d, was the most enriched in the mRNAs downregulated in pediatric malignant GCTs (the single P value for all SCR elements of different length was 3.81 × 10⁻⁴), YSTs (P = 4.96 × 10⁻⁵), and ECs (P = 7.18 × 10⁻⁴) compared with nonmalignant tissues (Fig. 4). No enrichment for this common SCR was seen when pediatric seminomas were compared with controls. Similar analysis of published data for adult TGCTs (26) also revealed that the common SCR hexamer GCACTT was the most enriched in genes underexpressed in adult testicular YSTs versus normal adult testis (the single P value for all SCR elements was 1.69 × 10⁻³; Fig. 4). This hexamer showed nonsignificant enrichment in downregulated genes when considering all adult TGCTs (P = 0.09; Fig. 4) and adult testicular seminomas compared with normal adult testis. No other SCRs corresponding to the seeds of significantly overexpressed miRNAs were overrepresented (P < 0.01) in the 3′UTRs of genes underexpressed in malignant GCTs in either the pediatric or adult data sets. For Sylamer analysis of global shifts in mRNA profiles corresponding to miRNAs underexpressed in malignant GCTs, see Supplementary Results and Supplementary Fig. S8.
Pathway enrichment analysis of downregulated mRNAs in malignant GCTs
Of 120 mRNAs that we identified as significantly downregulated in pediatric malignant GCTs versus nonmalignant tissue samples, transcript and 3′UTR information were available for 102. Sylamer identified that although the common 2- to 7-nucleotide SCR GCACTT was present in the 3′UTR of 17.4% (n = 30) of the 172 upregulated mRNAs in this comparison (log2 fold change >1.5 and adjusted P < 0.01) and a similar percentage of all 13,042 genes covered by the Affymetrix U133A GeneChip (16.3%; n = 2,125), it was enriched in the 102 downregulated mRNAs, being present in 41 (40.2%; Table 2A). The 41 mRNAs showed significant overrepresentation of the GO terms “regulation of cellular and biological processes”, “intracellular signaling cascade” and the three related terms “regulation of GTPase activity”, “regulation of Ras protein signal transduction”, and “regulation of small GTPase signal transduction” (Table 2B). In contrast, for the remaining 61 downregulated mRNAs in which the common SCR was absent, the overrepresented GO terms were generally related to a small number of metabolic processes only, with a small number of genes per term (Table 2C).
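The proportions quoted above can be reproduced, and the size of the enrichment gauged, with a short Python sketch. The hypergeometric upper-tail test shown here is an illustration only and is not the statistic reported in this study: it assumes, for simplicity, that the 102 downregulated genes can be treated as a draw from the 13,042 genes on the array, of which 2,125 carry GCACTT in their 3′UTR. The function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    total = comb(N, n)
    # Exact integer sum over every outcome at least as extreme as k.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Counts taken from the text: (genes with GCACTT in the 3'UTR, genes in group).
groups = {
    "downregulated": (41, 102),
    "upregulated": (30, 172),
    "all array genes": (2125, 13042),
}

# Percentage of genes in each group that carry the SCR, to one decimal place.
percent = {name: round(100 * hit / size, 1) for name, (hit, size) in groups.items()}

# Illustrative enrichment test for the downregulated group (assumption:
# the array-wide counts define the background).
p_down = hypergeom_upper_tail(41, 102, 2125, 13042)

print(percent)
print(p_down)
```

Under these assumptions the downregulated group carries the hexamer at roughly 2.5 times the background rate, whereas the upregulated group sits close to background.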
(A)
Pediatric GCT data set | Number of downregulated genes | Downregulated genes with transcript and 3′UTR information (%) | Common SCR present in 3′UTR (%) | Common SCR absent in 3′UTR (%)
---|---|---|---|---
Malignant GCT vs controls | 120 | 102 (85.0) | 41 (40.2) | 61 (59.8)
YST vs controls | 146 | 126 (86.3) | 47 (37.3) | 79 (62.7)
Seminoma vs controls | 189 | 159 (84.1) | 51 (32.1) | 108 (67.9)

(B)
GO term | GOBPID | Total genes in GO term | Gene count | Expected gene count | Adjusted P
---|---|---|---|---|---
Regulation of small GTPase signal transduction | GO:0051056 | 115 | 5 | 0.499 | 9.90E-04
Regulation of cellular process | GO:0050794 | 2,943 | 22 | 12.78 | 3.20E-03
Regulation of Ras protein signal transduction | GO:0046578 | 92 | 4 | 0.4 | 5.20E-03
Regulation of biological process | GO:0050789 | 3,033 | 22 | 13.171 | 5.50E-03
Regulation of Ras GTPase activity | GO:0032318 | 44 | 3 | 0.191 | 7.20E-03
Intracellular signaling cascade | GO:0007242 | 769 | 10 | 3.339 | 8.00E-03

(C)
GO term | GOBPID | Total genes in GO term | Gene count | Expected gene count | Adjusted P
---|---|---|---|---|---
Malate metabolic process | GO:0006108 | 5 | 2 | 0.031 | 2.60E-03
Fat-soluble vitamin metabolic process | GO:0006775 | 6 | 2 | 0.037 | 3.90E-03
Vitamin A metabolic process | GO:0006776 | 6 | 2 | 0.037 | 3.90E-03
Golgi organization and biogenesis | GO:0007030 | 8 | 2 | 0.05 | 7.20E-03

(D)
Accession | Name | Pediatric rank (n = 102) | Pediatric log2 fold change | Adult rank (n = 212) | Adult log2 fold change | Function (OMIM: http://www.ncbi.nlm.nih.gov/omim/ and GENATLAS: http://genatlas.medecine.univ-paris5.fr/)
---|---|---|---|---|---|---
NM_000849 | GSTM3 | 1 | −3.38 | 3 | −5.03 | Metabolic, mutated in cancer
NM_053001 | OSR2 | 3 | −2.69 | 95 | −2.19 | Zinc finger protein, transcription factor, development
NM_006379 | SEMA3C | 4 | −2.51 | 78 | −2.41 | Immunoglobulin domain, short basic domain
NM_181847 | AMIGO2 | 5 | −2.50 | 161 | −1.76 | Adhesion molecule
NM_006167 | NKX3-1 | 7 | −2.47 | 19 | −3.68 | Transcription factor; downregulated in TGCT/prostate cancer
NM_207304 | MBNL2 | 8 | −2.35 | 114 | −2.07 | Zinc finger protein, regulates alternative splicing
NM_018013 | SOBP | 9 | −2.16 | 42 | −3.02 | Nuclear zinc finger protein; cell fate and patterning
NM_001023567 | GOLGA8B | 10 | −2.00 | 75 | −2.45 | Golgi autoantigen, golgin subfamily a, 8B
NM_178140 | PDZD2 | 11 | −2.02 | 31 | −3.33 | Transmembrane receptor binding protein
NM_002736 | PRKAR2B | 12 | −2.01 | 127 | −2.00 | cAMP-dependent protein kinase
NM_001015045 | FAM13A1 | 17 | −1.94 | 113 | −2.08 | Family with sequence similarity 13, A1. Function unknown
NM_015230 | ARAP2 | 18 | −1.92 | 145 | −1.85 | ArfGAP protein that regulates focal adhesion
NM_005491 | MAMLD1 | 22 | −1.78 | 136 | −1.94 | Transactivates Hes3 promoter
NM_005923 | MAP3K5 | 24 | −1.76 | 210 | −1.51 | Activates MAPK; tumor suppressor gene; proapoptotic
NM_001116 | ADCY9 | 25 | −1.76 | 174 | −1.70 | Adenylate cyclase
NM_001101800 | FAM13B | 26 | −1.71 | 154 | −1.79 | Family with sequence similarity 13, B. Function unknown
NM_006022 | TSC22D1 | 27 | −1.67 | 181 | −1.67 | Transcription factor; early-response gene
NM_022817 | PER2 | 30 | −1.63 | 104 | −2.13 | Hypermethylated in cancer; circadian rhythm
NM_206853 | QKI | 31 | −1.62 | 76 | −2.43 | RNA binding protein; RNA export and stability
NM_006380 | APPBP2 | 36 | −1.55 | 63 | −2.63 | Interacts with microtubules
NM_020194 | MFF | 40 | −1.50 | 165 | −1.75 | Membrane protein; apoptosis
NM_001017977 | DCAF6 | 41 | −1.50 | 151 | −1.83 | Enhances transcription by nuclear receptors
(A)
Pediatric GCT data set | Number of downregulated genes | Downregulated genes with transcript and 3′UTR information (%) | Common SCR present in 3′UTR (%) | Common SCR absent in 3′UTR (%) |
---|---|---|---|---|
Malignant GCT vs controls | 120 | 102 (85.0) | 41 (40.2) | 61 (59.8) |
YST vs controls | 146 | 126 (86.3) | 47 (37.3) | 79 (62.7) |
Seminoma vs controls | 189 | 159 (84.1) | 51 (32.1) | 108 (67.9) |
(B)
GO term | GOBPID | Total genes in GO term | Gene count | Expected gene count | Adjusted P |
---|---|---|---|---|---|
Regulation of small GTPase signal transduction | GO:0051056 | 115 | 5 | 0.499 | 9.90E-04 |
Regulation of cellular process | GO:0050794 | 2943 | 22 | 12.78 | 3.20E-03 |
Regulation of Ras protein signal transduction | GO:0046578 | 92 | 4 | 0.4 | 5.20E-03 |
Regulation of biological process | GO:0050789 | 3033 | 22 | 13.171 | 5.50E-03 |
Regulation of Ras GTPase activity | GO:0032318 | 44 | 3 | 0.191 | 7.20E-03 |
Intracellular signaling cascade | GO:0007242 | 769 | 10 | 3.339 | 8.00E-03 |
(C)
GO term | GOBPID | Total genes in GO term | Gene count | Expected gene count | Adjusted P |
---|---|---|---|---|---|
Malate metabolic process | GO:0006108 | 5 | 2 | 0.031 | 2.60E-03 |
Fat-soluble vitamin metabolic process | GO:0006775 | 6 | 2 | 0.037 | 3.90E-03 |
Vitamin A metabolic process | GO:0006776 | 6 | 2 | 0.037 | 3.90E-03 |
Golgi organization and biogenesis | GO:0007030 | 8 | 2 | 0.05 | 7.20E-03 |
(D)
Accession | Name | Pediatric rank (n = 102) | Pediatric log2 fold change | Adult rank (n = 212) | Adult log2 fold change | Function (OMIM: http://www.ncbi.nlm.nih.gov/omim/ and GENATLAS: http://genatlas.medecine.univ-paris5.fr/) |
---|---|---|---|---|---|---|
NM_000849 | GSTM3 | 1 | −3.38 | 3 | −5.03 | Metabolic, mutated in cancer |
NM_053001 | OSR2 | 3 | −2.69 | 95 | −2.19 | Zinc finger protein, transcription factor, development |
NM_006379 | SEMA3C | 4 | −2.51 | 78 | −2.41 | Immunoglobulin domain, short basic domain |
NM_181847 | AMIGO2 | 5 | −2.50 | 161 | −1.76 | Adhesion molecule |
NM_006167 | NKX3-1 | 7 | −2.47 | 19 | −3.68 | Transcription factor; downregulated in TGCT/prostate cancer |
NM_207304 | MBNL2 | 8 | −2.35 | 114 | −2.07 | Zinc finger protein, regulates alternative splicing |
NM_018013 | SOBP | 9 | −2.16 | 42 | −3.02 | Nuclear zinc finger protein; cell fate and patterning |
NM_001023567 | GOLGA8B | 10 | −2.00 | 75 | −2.45 | Golgi autoantigen, golgin subfamily a, 8B |
NM_178140 | PDZD2 | 11 | −2.02 | 31 | −3.33 | Transmembrane receptor binding protein |
NM_002736 | PRKAR2B | 12 | −2.01 | 127 | −2.00 | cAMP-dependent protein kinase |
NM_001015045 | FAM13A1 | 17 | −1.94 | 113 | −2.08 | Family with sequence similarity 13, A1. Function unknown |
NM_015230 | ARAP2 | 18 | −1.92 | 145 | −1.85 | ArfGAP protein that regulates focal adhesion |
NM_005491 | MAMLD1 | 22 | −1.78 | 136 | −1.94 | Transactivates Hes3 promoter |
NM_005923 | MAP3K5 | 24 | −1.76 | 210 | −1.51 | Activates MAPK; tumor suppressor gene; proapoptotic |
NM_001116 | ADCY9 | 25 | −1.76 | 174 | −1.70 | Adenylate cyclase |
NM_001101800 | FAM13B | 26 | −1.71 | 154 | −1.79 | Family with sequence similarity 13, B. Function unknown |
NM_006022 | TSC22D1 | 27 | −1.67 | 181 | −1.67 | Transcription factor; early-response gene |
NM_022817 | PER2 | 30 | −1.63 | 104 | −2.13 | Hypermethylated in cancer; circadian rhythm |
NM_206853 | QKI | 31 | −1.62 | 76 | −2.43 | RNA binding protein; RNA export and stability |
NM_006380 | APPBP2 | 36 | −1.55 | 63 | −2.63 | Interacts with microtubules |
NM_020194 | MFF | 40 | −1.50 | 165 | −1.75 | Membrane protein; apoptosis |
NM_001017977 | DCAF6 | 41 | −1.50 | 151 | −1.83 | Enhances transcription by nuclear receptors |
NOTE: A, the common 2- to 7-nucleotide SCR GCACTT is enriched in downregulated mRNAs in pediatric malignant GCTs. The table shows the number of downregulated genes in which the sequence complementary to the common 2- to 7-nucleotide seed of miR-372–373 and miR-302a–302d is either present or absent. B and C, GO analysis for mRNAs downregulated in pediatric malignant GCTs versus nonmalignant samples, according to the presence (B) or the absence (C) of the SCR corresponding to the common 2- to 7-nucleotide seed of miR-372–373 and miR-302a–302d. D, the 22 downregulated gene targets, common to both pediatric and adult malignant GCTs, for which the 3′UTR contains the common 2- to 7-nucleotide SCR. The 19 genes containing the common 2- to 7-nucleotide SCR that were significantly downregulated in the pediatric samples only are listed in Supplementary Table S5.
Of the 41 mRNAs containing the common 2- to 7-nucleotide SCR in the pediatric comparison, we identified 22 that were also present in the corresponding adult gene list (and therefore most likely to be direct targets of the miR-372–373 and miR-302a–302d families), including numerous cancer-associated genes (Table 2D; Supplementary Results and Supplementary Table S5). Linear regression analysis showed significant negative correlations for 21 of the 22 genes between expression levels and the median expression value for the six miRNAs from the miR-371–373 and miR-302 clusters that contain the common 2- to 7-nucleotide seed AAGUGC, using data from the 21 pediatric samples with matched miRNA and mRNA expression data (Fig. 3B; Supplementary Fig. S9).
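The per-gene regression described above can be illustrated with a minimal sketch. It assumes one expression value per sample for each of the six seed-sharing miRNAs and for a single mRNA; all numbers below are hypothetical toy values, not data from this study, and the function name `least_squares` is introduced here for illustration only. Significance testing of the slope is omitted.

```python
from statistics import median

def least_squares(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / (sxx * syy) ** 0.5

# Hypothetical toy values: one row per sample, one column per seed-sharing miRNA.
mirna_by_sample = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 7],
    [8, 9, 10, 11, 12, 13],
    [9, 10, 11, 12, 13, 14],
]
# Hypothetical expression of one candidate target mRNA in the same samples.
mrna_level = [9.0, 8.5, 6.0, 5.0]

# Summarize the six miRNAs by their median in each sample, then regress.
mirna_median = [median(row) for row in mirna_by_sample]
slope, intercept, r = least_squares(mirna_median, mrna_level)
```

A negative `slope` (and negative `r`) corresponds to the pattern reported here: higher median miRNA expression accompanies lower mRNA expression.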
Similar observations were made for mRNAs downregulated in pediatric YSTs versus nonmalignant tissue and for mRNAs downregulated in pediatric seminomas versus nonmalignant tissue. For both comparisons, the common 2- to 7-nucleotide SCR was enriched in downregulated genes, being present in 37.3% and 32.1%, respectively (Table 2A). Likewise, GO terms for SCR-containing mRNAs included a range of cancer-associated processes, whereas GO terms for mRNAs without the SCR generally represented metabolic processes only (Supplementary Tables S6 and S7). For details on enrichment for the common 2- to 7-nucleotide SCR GCACTT in downregulated mRNAs in adult TGCTs and analogous GO analysis, see Supplementary Results and Supplementary Table S8. For genes downregulated in YSTs versus nonmalignant samples and seminomas versus nonmalignant samples for the pediatric and adult data sets, see Supplementary Results and Supplementary Tables S9 and S10, respectively.
Discussion
In this study, we have shown that the majority of miRNAs differentially expressed in pediatric malignant GCTs are downregulated, as has been observed for other types of malignancy (32). Nevertheless, the most significant differential expression was upregulation of the miR-371–373 and miR-302 clusters, regardless of histologic subtype, tumor site (ovary, testis, or extragonadal), or patient age. Coordinate overexpression of these miRNAs seems to be specific to malignant GCTs, with no similar findings for other malignancies or diseases to date, save for miR-372–373 overexpression in an isolated case of an exceptionally rare embryonal brain tumor, at much lower levels than in malignant GCTs (33).
The miR-371–373 cluster was previously reported to be overexpressed in adult gonadal malignant GCTs, based on qRT-PCR (14) and RNase protection assays (15). However, these reports are inconsistent regarding expression of the miR-302 cluster in such tumors. One stated that miR-302a–302d expression was undetectable in many miR-371–373–expressing malignant GCTs, consistent with miR-371–373 overexpression being a selected event in malignant GCT development (15). In contrast, the other seems to illustrate the overexpression of miR-302a–302d in all adult gonadal malignant GCTs, although this was not explicitly commented on (14). Our re-analysis of the published qRT-PCR data shows that the miR-302 cluster is as significantly overexpressed as miR-371–373 in adult gonadal malignant GCTs compared with nonmalignant tissues (teratomas and controls; Table 1A), mirroring our observation for pediatric gonadal and extragonadal malignant GCTs.
As both miRNA clusters are believed to be embryonic stem cell–specific pluripotency markers (34–38), our findings suggest that expression of the miR-371–373 and miR-302 clusters in malignant GCTs either represents the persistence of an embryonic pattern of miRNA expression that is not present in normal tissues and teratomas (the latter having undergone somatic differentiation), or acquired re-expression, regulated by an as yet undetermined mechanism. We observed associations between miR-371–373/miR-302 levels and transcription factor overexpression, warranting future investigations of their functional relationships, which may be complex. For example, NANOG and POU5F1 have binding sites in the miR-302 (35, 39, 40) and miR-371–373 (40) cluster promoter regions, whereas POU5F1 is negatively regulated by miR-145 (41), which is significantly downregulated in both pediatric and adult malignant GCTs (Supplementary Table S2A/B). Our observation of SOX17 overexpression in seminoma and YST, but not in an EC sample, is consistent with previous reports (42–44).
Our Sylamer analysis strongly suggests that overexpression of the miR-371–373 and miR-302 clusters is functionally important in malignant GCTs by globally affecting the levels of target mRNAs. We observed significant enrichment of the SCR hexamer GCACTT (complementary to the common miR-372–373 and miR-302a–302d 2- to 7-nucleotide seed AAGUGC) in genes underexpressed in pediatric malignant GCTs, YSTs, and EC versus nonmalignant controls and in adult YSTs versus testicular controls. Sylamer did not identify overrepresentation of SCRs corresponding to other overexpressed miRNAs. Nevertheless, such miRNAs might contribute to the clinicopathologic heterogeneity of malignant GCTs by targeting a smaller, more discrete number of mRNAs. For example, each subtype of pediatric malignant GCT showed specific abnormalities of miRNA expression (such as overexpression of miR-182–183 cluster in seminomas, miR-375 in YSTs, and the miR-515–526 cluster in ECs) and we observed significant differential miRNA expression in intracranial versus extracranial seminomas. Interestingly, however, the striking differences in mRNA expression that we previously observed between pediatric and adult malignant GCTs (2) were not reflected by similar differences in miRNA expression profiles.
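The relationship between the miRNA seed and the SCR hexamer can be checked with a short sketch: the SCR is the reverse complement of the RNA seed, written in the DNA alphabet used for 3′UTR sequences. The function names and the example 3′UTR fragments are hypothetical and serve only as illustration; this is not the Sylamer algorithm itself, which assesses word enrichment across a ranked gene list.

```python
# Watson-Crick partner of each RNA base, written as DNA (so A pairs with T).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}

def seed_complementary_region(seed_rna):
    """Return the DNA reverse complement of an RNA seed given 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seed_rna.upper()))

def has_scr(utr_dna, scr):
    """True if the 3'UTR sequence contains at least one copy of the SCR."""
    return scr in utr_dna.upper()

# Common 2- to 7-nucleotide seed shared by miR-372-373 and miR-302a-302d.
scr = seed_complementary_region("AAGUGC")

# Hypothetical 3'UTR fragments, one with and one without the hexamer.
utr_with_site = "ttaggcacttcca"
utr_without_site = "ttaggcagttcca"
flags = [has_scr(utr_with_site, scr), has_scr(utr_without_site, scr)]
```

Applying `has_scr` to every downregulated gene with 3′UTR information would give present/absent counts of the kind summarized in Table 2A.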
Using GO analysis, we showed that for pediatric malignant GCTs, and their main subtypes YST and seminoma, the downregulated mRNAs containing the SCR corresponding to the common miR-372–373/miR-302a–302d seed mediate cellular processes important in oncogenesis and malignant progression (including signal transduction, cell cycle, development, and morphogenesis), in contrast to the small number of metabolic processes identified for downregulated mRNAs without the common SCR. Together, these findings indicate the generalized functional significance of miR-372–373 and miR-302a–302d in the biology of malignant GCTs. Interestingly, these miRNA clusters, via their common 2- to 7-nucleotide seed AAGUGC, are known to be essential for regulating G1-S transition and promoting rapid proliferation in embryonic stem cells (45, 46). Our data further support the use of GO enrichment analysis to identify groups of genes targeted by the same miRNA seed that share a biological function (47). Of note, considerably weaker signals were obtained from the Sylamer and GO enrichment analysis of the adult mRNA data set (26), in which controls were normal adult testis samples only, leading to large numbers of differentially expressed genes being related to male reproduction and spermatogenesis. We obtained a more tractable list of differentially expressed genes by selecting a range of control tissues containing normal germ cells at different developmental stages.
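GO overrepresentation of the kind reported in Table 2B and C is commonly assessed with a hypergeometric test, and the sketch below shows how an expected gene count and an unadjusted upper-tail probability could be computed. It is an illustration under stated assumptions, not a reconstruction of the analysis performed here: the gene universe size of 10,000 is a placeholder (the true value is not given in this section), the function names are hypothetical, and no multiple-testing adjustment is applied, so the result is not comparable with the adjusted P values in the table.

```python
from math import comb

def expected_count(term_size, n_selected, universe):
    """Expected number of selected genes annotated to a term under random sampling."""
    return term_size * n_selected / universe

def hypergeom_upper_tail(k, term_size, n_selected, universe):
    """P(X >= k) when n_selected genes are drawn without replacement."""
    total = comb(universe, n_selected)
    upper = min(term_size, n_selected)
    hits = sum(
        comb(term_size, i) * comb(universe - term_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Term size, selected-gene count, and observed count follow the first row of
# Table 2B; the universe of 10,000 annotated genes is an assumed placeholder.
UNIVERSE = 10_000
exp = expected_count(115, 41, UNIVERSE)
p_raw = hypergeom_upper_tail(5, 115, 41, UNIVERSE)
```

Observing five genes where fewer than one is expected yields a small unadjusted probability, which is the intuition behind the enrichment of signaling-related terms among SCR-containing mRNAs.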
In conclusion, our data indicate that the miR-371–373 and miR-302 clusters are universally overexpressed in malignant GCTs and are of functional significance by downregulating mRNAs involved in biologically significant pathways. It will now be important to translate our findings clinically. The miRNA expression changes we describe may improve tumor diagnosis and posttreatment monitoring, and enable novel therapeutic approaches that target fundamental abnormalities of malignant GCT cells.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
Grant Support: Medical Research Council Clinical Research Training Fellowships (R.D. Palmer and M.J. Murray), Glaxo-SmithKline Postdoctoral Fellowship (H.K. Saini), and Wellcome Trust Sanger Institute Postdoctoral Fellowship (C. Abreu-Goodger), with further support from the Medical Research Council, Cancer Research UK, CLIC Sargent, Parthenon Trust, and Addenbrooke's Charitable Trust.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.