Abstract
A major challenge in cancer research is to determine the biological and clinical significance of somatic mutations in noncoding regions. This has been studied in terms of recurrence, functional impact, and association to individual regulatory sites, but the combinatorial contribution of mutations to common RNA regulatory motifs has not been explored. Therefore, we developed a new method, MIRA (mutation identification for RNA alterations), to perform an unbiased and comprehensive study of significantly mutated regions (SMR) affecting binding sites for RNA-binding proteins (RBP) in cancer. Extracting signals related to RNA-related selection processes and using RNA sequencing (RNA-seq) data from the same specimens, we identified alterations in RNA expression and splicing linked to mutations on RBP binding sites. We found SRSF10 and MBNL1 motifs in introns, HNRPLL motifs at 5′ UTRs, as well as 5′ and 3′ splice-site motifs, among others, with specific mutational patterns that disrupt the motif and impact RNA processing. MIRA facilitates the integrative analysis of multiple genome sites that operate collectively through common RBPs and aids in the interpretation of noncoding variants in cancer. MIRA is available at https://github.com/comprna/mira.
Implications: The study of recurrent cancer mutations on potential RBP binding sites reveals new alterations in introns, untranslated regions, and long noncoding RNAs that impact RNA processing and provide a new layer of insight that can aid in the interpretation of noncoding variants in cancer genomes. Mol Cancer Res; 16(7); 1112–24. ©2018 AACR.
This article is featured in Highlights of This Issue, p. 1071
Introduction
Cancer arises from genetic and epigenetic alterations that interfere with essential mechanisms of the normal life cycle of cells such as DNA repair, replication control, and cell death (1). The search for driver mutations, which confer a selective advantage to cancer cells, is generally performed in terms of the impact on protein sequences (2). However, systematic studies of cancer genomes have highlighted mutational processes outside of protein-coding regions (3–5) and tumorigenic mutations at noncoding regions have been described, like those in the TERT promoter (6, 7). However, a major challenge remains to more accurately and comprehensively determine the significance and potential pathogenic involvement of somatic variants in regions that do not code for proteins (8). Current methods to detect driver mutations in noncoding regions are based on (i) the recurrence of mutations in predefined regions in combination with measurement of potential functional impacts (4, 9–11); (ii) recurrence in combination with sequence conservation or polymorphism data (12, 13); or (iii) the enrichment of mutations with respect to specific mutational backgrounds (5, 14, 15); and some of them combine such approaches (5). However, these methods have so far been restricted to individual positions rather than combining the contributions from multiple functionally equivalent regulatory sites. Additionally, the impact on RNA processing measured from the same samples has not been evaluated.
RNA molecules are bound by multiple RNA binding proteins (RBP) with specific roles in RNA processing, including splicing, stability, localization, and translation, which are critical for the proper control of gene expression (16). Multiple experimental approaches have established that RBPs generally interact with RNAs through short motifs of 4 to 7 nucleotides (17–19). These motifs occur anywhere along the precursor RNA molecule (pre-mRNA), including introns, protein coding regions, untranslated 5′ and 3′ regions, as well as in short and long noncoding RNAs (20, 21). Mutations on RNA regulatory sequences can impact RNA processing and lead to disease (22). However, studies carried out so far have mainly focused on sequences around splice sites (23) or in protein-coding regions (24). In vitro screenings of sequence variants in exons have revealed that more than 50% of nucleotide substitutions can induce splicing changes (25, 26), with similar effects for synonymous and nonsynonymous sites (26). Because RBP binding motifs are widespread along gene loci, and somatic mutations may occur anywhere along the genome, it is possible that mutations in other genic regions could impact RNA processing and contribute to the tumor phenotype. Mutations and expression alterations in RBP genes have an impact on specific cellular programs in cancer, but it is not known whether mutations in RBP binding sites along gene loci are frequent in cancer, could damage RNA processing, and contribute to oncogenic mechanisms.
To understand the effects of somatic mutations on RNA processing in cancer at a global level we have developed a new method, MIRA, to carry out a comprehensive study of somatic mutation patterns along genes that operate collectively through interacting with common RBPs. Compared with other existing approaches to detect relevant mutations in noncoding regions, our study provides several novelties and advantages: (i) we searched exhaustively along gene loci, hence increasing the potential to uncover deep intronic pathological mutations; (ii) we studied the enrichment of a large compendium of potential RNA regulatory motifs, allowing us to identify potentially novel mechanisms affecting RNA processing in cancer; (iii) we showed that multiple mutated genomic loci potentially interact with common RBPs, suggesting novel cancer-related selection mechanisms; and (iv) unlike previous methods, we used RNA sequencing data from the same samples to measure the impact on RNA processing. Our study uncovered multiple mutated sites associated with common RBPs that impact RNA processing with potential implications in cancer and revealed a new layer of insight to aid in the interpretation of noncoding variants in cancer genomes.
Materials and Methods
Detection of significantly mutated regions (SMR)
We used the Gencode gene annotations (v19), excluding pseudogenes. To define gene loci unequivocally, we clustered transcripts that shared a splice site on the same strand, and considered a gene to be the genomic locus and the strand defined by those transcripts. We used somatic mutations from whole genome sequencing for 505 tumor samples from 14 tumor types (ref. 10; Supplementary Table S1): bladder carcinoma (BLCA; 21 samples), breast carcinoma (BRCA; 96 samples), colorectal carcinoma (42 samples), glioblastoma multiforme (GBM; 27 samples), head and neck squamous carcinoma (HNSC; 27 samples), kidney chromophobe (KICH; 15 samples), kidney renal carcinoma (KIRC; 29 samples), low grade glioma (LGG; 18 samples), lung adenocarcinoma (LUAD; 46 samples), lung squamous cell carcinoma (LUSC; 45 samples), prostate adenocarcinoma (PRAD; 20 samples), skin carcinoma (SKCM; 38 samples), thyroid carcinoma (THCA; 34 samples), and uterine corpus endometrial carcinoma (UCEC; 47 samples). We only used substitutions, discarding those with a precise allelic match to a germline variant in dbSNP138.
To identify SMRs in both coding and noncoding regions of genes we used a sliding-window approach, whereby along each gene locus we tested all overlapping windows of length 7 (7-mer window) that harbored at least one mutation (Supplementary Fig. S1). For each 7-mer window, we performed a double statistical test. First, given a window with n mutations in a gene of length L and N mutations overall, we performed a binomial test using N/L as the expected local mutation rate. All tested windows in a gene were adjusted for multiple testing using the Benjamini–Hochberg (BH) method, and we kept only windows with corrected P < 0.05 (Supplementary Fig. S2).
To account for potential nucleotide biases (NB), we performed a second test per 7-mer window: we compared the mutation count in a given window with the expected count according to the distribution of mutations per nucleotide in the same gene as follows: For each base a, we calculated the rate of mutations in a locus R(a) = m(a)/n(a), where n(a) is the number of a bases in the gene and m(a) is the number of those bases that are mutated. The expected mutation count is then calculated using the nucleotide counts in the window and the mutation rate per nucleotide. For instance, for the 7-mer window AACTGCAG, the expected count was calculated as E = 3R(A) + 2R(C) + 2R(G) + R(T). This was compared with the actual number of mutations, n, observed in that window to define an NB score: NB-score = log2(n / E) per window. We discarded windows corresponding to single-nucleotide repeats (e.g., AAAAAAA) and kept windows with NB-score > 6 (Supplementary Fig. S2). Further, we kept 7-mer windows that overlapped any of the three intronic or exonic bases around exon–intron boundaries with 1 or more mutations as long as the NB-score was greater than 6.
Significant 7-mer windows were clustered according to genomic overlap and classified according to the genic region in the same strand on which they fell (Supplementary Tables S2–S4): 5′ or 3′ untranslated regions (5UTR/3UTR), coding sequence (CDS), exon in short or long noncoding RNA (EXON), 5′/3′ splice-site (5SS/3SS), or intron (INTRON). To unambiguously assign each 7-mer window, we prioritized regions as follows: 5SS/3SS > CDS > 5UTR/3UTR > EXON > INTRON, i.e., if a window overlapped a splice site, it was classified as such; else, if it overlapped a CDS, it was classified as CDS, etc. No significant window overlapped start or stop codons. To each SMR we assigned the average NB score and a corrected P value using the Simes approach (27): we ranked the P values of the n overlapping windows in increasing order pi, I = 1,2,…n and calculated ps = min{np1 / 1, np2 / 2, np3 / 3, …., npn / n}, where p1 was the lowest and pn the highest P values in the cluster. Each SMR cluster was then assigned the P value ps. Using other window lengths (k = 6, 8, 9) we observed that k = 7 provides an optimal trading off between the number of regions covered and the clustering of those regions (Supplementary Table S2). Code for this analysis is available at https://github.com/comprna/mira.
Comparison with expression, replication timing and with other methods
Data for replication time were obtained from ref. (14). Only SMRs with replication time data were analyzed. For each SMR in the PAN505 cohort, we considered annotated transcripts whose genomic sequence overlapped with an SMR. We calculated the total expression in transcripts per million (TPM) units for the overlapping transcripts per patient and averaged them across patients. For each SMR we compared the average expression of the SMR-containing transcripts in the mutated samples with the number of mutations. Additionally, using the same mutation dataset, we analyzed all SMRs with LARVA (14). LARVA assesses the significance of mutations in any genomic region integrating multiple noncoding functional elements, modeling their mutation count with a beta-binomial distribution to handle overdispersion related to the mutation heterogeneity and mutation correlation between neighboring sites, and uses regional genomic features such as replication timing to better estimate local mutation rates and mutational enrichments. We compared the significance of our SMRs with the significance given by LARVA using the model with a beta-binomial distribution and the replication timing correction (p-bbd-cor), which accounts for overdispersion of the mutation rates and regional biases. We also analyzed our SMRs with OncodriveFML (11), which evaluates the functional impact of mutations using CADD score (28) and RNA secondary structure (29), using the same mutation dataset. OncodriveFML was run with option −type coding for SMRs of type CDS, and with the option −type noncoding for all other SMR types (5SS/3SS, 5UTR, 3UTR, INTRON, EXON, where EXON corresponds to exons from noncoding RNAs). Each type was run independently.
Control regions for SMR comparison
For each SMR, we sampled 100 random regions of the same length and same type from the Gencode annotation, without mutations, and allowing for a maximum variation of G+C content of 5%. Each of these 100 controls was separated into different sets to generate 100 control sets of the same number as SMRs, each with similar distribution and G+C content distributions. For the 5SS and 3SS SMRs, we generated controls by sampling regions with the same length and same relative position from Gencode exon–intron boundaries without mutations and controlling for G+C content.
Motif analysis
We performed an unbiased search for enriched k-mers (k = 6) on the SMRs using MoSEA (https://github.com/comprna/MoSEA; ref. 30), using as input the SMR sequences and control regions, for each region type. MoSEA counts the number of SMRs and control regions in which each 6-mer appears, and a z-score was computed for each 6-mer comparing the observed frequency with the distribution of frequencies in 100 control subsamples of the same size, length distribution, and GC content as the SMR set. This was repeated reversing the strand of SMRs and control regions, and those 6-mers that appeared significantly enriched in the direct and reversed analyses for the same region type were discarded. We considered significant the 6-mers with z-score > 1.96 and 5 or more counts. This analysis included the GT and AG containing 6-mers at 5SS and 3SS SMRs, respectively. In total, we obtained 357 enriched 6-mers (Supplementary Table S5) in 3,547 SMRs (Supplementary Table S6). From all 20307 SMRs, 749 (3.68%) appeared in both strands due to overlapping genes. However, considering the 3,456 SMRs with enriched 6-mers, only 25 (0.72%) overlapped on opposite strands (Fisher exact test P = 1.29e−32, odds ratio = 6.16).
To label enriched 6-mers we used DeepBind (31) to score each 6-mer using models for 522 proteins containing KH (24 proteins), RRM (134, 2 in common with KH) and C2H2 (366 proteins) domains from human (413 proteins), mouse (49 proteins), and Drosophila (60 proteins). For each 6-mer, we kept the top three predictions with score > 0.1. Subsequently, given all 6-mers associated with the same RBP label, we kept those 6-mers at a maximum Levenshtein distance of 2 from the top-scoring 6-mer. Levenshtein distance measures the dissimilarity between two strings based on the number of deletions, insertions, or substitutions required to transform one string into the other.
Significant mutations per position of a motif
For each RBP, we considered all the associated 6-mers with mutations in SMRs of a given region type (Supplementary Table S7) and performed a multiple sequence alignment (MSA) using ClustalW (32). Sequence logos were built from this alignment and somatic mutations were counted per position relative to the MSA. As a control, we shuffled the mutations along the aligned positions to calculate an average per position. Germline mutations per position of the MSA from the 1,000 genomes project (33) were also considered.
CLIP-seq analysis
We collected binding sites from 91 CLIP-Seq experiments for 68 different RBPs from multiple sources (34–40) and selected the available significant CLIP clusters from each experiment. For datasets with two replicates, we selected the intersecting regions of the significant CLIP clusters, i.e., the genomic ranges covered by both replicates. From all SMRs with assigned RBP label, 482 (13.6%) had CLIP signal, whereas 3,064 did not have CLIP signal. In contrast, from all SMRs without assigned RBP label, 1,239 had CLIP overlap (7.4%), whereas 15,522 did not have CLIP signal (Fisher exact test P = 4.10e−30, odds ratio: 1.97). We performed the same analysis per RBP label: For each RBP label, we considered the SMRs with and without the label assignment and tested the enrichment of CLIP signal associated with the label using a Fisher exact test (Supplementary Table S8).
Gene set enrichment analysis
Annotations for gene sets were obtained from the Molecular Signatures Database v4.0 (41). We performed a Fisher exact test per hallmark set for genes harboring SMRs with labeled enriched motifs using the counts of genes with/without the RBP motif SMRs, and within/outside each gene set (Supplementary Table S9).
RNA-seq data analysis
The Cancer Genome Atlas (TCGA) RNA-seq data were obtained for the PAN505 samples from the Genomic Data Commons (https://portal.gdc.cancer.gov/). We estimated transcript abundances for the Gencode annotations (v19) in TPM units using Salmon (Version 0.8.1; ref. 42). For each mutated position in an enriched motif, we calculated the association with a transcript expression change using an outlier statistic. For each transcript whose genomic extension contained the SMR with the mutated motif, we compared the transcript log2(TPM + 0.01) for each patient with the mutation, with the distribution of log2(TPM + 0.01) values for the same transcript in the patients from the same tumor type with no mutations in the SMR. We only considered those cases where at least 5 patients lacked a mutation. We kept those cases with |z-score|>1.96 and a difference between the observed log2(TPM + 0.01) and the mean of log2(TPM + 0.1) in patients without mutations greater than 0.5 in absolute value, and considered significant those with a P < 0.05 after adjusting for multiple testing (BH approach; Supplementary Table S10).
RNA-seq reads were also mapped to the human genome (hg19) with STAR (version 2.5.0; ref. 43) and analyzed using Junckey (https://github.com/comprna/Junckey) as follows. All exon–exon junctions defined by spliced reads that appeared in any of the samples were grouped into junction clusters. Any two junctions were placed in the same cluster if they shared at least one splice site. Clusters were built using all junctions present in any patient, but junction read counts were assigned per patient. Only clusters with at least 30 reads in all samples were used. Additionally, we only used junctions <100 kbp in length and with >1% of reads from the cluster in all samples. For each patient, the read count per junction was normalized by the total read count in that cluster to define the junction inclusion or proportion spliced-in (PSI). A junction not expressed in a sample was set to PSI = 0. For each SMR containing an enriched motif, we compared each patient with a mutation in the motif against all patients for the same tumor type without mutations in the same SMR. We measured a z-score derived from the PSI for each junction overlapping the motif and the of PSIs for the same junction in the nonmutated patients, and we kept only those cases with |z-score|>1.96 and |ΔPSI|>0.1. We considered significant those changes with P < 0.05 after adjusting for multiple testing (BH approach; Supplementary Table S11).
Supplementary data and software
UCSC tracks for SMRs, mutations, motifs, and differentially included junctions are given as supplementary file. Plots for all mutated RBP binding motifs and RNA processing changes are available at: http://comprna.upf.edu/Data/MutationsRBPMotifs/
Code used in this article is available at:
Results
Unbiased search for SMRs along gene loci
We performed an exhaustive detection of mutation enrichment using overlapping genomic windows of 7 nucleotides (7-mer windows) along each gene locus (Fig. 1A; Supplementary Fig. S1). Using a dataset of somatic mutations from whole-genome sequencing (WGS) from 505 samples for 14 tumor types (ref. 10; PAN505; Supplementary Table S1), we performed a double statistical test. First, to account for local variations in mutational processes, we compared each 7-mer window against the mutation rate in the entire locus, and selected those with P < 0.05 after correcting for multiple tests (Supplementary Fig. S2). Second, to account for NBs, we compared the mutation count in each 7-mer window with the expected count calculated from the mutation rate per nucleotide to define an NB score per window. Of the 140,704 windows with 3 or more mutations, 93,497 (66%) showed an NB score of >6, whereas of the 45,916,437 windows with 1 mutation, which we considered to reflect the background, 1,557,310 (3%) had NB score > 6 (Fisher exact test P = 0, odds ratio = 369.5; Supplementary Fig. S2). Using the filters of P < 0.05 and NB score > 6, our exhaustive search produced 78,352 significant 7-mer windows in 8,159 genes. We further separated 7-mer windows according to whether they were in a coding sequence (CDS), a 5UTR/3UTR, an exon in a long noncoding RNA (EXON), an intron (INTRON), or overlapping a 5′ or 3′ splice site (5SS/3SS) and clustered them into SMRs (Fig. 1B), producing a total of 20,307 SMRs (Supplementary Table S2), harboring a total of 41,756 substitutions (Supplementary Fig. S3; Supplementary Tables S3 and S4). Most of the predicted SMRs were 7 to 15 nucleotides long (Supplementary Fig. S4), and the majority of SMRs were in introns and exons of noncoding RNAs (EXON).
Systematic identification of RNA-related SMRs. A, Short k-mer windows (k = 7 in our study) along genes are tested for the enrichment in mutations with respect to the gene mutation rate and the local NBs. Significant windows are clustered by region type into SMRs. For each SMR, we give the NB-score (x axis) and the number of mutations (y axis). B, We show the SMRs detected in CDS regions and introns (INTRON). All SMRs detected are shown, but we only indicate the gene name for the SMRs with NB score > 6 and with 5 or more mutations for CDS SRMs, and with 12 or more mutations for INTRON SMRs. 5′ UTRs (5UTR), 3′ UTRs (3UTR), noncoding RNA exons (EXON) and 5SS SMRs are shown in (Supplementary Fig. S9). C, Examples of a CDS SMR in NRAS and an INTRON SMR in MET. The UCSC screenshots show the SMR (green) and the mutations detected (red).
Systematic identification of RNA-related SMRs. A, Short k-mer windows (k = 7 in our study) along genes are tested for the enrichment in mutations with respect to the gene mutation rate and the local NBs. Significant windows are clustered by region type into SMRs. For each SMR, we give the NB-score (x axis) and the number of mutations (y axis). B, We show the SMRs detected in CDS regions and introns (INTRON). All SMRs detected are shown, but we only indicate the gene name for the SMRs with NB score > 6 and with 5 or more mutations for CDS SRMs, and with 12 or more mutations for INTRON SMRs. 5′ UTRs (5UTR), 3′ UTRs (3UTR), noncoding RNA exons (EXON) and 5SS SMRs are shown in (Supplementary Fig. S9). C, Examples of a CDS SMR in NRAS and an INTRON SMR in MET. The UCSC screenshots show the SMR (green) and the mutations detected (red).
We tested our predicted SMRs for possible biases. We calculated for each SMR the DNA replication timing, which is known to correlate with somatic mutations in cancers and can be a source of artifacts (44, 45), and observed no association with mutation count (Supplementary Fig. S5). Another potential source of artifacts is the relation between gene expression and mutation rates (44). We used RNA sequencing (RNA-seq) data from the same samples to measure the expression of SMR-containing transcripts and observed no association between the mutation count and expression (Supplementary Fig. S6). We further used LARVA (14) to test our SMRs using a statistical model that accounts for overdispersion of the mutation rate and replication timing. We observed an overall high similarity between the significance provided by MIRA and LARVA (Supplementary Fig. S7). In particular, we found a strong agreement for INTRON SMRs (Pearson R = 0.83). Additionally, we analyzed our SMRs with OncodriveFML (11). Most of the CDS and EXON SMRs were identified as significant by OncodriveFML, whereas not as many INTRON, 5UTR, or 3UTR SMRs were seen as significant (Supplementary Fig. S8; Supplementary Table S3), suggesting that mutations falling on INTRON or UTR regions may impact function in a different way that has not been considered yet.
Predicted SMRs recovered known and novel mutational hotspots
We found SMRs in 501 cancer-driver genes of 889 collected from the literature (30), which is more than expected by chance (Fisher exact test P = 7.24e−109, odds ratio = 4.67) compared with the 8,870 noncancer genes (of 40,973) that do not harbor SMRs. We observed CDS SMRs in 34 cancer genes identified previously (11, 13), including BRAF, IDH1, KRAS, PIK3CA, SF3B1, CTNNB1, TP53, and KRAS (Fig. 1B), as well as in in cancer genes not found previously, including NRAS, EP300, and ATM (Fig. 1C). From the 133 genes with predicted 5UTR SMRs, 17 were identified previously (11) and 8 corresponded to cancer genes, including SPOP and EEF1A1 (Supplementary Fig. S9). We also found 3UTR SMRs in 28 cancer genes, including CTNNB1 and FOXP1. From the 519 SMRs in EXON SMRs, 18 were located in cancer gene loci. In 42 lncRNAs related to cancer (15), we found only one EXON SMR in TCL6, but 11 INTRON SMRs, suggesting that lncRNAs introns could be more relevant than previously anticipated. As our analysis was exhaustive along the entire gene loci, we recovered many more INTRON SMRs than in previous reports, 317 of which within cancer genes, including NUMB, ALK, EPHB1, ARID1A, and MET (Fig. 1C). Additionally, 5 of the previously reported intronic mutations (11), we classified as 5SS/3SS SMRs, with ATG4B, NF1 and TP53 having both types of SMRs. Finally, we found 62 5SS SMRs and 53 3SS SMRs in cancer genes, including MET, CHEK2, BRCA1, VEGFA, RB1, CDKN2A, as well as TP53, PTEN, and CHD1, which were described before to have cancer mutations at splice sites (23). In summary, our SMRs provide a rich resource with potential to uncover new relevant noncoding alterations in cancer.
Somatic mutations occur frequently on RBP binding motifs
To further understand the properties of our SMRs, we calculated the mutation frequencies at trinucleotides considering the strand of the gene in which the SMR was defined. We observed an enrichment of C>T and G>A mutations on SMRs (Fig. 2A, top), which was recapitulated in the reverse-complement triplets, indicating a strong contribution from DNA-related selection processes. To identify SMRs that reflect RNA-related selection processes we studied sequence motifs potentially related to RNA processing. We performed an unbiased k-mer enrichment analysis in SMRs with k = 6. Further, to select those associated with RNA rather than DNA, we reverse complemented all SMRs and control sequences and repeated the enrichment analysis, and 6-mers enriched in both calculations for the same region-type were discarded. We found a total of 357 enriched 6-mers (Supplementary Table S5) in 3,546 SMRs (Supplementary Table S6). These enriched 6-mers showed a different mutational pattern compared with all SMRs (Fig. 2A; Supplementary Table S7). The symmetry between triplets and their reverse complements was no longer present, and there was an enrichment of mutations at AGA and TCC triples, probably reflecting RNA-related selection processes.
Enriched splice-site motifs in SMRs. A, Top, Mutation pattern in SMRs. For each nucleotide substitution, we give the total count of substitutions observed in SMRs separated according to the nucleotide triplet in which it occurs. Bottom, Mutation patterns in enriched 6-mers in SMRs, after removing 6-mers enriched also after reverse complementing the sequences (see Materials and Methods). For each nucleotide substitution, we give the total count observed separated according to the nucleotide triplet in which it occurs. In both panels, because the 6-mers are stranded, we give the substitutions according to the 6-mer strand. B, Mutation patterns in 5SS SMRs. Top, The location of the mutations relative to the splice site. The logo describes the splice-site motifs significantly enriched in 5SS SMRs, i.e., they appear more frequently in 5SS SMRs than in the corresponding controls. The bar plots below show the proportion of all somatic mutations in the 6-mers (y axis) that fall on each position along the motif logo. In orange indicate those somatic mutations that coincide with a germline SNP (with a different substitution pattern, as the exact matching substitutions were removed). Bottom, For each position the observed substitutions, indicated with a color code. C, Similar plots for the 3SS SMRs. We give below two examples of mutations found at splice-site motifs in the genes NF1 (D) and FGFR3 (E). Above the gene track we show the SMR (green track), the enriched motif found in the SMR (blue track), and the somatic mutations (read track). For each mutation we indicate the patient identifier, the tumor type and the substitution.
Enriched splice-site motifs in SMRs. A, Top, Mutation pattern in SMRs. For each nucleotide substitution, we give the total count of substitutions observed in SMRs separated according to the nucleotide triplet in which it occurs. Bottom, Mutation patterns in enriched 6-mers in SMRs, after removing 6-mers enriched also after reverse complementing the sequences (see Materials and Methods). For each nucleotide substitution, we give the total count observed separated according to the nucleotide triplet in which it occurs. In both panels, because the 6-mers are stranded, we give the substitutions according to the 6-mer strand. B, Mutation patterns in 5SS SMRs. Top, The location of the mutations relative to the splice site. The logo describes the splice-site motifs significantly enriched in 5SS SMRs, i.e., they appear more frequently in 5SS SMRs than in the corresponding controls. The bar plots below show the proportion of all somatic mutations in the 6-mers (y axis) that fall on each position along the motif logo. In orange indicate those somatic mutations that coincide with a germline SNP (with a different substitution pattern, as the exact matching substitutions were removed). Bottom, For each position the observed substitutions, indicated with a color code. C, Similar plots for the 3SS SMRs. We give below two examples of mutations found at splice-site motifs in the genes NF1 (D) and FGFR3 (E). Above the gene track we show the SMR (green track), the enriched motif found in the SMR (blue track), and the somatic mutations (read track). For each mutation we indicate the patient identifier, the tumor type and the substitution.
From the 74 enriched 6-mers found on 5SS SMRs, 46 included the 5′ splice-site consensus GT (5SSC). 5SSC motifs showed a strong conservation of G at the +5 intronic position (position 7 in Fig. 2B), with the highest density of mutations, mostly G>A and G>T, on either side of the exon–intron boundary (Fig. 2B). From the 52 enriched 6-mers found on 3SS SMRs, 36 contained the 3SS consensus AG (3SSC), with strong conservation of C nucleotides at the −5 position of the intron (position 1 in Fig. 2C) and with positions −1 and −3 (3 and 5 in Fig. 2C) being the most frequently mutated positions (Fig. 2C). Among the cases found, there was a 5SS in NF1 with mutations in skin (SKCM), lung (LUAD), and uterine (UCEC) tumors (Fig. 2D), and a 3SS in FGFR3 with mutations in head and neck (HNSC) and bladder (BLCA) tumors (Fig. 2E). Mutations at splice-site motifs were more frequent in lung (LUAD, LUSC) and uterine (UCEC) tumors; 5SSC mutations were also frequent in bladder tumors (BLCA), whereas 3SSC mutations were specifically frequent in colorectal tumors (Fig. 3A).
Cancer mutations in enriched RBP motifs. We provide the proportion of samples separated by tumor type (y axis) that have a mutated motif in 5SS (A), 3SS (B), CDS (C), and INTRON (D) SMRs. In each SMR type, we show the enriched motifs. For 5SS and 3SS, we indicate the consensus 5′ or 3′ splice-site sequences (5SSC/3SSC). The proportions are color coded by tumor type. We show the mutation patterns on SRSF10 motifs in 3SS (E), CDS (F), and INTRON (G) SMRs. Top, We indicate in dark red the position of the mutations and in light red the positions covered by motif. The bar plots below show the proportion of somatic mutations (y axis) that fall on each position along the motif logo. In orange indicate those somatic mutations that coincide with a germline SNP (with a different substitution pattern, as the exact matching substitutions were removed). Below we show for each position the number of motifs with each type of substitution indicated with a color code below.
Cancer mutations in enriched RBP motifs. We provide the proportion of samples separated by tumor type (y axis) that have a mutated motif in 5SS (A), 3SS (B), CDS (C), and INTRON (D) SMRs. In each SMR type, we show the enriched motifs. For 5SS and 3SS, we indicate the consensus 5′ or 3′ splice-site sequences (5SSC/3SSC). The proportions are color coded by tumor type. We show the mutation patterns on SRSF10 motifs in 3SS (E), CDS (F), and INTRON (G) SMRs. Top, We indicate in dark red the position of the mutations and in light red the positions covered by motif. The bar plots below show the proportion of somatic mutations (y axis) that fall on each position along the motif logo. In orange indicate those somatic mutations that coincide with a germline SNP (with a different substitution pattern, as the exact matching substitutions were removed). Below we show for each position the number of motifs with each type of substitution indicated with a color code below.
To identify RBPs that could potentially bind the enriched 6-mers beyond 5SSC/3SSC motifs, we used DeepBind (31) to score the enriched 6-mers using models for 522 proteins containing KH, RRM, and C2H2 domains from human, mouse, and Drosophila (Supplementary Fig. S10). We could confidently label 245 (68.6%) of the 357 enriched 6-mers. In 5SS and 3SS SMRs, besides the enriched 5SSC/3SSC motifs, we identified binding sites for multiple RBPs (Fig. 3A and B), including SRSF10, a splicing factor that regulates an alternative splicing response to DNA damage (46) and PCBP1, which was linked to alternative splicing in cancer (47). SRSF10 and PCBP1 motifs also appeared in CDS and INTRON SMRs (Fig. 3C and D; Supplementary Fig. S11). In 3SS SMRs, we also found binding sites for HNRPLL (Fig. 3B), a regulator of T-cell activation through alternative splicing (48), which was also found on EXON and INTRON SMRs (Fig. 3D; Supplementary Fig. S11).
To validate our assignment of RBP labels to SMRs we used data from 91 CLIP experiments for 68 different RBPs (34–40). Using the significant CLIP-seq signals as evidence of protein–RNA interaction, we observed an enrichment of CLIP signal on labeled SMRs compared with SMRs without any RBP assignment (Fisher exact test P = 4.10e−30, odds ratio = 1.97). We further tested each specific motif independently for the enrichment of CLIP signals with respect to the other motifs (Supplementary Table S8; Supplementary Fig. S10). Motifs associated with splice-site sequences had the highest enrichment of CLIP signal. We also observed that SMRs labeled with CPEB4, SRSF10, PCBP1, or PCBP3 are among the cases with greatest enrichment of CLIP signal. Thus, our labeled SMRs are generally enriched on CLIP-seq signal compared with nonlabeled SMRs, supporting the notion that these SMRs are likely to participate in protein–RNA interactions.
Somatic mutations show positional biases on RBP binding motifs
We further studied whether particular positions on the identified RBP motifs were more frequently mutated than others. For each RBP, we grouped the SMRs containing the enriched labeled 6-mers and performed a MSA to determine the equivalent positions of the motifs across the SMRs. For SRSF10 motifs, we found an enrichment of A>G mutations at A positions at INTRON, CDS, and 3SS SMRs (Fig. 3E–G). On EXON SMRs the most abundant RBP motif was CPEB4, which showed enrichment of T>A mutations, whereas HNRPLL and PCBP1 motifs showed enrichment of C>T mutations (Supplementary Fig. S12). At 5UTR SMRs, we found multiple T- and C-rich motifs, predominantly in melanoma (SKCM; Fig. 4A). In particular, PCBP3 and PTBP1 motifs at 5UTR SMRs, which were characterized by an enrichment of C>T substitutions (Fig. 4B), might be related to the 5′ terminal oligopyrimidine tract (5′TOP) motif that is relevant for translational regulation (49, 50), and these mutations could indicate an impact on translation. At 5UTR SMRs, there were also G-rich motifs (HNRPLL and HNRNPA2B1) with frequent G>A mutations (Supplementary Fig. S13). At 3UTR SMRs (Fig. 4C) we observed frequent CT-rich motifs associated with PTBP1, HNRNPC, PCBP3, and ELAVL1 (HuR) in colorectal carcinoma, BLCA, and UCEC patients (Fig. 4D), and AC-rich motifs associated with IGF2BP2 in UCEC and colorectal carcinoma with enrichment of C>T mutations (Supplementary Fig. S14).
Cancer mutations in enriched RBP motifs in 5UTR and 3UTR SMRs. For 5UTR (A) and 3UTR (B) SMRs, we provide the proportion of samples in each tumor type (y axis) that have a mutated RNA binding protein (RBP) motif (x axis). The proportions are color coded by tumor type. The proportions of SMRs with each RBP motif per tumor type are given in Supplementary Fig. S11. Positional patterns of mutations on (C) PCBP3 motifs in 5UTR SMRs, and on (D) HNRNPC3 motifs in 3UTR SMRs. Top, We indicate in red the positions covered by the motif and in dark red the position of the mutations. The bar plots below show the proportion of somatic mutations (y axis) that fall on each position along the motif logo. In orange we indicate those somatic mutations that coincide with a germline SNP in position (with a different substitution pattern, as the exact matching substitutions were removed). Below we show for each position, the number of motifs with each type of substitution indicated with a color code below.
Cancer mutations in enriched RBP motifs in 5UTR and 3UTR SMRs. For 5UTR (A) and 3UTR (B) SMRs, we provide the proportion of samples in each tumor type (y axis) that have a mutated RNA binding protein (RBP) motif (x axis). The proportions are color coded by tumor type. The proportions of SMRs with each RBP motif per tumor type are given in Supplementary Fig. S11. Positional patterns of mutations on (C) PCBP3 motifs in 5UTR SMRs, and on (D) HNRNPC3 motifs in 3UTR SMRs. Top, We indicate in red the positions covered by the motif and in dark red the position of the mutations. The bar plots below show the proportion of somatic mutations (y axis) that fall on each position along the motif logo. In orange we indicate those somatic mutations that coincide with a germline SNP in position (with a different substitution pattern, as the exact matching substitutions were removed). Below we show for each position, the number of motifs with each type of substitution indicated with a color code below.
For each enriched motif, we calculated the enrichment of gene sets (GO Biological Function, Pathways and Oncogenic Pathways) in the genes harboring SMRs with the motif (Supplementary Table S9). Among the motifs associated with cancer-related functions: 5SSC motifs appeared related to apoptosis and DNA damage response functions, as well as to NFKB activation and PI3K signal cascade (Supplementary Fig. S15). Genes with SMRs harboring SRSF10 motifs showed association to apoptosis and immune response, whereas genes with HNRNPC or HNRNPLL motifs were related to metabolic processes. RBM41-motif containing SMRs are also related to genes involved in metabolic processes, as well as in T-cell activation. These results indicate that cancer mutations on RBP motifs potentially affect a wide range of functions that may lead to specific tumor phenotypes.
Somatic mutations in RBP binding motifs impact in RNA expression and splicing
To determine the impact of mutations in enriched motifs, we tested their association with changes in RNA processing. We first estimated the association of mutations with changes in transcript isoform expression. From the 20,308 SMRs tested, 148 showed an association with a significant expression change (Fig. 5A; Supplementary Table S10). Most of the significant changes were associated with INTRON SMRs in skin (SKCM) and colorectal tumors (Supplementary Figs. S16 and S17). The motifs PTBP1, PCBP1, RBM8A, and ZNF638 harbored the highest number of mutations associated with significant transcript expression changes (Fig. 5B; Supplementary Fig. S17). In particular, mutations in RBM8A motifs were associated with expression changes of the histone acetyl-transferase gene KAT6A and the pre–B-cell leukemia transcription factor 1 gene PBX1, both potential oncogenes (51, 52). In the case of PBX1, two transcript isoforms change expression in opposite directions, indicating an isoform switch.
Transcript expression changes associated with mutations in RBP motifs. A, For each region type (x axis), we give the number of SMRs (y axis) for which we found a significant change in transcript isoform expression associated with somatic mutations in enriched motifs within the SMR. The counts are color coded by tumor type. B, For each RBP motif (x axis), we give the number of cases for which we found a significant change in transcript expression (y axis) associated with somatic mutations in the motif. The counts are color coded by tumor type. C, Significant changes in transcript expression associated with mutations in intronic SRSF10 motifs. For each patient (x axis) we show the transcripts (y axis) that have a significant increase (red) or decrease (blue) in expression. We separate them according to tumor type (indicated above). We show the significant expression changes detected in two different transcripts of MAPK10 associated with mutations in the SRSF10 motif. In the UCSC screen shot, we indicate the patients and mutations on the CAGAGA motif in the intron of MAPK10. D, Significant changes in transcript expression associated with mutations in intronic MBNL1 motifs. For each patient (x axis) we show the transcripts (y axis) that have a significant increase (red) or decrease (blue) in expression. We separate them according to tumor type (indicated above). We show the significant expression changes detected in two different transcripts of ETV1 associated to mutations in the MBNL1 motif. In the UCSC screen shot, we indicate the patients and mutations on the CGCTTT motif in the intron of ETV1.
Transcript expression changes associated with mutations in RBP motifs. A, For each region type (x axis), we give the number of SMRs (y axis) for which we found a significant change in transcript isoform expression associated with somatic mutations in enriched motifs within the SMR. The counts are color coded by tumor type. B, For each RBP motif (x axis), we give the number of cases for which we found a significant change in transcript expression (y axis) associated with somatic mutations in the motif. The counts are color coded by tumor type. C, Significant changes in transcript expression associated with mutations in intronic SRSF10 motifs. For each patient (x axis) we show the transcripts (y axis) that have a significant increase (red) or decrease (blue) in expression. We separate them according to tumor type (indicated above). We show the significant expression changes detected in two different transcripts of MAPK10 associated with mutations in the SRSF10 motif. In the UCSC screen shot, we indicate the patients and mutations on the CAGAGA motif in the intron of MAPK10. D, Significant changes in transcript expression associated with mutations in intronic MBNL1 motifs. For each patient (x axis) we show the transcripts (y axis) that have a significant increase (red) or decrease (blue) in expression. We separate them according to tumor type (indicated above). We show the significant expression changes detected in two different transcripts of ETV1 associated to mutations in the MBNL1 motif. In the UCSC screen shot, we indicate the patients and mutations on the CGCTTT motif in the intron of ETV1.
We also found mutations in an intronic SRSF10 motif associated with expression changes in a transcript from the dystrophin gene (DMD) in breast cancer (BRCA), and in two transcripts from the mitogen-activated protein kinase 10 MAPK10 (JNK3) in colorectal cancer (Fig. 5C). In DMD, the mutation was associated with increased expression, whereas in MAPK10 we observed an isoform switch (Fig. 5C). MAPK10 is a proapoptotic gene and our results suggest that a mutation in an intronic SRSF10 motif conserved in primates to be associated with an isoform switch in colorectal carcinoma (Fig. 5C). We also found a colorectal carcinoma mutation on an intronic MBNL1 motif conserved in mammals that is associated with an upregulation of the transcription factor ETV1 (Fig. 5D), which has been linked to prostate cancer (53). A mutation at a nearby site in SKCM shows association with downregulation of a different transcript isoform (Fig. 5D). In 5UTR SMRs, significant changes were associated only with SKCM mutations (Supplementary Fig. S18). One of them corresponds to a mutation on a PCBP3 motif associated with an isoform switch in the galactokinase-2 gene GALK2 in melanoma. GALK2 is a regulator of prostate cancer cell growth (54) and has alternative splicing in multiple tumor types (30). Our results indicate that these splicing alterations could stem from a mutation on a PCBP3 motif at the 5′UTR. Finally, among the changes associated with 3UTR SMRs, we found a mutation in a HuR motif in colorectal carcinoma related to the downregulation of the Debrin-like gene DBNL (Supplementary Fig. S18).
We also analyzed the changes in all possible exon–exon junctions defined from spliced reads mapped to the genome and overlapping SMRs. Of the SMRs tested, 30 were associated with a significant inclusion change in at least one junction (Fig. 6A; Supplementary Table S11). The majority of cases occurred in INTRON SMRs and associated with mutations in bladder (BLCA) and head and neck squamous cell (HNSC) tumors. Significant junctions were also commonly associated with 5SS SMRs, mainly in uterine (UCEC) and skin (SKCM) tumors (Fig. 6A) and in 5UTR SMRs specifically associated with SKCM mutations. In contrast, we found few associations for 3SS and EXON SMRs, and none for CDS SMRs (Fig. 6A). Significant changes in junctions were most commonly associated with the 5SSC and 3SSC motifs, as well as with IGF2BP2, HNRNPA2B1, HNRNPLL, and PCBP1 motifs among others (Fig. 6A). Among the significant changes associated with 5SSC (Fig. 6C), a mutation in the peptidyl-tRNA hydrolase 2 gene PTRH2 in uterine cancer (UCEC) would lead to the recognition of an upstream cryptic 5SS (Fig. 6D). PTRH2 induces anoikis and its downregulation is linked to metastasis in tumor cells (55). The mutation observed could therefore play a role in cancer progression. We also found a significant splicing change in the Farnesyl Pyrophosphate Synthetase gene FDPS in melanoma that would induce an alternative 5SS and skip part of the Polyprenyl synthetase domain (Supplementary Fig. S19). FDPS induces autophagy in cancer cells (56), and an alteration of the autophagy pathway has been related to myeloid neoplasms when SF3B1 mutations are present (57). It would be interesting to investigate further whether this splicing-induced alteration of FDPS could recapitulate a similar phenotype. Among the significant changes associated with 3SS mutations, there was one in the Adenine Phosphoribosyl transferase gene APRT associated with G>A mutations at the consensus 3SS splice site in melanoma (SKCM), which would skip the Phosphoribosyl transferase domain (Supplementary Fig. S19).
Changes in junction usage associated with mutations in RBP motifs. A, For each region type (x axis) we give the number of SMRs (y axis) for which we found a significant change in exon–exon junction inclusion associated with somatic mutations in enriched motifs within the SMR. The counts are color coded by tumor type. B, For each RBP motif (x axis), we give the number of instances (y axis) for which we found a significant change in exon–exon junction inclusion associated with somatic mutations in the motif (x axis). The counts are color coded by tumor type. C, Significant changes in junction inclusion associated with mutations in 5SS motifs. For each patient (x axis) we show the junctions (y axis) that have a significant increase (orange) or decrease (cyan) in inclusion (PSI). We separate them according to tumor type (indicated above). D, Significant junction changes in PTRH2 in uterine cancer (UCEC). We show the changing junctions in gray. The mutation in the annotated 5SS induces the usage of an upstream cryptic 5SS. We show the SMR (green), the enriched motif (blue), and the mutation (red). The boxplots below show the PSI values (y axis) of the two changing junctions separated by samples with mutations and without mutations in this SMR in UCEC, indicated in the x axis. E, Significant changes in junction inclusion associated with mutations in 5UTR SMRs. For each patient (x axis), we show the junctions (y axis) that have a significant increase (orange) or decrease (cyan) in inclusion (PSI). We separate them according to tumor type (indicated above). F, Significant junction changes in C16orf59 associated with mutations in an HRNPLL motif in melanoma (SKCM). The boxplots show the PSI values (y axis) of the two changing junctions separated by samples with mutations and without mutations, indicated in the x axis. In the screenshot we show the 5UTR SMR (green), the enriched motifs (blue), and the mutations (red), which suggest dinucleotide mutations GG>AA in some patients. The junctions associated with these mutations are downstream of the SMR and do not appear in the genomic range shown in the figure.
Changes in junction usage associated with mutations in RBP motifs. A, For each region type (x axis) we give the number of SMRs (y axis) for which we found a significant change in exon–exon junction inclusion associated with somatic mutations in enriched motifs within the SMR. The counts are color coded by tumor type. B, For each RBP motif (x axis), we give the number of instances (y axis) for which we found a significant change in exon–exon junction inclusion associated with somatic mutations in the motif (x axis). The counts are color coded by tumor type. C, Significant changes in junction inclusion associated with mutations in 5SS motifs. For each patient (x axis) we show the junctions (y axis) that have a significant increase (orange) or decrease (cyan) in inclusion (PSI). We separate them according to tumor type (indicated above). D, Significant junction changes in PTRH2 in uterine cancer (UCEC). We show the changing junctions in gray. The mutation in the annotated 5SS induces the usage of an upstream cryptic 5SS. We show the SMR (green), the enriched motif (blue), and the mutation (red). The boxplots below show the PSI values (y axis) of the two changing junctions separated by samples with mutations and without mutations in this SMR in UCEC, indicated in the x axis. E, Significant changes in junction inclusion associated with mutations in 5UTR SMRs. For each patient (x axis), we show the junctions (y axis) that have a significant increase (orange) or decrease (cyan) in inclusion (PSI). We separate them according to tumor type (indicated above). F, Significant junction changes in C16orf59 associated with mutations in an HRNPLL motif in melanoma (SKCM). The boxplots show the PSI values (y axis) of the two changing junctions separated by samples with mutations and without mutations, indicated in the x axis. In the screenshot we show the 5UTR SMR (green), the enriched motifs (blue), and the mutations (red), which suggest dinucleotide mutations GG>AA in some patients. The junctions associated with these mutations are downstream of the SMR and do not appear in the genomic range shown in the figure.
We also found mutations in 5UTR and EXON motifs associated with splicing changes (Fig. 6E). We found a mutation in a 5UTR HNRPLL motif in C16orf59 that is conserved across mammals (Fig. 6F). We also found a mutation in an EXON IGF2BP2 motif in the C14orf37 locus (Supplementary Fig. S19). Finally, INTRON SMRs had the most numerous number of junction changes associated with mutations in multiple RBP motifs, including RBM8A and SRSF10 (Supplementary Fig. S20), which included changes in the histone gene HIST1H2AC related to mutations in the RBM8A motif in HNSC, and in GMEB1 related to mutations in an SRSF10 motif. BED and GFF tracks representing all the found cases are available as Supplementary Material.
Discussion
We have described a novel method to identify and characterize somatic mutations in coding and noncoding regions in relation to their potential to be involved in protein–RNA binding. By considering mutational significance in combination with the enrichment of RBP binding motifs, we identified mutations in noncoding regions, particularly in deep intronic regions, with evidence of impact on RNA processing. Our analysis provides new potential mechanisms by which somatic mutations impact RNA processing and contribute to tumor phenotypes. Although these mutations may not be as frequent as other classic oncogenic mutations, recent evidence suggests that rare somatic variants can have an impact on expression and be clinically actionable (58). As RBPs are known to control entire cellular pathways, from epithelial-to-mesenchymal transition (59) to cellular differentiation (60), these results suggest a general model by which cancer mutations disrupt the function of RBP targets and contribute independently to the disruption of similar pathways (Fig. 7). Our work also provides a new strategy to interpret noncoding mutations and indicates that lowly recurrent mutations could still be relevant to the study of cancer, as they can affect functions collectively controlled by the same RBP.
Noncoding mutations in human tumors affect binding sites of RNA binding proteins. Our analysis suggests that many of the mutations (indicated in red) on noncoding regions, predominantly introns, and UTRs, affects binding sites of RNA binding proteins (indicated in orange and green) and affect RNA processing in multiple different genes across patients. In the figure, altered RNA processing is indicated as solid gene models. These alterations would contribute to the frequent changes observed in RNA processing in tumors and could indicate novel oncogenic mechanisms.
Noncoding mutations in human tumors affect binding sites of RNA binding proteins. Our analysis suggests that many of the mutations (indicated in red) on noncoding regions, predominantly introns, and UTRs, affects binding sites of RNA binding proteins (indicated in orange and green) and affect RNA processing in multiple different genes across patients. In the figure, altered RNA processing is indicated as solid gene models. These alterations would contribute to the frequent changes observed in RNA processing in tumors and could indicate novel oncogenic mechanisms.
MIRA presents several advantages with respect to previous methods. It detects deep intronic mutations, whereas previous methods only tested intronic regions immediately adjacent to exons (11, 15). Additionally, unlike previous approaches (11, 13, 15), we examined the location of the mutations in the context of an RNA regulatory motif, which enables the combined interpretation of noncoding mutations at multiple sites. Additionally, our analyses were driven by the positions of mutations on regulatory motifs, hence providing the precise region where mutations likely play a role. Finally, unlike most previous approaches, we tested the impact on RNA processing and expression using RNA-seq from the same samples. We observed significant changes for a small fraction of SMRs, indicating that the overall impact on RNA is modest as measured by RNA sequencing from cancer tissue samples. Deeper sequencing of tissue samples or in vitro–based assays could help validating many more of these events.
The majority of the validated cases were deep intronic mutations, whose interpretation has remained elusive so far. Here, we provided evidence of their potential relevance in cancer. Interestingly, whereas other methods assumed that the function of lncRNAs may be impacted only through exonic mutations (15), we found that intronic SMRs may be relevant as well. Additionally, although it has been generally assumed that RNA structure determines the function in UTR regions (11), we found 5UTR SMRs associated with RNA processing changes, indicating new mechanisms. Our approach is subject to several limitations. The analysis may be underpowered due to the relatively small number of patients analyzed. Another limitation is that we chose specific descriptions for the RNA binding motifs; hence, the analysis is limited by their accuracy. Despite computational and technological advances, precise definitions of RBP binding sites at a genome scale remain challenging, and different RBPs may bind similar sequence motifs. To deal with these ambiguities, our method allowed for multiple assignments of 6-mers to RBP labels, providing the opportunity to describe the mutational patterns of different RBP motifs.
In summary, MIRA analyzes noncoding mutations to take into account functional analogous sites at different genomic positions and RNA-related selection processes, providing evidence that multiple RNA processing mechanisms may be impaired in cancer through mutations on RBP motifs, thereby uncovering novel alterations relevant for cancer.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: E. Eyras
Development of methodology: B. Singh, J.L. Trincado
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): B. Singh, S.R. Piccolo
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Singh, J.L. Trincado, PJ Tatlow, S.R. Piccolo, E. Eyras
Writing, review, and/or revision of the manuscript: B. Singh, S.R. Piccolo, E. Eyras
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.L. Trincado, PJ Tatlow
Study supervision: E. Eyras
Acknowledgments
B. Singh., J.L. Trincado, and E. Eyras were supported by the Spanish Government and FEDER with grants BIO2014-52566-R, BIO2017-85364-R, and MDM-2014-0370, and by AGAUR with grants SGR2014-1121 and SGR2017-1020. B. Singh was funded by an FPI grant from the Spanish Government with reference BES-2012-052683. S.R. Piccolo was supported by startup funds from Brigham Young University. PJ Tatlow was supported by a fellowship from the Simmons Center for Cancer Research at Brigham Young University.
The authors thank M. Taylor, E. Larsson, and N. Lopez-Bigas for discussions. The results published here are in whole or part based upon data generated by The Cancer Genome Atlas managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.