Noncoding cis-regulatory variants have gained interest as cancer drivers, yet progress in understanding their significance is hindered by the numerous challenges and limitations of variant prioritization. To overcome these limitations, we focused on active cis-regulatory elements (aCRE) to design a customized panel for the deep sequencing of 56 neuroblastoma tumor and normal DNA sample pairs. To search for driver mutations, aCREs were defined by reanalysis of H3K27ac chromatin immunoprecipitation sequencing peaks in 25 neuroblastoma cell lines. These regulatory genomic regions were tested for an excess of somatic mutations and assessed for statistical significance using a global approach that accounted for chromatin accessibility and replication timing. Additional validation was provided by whole genome sequence analysis of 151 neuroblastomas. Analysis of HiC data determined the presence of candidate target genes interacting with mutated regions. An excess of somatic mutations in aCREs of diverse genes were identified, including IPO7, HAND2, and ARID3A. CRISPR-Cas9 editing was utilized to assess the functional consequences of mutations in the IPO7-aCRE. Patients with noncoding mutations in aCREs showed inferior overall and event-free survival independent of age at diagnosis, stage, risk stratification, and MYCN status. Expression of aCRE-interacting genes correlated strongly with negative prognostic markers and low survival rates. Moreover, a convergence between the biological functions of aCRE target genes and transcription factors with mutated binding motifs was associated with embryonic development and immune system response. Overall, this strategy enabled the identification of somatic mutations in regulatory elements that collectively promote neuroblastoma tumorigenesis.

Significance:

Assessment of noncoding cis-regulatory variants and long-range interaction data highlight the combined effect of somatic mutations in regulatory elements in driving neuroblastoma.

In recent years, there has been growing interest in the role of noncoding variants as cancer drivers. Noncoding SNVs in the promoters of TERT, FOXA1, PLEKHS1, WDR74, SDHD (1–3), and a cis-regulatory element of ETV1 (4) have been identified as driver mutations. We have recently demonstrated that somatic SNVs do not occur randomly across noncoding regulatory regions, but rather target the binding sites of a specific set of tumorigenic transcription factors (TF). These suggest that the combination of mutations within a set of regulatory regions acting in a network can drive cancer initiation (5). So far, no other study has investigated the pathogenic role of noncoding SNVs in neuroblastoma. Despite these recent advances, it remains a challenge to identify noncoding cancer driver mutations and assess their pathogenic role in cancer development.

A common approach to prioritizing somatic noncoding SNVs is the identification of genomic regions with a high mutation frequency in different cancer samples. However, distinguishing driver from passenger mutations is challenging since mutation rates can be affected by replication timing (6) and chromatin conformation (7). Moreover, other factors, such as the complexity of the human genome, the large cohorts needed to reach statistical power, and the sheer number of somatic mutations diminish the chances of obtaining significant results after correcting for multiple hypotheses testing. In this regard, cis-regulatory elements (CRE) may represent a highly enriched subset of genomic regions enabling the identification of such mutations. In addition, CREs are characterized by an unusually low mutation rate compared with other genomic regions (3, 4). Thus, analyses of CREs may reduce the risk of obtaining false-negative or false-positive results.

In recent years, deep sequencing of coding DNA has contributed greatly to improving the understanding of the complex picture of genetic variation in cancer. Indeed, we now know that a compendium of rare subclones may be responsible for drug resistance, invasion, metastasis, and relapse (8). However, only a few studies have used a targeted deep sequencing (TD-Seq) approach to search for somatic mutations in CREs. Notably, a recent study on breast cancer reported recurrent mutations in the promoter of the cancer driver gene FOXA1 by deep sequencing of regulatory elements (2).

Given the above observations, we believe that focusing on CREs and using TD-Seq could overcome the limitations of noncoding driver analysis and lead to the identification of recurrently mutated regions that regulate tumorigenic genes.

Neuroblastoma is an aggressive pediatric cancer originating from neural crest cells of the sympathetic nervous system. The overall 5-year survival rate in high-risk patients is lower than 40%, despite decades of considerable international efforts to improve outcomes (9). In young children, MYCN amplification and other chromosomal arm-level alterations such as deletion of 1q (30%) and 11q (45%) and unbalanced gain of 17q (60%) are reported as poor prognostic features (10). Children older than 6 years present unique structural variants, with 19p loss and 1q gain among the most frequent (11). Recently, we and others highlighted the paucity of recurrent somatic coding mutations in primary neuroblastoma, with activating mutations in ALK and inactivating mutations in ATRX as the most reported commonly (12, 13). Moreover, rearrangements activating the TERT locus (14) and mutations affecting the RAS and TP53 pathways occur in high-risk neuroblastoma (15). Recently, large-scale studies have shown that disease-associated genomic variations target DNA regulatory elements (16). Our genome-wide association studies (GWAS) have revealed that many neuroblastoma susceptibility variants lie in noncoding regions of the genome (17) and affect the expression of cancer genes such as LMO1 (18), BARD1 (19, 20), LIN28B (21), and SLC16A1 (22), which are involved in neuroblastoma tumorigenesis. Together, these studies suggest that an increased understanding of genomic alterations could have an impact on patient prognosis and response to therapies. Nevertheless, further efforts should be made to shed light on the molecular events driving neuroblastoma initiation and progression.

In this study, we proposed an alternative method to identify regulatory driver mutations in neuroblastoma. First, we defined CREs in neuroblastoma by analysis of H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq) peaks common to 25 neuroblastoma cell lines. Second, we performed TD-Seq of CREs in the genome of 56 neuroblastomas and matched normal samples. Third, we tested these regions for an excess of somatic mutations using a global approach that corrects the mutation rate for replication time, in search of noncoding driver mutations. Fourth, we validated the mutated CREs using independent whole genome sequencing (WGS) data from 151 neuroblastomas. Fifth, we experimentally validated a set of noncoding mutations to determine their biological significance. Finally, we verified whether the somatic mutations significantly enriched in CREs could collectively affect specific biological processes in neuroblastoma cells.

Analysis of public ChIP-seq data and sequencing target selection

H3K27ac ChIP-seq data from neuroblastoma cell lines, deposited in NCBI Gene Expression Omnibus with accession no. GSE90683 (23), were downloaded and reanalyzed as described in ref. 22. To select targets for our sequencing panel, we used the BEDTools suite and proceeded as follows: first, we reduced the size of peaks larger than 1,100 bp to 80% of their length (40% upstream and downstream); second, we obtained the intersections of H3K27ac peaks, and short common regions were extended to reach 301 bp; finally, we merged overlapping, book-ended and close regions (up to 100 bp) to get the final set of intervals (n = 13,667; target length = 9,804,818 bp).

To evaluate the tissue-specificity of the selected CREs, we compared the set of 25 neuroblastoma cell lines used to build our sequencing panel with two independent datasets: a set of 19 ENCODE cell lines (GSE29611) and a set of 59 cancer cell lines (GSE143653) published by Gopi and colleagues (24). Both sets were profiled by H3K27ac ChIP-seq. In this analysis, we evaluated genome-wide acetylation-based similarities by performing pairwise Pearson correlations of the ChIP-seq peak files and, in a subsequent analysis, assessed the acetylation levels of our selected peaks. For each of our CREs, we first obtained a mean acetylation level in all cell lines examined (n = 103); we then averaged the acetylation values for cell type and assessed the statistical significance of the differences using the Mann–Whitney test.

Sample collection

Neuroblastoma tumor DNA (from primary tumors) and matched germline DNA (from peripheral blood) were obtained from the Italian Neuroblastoma Registry and Hospital Sant Joan de Déu. Primary tumor samples were verified to have >75% viable tumor cell content by histopathology assessment. This study was approved by the Ethics Committee of the University of Naples Federico II.

Detection of somatic mutations

TD-Seq of 56 normal–primary neuroblastoma sample pairs was performed using an Illumina HiSeq1500 platform. Details of DNA extraction and library preparation are reported in Supplementary Information. The sequencing produced paired-end reads of 150 bp. Mapping BAM files were obtained using BWA and SAMTools by aligning the reads versus the GRCh37/hg19 reference genome assembly. Somatic SNVs and small insertions and deletions (INDEL) were detected using MuTect and Strelka, respectively. Functional annotation was performed using ANNOVAR and FunSeq2.

Filtering of somatic mutations

Poor quality raw somatic mutation calls were removed. We eliminated somatic mutations falling within duplicated genomic regions and filtered out off-targets and common polymorphisms (minor allele frequency >1%) by using allele frequencies of non-Finnish European populations from the 1,000 Genomes Project, ExAC and gnomAD databases. Finally, we further analyzed only those somatic SNVs falling within DNase hypersensitive sites (DHS) of the SK-N-SH neuroblastoma cell line (ENCODE v3), as DNA regions characterized by H3K27ac and DHS signal peaks are considered to be transcriptionally active CREs (aCRE). We used this set of DHSs from the SK-N-SH neuroblastoma cell line for our analyses as we have already demonstrated its specificity for neuroblastoma (5).

Mutational enrichment in active CREs

After applying stringent quality controls and filters, the selected SNVs/INDELs were analyzed to distinguish driver mutations from passengers, using a global approach as described previously (3, 4). In brief, with this approach, we searched for an excess of noncoding mutations in aCREs, assuming that the observed number of tumor samples mutated in any specific region follows a binomial distribution. This is dependent on the background mutation frequency that, in turn, depends on the effective length of the region and other genomic characteristics, such as replication timing and chromatin accessibility. Significantly mutated aCREs were indeed identified by accounting for these factors. Computed P values were adjusted for multiple testing using the Benjamini–Hochberg method; a detailed description is reported in Supplementary Information. aCREs were considered as significantly mutated if they passed the following selection criteria: (i) FDR ≤ 0.05, or (ii) FDR < 0.15 and mutated in the independent validation set of 151 WGS samples described below.

WGS validation data sets

To validate the presence of somatic mutations in aCREs, we analyzed 14 primary neuroblastoma whole genomes published previously (5) and used publicly available data for 137 whole genomes from the TARGET neuroblastoma project, to which we had obtained authorized access (accession no. phs000218.v21.p7; Project ID: #14831; ref. 25). Here, we searched for somatic mutations by extending the length of each aCRE by 50% on each side.

HiC data analysis

As detailed in Supplementary Information, sequencing was performed on an Illumina HiSeq platform. Paired-end reads of 150bp were mapped to the reference genome (build hg19/GRCH37) using Bowtie2.

In vitro functional assays

Neuroblastoma cell lines SK-N-BE2 (ATCC #CRL-2271; RRID: CVCL_0528) and SK-N-SH (ATCC #HTB-11; RRID: CVCL_0531) were purchased from the ATCC and cultured following the manufacturer's guidelines; cells were passaged up to five times after thawing. All cell lines were routinely tested for Mycoplasma contamination using the Universal Mycoplasma Detection Kit (ATCC, 30–1012K).

In brief, we first performed luciferase reporter gene assays to test the regulatory activity of the wild-type and mutated aCRE (IPO7-aCRE). Next, INDELs mimicking the selected somatic mutations were introduced into the SK-N-BE2 cell line by CRISPR-Cas9 editing; wild type and edited clones were validated by the Sanger method. Electropherograms were analyzed using the web tool synthego (ice.synthego.com). Finally, expression levels of potential aCRE target genes were analyzed using qPCR in edited and wild-type SK-N-BE2 cells. Details are available in Supplementary Information.

Gene expression analysis, sample clustering, and survival analysis of patients with neuroblastoma

The R2: Genomics Analysis and Visualization Platform (http://r2.amc.nl) was applied to query transcriptomic data of 498 neuroblastoma samples (GSE62564). We used the k-means clustering algorithm to divide samples into two groups based on expression levels of genes that (according to the HiC results) significantly interacted with mutated aCREs. In particular, we ran three separate k-means clustering analyses as follows: three interacting genes for IPO7-aCRE, three interacting genes for HAND2-aCRE, and 24 interacting genes for ARID3A-aCRE. For each gene, median expression was used to divide samples into two groups; clinical features and survival probabilities were then compared between the high and low expression groups (gene expression above or below the median value, respectively). We used the Mann–Whitney test to compare gene expression levels, Chi-square test to compare clinical parameters, and log-rank test to compare survival probabilities. Statistical significance was set at 5%. As described previously (26), multivariate Cox proportional regression analysis was performed to evaluate the prognostic significance of the noncoding mutational burden and currently used prognostic factors, such as age at diagnosis (>18 months vs. <18 months), International Neuroblastoma Staging System (INSS) stage (stages 4 vs. stages 1, 2, 3, and 4s), MYCN status (amplified vs. not amplified), and risk group stratification. HRs and 95% confidence interval for survival rate were calculated.

Transcription factor binding motif analysis

We conducted a motif analysis with the R-Bioconductor package “motifbreakeR” to assess whether the somatic variants within the selected aCREs disrupted or created TF binding motifs. The ENCODE database of positional weighted matrices (PWM) was used as a reference. Motif breaks or gains were considered neutral if the difference of binding efficiency was within ±0.2.

Ethics approval and consent to participate

This study was approved by the Ethics Committee of the University of Naples Federico II. Written informed consent was obtained from all patients or their legal guardians.

Availability of data and materials

The sequencing data of tumor samples generated and analyzed during this study are available at the NCBI SRA under the accession no. PRJNA773618. Germline sequencing data are available from the corresponding author on reasonable request. Public data and data repositories are referenced within the manuscript.

Genome-wide map of CREs

CREs such as promoters and enhancers located within accessible chromatin are commonly identified by the acetylation of lysine 27 on histone 3 (H3K27ac). To define CREs in neuroblastoma, we identified H3K27ac ChIP-seq peaks (GSE90683) shared by 25 neuroblastoma cell lines (see Materials and Methods and Supplementary Information) representing three different tumor identities: sympathetic noradrenergic, neural crest cell-like, and mixed type (23). As outlined in Supplementary Fig. S1, we selected 13,437 common H3K27ac peaks with a mean length of around 730 bp (min = 300, max = 8,265, Fig. 1A); this represented our target (approximately 9.8 Mb) for TD-Seq of 56 tumor-normal sample pairs (Supplementary Table S1).

Figure 1.

Characteristics of the selected somatic variants. A, Box plot showing the median length of the targeted regions (Kb). B, Box plot showing the median number of filtered somatic variants per sample. C, Stacked bar plot reporting the location of raw and filtered somatic variants. D, Bar plot with the median expression of all “ALL” genes in the RNA-seq dataset of neuroblastoma cell lines (GSE90683) compared with the median expression of aCREs mid-range (up to 0.25 Mb) and long-range genes (up to 0.5 Mb). Gene expression data were downloaded and analyzed as processed files containing log2-transformed fragments per kilobase of exon per million mapped fragments (FPKM) values. E, Box plot showing the median levels of CADD pathogenicity scores for raw (2.02) and filtered (8.60) somatic variants. F, Box plot showing the median levels of FunSeq2 pathogenicity scores for raw (0.003) and filtered (1.27) somatic variants. G, Lollipop plot reporting the mutated genes with number and type of somatic exonic variants. Gene names written in red indicate known cancer genes. ***, P < 0.0001. P value was calculated by Mann–Whitney test.

Figure 1.

Characteristics of the selected somatic variants. A, Box plot showing the median length of the targeted regions (Kb). B, Box plot showing the median number of filtered somatic variants per sample. C, Stacked bar plot reporting the location of raw and filtered somatic variants. D, Bar plot with the median expression of all “ALL” genes in the RNA-seq dataset of neuroblastoma cell lines (GSE90683) compared with the median expression of aCREs mid-range (up to 0.25 Mb) and long-range genes (up to 0.5 Mb). Gene expression data were downloaded and analyzed as processed files containing log2-transformed fragments per kilobase of exon per million mapped fragments (FPKM) values. E, Box plot showing the median levels of CADD pathogenicity scores for raw (2.02) and filtered (8.60) somatic variants. F, Box plot showing the median levels of FunSeq2 pathogenicity scores for raw (0.003) and filtered (1.27) somatic variants. G, Lollipop plot reporting the mutated genes with number and type of somatic exonic variants. Gene names written in red indicate known cancer genes. ***, P < 0.0001. P value was calculated by Mann–Whitney test.

Close modal

Having identified a set of CREs in neuroblastoma, we asked if those regions could be tissue-specific. To answer this question, we compared the set of 25 neuroblastoma cell lines with two independent datasets, both profiled by H3K27ac ChIP-seq. The first dataset included 19 ENCODE cell lines (GSE29611) and the second included 59 cancer cell lines (GSE143653; ref. 24).

First, we evaluated the genome-wide, acetylation-based similarities by performing pairwise Pearson correlations of ChIP-seq peak files (Supplementary Fig. S2A). When neuroblastoma cell lines only were compared, we observed a mean Pearson correlation coefficient of 0.623. In contrast, the comparison of neuroblastoma cell lines with other cell types resulted in correlation coefficients ranging from 0.259 to 0.331 (Supplementary Fig. S2B). This indicated that the H3k27ac profiles of neuroblastoma cells differed substantially from other cell types.

Subsequently, we wanted to compare the acetylation levels of our selected CREs with the datasets of the cell lines described above (GSE29611 and GSE143653). Overall, we found significantly higher levels of acetylation of the selected CREs in neuroblastoma cells compared with other cell types (P < 2.0 × 10–16, Supplementary Fig. S2C). These results confirmed the hypothesis that these CREs could be highly tissue-specific.

Somatic mutations within active (a)CREs

We collected high-quality sequencing data, discarding 1.94% of reads on average due to low quality (Supplementary Fig. S3A). The percentage of bases with quality scores above 20 (Q20) and 30 (Q30) was 95.91% and 90.79%, respectively (Supplementary Figs. S3B–S3C). After alignment, a depth of coverage of 459× (Supplementary Fig. S3D) for 99.63% of the target was obtained (Supplementary Fig. S3E). Furthermore, the fraction of the target covered with at least 10, 20, and 50 reads was on average 96.24%, 92.89%, and 85.31%, respectively (Supplementary Figs. S3F–S3H). This allowed us to detect a highly reliable set of raw somatic variant calls. Indeed, we identified a total of 83,962 somatic SNVs/INDELs mapping mainly to intronic (36.80%) and intergenic (55.14%) regions (Supplementary Fig. S3I).

After filtering raw somatic mutations (see Materials and Methods and Supplementary Fig. S1), we retained a total of 1,296 SNVs/INDELs (median per sample = 10; Fig. 1B; Supplementary Table S2) localized in open chromatin regions termed aCREs. Our filtered somatic variants were located mainly in regions closer to genes (such as exonic, ncRNA exonic, UTR5, and upstream) when compared with the raw somatic variants (Fig. 1C). Indeed, whereas filtering sharply reduced intergenic mutations from 52% (raw variants) to 7.10% (filtered variants), the percentage of intronic mutations remained stable (42.50% and 43.60%, respectively). Furthermore, among the filtered variants, we observed an increased proportion of UTR (15.70% vs. 1.40%), exonic (11.60% vs. 0.84%), and upstream/downstream variants (21.90% vs. 3.30%) compared with the raw somatic variants (Fig. 1C). Together, these data support the hypothesis that our selected somatic variants could affect DNA regions (aCREs) regulating the transcription of neighboring genes.

On the basis of this hypothesis, we inquired whether gene expression could be influenced by neighboring aCREs. Overall, we found that the presence of the identified aCREs correlated with increased expression of mid-range (up to 0.25 Mb away, mean = 2.95) and long-range genes (up to 0.5 Mb away, mean = 2.82) when we analyzed RNA-seq data from the same dataset (GSE90683) of 25 neuroblastoma cell lines (P < 1.0 × 10–04, Fig. 1D).

As reported in Fig. 1E and F, our selected somatic variants showed significantly higher pathogenicity, assessed with CADD (27) and FunSeq2 (28) scores, when compared with the raw set of somatic variants (P < 2.2 × 10–16). Overall, 72 variants fell within coding regions. We found 60 missense SNVs (83.33%) and 12 truncating SNV/INDELs (16.66%, Fig. 1G). However, only three known cancer genes were involved (reported in the COSMIC catalog): ARID1A, ARID1B, and GATA3 (Fig. 1G; Supplementary Table S2). Somatic mutations of the chromatin remodelers ARID1A and ARID1B have been already reported in neuroblastoma (29). These data further supported the reliability of our results.

Mutational enrichment in aCREs

To assess whether aCREs could harbor more somatic variants than expected, we tested these regions for an excess of noncoding mutations as detailed in Materials and Methods. In total, 1,151 individual aCREs contained at least one mutation (Supplementary Table S3). Overall, after the enrichment analysis, three aCREs were considered for further investigation: one showing significant enrichment for mutations and FDR <0.05 and two with FDR <0.15 and mutated in the validation set consisting of 151 WGS (Table 1).

Table 1.

Recurrently mutated aCREs in 56 neuroblastoma samples.

aCRE regionClosest genesRegionTarget (bp)Mutations TD-Seq (N)P valueFDRMutations WGS (N)
chr11:9385430–9386340 TMEM41B (dist = 49770), IPO7 (dist = 20084) Intergenic 910 4 3.70E05 0.043 0 
chr4:174440675–174448940 SCRG1 (dist = 125307), HAND2 (dist = 1728) Intergenic 8,265 3 3.65E04 0.114 2 
chr6:157099410:157101080 ARID1B Exonic 1,670 5.79E−04 0.114 
chr2:198379730–198381565 HSPE1-MOB4 Intronic 1,835 6.69E−04 0.114 
chr19:925145–927105 ARID3A Intronic 1,960 3 6.69E04 0.114 2 
chr2:70370175–70370560 LINC01816 (dist = 17809), C2orf42 (dist = 6760) Intergenic 385 7.68E−04 0.114 
chr11:66085170–66085700 CD248 Upstream 530 7.81E−04 0.114 
chr16:28874330–28875755 SH2B1 Upstream 1,425 7.94E−04 0.114 
chr12:110905620–110907360 GPN3, FAM216A Intronic 1,740 9.42E−04 0.120 
aCRE regionClosest genesRegionTarget (bp)Mutations TD-Seq (N)P valueFDRMutations WGS (N)
chr11:9385430–9386340 TMEM41B (dist = 49770), IPO7 (dist = 20084) Intergenic 910 4 3.70E05 0.043 0 
chr4:174440675–174448940 SCRG1 (dist = 125307), HAND2 (dist = 1728) Intergenic 8,265 3 3.65E04 0.114 2 
chr6:157099410:157101080 ARID1B Exonic 1,670 5.79E−04 0.114 
chr2:198379730–198381565 HSPE1-MOB4 Intronic 1,835 6.69E−04 0.114 
chr19:925145–927105 ARID3A Intronic 1,960 3 6.69E04 0.114 2 
chr2:70370175–70370560 LINC01816 (dist = 17809), C2orf42 (dist = 6760) Intergenic 385 7.68E−04 0.114 
chr11:66085170–66085700 CD248 Upstream 530 7.81E−04 0.114 
chr16:28874330–28875755 SH2B1 Upstream 1,425 7.94E−04 0.114 
chr12:110905620–110907360 GPN3, FAM216A Intronic 1,740 9.42E−04 0.120 

Note: In bold: the top significant aCRE and the validated aCREs with somatic mutations found in WGS data set. Chromosomal locations and closest genes (as upstream, downstream) are also provided.

By using our in-house generated HiC data for the SK-N-BE2 cell line, we mapped aCREs to their candidate target genes and calculated the intensity and significance of genomic interactions within a window of 2 Mb around the aCREs (see Materials and Methods).

The most significantly mutated aCRE (Table 1; Fig. 2AI), a 910 bp sequence mapping to chr11:9385430–9386340 and carrying four mutations, was located between TMEM41B (distance = 49,770 bp) and IPO7 (distance = 20,084 bp). This aCRE was designated IPO7-aCRE. As reported in Supplementary Table S4, IPO7-aCRE interacted significantly with DENND5A, TMEM41B, and IPO7. As described in the literature, IPO7 expression is regulated positively by c-Myc and negatively by p53 (30). Furthermore, we confirmed that IPO7 is a target of MYCN (ChIP-seq data in Fig. 2I), whereas high expression is correlated with MYCN amplification (P = 1.60 × 10–26), as shown in Supplementary Fig. S4. Interestingly, the topologically associating domain (TAD) containing IPO7-aCRE also contained two genes showing weak and not significant interactions, DENND2B and DENND7B; together with DENND5A, these genes belong to the DENN domain family of regulators of Rab GTPases (Supplementary Table S4).

Figure 2.

Genomic interactions for the IPO7-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr11:9385430-9386340 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Minus log10-transformed P value. F, The aCRE region of 910 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the IPO7 gene, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), the MYCN ChIP-seq data of neuroblastoma cell lines in GSE80151 (blue), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Figure 2.

Genomic interactions for the IPO7-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr11:9385430-9386340 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Minus log10-transformed P value. F, The aCRE region of 910 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the IPO7 gene, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), the MYCN ChIP-seq data of neuroblastoma cell lines in GSE80151 (blue), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Close modal

The second aCRE (Table 1; Fig. 3AI), an intergenic region of 8,265 bp located at chr4:174440675–174448940 and harboring five mutations, was found to interact significantly with both HAND2 and HAND2 antisense RNA (HAND2-AS1). This aCRE was designated HAND2-aCRE, and, importantly, is a super-enhancer known to regulate HAND2 expression (23). We also found strong and significant interactions with two ncRNAs (LINC02269 and LINC02268) and two protein-coding genes: FBXO8 and CEP44 (Supplementary Table S5). Notably, we observed that all of the significant interactions were on the downstream of the aCRE and involved the majority of the HAND2 TAD.

Figure 3.

Genomic interactions for the HAND2-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr4:174440675–174448940 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Log10-transformed P value. F, The aCRE region of 8,265 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the HAND2 gene and its antisense, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Figure 3.

Genomic interactions for the HAND2-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr4:174440675–174448940 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Log10-transformed P value. F, The aCRE region of 8,265 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the HAND2 gene and its antisense, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Close modal

The third aCRE (Table 1; Fig. 4AI), a 1,960 bp ARID3A intronic region located at chr19:925145–927105 and harboring five mutations, mapped within a telomeric region characterized by high gene density. This aCRE, which we designated ARID3A-aCRE, interacted significantly with ARID3A, a direct TP53 effector, and with 23 additional genes (Supplementary Table S6) mainly involved in the immune response, as determined by functional enrichment analysis (Supplementary Fig. S5).

Figure 4.

Genomic interactions for the ARID3A-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr19:925145–927105 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Log10-transformed P value. F, The aCRE region of 1,960 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the ARID3A gene, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Figure 4.

Genomic interactions for the ARID3A-aCRE. The figure reports genomic interactions from the aCRE point of view. Tracks are named from top to bottom. A, Genomic coordinates (hg19). B, The interaction matrix for the chr19:925145–927105 region extended 1Mb up- and downstream. Genomic coverage is reported in bins of 10 Kbs. Black bordered triangles represent TADs. C, TAD boundaries. D, Normalized number of interactions. E, Log10-transformed P value. F, The aCRE region of 1,960 bp. G, The cluster of somatic mutations falling in the aCRE and its extended ends. H, RefSeq genes (we plotted the longest isoform for each gene). I, Zoom-in showing 100 Kbs up- and downstream the aCRE. Here, in the listed order, we show the aCRE, the somatic mutations, the ARID3A gene, the DHS sites of the SK-N-SH neuroblastoma cell line in purple (ENCODE v3), and the H3K27ac data of neuroblastoma cell lines (GSE90683 and GSE65664).

Close modal

The mutational burden of aCREs is an independent prognostic factor

To assess the prognostic value of noncoding somatic mutations in the nine aCREs reported in Table 1, we categorized the 56 patients into two groups: the first harboring noncoding somatic mutations in at least one of the nine aCREs and the second without noncoding somatic mutations in those regions. By this analysis, we identified 17 patients showing the noncoding mutational burden and 39 patients without the noncoding mutational burden. For both groups, we tested the association with unfavorable clinical markers (Stage 4, MYCN amplification, age at diagnosis >18 months, high-risk), overall survival (OS), and event-free survival (EFS) probabilities. We found that patients with the noncoding mutational burden had a worse OS and EFS when compared with patients without the noncoding mutational burden (P < 2.0 × 10–03, Fig. 5A). On multivariable analysis, the noncoding mutational burden persisted as a prognostic factor independent of age, stage, risk stratification, and MYCN status (Supplementary Table S7). Although patients bearing a higher burden of mutations were more frequently observed in groups showing unfavorable clinical markers, no statistically significant correlations were found between these two parameters (Supplementary Fig. S6).

Figure 5.

Somatic variants at IPO7-aCRE affect its regulatory activity and the expression of neighboring genes. A, OS and EFS of patients with neuroblastoma with mutational burden (Mut; n = 17) and without mutational burden (Wt; n = 39). Statistical significance was calculated with log-rank test. B, Bar plot showing the results of the luciferase reporter assay carried out in SK-N-BE2 and SK-N-SH neuroblastoma cell lines. Luciferase activity is reported as the fold change over the promoterless IPO7-aCRE (pgl3-promoter vector). Data are shown as mean ± SD from nine independent transfection experiments, each performed in triplicate. C, Sanger sequencing traces showing the Cas9 cutting site in proximity of chr11:9386084:C>G variant (top line) and subsequent waveform decomposition in edited cells. Black line, gRNA sequence; red box, PAM sequence. D, Bar plot representing the INDEL distribution in SK-N-BE2 edited clone #3. E, Bar plot representing the INDEL distribution in SK-N-BE2 edited clone #2. F, Quantitative PCR measurements of mRNA (IPO7, TMEM41B, DENND5A) in SK-N-BE2 wild-type and edited clones (gRNA 13 #2 and gRNA13 #3; °, P < 0.05; *, P < 0.01; **, P < 0.001). P value was calculated by t test.

Figure 5.

Somatic variants at IPO7-aCRE affect its regulatory activity and the expression of neighboring genes. A, OS and EFS of patients with neuroblastoma with mutational burden (Mut; n = 17) and without mutational burden (Wt; n = 39). Statistical significance was calculated with log-rank test. B, Bar plot showing the results of the luciferase reporter assay carried out in SK-N-BE2 and SK-N-SH neuroblastoma cell lines. Luciferase activity is reported as the fold change over the promoterless IPO7-aCRE (pgl3-promoter vector). Data are shown as mean ± SD from nine independent transfection experiments, each performed in triplicate. C, Sanger sequencing traces showing the Cas9 cutting site in proximity of chr11:9386084:C>G variant (top line) and subsequent waveform decomposition in edited cells. Black line, gRNA sequence; red box, PAM sequence. D, Bar plot representing the INDEL distribution in SK-N-BE2 edited clone #3. E, Bar plot representing the INDEL distribution in SK-N-BE2 edited clone #2. F, Quantitative PCR measurements of mRNA (IPO7, TMEM41B, DENND5A) in SK-N-BE2 wild-type and edited clones (gRNA 13 #2 and gRNA13 #3; °, P < 0.05; *, P < 0.01; **, P < 0.001). P value was calculated by t test.

Close modal

Somatic variants in aCRE can alter the expression of candidate target genes

To assess the functional role of the noncoding mutations in affecting the expression of neighboring genes, we experimentally tested the four IPO7-aCRE somatic mutations (Table 1). We reasoned that the results provided by this functional validation could serve as proof-of-concept for the functional biological effects of aCREs.

Initially, we performed functional luciferase reporter gene assays for each of the selected mutations in SK-N-BE2 and SK-N-SH neuroblastoma cell lines and found that three out of four mutations significantly influenced the transcriptional activity of IPO7-aCRE (Fig. 5B). The somatic mutation at chr11:9386084 (C>G) showing the most significant effect on regulatory activity was chosen for designing the single guide RNA sequences (Supplementary Table S8). CRISPR-Cas9 editing was then used to introduce INDELs mimicking the chr11:9386084:C>G variant in the SK-N-BE2 cell line (Fig. 5C). Sanger sequencing and waveform decomposition analysis by the web tool synthego (ice.synthego.com; ref. 31) revealed that 13 of 38 total clones were edited (34%). In particular, clone #3 revealed a 1 bp deletion at chr11:9386084 (C>-), demonstrating 67% editing efficiency (Fig. 5D), whereas clone #2 exhibited a deletion of 4 bp at chr11:9386081–9386084 (CACC>-), with an INDEL percentage of 74% (Fig. 5E). In both of the edited clones, we observed significantly reduced mRNA expression for the interacting genes IPO7, TMEM41B, and DENND5A (Fig. 5F; Supplementary Table S4).

Previous studies indicated that depletion of IPO7 (32), TMEM41B (33, 34), and DENND5A (35) can affect normal neuronal development. As neuroblastoma originates from the neural crest (36), we went on to compare expression levels of IPO7, TMEM41B, and DENND5A in public gene expression datasets of neural crest (GSE14340), adrenal medulla (GSE7307), and normal tissues (GSE64985). All three genes were expressed at significantly higher levels in neural crest cells when compared with adrenal glands and normal tissues (P < 0.05, Supplementary Fig. S7), suggesting that they may play a role in neural crest development. Using a public dataset of 498 neuroblastomas profiled by RNA-seq (GSE62564), elevated expression of IPO7 and TMEM41B genes was confirmed; moreover, increased expression correlated with aggressive tumor phenotypes.

However, increased expression of DENND5A was found only when comparing stage 4 with stage 4s tumors (Supplementary Figs. S8A and S8B; Supplementary Table S9). Many of the processes underlying the formation and migration of neural crest can also play critical roles in cancer progression (37). This may explain why elevated expression of these genes, which appear to be involved in neural crest development, is associated with neuroblastomas that are clinically more aggressive.

To further validate our experimental approach, we measured the expression of three further genes (LMO1, RPL27A, NRIP3) that showed low normalized interaction with IPO7-aCRE (see Fig. 2AI; Supplementary Table S4). As expected, we did not find significant alterations of gene expression in edited cells (Supplementary Fig. S9).

Finally, we found that chr11:9386084:C>G was predicted to alter the binding motifs of several TFs, including RUNX2 and TFAP2A, that are known to be critical for some steps of neural crest development (Supplementary Fig. S10; Supplementary Table S10; refs. 38, 39). Together, these results suggest that decreased expression of IPO7, TMEM41B, and DENND5A resulting from somatic mutations in cis-regulatory elements can promote tumorigenesis by altering normal neuronal cell development.

Aberrant expression of aCRE target genes correlates with markers of poor outcome

Our functional studies strongly supported a protumorigenic role for somatic mutations in the IPO7-aCRE sequence. We therefore wanted to assess whether the expression levels of genes interacting with the other aCREs identified as significantly mutated (HAND2-aCRE and ARID3A-aCRE) might also correlate with protumorigenic activity. We found that high levels of expression of the protein-coding gene HAND2 were inversely associated only with MYCN amplification (P = 1.94 × 10–07). However, strikingly, high levels of CEP44 and FBXO8 expression correlated strongly with markers of poor outcome (MYCN-amplified, stage 4, high-risk tumors; P ≤ 6.87 × 10–04) and worse EFS and OS probabilities (P ≤ 2.45 × 10–08, Supplementary Figs. S11A and S11B; Supplementary Table S11). In contrast, lower expression of many of the 24 genes interacting with ARID3A-aCRE was statistically associated with markers of poor outcome (MYCN-amplified: 10/24 genes (P ≤ 4.17 × 10–02); stage 4: 9/24 genes (P ≤ 4.17 × 10–02); age at diagnosis: 11/24 genes (P ≤ 1.34 × 10–02); high-risk tumors: 9/24 genes (P ≤ 6.56 × 10–03); and worse EFS and OS probabilities (7/24 genes; P ≤ 4.80 × 10–02, see Supplementary Fig. S12; Supplementary Fig. S13; Supplementary Table S12).

Finally, we wanted to verify the association of combined gene expression within each of the three selected aCREs with clinical markers and patient survival. We thus performed k-means clustering of 498 neuroblastoma samples, based on expression levels of significantly interacting genes.

For genes interacting with IPO7-aCRE (Fig. 2AI), we found that the k-means Group 1 (n = 216) showed higher expression when compared with Group 0 (n = 282; Supplementary Fig. S14A; Supplementary Fig. S15A; P = 5.96 × 10–15). As reported in Supplementary Fig. S14B, patients in Group 1 strongly correlated with poor outcome markers (MYCN-amplification, P = 7.79 × 10–10; stage 4, P = 5.25 × 10–14; high-risk, P = 6.92 × 10–15) and showed worse EFS and OS probabilities (Supplementary Fig. S14C and S14D, P = 3.0 × 10–06 and P = 4.0 × 10–09, respectively).

For HAND2-aCRE (see Fig. 3AI), we identified two groups characterized by low (Group 0; n = 479) and high (Group 1; n = 19) gene expression levels, as shown in Supplementary Fig. S14E; Supplementary Fig. S15B (P = 1.77 × 10–13). Group 1 strongly correlated with markers of neuroblastoma aggressiveness (Supplementary Fig. S14F), such as MYCN amplification (P = 2.26 × 10–09), INSS stage 4 (P = 2.64 × 10–04), and high-risk (P = 1.72 × 10–05). Furthermore, Group 1 patients presented inferior EFS and OS compared with Group 0 (Supplementary Figs. S14G and S14H, P = 4.0 × 10–08 and 2.0 × 10–16, respectively).

For ARID3A-aCRE, interacting with 24 genes (in Fig. 4AI), k-means clustering identified two groups characterized by low (Group 1; n = 241) and high (Group 0; n = 257) expression levels, as shown in Supplementary Figs. S14I and S15C (P = 6.99 × 10–13). Group 0 strongly correlated with markers of neuroblastoma aggressiveness (Supplementary Fig. S14J), such as MYCN amplification (P = 2.74 × 10–04), INSS stage 4 (P = 1.25 × 10–06), and high-risk patients (P = 7.51 × 10–08). Furthermore, Group 1 patients showed inferior EFS and OS compared with Group 0 (Supplementary Figs. S14K and S14L, P = 1.0 × 10–09 and 2.0 × 10–09, respectively).

Somatic variants in aCREs alter TF binding motifs

We conducted motif analysis using the R-Bioconductor package “motifbreakeR” to assess whether somatic variants within the selected aCREs disrupted or created TF binding motifs. The results showed that the selected variants (n = 14) altered the binding motifs of 118 TFs: 41, 27, and 50 motifs for IPO7-aCRE, HAND2-aCRE, and ARID3A-aCRE, respectively (Supplementary Table S10).

Convergence of biological processes of the genes affected by somatic regulatory variants

Next, Gene Ontology (GO) enrichment analysis was used to assess biological processes for the TFs showing altered motifs. For each mutated aCRE, the biological functions of the TFs with altered motifs were similar to those of the target genes (Fig. 6). Regarding the IPO7-aCRE, GO terms enriched in the list of TFs (including “protein localization to nucleus” and “embryonic organ development”) coincided with functions of IPO7, that is, nuclear import of proteins (40) and role in the late stage of neural ectoderm differentiation (32). HAND2 is a transcription factor involved in the development of heart (41), limb (42), and neural crest derivatives (43); TFs with functions relating to the development of these diverse tissues were overrepresented for HAND2-aCRE. Finally, the biological functions of TFs targeting ARID3A-aCRE were found to relate to the p53 signaling pathway, embryonic development, and differentiation of immune system cells. Accordingly, ARID3A is essential during early embryonic development (44) and is required for erythroid lineage differentiation and hematopoietic stem cell production (45, 46). Moreover, ARID3A cooperates with p53 to activate p21 (WAF1) transcription (47).

Figure 6.

GO of TFs with altered binding motifs in aCREs. Dot plot reporting the GO enrichment analysis of the biological processes involving the TFs harboring somatic mutations in their binding motifs. Dot color scale is dependent on the enrichment ratio, whereas dot sizes are proportional to the log10 FDR. Statistical significance (FDR ≤ 0.05) was calculated with hypergeometric test and Benjamini–Hotchberg correction. The lower part of the figure reports the biological functions of the aCRE target genes. Colored arrows highlight the convergence of biological processes between the aCRE target genes and TFs, with mutated binding motifs, regulating the same aCRE. The functional enrichment analysis was performed by using the web tool WEB-based GEne SeT AnaLysis Toolkit (www.webgestalt.org). The GO database of nonredundant biological processes was used.

Figure 6.

GO of TFs with altered binding motifs in aCREs. Dot plot reporting the GO enrichment analysis of the biological processes involving the TFs harboring somatic mutations in their binding motifs. Dot color scale is dependent on the enrichment ratio, whereas dot sizes are proportional to the log10 FDR. Statistical significance (FDR ≤ 0.05) was calculated with hypergeometric test and Benjamini–Hotchberg correction. The lower part of the figure reports the biological functions of the aCRE target genes. Colored arrows highlight the convergence of biological processes between the aCRE target genes and TFs, with mutated binding motifs, regulating the same aCRE. The functional enrichment analysis was performed by using the web tool WEB-based GEne SeT AnaLysis Toolkit (www.webgestalt.org). The GO database of nonredundant biological processes was used.

Close modal

Of note, the GO term “myeloid cell differentiation,” a biological process that is commonly altered in cancer (48), was identified in all three lists of TFs with altered binding motifs (Fig. 6).

Comparison of the biological functions overrepresented in the three lists of TFs with those of the aCRE target genes showed that the majority of the biological terms were related to immune response and embryonic development (Table 2). These suggest that the combined effect of noncoding cancer driver mutations is the alteration of gene sets involved in specific molecular mechanisms underlying neuroblastoma tumorigenesis.

Table 2.

Biological functions of genes affected by regulatory somatic mutations.

Transcription factors with motifs altered by SNVsaTarget genes of mutated aCREsb
Embryonic developmentImmune responseEmbryonic developmentImmune response
IPO7-aCRE Embryonic organ development, endocrine system development, animal organ formation Type 2 immune response, myeloid cell differentiation IPO7 (32DENND5A, cDENND2B, cDENND7B (35
   DENND5A (51 
   TMEM41B (33, 34, 55 
HAND2-aCRE Mesenchyme development, pericardium development, cell fate commitment, neural tube development, respiratory tube development, in utero embryonic development Myeloid cell differentiation, leukocyte differentiation HAND2 (41–43— 
ARID3A-aCRE Endocrine system development, animal organ formation, in utero embryonic development, muscle tissue development, coronary vasculature development Myeloid cell differentiation, interaction with symbiont, response to antibiotic ARID3A (44, 45), STK11 (61ARID3A (45, 46) Functional enrichment in 24 genes (Supplementary Fig. S4; Supplementary Table S6) 
Transcription factors with motifs altered by SNVsaTarget genes of mutated aCREsb
Embryonic developmentImmune responseEmbryonic developmentImmune response
IPO7-aCRE Embryonic organ development, endocrine system development, animal organ formation Type 2 immune response, myeloid cell differentiation IPO7 (32DENND5A, cDENND2B, cDENND7B (35
   DENND5A (51 
   TMEM41B (33, 34, 55 
HAND2-aCRE Mesenchyme development, pericardium development, cell fate commitment, neural tube development, respiratory tube development, in utero embryonic development Myeloid cell differentiation, leukocyte differentiation HAND2 (41–43— 
ARID3A-aCRE Endocrine system development, animal organ formation, in utero embryonic development, muscle tissue development, coronary vasculature development Myeloid cell differentiation, interaction with symbiont, response to antibiotic ARID3A (44, 45), STK11 (61ARID3A (45, 46) Functional enrichment in 24 genes (Supplementary Fig. S4; Supplementary Table S6) 

aThe two columns report the enriched biological functions in the TF sets.

bThe two columns report the name of target genes involved in embryonic development and immune response.

cThe DENND2B and DENND7B genes did not show significant interactions with the aCREs but localized in the same TAD and all belong to the DENN domain protein family.

Data records

Sequencing data for tumor samples generated and analyzed during this study are available at the NCBI SRA under the accession no. PRJNA773618. The sequencing target is available as Supplementary Table S13.

To determine the role of noncoding variants in driving tumorigenesis, we performed deep sequencing of a set of transcriptionally active CREs in 56 neuroblastoma and normal sample pairs. Through bioinformatics analyses and stringent filtering, we identified a reliable set of potentially pathogenic somatic SNVs located within aCREs. We observed that the majority of these mutations were located in noncoding regions (UTRs and introns) proximal to coding genes. These data suggested that somatic variants in aCREs can affect the expression of neighboring genes.

Mutational enrichment analysis revealed nine aCREs with a significantly higher rate of mutations. Interestingly, we observed that the mutational burden in these aCREs was strongly associated with inferior survival rates in patients with neuroblastoma, as previously reported for coding mutations (49). We also demonstrated that the prognostic value of noncoding mutation counts persists in the context of conventional risk factors including age, stage, MYCN amplification, and risk status. However, future studies on independent sets of patients are required to assess the recurrence of mutated aCREs and confirm the prognostic value of their mutational burden in neuroblastoma.

We decided to further investigate IPO7-aCRE (the most significantly mutated aCRE), in addition to two further aCREs showing further mutations in a set of 151 neuroblastoma WGS (HAND2-aCRE, and ARID3A-aCRE). IPO7-aCRE, a 910 bp sequence located on chromosome 11, showed significant interactions with IPO7, TMEM41B, and DENND5A. IPO7 is involved in the nuclear import and export of proteins required for ribosome biogenesis, is regulated positively by c-Myc and negatively by p53, and is upregulated in diverse tumors (39, 40). Furthermore, IPO7 is also a target of MYCN, and high expression correlates with MYCN amplification.

DENND5A is involved in the regulation of the Rab GTPase pathway (50), impairment of which can promote diverse diseases, including cancer (51). Notably, IPO7-aCRE also showed weak interaction with DENND2B and DENND7B, members of the same protein family. Moreover, DENND5A deficiency can lead to striking alterations in neuronal development (35) and to the upregulation of Trk receptors (TrkA and TrkB), which play a critical role in neuroblastoma development (52).

Interestingly, Transmembrane Protein 41B (TMEM41B) is required for autophagosome formation (53) and is mutated in pulmonary carcinoid tumors (54). Loss of function of TMEM41B is reported to induce neuronal phenotypes that are similar to survival motor neuron protein deficiency in vivo (33, 34). More recently, analysis of knockout mice showed that TMEM41B is also essential during embryonic development (55).

The second aCRE (HAND2-aCRE) was located in an 8,265 bp intergenic region on chromosome 4 and showed significant interaction with HAND2, HAND2-AS1, FBXO8, and CEP44. Heart and Neural Crest Derivatives Expressed 2 (HAND2) is highly expressed in neural crest-derived cells and encodes a transcription factor that is particularly important during neuronal development (56, 57), functioning to define the core regulatory circuitries of sympathetic noradrenergic cell identity. It also plays a key role in neuroblastoma tumorigenesis (23). Furthermore, HAND2-AS1 is downregulated in numerous cancer types, acting as a tumor suppressor by inhibiting the proliferation, migration, and invasiveness of cancer cells (58–60).

The third aCRE, ARID3A-aCRE, was a 1,960 bp intronic region of ARID3A, located on chromosome 19; this chromosomal region is characterized by high gene density. ARID3A-aCRE interacted significantly with ARID3A and the other 24 genes, the functions of which relate mainly to the immune response. AT-Rich Interaction Domain 3A (ARID3A) is a direct TP53 effector required for the normal development of many cell lineages. Furthermore, ARID3A-aCRE interacted strongly with STK11, a tumor suppressor required for normal embryonic development (61).

On the basis of these results, we propose that the candidate target genes for the aCREs identified in this study likely play key roles in tumor development. This hypothesis is further supported by our functional validation approach (luciferase assays and CRISPR-Cas9 genome editing in neuroblastoma cell lines). Indeed, we showed that mutations in IPO7-aCRE can alter the transcriptional activity and reduce the expression of target genes IPO7, TMEM41B, and DENND5A. Interestingly, these genes are highly expressed in normal neural crest cells, indicating that aberrant expression could cause impaired neuronal development and, thus, potentially, neuroblast transformation (33–36). Along with results demonstrating the prognostic value of the noncoding mutational burden, our data strongly support the hypothesis that mutations in aCREs have a biological role in altering the expression of genes involved in neuroblastoma tumorigenesis. However, additional molecular studies to characterize the mutated aCREs need to be carried out in future, for example, testing the effect of noncoding mutations on the DNA binding affinity of TFs.

To further assess the importance of aCREs in regulating the expression of surrounding genes and their potential involvement in neuroblastoma, we correlated gene expression with prognostic markers and patient survival using publicly available data for 498 neuroblastoma samples. We found that expression profiles of many of the IPO7-aCRE, HAND2-aCRE, and ARID3A-aCRE target genes, tested singly and in combination, were associated with markers of unfavorable prognosis and low survival rates.

Subsequently, we showed that somatic variants within the selected aCREs disrupted the binding motifs of 118 TFs and went on to perform GO term biological process analysis for these TFs. Variants in IPO7-aCRE were found to modify 41 motifs for TFs involved in the control of embryonic organ development and immune response, biological processes that are also linked to the functions of the interacting genes IPO7, TMEM41B, DENND5A, DENND2B, and DENND7B. Similarly, in HAND2-aCRE, alterations were seen in 27 binding motifs for TFs involved largely in embryonic development, that is, the main biological role of HAND2. Analogously, the 50 motifs altered in the ARID3A-aCRE were mainly linked with embryonic development and cell differentiation (44). Overall, we observed a convergence of biological functions between aCRE target genes and TFs targeting the same aCRE.

Taken together, the results of this work suggest that somatic mutations enriched in aCREs can collectively affect the functions of TFs and aCRE-regulated genes involved in the same molecular mechanisms. Interestingly, myeloid cell differentiation, a biological process that is commonly altered in cancer (48), was identified as an ontology term common to all three groups of TFs. In recent years, the crucial contribution of myeloid cells to the promotion of tumor angiogenesis, cell invasion, and metastasis has come to be appreciated. In cancer, myeloid precursors differentiate into potent immunosuppressive cells, known as myeloid-derived suppressor cells (62, 63), which negatively regulate antitumor immunity (64). Importantly, this immunosuppressive tumor microenvironment, mediated by myeloid-derived suppressor cells, is one of the immune escape pathways adopted by neuroblastoma cells. Targeting of myeloid-derived suppressor cells can potentiate the effect of checkpoint inhibitors for immunotherapy in human cancers, including high-risk childhood neuroblastoma (65).

In this work, we used an alternative approach to detect and study regulatory cancer driver mutations. Our strategy enabled us to identify mutated regulatory regions that may play an important role in regulating biological processes associated with tumor development and immune escape. Further studies will be necessary to ascertain the functional roles of genes that are involved in defining aggressive neuroblastoma phenotypes.

No disclosures were reported.

V.A. Lasorsa: Data curation, visualization, writing–original draft, writing–review and editing, bioinformatic analyses. A. Montella: Validation, investigation, in vitro functional assays. S. Cantalupo: Investigation, DNA samples collection and sequencing library preparation. M. Tirelli: Validation, investigation, writing–review and editing, in vitro functional assays. C. de Torres: Writing–review and editing, provided DNA samples of neuroblastoma patients. Carmen de Torres was deceased at the time of writing the manuscript. S. Aveic: Writing–review and editing, provided DNA samples of neuroblastoma patients. G. Tonini: Conceptualization, supervision, funding acquisition, methodology, project administration, writing–review and editing, provided critical review of the study design. A. Iolascon: Writing–review and editing, provided critical review of the study design. M. Capasso: Conceptualization, supervision, funding acquisition, validation, investigation, methodology, project administration, writing–review and editing.

The authors would like to thank Marcella Pantile for her technical assistance during sample preparation. This study was supported by grants from Associazione Italiana per la Ricerca sul Cancro (grant no. 25796 to M. Capasso and grant no. 20757 to A. Iolascon); Fondazione Italiana per la Lotta al Neuroblastoma (to M. Capasso); Associazione Oncologia Pediatrica e Neuroblastoma (to M. Capasso); Regione Campania “SATIN” grant 2018-2020 (to M. Capasso).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Huang
FW
,
Hodis
E
,
Xu
MJ
,
Kryukov
GV
,
Chin
L
,
Garraway
LA
.
Highly recurrent TERT promoter mutations in human melanoma
.
Science
2013
;
339
:
957
9
.
2.
Rheinbay
E
,
Parasuraman
P
,
Grimsby
J
,
Tiao
G
,
Engreitz
JM
,
Kim
J
, et al
.
Recurrent and functional regulatory mutations in breast cancer
.
Nature
2017
;
547
:
55
60
.
3.
Weinhold
N
,
Jacobsen
A
,
Schultz
N
,
Sander
C
,
Lee
W
.
Genome-wide analysis of noncoding regulatory mutations in cancer
.
Nat Genet
2014
;
46
:
1160
5
.
4.
Orlando
G
,
Law
PJ
,
Cornish
AJ
,
Dobbins
SE
,
Chubb
D
,
Broderick
P
, et al
.
Promoter capture Hi-C-based identification of recurrent noncoding mutations in colorectal cancer
.
Nat Genet
2018
;
50
:
1375
80
.
5.
Capasso
M
,
Lasorsa
VA
,
Cimmino
F
,
Avitabile
M
,
Cantalupo
S
,
Montella
A
, et al
.
Transcription factors involved in tumorigenesis are over-represented in mutated active DNA-binding sites in neuroblastoma
.
Cancer Res
2020
;
80
:
382
93
.
6.
Hansen
RS
,
Thomas
S
,
Sandstrom
R
,
Canfield
TK
,
Thurman
RE
,
Weaver
M
, et al
.
Sequencing newly replicated DNA reveals widespread plasticity in human replication timing
.
Proc Nat Acad Sci U S A
2010
;
107
:
139
44
.
7.
Schuster-Bockler
B
,
Lehner
B
.
Chromatin organization is a major influence on regional mutation rates in human cancer cells
.
Nature
2012
;
488
:
504
7
.
8.
Salk
JJ
,
Schmitt
MW
,
Loeb
LA
.
Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations
.
Nat Rev Genet
2018
;
19
:
269
85
.
9.
Luksch
R
,
Castellani
MR
,
Collini
P
,
De Bernardi
B
,
Conte
M
,
Gambini
C
, et al
.
Neuroblastoma (peripheral neuroblastic tumours)
.
Crit Rev Oncol Hematol
2016
;
107
:
163
81
.
10.
Capasso
M
,
Diskin
SJ
.
Genetics and genomics of neuroblastoma
.
Cancer Treat Res
2010
;
155
:
65
84
.
11.
Lasorsa
VA
,
Cimmino
F
,
Ognibene
M
,
Mazzocco
K
,
Erminio
G
,
Morini
M
, et al
.
19p loss is significantly enriched in older age neuroblastoma patients and correlates with poor prognosis
.
NPJ Genom Med
2020
;
5
:
18
.
12.
Esposito
MR
,
Binatti
A
,
Pantile
M
,
Coppe
A
,
Mazzocco
K
,
Longo
L
, et al
.
Somatic mutations in specific and connected subpathways are associated with short neuroblastoma patients' survival and indicate proteins targetable at onset of disease
.
Int J Cancer
2018
;
143
:
2525
36
.
13.
Lasorsa
VA
,
Formicola
D
,
Pignataro
P
,
Cimmino
F
,
Calabrese
FM
,
Mora
J
, et al
.
Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression
.
Oncotarget
2016
;
7
:
21840
52
.
14.
Peifer
M
,
Hertwig
F
,
Roels
F
,
Dreidax
D
,
Gartlgruber
M
,
Menon
R
, et al
.
Telomerase activation by genomic rearrangements in high-risk neuroblastoma
.
Nature
2015
;
526
:
700
4
.
15.
Ackermann
S
,
Cartolano
M
,
Hero
B
,
Welte
A
,
Kahlert
Y
,
Roderwieser
A
, et al
.
A mechanistic classification of clinical phenotypes in neuroblastoma
.
Science
2018
;
362
:
1165
70
.
16.
Khurana
E
,
Fu
Y
,
Colonna
V
,
Mu
XJ
,
Kang
HM
,
Lappalainen
T
, et al
.
Integrative annotation of variants from 1092 humans: application to cancer genomics
.
Science
2013
;
342
:
1235587
.
17.
Tonini
GP
,
Capasso
M
.
Genetic predisposition and chromosome instability in neuroblastoma
.
Cancer Metastasis Rev
2020
;
39
:
275
85
.
18.
Wang
K
,
Diskin
SJ
,
Zhang
H
,
Attiyeh
EF
,
Winter
C
,
Hou
C
, et al
.
Integrative genomics identifies LMO1 as a neuroblastoma oncogene
.
Nature
2011
;
469
:
216
20
.
19.
Cimmino
F
,
Avitabile
M
,
Diskin
SJ
,
Vaksman
Z
,
Pignataro
P
,
Formicola
D
, et al
.
Fine mapping of 2q35 high-risk neuroblastoma locus reveals independent functional risk variants and suggests full-length BARD1 as tumor-suppressor
.
Int J Cancer
2018
;
143
:
2828
37
.
20.
Capasso
M
,
Devoto
M
,
Hou
C
,
Asgharzadeh
S
,
Glessner
JT
,
Attiyeh
EF
, et al
.
Common variations in BARD1 influence susceptibility to high-risk neuroblastoma
.
Nat Genet
2009
;
41
:
718
23
.
21.
Diskin
SJ
,
Capasso
M
,
Schnepp
RW
,
Cole
KA
,
Attiyeh
EF
,
Hou
C
, et al
.
Common variation at 6q16 within HACE1 and LIN28B influences susceptibility to neuroblastoma
.
Nat Genet
2012
;
44
:
1126
30
.
22.
Avitabile
M
,
Succoio
M
,
Testori
A
,
Cardinale
A
,
Vaksman
Z
,
Lasorsa
VA
, et al
.
Neural crest-derived tumor neuroblastoma and melanoma share 1p13.2 as susceptibility locus that shows a long-range interaction with the SLC16A1 gene
.
Carcinogenesis
2020
;
41
:
284
95
.
23.
Boeva
V
,
Louis-Brennetot
C
,
Peltier
A
,
Durand
S
,
Pierre-Eugene
C
,
Raynal
V
, et al
.
Heterogeneity of neuroblastoma cell identity defined by transcriptional circuitries
.
Nat Genet
2017
;
49
:
1408
13
.
24.
Gopi
LK
,
Kidder
BL
.
Integrative pan cancer analysis reveals epigenomic variation in cancer type and cell specific chromatin domains
.
Nat Commun
2021
;
12
:
1419
.
25.
Pugh
TJ
,
Morozova
O
,
Attiyeh
EF
,
Asgharzadeh
S
,
Wei
JS
,
Auclair
D
, et al
.
The genetic landscape of high-risk neuroblastoma
.
Nat Genet
2013
;
45
:
279
84
.
26.
Russo
R
,
Cimmino
F
,
Pezone
L
,
Manna
F
,
Avitabile
M
,
Langella
C
, et al
.
Kinome expression profiling of human neuroblastoma tumors identifies potential drug targets for ultra high-risk patients
.
Carcinogenesis
2017
;
38
:
1011
20
.
27.
Rentzsch
P
,
Witten
D
,
Cooper
GM
,
Shendure
J
,
Kircher
M
.
CADD: predicting the deleteriousness of variants throughout the human genome
.
Nucleic Acids Res
2019
;
47
:
D886
D94
.
28.
Fu
Y
,
Liu
Z
,
Lou
S
,
Bedford
J
,
Mu
XJ
,
Yip
KY
, et al
.
FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer
.
Genome Biol
2014
;
15
:
480
.
29.
Sausen
M
,
Leary
RJ
,
Jones
S
,
Wu
J
,
Reynolds
CP
,
Liu
X
, et al
.
Integrated genomic analyses identify ARID1A and ARID1B alterations in the childhood cancer neuroblastoma
.
Nat Genet
2013
;
45
:
12
7
.
30.
Golomb
L
,
Bublik
DR
,
Wilder
S
,
Nevo
R
,
Kiss
V
,
Grabusic
K
, et al
.
Importin 7 and exportin 1 link c-Myc and p53 to regulation of ribosomal biogenesis
.
Mol Cell
2012
;
45
:
222
32
.
31.
Hsiau
T
,
Maures
T
,
Waite
K
,
Yang
J
,
Kelso
R
,
Holden
K
, et al
.
Inference of CRISPR Edits from Sanger Trace Data
.
bioRxiv
2018
:
251082
.
32.
Sangel
P
,
Oka
M
,
Yoneda
Y
.
The role of Importin-βs in the maintenance and lineage commitment of mouse embryonic stem cells
.
FEBS Open Bio
2014
;
4
:
112
20
.
33.
Imlach
WL
,
Beck
ES
,
Choi
BJ
,
Lotti
F
,
Pellizzoni
L
,
McCabe
BD
.
SMN is required for sensory-motor circuit function in Drosophila
.
Cell
2012
;
151
:
427
39
.
34.
Lotti
F
,
Imlach
WL
,
Saieva
L
,
Beck
ES
,
Hao le
T
,
Li
DK
, et al
.
An SMN-dependent U12 splicing event essential for motor circuit function
.
Cell
2012
;
151
:
440
54
.
35.
Han
C
,
Alkhater
R
,
Froukh
T
,
Minassian
AG
,
Galati
M
,
Liu
RH
, et al
.
Epileptic encephalopathy caused by mutations in the guanine nucleotide exchange factor DENND5A
.
Am J Hum Genet
2016
;
99
:
1359
67
.
36.
Johnsen
JI
,
Dyberg
C
,
Wickstrom
M
.
Neuroblastoma—a neural crest derived embryonal malignancy
.
Front Mol Neurosci
2019
;
12
:
9
.
37.
Gallik
KL
,
Treffy
RW
,
Nacke
LM
,
Ahsan
K
,
Rocha
M
,
Green-Saxena
A
, et al
.
Neural crest and cancer: divergent travelers on similar paths
.
Mech Dev
2017
;
148
:
89
99
.
38.
Eckert
D
,
Buhl
S
,
Weber
S
,
Jager
R
,
Schorle
H
.
The AP-2 family of transcription factors
.
Genome Biol
2005
;
6
:
246
.
39.
Shirai
Y
,
Kawabe
K
,
Tosa
I
,
Tsukamoto
S
,
Yamada
D
,
Takarada
T
.
Runx2 function in cells of neural crest origin during intramembranous ossification
.
Biochem Biophys Res Commun
2019
;
509
:
1028
33
.
40.
Jäkel
S
,
Görlich
D
.
Importin beta, transportin, RanBP5 and RanBP7 mediate nuclear import of ribosomal proteins in mammalian cells
.
EMBO J
1998
;
17
:
4491
502
.
41.
Srivastava
D
,
Thomas
T
,
Lin
Q
,
Kirby
ML
,
Brown
D
,
Olson
EN
.
Regulation of cardiac mesodermal and neural crest development by the bHLH transcription factor, dHAND
.
Nat Genet
1997
;
16
:
154
60
.
42.
Charité
J
,
McFadden
DG
,
Olson
EN
.
The bHLH transcription factor dHAND controls Sonic hedgehog expression and establishment of the zone of polarizing activity during limb development
.
Development
2000
;
127
:
2461
70
.
43.
Howard
M
,
Foster
DN
,
Cserjesi
P
.
Expression of HAND gene products may be sufficient for the differentiation of avian neural crest-derived cells into catecholaminergic neurons in culture
.
Dev Biol
1999
;
215
:
62
77
.
44.
Rhee
C
,
Edwards
M
,
Dang
C
,
Harris
J
,
Brown
M
,
Kim
J
, et al
.
ARID3A is required for mammalian placenta development
.
Dev Biol
2017
;
422
:
83
91
.
45.
Kim
PG
,
Canver
MC
,
Rhee
C
,
Ross
SJ
,
Harriss
JV
,
Tu
HC
, et al
.
Interferon-α signaling promotes embryonic HSC maturation
.
Blood
2016
;
128
:
204
16
.
46.
Webb
CF
,
Bryant
J
,
Popowski
M
,
Allred
L
,
Kim
D
,
Harriss
J
, et al
.
The ARID family transcription factor bright is required for both hematopoietic stem cell and B lineage development
.
Mol Cell Biol
2011
;
31
:
1041
53
.
47.
Lestari
W
,
Ichwan
SJ
,
Otsu
M
,
Yamada
S
,
Iseki
S
,
Shimizu
S
, et al
.
Cooperation between ARID3A and p53 in the transcriptional activation of p21WAF1 in response to DNA damage
.
Biochem Biophys Res Commun
2012
;
417
:
710
6
.
48.
Wynn
TA
.
Myeloid-cell differentiation redefined in cancer
.
Nat Immunol
2013
;
14
:
197
9
.
49.
Hwang
WL
,
Wolfson
RL
,
Niemierko
A
,
Marcus
KJ
,
DuBois
SG
,
Haas-Kogan
D
.
Clinical impact of tumor mutational burden in neuroblastoma
.
J Natl Cancer Inst
2019
;
111
:
695
9
.
50.
Marat
AL
,
Dokainish
H
,
McPherson
PS
.
DENN domain proteins: regulators of Rab GTPases
.
J Biol Chem
2011
;
286
:
13791
800
.
51.
Stenmark
H
.
Rab GTPases as coordinators of vesicle traffic
.
Nat Rev Mol Cell Biol
2009
;
10
:
513
25
.
52.
Brodeur
GM
,
Minturn
JE
,
Ho
R
,
Simpson
AM
,
Iyer
R
,
Varela
CR
, et al
.
Trk receptor expression and inhibition in neuroblastomas
.
Clin Cancer Res
2009
;
15
:
3244
50
.
53.
Morita
K
,
Hama
Y
,
Izume
T
,
Tamura
N
,
Ueno
T
,
Yamashita
Y
, et al
.
Genome-wide CRISPR screen identifies TMEM41B as a gene required for autophagosome formation
.
J Cell Biol
2018
;
217
:
3817
28
.
54.
Asiedu
MK
,
Thomas
CF
Jr
,
Dong
J
,
Schulte
SC
,
Khadka
P
,
Sun
Z
, et al
.
Pathways impacted by genomic alterations in pulmonary carcinoid tumors
.
Clin Cancer Res
2018
;
24
:
1691
704
.
55.
Van Alstyne
M
,
Lotti
F
,
Dal Mas
A
,
Area-Gomez
E
,
Pellizzoni
L
.
Stasimon/Tmem41b localizes to mitochondria-associated ER membranes and is essential for mouse embryonic development
.
Biochem Biophys Res Commun
2018
;
506
:
463
70
.
56.
Dai
YS
,
Cserjesi
P
.
The basic helix-loop-helix factor, HAND2, functions as a transcriptional activator by binding to E-boxes as a heterodimer
.
J Biol Chem
2002
;
277
:
12604
12
.
57.
Hendershot
TJ
,
Liu
H
,
Clouthier
DE
,
Shepherd
IT
,
Coppola
E
,
Studer
M
, et al
.
Conditional deletion of Hand2 reveals critical functions in neurogenesis and cell type-specific gene expression for development of neural crest-derived noradrenergic sympathetic ganglion neurons
.
Dev Biol
2008
;
319
:
179
91
.
58.
Gokulnath
P
,
de Cristofaro
T
,
Manipur
I
,
Di Palma
T
,
Soriano
AA
,
Guarracino
MR
, et al
.
Long non-coding RNA HAND2-AS1 acts as a tumor suppressor in high-grade serous ovarian carcinoma
.
Int J Mol Sci
2020
;
21
:
4059
.
59.
Gong
J
,
Fan
H
,
Deng
J
,
Zhang
Q
.
LncRNA HAND2-AS1 represses cervical cancer progression by interaction with transcription factor E2F4 at the promoter of C16orf74
.
J Cell Mol Med
2020
;
24
:
6015
27
.
60.
Wei
P
,
Yang
J
,
Zhang
D
,
Cui
M
,
Li
L
.
lncRNA HAND2-AS1 regulates prostate cancer cell growth through targeting the miR-106a-5p/RBM24 axis
.
Onco Targets Ther
2020
;
13
:
4523
31
.
61.
Zhao
RX
,
Xu
ZX
.
Targeting the LKB1 tumor suppressor
.
Curr Drug Targets
2014
;
15
:
32
52
.
62.
Gabrilovich
DI
,
Ostrand-Rosenberg
S
,
Bronte
V
.
Coordinated regulation of myeloid cells by tumours
.
Nat Rev Immunol
2012
;
12
:
253
68
.
63.
Joyce
JA
,
Pollard
JW
.
Microenvironmental regulation of metastasis
.
Nat Rev Cancer
2009
;
9
:
239
52
.
64.
Sica
A
,
Bronte
V
.
Altered macrophage differentiation and immune dysfunction in tumor development
.
J Clin Invest
2007
;
117
:
1155
66
.
65.
Mao
Y
,
Eissler
N
,
Blanc
KL
,
Johnsen
JI
,
Kogner
P
,
Kiessling
R
.
Targeting suppressive myeloid cells potentiates checkpoint inhibitors to control spontaneous neuroblastoma
.
Clin Cancer Res
2016
;
22
:
3849
59
.

Supplementary data