Multiple myeloma is a malignancy of plasma cells. Extensive genetic and transcriptional characterization of myeloma has identified subtypes with prognostic and therapeutic implications. In contrast, relatively little is known about the myeloma epigenome.
CD138+CD38+ myeloma cells were isolated from fresh bone marrow aspirate or the same aspirate after freezing for 1–6 months. Gene expression and chromatin accessibility were compared between fresh and frozen samples by RNA sequencing (RNA-seq) and assay for transpose accessible chromatin sequencing (ATAC-seq). Chromatin accessible regions were used to identify regulatory RNA expression in more than 700 samples from newly diagnosed patients in the Multiple Myeloma Research Foundation CoMMpass trial (NCT01454297).
Gene expression and chromatin accessibility of cryopreserved myeloma recapitulated that of freshly isolated samples. ATAC-seq performed on a series of biobanked specimens identified thousands of chromatin accessible regions with hundreds being highly coordinated with gene expression. More than 4,700 of these chromatin accessible regions were transcribed in newly diagnosed myelomas from the CoMMpass trial. Regulatory element activity alone recapitulated myeloma gene expression subtypes, and in particular myeloma subtypes with immunoglobulin heavy chain translocations were defined by transcription of distal regulatory elements. Moreover, enhancer activity predicted oncogene expression implicating gene regulatory mechanisms in aggressive myeloma.
These data demonstrate the feasibility of using biobanked specimens for retrospective studies of the myeloma epigenome and illustrate the unique enhancer landscapes of myeloma subtypes that are coupled to gene expression and disease progression.
We define the accessible chromatin landscape of primary myeloma and identify cis-regulatory elements coupled to gene expression. This identified transcription of intergenic regulatory elements allowing for interrogation of more than 700 samples using RNA sequencing data. These data show enhancer activity reflects disease subtype and implicate gene regulatory mechanisms in aggressive myeloma.
Multiple myeloma is a malignancy of differentiated B cells, known as plasma cells, that secrete high levels of immunoglobulin. Tremendous progress in the treatment and management of myeloma has been made over the past 25 years resulting in a doubling of life expectancy with a 5-year survival rate now well above 50% (1, 2). These improved outcomes are largely due to the introduction of novel therapeutic agents and regimens that target plasma cell biology rather than myeloma-specific mutations (3, 4). Nonetheless, more than 32,000 people are diagnosed with myeloma in the United States every year, and despite improved outcomes and extended remissions, the majority will develop disease that is refractory to treatment and thus, incurable (2).
The genetic basis of multiple myeloma has been extensively studied leading to a molecular classification with prognostic implications (5–7). Approximately half of myelomas have translocations that juxtapose strong immunoglobulin heavy chain (IGH) enhancers proximal to one of the three cyclin D genes (CCND1–3), the histone methyltransferase NSD2 (also known as MMSET, WHSC1), or one of the transcription factors and proto-oncogenes MAF, MAFB, or MAFA (8–10). Myelomas without IGH translocations most often have trisomies of odd-numbered chromosomes referred to as hyperdiploidy (11). Almost all myelomas also have secondary genetic events, such as copy-number alterations commonly including del(13p) (12) and gain(1q) (13, 14), structural rearrangements of MYC (15–18), translocations of either immunoglobulin light chain kappa (IGK) or lambda (IGL; ref. 19), inactivation of TP53 (20), mutations in common and disease-specific oncogenes (21), and complex structural rearrangements (22). These genetic alterations partially underlie distinct gene expression subtypes that reflect the biology and sometimes the disease course of myeloma (23, 24). However, genetic alterations fail to fully explain the high-risk proliferation gene expression subtype and miss a substantial number of patients with myeloma that experience poor outcomes.
Epigenetic dysregulation, including aberrant DNA methylation and altered histone modifications, has been implicated in myeloma for many years (25–28). The most well studied example stems from t(4;14) translocations that result in IGH enhancer–driven NSD2 overexpression and excessive histone 3 lysine 36 dimethylation (9, 29–31). While t(4;14) translocations are a marker of high-risk disease (5, 6), the prognostic implications of other epigenetic alterations are less well understood.
One challenge that has hindered epigenetic studies of primary myeloma is the use of assays that require large amounts of specimen. For example, chromatin immunoprecipitation sequencing (ChIP-seq) is often performed using millions of cells. The assay for transpose accessible chromatin sequencing (ATAC-seq) overcomes this limitation by identifying regions of chromatin accessibility using the Tn5 transposase on small numbers of cells (32). Still, intermittent acquisition of samples makes it difficult to plan and conduct large epigenetic studies. Thus, an approach that allows for the use of biobanked samples would facilitate retrospective studies of the myeloma epigenome. To this end, we recently showed that cryopreserved human peripheral B cells could be used to faithfully characterize the accessible chromatin profile from limiting cell numbers (33). We therefore sought to determine whether such a technique is amenable to multiple myeloma cells from cryopreserved bone marrow aspirates.
Here, we compared RNA sequencing (RNA-seq) and ATAC-seq from CD138+CD38+ myeloma specimens obtained from both fresh and biobanked bone marrow aspirates. Myeloma cells isolated from cryopreserved aspirates recapitulated both the chromatin accessibility and mRNA characteristics of those obtained from fresh samples. We identified hundreds of cis-regulatory elements where accessibility was predictive of gene expression and confirmed these in an independent cohort. Chromatin accessible regions were combined with histone 3 lysine 27 acetylation (H3K27ac) ChIP-seq data to interrogate transcription of intergenic regulatory elements using RNA-seq data from 768 newly diagnosed patients from the Multiple Myeloma Research Foundation (MMRF) CoMMpass trial. These data identified more than 4,700 active regulatory elements that reflected myeloma subtype and identified a program of transcribed enhancers in patients with poor outcome.
Materials and Methods
Myeloma sample isolation
Written patient informed consent was obtained and collection of multiple myeloma bone marrow aspirates followed approved protocols from the Emory University Institutional Review Board (Atlanta, GA) and complied with the Belmont Report and the U.S. Common Rule. Processing bone marrow aspirates has been described previously (34). Specifically, bone marrow aspirates were enrichment for mononuclear cells by Ficoll gradient centrifugation by first diluting aspirates to 25 mL in PBS and carefully pipetting 10 mL of Ficoll Lymphocyte Separation Medium (Corning, 25-072-Cl) under the aspirate prior to centrifugation for 30 minutes at 400 × g with no deceleration resistance. The mononuclear cell layer (“buffy coat”) was collected by Graduated Transfer Pipette (Fisherbrand, 13-711-9AM), washed in PBS, and resuspended in RPMI1640 (Corning, #14-030-CV) with 10% FBS (GeminiBio, 97068-085), 1% penicillin-streptomycin (Corning, 30-002-CI), 1% l-glutamine (Corning, 25005CI), and 1% HEPES (Corning, 25-060-CI).
After processing, bone marrow aspirate cells were either cryopreserved or analyzed within 6 hours. Cryopreservation used 1–5 × 106 cells per mL in 10% DMSO, followed by immediately cooling to −80°C buffered by Isopropyl Alcohol (Thermo Fisher Scientific, 51000001) and transfer to −140°C liquid nitrogen freezers within 1–4 days. Both fresh and cryopreserved samples were stained with anti-CD138-FITC (Becton Dickinson, 552723), anti-CD38-BV450 (Becton Dickinson, 561378), anti-CD45-APC-Cy7 (Becton Dickinson, 348795), and propidium iodide (PI; Sigma-Aldrich, P4170) prior to analysis and isolation on a FACS Aria II (Becton Dickinson; Supplementary Fig. S1).
mRNA sequencing (mRNA-seq) analysis was performed similarly as described previously (35). A total of 50,000 cells were FACS isolated directly into 600 μL of RLT lysis buffer with 1% β-ME (Sigma-Aldrich, M6250) prior to vortexing for 1 minute at maximum speed and freezing in a dry-ice ethanol mixture. RNA was isolated (Qiagen, RNeasy Mini 74104) and quality was assessed using an Agilent BioAnalyzer. Stranded mRNA-seq libraries were made using the mRNA HyperPrep Kit (Kapa Biosystems, 08098115702) following the manufacturer's protocol and using custom short TruSeq Compatible Sequencing Adapters (Integrated DNA Technologies; ref. 36; Supplementary Table S1). Barcodes were added to each library using long PCR primers (Supplementary Table S1) during eight cycles of PCR amplification. Samples were sequenced on a HiSeq 4000 (Illumina).
FASTQ sequencing files were quality and adapter trimmed using Trim Galore! (v0.6.4; https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and CutAdapt (v2.5; https://github.com/marcelm/cutadapt/) prior to mapping to the GRCh37 genome accounting for the Ensembl GRCh37.74 transcription database using the STAR aligner (v2.5.3a; ref. 37). Putative PCR duplicate reads were marked in BAM files using Samtools (v1.7; ref. 38). Gene read counts were calculated in R (v3.6.2; ref. 39) by reading in BAM files using the function readGAlignmentPairs and counting the number of reads that overlap any exonic region of an Ensemble (v74) gene using the function summarizeOverlaps of the GenomicAlignments package (v1.2; ref. 40). Gene expression data were normalized for sequencing depth using reads per million (RPM) or fragments per kilobase per million reads (FPKM), where read depth was the total number of reads on autosomal chromosomes only (Supplementary Data S6).
Genes that did not have ≥1 FPKM in expression in at least one sample were removed from downstream analyses as not expressed. Principal component analysis (PCA) used the R function prcomp of the stats package and used log2 (FPKM+1) gene expression values from autosomes only. Hierarchical clustering used the hclust function of the stats package, as well as a Euclidean distance dissimilarity metric of the log2 (FPKM+1) expression values again from the autosomes only. Plots of genomic regions used rtracklayer (v1.44.4; ref. 41) to read in summarized files and custom R code to plot sequencing depth.
ATAC-seq was performed similarly as described previously (33, 42). Here, 20,000 CD138+CD38+PI− viable myeloma cells were sorted into PBS and kept on ice. Cells were isolated by centrifugation at 500 × g for 10 minutes at 4°C and the supernatant was carefully removed by pipet. Cells were resuspended in 50 μL of nuclei lysis buffer (10 mmol/L Tris, pH 7.4, 10 mmol/L NaCl, 3 mmol/L MgCl2, and 0.1% IGEPAL) prior to centrifugation at 500 × g for 30 minutes at 4°C and careful removal of nuclei lysis buffer by pipet. Nuclei were resuspended in 25 μL of tagmentation mix, consisting of 10.5 μL H2O, 12.5 μL TD Buffer (Illumina), and 2 μL Tn5 Transposase (Illumina), and incubated at 37°C for 1 hour. After tagmentation, DNA was isolated by digesting with 20 mg proteinase K for 1 hour in 25 μL of tagmentation clean-up buffer (326 mmol/L NaCl, 109 mmol/L EDTA, and 0.63% SDS). High–molecular weight DNA was excluded by 0.6 × volume–negative selection with SPRI Beads (Kapa Biosystems), followed by a 1.2 × SPRI bead–positive selection, which was repeated twice. ATAC libraries were amplified for 12 PCR cycles using Nextera Adapters (Illumina) prior to sequencing on an Illumina HiSeq 4000.
ATAC-seq FASTQ files were quality and adapter trimmed using Trim Galore! (v0.6.4; https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and CutAdapt (v2.5; https://github.com/marcelm/cutadapt/) prior to mapping the GRCh37 genome using bowtie2 (v2.2.6) for alignment (43). Aligned SAM files were converted to BAM files and putative PCR duplicates were marked with Samtools (v1.7; ref. 38). Regions of chromatin accessibility were identified in each sample using MACS2 (v126.96.36.19951222; ref. 44). The union of all accessible regions was determined and reads mapping into these regions were determined for each sample using the summarizeOverlaps of the GenomicAlignments package (v1.2) in R (v3.6.2). Regions that overlapped ENCODE blacklisted regions were removed (45). As a quality control metric, the number of reads in peaks was determined and used to calculate a normalized accessibility score as reads per peak million (RPPM) as described previously (42) according to the following formula:
ATAC-seq data are listed Supplementary Data S7.
The chromatin accessibility signal was log2 (RPPM+1) transformed and used for PCA using the prcomp function in R, hierarchical clustering using the hclust function also in R, as well as Pearson correlation comparisons. Chromatin stretch enhancer analysis was performed as described previously (46). Briefly, regions of chromatin accessibility that did not overlap promoters [>2,500 bp from the transcription start site (TSS)], but were within 12,500 bp of each other were stitched together. The RPPM normalized read count of these stretch regions was determined for all samples and they were ranked by the average RPPM signal, with regions past the inversion point being considered “super enhancers.” Gene ontology analysis of genes most proximal to these regions was performed using the GOstats (v2.50.0) R packages to determine enriched biological processes (47).
ATAC-seq correlation with mRNA-seq
The edgeR package (v3.26.8; ref. 48) was used to correlate regions of chromatin accessibility with the 5% most variably expressed genes between samples. Here, a “digital gene expression” list (DGEList) using read counts for all ATAC-seq autosomal regions of chromatin accessibility was used to estimate dispersion in edgeR. Next, for each gene, the DGEList object was subsetted for regions within 100 kb of the transcribed region of the gene and ATAC-seq reads were correlated with the log2 (FPKM+1) gene expression using the glmQLFit and glmQLFTest functions. Data were compiled for all genes tested and P values were corrected for multiple hypothesis testing using a Benjamini–Hochberg FDR correction.
Replication of myeloma ATAC-seq and mRNA-seq and meta-analysis of ChIP-seq data
mRNA-seq, ATAC-seq, and H3K27ac ChIP-seq data from Jin and colleagues (49) were downloaded from the European nucleotide archive for project PRJEB25605 (https://www.ebi.ac.uk/ena/data/search?query=PRJEB25605). mRNA-seq and ATAC-seq data were processed as described above in a harmonized fashion as the data we generated. ChIP-seq data were processed as described for ATAC-seq data above, except normalization used RPM instead of RPPM.
Analysis of CoMMpass RNA-seq and regulatory element transcription
Access to CoMMpass data was granted through the dbGaP data access committee and data were downloaded from project (phs000748.v7.p4). Processing of CoMMpass was done as described previously (19) and used the same reference genome to harmonize the data.
Regulatory element regions were defined as the union of overlapping regions with H3K27ac enrichment in any primary myeloma sample from Jin and colleagues (49) and regions showing chromatin accessibility for any primary myeloma sample analyzed here. Regulatory element regions were trimmed so that they did not overlap any region within 500 bp upstream of a TSS to 5 kb downstream of the TTS. Regulatory element transcription was determined by counting the number of CoMMpass RNA-seq reads using the countOverlaps function of the GenomicRanges (v1.36.1) package in R. To discriminate signal from noise, regulatory element transcription for each region in each sample was compared with the transcription detected from that same sized region shuffled 1,000 times across the genome following the same rules (i.e., cannot overlap regions 500 bp upstream of TSS to 5 kb downstream of TTS). Detection of regulatory element transcription was defined as regions that had more reads in the actual regulatory region as compared with the shuffled regions (P ≤ 0.01).
Gene set enrichment analysis (GSEA) of regulatory element transcription used GSEA (v4.03) and the hallmarks MSigDb gene set (v7.0; ref. 50). Here, the regulatory elements were annotated to the closest gene and the GSEA preranked gene set option was used on the basis of the −log10 (P) × sign (OR), where P value and OR were determined by Fisher exact test of enhancer expression based on the number of samples in which transcription of the regulatory element was detected.
Correlation of regulatory element transcription and proximal mRNA expression used edgeR (v3.26.8) as described in the “ATAC-seq correlation with mRNA-seq” methods section above. Gene expression subtypes of CoMMpass data were determined by consensus clustering as described previously (19) using the ConsensusClusterPlus package (v1.48.0; ref. 51). T-distributed stochastic neighbor embedding (t-SNE) analysis of CoMMpass mRNA and enhancer transcription used the Rtsne package (v0.15; ref. 52) based on the log2 (FPKM+1) expression values.
Gene expression prognostic of outcome was determined using a Cox proportional hazards regression implemented using the R function “coxph” of the survival package (v3.1-8). Here, the log2 (FPKM+1) expression values for each gene were regressed on censored survival data from CoMMpass interim analysis 15. P values from the Cox proportional hazards Wald test were corrected for multiple hypothesis correction using a Benjamini–Hochberg FDR.
RNA-seq and ATAC-seq raw data, as well as summarized data have been deposited in the Gene Expression Omnibus under access GSE167969.
mRNA content and chromatin accessibility are preserved in viable frozen myeloma specimens
To determine the feasibility of performing transcriptional and epigenetic studies on biobanked myeloma specimens, we compared CD138+CD38+ myeloma cells obtained from fresh bone marrow aspirates with those obtained from cryopreserved bone marrow cells from the same aspirate (see Materials and Methods). Both fresh and cryopreserved cells were isolated by FACS to obtain viable CD138+CD38+ cells (Supplementary Fig. S1A). Cryopreserved bone marrow aspirates had reduced overall viability relative to fresh isolates (Supplementary Fig. S1B), but CD138+CD38+ myeloma cells were obtained in both cases (Supplementary Fig. S1C).
RNA-seq was performed on FACS-isolated myeloma cells from both fresh and frozen bone marrow aspirates for 3 patients. PCA and hierarchical clustering of RNA expression indicated a strong similarity between fresh and frozen samples from the same patient as compared with samples from other patients (Fig. 1A; Supplementary Fig. S2A). In parallel, we also isolated 20,000 CD138+CD38+ myeloma cells for ATAC-seq. Regions of chromatin accessibility for each sample were determined and normalized reads in all accessible regions were calculated (RPPM). Similar to the RNA expression, both PCA and hierarchical clustering of chromatin accessibility data indicated that fresh and frozen samples from each patient were very similar to each other, but distinct from specimens from other patients (Fig. 1B; Supplementary Fig. S2B). Pairwise comparison of both gene-specific expression and chromatin accessibility levels for all six samples indicated that fresh and frozen samples from the same patient consistently had a higher Pearson correlation coefficient (R) as opposed to interpatient comparisons (Supplementary Fig. S2C and S2D). RNA-seq at the immunoglobulin light chains, IGK and IGL, reflected the clonal nature of myeloma, where both fresh and frozen specimens from the same patient expressed variable, joining, and constant regions from either IGK (1563) or IGL (1557 and 1562), but not both (Supplementary Fig. S2E, top and middle), a phenomenon known as light chain restriction. In addition, both fresh and frozen samples for each patient showed consistent expression of a unique IGH variable, diversity, joining, and constant chain (Supplementary Fig. S2E).
Inspection of the myeloma oncogene, CCND1, showed that 2 of the patients expressed very high levels and this was consistent in both fresh and frozen specimens (Fig. 1C, left). FISH data indicated that both patients had a t(11;14) translocation that juxtaposes the strong IGH enhancer(s) to the CCND1 locus (8). The specimens from patient 1557 did not express high levels of CCND1, but did express high levels of CCND2, which is consistent with previous reports that myeloma aberrantly expresses one of the cyclin D genes (Fig. 1D, middle; ref. 53). In addition, inspection of the MYC oncogene showed that it was highly expressed in 2 of the patients and this was consistent between fresh and frozen specimens (Fig. 1D). We queried chromatin accessibility near CCND2 and MYC to gain insight into the regulation of these oncogenes. This showed that the ATAC-seq signal was similar between fresh and frozen samples for all patients, but distinct from samples from other patients, as well as identifying regions of chromatin accessibility that corresponded with CCND2 and MYC expression (Fig. 1D, see shaded regions). Overall, these data indicate that the mRNA content and chromatin accessibility state are preserved in viably frozen cells.
cis-regulatory elements predict gene expression in myeloma
To better understand the distinct cis-regulatory elements coordinating gene expression in multiple myeloma, we performed paired RNA-seq and ATAC-seq on eight cryopreserved myeloma specimens. This identified 91,632 autosomal regions of chromatin accessibility that were present in at least one sample (Fig. 2A). The largest regions of chromatin accessibility were cataloged by calculating the number of ATAC-seq reads in stretches of accessible regions that were within 12.5 kb of each other, similar to the “super enhancer” analyses performed on H3K27ac or MED1 ChIP-seq data (46). Cumulatively, across all the samples, the regions with the largest stretches of accessible chromatin were found at several myeloma lineage–defining genes, including IRF4, CD38, SLAMF7, IGH, IGL, and MYC (Fig. 2B). Annotation of genes proximal to these large regions of chromatin accessibility indicated functions in RNA biosynthesis, nucleotide metabolism, transcription, and protein modification (Fig. 2C), processes required to support the copious amounts of immunoglobulin synthesis and growth that are hallmarks of multiple myeloma. Inspection of IRF4, CD38, and SLAMF7 loci showed that samples shared the same regions of chromatin accessibility (Fig. 2D).
To identify gene regulatory elements, we correlated chromatin accessibility to expression for regions within 100 kb of a gene. Here, we focused on the most variably expressed genes in our cohort (top 5%), which also showed significantly higher variation in expression across samples in larger studies (Supplementary Fig. S3A and S3B). For example, there were four regions of chromatin accessibility within 100 kb of TLR1 (Fig. 3A), but only the most distal region showed a significant correspondence between chromatin accessibility and TLR1 expression (Fig. 3A and B, see red arrow). Cumulatively, 16,822 comparisons identified 833 regions where chromatin accessibility significantly correlated with expression of 376 genes (FDR ≤ 0.05; Supplementary Data S1), including several genes that have been implicated in the pathology of myeloma, such as CCND2, FRZB, GSTT1, MS4A1 (encodes CD20), NCAM1, PAX5, RRAS2, and WNT5A (Supplementary Fig. S4). These results were confirmed in an independent dataset by repeating these analyses using RNA-seq and ATAC-seq data from 18 myeloma samples from Jin and colleagues (49). This resulted in 1,190 regions that correlated with proximal gene expression, including 207 of those found in our cohort (Supplementary Fig. S3C). The composition of the different study populations and modest sample sizes may account for some of the differences observed, but it is important to note that there was a significant overlap of regions where chromatin accessibility predicted gene expression between the two studies (P = 8.4 × 10–62, Fisher exact test).
To understand what discriminates regions of chromatin accessibility that correspond with gene expression from those that do not, we assessed the genomic overlap with the activating modification H3K27ac using ChIP-seq data from primary myeloma samples (49). Regions where chromatin accessibility negatively correlated with gene expression were depleted of H3K27ac and conversely those regions where accessibility positively corresponded with gene expression were enriched for H3K27ac as compared with regions not indicative of gene expression (Fig. 3C). Furthermore, H3K27ac enrichment negatively correlated with expression at regions where accessibility also negatively corresponded with expression, and H3K27ac positively correlated with expression at regions where chromatin accessibility did as well (Fig. 3D). For example, distinct regions of chromatin accessibility and H3K27ac were present both upstream of the promoter and in the first intron of the mitosis-related kinase, NEK6 (Fig. 3E, left). However, in the two ATAC-seq and ChIP-seq samples shown, only the intronic enhancer was strongly correlated with expression (Fig. 3E, see red arrow), and this was generally true across all the samples analyzed (Fig. 3F). Cumulatively, these data identify a set of cis-regulatory elements implicated in myeloma gene regulation.
Transcription of regulatory elements reflects myeloma gene expression subtype
We sought to expand this analysis by interrogating samples from the CoMMpass trial (NCT01454297), a longitudinal study of more than 1,000 patients with myeloma. Because epigenetic data have yet to be included in CoMMpass, we used RNA-seq data available for 768 samples to interrogate transcription of regulatory elements, a phenomenon linked to active enhancers (54, 55). Here, the overlapping regions of chromatin accessibility and H3K27ac (49) defined above were assessed for transcription using CoMMpass RNA-seq. To avoid contamination from exonic mRNAs or intronic pre-mRNAs, intragenic regions were removed, as well as 500 bp upstream of TSSs and 5 kb downstream of transcription termination sites. This identified 13,452 putative intergenic regulatory elements to query for transcription (Supplementary Data S2). To distinguish signal from noise, we compared the RNA-seq reads at each regulatory element for each sample with the same size region shuffled 1,000 times across the genome (see Materials and Methods). Thus, detection of transcription could be determined for each regulatory element in each sample by comparing the actual number of reads with those obtained from the permuted regions. Plotting the frequency of detected RNA for each regulatory element showed an asymmetrical distribution (Fig. 4A), with some that were consistently transcribed, as well as those that were only transcribed in a subset of samples. In total, transcription was detected at 4,729 regulatory elements in 5% or more of samples. GSEA of the genes closest in proximity to the most commonly transcribed regulatory elements indicated the top enriched gene set was protein secretion (Fig. 4B; Supplementary Data S3), which is consistent with the primary function of plasma cells from which myelomas are derived. As an example, there are three regions between PIK3C2B and MDM4 that contain overlapping H3K27ac and chromatin accessibility, as well as evidence of transcription in both our cohort (generated using stranded RNA-seq) and that from CoMMpass (Fig. 4C, left, see shaded areas). Notably, paired-end RNA-seq reads from the middle enhancer region were all contained within that transcribed element indicating that it is not spliced with MDM4 or some other gene, but rather is a distinct transcriptional unit. The stranded RNA-seq data also indicated that the two promoter proximal transcribed regions near PIK3C2B and MDM4 are on the antisense strand respective to each gene, and are thus not part of the canonical gene, but likely represent bidirectional transcription. Conversely, inspection of the FOS locus (Fig. 4C, right) showed a regulatory region immediately upstream of the FOS TSS that is transcribed on the same strand as FOS. Importantly, paired-end RNA-seq reads did not span this promoter proximal cis-regulatory region with those coming from FOS, again indicating that this is a distinct transcriptional product.
Distal regulatory elements define specific myeloma subtypes
The above examples at PIK3C2B, MDM4, and FOS suggest that these regulatory regions are composed of both promoter proximal and distal elements, with the latter more likely representing enhancer elements. To understand how these distinct active regulatory elements were coupled to gene expression, we used previously defined gene expression subtypes that reflect key oncogenic drivers and genomic alterations in myeloma (19). t-SNE analysis was performed for gene expression data, as well as for transcription at both promoter proximal (≤2,500 bp of TSS) and distal regulatory elements, with each sample colored by gene expression subtype (Fig. 5A). As expected, t-SNE analysis of gene expression data grouped expression subtypes together. In addition, both sets of regulatory elements tended to group myeloma subtypes together, however, the promoter proximal elements separated the MMSET, CCND, and MAF subsets into multiple clusters (Fig. 5A, middle, see red, gold, and turquoise colors, respectively), whereas the distal regulatory elements clustered these subsets with other samples from the same subset (Fig. 5A, right). Indeed, quantifying the t-SNE distances between all samples within a given group showed that the average distance between MMSET, CCND, and MAF samples was significantly less when using distal regulatory elements as opposed to promoter proximal elements (Fig. 5B). To a lesser extent this is also observed in the proliferation and low-bone disease groups. Subsequently, we determined uniquely transcribed regulatory elements for each myeloma subtype and focused on several hundred elements that distinguished the MMSET and MAF subtypes from others (Supplementary Data S3). Integrating these data with expression of genes within 1 Mb identified distal regulatory elements highly predictive of genes uniquely regulated in the MMSET and MAF subtypes. For instance, a regulatory element approximately 154 kb upstream of CCND2 was uniquely transcribed in the MAF subtype and this was highly correlated with MAF expression (Fig. 5C and D). These data identify active distal regulatory elements that are coordinately regulated with the unique gene expression programs of myeloma subtypes.
Distal regulatory element activity predicts myeloma gene expression and outcome
Given that transcription at regulatory elements predicted gene expression, we wondered whether regulatory element activity could be used to identify putative mechanisms of molecular pathogenesis and poor outcome. First, we identified gene expression that was prognostic of overall survival (OS) in CoMMpass patients using a Cox proportional hazards model. Gene expressions prognostic of OS are represented in a volcano plot of the OS HR for each gene compared with the significance of association with outcome (Fig. 6A). For example, reduced expression of CD27 and SLAMF7 or increased expression of RUNX2 or SUZ12 corresponded with poor outcome. Next, distal regulatory RNA levels within 1 Mb of genes prognostic of outcome were correlated with expression identifying 42% of regulatory elements that positively corresponded with gene expression (Fig. 6B, see green; Supplementary Data S4). Plotting the frequency of transcribed distal regulatory elements relative to genes prognostic of outcome indicated that approximately 1.5 active regulatory elements per Mb per gene could be detected (Fig. 6C). In total, 1,308 active distal regulatory elements were identified within 1 Mb of genes prognostic of OS and 609 of these were positively correlated with gene expression (Fig. 6C, green; Supplementary Data S5). We asked whether these regions were enriched or depleted for transcription factor binding consensus motifs, which identified five transcription factor families enriched in these regions, including nuclear respiratory factor (NRF), zinc finger (ZF), nuclear receptor (NR), helix-loop-helix (HLH), and activating protein 2 (AP2) families, whereas only interferon regulatory factor (IRF) binding motifs were depleted (Fig. 6D and E; Supplementary Data S6). In addition, gene expression of the genes that encode NRF1, SP1, and hypoxia-inducible factor-1β (HIF1β; encoded by ARNT1) was also associated with worse outcome as shown by an HR greater than one (Fig. 6F), further indicating these factors may be driving a gene expression program of aggressive myeloma. Indeed, NRF1 has been linked previously to proteasome inhibitor resistance (56, 57), and similarly both SP1 and HIF1β have also been linked to myeloma pathogenesis (58, 59). An example of one of these distal regulatory elements is shown for RUNX2 (Fig. 6G). Here, genes are shown on top with RUNX2 in red and the correlation of regulatory element transcription to gene expression (RE/GX) denoted by the height of the curves connecting each regulatory element to the RUNX2 promoter; regulatory elements, ATAC-seq, and H3K27ac ChIP-seq data are shown below. The specific regulatory element is demarcated by the red correlation curve and the region harboring the regulatory element is enlarged to show RNA levels stratified by quartile. This particular regulatory element contained both SP1 and HIF1β binding motifs (Fig. 6G, bottom). Regulatory element expression quartiles are quantitated on a box-and-whisker plot and the corresponding expression of RUNX2 is shown for each quartile (Fig. 6H), indicating a strong concurrence in regulation as denoted by a positive Pearson correlation (R = 0.35), which was highly significant (FDR = 9.4 × 10–28). Cumulatively, these data identify putative distal regulatory elements and transcription factors whose activity is highly coordinated with the gene expression program of aggressive myeloma.
Here, we show that cryopreserved bone marrow aspirates can be used to isolate multiple myeloma cells that faithfully recapitulate the mRNA content and chromatin accessibility of those isolated from fresh samples. This approach was applied to a series of cryopreserved samples that were interrogated to identify regions of chromatin accessibility that were predictive of gene expression, results that were corroborated using a published study in an independent cohort of myeloma specimens. Intergenic chromatin accessible regions were used to query transcription in more than 700 myelomas from newly diagnosed patients to identify active regulatory elements. Analysis of these active regulatory elements provided interesting insights into gene regulation in this disease. First, while regulatory RNAs were able to predict gene expression–defined myeloma subtypes, they were in some cases not as good as the mRNA expression. This might be expected as the subtypes were defined by the mRNA expression. However, closer examination revealed that distal active regulatory elements better predicted mRNA expression than promoter elements that likely represent bidirectional transcription. This suggests that it is the activity of these distal elements that is driving the gene expression–based subtypes. Thus, while myeloma has been appreciated to be an “enhanceropathy” for some time now (8, 60), our data suggest that this relationship goes well beyond enhancers that are physically juxtaposed to oncogenes during initiating translocation events, and include the impact of many distal regulatory elements. Nevertheless, such initiating events appear to influence enhancer usage. While the activity of distal regulatory elements was much better at identifying gene expression–defined myeloma subtypes, this relationship was most evident among myelomas that have a primary IGH translocation (CCND, MMSET, or MAF). In contrast, there was little difference in the ability of distal versus proximal regulatory element activity in distinguishing myelomas of subtypes that were not exclusively associated with an IGH translocation, such as the hyperdiploid, low-bone disease, and proliferation. It is possible that this reflects a distinct gene regulatory program between hyperdiploid and nonhyperdiploid myeloma. Regardless, understanding how oncogenic changes are controlling these distal regulatory elements may provide insights into new pathways to target myeloma based on the transcriptional mechanisms propagating a malignant phenotype, as well as explain the molecular basis of therapeutic response to current backbone therapies that work, in part, by targeting transcription factor activity (e.g., immunomodulatory drugs, steroids, and indirectly, proteasome inhibitors).
Interrogating current RNA-seq datasets for regulatory RNAs has emerged as a powerful tool in deciphering gene regulatory mechanisms. Enhancer transcription from The Cancer Genome Atlas (TCGA) has provided unique insight into a range of solid and some hematologic tumors revealing both ubiquitously expressed and cancer-specific enhancer transcription (61, 62). While multiple myeloma was not included in TCGA project, use of data from the MMRF CoMMpass study has allowed us to provide the largest study of enhancer transcription in myeloma to date. Median depth of mRNA-seq for CoMMpass specimens was more than 230 million mapped reads per sample, which is significantly higher than most TCGA data, and provides sufficient power to detect regulatory element transcription. Our discovery of 4,729 transcribed regulatory elements in 5% or more of myeloma samples is highly congruent with TCGA analysis that found on average 4,591 enhancer RNAs (eRNA) expressed in 10% or more of samples per cancer type (61). Others have found approximately 20% of all detectable regulatory RNAs are ubiquitous across cancer types, whereas the remaining 80% are either partially shared or unique to specific cancer types (62). We anticipate that myeloma is similar in this regard. For instance, transcribed enhancers predictive of MDM4 expression have been reported previously, albeit the regulatory element reported here is distinct from that described previously (62). Further insights into the unique molecular architecture of myeloma, and therefore potential therapeutic targets, will likely be provided by understanding the regulatory elements that are unique to myeloma versus those that are common across other cancer and/or cell types.
Despite the powerful insights gleaned, interrogating regulatory element transcription from RNA-seq is not without caveats. For instance, most RNA-seq datasets, including those used here, primarily only capture poly-adenylated RNAs, and only a portion of regulatory RNAs, such as eRNAs, are poly-adenylated (55, 63). In addition, it is not currently feasible to interrogate intragenic regions, thereby missing more than half of all regulatory elements. Furthermore, transcription through the annotated transcription termination site often occurs and contaminates 3′ regulatory elements with RNA that is a byproduct of transcription rather than a signal of enhancer function. For these reasons it will be important to conduct large-scale genetic, epigenetic, and transcriptional studies to better interrogate the regulatory mechanisms and identify molecular susceptibilities of myeloma. In this regard, the high level of RNA and chromatin accessibility correspondence between fresh and biobanked samples should provide confidence that such studies can be conducted using viably frozen bone marrow aspirates. While it is important to note that these samples were only frozen for 4 months, and longer term studies are needed, this approach potentially has the advantage of isolating immune or stromal cells from the same bone marrow aspirate, thus allowing for analysis of both tumor, stromal, and/or immune compartments.
Large-scale genetic, epigenetic, and transcriptional studies have the potential to change our understanding of myeloma pathogenesis and ultimately allow us to alter the disease course. Ensuring an adequate sample size will be of paramount importance in dissecting the distinct molecular pathology of the many myeloma subtypes. Such analyses should shed light on effective modes of oncogene regulation. For instance, some hyperdiploid cases of myeloma express high levels of CCND1 despite not having t(11;14) IGH translocations (64), and while we were not able to identify candidate cis-elements regulating CCND1 expression outside of the promoter in these myelomas, this may be due to such an element residing in an intron or an enhancer that does not efficiently transcribe eRNA. In the case of CCND2, we were able to identify an element highly coordinated with CCND2 expression in the MAF subtype, but CCND2 is also aberrantly expressed in the MMSET and proliferation subtypes, which showed much more modest transcription at the same enhancer. Thus, it still remains to be shown whether the MMSET and proliferation subtypes regulate CCND2 expression through the same regulatory elements (albeit with less transcription) or whether there are other elements not discernable by this analysis that contribute to CCND2 dysregulation in these subtypes. Further studies of myeloma epigenetic and molecular programming may provide insight into why these types of myeloma exhibit aberrant expression of such oncogenes. Likewise, epigenetic studies may provide insight into the cell differentiation state from which a myeloma originated. For instance, B cells undergo rapid epigenetic remodeling as they become activated and differentiate into germinal center B cells and plasma cells (35, 42, 65); thus, epigenetic analysis of myeloma may provide insights into disease etiology. Indeed, the CD2 myeloma subtype derives its name for cyclin D and CD20, the latter being a B-cell marker normally not retained on plasma cells or other myeloma subtypes (23). Thus, epigenetic analysis of this subtype may reveal remnants of the B-cell molecular program. Such epigenetic studies will identify important cis-regulatory regions for myeloma pathology and help prioritize transcription factors and signaling pathways for a myeloma subtype precision medicine approach.
B.G. Barwick reports grants and other from Multiple Myeloma Research Foundation and grants from American Society of Hematology during the conduct of the study, as well as grants from Multiple Myeloma Research Foundation and American Society of Hematology outside the submitted work. D.L. Jaye reports grants from Multiple Myeloma Research Foundation during the conduct of the study. C.C. Hofmeister reports grants and personal fees from BMS and Sanofi, grants from Oncolytics Biotech, and personal fees from Amgen and BlueBird Bio outside the submitted work. S. Lonial reports personal fees from Takeda, Celgene, BMS, Amgen, Janssen, Merck, and Genentech outside the submitted work. L.H. Boise reports grants from Multiple Myeloma Research Foundation and NCI during the conduct of the study. L.H. Boise also reports grants and personal fees from AstraZeneca, personal fees from Genentech, and grants from NCI, National Institute for General Medical Sciences, Leukemia and Lymphoma Society, and Riney Family Foundation outside the submitted work. No disclosures were reported by the other authors.
B.G. Barwick: Conceptualization, formal analysis. V.A. Gupta: Investigation, writing–review and editing. S.M. Matulis: Investigation. J.C. Patton: Validation, investigation. D.R. Powell: Investigation. Y. Gu: Investigation. D.L. Jaye: Investigation. K.N. Conneely: Methodology. Y.C. Lin: Investigation. C.C. Hofmeister: Funding acquisition, investigation. A.K. Nooka: Resources, investigation. J.J. Keats: Formal analysis, investigation, writing–original draft. S. Lonial: Resources, investigation. P.M. Vertino: Investigation, writing–original draft, writing–review and editing. L.H. Boise: Resources, formal analysis, investigation, methodology, writing–original draft, writing–review and editing.
This study was supported, in part, by the Emory Integrated Genomics Core of Winship Cancer Institute of Emory University and NIH/NCI award number P30 CA138292. This work was supported by an Answer Fund award from the Multiple Myeloma Research Foundation (MMRF) and 3R01 CA192844-04W1 (to L.H. Boise). C.C. Hofmeister was supported by NIH/NCI award 5R01CA201382-03. B.G. Barwick was supported by Developmental Funds from the Winship Cancer Institute of Emory University, post-doctoral fellowship PF-17-109-1-TBG from the American Cancer Society, and a Research Fellow Award from the MMRF. We thank Dr. Leif Bergsagel for constructive conversations. We thank the patients, researchers, and clinicians involved in the MMRF CoMMpass trial and the entire Multiple Myeloma Research Consortium.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.