Epigenetic regulation of chromatin states is thought to control gene expression programs during lineage specification. However, the roles of repressive histone modifications, such as trimethylated histone H4 lysine 20 (H4K20me3), in development and genome stability are largely unknown. Here, we show that depletion of SET and MYND domain–containing protein 5 (SMYD5), which mediates H4K20me3, leads to genome-wide decreases in H4K20me3 and H3K9me3 levels and derepression of endogenous LTR- and LINE-repetitive DNA elements during differentiation of mouse embryonic stem cells. SMYD5 depletion resulted in chromosomal aberrations and the formation of transformed cells that exhibited decreased H4K20me3 and H3K9me3 levels and an expression signature consistent with multiple human cancers. Moreover, dysregulated gene expression in SMYD5 cancer cells was associated with LTR and endogenous retrovirus elements and decreased H4K20me3. In addition, depletion of SMYD5 in human colon and lung cancer cells results in increased tumor growth and upregulation of genes overexpressed in colon and lung cancers, respectively. These findings implicate an important role for SMYD5 in maintaining chromosome integrity by regulating heterochromatin and repressing endogenous repetitive DNA elements during differentiation. Cancer Res; 77(23); 6729–45. ©2017 AACR.
Mammalian DNA is packaged into two classes of chromatin: euchromatin, which is open and transcriptionally active, and heterochromatin, a densely packed chromatin structure that is largely refractory to transcription factor binding and transcription (1). Heterochromatin is located at specific chromosomal features such as centromeres and telomeres and interspersed throughout the genome (2). Heterochromatin is an integral player in the regulation of gene expression (3), and it promotes genome integrity by stabilizing DNA repeats by inhibiting recombination between homologous DNA repeats (4).
Chromatin compaction is regulated in part by histone modifications, such as H4K20 and H3K9 methylation, which are enriched at heterochromatin regions. H4K20 methylation is involved in several cellular functions such as heterochromatin formation and chromosome condensation (5), transcriptional activation and repression (6), genome stability (7), DNA replication (8), and DNA repair (9). The sequential methylation of H4K20me1 and H4K20me2 by Suv420h1 or Suv420h2, which catalyze H4K20me3, supports the formation of pericentric heterochromatin (7). H4K20me3 marks also repress transcription of repetitive elements (10).
SMYD5 (SET and MYND domain-containing protein 5) mediates H4K20me3 marks at LINE/LTR–repetitive DNA sequences (11). SMYD5 promotes mouse embryonic stem (ES) cell self-renewal by silencing lineage-specific gene expression: SMYD5 is recruited by LINE- and LTR-repetitive DNA elements to the vicinity of differentiation genes and keeps them silenced by depositing H4K20me3 marks. SMYD5 regulates heterochromatin formation and silencing of endogenous retroviruses (ERV) in ES cells by mediating H4K20me3 marks and interacting with chromatin repressors heterochromatin protein 1 (HP1) and the H3K9 methyltransferase G9a, respectively (11). The ERV-silencing properties of the H3K9 methyltransferase ESET during differentiation (12) implicate a role for histone-modifying enzymes in heterochromatin formation and silencing of repetitive DNA elements. However, it is unclear how SMYD5 regulates heterochromatin formation, and how it contributes to silencing of repetitive DNA elements and genome stability during differentiation.
Here, we demonstrate that depletion of SMYD5 results in decreased H4K20me3 levels and decreased H3K9me3 chromatin immunoprecipitation sequencing (ChIP-Seq) levels during ES cell differentiation. Depletion of SMYD5 causes chromosome aberrations and is accompanied by cell transformation during ES cell differentiation. Finally, resulting cancer cells display decreased H4K20me3 and H3K9me3 levels and the expression signature of SMYD5-depleted transformed cells is correlated with a number of human cancers and can predict patient survival outcome.
Materials and Methods
Mouse ES cell culture
Mouse R1 ES cells were obtained from ATCC (2010) and stored in liquid nitrogen. R1 ES cells were cultured as previously described with minor modifications and used in this study from passage 30 to passage 42 (13). Briefly, R1 ES cells were cultured on irradiated mouse embryonic fibroblasts (MEF) in DMEM, 15% FBS media containing leukemia inhibitory factor (LIF; ESGRO) at 37°C with 5% CO2. For ChIP experiments, ES cells were cultured on gelatin-coated dishes in ES cell media containing 1.5 μmol/L CHIR99021 (GSK3 inhibitor) for several passages to remove feeder cells. R1 ES cells were tested for mycoplasma using MycoFluor Mycoplasma Detection Kit (Thermo Fisher Scientific). For embryoid body (EB) formation, ES cells were cultured in low attachment binding dishes to promote 3D formation in ES cell media without LIF. shSmyd5 cancer cells were cultured in DMEM containing 15% FBS, glutamine, 2-mercaptoethanol, and nonessential amino acids.
Human ES cell culture
Human ES cells (H1) were obtained from Dr. Guokai Chen (NIH, Bethesda, MD) and stored in liquid nitrogen. H1 hESCs were cultured as previously described with minor modifications (14), and used in this study from passage 30 to 40. Briefly, H1 hESCs were cultured on Matrigel-coated dishes in serum-free defined E8 media and passaged using EDTA (14). H1 hESCs were tested for mycoplasma using MycoFluor Mycoplasma Detection Kit (Thermo Fisher Scientific). For EB formation, hESCs were cultured on low attachment binding dishes to promote 3D formation in E8 media for 2–3 days, and subsequently cultured in media containing high-glucose DMEM supplemented with 15% FBS, l-glutamine, nonessential amino acids, and β-mercaptoethanol. Human shSmyd5 cancer cells were cultured in DMEM containing 15% FBS, glutamine, β-mercaptoethanol, and nonessential amino acids.
Human cancer and nontumorigenic cell line culture
HCT-116, A549, and MCF7 cells were obtained from the NCI and stored in liquid nitrogen. HCT-116, A549, and MCF7 cells were cultured in media containing RPMI1640, 5% FBS, and 1 mmol/L l-glutamine. A549 cells were used in this study from passage 4–15, and HCT-116 and MCF7 cells were passaged no more than 10–15 times. MCF10A cells, which were passaged no more than 10–15 times, were cultured in media containing DMEM/F12, 5% FBS, 20 ng/mL EGF, 1 μg/mL hydrocortisone, 200 ng/mL cholera toxin, and 10 μg/mL insulin.
Mouse R1 ES cells were transduced with lentiviral particles encoding shRNAs as described previously (11). Briefly, shRNAs were cloned into the pSIH1-H1-puro Vector (System Biosciences) according to the manufacturer's protocol. To generate lentiviral particles, HEK 293T cells were cotransfected with an envelope plasmid (pLP/VSVG), packaging vector (psPAX2), and shRNA expression vector using Lipofectamine 2000. Twenty-four to 48 hours posttransfection, the medium containing lentiviral particles was harvested and used to infect mouse ES cells. Twenty-four hours posttransduction, ES cells were stably selected in the presence of 1 μg/mL puromycin to generate a heteroclonal population. For human ES cells (H1), the medium containing lentiviral particles was harvested and used to transduce hESCs for 4–6 hours. The E8 media were subsequently changed and posttransduction hESCs were stably selected and maintained in the presence of 0.5 μg/mL puromycin to generate a heteroclonal population. For HCT-116, A549, and MCF7 cells, the medium (RPMI1640, glutamine, and 5% FBS) containing lentiviral particles was used to transduce human cancer cells overnight. The media were subsequently changed and posttransduction HCT-116, A549, and MCF7 cells were stably selected in the presence of 1 μg/mL puromycin to generate a heteroclonal population. For MCF10A cells, the medium (DMEM/F12, 5% FBS, 20 ng/mL EGF, 1 μg/mL hydrocortisone, 200 ng/mL cholera toxin, and 10 μg/mL insulin) containing lentiviral particles was used to transduce MCF10A cells overnight. The media were subsequently changed and posttransduction MCF10A cells were stably selected in the presence of 1 μg/mL puromycin to generate a heteroclonal population.
Teratoma and tumor formation
Mouse ES cells were cultured on gelatin-coated dishes to remove feeder cells, dissociated into single cells, and 10⁶ ES cells were injected subcutaneously into SCID-beige mice. After three to four weeks, mice were euthanized and teratomas were washed and fixed in 10% buffered formalin. Teratomas were then embedded in paraffin. Thin sections were cut and stained with hematoxylin and eosin using standard techniques. shSmyd5 cancer cell tumor formation was performed as described above for ES cells. Ten sections were cut from three different tumors (30 sections total). All animals were treated according to Institutional Animal Care and Use Committee guidelines approved for these studies at the National Heart, Lung, and Blood Institute (Bethesda, MD) and at Wayne State University (Detroit, MI).
Xenograft tumor model
Human colon and lung adenocarcinoma cell lines HCT-116 and A549, respectively, were dissociated into single cells and 10⁶ cells were injected subcutaneously into female SCID-beige mice, aged 6–8 weeks, in accordance with Institutional Animal Care and Use Committee guidelines under current approved protocols at Wayne State University. After several weeks (4–6), when the tumors grew to approximately 1 cm in diameter, mice were euthanized and tumors were washed in PBS and weighed.
DQ-collagen IV–based proteolytic activity assay
Mouse and human shSmyd5 cancer cells and control shLuc cells were serially diluted in the respective maintenance media, seeded in 350 μL/well of cold Matrigel and plated in a 24-well plate. Matrigel polymerization occurred after 30-minute incubation at 37°C; 1 mL/well of maintenance medium (DMEM containing 15% FBS, glutamine, 2-mercaptoethanol, and nonessential amino acids) was added and replenished every other day. The proteolytic activity of the Matrigel-embedded cells was ascertained by first washing the cells with warm PBS and then incubating overnight in the presence or absence of DQ-collagen IV (10 μg/mL) in serum-free, phenol red–free DMEM at 37°C. DQ-collagen IV is a heavily fluorescein-labeled collagen IV substrate that yields green fluorescence upon cleavage by matrix metalloproteinases (MMP). The fluorescence resulting from the DQ-collagen IV cleavage was visualized using a Leica DMIRB fluorescence microscope equipped with the SPOT RT3 camera and the SPOT advanced software. The cell nuclei were counterstained with Hoechst or DAPI dye.
qRT-PCR expression analysis
RNA isolation and qRT-PCR were performed as previously described with minor modifications (11). Briefly, total RNA was harvested from ES cells using an RNeasy Mini Kit or miRNeasy Mini Kit (Qiagen) and DNase treated using Turbo DNA-free (Ambion). Reverse transcription was performed using a Superscript III kit (Invitrogen). Primers used for qRT-PCR (Supplementary Table S1) were designed using the Universal Probe Library Assay design Center (Roche) or Primer 3 (http://bioinfo.ut.ee/primer3-0.4.0/).
RNA was harvested from shSmyd5 cancer cells as described previously for ES cells (11). mRNA was purified using a Dynabeads mRNA purification kit (Invitrogen). Double-stranded cDNA was generated using a SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). cDNA was end-repaired using the End-It DNA End-Repair Kit (Epicentre), followed by addition of a single A nucleotide, and ligation of PE adapters (Illumina) or custom-indexed adapters. PCR was performed using Phusion High Fidelity PCR Master Mix. RNA-Seq libraries were sequenced on Illumina GAIIX or HiSeq platforms according to the manufacturer's protocol. At least two biological replicates were performed for shSmyd5 cancer cells RNA-Seq experiments.
The “reads per kilobase of exon model per million reads” (RPKM) measure, as defined previously (15), was used to quantify the mRNA expression level of a gene from RNA-Seq datasets. Differentially expressed genes were identified using edgeR (16) with a false discovery rate (FDR) < 0.001 and fold-change (FC) > 2. Genes with RPKM < 3 in both of the conditions being compared were excluded from this analysis. The RPBM (reads per base per million reads) measure was used to quantify RNA expression levels of LINE and LTR repeats from RNA-Seq datasets.
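As an illustration of the RPKM measure and the expression filter described above, the following minimal Python sketch may be useful. The function names and the example read counts are hypothetical and are not taken from the analysis pipeline used in this study; differential expression itself was assessed with edgeR, which is not reproduced here.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million reads."""
    # Dividing by (length / 1e3) and by (total reads / 1e6) gives the 1e9 factor.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def passes_expression_filter(rpkm_a, rpkm_b, min_rpkm=3.0):
    """Keep a gene unless its RPKM is below the cutoff in both conditions."""
    return not (rpkm_a < min_rpkm and rpkm_b < min_rpkm)


# Hypothetical gene: 500 reads on a 2,000-bp exon model, 10 million reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
print(passes_expression_filter(example, 1.2))
```

The filter removes a gene only when it falls below the cutoff in both conditions, so genes that are silent in one condition and expressed in the other are retained.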
Differentially expressed genes between shSmyd5 cancer cells and shLuc EBs were evaluated using Oncomine software. shSmyd5 cancer–repressed genes are underexpressed in shSmyd5 cancer cells and underexpressed in human lung adenocarcinoma versus normal lung (within the top 5% underexpressed; P = 3.44E−10, ref. 17; P = 1.12E−7, ref. 18), and underexpressed in additional human cancers including colon adenocarcinomas and invasive breast carcinomas versus normal colon and breast.
ChIP-Seq experiments were performed as previously described with minor modifications (11). The polyclonal H4K20me3 antibody (07-463) was obtained from Millipore. The polyclonal H3K9me3 (ab8898) antibody was obtained from Abcam. Briefly, ES cells were harvested and chemically crosslinked with 1% formaldehyde (Sigma) for 10 minutes at 37°C and subsequently sonicated using a Misonix XL2020 sonifier and setting #5 (18 cycles: 30-second pulse time, 1-minute rest). Sonicated cell extracts were used for ChIP assays. ChIP-enriched DNA was end-repaired using the End-It DNA End-Repair Kit (Epicentre), followed by addition of a single A nucleotide, and ligation of an indexed adapter. PCR was performed using Phusion High Fidelity PCR master mix. ChIP libraries were sequenced on an Illumina HiSeq platform according to the manufacturer's protocol. At least two biological replicates were performed for the H4K20me3 and H3K9me3 ChIP-Seq experiments.
Sequence reads were mapped to the mouse genome using bowtie2. To allow mapping to repetitive elements we used the default mode of bowtie2, which searches for multiple alignments, and reports the best one based on the alignment score (MAPQ; http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).
ChIP-Seq read enriched regions (peaks) were identified relative to Input DNA (sonicated chromatin) as previously described with minor modifications (11), using “Spatial Clustering for Identification of ChIP-Enriched Regions” (SICER) software (19) with a window size setting of 200 bp, a gap setting of 400 bp, and an FDR setting of 0.001. For a comparison of ChIP-enrichment between samples, a fold-change threshold of 1.5 and an FDR setting of 0.001 were used. The RPBM measure (reads per base per million reads) was used to quantify the density of histone modification and SMYD5 binding at genomic regions from ChIP-Seq datasets. The Kolmogorov–Smirnov test was also performed to obtain P value statistics and compare densities at genomic regions.
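The RPBM density measure can be sketched in a few lines of Python. This is an illustration only: the function name and the example numbers are hypothetical, and the sketch assumes that reads overlapping each region have already been counted.

```python
def rpbm(read_count, region_length_bp, total_reads):
    """Reads per base per million reads for one genomic region."""
    reads_per_base = read_count / region_length_bp
    return reads_per_base / (total_reads / 1e6)


# Hypothetical region: 400 reads over 2,000 bp in a library of 20 million reads.
print(rpbm(400, 2000, 20_000_000))
```

Because the value is scaled by both region length and library size, densities can be compared between regions of different sizes and between samples sequenced to different depths.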
Genomic DNA (gDNA) was harvested from shLuc ES cells and shSmyd5 cancer cells using the Promega Wizard Genomic DNA Purification Kit. gDNA was sonicated using a Diagenode Bioruptor and end-repaired using the End-It DNA End-Repair Kit (Epicentre) and sequenced on an Illumina HiSeq platform according to the manufacturer's protocol as described above. DNA sequencing (DNA-Seq) was performed using a read length of 75 bp. A total of 2.75 × 10⁸ and 2.53 × 10⁸ reads were obtained for shLuc ES cells and shSmyd5 cancer cells, respectively. At least two biological replicates were performed for the shSmyd5 cancer cell DNA-Seq experiments.
shLuc and shSmyd5 ES cells and shSmyd5 cancer cells were harvested and chemically crosslinked with 1% formaldehyde (Sigma) for 10 minutes at 37°C and subsequently sonicated using a Misonix XL2020 sonifier and setting #5 (18 cycles: 30-second pulse time, 1-minute rest). Sonicated cell extracts were used for Input-Seq assays. Input DNA was end-repaired using the End-It DNA End-Repair Kit (Epicentre) and sequenced on an Illumina HiSeq platform according to the manufacturer's protocol as described above. At least two biological replicates were performed for the shSmyd5 cancer cell Input-Seq experiments.
Copy number variation sequencing
Copy number variation sequencing (CNV-Seq) software (20) with default settings (P < 0.001) was used to identify regions of copy number alteration. For CNV-Seq analysis of Input-Seq data, regions that increased at least 2-fold were used for these analyses, while for CNV-Seq analysis of DNA-Seq data, regions that decreased at least 2-fold were used for these analyses.
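The 2-fold criterion applied to CNV-Seq output can be illustrated with the simplified Python sketch below. It compares library-size-normalized read counts in a single window and does not reproduce the statistical model or P value calculation of the CNV-Seq software; the function names and read counts are hypothetical.

```python
def normalized_fold_change(test_reads, test_total, ref_reads, ref_total):
    """Ratio of library-size-normalized read counts in one genomic window."""
    return (test_reads / test_total) / (ref_reads / ref_total)


def call_region(fold_change, threshold=2.0):
    """Label a window by the 2-fold criterion (increase or decrease)."""
    if fold_change >= threshold:
        return "increased"
    if fold_change <= 1.0 / threshold:
        return "decreased"
    return "unchanged"


# Hypothetical window: 300 test reads vs. 100 reference reads, equal library sizes.
fc = normalized_fold_change(300, 1_000_000, 100, 1_000_000)
print(call_region(fc))
```

In this scheme, increased windows would correspond to the Input-Seq regions retained for analysis and decreased windows to the DNA-Seq regions retained for analysis.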
Annotation of repetitive DNA sequences
Repetitive DNA sequence classes (e.g., LINE, LTR), families (L1, ERVK), and names (e.g., L1Md_T, IAPLTR1) were defined according to the annotations provided by the UCSC Genome Browser and RepeatMasker (http://www.repeatmasker.org), which uses curated libraries of repeats such as Repbase (http://www.girinst.org/repbase/).
Preparation of mouse metaphase chromosome suspension, Spectral karyotyping (SKY) probes, slide pretreatment, slide denaturation, detection, and imaging have been described previously (21). Numerical aberrations and structural aberrations were described according to nomenclature rules from Jackson Laboratories (http://www.informatics.jax.org/mgihome/nomen/gene.shtml). Ploidy designations for chromosome numbers used in this study have been presented previously (22). Loss of chromosomes (relative to cell ploidy) is classified as clonal when the identical chromosome is lost in three or more cells, and the gain of chromosomes as clonal when being present in two or more cells. Structural rearrangements must be detected in two or more cells to be classified as clonal changes.
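The clonality criteria above amount to a simple counting rule, sketched here in Python. The data structure (one set of event labels per metaphase spread) and the example events are hypothetical and serve only to illustrate the thresholds.

```python
from collections import Counter

# Minimum number of cells in which an event must be seen to be called clonal.
MIN_CELLS = {"loss": 3, "gain": 2, "structural": 2}


def clonal_events(cells, kind):
    """Return the events of one kind that meet the clonality threshold.

    `cells` holds one collection of event labels per metaphase spread.
    """
    counts = Counter(event for cell in cells for event in set(cell))
    return sorted(e for e, n in counts.items() if n >= MIN_CELLS[kind])


# Hypothetical chromosome losses scored in four metaphase spreads.
losses = [{"Y", "10"}, {"Y"}, {"Y", "10"}, {"13"}]
print(clonal_events(losses, "loss"))
```

Here the loss of chromosome 10 in two cells would not be called clonal, whereas the same count would suffice for a gain or a structural rearrangement.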
Whole chromosome paints for chromosomes X, 3, 6, 14, and 19 were used to further define several structural aberrations found by SKY. Protocols used in the preparation of mouse metaphase chromosome suspension, FISH probes, slide pretreatment, slide denaturation, detection, and imaging are found at http://www.riedlab.nci.nih.gov/index.php/protocols
We have applied the Kolmogorov–Smirnov test using R (https://cran.r-project.org/) to obtain P value statistics and compare densities at genomic regions using ChIP-Seq data. Survival analysis was determined by log-rank (Mantel–Cox) analysis using Prism software. P < 0.05 was considered significant.
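For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cumulative distribution functions of two samples. The pure-Python sketch below computes only this statistic, not the P value, and is not the R routine used in this study; the density values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (maximum ECDF difference)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum difference between two step functions occurs at a data point.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical read densities at the same regions in two conditions.
control = [1.0, 2.0, 3.0, 4.0]
knockdown = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(control, knockdown))
```

A value of 0 indicates identical empirical distributions and a value of 1 indicates completely separated samples.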
ChIP-Seq peaks were identified relative to Input DNA using SICER software (19) with a window size setting of 200 bp, a gap setting of 400 bp and a FDR setting of 0.001. For a comparison of ChIP-enrichment between samples a fold-change threshold of 1.5 and an FDR setting of 0.001 were used.
Differentially expressed genes were identified using edgeR (FDR < 0.001 and FC > 2; ref. 23).
CNV-Seq software (20) with default settings (P < 0.001) was used to identify regions of copy number alteration. For CNV-Seq analysis of Input-Seq data, regions that increased at least 2-fold were considered significant, while for CNV-Seq analysis of DNA-Seq data, regions that decreased at least 2-fold were considered significant.
Availability of data
The sequencing data from this study have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE94955.
SMYD5 safeguards genome integrity during ES cell differentiation
To test whether SMYD5 plays a role in safeguarding genome integrity, we evaluated the differentiation of SMYD5-depleted (short hairpin Smyd5; shSmyd5) ES cells relative to shLuc ES cells. RNAi knockdown resulted in decreased SMYD5 protein levels in shSmyd5 ES cells relative to shLuc ES cells (Fig. 1A). Previously, we showed that EB differentiation of SMYD5-depleted ES cells leads to complex structures containing bulges lined with a primitive endoderm (PE) layer (11). Interestingly, by extending our EB assays from 14 to 21 days, we observed the formation of transformed-like cells during differentiation in the absence of SMYD5 (Fig. 1B, i). These transformed colonies continued to emerge from the EBs and proliferate during the remainder of the culture period, and outnumbered the EBs after 21 days of culture.
To further characterize these transformed cells, we isolated small clumps of homogenously translucent and morphologically similar cells (Fig. 1B, ii and iii) and plated them on tissue culture dishes (Fig. 1B, iv and vi). In addition to their ability to proliferate in suspension, the transformed shSmyd5 cancer cells are capable of proliferating as a monolayer for an extended period of time (>2 months), and display a high proliferative rate (17.7-hour cell doubling time; Fig. 1C). Likewise, the shSmyd5 cancer cells are also capable of proliferating for an extended period of time (>2 months) in suspension in an anchorage-independent manner as clusters of cells, which is indicative of transformation (Fig. 1B, ii and iii). Moreover, the shSmyd5 cancer cells proliferated in 3D Matrigel concentrically away from the original embedded cells protruding the matrix in all directions (Supplementary Fig. S1A, top). Concomitantly, matrix degradation occurred as evidenced by progressive degradation of the Matrigel layer, and green fluorescence was detected following incubation with DQ-collagen IV (Fig. 1D; Supplementary Fig. S1A and S1B). The observed proteolytic activity of shSmyd5 cancer cells in 3D Matrigel (Fig. 1D; Supplementary Fig. S1A, bottom and S1B) is most likely derived from cleavage of DQ-collagen IV by the membrane-tethered 1-MMP (MMP14)/MMP2 axis, which has been extensively characterized (24) and has been shown to play a role in cell invasion (25). In contrast, control (shLuc) ES cells, which were embedded in 3D Matrigel and cultured in LIF-independent media, formed EB structures, but they did not protrude the 3D Matrigel, and fluorescence was not detected following incubation with DQ-collagen IV (Supplementary Fig. S1C). Altogether, these results suggest that shSmyd5 cancer cells are transformed.
To evaluate the tumorigenicity of shSmyd5 cancer cells in vivo, we injected them subcutaneously into SCID-beige mice and observed the formation of tumors containing mainly adenocarcinoma-like cells (Fig. 1E). In contrast, control shLuc ES cells form teratomas following subcutaneous injection into SCID-beige mice, which consist of a heterogeneous mixture of cells of the three germ layers, including ectoderm (keratinized epithelium, epidermis), mesoderm (muscle, adipocytes), and endoderm (glandular epithelium; Fig. 1F). To rule out the possibility that cell transformation was caused by an integration event that disrupted a tumor suppressor gene or by an off-target effect of the shSmyd5 construct, we tested three different sequences to knock down Smyd5 (11) and all of them led to the formation of transformed cells (Supplementary Fig. S1D–S1G). As described above, while shLuc ES cells formed spherical EB structures containing a PE layer during early differentiation (day 6; Supplementary Fig. S1E, left; ref. 11), shSmyd5 ES cells formed structures containing bulges lined with a PE layer (Supplementary Fig. S1E, right). The clusters of transformed cells emerged from shSmyd5-1, shSmyd5-2, and shSmyd5-3 EBs (Supplementary Fig. S1F), but not shLuc EBs. Moreover, the transformed shSmyd5 cancer cells are capable of proliferating as a monolayer (Supplementary Fig. S1G). In addition, shSmyd5-3 cancer cells developed tumors containing adenocarcinoma-like cells in vivo following injection into SCID-beige mice (Supplementary Fig. S1H).
To investigate whether the transformed shSmyd5 cells are associated with any chromosomal aberrations, we performed SKY analysis, using previously defined nomenclature rules (22). Sixteen control (shLuc) ES cell metaphase spreads analyzed by SKY revealed a diploid population (Fig. 1G), while 14 shSmyd5 cancer cell metaphase spreads analyzed by SKY revealed a polyclonal population of 50% near-diploid cells (2n = 40; chromosome numbers ranged from 39 to 49; Fig. 1H, top) and 50% near-tetraploid cells (chromosome numbers ranged from 70 to 83; Fig. 1H, bottom). The shSmyd5 cells are of male origin, and in both cell populations, the Y chromosome was lost. In the diploid cell population, chromosomes that were clonally gained are X, 1, 2, 4, 12, and 19. Clonal structural aberrations involved chromosomes 12, 14, and 19 (Supplementary Table S2). Structural aberrations involving chromosomes 14 and 19 were found to contain homogeneously staining regions, which are typically indicative of gene amplifications. Chromosome 19 also was found by SKY to be deleted at the distal end of the chromosome (19D1).
In the tetraploid shSmyd5 cancer cells, more prevalent chromosome losses include chromosomes 10, 11, 13, 17, and 18, and a gain of chromosome 8 was found in 3 of 7 cells. The same structural aberrations involving chromosomes 14 and 19 were also found in the tetraploid cell population (Supplementary Table S2). The main difference between the 2n and 4n shSmyd5 cancer cell populations is the increase in chromosome instability (CIN) in the 4n cells, which includes the presence of several novel unbalanced translocations and dicentric chromosomes in the 4n population. The dicentric chromosomes were complex in that they not only had amplifications of regions from chromosome 19 but were also fused to different chromosomes (2, 6, 8, and 12; Supplementary Table S2). In summary, all of the structural aberrations involving chromosomes 12, 14, and 19 resulted in an imbalance (gains and losses) of these chromosome sequences (Supplementary Table S2).
Whole chromosome paints (WCP) for chromosomes X, 3, 6, 14, and 19 were used to further define several clonal aberrations found by SKY (Fig. 1I). These FISH results confirmed the deletions and several translocations observed in the SKY analysis.
Copy number alterations in shSmyd5 cancer cells are associated with decreased H4K20me3/H3K9me3 and enriched with repetitive elements
Copy number alterations (CNA), which are a structural variation that is a source of genetic variation and disease susceptibility, are commonly found in cancer cells with compromised genome integrity (23). To identify regions of CNA between shSmyd5 cancer cells and control (shLuc) ES cells, we performed whole-genome DNA-Seq. Using DNA-Seq, we obtained 7.75× and 7.13× coverage of the mouse genome for shLuc ES cells and shSmyd5 cancer cells, respectively. We then used CNV-Seq software (20) to identify CNA regions. Using this approach, we found 3,427 CNA regions (size range of 7 kb–2.26 Mb; average size of 235 kb; median size of 15.9 kb; Fig. 2A; red and green). A number of the major deletions identified using SKY analysis involving chromosomes 9, 12, 14, and 19 were also identified using DNA-Seq.
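The reported coverage values are consistent with the read counts and the 75-bp read length given in Materials and Methods, as the following back-of-the-envelope Python sketch shows. The genome size of 2.66 Gb is an assumption made for illustration; it is not stated in the text, and the function name is hypothetical.

```python
def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Assumed mouse genome size (not stated in the text).
MOUSE_GENOME_BP = 2.66e9

# Read counts and read length as reported for the DNA-Seq experiments.
shluc = fold_coverage(2.75e8, 75, MOUSE_GENOME_BP)
shsmyd5 = fold_coverage(2.53e8, 75, MOUSE_GENOME_BP)
print(round(shluc, 2), round(shsmyd5, 2))  # 7.75 7.13
```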
To evaluate open chromatin regions, we also sequenced ChIP input chromatin from shLuc ES cells, shSmyd5 ES cells, and shSmyd5 cancer cells (Input-Seq). While we did not observe significant alterations in open chromatin between shLuc ES cells and shSmyd5 ES cells using CNV-Seq software, we found 2,088 regions (size range of 54 kb–8.6 Mb; average size of 682 kb, median size of 202.2 kb) between shSmyd5 cancer cells and shLuc ES cells (77% of altered Input-Seq regions increased in shSmyd5 cancer cells; Fig. 2A). These results suggest that shSmyd5 cancer cells may exhibit more decondensed chromatin relative to control (shLuc) ES cells. Because we did not identify any significant alterations in open chromatin between shLuc and shSmyd5 ES cells, we used shLuc ES cells as a control for shSmyd5 cancer cells.
We then investigated whether regions of altered Input-Seq chromatin compaction identified between shSmyd5 cancer cells and shLuc ES cells are correlated with SMYD5 binding and changes in H4K20me3, H3K9me3/2, and HP1 levels in shSmyd5 ES cells relative to shLuc ES cells at CNA regions (Fig. 2B). Using ChIP-Seq data that we generated for shSmyd5 ES cells relative to shLuc ES cells (11), we found that H4K20me3, H3K9me3/2, and HP1 levels decrease at CNA regions and regions of altered Input-Seq open chromatin (Fig. 2B). We also observed SMYD5 enrichment at regions of altered Input-Seq chromatin compaction (Fig. 2B). Moreover, we investigated whether there is an association between CNA regions or regions of altered Input-Seq open chromatin compaction and enrichment of H4K20me3 and SMYD5 by evaluating the percentage of H4K20me3 or SMYD5 islands that overlap the CNA regions or the regions of altered Input-Seq open chromatin compaction, respectively. These results show that 33% of H4K20me3 islands and 23% of SMYD5 islands overlap with altered Input-Seq regions (Fig. 2C, top), and 2% of H4K20me3 islands and 2.4% of SMYD5 islands overlap with DNA-Seq CNA regions (Fig. 2C, bottom). SMYD5 and H4K20me3 occupancy was markedly higher at CNA (DNA-Seq) and altered open chromatin regions (Input-Seq) relative to random genomic sequences of comparable size and frequency (Fig. 2C), which were used as controls, demonstrating that SMYD5 and H4K20me3 are enriched at CNA and altered Input-Seq regions in shSmyd5 cancer cells.
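The percentage of islands overlapping a set of regions can be computed along the lines of the Python sketch below. It assumes intervals are given as (chromosome, start, end) tuples with half-open coordinates and uses a simple all-against-all comparison; the coordinates are hypothetical, and the same function could be applied to randomly placed control regions for comparison.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(islands, regions):
    """Percentage of islands that overlap at least one region."""
    hits = sum(any(overlaps(i, r) for r in regions) for i in islands)
    return 100.0 * hits / len(islands)


# Hypothetical ChIP-Seq islands and altered regions.
islands = [("chr1", 100, 500), ("chr1", 2000, 2400),
           ("chr2", 100, 500), ("chr3", 0, 50)]
regions = [("chr1", 400, 1500), ("chr2", 600, 900)]
print(percent_overlapping(islands, regions))  # 25.0
```

For genome-scale data, an interval tree or a sorted sweep would be preferable to the quadratic comparison used here for clarity.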
Because H4K20me3 and SMYD5 are enriched at LINE- and LTR-repetitive DNA sequences (11), we characterized whether the CNA or Input-Seq regions contain known repetitive sequences. To this end, we evaluated the percentage of CNA or Input-Seq regions that contain at least 30% coverage of a repeat element. Indeed, the enrichment of LINE and LTR elements was significantly higher at these regions than at random genomic sequences, which were used as controls (Fig. 2D and E). We also evaluated the percent coverage of LTR and LINE sequences for all CNA or Input-Seq regions, and observed enrichment of LTR and LINE sequences in Input-Seq (Supplementary Fig. S2A and S2B) and DNA-Seq (Supplementary Fig. S2C and S2D) CNA regions relative to random genomic regions. We also evaluated the percent coverage of ERVK and L1 sequences for all CNA or Input-Seq regions, and observed enrichment of ERVK and L1 sequences in altered Input-Seq regions (Supplementary Fig. S2E and S2F) and DNA-Seq CNA regions (Supplementary Fig. S2G and S2H) relative to random genomic regions.
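One way to evaluate the 30% coverage criterion is to compute the fraction of bases in each region that are annotated as a given repeat class, as in the Python sketch below. This interpretation of "coverage", the half-open (chromosome, start, end) interval format, and the example coordinates are assumptions made for illustration.

```python
def covered_fraction(region, repeats):
    """Fraction of a region's bases covered by repeat annotations."""
    chrom, start, end = region
    # Clip each repeat to the region and sort by start position.
    clipped = sorted((max(s, start), min(e, end))
                     for c, s, e in repeats
                     if c == chrom and s < end and e > start)
    covered, reached = 0, start
    for s, e in clipped:
        if e > reached:
            # Count only bases not already covered by an earlier repeat.
            covered += e - max(s, reached)
            reached = e
    return covered / (end - start)


# Hypothetical region and LTR annotations (two of them overlap each other).
region = ("chr1", 0, 1000)
ltrs = [("chr1", 100, 300), ("chr1", 250, 450),
        ("chr2", 0, 1000), ("chr1", 900, 1200)]
fraction = covered_fraction(region, ltrs)
print(fraction, fraction >= 0.30)  # 0.45 True
```

Overlapping annotations are merged before counting so that no base contributes more than once to the covered fraction.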
Depletion of SMYD5 leads to decreased H4K20me3 and decreased H3K9me3 levels during differentiation
Next, we tested whether H4K20me3 and H3K9me3 are reduced in shSmyd5 cancer cells or shSmyd5 EBs relative to control EBs or ES cells using ChIP-Seq. These results revealed a global decrease of H4K20me3 (Fig. 3A) and H3K9me3 (Fig. 3B) in shSmyd5 EBs relative to shLuc EBs. Likewise, we observed a global decrease in H4K20me3 (Fig. 3C) and H3K9me3 (Fig. 3D) levels in shSmyd5 cancer cells relative to day 14 shLuc EBs. Average profiles (Fig. 3E and F) or boxplots (Fig. 3G and H) around H4K20me3 or H3K9me3 peaks also revealed global decreases in H4K20me3 (Fig. 3E and G) and H3K9me3 (Fig. 3F and H) levels during EB differentiation and in shSmyd5 cancer cells. Browser views show decreased levels of H4K20me3 and H3K9me3 (Fig. 3I). These results demonstrate that depletion of SMYD5 leads to more pronounced decreases in H4K20me3 and H3K9me3 levels in EBs and shSmyd5 cancer cells relative to ES cells. However, we found that the protein levels of the H3K9 methyltransferase, G9a, were similar between shLuc and shSmyd5 ES cells (Supplementary Fig. S3A), and between shLuc and shSmyd5 EBs (Supplementary Fig. S3B), suggesting that changes in H3K9me3 levels in SMYD5-depleted cells are not due to altered expression of G9a.
Our data further show that H4K20me3 and H3K9me3 levels exhibit even greater decreases in shSmyd5 cancer cells relative to EBs at CNA regions (Fig. 3J and K). These results suggest that the decrease in H4K20me3 during differentiation in the absence of SMYD5 is correlated with the formation of CNA regions in shSmyd5 cancer cells.
Depletion of SMYD5 leads to elevated expression of repetitive DNA elements during differentiation
While we previously demonstrated that SMYD5 mediates silencing of LTR and LINE repeat classes in ES cells (11), it is unknown whether SMYD5 silences repetitive DNA elements during differentiation. Therefore, we investigated whether decreased H4K20me3 leads to increased expression of LINE and LTR repeats in SMYD5-depleted EBs and shSmyd5 cancer cells. Our results show that SMYD5-depleted EBs and shSmyd5 cancer cells exhibit increased expression of L1 (Fig. 4A) and ERVK repeat families (Fig. 4B) and subfamilies (Fig. 4C and D). Moreover, we observed similar expression profiles of ERVK (Supplementary Fig. S4A) and L1 repeats (Supplementary Fig. S4B) between different shSmyd5 cancer cell samples.
To investigate whether decreased H4K20me3 leads to increased expression of full-length, or intact, LTR retrotransposons and ERVs in SMYD5-depleted EBs and shSmyd5 cancer cells, we first performed a de novo search for full-length LTR retrotransposons and ERVs in the mouse genome using LTRharvest software (http://genometools.org/), and annotated internal features of these LTR regions using LTRdigest software. We then evaluated the expression of LTR internal annotated features in differentiated control (shLuc) and SMYD5-depleted (shSmyd5) EBs, and in shSmyd5 cancer cells. These results reveal an increase in expression of LTR retrotransposons/ERV features including sequences encoding viral proteins such as gag and pol during EB differentiation of SMYD5-depleted cells (Fig. 4E, left), and even higher expression levels of these elements in shSmyd5 cancer cells (Fig. 4E, right). We also observed increased expression of the IAP family of LTR retrotransposons in shSmyd5 cancer cells relative to ES cells (Fig. 4F). These findings suggest that depletion of SMYD5 leads to decreased H4K20me3 levels and increased expression of the underlying LTR retrotransposon/ERV sequences.
To investigate a relationship between the derepression of LTR retrotransposons/ERV regions, occupancy of SMYD5/H4K20me3, and cancer in humans, we evaluated the overlap between LTR regions and SMYD5/H4K20me3 occupancy. Using this method, we found 955 regions that were occupied by SMYD5/H4K20me3 and contained LTR retrotransposons/ERV sequences (11). We then used GREAT gene ontology software (26) to functionally annotate these regions, and found that genes within these regions were identified in a copy number alterations study of 191 breast tumor samples, and among genes coamplified with MYCN in primary neuroblastoma tumors (Fig. 4G). These findings suggest that SMYD5-mediated repression of LTR retrotransposons/ERV regions, marked by H4K20me3, represses expression of nearby genes in mouse cells whose expression and copy number are positively correlated with human cancers.
To investigate whether loss of SMYD5-dependent silencing of LTR/ERV elements leads to upregulated expression of nearby genes, we evaluated the number of upregulated genes in shLuc day 14 EBs (control) and shSmyd5 cancer cells that contain LTR/ERV sequences within 10 kb of their transcriptional start site (TSS; Fig. 5A and B). Of the 3,715 genes that were upregulated in shSmyd5 cancer cells relative to control EBs (1.5-fold change, FDR < 0.05), 880 genes (24%) contained LTR/ERV sequences (Fig. 5A, left) and 645 genes (17%) contained LINE/L1 sequences (Fig. 5A, right) within 10 kb of their TSS. Annotation of the LTR/ERV (Fig. 5C, left) and LINE/L1 (Fig. 5C, right) elements revealed that they mainly reside in intronic and intergenic regions. We then evaluated the expression state of LTR/ERV and LINE/L1 elements near differentially expressed genes. These results revealed an increase in the expression of LTR/ERV (Fig. 5D, left) and LINE/L1 (Fig. 5D, right) elements in cancer cells relative to control EBs.
Moreover, we observed decreased H4K20me3 levels at nearby LTR/ERVK (Fig. 5E, left) and LINE/L1 (Fig. 5E, right) elements in shSmyd5 cancer cells, suggesting that SMYD5-dependent control of H4K20me3 supports the repression of LTR/ERV elements near these genes. Overall, these results demonstrate that SMYD5 influences the expression of nearby genes by silencing LTR/ERV elements.
Interestingly, the upregulated genes in shSmyd5 cancer cells, which contain H4K20me3 marks and LTR/LINE elements, are enriched in multiple human cancers (lung and breast cancer; Fig. 5F and G). These results suggest that SMYD5-dependent silencing of LTR/LINE elements represses the expression of a cancer transcriptional program during differentiation.
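The TSS-proximity count reported above (repeat elements within 10 kb of a TSS) can be sketched as follows. This is an illustrative sketch only, not the analysis code used here: the function names, the dictionary layouts, and the toy coordinates are assumptions, and strand is ignored. The final lines recompute the reported percentages from the gene counts given in the text.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # assumed half-open [start, end)


def genes_with_nearby_repeat(tss_by_gene: Dict[str, Tuple[str, int]],
                             repeats_by_chrom: Dict[str, List[Interval]],
                             window: int = 10_000) -> Set[str]:
    """Return genes with at least one repeat overlapping TSS +/- window."""
    hits: Set[str] = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        lo, hi = tss - window, tss + window
        for start, end in repeats_by_chrom.get(chrom, []):
            if start <= hi and end > lo:  # repeat touches the window
                hits.add(gene)
                break
    return hits


def percent(part: int, total: int) -> float:
    """Percentage of total represented by part."""
    return 100.0 * part / total


# Toy annotation (hypothetical gene names and coordinates).
toy_tss = {"A": ("chr1", 50_000), "B": ("chr1", 200_000), "C": ("chr2", 5_000)}
toy_repeats = {"chr1": [(58_000, 59_000), (150_000, 151_000)], "chr2": []}
print(sorted(genes_with_nearby_repeat(toy_tss, toy_repeats)))

# Percentages recomputed from the counts stated in the text.
print(round(percent(880, 3715)), round(percent(645, 3715)))
```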
shSmyd5 cancer cell signature genes predict outcome of patient survival
To investigate whether shSmyd5-induced cell transformation is related to human cancers, we first identified differentially expressed genes between shSmyd5 cancer cells and shLuc and shSmyd5 ES cells and day 14 EBs using RNA-Seq. K-means clustering followed by hierarchical clustering identified major patterns of gene expression variability (Fig. 6A). Notably, shSmyd5 cancer cells expressed a number of MMPs, namely MMP23, MMP2, MMP14, MMP19, and MMP24. The observed proteolytic activity of shSmyd5 cancer cells in 3D Matrigel (Fig. 1D; Supplementary Fig. S1A) is most likely due to the membrane-type 1 MMP (MT1-MMP; MMP14)/MMP2 axis, which has been shown to play a role in cell invasion (25).
Principal component analysis (PCA) was used to determine the three-dimensional proximity of shSmyd5 cancer cells to shLuc and shSmyd5 ES cells and EBs (Fig. 6B). PCA revealed a skewed trajectory of shSmyd5 EB differentiation and shSmyd5 cancer cell formation compared with control EB differentiation (Fig. 6B). Moreover, by comparing shSmyd5 cancer cell and shLuc day 14 EB DE genes with gene expression data from primitive cells (ES cells) and differentiated cells (day 14 EBs), using gene set enrichment analysis (GSEA; ref. 27), we found that differentially expressed genes are enriched in EBs (Fig. 6C), suggesting that differentiation genes are dysregulated in shSmyd5 cancer cells. DAVID gene ontology (GO) analysis further confirmed that developmental GO terms, including cell differentiation, system development, cell development, gene expression, and lung development, were overrepresented among genes differentially expressed between shSmyd5 cancer cells and shLuc day 14 EBs (Fig. 6D).
Next, we identified differentially expressed genes between shSmyd5 cancer cells and shLuc EB day 14 cells (Fig. 6E). Our results show that shSmyd5 cancer cells overexpress lineage-specific genes such as Nog, Snai, Sox7, Thbd, and Spink2 and underexpress genes such as Col1a2, Erbb2ip, Gata3, Smyd5, and Sox11 (Fig. 6E). Interrogation of the overexpressed and underexpressed signature genes with a compendium of cancer expression datasets using Oncomine (28) revealed a correlation between genes that were underexpressed in shSmyd5 cancer cells and genes underexpressed in human lung adenocarcinoma versus normal lung with known clinical outcomes (within the top 5% underexpressed; P = 3.44E−10, ref. 17; P = 1.12E−7, ref. 18; Fig. 6F). We divided the lung adenocarcinoma datasets into two groups: those with high (top 10%) and those with low (bottom 10%) expression of shSmyd5 cancer cell–repressed genes. A Kaplan–Meier analysis was then performed on the datasets to investigate an association between the shSmyd5 cancer cell expression signature and patient outcome (survival; Fig. 6G). We found that lung adenocarcinoma patients with low expression of shSmyd5 cancer cell–repressed genes have a decreased rate of survival relative to patients with high expression of these genes. A similar correlation was observed between the shSmyd5 cancer cell–repressed genes and additional human cancers including colon adenocarcinomas (Fig. 6H) and invasive breast carcinomas (Fig. 6I).
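For readers unfamiliar with the survival analysis described above, the following Python sketch shows the general shape of the calculation: patients are split into top and bottom 10% groups by a signature score, and a Kaplan–Meier estimate is computed for a group. It is a minimal sketch under stated assumptions and not the procedure used in this study; the function names, the per-patient score, and the toy follow-up data are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_score(scores: Sequence[float],
                   fraction: float = 0.10) -> Tuple[List[int], List[int]]:
    """Return patient indices in the bottom and top `fraction` by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    return order[:k], order[-k:]


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (time, survival) at each observed event time.

    events[i] is 1 if the event (death) was observed and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (hypothetical): months and event indicators.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

In practice, one curve would be estimated per group returned by `split_by_score` and the two curves compared.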
We also investigated copy number variation (CNV) and the mutational profile of SMYD5 in human cancers using CONAN (29; Fig. 6J). These results show that a loss of heterozygosity of SMYD5 is present in many human cancers including lung and breast cancer (Fig. 6J). Altogether, these results suggest that a loss of SMYD5 is associated with cancer formation.
To investigate whether knockdown of SMYD5 may potentiate tumor initiation in human cells, we depleted SMYD5 mRNA in human embryonic stem cells (H1 hESCs) using RNAi (see Materials and Methods; Fig. 7A and B; Supplementary Fig. S5A). qRT-PCR confirmed that SMYD5 mRNA levels were reduced in shSmyd5 hESCs relative to control (shLuc) hESCs (Fig. 7C). We observed altered differentiation of SMYD5-depleted hESCs, including a greater frequency of cavitated/cystic EBs and a decreased frequency of solid EBs relative to shLuc hESCs at day 13 of differentiation (Fig. 7D and E; Supplementary Fig. S5B). Interestingly, by extending the EB assay, we also observed the formation of transformed-like cells during EB differentiation of shSmyd5 hESCs (Fig. 7F and G; Supplementary Fig. S5C and S5D). Similar to mouse shSmyd5 cancer cells, human shSmyd5 cancer cells are capable of proliferating in suspension in an anchorage-independent manner as clusters of cells on low binding dishes (Fig. 7G and H; Supplementary Fig. S5E) or as a monolayer (Fig. 7I). Moreover, human shSmyd5 cancer cells proliferated in 3D Matrigel (Fig. 7J, right), and incubation with DQ-collagen IV revealed proteolytic activity of the embedded cells (Fig. 7J, right). In contrast, control (shLuc) EBs did not proliferate in 3D Matrigel, and fluorescence was not detected following incubation with DQ-collagen IV (Fig. 7J, left).
To evaluate whether depletion of SMYD5 in a nontumorigenic epithelial cell line leads to transformation, we knocked down SMYD5 in MCF10A breast epithelial cells and assayed for 3D growth in Matrigel. Our results demonstrate that shSmyd5 MCF10A cells exhibit altered growth characteristics relative to control (shLuc) MCF10A cells. Specifically, while shLuc MCF10A cells formed normal round acini (Supplementary Fig. S6A, left), shSmyd5 MCF10A cells formed irregularly shaped acini that expanded into the matrix (Supplementary Fig. S6A, right). In addition, Phalloidin and DAPI staining showed that shLuc MCF10A cells formed normal acini with a lumen (Supplementary Fig. S6B, left), while shSmyd5 MCF10A cells formed irregular acini without a lumen structure (Supplementary Fig. S6B, right). Moreover, 3D culture in Collagen I showed that shSmyd5 MCF10A cells exhibit a highly branched structure relative to shLuc MCF10A cells (Supplementary Fig. S6C). Taken together, these results suggest that depletion of SMYD5 in MCF10A cells leads to a partially transformed phenotype.
To evaluate whether SMYD5 regulates expression of genes in human cancer cells, we knocked down SMYD5 in HCT-116 colon cancer, A549 lung cancer, and MCF7 breast cancer cells using RNAi (see Materials and Methods). qRT-PCR demonstrated that SMYD5 mRNA levels decreased by 86% in shSmyd5 HCT-116 cells (Fig. 7K), 82% in shSmyd5 A549 cells (Fig. 7L), and 90% in shSmyd5 MCF7 cells (Supplementary Fig. S7A) relative to shLuc HCT-116, shLuc A549, or shLuc MCF7 cells, respectively. Moreover, we observed upregulation of genes overexpressed in colon cancer such as CDX2, HNF4A, NAV2, and HOXB9 (30–33), and downregulation of genes underexpressed in colon cancer including SMAD4 and ALDH1 (34, 35) in shSmyd5 HCT-116 cells relative to shLuc HCT-116 cells (Fig. 7K). In addition, we observed upregulation of genes overexpressed in lung cancer such as FOXA3, SOX2, FGFR1, EGFR, RHOV, CCND1, TBX2, and USP7 (36–42) in shSmyd5 A549 cells relative to shLuc A549 cells (Fig. 7L). We also observed upregulation of genes overexpressed in breast cancer such as PGR, MMP11, CCNB1, STK15 (43–46), and downregulation of SCUBE2 (47) in shSmyd5 MCF7 cells relative to shLuc MCF7 cells (Supplementary Fig. S7A).
We also investigated the in vivo consequence of depleting SMYD5 in human colon and lung cancer cells. To this end, shLuc or shSmyd5 HCT-116 cells, or shLuc or shSmyd5 A549 cells, were injected subcutaneously into SCID-beige mice. Mice injected with shSmyd5 HCT-116 cells (Fig. 7M and N) or shSmyd5 A549 cells (Fig. 7O and P) exhibited significantly increased tumor growth relative to mice injected with shLuc HCT-116 or shLuc A549 cells, respectively.
Results presented in this study implicate a role for SMYD5 in maintaining genome stability of ES cells during differentiation. The formation of transformed cells during differentiation in SMYD5-depleted ES cells is likely attributable to decreased H4K20 methylation, resulting in decreased levels of heterochromatin marks (H3K9 methylation/HP1), and transcriptional dysregulation of the underlying repetitive DNA elements. H4K20 methylation has been linked to multiple cellular processes including heterochromatin formation, transcriptional regulation and repression (6), and genome stability (7). Moreover, our results show that loss of SMYD5-dependent silencing of LTR/ERV elements leads to upregulated expression of nearby genes. Our results, which implicate a role for SMYD5 in regulating genome stability, are in alignment with the known function of other H4K20 histone methyltransferases, Suv420h1 and Suv420h2 (48). In addition, decreased H4K20me3 levels are a common hallmark of cancer, where decreased H4K20me3 levels occur early during transformation, and are progressively lost through the most malignant stages (49). Also, decreased H4K20me3 is correlated with preneoplasia and squamous cell lung cancer, and the level of H4K20me3 was found to decrease with disease progression (50). Concomitantly, decreased H4K20me3 is associated with poor prognosis in breast cancer and tumor progression (51). H3K9 methylation also plays a critical role in protecting genome stability, where a loss of Suv39h H3K9 HMTases leads to chromosomal instability, and decreased levels of H3K9 methylation have been found in cancer cells (52). Our results are consistent with these findings, where we observed decreased levels of the repressive histone modifications H4K20me3 and H3K9me3 in shSmyd5 cancer cells relative to control cells, suggesting that decreased levels of H4K20me3 and H3K9me3 are associated with genome instability, tumorigenesis, and cancer progression.
Moreover, our results also suggest that SMYD5 may act as a tumor suppressor in human cells, where depletion of SMYD5 in hESCs resulted in the formation of transformed cells during differentiation, and depletion of SMYD5 in human cancer cells led to increased tumor growth and changes in expression of genes associated with cancer progression or prognosis.
In this study, we found that shSmyd5 cancer cells exhibit chromosomal aberrations, including copy number alterations. Our results revealed a positive correlation between SMYD5 binding, altered levels of repressive histone modifications and heterochromatin proteins (H4K20me3, H3K9me3, HP1) in SMYD5-depleted cells, and the occurrence of copy number alterations in shSmyd5 cancer cells. On the basis of these findings, we propose that SMYD5-dependent H4K20me3 is important for maintaining a heterochromatic structure that protects genome stability. Absence of SMYD5 may favor a more relaxed heterochromatic state, which may be prone to genome instability during cellular state transitions from self-renewal to lineage commitment. In this case, inappropriate chromosome conformational changes may ensue from the induction of specific transcriptional programs during differentiation in the absence of adequate levels of heterochromatin, thus resulting in genome instability. It is also possible that derepression of ERVs during differentiation in the absence of SMYD5 may promote genome instability through insertional and postinsertion-based mutagenesis of activated ERVs (53, 54).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: B.L. Kidder, S. Sheng, K. Zhao
Development of methodology: B.L. Kidder, M.M. Bernardo, S. Sheng, K. Zhao
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): B.L. Kidder, R. He, D. Wangsa, M.M. Bernardo, S. Sheng
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B.L. Kidder, R. He, D. Wangsa, H.M. Padilla-Nash, M.M. Bernardo, S. Sheng, T. Ried, K. Zhao
Writing, review, and/or revision of the manuscript: B.L. Kidder, D. Wangsa, M.M. Bernardo, S. Sheng, K. Zhao
Study supervision: B.L. Kidder, S. Sheng, K. Zhao
Other (analysis of spectral karyotyping, cytogenetics): H.M. Padilla-Nash
We thank Drs. Zhiyong Ding, Wenfei Jin, Gangqing Hu, Ana Robles, and Curt Harris for helpful discussions. This work utilized the Wayne State University (WSU) High Performance Computing Grid and the NIH HPC Biowulf cluster. The DNA Sequencing and Genomics Core, Light Microscopy Core, Transgenic Core, and Pathology Core facilities of National Heart, Lung and Blood Institute assisted with this work. This work was supported by the Division of Intramural Research of the National Heart, Lung and Blood Institute and Wayne State University.
This work was supported by Wayne State University, Karmanos Cancer Institute, the Division of Intramural Research of the National Heart, Lung and Blood Institute, and a grant from the National Heart, Lung and Blood Institute (1K22HL126842-01A1) awarded to B.L. Kidder.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.