Abstract
Genome-wide association studies (GWAS) have identified dozens of loci associated with colon and rectal adenocarcinoma risk. As tissue-specific super-enhancers (SE) play important roles in tumorigenesis, we systematically investigate SEs and inner variants in established GWAS loci to decipher the underlying biological mechanisms.
Through a comprehensive bioinformatics analysis on multi-omics data, we screen potential single-nucleotide polymorphisms (SNP) in cancer-specific SEs, and then subject them to a two-stage case–control study containing 4,929 cases and 7,083 controls from the Chinese population. A series of functional assays, including reporter gene assays, electrophoretic mobility shift assays (EMSA), CRISPR–Cas9 genome editing, chromosome conformation capture (3C) assays, and cell proliferation experiments, are performed to characterize the variant's molecular consequence and target genes.
The SNP rs11064124 in 12p13.31 is found significantly associated with the risk of colon and rectal adenocarcinoma with an odds ratio (OR) of 0.87 [95% confidence interval (CI), 0.82–0.92, P = 8.67E-06]. The protective rs11064124-G weakens the binding affinity with vitamin D receptor (VDR) and increases the enhancer's activity and interactions with two target genes' promoters, thus coactivating the transcription of CD9 and PLEKHG6, which are both putative tumor suppressor genes for colon and rectal adenocarcinoma.
Our integrative study highlights an SE polymorphism rs11064124 and two susceptibility genes CD9 and PLEKHG6 in 12p13.31 for colon and rectal adenocarcinoma.
These findings suggest a novel insight for genetic pathogenesis of colon and rectal adenocarcinoma, involving transcriptional coactivation of diverse susceptibility genes via the SE element as a gene regulation hub.
Introduction
There are estimated to be more than 1.8 million new colorectal cancer cases and 881,000 deaths worldwide in 2018, which account for 10.2% of cancer cases and 9.2% of deaths for both sexes. Overall, colorectal cancer ranks third in terms of incidence and second in terms of mortality. Both incidence and mortality rose in the most recent decade in China (1). Convincing evidence is noted that smoking, fatness, processed meat, alcohol drinks, and insufficient physical activity increase the risk to develop colorectal cancer (1–4). In addition, genetic component plays an essential role in colorectal cancer etiology. To date, genome-wide association studies (GWAS) have identified more than 100 loci associated with colorectal cancer susceptibility (5–7). Still, the vast majority of GWAS-discovered single-nucleotide polymorphisms (SNP) reside in noncoding regulatory regions, which are mainly enhancers without definite target genes (8–10). Thus, it requires in depth follow-up researches to decipher the underlying biological mechanisms.
Recently, a special group of enhancers spanning large genomic regions in a clustered manner have been discovered and termed as super-enhancers (SE; refs. 9, 11). Compared with typical enhancers, SEs are larger in size, bind with higher density of transcription factors (TF), and possess greater ability to activate gene transcription (12, 13). They are also found frequently related to key lineage-specific genes, which are essential to the state and differentiation of normal somatic cells (14, 15). Interestingly, cancer cells could generate SEs that are absent in their healthy counterparts, and these cancer-specific SEs drive expression of the critical genes that are involved in tumorigenesis (11), such as c-MYC (9), MYCN (16), TAL1 (17), RARA (18), and CACNA1H (19).
Moreover, sequence variations disproportionately impact SE domains. Disease-associated genetic variants are more enriched in the SEs of relevant cells (9). Some studies reported that functional or causal SNPs in SEs explicitly influenced cancer risk and related pathogenic processes (20–22). For example, Oldridge and colleagues identified a recently evolved SNP rs2168101 in a SE element within the first intron of the gene LMO1, which affected neuroblastoma risk via differential GATA binding and directed regulation of LMO1's transcription and expression (22).
Given the importance of SEs in addressing key issues of cancer biology, we hypothesized that common SNPs in cancer-specific SEs of GWAS loci might confer susceptibility of colon and rectal adenocarcinoma by regulating gene transcription and expression. In this study, we screened potential SNPs via a comprehensive bioinformatics analysis on multi-omics data, evaluated their associations with risk of colon and rectal adenocarcinoma in a two-stage case–control study, and conducted a succession of functional assays to elucidate the biological mechanisms.
Materials and Methods
Study subjects
A two-stage case–control study design was used to assess the associations between candidate SNPs and the risk of colon and rectal adenocarcinoma. Stage 1 contained 757 patients and 755 healthy controls enrolled from Peking Union Medical College Hospital (Beijing, China). Stage 2 contained 4,172 cases and 6,328 controls, and they were recruited from local hospitals in Wuhan, China (Supplementary Table S1). All these participants were unrelated Chinese of ethnic Han. Cases were histopathologically confirmed colon and rectal adenocarcinoma, when controls were cancer-free individuals collected from community nutritional surveys. A written informed consent and 1 mL blood sample were acquired from each subject, a portion of which were reported in some published papers of our group (23–28). This study was executed with the approval from the ethics committee of Tongji Medical College, Huazhong University of Science and Technology (Wuhan, China).
Selection of candidate SNPs
First, we downloaded H3K27ac chromatin immunoprecipitation-sequencing (ChIP-seq) data of four cancer cell lines HCT-116, VACO_400, VACO_503, VACO_9m and three normal colon tissue samples Colon_Crypt_1, Colon_Crypt_2, Colon_Crypt_3 from GEO database, and applied the algorithm ROSE to calculate their SEs (9). The unions of SEs in cancer cell lines (Union 1) and normal colon tissues (Union 2) were generated. To obtain cancer-specific SEs (Union 1-Union 2), we excluded the SEs with any regional overlap with the Union 2′s SEs in Union 1. Second, in view of the existence of the chromosomal gaps between constituent enhancers, we further extracted the ChIP-seq peaks in cancer-specific SEs to focus on the most likely functional regions. Third, we selected candidate loci with tagSNPs that had been discovered and replicated in a large-scale GWAS study of East Asians with 14,963 cases and 31,945 controls (6). Haploview 4.2 was applied to compute linkage disequilibrium (LD) regions of above loci based on the Chinese Han Beijing (CHB) genotype information that range ±500 kb centering on corresponding tagSNPs (r2 ≥ 0.8). A total of 34 LD blocks were considered as susceptibility loci in this study (Supplementary Table S2). Fourth, we screened out the common SNPs [CHB minor allele frequency (MAF) > 0.05] affecting TF binding predicted by the program Match-1.0 Public according to the library of positional weight metrics from the database TRANSFAC. Fifth, to identify variants in DnaseI hypersensitivity sites, we annotated above SNPs through ANNOVAR with a dataset “wgEncodeRegDnaseClusteredV3” from UCSC genome annotation database, and picked the SNPs with the highest annotation scores among the variants in high LD (r2 ≥ 0.8) for next-step genotyping assays (If the variants in high LD got the same highest annotation score, we used HaploReg v4.1 to choose the most possibly functional SNP). The integrated process of bioinformatics analysis was outlined in Fig. 1.
The flowchart of the systematical bioinformatics analysis in this study. CRC, colorectal cancer.
The flowchart of the systematical bioinformatics analysis in this study. CRC, colorectal cancer.
Genotyping
Genotyping in the discovery stage (stage 1) was accomplished through the Sequenom MassARRAY system. Then, we selected SNPs with P < 0.05 for the next-stage case–control study. In the replication stage (stage 2), promising SNPs were tested on TaqMan genotyping platform based on 7900HT Fast Real-Time PCR system. Five percent of subjects were randomly picked for quality control and the concordance rate of replication was 100%.
Cell lines and cell culture
Cancer cell lines used in this study were acquired from China Center for Type Culture Collection. They were verified by short tandem repeat profiling, and ascertained the absent Mycoplasma contamination. DMEM (Gibco) containing 10% FBS (Gibco) and 1% antibiotics (100 U/mL penicillin and 100 μg/mL streptomycin) was used to culture cells in a 37°C cell incubator with 5% CO2.
Dual-luciferase reporter assays
The synthetic 2,001 bp sequences, which were ±1 kb flanking A and G allele of rs11064124 (chr12: 6380507–6382507), were cloned into pGL3-promoter vector (Promega) in the restrictive sites Kpnl and Xhol, and into pGL3-basic vector (Promega) in BamHI and SalI restrictive sites, in either forward (5′ to 3′) or reverse (3′ to 5′) direction. We also inserted the promoters (5′ untranslated region plus 1 kb upstream) of CD9 (chr12: 6308482–6309665) and PLEKHG6 (chr12: 6418602–6419749) into the restrictive sites 5′ XhoI and 3′ HindIII of the above pGL3-basic vector–containing enhancers, respectively. LoVo and HCT116 cells were seeded into 96-well plates (1.25 × 104 cells/well), and were cotransfected with 100 ng constructed plasmids and 1 ng pRL-SV40 vectors (Promega) through Lipofectamine 3000 Reagent (Life Technologies). The luciferase activities of Firefly and Renilla were measured by Dual-Luciferase Reporter Assay System (Promega) 36 hours later. The relative luciferase activity was calculated as the ratio of Firefly to Renilla activity.
Electrophoretic mobility shift assays
Oligonucleotides (25 bp) that were ±5 bp flanking the vitamin D receptor (VDR) binding motif (chr12: 6381507-6381521) were synthesized and labeled with biotin at the 3′ ends (Supplementary Table S3). Experiments were conducted using EMSA/Gel-Shift Kit (Beyotime). Briefly, nuclear protein extracted from cancer cells was incubated with binding buffer at room temperature for 15 minutes. After the mixture with labeled oligonucleotide probes, the samples were electrophoresed on a 8% PAGE gel. As for competitive assays, binding reactions were preincubated with unlabeled probes at 20-fold and 200-fold excess at 25°C for 20 minutes before the mixture with labeled probes. The reaction products were detected using SuperSignal West Pico Chemiluminescent Substrate Kit (Pierce).
RNA interference and quantitative reverse transcription PCR
Small interfering RNAs (siRNA) targeting VDR CD9 and PLEKHG6 and negative control siRNAs (siRNA-control) were synthesized by RiboBio company. LoVo and HCT116 cells were transfected with siRNA-control or specific siRNAs, and then were seeded in 24-well plates (1 × 105 cells per well) for 48 hours. The relative expression levels of VDR CD9 and PLEKHG6 were determined via qRT-PCR with SYBR Green Mix (Takara). The expression level of VDR, CD9 or PLEKHG6 was normalized to ATCB that was considered as an internal control (Supplementary Table S3).
CRISPR–Cas9 genome editing
To target the whole region of the investigated enhancer which long around 3.8 kb, we designed and synthesized a pair of sgRNAs for two cell lines HCT116 and LoVo (Supplementary Table S3) respectively. Cas9 protein and sgRNAs were transfected as ribonucleoprotein (RNP) complexes via electroporation. Then, we selected and validated the target monoclonal cells with homozygous knockout through PCR, TA cloning, and Sanger sequencing.
Chromatin conformation capture assays
The chromatin conformation capture (3C) assays of four cancer cell lines were performed as described previously (29, 30). In brief, cancer cells were treated with 2% formaldehyde for crosslinking, and then were incubated with lysis buffer for nuclei isolation. After digestion with 1,200 U of restriction enzyme NcoI (New England Biolabs) overnight at 37 °C, the reaction sample was ligated by 100U T4 DNA ligase (Thermo Fisher Scientific) at 16°C for 4 hours. The purified 3C libraries were quantitated by real-time PCR as follows: 5 minutes at 95°C followed by 45 cycles of 5 seconds at 95°C, 15 seconds at 60°C, and 10 seconds at 72°C using SYBR Green Mix (Takara). The standard curve for each primer pair was created using NcoI-digested and T4-ligated BAC DNA (RP11-294D5, chr12: 6251000-6447364) encompassing the SE and two promising target genes CD9 and PLEKHG6. Control primers of GAPDH were used for normalization and standard curve generation (primers used in 3C assays were listed in Supplementary Tables S3).
Cell proliferation
To investigate the role of CD9 and PLEKHG6 in cancer cell proliferation, they were knocked down with siRNA duplexes (siCD9-1, siCD9-2 and siPLEKHG6-1, siPLEKHG6-2; RIBOBIO) in LoVo and HCT116 cells (Supplementary Table S3). Cells (1 × 105 cells per well) were seeded in 24-well plates, and subsequently harvested after 24 hours, and further cultured in 96-well plates where each well contained 100 μL cell suspension with 2,500 cells inside. Cell viability was measured using CCK-8 assays (Dojindo) in four time points (24, 48, 72, and 96 hours).
Statistical analysis
In case–control study, the distribution differences in terms of gender, age, smoking status, drinking status, and genotypes were c by t test or χ2 test. The calculation of odds ratios (OR) and 95% confidence intervals (95% CI) were under multivariate logistic regression adjusted by sex, age group, smoking, and drinking status. In functional assays, the differences between different subgroups were assessed by Student t test. All P values in our study were two-sided and the criteria of P < 0.05 was defined as statistically significant.
Results
Selection of candidate SNPs
As shown in Fig. 1, we systematically screened the candidate SNPs in cancer-specific SEs within East-Asian GWAS loci. We identified 2,778 SEs in four cancer cell lines (Union 1) and 2,112 in three colon crypt tissues (Union 2). After the comparison of two unions (Union 1–Union 2), we acquired 1,667 variant SEs and 9,066 constituent enhancers. Among of these, 5 SEs and 31 enhancers inside were mapped to 5 East-Asian GWAS loci (11q13.4, 12p13.31, 15q13.3, 19q13.11, and 20q13.33). 101 common SNPs (CHB MAF>0.05) were found in above regions, and 29 of them were predicted to affect TF binding. After removing the LD redundancy by functional annotation, we selected 17 most promising SNPs (Supplementary Table S4) and tested their genotypes using Sequenom MassARRAY genotyping assays in first-stage case–control study.
Association analysis between the candidate SNPs and risk of colon and rectal adenocarcinoma
After quality control removing rs143618146 (call rate < 95%) and rs2427293 (PHWE < 0.05), 15 SNPs were successfully genotyped in stage I including 757 cases and 755 controls (Supplementary Tables S1 and S5). Three nominal positive SNPs were submitted to an independent population (stage II: 4,172 cases and 6,328 controls) and one SNP rs11064124 was still significantly associated with the risk of colon and rectal adenocarcinoma (Supplementary Table S6). We further tested it in the combined study totally containing 4,929 cases and 7,083 controls, and found a significant association between rs11064124 and risk of colon and rectal adenocarcinoma [additive model: OR (95% CI) = 0.87 (0.82–0.92, P = 8.67E-06; Table 1)].
Association analyses between rs11064124 and risk of colon and rectal adenocarcinoma in the two-stage case–control study.
. | Stage 1 . | Stage 2 . | Combined study . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Genotype . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . |
GG | 485 (64.1) | 445 (58.9) | 1.00 | 2,579 (62.8) | 3,734 (60.0) | 1.00 | 3,064 (63.0) | 4,179 (59.9) | 1.00 | |||
GA | 237 (31.3) | 260 (34.4) | 0.84 (0.68–1.05) | 1.24E-01 | 1,357 (33.1) | 2,124 (34.1) | 0.93 (0.85–1.01) | 8.38E-02 | 1,594 (32.8) | 2,384 (34.1) | 0.92 (0.85–0.99) | 2.66E-02 |
AA | 35 (4.6) | 50 (6.6) | 0.64 (0.41–1.01) | 5.68E-02 | 168 (4.1) | 369 (5.9) | 0.66 (0.55–0.80) | 1.70E-05 | 203 (4.2) | 419 (6.0) | 0.66 (0.56–0.79) | 3.35E-06 |
Dominant model | 0.81 (0.66–1.00) | 4.88E-02 | 0.89 (0.82–0.96) | 4.26E-03 | 0.88 (0.81–0.95) | 6.41E-04 | ||||||
Recessive model | 0.68 (0.44–1.06) | 9.17E-02 | 0.68 (0.56–0.82) | 4.60E-05 | 0.68 (0.58–0.81) | 1.37E-05 | ||||||
Additive model | 0.82 (0.70–0.98) | 2.46E-02 | 0.88 (0.82–0.94) | 1.01E-04 | 0.87 (0.82–0.92) | 8.67E-06 |
. | Stage 1 . | Stage 2 . | Combined study . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Genotype . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . | Case, n (%) . | Control, n (%) . | OR (95% CI) . | P value . |
GG | 485 (64.1) | 445 (58.9) | 1.00 | 2,579 (62.8) | 3,734 (60.0) | 1.00 | 3,064 (63.0) | 4,179 (59.9) | 1.00 | |||
GA | 237 (31.3) | 260 (34.4) | 0.84 (0.68–1.05) | 1.24E-01 | 1,357 (33.1) | 2,124 (34.1) | 0.93 (0.85–1.01) | 8.38E-02 | 1,594 (32.8) | 2,384 (34.1) | 0.92 (0.85–0.99) | 2.66E-02 |
AA | 35 (4.6) | 50 (6.6) | 0.64 (0.41–1.01) | 5.68E-02 | 168 (4.1) | 369 (5.9) | 0.66 (0.55–0.80) | 1.70E-05 | 203 (4.2) | 419 (6.0) | 0.66 (0.56–0.79) | 3.35E-06 |
Dominant model | 0.81 (0.66–1.00) | 4.88E-02 | 0.89 (0.82–0.96) | 4.26E-03 | 0.88 (0.81–0.95) | 6.41E-04 | ||||||
Recessive model | 0.68 (0.44–1.06) | 9.17E-02 | 0.68 (0.56–0.82) | 4.60E-05 | 0.68 (0.58–0.81) | 1.37E-05 | ||||||
Additive model | 0.82 (0.70–0.98) | 2.46E-02 | 0.88 (0.82–0.94) | 1.01E-04 | 0.87 (0.82–0.92) | 8.67E-06 |
Note: All the P values were adjusted by gender, age group, smoking status, and drinking status. The nominal significant results are in bold.
The locus containing rs11064124 showed allele-specific enhancer activity
Dual-luciferase reporter assays were performed to investigate whether rs11064124 exerted an allele-specific impact on the located enhancer's activity. In either LoVo or HCT116, luciferase activities were significantly higher (P < 0.01) in the A allelic subgroups than in the G allelic subgroups in both orientations forward and reverse (Fig. 2A and D). To validate whether the effect of rs11064124 G > A was influenced by the predicted TF VDR, we performed reporter gene assays with the knockdown of VDR by siRNAs. As the VDR expression was knocked down by specific siRNAs, the luciferase activity of plasmids containing rs11064124-G raised up to the similar level of the minor A allele in either cell line (Fig. 2B, C, E, and F). These results collectively revealed that the regulatory element with rs11064124-A possessed a higher enhancer activity.
Allele-specific effects of rs11064124 on transcriptional activity and binding affinity. A and D, Reporter gene assays with four constructs containing rs11064124-G and rs11064124-A in LoVo and HCT116. Reporter gene assays of constructs containing rs11064124-G and rs11064124-A allele transfected with siRNA-VDR in LoVo (B and C) and HCT116 (E and F) cells. Relative luciferase activity was calculated by normalizing the firefly/Renilla luciferase activity of the empty pGL3-promoter vector or the construct containing the rs11064124-G with the transfection of siRNA-control. All experiments were carried out in triplicate, each with three technical replicates. The data are shown as the mean ± SD from three independent experiments, and the differences in luciferase activity were assessed by Student t test (**, P < 0.01). EMSAs with biotin-labeled probes containing either the G or A allele of rs11064124 and nuclear extracts from LoVo (G) or HCT116 (H) cells. The arrow indicates an allele-specific band that interacts with nuclear protein. 20× and 200× represent 20-fold and 200-fold excess amounts of an unlabeled probe over that of the labeled probe. F, forward; R, reverse.
Allele-specific effects of rs11064124 on transcriptional activity and binding affinity. A and D, Reporter gene assays with four constructs containing rs11064124-G and rs11064124-A in LoVo and HCT116. Reporter gene assays of constructs containing rs11064124-G and rs11064124-A allele transfected with siRNA-VDR in LoVo (B and C) and HCT116 (E and F) cells. Relative luciferase activity was calculated by normalizing the firefly/Renilla luciferase activity of the empty pGL3-promoter vector or the construct containing the rs11064124-G with the transfection of siRNA-control. All experiments were carried out in triplicate, each with three technical replicates. The data are shown as the mean ± SD from three independent experiments, and the differences in luciferase activity were assessed by Student t test (**, P < 0.01). EMSAs with biotin-labeled probes containing either the G or A allele of rs11064124 and nuclear extracts from LoVo (G) or HCT116 (H) cells. The arrow indicates an allele-specific band that interacts with nuclear protein. 20× and 200× represent 20-fold and 200-fold excess amounts of an unlabeled probe over that of the labeled probe. F, forward; R, reverse.
rs11064124 showed allele-specific binding affinity
Considering that rs11064124 was predicted to influence TF binding, we performed EMSAs with rs11064124-related oligonucleotides and nuclear protein of LoVo and HCT116, to investigate its effect on TF binding. In Fig. 2G and H, labeled probes of A allele displayed much weaker binding band than those of G allele (lane 2 vs. lane 7). For competition assays, the binding signal of G allele gradually weakened to almost disappear along with adding excess unlabeled probes containing the G allele (lanes 3 and 4). At the same time, equivalent unlabeled probes containing the A allele could not exert analogic effect on the binding affinity of G allele (lane 5). It revealed that rs11064124-A allele got weaker binding affinity with the TF that was assumed to be VDR.
Genome editing suggested CD9 and PLEKHG6 as target genes
To experimentally address the target genes of the identified enhancer where rs11064124 is located, we performed CRISPR-Cas9 deletion of the Enhancer 4, which was one of five constituent enhancers (Enhancer 1, chr12: 6350598-6351151; Enhancer 2, chr12: 6358883-6359235; Enhancer 3, chr12: 6371477-6371745; Enhancer 4, chr12: 6380998-6384846; Enhancer 5, chr12: 6387057-6390696) composing a SE (Fig. 3A). The enhancer 4 was probably interacted with four genes' promoters (CD9, PLEKHG6, TNFRSF1A, and NCAPD2) according to the YY1 HiChIP data of HCT116 (Supplementary Table S7; ref. 31). qRT-PCR was conducted in genome-edited and wild-type cell lines, and the deletion of Enhancer 4 resulted in significant reduction of CD9 and PLEKHG6 expression in both LoVo and HCT116, while TNFRSF1A and NCAPD2 expression were reduced only in LoVo (Fig. 3B–E). This indicated CD9 and PLEKHG6 as the target genes of Enhancer 4 containing rs11064124. The significant coexpression (P < 0.0001) between two genes in TCGA colorectal cancer samples (colon adenocarcinoma and rectal adenocarcinoma) also implied that CD9 and PLEKHG6 were regulated by the same genomic element (Supplementary Fig. S1).
The transcriptions of CD9 and PLEKHG6 were activated by rs11064124 via a promoter–enhancer interaction. A, Schematic of the genomic locations of the SE containing 5 constituent enhancers and 4 nearby genes (CD9, PLEKHG6, TNFRSF1A, and NCAPD2) that were presumably interacted with the rs11064124-locating Enhancer 4. B–E, qPCR analysis of gene expression in wild-type and CRISPR-Cas9–induced genome-editing cells of LoVo and HCT116. F and G, qPCR analysis of the relative expression of CD9 and PLEKHG6 in LoVo and HCT116 transfected with siRNA-VDR. H, Schematic of the constructs used for reporter gene assays. I–L, Reporter gene assays with four constructs containing rs11064124-G and rs11064124-A in LoVo and HCT116. Relative luciferase activity was calculated by normalizing the firefly/Renilla luciferase activity of the empty pGL3-basic. M, 3C analysis of chromatin interactions between rs11064124 locus and two promoters of CD9 and PLEKHG6 in four cell lines with two different genotypes (G/A: HT115 and HCT15; G/G: HCT116 and LoVo). All experiments were carried out in triplicate, each with three technical replicates. The data are shown as the mean ± SD from three independent experiments, and the differences in transcription level were assessed by Student t test (*, P < 0.05; **, P < 0.01).
The transcriptions of CD9 and PLEKHG6 were activated by rs11064124 via a promoter–enhancer interaction. A, Schematic of the genomic locations of the SE containing 5 constituent enhancers and 4 nearby genes (CD9, PLEKHG6, TNFRSF1A, and NCAPD2) that were presumably interacted with the rs11064124-locating Enhancer 4. B–E, qPCR analysis of gene expression in wild-type and CRISPR-Cas9–induced genome-editing cells of LoVo and HCT116. F and G, qPCR analysis of the relative expression of CD9 and PLEKHG6 in LoVo and HCT116 transfected with siRNA-VDR. H, Schematic of the constructs used for reporter gene assays. I–L, Reporter gene assays with four constructs containing rs11064124-G and rs11064124-A in LoVo and HCT116. Relative luciferase activity was calculated by normalizing the firefly/Renilla luciferase activity of the empty pGL3-basic. M, 3C analysis of chromatin interactions between rs11064124 locus and two promoters of CD9 and PLEKHG6 in four cell lines with two different genotypes (G/A: HT115 and HCT15; G/G: HCT116 and LoVo). All experiments were carried out in triplicate, each with three technical replicates. The data are shown as the mean ± SD from three independent experiments, and the differences in transcription level were assessed by Student t test (*, P < 0.05; **, P < 0.01).
We sought to investigate whether the supposed TF VDR affected the expression levels of CD9 and PLEKHG6 via RNA interference–mediated knockdown assays in LoVo and HCT116. These results revealed that the transcript levels of CD9 and PLEKHG6 were both significantly upregulated upon knockdown of VDR. It suggested that CD9 and PLEKHG6 were transcriptionally regulated by VDR (Fig. 3F and G).
rs11064124 regulated CD9 and PLEKHG6 through long-distance interactions
We constructed target gene's promoter (CD9 or PLEKHG6) and the core enhancer 4 into an appropriate vector together (Fig. 3H). Consistent with the outcomes of previous reporter gene assays, the enhancer element with rs11064124-A allele was able to activate two promoters. Specially, it performed as an active enhancer either in forward orientation for CD9 or in reverse orientation for PLEKHG6 (Fig. 3I–L). Likewise, we knocked down the TF VDR to further check its role in these interactions. The repressive luciferase activity of G-allele plasmids elevated to the nearly high degree of the mutant A-allele along with the loss of VDR expression (Supplementary Fig. S2).
We continued to perform 3C assays to confirm the interactions based on physical contacts among these regulatory elements. As showed in Fig. 3,M, the Enhancer 4 simultaneously interacted with CD9 promoter, PLEKHG6 promoter, and Enhancer 5 that was another large enhancer within the same SE. Moreover, higher crosslinking frequencies of the positive interactions were observed in HT115 and HCT115 cells carrying the rs11064124 [G/A] genotype, compared with HCT116 and LoVo carrying the rs11064124 [G/G]. This supported allele-specific long-range chromatin interactions between the SE and two target genes. It suggested that the allele rs11064124-A could coactivate CD9 74kb upstream and PLEKHG6 37kb downstream.
Still, expression quantitative trait loci (eQTL) analyses based on GTEx database (32) did not indicate the functional SNP rs11064124 as a significant eQTL for CD9 or PLEKHG6 (Supplementary Fig. S3).
The potential roles of CD9 and PLEKHG6 as tumor suppressor genes for colon and rectal adenocarcinoma
To investigate the roles of two candidate genes on colon and rectal adenocarcinoma, we conducted cell proliferation assays in two adenocarcinoma cell lines by silencing the genes with siRNAs. The results showed that the knockdown of either CD9 or PLEKHG6 could markedly promote cell proliferation in LoVo and HCT116 (Fig. 4A–D). It indicated that CD9 and PLEKHG6 might be the susceptibility genes in 12p13.31 functioning as tumor suppressors in carcinogenesis of colon and rectal adenocarcinoma.
Knockdown of CD9 or PLEKHG6 promotes colon and rectal adenocarcinoma cell proliferation and graphical representation of the results. A and B, The cell proliferation assays in LoVo with efficient knockdown of CD9 and PLEKHG6. C and D, The cell proliferation assays in HCT116 with efficient knockdown of CD9 and PLEKHG6. E, Situating in the cancer-specific SE at an intergenic region in 12p13.31, the protective rs11064124-A allele weakens the binding activity with VDR and increases interactions between the enhancer and two target genes' promoters, thus activates the transcription of CD9 and PLEKHG6, which were both potential tumor suppressor genes for colon and rectal adenocarcinoma, resulting in reduced risk of colon and rectal adenocarcinoma in the Chinese population. All experiments were performed in triplicate, each with at least three technical replicates. The data are shown as mean ± SEM for cell proliferation assays and mean ± SD for qPCR analysis from three independent experiments (**, P < 0.01). CRC, colorectal cancer.
Knockdown of CD9 or PLEKHG6 promotes colon and rectal adenocarcinoma cell proliferation and graphical representation of the results. A and B, The cell proliferation assays in LoVo with efficient knockdown of CD9 and PLEKHG6. C and D, The cell proliferation assays in HCT116 with efficient knockdown of CD9 and PLEKHG6. E, Situating in the cancer-specific SE at an intergenic region in 12p13.31, the protective rs11064124-A allele weakens the binding activity with VDR and increases interactions between the enhancer and two target genes' promoters, thus activates the transcription of CD9 and PLEKHG6, which were both potential tumor suppressor genes for colon and rectal adenocarcinoma, resulting in reduced risk of colon and rectal adenocarcinoma in the Chinese population. All experiments were performed in triplicate, each with at least three technical replicates. The data are shown as mean ± SEM for cell proliferation assays and mean ± SD for qPCR analysis from three independent experiments (**, P < 0.01). CRC, colorectal cancer.
Discussion
Through a systematic strategy integrating bioinformatics analyses, association studies, and molecular biological experiments, we identified a susceptibility SNP rs11064124 at 12p13.31 for colon and rectal adenocarcinoma in Chinese population. Situating in the cancer-specific SE at a genomic interval, the protective rs11064124-A allele weakened the binding activity with VDR, and increased interactions between the enhancer and two target genes' promoters, thus coactivating the transcription of CD9 and PLEKHG6 that were potential tumor suppressors for colon and rectal adenocarcinoma (Fig. 4E).
The GWAS locus 12p13.31 was initially identified in a large-scale genetic study of colorectal cancer in East Asian, but was lack of definite susceptibility genes mostly due to the tagSNP rs10849432 residing in a gene desert. We located a functional variant rs11064124 in moderate LD with rs10849432 (r2 = 0.73 in East Asian) in the major component enhancer of a cancer-specific SE. Using normal colonic crypt epithelium as a comparator, two successive studies on epigenomic profiling mapped genome-wide variant enhancer loci (VEL) with differential enhancer activity in 31 colorectal cancer cell lines (33, 34). The SE we mapped on this locus was exactly positioned in one of the highly recurrently gained VEL (chr12: 6380060–6414189, recurrence: 28) that were supposed to drive an aberrant transcriptional program critical for colorectal cancer growth and survival (34).
At 12p13.31, CD9 and PLEKHG6 were confirmed to be the two target genes of our investigated enhancer through CRISPR-Cas9 genomic editing and 3C assays on physical contacts. CD9 was primarily known as one of the abundant proteins and unique markers in exosomes (35), while lots of studies reported exosomes' important role in cancer invasion and metastasis, apoptosis evasion, drug resistance, and immune escape (36). On the other hand, CD9 could suppress motility and promote adherence, resulting to tumor suppression. Furthermore, CD9 was usually downregulated in advanced cancer, and its absent expression was related to poor prognosis of different malignancies including colon cancer (37, 38). More specifically, CD9/CD82 interacted with the glycoprotein 90K, which presented antitumor activity in colorectal cancer cells by suppressing Wnt signaling (39). As for PLEKHG6, which was also named MyoGEF being a myosin-interacting guanine nucleotide exchange factor, a series of studies indicated its function in cytokinesis regulation of (40–43) and breast cancer cell invasion (44, 45). Still, the explicit mechanism of PLEKHG6 involvement in colon and rectal adenocarcinoma was warranted to be clarified.
The implicated TF, VDR, was well established as a regulator for gene transcription through binding with the biologically active form of vitamin D 1,25-dihydroxvyvitamin D3 (1,25(OH)2D3) (46). Unbiased ChIP-seq analyses demonstrated that VDR binding were more commonly occurring in clusters within enhancers distal to genes' transcriptional start sites (47–49), but close to those genes involved in immunes response (50), such as CD9 in this study. As a presumed repression factor in our study, VDR and other corepressors might displace activated TFs and recruit deacetylases, thus repressing the target genes' expression. It was similar to the suppression of IL17A by 1,25(OH)2D3 where VDR blocked nuclear factor of activated T cells from binding to its sites (51). Therefore, VDR was expected to act as a sequence-specific enhancer regulator for gene control.
Other than the enhancer near MYC at 8q24 (52–54), our study reported another SE element and its internal variation in the locus 12p13.31 associated with risk of colon and rectal adenocarcinoma. We verified a credible biomarker rs11064124 through a large-sample association study and well-organized functional assays. We further pointed out two susceptibility genes CD9 and PLEKHG6 that were simultaneously interacting with the cancer-specific SE at 12p13.31. These findings provided a practical example that more than one gene contributed to disease in a single GWAS locus, and their transcription and expression could be controlled by the same regulatory element via chromatin loop (55).
Despite the aforementioned strengths, several limitations should be acknowledged. First of all, additional experiments like cell invasion, cell migration, and xenograft tumor growth assays were needed to confirm the biological effect of CD9, PLEKHG6, and even their possible interplay in colon and rectal adenocarcinoma. Second, lacking outcomes from eQTL analyses on our own tissue samples was largely due to the ongoing collection of colon and rectal adenocarcinoma tissues and adjacent normal tissue specimens. Third, insufficient clinical data of the subjects prevented us from a deeper analysis on the prognostic implications of the SNP and two target genes.
In summary, we highlighted a SE polymorphism rs11064124 and two susceptibility genes CD9 and PLEKHG6 in 12p13.31 for colon and rectal adenocarcinoma. This study suggested a novel insight for genetic pathogenesis of colon and rectal adenocarcinoma, involving susceptibility genes' transcriptional coactivation via the SE and related variation. The integrative research strategy should contribute to a more accurate picture of the biological mechanisms underlying malignancies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J. Ke, X. Miao
Development of methodology: J. Ke
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Ke, J. Tian, S. Mei, P. Ying, N. Yang, X. Wang, D. Zou, X. Peng, Y. Yang, Y. Zhu, Y. Gong, Z. Wang, J. Gong, J. Chang, X. Miao
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Ke, J. Tian, S. Mei, P. Ying, X. Miao
Writing, review, and/or revision of the manuscript: J. Ke, X. Miao
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Ke, J. Tian, Z. Wang, J. Gong, J. Chang, X. Miao
Study supervision: J. Ke, X. Miao
Acknowledgments
This work was supported by National Natural Science Foundation of China (NSFC-81703302), China Postdoctoral Science Foundation funded project (2017M612470), and Special Financial Grant from the China Postdoctoral Science Foundation (2018T110776; to J. Ke), National Natural Science Foundation of China (NSFC-8150287; to J. Chang), National Natural Science Foundation of China (NSFC-81673256), National High-Tech Research and Development Program of China 2014AA020609, National Key Research and Development Plan Program (2016YFC1302702), National Science Fund for Distinguished Young Scholars of China (NSFC-81925032), and Program for HUST Academic Frontier Youth Team (to X. Miao).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.