Genome-wide association studies (GWAS) have identified more than a hundred single nucleotide variants (SNV) associated with the risk of gastroesophageal cancer (GEC). The majority of the identified SNVs map to noncoding regions of the genome. Uncovering the causal SNVs and genes they modulate could help improve GEC prevention and treatment. Herein, we used HiChIP against histone 3 lysine 27 acetylation (H3K27ac) to simultaneously annotate active promoters and enhancers, identify the interactions between them, and detect nucleosome-free regions (NFR) harboring potential causal SNVs in a single assay. The application of H3K27ac HiChIP in GEC relevant models identified 61 potential functional SNVs that reside in NFRs and interact with 49 genes at 17 loci. The approach led to a 67% reduction in the number of SNVs in linkage disequilibrium at these 17 loci, and at 7 loci, a single putative causal SNV was identified. One SNV, rs147518036, located within the promoter of the UDP-glucuronate decarboxylase 1 (UXS1) gene, seemed to underlie the GEC risk association captured by the rs75460256 index SNV. The rs147518036 SNV creates a GABPA DNA recognition motif, resulting in increased promoter activity, and CRISPR‐mediated inhibition of the UXS1 promoter reduced the viability of the GEC cells. These findings provide a framework that simplifies the identification of potentially functional regulatory SNVs and target genes underlying risk-associated loci. In addition, the study implicates increased expression of the enzyme UXS1 and activation of its metabolic pathway as a predisposition to gastric cancer, which highlights potential therapeutic avenues to treat this disease.

Significance: Epigenomic footprinting using a histone posttranslational modification targeted 3D genomics methodology elucidates functional noncoding sequence variants and their target genes at cancer risk loci.

Genome-wide association studies (GWAS) have uncovered thousands of disease-associated risk loci. However, most of the associated sequence variants, including variants in linkage disequilibrium (LD), are found in noncoding regions of the genome (1). Previous studies suggest that noncoding risk-associated SNVs modulate gene expression through the alteration of transcription factor DNA recognition sites (2, 3). Noncoding cis-regulatory elements (CRE) and the genes they regulate, their target genes, remain poorly annotated in disease relevant tissues. As a result, the identification of the underlying causal variants and target genes at risk loci continues to be a significant challenge. Several approaches have been employed to identify the causal SNV(s) and target gene(s) underlying susceptibility loci (2, 46). Expression quantitative trait loci (eQTL)–based analyses can assist in the identification of the target gene(s) (6), but each risk locus harbors multiple equally plausible SNVs in LD with each other (ldSNV). Therefore, the causal sequence variant(s) often remain elusive without additional experimental information. Epigenomic profiling of posttranslational histone modifications can functionally annotate noncoding regions of the genome and aid the identification of potentially causal risk variants (4). However, the target gene(s) of the annotated CREs are not obvious, and additional data must be leveraged to predict them (4). Long-range chromatin interactions connect distal enhancer(s) with the promoter of their target gene(s), and profiling these interactions can identify the target genes of distal CREs (7). Targeted approaches that capture specific chromatin interactions, such as in situ Hi-C library preparation followed by chromatin immunoprecipitation (HiChIP; ref. 8) and PLAC-Seq (9) against posttranslational histone modifications, can simultaneously annotate CREs and identify the connections between them, facilitating the identification of potentially causal SNVs and target genes in a single assay. In addition, the advantage of using H3K27ac HiChIP to uncover the biology underlying risk loci has been recently demonstrated (10, 11).

Gastroesophageal cancer (GEC) is a leading cause of death worldwide and is influenced by environmental and genetic factors (12, 13). GWAS identified 131 single nucleotide variants (SNV) associated with GEC among populations of European and East Asian ancestry (14). Cancers of the esophagus have two main histologic subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC). Molecular profiling of ESCC reveals that it resembles squamous cell carcinomas from other tissues more closely than EAC (15). However, GWAS often report SNVs associated with esophageal cancer without specifying either ESCC or EAC. ESCC has the highest incidence in east Asian countries, whereas EAC is the more dominant subtype among western countries (12, 16). Therefore, depending on the patient population assessed in GWAS, GEC may predominantly include ESCC, EAC, or both. Of note, at the molecular level, EAC most closely resembles chromosomally unstable gastric cancer and can be considered along with gastric adenocarcinoma as single disease entity (15), known as gastroesophageal adenocarcinoma (GEA), supporting the practice of combining esophageal and stomach cancer into a single phenotype in GWAS studies. Therefore, we comprehensively evaluate the GWAS identified–GEC risk loci using H3K27ac HiChIP in three cell lines representing EAC, ESCC, and GAC to identify the underlying functional sequence variants and their target genes. We confirm several reported functional variants and identify a novel cancer dependency gene, UDP-glucuronate decarboxylase (UXS1), and variation within its promoter that is responsible for a GEA susceptibility locus.

Selection of candidate loci and calculation of LD

We downloaded all reported GWAS index SNVs from NHGRI-EBI GWAS Catalog (14). We extracted all index SNVs associated with gastric and esophageal cancer (GEC) by performing a fuzzy search on the reported trait, which returned the following keywords: “Gastric cancer,” “Esophageal cancer,” “Esophageal cancer and gastric cancer,” “Esophageal cancer (squamous cell),” “Esophageal cancer (alcohol interaction),” and “Esophageal or stomach cancer.” We conducted an individual assessment of each related study and subsequently assigned new umbrella traits to the identified index SNVs: “gastric cancer,” “esophageal cancer,” “gastric/esophageal cancer,” and “esophageal squamous cell carcinoma.” Further, we filtered the index SNVs to include only SNVs in which the reported association passed genome-wide significance (P < 5 × 10−8; Supplementary Table S1).

We downloaded the genotype data from the 1,000 genomes project (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/; ref. 17). LD was calculated using PLINK (18) within the founders of the continental European (CEU, TSI, FIN, GBR and IBS; n = 503) and east Asian (CHB, JPT, CHS, CDX and KHV; n = 504) groups. The SNVs in LD with each GWAS index SNVs (r2 ≥ 0.8) are presented in Supplementary Table S2.

DepMap CRISPR screen analyses

We analyzed the publicly available CRISPR screen results from the Dependency Map (DepMap) project (https://depmap.org/portal/; ref. 19). The likelihood ratio test (LRT) scores were calculated as described in Project DRIVE (20). Briefly, the distribution of the dependency scores for a gene across all cell lines was fitted to a skewed t-distribution and a normal distribution using the fitdist function from the fitdistrplus package in R. The skewed t-distribution was obtained from the doppelgangR R package. The LRT score was calculated for each gene individually. The LRT score is defined by Project DRIVE as two times the log of the likelihood of the fit of the skewed t-distribution over the likelihood of the fit of the normal distribution. We omitted genes where we failed to fit their dependency scores to either distribution. We determined the inflection point of the ranked LRT score curve using the extremum distance estimator from the inflection package in R. We calculated the enrichment of UXS1 dependency scores within gastric cancer cell lines versus all others cell lines combined with a Fisher exact test.

Identification of chromatin interactions and NFR

HiChIP was performed based on the previously published protocol (8) with minor modifications (21). Briefly, chromatin from 5 million cells was crosslinked with 1% formaldehyde and digested with the MboI restriction enzyme. First, the ends of the digested DNA were filled with dCTP, dGTP, dTTP, and biotin-labeled dATP and then ligated with T4 DNA ligase. Ten cycles of sonication (QSonica 800, 50% amplitude, 30 seconds ON and 30 seconds OFF) were applied to fragmentize the ligated DNA. For the ChIP capture, antibodies of H3K27ac (Abcam, ab4729, rabbit polyclonal, 7.5 µg/HiChIP) were used to pull down the enhancer and promoter elements. The streptavidin C1 magnetic beads were applied to pull down the DNA fragments that were filled with the biotin-labeled dATP. Sequencing libraries were generated with the Illumina Tagment DNA Enzyme and Buffer kit, followed by PCR amplifications using NEB Q5 master mix. The libraries were sequenced with Illumina NextSeq or NovaSeq.

The HiChIP sequencing reads were aligned to the human genome, hg19, using the HiC-Pro pipeline (22). Then, we used the Hichipper pipeline (23) to identify significant chromatin interactions defined as interactions with more than three paired-end tag (PET) and a false discovery rate (FDR) <0.05. To identify nucleosome free regions (NFR), we used HisTrader (bioRxiv 2020.03.12.989228). The enrichment signal files, bedGraphs, from the HiChIP alignments were produced using MACS2 (24), with an extension size of 147 base pairs. The significant interacting anchors identified by Hichipper were used as peaks for HisTrader (bioRxiv 2020.03.12.989228). HisTrader defaults were used, except we filtered the enrichment profile to exclude regions below 10% of the maximum signal in the anchors by setting—pMax 0.1. Intersections between the HiChIP interactions and the SNVs were accomplished using the bespoke scripts.

ChIPmentation of patient tissue samples

The cellularity of the GEA tumors obtained from the McGill University Health Centre patients was verified by a pathologist to be >50%. All patients gave written informed consent for the use of their tissue samples. The study was conducted in accordance with recognized ethical guidelines and approved by McGill University Health Centre research ethics board.

Patient tumor samples and their matched reference tissue were cut in sections of 10 µm and mounted on histologic slides. The sections were then fixed with 1% formaldehyde and quenched with glycine (0.125 mol/L final). The crosslinked samples were scrapped off the slide and transferred to 1.5 mL tubes. Homogenization of the samples was carried out manually using a disposable plastic pestle, and the resulting cell suspension was verified using a brightfield microscope. Cell pellets were lysed in 300 μL cells lysis buffer [5 mmol/L PIPES-pH 8.5, 85 mmol/L KCl, 1% (v/v) IGEPAL CA630, 50 mmol/L NaF, 1 mmol/L PMSF, 1 mmol/L phenylarsine oxide, 5 mmol/L sodium orthovanadate] followed by nuclear lysis buffer [50 mmol/L Tris-HCl pH 8.0, 10 mmol/L EDTA, 1% (w/v) SDS, 50 mmol/L NaF, 1 mmol/L PMSF, 1 mmol/L phenylarsine oxide, 5 mmol/L sodium orthovanadate]. After lysis, chromatin was sheared for 15 minutes (10 seconds ON and 20 seconds OFF) using Diagenode Bioruptor Pico (B01060010). Sheared chromatin was precleared using ThermoFisher Scientific’s Protein A dynabeads (10001D) for 1 hour at 4°C with rotation, and it was magnetically immunoprecipitated using Diagenode’s H3K27ac ChIP-seq grade antibody (C15410174) and tagmented through the Diagenode Auto ChIPmentation Kit for Histones (C01011009) on the Diagenode IP-Star Compact Automated System (SX8G), according to the manufacturer’s protocol. Stripping, end-repair, decrosslinking and library amplification were performed according to the manufacturer’s instructions. Size selection was done manually in a two-step procedure using Beckman Coulter’s Agencourt AMPure XP Beads (A63880). Library preparation was completed using Agilent’s SureSelectXT Reagent Kit (G9611A), and samples were sequenced using an llumina NovaSeq at the McGill University Genome Centre. ChIPmentation reads were aligned to the human genome, hg19, using BWA (25). Significant enriched peak regions were identified using MACS2 (24) for each sample using the default settings and the following additional flags: -B, –SPMR, –BAMPE, –broad, –broad-cutoff 0.1, −q 0.01. To compare the HiChIP and ChIPmentation enrichment profiles, we used deepTools (26).

IGR and motif analyses

We in silico tested the impact of SNVs on transcription factor binding affinity using a modified version of the previously described integrated genomic replicate (IGR) analysis (27). Briefly, we compared the ChIP-seq signal intensity across all open chromatin regions containing DNA sequences eight nucleotide in length (8-mers) that match the reference allele and its surrounding DNA sequence against the signal intensity at open chromatin regions containing 8-mers, differing only by the variant allele of each SNV. All 8-mers from a sliding window surrounding each SNV are considered. In our modified version, we compared the reference to the variant 8-mer at each position in the sliding window and used negative binomial regression to assess the difference in binding intensity. We used the maximum signal intensity within the open chromatin region and adjusted the position of the 8-mer within and the total length of open region. The results of the most significant 8-mer in the sliding window are reported. The encyclopedia of DNA elements (ENCODE) called DNaseI hypersensitivity sites (DHS) ENCFF274YGF for K562 was used for the open chromatin regions. The complete list of transcription factor ChIP-seq enrichment files used in the analysis is available in Supplementary Table S3. ChIP-seq files were matched by cell line. For the motif analyses, we used FIMO from the MEME suite to identify transcription factor motifs (28).

Luciferase promoter activity assay

The SV40 driven pGL4.23-luc2 plasmid was purchased from Promega (E8411) and propagated and extracted from NEB stable competent E. coli (C3040H) using Geneaid Presto Mini Plasmid kit (PDH100). The plasmid was then digested using the NEB HindIII-HF (R3104F) and NEB NcoI-HF (R3193S) enzymes, and the resulting 4,209 bp band was extracted from agarose gel using the Macherey–Nagel NucleoSpin Gel and PCR Clean-up kit (740609.250). Wild-type UXS1 promoter was amplified from genomic DNA using Takara Bio CloneAmp HiFi PCR premix (639298) and DNA overhangs complementary to the ends of the digested pGL4.3 plasmid added with PCR primers (Supplementary Table S4). The fragment was then cloned into the digested vector using Takara Bio In-Fusion seamless cloning system (638948), following the manufacturer’s protocol. The UXS1 promoter variant dbSNP rs147518036 was then generated by performing site directed mutagenesis on the UXS1-pGL4.23 plasmid carrying the wild-type UXS1-promoter using Agilent QuikChange II XL Site Directed Mutagenesis Kit (200521), following the manufacturer’s protocol. We verified the reference and variant sequence using Sanger sequencing. All primers used can be found in Supplementary Table S4.

Transfection of the UXS1-promoter wild-type and mutant plasmids were carried out using Roche X-tremeGENE HP DNA transfection reagent (XTGHP-RO). Briefly, cells were seeded at a density of 40,000 cells per well in a 24 well plate. Twenty-four hours post seeding, cells were cotransfected using 0.55 µL of X-treameGENE HP DNA transfection reagent with 0.05 µg of Promega SV40 driven control Renilla luciferase plasmid (E6911) and 0.5 µg of the experimental promoter Firefly luciferase plasmids in 55 µL of Gibco Opti-MEM reduced serum media (31935070). Predesigned Ambion Silencer Select nontargeting and GABPA siRNA (4392420) were used for the knockdown experiments. Cells were seeded at a density of 50,000 cells per well in a 24-well plate. Twenty-four hours post seeding, cells were transfected with either 5 pmol of siRNA targeting GABPA, or a nontargeting negative control, and 1.5 µL of Invitrogen Lipofectamine RNAiMAX transfection reagent (13778150) in 50 µL of Gibco Opti-MEM reduced serum media. Fourty eight hours post transfection, the cells were transfected with the luciferase reporter assay plasmids, as described above. Twenty four hours post transfection, cells were assayed for luciferase activity using Promega Dual-Glo Luciferase Assay System (E2920) as per the manufacturer’s protocol. Luciferase activity was measured using the Tecan i2000 spectrometer with an integration time of 10,000 milliseconds.

CRISPRi of UXS1 promoter

Lentivirus was generated for lenti-dCas9-KRAB-MeCP2 (29) using standard protocols. Briefly, HEK293T cells were grown in a 15-cm dish and allowed to reach a confluency of ∼60% to 70%, after which, they were cotransfected with PAX2, VSV-G, and lenti-dCas9-KRAB-MeCP2. Cotransfection was performed using 6 µg of each plasmid with 54 µL of Roche X-treamGENE HP DNA transfection reagent (#XTGHP-RO) diluted in 1.5 mL of Gibco Opti-MEM reduced serum media (31935070), which was subsequently added to the plate and placed in the incubator overnight at 37°C. Following incubation, media was removed, and 12 mL of fresh antibiotic free media was added. Seventy-two hours post media change, the presence of lentivirus was tested using the Takara Lenti-X GoStix Plus (631280). If lentivirus was present, media was collected, filtered through a 45-µm filter and then combined with one-third volume of Takara Lenti-X Concentrator (631232) and incubated at 4°C for 1 hour. Following incubation, the lentiviral solution was centrifuged at 1,500 × g for 45 minutes. Supernatant was discarded, and the pellet containing the lentivirus was resuspended in 200 µL of DMEM and aliquoted in single use volumes.

We transduced the gastric cell lines AGS, MKN45, and MKN74 with lenti-dCas9-KRAB-MeCP2 to generate cell lines stably expressing dCas9-KRAB-MeCP2. Briefly, 25,000 cells were seeded in 12 well plates with 500 µL of media and incubated at 37°C for 18 hours to allow cells to adhere, and then, 20 µL of lentivirus was added to each well and incubated for 24 hours at 37°C. Media was refreshed, and Sigma blasticidine S hydrochloride (15205-25MG) was added at a concentration of 10 µg/mL to select positive cells.

To repress the UXS1 promoter, single-guide RNAs (sgRNA) were designed using CRISPick (URL: https://portals.broadinstitute.org/gppx/crispick/public) and cloned into the lentiGuide vector (addgene: 52963). Lentivirus of lentiGuide carrying sgRNAs targeting the UXS1 promoter, or nontargeting sgRNAs, was generated, and dCas9-KRAB-MeCP2 cells were transduced following the abovementioned protocol. Positive clones were selected using puromycin at a concentration of 7 µg/mL.

For the rescue experiment, we cloned UXS1 complementary DNA into a doxycycline-inducible expression vector, pCW57.1-Puro (Addgene:41393), and infected dCas9-KRAB-MeCP2 AGS cells with pCW57.1-UXS1-Puro. To avoid overlap between our selection markers, we used LentiGuide-Zeo to express the sgRNA. Cells were selected with zeocin (1.2 µg/mL) for 5 days, and the same number of cells was seeded with or without 1 µg/mL doxycycline. The cell culture media were changed with fresh doxycycline every 48 hours. We verified the inducible expression of UXS1 and knockdown using RT-qPCR.

RNA extraction, RT-qPCR, and ddPCR

RNA was extracted using Qiagen RNAeasy kit (74004). After the genomic DNA depletion, 800 ng RNA was reverse transcribed to cDNA using Invitrogen SuperScript IV Reverse Transcriptase, following the manufacturer’s instructions (18090010). The resulting cDNA was diluted 8.5-fold, and 1 µL (∼5 ng) was used per ddPCR and RT-qPCR reaction. For ddPCR, absolute quantification using Bio-Rad QX200 EvaGreen ddPCR Supermix (1864034) was performed as per the standard protocol and read using QX200 Droplet Reader (1864003). For RT-qPCR, relative quantification was performed using SsoAdvanced Universal SYBR Green Supermix (1725271) on a CFX Opus 96 Real-Time PCR Instrument (12011319). The signal of RT-qPCR was normalized to the internal reference gene ACTB. All experiments were performed in triplicate. All primers are listed in Supplementary Table S4.

Western blots

Protein extracts from AGS cells were separated by SDS-PAGE gel and then transferred to PVDF membranes. Testing the expression of the UXS1 was performed by incubating PVDF membranes with anti-UXS1 (1:1,000, Invitrogen PA5-31629) and antivinculin antibodies diluted in 5% skim milk (1:1,000, Bio-Rad MCA465GA) at 4°C overnight. The next day, membranes were incubated with HRP-conjugated antirabbit IgG (1:2,000, Abcam ab6721) and antimouse IgG (1:5,000, CST 7076P2) secondary antibodies for 1 hour at room temperature. Immunoreactive material was revealed using Clarity Western ECL Substrate (Bio-Rad 170-5060) and ChemiDoc MP imaging system (Bio-Rad).

Proliferation assay

Stable dCas9-KRAB-MeCP2 expressing cells (AGS, MKN45, and MKN74) infected with UXS1 promoter-targeting sgRNAs, or nontargeting sgRNAs, were selected with puromycin for 4 days. For the AGS dCas9-KRAB-MeCP2, 2,000 cells were seeded in 100 µL of media per well in a 96-well plate in triplicate at four timepoints: 24, 48, 72, and 96 hours. Every 24 hours, 100 µL of Abcam Colorimetric MTS assay reagent (ab197010) was added to the appropriate wells and incubated at 37°C for 1 hour. Following incubation, wells were agitated for 20 seconds, and absorbance was measured at 490 nm wavelength using the Tecan i2000 spectrometer. Because a significant effect was observed for the AGS CRISPRi cells following 72 hours, we measured the cell viability for the other cell lines at this time point.

Cell lines

AGS and SNU5 cell lines were purchased from ATCC. MKN74 and MKN1 cell lines were obtained from the JCRB Cell Bank through their distributor Sekisui Xenotech, and MKN45 cell line was gift from Dr. Morag Park. KYSE140 and OE33 were obtained from the Cancer Cell line Encyclopedia project. All cell lines were verified by short tandem repeat profiling (Promega Geneprint 10) by Génome Québec or 24 loci by the DNA Sequencing Core at the University of Utah. All cells were tested for Mycoplasma using TransDetect PCR Mycoplasma Detection Kit (TransGen Biotech FM311-02).

Data availability

Genotype Tissue Expression (GTEx) data were accessed using the GTEx API V1 (https://gtexportal.org/home/apiPage). The Cancer Genome Atlas (TCGA) data were accessed using TCGAbiolinks (30). Significant changes in gene expression were assessed using Wilcoxon rank sum test.

All data generated for the current study have been made available on the NCBI gene expression omnibus under the accession GSE235431. All other raw data are available upon request from the corresponding author.

To identify genetic variants and genes underlying genetic predisposition to the development of GEC, we performed a systematic analysis of the previously reported GEC risk-associated index SNVs identified by GWAS. We selected 68 index SNVs from 40 loci reported in the NHGRI-EBI GWAS Catalog (14) with a genome-wide significant (P < 5 × 10−8) association with a GEC phenotype (Materials and Methods; Fig. 1A; Supplementary Table S1). Many of the GEC risk-associated loci lack well-known cancer genes. Only 12 (30%, 12/40) of the loci harbor a gene annotated in the cancer gene consensus (31) within 250,000 base pairs (250 kb), upstream or downstream, of the index SNV. In addition, certain associations may represent indirect environmental or behavioral interactions. For example, two of the associated loci, represented by four reported index SNVs (rs671, rs11066015, rs4646776, and rs1229984; refs. 3235), contain genes involved in alcohol metabolism, aldehyde dehydrogenase 2 (ALDH2), and aldehyde dehydrogenase 1B (ALDH1B). The consumption of alcohol is a risk factor for the development of GEC, specifically ESCC, and an interaction between alcohol consumption and individuals with defective ALDH2 or ALDH1B alleles has been reported (36).

Figure 1.

GEC risk loci are enriched with cancer dependency genes. A, The genomic location of index SNVs (outside) associated with GEC displayed as a circos plot. Index SNVs within 100 kb of each other were collapsed into a single locus and labeled as genomic coordinates (inside). B, LRT scores of genes within a 250 kb window of the GEC risk-associated index SNVs. LRT threshold for cancer dependency genes was determined by the inflection point of the ranked LRT scores (inset). C, Description of the random sampling method. The number of potential cancer dependency genes at GEC risk loci, within a ±250 kb window, were compared with a null distribution of randomly selected index SNVs from the entire GWAS catalog (n = 1,000 permutations). Fraction of loci with a cancer dependency gene with at least one selectively cancer dependency gene, randomly selected GWAS SNVs (blue), and the GEC risk loci (orange). GEC risk loci was shown to be significantly enriched for selective cancer gene dependencies (P < 0.05).

Figure 1.

GEC risk loci are enriched with cancer dependency genes. A, The genomic location of index SNVs (outside) associated with GEC displayed as a circos plot. Index SNVs within 100 kb of each other were collapsed into a single locus and labeled as genomic coordinates (inside). B, LRT scores of genes within a 250 kb window of the GEC risk-associated index SNVs. LRT threshold for cancer dependency genes was determined by the inflection point of the ranked LRT scores (inset). C, Description of the random sampling method. The number of potential cancer dependency genes at GEC risk loci, within a ±250 kb window, were compared with a null distribution of randomly selected index SNVs from the entire GWAS catalog (n = 1,000 permutations). Fraction of loci with a cancer dependency gene with at least one selectively cancer dependency gene, randomly selected GWAS SNVs (blue), and the GEC risk loci (orange). GEC risk loci was shown to be significantly enriched for selective cancer gene dependencies (P < 0.05).

Close modal

Gastroesophageal risk–associated loci harbor selective cancer gene dependencies

To determine whether the identified risk loci represent direct effects on novel cancer-related genes, as opposed to environmental or behavioral interactions, we examined publicly available genome-wide CRISPR-mediated gene knockout screen data from the Dependency Map (DepMap) Project (https://depmap.org; ref. 19). The DepMap screens evaluated the effect of gene knockouts on proliferation in 1,095 cell lines. Using the inflection point of the curve of ranked LRT scores (20), we identified a threshold LRT score (LRT > 171) to conservatively label selective cancer dependency genes (Fig. 1B; Supplementary Table S5). This analysis excluded housekeeping genes that are essential to most of the assessed cell lines. We compared the number of selective cancer dependency genes at GEC risk loci, within a ±250 kb window from the index SNV to a null distribution of randomly selected index SNVs from the entire GWAS catalog (n = 1,000 permutations). We find that the GEC risk loci are significantly enriched for selective cancer gene dependencies (P < 0.05; Fig. 1C), suggesting that many of these loci harbor potentially novel susceptibility genes with a direct effect on cancer development.

Epigenomic footprinting of H3K27ac HiChIP identifies putative causal SNVs

To identify all potentially causal SNVs at each of the risk loci, we calculated ancestral population-matched LD between the GWAS index SNVs and all SNVs within a one mega-base pair (1 MB) genomic window using the genotype data from the East Asian and European populations assessed by the 1,000 genomes project (Materials and Methods; ref. 37). Consistent with previous reports (2), the majority (98%) of the index SNVs and those in strong LD (r2 ≥ 0.8) are located within noncoding regions. However, 45% (18/40) of the loci harbor a coding SNV and 13 (32.5%) exhibit at least one amino acid changing SNV.

To identify active enhancers and promoters together with their interactions, we performed HiChIP against H3K27ac in three GEC representative cell lines, AGS, OE33, and KYSE140, representing GAC, EAC and ESCC. We identified 116,398, 121,811, and 232,301 chromatin interactions involving active CREs in AGS, OE33, and KYSE140, respectively [valid PETs ≥ 3 and a false discovery rate (FDR) < 0.05]. Among the identified chromatin interactions, we observed 48,216, 41,894, and 88,189 enhancer–promoter and 15,963, 11,677, and 20,118 promoter–promoter connections for AGS, OE33, and KYSE140, respectively, which corresponded to an average of 10.7, 9.8, and 15.7 interactions per gene. Next, to identify putative causal SNPs and the target genes underlying the GEC risk loci, we intersected the interactions involving annotated gene promoters with the population-specific, east Asian or European, ldSNVs (r2 ≥ 0.8). In addition to matching the LD patterns using representative populations from the 1,000 genomes project, we matched the association phenotype to the corresponding cellular model. For phenotypes, such as esophageal cancer, which could represent ESCC, EAC, or both, we performed the analyses using the EAC and ESCC models (OE33 and KYSE140, respectively). Next, we identified 186 SNVs connecting to 281 genes at 20 loci, which substantially reduces the number of potentially causal ldSNVs from 1,994 at these loci. The remaining loci did not contain ldSNVs that fell within a HiChIP anchor connected to an annotated gene. Consistently, SNVs found within the HiChIP interactions contain previously reported functional SNVs and genes at GEC risk-associated loci (3840). For example, variation at the PRKAA1 gene locus that is captured by five separate index SNVs, in LD with each other (rs3805495, rs6897169, rs10074991, rs13361707, and rs59585832), is associated with risk of GEC (34, 35, 38, 4145). Incorporating the H3K27ac HiChIP anchors reduces the number of LD SNVs at this locus from more than 40 SNVs to only two shared putative causal SNVs, rs59133000 and rs7702883 (Supplementary Table S6A and S6B). Of note, the rs59133000 SNV was recently implicated as the most likely causal SNV underlying the association at this locus (38).

The resolution of the HiChIP interactions is limited to the size of the restriction fragments, anchors, which often span several thousand base pairs in size. These anchors can include multiple potentially functional SNVs. However, nucleosomes harboring histones marked with H3K27ac flank active open chromatin regions bound by the transcriptional machinery (46, 47). These active NFRs can be identified through a depression, or valley, in the HiChIP enrichment profile, as previously demonstrated using ChIP-seq of histone posttranslational modifications, termed epigenomic footprints (47, 48). Therefore, we developed and employed a computational tool, called HisTrader (bioRxiv 2020.03.12.989228), which uses differencing and moving averages to identify the beginning and end of NFRs within the HiChIP enrichment profiles (Fig. 2A). To confirm that the identified NFRs are conserved and correspond to disease relevant biology, we compared the HiChIP enrichment signal to H3K27ac ChIPmentation (49) data from matched tumor and adjacent reference mucosa from two GEA patients from the McGill University Health Centre (Fig. 2B). We observed that HisTrader identified NFRs at shared H3K27ac peaks, which are consistent between the H3K27ac HiChIP from the GEA model, AGS, cell line and the GEA patient samples. Therefore, we proceeded to incorporate NFRs to further reduce the number of potentially causal SNVs.

Figure 2.

HiChIP-based epigenomic footprinting identifies putative causal SNVs underlying risk loci. A, Description of the approach to identify potentially causal SNVs using H3K27ac HiChIP data; HiChIP anchors and peaks annotate active CREs and simultaneously delineate their interactions. H3K27ac HiChIP signals are used to predict NFRs through the identification of depressions in the enrichment signal. Cancer dependency genes are defined using LRT scores and data from the DepMap project. SNVs found in active CREs and NFRs that interact with a cancer dependency gene are predicted to be the most likely causal variant(s). B, Heatmaps of the enrichment signals at shared H3K27ac peaks centered on HisTrader called NFRs; AGS H3K27ac HiChIP, patient-matched tumor, and reference mucosa H3K27ac ChIPmentation are plotted. TUGI-1T and TUGI-2T are the patient tumor samples and TUGI-1N and TUGI-2N are their respective matched patient reference gastric mucosa. C, Heatmap illustrating the reduction in the number of potentially causal LD SNVs associated and their index SNV using the H3K27ac HiChIP analysis approach (A) in the representative cell lines AGS (blue), OE33 (purple), and KYSE140 (red); Chromatin Int., chromatin interactions; Dep. Gene, dependency gene; DG, the contact maps and H3K27ac HiChIP enrichment signals are shown for putative causal SNVs using a 500 kb genomic window. D, rs201003281 and rs142909110 (LMNA); AGS HiChIP (blue), patient tumor (TUGI-1T and TUGI-2T), and reference (TUGI-1N and TUGI-2N) ChIPmentation signals are also shown (light blue). E, rs2292759 (PTPN2) and KYSE140 HiChIP (red). F, rs11556025 (CHEK2 and XBP1) and KYSE140 HiChIP (red). G, rs2834718 (RUNX1) and KYSE140 HiChIP (red).

Figure 2.

HiChIP-based epigenomic footprinting identifies putative causal SNVs underlying risk loci. A, Description of the approach to identify potentially causal SNVs using H3K27ac HiChIP data; HiChIP anchors and peaks annotate active CREs and simultaneously delineate their interactions. H3K27ac HiChIP signals are used to predict NFRs through the identification of depressions in the enrichment signal. Cancer dependency genes are defined using LRT scores and data from the DepMap project. SNVs found in active CREs and NFRs that interact with a cancer dependency gene are predicted to be the most likely causal variant(s). B, Heatmaps of the enrichment signals at shared H3K27ac peaks centered on HisTrader called NFRs; AGS H3K27ac HiChIP, patient-matched tumor, and reference mucosa H3K27ac ChIPmentation are plotted. TUGI-1T and TUGI-2T are the patient tumor samples and TUGI-1N and TUGI-2N are their respective matched patient reference gastric mucosa. C, Heatmap illustrating the reduction in the number of potentially causal LD SNVs associated and their index SNV using the H3K27ac HiChIP analysis approach (A) in the representative cell lines AGS (blue), OE33 (purple), and KYSE140 (red); Chromatin Int., chromatin interactions; Dep. Gene, dependency gene; DG, the contact maps and H3K27ac HiChIP enrichment signals are shown for putative causal SNVs using a 500 kb genomic window. D, rs201003281 and rs142909110 (LMNA); AGS HiChIP (blue), patient tumor (TUGI-1T and TUGI-2T), and reference (TUGI-1N and TUGI-2N) ChIPmentation signals are also shown (light blue). E, rs2292759 (PTPN2) and KYSE140 HiChIP (red). F, rs11556025 (CHEK2 and XBP1) and KYSE140 HiChIP (red). G, rs2834718 (RUNX1) and KYSE140 HiChIP (red).

Close modal

Through the incorporation of H3K27ac HiChIP epigenomic footprinting, we can simultaneously annotate active CREs, delineate the interactions between them and their target genes, and isolate candidate functional SNVs within NFRs. Using this approach, we further reduce the number of potentially causal SNVs at risk loci (Fig. 2C; Supplementary Table S6A). For example, the variation at the PSCA locus–associated GEC and three index SNVs, rs2294008, rs2976394, and rs2920280, have been reported to capture the risk-associated genetic variation at this locus (34, 40, 50). Using the H3K27ac anchors to filter ldSNVs reduces the number of ldSNVs from 63 and 47 to 22 and 14 potentially functional SNVs in Europeans and East Asians, respectively (Fig. 2C; Supplementary Table S6B). The addition of HisTrader called NFRs further reduces the number of SNVs from 22 and 14 in Europeans and East Asians to 8 SNVs shared between Europeans and East Asians (rs2976387, rs11784604, rs11786719, rs13274202, rs13252264, rs11786721, rs2976388, and rs2920281). These SNVs are linked to two genes, JRK and PSCA (Supplementary Table S6B). Of note, one of these SNVs, rs2978980, was recently identified as a functional SNV causing the allele-specific expression of the long noncoding RNA, lncPSCA, a noncoding transcript variant of PSCA (39). The index SNV, rs2294008, has also been reported to be a functional SNV at this locus (40). However, we do not observe an active CRE encompassing the rs2294008 SNV in our data, which may result from the cell models not fully capturing the epigenomic landscape of GEA or multiple functional alleles contributing to the association at this locus (51). Interestingly, including HisTrader called NFRs reduced the list of ldSNVs to a single putative causal variant at seven loci (Fig. 2C; Supplementary Table S6). For instance, a single SNV, rs3744270, remained as the most likely causal variant underlying the risk association captured by the rs17761864 index SNV on chromosome 17 (chr17p13.3; ref. 52). rs3744270 is located within the promoter of the serine racemase gene (SRR). This SNV is a strong eQTL for SRR in 38 tissues, including esophageal and stomach tissue, assessed by the GTEX project (53). SRR was recently shown to enhance the growth of colorectal cancer through the production of pyruvate from serine (54). However, the rs17761864 SNV is also in LD (r2 > 0.8) with a coding SNV, rs1885986, within the SMG6 nonsense mRNA‐mediated decay factor (SMG6), also known as EST1 telomerase component homolog 1A (EST1A) gene. SMG6 plays a role in nonsense-mediated mRNA decay (55) and is a component of the telomerase ribonucleoprotein complex (56). Therefore, additional studies will be required to elucidate the potential role of each putative functional SNV at this risk locus in GEC.

UXS1 is a novel GEC susceptibility and dependency gene

To identify novel cancer-related genes and prioritize putative functional SNVs targeting them, we integrated the H3K27ac HiChIP data with the cancer gene dependencies identified by the DepMap project (19). Five loci were reduced to only two or a single SNV within an epigenomic footprint that interacted with a cancer dependency gene (LRT > 171). For example, at the LMNA locus, two SNVs in LD with the index SNV rs138554234 remain following the analysis, rs201003281 and rs142909110. The rs142909110 SNV seems to be just upstream of a short isoform of LMNA, and rs201003281 is within an intronic enhancer (Fig. 2C and D). LMNA is a cancer census gene, and both regions seem to regulate it; however, no eQTLs have been reported for these SNVs. The four remaining loci contain a single candidate functional SNV. These SNVs, rs2292759, rs2834718, rs11556025, and rs147518036, target the genes PTPN2 (Fig. 2E), CHEK2 (Fig. 2F), RUNX1 (Fig. 2G), and UXS1, respectively. PTPN2, RUNX1, and CHEK2 have been previously described to play a role in cancer etiology (5759). Interestingly, rs11556025 is found within the promoter of CCDC117, which forms promoter to promoter interactions with CHEK2 and XBP1 (Fig. 2C and F). Similar to CHEK2, XBP1 is also known to play a role in tumorigenicity and cancer progression (60, 61). Interestingly, the CCDC117 promoter may function as an enhancer depending on tissue type given rs11556025 and its index SNV rs2239815 are eQTLs, identified by the GTEX project (53), for CHEK2 and XBP1 but not CCDC117 in several different tissues.

Interestingly, the SNV rs147518036 was found to target a novel cancer dependency gene, UXS1, not previously reported to have a role in GEC. The rs147518036 SNV is in LD with the lead SNV, rs75460256, recently associated with GEC among Europeans (Fig. 3A–C; ref. 62). Of note, rs147518036 is a reported eQTL for UXS1 (53), and UXS1 is significantly overexpressed in primary gastric tumors, but not esophageal tumors, when compared with reference tissue (P = 4.22 × 10−7) using data from TCGA (Supplementary Fig. S1A and S1B; ref. 63). In addition, UXS1 was found to be enriched as a cancer dependency among gastric cancer cell lines (17/34; odds ratio = 3.3; P = 8.14 × 10−4; Supplementary Fig. S2A). Because the DepMap CRISPR Screens were based on gene knockouts, we employed CRISPR interference (CRISPRi), using an inactivated Cas9 (dCas9) fused to two transcriptional repressors KRAB and MeCP2 (64), to assess the role of the UXS1 promoter element and knocking down UXS1. We tested four sgRNAs targeting the UXS1 promoter and observed >85% knockdown of UXS1 expression across all sgRNAs, demonstrating the activity of the annotated promoter in driving UXS1 expression (Fig. 3D). We confirmed that the repression of the UXS1, using sgRNA4, was specific by testing the expression of neighboring genes (Fig. 3E) and that the CRISPRi-mediated knockdown also resulted in a reduction in the UXS1 protein expression (Fig. 3D, inset). As expected, the repression of the UXS1 promoter led to a significant reduction (∼46%) in cell viability (P = 0.0006) in the AGS cell line after 96 hours (Fig. 3F). To validate UXS1 as a selective cancer dependency, we performed CRISPRi-mediated repression of the UXS1 promoter in two additional cell lines, MKN45 and MKN74 (Fig. 3G). We observed ∼50% reduction in viability of the MKN45 cell line (P = 0.006) after 72 hours, but no significant effect of UXS1 repression on the MKN74 cell line (Fig. 3G), illustrating that UXS1 is a selective dependency, which is further confirmed by the dependency scores observed by the DepMap project (Supplementary Fig. S2A and S2B). In addition, we confirm that the ectopic expression of the UXS1 can rescue the decrease in cell viability observed by repressing the UXS1 promoter (fold change = 1.36; P = 0.002; Fig. 3H; Supplementary Fig. S3). Interestingly, AGS and MKN74 were found to be heterozygous for the rs147518036 SNV (Supplementary Fig. S4). Although AGS seems to be diploid at the UXS1 locus, MKN74 seems to have a copy number alteration or increased ploidy of UXS1 (Supplementary Fig. S2B), which may explain its insensitivity to the CRISPRi-mediated repression of the UXS1.

Figure 3.

UXS1 is a novel GEC susceptibility and cancer dependency gene. A, Manhattan plot of the GWAS summary statistics, GCST90011807, and SNVs in LD with index SNV rs754669256 (red). LD was calculated using the 1,000 genomes data. Blue bars, HiChIP anchors; orange bars, HisTrader-identified NFRs. B, H3K27ac HiChIP contact matrix of a 500-kb window centered on the rs147518036 SNV along with the AGS H3K27ac HiChIP, patient tumor, and matched reference H3K27ac ChIPmentation enrichment signals. TUGI-1T and TUGI-2T are patient tumor samples. TUGI-1N and TUGI-2N are matched patient reference gastric mucosa. The location of the ldSNVs (purple) and the putative causal SNV, rs147518036 (orange), are shown. C, Zoomed in UXS1 promoter with the location of the SNV rs147518036 present in HisTrader-predicted NFR regions (orange). AGS H3K27ac HiChIP, patient tumor, and matched reference H3K27ac ChIPmentation enrichment signals are also shown. D, Bar plot showing relative RNA expression of UXS1 in the presence of four separate sgRNAs targeting the UXS1 promoter. Inset, representative immunoblot confirming the successful knockdown of UXS1 protein with sgRNA4. E, Bar plot of the normalized mRNA counts of UXS1 and neighboring genes with and without the presence of the sgRNA4-mediated CRISPRi of UXS1 promoter; highly significant knockdown of UXS1 was observed, whereas neighboring genes showed no significant differences. P values are from two-tailed t tests; ****, P = 0.00002. F, Growth curve of the AGS cells with and without CRISPRi-mediated repression of the UXS1 promoter; significance was tested using a mixed model ANOVA; **, P = 0.0006. G, Bar plot of cell viability, measured through absorbance, following 72 hours of incubation for the dCas9-KRAB-MeCP2 cell lines (AGS, MKN45, and MKN74) with (+) sgRNA4 or (–) a nontargeting control sgRNA. P values were calculated using a Welch t test; *, P < 0.05. H, Bar plot of the cell viability following 72 hours of incubation of AGS dCas9-KRAB-MeCP2 cells with sgRNA4 (+) or (–) a nontargeting control sgRNA together with (+) or without (–) doxycycline. Significance was assessed using a Welch t test. **, P = 0.002.

Figure 3.

UXS1 is a novel GEC susceptibility and cancer dependency gene. A, Manhattan plot of the GWAS summary statistics, GCST90011807, and SNVs in LD with index SNV rs754669256 (red). LD was calculated using the 1,000 genomes data. Blue bars, HiChIP anchors; orange bars, HisTrader-identified NFRs. B, H3K27ac HiChIP contact matrix of a 500-kb window centered on the rs147518036 SNV along with the AGS H3K27ac HiChIP, patient tumor, and matched reference H3K27ac ChIPmentation enrichment signals. TUGI-1T and TUGI-2T are patient tumor samples. TUGI-1N and TUGI-2N are matched patient reference gastric mucosa. The location of the ldSNVs (purple) and the putative causal SNV, rs147518036 (orange), are shown. C, Zoomed in UXS1 promoter with the location of the SNV rs147518036 present in HisTrader-predicted NFR regions (orange). AGS H3K27ac HiChIP, patient tumor, and matched reference H3K27ac ChIPmentation enrichment signals are also shown. D, Bar plot showing relative RNA expression of UXS1 in the presence of four separate sgRNAs targeting the UXS1 promoter. Inset, representative immunoblot confirming the successful knockdown of UXS1 protein with sgRNA4. E, Bar plot of the normalized mRNA counts of UXS1 and neighboring genes with and without the presence of the sgRNA4-mediated CRISPRi of UXS1 promoter; highly significant knockdown of UXS1 was observed, whereas neighboring genes showed no significant differences. P values are from two-tailed t tests; ****, P = 0.00002. F, Growth curve of the AGS cells with and without CRISPRi-mediated repression of the UXS1 promoter; significance was tested using a mixed model ANOVA; **, P = 0.0006. G, Bar plot of cell viability, measured through absorbance, following 72 hours of incubation for the dCas9-KRAB-MeCP2 cell lines (AGS, MKN45, and MKN74) with (+) sgRNA4 or (–) a nontargeting control sgRNA. P values were calculated using a Welch t test; *, P < 0.05. H, Bar plot of the cell viability following 72 hours of incubation of AGS dCas9-KRAB-MeCP2 cells with sgRNA4 (+) or (–) a nontargeting control sgRNA together with (+) or without (–) doxycycline. Significance was assessed using a Welch t test. **, P = 0.002.

Close modal

Functional SNV, rs147518036, alters GABPA binding to drive increased UXS1 expression

Following the confirmation of UXS1 as a cancer dependency gene, we sought to functionally assess rs147518036 as the functional SNV underlying the GEC risk association at this locus. Interestingly, in support of rs147518036 as the putative functional SNV underlying the rs75460256 risk locus, we observed a significant allelic imbalance (binomial test, P = 0.05, 52 vs. 33) in the H3K27ac HiChIP signal at the UXS1 promoter. We used a heterozygous marker SNV, rs78870678, that is not in LD with the index SNV, rs75460256 (r2 = 0.02), because only 14 aligned reads spanned the SNV rs147518036. The marker SNV, rs78870678, is located closer to the UXS1 and is within a nucleosome occupied region. Because the index rs75460256 SNV is not in LD with any coding SNVs and regulatory SNVs function primarily through the alteration of transcription factor DNA recognition motifs (2), we performed the IGR analysis (Materials and Methods; ref. 27)that predicts the effects of an SNV on the binding of transcription factors. We assessed 294 unique transcription factors using publicly available ChIP-seq data profiled by the ENCODE project (Supplementary Table S3; ref. 65). The IGR analysis predicted that the variant allele of the rs147518036 SNV would increase the binding affinity of different ETS factors, including ELF1, GABPA, and GABPB1 for the chromatin (Fig. 4A). A motif analysis also revealed that the variant allele of rs147518036 SNV creates a stronger GABPA DNA recognition motif (chr2:106810870-106810879, score = 10.3, P = 8.92 × 10−5) compared with the reference allele (score = 7.5, P = 2.55 × 10−4). Interestingly, the created motif is located 41 bp from an existing GABPA DNA recognition motif (chr2:106810828-106810837, Score = 10.1, P = 9.89 × 10−5; Fig. 4B). This finding is reminiscent of the recurrent somatic mutations identified within the promoter of the telomerase (TERT) gene (66, 67). The telomerase promoter mutations also create a second GABPA recognition site in helical phase with an existing GABPA recognition site to drive TERT expression (68). Therefore, the recruitment of GABP to the UXS1 promoter may mediate novel long-range chromatin interactions to drive increased transcription, as observed at the TERT locus (69).

Figure 4.

rs147518036 SNV creates a GABPA motif to drive increased UXS1 expression. A, A volcano plot of the IGR results for the rs147518036 SNV and 294 transcription factors profiled by the ENCODE project. B, Graphical illustration of GABP transcription factor heterotetramer binding UXS1 promoter at the existing and de novo ETS binding site created by the rs147518036 SNV. C, Bar plot of the fold change in reporter gene expression between reference and variant alleles of the rs147518036 SNV. P values were calculated with two-tailed Welch t tests. **, P = 0.007. D, Bar plot of the relative RNA expression of GABPA in the presence of nontargeting and GABPA siRNAs. P values were calculated with two-tailed Welch t tests. ***, P = 0.0008. E, Bar plot of the fold change in the UXS1 expression between reference and variant alleles of the rs147518036 SNV in the presence or absence of GABPA-directed siRNA. A significant increase in luciferase activity is only observed for the variant allele in the absence of GABPA-directed siRNA. P values were calculated using a two-tailed Welch t test. **, P = 0.001. (B, Created with BioRender.com.)

Figure 4.

rs147518036 SNV creates a GABPA motif to drive increased UXS1 expression. A, A volcano plot of the IGR results for the rs147518036 SNV and 294 transcription factors profiled by the ENCODE project. B, Graphical illustration of GABP transcription factor heterotetramer binding UXS1 promoter at the existing and de novo ETS binding site created by the rs147518036 SNV. C, Bar plot of the fold change in reporter gene expression between reference and variant alleles of the rs147518036 SNV. P values were calculated with two-tailed Welch t tests. **, P = 0.007. D, Bar plot of the relative RNA expression of GABPA in the presence of nontargeting and GABPA siRNAs. P values were calculated with two-tailed Welch t tests. ***, P = 0.0008. E, Bar plot of the fold change in the UXS1 expression between reference and variant alleles of the rs147518036 SNV in the presence or absence of GABPA-directed siRNA. A significant increase in luciferase activity is only observed for the variant allele in the absence of GABPA-directed siRNA. P values were calculated using a two-tailed Welch t test. **, P = 0.001. (B, Created with BioRender.com.)

Close modal

To assess the function of the rs147518036 SNV, we cloned the promoter of the UXS1 gene into a luciferase reporter construct and performed site directed mutagenesis to create the variant allele (Supplementary Fig. S5). The luciferase assay confirmed the functional impact of the rs147518036 SNV. The variant allele led to a significant increase in luciferase expression (fold change = 1.5, P = 0.007; Fig. 4C). To confirm the role of GABPA in mediating the increase in gene expression, we performed the luciferase assay in the presence of siRNA against GABPA. The siRNA-mediated knockdown led to a significant reduction in the expression of GABPA (79%, P = 0.0008; Fig. 4D). In addition, the knockdown of GABPA attenuated the increase in expression caused by the variant allele confirming its role in the increased expression caused by the variant allele of the SNV rs147518036 (Fig. 4E). Altogether, these findings reveal that the SNV rs147518036 can account for the GEC risk association indexed by the rs75460256 SNV through the creation of a GABPA DNA recognition site that results in the increased expression of UXS1.

UXS1, also known as UDP-xylose synthase, catalyzes the formation of UDP-xylose from UDP-glucuronate (70). UDP-xylose is used in the initiation of glycosaminoglycans (GAG) synthesis. To our knowledge, UXS1 has not been implicated as a cancer gene. However, the enzyme preceding it in the production of GAGs, UDP-glucose 6-dehydrogenase (UGDH) that converts UDP-glucose to UDP-glucuronic acid as part of the uronic acid pathway, has been implicated in cancer metastasis (71). Chondroitin sulfate and heparan sulfate classes of GAGs require UDP-xylose as a donor to initiate their synthesis (70), and heparan sulfate GAGs have been shown to be involved in tumor initiation, progression, and metastasis (72). Furthermore, UXS1 was recently identified as a potential therapeutic vulnerability among cancer cells overexpressing UGDH (73). Therefore, these pathways may provide novel therapeutic opportunities and strategies for the treatment of GEC.

In addition to implicating the increased expression of the novel gene UXS1 as a predisposition to GEC, these findings provide a framework that simplifies the identification of potentially functional noncoding regulatory SNVs and target genes underlying risk-associated loci. These findings highlight that unlike other capture HiC approaches (5), HiChIP against posttranslational modifications can increase the resolution to active open regulatory elements and effectively reduce the number of potentially functional sequence variants, ldSNVs, at risk loci through the simultaneous annotation of active CREs and their interactions. It is important to note that HiChIP can also exclude indirect associations, such as those observed as eQTLs, through the lack of physical interactions between a potentially functional SNV and affected gene. In addition, genome-scale CRISPR-based screens have led to the systematic identification of selective cancer gene dependencies (19), which can be leveraged to prioritize genes within risk-associated loci. However, not all target genes of risk loci need to be selective dependencies, and some loci may function through indirect mechanisms, different tissues, or through environmental interactions. Finally, these findings and methods can be extended to the interpretation of noncoding somatic mutations found in cancer (74).

No disclosures were reported.

A. Gnanapragasam: conceptualization, validation, investigation, writing–original draft, writing–review and editing. E. Kirbizakis: conceptualization, data curation, formal analysis, investigation, writing–original draft, writing–review and editing. A. Li: data curation, formal analysis, validation, investigation. K.H. White: validation, investigation. K.L. Mortenson: data curation, formal analysis. J. Cavalcante de Moura: validation, investigation. W. Jawhar: validation, investigation. Y. Yan: formal analysis, investigation. R. Falter: investigation, methodology. C. Russett: validation, investigation. B. Giannias: resources, project administration. S. Camilleri-Broët: formal analysis, investigation. N. Bertos: resources, project administration. J. Cools-Lartigue: resources, supervision. L. Garzia: supervision, writing–review and editing. V. Sangwan: resources, funding acquisition, writing–review and editing. L. Ferri: resources, funding acquisition, writing–review and editing. X. Zhang: conceptualization, resources, supervision, investigation, methodology, writing–original draft, writing–review and editing. S.D. Bailey: conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, investigation, writing–original draft, writing–review and editing.

We would like to acknowledge Marie-Michelle Simon, Tony Kwan, and colleagues at the Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University for their help in creating patient ChIPmentation libraries and sequencing expertise. We would also like thank the Centre d’expertise et de services Génome Québec for their support and expertise. Operating grants from the Cancer Research Society, Canadian Institute of Health Research, and Montreal General Foundation to S.D. Bailey supported this research. S.D. Bailey is the recipient of a Chercheur-boursier Junior 1 award from the Fonds de Recherche du Québec—Santé (FRQS), a Thomlinson award from McGill University, and the Dr. Ray Chiu Distinguished Scientist in Surgical Research award from the Montreal General Hospital Foundation. L. Garzia was supported by the National Institutes of Health R01 grant (R01CA255369).

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

1.
Watanabe
K
,
Stringer
S
,
Frei
O
,
Umićević Mirkov
M
,
de Leeuw
C
,
Polderman
TJC
, et al
.
A global overview of pleiotropy and genetic architecture in complex traits
.
Nat Genet
2019
;
51
:
1339
48
.
2.
Maurano
MT
,
Humbert
R
,
Rynes
E
,
Thurman
RE
,
Haugen
E
,
Wang
H
, et al
.
Systematic localization of common disease-associated variation in regulatory DNA
.
Science
2012
;
337
:
1190
5
.
3.
Zhang
X
,
Bailey
SD
,
Lupien
M
.
Laying a solid foundation for Manhattan—“setting the functional basis for the post-GWAS era”
.
Trends Genet
2014
;
30
:
140
9
.
4.
Nasser
J
,
Bergman
DT
,
Fulco
CP
,
Guckelberger
P
,
Doughty
BR
,
Patwardhan
TA
, et al
.
Genome-wide enhancer maps link risk variants to disease genes
.
Nature
2021
;
593
:
238
43
.
5.
Baxter
JS
,
Leavy
OC
,
Dryden
NH
,
Maguire
S
,
Johnson
N
,
Fedele
V
, et al
.
Capture Hi-C identifies putative target genes at 33 breast cancer risk loci
.
Nat Commun
2018
;
9
:
1028
.
6.
Hormozdiari
F
,
van de Bunt
M
,
Segrè
AV
,
Li
X
,
Joo
JWJ
,
Bilow
M
, et al
.
Colocalization of GWAS and eQTL signals detects target genes
.
Am J Hum Genet
2016
;
99
:
1245
60
.
7.
Dekker
J
,
Marti-Renom
MA
,
Mirny
LA
.
Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data
.
Nat Rev Genet
2013
;
14
:
390
403
.
8.
Mumbach
MR
,
Rubin
AJ
,
Flynn
RA
,
Dai
C
,
Khavari
PA
,
Greenleaf
WJ
, et al
.
HiChIP: efficient and sensitive analysis of protein-directed genome architecture
.
Nat Methods
2016
;
13
:
919
22
.
9.
Yu
M
,
Juric
I
,
Abnousi
A
,
Hu
M
,
Ren
B
.
Proximity ligation-assisted ChIP-Seq (PLAC-Seq)
.
Methods Mol Biol
2021
;
2351
:
181
99
.
10.
Mumbach
MR
,
Satpathy
AT
,
Boyle
EA
,
Dai
C
,
Gowen
BG
,
Cho
SW
, et al
.
Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements
.
Nat Genet
2017
;
49
:
1602
12
.
11.
Giambartolomei
C
,
Seo
J-H
,
Schwarz
T
,
Freund
MK
,
Johnson
RD
,
Spisak
S
, et al
.
H3K27ac HiChIP in prostate cell lines identifies risk genes for prostate cancer susceptibility
.
Am J Hum Genet
2021
;
108
:
2284
300
.
12.
Pennathur
A
,
Gibson
MK
,
Jobe
BA
,
Luketich
JD
.
Oesophageal carcinoma
.
Lancet
2013
;
381
:
400
12
.
13.
Yaghoobi
M
,
Bijarchi
R
,
Narod
SA
.
Family history and the risk of gastric cancer
.
Br J Cancer
2010
;
102
:
237
42
.
14.
Sollis
E
,
Mosaku
A
,
Abid
A
,
Buniello
A
,
Cerezo
M
,
Gil
L
, et al
.
The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource
.
Nucleic Acids Res
2023
;
51
:
D977
85
.
15.
Cancer Genome Atlas Research Network
;
Analysis Working Group: Asan University
;
BC Cancer Agency
;
Brigham and Women’s Hospital
;
Broad Institute
;
Brown University
, et al
.
Integrated genomic characterization of oesophageal carcinoma
.
Nature
2017
;
541
:
169
75
.
16.
GBD 2017 Oesophageal Cancer Collaborators
.
The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017
.
Lancet Gastroenterol Hepatol
2020
;
5
:
582
97
.
17.
1000 Genomes Project Consortium
;
Abecasis
GR
,
Altshuler
D
,
Auton
A
,
Brooks
LD
,
Durbin
RM
,
Gibbs
RA
, et al
.
A map of human genome variation from population-scale sequencing
.
Nature
2010
;
467
:
1061
73
.
18.
Purcell
S
,
Neale
B
,
Todd-Brown
K
,
Thomas
L
,
Ferreira
MAR
,
Bender
D
, et al
.
PLINK: a tool set for whole-genome association and population-based linkage analyses
.
Am J Hum Genet
2007
;
81
:
559
75
.
19.
Dempster
JM
,
Pacini
C
,
Pantel
S
,
Behan
FM
,
Green
T
,
Krill-Burger
J
, et al
.
Agreement between two large pan-cancer CRISPR-Cas9 gene dependency data sets
.
Nat Commun
2019
;
10
:
5817
.
20.
McDonald
ER
3rd
,
de Weck
A
,
Schlabach
MR
,
Billy
E
,
Mavrakis
KJ
,
Hoffman
GR
, et al
.
Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening
.
Cell
2017
;
170
:
577
92.e10
.
21.
Liu
Y
,
Guo
B
,
Aguilera-Jimenez
E
,
Chu
VS
,
Zhou
J
,
Wu
Z
, et al
.
Chromatin looping shapes KLF5-dependent transcriptional programs in human epithelial cancers
.
Cancer Res
2020
;
80
:
5464
77
.
22.
Servant
N
,
Varoquaux
N
,
Lajoie
BR
,
Viara
E
,
Chen
C-J
,
Vert
J-P
, et al
.
HiC-Pro: an optimized and flexible pipeline for Hi-C data processing
.
Genome Biol
2015
;
16
:
259
.
23.
Lareau
CA
,
Aryee
MJ
.
hichipper: a preprocessing pipeline for calling DNA loops from HiChIP data
.
Nat Methods
2018
;
15
:
155
6
.
24.
Zhang
Y
,
Liu
T
,
Meyer
CA
,
Eeckhoute
J
,
Johnson
DS
,
Bernstein
BE
, et al
.
Model-based analysis of ChIP-Seq (MACS)
.
Genome Biol
2008
;
9
:
R137
.
25.
Li
H
,
Durbin
R
.
Fast and accurate short read alignment with Burrows-Wheeler transform
.
Bioinformatics
2009
;
25
:
1754
60
.
26.
Ramírez
F
,
Dündar
F
,
Diehl
S
,
Grüning
BA
,
Manke
T
.
deepTools: a flexible platform for exploring deep-sequencing data
.
Nucleic Acids Res
2014
;
42
:
W187
91
.
27.
Cowper-Sal lari
R
,
Zhang
X
,
Wright
JB
,
Bailey
SD
,
Cole
MD
,
Eeckhoute
J
, et al
.
Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression
.
Nat Genet
2012
;
44
:
1191
8
.
28.
Grant
CE
,
Bailey
TL
,
Noble
WS
.
FIMO: scanning for occurrences of a given motif
.
Bioinformatics
2011
;
27
:
1017
8
.
29.
Liu
Y
,
Wu
Z
,
Zhou
J
,
Ramadurai
DKA
,
Mortenson
KL
,
Aguilera-Jimenez
E
, et al
.
A predominant enhancer co-amplified with the SOX2 oncogene is necessary and sufficient for its expression in squamous cancer
.
Nat Commun
2021
;
12
:
7139
.
30.
Colaprico
A
,
Silva
TC
,
Olsen
C
,
Garofano
L
,
Cava
C
,
Garolini
D
, et al
.
TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data
.
Nucleic Acids Res
2016
;
44
:
e71
.
31.
Sondka
Z
,
Bamford
S
,
Cole
CG
,
Ward
SA
,
Dunham
I
,
Forbes
SA
.
The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers
.
Nat Rev Cancer
2018
;
18
:
696
705
.
32.
Cui
R
,
Kamatani
Y
,
Takahashi
A
,
Usami
M
,
Hosono
N
,
Kawaguchi
T
, et al
.
Functional variants in ADH1B and ALDH2 coupled with alcohol and smoking synergistically enhance esophageal cancer risk
.
Gastroenterology
2009
;
137
:
1768
75
.
33.
Wu
C
,
Hu
Z
,
He
Z
,
Jia
W
,
Wang
F
,
Zhou
Y
, et al
.
Genome-wide association study identifies three new susceptibility loci for esophageal squamous-cell carcinoma in Chinese populations
.
Nat Genet
2011
;
43
:
679
84
.
34.
Sakaue
S
,
Kanai
M
,
Tanigawa
Y
,
Karjalainen
J
,
Kurki
M
,
Koshiba
S
, et al
.
A cross-population atlas of genetic associations for 220 human phenotypes
.
Nat Genet
2021
;
53
:
1415
24
.
35.
Ishigaki
K
,
Akiyama
M
,
Kanai
M
,
Takahashi
A
,
Kawakami
E
,
Sugishita
H
, et al
.
Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases
.
Nat Genet
2020
;
52
:
669
79
.
36.
Suzuki
A
,
Katoh
H
,
Komura
D
,
Kakiuchi
M
,
Tagashira
A
,
Yamamoto
S
, et al
.
Defined lifestyle and germline factors predispose Asian populations to gastric cancer
.
Sci Adv
2020
;
6
:
eaav9778
.
37.
1000 Genomes Project Consortium
;
Auton
A
,
Brooks
LD
,
Durbin
RM
,
Garrison
EP
,
Kang
HM
,
Korbel
JO
, et al
.
A global reference for human genetic variation
.
Nature
2015
;
526
:
68
74
.
38.
Yan
C
,
Zhu
M
,
Ding
Y
,
Yang
M
,
Wang
M
,
Li
G
, et al
.
Meta-analysis of genome-wide association studies and functional assays decipher susceptibility genes for gastric cancer in Chinese populations
.
Gut
2020
;
69
:
641
51
.
39.
Zheng
Y
,
Lei
T
,
Jin
G
,
Guo
H
,
Zhang
N
,
Chai
J
, et al
.
LncPSCA in the 8q24.3 risk locus drives gastric cancer through destabilizing DDX5
.
EMBO Rep
2021
;
22
:
e52707
.
40.
Study Group of Millennium Genome Project for Cancer
;
Sakamoto
H
,
Yoshimura
K
,
Saeki
N
,
Katai
H
,
Shimoda
T
,
Matsuno
Y
, et al
.
Genetic variation in PSCA is associated with susceptibility to diffuse-type gastric cancer
.
Nat Genet
2008
;
40
:
730
40
.
41.
Jin
G
,
Lv
J
,
Yang
M
,
Wang
M
,
Zhu
M
,
Wang
T
, et al
.
Genetic risk, incident gastric cancer, and healthy lifestyle: a meta-analysis of genome-wide association studies and prospective cohort study
.
Lancet Oncol
2020
;
21
:
1378
86
.
42.
Hu
N
,
Wang
Z
,
Song
X
,
Wei
L
,
Kim
BS
,
Freedman
ND
, et al
.
Genome-wide association study of gastric adenocarcinoma in Asia: a comparison of associations between cardia and non-cardia tumours
.
Gut
2016
;
65
:
1611
8
.
43.
Shi
Y
,
Hu
Z
,
Wu
C
,
Dai
J
,
Li
H
,
Dong
J
, et al
.
A genome-wide association study identifies new susceptibility loci for non-cardia gastric cancer at 3q13.31 and 5p13.1
.
Nat Genet
2011
;
43
:
1215
8
.
44.
Wang
Z
,
Dai
J
,
Hu
N
,
Miao
X
,
Abnet
CC
,
Yang
M
, et al
.
Identification of new susceptibility loci for gastric non-cardia adenocarcinoma: pooled results from two Chinese genome-wide association studies
.
Gut
2017
;
66
:
581
7
.
45.
Tanikawa
C
,
Kamatani
Y
,
Toyoshima
O
,
Sakamoto
H
,
Ito
H
,
Takahashi
A
, et al
.
Genome-wide association study identifies gastric cancer susceptibility loci at 12q24.11-12 and 20q11.21
.
Cancer Sci
2018
;
109
:
4015
24
.
46.
Ernst
J
,
Kellis
M
.
Discovery and characterization of chromatin states for systematic annotation of the human genome
.
Nat Biotechnol
2010
;
28
:
817
25
.
47.
He
HH
,
Meyer
CA
,
Shin
H
,
Bailey
ST
,
Wei
G
,
Wang
Q
, et al
.
Nucleosome dynamics define transcriptional enhancers
.
Nat Genet
2010
;
42
:
343
7
.
48.
Ziller
MJ
,
Edri
R
,
Yaffe
Y
,
Donaghey
J
,
Pop
R
,
Mallard
W
, et al
.
Dissecting neural differentiation regulatory networks through epigenetic footprinting
.
Nature
2015
;
518
:
355
9
.
49.
Schmidl
C
,
Rendeiro
AF
,
Sheffield
NC
,
Bock
C
.
ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors
.
Nat Methods
2015
;
12
:
963
5
.
50.
Park
B
,
Yang
S
,
Lee
J
,
Woo
HD
,
Choi
IJ
,
Kim
YW
, et al
.
Genome-wide association of genetic variation in the PSCA gene with gastric cancer susceptibility in a Korean population
.
Cancer Res Treat
2019
;
51
:
748
57
.
51.
Corradin
O
,
Saiakhova
A
,
Akhtar-Zaidi
B
,
Myeroff
L
,
Willis
J
,
Cowper-Sal lari
R
, et al
.
Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits
.
Genome Res
2014
;
24
:
1
13
.
52.
Wu
C
,
Kraft
P
,
Zhai
K
,
Chang
J
,
Wang
Z
,
Li
Y
, et al
.
Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions
.
Nat Genet
2012
;
44
:
1090
7
.
53.
Lonsdale
J
,
Thomas
J
,
Salvatore
M
,
Phillips
R
,
Lo
E
,
Shad
S
, et al
.
The genotype-tissue expression (GTEx) project
.
Nat Genet
2013
;
45
:
580
5
.
54.
Ohshima
K
,
Nojima
S
,
Tahara
S
,
Kurashige
M
,
Kawasaki
K
,
Hori
Y
, et al
.
Serine racemase enhances growth of colorectal cancer by producing pyruvate from serine
.
Nat Metab
2020
;
2
:
81
96
.
55.
Boehm
V
,
Kueckelmann
S
,
Gerbracht
JV
,
Kallabis
S
,
Britto-Borges
T
,
Altmüller
J
, et al
.
SMG5-SMG7 authorize nonsense-mediated mRNA decay by enabling SMG6 endonucleolytic activity
.
Nat Commun
2021
;
12
:
3965
.
56.
Redon
S
,
Reichenbach
P
,
Lingner
J
.
Protein RNA and protein protein interactions mediate association of human EST1A/SMG6 with telomerase
.
Nucleic Acids Res
2007
;
35
:
7011
22
.
57.
Manguso
RT
,
Pope
HW
,
Zimmer
MD
,
Brown
FD
,
Yates
KB
,
Miller
BC
, et al
.
In vivo CRISPR screening identifies Ptpn2 as a cancer immunotherapy target
.
Nature
2017
;
547
:
413
8
.
58.
Jiang
Y-Y
,
Lin
D-C
,
Mayakonda
A
,
Hazawa
M
,
Ding
L-W
,
Chien
W-W
, et al
.
Targeting super-enhancer-associated oncogenes in oesophageal squamous cell carcinoma
.
Gut
2017
;
66
:
1358
68
.
59.
Turnbull
C
,
Sud
A
,
Houlston
RS
.
Cancer genetics, precision prevention and a call to action
.
Nat Genet
2018
;
50
:
1212
8
.
60.
Yang
Z
,
Huo
Y
,
Zhou
S
,
Guo
J
,
Ma
X
,
Li
T
, et al
.
Cancer cell-intrinsic XBP1 drives immunosuppressive reprogramming of intratumoral myeloid cells by promoting cholesterol production
.
Cell Metab
2022
;
34
:
2018
35.e8
.
61.
Chen
X
,
Iliopoulos
D
,
Zhang
Q
,
Tang
Q
,
Greenblatt
MB
,
Hatziapostolou
M
, et al
.
XBP1 promotes triple-negative breast cancer by controlling the HIF1α pathway
.
Nature
2014
;
508
:
103
7
.
62.
Rashkin
SR
,
Graff
RE
,
Kachuri
L
,
Thai
KK
,
Alexeeff
SE
,
Blatchins
MA
, et al
.
Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts
.
Nat Commun
2020
;
11
:
4423
.
63.
Cancer Genome Atlas Research Network
;
Weinstein
JN
,
Collisson
EA
,
Mills
GB
,
Shaw
KRM
,
Ozenberger
BA
, et al
.
The cancer genome atlas pan-cancer analysis project
.
Nat Genet
2013
;
45
:
1113
20
.
64.
Yeo
NC
,
Chavez
A
,
Lance-Byrne
A
,
Chan
Y
,
Menn
D
,
Milanova
D
, et al
.
An enhanced CRISPR repressor for targeted mammalian gene regulation
.
Nat Methods
2018
;
15
:
611
6
.
65.
ENCODE Project Consortium
.
An integrated encyclopedia of DNA elements in the human genome
.
Nature
2012
;
489
:
57
74
.
66.
Horn
S
,
Figl
A
,
Rachakonda
PS
,
Fischer
C
,
Sucker
A
,
Gast
A
, et al
.
TERT promoter mutations in familial and sporadic melanoma
.
Science
2013
;
339
:
959
61
.
67.
Huang
FW
,
Hodis
E
,
Xu
MJ
,
Kryukov
GV
,
Chin
L
,
Garraway
LA
.
Highly recurrent TERT promoter mutations in human melanoma
.
Science
2013
;
339
:
957
9
.
68.
Bell
RJA
,
Rube
HT
,
Kreig
A
,
Mancini
A
,
Fouse
SD
,
Nagarajan
RP
, et al
.
Cancer. The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer
.
Science
2015
;
348
:
1036
9
.
69.
Akıncılar
SC
,
Khattar
E
,
Boon
PLS
,
Unal
B
,
Fullwood
MJ
,
Tergaonkar
V
.
Long-range chromatin interactions drive mutant TERT promoter activation
.
Cancer Discov
2016
;
6
:
1276
91
.
70.
Eixelsberger
T
,
Sykora
S
,
Egger
S
,
Brunsteiner
M
,
Kavanagh
KL
,
Oppermann
U
, et al
.
Structure and mechanism of human UDP-xylose synthase: evidence for a promoting role of sugar ring distortion in a three-step catalytic conversion of UDP-glucuronic acid
.
J Biol Chem
2012
;
287
:
31349
58
.
71.
Wang
X
,
Liu
R
,
Zhu
W
,
Chu
H
,
Yu
H
,
Wei
P
, et al
.
UDP-glucose accelerates SNAI1 mRNA decay and impairs lung cancer metastasis
.
Nature
2019
;
571
:
127
31
.
72.
Sasisekharan
R
,
Shriver
Z
,
Venkataraman
G
,
Narayanasami
U
.
Roles of heparan-sulphate glycosaminoglycans in cancer
.
Nat Rev Cancer
2002
;
2
:
521
8
.
73.
Doshi
MB
,
Lee
N
,
Tseyang
T
,
Ponomarova
O
,
Goel
HL
,
Spears
M
, et al
.
Disruption of sugar nucleotide clearance is a therapeutic vulnerability of cancer cells
.
Nature
2023
;
623
:
625
32
.
74.
Bailey
SD
,
Desai
K
,
Kron
KJ
,
Mazrooei
P
,
Sinnott-Armstrong
NA
,
Treloar
AE
, et al
.
Noncoding somatic and inherited single-nucleotide variants converge to promote ESR1 expression in breast cancer
.
Nat Genet
2016
;
48
:
1260
6
.
This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.