Chromatin Accessibility Landscape of Human Triple-negative Breast Cancer Cell Lines Reveals Variation by Patient Donor Ancestry

Abstract African American (AA) women have an excessive risk of developing triple-negative breast cancer (TNBC). We employed Assay for Transposase-Accessible Chromatin using sequencing to characterize differences in chromatin accessibility between nine commonly used TNBC cell lines derived from patients of European and African ancestry. Principal component and chromosome mapping analyses of accessibility peaks with the most variance revealed separation of chromatin profiles by patient group. Motif enrichment and footprinting analyses of disparate open chromatin regions revealed differences in transcription factor activity, identifying 79 with ancestry-associated binding patterns (FDR < 0.01). AA TNBC cell lines exhibited increased accessibility for 62 transcription factors associated with epithelial-to-mesenchymal transition, cancer stemness/chemotherapeutic resistance, proliferation, and aberrant p53 regulation, as well as KAISO, which has been previously linked to aggressive tumor characteristics in AA patients with cancer. Differential Assay for Transposase-Accessible Chromatin signal analysis identified 1,596 genes located within promoters of differentially open chromatin regions in AA-derived TNBC, identifying DNA methyltransferase 1 as the top upregulated gene associated with African ancestry. Pathway analyses with these genes revealed enrichment in several pathways, including hypoxia. Culturing cells under hypoxia showed ancestry-specific stress responses that led to the identification of a core set of AA-associated transcription factors, which included members of the Kruppel-like factor and Sp subfamilies, as well as KAISO, and identified ZDHHC1, a gene previously implicated in immunity and STING activation, as the top upregulated AA-specific gene under hypoxia. Together, these data reveal a differential chromatin landscape in TNBC associated with donor ancestry. The open chromatin structure of AA TNBC may contribute to a more lethal disease. Significance: We identify an ancestry-associated open chromatin landscape and related transcription factors that may contribute to aggressive TNBC in AA women. Furthermore, this study advocates for the inclusion of diversely sourced cell lines in experimental in vitro studies to advance health equity at all levels of scientific research.


Introduction
Human cancer cell lines are valuable and widely used in vitro model systems essential to performing cancer research at its most reductionist level.Under the proper growth conditions, cancer cell lines can retain many of the genetic properties of the parental tumor from which they were derived (1).Representing a seemingly inexhaustible source of biologic material, these cost-effective and manipulatable systems remain essential to mechanistic studies, drug discovery efforts, and preclinical research.Because of the foundational role they
Media was changed after 24 hours, and cells were exposed to normoxia (21% O 2 ) and hypoxia (1% O 2 ) for 48 hours.Experiments were conducted in duplicates.After 48 hours, the cells were harvested from the flasks by 0.25% trypsin (Gibco Laboratory) and 1 million cells were suspended in 1.5 mL of freezing medium (basal media + 20% FBS + 10% DMSO).The above cryopreserved cells were sent to the Sequencing Facility at Frederick National Laboratory for Cancer Research, Frederick, MD, where sample processing and sequencing were done.

Nuclei Isolation and ATAC-seq
Cell revival, cell lysis, transposition, and DNA extraction were performed using a published method by Corces and colleagues, 2017 (8) with some modifications for cryopreserved cell processing (9).Briefly, frozen cell pellets of 50,000 cells were resuspended in 1 mL of cold ATAC-seq resuspension buffer (RSB; 10 mmol/L Tris-HCl pH 7.4, 10 mmol/L NaCl, and 3 mmol/L MgCl 2 in water).
Cells were centrifuged at 500 rcf for 5 minutes in a prechilled (4°C) fixed-angle centrifuge.After centrifugation, 900 μL of supernatant was aspirated, which left 100 μL of supernatant.This remaining 100 μL of supernatant was carefully aspirated by pipetting with a P200 pipette tip to avoid the cell pellet.Cell pellets were then resuspended in 50 μL of ATAC-seq RSB containing 0.1% NP40, 0.1% Tween-20, and 0.01% digitonin by pipetting up and down three times.This cell lysis reaction was incubated on ice for 3 minutes.After lysis, 1 mL of ATAC-seq RSB containing 0.1% Tween-20 (without NP40 or digitonin) was added, and the tubes were inverted to mix.Nuclei were then centrifuged for 10 minutes at 500 rcf in a prechilled (4°C) fixed-angle centrifuge.Supernatant was removed with two pipetting steps, as described before, and nuclei were resuspended in 50 μL of transposition mix by pipetting up and down six times.Transposition reactions were incubated at 37°C for 30 minutes in a thermomixer with shaking at 1,000 rpm.Reactions were cleaned up with Zymo DNA Clean and Concentrator 5 columns.The remainder of the ATAC-seq library preparation was performed as described previously (9).All libraries were amplified with a target concentration of 20 μL at 4 nmol/L, which is equivalent to 80 femtomoles of product.A total of 32 ATAC-seq samples were pooled and sequenced on NextSeq using OMNI ATAC-seq protocol adapted from Corces and colleagues (8) and paired-end sequencing.All samples have >95% Q30 bases, which is defined as the percentage of bases called with an inferred accuracy of 99.9% or above and a measure of base calling quality.All the samples had yields between 48 and 417 million pass filter reads.Samples were trimmed for adapters using Cutadapt v. 4.4 (ref. 10; RRID: SCR_011841) before the alignment.The trimmed reads were aligned with the human genome assembly -hg38 reference using Bowtie2 v.2.5.1 (11) alignment (RRID: SCR_016368).Overall alignment
There were 0.96% to 8.08% unmapped reads.Library complexity was measured by uniquely aligned reads using Picard's MarkDuplicate utility.All the samples had library complexity with percent nonduplicated reads ranging from 38% to 69%.The peaks were called using Genrich and total number of peaks ranged between 54,835 and 83,046.The number of reads falling in peaks (FRiP score) generally ranged between 30% and 70%.The ATAC-seq data for the human breast cancer cell lines were deposited in the NCBI Gene Expression Omnibus (GEO) database under accession number GSE223182.

RNA Extraction for Gene Expression Studies
RNA was isolated using TRIzol and subsequently DNase treated.Integrity of isolated RNA was evaluated using RNA ScreenTapeon the Agilent Tapestation (Agilent Technologies

ATAC-seq Data Processing
After verifying that the pair-end reads are of acceptable sequencing quality, sequencing adapters and primers were trimmed away using Cutadapt (10).
Trimmed reads that align to known blacklisted regions using Bowtie2 (11) were excluded from downstream analysis.Bowtie2 was again applied to align the remaining reads to the host genome allowing "multimappers".These latter sequences were then mapped as recommended by ENCODE guidelines (12).Reads which remained unmapped or failed quality checks or were identified as a PCR duplicate are removed.Primary alignments of the remaining reads were further filtered to remove low MAPQ alignment.Reads were then deduplicated and parsed to Genrich (13) for ATAC-seq peak calling.FastQC v.

Principal Component Analysis
The dimensionality of the Tn5-nicking site counts matrix was reduced by calculating the principal components and plotting the first two components which explain a significant proportion of the overall variation.

Identification of Differentially Accessible Regions
Loading the count matrix into DESeq2 v. 1.45.5 (ref. 21; RRID:SCR_015687) a differential analysis was then performed to determine the ROIs which show significant differences in measured ATAC-seq signal between EA and AA samples.The results were filtered for differential regions with FDR ≤ 0.05 and signal fold change ≥ 2. This allowed us to filter the overall open chromatin regions of interest into those with significant differential ATAC signals between the two groups or differentially accessible regions (DAR).

TF Binding Motif Enrichment
We used ChromVAR v. 1.22.1 (22) to estimate bias-corrected deviations of TF binding motif enrichment.It enables accurate clustering of ATAC-seq profiles and characterization of motifs associated with variation in chromatin accessibility.The motifmatchr package was used to match motifs from the JASPAR core database (23) to peaks from differentially open chromatin regions and variability (SD of the z-scores computed across all samples for a set of peaks) and adjusted P values were calculated, and the top 50 most variable TFs were plotted on a heat map.

Digital Footprinting Analysis
Using the motifs from the HOCOMOCO ( 24) database (RRID: SCR_005409), we ran TOBIAS v.0.14.0 (25) in the identified DARs to predict TF occupancy or perform digital footprinting analysis.TOBIAS results shed light on gain and loss of individual TF footprints in the DARs.TFs with a differential binding score above |0.225|and a FDR < 0.01 were considered significant.

Pathway Enrichment Analyses
For pathway enrichment analysis, ancestry-related genes upregulated in either EA or AA TNBC, which were identified through differential analysis of ATAC signal by DESeq2 (|log 2 FC| > 2, FDR < 0.01), were then imported into the Enrichr tool (ref.26; RRID: SCR_001575) to perform an overrepresentation analysis of both cancer hallmark pathways (MSigDb, Broad Institute; RRID: SCR_016863) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (RRID: SCR_012773).Pathways with an FDR < 0.2 were used for subsequent biologic interpretation.

STRING Protein-protein Interaction Network Construction of Ancestry-specific Transcriptional Factors
Hypoxia-induced and ancestry-specific differentially bound TFs identified by TOBIAS were analyzed using the STRING database (RRID: SCR_005223) to identify groups of TF proteins that have known and/or predicted relationships (27).Using STRING, we performed pathway enrichment analysis, which identified overrepresentation of Reactome (RRID: SCR_003485) pathways of functional subsystems that are observed more frequently than expected using hypergeometric testing against a statistical background of the genome (27).Network nodes signify proteins and colored edges denote evidence of proteinprotein interactions (PPI) of various types.Networks were clustered via the Markov cluster algorithm with default inflation parameters.

RNA-seq Data Processing and Analysis
The Illumina FASTQ files were assessed for quality control using FastQC v.0.11.6.Reads of the samples were trimmed for adapters and low-quality bases using Cutadapt before alignment with the reference genome (hg38) and the annotated transcripts using STAR v.2.7.5b.The gene expression quantification was performed using RSEM v1.3.2.The average mapping rate of all samples was 97% and unique alignment was above 88%.The mapping statistics were calculated using Picard software.Percent coding bases were between 51% and 61%.Percent untranslated region bases were 32%-43%, and mRNA bases were between 89% and 94% for all the samples.Library complexity was measured in terms of unique fragments in the mapped reads using Picard's MarkDuplicate utility.The samples had 58%-71% nonduplicate reads.Differential expression analysis was done using DESeq2 package in R. Differential expression genes (DEG) were filtered by a q value (FDR) < 0.05, the absolute value of fold change > 2. We then performed gene set enrichment analysis (GSEA) using the gene transcription regulation reference (Gene Transcription Regulation Database: GTRD) from the Molecular Signatures Database: http://www.gseamsigdb.org/gsea/msigdb/human/collections.jsp).

Ancestry Estimation for Cell Lines Using RNA-seq Data
Germline variants were called using GATK's HaplotypeCaller v.

The Cancer Genome Atlas Validation
To validate ancestry-specific upstream transcriptional regulators identified by ATAC-seq in cell line analyses, RNA-seq data from The Cancer Genome Atlas (TCGA; RRID: SCR_003193) breast cancer data were downloaded from the Broad Institute's GDAC (https://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/BRCA/20160128/) covering 112 TNBC patient tumors (n = 78 EA, n = 34 AA participants) to infer differences in TF activity by genetic ancestry, which was assigned using previously reported quantified genetic ancestry estimates (28).Of note, quantified genetic ancestry estimates were concordant with self-reported race in these subjects (Supplementary Fig. S1).Significant ancestry-specific genes upregulated in either EA or AA TNBC identified through differential analysis of TCGA BRCA TNBC RNA-seq (by DESeq2, |FC| > 1.5, FDR < 0.05) were input into the Ingenuity Pathways Analysis (RRID: SCR_008653) to perform an upstream regulator analysis.

Statistical Analysis
Two experimental replicates of each cell line were submitted for ATAC-and RNA-seq and averaged for analyses.If not specified, R platform was used to compute statistics and generate plots.log 2 fold change, P values, and FDR were calculated and used to assess significance.Significant differences for all quantitative data were considered when |log 2 FC| > 2 and FDR < 0.01 unless otherwise noted.

Data Availability
The RNA-seq data for the human breast cancer cell lines were deposited in the NCBI GEO database under accession number GSE223181.The ATAC-seq data for the human breast cancer cell lines were deposited in the NCBI GEO database under accession number GSE223182.
GSE223183: This SuperSeries is composed of the following SubSeries: GSE223181 Integrated chromatin accessibility and gene expression landscape of human TNBC cell lines reveals variation by patient donor ancestry [RNAseq] GSE223182 Integrated chromatin accessibility and gene expression landscape of human TNBC cell lines reveals variation by patient donor ancestry [Bulk ATAC-seq] The following secure token has been created to allow review of record GSE223183 while it remains in private status: urgfuuiiprstfkj.

Genome-wide Landscape of Chromatin Accessibility in EA-and AA-derived TNBC Cell Lines
The assay for transposase-accessible chromatin using sequencing, known as ATAC-seq, is currently the prevailing method used to assess chromatin accessibility across the genome (29).Genomic DNA is exposed to a highly active transposase, Tn5, which fragments DNA and inserts into open chromatin regions, adding sequencing primers.Upon sequencing, researchers can use the position of Tn5 to characterize open and closed chromatin sites, investigate TF binding, and infer downstream gene expression and regulation.In this study, we employed ATAC-seq to characterize differences in chromatin accessibility between EA (n = 4) and AA (n = 5) donors of nine commonly used TNBC cell lines (Table 1).Ancestry estimation (http://ecla.moffitt.org)indicates an average European ancestry of 93% for the four EA TNBC cell lines and 79.5% West African ancestry for the five AA TNBC cell lines (Table 1).Our ancestry estimates using the RNA-seq data from these cell lines were consistent with these reported estimates.ATAC libraries were generated using two replicates of each cell line.Chromatin was fragmented into the expected nucleosomefree regions and mononucleosome, dinucleosome, and trinucleosome patterns (Fig. 1A), with the fractions of reads in peaks (FRiP) falling between 30% and 70%, indicating high-quality data.The total number of peaks merged across nine cell lines was 212333, with intronic, intergenic, and promoter regions representing the most accessible genomic elements to Tn5 transposase (Fig. 1B and   C).Our principal component analysis of the top 50,000 peaks with the most variance revealed separation of chromatin profiles by genetic ancestry (Fig. 1D), with EA TNBC showing a closer relationship to one another than AA TNBC.
Genome-wide analysis of differentially accessible chromatin sites (FDR < 0.01 and |log 2 FC| > 2) in EA compared with AA TNBC cells confirmed distinctive patterning of open chromatin across a number of chromosomes (Fig. 1E).Together, these data suggest genome-wide differences in the chromatin landscape by donor ancestry in these commonly used TNBC cell lines.

EA-and AA-derived TNBC Cell Lines Exhibit Differential TF Accessibility by Donor Ancestry
TF dysregulation can induce aberrant gene expression associated with cancer; indeed, TF activity is altered in a number of cancers and has long been considered a lofty yet challenging candidate for drug targeting (30).Given their critical role in tumor initiation and progression, we focused our investigation on understanding differences in TF abundance and activity in TNBC lines by ancestry.Using Tn5-nicking sites as counts in called peaks, we performed   differential ATAC signal analysis using DESeq2.We examined the overabundance of known TF motifs in these differential regions of interest using nonredundant JASPAR motifs and created a heat map of the top 50 most variable TFs (Fig. 2A; Supplementary Table S1).In EA-derived TNBC lines, a strong overabundance of motifs associated with members of the AP-1 TF families was observed, including members of the FOS (FOSL1, FOSL2, FOS), JUN (JUND, JUN, JUNB), and ATF (BATF) and MAF (MAF) subfamilies; AP-1 proteins are bZip domain-containing TFs that homodimerize and heterodimerize with each other and are implicated in cancer cell growth and proliferation across a number of cancer types, including breast (31).One AA-derived cell line, MDA-MB-157, showed a similar pattern, which contrasted with the other four AA cell lines.Differentially accessible chromatin regions (DAR) in the AA-derived TNBC lines showed an overabundance in motifs associated with the Grainyhead family of TFs, such as GRHL1 and TFCP2, which are considered pro-oncogenic in breast cancer and involved in pro-metastatic processes like epithelial-tomesenchymal transition and angiogenesis (32).AA-associated DARs were also overabundant in motifs of several TFs from the AP-2 family, including AP2A, 2B, and 2C, and multiple isoforms, which have a well-documented role in breast carcinogenesis (33); also enriched were motifs in forkhead box (FOX) TFs, including members of the FOXO4 (implicated in cancer cell invasion and replicative immortality), FOXD2 (implicated in chemoresistance), FOXI1 (associated with poor prognosis in breast cancer), and FOXL1 (modulator of Wnt signaling) subfamilies (34,35).
Next, we performed a digital footprinting analysis of differentially open chromatin regions using TOBIAS.We uncovered significant differences in TF binding by ancestry (79 TFs with a differential binding score > 0.225 or < −0.225 and FDR < 0.01; Fig. 2B and C; Supplementary Table S2).EA and AA TNBC cell lines exhibited increased binding capacities for 17 and 62 total TFs, respectively, with several common themes shared with the motif enrichment analysis.In EA TNBC, the top 10 differentially bound TFs were all members of subfamilies within the AP-1 TF family.In AA TNBC, the predicted top four differentially bound TFs with an increased binding capacity were all associated with epithelial-to-mesenchymal transition (ZEB1, SNAI1, SNAI2, GRHL2).
Others were associated with cancer stemness and chemotherapeutic resistance (TFAP2C, NRF1), cancer cell proliferation (E2F, E2F1, E2F2), and the TP53 pathway (P53, P63, P73).Interestingly, our data also showed increased binding of KAISO in AA TNBC lines (Fig. 2B), whose aberrant expression has been previously linked to a distinct biology and poor outcomes in AA patients with breast and prostate cancer (36)(37)(38).We wanted to confirm the correlation between ancestry-specific TF activity identified by ATAC-seq with downstream enrichment of their known transcriptional targets.Using RNA-seq data from our cell lines, we performed GSEA with gene sets from the GTRD to infer key transcriptional regulators based on expression of their known transcriptional targets (Supplementary Table S3).While not all met statistical significance, many TFs that had shown significant enrichment with chromosome accessibility mapping also annotated with GSEA.In EA, two TFs overlapped with the two approaches, RORA (P nom = 0.23) and BCL6B (P nom = 0.26).In AA, 11 TFs overlapped, including SNAI1 (P nom = 0.19), NR1H4 (P nom < 0.0001), MSX2 (P nom = 0.03), and KLF14 (P nom = 0.08).Taken together, these data suggest that TF activities show significant differences by donor ancestry, with African ancestry displaying an ancestry-specific TF activity profile associated with increased aggressiveness in breast cancer.
We next sought to extend these findings from human TNBC cell lines to TNBC patient tumors to investigate if ancestry-specific TF activity showed similar patterning by patient ancestry group.ATAC-seq data were not available within TCGA for validation of TF binding and motif enrichment analyses; however, we leveraged RNA-seq data from triple-negative breast tumor tissues in EA (n = 78) and AA (n = 34) patients within TCGA to perform an upstream regulator analysis to infer critical transcriptional regulators based on ancestryrelated differences in downstream gene expression (Supplementary Table S4).
We identified numerous donor ancestry-associated upstream regulators in the tumors that overlapped with the ancestry-related TF activities found in our TNBC cell line analyses (Table 2).In TNBC tumors from EA patients, these TFs

TNBC Cells Derived from AA Women Show Downstream Transcriptional Changes Associated More Aggressive Tumor Biology
Our findings showing that EA and AA women possess differential patterns of TF binding in TNBC cells led us to wonder how these differences may impact downstream gene expression.To this end, we performed a differential ATAC signal analysis using DESeq2 to identify genes within differentially open ATACseq peaks, defined as differentially expressed genes or DEGs; |log 2 FC| > 2, FDR < 0.01; Fig. 3; Supplementary Table S5).Across all genomic elements, we identified 13,945 differential peaks in EA TNBC and  Table 3).Conversely, the top DEG upregulated in AA was DNMT (|log 2 FC| = 6.73,FDR = 4.9 × 10 −12 ), a DNA methyltransferase (Table 3); this epigenetic regulator was also inferred to be enriched in our upstream regulator analysis using GSEA GTRD with our RNA-seq data (ES = 0.30, P nom = 0.097; Supplementary Table S3).Interestingly, expression of DNMT1 is associated with poor survival in breast cancer, is overexpressed specifically within TNBC, induces cancer cell invasion and survival through hypermethylation of key tumor suppressors, and is considered an epigenetic target for therapeutic blocking (40).Many of the top-ranking genes identified in differentially accessible chromatin regions by ATAC-seq were also significantly overexpressed at the transcript level in these cell lines (Supplementary Fig. S2; Supplementary Table S6), including top EA-specific DEG, LRRC, and the top AA-specific DEG, DNMT.
To identify key signaling pathways associated with donor ancestry, we next performed an overrepresentation analysis using the top 500 significant DEGs (|log 2 FC| > 2, FDR < 0.01) upregulated in promoter regions.In EA TNBC, enriched Hallmark-annotated cancer pathways (FDR < 0.2) included epithelial-to-mesenchymal transition (P adjusted = 2.3 × 10 −4 ), downregulation of UV response (P adjusted = 0.04), complement (P adjusted = 0.1), and hedgehog signaling (P adjusted = 0.16; Table 3).Interestingly, the key DEGs contributing to the enrichment of the epithelial-to-mesenchymal transition pathway (e.g., fibrillin-, SLIT, WIPF, etc.) are negative regulators of this process (41)(42)(43), suggesting that this may in fact functionally translate to suppression of mesenchymal differentiation in EA TNBC.No significant enrichment of KEGG-annotated pathways was found.In AA TNBC, upregulated Hallmarkannotated cancer pathways (FDR < 0.2) included estrogen response (early, P adjusted = 4.1 × 10 −4 ; late, P = 4.4 × 10 −4 ), apical junction (P adjusted = 2.9 × 10 −3 ), and hypoxia (P adjusted = 0.1), among others (Table 3).Nine KEGG pathways were significantly upregulated in AA TNBC, including angiogenic, metabolic, and inflammatory pathways, for example, inflammatory mediator regulation of transient receptor potential (TRP) channels.Contributing to the enrichment of the TRP pathway was the upregulation of several members of the TRP calcium ion channel superfamily, including TRPM, TRPM, TRPV, and TRPS, in AA TNBC.TRP channels and their regulation of Ca 2+ signaling have been implicated in cancer proliferation, metastasis, and drug resistance, and are just emerging as a novel candidate for therapeutic intervention in breast cancer (44).Together, these data suggest an enrichment of genes and pathways specific to donor ancestry, with AA-derived TNBC lines showing changes consistent with a more generally aggressive tumor biology.

Hypoxia Exacerbates Underlying Differences in TF Activity Between EA-and AA-derived TNBC Cell Lines
Hypoxia, or regions that lack sufficient oxygen, occurs in most solid tumors, including breast cancer.Of all breast cancer subtypes, hypoxia occurs most frequently in TNBC, and has been associated with therapy resistance and poor prognosis (45,46).Furthermore, a recent study by Bassiouni and colleagues found hypoxic tumor content differed by patient race in TNBC (47).
Given the distinct molecular characteristics, we have uncovered thus far in AA TNBC, including an enrichment for the hypoxia pathway at baseline, we wondered how this might impact their biological response to hypoxia.To this end, we cultured EA-and AA-derived TNBC cell lines under normoxic (21% O 2 ) and hypoxic (1% O 2 ) conditions and performed ATAC-seq.Genome-wide ing of open chromatin across a number of chromosomes (Fig. 4A).To look at TF binding, we performed a digital footprinting analysis of differentially open chromatin regions using TOBIAS.While the most differentially expressed transcription factors (DETF) showed changes in the same general direction when comparing EA with AA under both normoxia and hypoxia, the magnitude of these changes showed variations; most hovered near zero, but a core set of TFs within each donor group deviated more dramatically (Fig. 4B; Supplementary Table S7).In EA TNBC, this core set of EA-specific hypoxia response TFs once again included members of the AP-1 TF family, which have been found to be induced in hypoxic environments (48), as well as NFE2 and NF2L1/2 which have been linked to Wnt signaling in breast cancer (49).In AA TNBC, this core set of AA-specific hypoxia response TFs includes six members of the large multigene family of Sp/Kruppel-like factor (KLF) TFs, all of which have been linked to altered cancer cell metabolism; further, three members of the Sp subfamily (SP1, SP2, SP3) have been shown to regulate hypoxia through direct interaction with HIF1a (50).Also, among this core set of AA-specific hypoxia response TFs was KAISO, which has been shown to regulate HIF1a specifically during hypoxia (51) and whose aberrant activity is heavily associated with tumors from AA patients.
To better understand how differentially bound TFs involved in the ancestrydriven hypoxia response might act in concert to dictate downstream gene expression, we constructed a protein-protein interaction (PPI) network using the STRING database (27), allowing for better discernment of functional relationships between TFs (Fig. 4C and D).Within the core set of EA-specific hypoxia-induced TFs, a significantly enriched PPI network was observed (P < 1.0 × 10 −16 ), with a high degree of TF coexpression and shared functions (average local clustering coefficient = 0.81) among the small network (nodes, n = 10, edges, n = 27); as expected, DETFs from AP-1 family, namely the JUN and FOS TF subfamilies, showed highly robust interactions, likely reflecting the homodimeric and heterodimeric functional relationships between these proteins.We performed a pathway enrichment analysis using REACTOME based on the TF network analysis to identify shared biologic function of significant TF clusters (Fig. 4C); the top FDR-significant pathways included the pathways involved in proliferation (MAPK targets; FCERI-mediated MAPK activation), anti-inflammatory cytokine signaling (IL4 and IL13 signaling; signaling by ILs), and activation of the AP-1 family of TFs.Within the core set of AA-specific hypoxia-induced TFs, a significantly enriched PPI network was also observed (P < 1.0 × 10 −16 ), which was more expansive and extensive than that of the EA PPI network (nodes, n = 55, edges, n = 148).TP53 is one of the most prominent nodes featured in this network, with high degree of connectivity and central to several other key nodes, such as SP-1, EGR-1, and KLF-4.In pathway enrichment analyses using REACTOME, 33 pathways were identified with an FDR < 0.01, 15 of which are shown in Fig. 4D.Top pathway themes in the AA TF network were many pathways related to TP53, cell cycle (E2F; cyclin D and G1; PTEN; transcription of E2F targets; TP53 regulation transcription of cell-cycle genes; etc.), and cell stress/apoptosis (activation of PUMA; activation of NOXA; oncogene-induced senescence).
We next performed a differential ATAC signal analysis using DESeq2 to better understand downstream genes that may be impacted by the increased activity of each of the ancestry-specific hypoxia response TFs (Table 4; Supplementary Table S8).Across all genomic elements, we identified 5,643 differential peaks in EA TNBC and 7,037 in AA TNBC, mapped to 3,595 and 4,618 unique gene annotations, respectively.Specifically in promoters, we identified 491 genes that were differentially accessible in EA and 1,001 that were differentially accessible in AA, with an increased access to them for TFs.Within promoter regions, the top DEGs upregulated in EA was MAGI, a junctional scaffold protein that acts as a tumor suppressor in the context breast cancer through inhibition of the p38 stress pathway (52), and SMG, which has been implicated in DNA repair and telomere maintenance (53).The top DEGs upregulated in AA was endoplasmic reticulum-associated protein ZDHHC, implicated in innate immune response and a positive regulator of the STING pathway (54), and ADAP, which has been shown to promote cancer progression by inducing cell migration and invasion (55).All together, these data suggest that TNBC cells may have distinct, donor ancestry-specific stress responses to hypoxia.

Discussion
Here, we adopted a reductionist approach to better understand the contribution of patient ancestry to TNBC at the chromosomal level.In this study, we uncovered an ancestry-specific chromatin landscape across nine commonly used human TNBC cell lines.Through chromatin profiling by ATAC-seq, we found that TNBC cell lines displayed separation of chromatin profiles by donor ancestry.ATAC-seq is uniquely positioned to explore TF activity and binding; therefore, we were especially interested in characterizing ancestry-specific TFs, as these proteins are master regulators of chromatin and downstream gene expression and their dysregulation is frequently implicated in cancer initiation and progression.
EA-derived TNBC lines showed an overabundance in motifs associated with members of AP-1 TF families that are linked to cancer cell growth and proliferation.AA-derived TNBC lines displayed strong overabundance in motifs associated with members of the Grainyhead, Forkhead Box, and AP-2 TF, which have been extensively linked to a pro-oncogenic and pro-metastatic tumor biology, notably including cancer cell invasion, epithelial-to-mesenchymal transition, angiogenesis, and poor prognosis in breast cancer.Digital footprinting analysis identified differentially bound TFs, which suggests activity at the time of the assay and may be of particular functional relevance, further bolstered these findings.Indeed, EA and AA TNBC cell lines exhibited increased binding of 17 and 62 total TFs, respectively.Particularly interesting were the differentially bound TFs specific to African ancestry, of which a majority had robust links to epithelial-to-mesenchymal transition, as well as many associated with cancer stemness, chemotherapeutic resistance, cancer cell proliferation, and the p53 pathway.Notably, we found increased binding of KAISO in AA TNBC lines.Several studies have shown KAISO to be highly and preferentially expressed in TNBC tissues of women of African descent (56,57).In both breast and prostate cancer, KAISO promotes cell migration and invasion through silencing of methylated genes that promote epithelial-to-mesenchymal transition (38).Indeed, its aberrant expression has been previously linked to a distinct biology and poor outcomes in individuals of African descent (57).Our study further corroborates these findings at the chromatin level, showing ancestryspecific binding of this critical and multifunctional transcriptional regulator.
We performed validation of these findings in human patients, identifying a number of upstream transcriptional regulators enriched in TNBC tumors from TCGA that overlapped with those found in our cell lines, which included 22 shared between patient tumors and cell lines of predominantly European ancestry, and 25 shared between patient tumors and cell lines of predominantly African ancestry; these TFs may be of particular interest for future follow-up  studies.Overall, we observed an African ancestry-related TF profile in TNBC that is largely associated with a more aggressive and pro-metastatic tumor biology.
Upon examining how these ancestry-specific TF profiles could impact gene expression, particularly of those that fell within promoter regions, we identified a number of interesting candidate genes upregulated in TNBC cell lines of African ancestry, including DNMT.A DNA methyltransferase, DNMT expression has been shown to be overexpressed specifically in TNBC associated with poor survival (40).It functions include a repression of estrogen receptor expression and signaling, promotion of epithelial-to-mesenchymal transition, activation of cellular autophagy, and cancer stem cell growth, achieving these functions through the hypermethylation of the estrogen receptor promoter region, tumor suppressor genes, and other cancer progression inhibiting factors (40).DNMT1 inhibitors have shown antitumorigenic activity and enhance sensitivity to immunotherapies.Our findings suggest that targeting this candidate gene may have preferential benefit for patients of African descent and more extensive follow-up studies are warranted to better understand its ancestryspecific role.Pathway enrichment analyses identified numerous significantly enriched pathways upregulated in AA TNBC lines, including those involved in cancer metabolism, inflammation, and cell adhesion.One particularly interesting finding was the enrichment of inflammatory mediator regulation of TRP channels specific to African ancestry, driven in part by the upregulation of several members of the TRP calcium ion channel superfamily, including TRPM, TRPM, TRPV, and TRPS.TRP channels and their regulation of Ca 2+ signaling have been shown to induce cancer proliferation, metastasis, and drug resistance.Accordingly, they are emerging as novel candidates for therapeutic targeting; yet their connection to African ancestry was previously unknown prior to these findings, which may now warrant further investigation to evaluate their potential for enhanced efficacy in AA women with breast cancer.While it was perhaps counterintuitive to find enriched pathways in estrogen response, closer examination of the genes responsible for driving that overrepresentation have functions outside of estrogen response and known roles in TNBC, including SLCA, which is implicated in epithelial-to-mesenchymal transition in TNBC (58), FAMA, which is differentially expressed in BRCA-mutated cancers (59) that tend to be triple-negative, and CDH, which is linked to TNBC proliferation and invasion (60).Furthermore, AA-associated DEGs driving the putative estrogen response pathway included several genes associated with cellular junctions, such as ZO-/TJP, CELSR, claudin-, which is overexpressed in TNBC and associated with worse outcomes (61).
Of interest, one upregulated pathway related to African ancestry was hypoxia, which commonly develops in TNBC and may vary in its extent by ancestral background, as suggested recently (47).This prompted us to further explore how TNBC cells of differing ancestral backgrounds responded to hypoxic conditions.This led us to identify a core set of differentially bound ancestry-specific TFs exacerbated under hypoxia.A tight network with high TF coexpression emerged in EA-derived TNBC, with enrichment of MAPK signaling and several anti-inflammatory cytokines.In TNBC of African ancestry, a large, extensive TF network with numerous nodes and connections was constructed.Among the many significantly enriched pathways related to these hypoxiainduced TFs, common themes of oxidative stress, DNA damage, apoptosis, and cell cycle emerged, with p53 appearing as a central node.Upon examination of top DEGs from DARs in promoter regions, top genes specific to EA-derived TNBC cells were associated with tumor suppressive and antiproliferative roles, while top genes specific to AA-derived TNBC cells were implicated in STINGdriven immune response and cell migration/invasion.When taken together, these data not only enhances our understanding of ancestry-specific hypoxia response, but at a broader level may also suggest that the underlying molecular differences in TNBC by genetic ancestry could contribute to distinct response to other experimental manipulations and perturbations.This could have implications for mechanistic work, drug discovery, response to therapy, and other work based on in vitro models, thereby further underscoring that donor ancestry should be considered in experimental design.
Understanding the contribution of genetic ancestry to breast cancer disparities in disease aggressiveness and outcomes is a current research priority in the field.Work championed by several investigators in this field has moved the needle considerably in evaluating African ancestry as a true biologic determinant of TNBC separate from race (6,7,62,63).Indeed, several notable studies have identified ancestry-related gene signatures in human TNBC tumors (7,63,64), which were largely dominated by differential immune cell infiltrates and activation.Here, we explore this important scientific question through the lens of epigenetic, rather than transcriptional regulation.This gives us the opportunity to expand our understanding of genetic ancestry by looking at the upstream master regulators of gene expression-TFs-which were central to our investigation.Our findings have shown that African ancestry in TNBC is associated with the distinct activity of critical TFs and downstream gene expression changes that may contribute in part to observed disparities in tumor aggressiveness in this population group.Furthermore, our study focuses on human cell lines, rather than patient tumors, to explore genetic ancestry at the molecular level independent of common confounders that are inextricably linked to human cohorts, such as socioeconomic status, access to quality health care, comorbidities, lifestyle factors, and other socioenvironmental factors that play central roles to cancer disparities, thereby allowing this in vitro-based study to further bolster existing evidence in the field that genetic ancestry may contribute to worsened TNBC biology.
Finally, the findings from this study may have broader implications beyond TNBC.Human cancer cell lines are foundational to basic science discovery, drug development, and preclinical research.The cell lines selected in this study represent common breast cancer lines used routinely in cancer research at the bench.We found that the cellular behavior and response to experimental conditions can vary as a result of ancestral origin.This suggests that the ancestral origin of patient-derived cell lines matters and the development and use of diversely sourced cell lines should be considered in experimental design in in vitro studies.Not only can this approach improve study rigor by reducing erroneous biologic variation between studies, but it also represents a step toward improving inclusivity and increasing health equity even at this most basic level in biomedical research.

FIGURE 1
FIGURE 1 Genome-wide landscape of chromatin accessibility in TNBC cell lines derived from EA and AA women.A, Fragment length distribution plot for TNBC cell lines.Peak < 150 bp indicates a nucleosome-free peak; peak between 150 and 300 bp indicates a mononucleosome peak.FRiP scores greater than 0.3 indicate high quality of the data.Technical replicates (n = 2) were averaged for each cell line.B, Summary of peak distribution across all samples.C, Number of peaks by TNBC cell line.All samples had high peak numbers (∼75,000) covering promoters and intergenic regions.D, Principal component analysis showing separation of the chromosome accessibility landscape by ancestry for the nine TNBC cell lines.Technical replicates (n = 2) were averaged for each line.E, Chromosome map showing the distribution of DARs between EA and AA TNBC cells (FC > 2, FDR < 0.01).

FIGURE 2
FIGURE 2 Patterns of differential TF binding in open chromatin regions among TNBC cell lines derived from EA and AA women.A, Overabundance of known TF motifs within differentially open regions.Heat map of top 50 most variable TFs by cell line donor ancestry based on differential ATAC signal analysis.B, Volcano plot of predicted differences in TF binding scores by patient group using digital footprinting analysis with TOBIAS.Open chromatin regions defined by TF binding sites are significantly different between cell lines from AA and EA patients.A total of 79 TFs were differentially bound in AA versus EA TNBC lines (points in red indicate TFs with differential binding score > |0.225| and FDR<0.01).A differential binding score of greater than 0.225 indicates increased binding in EA.A differential binding score of less than −0.225 indicates increased binding in AA.C, Top-ranked 10 TFs with differential binding capacities in the two donor groups identified through digital footprinting analysis.

FIGURE 3
FIGURE 3 Differential ATAC signal analysis of human TNBC cell lines identified ancestry-associated genes within differentially accessible chromatin regions across all genomic elements.Genes are displayed on a volcano plot, where the y-axis shows statistical significance (plotted as −log 10 FDR) and the x-axis shows the differences between each ancestry group (plotted as log 2 fold change).The horizontal dashed lines depict our specified significance cut-off points, with genes that showed an FDR < 0.05 and a |log 2 FC| > 1 shown in red.Negative log 2 fold change values show genes differentially open or up in AA donor cell lines, while positive log 2 fold change values show genes differentially open or up in EA donor cell lines.

FIGURE 4
FIGURE 4 Hypoxia exacerbates donor ancestry-related differences in TF activity in cultured TNBC cell lines.A, Chromosome map showing the distribution of DARs between EA and AA TNBC cells when subjected to hypoxic conditions (FC > 2, FDR < 0.01).B, Differential digital footprinting in EA-and AA-derived TNBC lines was performed using TOBIAS to identify open chromatin regions and their TF binding sites.Shown are the top AA-or EA-associated TFs whose predicted binding activity increased by a defined magnitude of >0.05 under hypoxia.PPI network of TFs with differential binding capacities (DBS > 22.5, FDR < 0.01) that are up in EA (C) and AA (D) under hypoxia using the STRING database.Top 15 significantly enriched pathways (FDR < 0.01, Reactome-based) based on a TF network analysis.
).Sequencing was performed on an Illumina NextSeq 2000 machine at the Sequencing Facility, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research using standard protocols.
Briefly, 16 mRNA sequencing samples were pooled and sequenced on NextSeq 2000 P2 using Illumina Stranded mRNA Ligation Kit and paired-end sequencing.The samples had 56 to 69 million pass filter reads with more than 95% of bases above the quality score of Q30.The RNA sequencing (RNA-seq) data for the human breast cancer cell lines were deposited in the NCBI's GEO database under accession number GSE223181.

TABLE 1
TNBC cell lines derived from women of EA and AA descent Ancestry estimates derived from the ECLA database from the Moffitt Cancer Center.Our ancestry estimates used our RNA-seq data from these cell lines and were concordant with ECLA.
a b Cell line annotations obtained from the ATCC.

TABLE 2
Upstream regulator analysis of TCGA breast cancer data identifies patient group-associated TFs that have a common association with ATAC-seq signals in the TNBC human cell lines

Upstream Regulator Analysis using TCGA RNA-seq Data a TNBC Tumors from European American Patients (n = 78)
RNA suspected to function as a tumor suppressor, followed by LRIG, which interestingly also has an anti-invasive role in basal-like breast cancer cells through encouraging mesenchymal-to-epithelial transition (ref.39; a TFs displayed that specifically overlapped with those identified by cell line ATAC analyses.noncoding

TABLE 3
Top genes and pathways enriched in TNBC cell lines in association with patient ancestry

Top Enriched Pathways from Differentially Open Chromatin Regions in AA TNBC (FDR < 0.2) Top enriched cancer hallmark pathways Adjusted P value Pathway enrichment score
analysis of differentially accessible chromatin sites (FDR < 0.01 and |log 2 FC| > 2) in EA compared with AA TNBC under hypoxia revealed distinctive pattern-

TABLE 4
Top DEGs upregulated under hypoxia in EA and AA TNBC cell lines