Abstract
Breast cancer incidence in the United States is lower in Hispanic/Latina (H/L) women than in African American/Black or non-Hispanic White women. An Indigenous American breast cancer–protective germline variant (rs140068132) has been reported near the estrogen receptor 1 gene (ESR1). This study tests the association of rs140068132 and other polymorphisms in the 6q25 region with subtype-specific breast cancer risk in H/L women with high Indigenous American ancestry.
Genotypes were obtained for 5,092 Peruvian women with (1,755) and without (3,337) breast cancer. Associations of the protective variant with overall and subtype-specific breast cancer risk were tested using logistic regression models and conditional analyses that included other risk-associated polymorphisms in the region.
We replicated the reported association between rs140068132 and overall breast cancer risk [odds ratio (OR), 0.53; 95% confidence interval (CI), 0.47–0.59], as well as the lower odds of hormone receptor–negative (HR−) versus HR+ disease (OR, 0.77; 95% CI, 0.61–0.97). Models including HER2 showed further heterogeneity, with reduced odds of HR+HER2+ (OR, 0.68; 95% CI, 0.51–0.92), HR−HER2+ (OR, 0.63; 95% CI, 0.44–0.90), and HR−HER2− (OR, 0.77; 95% CI, 0.56–1.05) disease compared with HR+HER2−. Inclusion of other risk-associated variants did not change these observations.
The rs140068132 polymorphism is associated with decreased risk of breast cancer in Peruvians and is more protective against HR− and HER2+ disease, independently of other breast cancer–associated variants in the 6q25 region.
These results could inform functional analyses to understand the mechanism by which rs140068132-G reduces risk of breast cancer development in a subtype-specific manner. They also illustrate the importance of including diverse individuals in genetic studies.
Introduction
Breast cancer is the most common cancer and the second leading cause of cancer-related death among women in the United States (US; ref. 1). Analyses stratified by race and ethnicity categories have revealed differences in breast cancer incidence and subtype distribution among diverse populations (2). These differences are the result of structural, environmental, and genetic factors (3–6).
The genetic background of Latin American populations has been shaped by the population history of the Americas and by colonization (7–10), leading to wide geographical variation in the average contribution of the different continental ancestry components (7, 10). Estimates of average European ancestry in individuals from Latin American countries vary from 84% in Uruguay (11) to 18% in Peru (12), with intermediate values in countries such as Mexico and Chile (10, 12). A complementary range of variation in Indigenous American ancestry has been described, with Peru and Bolivia having average Indigenous American ancestry estimates of approximately 80% (10, 12, 13). The African ancestry component in Latin America ranges from as much as approximately 77% in the Caribbean (14, 15) to less than 5% in countries such as Peru or Argentina (10). Peru has one of the largest contributions of Indigenous American ancestry in Latin America, making it an ideal population for identifying Indigenous American–specific trait- and disease-associated genetic variants.
The first breast cancer genome-wide association study (GWAS) conducted in Hispanic/Latina (H/L) women from the US and Mexico identified a single-nucleotide polymorphism (SNP), rs140068132A>G, that was correlated with Indigenous American ancestry and associated with lower odds of developing breast cancer (16). This variant is relatively common in Latin American populations, with frequencies ranging from 5% in Puerto Rico to 23% in Peru (17). In addition to its overall protective association, the G allele was associated with lower odds of estrogen receptor–negative (ER−) than of ER+ disease (16).
The rs140068132 SNP is located at 6q25 within an intergenic enhancer near the ER gene (ESR1), which has been shown to be a transcription factor–binding site (16). This region harbors multiple polymorphisms associated with breast cancer risk in populations of European and Asian ancestry (18–25), which have shown subtype-specific effects (18, 19, 22, 26) and have been identified as expression quantitative trait loci (eQTL) for ESR1 and other genes at 6q25 (24, 27). Among Europeans, five SNPs (rs3757322, rs9397437, rs851984, rs9918437, and rs2747652) define five independent loci associated with breast cancer risk (18). The SNPs rs12662670 and rs2046210 have been extensively reported as breast cancer susceptibility variants in Asian populations, as well as in populations of European descent (22, 24, 28). Although some of the index risk SNPs reported in other populations have comparable frequencies in Latin American populations, they did not show strong associations in previous studies of H/L patients (16, 29), likely because of different patterns of linkage disequilibrium (LD) between these populations. The second GWAS in H/L women identified two independent loci, defined by rs3778609 [odds ratio (OR), 0.76; 95% confidence interval (CI), 0.69–0.83] and rs851980 (OR, 1.28; 95% CI, 1.18–1.35), that are not in LD with rs140068132 (29). A meta-analysis of European and Asian populations confirmed these associations, although with attenuated effects (29).
This study aimed to test the extent to which rs140068132 is associated with breast cancer subtype–specific risk beyond ER status in the Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC). In addition, we evaluated previously reported SNPs in the region, their subtype-specific association, and their relationship with rs140068132.
Materials and Methods
Study participants
The PEGEN-BC study
As of March 2022, we had recruited 2,156 participants from the Instituto Nacional de Enfermedades Neoplásicas (INEN) in Lima, Peru, the largest cancer hospital in the country. Details of the study have been described previously (30). Briefly, women were invited to participate if they had been diagnosed with invasive breast cancer in 2010 or later and were 21 to 79 years of age at diagnosis. Approximately 70% of the patients invited to participate in PEGEN-BC provided written informed consent and were included in the study. Demographic and clinical data were abstracted from electronic medical records (30). A blood sample was drawn by a certified phlebotomist at the INEN central laboratory. The current analysis includes a subset of 1,755 patients with available genotype data. Patient and tumor characteristics of PEGEN-BC participants are similar to those previously described for the overall INEN breast cancer population (31). This study was approved by the INEN and University of California Davis Institutional Review Boards.
The pregnancy outcomes, maternal and infant cohort study
This study recruited 3,347 women aged 18 or older who attended prenatal care clinics at the Instituto Nacional Materno Perinatal (INMP) in Lima, Peru (32). The pregnancy outcomes, maternal and infant cohort study (PrOMIS) participants were included in the present study as convenience controls to be compared with the PEGEN-BC study participants in association analyses focused on the 6q25 region. The average allele frequencies in this large non–cancer-focused study are expected to provide an estimate of allele frequencies for the general population of Lima, Peru. The study was approved by the institutional review boards of the INMP and the Office of Human Research Administration, Harvard T.H. Chan School of Public Health (Boston, MA). All participants provided written informed consent.
Tumoral tissue samples and subtype classification
Tumor tissues were obtained before treatment from core biopsies or freshly resected invasive breast cancers and were formalin-fixed and paraffin-embedded following standard protocols at INEN. Tumor subtypes were defined using immunohistochemical (IHC) markers by a certified pathologist at INEN. Hormone receptor (HR) status was defined by ER and progesterone receptor (PR) expression, with HR positivity defined as staining for these markers in 1% or more of cells. HER2 positivity was defined as 3+ staining by IHC or as gene amplification detected by FISH following a borderline IHC result. These markers were used to classify tumors as HR+HER2−, HR+HER2+, HR−HER2+, or HR−HER2− (30). IHC marker information was incomplete for 6% of patients; therefore, analyses by tumor subtype included 1,654 participants.
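The subtype rules above can be expressed as a small classification function. This is a minimal sketch, not the study's pipeline: the function names are hypothetical, and it assumes that HR+ means ER or PR staining in at least 1% of cells, that a "borderline" IHC result corresponds to a score of 2+, and that ASCII "-" stands in for "−".

```python
from typing import Optional


def hr_status(er_pct: float, pr_pct: float, threshold: float = 1.0) -> str:
    """HR+ if ER or PR stains at least `threshold` percent of cells (assumed: either marker suffices)."""
    return "HR+" if max(er_pct, pr_pct) >= threshold else "HR-"


def her2_status(ihc_score: int, fish_amplified: Optional[bool] = None) -> Optional[str]:
    """HER2+ if IHC 3+, or if a borderline IHC result (assumed to be 2+) is FISH-amplified."""
    if ihc_score == 3:
        return "HER2+"
    if ihc_score == 2:
        # Borderline result: status depends on FISH; None if FISH is unavailable.
        if fish_amplified is None:
            return None
        return "HER2+" if fish_amplified else "HER2-"
    return "HER2-"


def tumor_subtype(er_pct: float, pr_pct: float, ihc_score: int,
                  fish_amplified: Optional[bool] = None) -> Optional[str]:
    """Return e.g. 'HR+HER2-', or None when HER2 status cannot be resolved."""
    her2 = her2_status(ihc_score, fish_amplified)
    if her2 is None:
        return None
    return hr_status(er_pct, pr_pct) + her2
```

Tumors for which the function returns None would be handled like those with incomplete IHC information, that is, left out of subtype-specific analyses.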
Genotyping, quality control, and imputation
The study focuses on selected SNPs previously associated with breast cancer risk at the 6q25 locus; however, genome-wide genotype data were used to estimate continental ancestry and to perform principal component analyses (PCA), as described below. Genotype data were obtained with the Affymetrix Precision Medicine Research Array for PEGEN-BC study participants (30) and the Illumina Multi-Ethnic Global Array for PrOMIS study participants (32). Quality control (QC) of the genome-wide genotype data was performed in PLINK v.1.9 (33) on each dataset separately. First, markers on the sex chromosomes were excluded. SNPs with more than 2% missingness, that deviated from Hardy–Weinberg equilibrium at P < 5×10−5, or with a minor allele frequency below 5% were removed. Individuals missing more than 5% of genotype information were excluded. Genetically related pairs of individuals (>12.5% relatedness) were identified using KING v2.2.5 (34), and related individuals were removed (75 from PEGEN-BC and 105 from PrOMIS). After these QC steps, we identified 55,322 SNPs (excluding palindromic SNPs) shared between the PEGEN-BC dataset (1,755 participants) and the PrOMIS dataset (3,337 participants), which were used for ancestry estimation and PCA. The average genotype call rate for rs140068132 was 0.99 in both cases and controls.
We imputed missing genotypes in the 6q25 region for each dataset separately using the Michigan Imputation Server (35), with individuals from 1000 Genomes Project phase 3 as the reference panel, and filtered out poorly imputed variants (r2 < 0.3). We extracted imputed genotypes for 22 SNPs in the 6q25 region previously reported to be associated with breast cancer (Supplementary Table S1) to include in the analyses of rs140068132 and tumor subtype in PEGEN-BC cases only (n = 1,654 with available tumor subtype data) and in the case–control analyses (1,755 cases, 3,337 controls). These variants were identified from published genome-wide association and fine-mapping studies with and without functional analyses (16, 18–20, 23–25, 29, 36–39). Among these 22 variants, which have been associated with breast cancer risk in other populations, we selected those that met the following conditions: (i) a P value < 0.15 in a case–control regression analysis adjusted for rs140068132; (ii) an OR in the case–control association analysis in the same direction as reported in the literature; and (iii) low LD with rs140068132. Among SNPs in high LD with each other, we selected the one with the strongest evidence of functional activity in the literature and/or the best RegulomeDB score (40). This process led to the selection of five SNPs: rs851984, rs9918437, rs3778609, rs2228480, and rs3798758. Among cases, the imputation quality (r2) of rs851984, rs9918437, rs3778609, rs2228480, and rs3798758 was 0.95, 0.70, 0.99, 0.60, and 0.92, respectively. Among controls, the r2 for rs851984, rs9918437, rs3778609, and rs3798758 was 0.99, 0.98, 0.99, and 0.82, respectively; rs2228480 was directly genotyped in controls, with an average call rate of 0.99.
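The marker- and sample-level QC thresholds can be illustrated with NumPy on a genotype matrix coded as 0/1/2 allele counts. This is a hypothetical sketch rather than PLINK's implementation: it applies the SNP filters before the individual filter, omits the Hardy–Weinberg test and relatedness pruning, and assumes missing genotypes are stored as NaN.

```python
import numpy as np


def qc_filter(geno, snp_missing_max=0.02, maf_min=0.05, ind_missing_max=0.05):
    """Return boolean masks (individuals, SNPs) to keep.

    geno: individuals x SNPs array of 0/1/2 allele counts, with NaN for missing calls.
    """
    geno = np.asarray(geno, dtype=float)
    missing = np.isnan(geno)

    # SNP-level filters: call missingness and minor allele frequency (MAF).
    snp_missing = missing.mean(axis=0)
    alt_freq = np.nanmean(geno, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep_snps = (snp_missing <= snp_missing_max) & (maf >= maf_min)

    # Individual-level filter: missingness computed over the retained SNPs only.
    ind_missing = missing[:, keep_snps].mean(axis=1)
    keep_inds = ind_missing <= ind_missing_max
    return keep_inds, keep_snps
```

The thresholds mirror the text (drop SNPs with more than 2% missingness or MAF below 5%, and individuals missing more than 5% of genotypes); in practice these filters are applied with PLINK itself.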
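The three selection rules, together with the r2 ≥ 0.3 imputation filter, can be written as a predicate over per-SNP summary statistics. The class and field names are hypothetical, and the text does not state the LD cutoff used for "low LD", so the ld_max = 0.2 default is purely an assumption. The final choice among SNPs in high LD with each other relied on functional evidence and RegulomeDB scores and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary; field names are illustrative."""
    rsid: str
    imputation_r2: float   # imputation quality
    p_adjusted: float      # case-control P value, adjusted for rs140068132
    or_local: float        # OR in the Peruvian case-control analysis
    or_reported: float     # OR previously reported in the literature
    ld_r2_index: float     # LD (r^2) with rs140068132


def passes_selection(snp: SnpSummary, min_imp_r2: float = 0.3,
                     p_max: float = 0.15, ld_max: float = 0.2) -> bool:
    """Apply the stated criteria; ld_max is an assumed cutoff for 'low LD'."""
    well_imputed = snp.imputation_r2 >= min_imp_r2
    suggestive = snp.p_adjusted < p_max
    same_direction = (snp.or_local > 1) == (snp.or_reported > 1)
    low_ld = snp.ld_r2_index < ld_max
    return well_imputed and suggestive and same_direction and low_ld
```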
Identification of structure in the distribution of continental genetic variation
We conducted PCA on unrelated individuals to capture components of genetic structure to be used as covariates in case–control and case-only analyses, and to confirm that variation between cases and controls was not driven by differences in genotyping array. Genotype data were pruned using PLINK v.1.9 (ref. 33; window size = 50, step size = 5 variants, variance inflation factor threshold = 2) and merged with data from the 1000 Genomes Project (17): Admixed Americans (Peru, Colombia, Mexico, and Puerto Rico), Europeans (Americans with Northern and Western European Ancestry, Italy, Spain, Finland, and Scotland), East Asians (China, Japan, and Vietnam), and African populations (Nigeria, Kenya, Gambia, and Sierra Leone). The PEGEN-BC study includes a large proportion of patients with >98% Indigenous American ancestry, as previously reported (30), and therefore provides a source of non-admixed reference samples for this component. PCA was performed on the merged dataset of 30,875 variants. t-Distributed stochastic neighbor embedding (t-SNE) was performed in R 3.6.0 (41) with the Rtsne package (42) using the first 30 principal components (PCs) and a perplexity of 30. Individual global genetic ancestry was estimated with ADMIXTURE (refs. 43, 44; unsupervised, K = 4) on the same dataset.
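The core PCA step can be sketched as a singular value decomposition of the standardized genotype matrix. This illustrative sketch assumes an LD-pruned, complete (no missing data) matrix; it does not reproduce the merge with 1000 Genomes samples, the t-SNE embedding, or ADMIXTURE, and the function name is hypothetical.

```python
import numpy as np


def genotype_pcs(geno, n_pcs=10):
    """Top principal-component scores for an individuals x SNPs matrix of 0/1/2 counts.

    Assumes an LD-pruned matrix with no missing genotypes.
    """
    geno = np.asarray(geno, dtype=float)
    freq = geno.mean(axis=0) / 2.0
    polymorphic = (freq > 0) & (freq < 1)  # monomorphic SNPs carry no information
    g = geno[:, polymorphic]
    f = freq[polymorphic]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2 * f) / np.sqrt(2 * f * (1 - f))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]
```

With n_pcs = 10 the returned columns correspond to the "first 10 PCs" used as covariates; the sign of each PC is arbitrary.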
Data availability
Genotype data for rs140068132, the 22 additional SNPs in the 6q25 region, and global genetic ancestry estimates are available upon request.
Statistical analysis
Descriptive analyses of relevant clinical characteristics, genetic ancestry and rs140068132 genotypes
Differences in characteristics between tumor subtypes were tested by means of one-way ANOVA for normally distributed continuous variables, Kruskal–Wallis tests for non-normally distributed continuous variables, and χ2 tests for categorical variables.
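The choice of test by variable type can be sketched with SciPy. The wrapper names are hypothetical, and whether a variable counts as normally distributed is passed in as a flag because the text does not say how normality was assessed.

```python
from scipy import stats


def continuous_pvalue(groups, normally_distributed):
    """P value comparing a continuous variable across subtypes (one sample per subtype)."""
    if normally_distributed:
        return stats.f_oneway(*groups).pvalue  # one-way ANOVA
    return stats.kruskal(*groups).pvalue       # Kruskal-Wallis H test


def categorical_pvalue(table):
    """Chi-squared test of independence; rows = categories, columns = subtypes."""
    return stats.chi2_contingency(table)[1]    # element 1 is the P value
```

Note that chi2_contingency applies Yates' continuity correction only to tables with one degree of freedom, so it does not affect a comparison across four subtypes.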
Case–control association analysis
Single-SNP case–control association analysis was performed using binomial logistic regression models that included the first 10 PCs of genetic variation as covariates (1,755 cases and 3,337 controls). We did not include age at diagnosis/recruitment as a covariate in the case–control analysis because of the limited overlap in age distribution between PEGEN-BC cases (mean 52 ± 11.0 years) and PrOMIS controls (28 ± 6.3 years). In previous case–control analyses, adding age at diagnosis as a covariate did not affect the estimated coefficient or P value for rs140068132 (16). A conditional analysis was performed by including previously associated SNPs in the model to test whether the effect of rs140068132 is independent of these polymorphisms. We complemented this analysis with haplotype association analyses.
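The additive case–control model, in which the SNP is coded as the number of G alleles and the PCs enter as covariates, can be sketched with a small Newton–Raphson logistic regression in NumPy. This is not the authors' code (analyses were run in R), additive allele coding is an assumption, and the 95% CI is a Wald interval. For the conditional analysis, the other 6q25 SNPs would simply be added as extra columns of covariates.

```python
import numpy as np


def snp_odds_ratio(y, snp, covariates, n_iter=25):
    """Logistic regression of case status on SNP allele count plus covariates.

    y: 0/1 outcome; snp: 0/1/2 allele counts; covariates: n x k array
    (e.g. the first 10 PCs, plus other SNPs for a conditional model).
    Returns (OR, lower 95% CI, upper 95% CI) for the SNP term, using a Wald interval.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), snp, covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):  # Newton-Raphson (IRLS) updates
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        information = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(information, X.T @ (y - p))
    # Recompute the information matrix at the final estimates for the SE.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    se = np.sqrt(cov[1, 1])
    return np.exp(beta[1]), np.exp(beta[1] - 1.96 * se), np.exp(beta[1] + 1.96 * se)
```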
Association between rs140068132 and tumor subtype
Multinomial and binomial multivariable logistic regression analyses were performed in PEGEN-BC cases (N = 1,654) and controls (N = 3,337), with the first 10 PCs of genetic variation as covariates and controls as the reference group. In case-only analyses, age at diagnosis was included as a covariate and the HR+HER2− subtype was the reference group. An additional model including region of residence as a covariate was tested to account for residual confounding due to potential regional variation in allele frequency within the Indigenous American ancestry component and variation in subtype distribution among INEN patients from different regions. A case-only conditional regression model was used to test the independent association between rs140068132 and tumor subtype by including the previously reported breast cancer–associated SNPs that remained significant in the conditional case–control regression analysis (rs851984G>A, rs9918437G>T, and rs3778609C>T). Analyses were adjusted for age at diagnosis and the first 10 PCs of genetic variation.
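Written out, the multinomial model for outcome category k, assuming additive coding of the rs140068132 G allele (G_i = 0, 1, or 2 copies), takes the standard form below; this is offered as a sketch of the model class, not as an equation given by the authors.

```latex
\log \frac{\Pr(Y_i = k)}{\Pr(Y_i = 0)}
  = \alpha_k + \beta_k G_i + \sum_{j=1}^{10} \gamma_{kj}\,\mathrm{PC}_{ij},
  \qquad \mathrm{OR}_k = e^{\beta_k}
```

In the models with controls as the reference, Y_i = 0 denotes a control and k indexes the four tumor subtypes; in the case-only models, Y_i = 0 denotes HR+HER2− and a term \delta_k\,\mathrm{age}_i is added to the right-hand side.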
Haplotype association analysis
The four SNPs that remained associated with breast cancer risk in the conditional case–control model (rs140068132, rs851984, rs2228480, and rs3798758) were included in a haplotype analysis to estimate the magnitude of their combined effects. Individual haplotypes defined by these four SNPs were obtained for case–control association analysis using the haplo.stats R package (45), which uses an optimized expectation–maximization algorithm to estimate the posterior probabilities of haplotype pairs conditional on unphased genotype data (45). We excluded participants whose haplotype assignment had a posterior probability <70% and haplotypes with a frequency <5%. For the haplotype-specific case–control analysis, 8 haplotypes were tested independently in 1,132 cases and 2,076 controls, using zero copies of the haplotype as the reference. Alleles within each haplotype are shown as digits within square brackets. To facilitate interpretation, we coded the risk allele as 1 and the non-risk allele as 0 for all SNPs; therefore, the minor alleles of rs140068132 and rs2228480 are denoted by 0 because of their protective effect. The order of variants within each haplotype is rs140068132, rs851984, rs2228480, rs3798758. For case–control estimation, binomial logistic regression models were used to estimate haplotype-specific ORs, with each haplotype modeled as a continuous variable (0, 1, or 2 for zero, one, or two copies of the haplotype). All analyses were adjusted for the first 10 PCs.
P values ≤ 0.05 were considered statistically significant. All analyses were conducted in R v.3.6.0 (41).
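Starting from haplotype-pair assignments such as those produced by haplo.stats, the posterior and frequency filters and the 0/1/2 haplotype dosage can be sketched as follows. The input layout (a dict mapping each participant to a most likely haplotype pair and its posterior probability) and the function names are hypothetical; each haplotype is a string of risk-allele codes in the order rs140068132, rs851984, rs2228480, rs3798758. Computing haplotype frequencies among confidently assigned participants only is also an assumption.

```python
from collections import Counter


def confident_pairs(pairs, min_posterior=0.70):
    """Keep individuals whose most likely haplotype pair has posterior >= min_posterior."""
    return {pid: (h1, h2) for pid, (h1, h2, post) in pairs.items() if post >= min_posterior}


def common_haplotypes(pairs, min_freq=0.05):
    """Haplotypes with frequency >= min_freq among the given haplotype pairs."""
    counts = Counter(h for pair in pairs.values() for h in pair)
    total = sum(counts.values())
    return sorted(h for h, c in counts.items() if c / total >= min_freq)


def haplotype_dosage(pairs, target):
    """Copies (0, 1 or 2) of `target` carried by each individual."""
    return {pid: (h1 == target) + (h2 == target) for pid, (h1, h2) in pairs.items()}
```

The resulting dosage for each common haplotype would then enter a binomial logistic regression adjusted for the first 10 PCs, as described above.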
Results
Characteristics of study participants
Clinical characteristics and genetic ancestry proportions for PEGEN-BC patients, overall (n = 1,755) and by subtype (n = 1,654), are presented in Table 1. The average genetic ancestry proportions for PEGEN-BC participants were 77% Indigenous American, 18% European, 4% African, and 1% East Asian. Among women without breast cancer, the average Indigenous American, European, African, and East Asian ancestry proportions were 81%, 14%, 3%, and 2%, respectively (Table 1; Fig. 1A). PCA results are consistent with these estimates and show that a subset of individuals in the Peruvian cohort defines the Indigenous American cluster (Fig. 1B and C). In addition, t-distributed stochastic neighbor embedding (t-SNE) visualization shows that cases and controls have similar population genetic structure, including subcontinental clustering of individuals born in the Amazonian region (Supplementary Fig. S1).
Approximately half of the patients (53%) were diagnosed with HR+HER2−, 19% with HR+HER2+, 13% with HR−HER2+, and 15% with HR−HER2− tumors. The mean proportion of Indigenous American ancestry was higher among women with the HR− and HER2+ subtypes than among those with other subtypes (P = 0.009). The average age at diagnosis was 53 years (±11) and differed by tumor subtype (Table 1), with women with HR+HER2− tumors being older at diagnosis on average. There were suggestive differences in the distribution of tumor subtype by region, with a higher proportion of patients residing in the Amazonian region diagnosed with HR−HER2− disease and a higher proportion of patients from the Mountainous region diagnosed with HR−HER2+ disease; however, these differences were not statistically significant. The distributions of clinical characteristics and genetic ancestry by ER, PR, and HER2 status separately are consistent with the results observed for the subtypes defined by these three markers (Supplementary Table S2).
rs140068132 subtype-specific association
The frequency of the rs140068132-G allele was lower among patients with breast cancer than among controls (14% vs. 25%, respectively; P < 2.2×10−16). In a logistic regression model, we replicated the previously reported association of the G allele with lower odds of developing breast cancer (refs. 29, 46; OR, 0.53; 95% CI, 0.47–0.59), with lower odds for ER− tumors (OR, 0.48; 95% CI, 0.39–0.57) than for ER+ tumors (OR, 0.56; 95% CI, 0.49–0.64). A case-only analysis of rs140068132 and ER status supports the observed heterogeneity by ER status (ER− vs. ER+ OR, 0.77; 95% CI, 0.60–0.99; Table 2).
The G allele was most common in patients with HR+HER2− tumors (15%) and less common in patients with other subtypes (∼11% in HR+HER2+ and HR−HER2+, and 13% in HR−HER2−). The multinomial logistic regression model with controls as the reference group confirmed a protective effect for all tumor subtypes (Table 2). In a case-only comparison, the G allele was associated with reduced odds of HR+HER2+ and HR−HER2+ tumors, with statistically significant differences between each of these subtypes and HR+HER2− (Table 2). Results did not change after including region of residence as a covariate (Supplementary Table S3).
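As a rough check, the allelic odds ratio implied by the rounded allele frequencies (14% in cases, 25% in controls) can be computed directly. This unadjusted figure ignores the PC covariates and the rounding of the frequencies, so it is only expected to be close to, not equal to, the model-based estimate of 0.53.

```latex
\mathrm{OR}_{\text{crude}}
  = \frac{0.14 / (1 - 0.14)}{0.25 / (1 - 0.25)}
  = \frac{0.163}{0.333}
  \approx 0.49
```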
Independence of rs140068132 from other 6q25 previously associated variants
Previous studies in European, African, and Asian populations have reported variants in the 6q25 region associated with breast cancer risk, with differences in the magnitude of association by subtype (ref. 18; Supplementary Table S1). We tested whether these previously reported variants, considered together with rs140068132, modify the observed association with tumor subtype among PEGEN-BC patients. For a conditional case–control association analysis, we selected SNPs from the 6q25 locus that are not in LD with rs140068132 (Supplementary Fig. S2). Selection was based on concordance between the direction and magnitude of association in the Peruvian case–control analysis and the previously reported ORs, and on potential functional effects. We selected five candidate polymorphisms: rs851984G>A, rs9918437G>T, rs2228480G>A, rs3798758C>A, and rs3778609C>T. Although each variant was associated with breast cancer risk in the Peruvian study (Supplementary Table S1), only rs851984, rs2228480, rs3798758, and rs140068132 remained associated with breast cancer risk in a conditional model (Supplementary Table S4), suggesting independent effects of these variants on breast cancer risk. To complement the conditional model, we tested the combined effect of these variants through a haplotype-specific analysis (Supplementary Table S5).
Conditional multinomial logistic regression analyses including previously associated risk SNPs in the 6q25 region did not show attenuation of the association between rs140068132 and tumor subtype (Table 3). In addition, rs9918437 showed a statistically significant association with the HR−HER2+ subtype (OR, 1.64; 95% CI, 1.10–2.44; P = 0.015), independent of rs140068132 (OR, 1.57; 95% CI, 1.05–2.35; P = 0.027; Table 3).
Discussion
Breast cancer incidence in self-identified H/L women is lower than in non-Hispanic White women (1), and the protective rs140068132 SNP, discovered in the first breast cancer GWAS to include a relatively large number of H/L participants (16), may be one of multiple factors (e.g., alcohol intake and reproductive history) contributing to this difference. In previous studies, we showed that the association of rs140068132 was heterogeneous by ER status, with lower odds of ER− than ER+ disease among G-allele carriers (16, 29). In the present study, we replicated this observation in a population with high Indigenous American ancestry and further tested subtype-specific associations, showing that in Peruvian patients with breast cancer this SNP is more protective against HER2+ disease than against the HR+HER2− subtype. Evaluating this SNP together with other known risk variants at the locus showed that the association between rs140068132 and breast cancer risk is independent of other risk-associated SNPs in the region. The direction of the associations of these additional SNPs is consistent with previous reports (Supplementary Table S1). We also found that one of the previously associated SNPs, rs9918437, shows a subtype-specific association in the Peruvian samples, with the rs9918437-T allele conferring increased odds of HR−HER2+ disease. A previous study reported a larger effect for ER− than for ER+ tumors (OR, 1.18; 95% CI, 1.11–1.27 vs. OR, 1.08; 95% CI, 1.04–1.13, respectively) but no significant association for the HR−HER2+ subtype (18). These results indicate that multiple SNPs within the 6q25 region contribute independently to subtype-specific breast cancer risk.
The protective effect of rs140068132 against HER2+ disease suggests that this SNP might be involved not only in the regulation of ESR1 expression but also in that of ERBB2 (the gene that encodes HER2). There is evidence that ER inhibits HER2 expression at the transcriptional level (47); therefore, fine-tuning of ESR1 expression by rs140068132 or other variants within this locus might directly affect HER2 levels. Other, indirect mechanisms could involve modification of ER-related pathways (48). It is also possible that the inverse association with HER2 expression reflects molecular mechanisms that increase the chance of developing low-proliferation cancer cell lineages that do not depend on HER2-associated pathways.
Previous studies have reported SNPs with subtype-specific associations in breast cancer (18, 19, 22, 26, 49). For some of them, functional interpretation and the link to the molecular events underlying particular subtypes have been relatively straightforward because they are located within known breast cancer–associated genes of high or moderate penetrance (50). However, most SNPs identified in GWAS are located in noncoding regions (51), as is the case for rs140068132, and are associated with relatively modest ORs, which makes functional interpretation challenging. Future in vitro studies could test the effect of rs140068132 on ESR1 gene expression in different tissue contexts representing the heterogeneity of breast tumor subtypes.
Our study has some limitations. It includes both case–case (subtype-specific) analyses and comparisons between cases and controls, with controls drawn from the PrOMIS study. These women are demographically similar to the women in the PEGEN-BC study (both studies recruited participants in public hospitals in Lima, Peru); however, given the nature of the two studies, their age distributions differ. We believe this difference is unlikely to influence our results; if anything, any bias would be toward attenuation of allele frequency differences between breast cancer cases and controls, given that some PrOMIS participants might develop breast cancer in the future. In addition, average ancestry differences between cases and controls are in the expected direction and of similar magnitude to those reported in previous studies of breast cancer in H/L women from the US and Mexico (3), which provides additional reassurance about the adequacy of combining the PEGEN-BC and PrOMIS studies.
Another limitation was the relatively small number of SNPs shared between the arrays used in the PEGEN-BC and PrOMIS studies. Imputation quality was affected by the starting set of genotyped SNPs, and full coverage of the 6q25 region could not be achieved because some SNPs were poorly imputed; we therefore limited our analyses to previously reported variants. We plan to address this issue in future work to provide a more in-depth characterization of variation within the 6q25 region in Peruvian women.
In summary, the rs140068132 SNP is associated with breast cancer risk in a subtype-specific manner, independently of nearby variants previously described in other populations. The molecular mechanisms underlying the association between this protective variant and the least aggressive HR+HER2− breast cancer subtype need further investigation. Ongoing eQTL analyses focused on rs140068132 in the context of the bidirectional molecular crosstalk between ER and HER2 pathways should help clarify the effect of this variant on the etiology of breast cancer.
Authors' Disclosures
A. Hechmer reports grants from UCSF during the conduct of the study. E. Ziv reports grants from California Institute to Advance Precision Medicine and NIH/NCI during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
V.A. Zavala: Data curation, formal analysis, investigation, visualization, methodology, writing–original draft. S. Casavilca-Zambrano: Resources, writing–review and editing. J. Navarro Vásquez: Investigation, writing–review and editing. C.A. Castañeda: Investigation, writing–review and editing. G. Valencia: Investigation, writing–review and editing. Z. Morante: Investigation, writing–review and editing. M. Calderón: Investigation, writing–review and editing. J.E. Abugattas: Resources, investigation, writing–review and editing. H. Gomez: Investigation, writing–review and editing. H.A. Fuentes: Investigation, writing–review and editing. R. Liendo-Picoaga: Investigation, writing–review and editing. J.M. Cotrina: Investigation, writing–review and editing. C. Monge: Investigation, writing–review and editing. S.P. Neciosup: Investigation, writing–review and editing. S. Huntsman: Software, validation, writing–review and editing. D. Hu: Software, investigation, writing–review and editing. S.E. Sanchez: Resources, investigation, writing–review and editing. M.A. Williams: Resources, investigation, writing–review and editing. A. Nunez-Marrero: Investigation, writing–review and editing. L. Godoy: Investigation, writing–review and editing. A. Hechmer: Software, investigation, writing–review and editing. A.B. Olshen: Software, investigation, writing–review and editing. J. Dutil: Resources, validation, writing–review and editing. E. Ziv: Investigation, writing–review and editing. J. Zabaleta: Investigation, writing–review and editing. B. Gelaye: Resources, investigation, writing–review and editing. J. Vasquez: Investigation, writing–review and editing. M. Galvez-Nino: Investigation, writing–review and editing. D. Enriquez-Vera: Investigation, writing–review and editing. T. Vidaurre: Resources, writing–review and editing. L. Fejerman: Conceptualization, resources, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing.
Acknowledgments
The PEGEN-BC study was supported by the National Cancer Institute of the National Institutes of Health under award number (R01CA204797; to L. Fejerman). The PrOMIS study was supported by the National Institute of Child Health and Human Development of the National Institutes of Health under award number (R01-HD-059835; to B. Gelaye). We thank the biobank at the Instituto Nacional de Enfermedades Neoplásicas, Lima, Peru, for their assistance managing and storing the material for the study. We also thank participants from the PEGEN-BC and PrOMIS studies. This research was also supported by the 2021 AACR-Genentech Cancer Disparities Research Fellowship, Grant Number 21-40-18-ZAVA (to V.A. Zavala).
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).