Women of Latin American origin in the United States are more likely to be diagnosed with advanced breast cancer and have a higher risk of mortality than non-Hispanic White women. Studies in U.S. Latinas and Latin American women have reported a high incidence of HER2-positive (HER2+) tumors; however, the factors contributing to this observation are unknown. Genome-wide genotype data for 1,312 patients from the Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC) were used to estimate genetic ancestry. We tested the association between HER2 status and genetic ancestry using logistic and multinomial logistic regression models. Findings were replicated in 616 samples from Mexico and Colombia. Average Indigenous American (IA) ancestry differed by subtype. In multivariable models, the odds of having a HER2+ tumor increased by a factor of 1.20 with every 10% increase in IA ancestry proportion (95% CI, 1.07–1.35; P = 0.001). The association between HER2 status and IA ancestry was independently replicated in the samples from Mexico and Colombia. These results suggest that the high prevalence of HER2+ tumors in Latinas could be due in part to population-specific genetic variant(s) affecting HER2 expression in breast cancer.

Significance:

The positive association between Indigenous American genetic ancestry and HER2+ breast cancer suggests that the high incidence of HER2+ subtypes in Latinas might be due to population- and subtype-specific genetic risk variants.

Globally, more than two million women are diagnosed with breast cancer each year (1). The incidence of breast cancer varies by country, and within the United States, breast cancer risk differs among racial/ethnic subpopulations (2). The age-adjusted incidence rate for breast cancer in the United States is highest in non-Hispanic White women, at 126.1 per 100,000, followed by African American women at 124 per 100,000, Hispanic/Latina women at 93.9 per 100,000, Asian/Pacific Islander women at 93.0 per 100,000, and American Indian/Alaska Native women at 74.2 per 100,000 (2). Mortality rates also differ by race/ethnicity, being highest in African American women at 28.1 per 100,000, followed by non-Hispanic Whites at 20.1 per 100,000 and Hispanic/Latinas at 14.2 per 100,000, with the lowest rates among Asian/Pacific Islanders and American Indian/Alaska Natives at 11.2 and 11.4 per 100,000, respectively (2). Multiple studies have shown that although Hispanics/Latinas (Latinas) have lower breast cancer incidence and mortality rates than non-Hispanic White women, once diagnosed they have a higher risk of breast cancer–specific mortality, with hazard ratio (HR) estimates ranging from 1.1 to 1.3 (3–7). This disparity could be explained in part by the fact that Latinas are more likely than non-Hispanic White women to be diagnosed with the more aggressive HER2+ and hormone receptor–negative subtypes of the disease (8–15).

Previous studies suggest that reproductive factors or other behavior-related exposures could explain some of the observed differences in the incidence of particular tumor subtypes between populations (16–18). In addition, genetic variants that are more strongly associated with particular tumor subtypes have been described (19). However, only a few studies have reported specific factors associated with HER2+ subtypes, and their findings have been inconsistent (11, 19–21).

We have previously shown that a higher degree of Indigenous American (IA) genetic ancestry is associated with a lower incidence of breast cancer (22–24), and we discovered a SNP in the 6q25 chromosomal region near the estrogen receptor 1 (ESR1) gene that explains a large fraction of that lower risk: an allele that originated from IA ancestry is highly protective for breast cancer, particularly for estrogen receptor–negative (ER−) tumors (25). In this study, we evaluated the association between continental genetic ancestry and breast cancer subtype among patients with breast cancer from Peru, Mexico, and Colombia with high proportions of IA ancestry, to assess whether differences in tumor subtype distribution between racial/ethnic groups might be driven in part by underlying population-specific genetic variants.

Study participants

The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN-BC): As of October 2019, we had recruited 1,842 participants from the Instituto Nacional de Enfermedades Neoplásicas (INEN) in Lima, Peru. Women were invited to participate in the study if they had been diagnosed with invasive breast cancer in 2010 or later and were between 21 and 79 years of age. A recruiter working locally at INEN identified eligible patients with breast cancer who had upcoming appointments. At appointment check-in, the recruiter approached eligible patients, provided them with information about the study, and, if they were interested in participating, reviewed the written informed consent with them. At the end of their oncology clinic visit, blood was drawn by a certified phlebotomist at the INEN central laboratory.

Demographic and clinical data were abstracted from the electronic medical records and included place of birth, current residence, age at diagnosis, height, weight, and family history of breast cancer, as well as tumor characteristics [i.e., histologic type, grade and stage, tumor size and ER, progesterone receptor (PR), and HER2 status]. A short study survey complemented the information obtained from the medical records and included reproductive history (i.e., age at menarche, age at first pregnancy, number of pregnancies, breastfeeding history, and menopause status) and lifestyle factors (i.e., alcohol consumption and smoking status).

The current analysis includes 1,312 patients with available genome-wide genotype data. This study was approved by the INEN and the University of California San Francisco Institutional Review Boards and all individuals provided written informed consent to participate.

The Post Columbian Study of Environmental and Heritable Causes of Breast Cancer (COLUMBUS) is a case–control study of breast cancer with individuals recruited at large cancer hospitals in Colombia and Mexico. In Mexico, breast cancer patients were recruited at the Oncology Hospital, Centro Médico Nacional Siglo XXI, and Gynecology Hospital No. 4, Instituto Mexicano del Seguro Social, in Mexico City, Mexico, between 2008 and 2013. The study was approved by the ethics committee of each hospital and written informed consent was obtained from all patients prior to enrollment in the study. Cases were histologically confirmed by a local pathologist. Genomic DNA was isolated from whole blood obtained by a certified phlebotomist. In Colombia, breast cancer patients diagnosed between 2011 and 2016 were recruited through large cancer hospitals in five Colombian cities (Bogota, Ibague, Medellin, Neiva, and Pasto). After providing written informed consent, patients were interviewed in person by trained research nurses. The information collected included sociodemographic characteristics, anthropometric measures, smoking and drinking habits, family history of cancer, and the complete pathology report. Genomic DNA was isolated from whole blood samples obtained by a certified phlebotomist. The study adhered to the Helsinki Declaration and received Institutional Review Board approvals from University of Tolima (Ibagué, Colombia) and from all hospitals where recruitment was carried out.

Genotyping and quality control

In PEGEN-BC, DNA was extracted from whole blood following standard protocols, and genotypes were generated using the Affymetrix Precision Medicine Research Array (26). This array is designed to include a genome-wide imputation grid of about 800,000 markers covering five ancestral population groups: African, admixed American, East Asian, European, and South Asian. To date, 1,380 samples from 1,320 individuals have been genotyped. Quality control of the genotype data was performed in PLINK 1.9 (27). First, we removed 4 individuals with a genotype call rate below 90%. After this individual call rate filter, we removed markers on sex chromosomes and markers with call rates below 98% (21,752 markers removed). We used identity by descent to identify related individuals. There were 60 sets of duplicates, and only one sample per individual was kept; the genotype concordance rate between duplicates was 0.99. There were four pairs of first-degree relatives, and only one individual per pair was included in downstream analyses. The final dataset for association analyses included 1,312 samples. Markers with a minor allele frequency (MAF) <1% were removed, leaving 451,350 markers for imputation. We imputed missing genotypes using the Michigan Imputation Server [1000 Genomes Project (1000G) phase III samples as reference; ref. 28] and, after imputation, filtered out markers with an imputation quality r2 < 0.8 or a MAF <1%, which resulted in a total of 6,915,815 markers.
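The sample call rate, marker call rate, and MAF filters described above can be sketched as follows. This is a minimal illustration, not the PLINK commands used in the study: the function name `qc_filter` and the NaN-coded dosage matrix are assumptions, and duplicate, relatedness, and sex-chromosome filtering are omitted.

```python
import numpy as np

# Thresholds taken from the text; everything else here is hypothetical.
SAMPLE_MIN_CALL = 0.90  # individual genotype call rate
MARKER_MIN_CALL = 0.98  # marker call rate
MIN_MAF = 0.01          # minor allele frequency


def qc_filter(genotypes: np.ndarray):
    """Apply sequential call-rate and MAF filters.

    genotypes: (n_samples, n_markers) array of allele dosages 0/1/2,
    with np.nan marking missing calls.
    Returns the filtered matrix and boolean masks of kept samples/markers.
    """
    called = ~np.isnan(genotypes)

    # Step 1: drop individuals with a low genotype call rate.
    keep_samples = called.mean(axis=1) >= SAMPLE_MIN_CALL
    g = genotypes[keep_samples]
    called = called[keep_samples]

    # Step 2: among retained individuals, drop markers with a low call rate.
    keep_call = called.mean(axis=0) >= MARKER_MIN_CALL

    # Step 3: drop rare markers. Alternate-allele frequency is the mean
    # dosage over called genotypes divided by 2; MAF folds it to <= 0.5.
    dosage_sum = np.where(called, g, 0.0).sum(axis=0)
    n_called = called.sum(axis=0)
    alt_freq = np.divide(dosage_sum, 2.0 * n_called,
                         out=np.zeros_like(dosage_sum), where=n_called > 0)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep_markers = keep_call & (maf >= MIN_MAF)

    return g[:, keep_markers], keep_samples, keep_markers
```

In this sketch the MAF is computed only on individuals who passed the call rate filter, mirroring the order of steps in the text.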

The 447 Colombian samples included in the replication analysis were genotyped using Affymetrix Axiom UKBiobank Array and the 169 Mexican samples were genotyped on Affymetrix Precision Medicine Research Array. Quality control of the genotyped data has been described previously (29).

Continental and subcontinental structure characterization

After linkage disequilibrium–based pruning of markers in the 1000G reference samples (in Plink: window size = 50, number of variants = 5, variance inflation factor threshold = 2), we merged them with the PEGEN-BC study imputed dataset, which resulted in a total of 520,450 markers for downstream analyses. Individual genetic ancestry was determined using ADMIXTURE (30) in unsupervised mode with four ancestral populations to capture IA, European, African, and East Asian continental ancestry, based on the known major continental contributions to the population of Peru (31). Global genetic ancestry estimates and core analysis variables are available upon request. To further characterize the continental and subcontinental structure of the PEGEN-BC study participants, we conducted principal component analysis (PCA) using the program Plink 1.9 (27) and T-distributed stochastic neighbor embedding (Rt-SNE) analyses with the Rtsne Package (32–34) using the first 50 principal components, in R 3.6.0 (35). Rt-SNE was run four times and yielded consistent clusters.
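To make the PCA step concrete, the sketch below computes principal components of a complete genotype dosage matrix by standardizing each marker and taking a thin singular value decomposition. It is an illustration of genotype PCA in general, not a reimplementation of PLINK's `--pca`; the function name `genotype_pca` is hypothetical, and the t-SNE step (which in the study took the first 50 PCs as input) is not shown.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 10):
    """PCA of a complete (no missing calls) dosage matrix, samples x markers.

    Each marker is centred at 2p and scaled by sqrt(2p(1-p)), the binomial
    standard deviation under Hardy-Weinberg equilibrium, before a thin SVD.
    Returns (sample scores on the leading PCs, fraction of variance per PC).
    """
    p = genotypes.mean(axis=0) / 2.0
    scale = np.sqrt(2.0 * p * (1.0 - p))
    keep = scale > 0  # monomorphic markers carry no information
    z = (genotypes[:, keep] - 2.0 * p[keep]) / scale[keep]

    u, s, _ = np.linalg.svd(z, full_matrices=False)
    k = min(n_components, s.size)
    scores = u[:, :k] * s[:k]               # projections of samples onto PCs
    var_frac = s[:k] ** 2 / np.sum(s ** 2)  # singular values are sorted descending
    return scores, var_frac
```

With simulated data from two populations that differ in allele frequencies, the first PC separates the two groups, which is the property exploited when placing study participants relative to 1000G reference populations.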

Individual genetic ancestry was also estimated with the program ADMIXTURE (30) in the replication samples. After pruning, 63,811 and 93,813 genotyped markers were used for the Colombian and Mexican samples, respectively.

Breast tumor subtype determination

Different approaches have been used to define breast cancer subtypes based on ER, PR, and HER2 status by IHC (36). We used a cutoff of 1% to define ER/PR positivity (37). HER2 positivity was defined as 3+ staining by IHC, or gene amplification by FISH testing following a 2+ IHC result. For quality control, two independent pathologists from the University of California San Francisco reviewed the IHC slides at INEN for a subset of 52 patients. Concordance was high: 100% for ER, 87% for PR, and 85% for HER2 (most of the discordant HER2 calls were scored as “negative” or 1+ at INEN and 2+ by the independent pathologists). Tumor subtypes were defined as ER/PR+ HER2−, ER/PR+ HER2+, ER/PR− HER2+, and ER/PR− HER2−.
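The subtype rules above can be written as a small classification function. This is a sketch under stated assumptions: the function names are hypothetical, ER and PR are given as percentages of stained nuclei, "ER/PR+" is taken to mean ER+ and/or PR+ (the text does not spell this out), an IHC 2+ tumor without a FISH result is left undetermined, and ASCII "-" stands in for "−".

```python
from typing import Optional

ER_PR_CUTOFF = 1.0  # percent positive nuclei; 1% cutoff from the text


def her2_positive(ihc_score: int, fish_amplified: Optional[bool] = None) -> Optional[bool]:
    """HER2+ if IHC 3+, or IHC 2+ with FISH amplification.

    Returns None for an IHC 2+ (equivocal) tumor without a FISH result.
    """
    if ihc_score == 3:
        return True
    if ihc_score == 2:
        return fish_amplified  # True, False, or None if FISH not done
    return False  # IHC 0 or 1+


def hormone_receptor_positive(er_pct: float, pr_pct: float) -> bool:
    # Assumption: a tumor is "ER/PR+" if either receptor reaches the cutoff.
    return er_pct >= ER_PR_CUTOFF or pr_pct >= ER_PR_CUTOFF


def subtype(er_pct: float, pr_pct: float, ihc_score: int,
            fish_amplified: Optional[bool] = None) -> Optional[str]:
    """Return one of the four subtype labels, or None if HER2 is undetermined."""
    her2 = her2_positive(ihc_score, fish_amplified)
    if her2 is None:
        return None
    hr = "ER/PR+" if hormone_receptor_positive(er_pct, pr_pct) else "ER/PR-"
    return f"{hr} HER2{'+' if her2 else '-'}"
```

For example, `subtype(0, 0, 3)` yields `"ER/PR- HER2+"`, and `subtype(50, 0, 2)` yields `None` until a FISH result is available.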

The IHC HER2 status for the Colombian and Mexican samples was obtained from pathology reports and medical records. The hospitals from which patients were recruited used the same HER2 positivity criteria as INEN.

Statistical analysis

Patient characteristics were compared on the basis of breast cancer subtype. t tests were performed on normally distributed continuous variables, Kruskal–Wallis tests were performed on nonnormally distributed continuous variables, and χ2 tests were used on categorical variables.
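For categorical characteristics, the χ2 test compares observed counts in a characteristic-by-subtype table with the counts expected under independence. The sketch below computes the Pearson statistic and its degrees of freedom in plain Python; the function name is hypothetical, all margins are assumed nonzero, and the p-value step is left out. In practice, a library routine such as `scipy.stats.chi2_contingency` performs the full test (note that it applies Yates' continuity correction to 2×2 tables by default), and `scipy.stats.kruskal` covers the Kruskal–Wallis comparisons.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts, e.g. menopausal status (rows)
    by tumor subtype (columns). All row and column totals must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```

A 2 × 4 table (two categories of a characteristic by the four subtypes) has 3 degrees of freedom.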

Logistic regression models were used to compare individuals with HER2− versus HER2+ tumors. Multinomial logistic regression models, with the ER/PR+ HER2− group as the reference, were used to test the association between tumor subtypes and IA genetic ancestry. Age at diagnosis, African ancestry, and height were included as covariates. Height was included because the comparison of patient characteristics between tumor subtypes suggested that it could be a confounder. An additional model including stage (I–II vs. III–IV) and region of residence (Coastal, Mountain, or Amazonian) was run to assess possible bias in the distribution of subtype by ancestry arising from a potential correlation between IA ancestry and the severity of the cases seen at INEN.
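The per-10% odds ratios reported in this study follow from coding ancestry in units of 10 percentage points before fitting the model, so that the exponentiated coefficient is the OR for a 10% increase. The sketch below fits a binary logistic regression by Newton–Raphson on simulated data; the function name, the simulated ancestry and age values, and the simulated effect size are all hypothetical, and the multinomial models (one such coefficient per non-reference subtype) are not shown.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: (n, k) covariate matrix WITHOUT an intercept column (one is added here).
    y: (n,) array of 0/1 outcomes (here, 1 = HER2+ tumor).
    Returns (coefficients, standard errors), intercept first.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        info = X1.T @ (X1 * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X1.T @ (y - p))   # score / information
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    cov = np.linalg.inv(X1.T @ (X1 * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Simulated (hypothetical) data: IA ancestry proportions and ages.
rng = np.random.default_rng(1)
n = 20000
ia_prop = rng.uniform(0.4, 1.0, n)
age = rng.normal(50, 11, n)
ia10 = ia_prop * 10  # one unit = a 10-percentage-point increase
true_logit = -2.1 + np.log(1.2) * ia10  # simulated OR of 1.2 per 10%
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

beta, se = fit_logistic(np.column_stack([ia10, age]), y)
or_per_10 = np.exp(beta[1])
ci_low, ci_high = np.exp(beta[1] - 1.96 * se[1]), np.exp(beta[1] + 1.96 * se[1])
```

The Wald 95% CI is obtained by exponentiating the coefficient plus or minus 1.96 standard errors, which is the form in which the ORs and CIs are reported here.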

The replication analyses in the COLUMBUS consortium samples from Mexico and Colombia were conducted using logistic regression models comparing HER2+ versus HER2− cases. Additional subtyping of HER2+ tumors in the replication analysis was not conducted due to the limited sample size.

All analyses were performed using Plink 1.9 (27), R 3.6.0 (35), or ADMIXTURE (30).

Patient characteristics

PEGEN-BC study patient characteristics by tumor subtype are presented in Table 1 (N = 1,312). The frequencies of the subtypes were 49% for ER/PR+ HER2−, 18% for ER/PR+ HER2+, 12% for ER/PR− HER2+, and 15% for ER/PR− HER2−. The average age at diagnosis for all patients was 50 years (SD = 11), and there was a nonsignificant trend toward younger age among women in the ER/PR− HER2− group. The average body mass index (BMI) across all subtypes was 28 (SD = 5) and was similar between groups. Measured height differed significantly between subtypes: women with ER/PR− HER2+ tumors were shorter than those with other tumor subtypes (P = 0.033). Reproductive variables did not differ by tumor subtype, with the exception of menopausal status: the percentage of premenopausal women was higher among those with ER/PR− HER2− tumors (P = 0.023). We also observed statistically significant differences by histopathologic subtype (P < 0.001) and by clinical stage and grade (P < 0.001; Table 1). Lobular histology had a frequency of 10% in the ER/PR+ HER2− group compared with less than 3% in the other groups. Stage and grade were higher in the ER/PR− subgroups (HER2+ and HER2−) than in ER/PR+ tumors (Table 1).

Distribution of genetic ancestry

Results from the model-based estimation of continental genetic ancestry proportions showed high IA ancestry proportions in the PEGEN-BC study patients (Table 1). The estimated average IA ancestry was 76%, the average European ancestry 18%, the average African ancestry 4%, and the average East Asian ancestry 2% (Table 1, Fig. 1A–C). The PEGEN-BC study includes 129 (9.8%) participants with an estimated IA ancestry of more than 95%, and 86 (6.6%) with an estimated IA ancestry of more than 98%. PCA shows that most of the INEN patients are concentrated in the highly IA range of genetic variation of the 1000G Admixed American populations, with a few patients clustering with the Asian and African populations (Fig. 1D). We were able to identify genetic structure at the subcontinental level using PCA and Rt-SNE approaches (Fig. 2). The IA component of the study participants can be divided into three groups that correlate with the place of birth reported by the patient. We found two relatively small clusters with high IA ancestry cores, composed mainly of patients from the Amazonian and the Coastal regions, respectively (Fig. 2A, AM and CO clusters). A larger cluster, including patients mainly from the Coast and the Mountains (Fig. 2A, CO/MO cluster), also contained patients with high IA ancestry. This observation is in line with previously described IA population migration patterns and genetic admixture in Peru (38). A small number of individuals in the study had a high proportion (>50%) of Asian or African ancestry, each group accounting for approximately 0.5% of the patients. Those with high East Asian ancestry clustered with the 1000G Japanese (Fig. 2B), while patients with a high degree of African ancestry clustered with West African populations (Fig. 2B).

Among women in the Amazonian cluster, the frequency of ER/PR− HER2− tumors was higher than among other patients (19% vs. ∼13%), concordant with previous reports (39). However, likely because of the small size of the Amazonian cluster, this difference was not statistically significant (P = 0.89).

Subtypes and continental genetic ancestry

As shown in Table 1, univariate analysis showed a statistically significant difference in the proportion of IA genetic ancestry between tumor subtypes (P = 0.023). The average IA ancestry among participants with ER/PR− HER2+ tumors was 80%, compared with 75% among participants with the ER/PR+ HER2− subtype. A logistic regression model testing the association between HER2 status and IA ancestry showed a strong association with and without covariates (Table 2): the odds of having a HER2+ tumor increased by a factor of 1.19 for every 10% increase in IA ancestry (P = 0.002). In the multinomial logistic regression analysis, with ER/PR+ HER2− defined as the reference group, the odds of ER/PR− HER2+ breast cancer increased by a factor of 1.22 for every 10% increase in IA ancestry. This association remained statistically significant in the multivariable model [OR = 1.30; 95% confidence interval (CI), 1.10–1.55; P = 0.002; Table 2]. As expected, given the strong inverse correlation between the IA and European components of ancestry, we observed a decrease in the odds of developing ER/PR− HER2+ disease with increasing European ancestry (Supplementary Table S1).
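Because ancestry enters these models linearly on the log-odds scale, an OR reported per 10% increase can be rescaled to any other increment by raising it to the power (increment / 10). The helper below is a small illustrative sketch (the function names are hypothetical) that applies the same rescaling to both CI bounds.

```python
import math


def rescale_or(or_per_10: float, increment_pct: float) -> float:
    """Convert an OR per 10-percentage-point increase to another increment.

    With a linear logit, log OR(delta) = (delta / 10) * log OR(10).
    """
    return math.exp(math.log(or_per_10) * increment_pct / 10.0)


def rescale_ci(ci_low: float, ci_high: float, increment_pct: float):
    """Rescale both bounds of a CI reported per 10% increase."""
    return rescale_or(ci_low, increment_pct), rescale_or(ci_high, increment_pct)
```

For example, the multivariable OR of 1.30 per 10% for ER/PR− HER2+ tumors corresponds to about 1.69 for a 20-percentage-point difference in IA ancestry, and to about 1.03 per single percentage point.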

We also conducted the logistic regression analysis including the rs140068132-G protective SNP at 6q25 to test if the association between the HER2+ status and IA ancestry could be driven by this polymorphism. However, the association between HER2 status and IA ancestry was not explained by this protective variant (Table 2).

Given the hospital-based nature of the study, we evaluated the possibility that ancestry was associated with tumor subtype because IA ancestry is higher among women from remote regions, who would be more likely to be referred to INEN with aggressive tumors. However, in analyses that included stage at diagnosis and region of residence as covariates, the association between genetic ancestry and tumor subtype was unchanged (Supplementary Table S2).

Replication of the association was conducted in samples from Mexico and Colombia. Given the relatively small sample size for replication, we tested the association by HER2 status only (positive or negative), and analyzed samples from the two countries together including country as a covariate in the model (Table 3). The analysis showed a strong association between HER2 status and IA ancestry in the same direction as our analysis in the PEGEN-BC study patients (Table 3). For every 10% increase in IA ancestry among Mexican and Colombian patients with breast cancer, we observed a 28% increase in the odds of having a HER2+ tumor (Table 3). There was no evidence of heterogeneity by country (Colombia OR = 1.28; 95% CI, 1.03–1.60; Mexico OR = 1.20; 95% CI, 0.90–1.59).
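The country-specific estimates can be compared approximately from the reported ORs and 95% CIs alone. The sketch below recovers each log OR and its standard error from the CI (assuming a Wald interval symmetric on the log scale) and computes a z statistic for their difference. This is an illustrative check with hypothetical function names, not necessarily the heterogeneity assessment performed in the study.

```python
import math


def log_or_and_se(or_point: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Recover log OR and its SE from a Wald 95% CI (symmetric on the log scale)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(or_point), se


def difference_z(est1, est2) -> float:
    """Wald z statistic for the difference between two independent log ORs."""
    (b1, se1), (b2, se2) = est1, est2
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z: float) -> float:
    # Two-sided standard normal p-value via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


colombia = log_or_and_se(1.28, 1.03, 1.60)
mexico = log_or_and_se(1.20, 0.90, 1.59)
z = difference_z(colombia, mexico)
p = two_sided_p(z)
```

With the reported values, z is roughly 0.35 (two-sided p of about 0.7), consistent with the absence of evidence for heterogeneity by country.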

Breast cancer subtype is a strong predictor of prognosis. Women diagnosed with ER/PR+ tumors tend to have better outcomes than women diagnosed with ER/PR− disease (36, 40). Multiple studies have evaluated factors (genetic, behavioral, and environmental) that could explain subtype-specific breast cancer risk (21, 41–45). Information about factors contributing to the risk of developing HER2+ subtypes is limited but includes common germline variants (11, 19–21).

We evaluated the association between genetic ancestry and breast cancer subtype in a Latin American population with a high degree of IA genetic ancestry to assess the possibility that the previously reported higher relative frequency of HER2+ tumors among women of Latin American origin could be in part due to ancestry-specific genetic predisposition embedded within the IA ancestral component.

The clinical characteristics of the Peruvian patients with breast cancer included in this study were similar to those described in other Latin American populations, with a young age at diagnosis and a relatively high proportion of the most aggressive tumor subtypes (ER/PR−; refs. 7, 46, 47). Furthermore, our study population reflects the tumor characteristics of the overall patient population at INEN well, given the similarities between our study patient characteristics and those included in a previously published review of INEN patient characteristics (48, 49). The average IA genetic ancestry in the Peruvian patients with breast cancer was 76%, consistent with previous estimates of 70% to 80% IA ancestry in Peru (31, 50), including the 1000G Peruvians, who were also recruited in Lima. Taken together, these data suggest that our samples, even though they are clinic-based, reflect the expected levels of IA ancestry in the general population of Peru.

Multiple studies have reported a higher incidence of HER2+ tumors in women of Latin American origin (8–12, 14). The results of the current analysis in patients with breast cancer from Peru, and including independent replication in patients from Mexico and Colombia, strongly suggest there could be population-specific genetic variant(s) increasing the risk of HER2+ tumors among women with IA ancestry. These results are consistent with the previously reported suggestive association between IA ancestry and ERBB2 gene expression (which codes for the HER2 protein) in breast tumors from Colombia (51).

There are various potential mechanisms behind the association between HER2 status and genetic ancestry, which could be environmental, genetic, or both. Genetic risk factors could act in cis as well as in trans. Examples of possible cis-regulated mechanisms that could lead to differences in HER2 protein expression include population-specific expression quantitative trait loci (eQTL; ref. 52), splice variants (53), or variants affecting the probability that ERBB2 is amplified in tumor cells (54). Trans-regulation of HER2 protein expression could result from eQTLs or splice variants in genes that regulate ERBB2 expression. The reported associations between rare and common germline mutations in TP53 and HER2+ breast cancer (19, 55, 56) provide some support for this hypothesis. The exact mechanism for the association between TP53 mutations, SNPs, and HER2+ tumors is not well understood, but it is part of a larger body of subtype-specific common and rare variant associations with breast cancer (44, 57, 58). Large admixture mapping and genome-wide association studies (GWAS) by tumor subtype in women of IA ancestry will be needed to identify the relevant germline variants and clarify the specific mechanisms underlying the observed association between IA ancestry and HER2 status.

The ER/PR− HER2+ tumor category had a larger OR than ER/PR+ HER2+ in the PEGEN-BC study, suggesting that germline regulation of HER2 expression may not be the only mechanism at play in the relationship between HER2 status and IA ancestry. The difference between the ER/PR+ and ER/PR− subtypes could be due to chance (each estimate falls within the other's 95% CI), or it could reflect hormone receptor context–dependent expression and/or amplification of ERBB2 (59).

In addition, environmental risk factors may also account for our observation. People of IA ancestry may have more exposure to risk factors that specifically increase the risk of HER2+ tumors. Some risk factors, such as BMI, are known to differentially affect the risk of luminal versus basal breast cancer subtypes (43). Although no specific environmental risk factors for HER2+ breast cancer have been identified to date, additional studies, including migration studies, may help us understand whether this observation is related to environmental and/or genetic causes.

This is, to our knowledge, the first study evaluating the association between genetic ancestry and breast cancer subtypes in a sample of women with high IA ancestry. The IA genome is largely understudied. In 2016, Latinos made up 17.8% of the U.S. population (60). In that same year, however, that population made up only 0.54% of participants in GWAS (61), and just 0.05% of participants in GWAS were of non-admixed IA ancestry (61). Our study included not only admixed Latinas but also a population in which 9.8% of the individuals had over 95% IA genetic ancestry. In addition, our study collected detailed clinical data on tumor characteristics and other relevant patient information from electronic medical records, which allowed us to compare breast cancer subtypes. Discovering common variants of small effect contributing to breast cancer in non-White populations is challenging due to the relatively limited number of samples available for these groups. However, as was the case in the discovery of the 6q25 protective variant, leveraging genetic admixture could provide a shortcut for identifying specific regions that are likely to carry subtype-specific risk variants (24, 25).

The PEGEN-BC study followed a hospital-based case-only design. As a result, the distribution of the subtypes in our study population might not represent the distribution of subtypes in the overall population of Peruvian patients with breast cancer. It is possible that women with slow-growing or easily treatable tumors are either never treated or receive care elsewhere. However, because this is the main cancer hospital in Peru and is equally accessible (or inaccessible) to patients in remote areas such as the jungle lowlands or the mountains, our study participants are likely to be a good representation of patients from all over the country and of diverse breast cancer phenotypes, although likely from the lower end of the country's income distribution. To account for this potential bias, we evaluated models that included stage and region of residence as covariates (Supplementary Table S2). Our results were unchanged.

This study reported a strong association between IA ancestry and HER2 status. The strongest association was observed for the ER/PR− HER2+ subtype, with individuals from this group having higher average IA ancestry than individuals with ER/PR+ HER2− tumors. Our results provide support for the hypothesis that there might be population-specific genetic risk factors predisposing women of IA ancestry to develop HER2+ tumors. Further studies are needed to confirm the genetic basis of this association and to discover specific regions or variants within the IA genome. Ultimately, a better understanding of these and other etiologic factors explaining breast cancer HER2 status will result in better subtype-specific polygenic risk prediction and improved targeted treatments, not only for women of Latin American origin, but for all women.

No potential conflicts of interest were disclosed.

The contents of this article are solely the responsibility of the authors and do not reflect the official views of the National Institutes of Health.

Conception and design: T. Vidaurre, J.N. Vásquez, J.E. Abugattas, S.J. Serrano-Gómez, L. Fejerman

Development of methodology: T. Vidaurre, J.E. Abugattas, A.P. Estrada-Florez, J.A. Carmona-Valencia, L. Fejerman

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): T. Vidaurre, P.C. Lott, S. Casavilca-Zambrano, M. Calderón, J.E. Abugattas, H.L. Gómez, H.A. Fuentes, S.P. Neciosup, C.A. Castañeda, F. Valencia, J. Torres, M. Echeverry, M.E. Bohórquez, G. Polanco-Echeverry, J.A. Carmona-Valencia, I. Alvarado-Cabrero, A. Velez, J. Donado, S. Song, D. Cherry, L.I. Tamayo, J. Zabaleta, L. Carvajal-Carmona, L. Fejerman

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.M. Marker, V.A. Zavala, T. Vidaurre, P.C. Lott, J.N. Vásquez, J.E. Abugattas, H.L. Gómez, J.M. Cotrina, S.P. Neciosup, I. Alvarado-Cabrero, S. Huntsman, D. Hu, R. Balassanian, J. Zabaleta, L. Carvajal-Carmona, L. Fejerman

Writing, review, and/or revision of the manuscript: K.M. Marker, V.A. Zavala, T. Vidaurre, J.N. Vásquez, S. Casavilca-Zambrano, J.E. Abugattas, H.L. Gómez, R.L. Picoaga, S.P. Neciosup, C.A. Castañeda, F. Valencia, J. Torres, S.J. Serrano-Gómez, M.C. Sanabria-Salas, D. Cherry, L.I. Tamayo, S. Huntsman, D. Hu, R. Ruiz-Cordero, R. Balassanian, E. Ziv, J. Zabaleta, L. Fejerman

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): V.A. Zavala, T. Vidaurre, J.E. Abugattas, H.L. Gómez, J. Torres, A.P. Estrada-Florez, D. Cherry, S. Huntsman

Study supervision: T. Vidaurre, S. Casavilca-Zambrano, H.A. Fuentes, Z. Morante, L. Carvajal-Carmona, L. Fejerman

Other (principal investigator at INEN Peru site): T. Vidaurre

Other (sample control): S. Casavilca-Zambrano

Other (histologic examination of breast cancer cases and assessment of biomarker expression): R. Ruiz-Cordero

We thank the biobank at the Instituto Nacional de Enfermedades Neoplásicas, Lima, Peru, for its assistance in managing and storing the study material. We also thank the participants of the PEGEN-BC study and the COLUMBUS Consortium. The PEGEN-BC study was supported by the National Cancer Institute (R01CA204797 to L. Fejerman) and the Instituto Nacional de Enfermedades Neoplásicas (Lima, Peru). The COLUMBUS Consortium was supported by grants from the School of Medicine (Dean's Fellowship in Precision Health Equity to L. Carvajal-Carmona) and the Office of the Provost (support for L. Carvajal-Carmona's Latinos United for Cancer Health Advancement [LUCHA] Initiative); GSK Oncology (Ethnic Research Initiative to L. Carvajal-Carmona and M. Echeverry); and the U.S. National Institutes of Health (Cancer Center Support Grant P30CA093372 from the National Cancer Institute). L. Carvajal-Carmona, M.E. Bohórquez, and M. Echeverry are also grateful for support from Colciencias (Graduate Studentship to Jennyfer Benavides, member of COLUMBUS, from Convocatoria para la Formación de Capital Humano de Alto Nivel para el Departamento de Tolima-COLCIENCIAS –755/2016), Universidad del Tolima (grants to M.E. Bohórquez and M. Echeverry, project 10112), and Sistema Nacional de Regalías, Gobernación del Tolima (grants to M.E. Bohórquez and M. Echeverry, projects 520120516 and 520115). J. Torres was supported by Coordinacion Nacional de Investigación en Salud, IMSS, México, grant FIS/IMSS/PROT/PRIO/13/027 and by the Consejo Nacional de Ciencia y Tecnologia (Fronteras de la Ciencia grant 773), México. The COLUMBUS Consortium (in alphabetical order) includes: Jennyfer Benavides (Universidad del Tolima, Ibagué, Colombia), Mabel Bohórquez (Universidad del Tolima, Ibagué, Colombia), Fernando Bolaños (Hospital Hernando Moncaleano Perdomo, Neiva, Colombia), Luis G. Carvajal-Carmona (University of California Comprehensive Cancer Center, Sacramento), Jenny Carmona (Dinámica IPS, Medellín, Colombia), Ángel Criollo (Universidad del Tolima, Ibagué, Colombia), Magdalena Echeverry (Universidad del Tolima, Ibagué, Colombia), Ana Estrada (Universidad del Tolima, Ibagué, Colombia), Gilbert Mateus (Hospital Federico Lleras Acosta, Ibagué, Colombia), Raúl Murillo (Pontificia Universidad Javeriana, Bogotá, Colombia), Justo Ramirez (Hospital Hernando Moncaleano Perdomo, Neiva, Colombia), Yesid Sánchez (Universidad del Tolima, Ibagué, Colombia), Carolina Sanabria (Instituto Nacional de Cancerología, Bogotá, Colombia), Martha Lucia Serrano (Instituto Nacional de Cancerología, Bogotá, Colombia), John Jairo Suarez (Universidad del Tolima, Ibagué, Colombia), Alejandro Vélez (Dinámica IPS, Medellín, Colombia; Hospital Pablo Tobón Uribe, Medellín, Colombia).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394–424.
2. United States Department of Health and Human Services; Centers for Disease Control and Prevention. National Program of Cancer Registries and Surveillance, Epidemiology, and End Results SEER*Stat Database: NPCR and SEER Incidence – U.S. Cancer Statistics 2001–2015 Public Use Research Database. 2018.
3. Hsu JL, Glaser SL, West DW. Racial/ethnic differences in breast cancer survival among San Francisco Bay Area women. 1997;89:1311–2.
4. Li FP, Hankey BF, Clegg LX, Chu K, Edwards BK. Cancer survival among US whites and minorities. Arch Intern Med 2003;162:1985.
5. Pinheiro PS, Williams M, Miller EA, Easterday S, Moonie S, Trapido EJ. Cancer survival among Latinos and the Hispanic paradox. Cancer Causes Control 2011;22:553–61.
6. Ooi SL, Martinez ME, Li CI. Disparities in breast cancer characteristics and outcomes by race/ethnicity. Breast Cancer Res Treat 2011;127:729–38.
7. Iqbal J, Ginsburg O, Rochon PA, Sun P, Narod SA. Differences in breast cancer stage at diagnosis and cancer-specific survival by race and ethnicity in the United States. JAMA 2015;313:165–73.
8. Parise CA, Bauer KR, Caggiano V. Variation in breast cancer subtypes with age and race/ethnicity. Crit Rev Oncol Hematol 2010;76:44–52.
9. Parise CA, Bauer KR, Brown MM, Caggiano V. Breast cancer subtypes as defined by the estrogen receptor (ER), progesterone receptor (PR), and the human epidermal growth factor receptor 2 (HER2) among women with invasive breast cancer in California, 1999-2004. Breast J 2009;15:593–602.
10. Howlader N, Altekruse SF, Li CI, Chen VW, Clarke CA, Ries LAG, et al. US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status. J Natl Cancer Inst 2014;106:dju055.
11. Banegas MP, Tao L, Altekruse S, Anderson WF, John EM, Clarke CA, et al. Heterogeneity of breast cancer subtypes and survival among Hispanic women with invasive breast cancer in California. Breast Cancer Res Treat 2014;144:625–34.
12. Hines LM, Risendal B, Byers T, Mengshol S, Lowery J, Singh M. Ethnic disparities in breast tumor phenotypic subtypes in Hispanic and non-Hispanic white women. J Womens Health 2011;20:1543–50.
13. Serrano-Gómez SJ, Fejerman L, Zabaleta J. Breast cancer in Latinas: a focus on intrinsic subtypes distribution. Cancer Epidemiol Biomarkers Prev 2018;27:3–10.
14. Serrano-Gómez SJ, Sanabria-Salas MC, Hernández-Suarez G, García O, Silva C, Romero A, et al. High prevalence of luminal B breast cancer intrinsic subtype in Colombian women. Carcinogenesis 2016;37:669–76.
15. Martínez ME, Gomez SL, Tao L, Cress R, Rodriguez D, Unkart J, et al. Contribution of clinical and socioeconomic factors to differences in breast cancer subtype and mortality between Hispanic and non-Hispanic white women. Breast Cancer Res Treat 2017;166:185–93.
16. Chen L, Cook LS, Tang M-TC, Porter PL, Hill DA, Wiggins CL, et al. Body mass index and risk of luminal, HER2-overexpressing, and triple negative breast cancer. Breast Cancer Res Treat 2016;157:545–54.
17. Ambrosone CB, Zirpoli G, Ruszczyk M, Shankar J, Hong CC, McIlwain D, et al. Parity and breastfeeding among African-American women: differential effects on breast cancer risk by estrogen receptor status in the Women's Circle of Health Study. Cancer Causes Control 2014;25:259–65.
18. Bandera EV, Chandran U, Hong C-C, Troester MA, Bethea TN, Adams-Campbell LL, et al. Obesity, body fat distribution, and risk of breast cancer subtypes in African American women participating in the AMBER consortium. Breast Cancer Res Treat 2015;150:655–66.
19. Zhang H, Ahearn TU, Lecarpentier J, Barnes D, Beesley J, Jiang X, et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. bioRxiv 2019;778605.
20. Holm J, Eriksson L, Ploner A, Eriksson M, Rantalainen M, Li J, et al. Assessment of breast cancer risk factors reveals subtype heterogeneity. Cancer Res 2017;77:3708–17.
21. Gaudet MM, Gierach GL, Carter BD, Luo J, Milne RL, Weiderpass E, et al. Pooled analysis of nine cohorts reveals breast cancer risk factors by tumor molecular subtype. Cancer Res 2018;78:6011–21.
22. Fejerman L, John EM, Huntsman S, Beckman K, Choudhry S, Perez-Stable E, et al. Genetic ancestry and risk of breast cancer among U.S. Latinas. 2008;68:9723–8.
23. Fejerman L, Romieu I, John EM, Lazcano-Ponce E, Beckman KB, Pérez-Stable EJ, et al. European ancestry is positively associated with breast cancer risk in Mexican women. Natl Institutes Health 2011;19:1074–82.
24. Fejerman L, Chen GK, Eng C, Huntsman S, Hu D, Williams A, et al. Admixture mapping identifies a locus on 6q25 associated with breast cancer risk in US Latinas. Hum Mol Genet 2012;21:1907–17.
25. Fejerman L, Ahmadiyeh N, Hu D, Huntsman S, Beckman KB, Caswell JL, et al. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25. Nat Commun 2014;5:5260.
26. Applied Biosystems. Affymetrix. Axiom® Precision Medicine Research Array Data Sheet. Available from: http://tools.thermofisher.com/content/sfs/brochures/GGNO07706-2_DS_Axiom_PMRA.pdf.
27. Purcell SM, Chang CC, Chow CC, Tellier LCAM, Lee JJ, Vattikuti S. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015;4:1–16.
28. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–87.
29. Shieh Y, Fejerman L, Lott PC, Marker K, Sawyer SD, Hu D, et al. A polygenic risk score for breast cancer in U.S. Latinas and Latin-American women. J Natl Cancer Inst 2019. DOI: 10.1093/jnci/djz174.
30. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009;19:1655–64.
31. Sandoval JR, Salazar-Granara A, Acosta O, Castillo-Herrera W, Fujita R, Pena SDJ, et al. Tracing the genomic ancestry of Peruvians reveals a major legacy of pre-Columbian ancestors. J Hum Genet 2013;58:627–34.
32. van der Maaten L, Hinton G. Visualizing high-dimensional data using t-SNE. J Mach Learn Res 2008;9:2579–605.
33. van der Maaten LJP. Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 2014;15:3221–45.
34. Krijthe JH. Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut implementation. Available from: https://github.com/jkrijthe/Rtsne.
35. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2018. Available from: https://www.r-project.org/.
36. Blows FM, Driver KE, Schmidt MK, Broeks A, van Leeuwen FE, Wesseling J, et al. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med 2010;7:e1000279.
37. Tang P, Tse GM. Immunohistochemical surrogates for molecular classification of breast carcinoma: a 2015 update. Arch Pathol Lab Med 2016;140:806–14.
38. Harris DN, Song W, Shetty AC, Levano KS, Cáceres O, Padilla C, et al. Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. Proc Natl Acad Sci U S A 2018;115:E6526–35.
39. Tamayo LI, Vidaurre T, Navarro Vásquez J, Casavilca S, Aramburu Palomino JI, Calderon M, et al. Breast cancer subtype and survival among Indigenous American women in Peru. PLoS One 2018;13:e0201287.
40. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.
41. Benefield HC, Zabor EC, Shan Y, Allott EH, Begg CB, Troester MA. Evidence for etiologic subtypes of breast cancer in the Carolina Breast Cancer Study. Cancer Epidemiol Biomarkers Prev 2019;28:1784–91.
42. Rey-Vargas L, Sanabria Salas MC, Fejerman L, Serrano-Gómez SJ. Risk factors for triple negative breast cancer among Latina women. Cancer Epidemiol Biomarkers Prev 2019;28:1771–83.
43. Shieh Y, Scott CG, Jensen MR, Norman AD, Bertrand KA, Pankratz VS, et al. Body mass index, mammographic density, and breast cancer risk by estrogen receptor subtype. Breast Cancer Res 2019;21:48.
44. Milne RL, Kuchenbaecker KB, Michailidou K, Beesley J, Kar S, Lindström S, et al. Identification of ten variants associated with risk of estrogen-receptor-negative breast cancer. Nat Genet 2017;49:1767–78.
45. Purrington KS, Slager S, Eccles D, Yannoukakos D, Fasching PA, Miron P, et al. Genome-wide association study identifies 25 known breast cancer susceptibility loci as risk factors for triple-negative breast cancer. Carcinogenesis 2014;35:1012–19.
46. Srur-Rivero N, Cartin-Brenes M. Breast cancer characteristics and survival in a Hispanic population of Costa Rica. Breast Cancer 2014;8:103–8.
47. Miller KD, Goding Sauer A, Ortiz AP, Fedewa SA, Pinheiro PS, Tortolero-Luna G, et al. Cancer statistics for Hispanics/Latinos, 2018. CA Cancer J Clin 2018;68:425–45.
48. Vallejos CS, Gómez HL, Cruz WR, Pinto JA, Dyer RR, Velarde R, et al. Breast cancer classification according to immunohistochemistry markers: subtypes and association with clinicopathologic variables in a Peruvian hospital database. Clin Breast Cancer 2010;10:294–300.
49. Fejerman L, Ziv E. Population differences in breast cancer severity. Pharmacogenomics 2008;9:323–33.
50. Homburger JR, Moreno-Estrada A, Gignoux CR, Nelson D, Sanchez E, Ortiz-Tello P, et al. Genomic insights into the ancestry and demographic history of South America. PLOS Genet 2015;11:e1005602. DOI: 10.1371/journal.pgen.1005602.
51. Serrano-Gómez SJ, Sanabria-Salas MC, Garay J, Baddoo MC, Hernández-Suarez G, Mejía JC, et al. Ancestry as a potential modifier of gene expression in breast tumors from Colombian women. PLoS One 2017;12:1–21.
52. Su Y, Jiang Y, Sun S, Yin H, Shan M, Tao W, et al. Effects of HER2 genetic polymorphisms on its protein expression in breast cancer. Cancer Epidemiol 2015;39:1123–7.
53. Caswell JL, Camarda R, Zhou AY, Huntsman S, Hu D, Brenner SE, et al. Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors. Hum Mol Genet 2015;24:7421–31.
54. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, et al. Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput Biol 2005;1:e65. DOI: 10.1371/journal.pcbi.0010065.
55. Wilson JRF, Bateman AC, Hanson H, An Q, Evans G, Rahman N, et al. A novel HER2-positive breast cancer phenotype arising from germline TP53 mutations. J Med Genet 2010;47:771–4.
56. Melhem-Bertrandt A, Bojadzieva J, Ready KJ, Obeid E, Liu DD, Gutierrez-Barrera AM, et al. Early onset HER2-positive breast cancer is associated with germline TP53 mutations. Cancer 2012;118:908–13.
57. Ha SM, Chae EY, Cha JH, Kim HH, Shin HJ, Choi WJ. Association of BRCA mutation types, imaging features, and pathologic findings in patients with breast cancer with BRCA1 and BRCA2 mutations. Am J Roentgenol 2017;209:920–8.
58. Chen H, Wu J, Zhang Z, Tang Y, Li X, Liu S, et al. Association between BRCA status and triple-negative breast cancer: a meta-analysis. Front Pharmacol 2018;9:909.
59. Daemen A, Manning G. HER2 is not a cancer subtype but rather a pan-cancer event and is highly enriched in AR-driven breast tumors. Breast Cancer Res 2018;20:8.
60. Annual Estimates of the Resident Population by Sex, Age, Race, and Hispanic Origin for the United States and States: April 1, 2010 to July 1, 2014. Source: U.S. Census Bureau, Population Division. Release Date: June 2015. Available from: data.census.gov.
61. Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature 2016;538:161–4.