Abstract
Background: Neuroblastoma is an often fatal pediatric cancer more frequent in European-American than African-American children. African-American children, however, are at higher risk for the more severe form of neuroblastoma and have worse overall survival than European-American children. Genome-wide association studies (GWAS) have identified several single-nucleotide polymorphisms (SNP) associated to neuroblastoma in children of European descent. Knowledge of their association to neuroblastoma in African-American children is still lacking.
Methods: We genotyped and imputed SNPs located in three gene regions reported to be associated to neuroblastoma in children of European descent, and tested them for association in 390 African-American patients with neuroblastoma compared with 2,500 healthy, ethnically matched controls.
Results: SNPs in the BARD1 gene region show a similar pattern of association to neuroblastoma in African-American and European-American children. The more restricted extent of linkage disequilibrium in the African-American population suggests a smaller candidate region for the putative causal variants than previously reported. Limited association was observed at the other two gene regions tested, including LMO1 in 11p15 and FLJ22536 in 6p22.
Conclusions: Common BARD1 SNPs affect risk of neuroblastoma in African-Americans. The role of other SNPs associated to neuroblastoma in children of European descent could not be confirmed, possibly due to different patterns of linkage disequilibrium or limited statistical power to detect association to variants with small effect on disease risk. Extension of GWAS to populations of African descent is important to confirm their results and validity beyond the European populations and can help to refine the location of the putative causal variants. Cancer Epidemiol Biomarkers Prev; 21(4); 658–63. ©2012 AACR.
Introduction
Neuroblastoma is an important pediatric cancer with an age-adjusted incidence rate in Americans of European descent of approximately 11.5 per million (1). Neuroblastoma is reported to be rarer among African-American children, with an age-adjusted incidence of 8.5 per million (1). Limited data exist on incidence of neuroblastoma in Africa, but reports suggest it is similar to that of African-Americans or lower (1). The OR of neuroblastoma by parental ethnic origin relative to European-American children was only 0.74 [95% confidence interval (CI), 0.56–0.96] for children with both African-American parents (2). However, a recent study has shown that African-American children in the United States are more likely to have the high-risk form of the disease than European-American children (57% vs. 44%) and have associated lower overall survival probability (67% vs. 75%; ref. 3). According to the Surveillance, Epidemiology, and End Results database (4), for ages less than 1 year the age-adjusted incidence is 58 per million in European-Americans (95% CI, 52–64) versus only 30 per million in African-Americans (95% CI, 21–42); however, incidence rates in the 2 ethnic groups are more similar for older ages (20 vs. 21 for ages 1–4; 4 vs. 3 for ages 5–9; and 1 vs. 2 for ages 10–14). Older age at diagnosis is an adverse prognostic factor in neuroblastoma (5).
Genetic risk factors may contribute to disparities in cancer prevalence and outcome. In recent years, findings from case–control genome-wide association studies (GWAS; refs. 6–10) and family-based linkage analysis (11) have improved our understanding of genetic susceptibility to neuroblastoma. Individuals of European descent constitute the majority of patients with neuroblastoma in the United States, and so far the GWAS of neuroblastoma have been limited to this ethnic group. Here, we have obtained genome-wide single-nucleotide polymorphism (SNP) genotyping data on a cohort of African-American patients with neuroblastoma collected by the Children's Oncology Group and have used this information in a case–control study to evaluate whether the same genes and SNPs identified in the European-American studies affect risk of neuroblastoma in African-American children. Our results show that of all the risk variants identified so far, only SNPs of BARD1 unequivocally have an effect on neuroblastoma risk in African-American children. Whether this is due to difference in genetic susceptibility or limited power to detect small genetic effects remains to be determined.
Materials and Methods
Patients and controls
DNA samples and clinical information were available for 390 African-American patients with neuroblastoma from the Children's Oncology Group, with clinical and biologic annotation as reported previously (6–10). For this study, subjects eligible for inclusion were African-Americans based on self-reported ethnicity. A total of 2,500 control samples were selected on the basis of self-reported African-American ethnicity from the Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (Philadelphia, PA).
Genotyping and quality control filters
Genome-wide SNP genotype data from 390 patients with neuroblastoma and 2,500 disease-free control subjects were obtained at the CAG using the Illumina HumanHap 550 (243 cases and 1,875 controls) and 610 Quad (147 cases and 625 controls) SNP chips as previously described (9). Only SNPs that were included in both chips were evaluated further.
SNPs were excluded from analysis if they showed deviation from Hardy–Weinberg equilibrium with a P value less than 10⁻⁴ in controls or 10⁻⁷ in cases, a genotype yield of less than 95%, a minor allele frequency (MAF) of less than 1%, or a difference in missing rates between cases and controls with a P value less than 10⁻⁴. This filtering resulted in a total of 504,535 autosomal SNPs available for the subsequent analyses. Among these, we selected all the SNPs included in 3 genes showing association to neuroblastoma in previous GWAS of European-Americans, namely, FLJ22536, BARD1, and LMO1 (6, 7, 9). We also included all the extragenic SNPs located in the regions extending 10 kb on either side of each candidate gene, or as far as the most distant SNP having r² ≥ 0.5 in the HapMap CEU group with the neuroblastoma-associated SNPs, whichever was larger. This brought the total number of genotyped SNPs to be tested for association to 201 (Supplementary Table S1). Coverage of common variants in the candidate regions provided by these SNPs was estimated using HapMap phase II data in the CEU and YRI populations using the Tagger option of Haploview (12) with pairwise tagging at r² = 0.8.
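The SNP exclusion criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the Hardy–Weinberg check uses a simple 1-df chi-square test (the source does not state which test was applied), MAF is computed on pooled cases and controls as an assumption, and the differential-missingness filter is omitted for brevity.

```python
import math

def allele_freq(n_aa, n_ab, n_bb):
    """Frequency of allele A from genotype counts (AA, AB, BB)."""
    return (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))

def hwe_p_value(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = allele_freq(n_aa, n_ab, n_bb)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(case_counts, control_counts, n_cases, n_controls):
    """Apply the yield, MAF, and HWE thresholds quoted in the text.

    case_counts / control_counts: (AA, AB, BB) counts of called genotypes.
    n_cases / n_controls: number of samples, including uncalled genotypes.
    """
    called = sum(case_counts) + sum(control_counts)
    if called / (n_cases + n_controls) < 0.95:        # genotype yield < 95%
        return False
    pooled = [a + b for a, b in zip(case_counts, control_counts)]
    p = allele_freq(*pooled)
    if min(p, 1.0 - p) < 0.01:                        # MAF < 1% (pooled, assumed)
        return False
    if hwe_p_value(*control_counts) < 1e-4:           # HWE in controls
        return False
    if hwe_p_value(*case_counts) < 1e-7:              # HWE in cases
        return False
    return True
```

The thresholds are deliberately asymmetric, as in the text: a much stricter cutoff is used in cases because a true disease association can itself distort genotype proportions among affected individuals.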
We did not include in our analysis SNPs from the regions that were identified as neuroblastoma associated in the low-risk patients only (including the genes DUSP12, DDX4, IL31RA, and HSD17B12; ref. 10) because of the small size of this group in our sample (97 patients).
Before statistical analysis, individual data were filtered on the basis of standard quality control measures, including call rates, discrepancy between reported sex and X chromosome marker heterozygosity, and cryptic relatedness. A total of 34 samples (25 case subjects and 9 control subjects) had genotype yields of less than 95% and were removed, leaving 365 cases and 2,491 controls available for analysis. Power to detect association in this sample given varying risk allele frequency (0.1–0.9) and genetic risk effect (1.2–1.6) was estimated using Quanto (13).
Analysis of stratification and admixture
African-Americans are a recently admixed population with 2 major founder groups. Admixture may introduce a bias in association tests if ancestry proportions are different between cases and controls. Ancestry proportions in our cases and controls were estimated with the software ADMIXTURE (14) and assuming 2 founder populations. We also evaluated stratification in our case–control sample by multidimensional scaling (MDS) using the procedure implemented in PLINK (15) and a subset of independent SNPs (r² < 0.01). Data from our cases and controls were compared with those obtained from the HapMap in the 4 major populations (CEU, CHB, JPT, and YRI).
Association analysis
To account for possible differences in substructure between cases and controls due to varying levels of admixture, we tested SNPs for association with neuroblastoma by logistic regression using the proportion of African ancestry estimated by ADMIXTURE as covariate. We also tested for association using the stratified Cochran–Mantel–Haenszel (CMH) test implemented in PLINK on the basis of the clusters identified by the MDS analysis (15). Results of the association tests with the 2 methods were almost identical and only results of logistic regression are presented in detail (CMH results available in Supplementary Table S1).
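As an illustration of the stratified test mentioned above, the following Python sketch computes a Cochran–Mantel–Haenszel statistic and the Mantel–Haenszel common OR from one 2×2 allele-count table per cluster. It is a textbook formulation without continuity correction, written under the assumption that each stratum corresponds to one MDS cluster; it is not PLINK's code, and the function name is hypothetical.

```python
import math

def cmh_test(strata):
    """Cochran-Mantel-Haenszel test over 2x2 tables, one per stratum.

    Each table is (a, b, c, d):
      a = minor alleles in cases,    b = major alleles in cases,
      c = minor alleles in controls, d = major alleles in controls.
    Returns (chi2, p_value, mantel_haenszel_or).
    """
    diff = 0.0      # sum of observed minus expected counts for cell a
    var = 0.0       # sum of hypergeometric variances of cell a
    or_num = 0.0    # numerator of the Mantel-Haenszel OR
    or_den = 0.0    # denominator of the Mantel-Haenszel OR
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if var == 0 or or_den == 0:
        raise ValueError("strata are not informative")
    chi2 = diff * diff / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square, 1 df
    return chi2, p_value, or_num / or_den
```

Because expected counts are computed within each stratum, differences in allele frequency between clusters do not contribute to the statistic, which is the property that makes the test robust to stratification.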
To correct for the different number of SNPs tested in each candidate region, while taking into consideration the correlation between SNPs in any given region due to linkage disequilibrium (LD), we estimated empirical P values by permutation of case–control status separately in each gene, using the EMP2 statistic from the mperm procedure implemented in PLINK (15). P values reported in our tables are asymptotic P values from the corresponding tests and empirical P values from 10,000 permutations. Because all 3 genes have been previously reported to be associated to neuroblastoma in multiple independent data sets (6, 7, 9), we did not correct for the number of genes tested.
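The region-wise permutation correction can be sketched as a max-statistic procedure: case–control labels are shuffled, the largest test statistic across all SNPs in the region is recorded for each shuffle, and each observed statistic is compared with that distribution. The sketch below is an assumption-laden illustration rather than PLINK's implementation; in particular it uses a simple allelic chi-square instead of the logistic regression used in the study, and all function names are hypothetical.

```python
import random

def allelic_chi2(genotypes, is_case):
    """1-df allelic chi-square; genotypes are minor-allele counts (0, 1, 2)."""
    case_alt = sum(g for g, c in zip(genotypes, is_case) if c)
    ctrl_alt = sum(g for g, c in zip(genotypes, is_case) if not c)
    n_case = 2 * sum(1 for c in is_case if c)
    n_ctrl = 2 * sum(1 for c in is_case if not c)
    total = n_case + n_ctrl
    alt = case_alt + ctrl_alt
    if alt == 0 or alt == total:
        return 0.0  # monomorphic SNP carries no signal
    cells = (
        (case_alt, n_case, alt),
        (n_case - case_alt, n_case, total - alt),
        (ctrl_alt, n_ctrl, alt),
        (n_ctrl - ctrl_alt, n_ctrl, total - alt),
    )
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / total
        chi2 += (observed - expected) ** 2 / expected
    return chi2

def max_stat_permutation(snps, is_case, n_perm=1000, seed=0):
    """Empirical P values corrected for all SNPs in one region.

    snps: list of genotype vectors (one per SNP), all in the same sample order.
    Returns one corrected P value per SNP, computed as (R + 1) / (N + 1),
    where R counts permutations whose maximum statistic reaches the observed one.
    """
    rng = random.Random(seed)
    observed = [allelic_chi2(g, is_case) for g in snps]
    exceed = [0] * len(snps)
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case-control status, keep genotypes fixed
        max_stat = max(allelic_chi2(g, labels) for g in snps)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    return [(r + 1) / (n_perm + 1) for r in exceed]
```

Keeping the genotype vectors intact while shuffling labels preserves the LD between SNPs, so correlated SNPs are not over-penalized the way they would be under a Bonferroni correction.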
Regional association and LD plots were generated using SNAP (16).
Imputation
Imputation was carried out with IMPUTE2 (17) using the complete 1000 Genomes Project Phase 1 interim data as reference (18). We followed the practice recommended by the developers of IMPUTE2 to use a cosmopolitan reference population rather than arbitrarily select ancestral populations for our individuals (19). All SNPs contained in the reference data and included in the 3 gene regions defined above were imputed. Following imputation, SNPs with MAF < 1% or IMPUTE2 info quality score <0.8 were removed. To correctly account for uncertainty in the data resulting from the imputation process, the remaining SNPs were tested for association with neuroblastoma using the frequentist association test under the additive model implemented in SNPTEST (20), including the estimated proportion of African admixture as covariate. All genotyped SNPs were also tested with the same test to compare results with the imputed SNPs.
Results
Power and SNP tagging
Power to detect association in our study was estimated to be >80% for intermediate risk allele frequencies (0.2–0.7) and ORs greater than 1.4 (Supplementary Fig. S1A).
Percentage of HapMap release 24 SNP coverage for the individual gene regions provided by the SNPs included in our analysis varied from 49% for FLJ22536 to 64% for BARD1 in the YRI population and from 70% for LMO1 to 77% for FLJ22536 and BARD1 in the CEU population (Supplementary Fig. S1B).
Stratification and admixture
MDS analysis of cases and controls together with the 4 major HapMap populations showed that our samples cluster along a continuum between the CEU and the YRI populations as expected (Supplementary Fig. S2A and S2B). Genome-wide estimates of African ancestry assuming 2 founder populations had a mean proportion of 0.76 ± 0.23 in cases and 0.76 ± 0.19 in controls. There were some slight differences between the 2 groups with more cases than controls at the extremes of the distribution (0%–10% and 80%–90% African ancestry; Supplementary Fig. S2C).
Association analysis of genotyped SNPs
On the basis of the genome-wide data, we estimated a genomic inflation factor (GIF) of the logistic regression test of 1.02. In contrast, the GIF of the 1 degree-of-freedom allelic test was 1.19. These data suggest that the procedures implemented to control population stratification were effective in reducing any potential inflation in type 1 error.
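For readers unfamiliar with the GIF, a minimal sketch of the usual median-based estimator follows. It assumes the input is a list of 1-degree-of-freedom chi-square statistics, one per SNP; the function name is hypothetical and the source does not state which estimator was used.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its null expectation."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1, such as the 1.02 reported for logistic regression, indicates that the bulk of the test statistics follows the null distribution, whereas a value such as 1.19 indicates systematic inflation.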
Results for the most significant SNPs reported in previous GWAS of European-American patients (or index SNPs) are reported in Table 1. In the African-American case series, 2 SNPs at the BARD1 locus, rs7587476 and rs6435862, showed significant P values even after multiple test correction (permutation P = 3 × 10⁻⁴ and 5 × 10⁻⁴, respectively), and the other 3 BARD1 index SNPs reached nominal significance. For all 5 BARD1 index SNPs, the direction of the association was the same as the one observed in the European-American patients, with similar ORs in the range of 1.3 to 1.5 (8). None of the index SNPs in the other genes reached even nominal levels of significance (all nominal P values >0.05).
Gene | Chr | SNP (bp) | Alleles (m/M) | MAF cases | MAF controls | P | Emp-p | OR (95% CI) |
---|---|---|---|---|---|---|---|---|
FLJ22536 | 6 | rs4712653 (22,233,943) | T/C | 0.19 | 0.19 | 0.56 | 1.00 | 0.94 (0.77–1.15) |
FLJ22536 | 6 | rs9295536 (22,239,908) | C/A | 0.19 | 0.19 | 0.30 | 1.00 | 0.90 (0.73–1.10) |
FLJ22536 | 6 | rs6939340 (22,247,983) | A/G | 0.17 | 0.18 | 0.77 | 1.00 | 0.97 (0.79–1.19) |
BARD1ᵃ | 2 | rs3768716 (215,344,039) | G/A | 0.10 | 0.07 | 0.007 | 0.14 | 1.45 (1.11–1.90) |
BARD1 | 2 | rs17487792 (215,351,745) | T/C | 0.10 | 0.06 | 0.009 | 0.19 | 1.45 (1.10–1.93) |
BARD1 | 2 | rs7587476 (215,362,132) | T/C | 0.42 | 0.34 | 2.4 × 10⁻⁶ | 0.0003 | 1.47 (1.25–1.72) |
BARD1 | 2 | rs6712055 (215,375,149) | C/T | 0.22 | 0.17 | 0.006 | 0.14 | 1.31 (1.08–1.59) |
BARD1 | 2 | rs6435862 (215,380,791) | G/T | 0.34 | 0.26 | 1.8 × 10⁻⁵ | 0.0005 | 1.44 (1.22–1.70) |
LMO1 | 11 | rs4758051 (8,195,215) | A/G | 0.20 | 0.18 | 0.77 | 1.00 | 1.03 (0.84–1.27) |
LMO1 | 11 | rs10840002 (8,199,602) | G/A | 0.27 | 0.25 | 0.79 | 1.00 | 1.03 (0.85–1.23) |
LMO1 | 11 | rs110419 (8,209,429) | G/A | 0.22 | 0.23 | 0.18 | 0.99 | 0.88 (0.72–1.07) |
LMO1 | 11 | rs204938 (8,234,773) | T/C | 0.34 | 0.31 | 0.27 | 1.00 | 1.10 (0.93–1.30) |
Abbreviations: Emp-p, empirical P value from permutation test; m/M, minor allele/major allele; and OR, odds ratio for the minor allele.
ᵃSNP rs6715570 reported by Capasso and colleagues (7) was removed from our analysis for missing rate greater than 5% in cases.
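As a rough sanity check on Table 1, a crude (unadjusted) allelic OR with a Woolf-type 95% CI can be computed from the rounded allele frequencies. The sketch below is only an approximation: it assumes all 365 cases and 2,491 controls were genotyped at the SNP, uses MAFs rounded to 2 decimals, and ignores the ancestry covariate, so it is not expected to reproduce the logistic regression estimates exactly. The function name is hypothetical.

```python
import math

def crude_allelic_or(maf_cases, maf_controls, n_cases, n_controls, z=1.96):
    """Unadjusted allelic OR and Woolf 95% CI from allele frequencies."""
    a = maf_cases * 2 * n_cases              # minor alleles in cases
    b = (1 - maf_cases) * 2 * n_cases        # major alleles in cases
    c = maf_controls * 2 * n_controls        # minor alleles in controls
    d = (1 - maf_controls) * 2 * n_controls  # major alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# rs7587476: MAF 0.42 in cases and 0.34 in controls (Table 1).
odds_ratio, lower, upper = crude_allelic_or(0.42, 0.34, 365, 2491)
print(f"crude OR = {odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

For rs7587476 the crude estimate comes out near 1.4, in the same direction and of similar size as the adjusted OR of 1.47 reported in Table 1.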
Because our previous studies in European-Americans had found that association signals were stronger when analysis was restricted to high-risk patients despite the smaller sample size (6, 7, 9), we repeated the association analysis in this subgroup (180 cases) against all controls. Of the 2 significant index SNPs of BARD1 from the analysis of all patients, one became more significant (rs7587476, permutation P < 10⁻⁴), and the other less (rs6435862, permutation P = 10⁻³). All other BARD1 index SNPs were not significant after multiple test correction (permutation P > 0.05; Supplementary Table S1). One index SNP in LMO1 reached nominal significance in the high-risk group (rs204938, P = 0.04). However, the direction of the association was the opposite of that observed in the European-Americans, where the G allele showed association to neuroblastoma (9), rather than the A allele as in the African-Americans. All the other index SNPs were still not significant (all nominal P values >0.05; Supplementary Table S1).
We then asked whether other SNPs in the candidate regions, different from those reported as most significant in the European-American patients, reached statistical significance in the African-American patients. One BARD1 SNP (in addition to those already reported in Table 1) showed significant association (rs16852804, permutation P = 0.02). When we looked at the high-risk subgroup separately, one additional BARD1 SNP (rs7599060, permutation P = 0.02) and one SNP in LMO1 (rs4237769, permutation P = 0.001) reached statistical significance. No other SNP showed statistical significance after multiple test correction (Supplementary Table S1). No additional signals of association were detected when analysis was further extended to include all SNPs with r² > 0.1 with the index SNPs (all P > 0.05).
Overall, 3 SNPs of BARD1 reached statistical significance after correction for the number of SNPs tested, and 10 more had nominal P values less than 0.05 (Supplementary Table S1 and Supplementary Fig. S3A). To test whether these may represent multiple independent signals of association, we repeated the logistic regression test conditional on the most significant SNP, rs7587476. None of the SNPs was significant after multiple test correction (permutation P values >0.05). The nominal P value for rs6435862 went from 0.00002 to 0.04 and for rs16852804 from 0.0007 to 0.005. All other nominal P values were greater than 0.05 (Supplementary Fig. S3B).
To examine the extent of the associated region, we plotted the r² values relative to rs7587476 for all the SNPs in a 100-kb region around it against their genomic location. We chose rs7587476 as the reference SNP because it is not only the most strongly neuroblastoma-associated SNP in this study but also the most significant BARD1 SNP in our most recent analysis of European-Americans (9). Using data from the 1000 Genomes Project Pilot 1 in the CEU and YRI populations and the web-based software SNAP (16), we determined the size of the region which includes SNPs with r² > 0.5 with rs7587476 (Supplementary Fig. S4). In the YRI population, this region extends from 215,348,641 to 215,367,140 bp and comprises introns 2 to 4 and exons 3 to 4 of BARD1. In the CEU population, this region extends from 215,344,039 to 215,457,501 bp, and thus for an additional 4.6 kb proximally and 90.4 kb distally.
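The region sizes quoted above follow directly from the coordinates, and the LD measure itself has a simple closed form. The sketch below reproduces the arithmetic and shows the standard pairwise r² formula from haplotype and allele frequencies; the variable and function names are hypothetical, and the r² helper is a generic textbook definition rather than SNAP's code.

```python
# Boundaries (bp) of the region with r² > 0.5 relative to rs7587476.
YRI_REGION = (215_348_641, 215_367_140)
CEU_REGION = (215_344_039, 215_457_501)

def length_kb(region):
    """Length of a (start, end) interval in kilobases."""
    start, end = region
    return (end - start) / 1000

def r_squared(p_ab, p_a, p_b):
    """Pairwise LD r² from haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

proximal_extra_kb = (YRI_REGION[0] - CEU_REGION[0]) / 1000  # 4.602, reported as 4.6 kb
distal_extra_kb = (CEU_REGION[1] - YRI_REGION[1]) / 1000    # 90.361, reported as 90.4 kb

print(length_kb(YRI_REGION), length_kb(CEU_REGION))
print(proximal_extra_kb, distal_extra_kb)
```

On these coordinates the YRI interval spans about 18.5 kb against about 113.5 kb in CEU, which is the roughly sixfold reduction in candidate region size that motivates the comparison.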
Association analysis of imputed SNPs
Following imputation using reference data available through the 1000 Genomes Project Phase 1, the number of imputed SNPs that passed quality control filters (MAF ≥1% and IMPUTE2 info score ≥0.8) was 948 for the BARD1 region, 1,946 for the FLJ22536 region, and 215 for the LMO1 region (Supplementary Table S2).
One BARD1-imputed SNP, rs35933323, located in intron 1, had a slightly smaller P value than the most significant genotyped SNP, rs7587476 (P = 1 × 10⁻⁶ and P = 2 × 10⁻⁶, respectively), and 8 additional imputed SNPs had smaller P values than rs6435862, the second most significant genotyped SNP (P < 1.6 × 10⁻⁵; Supplementary Fig. S5A). All these SNPs are intragenic to BARD1 and have r² values with rs7587476 ranging from 0.51 to 1 in the CEU samples and from 0.24 to 1 in the YRI samples.
In the FLJ22536 gene region, the most significant imputed SNP was rs9460668, with a P value of 0.00015. An additional 82 imputed SNPs had P values smaller than that of the most significant genotyped SNP, rs9466182, which had a P value of 0.013, and of these, 50 had a P value <0.01 (Supplementary Fig. S5B). None of the imputed SNPs in the LMO1 region had smaller P values than rs4237769, the most significant genotyped SNP (P value = 0.0054; Supplementary Fig. S5C).
Discussion
SNPs in the BARD1 gene region associated to neuroblastoma in European-American patients show similar, strongly significant association in African-American patients. In particular, among the genotyped SNPs, most of the association seems to be explained by rs7587476, located in intron 3 of BARD1, with some residual association signal detected by SNPs located in the first intron (Supplementary Fig. S3). Analysis of imputed SNPs identified additional SNPs strongly associated to neuroblastoma in the same intragenic region, but limited association outside the gene's boundary (Supplementary Fig. S5A). In contrast, the association in European-Americans extends further to the extragenic region 5′ to BARD1 (9), possibly due to the more extensive LD around the associated SNPs (Supplementary Fig. S4). This information may be helpful in locating the causal BARD1 variants.
Besides the BARD1 SNPs, the only other genotyped SNP that remained significant after correction for number of SNPs tested in a given gene was rs4237769 in LMO1 (permutation P = 0.001), when analysis was restricted to patients from the high-risk group. Five other SNPs in or around LMO1 had nominal P values <0.05 in the high-risk group, including rs204938, previously reported to be associated to neuroblastoma (9). The second most significant LMO1 SNP in the high-risk group in this study, rs3794012 (nominal P = 0.005), had a P value of 3 × 10⁻⁵ in Wang and colleagues (9). However, the rare G allele of rs3794012 is associated to neuroblastoma in the African-American patients, rather than the common A allele as observed in the European-American cases.
Interestingly, we did not detect a clear signal of association with SNPs of the FLJ22536 gene region, which show the most significant association to neuroblastoma in European-Americans (6, 9). The most significant FLJ22536 genotyped SNPs in our study were rs9466182 with a nominal P value of 0.01 in the whole group of cases and rs1207774 with a nominal P value of 0.005 in the high-risk group. However, these are hardly significant when considering the large number of SNPs tested in this gene (136 genotyped SNPs). The different structure and lower LD in the African-American population, together with the limited power of this study for such a rare condition, may have prevented us from detecting association in FLJ22536. On the basis of HapMap CEU and YRI data, among the regions studied, coverage of FLJ22536 variation in Africans provided by the genotyped SNPs is the lowest (49%) and the most divergent from that of European-Americans (77%; Supplementary Fig. S1B). Analysis of more than 1,900 imputed SNPs detected a more significant signal of association with rs9460668, with a nominal P value of 0.00015 (Supplementary Fig. S5B). Although there are no LD data available for rs9460668, this SNP is located approximately 400 kb away from the FLJ22536 SNPs associated to neuroblastoma in European-Americans, and is unlikely to be in LD with them. Additional analyses with direct genotyping in this and other populations will be necessary to confirm whether this finding represents a real association or a false positive due to the large number of SNPs tested.
This study shows the difficulty in replicating association in African-Americans even for SNPs that are confirmed and show strong significant association in multiple European or European-American populations. Similar conclusions were reported for other well-documented SNP associations following GWAS in breast and prostate cancer (21, 22). Several factors may contribute to this discrepancy, including variation in LD patterns and allele frequencies among ethnic groups, allelic and genetic heterogeneity, and low power to detect association to variants with small effect. A more detailed analysis genotyping additional SNPs not included in the standard GWAS chips and a larger sample size may be necessary to draw conclusions about the role of these genes in susceptibility to neuroblastoma in African-Americans. A larger sample size would also make it possible to evaluate association separately in subgroups stratified by admixture proportions, such as those with a preponderance of European or African ancestry.
Nonetheless, attempting replication of known association findings in populations of African descent is important not only to establish their validity in these populations but also because it can reduce the size of the candidate regions thanks to the more limited LD structure in these populations (23), as shown here in the case of BARD1. Association in the European-American group extended to SNPs located in the extragenic region 5′ to BARD1 as far as 75 kb from its initiation codon. On the basis of our study, a much smaller candidate region for location of the putative causal variants can be identified, limited to the intragenic region between introns 1 and 4 (Supplementary Fig. S3). While our sample size may still be too small and the gene coverage too low to detect new significant associations which may be specifically relevant to African-American patients, our results for BARD1 clearly indicate the value of extending gene mapping studies to well-characterized samples from this ethnic group.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: H. Hakonarson, J.M. Maris, M. Devoto
Development of methodology: S.J. Diskin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H. Hakonarson, J.M. Maris
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): V. Latorre, S.J. Diskin, H. Zhang, H. Hakonarson, J.M. Maris, M. Devoto
Writing, review, and/or revision of the manuscript: V. Latorre, S.J. Diskin, H. Hakonarson, J.M. Maris, M. Devoto
Study supervision: J.M. Maris, M. Devoto
Acknowledgments
The authors thank the Children's Oncology Group and its investigators for providing neuroblastoma specimens.
Grant Support
This work was supported in part by NIH grants R01-CA124709, U01-CA98543 to the Children's Oncology Group, the Giulio D'Angio Endowed Chair, the Alex's Lemonade Stand Foundation, Andrew's Army Foundation, the SuperJake Foundation, the Abramson Family Cancer Research Institute (all to J.M. Maris); K99-CA151869 (to S.J. Diskin); and the CAG (to H. Hakonarson) at the Research Institute of CHOP.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.