Abstract
Background: Genome-wide association studies of European and East Asian populations have identified lung cancer susceptibility loci on chromosomes 5p15.33, 6p22.1-p21.31, and 15q25.1. We investigated whether these regions contain lung cancer susceptibility loci in African-Americans and refined previous association signals by using the reduced linkage disequilibrium observed in African-Americans.
Methods: 1,308 African-American cases and 1,241 African-American controls from 3 centers were genotyped for 760 single-nucleotide polymorphisms (SNP) spanning 3 regions, and additional SNP imputation was carried out. Associations between polymorphisms and lung cancer risk were estimated using logistic regression, stratified by tumor histology where appropriate.
Results: The strongest associations were observed on 15q25.1 in/near CHRNA5, including a missense substitution [rs16969968: OR, 1.57; 95% confidence interval (CI), 1.25–1.97; P, 1.1 × 10−4] and variants in the 5′-UTR. Associations on 6p22.1-p21.31 were histology specific and included a missense variant in BAT2 associated with squamous cell carcinoma (rs2736158: OR, 0.64; 95% CI, 0.48–0.85; P, 1.82 × 10−3). Associations on 5p15.33 were detected near TERT, the strongest of which was rs2735940 (OR, 0.82; 95% CI, 0.73–0.93; P, 1.1 × 10−3). This association was stronger among cases with adenocarcinoma (OR, 0.75; 95% CI, 0.65–0.86; P, 8.1 × 10−5).
Conclusions: Polymorphisms in 5p15.33, 6p22.1-p21.31, and 15q25.1 are associated with lung cancer in African-Americans. Variants on 5p15.33 are stronger risk factors for adenocarcinoma, and variants on 6p21.33 are associated only with squamous cell carcinoma.
Impact: Results implicate the BAT2, TERT, and CHRNA5 genes in the pathogenesis of specific lung cancer histologies. Cancer Epidemiol Biomarkers Prev; 22(2); 251–60. ©2012 AACR.
Introduction
Recent genome-wide association studies (GWAS) have identified lung cancer susceptibility loci on chromosomes 5p15.33 (1–5), 6p22.1-p21.31 (1–3, 6), and 15q25.1 (1–3, 6, 7). Thus far, lung cancer GWAS have been conducted only on European (1–4, 6, 7) or East Asian (5) populations, despite the fact that African-Americans have higher lung cancer incidence rates and lower lung cancer survival rates than other ethnic groups in the United States (8). Genetic effects are likely to differ in African-Americans because the previously associated single-nucleotide polymorphisms (SNP) have different allele frequencies and linkage patterns across populations (9, 10). As a result, how genetic variation in these 3 regions influences lung cancer susceptibility among African-Americans remains poorly understood.
The associated region on 5p15.33 contains 2 candidate lung cancer susceptibility genes, telomerase reverse transcriptase (TERT) and cleft lip and palate transmembrane 1-like (CLPTM1L). SNPs in this region have consistently been associated with both lung cancer and lung cancer histology. Specifically, rs2736100 in TERT has been shown to be significantly more common in cases with adenocarcinoma than with other lung cancer histologies (3). There is also evidence from GWAS that 5p15.33 is a susceptibility locus for multiple cancer sites, including the pancreas (11), bladder (12), prostate (13), and brain (14).
The 6p22.1-p21.31 locus is part of the HLA region and is highly polymorphic. Association studies of lung cancer susceptibility have found inconsistent associations in this region, with several GWAS identifying significant associations (1–3, 6) and several follow-up replication studies failing to do so (10, 15, 16). Notably, multiple risk SNPs identified in European populations are nonpolymorphic in East Asian populations (10). Because previously associated SNPs in the HLA region may represent European-specific susceptibility loci, studies conducted in other ethnic groups should assay additional SNPs in this region to capture genetic variation not seen in Europeans.
The 15q25.1 region associated with lung cancer contains 6 genes, 3 of which (CHRNA5, CHRNA3, and CHRNB4) are nicotinic receptor subunit genes. SNPs in these genes have been associated with smoking behavior, with risk alleles conferring a propensity to smoke with greater frequency and intensity (17–19). This would suggest an indirect effect of genetic variation at 15q25.1 on lung cancer risk, where variants influence cancer risk through a causal pathway in which smoking is an obligate intermediate. However, the effects of SNPs in this region on lung cancer risk appear more complex than this. Multiple studies have found that SNPs at 15q25.1 also influence lung cancer risk among never smokers (6, 20). Functional analyses have further shown that 1 of the 6 genes in this region, proteasome α-4 subunit isoform 1 (PSMA4), plays a role in cancer cell proliferation and apoptosis (21). Additional functional studies identified a lung cancer risk SNP on 15q25.1 that increases IREB2 expression (22). Identifying the causal variants at 15q25.1 has been challenging because SNPs in the region are in high linkage disequilibrium (LD) among Europeans (23). Fine mapping of the associated region in African-Americans, where LD is reduced, provides an opportunity to refine the location of lung cancer risk alleles by exploiting the shorter haplotype blocks (24, 25).
To better understand the role of genetic variation in these 3 regions, we conducted a multicenter case-control study of African-Americans to identify genetic variants associated with an increased risk of lung cancer. Samples were genotyped on an Illumina Golden Gate custom panel containing SNPs selected for fine-mapping of previous GWAS hits on chromosomes 5, 6, and 15. The goals of the study were to replicate SNP associations previously detected in European populations, identify novel or African-American-specific lung cancer susceptibility SNPs in these regions, and refine previous association signals by exploiting the reduced LD observed in African-Americans.
Materials and Methods
Study population
A multicenter case-control study was designed to include African-American lung cancer cases and controls from 3 collaborating institutions: The University of California, San Francisco (UCSF; 447 cases and 453 controls), Wayne State University (WSU; 459 cases and 460 controls), and the MD Anderson Cancer Center (MDA; 479 cases and 376 controls). All participating institutions received Institutional Review Board approval, and appropriate written informed consent was obtained from human subjects. All study participants reported being of African-American ethnicity.
UCSF cases and controls were enrolled as part of The Northern California Lung Cancer Study, which has been described in detail previously (26). Cases and controls older than 18 years of age were identified during 2 collection periods, spanning September 1998 to March 2003 and July 2005 to March 2008. Cases were Northern California residents presenting with previously untreated, histologically confirmed lung cancer. Cases in the first accrual period were identified primarily through the Northern California Cancer Center (NCCC) rapid case ascertainment program and the Alta Bates/Summit Hospital. Cases in the second accrual period were identified through both the NCCC and the Kaiser Permanente Medical Care Program (KPMCP). Control participants ascertained in the first accrual period were recruited through 3 sources: random-digit dialing, Health Care Financing Administration records, and community-based recruitment. Controls in the second accrual period were recruited through the KPMCP.
Wayne State University cases were identified through the population-based Metropolitan Detroit Cancer Surveillance System, an NCI-funded SEER registry, as part of the EXHALE study (27). Rapid case ascertainment was used to identify histologically-confirmed cases within several months of diagnosis. African-Americans diagnosed with a first primary lung cancer from November 1, 2005 through June 30, 2010 were recruited for the study. Controls were recruited through community-based methods and were frequency matched on race, sex, and 5-year age group.
MDA cases were recruited from The University of Texas M. D. Anderson Cancer Center and the Michael E. DeBakey VA Medical Center in Houston. All cases with newly diagnosed, histopathologically confirmed lung cancer were eligible. Case-exclusion criteria were previous chemotherapy or radiotherapy or recent blood transfusion. African-American controls were recruited from Houston-area community centers and the Kelsey-Seybold Foundation, Houston's largest multispecialty physician group practice. Controls were matched to the cases on age (±5 years), sex, and African-American ethnicity.
SNP selection
Three regions found in previous GWAS to be associated with an increased risk for lung cancer were selected for fine-mapping in African-American subjects. The custom SNP panel included 120 ancestry-informative markers (AIM) for the calculation of percentage of sub-Saharan African ancestry and 760 SNPs selected to carry out fine-mapping of the 5p15.33, 6p22.1-p21.31, and 15q25.1 regions. These include 138 SNPs on 5p15.33, 356 SNPs on 6p22.1-p21.31, and 266 SNPs on 15q25.1. Markers used in analyses were selected on the basis of one or more of the following criteria: known functional effect on activity of nicotinic acetylcholine receptors; previous association in East Asian or European populations; allele frequency greater than 0.01 in African populations; position across the region; predicted effect on function; r2 value with respect to other markers less than 0.70, as determined by SNP browser version 4.0 (28); inclusion in 1 of 3 previous studies of African-American lung cancer susceptibility (24, 25, 29); or discovery by targeted sequencing (30).
Genotyping
UCSF samples were genotyped at the University of California, San Francisco Genome Center using an Illumina custom panel of 1,536 SNPs. Unamplified genomic DNA samples extracted from whole blood (n = 750) were genotyped together with whole genome-amplified (WGA) blood or buccal DNA samples (n = 150), prepared as previously described (31). Genotypes for unamplified DNA and WGA DNA samples were clustered separately. A GenCall genotype quality threshold of 0.25 was used. Genotype reproducibility was verified with 12 duplicate samples with average concordance of 99.97% (range, 99.68–100%). CEPH trios were genotyped to assess the accuracy of assigned genotype clusters. The average heritability for 9 Parent–Parent–Child trios was 99.88% (range, 99.19–100%). All cluster plots were visually inspected.
Wayne State University samples were genotyped at the Applied Genomics Technology Center (AGTC) at Wayne State University using the same Illumina Golden Gate Custom panel of 1,536 SNPs. All WSU samples were unamplified genomic DNA, extracted from whole blood. Genotype reproducibility was verified with 30 duplicate samples, each with more than 99% concordance. Ten CEPH controls were genotyped and checked for concordance with published HapMap SNP genotypes at loci overlapping with those assayed by the Illumina custom panel. MDA samples were genotyped at the MD Anderson Cancer Center using the same Illumina Golden Gate Custom panel as the other sites. Genotyping was carried out on unamplified genomic DNA derived from peripheral whole blood. Genotype reproducibility was verified with 7 duplicate samples, with concordance ranging from 99.93% to 100%.
For all study sites, samples with genotyping call-rates less than 95% were excluded from analysis. SNPs with genotyping call rates less than 95% in more than 1 study site were excluded from analyses. To exclude poorly genotyped SNPs, any SNP with a Hardy–Weinberg Equilibrium (HWE) P value less than 1.0 × 10−4 in controls, stratified by site, was removed from analysis. All SNP quality control was carried out using Plink v1.07 (32).
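The SNP-level filters described above can be sketched in a few lines. This is only an illustration, not the study's pipeline: the study used Plink v1.07, whereas the sketch below swaps in a simple 1-degree-of-freedom chi-square test for Hardy–Weinberg equilibrium. The function names, the dictionary layout, and the reading that a SNP is dropped if it fails HWE in the controls of any single site are all assumptions made for the example.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Approximate HWE P value (1-df chi-square) from genotype counts.

    Assumption: a chi-square approximation stands in for whatever test
    the QC software applies; it is used here only for illustration.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic SNP: no deviation can be tested
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def snp_passes_qc(call_rate_by_site, control_counts_by_site,
                  min_call_rate=0.95, hwe_threshold=1.0e-4):
    """Apply the call-rate and HWE filters to one SNP (hypothetical helper)."""
    # Exclude SNPs with call rate < 95% in more than 1 study site
    low_sites = sum(1 for r in call_rate_by_site.values() if r < min_call_rate)
    if low_sites > 1:
        return False
    # Assumed reading: fail if HWE P < 1e-4 in the controls of any site
    for counts in control_counts_by_site.values():
        if hwe_chi2_pvalue(*counts) < hwe_threshold:
            return False
    return True
```

Under these assumptions, a SNP with a low call rate at a single site is retained, while a SNP with no heterozygous controls at one site is removed.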
Calculation of % sub-Saharan African ancestry
The genetic structure of African-American subjects was evaluated using Structure v2.3.1 to estimate percentage of membership in 3 distinct founder populations: sub-Saharan African, European, and East Asian (33), where East Asian population ancestry was also used as a surrogate for Amerindian descent. Founder population allele frequencies were defined using SNP data from 102 unlinked (r2 < 0.20) ancestry informative markers (AIM), genotyped in 502 unrelated HapMap individuals (167 Yoruban Africans, 165 Europeans, 84 Chinese, and 86 Japanese; ref. 34). These same AIMs were genotyped in our study participants for use with the Structure program.
Collection of covariates
Cases and controls recruited at all study sites completed interviews conducted by trained interviewers. Data on sex, age, and smoking behaviors were collected as covariates for this study. “Never smokers” were defined as those who had smoked fewer than 100 cigarettes in their lifetimes; “former smokers” were those who had quit smoking more than 1 year before diagnosis (cases) or interview (controls); “current smokers” included those who had quit smoking within the past 12 months. Pack-years for former and current smokers were calculated as the number of years smoked multiplied by the average number of cigarettes smoked per day, divided by 20.
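The smoking definitions above translate directly into code. The following is a minimal sketch; the function and argument names are hypothetical, and treating a missing quit time as "still smoking" is an assumption made for the example.

```python
def pack_years(years_smoked, cigarettes_per_day):
    """Pack-years = years smoked x average cigarettes per day / 20."""
    return years_smoked * cigarettes_per_day / 20.0


def smoking_status(lifetime_cigarettes, years_since_quit=None):
    """Classify a participant as a never, former, or current smoker.

    years_since_quit is measured relative to diagnosis (cases) or
    interview (controls); None is assumed to mean still smoking.
    """
    if lifetime_cigarettes < 100:
        return "never"
    if years_since_quit is not None and years_since_quit > 1:
        return "former"
    # Includes those who quit within the past 12 months
    return "current"
```

For instance, a participant who smoked 30 cigarettes per day for 10 years has 15 pack-years, and one who quit 6 months before interview is still counted as a current smoker.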
Cancer histology was determined using ICD-O codes abstracted from Surveillance Epidemiology and End Results (SEER) data from the California Cancer Registry (UCSF cases) or Detroit Cancer Registry (WSU cases). For MDA cases, histology was determined by extraction from medical records. The following ICD-O groupings were made: adenocarcinoma (ICD-O: 8140, 8230, 8250–8255, 8260, 8310, 8333, 8470, 8480, 8481, 8490, and 8550), squamous cell carcinoma (8052, 8070–8073, 8083, and 8084), and small cell carcinoma (8041–8045).
SNP imputation and association analyses
To refine association peaks from case-control analyses and identify additional risk variants, we carried out SNP imputation in the 5p15.33, 6p22.1-p21.31, and 15q25.1 regions using the Impute2 v2.1.2 software and its standard Markov chain Monte Carlo algorithm using the default settings for targeted imputation (35). All 1000 Genomes Phase I interim release haplotypes were provided as the reference haplotype panel (36). Using a cosmopolitan set of reference haplotypes is currently recommended for imputation, and is especially critical for imputation of recently admixed populations, such as African-Americans (37).
SNPs with imputation quality (info) scores less than 0.70 or posterior probabilities less than 0.90 were excluded to remove poorly imputed SNPs. Any SNP with a minor allele frequency of less than 1% in case subjects was excluded from association tests. All association statistics, both for imputed and for directly genotyped SNPs, were calculated using logistic regression in SNPTEST v2 assuming a log-additive model. A missing-data likelihood score-test was applied to the imputed variants, as this produces standard errors which account for the additional uncertainty inherent in the analysis of imputed genotypes. The effect of individual SNPs on lung cancer risk was calculated while adjusting for sex, age, % African ancestry, % European ancestry, number of pack-years smoked, and study site. Adjustment for current smoking did not affect associations and this covariate was not included in the final models.
Because the 15q25.1 region contains nicotinic receptor gene variants that have been associated with smoking behavior and, therefore, SNPs in this region may influence lung cancer risk by modifying smoking behavior, these SNP associations were also calculated without adjustment for pack-years. All SNP associations were assessed in the full case-control sample, and also stratified by histology (adenocarcinoma vs. controls, squamous cell carcinoma vs. controls, and small-cell lung cancer vs. controls). All associations are for an allelic additive model, adjusted for the indicated covariates, where odds ratios (OR) are for each additional copy of the minor allele.
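Under a log-additive model, the per-allele OR is the exponential of the logistic regression coefficient for the minor-allele count, and its 95% CI follows from the coefficient's standard error. The sketch below shows that relationship and, in reverse, how an approximate standard error and P value can be recovered from a reported OR and CI. It assumes a Wald-type normal approximation, which is not necessarily the test SNPTEST applied (a score test was used for imputed variants), and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def or_with_ci(beta, se, z_crit=Z_95):
    """Per-allele OR and 95% CI from a log-odds coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=Z_95):
    """Recover beta, SE, z, and a two-sided Wald P value from OR and CI.

    Assumption: the reported CI is symmetric on the log scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, z, p_value


# Reported values for rs16969968: OR 1.57 (95% CI, 1.25-1.97)
beta, se, z, p_value = wald_from_or_ci(1.57, 1.25, 1.97)
```

Applied to the reported rs16969968 estimate, this back-calculation gives a P value on the order of 10−4, in line with the value reported for that SNP; small differences are expected because the published OR and CI are rounded.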
Haplotype analyses
Haplotype analyses were carried out using Haploview v4.2 to identify haplotype blocks and calculate r2 values (38). Associations between the identified haplotypes and lung cancer risk were calculated in Plink while controlling for appropriate covariates.
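For reference, the pairwise r2 statistic reported by LD software can be computed from haplotype and allele frequencies as r2 = D^2 / [pA(1 − pA) pB(1 − pB)], where D = pAB − pA pB. The following is a minimal sketch of that formula with hypothetical argument names; it assumes phased haplotype frequencies are already available and does not reproduce how Haploview estimates them.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation (r2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and
          allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

Two SNPs whose alleles always travel together give r2 of 1, and two independent SNPs give r2 of 0; the marker-selection threshold described earlier (r2 less than 0.70) sits between these extremes.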
Results
After excluding samples with call rates of less than 95%, 2,549 African-American participants remained for analysis (1,308 cases and 1,241 controls). Compared with controls, cases were older and more likely to be male, to smoke, and to have smoked a greater number of pack-years (Table 1). Cases and controls had similar African ancestry and similar Caucasian ancestry (Table 1; Supplementary Fig. S1). After excluding monomorphic SNPs and SNPs that did not meet call-rate or HWE thresholds, 660 genotyped SNPs remained for analysis. This included 111 SNPs on 5p15.33, 320 SNPs on 6p22.1-p21.31, and 242 SNPs on 15q25.1. An additional 103,928 SNPs were imputed in the 3 regions, of which 38.5% were excluded for having a MAF less than 1% in cases, 20.7% were excluded for imputation quality (info) scores less than 0.70, and 4.6% were excluded for posterior probabilities less than 0.90. This left 37,660 imputed SNPs for analysis.
Site | UCSF (n = 879) | | MDA (n = 839) | | WSU (n = 831) | | Total (n = 2,549) | |
---|---|---|---|---|---|---|---|---|
Status | Case | Control | Case | Control | Case | Control | Case | Control |
Sample size | 432 | 447 | 473 | 366 | 403 | 428 | 1,308 | 1,241 |
Mean ± SE | ||||||||
Age | 64.36 ± 0.53 | 64.17 ± 0.54 | 62.40 ± 0.47 | 55.41 ± 0.59 | 62.24 ± 0.51 | 62.04 ± 0.48 | 63.0 ± 0.29 | 60.86 ± 0.33 |
% African Ancestry | 0.75 ± 0.007 | 0.76 ± 0.006 | 0.76 ± 0.007 | 0.76 ± 0.008 | 0.78 ± 0.006 | 0.77 ± 0.006 | 0.76 ± 0.004 | 0.76 ± 0.004 |
% European Ancestry | 0.19 ± 0.006 | 0.19 ± 0.007 | 0.18 ± 0.006 | 0.18 ± 0.008 | 0.17 ± 0.006 | 0.17 ± 0.006 | 0.18 ± 0.004 | 0.18 ± 0.004 |
Pack-years smoked^a | 33.39 ± 1.15 | 27.39 ± 1.74 | 39.92 ± 1.37 | 24.66 ± 1.32 | 39.17 ± 1.45 | 30.65 ± 3.52 | 37.45 ± 0.77 | 27.65 ± 1.41 |
n (%) | ||||||||
Male | 196 (45.37) | 204 (45.64) | 251 (53.07) | 154 (42.08) | 193 (47.89) | 195 (45.56) | 640 (48.93) | 553 (44.56) |
Ever smoker | 403 (93.29) | 289 (64.65) | 397 (83.93) | 270 (73.77) | 382 (94.79) | 294 (68.69) | 1,182 (90.37) | 853 (68.73) |
Current smoker | 181 (41.90) | 134 (29.98) | 246 (52.01) | 143 (39.07) | 241 (59.80) | 171 (39.95) | 668 (51.07) | 448 (36.10) |
Adenocarcinoma | 176 (40.74) | NA | 215 (45.45) | NA | 167 (41.44) | NA | 558 (42.66) | NA |
Squamous cell | 107 (24.77) | NA | 131 (27.70) | NA | 105 (26.05) | NA | 343 (26.22) | NA |
Small cell | 34 (7.87) | NA | 24 (5.07) | NA | 29 (7.20) | NA | 87 (6.65) | NA |
Other/not specified | 115 (26.62) | NA | 103 (21.78) | NA | 102 (25.31) | NA | 320 (24.46) | NA |
^a Among ever smokers.
Abbreviations: UCSF, University of California San Francisco; MDA, M.D. Anderson Cancer Center; WSU, Wayne State University.
Figs. 1 and 2 show results of the association analyses for the 5p15.33, 6p22.1-p21.31, and 15q25.1 regions, with triangles identifying SNPs previously determined to be associated with lung cancer risk according to the NHGRI Catalog of Published Genome-Wide Association Studies. Supplementary Table S1 provides association results for all the imputed SNPs included in Figs. 1 and 2 that had P values less than 0.01. Histology-specific association results are also provided.
Four previously identified lung cancer risk SNPs on 5p15.33 were associated with lung cancer in this African-American sample (P < 0.05, effect size in same direction as previous reports), but were not the most statistically significant associations observed in the region. The most significantly associated genotyped SNP in the region was rs2735940 (OR, 0.82; 95% CI, 0.73–0.93; P, 0.0011) and the most significantly associated imputed SNP in the region was rs62332591 (OR, 0.78; 95% CI, 0.68–0.89; P, 0.00037). Overall, the most significant associations were localized to the 5′ end of the TERT gene, just downstream of CLPTM1L (Fig. 1A; Tables 2 and 3).
SNP^a | Position | Gene | Allele^b | MAF^c | OR (95% CI)^d | P value^e |
---|---|---|---|---|---|---|
rs2036527 | Chr15:78851615 | 6.3 kb upstream of CHRNA5 | C/T | 0.2102 | 1.34 (1.17–1.54) | 2.0 × 10−5 |
rs17486278 | Chr15:78867482 | Intron 1 of CHRNA5 | A/C | 0.2802 | 1.31 (1.15–1.48) | 2.7 × 10−5 |
rs16969968 | Chr15:78882925 | Exon 5 of CHRNA5 (missense Asp → Asn) | G/A | 0.06008 | 1.57 (1.25–1.97) | 1.1 × 10−4 |
rs7180002 | Chr15:78873993 | Intron 2 of CHRNA5 | A/T | 0.102 | 1.40 (1.17–1.68) | 2.1 × 10−4 |
rs951266 | Chr15:78878541 | Intron 2 of CHRNA5 | C/T | 0.1031 | 1.39 (1.16–1.66) | 2.8 × 10−4 |
rs17486195 | Chr15:78865197 | Intron 1 of CHRNA5 | A/G | 0.114 | 1.36 (1.14–1.61) | 5.4 × 10−4 |
rs4243084 | Chr15:78911672 | Intron 1 of CHRNA3 | C/G | 0.1753 | 1.28 (1.11–1.48) | 8.5 × 10−4 |
rs2735940 | Chr5:1296486 | 1.3 kb upstream of TERT | T/C | 0.479 | 0.82 (0.73–0.93) | 1.1 × 10−3 |
rs4635969 | Chr5:1308552 | 13.3 kb upstream of TERT | C/T | 0.3328 | 0.81 (0.72–0.92) | 1.2 × 10−3 |
rs17405217 | Chr15:78731149 | Intron 1 of IREB2 | C/T | 0.05842 | 1.43 (1.14–1.81) | 2.5 × 10−3 |
^a Ordered by P value.
^b Minor allele listed second.
^c MAF in controls only.
^d OR for each additional copy of the minor allele, estimated in a logistic regression model adjusted for age, sex, study site, % sub-Saharan African ancestry, % European ancestry, and number of pack-years smoked.
^e P value in an allelic additive logistic regression model, adjusted for age, sex, study site, % sub-Saharan African ancestry, % European ancestry, and number of pack-years smoked.
Histologic subtype | SNP | Position | Gene | Allele^b | Adenocarcinoma (n = 538) OR^c | Adenocarcinoma P value^d | Squamous cell (n = 343) OR^c | Squamous cell P value^d | Small cell (n = 87) OR^c | Small cell P value^d | All histologies (n = 1,308) OR^c | All histologies P value^d |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Adenocarcinoma | rs2735940 | Chr5:1296486 | 1.3 kb upstream of TERT | T/C | 0.75 | 8.1 × 10−5 | 0.81 | 0.024 | 0.87 | 0.3991 | 0.82 | 1.1 × 10−3 |
Adenocarcinoma | rs7725218 | Chr5:1282414 | Intron 2 of TERT | G/A | 1.33 | 2.0 × 10−4 | 1.08 | 0.43 | 0.91 | 0.5635 | 1.13 | 0.047 |
Adenocarcinoma | rs2736100 | Chr5:1286516 | Intron 1 of TERT | T/G | 1.31 | 2.7 × 10−4 | 1.13 | 0.17 | 1.12 | 0.4955 | 1.17 | 8.9 × 10−3 |
Squamous cell | rs401681 | Chr5:1322087 | Intron 13 of CLPTM1L | T/C | 1.12 | 0.12 | 1.35 | 1.5 × 10−3 | 1.16 | 0.37 | 1.18 | 4.3 × 10−3 |
Squamous cell | rs2844463 | Chr6:31615167 | Intron 7 of BAT3 | C/T | 0.92 | 0.26 | 0.74 | 1.8 × 10−3 | 1.0 | 0.99 | 0.91 | 0.13 |
Squamous cell | rs2736158 | Chr6:31600304 | Exon 16 of BAT2 (missense Ala → Gly) | C/G | 1.06 | 0.61 | 0.64 | 1.8 × 10−3 | 1.16 | 0.53 | 1.04 | 0.66 |
^a Top 3 associations for small-cell carcinoma are not included due to limited sample size. All 3 SNPs were located on 15q25.1.
^b Minor allele listed second.
^c OR for each additional copy of the minor allele, estimated in a logistic regression model adjusted for age, sex, study site, % sub-Saharan African ancestry, and number of pack-years smoked.
^d P value in an allelic additive logistic regression model, adjusted for age, sex, study site, % sub-Saharan African ancestry, and number of pack-years smoked.
In analysis of all tumor histologies, the most significantly associated genotyped SNP in the 6p22.1-p21.31 region was rs184054 (P, 0.0172). No genotyped or imputed SNPs had a P value less than 0.001. A previously identified lung cancer risk SNP, rs3117582, was not associated with case-control status in these African-Americans (P, 0.2181; Fig. 1C).
Analysis of the 15q25.1 region identified a number of strongly associated SNPs (Fig. 2), including 2 previously reported to influence lung cancer risk. These associations were replicated in our African-American case-control sample at both the genotyped SNP rs8034191 (P, 0.0028, effect in same direction as previous reports) and the imputed SNP rs1051730 (P, 0.000058, effect in same direction as previous reports). The most significantly associated SNP in 15q25.1 was rs2036527 (OR, 1.34; 95% CI, 1.17–1.54; P, 0.00002). This SNP was also the most significantly associated SNP in the analysis of all 3 regions. Indeed, of the 10 most strongly associated genotyped SNPs in the full analysis of the 3 regions, 8 are located on 15q25.1 (Table 2). These 8 SNPs are primarily located in or upstream of CHRNA5. One of these, rs16969968, is a functional SNP located in the fifth exon of CHRNA5 (OR, 1.57; 95% CI, 1.25–1.97; P, 0.00011). The G>A substitution changes an aspartic acid residue to an asparagine residue and is found at a frequency of 8.8% in cases. None of the 1,013 imputed SNPs on 15q25.1 were more significantly associated with lung cancer than the 2 most significant genotyped SNPs, although both rs55853698 and rs55781567 in the 5′ untranslated region of CHRNA5 were strongly associated with lung cancer (P, 0.000089 and 0.000056, respectively). Both SNPs are located within predicted transcription-factor binding sites generated from ENCODE data.
Modeling all 15q25.1 SNP associations conditional on rs16969968 genotype (i.e., including the number of rs16969968 risk alleles as an ordinal covariate in the sex, age, study site, ancestry, and smoking-adjusted model) attenuated associations (Supplementary Fig. S2). However, the 2 most significant SNPs from Table 2 (rs2036527 and rs17486278) remained associated with lung cancer in the conditional analysis (P, 0.0056 and 0.0047, respectively). Overall, the most strongly associated SNPs in the conditional analysis were rs149156593 in an intron of CHRNB4 (OR, 2.31; 95% CI, 1.45–3.69; P, 0.00045) and rs111819086 in an intron of AGPHD (OR, 1.92; 95% CI, 1.31–2.82; P, 0.00079). Analysis without adjustment for pack-years modestly increased the strength of the 15q25.1 associations (Supplementary Fig. S3).
In haplotype analyses of the 3 chromosomal regions, no significant haplotype (P < 0.05) was more statistically significant than the most significant SNP within that haplotype. As a result, haplotype analyses did not outperform single SNP analyses.
Conducting association tests after stratifying by tumor histology produced striking results for the 5p15.33 and 6p22.1-p21.31 regions (Fig. 3), while associations were very consistent across histologic subtypes in the 15q25.1 region (data not shown). Several SNPs in the 5p15.33 region are strongly associated with adenocarcinoma of the lung, but have weaker associations with other lung cancer types. As seen in Table 3 and Fig. 3A, SNPs near the promoter and first exons of TERT were more strongly associated with adenocarcinoma than with other histologies, resulting in a dampening of the association signal in unstratified analyses. The strongest of these associations was rs2735940, with an OR of 0.75 comparing adenocarcinoma patients with controls (95% CI, 0.65–0.86; P, 0.000081) and an OR of 0.82 (95% CI, 0.73–0.93; P, 0.0011) comparing all lung cancer histologies with controls.
For the 6p22.1-p21.31 region, the association pattern seen for participants with squamous cell tumors differed from those with other tumor histologies, including a noteworthy association peak spanning from approximately 31.60 to 32.10 Mb and overlapping with previous GWAS hits in BAT3 (Table 3 and Fig. 3B). The most strongly associated genotyped SNPs in the region were rs2844463 in an intron of BAT3 (OR, 0.74; 95% CI, 0.62–0.90; P, 0.00179) and rs2736158 in exon 16 of BAT2 (OR, 0.64; 95% CI, 0.48–0.85; P, 0.00182). This missense C>G substitution changes an alanine residue to a glycine residue, and is found at a frequency of 12% in controls where it acts as a protective allele decreasing the risk of squamous cell carcinoma. Imputation of an additional 3,447 SNPs in the region provides additional evidence that a squamous cell risk variant is located on 6p21.33. As a whole, these results provide evidence that previous GWAS hits in and around BAT2/BAT3 are valid associations, but that at least in African-Americans the association is specific to squamous cell lung cancer.
Discussion
Our findings provide evidence that inherited variation in the 3 regions most frequently associated with lung cancer in European and East Asian populations is also associated with lung cancer in African-Americans. Despite this consistency, there remains heterogeneity across ethnicities in which particular SNPs have the strongest associations, particularly on 15q25.1 where the association peak in African-Americans is more strongly localized to CHRNA5 than it is among other ethnicities. We further show that SNPs which confer risk for specific lung cancer histologies are not necessarily risk factors for all types of lung cancer, as 5p15.33 associations were driven by risk for adenocarcinoma and 6p21.33 associations appear specific to risk for squamous cell carcinoma. Imputation of an additional 305 SNPs on 5p15.33 and 3,447 SNPs on 6p21.33 further supports the histology-specific nature of these associations. Finally, a list of the most strongly associated SNPs in the analysis includes functional variants on 15q25.1 and 6p21.33, which may directly influence lung cancer risk in African-Americans. Histology-specific lung cancer associations are being reported with increasing frequency, as researchers begin to refine case definitions in their association studies (39, 40).
We observed a moderate association between lung cancer and variants at 5p15.33 in TERT and CLPTM1L, expanding on previous reports of associations in this region in African-Americans (29). These associations appear largely to be confined to risk for adenocarcinoma, as several SNP associations in and near TERT were greatly strengthened when analysis was conducted only on cases with adenocarcinoma, despite the reduction in sample size. The association in adenocarcinoma cases was well localized to the region immediately upstream and in the first 2 introns of TERT, suggesting that TERT is likely the gene in this region functionally involved in lung cancer pathogenesis.
In analysis of the full case-control dataset, variants in the HLA region on 6p22.1-p21.31 showed little association with lung cancer. However, when stratified by histology, a previously associated region in BAT3 was replicated in the squamous cell carcinoma subgroup and an associated missense variant in BAT2 was identified. Associations in the 6p21.33 region have not been consistently replicated, in part due to the European risk SNPs having low allele frequencies in East Asian populations. We offer another possible reason for the inconsistent associations of SNPs in this region – namely, that the associations appear to be histology specific and may be difficult to detect when squamous cell carcinomas are analyzed with other tumor histologies. Just as the TERT associations are strongest in cases of adenocarcinoma, in African-Americans the 6p21.33 associations appear strongest in cases of squamous cell carcinoma. Indeed, of the 3 GWAS identifying an association peak in/near BAT3, all contained a substantial proportion of cases with squamous cell tumors (range, 19.5–38%; refs. 1–3). The histology-specific association of 6p21 SNPs with squamous cell tumors was also recently detected in a meta-analysis of Caucasian lung cancer patients (39).
The strongest associations in this African-American lung cancer study were located on 15q25.1, in or just upstream of CHRNA5. Several associated SNPs in CHRNA5 may be functionally relevant, including strongly associated variants in the 5′ UTR and in exon 5. We genotyped many more SNPs in the 15q25.1 region than did previous studies of African-Americans (41), and imputed additional variants using data from the 1000 Genomes Project. While studies in European populations have had difficulty mapping the 15q25.1 signal to a particular gene because of large LD blocks in the region, the reduced LD in African-Americans has allowed functional and tagging variants in CHRNA5 to emerge as the likeliest candidates in our analysis.
Controlling for the number of pack-years smoked modestly attenuated CHRNA5 associations, but they remained the strongest association signals in the analyses. Because SNPs in this region are associated with smoking behaviors, and pack-years is an imperfect measurement of smoking behavior, the effects observed here may be mediated by smoking. Conditioning association tests on missense SNP rs16969968 revealed additional association signals in CHRNB4 and AGPHD. Whether these associations are due to the effects of additional genes on 15q25.1, or simply result from linkage of SNPs in CHRNB4 and AGPHD with other CHRNA5 risk variants, remains to be determined. Likewise, whether the differences in association patterns on 15q25.1 seen in African-Americans, compared with other ethnicities, reflect differences in LD patterns or differences in smoking behaviors remains to be determined.
The current analysis represents the largest genetic study of lung cancer conducted in African-Americans to date and has identified functional variants that may contribute to the pathogenesis of this devastating disease. While multicenter studies pose certain challenges, we believe that the close similarity of the estimated percentages of sub-Saharan African ancestry and the similar sample sizes across study sites are particular strengths of the study. Our study provides evidence that regions that are associated with lung cancer in Europeans and East Asians are also associated in African-Americans, but that there is incomplete overlap of the associated SNPs. We further conclude that tumor histology is an important consideration in genetic association analyses of lung cancer, especially in African-Americans, in whom both the distribution of tumor histology (42) and SNP allele frequencies differ from those in European populations.
Disclosure of Potential Conflicts of Interest
Laura J. Bierut served as a consultant for Pfizer Inc. in 2008 and is an inventor on the patent “Markers for Addiction” (US 20070258898) covering the use of certain SNPs in determining the diagnosis, prognosis, and treatment of addiction. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: K.M. Walsh, C. Wei, M.F. Seldin, M.R. Wrensch, A.G. Schwartz, J.K. Wiencke, C.I. Amos
Development of methodology: K.M. Walsh, H.M. Hansen, M.F. Seldin, M.R. Wrensch, C.I. Amos
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H.M. Hansen, M.R. Spitz, J.D. Sison, M.L. Frazier, M.R. Wrensch, A.G. Schwartz, J.K. Wiencke, C.I. Amos
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.M. Walsh, I.P. Gorlov, H.M. Hansen, E.Y. Lu, J.D. Sison, C. Wei, W. Chen, M.F. Seldin, L.J. Bierut, M.R. Wrensch, A.G. Schwartz, J.K. Wiencke, C.I. Amos
Writing, review, and/or revision of the manuscript: K.M. Walsh, I.P. Gorlov, H.M. Hansen, M.R. Spitz, J.D. Sison, C. Wei, S.M. Lloyd, M.L. Frazier, M.F. Seldin, L.J. Bierut, P.M. Bracci, M.R. Wrensch, A.G. Schwartz, J.K. Wiencke, C.I. Amos
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.M. Hansen, X. Wu, H. Zhang, A.S. Wenzlaff, J.D. Sison, M.R. Wrensch, J.K. Wiencke, C.I. Amos
Study supervision: M.R. Spitz, A.S. Wenzlaff, J.D. Sison, M.R. Wrensch, A.G. Schwartz, J.K. Wiencke, C.I. Amos
Other (Cross-population work on cigarette smoking): L.J. Bierut
Grant Support
This work was financially supported by NIH grants R01CA52689, R01ES06717, R01CA121197, R01CA121197S2, R01CA14176, R01CA060691, N01PC35145, HHSN261201000028C, and R25CA112355 to K.M. Walsh and K02DA021237 to L.J. Bierut.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.