Abstract
To explore the impact of common variation on the risk of developing lung cancer, we conducted a two-phase genome-wide association (GWA) study. In phase 1, we compared the genotypes of 511,919 tagging single nucleotide polymorphisms (SNPs) in 1,952 cases and 1,438 controls; in phase 2, 30,568 SNPs were genotyped in 2,465 cases and 3,005 controls. SNP selection was based on the best-supported P values from phase 1 and two other GWA studies of lung cancer. In the combined analysis of phases 1 and 2, the strongest associations identified were defined by SNPs mapping to 15q25.1 (rs12914385; P = 3.19 × 10−16), 5p15.33 (rs4975616; P = 6.66 × 10−7), and 6p21.33 (rs3117582; P = 9.13 × 10−7). Variation at 15q25.1, but not 5p15.33 or 6p21.33, was strongly associated with smoking behavior, with risk alleles correlated with higher consumption. Variation at 5p15.33 was shown to significantly influence lung cancer histology. Pooling data from the four series provided 21,620 genotypes for 7,560 cases and 8,205 controls. A meta-analysis provided increased support that variation at 15q25.1 (rs8034191; P = 3.24 × 10−26), 5p15.33 (rs4975616; P = 2.99 × 10−9), and 6p21.33 (rs3117582; P = 4.46 × 10−10) influences lung cancer risk. The next best-supported associations were attained at 15q15.2 (rs748404; P = 1.08 × 10−6) and 10q23.31 (rs1926203; P = 1.28 × 10−6). These data indicate few common variants account for 1% of the excess familial risk, underscoring the necessity of having additional large sample series for gene discovery. [Cancer Res 2009;69(16):6633–41]
Introduction
Lung cancer is a leading cause of cancer death worldwide. Although >80% of the population attributable risk of lung cancer can be ascribed to tobacco smoking, several lines of evidence indicate that inherited genetic factors influence the development and progression of lung cancer; in particular, epidemiologic studies have consistently shown an elevated risk of lung cancer in relatives of lung cancer cases after adjustment for smoking.
Recently, genome-wide association (GWA) studies of lung cancer have shown common variation at 15q24-25.1 as a determinant of risk (1–3). Two studies found that the same alleles at this locus increased risk of lung cancer and influenced tobacco smoking behavior. Genes mapping to this region of association include CHRNA3, CHRNA5, CHRNB4, PSMA4, LOC123688, and IREB2. The CHRN genes encode nicotinic receptor subunits; in addition to playing a role in the development of nicotine dependence, nicotinic receptors also influence cell proliferation and apoptosis. Hence, these genes represent strong candidates for combined lung cancer susceptibility and predilection to smoking. PSMA4 encodes the fourth component of the proteasome, which plays a role in protein degradation, and IREB2 is involved in iron metabolism, which may thus affect oxidative damage. A second lung cancer locus identified through the GWA studies maps to 5p and includes the genes encoding TERT and CLPTM1L. In addition to these loci, we and others have found statistically significant evidence implicating a third locus at 6p as a risk factor for lung cancer (4, 5).
We have previously reported the results of the most extreme hits from phase 1 of our GWA study with independent replication (4), identifying 15q25, 5p, and 6p as disease loci for lung cancer. Here, we report comprehensive findings from our GWA study. In phase 1, we genotyped 561,466 tagging single nucleotide polymorphisms (SNP) in 1,978 lung cancer cases, comparing genotype frequencies with 1,438 controls. In phase 2, we genotyped 33,060 selected SNPs in 2,484 lung cancer cases and 3,036 controls. This analysis in conjunction with a meta-analysis of two other GWA studies (2, 4) provides insight into the genetic architecture of inherited susceptibility to lung cancer.
Materials and Methods
Study participants. UK-GWA study phase 1: Cases (1,182 male, 796 female; mean age at diagnosis, 57 y; SD, 6) with pathologically confirmed lung cancer were ascertained through the Genetic Lung Cancer Predisposition Study (GELCAPS; ref. 6). All were British residents and self-reported to be of European ancestry. Individuals from the 1958 Birth Cohort served as the source of controls (7). Comprehensive information on the 1958 Birth Cohort can be obtained through the Centre for Longitudinal Studies Web site.
UK-GWA study phase 2: An additional 2,484 cases (1,690 male, 794 female; mean age at diagnosis, 72 y; SD, 7) were ascertained through GELCAPS. Blood samples were obtained from 3,036 healthy individuals (1,497 male, 1,539 female; mean age, 61 y; SD, 11) recruited to the National Cancer Research Network genetic epidemiologic studies, the National Study of Colorectal Cancer (1999–2006; n = 541), GELCAPS (1999–2004; n = 1,520), and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (1999–2004; n = 975). These controls were the spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at the time of ascertainment. All were British residents and self-reported to be of European ancestry.
SNP selection and genotyping. DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). Phase 1 of the UK-GWA study was conducted using Illumina Human550 BeadChips according to the manufacturer's protocols (Illumina). Phase 2 genotyping was carried out using Illumina Infinium custom arrays according to the manufacturer's protocols. Selection of SNPs for phase 2 was based on a stepwise procedure (Supplementary Fig. S1). The majority, 20,000, were chosen by a hypothesis-free (agnostic) strategy, simply on the basis of being most significantly associated with lung cancer risk in phase 1. The remainder were selected on three alternative bases. First, 1,799 additional SNPs (annotated in dbSNP) were included in the 15q25.1, 6p21.33, and 5p15.33 regions, which had been previously reported to be associated with disease risk. Second, 79 SNPs were included that showed an association with lung cancer risk at a P value of <10−4 in previously reported GWA studies [International Agency for Research on Cancer (IARC)-GWA and Texas-GWA studies] and were not captured by the 20,000 most significant SNPs in phase 1 or by the fine-mapping SNPs. Third, 11,182 agnostic SNPs not meeting the aforementioned criteria were included on the basis of being most significantly associated with lung cancer in a previously reported meta-analysis of phase 1 and the IARC-GWA and Texas-GWA studies (4).
DNA samples with GenCall scores <0.25 at any locus were considered “no calls.” A DNA sample was deemed to have failed if it generated genotypes at <95% of loci. An SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To ensure quality of genotyping, a series of duplicate samples were genotyped and cases and controls were genotyped in the same batches.
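The two call-rate filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the input layout (a dictionary of per-sample genotype lists with `None` for a no call), the function name, and the choice to compute SNP call rates over all samples rather than only passing samples are assumptions.

```python
def call_rate_qc(genotypes, threshold=0.95):
    """Apply sample-level and SNP-level call-rate filters.

    genotypes: dict mapping sample ID -> list of genotype calls (one per SNP),
               with None marking a no call (hypothetical input layout).
    Returns (passing_sample_ids, passing_snp_indices).
    """
    samples = list(genotypes)
    n_snps = len(genotypes[samples[0]])

    # A sample fails if it yields genotypes at fewer than `threshold` of loci.
    passing_samples = [
        s for s in samples
        if sum(g is not None for g in genotypes[s]) / n_snps >= threshold
    ]

    # A SNP fails if fewer than `threshold` of samples yield a genotype.
    # Assumption: the rate is computed over all samples, not only passing ones.
    passing_snps = [
        j for j in range(n_snps)
        if sum(genotypes[s][j] is not None for s in samples) / len(samples) >= threshold
    ]
    return passing_samples, passing_snps
```

With the default threshold of 0.95 the function mirrors the 95% cut-offs stated in the text; a lower threshold is only useful for small toy inputs.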
Meta-analysis. We conducted a meta-analysis pooling both phases of our UK-GWA study with data from two other studies: the IARC-GWA study of 1,989 cases and 2,625 controls (2), summary data from which are publicly available, and the Texas-GWA study of 1,154 non–small cell lung cancer (NSCLC) cases who were all smokers and 1,137 smoking-matched controls (4). Comprehensive details of case and control ascertainment and matching criteria, as well as the genotyping of the Texas-GWA and IARC-GWA studies, have been published previously (2, 4).
Ethical approval for the UK study was obtained from the London Multi-Centre Research Ethics Committee (MREC/98/2/67) in accordance with the tenets of the Declaration of Helsinki. All participants provided informed consent.
Statistical analysis. Statistical analysis was undertaken using S Plus v7.0 (Insightful), R v2.8.0, and STATA v8.0 Software. Genotype data were used to search for duplicates and closely related individuals among all samples in phase 1. Identity by state values were calculated for each pair of individuals on 22,120 SNPs, and for any pair with allele sharing >80%, the sample generating the lowest call rate was removed from further analysis. In phase 1, genotyped samples were excluded from further analyses for the following reasons: gender discrepancy (n = 6), duplicated (n = 0), and relatedness (n = 0).
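The identity-by-state screen can be illustrated with a short sketch. It assumes genotypes coded as minor-allele counts (0, 1, 2, or `None` for a no call), in which case a pair of individuals shares 2 − |g1 − g2| alleles at each SNP. The function names and data layout are hypothetical and are not taken from the study's software.

```python
from itertools import combinations


def ibs_sharing(g1, g2):
    """Proportion of alleles shared identical by state between two samples."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs not called in both samples
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs genotyped in both samples")
    return shared / (2 * compared)


def samples_to_drop(genotypes, call_rates, threshold=0.80):
    """For each pair sharing more than `threshold` of alleles, drop the
    sample with the lower call rate (ties drop the second of the pair)."""
    drop = set()
    for s1, s2 in combinations(genotypes, 2):
        if ibs_sharing(genotypes[s1], genotypes[s2]) > threshold:
            drop.add(s1 if call_rates[s1] < call_rates[s2] else s2)
    return drop
```

The 0.80 default follows the >80% allele-sharing criterion stated above.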
The adequacy of case-control matching and possibility of differential genotyping of cases and controls were formally evaluated using Q-Q plots of −log10P values (based on the 90% least significant SNPs). Deviation of the genotype frequencies in the controls from those expected under Hardy-Weinberg Equilibrium was assessed by χ2 test, or Fisher's exact test where an expected cell count was <5. Comparison of the difference in number of associations observed and expected was made using the binomial test.
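As an illustration of the Hardy-Weinberg check, the sketch below computes the 1-df χ2 statistic from genotype counts in controls. It is a simplified example with hypothetical names; it also returns the smallest expected count so that a caller can switch to an exact test when that count is <5, as described above (the exact test itself is not implemented here).

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns (chi2, p_value, smallest_expected_count).
    Assumes the SNP is polymorphic, so all expected counts are > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min(expected)
```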
The association between each SNP and risk was assessed by the allele test. Odds ratios (OR) and associated 95% confidence intervals (CI) were calculated by unconditional logistic regression. Associations by histology [NSCLC, small cell lung cancer (SCLC)] were examined by logistic regression in case-only analyses.
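A minimal sketch of an allelic association test is given below. It works on a 2 × 2 table of allele counts and uses the cross-product OR with a Woolf-type confidence interval as a simple stand-in for the logistic regression estimates reported in the study. The function name and the example counts in the usage note are hypothetical.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Allelic chi-square test (1 df) with OR and approximate 95% CI.

    Inputs are allele counts (two alleles per genotyped individual).
    Returns (odds_ratio, (ci_low, ci_high), chi2, p_value).
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return odds_ratio, ci, chi2, p_value
```

For example, `allelic_association(200, 800, 150, 850)` evaluates an invented table in which the minor allele is more common among cases.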
Pooling of phase 1 and phase 2 data was based on individual genotypes. We imposed stringent criteria for call rates of SNPs and checked for significant disparity of minor allele frequencies (MAFs) between series. Only summary data were available for the Texas and IARC-GWA studies. To minimize errors in data harmonization, we examined SNPs for deviation in MAF in cases and controls across data sets.
Meta-analysis was conducted using standard methods for combining raw data based on the Mantel-Haenszel method and weighted average of study-specific estimates of the ORs (8). Cochran's Q statistic to test for heterogeneity (8) and the I2 statistic to quantify the proportion of the total variation due to heterogeneity were calculated. This was performed using the metagen module from the meta library for R.
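The weighted-average pooling and heterogeneity statistics can be sketched as follows. This example assumes fixed-effect inverse-variance weighting of study-specific log ORs; it does not implement the Mantel-Haenszel estimator and is not the R code used in the study. The function name and input layout are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of study-specific log ORs.

    log_ors: per-study log odds ratios; ses: their standard errors.
    Returns (pooled_or, pooled_se_of_log_or, cochran_q, i_squared_percent).
    """
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    pooled_se = math.sqrt(1 / total_w)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), pooled_se, q, i2
```

When only an OR and its 95% CI are available for a study, the standard error of the log OR can be recovered as (ln upper − ln lower) / (2 × 1.96).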
The sibling relative risk attributable to a given SNP was calculated using the formula (9):
$$\lambda = \frac{p\,(p r_2 + q r_1)^2 + q\,(p r_1 + q)^2}{\left[p^2 r_2 + 2 p q\, r_1 + q^2\right]^2}$$
where p is the population frequency of the minor allele, q = 1−p, and r1 and r2 are the relative risks (estimated as OR) for heterozygotes and rare homozygotes, relative to common homozygotes. Assuming a multiplicative interaction, the proportion of the familial risk attributable to a SNP was calculated as log(λ)/log(λ0), where λ0 is the overall familial relative risk estimated from epidemiologic studies, assumed here to be 1.8 (10).
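A worked sketch of this calculation is given below. It assumes the commonly used single-locus expression λ = [p(pr2 + qr1)² + q(pr1 + q)²] / [p²r2 + 2pqr1 + q²]², which is consistent with the symbol definitions above, and a multiplicative model in which r2 = r1². The function names and the example inputs in the usage note are illustrative assumptions.

```python
import math


def sibling_relative_risk(p, r1, r2):
    """Sibling relative risk attributable to one SNP.

    p: minor allele frequency; r1, r2: relative risks for heterozygotes and
    rare homozygotes relative to common homozygotes.
    """
    q = 1 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p * p * r2 + 2 * p * q * r1 + q * q) ** 2
    return numerator / denominator


def familial_risk_fraction(p, r1, r2, lambda0=1.8):
    """Proportion of the overall familial relative risk (lambda0) explained,
    computed as log(lambda) / log(lambda0)."""
    return math.log(sibling_relative_risk(p, r1, r2)) / math.log(lambda0)
```

For instance, `familial_risk_fraction(0.4, 0.87, 0.87 ** 2)` evaluates a hypothetical variant with MAF 0.4 and a per-allele OR of 0.87 and returns a fraction of just under 1%.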
The impact of variants on smoking behavior was assessed by comparing the prevalence of alleles stratified by cigarette consumption (cigarettes per day, CPD) using the Cochran-Armitage test. We also used the Kruskal-Wallis test to analyze for differences in cigarette consumption stratified by genotype. To test for independent effect of variants on CPD, genotype frequencies in light and medium smokers were compared with frequencies in heavy smokers using multinomial logistic regression.
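The Cochran-Armitage trend test across ordered consumption strata can be sketched as below. The example assumes equally spaced scores (0, 1, 2, ...) for the strata unless others are supplied, and it takes counts of risk alleles and total alleles per stratum as input. The function name is hypothetical and the counts in the usage note are invented for illustration.

```python
import math


def cochran_armitage_trend(successes, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    successes: count of the allele of interest in each group;
    totals: total alleles in each group; scores: ordinal group scores.
    Returns (chi2, p_value) for the 1-df trend statistic.
    """
    k = len(totals)
    scores = list(range(k)) if scores is None else scores
    n = sum(totals)
    p_bar = sum(successes) / n  # overall proportion under the null

    # Score-weighted sum of observed minus expected counts.
    t_stat = sum(t * (r - m * p_bar) for t, r, m in zip(scores, successes, totals))
    s1 = sum(m * t for t, m in zip(scores, totals))
    s2 = sum(m * t * t for t, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 ** 2 / n)

    chi2 = t_stat ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Calling `cochran_armitage_trend([10, 20, 30], [100, 100, 100])` tests an invented increasing allele frequency over three strata.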
We estimated power of our analysis to identify associations over a range of MAFs assuming joint analysis and a multiplicative effect model for each SNP.
Bioinformatics. We used Haploview (v3.2) to infer the linkage disequilibrium (LD) structure of the genome in the regions containing loci associated with disease risk.
Results
UK-GWA study. Our previous publication details numbers of SNPs genotyped and quality controlled in phase 1 (4). Briefly, a total of 552,974 SNPs were satisfactorily genotyped (99.7%). Of the SNPs satisfactorily genotyped, 524,714 were common to both cases and controls. Several quality control processes were sequentially applied to these SNPs (Fig. 1), leaving 511,919 SNPs for which genotype data were informative. For the informative SNPs, mean individual sample call rates (the percentage of samples for which a genotype was obtained for each SNP) were 99.8% and 99.6% in cases and controls, respectively. Comparison of the observed and expected distributions showed little evidence for an inflation of the test statistics (inflation factor, 1.02, based on the 90% least significant SNPs), thereby excluding systematic bias. Of the 1,978 cases, 1,958 were successfully genotyped, and following quality control, 1,952 cases were used in the analysis (Fig. 1).
In phase 2, a total of 31,039 SNPs (93.9%) were successfully genotyped, of which 30,568 provided reliable genotypes according to our quality control metrics (Fig. 1). Of the 2,484 cases and 3,036 controls attempted, 2,465 cases and 3,014 controls were successfully genotyped, and of these, 2,465 cases and 3,005 controls were suitable for analysis (Fig. 1). In the combined analysis of phase 1 and 2 (Supplementary Table S1), the strongest associations identified were found at polymorphic sites defined by SNPs mapping to 15q25.1 (rs12914385; P = 3.19 × 10−16), 5p15.33 (rs4975616; P = 6.66 × 10−7), and 6p21.33 (rs3117582; P = 9.13 × 10−7), which have been the subject of previous fast-tracking analyses.
On the basis of the combined phase 1 and phase 2 GWA data, the 15q25.1 association is defined by a 248-kb region of strong LD on chromosome 15 extending from 76,499,754bp to 76,747,584bp (Fig. 2A). Maximal evidence of a relationship was provided by the SNP rs12914385, which maps at 76,685,778bp (OR, 1.29; 95% CI, 1.21-1.37; P = 3.19 × 10−16; Supplementary Table S1). rs938682 and rs8042374 also provide strong evidence for an association between 15q25 and lung cancer risk (Supplementary Table S2). The relative position of these three SNPs to the CHRNA3, CHRNA5, CHRNB4, IREB2, PSMA4, and LOC123688 transcripts, which map to the 248kb region of LD within 15q25.1, is shown in Fig. 2A. All three SNPs localize to intron 4 of CHRNA3 strongly favoring variation within this gene as being the basis of 15q25 lung cancer association. rs12914385 and rs938682/rs8042374 seem to tag different blocks of LD (respective r2 values for rs938682-rs12914385, rs938682-rs8042374, and rs12914385-rs8042374 are 0.22, 1.00, and 0.22 based on HapMap CEU, and 0.18, 0.99, and 0.17 based on UK-GWA phase 2 controls). The possibility of two independently acting loci is supported by logistic regression whereby the ORs for rs12914385 were 1.29 (Ptrend = 4.79 × 10−16) and 1.20 (Ptrend = 1.81 × 10−7) without and with adjustment for rs8042374. Similarly the ORs for rs8042374 were 0.75 (Ptrend = 5.82 × 10−15) and 0.82 (Ptrend = 2.13 × 10−6) without and with adjustment for rs12914385 (Table 1).
Figure 2. Regional plots of the (A) 15q25.1, (B) 5p15.33, and (C) 6p21.33 associations. Each panel shows single-marker association statistics (as −log10P) from the analysis of UK-GWA study phase 1 (diamonds), UK-GWA study phase 2 (squares), and phases 1 and 2 combined (triangles), as a function of genomic position (National Center for Biotechnology Information build 36.1). The recombination rate across each region in HapMap CEU is shown (black, right, Y axis). Also shown for 15q25.1 (A) and 5p15.33 (B) is the relative position of genes mapping to each region of association. A large number of genes map to the 6p21.33 (C) region, so for clarity only BAT3 and TNXB are illustrated. Exons of genes have been redrawn to show the relative positions in the gene; therefore, maps are not to physical scale. The LD plot was generated using UK-GWA study phase 2 controls; values and shading show r2 between each pair of SNPs; the darker the shading, the greater the extent of LD.
Table 1. Association (in combined analysis of UK-GWA study phases 1 and 2) between 15q25.1 SNPs (rs938682, rs12914385, and rs8042374) and lung cancer, with and without adjustment for a second variant
Model | rs938682 Ptrend | rs938682 OR (95% CI) | rs12914385 Ptrend | rs12914385 OR (95% CI) | rs8042374 Ptrend | rs8042374 OR (95% CI)
---|---|---|---|---|---|---
Unadjusted | 1.45 × 10−14 | 0.75 (0.70–0.81) | 4.79 × 10−16 | 1.29 (1.21–1.37) | 5.82 × 10−15 | 0.75 (0.70–0.80)
Adjusted for rs938682 | | | 1.00 × 10−7 | 1.20 (1.12–1.28) | 0.91 | 0.94 (0.36–2.48)
Adjusted for rs12914385 | 4.17 × 10−6 | 0.83 (0.76–0.90) | | | 2.13 × 10−6 | 0.82 (0.76–0.89)
Adjusted for rs8042374 | 0.63 | 0.79 (0.30–2.08) | 1.81 × 10−7 | 1.20 (1.12–1.28) | |
At 5p15.33, the best evidence for an association was provided by rs4975616 (OR, 0.86; 95% CI, 0.81–0.91; P = 6.66 × 10−7; Supplementary Table S1), which localizes within a 60-kb region of LD (1,353,580–1,412,838bp) between TERT and CLPTM1L (Fig. 2B). It has been proposed that 5p15.33 harbors two independent loci for lung cancer risk (5). The first is defined by rs402710, which maps within intron 16 of CLPTM1L and is in strong LD with rs4975616 (r2 = 0.53 based on HapMap CEU; r2 = 0.55 based on UK-GWA phase 2 controls). The second association signal is defined by rs2736100, which maps within intron 2 of TERT. In our combined data series, however, the association between rs2736100 and lung cancer risk was weak both without adjustment (OR, 0.96; Ptrend = 0.19) and with adjustment for rs4975616 (OR, 0.97; Ptrend = 0.37; Supplementary Table S3).
The strongest evidence for a relationship between genetic variation at 6p21.33 and lung cancer risk was attained at rs3117582 and rs1150752 (OR, 1.24; 95% CI, 1.14–1.35; P = 9.13 × 10−7; OR, 1.24; 95% CI, 1.13–1.35; P = 1.93 × 10−6, respectively; Supplementary Table S1). rs3117582 (31,728,499bp) localizes to intron 1 of BAT3 and rs1150752 (32,172,704bp) localizes to exon 3 of TNXB (Fig. 2C). Genotypes are highly correlated (r2 = 0.73 based on HapMap CEU; r2 = 0.91 based on UK-GWA phase 2 controls) and on the basis of flanking recombination hotspots define a single locus at 31,676,001 to 32,303,001bps.
Excluding the SNPs mapping to the 15q25.1, 5p15.33, and 6p21.33 loci, the most significant associations were provided by rs11264329 and rs2844363 (P = 1.22 × 10−6 and 5.90 × 10−6, respectively), which map to 153,361,782bp on 1q22 and 37,586,864bp on 3p22.2 (Supplementary Table S1). Although suggestive of association, neither was statistically significant at the conventionally accepted threshold for genome-wide significance (i.e., 1 × 10−7).
15q25.1, 5p15.33, and 6p21.33 variants and lung cancer histology and smoking behavior. In view of the differences in biology of NSCLC and SCLC, we examined the relationship between 15q25.1, 5p15.33, and 6p21.33 variants and tumor histology (Table 2). Variation at 15q25.1 defined by rs12914385, rs8042374, or rs938682 was not associated with any difference in lung cancer histology. At 5p15.33, variation defined by rs4975616 (CLPTM1L) was not associated with any difference in lung tumor type; however, variation defined by rs2736100 (TERT) was shown to influence lung cancer histology. Specifically, there was a significant difference in the allele frequency of rs2736100 between SCLC and NSCLC (P = 0.0011). This association was ascribable to a significantly increased frequency of the risk allele in cases with NSCLC-adenocarcinoma. Similarly, for variation at 6p21.33 defined by rs3117582, although allele frequencies were not significantly different between SCLC and NSCLC cases (P = 0.15), a significant difference in allele frequency between adenocarcinoma and squamous disease was shown.
Table 2. Association between 15q25.1 (rs12914385, rs8042374, and rs938682), 5p15.33 (rs4975616, rs2736100), 6p21.33 (rs3117582), and lung cancer histology in combined analysis of UK-GWA study phases 1 and 2
SNP | SCLC n | SCLC MAF | NSCLC (all) n | NSCLC (all) MAF | Adeno n | Adeno MAF | Squamous n | Squamous MAF | Other n | SCLC vs NSCLC P | SCLC vs NSCLC OR (95% CI) | Adeno vs squamous P | Adeno vs squamous OR (95% CI)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
15q25.1 | | | | | | | | | | | | |
rs12914385 (CHRNA3) | 1,035 | 0.44 | 3,344 | 0.44 | 903 | 0.43 | 1,677 | 0.44 | 764 | 0.71 | 1.02 (0.92–1.13) | 0.63 | 0.97 (0.87–1.09)
rs8042374 (CHRNA3) | 1,033 | 0.19 | 3,339 | 0.19 | 901 | 0.20 | 1,675 | 0.18 | 764 | 0.76 | 0.98 (0.86–1.11) | 0.20 | 1.10 (0.95–1.27)
rs938682 (CHRNA3) | 1,035 | 0.19 | 3,345 | 0.19 | 903 | 0.20 | 1,678 | 0.18 | 764 | 0.78 | 1.02 (0.90–1.16) | 0.18 | 0.91 (0.78–1.05)
5p15.33 | | | | | | | | | | | | |
rs4975616 (CLPTM1L) | 1,035 | 0.40 | 3,340 | 0.39 | 901 | 0.38 | 1,675 | 0.39 | 764 | 0.46 | 1.04 (0.94–1.15) | 0.85 | 0.99 (0.88–1.11)
rs2736100 (TERT) | 1,034 | 0.51 | 3,343 | 0.47 | 903 | 0.44 | 1,677 | 0.49 | 764 | 0.0011 | 1.18 (1.07–1.30) | 7.2 × 10−4 | 0.82 (0.73–0.92)
6p21.33 | | | | | | | | | | | | |
rs3117582 (BAT3) | 1,034 | 0.14 | 3,342 | 0.16 | 903 | 0.14 | 1,677 | 0.17 | 764 | 0.15 | 0.90 (0.79–1.04) | 0.02 | 0.83 (0.71–0.98)
We investigated the association between 15q25.1, 5p15.33, and 6p21.33 variants and smoking behavior by studying the relationship with cigarette consumption (CPD), categorized into different levels of smoking quantity (Table 3). A strong relationship between all 15q25.1 variants (rs12914385, rs8042374, and rs938682) and smoking was observed (Table 3). Statistically significant allele-dependent associations between risk genotype and cigarette consumption were seen in cases. A similar trend was observed in controls for each of the SNPs but was not statistically significant. Adjusting rs8042374 or rs938682 for rs12914385 provided evidence of an independent effect of the two loci on smoking behavior (P < 0.05). No significant association was observed between 5p15.33 (rs4975616, rs2736100) or 6p21.33 (rs3117582) genotypes and smoking.
Table 3. Association between smoking behavior and (A) genotype, (B) allele frequency
A. Association between smoking behavior and genotype
Genotype | Cases n | Cases mean CPD | Cases P* | Controls n | Controls mean CPD | Controls P*
---|---|---|---|---|---|---
15q25.1 | | | | | |
rs12914385 | | | | | |
CC | 1,230 | 21.59 | | 373 | 18.13 |
CT | 1,973 | 22.19 | | 413 | 18.46 |
TT† | 815 | 23.91 | 3.7 × 10−3 | 121 | 18.53 | 0.17
rs8042374 | | | | | |
AA† | 2,649 | 22.65 | | 541 | 18.57 |
AG | 1,219 | 21.96 | | 314 | 18.00 |
GG | 144 | 20.16 | 2.4 × 10−3 | 52 | 17.86 | 0.48
rs938682 | | | | | |
TT† | 2,648 | 22.66 | | 542 | 18.09 |
CT | 1,229 | 21.98 | | 312 | 17.89 |
CC | 142 | 19.99 | 1.8 × 10−3 | 53 | 18.61 | 0.37
5p15.33 | | | | | |
rs4975616 | | | | | |
AA† | 1,491 | 22.33 | | 301 | 17.65 |
AG | 1,893 | 22.57 | | 455 | 18.91 |
GG | 630 | 21.77 | 0.78 | 151 | 17.97 | 0.56
rs2736100 | | | | | |
GG† | 1,071 | 22.19 | | 217 | 18.17 |
GT | 2,009 | 21.41 | | 457 | 18.35 |
TT | 936 | 22.42 | 0.95 | 233 | 18.45 | 0.94
6p21.33 | | | | | |
rs3117582 | | | | | |
AA | 2,892 | 22.36 | | 679 | 18.30 |
AC | 1,008 | 22.35 | | 209 | 18.45 |
CC† | 115 | 22.27 | 0.54 | 19 | 18.11 | 0.97
B. Association between smoking behavior and allele frequency
Cig/d | Cases n | Cases allele frequency | Cases P‡ | Controls n | Controls allele frequency | Controls P‡
---|---|---|---|---|---|---
15q25.1 | | | | | |
rs12914385 | | | | | |
1–18 | 1,460 | 0.42 | | 479 | 0.34 |
19–24 | 1,349 | 0.44 | | 254 | 0.39 |
25+ | 1,209 | 0.48 | 6.7 × 10−5 | 174 | 0.37 | 0.16
rs8042374 | | | | | |
1–18 | 1,458 | 0.21 | | 479 | 0.24 |
19–24 | 1,348 | 0.18 | | 254 | 0.23 |
25+ | 1,206 | 0.17 | 1.2 × 10−3 | 174 | 0.22 | 0.46
rs938682 | | | | | |
1–18 | 1,460 | 0.21 | | 479 | 0.24 |
19–24 | 1,350 | 0.18 | | 254 | 0.23 |
25+ | 1,209 | 0.17 | 9.0 × 10−4 | 174 | 0.22 | 0.46
5p15.33 | | | | | |
rs4975616 | | | | | |
1–18 | 1,458 | 0.38 | | 479 | 0.40 |
19–24 | 1,349 | 0.40 | | 254 | 0.45 |
25+ | 1,208 | 0.39 | 0.55 | 174 | 0.41 | 0.52
rs2736100 | | | | | |
1–18 | 1,460 | 0.52 | | 479 | 0.49 |
19–24 | 1,348 | 0.53 | | 254 | 0.49 |
25+ | 1,208 | 0.51 | 0.65 | 174 | 0.50 | 0.78
6p21.33 | | | | | |
rs3117582 | | | | | |
1–18 | 1,459 | 0.16 | | 479 | 0.13 |
19–24 | 1,348 | 0.15 | | 254 | 0.14 |
25+ | 1,208 | 0.15 | 0.56 | 174 | 0.14 | 0.79
NOTE: Association between 15q25.1 (rs12914385, rs8042374, rs938682), 5p15.33 (rs4975616, rs2736100), 6p21.33 (rs3117582), and smoking behavior was assessed by studying the relationship with cigarette consumption (CPD). Complete CPD information was available for 4,019 UK-GWA study (phases 1 and 2) cases and 907 UK-GWA study phase 2 control samples. CPD were categorized into three strata (1 to 18, 19 to 24, and 25 or more CPD), defined so that the number of individuals in each was approximately equal.
Abbreviations: Adeno, adenocarcinoma; cig/d, cigarettes per day.
*From Kruskal-Wallis test.
†Risk genotype.
‡From trend test.
Meta-analysis of GWA studies. To facilitate the identification of additional risk variants, we conducted a meta-analysis pooling our UK-GWA phase 1 and phase 2 data with two other studies: IARC-GWA and Texas-GWA. Pooling was based on the 21,620 autosomal SNPs genotyped in all three GWA studies that had MAFs of >1% and showed no departure from Hardy-Weinberg equilibrium (departure defined as P ≤ 10−5 in cases and controls).
As expected, the strongest associations were obtained for SNPs mapping to 15q25.1 (rs8034191: OR, 1.29; 95% CI, 1.23–1.35; P = 3.24 × 10−26), 5p15.33 (rs4975616: OR, 0.87; 95% CI, 0.83–0.91; P = 2.99 × 10−9), and 6p21.33 (rs3117582: OR, 1.24; 95% CI, 1.16–1.33; P = 4.46 × 10−10; Supplementary Table S4). In the meta-analysis, the strongest association at 15q25.1 was for rs8034191, which lies 80 kb outside of CHRNA3. This SNP is in strong LD with rs12914385 (r2 = 0.72 based on HapMap CEU; r2 = 0.73 based on UK-GWA phase 2 controls). Excluding SNPs mapping to 15q25.1, 5p15.33, and 6p21.33, seven SNPs were associated with lung cancer risk at a P value of <10−5 (Supplementary Table S5). The most significant association was provided by rs748404 (OR, 0.87; 95% CI, 0.83–0.92; P = 1.08 × 10−6), mapping to 41,346,523bp on 15q15.2. Two other SNPs associated with lung cancer risk at a P value of <10−5 also map to 15q15.2 (rs504417 and rs11853991, at 41,341,518bp and 41,344,841bp, respectively) and are in strong LD with rs748404 (respective r2 values for rs748404-rs504417 and rs748404-rs11853991 are 0.65 and 0.68 based on HapMap CEU, and 0.59 and 0.61 based on UK-GWA phase 2 controls). rs748404 maps 3′ to transglutaminase 5 (TGM5), with rs504417 and rs11853991 mapping to intron 1 of this gene.
Architecture of genetic susceptibility to lung cancer. On the basis of MAFs and associated genotypic risks, we estimate the 5p15.33 and 6p21.33 variants individually account for ∼1% of the excess familial risk, with the 15q25.1 locus having a much greater impact (∼5%). To gain insight into the basis of the inherited risk of lung cancer in general, we estimated the power of our analyses to identify disease-associated loci with different MAF, which would account for 1% of the familial risk (Fig. 3). With the UK-GWA study, we had >90% power to harvest variants with similar characteristics to 15q25.1. However, we only had ∼30% and 40% power to identify variants such as 5p15.33 and 6p21.33, which have much weaker effects. Using all four data sets, our meta-analysis was well-powered to identify common variants (MAF, >0.15), provided each accounts for ≥1% excess risk. Clearly for variants such as 5p15.33, power still remained limited.
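The share of excess familial risk attributable to a single locus can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the calculation behind the estimates above: it assumes a multiplicative (log-additive) per-allele model, in which the familial relative risk due to a locus with risk-allele frequency p and per-allele relative risk r is (p·r² + q)/(p·r + q)² with q = 1 − p, and it expresses the locus contribution as the ratio of log relative risks. The function names, the allele frequency, and the overall familial relative risk (`lambda_overall`) are hypothetical placeholders.

```python
import math


def locus_familial_rr(risk_allele_freq, per_allele_rr):
    """Familial relative risk attributable to one locus (multiplicative model)."""
    p = risk_allele_freq
    q = 1.0 - p
    r = per_allele_rr
    return (p * r * r + q) / (p * r + q) ** 2


def fraction_of_familial_risk(risk_allele_freq, per_allele_rr, lambda_overall):
    """Proportion of the overall familial relative risk explained by the locus.

    lambda_overall is the overall familial relative risk of the disease;
    the value passed below is an assumed placeholder.
    """
    locus_lambda = locus_familial_rr(risk_allele_freq, per_allele_rr)
    return math.log(locus_lambda) / math.log(lambda_overall)


# Hypothetical inputs: allele frequency 0.35, per-allele risk 1.29,
# assumed overall familial relative risk of 2.0.
share = fraction_of_familial_risk(0.35, 1.29, lambda_overall=2.0)
print(f"share of excess familial risk: {share:.1%}")
```

Under this model the locus-specific familial relative risk equals 1 + pq(r − 1)²/(pr + q)², so it depends on both allele frequency and effect size, which is why a common variant with a modest odds ratio can still explain only a small fraction of familial risk.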
Figure 3. Power to detect lung cancer susceptibility alleles in the UK-GWA study (phases 1 and 2 combined) and the UK-IARC-Texas GWA studies meta-analysis. The power of the UK-GWA study (dashed lines; phases 1 and 2 combined) and of the UK-IARC-Texas GWA studies meta-analysis (solid lines; phases 1 and 2) to identify susceptibility alleles with different minor allele frequencies is shown. Power to identify the 5p15.33 (rs4975616), 6p21.33 (rs3117582), and 15q25.1 (rs12914385) variants in each analysis is denoted by squares and diamonds, respectively (P = 10−7).
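A power calculation of this kind can be approximated with a simple normal-theory sketch. The code below assumes a single-stage allelic test comparing risk-allele frequencies between cases and controls at a two-sided threshold of P = 10−7; it ignores the two-phase design and between-study heterogeneity, so it is a rough illustration rather than the method used to draw the curves. The function name and the example allele frequency are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha=1e-7):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p0 = control_freq
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (odds_ratio * p0 + (1.0 - p0))
    # Each individual contributes two alleles.
    se = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases) + p0 * (1.0 - p0) / (2 * n_controls))
    z_crit = -norm.inv_cdf(alpha / 2.0)
    shift = abs(p1 - p0) / se
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical allele frequency of 0.35 with OR 1.29, evaluated at the pooled
# sample size quoted in the text (7,560 cases and 8,205 controls).
print(f"power: {allelic_test_power(0.35, 1.29, 7560, 8205):.3f}")
# A weaker effect at the same frequency and sample size.
print(f"power: {allelic_test_power(0.35, 1.10, 7560, 8205):.3f}")
```

Varying `control_freq` while adjusting the odds ratio so that each variant explains a fixed share of familial risk would trace out curves of the general form described in the caption.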
Discussion
These analyses provide increased support that variation at 15q25.1, 5p15.33, and 6p21.33 influences the risk of developing lung cancer. Our estimate of the contribution of 15q25.1, 5p15.33, and 6p21.33 loci to the excess familial risk of lung cancer is likely to be conservative as the effect of the causal variant will typically be larger than the association detected through a tag SNP. This is especially relevant with respect to the 15q25.1 association as we provide evidence for two independent loci. Furthermore, because a high proportion of UK-GWA study phase 2 controls were spouses and unrelated friends of lung cancer cases, overmatching on life-time smoking exposure (i.e., cases and controls may have been more likely to be concordant on smoking status than individuals of the general population) may have impacted on study findings. Hence, risk estimates for smoking-related SNPs identified in our analysis may be attenuated. In addition, multiple causal variants may exist at each locus including low frequency variants with significantly larger effects on risk. This may impact significantly on the contribution of CHRNA5-A3 region to the familial risk of lung cancer, especially as our analysis provides evidence to support independent alleles at this locus.
Identification of the causal variants for these loci will be challenging and contingent on resequencing and fine mapping studies. Although in part speculative, current data provide information on the probable genetic basis of the associations at 15q25.1, 5p15.33, and 6p21.33. Although there is strong evidence that a major component (some would assert all) of the lung cancer risk associated with 15q25.1 is mediated through propensity to smoke and hence a higher exposure to smoking-related carcinogens (3, 11), this does not exclude the possibility that 15q25.1 variation also has a direct effect on lung cancer risk, as has also been proposed (2, 11).
Although our UK study does not support the tenet of two independent loci at 5p15.33 for lung cancer, which has recently been proposed (5), data from both IARC and Texas provide strong evidence for an independent locus (Supplementary Table S6). Stratification of UK data by histology does, however, provide strong evidence that rs2736100 genotype influences the histology of lung cancer induction, favoring development of NSCLC-adenocarcinoma (Table 2). It is therefore noteworthy that the Texas GWA study was based on analysis of only NSCLC cases, and despite its relatively small size, a strong association between rs2736100 and risk was detected. Both CLPTM1L and TERT, which map to 5p, represent attractive candidates for lung cancer susceptibility a priori, assuming the causal variant exerts an influence through a cis effect. The biology of TERT makes it an attractive candidate for a gene that influences lung cancer risk, and moreover, an association between the rs401681 risk allele and shorter telomere length has recently been reported (12). High levels of polycyclic aromatic hydrocarbon adducts correlate with lung cancer risk, and as a major effect of platinum on cells is through adduct formation, CLPTM1L (alias CRR9) is also an attractive lung cancer susceptibility gene, as it encodes a transcript whose overexpression has been linked to cis-platinum resistance (13).
The 6p21.33 association could be mediated through any number of transcripts mapping to the region of LD. BAT3 represents a strong candidate for lung cancer susceptibility as it is implicated in apoptosis and the protein complexes with E1A-binding protein p300, required for acetylation of p53 in response to DNA damage (14). As the region of LD at 6p21.33 is extensive and contains a large number of transcripts, dissection and elucidation of the causal variant make this the most refractory to interrogation.
Findings from this GWA study provide insight into the allelic architecture of predisposition to lung cancer. Pooling data from our study with two other GWA studies provided a combined analysis of 7,560 cases and 8,205 controls. Nevertheless, we have failed to identify loci additional to 5p15.33, 6p21.33, and 15q25.1. As our power to detect the major common loci conferring risks of 1.2 or greater was high, we consider that there are unlikely to be many additional SNPs (tagged by the Illumina 550K array) with similar effects for alleles with frequencies of >0.2 in populations of European ancestry. Inevitably, we had low power to detect alleles with smaller effects and/or MAFs of <0.1. By implication, variants with such profiles may represent a much larger class of susceptibility loci for lung cancer, and current GWA strategies based on commercially available arrays are not optimally configured to identify low frequency variants with potentially stronger effects. Thus, there may be a large number of low penetrance variants that remain to be discovered. Further efforts to expand the scale of GWA meta-analyses, in terms of both sample size and SNP coverage, and to increase the number of SNPs taken forward to large-scale replication may thus lead to the identification of additional variants for lung cancer. Twin studies have not, however, provided evidence for heritable factors for the risk of lung cancer (15, 16). The identification of disease-causing alleles for lung cancer may thus be inherently harder than for other cancers in which familial aggregation of a major life-style/environmental risk factor is less likely to be a confounder.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
P. Broderick and Y. Wang contributed equally to this work.
The R suite can be found at http://www.r-project.org/.
Detailed information on the tag single nucleotide polymorphism panel can be found at http://www.illumina.com/.
HAPMAP: http://www.hapmap.org/.
Genetic Lung Cancer Predisposition Study (GELCAPS): http://pfsearch.ukcrn.org.uk/StudyDetail.aspx?TopicID=1&StudyID=781.
National Study of Colorectal Cancer Genetics (NSCCG): http://pfsearch.ukcrn.org.uk/StudyDetail.aspx?TopicID=1&StudyID=1269.
1958 Birth Cohort: http://www.cls.ioe.ac.uk/studies.asp?section=0001000200030012.
Central Europe data from IARC-GWAS: http://www.ceph.fr/cancer.
Acknowledgments
Grant support: This work was supported by Cancer Research UK (C1298/A8780 and C1298/A8362-Bobby Moore Fund for Cancer Research UK) who provided principal funding for this study. A. Matakidou was the recipient of a clinical research fellowship from the Allan J Lerner Fund. We are also grateful to National Cancer Research Network, Helen Rollason Heal Cancer Charity, and Sanofi-Aventis. Additional funding was obtained from NIH grants 5R01CA055769, 5R01CA127219, 5R01CA133996, and 5R01CA121197. We acknowledge NHS funding for the Royal Marsden Biomedical Research Centre.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank all the individuals that participated in this study and the clinicians who took part in the GELCAPS consortium. This study made use of genotyping data on the 1958 Birth Cohort; these data were generated and generously supplied to us by Panagiotis Deloukas of the Wellcome Trust Sanger Institute. A full list of the investigators who contributed to the generation of the data is available from the Wellcome Trust Case Control Consortium Web site.5