Abstract
Background: Variation at TP63 has recently been shown to be associated with lung adenocarcinoma in the Asian population.
Methods: To investigate how this finding translates to the European population, we compared the genotypes of SNPs annotating the TP63 locus at 3q28 in 4,462 lung cancer patients, including 911 with adenocarcinoma, and 8,235 controls from the United Kingdom.
Results: A statistically significant association between adenocarcinoma risk and SNP genotype was shown: rs10937405, OR = 1.21, P = 1.82 × 10−4; rs17429138, OR = 1.23, P = 7.49 × 10−5; and rs4396880, OR = 1.21, P = 2.03 × 10−4. Haplotype analysis was consistent with a single TP63 risk locus defined by SNPs rs10937405, rs17429138, and rs4396880. While no association between SNPs and small cell lung cancer was shown, the rs10937405 and rs4396880 associations were significant for squamous cancer (respective P-values, 0.0022 and 0.02).
Conclusions: These findings show that TP63 variation is a risk factor for the development of lung adenocarcinoma in the UK population. Furthermore, they provide additional insight into the subtype-specificity of the 3q28 lung cancer association.
Impact: Our data confirm the association of 3q28 with lung adenocarcinoma and that this association is not confined to the Asian population. Elucidating the functional basis of this association will be contingent on future fine mapping of the TP63 loci. Cancer Epidemiol Biomarkers Prev; 20(7); 1453–62. ©2011 AACR.
Introduction
Primary lung cancer is a major cause of cancer death worldwide, causing over 1 million deaths each year (1). The various histological forms of lung cancer are typically divided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), which principally comprises adenocarcinoma and squamous tumors. Each of the lung cancer types has different clinicopathological characteristics reflective of differences in carcinogenesis (2).
While lung cancer is largely caused by tobacco smoking, previous studies have implicated inherited genetic factors in disease etiology. Notably, genome-wide association (GWA) studies of lung cancer have robustly demonstrated that polymorphic variation at 5p15.33 (TERT-CLPTM1L), 6p21.33 (BAT3-MSH5), and 15q25.1 (CHRNA5-CHRNA3-CHRNB4) influences lung cancer risk in European populations (3–7). Given the biological differences between the types of lung cancer, searches for histology-specific associations have been conducted. Analysis of European GWA datasets has shown that the single nucleotide polymorphism (SNP) rs2736100 (TERT) is principally associated with adenocarcinoma risk (8, 9). A recent GWA study replicated the rs2736100 adenocarcinoma risk association in Japanese and Korean populations and also showed an association between rs10937405, annotating TP63 at 3q28, and adenocarcinoma risk (10).
Understanding the effects of these risk variants in different populations is important in terms of inferring disease causality as well as for the translation of these results to risk prediction in different populations. The risk variants may confer different magnitudes of increased risk in different populations for a variety of reasons, including differences in allele frequency and linkage disequilibrium (LD) structure, and differences in the genetic and environmental backgrounds that interact with the variants.
To provide further insights into the relationship between 3q28 variation and adenocarcinoma of the lung, we have analyzed a large series of cases and controls from the UK.
Materials and Methods
Study participants
This analysis is based on data previously generated from a 2-stage GWA study of lung cancer (6, 8). Briefly, phase 1 comprised 1,978 cases with pathologically confirmed lung cancer ascertained through the Genetic Lung Cancer Predisposition Study (GELCAPS; ref. 11). Of the 1,978 cases, 472 (24%) had a diagnosis of adenocarcinoma. A total of 5,199 individuals from the 1958 Birth Cohort and the National Blood Service served as the source of phase 1 controls (12, 13). Phase 2 consisted of an additional 2,484 lung cancer cases ascertained through GELCAPS (11). Of the 2,484 cases, 439 (18%) had a diagnosis of adenocarcinoma. Control blood samples were obtained from 3,036 healthy individuals recruited to the National Cancer Research Network genetic epidemiologic studies: the National Study of Colorectal Cancer (1999–2006; n = 541; ref. 14), GELCAPS (1999–2004; n = 1,520), and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (1999–2004; n = 975). All of the cases and controls were British residents and had self-reported European ancestry. Table 1 provides details of the cases and controls. In phase 1, demographic information for the publicly accessible controls is not available. In phase 2, cases tended to be older than controls and a higher proportion were male. Furthermore, the proportion of smokers was higher among cases than among controls, and cigarette consumption was greater (Table 1). Collection of blood samples and clinicopathological information from patients and controls was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki.
SNP selection and genotyping
DNA was extracted from samples using conventional methodologies and quantified using PicoGreen (Invitrogen). Genotyping of phases 1 and 2 was conducted using Illumina Human550 BeadChips and Illumina Infinium custom arrays, respectively, according to the manufacturer's protocols as previously described (6, 8). Our selection of SNPs for analysis was largely dictated by previously published data. Miki and colleagues reported an association between rs10937405 and adenocarcinoma risk in the Asian population (10). In addition, they provided evidence for a weak association between rs4396880 and lung cancer risk in the Central European population using data previously generated by IARC researchers (10).
For phase 1, in addition to analyzing these 2 SNPs, we derived the genotypes for 35 SNPs that map to a 169 kb region of LD encompassing rs10937405 (190,865,877 bps) at 3q28 (190,707,812–190,876,439 bps; Supplementary Table S1). For phase 2, we derived rs4396880 (190,838,915 bps) genotypes from the Illumina phase 2 data but genotyped rs10937405 and rs17429138 (190,728,287 bps) directly using allele-specific PCR (KBiosciences). In all assays, a DNA sample was deemed to have failed if it generated genotypes at less than 95% of loci. A SNP was deemed to have failed if fewer than 95% of DNA samples generated a genotype at the locus. To monitor genotyping quality, a series of duplicate samples was genotyped in the same batches.
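The 95% call-rate rules above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: the genotype matrix layout (rows as DNA samples, columns as SNPs, `None` marking a failed call) and the helper names `call_rate` and `apply_qc` are hypothetical, and filtering samples before SNPs is an assumption, since the text does not state the order.

```python
from typing import List, Optional, Sequence, Tuple

Call = Optional[int]  # risk-allele count 0/1/2, or None for a failed call


def call_rate(calls: Sequence[Call]) -> float:
    """Fraction of genotype calls that are not missing."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def apply_qc(matrix: List[List[Call]], sample_min: float = 0.95,
             snp_min: float = 0.95) -> Tuple[List[int], List[int]]:
    """Return indices of samples (rows) and SNPs (columns) passing QC.

    Samples are filtered first on their call rate across all loci; SNPs are
    then judged on the retained samples only (this ordering is an assumption).
    """
    kept_samples = [i for i, row in enumerate(matrix)
                    if call_rate(row) >= sample_min]
    n_snps = len(matrix[0]) if matrix else 0
    kept_snps = [j for j in range(n_snps)
                 if call_rate([matrix[i][j] for i in kept_samples]) >= snp_min]
    return kept_samples, kept_snps
```

Reversing the order (filtering SNPs first) can change which borderline samples and SNPs are retained, so the order is worth stating explicitly in any reimplementation.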
Statistical and bioinformatic analysis
In all analyses, a 2-sided P value of 0.05 or less was considered statistically significant. Statistical analyses were undertaken in R (v2.8) software. Deviation of the genotype frequencies in the controls from those expected under Hardy–Weinberg equilibrium (HWE) was assessed by χ2 test. ORs and associated 95% CIs were calculated by unconditional logistic regression. Because demographic information was unavailable for the phase 1 controls, ORs were adjusted for age and gender only for the phase 2 data. To investigate the relationship between genotype and age, sex, and family history, we conducted a case-only analysis using both phase 1 and phase 2 case data. To examine the impact of genotype on smoking quantity, we tested the equality of median cigarette consumption across the three genotype strata using the Kruskal–Wallis test. The population attributable risk (PAR) contributed by TP63 variants was derived by the formula:
$$\mathrm{PAR} = \frac{p(R - 1)}{1 + p(R - 1)}$$
where $p$ is the prevalence in controls of the lung cancer risk allele at the locus, and $R$ is the OR of the risk allele at the locus.
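As an illustration of the per-allele ORs estimated by unconditional logistic regression, the sketch below fits an unadjusted log-additive model, logit P(case) = b0 + b1 × g, by Newton-Raphson in plain Python, with g the number of risk alleles (0, 1, or 2). It is an assumption-laden stand-in for the R analysis described above, not the authors' code: covariates such as age and sex are omitted, the function name `per_allele_or` is hypothetical, and the CI is a Wald interval.

```python
import math
from typing import Sequence, Tuple


def per_allele_or(genotypes: Sequence[int], status: Sequence[int],
                  iters: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    genotypes: risk-allele counts (0, 1, 2); status: 1 = case, 0 = control.
    Returns the per-allele OR exp(b1) with a Wald 95% CI.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (inverse information) times score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Var(b1) is the (1,1) element of the inverse information at convergence.
    se = math.sqrt(h00 / det)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)
```

For a binary predictor (g restricted to 0 or 1), the maximum-likelihood OR equals the cross-product ratio of the 2 × 2 table and the Wald standard error equals Woolf's, which gives a convenient check of the fit.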
Haplotype analysis was performed in PLINK (v1.07) software (15), in which a standard expectation–maximization (E–M) algorithm is used to infer each individual's haplotypes probabilistically, and the distribution of the inferred haplotypes is compared between cases and controls. LD metrics between HapMap SNPs were based on HapMap III Release 27, viewed using Haploview (v4.2; ref. 16), and plotted using SNAP. LD blocks were defined on the basis of HapMap recombination rates (cM/Mb), using Oxford recombination hotspots (17), and on the basis of the distribution of CIs as previously defined (18). Untyped SNPs were imputed using IMPUTE v2, based on HapMap III Release 27 (Feb 2009, NCBI B36, dbSNP26) and the 1000 Genomes Project. Imputed data were analyzed using SNPTEST v2 to account for uncertainty in SNP prediction, using a threshold of 95% or more for the maximum posterior probability of a genotype call.
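To illustrate the kind of E–M step used for haplotype inference, the following sketch estimates haplotype frequencies for just two biallelic SNPs, where only double heterozygotes have ambiguous phase. It is a simplified, hypothetical example (`em_two_snp` is not a PLINK function); PLINK handles multi-SNP haplotypes and goes on to test the inferred haplotypes for association.

```python
from typing import Dict, List, Tuple

Hap = Tuple[int, int]  # (allele at SNP A, allele at SNP B), alleles coded 0/1


def em_two_snp(genotypes: List[Tuple[int, int]], max_iter: int = 200,
               tol: float = 1e-10) -> Dict[Hap, float]:
    """Estimate two-SNP haplotype frequencies by expectation-maximization.

    genotypes: (gA, gB) per individual, each the count (0, 1, 2) of allele 1.
    Only double heterozygotes (1, 1) have ambiguous phase.
    """
    haps: List[Hap] = [(0, 0), (0, 1), (1, 0), (1, 1)]
    fixed = {h: 0.0 for h in haps}  # haplotype counts from phase-known people
    n_double_het = 0
    for ga, gb in genotypes:
        if ga == 1 and gb == 1:
            n_double_het += 1
            continue
        # At most one heterozygous site, so both haplotypes are determined.
        alleles_a = [0, 1] if ga == 1 else [ga // 2] * 2
        alleles_b = [0, 1] if gb == 1 else [gb // 2] * 2
        for a, b in zip(alleles_a, alleles_b):
            fixed[(a, b)] += 1
    n_chrom = 2 * len(genotypes)
    freqs = {h: 0.25 for h in haps}
    for _ in range(max_iter):
        # E-step: probability a double heterozygote is 11/00 rather than 10/01.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        q = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(fixed)
        for h in ((1, 1), (0, 0)):
            counts[h] += n_double_het * q
        for h in ((1, 0), (0, 1)):
            counts[h] += n_double_het * (1 - q)
        # M-step: new frequencies are expected counts over all chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs
```

When the phase-known individuals carry only the 00 and 11 haplotypes, the E–M iterations assign double heterozygotes almost entirely to the 11/00 configuration, as expected under strong LD.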
Results
While this study was primarily a study of the relationship between TP63 variation and risk of adenocarcinoma, we also investigated the relationship between genotype and other lung cancer subtypes. Genotypes were obtained for more than 95% of cases and controls for all SNPs irrespective of genotyping platform; hence there was no evidence of any systematic bias in genotyping. There was complete concordance between duplicate samples. The SNP allele frequencies in each of the control series in our study were similar to previously published data on the Northern European population. Furthermore, there was no evidence of population stratification as the genotype distribution in controls for each SNP satisfied HWE (i.e., P > 0.05; Supplementary Table S1).
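The HWE check in controls relies on the χ2 test described in the Methods. A minimal Python sketch of that test for a single SNP, assuming the three genotype counts are available (the function name `hwe_chi2` is hypothetical), is shown below; for SNPs with sparse genotype classes an exact test is often preferred.

```python
import math
from typing import Tuple


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg proportions (1 df).

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

One degree of freedom remains because three genotype classes are compared and one parameter (the allele frequency) is estimated from the data.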
Confining our analysis to the relationship between 3q28 variation and adenocarcinoma risk, in phase 1, 17 of the 37 SNPs provided evidence for an association at P < 0.05 (Fig. 1). The strongest association was provided by rs17429138 (per allele P = 2.51 × 10−3; Table 2). Evidence for an association was also provided by rs10937405 (per allele P = 9.24 × 10−3) and rs4396880 (per allele P = 1.00 × 10−2; Table 2).
Each of these 3 SNPs provided support for a relationship between TP63 variation and adenocarcinoma risk in the phase 2 data (Table 2). ORs were unaffected by adjustment for age and sex (Supplementary Table S2). Moreover, pooling data from the 2 case–control series provided statistically significant evidence for an association with rs10937405 (per allele P = 1.82 × 10−4), rs17429138 (per allele P = 7.49 × 10−5), and rs4396880 (per allele P = 2.03 × 10−4), even after adjustment for the multiple testing entailed by evaluating 37 SNPs. For all 3 SNPs, the association with lung adenocarcinoma risk was dose dependent, with the highest risks conferred by homozygosity for the risk allele (Table 2).
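The text does not state which multiple-testing correction was applied. Assuming a Bonferroni correction for the 37 SNPs evaluated (a conservative choice, since SNPs in LD are not independent tests), the pooled per allele P values can be checked as follows; the names used are illustrative only.

```python
from typing import Dict

N_SNPS_TESTED = 37  # SNPs evaluated in phase 1
ALPHA = 0.05

pooled_p: Dict[str, float] = {
    "rs10937405": 1.82e-4,
    "rs17429138": 7.49e-5,
    "rs4396880": 2.03e-4,
}


def bonferroni(p: float, m: int = N_SNPS_TESTED) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * m)


threshold = ALPHA / N_SNPS_TESTED  # per-test significance level, about 1.35e-3
for snp, p in pooled_p.items():
    print(f"{snp}: P = {p:.2e}, adjusted P = {bonferroni(p):.2e}, "
          f"significant = {p <= threshold}")
```

Under this assumption the adjusted P values are about 6.7 × 10−3, 2.8 × 10−3, and 7.5 × 10−3, all below 0.05.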
Following these analyses, we investigated the relationship between rs10937405, rs17429138, and rs4396880 genotypes and the other lung cancer histologies (Table 2). None of the 3 SNPs provided evidence for an association between 3q28 variation and risk of SCLC (Table 2). In contrast, a strong relationship with NSCLC was shown; the respective combined per allele P-values were 4.49 × 10−6, 5.07 × 10−4, and 2.98 × 10−5 (Table 2). This association was not driven by adenocarcinoma alone: support was also provided by an association with squamous cancer, notably for rs10937405 and rs4396880, for which the respective per allele P-values in the combined analysis were 2.15 × 10−3 and 2.00 × 10−2 (Table 2).
To explore age- and sex-specific differences, we conducted a case-only analysis of rs10937405, rs17429138, and rs4396880, using age 65 years to stratify age at diagnosis of adenocarcinoma. This analysis provided no evidence that the risk associated with TP63 genotype is modified by age or gender (Table 3). We also found no evidence to support a relationship between TP63 genotype and a family history of lung cancer (defined as having at least one first-degree relative affected with lung cancer; Table 3). Using either all cases or controls, we found no evidence that TP63 genotype defined by rs10937405, rs17429138, or rs4396880 influences cigarette consumption (Table 4).
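The Kruskal–Wallis comparison of cigarette consumption across the three genotype strata can be sketched in plain Python as below, with a tie-corrected H statistic. The P value uses the fact that the survival function of a χ2 distribution with 2 degrees of freedom is exp(−x/2), so this shortcut is valid only for three groups. The function names are hypothetical, and in practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> Tuple[List[float], int]:
    """1-based ranks with ties averaged; also returns sum of (t^3 - t) over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share the mean of ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kruskal_wallis_3(g0: Sequence[float], g1: Sequence[float],
                     g2: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H for three groups and its chi-square(2) P value.

    Assumes not every observation is tied (otherwise the correction is zero).
    """
    groups = [list(g0), list(g1), list(g2)]
    pooled = [v for g in groups for v in g]
    ranks, tie_term = average_ranks(pooled)
    n = len(pooled)
    rank_term = 0.0
    start = 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        rank_term += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)
    # Survival function of chi-square with 2 df is exp(-x / 2).
    return h, math.exp(-h / 2.0)
```

Here each group would hold the cigarette consumption values (for example, pack-years) of individuals carrying 0, 1, or 2 risk alleles.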
Figure 1 shows the positions of the SNPs rs10937405, rs17429138, and rs4396880 mapping to 3q28 and the relative positions of the 2 isoforms of TP63: the TA and N-terminal-truncated (ΔN) TP63. Also shown is the LD structure across the region. The SNPs rs10937405, rs17429138, and rs4396880 are highly correlated within the CEU population (rs10937405–rs17429138, r2 = 0.60, D′ = 0.86; rs17429138–rs4396880, r2 = 0.66, D′ = 0.83; rs10937405–rs4396880, r2 = 0.82, D′ = 0.98), thus defining a single risk haplotype (Supplementary Table S3).
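The r2 and D′ values quoted above are both functions of two-SNP haplotype frequencies. The sketch below computes them from a set of haplotype frequencies (for example, as estimated by an E–M algorithm); the function name `ld_stats` and the allele labelling are illustrative only, and the frequencies in the example are hypothetical.

```python
from typing import Tuple


def ld_stats(f11: float, f12: float, f21: float, f22: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic SNPs.

    fij is the frequency of the haplotype carrying allele i at SNP 1 and
    allele j at SNP 2; the four frequencies are assumed to sum to 1 and both
    SNPs are assumed polymorphic.
    """
    p1 = f11 + f12  # frequency of allele 1 at SNP 1
    q1 = f11 + f21  # frequency of allele 1 at SNP 2
    d = f11 * f22 - f12 * f21  # equals f11 - p1 * q1
    # D' scales D by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max
    r2 = d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    return d_prime, r2
```

Because D′ is normalized by the maximum D allowed by the allele frequencies, it can equal 1 while r2 is well below 1 when the allele frequencies differ: hypothetical haplotype frequencies (0.3, 0.2, 0.0, 0.5) give D′ = 1 but r2 ≈ 0.43. This is why a pair such as rs10937405–rs17429138 can show D′ = 0.86 with r2 = 0.60.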
Using phase 1 data, we sought to establish whether we could identify SNPs better correlated with risk of adenocarcinoma at 3q28 (190.5–191.1 Mb, encompassing TP63; Fig. 1) through imputation of untyped SNPs referencing HapMap. In total, 1,497 additional HapMap SNPs mapping to the remainder of the interval were successfully imputed. Nine SNPs, all mapping 5′ to TP63, provided slightly stronger evidence for an association with adenocarcinoma risk than rs17429138 (rs190726018, rs34000992, rs16864458, rs35218873, rs1597774, rs2378502, rs6787097, rs9290894, rs6444380; Fig. 1).
Discussion
Our findings provide evidence that polymorphic variation annotating TP63 plays a role in determining the risk of developing lung adenocarcinoma, thereby confirming the recent observation made by Miki and colleagues (10) in an analysis of Japanese and Korean populations. In addition, our analysis provides evidence that the association, while not extending to SCLC, also appears to influence other forms of NSCLC.
A major strength of our study is that these data have been ascertained systematically and in a consistent fashion, and that by making use of GWA data, bias from confounding by population stratification has been avoided. Population stratification is a concern in all association studies as a source of bias, as the genotype frequencies for many polymorphic variants differ markedly between ethnic groups. We have sought to further minimize this form of bias by excluding subjects with self-reported non-European ethnicity and by using GWA SNP data to identify non-CEU individuals. Moreover, the frequencies of SNP genotypes in controls were directly comparable to those seen in previously published data on the UK population. It is entirely conceivable that polymorphic variation, for example in TP63, may contribute to the differing rates of adenocarcinoma between ethnic groups. The risk of adenocarcinoma associated with rs10937405 reported by Miki and colleagues (10) in Asians was higher than that seen in the United Kingdom (per allele ORs of 1.31 and 1.20, respectively); however, the risk allele is more common in the United Kingdom (0.43 vs. 0.33), suggesting that the variant accounts for a PAR of about 8% for adenocarcinoma in both populations.
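The PAR comparison above can be reproduced roughly under two assumptions made for this sketch: that the PAR takes the per-allele form PAR = p(R − 1)/[1 + p(R − 1)], with p the control risk-allele frequency and R the per-allele OR, and that the allele frequencies pair as 0.43 in the UK and 0.33 in Asians. The function name `par_per_allele` is hypothetical.

```python
def par_per_allele(p: float, odds_ratio: float) -> float:
    """PAR = p(R - 1) / (1 + p(R - 1)).

    p is the control risk-allele frequency and R the per-allele OR;
    this per-allele form is an assumption of the sketch.
    """
    excess = p * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


# Assumed pairing: risk-allele frequency 0.43 in the UK and 0.33 in Asians.
uk = par_per_allele(0.43, 1.20)
asian = par_per_allele(0.33, 1.31)
print(f"UK PAR = {uk:.1%}, Asian PAR = {asian:.1%}")
```

Under these assumptions the values are about 7.9% (UK) and 9.3% (Asian): the larger OR in Asians is largely offset by the lower risk-allele frequency.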
While it will be challenging to identify the precise mechanism by which 3q28 variation affects lung adenocarcinoma development, accumulation of DNA damage and lack of response to genotoxic stress are recognized to contribute to lung carcinogenesis. TP63 is a member of the TP53 tumor suppressor gene family, which is pivotal to cellular differentiation and responsiveness to cellular stress (19). Exposure of cells to DNA damage leads to induction of TP63, and both isoforms have the ability to transactivate TP53 target genes, hence affecting cellular responsiveness to DNA damage (20, 21). The TAp63 isoforms are transcribed from a promoter located upstream of exon 1 of the gene, whereas expression of the ΔNp63 isoforms is regulated by a promoter within intron 3 of TP63 (22). rs10937405, rs17429138, and rs4396880 appear to define a single risk haplotype to which a functional variant maps. While it is probable that the association annotated by this haplotype reflects a single risk variant, this does not preclude the possibility that the haplotype captures multiple functional risk alleles. Although elucidating a functional basis for the SNP associations will be contingent on fine mapping, it is entirely plausible that they affect TP63 expression, either directly or through LD, especially as our imputed data suggest that the association maps 5′ to the coding region of TP63.
In summary, our data confirm TP63 as a susceptibility gene for lung adenocarcinoma and show that the association is not confined to the Asian population. Furthermore, our data provide evidence that the association may extend to other forms of NSCLC.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
This work was supported by Cancer Research UK (C1298/A8780 and C1298/A8362, Bobby Moore Fund for Cancer Research UK), which provided principal funding for this study. Athena Matakidou was the recipient of a clinical research fellowship from the Allan J Lerner Fund. We are also grateful to the National Cancer Research Network, Helen Rollason Heal Cancer Charity, and Sanofi-Aventis. We acknowledge NHS funding for the Royal Marsden Biomedical Research Centre. We would like to thank all individuals who participated in this study and the clinicians who took part in the GELCAPS consortium. This study made use of genotyping data on the 1958 Birth Cohort; these data were generated and generously supplied to us by Panagiotis Deloukas of the Wellcome Trust Sanger Institute. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk.