DNA pooling in combination with high-throughput sequencing was done as a part of the Sequenom-Genefinder project. In the pilot study, we tested 83,715 single nucleotide polymorphisms (SNP), located primarily in gene-based regions, to identify polymorphic susceptibility variants for lung cancer. For this pilot study, 369 male cases and 287 controls of both sexes (white Europeans of Southern German origin) were analyzed. The study identified a candidate region in 22q12.2 that contained numerous SNPs showing significant case-control differences and that coincides with a region that was shown previously to be frequently deleted in lung cancer cell lines. The candidate region overlies the seizure 6-like (SEZ6L) gene. The pilot study identified a polymorphic Met430Ile substitution in the SEZ6L gene (SNP rs663048) as the top candidate for a variant modulating risk of lung cancer. Two replication studies were conducted to assess the association of SNP rs663048 with lung cancer risk. The M. D. Anderson Cancer Center study included 289 cases and 291 controls matched for gender, age, and smoking status. The Liverpool Lung Project (a United Kingdom study) included 248 cases and 233 controls. Both replication studies showed an association of the rs663048 with lung cancer risk. The homozygotes for the variant allele had more than a 3-fold risk compared with the wild-type homozygotes [combined odds ratio (OR), 3.32; 95% confidence interval (95% CI), 1.81–7.21]. Heterozygotes also had a significantly elevated risk of lung cancer from the combined replication studies with an OR of 1.15 (95% CI, 1.04–1.59). The effect remained significant after adjusting for age, gender, and pack-years of tobacco smoke. We also compared expression of SEZ6L in normal human bronchial epithelial cells (n = 7), non–small cell lung cancer (NSCLC; n = 52), and small cell lung cancer (SCLC; n = 22) cell lines by using Affymetrix HG-U133A and HG-U133B GeneChips. 
We found that the average expression level of SEZ6L in NSCLC cell lines was almost two times higher and in SCLC cell lines more than six times higher when compared with normal lung epithelial cell lines. Using the National Center for Biotechnology Information Gene Expression Omnibus database, we found a ∼2-fold elevated and statistically significant (P = 0.004) level of SEZ6L expression in tumor samples compared with normal lung tissues. In conclusion, the results of these studies representing 906 cases compared with 811 controls indicate a role of the SEZ6L Met430Ile polymorphic variant in increasing lung cancer risk. [Cancer Res 2007;67(17):8406–11]

Although up to 90% of lung cancers are attributable to smoking, only a small fraction of smokers develop lung cancer over their lifetimes (1), suggesting that genetic variation may contribute to lung cancer susceptibility (2). Results of segregation analyses suggest that rare autosomal dominant polymorphisms may explain susceptibility to early-onset lung cancer; however, only a minority of lung cancer cases can be explained by the presence of such variants (3–6). There is also a rapidly expanding body of literature on the association of common, low-penetrance genes with lung cancer risk (5–8). According to the latest update of the Cancer Genetics Web database, more than 40 genes may be involved in susceptibility to, and progression of, lung cancer.

Identification of genetic factors modulating lung cancer risk requires a combination of effective genotyping technologies with an appropriate and efficient study design. Sequenom (San Diego, CA) has developed a DNA analysis platform capable of high-throughput genotyping with pooled DNA allele frequency analysis. Using this approach, Sequenom implemented a Genetics Discovery platform with dense genome-wide single nucleotide polymorphism (SNP) markers (7, 8). A hypothesis-free approach using allele frequency estimates of many thousands of SNPs (83,715 for lung cancer) was used as a first step (pilot study) in identifying potentially relevant genetic variants. Significant SNPs identified in this first step were then individually genotyped and validated in replication studies using independent samples. The efficiency of this strategy has been shown by the rediscovery of genes shown previously to be involved in several common diseases (8–10). The purpose of the current study was to implement this strategy to identify genetic variation modulating lung cancer risk.

Sequenom-Genefinder pilot study (Southern Germany). Lung cancer cases for the pilot study were recruited from the Departments for Respiratory Medicine and Thoracic Surgery, Schillerhöhe Specialist Hospital (Stuttgart-Gerlingen, Germany) and the Department for Respiratory Medicine, Asklepios Specialist Hospitals (Munich-Gauting, Germany). Controls were sampled from patients with nonmalignant disease at the same hospitals. The final sample consisted of 369 male cases and 287 controls of both genders, all with a positive history of tobacco smoke exposure. A total of 83,715 SNPs, mostly in gene-based regions, were used in the analysis. Epidemiologic data were collected by personal interview. Table 1 provides a description of cases and controls used in the pilot study.

Table 1. Description of the cases and controls from the pilot and the two replication studies

| Characteristic | Sequenom-Genefinder pilot study (Southern Germany): cases, n = 369 (%) | Sequenom-Genefinder pilot study (Southern Germany): controls, n = 287 (%) | MDACC (United States): cases, n = 289 (%) | MDACC (United States): controls, n = 291 (%) | LLP (United Kingdom): cases, n = 248 (%) | LLP (United Kingdom): controls, n = 233 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Gender: male | 369 (100) | 196 (68.29) | 164 (56.75) | 158 (54.30) | 159 (64.11) | 158 (67.81) |
| Gender: female | | 91 (31.71) | 125 (43.25) | 133 (45.70) | 89 (35.89) | 75 (32.19) |
| Smoking status: never | | | | | 13 (5.24) | 66 (28.33) |
| Smoking status: former | | | 168 (58.13) | 173 (59.45) | 105 (42.34) | 121 (51.93) |
| Smoking status: current | 368 (100) | 287 (100) | 121 (41.87) | 118 (40.55) | 130 (52.42) | 46 (19.74) |
| Age (y), mean (SD) | 65.4 (6.3) | 63.0 (8.3) | 62.7 (9.94) | 62.2 (10.09) | 65.9 (9.2) | 65.4 (8.4) |

The M. D. Anderson Cancer Center replication study (United States). Cases and controls for the M. D. Anderson Cancer Center (MDACC) study were recruited from an ongoing lung cancer case-control study enrolling patients with newly diagnosed, untreated lung cancer at The University of Texas MDACC (Houston, TX). The study has been described in detail previously (11). Control subjects were recruited from the largest private multispecialty physician group in the Houston metropolitan area. The controls did not have a previous diagnosis of any type of cancer and were frequency matched to the cases on age (±5 years), sex, ethnicity, and smoking status (current, former, and never). A never smoker was defined as a person who had smoked fewer than 100 cigarettes in his or her lifetime; there were no never smokers in this subgroup. A former smoker had quit smoking at least 1 year before the interview. Pack-years were defined as the number of cigarettes smoked per day divided by 20 and then multiplied by the number of years smoked. Epidemiologic data were collected by personal interview. In total, 289 lung cancer cases and 291 controls were included in the analysis. Table 1 provides a description of cases and controls used in the MDACC study.
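The pack-year definition given above can be written as a one-line calculation. The following is a minimal sketch; the function name and the input check are illustrative choices, not part of the study protocol.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cigarettes per day divided by 20 (one pack), multiplied by years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / 20.0 * years_smoked


# One pack (20 cigarettes) a day for 30 years
print(pack_years(20, 30))
# Half a pack a day for 20 years
print(pack_years(10, 20))
```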

The Liverpool Lung Project replication study (United Kingdom). The lung cancer case-control data were derived from the Liverpool Lung Project (LLP), an ongoing molecular epidemiologic study of lung cancer in Liverpool, United Kingdom (12). Histologically or cytologically confirmed lung cancer cases with primary tumors were recruited from participating chest clinics. Population controls were selected from registers of General Practitioners in Liverpool to ensure age and sex distributions similar to those of the cases. In all studies, a standardized questionnaire was used to determine basic demographic characteristics in addition to details on smoking history, lifetime residence and occupation, history of lung diseases, family history of cancer in first-degree relatives, and exposure to environmental tobacco smoke. Smoking status was defined as in the U.S. study. In total, 248 lung cancer cases and 233 controls were included in this analysis. Table 1 provides a description of cases and controls used in the LLP study.

SNP markers and genotyping. Genomic DNA was extracted from peripheral blood leukocytes by using the Qiagen DNA blood mini kit (Qiagen) according to the manufacturer's instructions. DNA pools were formed by combining equimolar amounts of individual samples as described elsewhere (13). For the pilot study, one pool of 369 cases and one pool of 287 controls were constructed. For assays carried out on sample pools, 25 ng of a 5-ng/μL pool were used for PCRs. All PCR and MassEXTEND reactions were conducted using standard conditions. Relative allele frequency estimates were derived from calculations based on the area under the peak of mass spectrometry measurements from four analyte aliquots (14). Tests of association between disease status and each SNP were carried out as previously discussed (15). When three or more replicate measurements of a SNP were available, the corresponding variance component was estimated from the data. Otherwise, the following historical laboratory averages were used to calculate sources of variability: pool formation, 5.0 × 10⁻⁵; PCR/mass extension, 1.7 × 10⁻⁴; and chip measurement, 1.0 × 10⁻⁴. The same procedure was used for individual genotyping except that 2.5 ng DNA was used and only one mass spectrometry measurement was taken. The following gene-specific primers were used to genotype rs663048: the forward PCR primer was 5′-TGGGCTATGAGCTCCAGGG-3′; the reverse PCR primer was 5′-TGCGGCTTGGAGGCATTGAT-3′; and the extend primer was 5′-GAGCTCCAGGGCGCTAAGAT-3′.
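To illustrate how the variance components listed above could enter a pooled test of association, the following sketch combines binomial sampling variance with measurement variance in a simple Z statistic. The nested error model, the function names, and the example allele frequencies are assumptions made for illustration; the exact test used in the study is the one described in the cited references, which may differ.

```python
import math

# Historical laboratory averages quoted in the text
VAR_POOL = 5.0e-5   # pool formation
VAR_PCR = 1.7e-4    # PCR / mass extension
VAR_CHIP = 1.0e-4   # chip measurement


def pool_measurement_variance(n_pcr: int = 1, n_chip: int = 4) -> float:
    """Measurement variance of one pool's allele frequency estimate.

    Assumed (hypothetical) nested model: chip measurements are averaged
    within each PCR, and PCRs are averaged within the pool.
    """
    return VAR_POOL + VAR_PCR / n_pcr + VAR_CHIP / (n_pcr * n_chip)


def pooled_z(p_case, p_ctrl, n_case, n_ctrl, n_pcr=1, n_chip=4):
    """Z statistic for a case-control allele frequency difference from two pools."""
    # Binomial sampling variance: each person contributes two chromosomes
    sampling = (p_case * (1 - p_case) / (2 * n_case)
                + p_ctrl * (1 - p_ctrl) / (2 * n_ctrl))
    # Each of the two pools carries its own measurement error
    measurement = 2 * pool_measurement_variance(n_pcr, n_chip)
    return (p_case - p_ctrl) / math.sqrt(sampling + measurement)


# Illustrative frequencies only (not pilot-study estimates), with pilot pool sizes
print(round(pooled_z(0.27, 0.21, 369, 287), 3))
```

Under this model the measurement terms are of the same order as the sampling terms, which is why replicate measurements matter for pooled designs.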

The Sequenom-Genefinder pilot study included 83,715 SNPs selected based on their location within a gene region (including the coding region plus an additional 10 kb at both ends) and minor allele frequency (MAF) from a total of 125,799 experimentally validated polymorphic variations (7, 8, 10). In the first step, one PCR and primer extension reaction was carried out for 83,715 SNPs on each pool (case and control). In the second step, 4,293 SNPs (∼5%) with the most statistically significant associations were remeasured in triplicate on each DNA pool. In the third step, the 301 most significant SNPs (∼7%) from step two were individually genotyped in each sample. A total of 160 SNP markers were identified with statistically significant differences between cases and controls (P < 0.05) after individual genotyping in the German pilot study and were then genotyped in the MDACC and LLP replication samples.

Expression of the seizure 6-like gene. Besides analyzing the effect of the Met430Ile variant on lung cancer risk, we also compared the seizure 6-like (SEZ6L) expression level in normal versus cancer cell lines (our data) and in primary tumors versus normal lung tissues [data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database]. Affymetrix HG-U133A and HG-U133B high-density oligonucleotide microarrays were used to evaluate expression of the SEZ6L gene in cell lines. Gene expression profiling was done on a panel of 52 non–small cell lung cancer (NSCLC) and 22 small cell lung cancer (SCLC) cell lines. As a control, we used seven normal human bronchial epithelial cell lines immortalized with cyclin-dependent kinase 4 and hTert or with E6/E7 with or without hTert (16), and two unimmortalized lung cell lines (NHBEC and SAEC). The list of the cell lines used in the study can be found in Supplementary Table S1. Four probes were used for SEZ6L. Signals were median normalized and log 2 transformed. The signal was averaged across the four probes to estimate signal intensity for each cell line as well as for controls. Expression of SEZ6L was detected in all cell lines.
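The summarization steps described above (median normalization, log 2 transformation, and averaging across the probes of one gene) can be sketched as follows. This is an illustration under stated assumptions: normalization is taken to mean dividing each array's signals by that array's median, the `target` scaling constant and the probe identifiers are hypothetical, and the study's actual preprocessing may have differed.

```python
import math
from statistics import mean, median


def median_normalize_log2(signals: dict, target: float = 1.0) -> dict:
    """Scale one array's raw signals so their median equals `target`, then take log2.

    `signals` maps probe ID -> raw intensity for a single array (cell line).
    """
    array_median = median(signals.values())
    return {probe: math.log2(value / array_median * target)
            for probe, value in signals.items()}


def gene_signal(normalized: dict, gene_probes: list) -> float:
    """Average the normalized log2 signal across the probes assigned to one gene."""
    return mean(normalized[probe] for probe in gene_probes)


# Hypothetical three-probe array; the median raw signal is 100
raw = {"probe_a": 50.0, "probe_b": 100.0, "probe_c": 200.0}
normalized = median_normalize_log2(raw)
print(normalized)
print(gene_signal(normalized, ["probe_a", "probe_c"]))
```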

To compare SEZ6L expression in normal tissues and primary lung tumors, we used the GDS619 data set with microarray gene expression data in SCLC (19 samples), non–small cell lung cancer (12 samples), and normal lung tissue (20 samples). Tumor samples were obtained from patients undergoing surgery at the Cancer Institute Hospital (Tokyo, Japan). The control samples were obtained by bronchial brushes from unrelated healthy individuals. Normalized and log-transformed gene expression data were downloaded, and SEZ6L expression in normal and tumor tissues was compared (see the description of the GDS619 data set in the GEO database).

Statistical analysis. The distributions of the demographic variables between cases and controls were compared using the χ² test for categorical variables (sex, ethnicity, and smoking status) and the two-sample Student's t test for continuous variables. A goodness-of-fit χ² test was used to determine whether the polymorphisms were in Hardy-Weinberg equilibrium. To estimate the effect of SNPs on lung cancer risk, adjusted odds ratios (OR) were calculated using multiple logistic regression to control for age, sex, and intensity of smoking (pack-years). To estimate the overall ORs from the replication studies, we used the Mantel-Haenszel test (17). All statistical analyses were done using STATISTICA (StatSoft, Inc.).
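As an illustration of the Hardy-Weinberg goodness-of-fit test named above, the sketch below compares observed genotype counts with those expected from the estimated allele frequency. It is a minimal stand-alone version written for this example (the analyses in the study were run in STATISTICA); the counts are the MDACC control genotypes for rs663048 reported in Table 4.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value) for observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# rs663048 genotype counts in MDACC controls (GG, TG, TT), from Table 4
chi2, p_value = hwe_chi_square(176, 100, 11)
print(round(chi2, 2), round(p_value, 2))
```

A P value above the chosen threshold (0.01 was used to exclude SNPs in the MDACC sample) indicates no significant deviation from equilibrium.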

The Sequenom-Genefinder pilot study. The pilot study identified a candidate region located at 22q12.2. The size of the candidate region was ∼37 kb. The region contains a cluster of 17 of 25 SNPs significantly associated with lung cancer risk (P < 0.05). Most of the SNPs in the region were in linkage disequilibrium (minimum r² ≥ 0.8). Based on moderate or low levels of linkage disequilibrium (r² ≤ 0.6), six significant SNPs were chosen for genotyping in two replication sample sets.

We found that the position and size of the candidate region coincide well with the position and size of the SEZ6L gene. The candidate region occupied the distal part of the 428-kb region frequently deleted in lung cancer cell lines (18). The deletion contains two genes: SEZ6L and MYO18B (18). No significant SNPs were detected in the MYO18B region (Fig. 1).

Figure 1.

Lung cancer susceptibility region of the chromosome 22q12.2. A total of 35 SNPs were genotyped in this region with median spacing ∼1 kb. Dark line, −log10 (P value) averaged across five neighboring SNPs. Blue bar, deletion identified by Nishioka et al. (18); orange bars, positions of the SEZ6L and MYO18B genes. Diamonds, positions of SNPs that were genotyped in the pilot study.


The MDACC replication study. SNP markers with allelic frequencies found to be significantly different between cases and controls in the pilot study were validated in the MDACC replication study. The total number of SNP markers genotyped in the MDACC study was 147. Thirteen SNPs exhibited significant (P < 0.01) deviation from Hardy-Weinberg equilibrium in controls and were excluded from the analysis. The remaining 133 SNPs were analyzed to estimate their association with lung cancer. Analysis of all histologic types together identified six cancer-associated SNP markers. In NSCLC, the same six SNP markers showed significant associations with lung cancer, as did two additional SNPs (Table 2).

Table 2. SNP markers found to be significant by analysis of all histologic types or NSCLCs (MDACC)

| SNP ID | Chromosome | Alleles | Controls: n | Controls: MAF | Controls: SE | Cases: n | Cases: MAF | Cases: SE | χ² | P |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs1037023 | 13 | C/T | 288 | 0.2 | 0.02 | 250 | 0.14 | 0.02 | 8.47 | 0.004 |
| rs1906520 | 3 | G/T | 278 | 0.16 | 0.02 | 241 | 0.12 | 0.01 | 6.37 | 0.012 |
| rs1890697 | 14 | A/C | 286 | 0.08 | 0.01 | 251 | 0.12 | 0.01 | 6.06 | 0.014 |
| rs3803 | 3 | C/T | 273 | 0.23 | 0.02 | 245 | 0.17 | 0.02 | 5.63 | 0.018 |
| rs663048 | 22 | G/T | 287 | 0.21 | 0.02 | 253 | 0.27 | 0.02 | 5.33 | 0.021 |
| rs422679 | 17 | C/T | 281 | 0.39 | 0.02 | 250 | 0.32 | 0.02 | 5.32 | 0.021 |
| rs765707* | 12 | C/T | 277 | 0.06 | 0.01 | 247 | 0.1 | 0.01 | 5.18 | 0.023 |
| rs1057050* | 10 | G/A | 288 | 0.05 | 0.01 | 254 | 0.08 | 0.01 | 4.79 | 0.029 |

Abbreviations: MA, minor allele; MAF, minor allele frequency.

*SNPs were significant only in NSCLCs.

To further prioritize the significant SNPs listed in Table 2, we annotated the SNPs by using available data that potentially can help to identify causal SNPs (Table 3). The SNP rs663048 was the only nonsynonymous (Met430Ile) SNP in the list. Sorting Intolerant from Tolerant (SIFT; ref. 19) and Polymorphism Phenotyping (PolyPhen; ref. 20) software were used to predict the functional effect of the Met430Ile amino acid substitution. Both algorithms predict that the Met430Ile is a functional (protein disturbing) variant; therefore, our further analysis was concentrated on rs663048 polymorphism.

Table 3. Annotations of the SNPs that have been shown to be significantly associated with lung cancer risk

| SNP ID | SNP location | Chromosome | Chromosome position | Gene | Description | Cancer-related* |
| --- | --- | --- | --- | --- | --- | --- |
| rs1037023 | Intron | 13q14 | 24854588 | FLJ10094 | Hypothetical protein | No |
| rs1906520 | 7 kb Downstream CD200 | 3q13 | 113580534 | CD200 | Immunoglobulin | No |
| rs1890697 | 2 kb Upstream SETP2 | 14q22 | 50961888 | SETP2 | Pseudogene | No |
| rs3803 | 5′-UTR | 3q21 | 129682070 | GATA2 | Zinc finger, hematopoiesis | No |
| rs663048 | Missense | 22q12 | 25025077 | SEZ6L | Transmembrane protein | Yes |
| rs422679 | Intron | 17p13 | 8226123 | RPL26 | Ribosomal protein L26 | No |
| rs765707 | 2 kb Downstream OR7A19P | 12q12 | 45288613 | OR7A19P | Olfactory receptor | No |
| rs1057050 | 3′-UTR | 10q24 | 104132284 | GBF1 | Guanine exchange factor | No |

Abbreviation: UTR, untranslated region.

*An exhaustive search of the literature was conducted to estimate the potential relationship of the SNP to cancer.

To estimate the relative risk associated with the significant SNPs, we used a logistic regression model. Table 4 shows the predicted OR adjusted for age, intensity of smoking (pack-years), and gender. There was no significant deviation from Hardy-Weinberg equilibrium in either replication sample, in controls or in cases. Homozygotes for the more frequent allele were used as the reference group. OR values ranged from 1.29 to 1.72, with four of them being statistically significant. The OR for SNP rs663048 was 1.48 (P = 0.03), with rare genotypes being associated with increased risk for NSCLC.

Table 4. Risk estimates for lung cancer associated with rs663048 genotype in MDACC and LLP replication samples

| Sample | Genotype | Case, n (%) | Control, n (%) | OR* (95% CI) | P |
| --- | --- | --- | --- | --- | --- |
| MDACC | GG | 137 (51.7) | 176 (62.4) | 1 (Reference) | |
| MDACC | TG | 102 (38.5) | 100 (34.8) | 1.18 (0.87–2.04) | 0.27 |
| MDACC | TT | 26 (9.8) | 11 (3.8) | 2.84 (1.36–8.54) | 0.008 |
| LLP | GG | 125 (50.4) | 133 (57.1) | 1 (Reference) | |
| LLP | TG | 91 (36.7) | 85 (36.5) | 1.23 (0.78–1.96) | 0.40 |
| LLP | TT | 22 (8.9) | 7 (3.0) | 3.81 (1.40–10.42) | 0.009 |

*ORs were adjusted for age, pack-year, and gender.

The LLP replication study (United Kingdom). The LLP replication sample was used to further validate the association between rs663048 and lung cancer risk. In this study, we also found a significant association between the Met430Ile polymorphism and lung cancer, with cases having a higher frequency of the variant T/T genotype (8.9 ± 1.9%) than the controls [3.0 ± 1.1%; χ² = 7.4; degrees of freedom (df) = 1; P = 0.006]. After adjusting for age, sex, and smoking, we observed a 3.8-fold increased risk of lung cancer in individuals who carried the homozygous variant rs663048 genotype T/T compared with those who carried the more common (wild-type) genotype G/G (95% CI, 1.40–10.42; Table 4).
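The two quantities reported above can be illustrated directly from the LLP genotype counts in Table 4. The sketch below computes a Pearson χ² for a 2 × 2 table and a crude (unadjusted) OR with a Woolf-type 95% CI. It assumes that the denominators are the individuals with a genotype call in Table 4 (238 cases and 225 controls); the crude OR is not expected to equal the adjusted estimate in Table 4, which comes from logistic regression.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio (a*d)/(b*c) with a Woolf (log-based) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# LLP counts from Table 4: cases GG/TG/TT = 125/91/22, controls = 133/85/7
# T/T versus all other genotypes: rows are cases and controls
print(round(chi_square_2x2(22, 125 + 91, 7, 133 + 85), 2))

# T/T versus G/G: case TT, case GG, control TT, control GG
print(tuple(round(x, 2) for x in odds_ratio_ci(22, 125, 7, 133)))
```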

Expression of SEZ6L in normal and lung tumor cell lines. Supplementary Fig. S1 shows the expression level of SEZ6L in 9 normal lung cell lines, 54 NSCLC, and 22 SCLC cell lines. The average expression signal in controls was 6.2 (n = 9) compared with an average expression signal of 7.0 in the NSCLC cell lines (n = 54) and 8.8 in the SCLC cell lines (n = 22). A nonparametric Mann-Whitney U test was significant for all pairwise comparisons. The smallest (but still highly significant) difference was found between the normal lung cell lines and NSCLC cell lines (Z = −3.3; P = 0.001). We also found significant differences in the expression level of GATA2, with expression being higher in cancer lines compared with controls (Mann-Whitney U test, Z = −3.9; P = 0.0005). Other candidate genes listed in Table 3 did not show differential expression between lung cancer and normal lung cell lines. SNP genotypes were not available for this analysis.
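Because the signals are on a log 2 scale, a difference between two average signals corresponds to a fold change of 2 raised to that difference. The short sketch below applies this to the averages reported above; the function name is illustrative, and strictly speaking a difference of mean log 2 signals gives a ratio of geometric means.

```python
def fold_change(log2_reference: float, log2_sample: float) -> float:
    """Fold change implied by a difference between two log2-scale signals."""
    return 2.0 ** (log2_sample - log2_reference)


normal, nsclc, sclc = 6.2, 7.0, 8.8   # average log2 signals reported in the text

print(round(fold_change(normal, nsclc), 2))   # NSCLC versus normal
print(round(fold_change(normal, sclc), 2))    # SCLC versus normal
```

These values correspond to the "almost two times" and "more than six times" higher expression stated in the abstract.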

Expression of SEZ6L in primary tumor and normal lung tissues. We used the NCBI GEO database, which contains microarray data on genome-wide assessment of gene expression. Using the key words “lung AND cancer OR tumor,” we identified several entries with data on gene expression. The GDS619 data set (platform GPL962: CHUGAI41K) was most appropriate for our goal of comparing SEZ6L expression in normal and tumor tissues. This data set contains data on gene expression in SCLC, adenocarcinomas, and normal tissues. We found that the average log 2-transformed SEZ6L expression value was significantly higher in adenocarcinoma compared with normal lung tissues (0.34 ± 0.13 versus −0.38 ± 0.15; two-sided t test = 3.1; df = 29; P = 0.004). The expression of SEZ6L was also higher in SCLC compared with normal tissues, but the difference was not significant (Student's t test = 1.4; df = 43; P = 0.15). The variance of expression values in the SCLC samples was significantly higher compared with the normal tissues (Var = 0.07 among controls versus Var = 0.42 among SCLCs; F = 6.2; P = 0.001). This result, together with our analysis of SEZ6L expression in cell lines, suggests an increased level of SEZ6L expression in lung tumors compared with normal lung tissues.

There are two major approaches used to identify cancer-related genes: (a) the candidate gene approach and (b) the genome-wide scan approach. The candidate gene approach is based on prior data indicating that a gene(s) is involved in cancer. For example, it has been shown that increased lung cancer risk correlates with decreased nucleotide excision repair (NER) capacity (21–23). Targeting genes involved in NER has identified several polymorphisms in NER genes associated with elevated lung cancer risk (24–26). The advantage of the candidate gene approach is that the number of candidate genes is limited; therefore, the number of false positives among significant associations is also expected to be low. A disadvantage of the approach is that genes without prior data on associations will not be included in the analysis and, therefore, cannot be identified by the candidate gene approach. In the genome-wide scan approach, potentially all genes in the genome are targeted, allowing the identification of novel cancer-related genes. A disadvantage of the genome-wide scan is the large number of independent tests, so that a large sample size is required to distinguish between true- and false-positive associations.

In our study, we have combined elements of both approaches. At the first step, more than 83,000 SNP markers covering the coding region of the whole genome were analyzed. This analysis yielded many candidate SNPs showing significant associations with lung cancer. Significant SNPs identified in the pilot study are a mix of false positives and true associations. For selected SNPs in SEZ6L candidate region, two independent replication studies were then conducted to identify true associations. The MDACC replication study yielded eight SNPs showing associations with lung cancer risk. We then used additional SNP-related information to identify most promising genes. As a result of this analysis, the SEZ6L gene emerged as a top candidate gene to be associated with lung cancer. It is interesting that based only on the significance of the χ2 test, this gene was number five in the list. The LLP replication study further validated the association of SNP rs663048 in SEZ6L with lung cancer risk.

An analysis of the expression of the SEZ6L gene showed different expression of SEZ6L in normal, NSCLC, and SCLC cell lines. Interestingly, we found that the two histologic types of lung cancer had different levels of expression of SEZ6L. The average expression signal in NSCLC was 7.0 ± 0.1 and in SCLC 8.8 ± 0.2 (Mann-Whitney U test, Z = 6.0; P < 0.001). It should be noted that there is an apparent inconsistency between the results of the analysis of the expression of SEZ6L and the results of the association studies. In the MDACC and United Kingdom samples, the variant allele (that was predicted to be protein disturbing based on analysis of the protein structure and evolutionary conservation) was associated with increased risk for lung cancer, suggesting that loss of normal SEZ6L function may be a risk factor for lung cancer. This is consistent with the finding that the SEZ6L region is often deleted in lung cancer cell lines (18). On the other hand, we found that expression of SEZ6L is elevated in lung cancer cell lines. One explanation might be that SEZ6L is both a tumor marker and a gene in which variation affects lung cancer susceptibility. Under this explanation, loss of normal SEZ6L function is a risk factor for development of lung cancer; however, when lung cancer is caused by factors other than loss of SEZ6L function, expression of SEZ6L is adaptively up-regulated to suppress tumorigenesis.

Several lines of evidence support the hypothesis that SEZ6L might modulate lung cancer risk. First, frequent allelic losses on 22q in NSCLCs have been reported, indicating the presence of tumor suppressor gene(s) on that chromosome arm (18). Cloning of the breakpoints revealed a 400-kb deletion containing the SEZ6L and MYO18B genes (18, 27). A study conducted by Suzuki et al. (28) suggests that the SEZ6L gene may also influence development and progression of colorectal cancer. The authors found that SEZ6L was one of the few genes highly hypermethylated in primary colorectal tumors.

In the pilot study, we populated the candidate region with 35 SNPs and found that markers located in the SEZ6L gene region show a strong association with lung cancer; however, no significant associations were found in the neighboring MYO18B gene. Applying a sliding window of five neighboring SNPs revealed a peak of −log10 (P values) that coincides with the position of the SEZ6L gene. We found that the principal contributor to the peak was rs663048. The association of this SNP with lung cancer risk was verified in two independent replication studies. The rs663048 SNP is a Met430Ile amino acid substitution that has been predicted to be functional by both SIFT and PolyPhen, suggesting that this amino acid substitution is protein disturbing.
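The sliding-window smoothing described above can be sketched as a moving average of −log10(P) over five neighboring SNPs ordered by chromosomal position. The handling of the region edges is an assumption here (only complete windows are returned), and the P values in the example are made up for illustration.

```python
import math


def sliding_mean_neg_log10(p_values, window: int = 5):
    """Mean of -log10(P) over each run of `window` neighboring SNPs.

    `p_values` must be ordered by chromosomal position. Only complete
    windows are returned, so the output has len(p_values) - window + 1 items.
    """
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    scores = [-math.log10(p) for p in p_values]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Hypothetical P values for six neighboring SNPs
example = [0.1, 0.01, 0.001, 0.01, 0.1, 0.1]
print([round(v, 2) for v in sliding_mean_neg_log10(example)])
```

Averaging across neighbors dampens isolated single-SNP signals, so a peak in the smoothed curve reflects a cluster of associated markers.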

Nishioka et al. (18) found that 95% (43 of 45) of primary tumor samples carry the Met430Ile mutation. The authors did not estimate the frequency of the variant in controls. We found that 38% of controls and 48% of cases carry at least one variant allele. This suggests that ∼40% of tumors may carry a somatic Met430Ile mutation. If we consider that according to the HapMap the frequency of the Met430Ile polymorphism is lower in Japanese than in Caucasians, the percentage of accumulated somatic Met430Ile may actually be higher.

We found that homozygotes for the variant allele had a 3-fold higher lung cancer risk compared with wild-type homozygotes. Lung cancer risk was also significantly elevated in heterozygotes. According to our estimates, the frequency of the variant allele for rs663048 is 22%, which is very similar to the 20% reported for Caucasians by the HapMap database. We found that 36% of Caucasian controls are heterozygotes and ∼4% are homozygotes for the risk allele, making the proportion of Caucasians having at least one risk allele ∼40%. The combined Mantel-Haenszel analysis yielded ORs of 1.15 [95% confidence interval (95% CI), 1.04–1.59] for heterozygotes and 3.32 (95% CI, 1.81–7.21) for homozygotes. The population attributable risk percentage {PAR% = [(OR − 1) × P] / [(OR − 1) × P + 1] × 100, where P is the risk genotype frequency in the controls} was 7.5 for homozygotes and 8.3 for heterozygotes, suggesting that ∼16% of excess risk in lung cancer cases is due to the presence of the variant allele.
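The two calculations in this paragraph can be sketched as follows: a crude Mantel-Haenszel pooled OR across the two replication samples and the PAR% formula given above. The stratum counts are the T/T and G/G genotype counts from Table 4. The pooled value is unadjusted and is therefore not expected to match the reported combined OR exactly, and the genotype frequency passed to `par_percent` (0.035) is an assumed illustrative input consistent with the ∼4% homozygote frequency, not a value stated in the text.

```python
def mantel_haenszel_or(strata) -> float:
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (exposed cases, exposed controls,
                     unexposed cases, unexposed controls).
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def par_percent(odds_ratio: float, genotype_freq: float) -> float:
    """PAR% = (OR - 1) * P / ((OR - 1) * P + 1) * 100."""
    excess = (odds_ratio - 1.0) * genotype_freq
    return excess / (excess + 1.0) * 100.0


# T/T (exposed) versus G/G (unexposed), counts from Table 4
strata = [
    (26, 11, 137, 176),   # MDACC
    (22, 7, 125, 133),    # LLP
]
print(round(mantel_haenszel_or(strata), 2))

# Reported combined OR for homozygotes with an assumed genotype frequency
print(round(par_percent(3.32, 0.035), 1))
```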

In conclusion, our data together with published studies suggest that the Met430Ile variant might be a causal variant affecting risk of lung cancer. Although the strongest evidence from our study points to this SNP, it is possible that another closely located SNP plays a dominant role in promoting lung cancer risk and that the Met430Ile variant is a marker in linkage disequilibrium with the underlying causal variant. Further studies, especially those implementing functional assays, are warranted to provide more conclusive evidence of a causal association between Met430Ile and lung cancer risk.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/) and at Cancer Genetics Web database: http://www.cancerindex.org/geneweb/X1501.htm.

I.P. Gorlov, P. Meyer, R. Dierkesmann, J.K. Field, and C.I. Amos contributed equally to this work.

Grant support: National Cancer Institute grants R01 CA55769 and CA 70907, Specialized Programs of Research Excellence grant P50CA70907, and Flight Attendant Medical Research Institute.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. American Cancer Society I. Cancer facts and figures 2005. Atlanta (GA): American Cancer Society; 2005.
2. Amos CI, Xu W, Spitz MR. Is there a genetic basis for lung cancer susceptibility? Recent Results Cancer Res 1999;151:3–12.
3. Sellers TA, Chen PL, Potter JD, Bailey-Wilson JE, Rothschild H, Elston RC. Segregation analysis of smoking-associated malignancies: evidence for Mendelian inheritance. Am J Med Genet 1994;52:308–14.
4. Sellers TA, Bailey-Wilson JE, Elston RC, et al. Evidence for Mendelian inheritance in the pathogenesis of lung cancer. J Natl Cancer Inst 1990;82:1272–9.
5. Yang P, Schwartz AG, McAllister AE, Aston CE, Swanson GM. Genetic analysis of families with nonsmoking lung cancer probands. Genet Epidemiol 1997;14:181–97.
6. Xu H, Spitz MR, Amos CI, Shete S. Complex segregation analysis reveals a multigene model for lung cancer. Hum Genet 2005;116:121–7.
7. Nelson MR, Marnellos G, Kammerer S, et al. Large-scale validation of single nucleotide polymorphisms in gene regions. Genome Res 2004;14:1664–8.
8. Kammerer S, Roth RB, Reneland R, et al. Large-scale association study identifies ICAM gene region as breast and prostate cancer susceptibility locus. Cancer Res 2004;64:8906–10.
9. Kammerer S, Burns-Hamuro LL, Ma Y, et al. Amino acid variant in the kinase binding domain of dual-specific A kinase-anchoring protein 2: a disease susceptibility polymorphism. Proc Natl Acad Sci U S A 2003;100:4066–71.
10. Spinola M, Meyer P, Kammerer S, et al. Association of the PDCD5 locus with lung cancer risk and prognosis in smokers. J Clin Oncol 2006;24:1672–8.
11. Wei Q, Cheng L, Amos CI, et al. Repair of tobacco carcinogen-induced DNA adducts and lung cancer risk: a molecular epidemiologic study. J Natl Cancer Inst 2000;92:1764–72.
12. Field JK, Smith DL, Duffy S, Cassidy A. The Liverpool Lung Project research protocol. Int J Oncol 2005;27:1633–45.
13. Buetow KH, Edmonson M, MacDonald R, et al. High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc Natl Acad Sci U S A 2001;98:581–4.
14. Bansal A, van den Boom D, Kammerer S, et al. Association testing by DNA pooling: an effective initial screen. Proc Natl Acad Sci U S A 2002;99:16871–4.
15. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, Clayton DG. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann Hum Genet 2002;66:393–405.
16. Ramirez RD, Sheridan S, Girard L, et al. Immortalization of human bronchial epithelial cells in the absence of viral oncoproteins. Cancer Res 2004;64:9027–34.
17. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959;22:719–48.
18. Nishioka M, Kohno T, Takahashi M, et al. Identification of a 428-kb homozygously deleted region disrupting the SEZ6L gene at 22q12.1 in a lung cancer cell line. Oncogene 2000;19:6251–60.
19. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31:3812–4.
20. Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30:3894–900.
21. Friedberg EC, Bond JP, Burns DK, et al. Defective nucleotide excision repair in xpc mutant mice and its association with cancer predisposition. Mutat Res 2000;459:99–108.
22. Ide F, Iida N, Nakatsuru Y, Oda H, Tanaka K, Ishikawa T. Mice deficient in the nucleotide excision repair gene XPA have elevated sensitivity to benzo[a]pyrene induction of lung tumors. Carcinogenesis 2000;21:1263–5.
23. Cheng L, Spitz MR, Hong WK, Wei Q. Reduced expression levels of nucleotide excision repair genes in lung cancer: a case-control analysis. Carcinogenesis 2000;21:1527–30.
24. Zienolddiny S, Campa D, Lind H, et al. Polymorphisms of DNA repair genes and risk of non-small cell lung cancer. Carcinogenesis 2006;27:560–7.
25. Wu X, Zhao H, Wei Q, et al. XPA polymorphism associated with reduced lung cancer risk and a modulating effect on nucleotide excision repair capacity. Carcinogenesis 2003;24:505–9.
26. Shen M, Berndt SI, Rothman N, et al. Polymorphisms in the DNA nucleotide excision repair genes and lung cancer risk in Xuan Wei, China. Int J Cancer 2005;116:768–73.
27. Nishioka M, Kohno T, Tani M, et al. MYO18B, a candidate tumor suppressor gene at chromosome 22q12.1, deleted, mutated, and methylated in human lung cancer. Proc Natl Acad Sci U S A 2002;99:12269–74.
28. Suzuki H, Gabrielson E, Chen W, et al. A genomic screen for genes upregulated by demethylation and histone deacetylase inhibition in human colorectal cancer. Nat Genet 2002;31:141–9.
