Abstract
Lung cancer in lifetime never smokers is distinct from that in smokers, but the role of separate or overlapping carcinogenic pathways has not been explored. We therefore evaluated a comprehensive panel of 11,737 single-nucleotide polymorphisms (SNP) in inflammatory-pathway genes in a discovery phase (451 lung cancer cases, 508 controls from Texas). SNPs that were significant were evaluated in a second external population (303 cases, 311 controls from the Mayo Clinic). An intronic SNP in the ACVR1B gene, rs12809597, was replicated with significance and restricted to those reporting adult exposure to environmental tobacco smoke. Another promising candidate was an SNP in NR4A1, although the replication OR did not achieve statistical significance. ACVR1B belongs to the TGFR-β superfamily, contributing to resolution of inflammation and initiation of airway remodeling. An inflammatory microenvironment (second-hand smoking, asthma, or hay fever) is necessary for risk from these gene variants to be expressed. These findings require further replication, followed by targeted resequencing, and functional validation.
Significance: Beyond passive smoking and family history of lung cancer, little is known about the etiology of lung cancer in lifetime never smokers, which accounts for about 15% of all lung cancers in the United States. Our two-stage candidate pathway approach examined a targeted panel of inflammation genes and identified novel structural variants that appear to contribute to risk in patients who report prior exposure to sidestream smoke. Cancer Discovery; 1(5): 420–9. ©2011 AACR.
This article is highlighted in the In This Issue feature, p. 367
Introduction
From etiologic, molecular genetic, and biologic viewpoints, it is now fairly well accepted that lung cancer occurring in lifetime never smokers is distinct from smoking-associated lung cancer (1). It is noteworthy that the top hit from all published ever-smoking lung cancer genome-wide association studies (GWAS), the chromosome 15q25 locus encoding nicotinic acetylcholine receptor (NAChR) subunits and a proteasome subunit, has not been implicated in lung cancer risk in never smokers (2). Nevertheless, it is likely that the 2 disease entities do share some molecular features suggesting separate but overlapping pathways to lung carcinogenesis (1). Increasing evidence suggests that pathway-based approaches to identify the genetic contribution to cancer susceptibility may provide complementary information to conventional single-marker analyses.
Of intense interest in lung carcinogenesis is the inflammation pathway, because an abnormally prolonged or intense inflammatory response could create a microenvironment that promotes lung cancer development. Although tobacco-induced lung cancer is characterized by increased tissue oxidative stress and an abundant and deregulated inflammatory microenvironment (3), a similar role for inflammation in lung cancer in never smokers has not been studied in depth. We therefore evaluated a comprehensive panel of germline genetic variants in inflammatory pathway genes in risk of lung cancer in lifetime never smokers in a discovery phase of cases and controls selected from an ongoing multiracial/ethnic lung cancer case–control study that has recruited study participants from The University of Texas MD Anderson Cancer Center from 1995 onward (4). We performed a replication analysis in an independent sample of never-smoking lung cancer cases and controls from the Mayo Clinic (5). Lung cancer in never smokers accounts for 15% of all lung cancers in the United States, yet beyond passive smoking and family history of lung cancer, few other well-established genetic or nongenetic clues to its etiology are known.
Results
For the discovery phase, we recruited 451 non–small cell lung cancer cases and 508 controls, all lifetime never smokers (Table 1). About two thirds of these subjects (650 cases and controls) were included in our previously published risk model for never smokers (6). Adenocarcinoma was diagnosed in 76% of the cases. On average, the controls were 5 years younger than the cases. More than 60% of both the cases and controls were women. Environmental tobacco smoke (ETS) exposure was self-reported by 83% of the discovery cases and 75% of the discovery controls. Neither asthma nor dust exposure was significantly associated with lung cancer risk. However, a history of hay fever (OR = 0.70; P = 0.02), passive smoking exposure (OR = 1.59; P = 0.01), a family history of 2 or more first-degree relatives with any cancer (OR = 2.24; P < 0.001; data not shown), and a family history of 2 or more first-degree relatives with lung cancer (OR = 3.47; P = 0.03) were all statistically significant.
Table 1. Distribution of selected variables in discovery and replication populations
Variable | Discovery cases (n = 451) | Discovery controls (n = 508) | Discovery OR (95% CI)ᵃ | Discovery P value | Replication cases (n = 303) | Replication controls (n = 311) | Replication OR (95% CI) | Replication P value |
---|---|---|---|---|---|---|---|---|
Age | | | | | | | | |
Mean (SD) | 61.6 (13.0) | 56.6 (13.1) | 1.03 (1.02–1.04) | <0.0001 | 62.0 (12.9) | 62.2 (13.1) | 1.03 (1.02–1.04) | <0.0001 |
Sex | | | | | | | | |
Male, n (%) | 147 (32.6) | 190 (37.4) | | | 83 (27.4) | 86 (27.7) | | |
Female, n (%) | 304 (67.4) | 318 (62.6) | 0.81 (0.62–1.06) | 0.1198 | 220 (72.6) | 225 (72.4) | 0.99 (0.7–1.4) | 0.9425 |
Asthma | | | | | | | | |
No | 369 (87.0) | 442 (87.3) | | | 263 (86.8) | 278 (89.4) | | |
Yes | 55 (13.0) | 64 (12.4) | 1.03 (0.70–1.51) | 0.8830 | 40 (13.2) | 33 (10.6) | 1.28 (0.8–2.1) | 0.3223 |
Hay fever | | | | | | | | |
No | 332 (78.3) | 363 (71.7) | | | N/A | | | |
Yes | 92 (21.7) | 143 (28.3) | 0.70 (0.52–0.95) | 0.022 | | | | |
Dust | | | | | | | | |
No | 345 (81.4) | 429 (84.8) | | | N/A | | | |
Yes | 79 (18.6) | 77 (15.2) | 1.28 (0.90–1.80) | 0.1657 | | | | |
ETS | | | | | | | | |
No | 57 (17.0) | 115 (24.6) | | | 99 (33.3) | 135 (43.8) | | |
Yes | 278 (83.0) | 353 (75.4) | 1.59 (1.12–2.26) | 0.0104 | 198 (66.7) | 173 (56.2) | 1.56 (1.12–2.17) | 0.0082 |
Family history of lung cancer | | | | | | | | |
0 | 348 (84.5) | 439 (87.1) | | | 213 (70.3) | 258 (83.0) | | |
1 | 53 (12.9) | 61 (12.1) | 1.10 (0.74–1.63) | 0.6482 | 90 (29.7) | 53 (17.0) | 2.06 (1.40–3.02) | 0.0002 |
2 | 11 (2.7) | 4 (0.8) | 3.47 (1.09–10.98) | 0.0346 | | | | |
The replication set (Table 1) included 303 cases and 311 controls, all lifetime never smokers and well matched on age and gender. ETS exposure was reported by 67% of the cases and 56% of the controls. Passive smoking (OR = 1.56; P = 0.008) and family history of lung cancer (OR = 2.06; P = 0.0002) were significantly associated with risk. Asthma was not a risk factor in this population. Ten cases, but no controls, reported a history of emphysema (OR = 10.6; P = 0.02).
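As an illustration of how the ETS odds ratios in Table 1 relate to the underlying counts, the following minimal sketch computes a crude odds ratio with a Woolf (log-based) 95% confidence interval from the four cell counts. Whether the published estimates are crude or adjusted is not stated in this section, so this is only an approximate check; the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the square root of the summed reciprocals.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ETS counts copied from Table 1 (Yes/No rows, cases and controls).
ets_counts = {
    "discovery": (278, 57, 353, 115),
    "replication": (198, 99, 173, 135),
}

for label, counts in ets_counts.items():
    or_, lo, hi = odds_ratio_ci(*counts)
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude estimates land close to the tabulated ORs of 1.59 and 1.56, which suggests that covariate adjustment, if any, had little effect on the ETS association.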
In total, 11,737 single-nucleotide polymorphisms (SNP) were available for analysis in the discovery phase. Table 2 summarizes the subpathways, genes, and SNPs included in the customized Illumina inflammation chip, as outlined by Loza and colleagues (7). In univariate analysis, assuming an additive model, 21 SNPs were statistically significant with P values ≤0.001 and Bayesian false discovery probability (BFDP) levels ≤0.8 (Table 3).
Table 2. Summary of inflammation subpathways, genes, and SNPs on Illumina chip
Pathwayᵃ | Genes (n) | SNPs (n) |
---|---|---|
Adhesion-extravasation-migration | 12 | 108 |
Apoptosis signaling | 67 | 834 |
Complement cascade | 3 | 8 |
Cytokine signaling | 266 | 3,139 |
Glucocorticoid/PPAR signaling | 24 | 258 |
Innate pathogen detection | 53 | 542 |
Leukocyte signaling | 132 | 2,023 |
MAPK signaling | 156 | 2,854 |
Natural killer cell signaling | 31 | 296 |
Phagocytosis-Ag presentation | 41 | 488 |
PI3K/AKT signaling | 45 | 580 |
ROS/glutathione/cytotoxic granules | 25 | 231 |
TNF superfamily signaling | 49 | 569 |
Total | 904 | 11,930 |
Abbreviations: Ag, antigen; AKT, protein kinase B; MAPK, mitogen-activated protein kinase; PPAR, peroxisome proliferator-activated receptor; ROS, reactive oxygen species; SNP, single-nucleotide polymorphism.
ᵃ See ref. 7.
Table 3. Significant SNPs in discovery set (additive model)
CHR | SNP | BP | Minor allele | OR | Lower 95% CI | Upper 95% CI | P value | Location | Gene |
---|---|---|---|---|---|---|---|---|---|
1 | rs10127728 | 171417779 | A | 1.679 | 1.266 | 2.226 | 0.0003 | Flanking_3UTR | TNFSF4 |
1 | rs549471 | 158972243 | A | 0.6511 | 0.5058 | 0.8381 | 0.0009 | Flanking_5UTR | SLAMF7 |
1 | rs12131065 | 67541594 | A | 1.416 | 1.149 | 1.746 | 0.0011 | Flanking_5UTR | IL12RB2 |
1 | rs2300095 | 11188304 | A | 1.377 | 1.131 | 1.676 | 0.0015 | Intron | MTOR |
2 | rs17488897 | 97731596 | G | 0.7201 | 0.5947 | 0.8718 | 0.0008 | Flanking_3UTR | ZAP70 |
2 | rs1464572 | 45807414 | A | 0.7392 | 0.6157 | 0.8874 | 0.0012 | Intron | PRKCE |
2 | rs13432276 | 46165464 | C | 1.358 | 1.125 | 1.639 | 0.0015 | Intron | PRKCE |
5 | rs4585495 | 159717272 | A | 2.147 | 1.391 | 3.316 | 0.0006 | Intron | C1QTNF2 |
5 | rs745749 | 179648409 | G | 1.359 | 1.131 | 1.634 | 0.0011 | Intron | MAPK9 |
5 | rs17651965 | 149435808 | C | 0.6051 | 0.4454 | 0.8221 | 0.0013 | Intron | CSF1R |
6 | rs350294 | 158440305 | G | 1.399 | 1.152 | 1.699 | 0.0007 | Flanking_3UTR | SYNJ2 |
10 | rs1887327 | 6648739 | G | 0.6144 | 0.4691 | 0.8047 | 0.0004 | Flanking_5UTR | PRKCQ |
11 | rs11819995 | 127894601 | A | 1.479 | 1.186 | 1.843 | 0.0005 | Intron | ETS1 |
12 | rs12809597ᵃ | 50642590 | C | 0.7192 | 0.5892 | 0.8778 | 0.0012 | Intron | ACVR1B |
12 | rs2701129ᵇ | 50715744 | C | 0.6261 | 0.4751 | 0.825 | 0.0009 | Flanking_5UTR | NR4A1 |
13 | rs9518587 | 101309356 | G | 1.504 | 1.191 | 1.9 | 0.0006 | Flanking_3UTR | FGF14 |
14 | rs11621263 | 24164990 | A | 1.581 | 1.218 | 2.054 | 0.0006 | Flanking_3UTR | GZMB |
14 | rs11629129 | 24164696 | A | 1.743 | 1.253 | 2.426 | 0.001 | Flanking_3UTR | GZMB |
14 | rs11158813 | 24154521 | A | 1.721 | 1.236 | 2.397 | 0.0013 | Flanking_5UTR | GZMH |
17 | rs11653414 | 5355762 | C | 0.5093 | 0.3409 | 0.7609 | 0.001 | Intron | NLRP1 |
21 | rs962859 | 33569993 | C | 0.7385 | 0.6148 | 0.8871 | 0.0012 | Intron | IL10RB |
Abbreviations: BP, base pair; CHR, chromosome.
ᵃ rs12809597: replicated in the Mayo data set (OR = 0.80; 95% CI, 0.62–1.02; P = 0.069).
ᵇ rs2701129: Mayo data set (OR = 0.85; 95% CI, 0.60–1.21; P = 0.36).
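The Bayesian false discovery probability used as a screening criterion can be approximated from an OR and its confidence interval alone. The sketch below follows Wakefield's approximate Bayes factor formulation; the prior probability of no association (`prior_null`) and the prior upper bound on the OR (`or_upper_prior`) are illustrative assumptions, because the priors used in this study are not given in this section.

```python
import math


def bfdp_from_or(odds_ratio, lower95, upper95,
                 prior_null=0.99, or_upper_prior=1.5):
    """Approximate Bayesian false discovery probability for one SNP.

    prior_null and or_upper_prior are assumed values for illustration,
    not the priors used in the study.
    """
    beta = math.log(odds_ratio)
    # Recover the standard error of log(OR) from the 95% CI width.
    se = (math.log(upper95) - math.log(lower95)) / (2 * 1.96)
    v = se ** 2
    # Prior variance chosen so that 95% of prior ORs fall within
    # 1/or_upper_prior .. or_upper_prior.
    w = (math.log(or_upper_prior) / 1.96) ** 2
    z = beta / se
    shrinkage = w / (v + w)
    # Approximate Bayes factor for the null versus the alternative.
    abf = math.exp(-0.5 * z * z * shrinkage) / math.sqrt(1 - shrinkage)
    prior_odds_null = prior_null / (1 - prior_null)
    return abf * prior_odds_null / (abf * prior_odds_null + 1)


# rs12809597 (ACVR1B), values copied from Table 3.
value = bfdp_from_or(0.7192, 0.5892, 0.8778)
print(f"BFDP under the assumed priors: {value:.3f}")
```

The result depends strongly on the assumed prior: a more skeptical `prior_null` pushes the BFDP toward 1, which is why the threshold and the prior need to be reported together.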
In the replication analysis of these 21 SNPs from the discovery phase, only one, rs12809597 in the ACVR1B gene, was concordant for direction with the discovery phase [discovery OR = 0.72 (0.59, 0.88); P = 0.0012] (Table 3) but was of borderline overall significance [replication OR = 0.80 (0.62, 1.02); P = 0.069]. For women specifically, the OR in the replication population was 0.67 (P = 0.0097); the association was not statistically significant in men, although the numbers were small. In the combined data sets, the overall OR for rs12809597 was 0.72 (P = 0.0002). For women only, the combined OR was 0.72 (P = 0.0013); for men, the combined OR was 0.74 (P = 0.05). A second SNP in this region, rs2701129 in the 5′ UTR of NR4A1, was strongly significant in our data (OR = 0.63; P = 0.0009) but did not achieve statistical significance in the Mayo Clinic data, although the OR was in the same protective direction (OR = 0.85; P = 0.36).
We also conducted stratified analyses by selected variables, including ETS exposure, family history of lung cancer, hay fever, and asthma (data not shown). Notably, the significant association between lung cancer risk and rs12809597 was evident only in those who reported ETS exposure (OR = 0.67; P = 7.8 × 10⁻⁵), compared with an OR of 0.78 (P = 0.39) in those who denied ETS exposure. In the discovery data, among subjects with ETS exposure, this ACVR1B SNP was significantly protective in men [OR = 0.47 (0.30–0.73); P = 0.0010] and of borderline significance in women [OR = 0.74 (0.54–1.01); P = 0.0543]. In the replication data, this pattern was evident only in women with ETS exposure [OR = 0.60 (0.41–0.88); P = 0.009]. It is noteworthy that only 83 male cases were present in the replication set, and power is therefore limited for these subset analyses. Likewise, rs2701129 in NR4A1 was statistically significant only in ETS-exposed subjects in the discovery set [OR = 0.61 (0.43–0.87); P = 0.0068]. We also noted a stronger protective effect for NR4A1 in those with asthma (the risk group for lung cancer in never smokers; OR = 0.31; P = 0.0081) than in those who denied having asthma (OR = 0.69; P = 0.0165). Although we did not note a similar pattern for ACVR1B in the discovery data, we saw a comparable pattern for the ACVR1B SNP in the Mayo Clinic data for those with and without asthma (OR = 0.39; P = 0.02 vs. OR = 0.86; P = 0.27, respectively).
Also of interest is that in the discovery set, in those who denied having had hay fever (i.e., the risk group), the ORs were significantly protective for both ACVR1B (OR = 0.70; P = 0.0026) and NR4A1 (OR = 0.54; P = 0.0003). We did not have comparable data for analysis in the replication set. We previously reported (8) that, paradoxically, those with both conditions (asthma and hay fever) had a significantly elevated lung cancer risk (OR = 2.43; 95% confidence interval = 1.11–5.35). It is in this subgroup (asthma and hay fever) that we detected the greatest protective effect with NR4A1 (OR = 0.28; P = 0.04).
We hypothesized that polymorphisms in genes directly associated with ACVR1B might contribute to the risk noted for the ACVR1B SNP. Therefore, we used an in silico approach, Pathway Studio (9), to identify upstream regulators and downstream targets of ACVR1B. Direct interactions between genes (i.e., direct regulation of gene expression, protein/protein binding, or binding to the promoter region) were used to construct the network. Based on these criteria, we identified 25 upstream regulators and 39 downstream targets of the ACVR1B gene. In this study, we had genotype data for 11 upstream regulators and 16 downstream targets. Of these, none was nominally significant at P < 0.05 in additive models.
Imputation was performed to increase coverage of SNPs in the region surrounding rs12809597 in the ACVR1B gene and to test their association with lung cancer risk (Fig. 1). Before imputation, 30 genotyped SNPs were available, 23 of which lay between 50.58 Mb and 50.74 Mb. After imputation, 156 SNPs between 50.58 Mb and 50.74 Mb had r² > 0.8 and a minor allele frequency (MAF) > 0.01 and were considered reliably imputed. Best-guess genotypes were used in the analysis. The most likely candidate SNP in this region, rs1882119 (P = 1.76 × 10⁻⁴), an imputed SNP (r² = 0.9849), is in an intron of NR4A1, not ACVR1B. In contrast, rs2701129 (P = 1.96 × 10⁻⁴) was directly genotyped. Because the r² between rs12809597 (ACVR1B) and rs2701129 (NR4A1) was only 0.013, we further investigated relevant SNPs in NR4A1.
Figure 1. Association of imputed and genotyped SNPs in the chromosome 12 region around ACVR1B and NR4A1 with lung cancer risk. Chromosomal position is on the x-axis, and the negative base-10 logarithm of the P values from logistic regression analysis is on the y-axis. Genotyped SNPs are plotted as solid diamonds and imputed SNPs as open circles. The two most significant SNPs in the region, rs1882119 and chr12:50735912, are plotted in red. The overall structure of the linkage disequilibrium (LD) with SNPs in this region is reflected by estimated recombination rates from the HapMap genetic map in build 36 coordinates. The strength of the pairwise correlation between the surrounding markers and the most significant SNP (rs1882119) is reflected by the size of the symbols: the larger the symbol, the stronger the LD. LD was calculated from actual genotyped or imputed data by using PLINK. Genes in the region are annotated with location, range, and orientation by using gene annotations from the UCSC genome browser (downloaded from the Broad Institute website). The original downloaded files were in build 35 positions and were converted to build 36 positions (46).
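The pairwise r² that motivates treating the ACVR1B and NR4A1 signals as separate can be estimated from unphased genotypes as the squared Pearson correlation of minor-allele counts. The sketch below shows that calculation on hypothetical genotype vectors; it is one common estimator and is not claimed to be the exact procedure applied to the study data.

```python
def ld_r2(counts_a, counts_b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2
    minor-allele counts for the same ordered set of subjects."""
    n = len(counts_a)
    if n == 0 or n != len(counts_b):
        raise ValueError("genotype vectors must be non-empty and equal length")
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(counts_a, counts_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in counts_a) / n
    var_b = sum((b - mean_b) ** 2 for b in counts_b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for eight subjects (not study data).
snp_1 = [0, 1, 2, 0, 1, 0, 2, 1]
snp_2 = [0, 0, 1, 2, 1, 0, 0, 1]
print(f"r2 = {ld_r2(snp_1, snp_2):.3f}")
```

An r² near 0, as reported for rs12809597 and rs2701129, means that one SNP carries almost no information about the other, so their associations with risk cannot be explained by simple tagging of a single shared signal.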
In parallel with the ACVR1B analysis described earlier, we identified 170 upstream/downstream genes related to NR4A1, of which 65 genes (568 SNPs) had been included in our inflammation panel. Of these, 17 SNPs had P values ≤ 0.05 in univariate analysis, assuming an additive model (Table 4). Five of these SNPs (in NR4A2, NR4A1, TP53, BCL2, and MAP2K2) remained statistically significant (P < 0.05) in logistic regression models using forward or stepwise selection procedures and controlling for age, sex, second-hand smoking exposure, and family history of lung cancer (Table 5).
Table 4. SNPs from NR4A1 targets and upstream and downstream regulators (additive model)
CHR | SNP | BP | OR (95% CI)ᵃ | P value | MAF | Location | Gene |
---|---|---|---|---|---|---|---|
1 | rs566421 | 26782253 | 1.23 (1.02–1.48) | 0.029 | 0.38 | Flanking_3UTR | RPS6KA1 |
1 | rs10159180 | 154746948 | 0.83 (0.69–1.00) | 0.050 | 0.48 | Flanking_5UTR | MEF2D |
2 | rs13428968 | 156900859 | 1.30 (1.02–1.66) | 0.034 | 0.16 | Flanking_5UTR | NR4A2 |
4 | rs7656411 | 154847105 | 1.23 (1.00–1.50) | 0.048 | 0.26 | Flanking_3UTR | TLR2 |
5 | rs10482642 | 142708224 | 1.28 (1.00–1.63) | 0.047 | 0.17 | Flanking_3UTR | NR3C1 |
12 | rs2701129 | 50715744 | 0.63 (0.48–0.83) | 0.001 | 0.13 | Flanking_5UTR | NR4A1 |
12 | rs2701124 | 50734424 | 0.58 (0.41–0.82) | 0.002 | 0.08 | Coding | NR4A1 |
15 | rs325383 | 98072569 | 1.28 (1.05–1.57) | 0.016 | 0.29 | Flanking_3UTR | MEF2A |
17 | rs2078486 | 7523808 | 0.71 (0.51–1.00) | 0.050 | 0.08 | Intron | TP53 |
18 | rs4987856 | 58944474 | 1.44 (1.07–1.94) | 0.016 | 0.10 | 3UTR | BCL2 |
18 | rs1977971 | 59119174 | 1.28 (1.06–1.53) | 0.009 | 0.46 | Intron | BCL2 |
18 | rs11152377 | 59123426 | 0.83 (0.69–0.99) | 0.036 | 0.42 | Intron | BCL2 |
18 | rs1462129 | 59131851 | 1.26 (1.05–1.51) | 0.013 | 0.48 | Intron | BCL2 |
18 | rs1801018 | 59136859 | 0.82 (0.69–0.99) | 0.037 | 0.43 | Coding | BCL2 |
19 | rs8101696 | 4066452 | 0.60 (0.41–0.89) | 0.011 | 0.06 | Intron | MAP2K2 |
19 | rs4808100 | 17792497 | 1.24 (1.03–1.49) | 0.020 | 0.37 | Intron | INSL3 |
20 | rs6063022 | 35426741 | 1.33 (1.03–1.71) | 0.027 | 0.15 | Intron | SRC |
Abbreviations: CHR, chromosome; CI, confidence interval; MAF, minor allele frequency.
ᵃ Univariate analysis.
Table 5. Stepwise logistic model including upstream and downstream regulators of NR4A1 (additive model)
SNP | OR (95% CI)ᵃ | P value |
---|---|---|
rs13428968 | 1.46 (1.09–1.95) | 0.0103 |
rs2701124 | 0.52 (0.35–0.80) | 0.0024 |
rs2078486 | 0.61 (0.41–0.93) | 0.0212 |
rs1977971 | 1.36 (1.10–1.68) | 0.0049 |
rs8101696 | 0.56 (0.35–0.88) | 0.0122 |
ᵃ Adjusted for age, sex, second-hand smoking, and family history of lung cancer.
Our original risk model was constructed based on 709 never smokers (330 lung cancer cases and 379 controls) (6). Of the total of 959 never smokers in this new analysis, 650 (68%) overlapped in both analyses. The published AUC for never smokers in that model was 0.57. The point estimate of the AUC for those not included in our original study (N = 309) was 0.56. The AUC statistic for the baseline model in the entire discovery dataset, incorporating the same clinical and epidemiologic variables (age, gender, family history of lung cancer, and ETS exposure) was 0.62 (data not shown). With the addition of the replicated SNP, rs12809597, the AUC increased to 0.64, P = 0.098. The comparable model for the Mayo Clinic data with addition of rs12809597 yielded an AUC of 0.60. The same analysis for the discovery data, adding in the NR4A1 SNPs and upstream and downstream regulators, yielded an AUC of 0.68 (P = 0.0005), data not shown.
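The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes that quantity directly from the pairwise comparison (the Mann-Whitney formulation); the scores are hypothetical and do not reproduce any value from the study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the fraction of case/control pairs
    in which the case has the higher score (ties count as one half)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from a baseline model (not study data).
cases = [0.80, 0.60, 0.55, 0.40]
controls = [0.70, 0.50, 0.45, 0.30]
print(f"AUC = {auc(cases, controls):.4f}")
```

On this scale 0.5 corresponds to no discrimination and 1.0 to perfect separation, so a change from 0.62 to 0.64 reflects a small gain in the share of correctly ordered case/control pairs.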
We also summed the number of adverse alleles (ACVR1B, NR4A1, and upstream and downstream regulators) and evaluated the distribution of cases and controls across different strata to determine the cumulative risk in the discovery set (Table 6). Compared with the lowest-risk stratum (0 to 6 risk alleles), the OR increased to 2.21 (P = 0.0272) for 7 risk alleles, 3.26 (P = 5.0 × 10⁻⁴) for 8 risk alleles, and 5.28 (P = 3.9 × 10⁻⁷) for 9 or more risk alleles. A 46% increase in risk was found for each adverse allele, and the P value for trend was 1.11 × 10⁻⁹. Six percent of cases and 13% of controls were in the lowest-risk stratum compared with 50% and 35% in the highest-risk stratum, respectively.
Table 6. Genetic risk score in discovery set for ACVR1B, NR4A1, and upstream and downstream SNPs
Adverse alleles (n) | Cases, n (%) | Controls, n (%) | OR (95% CI)ᵃ | P value |
---|---|---|---|---|
0–6 | 26 (5.9) | 64 (12.8) | Ref. | |
7 | 64 (14.4) | 112 (22.4) | 2.21 (1.09–4.45) | 0.0272 |
8 | 133 (29.9) | 150 (29.9) | 3.26 (1.69–6.32) | 5.00 × 10⁻⁴ |
9+ | 221 (49.8) | 175 (34.9) | 5.28 (2.78–10.05) | 3.90 × 10⁻⁷ |
Per adverse allele (trend) | | | 1.46 (1.30–1.65) | 1.11 × 10⁻⁹ |
ᵃ Adjusted for age, sex, environmental tobacco smoke, and lung cancer family history.
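The adverse-allele count behind Table 6 can be sketched as follows. For each SNP, the minor allele is counted as adverse when its reported OR is above 1, and the major allele is counted when the OR is below 1. The six-SNP set used here (rs12809597 plus the five SNPs in Table 5) is an assumption made for illustration, because the exact SNPs entering the score are not listed in this section; the function names are hypothetical.

```python
# Reported ORs for the minor allele (Tables 3 and 5). Treating these six
# SNPs as the score components is an assumption for illustration only.
REPORTED_OR = {
    "rs12809597": 0.72,  # ACVR1B
    "rs13428968": 1.46,  # NR4A2
    "rs2701124": 0.52,   # NR4A1
    "rs2078486": 0.61,   # TP53
    "rs1977971": 1.36,   # BCL2
    "rs8101696": 0.56,   # MAP2K2
}


def adverse_allele_count(minor_allele_counts):
    """Sum adverse alleles over all SNPs for one subject.

    minor_allele_counts maps SNP id -> 0, 1, or 2 copies of the minor allele.
    """
    total = 0
    for snp, odds_ratio in REPORTED_OR.items():
        copies = minor_allele_counts[snp]
        if copies not in (0, 1, 2):
            raise ValueError(f"invalid genotype for {snp}: {copies}")
        # Minor allele is adverse if OR > 1; otherwise the major allele is.
        total += copies if odds_ratio > 1 else 2 - copies
    return total


def risk_stratum(count):
    """Map an adverse-allele count to the strata used in Table 6."""
    if count <= 6:
        return "0-6"
    if count == 7:
        return "7"
    if count == 8:
        return "8"
    return "9+"


# Hypothetical subject (not study data).
subject = {"rs12809597": 1, "rs13428968": 1, "rs2701124": 0,
           "rs2078486": 0, "rs1977971": 2, "rs8101696": 0}
count = adverse_allele_count(subject)
print(count, risk_stratum(count))
```

An unweighted count treats every allele as contributing equally, which matches the per-allele trend estimate in Table 6 but ignores differences in effect size between SNPs.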
Discussion
In this two-stage candidate pathway analysis of inflammation gene variants, we were able to replicate one variant (rs12809597) in the Activin receptor type-1B (ACVR1B)/Activin receptor-like kinase 4 (ALK4) gene that was significantly associated with lung cancer risk in lifetime never-smoking cases. This association was most prominent in women and in those risk subgroups that reported adult exposure to ETS, prior asthma, or no prior hay fever. Further analysis of SNPs within 1 Mb of this polymorphism suggested that another promising target was in the 5′ UTR of the Nuclear receptor subfamily 4 group A member 1 (NR4A1) gene, although the OR in the replication Mayo Clinic data did not achieve statistical significance, and the association we detected could be attributed to chance.
Inflammation is a complex host defense against biological, chemical, physical, and endogenous irritants. Innate immunity is mediated by a variety of secreted proinflammatory cytokines. The inflammation is resolved by anti-inflammatory cytokines. Chronic inflammation results from a dysfunction of these negative regulatory mechanisms (10). Although smoking (and perhaps, to a lesser extent, passive exposure) is the obvious cause of a chronic inflammatory milieu in the lung parenchyma and bronchial epithelium, other likely precipitating factors include infection, inhaled particulate exposures, and pulmonary scarring (11) that can lead to oxidative stress and an inflammatory response, even in non–tobacco-exposed subjects in whom lung cancer develops. It remains plausible, therefore, that inflammation gene polymorphisms could be important in lung cancer risk in lifetime never smokers as well.
Elevated prediagnostic C-reactive protein (CRP) levels, a systemic, but nonspecific, marker of chronic inflammation, have been associated with subsequent lung cancer risk (12) with evidence of a dose–response relation. Conversely, use of nonsteroidal anti-inflammatory drugs (NSAIDs) has been associated with decreased lung cancer risk in some (13–16), but not all studies (17–19). Few of these studies have specifically evaluated the risk in lifetime never smokers, although in one cohort analysis (13), the strongest effect for total NSAID use was for long-term former smokers.
Activin receptor type-1B is a protein encoded by the ACVR1B gene with alternate splicing, resulting in multiple transcript variants. Our SNP of interest, rs12809597, is intronic, and no function has been reported for this SNP, although it is possible that this tagSNP may be linked to other causal SNP(s) in the gene that affect expression or function. ACVR1B, also known as ALK4, acts as a transducer of activin or activin-like ligands that are growth and differentiation factors belonging to the transforming growth factor-β (TGF-β) superfamily of signaling proteins, essential regulators of proliferation and apoptosis, and key regulators of inflammation and angiogenesis. Activins signal through a heteromeric complex of receptor serine kinases, which include at least two type I (I and IB) and two type II (II and IIB) receptors (20). Activin complexes with ACVR1B and recruits SMAD2 or SMAD3, members of the SMAD family of transcriptional coregulators. ACVR1B has been shown to be mutated in pancreatic tumors (21), and activin signaling mediates growth inhibition and cell cycle arrest in breast cancer cells (22). Moreover, differential expression of this gene has been found in the epithelial cells of a subset of smokers with lung cancer (23) and in bone marrow micrometastases from lung cancer patients (24), although the relevance of the gene deregulation in lung cancer is not entirely clear. Whole-genome microarray analysis of ACVR1B expression in large airway epithelial cells indicated some reduction in expression among normal smokers compared with nonsmokers (25), suggesting the possible impact of cigarette-smoke exposure on activin signaling. It is therefore of interest that risk from the variant was most apparent in ETS-exposed subjects. Activins have also been implicated in the etiology of fibrotic diseases (26) and are upregulated during the fibrotic response in vivo (27).
Both the TGF-β and activin signaling pathways are activated on allergen provocation in asthma and may contribute to the resolution of inflammation and initiation of airway remodeling after allergen challenge (28). Activin may also act as an inhibitor of cytokine-induced proinflammatory chemokine release from the airway epithelium. Activin-A is rapidly induced in TH2 cells on T-cell activation and may also function as a TH2 immunomodulatory cytokine (29). An enhanced TH2 immune response contributes to the induction of allergy and asthma.
We previously showed that self-reported, physician-diagnosed asthma is significantly associated with risk of lung cancer in lifetime never smokers who were a subset of this larger analysis (OR = 1.82) with evidence of a dose–response pattern for duration (P = 0.007 for trend) (8), although this pattern was not evident in our discovery data. In their meta-analysis, Santillan and colleagues (30) also found asthma to be a significant risk factor for lung cancer in never smokers. Our data also demonstrated a protective effect of prior hay fever on lung cancer risk in never smokers (6). Cockcroft and colleagues (31) suggested that patients with respiratory atopy appeared to have some degree of protection against developing malignancies of endodermal origin, attributable to enhanced immune surveillance in a stimulated immune system.
It was of special interest that the most significant odds ratio for the ACVR1B SNP was obtained in the subset of cases and controls that reported adult exposure to ETS, although these subset analyses are based on small sample sizes. No such association was evident in those who denied such exposure. It could be argued that an inflammatory microenvironment is more likely to exist in those exposed passively to tobacco smoke, and that exposure is necessary for the impact of the gene variant to be apparent.
The association with NR4A1 (also known as Nur77) is intriguing, but must be viewed with considerable caution. It is an orphan receptor within the nuclear hormone receptor superfamily and a potent inhibitor of NF-κB activation (32). NR4A1 is overexpressed in patients with atopic dermatitis compared with healthy volunteers (33). Protective effects of the NR4A1 SNP were also largest in putative risk subgroups (asthma, no prior hay fever).
A 1% increase in the AUC (0.64) was found in an expanded clinical and epidemiologic risk model incorporating the ACVR1B SNP, and an additional 5% (0.68) when we also added the upstream and downstream targets of NR4A1. These improvements in risk prediction were statistically significant. The final AUC of 0.68 is similar to, and the incremental improvement in AUC is larger than, that obtained from a risk-prediction model of lung cancer in ever smokers in which we incorporated top lung cancer GWAS hits, the chromosome 15q nicotinic receptor gene cluster (tag SNP rs1051730 G>A), and two SNPs from the 5p15.33 region (rs2736100 and rs401681) (34). However, higher AUC values are desirable for the model to have clinical utility and for any public health impact or recommendation, especially because the incidence of lung cancer in never smokers is substantially lower than that in ever smokers.
In a parallel analysis, rs2701129 was associated with an OR of 0.78, P = 0.014, in 1,096 cases and 727 controls, all ever smokers, whom we have genotyped by using the same Illumina platform, although rs12809597 was not a risk predictor. The rs12809597 was not directly genotyped in the GWAS, and the r2 value was not sufficiently robust for imputation. The rs2701129 was genotyped in GWAS, but was not statistically significant.
Although the chemical constituents of sidestream and mainstream smoke are qualitatively the same, differences in pH, combustion temperature, and degree of dilution with air contribute to quantitative differences in their chemical composition and their emission rates. For example, nitrosamines and other carcinogens are present in greater concentrations in sidestream than in mainstream smoke (35). The ever-smoker cases for GWAS also differ from the never-smoker cases in this analysis. For example, more than 25% of our ever-smoker cases report preexisting chronic obstructive pulmonary disease (COPD) that is almost nonexistent in never smokers. One could therefore hypothesize that the pathogenic processes for smokers and never smokers are not equivalent, although certain etiologic pathways could be shared, such as the involvement of inflammation.
We acknowledge the limitations of this study and the challenge in drawing causal inferences from association analyses. Relatively small sample sizes were used for both the discovery and replication sets, and this problem is exaggerated in subset analyses. We also relied on self-reported questionnaire data for assessment of ETS exposure, raising the potential for both misclassification and recall bias. Nevertheless, for residential exposure to ETS, most studies in the past have confirmed that self-reports were generally reliable (36), and practical approaches to alternative measurement of ETS exposure decades before the onset of lung cancer have not been established. In national survey data, the accuracy of self-reported second-hand smoke exposure at work, home, or home and work ranged from 87% to 92%, although workers reporting no second-hand smoke exposure were only 28% accurate (37). Thus underreporting of ETS exposure could occur, but overreporting is less likely.
In summary, this analysis used a candidate pathway approach to evaluate SNPs comprehensively in inflammation genes as predisposing to lung cancer risk in lifetime never smokers. We replicated an SNP in the TGF-β family in ETS-exposed patients or those with inflammatory/allergic conditions, and by using in silico analyses, we were able to identify upstream and downstream SNPs of our target SNPs that further contributed to risk. Recent progress in identification of novel SNPs, especially those generated from the 1000 Genomes Project, has identified several polymorphisms in the ACVR1B gene that could be candidates for causal variants. Those SNPs include 6 polymorphisms located in the coding region of ACVR1B: rs34488074, rs114081852, rs117020497, rs114735080, rs77643569, and rs34050429. We plan to include these SNPs in the next phase of our targeted sequencing studies.
Methods
Subject Accrual
This analysis focuses on lung cancer cases and controls who reported themselves to be lifetime never smokers (i.e., smoked <100 cigarettes over a lifetime). Cases for the discovery phase were consecutive Caucasian patients with newly diagnosed, histopathologically confirmed, and previously untreated non–small cell lung cancer with no age, gender, ethnicity, tumor histology, or disease-stage restrictions. Medical history, family history of cancer, adult environmental tobacco-exposure history, and occupational history were obtained through an interviewer-administered risk-factor questionnaire. We did not validate self-reports of passive smoking exposure. Case-exclusion criteria for the study included prior chemotherapy or radiotherapy or recent blood transfusion.
We recruited our control population from the Kelsey-Seybold Clinic, Houston's largest multidisciplinary physician practice. Potential controls were first surveyed with a short questionnaire for their willingness to participate in research studies and to provide preliminary data for matching demographic characteristics with those of cases (4). Controls were frequency matched to the cases on the basis of age (±5 years), sex, smoking status, and ethnicity. Exclusion criteria were similar and also included no prior cancer. To date, the response rate among both the cases and controls has been approximately 75%. On receiving informed consent, we drew a 40-mL blood sample into coded, heparinized tubes from study participants. Genomic DNA was extracted from peripheral blood lymphocytes and stored at −80°C.
The replication phase was conducted among never-smoking cases and controls recruited between January 1997 and September 2008 and who were included in a published GWAS (5). These lifetime never-smoking lung cancer cases were recruited from the Mayo Clinic, and community residents who were never smokers were selected as controls and matched to the patients according to age, sex, and ethnic background. Personal interviews with structured questionnaires were used to elicit demographic, epidemiologic, and exposure data. Institutional review board approval was obtained from the MD Anderson Cancer Center, Kelsey-Seybold Foundation (Houston, Texas), and Mayo Clinic (Rochester, Minnesota).
Gene and SNP Selection
Candidate genes for the discovery phase were selected based on the following criteria. We searched the Gene Ontology database (38) and the National Center for Biotechnology Information (NCBI) PubMed (39) to identify a list of inflammation pathway–related genes. For each gene, we selected haplotype tagging SNPs (htSNP) located within 10 kb upstream of the transcriptional start site or 10 kb downstream of the transcriptional stop site, based on data from the International HapMap Project (40) release 24/Phase II. By using the LD select program (41) and the UCSC Golden Path Gene Sorter program (42), we further divided identified SNPs into bins based on an r2 threshold of 0.8 and minor allele frequency (MAF) greater than 0.05 in Caucasians to select tagging SNPs. We also included SNPs in the coding (synonymous SNPs, nonsynonymous SNPs) and regulatory regions (promoter, splicing site, 5′ UTR, and 3′ UTR). Functional SNPs and SNPs previously reported to be associated with cancer were also included. We also extensively used the inflammation pathway gene list and functionally defined subpathways, as outlined in Loza and colleagues (7), who suggested that variants in multiple genes in inflammation pathways may likely cooperate in additive or synergistic ways to affect disease risk. The complete set of selected SNPs was submitted to Illumina technical support for Infinium chemistry designability, beadtype analyses, and iSelect Infinium Beadchip synthesis.
Of the total number of selected SNPs, 2.9% could not be designed because of designability score failure. An additional 12% could not be incorporated into the beadchip owing to manufacturing issues (within the norm stated by Illumina). Overall, slightly fewer than 15% of all SNPs were not designed. We did not seek surrogates for failed SNPs, because of the relatively low failure rate for designability (<3%) and constraint on the total number of beadtypes for the custom chip design.
Genotyping
In total, 19,949 SNPs were genotyped in the discovery samples by using Illumina's Infinium iSelect HD Custom Genotyping BeadChip according to the standard 3-day protocol (San Diego, CA). Of these, 11,930 SNPs were in inflammation pathways, and the remaining SNPs were identified from ongoing GWAS for further query in separate analyses. Genotypes were autocalled by using the BeadStudio software. Any SNP with a call rate lower than 95% was excluded from further analysis (n = 203). A further 27 SNPs were removed because of a difference in genotype between the original and the duplicate sample (error rate). We also deleted 93 SNPs that were at the same chromosomal position and 89 SNPs with MAF = 0. The final data set included 19,537 SNPs, of which 11,737 SNPs were in the inflammation pathway.
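The quality-control counts above can be checked with simple arithmetic. The short Python sketch below tallies the four exclusion categories reported in the text; the dictionary labels and the function name are illustrative only and are not part of the authors' pipeline.

```python
# Tally of the SNP quality-control exclusions reported in the text.
# Counts come from the paragraph above; labels and names are illustrative.
genotyped_total = 19_949

exclusions = {
    "call rate below 95%": 203,
    "discordant duplicate genotypes": 27,
    "shared chromosomal position": 93,
    "monomorphic (MAF = 0)": 89,
}


def remaining_after_qc(total, removed):
    """Return the number of SNPs left after subtracting every exclusion count."""
    return total - sum(removed.values())


final_total = remaining_after_qc(genotyped_total, exclusions)
print(final_total)  # 19537, matching the final data set size in the text
```

The same subtraction for the inflammation subset (11,930 before filtering, 11,737 after) implies that 193 of the 412 excluded SNPs were inflammation-pathway SNPs.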
For the Mayo Clinic samples, whole genome amplification (WGA) was performed before SNP genotyping. The WGA was set up in four separate reactions, each of which included 25 ng of genomic DNA and standard amplification procedures with a total reaction volume of 25 μL (REPLI-g Midi Kit, Qiagen). After WGA, the four reactions were pooled, mixed, and quantified by the picogreen method. Genotyping was performed in Dynamic Arrays (Fluidigm; South San Francisco, CA) containing integrated fluidic circuits (IFCs). Then 75 ng of the WGA-DNA was pre-amplified using 0.2X primer multiplex of the source primers. 2.3 μL of pre-amplified DNA was then loaded onto the array; 3 μL of each Applied Biosystems TaqMan genotyping assay in a 5-μL assay reaction volume was loaded onto the array. The assay was run for 40 PCR cycles under vacuum pressure. The end-point read was performed on an EP1 machine by using a CCD camera to detect VIC and FAM dyes. SNP Genotyping Analysis Software was used to autocall SNP genotype clusters with a confidence index of 95%. The specific SNPs identified from this pathway-based analysis were not included in the Mayo GWAS chip (5) and were directly genotyped for this analysis. The never-smoker GWAS with the Mayo Clinic samples had a rather limited sample size, and an additional GWAS in never smokers is under way, including our discovery set of never-smoking cases and controls.
Statistical Analyses
Pearson's χ2 test was used to assess the differences in categorical variables, and t tests were used for continuous variables in both discovery and replication data sets. All tests were 2-sided. For each SNP, Hardy-Weinberg equilibrium was assessed among controls by using a χ2 test. To assess case–control associations of SNP genotypes with lung cancer risk, we used unconditional logistic regression, implemented by using SAS/Genetics version 9.2. Single-SNP association tests were carried out by using PLINK 1.07 (43).
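As an illustration of the Hardy-Weinberg check, the sketch below computes the standard 1-degree-of-freedom χ2 goodness-of-fit statistic from genotype counts in controls. It is a minimal stand-alone example, not the SAS/Genetics or PLINK implementation used in the study, and the function name and example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (two homozygotes and the
    heterozygote). Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only (not study data).
print(hwe_chi_square(100, 200, 100))  # counts exactly at equilibrium
print(hwe_chi_square(150, 100, 150))  # strong heterozygote deficit
```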
We applied the Bayesian false discovery probability test (BFDP) (44) to evaluate the chance of obtaining a false-positive association. This approach calculates the probability of no association, given the data and a specified prior on the presence of an association, and has a noteworthiness threshold that is defined in terms of the costs of false discovery and nondiscovery. Four levels of prior probability of 0.01, 0.03, 0.05, and 0.07 and odds ratios from 1.3 through 2.0 were tested; selected levels of noteworthiness for BFDP were set at 0.8 (i.e., false nondiscovery rate is 4 times as costly as false discovery). We used the most conservative prior of 0.01 to determine that the association was unlikely to represent a false-positive result.
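A minimal sketch of the BFDP calculation follows, assuming the commonly used approximate Bayes factor formulation with a normal prior on the log odds ratio. It also assumes that the tested odds ratios (1.3 through 2.0) act as upper bounds on the prior effect size; the text does not state this explicitly. All function and parameter names are hypothetical, and the example inputs are not study results.

```python
import math


def prior_variance(or_upper, quantile_z=1.96):
    """Prior variance W of the log OR, chosen so that the prior places
    about 97.5% of its mass below or_upper (an assumption of this sketch)."""
    return (math.log(or_upper) / quantile_z) ** 2


def approximate_bayes_factor(log_or, se, w):
    """Approximate Bayes factor for the null versus the alternative."""
    v = se ** 2
    z2 = (log_or / se) ** 2
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))


def bfdp(log_or, se, prior_assoc, or_upper):
    """Posterior probability of no association given the estimate."""
    abf = approximate_bayes_factor(log_or, se, prior_variance(or_upper))
    prior_odds_null = (1 - prior_assoc) / prior_assoc
    return abf * prior_odds_null / (abf * prior_odds_null + 1)


def noteworthiness_threshold(cost_ratio):
    """Threshold R / (1 + R), where R is the cost of a false nondiscovery
    relative to a false discovery."""
    return cost_ratio / (1 + cost_ratio)


# A cost ratio of 4 reproduces the 0.8 threshold quoted in the text.
print(noteworthiness_threshold(4))  # 0.8

# Hypothetical estimate: OR = 2.0, SE(log OR) = 0.15, prior 0.01, bound 1.5.
print(bfdp(math.log(2.0), 0.15, prior_assoc=0.01, or_upper=1.5))
```

An association is called noteworthy when its BFDP falls below the threshold; a lower prior probability of association makes that harder to achieve, which is why 0.01 is the most conservative of the four priors.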
In stratified analyses, we used logistic regression to examine associations of selected SNPs with lung cancer case–control status for subgroups of subjects defined by sidestream tobacco exposure, history of hay fever, asthma, or family history of lung cancer, comparing each subgroup of cases against controls within that subgroup.
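For a single binary predictor with no covariates, the logistic regression odds ratio within a stratum reduces to the cross-product ratio of the 2×2 table. The sketch below uses that simplification with a Wald (Woolf) confidence interval; the study's models may have included adjustment covariates, which this example does not attempt to reproduce. The function name and the counts are invented for illustration and are not study data.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from one stratum's 2x2 table."""
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts per stratum (carriers/noncarriers in cases, then controls).
strata = {
    "ETS exposed": (20, 80, 10, 90),
    "ETS unexposed": (12, 88, 12, 88),
}

for name, counts in strata.items():
    or_, lo, hi = odds_ratio_with_ci(*counts)
    print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Each stratum compares its own cases against its own controls, mirroring the subgroup-specific comparison described above.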
We also performed a stepwise forward logistic regression analysis in which we allowed significant univariate SNPs to enter a model according to the strength of association, provided they showed association with disease (P < 0.05). SNPs were retained for analysis if they continued to show association (P < 0.05), given other SNPs in the model. Linkage disequilibrium (LD) between SNPs was calculated for cases and controls by using PLINK before all the SNPs were entered into the model. If two SNPs were in high LD (r2 ≥ 0.8), only one SNP was entered into the model. Linkage disequilibrium was visualized by using Haploview v. 4.1 (45) to summarize r2 statistics.
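The LD filter that precedes model entry can be sketched as follows. This example approximates r2 as the squared Pearson correlation of allele counts (0, 1, or 2 copies of the minor allele), a common unphased approximation, and keeps a SNP only if it is not in high LD with any SNP already kept. It assumes the SNPs are supplied in order of decreasing strength of association; the names and toy genotypes are hypothetical and do not come from PLINK output.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def prune_high_ld(genotypes, ranked_snps, threshold=0.8):
    """Keep each SNP unless its r2 with an already kept SNP reaches the threshold."""
    kept = []
    for snp in ranked_snps:
        if all(r_squared(genotypes[snp], genotypes[k]) < threshold for k in kept):
            kept.append(snp)
    return kept


# Toy genotypes for eight subjects (illustrative only).
toy_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],  # perfect proxy for snp_a
    "snp_c": [0, 0, 1, 2, 2, 1, 0, 1],
}

print(prune_high_ld(toy_genotypes, ["snp_a", "snp_b", "snp_c"]))
```

Only the SNPs that survive this filter would then be offered to the forward selection step with its P < 0.05 entry and retention criteria.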
Genotyped SNPs within 1 Mb on each side of the ACVR1B gene were retrieved (46). Before imputation, we identified three A/T or C/G SNPs that were in opposite strand orientation to the strand of the 1000 Genomes Project reference data, based on comparisons of minor allele frequencies. The strands for these three SNPs were flipped before imputation. MACH version 1.016 was used for imputation, with the 1000 Genomes Project March 2010 release CEU data as the reference panel (47).
For the replication analysis, we included all SNPs that were statistically significant at P values < 0.001 and BFDP levels ≤ 0.8 with prior probability of 0.01.
For risk-model construction, we retained all epidemiologic variables that were components of our published risk-prediction model for never smokers (6). However, because the Mayo Clinic study did not have data available on prior hay fever, we elected to omit this variable from the model. For each risk model, we calculated specificity and sensitivity of the resulting logistic regression model by constructing receiver operating characteristic (ROC) curves and calculating the area under the curve (AUC) statistic to estimate the ability of the models to discriminate between patients and controls for the two populations separately and combined. Approximate 95% confidence intervals for the AUC were calculated, assuming a binegative exponential distribution by using SAS statistical software. An AUC of 0.5 indicates chance prediction (equivalent to a coin toss), whereas a statistic of 0.7 or higher indicates good discrimination. We also constructed expanded models that included any replicated SNPs. We performed pairwise comparisons of AUCs of the baseline multiple logistic model and the expanded model including genetic data by using a contrast matrix to evaluate differences of the areas under the empirical ROC curves (48).
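The empirical AUC used to judge discrimination can be illustrated with a short sketch. It computes the AUC as the proportion of case-control pairs in which the case receives the higher predicted risk, counting ties as one half (the Mann-Whitney formulation). It does not reproduce the binegative exponential confidence intervals or the contrast-matrix comparison performed in SAS, and the function name and scores are hypothetical.

```python
def empirical_auc(case_scores, control_scores):
    """Area under the empirical ROC curve from predicted risks.

    Equals the probability that a randomly chosen case scores higher
    than a randomly chosen control, with ties counted as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from a fitted model (not study data).
cases = [0.6, 0.4]
controls = [0.5, 0.3]
print(empirical_auc(cases, controls))  # 0.75: three of four pairs ranked correctly
```

A value of 0.5 from this function corresponds to the coin-toss prediction described above, and 1.0 to perfect separation of cases from controls.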
Disclosure of Potential Conflicts of Interest
The authors declare that they have no competing financial interests. None of the sponsors played a role in the study design, collection, analysis, and interpretation of the data, in the writing of this article, or in the decision to submit the manuscript for publication.
Grant Support
This work was supported by grants from the National Cancer Institute [CA55769 and CA127219 (M.R. Spitz); CA80127 and CA84354 (P. Yang); U19CA148127 and CA121197 (C.I. Amos); CA123235 and CA131327 (C.J. Etzel); and CA149462 (O.Y. Gorlova)]; Kelsey Seybold Research Foundation; and Mayo Foundation Fund.