Background:

A substantial proportion of cancer driver genes (CDG) are also cancer predisposition genes. However, the associations between genetic variants in lung CDGs and the susceptibility to lung cancer have rarely been investigated.

Methods:

We selected expression-related single-nucleotide polymorphisms (eSNP) and nonsynonymous variants of lung CDGs, and tested their associations with lung cancer risk in two large-scale genome-wide association studies (20,871 cases and 15,971 controls of European descent). Conditional and joint association analysis was performed to identify independent risk variants. The associations of independent risk variants with somatic alterations in lung CDGs or recurrently altered pathways were investigated using data from The Cancer Genome Atlas (TCGA) project.

Results:

We identified seven independent SNPs in five lung CDGs that were consistently associated with lung cancer risk in discovery (P < 0.001) and validation (P < 0.05) stages. Among these loci, rs78062588 in TPM3 (1q21.3) was a new lung cancer susceptibility locus (OR = 0.86, P = 1.65 × 10−6). Subgroup analysis by histologic types further identified nine lung CDGs. Analysis of somatic alterations found that in lung adenocarcinomas, rs78062588[C] allele (TPM3 in 1q21.3) was associated with elevated somatic copy number of TPM3 (OR = 1.16, P = 0.02). In lung adenocarcinomas, rs1611182 (HLA-A in 6p22.1) was associated with truncation mutations of the transcriptional misregulation in cancer pathway (OR = 0.66, P = 1.76 × 10−3).

Conclusions:

Genetic variants can regulate functions of lung CDGs and influence lung cancer susceptibility.

Impact:

Our findings might help unravel biological mechanisms underlying lung cancer susceptibility.

Lung cancer is one of the most commonly diagnosed malignancies and the leading cause of cancer-related death worldwide (1). The development of lung cancer is a multistep process that involves both genetic and environmental factors (2–4). Genome-wide association studies (GWAS) have proven to be a powerful approach to dissecting the genetic architecture of complex diseases. To date, GWASs have identified 51 lung cancer susceptibility loci in various populations (5, 6). However, the information provided by GWAS remains inadequate. The heritability of lung cancer was estimated to be 20.6% in European populations (7), yet only a small proportion of this heritability can be explained by the risk loci identified in previous lung cancer GWASs (8). Therefore, additional risk loci for lung cancer remain to be identified.

Several waves of technology have facilitated the identification of lung cancer driver genes (lung CDG), which is improving our understanding of the oncogenic process in lung cancer. On the basis of The Cancer Genome Atlas (TCGA) research on lung cancer, the most commonly mutated oncogenes in lung adenocarcinoma included KRAS, EGFR, BRAF, PIK3CA, and MET; mutations in tumor suppressors such as TP53, STK11, KEAP1, NF1, RB1, and CDKN2A were also frequently detected in lung adenocarcinoma (9–11). Although TP53, RB1, ARID1A, CDKN2A, PIK3CA, and NF1 were significantly mutated in both lung adenocarcinoma and lung squamous cell carcinoma (lung SqCC), significantly mutated genes such as NOTCH1 and HRAS were identified only in lung squamous cell carcinoma (10–12). In addition to somatic mutations, somatic copy number alterations (SCNA) and rearrangements also play important roles in lung cancer development. Amplification of TERT and EGFR, as well as fusions involving ALK and ROS1, were commonly identified in lung adenocarcinoma. Deletions of CDKN2A have been identified in both lung adenocarcinoma and squamous cell carcinoma (9, 10, 12).

Emerging evidence has shown that a substantial proportion of CDGs are also cancer predisposition genes (13). The TCGA PanCanAtlas Germline Working Group identified 44 genes that showed coclustering or colocalization of pathogenic germline variants with recurrent somatic mutations, implying shared oncogenic processes in germline and somatic genomes (14). In addition, susceptibility variants could regulate the functions of nearby cancer driver genes. For example, rs2736100, a risk variant of lung cancer, is located in the first intron of driver gene TERT, and was associated with increased expression of TERT in lung tumors (15). However, the associations between common genetic variants in lung CDGs and lung cancer risk have rarely been explored. Therefore, we integrated lung CDGs, genetics of gene expression, and functional annotation databases with large-scale lung cancer GWAS datasets to systematically investigate the associations between lung CDG–related genetic variants and lung cancer risk.

GWAS datasets

This study utilized data from two existing GWASs of individuals of European descent: the OncoArray dataset (16) and the Division of Cancer Epidemiology and Genetics (DCEG) Lung Cancer Study (17). The OncoArray dataset was derived from the Transdisciplinary Research of Cancer in Lung of the International Lung Cancer Consortium (TRICL-ILCCO) and the Lung Cancer Cohort Consortium (LC3). Quality control and imputation processes were described previously (16); after these steps, 18,444 cases and 14,027 controls remained. The DCEG Lung Cancer GWAS data were obtained from dbGaP (phs000336.v1.p1; ref. 17). Detailed quality control and imputation processes have been described previously (18). We further excluded individuals in the DCEG Lung Cancer Study who overlapped with or were related to individuals from the OncoArray dataset based on identity by descent (IBD) analysis (IBD > 0.45). As a result, a total of 2,427 cases and 1,944 controls from the DCEG Lung Cancer Study remained. All participants provided signed informed consent, and study protocols were approved by the ethical review board of each institution.

Selection of lung CDG–related genetic variants

Genes were annotated as lung CDGs if they fulfilled any of the following criteria: (i) lung cancer–related genes in the COSMIC Cancer Gene Census (v78; ref. 19); (ii) mutational drivers, SCNA drivers, and fusion drivers detected by the IntOGen pipeline in lung tumors (20); and (iii) significantly mutated genes (SMG) and candidate CDGs with significant SCNAs that were identified in lung adenocarcinoma and/or lung squamous cell carcinoma by the TCGA projects (10).

To investigate functional variants in lung CDGs, we included single-nucleotide polymorphisms (SNP) if they satisfied either of the following criteria: (i) SNPs that were associated with expression of lung CDGs (expression-related SNPs, or eSNPs) in normal lung tissues based on the Genotype-Tissue Expression Project (GTEx, v6p release; P < 0.05; ref. 21) or (ii) nonsynonymous variants of lung CDGs identified using Variant Effect Predictor (22). The selected eSNPs and nonsynonymous variants were extracted from the two GWAS datasets. SNPs with imputation INFO < 0.8, minor allele frequency (MAF) in controls < 0.005, Hardy–Weinberg equilibrium (HWE) test P in controls < 1 × 10−7, or HWE test P in cases < 1 × 10−12 were excluded from the analysis.
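The four exclusion thresholds can be expressed as a single filter. The following is a minimal sketch, assuming per-SNP quality metrics are already available; the `SnpQc` record, its field names, and the toy SNP identifiers are hypothetical and are not part of the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical field names)."""
    rsid: str
    info: float            # imputation INFO score
    maf_controls: float    # minor allele frequency in controls
    hwe_p_controls: float  # HWE test P value in controls
    hwe_p_cases: float     # HWE test P value in cases


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all four exclusion criteria."""
    if snp.info < 0.8:
        return False
    if snp.maf_controls < 0.005:
        return False
    if snp.hwe_p_controls < 1e-7:
        return False
    if snp.hwe_p_cases < 1e-12:
        return False
    return True


# Toy records for illustration only.
snps = [
    SnpQc("rsA", 0.95, 0.06, 0.40, 0.30),   # kept
    SnpQc("rsB", 0.75, 0.10, 0.50, 0.50),   # dropped: INFO < 0.8
    SnpQc("rsC", 0.99, 0.001, 0.50, 0.50),  # dropped: MAF < 0.005
    SnpQc("rsD", 0.99, 0.20, 1e-9, 0.50),   # dropped: HWE P in controls
]
kept = [s.rsid for s in snps if passes_qc(s)]
```

Only `rsA` survives in this toy input, because each of the other records fails exactly one threshold.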

Statistical analyses

Association analysis

We performed logistic regression to estimate odds ratios (OR) and 95% confidence intervals (CI) for each SNP. The OncoArray dataset was used in the discovery stage, with adjustment for age, gender, and the first three principal components (PC; ref. 16). Variants with association P < 0.001 were further tested in the DCEG Lung Cancer Study (the validation stage), adjusting for age, gender, and the first PC in the logistic regression model (23). SNPTEST v2.5 was used for the association analysis, taking imputed genotypes in dosage format. For variants with P < 0.05 in the validation stage, a meta-analysis combining effect estimates from the two datasets was performed using GWAMA v2.0.2 (24). The index of heterogeneity (I2) and the P value based on the Cochran Q test were calculated to assess heterogeneity between studies. A fixed-effect model was used in the absence of heterogeneity between studies (Pheterogeneity > 0.05); otherwise, a random-effect model was adopted. Variants with the same direction of effect in both GWAS datasets and P < 1 × 10−4 in the meta-analysis were considered suggestive risk SNPs (Supplementary Fig. S1).
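The fixed-effect combination and heterogeneity statistics can be illustrated with a generic inverse-variance calculation. This is a sketch under stated assumptions, not GWAMA's implementation: the function names are hypothetical, and the inputs are back-calculated from the rounded ORs and 95% CIs reported for rs78062588 in the two datasets, so the output only approximates the published meta-analysis estimate.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (beta, se) pairs.

    Returns the pooled beta, its SE, Cochran's Q, and I^2 as a fraction.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


studies = [
    log_or_and_se(0.87, 0.81, 0.92),  # OncoArray, rs78062588
    log_or_and_se(0.82, 0.68, 0.98),  # DCEG Lung Cancer Study, rs78062588
]
pooled, pooled_se, q, i2 = fixed_effect_meta(studies)
pooled_or = math.exp(pooled)
print(f"pooled OR = {pooled_or:.2f}, Q = {q:.2f}, I2 = {i2:.0%}")
```

With these rounded inputs the pooled OR is close to the reported 0.86, and Q is smaller than its single degree of freedom, so I2 is truncated to zero, which is the situation in which a fixed-effect model would be retained.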

In addition to overall lung cancer, we also investigated the associations of lung CDG–related SNPs with risk of lung adenocarcinoma and lung squamous cell carcinoma. As the DCEG Lung Cancer Study lacked information on histologic types, we performed this association analysis using a logistic regression model in the OncoArray dataset only. To control the false discovery rate (FDR), we used the Benjamini–Hochberg procedure to calculate the FDR for each variant. Variants with FDR < 0.01 were considered suggestive risk SNPs.
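The Benjamini–Hochberg adjustment used for the histology-specific analysis can be sketched as follows. This is an illustrative standard-library implementation with toy P values, not the code used in the study; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (FDR) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over all ranks at or above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values for four variants.
pvals = [0.001, 0.04, 0.03, 0.002]
fdr = benjamini_hochberg(pvals)
# Apply the FDR < 0.01 threshold described in the text.
suggestive = [i for i, f in enumerate(fdr) if f < 0.01]
```

In this toy input the first and last variants have an adjusted value of 0.004 and pass the threshold, while the other two are adjusted to 0.04 and do not.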

We mapped suggestive risk SNPs to lung CDGs based on the GTEx v6p release, and performed functional prediction for significant nonsynonymous variants using SIFT (25) and PolyPhen-2 (26), as implemented in ANNOVAR (27). For lung CDGs with multiple risk SNPs, conditional and joint association analysis was performed to identify independent signals using genome-wide complex trait analysis (GCTA; ref. 28). During the model selection process, a tested SNP was not selected if its regression R2 on the already selected SNPs was greater than 0.1. A threshold P value of 0.0001 was adopted to identify significant independent hits. SNPs that were significant after multiple testing correction, were not in linkage disequilibrium (LD, r2 < 0.1) with known risk variants, and were located at least 500 kilobases away from known risk variants were considered novel susceptibility SNPs.

Coexpression and pathway enrichment analysis

Expression data on 56,238 genes for 320 normal lung tissues were downloaded from the GTEx website (21). Genome-wide expression correlation analysis was performed using a linear regression model to identify genes coexpressed with significant lung CDGs. Coexpressed genes that remained significant after Bonferroni correction were used for pathway enrichment analysis. We downloaded pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database from MSigDB (29–31), and performed pathway enrichment analysis using the “phyper” function in R (version 3.4.1), which computes a P value for each pathway based on the hypergeometric distribution.
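The hypergeometric enrichment test can be written out directly. The sketch below is a standard-library Python analogue of an upper-tail hypergeometric calculation; the function and argument names are hypothetical, and the numbers in the example are toy values rather than study data.

```python
from math import comb


def enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    n_universe: number of genes tested
    n_pathway:  genes in the universe that belong to the pathway
    n_selected: coexpressed genes drawn from the universe
    n_overlap:  coexpressed genes that fall in the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: all 5 selected genes fall in a 5-gene pathway out of 10 genes.
p_extreme = enrichment_p(10, 5, 5, 5)
```

In R, the same upper tail would correspond to `phyper(n_overlap - 1, n_pathway, n_universe - n_pathway, n_selected, lower.tail = FALSE)`; whether the study used exactly this parameterization is an assumption.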

Associations between independent SNPs and somatic alterations

TCGA datasets of lung adenocarcinoma and lung squamous cell carcinoma were used to model the association between independent SNPs and somatic alterations in lung CDGs (9, 12). Access to germline genotype data generated using the Affymetrix Genome-Wide Human SNP Array 6.0 was applied for and approved in February 2015. Standard quality control and genotype imputation processes have been described previously (32).

We downloaded Mutation Annotation Format (MAF) files derived from whole-exome sequencing, as well as somatic copy numbers calculated using GISTIC2, from the Broad Institute Genome Data Analysis Center (GDAC) Firehose portal (stamp analyses_2016_01_28; ref. 33). For each patient, a lung CDG was considered mutated if one or more somatic mutations mapped to this gene. We also assessed truncation mutations (frameshift insertion/deletion, nonsense, nonstop, and splice site mutations; ref. 34) in pathways that are recurrently altered in lung cancer, including cell cycle, spliceosome, Notch signaling pathway, transcriptional misregulation in cancer, Ras signaling pathway, and PI3K–Akt signaling pathway (11). A pathway was considered mutated if one or more truncation mutations were observed in this pathway. We used logistic regression models to evaluate the association between independent SNPs and the mutational status of lung CDGs or pathways. In the analysis of SCNAs, the somatic copy number of each lung CDG was used as the outcome, and we used linear regression to model the association between independent SNPs and SCNAs. Age, gender, smoking status, clinical stage, and the first 10 PCs were included as covariates. The association analyses between independent SNPs and somatic alterations were performed in lung adenocarcinoma and lung squamous cell carcinoma separately. The Benjamini–Hochberg procedure was used to calculate the FDR for each SNP–lung CDG (or SNP–pathway) pair. Association analysis was conducted using R (version 3.4.1).
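The pathway-level outcome definition can be sketched as a small function that flags a patient as mutated when any truncating mutation falls in a pathway gene. This is an illustration only: the record layout, gene symbols, and patient identifiers are hypothetical, and the set of truncating classes assumes the standard MAF `Variant_Classification` labels for the mutation types named above.

```python
# MAF Variant_Classification labels assumed to correspond to the
# truncation mutation types listed in the text.
TRUNCATING = {
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Nonsense_Mutation",
    "Nonstop_Mutation",
    "Splice_Site",
}


def pathway_status(mutations, pathway_genes, patients):
    """Map each patient to 1 if any truncating mutation hits the pathway, else 0."""
    status = {patient: 0 for patient in patients}
    for patient, gene, variant_class in mutations:
        if (
            patient in status
            and gene in pathway_genes
            and variant_class in TRUNCATING
        ):
            status[patient] = 1
    return status


# Toy records: (patient ID, gene symbol, Variant_Classification).
mutations = [
    ("P1", "GENE_A", "Nonsense_Mutation"),
    ("P2", "GENE_A", "Missense_Mutation"),  # not a truncating class
    ("P3", "GENE_Z", "Frame_Shift_Del"),    # gene outside the pathway
]
status = pathway_status(mutations, {"GENE_A", "GENE_B"}, ["P1", "P2", "P3", "P4"])
```

The resulting 0/1 vector is the kind of binary outcome that a logistic regression on SNP genotype and covariates would take; patients with no recorded mutations, such as `P4` here, are counted as unmutated.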

The OncoArray dataset included 18,444 cases and 14,027 controls. The mean (± SD) age of the subjects was 63.79 ± 10.44 years for cases and 61.77 ± 10.29 years for controls. For the DCEG Lung Cancer Study, a total of 2,427 cases and 1,944 controls were included. Among participants across both studies with known histologic types, there were 6,819 lung adenocarcinomas and 4,490 lung squamous cell carcinomas. Detailed characteristics and clinical features of participants in each dataset are shown in Supplementary Table S1.

Genetic variants associated with lung cancer risk

A total of 348 protein-coding lung CDGs were included from published data (Supplementary Table S2). We identified 139,666 eSNPs and 2,041 nonsynonymous variants of lung CDGs. Among SNPs that passed the quality control process, a total of 234 SNPs were identified in the OncoArray dataset (P < 0.001) and validated in the DCEG Lung Cancer Study (P < 0.05), which were mapped to five lung CDGs (Supplementary Table S3). After conditional analysis, seven independent signals were identified. Among these loci, rs78062588, which was mapped to TPM3 in chromosome 1q21.3, was a new lung cancer susceptibility locus [OR = 0.87, 95% CI: 0.81–0.92, P = 1.55 × 10−5 in the OncoArray dataset; OR = 0.82, 95% CI: 0.68–0.98, P = 3.11 × 10−2 in the DCEG Lung Cancer Study; and OR = 0.86, 95% CI: 0.81–0.91, P = 1.65 × 10−6 in the meta-analysis; Tables 1 and 2; Supplementary Table S3]. In addition, rs71658797 in FUBP1 (1p31.1), rs1655931 and rs2517586 in HLA-A (6p22.1), rs2887532 in KDM5A (12p13.33), rs7359276 and rs7161774 in IREB2 (15q25.1) had been reported by previous GWASs as lung cancer susceptibility loci (Tables 1 and 2; Supplementary Table S3; refs. 5, 6).

Table 1.

The associations between independent variants representing each lung cancer locus and overall lung cancer risk in the OncoArray dataset.

| Cytoband(a) | Location (bp)(b) | SNP | Gene | Effect allele | Reference allele | INFO | EAF in cases | EAF in controls | OR (95% CI) | P |
|---|---|---|---|---|---|---|---|---|---|---|
| 1p31.1 | 77967507 | rs71658797 | FUBP1 | | | 1.00 | 0.11 | 0.10 | 1.14 (1.08–1.20) | 1.04E-06 |
| 1q21.3 | 154566225 | rs78062588(c) | TPM3 | | | 0.95 | 0.06 | 0.07 | 0.87 (0.81–0.92) | 1.55E-05 |
| 6p22.1 | 29897438 | rs1655931 | HLA-A | | | 0.96 | 0.17 | 0.15 | 1.15 (1.10–1.20) | 3.79E-10 |
| 6p22.1 | 30205174 | rs2517586 | HLA-A | | | 0.99 | 0.33 | 0.35 | 0.92 (0.89–0.95) | 8.84E-07 |
| 12p13.33 | 1051495 | rs2887532 | KDM5A | | | 1.00 | 0.17 | 0.18 | 0.93 (0.89–0.97) | 3.90E-04 |
| 15q25.1 | 78892661 | rs7359276 | IREB2 | | | 1.00 | 0.80 | 0.76 | 1.27 (1.22–1.32) | 9.74E-35 |
| 15q25.1 | 79069734 | rs7161774 | IREB2 | | | 0.96 | 0.57 | 0.60 | 0.85 (0.82–0.88) | 9.39E-23 |

Abbreviation: EAF, effect allele frequency.

(a) Cytogenetic band.

(b) SNP position, build 37.

(c) SNPs (or loci) that were first identified as potential lung cancer susceptibility loci in this study.

Table 2.

The associations between independent variants representing each lung cancer locus and overall lung cancer risk in the DCEG Lung Cancer Study.

| Cytoband(a) | Location (bp)(b) | SNP | Gene | Effect allele | Reference allele | INFO | EAF in cases | EAF in controls | OR (95% CI) | P |
|---|---|---|---|---|---|---|---|---|---|---|
| 1p31.1 | 77967507 | rs71658797 | FUBP1 | | | 0.98 | 0.13 | 0.11 | 1.18 (1.04–1.35) | 1.22E-02 |
| 1q21.3 | 154566225 | rs78062588(c) | TPM3 | | | 0.97 | 0.05 | 0.07 | 0.82 (0.68–0.98) | 3.11E-02 |
| 6p22.1 | 29897438 | rs1655931 | HLA-A | | | 0.97 | 0.14 | 0.13 | 1.15 (1.01–1.30) | 3.37E-02 |
| 6p22.1 | 30205174 | rs2517586 | HLA-A | | | 0.98 | 0.35 | 0.37 | 0.89 (0.82–0.98) | 1.34E-02 |
| 12p13.33 | 1051495 | rs2887532 | KDM5A | | | 1.00 | 0.20 | 0.21 | 0.88 (0.79–0.98) | 2.10E-02 |
| 15q25.1 | 78892661 | rs7359276 | IREB2 | | | 1.00 | 0.78 | 0.74 | 1.31 (1.18–1.45) | 1.57E-07 |
| 15q25.1 | 79069734 | rs7161774 | IREB2 | | | 0.96 | 0.63 | 0.66 | 0.87 (0.79–0.95) | 2.71E-03 |

Abbreviation: EAF, effect allele frequency.

(a) Cytogenetic band.

(b) SNP position, build 37.

(c) SNPs (or loci) that were first identified as potential lung cancer susceptibility loci in this study.

Stratified analyses in lung adenocarcinoma and lung squamous cell carcinoma found another nine susceptibility genes, including seven genes that were identified in lung adenocarcinoma and two genes that were identified only in lung squamous cell carcinoma (Fig. 1A and B; Supplementary Table S4). Independent variants derived from conditional analysis are shown in Supplementary Table S5. Of these loci, rs2700389 in KALRN (3q21.1), rs79518818 in MGA (15q15.1), and rs62054832 in EFTUD2 (17q21.31) were first identified as risk loci for lung adenocarcinoma, while rs148797791 in IRF6 (1q32.2) was found as a novel risk locus for lung squamous cell carcinoma. SNPs rs7823498 in NRG1 (8p12), rs10757256 and rs1011970 in CDKN2A (9p21.3), rs79040073 in COPS2 (15q21.1), rs2281925 in ARFGAP1 (20q13.33), and rs17879961 in CHEK2 (22q12.1) had been reported by previous GWASs as lung cancer susceptibility loci (5, 6).

Figure 1.

Manhattan plot showing −log10 (P values) for SNP associations with risk of lung adenocarcinoma and squamous cell carcinoma. A, Lung adenocarcinoma (6,819 cases and 14,027 controls). B, Lung squamous cell carcinoma (4,490 cases and 14,027 controls). Each locus is annotated by its cytoband location and corresponding lung cancer driver genes. The x-axis represents chromosomal location, and the y-axis represents −log10 (P value). The horizontal line denotes FDR < 0.01.

Functional evaluation for significant SNPs

Among the 234 significant SNPs in overall lung cancer, three were nonsynonymous variants. Two additional nonsynonymous variants (rs1136688 in HLA-A and rs17879961 in CHEK2) were identified in lung squamous cell carcinoma (Supplementary Table S6). We predicted the functional consequences of nonsynonymous variants using SIFT and PolyPhen-2 (25, 26). Notably, risk variant rs707910 in HLA-A (NM_001242758, c.G203A) was predicted as deleterious by SIFT and possibly damaging by PolyPhen-2. SNP rs17879961 in CHEK2 (NM_007194, c.T470C) was predicted as tolerated by SIFT and possibly damaging by PolyPhen-2.

To explore biological processes underlying significant lung CDGs, we performed genome-wide coexpression and KEGG pathway enrichment analysis. We identified essential pathways in lung carcinogenesis such as apoptosis, MAPK signaling pathway, spliceosome, cell cycle, and nucleotide excision repair (Supplementary Table S7; ref. 11).

Associations between independent risk SNPs and somatic alterations

We investigated the associations between independent SNPs and somatic alterations in lung CDGs. The protective rs78062588[C] allele (TPM3 in 1q21.3) was associated with increased expression of TPM3 in normal lung tissues (OR = 1.14, P = 0.04) and elevated somatic copy number of TPM3 in TCGA lung adenocarcinomas (OR = 1.16, P = 0.02; Supplementary Fig. S2). However, the analysis of somatic mutations in lung CDGs did not identify any association with P < 0.05. As the mutational frequencies of lung CDGs are relatively low, we further analyzed the associations between independent risk SNPs and truncation mutations at the pathway level. Among patients with lung adenocarcinoma, we found that rs1611182 (HLA-A in 6p22.1), a risk SNP for lung adenocarcinoma, was associated with decreased frequency of truncation mutations in the transcriptional misregulation in cancer pathway (OR = 0.66, 95% CI: 0.50–0.85, P = 1.76 × 10−3, FDR < 0.25; Table 3; Supplementary Table S8; Supplementary Fig. S3). No such association was observed among patients with lung squamous cell carcinoma (OR = 0.91, 95% CI: 0.70–1.18, P = 0.47; Table 3).

Table 3.

Associations between rs1611182 and truncation mutations in the transcriptional misregulation in cancer pathway.

| SNP | Allele(a) | Histological type | Cases(b) | Controls(b) | EAF in cases | EAF in controls | OR (95% CI)(c) | P(c) |
|---|---|---|---|---|---|---|---|---|
| rs1611182 | G/T | Lung ADC | 30/93/81 | 71/144/86 | 0.38 | 0.48 | 0.66 (0.50–0.85) | 1.76E-03 |
| rs1611182 | G/T | Lung SqCC | 50/105/74 | 56/131/66 | 0.45 | 0.48 | 0.91 (0.70–1.18) | 4.66E-01 |

Abbreviations: Lung ADC, lung adenocarcinoma; Lung SqCC, lung squamous cell carcinoma; EAF, effect allele frequency.

(a) Reference/effect allele.

(b) Variant homozygote/heterozygote/wild-type homozygote. Patients with one or more truncation mutations in the corresponding pathway were defined as cases; all other patients were defined as controls.

(c) Adjusted for age, gender, smoking status, clinical stage, and the first ten principal components.
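The effect allele frequencies in Table 3 follow from the genotype counts. The sketch below recomputes them for the lung adenocarcinoma row, assuming that "variant homozygote" denotes carriers of two effect alleles; the function name is hypothetical.

```python
def effect_allele_frequency(hom_effect, het, hom_ref):
    """EAF from counts of effect-allele homozygotes, heterozygotes,
    and reference-allele homozygotes."""
    n_alleles = 2 * (hom_effect + het + hom_ref)
    return (2 * hom_effect + het) / n_alleles


# Genotype counts for rs1611182 in lung adenocarcinoma (Table 3).
adc_cases = effect_allele_frequency(30, 93, 81)
adc_controls = effect_allele_frequency(71, 144, 86)
print(f"cases: {adc_cases:.3f}, controls: {adc_controls:.3f}")
```

Under that assumption the values agree with the tabulated 0.38 and 0.48 after rounding, which supports reading the count triplets in the order given in the footnote.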

This study comprehensively incorporated lung cancer GWASs, lung CDGs, genetics of gene expression, somatic alterations in lung tumors, and functional annotation databases to investigate the associations of CDG-related genetic variants with lung cancer risk. We identified five lung CDGs in overall lung cancer. Subgroup analysis by histologic types further identified seven and two genes in lung adenocarcinoma and lung squamous cell carcinoma, respectively. Genes coexpressed with the identified lung CDGs were involved in essential pathways including cell cycle, MAPK signaling, and nucleotide excision repair pathways. Incorporation of somatic alterations identified lung cancer risk variants that were associated with somatic alterations in lung CDGs or recurrently mutated pathways.

TPM3 is included in the COSMIC Cancer Gene Census. Translocation of TPM3 can form oncogenic fusion proteins, such as TPM3-ROS1 observed in advanced lung adenocarcinoma (35). A previous functional assessment in NIH3T3 cells showed that the TPM3-ALK fusion protein can interact with endogenous tropomyosin, which may induce changes in cell morphology and cytoskeleton organization and confer higher metastatic capacity (36). We found that the protective allele of rs78062588 was associated with increased TPM3 expression in normal lung tissues as well as elevated somatic copy number of TPM3 in lung adenocarcinomas. However, the functional impact of TPM3 on lung cancer development warrants further investigation.

CDKN2A in 9p21.3 encodes several alternatively spliced transcripts, among which are p16 and ARF. p16 is a tumor suppressor that functions as an inhibitor of CDK4 and CDK6 (37). Another tumor suppressor protein, ARF, functions as a stabilizer of the tumor suppressor protein p53. Both p16 and ARF function in cell-cycle G1 control. CDKN2A is recognized as an important tumor suppressor gene. Deletion of CDKN2A was frequently identified in lung tumors (10). In addition, CDKN2A has been identified as a susceptibility gene for lung adenocarcinoma (16). We validated this locus and identified a second signal within CDKN2A. Consistent with its tumor suppressor role, the risk alleles of the independent SNPs were associated with decreased expression of CDKN2A in normal lung tissues.

The transcription factor interferon regulatory factor 6 (IRF6) was identified as a significantly mutated gene in TCGA lung squamous cell carcinomas (10). IRF6 has an essential role in epidermal development. It is induced in differentiation through a Notch-dependent mechanism. Downregulation of IRF6 in epithelial squamous cell carcinomas promotes ras-induced tumor formation, and reintroduction of IRF6 strongly inhibits cell growth (38, 39). The tumor suppressor role of IRF6 has also been demonstrated in vulvar squamous cell carcinoma (40). In addition, elevated IRF6 expression in nasopharyngeal carcinomas suppressed cell proliferation and growth (41). We identified IRF6 as a susceptibility gene for lung squamous cell carcinoma. Consistent with the tumor suppressor role of IRF6, the risk allele of rs148797791 was associated with decreased expression of IRF6 in normal lung tissues. These results indicate that germline variants might contribute to lung cancer risk through downregulation of IRF6.

Genes coexpressed with the identified lung CDGs were enriched in essential pathways such as apoptosis, MAPK signaling pathway, spliceosome, cell cycle, and nucleotide excision repair. A comprehensive molecular profiling of lung adenocarcinoma demonstrated recurrent somatic alterations in the cell cycle and MAPK signaling pathways (9, 42). In addition, deregulated RNA splicing is involved in lung adenocarcinoma, and the cell-cycle pathway is involved in both lung adenocarcinoma and lung squamous cell carcinoma (11, 42).

We comprehensively collected 348 lung CDGs from three databases, and tested associations between functional SNPs of lung CDGs and risk of lung cancer in large-scale lung cancer GWASs of Europeans. We identified five novel susceptibility loci of lung cancer, and validated nine loci that had been reported by previous lung cancer GWASs. These results showed that genetic variants in lung CDGs contribute to lung cancer susceptibility. Our findings might help to unravel biological functions of lung cancer susceptibility loci.

No potential conflicts of interest were disclosed.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.

Conception and design: Y. Wang, M. Zhu, O. Melander, S. Zienolddiny, H. Shen

Development of methodology: Y. Wang

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D. Albanes, S. Lam, A. Tardon, C. Chen, G.E. Goodman, S.E. Bojesen, M.T. Landi, M. Johansson, A. Risch, H.-E. Wichmann, H. Bickeboller, D.C. Christiani, G. Rennert, S.M. Arnold, P. Brennan, J.K. Field, S. Shete, L. Le Marchand, O. Melander, H. Brunnstrom, G. Liu, R.J. Hung, A.S. Andrew, L.A. Kiemeney, S. Zienolddiny, K. Grankvist, M. Johansson, N.E. Caporaso, P.J. Woll, P. Lazarus, M.B. Schabath, M.C. Aldrich, C.I. Amos, H. Shen

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Y. Wang, I.P. Gorlov, M. Zhu, J. Dai, G.E. Goodman, D.C. Christiani, G. Rennert, S.M. Arnold, R.J. Hung, N.E. Caporaso, H. Shen

Writing, review, and/or revision of the manuscript: Y. Wang, O.Y. Gorlova, J. Dai, D. Albanes, S. Lam, A. Tardon, C. Chen, G.E. Goodman, S.E. Bojesen, M.T. Landi, M. Johansson, A. Risch, H.-E. Wichmann, H. Bickeboller, D.C. Christiani, G. Rennert, S.M. Arnold, P. Brennan, J.K. Field, S. Shete, L. Le Marchand, O. Melander, H. Brunnstrom, G. Liu, R.J. Hung, L.A. Kiemeney, K. Grankvist, M. Johansson, N.E. Caporaso, P.J. Woll, M.B. Schabath, M.C. Aldrich, V.L. Stevens, H. Ma, G. Jin, Z. Hu, C.I. Amos, H. Shen

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y. Wang, G.E. Goodman, M. Johansson, D.C. Christiani, J.K. Field, G. Liu, S. Zienolddiny, M. Johansson, M.B. Schabath, C.I. Amos, H. Shen

Study supervision: R.J. Hung, L.A. Kiemeney, G. Jin, H. Shen

The authors thank the study participants and research staff for their contributions and commitment to this study. H. Shen was awarded grants from the National Natural Science Foundation of China (81820108028, 81521004). C.I. Amos was awarded grants from the NIH (U19 CA148127, U19 CA203654) and the Cancer Prevention Research Institute of Texas (RR170048). H. Ma was awarded a grant from the National Natural Science Foundation of China (81922061). D.C. Christiani was awarded a grant from the NIH (U01 CA209414). C. Chen was awarded grants from the NIH (U01-CA063673, UM1-CA167462, and U01-CA167462). M.B. Schabath was awarded the Moffitt Cancer Center Support Grant (P30 CA076292) and SPORE in Lung Cancer (P50 CA119997).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Bray
F
,
Ferlay
J
,
Soerjomataram
I
,
Siegel
RL
,
Torre
LA
,
Jemal
A
. 
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
.
CA Cancer J Clin
2018
;
68
:
394
424
.
2.
Tokuhata
GK
,
Lilienfeld
AM
. 
Familial aggregation of lung cancer in humans
.
J Natl Cancer Inst
1963
;
30
:
289
312
.
3.
Lichtenstein
P
,
Holm
NV
,
Verkasalo
PK
,
Iliadou
A
,
Kaprio
J
,
Koskenvuo
M
, et al
Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland
.
N Engl J Med
2000
;
343
:
78
85
.
4.
Matakidou
A
,
Eisen
T
,
Houlston
RS
. 
Systematic review of the relationship between family history and lung cancer risk
.
Br J Cancer
2005
;
93
:
825
33
.
5.
Bosse
Y
,
Amos
CI
. 
A decade of GWAS results in lung cancer
.
Cancer Epidemiol Biomarkers Prev
2018
;
27
:
363
79
.
6.
Dai
J
,
Lv
J
,
Zhu
M
,
Wang
Y
,
Qin
N
,
Ma
H
, et al
Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations
.
Lancet Respir Med
2019
;
7
:
881
91
.
7.
Sampson
JN
,
Wheeler
WA
,
Yeager
M
,
Panagiotou
O
,
Wang
Z
,
Berndt
SI
, et al
Analysis of heritability and shared heritability based on genome-wide association studies for thirteen cancer types
.
J Natl Cancer Inst
2015
;
107
:
djv279
.
8.
Dai
J
,
Shen
W
,
Wen
W
,
Chang
J
,
Wang
T
,
Chen
H
, et al
Estimation of heritability for nine common cancers using data from genome-wide association studies in Chinese population
.
Int J Cancer
2017
;
140
:
329
36
.
9.
Cancer Genome Atlas Research Network
. 
Comprehensive molecular profiling of lung adenocarcinoma
.
Nature
2014
;
511
:
543
50
.
10.
Campbell
JD
,
Alexandrov
A
,
Kim
J
,
Wala
J
,
Berger
AH
,
Pedamallu
CS
, et al
Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas
.
Nat Genet
2016
;
48
:
607
16
.
11.
Swanton
C
,
Govindan
R
. 
Clinical implications of genomic discoveries in lung cancer
.
N Engl J Med
2016
;
374
:
1864
73
.
12.
Cancer Genome Atlas Research Network
. 
Comprehensive genomic characterization of squamous cell lung cancers
.
Nature
2012
;
489
:
519
25
.
13. Rahman N. Realizing the promise of cancer predisposition genes. Nature 2014;505:302–8.
14. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic germline variants in 10,389 adult cancers. Cell 2018;173:355–70.
15. Wei R, Cao L, Pu H, Wang H, Zheng Y, Niu X, et al. TERT polymorphism rs2736100-C is associated with EGFR mutation-positive non-small cell lung cancer. Clin Cancer Res 2015;21:5173–80.
16. McKay JD, Hung RJ, Han Y, Zong X, Carreras-Torres R, Christiani DC, et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat Genet 2017;49:1126–32.
17. Landi MT, Chatterjee N, Yu K, Goldin LR, Goldstein AM, Rotunno M, et al. A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am J Hum Genet 2009;85:679–91.
18. Wang Y, Wu W, Zhu M, Wang C, Shen W, Cheng Y, et al. Integrating expression-related SNPs into genome-wide gene- and pathway-based analyses identified novel lung cancer susceptibility genes. Int J Cancer 2018;142:1602–10.
19. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.
20. Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Tamborero D, Schroeder MP, Jene-Sanz A, et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat Methods 2013;10:1081–2.
21. GTEx Consortium. The genotype-tissue expression (GTEx) project. Nat Genet 2013;45:580–5.
22. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol 2016;17:122.
23. Dai J, Li Z, Amos CI, Hung RJ, Tardon A, Andrew AS, et al. Systematic analyses of regulatory variants in DNase I hypersensitive sites identified two novel lung cancer susceptibility loci. Carcinogenesis 2019;40:432–40.
24. Magi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 2010;11:288.
25. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 2012;40:W452–7.
26. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248–9.
27. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.
28. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011;88:76–82.
29. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 1999;27:29–34.
30. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
31. Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 2015;1:417–25.
32. Wang Y, Wang C, Zhang J, Zhu M, Zhang X, Li Z, et al. Interaction analysis between germline susceptibility loci and somatic alterations in lung cancer. Int J Cancer 2018;143:878–85.
33. Marx V. Drilling into big cancer-genome data. Nat Methods 2013;10:293–7.
34. Kanchi KL, Johnson KJ, Lu C, McLellan MD, Leiserson MD, Wendl MC, et al. Integrated analysis of germline and somatic variants in ovarian cancer. Nat Commun 2014;5:3156.
35. Zhu YC, Liao XH, Wang WX, Xu CW, Zhuang W, Wei JG, et al. Dual drive coexistence of EML4-ALK and TPM3-ROS1 fusion in advanced lung adenocarcinoma. Thorac Cancer 2018;9:324–7.
36. Armstrong F, Lamant L, Hieblot C, Delsol G, Touriol C. TPM3-ALK expression induces changes in cytoskeleton organisation and confers higher metastatic capacities than other ALK fusion proteins. Eur J Cancer 2007;43:640–6.
37. Ohtani N, Yamakoshi K, Takahashi A, Hara E. The p16INK4a-RB pathway: molecular link between cellular senescence and tumor suppression. J Med Invest 2004;51:146–53.
38. Botti E, Spallone G, Moretti F, Marinari B, Pinetti V, Galanti S, et al. Developmental factor IRF6 exhibits tumor suppressor activity in squamous cell carcinomas. Proc Natl Acad Sci U S A 2011;108:13710–5.
39. Restivo G, Nguyen BC, Dziunycz P, Ristorcelli E, Ryan RJ, Ozuysal OY, et al. IRF6 is a mediator of Notch pro-differentiation and tumour suppressive function in keratinocytes. EMBO J 2011;30:4571–85.
40. Rotondo JC, Borghi A, Selvatici R, Magri E, Bianchini E, Montinari E, et al. Hypermethylation-induced inactivation of the IRF6 gene as a possible early event in progression of vulvar squamous cell carcinoma associated with lichen sclerosus. JAMA Dermatol 2016;152:928–33.
41. Xu L, Huang TJ, Hu H, Wang MY, Shi SM, Yang Q, et al. The developmental transcription factor IRF6 attenuates ABCG2 gene expression and distinctively reverses stemness phenotype in nasopharyngeal carcinoma. Cancer Lett 2018;431:230–43.
42. Ji Y, Zheng MF, Ye SG, Chen JY, Chen YJ. PTEN and Ki67 expression is associated with clinicopathologic features of non-small cell lung cancer. J Biomed Res 2014;28:462–7.