Abstract
Genome-wide association studies (GWAS) have successfully identified genetic factors robustly associated with lung cancer. This review aims to synthesize the literature in this field and accelerate the translation of GWAS discoveries into results that are closer to clinical applications. Published GWAS on lung cancer susceptibility, survival, and response to treatment are presented chronologically. The most important results are tabulated to provide a concise overview. GWAS have reported 45 lung cancer susceptibility loci with varying strength of evidence and highlighted suspected causal genes at each locus. Some genetic risk loci have been refined to more homogeneous subgroups of lung cancer patients in terms of histologic subtypes, smoking status, gender, and ethnicity. Overall, these discoveries are an important step toward the future development of new therapeutic targets and biomarkers to personalize and improve the quality of care for patients. GWAS results are on the verge of offering new tools for targeted screening in high-risk individuals, but more research is needed if GWAS are to repay the investment made in them. Complementary genomic datasets and functional studies are needed to refine the underlying molecular mechanisms of lung cancer preliminarily revealed by GWAS and to reach results that are medically actionable. Cancer Epidemiol Biomarkers Prev; 27(4); 363–79. ©2018 AACR.
See all articles in this CEBP Focus section, “Genome-Wide Association Studies in Cancer.”
Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide in both men and women (1, 2). Our molecular understanding of this disease is still developing. Although it has long been recognized that lung cancer runs strongly in families (3–5), the specific genes responsible for enhanced risk are just starting to be revealed. Before the era of genome-wide association studies (GWAS), success in identifying genes responsible for lung cancer was limited. Candidate susceptibility genes coding for enzymes involved in the activation, detoxification, and repair of damage caused by tobacco smoke, as well as genes in inflammatory and cell-cycle pathways, have been extensively studied (6, 7). Many of these candidate gene studies are either preliminary or controversial (8). Rare germline mutations in TP53, RB1, and EGFR have been shown to confer inherited predisposition to lung cancer (9–11). Fine mapping of a genome-wide linkage peak on 6q23-25 also identified RGS17 as a predisposing gene (12). With the arrival of GWAS approximately 10 years ago, it became possible to interrogate the human genome more comprehensively for lung cancer susceptibility genes.
Chronologic Presentation of Published GWAS on Lung Cancer
GWAS have identified genetic factors robustly associated with lung cancer. Tables 1 and 2 provide a chronologic presentation of published GWAS on lung cancer in European and Asian populations, respectively, and summarize the susceptibility loci identified. During the last decade, GWAS have evolved from finding lung cancer loci per se to a more refined search strategy focused on specific subgroups of lung cancer patients. Analysis strategies have also advanced, moving from single-marker analyses to pathway-based and variant prioritization approaches. GWAS have also been performed to find genetic loci associated with lung cancer survival, response to conventional therapies, and cancer at multiple sites. GWAS by environmental exposures and genome-wide epistasis analyses are also emerging. This review aims to provide a concise overview of this literature. All GWAS on lung cancer susceptibility, survival, and response to treatment reported in the literature were compiled and manually curated by the authors. At the time of writing this review, a PubMed search with the keywords “GWAS” and “lung cancer” was performed to identify any GWAS that had been missed. Finally, the list of GWAS and lung cancer susceptibility loci was further refined using the GWAS catalog (13). Please note that we have attempted to include all loci reported in the literature without quality assessment or exclusion criteria based on the magnitude of effects, sample size, or other criteria. Lung cancer susceptibility loci are reported on the basis of the authors' interpretation in the original articles.
Reference | Studya | Sample size (cases/controls) | Disease/trait | Platform (# SNPs) | Region (size) | Gene | Key SNPs |
---|---|---|---|---|---|---|---|
Hung et al. (14) | IARC | 1,926/2,522 | Lung cancer | Illumina HumanHap300 (310,023) | 15q25 (182 kb) | CHRNA5 | rs1051730 |
Toronto | 330/453 | CHRNA3 | rs8034191 | ||||
EPIC | 781/1,578 | CHRNB4 | rs16969968 (D398N) | ||||
CARET | 764/1,515 | IREB2 | |||||
Liverpool | 403/814 | PSMA4 | |||||
HUNT/Tromsø | 235/392 | HYKK (AGPHD1) | |||||
Thorgeirsson et al. (15) | Icelandic smokers | 10,995 | Smoking quantity | Illumina HumanHap300 (306,207) | 15q24 | CHRNA5 | rs1051730 |
Icelandic smokers | 2,950 | Nicotine dependence | CHRNA3 | ||||
Iceland/Spain/The Netherlands | 1,024/32,244 | Lung cancer | CHRNB4 | ||||
Amos et al. (16) | Texas | 1,154/1,137 | Non-small cell lung cancer | Illumina HumanHap300 (315,450) | 15q25.1 (88 kb) | CHRNA5 | rs1051730 |
Texas replication | 711/632 | CHRNA3 | rs8034191 | ||||
UK | 2,013/3,062 | PSMA4 | rs931794 | ||||
HYKK | |||||||
Liu et al. (17) | GELCC | 194/219 | Familial lung cancer | Affymetrix 500K (399,377) or 6.0 (722,376) | 15q24-25.1 (160 kb) | CHRNA5 | rs8034191 |
CHRNA3 | rs1051730 | ||||||
CHRNB4 | rs16969968 (D398N) | ||||||
IREB2 | rs578776 | ||||||
PSMA4 | |||||||
HYKK | |||||||
McKay et al. (19) | Central Europe/Toronto/HUNT2-Tromso/CARET | 2,971/3,746 | Lung cancer | Illumina HumanHap300 (315,194) | 15q25.1 | rs1051730 | |
EPIC/Szczecin/CARET2/Liverpool | 2,899/5,573 | 5p15.33 | TERT | rs402710 | |||
CLPTM1L | rs2736100 | ||||||
Wang et al. (18) | British cohort | 1,952/1,438 | Lung cancer | Illumina HumanHap550 (511,919) | 15q25.1 | BAG6 | rs8042374 |
IARC | 1,989/2,625 | 6p21.33 (627 kb) | MSH5 | rs3117582 | |||
Texas | 1,154/1,137 | 5p15.33 (60 kb) | CLPTM1L | rs3131379 | |||
UK replication | 2,448/2,983 | rs401681 | |||||
Broderick et al. (20) | GELCAPS phase 1 | 1,952/1,438 | Lung cancer | Illumina HumanHap550 (511,919) | 15q25.1 (248 kb) | CHRNA3 | rs12914385 |
GELCAPS phase 2 | 2,465/3,005 | rs938682 | |||||
rs8042374 | |||||||
rs8034191 | |||||||
Meta-analysis (GELCAPS, IARC, Texas) | 7,560/8,205 | 5p15.33 (60 kb) | CLPTM1L | rs4975616 | |||
TERT | |||||||
6p21.33 | BAG6 | rs3117582 | |||||
TNXB | rs1150752 | ||||||
Landi et al. (21) | NCI (EAGLE, ATBC, PLCO, CPS-II) | 5,739/5,848 | Lung cancer | Illumina (515,922) | 15q25 | CHRNA3 | rs12914385 |
Adenocarcinoma | CHRNA5 | rs1051730 | |||||
Squamous cell | HYKK | rs8034191 | |||||
Meta-analysis (UK, Central Europe, Texas, DeCODE Genetics, HGF Germany, CARET, HUNT2/Tromso, Canada, France, Estonia) | 13,300/19,666 | Small cell | 5p15 | TERT | rs2736100 | ||
CLPTM1L | rs4635969 | ||||||
rs31489 | |||||||
6p21 | BAG6 | rs3117582 | |||||
APOM | |||||||
Timofeeva et al. (22) | TRICLb | 14,900/29,485 | Lung cancer | Illumina HumanHap300 (318,094) + HumanHap550 or 610Quad (217,914) | 15q25 | CHRNA5 | rs1051730 |
Han Chinese | 2,338/3,077 | CHRNA3 | rs8034191 | ||||
CHRNB4 | rs6495309 | ||||||
HYKK | rs680244 | ||||||
rs6495306 | |||||||
rs951266 | |||||||
5p15.33 | TERT | rs2736100 | |||||
CLPTM1L | rs401681 | ||||||
rs2853677 | |||||||
rs465498 | |||||||
6p21-22 | BAG6 | rs3117582 | |||||
MSH5 | rs2523546 | ||||||
rs2523571 | |||||||
Squamous cell carcinoma | 12p13.33 | RAD52 | rs10849605 | ||||
rs3748522 | |||||||
9p21.3 | CDKN2A | rs1333040 | |||||
CDKN2B | rs1537372 | ||||||
ANRIL (CDKN2B-AS1) | |||||||
2q32.1 | NUP35 | rs11683501 | |||||
Wang et al. (23) | MDACC ICR NCI IARC | 11,348/15,861 | Lung cancer Adenocarcinoma Squamous cell carcinoma | Illumina 317, 317+240S, 370Duo, 550, 610 or 1M | 13q13.1 | BRCA2 | rs11571833 (K3326X) |
EPIC | 10,246/38,295 | FRY | rs56084662 | ||||
ICR | |||||||
IARC | |||||||
Toronto | |||||||
22q12.1 | CHEK2 | rs17879961 (I157T) | |||||
3q28 | TP63 | rs13314271 | |||||
rs4488809 | |||||||
McKay, Hung et al. (25) | OncoArrayc | 29,863/55,586 | Lung cancer | Oncoarray (10,439,017) | 1p31.1 | FUBP1 | rs71658797 |
6q27 | RNASET2 | rs6920364 | |||||
8p21.1 | EPHX2 CHRNA2 | rs11780471 | |||||
13q13.1 | BRCA2 | rs11571833 | |||||
15q21.1 | SEMA6D | rs66759488 | |||||
15q25.1 | CHRNA5 | rs55781567 | |||||
19q13.2 | CYP2A6 | rs56113850 | |||||
11,245/54,619 | Adenocarcinoma | 3q28 | TP63 | rs13080835 | |||
5p15.33 | TERT | rs7705526 | |||||
8p12 | NRG1 | rs4236709 | |||||
9p21.3 | MTAP CDKN2A | rs885518 | |||||
10q24.3 | OBFC1 | rs11591710 | |||||
11q23.3 | MPZL3 AMICA1 | rs1056562 | |||||
15q21.1 | SECISBP2L | rs77468143 | |||||
20q13.33 | RTEL1 | rs41309931 | |||||
7,704/54,763 | Squamous cell carcinoma | 6p21.33 | MHC | rs116822326 | |||
12p13.33 | RAD52 | rs7953330 | |||||
22q12.1 | CHEK2 | rs17879961 | |||||
Never smokers | |||||||
Li et al. (34) | Mayo | 377/377 | Lung cancer in never smokers | Illumina HumanHap370 and HumanHap610 (331,918) | 13q31.3 | GPC5 | rs2352028 |
MDACC | 328/407 | rs2352029 | |||||
Harvard | 92/161 | ||||||
UCLA | 91/439 | ||||||
Pathway-based GWAS | |||||||
Shi et al. (40) | NCI | 5,355/4,344 | Lung cancer Squamous cell carcinoma Adenocarcinoma, small cell | Illumina (19,082) (pathway-based analysis) | 12p13.33 | RAD52 | rs6489769 |
UK1 | 592/2,699 | ||||||
Texas | 306/1,137 | ||||||
UK2 | 1,038/933 | ||||||
Spitz et al. (41) | Texas | 451/508 | NSCLC in never smokers | Illumina (11,737) (pathway-based analysis) | 12q13 | ACVR1B | rs12809597 |
Mayo | 303/311 | NR4A1 | rs2701129 | ||||
rs1882119 | |||||||
Wang et al. (42) | ICR | 12,160/16,838 | Lung cancer | Illumina (826 functional SNPs) (pathway-based analysis) | 6p21.33 | MSH5 | rs3115672 |
MDACC | GTF2H4 | rs114596632 | |||||
IARC | |||||||
NCI | |||||||
Toronto | |||||||
HGF Germany | |||||||
5q14.2 | XRCC4 | rs1056503 | |||||
rs2035990 | |||||||
Variant prioritization approaches | |||||||
Li et al. (44) | Texas | 1,154/1,137 | Lung cancer + intermediate phenotype (cigarettes per day) | Illumina HumanHap300 | 15q24-25.1 | HYKK | rs12914385 |
CHRNA3 | |||||||
CHRNA5 | |||||||
CHRNB4 | |||||||
19q13 | TGFB1 | rs1800469 | |||||
B9D2 | rs1982072 | ||||||
rs2241714 | |||||||
3p26 | rs1444056 | ||||||
rs1403124 | |||||||
Poirier et al. (45) | Toronto | 331/499 | Lung cancer + family history | Illumina HumanHap300, 550 | 10q23.33 | FFAR4 | rs12415204 |
IARC | 1,964/2,610 | ||||||
MDACC | 1,154/1,137 | ||||||
HMGU | 504/484 | ||||||
NCI | 5,699/5,815 | ||||||
MSH-PMH | 1,073/939 | ||||||
MEC | 215/225 | ||||||
Harvard | 523/497 | ||||||
4p15.2 | KCNIP4 | rs1158970 | |||||
Brenner et al. (46) | TRICLb | 5,061 (SQ)/6,756 (AD)/2,216 (SCLC)/33,456 | Squamous cell carcinoma | Illumina HumanHap300, 550, 610 | 4p15.2 | KCNIP4 | rs6448050 |
ILCCO | 625 (SQ)/1,417 (AD)/369 (SCLC)/2,966 | rs9799795 | |||||
Adenocarcinoma | 18q12.1 | GAREM | rs11662168 | ||||
rs3786309 | |||||||
Cross-cancer loci | |||||||
Hung et al. (55) | GAME-ON/GECCO | 64,591/74,467 | Cross-cancer | Illumina, Affymetrix (12,370) (pathway-based analysis) | 12q24 | SH2B3 | rs3184504 (W262R) |
5p15 | TERT | rs2736100 | |||||
Fehringer et al. (56) | GAME-ON/GECCO | 61,851/61,820 | Cross-cancer | Illumina, Affymetrix (9,916,564) | 1q22 | MUC1 | rs1057941 |
Europeans | 55,789/330,490 | ADAM15 | rs4072037 | ||||
Others | 18,152/21,410 | THBS3 | |||||
9p21.3 | CDKN2B-AS1 | rs62560775 | |||||
13q13.1 | BRCA2 | rs11571833 |
Abbreviations: ATBC, Alpha-Tocopherol, Beta-Carotene Cancer Prevention study; CARET, Beta-Carotene and Retinol Efficacy Trial; CPS-II, Cancer Prevention Study II Nutrition Cohort; EAGLE, Environment and Genetics in Lung Cancer Etiology; EPIC, European Prospective Investigation in Cancer and Nutrition; GAME-ON, The Genetic Associations and Mechanisms in Oncology; GECCO, The Genetic and Epidemiology of Colorectal Cancer Consortium; GELCAPS, Genetic Lung Cancer Predisposition Study; GELCC, Genetic Epidemiology of Lung Cancer Consortium; HGF Germany, Helmholtz-Gemeinschaft Deutscher Forschungszentren Lung Cancer GWAS; HMGU, Germany Study; HUNT, Health Study of North-Trondelag; IARC, International Agency for Research on Cancer; ICR, Institute of Cancer Research; ILCCO, International Lung Cancer Consortium; MDACC, M.D. Anderson Cancer Center; MEC, Multi-Ethnic Cohort; MSH-PMH, Mount Sinai Hospital and Princess Margaret Hospital in Toronto; NCI, National Cancer Institute; PLCO, Prostate, Lung, Colon, Ovary Screening Trial; TRICL, Transdisciplinary Research In Cancer of the Lung; UCLA, University of California in Los Angeles.
aItalic text indicates replication cohorts.
bTRICL includes the following cohorts: MDACC, Liverpool, ICR, Toronto, IARC (Central Europe, CARET, Estonia, France, HUNT2/Tromso), DeCODE Genetics, HGF Germany, Harvard, NCI (EAGLE, ATBC, PLCO, CPS-II).
cOncoArray includes the following cohorts: CARET, PLCO, MEC, NELCS (New England Lung Cancer Study), Harvard, MDACC, Tampa (Tampa Lung Cancer Study), BioVU (Vanderbilt 2), LCRI-DOD, TLC (Total Lung Cancer: Molecular Epidemiology of Lung Cancer Survival), Canada (Canadian Screening Study), MSH-PMH, CAPUA (Cancer de Pulmon en Asturias), ATBC, NSHDC (Northern Sweden Health and Disease Cohort), MDCS (The Malmö Diet and Cancer Study), EPIC, Liverpool, Norway (Norway Lung Cancer Study), EAGLE, Nijmegen (The Nijmegen Lung Cancer Study), NICCC-LCA (Clalit National Israeli Cancer Control Center Lung Cancer Study), L2 (The IARC L2 Study), Copenhagen (Copenhagen Lung Cancer Study), Germany, ReSoLuCENT.
Reference | Studya | Sample size (cases/controls) | Disease/trait | Platform (# SNPs) | Region (size) | Gene | Key SNPs |
---|---|---|---|---|---|---|---|
Yoon et al. (26) | Korea | 621/1,541 | NSCLC | Affymetrix 5.0 (246,758) | 3q29 | C3orf21 | rs2131877 |
Korea replication | 804/1,470 | rs10433328 | |||||
rs952481 | |||||||
rs4677657 | |||||||
5p15 | TERT | rs2736100 | |||||
CLPTM1L | rs402710 | ||||||
rs401681 | |||||||
Miki et al. (27) | Japanese | 1,004/1,900 | Adenocarcinoma | Illumina HumanHap610-Quad and HumanHap550 (432,024) | 5p15 | TERT | rs2736100 |
Japanese | 525/7,678 | ||||||
Korean | 569/1,470 | ||||||
3q28 | TP63 | rs10937405 | |||||
rs4488809 | |||||||
rs9816619 | |||||||
rs4600802 | |||||||
Hu et al. (28) | Han Chinese (Nanjing, Beijing, Shanghai) | 2,331/3,077 | Lung cancer | Affymetrix 6.0 (591,370) | 3q28 | TP63 | rs4488809 |
1st stage replication | 2,283/2,243 | rs10937405 | |||||
2nd stage replication | 4,030/4,166 | ||||||
5p15.33 | TERT | rs465498 | |||||
CLPTM1L | rs2736100 | ||||||
13q12.12 | MIPEP | rs753955 | |||||
TNFRSF19 | |||||||
22q12.2 | MTMR3 | rs17728461 | |||||
HORMAD2 | rs36600 | ||||||
LIF | |||||||
Dong et al. (29) | Chinese (GWAS) | 2,331/3,077 | Lung cancer | Affymetrix 6.0 (591,370) | 10p14 | GATA3 | rs1663689 |
Chinese (stage 1) | 2,283/2,243 | ||||||
Chinese (stage 2) | 5,153/5,240 | ||||||
5q32 | PPP2R2B | rs2895680 | |||||
STK32A | |||||||
DPYSL3 | |||||||
20q13.2 | CYP24A1 | rs4809957 | |||||
rs2296239 | |||||||
5q31.1 | IL3 | rs247008 | |||||
CSF2 | |||||||
P4HA2 | |||||||
SLC22A5 | |||||||
ACSL6 | |||||||
1p36.32 | AJAP1 | rs9439519 | |||||
NPHP4 | |||||||
Shiraishi et al. (32) | Japanese | 1,695/5,333 | Adenocarcinoma | Illumina OmniExpress & Omni1-Quad (538,166) | 5p15.33 | TERT | rs2736100 |
1st validation | 2,955/7,036 | rs2853677 | |||||
2nd validation | 1,379/1,166 | ||||||
3q28 | TP63 | rs10937405 | |||||
17q24.3 | BPTF | rs7216064 | |||||
6p21.3 | BTNL2 | rs3817963 | |||||
Dong et al. (30) | Han Chinese | 833/3,094 | Squamous cell carcinoma | Affymetrix 6.0 (570,009) | 12q23.1 | NR1H4 | rs12296850 |
Replication 1 | 822/2,243 | SLC17A8 | |||||
Replication 2 | 1,401/4,166 | ||||||
Jin et al. (31) | Han Chinese | 1,341/1,982 | Lung cancer | Illumina HumanExome (72,423) | 6p21.33 | PRRC2A (BAT2) | rs9469031 (P515L) |
Replication 1 | 1,115/1,246 | FKBPL | rs200847762 (P137L) | ||||
Replication 2 | 3,584/3,669 | ||||||
20q11.21 | BPIFB1 | rs6141383 (V284M) | |||||
6p22.2 | HIST1H1E | rs2298090 (L152R) | |||||
Never smokers | |||||||
Hsiung et al. (35) | GELAC (Han Chinese) | 584/585 | Lung adenocarcinoma in never-smoking Asian females | Illumina HumanCNV370-Duo and HumanHap610 Quad (457,504) | 5p15.33 | CLPTM1L | rs2736100 |
GELAC (replication) | 610/560 | TERT | |||||
CAMSCH | 287/287 | ||||||
SNU | 259/293 | ||||||
SWHS | 209/213 | ||||||
WHLCS | 207/207 | ||||||
KNUH | 121/119 | ||||||
KUMC | 95/87 | ||||||
GEL-S | 193/546 | ||||||
NJLCS | 203/203 | ||||||
Lan et al. (36) | Female Lung Cancer Consortium in Asia | 5,510/4,544 | Lung cancer in never smokers | Illumina (512,226) | 10q25.2 | VTI1A | rs7086803 |
1,099/2,913 | rs11196080 | ||||||
6q22.2 | ROS1, DCBLD1 | rs9387478 | |||||
6p21.32 | HLA class II region | rs2395185 | |||||
5p15.33 | TERT | rs2736100 | |||||
3q28 | TP63 | rs4488809 | |||||
17q24.3 | BPTF | rs7216064 | |||||
Wang et al. (37) | Female Lung cancer Consortium in Asia | 6,877/6,277 | Lung cancer in never smokers | Illumina (7,564,751) | 6p21.1 | FOXP4 | rs7741164 |
5,878/7,046 | FOXP4-AS1 | ||||||
9p21.3 | CDKN2B | rs72658409 | |||||
CDKN2B-AS1 | |||||||
12q13.13 | ACVR1B | rs116101143 | |||||
Ahn et al. (38) | Korean | 446/497 | NSCLC in never smokers | Affymetrix 6.0 (474,503) | 18p11.22 | FAM38B | rs11080466 |
434/1,000 | (PIEZO2) | rs11663246 | |||||
APCDD1 | |||||||
NAPG | |||||||
Kim et al. (39) | Korean | 285/1,455 | Lung cancer in never smoker women | Affymetrix 5.0 (331,088) | 2p16.3 | NRXN1 | rs10187911 |
Replication 1 | 293/495 | ||||||
Replication 2 | 546/744 | ||||||
Genome-wide epistasis | |||||||
Chu et al. (52) | Han Chinese | 2,331/3,077 | Lung cancer | Affymetrix 6.0 (591,370) (epistasis) | 2q32.2 | HIBCH | rs2562796 |
Replication 1 | 1,534/1,489 | INPP1 | rs16832404 | ||||
Replication 2 | 2,512/2,449 | PMS1 | |||||
STAT1 | |||||||
Cross-cancer loci | |||||||
Jin et al. (54) | Han Chinese | 5,368/4,006 | Cross-cancer | Affymetrix 6.0 | 6p21.1 | LRFN2 | rs2494938 |
Han Chinese | 9,001/11,436 | ||||||
7p15.3 | SP4 | rs2285947 | |||||
DNAH11 |
Abbreviations: CAMSCH, Chinese Academy of Medical Sciences Cancer Hospital Study; GELAC, Genetic Epidemiological Study of Lung Adenocarcinoma; GEL-S, Genes and Environment in Lung Cancer, Singapore study; KNUH, Kyungpook National University Hospital Study; KUMC, Korea University Medical Center Study; NJLCS, Nanjing Lung Cancer Study; SNU, Seoul National University Study; SWHS, Shanghai Women's Health Cohort Study; WHLCS, Wuhan Lung Cancer Study.
aItalic text indicates replication cohorts.
GWAS on Lung Cancer Susceptibility
GWAS in European populations
The first GWAS on lung cancer were reported in 2008. Three independent studies identified a susceptibility locus on chromosome 15q. Hung and colleagues (14) found two SNPs strongly associated with lung cancer on chromosome 15q25. Further genotyping in this region revealed many SNPs in tight linkage disequilibrium (LD) showing evidence of association. Six genes are located in this region, including three nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, and CHRNB4). Interestingly, no appreciable variation in risk was found across smoking categories or histologic subtypes of lung cancer. In a second GWAS, a SNP within the CHRNA3 gene was strongly associated with smoking quantity and nicotine dependence (15). The same SNP was also strongly associated with lung cancer. These results suggest that the variant on chromosome 15q25 confers risk of lung cancer through its effect on tobacco addiction. In contrast, a third study found only weak evidence that the 15q25 locus influences smoking behavior and suggested that it is mainly associated with lung cancer directly (16). However, it should be emphasized that this third GWAS was conducted in cases and controls matched on smoking status, which limits the variation between the two groups and the power to detect any smoking association. Further analyses from the same study suggest that SNPs and smoking have independent effects on risk. Together, these three studies unequivocally support the 15q25 locus as harboring susceptibility variants for lung cancer or smoking behavior.
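As an illustration of what "tight LD" means for the SNPs on 15q25, LD between two biallelic SNPs is commonly summarized by r², the squared correlation between their alleles. The sketch below computes r² from haplotype and allele frequencies. The function name and the frequencies are hypothetical and chosen only for illustration; they are not taken from the studies cited above.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between alleles A and B at two biallelic SNPs.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a:  frequency of allele A at the first SNP
    p_b:  frequency of allele B at the second SNP
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical frequencies, for illustration only.
print(ld_r_squared(p_ab=0.5, p_a=0.5, p_b=0.5))   # alleles always co-occur
print(ld_r_squared(p_ab=0.25, p_a=0.5, p_b=0.5))  # alleles independent
```

When r² is close to 1, the two SNPs carry nearly the same information, which is why association signals alone often cannot single out the causal variant or gene within such a region.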
A GWAS performed in familial lung cancer confirmed the susceptibility locus on 15q24-25.1 (17). A subsequent GWAS identified two newly associated risk loci for lung cancer (18). In this study, 15q25 was again the most strongly associated locus. However, by pooling the results with other studies (14, 16), new cancer risk loci were found. Two intronic SNPs located in different genes (BAG6, previously known as BAT3, and MSH5) and separated by more than 600 kilobases on chromosome 6p21 were significantly associated with lung cancer. The strongest association, aside from 15q25 and 6p21, was found on chromosome 5p15 within the CLPTM1L gene. The 5p15 locus was further supported by an expanded GWAS from previous populations (19). Two uncorrelated SNPs in that region were strongly associated with lung cancer. These SNPs are located within or in proximity to two biologically relevant genes, namely CLPTM1L and TERT. Together, by the end of 2008, three susceptibility loci for lung cancer were identified, that is, 15q25, 6p21, and 5p15.
A more extensive follow-up on a previous GWAS (18) further supports the contribution of the three loci (20). This follow-up study supports the possibility that two independent loci are acting on 15q25. The 15q25 locus was also associated with smoking behavior, with risk alleles correlated with higher tobacco consumption. In contrast, the 5p15 and 6p21 loci were not associated with smoking behavior. However, DNA variants at 5p15 were associated with histologic subtypes of lung cancer, with an increased frequency of the risk allele in cases with adenocarcinoma. This observation was subsequently confirmed in a meta-analysis published in 2009 (21). This study provides compelling evidence that the 5p15 susceptibility locus for lung cancer is confined to a more specific subtype of lung cancer, that is, adenocarcinoma. Particularly intriguing in that study of more than 30,000 subjects is the absence of new genomic regions associated with lung cancer. In 2010, a meta-analysis of 16 GWAS confirmed lung cancer loci on 15q25, 5p15, and 6p21 (22). Again, the association at 5p15 was confined to adenocarcinoma, whereas the 6p21 locus was more strongly associated with squamous cell carcinoma. Stratification by histology identified three loci for squamous cell carcinoma including 12p13.33, 9p21.3, and 2q32.1. In 2014, another GWAS meta-analysis taking advantage of the imputation based on the 1000 Genomes Project was performed (23), which allowed testing for less frequent SNPs not measured in earlier studies. The top nine signals were followed up and rare genetic variants were associated with squamous cell carcinoma in the BRCA2 gene on 13q13.1 and in CHEK2 on 22q12.1. CHEK2 was previously associated with lung cancer (24), but this was the first time using a GWAS approach. The 3q28 locus was associated with lung adenocarcinoma, which has been previously found in Asian populations (see the following section).
Finally, the latest and largest lung cancer GWAS in individuals of European ancestry was performed in 29,266 cases and 56,450 controls (25). This GWAS highlighted the genetic heterogeneity across histologic subtypes of lung cancer and reported novel loci for lung cancer per se (1p31.1, 6q27, 8p21.1, and 15q21.1) and adenocarcinoma (8p12, 10q24.3, 11q23.3, and 20q13.33). Previously reported lung cancer loci were more specifically associated with squamous cell carcinoma in this study including 6p21.33, 12p13.33, and 22q12.1.
GWAS in Asian populations
Genetic heterogeneity in lung cancer susceptibility is observed between populations of European and Asian descent. For example, the strongest lung cancer susceptibility variants on 15q25 have very low allele frequencies in Asian populations. Similarly, variants on 6p21 found in Europeans are not polymorphic in Asians. Accordingly, GWAS specific for Asian populations were required.
In a Korean population, a GWAS on NSCLC revealed a new locus on chromosome 3q29 (26). This study also confirmed the 5p15 susceptibility locus in Koreans. Other GWAS in Asian populations have followed. Two susceptibility loci were identified in Japanese and Korean populations confirming 5p15 and elucidating a new locus on 3q28 (27). The 5p15 and 3q28 loci were subsequently confirmed in a larger GWAS in Han Chinese (28). In addition, two new loci on 13q12.12 and 22q12.2 were identified. In the same GWAS, but with an extended validation sample size, five new lung cancer loci were identified including 10p14, 5q32, 20q13.2, 5q31.1, and 1p36.32 (29). A subsequent GWAS specifically for lung squamous cell carcinoma in Han Chinese revealed a new locus on 12q23.1 (30). Using the exome genotyping chip, rare variants on 6p21.33, 20q11.21, and 6p22.2 were also found in the Chinese populations (31). On 6p21.33, two missense variants, one in PRRC2A (also known as BAT2) and the other in FKBPL, were independently associated with the risk of lung cancer, suggesting more than one genetic signal in this region. This study also demonstrated that 6p21.33 is a susceptibility locus for Asian populations, but with different risk variants. The aforementioned GWAS in the Japanese population (27) was later expanded in terms of sample size and SNP coverage to identify a new locus on 17q24.3 (32). The association with lung adenocarcinoma was also confirmed for 5p15, 3q28, and 6p21, but not for 13q12.12 and 22q12.2.
GWAS in never smokers
Lung cancer in never smokers is known to be a distinct entity (33). The first GWAS on lung cancer in never smokers was reported in 2010 (34). A single locus on chromosome 13q31.3 was identified. The lung cancer–associated SNPs were located in the GPC5 gene and were also associated with mRNA expression levels of this gene in human lung tissues. A subsequent GWAS was performed in never-smoking females from Asia (35). The 5p15 locus was confirmed with an effect size greater than the estimates reported in populations of European background. Interestingly, the 15q25 and 6p21 loci were not associated with lung cancer in this study and no new loci were identified. In a larger GWAS of the same population forming the Female Lung Cancer Consortium in Asia, new susceptibility loci were revealed at 10q25.2 and 6q22.2 (36). The 6p21 locus was also associated with lung cancer in this study, but significant markers were not in LD with those previously reported, suggesting again more than one independent genetic signal at 6p21. This study in Asian females also confirmed other loci reported before including 5p15, 3q28, and 17q24.3. A recent meta-analysis was reported with an extended sample size of the Female Lung Cancer Consortium in Asia (37). A new locus on 12q13.13 was identified, as well as genetic variants on 6p21.1 and 9p21.3 that were not correlated with the SNPs previously associated with lung cancer at these loci. In never smokers from Korea, a new NSCLC locus on 18p11 was identified (38). The 2p16.3 locus was also suggested in nonsmoking Korean women (39). However, previous loci identified in never smoker populations, including 5p15 and 13q31.3, were not replicated in these Korean studies.
Pathway-based GWAS
Pathway-based analyses have been used to identify lung cancer loci. Using GWAS data, genes listed under the category of inflammation were evaluated by lung cancer histologic subtypes (40). This analysis identified a risk locus on chromosome 12p13.33 harboring the RAD52 gene. A similar approach was used to evaluate SNPs in inflammatory pathway genes in lifetime never smokers (41). SNPs on chromosome 12q13 in the ACVR1B and NR4A1 genes were associated with lung cancer, particularly in women and those who reported environmental tobacco smoke exposure. Focused on DNA repair genes, a recent study revealed variants in GTF2H4 on 6p21 and in XRCC4 on 5q14.2 associated with lung cancer risk (42). Pathway-based analyses of GWAS data have also identified groups of genes linked by known biological pathways (ABC transporters, VEGF signaling, G1–S check point, and NRAGE signals death through JNK) that were modestly, but coordinately associated with the risk of developing lung cancer (43).
Variant prioritization approaches
New lung cancer loci were also revealed by incorporating an intermediate phenotype, that is, smoked cigarettes per day, into the analyses (44). By combining the estimates derived from the case–control analysis and the intermediate phenotype, a stronger signal was observed at the 15q25 locus compared with the case–control study alone. Genetic associations with lung cancer were also detected on 19q13 and 3p26, which demonstrated improved power to identify genetic loci by combining different types of data from a single population. Applying this approach to cohorts of patients well-characterized for lung cancer may thus be very promising. A similar approach was used by assigning higher priors to SNPs associated with family history of lung cancer (45). By focusing on SNPs missed by traditional GWAS, this study identified 30 variants that showed evidence of association with lung cancer risk. The strongest associations were found on 10q23.33 and 4p15.2. Biological priors within a Bayesian framework were also applied to histology-specific analyses (46). In this study, the 4p15.2 locus was assigned more specifically to squamous cell carcinoma and a new adenocarcinoma locus was identified on 18q12.1.
GWAS-by-exposure interaction
Accounting for environmental exposure is challenging owing to the large number of possible factors as well as the level of measurement accuracy that can be achieved for each exposure. Despite these challenges, some genome-wide gene–environment interaction studies are starting to emerge in the field of lung cancer. The first attempt at a genome-wide gene–smoking interaction study identified two SNPs on 14q22.1 and 15q22.32 influencing the risk of lung cancer (47). For asbestos exposure, interacting loci were suggested on 2q34, 7q32.1, and 11q13 (48). A risk locus for asbestos-associated lung cancer was also discovered on 22q13.31 (49). Interacting loci with household air pollution caused by solid fuel burning for heating and cooking were also evaluated in never smoker women from Asia (50). Interestingly, interactions were reported for GWAS-nominated loci previously identified in this population (36), but no new loci reached genome-wide significance. Exploratory analyses of gene–occupation interactions in determining lung cancer susceptibility were also performed for 17 established or suspected lung carcinogens and 49 additional occupational agents (51). A large number of gene–environment interactions were reported in that study. However, the results could not be validated in an independent population because of the uniqueness of the dataset with detailed occupational exposure data. So far, results from genome-wide gene–environment studies in lung cancer have been mostly hypothesis-generating owing to limited sample size and power as well as the lack of appropriate replication sets. To make further progress, extra care will be needed to build large cohorts that are well-characterized for environmental exposures.
Genome-wide epistasis
The effects of genetic variants on lung cancer are likely to be amplified when multiple variants synergize together. Gene–gene interactions may identify genetic determinants of lung cancer. The first and only genome-wide two-locus interaction analysis performed so far revealed a significant interaction between two SNPs 60 kilobases apart on 2q32.2 (52). Individually, the two interacting SNPs were not significantly associated with the risk of lung cancer. Further investigations of gene–gene interactions will be needed to understand the genetic architecture of lung cancer.
Cross-cancer susceptibility loci
Large-scale GWAS across cancer sites have been conducted to identify pleiotropic loci. For lung cancer, the first pleiotropic locus was identified on 5p15 (TERT-CLPTM1L; ref. 53). A novel pleiotropic association at 7p15.3 was found in Han Chinese involving lung cancer, non-cardia gastric cancer, and esophageal squamous cell carcinoma (54). The GAME-ON/GECCO Network on lung, ovary, breast, prostate and colorectal cancer then identified novel pleiotropic associations involving lung cancer on 12q24 (55) and 1q22 (56). Known lung cancer loci were also identified in cross-cancer analyses including 6p21 (54) and 5p15 (55) as well as 9p21.3 and 13q13.1 (56). These loci are particularly promising to reveal shared carcinogenesis mechanisms across multiple cancer sites.
Integration of GWAS on Lung Cancer Susceptibility
Excluding gene–environment loci that are more suggestive at this point, GWAS reported 45 loci associated with lung cancer. Figure 1 shows the chronologic and cumulative number of lung cancer susceptibility loci identified. Loci are also listed on the basis of chromosome number in Table 3. Note that these loci are an evolving list. The strength of evidence for association with lung cancer and effect size vary by locus (Fig. 2). Evidence supporting some loci is relatively modest and will require validation in independent studies. The magnitude of genetic associations reported in publications also varies within loci. For example, the largest OR for chromosome 15q25 was 7.2, reported in a familial form of lung cancer with a relatively modest P value of 1.03 × 10−3 (17). On the other hand, the same locus was reported as highly significant (P = 3.08 × 10−103) with an OR of approximately 1.3 (25), which is an effect size more consistent with most studies on the sporadic form of lung cancer. The maximum OR and P value per locus as well as variability between studies are illustrated in Fig. 2. Refining susceptibility loci by clinically relevant subgroups is a critical step to reveal functional variants and causative genes (57). Table 3 summarizes the evidence supporting the specificity of lung cancer risk loci by histology, smoking status, gender, ethnicity, and age of onset. For example, convincing evidence supports that the 5p15 locus is specific to lung adenocarcinoma and more strongly associated with never smokers and women (21, 22, 28). Similar genetic association patterns are also emerging for 2p16.3, 5q32, and 6q22. Accordingly, a number of studies have started to delineate the effects of genetic variants in specific subgroups of patients with lung cancer, which is important to reveal the true nature of genetic effects detected in GWAS and narrow the set of genetic variants and genes worthy of functional studies.
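The contrast between the two 15q25 reports quoted above can be made concrete with a small calculation. The sketch below back-calculates the standard error of ln(OR) implied by each OR and P value pair. It assumes each P value came from a two-sided Wald test on the log odds ratio, which is an assumption and not something stated in the cited studies. The function name `implied_se` is hypothetical, and the OR of 1.3 is the approximate value quoted in the text.

```python
import math
from statistics import NormalDist

def implied_se(odds_ratio, p_value):
    """Standard error of ln(OR) implied by a two-sided Wald-test P value."""
    # Use the lower-tail quantile: 1 - p/2 would round to 1.0 for tiny P values.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return math.log(odds_ratio) / z

# Values quoted in the text for the 15q25 locus.
se_familial = implied_se(7.2, 1.03e-3)    # familial lung cancer (17)
se_sporadic = implied_se(1.3, 3.08e-103)  # largest European GWAS (25)

print(f"familial: implied SE(ln OR) = {se_familial:.3f}")
print(f"sporadic: implied SE(ln OR) = {se_sporadic:.4f}")
```

Under these assumptions, the implied standard error of the familial estimate is far larger than that of the large case–control estimate. A large OR with a modest P value therefore reflects a much less precise estimate, not necessarily a stronger underlying effect.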
It is also important to know whether independent variants in the same loci are associated with lung cancer. Convincing evidence supports at least two independent loci on 15q25, 5p15, and 6p21 (Table 3). Two independent loci were also reported at 9p21.3 (37) and 22q12.2 (28), but will require further validation. This knowledge is lacking for other lung cancer loci. Table 3 also provides a glimpse of suspected causal genes at each locus. Further functional and biological analyses will be needed to understand the role of these genes in lung cancer development.
GWAS loci | Histology | Smoking | Gender | Ethnicity | Age of onset | >1 loci | Suspected causal genes
---|---|---|---|---|---|---|---
1p36.32 (29) | | s | m | a | | | AJAP1 (87), NPHP4 (88)
1p31.1 (25) | | | | e | | | FUBP, DNAJB4
1q22 (56) | SQ | | | | | | MUC1 (89, 90), ADAM15 (91), THBS3
2p16.3 (39) | AD | n | w | a | | | NRXN1 (92)
2q32 (22) | SQ | | | e | | | NUP35 (93)
2q32.2 (52) | AD | s | m | a | | | HIBCH, INPP1 (94), PMS1, STAT1
3p26 (44) | | | | e | | | No genes. Deletions associated with cancer (95, 96)
3q28 (23, 25, 27, 28, 32, 36) | AD | | w | | | | TP63 (97, 98)
3q29 (26) | | | | a | | | C3orf21 (26)
4p15.2 (45, 46) | SQ | | | e | | | KCNIP4 (99)
5p15 (18–22, 25–28, 32, 35, 36, 55) | AD | n | w | | | | TERT (100–105), CLPTM1L (106–110)
5q14.2 (42) | | | | e | | | XRCC4 (111)
5q31 (29) | | | | a | o | | PAHA2 (112), CSF2 (113), IL3 (113), SLC22A5 (29, 114), ACSL6 (115)
5q32 (29) | AD | n | w | a | | | STK32A (29), PPP2R2B (116, 117), DPYSL3 (118)
6p22.2 (31) | | | | a | | | HIST1H1E
6p21 (18, 20–22, 25, 31, 32, 36, 37, 42, 54) | SQ | | | e | | | BAG6 (119, 120), APOM (121, 122), TNXB (123), MSH5 (124), BTNL2, PRRC2A (BAT2), FKBPL (125, 126), HSPA1B (127), FOXP4, FOXP4-AS1, GTF2H4 (42), LRFN2, HLA-A (128), HLA-DQB1 (128)
6q22 (36) | AD | n | w | a | | | DCBLD1 (129, 130), ROS1 (131–133)
6q27 (25) | | | | e | | | RNASET2
7p15.3 (54) | | | | a | | | SP4, DNAH11
8p21.1 (25) | | | | e | | | EPHX2, CHRNA2
8p12 (25) | AD | | | e | | | NRG1 (134)
9p21.3 (22, 25, 37, 56) | | | | | | | CDKN2A (135), CDKN2B (135), CDKN2B-AS1 (136, 137), MTAP
10p14 (29) | | | | a | | | GATA3 (138–141)
10q23.33 (45) | | n | w | e | y | | FFAR4 (142)
10q24.3 (25) | AD | | | e | | | OBFC1
10q25.2 (36) | SQ | | w | a | | | VTI1A (143–145)
11q23.3 (25) | AD | | | e | | | MPZL3, AMICA1
12p13.33 (22, 25, 40) | SQ | | | e | | | RAD52 (40, 146–151)
12q13.13 (37, 41) | | n | w | | | | ACVR1B (152, 153), NR4A1
12q23.1 (30) | SQ | | | a | | | NR1H4 (154), SLC17A8 (155, 156)
12q24 (55) | | | | e | | | SH2B3
13q12.12 (28) | | | | a | y | | MIPEP, TNFRSF19 (157)
13q13.1 (23, 25, 56) | SQ | | | e | | | BRCA2 (158)
13q31.3 (34) | | n | | e | | | GPC5 (159, 160)
15q21.1 (25) | AD | | | e | | | SEMA6D, SECISBP2L (161)
15q25 (14–22, 25, 44) | | s | | e | | | CHRNA5, CHRNA3, CHRNB4, IREB2, PSMA4 (162), HYKK
17q24.3 (32, 36) | | | | a | | | BPTF (32, 163, 164)
18p11.22 (38) | | n | | a | | | FAM38B (165), APCDD1 (166, 167), NAPG
18q12.1 (46) | AD | | | e | | | GAREM (168)
19q13.2 (25, 44) | | | | e | | | TGFB1 (169), CYP2A6
20q11.21 (31) | | | | a | | | BPIFB1 (170)
20q13.2 (29) | AD | | | a | | | CYP24A1 (171–174)
20q13.33 (25) | AD | | | e | | | RTEL1 (175)
22q12.1 (23, 25) | SQ | | | e | | | CHEK2 (24, 176)
22q12.2 (28) | | | | a | | | LIF (177, 178), HORMAD2, MTMR3
NOTE: The color of the background illustrates the strength of evidence on a black-and-white scale, where black indicates convincing evidence and white indicates no evidence so far. The strength of evidence was assigned on the basis of the content of publications cited in the first column and our best possible judgment and comprehension of each locus considering the number of studies that replicated the associations, the level of statistical significance, and the quality of the studies, for example, sample size. No evidence (white) does not distinguish associations that have been studied and found lacking from those not yet examined, and therefore also highlights knowledge gaps.
Abbreviations: AD, adenocarcinoma; a, Asians; e, Europeans; m, men; n, never-smokers; o, older; s, smokers; SQ, squamous cell carcinoma; w, women; y, younger.
GWAS on Lung Cancer Survival
Interindividual differences in survival are observed among lung cancer patients, even among those with the same tumor stage and treatment regimen. The identification of genetic factors associated with lung cancer survival has the potential to guide adjuvant therapy after surgery in early-stage disease, but also to refine prognosis and personalize clinical care in advanced-stage disease. So far, GWAS on lung cancer survival were performed in patients with early-stage NSCLC (58, 59), advanced-stage NSCLC (60–63), and SCLC (64). GWAS were also performed in more specific subgroups of lung cancer patients including never smokers with NSCLC (65) and patients with lung adenocarcinoma (66). Together, these studies have identified 23 loci associated with lung cancer survival (Supplementary Table S1). However, none of these loci was reported in more than one study. The lack of replication may be explained by heterogeneity in treatment regimens. There is also no overlap with GWAS lung cancer susceptibility loci. While the 9p21.3 locus was associated with both susceptibility (22, 37) and survival (62), sentinel SNPs are located more than 1 Mb away from each other, indicating that they are likely not reflecting the same association. GWAS susceptibility loci were specifically evaluated for association with survival in SCLC (67). Briefly, three loci on 20q13.2, 22q12.2, and 5p15 demonstrated some evidence of association with survival. However, none reached genome-wide significance. It should be noted that clinical follow-up of patients is needed to conduct survival analyses, and GWAS based on this outcome have thus been performed with much smaller sample sizes compared with studies focused on cancer susceptibility. Larger-scale studies are needed to identify robust lung cancer survival loci.
GWAS on Response to Lung Cancer Therapies
Somatic alterations in the tumor genome are known to modulate the response to anticancer therapy. Less is known about the influence of the host genome on treatment response. The effect of germline variants on sensitivity and toxicity to platinum-based chemotherapy has been examined by GWAS. In patients with SCLC, seven loci demonstrated some evidence of association with treatment response in a discovery set, but were not convincingly replicated in a validation set (68). In NSCLC patients, a locus on 21q22.3 was associated with platinum-induced hepatotoxicity (69) and two loci on 2q24.3 and 17p12 were associated with the risk of platinum-induced myelosuppression (70). More and larger studies are needed to effectively delineate chemosensitive patients who will benefit from treatment and nonresponders who may be spared the adverse side effects associated with chemotherapy.
Future Directions
Important progress was made to understand host susceptibility to lung cancer using GWAS. This approach is also starting to reveal inherited variants associated with lung cancer survival and response to treatment. During the last decade, progress was driven by enlarging sample sizes, improving methods to genotype and impute SNPs with more comprehensive reference sets (e.g., 1000 Genomes Project), and the creation of large-scale international collaborations and consortia. Additional developments were made by studying patients of different ancestries, never smokers, women, and specific lung cancer histology. Progress was also made by refining results by pathway-based analysis and variant prioritization approaches. The next important step in the field of genomics of lung cancer is to identify the causal genetic variants and genes underpinning GWAS-nominated loci. In addition, the new genomic knowledge must be translated into real benefits for patients. These must be achieved if GWAS are to pay off the huge investment. We foresee different strategies to reach these goals.
eQTL and TWAS
We need to continue to mine GWAS data using more advanced statistical techniques that leverage other sources of data. So far, expression quantitative trait loci (eQTL) mapping studies in a variety of tissues have been used to extend the functional meaning of GWAS in lung cancer (23, 25, 29, 30, 32, 34, 37, 45, 56, 71). More comprehensive methods of colocalization of GWAS and eQTL signals were recently developed and must be performed to reveal genetic associations explained by regulatory effects on gene expression (72). The identification of lung cancer-associated genetic variants associated with the expression of specific genes in a disease-relevant tissue is an important step forward to understand the molecular mechanisms underpinning GWAS signals. In addition, the relationships between genetic variants, RNA expression levels, and lung cancer must be further delineated by causality models and Mendelian randomization approaches (73, 74). Large-scale lung cancer GWAS (25) and lung eQTL (75) are also available to perform the first transcriptome-wide association study (TWAS) in lung cancer. In this approach, the cis genetic component of expression derived from the eQTL dataset is used to impute expression data for cases and controls used in the GWAS. Imputed genome-wide gene expression levels, in samples orders of magnitude larger than any of the transcriptomic datasets generated so far, can then be used to identify genes whose expression is significantly associated with the disease. This approach has the potential to elucidate the most likely molecular drivers of lung cancer in GWAS-nominated loci, but also yield molecular drivers of lung cancer outside GWAS loci. GWAS, eQTL, and TWAS results will also need to be integrated with genes differentially expressed in lung tumor compared with adjacent nontumor lung tissues.
For example, we have recently derived a robust list of genes differentially expressed in lung tumor from our own transcriptomic dataset (76) as well as two publicly available datasets (77, 78). These results identified genes consistently deregulated in lung tumor and revealed important insights about the molecular transitions that occur between normal and tumor lung tissues. Accordingly, very promising research is underway exploiting GWAS and gene expression datasets to identify causal genes and molecular drivers of lung cancer.
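The three TWAS steps described above (learn cis weights in the eQTL reference, impute expression in GWAS samples, test imputed expression against disease status) can be sketched for a single gene as follows. This is a minimal illustration under stated assumptions and not the method of any cited study: the weights come from a small ridge regression, the association test is a simple Welch-type comparison of cases and controls where a real analysis would use a regression model with covariates, and all function names and data are hypothetical.

```python
import statistics

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_cis_weights(ref_genotypes, ref_expression, ridge=0.01):
    """Step 1: learn cis-SNP weights for one gene from the eQTL reference panel."""
    n, p = len(ref_genotypes), len(ref_genotypes[0])
    means = [sum(row[j] for row in ref_genotypes) / n for j in range(p)]
    y_mean = sum(ref_expression) / n
    X = [[row[j] - means[j] for j in range(p)] for row in ref_genotypes]
    y = [v - y_mean for v in ref_expression]
    # Ridge normal equations: (X'X + ridge * I) w = X'y
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) + (ridge if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve_linear(A, rhs)

def impute_expression(gwas_genotypes, weights):
    """Step 2: predict the cis-genetic component of expression in GWAS samples."""
    return [sum(w * g for w, g in zip(weights, row)) for row in gwas_genotypes]

def association_z(imputed, is_case):
    """Step 3: Welch-type statistic comparing imputed expression in cases vs controls."""
    cases = [e for e, c in zip(imputed, is_case) if c]
    controls = [e for e, c in zip(imputed, is_case) if not c]
    se = (statistics.variance(cases) / len(cases)
          + statistics.variance(controls) / len(controls)) ** 0.5
    return (statistics.mean(cases) - statistics.mean(controls)) / se

# Toy data (made up): two cis-SNPs; expression depends on the first SNP only.
ref_genotypes = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 2], [2, 1]]
ref_expression = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
gwas_genotypes = [[2, 0], [2, 1], [1, 0], [2, 2], [0, 0], [1, 1], [0, 1], [0, 0]]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_cis_weights(ref_genotypes, ref_expression)
imputed = impute_expression(gwas_genotypes, weights)
print("weights:", [round(w, 3) for w in weights])
print("z:", round(association_z(imputed, is_case), 2))
```

The key design point is that expression is never measured in the GWAS samples. Only genotypes are needed there, which is why the imputed dataset can be far larger than any measured transcriptomic dataset.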
Deep molecular profiling and biobanking
To make further progress, relevant tissues must be profiled beyond gene expression. GWAS variants of lung cancer may not exert their effects through gene regulation, but through other molecular phenotypes such as protein expression, protein state, metabolite levels, and epigenetic marks. Accordingly, deep molecular profiling of human lung tissues will be needed to comprehend the molecular impact of inherited variants on lung cancer. Current biobanking activities to collect large numbers of high-quality, well-annotated lung specimens are essential to this future development.
Exposome
As depicted in Table 3, the independent contribution of GWAS loci to lung cancer, smoking behavior, and nicotine dependence is still not clearly delineated. Larger-scale gene–exposure interaction studies with established environmental risk factors including tobacco smoke and solid fuel burning are warranted. More comprehensive assessment of environmental factors including radon, asbestos, household and outdoor pollution, and occupational agents will be critically important, but at the same time very challenging to measure accurately in large samples. A well-orchestrated community effort thus seems necessary (79).
Exome and genome sequencing
New genomic approaches based on next-generation sequencers are also expected to refine GWAS loci and discover new variants unlikely to be found by GWAS. To this effect, whole-exome sequencing in three members of a five-generation family affected by lung cancer has revealed a rare variant in PARK2 resulting in a loss-of-function of this tumor suppressor gene (80). Although rare, the effect size of this mutation was greater than those reported in GWAS. Similarly, exome sequencing of sporadic and familial cases of lung cancer identified rare deleterious mutations in GWAS-nominated loci located in the CDC147 and DBH (81) genes. Whole-genome sequencing in a family with very high aggregation of lung adenocarcinoma revealed a functional missense variant in the oncogene YAP1 (82) associated with the risk of developing the disease. We expect these types of discoveries using exome and genome sequencing to multiply in the near future.
En route for a genetic risk score
Identified lung cancer susceptibility loci provide hope to build tools for targeted screening of high-risk individuals. To date, cumulative effects of loci have shown promising results to improve the discriminatory performance of risk prediction models, but not sufficiently to merit clinical implementation (83). For example, a recent report combining GWAS loci demonstrated only small improvement in lung cancer risk prediction in models including basic clinical factors such as age and smoking (84). Interestingly, the best model may not come from considering only the top GWAS loci. A genetic risk score built from seven telomere length–associated genetic variants was associated with lung cancer risk (85). More recently, it was demonstrated that the cumulative effects of susceptibility variants were better predictors when organized in biological pathways (86). These examples demonstrate the variety of strategies that are currently used to develop new clinical tools to predict lung cancer. Such tools are urgently needed to enable earlier diagnosis. The task is challenging and will require major efforts, but seems more realistically feasible with the outcomes of GWAS in hand. We hope that this compendium of lung cancer GWAS loci will facilitate further progress in building a clinically useful genetic risk score.
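The cumulative effect of several loci is commonly summarized as a weighted genetic risk score: the sum over loci of the number of risk alleles carried multiplied by the natural log of the per-allele OR. The sketch below illustrates that calculation only. It assumes loci act independently and multiplicatively on the odds scale, the function name is hypothetical, and apart from the approximate OR of 1.3 quoted in this review for 15q25, the ORs are placeholders and not published estimates.

```python
import math

def genetic_risk_score(risk_allele_counts, per_allele_or):
    """Weighted GRS: sum over loci of risk-allele count x ln(per-allele OR)."""
    missing = set(per_allele_or) - set(risk_allele_counts)
    if missing:
        raise ValueError(f"missing genotypes for loci: {sorted(missing)}")
    return sum(risk_allele_counts[locus] * math.log(odds_ratio)
               for locus, odds_ratio in per_allele_or.items())

# Per-allele ORs: 1.3 for 15q25 is the approximate value quoted in this review;
# the other two values are placeholders chosen for illustration only.
per_allele_or = {"15q25": 1.3, "5p15": 1.15, "6p21": 1.10}

# Risk-allele counts (0, 1, or 2) for one hypothetical individual.
person = {"15q25": 2, "5p15": 1, "6p21": 0}

score = genetic_risk_score(person, per_allele_or)
print(f"GRS = {score:.3f}")
print(f"odds relative to carrying no risk alleles = {math.exp(score):.2f}")
```

In a risk prediction model, a score of this kind would enter as one additional predictor alongside clinical factors such as age and smoking, and its value would be judged by the gain in discrimination it provides.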
Conclusions
Understanding the genetic factors underlying the development of lung cancer is important to elucidate the etiology of the disease. This genetic knowledge is a prerequisite to develop and improve future clinical strategies for lung cancer management. The loci summarized in this review testify to the progress made in this field during the last decade. This review also highlights knowledge gaps about causal variants and genes responsible for the underlying genetic associations and proposes some short-term solutions to ensure further progress through eQTL, colocalization, causality models, and TWAS. The specificity of lung cancer loci in terms of histologic subtypes, gender, and ethnicity has been established for some loci, but will demand large studies with well-characterized individuals for others. Although smoking and other environmental factors, notably solid fuel burning, are clearly interacting with host factors to cause the disease, the specific variants that come into play are still elusive. Preliminary data provide some clues about inherited variants associated with lung cancer survival and response to treatment, but will require validation in larger-scale studies. On the other hand, robust genetic factors associated with lung cancer derived from GWAS give hope for possible clinical translation. In the short term, a genetic risk score to screen high-risk individuals seems realistically achievable and would make more effective treatments available at earlier stages of the disease. In the mid and longer terms, discovering the causal genes underpinning GWAS signals will propel results one step closer to clinical applications by revealing new therapeutic targets and biomarkers to personalize quality of care.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.