Background:

Prior genome-wide association studies have identified numerous lung cancer risk loci and revealed substantial etiologic heterogeneity across histologic subtypes. Analyzing the shared genetic architecture underlying complex traits can elucidate common genetic etiologies across phenotypes. In particular, pairwise genetic correlations between lung cancer and other polygenic traits can indicate which correlated phenotypes share a genetic basis with lung cancer.

Methods:

Using cross-trait linkage disequilibrium score regression, we estimated single nucleotide polymorphism–based heritability and pairwise genetic correlations between lung cancer and multiple traits using publicly available summary statistics. The identified genetic relationships were re-examined after excluding genomic regions known to be associated with smoking behaviors, a major risk factor for lung cancer.

Results:

Several traits showed moderate single nucleotide polymorphism–based heritability and significant genetic correlations with lung cancer. The genetic architectures of lung cancer and emphysema/chronic bronchitis were highly significantly correlated across all histologic subtypes, as well as for lung cancer occurring among smokers. Lung cancer also showed a highly significant positive correlation with paternal history of lung cancer and a strong negative correlation with parental longevity. The directions of these genetic correlations were consistent after excluding genomic regions associated with smoking behaviors.

Conclusions:

This study identifies numerous phenotypic traits that share genomic architecture with lung carcinogenesis and are not fully accounted for by known smoking-associated genomic loci.

Impact:

These findings provide new insights into the etiology of lung cancer by identifying traits that are genetically correlated with increased risk of lung cancer.

Over the past decade, genome-wide association studies (GWAS) assessing millions of single-nucleotide polymorphisms (SNPs) have identified common genetic variants that influence complex diseases and have proven useful in elucidating the heritability of complex traits (1). GWAS have generally found several loci that predispose for lung cancer across all histologic subtypes, but many more are histology-specific (2, 3). Moreover, many of the loci that predispose primarily for overall lung cancer are smoking-related, while the histology-specific loci do not show this strong association with smoking behaviors (1). Because histologic subtypes have distinct genetic architectures and an important proportion of trait heritability may be explained by variants of small genetic effects (4, 5), large overall sample sizes are needed. We have curated large datasets with harmonized histologic definitions, facilitating studies to evaluate the histology-specific and overall genetic architectures of lung cancer (6).

Lung cancer is a multifactorial disease driven by germline genetic variation, environmental risk exposures, particularly cigarette smoking, and an accumulation of somatic genetic events. Like other common diseases, much of the heritability of lung cancer remains unexplained (7). Explanations for this missing heritability may be that heritability estimates of lung cancer vary by histology or are attenuated by smoking behaviors (8–11), which vary over time and confer the preponderance of risk for most individuals. Approximately 80% of lung cancer–related deaths are related to smoking, and this proportion is even higher for small-cell lung cancer (SCLC), which is rare among never-smokers (12). Despite the strong causal association between smoking and lung cancer incidence, there remains a substantial population of never-smokers who develop lung cancer, with about 10% of lung cancer cases in European-ancestry populations occurring in individuals who report never having smoked (13). This suggests that interindividual variation in lung cancer susceptibility results from both environmental exposures and genetic predisposition (14). In addition to smoking behaviors, other associated factors include chronic obstructive pulmonary disease (COPD), dietary behaviors, and exposures to lung carcinogens such as metals from smelting, cooking emissions, atmospheric pollution, and residential radon (15–20). While exposure to environmental toxicants lacks a heritable component, many behavioral, medical, and physiologic measures have substantial heritability, allowing further exploration of the shared genetic architecture between these traits and lung cancer risk.

Chronic diseases often co-occur with other medical disorders in the same individual, so-called comorbid conditions. Understanding the genetic architecture underlying co-development of diseases may be more informative than studying individual phenotypes in isolation (21, 22). The contribution of shared genetic underpinnings to comorbid conditions has been inadequately studied. However, comorbid conditions described in the medical literature may help to identify persons at increased risk (19). The most common comorbidity related to lung cancer is COPD, an umbrella term that includes emphysema and chronic bronchitis. Both disorders share smoking as an underlying causal risk factor (19). Type II diabetes, which is associated with immunosuppression and vascular complications, is a significant public health problem (23). About 8%–18% of patients with cancer also have diabetes (24). Lung cancer and diabetes share common risk factors such as age, diet, and smoking (24). To tease apart the shared genetic contribution to these comorbid conditions and the extent to which it is mediated solely by smoking behaviors, we performed cross-trait linkage disequilibrium (LD) score regression (LDSR) analysis (25, 26) to reveal patterns of pairwise genetic correlation between lung cancer and multiple polygenic phenotypes using GWAS summary statistics from TRICL (Transdisciplinary Research In Cancer of the Lung)-Oncoarray lung data (6) and prior GWAS, including those from the UK Biobank (UKBB; refs. 27, 28). Analyses were also stratified by lung cancer histology and smoking status, and chromosomal regions related to smoking behaviors (e.g., cigarettes per day) and nicotine dependence were removed to account for the potential mediating effects of smoking behaviors (29, 30).

Summary statistics of lung cancer meta-analysis and data availability

GWAS summary statistics (6) from the TRICL-Oncoarray consortium were used in this study, including 29,266 patients with lung cancer [11,273 lung adenocarcinoma (LUAD), 7,426 lung squamous cell carcinoma (LUSC), and 2,664 SCLC], and 56,450 European-ancestry controls (Supplementary Table S1). The Oncoarray samples were genotyped using the Illumina Oncoarray-500K BeadChip, and 517,482 SNPs remained for analysis after quality control. Individual-level genotypes were imputed to 10,439,017 SNPs using the 1000 Genomes Project Phase 3 panel (haplotype release October 2014; refs. 6, 31). Further details about the TRICL-Oncoarray studies and genotyping methods have been previously published (6, 31). The summary statistics for overall lung cancer and for all histologic subtype-based and smoking status–based subset analyses were obtained from a prior GWAS meta-analysis of lung cancer risk (6). Summary statistics from the meta-analysis of TRICL-Oncoarray lung cancer studies have been deposited at the database of Genotypes and Phenotypes (dbGaP) under study accessions phs000877.v1.p1 and phs001273.v3.p2.

Integration of existing summary-level GWAS data from UK Biobank and other resources

To estimate the genetic correlation between lung cancer and the phenotypic trait of interest, we harmonized publicly available GWAS summary-level datasets from UKBB (27, 28, 32), a large and detailed cohort study in the United Kingdom that enrolled over 500,000 adults who were aged 40–69 years when recruited in 2006–2010. The study has collected extensive and comprehensive phenotypic and genotypic details about participants, including data from medical record linkage, questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes (28). Summary statistics of imputed data were downloaded from UKBB (27). Autoimmune- and lipids-related traits were obtained from previously reported GWAS (33–36). The sample sizes and more details are shown in Supplementary Table S2.

Ethical statement

All participants provided informed consent to provide samples for DNA analysis, cancer status, and smoking behavior according to protocols that were evaluated by the Institutional Review Boards (or equivalent) of the contributing centers and according to prevailing rules including the Belmont Report or the Declaration of Helsinki, Council for International Organizations of Medical Science (CIOMS), the U.S. Common Rule, or other guiding principles. Ethics approval for the UKBB was provided by the UK National Health Service (NHS) Research Ethics Service North West (Research Ethics Committee approval number 11/NW/0382), and all participants provided written informed consent. For the other summary-level GWAS data, written informed consent was likewise obtained from all study participants contributing to those studies, with local Ethics Committee/Institutional Review Board approval. Full details have been reported elsewhere (28, 33–36).

Genetic heritability and pairwise genetic correlations

Genome-wide SNP-heritability and pairwise genetic correlation estimates were computed using GWAS summary statistics and linkage disequilibrium information through LDSR analyses (LD Score v1.0.1, https://github.com/bulik/ldsc). Briefly, LDSR is a method that regresses Chi-square statistics from GWAS summary statistics on LD scores, which measure how much genetic variation each variant tags. By regressing the product of Z-scores, Z1Z2, from two polygenic, genetically correlated traits with Z-scores Z1 and Z2 onto the LD score for each SNP, the genetic covariance between the two traits can be obtained (37). Specifically, genetic covariance is measured as the slope of this regression after accounting for GWAS sample size and the number of SNPs included in the LDSR analysis. Once obtained, the genetic covariance is standardized to a genetic correlation by dividing it by the square root of the product of the SNP-heritabilities of the two traits being interrogated. Heritability is calculated simply as the genetic covariance of a trait with itself and may best be understood as the proportion of phenotypic variance in a trait that can be explained by SNP variation.
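In symbols, the regression described above can be sketched as follows, using the standard formulation of the cross-trait LDSR method (25, 26). The symbols are introduced here for illustration and are not taken from the text: N_1 and N_2 are the two GWAS sample sizes, N_s is the number of overlapping samples, M is the number of SNPs, \ell_j is the LD score of SNP j, \rho_g is the genetic covariance, \rho is the phenotypic correlation among overlapping samples, and a captures confounding such as population stratification.

```latex
% Cross-trait regression: slope gives the genetic covariance, intercept absorbs sample overlap
\mathbb{E}\left[ z_{1j}\, z_{2j} \right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}

% Single-trait regression: slope gives the SNP-heritability h^2
\mathbb{E}\left[ \chi^2_j \right] = \frac{N h^2}{M}\,\ell_j + N a + 1

% Genetic correlation: genetic covariance standardized by both heritabilities
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

The intercept of the first equation is the term that absorbs sample overlap between the two GWAS, and the single-trait intercept equals 1 in the absence of confounding.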

For LDSR, common genetic variants with minor allele frequency (MAF) > 1% and imputation INFO score > 90% were used, because the standard errors of LD scores can be substantial for variants that fail these thresholds. Multi-allelic SNPs and the MHC region (Chr6:26Mb-34Mb) were removed from GWAS summary statistics because of the complex and unusual LD and genetic architecture of the MHC region (38). These analyses were also restricted to HapMap3 SNPs with MAF above 5% in populations of European descent, using the 1000 Genomes Project as the reference for LD patterns (26). We performed LDSR to estimate pairwise genetic correlations between lung cancer and traits including medical conditions, family history, lipids, and smoking behaviors, as well as histology-specific pairwise correlations for LUAD, LUSC, and SCLC. We also used LDSR to estimate pairwise genetic correlations for all lung cancer stratified by smoking status: never-smokers (cases and controls who reported smoking fewer than 100 cigarettes in their lifetime) and ever-smokers (cases and controls who had smoked 100 or more cigarettes in their lifetime; ref. 6). While we assume that there is no mismatch between LD scores from the reference population and the target population used for GWAS, we also considered the scenario where heterogeneous substructure in populations of European descent results in directional bias in the average LD score (39). Using LDSR with an intercept term protects against bias due to shared population stratification and sample overlap when estimating cross-trait genetic correlation (gcov_int) and against bias due to population stratification and cryptic relatedness when estimating observed-scale heritability (h2_int) (25). The heritability intercept, h2_int, should be close to 1. The genetic covariance intercept, gcov_int, should be less than one standard error away from zero.
To avoid bias due to population stratification and sample overlap between the lung cancer summary data and other existing summary data, we performed LDSR analysis without constraining the intercept (unconstrained intercept model; ref. 26). After filtering, there were ∼1,120,000 SNPs available for analyses in lung cancer and across all histologic subtypes and smoking status strata (Supplementary Table S1). We ran LD Score v1.0.1 with the command "ldsc.py --rg lung1.sumstats.gz,trait1.sumstats.gz --ref-ld-chr eur_w_ld_chr/ --w-ld-chr eur_w_ld_chr/ --out lung1_trait1".
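The variant-level filters described above (MAF, INFO score, multi-allelic SNPs, MHC region) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, its field names, and the toy variants are hypothetical, and the HapMap3 restriction is not reproduced here.

```python
from dataclasses import dataclass

# MHC boundaries as stated in the text (Chr6:26Mb-34Mb).
MHC_CHROM = 6
MHC_START = 26_000_000
MHC_END = 34_000_000


@dataclass(frozen=True)
class Variant:
    """One row of GWAS summary statistics (field names are hypothetical)."""
    rsid: str
    chrom: int
    pos: int
    maf: float
    info: float
    n_alleles: int = 2


def passes_qc(v: Variant, min_maf: float = 0.01, min_info: float = 0.90) -> bool:
    """Return True if the variant survives the filters described in the text."""
    if v.maf <= min_maf or v.info <= min_info:
        return False  # too rare or too poorly imputed
    if v.n_alleles != 2:
        return False  # multi-allelic SNPs are removed
    in_mhc = v.chrom == MHC_CHROM and MHC_START <= v.pos <= MHC_END
    return not in_mhc  # the MHC region is removed


# Toy input, invented for illustration only.
variants = [
    Variant("rs_ok", 1, 1_000_000, 0.20, 0.98),
    Variant("rs_rare", 2, 5_000_000, 0.005, 0.99),
    Variant("rs_lowinfo", 3, 7_000_000, 0.30, 0.80),
    Variant("rs_mhc", 6, 30_000_000, 0.25, 0.99),
    Variant("rs_multi", 4, 9_000_000, 0.15, 0.97, n_alleles=3),
    Variant("rs_chr6_ok", 6, 50_000_000, 0.25, 0.99),
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```

Only the two variants that clear every threshold and lie outside the MHC window are retained.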

Exclusion of genomic regions associated with smoking behaviors

Epidemiologic studies consistently show that smoking is the most prominent risk factor for lung cancer (40–42). Many studies of lung cancer have addressed the potential role of tobacco as a confounder in association studies (14, 43, 44). Because smoking behaviors are heritable traits (45–47) and are correlated with both other behaviors and lung cancer risk, we were concerned that associations between lung cancer and other traits revealed by LDSR could reflect a common etiology due to smoking (37, 38). To attenuate the contribution of smoking-related variants to these LDSR associations, chromosomal regions (±500 kb) centered on 473 SNPs associated with smoking traits such as cigarettes per day, smoking initiation, smoking cessation, age at initiation of regular smoking, and nicotine dependence were excluded from the lung cancer GWAS summary statistics in sensitivity analyses (29, 30). The 473 excluded SNPs and their regions are listed in Supplementary Table S3.
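The ±500 kb exclusion can be sketched in Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the lead-SNP coordinates are invented placeholders rather than entries from Supplementary Table S3, and the window is treated as inclusive at both ends.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW = 500_000  # +/- 500 kb around each smoking-associated lead SNP


def build_index(lead_snps):
    """Group lead-SNP positions by chromosome and sort them for binary search."""
    index = defaultdict(list)
    for chrom, pos in lead_snps:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def in_excluded_region(chrom, pos, index, window=WINDOW):
    """True if (chrom, pos) lies within `window` bp of any lead SNP."""
    positions = index.get(chrom, [])
    # First lead SNP at or beyond the left edge of the window around `pos`.
    i = bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


# Placeholder lead SNPs (chromosome, base-pair position), invented for illustration.
index = build_index([(15, 78_800_000), (19, 41_300_000)])

summary_stats = [(15, 78_900_000), (15, 80_000_000), (19, 40_800_000), (1, 1_000_000)]
retained = [v for v in summary_stats if not in_excluded_region(*v, index)]
print(retained)
```

Sorting the lead SNPs once per chromosome keeps each lookup logarithmic, which matters when roughly a million variants are checked against several hundred regions.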

We examined the degree of overlap in genomic contributions to lung cancer and other polygenic phenotypes of interest based on the pairwise genetic correlation (rg) and the SNP-heritability (h2), which represents the proportion of phenotypic variance explained by all SNPs, using cross-trait LDSR analysis. We also investigated how these genetic relationships depend on genomic regions associated with smoking behaviors, a major risk factor for lung cancer. After removing the genomic regions associated with smoking behaviors, ∼980,000 SNPs remained for the LDSR analyses (Supplementary Table S1). As shown in Supplementary Fig. S1, the strongest genome-wide signals on chromosomes 15 and 19 (near the cholinergic nicotine receptor CHRNA5 and the nicotine metabolizing enzyme CYP2A6, respectively) were absent after removing the genomic regions related to smoking behaviors.

We identified numerous traits showing moderate SNP-heritability and moderate to strong genetic correlation with lung cancer at a Bonferroni-corrected significance threshold of P = 0.05/3,600 = 1.38 × 10−5. Because of the hypothesis-generating nature of this research, we also considered P < 0.05 to be nominally significant associations that merit targeted follow-up in future studies. P values below our Bonferroni-corrected level of statistical significance were considered to be robustly associated in these analyses and are further marked in bold.
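The two significance tiers described above can be expressed as a short Python sketch. The function name and tier labels are illustrative assumptions. Note that 0.05/3,600 evaluates to roughly 1.39 × 10−5, which the text reports as 1.38 × 10−5. The example P values are taken from Table 2.

```python
N_TESTS = 3600
ALPHA = 0.05
BONFERRONI = ALPHA / N_TESTS  # about 1.39e-05


def classify(p_value: float) -> str:
    """Assign a P value to one of the significance tiers used in the text."""
    if p_value <= BONFERRONI:
        return "robust"       # passes the Bonferroni-corrected threshold
    if p_value < ALPHA:
        return "nominal"      # hypothesis-generating, merits follow-up
    return "not significant"


# P values from Table 2: emphysema/chronic bronchitis with overall lung cancer,
# eczema with overall lung cancer, and COPD with LUAD (inclusion analyses).
for trait, p in [("emphysema/chronic bronchitis", 2.98e-13),
                 ("eczema", 0.005),
                 ("COPD vs. LUAD", 0.510)]:
    print(f"{trait}: {classify(p)}")
```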

We estimated the observed-scale heritability of lung cancer to be about 8% overall and 7% after excluding chromosomal regions associated with smoking behaviors. As shown in Table 1, the estimated observed-scale heritability of lung cancer in ever-smokers was about 10% when including smoking-associated loci and 8% after the exclusion, while in never-smokers h2 was approximately 3% to 5% (0.0304 and 0.0463) across the two analyses, with standard errors larger than the estimates themselves.

Table 1.

Estimates of observed-scale heritability by lung cancer, lung histologic subtypes, and smoking behaviors.

| Strata | Heritability | SE of heritability |
| --- | --- | --- |
| Lung cancer | 0.0832 | 0.0126 |
| LUAD | 0.0677 | 0.0099 |
| LUSC | 0.0517 | 0.0106 |
| SCLC | 0.1045 | 0.0192 |
| Ever smokers | 0.0994 | 0.0214 |
| Never smokers | 0.0304 | 0.0479 |
| Lung cancer exclusion | 0.0706 | 0.0075 |
| LUAD exclusion | 0.0659 | 0.0094 |
| LUSC exclusion | 0.0404 | 0.0094 |
| SCLC exclusion | 0.0856 | 0.0194 |
| Ever smokers exclusion | 0.0826 | 0.0157 |
| Never smokers exclusion | 0.0463 | 0.0531 |

Note: Exclusion indicates the removal of genomic regions associated with smoking behaviors.
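One way to read Table 1 is to compare each estimate with its standard error. The sketch below computes a simple ratio (estimate divided by SE) for three of the inclusion-analysis rows. This is a rough Wald-type check added for illustration, not a test reported in the study, and the 1.96 cutoff is an assumption corresponding to a two-sided 5% level.

```python
# (heritability, standard error) pairs copied from Table 1.
table1 = {
    "Lung cancer": (0.0832, 0.0126),
    "Ever smokers": (0.0994, 0.0214),
    "Never smokers": (0.0304, 0.0479),
}


def z_score(h2: float, se: float) -> float:
    """Ratio of the heritability estimate to its standard error."""
    return h2 / se


z_scores = {stratum: z_score(h2, se) for stratum, (h2, se) in table1.items()}
for stratum, z in z_scores.items():
    flag = "distinguishable from zero" if z > 1.96 else "not distinguishable from zero"
    print(f"{stratum}: z = {z:.2f} ({flag})")
```

Under this rough check, the never-smoker estimate is smaller than its own standard error, consistent with the limited sample size in that stratum.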

Genome-wide genetic correlation estimates were obtained, stratified by lung cancer histology and smoking status (Supplementary Table S4). The genetic correlation estimates were comparable among histologic subtypes, and SCLC appeared to share more genetic architecture with LUSC than with LUAD both before and after omitting genomic regions related to smoking behaviors. Risk in ever-smokers was more correlated with LUAD and LUSC than with SCLC, and risk in never-smokers was more correlated with LUAD than with other subtypes.

Our results based on cross-trait LDSR analyses demonstrated that the genetic architecture of lung cancer susceptibility was strongly positively correlated with that of traits related to smoking behaviors, selected medical conditions, and family history. As presented in Tables 2 and 3, the strongest genetic correlation across all lung cancer subtypes was with pack-years of smoking, followed by current tobacco smoking, maternal smoking behavior around birth, paternal history of lung cancer, parental longevity, and chronic bronchitis/emphysema. All cross-trait genetic correlations between lung cancer and polygenic traits considered in this study are reported in Supplementary Tables S5–S7.

Table 2.

Genetic correlations with lung cancer and its histologic subtypes, with inclusion/exclusion of genomic regions related to smoking behaviors.

Columns: rg1 and P1, inclusion of genomic regions related to smoking behaviors; rg2 and P2, exclusion of those regions.

| Phenotypic trait | Lung cancer rg1 | Lung cancer P1 | Lung cancer rg2 | Lung cancer P2 | LUAD rg1 | LUAD P1 | LUAD rg2 | LUAD P2 | LUSC rg1 | LUSC P1 | LUSC rg2 | LUSC P2 | SCLC rg1 | SCLC P1 | SCLC rg2 | SCLC P2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Allergy atopy | | | | | | | | | | | | | | | | |
| Eczema | −0.11 | 0.005 | −0.10 | 0.024 | −0.11 | 0.019 | −0.09 | 0.049 | −0.12 | 0.024 | −0.11 | 0.055 | −0.14 | 0.012 | −0.12 | 0.036 |
| Blood traits | | | | | | | | | | | | | | | | |
| High light scatter reticulocyte count | 0.18 | 8.20 × 10−8 | 0.19 | 1.04 × 10−7 | 0.16 | 7.56 × 10−5 | 0.15 | 6.46 × 10−4 | 0.17 | 4.87 × 10−4 | 0.14 | 0.011 | 0.24 | 7.95 × 10−7 | 0.25 | 2.19 × 10−5 |
| Lymphocyte count | 0.14 | 1.27 × 10−4 | 0.14 | 1.97 × 10−4 | 0.12 | 6.02 × 10−3 | 0.12 | 8.90 × 10−3 | 0.17 | 8.58 × 10−5 | 0.14 | 4.23 × 10−3 | 0.19 | 1.02 × 10−5 | 0.17 | 1.19 × 10−3 |
| White blood cell count | 0.21 | 6.72 × 10−9 | 0.21 | 2.40 × 10−7 | 0.16 | 7.59 × 10−5 | 0.16 | 1.74 × 10−4 | 0.20 | 1.25 × 10−5 | 0.18 | 9.95 × 10−4 | 0.22 | 2.11 × 10−6 | 0.23 | 3.62 × 10−4 |
| Medical condition and family history | | | | | | | | | | | | | | | | |
| Emphysema/chronic bronchitis | 0.52 | 2.98 × 10−13 | 0.47 | 3.91 × 10−7 | 0.44 | 1.25 × 10−6 | 0.40 | 2.04 × 10−4 | 0.45 | 5.10 × 10−6 | 0.33 | 7.94 × 10−3 | 0.57 | 1.63 × 10−6 | 0.61 | 1.32 × 10−4 |
| Forced expiratory volume in 1 second (FEV1) | −0.18 | 1.19 × 10−7 | −0.10 | 3.93 × 10−3 | −0.10 | 0.019 | −0.10 | 5.93 × 10−3 | −0.24 | 4.53 × 10−7 | −0.17 | 3.31 × 10−3 | −0.19 | 8.45 × 10−4 | −0.14 | 0.024 |
| COPD | 0.28 | 0.008 | 0.22 | 0.052 | 0.08 | 0.510 | 0.02 | 0.869 | 0.32 | 0.037 | 0.23 | 0.149 | 0.53 | 0.002 | 0.47 | 0.006 |
| Diabetes | 0.13 | 0.007 | 0.13 | 0.014 | 0.05 | 0.323 | 0.04 | 0.432 | 0.16 | 0.015 | 0.17 | 0.019 | 0.14 | 0.049 | 0.14 | 0.067 |
| Paternal history of lung cancer | 0.95 | 5.52 × 10−30 | 0.89 | 1.10 × 10−20 | 0.79 | 2.00 × 10−15 | 0.75 | 1.69 × 10−12 | 0.93 | 1.38 × 10−16 | 0.82 | 9.92 × 10−8 | 0.69 | 1.14 × 10−7 | 0.69 | 1.36 × 10−5 |
| Paternal history of chronic bronchitis/emphysema | 0.60 | 1.06 × 10−13 | 0.53 | 2.28 × 10−7 | 0.42 | 6.67 × 10−5 | 0.39 | 2.32 × 10−3 | 0.52 | 2.00 × 10−6 | 0.42 | 4.09 × 10−3 | 0.61 | 1.66 × 10−7 | 0.63 | 6.64 × 10−5 |
| Paternal age at death | −0.51 | 2.65 × 10−21 | −0.48 | 1.20 × 10−10 | −0.36 | 5.90 × 10−7 | −0.35 | 3.13 × 10−5 | −0.54 | 2.07 × 10−10 | −0.50 | 1.73 × 10−6 | −0.55 | 1.43 × 10−8 | −0.48 | 6.54 × 10−5 |
| Maternal history of chronic bronchitis/emphysema | 0.49 | 2.56 × 10−11 | 0.40 | 1.73 × 10−4 | 0.33 | 7.49 × 10−4 | 0.63 | 1.18 × 10−8 | 0.63 | 1.18 × 10−8 | 0.56 | 1.75 × 10−4 | 0.30 | 0.014 | 0.22 | 0.185 |
| Maternal age at death | −0.50 | 2.84 × 10−11 | −0.43 | 1.22 × 10−5 | −0.42 | 3.26 × 10−6 | −0.39 | 6.92 × 10−4 | −0.40 | 1.02 × 10−5 | −0.29 | 0.019 | −0.48 | 2.35 × 10−5 | −0.45 | 3.36 × 10−3 |
| Metabolic traits | | | | | | | | | | | | | | | | |
| HDL | −0.17 | 2.69 × 10−4 | −0.18 | 0.002 | −0.10 | 0.073 | −0.10 | 0.137 | −0.18 | 0.010 | −0.18 | 0.018 | −0.21 | 0.014 | −0.20 | 0.021 |
| Sodium in urine | 0.20 | 1.38 × 10−5 | 0.23 | 2.65 × 10−7 | 0.17 | 7.06 × 10−4 | 0.21 | 6.84 × 10−5 | 0.19 | 0.057 | 0.17 | 0.013 | 0.20 | 8.31 × 10−4 | 0.21 | 4.13 × 10−3 |
| Smoking behaviors | | | | | | | | | | | | | | | | |
| Current tobacco smoking | 0.61 | 8.22 × 10−25 | 0.60 | 2.32 × 10−32 | 0.43 | 2.17 × 10−15 | 0.43 | 2.11 × 10−12 | 0.67 | 9.44 × 10−18 | 0.62 | 9.37 × 10−11 | 0.54 | 1.31 × 10−14 | 0.52 | 9.54 × 10−9 |
| Former tobacco smoking | 0.40 | 1.52 × 10−7 | 0.39 | 1.05 × 10−5 | 0.36 | 1.43 × 10−5 | 0.39 | 4.08 × 10−5 | 0.43 | 1.32 × 10−5 | 0.38 | 3.90 × 10−3 | 0.27 | 0.008 | 0.22 | 0.089 |
| Maternal smoking around birth | 0.53 | 2.81 × 10−22 | 0.46 | 1.28 × 10−15 | 0.37 | 9.74 × 10−10 | 0.32 | 4.20 × 10−6 | 0.60 | 2.17 × 10−17 | 0.51 | 6.50 × 10−9 | 0.49 | 3.80 × 10−10 | 0.44 | 7.02 × 10−6 |
| Pack years adult smoking | 0.75 | 3.19 × 10−57 | 0.68 | 2.29 × 10−30 | 0.56 | 1.76 × 10−14 | 0.49 | 1.05 × 10−13 | 0.75 | 1.17 × 10−24 | 0.64 | 2.35 × 10−10 | 0.71 | 2.70 × 10−17 | 0.65 | 2.27 × 10−11 |
| Smoking/smokers in household | 0.48 | 3.42 × 10−6 | 0.50 | 1.21 × 10−4 | 0.35 | 0.001 | 0.37 | 4.89 × 10−3 | 0.55 | 2.58 × 10−5 | 0.50 | 4.48 × 10−3 | 0.38 | 0.008 | 0.40 | 0.051 |
| Time from waking to first cigarette | −0.73 | 2.29 × 10−9 | −0.73 | 1.29 × 10−4 | −0.44 | 0.002 | −0.41 | 0.040 | −0.77 | 8.59 × 10−6 | −0.72 | 4.12 × 10−3 | −0.96 | 6.11 × 10−7 | −1.14 | 9.23 × 10−4 |

Note: Bold indicates P ≤ 1.38 × 10−5.

Abbreviation: rg, genetic correlation.

Table 3.

Genetic correlations with lung cancer by smoking status, with inclusion/exclusion of genomic regions related to smoking behaviors.

Columns: rg1 and P1, inclusion of genomic regions related to smoking behaviors; rg2 and P2, exclusion of those regions.

| Phenotypic trait | Ever smokers rg1 | Ever smokers P1 | Ever smokers rg2 | Ever smokers P2 | Never smokers rg1 | Never smokers P1 | Never smokers rg2 | Never smokers P2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Allergy atopy | | | | | | | | |
| Eczema | −0.12 | 0.011 | −0.13 | 0.037 | −0.01 | 0.937 | −0.02 | 0.906 |
| Blood traits | | | | | | | | |
| High light scatter reticulocyte count | 0.17 | 6.99 × 10−5 | 0.19 | 8.59 × 10−5 | 0.16 | 0.328 | 0.10 | 0.402 |
| Lymphocyte count | 0.12 | 0.003 | 0.12 | 0.008 | 0.21 | 0.309 | 0.16 | 0.287 |
| White blood cell count | 0.17 | 1.23 × 10−5 | 0.18 | 2.43 × 10−4 | 0.32 | 0.246 | 0.22 | 0.190 |
| Medical condition and family history | | | | | | | | |
| Emphysema/chronic bronchitis | 0.43 | 1.64 × 10−6 | 0.43 | 5.81 × 10−4 | 0.48 | 0.272 | 0.42 | 0.231 |
| Forced expiratory volume in 1 second (FEV1) | −0.15 | 7.33 × 10−4 | −0.15 | 0.001 | −0.16 | 0.348 | −0.03 | 0.775 |
| Paternal history of lung cancer | 0.97 | 2.86 × 10−16 | 0.92 | 5.96 × 10−12 | 0.37 | 0.348 | 0.32 | 0.276 |
| Paternal history of chronic bronchitis/emphysema | 0.53 | 4.66 × 10−8 | 0.48 | 2.25 × 10−4 | 0.08 | 0.814 | 0.03 | 0.917 |
| Paternal age at death | −0.44 | 7.97 × 10−9 | −0.41 | 4.89 × 10−5 | −0.24 | 0.405 | −0.14 | 0.540 |
| Maternal history of chronic bronchitis/emphysema | 0.37 | 2.53 × 10−4 | 0.25 | 0.045 | 0.35 | 0.385 | 0.28 | 0.413 |
| Maternal age at death | −0.48 | 8.81 × 10−7 | −0.48 | 2.52 × 10−4 | −0.12 | 0.691 | 0.00 | 0.985 |
| Metabolic traits | | | | | | | | |
| HDL | −0.21 | 0.002 | −0.21 | 0.006 | 0.06 | 0.721 | 0.11 | 0.566 |
| Sodium in urine | 0.18 | 0.001 | 0.23 | 1.57 × 10−4 | 0.36 | 0.194 | 0.38 | 0.089 |
| Smoking behaviors | | | | | | | | |
| Current tobacco smoking | 0.48 | 5.90 × 10−12 | 0.48 | 4.55 × 10−11 | 0.59 | 0.186 | 0.55 | 0.078 |
| Former tobacco smoking | 0.21 | 0.015 | 0.20 | 0.062 | 0.91 | 0.197 | 0.76 | 0.107 |
| Maternal smoking around birth | 0.41 | 6.38 × 10−11 | 0.38 | 3.67 × 10−7 | 0.39 | 0.227 | 0.25 | 0.243 |
| Pack years adult smoking | 0.72 | 9.71 × 10−25 | 0.66 | 2.43 × 10−15 | 0.29 | 0.284 | 0.22 | 0.275 |
| Smoking/smokers in household | 0.42 | 0.001 | 0.50 | 0.001 | 0.35 | 0.384 | 0.31 | 0.368 |
| Time from waking to first cigarette | −0.75 | 4.47 × 10−7 | −0.80 | 0.001 | −0.24 | 0.557 | −0.24 | 0.560 |

Note: Bold indicates P ≤ 1.38 × 10−5.

Considering both inclusion (rg1 and P1) and exclusion (rg2 and P2) of genomic regions associated with smoking behaviors, emphysema/chronic bronchitis (rg1 = 0.52, P1 = 2.98 × 10−13; rg2 = 0.47, P2 = 3.91 × 10−7), lung function measured as forced expiratory volume in 1 second (FEV1; rg1 = −0.18, P1 = 1.19 × 10−7; rg2 = −0.10, P2 = 3.93 × 10−3), and sodium in urine (rg1 = 0.20, P1 = 1.38 × 10−5; rg2 = 0.23, P2 = 2.65 × 10−7) were the medical measurements most strongly genetically correlated with overall lung cancer risk.

Among hematologic traits, white blood cell count (WBC) and high light scatter reticulocyte count showed significant positive genetic correlations with lung cancer risk. For WBC, the correlation was observed overall (rg1 = 0.21, P1 = 6.72 × 10−9; rg2 = 0.21, P2 = 2.40 × 10−7) and across histologic subtypes (LUAD: rg1 = 0.16, P1 = 7.59 × 10−5; rg2 = 0.16, P2 = 1.74 × 10−4; LUSC: rg1 = 0.20, P1 = 1.25 × 10−5; rg2 = 0.18, P2 = 9.95 × 10−4; SCLC: rg1 = 0.22, P1 = 2.11 × 10−6; rg2 = 0.23, P2 = 3.62 × 10−4). Lymphocyte count showed a significant genetic correlation with SCLC before removing the genomic regions centered on the 473 SNPs related to smoking behaviors (rg1 = 0.19, P1 = 1.02 × 10−5; Table 2).

Among familial illnesses, having a father with lung cancer showed the strongest genetic correlation with lung cancer overall (rg1 = 0.95, P1 = 5.52 × 10−30; rg2 = 0.89, P2 = 1.10 × 10−20) and with all lung histologic subtypes. Parental diagnoses of chronic bronchitis or emphysema were also positively associated, with a consistent direction of genetic correlation in lung cancer and across histologic subtypes (rg1_Father = 0.60, P1_Father = 1.06 × 10−13; rg1_Mother = 0.49, P1_Mother = 2.56 × 10−11; rg2_Father = 0.53, P2_Father = 2.28 × 10−7; rg2_Mother = 0.40, P2_Mother = 1.73 × 10−4). Parental longevity was strongly negatively correlated with lung cancer (rg1_Father = −0.51, P1_Father = 2.65 × 10−21; rg1_Mother = −0.50, P1_Mother = 2.84 × 10−11; rg2_Father = −0.48, P2_Father = 1.20 × 10−10; rg2_Mother = −0.43, P2_Mother = 1.22 × 10−5).

Among smoking behaviors, pack years of adult smoking showed a strongly significant positive correlation with lung cancer susceptibility (rg1 = 0.75, P1 = 3.19 × 10−57; rg2 = 0.68, P2 = 2.29 × 10−30), with LUAD (rg1 = 0.56, P1 = 1.76 × 10−14; rg2 = 0.49, P2 = 1.05 × 10−13), with LUSC (rg1 = 0.75, P1 = 1.17 × 10−24; rg2 = 0.64, P2 = 2.35 × 10−10), and with SCLC (rg1 = 0.71, P1 = 2.70 × 10−17; rg2 = 0.65, P2 = 2.27 × 10−11). Current tobacco smoking showed a stronger positive genetic correlation with lung cancer (rg1 = 0.61, P1 = 8.22 × 10−25; rg2 = 0.60, P2 = 2.32 × 10−32), with LUAD (rg1 = 0.43, P1 = 2.17 × 10−15; rg2 = 0.43, P2 = 2.11 × 10−12), with LUSC (rg1 = 0.67, P1 = 9.44 × 10−18; rg2 = 0.62, P2 = 9.37 × 10−11), and with SCLC (rg1 = 0.54, P1 = 1.31 × 10−14; rg2 = 0.52, P2 = 9.54 × 10−9) than did former tobacco smoking with lung cancer (rg1 = 0.40, P1 = 1.52 × 10−7; rg2 = 0.39, P2 = 1.05 × 10−5), with LUAD (rg1 = 0.36, P1 = 1.43 × 10−5; rg2 = 0.39, P2 = 4.08 × 10−5), with LUSC (rg1 = 0.43, P1 = 1.32 × 10−5; rg2 = 0.38, P2 = 3.90 × 10−3), and with SCLC (rg1 = 0.27, P1 = 0.008; rg2 = 0.22, P2 = 0.089). The genetic architecture of maternal smoking around birth was also strongly associated with lung cancer risk (rg1 = 0.53, P1 = 2.81 × 10−22; rg2 = 0.54, P2 = 3.96 × 10−23), with LUAD (rg1 = 0.37, P1 = 9.74 × 10−10; rg2 = 0.36, P2 = 1.00 × 10−7), with LUSC (rg1 = 0.60, P1 = 2.17 × 10−17; rg2 = 0.61, P2 = 1.21 × 10−15), and with SCLC (rg1 = 0.49, P1 = 3.80 × 10−10; rg2 = 0.49, P2 = 3.68 × 10−10). Smoking/smokers in household showed a significant genetic correlation with lung cancer risk only without exclusion of genomic regions associated with smoking behaviors (rg1 = 0.48, P1 = 3.42 × 10−6).

We also examined the pairwise genetic relationship between lung cancer and various traits of interest among ever- and never-smokers. As shown in Table 3, the strongest genetic correlation with lung cancer was observed with pack-years of smoking among ever-smokers (rg1 = 0.72, P1 = 9.71 × 10−25; rg2 = 0.66, P2 = 2.43 × 10−15). Among hematologic traits, WBC showed a significant positive correlation with lung cancer susceptibility among ever-smokers (rg1 = 0.17, P1 = 1.23 × 10−5). Paternal history of lung cancer showed a very strong genetic correlation with lung cancer risk among ever-smokers (rg1 = 0.97, P1 = 2.86 × 10−16; rg2 = 0.92, P2 = 5.96 × 10−12). Parental longevity also showed a strong inverse genetic correlation with lung cancer susceptibility among ever-smokers (rg1_Father = −0.44, P1_Father = 7.97 × 10−9; rg1_Mother = −0.48, P1_Mother = 8.81 × 10−7).

For never-smokers, there were no significant genetic correlations between lung cancer and the polygenic traits examined, whether we included or excluded genomic regions related to smoking behaviors. This may reflect the reduced sample size for analyses of never-smokers, or it may indicate that lung cancer in never-smokers is more likely related to rare variants than to the common variants of small effect on which shared-heritability analyses using LDSR rely.

We examined the patterns of genetic overlap between lung cancer and a variety of complex phenotypes using SNP-trait GWAS meta-analysis summary-level data acquired from UKBB (27), other publicly available GWAS data sources (33–36), and the TRICL-Oncoarray Lung Consortium, which provides the largest GWAS of lung cancer conducted so far in European-descent populations (6). Cross-trait LDSR using GWAS summary statistics enabled us to estimate genetic correlations between pairs of traits and thereby gain insights into common etiologies (25, 26). We identified new phenotypic associations between different traits and lung cancer, including in analyses stratified by lung histologic subtype, and including associations primarily mediated by the genetics of smoking behaviors. The findings of this study enable us to confirm previously known associations and to identify polygenic traits for further study using complementary approaches, such as Mendelian randomization analyses, to identify new causal relationships between lung cancer and related traits.
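
For readers less familiar with the method, the cross-trait LD score regression model of ref. 25 can be summarized as below. This is a reference sketch rather than a description of the exact estimation settings used in this study, and the symbols are introduced here only for illustration.

```latex
\mathbb{E}\left[ z_{1j}\, z_{2j} \right]
  = \frac{\sqrt{N_1 N_2}\;\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^{2}\, h_2^{2}}}
```

Here z_{1j} and z_{2j} are the GWAS z-scores of SNP j for the two traits, N_1 and N_2 are the study sample sizes, M is the number of SNPs, ℓ_j is the LD score of SNP j, ρ_g is the genetic covariance, ρ is the phenotypic correlation among the N_s samples shared by the two studies, and h_1^2 and h_2^2 are the SNP-based heritabilities. The slope of the regression on ℓ_j estimates the genetic covariance, whereas sample overlap affects only the intercept.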

We observed a significant positive correlation between the genetic architecture of emphysema/chronic bronchitis and lung cancer susceptibility in LDSR analyses, while higher FEV1 was negatively (i.e., protectively) associated with lung cancer (48). COPD and chronic bronchitis (airflow obstruction) are endophenotypes of emphysema, which is a well-established comorbidity that often precedes lung cancer diagnosis (49). Chronic inflammation is a key feature in the development of both emphysema and bronchitis and could potentially contribute to lung cancer development, but their shared association with smoking history makes disentangling such effects challenging.

Most traits related to smoking behaviors increased the risk of lung cancer development among ever-smokers, including the length of time a person waits to smoke their first cigarette after waking. Our study also observed a significant positive correlation between the genomic architecture of in utero tobacco smoke exposure and that of lung cancer, suggesting that individuals who smoke during pregnancy could contribute to risk in their offspring. Infants whose mothers smoke during pregnancy may have reduced birth weight, birth defects, and weaker lungs than those whose mothers do not smoke, and this has been associated with numerous negative health outcomes in later life (50, 51). However, maternal smoking behaviors may also serve as a proxy for an individual's own genetic predisposition to engage in smoking, suggesting that additional smoking-associated loci that we were not able to account for may underlie the observed association.

Several blood traits showed positive genetic correlations with lung cancer overall and across all lung histologic subtypes. UKBB does not delineate white blood cell types among participants, limiting the observations that can be made with the available data. Overall, genetic correlations with increased lung cancer risk were observed for increased lymphocyte counts and a higher WBC. After excluding smoking-associated genomic loci, WBC remained associated with lung cancer. These findings were consistent across lung histologic subtypes and among ever-smokers. Further studies delineating risks by white blood cell subtypes are needed. Reticulocyte measures were also associated with a higher risk of lung cancer. The reticulocyte count measures how quickly reticulocytes made by the bone marrow are released into the blood. Reticulocyte count can be increased by diseases that prematurely destroy red blood cells, such as hemolytic anemia. The reticulocyte count is one of the primary parameters used in the initial classification of anemia, which co-occurs with lung cancer at baseline and is often exacerbated after chemotherapy treatment (52). The prevalence of anemia in patients with lung cancer has been reported as 37.6%, and its incidence after chemotherapy treatment as 80% (53).

We also identified several suggestive polygenic phenotypes for which the genetic correlations were nominally significant (P < 0.05). We observed a nominally significant inverse genetic correlation between HDL levels and lung cancer, including histology-specific subtypes and lung cancer among ever-smokers; HDL was also negatively correlated with pancreatic cancer (37). Low HDL and high lymphocyte and WBC counts have been strongly associated with insulin resistance, which has been positively associated with cancer development (54). We observed a positive association between diabetes and lung cancer, with attenuation of this signal after exclusion of smoking-associated genomic regions from the analysis. Lung cancer, across all histologic subtypes and among ever-smokers, showed a negative genetic correlation with eczema at a nominal significance level of 0.05, a pattern not seen for breast, colorectal, head/neck, ovary, and prostate cancers (38). Eczema is an umbrella term for several types of dermatitis, most commonly atopic dermatitis, which is associated with increased serum IgE. Elevated serum IgE has also been shown to play a role in cancer immunosurveillance (55). Eczema (i.e., atopic dermatitis) is characterized by a Th2-dominated immune response both in the skin and in circulation. Associations between cancer and atopic conditions, including eczema, have been examined previously, and atopic conditions were found to be protective against leukemia, melanoma, and cancers of the lung, brain, pancreas, and colon (55, 56). We observed a shared effect with another autoimmune condition, celiac disease, which is a digestive disorder characterized by an abnormal reaction to gluten. Celiac disease showed an inverse genetic correlation with lung cancer and LUSC and a positive genetic correlation with prostate cancer (38) at the nominal significance level of 0.05. One possible explanation is that patients with celiac disease might be less likely to become smokers.
However, we observed a consistent inverse genetic correlation between celiac disease and lung cancer across all subtypes, even after excluding regions associated with smoking behaviors (57). When stratified by smoking status, we observed an inverse correlation in ever-smokers and a positive correlation in never-smokers, although neither reached statistical significance.

All phenotypic traits related to smoking behaviors showed strong genetic correlations with the risk of developing lung cancer among ever-smokers. Even after the removal of chromosomal regions associated with smoking behaviors, we still observed significant positive correlations with lung cancer overall, across all subtypes, and among ever-smokers. This suggests that additional genetic loci of small effect may influence smoking behaviors, and therefore lung cancer risk, but have not yet been identified by GWAS. It also suggests that excluding the known smoking-associated regions cannot fully account for the contribution of smoking-associated genomic variation. However, GWAS of smoking behaviors have included more than 500,000 individuals, so the genetic architecture of smoking has been assessed comprehensively, and we would anticipate that much of the genetic variation contributing to smoking behavior was removed by our approach (i.e., removing a 500-kb region around each of the known smoking-related regions). The limited change in cross-trait correlation estimates in this analysis suggests that the cross-heritability analyses we conducted may be robust to possible confounding by smoking behavior.
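
The region-exclusion step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the example loci, and the SNP identifiers are hypothetical, and the text does not specify whether the 500-kb region is a total width or a flank on each side, so the sketch exposes this as a `flank_bp` parameter (here 250 kb per side, giving a 500-kb total window, as an assumption).

```python
def build_exclusion_intervals(lead_loci, flank_bp=250_000):
    """Build merged (start, end) windows per chromosome around each lead locus.

    lead_loci: iterable of (chromosome, position) pairs (hypothetical input).
    flank_bp:  bases excluded on each side of a locus (assumed value).
    """
    by_chrom = {}
    for chrom, pos in lead_loci:
        window = (max(0, pos - flank_bp), pos + flank_bp)
        by_chrom.setdefault(chrom, []).append(window)

    merged = {}
    for chrom, windows in by_chrom.items():
        windows.sort()
        out = [windows[0]]
        for start, end in windows[1:]:
            if start <= out[-1][1]:
                # Overlapping windows are merged into one interval.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def filter_snps(snps, intervals):
    """Keep SNPs (chromosome, position, id) that fall outside every window."""
    kept = []
    for chrom, pos, snp_id in snps:
        inside = any(start <= pos <= end for start, end in intervals.get(chrom, []))
        if not inside:
            kept.append((chrom, pos, snp_id))
    return kept


# Hypothetical lead loci and SNPs, for illustration only.
example_loci = [("15", 78_000_000), ("15", 78_200_000), ("19", 41_000_000)]
example_snps = [
    ("15", 78_100_000, "snpA"),  # inside a chromosome 15 window
    ("15", 79_000_000, "snpB"),  # outside all windows
    ("19", 41_100_000, "snpC"),  # inside the chromosome 19 window
    ("1", 1_000, "snpD"),        # chromosome without any window
]
example_intervals = build_exclusion_intervals(example_loci)
example_kept = filter_snps(example_snps, example_intervals)
```

The filtered summary statistics would then be passed to the genetic correlation analysis in place of the full set, which is what allows the estimates with and without smoking-associated regions to be compared.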

Large-scale datasets, in which multiple phenotypes are assayed and surveyed in tens or hundreds of thousands of individuals, have become increasingly available to genomic researchers. However, the availability of individual-level genotype data remains more limited. We performed cross-trait LDSR analyses to decipher genetic correlations between lung cancer and various complex traits and diseases using GWAS summary statistics. The LDSR approach provides an improved understanding of the shared genetic architecture between traits, which can serve as potential surrogates for polygenic variants involved in lung carcinogenesis. LDSR does not require individual-level data, genome-wide significant SNPs, or LD pruning, which otherwise leads to loss of information when causal SNPs are in LD. A particular advantage is that measuring multiple traits within the same individuals is not required, allowing one to assess the presence of genetic correlation between risk for lung cancer and other traits using summary statistics and known LD patterns. Genetic correlation analysis has unique strengths in leveraging information from the entire genome to evaluate correlations between a disease of interest, like lung cancer, and multiple traits that have been studied in unrelated populations. However, LDSR is not without limitations. One important assumption is that population stratification is not present in the underlying GWAS summary statistics, which we addressed by using summary statistics only from studies that employed ancestry-adjusted regression analyses. In addition, the genetic architecture of subjects must be similar across the two traits being compared; otherwise, estimates will be biased, generally toward the null. We restricted our analyses to individuals of European descent and included SNPs that were imputed using the common 1000 Genomes imputation platform.
Furthermore, GWAS summary statistics based on small sample sizes or for traits with low SNP-based heritability are not amenable to LDSR analysis, limiting the breadth of phenotypes that can be compared with a trait of interest. Many nonsignificant associations in our data could be due to limited statistical power rather than a lack of shared heritability, as cross-trait LDSR requires larger GWAS sample sizes to achieve an SE equivalent to that of methods using individual-level data (25). Finally, LDSR currently relies on analysis of common genetic variants with MAF > 0.01 and therefore fails to capture shared heritability due to underlying rare variants, which have previously been reported in association with lung cancer (58).
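
The dependence of significance on the SE can be made concrete with a short Python sketch. It assumes a two-sided normal-approximation test of the null hypothesis rg = 0; the function name and the input values are hypothetical and are not results from this study.

```python
import math


def rg_z_and_p(rg, se):
    """Return the z-score and two-sided P value for H0: rg = 0.

    Assumes the estimate is approximately normally distributed with
    standard error `se` (an illustrative simplification).
    """
    if se <= 0:
        raise ValueError("se must be positive")
    z = rg / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Same hypothetical point estimate, two hypothetical standard errors.
z_large, p_large = rg_z_and_p(0.30, 0.05)  # well-powered pair of GWAS
z_small, p_small = rg_z_and_p(0.30, 0.20)  # underpowered pair of GWAS
```

With an identical point estimate, the larger SE yields a P value above the nominal 0.05 threshold, which is why a nonsignificant result from small GWAS does not by itself indicate absence of shared heritability.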

LDSR analysis confirmed previously identified traits, such as smoking and family history, that are positively genetically correlated with lung cancer, implying a shared genetic background. This study also showed inverse (protective) associations between the genetic architecture of lung cancer and that of both eczema and celiac disease, which persisted when smoking behavior–associated genomic loci were excluded from analyses. In addition, the persistent positive correlation between smoking and lung cancer in our data after accounting for known common loci indicates that additional smoking-associated loci may yet be identified. Analyses among never-smokers failed to identify notable associations, suggesting that lung cancer in these patients might be caused by rarer variants. Future studies using Mendelian randomization or genetic instrumental variable approaches may further elucidate the causal relationship between the risk of lung cancer and other phenotypic traits of interest and reveal surrogate biomarkers with shared genetic backgrounds.

C.I. Amos reports grants from NCI and grants from Cancer Prevention Research Institute of Texas during the conduct of the study. No disclosures were reported by the other authors.

J. Byun: Conceptualization, formal analysis, supervision, investigation, methodology, writing–original draft, writing–review and editing. Y. Han: Conceptualization, data curation, software, formal analysis, investigation, visualization, methodology, writing–review and editing. Q.T. Ostrom: Conceptualization, data curation, software, formal analysis, investigation, methodology, writing–review and editing. J. Edelson: Conceptualization, data curation, software, formal analysis, writing–review and editing. K.M. Walsh: Conceptualization, formal analysis, investigation, methodology, writing–review and editing. R.W. Pettit: Formal analysis, writing–review and editing. M.L. Bondy: Conceptualization, investigation, methodology, writing–review and editing. R.J. Hung: Resources, writing–review and editing. J.D. McKay: Resources, methodology, writing–review and editing. C.I. Amos: Conceptualization, supervision, funding acquisition, investigation, methodology, writing–review and editing.

The authors would like to thank all members of the Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Team of the International Lung Cancer Consortium (ILCCO) for providing summary results data for lung cancer. INTEGRAL-ILCCO acknowledges the following contributing investigators: Demetrius Albanes, Stephan Lam, Adonina Tardon, Chu Chen, Gary Goodman, Stig E. Bojesen, Maria Teresa Landi, Mattias Johansson, Angela Risch, H-Erich Wichmann, Heike Bickeboller, David C. Christiani, Gadi Rennert, Susanne Arnold, Paul Brennan, John K. Field, Sanjay Shete, Loic Le Marchand, Olle Melander, Hans Brunnström, Geoffrey Liu, Angeline Andrew, Lambertus A. Kiemeney, Hongbing Shen, Shan Zienolddiny, Kjell Grankvist, Mikael Johansson, Neil Caporaso, Penella Woll, Richard Houlston, Ying Wang, M. Dawn Teare, Yun-Chul Hong, Jian-Min Yuan, Philip Lazarus, Matthew B. Schabath, Melinda C. Aldrich. C.I. Amos is an Established Research Scholar of the Cancer Prevention Research Institute of Texas. R.W. Pettit would like to thank the Baylor Research Advocates for Student Scientists for their funding support. This work was supported by the following: Cancer Prevention Research Institute of Texas (CPRIT) award RR170048 (to C.I. Amos, J. Byun); NIH award for the INTEGRAL consortium U19CA203654 (to C.I. Amos, J. Byun, Y. Han, J. Edelson, R.J. Hung, J.D. McKay); Distinguished Scientist award from the Sontag Foundation (to K.M. Walsh); Research Training Grant from the Cancer Prevention and Research Institute of Texas RP160097T (to Q.T. Ostrom); NIH award R01CA139020 (to M.L. Bondy); and the Training In Precision Environmental Health Sciences (TPEHS) Program (NIH grant no. T32ES027801; to R.W. Pettit).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Bosse Y, Amos CI. A decade of GWAS results in lung cancer. Cancer Epidemiol Biomarkers Prev 2018;27:363–79.
2. Bailey-Wilson JE, Amos CI, Pinney SM, Petersen GM, de Andrade M, Wiest JS, et al. A major lung cancer susceptibility locus maps to chromosome 6q23–25. Am J Hum Genet 2004;75:460–74.
3. Liu P, Vikis HG, Wang D, Lu Y, Wang Y, Schwartz AG, et al. Familial aggregation of common sequence variants on 15q24–25.1 in lung cancer. J Natl Cancer Inst 2008;100:1326–30.
4. Maher B. Personal genomes: the case of the missing heritability. Nature 2008;456:18–21.
5. Gorlov IP, Gorlova OY, Sunyaev SR, Spitz MR, Amos CI. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet 2008;82:100–12.
6. McKay JD, Hung RJ, Han Y, Zong X, Carreras-Torres R, Christiani DC, et al. Large-scale association analysis identifies new lung cancer susceptibility loci and heterogeneity in genetic susceptibility across histological subtypes. Nat Genet 2017;49:1126–32.
7. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci U S A 2012;109:1193–8.
8. Amos CI, Wu X, Broderick P, Gorlov IP, Gu J, Eisen T, et al. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet 2008;40:616–22.
9. Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, Magnusson KP, et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 2008;452:638–42.
10. Truong T, Hung RJ, Amos CI, Wu X, Bickeboller H, Rosenberger A, et al. Replication of lung cancer susceptibility loci at chromosomes 15q25, 5p15, and 6p21: a pooled analysis from the international lung cancer consortium. J Natl Cancer Inst 2010;102:959–71.
11. Chen LS, Kaphingst KA. Risk perceptions and family history of lung cancer: differences by smoking status. Public Health Genomics 2011;14:26–34.
12. Zou P, Gu A, Ji G, Zhao L, Zhao P, Lu A. The TERT rs2736100 polymorphism and cancer risk: a meta-analysis based on 25 case-control studies. BMC Cancer 2012;12:7.
13. Beckett WS. Epidemiology and etiology of lung cancer. Clin Chest Med 1993;14:1–15.
14. Schwartz AG. Genetic predisposition to lung cancer. Chest 2004;125:86S–9S.
15. Galvan A, Ioannidis JP, Dragani TA. Beyond genome-wide association studies: genetic heterogeneity and individual predisposition to cancer. Trends Genet 2010;26:132–41.
16. Wunsch-Filho V, Boffetta P, Colin D, Moncau JE. Familial cancer aggregation and the risk of lung cancer. Sao Paulo Med J 2002;120:38–44.
17. Yang IA, Holloway JW, Fong KM. Genetic susceptibility to lung cancer and co-morbidities. J Thorac Dis 2013;5:S454–62.
18. Tse LA, Yu IT, Rothman N, Ji BT, Qiu H, Wang XR, et al. Joint effects of environmental exposures and familial susceptibility to lung cancer in Chinese never smoking men and women. J Thorac Oncol 2014;9:1066–72.
19. Young RP, Hopkins RJ. Chronic obstructive pulmonary disease (COPD) and lung cancer screening. Transl Lung Cancer Res 2018;7:347–60.
20. Wei S, Chen F, Liu R, Fu D, Wang Y, Zhang B, et al. Outcomes of lobectomy on pulmonary function for early stage non-small cell lung cancer (NSCLC) patients with chronic obstructive pulmonary disease (COPD). Thorac Cancer 2020;11:1784–9.
21. Tradigo G, Vacca R, Manini T, Bird V, Gerke T, Veltri P, et al. A new approach to disentangle genetic and epigenetic components on disease comorbidities: studying correlation between genotypic and phenotypic disease networks. Procedia Comput Sci 2017;110:453–8.
22. Rubio-Perez C, Guney E, Aguilar D, Pinero J, Garcia-Garcia J, Iadarola B, et al. Genetic and functional characterization of disease associations explains comorbidity. Sci Rep 2017;7:6207.
23. Wild S, Roglic G, Green A, Sicree R, King H. Global prevalence of diabetes: estimates for the year 2000 and projections for 2030. Diabetes Care 2004;27:1047–53.
24. Leduc C, Antoni D, Charloux A, Falcoz P-E, Quoix E. Comorbidities in the management of patients with lung cancer. Eur Respir J 2017;49:1601721.
25. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 2015;47:1236–41.
26. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015;47:291–5.
27. Abbott L, Anttila V, Aragam K, et al. Neale Lab - UK Biobank GWAS 2018; http://www.nealelab.is/uk-biobank/.
28. Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 2015;12:e1001779.
29. Liu M, Jiang Y, Wedow R, Li Y, Brazel DM, Chen F, et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet 2019;51:237–44.
30. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019;47:D1005–D12.
31. Amos CI, Dennis J, Wang Z, Byun J, Schumacher FR, Gayther SA, et al. The OncoArray consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol Biomarkers Prev 2017;26:126–35.
32. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018;562:203–9.
33. Kettunen J, Demirkan A, Wurtz P, Draisma HH, Haller T, Rawal R, et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat Commun 2016;7:11122.
34. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 2010;466:707–13.
35. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, Kanoni S, et al. Discovery and refinement of loci associated with lipid levels. Nat Genet 2013;45:1274–83.
36. Dubois PC, Trynka G, Franke L, Hunt KA, Romanos J, Curtotti A, et al. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet 2010;42:295–302.
37. Lindstrom S, Finucane H, Bulik-Sullivan B, Schumacher FR, Amos CI, Hung RJ, et al. Quantifying the genetic correlation between multiple cancer types. Cancer Epidemiol Biomarkers Prev 2017;26:1427–35.
38. Jiang X, Finucane HK, Schumacher FR, Schmit SL, Tyrer JP, Han Y, et al. Shared heritability and functional enrichment across six solid cancers. Nat Commun 2019;10:431.
39. Byun J, Han Y, Gorlov IP, Busam JA, Seldin MF, Amos CI. Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure. BMC Genomics 2017;18:789.
40. Peto R, Darby S, Deo H, Silcocks P, Whitley E, Doll R. Smoking, smoking cessation, and lung cancer in the UK since 1950: combination of national statistics with two case-control studies. BMJ 2000;321:323–9.
41. Doll R, Peto R, Wheatley K, Gray R, Sutherland I. Mortality in relation to smoking: 40 years' observations on male British doctors. BMJ 1994;309:901–11.
42. Darby S, Whitley E, Doll R, Key T, Silcocks P. Diet, smoking and lung cancer: a case-control study of 1000 cases and 1500 controls in South-West England. Br J Cancer 2001;84:728–35.
43. Zhang LR, Morgenstern H, Greenland S, Chang SC, Lazarus P, Teare MD, et al. Cannabis smoking and lung cancer risk: pooled analysis in the international lung cancer consortium. Int J Cancer 2015;136:894–903.
44. National Center for Chronic Disease Prevention and Health Promotion Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Atlanta, GA: Centers for Disease Control and Prevention (US); 2014.
45. Morel C, Fattore L, Pons S, Hay YA, Marti F, Lambolez B, et al. Nicotine consumption is regulated by a human polymorphism in dopamine neurons. Mol Psychiatry 2014;19:930–6.
46. Chen LS, Saccone NL, Culverhouse RC, Bracci PM, Chen CH, Dueker N, et al. Smoking and genetic risk variation across populations of European, Asian, and African American ancestry–a meta-analysis of chromosome 15q25. Genet Epidemiol 2012;36:340–51.
47. Thorgeirsson TE, Stefansson K. Genetics of smoking behavior and its consequences: the role of nicotinic acetylcholine receptors. Biol Psychiatry 2008;64:919–21.
48. Kang HS, Shin AY, Yeo CD, Kim JS, Kim YH, Kim JW, et al. A lower level of forced expiratory volume in one second predicts the poor prognosis of small cell lung cancer. J Thorac Dis 2018;10:2179–85.
49. Sekine Y, Katsura H, Koh E, Hiroshima K, Fujisawa T. Early detection of COPD is important for lung cancer surveillance. Eur Respir J 2012;39:1230–40.
50. Welch HG, Black WC. Overdiagnosis in cancer. J Natl Cancer Inst 2010;102:605–13.
51. Spindel ER, McEvoy CT. The role of nicotine in the effects of maternal smoking during pregnancy on lung development and childhood respiratory disease. Implications for dangers of E-cigarettes. Am J Respir Crit Care Med 2016;193:486–94.
52. Crawford J, Kosmidis PA, Hirsch FR, Langer CJ. Targeting anemia in patients with lung cancer. J Thorac Oncol 2006;1:716–25.
53. Souilah S, Dermech N, Benbetka Y, Djami N, Khennouf K, Amrane R, et al. Anemia during lung cancer. Eur Respir J 2018;52:PA2822.
54. Caporaso NE, Jones RR, Stolzenberg-Solomon RZ, Medgyesi DN, Kahle LL, Graubard BI. Insulin resistance in healthy U.S. adults: findings from the national health and nutrition examination survey (NHANES). Cancer Epidemiol Biomarkers Prev 2020;29:157–68.
55. Kantor ED, Hsu M, Du M, Signorello LB. Allergies and Asthma in relation to cancer risk. Cancer Epidemiol Biomarkers Prev 2019;28:1395–403.
56. Wang G, Cao C, Yu Q, Tian B, Chiu FH, Xu Z. Atopic diseases correlated with the incidence of cancer. Chemotherapy 2017;6.
57. Ludvigsson JF, West J, Hubbard R, Card T. Neutral risk of lung cancer in adults with celiac disease–nationwide cohort study. Lung Cancer 2012;78:179–84.
58. Wang Y, McKay JD, Rafnar T, Wang Z, Timofeeva MN, Broderick P, et al. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat Genet 2014;46:736–41.