Abstract
Prior genome-wide association studies have identified numerous lung cancer risk loci and revealed substantial etiologic heterogeneity across histologic subtypes. Analyzing the shared genetic architecture underlying variation in complex traits can elucidate common genetic etiologies across phenotypes. Exploring pairwise genetic correlations between lung cancer and other polygenic traits can therefore reveal the common genetic etiology of correlated phenotypes.
Using cross-trait linkage disequilibrium score regression, we estimated the pairwise genetic correlation and heritability between lung cancer and multiple traits using publicly available summary statistics. Identified genetic relationships were also examined after excluding genomic regions known to be associated with smoking behaviors, a major risk factor for lung cancer.
Several traits showed moderate single-nucleotide polymorphism–based heritability and significant genetic correlations with lung cancer. The genetic architectures of lung cancer and emphysema/chronic bronchitis were highly significantly correlated across all histologic subtypes, as well as for lung cancer occurring in smokers. Lung cancer was also highly significantly positively correlated with paternal history of lung cancer and strongly negatively correlated with parental longevity. The directions of these genetic correlations were consistent after excluding genomic regions associated with smoking behaviors.
This study identifies numerous phenotypic traits that share genomic architecture with lung carcinogenesis and are not fully accounted for by known smoking-associated genomic loci.
These findings provide new insights into the etiology of lung cancer by identifying traits that are genetically correlated with increased risk of lung cancer.
Introduction
Over the past decade, genome-wide association studies (GWAS) assessing millions of single-nucleotide polymorphisms (SNPs) have identified common genetic variants that influence complex diseases and have proven useful in elucidating the heritability of complex traits (1). GWAS have found several loci that predispose to lung cancer across all histologic subtypes, but many more are histology-specific (2, 3). Moreover, many of the loci that predispose primarily to overall lung cancer are smoking-related, whereas the histology-specific loci do not show this strong association with smoking behaviors (1). Because histologic subtypes have distinct genetic architectures and an important proportion of trait heritability may be explained by variants of small genetic effects (4, 5), large overall sample sizes are needed. We have curated large datasets with harmonized histologic definitions, facilitating studies to evaluate the histology-specific and overall genetic architectures of lung cancer (6).
Lung cancer is a multifactorial disease driven by germline genetic variation, environmental risk exposures (particularly cigarette smoking), and an accumulation of somatic genetic events. Like other common diseases, much of the heritability of lung cancer remains unexplained (7). Possible explanations for this missing heritability are that heritability estimates of lung cancer vary by histology or are attenuated by smoking behaviors (8–11), which vary over time and confer the preponderance of risk for most individuals. Approximately 80% of lung cancer–related deaths are related to smoking, and this proportion is even higher for small-cell lung cancer (SCLC), which is rare among never-smokers (12). Despite the strong causal association between smoking and lung cancer incidence, there remains a substantial population of never-smokers who develop lung cancer, with about 10% of lung cancer cases in European-ancestry populations reporting that they are nonsmokers (13). This suggests that interindividual variation in lung cancer susceptibility results from both environmental exposures and genetic predisposition (14). In addition to smoking behaviors, other associated factors include chronic obstructive pulmonary disease (COPD), dietary behaviors, and exposures to lung carcinogens such as metals from smelting, cooking emissions, atmospheric pollution, and residential radon (15–20). While exposure to environmental toxicants lacks a heritable component, many behavioral, medical, and physiologic measures have substantial heritability, allowing further exploration of the shared genetic architecture between these traits and lung cancer risk.
Chronic diseases often co-occur with other medical disorders in the same individual, so-called comorbid conditions. Understanding the genetic architecture underlying co-development of diseases may be more informative than studying individual phenotypes in isolation (21, 22). The contribution of shared genetic underpinnings to comorbid conditions has been inadequately studied. However, comorbid conditions described in the medical literature may help to identify persons at increased risk (19). The most common comorbidity related to lung cancer is COPD, the umbrella term that includes patients with emphysema or chronic bronchitis. Both disorders share smoking as an underlying causal risk factor (19). Type II diabetes, which is associated with immunosuppression and vascular complications, is a significant public health problem (23). About 8%–18% of patients with cancer also have diabetes (24). Lung cancer and diabetes share common risk factors such as age, diet, and smoking (24). To tease apart the shared genetic contribution to these comorbid conditions and the extent to which they are mediated solely by smoking behaviors, we performed cross-trait linkage disequilibrium (LD) score regression (LDSR) analysis (25, 26) to reveal patterns of pairwise genetic correlation between lung cancer and multiple polygenic phenotypes using GWAS summary statistics from TRICL (Transdisciplinary Research In Cancer of the Lung)-Oncoarray lung data (6) and prior GWAS including the UK Biobank (UKBB; refs. 27, 28). Analyses were also stratified by lung cancer histology and smoking status, and chromosomal regions related to smoking behaviors (e.g., cigarettes per day) and nicotine dependence were removed to account for the potential mediating effects of smoking behaviors (29, 30).
Materials and Methods
Summary statistics of lung cancer meta-analysis and data availability
GWAS summary statistics (6) from the TRICL-Oncoarray consortium were used in this study, comprising 29,266 patients with lung cancer [11,273 lung adenocarcinoma (LUAD), 7,426 lung squamous cell carcinoma (LUSC), and 2,664 SCLC] and 56,450 controls, all of European ancestry (Supplementary Table S1). The Oncoarray samples were genotyped using the Illumina Oncoarray-500K BeadChip, and 517,482 SNPs remained for analysis after quality control. Individual-level genotypes were imputed to 10,439,017 SNPs using the 1000 Genomes Project Phase 3 panel (haplotype release October 2014; refs. 6, 31). Further details about the TRICL-Oncoarray studies and genotyping methods have been previously published (6, 31). The summary statistics for overall lung cancer and for all histologic subtype–based and smoking status–based subset analyses were obtained from a prior GWAS meta-analysis of lung cancer risk (6). Summary statistics from the meta-analysis of TRICL-Oncoarray lung cancer studies have been deposited at the database of Genotypes and Phenotypes (dbGAP) under study accessions phs000877.v1.p1 and phs001273.v3.p2.
Integration of existing summary-level GWAS data from UK Biobank and other resources
To estimate the genetic correlation between lung cancer and the phenotypic trait of interest, we harmonized publicly available GWAS summary-level datasets from UKBB (27, 28, 32), a large and detailed cohort study in the United Kingdom that enrolled over 500,000 adults who were aged 40–69 years when recruited in 2006–2010. The study has collected extensive and comprehensive phenotypic and genotypic details about participants, including data from medical record linkage, questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes (28). Summary statistics of imputed data were downloaded from UKBB (27). Autoimmune- and lipids-related traits were obtained from previously reported GWAS (33–36). The sample sizes and more details are shown in Supplementary Table S2.
Ethical statement
All participants provided informed consent to provide samples for DNA analysis, cancer status, and smoking behavior according to protocols that were evaluated by the Institutional Review Boards (or equivalent) of the contributing centers and according to prevailing rules including the Belmont Report or the Declaration of Helsinki, Council for International Organizations of Medical Science (CIOMS), the U.S. Common Rule, or other guiding principles. Ethics approval for the UKBB was provided by the UK National Health Service (NHS) Research Ethics Service North West (Research Ethics Committee approval number 11/NW/0382), and all participants provided written informed consent. For the other summary-level GWAS data, written informed consent was likewise obtained from all study participants contributing to those studies, with local Ethics Committee/Institutional Review Board approval. Full details have been reported elsewhere (28, 33–36).
Genetic heritability and pairwise genetic correlations
Genome-wide SNP-heritability and pairwise genetic correlation estimates were computed using GWAS summary statistics and linkage disequilibrium information through LDSR analyses (LD Score v1.0.1, https://github.com/bulik/ldsc). Briefly, LDSR regresses chi-square statistics from GWAS summary data on LD scores, which measure how much genetic variation each variant tags. By regressing the product of Z-scores, Z1Z2, from two polygenic, genetically correlated traits with Z-scores Z1 and Z2 onto the LD score for each SNP, the genetic covariance between the two traits can be obtained (37). Specifically, genetic covariance is estimated from the slope of this regression after accounting for GWAS sample size and the number of SNPs included in the LDSR analysis. LDSR then standardizes the genetic covariance to a genetic correlation by dividing it by the square root of the product of the SNP-heritabilities of the two traits being interrogated. Heritability is calculated simply as the genetic covariance of a trait with itself and may best be understood as the proportion of phenotypic variance in a trait that can be explained by SNP variation.
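The regressions described above can be written compactly. The following is a sketch in the notation commonly used for LD score regression (refs. 25, 26); the symbols are assumptions introduced here for illustration and do not appear in the text: N is the GWAS sample size, M the number of SNPs, ℓ_j the LD score of SNP j, a a term capturing confounding, ρ_g the genetic covariance, ρ the phenotypic correlation among the N_s overlapping samples, and r_g the genetic correlation.

```latex
\begin{align}
\mathbb{E}\left[\chi^2_j\right] &= \frac{N h^2}{M}\,\ell_j + N a + 1, \\
\mathbb{E}\left[z_{1j} z_{2j}\right] &= \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}, \\
r_g &= \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}.
\end{align}
```

The first line is the single-trait regression whose slope yields h2, the second is the cross-trait regression whose slope yields the genetic covariance, and the third is the standardization to a genetic correlation. Setting trait 1 equal to trait 2 in the second line recovers the first, which is why heritability can be read as the genetic covariance of a trait with itself.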
For LDSR, only common genetic variants with minor allele frequency (MAF) > 1% and imputation INFO score > 90% were used in this study, because the standard errors of LD scores for rarer or poorly imputed variants could be substantial. Multi-allelic SNPs and the MHC region (Chr6:26Mb-34Mb) were removed from GWAS summary statistics because of the complex and unusual LD and genetic architecture of the MHC region (38). These analyses were also restricted to HapMap3 SNPs with MAF above 5% in populations of European descent from the 1000 Genomes Project, which served as the reference for LD patterns (26). We performed LDSR to estimate pairwise genetic correlations between lung cancer and traits including medical conditions, family history, lipids, and smoking behaviors, as well as histology-specific pairwise correlations for LUAD, LUSC, and SCLC. We also used LDSR to estimate pairwise genetic correlations for all lung cancer stratified by smoking status: individuals who reported smoking fewer than 100 cigarettes in their lives (i.e., never-smokers) and individuals who had smoked 100 or more cigarettes in their lifetime (i.e., ever-smokers; ref. 6). While we assume that there is no mismatch between LD scores from the reference population and the target population used for GWAS, we also considered the scenario in which heterogeneous substructure in populations of European descent results in directional bias in the average LD score (39). Using LDSR with an intercept term protects against bias due to shared population stratification and sample overlap when estimating cross-trait genetic correlation (gcov_int) and against bias due to population stratification and cryptic relatedness when estimating observed-scale heritability (h2_int; ref. 25). The heritability intercept, h2_int, should be close to 1, and the genetic covariance intercept, gcov_int, should be less than one standard error away from zero.
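The variant filters described above can be sketched as a small predicate. This is a minimal illustration, not the pipeline used in the study: the `Snp` record and its field names are hypothetical, the thresholds are those stated in the text, and the MHC coordinates are taken as written (the genome build is not specified here).

```python
from dataclasses import dataclass

# Thresholds as stated in the text; everything else is illustrative.
MIN_MAF = 0.01    # keep variants with MAF > 1%
MIN_INFO = 0.90   # keep variants with imputation INFO > 90%
MHC_CHROM, MHC_START, MHC_END = 6, 26_000_000, 34_000_000  # Chr6:26Mb-34Mb


@dataclass(frozen=True)
class Snp:
    """Hypothetical per-variant record; field names are assumptions."""
    rsid: str
    chrom: int
    pos: int        # base-pair position
    maf: float      # minor allele frequency
    info: float     # imputation INFO score
    n_alleles: int  # 2 for a biallelic SNP


def passes_qc(snp: Snp) -> bool:
    """Return True if the variant survives the filters described above."""
    if snp.maf <= MIN_MAF or snp.info <= MIN_INFO:
        return False
    if snp.n_alleles != 2:  # drop multi-allelic sites
        return False
    in_mhc = snp.chrom == MHC_CHROM and MHC_START <= snp.pos <= MHC_END
    return not in_mhc


def filter_snps(snps):
    """Keep only variants that pass every filter, preserving input order."""
    return [s for s in snps if passes_qc(s)]
```

The HapMap3 restriction mentioned in the text is not shown; in practice it would be one more membership test against a list of HapMap3 rsIDs.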
To avoid bias due to population stratification and sample overlap between lung cancer summary data and other existing summary data, we performed LDSR analysis without constraining the intercept (unconstrained intercept model; ref. 26). After filtering, there were ∼1,120,000 SNPs available for analyses in lung cancer and across all histologic subtypes and smoking status (Supplementary Table S1). We ran LD Score v1.0.1 with the command "ldsc.py --rg lung1.sumstats.gz,trait1.sumstats.gz --ref-ld-chr eur_w_ld_chr/ --w-ld-chr eur_w_ld_chr/ --out lung1_trait1".
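When many lung cancer strata are paired with many traits, it can help to assemble this command programmatically. The sketch below only builds the argument list and does not run anything; the helper name is hypothetical, the file names and output prefix are illustrative, and the flags are the ones quoted in the command above.

```python
import shlex


def build_ldsc_rg_command(sumstats_a, sumstats_b, ld_dir, out_prefix):
    """Assemble the genetic-correlation command as an argument list.

    Hypothetical helper: it mirrors the command quoted in the text and
    leaves the intercept unconstrained by passing no intercept options.
    """
    return [
        "ldsc.py",
        "--rg", f"{sumstats_a},{sumstats_b}",  # comma-separated, no space
        "--ref-ld-chr", ld_dir,
        "--w-ld-chr", ld_dir,
        "--out", out_prefix,
    ]


cmd = build_ldsc_rg_command(
    "lung1.sumstats.gz", "trait1.sumstats.gz", "eur_w_ld_chr/", "lung1_trait1"
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run` (rather than a single shell string) avoids quoting problems if a file name contains spaces.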
Exclusion of genomic regions associated with smoking behaviors
Epidemiologic studies consistently show that smoking is the most prominent risk factor for lung cancer (40–42). Many studies of lung cancer have addressed the potential role of tobacco as a confounder in association studies (14, 43, 44). Because smoking behaviors are heritable traits (45–47) and are correlated with both other behaviors and lung cancer risk, we were concerned that associations between lung cancer and other traits revealed by LDSR could reflect a common etiology due to smoking (37, 38). To attenuate the contribution of smoking-related variants to these LDSR associations, chromosomal regions (±500 kb) centered on 473 SNPs associated with smoking traits such as cigarettes per day, smoking initiation, smoking cessation, initiation age of regular smoking, and nicotine dependence were excluded from lung cancer GWAS summary statistics in sensitivity analyses (29, 30). The 473 excluded SNPs and regions are listed in Supplementary Table S3.
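The region exclusion described above amounts to dropping every variant that lies within 500 kb of any smoking-associated lead SNP on the same chromosome. A minimal sketch follows, assuming summary statistics are held as dictionaries with hypothetical `chrom` and `pos` keys; the function names are illustrative and this is not the authors' code.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WINDOW_BP = 500_000  # +/- 500 kb around each lead SNP, as stated in the text


def build_index(lead_snps):
    """Map chromosome -> sorted lead-SNP positions.

    lead_snps: iterable of (chrom, pos) tuples.
    """
    index = defaultdict(list)
    for chrom, pos in lead_snps:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return dict(index)


def in_excluded_region(chrom, pos, index, window=WINDOW_BP):
    """True if any lead SNP on this chromosome lies within `window` bp."""
    positions = index.get(chrom, [])
    lo = bisect_left(positions, pos - window)
    hi = bisect_right(positions, pos + window)
    return hi > lo  # at least one lead SNP in [pos - window, pos + window]


def exclude_regions(sumstats, lead_snps, window=WINDOW_BP):
    """Return the rows of `sumstats` that fall outside every window."""
    index = build_index(lead_snps)
    return [
        row for row in sumstats
        if not in_excluded_region(row["chrom"], row["pos"], index, window)
    ]
```

Sorting the lead positions once and using binary search keeps the filter fast even with millions of variants; the window boundaries are treated as inclusive here, which is a design choice rather than something the text specifies.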
Results
We examined the degree of overlap in genomic contributions to lung cancer and other polygenic phenotypes of interest based on the pairwise genetic correlation (rg) and the SNP-heritability (h2), representing the proportion of phenotypic variance explained by all SNPs, using cross-trait LDSR analysis. We also examined these genetic relationships after excluding genomic regions associated with smoking behaviors, a major risk factor for lung cancer. After removing the genomic regions associated with smoking behaviors, ∼980,000 SNPs remained for the LDSR analyses (Supplementary Table S1). As shown in Supplementary Fig. S1, the strongest genome-wide signals on chromosomes 15 and 19 (near the nicotinic cholinergic receptor gene CHRNA5 and the nicotine-metabolizing enzyme gene CYP2A6, respectively) were absent after removing the genomic regions related to smoking behaviors.
We identified numerous traits showing moderate SNP-heritability and moderate to strong genetic correlation with lung cancer at a Bonferroni-corrected significance threshold of P = 0.05/3,600 = 1.38 × 10−5. Because of the hypothesis-generating nature of this research, we also considered associations with P < 0.05 to be nominally significant and to merit targeted follow-up in future studies. Associations with P values at or below our Bonferroni-corrected threshold were considered robust in these analyses and are marked in bold.
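The two-tier significance scheme described above can be expressed as a short classifier. This is an illustrative sketch with hypothetical function names; note that 0.05/3,600 evaluates to roughly 1.39 × 10−5, which the text reports as 1.38 × 10−5.

```python
ALPHA = 0.05
N_TESTS = 3600  # number of tests used for the correction in the text


def bonferroni_threshold(alpha=ALPHA, n_tests=N_TESTS):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def classify(p, alpha=ALPHA, n_tests=N_TESTS):
    """Label a P value as 'robust', 'nominal', or 'not significant'."""
    if p <= bonferroni_threshold(alpha, n_tests):
        return "robust"
    if p < alpha:
        return "nominal"
    return "not significant"
```

For example, the emphysema/chronic bronchitis correlation with overall lung cancer (P1 = 2.98 × 10−13) would be labeled robust, while the COPD correlation (P1 = 0.008) would be labeled nominal.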
We estimated the observed heritability of lung cancer to be about 8% overall and 7% after excluding chromosomal regions associated with smoking behaviors. As shown in Table 1, the observed heritability of lung cancer in ever-smokers was about 10% when smoking-associated loci were included and 8% after their exclusion, whereas in never-smokers h2 was approximately 3% and 5%, respectively, with standard errors larger than the estimates.
Strata | Heritability | SE of heritability |
---|---|---|
Lung cancer | 0.0832 | 0.0126 |
LUAD | 0.0677 | 0.0099 |
LUSC | 0.0517 | 0.0106 |
SCLC | 0.1045 | 0.0192 |
Ever smokers | 0.0994 | 0.0214 |
Never smokers | 0.0304 | 0.0479 |
Lung cancer exclusion | 0.0706 | 0.0075 |
LUAD Exclusion | 0.0659 | 0.0094 |
LUSC Exclusion | 0.0404 | 0.0094 |
SCLC Exclusion | 0.0856 | 0.0194 |
Ever smokers exclusion | 0.0826 | 0.0157 |
Never smokers exclusion | 0.0463 | 0.0531 |
Note: Exclusion indicates the removal of genomic regions associated with smoking behaviors.
Genome-wide genetic correlation estimates were obtained, stratified by lung cancer histology and smoking status (Supplementary Table S4). The genetic correlation estimates were comparable among histologic subtypes, and SCLC appeared to share more genetic architecture with LUSC than with LUAD both before and after omitting genomic regions related to smoking behaviors. Risk in ever-smokers was more correlated with LUAD and LUSC than with SCLC, and risk in never-smokers was more correlated with LUAD than with other subtypes.
Our results based on cross-trait LDSR analyses demonstrated that the genetic architecture of lung cancer susceptibility was strongly positively correlated with that of traits related to smoking behaviors, selected medical conditions, and family history. As presented in Tables 2 and 3, the strongest genetic correlation across all lung cancer subtypes was with pack-years of smoking, followed by current tobacco smoking, maternal smoking behavior around birth, paternal history of lung cancer, parental longevity, and chronic bronchitis/emphysema. All cross-trait genetic correlations between lung cancer and polygenic traits considered in this study are reported in Supplementary Tables S5–S7.
Phenotypic trait | Lung cancer rg1 (inclusion) | Lung cancer P1 (inclusion) | Lung cancer rg2 (exclusion) | Lung cancer P2 (exclusion) | LUAD rg1 (inclusion) | LUAD P1 (inclusion) | LUAD rg2 (exclusion) | LUAD P2 (exclusion) | LUSC rg1 (inclusion) | LUSC P1 (inclusion) | LUSC rg2 (exclusion) | LUSC P2 (exclusion) | SCLC rg1 (inclusion) | SCLC P1 (inclusion) | SCLC rg2 (exclusion) | SCLC P2 (exclusion) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Allergy atopy | ||||||||||||||||
Eczema | −0.11 | 0.005 | −0.10 | 0.024 | −0.11 | 0.019 | −0.09 | 0.049 | −0.12 | 0.024 | −0.11 | 0.055 | −0.14 | 0.012 | −0.12 | 0.036 |
Blood traits | ||||||||||||||||
High light scatter reticulocyte count | 0.18 | 8.20 × 10−8 | 0.19 | 1.04 × 10−7 | 0.16 | 7.56 × 10−5 | 0.15 | 6.46 × 10−4 | 0.17 | 4.87 × 10−4 | 0.14 | 0.011 | 0.24 | 7.95 × 10−7 | 0.25 | 2.19 × 10−5 |
Lymphocyte count | 0.14 | 1.27 × 10−4 | 0.14 | 1.97 × 10−4 | 0.12 | 6.02 × 10−3 | 0.12 | 8.90 × 10−3 | 0.17 | 8.58 × 10−5 | 0.14 | 4.23 × 10−3 | 0.19 | 1.02 × 10−5 | 0.17 | 1.19 × 10−3 |
White blood cell count | 0.21 | 6.72 × 10−9 | 0.21 | 2.40 × 10−7 | 0.16 | 7.59 × 10−5 | 0.16 | 1.74 × 10−4 | 0.20 | 1.25 × 10−5 | 0.18 | 9.95 × 10−4 | 0.22 | 2.11 × 10−6 | 0.23 | 3.62 × 10−4 |
Medical condition and family history | ||||||||||||||||
Emphysema/chronic bronchitis | 0.52 | 2.98 × 10−13 | 0.47 | 3.91 × 10−7 | 0.44 | 1.25 × 10−6 | 0.40 | 2.04 × 10−4 | 0.45 | 5.10 × 10−6 | 0.33 | 7.94 × 10−3 | 0.57 | 1.63 × 10−6 | 0.61 | 1.32 × 10−4 |
Forced expiratory volume In 1-Second (FEV1) | −0.18 | 1.19 × 10−7 | −0.10 | 3.93 × 10−3 | −0.10 | 0.019 | −0.10 | 5.93 × 10−3 | −0.24 | 4.53 × 10−7 | −0.17 | 3.31 × 10−3 | −0.19 | 8.45 × 10−4 | −0.14 | 0.024 |
COPD | 0.28 | 0.008 | 0.22 | 0.052 | 0.08 | 0.510 | 0.02 | 0.869 | 0.32 | 0.037 | 0.23 | 0.149 | 0.53 | 0.002 | 0.47 | 0.006 |
Diabetes | 0.13 | 0.007 | 0.13 | 0.014 | 0.05 | 0.323 | 0.04 | 0.432 | 0.16 | 0.015 | 0.17 | 0.019 | 0.14 | 0.049 | 0.14 | 0.067 |
Paternal history of lung cancer | 0.95 | 5.52 × 10−30 | 0.89 | 1.10 × 10−20 | 0.79 | 2.00 × 10−15 | 0.75 | 1.69 × 10−12 | 0.93 | 1.38 × 10−16 | 0.82 | 9.92 × 10−8 | 0.69 | 1.14 × 10−7 | 0.69 | 1.36 × 10−5 |
Paternal history of chronic bronchitis/emphysema | 0.60 | 1.06 × 10−13 | 0.53 | 2.28 × 10−7 | 0.42 | 6.67 × 10−5 | 0.39 | 2.32 × 10−3 | 0.52 | 2.00 × 10−6 | 0.42 | 4.09 × 10−3 | 0.61 | 1.66 × 10−7 | 0.63 | 6.64 × 10−5 |
Paternal age at death | −0.51 | 2.65 × 10−21 | −0.48 | 1.20 × 10−10 | −0.36 | 5.90 × 10−7 | −0.35 | 3.13 × 10−5 | −0.54 | 2.07 × 10−10 | −0.50 | 1.73 × 10−6 | −0.55 | 1.43 × 10−8 | −0.48 | 6.54 × 10−5 |
Maternal history of chronic bronchitis/emphysema | 0.49 | 2.56 × 10−11 | 0.40 | 1.73 × 10−4 | 0.33 | 7.49 × 10−4 | 0.63 | 1.18 × 10−8 | 0.63 | 1.18 × 10−8 | 0.56 | 1.75 × 10−4 | 0.30 | 0.014 | 0.22 | 0.185 |
Maternal age at death | −0.50 | 2.84 × 10−11 | −0.43 | 1.22 × 10−5 | −0.42 | 3.26 × 10−6 | −0.39 | 6.92 × 10−4 | −0.40 | 1.02 × 10−5 | −0.29 | 0.019 | −0.48 | 2.35 × 10−5 | −0.45 | 3.36 × 10−3 |
Metabolic traits | ||||||||||||||||
HDL | −0.17 | 2.69 × 10−4 | −0.18 | 0.002 | −0.10 | 0.073 | −0.10 | 0.137 | −0.18 | 0.010 | −0.18 | 0.018 | −0.21 | 0.014 | −0.20 | 0.021 |
Sodium in urine | 0.20 | 1.38 × 10−5 | 0.23 | 2.65 × 10−7 | 0.17 | 7.06 × 10−4 | 0.21 | 6.84 × 10−5 | 0.19 | 0.057 | 0.17 | 0.013 | 0.20 | 8.31 × 10−4 | 0.21 | 4.13 × 10−3 |
Smoking behaviors | ||||||||||||||||
Current tobacco smoking | 0.61 | 8.22 × 10−25 | 0.60 | 2.32 × 10−32 | 0.43 | 2.17 × 10−15 | 0.43 | 2.11 × 10−12 | 0.67 | 9.44 × 10−18 | 0.62 | 9.37 × 10−11 | 0.54 | 1.31 × 10−14 | 0.52 | 9.54 × 10−9 |
Former tobacco smoking | 0.40 | 1.52 × 10−7 | 0.39 | 1.05 × 10−5 | 0.36 | 1.43 × 10−5 | 0.39 | 4.08 × 10−5 | 0.43 | 1.32 × 10−5 | 0.38 | 3.90 × 10−3 | 0.27 | 0.008 | 0.22 | 0.089 |
Maternal smoking around birth | 0.53 | 2.81 × 10−22 | 0.46 | 1.28 × 10−15 | 0.37 | 9.74 × 10−10 | 0.32 | 4.20 × 10−6 | 0.60 | 2.17 × 10−17 | 0.51 | 6.50 × 10−9 | 0.49 | 3.80 × 10−10 | 0.44 | 7.02 × 10−6 |
Pack years adult smoking | 0.75 | 3.19 × 10−57 | 0.68 | 2.29 × 10−30 | 0.56 | 1.76 × 10−14 | 0.49 | 1.05 × 10−13 | 0.75 | 1.17 × 10−24 | 0.64 | 2.35 × 10−10 | 0.71 | 2.70 × 10−17 | 0.65 | 2.27 × 10−11 |
Smoking/smokers in household | 0.48 | 3.42 × 10−6 | 0.50 | 1.21 × 10−4 | 0.35 | 0.001 | 0.37 | 4.89 × 10−3 | 0.55 | 2.58 × 10−5 | 0.50 | 4.48 × 10−3 | 0.38 | 0.008 | 0.40 | 0.051 |
Time from waking to first cigarette | −0.73 | 2.29 × 10−9 | −0.73 | 1.29 × 10−4 | −0.44 | 0.002 | −0.41 | 0.040 | −0.77 | 8.59 × 10−6 | −0.72 | 4.12 × 10−3 | −0.96 | 6.11 × 10−7 | −1.14 | 9.23 × 10−4 |
Note: Bold indicates P ≤ 1.38 × 10−5.
Abbreviation: rg, genetic correlation.
Phenotypic trait | Ever smokers rg1 (inclusion) | Ever smokers P1 (inclusion) | Ever smokers rg2 (exclusion) | Ever smokers P2 (exclusion) | Never smokers rg1 (inclusion) | Never smokers P1 (inclusion) | Never smokers rg2 (exclusion) | Never smokers P2 (exclusion) |
---|---|---|---|---|---|---|---|---|
Allergy atopy | ||||||||
Eczema | −0.12 | 0.011 | −0.13 | 0.037 | −0.01 | 0.937 | −0.02 | 0.906 |
Blood traits | ||||||||
High light scatter reticulocyte count | 0.17 | 6.99 × 10−5 | 0.19 | 8.59 × 10−5 | 0.16 | 0.328 | 0.10 | 0.402 |
Lymphocyte count | 0.12 | 0.003 | 0.12 | 0.008 | 0.21 | 0.309 | 0.16 | 0.287 |
White blood cell count | 0.17 | 1.23 × 10−5 | 0.18 | 2.43 × 10−4 | 0.32 | 0.246 | 0.22 | 0.190 |
Medical condition and family history | ||||||||
Emphysema/chronic bronchitis | 0.43 | 1.64 × 10−6 | 0.43 | 5.81 × 10−4 | 0.48 | 0.272 | 0.42 | 0.231 |
Forced expiratory volume in 1-second (FEV1) | −0.15 | 7.33 × 10−4 | −0.15 | 0.001 | −0.16 | 0.348 | −0.03 | 0.775 |
Paternal history of lung cancer | 0.97 | 2.86 × 10−16 | 0.92 | 5.96 × 10−12 | 0.37 | 0.348 | 0.32 | 0.276 |
Paternal history of chronic bronchitis/emphysema | 0.53 | 4.66 × 10−8 | 0.48 | 2.25 × 10−4 | 0.08 | 0.814 | 0.03 | 0.917 |
Paternal age at death | −0.44 | 7.97 × 10−9 | −0.41 | 4.89 × 10−5 | −0.24 | 0.405 | −0.14 | 0.540 |
Maternal history of chronic bronchitis/emphysema | 0.37 | 2.53 × 10−4 | 0.25 | 0.045 | 0.35 | 0.385 | 0.28 | 0.413 |
Maternal age at death | −0.48 | 8.81 × 10−7 | −0.48 | 2.52 × 10−4 | −0.12 | 0.691 | 0.00 | 0.985 |
Metabolic traits | ||||||||
HDL | −0.21 | 0.002 | −0.21 | 0.006 | 0.06 | 0.721 | 0.11 | 0.566 |
Sodium in urine | 0.18 | 0.001 | 0.23 | 1.57 × 10−4 | 0.36 | 0.194 | 0.38 | 0.089 |
Smoking behaviors | ||||||||
Current tobacco smoking | 0.48 | 5.90 × 10−12 | 0.48 | 4.55 × 10−11 | 0.59 | 0.186 | 0.55 | 0.078 |
Former tobacco smoking | 0.21 | 0.015 | 0.20 | 0.062 | 0.91 | 0.197 | 0.76 | 0.107 |
Maternal smoking around birth | 0.41 | 6.38 × 10−11 | 0.38 | 3.67 × 10−7 | 0.39 | 0.227 | 0.25 | 0.243 |
Pack years adult smoking | 0.72 | 9.71 × 10−25 | 0.66 | 2.43 × 10−15 | 0.29 | 0.284 | 0.22 | 0.275 |
Smoking/smokers in household | 0.42 | 0.001 | 0.50 | 0.001 | 0.35 | 0.384 | 0.31 | 0.368 |
Time from waking to first cigarette | −0.75 | 4.47 × 10−7 | −0.80 | 0.001 | −0.24 | 0.557 | −0.24 | 0.560 |
Note: Bold indicates P ≤ 1.38 × 10−5.
Considering both inclusion (rg1 and P1) and exclusion (rg2 and P2) of genomic regions associated with smoking behaviors, emphysema or chronic bronchitis (rg1 = 0.52, P1 = 2.98 × 10−13; rg2 = 0.47, P2 = 3.91 × 10−7), lung function measured as forced expiratory volume in 1 second (FEV1; rg1 = −0.18, P1 = 1.19 × 10−7; rg2 = −0.10, P2 = 3.93 × 10−3), and sodium in urine (rg1 = 0.20, P1 = 1.38 × 10−5; rg2 = 0.23, P2 = 2.65 × 10−7) were the medical measurements most strongly correlated with overall lung cancer risk.
Among hematologic traits, white blood cell count (WBC) and high light scatter reticulocyte count showed significant positive genetic correlations with lung cancer risk overall (rg1 = 0.21, P1 = 6.72 × 10−9; rg2 = 0.21, P2 = 2.40 × 10−7), and across histologic subtypes (LUAD: rg1 = 0.16, P1 = 7.59 × 10−5; rg2 = 0.16, P2 = 1.74 × 10−4; LUSC: rg1 = 0.20, P1 = 1.25 × 10−5; rg2 = 0.18, P2 = 9.95 × 10−4; SCLC: rg1 = 0.22, P1 = 2.11 × 10−6; rg2 = 0.23, P2 = 3.62 × 10−4). Lymphocyte count showed a significant genetic correlation with SCLC before removal of the genomic regions centered on 473 SNPs related to smoking behaviors (rg1 = 0.22, P1 = 2.11 × 10−6).
Among familial illnesses, paternal history of lung cancer showed the strongest genetic correlation with lung cancer (rg1 = 0.95, P1 = 5.52 × 10−30; rg2 = 0.89, P2 = 1.10 × 10−20), and this held for all lung histologic subtypes. Parental diagnoses of chronic bronchitis or emphysema were also positively associated, with consistent direction of genetic correlation in lung cancer and across histologic subtypes (rg1_Father = 0.60, P1_Father = 1.06 × 10−13; rg1_Mother = 0.49, P1_Mother = 2.56 × 10−11; rg2_Father = 0.53, P2_Father = 2.28 × 10−7; rg2_Mother = 0.40, P2_Mother = 1.73 × 10−4). Parental longevity was strongly negatively correlated with lung cancer (rg1_Father = −0.51, P1_Father = 2.65 × 10−21; rg1_Mother = −0.50, P1_Mother = 2.84 × 10−11; rg2_Father = −0.48, P2_Father = 1.20 × 10−10; rg2_Mother = −0.43, P2_Mother = 1.22 × 10−5).
Among smoking behaviors, pack years of adult smoking showed a highly significant positive genetic correlation with lung cancer susceptibility (rg1 = 0.75, P1 = 3.19 × 10−57; rg2 = 0.68, P2 = 2.29 × 10−30), with LUAD (rg1 = 0.56, P1 = 1.76 × 10−14; rg2 = 0.49, P2 = 1.05 × 10−13), with LUSC (rg1 = 0.75, P1 = 1.17 × 10−24; rg2 = 0.64, P2 = 2.35 × 10−10), and with SCLC (rg1 = 0.71, P1 = 2.70 × 10−17; rg2 = 0.65, P2 = 2.27 × 10−11). Current tobacco smoking showed a stronger positive genetic correlation with lung cancer (rg1 = 0.61, P1 = 8.22 × 10−25; rg2 = 0.60, P2 = 2.32 × 10−32), with LUAD (rg1 = 0.43, P1 = 2.17 × 10−15; rg2 = 0.43, P2 = 2.11 × 10−12), with LUSC (rg1 = 0.67, P1 = 9.44 × 10−18; rg2 = 0.62, P2 = 9.37 × 10−11), and with SCLC (rg1 = 0.54, P1 = 1.31 × 10−14; rg2 = 0.52, P2 = 9.54 × 10−9) than did former tobacco smoking, which was correlated with lung cancer (rg1 = 0.40, P1 = 1.52 × 10−7; rg2 = 0.39, P2 = 1.05 × 10−5), with LUAD (rg1 = 0.36, P1 = 1.43 × 10−5; rg2 = 0.39, P2 = 4.08 × 10−5), with LUSC (rg1 = 0.43, P1 = 1.32 × 10−5; rg2 = 0.38, P2 = 3.90 × 10−3), and with SCLC (rg1 = 0.27, P1 = 0.008; rg2 = 0.22, P2 = 0.089). The genetic architecture of maternal smoking around birth was also strongly associated with lung cancer risk (rg1 = 0.53, P1 = 2.81 × 10−22; rg2 = 0.54, P2 = 3.96 × 10−23), with LUAD (rg1 = 0.37, P1 = 9.74 × 10−10; rg2 = 0.36, P2 = 1.00 × 10−7), with LUSC (rg1 = 0.60, P1 = 2.17 × 10−17; rg2 = 0.61, P2 = 1.21 × 10−15), and with SCLC (rg1 = 0.49, P1 = 3.80 × 10−10; rg2 = 0.49, P2 = 3.68 × 10−10). Smoking/smokers in household showed a significant genetic correlation with lung cancer risk only when genomic regions associated with smoking behaviors were not excluded (rg1 = 0.48, P1 = 3.42 × 10−6).
We also examined the pairwise genetic relationship between lung cancer and various traits of interest among ever- and never-smokers. As shown in Table 3, the strongest genetic correlation with lung cancer among ever-smokers was observed for pack-years of smoking (rg1 = 0.72, P1 = 9.71 × 10−25; rg2 = 0.66, P2 = 2.43 × 10−15). Among hematologic traits, WBC showed a significant positive genetic correlation with lung cancer susceptibility among ever-smokers (rg1 = 0.17, P1 = 1.23 × 10−5). Paternal history of lung cancer showed a very strong genetic correlation with lung cancer risk among ever-smokers (rg1 = 0.97, P1 = 2.86 × 10−16; rg2 = 0.92, P2 = 5.96 × 10−12). Parental longevity also showed a strong inverse genetic correlation with lung cancer susceptibility among ever-smokers (rg1_Father = −0.44, P1_Father = 7.97 × 10−9; rg1_Mother = −0.48, P1_Mother = 8.81 × 10−7).
For never-smokers, there were no significant genetic correlations between lung cancer and the polygenic traits examined, whether we included or excluded genomic regions related to smoking behaviors. This may reflect the reduced sample size for analyses of never-smokers, or may indicate that lung cancer in never-smokers is more likely related to rare variants than to the common variants of small effect on which shared-heritability analyses using LDSR depend.
Discussion
We examined the patterns of genetic overlap between lung cancer and a variety of complex phenotypes using SNP-trait GWAS meta-analysis summary-level data acquired from UKBB (27), other publicly available GWAS data sources (33–36), and the TRICL-Oncoarray Lung Consortium, which provides the largest GWAS of lung cancer in European-descent populations conducted so far (6). Cross-trait LDSR using GWAS summary statistics enabled us to estimate genetic correlations between pairs of traits and thereby gain insights into common etiologies (25, 26). We identified new phenotypic associations between different traits and lung cancer, including subsets stratified by lung histologic subtypes and primarily mediated by the genetics of smoking behaviors. The findings of this study enable us to confirm previously known associations and to identify polygenic traits for further study using complementary approaches, such as Mendelian randomization analyses, to identify new causal relationships between lung cancer and related traits.
We observed a significant positive correlation between the genetic architecture of emphysema/chronic bronchitis and lung cancer susceptibility in LDSR analyses, while higher FEV1 was negatively (i.e., protectively) associated with lung cancer (48). COPD and chronic bronchitis (airflow obstruction) are endophenotypes of emphysema, which is a well-established comorbidity that often precedes lung cancer diagnosis (49). Chronic inflammation is a key feature in the development of both emphysema and bronchitis and could potentially contribute to lung cancer development, but their shared association with smoking history makes disentangling such effects challenging.
Most traits related to smoking behaviors increased the risk of lung cancer development among ever-smokers, including the length of time a person waits to smoke their first cigarette after waking. Our study also observed a significant positive correlation between the genomic architecture of in utero tobacco smoke exposure and that of lung cancer, suggesting that individuals who smoke during pregnancy could contribute to risk in their offspring. Infants whose mothers smoke during pregnancy may have reduced birth weight, birth defects, and weaker lungs than those whose mothers do not smoke, and this has been associated with numerous negative health outcomes in later life (50, 51). However, maternal smoking behaviors may also serve as a proxy for an individual's own genetic predisposition to engage in smoking, suggesting that additional smoking-associated loci that we were not able to account for may underlie the observed association.
Several blood traits showed positive genetic correlations with lung cancer and across all lung histologic subtypes. UKBB does not delineate white blood cell types among participants, limiting the observations that can be made with the available data. Overall, genetic correlations with increased lung cancer risk were observed with increased lymphocyte counts and a higher WBC. After excluding smoking-associated genomic loci, WBC remained associated with lung cancer. These findings showed consistency across lung histological subtypes and among ever-smokers. Further studies delineating risks by white blood cell subtypes are needed. Reticulocyte measures were also associated with a higher risk of lung cancer. The reticulocyte count measures how quickly reticulocytes made by the bone marrow are released into the blood. Reticulocyte count can be increased due to diseases that prematurely destroy red blood cells, such as hemolytic anemia. The reticulocyte count is one of the primary parameters used in the initial classification of anemia, which co-occurs with lung cancer at baseline and is often exacerbated after chemotherapy treatment (52). The prevalence of anemia in patients with lung cancer is 37.6% and its incidence after chemotherapy treatment is 80% (53).
We also identified several suggestive polygenic phenotypes for which the genetic correlations were nominally significant (P < 0.05). We observed a nominally significant inverse genetic correlation between HDL levels and lung cancer, including histology-specific subtypes and lung cancer among ever-smokers; HDL was also negatively correlated with pancreatic cancer (37). Low HDL and high lymphocyte and WBC counts have been strongly associated with insulin resistance, which has been positively associated with cancer development (54). We observed a positive association between diabetes and lung cancer, with attenuation of this signal after exclusion of smoking-associated genomic regions from analysis. Lung cancer, across all histological subtypes and among ever-smokers, showed a uniquely negative genetic correlation with eczema at a nominal significance level of 0.05 compared with breast, colorectal, head/neck, ovary, and prostate cancers (38). Eczema is an umbrella term for several types of dermatitis, most commonly atopic dermatitis associated with increased serum IgE. Elevated serum IgE has also been shown to play a role in cancer immunosurveillance (55). Eczema (i.e., atopic dermatitis) is characterized by a Th2-dominated immune response both in the skin and in circulation. The associations between cancer and atopic conditions, including eczema, have been examined previously, and atopic conditions were found to protect against leukemia, melanoma, and cancers of the lung, brain, pancreas, and colon (55, 56). We observed a shared effect with another autoimmune condition, celiac disease, which is a digestive disorder characterized by an abnormal reaction to gluten. Celiac disease showed an inverse genetic correlation with lung cancer and LUSC and a positive genetic correlation with prostate cancer (38) at the nominal significance level of 0.05. One possible explanation is that patients with celiac disease might be less likely to become smokers. However, the genetic correlations demonstrated a consistent inverse genetic correlation between celiac disease and lung cancer across all subtypes and even after excluding regions associated with smoking behaviors (57). When stratified by smoking status, we observed an inverse correlation in ever-smokers and a positive correlation in never-smokers, although neither reached statistical significance.
All phenotypic traits related to smoking behaviors showed strong genetic correlations with the risk of developing lung cancer among ever-smokers. Even after the removal of chromosomal regions associated with smoking behaviors, we still observed significant positive associations with lung cancer, across all subtypes, and among ever-smokers. This suggests that additional genetic loci of small effect size that influence smoking behaviors, and therefore lung cancer risk, may remain unidentified by GWAS. It also suggests that our analyses excluding the known smoking-associated regions cannot fully account for the contribution of smoking-associated genomic variation. However, studies of the genetics of smoking have included more than 500,000 individuals and have provided a comprehensive assessment of the genetic architecture of smoking, so we would anticipate that much of the genetic variation contributing to smoking behavior was removed by our approach (i.e., removing a 500-kb region around each of the known smoking-related regions). The limited change in estimates of cross-correlations in this analysis suggests that the cross-heritability analyses we have conducted may be robust to possible confounding by smoking behavior.
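The region-exclusion step described above can be illustrated with a minimal Python sketch. It assumes that the 500-kb region is centered on each smoking-associated SNP (250 kb on either side), which is only one reading of the text. The function names, field names, and coordinates are hypothetical and are not taken from the study's pipeline.

```python
from bisect import bisect_left


def build_index(lead_snps):
    """Group lead-SNP positions by chromosome and sort them for fast lookup."""
    index = {}
    for chrom, pos in lead_snps:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index


def near_lead(chrom, pos, index, half_window):
    """Return True if (chrom, pos) lies within half_window bp of any lead SNP."""
    positions = index.get(chrom)
    if not positions:
        return False
    i = bisect_left(positions, pos)
    # Only the nearest lead SNP on each side needs to be checked.
    for j in (i - 1, i):
        if 0 <= j < len(positions) and abs(positions[j] - pos) <= half_window:
            return True
    return False


def exclude_regions(sumstats, lead_snps, half_window=250_000):
    """Drop summary-statistic rows that fall inside any exclusion window.

    half_window=250_000 is an assumption (a 500-kb window centered on each SNP).
    """
    index = build_index(lead_snps)
    return [row for row in sumstats
            if not near_lead(row["chrom"], row["pos"], index, half_window)]


# Toy data with made-up coordinates, for illustration only.
leads = [("15", 78_000_000)]
stats = [
    {"snp": "a", "chrom": "15", "pos": 78_100_000},  # 100 kb away: excluded
    {"snp": "b", "chrom": "15", "pos": 78_300_000},  # 300 kb away: kept
    {"snp": "c", "chrom": "1", "pos": 78_000_000},   # other chromosome: kept
]
kept = exclude_regions(stats, leads)
print([row["snp"] for row in kept])  # ['b', 'c']
```

Widening `half_window` gives a more conservative exclusion at the cost of discarding more SNPs from the regression.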
Large-scale datasets, where multiple phenotypes are assayed and surveyed in tens or hundreds of thousands of individuals, have become increasingly available to genomic researchers. However, the availability of individual-level genotype data remains more limited. We performed cross-trait LDSR analyses to decipher genetic correlations between lung cancer and various complex traits and diseases using GWAS summary statistics. The LDSR approach provides an improved understanding of the shared genetic architecture between traits, which can serve as potential surrogates for polygenic variants involved in lung carcinogenesis. LDSR does not require individual-level data, genome-wide significant SNPs, or LD pruning, which otherwise leads to loss of information when causal SNPs are in LD. A particular advantage is that measuring multiple traits within the same individuals is not required, allowing one to assess the presence of genetic correlation between risk for lung cancer and other traits using summary statistics and known LD patterns. Genetic correlation analysis has unique strengths in leveraging information from the entire genome to evaluate correlations between a disease of interest, like lung cancer, and multiple traits that have been studied in unrelated populations. However, LDSR is not without limitations. One important assumption is that population stratification in the underlying GWAS summary statistics is not present, which we addressed by using summary statistics only from studies that employed ancestry-adjusted regression analyses. In addition, the genetic architecture of subjects must be similar across the two traits being compared; otherwise, estimates will be biased, generally toward the null. We restricted our analyses to individuals of European descent and included SNPs that were imputed using the common 1000 Genomes imputation platform. Furthermore, GWAS summary statistics based on small sample sizes or for traits with low SNP-based heritability are not amenable to LDSR analysis, limiting the breadth of phenotypes one can regress against their trait of interest. Many non-significant associations in our data could be due to limited statistical power, rather than a lack of shared heritability, as cross-trait LDSR requires larger sample sizes of GWAS summary data to achieve equivalent SE compared with methods using individual-level data (25). Finally, LDSR currently relies on analysis of common genetic variants with MAF > 0.01 and therefore fails to capture shared heritability due to underlying rare variants, which have previously been reported in association with lung cancer (58).
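To illustrate how a genetic correlation can be recovered from summary statistics and LD scores alone, the sketch below simulates z-scores for two traits and regresses their per-SNP products on LD scores. It is a simplified illustration under stated assumptions: simulated data, no sample overlap, equal sample sizes, and unweighted least squares, whereas full implementations of LDSR typically use weighted regression with block-jackknife standard errors. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_z(ld, n, m, h2_1, h2_2, rho_g, seed=0):
    """Simulate per-SNP z-scores for two traits (toy model, no sample overlap).

    Var(z_k) = 1 + n * h2_k * ld / m and Cov(z_1, z_2) = n * rho_g * ld / m,
    i.e. the expectations that LD score regression exploits.
    """
    rng = np.random.default_rng(seed)
    v1 = 1.0 + n * h2_1 * ld / m
    v2 = 1.0 + n * h2_2 * ld / m
    c = n * rho_g * ld / m
    e1 = rng.standard_normal(ld.size)
    e2 = rng.standard_normal(ld.size)
    z1 = np.sqrt(v1) * e1
    # Construct z2 so that Var(z2) = v2 and Cov(z1, z2) = c.
    z2 = (c / np.sqrt(v1)) * e1 + np.sqrt(v2 - c**2 / v1) * e2
    return z1, z2


def slope(x, y):
    """Unweighted least-squares slope of y on x (intercept fitted, then discarded)."""
    return np.polyfit(x, y, 1)[0]


def estimate_rg(z1, z2, ld, n1, n2, m):
    """Estimate the genetic correlation from regression slopes on LD scores."""
    h2_1 = slope(ld, z1**2) * m / n1                    # SNP heritability, trait 1
    h2_2 = slope(ld, z2**2) * m / n2                    # SNP heritability, trait 2
    rho_g = slope(ld, z1 * z2) * m / np.sqrt(n1 * n2)   # genetic covariance
    return rho_g / np.sqrt(h2_1 * h2_2)


m = 200_000                                             # number of SNPs (toy value)
ld = np.random.default_rng(1).uniform(1.0, 200.0, m)    # made-up LD scores
z1, z2 = simulate_z(ld, n=50_000, m=m, h2_1=0.3, h2_2=0.3, rho_g=0.15)
rg_hat = estimate_rg(z1, z2, ld, 50_000, 50_000, m)
# The simulated true value is 0.15 / sqrt(0.3 * 0.3) = 0.5.
print(round(float(rg_hat), 2))
```

Reducing `n` or the heritabilities in this toy simulation makes the estimate noisier, which mirrors the power limitation noted above for small GWAS and traits with low SNP-based heritability.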
LDSR analysis confirmed previously identified traits, such as smoking and family history, that are positively genetically correlated with lung cancer, implying a shared genetic background. This study also showed inverse (protective) associations between the genetic architecture of lung cancer and that of both eczema and celiac disease, which persisted when excluding smoking behavior–associated genomic loci from analyses. In addition, our analyses indicate that additional smoking-associated loci may yet be identified on the basis of the persistent positive correlation between smoking and lung cancer in our data after accounting for known common loci. Analyses among never-smokers failed to identify notable associations, suggesting that lung cancer in these patients might be caused by rarer variants. Future studies using Mendelian randomization or genetic instrumental variable approaches may further elucidate the causal relationship between the risk of lung cancer and other phenotypic traits of interest to reveal surrogate biomarkers with shared genetic backgrounds.
Authors' Disclosures
C.I. Amos reports grants from NCI and grants from Cancer Prevention Research Institute of Texas during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
J. Byun: Conceptualization, formal analysis, supervision, investigation, methodology, writing–original draft, writing–review and editing. Y. Han: Conceptualization, data curation, software, formal analysis, investigation, visualization, methodology, writing–review and editing. Q.T. Ostrom: Conceptualization, data curation, software, formal analysis, investigation, methodology, writing–review and editing. J. Edelson: Conceptualization, data curation, software, formal analysis, writing–review and editing. K.M. Walsh: Conceptualization, formal analysis, investigation, methodology, writing–review and editing. R.W. Pettit: Formal analysis, writing–review and editing. M.L. Bondy: Conceptualization, investigation, methodology, writing–review and editing. R.J. Hung: Resources, writing–review and editing. J.D. McKay: Resources, methodology, writing–review and editing. C.I. Amos: Conceptualization, supervision, funding acquisition, investigation, methodology, writing–review and editing.
Acknowledgments
The authors would like to thank all members of the Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Team of the International Lung Cancer Consortium (ILCCO) for providing summary results data for lung cancer. INTEGRAL-ILCCO acknowledges the following contributing investigators: Demetrius Albanes, Stephan Lam, Adonina Tardon, Chu Chen, Gary Goodman, Stig E. Bojesen, Maria Teresa Landi, Mattias Johansson, Angela Risch, H-Erich Wichmann, Heike Bickeboller, David C. Christiani, Gadi Rennert, Susanne Arnold, Paul Brennan, John K. Field, Sanjay Shete, Loic Le Marchand, Olle Melander, Hans Brunnström, Geoffrey Liu, Angeline Andrew, Lambertus A. Kiemeney, Hongbing Shen, Shan Zienolddiny, Kjell Grankvist, Mikael Johansson, Neil Caporaso, Penella Woll, Richard Houlston, Ying Wang, M. Dawn Teare, Yun-Chul Hong, Jian-Min Yuan, Philip Lazarus, Matthew B. Schabath, Melinda C. Aldrich. C.I. Amos is an Established Research Scholar of the Cancer Prevention Research Institute of Texas. R.W. Pettit would like to thank the Baylor Research Advocates for Student Scientists for their funding support. Cancer Prevention Research Institute of Texas (CPRIT) award: RR170048 (C.I. Amos, J. Byun); NIH for INTEGRAL consortium: U19CA203654 (C.I. Amos, J. Byun, Y. Han, J. Edelson, R.J. Hung, J.D. McKay); Distinguished Scientist award from the Sontag Foundation (to K.M. Walsh); Research Training Grant from the Cancer Prevention and Research Institute of Texas: RP160097T (to Q.T. Ostrom); NIH: R01CA139020 (to M.L. Bondy); Training In Precision Environmental Health Sciences (TPEHS) Program (NIH grant no. T32ES027801; R.W. Pettit).