Abstract
Lung cancer remains the leading cause of cancer death worldwide, with 15% to 20% occurring in never smokers. To assess genetic determinants for prognosis among never smokers, we conducted a genome-wide investigation in the International Lung Cancer Consortium (ILCCO).
Genomic and clinical data from 1,569 never-smoking patients with lung cancer of European ancestry from 10 ILCCO studies were included. HRs and 95% confidence intervals of overall survival were estimated. We assessed whether the associations were mediated through mRNA expression, based on 1,553 normal lung tissues from the lung expression quantitative trait loci (eQTL) dataset and Genotype-Tissue Expression (GTEx). For cross-ethnicity generalization, we assessed the associations in a Japanese study (N = 887).
One locus at 13q22.2 was associated with lung adenocarcinoma survival at genome-wide level, with carriers of rs12875562-T allele exhibiting poor prognosis [HR = 1.71 (1.41–2.07), P = 3.60 × 10−8], and altered mRNA expression of LMO7DN in lung tissue (GTEx, P = 9.40 × 10−7; Lung eQTL dataset, P = 0.003). Furthermore, 2 of 11 independent loci that reached the suggestive significance level (P < 10−6) were significant eQTL affecting mRNA expression of nearby genes in lung tissues, including CAPZB at 1p36.13 and UBAC1 at 9q34.3. One locus encoding NWD2/KIAA1239 at 4p14 showed associations in both European [HR = 0.50 (0.38–0.66), P = 6.92 × 10−7] and Japanese populations [HR = 0.79 (0.67–0.94), P = 0.007].
Based on the largest genomic investigation on the lung cancer prognosis of never smokers to date, we observed that lung cancer prognosis is affected by inherited genetic variants.
We identified one locus near LMO7DN at genome-wide level and several potential prognostic genes with cis-effect on mRNA expression. Further functional genomics work is required to understand their role in tumor progression.
Introduction
With over 1 million deaths each year, lung cancer continues to be the leading cause of cancer mortality worldwide, and the 5-year survival rate remains low at only 10% to 20% (1, 2). Although it is well established that tobacco smoking is the primary cause of lung cancer, inherited genetic variations have also been established as etiologic factors through genome-wide association studies (GWAS), which identified susceptibility loci including CHRNA3/5, TERT-CLPTM1L, the HLA/MHC region, CHEK2, and more in the last decade (3–9).
Approximately 15% to 20% of lung cancer cases occur in individuals who are lifelong never smokers (10, 11). Many studies have shown significant differences in the etiology and clinical characteristics between never and ever smokers, and lung cancer in never smokers is being recognized as a distinct disease entity. Most notably, smokers and never smokers differ in histologic presentation, with adenocarcinoma being the main histologic type among never-smoking patients (10). Never smokers also have a higher prevalence of EGFR mutations, and those with EGFR mutations show longer survival after treatment with EGFR inhibitors than ever smokers do. Additional features that distinguish lung cancer in never smokers from that in ever smokers are differences in their somatic mutations and methylation profiles (12, 13).
Inherited genetic variation has been hypothesized to influence lung cancer survival, and several GWAS were performed with a focus on overall survival (14), in patients with early stage lung cancer (15, 16), patients who received platinum-based chemotherapies (17–19), and patients with advanced non-small cell lung cancer (20, 21), although most studies had relatively modest sample sizes ranging from 100 to 400 patients with lung cancer. Moreover, we hypothesize that there are distinctive genetic factors that contribute to lung cancer prognosis in smokers and never smokers, and that analyzing never smokers separately would provide greater insight into the genetic components of lung cancer survival for this specific population. To increase our power for genomic discovery, we conducted a meta-analysis of 10 GWAS with clinical prognosis data based on a total of 1,569 never-smoking patients with lung cancer of European ancestry in a two-stage analysis. The generalizability of the candidate associations across ethnicity was tested in the Japanese nonsmoking population in the second stage. The potential functional significance of the genetic regions related to prognosis was investigated using expression quantitative trait loci (eQTL) analysis based on four independent data sources: studies from Laval University, the University of British Columbia (UBC), and the University of Groningen, and the Genotype-Tissue Expression (GTEx) data (22, 23).
Materials and Methods
Description of participating studies
A total of 12 studies in the International Lung Cancer Consortium participated in this analysis, including ten lung cancer GWAS of European populations and two studies in Japanese populations to assess generalizability. Never smokers were defined as individuals who smoked fewer than 100 cigarettes during their lifetime, with the exception of Liverpool Lung Project in which the definition was individuals who smoked 10 cigarettes per week regularly (among those 98.5% also fit under the former definition). All participants provided written informed consent, and research protocols of all studies were reviewed and approved by the local institutional review boards of each participating study. Information on each study is summarized in Table 1 and included in the Supplementary Materials.
Genotyping and imputation
Genotyping in each study was conducted using Illumina HumanHap300K, 370K, 610K, 660K, OmniExpress, or OncoArray. In general, the quality control procedures were similar across studies with exclusion of variants based on low call rate (<90%) and low minor allele frequency (<1%). Individuals with high missing rate (>5 or 10%), gender discrepancies, unexpected duplicates, or relatedness were excluded. Details of genotyping and quality control procedures as applied to the lung cancer OncoArray project have been previously published (24). After applying quality control steps and restricting to genotyped individuals of European ancestry with no smoking history and complete clinical follow-up information, data were available on a total of 1,569 never-smoking patients with lung cancer, including 208 from Toronto, 327 from MD Anderson Cancer Center (MDACC), 349 from Mayo, 59 from Central Europe, 92 from Harvard study, and 534 in the 5 studies genotyped in the OncoArray project. The key characteristics of all participating studies are summarized in Table 1.
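The variant-level and sample-level exclusions described above can be sketched as simple predicates. This is an illustrative sketch only: the record layout, field names, and function names are hypothetical, and each study applied its own pipeline (24). The thresholds are those stated in the text, and the sample missingness cutoff is a parameter because studies used either 5% or 10%.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    call_rate: float  # fraction of samples with a non-missing genotype
    maf: float        # minor allele frequency


@dataclass
class Sample:
    sample_id: str
    missing_rate: float         # fraction of variants with a missing genotype
    sex_mismatch: bool          # reported sex disagrees with genotype-inferred sex
    duplicate_or_related: bool  # unexpected duplicate or related individual


def keep_variant(v, min_call_rate=0.90, min_maf=0.01):
    """Keep a variant unless its call rate is <90% or its MAF is <1%."""
    return v.call_rate >= min_call_rate and v.maf >= min_maf


def keep_sample(s, max_missing=0.05):
    """Keep a sample unless it fails any of the sample-level checks."""
    return (
        s.missing_rate <= max_missing
        and not s.sex_mismatch
        and not s.duplicate_or_related
    )
```

Passing `max_missing=0.10` reproduces the more permissive of the two sample missingness cutoffs.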
To facilitate the meta-analysis across genotyping platforms, genotype imputation was conducted in each study based on the March 2012 release of the 1000 Genomes Project. The Toronto and Mayo Clinic studies were imputed using IMPUTE2 (25–27), and the IARC-Central Europe and Harvard studies were imputed using MaCH software (28). Variants that were not present in any genotyping array or with suboptimal imputation quality were excluded from the analysis based on IMPUTE2 Info < 0.3 and MaCH RSQR < 0.3. After applying quality control filters, 629,283 SNPs were available for the meta-analysis. For the Japanese GWAS study, the 887 patients with lung cancer from National Cancer Center Hospital and Aichi Cancer Centre were genotyped using Illumina HumanOmni1-Quad and Illumina 660W. The Japanese study was imputed using IMPUTE2. Standard quality control steps applied to remove potential errors and biases have been previously described (29). Briefly, individuals with gender discrepancies, low call rates (<98%), and first-degree relatives were excluded, and variants deviating from Hardy–Weinberg equilibrium (P < 10−6) were removed.
Statistical analysis
Study-specific analysis of GWAS data
Overall survival time was defined as the time from date of lung cancer diagnosis to date of death or the last known date alive. A Cox proportional hazards model was applied to assess marginal effects of patient characteristics on lung cancer survival. The genomic inflation factor was estimated by comparing observed and expected P values. Quantile–quantile plots were used to assess the extent to which the observed distribution of the test statistic follows the expected distribution for each study. OncoArray project data were pooled and analyzed as one study as they were all genotyped and processed at the same time. The analytical process is summarized in Supplementary Fig. S1.
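As an illustration of how a genomic inflation factor can be obtained from a set of P values, the sketch below uses the common median-based definition: each two-sided P value is converted to a 1-df chi-square statistic, and the observed median is divided by the expected median under the null (about 0.455). The text does not state which estimator was used, so this choice is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, i.e. (Phi^-1(0.75))^2.
_MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided P values in (0, 1]."""
    # A two-sided P value p corresponds to z with Phi(z) = p / 2, and chi2 = z^2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / _MEDIAN_CHI2_1DF
```

Values close to 1 indicate that the bulk of the test statistics follow the null distribution.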
For each variant that passed quality control procedures as described, multivariate Cox proportional hazards regression was used to assess the association with lung cancer survival within each study. The probabilistic genotype dosage model was used for the main analysis and included potential confounding factors that might influence patient survival including age (as a continuous variable), sex (male or female), clinical stage (IA–IIIA, IIIB–IV), and where available, treatment information. To limit inflation of the calculated test statistics due to population substructure, each study was independently adjusted by the top two to six principal components (PC) (30). The Japanese studies were adjusted by the top five PCs. HRs and their corresponding 95% confidence intervals (95% CI) for survival were computed based on Cox regression models.
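A hazard ratio and its confidence interval follow from a fitted Cox coefficient by exponentiation. The sketch below is a minimal illustration, assuming a Wald-type interval and a Wald two-sided test; the text does not specify which test was used, and the function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist


def hazard_ratio_summary(beta, se, level=0.95):
    """Return (HR, lower, upper, two-sided Wald P) for a Cox log-hazard coefficient.

    beta: estimated log hazard ratio per unit of genotype dosage
    se:   standard error of beta
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% CI
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    p_value = 2.0 * std_normal.cdf(-abs(beta / se))
    return hr, lower, upper, p_value
```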
Survival rates were estimated using the Kaplan–Meier method, and median survival times were calculated based on diagnosis and death dates. Log-rank tests were used to examine for differences between survival estimates of genotypes pooled across studies.
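The Kaplan–Meier product-limit estimator can be sketched as follows. This is an illustrative implementation, not the code used in the study; the function names are hypothetical, and the median shown here is the Kaplan–Meier median (the first event time at which estimated survival is at or below 50%), which is one common definition.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up time per patient (e.g., months from diagnosis)
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which S(t) <= 0.5, or None if the curve never reaches it."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```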
Meta-analysis of GWAS data
A fixed-effects meta-analysis was performed to combine study-specific HR of sequence variants using an inverse variance-based weighting method implemented in the METAL program (31). The combined estimates were only computed for those variants observed in at least three studies. I2 statistic was calculated to assess the proportion of the total variation due to heterogeneity, and I2 > 75% and PHET < 0.05 were applied to filter out variants with high study heterogeneity (32).
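The inverse variance-weighted fixed-effects combination and the I2 statistic can be written out in a few lines. The sketch below is a standalone illustration rather than METAL itself; the function name and return layout are hypothetical. It takes study-specific log hazard ratios and standard errors, applies the minimum of three studies mentioned above, and reports Cochran's Q and I2 (the heterogeneity P value is omitted here).

```python
import math
from statistics import NormalDist


def fixed_effects_meta(betas, ses, min_studies=3):
    """Inverse variance-weighted fixed-effects meta-analysis of log hazard ratios.

    Returns None when the variant is observed in fewer than min_studies studies.
    """
    if len(betas) < min_studies:
        return None
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: percentage of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    return {"hr": math.exp(beta), "beta": beta, "se": se,
            "p": p_value, "q": q, "i2": i2}
```

A variant whose result has `i2` above 75 would be a candidate for the heterogeneity filter described above.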
Given the biological heterogeneity across lung cancer histologic types, we have also conducted additional analysis restricted to 1,065 patients with adenocarcinoma, as this is the predominant histologic type of never smokers. We did not consider a subgroup analysis for other histology types due to small sample size. For genetic variants with P value of less than 10−6, we assessed the generalizability based on the Japanese study. All statistical tests were two-sided.
Functional significance
Genetic variants with combined P value less than 10−6 for lung cancer survival were followed up for potential functional significance through an eQTL investigation based on the Lung eQTL dataset, which includes three independent studies, and GTEx data. All data sources have been described previously (22, 23). Briefly, for the Lung eQTL dataset, whole-genome gene expression profiling in the lung was performed on a custom Affymetrix array (GPL10379). Microarray preprocessing and quality controls were conducted as previously described (22). Genotyping was carried out on the Illumina Human 1M-Duo BeadChip array. Only cis-eQTL were considered in this study, testing probe sets located within 1 Mb up and downstream of the SNPs associated with lung cancer survival. Genotypes and gene expression were available in a total of 1,038 individuals including 409 from Laval University, 287 patients from the UBC, and 342 from University of Groningen (33). Association tests were carried out in each cohort and then meta-analyzed using the Fisher method. Expression QTL analyses were adjusted for age, sex, and smoking status. In addition, the GTEx database (http://www.gtexportal.org/home/) of RNA sequencing analysis was queried to examine the functional association between candidate variants and expression of nearby genes in 515 human lung tissues (Release V7).
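Fisher's method combines k independent P values through the statistic X = −2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null. Because the degrees of freedom are even, the tail probability has a closed form, so the sketch below needs no statistical library. The function name is hypothetical, and the sketch assumes the cohort-level P values are independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k df:
    #   exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

For example, `fisher_combined_p([p_laval, p_ubc, p_groningen])` would give one combined P value per variant and probe set pair, where the three argument names are placeholders for cohort-level results.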
Results
Study population characteristics
The baseline characteristics of 1,569 never-smoking patients with lung cancer with European ancestry and 887 Asian patients with lung cancer from Japan are shown in Table 1. All studies had similar age distribution with mean age at diagnosis of approximately 62 years across all studies with European ancestry. As expected, approximately two thirds of the patients were females, and lung adenocarcinoma was the primary histologic type in all studies. Median follow-up time ranged from 26 months in the MDACC-OncoArray study to 126 months in the Mayo Clinic study. Overall, 53% of patients were diagnosed with localized stage (I–IIIA) and the remaining 47% with advanced stage (IIIB–IV). The association between key patient characteristics and survival is shown in Supplementary Table S1. As expected, clinical stage is the most prominent factor associated with survival. Treatment information was available in five of the studies as surgery, chemotherapy, or radiotherapy.
Genetic variants associated with lung cancer survival
A total of 629,283 single-nucleotide variants were included in the combined analysis after quality control filtering procedures previously described. The distribution of the bottom 95% of P values was similar to the expected distribution, and the genomic control parameter was 1.02 based on the combined analysis. The associations between genetic variants and lung cancer overall survival for all patients with lung cancer and adenocarcinoma across chromosomes are shown in Manhattan plots (Supplementary Fig. S2). The main findings for overall survival among all patients with lung cancer and adenocarcinoma are summarized in Table 2. For lung cancer overall, no regions reached genome-wide significance, and four variants at 1p22.3, 8q2.3, 9q31.3, and 10p14 were associated with overall survival at P value less than 10−6 (Supplementary Fig. S3A–S3D).
When restricting the analysis to 1,065 patients with lung adenocarcinoma, the intergenic region at 13q22.2 (represented by rs12875562) reached genome-wide significance, with the T allele associated with shorter survival time (HR = 1.71; 95% CI, 1.41–2.07, P = 3.60 × 10−8; Table 2 and Fig. 1). In addition, seven other loci had suggestive evidence of association with overall survival at P value ≤ 10−6 (Table 2). Among those loci that showed suggestive evidence, it is worth mentioning that the loci encoding the CAPZB gene at 1p36.13 (represented by rs214346) and the UBAC1 gene at 9q34.3 (represented by rs6569) both conferred consistent association across studies with HR of 0.72 (95% CI, 0.63–0.82, P = 5.86 × 10−7) and 0.72 (95% CI, 0.63–0.82, P = 5.94 × 10−7), respectively (Figs. 2A and 3A). The genetic locus that conferred the most distinctive survival patterns by genotype is located in 11q14.1 (represented by rs17148028) encoding DLG2 (Table 2; Supplementary Fig. S3I) with HR of 0.48 (0.36–0.64), with carriers of the T allele exhibiting better prognosis. The forest plots and regional plots of the remaining loci are shown in Supplementary Fig. S3.
Among the total 12 loci, the locus encoding NWD2/KIAA1239 located at 4p14 (represented by rs17603438) also showed an association with lung cancer survival in the Japanese cohort. The major allele A was correlated with longer survival time in both the European cohorts (HR = 0.50; 95% CI, 0.38–0.66, P = 6.92 × 10−7) and the Japanese study (HR = 0.79; 95% CI, 0.67–0.94, P = 0.0072). No other loci showed generalizable association across ethnic groups.
Supplementary Table S2 summarizes the results for genetic variants previously reported to be associated with lung cancer prognosis based on two of the studies included in this analysis (34). Three of the eight loci remained nominally significant (Supplementary Table S2) at P values of 10−2 to 10−4 based on ten studies. No follow-up analyses were performed on these variants given the weak level of evidence.
Functional characterization
To investigate whether the variants associated with lung cancer survival may modulate the mRNA expression in the lung tissues, we conducted eQTL analysis for the top 12 loci identified by overall and adenocarcinoma only analysis in a total of 4 independent studies, including 3 lung microarray studies and GTEx, based on a total of 1,553 lung tissues. Variants that were shown to have significant cis-effect on the mRNA expression of a nearby gene across all four studies are shown in Table 3.
Three loci demonstrated consistent cis-effects across all 4 eQTL studies, including the only genome-wide significant variant, rs12875562 located at 13q22.2, which had a significant eQTL effect on LMO7 Downstream Neighbor (LMO7DN) gene expression (P = 3.44 × 10−3 in the Lung eQTL dataset and P = 9.40 × 10−7 in GTEx; Table 3 and Fig. 1). Patients with the minor allele T had poor survival and lower LMO7DN expression. The strongest eQTL signal came from rs214346 located in CAPZB at 1p36.13, which showed a consistent association with increased expression of CAPZB in the Lung eQTL dataset (P = 2.78 × 10−10) and in GTEx lung tissue (P = 1.10 × 10−9; Table 3 and Fig. 2). Patients with minor A allele have better survival and lower CAPZB expression in lung tissue. Finally, rs6569, located at 9q34.3 in the UBAC1 gene, was found to be associated with decreased expression of UBAC1 in both the Lung eQTL dataset (P = 1.14 × 10−3) and GTEx (P = 9.20 × 10−14; Table 3 and Fig. 3). The minor allele A of this variant was associated with longer survival (HR = 0.71; 95% CI, 0.62–0.81, P = 5.94 × 10−7) and a higher expression of UBAC1.
Discussion
Based on the largest GWAS on lung cancer prognosis for never smokers conducted to date, we identified one locus that reached GWAS significance level at ch13q22 for patients with lung adenocarcinoma, and 11 loci that provided suggestive evidence; among those, ch4p14 was also associated with lung cancer prognosis in the Japanese population. Three of the top 12 loci were shown to affect mRNA expression of nearby genes as cis-eQTL across multiple studies, including the region with the top signal in ch13q22, which provides an additional level of evidence for the association with prognosis.
The variant at the 13q22 locus is located adjacent to the LMO7 gene and LMO7DN. LMO7 encodes a fibrous actin-binding protein that is commonly expressed in many human tissues, with particularly high expression in lung epithelial cells. It was suggested to be involved in the maintenance of epithelial architecture (35) and is considered to act as a tumor-suppressor gene, as LMO7 knock-out mice were shown to develop spontaneous lung adenocarcinoma (36). LMO7 expression was shown to be associated with lung cancer prognosis, but the direction of effect is not yet conclusive, which can be attributed to histologic types included in the studies as well as other related genes in the same regulatory pathway (37, 38). We did not observe a consistent eQTL association with LMO7 per se, but with its downstream neighbor (LMO7DN) instead, with carriers of the T allele exhibiting poor prognosis for lung adenocarcinoma and lower LMO7DN expression. This suggests that LMO7DN might play a more important role in lung tumor progression. This is the first time LMO7DN has been identified as a gene associated with lung cancer prognosis at the genome-wide level.
The region in ch1p36.13 encodes CAPZB gene, whose mRNA expression is affected by the variant with A allele associated with lower level of expression. CAPZB is a regulator of actin filament length that determines the mitotic cortex thickness during cell-cycle progression, and it is associated with cell growth and motility in epithelioid sarcoma (39, 40). Variants within the same locus were previously associated with total platelet mass (41) and autoimmune-related disorders such as Crohn's disease and psoriasis (42, 43). Other variants of the same gene have also been shown to be related to lung function (44, 45), although the exact mechanism of how this gene is associated with lung cancer prognosis in never smokers is not clear.
The region in ch9q34.3 encodes UBAC1, which is associated with the innate immune system and Class I MHC–mediated antigen processing. Sequence variants in this gene have been shown to be associated with IL4 and IL6 levels (46). It has not been previously reported to be associated with cancer risk or prognosis. It is biologically plausible that this gene may modulate tumor progression through the innate immune pathway. Finally, genetic variants of DLG2 located at 11q14.1 were previously shown to be associated with lung function (47) and familial squamous cell lung carcinoma (48), as well as anthropometric measurements such as body fat mass, fat, and body mass index (49), which support a role of this gene in lung carcinogenesis and prognosis.
Although previous studies have identified several genetic loci associated with lung cancer overall survival, primarily in smoking-related lung cancer and early stage non–small cell lung cancer (14, 17–19), we did not observe the association with those previously reported loci in our study of never smokers. This is not a surprise as we expect distinctive genetic architecture contributes to lung cancer survival in smokers and never smokers.
For cross-ethnic generalizability, we observed only one association, at ch4p14 encoding the NWD2/KIAA1239 gene, that was potentially generalizable between European and Japanese populations. This could be due to differences in the genetic background or different linkage disequilibrium patterns underlying causal variants between European and Japanese populations. In addition, the Japanese study has different characteristics such as distribution of sex, stage, treatment modality, and overall survival, which could also contribute to limited generalizability across different ethnicities. Interestingly, the 4p14 locus was previously reported to be associated with smoking behavior (50).
Our study has several limitations. First, potential heterogeneity between patient characteristics across studies may not be fully accounted for. In particular, treatment information was not available for studies included in the OncoArray project and therefore could not be adjusted for in the model. However, it is expected that adjustment for clinical stage would mitigate the potential confounding effect of treatment as these two factors are highly correlated. Similarly, we do not have information on somatic mutations such as EGFR or ALK, which would affect prognosis. It is likely that the differences we observed across ethnic populations are due to the differences in these mutations, which are markedly different by ethnic groups. Second, our study might be underpowered due to the relatively rare occurrence of lung cancer in never smokers, which can lead to potential false-negative results. Nevertheless, this is the largest genome-wide analysis for lung cancer prognosis among never smokers conducted to date, and the results have high relevance considering the percentage of never smokers is increasing among patients with lung cancer.
In summary, we identified one locus at the genome-wide significance level at 13q22, which is consistent with the known role of the genetic region encoding LMO7/LMO7DN in cancer pathology, along with several potential loci that would require further validation. The integrated evidence from the associations with lung cancer survival and from eQTL analysis is complementary and supports our hypothesis that inherited genetic factors affect lung cancer survival in never smokers. Functional genomic experiments to assess the effect of altered gene regulation would be required to further understand the therapeutic potential of these biologically plausible genes.
Disclosure of Potential Conflicts of Interest
J.K. Field reports other from AstraZeneca (speakers bureau/advisory board), Epigenomics (advisory board), and Nucleix Ltd. (advisory board) and grants from Janssen Research & Development LLC outside the submitted work. No potential conflicts of interest were disclosed by the other authors.
Disclaimer
Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article, and they do not necessarily represent the decisions, policy, or views of the International Agency for Research on Cancer/World Health Organization.
Authors' Contributions
Y. Brhane: Conceptualization, formal analysis, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. P. Yang: Resources, data curation, project administration, writing–review and editing. D.C. Christiani: Resources, data curation, project administration, writing–review and editing. G. Liu: Resources, data curation, project administration, writing–review and editing. J.R. McLaughlin: Resources, data curation, project administration, writing–review and editing. P. Brennan: Resources, data curation, project administration, writing–review and editing. S. Shete: Resources, data curation, project administration, writing–review and editing. J.K. Field: Resources, data curation, project administration, writing–review and editing. A. Tardón: Resources, data curation, project administration, writing–review and editing. T. Kohno: Resources, data curation, project administration, writing–review and editing. K. Shiraishi: Resources, data curation, project administration, writing–review and editing. K. Matsuo: Resources, data curation, project administration, writing–review and editing. Y. Bossé: Resources, data curation, project administration, writing–review and editing. C.I. Amos: Resources, data curation, funding acquisition, project administration, writing–review and editing. R.J. Hung: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, methodology, writing–original draft, project administration, writing–review and editing.
Acknowledgments
The genetic data generation was supported by U.S. NIH U19 CA148127, and the analysis was supported by U19 CA203654 and CIHR FDN 167273.
R.J. Hung holds the Canada Research Chair in Integrative Molecular Epidemiology.
Dr. C.I. Amos is a Research Scholar of the Cancer Prevention Research Institute of Texas, and part of his effort is supported by RR170048.
The Mayo Clinic Study is supported by NIH grants CA77118/CA80127/CA84354 and Mayo Foundation. Support was provided by the Mayo Clinic Shared Resources.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.