Purpose: Postoperative recurrence is the major cause of poor prognosis in stage I non–small cell lung cancer (NSCLC). This study aimed to identify genetic variants associated with the prognosis of early-stage NSCLC.

Experimental Design: A genome-wide association study (GWAS) was conducted in 250 patients with stage I NSCLC, and the results were replicated in an additional 308 patients.

Results: Genotyping of 250 patients on the Affymetrix Genome-Wide Human SNP Array identified 94 SNPs with significant associations (P < 2 × 10⁻⁴), which were selected for replication in 308 additional patients. Pooled analysis of the 558 patients showed that rs1454694 on chromosome 4q34 was the marker most significantly associated with prognosis in stage I patients (adjusted HR = 2.81; P = 5.91 × 10⁻⁸). Fine-mapping of the candidate locus identified four additional markers at chromosome 4q34.3 that were significantly associated with recurrence-free survival (RFS; P < 5 × 10⁻⁵). A haplotype of the five SNPs in 4q34 was also significantly associated with RFS (P = 4.29 × 10⁻⁶).

Conclusions: The polymorphism rs1454694 was identified as a novel genetic risk factor for RFS in stage I NSCLC. This genome-wide study suggests that genetic markers in 4q34.3 may help predict the prognosis of Korean patients with stage I NSCLC. Clin Cancer Res; 20(12); 3272–9. ©2014 AACR.

Translational Relevance

Inherited genetic variants associated with the risk of postoperative recurrence could be useful indicators for risk prediction and for improving prognosis in stage I non–small cell lung cancer (NSCLC). In this genome-wide association study (GWAS), we found that rs1454694 on chromosome 4q34 was significantly associated with recurrence-free survival (RFS) in patients with stage I NSCLC. Four additional markers at the same locus were also significantly associated with RFS. The novel locus identified in this study may provide new insights into individual susceptibility to postoperative recurrence in patients with early-stage lung cancer.

Lung cancer is the leading cause of cancer mortality worldwide (1–3). In Korea, non–small cell lung cancer (NSCLC) accounts for the majority of diagnosed lung cancers; 70% of patients with NSCLC are diagnosed at stage III or IV and only 17% at stage I (4, 5). The overall survival (OS) of patients with advanced NSCLC remains poor, whereas survival in early-stage patients has improved.

For patients with pathologic stage IA and IB NSCLC who are candidates for surgical resection, the 5-year OS rates are approximately 73% and 58%, respectively (6, 7). Whereas postoperative adjuvant chemotherapy is the standard of care for patients with completely resected stage II to IIIA NSCLC, surgery alone remains the recommended treatment for stage I NSCLC. The recurrence rate in stage I patients who undergo surgical resection ranges from 13% to 41% (8, 9). Thus, postoperative recurrence is a major obstacle to prolonged survival in early-stage NSCLC, and outcomes differ considerably among patients with the same pathologic stage. Genetic markers that identify patients at high risk of recurrence are essential for developing effective clinical strategies, because patients with a poor prognosis could then receive more aggressive treatment.

Genome-wide association studies (GWAS) identify phenotype-associated markers by testing tagging polymorphisms across the entire genome. GWAS have been used to identify not only susceptibility markers but also predictors of clinical outcome in cancers of the breast, pancreas, and lung (10–13). Several genome-wide studies have reported genetic polymorphisms associated with lung cancer prognosis. Huang and colleagues reported that 5 SNPs in the STK39, PCDH7, A2BP1, and EYA2 genes were associated with OS in early-stage NSCLC in a GWAS using tumor tissues (14). A SNP in the chemokine-like receptor 1 gene (12q23.3) was associated with OS in patients with advanced-stage NSCLC receiving platinum-based chemotherapy (15). Another GWAS identified 3 prognostic markers associated with OS in advanced NSCLC treated with carboplatin and paclitaxel (16). However, genetic polymorphisms associated with the risk of postoperative recurrence in patients with completely resected stage I NSCLC have not been identified.

In this study, we conducted a genome-wide screen and a subsequent replication study in 558 patients who underwent complete resection of stage I NSCLC, to identify prognostic markers for early-stage NSCLC in the Korean population.

Study population

We recruited patients with histologically confirmed stage I NSCLC at the National Cancer Center Korea (Goyang, Korea) from May 2001 to December 2010. Patients participated voluntarily after providing written informed consent under a protocol approved by the Institutional Review Board. No recruitment restrictions were imposed with regard to gender or smoking status. None of the patients had received chemotherapy or radiotherapy before recruitment. In the discovery set, 250 patients with stage I NSCLC were genotyped on the Affymetrix Genome-Wide Human SNP Array 5.0. SNPs selected from the GWAS results were validated in a replication study of an additional 308 patients with stage I NSCLC. Demographic characteristics, including gender, age, and smoking habits, were collected, and follow-up information was available for all patients. Patients who currently smoked or who had smoked more than 100 cigarettes in their lifetime were defined as ever-smokers.

Clinical outcomes

All patients who underwent surgical resection were followed up at 3-month intervals after surgery. We retrospectively reviewed the medical and pathology records of each patient. Chest computed tomography (CT), positron emission tomography (PET), bronchoscopy, and pulmonary function tests were performed preoperatively. Postoperatively, chest CT scans were obtained at 3-month intervals and PET-CT scans annually to detect tumor recurrence. Pathologic stage was confirmed by pathologists on the resected specimens and the lymph nodes obtained by pulmonary resection with complete lymph node dissection. Patients who underwent a limited resection, such as a wedge resection or segmentectomy, and patients who received neoadjuvant chemotherapy were excluded. Recurrence-free survival (RFS) was measured from the date of surgery to the date of recurrence or, for patients without recurrence, the date of last follow-up. OS was measured from the date of surgery to the date of death from any cause or, for patients not known to have died at the time of analysis, the date of last follow-up.
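As a minimal illustration of these definitions, the Python sketch below turns surgery, recurrence, and last-follow-up dates into a right-censored (time, event) pair and computes a Kaplan–Meier product-limit curve; reversing the event indicator gives the reverse Kaplan–Meier estimate of follow-up used in the statistical analysis. The function names, the 30.4375-day month, and the toy data are assumptions for illustration and do not reproduce the study's Stata analysis.

```python
from datetime import date


def to_time_event(surgery, recurrence=None, last_follow_up=None):
    """Return (months since surgery, event flag) for RFS.

    event = 1 if a recurrence was observed; otherwise the patient is
    censored (event = 0) at the last follow-up date.
    """
    if recurrence is not None:
        end, event = recurrence, 1
    else:
        end, event = last_follow_up, 0
    # Assumption: average Gregorian month length of 30.4375 days.
    return (end - surgery).days / 30.4375, event


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, S(t)) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group tied times; subjects censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_events + n_censored
    return curve


def median_time(curve):
    """First time at which S(t) <= 0.5, or None if not reached (NR)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data (months, event flag), not study data.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1]
print(median_time(kaplan_meier(times, events)))  # → 5 (median RFS)
# Reverse Kaplan-Meier: treat censoring as the "event" to estimate follow-up.
print(median_time(kaplan_meier(times, [1 - e for e in events])))  # → 4
```

Censored patients stay in the risk set at their censoring time and leave it afterward, which is why a median can be "not reached" (NR) when the curve never falls to 0.5.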

Genotyping

Genomic DNA was extracted from peripheral blood using a QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer's instructions. Genome-wide, high-density SNP screening was performed on 250 subjects using the Affymetrix Genome-Wide Human SNP Array 5.0 (Affymetrix). Genotyping in the replication phase was carried out using the SNPtype assay (Fluidigm) or a TaqMan genotyping assay (Applied Biosystems) according to the manufacturers' recommendations. The SNPtype assay was performed with the 96.96 Dynamic Array and analyzed with Fluidigm SNP Genotyping Analysis software (version 3.1.1; Fluidigm). To confirm reproducibility, 50 samples were genotyped in duplicate with each method.

Mapping

To examine the genomic region associated with RFS in depth, the 4q34.3 locus was fine-mapped using whole-genome sequencing (WGS) data and HapMap data. Because rs1454694 lies within a long gene desert, the region was fine-mapped using WGS data from 27 blood samples collected at the National Cancer Center of Korea. We selected the region extending 400 kb upstream and downstream of rs1454694 (0.8 Mb in total) and scanned all SNPs in the WGS data of the 27 patients. We filtered out novel rare variants that do not yet have a reference SNP ID (rs ID) in the National Center for Biotechnology Information (NCBI, Bethesda, MD) dbSNP database version 130 and applied strict quality control criteria [minor allele frequency (MAF) ≥ 10% and Hardy–Weinberg equilibrium (HWE) P ≥ 0.0001]. Using the Tagger function of Haploview 4.2 (r² ≥ 0.8, aggressive tagging with 2-marker haplotypes), we mapped 94 novel tag SNPs across the 0.8-Mb region surrounding rs1454694, excluding SNPs in complete linkage disequilibrium (LD; r² = 1.0) with markers on the Affymetrix SNP array. The selected SNPs were then genotyped in all 558 samples using the SNPtype assay.
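The tag-SNP selection step can be pictured with a simplified greedy sketch: pick the SNP that covers the most still-untagged SNPs at r² ≥ 0.8, remove them, and repeat. This is a stand-in under stated assumptions, not Haploview's Tagger: it uses pairwise tagging only (no aggressive 2-marker haplotype tests) and approximates r² by the squared correlation of unphased allele dosages, whereas Haploview works from estimated haplotypes. The function names, SNP names, and genotypes are hypothetical.

```python
def r2_dosage(g1, g2):
    """Squared Pearson correlation between allele dosages (0/1/2).

    Approximation: Haploview estimates r² from inferred haplotypes,
    whereas this uses unphased genotype dosages.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: correlation undefined
    return cov * cov / (v1 * v2)


def greedy_tags(genotypes, threshold=0.8):
    """Greedy pairwise tagging: repeatedly pick the SNP that covers the
    most still-untagged SNPs at r² >= threshold (a SNP always covers itself)."""
    ids = list(genotypes)
    covers = {
        s: {t for t in ids
            if t == s or r2_dosage(genotypes[s], genotypes[t]) >= threshold}
        for s in ids
    }
    untagged, tags = set(ids), []
    while untagged:
        # max() keeps the first SNP in input order among ties.
        best = max(ids, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


genos = {  # hypothetical dosages for 6 individuals
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA, so r² = 1
    "snpC": [2, 1, 0, 0, 1, 2],  # uncorrelated with snpA
}
print(greedy_tags(genos))  # → ['snpA', 'snpC']
```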

Statistical analysis

The association between demographic characteristics and prognosis was tested with the log-rank trend test. Median follow-up time was estimated with the reverse Kaplan–Meier method (17). RFS rates were estimated with the Kaplan–Meier product-limit method, and trends in RFS across genotypes were tested with the log-rank test. HRs and their 95% confidence intervals (CI) were derived from Cox proportional hazards regression models. To combine the results of the discovery and replication phases, a Cox model stratified by phase, allowing a different baseline hazard in each phase, was used.
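The study does not describe how the stratified model was fitted beyond the use of Stata, so the following is only a standard-library sketch of the idea for a single covariate (for example, the number of risk alleles): each phase contributes its own risk sets, so baseline hazards are never compared across phases, while one log-HR, beta, is shared. Ties use the Breslow approximation and the fit uses Newton–Raphson; the function name and data are hypothetical, and covariate adjustment is omitted.

```python
import math


def stratified_cox_1d(times, events, x, strata, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of a one-covariate Cox model with a separate
    baseline hazard per stratum (Breslow handling of ties).

    Returns (beta, se); exp(beta) is the hazard ratio per unit of x.
    Assumes x varies within risk sets (otherwise information is zero).
    """
    groups = {}
    for row in zip(times, events, x, strata):
        groups.setdefault(row[3], []).append(row[:3])
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for rows in groups.values():  # risk sets never cross strata
            for t_i, e_i, x_i in rows:
                if not e_i:
                    continue  # censored subjects only enter risk sets
                risk = [xj for tj, _, xj in rows if tj >= t_i]
                w = [math.exp(beta * xj) for xj in risk]
                s0 = sum(w)
                s1 = sum(wj * xj for wj, xj in zip(w, risk))
                s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
                score += x_i - s1 / s0               # first derivative
                info += s2 / s0 - (s1 / s0) ** 2     # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: x = risk-allele indicator, strata = study phase.
times = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 7]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
x = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
phase = ["discovery"] * 6 + ["replication"] * 5
beta, se = stratified_cox_1d(times, events, x, phase)
lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"HR = {math.exp(beta):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because risk sets never cross strata, shifting every follow-up time in one phase leaves the estimate unchanged, which is the practical meaning of allowing a different baseline hazard per phase.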

From the SNP array data, SNPs with duplicate IDs in the Affymetrix annotation database or a call rate <95% were excluded from statistical analysis. SNPs retained for analysis were required to have a MAF >5% in our samples. Genotyping results were tested for HWE, and SNPs deviating from HWE (P < 0.0001) were also excluded. PLINK was used for genome-wide quality control and analysis (18). Quantile–quantile (Q–Q) and Manhattan plots were used to evaluate the overall significance of the GWAS data (19).
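A minimal sketch of these per-SNP filters, assuming genotypes coded as 0/1/2 allele counts with None for a failed call. The thresholds mirror the text (call rate ≥ 95%, MAF > 5%, HWE P ≥ 0.0001), but the HWE test here is a 1-df chi-square goodness-of-fit test, a simplification; PLINK's HWE filter is based on an exact test, so borderline SNPs could be classified differently. The function name and toy genotypes are illustrative.

```python
import math


def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.05, hwe_p_min=1e-4):
    """Apply call-rate, MAF, and HWE filters to one SNP.

    genotypes: allele counts 0/1/2 per sample, None for a failed call.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    n = len(called)
    obs = [called.count(0), called.count(1), called.count(2)]
    p = (obs[1] + 2 * obs[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected) if e > 0)
    p_hwe = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    keep = call_rate >= min_call_rate and maf > min_maf and p_hwe >= hwe_p_min
    return {"call_rate": call_rate, "maf": maf, "p_hwe": p_hwe, "keep": keep}


in_hwe = [0] * 25 + [1] * 50 + [2] * 25   # exactly at HWE proportions
no_hets = [0] * 50 + [2] * 50             # no heterozygotes: strong deviation
print(snp_qc(in_hwe)["keep"], snp_qc(no_hets)["keep"])  # → True False
```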

LD between SNPs was measured with Lewontin's D′ (|D′|) and r² using the Haploview software package (20). Haplotypes of the 5 SNPs in 4q34.3 were estimated from the genotyping data using PHASE (21), and the haplotype with the highest probability was used for analysis. The relationship between polymorphisms and recurrence was analyzed with multiple logistic regression models adjusted for age, gender, smoking status, pathologic stage, and postoperative complications. All reported P values are 2-sided. The false discovery rate (FDR) for multiple testing was estimated using the Benjamini–Hochberg method and Storey's q value method (22, 23). Statistical analyses were performed with Stata/SE version 10 (StataCorp LP).
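The Benjamini–Hochberg adjustment, together with the Bonferroni correction applied in the Results, can be sketched as follows; the Bonferroni line reproduces the arithmetic behind the corrected P value reported for rs1454694 (5.91 × 10⁻⁸ × 308,098 ≈ 0.0182). Storey's q value method, which additionally estimates the proportion of true null hypotheses, is not shown. Function names and the example P values are illustrative.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted P value for one of m tests (capped at 1)."""
    return min(1.0, p * m)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bonferroni(5.91e-8, 308_098))  # about 0.0182
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # about [0.02, 0.04, 0.04, 0.02]
```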

Patient characteristics

The demographic features of patients with stage I NSCLC are shown in Table 1. Patients in the GWAS were retrospectively recruited from the National Cancer Center of Korea and had histologically confirmed NSCLC. All patients had pathologic stage I disease (American Joint Committee on Cancer, AJCC/UICC 7th edition), and survival and tumor recurrence were assessed in all patients who underwent surgical resection of the lung and had available clinical information. We observed 117 recurrences during a median follow-up of 42.8 months. Adenocarcinoma and ever-smokers predominated in our cohort. Age, pathologic stage, and histologic type were associated with recurrence. Specifically, older age was significantly associated with poor clinical outcome (HR, 1.65; P = 0.008), and patients with stage IB disease had poorer RFS than those with stage IA disease (HR, 2.09; P = 0.0003), whereas adjuvant chemotherapy was not associated with RFS. Patients with adenocarcinoma had a lower risk of recurrence than other patients (HR, 0.67; P = 0.005). We therefore selected age, pathologic stage, and histologic type as covariates for multivariate Cox proportional hazards regression analysis.

Table 1.

Demographic characteristics among patients with stage I lung cancer

Characteristic              Log-rank P (a)   Discovery (n = 250)   Replication (n = 308)   Pooled (n = 558)
                                             n (%)                 n (%)                   n (%)
Gender                      0.389
  Male                                       148 (59.2)            216 (70.1)              364 (65.2)
  Female                                     102 (40.8)            92 (29.9)               194 (34.8)
Age (b), y                  0.008
  <64                                        127 (50.8)            157 (51.0)              269 (48.2)
  ≥64                                        123 (49.2)            151 (49.0)              289 (51.8)
Smoking status              0.491
  Never                                      105 (42.0)            98 (31.8)               203 (36.4)
  Ever                                       145 (58.0)            210 (68.2)              355 (63.6)
Histology                   0.005
  Adenocarcinoma                             186 (74.4)            164 (53.2)              350 (62.7)
  Squamous cell carcinoma                    64 (25.6)             121 (39.3)              185 (33.2)
  Others (c)                                 –                     23 (7.5)                23 (4.1)
Pathologic stage            0.0003
  IA                                         135 (54.0)            95 (30.8)               230 (41.2)
  IB                                         115 (46.0)            213 (69.2)              328 (58.8)
Complication                0.119
  No                                         200 (80.0)            222 (72.1)              422 (75.6)
  Yes                                        50 (20.0)             86 (27.9)               136 (24.4)
Adjuvant chemotherapy       0.699
  No                                         217 (86.8)            263 (85.4)              480 (86.0)
  Yes                                        33 (13.2)             45 (14.6)               78 (14.0)
Recurrence                                   29 (11.6)             88 (28.6)               117 (21.0)
Death                                        19 (7.6)              61 (19.8)               80 (14.3)
Median follow-up (d), mo (95% CI)            38.70 (36.73–44.38)   53.09 (37.78–59.46)     42.84 (37.88–47.93)

(a) P value was calculated by the log-rank trend test for recurrence in 558 patients.

(b) Age was dichotomized at the median age of 64 years.

(c) Others included NSCLCs of unspecified histology.

(d) Median follow-up time and 95% CIs were estimated with the reverse Kaplan–Meier method.

Association analysis

In the discovery phase of the GWAS, 250 patients with stage I NSCLC were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0. Of 440,094 genotyped SNPs, we excluded 127,001 with a MAF less than 5%, 4,771 with a call rate below 95%, and 224 that deviated from HWE (P ≤ 0.0001). After quality control, 308,098 SNPs remained for subsequent analyses. The Q–Q plot, with a genomic inflation factor of λ = 1.01, and the genome-wide association results plotted as trend P values are shown in Supplementary Fig. S1. From the discovery phase, 94 SNPs associated with recurrence or death at a minimum P < 2 × 10⁻⁴ (the smaller P value of the codominant and dominant models) were selected for replication in 308 patients. Association results for the selected SNPs with RFS in all 558 patients are shown as trend P values and multiple testing–adjusted P values (Supplementary Table S1).
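The filtering arithmetic and the genomic inflation factor can be checked with a short sketch. λ is the median of the observed 1-df association chi-square statistics divided by the median of a chi-square distribution with 1 df (about 0.455); values near 1, such as the 1.01 reported here, suggest little inflation from population stratification. Converting two-sided P values to chi-square statistics through the normal quantile is an assumption of this sketch; the study's λ came from its own trend-test statistics, and the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """lambda = median(observed 1-df chi-square) / median of chi-square(1).

    Two-sided P values are converted to 1-df chi-square statistics via
    chi2 = z**2 with z = Phi^-1(1 - p/2).
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median


# SNP counts from the text: 440,094 genotyped minus the three QC exclusions.
remaining = 440_094 - 127_001 - 4_771 - 224
print(remaining)  # → 308098

# Under the null hypothesis P values are uniform, so lambda should be about 1.
null_p = [(i - 0.5) / 101 for i in range(1, 102)]
print(round(genomic_inflation(null_p), 3))  # → 1.0
```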

A single polymorphism in the chromosome 4q34 region, rs1454694, showed the most significant association with RFS in both the replication phase (adjusted P = 8.32 × 10⁻⁴) and the discovery phase (P = 2.86 × 10⁻⁶). In a pooled analysis of all 558 patients with stage I disease, the association of rs1454694 was strengthened (P = 5.91 × 10⁻⁸, calculated by fixed-effects meta-analysis; Table 2). The effect of rs1454694 remained significant after Bonferroni correction for 308,098 SNPs (P = 0.0182). Subgroup analyses showed that the association between rs1454694 and RFS was more pronounced in men, ever-smokers, and patients with stage IB disease (Table 3). To assess potential heterogeneity, we tested the effect of rs1454694 across subgroups with Cochran's Q test. There was no significant heterogeneity in the effect of rs1454694 on RFS by gender (P = 0.46), smoking status (P = 0.54), histology (P = 0.64), or pathologic stage (P = 0.86). However, rs1454694 genotypes were not associated with OS.
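The heterogeneity test can be illustrated with the per-allele HRs for men and women in Table 3. This sketch assumes Wald-type 95% CIs, so that each SE can be recovered from the CI width, pools the log HRs with inverse-variance (fixed-effect) weights, and computes Cochran's Q; the closed-form P value is valid only for two groups (1 df). Because the SEs are rebuilt from rounded CIs, the result is approximate, and the function names are illustrative.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_hr_and_se(hr, lo, hi):
    """Recover log(HR) and its SE from a reported 95% CI (assumes a Wald CI)."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * Z975)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling plus Cochran's Q test."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = 1 / math.sqrt(sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # Chi-square upper tail in closed form, implemented here only for df = 1.
    p_q = math.erfc(math.sqrt(q / 2)) if df == 1 else float("nan")
    return pooled, pooled_se, q, p_q


# Per-allele HRs (95% CI) for rs1454694 in men and women, from Table 3.
male = log_hr_and_se(2.25, 1.61, 3.16)
female = log_hr_and_se(1.79, 1.06, 3.01)
pooled, se, q, p_q = fixed_effect_meta([male, female])
print(f"pooled HR {math.exp(pooled):.2f}, Cochran Q {q:.2f}, P {p_q:.2f}")
```

With these inputs the sketch gives a heterogeneity P value of roughly 0.47, close to the P = 0.46 reported for gender; exact agreement is not expected because the SEs are reconstructed from rounded CIs.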

Table 2.

Association of rs1454694 with RFS and OS in GWAS and replication study

rs1454694   Patients (n)  Recurrences (n)  MRFT (a), mo  RFS HR (95% CI)       P             MST (a), mo  OS HR (95% CI)       P
Discovery phase
  TT        154           8                NR            Ref.                                NR           Ref.
  TC        92            18               96.22         5.08 (2.16–11.97)     2.03 × 10⁻⁴   NR           1.71 (0.67–4.38)     2.61 × 10⁻¹
  CC        4             3                34.76         28.72 (6.20–133.00)   1.76 × 10⁻⁵   81.93        31.27 (1.8–544.59)   1.82 × 10⁻²
  TC + CC   96            21               96.22         5.48 (2.36–12.68)     7.20 × 10⁻⁵   NR           1.88 (0.76–4.68)     1.75 × 10⁻¹
  P trend                                                5.25 (2.62–10.50)     2.86 × 10⁻⁶                2.19 (0.91–5.29)     8.18 × 10⁻²
Replication phase
  TT        185           38               NR            Ref.                                NR           Ref.
  TC        102           44               72.01         2.34 (1.51–3.61)      1.39 × 10⁻⁴   110.64       2.13 (1.27–3.59)     4.27 × 10⁻³
  CC        18            6                NR            1.86 (0.78–4.42)      1.63 × 10⁻¹   NR           1.32 (0.40–4.40)     6.47 × 10⁻¹
  TC + CC   120           50               76.81         2.27 (1.48–3.46)      1.56 × 10⁻⁴   110.64       2.02 (1.22–3.36)     6.61 × 10⁻³
  P trend                                                1.70 (1.24–2.31)      8.32 × 10⁻⁴                1.55 (1.05–2.27)     2.56 × 10⁻²
Pooled
  TT        339           46               NR            Ref.                                NR           Ref.
  TC        194           62               93.76         2.77 (1.89–4.06)      1.99 × 10⁻⁷   NR           2.05 (1.30–3.22)     1.92 × 10⁻³
  CC        22            9                77.92         3.17 (1.53–6.57)      1.95 × 10⁻³   NR           1.81 (0.63–5.16)     2.68 × 10⁻¹
  TC + CC   216           71               93.76         2.81 (1.93–4.08)      5.91 × 10⁻⁸   110.64       2.02 (1.30–3.15)     1.80 × 10⁻³
  P trend                                                2.12 (1.61–2.79)      9.15 × 10⁻⁸                1.67 (1.18–2.37)     3.89 × 10⁻³

NOTE: NR means the median time-to-event has not been reached. HRs, 95% CIs, and P values were calculated with multivariate Cox models adjusting for age, pathologic stage, and histologic types as covariates.

(a) MRFT (median recurrence-free time) and MST (median survival time) are given in months.

Table 3.

Association of SNPs in chromosome 4q34 with RFS

SNP          Position    MAF    Alleles   LD D′   LD r² (with rs1454694)
rs1454694    182434941   0.21   T/C       –       –
rs10019279   182435473   0.21   T/C               0.987
rs5012808    182440245   0.20   C/A       0.959   0.840
rs17241910   182444812   0.19   G/A       0.957   0.815
rs28431436   182454221   0.17   A/G       0.947   0.632

HR (95% CI) and P value
SNP          Overall            Male               Female             Ever-smokers       Never smokers      Stage IA           Stage IB
rs1454694    2.12 (1.61–2.79)   2.25 (1.61–3.16)   1.79 (1.06–3.01)   2.21 (1.57–3.11)   1.82 (1.10–3.03)   2.19 (1.27–3.75)   2.31 (1.66–3.21)
             9.15 × 10⁻⁸        2.26 × 10⁻⁶        2.98 × 10⁻²        5.21 × 10⁻⁶        1.97 × 10⁻²        4.53 × 10⁻³        6.03 × 10⁻⁷
rs10019279   2.03 (1.53–2.68)   2.14 (1.52–3.01)   1.74 (1.03–2.96)   2.09 (1.48–2.96)   1.78 (1.07–2.96)   2.04 (1.19–3.49)   2.23 (1.59–3.13)
             7.96 × 10⁻⁷        1.45 × 10⁻⁵        3.88 × 10⁻²        3.18 × 10⁻⁵        2.69 × 10⁻²        9.61 × 10⁻³        2.92 × 10⁻⁶
rs5012808    1.98 (1.49–2.63)   2.09 (1.48–2.97)   1.64 (0.95–2.83)   2.03 (1.42–2.94)   1.88 (1.12–3.18)   2.01 (1.12–3.61)   2.20 (1.58–3.06)
             2.67 × 10⁻⁶        3.43 × 10⁻⁵        7.73 × 10⁻²        1.06 × 10⁻⁴        1.77 × 10⁻²        1.87 × 10⁻²        3.18 × 10⁻⁶
rs17241910   2.08 (1.54–2.79)   2.06 (1.44–2.93)   1.91 (1.04–3.52)   2.04 (1.42–2.94)   2.29 (1.29–4.08)   2.06 (1.14–3.72)   2.38 (1.68–3.39)
             1.32 × 10⁻⁶        6.49 × 10⁻⁵        3.73 × 10⁻²        1.21 × 10⁻⁴        4.88 × 10⁻³        1.60 × 10⁻²        1.39 × 10⁻⁶
rs28431436   1.91 (1.40–2.60)   1.91 (1.32–2.78)   1.71 (0.94–3.09)   1.95 (1.33–2.87)   1.86 (1.06–3.28)   2.01 (1.09–3.69)   2.15 (1.48–3.13)
             4.59 × 10⁻⁵        6.93 × 10⁻⁴        7.93 × 10⁻²        6.24 × 10⁻⁴        3.18 × 10⁻²        2.48 × 10⁻²        6.03 × 10⁻⁵

NOTE: HRs, 95% CIs, and P values were obtained with multivariate Cox models adjusting for age, pathologic stage, and histologic types.

Fine mapping of chromosome 4q34.3

To map the region around rs1454694, SNPs in tight LD with rs1454694 (r² > 0.8) were collected from the 0.8-Mb region, using variants from WGS data obtained from blood samples of 27 Korean patients with cancer. After filtering, 94 additional SNPs in high LD with rs1454694 were selected for genotyping (Supplementary Table S2). Four SNPs in 4q34.3 (rs10019279, rs5012808, rs17241910, and rs28431436) were significantly associated with postoperative recurrence (HR = 1.91–2.08; Table 3).

We estimated haplotypes of the 5 SNPs in 4q34.3 and analyzed their association with RFS. Patients carrying haplotype CCAAG had a relatively poor prognosis (HR = 2.09; P = 4.29 × 10⁻⁶). As shown in Table 4, the shortest median RFS and OS were observed in patients carrying haplotype CCAAA.

Table 4.

Association of haplotypes of 5 SNPs in chromosome 4q34 with prognostic parameters

Haplotype (a)  No. of alleles  MRFT (b), mo  RFS HR (95% CI)    P             MST (b), mo  OS HR (95% CI)     P
TTCGA          840             NR            Ref.                             NR           Ref.
CCAAG          170             93.76         2.09 (1.53–2.87)   4.29 × 10⁻⁶   110.64       1.55 (1.04–2.30)   3.04 × 10⁻²
CCAAA          31              55.45         2.75 (1.58–4.78)   3.51 × 10⁻⁴   62.22        2.28 (1.15–4.54)   1.90 × 10⁻²
CCCGA          20              77.92         1.86 (0.86–3.99)   1.12 × 10⁻¹   NR           0.99 (0.31–3.12)   9.83 × 10⁻¹
Others (c)     37              NR            1.31 (0.66–2.58)   4.37 × 10⁻¹   NR           0.59 (0.19–1.86)   3.67 × 10⁻¹

NOTE: Results were obtained with multivariate Cox models adjusted for age, pathologic stage, and histologic type. NR means the median time-to-event has not been reached.

(a) Haplotypes are formed from the alleles of rs1454694, rs10019279, rs5012808, rs17241910, and rs28431436, in that order.

(b) MRFT (median recurrence-free time) and MST (median survival time) are given in months.

(c) Others include rare haplotypes: TTCGG, TTAAA, TTAAG, TCCGA, TCCGG, TCAAG, CTCGA, CTAGA, CTAGG, CTAAG, CCCAA, CCAGA, and CCAGG.

Our genome-wide scan and subsequent replication study were designed to identify genetic factors that could predict the prognosis of early-stage NSCLC, particularly recurrence after surgical resection. In patients with early-stage lung cancer, survival is strongly correlated with recurrence after surgery. Although clinical features such as pathologic stage and postoperative complications are strongly related to recurrence after resection, inherited genetic variants underlying individual differences in prognosis could serve as additional prognostic markers. We focused on patients with stage I NSCLC to predict the risk of recurrence after surgical treatment.

On the basis of our genome-wide scan and subsequent replication study of 558 patients with stage I NSCLC, we propose genetic variants on 4q34.3 as potential prognostic markers for RFS in early-stage lung cancer. The polymorphism rs1454694 showed the most significant association with recurrence, particularly with short RFS (Fig. 1). The effect of rs1454694 on prognosis was most pronounced in men, ever-smokers, and patients with stage IB disease. Four additional SNPs in high LD with rs1454694, as well as haplotypes of these 5 SNPs, were also significantly associated with RFS.

Figure 1.

The variant genotypes of the rs1454694 polymorphism are associated with poor RFS in 558 patients with stage I NSCLC. The probability of RFS (A) or OS (B) is shown for each genotype. The effect of rs1454694 is significant only for RFS, not for OS.


Our most plausible candidate, rs1454694, is located within a gene desert on chromosome 4q34.3. The closest gene, approximately 120 kb downstream of rs1454694, is the long intergenic non–protein-coding RNA 290 (LINC00290; Gene ID: 728081, HGNC: 38515). Although its function and biologic relevance have yet to be identified, the roles of long noncoding RNAs are increasingly being recognized (24). LINC00290 was recently reported as one of the targets of frequent somatic copy number alterations across various cancer types (25). Frequent focal deletions around LINC00290 have also been reported in TP53-mutated childhood adrenocortical tumors (26).

The chromosome 4q34–q35 region has been reported as a common region of copy number loss in colon cancer and of hypermethylation in lung cancer (27, 28). This region contains prognosis-related genes such as caspase-3 (CASP3) and VEGF-C. CASP3 expression has been reported as a significant factor for poor prognosis in stage I NSCLC (29). VEGF-C, located about 4,500 kb downstream of rs1454694, is known to stimulate lymphangiogenesis, and its high expression in tumor cells has been related to poor prognosis in NSCLC (30).

We annotated this region with histone marks to test whether it is enriched for regulatory elements. Using ENCODE data on 16 chromatin states from 9 cell lines, we found that the SNPs significantly associated with RFS overlapped weak enhancers in human embryonic stem cells (H1-hESC) and human mammary epithelial cells (HMEC). However, no lung-specific signal was observed. We also examined regulatory elements for transcription, particularly the binding of known transcription factors and histone modifications. Using Asian population data from HaploReg (www.broadinstitute.org/mammals/haploreg/haploreg.php), we identified 161 additional SNPs in LD (r² > 0.2) with the 5 significant SNPs in 4q34.3. Comparing these 161 SNPs with ENCODE data, we found weak enhancer signals in H1-hESC and HMEC that overlapped the SNPs. Further biologic experiments are needed to confirm the presence of regulatory elements spanning the risk loci.

Despite the significant effect of rs1454694 in 4q34.3, the biologic relevance of our findings is limited because there is no characterized gene in this genomic region. Further functional analysis of the region may clarify how the associated variants relate to regulatory elements. In addition, validation in other ethnic populations is needed.

In conclusion, this high-density, large-scale GWAS of Korean patients with stage I NSCLC suggests 5 SNPs, including rs1454694, as prognostic markers associated with postoperative recurrence. These results suggest that carrying the risk alleles of the 5 polymorphisms on 4q34.3 is a risk factor for recurrence after surgery and could be an important marker of poor prognosis in early-stage lung cancer.

No potential conflicts of interest were disclosed.

Conception and design: K.-A. Yoon, H.-S. Lee, J.S. Lee

Development of methodology: D. Lee, J.N. Joo, G.K. Lee, H.-S. Lee

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K.-A. Yoon, K. Bae, G.K. Lee, H.-S. Lee

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.-A. Yoon, M.K. Jung, D. Lee, J.N. Joo, H.-S. Lee, J.S. Lee

Writing, review, and/or revision of the manuscript: K.-A. Yoon, J.N. Joo, G.K. Lee, H.-S. Lee, J.S. Lee

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.-S. Lee

Study supervision: H.-S. Lee, J.S. Lee

The authors thank all of the individuals who participated in this study. Dr. Yeon Soo Lee kindly provided the information on the genetic variation in the genomic region 4q34 from WGS project.

This work was supported by National Cancer Center Research Grants 1010040 (to J.S. Lee) and 1210360 (to K.-A. Yoon).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Jee
SH
,
Kim
IS
,
Suh
I
,
Shin
D
,
Appel
LJ
. 
Projected mortality from lung cancer in South Korea, 1980–2004
.
Int J Epidemiol
1998
;
27
:
365
9
.
2.
Kamangar
F
,
Dores
GM
,
Anderson
WF
. 
Patterns of cancer incidence, mortality, and prevalence across five continents: defining priorities to reduce cancer disparities in different geographic regions of the world
.
J Clin Oncol
2006
;
24
:
2137
50
.
3.
Marugame
T
,
Hirabayashi
Y
. 
Comparison of time trends in lung cancer mortality (1990–2006) in the world, from the WHO Mortality Database
.
Jpn J Clin Oncol
2009
;
39
:
696
7
.
4.
Choi
JH
,
Chung
HC
,
Yoo
NC
,
Lee
HR
,
Lee
KH
,
Choi
W
, et al
Changing trends in histologic types of lung cancer during the last decade (1981–1990) in Korea: a hospital-based study
.
Lung Cancer (Amsterdam, Netherlands)
1994
;
10
:
287
96
.
5. Kim YC, Kwon YS, Oh IJ, Kim KS, Kim SY, Ryu JS, et al. National survey of lung cancer in Korea, 2005. J Lung Cancer 2007;6:67–73.
6. Goldstraw P, Crowley J, Chansky K, Giroux DJ, Groome PA, Rami-Porta R, et al. The IASLC Lung Cancer Staging Project: proposals for the revision of the TNM stage groupings in the forthcoming (seventh) edition of the TNM Classification of malignant tumours. J Thorac Oncol 2007;2:706–14.
7. Groome PA, Bolejack V, Crowley JJ, Kennedy C, Krasnik M, Sobin LH, et al. The IASLC Lung Cancer Staging Project: validation of the proposals for revision of the T, N, and M descriptors and consequent stage groupings in the forthcoming (seventh) edition of the TNM classification of malignant tumours. J Thorac Oncol 2007;2:694–705.
8. Hung JJ, Hsu WH, Hsieh CC, Huang BS, Huang MH, Liu JS, et al. Post-recurrence survival in completely resected stage I non-small cell lung cancer with local recurrence. Thorax 2009;64:192–6.
9. Sugimura H, Nichols FC, Yang P, Allen MS, Cassivi SD, Deschamps C, et al. Survival after recurrent nonsmall-cell lung cancer after complete pulmonary resection. Ann Thorac Surg 2007;83:409–17; discussion 417–8.
10. Hu L, Wu C, Zhao X, Heist R, Su L, Zhao Y, et al. Genome-wide association study of prognosis in advanced non-small cell lung cancer patients receiving platinum-based chemotherapy. Clin Cancer Res 2012;18:5507–14.
11. Azzato EM, Pharoah PD, Harrington P, Easton DF, Greenberg D, Caporaso NE, et al. A genome-wide association study of prognosis in breast cancer. Cancer Epidemiol Biomarkers Prev 2010;19:1140–3.
12. Muller-Tidow C, Diederichs S, Thomas M, Serve H. Genome-wide screening for prognosis-predicting genes in early-stage non-small-cell lung cancer. Lung Cancer 2004;45 Suppl 2:S145–50.
13. Innocenti F, Owzar K, Cox NL, Evans P, Kubo M, Zembutsu H, et al. A genome-wide association study of overall survival in pancreatic cancer patients treated with gemcitabine in CALGB 80303. Clin Cancer Res 2012;18:577–84.
14. Huang YT, Heist RS, Chirieac LR, Lin X, Skaug V, Zienolddiny S, et al. Genome-wide analysis of survival in early-stage non-small-cell lung cancer. J Clin Oncol 2009;27:2660–7.
15. Wu X, Ye Y, Rosell R, Amos CI, Stewart DJ, Hildebrandt MAT, et al. Genome-wide association study of survival in non–small cell lung cancer patients receiving platinum-based chemotherapy. J Natl Cancer Inst 2011;103:817–25.
16. Sato Y, Yamamoto N, Kunitoh H, Ohe Y, Minami H, Laird NM, et al. Genome-wide association study on overall survival of advanced non-small cell lung cancer patients treated with carboplatin and paclitaxel. J Thorac Oncol 2011;6:132–8.
17. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials 1996;17:343–6.
18. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559–75.
19. Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT. Basic statistical analysis in genetic case-control studies. Nat Protoc 2011;6:121–33.
20. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;21:263–5.
21. Stephens M, Donnelly P. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 2003;73:1162–9.
22. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003;100:9440–5.
23. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B (Methodological) 1995;57:289–300.
24. Du Z, Fei T, Verhaak RG, Su Z, Zhang Y, Brown M, et al. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat Struct Mol Biol 2013;20:908–13.
25. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet 2013;45:1134–40.
26. Letouze E, Rosati R, Komechen H, Doghman M, Marisa L, Fluck C, et al. SNP array profiling of childhood adrenocortical tumors reveals distinct pathways of tumorigenesis and highlights candidate driver genes. J Clin Endocrinol Metab 2012;97:E1284–93.
27. Douglas EJ, Fiegler H, Rowan A, Halford S, Bicknell DC, Bodmer W, et al. Array comparative genomic hybridization analysis of colorectal cancer cell lines and primary carcinomas. Cancer Res 2004;64:4817–25.
28. Kohno T, Kawanishi M, Inazawa J, Yokota J. Identification of CpG islands hypermethylated in human lung cancer by the arbitrarily primed-PCR method. Hum Genet 1998;102:258–64.
29. Takata T, Tanaka F, Yamada T, Yanagihara K, Otake Y, Kawano Y, et al. Clinical significance of caspase-3 expression in pathologic-stage I, nonsmall-cell lung cancer. Int J Cancer 2001;96 Suppl:54–60.
30. Ogawa E, Takenaka K, Yanagihara K, Kurozumi M, Manabe T, Wada H, et al. Clinical significance of VEGF-C status in tumour cells and stromal macrophages in non-small cell lung cancer patients. Br J Cancer 2004;91:498–503.