Abstract
Purpose: Postoperative recurrence in stage I non–small cell lung cancer (NSCLC) is the major cause of poor prognosis. This study aims to identify genetic variants that are associated with the prognosis of early-stage NSCLCs.
Experimental Design: A genome-wide association study (GWAS) was conducted in 250 patients with stage I NSCLCs, and the results were replicated in an additional 308 patients.
Results: Results from an Affymetrix Genome-Wide Human SNP Array in 250 patients identified 94 SNPs with significant associations (P < 2 × 10⁻⁴), which were selected for replication in 308 additional patients. Pooled analysis of the 558 patients determined that rs1454694 on chromosome 4q34 was the most significant marker of lung cancer prognosis in the stage I patients (adjusted HR = 2.81; P = 5.91 × 10⁻⁸). After the candidate loci were mapped, an additional four markers at chromosome 4q34.3 were significantly associated with recurrence-free survival (RFS; P < 5 × 10⁻⁵). A haplotype of five SNPs in 4q34 also showed a significant association with RFS (P = 4.29 × 10⁻⁶).
Conclusions: The genetic polymorphism rs1454694 was identified as a novel genetic risk factor for RFS of stage I NSCLCs. This genome-wide study suggests that genetic markers in 4q34.3 contribute to predicting the prognosis of Korean patients with stage I NSCLCs. Clin Cancer Res; 20(12); 3272–9. ©2014 AACR.
Inherited genetic variants that are associated with the risk of postoperative recurrence could be useful indicators for risk prediction and for improving prognosis in stage I non–small cell lung cancer (NSCLC). In this genome-wide association study (GWAS), we found that rs1454694 on chromosome 4q34 was significantly associated with recurrence-free survival (RFS) of patients with stage I NSCLCs. Subsequently, an additional four markers at the same locus also displayed a significant association with RFS. The novel locus identified in this study may provide new insights into individual susceptibility to postoperative recurrence in patients with early-stage lung cancer.
Introduction
Lung cancer is the leading cause of cancer mortality worldwide (1–3). In Korea, non–small cell lung cancer (NSCLC) constitutes the majority of all diagnosed lung cancers; 70% of patients with NSCLCs are diagnosed at stages III and IV and only 17% at stage I (4, 5). The overall survival (OS) of patients with advanced NSCLCs remains poor, whereas the survival time in early-stage patients has improved.
Among patients with pathologic stage IA and IB NSCLCs who are candidates for surgical resection, the 5-year OS rates are approximately 73% and 58%, respectively (6, 7). While postoperative adjuvant chemotherapy is the standard of care for patients with stage II to IIIA NSCLCs who have undergone complete resection, surgery alone remains the recommended treatment for stage I NSCLCs. The recurrence rate in stage I patients who underwent surgical resection ranges from 13% to 41% (8, 9). Thus, postoperative recurrence is a major obstacle to prolonged survival in early-stage NSCLCs, and considerable differences in outcome exist among patients with the same pathologic stage. Genetic markers that could identify those at high risk for recurrence are essential for the development of an effective clinical practice strategy, as patients with a poor prognosis could undergo more aggressive treatment regimens.
Genome-wide association studies (GWAS) identify phenotype-correlated markers using tagging polymorphisms across the entire genome. GWAS have been used to identify not only susceptibility markers but also predictors of clinical outcome in cancers of the breast, pancreas, and lung (10–13). Several genome-wide studies have reported genetic polymorphisms associated with lung cancer prognosis. Huang and colleagues reported that 5 SNPs in the STK39, PCDH7, A2BP1, and EYA2 genes were associated with OS in early-stage NSCLCs in a GWAS using tumor tissues (14). An SNP located in the chemokine-like receptor 1 gene (12q23.3) was associated with OS of patients with advanced-stage NSCLCs receiving platinum-based chemotherapy (15). Another GWAS identified 3 prognostic markers associated with OS for advanced NSCLCs treated with carboplatin and paclitaxel (16). However, genetic polymorphisms associated with the risk of postoperative recurrence in patients with completely resected stage I NSCLCs have not been identified.
In this study, we conducted genome-wide screening and a subsequent replication study involving 558 patients with stage I NSCLCs who underwent complete resection, to identify prognostic markers for early-stage NSCLCs in the Korean population.
Materials and Methods
Study population
We recruited patients with histologically confirmed stage I NSCLC at the National Cancer Center Korea (Goyang, Korea) from May 2001 to December 2010. The patients voluntarily participated in this study after providing written informed consent under a procedure approved by the Institutional Review Board. No recruitment restrictions were imposed with regard to gender or smoking status. None of the patients had received chemotherapy or radiotherapy before recruitment. As a discovery set, 250 patients with stage I NSCLCs were genotyped on the Affymetrix Genome-Wide Human SNP Array 5.0. Selected SNPs from the GWAS results were validated in a replication study with an additional 308 patients with stage I NSCLCs. Demographic characteristics, including gender, age, and smoking habits, were collected, and follow-up information was available for all patients. Patients who currently smoked or who had smoked more than 100 cigarettes during their lifetime were defined as ever-smokers.
Clinical outcomes
All patients who underwent surgical resection were followed up at 3-month intervals after surgery. We retrospectively reviewed the medical and pathology records of each patient. Chest computed tomography (CT), positron emission tomography (PET), bronchoscopy, and pulmonary function tests were performed preoperatively. Chest CT scans were obtained postoperatively at 3-month intervals, and PET-CT scans were obtained annually to detect tumor recurrence. The pathologic stage was confirmed by pathologists using the resected specimens and dissected lymph nodes following pulmonary resection and complete lymph node dissection. Patients who underwent a limited resection, such as a wedge resection or segmentectomy, and patients who received neoadjuvant chemotherapy were excluded. Recurrence-free survival (RFS) was calculated from the date of surgery to the date of recurrence or the date of the last follow-up. OS was measured from the date of surgery until the date of death (from all causes) or the last follow-up for those not known to be deceased at the time of analysis.
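The two endpoint definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `time_to_event`, the day-to-month conversion, and the example dates are assumptions and do not come from the study.

```python
from datetime import date
from typing import Optional, Tuple

# Assumed conversion factor; the text does not state how days were converted to months.
DAYS_PER_MONTH = 365.25 / 12


def time_to_event(surgery: date, event: Optional[date], last_follow_up: date) -> Tuple[float, int]:
    """Return (duration in months, event indicator) for one patient.

    The clock starts at the date of surgery. If the event was observed
    (recurrence for RFS, death from any cause for OS), it ends the interval
    and the indicator is 1; otherwise the patient is censored at the last
    follow-up and the indicator is 0.
    """
    end = event if event is not None else last_follow_up
    if end < surgery:
        raise ValueError("end date precedes surgery date")
    months = (end - surgery).days / DAYS_PER_MONTH
    return months, int(event is not None)


# Hypothetical patient with a recurrence two years after resection.
rfs_months, rfs_event = time_to_event(date(2005, 3, 1), date(2007, 3, 1), date(2010, 12, 31))

# Hypothetical patient without recurrence, censored at the last follow-up.
cens_months, cens_event = time_to_event(date(2005, 3, 1), None, date(2008, 3, 1))
```

The pair (duration, indicator) is the usual input format for Kaplan–Meier and Cox analyses; the same function serves RFS and OS, depending on which event date is passed.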
Genotyping
Genomic DNA was extracted from peripheral blood using a QIAamp DNA Blood Mini Kit (Qiagen) in accordance with the manufacturer's instructions. Genome-wide, high-density SNP screening was performed on 250 subjects using the Affymetrix Genome-Wide Human SNP Array 5.0 (Affymetrix). Genotyping in the replication phase was carried out using the SNPtype assay (Fluidigm) or a TaqMan genotyping assay (Applied Biosystems) according to the manufacturers' recommendations. The SNPtype assay was performed with the 96.96 Dynamic Array and analyzed with Fluidigm SNP Genotyping Analysis software (version 3.1.1; Fluidigm). To confirm the reproducibility of our results, 50 samples were genotyped in duplicate with each method.
Mapping
To assess the genomic region associated with RFS in depth, the 4q34.3 locus was selected for fine-mapping using whole-genome sequencing (WGS) data and HapMap data. Because rs1454694 is located in a long gene desert, it was fine-mapped using WGS data from 27 blood samples collected at the National Cancer Center of Korea. We selected 400 kbp upstream and downstream of rs1454694, spanning a 0.8-Mb region, and scanned all SNPs in the WGS data of the 27 patients. We filtered out novel rare variants that did not yet have a reference ID (RSID) in the National Center for Biotechnology Information (NCBI, Bethesda, MD) dbSNP database version 130 and applied strict quality control criteria [minor allele frequency (MAF) ≥ 10% and Hardy–Weinberg equilibrium (HWE): P ≥ 0.0001]. Using the Tagger function of Haploview 4.2 (r² ≥ 0.8, aggressive 2-marker haplotype tagging), we mapped 94 novel tag SNPs across the 0.8-Mb region surrounding rs1454694, excluding SNPs that were in absolute linkage disequilibrium (LD; r² = 1.0) with markers on the Affymetrix SNP array. Selected SNPs from the WGS data were then genotyped using the SNPtype assay in 558 samples.
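The r² thresholds used for tagging refer to pairwise LD between two SNPs. The sketch below computes |D′| and r² from phased haplotypes. It is not Haploview's algorithm, which estimates haplotype frequencies from unphased genotypes; the function name `pairwise_ld` and the toy haplotype counts are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def pairwise_ld(haplotypes: Iterable[Tuple[str, str]], a: str, b: str) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs from phased haplotypes.

    `haplotypes` holds one (allele at SNP 1, allele at SNP 2) pair per
    chromosome; `a` and `b` are the alleles whose frequencies are tracked.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    p_a = sum(c for (x, _), c in counts.items() if x == a) / n
    p_b = sum(c for (_, y), c in counts.items() if y == b) / n
    p_ab = counts[(a, b)] / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")

    # Lewontin's normalization: divide |D| by its maximum given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, d * d / denom


# Toy data (hypothetical): 100 chromosomes carrying T-T, C-C, or C-T haplotypes.
toy = [("T", "T")] * 80 + [("C", "C")] * 18 + [("C", "T")] * 2
d_prime, r2 = pairwise_ld(toy, "C", "C")
is_tagged = r2 >= 0.8  # the tagging threshold quoted in the text
```

In this toy example |D′| is 1 while r² is below 1, which illustrates why the two measures are reported side by side: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r² reaches 1 only when the two SNPs carry identical information.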
Statistical analysis
To test the association between demographic characteristics and prognosis, the log-rank trend test was performed. The median follow-up time was estimated with the reverse Kaplan–Meier method (17). RFS rates were assessed using the Kaplan–Meier product-limit method, and the trend among the polymorphisms was estimated with the log-rank test. HRs and the associated 95% confidence intervals (CI) were derived from a Cox proportional hazards regression model. To combine the results of the discovery and replication phases, a stratified Cox model assuming different baseline hazards for the 2 phases was used.
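As a rough illustration of the product-limit estimate and of the reverse Kaplan–Meier idea for median follow-up, consider the following sketch. It is a plain-Python illustration, not the Stata procedure used in the study; the function names and the six example observations are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as a list of (event time, survival probability)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations that share the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival drops to 0.5 or below; None if not reached (NR)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def median_follow_up(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Reverse Kaplan-Meier: censoring is treated as the event and vice versa."""
    return median_time(kaplan_meier(times, [1 - e for e in events]))


# Hypothetical follow-up times (months) and recurrence indicators (1 = recurrence).
times = [6, 12, 12, 20, 30, 42]
events = [1, 0, 1, 0, 1, 0]
rfs_curve = kaplan_meier(times, events)
median_rfs = median_time(rfs_curve)
median_fu = median_follow_up(times, events)
```

Flipping the event indicator is what makes the estimate "reverse": patients who recur early no longer pull the follow-up estimate down, because their observation is treated as censored rather than as the end of follow-up.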
From the SNP array results, SNPs with duplicate IDs in the Affymetrix annotation database or a call rate <95% were excluded from the statistical analysis. SNPs were required to have an MAF exceeding 5% in our samples to be retained for subsequent analyses. The genotyping results were tested for HWE, and any polymorphism that was not in HWE (P < 0.0001) was also excluded from further analysis. PLINK software was used for genome-wide quality control and analysis (18). The quantile–quantile (Q–Q) plot and the Manhattan plot were used to evaluate the overall significance of the GWAS data (19).
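The three per-SNP filters can be expressed as a small predicate. The sketch below is an assumption-laden illustration rather than PLINK's implementation: it uses a 1-df chi-square approximation for the HWE test (PLINK offers an exact test), and the function names are hypothetical. The example counts are the discovery-phase genotypes of rs1454694 from Table 2.

```python
import math
from typing import Tuple


def maf_and_hwe_p(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Minor allele frequency and chi-square HWE P value from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    maf = min(p, q)
    if maf == 0:
        return 0.0, 1.0  # monomorphic marker: HWE is not testable
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return maf, math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_samples: int,
              min_call_rate: float = 0.95, min_maf: float = 0.05,
              hwe_alpha: float = 1e-4) -> bool:
    """Apply the call-rate, MAF, and HWE thresholds described in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0 or called / n_samples < min_call_rate:
        return False
    maf, hwe_p = maf_and_hwe_p(n_aa, n_ab, n_bb)
    return maf > min_maf and hwe_p >= hwe_alpha


# Discovery-phase genotype counts of rs1454694 (TT, TC, CC) from Table 2.
maf, hwe_p = maf_and_hwe_p(154, 92, 4)
keep = passes_qc(154, 92, 4, n_samples=250)
```

With these counts the marker has an MAF of 0.20 and an HWE P value well above the 0.0001 cut-off, so it is retained.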
LD between SNPs was measured with Lewontin's D′ (|D′|) and r² using the Haploview software package (20). Furthermore, haplotypes of 5 SNPs in Ch4q34.3 were estimated from the genotyping data using the PHASE software (21), and the haplotype with the highest probability was used for the analysis. The relationship between polymorphisms and recurrence was analyzed using multiple logistic regression models while controlling for age, gender, smoking status, pathologic stage, and postoperative complication as covariates. All reported P values are 2-sided. The false discovery rate (FDR) for multiple testing was estimated using the Benjamini–Hochberg method and the q value method of Storey (22, 23). Stata/SE version 10 (StataCorp LP) was used to perform the statistical analyses.
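For readers unfamiliar with the Benjamini–Hochberg step-up procedure mentioned above, a minimal sketch follows. The function name and the five example P values are hypothetical; Storey's q value method, which additionally estimates the proportion of true null hypotheses, is not shown.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted P values in the original order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for five markers.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(raw)
```

Each adjusted value is the smallest FDR level at which the corresponding marker would still be declared significant.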
Results
Patient characteristics
The demographic features of patients with stage I NSCLCs are shown in Table 1. Patients involved in the GWAS had histologically confirmed NSCLCs and were retrospectively recruited from the National Cancer Center of Korea. All patients had pathologic stage I disease (American Joint Committee on Cancer, AJCC/UICC 7th edition), and survival and tumor recurrence were assessed in all patients who underwent surgical resection of the lung and had available clinical information. We observed 117 patients with recurrences during a median follow-up period of 42.8 months. Patients with adenocarcinoma and ever-smokers predominated in our cohort. Demographic features such as age, pathologic stage, and histologic type affected the recurrence rate. Specifically, older age was significantly associated with a poor clinical outcome (HR, 1.65; P = 0.008), and patients in stage IB showed poorer RFS (HR, 2.09; P = 0.0003) than patients in stage IA, whereas adjuvant chemotherapy was not associated with RFS. Patients with adenocarcinoma had a lower risk of recurrence (HR, 0.67; P = 0.005) than other patients. We selected age, pathologic stage, and histologic type as covariates to be adjusted for in the multivariate Cox proportional hazards regression analysis.
Characteristics | Discovery phase (n = 250), n (%) | Replication phase (n = 308), n (%) | Pooled (n = 558), n (%) | Log-rank test Pᵃ
---|---|---|---|---
Gender | | | | 0.389
Male | 148 (59.2) | 216 (70.1) | 364 (65.2) |
Female | 102 (40.8) | 92 (29.9) | 194 (34.8) |
Ageᵇ, y | | | | 0.008
<64 | 127 (50.8) | 157 (51.0) | 269 (48.2) |
≥64 | 123 (49.2) | 151 (49.0) | 289 (51.8) |
Smoking status | | | | 0.491
Never | 105 (42.0) | 98 (31.8) | 203 (36.4) |
Ever | 145 (58.0) | 210 (68.2) | 355 (63.6) |
Histology | | | | 0.005
Adenocarcinoma | 186 (74.4) | 164 (53.2) | 350 (62.7) |
Squamous cell carcinoma | 64 (25.6) | 121 (39.3) | 185 (33.2) |
Othersᶜ | 0 | 23 (7.5) | 23 (4.1) |
Pathologic stage | | | | 0.0003
IA | 135 (54.0) | 95 (30.8) | 230 (41.2) |
IB | 115 (46.0) | 213 (69.2) | 328 (58.8) |
Complication | | | | 0.119
No | 200 (80.0) | 222 (72.1) | 422 (75.6) |
Yes | 50 (20.0) | 86 (27.9) | 136 (24.4) |
Adjuvant chemotherapy | | | | 0.699
No | 217 (86.8) | 263 (85.4) | 480 (86.0) |
Yes | 33 (13.2) | 45 (14.6) | 78 (14.0) |
Recurrence | 29 (11.6) | 88 (28.6) | 117 (21.0) |
Death | 19 (7.6) | 61 (19.8) | 80 (14.3) |
Median follow-up timeᵈ, mo (95% CI) | 38.70 (36.73–44.38) | 53.09 (37.78–59.46) | 42.84 (37.88–47.93) |
ᵃ P value was calculated by the log-rank trend test for recurrence in 558 patients.
ᵇ Age group was dichotomized at the median age of 64 years.
ᶜ Others included patients with nonspecified NSCLCs.
ᵈ Median follow-up time and 95% CIs were estimated with the reverse Kaplan–Meier method.
Association analysis
In the discovery phase of the GWAS, 250 patients with stage I NSCLCs were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0. Among 440,094 genotyped SNPs, we excluded 127,001 SNPs with an MAF of less than 5%. In addition, 4,771 SNPs were excluded on the basis of call rate (<95%) and 224 markers on the basis of the HWE test (P ≤ 0.0001). After applying these strict quality control criteria, 308,098 genotyped SNPs remained for subsequent analyses. The measured genomic inflation factor (λ = 1.01) with the Q–Q plot and the association results for the SNPs across the genome using a trend P value plot are shown in Supplementary Fig. S1. From the discovery phase, 94 SNPs significantly associated with recurrence or death based on the minimum P value between codominant and dominant models (minimum P < 2 × 10⁻⁴) were selected and subjected to replication in 308 patients. The association results for the selected SNPs with RFS in all 558 patients are shown using a trend P value as well as a multiple testing–adjusted P value (Supplementary Table S1).
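The SNP counts and the inflation factor in this paragraph can be checked with a short sketch. The count arithmetic uses the numbers reported above. The `genomic_inflation` function is a generic, hypothetical implementation of the usual median-based definition of λ; it is applied here to synthetic uniform P values, not to the study data, which are not available in the text.

```python
import statistics

# Marker counts reported in the text.
genotyped = 440_094
removed_maf = 127_001
removed_call_rate = 4_771
removed_hwe = 224
remaining = genotyped - removed_maf - removed_call_rate - removed_hwe

_NORMAL = statistics.NormalDist()
# Median of a 1-df chi-square distribution (about 0.455).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """lambda = median observed 1-df chi-square / expected median under the null.

    Each two-sided P value (0 < p <= 1) is converted back to a chi-square
    statistic through the normal quantile function.
    """
    chi2 = [_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, perfectly uniform P values: lambda should be 1 by construction.
n = 1001
uniform_p = [(i - 0.5) / n for i in range(1, n + 1)]
lam = genomic_inflation(uniform_p)
```

A value of λ close to 1, such as the reported 1.01, indicates that the bulk of the test statistics follows the null distribution, so systematic inflation from population stratification or genotyping artifacts is unlikely.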
A single polymorphism located in the chromosome 4q34 region, rs1454694, showed the most significant association with RFS in the replication phase (adjusted P = 8.32 × 10⁻⁴) as well as in the discovery phase (P = 2.86 × 10⁻⁶). A pooled analysis of the 558 patients in stage I revealed enhanced significance of rs1454694, with a P value of 5.91 × 10⁻⁸ calculated by a meta-analysis using a fixed-effects model (Table 2). The effect of rs1454694 remained significant after Bonferroni correction for tests of 308,098 SNPs (P = 0.0182). Subgroup analyses revealed that the association between rs1454694 and RFS was pronounced in male patients, ever-smokers, and the stage IB group (Table 3). To assess potential heterogeneity among subgroups, we tested heterogeneity for the effect of rs1454694 using the Cochran Q statistic. There was no heterogeneity of rs1454694 effects on RFS according to gender (P = 0.46), smoking status (P = 0.54), histology (P = 0.64), or pathologic stage (P = 0.86). However, the genotypes of rs1454694 were not associated with OS.
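Two of the calculations in this paragraph are easy to sketch. The Bonferroni step multiplies the pooled P value by the number of SNPs tested. The heterogeneity step is shown for two subgroups only, with log HRs and standard errors recovered from the rounded 95% CIs in Table 3, so it can only approximate the reported values. The function names are hypothetical, and the code is an illustration rather than the procedure run in Stata.

```python
import math

Z_975 = 1.959964  # normal quantile behind a two-sided 95% CI


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


def log_hr_and_se(hr: float, lower: float, upper: float):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_975)


def cochran_q_two_groups(est1, est2):
    """Cochran Q for two subgroup estimates; returns (Q, P) with 1 df."""
    (b1, se1), (b2, se2) = est1, est2
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2  # inverse-variance weights
    pooled = (w1 * b1 + w2 * b2) / (w1 + w2)
    q = w1 * (b1 - pooled) ** 2 + w2 * (b2 - pooled) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return q, math.erfc(math.sqrt(q / 2))


# Pooled P value and number of SNPs tested, as reported in the text.
p_bonf = bonferroni(5.91e-8, 308_098)

# rs1454694 HRs for RFS in male and female patients (Table 3).
male = log_hr_and_se(2.25, 1.61, 3.16)
female = log_hr_and_se(1.79, 1.06, 3.01)
q_gender, p_gender = cochran_q_two_groups(male, female)
```

The Bonferroni product matches the reported adjusted P value, and the gender comparison gives a heterogeneity P value close to the reported 0.46; small differences are expected because the inputs are rounded.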
rs1454694 | Overall patients (n) | Recurrence positive (n) | Recurrence: MRFTᵃ, mo | Recurrence: HR (95% CI) | Recurrence: P | OS: MSTᵃ, mo | OS: HR (95% CI) | OS: P
---|---|---|---|---|---|---|---|---
Discovery phase | | | | | | | |
TT | 154 | 8 | NR | 1 | Ref. | NR | 1 | Ref.
TC | 92 | 18 | 96.22 | 5.08 (2.16–11.97) | 2.03 × 10⁻⁴ | NR | 1.71 (0.67–4.38) | 2.61 × 10⁻¹
CC | 4 | 3 | 34.76 | 28.72 (6.20–133.00) | 1.76 × 10⁻⁵ | 81.93 | 31.27 (1.8–544.59) | 1.82 × 10⁻²
TC + CC | 96 | 21 | 96.22 | 5.48 (2.36–12.68) | 7.20 × 10⁻⁵ | NR | 1.88 (0.76–4.68) | 1.75 × 10⁻¹
P for trend | | | | 5.25 (2.62–10.50) | 2.86 × 10⁻⁶ | | 2.19 (0.91–5.29) | 8.18 × 10⁻²
Replication phase | | | | | | | |
TT | 185 | 38 | NR | 1 | Ref. | NR | 1 | Ref.
TC | 102 | 44 | 72.01 | 2.34 (1.51–3.61) | 1.39 × 10⁻⁴ | 110.64 | 2.13 (1.27–3.59) | 4.27 × 10⁻³
CC | 18 | 6 | NR | 1.86 (0.78–4.42) | 1.63 × 10⁻¹ | NR | 1.32 (0.40–4.40) | 6.47 × 10⁻¹
TC + CC | 120 | 50 | 76.81 | 2.27 (1.48–3.46) | 1.56 × 10⁻⁴ | 110.64 | 2.02 (1.22–3.36) | 6.61 × 10⁻³
P for trend | | | | 1.70 (1.24–2.31) | 8.32 × 10⁻⁴ | | 1.55 (1.05–2.27) | 2.56 × 10⁻²
Pooled | | | | | | | |
TT | 339 | 46 | NR | 1 | Ref. | NR | 1 | Ref.
TC | 194 | 62 | 93.76 | 2.77 (1.89–4.06) | 1.99 × 10⁻⁷ | NR | 2.05 (1.30–3.22) | 1.92 × 10⁻³
CC | 22 | 9 | 77.92 | 3.17 (1.53–6.57) | 1.95 × 10⁻³ | NR | 1.81 (0.63–5.16) | 2.68 × 10⁻¹
TC + CC | 216 | 71 | 93.76 | 2.81 (1.93–4.08) | 5.91 × 10⁻⁸ | 110.64 | 2.02 (1.30–3.15) | 1.80 × 10⁻³
P for trend | | | | 2.12 (1.61–2.79) | 9.15 × 10⁻⁸ | | 1.67 (1.18–2.37) | 3.89 × 10⁻³
NOTE: NR means the median time-to-event has not been reached. HRs, 95% CIs, and P values were calculated with multivariate Cox models adjusting for age, pathologic stage, and histologic types as covariates.
ᵃ MRFT (median recurrence-free time) and MST (median survival time) are given in months.
SNP | Position | MAF | Allele | LD with rs1454694: D′ | LD with rs1454694: r² | Overall patients: HR (95% CI); P value | Male: HR (95% CI); P value | Female: HR (95% CI); P value | Ever-smokers: HR (95% CI); P value | Never smokers: HR (95% CI); P value | Stage IA: HR (95% CI); P value | Stage IB: HR (95% CI); P value
---|---|---|---|---|---|---|---|---|---|---|---|---
rs1454694 | 182434941 | 0.21 | T/C | 1 | 1 | 2.12 (1.61–2.79); 9.15 × 10⁻⁸ | 2.25 (1.61–3.16); 2.26 × 10⁻⁶ | 1.79 (1.06–3.01); 2.98 × 10⁻² | 2.21 (1.57–3.11); 5.21 × 10⁻⁶ | 1.82 (1.10–3.03); 1.97 × 10⁻² | 2.19 (1.27–3.75); 4.53 × 10⁻³ | 2.31 (1.66–3.21); 6.03 × 10⁻⁷
rs10019279 | 182435473 | 0.21 | T/C | 1 | 0.987 | 2.03 (1.53–2.68); 7.96 × 10⁻⁷ | 2.14 (1.52–3.01); 1.45 × 10⁻⁵ | 1.74 (1.03–2.96); 3.88 × 10⁻² | 2.09 (1.48–2.96); 3.18 × 10⁻⁵ | 1.78 (1.07–2.96); 2.69 × 10⁻² | 2.04 (1.19–3.49); 9.61 × 10⁻³ | 2.23 (1.59–3.13); 2.92 × 10⁻⁶
rs5012808 | 182440245 | 0.2 | C/A | 0.959 | 0.840 | 1.98 (1.49–2.63); 2.67 × 10⁻⁶ | 2.09 (1.48–2.97); 3.43 × 10⁻⁵ | 1.64 (0.95–2.83); 7.73 × 10⁻² | 2.03 (1.42–2.94); 1.06 × 10⁻⁴ | 1.88 (1.12–3.18); 1.77 × 10⁻² | 2.01 (1.12–3.61); 1.87 × 10⁻² | 2.20 (1.58–3.06); 3.18 × 10⁻⁶
rs17241910 | 182444812 | 0.19 | G/A | 0.957 | 0.815 | 2.08 (1.54–2.79); 1.32 × 10⁻⁶ | 2.06 (1.44–2.93); 6.49 × 10⁻⁵ | 1.91 (1.04–3.52); 3.73 × 10⁻² | 2.04 (1.42–2.94); 1.21 × 10⁻⁴ | 2.29 (1.29–4.08); 4.88 × 10⁻³ | 2.06 (1.14–3.72); 1.60 × 10⁻² | 2.38 (1.68–3.39); 1.39 × 10⁻⁶
rs28431436 | 182454221 | 0.17 | A/G | 0.947 | 0.632 | 1.91 (1.40–2.60); 4.59 × 10⁻⁵ | 1.91 (1.32–2.78); 6.93 × 10⁻⁴ | 1.71 (0.94–3.09); 7.93 × 10⁻² | 1.95 (1.33–2.87); 6.24 × 10⁻⁴ | 1.86 (1.06–3.28); 3.18 × 10⁻² | 2.01 (1.09–3.69); 2.48 × 10⁻² | 2.15 (1.48–3.13); 6.03 × 10⁻⁵
NOTE: HRs, 95% CIs, and P values were obtained with multivariate Cox models adjusting for age, pathologic stage, and histologic types.
Fine mapping of chromosome 4q34.3
To map the region around rs1454694, SNPs in tight LD with rs1454694 (r² > 0.8) were collected from the 0.8-Mb region. We used variants from the WGS data obtained from blood samples of 27 Korean patients with cancer. After a filtering step, 94 additional SNPs showing high LD with rs1454694 were selected for subsequent genotyping (Supplementary Table S2). Four SNPs located in 4q34.3 (rs10019279, rs5012808, rs17241910, and rs28431436) showed a significant association with postoperative recurrence (HR = 1.91–2.08; Table 3).
We estimated haplotypes of the 5 SNPs on 4q34.3 and analyzed the association between haplotypes and RFS. Patients harboring haplotype CCAAG had a relatively poor prognosis (HR = 2.09, P = 4.29 × 10⁻⁶). As shown in Table 4, the shortest survival duration was observed in patients with the haplotype CCAAA.
Haplotypeᵃ | No. of alleles | RFS: MRFTᵇ | RFS: HR (95% CI) | RFS: P | OS: MSTᵇ | OS: HR (95% CI) | OS: P
---|---|---|---|---|---|---|---
TTCGA | 840 | NR | 1 | Ref. | NR | 1 | Ref.
CCAAG | 170 | 93.76 | 2.09 (1.53–2.87) | 4.29 × 10⁻⁶ | 110.64 | 1.55 (1.04–2.30) | 3.04 × 10⁻²
CCAAA | 31 | 55.45 | 2.75 (1.58–4.78) | 3.51 × 10⁻⁴ | 62.22 | 2.28 (1.15–4.54) | 1.90 × 10⁻²
CCCGA | 20 | 77.92 | 1.86 (0.86–3.99) | 1.12 × 10⁻¹ | NR | 0.99 (0.31–3.12) | 9.83 × 10⁻¹
Othersᶜ | 37 | NR | 1.31 (0.66–2.58) | 4.37 × 10⁻¹ | NR | 0.59 (0.19–1.86) | 3.67 × 10⁻¹
NOTE: Results were obtained with multivariate Cox models adjusted for age, pathologic stage, and histologic types as covariates. NR means the median time-to-event has not been reached.
ᵃ Haplotypes are from rs1454694, rs10019279, rs5012808, rs17241910, and rs28431436.
ᵇ MRFT (median recurrence-free time) and MST (median survival time) are given in months.
ᶜ Others include rare haplotypes, such as TTCGG, TTAAA, TTAAG, TCCGA, TCCGG, TCAAG, CTCGA, CTAGA, CTAGG, CTAAG, CCCAA, CCAGA, and CCAGG.
Discussion
Our genome-wide scan and subsequent replication study were designed to identify genetic factors that could predict the prognosis of early-stage NSCLCs, particularly recurrence after surgical resection. The survival period in patients with early-stage lung cancer is strongly correlated with the recurrence rate after surgery. Although clinical features, including pathologic stage and postoperative complications, were strongly related to the recurrence rate after resection, inherited genetic variations responsible for individual differences in prognosis could be potential prognostic markers. We focused on patients with stage I NSCLCs to predict the risk of recurrence after surgical treatment.
On the basis of our genome-wide scan and subsequent replication study of 558 patients with stage I NSCLCs, we propose genetic variations on 4q34.3 as potential prognostic markers associated with the RFS of patients with early-stage lung cancer. The polymorphism rs1454694 exhibited the most significant association with recurrence, particularly with short RFS in patients with NSCLCs (Fig. 1). The effect of rs1454694 on prognosis was most pronounced among male patients, ever-smokers, and the stage IB group. An additional 4 SNPs in high LD with rs1454694, as well as haplotypes of those 5 SNPs, also showed a significant association with RFS.
Our most plausible candidate, rs1454694, is located within a gene desert region of chromosome 4q34.3. The closest gene, non–protein-coding RNA 290 (LINC00290, Gene ID: 728081, HGNC: 38515), lies approximately 120 kb downstream of rs1454694. Although its function and biologic relevance have yet to be identified, the role of long noncoding RNAs is increasingly being understood (24). Recently, LINC00290 was reported as one of the target genes of frequent somatic copy number alterations across various cancer types (25). Frequent focal deletions around the LINC00290 gene were also reported in TP53-mutated childhood adrenocortical tumors (26).
The region of chromosome 4q34-q35 was reported as a common region of copy number loss in colon cancer and a hypermethylated region in lung cancer (27, 28). This region includes the coding regions of prognosis-related genes, such as caspase-3 (CASP3) and VEGF-C. CASP3 expression was reported to be a significant factor for poor prognosis in stage I NSCLCs (29). VEGF-C, located about 4,500 kb downstream of rs1454694, is known to stimulate lymphangiogenesis, and its high expression in tumor cells was related to poor prognosis in NSCLCs (30).
We attempted to annotate this region using histone marks to test whether it could be enriched with regulatory elements. ENCODE data on 16 chromatin states from 9 cell lines were used; the SNPs significantly associated with RFS were found to overlap with weak enhancers in human embryonic stem cells (H1-hESC) and human mammary epithelial cells (HMEC). However, no tissue-specific results in the lung were noted. In addition, we focused on regulatory elements for transcription, particularly binding of known transcription factors and histone modifications. We identified 161 additional SNPs in LD (r² > 0.2) with the 5 significant SNPs in 4q34.3 using Asian population data from HaploReg (www.broadinstitute.org/mammals/haploreg/haploreg.php). When we compared ENCODE data with the 161 SNPs, we found weak enhancer signals from H1-hESC and HMEC that overlapped with the SNPs. Further biologic experiments are needed to confirm the presence of regulatory elements spanning the risk loci.
Despite the significant effect of rs1454694 in 4q34.3, our study is limited with respect to biologic relevance because there is no characterized gene in the genomic region at 4q34.3. Further functional analysis of this region may provide clues to a functional association with regulatory elements. Furthermore, multiple validation approaches in other ethnic populations should be considered.
In conclusion, this study constitutes a high-density, large-scale GWAS carried out in Korean patients with stage I NSCLCs and suggests 5 SNPs, including rs1454694, as prognostic markers associated with postoperative recurrence. These results suggest that the presence of the risk alleles of the 5 polymorphisms on 4q34.3 is a risk factor for recurrence after surgery and could be an important marker of poor prognosis in early-stage lung cancer.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: K.-A. Yoon, H.-S. Lee, J.S. Lee
Development of methodology: D. Lee, J.N. Joo, G.K. Lee, H.-S. Lee
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K.-A. Yoon, K. Bae, G.K. Lee, H.-S. Lee
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.-A. Yoon, M.K. Jung, D. Lee, J.N. Joo, H.-S. Lee, J.S. Lee
Writing, review, and/or revision of the manuscript: K.-A. Yoon, J.N. Joo, G.K. Lee, H.-S. Lee, J.S. Lee
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.-S. Lee
Study supervision: H.-S. Lee, J.S. Lee
Acknowledgments
The authors thank all of the individuals who participated in this study. Dr. Yeon Soo Lee kindly provided the information on the genetic variation in the genomic region 4q34 from the WGS project.
Grant Support
This work was supported by National Cancer Center Research Grants 1010040 (to J.S. Lee) and 1210360 (to K.-A. Yoon).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.