Abstract
Background: African Americans have the highest lung cancer mortality in the United States. Genome-wide association studies (GWASs) of germline variants influencing lung cancer survival have not yet been conducted with African Americans. We examined five previously reported GWAS catalog variants and explored additional genome-wide associations among African American lung cancer cases.
Methods: Incident non–small cell lung cancer cases (N = 286) in the Southern Community Cohort Study were genotyped on the Illumina HumanExome BeadChip. We used Cox proportional hazards models to estimate HRs and 95% confidence intervals (CIs) for overall mortality. Two independent African American studies (N = 316 and 298) were used for replication.
Results: One previously reported variant, rs1878022 on 12q23.3, was significantly associated with mortality (HR = 0.70; 95% CI, 0.54–0.92). Replication findings were in the same direction, although attenuated (HR = 0.87 and 0.94). The meta-analysis yielded an HR of 0.83 (95% CI, 0.71–0.97). Analysis of common variants identified an association between chromosome 6p21.33 and mortality (HR = 0.46; 95% CI, 0.33–0.66).
Conclusions: We identified an association between rs1878022 in CMKLR1 and lung cancer survival. However, our results in African Americans have a different direction of effect compared with a prior study in European Americans, suggesting a different genetic architecture or presence of gene–environment interactions. We also identified variants on chromosome 6 within the gene-rich HLA region, which has been previously implicated in lung cancer risk and survival.
Impact: We found evidence that inherited genetic risk factors influence lung cancer survival in African Americans. Replication in additional populations is necessary to confirm potential genetic differences in lung cancer survival across populations. Cancer Epidemiol Biomarkers Prev; 26(8); 1288–95. ©2017 AACR.
Introduction
Lung cancer is the leading cause of cancer-related death in the United States (1). Notably, African Americans have poorer lung cancer survival compared with whites, namely a 14% 5-year survival in African Americans compared with 17% in whites (2). This poor survival can largely be attributed to presentation at a later stage in African Americans, which is associated with reduced lung cancer survival (3). In the clinical setting, an individual's treatment plan is tailored to the stage of lung cancer diagnosis and can impact lung cancer survival (4). Other factors associated with poor lung cancer survival include low socioeconomic status (SES), smoking, older age, male sex, site of origin, tumor grade, histologic subtype, and somatic mutation profile (5, 6).
Tumor mutation status is frequently incorporated into genetic medicine to guide therapeutic decisions (7, 8). However, germline variation could also influence treatment response (9, 10). Genetic association studies have successfully identified germline variants that contribute to lung cancer survival, but genome-wide association studies (GWAS) of lung cancer survival have been conducted only in populations of European or Asian descent (11–18). To improve the precision of approaches based on population genetic history, genetic studies in diverse racial/ethnic populations need to be conducted. Overall, African descent populations have shorter segments of linkage disequilibrium compared with European descent populations; thus, examining genetic associations of disease in African American populations may allow for improved fine mapping of causal variants with potential therapeutic benefits applicable across racial/ethnic groups.
Using a prospective cohort of African Americans, we sought to validate lung cancer survival variants previously reported in the NHGRI-EBI GWAS catalog (19), accounting for known factors influencing survival. We also sought to identify novel and potentially functional genome-wide germline variants associated with lung cancer survival in African Americans. Although the current study is underpowered to detect genome-wide associations, this is the first study to our knowledge to examine genome-wide germline genetic associations of lung cancer survival in African Americans.
Materials and Methods
Study population
The Southern Community Cohort Study (SCCS) is a large prospective cohort study designed to examine racial disparities in cancer. Adults between the ages of 40–79 years were enrolled primarily at community health centers throughout a 12-state region across the Southeastern United States (Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia, and West Virginia) between March 2002 and September 2009. Approximately 90% of enrolled individuals agreed to donate a biologic specimen. Nearly two-thirds of participants self-identified as African American. Additional details regarding the study design and recruitment have been previously published (20, 21). The SCCS was approved by institutional review boards at Vanderbilt University and Meharry Medical College. Written informed consent was obtained from all participants.
Case identification and mortality assessment
Incident African American lung cancer cases occurring in the SCCS through September 2012 were identified via linkage with the 12 state cancer registries throughout the study catchment area. Epidemiologic data, such as demographic characteristics, self-reported medical history, and tobacco use, were ascertained at study enrollment by trained personnel via in-person computer-assisted interviews at community health centers or by questionnaire mailed to a random subset of the general population in the 12-state region (20, 21). Histology, stage at diagnosis, and clinical treatment were collected from state cancer registry data. Lung cancer stage was determined using the American Joint Committee on Cancer staging system and Surveillance, Epidemiology, and End Results (SEER) summary stage. Treatment information was identified from cancer registries and summarized as a 5-level design variable: no treatment, chemotherapy only, radiation only, surgery only, or multi-modality (any combination of the above therapy regimens). Participants were followed for all-cause mortality through 2013, which was assessed via linkage with the Social Security Administration and the National Death Index. Survival time (years) was defined as the time between date of diagnosis and date of death, loss to follow-up, or end of follow-up period for the study.
Genotyping and quality control
Individuals with an available blood or buccal cell biospecimen from which germline DNA could be extracted as of September 2012 were genotyped on the Illumina HumanExome BeadChip v1.1. The HumanExome array contains >240,000 variants, including a panel of >3,000 Ancestry Informative Markers (AIMs) for distinguishing African and European ancestries, and >4,700 disease-associated tag markers identified from GWAS. A detailed description of the quality control (QC) process is presented in Supplementary Fig. S1. Briefly, QC procedures removed individuals with mismatched genetic sex or <98% genotyping efficiency, as well as related individuals. Autosomal variants were excluded if they had Mendelian errors, a call rate <98%, or a minor allele frequency (MAF) <5%. QC filters were applied using PLINK and R (version 3.0.3; ref. 22). A total of 14 samples were used to verify genotype reproducibility, with a genotype concordance of >99.99%.
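The variant-level call rate and MAF filters described above can be sketched in a few lines. This is only an illustrative sketch in Python, not the PLINK/R pipeline used in the study: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the toy data are assumptions introduced here, and the sample-level filters (sex mismatch, relatedness) and Mendelian-error checks are not covered.

```python
# Illustrative variant-level QC filter (hypothetical helper names and toy data).
# Genotypes are coded as allele counts 0, 1, 2; None marks a missing call.
CALL_RATE_MIN = 0.98  # variants with call rate < 98% are excluded
MAF_MIN = 0.05        # variants with MAF < 5% are excluded


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_qc(genotypes):
    """True if the variant meets both the call-rate and MAF thresholds."""
    return (call_rate(genotypes) >= CALL_RATE_MIN
            and minor_allele_frequency(genotypes) >= MAF_MIN)


# Toy variants across 100 samples each (made-up data, for illustration only).
common_variant = [0] * 70 + [1] * 25 + [2] * 5    # MAF 0.175, call rate 1.00
rare_variant = [0] * 98 + [1] * 2                 # MAF 0.01,  call rate 1.00
poorly_called = [None] * 5 + [0] * 60 + [1] * 35  # call rate 0.95

for name, variant in [("common", common_variant),
                      ("rare", rare_variant),
                      ("poorly called", poorly_called)]:
    print(name, passes_qc(variant))
```

In this toy example only the first variant is retained; the second fails the MAF threshold and the third fails the call-rate threshold.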
Admixture estimation
Global ancestry was estimated using the ADMIXTURE software (23). The CEU (Utah residents with ancestry from northern and western Europe) and YRI (Yoruba from Ibadan, Nigeria) HapMap populations were included as ancestral reference populations to inform the admixture estimation process. After merging with the HapMap reference populations and pruning based on linkage disequilibrium (r² > 0.4), 1,137 AIMs remained among the 2,587 AIMs in the final QC dataset for estimation of genetic admixture. Supervised (k = 2) admixture analysis was performed to estimate the percent African (YRI) and European (CEU) ancestry for each African American individual.
Statistical analysis
Trans-ethnic replication of NHGRI-EBI GWAS catalog lung cancer survival variants.
Variants previously associated with lung cancer survival and genotyped on the HumanExome array were identified from the NHGRI-EBI GWAS catalog (rs1878022, rs1209950, rs9981861, rs1656402, and rs716274, Supplementary Table S1; ref. 19). For each additively coded variant, we ran a Cox proportional hazards model, controlling for age at diagnosis, sex, disease stage, treatment, and percent African ancestry. Statistical significance was determined using Bonferroni correction based on the number of a priori variants tested (five variants, α = 0.05/5 = 0.01). We also examined whether smoking pack-years, education status, and self-reported COPD status modified our genetic associations. All statistical analyses were conducted using the R survival package (survival, version 2.37–7).
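As a rough illustration of the per-variant model described above, the adjusted Cox model with an additively coded genotype can be written as follows. The symbols used here ($G_i$, $\mathbf{z}_i$, $\beta_G$, $\boldsymbol{\gamma}$) are notation introduced for this sketch and are not taken from the original analysis description.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_G G_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
\qquad G_i \in \{0, 1, 2\},
\qquad \mathrm{HR}_{\text{per allele}} = e^{\beta_G}
```

Here $G_i$ is the number of copies of the tested allele carried by individual $i$, $\mathbf{z}_i$ collects the adjustment covariates (age at diagnosis, sex, disease stage, treatment, and percent African ancestry), and $h_0(t)$ is the unspecified baseline hazard. Under this additive coding, the reported HR is the multiplicative change in hazard per additional allele copy, so carrying two copies corresponds to $e^{2\beta_G}$ relative to carrying none.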
Discovery of new common variants associated with lung cancer survival.
Cox proportional hazards models were used to estimate HRs assessing the association between common (MAF ≥ 5%) variants and lung cancer survival. Genotypes were coded additively and models were adjusted for age at diagnosis, sex, disease stage, treatment, and percent African ancestry. We further adjusted our models to assess the impact of smoking, education, and self-reported COPD status on our genetic associations. Functional predictions were conducted using PolyPhen-2 and SIFT gathered from the ENSEMBL website (24, 25). A type I error rate of α = 0.05/28,041 variants = 1.78 × 10⁻⁶ was set for inference of statistical “significance.”
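The two Bonferroni thresholds used in this study follow from simple arithmetic, sketched below in Python. The helper name `bonferroni_alpha` is hypothetical; the P values compared against the thresholds are those reported for rs1878022 and for the top common variant, rs1133358.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return family_alpha / n_tests


candidate_alpha = bonferroni_alpha(5)    # five a priori GWAS catalog variants
exome_alpha = bonferroni_alpha(28041)    # common variants remaining after QC

print(f"{candidate_alpha:.2f}")   # 0.01
print(f"{exome_alpha:.2e}")       # 1.78e-06

# Reported P values checked against the corresponding thresholds.
print(9.8e-3 < candidate_alpha)   # rs1878022: True (significant)
print(8.12e-6 < exome_alpha)      # rs1133358: False (not significant)
```

The comparison makes explicit why the candidate variant counts as significant at a P value that would be unremarkable in the genome-wide scan: the threshold scales with the number of tests performed.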
Replication populations
Two independent African American lung cancer study populations were available for replication: (i) the Karmanos Cancer Institute (KCI) at Wayne State University (WSU) and (ii) University of California San Francisco (UCSF). Non–small cell lung cancer (NSCLC) cases from both study populations (N = 316 and 298, respectively) were previously genotyped on the Illumina HumanHap 1M Duo array and filtered on the basis of standard QC measures. Four variants meeting our P value threshold for replication that were not directly genotyped were imputed with IMPUTE2 using a cosmopolitan reference population from 1000 Genomes (phase 1, b37, June 2014 release), with phasing performed using SHAPEIT (26). KCI/WSU African American lung cancer cases were included from three WSU studies: the Family Health Study (FHS study III), the Women's Epidemiology of Lung Disease (WELD) Study, and the EXHALE (Exploring Health, Ancestry and Lung Epidemiology) Study, and have been previously described (27). Cases were ascertained using rapid case ascertainment through the population-based Metropolitan Detroit Cancer Surveillance System, an NCI-funded SEER registry. Stage, treatment, and vital status were ascertained from the Detroit SEER registry. African Americans participating in the UCSF lung cancer study were identified using rapid case ascertainment methods from September 1998 to March 2003 (28). Specifically, cancer histology and stage were determined using ICD-O codes abstracted from SEER data from the California Cancer Registry (CCR). The CCR also provided data for whether surgery, radiation, and chemotherapy were given to the patients, their last known vital status, and date of death or date of last contact. Stage and treatment information for both replication populations were summarized using the same algorithm as implemented within the SCCS. Percent African ancestry was estimated using genome-wide data and a supervised analysis (k = 2) implemented with ADMIXTURE (23). 
A fixed effects meta-analysis of the discovery (SCCS) and replication (KCI/WSU and UCSF) cohorts was performed using METAL (29).
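A fixed effects meta-analysis of this kind can be sketched from published summary statistics alone. The Python example below is a minimal illustration and not METAL itself: it assumes inverse-variance weighting on the log-HR scale (the text does not state which METAL weighting scheme was used) and recovers each standard error from the reported 95% CI. The function names are hypothetical; the input values are the rs1878022 estimates reported for the three cohorts.

```python
import math

Z_975 = 1.96  # normal quantile for a two-sided 95% confidence interval


def log_hr_and_se(hr, lower, upper):
    """Convert an HR and its 95% CI to a log-HR and its standard error."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(hr), se


def fixed_effects_pool(estimates):
    """Inverse-variance fixed effects pooling of (hr, lower, upper) tuples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in estimates:
        log_hr, se = log_hr_and_se(hr, lower, upper)
        weight = 1.0 / se ** 2          # more precise studies get more weight
        total_weight += weight
        weighted_sum += weight * log_hr
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - Z_975 * pooled_se),
            math.exp(pooled + Z_975 * pooled_se))


# rs1878022: HR (95% CI) in SCCS, KCI/WSU, and UCSF
rs1878022 = [(0.70, 0.54, 0.92), (0.87, 0.66, 1.14), (0.94, 0.71, 1.24)]
pooled_hr, pooled_lower, pooled_upper = fixed_effects_pool(rs1878022)
print(f"{pooled_hr:.2f} ({pooled_lower:.2f}-{pooled_upper:.2f})")  # 0.83 (0.71-0.97)
```

Under these assumptions, the sketch reproduces the summary HR and CI reported for the three-cohort meta-analysis, which suggests the published estimate is consistent with inverse-variance weighting.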
Results
We identified a total of 336 incident African American lung cancer cases in the SCCS cohort with an available biospecimen. After implementation of QC procedures, 286 African Americans with NSCLC were included in our study (Supplementary Fig. S1). The mean follow-up time was 1.3 years (min = 0.003; max = 8.6), and 87% of individuals were deceased at the end of follow-up. Lung cancer cases were diagnosed at a mean age of 60 years and were primarily (60%) male (Table 1). Ninety-four percent of individuals were recruited from community health centers resulting in a primarily low SES population with 68% having an annual household income less than $15,000, and 47% having less than a high school education. Fifty-two percent of African American lung cancer cases in the SCCS were diagnosed with stage IV disease (Table 1; Supplementary Fig. S2). Median African ancestry was 86% (Supplementary Fig. S3A) among SCCS African American NSCLC cases. African American NSCLC cases in the KCI/WSU (N = 316) and UCSF (N = 298) studies had lower African ancestry (83% and 82%, respectively; Supplementary Fig. S3B and S3C), higher SES and fewer stage IV diagnoses (26% KCI/WSU and 29% UCSF; Supplementary Table S2; Supplementary Fig. S2) than African Americans in the SCCS. The mean follow-up time for the KCI/WSU and UCSF populations was 4.0 years (min = 0.3; max = 13.7) and 2.8 years (min = 0.2; max = 9.0), respectively. Our final sample size for complete case analysis was N = 275 in the SCCS, N = 312 in KCI/WSU, and N = 284 in UCSF.
Characteristic | N (%) |
---|---|
Sex | |
Male | 171 (59.8) |
Female | 115 (40.2) |
Vital status | |
Alive | 38 (13.3) |
Dead | 248 (86.7) |
Median African ancestry, % | 85.6 |
Lung cancer stage at diagnosis | |
I | 44 (15.7) |
II/III | 90 (32.0) |
IV | 147 (52.3) |
Unknown | 5 |
Treatment | |
No treatment | 75 (26.8) |
Surgery only | 31 (11.1) |
Chemotherapy only | 51 (18.2) |
Radiation only | 36 (12.9) |
Multimodality | 87 (31.1) |
Unknown | 6 |
Histology | |
Adenocarcinoma | 113 (39.5) |
NSCLC-NOS | 78 (27.3) |
Squamous | 72 (25.2) |
Other NSCLC | 22 (7.7) |
Multiple histologies | 1 (0.3) |
Mean age at diagnosis, y (SD) | 59.6 (9.1) |
Mean duration of disease among those who died, y (SD) | 0.88 (1.1) |
Mean duration of disease among those alive at last follow-up, y (SD) | 4.1 (1.7) |
Highest education level, y | |
<12 | 134 (47.2) |
≥12 | 150 (52.8) |
Unknown | 2 |
Smoking status at cohort entry | |
Current | 206 (72.8) |
Former | 59 (20.8) |
Never | 18 (6.4) |
Unknown | 3 |
Mean cigarettes per day (SD) | 15.7 (13.1) |
Mean smoking pack-years (SD) | 37.5 (30.7) |
First-degree relative with lung cancer | |
Yes | 21 (9.3) |
No | 205 (90.7) |
Unknown | 60 |
Abbreviation: NOS, not otherwise specified.
NHGRI-EBI GWAS catalog variants
Of the variants previously associated with lung cancer survival in the GWAS catalog, five were genotyped on the HumanExome array and passed QC (14, 16, 18). Variants rs1209950, rs9981861, rs1656402, and rs716274 were previously identified in East Asian populations, while rs1878022 was previously reported in a European descent population (Supplementary Table S1). Variants had allele frequencies similar to those observed in the 1000 Genomes Americans of African Ancestry in the Southwestern US (ASW) population (Supplementary Table S3). We observed that the C allele for rs1878022 was significantly associated with lower mortality after correction for multiple testing (HR = 0.70; 95% CI, 0.54–0.92), adjusting for age, sex, stage, treatment, and African ancestry (Table 2; Supplementary Table S3). Although HRs were in the same direction as previously reported for the remaining variants, CIs were wide and results were not significant. Inclusion of smoking pack-years, education, or COPD status in our models did not appreciably alter (<10% change) our genetic associations (Supplementary Table S4). Associations of rs1878022 in the KCI/WSU and UCSF study populations had HRs in the same direction as the SCCS, although not statistically significant (KCI/WSU: HR = 0.87; 95% CI, 0.66–1.14; UCSF: HR = 0.94; 95% CI, 0.72–1.24; Table 2; Supplementary Table S3). A fixed effects meta-analysis of the three populations revealed a summary HR of 0.83 (95% CI, 0.71–0.97; P = 0.02, I² = 0.30; Fig. 1). Stratification of the SCCS cohort by stage revealed a significant association among African Americans with stage IV lung cancer (HR = 0.62; 95% CI, 0.44–0.88; P = 0.008; Supplementary Table S5). Pooled analysis of stage IV individuals from KCI/WSU and UCSF resulted in an HR of 1.20 (95% CI, 0.85–1.69; Supplementary Table S5), and a stage-stratified meta-analysis of the three populations (N = 308) found an attenuated and nonsignificant association among stage IV individuals (HR = 0.83; 95% CI, 0.65–1.07).
SNP | Chr. | bp positionᵃ | Geneᵇ | SCCS (N = 275) HR (95% CI) | SCCS P | KCI/WSU (N = 312) HR (95% CI) | KCI/WSU P | UCSF (N = 284) HR (95% CI) | UCSF P |
---|---|---|---|---|---|---|---|---|---|
rs1878022 | 12q23.3 | 108699032 | CMKLR1 | 0.70 (0.54–0.92) | 9.8 × 10⁻³ | 0.87 (0.66–1.14) | 0.30 | 0.94 (0.71–1.24) | 0.66 |
rs1209950 | 21q22.2 | 40173528 | \|\| ETS2 | 1.16 (0.90–1.50) | 0.24 | 0.93 (0.70–1.25) | 0.65 | 0.91 (0.68–1.22) | 0.53 |
rs9981861 | 21q22.2 | 41415044 | DSCAM | 1.08 (0.88–1.33) | 0.46 | 1.13 (0.91–1.39) | 0.27 | 1.15 (0.91–1.45) | 0.24 |
rs1656402 | 2q37.1 | 233426526 | EIF4E2 | 0.93 (0.76–1.14) | 0.48 | 0.90 (0.72–1.11) | 0.30 | 1.11 (0.86–1.43) | 0.41 |
rs716274ᶜ | 11q22.3 | 103418158 | DYNC2H1 \|\| | 1.02 (0.84–1.23) | 0.86 | 0.97 (0.80–1.17) | 0.72 | 1.21 (0.97–1.50) | 0.08 |
Models are adjusted for age, sex, treatment, stage, and percent African ancestry.
ᵃ dbSNP build 137/GRCh37.p5.
ᵇ For variants outside of gene boundaries, || denotes the location of the variant relative to the closest gene.
ᶜ Imputed in KCI/WSU and UCSF populations. Imputation score = 0.99.
HR = hazard ratio, CI = confidence interval.
ExomeChip common variant associations
We examined common variants associated with lung cancer survival among African Americans. After QC, 28,041 common variants remained for analysis. While no variant met our threshold for statistical significance, 13 variants had P values < 1.0 × 10⁻⁴ (Table 3; Fig. 2). Of the 13 variants with the smallest P values, seven were associated with greater all-cause mortality and six were associated with reduced all-cause mortality in additive models with increasing copies of the minor allele. All variants had similar allele frequencies to those observed in the 1000 Genomes ASW reference population (Supplementary Table S6). Four of the 13 variants (rs2072633, rs1505229, rs7502216, rs537160) were located within gene introns and an additional five (rs1133358, rs8176785, rs7626962, rs35761244, rs1639122) were exonic. A variant in the SUN5 gene on chromosome 20q11, rs1133358, was the most strongly associated with mortality (HR = 0.61; 95% CI, 0.49–0.76) and predicted by SIFT to be deleterious to protein function, although with low confidence (Tables 3 and 4). A peak on chromosome 6 revealed three variants (rs605203, rs2072633, rs537160, r² < 0.5) associated with reduced mortality (HR = 0.46; 95% CI, 0.33–0.66; HR = 0.66; 95% CI, 0.54–0.81; HR = 0.61; 95% CI, 0.48–0.78, respectively). Variant rs7626962 was also predicted to have deleterious effects by SIFT (Table 4). Adding smoking pack-years, education status, and self-reported COPD status to our statistical models did not substantially alter the observed HRs (Supplementary Table S7). We further examined these 13 variants in our KCI/WSU and UCSF replication populations (Table 3; Supplementary Table S6). Within the KCI/WSU study population, the G allele at rs7302017 was associated with increased mortality (HR = 1.29; 95% CI, 1.01–1.63), while the G allele at rs605203 was associated with reduced mortality in the UCSF study population (HR = 0.55; 95% CI, 0.38–0.78). Both of these associations have the same direction of effect as, and a magnitude similar to, those in the discovery cohort.
SNP | Chr. | bp positionᵃ | Geneᵇ | SCCS (N = 275) HR (95% CI) | SCCS P | KCI/WSU (N = 312) HR (95% CI) | KCI/WSU P | UCSF (N = 284) HR (95% CI) | UCSF P |
---|---|---|---|---|---|---|---|---|---|
rs1133358 | 20q11.21 | 31590686 | SUN5 | 0.61 (0.49–0.76) | 8.12 × 10⁻⁶ | 1.01 (0.82–1.24) | 0.94 | 1.08 (0.87–1.35) | 0.49 |
rs8176785 | 11p15.1 | 20805286 | NELL1 | 1.65 (1.32–2.07) | 1.20 × 10⁻⁵ | 0.98 (0.79–1.21) | 0.82 | 1.11 (0.87–1.42) | 0.39 |
rs7626962 | 3p22.2 | 38620907 | SCN5A | 1.95 (1.44–2.64) | 1.44 × 10⁻⁵ | 1.05 (0.75–1.46) | 0.78 | 0.80 (0.53–1.22) | 0.30 |
rs605203 | 6p21.33 | 31847012 | SLC44A4 \|\| EHMT2 | 0.46 (0.33–0.66) | 1.63 × 10⁻⁵ | 1.13 (0.82–1.57) | 0.43 | 0.55 (0.38–0.79) | 0.001 |
rs35761244ᶜ | 19p13.2 | 13873698 | CCDC130 | 1.66 (1.32–2.09) | 1.68 × 10⁻⁵ | 0.98 (0.70–1.36) | 0.90 | 0.94 (0.68–1.32) | 0.73 |
rs6959964ᶜ | 7q11.22 | 68905738 | \|\| AUTS2 | 1.56 (1.27–1.91) | 2.21 × 10⁻⁵ | 0.84 (0.67–1.04) | 0.10 | 0.88 (0.71–1.10) | 0.27 |
rs1639122 | 12p13.31 | 6711147 | CHD4 | 1.80 (1.36–2.38) | 3.38 × 10⁻⁵ | 0.93 (0.74–1.19) | 0.58 | 1.05 (0.78–1.40) | 0.76 |
rs7138803 | 12q13.12 | 50247468 | BCDIN3D \|\| FAIM2 | 1.70 (1.32–2.18) | 3.78 × 10⁻⁵ | 0.96 (0.74–1.24) | 0.74 | 0.90 (0.69–1.19) | 0.48 |
rs2072633 | 6p21.33 | 31919578 | CFB | 0.66 (0.54–0.81) | 4.07 × 10⁻⁵ | 1.00 (0.81–1.23) | 0.98 | 0.91 (0.73–1.13) | 0.40 |
rs1505229 | 2p12 | 77589901 | LRRTM4 | 0.67 (0.56–0.82) | 5.44 × 10⁻⁵ | 0.96 (0.78–1.19) | 0.72 | 1.04 (0.84–1.30) | 0.72 |
rs7502216 | 17q12 | 36612948 | ARHGAP23 | 0.67 (0.55–0.81) | 5.78 × 10⁻⁵ | 0.85 (0.69–1.04) | 0.11 | 1.10 (0.89–1.36) | 0.40 |
rs537160 | 6p21.33 | 31916400 | CFB | 0.61 (0.48–0.78) | 7.44 × 10⁻⁵ | 0.98 (0.74–1.29) | 0.87 | 0.81 (0.61–1.06) | 0.13 |
rs7302017ᶜ | 12q14.1 | 63004583 | MIRLET7I \|\| PPM1H | 1.53 (1.24–1.89) | 9.28 × 10⁻⁵ | 1.29 (1.01–1.63) | 0.04 | 0.87 (0.67–1.13) | 0.30 |
NOTE: Models are adjusted for age, sex, treatment, stage, and percent African ancestry. Results are shown for associations with a P value < 1.0 × 10⁻⁴.
ᵃ dbSNP build 137/GRCh37.p5.
ᵇ For variants outside of gene boundaries, || denotes the location of the variant relative to the closest gene(s).
ᶜ Imputed in KCI/WSU and UCSF populations. Imputation score = 0.90 for rs35761244, 0.98 for rs6959964, and 0.99 for rs7302017.
SNP . | Genea . | Amino acid change . | PolyPhen-2 prediction . | SIFT Prediction . |
---|---|---|---|---|
rs1133358 | SUN5 | E > D | Benign | Deleterious |
rs8176785 | NELL1 | R > Q | Benign | Tolerated |
rs7626962 | SCN5A | S > Y | Benign | Deleterious |
rs35761244 | CCDC130 | C > S | Benign | Tolerated |
rs1639122 | CHD4 | E > D | Benign | Tolerated |
Discussion
We sought to validate lung cancer survival variants previously identified in Asian or European populations in an incident study of African Americans. We additionally sought to identify novel common variants associated with survival in African Americans. Our evaluation of five previously identified lung cancer survival GWAS variants identified one variant in the CMKLR1 gene significantly associated with reduced mortality in African Americans with NSCLC. The remaining four variants were not significantly associated. The observed reduced mortality for rs1878022 is in contrast to a prior study by Wu and colleagues conducted in European Americans in which they reported that the minor allele of rs1878022 was associated with increased mortality (18). All individuals in the study by Wu and colleagues were white ever-smokers with stage III or IV NSCLC who received platinum-based chemotherapy and no surgery, whereas in the current study of African Americans, no restriction was made on stage, treatment, or smoking status for study inclusion. Of note, the frequency of the C allele in the 1000 Genomes populations differs between European and African populations (35% versus 13%). The observed allele frequency of 14% in the current study of SCCS African Americans is similar to the reported frequency (18%) in the 1000 Genomes African American (ASW) population and in the UCSF and WSU lung cancer cases (17.4% and 18.8%, respectively). While the observed population-specific effects of rs1878022 may simply be the result of a spurious association, they could also result from differences in linkage disequilibrium (LD) patterns between Africans and Europeans as a reflection of distinct backbone haplotypes that could have different causal variants. Examination of LD structure surrounding the CMKLR1 gene revealed differences between the YRI and CEU populations, with the CEU having larger blocks of strong LD (Supplementary Fig. S4).
Using LDlink (30) and data from an African (YRI) population, we found that five variants are in strong LD with rs1878022 (r² > 0.8) and an additional eight variants are in moderate LD (0.3 < r² < 0.8), whereas in whites (CEU) six variants are in moderate LD and none are in strong LD. RegulomeDB (31) indicates that two of the variants in moderate LD in the YRI (rs4964244 and rs4964245, r² = 0.44 and 0.38, respectively) and one variant in the CEU (rs4964242, r² = 0.55) have high regulatory potential and are likely to affect binding. Variant rs4964242 is not in LD with rs4964244 or rs4964245 in either the CEU or YRI population (r² ≤ 0.1), indicating that different regulatory variants could account for the observed population-specific effects. It is possible that rs1878022 is tagging a different causal variant in the YRI population than in the CEU population, and thus it remains necessary to fine map the CMKLR1 gene to determine the causal variant influencing lung cancer survival in African Americans compared with European Americans. Given the high prevalence of rare variants, the targeted nature of the HumanExome array, and the lack of an imputation backbone, we were unable to examine LD surrounding rs1878022 within the SCCS.
Variant rs1878022 is located within the second intron within the 5′-untranslated region of the chemokine-like receptor 1 (CMKLR1) gene on chromosome 12q23.3. CMKLR1 encodes a seven transmembrane G-protein coupled receptor that has been associated with adiposity, glucose intolerance, and inflammation (32, 33). It has been shown to play a role in the immune response to cigarette smoke in murine models of chronic obstructive pulmonary disease, an established risk factor for lung cancer (34). Furthermore, in response to binding of its ligand, chemerin, CMKLR1 has been shown to activate MAPK, ERK1/2, and Akt signaling cascades involved in cell-cycle regulation (35). Examination of ENCODE data at rs1878022 identified a small peak of H3K27Ac and H3K4me1 in normal human lung fibroblasts (Supplementary Fig. S5). The epigenetic markers surrounding rs1878022 are commonly found at active regulatory elements and suggest that this variant might play a role in the regulation of CMKLR1 activity.
While our analysis of common variants did not identify any significant variants, several variants had promising P values for associations with mortality, including multiple protein-coding variants. Three variants on chromosome 6p21.33 in weak LD (rs605203, rs2072633, and rs537160, r² < 0.5) were associated with a reduction in mortality. All three variants were non-protein-coding. Variant rs605203, which had a similar HR in the UCSF population, is located 70 bp upstream of the SLC44A4 gene and downstream of EHMT2. Variants rs2072633 and rs537160 are located within the 29th and 19th introns of the complement factor B (CFB) gene, respectively. Chromosome 6p21.33 is part of the gene-rich human leukocyte antigen (HLA) region. The 6p22.1-p21.31 chromosome region has been previously associated with lung cancer risk in both European and African Americans, although these results have inconsistently replicated (36–40). Furthermore, rs4324798 located at 6p22.1 was previously associated with increased survival in a European descent population of never-smoking small-cell lung cancer cases (41, 42). Lung cancer susceptibility within this region has been largely attributed to the BAT3 and MSH5 genes, which play an important role in DNA damage response and potentially response to cancer therapeutics (43–45). Our findings support an association of the 6p22.1-p21.33 region with overall survival in African American NSCLC cases.
We identified variant rs1639122 on chromosome 12, located within the chromodomain helicase DNA binding protein 4 (CHD4) gene. A change from the A to the C allele results in a missense mutation that changes the amino acid from glutamate to aspartate. CHD4 is involved in nucleosome remodeling and transcriptional silencing as part of the nucleosome remodeling and deacetylation (NuRD) complex (46, 47). In addition, CHD4 plays a role in DNA damage response and cell-cycle progression (48, 49). Somatic mutations in CHD4 were identified in endometrial tumors, and a germline protein-coding variant was associated with increased risk of overall cancer, malignant lymphoma, rectal cancer, and lung cancer, providing further evidence for a potential role of this gene in lung cancer survival (50–52). While the majority of these common variants failed to replicate in the KCI/WSU and UCSF populations, the totality of evidence from previously published associations with lung cancer risk, together with biological plausibility, suggests our findings should be further investigated in additional African American lung cancer populations.
To our knowledge, this study is the first to examine genome-wide genetic variants associated with all-cause mortality among African American NSCLC cases. While we acknowledge our sample size is small, these analyses serve as a starting point for further investigation. The limited sample size reduces our statistical power to detect true associations, and thus replication in a larger population of African American NSCLC cases is necessary. However, we had a priori evidence to assess variants previously associated with lung cancer survival, thus reducing the burden of multiple testing typical of genetic association studies. Importantly, assessing trans-ethnic replication of the NHGRI-EBI GWAS Catalog lung cancer survival variants provides genetic information about lung cancer that is desperately needed among diverse racial/ethnic populations. Moreover, meta-analysis of the discovery and replication cohorts provides additional support for the association at rs1878022. Our findings provide greater evidence of an association between the CMKLR1 gene and survival among lung cancer cases.
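As an illustration of how estimates from discovery and replication cohorts can be pooled, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis on the log-HR scale. It is not a description of the software or exact model used in this study; the function names `se_from_ci` and `fixed_effect_meta` are hypothetical, and the standard errors are assumed to be recovered from reported 95% CIs under a normal approximation.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Approximate standard error of log(HR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_meta(hrs, ses, z=Z_95):
    """Inverse-variance weighted pooled HR with its 95% CI.

    hrs: per-study hazard ratios
    ses: per-study standard errors of log(HR)
    Returns (pooled_hr, ci_lower, ci_upper).
    """
    if len(hrs) != len(ses) or not hrs:
        raise ValueError("hrs and ses must be non-empty and of equal length")
    weights = [1.0 / (se ** 2) for se in ses]
    total = sum(weights)
    pooled_log = sum(w * math.log(hr) for w, hr in zip(weights, hrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Standard error implied by the discovery estimate for rs1878022
# (HR = 0.70; 95% CI, 0.54-0.92).
discovery_se = se_from_ci(0.54, 0.92)
```

Studies with narrower CIs receive larger weights, so a pooled estimate will sit closer to the more precise cohorts. A random-effects model would be an alternative if heterogeneity across study populations were a concern.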
It is important to note that the current analyses are limited by the design of the genotyping array used in the discovery population. The HumanExome array was designed to capture biologically relevant coding variation. As a result, variants are not evenly distributed across the genome and are predominantly rare. This array also includes >4,500 variants identified in genome-wide association studies. While there were more than 20 variants associated with lung cancer survival in the NHGRI-EBI GWAS catalog at the time of our data extraction, only five were present on the array for examination in the current study. The unique design of the HumanExome array precluded us from analyzing variants in LD with the five previously reported variants examined here. Specifically, examination of LD patterns surrounding the five variants in the relevant 1000 Genomes reference population revealed four variants in LD (±100 kb, r² > 0.6, CEU reference population) with rs1878022, the only variant previously identified in a European descent population, and 57 variants in LD [±100 kb, r² > 0.6, Han Chinese in Beijing, China (CHB) reference population] with one of the four variants previously identified in Asian populations. None of these variants in LD were present on the HumanExome array, preventing inclusion in the statistical analyses. Future analyses should include previously reported variants not present here and perform a thorough examination of LD structure surrounding these variants.
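The LD search described above (a ±100 kb window with an r² threshold of 0.6) can be sketched as follows. This is an illustrative example only, not the tool used in this study: the function names `r_squared` and `ld_proxies` and the input layout are hypothetical, all variants are assumed to lie on the same chromosome, and r² is approximated by the squared Pearson correlation of allele counts (0, 1, 2) rather than by phased haplotype frequencies.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and equal length")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return (cov * cov) / (v1 * v2)


def ld_proxies(index_id, variants, window=100_000, threshold=0.6):
    """IDs of variants within +/- window bp of the index variant with
    r^2 above threshold.

    variants: dict mapping variant ID -> (position_bp, allele_counts),
              with allele counts taken from one reference population.
    """
    index_pos, index_geno = variants[index_id]
    hits = []
    for vid, (pos, geno) in variants.items():
        if vid == index_id or abs(pos - index_pos) > window:
            continue
        if r_squared(index_geno, geno) > threshold:
            hits.append(vid)
    return sorted(hits)
```

Because LD structure differs across populations, the same index variant can return different proxy sets depending on which reference panel supplies the allele counts.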
While the KCI/WSU and UCSF replication populations were similar, we note key differences between the discovery (SCCS) and replication study populations, likely due to differences in the composition of the study base from which the lung cancer cases arose. The discovery and the replication lung cancer cases were all sampled using cancer registries within primary study bases. However, the study base of the SCCS is defined by its prospective cohort design, which primarily enrolled individuals seeking medical care at community health centers across the Southeastern United States, whereas the KCI/WSU and UCSF study bases are defined by geographic regions encompassing the Detroit and San Francisco Bay Area metropolitan areas, respectively. These different geographic samplings likely contributed to important differences across the three studies in education level, current smoking prevalence, and stage distribution. Specifically, a greater percentage of SCCS lung cancer cases were diagnosed at later stages of disease compared with KCI/WSU and UCSF cases (Supplementary Fig. S2). This difference in stage distribution between discovery and replication studies could be attributed to differences in cohort versus case–control ascertainment, as case–control designs often miss the sickest individuals. The observed difference in results across the discovery and replication studies among stage IV lung cancer cases for the association between rs1878022 and survival may be due to unique gene–environment interactions or simply a matter of chance.
In summary, this study identified several variants associated with survival in African American NSCLC cases, a high-risk and underrepresented population in lung cancer genetics research. By examining variants previously associated with lung cancer survival, we observed that rs1878022 is significantly associated with survival in African Americans. However, the direction of effect observed is in contrast to a previous study in Europeans, suggesting rs1878022 may have population-specific effects on lung cancer survival, possibly because distinct LD substructure leads to different causal variants in European descent populations and African Americans (18). In addition, we identified several potential novel variants, both protein coding and non-coding, associated with survival in African Americans, including a region on chromosome 6p21.33 that has been previously associated with lung cancer risk and small-cell lung cancer survival. Future studies fine-mapping CMKLR1 and the 6p21-22 region and conducting functional studies could lead to potential therapeutic interventions to improve lung cancer survival, especially in African Americans.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The contents of this publication are solely the responsibility of the authors and do not represent the official views of the Centers for Disease Control and Prevention (CDC) or the Mississippi Cancer Registry. The funding source had no role in study design; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the article for publication.
Authors' Contributions
Conception and design: M.C. Aldrich
Development of methodology: W.S. Bush, J.K. Wiencke, M.R. Wrensch, E.L. Grogan
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.G. Schwartz, J.K. Wiencke, M.R. Wrensch, W.J. Blot, S.J. Chanock
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.C. Jones, W.S. Bush, S.J. Chanock, M.C. Aldrich
Writing, review, and/or revision of the manuscript: C.C. Jones, D.C. Crawford, A.G. Schwartz, W.J. Blot, E.L. Grogan, M.C. Aldrich
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.C. Jones, A.S. Wenzlaff, A.G. Schwartz
Study supervision: D.C. Crawford, J.K. Wiencke, M.R. Wrensch, M.C. Aldrich
Acknowledgments
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health Cancer Registry. Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries of the CDC.
Grant Support
This work was supported by a Department of Defense Early Investigator Synergistic Idea Award (granted to M.C. Aldrich, W81XWH-12-1-0547; and E.L. Grogan, W81XWH-12-1-0544). M.C. Aldrich was supported by a NIH/National Cancer Institute K07 CA172294. C.C. Jones was supported by the Training Program on Genetic Variation and Human Phenotypes, a NIH/NIGMS training grant (4T32GM08017810). E.L. Grogan was supported by a Veterans Affairs Career Development Award 10-024. Studies at the Karmanos Cancer Institute at Wayne State University were supported by NIH grants/contracts R01CA060691, R01CA87895, and P30CA22453, and a Department of Health and Human Services contract HHSN261201000028C (to A.G. Schwartz). The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries of the CDC.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.