Abstract
This study aimed to identify novel susceptibility variants for second primary tumor (SPT) or recurrence in curatively treated early-stage head and neck squamous cell carcinoma (HNSCC) patients.
We constructed a custom chip containing a comprehensive panel of 9,645 chromosomal and mitochondrial single nucleotide polymorphisms (SNP) representing 998 cancer-related genes selected by a systematic prioritization schema. Using this chip, we genotyped 150 early-stage HNSCC patients with and 300 matched patients without SPT/recurrence from a prospectively conducted randomized trial and assessed the association of these SNPs with risk of SPT/recurrence.
Individually, six chromosomal SNPs and seven mitochondrial SNPs were significantly associated with risk of SPT/recurrence after adjustment for multiple comparisons. A strong gene-dosage effect was observed when these SNPs were combined, as evidenced by a progressively increasing SPT/recurrence risk as the number of unfavorable genotypes increased (P for trend < 1.00 × 10−20). Several polygenic analyses suggest an important role of interconnected functional network and gene-gene interaction in modulating SPT/recurrence. Furthermore, incorporation of these genetic markers into a multivariate model improved significantly the discriminatory ability over the models containing only clinical and epidemiologic variables.
This is the first large-scale systematic evaluation of germ-line genetic variants for their roles in HNSCC SPT/recurrence. The study identified several promising susceptibility loci and showed the cumulative effect of multiple risk loci in HNSCC SPT/recurrence. Furthermore, this study underscores the importance of incorporating germ-line genetic variation data with clinical and risk factor data in constructing prediction models for clinical outcomes.
Approximately 10% of early-stage head and neck squamous cell carcinoma (HNSCC) patients develop locoregional recurrence and 15% to 25% develop second primary tumors (SPT) within 5 years of initial diagnosis (1, 2). As diagnostic and therapeutic approaches continue to improve, the ability to accurately predict SPT/recurrence in early-stage HNSCC patients would facilitate intensive surveillance or targeted interventions for high-risk patients and thereby reduce mortality and morbidity.
Clinical (index tumor site and disease stage) and lifestyle (continued smoking and alcohol drinking) factors contribute to the risk of SPT and recurrence (3, 4). HNSCC tumorigenesis is a multistep process involving an accumulation of progressive genetic alterations (5), including genomic alterations of multiple chromosomes (3p, 9p, 13q, and 17p; refs. 6, 7) and mutations of essential oncogenes and tumor suppressor genes (p53, p16, cyclin D1, KRAS, and FHIT; refs. 8, 9). Many of these somatic alterations have also been linked to SPT/recurrence development.
We previously reported that high mutagen sensitivity measured by an in vitro lymphocytic assay, reflecting constitutional genetic instability, was associated with increased risk of SPT/recurrence (10, 11). Although the association between single nucleotide polymorphisms (SNP) and risk of HNSCC (12, 13) has been extensively investigated, no studies have investigated their association with SPT/recurrence. To address this issue, we conducted a nested case-control analysis to test the hypothesis that common sequence variants affect the risk of SPT/recurrence in curatively treated HNSCC patients. Because a genome-wide scanning approach was not an option given the limited sample availability of HNSCC patients who developed SPT/recurrence, we constructed a comprehensive panel of 9,645 SNPs in 998 cancer-related genes to assess both their individual and combined effects on SPT/recurrence. We also constructed risk prediction models of SPT/recurrence based on known clinical and epidemiologic risk factors and the SNPs identified in this study.
Materials and Methods
Study population and epidemiologic data
The subjects included in this study were participants enrolled (1991-1999) in the Retinoid Head and Neck Second Primary Trial, designed to evaluate whether daily low-dose 13-cis-retinoic acid (13-cRA) prevents SPT or tumor recurrence in early-stage HNSCC patients (1). Briefly, patients with histologically confirmed stage I or II HNSCC who were cancer-free for at least 16 wk after the end of treatment were eligible for randomization to either low-dose (30 mg/d) 13-cRA treatment or placebo for 3 y with a planned minimum of 4 y of follow-up. The stratification criteria for randomization included the primary tumor site (larynx, oral cavity, and pharynx), tumor stage (stage I or II), and smoking status (current, former, or never smoker). Never smokers were individuals who had smoked <100 total cigarettes during their lifetime. Former smokers were individuals who had stopped smoking for at least 1 y at the time of enrollment (14). Patients were evaluated at 3, 6, 9, 12, 16, 20, 24, 28, 32, and 36 mo after randomization. After completing treatment, patients were followed up at 6-mo intervals for an additional 4 y. Standard criteria for diagnosis of a SPT were applied (15). The major sites of SPT in this population were lung (29.8%), head and neck (28.0%), prostate (14.2%), and bladder (5.1%). Local recurrence was defined as any tumor of similar histology appearing within 2 cm or within 3 y of the primary tumor. Among ∼1,190 patients enrolled, 354 developed SPT/recurrence. However, only 150 of them had blood DNA samples available. Therefore, we designed a nested case-control study to evaluate these 150 patients with SPT/recurrence, designated as cases, and 300 patients without SPT/recurrence as controls. We compared these 150 cases with those not included in this study and found no significant differences in age, sex, smoking, alcohol, tumor site, stage, radiotherapy, surgery, or 13-cRA treatment. Patients included in the study did, however, have a higher percentage of Caucasians (95%) than patients not included (89%; P = 0.001). Overall, we are confident that patient selection bias is minimal. The study was approved by the Institutional Review Board of The University of Texas M. D. Anderson Cancer Center. Informed consent was obtained from all participants.
Development of the iSelect Infinium II cancer gene/SNP BeadChip
We developed a customized and comprehensive panel of cancer-related genes involved in 12 major cellular pathways (Supplementary Table S1). For each specific pathway, genes were subcategorized according to their major reported functions. To generate an unbiased relevant gene list, we used the Gene Ontology (GO), a comprehensive database of gene annotation. We further used the Cancer Genome Anatomy Project GO Browser to pinpoint all relevant ontology terms for probing the GO database. We did an extensive literature review on the genes returned by the GO database, querying PubMed with the HUGO name, the common aliases, and “cancer” as keywords to further scrutinize cancer relevance. We then assigned a priority score to each gene based on the importance and relevance of the gene to the specific cancer pathway. For each gene with a high priority score, we identified the tagSNPs ranging from 10 kb upstream of the 5′-untranslated region (UTR) to 10 kb downstream of the 3′-UTR of the gene (16). We also included potentially functional SNPs, which are located in the functional regions of the genes, including coding (synonymous SNPs and nonsynonymous SNPs) and regulatory (promoter, splicing site, 5′-UTR, and 3′-UTR) regions. Each gene was then analyzed using the LDSelect program to divide SNPs into bins based on the r2 threshold of 0.8 and minor allele frequency (MAF) ≥0.05 in Caucasians. For genes with a medium priority score, only potentially functional SNPs were identified. For tagSNP selection, we selected one SNP from each bin according to preset criteria considering the validation status, designability score, position, and bead type number of specific SNPs. For potentially functional SNP selection, we included all two-hit or HapMap validated SNPs with a designability score ≥0.6 and a MAF ≥0.01 in Caucasians. Overall, 9,645 SNPs were included on the BeadChip (Supplementary Table S1). The complete set of selected SNPs was submitted to Illumina technical support for the Infinium II chemistry designability and bead type analyses using a proprietary program developed by Illumina (17).
Genotyping
Genomic DNA was extracted from peripheral blood lymphocytes. Genotyping was carried out according to the standard 3-d protocol provided by Illumina. The genotypes were autocalled using the BeadStudio software.
Statistical analysis
Statistical analyses were done using Intercooled STATA software (STATA Corp.) and SAS/Genetics, version 9.0 (SAS Institute). χ2 analysis was used to assess the differences between subject groups with regard to categorical variables and Student's t test for continuous variables. For each chromosomal SNP, the risks of SPT/recurrence were estimated as hazard ratios (HR) and 95% confidence intervals (95% CI) using multivariable Cox proportional hazard regression models adjusted for age, gender, ethnicity, smoking status, tumor site, stage, and treatment, where appropriate. Three genetic models (dominant, recessive, and additive) were tested for each SNP, and the model with the highest significance was considered the best-fitting model and used to measure the statistical significance of each SNP (18). For mitochondrial SNPs (mtSNP), the heterozygous genotypes were treated as missing data because these calls typically result from either DNA contamination or heteroplasmy (19). The wild-type and variant genotypes of mtSNPs were then analyzed in the same way as chromosomal SNPs. Multiple hypothesis testing was addressed using the q value, a measure of significance in terms of the false discovery rate, as implemented in an R package (20). The multiple comparison adjustment was carried out for the best-fitting model representing the significance of the association for each SNP. We applied a bootstrap resampling method to internally validate the results. We generated 100 bootstrap samples, each drawn from the original data set, and for each sample obtained the P value of each SNP under the dominant, recessive, and additive models. The cumulative effects of unfavorable genotypes on SPT/recurrence were tested for the combined top SNPs that showed a significant q value (<0.05) and also had a bootstrap P value of <0.01 in at least 80% of the bootstrap samples.
Based on the percentage of patients developing SPT/recurrence, subjects were categorized into low-risk (<25%), medium low-risk (25-50%), medium high-risk (51-75%), and high-risk (>75%) groups by number of unfavorable genotypes. We calculated the HRs and 95% CIs for all other groups compared with the low-risk reference group using a multivariable Cox proportional hazard regression model. Kaplan-Meier estimates were calculated to plot the event-free curve for each group and the log-rank test was used to compare survival between these groups. We also constructed receiver operating characteristic curves and calculated the area under the curve (AUC) to evaluate the specificity and sensitivity of predicting SPT/recurrence by incorporating different combinations of epidemiologic, clinical, and genetic predictor variables. We only included SNPs internally validated by bootstrapping in these analyses. A two-sided P ≤ 0.05 was considered the threshold of statistical significance.
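The genotype coding and multiple-testing steps described above can be illustrated with a minimal Python sketch. It is not the authors' STATA/SAS/R code: the function names are hypothetical, genotypes are assumed to be coded as the number of variant-allele copies (0, 1, or 2), and the Benjamini-Hochberg adjustment is used here only as a simple stand-in for the q value method cited in the text.

```python
from typing import Dict, List, Optional


def encode(variant_copies: Optional[int], model: str) -> Optional[int]:
    """Code a genotype (0, 1, or 2 variant-allele copies) under one genetic model.

    None represents a missing call (for example, a heterozygous mtSNP call,
    which the text treats as missing).
    """
    if variant_copies is None:
        return None
    if model == "dominant":      # any variant allele vs. none
        return 1 if variant_copies >= 1 else 0
    if model == "recessive":     # homozygous variant vs. the rest
        return 1 if variant_copies == 2 else 0
    if model == "additive":      # per-allele trend
        return variant_copies
    raise ValueError(f"unknown genetic model: {model}")


def best_fitting_model(p_by_model: Dict[str, float]) -> str:
    """Pick the model with the smallest P value, as described in the text."""
    return min(p_by_model, key=p_by_model.get)


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (assumption: stand-in for q values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch the per-model P values passed to `best_fitting_model` would come from the Cox regression fits, which are not reproduced here.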
Results
Characteristics of the study population
One hundred and fifty patients with SPT/recurrence (cases) were matched 1:2 to 300 patients without SPT/recurrence (controls) by age (±5 years), gender, and ethnicity (Supplementary Table S2). There were no significant differences between these two groups in radiotherapy (P = 0.71), surgery (P = 0.34), or 13-cRA treatment arm (P = 0.42). There seemed to be more current smokers in the SPT/recurrence group (42%) than in the no-event group (34%), and more high-stage (stage II) patients in the former group (41%) than in the latter group (34%), although these two comparisons did not reach statistical significance (P = 0.22 and 0.13, respectively). However, significant differences were observed between the two groups in pack-years (P = 0.007) and tumor site (P = 6.0 × 10−5).
iSelect Infinium II BeadChip content and genotyping quality controls
There were 998 genes represented by 9,645 SNPs on the BeadChip (Supplementary Table S3). Seventy-eight percent were tagging SNPs and 22% were potentially functional SNPs. The initial conversion rate of the BeadChip synthesis was 90.61%, leaving 8,739 SNPs (8,583 chromosomal SNPs and 156 mtSNPs) with reliable genotyping data. Individuals with >5% missing genotypes, SNPs with >5% missing calls, chromosomal SNPs with <1% MAF, and mtSNPs with <5% MAF were excluded. After applying these filters, 8,370 SNPs and 440 study subjects (147 cases and 293 controls) were included in the following analyses.
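The quality-control filters above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, calls are assumed to be coded as variant-allele counts with `None` for a missing call, and mtSNPs are assumed to be coded 0/1 with heterozygous calls already set to missing.

```python
from typing import List, Optional

# One genotype call: number of variant-allele copies, or None for a missing call.
Call = Optional[int]


def missing_rate(calls: List[Call]) -> float:
    """Fraction of calls that are missing."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Call], copies_per_call: int = 2) -> float:
    """MAF among non-missing calls (2 copies per call for chromosomal SNPs)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    variant_freq = sum(observed) / (copies_per_call * len(observed))
    return min(variant_freq, 1.0 - variant_freq)


def snp_passes_qc(calls: List[Call], mitochondrial: bool = False) -> bool:
    """Keep SNPs with <=5% missing calls and MAF >=1% (chromosomal) or >=5% (mtSNP)."""
    min_maf = 0.05 if mitochondrial else 0.01
    copies = 1 if mitochondrial else 2
    return (missing_rate(calls) <= 0.05
            and minor_allele_frequency(calls, copies) >= min_maf)


def subject_passes_qc(calls: List[Call]) -> bool:
    """Keep individuals with <=5% missing genotypes."""
    return missing_rate(calls) <= 0.05


# Arithmetic check on the reported conversion rate: 90.61% of 9,645 SNPs.
converted_snps = round(9645 * 0.9061)
```

The last line reproduces the 8,739 converted SNPs reported in the text, which also equals the sum of 8,583 chromosomal SNPs and 156 mtSNPs.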
Significant individual SNPs associated with SPT/recurrence in the main effect analysis
Because the genetic background and replication patterns are significantly different for chromosomal SNPs and mtSNPs, we did analyses separately for these two groups. Table 1 lists the top 20 chromosomal SNPs sorted by P values. Six SNPs remained statistically significant after multiple comparison adjustment using q value (Table 1). The most significant SNP (rs12359892) was located in the 3′ region of the MKI67 gene. The homozygous variant genotype was associated with a 2.65-fold (95% CI, 1.72-4.11; P = 1.25 × 10−5; q = 0.042) increased risk of SPT/recurrence under the recessive genetic model. Seven mtSNPs had significant q values after multiple comparison adjustment (Table 2). mitoA11813G, located in the NADH dehydrogenase subunit 4 (ND4) gene, was the most significant mtSNP. The HR of the variant allele was 0.06 (95% CI, 0.01-0.44; P = 1.24 × 10−6; q = 1.98 × 10−5) compared with the wild-type allele. We then drew 100 bootstrap samples for internal validation and listed the number of samples in which the bootstrap P value was <0.01 for each SNP (Tables 1 and 2). For the top 20 chromosomal SNPs, 12 had a bootstrap P value of <0.01 in at least 80% of the samples (Table 1, shaded SNPs). The top SNP, MKI67 rs12359892, was highly consistent, with P < 0.01 in 96 of 100 bootstrap samples (Table 1). The top three mtSNPs had a bootstrap P value of <0.01 in at least 80% of the samples (Table 2). The top mtSNP, mitoA11813G, was also highly consistent, with a bootstrap P value of <0.01 in 98 samples.
Table 2. Associations of SPT/recurrence with mtSNPs remaining significant after multiple comparison adjustment
SNP | Host gene/region | SNP type | Mitochondrial position | Allelic change | Genotype counts*, SPT | Genotype counts*, no SPT | HR (95% CI)† | P | q | No. of bootstrap samples with P < 0.01 |
---|---|---|---|---|---|---|---|---|---|---|
mitoA11813G | Mt-ND4 | sSNP | 11812 | A>G | 146/1 | 259/34 | 0.06 (0.01-0.44) | 1.24 × 10−6 | 1.98 × 10−5 | 98 |
mitoG15929A | Mt-TT | Noncoding | 15928 | G>A | 143/4 | 252/39 | 0.20 (0.08-0.56) | 5.47 × 10−5 | 4.37 × 10−4 | 94 |
mitoA14906G | Mt-CYB | sSNP | 14905 | A>G | 141/6 | 248/41 | 0.28 (0.12-0.63) | 1.95 × 10−4 | 1.04 × 10−3 | 83 |
mitoT10464C | Mt-TR | Noncoding | 10463 | T>C | 140/7 | 253/39 | 0.34 (0.16-0.73) | 1.04 × 10−3 | 4.17 × 10−3 | 73 |
mitoA11252G | Mt-ND4 | sSNP | 11251 | A>G | 132/14 | 230/60 | 0.47 (0.27-0.82) | 3.21 × 10−3 | 1.03 × 10−2 | 68 |
mitoG3012A | Mt-RNR2 | Noncoding | 3010 | G>A | 101/46 | 237/56 | 1.73 (1.21-2.49) | 3.87 × 10−3 | 1.03 × 10−2 | 70 |
mitoT14767C | Mt-CYB | Thr > Ile | 14766 | T>C | 60/87 | 166/127 | 1.60 (1.14-2.25) | 5.90 × 10−3 | 1.35 × 10−2 | 67 |
Abbreviations: TT, tRNA threonine; TR, tRNA arginine.
*Genotype counts: wild genotype/variant genotype.
†Adjusted for age, gender, ethnicity, smoking status, tumor site, tumor stage, and treatment.
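As a worked check of the internal-validation rule (q < 0.05 and bootstrap P < 0.01 in at least 80% of 100 samples), the short Python sketch below applies it to the q values and bootstrap counts reported in Table 2. The function name and data layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (SNP, q value, number of bootstrap samples out of 100 with P < 0.01), from Table 2.
TABLE2: List[Tuple[str, float, int]] = [
    ("mitoA11813G", 1.98e-5, 98),
    ("mitoG15929A", 4.37e-4, 94),
    ("mitoA14906G", 1.04e-3, 83),
    ("mitoT10464C", 4.17e-3, 73),
    ("mitoA11252G", 1.03e-2, 68),
    ("mitoG3012A", 1.03e-2, 70),
    ("mitoT14767C", 1.35e-2, 67),
]


def internally_validated(rows: List[Tuple[str, float, int]],
                         q_cutoff: float = 0.05,
                         n_bootstrap: int = 100,
                         min_fraction: float = 0.80) -> List[str]:
    """Return SNPs with a significant q value that also replicate in bootstrap samples."""
    return [name for name, q, hits in rows
            if q < q_cutoff and hits / n_bootstrap >= min_fraction]


validated = internally_validated(TABLE2)
```

Applied to these rows, the rule retains the three mtSNPs with bootstrap counts of 98, 94, and 83, in agreement with the statement that the top three mtSNPs met the 80% criterion.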
To increase sample size and statistical power, we grouped all SPT cases in our analysis. Because the relevance of prostate cancer and other non–smoking-related or nonaerodigestive tract cancers as SPT may not be clear, we also did separate analyses of smoking-related and aerodigestive tract SPT and compared the results with those for the entire SPT group. Of the top 20 chromosomal SNPs that were significant among all SPT cases (Table 1), 18 remained significant at the 0.05 level in both the smoking-related and aerodigestive tract SPT subgroup analyses, one remained significant at the 0.05 level in smoking-related SPT and was of borderline significance in aerodigestive tract SPT, and the remaining SNP had a P value of 0.11 in smoking-related SPT cases and 0.15 in aerodigestive tract SPT cases. The HR estimates were similar and the best-fitting models were the same for the top 20 chromosomal SNPs (Supplementary Table S4). A similar pattern was observed for the top mtSNPs (Supplementary Table S5). We chose to present data from all SPT cases to reflect the general risk of developing any new tumor.
Cumulative effects of the unfavorable genotypes
We further evaluated the cumulative effects of the high-risk genotypes on SPT/recurrence by summing the unfavorable genotypes of the above-described top risk-conferring chromosomal SNPs and mtSNPs that had bootstrap P values of <0.01 in at least 80% of the bootstrap samples. Twelve chromosomal SNPs and 1 mtSNP (mitoG15929A and mitoA14906G were excluded because of high linkage disequilibrium with mitoA11813G) were included in this analysis. As shown in Table 3, there was a significant gene-dosage effect. Compared with those in the low-risk reference group (≤4 unfavorable genotypes), subjects with medium low risk (5-6 unfavorable genotypes), medium high risk (7), and high risk (≥8) had 4.29-fold (95% CI, 2.52-7.29; P = 7.59 × 10−8), 9.16-fold (95% CI, 5.52-17.83; P = 1.80 × 10−14), and 26.72-fold (95% CI, 14.00-50.99; P < 1 × 10−20) increased SPT/recurrence risks, respectively (P for trend < 1 × 10−20). The event-free median survival times were 14.6, 49.2, and 79.4 months for these three risk groups, respectively, compared with >93.0 months for the low-risk group (P = 9.92 × 10−38, log-rank test; Fig. 1).
Table 3. The cumulative effects of unfavorable genotypes on SPT/recurrence
No. unfavorable genotypes* | SPT/recurrence, n (%) | No SPT/recurrence, n (%) | HR (95% CI)† | P |
---|---|---|---|---|
≤4 (reference group) | 18 (10.91) | 147 (89.09) | 1 | Reference |
5-6 | 62 (37.58) | 103 (62.42) | 4.29 (2.52-7.29) | 7.59 × 10−8 |
7 | 34 (61.82) | 21 (38.18) | 9.16 (5.52-17.83) | 1.80 × 10−14 |
≥8 | 25 (96.15) | 1 (3.85) | 26.72 (14.00-50.99) | <1.00 × 10−20 |
P for trend | | | | <1.00 × 10−20 |
*Unfavorable genotype was based on the 12 chromosomal SNPs and 1 mtSNP as described in text.
†Adjusted for age, gender, smoking status, ethnicity, tumor site, tumor stage, and treatment.
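The grouping by number of unfavorable genotypes can be made concrete with a short Python sketch that assigns the four risk groups and recomputes the event percentages from the counts in Table 3. The function names are hypothetical; the cut points (≤4, 5-6, 7, ≥8) and the counts are taken from the table.

```python
from typing import Dict, Tuple


def risk_group(n_unfavorable: int) -> str:
    """Map a count of unfavorable genotypes to the risk groups used in Table 3."""
    if n_unfavorable <= 4:
        return "low"
    if n_unfavorable <= 6:
        return "medium low"
    if n_unfavorable == 7:
        return "medium high"
    return "high"


# (subjects with SPT/recurrence, subjects without) per risk group, from Table 3.
TABLE3: Dict[str, Tuple[int, int]] = {
    "low": (18, 147),
    "medium low": (62, 103),
    "medium high": (34, 21),
    "high": (25, 1),
}


def event_percentage(events: int, non_events: int) -> float:
    """Percentage of subjects in a group who developed SPT/recurrence."""
    return round(100 * events / (events + non_events), 2)


percentages = {group: event_percentage(*counts) for group, counts in TABLE3.items()}
```

The recomputed percentages (10.91, 37.58, 61.82, and 96.15) match the table and fall within the <25%, 25-50%, 51-75%, and >75% bands that define the four groups in the statistical analysis section.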
Fig. 1. Kaplan-Meier event-free survival curve on SPT/recurrence by the unfavorable genotypes of 18 chromosomal SNPs and 3 mtSNPs.
Model discrimination ability
We next constructed prediction models by incorporating established prognostic clinical variables (tumor site, stage, and treatment), epidemiologic variables (smoking pack-years), and genetic variables (12 chromosomal SNPs and 1 mtSNP identified in this study; Fig. 2). The AUC increased from 0.61 (clinical variables only) to 0.64 (clinical-smoking variables) and to 0.84 (clinical, smoking, and genetic variables). The observed difference in AUC between the third and second models was 0.20, and the bias-corrected 95% CIs based on 10,000 bootstrap samples were 0.15 to 0.27, suggesting significant differences between these two models.
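To illustrate how such an AUC comparison works, the following Python sketch computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and estimates a bootstrap interval for the difference in AUC between two models. It is a sketch under stated assumptions: the function names and input layout are hypothetical, and it uses a plain percentile interval rather than the bias-corrected interval reported in the text.

```python
import random
from typing import List, Tuple


def auc(case_scores: List[float], control_scores: List[float]) -> float:
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc_difference(cases: List[Tuple[float, float]],
                             controls: List[Tuple[float, float]],
                             n_boot: int = 1000,
                             seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval for AUC(model B) - AUC(model A).

    Each subject is a pair (score under model A, score under model B).
    Cases and controls are resampled separately with replacement.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        auc_a = auc([s[0] for s in boot_cases], [s[0] for s in boot_controls])
        auc_b = auc([s[1] for s in boot_cases], [s[1] for s in boot_controls])
        diffs.append(auc_b - auc_a)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper
```

An interval for the difference that excludes zero, as with the reported 0.15 to 0.27, indicates that the model with the higher AUC discriminates better than the comparison model.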
Fig. 2. Receiver operating characteristic curves from various models showing improvement of discrimination ability.
Because age, gender, and ethnicity were matched by study design, the above models may be weak in terms of epidemiologic risk factors. We therefore analyzed the entire cohort data to explore the main effects of age, gender, and ethnicity on SPT/recurrence and constructed a receiver operating characteristic curve based on these data. We found a significant effect of age on SPT/recurrence, but neither sex nor ethnicity was significantly associated with SPT/recurrence. However, adding age to the clinical-smoking model did not significantly change its AUC (data not shown).
Discussion
In this large-scale systematic evaluation of 9,645 SNPs in 998 cancer-related genes, we identified six chromosomal SNPs and seven mtSNPs significantly associated with risk of SPT/recurrence after correction for type I errors, with evidence of a significant gene-dosage effect. These results support the notion that SPT and tumor recurrences are polygenic traits determined by multiple low penetrance loci.
We developed a customized SNP chip encompassing well-established pathways through comprehensive and exhaustive database interrogation and literature review. The associations identified are biologically plausible. Among the six significant chromosomal variants, the most significant is localized in the MKI67 gene, an important cell cycle proliferation marker whose expression is correlated with the development and progression of various malignancies, including HNSCC (21). Cyclin-dependent kinase (CDK) 6 mostly functions in the progression of G1 phase through interacting with multiple cyclins and inhibiting tumor suppressor protein RB (22). Both CDK6 and MKI67 are reported to promote HNSCC progression through enhancing expression of protein kinases to phosphorylate and activate proliferative transcription factors (23). MNAT1 is a key component of the protein complex CDK-activating kinase, which phosphorylates CDKs to activate cell cycle progression and also interacts with transcription factor TFIIH to stimulate nucleotide excision repair (24). NHEJ1 gene product interacts with both XRCC4 and LIG4 as a core component of the protein complex responsible for nonhomologous end-joining pathway of dsDNA break repair (25). Suboptimal DNA repair capacity has been shown to increase the risk of HNSCC and SPT/recurrence (10, 11). TNFRSF10B encodes a member of the tumor necrosis factor (TNF) receptor superfamily involved in extrinsic apoptosis pathway (26). Mutations in TNFRSF10B have been identified in multiple cancers, including HNSCC (10). GSTM4 belongs to the Mu subclass of the glutathione S-transferase family, essential in the detoxification of electrophilic compounds, and polymorphisms of this gene family have been extensively associated with the risk and outcomes of HNSCC (27, 28). Taken together, there is strong biological plausibility for the associations between the six identified chromosomal genes and HNSCC.
We also identified several mtSNPs as predictors of HNSCC SPT/recurrence. Mitochondrial dysfunction may lead to tumorigenesis through apoptotic regulation, reactive oxygen species generation, metabolic regulation, and nucleus-mitochondria communications (29). Altered mitochondrial function with increased aerobic glycolysis, the Warburg effect, is a common feature in many tumors (30). Aberrations of mitochondrial DNA have been observed in almost all types of solid cancers, including HNSCC (31). Polymorphisms in the mitochondrial genome have also been associated with many common diseases, including diabetes and cancer (32). The most significant mtSNP, mitoA11813G, is located in the ND4 gene, which has been implicated in head and neck cancer by multiple independent studies (33, 34). Mutations of cytochrome b (CYB) and 16s rRNA (RNR2) were also identified in HNSCC (31). mtSNPs may be involved in the initiation and progression of both index tumors and SPT/recurrence due to possible disruptive effects on mitochondria genes and energy metabolism (35) or related to the central role of mitochondria in apoptosis and reactive oxygen species production.
We further used Ingenuity Pathway Analysis to explore whether certain canonical pathways were overrepresented for significant associations by inputting chromosomal genes containing SNPs with P < 0.01 (a total of 170 genes; ref. 36). The top predefined canonical pathways to which these genes belong include aryl hydrocarbon receptor signaling, PTEN signaling, lipopolysaccharide/interleukin-1–mediated inhibition of retinoid X receptor function, xenobiotic metabolism signaling, and cell cycle (Supplementary Table S6), most of which are implicated in carcinogen or drug metabolism and treatment-related cellular response. Because of the etiologic role of tobacco and alcohol in HNSCC carcinogenesis, these results are not surprising. Most genetic markers of clinical outcome have only modest effects, and there is likely to be an enhanced predictive power when SNPs are analyzed jointly (18, 37, 38), as we noted. Another data-mining tool we explored is survival tree analysis, which uses binary recursive partitioning to produce a tree structure with many binary splits. Our survival tree analysis produced a decision tree with 14 terminal nodes, each with a different SPT/recurrence risk based on a distinct combination of genotypes (Supplementary Fig. S1). The terminal nodes from the final tree were grouped into four risk groups based on the percentage of patients developing SPT/recurrence in each terminal node: low risk (<25%), medium low risk (25-50%), medium high risk (51-75%), and high risk (>75%). Compared with the low-risk group, the risk increased from 3.48- to 17.04-fold for medium low-risk to high-risk groups (Supplementary Fig. S1). We validated the risk groups by bootstrapping the samples 10,000 times. These data support an important role of gene-gene interactions in modulating SPT/recurrence. Furthermore, when we incorporated the genetic variables into a multivariate model, we obtained a significant improvement of discriminatory ability (Fig. 2), underscoring the importance of incorporating germ-line genetic variation data with clinical and risk factor data into prediction models for clinical outcomes.
This study has several limitations. First, the sample size is limited due to the rarity of events and availability of germ-line DNA. We calculated statistical power based on the MAF and genetic models (Supplementary Table S7). Power is adequate for additive and dominant models to detect an OR of ≥2.5 when MAF is >0.05. At a MAF of 0.05, we have more than 91% power and 94% power to detect an increased OR of 2.5 in dominant and additive models, respectively. The power to detect an OR of 2.5 is close to 100% for larger MAFs. For a recessive model, we have >80% power to detect an increased OR of 3.0 when MAF is ≥0.20. However, power is limited in the recessive model when MAF is lower. We calculated power to detect ORs instead of HRs. In cohort studies with long follow-up time, the HR approach based on survival analysis for a time-to-event end point is even more efficient than the OR approach based on logistic regression for a binary end point. Second, due to the sample size, we could not do stratified analyses, for example, on smoking and tumor site. Hence, we adjusted for these variables in all our analyses. We also do not have information on human papillomavirus-16 status. Third, due to the difficulty in identifying an external validation population, we are unable to validate the significant SNPs in an independent population. Such external validation would be a critical next step. Finally, we used a nested 1:2 case-control study design, which may not reflect the population of early-stage HNSCC, although the 1:2 case-control ratio is comparable with the roughly 30% SPT/recurrence incidence in the original population.
There are many strengths of this study. This is the first large-scale study to systematically evaluate germ-line genetic variants in HNSCC SPT/recurrence. Because a genome-wide scanning approach was not possible due to the limited numbers of HNSCC patients who developed SPT/recurrence, our pathway-based custom SNP array is the best option. There is minimal selection bias because the cases and controls were well matched and were all early-stage HNSCC patients enrolled in a prospectively conducted randomized chemoprevention trial. The significant SNPs identified may be useful for clinicians in assessing the risk for SPT/recurrence in early-stage HNSCC patients. The genotyping technology is robust and consistent. Obtaining DNA from peripheral blood is noninvasive and inexpensive. We can generate thousands of genotypes from one drop of blood and get the patients' genetic profile predictive of SPT/recurrence, which can be incorporated into a risk prediction model to identify high-risk patients to undergo intensive screening, smoking cessation, or dietary modification. Chemoprevention trials have been mostly negative in head and neck cancer. Although the main reason for these negative results probably is that the tested chemoprevention agents are not the best, we also think that patients are heterogeneous and these agents may not work in all patients. Not considering patients' genetic background in patient stratification may at least partially contribute to the negative results. Patients with a specific genetic background may respond better to certain chemoprevention agents.
The present study focused on comprehensive risk-modeling analyses of SNPs to identify early-stage HNSCC patients at the highest risk of SPT/recurrence and was conducted within a large-scale randomized trial of 13-cRA. Ongoing work that is beyond the scope of this article is examining pharmacogenetic interactions to see if certain germ-line alterations are associated with a better outcome of 13-cRA treatment. This treatment was a covariate in the risk-modeling analysis, which was adjusted for this factor. We identified the top 20 chromosomal SNPs associated with a high risk of SPT/recurrence (Table 1); of these 20 SNPs, only 1, which is in MKI67, a cell cycle gene, was associated with the retinoid effect of a significantly reduced SPT/recurrence risk (62%), making this SNP both highly prognostic and predictive (data not shown). This preliminary observation is advantageous in that it seems to mark both the high-risk patients with the greatest need and their sensitivity to an agent; it is being examined further in the broader pharmacogenomic studies mentioned above. If these studies identify a predictive marker or signature based on individual patients' germ-line genetic variations, we can design a better patient stratification plan in future chemoprevention trials, targeting chemoprevention agents to patients who are at high risk of SPT/recurrence and more likely to benefit from treatment. Through this personalized chemoprevention, we may have better success in chemoprevention trials.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.