To identify genetic variants associated with risk of squamous cell carcinoma of the head and neck (SCCHN), we conducted a two-phase genome-wide association study. The discovery phase analyzed 7,858,089 SNPs (434,839 typed and 7,423,250 imputed) in 2,171 cases and 4,493 controls, all non-Hispanic whites. SNPs with P < 1 × 10−3 were further validated in the OncoArray study of oral and pharynx cancer (5,205 cases and 3,232 controls of European ancestry) obtained from the database of Genotypes and Phenotypes. Meta-analysis of the discovery and replication studies identified one novel locus, 6p22.1 (P = 2.96 × 10−9 for the leading SNP rs259919), and two known cancer susceptibility loci, 6p21.32 (rs3135001, HLA-DQB1) and 6p21.33 (rs1265081, CCHCR1), associated with SCCHN risk. Further stratification by tumor site revealed four known cancer loci associated with oral cavity cancer risk (5p15.33 and 6p21.32) or oropharyngeal cancer risk (2p23.1, 6p21.33, and 6p21.32). In addition, one novel locus, 18q22.2 (P = 2.54 × 10−9 for the leading SNP rs142021700), was identified for hypopharynx and larynx cancer risk. For SNPs in those reported or novel loci, we also performed functional annotations by bioinformatics prediction and expression quantitative trait loci analysis. Collectively, our identification of four reported loci (2p23.1, 5p15.33, 6p21.32, and 6p21.33) and two novel loci (6p22.1 and 18q22.2) for SCCHN risk highlights the importance of human leukocyte antigen loci for oropharyngeal cancer risk, suggesting that immunologic mechanisms are implicated in the etiology of this subset of SCCHN.

Significance:

Two novel risk loci for SCCHN in non-Hispanic white individuals highlight the importance of immunologic mechanism in the disease etiology.

Squamous cell carcinoma of the head and neck (SCCHN) is the sixth most common malignancy and the seventh leading cause of cancer-related deaths worldwide (1, 2). In the United States, approximately 65,410 new cases and 14,620 deaths were estimated to occur in 2019 (3). SCCHN includes cancers of the oral cavity (including the gums and tongue), pharynx, and larynx, with well-documented associations with exposure to tobacco and alcohol as well as infection with human papillomavirus (HPV; refs. 4–6). However, the disease develops in only a small fraction of tobacco users, alcohol drinkers, or individuals who have contracted HPV (7), implying an important role of genetic susceptibility in the etiology of SCCHN (8, 9). For example, genetic variants in alcohol-related genes (i.e., ADH1B and ADH7) have been reported to be associated with risk of SCCHN and upper aerodigestive cancers (10, 11). To date, only two genome-wide association studies (GWAS) of SCCHN have been published (12, 13). One study, performed with 2,398 cases and 2,804 controls of Chinese ancestry, reported six loci (i.e., 5q14.3, 6p21.33, 6q16.1, 11q12.2, 12q24.21, and 16p13.2) to be associated with laryngeal cancer risk (12). The other investigated genetic susceptibility to oral cavity and pharyngeal cancer with 6,034 cases and 6,585 controls of European ancestry and reported three loci (i.e., 6p21.32, 10q26.13, and 11p15.4) associated with overall cancer risk, four loci (i.e., 2p23.3, 5p15.33, 9p15.3, and 9q34.12) associated with oral cancer, and the human leukocyte antigen (HLA) region 6p21.32 associated with oropharyngeal cancer (13). These few risk loci explain only a small proportion of the heritability, and no additional follow-up studies have been reported.

With the goal of identifying additional novel genetic risk loci for SCCHN, we conducted a two-phase GWAS in non-Hispanic whites. We first identified SNPs in The University of Texas MD Anderson Cancer Center (MDACC, Houston, TX) GWAS and then validated those SNPs with P < 1 × 10−3 using the published OncoArray GWAS data. As a result, we found three loci (6p22.1, 6p21.33, and 6p21.32) for overall SCCHN risk, three for oropharyngeal cancer risk (2p23.1, 6p21.33, and 6p21.32), two for oral cavity cancer risk (5p15.33 and 6p21.32), and one (18q22.2) for hypopharyngeal and laryngeal cancer risk. Therefore, we identified two novel loci (6p22.1 and 18q22.2) for SCCHN risk in addition to replicating four known cancer susceptibility regions (2p23.1, 5p15.33, 6p21.33, and 6p21.32).

Populations and genotyping

Discovery population

The SCCHN cases of the present discovery GWAS were ascertained at the Head and Neck Surgery Clinic of MDACC (14, 15) between December 1996 and July 2011, and their genomic DNA was genotyped with the Illumina HumanOmniExpress-12v1 BeadChip. All cases were individuals with newly diagnosed, histologically confirmed, and previously untreated SCCHN of the oral cavity, pharynx, or larynx (14–16). Cases were categorized by tumor site according to the International Classification of Diseases for Oncology (ICD-O, 2nd edition) or ICD10 (5, 17–19). We included individuals with cancers of the oral cavity (codes C00.3–C00.9, C02.0–C02.3, C03.0, C03.1, C03.9, C04.0, C04.1, C04.8, C04.9, C05.0, C06.0–C06.2, C06.8, and C06.9), oropharynx (codes C01.9, C02.4, C05.1, C05.2, C09.0, C09.1, C09.8, C09.9, C10.0–C10.4, C10.8, and C10.9), hypopharynx (codes C12.9, C13.0–C13.2, C13.8, and C13.9), oral cavity or pharynx overlapping or not otherwise specified (codes C02.8, C02.9, C05.8, C05.9, C14.0, C14.2, and C14.8), and larynx (codes C32.0–C32.3 and C32.8–C32.9). Genotypes were available for 2,249 cases (Supplementary Fig. S1).

The controls were genetically unrelated visitors who accompanied patients with cancer to MDACC outpatient clinics (14, 15, 20), individuals recruited previously for the MDACC melanoma study (20), whose data were deposited in the database of Genotypes and Phenotypes (dbGaP accession no.: phs000187.v1.p1), or participants in the Study of Addiction: Genetics and Environment (SAGE; dbGaP accession no.: phs000092.v1.p1; ref. 21). These datasets comprised 1,188 cancer-free individuals recruited for the SCCHN study, whose genomic DNA was genotyped with the Illumina HumanOmniExpress-12v1 BeadChip; 1,026 cancer-free individuals previously recruited for the melanoma GWAS, whose genomic DNA was genotyped with the Illumina Omni1-Quad_v1-0_B BeadChip; and 2,377 cancer-free individuals of European descent from the SAGE study (21), whose genotyping data were generated with the Illumina Human1Mv1 BeadChip. The genotyping data of the SCCHN GWAS have been deposited in dbGaP (accession no.: phs001173.v1.p1).

All participants in the discovery study signed a written informed consent form that permitted us to collect blood samples and clinicopathologic information. The study protocols were approved by the Institutional Review Board of MDACC in accordance with the tenets of the Declaration of Helsinki.

Replication population

The replication dataset was part of a published study, which comprised 6,034 cases and 6,585 controls derived from 12 epidemiologic studies, with the majority having been collected through a case–control design as part of the International Head and Neck Cancer Epidemiology Consortium (13). We requested the related genotyping data and phenotype data from dbGaP (accession no.: phs001202.v1.p1), in which data were available for 6,034 cases and 4,062 controls. Genomic DNA isolated from blood or buccal cells was genotyped at the Center for Inherited Disease Research (CIDR) with a novel genotyping tool, the Illumina OncoArray custom array designed for cancer studies by the OncoArray Consortium part of the Genetic Associations and Mechanisms in Oncology Network. The majority of the samples that were genotyped had oral and pharynx cancer.

Quality control in both discovery and replication GWASs

For the discovery study, we used the genotypes to identify individuals with discordant sex information, duplicates, and closely related individuals among all samples. We identified genetically related individuals by calculating genome-wide identity-by-state distances on markers for each pair of individuals. For any pair with allele sharing of >80%, we excluded the sample generating the lowest call rate from further analysis. Across data from both phases, we excluded 15 lacking consent, 101 duplicated individuals, 49 individuals because of discordant sex information, eight because of cryptic relatedness, and nine because of overall genotyping rate below 95% (Supplementary Fig. S1). For the combined data of 2,171 cases and 4,493 controls (including 1,149, 1,022, and 2,322 controls from the SCCHN GWAS, melanoma GWAS, and SAGE study, respectively), 543,328 tagging SNPs were available. After applying quality control on genotype data, we retained 414,349 autosomal and 9,955 X chromosome SNPs showing a minimal departure from Hardy–Weinberg equilibrium (HWE; P > 10−6 in controls), a genotyping call rate ≥ 95%, and a minor allele frequency (MAF) ≥ 1% in cases and controls.
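The SNP-level thresholds described above (call rate ≥ 95%, MAF ≥ 1%, and HWE P > 10−6 in controls) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names and the genotype encoding (0, 1, or 2 copies of one allele, `None` for a missing call) are assumptions made for illustration, MAF is computed over all samples rather than separately in cases and controls, and a 1-df Pearson chi-square test stands in for whichever HWE test was actually applied.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df Pearson chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used here for simplicity; the study
    may have used an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele "a"
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic marker: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, is_control, min_call_rate=0.95, min_maf=0.01,
              hwe_p_min=1e-6):
    """Apply call-rate, MAF, and HWE-in-controls filters to one SNP.

    genotypes: allele counts (0, 1, 2) per sample, None if missing.
    is_control: booleans aligned with genotypes.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    ctrl = [g for g, c in zip(genotypes, is_control) if c and g is not None]
    hwe_p = hwe_chisq_p(ctrl.count(0), ctrl.count(1), ctrl.count(2))
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p > hwe_p_min
```

In practice these filters are applied by dedicated tools such as PLINK; the sketch only makes the three criteria explicit.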

As described in a previous publication, a similar quality control process was applied in the validation GWAS (13). Briefly, this SCCHN GWAS study first excluded samples with a genotyping rate <95% and SNPs with a call rate <95%. After that, we also removed samples with unsolved genetic and reported sex discrepancies and individuals with outlying autosomal heterozygosity rate (±4 SD), as well as duplicate-pairs (identity by descent > 0.9) and relative pairs (showing identity by descent > 0.3). SNPs with MAF < 1% and deviation of HWE in controls (P ≤ 10−6) were also removed.

We also applied FastPop to estimate ancestry proportion (22). The final samples were those with the proportion of European ancestry ≥ 0.8, which included 2,171 cases and 4,493 controls for the discovery study (MDACC, Supplementary Fig. S2), and 5,205 cases and 3,232 controls for the replication study (OncoArray).

Imputation

To impute untyped genetic variants, we first performed strand flip using PLINK to convert all alleles to the forward genomic strand, and then used SHAPEIT for phasing and performed imputation with minimac4 on the Michigan imputation server (https://imputationserver.sph.umich.edu) with the Haplotype Reference Consortium reference panel (Version r1.1 2016) consisting of 64,940 haplotypes of predominantly European ancestry. For imputation, we used a set of high-quality SNPs: an MAF > 0.01; a call rate > 95%, a HWE test P > 10−6 and an allele frequency difference ≤ 0.20 between the sample data and the reference panel. After imputation, SNPs with an MAF < 0.01, an imputation quality r2 < 0.3 or a significant allele frequency difference (P < 1 × 10−3) among the controls of newly genotyped and those from the MDACC and SAGE GWAS were excluded from the final association analysis. Thus, the final set included 7,858,089 SNPs on autosomes and X chromosome, of which, 434,839 were typed and 7,423,250 were imputed SNPs.

We also imputed HLA classical alleles and amino acids by using the software SNP2HLA and the Type 1 Diabetes Genetics Consortium reference panel of 5,225 individuals of European descent. We divided the SCCHN GWAS dataset into three subsets (each with around 1,100 samples) and the control samples from the SAGE GWAS into two subsets (with 1,200 and 1,113 samples), and performed imputation separately for each subset. The final panel included 8,926 HLA alleles, of which 8,648 and 7,463 were imputed alleles with an info score ≥ 0.3 and ≥ 0.9, respectively. We then performed regional association analyses of binary markers, followed by meta-analysis of imputed binary markers (SNPs, classical alleles, and amino acids) using PLINK 1.90.

Statistical analysis and in silico functional annotations

To control for population confounding, for the two discovery datasets and replication dataset, we performed principal components analysis (PCA) in EIGENSTRAT using 90,636 common markers in low linkage disequilibrium (LD; r2 < 0.1; MAF > 0.05). Significant principal components (PC) associated with the disease status (P < 0.05) were adjusted as covariates in the further risk association analysis (including PC 1, 2, 5, 6, and 8 in the discovery GWAS and the top three PCs and the continent source in the replication dataset). As the distributions of age and sex were significantly different between cases and controls in the two discovery studies and the OncoArray replication study, we also adjusted for them in the risk analysis. We performed unconditional logistic regression to estimate ORs and 95% confidence intervals (CI) per effect allele by using PLINK (v2.0, https://www.cog-genomics.org/plink/2.0/) software with adjustment for age, sex, and the top significant PCs. The association analysis between SNPs on the X chromosome and cancer risk was performed by using the --xchr-model 1 option in PLINK as well as stratified analysis by sex. SNPs with P ≤ 10−3 were chosen for validation in the OncoArray GWAS. SNPs with a combined P ≤ 5 × 10−8 were considered to reach genome-wide significance. We performed both random-effects and fixed-effects meta-analyses by using the inverse variance–weighted average method to combine the summary results of the discovery and replication studies. Heterogeneity was defined as a Q test P ≤ 0.10 or I2 > 50.0%. For SNPs in the identified regions, we performed clump analysis to remove high-LD SNPs with pairwise r2 > 0.1 and then performed conditional analysis with PLINK 1.9 to identify SNPs with independent effects. For SNPs that remained significant or marginally significant in the conditional analysis, we constructed a polygenic risk score (PRS) by summing risk alleles weighted by their corresponding effect sizes in the MDACC study by using the “--score” function in PLINK 1.90. The PRS was standardized by its mean and SD, and its association with SCCHN risk was then estimated. The OR was reported per SD of the PRS.
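The inverse variance–weighted meta-analysis and the heterogeneity statistics (Q and I2) mentioned in this section can be sketched as follows. This is an illustrative Python sketch, not the software used in the study. It assumes the DerSimonian–Laird estimator for the between-study variance in the random-effects model, which the text does not specify, and it recovers each log OR and standard error from a reported OR and 95% CI. The function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def _summary(beta, se):
    z = beta / se
    return {
        "or": math.exp(beta),
        "ci": (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P value
    }


def meta_analysis(estimates):
    """Fixed- and random-effects inverse variance-weighted meta-analysis.

    estimates: list of (beta, se) pairs, one per study.
    Assumption: DerSimonian-Laird tau^2 for the random-effects model.
    """
    betas = [b for b, _ in estimates]
    w = [1 / se ** 2 for _, se in estimates]
    sw = sum(w)
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sw

    # Cochran's Q and I^2 (percentage of variation due to heterogeneity)
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1 / (se ** 2 + tau2) for _, se in estimates]
    sw_re = sum(w_re)
    beta_random = sum(wi * bi for wi, bi in zip(w_re, betas)) / sw_re

    return {
        "fixed": _summary(beta_fixed, math.sqrt(1 / sw)),
        "random": _summary(beta_random, math.sqrt(1 / sw_re)),
        "q": q,
        "i2": i2,
    }


# Example: rs259919 at 6p22.1, using the rounded ORs and CIs from Table 2
mdacc = log_or_and_se(1.15, 1.06, 1.25)
oncoarray = log_or_and_se(1.19, 1.11, 1.28)
result = meta_analysis([mdacc, oncoarray])
```

With these rounded inputs the combined OR and CI agree with the reported 1.17 (1.11–1.24) to two decimals; the P value differs somewhat from the published one because the inputs are rounded. When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the two models coincide.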

To explore the possible functions of SNPs at the final identified regions, we applied the online tool HaploReg v4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php), which integrates the Encyclopedia of DNA Elements data, to perform functional annotation. We also performed in silico expression quantitative trait loci (eQTL) analysis by using data from multiple sources: the lymphoblastoid cell lines of 373 European individuals from the Genetic European Variation in Health and Disease Consortium (GEUVADIS) and the 1000 Genomes Project (23); eQTL data of multiple tissues from the Genotype-Tissue Expression (GTEx) project (24); and SNP and mRNA expression data in primary tumor tissues from 344 patients with SCCHN of European ancestry in The Cancer Genome Atlas (TCGA) database (dbGaP accession no.: phs000178.v1.p1; ref. 25). Manhattan plots were generated in R using the package qqman; the regional association plots and LD plots were constructed on the basis of the 1000 Genomes European reference data (phase III, release date October 2014) by using LocusZoom and Haploview v4.2, respectively. SNP pruning was applied, and SNPs with pairwise r2 < 0.30 were considered independent. All other analyses were conducted with R (version 3.5.1) and SAS (version 9.4; SAS Institute), if not specified otherwise.

Characteristics of the study populations

The workflow of this GWAS is depicted in Supplementary Fig. S1. In the discovery dataset, the distributions of age and sex differed significantly between cases and controls (Table 1; P < 0.001): cases were older and more often male [mean age 57.9 (SD ±11.2) for cases and 50.0 (12.3) for controls, with 77.2% males among cases and 55.1% among controls]. Of the cases, there were 631 (29.1%) patients with oral cavity cancer, 1,144 (52.7%) with oropharyngeal cancer, and 394 (18.2%) with hypopharyngeal, laryngeal, or overlapping cancer sites; two cases had missing tumor site information.

Table 1.

Distributions of population characteristics in the two-phase study.

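| Variables | Discovery (MDACC) cases (a), # (N = 2,171) | % | Discovery (MDACC) controls, # (N = 4,493) | % | P | Replication (OncoArray) cases (b), # (N = 5,205) | % | Replication (OncoArray) controls, # (N = 3,232) | % | P |
|---|---|---|---|---|---|---|---|---|---|---|
| Age | | | | | <0.0001 | | | | | <0.0001 |
| Median (range) | 57 (18–94) | | 49 (18–89) | | | 59 (18–94) | | 58 (17–89) | | |
| Mean (SD) | 57.9 (11.2) | | 50.0 (12.3) | | | 59.7 (10.9) | | 58.1 (11.5) | | |
| Sex | | | | | <0.0001 | | | | | 0.001 |
| Female | 494 | 22.8 | 2,018 | 44.9 | | 1,344 | 25.8 | 940 | 29.1 | |
| Male | 1,677 | 77.2 | 2,475 | 55.1 | | 3,861 | 74.2 | 2,292 | 70.9 | |
| Tumor sites | | | | | | | | | | |
| Oral cancer | 631 | 29.1 | | | | 2,568 | 49.5 | | | |
| Oropharynx | 1,144 | 52.7 | | | | 2,328 | 44.8 | | | |
| Hypopharynx and larynx and other sites | 394 | 18.2 | | | | 295 | 5.7 | | | |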

aTwo cases were missing the tumor site information in the discovery study from the MDACC GWAS.

bIn the replication study from the OncoArray GWAS, there are 14 cases with missing site information.

Similar to the discovery population, the case group in the replication study had a higher proportion of males (74.2%) and was older (mean age of 59.7) than the control group (70.9% and mean age of 58.1, respectively; Table 1). Of the cases, there were 2,568 (49.5%) patients with oral cavity cancer, 2,328 (44.8%) with oropharyngeal cancer, and 295 (5.7%) with hypopharyngeal, laryngeal, or overlapping cancer.
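As a worked check of the sex comparison in Table 1, the counts can be run through a 2 × 2 test. The text does not state which test produced the reported P values, so the following Python sketch assumes a Pearson chi-square test without continuity correction; the function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Rows: female, male; columns: cases, controls (counts from Table 1)
chi2_disc, p_disc = chi2_2x2(494, 2018, 1677, 2475)   # discovery (MDACC)
chi2_rep, p_rep = chi2_2x2(1344, 940, 3861, 2292)     # replication (OncoArray)
```

Under this assumption the replication P value comes out close to the 0.001 reported in Table 1, and the discovery P value is far below 0.0001.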

Association analysis

We performed association analysis for SNPs with an imputation quality r2 ≥ 0.3 and an MAF ≥ 0.01. The imputation quality distributions for SNPs with an MAF ≥ 0.01 in the SCCHN GWAS and in the controls from the melanoma GWAS and SAGE study are shown in Supplementary Fig. S3A–F. The overall results of the discovery study are presented in Fig. 1A. There were 10,714 SNPs (i.e., 10,218 SNPs on autosomes and 496 SNPs on the X chromosome) with P ≤ 1.00 × 10−3 and 25 SNPs with P ≤ 5.00 × 10−8 in the MDACC discovery study. Quantile–quantile (Q–Q) plots of the observed and expected P values showed a moderate genomic inflation (λ) for the discovery results (λ = 1.035; Supplementary Fig. S4A). We then replicated the associations of these SNPs in the OncoArray study and found 94 and 87 SNPs located at three loci (6p22.1, 6p21.33, and 6p21.32) associated with SCCHN risk with P ≤ 5 × 10−8 in the fixed-effects and random-effects models of the meta-analyses, respectively (Supplementary Table S1). The results of the leading SNPs in each region are provided in Table 2 (in the meta-analysis of the discovery and replication studies, P = 2.96 × 10−9, 3.75 × 10−10, and 1.44 × 10−16 for SNP rs259919 at 6p22.1, rs1265081 at 6p21.33, and rs3135001 at 6p21.32 in a random-effects model, respectively).
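The genomic inflation factor λ quoted above is conventionally the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The Python sketch below follows that common definition; it is an assumption that the study computed λ this way, and the function name is hypothetical.

```python
import statistics

_NORMAL = statistics.NormalDist()
CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(pvalues):
    """Genomic inflation factor from two-sided P values in (0, 1]."""
    # A two-sided P value p corresponds to z = Phi^{-1}(p / 2), and chi2 = z^2
    chi2 = [_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Uniformly spread P values (the null expectation) give a value near 1
null_pvalues = [(i + 0.5) / 1000 for i in range(1000)]
lam_null = genomic_inflation(null_pvalues)
```

Values well above 1 would suggest residual population stratification or other systematic bias.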

Figure 1.

Manhattan plots of the association results in the discovery study. Overall SCCHN risk (A); oral cancer (B); oropharyngeal cancer (C); and hypopharyngeal/laryngeal cancers (D). Dotted line, P = 5 × 10–8. The y-axis represents the –log10P values.

Table 2.

Association results of leading SNPs with P ≤ 1 × 10−3 in the discovery dataset and P ≤ 5 × 10−8 in the random-effects model of the final meta-analysis.

| Region | SNP | Chr:pos (hg19) | Gene | Eff/ref | MDACC cases/controls (a) | MDACC OR (95% CI) (b) | MDACC P (b) | OncoArray cases/controls (a) | OncoArray OR (95% CI) (c) | OncoArray P (c) | Meta-analysis OR (95% CI) (d) | Meta-analysis P (d) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall SCCHN | | | | | | | | | | | | |
| 6p22.1 | rs259919 | 6:30025503 | ZNRD1-AS1 | A/G | 0.34/0.31 | 1.15 (1.06–1.25) | 7.65E-04 | 0.34/0.28 | 1.19 (1.11–1.28) | 8.51E-07 | 1.17 (1.11–1.24) | 2.96E-09 |
| 6p21.33 | rs1265081 | 6:31111675 | CCHCR1 | C/A | 0.47/0.5 | 0.85 (0.79–0.92) | 6.71E-05 | 0.45/0.48 | 0.85 (0.80–0.91) | 1.35E-06 | 0.85 (0.81–0.9) | 3.75E-10 |
| 6p21.32 | rs3135001 | 6:32670136 | HLA-DQB1 | T/C | 0.21/0.25 | 0.76 (0.69–0.83) | 9.89E-09 | 0.19/0.21 | 0.78 (0.72–0.85) | 2.35E-09 | 0.77 (0.73–0.82) | 1.44E-16 |
| Oral cavity cancer | | | | | | | | | | | | |
| 5p15.33 | rs10462706 | 5:1343794 | CLPTM1L | T/C | 0.12/0.15 | 0.72 (0.60–0.88) | 9.65E-04 | 0.14/0.16 | 0.73 (0.65–0.81) | 2.10E-08 | 0.73 (0.66–0.80) | 7.87E-11 |
| 6p21.32 | rs1049055 | 6:32634387 | HLA-DQB1 | C/T | 0.23/0.27 | 0.76 (0.66–0.89) | 4.86E-04 | 0.19/0.22 | 0.79 (0.72–0.87) | 1.45E-06 | 0.78 (0.72–0.85) | 2.96E-09 |
| Oropharyngeal cancer | | | | | | | | | | | | |
| 2p23.1 | rs4318431 | 2:31098065 | GALNT14 | T/C | 0.10/0.08 | 1.43 (1.21–1.69) | 3.40E-05 | 0.10/0.08 | 1.37 (1.18–1.58) | 2.14E-05 | 1.39 (1.25–1.55) | 3.13E-09 |
| 6p21.33 | rs13211972 | 6:30959001 | MUC21 | A/G | 0.08/0.05 | 1.64 (1.35–1.99) | 7.79E-07 | 0.06/0.04 | 1.48 (1.23–1.77) | 2.35E-05 | 1.55 (1.36–1.77) | 1.04E-10 |
| 6p21.32 | rs34518860 | 6:32594103 | HLA-DQA1 | A/G | 0.06/0.11 | 0.57 (0.48–0.69) | 6.6E-09 | 0.07/0.11 | 0.63 (0.54–0.73) | 5.40E-10 | 0.61 (0.54–0.68) | 2.61E-17 |
| Hypopharyngeal and laryngeal cancer | | | | | | | | | | | | |
| 18q22.2 | rs142021700 | 18:67701583 | RTTN | C/T | 0.03/0.01 | 4.03 (2.25–7.21) | 2.83E-06 | 0.03/0.01 | 3.84 (1.88–7.85) | 2.28E-04 | 3.95 (2.51–6.21) | 2.54E-09 |

Abbreviations: Chr:pos, chromosome:position; Eff/ref, effect allele/reference allele.

aMAF in cases and controls.

bAdjusted for top five significant PCs, age and sex in the MDACC study with 2,171 SCCHN cases, 631 patients with oral cancer, 1,144 patients with oropharyngeal cancer, and 394 patients with hypopharyngeal and laryngeal cancer, versus 4,493 controls.

cAdjusted for top three significant PCs, age, sex, and continent in the OncoArray study with 5,205 SCCHN cases, 2,568 patients with oral cavity cancer, 2,328 patients with oropharyngeal cancer, and 295 patients with hypopharyngeal and laryngeal cancer versus 3,232 controls.

dMeta-analysis with random-effects model.

Further stratified analysis by tumor site revealed that there were 23, 75, and one SNPs with P ≤ 5.00 × 10−8 in association with risk of oral cavity cancer, oropharyngeal cancer, and hypopharyngeal/laryngeal cancers in the discovery study, respectively (Fig. 1B–D). We then selected SNPs with P ≤ 1.00 × 10−3 (i.e., 8,658, 12,454, and 9,062 SNPs in the three subpopulations of the discovery study, respectively) for replication with the OncoArray dataset (Supplementary Tables S2–S4). In the random-effects model of the meta-analysis, two loci associated with oral cavity cancer risk reached genome-wide significance [the leading SNPs rs10462706 in CLPTM1L at 5p15.33 and rs1049055 in HLA-DQB1 at 6p21.32, with P = 7.87 × 10−11 (OR, 0.73; 95% CI, 0.66–0.80) and P = 2.96 × 10−9 (OR, 0.78; 95% CI, 0.72–0.85), respectively]. Three loci (2p23.1, 6p21.33, and 6p21.32) were found to be associated with oropharyngeal cancer risk, with the leading SNPs rs4318431 (P = 3.13 × 10−9; OR, 1.39; 95% CI, 1.25–1.55) near the gene GALNT14; rs13211972 in MUC21 (P = 1.04 × 10−10; OR, 1.55; 95% CI, 1.36–1.77); and rs34518860 (P = 2.61 × 10−17; OR, 0.61; 95% CI, 0.54–0.68) in HLA-DQA1, respectively. We also identified one novel locus (18q22.2) associated with risk of hypopharyngeal and laryngeal cancers (P = 2.54 × 10−9; OR, 3.95; 95% CI, 2.51–6.21 for the leading SNP rs142021700 in RTTN; Supplementary Tables S2 and S4). Q–Q plots of the stratified results are shown in Supplementary Figs. S4B–S4D, and regional association plots for each identified locus are presented in Fig. 2A–I.

Figure 2.

The genetic regions associated with SCCHN and three subtypes. 6p22.1 in overall SCCHN (A); 6p21.33 in overall SCCHN (B); 6p21.32 in overall SCCHN (C); 5p15.33 region in oral cancer (D); 6p21.32 in oral cancer (E); 2p23.1 in oropharyngeal cancer (F); 6p21.33 region in oropharyngeal cancer (G); 6p21.32 region in oropharyngeal cancer (H); and 18q22.2 in hypopharyngeal and laryngeal cancer (I). The association results were based on the discovery study.


To identify independent SNPs, we also performed clump analysis, which revealed four low-LD clumps among the 87 SNPs associated with overall SCCHN risk. In the subsequent conditional analysis, we found that each of four SNPs (rs3129726 and rs62404579 at 6p21.32, rs1265081 at 6p21.33, and rs259919 at 6p22.1) remained significant in the presence of the three other SNPs (P < 0.05; Supplementary Table S5). Similarly, one SNP (rs7713218 at 5p15.33) remained significant after conditioning on the two lead SNPs at 5p15.33 and 6p21.32 for oral cavity cancer (Supplementary Table S6); five SNPs at 6p21.33 remained significantly associated with oropharyngeal cancer risk after conditioning on SNPs at the two reported loci 6p21.32 and 2p23.1; and no SNP was significantly associated with hypopharyngeal and laryngeal cancer risk after conditioning on the leading SNP at 18q22.2 (Supplementary Tables S7 and S8). These results suggested that independent signals at three loci (6p22.1, 6p21.33, and 6p21.32) contribute to the risks of overall SCCHN and oropharyngeal cancer.
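The clump analysis can be pictured as a greedy procedure: SNPs are ranked by P value, the most significant remaining SNP becomes an index SNP, and all remaining SNPs in LD with it above the threshold are assigned to its clump. The Python sketch below illustrates that idea under simplifying assumptions: LD is approximated by the squared Pearson correlation of genotype dosages, no physical distance window is applied, and the SNP names and data are made up for illustration. It is not PLINK's implementation.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def clump(pvalues, genotypes, r2_max=0.1):
    """Greedy clumping: returns {index_snp: [SNPs clumped with it]}.

    pvalues: dict SNP -> association P value.
    genotypes: dict SNP -> list of dosages, same sample order for all SNPs.
    """
    remaining = sorted(pvalues, key=pvalues.get)
    clumps = {}
    while remaining:
        index = remaining.pop(0)  # most significant SNP still unassigned
        members = [s for s in remaining
                   if r_squared(genotypes[index], genotypes[s]) > r2_max]
        clumps[index] = members
        remaining = [s for s in remaining if s not in members]
    return clumps


# Hypothetical toy data: snp2 duplicates snp1, snp3 is uncorrelated with both
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [0, 0, 0, 1, 1, 1],
}
toy_pvalues = {"snp1": 1e-9, "snp2": 1e-5, "snp3": 1e-8}
toy_clumps = clump(toy_pvalues, toy_genotypes)
```

The index SNPs of the resulting clumps are the candidates carried forward to conditional analysis.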

The two previous GWASs and candidate gene–based studies have reported 31 loci (including 33 leading SNPs) associated with risk of oral, oropharyngeal, pharyngeal, and laryngeal cancers (12, 13). We extracted the results from GWAS catalog (https://www.ebi.ac.uk/gwas/home) and investigated their association in the MDACC GWAS. As a result, we found five loci could be replicated in this SCCHN GWAS study (i.e., rs10462706 at locus 5p15.33 and rs1800628 at 6p21.33 with oral cancer risk, rs2216824 at 2p23.1 and rs1453414 at 11p15.4 with oropharyngeal cancer risk, and rs1229984 at 4q23 with hypopharyngeal and laryngeal cancer risk; Supplementary Table S9).

Because multiple SNPs in the HLA region have been associated with overall SCCHN risk and oral/oropharyngeal cancer risk, we then performed HLA imputation to identify the specific HLA alleles associated with cancer risk. Among alleles with an imputation info score ≥ 0.3, we found 53 HLA alleles associated with overall SCCHN risk at P < 0.05, of which three reached genome-wide significance (HLA-B*37, HLA-B*3701, and HLA-DQB1*06; Supplementary Table S10); 28 HLA alleles associated with oral cancer risk, of which two reached genome-wide significance (HLA-B*37 and HLA-B*3701; Supplementary Table S11); 59 HLA alleles associated with oropharyngeal cancer risk (Supplementary Table S12), of which four (i.e., HLA-B*37, HLA-B*3701, HLA-DQB1*06, and HLA-DRB1*13) reached genome-wide significance; and 23 HLA variants associated with hypopharyngeal and laryngeal cancer risk (Supplementary Table S13). As shown in Supplementary Tables S12 and S14, we also replicated the associations of three reported HLA-specific alleles (DRB1*1301, DQA1*0103, and DQB1*0603) and their haplotype with decreased oropharyngeal cancer risk (P = 6.5 × 10−6, 4.16 × 10−7, 6.2 × 10−7, and 1.95 × 10−7, respectively; ref. 13).

We also constructed PRS by summing the effects of the 12 SNPs that remained significant or marginally significant in the conditional analysis of oropharyngeal cancer (i.e., rs73730372, rs28366328, rs9469220, rs13211972, rs17207190, rs114202986, rs144112342, rs2194452, rs41258944, rs114949918, rs3131013, and rs147748716), and analyzed its association with oropharyngeal cancer risk. As a result (Supplementary Table S15), the PRS showed a significant association (P < 2.00 ×10−16) with oropharyngeal cancer risk with an OR per SD of the PRS of 1.49 (95% CI, 1.39–1.60) and 1.38 (95% CI, 1.30–1.46) in the MDACC study and OncoArray study, respectively.
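The PRS construction described in the methods (a sum of risk-allele dosages weighted by effect sizes, then standardized by the mean and SD) can be written out in a few lines. The Python sketch below is only an illustration: the function names, SNP labels, weights, and dosages are hypothetical, and the weights would in practice be the per-allele log ORs from the MDACC study.

```python
import statistics


def polygenic_risk_scores(dosages, weights):
    """Weighted sum of risk-allele dosages for each sample.

    dosages: list of dicts, one per sample, mapping SNP -> dosage (0 to 2).
    weights: dict mapping SNP -> per-allele effect size (log OR).
    """
    return [sum(weights[snp] * sample[snp] for snp in weights)
            for sample in dosages]


def standardize(scores):
    """Center by the mean and scale by the sample SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical weights and dosages for three samples
toy_weights = {"snpA": 0.5, "snpB": 0.2}
toy_dosages = [
    {"snpA": 0, "snpB": 0},
    {"snpA": 1, "snpB": 1},
    {"snpA": 2, "snpB": 2},
]
toy_prs = standardize(polygenic_risk_scores(toy_dosages, toy_weights))
```

Once the score is standardized, the coefficient from a logistic regression of case status on the score is a log OR per SD, which is the scale on which the ORs above are reported.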

The in silico functional annotation

Functional annotations for the identified representative genetic variants reaching P < 5 × 10−8 are summarized in Supplementary Table S16. There were 179 SNPs at 6p21.32 and 6p21.33 with potential effects on promoter or enhancer activities and with significant eQTL evidence. We also retrieved the eQTL results of multiple tissues from GTEx for the lead SNPs or LD SNPs significantly correlated with corresponding mRNA expression levels. For example, the variant allele A of rs259919 at 6p22.1 was significantly correlated with increased mRNA expression levels of ZFP57 and HLA-DQB1 (Supplementary Fig. S5A), while the variant allele A of rs13211972 at 6p21.33 and the wild allele T of rs1049055 at 6p21.32 were associated with decreased mRNA expression levels of MICA and HLA-DQB1, respectively, in multiple tissues (Supplementary Fig. S5B and S5C). The effect of rs10462706 at 5p15.33 on mRNA expression levels of CLPTM1L differed by tissue (Supplementary Fig. S5D). However, we did not find any evidence for effects of rs78082910 in the 2p23.1 region on the expression of nearby genes (https://www.gtexportal.org/home/snp/rs78082910). We also performed eQTL analysis in the primary tumor tissues of 344 patients with SCCHN from TCGA for whom both genotyping/imputation data and mRNA expression data were available. As one of the lead SNPs, rs73730372, was not included in the TCGA data, we used rs115625939, a SNP in high LD with it, in the eQTL analysis. Of the five loci, we found that the variant alleles of rs115625939 (in high LD with the representative SNP rs73730372; r2 = 0.89) at 6p21.32 and rs27069 (in high LD with the representative SNP rs2447853; r2 = 0.69) at 5p15.33 were correlated with upregulated mRNA expression of HLA-DQB1 and CLPTM1L in the tumor tissues of head and neck cancer, respectively (Supplementary Fig. S6A and S6B: P = 0.012 and 0.008, respectively). No significant association was found in the eQTL analyses for MICA and GALNT14 in TCGA, and mRNA expression data were unavailable for ZFP57. We also found that the identified SNP rs73730372 at 6p21.32 was significantly correlated with the mRNA expression of HLA-DQB1 (Supplementary Fig. S6C: P = 3.67 × 10−10) in the lymphoblastoid cell lines of 358 European individuals from the 1000 Genomes Project.

We also performed differential expression analysis for the five genes in the identified regions (Table 2) by using the mRNA expression data from 520 SCCHN tumor tissues and 44 adjacent normal tissues in TCGA (http://ualcan.path.uab.edu/cgi-bin/ualcan-res.pl). The mRNA expression levels of all five genes were significantly higher in the primary tumor tissues than in the adjacent normal tissues (P < 0.05; Supplementary Fig. S7A–S7E for ZFP57, MICA, HLA-DQB1, CLPTM1L, and GALNT14, respectively). Ten other genes at the five loci (HLA-DRB1, HLA-DQA1, HLA-DQA2, PSORS1C3, HCG27, MICA, HCP5, HLA-DRB5, DPCR1, and MUC21) also showed significant differences in mRNA expression between the tumor tissues and adjacent normal tissues (http://ualcan.path.uab.edu/cgi-bin/TCGAExResultNew2.pl?genenam=HLA-DRB1, HLA-DQA1, HLA-DQA2, PSORS1C3, HCG27, MICA, HCP5, HLA-DRB5, DPCR1, MUC21&ctype = HNSC).
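
A tumor-versus-normal comparison of this kind can be sketched as a two-sample test on expression values. The example below assumes a Welch (unequal-variance) t statistic; the test applied by the UALCAN portal is not described in this section, and the function name `welch_t` and the toy values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Hypothetical helper: compares mean expression in tumor samples with
    mean expression in adjacent normal samples without assuming equal variances.
    """
    n1, n2 = len(tumor), len(normal)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")
    # Squared standard error of each group mean.
    v1 = variance(tumor) / n1
    v2 = variance(normal) / n2
    if v1 + v2 == 0:
        raise ValueError("both groups have zero variance")
    t_stat = (mean(tumor) - mean(normal)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Toy log-scale expression values (invented for illustration).
toy_tumor = [6.1, 6.4, 5.9, 6.8, 6.3]
toy_normal = [5.0, 5.2, 4.9, 5.1]
t_stat, df = welch_t(toy_tumor, toy_normal)
print(f"t = {t_stat:.2f} with {df:.1f} degrees of freedom")
```

The unequal-variance form is a cautious default when group sizes differ as much as 520 tumors and 44 normal tissues.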

In this GWAS of SCCHN, we aimed to identify additional genetic loci associated with risk of SCCHN and its subtypes by using a discovery study followed by an independent replication study. In the meta-analysis of the two GWAS datasets, we identified SNPs at six genomic regions (i.e., 2p23.1, 5p15.33, 6p21.32, 6p21.33, 6p22.1, and 18q22.2) associated with risk of SCCHN or its subtypes at the genome-wide significance level. Four of the regions are known SCCHN risk loci (i.e., 2p23.1, 5p15.33, 6p21.32, and 6p21.33), while the two other loci (i.e., 6p22.1 and 18q22.2) are novel findings for SCCHN risk. Functional annotation revealed that multiple SNPs in these regions are potentially functional, because they may affect the mRNA expression of genes in these regions. The most prominent finding in the overall and stratified meta-analyses was a strong association signal at 6p21.32 within the HLA class II region. SNPs in this region showed significant associations with risk of SCCHN and all three subtypes, especially oropharyngeal cancer, in which HPV infection plays an etiologic role.
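
Combining the discovery and replication results can be illustrated with a fixed-effect, inverse-variance-weighted meta-analysis of per-study log odds ratios. This is a sketch under the assumption that such a model applies; the meta-analysis method is not specified in this section, and the function name `fixed_effect_meta` and the input values are hypothetical.

```python
import math

GWAS_THRESHOLD = 5e-8  # genome-wide significance level used in the text


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled SE, z score, two-sided normal P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided P value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Toy per-study estimates (invented; discovery first, replication second).
beta, se, z, p = fixed_effect_meta([0.20, 0.18], [0.05, 0.04])
print(f"pooled OR = {math.exp(beta):.3f}, z = {z:.2f}, P = {p:.2e}")
print("genome-wide significant" if p < GWAS_THRESHOLD else "not genome-wide significant")
```

Because each study is weighted by the inverse of its variance, the larger or more precise study contributes more to the pooled estimate.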

The HLA system has long been recognized in humans as a very important genomic region relating to infection, inflammation, autoimmunity, and transplantation medicine (26). The HLA system is categorized into class I, II, and III regions and consists of more than 200 genes that have multiple biological functions, with an emphasis on immunologic functions (27). Specifically, SNP rs9273448/rs1049225 maps to the 3′ untranslated region of the gene MHC class II, DQ beta-1, also called HLA-DQB1, which belongs to the HLA class II beta chain paralogs. The protein encoded by this gene is one of two proteins required to form the DQ heterodimer, a cell surface receptor essential to the function of the immune system. The identified SNPs for all SCCHN, oral, and mixed hypopharyngeal and laryngeal cancers are mainly located around HLA-DQB1/DQA1, while the identified SNPs for oropharyngeal cancer are distributed over a wide range covering multiple HLA genes, including HLA-B, HCP5, HLA-DRA1, HLA-DRB1, HLA-DQA1, and HLA-DQB1. Previous studies have reported significant associations between HLA-DQB1 polymorphisms and risks of HPV-related oropharyngeal cancer (13), cervical cancers (13, 28), cutaneous melanoma (29), gastric adenocarcinoma (30, 31), breast cancer (32), and nasopharyngeal carcinoma. In this GWAS of SCCHN, we replicated a previous report that three HLA alleles (DRB1*1301, DQA1*0103, and DQB1*0603) as well as their haplotypes were associated with risks of SCCHN and oropharyngeal cancer (13). In addition, we have shown that the variant allele of rs73730372 was associated with both higher mRNA expression levels of HLA-DQB1 and lower risk of oropharyngeal cancer, which is consistent with previous reports that HLA-DQB1 may be involved in the HPV-specific immune response (33). We also observed similar results by using the TCGA expression data, which indicated that those HLA genes had higher expression levels in the SCCHN tumor tissues than in adjacent normal tissues. Further functional studies are warranted to illuminate the underlying biological mechanisms.

The closest gene at the replicated locus 2p23.1 is GALNT14, which encodes a Golgi protein that catalyzes the transfer of N-acetyl-D-galactosamine (GalNAc) to large proteins such as mucins (34). Aberrant glycosylation is a hallmark of most human cancers and affects many cellular properties, including cell proliferation, apoptosis, differentiation, transformation, migration, invasion, and immune responses (35). GALNT14 has been reported to be involved in the initial step of mucin-type O-glycosylation and thus to play a critical role in the invasion and migration of breast cancers by regulating the activity of MMP-2 and the expression of some epithelial–mesenchymal transition genes (36). Another study suggested that GALNT14 might contribute to ovarian carcinogenesis through aberrant glycosylation of MUC13, whose expression is dysregulated in many human cancers (37, 38). In this GWAS of SCCHN, we also observed that GALNT14 had higher mRNA expression levels in the SCCHN tumors than in the adjacent normal tissues. Interestingly, we also found that SNPs in MUC21/MUC22 (6p21.33) were associated with risk of overall SCCHN and oropharyngeal cancer. Splicing variants and mutations in mucin genes have been observed in various cancers and shown to participate in cancer progression and metastasis (39). The nearby gene MUC21, located at 6p21.33 close to the HLA class I region, encodes a membrane-associated member of the mucin family (40). Clinically, mucins are used as carcinoma markers and therapeutic targets for cancer treatment (40, 41). The protein encoded by MUC21 has been shown to be expressed by adenocarcinomas of the lung (40). However, by using the TCGA data, we observed that MUC21 had higher expression levels in the normal tissues than in the SCCHN tumor tissues, which implies that this gene might play a different role in the development of SCCHN. Previous association studies of genetic variants have also linked the MUC21 gene to noncancer diseases and traits (e.g., Stevens–Johnson syndrome/toxic epidermal necrolysis, and pulmonary function; refs. 42, 43). However, few studies have investigated the functions of the MUC22 gene. In addition, we revealed that the variant allele of SNP rs13211972 in the 6p21.33 region was significantly correlated with decreased mRNA expression levels of MICA and an increased risk of SCCHN. MICA encodes a membrane-bound protein that acts as a ligand of natural killer (NK) group 2D (NKG2D) to trigger NK-cell–mediated cytotoxicity. MICA has an antitumor property, because its expression is induced in stressed cells, such as transformed tumor cells, which marks them for detection by NK cells (44). Several studies have reported that SNPs in MICA are associated with risk of cervical squamous cell carcinoma (45, 46) and HCV-induced hepatocellular carcinoma (47). The association between the MICA short tandem repeat polymorphism and risk of oral squamous cell carcinoma has been investigated in several candidate gene studies but with conflicting results, which might be due to small sample sizes (48–50). Considering the relatively large sample of this GWAS of SCCHN, our results provide strong evidence that individuals carrying SNPs associated with lower mRNA expression levels of MICA might have an increased risk of SCCHN and oropharyngeal cancer. Genetic variants in this region have also been reported to be associated with risk of lung cancer and follicular lymphoma, and the susceptibility gene BAT3 was found to be involved in DNA damage–induced apoptosis and to modulate the acetylation of p53 during autophagy (51–53). In addition, in the HLA allele analysis, we observed that the HLA-B*37 allele was associated with the risks of SCCHN, oral cancer, and oropharyngeal cancer. Further functional validation for those susceptibility genes is warranted.

We also found that the variant allele of SNP rs259919 in the 6p22.1 region was significantly correlated with decreased mRNA expression levels of ZFP57 in multiple tissues and an increased risk of SCCHN. ZFP57 is an important transcriptional regulator involved in DNA methylation and genomic imprinting during development (54). Previous studies have also suggested that, through its role in DNA methylation and epigenetic regulation, ZFP57 may have implications for disease.

In this GWAS of SCCHN, we also showed a significant association between the variant allele of SNP rs2447853 at 5p15.33 and an increased risk of oral cavity cancer, which confirms the previous finding (13). This locus was also reported to be associated with lung cancer risk, and genetic variants in TERT-CLPTM1L have been reported to be associated with DNA-adduct levels in the lung (51, 55). By using the GTEx data, we found that the variant allele of rs2447853 was significantly correlated with increased mRNA expression levels of CLPTM1L in multiple tissues (e.g., small intestine, colon, and esophagus), which provides some biological evidence for the identified association. However, further functional studies are warranted.

It should be noted that rs1229984 (4q23, ADH1B) has previously been reported as a susceptibility locus for oral cancer and oropharyngeal cancer in several studies. In this GWAS of SCCHN, we found that this SNP was associated only with risk of hypopharyngeal and laryngeal cancers, but not with risk of oral cavity or oropharyngeal cancer. Such discrepancies might be due to population heterogeneity.

In summary, in this GWAS of SCCHN in non-Hispanic whites, we identified two novel common loci that might influence SCCHN risk and replicated some previously reported loci, which highlights the importance of genetic variation in genes of the HLA system (e.g., HLA-DQB1, HLA-DQA1, and MUC21) in the development of SCCHN. These findings suggest that immunologic mechanisms are implicated in the etiology of SCCHN, particularly in oropharyngeal cancer. Replication of these findings in other independent populations is warranted, and additional functional studies are necessary to establish the biological framework underlying the observed associations.

E.M. Sturgis reports receiving other commercial research support from Roche. No potential conflicts of interest were disclosed by the other authors.

Conception and design: S. Shete, E.M. Sturgis, G. Li, C.I. Amos, Q. Wei

Development of methodology: S. Shete, H. Liu, Q. Wei

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Shete, E.M. Sturgis, G. Li, Z. Liu, C.I. Amos, Q. Wei

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Shete, H. Liu, J. Wang, R. Yu, E.M. Sturgis, K.R. Dahlstrom, Q. Wei

Writing, review, and/or revision of the manuscript: S. Shete, H. Liu, J. Wang, E.M. Sturgis, G. Li, K.R. Dahlstrom, Z. Liu, C.I. Amos, Q. Wei

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Shete, H. Liu, E.M. Sturgis, Q. Wei

Study supervision: S. Shete, E.M. Sturgis, Q. Wei

S. Shete was supported in part by the NIH grants 1R01CA131324 and R01DE022891; the Cancer Prevention Research Institute of Texas grant RP170259; the Barnhart Family Distinguished Professorship in Targeted Therapy; and the Betty B. Marcus Chair in Cancer Prevention. S. Shete and J. Wang were supported in part by the Cancer Center Support grant P30CA016672. Q. Wei was supported by NIH grants 2R01 ES011740 and 1R01CA 131274 and the Duke Cancer Institute as part of the P30 Cancer Center Support grant (Grant ID: NIH/NCI CA014236).

Part of the controls were from the melanoma GWAS of MDACC, which was deposited in dbGaP (accession no.: phs000187.v1.p1). Research support to collect data and develop an application to support this project was provided by 3P50CA093459, 5P50CA097007, R01CA100264, and 5R01CA133996.

Part of the controls were requested from the SAGE in dbGaP. Funding support for the SAGE was provided through the NIH Genes, Environment and Health Initiative (GEI, U01 HG004422). SAGE is one of the GWASs funded as part of the Gene Environment Association Studies (GENEVA) under GEI. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the National Center for Biotechnology Information. Support for collection of datasets and samples was provided by the Collaborative Study on the Genetics of Alcoholism (U10 AA008401), the Collaborative Genetic Study of Nicotine Dependence (P01 CA089392), and the Family Study of Cocaine Dependence (R01 DA013423). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract “High throughput genotyping for studying the genetic contributions to human disease” (HHSN268200782096C). The datasets used for the analyses described in this article were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000092.v1.p1 through dbGaP accession number phs000092.v1.p.

The replication data were from the study of OncoArray: Oral and Pharynx Cancer (dbGaP study accession no.: phs001202.v1.p1) in dbGaP. Genotyping performed at the CIDR was supported through contract number HHSN268201200008I: funds were provided by the U.S. National Institute of Dental and Craniofacial Research grant X01HG007780; funds were also provided by the NCI for genotyping for shared controls with the Lung OncoArray initiative (grant X01HG007492). University of Pittsburgh head and neck cancer study: grants P50 CA097190 and P30 CA047904. Carolina Head and Neck Cancer Study: R01-CA90731. GENCAPO: FAPESP, grant numbers 04/12054-9 and 10/51168-0. HN5000 study: NIHR RP-PG-0707-10034. Toronto study: the Canadian Cancer Society Research Institute (020214) and NCI U19 CA148127. ARCAGE study: European Commission's 5th Framework Program (QLK1-2001-00182), FIRMS, Region Piemonte, and Padova University (CPDA057222). Rome Study: AIRC IG 2011 10491 and IG2013 14220, and Fondazione Veronesi. IARC Latin American study: European Commission INCO-DC IC18-CT97-0222, Fondo para la Investigacion Cientifica y Tecnologica (Argentina), and Fundação de Amparo à Pesquisa do Estado de São Paulo (01/01768-2). IARC Central Europe study: INCO-COPERNICUS Program (IC15-CT98-0332), and NCI CA92039 and WCRF99A28. IARC Oral Cancer Multicenter study: Europe against Cancer (S06 96 202489 05F02), Spain FIS 97/0024, FIS 97/0662, BAE 01/5013, UICC Yamagiwa-Yoshida, National Cancer Institute of Canada, AIRC, and PAHO/WHO. EPIC study: European Commission (DG SANCO) and IARC.

eQTL analysis was performed by using the genotyping data and mRNA expression data from the primary tumor tissues of 344 patients with SCCHN of European ancestry in the TCGA database (dbGaP accession no.: phs000178.v1.p1). The results published here are in whole or in part based upon data generated by TCGA, managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Parkin DM, Pisani P, Ferlay J. Global cancer statistics. CA Cancer J Clin 1999;49:33–64.
2. Wang M, Chu H, Zhang Z, Wei Q. Molecular epidemiology of DNA repair gene polymorphisms and head and neck cancer. J Biomed Res 2013;27:179–92.
3. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin 2013;63:11–30.
4. Chaturvedi AK, Engels EA, Pfeiffer RM, Hernandez BY, Xiao W, Kim E, et al. Human papillomavirus and rising oropharyngeal cancer incidence in the United States. J Clin Oncol 2011;29:4294–301.
5. Hashibe M, Brennan P, Benhamou S, Castellsague X, Chen C, Curado MP, et al. Alcohol drinking in never users of tobacco, cigarette smoking in never drinkers, and the risk of head and neck cancer: pooled analysis in the International Head and Neck Cancer Epidemiology Consortium. J Natl Cancer Inst 2007;99:777–89.
6. Sturgis EM, Cinciripini PM. Trends in head and neck cancer incidence in relation to smoking prevalence: an emerging epidemic of human papillomavirus-associated cancers? Cancer 2007;110:1429–35.
7. Negri E, Boffetta P, Berthiller J, Castellsague X, Curado MP, Dal Maso L, et al. Family history of cancer: pooled analysis in the International Head and Neck Cancer Epidemiology Consortium. Int J Cancer 2009;124:394–401.
8. Ho T, Wei Q, Sturgis EM. Epidemiology of carcinogen metabolism genes and risk of squamous cell carcinoma of the head and neck. Head Neck 2007;29:682–99.
9. Neumann AS, Sturgis EM, Wei Q. Nucleotide excision repair as a marker for susceptibility to tobacco-related cancers: a review of molecular epidemiological studies. Mol Carcinog 2005;42:65–92.
10. Wei S, Liu Z, Zhao H, Niu J, Wang LE, El-Naggar AK, et al. A single nucleotide polymorphism in the alcohol dehydrogenase 7 gene (alanine to glycine substitution at amino acid 92) is associated with the risk of squamous cell carcinoma of the head and neck. Cancer 2010;116:2984–92.
11. Hashibe M, McKay JD, Curado MP, Oliveira JC, Koifman S, Koifman R, et al. Multiple ADH genes are associated with upper aerodigestive cancers. Nat Genet 2008;40:707–9.
12. Wei Q, Yu D, Liu M, Wang M, Zhao M, Liu M, et al. Genome-wide association study identifies three susceptibility loci for laryngeal squamous cell carcinoma in the Chinese population. Nat Genet 2014;46:1110–4.
13. Lesseur C, Diergaarde B, Olshan AF, Wunsch-Filho V, Ness AR, Liu G, et al. Genome-wide association analyses identify new susceptibility loci for oral cavity and pharyngeal cancer. Nat Genet 2016;48:1544–50.
14. Neumann AS, Lyons HJ, Shen H, Liu Z, Shi Q, Sturgis EM, et al. Methylenetetrahydrofolate reductase polymorphisms and risk of squamous cell carcinoma of the head and neck: a case-control analysis. Int J Cancer 2005;115:131–6.
15. Li G, Sturgis EM, Wang LE, Chamberlain RM, Amos CI, Spitz MR, et al. Association of a p73 exon 2 G4C14-to-A4T14 polymorphism with risk of squamous cell carcinoma of the head and neck. Carcinogenesis 2004;25:1911–6.
16. Chen X, Sturgis EM, Lei D, Dahlstrom K, Wei Q, Li G. Human papillomavirus seropositivity synergizes with MDM2 variants to increase the risk of oral squamous cell carcinoma. Cancer Res 2010;70:7199–208.
17. World Health Organization. International statistical classification of diseases, injuries, and causes of death. Geneva, Switzerland: World Health Organization; 1992.
18. Percy C, Van Holten VD, Muir C. International classification of diseases for oncology. Geneva, Switzerland: World Health Organization; 1990.
19. Wyss A, Hashibe M, Chuang SC, Lee YC, Zhang ZF, Yu GP, et al. Cigarette, cigar, and pipe smoking and the risk of head and neck cancers: pooled analysis in the International Head and Neck Cancer Epidemiology Consortium. Am J Epidemiol 2013;178:679–90.
20. Amos CI, Wang LE, Lee JE, Gershenwald JE, Chen WV, Fang S, et al. Genome-wide association study identifies novel loci predisposing to cutaneous melanoma. Hum Mol Genet 2011;20:5012–23.
21. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007;39:1181–6.
22. Li Y, Byun J, Cai G, Xiao X, Han Y, Cornelis O, et al. FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data. BMC Bioinformatics 2016;17:122.
23. Lappalainen T, Sammeth M, Friedlander MR, t Hoen PA, Monlong J, Rivas MA, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 2013;501:506–11.
24. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648–60.
25. Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 2015;517:576–82.
26. Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 2009;54:15–39.
27. Horton R, Wilming L, Rand V, Lovering RC, Bruford EA, Khodiyar VK, et al. Gene map of the extended human MHC. Nat Rev Genet 2004;5:889–99.
28. Zhang X, Lv Z, Yu H, Wang F, Zhu J. The HLA-DQB1 gene polymorphisms associated with cervical cancer risk: a meta-analysis. Biomed Pharmacother 2015;73:58–64.
29. Lee JE, Reveille JD, Ross MI, Platsoucas CD. HLA-DQB1*0301 association with increased cutaneous melanoma risk. Int J Cancer 1994;59:510–3.
30. Wu MS, Hsieh RP, Huang SP, Chang YT, Lin MT, Chang MC, et al. Association of HLA-DQB1*0301 and HLA-DQB1*0602 with different subtypes of gastric cancer in Taiwan. Jpn J Cancer Res 2002;93:404–10.
31. Lee JE, Lowy AM, Thompson WA, Lu M, Loflin PT, Skibber JM, et al. Association of gastric adenocarcinoma with the HLA class II gene DQB1*0301. Gastroenterology 1996;111:426–32.
32. Chaudhuri S, Cariappa A, Tang M, Bell D, Haber DA, Isselbacher KJ, et al. Genetic susceptibility to breast cancer: HLA DQB*03032 and HLA DRB1*11 may represent protective alleles. Proc Natl Acad Sci U S A 2000;97:11451–4.
33. Peng S, Trimble C, Wu L, Pardoll D, Roden R, Hung CF, et al. HLA-DQB1*02-restricted HPV-16 E7 peptide-specific CD4+ T-cell immune responses correlate with regression of HPV-16-associated high-grade squamous intraepithelial lesions. Clin Cancer Res 2007;13:2479–87.
34. Wang H, Tachibana K, Zhang Y, Iwasaki H, Kameyama A, Cheng L, et al. Cloning and characterization of a novel UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase, pp-GalNAc-T14. Biochem Biophys Res Commun 2003;300:738–44.
35. Pinho SS, Reis CA. Glycosylation in cancer: mechanisms and clinical implications. Nat Rev Cancer 2015;15:540–55.
36. Huanna T, Tao Z, Xiangfei W, Longfei A, Yuanyuan X, Jianhua W, et al. GALNT14 mediates tumor invasion and migration in breast cancer cell MCF-7. Mol Carcinog 2015;54:1159–71.
37. Wang R, Yu C, Zhao D, Wu M, Yang Z. The mucin-type glycosylating enzyme polypeptide N-acetylgalactosaminyltransferase 14 promotes the migration of ovarian cancer by modifying mucin 13. Oncol Rep 2013;30:667–76.
38. Maher DM, Gupta BK, Nagata S, Jaggi M, Chauhan SC. Mucin 13: structure, function, and potential roles in cancer pathogenesis. Mol Cancer Res 2011;9:531–37.
39. Kumar S, Cruz E, Joshi S, Patel A, Jahan R, Batra SK, et al. Genetic variants of mucins: unexplored conundrum. Carcinogenesis 2017;38:671–79.
40. Itoh Y, Kamata-Sakurai M, Denda-Nagai K, Nagai S, Tsuiji M, Ishii-Schrade K, et al. Identification and expression of human epiglycanin/MUC21: a novel transmembrane mucin. Glycobiology 2008;18:74–83.
41. Hollingsworth MA, Swanson BJ. Mucins in cancer: protection and control of the cell surface. Nat Rev Cancer 2004;4:45–60.
42. Genin E, Schumacher M, Roujeau JC, Naldi L, Liss Y, Kazma R, et al. Genome-wide association study of Stevens-Johnson syndrome and toxic epidermal necrolysis in Europe. Orphanet J Rare Dis 2011;6:52.
43. Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, et al. Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function. PLoS Genet 2012;8:e1003098.
44. Chen D, Gyllensten U. MICA polymorphism: biology and importance in cancer. Carcinogenesis 2014;35:2633–42.
45. Chen D, Juko-Pecirep I, Hammer J, Ivansson E, Enroth S, Gustavsson I, et al. Genome-wide association study of susceptibility loci for cervical cancer. J Natl Cancer Inst 2013;105:624–33.
46. Chen D, Hammer J, Lindquist D, Idahl A, Gyllensten U. A variant upstream of HLA-DRB1 and multiple variants in MICA influence susceptibility to cervical cancer in a Swedish population. Cancer Med 2014;3:190–8.
47. Jiang X, Zou Y, Huo Z, Yu P. Association of major histocompatibility complex class I chain-related gene A microsatellite polymorphism and hepatocellular carcinoma in South China Han population. Tissue Antigens 2011;78:143–7.
48. Chung-Ji L, Yann-Jinn L, Hsin-Fu L, Ching-Wen D, Che-Shoa C, Yi-Shing L, et al. The increase in the frequency of MICA gene A6 allele in oral squamous cell carcinoma. J Oral Pathol Med 2002;31:323–8.
49. Reinders J, Rozemuller EH, van der Ven KJ, Caillat-Zucman S, Slootweg PJ, de Weger RA, et al. MHC class I chain-related gene a diversity in head and neck squamous cell carcinoma. Hum Immunol 2006;67:196–203.
50. Tamaki S, Sanefuzi N, Ohgi K, Imai Y, Kawakami M, Yamamoto K, et al. An association between the MICA-A5.1 allele and an increased susceptibility to oral squamous cell carcinoma in Japanese patients. J Oral Pathol Med 2007;36:351–6.
51. Wang Y, Broderick P, Webb E, Wu X, Vijayakrishnan J, Matakidou A, et al. Common 5p15.33 and 6p21.33 variants influence lung cancer risk. Nat Genet 2008;40:1407–9.
52. Sasaki T, Gan EC, Wakeham A, Kornbluth S, Mak TW, Okada H. HLA-B-associated transcript 3 (Bat3)/Scythe is essential for p300-mediated acetylation of p53. Genes Dev 2007;21:848–61.
53. Skibola CF, Bracci PM, Halperin E, Conde L, Craig DW, Agana L, et al. Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma. Nat Genet 2009;41:873–5.
54. Plant K, Fairfax BP, Makino S, Vandiedonck C, Radhakrishnan J, Knight JC. Fine mapping genetic determinants of the highly variably expressed MHC gene ZFP57. Eur J Hum Genet 2014;22:568–71.
55. Zienolddiny S, Skaug V, Landvik NE, Ryberg D, Phillips DH, Houlston R, et al. The TERT-CLPTM1L lung cancer susceptibility variant associates with higher DNA adduct formation in the lung. Carcinogenesis 2009;30:1368–71.

Supplementary data