Abstract
Purpose: There is substantial heterogeneity within human papillomavirus (HPV)-associated head and neck cancer (HNC) tumors that predispose them to different outcomes; however, the molecular heterogeneity in this subgroup is poorly characterized due to various historical reasons.
Experimental Design: We performed unsupervised gene expression clustering on deeply annotated (transcriptome and genome) HPV+ HNC samples from two cohorts (84 total primary tumors), including 18 HPV− HNC samples, to discover subtypes and characterize the differences between subgroups in terms of their HPV characteristics, pathway activity, whole-genome somatic copy number alterations, and mutation frequencies.
Results: We identified two distinct HPV+ subtypes (namely HPV-KRT and HPV-IMU). HPV-KRT is characterized by elevated expression of genes in keratinocyte differentiation and oxidation–reduction process, whereas HPV-IMU has strong immune response and mesenchymal differentiation. The differences in expression are likely connected to the differences in HPV characteristics and genomic changes. HPV-KRT has more genic viral integration, lower E2/E4/E5 expression levels, and higher ratio of spliced to full-length HPV oncogene E6 than HPV-IMU; the subgroups also show differences in copy number alterations and mutations, in particular the loss of chr16q in HPV-IMU and gain of chr3q and PIK3CA mutation in HPV-KRT.
Conclusions: Our characterization of two subtypes of HPV+ HNC tumors provides valuable molecular level information that point to two main carcinogenic paths. Together, these results shed light on stratifications of the HPV+ HNCs and will help to guide personalized care for HPV+ HNC patients. Clin Cancer Res; 22(18); 4735–45. ©2016 AACR.
Despite significant clinical implications, the molecular characteristics of human papillomavirus associated (HPV+) head and neck cancers (HNC) are less well characterized than HPV-negative (HPV−) tumors. We deeply characterized the genomic and transcriptomic signatures of HPV+ and HPV− HNCs to determine whether there are molecular subgroups that may correspond to clinical outcomes. We identified two robust HPV+ subtypes using gene expression–based clustering. One subtype (HPV-KRT) has more keratinization, viral integration events, spliced E6*, chr3q amplifications, and PIK3CA mutations, whereas the other subtype (HPV-IMU) has more mesenchymal differentiation, full-length E6 activity and chr16q deletions, and stronger immune response. Literature concerning these features suggests a worse outcome for HPV-KRT; the expected trend was observed in The Cancer Genome Atlas survival analysis. Our study provides insight into the key genetic events that likely drive two different paths of oncogenesis in HPV+ HNCs, which will be important for guiding personalized treatment plans for HPV+ patients.
Introduction
Head and neck cancer (HNC) affects approximately 600,000 patients per year globally, and with five-year survival rates ranging from 37% to 62% (1). The majority of HNC cases have been historically attributed to the excessive use of carcinogens such as tobacco and alcohol, but a significant and increasing proportion of cases are associated with high-risk human papillomavirus (HPV) infection. In fact, it is predicted that by 2020, the number of HPV+ HNC cases will outnumber cervical cancer cases (2). Similar to cervical cancer, HPV type 16 is the most common high-risk type, accounting for 87% of HPV+ cases in oropharyngeal HNC, 68% in oral HNC and 69% in laryngeal HNC (3, 4). Currently, approximately 75% of oropharyngeal tumors are associated with HPV; the prevalence of HPV in non-oropharyngeal sites such as oral cavity and larynx is relatively rare but still occurs (5, 6). Overall, HPV+ HNC patients tend to have more favorable prognosis and treatment response rates, and different patient characteristics such as younger age at diagnosis, lower smoking rate, and higher intake of beneficial micronutrients compared with their HPV− counterparts (7–9). They also have key molecular differences, such as the observed high expression of p16 (CDKN2A) in HPV+ tumors and loss of p16 expression in HPV− tumors, and HPV+ tumors are generally less-differentiated (10) and have different copy number profiles (11) compared with HPV− tumors.
HPV normally infects the basal layer of the epithelium, and then exploits the epithelial-to-keratinocyte proliferation and differentiation pathways to complete the viral life cycle. HPV expresses two main viral oncoproteins, E6 and E7, which cooperatively inhibit apoptosis and enhance tumor cell growth and proliferation by inducing degradation of tumor suppressor TP53 and disruption of function of the retinoblastoma protein (Rb1), respectively (12). Alteration of additional pathways, such as suppression of immune response (13) and cell adhesion (14), and induction of DNA damage (15) and oxidative stress (16), may also be important for tumor transformation.
High-risk HPV E6 is expressed in cells as two main isoforms: a full-length variant (E6) and truncated variants often collectively referred to as E6*. It was shown that E6* inversely regulates the ability of E6 to degrade TP53 (17). Full-length E6 and E6* also bind to different sites of procaspase 8 and alternatively modulate its stability (destabilizing and stabilizing, respectively; ref. 18). The ratio of spliced E6 also correlated with outcomes in HPV+ oropharyngeal patients (19). In addition to the combined oncogenic potential of the E6 and E7 proteins, the integration of part or all of the HPV genome into the host genome is suggested to trigger focal genomic instability and further drive the neoplastic process, and is estimated to occur in 75% of HNC cases, of which an estimated 54% are integrated into a known gene (20, 21). Integrated viral transcripts are more stable than those derived from episomal HPV (22), and integration confers an increased proliferative capacity and selective growth advantage (12, 22). Expression of these transcripts is key, as it has been observed that tumors positive for HPV DNA, but negative for HPV RNA, have similar gene expression and TP53 mutation frequencies as HPV− tumors (23).
Several studies have investigated the differences in gene expression and copy number changes between HPV+ and HPV− tumors (24, 25), or have defined gene expression subtypes of HNCs irrespective of HPV status (26, 27). To date, only two studies have identified expression subtypes of HPV+ HNCs (24, 28); both studies used microarray data, thus limiting their ability to comprehensively characterize the differences between subtypes and identify the underlying causes of the expression differences. Here, by deep analyses of 36 HNC tumors [18 HPV+; 18 HPV−] collected at the University of Michigan with transcriptome and whole genome copy number alterations (CNA) data, we define two robust HPV+ subtypes distinguished by gene expression patterns. Membership in the two subgroups correlates with genic viral integration status, E2/E4/E5 expression, and E6 splicing ratio. In addition, the clusters show different CNA patterns, mutation frequencies in PIK3CA and expression of cancer-relevant pathways, such as host immune response and keratinocyte differentiation. Validation analyses carried out on 66 additional HPV+ HNC samples from The Cancer Genome Atlas (TCGA) yielded the same findings, demonstrating the robustness of the subtypes and their related characteristics.
Materials and Methods
Tumor tissue acquisition and DNA and RNA extraction
As a part of our ongoing survivorship cohort, we identified incident cases of HNC patients at University of Michigan hospital with untreated oropharynx or oral cavity squamous cell carcinoma between 2011 and 2013. Written informed consent was obtained, and the study was approved by the University of Michigan Institutional Review Board. Pretreatment tumor tissue and blood were collected into a cryogenic storage tube and flash frozen in liquid nitrogen by surgical staff until storage at −80°C. The flash-frozen tissues were embedded in optimal cutting temperature media in vinyl cryomolds on dry ice and stored in −80°C until prepared for histology. Hematoxylin and eosin slides were sectioned from each frozen tumor specimen on a cryostat and assessed by a board-certified, HNC expert pathologist for degrees of cellularity and necrosis. Criteria used for inclusion in the study were a minimum of 70% cellularity and less than 10% necrosis. The first 36 tumors meeting these criteria were selected. Using a sterile scalpel, surface scrapings were taken directly from the frozen tissue blocks from the region of tissue identified as having at least 70% tumor cellularity, over dry ice to allow the tissue to remain frozen. Frozen scrapings were placed into prechilled tubes on dry ice and processed using the Qiagen AllPrep DNA/RNA/Protein Mini Kit as per the manufacturer's protocol. Blood DNA from the same patients was isolated using the Qiagen QIAamp Blood DNA Mini Kit.
RNA-seq and SNP-array protocol
RNA library construction and sequencing on Illumina HiSeq using 100 nt paired-end reads were performed by the University of Michigan DNA sequencing Core Facility. Samples were multiplexed and run on multiple lanes to avoid lane variations. DNA isolates from all tumors and matched blood samples were run on the Illumina HumanOmniExpress BeadChip SNP-array. The raw microarray images were processed by Illumina Genome Studio to yield the log R ratio (LRR) values and B allele frequency values. Raw and processed RNA-seq and SNP-array data can be accessed from GEO with the accession number GSE74956.
RNA-seq analysis
The RNA-seq libraries were aligned to both human and HPV genomes to quantify the host and viral gene expression and determine HPV status. Samples were classified as HPV+ if they had more than 500 read pairs aligned to any HPV genome. Of all 36 tumors, we identified 14 HPV type 16, one type 18, one type 33, and two type 35. Virus genic integration events were detected using VirusSeq (29). Full-length E6 percent was calculated as the ratio of full-length E6 over all E6 transcripts. Variant calling was performed following GATK Best Practices on RNA-seq data (30), and the resulting variants were filtered to leave only the ones predicted to be damaging. RNA-seq fastq files of 66 TCGA HPV+ tumor samples were downloaded from cghub (31). The data were realigned and analyzed in the same way as UM RNA-seq data described above, except for variant calling. Gene somatic mutations for TCGA samples were downloaded instead from Xena (32). See Supplementary Methods for additional RNA-seq analysis details.
Subtype discovery and pathway scores
Standard hierarchical clustering and consensus clustering methods (33) were both performed on gene expression matrices and produced consistent results. The cluster memberships were robust to the choices of different clustering parameters and the number of top variable genes. Sample-wise pathway scores were computed as aggregated ranks of gene expression levels for four selected representative gene sets: E6-regulated genes, EMT genes, T-cell activation (GO: 0042110), and keratinocyte differentiation (GO: 0030216). See Supplementary Methods for details.
CNA analysis
OncoSNP v2.1 was run on every tumor and matched normal, with the setting that the maximal possible stromal contamination is 0.5 (34). The level 1 through level 5 CNA output data from OncoSNP were overlaid to provide the final CNA calls. The copy number for each gene was calculated as the average copy number of the segments overlapping the gene rounded to the nearest integer. Segmented whole genome copy number variation data for TCGA samples were downloaded from TCGA data portal. The copy number values for TCGA samples were stored as log2 ratios of the tumor probe intensity to the normal intensity. Gene-level CNAs for PIK3CA were downloaded from Xena (32) for TCGA samples.
Results
Overview of differential expression results from HPV+ and HPV− tumors
We performed transcriptomic analysis via RNA-deep sequencing on 36 tumor samples (18 HPV+ and 18 HPV−) to define gene expression levels. Supervised differential expression analysis using HPV status as the group variable identified 1,887 and 1,644 genes significantly upregulated and downregulated in HPV+ samples, respectively [false discovery rate (FDR) < 0.05 and absolute fold change > 2]. We annotated the genes to neoplasm-related terms downloaded from Gene2MeSH (35), and reidentified several important differentially expressed genes: TP53, CDKN2A, BRCA2, CYP2E1, KIT, and EZH2 were significantly upregulated in HPV+ tumors, and CCND1, GSTM1, HIF1A, MMP2, CD44, and MET were downregulated (Supplementary Table S1).
We performed Gene Ontology enrichment analysis with LRpath (36) and found that "immune response", "cell cycle", and "DNA replication" were upregulated in HPV+ samples compared with HPV−, whereas "extracellular matrix" and "epithelium development" were upregulated in HPV−samples (Supplementary Table S2). These findings are consistent with what has been previously reported (24). Enrichment analysis with cytobands identified several locations on 11q (11q13, 11q22.3, and 11q23.3) as enriched for genes upregulated in HPV− samples. This may be driven by frequent focal amplification of 11q13 and 11q22 in HPV− samples and deletion of the far end of chr11q in HPV+ samples (Supplementary Fig. S1; ref. 27).
Unsupervised clustering revealed two HPV+ subgroups
Unsupervised clustering using the 6,922 most variably expressed genes among all samples revealed three distinct groups (Fig. 1A and Supplementary Fig. S2A). First, HPV− samples were distinguished from HPV+ samples, except for one HPV− sample which clustered with HPV+ samples. The 18 HPV+ samples separated into two clusters of 8 and 10 HPV+ samples. These clustering results were robust to various clustering metrics (Euclidean distance, centered or uncentered Pearson's correlation), linkages (single, average, or robust), and different numbers of top variable genes used. To assure that the clusters we found are generalizable and not limited to our cohort, we included 66 additional HPV+ samples from The Cancer Genome Atlas (TCGA) HNC cohort, and then performed unsupervised clustering on the 84 HPV+ samples combined. Again, two clusters were robustly identified (Fig. 1B and Supplementary Fig. S2B). The clusters were not associated with smoking history, anatomical site, clinical T stage, clinical N stage or tumor stage, but were correlated with gender and HPV type (α level = 0.05), although we were underpowered to compare most epidemiologic characteristics (Fig. 1A; Table 1).
. | UM Tumors . | TCGA Tumors . | . | |||||
---|---|---|---|---|---|---|---|---|
. | Total . | Non-HPV . | HPV-KRT . | HPV-IMU . | Total . | HPV-KRT . | HPV-IMU . | . |
. | 36 . | 18 . | 10 . | 8 . | 66 . | 41 . | 25 . | Pa . |
Age | ||||||||
Median (SD) | 56.5 (10) | 55.5 (6.8) | 62.5 (6.7) | 58 (10) | 57 (7.8) | 0.38 | ||
Gender | ||||||||
Male | 26 | 9 | 9 | 8 | 60 | 35 | 25 | 0.039 |
Female | 10 | 9 | 1 | 0 | 6 | 6 | 0 | |
HPV type | ||||||||
HPV16 | 14 | 9 | 5 | 55 | 37 | 18 | 0.022 | |
HPV18 | 1 | 1 | 0 | 0 | 0 | |||
HPV33 | 1 | 0 | 1 | 8 | 3 | 5 | ||
HPV35 | 2 | 0 | 2 | 3 | 1 | 2 | ||
Anatomical site | ||||||||
Oropharynx | 20 | 3 (17%) | 9 (90%) | 8 (100%) | 47 | 26 (63%) | 21 (84%) | 0.051 |
Oral cavity | 14 | 13 (72%) | 1 (10%) | 0 | 16 | 13 (32%) | 3 (12%) | |
Larynx | 2 | 2 (11%) | 0 | 0 | 1 | 0 | 1 (4%) | |
Hypopharynx | 0 | 0 | 0 | 0 | 2 | 2 (5%) | 0 | |
Tumor stage | ||||||||
I–II | 5 | 4 | 0 | 1 | 12 | 6 | 6 | 0.27 |
III | 3 | 1 | 1 | 1 | 7 | 3 | 4 | |
IV | 28 | 13 | 9 | 6 | 47 | 32 | 15 | |
T stage | ||||||||
T1–T2 | 14 | 6 | 3 | 5 | 39 | 23b | 16 | 0.37 |
T3–T4 | 22 | 12 | 7 | 3 | 26 | 17 | 9 | |
N stage | ||||||||
N0 | 10 | 9 | 0 | 1 | 18 | 12c | 6 | 0.34 |
N1 | 2 | 0 | 1 | 1 | 6 | 1 | 5 | |
N2 | 17 | 6 | 7 | 4 | 39 | 25 | 14 | |
N3 | 7 | 3 | 2 | 2 | 2 | 2 | 0 | |
Smoking | ||||||||
Never | 7 | 3 (17%) | 3 (30%) | 1 (13%) | 22 | 11 (27%) | 11 (44%)d | 0.66 |
Former | 23 | 12 (66%) | 4 (40%) | 7 (87%) | 30 | 23 (56%) | 7 (28%) | |
Current | 6 | 3 (17%) | 3 (30%) | 0 | 13 | 7 (17%) | 6 (24%) |
. | UM Tumors . | TCGA Tumors . | . | |||||
---|---|---|---|---|---|---|---|---|
. | Total . | Non-HPV . | HPV-KRT . | HPV-IMU . | Total . | HPV-KRT . | HPV-IMU . | . |
. | 36 . | 18 . | 10 . | 8 . | 66 . | 41 . | 25 . | Pa . |
Age | ||||||||
Median (SD) | 56.5 (10) | 55.5 (6.8) | 62.5 (6.7) | 58 (10) | 57 (7.8) | 0.38 | ||
Gender | ||||||||
Male | 26 | 9 | 9 | 8 | 60 | 35 | 25 | 0.039 |
Female | 10 | 9 | 1 | 0 | 6 | 6 | 0 | |
HPV type | ||||||||
HPV16 | 14 | 9 | 5 | 55 | 37 | 18 | 0.022 | |
HPV18 | 1 | 1 | 0 | 0 | 0 | |||
HPV33 | 1 | 0 | 1 | 8 | 3 | 5 | ||
HPV35 | 2 | 0 | 2 | 3 | 1 | 2 | ||
Anatomical site | ||||||||
Oropharynx | 20 | 3 (17%) | 9 (90%) | 8 (100%) | 47 | 26 (63%) | 21 (84%) | 0.051 |
Oral cavity | 14 | 13 (72%) | 1 (10%) | 0 | 16 | 13 (32%) | 3 (12%) | |
Larynx | 2 | 2 (11%) | 0 | 0 | 1 | 0 | 1 (4%) | |
Hypopharynx | 0 | 0 | 0 | 0 | 2 | 2 (5%) | 0 | |
Tumor stage | ||||||||
I–II | 5 | 4 | 0 | 1 | 12 | 6 | 6 | 0.27 |
III | 3 | 1 | 1 | 1 | 7 | 3 | 4 | |
IV | 28 | 13 | 9 | 6 | 47 | 32 | 15 | |
T stage | ||||||||
T1–T2 | 14 | 6 | 3 | 5 | 39 | 23b | 16 | 0.37 |
T3–T4 | 22 | 12 | 7 | 3 | 26 | 17 | 9 | |
N stage | ||||||||
N0 | 10 | 9 | 0 | 1 | 18 | 12c | 6 | 0.34 |
N1 | 2 | 0 | 1 | 1 | 6 | 1 | 5 | |
N2 | 17 | 6 | 7 | 4 | 39 | 25 | 14 | |
N3 | 7 | 3 | 2 | 2 | 2 | 2 | 0 | |
Smoking | ||||||||
Never | 7 | 3 (17%) | 3 (30%) | 1 (13%) | 22 | 11 (27%) | 11 (44%)d | 0.66 |
Former | 23 | 12 (66%) | 4 (40%) | 7 (87%) | 30 | 23 (56%) | 7 (28%) | |
Current | 6 | 3 (17%) | 3 (30%) | 0 | 13 | 7 (17%) | 6 (24%) |
aTests were performed between HPV-KRT and HPV-IMU for both cohorts combined. Wilcoxon rank sum test was performed for age, whereas Fisher exact test was used for other variables. For HPV type, HPV16 versus others combined was tested. For anatomical sites, only oropharynx versus oral cavity was tested due to insufficient tumors from other sites. For N stage, N0, N1 versus N2, N3 was tested.
b,c,dThere was one missing value for each of these features (all from different samples).
Differentially regulated genes and pathways between HPV+ subgroups
We next characterized the molecular differences between the HPV+ subgroups. Differential expression analysis between the two HPV+ clusters found 3,515 genes significantly differentially expressed (absolute fold change > 2 and FDR < 0.05). Upregulated genes in one cluster were enriched for “immune response,” “mesenchymal cell differentiation,” and various differentiation and development-related terms; upregulated genes in the other cluster were most significantly enriched for “keratinocyte differentiation” and “oxidative reduction process” (complete list in Supplementary Table S3). Therefore, we named the clusters HPV-IMU and HPV-KRT, respectively. The top differentially expressed genes from each relevant gene set are shown in Fig. 1C, including BCL2 for mesenchymal differentiation, CDH3 and TP63 for keratinization, and CDH1 and KRT16 for cell adhesion. To understand how the pathways were dysregulated in the context of all HNC tumors, we compared each HPV+ cluster to the other cluster and HPV− samples, and identified pathways that were uniquely up- or downregulated (see Supplementary Methods). Enrichment testing results showed remarkably elevated immune response in HPV-IMU, consisting of increased T-cell activation, B-cell activation, and lymphocyte activation, and repression of both mesenchymal differentiation and extracellular matrix–related expression in HPV-KRT; it also showed increased keratinization/epidermal differentiation and oxidative–reduction process gene expression in HPV-KRT relative to HPV-IMU, with mixed expression in the HPV− samples (Fig. 1D). Multidimensional scaling analysis revealed that the HPV-KRT subgroup was overall more similar to the HPV− samples (Fig. 1B). We observed that HPV-KRT patients were more likely to be female and of type HPV16 compared with HPV-IMU patients.
HPV+ subgroups correlate with HPV integration, E2/E4/E5 expression levels, full-length E6 percent, and E6 activity
We investigated HPV genic integration status and integration sites with VirusSeq (29) using RNA-seq data for the HPV+ tumors. Nine of the 18 UM and 41 of the 66 TCGA HPV+ samples were found to contain at least one genic HPV integration site identified from virus-host fusion gene expression, hereafter denoted as genic-integration. Surprisingly, cluster HPV-KRT had more samples with genic-integration (7 of 10; 70%) than cluster HPV-IMU (2 of 8; 25%). The same difference was observed for TCGA data, where cluster HPV-KRT had 32 of 41 (78%) with a genic-integration and HPV-IMU had 9 of 25 (36%; Fig. 2A). This difference in genic-integration was significant between subgroups (combined P = 0.0001; Fisher exact test).
We next tested whether any of the early HPV genes (E1, E2, E4, E5, E6, E7) were differentially expressed between the subgroups. Of these, we found that E2, E4, and E5 had significantly lower expression in the HPV-KRT subgroup (E5 in Fig. 2B, and E2 and E4 in Supplementary Fig. S3A and S3B), whereas E6 (Fig. 2C) and E7 (Supplementary Fig. S3C) did not. HPV integration frequently associates with loss of E2, E4, and/or E5 (37). As the HPV-KRT cluster had more genic integration events than HPV-IMU, this result is consistent with the difference in genic integration events. Upon closer examination, 4 of the 9 UM genic-integration–positive samples and 22 of the 41 TCGA samples displayed lost expression of E2/E4/E5 (Supplementary Fig. S3D).
It is known that E6 may be expressed in either of two main forms: a full-length E6 isoform and various spliced isoforms, together referred to as E6*, which have different functions not yet fully understood. We next asked whether the ratio of full-length E6 to total E6 expression correlates with subgroup membership (Supplementary Fig. S3E; see Supplementary Methods for details). Because the full-length E6 percentages differed significantly by HPV types (Supplementary Fig. S3F), we restricted our analysis to only the most prevalent HPV type, HPV16 (82% of all cases). HPV-KRT has significantly lower full-length E6 percent than HPV-IMU (Fig. 2D; Wilcoxon rank sum test P = 0.001), suggesting that HPV-KRT expresses less full-length E6 transcript and more spliced form E6*.
Although the full-length E6 percent was different between subgroups, total E6 expression levels measured by RNA-seq were not (Fig. 2C). As the expression levels were quantified using RNA levels, they may not reflect the actual E6 protein activity levels in the cell. We took advantage of a published study by Duffy and colleagues (38) of 51 genes (35 down and 16 up) regulated by E6, to calculate an E6 activity score for each tumor sample (see Supplementary Methods). Overall, the E6 score was significantly higher in HPV-IMU, indicating elevated E6 activity in HPV-IMU (Fig. 2E and F; P = 2.6e−7). In particular, the genes down-regulated by E6 in Duffy's study were more repressed in HPV-IMU than HPV-KRT, whereas the upregulated genes were induced to a lesser degree (Fig. 2E).
Correlation of subgroups with copy number alterations and PIK3CA mutation
Examining genomic properties of the two HPV+ subgroups allowed us to further compare them with the HPV− HNC samples. Somatic copy numbers were obtained for all samples by analyzing SNP-array data from tumors and blood (see Methods). In our sample of 36 tumors, HPV− samples tended to harbor more copy number gains than HPV+ samples (Fig. 3A). In addition to the commonly observed copy number changes associated with HPV status (gain of 3q, 5p, 8q in HPV− tumors and loss of 11q and 13q in HPV+ tumors; ref. 27), we found substantial differences between the two HPV+ subgroups. Overall, HPV-KRT tumors tended to have more amplifications than HPV-IMU. Particularly, at the chromosomal arm level, HPV-KRT had more amplifications on all or a significant portion of chr3q than subgroup HPV-IMU (P = 1.7e−5, Wilcoxon test). In addition, HPV-IMU had frequent copy number loss on chr16q, which was completely absent in HPV-KRT and HPV− samples. Differences in chr3q CNA gain and chr16q CNA loss were also observed in the respective HPV+ TCGA subgroups (Fig. 3B).
To determine whether the CNAs affected gene expression differences, we asked what percent of the genes on chr3q and chr16q were up versus downregulated. Of the 693 and 427 genes on chr3q and chr16q, 148 (21%) and 132 (31%) genes were significantly upregulated in HPV-KRT compared with HPV-IMU, respectively; in contrast, a much smaller percent of genes were downregulated (7% and 5%; Fig. 3C and D). This is quite distinct from the opposite arms of these chromosomes, 3p and 16p, where nearly equal percentages of genes were up- and downregulated. The same trends were observed for TCGA data (Fig. 3C and D). This suggests that the gain and loss of copy numbers in part drove the expression differences between the two subgroups. Using neoplasm-related gene annotations from Gene2MeSH (35), we found 33 oncology-related genes, including TNSF10, PIK3CA, TP63, and MUC4 on chr3q, and MMP2, CDH1, NQO1, and CDH13 on chr16q (Supplementary Table S4). Nineteen of the 33 genes were also differentially expressed between the clusters.
To investigate whether there were any differences in gene mutation frequencies between the two subgroups, we analyzed expressed, nonsynonymous mutations derived from the RNA-seq data. We also obtained TCGA gene-level somatic mutation data from Xena UCSC (32). None of the genes had a mutation percent difference between HPV-KRT and HPV-IMU greater than 20% in both cohorts, except for oncogene PIK3CA (Fig. 4A). PIK3CA had a mutation in a striking 60% of samples from the HPV-KRT subgroup and 0% in HPV-IMU for the UM cohort. In TCGA, the difference was smaller (37% in HPV-KRT vs. 16% in HPV-IMU); however, still appreciable (Fig. 4B). Five of the 6 UM PIK3CA mutations were known activating mutations (E545K, E545G, and E542K), while the remaining mutation has unknown effect (E81K). Similarly, 17 of the 19 mutations from TCGA were known activating mutations. In addition, PIK3CA is located on chr3q, which was also found to have more amplifications in HPV-KRT. Not surprisingly, the copy numbers of PIK3CA were higher in HPV-KRT (Fig. 4C). Together, these two results strongly suggest upregulated PI3K activity in the HPV-KRT tumors.
Characteristics associated with HPV+ subgroups and patient survival
Summarizing and visualizing all of the above findings, we observe that although the clusters can be robustly separated by expression patterns, there remains substantial heterogeneity within each subgroup for each variable (Fig. 5A). For example, we created an epithelial-to-mesenchymal transition (EMT) score for each tumor, which combines the expression levels of the epithelial and mesenchymal differentiation genes (see Materials and Methods). While the HPV-KRT subgroup clearly had lower EMT scores overall, there was substantial variation in scores, especially among the TCGA cohort. HPV-IMU has higher levels of immune response (represented in Fig. 5A by an overall T-cell score), frequent chr16q loss, less viral integration in an expressed genic region, more E2/E5, and higher BCL2 gene expression. Except for a stronger EMT signature, all other attributes for HPV-IMU were correlated with better prognosis in the literature (Fig. 5B and Discussion). Because the elevated immune response in HPV-IMU could be an indication of higher infiltration of cytotoxic T cells, we calculated a T-cell activation score for each tumor based on 196 genes from the GO term “T-cell activation” (see Materials and Methods). Using overall survival data from TCGA, we found that TCGA HPV+ tumors with high T-cell activation scores had better overall survival than those with low T-cell activation scores (Fig. 5C; P < 0.05). Although we did not find a significant difference in overall survival between HPV-KRT and HPV-IMU with the same TCGA data, HPV-KRT tended to have worse overall survival than HPV-IMU (Fig. 5D), consistent with the general predictions from literature (39–45) and the findings between HPV+ subgroups in Keck and colleagues (28). Tumor stage and site are two important covariates. To address a potential confounding effect, we performed a multivariate risk analysis using cluster membership, clinical stage (stage 1–3 vs. stage 4) and site (oropharyngeal vs. non-oropharyngeal). Compared with the univariate analysis we presented (Fig. 5D), the trend of increasing risk in HPV-KRT remained, though the coefficient became smaller (0.608 to 0.369) with the P value remaining insignificant. Lack of a significant difference could be due to the poor follow-up data available from TCGA, or due to truly similar overall survival between subgroups. In either case, the many above-described differences strongly suggest that different therapies may best benefit patients within each subgroup.
Discussion
Head and neck squamous cell carcinomas represent a heterogeneous disease that consists of two molecular and clinically distinct entities distinguished by HPV infection. Because of a lack of HPV information and/or a small number of HPV+ cases in most previous studies, attempts to identify tumor subtypes in HNCs have often neglected HPV as an important variable. In one study, Pyeon and colleagues found two distinct subgroups of HPV+ cancers, which were differentiated by a few key genes (24). Keck and colleagues also independently identified two HPV+ subtypes (28), and characterized differences in morphology (differentiation and proliferation), expression (mesenchymal and immune response), and copy number of a few key neoplasm genes (PIK3CA, TP63, SOX2) between subtypes. However, they were unable to comprehensively describe molecular differences between the HPV+ subtypes. Specifically, it remained unclear how these subtypes correlated with HPV characteristics, and the likely cause for their distinct behaviors.
Using unsupervised clustering analysis, we independently identified two robust HPV+ subgroups with our cohort that persisted when we applied the same algorithm to the TCGA cohort. Although our clusters are constructed on RNA-seq data as opposed to microarrays, they share similarities with the two HPV+ clusters found in Keck and colleagues (28). We observed similar differences between the two clusters in immune response, mesenchymal differentiation and keratinization, and copy number changes in PIK3CA and TP63. However, whereas we found that the two HPV+ groups clustered separately from the HPV− tumors (with one exception), Keck and colleagues found a significant number of HPV− tumors coclustered in each of their two HPV+ subgroups. The one HPV− sample that clustered with HPV-IMU in our cohort had an induced immune response expression profile similar to that of the HPV-IMU subgroup, and it was classified as laryngeal, possibly explaining its outlier status. In addition to providing an independent source validating the existence of two HPV+ subtypes, we greatly expanded the depth of characterization by comprehensively profiling the expressed mutations, whole-genome CNAs and exploring HPV characteristics. Most importantly, our findings highlight two oncogenic paths in HPV+ tumors that are likely driven by differences in HPV characteristics, chromosome-arm level CNAs, and PI3K pathway activity. Interestingly, the subtype membership also seemed to correlate with other demographic variables: the KRT group tended to have more females, more high-stage tumors, and more non-oropharynx tumors. The gender ratio was similar between subtypes in Keck and colleagues (28), and the P value was very close to 0.05 in our analysis (Table 1), suggesting that this may be a spurious result and will need validation with larger cohorts.
The most significant difference in expression between the HPV+ clusters is the upregulation of mesenchymal and immune-response genes in the HPV-IMU group, and keratinization and oxidation–reduction process in HPV-KRT. In fact, all of these pathways can be explained by the biology of HPV carcinogenesis. The HPV E6 oncoprotein is reported to downregulate a large number of genes involved in keratinocyte differentiation, and upregulate genes normally expressed in mesenchymal lineages (38). In addition, HPV type 16 spliced variant E6* induces oxidative stress (16). Our analysis revealed that the HPV-IMU group had higher E6 activity, whereas HPV-KRT had higher levels of the spliced forms of E6, E6*, which is concordant with the repressed keratinization and induced mesenchymal differential pathways in HPV-IMU and higher oxidation–reduction response in HPV-KRT. The HPV-KRT group was also more likely to have a detected HPV integration event, which may drive the expression of more E6* and less full-length E6.
The higher immune response in HPV-IMU may be particularly relevant, as it was shown that HPV+ HNCs have a different immune profile than their HPV− counterparts, featuring higher numbers of infiltrating CD8+ T lymphocytes, myeloid dendritic cell, and proinflammatory chemokines (46). Together, these are hypothesized to promote better response to treatment in HPV+ patients (46). In our study, we showed that stronger immune response could indeed predict better overall survival (Fig. 5C). We hypothesize that when HPV shifts from the initial episomal form to an integrated transcribed form, the inflammatory/immune response towards HPV concurrently weakens. That may explain why HPV-IMU, having fewer HPV-integration events, have a stronger immune response. The stronger inflammatory/immune response in the HPV-IMU group may partially explain why HPV+ HNCs overall have better prognosis.
Another key finding of our study is that more chr3q amplifications were observed in HPV-KRT, whereas frequent chr16q deletions were found in HPV-IMU. These two signature copy number changes were evident in the independent UM and TCGA cohorts, validating these findings. However, the functional significance behind the associations of the copy number signatures with subgroups is unclear. One possible explanation could be that the change in copy number of a few key genes on chr3q and chr16q favors the survival/growth of tumor cells in each respective subgroup, thus they were each positively selected in the tumor evolution of each subtype. We queried a tumor associated gene (TAG) database (47) for the percent of oncogenes and tumor repressor genes (TSG) on chr3q and chr16q, and interestingly, the majority of TAGs on chr3q are oncogenes (14 oncogenes, 4 TSGs, and 9 unknown) whereas most TAGs on chr16q are TSGs (12 TSGs, 1 oncogene, and 2 unknown). This preliminary result suggests that the amplification of chr3q and deletion of chr16q are both more likely to promote tumor growth. Examining the genes more closely, we found that on chr3q, several key cancer genes stood out as having antiapoptotic function, such as PIK3CA, TP63, MUC4, which may promote cancer cell survival for HPV-KRT as a result of chr3q amplification. Particularly ΔNp63α (an isoform of p63) is dominantly over expressed in HNC and plays a pivotal anti-differentiation and anti-apoptosis role in the formation of HNC (48). On chr16q, CDH1, CDH13, and BCAR1 were associated with cell adhesion, thus the attenuation of their expression by copy number loss in HPV-IMU likely strengthens EMT properties in this subtype.
When we investigated the mutation difference between the clusters, PIK3CA was found to be the top hit. PI3K signaling pathway has previously been implicated in tumorigenesis and PIK3CA-activating mutations have been found in various cancer types (49). We discovered that the HPV-KRT subgroup not only has more PIK3CA-activating mutations, but also more copy number gains of PIK3CA. This suggests elevated PIK3CA activity is important especially in the HPV-KRT group, and may further implicate a difference in tumorigenesis paths between the two subtypes.
One limitation of our study is the lack of patient survival data. UM HPV+ tumors were collected between 2011 and 2013, and due to high overall survival in this group, we are not yet able to perform meaningful risk analysis on these patients. The TCGA cohort has 66 HPV+ samples; however, most of the patients were lost to follow-up after 12 months, therefore effectively reducing the sample size and conferring the analysis a much lower power, especially in analyses including covariate adjustment. Nevertheless, HPV-KRT patients are conceived to have worse outcome based on several lines of evidence. First, HPV-KRT's expression profiles (Fig. 1B) and copy number profiles, particularly gain on chr3q (Fig. 3A), are more similar to those of HPV−patients, who are known to have worse treatment response and survival. HPV-IMU has more frequent chr16q loss, which was reported to correlate with better survival for patients with oropharyngeal SCCs (39). In addition, HPV-KRT tumors are more likely to have expressed genic virus integration events, which may be indicative of or contribute to a higher level of genomic instability. In cervical cancer, it was found that patients who only had integrated HPV expression had a trend towards worse disease-free survival compared with patients with both integrated and episomal HPV forms (40). Studies on HNCs showed worse recurrence-free survival for HPV+ tumors with low levels of E2 (41) and E5 (42), markers for integrated HPV. HPV-IMU has much higher levels of BCL2 expression than HPV-KRT, which has been reported to correlate with more favorable outcome in HNCs (43, 44). HPV-IMU also has higher immune response, which was shown to correlate with better survival in HPV+ oropharyngeal cancer (45). Altogether, this network of correlated events and pathways implies that the HPV-IMU and HPV-KRT subtypes proceed through carcinogenesis using different driving forces, and are thus likely to benefit from different treatment strategies. While patients falling in the HPV-IMU subgroup may benefit more from immunotherapies and treatment targeting EMT, HPV-KRT may benefit from strategies to induce tissue-based inflammation (50). Furthermore, our results strongly suggest that most of these differences are driven by characteristics of the HPV infection itself.
In conclusion, we identified two subtypes of HNC HPV+ tumors based initially on expression patterns from RNA-seq data, but found to strongly correlate with a substantial number of important molecular markers, including viral characteristics, CNAs, and oncogenic PIK3CA mutations. This study takes a significant step towards decoding the molecular heterogeneity among HPV-associated HNCs. Our work has important translational implications that could guide biomarker development and precision medicine for HPV+ HNC patients.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: T.E. Carey, L.S. Rozek, M.A. Sartor
Development of methodology: Y. Zhang, T.E. Carey, L.S. Rozek, M.A. Sartor
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.E. Arthur, A. Virani, T.E. Carey, D.B. Chepeha, J.B. McHugh, L.S. Rozek, S. Virani
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Y. Zhang, L.A. Koneva, P.B. Hall, C. Warden, T.E. Carey, J.B. McHugh, G.T. Wolf, L.S. Rozek, M.A. Sartor
Writing, review, and/or revision of the manuscript: Y. Zhang, A.E. Arthur, P.B. Hall, T.E. Carey, D.B. Chepeha, M.E. Prince, J.B. McHugh, G.T. Wolf, L.S. Rozek, M.A. Sartor
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A. Virani, C.D. Warden, D.B. Chepeha, G.T. Wolf, L.S. Rozek, S. Virani
Study supervision:L.S. Rozek
Grant Support
This work was funded by NIH, NCI (R01CA158286), the University of Michigan Specialized Programs of Research Excellence (SPORE) grant (P50CA097248), as well as by the National Human Genome Research Institute (NHGRI) training grant (T32 HG00040).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.