Human papillomavirus (HPV)-positive head and neck cancers, predominantly oropharyngeal squamous cell carcinoma (OPSCC), exhibit epidemiologic, clinical, and molecular characteristics distinct from those OPSCCs lacking HPV. We applied a combination of whole-genome sequencing and optical genome mapping to interrogate the genome structure of HPV-positive OPSCCs. We found that the virus had integrated in the host genome in two thirds of the tumors examined but resided solely extrachromosomally in the other third. Integration of the virus occurred at essentially random sites within the genome. Focal amplification of the virus and the genomic sequences surrounding it often occurred subsequent to integration, with the number of tandem repeats in the chromosome accounting for the increased copy number of the genome sequences flanking the site of integration. In all cases, viral integration correlated with pervasive genome-wide somatic alterations at sites distinct from that of viral integration and comprised multiple insertions, deletions, translocations, inversions, and point mutations. Few or no somatic mutations were present in tumors with only episomal HPV. Our data could be interpreted by positing that episomal HPV is captured in the host genome following an episode of global genome instability during tumor development. Viral integration correlated with higher grade tumors, which may be explained by the associated extensive mutation of the genome and suggests that HPV integration status may inform prognosis.
Our results indicate that HPV integration in head and neck cancer correlates with extensive pangenomic structural variation, which may have prognostic implications.
Head and neck cancer, an aggressive malignancy with high morbidity and mortality, is the seventh most common cancer worldwide, with 890,000 new cases and 450,000 deaths worldwide and 51,540 new cases and 10,030 deaths in the United States in 2018 (1–3). Greater than 90% of head and neck cancers are squamous cell carcinomas arising from the mucosal surfaces of the oral cavity, oropharynx, and larynx (4, 5). While the classic risk factors are tobacco and alcohol, human papillomavirus (HPV) has emerged in the past few decades as a growing risk factor for these cancers, especially for oropharyngeal squamous cell carcinoma (OPSCC), defining a new subtype of tumor that is distinct from HPV-negative tumors. As a consequence, OPSCC is one of the few cancers with rapidly increasing incidence in recent years, driven predominantly by HPV-positive cases (6, 7).
Following initial infection, HPV persists in the nucleus of its host cell as an extrachromosomal episome, but can subsequently integrate into the host genome (8, 9). The reported proportion of HPV-positive tumors in which the virus integrates into the genome varies by study, but analysis of The Cancer Genome Atlas (TCGA) data indicates that HPV is found integrated in approximately 71% of virus-positive head and neck cancers and 83% of cervical cancers (10, 11). These integration events occur essentially randomly throughout the genome, although a few loci have been identified as recurrent (12, 13). The recurrent sites are often associated with common fragile genomic locations, transcriptionally active regions, and regions of microhomology (1–10 bp) between the viral and human genomes (11, 14–16). This suggests that DNA double-strand breaks drive viral integration, a conclusion supported by the observation that DNA damage promotes viral integration (17, 18). Integration occurs at essentially random sites within the viral genome as well. Most often only a fragment of the viral genome is retained following integration, spanning E6, E7 and a random amount of the adjacent viral genome, often lacking an intact E2 (11, 13). The retention of E6 and E7 and the loss of E2 are likely the consequence of selection during tumorigenesis, as the elimination of E2 results in increased expression of the E6/E7 viral oncogenes, which drive tumorigenesis. The E6 oncoprotein of the oncogenic strains of HPV, primarily 16, 18, and 33, inactivates the p53 pathway by promoting degradation of p53, resulting in abrogation of genome integrity surveillance (19, 20). In addition, the second HPV oncoprotein, E7, binds to and inactivates the cell-cycle inhibitor, Rb, leading to uncontrolled cell-cycle progression. Accordingly, tumorigenesis results predominantly from loss of cell-cycle regulation elicited by E7 and abrogation of DNA damage checkpoint control caused by E6.
Several previous studies have documented an association of HPV integration with structural alteration of the host genome at the site of viral integration in OPSCC (11, 21–24). An elegant study by Akagi and colleagues (21), primarily focused on head and neck cancer cell lines, demonstrated that viral integration was linked to local genome instability, including inversions, duplications and deletions, often leading to amplification of host sequences adjacent to the site of integration. Using a combination of PCR amplification, chromosome walking and Sanger sequencing they determined detailed structures of the focal rearrangements surrounding the sites of integration and proposed a rolling circle replication and looping model to account for the organization of the genome over the region surrounding viral integration. A separate study proposed amplification of excised hybrid viral-human DNA segments as extrachromosomal circles often followed by reintegration into the host genome (22). Consistent with the observations underlying both models, several groups examined sequence data from HPV-positive head and neck cancer and noted amplification of host sequences surrounding the sites of viral integration (21, 23).
Earlier studies, particularly with cervical cancer, indicate that HPV is associated not only with focal disruptions near the site of integration but also with genome-wide instability, most notably aneuploidy (25). Several lines of evidence suggested that HPV16 E7 disrupts genome integrity by directly interfering with centriole duplication control (26, 27). Nonetheless, HPV-positive tonsillar squamous cell carcinomas exhibit a lower frequency of aneuploidy than HPV-negative tumors (28), so whether HPV promotes aneuploidy in oral cancers remains an open question. An additional global effect of HPV on genome integrity is that HPV E6 and E7 appear to enhance mutation frequency in primary human keratinocytes (29). In addition, HPV infection and HPV E6 and E7 oncoproteins alone activate the ATM and ATR DNA damage pathways (30). Our data described below provide a compelling argument that HPV integration, rather than HPV per se, correlates with genome-wide mutation and genomic instability.
We applied a combination of whole-genome sequencing (WGS) and optical genome mapping (OGM) of several HPV-positive OPSCCs in order to identify the viral integration state as well as somatic alterations throughout the tumor genome. OGM using a Bionano Genomics microfluidic instrument, such as the Saphyr, interrogates individual large (>250 kb) genomic DNA fragments rendered strictly linear in nanofluidic channels following fluorescent barcoding targeting specific DNA sequences (31, 32). The aggregated images of these molecules allow de novo assembly of the tumor genome without reference to a scaffold. Once assembled, the resultant genome can be compared with a generic reference map or, in our case, to the normal germline genome from the patient's peripheral blood (33). Our results confirmed focal amplifications and rearrangements surrounding the sites of integration but also documented a high level of genome-wide somatic structural variants (SV) in HPV-positive tumors, but only in those tumors in which HPV was integrated into the host genome. Tumors with comparable amounts of episomal HPV exhibited essentially no somatic structural variation. On the basis of our analysis of the various rearrangements and integration events, we speculate that viral integration occurs following a genome wide structural catastrophe. The causes and consequences of such widespread genome instability remain to be resolved but our data and some previous reports suggest that head and neck tumors with integrated HPV may be more aggressive, resulting in poorer patient outcomes (34–36).
Materials and Methods
Tissue and blood samples were obtained following surgical resection for twelve p16-positive oropharyngeal tumors under protocol PRAMS00040532 approved by the Penn State Health Institutional Review Board. Patients’ demographics are provided in Supplementary Table S1. We retained both tumor and resected lateral neck lymph nodes adjacent to the tumor, which we analyzed if they were metastasis-positive. Tissue samples were flash frozen and stored at –80°C. Blood samples were obtained from all patients and stored at –80°C until use. DNA samples isolated from tumor tissue using the DNeasy Blood & Tissue Kit (Qiagen) were tested by HPV-specific PCR to confirm the presence of virus in the tumor.
Ultrahigh molecular weight DNA was extracted from tumor tissue and associated blood samples and fluorescently labeled as described previously (33). Samples were analyzed on Saphyr chips (Bionano Genomics, USA), targeting approximately 200X human genome coverage (Supplementary Table S2).
Previously isolated high molecular weight DNA was sheared into 400 base pair fragments using a Covaris Sonicator, followed by size selection using SparQ PureMag magnetic beads at 70x and 55x bead concentration. Library preparation was performed using the KAPA Hyper Prep Kit with dual-indexed, unique NEXTflex DNA Barcode library adapters. Samples were pooled and applied to an S1 flow cell of an Illumina NovaSeq 6000 Sequencer, from which we obtained an average coverage of 40X (Supplementary Table S2).
Bionano data analysis
Whole-genome imaging data were analyzed using the Bionano Access 1.6 pipeline. Individual consensus genome maps were assembled de novo and compared with the GRCh38 reference. On the basis of the de novo assembly results, we further ran the dual variant annotation pipeline (Bionano Solve 3.6) for each cancer genome to filter out germline SVs present in the matched blood sample from the same patient. We also removed SVs present in the control reference genome in order to exclude common SVs. Filtered SV counts are shown in Table 1.
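The germline and common-variant filtering described above can be pictured as a set subtraction with a positional tolerance. The following Python sketch is only an illustration under stated assumptions: the `SV` record, the `same_event` and `filter_somatic` helpers, and the 10 kb matching tolerance are hypothetical and are not taken from the Bionano Solve pipeline, whose internal matching criteria are not described here.

```python
from typing import NamedTuple

class SV(NamedTuple):
    """Hypothetical minimal structural-variant record."""
    chrom: str
    start: int
    end: int
    sv_type: str  # e.g. "insertion", "deletion"

def same_event(a: SV, b: SV, tol: int = 10_000) -> bool:
    """Treat two calls as the same event if type and chromosome match
    and both breakpoints agree within `tol` bp (assumed tolerance)."""
    return (a.sv_type == b.sv_type and a.chrom == b.chrom
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)

def filter_somatic(tumor, blood, controls, tol=10_000):
    """Keep tumor SVs that match neither a matched-blood call nor a control call."""
    background = list(blood) + list(controls)
    return [sv for sv in tumor
            if not any(same_event(sv, bg, tol) for bg in background)]

# Placeholder calls, not data from this study.
tumor = [SV("chr3", 1_000_000, 1_050_000, "deletion"),
         SV("chr8", 5_000_000, 5_000_500, "insertion")]
blood = [SV("chr3", 1_002_000, 1_051_000, "deletion")]
somatic = filter_somatic(tumor, blood, controls=[])
print(somatic)
```

In this toy input the chr3 deletion is also seen in blood within the tolerance and is discarded as germline, leaving only the chr8 insertion as a candidate somatic SV.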
|Sample|HPV status|Insertions|Deletions|Inversions|Duplications|Translocations|Total|
WGS data analysis and virus detection
WGS reads were mapped to the human GRCh38 reference and to a combined HPV database, which includes multiple HPV reference genomes, using BWA-MEM (version 0.7.17; ref. 37). Somatic mutations were identified from the WGS data with the DRAGEN pipeline; only variants that passed the filter were counted, and these were annotated with Funcotator from the Genome Analysis Toolkit (version 184.108.40.206). Somatic SNV counts are listed in Table 1b. Copy-number variants (CNV) were determined by Control-FREEC (version 11.5; ref. 38). Mutated cancer-associated genes were called with MutSigCV (version 1.41; ref. 39). SAMtools depth was used to obtain the depth of sequence coverage across the ∼7.9 kb HPV genome from the mapped alignment files in BAM format (40).
Raw and aligned next generation sequencing files have been submitted to the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega) within study accession EGAS00001005163. Bionano variant calls and mapped reads for our samples can be downloaded from https://www.datacommons.psu.edu/commonswizard/MetadataDisplay.aspx?Dataset=6286.
Determination of HPV integration
We performed WGS and OGM on OPSCCs or associated lymph node metastases, if available, as well as corresponding whole blood from twelve patients. We mapped WGS reads to the human reference hg38 and, for tumor samples, to either HPV16 (eleven of the twelve tumors) or HPV33 (one tumor) genomes. We noted those sequence reads carrying both human and HPV sequences, including those in which one of the paired ends mapped to the human genome and the other mate pair mapped to the HPV genome (discordant reads) and those in which one of the paired ends carried HPV sequences immediately abutting human sequences (softclip reads). We considered those samples with multiple consistent discordant and softclip reads as likely candidates for containing integrated HPV, with the boundaries of the human sequences in softclip reads marking the likely sites of integration. We then examined the OGM data for evidence of integration at the sites specified by the softclip reads and designated a sample as containing integrated HPV only if the OGM data confirmed an integration event at those sites. In most cases, softclip reads indicated one or two sites of possible integration that were confirmed by OGM. However, all samples had additional low-level softclip reads that were not confirmed by OGM and represented either sequencing artifacts, minor extrachromosomal hybrid molecules or integration events in a small subclone of cells below the 5% limit of detection of OGM in this study. In addition, WGS of sample 3943 returned a large number of consistent softclip reads linking viral sequences to a single chromosome 10 sequence. However, OGM failed to identify an integration event at that genome position and thus we concluded that sample 3943 carried an extrachromosomal hybrid viral genome. In sum, we determined that eight of the twelve tumors carried integrated HPV genomes, while the others carried exclusively extrachromosomal viral genomes.
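The two classes of human-HPV junction evidence described above can be sketched in code. This is a minimal illustration, not the authors' pipeline: the `Read` record, the `clip_maps_to_other_genome` flag (assumed to be precomputed by realigning the clipped bases), the `HPV` contig-name prefix, and the 20-base minimum clip length are all hypothetical choices made for the example.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

class Read(NamedTuple):
    """Hypothetical, simplified view of one aligned paired-end read."""
    ref: str        # contig this read mapped to
    mate_ref: str   # contig its mate mapped to
    cigar: str      # CIGAR string of this read's alignment
    clip_maps_to_other_genome: bool = False  # assumed precomputed

SOFTCLIP = re.compile(r"(\d+)S")

def is_hpv(contig: str) -> bool:
    # Assumes viral contigs are named with an "HPV" prefix.
    return contig.upper().startswith("HPV")

def max_softclip(cigar: str) -> int:
    """Length of the longest soft-clipped segment in a CIGAR string."""
    return max((int(n) for n in SOFTCLIP.findall(cigar)), default=0)

def classify(read: Read, min_clip: int = 20) -> Optional[str]:
    """Return 'softclip', 'discordant', or None for a single read."""
    # Softclip reads are checked first because they mark the junction itself.
    if read.clip_maps_to_other_genome and max_softclip(read.cigar) >= min_clip:
        return "softclip"
    # Discordant pair: one mate on a human contig, the other on HPV.
    if is_hpv(read.ref) != is_hpv(read.mate_ref):
        return "discordant"
    return None

# Placeholder reads, not data from this study.
reads = [
    Read("chr10", "HPV16", "151M"),
    Read("chr10", "chr10", "100M51S", clip_maps_to_other_genome=True),
    Read("chr1", "chr1", "151M"),
]
counts = Counter(c for c in map(classify, reads) if c is not None)
print(counts)
```

Counts such as these only nominate candidate sites; as the text states, a sample was designated as carrying integrated HPV only when OGM confirmed a structural event at the position marked by the softclip reads.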
We mapped the WGS reads to the human and HPV genomes, from which we calculated the copy number of the virus as a function of position along its genome (Fig. 1; Supplementary Table S3). As evident from these data, all the tumors with only episomal copies of HPV contained sequences covering the entire genome, indicating the presence of an intact HPV genome within the tumor, with copy numbers ranging from 0.6 to 16. Three of these four tumors contained additional viral sequences mapping to only a portion of the genome, indicating the presence of a truncated extrachromosomal HPV species as well. However, with the exception of sample 3943, WGS demonstrated that the viral sequences at the two boundaries of the gaps in each of these genomes are linked to each other, demonstrating that the gap results from an internal deletion of the virus rather than an integration event. Contrary to a previous report (23), in no case was the calculated copy level of the truncated viral genome equal to that of the intact genome.
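One way to turn per-position read depth into a viral copy number is to normalize by the depth of a single copy of the human genome. The sketch below assumes a diploid tumor genome and ignores tumor purity; these assumptions, and the helper names `parse_depth` and `viral_copy_number`, are illustrative and are not stated as the normalization used in this study. The input format is the three-column, tab-separated output of `samtools depth` (contig, 1-based position, depth).

```python
def parse_depth(lines):
    """Parse `samtools depth` rows into {position: depth}.
    Assumes all rows belong to a single viral contig."""
    depth = {}
    for line in lines:
        _contig, pos, d = line.rstrip("\n").split("\t")
        depth[int(pos)] = int(d)
    return depth

def viral_copy_number(viral_depth, human_mean_depth, ploidy=2):
    """Copies per cell at each viral position.
    Assumption: human_mean_depth / ploidy is the depth of one genome copy."""
    per_copy = human_mean_depth / ploidy
    return {pos: d / per_copy for pos, d in viral_depth.items()}

# Placeholder rows, not data from this study.
rows = ["HPV16\t1\t200", "HPV16\t2\t200", "HPV16\t3\t20"]
cn = viral_copy_number(parse_depth(rows), human_mean_depth=40)
print(cn)
```

With a mean human depth of 40X, a viral depth of 200 corresponds to 10 copies per cell under these assumptions, and a drop to depth 20 within the viral genome (1 copy per cell) would mark a region present in only a subset of the viral molecules.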
All of the tumors with integrated HPV genomes contained sequences that mapped to only a portion of the genome and the boundaries of the partial HPV genome often correlated with the viral boundaries of the integration event. This indicates that, consistent with earlier studies, only a portion of the HPV genome integrated into the tumor genome such that integration maintained integrity of the E6 and E7 loci but inactivated E2. Three of the tumors carried two distinct partial genomes at equal copy level. Some of the tumors contained sequences mapping to the entire viral genome, indicating the presence of a complete viral genome in the tumors in addition to the partial integrated genomes. Our results do not distinguish between the persistence of an extrachromosomal HPV viral genome in the tumor versus integration of the intact viral genome, perhaps in tandem with the partial genome. The fact that the copy number of the intact genome is roughly equal to that of the partial genome suggests the latter, with the integrating virus consisting of one copy of the intact genome and one copy of the truncated genome. In several cases, the sizes of the integrated virus as determined by OGM are consistent with that interpretation.
From the genome sequences of the viruses, we could extract single nucleotide polymorphisms that allowed us to identify the relatedness among the viruses in the different tumors. Each had a unique SNP profile but with overlapping patterns, yielding a similarity profile as indicated in Fig. 1. Of particular significance, those viruses that were exclusively extrachromosomal did not constitute a cluster distinct from those that had integrated into the genome. Thus, the difference between integrating and non-integrating viruses in our cohort does not appear to be an intrinsic feature of the virus itself.
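As an illustration of how overlapping SNP profiles can be compared, the sketch below scores each pair of viral genomes by the Jaccard index of their SNP sets. The Jaccard index is an assumed similarity measure chosen for the example, not necessarily the one underlying Fig. 1, and the sample names and SNPs are placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two SNP sets: shared SNPs / all SNPs seen in either."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two SNP-free genomes are treated as identical
    return len(a & b) / len(a | b)

# Each SNP is (position in the viral genome, alternate base). Placeholder data.
profiles = {
    "tumor_A": {(100, "A"), (200, "T")},
    "tumor_B": {(100, "A"), (300, "G")},
    "tumor_C": {(100, "A"), (200, "T"), (300, "G")},
}
similarity = {
    (s1, s2): jaccard(profiles[s1], profiles[s2])
    for s1, s2 in combinations(sorted(profiles), 2)
}
print(similarity)
```

A pairwise matrix of this kind can then be clustered to ask, as in the text, whether extrachromosomal viruses group separately from integrated ones.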
HPV can integrate at multiple sites and induce focal amplification
The combination of WGS and OGM data allowed us to determine the site and structure of the tumor genome spanning the viral integration site in almost all cases (Fig. 2; Supplementary Table S4). As has been observed previously for both head and neck and cervical cancers, the sites of integration were recurrent in neither the host genome nor the viral genome. Moreover, we observed focal amplification of the host sequences surrounding the site of insertion in almost every case of integration. These amplifications involved anywhere from 22 kb to 400 kb of flanking sequences and ranged from a simple tandem duplication of the virus and surrounding sequences (tumor 7387) to a complex rearrangement spanning 2.5 Mb and at least 26 copies of the virus and various regions of the bordering host genome (tumor 7122; Fig. 2). In four cases, we found the virus inserted at two different sites in the genome, located on two different chromosomes. In another case, the virus spanned the breakpoint in an interchromosomal translocation involving three separate chromatids, and in three other cases, it spanned the breakpoint of an intrachromosomal translocation. In all but one case, at least one of the virus insertions lay inside, or within 10 kb of, a protein-coding region, consistent with previous observations (11, 13, 41, 42). Detailed descriptions of the rearrangements at the sites of each integration are provided in Supplementary Material.
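The "inside, or within 10 kb of, a protein-coding region" criterion amounts to a simple interval-distance check. The sketch below is a hypothetical illustration: the gene names and coordinates are placeholders, and all intervals are assumed to lie on the same chromosome as the insertion site.

```python
def distance_to_gene(pos, gene_start, gene_end):
    """0 if pos lies inside the gene, else distance to the nearer gene boundary."""
    if gene_start <= pos <= gene_end:
        return 0
    return min(abs(pos - gene_start), abs(pos - gene_end))

def near_coding(pos, genes, window=10_000):
    """Names of genes whose interval is within `window` bp of `pos`."""
    return [name for name, (start, end) in genes.items()
            if distance_to_gene(pos, start, end) <= window]

# Placeholder annotation, not real gene coordinates.
genes = {"GENE_A": (50_000, 100_000), "GENE_B": (200_000, 250_000)}
hits = near_coding(105_000, genes)
print(hits)
```

Here the insertion at position 105,000 lies 5 kb beyond the end of the placeholder GENE_A and is therefore counted as near a coding region, whereas GENE_B is 95 kb away and is not.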
Genome instability strictly correlates with HPV integration
By analyzing the genomes of the tumor samples by OGM, we were able to identify SVs not only at the site of viral integration but also across the entire genome. As shown in Fig. 3, every one of the tumors in which HPV had integrated had also undergone extensive genome rearrangement, including insertions, deletions, inversions, and translocations. Some of these resulted in substantial segmental aneuploidies (Supplementary Fig. S1) and impinged on a variety of cancer-associated genes, including some that have been previously implicated in OPSCC (Supplementary Table S5). In contrast, the tumors in which HPV remained episomal showed no comparable genome instability: the genomes of these tumors were essentially identical to those of the corresponding germline genomes of the patients. The numbers of each type of SV in each of the tumors, listed in Table 1A, confirm this visual impression and document that the differences in both individual and total SV load between tumors with integrated versus episomal HPV are statistically significant (P = 0.008, Wilcoxon rank sum test, for total SVs).
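For readers unfamiliar with the rank sum test, the sketch below implements an exact two-sided version by enumerating every way of assigning the pooled ranks to the two groups. The SV counts are placeholders rather than the values in Table 1, and the software and options used for the reported P value are not specified in the text, so this example is not intended to reproduce that value.

```python
from itertools import combinations

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def exact_rank_sum_p(x, y):
    """Exact two-sided P value for the Wilcoxon rank sum statistic of x."""
    ranks = midranks(list(x) + list(y))
    n = len(x)
    expected = n * (len(ranks) + 1) / 2
    observed_dev = abs(sum(ranks[:n]) - expected)
    hits = total = 0
    for idx in combinations(range(len(ranks)), n):
        total += 1
        dev = abs(sum(ranks[i] for i in idx) - expected)
        if dev >= observed_dev - 1e-9:
            hits += 1
    return hits / total

# Placeholder total-SV counts, not data from this study.
integrated = [120, 95, 210, 80, 150, 60, 300, 110]
episomal = [2, 0, 5, 1]
p_value = exact_rank_sum_p(integrated, episomal)
print(p_value)
```

With eight versus four samples there are 495 possible rank assignments, so complete separation of the two groups, as in this placeholder input, gives the smallest attainable exact two-sided P value of 2/495.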
We also examined the single nucleotide mutation frequency in the tumors as well as the overall mutational load and the specific cancer genes mutated in each tumor. Previous sequence analyses of head and neck tumors demonstrated that HPV-positive head and neck tumors had a mutational landscape quite distinct from that of HPV-negative tumors (43–48). In particular, almost all HPV-negative tumors contained mutations in TP53 and a significant fraction carried mutations of CDKN2A or amplification of CCND1, whereas HPV-positive tumors contained almost none of these mutations. This is consistent with viral E6 and E7 oncogenes driving tumorigenesis in HPV-positive tumors through inactivation of Rb and p53, eliminating the selective pressure for host mutations in genes comprising these pathways. Previous results also demonstrated that a significant fraction of both HPV-positive and HPV-negative tumors contained activating mutations or amplifications of PIK3CA (44, 45).
The total point mutational burden across all of the tumors in our study spanned a range comparable with that previously reported for head and neck cancers (Supplementary Fig. S2). Moreover, the mutational landscape of our tumors was consistent with that of HPV-positive tumors described in the TCGA cohort. None of the tumors in our cohort carried TP53 mutations or amplification of CCND1, while a significant fraction carried an activating mutation or amplification of PIK3CA. Nonetheless, the mutational landscape of tumors with episomal HPV was clearly distinct from that of tumors with integrated HPV (Fig. 4). The average mutational burden was substantially less in the former group than the latter, albeit not reaching statistical significance. Moreover, the spectrum of mutations was quite distinct. Seven of the eight tumors with integrated HPV carried a mutation in or amplification of PIK3CA. On the other hand, only one of the episomal HPV tumors had a mutation or amplification of PIK3CA. These results suggest that HPV E6 and E7 drive tumorigenesis in both episomal and integrated HPV tumors. However, the mutational landscape is distinct between the two classes and the overall mutational frequency is substantially higher in those with integrated HPV (Supplementary Fig. S2; Table 2). Whether the different mutational spectrum simply reflects the difference in mutational burden or a distinction in the growth properties of the two classes requires a more extensive analysis.
|Sample|HPV status|Missense|Nonsense|In Frame Del|In Frame Ins|Frameshift|Total|
Viral integration is associated with tumors that are more aggressive
While the number of patients examined in this study is too small to draw statistically significant conclusions, the available data suggest that the tumors with integrated HPV are more aggressive than those with episomal HPV (Supplementary Table S1). Specifically, the mean size of tumors with integrated HPV was 3.5 cm, while that of tumors with episomal HPV was 2.5 cm. Moreover, half of the tumors with integrated HPV exhibited perineural invasion, while none of the tumors with episomal HPV exhibited this feature. Finally, tumor staging indicated that the HPV-integrated tumors were more advanced: 3 of the 8 patients with integrated HPV tumors were stage 4, while all of the tumors with episomal HPV were only stage 1 or 2. These observations leave open the question of cause and effect but suggest that further evaluation of features and outcomes of HPV-integrated versus episomal tumors is warranted.
We applied a combination of WGS and whole-genome imaging to interrogate the genome structure of HPV-positive OPSCCs. These complementary techniques allowed us to determine unequivocally the integration state of HPV in the tumors, to define the genome organization at the sites of integration and to correlate genome wide mutational patterns with the integration state of the virus. The results yield the striking conclusion that viral integration is tightly linked to global genome instability and increased mutation frequency. Moreover, while the number of cases examined to date is limited, clinical features indicate a trend toward more aggressive tumors associated with viral integration. Using our methodology, we were unable to rule out that some HPV genomes remained extrachromosomal in tumors in which HPV had integrated. However, regardless of whether some viral genomes remained extrachromosomal, the presence of an integrated copy was sufficient and necessary to link it to pervasive genomic instability.
Our criteria for determining viral integration required both the presence of multiple consistent WGS reads in which viral sequences abut human sequences and OGM documentation of a structural alteration at the site predicted by the WGS reads. This is more rigorous than relying on WGS or mate pair sequencing alone, which do not distinguish between integration and extrachromosomal human-viral hybrid episomes, and more reliable than more indirect methods, such as RNA sequencing, inverse PCR, FISH, or E2/E6 copy levels (23, 49–51). Moreover, the method allowed us to extract the structure of the region immediately surrounding the site of integration directly from the primary data. These analyses showed that integration can occur at multiple sites within one tumor, that integration is often followed by focal amplification of the virus and the genomic sequences surrounding it, and that integration often occurs at the junction between inter- or intra-chromosomal translocations. In all our cases, the local amplification of adjacent host sequences accounts for the increased copy number of those sequences, as has been previously reported (13, 21, 23), without positing the formation and persistence of extrachromosomal hybrid molecules.
Our results document a strict correlation between viral integration and whole-genome instability. Tumors in which HPV had integrated contained a significantly greater number of SVs of all types—deletions, insertions, translocations and inversions—and an increase in single nucleotide and frameshift mutations over those found in tumors with only extrachromosomal HPV. Tumors without integrated HPV contained essentially no SVs and relatively few point mutations. While previous analysis of integrated versus episomal head and neck tumors noted significant genomic alterations at the site of viral integration, few examined the overall structural variation in those tumors.
This correlation raises the question of causality. Does integration induce genome instability, perhaps as a consequence of increased expression of E7 following inactivation of E2 and attendant reduction of p53 activity (25)? Or, does genome instability arise from a plethora of DNA damage events, yielding double strand breaks and activation of nonhomologous end joining repair through which the extrachromosomal HPV genomes could become attached to a chromosomal site (52)? Certainly, previous reports noted in the introduction suggest that enhanced E6 and E7 expression attendant upon integration could increase mutation and aneuploidy (26, 27, 29, 30). However, in half the tumors with integrated HPV, the viral segment is present at two different sites in the genome. This is more readily explained by a concurrent capture of the virus at two separate sites during a genome catastrophe rather than sequential integration events driven by selection. In addition, in several cases, we found the virus integrated at the junction of an intra- or inter-chromosome translocation. This would require the interaction of two separate chromatids, or in the case of one tumor, three separate chromatids, all broken concurrently and healed in concert. Finally, given the fact that some tumors with only extrachromosomal HPV carried no obvious chromosomal driver mutations, we conclude that episomal HPV alone is likely sufficient for promoting tumor formation. Thus, subsequent selection for integration of the genome was not required for tumorigenesis. In sum, these observations are consistent with the hypothesis that viral integration results from some genome catastrophe subsequent to HPV infection rather than selection for viral integration to drive tumorigenesis. 
Such a genome catastrophe could occur in a short period during tumor development through a breakage-fusion-bridge cycle initiated by an initial cell division error (53), accounting for simultaneous viral integration and widespread genomic rearrangements.
Substantial clinical data has documented that HPV-positive OPSCCs are more responsive to radiation treatment or chemotherapy than HPV negative carcinomas and such patients have more favorable outcomes (54–56). However, those studies have not distinguished between integrated versus episomal HPV-positive tumors. Clinical characteristics of the tumors in the dozen cases examined here suggest that those with integrated HPV are more aggressive, although examination of more cases will be necessary to rigorously test that correlation. This differs from previous reports indicating that integration had no effect on outcomes (49) or indicated a more favorable prognosis (28). However, our suggestion that tumors with integrated virus are more aggressive is consistent with the recent observation that high copy-number variation in HPV-positive OPSCCs is strongly associated with worse recurrence-free survival (57) and our observation that copy-number variation is substantially elevated in integrated versus episomal cases (Supplementary Fig. S1). We do not have sufficient outcomes data to determine whether integration is associated with increased recurrence or reduced overall survival. However, our results raise the possibility that such distinction may have prognostic value that could inform treatment options.
J.R. Broach reports grants from George L. Laverty Foundation during the conduct of the study. No disclosures were reported by the other authors.
B. Labarge: Investigation, methodology, writing–original draft. M. Hennessy: Formal analysis, investigation, methodology. L. Zhang: Data curation, software, formal analysis, visualization. D. Goldrich: Investigation, methodology. S. Chartrand: Investigation. C. Purnell: Software, formal analysis, visualization. S. Wright: Data curation, formal analysis. D. Goldenberg: Conceptualization, supervision, funding acquisition, project administration. J.R. Broach: Conceptualization, supervision, writing–original draft, project administration, writing–review and editing.
The authors are grateful to Yuanyuan Chang and Ben Clifford of Bionano Genomics for advice on OGM analysis and to Vonn Walter for statistical analysis. This work was supported by a grant from the Laverty Foundation to D. Goldenberg.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Molecular Cancer Research Online (http://mcr.aacrjournals.org/).