Abstract
HPV16 is the most oncogenic type of human papillomaviruses (HPV). Integration of HPV into the human genome is an important mechanism of carcinogenesis but is absent in at least 30% of HPV16+ tumors. We applied long-read whole-genome sequencing (WGS) to cervical cancer cell lines and tumors to characterize HPV16 carcinogenesis in the absence of integration. Large tandem arrays of full-length and unique truncated viral genomes integrated into multiple chromosomes were identified in two HPV16+ cell lines. The dispersion of characteristic viral variants to multiple integration sites indicates that viral deletions formed as extrachromosomal DNA (a phenomenon we term HPV superspreading). In addition, we identified an HPV16+ cell line with unintegrated (episomal) DNA that has tandem arrays of full-length, truncated, and rearranged HPV16 genomes (multimer episomes). Cytogenetic analysis of this cell line shows intense extrachromosomal HPV staining, including structures resembling double-minute chromosomes. WGS of HPV16+ cervical tumor samples from Latin America revealed that 11 of 20 tumors with only episomal HPV (EP) had intact monomer episomes. The remaining nine EP tumors had multimer and rearranged HPV genomes. The majority (80%) of HPV rearrangements and deletions disrupted the E1 and E2 genes, and EP tumors overexpressed the E6 and E7 viral oncogenes, a similar profile to tumors with HPV integration. Tumors with putative multimer HPV integrations display HPV multimers and concatemers of human and viral sequences. Our data uncovered a novel mechanism for HPV16 to cause cancer without integration through aberrant episomal replication, forming rearranged, mutated, and multimer episomes.
Multimers of the HPV genome are generated in cervical tumors replicating as extrachromosomal episomes, which is associated with deletion and rearrangement of the HPV genome and provides a mechanism for oncogenesis without integration.
Introduction
Human papillomavirus (HPV) causes more than 90% of cervical cancer, resulting in at least 300,000 deaths per year, mostly in low-and-middle-income countries (LMIC; refs. 1, 2). Two high-risk (HR) types, HPV16 and HPV18, are responsible for more than 70% of precancerous cervical lesions and advanced cancers (3). However, genetic variation within HPV types is associated with histologic type of cervical cancer [adenocarcinoma vs. squamous cell carcinoma (SCC)], integration rate, and carcinogenicity (4–6).
The HPV genome replicates inside the nucleus of the host cell as an episome, a form of extrachromosomal DNA (ecDNA). The circular, approximately 7,900 base pair (bp) genome encodes two critical oncoproteins, E6 and E7, which inhibit the tumor suppressors TP53 and RB1 (7–9). E6 and E7 oncogene expression is controlled in part by the E2 protein (10), and E1 and E2 regulate replication of the viral genome (11). In most cancers, the viral genome integrates into the host DNA deleting all or portions of the E1 and E2 genes (12). Thus, most integration events retain a truncated portion of the HPV genome containing the viral upstream regulatory region (URR) and the E6 and E7 genes (type I integration), In contrast, a subset of tumors integrates with intact copies of the viral genome (type II integration; ref. 13) and extrachromosomal human-HPV hybrid molecules have been proposed (14–17)
Inactivation of E1 and E2 inhibits DNA synthesis at the HPV origin of replication and releases repression of E6 and E7 expression. HPV integration can be associated with amplified viral and flanking cellular genes at the integration site (18–22) and lead to activation of flanking human genes (23). Integration can occur throughout the genome but mainly occurs in transcriptionally active regions. Recurrent sites have been found in or near specific genes (5, 21, 22, 24–26).
Viral integration is not a part of the HPV lifecycle and does not occur in all tumors. While HPV18 and HPV45 integrate into nearly 100% of tumors, the integration rate of HPV16 is only 60% to 80% (5, 16, 25, 27–30). Some episomal-only HPV-positive tumors have mutations in the URR of HPV (31, 32), and HPV16 regulation is altered in cellular models of episomal infection (33). To investigate the mechanism of HPV16 carcinogenesis in the absence of integration, we applied multiple long-read DNA and RNA sequencing (RNA-seq) strategies to cell lines with and without HPV integration (Supplementary Fig. S1A). Finally, we used these methods to study a well-curated set of HPV16 tumors with episomal DNA. Our results provide insight into the extrachromosomal replication of HPV and carcinogenesis.
Materials and Methods
Patients/informed consent
This study was conducted at the Instituto de Cancerología (INCAN) in Guatemala City and the Hospital Central Universitario, Venezuela. The Research Ethical Committees of both institutions approved the protocol, and the study was exempt from Institutional Review Board approval by the NIH Office of Human Studies Research. Women gave written informed consent.
Cell culture, DNA, and RNA extraction
Cell lines were obtained from ATCC and the Korean Cell-Line Bank and Cancer Research Center, Seoul National University (Seoul, South Korea; ref. 34), verified by Identifiler, checked every 6 months for Mycoplasma. Cells were cultured in Eagle minimum essential medium (EMEM) or RPMI640 media with 10% FBS 1% penicillin–streptomycin (10,000 U/mL of penicillin, 10,000 μg/mL of streptomycin, and 25 μg/mL of amphotericin B). DNA was extracted using a Gentra Puregene Kit from Qiagen or Circulomics Nanobind HMW DNA Kit. RNA was prepared from 30 million cells using TRIzol (Thermo Fisher Scientific) and Poly-A+ RNA purified by DYNAL Dynabeads (Invitrogen). Tumor DNA and RNA were simultaneously purified from 1 cm of tumor tissue using the DNeasy Blood & Tissue Kit or RNeasy Mini Kit. DNA was quantitated by Nanodrop (Thermo Fisher Scientific) and Qubit (Thermo Fisher Scientific) and stored at 4°C; RNA was quantitated by Qubit and stored at −80°C.
Determination of HPV integration
Long-read DNA sequencing
For ligation sequencing, 1 μg of cell line DNA was sheared to 8–20 kb with a G-tube (Covaris) or used unsheared with the LSK-109 kit. Targeted sequencing of HPV16 was carried out in CaSki and SiHa cells using CRISPR probes to HPV16 probes (Supplementary Table S1). For transposase sequencing of cell lines, 1 μg was used with the Rapid Sequencing kit (SQK-RAD004, Oxford Nanopore Technologies). CaSki and SNU-1000 DNA was also prepared using the ultra-long protocol of Circulomics and the ultra-long DNA Sequencing kit (SQK-ULK001, Oxford Nanopore Technologies) with and without adaptive sampling (35, 36) using a combined human HG38/HR HPV FASTA file selecting for cancer genes, integration loci, and HPV.
Tumor DNAs (0.3 μg) were sequenced using the Rapid Barcoding kit (SQK-RBK004, Oxford Nanopore Technologies). A total of 30 to 50 fmol/L of DNA was loaded onto MinIon R9.4 flow cells (Oxford Nanopore Technologies).
Full-length RNA sequencing
Cell line RNA (500 ng Poly-A+) was sequencing using the Direct RNA sequencing kit (SQK-RNA002, Oxford Nanopore Technologies) and the Direct cDNA kit (DCS109, Oxford Nanopore Technologies). Tumor RNA (50-ng total RNA) was sequenced with the PCR-cDNA Barcoding kit (SQK-PCB109, Oxford Nanopore Technologies) on MinIon R9.4 flow cells. Data for all sequencing runs is displayed in Supplementary Table S2 and coverage for tumors in Supplementary Table S3.
PCR and Sanger sequencing
Primers were designed with Primer3 (37). Overlapping primers spanned the HPV16 and HPV18 genomes. Cell line DNA was amplified using a long-range PCR kit from New England BioLabs. Sanger sequencing was performed using an Applied Biosystems 3500xL Genetic Analyzer from Thermo Fisher Scientific.
Cytogenetics and in situ hybridization
Spectral karyotyping (SKY) was performed as previously described (38) while HPV FISH (Enzo Life Sciences) was performed per the manufacturer's recommendations. Metaphase chromosome suspensions were prepared by treating cells with a hypotonic solution (0.075 mol/L KCl), followed by fixation using methanol:acetic acid [3:1 (v/v)]. Slides were prepared by dropping a small amount of suspension onto slides using a Thermotron chamber (Thermotron) to control humidity. Slides were aged for 1 week at 37°C before hybridization. Chromosome preparations were hybridized for 72 hours with in-house SKY probes and imaged using a Leica DMRXE microscope (Leica) equipped with 4',6-diamidino-2-phenylindole (DAPI) and SKY filters (Chroma), a xenon lamp, and a spectracube (Applied Spectral Imaging). To perform HPV–FISH, we used the PATHO-GENE probe HPV screening probe - ENZ-32884 (Enzo) and the tyramide signal amplification kit (Invitrogen). We also performed a nonamplified detection of HPV-FISH after we performed SKY washing off the SKY paints and rehybridized it with the PATHO-GENE probe HPV screening probe - ENZ-32884. We then detected using our standard FISH detection protocol using BSA blocking followed by incubation with Cy3 Streptavidin (Jackson ImmunoResearch Laboratories).
For FISH analysis, slides were evaluated using the Leica Thunder Imager microscope with a coolLED lamp and a DMC4500 digital camera (Leica). Leica Application Suite X 3.7.0.20979 software was used to acquire cell images. For each cell line, 18 to 20 metaphase spreads were acquired for SKY and scored for numerical and structural chromosomal aberrations according to established human chromosome nomenclature rules from ISCN (International Standing Committee on Human Cytogenetic Nomenclature 2009).
Bioinformatics and statistical analysis
The fastq files were aligned to the human (HG38) or HPV genomes using EPI2ME (https://epi2me.nanoporetech.com) Fastq Human Alignment GRCh38 or Fastq Custom Alignment app with a HR HPV type fasta file. Reads were merged in excel or in filemaker (Claris) and merged with read length data and manually extracted and mapped using BLAT (https://genome.ucsc.edu) or BLAST.
BAM files were merged and indexed using the BamTools Merge and SAMtools Index tools in the Cancer Genomics Cloud (CGC; ref. 39). An Oxford Nanopore Technologies WGS data processing pipeline was run on align human and HPV reads, produce BAM files.
DNA reads were also analyzed using Guppy/4.5.4 (https://nanoporetech.com/). Modified base-calling was performed using Megalodon/2.3.3 https://github.com/nanoporetech/megalodon. Structural variation calling was carried out with the Nanopore pipeline-structural-variation. The workflow is available at https://github.com/NCI-CGR/Nanopore_DNA-seq. RNA reads were base-called using BINITO /v0.3.7 and aligned to the HG38 genome using Minimap2/2.17. Isoforms were detected and quantified using Stringtie2/2.1.5 (40) and Freddie (https://www.biorxiv.org/content/10.1101/2021.01.20.427493v1.full; https://github.com/vpc-ccg/freddie). The Freddie program calls the Gurobi package (www.gurobi.com) to solve optimization problems. The entire workflow is available at https://github.com/NCI-CGR/Nanopore_RNA-seq.
Statistical analysis was performed in GraphPad.
Data availability
Cell line sequencing data are available at the sequence read archive PRJNA772772 and cervical tumor samples at: dbGaP Study: phs002810.v1. The data analyzed in this study were obtained from TCGA at cbioportal (https://www.cbioportal.org/)and from Supplementary Data in ref. 4. All other raw data are available upon request from the corresponding author.
Results
HPV type controls integration frequency
HPV types have varying frequencies of integration (5, 41). To explore integration frequency by HPV type, we compiled data from TCGA and Guatemalan tumors merging type I and type II integrations into a single ‘integrated’ class (4, 27, 42). We confirmed that HPV18 and HPV45 integrate into nearly 100% of tumors, while HPV16 and HPV31 integrate into 60% to 70% of tumors (Fig. 1A). HPV18 and 45 are in the alpha-7 clade and HPV16 and 31 the alpha-9 clade (43), indicating that viral genetics in part determines integration rate.
Three classes of HPV16 tumors have been described: (i) those with only episomal DNA, (ii) integrated only tumors (type I and III) tumors with either type II integrations or both integrated and episomal DNA (27, 44, 45). However, hybrid human and HPV ecDNA has been proposed to be present in tumor previously classified as containing both episomal and integrated DNA (14, 15). To better understand these different tumor classes, we applied long-range and single-molecule sequencing using methods applicable to linear and circular DNA (Supplementary Fig. S1A). We first sequenced SiHa cervical cancer cells (46), known to have a single locus of integration on chromosome 13 (18, 47). Using both WGS and CRISPR-targeted sequencing, we constructed a 54-kb contig containing the 7652 bp portion of the integrated HPV16 genome, confirming the rearrangement of flanking human DNA (Supplementary Fig. S1B). Therefore, long-read and targeted sequencing can resolve the structure of HPV integration sites.
Long-read DNA sequencing identifies complex integrated HPV multimers
To understand the structure of complex integration events, we performed long-read DNA sequencing of the CaSki cervical cancer cell line containing 800 copies of integrated HPV16 at 30 to 40 chromosomal sites (18, 19). We used sheared and unsheared DNA to achieve a range of read lengths. Readings of up to 67 kb were obtained containing HPV and human sequences, and 28 out of 35 (80%) recurrent junctions matched those seen previously using short-read WGS (Fig. 1B; Supplementary Table S4; ref. 18). The HPV sequences in these reads are concatemers of:
- (i)
a full-length genome (genome A).
- (ii)
a previously described 6.5 kb genome with a 1.4-kb deletion (genome B; ref. 19).
- (iii)
HPV16 fragments of other sizes.
We obtained reads of up to 102 kb with only HPV16 sequences (Fig. 1C) with concatemers of genomes A and B and smaller fragments, some of recurrent structure (Supplementary Fig. S2). Reads aligning to the same human-HPV junction assembled into contigs, but there was no identifiable pattern in the order of genomes A and B at different chromosomal locations. We constructed libraries with an ultra-high molecular weight protocol, confirmed many junctions and obtained HPV-human reads of >340 kb and HPV-only reads >240 kb (Supplementary Fig. S2). Therefore, a complex duplication and assembly process likely generated the concatemers.
We also analyzed complex integrations in a head and neck squamous cell carcinoma (HNSCC) cell line, SCC152. This line was derived from a relapsed tumor obtained one year after the SCC090 cell line was established from the same patient (48). SCC090 contains 200 to 500 copies of HPV16 integrated at chromosomes 2, 3, 6, and 9 (18). Our sequencing of SCC152 revealed nine integrated loci, supported by multiple long reads, on chromosomes 2, 3, and 9 (Supplementary Table S5). These integration sites were also identified in SCC090 and, therefore, retained in the relapsed tumor, with the chromosome 9 locus being the only transcribed site. Most integrated loci contained HPV16 arrays composed of (i) full-length genomes, (ii) genomes containing a 163-bp deletion, and (iii) genomes with both the 163-bp and a 367-bp deletion (Fig. 1D).
The 367-bp deletion in SCC152 was observed over 95% of the time with the 163-bp deletion, suggesting that the 163-bp deletion appeared first and the 367-bp deletion occurred on a 163-bp deletion-containing genome. Both deletions are in the URR region, with the 163-bp deletion removing part of the intermediate enhancer region and the 367-bp deletion a portion of the distal region (see below). As with CaSki cells, we observed a random order of the different HPV16 forms integrated into SCC152 cells at distinct genomic locations, although in SCC152, there are very few full-length genomes. The data from both CaSki and SCC152 suggest that deleted forms of HPV formed as HPV ecDNA, present as viral concatemers, before integration. We term this model of episomal amplification, episomal deletion, and rearrangement, followed by integration at multiple chromosomal locations “HPV superspreading.”
A cell line with episomal HPV16 displays multimer and deleted episomes
To further understand the formation of multimer episomes and the mechanisms of HPV superspreading, we searched for cell lines with episomal HPV. The SNU-1000 cell line was established from a cervical squamous cell carcinoma isolated from a 43-year-old Korean patient and published as having episomal and integrated forms of HPV16 (34). Long-read WGS of SNU-1000 identified a 150 bp HPV fragment integrated on chromosome 11q in the intron of the CEP126 gene. This HPV fragment contains a portion of the E7 oncogene but cannot encode a functional E7 protein (Fig. 2A). Read count analysis across chromosome 11 identified genes located 5′ to the integration site, such as the progesterone receptor (PGR) amplified 10-fold, and the YAP1, BIRC2, and BIRC3 genes, 3′ to the integration, amplified 25-fold (Fig. 2A).
The region distal to 11q22.2 (103–140 Mb) is present at a low copy number from the WGS data, and we identified multiple long reads joining chromosome 11 at 102,657 kb to 102,663 kb and 102,898 kb to 102,913 kb (Fig. 2B). Therefore, the chromosome 11 HPV16 integration is accompanied by amplification of the locus and rearrangement of the flanking human DNA. As we did not see chromosome 11q joined to another chromosome, this amplified DNA appears to lack a telomere. Figure 2C presents a model in which these sequences are in circular, extrachromosomal structures.
Sequencing of SNU-1000 DNA was performed with both standard adapter ligation onto linear DNA molecules and insertion of transposase adaptors into linear and circular molecules (Supplementary Fig. S1A). In both cases, we identified large HPV16-only reads consistent with episomal DNA. We obtained transposon-tagged, HPV-only reads of 7.9 kb representing monomer episomes (Fig. 2D). In addition, there were numerous reads longer than 7.9 kb containing concatemers of full-length genomes, a 634 bp deleted genome, and other truncated genomes (Fig. 2E). Some reads contained only full-length genomes consisting of monomer episomes, dimers, and higher order concatemers. Others contained complex patterns of full-length and deleted forms, indicating aberrant replication of episomal DNA. The 634-bp deletion removes a portion of the C terminus of the E1 gene and the N-terminus of E2. Therefore, SNU-1000 displays a full spectrum of episomal monomers, multimers, deletions, and complex concatemers as extrachromosomal DNA. Episomal deletion of the E1 and E2 genes provide a mechanism for HPV16 transformation without integration.
To delineate the karyotype and to confirm the presence of extrachromosomal HPV, we performed SKY and HPV–FISH on metaphase preparations of SNU-1000 and SNU-1245 cells. Figure 3A and B display SKY and HPV–FISH on the same SNU-1000 metaphase. HPV–FISH resulted in very intense hybridization signals outside of the chromosomes, or in some cases adjacent to or on top of chromosomes (Fig. 3B; Supplementary Figs. S3 and S4). In some instances, the extrachromosomal HPV DNA resembled double-minute chromosomes in size and structure (Fig. 3B, inset). In contrast, a near-tetraploid cell line with a single copy of integrated HPV18 according to our sequencing data, SNU-1245, showed far less intense signals on a specific chromosome (chromosome 1; Supplementary Fig. S3B), in agreement with DNA sequencing data. The karyotype of SNU-1000 revealed extensive diversity between metaphases with a range of 52–76 chromosomes (median 68) exhibiting multiple translocations, and deleted chromosomes (Fig. 3C; Supplementary Tables S6–S8). A der(11)t(11;13)(q23;q21) chromosome was observed in all 18 metaphases examined, with an apparent loss of 11q23-ter, consistent with the copy number analysis (Fig. 2A). In conclusion, SNU-1000 revealed complex karyotype aberrations and heterogeneity, a 150-bp fragment of integrated HPV16, and a large and variable number of copies of extrachromosomal HPV.
Identification of a cell line with integrated HPV18 multimers
Because of the high rate of integration of HPV18 in cancers, we were surprised to find a cell line, SNU-1245, reported to have episomal HPV18 (34). However, in situ hybridization of HPV revealed a signal on chromosome 1 as well as an acrocentric chromosome (Supplementary Fig. S3B). SKY analysis of 17 metaphases yielded a median chromosome count of 90 (78–96) with multiple chromosomes showing translocations, deletions, and insertions (Fig. 3D; Supplementary Tables S7 and S8). Long-read WGS and targeted sequencing using computer-guided sequence selection (adaptive sampling) revealed that SNU-1245 has a single multimer integration of HPV18 on chromosome 1q32.2 (Fig. 4A). Our sequencing results showed that the locus contains one full-length copy of HPV18 and three HPV18 fragments, a type II integration. The HPV sequence block totals 16,615 bp, and this region of chromosome 1 is amplified approximately five times (Fig. 4B). In addition, the flanking segments of human DNA are rearranged, consistent with a looping amplification mechanism, as reported previously for HPV16 and HPV18 cell lines and tumors (18). Therefore, the structure of the SNU-1245 locus suggests that HPV18 can also undergo multimer amplification of episomal HPV DNA, followed by integration.
Episomal cervical cancer cells have an HPV expression profile like integrated cells
To study the expression pattern of episomal and integrated HPV, we used full-length direct cDNA and direct RNA-seq to quantify the HPV16 transcripts in the SNU-1000 cell line and HPV18 transcripts in SNU-1245 (Fig. 4C). No HPV-Human hybrid transcripts were observed in either cell line, indicating that HPV expression is within the HPV concatemers. Supplementary Table S9 and Supplementary Fig. S5 shows that the most abundant HPV16 transcript (transcript B) in SNU-1000 cells encodes the spliced E6*I form of E6, E7, E4, and E5 and accounts for 78% of the transcripts. There is a very low abundance of mRNAs capable of encoding E1 or E2 in SNU-1000, and the percentage of unspliced E6 transcripts in SNU-1000 is in the range of six other cell lines with integrated HPV16 (Supplementary Fig. S6; Supplementary Table S10). In the SNU-1245 line, there is a nearly even balance between transcripts encoding the full-length E6 and E6 spliced forms. Furthermore, there is a higher level of transcripts capable of encoding E2 and E1 (transcripts 4, 5, and 6; Fig. 4D). Therefore, the episomal HPV16 in SNU-1000 cells displays an expression profile like that in integrated HPV16 cell lines.
Multimer episomes are present in cervical tumors
To analyze the structure of HPV16 episomes in cervical tumors, we performed Oxford Nanopore Technologies tagmentation sequencing (Supplementary Fig. S1A) on tumors previously classified as episomal (EP) only or episomal and integrated by HPV capture and deep short-read sequencing (27). In that assay, tumors were classified as EP if they had deep HPV sequence coverage with no gap in HPV coverage and no recurrent human-HPV junction. We have reclassified the episomal and integrated tumors as putative type II integrated to reflect current understanding (14). The age range, PIK3CA mutation status, histology, HPV type, and HPV16 sublineage are shown in Fig. 5A. From 28 HPV16 EP tumors, we obtained total coverage of at least one HPV16-genome to be able to examine the structure of the 20 episomal tumor genomes. None of these tumors had HPV/human junction reads, or human–HPV hybrid transcripts consistent with their status as EP. For eight of these tumors, the longest reads began and ended at nearly the same position on the 7906-bp viral genome and had no insertions, deletions, or rearrangements (Supplementary Table S11). This result is consistent with these molecules representing circular episomes, tagged in random positions by the transposon (Fig. 5B). These tumors and those with multiple reads all less than 7906 bp were tentatively classified as monomer-only tumors. Therefore, approximately one-half of EP tumors appear to retain a monomer episome.
A total of nine additional EP tumors had HPV16 DNA reads larger than 7.9 kb, representing putative multimer episomes, or had rearranged HPV–HPV junctions. Three of these tumors had sequences of 15.8 kb with two complete copies of the HPV16 genome, likely representing HPV dimers (Fig. 5C; Supplementary Table S12). As with the monomers, these reads start and stop at nearly the same position on the genome. One tumor, T393, had dimers with both 30 and 58 bp deletions in the URR (Fig. 1D). In addition, dimer reads with one or two copies of Δ58 or one copy of Δ30 and Δ58 were observed (Fig. 5D). The Δ30 and Δ58 deletions are in overlapping regions of the URR and delete an NF1-binding site and two YY1-binding sites (Fig. 1D). These two YY1-binding sites have previously been found deleted in HPV16 isolates from cervical cancer, leading to elevated activity of the P97 promoter (31). Therefore, as seen in the SNU-1000 cell line, deletions affecting key regulatory sites can propagate in episomes replicating in cervical tumors and give rise to multiple aberrant structures, contributing to transformation without integration.
Six other EP tumors had episomal DNA with rearrangement of HPV16 sequences (Fig. 6A). Interestingly, 7/8 of these tumors have a breakpoint inside the E1 (914–2666 bp) or E2 genes (2756–3853). These rearrangements could separate the E1 and E2 open reading frames (ORF) from the P97 promoter and lead to decreased expression of E1 and E2 or disrupt their reading frame. Most of these HPV16 rearrangement junctions were observed in our previously published HPV capture and ion torrent sequencing (27), performed on the same DNA samples (Supplementary Fig. S7). This data clearly shows that rearrangement of episomal DNA, with frequent inactivation of the E1 and E2 genes, is a common feature of HPV16-driven cervical tumors. In addition, transcriptome analysis of representative EP-only tumors revealed that E6/E7 transcripts are predominant and that nearly all splice inside the E6 gene (Supplementary Fig. S8A). Interestingly, integrated tumors had a significantly (P = 0.049) higher mean HPV RNA expression than episomal only tumors (Supplementary Fig. S8B).
Long-read WGS of putative type II integrated tumors gave a more complex picture. Despite having only 0.5× genome coverage, we identified the same HPV-human breakpoint in 13/27 tumors found by HPV capture and short-read sequencing (27). Nearly all tumors for which the integration was not confirmed by Oxford Nanopore Technologies sequencing yielded less than 10 HPV-containing reads. At least eight type II tumors have complex rearrangements of HPV-only sequences (Fig. 6B). Several tumors have 50–100 bp deletions internal to the HPV sequence or have multiple rearranged junctions. As with EP tumors, the majority (3/5) of HPV junctions are within the 865–3875 region of the HPV16 genome containing the E1 and E2 genes.
Tumor 429, a type II integrated tumor with integration on multiple chromosomes, contains a 63 bp duplication in the E1 gene that is a known viral variant (49). The deletion was seen twice each in single HPV-only molecules of 13 and 14 kb (Fig. 6C). In addition, the dup63 mutation was seen twice in a read anchored to chromosome X. Therefore, this tumor appears to have either both episomal and integrated HPV or large HPV multimers.
Discussion
HPVs are among the most oncogenic human cancer viruses, and HPV16 is the most carcinogenic type of HPV (1, 6, 50). While integration of viral DNA into the human genome is an important mechanism of HPV carcinogenesis, transformation without HPV integration is less well understood. Our data employing long-read and single-molecule sequencing of cell lines and cervical tumors reveal new aspects of HPV oncogenesis. By carrying out long-read WGS on well-characterized cell lines (SiHa, CaSki, and SCC152), we established that the methods reliably identify HPV sequences and the complex structure of integrated loci. Using HPV-containing reads of up to 347 kb, we show that all of the integrated loci in the CaSki cell line are composed of complex strings of full-length genomes, a recurrent 6.4 kb truncated genome and other smaller fragments. The finding of concatemers of similar structure and complexity integrated into multiple regions of the genome supports a model where the mixed concatemers of HPV genomes were generated as extrachromosomal DNA and subsequently inserted into the human genome. This model is further supported by HNSCC cell line, SCC152, which has unique Δ163 bp and Δ163/Δ367 bp HPV16 deletions. In addition, SCC152 has complex concatemers integrated at chromosomes 3 and 9. HPV–FISH data for CaSki and SCC-090, a precursor to SCC152, demonstrating that all HPV DNA is integrated (18). To explain the presence of HPV concatemers with unique deleted genomes on multiple chromosomes, we propose an “HPV Superspreading model.”
To support the model, we analyzed a cell line, SNU-1000, that stably replicates HPV16 episomal DNA (34). SNU-1000 contains only a 150 bp fragment of the HPV16 E7 gene integrated on chromosome 11q22.1, and is truly a cell line with episomal and integrated HPV16. Interestingly, this integration is within a 2.9-Mb amplified region containing the YAP1, BIRC2, and BIRC3 genes. To our knowledge, this is the first time an oncogene amplification has been found at a locus with an incomplete copy of the HPV oncogenes, a new mechanism of HPV carcinogenesis.
Except for the chromosome 11q22.1 integration site, there were no other recurrent human-HPV junction reads in SNU-1000. All other HPV16 sequences in SNU-1000 are HPV-only reads. We used HPV-FISH to demonstrate that the HPV concatemers are extrachromosomal and in some cases form structures that resemble double-minute chromsomes.
Using a transposon-based approach that can directly sequence linear and circular DNA, we recovered multiple reads from SNU-1000 that begin and end at nearly the same position on the HPV16 genome. These reads are almost certainly derived from intact monomer episomes. We also identified readings that represent intact dimers and multimers of full-length HPV16 genomes. Therefore, replication of the HPV16 monomer can lead to intact multimer genomes. In SNU-1000 we identified a 634-bp deletion, removing portions of the E1 and E2 genespresent in 15% of viral genomes exclusively in large multimer episomal structures. These multimer episomes are composed of concatemers of full-length HPV16 genomes, Δ634, and rearranged genomes.
Analysis of direct, full-length cDNA and RNA from SNU-1000 demonstrates that the predominant HPV transcript encodes the E6*I, E7, E4, and E5 proteins and low amounts of E1 and E2. Thus, in the absence of integration, rearranged and deleted HPV16 multimers appear to have resulted in the upregulation of E6 and E7 and downregulation of E1 and E2, like tumors with integrated HPV. Therefore, one mechanism of episomal HPV carcinogenesis is the episomal deletion of E1 or E2 (Fig. 6).
To understand the mechanisms of carcinogenesis in tumors without HPV16 integration, we studied tumors previously characterized as episomal-only. Nearly one half of these tumors have intact monomer episomes. Therefore, HPV16 is capable of causing cancer without integration or rearrangement of episomal DNA. Monomer-only tumors were mostly SCC and included HPV16 A1, D2, and D3 sublineages. RNA-seq data shows a transcription pattern dominated by the expression of a restricted set of mRNAs containing the E6 and E7 genes. Therefore, even though the entire HPV genome is present in the monomer-only EP tumors they display a viral mRNA expression pattern similar to integrated HPV16 tumors.
We identified a subset of episomal-only tumors with rearranged and deleted episomal HPV and 80% of these deletions and rearrangements have a breakpoint in the E1/E2 genes. Therefore, as multimer episomes expand, they may gain more replication origins, increasing the rate of expansion and deletion, and more carcinogenic episomes are selected. Therefore, HPV16 can cause cervical cancer without integration via a combination of episome deletion, mutation, and rearrangement and favoring E6/E7 oncogene expression.
Integrated forms of HPV nearly always retain the URR, and the origin of replication (ORI). As the E1 and E2 proteins promote replication at the ORI the presence of both episomal and integrated HPV presents a situation where E1 and E2 could promote unscheduled replication at the ORI in integrated HPV. Orav and colleagues and Peter and colleagues observed DNA synthesis at integrated ORIs in transfected cells and proposed that this “rereplication” leads to local amplification of HPV and human DNA, the formation of viral/human DNA–containing ecDNA, and chromosome translocations (22, 51). Akagi and colleagues used WGS to characterize HPV integration sites with local amplification and proposed a looping model causing the amplification of viral and flanking human DNA (18). Finally, Kim and colleagues demonstrated that ecDNA is frequent in solid tumors and 12% of cervical tumors (52). We reanalyzed this data and found that 38% of integrated HPV16 cervical tumors are predicted to have ecDNA (Supplementary Table S13). Several papers have proposed the presence of ecDNA, containing human and HPV sequences, in HPV+ HNSCC tumors (15, 17, 53). However, our sequencing data does not directly address these postintegration events. We do show that rearrangement and amplification of HPV can occur before integration. Deleting E1 and E2 in episomal DNA, and epigenetic silencing, may permit an integration locus with an ORI to remain stable in the presence of episomal DNA.
There are several methods to identify the presence of HPV integration, either using DNA or RNA-based analyses. We used the presence or absence of an HPV/human DNA junction following HPV capture and deep short-read sequencing to determine integration status. We classified tumors without a detected integration and an intact viral genome as episomal-only. We used overlapping amplicons spanning the HPV genome to detect deletions and classify tumors with complete deletions in the E1/E2 region as type I integrated. The classification of tumors as episomal and integrated is more problematic. Tumor cell lines such as CaSki and SCC152 can have full-length copies of HPV16 integrated into the genome (type II integration) and therefore could be confused as EP/INT. SNU-1245 was classified as EP/INT–based on an intact E2 gene; however, it contains a full-length copy of HPV18 integrated into the genome and no detectable episomal DNA. Morgan and colleagues argue that, at least for HNSCC, the EP/INT category is mischaracterized and many of these tumors have viral/human ecDNA (14). Our data show that putative type II tumors can have apparent monomer, multimer, or rearranged HPV-only sequences, however, we did not obtain deep enough coverage to fully classify these tumors and without very deep, long-read sequencing, it is not possible to determine if this is the exact state of these sequences.
This study has several limitations, including that some of the analyses involve a small set of cell lines and only one cell line has episomal HPV16. But, there have been few HR HPV-containing cell lines with episomal DNA described. SNU-1000 may be unique in having amplified YAP1, a potent cervical cancer oncogene (54).and YAP1 overexpression might aid in the generation of additional episomal HPV models. Due to limitations in DNA quantity and quality, we obtained relatively low coverage WGS of tumors, and did not select for circular DNA. Therefore, we have not detected all HPV-containing species, and our tumor classifications are incomplete. However, to our knowledge, this is the first study to apply long-read sequencing to the study of episomal HPV-containing cervical tumors.
In conclusion, we find that multimers of the HPV genome are generated in cervical tumors replicating as extrachromosomal episomes. This HPV replication is associated with deletion and rearrangement of the HPV genome and provides a mechanism for oncogenesis without integration. We provide confirmation by DNA and RNA sequencing that a subset of HPV16-containing tumors has only episomal viral DNA. About half of episomal tumors have intact monomer episomes and an expression pattern dominated by E6/ E7 expression. Another subset of episomal tumors has rearranged episomes, often deleting the E1 and E2 genes. Our data support a model of HPV replicating as ecDNA, accumulating rearrangements leading to the integration of rearranged HPV multimers in the human genome (Fig. 7). This process parallels the well described amplification of oncogenes and drug resistance genes as ecDNA or double minute chromosomes that integrate as homogenously stained regions (HSRs; 55). And we observed HPV16 forming double minute-like structures. Further study of HPV extrachromosomal amplification and integration may provide insight into gene amplification and ecDNA formation across cancer types.
Authors' Disclosures
No disclosures were reported.
Disclaimer
The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Authors' Contributions
N.M. Rossi: Conceptualization, investigation, writing–review, and editing. J. Dai: Investigation, writing–review, and editing. Y. Xie: Investigation, writing–review, and editing. D. Wangsa: Investigation, writing–review, and editing. K. Heselmeyer-Haddad: Investigation, writing–review, and editing. H. Lou: Investigation, writing–review, and editing. J.F. Boland: Investigation and methodology. M. Yeager: Conceptualization, resources, and supervision. R. Orozco: Resources, investigation, and project administration. E. Alvirez Freites: Conceptualization, resources, and project administration. L. Mirabello: Conceptualization, resources, supervision, writing–review, and editing. E. Gharzouzi: Resources and supervision. M. Dean: Conceptualization, resources, data curation, supervision, funding acquisition, investigation, writing–original draft, project administration, writing–review, and editing.
Acknowledgments
Thanks to Lineth Boror, Ester Avila, and Patricia Zaid for sample collection and Thomas Ried, Monolina Binny, and Dave Roberson for helpful discussions. The authors acknowledge the research contributions of the Cancer Genomics Research Laboratory for their expertise, execution, and support of this research in the areas of project planning, wet laboratory processing of specimens, and bioinformatics analysis of generated data. This project has been funded in whole or in part with federal funds from the NCI, NIH, under NCI contract no. 75N910D00024. The authors are grateful for the use of the NIH Helix Biowulf computing facility. The Seven Bridges Cancer Research Data Commons Cloud Resource has been funded in whole or in part with federal funds from the NCI, NIH, contract no. HHSN261201400008C and ID/IQ agreement no. 17×146 under contract no. HHSN261201500003I and 75N91019D00024.
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).