Oncogenic Merkel Cell Polyomavirus T Antigen Truncating Mutations are Mediated by APOBEC3 Activity in Merkel Cell Carcinoma

Merkel cell carcinoma (MCC) is an aggressive skin cancer, which is frequently caused by Merkel cell polyomavirus (MCPyV). Mutations of MCPyV tumor (T) antigens are major pathologic events of virus-positive (MCPyV+) MCCs, but their source is unclear. Activation-induced cytidine deaminase (AID)/APOBEC family cytidine deaminases contribute to antiviral immunity by mutating viral genomes and are potential carcinogenic mutators. We studied the contribution of AID/APOBEC cytidine deaminases to MCPyV large T (LT) truncation events. The MCPyV LT area in MCCs was enriched with cytosine-targeting mutations, and a strong APOBEC3 mutation signature was observed in MCC sequences. AICDA and APOBEC3 expression were detected in the Finnish MCC sample cohort, and LT expression correlated with APOBEC3H and APOBEC3G. Marginal but statistically significant somatic hypermutation targeting activity was detected in the MCPyV regulatory region. Our results suggest that APOBEC3 cytidine deaminases are a plausible cause of the LT truncating mutations in MCPyV+ MCC, while the role of AID in MCC carcinogenesis is unlikely. Significance: We uncover APOBEC3 mutation signature in MCPyV LT that reveals the likely cause of mutations underlying MCPyV+ MCC. We further reveal an expression pattern of APOBECs in a large Finnish MCC sample cohort. Thus, the findings presented here suggest a molecular mechanism underlying an aggressive carcinoma with poor prognosis.


Introduction
Merkel cell carcinoma (MCC) is a rare but aggressive skin cancer mostly affecting the elderly and/or immunocompromised. The carcinoma has a poor prognosis and at least two distinct etiologies. Approximately 80% of the tumors are caused by Merkel cell polyomavirus (MCPyV), which is clonally integrated in the genome; these tumors have a low mutation burden and lack UV mutation signatures (1)(2)(3)(4)(5)(6)(7). Unlike virus-positive (MCPyV+) MCCs, virus-negative (MCPyV−) MCCs have a high mutation burden and a UV mutation signature commonly seen in skin cancers (8)(9)(10). In addition to mutation accumulation, MCPyV+ and MCPyV− MCC differ in cell morphology and possibly the cell of origin (11,12).
In MCPyV+ cancers, the MCPyV small tumor antigen (ST) and large tumor antigen (LT) cooperate to promote transformation and growth of the host cells antigen receptor gene recombination, and are shown to recombine their Ig genes, such as occurs in developing B cells (21). In addition, both MCPyV+ and MCPyV− MCCs are shown to express activation-induced cytidine deaminase (AID), a member of the AID/APOBEC family of cytidine deaminases (6,28). AID is expressed almost exclusively in B cells, where it initiates somatic hypermutation (SHM) and class-switch recombination (CSR) in Ig genes. Being a powerful mutator, it is involved in the development of several cancer types, such as lymphoma, skin cancers (squamous cell carcinoma, basal-cell carcinoma, and melanoma), gastric cancers, and hepatocellular cancers. In particular, members of the APOBEC3 subfamily participate in antiviral defense (29,30) and mutate viral genomes (31,32), and they contribute to carcinogenesis in, for instance, breast, cervical, and lung cancer (8,29). APOBEC3A, APOBEC3B, and APOBEC3H are most tightly linked to carcinogenesis due to their frequent expression in cancers, nuclear localization, and ability to deaminate cytidines in genomic DNA (31,33,34). In addition, polyomaviruses have been shown to induce APOBEC3 expression (35,36).
In Ig genes, AID-induced SHM is targeted to a 250-1,500 bp region downstream of the transcription start site, which codes for the variable domain of antibodies. SHM is targeted to the Ig loci by Ig enhancers and enhancer-like sequences (37). SHM is known to target several non-Ig genes outside the Ig locus, a phenomenon which is intimately linked to lymphoma (38)(39)(40)(41)(42).
We sought to investigate whether mutations in MCPyV LT could arise from SHM-like cytidine deamination. We found an APOBEC3 signature enriched in the LT area of MCC sequences, as well as APOBEC expression in a Finnish MCC sample cohort. We also found AICDA expression in a subset of Finnish MCC tumors and marginal SHM recruiting activity in the MCPyV regulatory region upstream of LT in a B-cell model, but the AID mutation signature was not enriched in LT sequences. We conclude that APOBEC3s, rather than AID, are mostly responsible for LT mutations in MCC.

Cloning Procedure
The sequences tested for SHM targeting activity were cloned into a GFP4-expressing vector either at a NheI/SpeI site (upstream position) or at a BamHI site (downstream position). The tested sequences were first amplified from MCV-HF (RRID:Addgene_32057; ref. 43) using Q5 High-Fidelity DNA Polymerase (NEB). For downstream cloning, the GFP expression cassette was amplified from the GFP4 vector. Primers were designed according to the In-Fusion primer design protocol. Cloning was performed with the In-Fusion cloning kit (Takara) according to the manufacturer's protocol. The cloned plasmids were isolated with the GeneJET Plasmid Miniprep Kit (Thermo Fisher Scientific), and the success of cloning was checked by restriction enzyme digestion. After successful cloning, new plasmid isolations were made with the ZymoPure II Maxiprep kit (Zymo Research).

Transfection and Cell Culture
The chicken B-cell line DT40 (RRID:CVCL_0249) with modifications (UNG −/− AICDA R/puro ; described in ref. 37) was received from David G. Schatz. The cells were authenticated by testing their sensitivity to blasticidin and puromycin after targeted integration of the reporter construct, by Western blotting of AID expression, and functionally by carrying out the GFP loss assay with control reporters (Fig. 2B). The cells were cultured for approximately 25 days during the assay periods at +40°C, 5% CO2, and 90% humidity. Growth media included RPMI1640 HEPES modification (Sigma) with 10% FBS (HyClone), 1% NCS (Biowest), 1x penicillin-streptomycin antibiotic (Gibco), 1x L-glutamine (Gibco), and 50 μmol/L β-mercaptoethanol. The cells were tested for lack of Mycoplasma on January 9, 2020 using PCR Mycoplasma test kit I/C (PromoCell).
The GFP loss assay was performed as described in ref. 37. First, 12 × 10^6 cells were transfected with 50 μg of NotI-linearized plasmid containing the test sequence. As a negative control, a plasmid carrying the GFP reporter without a test sequence was included. Transfection was performed with a GenePulser electroporator (Bio-Rad) using settings 0.7 kV, 200 Ω, and 25 μF. Immediately after transfection, cells were transferred to 96-well plates in growth media supplemented with an extra 5% FBS. The day after transfection, blasticidin was added at a final concentration of 15 μg/mL to select the transfected cells. Cells were grown for approximately 7 days before primary clones were picked and tested for targeted integration using puromycin selection (final concentration 1 μg/mL). At least one targeted clone per test sequence was then subcloned by limiting dilution and cultured for 12 days before flow cytometry analysis.

Mutation Analysis
A total of 113 MCPyV LT sequences from MCC samples and 83 MCPyV LT sequences from healthy skin samples submitted to GenBank NCBI Virus database by the year 2020 were analyzed. The sequences varied in length between 65 and 2,885 bases. Sequences were aligned to the reference MCPyV isolate R17b (NCBI:txid 493803) sequence using SnapGene software (RRID:SCR_015052).
All single-bp substitutions and insertion/deletion mutations relative to the MCPyV reference genome were included in the analysis. To avoid overrepresentation of MCPyV strain-specific variants, each mutation was counted once even if it occurred in multiple sequences. The number and distribution of single-bp substitutions, as well as of mutations leading to a stop codon, were analyzed. WRC, TCW, and YCC motifs and mutations affecting the target cytidine of each motif were counted. These motifs were selected to represent AID, APOBEC3, and UV hotspots, respectively, because they are mutually exclusive and thus give better resolution for mutation analysis; trinucleotide sites were used for all three to achieve better comparability. Mutation frequency was calculated in 200 bp bins: the number of mutations in each bin was divided by the number of sequences analyzed for that bin. Because some analyzed sequences were truncated or otherwise partial, this bin-based approach prevented bias caused by the differing number of sequences available along the LT area. Graphs were made using GraphPad Prism 9 software.
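The hotspot classification and the bin-based frequency described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this study. The function names, the toy reference sequence, and the coverage intervals are hypothetical, and the sketch makes three assumptions: the targeted cytidine is the last base of WRC and the middle base of TCW and YCC, only the given strand is scanned, and a sequence counts toward a bin if its aligned range overlaps that bin.

```python
from collections import Counter

# IUPAC codes used by the motifs: W = A/T, R = A/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}

# label -> (motif, 0-based index of the targeted C inside the motif).
# The index chosen for each motif is an assumption of this sketch.
HOTSPOTS = {
    "AID (WRC)": ("WRC", 2),
    "APOBEC3 (TCW)": ("TCW", 1),
    "UV (YCC)": ("YCC", 1),
}


def matches(context, motif):
    """True if the sequence context fits the IUPAC motif base by base."""
    return len(context) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(context, motif)
    )


def classify_site(ref, pos):
    """Return the hotspot label for the C at 0-based position pos, else None."""
    if ref[pos] != "C":
        return None
    for label, (motif, c_index) in HOTSPOTS.items():
        start = pos - c_index
        if start >= 0 and matches(ref[start:start + len(motif)], motif):
            return label
    return None


def binned_frequency(mutations, coverage, bin_size=200):
    """Mutations per covering sequence in each bin.

    mutations: iterable of (1-based position, alternative base); duplicates
               are collapsed so that each distinct mutation is counted once.
    coverage:  list of (start, end) 1-based inclusive aligned ranges, one per
               analyzed sequence.
    Returns {bin start position: frequency} for bins that contain mutations.
    """
    counts = Counter((pos - 1) // bin_size for pos, _alt in set(mutations))
    result = {}
    for b in sorted(counts):
        lo, hi = b * bin_size + 1, (b + 1) * bin_size
        n_seq = sum(1 for start, end in coverage if start <= hi and end >= lo)
        result[lo] = counts[b] / n_seq if n_seq else 0.0
    return result


if __name__ == "__main__":
    toy_ref = "GATCAAGCTCCG"  # hypothetical sequence, not MCPyV
    for i, base in enumerate(toy_ref):
        if base == "C":
            print(i, classify_site(toy_ref, i))
    toy_mutations = [(150, "T"), (150, "T"), (180, "G"), (450, "T")]
    toy_coverage = [(1, 400), (1, 600), (300, 600)]
    print(binned_frequency(toy_mutations, toy_coverage))
```

Because the three motifs cannot match the same cytidine under these assumptions, the order in which they are tested does not change the result.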

Statistical Analysis
The Mann-Whitney U test was performed to evaluate the statistical significance of the number of mutations, the mutation distribution, and the ratio and type of hotspot mutations between MCC and healthy control sequences. The Mann-Whitney U test was also performed to evaluate the statistical significance of the median GFP loss compared with the GFP4 vector without a test sequence and for comparison between upstream and downstream constructs. One minus the Spearman rank correlation was used as the distance measure for heatmap hierarchical clustering, and the Spearman rank correlation was used for similarity matrices. The Fisher exact test was used to calculate statistical differences between AID/APOBEC-expressing and nonexpressing MCC samples. The Fisher exact test was also used to calculate statistical differences between high and low sun-exposed MCC tumors and their MCPyV status. Statistical analyses were performed using GraphPad Prism 9 software.
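For readers unfamiliar with the Fisher exact test applied to 2 × 2 tables such as expressing versus nonexpressing samples by MCPyV status, a minimal Python sketch follows. The analyses in this study were run in GraphPad Prism 9; the function name and the example counts below are hypothetical, and the two-sided P value is computed here by summing the probabilities of all tables that are no more likely than the observed one, which is an assumption about the two-sided convention rather than a description of the software used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The P value sums the probabilities of every possible
    table whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: 3 expressing and 1 nonexpressing sample in one
    # group, 1 expressing and 3 nonexpressing samples in the other.
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```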

Data Availability Statement
The RNA-seq data are available under the BioProject PRJNA775071 in NCBI Sequence Read Archive (SRA) database.

MCPyV LT is Enriched in Cytosine-targeting Mutations in MCC
We  Table 1).
To determine the distribution of the mutations in the MCPyV T antigen area, Given that a truncation mutation in LT is a major oncogenic event in MCPyV+ MCC (18) and that a substitution of a C is a likely way to generate a stop codon, because stop codons lack cytidines, we considered whether the large number of substitutions from C in MCC samples is a result of positive selection of stop codons.
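The premise that C substitutions readily create stop codons can be checked by enumeration. The following Python sketch is illustrative only and was not part of the analysis. It assumes the standard genetic code (stop codons TAA, TAG, and TGA) and considers only cytidines on the coding strand; the function name is hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet
BASES = "ACGT"


def c_substitutions_creating_stop():
    """List every single C substitution that turns a sense codon into a stop.

    Each entry is (codon, 0-based position, substitution, resulting stop).
    Only Cs on the coding strand are considered in this sketch.
    """
    events = []
    for codon in map("".join, product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue
        for i, base in enumerate(codon):
            if base != "C":
                continue
            for alt in "AGT":
                mutant = codon[:i] + alt + codon[i + 1:]
                if mutant in STOP_CODONS:
                    events.append((codon, i, f"C>{alt}", mutant))
    return events


if __name__ == "__main__":
    for event in c_substitutions_creating_stop():
        print(event)
```

Under these assumptions, seven sense codons (CAA, CAG, CGA, TAC, TCA, TCG, and TGC) can be converted to a stop codon by a single C substitution, through nine distinct substitution events in total.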
While most (75.7%) of the substitutions targeting Cs were in the region spanning 1,200-2,200 bp (Fig. 1D), 6.5% of all substitutions in the T antigen area caused an in-frame stop codon (Table 1), and the in-frame stop codon distribution (Fig. 1E) explains only a very small proportion of the overall distribution of mutations in MCC samples (Fig. 1B). Instead, out of all insertion and deletion mutations, 77% caused an in-frame stop codon, all of which were in the 1,200-2,200 bp region (  Ig loci. To do this, we performed a GFP-loss assay in DT40 B cells, which measures SHM recruitment activity (37). We first tested four partly overlapping regions of the MCPyV genome (Fr1, Fr2, Fr3, and Fr4; Fig. 2A). None of these regions showed significant SHM recruitment activity (Fig. 2B). SHM activity in the genome is restricted within topologically associating domains, whose boundaries may prevent the spreading of SHM from one domain to the next (51). Thus, any insulator, such as a CCCTC-binding factor (CTCF) binding site in the test DNA fragment between the potential active SHM-recruiting region and the GFP reporter gene, could prevent GFP loss. To address this, we tested subregions of Fr1 and Fr2 that do not contain CTCF sequences (Fr1 truncated and Fr2 truncated, respectively) and tested Fr1 in reverse orientation, which moves the CTCF site away from between the fragment core and the GFP transcription unit of the reporter (Fr1 flipped). None of these modified fragments exhibited SHM targeting activity (Fig. 2B).
Because SHM targeting elements in Ig loci are found in regulatory regions (enhancers), we also tested the MCPyV noncoding control region (NCCR) more carefully (Fig. 2). We tested NCCR fragments both upstream and downstream of the GFP expression cassette, because an element positioned upstream of the reporter affects its transcriptional output, whereas an element positioned downstream does not (52). Interestingly, the downstream position of the 440 bp fragment (440bpD) showed low but statistically significant (3.2% GFP loss, P = 0.0161) SHM targeting activity (Fig. 2B). Further splitting the 440 bp fragment into smaller 80 and 204 bp fragments reduced the targeting activity below the detection limit, but, similarly to the 440 bp fragment, the 204 bp fragment had more SHM targeting activity in the downstream than in the upstream position (440bpD: 3.2% and 440bp: 1.7%, P < 0.0001; 204bpD: 3.1% and 204bp: 1.1%, P < 0.0001). Therefore, we conclude that the NCCR of MCPyV can marginally recruit SHM to a neighboring transcription unit in cells that are capable of SHM.

Mutations in MCC LT Area Concentrate to APOBEC3 Hotspots
To explore the role of AID/APOBEC deaminases in LT mutations, we analyzed the immediate sequence context of LT substitution mutations for enrichment of AID hotspot sequences (WRC). We found that while 23.9% of mutations were in WRC hotspots in control samples, they were not enriched in MCC samples (18.0%, P = 0.0630; Fig. 3A).
As the mutations clearly accumulated in C bases (Fig. 1), and a cytidine deaminase APOBEC3 signature was recently established in MCPyV in the context of TC dinucleotides (36), we analyzed APOBEC3 hotspot mutations in LT in the TCW trinucleotide context. The proportion of mutations at APOBEC3 hotspots was increased 3-fold in MCC samples (30.5%) compared with healthy controls (10.2%, P < 0.0001; Fig. 3A). Interestingly, 42.9% of the in-frame stop codons caused by substitution mutations at Cs were also APOBEC3 hotspot mutations, out of which 60% were between the RB and helicase domains (Fig. 1F). In contrast, none of the in-frame stop codons after this region were in APOBEC3 hotspots.
UV radiation is also a probable source of C mutations in skin cancers, and UV mutation patterns are observed in MCPyV− MCCs, but not in MCPyV+ MCCs (7). UV radiation causes frequent C>T and CC>TT mutations at dipyrimidine sites (53,54). Because TCC or CCC trinucleotides are mutated most frequently in the single-base mutation signatures SBS7a and SBS7b of the  Analysis of the proportions of C>A, C>G, and C>T substitutions at each hotspot (Fig. 3C-F) showed that in the context of WRC and YCC hotspots, C>T substitutions were predominant, while in APOBEC3 hotspots C>G substitutions were also common. This is in line with the mutation type distribution described for AID, APOBEC, and UV radiation (10,29,39,42) and further implicates these factors as mutators in MCC. Note that these classifications are not definitive, as several factors influence which base is incorporated at the site of the initial lesion.
Curiously, AID and APOBEC3 hotspots were mainly mutated after the RB binding site (starting from the 1,200 bp bin), whereas UV hotspot mutations seemed to have accumulated in the beginning of the LT area (up until the 1,400 bp bin) and at the very end of the LT (at the 2,885 bp bin; Fig. 3G). Overall, APOBEC3 had the highest hotspot mutation rate and the best resemblance to the overall C-mutation distribution, reinforcing the notion of an APOBEC3 signature in the LT.

Finnish MCCs Express AICDA and APOBEC3s
To RNA-seq probably underestimates the number of MCPyV+ MCC samples, as in previous studies using other methods to analyze the same cohort, the number of MCPyV+ MCC samples was higher (27,55,57).
AICDA expression levels were 1.4-fold higher, and the proportion of AICDA-expressing samples was higher, in the MCPyV− group (34.4%) than in the MCPyV+ group (32.0%; Table 2).
APOBEC2 and APOBEC4 were expressed sparsely (Table 2), and their expression levels were very low. APOBEC levels were higher (3.1-fold), and APOBEC was more frequently expressed in the MCPyV− group than in the MCPyV+ group (21.9% and 14.0%, respectively). However, neither APOBEC2 nor APOBEC4 is known to have deaminase activity (29). We performed hierarchical clustering for AID/APOBEC and LT expression data (Fig. 4A). LT expression clustered together with the APOBEC3 subfamily, excluding APOBEC3B. APOBEC3H and APOBEC3G expression correlated with LT expression (Fig. 4B; Spearman correlation APOBEC3H r = 0.32, P = 0.004; APOBEC3G r = 0.25, P = 0.025), suggesting a possible mechanistic link between MCPyV infection and APOBEC3G/APOBEC3H expression.

More tumors in the
APOBECB excluded, APOBECs and LT coexpressed with markers of T lymphocytes (CDA, CDB, CD, CDG, and granzyme B; Fig. 4C). Also, IFNγ (IFNG), and IFNG regulating long noncoding RNA (IFNG-AS) coexpressed with these APOBECs. This is in line with the presence and potential contribution of tumor-infiltrating lymphocytes for APOBEC expression in MCC tumors (36,58). Interestingly, we found coexpression of B-lymphocyte markers (CD and MSA, encoding for CD20) with most of these APOBECs. This suggests that as well as T lymphocytes, B lymphocytes also infiltrate MCC tumors.
In addition, germinal center B-cell markers (BCL, Ki, CD, ICOS, and PDCD encoding for PD-1) were expressed, which could indicate formation of tertiary lymphoid structures. Therefore, we cannot exclude tumor-infiltrating B lymphocytes rather than malignant cells as the cellular source of AID expression in the tumors.
Overall, our findings indicate a clear APOBEC3 signature and APOBEC expression in MCPyV+ MCC tumors.

Discussion
The human AID/APOBEC gene family of cytosine deaminases codes for 11 proteins with various functions: AID, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4. While beneficial for antiviral immunity, the AID/APOBEC enzymes are known to induce genomic mutations and chromosomal aberrations with oncogenic outcomes in lymphocytes as well as in solid tissue cancers (reviewed in ref. 29). However, the contribution of APOBECs to cancer via mutations of a viral genome has remained poorly understood.
In this study, we found an APOBEC3 deaminase mutation signature in the MCC LT area. Up to 43% of in-frame stop codon-causing substitutions occurred in APOBEC3 hotspots with the mutational outcome expected for APOBEC3s, and APOBEC was widely expressed in MCPyV+ MCC tumors. Thus, our analysis implicates APOBEC3s as the primary mutators of LT.
APOBEC3s are largely agreed to target TCW motifs (29,64). Focusing the analysis on this hotspot avoids mistakenly counting in UV-induced mutations, which frequently target pyrimidine dimers (54). CC>TT substitutions were absent in LT, and 35% of the Finnish MCPyV+ MCC tumors were in low sun exposure areas of the body. Thus, our findings do not support the role of UV in LT mutations.
Using a recently specified target motif preference for individual APOBEC3 subfamily members (30), we could not discriminate a specific APOBEC3 as the primary mutator in the LT. As expected, due to a loose definition and overlap with other hotspots, the TC dinucleotides suggested as APOBEC3D/H hotspot (30) were most frequently mutated, agreeing with the conclusion drawn from characterizing mutations in MCPyV by using the TC dinucleotide as APOBEC3 hotspot (36).
AACRJournals.org Cancer Res Commun; 2(11) November 2022
MCPyV can induce APOBEC3A, APOBEC3B, and APOBEC3G expression in cell models (35,36), where at least APOBEC3B can be upregulated by IFNγ (36). APOBEC3B has been the primary suspect out of the APOBEC family in cancers (65). In contrast to APOBEC3B, the expression of APOBEC3A and APOBEC3H was widespread. There is accumulating evidence that APOBEC3A and 3H are equally or even more potent mutators in cancers than APOBEC3B (67)(68)(69).
APOBEC3A and APOBEC3H were expressed in a larger proportion of Finnish MCPyV+ MCC samples (46.0% and 42.0%, respectively) than MCPyV− MCC samples (37.5% and 9.0%, respectively). The difference was statistically significant for APOBEC3H, which also showed statistically significant coexpression with LT. In addition, specific hotspot mutations (at YTCR and TC) were frequent, and both APOBEC3A and APOBEC3H can mutate TCW (67)(68)(69). It should be noted that the mutation pattern and RNA expression data in our study came from different sources, making it possible, although unlikely, that the different genetic background of the subjects obscured our findings.
APOBEC3C, APOBEC3F, and APOBEC3G were expressed in the majority of the MCC samples. APOBEC3C and APOBEC3G levels were higher in MCPyV+ MCCs compared with MCPyV− MCCs, and APOBEC3G showed statistically significant coexpression with LT, which could implicate an immune response to MCPyV infection and make APOBEC3C and APOBEC3G potential LT mutators. Because APOBEC3D was expressed only marginally and, along with APOBEC3G and APOBEC3F, mainly resides in the cytoplasm (59), our data do not strongly support them as primary mutators.
Given the contribution of APOBEC3s to genomic cancer mutations and to viral mutations in antiviral defense, our findings strongly suggest that the APOBEC3 subfamily, particularly APOBEC3A, APOBEC3H, APOBEC3C, and to a smaller extent APOBEC3G, contributes to the LT mutation signature and potentially to premature stop codon formation.
AID is expressed in activated B lymphocytes; its association with B-lymphocyte cancers is well established (70), and it has been implicated in skin cancers such as melanoma (71). We also detected AICDA expression in a subset of Finnish MCC samples, where expression was stronger and slightly more frequent in MCPyV− MCC samples, consistent with a previous finding (6). However, we did not find AID mutation signatures enriched in MCC samples over control.
We found very weak, yet statistically significant, SHM targeting activity in the MCPyV NCCR. Therefore, it is possible that the MCPyV NCCR could target AID-mediated SHM to the adjacent T antigen area (or other nearby genes), similar to the enhancers in Ig loci. As SHM is a property of germinal-center B lymphocytes, the scenario of AID inducing carcinogenic mutations in MCPyV is more likely if MCC is indeed derived from activated skin-resident B lymphocytes. However, the expression of B-lymphocyte markers in MCC (reviewed in ref. 72) fits better with a pro/pre-B lymphocyte stage. Thus, we conclude that AID-induced SHM is unlikely to be the primary mechanism for LT mutations, but its role in MCC carcinogenesis cannot be entirely excluded. Because there is a correlation between APOBEC mutations and DNA replication (29) and the NCCR also contains the origin of replication of MCPyV, it is conceivable that the mutations causing the GFP loss in our reporter system result from APOBEC activity in the DT40 B cells used to carry out the assay. This remains to be addressed experimentally.
Tumors are often infiltrated with immune cells (73), and T lymphocytes, macrophages, and natural killer cells were detected more frequently in MCPyV+ MCCs than in MCPyV− MCCs, demonstrating immune cell infiltration also in MCC (58). The role of B lymphocytes in the tumor environment has not been studied as extensively as that of other immune cells, but there is evidence showing the importance of tumor-infiltrating B lymphocytes (74). We did observe expression of B-cell markers, including AICDA, in addition to T-lymphocyte markers in the RNA-seq data. It remains to be investigated whether a high number of tumor-infiltrating B lymphocytes could also explain the B-lymphocyte marker expression in MCC.
In conclusion, our findings support the view that APOBEC3 enzymes mutate the MCPyV LT area and that this mutational activity is a major cause of premature stop codon formation and thus of MCC carcinogenesis.