Although infection contributes to over 20% of human cancers worldwide, the list of confirmed carcinogenic infectious agents is surprisingly short [1]. Epidemiologic studies strongly suggest that novel infectious agents remain to be discovered and contribute to a broad range of diseases, including cancers, autoimmune disorders, and degenerative diseases.

Although current methods have been successful in identifying human viruses, most pathogen discovery approaches suffer from several shortcomings. None are quantitative: if no candidate sequence is found, it is not possible to estimate how likely it is that an agent is present but was missed in the search. There is no simple way to scale up pathogen discovery with these methods short of analyzing additional samples, and it is unlikely that any single technique can universally identify all human tumor viruses. With the near completion of the human genome project, however, high-throughput sequencing can be exploited to overcome many of these problems.

We developed digital transcript subtraction (DTS) to subtract known human sequences in silico from expression library data sets, leaving candidate nonhuman sequences for further analysis [2]. This approach requires precise discrimination between human and nonhuman cDNA sequences, and database comparisons show a high likelihood that even short viral cDNA sequences can be distinguished from human sequences. We pilot-tested DTS on 9,026 20-bp cDNA tags from an expression library of BCBL-1 cells infected with Kaposi's sarcoma-associated herpesvirus (KSHV). In this initial test, in silico subtraction of human cDNA sequences was complete: only three candidate sequences failed to match human sequences. Two of these belonged to highly expressed KSHV transcripts, and the third was an unannotated human expressed sequence tag. Overall, 0.24% of transcripts from this cell line were of viral origin. DTS was then applied to 241,122 expression tags from three squamous cell conjunctival carcinomas (SCCC), a cancer strongly associated with immunosuppression. Only 21 of these tags did not align to human databases, and all 21 candidates were ruled out as viral sequences by experimental isolation. This analysis shows that distinguishable viral transcripts are unlikely to be present in conjunctival carcinomas at 20 transcripts per million or higher, the equivalent of approximately 4 transcripts per cell.
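The subtraction step can be illustrated with a toy sketch. This is not the published DTS pipeline, which aligned tags against human sequence databases; it assumes instead that the human reference fits in memory as a dictionary of sequences and that a tag counts as "human" only if it matches exactly on either strand. The names `subtract_human`, `build_kmer_index`, and the toy reference are hypothetical.

```python
from __future__ import annotations

# Toy sketch of in silico subtraction (assumption: exact matching only,
# whereas the real method used alignment against human databases).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_kmer_index(references: dict[str, str], k: int) -> set[str]:
    """Collect every k-mer on both strands of the reference sequences."""
    index: set[str] = set()
    for seq in references.values():
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def subtract_human(tags: list[str], human_refs: dict[str, str], k: int = 20) -> list[str]:
    """Return the tags absent from the human reference: the nonhuman candidates."""
    index = build_kmer_index(human_refs, k)
    return [t for t in tags if t.upper() not in index]


# Usage with a hypothetical toy "human" reference and two 20-bp tags.
human = {"toy_human_1": "ACGTACGTACGTACGTACGTTTGGCCAA"}
tags = ["ACGTACGTACGTACGTACGT",   # occurs in the toy human sequence
        "GGGGGGGGGGCCCCCCCCCC"]   # absent, so it survives as a candidate
print(subtract_human(tags, human))  # ['GGGGGGGGGGCCCCCCCCCC']
```

Indexing both strands matters because cDNA tags can derive from either orientation; a real implementation would also need to tolerate sequencing errors and polymorphisms, which exact matching does not.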

Merkel cell carcinoma (MCC) is a malignant skin tumor arising from mechanoreceptor Merkel cells. MCC also occurs more frequently than expected among immunosuppressed transplant and AIDS patients. To search for viral sequences in MCC, we performed DTS on cDNA from 4 MCC cases, analyzing 395,734 cDNA sequences [3]. One transcript was similar to, but distinct from, African green monkey (AGM) lymphotropic polyomavirus (LPV) and human BK polyomavirus T antigen sequences, defining a new human polyomavirus, Merkel cell polyomavirus (MCV or McPyV). We sequenced the complete closed circular genome of MCV (5,387 bp), which encodes a T antigen locus, VP1, VP2/3, and replication origin sequences. MCV is most closely related to LPV, which belongs to the MuPyV subgroup, making MCV the first human member of this subgroup. The MCV transcript was present at 10 transcripts per million, or about 5 transcripts per cell.

MCV sequences were detected in 8 of 10 (80%) MCC tumors, but in only 5 of 59 (8%) control tissues from various body sites and 4 of 25 (16%) control skin tissues. Rapid amplification of cDNA ends (3'-RACE) revealed a T antigen transcript fused to intron 1 of the human receptor tyrosine phosphatase type G gene (PTPRG), suggesting viral integration in the tumor. In six of eight MCV-positive MCCs, viral DNA was also found integrated within the tumor genome in a clonal pattern, suggesting that MCV infection and integration preceded clonal expansion of the tumor cells. 5'-RACE and northern analysis defined large (LT), small (ST), and variably spliced MCV T antigen transcripts that retain the polyomavirus CR1 (LXXLL), DnaJ (HPDKGG), PP2A-binding (CXCXXC), and Rb-binding (LXCXE) motifs, as well as origin-binding and helicase/ATPase domains. We further sequenced the LT genomic region from nine MCC tumors; all harbored mutations that prematurely truncate the MCV LT helicase [4]. In contrast, four presumably episomal viruses from nontumor sources did not possess this T antigen signature mutation. MCV-positive MCC tumor cells thus undergo selection for LT mutations that prevent autoactivation of integrated virus replication, which would be detrimental to cell survival. Because these mutations render the virus replication-incompetent, MCV cannot be a "passenger virus" that secondarily infects MCC tumors.
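The motif notation used above (X = any amino acid) maps directly onto regular expressions. The sketch below scans a protein sequence for the four T antigen motifs named in the text; the function names and the test sequence are hypothetical, and the sequence is a toy string constructed to contain each motif once, not an MCV T antigen.

```python
from __future__ import annotations

import re

# Degenerate motifs named in the text; "X" stands for any amino acid.
T_ANTIGEN_MOTIFS = {
    "CR1": "LXXLL",
    "DnaJ": "HPDKGG",
    "PP2A binding": "CXCXXC",
    "Rb binding": "LXCXE",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Turn a motif like 'LXCXE' into a regex; the lookahead allows overlapping hits."""
    return re.compile("(?=(" + motif.replace("X", "[A-Z]") + "))")


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in a protein sequence."""
    protein = protein.upper()
    return {
        name: [m.start() for m in motif_to_regex(motif).finditer(protein)]
        for name, motif in T_ANTIGEN_MOTIFS.items()
    }


# Hypothetical toy sequence (not a real T antigen) containing each motif once.
toy = "MAHPDKGGSLFCNEQLKQLLPCACGACW"
print(find_motifs(toy))
```

A motif scan of this kind only shows that a sequence pattern is present; whether a truncated LT still retains a functional motif depends on where the premature stop falls relative to it.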

Overall, DTS is a simple screening method to discover novel viral nucleic acids. It provides, for the first time, quantitative evidence against some classes of viral etiology when no viral transcripts are found, thereby reducing the uncertainty involved in new pathogen discovery.
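The abstract does not state the statistical model behind its quantitative claim, so the following is only one simple way to reason about it. Assume each sequenced tag is an independent draw from the library and that a viral transcript makes up a fraction f = TPM / 10^6 of it. The chance of sampling zero viral tags among n is then (1 - f)^n, and solving (1 - f)^n = alpha gives the lowest abundance that would be missed with probability at most alpha. The function names below are hypothetical, and the model ignores viral tags that happen to align to human sequences.

```python
from __future__ import annotations


def prob_miss(n_tags: int, tpm: float) -> float:
    """Probability that zero tags from a transcript at `tpm` are sampled.

    Assumes a simple binomial model: n_tags independent draws, each hitting
    the transcript with probability tpm / 1e6.
    """
    f = tpm / 1_000_000
    return (1.0 - f) ** n_tags


def min_detectable_tpm(n_tags: int, alpha: float = 0.05) -> float:
    """Smallest abundance (TPM) whose miss probability is at most alpha."""
    # Solve (1 - f) ** n_tags = alpha for f, then convert to TPM.
    f = 1.0 - alpha ** (1.0 / n_tags)
    return f * 1_000_000


# Example with the SCCC library size quoted above (241,122 tags).
print(prob_miss(241_122, 20))        # roughly 0.008
print(min_detectable_tpm(241_122))   # roughly 12.4 TPM at alpha = 0.05
```

Under this model, a library of about 241,000 tags would miss a transcript present at 20 transcripts per million less than 1% of the time, which illustrates how library size sets the detection threshold.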

Citation Information: Cancer Prev Res 2008;1(7 Suppl):CN09-04.

Seventh AACR International Conference on Frontiers in Cancer Prevention Research, Nov 16-19, 2008; Washington, DC

[1] Parkin DM. The global health burden of infection-associated cancers in the year 2002. Int J Cancer.
[2] Feng H, et al. Human transcriptome subtraction by using short sequence tags to search for tumor viruses in conjunctival carcinoma. J Virol.
[3] Feng H, et al. Clonal integration of a polyomavirus in human Merkel cell carcinoma.
[4] Shuda M, et al. T antigen mutations are a human tumor-specific signature for Merkel cell polyomavirus. Proc Natl Acad Sci U S A.