Targeted sequencing is increasingly used in the clinic to guide therapeutic decisions by measuring mutations, small indels and/or rearrangements in cancer genes. The ability to use the same platform to detect additional oncogene activation (or tumor suppressor loss) through copy number changes could significantly expand the number of patients able to benefit from targeted therapies. Many currently available tools either require a matched normal, work only for whole genome or exome sequencing, or do not work for PCR-based targeted sequencing. In addition, no tools are available to detect breakpoints within a gene from targeted sequencing. A versatile tool for copy number analysis from sequencing data is needed to maximize the value of the sequencing routinely applied in clinics.

Here we present a novel computational tool, Seq2C (Sequencing To Copy Number), which is versatile enough to handle a variety of situations and reports aberrations at the gene level, ready for interpretation. Seq2C works at the cohort level and does not require a matched ‘normal’, though it can optionally use one or more normal samples for small (<20) or homogeneous cohorts. Seq2C applies a robust three-step cohort-based normalization to identify and quantify sequencing coverage variability in given regions, which can be sequence or platform dependent. Seq2C works for hybrid capture-based or PCR-based targeted sequencing, exome sequencing and even whole genome sequencing. In targeted sequencing, Seq2C automatically identifies and excludes regions that fail to be captured by the assay, to prevent false positive deletion calls, and can accommodate differences in PCR efficiency of several orders of magnitude.
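The idea of cohort-based normalization can be illustrated with a minimal Python sketch. The abstract does not specify the three steps, so the ones used here (scaling by per-sample total coverage, dividing by the per-region cohort median, and centering each sample on its own median) are assumptions for illustration only, as are the function name `normalize_cohort` and the `min_depth` threshold used to drop regions that appear to have failed capture. It should not be read as Seq2C's actual implementation.

```python
import math
from statistics import median


def normalize_cohort(depth, min_depth=10.0, eps=1e-9):
    """Return log2 ratios per sample and region from raw mean coverage.

    depth: dict mapping sample -> dict mapping region -> mean coverage.
    All names, steps and thresholds here are illustrative assumptions.
    """
    samples = list(depth)
    regions = list(next(iter(depth.values())))

    # Step 1 (assumed): remove differences in overall sequencing depth
    # by scaling every sample to the cohort's mean total coverage.
    totals = {s: sum(depth[s].values()) for s in samples}
    mean_total = sum(totals.values()) / len(samples)
    scaled = {
        s: {r: depth[s][r] * mean_total / totals[s] for r in regions}
        for s in samples
    }

    # Step 2 (assumed): remove region-specific bias (sequence or platform
    # dependent) by dividing by the cohort median for that region.
    # Regions with very low cohort median are treated as failed and dropped.
    region_median = {r: median(scaled[s][r] for s in samples) for r in regions}
    kept = [r for r in regions if region_median[r] >= min_depth]
    ratio = {
        s: {r: scaled[s][r] / region_median[r] for r in kept} for s in samples
    }

    # Step 3 (assumed): centre each sample on its own median ratio so that
    # the bulk of its regions sits at log2 ratio 0.
    log2_ratio = {}
    for s in samples:
        centre = max(median(ratio[s].values()), eps)
        log2_ratio[s] = {
            r: math.log2(max(ratio[s][r], eps) / centre) for r in kept
        }
    return log2_ratio


# Toy cohort: sample C carries a four-fold gain of region r3,
# and region r4 is not captured in any sample.
cohort = {
    "A": {"r1": 100, "r2": 200, "r3": 50, "r4": 0},
    "B": {"r1": 200, "r2": 400, "r3": 100, "r4": 0},
    "C": {"r1": 100, "r2": 200, "r3": 200, "r4": 0},
}
result = normalize_cohort(cohort)
```

In this toy cohort the uncaptured region `r4` is dropped rather than reported as a deletion, and the gain in sample C stands out against the other regions even though sample B was sequenced twice as deeply as sample A.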

Another feature that distinguishes Seq2C from other currently available tools is that it identifies breakpoints and detects deletions or duplications of one or more exons within a gene, which are a common mechanism of loss of function in tumor suppressors. It can also detect potential fusions in genes such as ALK and ERG, where a copy number change often accompanies the fusion and a breakpoint can thus be called by Seq2C. Furthermore, it can predict gender from exome or whole genome sequencing.
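To make the notion of a within-gene breakpoint concrete, the sketch below scans the exon-level log2 ratios of a single gene and looks for the split that best separates it into two segments with different mean copy number. This is a simple mean-difference scan chosen for illustration; the abstract does not describe how Seq2C calls breakpoints, and the function name `find_breakpoint` and the parameters `min_exons` and `min_diff` are hypothetical.

```python
from statistics import mean


def find_breakpoint(exon_log2, min_exons=2, min_diff=0.5):
    """Return (index, diff) for the best two-segment split, or None.

    exon_log2: log2 ratios of a gene's exons in genomic order.
    index: position of the first exon of the right-hand segment.
    diff: mean(right segment) - mean(left segment).
    The thresholds are illustrative assumptions, not Seq2C defaults.
    """
    best = None
    # Each candidate split must leave at least min_exons on both sides.
    for i in range(min_exons, len(exon_log2) - min_exons + 1):
        diff = mean(exon_log2[i:]) - mean(exon_log2[:i])
        if best is None or abs(diff) > abs(best[1]):
            best = (i, diff)
    # Only report a breakpoint when the two segments differ clearly.
    if best is None or abs(best[1]) < min_diff:
        return None
    return best


# Toy gene with eight exons: the last four are lost (log2 ratio near -1).
partial_deletion = [0.0, 0.1, -0.1, 0.0, -1.0, -1.1, -0.9, -1.0]
call = find_breakpoint(partial_deletion)

# A gene with flat coverage yields no breakpoint.
no_call = find_breakpoint([0.0] * 8)
```

In the toy gene the best split falls between the fourth and fifth exons, which is the pattern expected for a partial deletion of a tumor suppressor or for a copy number change that accompanies a fusion.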

We applied Seq2C to exome sequencing of CCLE cell lines and showed that it produced gene-level copy number data highly correlated with those derived from microarrays, the current gold standard. Interestingly, Seq2C identified deletions of one or more exons in several common tumor suppressors, such as TP53, PTEN, CDKN2A, NF1, STK11 and RB1, in cell lines with no known aberrations. These deletions were supported by RNA-Seq data. It also correctly called the known ERG fusion in the prostate cell line NCI-H660 and identified a previously unreported EML4-ALK fusion in the pancreatic cancer cell line SNU-324, which was confirmed by RNA-Seq data, suggesting that EML4-ALK is not limited to lung cancer.

In conclusion, Seq2C is a versatile copy number analysis tool for sequencing data and will be useful for cancer research. Seq2C is freely available on GitHub.

Citation Format: Zhongwu Lai, Aleksandra Markovets, Jonathan Dry. Seq2C: from sequence to copy number for cancer samples. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 5268.