Abstract
Liquid biopsy holds great promise for the noninvasive diagnosis of cancers by detecting minute amounts of cell-free DNA released from cancer cells into non-solid biological tissue such as peripheral blood. A critical bottleneck in developing liquid biopsy methods is the limited accuracy of current next-generation sequencing (NGS) technology, evidenced by its high error rate (0.1%-1%, as of 2018). Through mathematical modeling of NGS errors, we recently published a method that computationally suppresses the NGS error rate to between 10⁻⁵ and 10⁻⁴, two orders of magnitude lower than generally reported. However, this error rate is a product of both PCR errors and instrument (i.e., sequencer) errors, and it is currently unknown how to separate these error sources. In this work, we developed a novel computational algorithm to precisely measure the errors caused by sequencers. Using 12 publicly available datasets from 10 sequencing centers (in America, Europe, and Asia), we discovered highly reproducible patterns of sequencer errors, including: 1) the overall sequencer error rate is 10⁻⁵; 2) at the flow-cell level, error rates are elevated on the bottom surface; 3) almost all flow cells have a small fraction of random tiles with a dramatically elevated error rate; 4) the elevated error rates appear to be enriched in some reaction cycles; 5) removal of these reaction cycles yields 5-fold lower error rates at some genomic loci, so that the A>C, A>T, and C>G error types have error rates close to 10⁻⁶; and 6) sequencer errors have a pattern markedly distinct from that of PCR errors. We have incorporated these observations into a general-purpose algorithm, termed CleanDeepSeq2, that computationally suppresses sequencer errors and effectively monitors sequencer anomalies. CleanDeepSeq2 was engineered for efficiency, so that an ultra-deep sequencing dataset (1,000,000X depth) can be processed in 1.5N minutes on a single CPU core, where N is the number of target regions.
Similarly, whole-exome sequencing (WES; 100X) and whole-genome sequencing (WGS; ~30X) datasets can be processed in under 1 CPU hour to monitor instrument performance. Overall, we have developed a computational method that, for the first time, enables precise measurement of sequencer errors. Our study reveals novel insights into sequencer errors that can lead to improved instrumentation, improved NGS chemistry, and ultimately higher DNA sequencing fidelity. Our software can also efficiently suppress sequencer errors alongside errors from previously characterized sources.
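To illustrate the kind of tile-level and surface-level monitoring described above, the following is a minimal Python sketch; it is not the CleanDeepSeq2 implementation, whose internals are not described in this abstract. It assumes Illumina-style read names of the form instrument:run:flowcell:lane:tile:x:y, assumes that the first digit of the tile number encodes the flow-cell surface (1 for top, 2 for bottom), and assumes that per-read mismatch counts are already available. The function names and the 10-fold threshold are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from statistics import median


def parse_tile(read_name):
    """Return (lane, surface, tile) from an Illumina-style read name.

    Assumption: fields are colon-separated, with lane and tile in the 4th
    and 5th positions, and the first tile digit is 1 (top) or 2 (bottom).
    """
    fields = read_name.lstrip("@").split()[0].split(":")
    lane, tile = int(fields[3]), int(fields[4])
    surface = "top" if str(tile)[0] == "1" else "bottom"
    return lane, surface, tile


def tile_error_rates(records):
    """Aggregate mismatches per (lane, tile).

    records: iterable of (read_name, mismatches, bases_compared).
    Returns a dict mapping (lane, tile) to its observed error rate.
    """
    mismatches = defaultdict(int)
    bases = defaultdict(int)
    for name, n_mismatch, n_bases in records:
        lane, _, tile = parse_tile(name)
        mismatches[(lane, tile)] += n_mismatch
        bases[(lane, tile)] += n_bases
    return {key: mismatches[key] / bases[key] for key in bases if bases[key] > 0}


def flag_outlier_tiles(rates, fold=10.0):
    """Flag tiles whose error rate exceeds `fold` times the median tile rate.

    The fold threshold is a hypothetical cutoff, not a value from the study.
    """
    baseline = median(rates.values())
    return sorted(k for k, r in rates.items() if baseline > 0 and r > fold * baseline)


if __name__ == "__main__":
    # Toy records, invented for illustration: one bottom-surface tile is noisy.
    toy = [
        ("@M:1:FC:1:1101:100:200", 1, 100000),
        ("@M:1:FC:1:1102:5:6", 1, 100000),
        ("@M:1:FC:1:2101:7:8", 50, 100000),
    ]
    rates = tile_error_rates(toy)
    print(flag_outlier_tiles(rates))
```

A median-based baseline is used here because a handful of dramatically elevated tiles would otherwise inflate a mean-based baseline; any production monitor would need a threshold calibrated on real data.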
Citation Format: Eric Davis, Rain Sun, Ying Shao, Yanling Liu, Heather L. Mulder, Stephen V. Rice, John Easton, Jinghui Zhang, Xiaotu Ma. Uncovering instrument errors in next-generation sequencing by CleanDeepSeq2 [abstract]. In: Proceedings of the AACR Special Conference on Advances in Liquid Biopsies; Jan 13-16, 2020; Miami, FL. Philadelphia (PA): AACR; Clin Cancer Res 2020;26(11_Suppl):Abstract nr A57.