Detection of circulating tumor DNA (ctDNA) is a promising approach for the diagnosis and monitoring of solid tumors. However, sensitive detection of ctDNA remains challenging when the minor allele frequency (MAF) is ultra-low. Numerous noise-reduction algorithms have been developed to reduce next-generation sequencing (NGS) errors, but variant callers that can take advantage of ultra-deep sequencing data are lacking. Here we develop a bin-based genotyping (BinGo) method for canceling sequencing noise using ultra-deep (>100,000X depth) targeted sequencing data. A mutation reference sample with 44 mutations was diluted to expected MAFs of 1% and 0.5%. One blood sample for a proficiency test was included. We constructed targeted sequencing libraries by anchored multiplex enrichment for a panel of targets that cover the 44 mutations, and sequenced them to a very high depth (mean depth ranging from 71,000X to 350,000X). After raw NGS reads are mapped by BWA-MEM, BinGo tags each alignment in the BAM file with its unique molecular index (UMI), preserving the alignment signal of all bases in all raw reads. BinGo then groups all reads that share a UMI into one bin, generating millions of bins. The resulting millions of Samtools mpileup runs are executed in parallel using GNU parallel. Typical mpileup analyses use the human genome as a reference (requiring ~4 GB of RAM), making millions of such analyses essentially impossible on a regular computing server. To overcome this hurdle, BinGo uses tailor-made reference sequences comprising only the targets (<100 KB), which drastically reduces memory consumption and makes millions of mpileup analyses feasible. BinGo recompiles Samtools to increase its default maximum depth from 8,000 to 8,000,000 to accommodate the ultra-deep data. Variants derived from the same bin are effectively phased, thereby enabling calling of dinucleotide mutations such as BRAF V600K (GT>AA) and V600R (GT>AG). This binning also allows for sensitive calling of insertions and deletions, even down to a single read.
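The binning and phasing idea described above can be illustrated with a minimal sketch. It is not the BinGo implementation: it assumes reads are already aligned, of equal length, and supplied as (UMI, sequence) pairs, and the function names, toy reads, and the majority-vote consensus are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def bin_reads_by_umi(reads):
    """Group (umi, sequence) pairs into bins keyed by UMI."""
    bins = defaultdict(list)
    for umi, seq in reads:
        bins[umi].append(seq)
    return dict(bins)


def consensus(seqs):
    """Majority base per column (assumes equal-length, pre-aligned reads)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def phased_dinucleotide(bin_consensus, pos, ref="GT"):
    """Return the two-base allele at pos if both bases differ from ref.

    Because both bases come from one bin (one original molecule),
    they are phased: the change is a single dinucleotide event.
    """
    alt = bin_consensus[pos:pos + 2]
    if len(alt) == 2 and alt[0] != ref[0] and alt[1] != ref[1]:
        return alt
    return None


# Toy data: reference is ACGTTC, with the GT dinucleotide at offset 2.
reads = [
    ("AACG", "ACAGTC"),  # carries GT>AG
    ("AACG", "ACAGTC"),
    ("AACG", "ACAGGC"),  # same molecule, one sequencing error at offset 4
    ("TTGA", "ACGTTC"),  # wild-type molecule
    ("TTGA", "ACGTTC"),
]

bins = bin_reads_by_umi(reads)
for umi, seqs in bins.items():
    print(umi, consensus(seqs), phased_dinucleotide(consensus(seqs), 2))
```

In this toy example the isolated error in the third read is outvoted within its bin, while the GT>AG change shared by the whole bin is kept as one phased event.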
To profile noise patterns, BinGo uses Samtools tview to extract every base at every aligned position of the target sequences in all raw reads. Finally, BinGo applies multiple statistical tests to call variants: it calls nucleotide changes against a background noise model defined by nucleotide change and sequence context, and it filters out candidates with strand imbalance, which is likely due to DNA template damage, and candidates supported by a low number of ligation start sites. With 10 ng DNA as input and at an expected MAF of 0.5%, BinGo demonstrated 100% (44/44) analytical sensitivity and 100% specificity. With 1 ng DNA as input and at an expected MAF of 0.5%, only one mutation was missed, giving a sensitivity of 97.73% (43/44) and 100% specificity. Furthermore, BinGo detected a 15-bp EGFR exon 19 deletion at 0.05% in the blood DNA sample. BinGo integrates UMI-bin-derived variants with raw sequencing background noise modeling, enabling analysis of ultra-deep targeted NGS data for rare variant detection.
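The abstract does not specify which statistical tests are used, so the following is only a sketch of one plausible form of calling over a background noise model: a one-sided binomial test of the observed alternate-allele count against a per-context noise rate, combined with a simple strand-balance filter. The function names, the significance threshold `alpha`, and the `min_strand_frac` cutoff are assumptions for illustration, not values from BinGo.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability mass at i, for 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in a numerically safe way."""
    total = sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def call_variant(alt_fwd, alt_rev, depth, noise_rate,
                 alpha=1e-6, min_strand_frac=0.1):
    """Call a variant if it exceeds background noise and is strand-balanced.

    noise_rate is the assumed background error rate for this nucleotide
    change in this sequence context; alpha and min_strand_frac are
    hypothetical thresholds.
    """
    alt = alt_fwd + alt_rev
    if alt == 0:
        return False
    p_value = binomial_tail(alt, depth, noise_rate)
    minor_strand_frac = min(alt_fwd, alt_rev) / alt
    return p_value < alpha and minor_strand_frac >= min_strand_frac


# At 100,000X depth with an assumed noise rate of 1e-4 (about 10 errors expected):
print(call_variant(250, 250, 100_000, 1e-4))  # 0.5% MAF, balanced strands
print(call_variant(6, 6, 100_000, 1e-4))      # count consistent with noise
print(call_variant(500, 0, 100_000, 1e-4))    # support on one strand only
```

Here the first candidate is called, while the second is indistinguishable from background noise and the third fails the strand-balance filter despite its high count.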

Citation Format: Weiwei Bian, Firaol Kebede, Zongli Zheng. A computation method for noise reduction based on ultra-deep targeted sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 244.