We propose, implement, and evaluate a novel method (GATK gCNV) for accurate discovery of rare and common copy-number variations (CNVs) from read-depth data obtained from whole-genome sequencing (WGS), whole-exome sequencing (WES), or custom gene panels. GATK gCNV uses a Bayesian model to learn bias factors arising from sequencing and library preparation. This model accounts for the ploidy of sex chromosomes and for autosomal aneuploidies, treats GC bias probabilistically, and automatically determines the necessary level of model complexity in a data-driven manner. Unlike most existing read-depth methods, GATK gCNV maintains high sensitivity in common CNV regions, owing to a hierarchical hidden Markov model used for accurate genotyping of multi-allelic loci. Furthermore, GATK gCNV performs bias modeling and CNV discovery simultaneously and self-consistently, resulting in significantly improved sensitivity and precision. Our implementation uses the PyMC3/Theano framework to perform automatic differentiation variational inference (ADVI). In addition, GATK gCNV automatically scatters large tasks across multiple machines using the Cromwell/WDL framework, enabling scalable processing of large cohorts.
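To illustrate the general idea of genotyping copy number from read depth with a hidden Markov model, the following is a minimal sketch and not the GATK gCNV model itself. It assumes a flat (non-hierarchical) HMM, Poisson-distributed read counts, a known expected depth per copy, and no bias correction; the function name `viterbi_copy_number` and all parameter values are hypothetical choices made for this example.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(counts, depth_per_copy, max_cn=4, stay_prob=0.99,
                        baseline_cn=2, floor_cn=0.05):
    """Most likely copy-number state per interval (Viterbi decoding).

    Assumptions (illustrative only): Poisson emissions with mean
    depth_per_copy * copy number, and a uniform probability of switching
    to any other state between adjacent intervals.
    """
    states = list(range(max_cn + 1))
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (len(states) - 1))

    def emit(count, cn):
        # A small floor keeps the rate positive for copy number 0.
        return poisson_logpmf(count, depth_per_copy * max(cn, floor_cn))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Prior that favours the baseline (diploid) state.
    other = math.log(0.1 / (len(states) - 1))
    score = [(math.log(0.9) if s == baseline_cn else other) + emit(counts[0], s)
             for s in states]

    backpointers = []
    for count in counts[1:]:
        new_score, pointers = [], []
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score.append(score[best_prev] + trans(best_prev, s) + emit(count, s))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy read counts: about 100 reads per diploid interval, with a
# three-interval drop to about 50 reads (a heterozygous deletion).
toy_counts = [100, 98, 105, 52, 47, 50, 101, 99]
toy_path = viterbi_copy_number(toy_counts, depth_per_copy=50.0)
print(toy_path)
```

In this sketch the transition penalty trades off against the emission evidence, so a sustained drop in depth is called as a deletion while small fluctuations are absorbed. A joint approach such as the one described above would additionally learn the per-interval bias rather than take `depth_per_copy` as given.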

We use GATK gCNV to compute copy-number transmission and de novo rates in a cohort of WES trios and find them consistent with observed population metrics. Furthermore, we benchmark GATK gCNV for sensitivity, precision, and reproducibility of both rare and common CNV calls. Using a cohort of WES blood normal samples from The Cancer Genome Atlas (TCGA), we show that GATK gCNV calls are in remarkable concordance with Genome STRiP calls on matched WGS data and that GATK gCNV substantially outperforms XHMM and CODEX. We also validate GATK gCNV calls on WGS data against calls made using PacBio long reads.
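As a sketch of how sensitivity and precision against a reference call set might be computed, the example below matches calls to truth events by reciprocal overlap. The abstract does not state the matching criterion used in the benchmark, so the 50% reciprocal-overlap threshold, the `Cnv` record, and the function names here are assumptions made for illustration.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if the events cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def sensitivity_and_precision(calls, truth, min_overlap=0.5):
    """Fraction of truth events recovered and fraction of calls supported."""
    found = sum(any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
                for t in truth)
    supported = sum(any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
                    for c in calls)
    sensitivity = found / len(truth) if truth else float("nan")
    precision = supported / len(calls) if calls else float("nan")
    return sensitivity, precision


# Toy example: one call matches a truth deletion, one call is unsupported,
# and one truth duplication is missed.
toy_truth = [Cnv("chr1", 1000, 2000, "DEL"), Cnv("chr2", 500, 1500, "DUP")]
toy_calls = [Cnv("chr1", 1100, 2100, "DEL"), Cnv("chr3", 0, 100, "DEL")]
toy_sensitivity, toy_precision = sensitivity_and_precision(toy_calls, toy_truth)
print(toy_sensitivity, toy_precision)
```

Requiring the overlap to be reciprocal prevents a very large call from counting as a match to a small truth event, which matters when comparing exome-derived calls with calls from matched WGS data.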

Citation Format: Mehrtash Babadi, Samuel K. Lee, Andrey Smirnov, Lee Lichtenstein, Laura D. Gauthier, Daniel P. Howrigan, Timothy Poterba. Precise common and rare germline CNV calling with GATK [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 2287.