Abstract
Early cancer detection using cell-free DNA (cfDNA) faces multiple challenges: the low fraction of tumor DNA in cfDNA, the molecular heterogeneity of cancer, and sample sizes that are too small to reflect the heterogeneous patient population. We have developed an integrated cancer detection system, CancerRadar, that addresses all three challenges. It consists of (1) a cost-effective experimental assay, cfMethyl-Seq, for genome-wide methylation profiling of cfDNA, which provides >12-fold enrichment over whole-genome bisulfite sequencing (WGBS) in CpG islands; and (2) a computational platform that extracts information from cfMethyl-Seq data and diagnoses the patient. The platform derives cfDNA methylation, cfDNA fragment sizes, copy number variations (CNV), and microbial composition from the raw cfMethyl-Seq data, and performs multi-feature ensemble learning. We demonstrate the power of CancerRadar by detecting and locating cancer in a cohort of 275 colon, liver, lung, and stomach cancer patients and 204 non-cancer individuals. For cancer detection, we achieve a sensitivity of 85.6%±6.7% across all stages and 80.6%±9.1% for early stages (I and II), with a specificity of 99% in both cases. These metrics are derived using leave-one-out cross-validation. In independent validation on a reserved subsample, CancerRadar achieves a sensitivity of 89.1%±11.3% across all stages and 85.7%±14.2% for early stages, with a specificity of 97% (one false positive). For locating a tumor's tissue of origin (TOO), CancerRadar achieves an accuracy of 91.5%±5.0% for all stages and 89.1%±7.3% for early stages on an independent subsample. This study is the first to integrate cfDNA methylation, cfDNA fragment size, CNV, and microbial composition analyses for cancer detection on the same patient cohort. cfDNA methylation was the most informative feature type for detecting cancer, but including features from the other categories significantly improved performance, especially for early-stage cancer.
In contrast, for TOO prediction, methylation-derived features were overwhelmingly important, and including other features did not further improve performance. To fully exploit the power of cfDNA methylation, we identified four types of methylation markers with different characteristics. We have also improved our previous read-level deconvolution algorithm to more accurately identify trace tumor signals. Finally, our data show that the detection power of CancerRadar continues to increase as training sample sizes increase. Although all existing cancer detection studies are limited by training sample sizes, the CancerRadar system uniquely and cost-effectively retains the genome-wide epigenetic and genetic profiles of cancer abnormalities, thereby permitting the classification models to learn and exploit newly significant features as training cohorts grow, and to expand their scope to other cancer types.
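The sensitivity figures above are reported at a fixed specificity, which can be illustrated with a minimal sketch. The abstract does not describe how the decision threshold was chosen, so the approach below is an assumption: the cutoff is taken from the non-cancer scores so that a target specificity is met, and sensitivity is then read off the cancer scores. The function names and the toy scores are hypothetical and are not CancerRadar outputs.

```python
import math


def threshold_for_specificity(control_scores, target_specificity):
    """Return the smallest observed cutoff such that at least
    target_specificity of the controls score at or below it."""
    ranked = sorted(control_scores)
    # Number of controls that must be called negative (score <= cutoff).
    # round() guards against floating-point noise before taking the ceiling.
    needed = math.ceil(round(target_specificity * len(ranked), 9))
    needed = max(needed, 1)
    return ranked[needed - 1]


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """A sample is called positive when its score is strictly above the cutoff."""
    true_pos = sum(s > cutoff for s in case_scores)
    true_neg = sum(s <= cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Made-up classifier scores, for illustration only (assumption).
controls = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.65]
cases = [0.25, 0.40, 0.55, 0.70, 0.80, 0.90, 0.95, 0.35]

cutoff = threshold_for_specificity(controls, 0.90)
sens, spec = sensitivity_specificity(cases, controls, cutoff)
print(f"cutoff={cutoff:.2f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a cross-validated setting, the cutoff would typically be chosen on training folds only, so that the specificity measured on held-out samples is not optimistically biased.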
Citation Format: Mary Stackpole, Weihua Zeng, Shuo Li, Chun-Chi Liu, Yonggang Zhou, Shanshan He, Angela Yeh, Ziye Wang, Fengzhu Sun, Qingjiao Li, Zuyang Yuan, Asli Yildirim, Pin Jung Chen, Paul Winograd, Shize Li, Zorawar Noor, Edward Garon, Samuel French, Clara Magyar, Sarah Dry, Clara Lajonchere, Daniel Geschwind, Gina Choi, Sammy Saab, Frank Alber, Wing Hung Wong, Steven Dubinett, Denise Aberle, Vatche Agopian, Steven-Huy Han, Xiaohui Ni, Wenyuan Li, Xianghong Jasmine Zhou. Multi-feature ensemble learning on cell-free DNA for accurately detecting and locating cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 24.