Abstract
Colorectal cancer is frequently diagnosed at an advanced stage, underscoring the need for approaches to early detection. Liquid biopsy using cell-free DNA (cfDNA) fragmentomics is a promising approach, but its clinical application is hindered by complexity and cost. This study aimed to develop an integrated model using cfDNA fragmentomics for accurate, cost-effective detection of early-stage colorectal cancer. Plasma cfDNA was extracted and sequenced from a training cohort of 360 participants, including 176 patients with colorectal cancer and 184 healthy controls. An ensemble stacked model comprising five machine learning models was used to distinguish patients with colorectal cancer from healthy controls based on five cfDNA fragmentomic features. The model was validated in an independent cohort of 236 participants (117 patients with colorectal cancer and 119 controls) and in a prospective cohort of 242 participants (129 patients with colorectal cancer and 113 controls). The ensemble stacked model discriminated patients with colorectal cancer from controls, outperforming all base models and achieving an area under the receiver operating characteristic curve of 0.986 in the validation cohort. It reached 94.88% sensitivity and 98% specificity for detecting colorectal cancer in the validation cohort, with sensitivity increasing with cancer stage. The model also demonstrated consistently high accuracy in within-run and between-run tests and across various conditions in healthy individuals. In the prospective cohort, it achieved 91.47% sensitivity and 95.58% specificity. This integrated model capitalizes on the multiplex nature of cfDNA fragmentomics to achieve high sensitivity and robustness, offering significant promise for early colorectal cancer detection and broad patient benefit.
Significance: The development of a minimally invasive, efficient approach for early colorectal cancer detection using advanced machine learning to analyze cfDNA fragment patterns could expedite diagnosis and improve treatment outcomes for patients.
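As a reading aid, the prospective-cohort sensitivity and specificity quoted in the abstract can be related to the cohort sizes with a short calculation. The sketch below is illustrative only: the cohort sizes (129 patients, 113 controls) and the percentages come from the abstract, whereas the true-positive and true-negative counts are not reported there and are inferred here as the nearest integers consistent with those percentages. The function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with cancer that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of controls that the test correctly reports as negative."""
    return true_neg / (true_neg + false_pos)


# Prospective cohort sizes, as stated in the abstract.
n_cases, n_controls = 129, 113

# ASSUMPTION: these counts are not given in the abstract. They are the
# nearest integers implied by the reported 91.47% and 95.58%.
true_pos = round(0.9147 * n_cases)     # inferred true positives
true_neg = round(0.9558 * n_controls)  # inferred true negatives

sens = sensitivity(true_pos, n_cases - true_pos)
spec = specificity(true_neg, n_controls - true_neg)

print(f"sensitivity = {true_pos}/{n_cases} = {sens:.2%}")
print(f"specificity = {true_neg}/{n_controls} = {spec:.2%}")
```

Under these inferred counts, the recomputed values round to the 91.47% sensitivity and 95.58% specificity reported for the prospective cohort.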