Background: Detection of rare variants in traditional and liquid biopsies is becoming increasingly important for cancer diagnosis, monitoring and treatment. Next Generation Sequencing (NGS) has the capacity to detect known and novel variants in small biological samples with a high level of sensitivity. However, the limit of detection is affected by sample DNA concentration and integrity, technical error, and the number of unique molecules captured and sequenced (library complexity). Unique molecular identifiers (UMIs) and random fragment end analysis are commonly used to measure complexity, yet both are reported to have systematic biases that lead to a non-random read distribution, which can skew the data and alter interpretation of results. Here we describe the use of a spike-in complexity calibration ladder comprising synthetic DNA internal standard competitors (IS) as an orthogonal measure of library complexity.

Methods: An Endogenous Complexity Calibration Ladder (ECCL) comprising multiple unique synthetic IS at different concentrations was created. All ECCL IS share homology with each other and with the endogenous human SCGB1A1 sequence, but each contains nucleotide changes at different positions along the sequence so that they can be distinguished. The ECCL was mixed with several IS targeting different TP53 exons in a fixed proportion. The ECCL/TP53 IS mixture is designed to be used in either amplicon or hybrid-capture library preparation. Here, amplicon libraries were generated. Molecule numbers and ratios between IS were determined using deep sequencing of multiple replicates. The ECCL/TP53 IS mix was spiked into commercial human gDNA prior to NGS library preparation. Serial dilutions (1.5-, 3-, 6-, 12- and 24-fold) and intentional addition of inhibitors were used to stress-test the system. The ECCL/TP53 IS mixture was also spiked into gDNA from 60 primary airway epithelial cell (AEC) specimens prior to library preparation and NGS to measure TP53 variant fraction while controlling for complexity.
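
The dilution series above can be illustrated with a minimal sketch of how many copies of each ladder member would be expected at each dilution, and how likely each member is to be sampled at all. The ladder names and copy numbers below are hypothetical placeholders, not the ECCL composition, and the Poisson sampling model is an assumption made for illustration rather than the analysis used in this work.

```python
import math

# Hypothetical ladder: IS name -> copies spiked into the undiluted reaction.
# These values are illustrative assumptions, not the actual ECCL design.
LADDER_COPIES = {"IS_A": 10000.0, "IS_B": 1000.0, "IS_C": 100.0, "IS_D": 10.0}

# Dilution folds taken from the Methods text.
DILUTION_FOLDS = [1.5, 3, 6, 12, 24]


def expected_copies(copies: float, fold: float) -> float:
    """Mean number of copies remaining after diluting by `fold`."""
    return copies / fold


def prob_detected(mean_copies: float) -> float:
    """Poisson probability that at least one copy is sampled (assumed model)."""
    return 1.0 - math.exp(-mean_copies)


def dilution_table(ladder, folds):
    """Return (fold, IS name, expected copies, detection probability) rows."""
    rows = []
    for fold in folds:
        for name, copies in ladder.items():
            mean = expected_copies(copies, fold)
            rows.append((fold, name, mean, prob_detected(mean)))
    return rows


if __name__ == "__main__":
    for fold, name, mean, p in dilution_table(LADDER_COPIES, DILUTION_FOLDS):
        print(f"{fold:>5}-fold  {name}  expected={mean:9.2f}  P(detected)={p:.3f}")
```

Under this assumed model, the least concentrated ladder members drop out first as dilution increases, which is the kind of signal a ladder can use to bracket the number of molecules that entered the library.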

Results: Observed complexity measured using the ECCL was close to the expected value for each serially diluted mixture of gDNA and ECCL/TP53 IS. The precision of complexity measurement for less dilute and less inhibited samples was limited by the absence of finer titration of the least concentrated IS in the ECCL. Use of the ECCL enabled measurement of inter-sample variation in complexity among the 60 AEC specimens.

Conclusions: The ECCL controls for both sample- and library-specific variation in complexity, and enables better estimation of the lower limit of detection for variant allele fraction (VAF). Additionally, it is compatible with the use of target-specific IS, which allow for improved measurement of technical sequencing error. The variation in complexity observed in primary AEC gDNA samples demonstrates the need to account for complexity when assessing rare variant allele fraction.
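
To illustrate why complexity bounds the lower limit of detection for VAF, the sketch below computes the smallest VAF at which at least one variant molecule would be expected among N unique molecules with a chosen probability. It assumes simple binomial sampling and ignores technical sequencing error, so it is a sampling-only bound and not the estimation procedure used in this work; the function name and the 95% default are hypothetical choices.

```python
def min_detectable_vaf(unique_molecules: int, confidence: float = 0.95) -> float:
    """Smallest VAF f such that P(at least one variant molecule) >= confidence.

    Assumes each of the N unique molecules independently carries the variant
    with probability f, so P(at least one) = 1 - (1 - f) ** N. Solving for f
    gives f = 1 - (1 - confidence) ** (1 / N).
    """
    if unique_molecules < 1:
        raise ValueError("unique_molecules must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / unique_molecules)


if __name__ == "__main__":
    # Illustrative complexity values only; not measurements from the study.
    for n in (100, 1000, 10000):
        print(f"N={n:>6}  minimum detectable VAF ~ {min_detectable_vaf(n):.5f}")
```

Under these assumptions, a tenfold loss of complexity raises the sampling-limited VAF floor roughly tenfold, which is why per-sample complexity matters when reporting rare variants.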

Citation Format: Erin L. Crawford, Tian Chen, Daniel J. Craig, James C. Willey. Use of a synthetic spike-in ladder to measure NGS library complexity [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 2286.