Abstract
Homologous recombination deficiency (HRD) drives genomic instability in multiple cancer types and renders tumors vulnerable to certain DNA-damaging agents such as PARP inhibitors. Thus, HRD is emerging as an attractive biomarker in oncology. A variety of in silico methods are available for predicting HRD; however, few of these methods have been applied to cell lines in a comprehensive manner. Here, we utilized two of these methods, “Classifier of HOmologous Recombination Deficiency” and “HRDsum” scores, to predict HRD for 1,332 cancer cell lines and 84 noncancerous cell lines. Cell lines with biallelic mutations in BRCA1 or BRCA2, which encode key components of the homologous recombination pathway, showed the strongest HRD predictions, validating the two methods in cell lines. A small subset of BRCA1/2 wild-type cell lines was also classified as HRD, several of which showed evidence of epigenetic BRCA1 silencing. Similar to HRD in patient samples, HRD in cell lines was associated with p53 loss, was mutually exclusive with microsatellite instability, and occurred most frequently in breast and ovarian cancer types. In addition to validating previously identified associations with HRD, we leveraged cell line–specific datasets to gain new insights into HRD and its relation to various genetic dependency and drug sensitivity profiles. We found that in cell lines, HRD was associated with sensitivity to PARP inhibition in breast cancer but not at a pan-cancer level. By generating large-scale, pan-cancer datasets on HRD predictions in cell lines, we aim to facilitate efforts to improve our understanding of HRD and its utility as a biomarker.
HRD is common in cancer and can be exploited therapeutically, as it sensitizes cells to DNA-damaging agents. Here, we scored more than 1,300 cancer cell lines for HRD using two different bioinformatic approaches, thereby enabling large-scale analyses that provide insights into the etiology and features of HRD.
Introduction
Homologous recombination (HR) is a high-fidelity mechanism for the repair of DNA double-strand breaks. In the absence of HR, cells rely on more error-prone pathways such as polymerase θ–mediated end joining to repair double-strand breaks and thereby accumulate specific types of mutations over time (1). The genomic instability that develops in response to HR deficiency (HRD) can, in turn, drive tumorigenesis. HRD is present in up to half of all high-grade serous ovarian cancers, at least a quarter of primary breast cancers, and a subset of pancreatic and prostate cancers (2–4). As a source of genomic instability, HRD also sensitizes cells to certain types of DNA damage. Inhibition of PARP leads to an accumulation of DNA lesions that are insufficiently repaired in the absence of BRCA1 or BRCA2, key components of the HR pathway (5, 6). As a result, PARP inhibition leads to cell death in BRCA1/2-deficient tumors.
Due to its prevalence in cancer, HRD has garnered much interest from a clinical perspective, and several methods have been developed to detect HRD (7). Traditionally, tests have focused on germline and somatic mutations in BRCA1 and BRCA2, the most common causes of HRD. Gene panels that include additional HR genes such as BRIP1, PALB2, and RAD51C have also been developed to capture potential causes of HRD that extend beyond BRCA1/2 mutations (8). Although genetic testing is well-established, this approach has a few caveats. First, variants of unknown significance can be difficult to interpret. Second, genetic testing does not capture epigenetic alterations such as BRCA1 promoter methylation, a mechanism of BRCA1 inactivation that is present in approximately 11% of ovarian cancers and 13% to 25% of breast cancers (9–12). Lastly, proper gene selection for panel-based tests is limited by our incomplete understanding of non-BRCA causes of HRD. Given the constraints of genetic testing, alternative HRD detection methods are gaining traction. Quantification of RAD51 foci, which mark a key step in the HR pathway, provides a relatively direct test of HRD. However, clinical implementation of RAD51 foci quantification and other functional assays remains challenging, as these assays are labor-intensive and often require fresh tumor tissue (13). Moreover, the RAD51 foci assay is insensitive to downstream defects in HR such as the loss of RAD51AP1 (14). Another approach for detecting HRD assesses the mutational footprints, or genomic “scars,” characteristic of HRD tumors. As these genomic scars accumulate because of HRD, they enable HRD detection regardless of the cause. Genomic scar-based assays are also clinically amenable and have been approved as diagnostic tests for PARP inhibitor (PARPi) therapy in patients with ovarian cancer (7).
Multiple methods are available for predicting HRD based on genomic scars (Table 1). One of the most commonly used metrics is the “HRD score,” which represents the sum of the counts of three different types of large-scale changes in allelic content/copy number that have been shown to associate with BRCA1/2 deficiency: LOH, large-scale state transitions, and telomeric allelic imbalance (15–19). To clearly distinguish HRD scores from other HRD metrics, we subsequently refer to HRD scores as “HRDsum” scores. More recently, machine-learning approaches such as HRDetect and “Classifier of HOmologous Recombination Deficiency” (CHORD) have identified additional combinations of genomic scars predictive of BRCA1/2 deficiency (3, 20). CHORD primarily uses the relative counts of microhomology-mediated deletions and 1 to 10 kb duplications to predict HRD, whereas HRDetect combines the proportion of microhomology-mediated deletions with the HRDsum score and four different mutational signature scores to predict HRD. Compared with HRDetect, CHORD requires less preprocessing of the data, was trained on a pan-cancer cohort as opposed to a breast cancer cohort, and includes the ability to predict BRCA1 vs BRCA2 subtypes (Table 1). HRDetect and CHORD both perform better than HRDsum scores at predicting BRCA1/2 deficiency, but HRDetect and CHORD were designed for analysis of whole-genome sequencing (WGS), whereas HRDsum scores can be generated from more readily available data types, including SNP arrays and whole-exome sequencing (WES; Table 1).
. | HRD score (HRDsum) . | HRDetect . | CHORD . |
---|---|---|---|
Predictive scar types | LOH, LST, and TAI | Microhomology-mediated deletions, SBS3, RS3, RS5, … | Microhomology-mediated deletions, 1–10 kb duplications, … |
Input | SNP array, WES, or WGS | WGS | WGS |
Preprocessing | Allele-specific copy-number analysis | Call somatic mutations, fit mutational signatures, and calculate HRDsum | Call somatic mutations |
Output | LOH + LST + TAI = HRDsum | PHRD | PBRCA1-type HRD + PBRCA2-type HRD = PHRD |
Training set | Breast (n = 497) and ovarian (n = 561) cancers (18) | Breast cancer (n = 560; ref. 20) | Pan-cancer (n = 3,824; ref. 3) |
Performance | Pan-cancer AUROC = 0.91 (23) | Breast cancer AUROC = 0.98 (20) | Pan-cancer AUROC = 0.98 (3) |
. | HRD score (HRDsum) . | HRDetect . | CHORD . |
---|---|---|---|
Predictive scar types | LOH, LST, and TAI | Microhomology-mediated deletions, SBS3, RS3, RS5, … | Microhomology-mediated deletions, 1–10 kb duplications, … |
Input | SNP array, WES, or WGS | WGS | WGS |
Preprocessing | Allele-specific copy-number analysis | Call somatic mutations, fit mutational signatures, and calculate HRDsum | Call somatic mutations |
Output | LOH + LST + TAI = HRDsum | PHRD | PBRCA1-type HRD + PBRCA2-type HRD = PHRD |
Training set | Breast (n = 497) and ovarian (n = 561) cancers (18) | Breast cancer (n = 560; ref. 20) | Pan-cancer (n = 3,824; ref. 3) |
Performance | Pan-cancer AUROC = 0.91 (23) | Breast cancer AUROC = 0.98 (20) | Pan-cancer AUROC = 0.98 (3) |
Only the top predictive scar types are shown for HRDetect and CHORD. Calling somatic mutations in preparation for HRDetect and CHORD includes SNV, small indels, and structural variants. Performance is represented as the AUROC for predicting BRCA1/2 deficiency.
Abbreviations: LST, large-scale state transition; PHRD, HRD probability; PBRCA1-type HRD, BRCA1-type HRD probability; PBRCA2-type HRD, BRCA2-type HRD probability; RS, rearrangement signature; SBS, single-base substitution signature; TAI, telomeric allelic imbalance.
Genomic scar-based HRD predictions are available for thousands of patient tumor samples, thus enabling the identification of associations between HRD and other features such as gene deficiencies, cancer types, and clinical outcomes (3, 20–23). Given that a wide range of datasets and experimental setups are available exclusively to cell lines, HRD predictions for cell lines could provide additional insights into the molecular underpinnings of its mechanism. To this end, we adapted the CHORD method to cell lines to generate HRD predictions for more than 300 cancer cell lines. To predict HRD in cell lines that lacked WGS and were therefore ineligible for CHORD analysis, we also calculated HRDsum scores, resulting in HRD predictions for a total of 1,416 cell lines. We validated the quality of these predictions by comparing them with known BRCA1/2 status and demonstrated the utility of these datasets by using them to explore associations between HRD and various other features in cell lines.
Materials and Methods
WGS for CHORD
CHORD was applied to 336 cell lines with WGS available. FASTQ files for 329 of these cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE, RRID: SCR_013836; ref. 24). The other seven cell lines were sequenced for this study as follows: 647-V (DSMZ, RRID: CVCL_1049), J82 (ATCC, RRID: CVCL_0359), JIMT-1 (DSMZ, RRID: CVCL_2077), MDA-MB-436 (ATCC, RRID: CVCL_0623), NCI-H1915 (ATCC, RRID: CVCL_1505), SUSA (DSMZ, RRID: CVCL_L280), and UWB1.289 (ATCC, RRID: CVCL_B079) cells were cultured for 48 to 72 hours at 37°C, 5% CO2 in the supplier’s recommended media and collected for gDNA isolation. gDNA was isolated on an XTRACT 16+ (AutoGen) using an XK110-96 kit and quantified on a Qubit 4 (Invitrogen). Cell lines were confirmed to be Mycoplasma negative using the BIOWING Applied Biotechnology kit (Cat #: BPM50) or the Sigma LookOut Mycoplasma Testing Kit (Cat #: MP0035) and authenticated using short tandem repeat (STR) profiling. For all samples, reads were aligned to human genome reference GRCh38 using BWA-MEM (v0.7.17, RRID: SCR_010910; ref. 25).
Variant calling and filtering for CHORD
Prior to calling variants, duplicate reads were marked and removed using the MarkDuplicates algorithm from GATK (v4.1.7.0, RRID: SCR_001876). Single-nucleotide variants (SNV) and small insertions and deletions (indel) were detected using SAGE (v3.0.1) and annotated using PAVE (v1.2). SAGE and PAVE are both available from the Hartwig Medical Foundation (HMF, https://github.com/hartwigmedical/hmftools). As SAGE is optimized for samples with 100× coverage by default, we made the following adjustments for analysis of the 30× WGS samples: min_tumor_qual filters were adjusted to the cutoffs recommended for 30× coverage (Hotspot = 40; Panel = 60; High Confidence = 100; Low Confidence = 150), and the read depth parameters max_read_depth, max_read_depth_panel, and max_realignment_depth were lowered to 400, 40,000, and 400, respectively. Using the GRCh38 HMF cohort panel of normals as a reference, common variants were removed with the recommended cutoffs (HOTSPOT: 5:5; PANEL: 2:5; UNKNOWN: 2:0) during the PAVE annotation step. Structural variants were detected using GRIDSS (v2.13.2; refs. 26, 27). The HMF tool GRIPSS (https://github.com/hartwigmedical/hmftools) was used to remove low-quality structural variants and flag structural variants present in HMF cohort panels of normals.
In addition to removing variants marked as Panel of Normals (“PON”) by PAVE and GRIPSS, we applied a variety of other filters to help select high-quality somatic variants. First, indels were normalized, multi-nucleotide variants were split into SNVs, and multiallelic SNVs and indels were separated into multiple entries. Then, SNVs and indels present in gnomAD v2.1 (RRID: SCR_014964), dbSNP build 146 (RRID: SCR_002338), and/or the Homo_sapiens_assembly38.known_indels.vcf file from the GATK resource bundle were removed. SNVs, indels, and structural variants were also removed if they were present in three or more cell lines, a cutoff that was chosen based on CHORD performance. To obtain high-quality variants, we selected PASS variants with allele frequencies of ≥0.1. For SNVs and indels, we also required that variants in low-confidence regions have a quality score of ≥240 and variants in high-confidence regions have a quality score of ≥170. For structural variants, we required that variants have a quality score ≥1,000 and have supporting assemblies from both sides of the breakpoint (AS > 0 and RAS > 0). All variant processing and filtering steps were performed using BCFtools (28).
CHORD analysis
In preparation for CHORD, structural variant lengths and categories (deletions, duplications, inversions, and translocations) were determined using code adapted from https://github.com/PapenfussLab/gridss/blob/master/example/simple-event-annotation.R. Mutation “contexts,” or counts for various mutation types, were extracted and then used to generate CHORD predictions with the CHORD package in R (Supplementary Table S1; ref. 29). In the original CHORD analysis of patient samples, tumors containing more than 14,000 indels in repeat regions were classified as showing microsatellite instability (MSI; ref. 3). For cell lines, we found that a cutoff of 10,000 indels in repeat regions resulted in better alignment with previously determined MSI annotations (30). Therefore, the min.msi.indel.rep parameter was lowered to 10,000. All other parameters remained unchanged from the default setting.
Gene deficiency and proficiency labeling
BRCA1, BRCA2, and other genes included in this study were labeled as deficient if they met one of the two following criteria: (i) The gene was homozygous (allele frequency >0.85) for a mutation that was annotated as pathogenic by ClinVar (RRID: SCR_006169) and/or predicted to have a high impact by VEP (v106, RRID: SCR_007931; ref. 31). For this analysis, cell line mutations were obtained from the Cancer Dependency Map (DepMap) portal (https://depmap.org, RRID: SCR_017655) under release 22Q4 and then annotated with VEP. (ii) The gene contained a deep deletion, as determined by copy-number analysis (WES_pureCN_CNV_genes_cn_category_20221213.csv) from the Sanger Institute (https://cellmodelpassports.sanger.ac.uk/downloads). For BRCA1 and BRCA2, we also reviewed the literature for additional information on functional status. With this approach, we identified HCC1428 as a BRCA2 revertant.
For assessment of CHORD performance, BRCA1/2 deficiency and proficiency labels were assigned as follows. Cell lines that met one of the deficiency criteria described above were labeled as BRCA1/2 deficient. Cell lines were labeled as BRCA1/2 proficient if they (i) lacked any mutations in BRCA1 and BRCA2, according to the mutation calls from DepMap (22Q4); (ii) lacked any deep deletions in BRCA1 and BRCA2, according to the copy-number calls from the Sanger Institute; and (iii) expressed BRCA1 and BRCA2 at levels above the lower quartile [log2 (TPM + 1) = 3.8 for BRCA1, 2.07 for BRCA2]. Expression levels were obtained from the DepMap portal (https://depmap.org) under release 22Q4.
SIFT (RRID: 012813) and PolyPhen (RRID: SCR_013189) predictions for the BRCA1 variant Y1853C were determined using VEP (v106; ref. 31).
To identify cases of BRCA1 gene silencing, we looked for cell lines that showed both relatively low expression of BRCA1 and relatively high levels of BRCA1/NBR2 promoter methylation. Expression values were downloaded from the DepMap portal (release 22Q4), and promoter methylation values were obtained from the 2019 CCLE (24). We defined low BRCA1 expression as log2 (TPM + 1) values below the lower quartile (3.8) and high promoter methylation as methylation fraction values of >0.3. When comparing BRCA1 expression with NBR2 promoter methylation, a defined cluster of cell lines (MDA-MB-134-VI, EFO21, TE5, OVCAR4, SNU119, OVCAR8, and CAL851) met these criteria. OVCAR8 was also reported in the literature as showing BRCA1 methylation (32). Promoter methylation data for HCC38 was unavailable from the CCLE. However, HCC38 was previously reported as showing BRCA1 promoter hypermethylation and minimal BRCA1 expression (33). Therefore, we annotated HCC38 as a likely case of BRCA1 gene silencing, along with the aforementioned cell lines.
RAD51 immunofluorescence assay
Cell lines RPE-1 (ATCC, RRID: CVCL_4388), U-2 OS (ATCC, RRID: CVCL_0042), CAOV3 (ATCC, RRID: CVCL_0201), COV362 (ECACC, RRID: CVCL_2420), UWB1.289 (ATCC, RRID: CVCL_B079), and MDA-MB-436 (ATCC, RRID: CVCL_0623) were confirmed to be Mycoplasma negative using the BIOWING Applied Biotechnology kit (Cat #: BPM50) or the Sigma LookOut Mycoplasma Testing Kit (Cat #: MP0035) and were authenticated using STR profiling. After thawing, cells were maintained in the supplier’s recommended media and incubated at 37°C, 5% CO2 for six passages. Cells were then seeded on coverslips in a six-well plate in the supplier’s recommended media and incubated overnight. The following day, compounds were added, and cells were incubated for 24 hours. Cells were then washed with PBS, pre-extracted with 0.5% Triton X-100 in PBS for 5 minutes on ice, and fixed with 4% paraformaldehyde for 15 minutes at room temperature. Fixed cells were incubated with primary antibodies against RAD51 (1:250 Abcam 133534) at 37°C for 1 hour, washed, and incubated with secondary antibodies (1:250 Alexa Fluor 488 goat anti–rabbit IgG) for 1 hour at room temperature. After washing, the coverslips were mounted onto glass slides using a VECTASHIELD mounting medium containing 4',6-diamidino-2-phenylindole (Vector Laboratories). Images were taken under a 60× oil immersion lens on a DeltaVision Elite Widefield Deconvolution microscope for analysis.
HRDsum score analysis
HRDsum scores were determined from three different allele-specific copy-number datasets. The first dataset, referred to as the “Broad” dataset, was acquired from the 2019 CCLE study and contains copy-number calls for 997 cell lines analyzed by the Broad Institute (24). The second and third datasets, referred to as “Sanger (Broad WES)” and “Sanger (Sanger WES),” respectively, were both derived from copy-number calls (WES_pureCN_CNV_segments_20220623.csv) provided by the Sanger Institute (https://cellmodelpassports.sanger.ac.uk/downloads). The “Sanger (Broad WES)” dataset contains copy-number calls for 324 cell lines with WES originating from the Broad Institute, and the “Sanger (Sanger WES)” dataset contains copy-number calls for 1,047 cell lines with WES originating from the Sanger Institute. HRDsum scores were calculated from each dataset using code adapted from a previous study (21).
After generating HRDsum scores for each dataset, we merged the three sets of scores into one summary dataset. First, we calculated the median HRDsum score for each of the 104 cell lines that were shared by all three datasets. Then, for each dataset, we fit a regression model using natural splines to predict the median HRDsum score for the 104 common cell lines. The three resulting models were applied to their corresponding datasets to generate normalized HRDsum scores for all cell lines. The normalized scores from all three datasets were combined, and the mean score was calculated for cell lines present in more than one dataset. These “summary” HRDsum scores were used throughout this study and are available as a Supplementary Table, along with the raw HRDsum scores for each dataset.
To assign tissue types to each cell line in the HRDsum score dataset, we harmonized the Broad Institute and Sanger Institute tissue type annotations. Broad Institute annotations were downloaded from the DepMap portal (https://depmap.org) under release 22Q4, and Sanger Institute annotations (model_list_20230110.csv) were downloaded from https://cellmodelpassports.sanger.ac.uk/downloads. Sanger tissue terms were converted to Broad tissue terms, and Broad annotations were used in cases in which the two different annotations did not agree. Noncancerous cell lines were grouped into a separate category labeled “noncancerous.” These tissue annotations were used throughout this study.
Mutation enrichment analysis
To test if HRD cell lines were enriched for deficiencies in any DNA repair–related genes, we first obtained a list of 276 genes involved in DNA damage repair that was curated in a previous study (22). Deficiencies in these genes were determined as described above. Enrichment was determined with a one-tailed Fisher's Exact test, and P values were adjusted with the Benjamini–Hochberg method. Only genes showing a deficiency in at least one cell line classified as HRD were included in the analysis.
CRISPR screen analysis
To compare the fitness effects of POT1 and TINF2 knockouts in BRCA1/2-deficient versus BRCA1/2 wild-type (WT) cell lines, we analyzed isogenic CRISPR screen data from a previous study (34). Using the read counts in source data Fig. 1 of the publication, the log2 fold change of endpoint counts (plus a pseudocount of 1) over the initial timepoint counts (plus a pseudocount of 1) was calculated for each guide in each replicate. For each replicate, the distribution of guide-level log fold changes was centered using the median log fold change. These values were then averaged across replicates to obtain guide-level scores. Gene-level scores were determined by calculating the median guide-level score.
Association analyses
Associations between HRD predictions and other features were determined with either a Pearson’s correlation test or by binning cell lines based on HRDsum quartiles and performing a Kruskal–Wallis test. For analyses exploring associations among multiple CCLE features or drug sensitivities, P values were adjusted with the Benjamini–Hochberg method. Noncancerous and MSI cell lines were excluded from these analyses. Gene expression values, damaging mutations, relative copy-number values, and gene dependency scores were downloaded from the DepMap portal (https://depmap.org) under release 22Q4. Promoter methylation fractions were obtained from the 2019 CCLE study (24). For the PRISM Repurposing drug response dataset, which was downloaded from the DepMap portal under release 23Q2, associations were tested using the log2 fold change in viability (treatment vs. DMSO; ref. 35). For the GDSC2 drug response dataset, the July 24, 2022, release was downloaded from https://www.cancerrxgene.org/downloads, and associations were tested using the log10 IC50 values (36). Annotations for MSI and p53 functional status were obtained from a previous study (30).
PARPi sensitivity determination
Sensitivity to niraparib or olaparib was measured via clonogenic assays. Cell lines were sourced from American Type Culture Collection (ATCC), Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ), Japanese Collection of Research Bioresources, European Collection of Authenticated Cell Cultures (ECACC), Research Cell Bank, and Korean Cell Line Bank, were confirmed to be Mycoplasma negative using the BIOWING Applied Biotechnology kit (Cat #: BPM50) or the Sigma LookOut Mycoplasma Testing Kit (Cat #: MP0035), and were authenticated using STR profiling. After thawing, cells were passaged twice and then seeded in six-well dishes in the supplier’s recommended media and incubated at 37°C, 5% CO2 overnight. The following day, niraparib (Selleckchem) or olaparib (Selleckchem) was added at a top dose of 10 µmol/L across a 10-point, 1:3 dilution curve. Following 7 days of drug treatment, media were removed, and fresh media containing drug were added. On day 14, media were removed, and cells were fixed with 4% paraformaldehyde for 15 minutes. Cells were washed with water and stained with 0.1% crystal violet in a 10% (v/v) ethanol in dH2O solution for 20 minutes. Plates were washed with water and dried overnight at room temperature. Crystal violet was extracted using 10% acetic acid, and absorbance was read using a BioTek Cytation 5 plate reader.
Absorbance values were used to quantify IC50 values via the following procedure: For each assay, a spreadsheet containing absorbance values was ingested into an internal LIMS database using the Apache POI library (version 4.1.1), and values were normalized to the average value of the wells containing the control (e.g., DMSO). From these normalized values, single-agent IC50 values were calculated by fitting the single agent data to a two-parameter dose–response curve of the form , in which y represents the response, with 0 indicating no response and 1 indicating complete inhibition; x represents the log10 of the concentration in nmol/L; and the IC50 value represents given by 10a. If the optimization function failed to find a fit using this model, fitting was performed using the alternative model , in which x and y serve the same roles as previously described and in which the IC50 value was given instead by a. In both cases, fitting was performed by the SimpleCurveFitter.fit method of the Apache Commons Math library (version 3.6.1).
Cell lines were classified as “sensitive” to PARP inhibition if at least 66% of replicates showed an IC50 value of <1 µmol/L and a maximal inhibitory response of >75%. Cell lines were classified as “insensitive” to PARP inhibition if at least 66% of replicates showed an IC50 value of ≥1 µmol/L and a maximal inhibitory response of <35%. Cell lines that did not meet either the “sensitive” or “insensitive” criteria were excluded from the analysis.
Statistical analysis
All statistical analyses were performed using R version 4.1.3 (RRID: SCR_001905; ref. 37). Specific statistical tests along with P or q values are described in each figure legend.
Data availability
WGS data generated for this study are available at Sequence Read Archive under accession number PRJNA1151053. All other data generated in this study are available in the article and its Supplementary Materials. Sources for publicly available datasets analyzed in this study are described above. Source code for the generation of CHORD probabilities and HRDsum scores is available at https://github.com/ksqtx/HRD-predictions-cell-lines.
Results
CHORD performance in cell lines
To predict HRD in cell lines, we first applied the CHORD model to 336 cell lines with WGS available (Fig. 1A). CHORD predicts HRD based on somatic variants, which can be called by comparing variants between tumors and matching normal samples. However, matching normal samples are not available for most cancer cell lines. To identify probable somatic variants for cell lines without matching normal samples, we inferred variants in tumor-only mode and then applied multiple filters that remove known germline variants as well as variants that are likely artifacts of PCR amplification or sequencing (Fig. 1A; see “Materials and Methods”). As CHORD relies on relative counts of microhomology-mediated deletions and other mutation types, it cannot be applied to samples with MSI, as these samples contain an abundance of indels in repeat regions that dominate the signal (3). CHORD classified 32 cell lines as MSI, all of which were called MSI in an independent study (Supplementary Fig. S1; ref. 30). For the remaining 304 cell lines, CHORD generated probability scores for both BRCA1-type and BRCA2-type HRD as well as an overall HRD probability, which is equal to the sum of the two subtype scores (Supplementary Table S2).
To test how well CHORD performs on cell lines, we compared the subtype and overall HRD probabilities of cell lines showing biallelic loss of BRCA1 or BRCA2 (n = 8) with the corresponding probabilities of cell lines likely to be proficient for BRCA1/2 (n = 176). BRCA1/2 proficiency was defined as cell lines that are WT for BRCA1 and BRCA2 and also show BRCA1/2 expression levels above the lower quartile. Overall, CHORD predictions agreed well with the known BRCA status of these cell lines, in terms of both the overall HRD probabilities as well as the BRCA1/2 subtype predictions (Fig. 1B). To quantify CHORD performance, we first measured the area under the receiver operating characteristic curve (AUROC). The AUROC values were 0.996, 0.994, and 1 for overall HRD predictions, BRCA1-type predictions, and BRCA2-type predictions, respectively (Fig. 1C). Given the class imbalance (8 BRCA1/2-deficient vs. 176 BRCA1/2-proficient cell lines), we also measured the area under the precision–recall curve (AUPRC). The AUPRC values were 0.928, 0.751, and 1 for overall HRD predictions, BRCA1-type predictions, and BRCA2-type predictions, respectively (Fig. 1C). As all known BRCA1/2-deficient cell lines showed an HRD probability of >0.4, we set this value as the cutoff for classification and subsequently refer to cell lines with HRD probabilities of >0.4 as CHORD-HRD and cell lines with HRD probabilities of ≤0.4 as CHORD–HR-proficient or CHORD-HRP. With an HRD probability cutoff value of 0.4, three of the 11 cell lines predicted to be HRD (CAOV3, PANC0327, and UACC893) were initially labeled as BRCA1/2 proficient, as they lack detectable mutations in BRCA1/2 and show intermediate to high levels of BRCA1/2 expression (Supplementary Table S3). It is possible that these three cell lines are indeed false positives. However, given that CHORD is intended to capture instances of HRD that are overlooked by more traditional detection methods, it is also possible that these cell lines are truly HRD, perhaps because of non-BRCA causes that would have escaped detection by our labeling criteria. Potential causes of HRD in these cell lines are further explored in the following section.
To compare the results of CHORD with those of a functional assay for measuring HR repair capability, we measured RAD51 foci formation in response to DNA-damaging agents for five cell lines included in the CHORD dataset, as well as the noncancerous HR-proficient cell line RPE-1 as a control. Four of the five CHORD predictions (one CHORD-HRP and three CHORD-HRD) agreed with the corresponding RAD51 foci status, suggesting an 80% concordance rate between CHORD and the RAD51 foci assay (Supplementary Fig. S2). The fifth cell line, CAOV3, which was predicted to be HRD, retained the ability to form RAD51 foci in the functional assay.
Genetic and epigenetic BRCA1/2 alterations explain the majority of CHORD-HRD classifications
After assessing the performance of CHORD on BRCA1/2-deficient and likely BRCA1/2-proficient cell lines, we extended our analysis to all 336 cell lines with WGS available. In addition to the eight cell lines known to be BRCA1/2-deficient, 12 other cell lines were classified as CHORD-HRD (Fig. 2A). This included KE39, a gastric carcinoma cell line that is homozygous for a missense mutation in BRCA1 (Y1853C) but was not originally labeled as BRCA1/2-deficient because this mutation was not deemed damaging by our initial criteria (see “Materials and Methods”). To further assess the Y1853C variant, we predicted its functional impact using SIFT and PolyPhen and found that Y1853C scored as “deleterious” with a value of 0 and as “probably damaging” with a value of 1, respectively. Furthermore, the consensus for Y1853C in the ClinVar database is “likely pathogenic.” These observations, combined with the BRCA1-type HRD prediction made by CHORD, suggest that KE39 is deficient for BRCA1. As genomic scars serve as a historical record of repair deficiencies and do not necessarily reflect the current repair status, CHORD, and other scar-based methods are likely to classify BRCA1/2-revertant lines as HRD. Indeed, HCC1428, a BRCA2-mutated cell line with a secondary mutation that rescues BRCA2 function, showed an HRD probability of 0.792 (Fig. 2B).
Genetic inactivation of BRCA1/2 was not detected in the remaining 10 cell lines classified as HRD. To explore the underlying causes of HRD in these cell lines, we calculated Pearson correlations between CHORD-HRD probabilities and various features available from the CCLE in WT BRCA1/2 cell lines. The most significant association was a positive correlation with promoter methylation of NBR2 (neighbor ofBRCA1 gene 2), with five BRCA1-type CHORD-HRD lines showing relatively high levels of NBR2 promoter methylation (q < 0.0001; Fig. 2B). Interestingly, NBR2 shares a bidirectional promoter with BRCA1 (38). CHORD-HRD probabilities also correlated positively with BRCA1 promoter methylation levels (q < 0.0001 for sites chr17:41276132-41277132 and chr17:41277500-41278500) and negatively with BRCA1 expression levels (q < 0.0001; Supplementary Fig. S3A). Together, these observations suggest that multiple BRCA1-type CHORD-HRD lines are HRD because of epigenetic silencing of BRCA1. To identify all likely cases of BRCA1 silencing, we determined which cell lines showed evidence of BRCA1 silencing in previous studies and/or showed both relatively high levels of NBR2/BRCA1 promoter methylation and relatively low levels of BRCA1 expression (see “Materials and Methods”; Supplementary Fig. S3B). Of the eight cell lines that met these criteria, six cell lines (CAL851, EFO21, HCC38, OVCAR4, OVCAR8, and SNU119) were classified by CHORD as BRCA1-type HRD (Fig. 2A). Thus, epigenetic inactivation of BRCA1 likely explains a substantial fraction of HRD classifications (6/10) in WT BRCA1/2 cell lines. Although OVCAR8 showed low levels of BRCA1 expression and relatively high levels of BRCA1 methylation using data from the CCLE (Supplementary Fig. S3B), OVCAR8 is also reported to only have heterozygous BRCA1 methylation and is capable of forming RAD51 foci, suggesting a partial reversal of HRD (39). This reflects a limitation of this analysis because the methylation status of any HR genes may be altered during cell culture/cell line passages after a deficiency in HR is encoded in the genome.
After accounting for both the genetic and epigenetic status of BRCA1/2, four CHORD-HRD cell lines (CAOV3, PANC0327, HCC1187, and UACC893) lacked any detectable BRCA1/2 alterations. To investigate the potential causes of HRD in these cell lines, we determined if biallelic loss of any DNA repair–related genes other than BRCA1/2 was enriched in these cell lines using a previously curated list of genes involved in major DNA damage repair pathways (22). Although no genetic deficiencies were significantly enriched after adjusting for multiple testing, it is interesting to note that the top hit was FANCC, which encodes a Fanconi anemia protein that has been shown to promote HR (Supplementary Table S4; ref. 40). The BRCA1-type CHORD-HRD cell line PANC0327 harbors a deletion of FANCC, whereas none of the CHORD-HRP cell lines showed biallelic inactivation of FANCC (Fig. 2A). In patient samples, deficiencies in RAD51C and PALB2 associate with BRCA2-type HRD predictions (3). In cell lines, however, associations with these genetic deficiencies could not be tested, as none of the cell lines analyzed by CHORD show biallelic inactivation of either RAD51C or PALB2. As promoter methylation of RAD51C can also cause HRD, we also examined RAD51C expression and promoter methylation levels in the cell lines analyzed by CHORD using data available from the CCLE (41, 42). None of the CHORD-HRD cell lines showed high levels of RAD51C methylation combined with low levels of RAD51C expression (Supplementary Fig. S3C). Two CHORD-HRD cell lines, CAL851 and EFO21, showed partial methylation of RAD51C, but these cell lines are most likely HRD because of BRCA1 methylation. Aside from the FANCC deletion in PANC0327, no other DNA repair–related gene deficiencies were exclusive to CHORD-HRD cell lines, and the underlying cause of the HRD phenotype for CAOV3, HCC1187, and UACC893 remains to be determined.
Generation of HRDsum scores for cell lines
The above results suggest that with the proper preprocessing of the data, CHORD can be a powerful tool for predicting HRD and BRCA1/2 subtypes in cell lines. In its current state, however, the CHORD method is only applicable to cell lines with WGS available. In comparison, HRDsum scores can be determined from more readily available types of data such as SNP arrays or WES. Therefore, to gain a more comprehensive understanding of HRD in cell lines, we also calculated HRDsum scores for cell lines using an approach that was previously applied to patient samples (21). Starting with allele-specific copy-number calls available from either the Broad Institute or Sanger Institute, we generated HRDsum scores for three different datasets (Fig. 3A). A subset of cell lines was present in more than one dataset, enabling us to compare HRDsum scores between the different datasets for these cell lines (Fig. 3B). Although there was a shift in absolute values between the datasets, HRDsum scores derived from different data sources were highly correlated, with Pearson correlation coefficient values of ≥0.89 (Fig. 3C). We used natural spline regression to integrate the different datasets into a single summary HRDsum score for each cell line (see “Materials and Methods”), resulting in a dataset of HRDsum scores for 1,414 cell lines (Supplementary Table S5).
Like CHORD probabilities, HRDsum scores agreed well with known BRCA1/2 status. As expected, BRCA1/2-deficient cell lines ranked among the highest HRDsum scores (Fig. 3D). HRDsum scores in cell lines with either biallelic loss of BRCA1/2 or likely epigenetic silencing of BRCA1 were significantly higher than HRDsum scores in tissue-matched cell lines that are likely BRCA1/2-proficient (Supplementary Fig. S4A). As was the case with CHORD (and is likely the case for most scar-based HRD detection methods), the HRDsum score method produced a relatively high score of 52 for the BRCA2-revertant cell line HCC1428 (Fig. 3D). Overall, the strong association between HRDsum scores and BRCA1/2 alterations suggests that HRDsum scores serve as an independent and high-quality proxy of HRD status.
For patient samples, analyses of HRDsum scores often apply a cutoff value of ≥42, which was set by calculating the fifth percentile of HRDsum scores for a group of BRCA1/2-deficient breast and ovarian tumors (18). For cell lines, we propose a cutoff value of ≥47, which was the fifth percentile of HRDsum scores for BRCA1/2-deficient cell lines (Fig. 3D). Eighty-two of 1,414 cell lines (6%) fell above this cutoff and were therefore predicted to be HRD. We subsequently refer to these cell lines as HRDsum-high. BRCA1/2 deficiencies were detectable in 21 (26%) of HRDsum-high cell lines. To explore other possible causes of HRD in the remaining 61 HRDsum-high cell lines, we tested if these cell lines were enriched for biallelic mutations in any other DNA repair–related genes, similar to the analysis performed for the CHORD dataset. The top hit was a deficiency for ATRX, which encodes a chromatin remodeler known to function in HR (Supplementary Fig. S4B, Supplementary Table S6; ref. 43). However, the enrichment for ATRX mutations was not significant after correcting for multiple testing. In our enrichment analysis for CHORD, we identified FANCC loss as a possible explanation for the high HRD probability observed in PANC0327 (Fig. 2A). PANC0327 was not HRDsum-high; however, a second FANCC-deficient cell line, HUH7, showed a high HRDsum score of 52 (Supplementary Fig. S4B). HUH7 was not included in the CHORD dataset and therefore the CHORD probability for this cell line remains to be determined.
Next, we compared HRDsum scores with CHORD probabilities for the 302 cell lines that were in common between the two datasets (Supplementary Fig. S4C). CHORD-HRD cell lines showed significantly higher HRDsum scores than CHORD-HRP cell lines, and most (17/20) CHORD-HRD cell lines were HRDsum-high (Fig. 3D; Supplementary Fig. S4D). Although the strongest HRD predictions were shared by both the CHORD and HRDsum score datasets, some HRD predictions were unique to one of the two datasets (Supplementary Fig. S4E). UACC893, HCC1187, and PANC0327, for instance, were all classified as CHORD-HRD, but the HRDsum scores for these cell lines fell below the HRDsum cutoff (Fig. 2A; Supplementary Fig. S4E).
HRD cell lines are primarily breast and ovarian cancer types
Using both the CHORD and HRDsum score datasets, we investigated the prevalence of HRD across different cancer types. In patient samples, CHORD-HRD classifications are most frequent in ovarian (30%–52%) and breast (12%–24%) cancer types, followed by pancreatic (7%–13%) and prostate (6%–13%) cancers (3). We grouped cell lines by tissue type and observed that similar to patient samples, CHORD-HRD predictions were most frequent in cell lines derived from ovarian (38%) and breast (23%) cancers (Fig. 4A). HRD predictions were also present in cell lines derived from pancreatic (18%), esophagus/stomach (4%), and lung (1%) cancers (Fig. 4A). Similar to the CHORD dataset, the HRDsum score dataset showed the highest prevalence of HRD in breast (33%) and ovarian (21%) tissue types (Fig. 4B). The highest HRDsum score was in the cervical cancer cell line DOTC24510, which is homozygous for a stop-gain variant in BRCA2. As expected, noncancerous cell lines showed the lowest mean HRDsum score (Fig. 4B). The relatively high mean HRDsum score for the esophagus/stomach tissue type was somewhat surprising, although a previous pan-cancer analysis of HRDsum scores in patient samples also showed an enrichment of HRD in esophageal carcinomas (Fig. 4B; ref. 23). Although up to 13% of samples of patients with prostate tumor score as HRD, none of the prostate cancer cell lines included in our analysis qualified as HRD (3, 23). However, the CHORD and HRDsum datasets included only one and eight prostate cell lines respectively, and four of those cell lines are MSI, which has been shown to be mutually exclusive with HRD (21, 44). Additional samples are needed to obtain a better understanding of HRD in prostate cancer cell lines.
To further assess HRD by indication, we aligned cell lines with HRDsum scores to transcriptionally similar tumor samples using Celligner (45). HRDsum-high cell lines predominantly clustered with tumor samples of the same cancer type, with several cell lines aligning with breast and ovarian tumors, followed by lung and esophageal tumors (Supplementary Fig. S5A). As CHORD provides subtype-level resolution for HRD, we also mapped CHORD predictions to tumor types to see how the distribution of BRCA1-type HRD compared with that of BRCA2-type HRD. Interestingly, BRCA1-type breast cancer lines all clustered with the basal subtype, whereas BRCA2-type breast cancer lines represented a variety of subtypes (Supplementary Fig. S5B). This observation aligns with previous studies showing that patient tumors with germline mutations in BRCA1, but not BRCA2, express a basal-like phenotype (46, 47). Overall, these results suggest that the distribution of HRD across different cancer types and subtypes in cell lines is representative of that in patient samples.
HRDsum scores associated with p53 loss, microsatellite stability, and chromosome 3q26 copy number
In patient samples, HRD exhibits a positive correlation with TP53 inactivation and an inverse correlation with MSI (21–23, 44, 48). To test whether these associations are conserved in cell lines, we first compared HRDsum scores with the functional status of p53 using previously determined predictions based on nutlin-3 sensitivity as well as a p53 target gene expression signature (30). Pan-cancer, HRDsum scores were significantly higher in p53-deficient cell lines than in p53-proficient cell lines (Fig. 5A). At the level of individual tissue types, HRDsum scores associated significantly with TP53 status in breast and esophagus/stomach cancer cell lines (Supplementary Fig. S6A). Next, we compared HRDsum scores between MSI cell lines and microsatellite-stable cell lines using MSI classifications based on the frequency of deletions within microsatellite regions (30). As is the case in patient samples, HRDsum scores were markedly low in MSI cell lines compared with HRDsum scores in microsatellite-stable cell lines (Fig. 5B). We then performed a correlative analysis to look for associations between HRDsum scores and various genomic features in an unbiased manner. The most significant result was a positive correlation with the copy number on chromosome 3q26.2, which peaked at the ACTRT3 locus (Fig. 5C). ACTRT3 expression levels also correlated positively with HRDsum scores (Supplementary Fig. S6B). To the best of our knowledge, there is no known biological association between HR and ACTRT3 or any of the neighboring hits from the correlative analysis. However, given that telomeric allelic imbalance makes up part of the HRDsum score, it is interesting to note that ACTRT3 is adjacent to TERC, which encodes the RNA component of telomerase. TERC was not included in the original copy-number analysis, which focused on protein-coding genes. We obtained absolute copy-number calls for TERC from the DepMap portal (https://depmap.org) and found that indeed, the TERC copy number associated with HRDsum scores (Supplementary Fig. S6C). Amplification of 3q26 is characteristic of a class of tumors enriched for TP53 mutations and copy-number changes and was previously shown to associate with HRDsum scores in serous endometrial cancer (49, 50). Altogether, these observations suggest that key associations with HRDsum scores translate from patient samples to cell lines.
Associations between HRD classifications and genetic dependencies
To gain new insight into HRD and its therapeutic potential, we took advantage of key datasets that are available exclusively for cell lines. One such dataset is the DepMap, which identifies genetic cancer dependencies for more than 1,000 cell lines (51, 52). We used this dataset to determine whether HRD cell lines are particularly vulnerable (or resistant) to certain genetic deficiencies by binning cell lines based on HRDsum scores and then testing for differences in genetic dependency scores. The top two associations were dependency scores for POT1 and TINF2, which encode for two different subunits of the telomere-capping shelterin complex. In particular, cell lines with high HRDsum scores were less sensitive to POT1 and TINF2 inactivation than cell lines with low HRDsum scores (Fig. 5D; Supplementary Fig. S6D). To validate the observed association between POT1 and TINF2 dependency and low HRDsum scores, we analyzed genetic screen data in isogenic BRCA1/2 mutant and WT cell lines (34) and observed that POT1 knockout provides a relative growth advantage in BRCA1-deficient and BRCA2-deficient cell lines relative to their WT counterparts (Supplementary Fig. S6E). However, TINF2 did not score highly in the same screens and is also annotated as a common essential gene by DepMap, suggesting that the association of TINF2 dependency with low HRDsum scores is likely an artifact and not of biological relevance.
Synthetic lethal interactions are known to show incomplete penetrance because of their dependence on genetic backgrounds or tumor physiology and heterogeneity (53, 54). To test for context-dependent associations between HRDsum scores and dependency scores, we performed a similar analysis as above but with only breast and ovarian tissue types, which showed the highest frequency of HRD. One of the top lineage-specific associations was the oncoprotein CIP2A (cancerous inhibitor of protein phosphatase PP2A), for which the dependency score was most significant in cell lines with high HRDsum scores (Fig. 5E). This observation corroborates the dependence of HR-deficient cells on CIP2A to enter and successfully complete mitosis and is in line with the finding that loss of CIP2A is synthetic lethal with BRCA1/2 mutations (34, 55). CIP2A dependency was also associated with CHORD-HRD predictions (Supplementary Fig. S6F). We observed a similar pattern with dependency on CUL4B, which encodes part of an E3 ubiquitin ligase complex that participates in the DNA damage response as well as other processes (56, 57). Sensitivity to CUL4B inactivation seemed to be selective, with only a fraction of cell lines showing CUL4B dependency, and dependency was specific to the group of cell lines with the highest HRDsum scores (Fig. 5F). Thus, CUL4B may harbor potential as a therapeutic target in HRD tumors. More observations are needed to determine whether the association between HRD and CUL4B dependency is significant.
Associations between HRD classifications and drug sensitivities
HRD shows promise as a predictive biomarker for PARPi, as well as platinum-based chemotherapy drugs (58, 59). To explore this aspect of HRD in cell lines, we compared HRDsum scores with PARPi sensitivity, as determined by a clonogenic assay, for 94 cell lines across 19 different tissue types. Cell lines were categorized as either sensitive or insensitive to PARP inhibition based on cutoffs for both the IC50 and maximal activity values (see “Materials and Methods”). Overall, HRDsum scores were slightly higher in PARPi-sensitive cell lines than in PARPi-insensitive cell lines (Supplementary Fig. S7A). Looking at specific tissue types, this effect was driven largely by breast cancer lines, which showed a significant difference in HRDsum scores between response groups (Fig. 6A). Ovarian cancer cell lines, in contrast, showed a wide range of HRDsum scores within the PARPi-sensitive group (Fig. 6A). Outside of breast and ovarian cell lines, there was no difference in HRDsum scores between response groups for cell lines in the remaining 17 tissue types (Fig. 6A). Thus, HRDsum scores were associated with PARPi sensitivity in breast cancer cell lines but not pan-cancer. We observed a similar, albeit not significant, trend with independent datasets, using either CHORD probabilities for HRD predictions or the PRISM Repurposing dataset for PARPi sensitivity (Supplementary Fig. S7B and S7C; ref. 35). For platinum-based drugs, there were no associations between HRDsum scores and viability measurements available from the PRISM Repurposing dataset (Fig. 6B; Supplementary Fig. S7D).
In addition to testing for previously identified associations, we used the PRISM Repurposing dataset to identify potentially novel associations between HRD predictions and drug sensitivity (35). Interestingly, various MDM2 inhibitors (MDM2i) showed many of the strongest correlations between HRDsum scores and drug response (Fig. 6B). Cell lines sensitive to milademetan and other MDM2i showed relatively low HRDsum scores, indicating that MDM2i-sensitive lines tend to be HR-proficient (Fig. 6B and C). In line with this observation, cell lines showing a dependency on the MDM2 gene showed relatively low HRDsum scores (Fig. 6D). MDM2 negatively regulates p53, and the primary mechanism of action for MDM2i involves activation of the p53 pathway (60). Given that response to MDM2i depends on functional p53 and that HRD cell lines are typically p53-deficient, we reasoned that TP53 loss underlies the resistance of HRD cell lines to MDM2i. Consistent with this idea, HRD was mutually exclusive with dependence on MDM4 and PPM1D, which, like MDM2, also negatively regulate p53 (Supplementary Fig. S8A; ref. 61). In addition to MDM2i, HRDsum scores were also associated with resistance to CDK inhibitors, particularly those targeting CDK4 and CDK6 (CDK4/6i; Fig. 6B; Supplementary Fig. S8B). This association may also be explained by p53 loss in HRD cell lines, as there is evidence to suggest that response to CDK4/6 inhibition is p53-dependent (62, 63). On the other end of the spectrum, significant negative correlates were predominantly EGFR/HER2 inhibitors (Fig. 6B). However, these correlations were relatively weak, and cell lines with pathogenic EGFR mutations did not have particularly high HRDsum scores, which one would expect if HRD were predictive of sensitivity to EGFR inhibition (Supplementary Fig. S8C). In conclusion, a survey of PRISM drug responses revealed that HRD associated with resistance to MDM2i and, to a lesser extent, CDK4/6i and did not associate well with sensitivity to any drugs. Similar trends were observed with drug responses available from the GDSC2 dataset (Supplementary Fig. S8D; ref. 36).
Discussion
Multiple methods based on mutational scars have been developed and used to predict HRD for large cohorts of patient samples. However, equivalent datasets for cell lines have been lacking. In this study, we determined HRD predictions for more than 1,400 cell lines using both a traditional method based on HRDsum scores, as well as the more recently developed classifier CHORD. Given that HRDsum scores and CHORD were designed to analyze mutational patterns in patient tumor samples, it was initially unclear how well these methods would perform on cell lines for multiple reasons. First, the mutational profiles of cell lines can differ from tumors, as cell lines are less heterogeneous and tend to harbor more mutations overall (64). Second, HRDsum scores and CHORD rely specifically on somatic mutations, which are less straightforward to detect in cell lines because most cancer cell lines lack matching normal cell lines. For HRDsum scores, which can be determined from SNP arrays or WES, somatic mutations can be inferred by using a pool of nonmatching normal cell lines as a reference. CHORD, however, requires WGS, which is not readily available for many normal cell lines. To circumvent this issue, we applied a series of filters designed to remove common variants that are most likely germline or artifacts of PCR amplification or sequencing. Despite the aforementioned differences between cell lines and patient samples, HRDsum scores and CHORD were able to discriminate BRCA1/2-deficient cell lines from BRCA1/2-proficient cell lines. Furthermore, HRD predictions in cell lines showed several similarities to HRD in patient samples, including an association with p53 loss and enrichment in breast and ovarian cancer types. Overall, these results suggest that genomic scar-based methods may be utilized for predicting HRD in cell lines and that cancer cell lines are representative of tumors with respect to key HRD characteristics. As WGS becomes more available, this adaptation of CHORD for cell lines can be readily applied to additional samples.
HRDsum scores and CHORD probabilities agreed well overall, particularly for BRCA1/2-deficient cell lines. Some cell lines, however, were classified as HRD in only one of the two datasets. One possible explanation for the differences observed between the two datasets is that CHORD and HRDsum scores each capture different nuances in the DNA repair landscape, which can vary depending on the genetic or epigenetic background of a tumor or cell line. The following points support this idea. First, CHORD and HRDsum scores detect different types of mutations: the top predictors for the CHORD model are small deletions with flanking microhomology and duplications, whereas HRDsum scores are based on large-scale changes in allelic content and copy number. Second, given that the loss of HR factors can result in different types of DNA damage that can be repaired by a variety of pathways, different cell lines may acquire different types of mutational scars depending on the nature of the HRD and/or the functional status of compensatory repair pathways (1). Thus, HRD is a complex phenotype in terms of mutational consequences, and different HRD detection methods are likely to detect different subtleties of this phenotype.
Compared with other HRD detection methods, genomic scar–based methods such as CHORD and HRDsum scores have many advantages as well as limitations. As HRD-induced scars are a relatively permanent phenotype that will persist even if HR is restored, these scars may not always represent the current status of HR. This was indeed the case for the BRCA2-revertant cell line HCC1428, which was classified as HRD by both CHORD and the HRDsum method. However, because genomic scars are a downstream consequence of HRD, scar-based methods have the major advantage of being able to detect etiologies of HRD that extend beyond the loss of BRCA1 and BRCA2. Interestingly, the two FANCC-deficient cell lines included in our study were classified as HRD by either CHORD or HRDsum scores. FANCC is part of the Fanconi anemia core complex, which has been shown to promote HR (65). Therefore, FANCC loss potentially explains the HRD phenotype of a subset of cell lines. Additional FANCC-mutant cell lines will need to be assessed to determine whether FANCC loss significantly associates with HRD. Probable explanations for an HRD phenotype remain undetermined for some cell lines such as CAOV3, which scored high for HRD in both the CHORD and HRDsum datasets but does not show biallelic loss of any DNA damage repair–related genes other than TP53 and retains the ability to form RAD51 foci. It is possible that certain monoallelic mutations or epigenetic alterations, either alone or in combination, underlie HRD in such cell lines.
In addition to validating previously observed associations between HRD predictions and various features, cell line–specific datasets such as the DepMap dependency scores enabled us to gain new insights into HRD. We found, for instance, that HRD negatively associated with genetic dependency on POT1, a subunit of the shelterin complex. POT1 was recently shown to preserve the stability of telomeres by protecting them from unwanted HR activity (66). Thus, POT1 inactivation is more detrimental when HR is intact, which could explain why HR-proficient cell lines were more sensitive to loss of POT1. We also found that CUL4B dependency was specific to HRD cell lines in breast and ovarian cancer types. Loss of CUL4B sensitizes cells to DNA-damaging agents and is thought to accelerate apoptosis in response to DNA damage (57). It would therefore be interesting to further explore whether CUL4B loss is synthetic lethal with HRD. Given that CUL4B is amenable to small molecule inhibition and seems to be selectively essential in HRD cell lines, CUL4B could serve as an attractive therapeutic target.
HRD is actively being explored as a predictive biomarker for sensitivity to PARP inhibition, and two different genomic scar-based HRD assays, Myriad MyChoice CDx and Foundation Medicine % LOH, have been approved as diagnostics for PARPi therapy (7). In clinical studies including PAOLA-1, ARIEL2, and ARIEL3, HRD patient groups showed a greater benefit from PARPi treatment than HR-proficient patient groups (67–69). However, current HRD assays have yet to consistently discriminate responders from nonresponders, suggesting that HRD-associated genomic scars are an imperfect biomarker of PARPi sensitivity (70, 71). In cell lines, we found that HRD predictions associated with PARPi sensitivity in breast cancer, but not other cancer types. Lack of a pan-cancer association between HRD and PARPi sensitivity was also reported in a recent study of CCLE cell lines (72). More work needs to be done to fully understand the relationship between HRD and PARPi response, and we hope that the resources provided in this study facilitate this effort.
We envision a wide range of future applications for the datasets generated by this work. In this study, we performed univariate association analyses to provide an initial characterization of HRD in cell lines. However, multivariate analyses could yield additional insights and tools. For instance, the HRDsum scores could be used to build a gene expression-based classifier of HRD, which would enable the prediction of HRD in new cell lines in which only RNA sequencing data are available. We also anticipate that the HRD predictions generated in this study will aid in the selection of cell lines for experiments relating to DNA repair. For all future studies based on scar-based predictions, it is important to keep in mind that genomic scars do not necessarily reflect the current status of HR, and in certain cases, it may be prudent to validate HR status with functional assays such as RAD51 foci quantification or DR-GFP reporters. Potential applications of this work extend beyond the study of HRD, as well, as we also provide a table of WGS-based mutation contexts, or counts of different mutation types, for more than 300 cell lines. Only a subset of these mutation types was utilized by CHORD to predict HRD, and it may be worth exploring other mutation types or combinations thereof to determine their potential to predict other types of DNA repair deficiencies, for example. In conclusion, further exploration of the datasets from this study is warranted and should advance our understanding of DNA repair deficiencies in cancer.
Authors’ Disclosures
S. Shenker reports a patent for US-20230277533-A1 pending. C. Middleton reports nonfinancial support from Sapio Sciences outside the submitted work, as well as employment with KSQ Therapeutics and receiving stock options as a condition of employment. A.A. Wylie reports personal fees from KSQ Therapeutics during the conduct of the study. No disclosures were reported by the other authors.
Authors’ Contributions
A.E. Dodson: Conceptualization, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. S. Shenker: Conceptualization, software, formal analysis. P. Sullivan: Investigation, methodology, writing–original draft. S.U. Nayak: Conceptualization, formal analysis, investigation, visualization, methodology. C. Middleton: Data curation, software. M. McGuire: Software. E. Chipumuro: Investigation, methodology. Y. Mishina: Investigation, methodology. E.R. Tobin: Resources, investigation, methodology. L. Cadzow: Resources, investigation, methodology. A.A. Wylie: Conceptualization, resources, investigation. D. Sangurdekar: Conceptualization, resources, formal analysis, supervision, methodology, writing–original draft, project administration, writing–review and editing.
Acknowledgments
We would like to thank the contract research organization ChemPartner for supporting this work. All studies were funded by KSQ Therapeutics.
Note: Supplementary data for this article are available at Cancer Research Communications Online (https://aacrjournals.org/cancerrescommun/).