Abstract
The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest. By examining single-guide RNAs that map to multiple genomic sites, we found that this cell response to CRISPR/Cas9 editing correlated strongly with the number of target loci. These observations indicate that genome targeting by CRISPR/Cas9 elicits a gene-independent antiproliferative cell response. This effect has important practical implications for the interpretation of CRISPR/Cas9 screening data and confounds the use of this technology for the identification of essential genes in amplified regions.
Significance: We found that the number of CRISPR/Cas9-induced DNA breaks dictates a gene-independent antiproliferative response in cells. These observations have practical implications for using CRISPR/Cas9 to interrogate cancer gene function and illustrate that cancer cells are highly sensitive to site-specific DNA damage, which may provide a path to novel therapeutic strategies. Cancer Discov; 6(8); 914–29. ©2016 AACR.
See related commentary by Sheel and Xue, p. 824.
See related article by Munoz et al., p. 900.
This article is highlighted in the In This Issue feature, p. 803.
Introduction
Genome engineering using site-specific DNA endonucleases has operationalized functional somatic cell genetics, enabling precise perturbation of both coding and noncoding regions of the genome in cells from a range of different organisms. Zinc-finger nucleases (ZFN) and transcription activator-like effector nucleases (TALEN) are custom-designed endonucleases that enable site-specific genome editing, but their widespread application has been limited by reagent complexity and cost (1, 2). The bacterial CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated 9) system, which functions as an adaptive immune mechanism, has been shown to be a versatile and highly effective technology for genome editing (3–8). CRISPR/Cas9 applications require introduction of two fundamental components into cells: (i) the RNA-guided CRISPR-associated Cas9 nuclease derived from Streptococcus pyogenes and (ii) a single-guide RNA (sgRNA) that directs the Cas9 nuclease through complementarity with specific regions of the genome (3, 7–11).
Genome editing occurs through induction of double-stranded breaks in DNA by the Cas9 endonuclease in an sgRNA-directed sequence-specific manner. These DNA breaks can be repaired by one of two mechanisms: nonhomologous end joining (NHEJ) or homology-directed repair (HDR; refs. 3, 12). CRISPR/Cas9-mediated gene knockout results from a DNA break being repaired in an error-prone manner through NHEJ and introduction of an insertion/deletion (indel) mutation with subsequent disruption of the translational reading frame (11). Alternatively, HDR-mediated repair in the presence of an exogenously supplied nucleotide template can be utilized to generate specific point mutations or other precise sequence alterations. Furthermore, nuclease-dead versions of Cas9 (dCas9) can also be fused to transcriptional activator or repressor domains to modulate gene expression at specific sites in the genome (13–17). CRISPR/Cas9 technology has been effectively utilized in cultured cells from a myriad of organisms (12) and has also been successfully used for in vivo modeling in the mouse germline (18, 19) as well as for somatic gene editing to generate novel mouse models of cancer (20–24).
Recent studies have shown that CRISPR/Cas9 can be effectively used for loss-of-function genome-scale screening in human and mouse cells (9–11, 25–28). These approaches rely upon lentiviral delivery of the gene encoding the Cas9 nuclease and sgRNAs targeting annotated human or mouse genes. Multiple different CRISPR/Cas9 knockout screening libraries have been developed, including both single-vector (Cas9 and the sgRNA on the same vector) and dual-vector systems (9, 25, 29). Pooled CRISPR/Cas9 screening is typically performed through massively parallel introduction of sgRNAs targeting all genes into Cas9-expressing cells, with a single sgRNA per cell. Positive- or negative-selection proliferation screens are performed, and sgRNA enrichment or depletion is measured by next-generation sequencing (9, 10).
To date, only a limited number of genome-scale CRISPR/Cas9 knockout screens have been reported, and these screens have demonstrated a high rate of target gene validation (9–11, 25–28). Wang and colleagues recently reported an analysis of cell-essential genes using CRISPR/Cas9-mediated loss-of-function screens in four leukemia and lymphoma cell lines (28). Hart and colleagues also reported identification of core and cell line–specific essential genes in five cancer cell lines of differing lineages (25). This approach has enabled the identification of known oncogene dependencies as well as many novel essential genes and pathways in individual cancer cell lines (25, 28). In addition to knockout screens, proof-of-concept CRISPR-activator or CRISPR-inhibitor screens using dCas9 and genome-scale sgRNA libraries have also been successfully conducted (30, 31). Moreover, in vivo genome-scale screens with CRISPR/Cas9 have also been performed for cancer-relevant phenotypes (32).
To identify cancer cell vulnerabilities in a genotype- and phenotype-specific manner, we performed genome-scale loss-of-function genetic screens in 33 cancer cell lines representing a diversity of cancer types and genetic contexts of both adult and pediatric lineages (Supplementary Table S1; ref. 29). When we analyzed essential genes across the entire dataset, we unexpectedly found a robust correlation between apparent gene essentiality and genomic copy number (CN), where the number of CRISPR/Cas9-induced DNA cuts predicts the cellular response to genome editing.
Results
High-Resolution CRISPR/Cas9 Screening in Cancer Cell Lines for Gene Dependencies
Using the dual-vector GeCKOv2 CRISPR/Cas9 system, we performed genome-scale pooled screening in 33 cancer cell lines representing a wide diversity of adult and pediatric cancer types (Supplementary Table S1; Fig. 1A). Cancer cell lines were transduced with a lentiviral vector expressing the Cas9 nuclease under blasticidin selection. These stable cell lines were then infected in replicate (n = 3 or 4) at low multiplicity of infection (MOI < 1) with a library of 123,411 unique sgRNAs targeting 19,050 genes (6 sgRNAs per gene), 1,864 miRNAs and 1,000 nontargeting negative control sgRNAs (29). Infected cells were purified by selection with puromycin and then passaged with an average representation of 500 cells per sgRNA until an endpoint of 21 or 28 days. At the endpoint, the abundance of sgRNAs in these cells was quantitated from genomic DNA by massively parallel sequencing and compared with the abundance in the plasmid pool used for virus production to define the relative dropout or enrichment in the screen (Fig. 1A).
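The representation arithmetic described above can be illustrated with a short sketch. The library size and the 500 cells per sgRNA come from the text. The function names, the Poisson model of viral integration, and the example MOI of 0.3 are assumptions introduced only for illustration and are not taken from the Methods.

```python
import math

LIBRARY_SIZE = 123_411   # unique sgRNAs in the library (from the text)
CELLS_PER_SGRNA = 500    # average representation maintained at each passage (from the text)


def cells_to_maintain(library_size, cells_per_sgrna):
    """Minimum number of cells to carry forward to keep the stated representation."""
    return library_size * cells_per_sgrna


def poisson_infection(moi):
    """Assumed Poisson model of lentiviral integration (not stated in the text).

    Returns the fraction of cells infected at least once and, among those
    infected cells, the fraction carrying exactly one integrant.
    """
    infected = 1.0 - math.exp(-moi)
    exactly_one = moi * math.exp(-moi)
    return infected, exactly_one / infected


print(cells_to_maintain(LIBRARY_SIZE, CELLS_PER_SGRNA))
infected, single = poisson_infection(0.3)  # 0.3 is an illustrative MOI, not a reported value
print(round(infected, 3), round(single, 3))
```

Under this assumed model, lowering the MOI increases the share of infected cells that carry a single sgRNA, at the cost of needing more starting cells to reach the same representation after selection.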
The log2-normalized read counts of the 1,000 nontargeting sgRNAs show a slight enrichment in representation from the original plasmid DNA pool, indicating that on average nontargeting guides have no substantial detrimental effect on viability (Fig. 1B). As positive controls, we also compiled a list of 213 putative cell-essential genes that are part of the ribosome, proteasome or spliceosome complexes (Supplementary Table S2). In contrast to the nontargeting negative control guides, the read counts of these positive controls in late time point samples were substantially depleted compared with the initial reference pool (Fig. 1B). Replicate reproducibility after quality control for each cell line was consistently high (Fig. 1C).
We defined a CRISPR/Cas9 guide score for each sgRNA in the screen by first calculating the log2 fold change in abundance from the screen endpoint compared with the pool of plasmid DNA, followed by subtraction of the median scores of the negative control sgRNAs (see Methods). Hence, in our dataset a guide score of zero equates to the median effect of negative control sgRNAs. Similarly, the second most depleted sgRNA for each gene was used to call a single “second-best” CRISPR/Cas9 guide score and therefore allow the representation of gene-level dependencies (33). Significant depletions of sgRNAs are denoted by negative CRISPR/Cas9 guide scores and correspond to decreased proliferation/survival after CRISPR/Cas9-mediated gene editing.
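The scoring scheme described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the dictionary-based inputs, the pseudocount, and the assumption that read counts are already normalized for sequencing depth are all hypothetical choices.

```python
import math
from statistics import median


def guide_scores(endpoint_counts, plasmid_counts, control_ids, pseudocount=1.0):
    """Log2 fold change (endpoint vs. plasmid pool) minus the median control value.

    Counts are assumed to be depth-normalized already; the pseudocount is an
    illustrative guard against zero counts, not a parameter from the paper.
    """
    lfc = {
        guide: math.log2(
            (endpoint_counts[guide] + pseudocount) / (plasmid_counts[guide] + pseudocount)
        )
        for guide in plasmid_counts
    }
    control_median = median(lfc[guide] for guide in control_ids)
    return {guide: value - control_median for guide, value in lfc.items()}


def second_best_scores(scores, gene_to_guides):
    """Gene-level score: the second most depleted (second lowest) guide score."""
    result = {}
    for gene, guides in gene_to_guides.items():
        ranked = sorted(scores[g] for g in guides)
        if len(ranked) >= 2:  # genes with fewer than two guides are skipped
            result[gene] = ranked[1]
    return result


# Toy data: three nontargeting controls and three guides against a hypothetical gene A.
plasmid = {"ctrl1": 100, "ctrl2": 100, "ctrl3": 100, "A_1": 100, "A_2": 100, "A_3": 100}
endpoint = {"ctrl1": 100, "ctrl2": 100, "ctrl3": 100, "A_1": 25, "A_2": 50, "A_3": 100}
scores = guide_scores(endpoint, plasmid, ["ctrl1", "ctrl2", "ctrl3"], pseudocount=0.0)
print(second_best_scores(scores, {"A": ["A_1", "A_2", "A_3"]}))
```

Taking the second most depleted guide rather than the most depleted one makes the gene-level call less sensitive to a single outlier sgRNA.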
To identify genes essential for viability in each cell line across a variety of cancer contexts, we rank-ordered genes by second-best CRISPR/Cas9 guide score from most negative (most depleted) to positive (not depleted or enriched). For each cell line, we identified key vulnerabilities corresponding to both oncogenic driver lesions as well as nononcogene dependencies (Fig. 1D–F). For instance, we observed that KRAS, ESR1, and EGFR were essential genes in KRAS-mutant (Fig. 1D), estrogen receptor–positive (Fig. 1E), and EGFR-mutant cell lines (Fig. 1F), respectively. Moreover, we observed strong dependency on a number of other cancer-relevant genes and therapeutic targets in each cell line, including BRD4, MTOR, IGF1R, CCND1, and MYC (Fig. 1D–F). Thus, our approach to CRISPR/Cas9 screening yields high-quality reproducible data that enable the identification of cancer gene dependencies across many different cellular contexts.
Genomic Copy-Number Variation Predicts the Response to CRISPR/Cas9 Genome Targeting Independent of Target Gene Expression
Copy-number alterations (CNA) are the most common genetic alterations in human epithelial cancers (34) and lead to overexpression of driver oncogenes in cancer. To identify such driver oncogenes responsible for cancer cell proliferation and survival within regions of copy-number amplification, we mapped sgRNAs in CRISPR/Cas9 screens of each cell line to genomic coordinates and investigated the relationship of apparent gene essentiality with ABSOLUTE DNA CN data available from the Broad Institute–Novartis Cancer Cell Line Encyclopedia (CCLE; Methods; Supplementary Table S1; refs. 35, 36). We observed a striking enrichment of negative CRISPR/Cas9 guide scores for genes that reside in CN amplifications in several cancer cell lines (Fig. 2A and B; Supplementary Fig. S1A–S1C). Specifically, CRISPR/Cas9 targeting of genes that reside in amplifications conferred decreased proliferation/survival as compared with targeting genes that mapped outside of these amplifications. As expected, we found that known oncogenes, such as AKT2, MYC, or CDK4, scored as essential in cell lines that harbored amplifications involving these genes. However, we also noted that sgRNAs targeting other genes in these same amplified regions appeared similarly detrimental to cell proliferation or survival (Fig. 2A and B; Supplementary Fig. S1A–S1C).
When we compared these observations with those derived from genome-scale RNA interference (RNAi) screens performed in the same cell lines (37), we failed to observe enrichment of apparently essential genes within amplifications and instead identified a small number of genes in each region of CN gain that scored as essential (Fig. 2A and B; Supplementary Fig. S1B and S1C). Moreover, we found that sensitivity to CRISPR/Cas9 targeting within amplified genomic regions was also observed for genes that failed to show significant mRNA expression (Fig. 2C and D; Supplementary Fig. S1D–S1F). These observations suggested that the observed dependency of cancer cells to CRISPR/Cas9 targeting of genes resident in amplifications was not the direct consequence of deleting the target gene.
We next sought to determine if this “CRISPR–CN relationship” also extends to lower levels of CNAs. For all 33 cancer cell lines screened, we defined genomic segments by their CN and labeled those segments with their median CRISPR/Cas9 guide score across all sgRNAs targeting within the segment (Fig. 2E and F; Supplementary Fig. S2). We found a striking correlation between CN and median CRISPR/Cas9 guide score across even low ranges of CNAs. The 1,000 “negative control” sgRNAs in the CRISPR/Cas9 library exhibited minimal effects on cell proliferation and viability, and the majority of other data points had lower CRISPR/Cas9 guide scores than the median of these negative controls (Fig. 2E and F; Supplementary Fig. S2). Strikingly, targeting a locus with an ABSOLUTE CN of 1, which corresponds to a single CRISPR/Cas9-induced DNA cut, also resulted in reduced proliferation/viability in comparison with the negative controls (Fig. 2E–F; Supplementary Fig. S2), indicating that even a discrete instance of CRISPR/Cas9 genome modification significantly affects cell proliferation/viability. For each incremental increase in DNA CN, we observed a progressive decrease in CRISPR/Cas9 guide scores in nearly all of the cell lines that we screened (Fig. 2E–F; Supplementary Fig. S2). Moreover, we observed this CRISPR–CN correlation among both low-level CN gains (e.g., 1–2 extra copies) and high-level amplifications, and both focal and arm-level CNAs (Fig. 2; Supplementary Figs. S1 and S2).
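The segment-level summary described above can be sketched in a few lines. The input layout (one tuple per sgRNA carrying a segment identifier, the segment CN, and the guide score) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def segment_medians(guides):
    """Summarize each copy-number segment by its median guide score.

    guides: iterable of (segment_id, copy_number, guide_score), one per sgRNA.
    Returns {segment_id: (copy_number, median_guide_score)}.
    """
    scores = defaultdict(list)
    copy_number = {}
    for segment_id, cn, score in guides:
        scores[segment_id].append(score)
        copy_number[segment_id] = cn
    return {seg: (copy_number[seg], median(vals)) for seg, vals in scores.items()}


# Toy example with two hypothetical segments.
toy = [
    ("seg1", 2, -0.1), ("seg1", 2, -0.3), ("seg1", 2, -0.2),
    ("seg2", 6, -1.0), ("seg2", 6, -1.4),
]
print(segment_medians(toy))
```

Plotting the resulting median scores against CN for each cell line gives the kind of view summarized in Fig. 2E and F.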
Amplified Genes Rank among the Top Dependencies in Genome-Scale Negative-Selection CRISPR/Cas9 Screens
Given the profound impact of genomic CN on apparent gene essentiality in CRISPR/Cas9 screens as well as the high rate of CNAs in cancer cells, we reasoned that this effect could result in a high false-positive rate for the identification of essential genes. To characterize the impact of these false positives on CRISPR/Cas9 screening data, we compared the apparent essentiality of amplified genes with that of all other genes within each of the 33 cell lines. Specifically, we examined CRISPR guide scores for all genes and observed that genes residing in focal high-level amplifications consistently rank among the most highly essential genes identified for each cell line (Fig. 3A and B; Supplementary Fig. S3A–S3C).
We then performed an aggregate analysis of apparent essentiality due to amplified genes across the entire CRISPR/Cas9 dataset. For this analysis of all genes and all cell lines, we accounted for differences in Cas9 activity/efficacy across cell lines using cell line–specific z-score normalization (see Methods). To investigate relative gene dependencies within the dataset, we calculated composite CRISPR/Cas9 gene scores using the ATARiS algorithm, as previously described (Methods; ref. 38). We next calculated a global z-score for gene dependency values, representing the number of standard deviations from the mean of the distribution. In parallel, we performed a similar analysis of an available RNAi dataset (Fig. 3C). Thus, this analysis enables a global examination of apparent relative gene dependencies and their relationship to genomic CN amplification. Strikingly, we observed that increasingly essential genes (lower z-scores) were more likely to reside on CN amplifications in CRISPR/Cas9 data but not in RNAi data (Fig. 3C). For genes with a z-score of less than or equal to −5, 28.2% (87 of 308) of those genes reside within a CN amplification, defined as a CN ratio (ABSOLUTE/average sample ploidy) greater than two. Thus, CN amplification is a strong determinant of apparent essentiality in CRISPR/Cas9 screening data, and if not properly accounted for, this CRISPR–CN relationship will likely contribute to a higher false-positive rate for calling gene dependency. When we inspected results from another recently published study that screened five human cancer cell lines with a different CRISPR/Cas9 library (25), we found that gene CN also predicted essentiality (Supplementary Fig. S4A–S4E), thus indicating that the CRISPR–CN correlation occurs independently of the specific sgRNA library used.
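The two definitions used in this analysis, a z-score and an amplification call based on a CN ratio greater than two, can be illustrated with a brief sketch. The function names and input layout are hypothetical, the use of the population standard deviation is an assumption, and the ATARiS step is not reproduced here.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standard deviations from the mean (population SD is an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def is_amplified(absolute_cn, ploidy, threshold=2.0):
    """Amplification as defined in the text: CN ratio (ABSOLUTE CN / ploidy) > 2."""
    return absolute_cn / ploidy > threshold


def amplified_fraction(genes, z_cutoff=-5.0):
    """genes: iterable of (z_score, absolute_cn, ploidy) per gene and cell line.

    Returns (amplified hits, total hits, fraction) for genes with z <= z_cutoff;
    the fraction is None when no gene passes the cutoff.
    """
    hits = [g for g in genes if g[0] <= z_cutoff]
    amplified = [g for g in hits if is_amplified(g[1], g[2])]
    fraction = len(amplified) / len(hits) if hits else None
    return len(amplified), len(hits), fraction


# The proportion reported in the text: 87 of 308 genes.
print(round(100 * 87 / 308, 1))
```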
The CRISPR–CN Relationship Is Observed across Multiple Different Chromosome Structural Alterations
To investigate the CRISPR–CN relationship across a spectrum of different chromosomal structural alterations, we performed whole-genome sequencing (WGS) on three cell lines harboring CN gains and amplifications and showing a strong correlation between CN and CRISPR/Cas9 guide scores (HT29, CAL120, and PANC-1). We observed the CRISPR–CN relationship in the context of several different structural amplification patterns, including near arm-level CN gain (Supplementary Fig. S5A), simple tandem duplication (Supplementary Fig. S5A), low-level copy gain from interchromosomal translocation (Supplementary Fig. S5B), and complex amplicon structure involving breakage–fusion–bridge cycles and chromothripsis (Supplementary Fig. S6). These observations suggest that the CRISPR–CN correlation occurs at both low- and high-amplitude CN changes and does not relate to specific types of chromosomal structural variation.
The Response of Cells to CRISPR/Cas9 Genome Targeting Correlates with the Total Predicted Number of DNA Cuts at Target Loci
We have demonstrated that there is a gene-independent antiproliferative effect of CRISPR/Cas9 targeting that occurs with even a single target locus, increases with increasing genomic CN, and is independent of the type of structural alteration that leads to increased CN. Thus, we hypothesized that this gene-independent response reflects the total number of CRISPR/Cas9-induced DNA cuts at target loci. The GeCKOv2 library contains 3,593 sgRNAs that have multiple perfect match alignments along with a protospacer-adjacent motif (PAM) sequence within the hg19 reference genome. We typically remove these sgRNAs prior to analyzing cancer cell line dependencies. However, these promiscuous sgRNAs provided an opportunity to perform a comparative analysis of the response of cells to CRISPR/Cas9 editing and the relationship to the predicted number of CRISPR/Cas9-induced DNA cuts based on either CN or number of perfect-match on- and off-target alignments (“multiple alignment analysis”). For the CN analysis, we used only sgRNAs mapping to a single genomic locus. For the multiple alignment analysis, we reintroduced these multitargeting sgRNAs and used only sgRNAs targeting nonamplified regions, thus allowing segregation of the impact of CRISPR/Cas9-induced DNA cuts due to CN or promiscuous multiple genome alignments.
We observed that sgRNAs that target multiple sites in the unamplified genome yield a strong antiproliferative effect, similar to that observed for sgRNAs targeting genomic amplifications (Fig. 4A–D). We found that the number of predicted DNA cuts correlated strongly with the observed depletion of sgRNAs, whether mediated by CN (Fig. 4A and C) or multiple alignments (Fig. 4B and D). To quantify this effect, we calculated the slope coefficient for a linear regression of CRISPR guide scores versus predicted number of cuts for both singly and multiply targeted sets of sgRNAs within each cell line. We term these coefficients the CRISPR-Cut Index (CCI) for single-targeting sgRNAs where the amount of cutting depends on copy number (CCI-CN; Fig. 4A and C) and for multiple-targeting sgRNAs where the amount of cutting depends on the number of multiple alignments (CCI-MA; Fig. 4B and D). We observed that the CCI-CN and the CCI-MA for each individual cell line are comparable, suggesting that the decreased proliferation/survival response of cells to increases in the number of loci targeted by CRISPR/Cas9 is similar, whether the number of target loci is driven by CNA of a single target locus or multiple different target loci within the genome (Fig. 4E).
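The CCI is described above as the slope of a linear regression of guide scores on the predicted number of cuts. A minimal sketch of that slope calculation by ordinary least squares follows; the function name and the toy values are illustrative assumptions, and any weighting or filtering applied in the actual analysis is not represented.

```python
def regression_slope(predicted_cuts, guide_scores):
    """Ordinary least-squares slope of guide score versus predicted number of cuts."""
    n = len(predicted_cuts)
    mean_x = sum(predicted_cuts) / n
    mean_y = sum(guide_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted_cuts)
    if sxx == 0:
        raise ValueError("predicted_cuts must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted_cuts, guide_scores))
    return sxy / sxx


# CCI-CN: x is the copy number at the single target locus of each sgRNA.
cci_cn = regression_slope([1, 2, 3, 4], [-0.1, -0.3, -0.5, -0.7])
# CCI-MA: x is the number of perfect-match alignments in nonamplified regions.
cci_ma = regression_slope([2, 3, 5, 8], [-0.4, -0.6, -1.0, -1.6])
print(cci_cn, cci_ma)
```

A more negative slope indicates a larger drop in guide score per additional predicted cut, so comparable CCI-CN and CCI-MA values within a cell line are what the comparison in Fig. 4E examines.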
We further investigated whether there was a difference in the cell response to CRISPR/Cas9-induced DNA cuts targeted to different chromosomes or multiple cuts within a single chromosome. Using the multiple alignment analysis described above, we further split multiple-targeting sgRNAs into sets that either targeted multiple chromosomes (interchromosomal) or targeted sites within only a single chromosome (intrachromosomal). We observed, on average, lower guide scores for sgRNAs targeting multiple interchromosomal loci as compared with sgRNAs targeting a comparable number of intrachromosomal loci (Fig. 5A and B; Supplementary Fig. S7). Moreover, the most promiscuous sgRNAs targeting more than 10 interchromosomal loci rank among the most depleted sgRNAs in pooled screening data for each cell line (Fig. 5C and D). Thus, the response of cancer cells to multiple CRISPR/Cas9-induced DNA cuts is greater when multiple loci are targeted across several chromosomes. Beyond the effects of target gene disruption, these observations further suggest that CRISPR/Cas9 gene editing also yields an antiproliferative response that is truly gene independent.
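The split of multiple-targeting sgRNAs into interchromosomal and intrachromosomal sets can be expressed as a small classification rule. The function name, the labels, and the representation of alignments as (chromosome, position) pairs are assumptions made for this sketch.

```python
def classify_multitarget(alignments):
    """Classify an sgRNA by where its perfect-match, PAM-adjacent sites fall.

    alignments: list of (chromosome, position) tuples for one sgRNA.
    """
    if len(alignments) < 2:
        return "single"
    chromosomes = {chrom for chrom, _ in alignments}
    return "interchromosomal" if len(chromosomes) > 1 else "intrachromosomal"


# Hypothetical coordinates, for illustration only.
print(classify_multitarget([("chr1", 1_000), ("chr1", 52_000)]))
print(classify_multitarget([("chr1", 1_000), ("chr7", 3_400), ("chr12", 900)]))
```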
Variation in Cell Response to CRISPR/Cas9 Targeting
Because CCI-CN and CCI-MA are correlated across cell lines, we next calculated a net index for each cell line by integrating the number of targeted sites and genomic CN to predict the total number of cuts for all sgRNAs. We observed a plateau in CRISPR guide scores beyond a certain number of cuts for each cell line, typically ranging from 10 to 50 cuts, suggesting an important limitation in the resolution of sgRNA depletion for sgRNAs targeting many genomic sites (Supplementary Fig. S8A and S8B). Informed by this observation, we fit a segmented least-squares model composed of a general linear regression below a breakpoint (estimated by the model) and a flat segment above this breakpoint. The slope coefficient of the first segment of the model is used as the net index (CCI-Total), reflecting the magnitude of the effect of cutting on CRISPR guide scores.
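A segmented model of this kind can be sketched as a line that rises or falls up to a breakpoint and is flat beyond it. The version below makes several assumptions that are not stated in the text: the two segments are taken to be continuous at the breakpoint, the breakpoint is chosen by a grid search over the observed cut counts, and the total predicted cuts for an sgRNA are taken to be the sum of the CN at each of its target sites. All function names are hypothetical.

```python
def total_predicted_cuts(site_copy_numbers):
    """Assumed integration of target sites and CN: sum the CN over all target sites."""
    return sum(site_copy_numbers)


def _fit_line(x, y):
    """Ordinary least squares; returns (intercept, slope) or None if x is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def fit_plateau_model(cuts, scores):
    """Fit score = a + b * min(cuts, breakpoint) by least squares.

    The breakpoint is selected from the observed cut counts as the value that
    minimizes the sum of squared errors; b plays the role of the CCI-Total.
    """
    best = None
    for breakpoint in sorted(set(cuts)):
        clipped = [min(c, breakpoint) for c in cuts]
        fit = _fit_line(clipped, scores)
        if fit is None:
            continue
        a, b = fit
        sse = sum((s - (a + b * c)) ** 2 for c, s in zip(clipped, scores))
        if best is None or sse < best[0]:
            best = (sse, breakpoint, a, b)
    if best is None:
        raise ValueError("need at least two distinct cut counts")
    _, breakpoint, a, b = best
    return {"breakpoint": breakpoint, "intercept": a, "cci_total": b}


# Toy data: scores fall by 0.2 per cut up to 4 cuts, then plateau.
toy_cuts = [1, 2, 3, 4, 5, 6, 8, 10]
toy_scores = [-0.2 * min(c, 4) for c in toy_cuts]
print(fit_plateau_model(toy_cuts, toy_scores))
```

Using only the slope below the breakpoint keeps sgRNAs that have already reached the floor of measurable depletion from flattening the estimate.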
The CCI-Total showed considerable variability across cell lines. Although the limited sample size of this CRISPR/Cas9 screening dataset restricts the power for a full multivariate analysis of the genetic and biological influences on the CCI-Total, we found two variables that affect this index. Investigating the median CRISPR/Cas9 guide score for “positive control” cell-essential genes as a surrogate for CRISPR/Cas9 efficacy in the screens, we identified a strong correlation of this metric with the CCI-Total (Supplementary Fig. S8C), suggesting that Cas9 efficacy influences the strength of the CRISPR–CN relationship. We also found that TP53 mutation status correlates with the CCI-Total (Supplementary Fig. S8D). Although both TP53 mutant and wild-type cells clearly demonstrate the CRISPR–CN relationship, wild-type cells on average show a more pronounced effect, suggesting that the p53 pathway may play a role in mediating the gene-independent response of cells to CRISPR/Cas9 targeting.
CRISPR/Cas9 Targeting of Amplified Regions Induces DNA Damage and a G2 Cell-Cycle Arrest
To interrogate a specific amplification example, we introduced sgRNAs targeting genes and intergenic regions inside and outside of the 19q13 amplicon in the PANC-1 pancreatic cancer cell line (Fig. 6A) and measured viable cell number in a short-term, arrayed format CellTiter-Glo luminescent assay (Fig. 6B; Supplementary Fig. S9A). We observed a significant reduction in cell proliferation for sgRNAs targeting loci inside the amplicon as compared with outside the amplicon at 6 days after expression of each sgRNA. We noted that the observed effect was equally strong for sgRNAs targeting both amplified genes and intergenic regions and was at least as potent as those sgRNAs targeting nonamplified known essential genes, such as RPL4, U2AF1, and MYC (Fig. 6B). Furthermore, we noted that CRISPR/Cas9 targeting of other loci that are not highly amplified resulted in decreased cell proliferation compared with LacZ and Luciferase targeting negative controls. In addition to interrogating sgRNAs targeting amplified regions, we also investigated the effect of two multitargeted sgRNAs on cell proliferation in this 6-day assay, including one sgRNA with multiple perfect match alignments as well as an sgRNA previously shown to target the genome at 151 different loci (Fig. 6B; ref. 39). Here, we also observed a potent reduction in cell proliferation with these multitargeted sgRNAs.
To investigate the mechanism of decreased cell proliferation observed with sgRNAs targeting amplified regions or multiple genomic loci, we utilized a high-content imaging assay to interrogate cell-cycle kinetics for multiple sgRNAs in parallel (40). At 48 hours after expression of these sgRNAs, we observed decreased incorporation of the modified thymidine analogue 5-ethynyl-2′-deoxyuridine (EdU), with diminished S-phase suggestive of decreased DNA synthesis (Fig. 6C). We also observed an accumulation of cells in the G2 phase of the cell cycle with sgRNAs targeting amplified regions or multiple genomic loci (Fig. 6C). Moreover, we observed an increased number of γ-H2AX foci in cells infected with these amplicon-targeting or multitargeted sgRNAs as compared with control sgRNAs, suggesting that increased DNA damage leads to a G2 cell-cycle arrest in these cells (Fig. 6C and D). Notably, we did not observe significant levels of apoptosis at this same time point by measuring cleaved PARP by immunoblotting (Supplementary Fig. S9B). We have performed similar experiments with the chromosome 12 amplicon in the CAL120 breast cancer cell line and confirmed that these observations are not restricted to the chromosome 19 amplicon in PANC-1 (Supplementary Fig. S10A–S10E).
Overall, these observations suggest that CRISPR/Cas9 genome targeting of amplified regions induces a potent early DNA-damage response and cell-cycle arrest that is proportional to the number of target loci. Notably, this antiproliferative effect is independent of targeting expressed protein coding genes and does not depend on target gene disruption and protein loss, which typically occurs on a longer time scale (10).
Increased Genomic CN of Cell-Essential Genes May Protect from Complete Gene Knockout
Although we found that an increased number of target loci for each sgRNA generally leads to increased gene-independent CRISPR/Cas9-mediated cytotoxicity, we reasoned that because CRISPR/Cas9 genome editing is often incomplete within a cell population, more copies of a target gene could also make a cell resistant to complete gene disruption and protein loss through CRISPR/Cas9 targeting of that locus. Therefore, we hypothesized that certain cell-essential gene sets may show the opposite correlation with DNA CN in pooled negative-selection screening. When we examined the CRISPR–CN correlation across all genes screened in all cell lines from the dataset, we first found an overall negative correlation, as expected. However, we also observed that cell-essential genes from the Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets for the proteasome, ribosome, and spliceosome exhibit a CRISPR–CN correlation significantly shifted in the positive direction relative to the rest of the genes in the genome, i.e., higher CN correlated with higher CRISPR gene scores and less observed gene essentiality (Fig. 7). These observations suggest that increased DNA CN for target genes with strong underlying essentiality may protect cells from complete CRISPR-induced knockout of these genes, and thus manifest as relatively less apparent essentiality compared with other essential genes in CN normal regions of the genome. Together, these data further highlight the importance of considering target gene CN and gene function in the interpretation of negative selection pooled screening data.
Discussion
Using data from the genome-scale interrogation of essential genes in 33 cancer cell lines by CRISPR/Cas9, we report that the number of CRISPR/Cas9-induced DNA cuts strongly influences the proliferation/survival response of cells to CRISPR/Cas9 genome editing in a gene-independent manner. We report that targeting sequences within CN amplifications with the CRISPR/Cas9 endonuclease induces decreased cell proliferation/viability that is independent of target gene expression or the structure of the targeted amplicon. The magnitude of the effect increases with the amplitude of CN amplification, and CRISPR/Cas9 targeting within high-level amplifications shows some of the most profound antiproliferative effects observed in the screens. Moreover, analysis of sgRNAs targeting multiple genomic sites also revealed a strong correlation of cell proliferation/viability with the number of predicted CRISPR/Cas9 DNA cuts. Thus, we propose that there are two types of responses to CRISPR/Cas9 targeting in cancer cell lines: (i) an early antiproliferative effect of CRISPR/Cas9-induced DNA cuts that increases with the number of cuts conferred by each sgRNA and that is independent of the target gene, and (ii) the gene essentiality resulting from CRISPR/Cas9-induced knockout of the target gene and subsequent loss of normal protein expression.
The mechanism of the early antiproliferative response to CRISPR/Cas9-mediated gene editing likely relates to induction of multiple double-strand DNA breaks and subsequent G2 cell-cycle arrest. Wang and colleagues also recently reported an analysis of cell-essential genes using CRISPR/Cas9-mediated loss-of-function screens in four leukemia and lymphoma cell lines (28). They found that CRISPR/Cas9-mediated targeting of several genes within the BCR–ABL amplification in the K562 leukemia cell line and JAK2 amplification in the HEL erythroleukemia cell line induced decreased cell viability associated with increased levels of phosphorylated histone H2AX, a marker of DNA damage. Hart and colleagues also recently reported that guide RNAs targeting greater than 20 sites appear similar to known essential genes (25). Here, we present a comprehensive global analysis of this CRISPR–CN correlation in a large and diverse array of cancer cell lines and demonstrate that this phenomenon is pervasive across many different genetic and phenotypic contexts. Moreover, we provide the first evidence that this CRISPR–CN correlation occurs across a wide range of CNAs and chromosome structures, including those with low-level CN gain. Importantly, we demonstrate that targeting sequences within regions of high-level CN gain induces some of the strongest viability phenotypes observed among all sgRNAs in the screen. Because this effect is not related to specific genes, these observations have important practical implications for utilizing CRISPR/Cas9 technology for cancer dependency profiling and for studying gene essentiality in general.
When we analyzed the effects on cell proliferation/viability induced by increased numbers of cuts, we noted that even a single CRISPR/Cas9-induced DNA cut resulted in decreased cell proliferation when compared with sgRNAs that do not target any human sequence. Thus, choice of negative controls for CRISPR/Cas9 experiments is critically important to interpret the consequences of CRISPR/Cas9-mediated genome editing. Although nontargeting sgRNAs may best represent truly neutral negative controls, it may be more appropriate to use a targeting sgRNA directed at a non-genic and CN-normal region of the genome to better model the baseline impact of nonspecific DNA targeting with CRISPR/Cas9. The observation that off-target CRISPR/Cas9 cuts likely also cooperate with on-target cuts to effect a cumulative toll on the cell highlights the paramount importance of optimal library design for better on-target and less off-target activity. Improved sgRNA libraries would thus allow better prediction of the total number of CRISPR/Cas9-induced DNA cuts according to baseline CN and therefore enable enhanced resolution of actual gene-based dependencies within the data.
Moreover, the observation that targeting the CRISPR/Cas9 endonuclease to even a single locus induced decreased proliferation/viability indicates that this approach to targeting genes induces a cellular response in the majority of cases. As such, the effects of this response should be considered in the interpretation of any phenotype observed after targeting a specific gene. Indeed, this observation may also affect efforts to use the CRISPR/Cas9 approach to perform genome editing for therapeutic purposes.
We also observed that for high-level genomic amplifications, the cellular responses to CRISPR/Cas9 cutting toxicity overwhelm the signal from underlying gene essentiality, thus complicating efforts to use CRISPR/Cas9 for the identification of essential genes in amplified regions. Hence, it may be most prudent in individual cell line screening data to exclude certain reagents from consideration for the identification of essential genes, including sgRNAs targeting genomic amplifications as well as those predicted to confer multiple CRISPR/Cas9 DNA cuts. Failure to properly account for CNAs may lead to confounding effects and a higher rate of false-positive identification of cell-essential genes. Because CNAs are the most common genetic alteration found in human epithelial cancers, these observations have practical implications for both individual experiments and systematic efforts to interrogate the consequences of gene depletion. These observations also highlight the need to perform CRISPR/Cas9 screens across a large collection of diverse cancer cell lines to represent a variety of cancer gene dependencies while accounting for specific confounding genomic structural alterations within individual cell lines.
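One way to make such an exclusion concrete is to estimate, for each sgRNA in a given cell line, the number of expected DNA cuts as the sum of the CN at every perfectly matching locus, and to flag sgRNAs above a chosen limit. The following is a minimal sketch under those assumptions. The function names, the example CN values, and the `max_cuts` threshold are hypothetical and are not the criteria used in this study.

```python
from typing import Dict, List


def predicted_cuts(locus_copy_numbers: List[float]) -> float:
    """Sum copy number over every perfect-match locus of one sgRNA."""
    return sum(locus_copy_numbers)


def flag_confounded_guides(
    guide_loci_cn: Dict[str, List[float]], max_cuts: float = 4.0
) -> Dict[str, bool]:
    """Return True for sgRNAs whose predicted cut count exceeds max_cuts.

    max_cuts is an illustrative threshold, not a value from the study.
    """
    return {
        guide: predicted_cuts(cns) > max_cuts
        for guide, cns in guide_loci_cn.items()
    }


# Hypothetical sgRNAs: each list holds the CN at every perfect-match site.
example = {
    "sg_single_locus_diploid": [2],      # one site at CN 2
    "sg_single_locus_amplified": [12],   # one site inside an amplification
    "sg_three_loci_diploid": [2, 2, 2],  # multi-mapping sgRNA
}
flags = flag_confounded_guides(example)
```

In this sketch an sgRNA is flagged either because it lies in an amplified region or because it maps to several sites, so both sources of gene-independent toxicity are handled by the same count.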
We propose that this observation extends beyond merely a confounding artifact of CRISPR/Cas9 technology and uncovers an important underlying biological concept that cancer cells are vulnerable to induction of site-specific double-stranded DNA breaks within regions of genomic amplification. Genome-scale CRISPR/Cas9 screening has provided an unprecedented resolution of the degree of DNA damage necessary to effect an antiproliferative or cytotoxic response in cancer cells, revealing an unappreciated susceptibility to even a small number of site-specific DNA breaks. Our observations support the notion that CRISPR/Cas9 targeting of amplified regions of the genome leads to increased DNA damage and a significant consequent antiproliferative response. Although these findings complicate the study of amplified regions with CRISPR/Cas9-based approaches, this early antiproliferative cell response may enable sequence-specific therapeutic approaches to target cancer. Many chemotherapy agents (e.g., cisplatin), as well as ionizing radiation, achieve their effects by inducing DNA damage that is not adequately repaired by cancer cells (41, 42). Although many cancer cells are more susceptible than normal cells to chemotherapy and radiation, a major limitation of these treatment approaches is the nonspecific nature of these modalities and the narrow therapeutic window for preferential killing of cancer cells versus normal cells. Our observations suggest that targeting nonessential genes or even noncoding, intergenic regions of amplified DNA with CRISPR/Cas9 technology may unveil critical vulnerabilities in cancer cells that could be harnessed for cancer-specific therapy. A precision medicine approach using simultaneous combination of CRISPR/Cas9 reagents to target multiple amplified loci or tumor-specific mutated sequences within a cancer cell may enable development of cancer-specific treatments with an optimal therapeutic window.
Methods
CRISPR/Cas9 Screening
Cancer cell lines were transduced with a lentiviral vector expressing the Cas9 nuclease under blasticidin selection (pXPR-311Cas9). Each Cas9-expressing cell line was subjected to a Cas9 activity assay (see below) to characterize the efficacy of CRISPR/Cas9 in these cell lines (Supplementary Table S1). Cell lines with less than 45% measured Cas9 activity were considered ineligible for screening. Stable polyclonal Cas9+ cell lines were then infected in replicate (n = 3) at low multiplicity of infection (MOI < 1) with a library of 123,411 unique sgRNAs targeting 19,050 genes (6 sgRNAs per gene), 1,864 miRNAs, and 1,000 nontargeting control sgRNAs (GeCKO v2). Infected cells were selected in puromycin and blasticidin for 7 days and then passaged without selection while maintaining a representation of 500 cells per sgRNA until a defined time point. Genomic DNA was purified from end cell pellets, and the guide sequences were PCR amplified from sufficient gDNA to maintain representation and quantified using massively parallel sequencing.
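The representation requirement can be illustrated with a short calculation. The sketch below multiplies the library size by the stated 500 cells per sgRNA. As an additional assumption not taken from this article, it uses a Poisson model of infection to estimate how many cells would need to be exposed to virus at a given MOI. The function names and the example MOI of 0.3 are hypothetical.

```python
import math


def cells_for_representation(n_sgrnas: int, cells_per_sgrna: int = 500) -> int:
    """Number of infected cells needed to keep the stated representation."""
    return n_sgrnas * cells_per_sgrna


def fraction_infected(moi: float) -> float:
    """Poisson estimate of the fraction of cells receiving >= 1 virion."""
    return 1.0 - math.exp(-moi)


def cells_to_expose(n_sgrnas: int, moi: float, cells_per_sgrna: int = 500) -> int:
    """Cells to expose to virus so that enough are infected (assumed model)."""
    needed = cells_for_representation(n_sgrnas, cells_per_sgrna)
    return math.ceil(needed / fraction_infected(moi))


LIBRARY_SIZE = 123_411  # unique sgRNAs in the library described above
print(cells_for_representation(LIBRARY_SIZE))  # 61705500
print(cells_to_expose(LIBRARY_SIZE, moi=0.3))  # hypothetical MOI
```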
Data Quality Control
Quality control measures were used to remove cell line replicate samples where (i) the single-nucleotide polymorphism (SNP) genotype fingerprint failed to match the reference cell line as previously described (37), (ii) the reproducibility between replicates was less than 80%, or (iii) principal component analysis showed a replicate or cell line to be an outlier.
Data Processing
Data were processed in a reproducible GenePattern pipeline and are provided on the Project Achilles portal (43). A fold change was calculated per sgRNA, and the median of nontargeting controls (n = 1,000) in the GeCKO v2 library was subtracted from each sgRNA to generate a CRISPR guide score. Given the gene-independent effect of CRISPR/Cas9 described in this article, we chose to use the second-best CRISPR/Cas9 guide score for the purpose of ranking gene-level dependencies in individual cell lines. See Supplementary Methods for further details.
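The guide score and gene-level summary described above can be sketched as follows. This is an illustration rather than the GenePattern pipeline itself. It assumes that fold changes are already expressed on a log scale and that the "best" guide is the most depleted (most negative) one. The function names and example values are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable


def guide_scores(
    log_fold_change: Dict[str, float], nontargeting: Iterable[str]
) -> Dict[str, float]:
    """Subtract the median nontargeting-control fold change from each sgRNA."""
    baseline = median(log_fold_change[g] for g in nontargeting)
    return {g: lfc - baseline for g, lfc in log_fold_change.items()}


def second_best_score(scores: Dict[str, float], guides_for_gene: Iterable[str]) -> float:
    """Return the second most negative guide score for one gene.

    Assumes 'best' means strongest depletion; requires at least two guides.
    """
    ranked = sorted(scores[g] for g in guides_for_gene)
    if len(ranked) < 2:
        raise ValueError("need at least two sgRNAs per gene")
    return ranked[1]


# Hypothetical log fold changes for three controls and one gene.
lfc = {"nt1": 0.2, "nt2": 0.0, "nt3": 0.4, "a1": -2.2, "a2": -0.8, "a3": 0.2}
scores = guide_scores(lfc, ["nt1", "nt2", "nt3"])
gene_a = second_best_score(scores, ["a1", "a2", "a3"])
```

Using the second-best rather than the best guide means that a single sgRNA with an unusually strong gene-independent effect cannot by itself determine the rank of a gene.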
Cancer Cell Lines and Cas9 Activity Assay
Cancer cell lines were obtained primarily from the CCLE, which obtained each line from the original source (Supplementary Table S1; ref. 35). All cell lines were Mycoplasma negative, and identity was confirmed through fingerprinting prior to screening using an Affymetrix SNP array as previously described (37). Prior to screening, cell lines were engineered to stably express Cas9 under blasticidin selection, and Cas9 activity was assayed using a lentivirus with an EF1a-driven puromycin-2A-GFP cassette and a U6-driven sgRNA targeting GFP (pXPR_011; ref. 44). The initial level of GFP was measured with FACS and monitored over time as a measure of cells harboring modified alleles. Cells that retain GFP either carry modifications that do not inactivate GFP fluorescence or have inactive Cas9.
Essential Gene Controls
Genes from the KEGG gene sets for ribosome, proteasome, and spliceosome subunits (Supplementary Table S2) were used as cell-essential (positive) controls in the analysis of negative selection CRISPR/Cas9 screening data. Guide sequences that were a perfect match to sgRNAs targeting any other gene or noncoding sequence were removed, except when specifically utilized in described analyses.
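The removal of guide sequences shared between targets can be sketched as a simple filter over the library annotation. The example below assumes the library is available as (sequence, target) pairs. The function name and the example sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def drop_multitarget_guides(
    library: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep only guide sequences annotated to exactly one target."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for sequence, target in library:
        targets[sequence].add(target)
    return [(s, t) for s, t in library if len(targets[s]) == 1]


# Hypothetical library rows: the second sequence is shared by two genes.
library = [
    ("ACGTACGTACGTACGTACGT", "GENE_A"),
    ("TTTTCCCCGGGGAAAATTTT", "GENE_B"),
    ("TTTTCCCCGGGGAAAATTTT", "GENE_C"),
]
unique_guides = drop_multitarget_guides(library)
```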
CN Analysis
DNA CN data were derived from SNP microarrays, and ABSOLUTE CN calls were made as previously described (35, 36). CRISPR/Cas9 screening data were mapped according to the genomic position of sgRNA sequence (guide-level data) or target gene (by ATARiS algorithm) to the human genome version 19 (hg19). CRISPR/Cas9 screening data were plotted in parallel to Project Achilles shRNA dependency data (43) or CCLE CN or gene expression data (35, 37).
WGS and Analysis
WGS was performed through the Broad Institute-Novartis CCLE, as previously described (45). Whole-genome DNA sequencing data from the cancer cell lines were aligned to the human reference genome (hg19) using the Burrows–Wheeler aligner (46). The aligned reads were filtered for PCR duplicates using MarkDuplicates from Picard. Read-depth coverage was computed and normalized using the previously described approach (47). Briefly, the number of aligned reads was counted for nonoverlapping 1-kb bins and then normalized for GC-content and mappability biases using the HMMcopy R/Bioconductor package. The normalization was applied to the cancer cell line and the pseudo-normal sample independently, and the normalized values were then used to generate a log2 ratio (tumor:normal) of GC-corrected coverage. The GC-corrected coverage was then smoothed over 20-kb bins and plotted in Supplementary Fig. S6 and Fig. 7. Chromosomal rearrangements were detected by dRanger (48) from clusters of discordant pairs. Rearrangements at the breakpoint boundaries were manually reviewed and plotted. The relative order of breakage–fusion–bridge cycles and chromothripsis in PANC-1 cells was inferred based on the criteria by Li and colleagues (49).
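The binning, ratio, and smoothing steps can be sketched in a few lines. This is a simplified illustration and not the HMMcopy workflow: it omits GC-content and mappability correction, normalizes only by total read count, adds a pseudocount to avoid division by zero, and smooths by averaging nonoverlapping blocks of 20 bins. All function names and these simplifications are assumptions made for the example.

```python
import math
from statistics import mean
from typing import List


def bin_counts(read_starts: List[int], chrom_length: int, bin_size: int = 1000) -> List[int]:
    """Count reads (0-based start positions) in nonoverlapping bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def log2_ratio(tumor: List[int], normal: List[int], pseudocount: float = 0.5) -> List[float]:
    """Per-bin log2(tumor:normal) after scaling each sample by its total reads."""
    t_total, n_total = sum(tumor), sum(normal)
    return [
        math.log2(((t + pseudocount) / t_total) / ((n + pseudocount) / n_total))
        for t, n in zip(tumor, normal)
    ]


def smooth(values: List[float], window: int = 20) -> List[float]:
    """Average consecutive nonoverlapping blocks (20 x 1-kb bins = 20 kb)."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical toy data: four reads on a 3-kb contig.
counts = bin_counts([0, 999, 1000, 2500], chrom_length=3000)
ratios = log2_ratio([10, 20, 40], [10, 10, 10])
```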
Analysis of Published CRISPR/Cas9 Screening Data
Bayes factor (BF) values were derived from Hart and colleagues (25), and fitness genes were determined per cell line according to the thresholds described therein. Gene-level CN data for HCT116, A375, and DLD1 cells were downloaded from the CCLE. Gene-level CN data for HeLa cells were downloaded from the Gene Expression Omnibus (GEO) database, accession number GSE8605. Further details of the analysis are provided in the Supplementary Methods.
sgRNA Design and Cloning
sgRNAs for validation experiments were designed utilizing the Broad Institute Avana sgRNA design tool (44). sgRNA sequences and characteristics are provided in Supplementary Table S3.
Cell Viability Assay
The PANC-1 and CAL120 cell lines stably expressing Cas9 were plated in a 96-well plate at 1,000 cells/well. One day after plating, cells were infected at a high multiplicity of infection with virus harboring each of the indicated sgRNAs. Cells were cultured ±puromycin, and infection efficiency was calculated from comparison of the puromycin-selected and unselected wells. Six days after infection, cell viability was read out using CellTiter-Glo. Data are presented using unselected wells and calculating fold change relative to the nontargeting negative control sgRNA. Error bars are the result of three biological replicates.
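The two quantities described above can be sketched as simple ratios. The example assumes that infection efficiency is the ratio of the puromycin-selected signal to the unselected signal and that error bars are the standard deviation across replicate wells, which the text does not specify. Function names and luminescence values are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def infection_efficiency(selected_signal: float, unselected_signal: float) -> float:
    """Fraction of cells infected, estimated from +/- puromycin wells."""
    return selected_signal / unselected_signal


def relative_viability(
    sgrna_wells: List[float], control_wells: List[float]
) -> Tuple[float, float]:
    """Mean and SD of fold change versus the nontargeting control mean."""
    control_mean = mean(control_wells)
    fold_changes = [w / control_mean for w in sgrna_wells]
    return mean(fold_changes), stdev(fold_changes)


# Hypothetical CellTiter-Glo readings from three replicate wells each.
efficiency = infection_efficiency(selected_signal=90.0, unselected_signal=100.0)
fc_mean, fc_sd = relative_viability([50.0, 60.0, 70.0], [100.0, 100.0, 100.0])
```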
Immunoblots
Cells were infected at high MOI in 6-well plates, and protein was extracted at 48 hours after infection. Immunoblotting was performed using antibodies for PARP (Cell Signaling Technology, 46D11, #9532, 1:1,000) and β-actin (Sigma Aldrich, A5316, 1:5,000).
High-Content Imaging Assay and Analysis
PANC-1 and CAL120 cells constitutively expressing Cas9 protein were plated at a density of 4,000 cells per well, infected in replicate in 96-well plates at high MOI, and analyzed at an endpoint of 48 hours after infection. Cells were labeled with EdU, fixed with paraformaldehyde, and then labeled with anti-pHH3 (S10) primary antibody (Rabbit: #9701, Cell Signaling Technology, 1:800), anti–phospho-histone H2A.X (Ser139, Mouse: 05-636, EMD Millipore, 1:1,250), and Hoechst 33342 (H3570, Thermo Fisher Scientific, 1 μg/mL). Imaging was performed with the Opera Phenix imaging system at 20× magnification, and data were analyzed using the PerkinElmer Harmony software (40). See Supplementary Methods for additional details.
Disclosure of Potential Conflicts of Interest
C.-Z. Zhang is a consultant/advisory board member for Pillar Biosciences and Ori Capital. A.D. Cherniack reports receiving a commercial research grant from Bayer AG. G. Kryukov is Senior Director, Computational Biology, at KSQ Therapeutics. L.A. Garraway reports receiving a commercial research grant from Novartis, has ownership interest (including patents) in Foundation Medicine, and is a consultant/advisory board member for Warp Drive, Novartis, Foundation Medicine, and Boehringer Ingelheim. M. Meyerson reports receiving a commercial research grant from Bayer and is a consultant/advisory board member for the same. W.C. Hahn reports receiving a commercial research grant from Novartis and is a consultant/advisory board member for the same. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: A.J. Aguirre, R.M. Meyers, B.A. Weir, F. Vazquez, A. Cook, K. Stegmaier, M. Meyerson, D.E. Root, A. Tsherniak, W.C. Hahn
Development of methodology: A.J. Aguirre, R.M. Meyers, B.A. Weir, F. Vazquez, H. Xu, L.D. Ali, G. Jiang, D.E. Root, A. Tsherniak
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.J. Aguirre, F. Vazquez, A. Cook, W.F. Harrington, M.B. Doshi, S. Gill, L.D. Ali, G. Jiang, S. Pantel, Y. Lee, A. Goodale, G.S. Cowley, D.E. Root
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.J. Aguirre, R.M. Meyers, B.A. Weir, F. Vazquez, C.-Z. Zhang, U. Ben-David, G. Ha, W.F. Harrington, M.B. Doshi, M. Kost-Alimova, H. Xu, G. Jiang, A.D. Cherniack, C. Oh, G. Kryukov, G.S. Cowley, L.A. Garraway, C.W. Roberts, T.R. Golub, D.E. Root, A. Tsherniak, W.C. Hahn
Writing, review, and/or revision of the manuscript: A.J. Aguirre, R.M. Meyers, B.A. Weir, U. Ben-David, M. Kost-Alimova, H. Xu, A.D. Cherniack, K. Stegmaier, M. Meyerson, D.E. Root, A. Tsherniak, W.C. Hahn
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.J. Aguirre, G. Jiang, S. Pantel, D.E. Root
Study supervision: A.J. Aguirre, K. Stegmaier, T.R. Golub, M. Meyerson, D.E. Root, A. Tsherniak, W.C. Hahn
Acknowledgments
We thank Eejung Kim, Joseph Rosenbluh, Srivatsan Raghavan, and Belinda Wang for helpful discussions.
Grant Support
This project was supported by R01 CA130988 (W.C. Hahn), U01 CA199253 (W.C. Hahn), U01 CA176058 (W.C. Hahn), P01 CA154303 (W.C. Hahn), and P50 CA12700323 (W.C. Hahn and A.J. Aguirre). A.J. Aguirre was supported by the Pancreatic Cancer Action Network Samuel Stroum Fellowship, Hope Funds for Cancer Postdoctoral Fellowship, American Society of Clinical Oncology Young Investigator Award, Dana-Farber Cancer Institute Hale Center for Pancreatic Cancer, Perry S. Levy Endowed Fellowship, and the Harvard Catalyst and Harvard Clinical and Translational Science Center (UL1 TR001102). This work was conducted as part of the Slim Initiative for Genomic Medicine, a project funded by the Carlos Slim Foundation in Mexico.