Abstract
CRISPR/Cas9 has emerged as a powerful new tool to systematically probe gene function. We compared the performance of CRISPR to RNAi-based loss-of-function screens for the identification of cancer dependencies across multiple cancer cell lines. CRISPR dropout screens consistently identified more lethal genes than RNAi, implying that the identification of many cellular dependencies may require full gene inactivation. However, in two aneuploid cancer models, we found that all genes within highly amplified regions, including nonexpressed genes, scored as lethal by CRISPR, revealing an unanticipated class of false-positive hits. In addition, using a CRISPR tiling screen, we found that sgRNAs targeting essential domains generate the strongest lethality phenotypes and thus provide a strategy to rapidly define the protein domains required for cancer dependence. Collectively, these findings not only demonstrate the utility of CRISPR screens in the identification of cancer-essential genes, but also reveal the need to carefully control for false-positive results in chromosomally unstable cancer lines.
Significance: We show in this study that CRISPR-based screens have a significantly lower false-negative rate compared with RNAi-based screens, but have specific liabilities particularly in the interrogation of regions of genome amplification. Therefore, this study provides critical insights for applying CRISPR-based screens toward the systematic identification of new cancer targets. Cancer Discov; 6(8); 900–13. ©2016 AACR.
See related commentary by Sheel and Xue, p. 824.
See related article by Aguirre et al., p. 914.
This article is highlighted in the In This Issue feature, p. 803.
Introduction
Genetic loss-of-function screens are an important approach enabling the systematic identification of cancer-selective vulnerabilities. In mammalian cells, RNAi has been the predominant method of screening and has enabled systematic and genome-wide loss-of-function screens leading to the identification of new cancer targets (1, 2). RNAi-based screens, however, are often confounded by off-target effects (3). In addition, RNAi induces mRNA downregulation, typically resulting in reduced gene function (hypomorphic allele) rather than a complete loss of function (null allele). Thus, in addition to the problem of false positives, RNAi screens also likely suffer a certain rate of false-negative detection of genes where near-complete loss of function would be required in order to elicit a phenotypic effect. The frequency of false negatives in RNAi-based screens has not yet been systematically assessed.
More recently, the prokaryotic type II CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated 9) has emerged as an RNA-based genome-editing tool that can be used to enact loss-of-function screens (4). In contrast with RNAi, the CRISPR system induces sequence-directed DNA double-stranded breaks resulting in frameshift insertion/deletion (indel) mutations that can induce complete loss-of-protein function (5). Initial studies demonstrated the use of CRISPR for genetic screens in mammalian cells (6, 7) and showed a high level of phenotypic agreement between reagents targeting the same gene and a high rate of hit confirmation. Most of these screens were positive selection screens which are technically less challenging than “drop out” screens. Subsequent screens (8, 9) used improved libraries and screening methods to discover essential genes in mammalian cells, but a systematic comparison of CRISPR with RNAi in dropout screens has not yet been described with sufficient reagent depth to enable robust conclusions.
In this study, we systematically compared the performance of these two screening technologies for the identification of new cancer vulnerabilities. We show that at equivalent screening depth, CRISPR dropout screens identified a significantly higher number of essential genes and thus provide a more comprehensive assessment of genetic dependencies compared with RNAi-based screens. In addition, we show that sgRNAs that target DNA sequences within conserved PFAM domains (10) tend to result in a more robust dropout phenotype. These findings have important implications for future library designs and suggest that the CRISPR tiling approach outlined herein might be used to elucidate which protein domains are critical in driving biological effects. Surprisingly, we found that all genes within highly amplified regions, even when not expressed, scored as strongly lethal, revealing an unanticipated class of false-positive hits. Collectively, these findings demonstrate that although CRISPR has certain specific limitations, CRISPR-mediated genetic screens can be used for robust and systematic discovery of cancer cell vulnerabilities.
Results
CRISPR-Based Dropout Screens Provide a More Complete Assessment of Cancer Dependencies Compared with shRNA Screens
In order to robustly compare RNAi- and CRISPR-based screening technologies, we constructed shRNA and single-guide RNA (sgRNA) libraries targeting 2,722 human genes with an average coverage of 20 reagents per gene. The sgRNAs were designed against the N-terminus of protein coding genes, as described previously, as frameshift mutations in the N-terminus are thought to be more likely to result in “complete” protein inactivation (7, 11). Deep shRNA libraries were designed as previously described (2). These libraries were used in proliferation-based screens in a set of 5 cancer cell lines, including the colorectal cancer cell lines DLD1 and RKO, the fibrosarcoma cell line HT1080, the astrocytoma cell line SF268, and the gastric cancer cell line MKN45 (Fig. 1A). Following lentiviral transduction of the sgRNA libraries, the impact of gene depletion on cellular viability or proliferation was assessed by quantifying the abundance of sgRNAs at day 0 (plasmid count) relative to day 14 using next-generation sequencing (see details in Methods). sgRNAs targeting essential genes are expected to inhibit the growth of transduced cells, and thus their relative abundance will be reduced when comparing the relative counts on day 14 versus day 0 (Fig. 1A, right graph). We found that across all cell lines screened, about 2% to 3% of genes scored as lethal genes by both RNAi and CRISPR approaches (Fig. 1B and C; Supplementary Fig. S1, quadrant III). The gene list in quadrant III included many known essential gene classes such as ribosomal, RNA processing, and DNA replication factors (Supplementary Fig. S2). Notably, there were very few genes that scored as essential by RNAi but not by CRISPR (Fig. 1B and C; Supplementary Fig. S1, IV). In contrast, in all of the five cancer models screened, a large number of genes scored as essential by CRISPR but not by RNAi (Fig. 1B and C; Supplementary Fig. S1, II).
In fact, the number of lethal genes identified by CRISPR was 2-fold (HT1080 cells) to 5-fold (DLD1 cells) higher compared with RNAi. This suggested that CRISPR either had a significantly lower false-negative rate than shRNA, or a much higher false-positive rate. One way to identify likely off-target hits is to examine the lethality scores of nonexpressed genes, as these are expected to not be required for cell viability. In DLD1, RKO, and HT1080 cells, all of the genes required for cell viability (average z-scores below –1) had an RNA sequencing (RNA-seq) RPKM expression value greater than 2, indicating that the CRISPR screen at this depth showed virtually no false-positive effects from sgRNAs directed against nonexpressed genes (Fig. 1D; Supplementary Fig. S3). However, in SF268 (Fig. 1E) and MKN45 cells (Supplementary Fig. S3), a number of genes scored as essential in the CRISPR screen despite not being expressed. As described in detail below, we found that these false-positive hits were associated with genes in regions of high copy-number amplification. These false-positive hits were observed only in SF268 and MKN45 cells, as these are chromosomally aneuploid lines, whereas DLD1, RKO, and HT1080 are diploid cancer lines. After removing these false-positive hits due to amplified genes in SF268 and MKN45 cells, we conducted further analysis on the essential genes identified by CRISPR and RNAi. The category of genes that scored only by CRISPR but not by RNAi included many genes known to be essential for proliferation of most cells, such as CDK9, PLK1, and MYC (12, 13), as well as many known essential gene classes (RNA processing and DNA replication). We hypothesized that RNAi-based screens failed to recover these genes due to either the absence of effective shRNAs in our library (despite a coverage of 20 reagents per gene) and/or insufficient protein knockdown to reveal a full loss-of-function phenotype. 
In support of this hypothesis, we found that only 1 of 6 CDK9 shRNAs tested achieved potent CDK9 protein depletion that resulted in growth inhibition (Supplementary Fig. S4A and S4B), thus explaining why CDK9 failed to score as an essential gene in the shRNA-based screen. Collectively, these findings indicate that CRISPR-based screening enables a more complete assessment of genes required for cancer cell growth.
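The scoring logic described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the pseudocount, the library-wide z-scoring, and the function names are assumptions, and normalization for sequencing depth is omitted. Only the –1 average z-score cutoff and the four CRISPR-versus-RNAi categories come from the text.

```python
import math
from statistics import mean, pstdev


def log2_fold_change(day0_count, day14_count, pseudocount=1.0):
    """Log2 ratio of day-14 to day-0 abundance; negative values indicate dropout."""
    # The pseudocount is an assumption that avoids taking the log of zero.
    return math.log2((day14_count + pseudocount) / (day0_count + pseudocount))


def average_z_by_gene(reagent_lfc):
    """Z-score every reagent against the whole library, then average per gene.

    reagent_lfc maps gene name -> list of per-reagent log2 fold changes.
    Assumes the scores are not all identical (nonzero standard deviation).
    """
    all_values = [x for values in reagent_lfc.values() for x in values]
    mu, sigma = mean(all_values), pstdev(all_values)
    return {
        gene: mean((x - mu) / sigma for x in values)
        for gene, values in reagent_lfc.items()
    }


def classify(crispr_z, rnai_z, cutoff=-1.0):
    """Assign a gene to one of the four CRISPR-versus-RNAi categories."""
    crispr_hit, rnai_hit = crispr_z < cutoff, rnai_z < cutoff
    if crispr_hit and rnai_hit:
        return "both"          # quadrant III in the text
    if crispr_hit:
        return "CRISPR only"   # quadrant II in the text
    if rnai_hit:
        return "RNAi only"     # quadrant IV in the text
    return "neither"
```

A gene such as CDK9, which dropped out with sgRNAs but not with shRNAs, would fall into the "CRISPR only" category under this scheme.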
We next sought to explore whether CRISPR screens can be used to identify cancer-selective dependencies. The five cell lines screened were derived from various tumor lineages with distinct genetic alterations. Using a cutoff of –1 average z-score to delineate genes that are cell essential, we found that a total of 409 genes scored as essential in at least one of the five cancer cell lines. Of these, 34% of essential genes were required for the proliferation of all cell lines, suggesting that these genes serve core cellular functions that are likely required for the proliferation of most cells (Fig. 2A and B); we henceforth refer to this category of broadly essential genes as pan-lethals. A smaller number of genes was selectively required for the growth of only one (25%) or two (12%) of the five screened cancer cell models (Fig. 2A and B); we refer to this class of genes as selective lethals. Of note, the class of selective lethals included several known oncogene dependencies. For instance, β-catenin targeting sgRNAs selectively impaired the proliferation of DLD1, an APC-mutated cell line with constitutive activation of WNT pathway signaling (ref. 14; Fig. 2C). The selective dependence on β-catenin was validated using inducible sgRNAs and additional cell proliferation assays (Supplementary Fig. S5A–S5D). This cell line was also dependent on TCF7L2, a gene that encodes the transcription factor TCF4. TCF4 interacts with β-catenin to drive expression of WNT pathway target genes. Surprisingly, the gastric cancer cell line MKN45 also exhibited dependence on β-catenin and TCF4 despite lacking genetic alterations in WNT pathway components (Fig. 2C). Of note, a prior study reported high levels of nuclear β-catenin in MKN45 cells (15), suggesting that WNT pathway activation in this cell line might be driven by nongenetic mechanisms. When we looked at the pattern of KRAS dependence, we found that KRAS-targeting sgRNAs selectively impaired the proliferation of DLD1 cells (Fig. 2C); this dependency is likely explained by the fact that DLD1 harbors the oncogenic KRAS G13D mutation.
Unexpectedly, however, MKN45 cells were also dependent on KRAS despite lacking genetic alterations in this oncogene. These cells harbor MET amplification; thus, one possibility is that MET signaling requires KRAS. KRAS also had a low z-score in MKN45 cells by shRNA screening (average z-score of –0.8), suggesting the phenotype is biologically relevant and not a false positive. Both the β-catenin and KRAS sensitivities in MKN45 cells would not have been predicted by genetic alterations alone, but were discovered independently by both RNAi and CRISPR, highlighting the importance of functional profiling. The selective pattern of NRAS and PIK3CA dependency correlated well with the presence of oncogenic alterations in NRAS and PIK3CA, respectively (Fig. 2C; Supplementary Fig. S5). In addition, MDM2 sgRNAs selectively impaired the growth of p53 wild-type but not p53 mutant cell lines (Fig. 2C). Importantly, this genetic pattern of MDM2 dependence recapitulates the selective inhibition of p53 wild-type cell lines by pharmacologic MDM2 inhibitors, such as Nutlin-3 (16). Together, these findings indicate that in addition to the identification of broadly essential genes, CRISPR-based dropout screens can also robustly identify cancer-selective vulnerabilities.
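The pan-lethal versus selective-lethal distinction follows directly from counting the cell lines in which a gene falls below the –1 cutoff. The sketch below assumes a per-gene dictionary of average z-scores keyed by cell line. The label "intermediate" for genes essential in three or four lines is a placeholder introduced here, not a term from the study, and the z-scores in the usage example are invented for illustration.

```python
def classify_dependency(z_by_line, cutoff=-1.0):
    """Label a gene by the number of cell lines in which it scores as essential.

    z_by_line maps cell line name -> average z-score for one gene.
    """
    n_lines = len(z_by_line)
    n_essential = sum(1 for z in z_by_line.values() if z < cutoff)
    if n_essential == 0:
        return "nonessential"
    if n_essential == n_lines:
        return "pan-lethal"        # required in every screened line
    if n_essential <= 2:
        return "selective lethal"  # required in only one or two lines
    return "intermediate"          # placeholder label, not from the study


# Illustrative values only; these are not measurements from the screens.
example = {"DLD1": -2.1, "RKO": 0.1, "HT1080": 0.0, "SF268": -0.2, "MKN45": -1.6}
label = classify_dependency(example)
```

With the invented scores above, the gene is essential in two of five lines and is labeled a selective lethal, the same pattern the text reports for β-catenin in DLD1 and MKN45.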
sgRNAs Targeting Conserved PFAM Domains Show Most Robust Dropout Phenotypes
Although the CRISPR-based screen identified CTNNB1 as a cancer-selective dependency in WNT pathway deregulated cancer models, CTNNB1 was one of the few genes that scored more robustly in the shRNA screen compared with CRISPR (Fig. 3A). Examination of the individual sgRNA scores indicated that the efficacy of sgRNAs correlated with the targeting position in the CTNNB1 transcript (Fig. 3B); the first five sgRNAs targeting the most 5′ regions of the CTNNB1 transcript showed very little to no dropout phenotype. By contrast, 87% of the next 15 sgRNAs targeting the downstream exons 3, 4, and 5 exhibited a stronger lethality score. Investigation of the genomic locus of CTNNB1 revealed that it harbors an alternative translational initiation start site in exon 3 (transcript ID ENST00000405570), suggesting that the isoform expressed from this alternative start site is likely sufficient for cancer cell growth, explaining the lack of a dropout phenotype of the 5′ targeting sgRNAs.
We next set out to more systematically investigate the importance of sgRNA positioning on gene inactivation. To this end, we designed an sgRNA library that contains all possible sgRNAs targeting a set of 139 genes with an average of 364 sgRNAs/gene, which we refer to as a CRISPR tiling array (Fig. 4A). The genes included in the CRISPR tiling array were chosen to represent diverse biological functions, but were enriched for genes that elicited growth phenotypes in the primary screen. In order to minimize potential biases, we included all unique sgRNA sequences targeting these gene coding sequences, requiring only the presence of a protospacer adjacent motif (PAM) sequence and lack of perfect homology to other coding sequences. This CRISPR tiling library was screened in the three cancer cell lines DLD1, RKO, and NCI-H1299. Interestingly, as observed for CTNNB1, for 63% (46 of 73) of the growth essential genes in DLD1 cells, the sgRNA performance was strongly influenced by the sgRNA position within the coding region. Similarly, 68% (52 of 76) of the essential genes in RKO cells showed coding region–dependent activity. The growth effects of individual sgRNAs were significantly correlated across cell lines (r² = 0.504), suggesting that these effects represent consistent differences in the biological effectiveness of individual reagents (Fig. 4B). We next performed a systematic correlation analysis of sgRNA features to identify which features correlated most strongly with sgRNA potency. Interestingly, the top predictive feature for sgRNA performance was its localization within a conserved PFAM protein domain (Fig. 4C; Supplementary Fig. S6). In addition, the extent of sequence conservation across vertebrate species was also a good predictor (P < 0.001) of sgRNA efficacy, regardless of whether or not the region was annotated as a conserved PFAM domain.
Although prior studies (7) have suggested that there is value in targeting the most 5′ coding regions of proteins with CRISPR reagents, this was not the case in our screens. In this dataset, the average phenotype of sgRNAs targeting essential genes was slightly weaker for sgRNAs targeting the extreme N-terminal coding region, and much weaker in the 3′ most coding regions (last 20%) of proteins. This effect, however, appeared to be largely driven by the location of PFAM domains within coding regions, as the N-terminal and C-terminal effects were no longer observed when only sgRNAs targeting annotated domains were included in the analysis (Supplementary Fig. S7A and S7B).
Based on the observation that sgRNAs targeting conserved protein domains scored more robustly in CRISPR-based screens, we hypothesized that CRISPR tiling data might be used to perform functional annotation of critical protein domains. Indeed, sgRNAs targeting the highly conserved armadillo repeats in β-catenin demonstrated more significant average lethality scores compared with sgRNAs targeting less conserved regions (Fig. 4D). The failure of some of the β-catenin sgRNAs to score despite targeting the highly conserved armadillo repeats correlated with ineffective genome editing by these reagents (Supplementary Fig. S8A–S8C). Similar to the case of β-catenin, sgRNAs targeting the highly conserved kinase domain or polo-box regions in PLK1 showed the most robust dropout phenotypes (Fig. 4E), and sgRNAs targeting the kinase domain of Aurora kinase B (AURKB) had significantly stronger effects than those targeting the extreme N or C termini (Fig. 4F). These findings are consistent with the notion that the armadillo repeats in β-catenin, the kinase activity in Aurora kinase B, and both the kinase activity and polo-boxes in PLK1 are essential in mediating their cellular functions (13, 17). A recent study revealed that the helicase activity but not the bromodomain of BRM is required to sustain the growth of BRG1-deficient cancers (Fig. 4G; ref. 18). Strikingly, the CRISPR tiling data for BRM indicated a more robust dropout phenotype for sgRNAs targeting the ATPase/helicase activity compared with those targeting the bromodomain region. Together, these findings suggest that CRISPR tiling screens might be useful, in some cases, to decipher which protein domains are required for cancer cell growth.
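One simple way to read tiling data for a single gene is to compare the average score of sgRNAs that cut inside annotated domains with those that cut outside. The sketch below assumes each sgRNA is represented as a pair of amino acid position and dropout score, and that domains are given as inclusive coordinate ranges. Both representations and all numbers in the usage example are hypothetical and do not reproduce the analysis behind Fig. 4.

```python
from statistics import mean


def in_any_domain(position, domains):
    """True if an amino acid position falls within any (start, end) domain range."""
    return any(start <= position <= end for start, end in domains)


def mean_or_none(values):
    """Mean of a list, or None when the list is empty."""
    return mean(values) if values else None


def domain_effect(sgrnas, domains):
    """Return (mean score inside domains, mean score outside domains).

    sgrnas is a list of (amino_acid_position, dropout_score) pairs.
    """
    inside = [score for pos, score in sgrnas if in_any_domain(pos, domains)]
    outside = [score for pos, score in sgrnas if not in_any_domain(pos, domains)]
    return mean_or_none(inside), mean_or_none(outside)


# Hypothetical gene with one annotated domain spanning residues 100 to 300.
tiling = [(10, -0.25), (150, -2.0), (200, -1.0), (700, -0.5)]
inside_mean, outside_mean = domain_effect(tiling, [(100, 300)])
```

A more negative mean inside the domain than outside would be consistent with the pattern described for the armadillo repeats of β-catenin and the kinase domains of PLK1 and AURKB.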
Amplified Genomic Loci Score as False Positive in CRISPR-Based Dropout Screens
As described earlier, we found that in the two aneuploid cancer cell lines SF268 and MKN45, several nonexpressed genes scored as essential, suggesting that these genes represent false-positive hits. Strikingly, all of these false-positive hits mapped to regions of high-level copy-number amplification (Fig. 5A and B). We therefore wanted to explore more deeply the effect of amplified genomic regions on the performance of CRISPR-based screens. MKN45 is a gastric cancer cell line that harbors amplification of a region of chromosome 7 (7q31) that contains the likely driver oncogene MET (Fig. 5A). Whereas MET scored as essential in MKN45 cells, all other genes included in the library and located within the 7q31 amplicon also scored as lethal. Moreover, sgRNAs targeting ING3 and CAV1 exhibited the strongest viability effect of the genes located within the 7q31 amplicon, with MET ranking third (Fig. 5C). Similar results were observed for SF268 cells, where all genes in the chromosome 11 amplicon (11q22) scored as lethal (Fig. 5B and D). YAP has been hypothesized to be the most likely driver of this amplicon (19). Although YAP did score as the most strongly essential in this cell line, three genes within this amplicon, MMP7, MMP20, and ANGPTL5, showed strong viability effects despite lacking detectable expression based on RNA-seq. Of note, all of the nonexpressed genes (RNA-seq < 1) that scored as lethal in SF268 and MKN45 cells were located in amplified genomic regions (Fig. 5A and B). By contrast, shRNA-based screens identified both MET and YAP as the sole driver oncogenes of their respective amplicons (Fig. 5E and F). We hypothesized that sgRNAs targeting amplified loci may lead to excessive double-strand breaks and activation of the DNA damage repair pathways. To test this, we examined the effects of sgRNAs targeting the nonexpressed and amplified genes MMP7, MMP20, and ANGPTL5 in SF268 cells that harbor the 11q22 amplicon.
All three sgRNAs led to a strong increase in phosphorylated histone H2AX, a marker of DNA damage (Supplementary Fig. S9A), and resulted in a G2–M arrest and induction of apoptosis (Supplementary Figs. S9B–S9D and S10A and S10B). As predicted, the induction of DNA damage response, G2–M arrest, and apoptosis by these sgRNAs was specific to cells with the 11q22 amplicon and not observed in the diploid DLD1 cells (Supplementary Figs. S9A and S10A and S10B).
We next explored the effect of relative gene copy number on CRISPR lethality score more globally across the CRISPR screening dataset. When comparing the copy-number status with the average lethality score for all 2,700 genes screened in these two cell lines, we found a positive correlation between the degree of amplification and CRISPR lethality score (Fig. 6A). By contrast, there was no correlation between copy number and lethality score in the shRNA screen dataset (Fig. 6A). Even sgRNAs directed against loci with only a modestly increased copy number, harboring as few as one or two additional copies, showed a greater average growth-inhibitory effect than nonamplified loci. In addition, we observed that sgRNAs targeting regions harboring hemizygous or complete loss of the genomic region displayed on average a less pronounced growth effect than diploid regions. This effect was highly significant even when the analysis was restricted to only nonexpressed genes (P = 10⁻³⁵), thus excluding the possibility that this effect is due to disruption of gene function. Together, these findings further support the notion that CRISPR reagents that induce multiple genomic cuts result in an antiproliferative effect independent of the target gene function and that this is directly proportional to the number of induced cuts. We next wanted to investigate if this phenomenon may also help to explain some of the off-target lethality of individual sgRNA reagents. To minimize any confounding effects due to on-target gene inactivation, we restricted this analysis to 14,000 sgRNAs targeting nonlethal genes (as judged by lack of dropout of the average sgRNA targeting that gene). Strikingly, the best predictor of off-target lethality was the number of genomic sites with perfect complementarity to the target site (Fig. 6B; Supplementary Table S1).
To investigate the mechanism of growth inhibition of these “multi-cutter” sgRNAs, we examined the cellular response to VEGFA site 2 sgRNA that was previously shown by GUIDE-seq to have more than 140 verified off-target sites (20), as well as another multiple cutter sgRNA observed in our screens (originally designed against the olfactory receptor OR4F5). Similar to sgRNAs targeting amplified loci, we found that both “multi-cutter” sgRNAs led to a strongly increased phosphorylation of H2AX, G2–M cell-cycle arrest, and apoptosis (Supplementary Figs. S9 and S10). It is important to note that the sgRNAs included in the CRISPR tiling array were filtered against only perfect matches to other coding regions rather than the entire genome. sgRNAs targeting multiple genomic loci frequently contained low-complexity repeat sequences (e.g., AGGAGGAGG…), but the off-target effects due to multiple genome matches were still observed after the exclusion of low-complexity repeats from the dataset. Collectively, these findings indicate that loss-of-function proliferation-based studies using sgRNA-mediated gene inactivation will be subject to a set of off-target activities related to the number of times a guide strand sequence is found in the genome. This will likely lead to false positives in genes found in areas of genomic amplification, and false positives due to multiple homologous sites for a given sgRNA. Hence, sgRNAs should be selected to have no additional matches to genomic regions (even if not expressed) in order to minimize off-target lethality due to excessive genome damage. Moreover, these findings indicate that RNAi or CRISPR interference (CRISPRi)–based screens (21) will be better suited to elucidate the driver oncogenes of amplified regions.
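The recommendation to select sgRNAs without additional genomic matches can be illustrated with a naive perfect-match counter. The sketch below makes several assumptions that are not stated in the text: the genome is held as a single uppercase string, the PAM is NGG (the motif recognized by Streptococcus pyogenes Cas9), and only exact matches are counted. A real design pipeline would use an indexed aligner and would also consider mismatched sites.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_perfect_sites(guide, genome):
    """Count exact guide matches followed by an NGG PAM, on both strands."""
    total = 0
    for strand in (genome, reverse_complement(genome)):
        start = strand.find(guide)
        while start != -1:
            pam = strand[start + len(guide): start + len(guide) + 3]
            # NGG: any base followed by two guanines (assumed PAM).
            if len(pam) == 3 and pam[1:] == "GG":
                total += 1
            start = strand.find(guide, start + 1)  # allow overlapping matches
    return total


def is_multi_cutter(guide, genome):
    """True if the guide has more than one perfect PAM-adjacent site."""
    return count_perfect_sites(guide, genome) > 1
```

Under this scheme, a guide would be retained for a library only when `count_perfect_sites` returns exactly one, its intended target.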
Discussion
Genetic loss-of-function studies hold great promise for the discovery of novel therapeutic targets for cancer and other diseases. In this study, we compared the deep-coverage shRNA and CRISPR-based screens for the systematic identification of cancer vulnerabilities. Our data indicate that CRISPR dropout screens identified between 2 and 5 times as many essential genes compared with RNAi-based loss-of-function screens, even when the shRNA screens were powered at 20 shRNAs per gene. We speculate that this high rate of false negatives in RNAi-based screens can likely be attributed to the incomplete nature of gene inactivation by RNAi, which in most cases generates hypomorphic rather than complete null alleles (22). By contrast, CRISPR cutting of genomic DNA and error-prone nonhomologous end-joining will result in indel mutations. Indels are typically more catastrophic mutations to protein function and frequently lead to complete gene disruption, especially in the case of frameshift mutations. These findings indicate that CRISPR-based dropout screens can provide a more comprehensive assessment of genetic dependencies compared with RNAi-based screens.
As for any emerging technology, the specificity and optimal design parameters for CRISPR experiments are not yet fully understood. Although CRISPR-based screens generally have a low false-positive rate (7, 11), likely owing to the increased targeting specificity of sgRNAs (22), we surprisingly found that CRISPR can be prone to false-positive hits for genes with high ploidy, especially at copy numbers greater than 6. Although it will be important to control for this class of false-positive hits, it is important to note that these artifactual hits comprise only a minor fraction of all essential genes discovered by CRISPR screens in aneuploid lines (Supplementary Fig. S11) and can easily be removed bioinformatically. The copy-number effect on CRISPR lethality was likely missed in several earlier studies because those screens were performed on cell lines with stable diploid genomes. A recent study has observed a similar copy-number effect on a single cell line harboring a high-level amplicon (8). We reasoned that the lethality of sgRNAs targeting amplified genomic regions might be explained by two hypotheses. First, sgRNAs targeting genes within tandem amplicons could lead to the excision of the entire locus, including removal of the essential oncogenic driver genes. Alternatively, an excessive number of DNA double-strand breaks may lead to sustained activation of the DNA damage response pathway and growth inhibition. In agreement with Wang and colleagues, we found that sgRNAs targeting amplified loci led to an increase of the DNA damage marker phospho-H2AX, a G2–M cell-cycle arrest, and induction of apoptosis (8). These findings suggest that activation of the DNA damage response pathway due to excessive DNA double-strand breaks is, at least in part, responsible for the observed growth-inhibitory effects, but it is quite possible that the deletion of oncogenic drivers in tandem amplicons contributes as well.
The copy-number (CN) effect of CRISPR appears to be independent of p53 status, as it was observed with similar magnitude in both p53-mutated (SF268) and wild-type (MKN45) cell lines. Although the CN effect is most severe at highly amplified loci, we found that even subtle copy-number changes can have statistically significant effects on CRISPR dropout scores. It is important to note, however, that one may be able to correct for these subtler copy-number effects with bioinformatics approaches.
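As one hypothetical example of such a bioinformatic correction, the cutting-related component of the score could be estimated from nonexpressed genes at each copy number and subtracted from every gene at that copy number. This is an assumption made for illustration and is not a method described in this study. The field names, the RPKM cutoff of 1, and all values in the usage example are invented.

```python
from collections import defaultdict
from statistics import mean


def copy_number_baseline(genes, rpkm_cutoff=1.0):
    """Mean lethality score of nonexpressed genes at each copy number.

    genes is a list of dicts with keys "score", "copy_number", and "rpkm".
    Nonexpressed genes are assumed to report only the cutting-related effect.
    """
    by_cn = defaultdict(list)
    for gene in genes:
        if gene["rpkm"] < rpkm_cutoff:
            by_cn[gene["copy_number"]].append(gene["score"])
    return {cn: mean(scores) for cn, scores in by_cn.items()}


def corrected_score(gene, baseline):
    """Subtract the copy-number baseline; leave the score unchanged if none exists."""
    return gene["score"] - baseline.get(gene["copy_number"], 0.0)


# Invented values: two nonexpressed genes and one expressed gene at copy number 8.
screen = [
    {"score": -2.0, "copy_number": 8, "rpkm": 0.0},
    {"score": -1.0, "copy_number": 8, "rpkm": 0.5},
    {"score": -3.0, "copy_number": 8, "rpkm": 40.0},
]
baseline = copy_number_baseline(screen)
adjusted = corrected_score(screen[2], baseline)
```

In this toy case the expressed gene retains a negative score after correction, which would suggest a dependency beyond the cutting artifact; whether such a subtraction is adequate for real data would need to be tested.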
These findings have several important implications for the design of CRISPR screening strategies. First, CRISPR-based screens will likely not be a good approach to determine drivers of amplified genomic regions. The putative amplified driver oncogene MET, for instance, did not have the strongest viability effect in MKN45 cells compared with other genes in the amplicon. By contrast, MET was identified as the driver oncogene of this amplicon using an shRNA-based screen, indicating that RNAi- or CRISPRi-based screens (21) are better suited to elucidate the driver oncogenes of amplified regions. Second, these findings have important implications for future sgRNA library designs. In order to avoid lethality due to excessive genome cuts, it will be critical to design CRISPR reagents that have no or at least minimal other matches across the entire human genome. Our findings also imply that for pooled CRISPR screening studies, it will be important to keep the multiplicity of infection during lentiviral transduction low, as transduction with multiple sgRNAs targeting different genomic regions could lead to excessive genome cuts and hence result in lethality. Interestingly, even diploid genes (CN = 2) exhibited a slight but statistically significant growth reduction compared with haploid (CN = 1) gene loci. Due to this apparent selection pressure against any genome cutting, it is possible that Cas9-expressing cells could be selected against strongly during the course of screening. Third, the ability to easily multiplex sgRNAs in single experiments affords the ability of complex genome engineering and synthetic-lethal screening. However, based on our findings, one needs to carefully control for the effects of additional genomic cuts in dual or even higher multiplexed screens, as synthetic lethality could be the result of passing a threshold of “excessive” genomic cuts rather than genetic interactions.
Fourth, the observed copy-number effects suggest that the use of a scrambled nontargeting CRISPR that does not cut the human genome is likely not the best control for CRISPR lethality experiments, and should be replaced with reagents cutting nonexpressed or known nonessential genomic regions, such as the AAVS1 locus. Lastly, it will be important to examine whether the copy-number effects observed in our study also pertain to normal tissues. In that case, caution should be exerted in both the experimental and therapeutic applications of CRISPR to the editing of polyploid tissues, such as liver (23), as it may result in extensive genome damage that leads to impaired growth or apoptosis.
Most sgRNA libraries have been designed to direct CRISPR/Cas9-induced mutations to the 5′ exons of coding regions (7, 11) with the goal of introducing frameshift mutations early in the coding region of the gene of interest, and initial sgRNA design rules (24, 25) have focused on thermodynamic and sequence parameters of the guide RNA, much like the rules that were derived for RNAi reagents (26). Our results, however, suggest that performance of sgRNAs appears to be also strongly influenced by the structure/function of the gene regions they target. This can likely be explained by the fact that CRISPR can induce both frameshift (3n ± 1, 3n ± 2) and in-frame deletions (3n) of variable size. The consequences of these indel events can be quite variable depending on the nature of the deletion event. Frameshift deletions are likely to destroy protein function due to the deletion of large regions of the protein. However, small in-frame deletions in nonessential domains are likely to retain functionality (i.e., deletion of one or a few amino acids does not alter protein function) and thereby significantly reduce the signal-to-noise in dropout screens. By contrast, deletions of even single amino acids in key functional domains, such as the catalytic core, are likely to perturb protein function due to improper spacing of functional groups required for catalysis (27). Therefore, in contrast to nonessential domains that can tolerate small in-frame deletions, the deletion of even a single amino acid residue in highly conserved catalytic regions will likely result in disruption of protein function, explaining why these conserved regions show a much more robust dropout phenotype compared with nonessential regions.
These findings are consistent with recent findings by Vakoc and colleagues (6) and imply that for genes of unknown function or with multiple known functions, the phenotypic strength of sgRNA targeting different regions could help pinpoint which domains are most essential for cancer cell growth.
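The frameshift versus in-frame distinction drawn above reduces to whether the net change in coding sequence length is a multiple of three. A minimal sketch, with the function name and sign convention chosen here for illustration:

```python
def indel_class(length_change):
    """Classify an indel by its net length change in nucleotides.

    Negative values are deletions and positive values are insertions.
    """
    if length_change == 0:
        return "no indel"
    # A multiple of 3 removes or adds whole codons and preserves the reading frame.
    return "in-frame" if length_change % 3 == 0 else "frameshift"
```

For example, a 3-nucleotide deletion removes a single amino acid and is in-frame, whereas a 1-nucleotide deletion shifts the reading frame for the rest of the transcript.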
Collectively, our study demonstrates the power of CRISPR-based dropout screens toward identifying cancer-selective vulnerabilities, but also highlights important caveats for the interrogation of genes in amplified regions. Moreover, our results suggest that the frequently used sgRNA design strategies that predominantly target the most 5′ coding regions of genes may be suboptimal. Instead, our data indicate that targeting the most highly conserved regions of a gene may yield a more robust dropout phenotype and thus maximize screen performance. Together, the findings described in this study provide a roadmap toward the systematic elucidation of cancer dependencies using CRISPR-based screening approaches.
Methods
Cell Culture, RNA-seq, and Copy-Number Variation
Cell lines were purchased from the ATCC, the RIKEN cell bank, or NCI/DCTC in June 2008 and were grown in either DMEM or RPMI supplemented with 10% FBS (Thermo Scientific). Cell lines were authenticated prior to the screens by SNP genotyping on the Fluidigm BioMark platform with a panel of 48 SNPs (Fluidigm). DNA copy number was measured using high-density SNP arrays (Affymetrix SNP 6.0; ref. 28). The RNA-seq data were acquired from the Cancer Cell Line Encyclopedia (CCLE) from the Broad Institute, where large insert non–strand-specific RNA sequencing was performed using a large-scale, automated variant of the Illumina TruSeq protocol. Oligo dT beads are used to select mRNA from the total RNA sample (200 ng). The selected RNA is then heat fragmented and randomly primed before cDNA synthesis from the RNA template. The resultant cDNA then goes through Illumina library preparation (end repair, base “A” addition, adapter ligation, and enrichment) using Broad-designed indexed adapters for multiplexing. After enrichment, the samples are quantified by qPCR and pooled in equimolar amounts before Illumina sequencing, which is performed on the Illumina HiSeq 2000 or HiSeq 2500 with sequence coverage to 100M paired reads.
Vectors and CAS9 Cell Line Generation
To construct the lentiviral CAS9 vector, a human optimized 3FlagSPy-Cas9 was cloned into pLenti 6 (Thermo Scientific). Cell lines expressing CAS9 were generated by lentiviral transduction of the pLenti6-3flagSPyCAS9 vector. Positive populations were selected using Blasticidin S (Thermo Scientific). CAS9 expression was measured by flow cytometry. Cells (2 × 10⁶) were fixed with 1% paraformaldehyde (Electron Microscopy Sciences) and ice-cold methanol (Fisher Scientific), permeabilized with 0.2% Triton-X (Sigma-Aldrich), and stained using an antibody against Cas9 at a dilution of 1:200 (Cell Signaling Technology).
The shRNA library was constructed by Cellecta Inc. and can be acquired using the library ID numbers 27K-BGP2-MS-NOVA, 13K-hTF-GH-NOVA, 13K-hYAP-GH-NOVA, and 13K-hEPI2-GH-NOVA. The sgRNA libraries were designed as previously described (7). A modified tracrRNA scaffold (29) for Cas9 loading was cloned into the sgRNA vectors before cloning of the guide RNAs. Each library targets approximately 2,700 genes and comprises 20 shRNAs or sgRNAs per gene (Supplementary Tables S2 and S3). For the tiling library, all possible sgRNAs (based on the presence of a PAM motif) against 157 genes were identified (Supplementary Table S4). Oligonucleotides were synthesized on a 92k array (Custom Array Inc.), amplified by PCR, and cloned into the BbsI restriction sites of the lentiviral U6 sgRNA expression vector using Golden Gate assembly (30). For all proliferation assays and next-generation sequencing, individual shRNAs or sgRNAs were cloned into an inducible U6 shRNA- or sgRNA-expressing vector using the restriction enzyme BbsI or AarI.
CRISPR Guide Selection
RefSeq (downloaded on January 5, 2015) was used as the gene model for guide design. All potential 20-mer guides with a predicted cut site within an exon or within 10 base pairs of the exon–intron boundary were included as potential guides. Guides were annotated with sequence properties (e.g., GC percentage, sequence degeneracy, Doench-root), mapping properties (e.g., 20-mer sequence uniqueness in the human genome, whether there are known overlapping SNPs or variants observed in any cell lines in the Novartis-Broad CCLE), and gene and expressed properties (e.g., overlapping protein domains).
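The guide enumeration step can be sketched as follows. This is a minimal illustration rather than the pipeline used here: it assumes an SpCas9 NGG PAM with the cut site 3 bp upstream of the PAM, treats the exon as a half-open interval in the coordinates of the input sequence, and uses hypothetical function names and toy sequences.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def candidate_guides(seq: str, exon_start: int, exon_end: int, flank: int = 10):
    """List (guide, strand, cut_index) for 20-mers followed by an NGG PAM.

    cut_index is the 0-based index of the first base 3' of the predicted cut
    on the forward strand. Guides are kept when the cut lies within the exon
    or within `flank` bp of its boundaries (assumed definition).
    """
    seq = seq.upper()
    guides = []
    for i in range(len(seq) - 22):
        window = seq[i:i + 23]
        # Forward strand: 20-nt protospacer followed by NGG.
        if re.fullmatch(r"[ACGT]{21}GG", window):
            cut = i + 17
            if exon_start - flank <= cut <= exon_end + flank:
                guides.append((window[:20], "+", cut))
        # Reverse strand: CCN followed by the protospacer on the forward strand.
        if re.fullmatch(r"CC[ACGT]{21}", window):
            cut = i + 6
            if exon_start - flank <= cut <= exon_end + flank:
                guides.append((reverse_complement(window[3:]), "-", cut))
    return guides


# Toy sequences, not real genomic loci.
forward_example = candidate_guides("A" * 20 + "TGG" + "A" * 10, 0, 33)
reverse_example = candidate_guides("CCA" + "T" * 20, 0, 23)
```

Annotation with GC percentage, genome uniqueness, or overlapping SNPs would then be applied to each returned guide.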
Rather than choosing guides based on transcript or gene, genetic features were first grouped. In particular, transcript isoforms that shared at least 50% of potential guides were combined into a single meta-transcript, and guides were then chosen to target all isoforms in that meta-transcript.
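One way to picture the meta-transcript grouping is as a clustering of isoforms by guide overlap. The sketch below is an assumption-laden illustration: it measures the shared fraction relative to the isoform with fewer guides and merges isoforms transitively, neither of which is specified in the text, and all names and data are hypothetical.

```python
from itertools import combinations


def group_isoforms(guides_by_transcript, min_shared=0.5):
    """Merge isoforms sharing >= min_shared of their potential guides.

    The shared fraction is taken relative to the smaller guide set, and
    merging is transitive (union-find); both choices are assumptions.
    """
    parent = {t: t for t in guides_by_transcript}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b in combinations(guides_by_transcript, 2):
        ga, gb = guides_by_transcript[a], guides_by_transcript[b]
        smaller = min(len(ga), len(gb))
        if smaller and len(ga & gb) / smaller >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for t in guides_by_transcript:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical isoforms and guide identifiers.
toy = {
    "tx1": {"g1", "g2", "g3", "g4"},
    "tx2": {"g3", "g4", "g5"},
    "tx3": {"g9", "g10"},
}
meta_transcripts = group_isoforms(toy)
```

Here tx1 and tx2 share two of the three guides of the smaller isoform and are merged, while tx3 remains its own meta-transcript.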
Pooled Screening
For all screens, cells were infected with lentiviral shRNA or sgRNA pools at a representation of 1,000 cells per shRNA or sgRNA and a multiplicity of infection of 0.5. Cells were selected for 4 days in the presence of puromycin, and a reference sample was collected 72 hours after selection to ensure adequate selection/representation. Cells were propagated for a total of 14 days with an average shRNA/sgRNA representation of ≥1,000 maintained at each passage. Cells (100 million) were harvested for DNA extraction with the Qiagen QIAamp Blood Maxi Kit. shRNAs and sgRNAs were PCR amplified from 100 μg of genomic DNA, and PCR fragments of 260–280 bp were purified using Agencourt AMPure XP beads (Beckman). The resulting fragments were sequenced on a HiSeq 2500 (Illumina) with a single-end 50 bp run. Sequencing reads were aligned to the shRNA or sgRNA library, and the enrichment or loss of individual bar codes or sgRNAs was quantified.
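The scale of infection implied by these parameters can be estimated with a short calculation. This sketch assumes that integrations per cell follow a Poisson distribution with mean equal to the multiplicity of infection, which the text does not state; the library size of roughly 54,000 constructs follows from approximately 2,700 genes with 20 reagents each, and the function names are hypothetical.

```python
import math


def cells_to_infect(n_constructs: int, coverage: int, moi: float) -> int:
    """Cells to expose to virus so that transduced cells reach the coverage.

    Assumes Poisson-distributed integrations, so the transduced fraction
    is 1 - exp(-moi).
    """
    frac_transduced = 1.0 - math.exp(-moi)
    return math.ceil(n_constructs * coverage / frac_transduced)


def single_integrant_fraction(moi: float) -> float:
    """Fraction of transduced cells carrying exactly one integration."""
    return moi * math.exp(-moi) / (1.0 - math.exp(-moi))


library_size = 2_700 * 20  # approximate, from the library description
needed = cells_to_infect(library_size, coverage=1_000, moi=0.5)
single = single_integrant_fraction(0.5)
```

Under these assumptions, on the order of 1.4 × 10⁸ cells would be exposed to virus, and roughly three quarters of the transduced cells would carry a single construct.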
Data Processing
For each sample, the total number of read counts was normalized to 50 × 10⁶, and 50 pseudo-counts were added to each shRNA or sgRNA to minimize false positives in the low-abundance tail of the library distribution, where counts are unreliable. For all samples, day 14 log2 ratios were calculated for each sgRNA/shRNA relative to plasmid counts, and shRNAs or sgRNAs whose abundance was significantly different from the mean were identified using a z-score. The average z-score integrates the information from multiple shRNAs or sgRNAs targeting a single gene, thus reflecting the consistency of their effects and minimizing the impact of possible off-target effects.
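The processing steps above can be sketched with pandas as follows. This is an illustrative outline, not the code used in the study: the column names `plasmid` and `day14`, the order of normalization and pseudo-count addition, and the use of the population standard deviation for the z-score are assumptions.

```python
import numpy as np
import pandas as pd


def gene_zscores(counts: pd.DataFrame, genes: pd.Series,
                 target_depth: float = 50e6, pseudo: float = 50) -> pd.Series:
    """Average per-gene z-score of day 14 versus plasmid log2 ratios.

    counts: one row per shRNA/sgRNA with columns "plasmid" and "day14".
    genes:  gene label for each reagent, indexed like `counts`.
    """
    # Scale each sample to the same total depth, then add pseudo-counts.
    norm = counts / counts.sum(axis=0) * target_depth + pseudo
    # log2 ratio of day 14 abundance relative to the plasmid library.
    lfc = np.log2(norm["day14"] / norm["plasmid"])
    # z-score of each reagent against the whole library distribution.
    z = (lfc - lfc.mean()) / lfc.std(ddof=0)
    # Average the reagents that target the same gene.
    return z.groupby(genes).mean()


# Made-up counts for two genes with two reagents each.
reagents = ["A_1", "A_2", "B_1", "B_2"]
toy_counts = pd.DataFrame(
    {"plasmid": [100, 100, 100, 100], "day14": [10, 10, 100, 100]},
    index=reagents,
)
toy_genes = pd.Series(["A", "A", "B", "B"], index=reagents)
toy_result = gene_zscores(toy_counts, toy_genes)
```

In this toy case the reagents against gene A are depleted relative to those against gene B, so gene A receives the negative average z-score.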
Statistical Analysis
All statistics were computed using Python with SciPy and pandas. P values were calculated using the Mann–Whitney–Wilcoxon rank-sum test as implemented in SciPy’s scipy.stats.mannwhitneyu function. Correlation coefficients between z-scores for DLD1 and RKO were calculated by Pearson correlation, whereas the correlation between z-scores and vertebrate sequence conservation was calculated using Spearman correlation.
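The three tests named above map onto standard SciPy calls, sketched below. The two-sided alternative for the rank-sum test is an assumption, the wrapper function names are hypothetical, and the numbers are made up for illustration.

```python
from scipy import stats


def rank_sum_pvalue(group_a, group_b) -> float:
    """Mann-Whitney-Wilcoxon rank-sum P value (two-sided, assumed)."""
    _, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return p


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient, e.g. z-scores in two cell lines."""
    r, _ = stats.pearsonr(x, y)
    return r


def spearman_rho(x, y) -> float:
    """Spearman rank correlation, e.g. z-score versus conservation."""
    rho, _ = stats.spearmanr(x, y)
    return rho


# Made-up z-scores for two groups of genes.
group_one_z = [-3.1, -2.7, -2.2, -1.9, -1.5]
group_two_z = [-0.4, 0.1, 0.3, 0.6, 0.9]
p_value = rank_sum_pvalue(group_one_z, group_two_z)
```

Pearson correlation measures linear agreement between two sets of z-scores, whereas Spearman correlation depends only on rank order, which suits a comparison between z-scores and a conservation score measured on a different scale.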
Sequencing Analysis
Cells were stably transfected with individual sgRNAs against CTNNB1. After selection, cells were cultured in the presence or absence of doxycycline and collected for DNA extraction after 4 days in culture. Target regions were amplified by the locus-specific primer pairs shown in Supplementary Table S5. Amplicons were pooled in equimolar amounts, and libraries were generated on the Illumina NeoPrep using the TruSeq Nano protocol and sequenced on the MiSeq (Illumina). The indel frequency was calculated as previously described (8, 31).
Western Blot Analysis
Protein extracts were separated by SDS-PAGE, transferred onto PVDF membranes, and probed with antibodies against CDK9 (clone C12F7; Cell Signaling Technology), actin (clone AC-74; Sigma), pH2AX (Ser139; clone JBW301; Millipore), and tubulin (clone DM1A; Sigma). Proteins of interest were detected with horseradish peroxidase–conjugated sheep anti-mouse and sheep anti-rabbit IgG antibodies (1:2,500; Bio-Rad) and visualized with the Pierce ECL Western Blotting Substrate (Thermo Scientific), according to the provided protocol.
Proliferation Assays
Cells stably expressing dox-inducible shRNAs or sgRNAs against β-catenin, NRAS, and PLK1 (Supplementary Table S3) were used for ATP-based measurements of cellular proliferation by plating 1,300 cells per well in 96-well plates in three biological replicates. After 6 days, 100 μL of CellTiter-Glo reagent (Promega) was added to each well and mixed for 30 minutes, after which the luminescence was measured on the SpectraMax M5 Luminometer (Molecular Devices). P values were determined by one-tailed Student t test.
Proliferation was also measured using live-cell time-lapse imaging. Cells were harvested by trypsinization, counted on a Countess automated cell counter (Invitrogen), and plated at 130 cells per well on 96-well tissue culture plates in 3 replicates. Photomicrographs were taken every 6 hours using an IncuCyte live-cell imager (Essen Bioscience), and confluence of the cultures was measured using IncuCyte software (Essen Bioscience) over 160 hours in culture. P values were determined by one-tailed Student t test.
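A one-tailed Student t test of the kind used for these assays can be sketched with SciPy. The direction of the alternative (treated lower than control) and the equal-variance form are assumptions, the `alternative` argument requires SciPy 1.6 or later, and the readings are invented.

```python
from scipy import stats


def one_tailed_ttest(treated, control) -> float:
    """P value for the alternative mean(treated) < mean(control).

    Uses the classical Student t test with equal variances assumed.
    """
    result = stats.ttest_ind(treated, control, equal_var=True,
                             alternative="less")
    return result.pvalue


# Invented luminescence readings from three replicate wells per condition.
treated_wells = [100.0, 110.0, 105.0]
control_wells = [1000.0, 990.0, 1010.0]
p_reduced = one_tailed_ttest(treated_wells, control_wells)
```

A one-tailed test is appropriate only when the direction of the expected effect, here reduced proliferation, is fixed before the data are examined.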
Cell Cycle and Annexin/Fixable Viability Dye Assays
Cells were analyzed for phosphatidylserine exposure by Annexin-V PerCP-eFluor 710/Fixable Viability Dye eFluor 780 (eBioscience) double staining according to the provided protocol. Cells stably transfected with sgRNAs against MMP7, MMP20, ANGPTL5, VEGFA, and OR4F5 (Supplementary Table S3) were analyzed after 6 days in culture. A minimum of 10,000 cells were collected with a FACSCanto (BD Pharmingen) and analyzed with FlowJo (Tree Star).
For cell-cycle analysis, cells were plated in 6-well plates and analyzed 6 days after stable transfection of the sgRNAs mentioned above. Cells were harvested, fixed with 70% ethanol, and stained with a solution containing 1% Triton X-100 (Sigma) and 1 μg/mL DAPI (Invitrogen) in PBS for 30 minutes. A minimum of 10,000 cells were collected with an LSRFortessa (BD Pharmingen) and analyzed with FlowJo (Tree Star). The Watson (Pragmatic) model was used for cell-cycle and apoptotic peak modeling.
Disclosure of Potential Conflicts of Interest
P.J. Cassiani is a scientist at KSQ Therapeutics. N. Keen reports receiving a commercial research grant from Novartis and has ownership interest (including patents) in the same. W.R. Sellers has ownership interest (including patents) in Novartis. F. Stegmeier is Chief Scientific Officer at KSQ Therapeutics. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: D.M. Munoz, F. Hofmann, T. Schmelzle, F. Stegmeier, M.R. Schlabach
Development of methodology: E. Billy, J.M. Korn, M.D. Jones, J. Golji, D.A. Ruddy, M.R. Schlabach
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D.M. Munoz, P.J. Cassiani, L. Li, E. Billy, D.A. Ruddy, K. Yu, D. Abramowski, J. Wan, O. Weber
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D.M. Munoz, J.M. Korn, M.D. Jones, J. Golji, G. McAllister, A. DeWeck, M.D. Shirley, A. Kauffmann, E.R. McDonald III, M.R. Schlabach
Writing, review, and/or revision of the manuscript: D.M. Munoz, E. Billy, J.M. Korn, M.D. Jones, J. Golji, A. Kauffmann, E.R. McDonald III, N. Keen, F. Hofmann, W.R. Sellers, T. Schmelzle, F. Stegmeier, M.R. Schlabach
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): K. Yu, D. Abramowski, S.Y. Neshat, D. Rakiec, R. de Beaumont, M.R. Schlabach
Study supervision: E.R. McDonald III, F. Hofmann, W.R. Sellers, T. Schmelzle, F. Stegmeier, M.R. Schlabach
Other (NGS support): K. Yu