Abstract
Small-molecule drugs have enabled the practice of precision oncology for genetically defined patient populations since the first approval of imatinib in 2001. Scientific and technology advances over this 20-year period have driven the evolution of cancer biology, medicinal chemistry, and data science. Collectively, these advances provide tools to more consistently design best-in-class small-molecule drugs against known, previously undruggable, and novel cancer targets. The integration of these tools and their customization in the hands of skilled drug hunters will be necessary to enable the discovery of transformational therapies for patients across a wider spectrum of cancers.
Target-centric small-molecule drug discovery necessitates the consideration of multiple approaches to identify chemical matter that can be optimized into drug candidates. To do this successfully and consistently, drug hunters require a comprehensive toolbox to avoid following the “law of the instrument,” or Maslow's hammer, in which only one tool is applied regardless of the requirements of the task. Combining our ever-increasing understanding of cancer and cancer targets with the technological advances in drug discovery described below will accelerate the next generation of small-molecule drugs in oncology.
INTRODUCTION
The approval of imatinib to treat chronic myelogenous leukemia in 2001 marked the beginning of the small-molecule precision oncology era. In subsequent years, more than 90 small-molecule targeted therapies have been approved to treat various cancers (1). Developing targeted therapies requires a hypothesis-driven mechanistic framework that contrasts with the decades-old empirical approaches used to develop cytotoxic chemotherapies. Patient selection is based on molecular markers and directs therapies to the patients most likely to benefit, and pharmacodynamic biomarkers provide insight into target modulation, allowing a link to be drawn between the mechanism of action and efficacy. Establishing a pharmacokinetic/pharmacodynamic/efficacy relationship in appropriate preclinical models builds confidence in target-mediated efficacy and provides thresholds for pharmacodynamic modulation that can be used as criteria for compound optimization. Furthermore, applying correction factors for plasma protein binding across species allows one to predict drug concentrations required for target engagement in humans that can be used for human dose projection and then tested using pharmacodynamic assays during dose escalation in phase I trials (2). Following this framework provides a mechanistic link between the drug and target, allowing the therapeutic hypothesis to be tested, and provides confidence to advance a drug to later-stage clinical development to confirm efficacy (reviewed in refs. 3, 4).
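The plasma protein binding correction mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used free-drug hypothesis, in which only the unbound fraction of drug in plasma is available to engage the target, so that matching unbound concentrations across species is a reasonable first approximation. The function names and all numerical values are hypothetical and are not taken from the cited work.

```python
def unbound_concentration(total_conc_nM: float, fraction_unbound: float) -> float:
    """Unbound (free) plasma concentration from total concentration."""
    if not 0.0 < fraction_unbound <= 1.0:
        raise ValueError("fraction_unbound must be in (0, 1]")
    return total_conc_nM * fraction_unbound


def projected_human_total(mouse_total_nM: float, fu_mouse: float, fu_human: float) -> float:
    """Total human plasma concentration that gives the same unbound
    concentration as the efficacious total concentration in mouse."""
    free_target_nM = unbound_concentration(mouse_total_nM, fu_mouse)
    if not 0.0 < fu_human <= 1.0:
        raise ValueError("fu_human must be in (0, 1]")
    return free_target_nM / fu_human


# Hypothetical inputs: efficacious total exposure in mouse and measured
# unbound fractions in mouse and human plasma.
mouse_total_nM = 1000.0
fu_mouse = 0.05
fu_human = 0.02

human_total_nM = projected_human_total(mouse_total_nM, fu_mouse, fu_human)
print(f"Projected human total concentration: {human_total_nM:.0f} nM")
```

In this toy case the compound is more highly bound in human plasma than in mouse plasma, so a higher total concentration is projected in humans to reach the same unbound exposure. A real dose projection would also account for clearance, bioavailability, and the pharmacodynamic threshold established preclinically.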
In spite of the progress made in developing targeted therapies, only ∼7% of patients derive benefit (5). Developing effective therapies for broader patient populations, targeting this white space of medical need, will require adherence to the principles learned regarding target selection, the importance of potency and selectivity, and addressing resistance. New insights gained in cancer biology have informed novel target identification, and the evolution of medicinal chemistry and data science has expanded the drug hunter's toolbox to support the development of potent, selective drugs (Fig. 1). Although immuno-oncology has emerged as a critical area in cancer treatment and drug discovery, there are still many lessons to be learned around target selection and translation to the clinic, and it will therefore not be covered here. This review will provide an overview of the major learnings with select examples and highlight recent advances in technologies used in small-molecule drug discovery that will be needed to deliver precision medicines to a wider population of patients with cancer.
Figure 1. Scientific and technical advances in biology, chemistry, and data science over the past two decades have driven the development of novel first-in-class drugs and the evolution of best-in-class drugs in oncology. ctDNA, circulating tumor DNA; DepMap, Cancer Dependency Map; DL, deep learning; FEP, free energy perturbation; GEMM, genetically engineered mouse model; LTS MD, long time-scale molecular dynamics; PDB, Protein Data Bank; PROTAC, proteolysis-targeting chimeras. Herceptin is manufactured by Genentech, Gleevec by Novartis, Iressa by AstraZeneca, Zelboraf by Genentech, Xalkori by Pfizer, Imbruvica by Pharmacyclics/AbbVie and Janssen, Zykadia by Novartis, Tagrisso by AstraZeneca, Vitrakvi by Bayer, Lorbrena by Pfizer, and Lumakras by Amgen.
ADVANCES OVER 20 YEARS IN SCIENCE AND TECHNOLOGY ACROSS CANCER BIOLOGY, CHEMISTRY, AND DATA SCIENCE
Cancer Biology
Since the discovery of the first human oncogenes and tumor suppressor genes in the 1980s, the amount of available information on the genetic drivers of cancer has exploded. Completion of the Human Genome Project in 2003 delivered the first nearly complete sequence of the human genome. This provided scientists with a normal reference that allowed the comparison of DNA sequences in cancers to the normal DNA sequence, dramatically improving our ability to identify the recurrent genetic alterations that contribute to tumor initiation and progression. Improvements in DNA sequencing technologies, including the initial introduction of massively parallel sequencing by Roche 454 and Illumina in 2004 to 2006, followed shortly by technologies for DNA sequence enrichment that allowed focused sequencing of specific regions of interest, made genome-scale and whole-exome sequencing (WES) of normal and cancer samples feasible (6).
Armed with these next-generation sequencing (NGS) technologies, researchers initiated large-scale efforts to profile the molecular alterations present in patient tumors and cancer cell lines. The Cancer Genome Atlas (TCGA) project, a joint effort between the National Cancer Institute and the National Human Genome Research Institute, was launched in 2006 with “the aim of obtaining a comprehensive understanding of the genomic alterations that underlie all major cancers.” Similar efforts were initiated in the United Kingdom as the Cancer Genome Project (7). Shortly thereafter, the International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer profiling projects being conducted in numerous countries around the world (8). These efforts were accompanied by the development of novel computational and statistical approaches to distinguish functional genetic variants and candidate driver genes from the numerous passenger mutations that accumulate in cancer cells (9, 10).
These profiling projects have expanded beyond cataloging genomic changes to characterize transcriptomic, proteomic, and epigenomic alterations as well. To date, these efforts have profiled tens of thousands of patient samples across more than 30 different tumor types. This has led to the systematic identification and characterization of diverse types of genetic alterations including substitutions, indels, fusions, copy-number alterations, complex structural variations, and somatic driver mutations in noncoding regions, as well as recurrent changes in mRNA splicing, chromatin architecture, and cancer-associated proteoforms (9, 11–13).
In addition to reconfirming the prevalence of mutations in known oncogenes and tumor suppressor genes, such as KRAS, TP53, PTEN, PIK3CA, and EGFR, these sequencing efforts identified many new candidate driver genes and provided insights into novel oncologic processes. Positive selection for genetic alterations in transcriptional regulators, chromatin modifiers, metabolic pathway genes, and components of the spliceosome suggests many potential therapeutic targets beyond the oncogenic kinases and their regulators that were targeted by the first generation of precision oncology drugs (14, 15).
Numerous Web-based portals and visualization tools have been developed that allow the broader scientific community to access and analyze the vast amounts of “omics” data and associated clinical and biological data that have been generated through these large-scale profiling efforts (Table 1).
Table 1. Databases and visualization tools for molecular characterization of human tumors and tumor cell lines
Database | Link | Types of data/analyses |
---|---|---|
cBioPortal | https://www.cbioportal.org/ | Mutations, putative CNVs; mRNA expression, protein/phosphoprotein level; survival analyses (173, 174) |
COSMIC | https://cancer.sanger.ac.uk/cosmic | Curated somatic mutations across tumors and cell lines |
ICGC Data Portal | https://dcc.icgc.org/ | Somatic mutations, somatic CNVs, somatic structural variants, germline mutations, DNA methylation, gene/protein expression, miRNA expression, exon junction; epidemiologic and clinical data |
UCSC Genome Browser | https://genome.ucsc.edu/ | Mutations, CNVs, mRNA and miRNA expression, splice variants, DNA methylation, protein expression, clinical data |
Genomic Data Commons | https://gdc.cancer.gov/ | Mutations, CNVs, mRNA and miRNA expression, structural variants, splice variants, DNA methylation, protein expression |
FireBrowse | http://firebrowse.org/ | Interface for analyzing TCGA data |
OncoKB | https://www.oncokb.org/ | Mutations, CNVs, fusions (175) |
DepMap | https://depmap.org/portal/ | Genetic loss-of-function screening, pharmacologic dependencies, CCLE omics characterizations |
canSAR.ai | https://cansar.ai/ | Integrates biology, chemistry, pharmacology, structural biology, cellular networks and clinical annotations, and applies machine learning approaches to develop predictions useful in drug discovery (117) |
Abbreviations: CCLE, Cancer Cell Line Encyclopedia; CNV, copy-number variation; DepMap, Cancer Dependency Map.
Large-scale sequencing efforts have revealed hundreds of potential driver genes, each with numerous different coding variants. Elucidation of the functional role of these genes in cancer and the phenotypic consequences of specific genetic alterations requires experimental manipulation in relevant model systems. In the past few decades, several new types of cancer models have been developed that recapitulate key features of tumor heterogeneity and the microenvironment, including but not limited to genetically engineered mouse models, patient-derived xenograft models, and patient-derived organoid models. However, cancer cell lines remain the workhorse model system for studying cancer biology and characterizing the effects of genetic and pharmacologic perturbations due to their scalability and ease of manipulation.
The Cancer Cell Line Encyclopedia (CCLE) project began in 2008 as a collaboration between the Broad Institute and the Novartis Institutes for BioMedical Research to comprehensively characterize the molecular features of a large panel of human cell lines. This collaboration was later joined by the MD Anderson Cancer Center and Harvard Medical School. Since its initiation, this effort has profiled over 1,000 cell lines from more than 30 cancer lineages at the genomic, transcriptomic, proteomic, and metabolic levels. All, or a significant subset of the lines, have been profiled for whole-genome sequencing, WES, mRNA expression, RNA splicing, microRNA expression, DNA methylation, histone modifications, reverse-phase protein array, and metabolites (16, 17). In 2018, the CCLE project became part of the Cancer Dependency Map (DepMap) program (discussed below); CCLE profiling data are available through the DepMap portal. The Wellcome Sanger Institute's Catalogue of Somatic Mutations in Cancer (COSMIC) database is another useful resource with large-scale genomic data on tumors and tumor cell lines (Table 1).
Coupled with this extensive panel of well-characterized cell line models, the advent and adoption of a variety of functional genomics tools provided new approaches to characterize the biological role of the many cancer-related genes identified through these large profiling efforts. In the early 2000s, RNAi became a valuable and widely used tool to silence gene expression, thus allowing scientists to assess the function of any gene of interest in nearly any cell type. The use of siRNA and short hairpin RNA (shRNA) technologies to characterize the phenotypic consequences of knockdown of individual genes in cancer cell lines became the standard approach for evaluating the functions of candidate oncogenes and tumor suppressor genes. Genome-wide barcode shRNA screens coupled to NGS became widely used for the identification of new therapeutic targets in cancer (18–20). However, it soon became understood that the use of RNAi approaches resulted in the silencing of numerous unintended (off-target) transcripts due to seed region sequence complementarity, leading to many false positives when assessing cancer dependencies in pooled screens and on an individual gene basis (21). To overcome this limitation, researchers have made improvements in the promoter and microRNA context for shRNA expression and incorporated increased numbers of shRNA sequences per gene in large-scale screens (22–24).
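The seed-mediated off-target effect described above can be made concrete with a short Python sketch. It assumes the conventional definition of the seed as nucleotides 2 to 8 of the guide strand (the text does not specify the positions) and simply searches transcripts for sites complementary to that seed. The guide and transcript sequences are hypothetical toy examples, and real off-target prediction tools consider additional features such as site context and pairing outside the seed.

```python
# Watson-Crick complement table for RNA
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(guide: str, transcript: str, start: int = 2, end: int = 8) -> list[int]:
    """Return 0-based positions in `transcript` that are perfectly
    complementary to the guide seed (guide nucleotides start..end, 1-based)."""
    seed = guide[start - 1:end]
    site = reverse_complement(seed)
    positions = []
    pos = transcript.find(site)
    while pos != -1:
        positions.append(pos)
        pos = transcript.find(site, pos + 1)
    return positions


# Hypothetical guide strand and two toy 3' UTR fragments
guide = "UGAGGUAGUAGGUUGUAUAGU"
transcripts = {
    "UTR_A": "AAGGCUACCUCAUUGC",    # contains a seed-complementary site
    "UTR_B": "GGAUCCGAUUACGGAUCC",  # no seed-complementary site
}

hits = {name: seed_sites(guide, seq) for name, seq in transcripts.items()}
for name, positions in hits.items():
    print(name, positions)
```

Because a seed match requires only seven complementary nucleotides, such sites occur by chance in many transcripts, which is consistent with the large number of unintended targets noted for RNAi reagents.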
In many instances, the use of “dirty” validation tools such as early RNAi modalities and nonselective tool compounds led to significant resources being deployed against the wrong targets. One example of an incorrectly validated target is the Maternal Embryonic Leucine Zipper Kinase (MELK), in which inhibition by RNAi and promiscuous kinase inhibitors showed strong effects on the viability of triple-negative breast and other cancer cell lines (25–28). On the basis of numerous publications on this dependency, several companies developed MELK inhibitors. However, later characterization using CRISPR–Cas9 knockout and selective MELK inhibitors clearly showed that MELK activity is not required for cell proliferation, survival, or stress tolerance (29, 30). The precise role of MELK in cancer is still being explored. This highlights the importance of using selective inhibitors for target validation.
As the use of RNAi screening was exploding, scientists were characterizing the prokaryotic CRISPR–Cas system and developing tools for CRISPR–Cas9-mediated genome editing in mammalian cells (31–34). Although initially used to drive gene knockout, it was soon determined that this system could be exploited to knock in specific genetic alterations by providing a DNA sequence that the cell can use as a repair template to drive homology-directed repair, thus providing a means to determine the functional consequence of individual coding sequence variants in oncogenes and tumor suppressor genes. There are still instances where CRISPR-mediated editing has unintended “off-target” effects. For instance, CRISPR targeting of amplified genes can lead to viability effects resulting from excessive DNA damage rather than loss of gene function (35, 36). Nevertheless, CRISPR technology has proven to have fewer overall off-target effects than RNAi and has revolutionized how cancer biologists approach target identification and validation.
To explore the role of every gene in nearly every cancer type, scientists at the Broad Institute (Project Achilles) and the Wellcome Sanger Institute (Project Score) initiated genome-wide CRISPR-Cas9 (and, previously, RNAi) loss-of-function screens in hundreds of molecularly characterized cell lines to systematically identify genotype-specific selective dependencies. The two institutes entered a strategic collaboration to accelerate these efforts, known as the DepMap, which has produced an integrated genome-wide screening dataset spanning more than 900 cell lines (37–39). This project also includes a large-scale drug-sensitivity profiling project, PRISM, that utilizes a molecular barcoding method to pool cell lines in order to rapidly profile the viability effects of thousands of compounds across hundreds of cell lines (40). As of the summer of 2018, the CCLE also became part of the Broad DepMap Project. The DepMap portal (Table 1; Fig. 2), which is publicly accessible, integrates datasets from all of these screening efforts and the CCLE. DepMap has become a fundamental resource used by academia and industry alike for further hypothesis-driven target validation and targeted drug discovery efforts.
Figure 2. Databases and visualization tools for molecular characterization of human tumors and tumor cell lines. DepMap visualization for the PIK3CA gene indicating that cell lines with functional/activating PIK3CA mutations are dependent on PIK3CA for proliferation (16, 176). RNA-seq, RNA sequencing; WGS, whole-genome sequencing; WT, wild-type.
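The kind of genotype-selective dependency illustrated for PIK3CA can be sketched in a few lines of Python. The example assumes DepMap-style gene-effect scores, in which more negative values indicate a stronger loss-of-viability effect upon knockout, and compares the mean score in mutant versus wild-type cell lines. The cell line names and scores are invented for illustration, and the function is a hypothetical helper rather than part of any DepMap tool.

```python
from statistics import mean


def selective_dependency(gene_effect: dict[str, float], mutant_lines: set[str]) -> float:
    """Mean gene-effect score in mutant lines minus mean in wild-type lines.
    A negative value suggests that mutant lines are more dependent on the gene."""
    mutant = [score for line, score in gene_effect.items() if line in mutant_lines]
    wild_type = [score for line, score in gene_effect.items() if line not in mutant_lines]
    if not mutant or not wild_type:
        raise ValueError("need at least one mutant and one wild-type cell line")
    return mean(mutant) - mean(wild_type)


# Hypothetical gene-effect scores for one gene across six cell lines
gene_effect = {
    "LINE_A": -1.0, "LINE_B": -0.8, "LINE_C": -1.2,  # mutant lines
    "LINE_D": -0.1, "LINE_E": 0.0, "LINE_F": -0.2,   # wild-type lines
}
mutant_lines = {"LINE_A", "LINE_B", "LINE_C"}

delta = selective_dependency(gene_effect, mutant_lines)
print(f"Difference in mean gene effect (mutant - WT): {delta:.2f}")
```

A full analysis would apply a statistical test across hundreds of cell lines and correct for multiple comparisons; the difference in means is used here only to convey the idea of a selective dependency.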
Pairing tumor genetic data with functional genomics data from DepMap and other large-scale screens has contributed to the development of several novel therapies that are nearing or have entered clinical trials. The foundation for this approach has its origins in the identification of mutant BRAF as a validated oncogene. Activating BRAF mutations were first identified through sequencing tumor cell lines followed by functional validation in vitro and confirmation in a range of primary human tumor samples (41). This discovery invigorated efforts to develop selective BRAF inhibitors, which have transformed the treatment landscape in metastatic melanoma, and also provided useful tools to gain insights into the mechanisms of activation of various BRAF mutations (42). Today, there are three BRAF inhibitors approved to treat a range of tumors with BRAFV600 mutations, including vemurafenib, dabrafenib, and encorafenib.
The elucidation of synthetic lethal relationships has also been a major advancement coming out of the pairing of sequencing and functional genomics efforts and has led to many novel therapeutic targets. The vulnerability of MTAP-deleted cancer cells to inhibition of PRMT5 and other enzymes that support its function was first discovered through Project Achilles and Project DRIVE shRNA screening data (43, 44). MTAP is a key enzyme in the methionine salvage pathway, and the MTAP gene is frequently deleted in cancers due to its proximity to the CDKN2A tumor suppressor gene. These initial findings paved the way for deeper exploration of susceptibilities conferred by MTAP deletion (45, 46), and there are now several clinical trials underway in patients with MTAP-deleted cancer for agents targeting PRMT5 and its supporting enzymes, including the MTA-cooperative PRMT5 inhibitors AMG-193, TNG462, and MRTX1719, and the MAT2A inhibitors IDE397 and AG-270 (47).
Importantly, synthetic lethality has opened the door to selective targeting of certain tumor suppressor gene mutations. PARP inhibitors provided the first clinical validation of this approach when the discovery of the synthetic dependency on PARP1 in BRCA1/2-deficient cancers led to the development of a number of PARP inhibitors, namely, olaparib, niraparib, rucaparib, and talazoparib. These medicines have revolutionized the treatment of BRCA-mutant ovarian, breast, prostate, and pancreatic cancers (reviewed in ref. 48). Components of the BAF (SWI/SNF) chromatin-remodeling complexes, such as SMARCA4, ARID1A, SMARCB1, and others, represent another class of tumor suppressor genes that are frequently lost or mutated in cancer. Together, deficiencies in BAF complex subunits occur in over 20% of human cancers. Synthetic lethal interactions between various components of the complex were identified through functional genomics screening efforts (reviewed in ref. 49). Several agents including SMARCA2 degraders and BRD9 degraders are currently in clinical trials in patients with SMARCA4 mutations and SMARCB1 loss, respectively. There are also efforts underway to target mutant tumor suppressors directly. For example, PC14586, a small-molecule structural corrector that restores wild-type function to the Y220C mutant p53 protein, is currently in clinical trials in patients carrying the TP53Y220C mutation.
More recently, scientists have expanded functional genomics screens to use dual-gene CRISPR systems to explore the compensatory effects of paralog genes that underlie selective digenic dependencies (50). Because paralogs share sequence and structural homology, small-molecule drugs are often active across them, and thus these digenic dependencies present new therapeutic opportunities.
Although the underlying basis for the selective dependency on many genes can be linked to specific mutations, such as increased dependency on KRAS in KRAS-mutant cell lines or the synthetic lethal dependency on SMARCA4 in SMARCA2-deficient cell lines, as well as the examples discussed above, there are still numerous selective dependencies identified through these large-scale screens that have not been clearly linked to a specific molecular marker or profile. The relationship between genetic dependencies and cancer genomes is nonlinear, with the interplay between multiple genetic alterations often determining the degree of dependency on any one gene. Moreover, additional nongenetic factors such as alterations in the epigenome, transcriptome, proteome, and microenvironment of cancer cells may also contribute to selective dependency. As these genes represent potential targets for the development of novel therapies, it will be crucial to elucidate the unique cellular features (biomarkers) that predict dependence to better identify patients likely to benefit from new therapies. To this end, deep learning methods are being developed in an effort to predict gene dependencies or drug sensitivities from complex genomic and transcriptomic profiles (51, 52). It remains to be determined whether these approaches will lead to the identification of predictive biomarkers for novel druggable targets.
The large number of patient tumors and cell lines that have been extensively profiled has led to the identification of both lineage-specific genetic alterations and driver genes as well as alterations that are shared across many cancer types. These pan-cancer analyses have revealed that cancers of different tissues can share the same drivers and be biologically more similar to each other than to other tumors of the same tissue of origin. The similarities in driver gene dependencies across indications have also been borne out in large-scale functional genomics screens, ultimately changing the way we think about developing new therapies to include both indication-centric treatments and treatments that are appropriate for genetically defined subsets of patients across multiple indications.
The ability to both identify and experimentally manipulate the macromolecules that are altered in cancers using relevant model systems has led to the discovery of numerous potential therapeutic targets. New drugs against several of the oncogenic proteins identified through these efforts are now in clinical trials or have been recently approved. Still, other novel agents have not been as efficacious in patients as predicted by preclinical models. This may be due to the presence of multiple driver mutations in a single tumor, niche-derived resistance factors, intratumoral heterogeneity, or other factors, all of which point to the need for drug combination strategies. For instance, patients with colorectal cancer harboring KRASG12C mutations derive less benefit from KRASG12C inhibitors compared with patients with non–small cell lung cancer (NSCLC) who carry this mutation, likely due to high levels of EGFR activity in colorectal cancer (53). Clinical trials exploring the efficacy of KRASG12C inhibitors in combination with EGFR antibodies are currently underway, with early data indicating improved response rates with the combination (54–56).
Selective biological dependency is only one factor when prioritizing targets for drug discovery efforts. Perhaps equally important is the assessment of druggability. Many classes of oncogenic proteins remain challenging to drug using traditional approaches, but are beginning to be tackled through improvements in medicinal chemistry strategies and data science tools.
Advances in Medicinal Chemistry
Medicinal chemistry has undergone a revolution over the past two decades. Multiple advances have come together to create a powerful toolbox that, when applied in concert, promises to help address targets hitherto considered undruggable.
Property-Based Drug Design
One significant advance is the use of physicochemical properties to design small-molecule compounds most likely to have good drug-like properties. This approach has been termed property-based drug design. It traces its origin to the proposal of the “rule of 5” (Ro5) in 1997, which states that compounds with hydrogen bond (H-bond) donors ≤5, H-bond acceptors ≤10, molecular weight (MW) ≤500, and logP (a measure of lipophilicity) ≤5 are more likely to have good oral absorption than those that fail these rules (57). This concept was extended to allow for the prediction of other important drug-like properties such as central nervous system penetration (58), solubility (59), and safety liabilities (60). All the physicochemical properties for these relationships can be calculated in silico based on the chemical structure at the design stage before synthesis, thereby increasing drug discovery efficiency and speed and reducing attrition.
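As a simple illustration of property-based filtering, the Ro5 thresholds quoted above can be encoded as a small Python check. This sketch assumes that the four descriptors have already been calculated from the chemical structure by a cheminformatics package; the `Descriptors` class, the function name, and the two example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors for one compound."""
    h_bond_donors: int
    h_bond_acceptors: int
    mol_weight: float
    logp: float


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the list of rule-of-5 criteria that the compound fails."""
    checks = {
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
    }
    return [name for name, failed in checks.items() if failed]


# Hypothetical compounds: one within Ro5 space, one beyond it
compound_a = Descriptors(h_bond_donors=2, h_bond_acceptors=7, mol_weight=493.6, logp=3.0)
compound_b = Descriptors(h_bond_donors=6, h_bond_acceptors=12, mol_weight=720.0, logp=4.0)

for name, cpd in [("compound_a", compound_a), ("compound_b", compound_b)]:
    failed = ro5_violations(cpd)
    print(name, "passes Ro5" if not failed else f"violations: {failed}")
```

Returning the individual violations, rather than a single pass/fail flag, makes it easier to see which property a design would need to change.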
Macrocycles
Over the last few decades, desirable drug targets have expanded to include those with shallow and extended binding pockets for which obtaining good binding activity is challenging. This has led to the exploration of the possibilities offered by compounds “beyond the rule of 5” (BRo5), with MW >500 and a higher number of H-bond donors and acceptors, capable of binding more tightly to these difficult pockets while retaining cell permeability and oral absorption (61). One of the main strategies to accomplish this has been through macrocyclization. Macrocycles are present in multiple orally bioavailable BRo5 natural products, such as rapamycin (ref. 62; see “Novel Drug Modalities: Molecular Glues and Degraders” section below), and there has been an effort to apply learnings from these compounds as strategies for de novo–designed macrocycles (63). One finding has been that the preorganization offered by the ring structure can result in the energetic accessibility of conformations that allow intramolecular H-bond formation between H-bond donors and acceptors. These intramolecular H-bonds conceal some of the molecule's polarity, thereby increasing membrane permeability. This strategy has seen a resurgence, in part due to a 2008 review (63), and has been extended to achieve good permeability and absorption with nonmacrocyclic BRo5 compounds. Macrocyclization has multiple benefits beyond improving permeability, including increased binding affinity, selectivity, and metabolic stability, due to favoring the bioactive conformation and disfavoring of other conformations that antitargets and metabolic enzymes can recognize. Hence, it is also utilized for these purposes in the Ro5 chemical space, as is the case in the ALK inhibitor lorlatinib (ref. 64; see “Potency Matters: ALK Inhibitors” section below) and in the recently approved JAK2/FLT3 inhibitor pacritinib (65).
Allostery
Allosteric ligands that modulate the activity of a protein by binding to a site distinct from the active or orthosteric site have seen a resurgence over the last decade. Some of the first allosteric modulators were developed against G protein–coupled receptors. In recent years, there has been a renewed interest in them due to the many challenges in drug discovery that they can help address (66). Because they do not bind to the highly conserved orthosteric site, they offer the opportunity for selectivity against closely related proteins or for mutants against wild-type proteins. Furthermore, allosteric modulators provide the opportunity to tune the activity of the orthosteric ligand, allowing for partial inhibition or activation, or altered downstream signaling, and can help address resistance due to mutations in the orthosteric site. For example, the BCR–ABL inhibitor asciminib binds to the allosteric myristate site and maintains activity against orthosteric inhibitor–resistant ATP-site mutations (ref. 67; see “Targeting through Orthosteric and Allosteric Mechanisms: BCR–ABL” section below). Significantly, allosteric modulators can help tackle challenging biological targets such as those with poor orthosteric binding sites, including some protein–protein interactions, and targets with high-affinity endogenous ligands that would be difficult to displace with inhibitors. An example is the discovery of SHP099, which inhibits SHP2 phosphatase by interacting with an allosteric site (68) and is often credited for reinvigorating this target class, as selective orthosteric inhibition of phosphatases with drug-like molecules is challenging. The MEK inhibitor trametinib binds in an allosteric site adjacent to the ATP-binding site and further illustrates the concept (69). 
The disadvantages of pursuing an allosteric approach include the difficulty in identifying these sites in targets of interest, particularly since some may be cryptic—that is, only present in specific protein conformations. Furthermore, some allosteric sites may not affect the desired function once identified. The advantages outlined above, progress in understanding allosteric modulation mechanisms, the increased availability of structural biology information, and computational approaches to model protein dynamics have significantly bolstered this area. KRASG12C inhibitors that bind to a cryptic allosteric site adjacent to the nucleotide-binding pocket epitomize this class of compounds and are further discussed below (ref. 70; see “Covalent Binders” and “Covalent Targeting to Reveal Cryptic Drug-Binding Pockets: KRASG12C” sections).
Fragment-Based Drug Discovery
Fragment-based drug discovery (FBDD), the identification of relatively low-MW compounds that bind efficiently to their biological targets, and can be evolved into higher MW compounds with greater affinity and drug-like properties, has become a prevalent approach in drug discovery (71). FBDD was first practically demonstrated in 1996 by Abbott scientists in the discovery of FKBP ligands (72) and has since been applied to a multitude of targets. The physicochemical properties of the compounds screened in FBDD have been defined as the “rule of 3,” which includes the use of fragments with MW ≤300 (73). Because these fragments are small and relatively simple in structure, it is possible to cover, with a relatively small library of a few thousand compounds, a similar breadth of chemical space as with a traditional, significantly larger collection of higher-MW compounds. Fragment screens are usually conducted using biophysical techniques to detect hits with relatively weak binding affinity, which are then evaluated using ligand efficiency calculations. This metric normalizes their binding affinity to their size (74). The initial fragment hits are then elaborated to potent molecules either by merging two fragments or fragment growth through the attachment of additional functionality. The process is typically guided by X-ray crystallography or other structural biology techniques. An alternative process involves screening by high-throughput X-ray crystallography and has the benefit of providing structural information to guide hit optimization directly from the screen (75). This approach has been successfully optimized for academic and industrial applications at the XChem facility at Diamond Light Source (76). Due to its efficient coverage of chemical space, FBDD enables hit finding against challenging targets and the identification of allosteric sites and ligands, such as in the discovery of the BCR–ABL inhibitor asciminib (ref. 67; see “Targeting through Orthosteric and Allosteric Mechanisms: BCR–ABL” section below). In addition to asciminib, several oncology-approved drugs have originated from fragments. They include the first fragment-based approved drug, the BRAF inhibitor vemurafenib (77), the BCL-2 inhibitor venetoclax, which originated with a fragment identified using the original nuclear magnetic resonance (NMR) screening approach (78), and the FGFR inhibitor erdafitinib (79). In addition, capivasertib, an AKT inhibitor in phase III clinical trials, originated independently from a fragment similar to vemurafenib's original fragment (80, 81).
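The ligand efficiency calculation mentioned above can be illustrated with a short sketch. It assumes the commonly used definition, in which the binding free energy (estimated as RT ln Kd) is divided by the number of non-hydrogen atoms; the function names and the two example compounds are hypothetical and are not taken from the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Approximate binding free energy, dG = RT ln(Kd), in kcal/mol."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency: -dG per non-hydrogen atom, kcal/mol/atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return -binding_free_energy(kd_molar, temperature_k) / heavy_atoms


# Hypothetical values: a weak 12-atom fragment and a potent 38-atom lead.
fragment_le = ligand_efficiency(1e-3, 12)   # Kd = 1 mmol/L
lead_le = ligand_efficiency(1e-8, 38)       # Kd = 10 nmol/L
print(f"fragment LE = {fragment_le:.2f}, lead LE = {lead_le:.2f}")
```

In this illustration, the weakly binding fragment has the higher ligand efficiency, which is why a millimolar fragment hit can still be an attractive starting point for elaboration.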
Degraders
Identifying degraders, compounds that cause the selective degradation of a protein of interest (POI), has emerged as a promising strategy for drugging targets for which developing functional inhibitors is difficult or insufficient (82). Molecular glues are small molecules capable of stabilizing the interactions between two proteins through the formation of a ternary complex to alter their function (83). Glue degraders, a subclass of molecular glues, have emerged as a promising type of monovalent degraders (84). They are compounds that bind to an E3 ubiquitin ligase and induce or enhance its interaction with a POI, leading to ubiquitination and proteasomal degradation of the POI. Typically, they do not bind independently to the POI. The prototypical glues are the immunomodulatory imide drugs (IMiD), such as thalidomide, which were discovered serendipitously. They bind the cereblon E3 ligase and induce degradation of the IKAROS family zinc finger proteins, among others. Unfortunately, the de novo identification of glue degraders for a specific POI has proven challenging.
Proteolysis-targeting chimeras (PROTAC), a more modular type of degrader, were first described in 2001 (85). PROTACs are bifunctional molecules that contain a binder to an E3 ubiquitin ligase, a binder to the POI, and a linker that joins the two (86). The PROTAC thus forms a ternary complex with the E3 ligase and POI, bringing the two proteins into proximity and allowing for ubiquitination and degradation of the POI. One major advantage of PROTACs is that the binder to the POI can be “silent” or devoid of functional activity, as PROTACs can exploit favorable silent allosteric sites. Other advantages, which they share with glue degraders, include their catalytic nature, which can lower the requirement for strong affinity for the POI and for high in vivo exposure, their possible extended duration of action, and the opportunity for novel pharmacology arising from degradation as opposed to specific functional inhibition. PROTAC disadvantages include the current need for empirical optimization despite their modularity and typically BRo5 characteristics. Despite this, the application of BRo5 approaches has resulted in 15 oral PROTACs, spanning nine different targets, advancing to clinical trials (ref. 86; see “Molecular Glues and Degraders” section).
Covalent Binders
Another area of significant progress has been the rational design of covalent small-molecule drugs (87, 88), resulting in more than 40 approved covalent drugs to date. Covalent compounds bind to their biological targets in a two-step process. First, they bind reversibly through specific noncovalent interactions that place the ligand's reactive functionality close to the target's reactive amino acid, enabling subsequent covalent bond formation between the ligand and target. The initial specific noncovalent binding requirement is critical to achieving selectivity for the desired target. Although covalent drugs have been in use for a long time, it has only recently been demonstrated that one can, by design, add a reactive electrophilic group to an existing noncovalent ligand. This approach was first applied to the tyrosine kinase EGFR (ref. 89; see “Targeting Resistance Mutations: Four Generations of EGFR Inhibitors” section) and subsequently to multiple other targets. Notably, the increased binding affinity obtained with covalent compounds has been utilized to address targets that have proven challenging using noncovalent ligands, as with KRASG12C inhibitors (90). The KRASG12C example used a different approach from that used for EGFR: it is based on FBDD and involves first identifying a covalent fragment hit and then evolving it to a higher-MW compound with enhanced noncovalent interactions with the target (70). This work also demonstrates the utility of this approach for identifying novel allosteric or cryptic pockets (see “Covalent Targeting to Reveal Cryptic Drug-Binding Pockets: KRASG12C” section below). Most of the designed covalent drugs, including those against EGFR and KRASG12C, target cysteine residues, as these can have high reactivity. Still, efforts are ongoing to expand them to other potentially reactive amino acids such as lysine and tyrosine.
The potential advantages of covalent ligands include the ability to obtain good potency even in relatively shallow binding pockets, enhanced selectivity in cases in which the covalently bound amino acid residue is unique to the desired target, extended duration of action after the inhibitor has been cleared from the body, and reduction in off-target toxicity by rapid and extended target engagement. One common criticism of covalent compounds is the possibility of off-target or idiosyncratic toxicity arising from the covalent modification of undesired proteins.
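The two-step binding process described above is commonly summarized by two parameters, KI for the reversible step and kinact for bond formation. The sketch below uses the standard kinetic approximation (inhibitor in excess over target, rapid reversible pre-equilibrium), which is an assumption brought in for illustration and is not stated in the text; all numerical values are hypothetical.

```python
import math


def k_obs(inhibitor_molar, k_inact, k_i):
    """Observed inactivation rate (1/s) for a two-step covalent inhibitor.

    k_inact: maximal rate of covalent bond formation (1/s)
    k_i:     concentration giving half-maximal rate (mol/L)
    """
    return k_inact * inhibitor_molar / (k_i + inhibitor_molar)


def fraction_modified(inhibitor_molar, k_inact, k_i, time_s):
    """Fraction of target covalently modified after time_s seconds."""
    return 1.0 - math.exp(-k_obs(inhibitor_molar, k_inact, k_i) * time_s)


# Hypothetical compound: k_inact = 0.01 1/s, K_I = 1 umol/L.
K_INACT, K_I = 0.01, 1e-6
print("efficiency k_inact/K_I =", K_INACT / K_I, "1/(M*s)")
for minutes in (1, 10, 60):
    occupancy = fraction_modified(1e-6, K_INACT, K_I, minutes * 60)
    print(f"{minutes:>2} min at 1 umol/L: {occupancy:.3f} modified")
```

Because the modification is time dependent, the ratio kinact/KI rather than a single IC50 value is typically used to compare covalent compounds.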
Developments in the area of chemical proteomics and particularly in competitive activity-based protein profiling (ABPP) have greatly enabled covalent drug discovery (91). Identifying covalent fragment hits, as in the KRASG12C case, promises to provide good starting points for covalent drugs against challenging targets. Although initially focused on investigations of enzyme families, ABPP was first utilized in 2016 to identify covalent fragment ligands for cysteine residues across the proteome, leveraging advances in higher throughput mass spectrometry (MS; ref. 92). This approach moved covalent fragment discovery from a target-focused activity to one that could be done broadly across the proteome to identify allosteric and cryptic pockets for challenging targets (93). The approach relies on incubating live cells (or cell lysate) with a compound, followed by treatment of each sample with a cysteine-reactive probe that covalently binds to accessible cysteines not previously modified by the compound. Following proteolysis, the probe is used to pull down the peptides containing probe-modified cysteines, and the samples are analyzed by MS. The hits are identified by a loss of signal in any of the peptide-probe MS peaks in the compound-treated samples relative to the control. MS-enabled competitive ABPP finds ligandable hotspots across the proteome and identifies covalent fragment hits against some of these sites that can be elaborated to covalent drugs. The technique can also determine the selectivity of covalent compounds against cysteines across the proteome to help reduce their off-target or idiosyncratic toxicity risk. Notably, this technique allows for screening targets in their native cellular context. Furthermore, the method can identify covalent E3 ligands that could be transformed into covalent PROTACs or molecular glues.
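The hit-calling step of competitive ABPP, a loss of probe signal in compound-treated samples relative to the control, can be sketched as a simple ratio calculation. The peptide identifiers, intensities, intensity floor, and the fourfold threshold below are all hypothetical choices made for illustration; published workflows use their own quantification and cutoffs.

```python
def competition_ratios(control, treated, floor=1.0):
    """Control/treated signal ratio for each probe-labeled cysteine peptide.

    A peptide missing from the treated sample is assigned the intensity
    floor, so complete loss of signal yields a large, finite ratio.
    """
    ratios = {}
    for peptide, control_signal in control.items():
        treated_signal = max(treated.get(peptide, 0.0), floor)
        ratios[peptide] = control_signal / treated_signal
    return ratios


def call_hits(ratios, min_ratio=4.0):
    """Peptides whose probe signal dropped at least min_ratio-fold."""
    return sorted(p for p, r in ratios.items() if r >= min_ratio)


# Hypothetical MS intensities (arbitrary units) for three cysteine peptides.
control = {"PROT_A_C12": 1000.0, "PROT_B_C88": 800.0, "PROT_C_C45": 500.0}
treated = {"PROT_A_C12": 950.0, "PROT_B_C88": 100.0}  # PROT_C_C45 not detected

ratios = competition_ratios(control, treated)
hits = call_hits(ratios)
print(hits)
```

Here the first cysteine is essentially unaffected by the compound, whereas the other two lose most or all of their probe labeling and would be flagged as liganded sites.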
Chemical Libraries and Probes
Enhancements in compound screening collections have also been a critical enabling tool. This extends to better curated high-throughput screening collections, target-focused collections, and well-designed noncovalent and covalent fragment libraries (94). DNA-encoded libraries (DEL), first described conceptually in 1992 (95), have become a valuable method of identifying binding hits for challenging targets, and there are now at least three compounds in clinical development targeting sEH, RIP1, and ATX that originated from DEL hits, although these are all being developed for nononcology indications (96, 97). They are extensive combinatorial small-molecule libraries in which each compound is attached to a DNA oligomer that encodes the identity of the small molecule. They are screened by affinity capture with the target protein, and the hits are decoded by PCR amplification and sequencing of the attached DNA tag. Because of their very large size, DELs can be helpful when smaller libraries have not afforded hits. The binders they identify can be screened for functional activity or used as the POI binder in a PROTAC approach. Covalent DELs can also identify hits to enable covalent programs, demonstrating their versatility.
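After sequencing, DEL hits are usually ranked by how strongly each DNA tag is enriched in the target selection. The sketch below compares tag frequencies against a no-target control selection; the use of such a control, the pseudocount, and the tag names are assumptions made for illustration and are not described in the text above.

```python
from collections import Counter


def tag_enrichment(selected_reads, control_reads, pseudocount=1.0):
    """Ratio of each tag's read frequency in the target vs. control selection.

    Each read is a decoded tag string identifying one library member.
    The pseudocount keeps unseen tags from producing a division by zero.
    """
    selected = Counter(selected_reads)
    control = Counter(control_reads)
    tags = set(selected) | set(control)
    selected_total = sum(selected.values()) + pseudocount * len(tags)
    control_total = sum(control.values()) + pseudocount * len(tags)
    scores = {}
    for tag in tags:
        selected_freq = (selected[tag] + pseudocount) / selected_total
        control_freq = (control[tag] + pseudocount) / control_total
        scores[tag] = selected_freq / control_freq
    return scores


# Hypothetical decoded reads; each tag names three building blocks.
selected_reads = ["A1-B7-C3"] * 40 + ["A2-B1-C9"] * 5 + ["A5-B5-C5"] * 5
control_reads = ["A1-B7-C3"] * 10 + ["A2-B1-C9"] * 10 + ["A5-B5-C5"] * 10

scores = tag_enrichment(selected_reads, control_reads)
best_tag = max(scores, key=scores.get)
print(best_tag, round(scores[best_tag], 2))
```

Tags with a ratio well above 1 would be prioritized for resynthesis without the DNA tag and confirmation in an orthogonal binding assay.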
Chemical probes are well characterized and selective small-molecule modulators of a specific protein that are useful for exploring the biological function or role of its target, and for validating or invalidating the target for drug discovery. They are complementary to genetic approaches. Unfortunately, the broad use of low-quality chemical probes, particularly with low selectivity, has led to erroneous conclusions in the literature. The Chemical Probes Portal (https://www.chemicalprobes.org/) was established as an expert-curated resource for high-quality chemical probes for usage in biomedical and drug discovery efforts (98, 99) and exemplifies the benefits of the close integration of chemistry with biology.
Advances in Data Science
Many of the advances in medicinal chemistry have been accelerated by progress in structure-enabled drug discovery approaches. The significant increases in computational power, including the development of graphics processing units (GPU), have enabled the implementation of more sophisticated algorithms and the execution of larger-scale experiments.
The Protein Data Bank, established in 1971, currently contains >200,000 entries covering X-ray crystallography structures and also structures determined by other techniques such as NMR, cryo-electron microscopy (cryoEM), and other diffraction methods (100). Since then, the receptor-based drug design field quickly emerged as a tool in drug discovery and evolved through increases in computational power (101, 102) and available structural information. The first publication on a small-molecule docking algorithm appeared in 1982 (103). With the sequencing of the human genome, there was renewed interest in assigning function to every protein in the human genome to understand human disease and enable drug discovery. To support those efforts, in the early 2000s, the Structural Genomics Consortium was started as a public–private partnership to solve crystal structures of novel proteins (104, 105).
Computational tools are now routinely used to find novel chemical matter for targets of interest via virtual screening or scaffold hopping, and for the structure-guided optimization of existing small-molecule binders to targets of interest to further enhance potency and/or selectivity over undesirable off-targets.
Recently, ever-larger enumerated chemical libraries built from available building blocks and precedented chemical reactions have emerged (106, 107), such as Enamine REAL and the WuXi AppTec virtual library. These collections now cover billions of structures, which can be reliably synthesized within short time frames of 2 to 3 weeks at a reasonable cost. Although ligand-based approaches can be utilized to analyze these large collections comprehensively, screening these collections using receptor-based virtual screening requires too much computing time. In order to address those constraints, a combination of docking and docking-based machine learning approaches is being explored to prioritize compounds for testing (108, 109).
Deep learning has also enabled the development of generative chemistry engines to derive novel chemical structures based on existing chemical space, such as that captured in ChEMBL (110) or ZINC (111). Generally, either SMILES representations or graph representations are used to describe the molecules, and different architectures can be used to generate novel molecules (112–115). In order to generate compounds within a specific chemical space of biological interest, the generic model is fine-tuned using compounds with the desired property of interest (115, 116). A more focused resource for cancer-centric drug discovery is canSAR (https://cansar.ai; Table 1), which integrates medicinal chemistry information with structural biology data and multiomic data (117).
In addition to advances in docking algorithms and integration of deep learning approaches, predictions of free energies of binding of small molecules to protein targets have become significantly more accurate in the last decade. Methods like free energy perturbation (FEP) and thermodynamic integration (TI) have benefited from improvements in molecular force fields, development of enhanced sampling methods, and utilization of GPUs instead of central processing units to make these methods important new tools in drug discovery (118, 119).
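The idea behind FEP can be illustrated with the one-step exponential-averaging (Zwanzig) estimator, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, where ΔU is the energy difference between the perturbed and reference states sampled in the reference ensemble. This is a minimal sketch only: production calculations split the transformation into many intermediate windows and generally use more robust estimators, and the energy samples below are hypothetical.

```python
import math

K_B_KCAL = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol*K)


def fep_free_energy(delta_u_samples, temperature_k=298.15):
    """One-step exponential-averaging estimate of dG in kcal/mol.

    delta_u_samples: U_perturbed - U_reference (kcal/mol), evaluated on
    configurations sampled from the reference state.
    """
    if not delta_u_samples:
        raise ValueError("at least one sample is required")
    beta = 1.0 / (K_B_KCAL * temperature_k)
    exponents = [-beta * du for du in delta_u_samples]
    # Log-sum-exp keeps the average numerically stable for large |dU|.
    largest = max(exponents)
    total = sum(math.exp(x - largest) for x in exponents)
    log_mean = largest + math.log(total / len(exponents))
    return -log_mean / beta


# Hypothetical energy differences from a short reference-state simulation.
samples = [0.8, 1.1, 0.4, 1.6, 0.9, 1.3, 0.7, 1.0]
estimate = fep_free_energy(samples)
mean_du = sum(samples) / len(samples)
print(f"dG estimate = {estimate:.2f} kcal/mol (mean dU = {mean_du:.2f})")
```

The estimate is never larger than the simple mean of ΔU, because low-energy configurations dominate the exponential average; this sensitivity to rare samples is one reason why the enhanced sampling methods noted above matter.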
With the advances in force fields and hardware, molecular dynamics simulations and enhanced sampling methods have also become important tools for understanding protein dynamics–function relationships and small-molecule binding to protein targets (120–123). The increased simulation times that are now feasible have enabled reproduction of compound binding processes to cryptic binding sites (124–126) and the prediction of unexplored sites.
There has been a long-standing interest in the protein folding problem, or the question of how the amino acid sequence of a protein determines its 3D structure (127). Protein structure prediction methods have been an area of considerable interest, and since 1994, the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) event has been held. CASP allows different groups to test their methods objectively on unpublished structures (128). With the incorporation of novel deep learning approaches, the DeepMind team developed AlphaFold (129) and placed first in CASP13 (130). CASP14 then saw multiple groups iterate on AlphaFold's advances, resulting in the development of RoseTTAFold (131), improvements to MULTICOM (132), and further development by DeepMind and the release of AlphaFold2 (129). Since the release of AlphaFold2, predicted structures for the complete human proteome and other species have become readily available. The utility of these models has been assessed in virtual screening (133) and FEP (134), and also to better define construct boundaries for crystallography and help resolve cryoEM and X-ray structures (135). However, there are many targets and protein domains that are poorly characterized experimentally and/or are significantly disordered, for which these approaches can provide little insight. In addition, proteins are highly dynamic, and the characterization of multiple dynamic states is poorly captured (135).
EVOLUTION OF SMALL-MOLECULE CREATION: LESSONS LEARNED FROM INDIVIDUAL ADVANCES
Targeting through Orthosteric and Allosteric Mechanisms: BCR–ABL
ABL1 is a nonreceptor tyrosine kinase and proto-oncogene in chronic myelogenous leukemia (CML). Translocation between the ABL1 gene and the breakpoint cluster region (BCR) gene generates the BCR–ABL oncogene and is pathognomonic for CML (Philadelphia chromosome). Targeting BCR–ABL in CML with imatinib represents the foundational example of small-molecule precision oncology: In the first randomized phase III study, imatinib treatment led to improvements in complete cytogenetic responses (76.2%) compared with the interferon-alpha plus cytarabine combination (14.5%; ref. 136). Imatinib is an ATP-competitive (i.e., orthosteric) inhibitor of ABL1 kinase and provided critical proof of concept for targeting a kinase, demonstrating that a small-molecule drug can compete with millimolar concentrations of ATP in the cell, and that sufficient selectivity could be achieved such that only a narrow spectrum of kinases is inhibited, enabling a wide therapeutic index.
Targeting BCR–ABL in CML also provided a benchmark for follow-on inhibitors that demonstrated additional key concepts in small-molecule precision oncology, such as the importance of potency, selectivity, and the utility of drugs with orthogonal mechanisms of action that can overcome resistance. Clear evidence of the impact of improved potency and selectivity was demonstrated through head-to-head studies of nilotinib, another ATP-competitive inhibitor of BCR–ABL, versus imatinib. Depending on the assay used, nilotinib is ∼10- to 30-fold more potent than imatinib on BCR–ABL but roughly equipotent on cKIT and PDGFRβ (137). In a phase III study, nilotinib treatment led to a higher complete molecular response rate (26% at 300 mg twice/day and 21% at 400 mg twice/day) compared with imatinib (10% at 400 mg daily) and fewer patients progressing to accelerated or blast phase and fewer CML-related deaths (138).
Although additional orthosteric BCR–ABL inhibitors have been developed in CML, the acquisition of resistance mutations is a universal challenge (reviewed in ref. 139). A novel approach to overcome this issue was discovered by a team at Novartis that demonstrated that BCR–ABL could be inhibited through an allosteric mechanism by optimizing a small molecule to occupy the myristoyl pocket, locking the kinase in an inactive conformation (140). This concept led to the development of asciminib (ABL-001), which specifically targets the ABL myristoyl pocket, and preclinical studies demonstrated its ability to inhibit the most common and broadly resistant mutant to orthosteric inhibitors, BCR–ABLT315I. Although preclinical studies suggested that asciminib-resistance mutations in BCR–ABL could arise, these mutants were effectively inhibited by imatinib, and the combination of asciminib + imatinib completely prevented the emergence of resistance mutations in a mouse tumor xenograft model of CML (141). Asciminib received accelerated approval to treat patients with CML who have failed two or more prior BCR–ABL inhibitors or in patients who have a BCR–ABLT315I mutation. Asciminib is also being tested in combination with imatinib (e.g., NCT03578367) and nilotinib (e.g., NCT03874858) in human clinical trials to test the hypothesis that combining orthogonal mechanisms of action will further delay or prevent the emergence of resistance.
Overall, the journey of targeting BCR–ABL with small molecules and incrementally improving patient outcomes spans more than 20 years and exemplifies part of the evolution of small-molecule discovery (139). Prior to the introduction of imatinib, patients were treated with interferon and chemotherapy and mostly progressed rapidly from the chronic phase and died in blast crisis, while today CML is a chronic disease that most patients can live with for decades, with about 5% to 10% achieving treatment-free remission after cessation of tyrosine kinase inhibitor (TKI) therapy (142).
Potency Matters: ALK Inhibitors
ALK is a receptor tyrosine kinase, and fusions between the ALK gene and nucleophosmin (NPM) or echinoderm microtubule-associated protein-like 4 (EML4) occur in approximately 60% of anaplastic large cell lymphomas and 2% to 7% of NSCLC. These fusions lead to enhanced dimerization and activation of the ALK kinase domain and drive oncogenic transformation. The discovery of ALK translocations in NSCLC led to the evaluation of the multitargeted ALK kinase inhibitor, crizotinib, and validated EML4–ALK as a therapeutic target. In a phase I study in patients with ALK+ NSCLC, crizotinib treatment resulted in a 47% response rate (RR), and in a subsequent randomized phase III trial, crizotinib treatment resulted in a 74% RR and median progression-free survival (PFS) of 10.9 months, which compared favorably with chemotherapy (45% and 7 months; ref. 143). Crizotinib is an orthosteric kinase inhibitor that was initially developed as a MET inhibitor and in fact inhibits several other kinases at concentrations similar to those required for ALK and MET, leaving open the possibility to create second-generation inhibitors with greater potency against ALK. For example, alectinib is approximately 4-fold more potent than crizotinib in cells (∼10 nmol/L vs. ∼40 nmol/L IC50; ref. 144), and in a head-to-head phase III trial, alectinib provided a higher RR and improved median PFS versus crizotinib (92% vs. 79% and 34.1 vs. 10.2 months; refs. 145, 146). With the therapeutic benefit of ALK inhibitors clearly established, the macrocyclic small molecule lorlatinib (see “Advances in Medicinal Chemistry” section) was developed with best-in-class potency against ALK (∼2 nmol/L cellular IC50; ref. 144) and in a phase III trial demonstrated improved efficacy over crizotinib with an RR of 76% versus 58% (147) and 3-year PFS of 64% versus 19% (148).
The improvement in clinical efficacy demonstrated by lorlatinib compared with crizotinib clearly demonstrates the potential for a more potent and target-optimized small-molecule inhibitor to displace an earlier-generation inhibitor in the first-line setting and represents a major advance for patients with NSCLC whose tumors express ALK translocations (Fig. 3). Although the case for increased potency can be made with ALK inhibitors and other targeted therapies, sufficient selectivity versus off-targets, especially for kinase inhibitors, must be maintained in order to realize improvements in therapeutic benefit.
The evolution of ALK inhibitors to treat ALK+ NSCLC. Kaplan–Meier plot illustrating the improvement in PFS of crizotinib vs. chemotherapy (143) on the left and lorlatinib vs. crizotinib (147) on the right. Reprinted with permission from NEJM. CI, confidence interval; SBDD, structure-based drug design.
Targeting Resistance Mutations: Four Generations of EGFR Inhibitors
EGFR is a receptor tyrosine kinase that plays a critical role in epithelial cell proliferation and homeostasis. Activating mutations in EGFR occur in approximately 20% of NSCLC tumors, the vast majority of which consist of exon 19 deletions (EGFRExon19del) or the L858R mutation (EGFRL858R). Erlotinib and gefitinib are ATP-competitive EGFR inhibitors that were first approved to treat NSCLC based on the hypothesis that EGFR overexpression drives tumors in this indication. However, it was soon recognized that only patients with EGFR-mutant tumors realize clinical benefit (149, 150). Although the clinical benefit in the EGFR-mutant patient population was confirmed in randomized clinical studies (151, 152), the majority of patients still relapsed in less than 1 year. As with other small-molecule targeted therapies, tumors developed resistance and the most frequent mechanism was through the acquisition of the T790M gatekeeper mutation in the EGFR ATP-binding pocket (153, 154). This mutation causes resistance due to increasing the affinity for ATP (lowers ATP Km), making it more difficult for erlotinib or gefitinib to bind. Although second-generation EGFR inhibitors were developed, differentiated from first-generation inhibitors through their covalent mode of binding, these compounds failed to effectively inhibit the T790M mutant.
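The effect of a lower ATP Km on a reversible ATP-competitive inhibitor can be illustrated with the Cheng–Prusoff relationship, IC50 = Ki(1 + [ATP]/Km). The relationship is a standard approximation for competitive inhibition that is brought in here for illustration; it is not stated in the text above, and the Ki, Km, and ATP values are hypothetical rather than measured values for EGFR.

```python
def apparent_ic50(ki_molar, atp_molar, km_atp_molar):
    """Cheng-Prusoff estimate of IC50 for a reversible competitive inhibitor."""
    if km_atp_molar <= 0:
        raise ValueError("Km must be positive")
    return ki_molar * (1.0 + atp_molar / km_atp_molar)


KI = 1e-9            # hypothetical inhibitor Ki, 1 nmol/L
CELLULAR_ATP = 1e-3  # assumed cellular ATP concentration, 1 mmol/L

# Hypothetical ATP Km values before and after a gatekeeper mutation.
ic50_high_km = apparent_ic50(KI, CELLULAR_ATP, 50e-6)  # weaker ATP binding
ic50_low_km = apparent_ic50(KI, CELLULAR_ATP, 5e-6)    # tighter ATP binding

print(f"Km = 50 umol/L: IC50 = {ic50_high_km * 1e9:.0f} nmol/L")
print(f"Km =  5 umol/L: IC50 = {ic50_low_km * 1e9:.0f} nmol/L")
```

With the inhibitor's intrinsic affinity held constant, a 10-fold drop in ATP Km raises the apparent cellular IC50 by roughly 10-fold in this example, which is the sense in which tighter ATP binding makes a reversible inhibitor less effective.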
The third-generation EGFR covalent inhibitor osimertinib was specifically designed to inhibit EGFRL858R;T790M and EGFRExon19del;T790M. In a phase II clinical trial in patients who relapsed on a prior EGFR inhibitor with the T790M mutation, osimertinib treatment resulted in a 70% RR, providing clinical proof of concept (155). Osimertinib was subsequently tested head-to-head against gefitinib or erlotinib in first-line NSCLC patients with EGFR mutations and demonstrated a comparable RR to the first-generation inhibitors (80% vs. 76%) but significant improvements in median PFS (18.9 vs. 10.2 months) and overall survival (38.6 vs. 31.8 months; refs. 156, 157).
First-line osimertinib treatment represents a significant improvement in therapeutic benefit to patients; however, acquired resistance to osimertinib still occurred through a range of mechanisms but most notably not through the acquisition of T790M (reviewed in ref. 158). Some of the on-target mechanisms of resistance represent opportunities for the development of next-generation EGFR inhibitors. For example, the EGFR C797S mutation causes resistance to osimertinib through mutation of the cysteine to which the drug covalently binds, opening the opportunity for next-generation noncovalent EGFR inhibitors. Fourth-generation EGFR inhibitors such as BLU-945 selectively inhibit EGFRL858R;T790M by binding in the orthosteric pocket through a noncovalent mechanism and therefore have the potential to treat osimertinib-resistant tumors with a C797S mutation (159). BLU-945 (NCT04862780) and several other so-called fourth-generation EGFR inhibitors are currently undergoing preclinical and clinical development. The development of drugs that inhibit different resistance mutations provides the potential to move combinations of EGFR inhibitors into first-line treatment and potentially prevent the emergence of on-target resistance mechanisms, similar to the strategy for ABL inhibitors as described above.
Covalent Targeting to Reveal Cryptic Drug-Binding Pockets: KRASG12C
KRAS is a small GTPase that transmits growth factor receptor signals from the cell membrane to intracellular signal transduction cascades, including the RAF–MEK–ERK mitogen-activated protein kinase pathway and the PI3K pathway. KRAS is the most frequently mutated oncogene, with mutations occurring in approximately 16% of all human cancers, and has been the focus of drug discovery efforts for more than 25 years. However, due to its high affinity for GTP and the lack of an alternative pocket, it remained an undruggable target until the focus turned to the creation of KRASG12C mutant–specific inhibitors through covalent binding to the mutant cysteine (70). The identification of cysteine-reactive fragments that selectively bind the mutant cysteine revealed a shallow cryptic pocket adjacent to the nucleotide-binding pocket. This seminal work launched multiple drug discovery efforts, culminating in the development and approval of sotorasib, the first inhibitor of KRASG12C in patients with NSCLC. Approval was based on a single-arm, phase II study in second-line KRASG12C-mutant NSCLC patients previously treated with standard therapies in which sotorasib demonstrated a 37% RR, a median PFS of 6.8 months, and median overall survival of 12.5 months (55). A second KRASG12C inhibitor, adagrasib, which has a similar mechanism of action, was also recently approved in second-line KRASG12C-mutant NSCLC patients and demonstrated comparable efficacy: a 42.9% RR, a median PFS of 6.5 months, and a median overall survival of 12.6 months (160).
The successful development of inhibitors against KRASG12C spurred the discovery of drugs that selectively target KRASG12D noncovalently (161), KRASG12R covalently (162), and wild-type KRAS noncovalently (163) using inhibitors that bind the same cryptic pocket. Leveraging cysteine-reactive small molecules to identify cryptic pockets is an approach that could be applied to other difficult-to-drug targets and encompasses the rapidly growing field of covalent chemical proteomics (92). The ability to screen cysteine-reactive fragment probes or more drug-like small molecules in live cells has the potential to identify cryptic pockets on thousands of proteins that were previously considered undruggable (see “Advances in Medicinal Chemistry” section).
Novel Drug Modalities: Molecular Glues and Degraders
Small molecules that promote the formation of a ternary complex between two proteins to alter the function or induce the degradation of a target protein create the potential for novel mechanisms of action against target proteins that are otherwise difficult to drug. A thorough review of the field is beyond the scope of this article, but the reader can refer to recent review articles (83, 86). Instead, we will provide a brief history of small-molecule glues and degraders with examples of oncology drugs or drug candidates in this class that have been approved or are currently undergoing clinical trials.
The natural product rapamycin was one of the first molecular glues to be discovered, inducing a ternary complex between mTOR and FKBP12, resulting in the allosteric inhibition of mTOR activity (reviewed in ref. 62). Rapamycin analogues with improved drug-like properties such as temsirolimus and everolimus have been approved to treat renal cell carcinoma, with the latter also approved to treat HR+/HER2− breast cancer and pancreatic neuroendocrine tumors. Recently, a bisteric inhibitor of mTORC1 has been described, which is composed of an FKBP12 binding moiety linked to an orthosteric mTOR kinase inhibitor (164), and a clinical candidate, RMC-5552, is undergoing a phase I clinical trial in solid tumors (NCT04774952). Bifunctional inhibitors of KRASG12C have also been described, which create a ternary complex with FKBP12 or cyclophilin A (165), and a phase I trial has recently started with RMC-6236, a so-called tricomplex inhibitor that generates a ternary complex with KRAS and cyclophilin A (NCT05379985).
The discovery that the mechanism of action of thalidomide, a drug used to treat multiple myeloma, involves the formation of a ternary complex between members of the IKAROS family of transcription factors (IKZF1 and IKZF3) and the E3 ubiquitin ligase cereblon (166), leading to the degradation of IKZF1/3, accelerated the emergence of new classes of drugs that function as monovalent degraders through cereblon engagement. For example, monovalent degraders of GSPT1 such as CC-90009 (NCT02848001) and MRT-2359 (NCT05546268) or IKZF2 such as DKY709 (NCT03891953) are currently undergoing clinical trials, and major efforts are ongoing to identify new chemical matter and new E3 ligases that can function together as molecular glue degraders (reviewed in ref. 167). This mechanism has the advantage of not requiring a drug-binding pocket in the POI, but rather relies on the generation of a drug-binding pocket at the interface of the two proteins. Thalidomide derivatives have also been used as cereblon engagers in heterobifunctional degraders (PROTAC), in which they are linked to a small molecule that binds to the POI. Such heterobifunctional degraders can also be generated using small molecules that engage other E3 ubiquitin ligases such as VHL, IAP, or MDM2. There are numerous heterobifunctional degraders in clinical development targeting a wide range of proteins including BCL-xL, BRD9, BTK, EGFRL858R, BRAFV600E, ER, AR, TRK, and IRAK4 (reviewed in ref. 86).
INTEGRATION OF THE TECHNOLOGICAL TOOLBOX
These technology advances over the past 20 years have dramatically expanded the drug hunter's toolbox to address an equally expanding universe of validated, driver oncology targets. Maximally leveraging these technologies requires an integrated and synergistic approach, not simply deploying them in a linear, one-at-a-time manner. For example, advanced chemical proteomics tools can uncover cryptic pockets in proteins (93), but these pockets, while ligandable by reactive probes, may not be large enough or open long enough to make them druggable. Motion-based computational drug discovery may separately reveal transient pockets in such proteins (125). Combining these technologies allows one to understand when and where these reactive cryptic pockets become available for small-molecule ligands, and once identified, how best to advance the structure–activity relationship, aided by structural biology and computational techniques, to provide additional, noncovalent points of interaction to increase the potency and selectivity of the emerging drug-like leads. Additionally, machine learning tools that augment the drug hunter's experience and skill to accelerate specific aspects of drug design have now begun to make their way into the toolbox (112–116). When machine learning is integrated with chemical proteomics, for example, the identification of cryptic pockets and the features of ligands that can bind in these pockets may become predictable with sufficient starting datasets. Deployment of a single tool (“hammer”) against a challenging drug target may yield a suboptimal candidate design, if it yields one at all, as not all drug design challenges are a “nail” (Maslow's hammer concept). As these technologies become more commonplace in the industry, we expect their integrated use (“toolbox”) to become increasingly critical for arriving at high-quality compound designs against a wide range of oncology targets (Fig. 4).
Integration of biology, chemistry, and data science is required to support the identification of novel targets and develop optimized, high-quality drug candidates. SBDD, structure-based drug design.
With these combined tools in hand, the oncology drug hunter now faces the challenge of choosing the right targets with profiles that balance the technical and commercial risks of oncology drug discovery. Although precision targeted therapies can be highly effective in small patient populations, there remain wide therapeutic “white spaces” where exquisitely selective candidates with enhanced drug profiles against high-impact targets may deliver transformational patient outcomes (Fig. 5). These opportunities can be divided into three categories.
1. Clinically validated targets with current therapies that have suboptimal properties and leave significant room for additional patient benefit, such as from improved selectivity over anti-targets of interest or improved pharmacokinetic properties that expand the treatable sites or target coverage in patients. In NSCLC, the ALK inhibitor progression from suboptimal EML4–ALK targeting (crizotinib) to more optimized compounds has yielded remarkable improvement in patient outcomes. KRASG12C, a previously “undruggable” target, has now been clinically validated by commercially available medicines, but newer, differentiated, and potentially “best-in-class” compounds are now being tested in the clinic. New classes of mutant-selective PI3Kα inhibitors such as RLY-2608, LOXO-783, and STX-478 have now entered clinical trials (NCT05216432, NCT05307705, and NCT05768139, respectively). These inhibitors have the potential to prevent the on-target toxicities associated with PI3Kα inhibitors, such as alpelisib, that are caused by the inhibition of wild-type PI3Kα in normal tissues (168).
2. Classically “undruggable” targets that do not have a natural receptor pocket, which are well validated, typically with robust clinical cancer genetics and a preclinical functional genetic data package supporting a precision medicine hypothesis. These include transcription factors, a large class of well-validated tumor vulnerabilities, such as MYC, which is one of the most frequently aberrantly expressed, amplified, or translocated transcription factors across a range of tumors. Examples include lineage-dependent transcription factors (with ER and AR as prime “druggable” examples) or emerging synthetic lethal targets.
3. Novel targets that have not been previously identified or validated, which nevertheless emerge from newer computational biology analysis of large datasets. These targets represent a new frontier of oncology drug discovery and, once validated using cell-based and animal model systems, could significantly expand the number of patients who can benefit from precision oncology. Exciting examples of these that have recently emerged include the GEMINI targets that integrate both cancer somatic and germline population-level genetics to identify targets that are synthetically lethal with highly frequent loss-of-heterozygosity (LOH) events in tumors (169). In that study, the authors identified an essential gene, the DNA primase PRIM1, which is located on a chromosomal locus with high-frequency LOH events in a variety of tumors and which contains polymorphisms common in human populations. Selective targeting of one of the polymorphisms using allele-selective CRISPR techniques led to specific cell killing of patient-derived cells containing the targeted polymorphism, whereas patient-derived cells containing the nontargeted polymorphism were unaffected. These experiments offer proof of concept for the potential to use silent polymorphisms to convert common essential targets to highly selective precision medicine targets.
The majority of targeted therapies serve patient populations of <10,000. Circles represent precision medicines against the indicated target, and colors represent tumor type as shown in the legend. Source: Boston Consulting Group analysis of Decision Resources Group epidemiology, ClinicalTrials.gov, FDA labels, and company websites. Data were gathered for approved precision oncology assets labeled according to their biological target and overall response rate (ORR) vs. prevalence of the relevant metastatic cancer. In instances in which there were several assets approved with the same biological target, ORR was based on the drug with the strongest response. AML, acute myelogenous leukemia; BCC, basal cell carcinoma; CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; CRC, colorectal cancer; FL, follicular lymphoma; GIST, gastrointestinal stromal tumor; HCC, hepatocellular carcinoma; MCL, mantle cell lymphoma; MM, multiple myeloma; MZL, marginal zone lymphoma; NSCLC, non–small cell lung cancer; RCC, renal cell carcinoma; STS, soft tissue sarcoma; TGCT, tenosynovial giant cell tumor; WM, Waldenstrom macroglobulinemia.
The final challenge before the drug hunter embarks on a campaign against a target from one of these three categories is prioritization. Unlike therapeutic areas in which there is a dearth of validated targets and robust model systems, such as neurologic diseases like Alzheimer's, the oncology field has benefited from the immense expansion of validated target space over the past 20 years. For clinically validated targets, one prioritization approach is to systematically and comprehensively map the approved therapeutic and target landscape for precision oncology according to the patient population size and the response rate (RR) or progression-free survival (PFS) these existing drugs have achieved in patients, as shown in Fig. 5. The upper right quadrant of such an analysis contains highly effective therapies for large populations of patients with cancer. The lower left quadrant contains therapies with minimal patient impact (overall response rate <40%) in small patient populations. Low response rates for drugs against some of these targets may reflect tumor biology: such a target is simply not enough of a dependency in the given tumor to lead to meaningful patient responses, regardless of how optimally the drugs inhibit the protein. Alternatively, these targets may represent rich opportunities for an integrated toolbox of drug discovery to create more selective, more potent drugs with improved drug metabolism and pharmacokinetic properties, overcoming the limitations of existing therapies. Similar systematic approaches can be applied to an investigational drug landscape, or even a preclinical target landscape, mapping preclinical data, such as functional genetic dependency scores (universally or selectively in certain cell types), to prioritize targets for drug discovery.
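The quadrant mapping described above can be sketched in a few lines of code. This is a minimal illustration rather than the analysis behind Fig. 5: the 40% overall response rate cutoff comes from the text, whereas using 10,000 patients as the population boundary is an assumption borrowed from the figure caption, and the `Asset` records and target names are hypothetical placeholders, not real drugs or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    target: str      # biological target label (hypothetical placeholder)
    prevalence: int  # patients with the relevant metastatic cancer
    orr: float       # best overall response rate for the target, 0 to 1


def quadrant(asset: Asset,
             prevalence_cut: int = 10_000,  # assumed boundary, see lead-in
             orr_cut: float = 0.40) -> str:
    """Place an asset on an ORR (y axis) versus prevalence (x axis) map."""
    large_population = asset.prevalence >= prevalence_cut
    high_response = asset.orr >= orr_cut
    if large_population and high_response:
        return "upper right: highly effective, large population"
    if high_response:
        return "upper left: highly effective, small population"
    if large_population:
        return "lower right: limited response, large population"
    return "lower left: limited response, small population"


if __name__ == "__main__":
    # Hypothetical landscape entries, for illustration only.
    landscape = [
        Asset("TARGET_A", 50_000, 0.70),
        Asset("TARGET_B", 3_000, 0.75),
        Asset("TARGET_C", 40_000, 0.25),
        Asset("TARGET_D", 2_000, 0.20),
    ]
    for asset in landscape:
        print(f"{asset.target}: {quadrant(asset)}")
```

The same function could be applied to an investigational or preclinical landscape by swapping the response rate for another metric, such as a functional genetic dependency score, and choosing a cutoff appropriate to that metric.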
The various technological approaches and classification of targets described above should also be considered in the context of more detailed metrics that characterize overall druggability such as target class, protein structure, chemical tractability, and precedent as previously described (170).
OUTLOOK FOR PRECISION ONCOLOGY IN THE EVOLVING ONCOLOGY LANDSCAPE
Prior to 2000, cancer treatment had mostly relied on crude methods such as chemotherapy, radiation, and surgery to combat this aggressive disease with moderate success (171). The advent of the genomic era enabled the scientific community to begin using genetic targets such as BCR–ABL for the design of precision drugs. Thus, precision oncology was born in 2001 with the transformational success of imatinib in CML (172). In the 20 years following imatinib, we have used available chemistry and data science tools and emerging biology insights to drive small-molecule designs for genetically defined targets. Of the 160 approved oncology drugs between 2001 and 2021, 68% were small molecules, illustrating their relative impact. However, due to existing limitations in our knowledge of target biology and the absence of more recent chemistry and data science tools, most new compounds could only address targets representing rare patient populations. In addition, many compounds were characterized by narrow therapeutic windows due to toxicities and other undesired properties that could not be engineered out of the molecules. Cumulatively, targeted therapies are estimated to benefit approximately 7% of patients today (5). Despite these limitations, some exceptional molecules have been developed, such as the ALK inhibitor lorlatinib and the EGFR inhibitor osimertinib, both for segments of patients with NSCLC (147, 156, 157). The disparity between the number of new drugs and the paucity of patient benefit can be addressed only with a consistent improvement of molecule designs that allow access to more common cancer targets and offer cleaner drug profiles for wider therapeutic windows. In addition, beyond the science of drug discovery, patient access to genetic testing and approved medicines must improve to maximize benefit in the future.
As is illustrated by the 20-year journey of targeting BCR–ABL with imatinib and several generations of subsequent TKIs, patient outcomes can be incrementally improved (139), transforming CML into a chronic disease with about 5% to 10% of patients achieving treatment-free remission after cessation of TKI therapy (142). With an expanded toolbox of advanced technologies, this time window may be substantially reduced for new cancer targets.
As we have summarized above, the last 20 years have brought an extensive evolution of our biological knowledge base and investigational biology tools accompanied by the introduction of novel technologies in chemistry and data science that now allow a more sophisticated and integrated approach to small-molecule drug discovery. In particular, the access to the full toolbox paired with the scientific skill set of trained drug hunters enables the integration of these tools and their application to the right targets. With an increased ambition to tailor the selectivity of the compound for the target in the diseased cell but not the wild-type cell, and carefully defined drug properties such as improved pharmacokinetic and pharmacodynamic profiles, next-generation small molecules may deliver the desired transformational clinical effects more consistently than ever before. The ambition should now go beyond improvements over existing drugs that have suboptimally addressed known targets and extend to previously undruggable targets and novel targets with complex biology.
Although the outlook for more precise targeted therapies has certainly improved, there will remain biologically relevant cancer targets that cannot be directly drugged even with novel tools such as chemical proteomics. Such targets may be transcription factors with cryptic pockets to address some mutations of p53 or the MYC oncogene, among others. The druggable universe the drug hunter can access has grown but is not infinite.
In totality, with this integrated approach, we expect more frequent creation of transformational medicines for underserved patient populations, driven by better drug profiles offering wider therapeutic windows, better tolerability, longer duration of treatment, more efficacy, and a broader ability to combine new small molecules with other drug classes for optimized therapy and ultimate opportunities to cure disease.
Authors’ Disclosures
D.D. Stuart reports other support from Scorpion Therapeutics outside the submitted work. A. Guzman-Perez reports other support from Scorpion Therapeutics outside the submitted work. N. Brooijmans reports other support from Scorpion Therapeutics outside the submitted work. E.L. Jackson reports other support from Scorpion Therapeutics during the conduct of the study. G.V. Kryukov reports other support from Scorpion Therapeutics and KSQ Therapeutics outside the submitted work. A.A. Friedman reports other support from Scorpion Therapeutics outside the submitted work. A. Hoos reports other support from Scorpion Therapeutics during the conduct of the study, as well as other support from Scorpion Therapeutics outside the submitted work. The authors are all employees of Scorpion Therapeutics and receive a salary and hold stock options.
Acknowledgments
The authors thank Boston Consulting Group for the analysis presented in Fig. 5, Allison Bruce for graphic design, and Sejal Patel for useful discussion.