The UK's Institute of Cancer Research, London (ICR) has expanded its CanSAR knowledgebase, a cancer drug–discovery platform that integrates genomic, chemical, and pharmacologic data and claims to be the world's largest disease database of its type.

CanSAR 2.0 holds data on more than 8 million experimentally derived measurements, nearly 1 million biologically active chemical compounds, and more than 1,000 cancer cell lines. “In the new release, we've more than doubled the data, added a lot of patient-derived data, and added new prediction methodologies,” says Bissan Al-Lazikani, PhD, team leader in computational biology and chemogenomics at the ICR.

“CanSAR spans the worlds of chemistry, pharmacology, network biology, and genetics,” Al-Lazikani says. “Additionally, instead of just concentrating all these data in one place, it translates the data from these different scientific domains into a common language and presents them for you in an easy-to-digest-and-explore manner.”

For example, researchers studying potential targets “need to gather together as much of the related data as possible, such as whether there are chemical tools for the targets and compounds that are active for them,” she says. “We put all of these data in one place and make it as easy to digest as we could make it, given the complexity.”

Additionally, CanSAR incorporates a number of algorithms that predict which targets are most likely to be successful in a drug-discovery context. “They look at 3D structure information, chemical information, protein–protein interaction data, and so on, and they come up with information and mechanisms that the researcher can then use to triage targets,” she says.

Scientists can use the knowledgebase “as the first port of call,” Al-Lazikani says. “If you're interested in a gene, for example, asking CanSAR to tell you absolutely everything that's known about it can save you weeks of digging around. Additionally, in the new version, we've got summaries not just for the genes and the drugs but for protein families and cancer cell lines.”

Originally built for the ICR's own use with funding from Cancer Research UK, CanSAR has been freely available since 2010, and more than 26,000 outside researchers have accessed it.

ICR plans numerous further enhancements for the knowledgebase. “CanSAR currently lets you look at an entire protein interaction network and immediately identify any potential deregulated proteins within it, any druggable proteins within it, any proteins with chemical tools, and so on,” Al-Lazikani says. “We want to help people to better interrogate and interact with these networks. We're also looking for ways to capture the outcomes of clinical trials. Additionally, further down the road, we are talking with colleagues within ICR about making links between imaging data and all of these molecular data.”

“Overall,” she says, “our goal is to make it easier for scientists to effectively utilize the mass of multidisciplinary data for translational research, especially drug discovery.”