A National Cancer Policy Forum workshop weighs the challenges in analyzing the flood of all types of data produced by cancer studies.

Workshop weighs challenges in analyzing the flood of data of all types

Every day, in laboratories and clinics across the globe, cancer researchers generate mind-boggling amounts of data—from gene sequences and digital images of myriad different tumors to information on patient outcomes. Their challenge is “to develop an integrated network system to deal with the overwhelming amount of information from basic, translational, and clinical research,” according to William Dalton, PhD, MD, director of the Moffitt Cancer Center and Research Institute (Tampa, FL).

“This is not easy and it's not cheap,” said Dalton, quoting National Cancer Institute (NCI) director Harold Varmus, MD. “I think that summarizes it right there.”

Dalton, along with 30 other academic and industry experts, spoke at “Informatics: Needs and Challenges in Cancer Research,” a National Cancer Policy Forum workshop at the Institute of Medicine in Washington, DC, on February 27.

Dana-Farber Cancer Institute Chief Medical Officer Lawrence Shulman, MD, noted that in dealing with clinical data, such an integrated network system must comply with federal health information privacy and security laws. “De-identifying” patient information might make this possible, but a bigger problem is the current state of electronic health records (EHRs), which are neither uniform nor in sufficiently widespread use.

Moreover, even the best EHRs don't track key cancer-related information, Shulman said, including patient demographics and tumor type with both anatomic and nonanatomic staging (for example, whether the tumor has a KRAS or EGFR kinase mutation).

Other challenges, Dalton commented, include technical problems with data sharing, such as the need to safeguard intellectual property rights, and questions of data governance, such as ensuring the validity and quality of data and making sure that data can be understood in different contexts, depending on who needs it.

Describing the lessons learned to date from NCI informatics efforts, Daniel Masys, MD, an affiliate professor of biomedical and health informatics at the University of Washington in Seattle, emphasized that addressing all these concerns must be paired with setting achievable targets.

Launched in 2004, NCI's Cancer Biomedical Informatics Grid (caBig) was designed to help collect, manage, and analyze the explosion of data from cancer researchers. But a 2011 review of caBig by a working group of NCI's Board of Scientific Advisors criticized the project for what it described as serious failings, including a technology-centric approach to data sharing and “unfocused expansion.”

Masys, who chaired the caBig review committee, recommended smaller, more targeted goals, such as creating applications where researchers have a pre-existing motivation to share data. And instead of trying to design a “perfect” program, developers should make one that's usable, he advised. “We need to shift our frame of reference from engineering specifications for a provable, correct system to one that adds immediate function, and we'll get it right along the way,” he said.