Background: Cancer epidemiology studies offer unique value to the cancer research enterprise. With large populations and detailed data, including repeat measurements, collected over many years, cancer epidemiology studies contribute immensely to the understanding of real-world cancer risks and outcomes. To accelerate research, discovery, and translation, and to reduce the overall burden of cancer, the NCI has embraced the vision, outlined by the Cancer Moonshot Blue Ribbon Panel, of creating a national cancer data ecosystem that enables “all participants across the cancer research and care continuum to contribute, access, combine, and analyze diverse data.” The cancer data ecosystem will integrate data from many domains using an overall infrastructure that leverages the elasticity, scalability, and security of cloud computing. The first components of the ecosystem are based on genomics and imaging; other areas, including biomarkers, animal models, clinical trials, EHRs, and observational studies, will follow. The cancer data ecosystem would be greatly enhanced by data from the dozens of cancer epidemiology studies that are ongoing today, especially the prospective cohorts. However, many of these studies began before cloud computing existed and do not use it to store, manage, analyze, and share their data. Cancer epidemiology studies today must decide how to transition from their legacy data approaches to the cloud-first approaches that the national cancer data ecosystem requires.

Methods: The California Teachers Study (CTS) is a large prospective cancer epidemiology cohort that began in 1995-1996. In 2014, the CTS decided to adopt cloud computing to improve the quality, speed, and visibility of its data collection, management, analysis, and sharing. Since 2015, in partnership with the San Diego Supercomputer Center and with funding from award U01-CA199277, the CTS has built a new and innovative infrastructure that is secure yet flexible and that includes a data warehouse, data marts, and a shared workspace.

Results: The CTS infrastructure contains the technical, security, data, and process features necessary to participate in the national cancer data ecosystem. Our novel solution serves as a prototype that other cancer epidemiology studies, and observational studies and cohorts in general, can emulate. In designing, developing, and deploying this solution, our team encountered a number of challenges that are probably not unique to the CTS and that other observational studies and cohorts will likely face as they modernize their data and infrastructure. This presentation will discuss lessons learned in three key areas of the CTS data-modernization process. First, although cancer epidemiology has always been multidisciplinary, the transition to a cancer data ecosystem creates opportunities for even broader collaborations with new partners, both public and private. Second, a central benefit of the national cancer data ecosystem is the ability to process data through standardized pipelines, which should incentivize cancer epidemiology studies to prioritize standardizing common epidemiologic data activities. Third, the cancer data ecosystem creates opportunities for new entities, not just traditional NIH-funded cancer epidemiology studies, to contribute meaningfully to the observational research landscape. Although competition for funding and relevance seems likely to increase, the overall effects on the cancer research enterprise could be positive.

Conclusion: Observational studies, and especially cancer epidemiology studies, will be a vital component of the national cancer data ecosystem, but before that can happen, existing cancer epidemiology studies will need to modernize their core data processes and infrastructure drastically and rapidly. As an early adopter of cloud computing and of the key foundational principles of the cancer data ecosystem, the CTS has developed a fully functioning, successful example of how existing cancer epidemiology studies can modernize their infrastructure in ways that preserve their unique strengths and further increase their immense potential and value.

Citation Format: Jim Lacey. Insights from adopting a data commons approach: The California Teachers Study [abstract]. In: Proceedings of the AACR Special Conference on Modernizing Population Sciences in the Digital Age; 2019 Feb 19-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(9 Suppl):Abstract nr IA22.