This article is featured in Highlights of This Issue, p. 697
The era of “big data” has been transformative for many disciplines, particularly for the health sciences with increased access to electronic health records and other personal information. The proliferation of genome-wide association studies across virtually all phenotypes, supported by novel technologies accessible at decreasing costs, demonstrates that strength in numbers can uncover extremely modest associations with statistical significance. A close evaluation of population sciences reveals that virtually all aspects of epidemiologic research are undergoing a similar transformation. Novel technologies are enabling new ways to measure nongenetic exposures; real-time/wearable technologies, smart questionnaires, text messaging, linkages to environmental data sources and other administrative databases, and new global biomarker assays (e.g., metabolomics, methylation, imaging) are all resulting in a breadth and depth of data previously unseen. The development of these tools has, by necessity, had an important influence on research approaches and methodology. Traditional methods using the postal service or random-digit dialing for recruiting participants are no longer feasible. Technological innovations are expanding and altering the nature of population recruitment, consenting, and retention through Web/mobile-based app technologies (e.g., online and/or via social media) that are more efficient and less costly than telephone-based and mail-based methods.
The emerging use of mobile technologies for population research is driven by the strong need for rapid, real-time detection of signals (1). Smart questionnaires and wearable/app-based exposure assessment present the possibility of instantaneous data collection and analyses. The generation of big data has also resulted in a transformation of downstream processes for data storage and analyses. As evidenced by the genomic revolution, an important advantage of these new technologies for population-based research is the precision of results. Moreover, these collective resources are expanding the breadth, depth, and scope of data that can be utilized for research.
Although this is an exciting time for epidemiologic research, fully leveraging these technological and digital advances and translating the efficiencies to the recruitment and retention of patient populations, data collection, and informatics is a major challenge. Many of these new tools and methodologies require population scientists to expand their knowledge base and to adopt new skills that have not traditionally been taught in graduate school. As stated by Mooney and colleagues (2), the value of epidemiology in the future “lies in integrating subject matter knowledge with increased technical savvy.”
The inaugural conference on “Modernizing Population Sciences in the Digital Age” was conceived as an introduction to state-of-the-art technologies and information systems that will enrich epidemiologic research and revolutionize methodologic approaches to the study of human disease. The meeting highlighted numerous examples of transformative population and clinical research that leveraged recent technological advances.
The collection of articles in this CEBP Focus highlights the work of several scientists who are moving their research forward in this digital age through the integration of novel methodologies. Examples from the Cancer Prevention Study-3 (CPS-3) and California Teachers Study (CTS) cohorts demonstrate the deployment of Web-based questionnaires (3, 4) and the use of marketing automation technology to assess real-time response rates and optimize participant engagement (3). The latter example demonstrates the benefit of applying tools from new disciplines (e.g., sales and marketing) that have successfully embraced mobile technologies. Out-of-the-box methods for recruiting participants are also demonstrated by the use of quota sampling as an efficient means for ascertaining specified populations (5). This is just one example of a growing number of population resources, including populations enrolled in direct-to-consumer genetic testing (e.g., 23andMe), that make data available for research purposes. New field tools also abound, with real-time measurement of environmental exposures leveraging the use of smartphone apps and Fitbits in the Nurses' Health Study 3 (NHS3; ref. 6), and the use of digital applications in the field to assess physical and cognitive function in the 10,000 Families Study (7). Real-time mobile tools are also being deployed for intervention studies, such as text-based dietary interventions among colorectal cancer survivors (8) and the use of wearables (e.g., Fitbit) to promote physical activity (9, 10). To handle high-density data, the CTS describes the transformation of its infrastructure to move data storage, management, analysis, and sharing activities to the cloud (11).
Querying large multisource databases can yield important scientific insights as demonstrated by the harmonization of health care databases in Ohio (12), the linkage of electronic health records with cancer registry data (13), and the use of the Utah Population Database to directly address key issues in cancer research, for example, etiologic heterogeneity and cancer pleiotropy (14). Finally, the use of natural language processing and machine/deep learning techniques to explore electronic health records data is demonstrated in an assessment of treatment-related side effects and patient-reported outcomes (15).
It is our hope that this collection of articles emboldens epidemiologists and population scientists to embrace the paradigm shift in research methodology. We acknowledge that thoughtful application of these new technologies and synthesis of resultant “big data” into meaningful scientific discoveries will require substantial investment in new collaborations with researchers in fields not typically associated with population sciences, such as engineering and application development. Nonetheless, the genomics revolution has shown that there is tremendous power in numbers. The need for robust development of predictive markers of health and disease has already led population scientists to collect increasing amounts of intrinsic and extrinsic exposure data for each study participant. We urge our fellow researchers to explore and support the ever-evolving digital world and lead by example, as our colleagues have done in this special section.
See all articles in this CEBP Focus section, “Modernizing Population Science.”
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.