This article is featured in Highlights of This Issue, p. 697

The era of “big data” has been transformative for many disciplines, particularly the health sciences, which now have increased access to electronic health records and other personal information. The proliferation of genome-wide association studies across virtually all phenotypes, supported by novel technologies available at decreasing cost, demonstrates that strength in numbers can uncover even very modest associations with statistical significance. A close look at the population sciences reveals that virtually all aspects of epidemiologic research are undergoing a similar transformation. Novel technologies are enabling new ways to measure nongenetic exposures: real-time and wearable technologies, smart questionnaires, text messaging, linkages to environmental data sources and other administrative databases, and new global biomarker assays (e.g., metabolomics, methylation, imaging) are all generating data of unprecedented breadth and depth. By necessity, the development of these tools has had an important influence on research approaches and methodology. Traditional methods of recruiting participants through the postal service or random-digit dialing are no longer feasible. Technological innovations are expanding and altering the nature of population recruitment, consent, and retention through Web- and mobile-based technologies (e.g., online platforms and social media), which are more efficient and less costly than telephone- and mail-based methods.

The emerging use of mobile technologies for population research is predicated on a strong need for rapid or real-time signal detection (1). Smart questionnaires and wearable or app-based exposure assessment offer the possibility of instantaneous data collection and analysis. The generation of big data has also transformed downstream processes for data storage and analysis. As the genomic revolution has shown, an important advantage of these new technologies for population-based research is the precision of their results. Moreover, these collective resources are expanding the breadth, depth, and scope of data that can be used for research.

Although this is an exciting time for epidemiologic research, fully leveraging these technological and digital advances and translating their efficiencies to the recruitment and retention of patient populations, data collection, and informatics is a major challenge. Many of these new tools and methodologies require population scientists to expand their knowledge base and to adopt new skills that have not traditionally been taught in graduate school. As stated by Mooney and colleagues (2), the value of epidemiology in the future “lies in integrating subject matter knowledge with increased technical savvy.”

The inaugural conference on “Modernizing Population Sciences in the Digital Age” was conceived as an introduction to state-of-the-art technologies and information systems that will enrich epidemiologic research and revolutionize methodologic approaches to the study of human disease. The meeting highlighted numerous examples of transformative population and clinical research that leveraged recent technological advances.

The articles in this CEBP Focus highlight the work of several scientists who are advancing their research in the digital age by integrating novel methodologies. Examples from the Cancer Prevention Study-3 (CPS-3) and California Teachers Study (CTS) cohorts demonstrate the deployment of Web-based questionnaires (3, 4) and the use of marketing automation technology to assess response rates in real time and optimize participant engagement (3). The latter example shows the benefit of borrowing tools from disciplines (e.g., sales and marketing) that have successfully embraced mobile technologies. Out-of-the-box methods for recruiting participants are also illustrated by the use of quota sampling as an efficient means of ascertaining specified populations (5). This is just one example of a growing number of population resources, including populations enrolled in direct-to-consumer genetic testing (e.g., 23andMe), that make data available for research purposes. New field tools also abound, including real-time measurement of environmental exposures using smartphone apps and Fitbits in the Nurses' Health Study 3 (NHS3; ref. 6) and digital applications used in the field to assess physical and cognitive function in the 10,000 Families Study (7). Real-time mobile tools are also being deployed in intervention studies, such as a Web-based dietary intervention with text messages among colorectal cancer survivors (8) and the use of wearables (e.g., Fitbit) to promote physical activity (9, 10). To handle high-density data, the CTS investigators describe the transformation of their infrastructure, moving data storage, management, analysis, and sharing activities to the cloud (11).
Querying large multisource databases can yield important scientific insights, as demonstrated by the harmonization of health care databases in Ohio (12), the linkage of electronic health records with cancer registry data (13), and the use of the Utah Population Database to directly address key issues in cancer research such as etiologic heterogeneity and cancer pleiotropy (14). Finally, the use of natural language processing and machine and deep learning techniques to explore electronic health record data is demonstrated in an assessment of treatment-related side effects and patient-reported outcomes (15).

It is our hope that this collection of articles will embolden epidemiologists and population scientists to embrace the paradigm shift in research methodology. We acknowledge that thoughtfully applying these new technologies and synthesizing the resulting “big data” into meaningful scientific discoveries will require substantial investment in new collaborations with researchers in fields not typically associated with population sciences, such as engineers and application developers. Nonetheless, the genomics revolution has shown that there is tremendous power in numbers. The need to develop robust predictive markers of health and disease has already led population scientists to collect increasing amounts of intrinsic and extrinsic exposure data for each study participant. We urge our fellow researchers to explore and support the ever-evolving digital world and to lead by example, as our colleagues have done in this special section.

See all articles in this CEBP Focus section, “Modernizing Population Science.”

No potential conflicts of interest were disclosed.

1. Ehrenstein V, Nielsen H, Pedersen AB, Johnsen SP, Pedersen L. Clinical epidemiology in the era of big data: new opportunities, familiar challenges. Clin Epidemiol 2017;9:245–50.
2. Mooney SJ, Westreich DJ, El-Sayed AM. Commentary: epidemiology in the era of big data. Epidemiology 2015;26:390–4.
3. Savage KE, Benbow JL, Duffy C, Spielfogel ES, Chung NT, Wang SS, et al. Using marketing automation to modernize data collection in the California Teachers Study cohort. Cancer Epidemiol Biomarkers Prev 2020;29:714–23.
4. Rittase M, Kirkland E, Dudas DM, Patel AV. Survey item response rates by survey modality, language, and sociodemographic factors in a large U.S. cohort. Cancer Epidemiol Biomarkers Prev 2020;29:724–30.
5. Miller CA, Guidry JPD, Dahman B, Thomson MD. A tale of two diverse Qualtrics samples: information for online survey researchers. Cancer Epidemiol Biomarkers Prev 2020;29:731–5.
6. Fore R, Hart JE, Choirat C, Thompson JW, Lynch K, Laden F, et al. Embedding mobile health technology into the Nurses' Health Study 3 to study behavioral risk factors for cancer. Cancer Epidemiol Biomarkers Prev 2020;29:736–43.
7. Thyagarajan B, Nelson HH, Poynter JN, Prizment AE, Roesler MA, Cassidy E, et al. Field application of digital technologies for health assessment in the 10,000 Families Study. Cancer Epidemiol Biomarkers Prev 2020;29:744–51.
8. Van Blarigan EL, Kenfield SA, Chan JM, Van Loon K, Paciorek A, Zhang L, et al. Feasibility and acceptability of a Web-based dietary intervention with text messages for colorectal cancer: a randomized pilot trial. Cancer Epidemiol Biomarkers Prev 2020;29:752–60.
9. Liao Y, Basen-Engquist KM, Urbauer DL, Bevers TB, Hawk E, Schembre SM. Using continuous glucose monitoring to motivate physical activity in overweight and obese adults: a pilot study. Cancer Epidemiol Biomarkers Prev 2020;29:761–8.
10. Robertson MC, Green CE, Liao Y, Durand CP, Basen-Engquist KM. Self-efficacy and physical activity in overweight and obese adults participating in a worksite weight loss intervention: multistate modeling of wearable device data. Cancer Epidemiol Biomarkers Prev 2020;29:769–76.
11. Lacey JV Jr, Chung NT, Hughes P, Benbow JL, Duffy C, Savage KE, et al. Insights from adopting a data commons approach for large-scale observational cohort studies: the California Teachers Study. Cancer Epidemiol Biomarkers Prev 2020;29:777–86.
12. Tian YD, Menegay H, Waite KA, Saroufim PG, Beno MF, Barnholtz-Sloan JS. Facilitating cancer epidemiologic efforts in Cleveland via creation of longitudinal de-duplicated patient data sets. Cancer Epidemiol Biomarkers Prev 2020;29:787–95.
13. Thompson CA, Jin A, Luft HS, Lichtensztajn DY, Allen L, Liang S-Y, et al. Population-based registry linkages to improve validity of electronic health record-based cancer research. Cancer Epidemiol Biomarkers Prev 2020;29:796–806.
14. Hanson HA, Leiser CL, Madsen MJ, Gardner J, Knight S, Cessna M, et al. Family study designs informed by tumor heterogeneity and multi-cancer pleiotropies: the power of the Utah Population Database. Cancer Epidemiol Biomarkers Prev 2020;29:807–15.
15. Hernandez-Boussard T, Blayney DW, Brooks JD. Leveraging digital data to inform and improve quality cancer care. Cancer Epidemiol Biomarkers Prev 2020;29:816–22.