Abstract
Like other cancer epidemiologic cohorts, the California Teachers Study (CTS) has experienced declining participation to follow-up questionnaires; neither the reasons for these declines nor the steps that could be taken to mitigate these trends are fully understood.
The CTS offered its sixth study questionnaire (Q6) in the fall of 2017 using an integrated, online system. The team delivered a Web- and mobile-adaptive questionnaire to 45,239 participants via e-mail using marketing automation technology. The study's integrated platform captured data on recruitment activities that may influence overall response, including the date and time invitations and reminders were e-mailed and the date and time questionnaires were started and submitted.
The overall response rate was 43%. Participants ages 65 to 69 were 25% more likely to participate than their younger counterparts (OR = 1.25; 95% CI, 1.18–1.32) and nonwhite participants were 28% less likely to participate than non-Hispanic white cohort members (OR = 0.72; 95% CI, 0.68–0.76). Previous questionnaire participation was strongly associated with response (OR = 6.07; 95% CI, 5.50–6.70). Invitations sent after 2 pm had the highest response (OR = 1.75; 95% CI, 1.65–1.84), as did invitations sent on Saturdays (OR = 1.48; 95% CI, 1.36–1.60).
An integrated system that captures paradata about questionnaire recruitment and response can enable studies to quantify the engagement patterns and communication preferences of cohort members.
As cohorts continue to collect scientific data, it is imperative to collect and analyze information on how participants engage with the study.
See all articles in this CEBP Focus section, “Modernizing Population Science.”
Introduction
Cancer epidemiological cohorts (CEC) are a cornerstone of epidemiologic research and uniquely positioned to drive future discovery. These cohorts have traditionally collected self-reported data on health status, lifestyle factors, and attitudes from study participants via paper-based, mailed questionnaires, the response rate to which has steadily declined over time (1). Systematic studies of how best to administer these questionnaires to improve response and overall study retention within cohorts are limited (2, 3). How to maintain response to mailed paper surveys is also a challenge in other fields and industries (4, 5).
For this and other reasons, many population-based studies are adopting Web-based questionnaires (6, 7). Web-based surveys for epidemiologic research enable cohorts to streamline data collection, receive questionnaire responses more quickly, swiftly identify and respond to problems within the questionnaire, and automate recruitment (8). However, the same issues surrounding paper questionnaire logistics apply to Web-based administration: what is the best way to distribute questionnaires, and how can researchers use them to ensure high-quality, representative data from study participants?
The move to Web-based questionnaires makes it easier for cohort studies to begin to address these questions. Intrinsic in Web-based questionnaire administration is the ability to collect paradata: data generated during—and about—the data collection process (9). Web-based surveys can capture key information about participant preferences and the effect of these preferences on response. Recorded at the respondent level, this information—who responds to the questionnaire, how they respond, when they respond, the device(s) on which they respond, etc.—lays the foundation to evaluate survey distribution methods and their effect on the collection of self-reported data.
The California Teachers Study (CTS), an ongoing CEC established in the mid-1990s (10), has distributed five paper-based questionnaires and experienced the same declining response seen across the field. In transitioning the study's most recent survey to the Web, the team designed the questionnaire to collect paradata at each stage of questionnaire recruitment in order to identify which elements were associated with response.
Materials and Methods
Study population
The CTS began in 1995 to 1996, when 133,479 female public-school professionals (primarily teachers or administrators, ages 22–104) completed a mailed self-administered questionnaire. All participants have been invited to complete four paper-based follow-up questionnaires, in 1997 to 1999 (Q2); 2000 to 2002 (Q3); 2005 to 2007 (Q4); and 2012 to 2015 (Q5). Grant award U01-CA199277, from 2015 to 2020, funded the CTS to administer a sixth follow-up survey (Q6) that gave participants the option to complete the survey electronically or on paper.
All CTS participants who were alive and had not withdrawn or opted out of future contact were to be invited to complete Q6 (see Fig. 1). Participants who had provided an e-mail address were initially recruited for the Web-based version of Q6, that is, Questionnaire 6 Web (Q6web), as described in this paper. Participants who had not provided an e-mail address were to receive the paper version of Q6, that is, Q6paper.
Of the 133,479 original CTS participants, 2 had withdrawn from the CTS, 29,970 were deceased, and 2,060 had opted out of further participation at the time of the questionnaire. Four additional participants were excluded due to discrepancies with their baseline questionnaires. Another 12,144 participants completed the CTS baseline survey in 1995 to 1996 but had not completed any of the four CTS follow-up questionnaires; we considered this subgroup to be inactive. The remaining 89,299 participants were eligible for Q6 in fall of 2017. Of these participants, 45,239 (50.7%) had provided the CTS with an e-mail address and were considered eligible for the Web-based questionnaire. Seven eligible participants died shortly after Q6web recruitment began and are therefore excluded from this analysis.
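The eligibility accounting above is straightforward to verify; as a quick sketch, using only the counts reported in this paragraph:

```python
# Eligibility accounting for Q6; all counts are from the paragraph above.
baseline  = 133_479  # original CTS participants (1995-1996)
withdrawn = 2        # withdrew from the CTS
deceased  = 29_970   # deceased at the time of the questionnaire
opted_out = 2_060    # opted out of further participation
excluded  = 4        # baseline-questionnaire discrepancies
inactive  = 12_144   # never completed any follow-up questionnaire

eligible = baseline - withdrawn - deceased - opted_out - excluded - inactive
assert eligible == 89_299  # eligible for Q6 in fall 2017

with_email = 45_239  # had provided an e-mail address
print(f"{with_email / eligible:.1%} of eligible participants were recruited for Q6web")
# -> 50.7% of eligible participants were recruited for Q6web
```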
Questionnaire content
The CTS Steering Committee (https://www.calteachersstudy.org/team) determined content with consideration of current and new scientific hypotheses, the need to update key participant lifestyle and health factors, and emerging topics of interest. Question wording was based on previous questionnaires and validated questions from other sources. Q6 topics included physical activity, health conditions, medications, family health history, reproductive history, body size, financial stress, social support, sexual orientation and gender identity, and medicinal cannabis.
Questionnaire platform
Since 2014, the CTS has consolidated study management activities within a single study customer relationship management (CRM) platform hosted on Salesforce.com (https://www.salesforce.com/). Initially developed to manage customer acquisition and sales cycles, CRM platforms integrate all customer “touchpoints”—and paradata about those touchpoints—within a single central system with the ultimate goal of improving customer relationships (11). The CTS CRM is used to manage participant contact information and recruitment activities for projects that collect data or biospecimens.
We evaluated multiple survey platforms for their ability to integrate directly with the study CRM, their ease of use, compatibility with mobile devices, and security. The CTS purchased an annual subscription to Qualtrics.com (https://www.qualtrics.com/), an experience management company that provides customer, brand, product, and employee experience platforms. For CTS research, we used the Qualtrics survey tools, which removed the need to custom-code a CTS-specific platform. All Qualtrics surveys are Web and mobile enabled, use Transport Layer Security (TLS) encryption, and meet the technical requirements of the Health Information Technology for Economic and Clinical Health Act (HITECH).
Questionnaire design
Qualtrics' point-and-click questionnaire development software offers over 150 question templates. After the CTS agreed on the topics and question order, one CTS team member added the content to Qualtrics, specified question formats, configured display logic, created test questionnaires, and directly integrated Qualtrics with the study CRM, all without support from software developers or institutional IT staff.
Display logic and embedded data
Display logic and embedded data are standard components in self-reported questionnaires used by cancer epidemiology cohorts. Display logic, also known as skip logic, determines which questions are presented to participants based on their previous responses within the questionnaire. Display logic helps achieve efficiency and accuracy by hiding questions from participants for whom they would be redundant and asking additional questions of participants for whom follow-up is applicable. Embedded data are additional information stored directly within a questionnaire. For example, previous CTS questionnaires included questions such as “Since 2005, have you…”; the dates embedded in these questions frame the period of interest. In the previous CTS paper surveys, the skip logic and embedded data were incorporated into the design of the pages and therefore held constant across participants' questionnaires.
For Q6web, we used display logic and embedded data to individualize participants' questionnaires. We identified the self-reported data from previous questionnaires that were relevant to Q6 and embedded those data in the questionnaire. Each participant's questionnaire contained her name, birthdate, mailing address, e-mail address(es), and phone number(s), and participants were asked to verify or correct this contact information. Participants' menopausal status and the month and year of their most recent questionnaire drove the display logic: women who had previously reported that they were postmenopausal were not presented with the menopausal status section, and participants were asked whether they had used hormone therapy since the date of their last completed questionnaire.
Applying this individual-level data for all eligible participants generated a personalized, unique questionnaire link for each Q6web recipient.
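To make those mechanics concrete, the minimal sketch below shows how embedded data, display logic, and a personalized link can combine into an individualized questionnaire. It is an illustration only, not the CTS's Qualtrics configuration; the field names, URL scheme, and helper functions are hypothetical.

```python
from datetime import date

# Hypothetical participant profile drawn from prior questionnaires.
participant = {
    "id": "CTS-0012345",
    "first_name": "Jane",
    "postmenopausal": True,                  # reported on a prior questionnaire
    "last_questionnaire": date(2013, 4, 15),
}

def build_questionnaire(p):
    """Assemble a personalized question list from embedded data and display logic."""
    questions = []
    # Embedded data: the participant's own history frames the period of interest.
    since = p["last_questionnaire"].strftime("%B %Y")
    questions.append(f"Since {since}, have you used hormone therapy?")
    # Display logic: women who previously reported being postmenopausal
    # skip the menopausal-status section entirely.
    if not p["postmenopausal"]:
        questions.append("What is your current menopausal status?")
    return questions

def personal_link(p, base="https://survey.example.org/q6"):
    """A unique URL per participant ties her responses and progress to her
    record (placeholder URL scheme, not the CTS's)."""
    return f"{base}?pid={p['id']}"

for q in build_questionnaire(participant):
    print(q)
print(personal_link(participant))
```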
Questionnaire dissemination
The CTS agreed upon the questionnaire recruitment methodology as outlined in Fig. 2A, whereby a participant would receive follow-up e-mails depending on her interaction with the first questionnaire invitation (Invite 1). Study participants could receive up to three invitations and three reminder e-mails for a maximum of six e-mails within a 30-day period.
The CTS utilized Pardot.com (https://www.pardot.com/), a marketing automation software supplied by Salesforce.com, to implement this protocol. Marketing automation software enables users to automate activities, such as sending e-mails, based on predefined rules (12, 13). These rules use “if, then” statements to determine action based on time intervals and/or participant behavior.
CTS staff designed e-mail templates in Pardot for each Q6 invite and reminder as shown in Fig. 2A. Templates were created once; Pardot automatically personalized the first name and questionnaire URL embedded in the e-mail when it was sent.
Staff added these e-mail templates to a Q6 nurturing campaign. Nurturing campaigns employ behavior-based e-mail automation: the “if, then” rules governing automated tasks depend on the e-mail recipient's actions. For example, whenever a participant met the CTS' rule for receiving Invite 2, Pardot automatically sent her the Invite 2 e-mail template with her first name and personal questionnaire link.
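The behavior-based logic of such a campaign can be pictured as a small decision function over each participant's recruitment state. The sketch below is a simplified stand-in for rules that Pardot expresses through its interface rather than code; the waiting periods are hypothetical, and only the three-invitation and three-reminder caps come from the protocol above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecruitmentState:
    days_since_last_email: int  # days since the last e-mail to this participant
    started: bool               # has she clicked her link and started Q6web?
    submitted: bool             # has she submitted the questionnaire?
    invites_sent: int           # invitations sent so far (protocol maximum: 3)
    reminders_sent: int         # reminders sent so far (protocol maximum: 3)

def next_email(s: RecruitmentState) -> Optional[str]:
    """Decide the next automated e-mail, if any, from the participant's behavior."""
    if s.submitted:
        return None  # goal reached; the campaign stops.
    if s.started:
        # Started but not finished: send a reminder, not another invitation.
        if s.reminders_sent < 3 and s.days_since_last_email >= 5:
            return f"Reminder {s.reminders_sent + 1}"
        return None
    # Not yet started: send the next invitation after a waiting period.
    if s.invites_sent < 3 and s.days_since_last_email >= 7:
        return f"Invite {s.invites_sent + 1}"
    return None

# A participant who started Q6web 6 days ago but never submitted gets Reminder 1.
print(next_email(RecruitmentState(6, True, False, 1, 0)))  # -> Reminder 1
```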
Participant experience
Participants started their questionnaires by clicking on the unique URL in their e-mail invitation. Each questionnaire link recorded that participant's progress as she advanced through the questionnaire. This permitted participants to start and stop the questionnaire at will, switch browsers, or change devices while preserving their previous answers. The bottom of each screen contained the CTS's toll-free number and invited participants to call if they encountered any difficulties or had questions.
Data collection and integration
Within the CTS's CRM platform, each participant has a “participant record” that includes all the data associated with her relevant CTS activities. Study staff directly integrated Qualtrics and Pardot with the study CRM so activities occurring within these external platforms were automatically documented on each participant's record (see Fig. 3). Questionnaire responses and the paradata associated with those responses—including start time, completion time, start date, completion date, the device used at questionnaire start, operating system used, etc.—were recorded on the participant's record, as were paradata on all the recruitment activities and their outcomes. These data included the date and time each recruitment e-mail was sent, the total number of invitations and reminders e-mailed, and e-mail hard bounces, that is, e-mails that could not be delivered to the recipient's inbox.
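One way to picture the integrated participant record is as a compact schema that pairs survey paradata with the recruitment events described above. The sketch below is illustrative only; the CRM's actual objects and field names are assumptions here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class EmailEvent:
    """One recruitment e-mail and its outcome, as logged by the automation platform."""
    template: str               # e.g., "Invite 1", "Reminder 2"
    sent_at: datetime           # date and time the e-mail was sent
    hard_bounced: bool = False  # True if it never reached the recipient's inbox

@dataclass
class ParticipantRecord:
    """A participant's record, combining survey responses with recruitment paradata."""
    participant_id: str
    emails: List[EmailEvent] = field(default_factory=list)
    survey_started_at: Optional[datetime] = None
    survey_submitted_at: Optional[datetime] = None
    start_device: Optional[str] = None     # device used at questionnaire start
    operating_system: Optional[str] = None

    @property
    def completed(self) -> bool:
        return self.survey_submitted_at is not None

    @property
    def emails_sent(self) -> int:
        return len(self.emails)
```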
Statistical analysis
This analysis includes data on recruitment and responses collected between October 2017 and June 2019. The primary outcome was completion of Q6web. Secondary outcomes were completion of Q6web after the first, second, or third invitations. The primary potential explanatory variables were participant age, race, cancer history, and previous questionnaire participation, as well as the paradata on the number of invitations and reminders sent, the day of week that invitations were sent, and the time of day that invitations were sent.
We used Chi-square tests to evaluate differences in the distributions of response across categories of exposure. We did not prespecify the times of day at which the e-mail invitations were sent; for analysis, we categorized send times into approximate quartiles (see Table 2). We used logistic regression to calculate ORs and 95% confidence intervals (95% CI) associated with participant and recruitment characteristics. All analyses were completed with SAS 9.4 using the CTS Data Warehouse (https://www.calteachersstudy.org/for-researchers).
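For orientation, the sketch below shows how a univariate OR and its Wald 95% CI can be computed from a 2×2 table of exposure by response. The counts are invented for illustration; the analyses reported here were run in SAS, and the adjusted estimates come from logistic regression models.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and Wald 95% CI from a 2x2 table:
       a = exposed responders,   b = exposed nonresponders,
       c = unexposed responders, d = unexposed nonresponders."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(or_)
    lo = math.exp(log_or - z * se)
    hi = math.exp(log_or + z * se)
    return or_, lo, hi

# Invented example: 300 of 1,000 "exposed" vs. 200 of 1,000 "unexposed" responded.
or_, lo, hi = odds_ratio_ci(300, 700, 200, 800)
print(f"OR = {or_:.2f}; 95% CI, {lo:.2f}-{hi:.2f}")  # -> OR = 1.71; 95% CI, 1.40-2.11
```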
Table 1. Characteristics of CTS participants recruited for Q6web compared with the rest of the CTS population.

| Participant characteristics | Recruited for Q6web (N = 45,239ᵃ) | Rest of population (N = 44,060) | P value |
| --- | --- | --- | --- |
| Age | | | |
| Younger than 65 | 12,295 (27.8%) | 11,434 (26.0%) | <0.0001 |
| 65–69 | 8,175 (18.1%) | 6,050 (13.7%) | |
| 70–74 | 9,510 (21.0%) | 6,640 (15.1%) | |
| 75–79 | 7,119 (15.7%) | 5,385 (12.2%) | |
| 80+ | 8,140 (18.0%) | 14,551 (33.0%) | |
| Race | | | |
| Non-Hispanic white/Caucasian | 39,975 (88.4%) | 37,039 (84.1%) | <0.0001 |
| Nonwhite | 5,264 (11.6%) | 7,021 (15.9%) | |
| Personal history of cancer at Q6 | | | |
| None | 42,345 (93.6%) | 35,069 (79.6%) | <0.0001 |
| Cancer survivor at Q6 | 2,894 (6.4%) | 8,991 (20.4%) | |
| Responses to previous CTS questionnaires | | | |
| Completed Questionnaire 5 (Q5) | 41,446 (91.6%) | 21,407 (48.6%) | <0.0001 |
| Did not complete Q5 | 3,793 (8.4%) | 22,653 (51.4%) | |
| Completed all prior questionnaires | 31,138 (68.8%) | 13,275 (30.1%) | <0.0001 |
| Did not complete all prior questionnaires | 14,101 (31.2%) | 30,785 (69.9%) | |
ᵃ Seven participants died shortly after Q6web recruitment began. Those participants are included in Table 1 but excluded from the other reported results.
Results
Recruitment population
CTS participants who were recruited for Q6web (Q6web recruits) differed from the group that was not recruited (Table 1). Q6web recruits were statistically significantly younger than participants who were not recruited: participants over age 80 made up 18% of the Q6web recruits but 33% of the nonrecruited population. Compared with the CTS participants who were not recruited for Q6web, Q6web recruits were less likely to have a personal history of cancer. Within the recruited population, cancer survivors were more likely than participants without a personal history of cancer to be over age 80 (34% vs. 17%).
Q6web recruits were also more likely to be white and approximately twice as likely to have completed the previous CTS questionnaire (Q5, in 2012–2015) or to have completed all the CTS follow-up surveys since the CTS began in 1995 to 1996.
Recruitment invitations
A total of 19,481 of the 45,232 participants, or 43.1%, completed Q6web. Over half (50.8%) of the respondents completed their questionnaire after the first invitation (Invite 1) and without needing any reminder e-mails (Fig. 2B). Another 6% of total responders completed their questionnaire after receiving the first reminder e-mail (Reminder 1); the second and third reminder e-mails produced fewer responses.
Each follow-up invitation was roughly half as effective as the one before it: response after the second invitation (Invite 2) accounted for 23% of the total response, and response after the third (Invite 3) accounted for only 13%. The same pattern of declining returns occurred with the reminder e-mails within each invitation wave: responses to the second and third reminders were consistently 60% to 75% lower than responses to Reminder 1 (Fig. 2B). As the cumulative number of e-mails sent increased after Invite 2, the total cumulative number of completed questionnaires remained relatively flat (Fig. 2C).
Table 2 presents the days of the week and the time of day that Invites 1, 2, and 3 were sent. All 45,232 eligible participants were sent Invite 1, but Invite 2 and Invite 3 were only sent to participants who had not yet started their questionnaire. Therefore, the three denominators—45,232 for Invite 1; 31,784 for Invite 2; and 26,079 for Invite 3—are nested subsets that include the same nonresponders. Invitation e-mails were not evenly distributed across time of day and day of week (Table 2). A higher percentage of invitation e-mails were sent mid-week than on weekends and in the early morning than later in the day, and the proportion of e-mails sent before 10 am increased with subsequent invitations.
Table 2. Number of Q6web invitation e-mails sent, by day of week and time of day, and the percentage of recipients who completed Q6web.

Invite 1

| Time of day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. | Total sent | % Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Before 10 am | 598 | 499 | 1,042 | 2,606 | 1,718 | 4,199 | 1,949 | 12,611 | 22.7% |
| 10 am–11 am | 2,650 | 2,725 | 2,275 | 2,061 | 1,647 | 348 | 1,234 | 12,940 | 24.4% |
| 12 pm–1 pm | 1,933 | 3,462 | 11 | 927 | 994 | 2,693 | 0 | 10,020 | 26.9% |
| 2 pm and later | 281 | 1,617 | 4,100 | 998 | 2,402 | 0 | 263 | 9,661 | 29.8% |
| Total sent | 5,462 | 8,303 | 7,428 | 6,592 | 6,761 | 7,240 | 3,446 | 45,232ᵃ | |
| % Completed | 13.1% | 18.0% | 15.1% | 15.2% | 14.1% | 15.4% | 9.1% | | 25.6% |
Invite 2

| Time of day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. | Total sent | % Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Before 10 am | 1,838 | 1,931 | 1,607 | 3,541 | 2,794 | 3,307 | 2,721 | 17,739 | 14.5% |
| 10 am–11 am | 20 | 245 | 359 | 1,973 | 254 | 31 | 265 | 3,147 | 14.9% |
| 12 pm–1 pm | 0 | 22 | 993 | 17 | 136 | 413 | 20 | 1,601 | 20.1% |
| 2 pm and later | 1,678 | 1,795 | 2,107 | 1,976 | 473 | 321 | 947 | 9,297 | 18.8% |
| Total sent | 3,536 | 3,993 | 5,066 | 7,507 | 3,657 | 4,072 | 3,953 | 31,784 | |
| % Completed | 17.5% | 15.7% | 12.4% | 14.7% | 18.2% | 19.2% | 17.6% | | 16.1% |
Invite 3

| Time of day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. | Total sent | % Completed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Before 10 am | 2,633 | 3,294 | 2,591 | 2,212 | 2,651 | 2,174 | 2,763 | 18,318 | 9.1% |
| 10 am–11 am | 1,208 | 13 | 0 | 714 | 489 | 1,280 | 12 | 3,716 | 12.1% |
| 12 pm–1 pm | 12 | 0 | 58 | 855 | 895 | 23 | 292 | 2,135 | 14.7% |
| 2 pm and later | 51 | 145 | 177 | 734 | 266 | 0 | 537 | 1,910 | 18.2% |
| Total sent | 3,904 | 3,452 | 2,826 | 4,515 | 4,301 | 3,477 | 3,604 | 26,079 | |
| % Completed | 10.3% | 10.3% | 10.1% | 10.5% | 9.7% | 10.4% | 13.5% | | 10.7% |
ᵃ Excludes the 7 participants who died shortly after Q6web recruitment began.
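With paradata recorded at the participant level, cross-tabulations like Table 2 reduce to a few lines of analysis code. A minimal pandas sketch, assuming a flat extract with one row per Invite 1 recipient and hypothetical column names:

```python
import pandas as pd

# Hypothetical participant-level extract: one row per Invite 1 recipient.
df = pd.DataFrame({
    "day_sent":  ["Sat.", "Tue.", "Sat.", "Mon.", "Fri.", "Tue."],
    "time_sent": ["2 pm+", "Before 10 am", "2 pm+", "10-11 am",
                  "Before 10 am", "2 pm+"],
    "completed": [1, 0, 1, 1, 0, 0],
})

# Counts of invitations sent in each time-of-day / day-of-week cell.
sent = pd.crosstab(df["time_sent"], df["day_sent"],
                   margins=True, margins_name="Total")

# Percentage completed within each cell.
pct_completed = pd.crosstab(df["time_sent"], df["day_sent"],
                            values=df["completed"], aggfunc="mean") * 100

print(sent)
print(pct_completed.round(1))
```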
The ORs in Table 3 model the odds of having completed Q6web: ORs above 1.00 indicate a positive association between that characteristic and completing the questionnaire. Compared with participants under age 65, participants ages 65 to 79 were significantly more likely to complete Q6web (ORs ranged from 1.11 to 1.25, see Table 3). In contrast, participants over age 80 years were 50% less likely to complete Q6web compared with those under age 65 (OR = 0.50; 95% CI, 0.47–0.54). Nonwhite participants were less likely to complete Q6web compared with non-Hispanic white participants (OR = 0.72; 95% CI, 0.68–0.76). Cancer survivorship was not significantly associated with completion of Q6web. Among cancer survivors, type of cancer diagnosis was not associated with Q6web response (Chi-square P-value = 0.08).
Table 3. Associations between participant characteristics and completion of Q6web.

| Characteristic | Univariate OR (95% CI) | Multivariateᵃ OR (95% CI) |
| --- | --- | --- |
| Age | | |
| <65 | 1.00 (Ref.) | 1.00 (Ref.) |
| 65–69 | 1.25 (1.18–1.32) | 1.19 (1.12–1.26) |
| 70–74 | 1.24 (1.18–1.31) | 1.16 (1.09–1.22) |
| 75–79 | 1.11 (1.04–1.17) | 1.01 (0.95–1.07) |
| 80+ | 0.50 (0.47–0.54) | 0.46 (0.43–0.49) |
| Race | | |
| Non-Hispanic white | 1.00 (Ref.) | 1.00 (Ref.) |
| Nonwhite | 0.72 (0.68–0.76) | 0.74 (0.70–0.79) |
| Cancer history | | |
| No history of cancer | 1.00 (Ref.) | 1.00 (Ref.) |
| Cancer survivor at Q6 | 1.06 (0.98–1.14) | 0.94 (0.87–1.02) |
| Previous response: completed Q5 | | |
| No | 1.00 (Ref.) | 1.00 (Ref.) |
| Yes | 6.07 (5.50–6.70) | 6.01 (5.45–6.63) |
| Previous response: completed all previous questionnaires | | |
| No | 1.00 (Ref.) | 1.00 (Ref.) |
| Yes | 3.06 (2.93–3.20) | 3.04 (2.90–3.17) |
ᵃ Adjusted for age, race, cancer status, and having completed Q5.
Participants who completed the previous CTS survey (Q5, 2012–2015) were six times more likely to complete Q6web than participants who did not complete Q5 (OR = 6.07; 95% CI, 5.50–6.70). Having completed all the previous CTS follow-up surveys was significantly associated with completing Q6web (OR = 3.06; 95% CI, 2.93–3.20). The multivariate ORs were not markedly different from the univariate ORs, which were adjusted for categorical age only.
Table 4 presents the associations between the time of day and the day of week that Invite 1 was e-mailed and the odds of those participants completing Q6web. The univariate ORs are adjusted for categorical age only; the multivariate ORs are adjusted for race, cancer survivorship, and having completed Q5. Compared with participants to whom Invite 1 was e-mailed before 10 am, participants to whom Invite 1 was e-mailed later in the day were statistically significantly more likely to have completed Q6web. Multivariate adjustment attenuated these associations, but the pattern of associations remained consistent. Participants to whom Invite 1 was sent after 2 pm were 50% more likely to have completed Q6web than participants to whom Invite 1 was e-mailed before 10 am (OR = 1.51; 95% CI, 1.43–1.60).
Table 4. Associations between the time of day and the day of week that Invite 1 was sent and completion of Q6web.

| Invite 1 timing | Univariate OR (95% CI) | Multivariateᵃ OR (95% CI) |
| --- | --- | --- |
| Time of day | | |
| Before 10 am | 1.00 (Ref.) | 1.00 (Ref.) |
| 10–11 am | 1.27 (1.20–1.33) | 1.14 (1.09–1.21) |
| 12–1 pm | 1.38 (1.31–1.46) | 1.30 (1.23–1.37) |
| 2 pm and later | 1.75 (1.65–1.84) | 1.51 (1.43–1.60) |
| Day of week | | |
| Monday | 1.27 (1.19–1.35) | 1.20 (1.13–1.29) |
| Tuesday | 1.00 (Ref.) | 1.00 (Ref.) |
| Wednesday | 1.21 (1.13–1.29) | 1.26 (1.17–1.35) |
| Thursday | 1.10 (1.03–1.18) | 1.16 (1.08–1.24) |
| Friday | 1.04 (0.97–1.11) | 1.04 (0.97–1.11) |
| Saturday | 1.48 (1.36–1.60) | 1.34 (1.24–1.46) |
| Sunday | 1.32 (1.23–1.42) | 1.22 (1.13–1.31) |
ᵃ Adjusted for the covariates listed in Table 3.
Response to Q6web was also associated with the day of week that Invite 1 was sent. Compared with participants to whom Invite 1 was e-mailed on a Tuesday, participants whose first invitation was e-mailed on any other day except Friday (OR = 1.04; 95% CI, 0.97–1.11) were significantly more likely to complete Q6web (ORs ranged from 1.16–1.34, see Table 4). Multivariate adjustment for participant characteristics reduced the magnitude of the associations with Saturday and Sunday (OR = 1.34; 95% CI, 1.24–1.46 and OR = 1.22; 95% CI, 1.13–1.31, respectively) but slightly increased the magnitude of the associations with Wednesday and Thursday (OR = 1.26; 95% CI, 1.17–1.35 and OR = 1.16; 95% CI, 1.08–1.24, respectively).
Table 5 shows the associations with Q6web response in models that (i) included Invite 1 time of day and day of week and (ii) were adjusted for the participant characteristics of age, race, cancer survivorship, and Q5 completion. The reference groups for these ORs are the participants who were sent Invite 1 before 10 am and on a Tuesday. Among all participants who were sent Invite 1, those who were sent that invitation e-mail after 2 pm (OR = 1.80; 95% CI, 1.69–1.92) or on a Saturday (OR = 1.79; 95% CI, 1.64–1.96) were almost twice as likely to complete Q6web. Invite 1 e-mails sent any time after 10 am, or on any day other than a Tuesday, were significantly associated with completion of Q6web. Additional adjustment for retirement age (<65 vs. ≥65) or self-reported retirement at Q5 did not materially change these associations.
Table 5. Associations between Invite 1 time of day and day of week and completion of Q6web, overall and by Q5 response status. Each model includes both time of day and day of week.

| Invite 1 timing | All Q6web recruits, OR (95% CI)ᵃ | Q5 responders, OR (95% CI)ᵃ | Q5 nonresponders, OR (95% CI)ᵃ |
| --- | --- | --- | --- |
| Time of day | | | |
| Before 10 am | 1.00 (Ref.) | 1.00 (Ref.) | 1.00 (Ref.) |
| 10–11 am | 1.20 (1.14–1.27) | 1.18 (1.11–1.25) | 2.44 (1.78–3.32) |
| 12–1 pm | 1.34 (1.26–1.42) | 1.32 (1.24–1.40) | 1.80 (1.37–2.36) |
| 2 pm and later | 1.80 (1.69–1.92) | 1.80 (1.69–1.93) | 1.93 (1.27–2.93) |
| Day of week | | | |
| Monday | 1.33 (1.24–1.43) | 1.36 (1.27–1.47) | 0.89 (0.58–1.37) |
| Tuesday | 1.00 (Ref.) | 1.00 (Ref.) | 1.00 (Ref.) |
| Wednesday | 1.53 (1.42–1.65) | 1.52 (1.41–1.64) | 2.22 (1.55–3.18) |
| Thursday | 1.25 (1.17–1.35) | 1.27 (1.18–1.36) | 1.32 (0.93–1.86) |
| Friday | 1.37 (1.26–1.48) | 1.41 (1.30–1.53) | 0.96 (0.63–1.45) |
| Saturday | 1.79 (1.64–1.96) | 1.82 (1.66–1.99) | 1.00 (0.50–2.00) |
| Sunday | 1.45 (1.34–1.57) | 1.49 (1.37–1.61) | 0.70 (0.37–1.33) |
ᵃ Adjusted for the covariates listed in Table 3.
Because the majority of Q6web responders had completed Q5, these associations were nearly identical in models restricted to participants who had completed Q5. Among the subgroup who had not completed Q5, however, day of week was not as strongly or clearly associated with response. Compared with participants who were sent Invite 1 on a Tuesday, only those sent Invite 1 on a Wednesday were more likely to complete Q6web (OR = 2.22; 95% CI, 1.55–3.18; see Fig. 4A). Time of day remained associated with Q6web response, but the estimates were less precise than in the entire invited population (see Fig. 4B). In this subgroup, participants sent their invitation between 10 and 11 am were almost two and a half times more likely to complete the questionnaire (OR = 2.44; 95% CI, 1.78–3.32), a larger but less precise estimate than among all Q6web recruits (OR = 1.20; 95% CI, 1.14–1.27). Invitation e-mails sent to Q5 nonresponders bounced twice as often (12% bounce rate) as invitation e-mails sent to participants who had completed Q5 (6% bounce rate).
A total of 640 Q6web recruits (1.4%) called the toll-free number. Participants ages 75 and older called more frequently than their younger counterparts; these age groups comprised 34% of the recruited population but 52% of the callers (Fig. 5).
Discussion
To distribute our sixth questionnaire, the CTS used an integrated combination of a commercial survey platform, marketing automation software, and the existing CRM. This approach provided a professional-grade and user-friendly questionnaire experience to study participants. It also generated new data about how and when participants responded. The days of the week and times of day on which the invitations were e-mailed were significantly associated with whether participants completed the questionnaire. Even among the subgroup of CTS participants who have responded most consistently to all follow-up surveys, timing mattered. Participants to whom we sent invitation e-mails on Saturdays or in the afternoons were more likely to complete the survey than participants to whom we sent invitation e-mails in the middle of the week or in the mornings, respectively. These associations raise interesting questions about how CEC questionnaire recruitment protocols ultimately influence participant response.
This integrated approach had multiple strengths. The commercially available survey platform was user-friendly, customizable, and could be easily configured for this survey. The software was easily managed by study staff; the entire implementation—from content brainstorm to questionnaire delivery—was completed in 10 months. The range of question and answer templates accommodated the CTS' scientific needs, and the survey platform allowed us to extensively personalize the questionnaire at scale, for over 43,000 participants. This streamlined the experience for participants, increased the efficiency of our data collection process, and improved data consistency. The Qualtrics platform intrinsically collected new paradata and included multiple options for pilot-testing and reviewing results immediately after participants submitted their survey. Internal data suggest our survey costs were approximately 50% lower for Q6web than for previous CTS surveys.
This approach also has some limitations. Our team had to develop proficiency in the marketing automation software used to disseminate the survey. Integrating the survey platform, the marketing automation software, and our CRM required that someone on staff manage each platform. This strategy built on our existing experience using a CRM for a 2013 to 2016 CTS biobanking project. Studies that have not previously used a CRM or integrated their data in this way would face a steeper learning curve, but all three of the platforms we used are user-friendly and designed to be configured by users for their specific needs.
Another limitation is the absence of paradata from earlier CTS questionnaires. Previous surveys were mailed, scannable questionnaires. Data on response to those surveys based on day or date of mailing were not collected. In the absence of those data, it is impossible to determine whether the associations we observed here existed during previous surveys; if they did, it is impossible to know whether these associations have changed over time.
We did not directly ask participants why they responded when they did. Future surveys should try to determine how much those patterns reflect participants' preferences vs. potentially random behavior. It would also be informative to know why participants chose not to respond, although that information is challenging to ascertain.
This is a hypothesis-generating analysis of how we collected data, rather than an analysis of existing scientific data. When we designed the e-mail invitation schedule, we did not actively choose which days of the week or times of day to send the survey. Although we observed associations between these “exposures”—the paradata on how we delivered Q6—and the outcome of response versus nonresponse, these associations could be confounded by other data that we did not measure, such as what participants were experiencing when they received the invitations. We assume those choices were non-random, but more data would be needed to understand those associations.
This analysis of CTS paradata demonstrates that every recruitment activity generates informative data that should be collected, analyzed, and used to improve CEC research. There is a need for this type of infrastructure science and paradata, especially for real-world data (14). These CTS paradata are a positive side effect of our decision to use commercially available survey and marketing automation software; paradata are essential components of marketing and communications in other industries and sectors. The similarities between those industries and the ways CECs need to collect patient-reported data are strong enough to warrant making detailed paradata a standard component of a CEC's data collection strategy, regardless of whether surveys are electronic or on paper. The associations between Q6web response and time of day, day of week, and number of recruitment e-mails sent are based on a single CTS survey; whether these patterns generalize to future CTS surveys or to other CECs will remain unknown until additional data from other surveys are available for comparison. Future CTS data collection protocols should be designed on the basis of extensive pilot testing that uses this type of paradata. For example, if pilot testing confirmed that e-mails sent later in the day and later in the week generated higher response, then subsequent full-scale recruitment e-mails could be scheduled only within those high-yield time windows. Other large-scale CECs, which require significant investments of time, personnel, and funding to collect data, might benefit from a similar approach to understand their participants' behavior and align recruitment strategies with those behavior patterns.
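As a hedged sketch of that pilot-then-schedule idea, a recruitment scheduler could filter candidate send times to pilot-confirmed high-yield windows; the window definitions below (Saturdays, or any day after 2 pm) echo this survey's results and are illustrative only.

```python
from datetime import datetime

def in_high_yield_window(send_time: datetime) -> bool:
    """True if the proposed send time falls in a pilot-identified high-yield
    window: here, any Saturday, or any day after 2 pm (illustrative only)."""
    is_saturday = send_time.weekday() == 5  # Monday = 0 ... Saturday = 5
    after_2pm = send_time.hour >= 14
    return is_saturday or after_2pm

candidates = [
    datetime(2017, 10, 2, 9, 30),    # Monday morning      -> hold
    datetime(2017, 10, 7, 10, 0),    # Saturday            -> send
    datetime(2017, 10, 4, 15, 45),   # Wednesday, 3:45 pm  -> send
]
for t in candidates:
    print(t, "send" if in_high_yield_window(t) else "hold")
```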
Overall, these data indicate that response to CEC follow-up surveys is associated with more than just the demographic characteristics of the participants. As CECs consider how to efficiently collect additional self-reported data from participants and patients (14), further exploration of these types of metadata and paradata has the potential to improve data collection protocols and the resulting research.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: K.E. Savage, J.V. Lacey Jr
Development of methodology: K.E. Savage, J.L. Benbow, C. Duffy, S.S. Wang, J.V. Lacey Jr
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K.E. Savage, J.L. Benbow, N.T. Chung
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.E. Savage, E.S. Spielfogel, N.T. Chung, J.V. Lacey Jr
Writing, review, and/or revision of the manuscript: K.E. Savage, J.L. Benbow, E.S. Spielfogel, N.T. Chung, S.S. Wang, M.E. Martinez, J.V. Lacey Jr
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E.S. Spielfogel, J.V. Lacey Jr
Study supervision: K.E. Savage, J.L. Benbow, J.V. Lacey Jr
Other (design and execution): K.E. Savage, J.L. Benbow, C. Duffy, S.S. Wang, J.V. Lacey Jr
Acknowledgments
The California Teachers Study and the research reported in this publication were supported by the NCI of the NIH under award numbers U01-CA199277, P30-CA033572, P30-CA023100, UM1-CA164917, and R01-CA077398. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NCI or the NIH.
The collection of cancer incidence data used in the California Teachers Study was supported by the California Department of Public Health pursuant to California Health and Safety Code Section 103885; Centers for Disease Control and Prevention's National Program of Cancer Registries, under cooperative agreement 5NU58DP006344; the NCI's Surveillance, Epidemiology, and End Results Program under contract HHSN261201800032I awarded to the University of California, San Francisco, contract HHSN261201800015I awarded to the University of Southern California, and contract HHSN261201800009I awarded to the Public Health Institute. The opinions, findings, and conclusions expressed herein are those of the author(s) and do not necessarily reflect the official views of the State of California, Department of Public Health, the National Cancer Institute, the National Institutes of Health, the Centers for Disease Control and Prevention or their Contractors and Subcontractors, or the Regents of the University of California, or any of its programs.
The authors thank the California Teachers Study Steering Committee that is responsible for the formation and maintenance of the Study within which this research was conducted. A full list of California Teachers Study team members is available at https://www.calteachersstudy.org/team.
Great thanks are also due to the California Teachers Study participants, who have given their time and commitment to the study since 1995. The data and information they provided have been an invaluable contribution to women's health and cancer research.