Background:

Large-scale prospective cohorts traditionally use English, paper-based, mailed surveys, but Web-based surveys can lower costs and increase data quality, and multi-language surveys may aid in capturing diverse populations. Little evidence exists examining item response for multiple survey modalities or languages in epidemiologic cohorts.

Methods:

A total of 254,475 men and women completed a comprehensive lifestyle and medical survey at enrollment (2006–2013) for the Cancer Prevention Study-3, a U.S.-based prospective cohort. Surveys were offered Web-based (English only) or on paper (Spanish or English). Using generalized linear models, we examined differences in item response rates by modality and language, both overall and within topical areas (e.g., reproductive history). We further examined whether response quality differed by sociodemographic characteristics within each survey modality.

Results:

Overall, English Web-based surveys had the highest average item response rate (97.6%), followed by English paper (95.5%) and Spanish paper (83.1%). Lower item response rates were seen among nonwhite, lower income, or less-educated participants. When examining individual survey sections by topic, results varied the most for residential history, with the lowest item response rate among Spanish language respondents (women, 62.7% and men, 64.3%) and the highest in English language Web-based, followed by paper respondents (women, 94.6% and men, 95.3%; and women, 92.8% and men, 92.1%, respectively).

Conclusions:

This study suggests that multimodal survey approaches in epidemiologic studies do not differentially affect data quality. However, for some topic areas, further assessment of data quality differences in Spanish language surveys should be considered.

Impact:

Multimodal survey administration is effective in nondifferentially capturing high-quality data.


Most large-scale epidemiologic cohort studies in the United States capture lifestyle and medical data from participants using self-administered, English language, paper-based mailed surveys (1–3) because this method has been effective in yielding high response rates and low item nonresponse (4). However, drawbacks to paper surveys include high cost and longer data collection time (5, 6). With widespread internet access and user-friendly Web-based survey platforms that make survey completion easier, newer cohort studies have begun to offer Web-based surveys as an alternative to paper surveys (7, 8). Web-based surveys have the potential to reduce survey administration costs, including printing and postage, manual data processing and editing, and time for participants and study personnel (7, 9–16).

Several studies that have incorporated Web-based surveys have seen higher response rates, better data quality, and more complete data (6, 9–14). For example, Web-based surveys can help to decrease or eliminate invalid responses or missed items by incorporating prompts and alerts to the responder to ensure valid responses (6, 9, 11, 12). Some (10, 11, 13, 14), but not all (9), studies also have found that Web-based surveys tend to have less item nonresponse than paper-based surveys, in part due to the survey design requiring responses to questions.

In addition to different survey modalities, offering surveys in different languages is often necessary to increase diversity in population studies. In 2015, Hispanics were the largest ethnic minority population in the United States, accounting for 17.6% of the U.S. population (56.6 million people), and an estimated 40 million U.S. residents spoke Spanish at home (13.3% of the total U.S. population; ref. 17). However, when using non-English surveys, it is essential to consider cultural context, translation quality, and the targeted population's comprehension of research methods (18–21).

While more epidemiologic cohort studies are offering surveys translated into other languages, as well as using Web-based survey methods to capture participant data, there remains a paucity of evidence examining differences in data quality and completeness across different survey modalities in the field of epidemiology. It is important for epidemiologic cohorts to examine differences in item response rates by survey modality and language, not only overall, but by specific topical areas to ensure that data quality does not vary by these factors. For example, if there was a greater proportion of missing data for a specific set of questions among paper survey responders, this may result in incomplete control for confounding among the exposures measured by those questions. Thus, understanding systematic differences in completion based on survey modality or language is important.

Because item nonresponse is a crucial factor in survey data quality, it is essential to assess the types of nonresponse as well as develop strategies to prevent or reduce missing data. It is important to consider item response rates and why data are missing: if data are not missing at random (i.e., missingness is related to participant characteristics or unwillingness to respond to sensitive questions), the variables related to missing data should be included in the adjustment of the analysis model (22, 23). Therefore, we investigated item response rates for the baseline survey by survey modality (paper or Web-based) and survey language (English or Spanish), as well as by age, sex, and other sociodemographic factors, in the Cancer Prevention Study-3 (CPS-3), a large-scale prospective cohort study recruited by the American Cancer Society (ACS) across the United States and Puerto Rico.

Study population

Between 2006 and 2013, 303,682 participants between the ages of 30 and 65 were enrolled in CPS-3. The cohort, recruitment methods, and participant characteristics have been described in detail elsewhere (24). In brief, the cohort was recruited from 35 states, Puerto Rico, and the District of Columbia at both ACS fundraising events and community enrollment drives. At an enrollment event, participants completed a short on-site paper survey and provided a blood sample. Participants were then asked to complete a longer baseline survey at home to provide more detailed medical and lifestyle information. The baseline survey was available in English (paper or Web-based) and in Spanish (paper only). A total of 254,650 participants completed the baseline survey; 67 were excluded after they revoked their participation in the study. We further excluded those who returned the baseline survey with no survey questions completed (n = 108), leaving 254,475 participants in the final analytic cohort. All aspects of CPS-3 have been reviewed and approved by the Emory University Institutional Review Board.
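To make the cohort assembly concrete, the minimal pandas sketch below mirrors the two exclusion steps. The file name and columns (`revoked`, `n_answered`) are assumptions for illustration only, not CPS-3 field names or the study's actual data pipeline.

```python
import pandas as pd

# Hypothetical returns table; `revoked` (bool) and `n_answered` (int) are assumed names.
returned = pd.read_csv("cps3_baseline_returns.csv")   # 254,650 returned baseline surveys

eligible = returned[~returned["revoked"]]             # drop the 67 who revoked participation
analytic = eligible[eligible["n_answered"] > 0]       # drop the 108 with no questions answered

print(len(analytic))                                  # expected: 254,475
```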

The baseline surveys administered to participants were sex-specific. The surveys were nearly identical except for additional reproductive history and medical condition questions on the women's survey. During the multiyear enrollment period, there were three versions of the English paper survey (2006, 2007, and 2010). The Web-based English survey was first introduced in 2007 and did not change throughout the enrollment period; it was structured to mirror the 2007 paper survey, and every question was optional to complete. The Spanish paper survey was introduced in 2007 with an updated version in 2010, but no Spanish Web-based version was offered. Thus, over the entire enrollment period, six unique survey versions were used to capture baseline data for each of men and women (Table 1).

Table 1.

Description of 12 baseline survey versions used, CPS-3, USA, 2006–2013.

Gender | Year(s) used | Language | Modality | Total required questions
Women | 2006 | English | Paper | 169
Women | 2007–2009 | English | Paper | 181
Women | 2007–2009 | Spanish | Paper | 181
Women | 2007–2013 | English | Web-based | 143
Women | 2010–2013 | English | Paper | 157
Women | 2010–2013 | Spanish | Paper | 156
Men | 2006 | English | Paper | 149
Men | 2007–2009 | English | Paper | 160
Men | 2007–2009 | Spanish | Paper | 160
Men | 2007–2013 | English | Web-based | 123
Men | 2010–2013 | English | Paper | 138
Men | 2010–2013 | Spanish | Paper | 137

Assessment of survey item response rates

Item response rate was calculated by identifying all “required” questions and summing the number that were answered. A question was considered “required” if, based on the format of the question, all participants would be able to provide an answer and the question was not part of a logical skip pattern (i.e., missing by design based on the answer to the leading question). In addition, questions such as current weight, or other questions that all participants should reasonably be able to answer, were included as required questions. Because of the various versions of the baseline survey, the absolute number of required questions differed from year to year (Table 1). On average, there were 165 (range = 143–181) required questions across the different women's baseline survey versions and 145 (range = 123–160) for men. Each participant's item response rate was then calculated as the number of required questions answered divided by the total number of required questions for that survey version.
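As an illustration of this calculation (not the study's actual processing code), a short pandas sketch is shown below; it assumes a hypothetical long-format table with one row per participant per required question for the survey version that participant completed.

```python
import pandas as pd

# Hypothetical input: participant_id, version, question_id, answered (1 if a response was given).
answers = pd.read_csv("required_item_answers.csv")

# Total required questions per survey version (143-181 for women, 123-160 for men).
n_required = (answers.groupby("version")["question_id"].nunique()
                      .rename("n_required").reset_index())

# Per-participant item response rate = required questions answered / total required for that version.
rates = (answers.groupby(["participant_id", "version"], as_index=False)["answered"].sum()
                .merge(n_required, on="version"))
rates["item_response_rate"] = 100 * rates["answered"] / rates["n_required"]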

To assess the quality of data throughout the survey, we calculated not only overall survey item response rates but also item response rate by survey section [general, medical, medications, vitamins, supplements, reproductive history (women only), screening, family history, residential, sun exposure, physical activity, occupation, tobacco use, second-hand smoke, alcohol use, demographic/other].
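A section-level version of the same calculation could be sketched as follows, assuming a hypothetical lookup that maps each required question to its survey section; this is illustrative only.

```python
import pandas as pd

# Hypothetical inputs: per-item answers plus a question-to-section lookup.
answers = pd.read_csv("required_item_answers.csv")    # participant_id, version, question_id, answered
sections = pd.read_csv("question_sections.csv")       # version, question_id, section

# % of each section's required items answered, per participant.
per_section = (answers.merge(sections, on=["version", "question_id"])
                      .groupby(["participant_id", "section"])["answered"]
                      .mean() * 100)

# Average section-level item response rate across participants (cf. Fig. 1).
print(per_section.groupby(level="section").mean().sort_values())
```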

Statistical analysis

Item response rate differences by survey modality and language were examined using generalized linear models (GENMOD procedure), adjusted for covariates (race, education, gender, and age) by including indicator variables. The largest group in each comparison was used as the reference category, and 99% confidence intervals (CI) were used to determine statistical significance. Because of the large sample size, very small percentage differences in item response rates between groups were often statistically significant even though they did not necessarily translate to a meaningful difference; therefore, after observing the distribution of item response rates, we defined a “meaningful” difference as a >2% difference in average item response rate between groups.
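The authors fit these models in SAS PROC GENMOD. As a rough illustration of the same comparison, the sketch below uses an analogous Gaussian (identity-link) model in Python statsmodels, with hypothetical column names, 99% CIs, and the >2% screen; it is not the study's code.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed frame: one row per participant with item_response_rate (%), survey
# (EnglishWeb / EnglishPaper / SpanishPaper), race, education, gender, age_group.
df = pd.read_csv("cps3_item_response.csv")

# English Web-based (the largest group) as the reference category, via treatment coding.
model = smf.ols(
    "item_response_rate ~ C(survey, Treatment(reference='EnglishWeb'))"
    " + C(race) + C(education) + C(gender) + C(age_group)",
    data=df,
).fit()

# 99% CIs; flag modality/language contrasts that are significant AND differ by >2 points.
est = model.params.filter(like="survey")
ci = model.conf_int(alpha=0.01).loc[est.index]
meaningful = est[(est.abs() > 2) & ((ci[1] < 0) | (ci[0] > 0))]
print(pd.concat([est, ci], axis=1))
print(meaningful)
```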

We also assessed item response rate differences by various demographic characteristics (gender, race, age, and education) within each survey modality to identify any potential descriptive differences in demographics by survey modality. These models were also adjusted for the general demographic characteristics. Finally, item response rates by section were also assessed to determine consistency of completion throughout the survey. Analyses were conducted using SAS version 9.4 (SAS Institute Inc.).
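The within-modality demographic comparisons could be sketched the same way, refitting the adjusted model separately in each survey type (again an illustrative Python analogue, reusing the hypothetical frame above, rather than the SAS code actually used).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same hypothetical frame as in the previous sketch.
df = pd.read_csv("cps3_item_response.csv")

# Refit the adjusted model within each survey type to compare demographic groups.
for survey_type, grp in df.groupby("survey"):
    fit = smf.ols(
        "item_response_rate ~ C(race) + C(education) + C(gender) + C(age_group)",
        data=grp,
    ).fit()
    print(survey_type)
    print(fit.params.to_frame("estimate").join(fit.conf_int(alpha=0.01)))
```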

Results

This analysis included 254,475 participants (22% men and 85.3% non-Hispanic white), among whom 61.2% completed the English Web-based survey (122,495 women and 33,165 men), 38.2% completed the English paper survey (74,754 women and 22,370 men), and 0.7% completed the Spanish paper survey (1,257 women and 434 men; Table 2). The average survey item response rate overall was 96.7%. The distribution of item response rates by survey modality/language is shown in Fig. 1. A total of 37.6% of English Web-based surveys were 100% complete, followed by 12.9% of English paper surveys and 2.8% of Spanish paper surveys. English Web-based surveys had the highest average survey item response rate (97.6%), followed by English paper (95.5%) and Spanish paper (83.1%; Table 3).

Table 2.

Demographic characteristics overall and by survey modality, CPS-3 baseline cohort, USA, 2006–2013.

Characteristic | All survey types (N = 254,475), N (%) | English Web-based (n = 155,660), n (%) | English paper (n = 97,124), n (%) | Spanish paper (n = 1,691), n (%)
Gender
 Men | 55,969 (22) | 33,165 (21.3) | 22,370 (23) | 434 (25.7)
 Women | 198,506 (78) | 122,495 (78.7) | 74,754 (77) | 1,257 (74.3)
Age
 <40 | 58,904 (23.1) | 37,207 (23.9) | 21,292 (21.9) | 405 (24)
 40–49 | 75,045 (29.5) | 43,395 (27.9) | 31,040 (32) | 610 (36.1)
 50–59 | 87,627 (34.4) | 54,423 (35) | 32,695 (33.7) | 509 (30.1)
 ≥60 | 32,899 (12.9) | 20,635 (13.3) | 12,097 (12.5) | 167 (9.9)
Race
 White | 216,982 (85.3) | 133,894 (86) | 83,076 (85.5) | 12 (0.7)
 Black | 9,771 (3.8) | 5,915 (3.8) | 3,854 (4) | 2 (0.1)
 Hispanic | 16,657 (6.5) | 9,140 (5.9) | 6,039 (6.2) | 1,478 (87.4)
 Other | 10,927 (4.3) | 6,709 (4.3) | 4,019 (4.1) | 199 (11.8)
Education
 <High school | 1,797 (0.7) | 504 (0.3) | 1,082 (1.1) | 211 (12.5)
 High school graduate | 22,306 (8.8) | 9,956 (6.4) | 12,092 (12.5) | 258 (15.3)
 Some college | 38,548 (15.1) | 21,632 (13.9) | 16,816 (17.3) | 100 (5.9)
 Technical degree | 38,007 (14.9) | 22,020 (14.1) | 15,678 (16.1) | 309 (18.3)
 College graduate | 82,776 (32.5) | 54,323 (34.9) | 28,018 (28.8) | 435 (25.7)
 Graduate degree | 69,378 (27.3) | 47,176 (30.3) | 21,901 (22.5) | 301 (17.8)
Figure 1.

Item response rates by survey section in women and men, CPS-3 baseline cohort, USA, 2006–2013. Demo/other, demographic/other; Family HX, family history; Phys act, physical activity; Reproduc HX, reproductive history; Sun exp, sun exposure.

Table 3.

Survey mean item response rate by survey type (all types, English Web-based, English paper, Spanish paper) and participant characteristics, unadjusted and adjusted for covariates, CPS-3 baseline cohort, USA, 2006–2013.

Survey type columns, left to right: All survey types (N = 254,475); English Web-based (n = 155,660); English paper (n = 97,122); Spanish paper (n = 1,691). Each cell gives: mean item response rate (%); unadjusted % difference (99% CI); adjusted^a % difference (99% CI). In the Overall row, English Web-based is the reference category.

Overall | 96.7 | 97.6; Reference | 95.5; −2.1^b (−2.2, −2.0); −1.7 (−1.8, −1.7) | 83.1; −14.4^b (−14.9, −13.9); −11.4^b (−11.9, −10.9)
Gender
 Men | 96.8; (−0.1, 0.1); (−0.1, 0.1) | 97.8; 0.2 (0.0, 0.3); 0.1 (0.0, 0.2) | 95.5; 0.0 (−0.2, 0.1); 0.0 (−0.2, 0.1) | 84.8; 2.1 (−0.3, 4.5); 2.9^b (0.7, 5.1)
 Women | 96.7; Reference | 97.6; Reference | 95.6; Reference | 82.7; Reference
Race
 White | 97.1; Reference | 97.8; Reference | 96; Reference | 82; Reference
 Black | 93.8; −3.3^b (−3.5, −3.1); −3.2^b (−3.4, −3.1) | 95.5; −2.4^b (−2.6, −2.1); −2.4^b (−2.6, −2.1) | 91.3; −4.6^b (−4.9, −4.4); −4.5^b (−4.7, −4.2) | 93.5; 11.5 (−21.4, 44.3); 0.8 (−9.4, 11.0)
 Hispanic | 94.5; −2.6^b (−2.8, −2.5); −2.4^b (−2.5, −2.2) | 96.7; −1.1 (−1.3, −1.0); −1.1 (−1.3, −0.9) | 93.8; −2.1^b (−2.3, −1.9); −1.9 (−2.1, −1.7) | 83.4; 1.4 (−11.2, 13.9); −2.3 (−13.8, 9.2)
 Other | 95.3; −1.8 (−2.0, −1.6); −1.9 (−2.0, −1.7) | 96.6; −1.3 (−1.5, −1.0); −1.3 (−1.6, −1.1) | 93.9; −2.1^b (−2.3, −1.9); −2.1^b (−2.4, −1.9) | 82.1; 0.1 (−13.8, 14.0); −7.8 (−20.2, 4.7)
Age
 <40 | 97; 0.2 (0.1, 0.3); 0.1 (0.1, 0.2) | 97.7; 0.0 (−0.2, 0.1); −0.1 (−0.2, 0.0) | 95.9; 0.6 (0.4, 0.7); 0.5 (0.3, 0.6) | 86.3; 4.8^b (2.0, 7.6); 4.1^b (1.5, 6.8)
 40–49 | 96.8; 0.0 (0.0, 0.1); 0.1 (0.0, 0.2) | 97.6; −0.1 (−0.3, 0.0); −0.1 (−0.2, 0.0) | 96; 0.6 (0.4, 0.7); 0.5 (0.4, 0.6) | 85.2; 3.7^b (1.2, 6.2); 3.3^b (0.9, 5.7)
 50–59 | 96.8; Reference | 97.7; Reference | 95.4; Reference | 81.5; Reference
 ≥60 | 96.1; −0.7 (−0.8, −0.5); −0.7 (−0.8, −0.6) | 97.4; −0.4 (−0.5, −0.2); −0.4 (−0.6, −0.3) | 94.3; −1.1 (−1.3, −0.9); −1.0 (−1.2, −0.8) | 74.1; −7.4^b (−11.7, −3.0); −7.0^b (−11.2, −2.8)
Education
 <High school | 88.4; −8.8^b (−9.2, −8.4); −8.2^b (−8.6, −7.8) | 95; −2.8^b (−3.6, −2.0); −2.6^b (−3.4, −1.8) | 88.5; −7.8^b (−8.2, −7.4); −7.3^b (−7.8, −6.9) | 72.5; −15.2^b (−18.7, −11.7); −14.8^b (−18.2, −11.3)
 High school graduate | 95.1; −2.2^b (−2.3, −2.1); −2.1^b (−2.3, −2.0) | 96.7; −1.2 (−1.4, −1.0); −1.2 (−1.4, −1.0) | 94.1; −2.2^b (−2.4, −2.1); −2.1^b (−2.3, −2.0) | 79.4; −8.3^b (−11.5, −5.1); −8.1^b (−11.2, −5.0)
 Some college | 96.3; −1.0 (−1.1, −0.9); −0.9 (−1.0, −0.8) | 97.1; −0.8 (−0.9, −0.6); −0.7 (−0.9, −0.5) | 95.3; −1.0 (−1.1, −0.8); −0.9 (−1.0, −0.7) | 85.2; −2.5 (−6.5, 1.6); −1.9 (−5.9, 2.0)
 Technical degree | 96.4; −0.8 (−0.9, −0.7); −0.8 (−0.9, −0.7) | 97.3; −0.6 (−0.7, −0.4); −0.6 (−0.7, −0.4) | 95.5; −0.8 (−0.9, −0.6); −0.7 (−0.9, −0.6) | 82.9; −4.8^b (−7.7, −2.0); −5.7^b (−8.5, −3.0)
 College graduate | 97.3; Reference | 97.8; Reference | 96.3; Reference | 87.7; Reference
 Graduate degree | 97.5; 0.2 (0.1, 0.3); 0.3 (0.2, 0.4) | 98.1; 0.2 (0.1, 0.3); 0.3 (0.2, 0.4) | 96.3; 0.0 (−0.1, 0.1); 0.1 (0.0, 0.2) | 89.9; 2.2 (−0.3, 4.8); 1.8 (−0.7, 4.2)

Abbreviation: CI, confidence interval.

^a Multivariable models adjusted for gender, race, age, and education.

^b P < 0.01 and % difference greater than 2%, which represents a meaningful difference.

There were no meaningful differences (i.e., >2%) overall or within any specific survey modality or language by sex, except for Spanish language surveys where men had a 2.9% higher item response rate than women when adjusting for race, age, and education (Table 3). When examining item response rates by race/ethnicity, black and Hispanic participants had lower average item response rates (93.8% and 94.5%, respectively) compared with white participants (97.1%). This difference by race/ethnicity was more pronounced among participants who completed the English paper survey compared with the Web-based survey. Furthermore, among Hispanic participants, item response rates were higher when completing an English language survey (Web-based = 96.7% and paper = 93.8%) compared with Spanish language survey (83.4%). Item response rates also differed by education, with lower education groups having lower item response rates. These education differences were greater among English and Spanish paper survey responders, but no meaningful differences were observed (except for less than high school graduates) among those who completed the Web-based survey (Table 3). Age was only meaningfully associated with item response rates among participants who completed the Spanish paper survey. When compared with the participants ages 50–59 years, the oldest participants (60 and older) had a 7% lower average item response rate and the youngest participants (39 and younger) had approximately 4% higher average item response rate (Table 3).

Item response rates by survey section were highest among English Web-based survey responders, followed by English paper and then Spanish paper survey responders (Fig. 1). All individual survey sections among English Web-based and English paper survey responders had an average item response rate of at least 90%, except for the medical history and medications sections, where English paper survey item response rates were slightly lower in both women (88.3% vs. 99.6% and 87.9% vs. 99.4%, respectively) and men (87.0% vs. 99.5% and 89.7% vs. 99.2%, respectively; Fig. 1). Spanish paper surveys had lower item response rates for every section, with the residential history, medications, physical activity, and tobacco use sections being notably less complete (women, range 62.7%–78.1%; men, range 64.3%–82.6%; Fig. 1).

We also conducted a sensitivity analysis to examine differences in completion rates between Spanish language participants from Puerto Rico and those living in the continental United States, in case regional differences influenced interpretability of the translated survey. The majority of the Spanish language surveys (80%) were completed by participants from Puerto Rico; among those from the continental United States, we observed a significantly lower item response rate (−4.3%; 99% CI, −6.9 to −1.8). Finally, given the very low average item response rates for the residential history section among Spanish language survey responders (62.7% in women and 64.3% in men), we conducted a sensitivity analysis excluding that section. The overall survey item response rate for Spanish language surveys changed little when the residential history section was excluded (from 84.8% to 86.0% in men and from 82.7% to 83.7% in women).
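For concreteness, these two sensitivity checks could be sketched as follows, reusing the hypothetical tables from the earlier sketches; the `region` flag, survey labels, and file names are assumptions, not the study's actual variables.

```python
import pandas as pd

# Hypothetical inputs: per-item answers tagged with section, plus a participant-level
# table carrying survey type and region (Puerto Rico vs. continental United States).
items = pd.read_csv("required_item_answers_with_sections.csv")   # participant_id, section, answered
people = pd.read_csv("cps3_item_response.csv").set_index("participant_id")  # survey, region, ...

# Item response rate with and without the residential history section.
rate_all = items.groupby("participant_id")["answered"].mean().mul(100)
rate_no_res = (items[items["section"] != "residential"]
               .groupby("participant_id")["answered"].mean().mul(100))

# Compare Spanish paper responders by region, with and without the residential section.
spanish = people[people["survey"] == "SpanishPaper"]
spanish = spanish.join(rate_all.rename("rate")).join(rate_no_res.rename("rate_no_residential"))
print(spanish.groupby("region")[["rate", "rate_no_residential"]].mean())
```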

Discussion

The overall item response rate for CPS-3 baseline survey data was very high, with an average of 96.7% of all required questions answered. We observed a modest difference between average item response rates for Web-based (97.6%) and paper (95.5%) surveys among English language responders. The difference in item response rates by language among paper survey responders was more pronounced (95.5% for English vs. 83.1% for Spanish). Average item response rates were slightly lower for participants who were less educated or nonwhite.

CPS-3 is one of the newest large-scale cancer cohorts in the United States that relies on mixed methods for data collection. These findings provide additional support for using both English language Web-based and paper survey administration to capture data effectively in future large-scale epidemiologic cohort studies. Web-based surveys are increasingly being used to save money, decrease processing time, and provide an easy-to-use, accessible format for research participants as more of the population gains reliable access to the internet (6, 9–16). However, some studies suggest that allowing optional completion of any question on a Web-based survey raises the likelihood of missing data (25) and that Web-based surveys tend to have lower overall response rates (26–28). In CPS-3, we found the highest item response rates among Web-based responders. Notably, we used a “soft stop” for nearly all questions on the Web-based survey: participants were given a prompt indicating that they had missed a response but were not required to provide one. This tactic may have contributed to the higher item response rates within the survey while balancing potential unintended consequences of forcing participants to answer questions, such as nonresponse to the survey overall. This was not formally tested in our study design, however, and should be evaluated in other studies that are considering both Web-based and paper survey options.

Item response rate by survey section was similar across modalities (English language paper vs. Web-based survey), with the exception of medical history and medications. These differences are likely attributable to design differences between the two modalities. For example, on the Web-based survey, participants were given a clear place to mark “none” for each medication group queried (e.g., blood pressure medications and cholesterol-lowering medications). On the paper survey, however, all medication groups had a lead-in question that asked “Are you currently taking prescription medication for…?” except medications for depression or anxiety, for which participants were asked to mark “no” for each medication individually. These items had lower item response on the paper survey, highlighting the importance of consistent question structure within a given section. On average, Spanish language survey responders had 12.4% lower item response rates than English paper survey responders. Lower item response rates in the Spanish language surveys may be due in part to the language translation as well as the cultural interpretability of the survey (20, 29–32). Spanish-speaking participants from Puerto Rico tended to have higher completion rates than Spanish-speaking participants from the continental United States. This may reflect the highly diverse and distinct cultural subgroups within the continental U.S. Hispanic population, underscoring the importance of cultural interpretability. However, because of the nationwide enrollment of this cohort, designing multiple country-of-origin–specific Spanish language surveys was cost prohibitive and not feasible. Among Spanish language survey responders, item response rates were lowest for the residential history section. This section may have had lower item response rates because of the interpretability of the question format (e.g., participants were asked for city/state at each age interval without instruction on how to mark a U.S. territory), misunderstanding of the question, or the higher mobility rates reported among Hispanics (20), which make this section more difficult to complete. In addition, among Spanish survey responders only, older age was associated with lower item response rates, which may further point to issues of cultural interpretability, particularly for older Spanish-speaking participants.

Other sections with particularly low item response rates among Spanish-speaking participants included tobacco use, medications, and physical activity. For tobacco, differences between English and Spanish item response were significantly greater for “other tobacco products” (e.g., snuff, bidis, and chewing tobacco), suggesting that individual product names may not translate well, especially given our use of a general Spanish translation. Similar to the English paper surveys, item response was lowest for medications related to depression and anxiety, which may reflect a survey design flaw. Finally, item response was lowest for individual types of physical activity compared with general questions related to disability or usual walking pace. Participants were asked about a list of 15 individual activity types (e.g., walking, running, and aerobics), and individual line items were more frequently left blank by Spanish-speaking participants. This may be due to translation or to participants responding only to the activities that applied to them.

Nonwhite and less-educated participants also had lower overall item response rates, and these differences were more pronounced in paper compared with Web-based surveys. Similar patterns of item nonresponse were seen in the Medicare assessment survey, where non-Hispanic whites had lower rates of item nonresponse (i.e., more complete surveys) than other racial or ethnic groups (33). Although these groups had lower average item response rates in our study, the absolute item response rates were still very high. Previous studies suggest that increased age is associated with higher item nonresponse (22, 33), but we did not observe differences in item response rates by age in English language surveys or overall.

In conclusion, CPS-3 cohort baseline data collection was very complete, with few differences observed between English Web-based and paper survey responders. Spanish language responders had lower overall survey item response rates, although these were still over 80%. Nonwhite race/ethnicity and lower educational attainment were also associated with less complete data. This study suggests that a multimodal survey approach can be used in epidemiologic studies without differentially affecting data quality, but close attention is required to maintain consistency in survey design between modalities and within each survey section. Furthermore, for specific lifestyle or medical topic areas, additional assessment of data quality differences and interpretability should be considered when using Spanish language surveys. As the field of epidemiology continues to use varying methods of data collection, it is essential to integrate modern technologies that can maximize study efficiency and data quality while also considering the potential impact of different modalities or languages on data completeness.

No potential conflicts of interest were disclosed.

Conception and design: M. Rittase, E. Kirkland, D.M. Dudas, A.V. Patel

Development of methodology: M. Rittase, E. Kirkland, D.M. Dudas, A.V. Patel

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Rittase, E. Kirkland, D.M. Dudas, A.V. Patel

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Rittase, E. Kirkland, D.M. Dudas, A.V. Patel

Writing, review, and/or revision of the manuscript: M. Rittase, E. Kirkland, D.M. Dudas, A.V. Patel

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. Rittase, E. Kirkland, D.M. Dudas

Study supervision: A.V. Patel

This work was funded by the American Cancer Society.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

1. Calle EE, Rodriguez C, Jacobs EJ, Almon ML, Chao A, McCullough ML, et al. The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer 2002;94:500–11.
2. Bernstein L, Allen M, Anton-Culver H, Deapen D, Horn-Ross PL, Peel D, et al. High breast cancer incidence rates among California teachers: results from the California Teachers Study (United States). Cancer Causes Control 2002;13:625–35.
3. Barton J, Bain C, Hennekens CH, Rosner B, Belanger C, Roth A, et al. Characteristics of respondents and non-respondents to a mailed questionnaire. Am J Public Health 1980;70:823–5.
4. Rimm EB, Stampfer MJ, Colditz GA, Giovannucci E, Willett WC. Effectiveness of various mailing strategies among nonrespondents in a prospective cohort study. Am J Epidemiol 1990;131:1068–71.
5. Couper MP. The future of modes of data collection. Public Opin Q 2011;75:889–908.
6. Kesse-Guyot E, Assmann K, Andreeva V, Castetbon K, Méjean C, Touvier M, et al. Lessons learned from methodological validation research in E-Epidemiology. JMIR Public Health Surveill 2016;2:e160.
7. Fricker RD, Schonlau M. Advantages and disadvantages of internet research surveys: evidence from the literature. Field Methods 2012;14:347–67.
8. Dillman D. Introduction to special issue of survey practice on item nonresponse. Surv Pract 2012;5:1–5.
9. Smith B, Smith TC, Gray GC, Ryan MAK; Millennium Cohort Study Team. When epidemiology meets the Internet: web-based surveys in the Millennium Cohort Study. Am J Epidemiol 2007;166:1345–54.
10. Kongsved SM, Basnov M, Holm-Christensen K, Hjollund NH. Response rate and completeness of questionnaires: a randomized study of Internet versus paper-and-pencil versions. J Med Internet Res 2007;9:e25.
11. Touvier M, Mejean C, Kesse-Guyot E, Pollet C, Malon A, Castetbon K, et al. Comparison between web-based and paper versions of a self-administered anthropometric questionnaire. Eur J Epidemiol 2010;25:287–96.
12. van Gelder M, Bretveld R, Roeleveld N. Web-based questionnaires: the future in epidemiology? Am J Epidemiol 2010;172:1292–98.
13. Russell CW, Boggs DA, Palmer JR, Rosenberg L. Use of a web-based questionnaire in the Black Women's Health Study. Am J Epidemiol 2010;172:1286–91.
14. Balter KA, Balter O, Fondell E, Lagerros YT. Web-based and mailed questionnaires: a comparison of response rates and compliance. Epidemiology 2005;16:577–9.
15. Ekman A, Dickman PW, Klint A, Weiderpass E, Litton J-E. Feasibility of using web-based questionnaires in large population-based epidemiological studies. Eur J Epidemiol 2006;21:103–11.
16. Ritter P, Lorig K, Laurent D, Matthews K. Internet versus mailed questionnaires: a randomized comparison. J Med Internet Res 2004;6:e29.
17. United States Census Bureau. Facts for features: Hispanic Heritage Month 2016. 2016 Oct 12. Available from: https://www.census.gov/newsroom/facts-for-features/2016/cb16-ff16.html; 2016.
18. Deyo RA. Pitfalls in measuring the health status of Mexican Americans: comparative validity of the English and Spanish Sickness Impact Profile. Am J Public Health 1984;74:569–73.
19. Berkanovic E. The effect of inadequate language translation on Hispanics' responses to health surveys. Am J Public Health 1980;70:1273–6.
20. Marin G, Marin B. Research with Hispanic populations. Newbury Park (CA): Sage Publications; 1991.
21. Brown A. The unique challenges of surveying U.S. Latinos. Washington (DC): Pew Research Center; 2015.
22. de Leeuw ED, Hox J, Huisman M. Prevention and treatment of item nonresponse. J Off Stat 2003;19:153–76.
23. Fox-Wasylyshyn SM, El-Masri MM. Handling missing data in self-report measures. Res Nurs Health 2005;28:488–95.
24. Patel AV, Jacobs EJ, Dudas DM, Briggs PJ, Lichtman CJ, Bain EB, et al. The American Cancer Society's Cancer Prevention Study 3 (CPS-3): recruitment, study design, and baseline characteristics. Cancer 2017;123:2014–24.
25. Cantrell MA, Lupinacci P. Methodological issues in online data collection. J Adv Nurs 2007;60:544–9.
26. Udtha M, Nomie K, Yu E, Sanner J. Novel and emerging strategies for longitudinal data collection. J Nurs Scholarsh 2015;47:152–60.
27. Israel G. Combining mail and e-mail contacts to facilitate participation in mixed-mode surveys. Soc Sci Comput Rev 2012;00:1–13.
28. de Leeuw ED. To mix or not to mix data collection modes in surveys. J Off Stat 2005;21:233–55.
29. Kleiner B, Pan Y, Bouic J. The impact of instructions on survey translation: an experimental study. Surv Res Methods 2009;3:113–22.
30. Harrison GG, Stormer A, Herman DR, Winham DM. Development of a Spanish-language version of the U.S. household food security survey module. J Nutr 2003;133:1192–7.
31. Harkness J, Schoua-Glusberg A. Questionnaires in translation. In: Harkness JA, editor. Zuma Nachrichten Spezial volume 3. Cross-cultural survey equivalence. Mannheim (Germany): Zuma; 1998. p. 87–126.
32. Evans B, Quiroz R, Athey L, McMichael J, Albright V. Customizing survey methods to the target population: innovative approaches to improving response rates and data quality among Hispanics. In: The American Association for Public Opinion Research 63rd Annual Conference; 2008 May 18; New Orleans, LA. Washington (DC): AAPOR; 2008.
33. Klein DJ, Elliott MN, Haviland AM, Saliba D, Burkhart Q, Edwards C, et al. Understanding nonresponse to the 2007 Medicare CAHPS survey. Gerontologist 2011;51:843–55.