Abstract
Research using online panels often lacks transparency regarding recruitment methods and sample derivation. The purpose of this study was to describe the recruitment and participation of respondents in two disparate surveys derived from the same online research panel using quota sampling.
A commercial survey sampling and administration company, Qualtrics, was contracted to recruit participants and implement two Internet-based surveys. The first survey targeted adults aged 50 to 75 years and used sampling quotas to obtain diversity with respect to household income and race/ethnicity. The second focused on women aged 18 to 49 years and utilized quota sampling to achieve a geographically balanced sample.
A racially and economically diverse sample of older adults (n = 419) and a geographically diverse sample of younger women (n = 530) were acquired relatively quickly (within 12 and 4 days, respectively). With the exception of the highest income level, quotas were implemented as requested. Recruitment of the older adults took longer than recruitment of the younger female adults. Although survey completion rates were reasonable in both studies, there were inconsistencies in the proportions of incomplete survey responses and quality-check failures.
Cancer researchers, and researchers in general, should consider ways to leverage online panels for future studies. To make the most of these novel strategies, researchers should proactively ask questions about panels and carefully consider the strengths and drawbacks of online survey features, including quota sampling and forced response.
Results provide practical insights for cancer researchers developing future online surveys and recruitment protocols.
See all articles in this CEBP Focus section, “Modernizing Population Science.”
Introduction
Increased use of online (Internet) surveys has led to a rise in commercial survey and market research platforms. Companies such as Qualtrics (www.qualtrics.com), Survey Monkey (www.surveymonkey.com), and Amazon's Mechanical Turk (MTurk; www.mturk.com) allow researchers to develop, test, and distribute surveys online. In addition to creating electronic surveys for distribution through typical sample outlets, these companies enable researchers to purchase access to existing pools of potential participants who have agreed to be solicited for survey recruitment. Utilizing online research panels for sample acquisition and data collection is quick and efficient. Compared with traditional survey modes (e.g., mail and telephone), online surveys are typically less expensive (1), require less time to complete (2), and more readily provide access to unique populations (3, 4).
Innovations in the collection of data using mobile and online platforms are transforming the conduct of survey research (5). For example, crowdsourcing (i.e., the practice of soliciting contributions from large groups of people) has been applied to other health-related research and is in the early stages of adoption in cancer research. A systematic review identified 12 studies that applied crowdsourcing to cancer research in a range of capacities, including identifying candidate gene sequences for investigation, developing cancer prognosis models, and assessing cancer knowledge, beliefs, and behaviors (6). Within the broad field of cancer, other recent studies have drawn from panel samples to conduct survivorship (7), risk communication (8), message testing (9), and behavioral (10) research. With increasing use of new technologies for data collection, the use of commercial research panels will become more prevalent in cancer research. Furthermore, large cohorts of individuals open to participating in research are being established for longitudinal epidemiologic research, including the Growth from Knowledge (GfK) KnowledgePanel and the National Institutes of Health (NIH) All of Us panel. Future use of panel surveys for collecting health data and biospecimens holds promise for accelerating research across the cancer continuum.
Although online panels may not be representative of the U.S. population (11), growing evidence suggests samples recruited through online panels can be as representative of the population as those obtained through traditional recruitment methods (12–14). Yet, the greatest advantage of online research panels may be their ability to produce samples targeting specific groups, such as respondents who meet a specific condition of interest to the researcher. The use of quota sampling (i.e., a nonprobability sampling technique) in online panel research can help researchers obtain survey participants matching specified criteria, such as young adult cancer survivors or adults age-eligible for mammography screening. Although online panels provide some advantages over traditional sampling methods, questions about the validity of commercially derived online panel samples have been raised (15–17). These concerns may arise due to a lack of transparency in the recruitment of panelists and insufficient details on how samples are derived from online panels.
Online panel members are recruited from a variety of sources (12) and therefore, precise information on how sampling frames are constructed is usually not available. Because researchers generally lack control over sample acquisition procedures, an in-depth characterization of how panel participants are recruited is needed to inform researchers on what to expect when administering an online survey and recruiting participants from online sample panels. The purpose of this study was to describe the recruitment and participation of respondents from two disparate online surveys using quota sampling and administered by the same commercial research platform.
Materials and Methods
Qualtrics, a commercial survey sampling and administration company, was contracted to recruit participants and implement two Internet-based surveys. Samples were acquired from existing pools of research panel participants who have agreed to be contacted for research studies. The Qualtrics network of participant pools, referred to as the Marketplace, consists of hundreds of suppliers with a diverse set of recruitment methodologies (A. Taylor; personal communication). The compilation of sampling sources helps to ensure that the overall sampling frame is not overly reliant on any particular demographic or segment of the population. Depending on the individual supply partner, respondents can be sourced through a variety of methods, including ads and promotions across various digital networks, word-of-mouth and membership referrals, social networks, online and mobile games, affiliate marketing, banner ads, TV and radio ads, and offline mail-based recruitment campaigns.
Recruitment targeted potential survey respondents who were likely to qualify based on the demographic characteristics reported in their user profiles (e.g., race and age). Panelists were invited to participate and opted in by activating a survey link directing them to the study consent page and survey instrument. Ineligible respondents were immediately exited from the survey upon providing a response that did not meet inclusion criteria or that exceeded set quotas (i.e., when the a priori quota for their race or household income group had already been met).
To ensure data quality, surveys featured (i) attention checks (i.e., survey items that instructed respondents to provide a specific response) and (ii) speeding checks (i.e., flags for respondents whose survey duration was one-third or less of the median survey duration). Respondents who failed either quality check were excluded from the final samples. The two surveys were approximately equivalent in survey duration and participant remuneration. Qualtrics charged investigators $6.50 for each completed survey response requested. The data reported in this study were collected according to separate institutional review board (IRB)-approved protocols and in accordance with recognized ethical guidelines. Written informed consent was obtained from each participant.
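As a rough illustration of how these two quality checks operate, the following Python sketch applies them to a hypothetical respondent-level dataset. The column names ("duration_min", "attn_item") and the instructed response are assumptions introduced for illustration, not the actual survey items used in either study.

import pandas as pd

# Minimal sketch of the two quality checks described above, applied to a
# hypothetical respondent-level DataFrame. Column names are illustrative:
# 'duration_min' = completed survey duration in minutes;
# 'attn_item' = response to an item instructing "Select 'Strongly agree'".
def apply_quality_checks(df: pd.DataFrame) -> pd.DataFrame:
    # Speeding check: flag respondents whose duration is one-third or
    # less of the median completed-survey duration.
    speed_cutoff = df["duration_min"].median() / 3
    is_speeder = df["duration_min"] <= speed_cutoff

    # Attention check: flag respondents who did not give the instructed
    # response (the item wording here is hypothetical).
    failed_attention = df["attn_item"] != "Strongly agree"

    # Respondents who fail either check are excluded from the final sample.
    return df[~(is_speeder | failed_attention)].copy()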
Study design and survey administration
Sample one
The first sample was obtained as part of a pre–post parallel trial designed to examine the effects of providing colorectal cancer risk feedback to average-risk adults who are age-eligible for colorectal cancer screening. The survey contained 133 items, with each item requiring a response (i.e., forced response). The target enrollment was 400 eligible respondents. To be eligible to participate, respondents had to report being aged 50–75 years, have no personal or family history of colorectal cancer or another predisposing condition, be able to read and comprehend English, and reside in the contiguous United States. Sampling quotas were implemented for race and annual household income. Specifically, balanced proportions of respondents with non-Hispanic White/Caucasian, non-Hispanic Black/African American, and Hispanic/Latino/Spanish origin racial/ethnic identities and diversity in reported household income (approximately 20% less than $20,000, 30% between $20,000 and $49,000, 20% between $50,000 and $74,000, 10% between $75,000 and $99,000, and 20% greater than or equal to $100,000) were requested. Respondents identifying as some other race, ethnicity, or origin were not eligible to participate.
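To make the quota mechanics concrete, the sketch below shows one plausible way an a priori income quota could gate screening. This is a sketch under stated assumptions, not Qualtrics' actual implementation; the cell labels and counting scheme are hypothetical, with targets mirroring the income proportions requested for sample one.

# Illustrative quota-screening logic for the income quotas described above.
# This is an assumption-laden sketch, not Qualtrics' implementation.
INCOME_TARGETS = {
    "<$20,000": 0.20,
    "$20,000-$49,000": 0.30,
    "$50,000-$74,000": 0.20,
    "$75,000-$99,000": 0.10,
    ">=$100,000": 0.20,
}
TARGET_N = 400  # target enrollment for sample one

quota_counts = {cell: 0 for cell in INCOME_TARGETS}

def screen_income(income_cell: str) -> bool:
    """Return True if the respondent may proceed; False if their income
    cell is already full (the respondent is exited as over quota)."""
    cap = round(INCOME_TARGETS[income_cell] * TARGET_N)
    if quota_counts[income_cell] >= cap:
        return False  # quota met: exit respondent immediately
    quota_counts[income_cell] += 1
    return True

Under a scheme like this, otherwise eligible respondents are exited as soon as their quota cell fills, which is consistent with the over-quota exclusions reported in the Results.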
Sample two
Sample two was acquired during a previously described study testing whether specific message types and various psychosocial variables affect future Zika vaccine uptake intent among women of childbearing age (18). The survey contained 105 items and did not utilize forced response. Target enrollment was 500 respondents. To be eligible to participate in this study, respondents had to report being female, be between the ages of 18 and 49 years, be able to read and comprehend English, and reside in the contiguous United States. Sampling quotas were implemented to achieve a geographically varied sample across the four census regions (i.e., Northeast, South, West, and Midwest), with oversampling in the Southern region due to the heightened risk of Zika in this area.
Results
Data collection
For the first sample, completed survey duration ranged from 10 to 1,922 minutes, with a median duration of 26 minutes. Data collection occurred over a period of 12 days in June 2017. Completed survey duration for the second sample ranged from 8 to 390 minutes, with a median duration of 27 minutes. Data collection occurred over a period of four days in March 2017.
Survey recruitment and participant flow
Survey recruitment and participant flow for each sample are depicted in Fig. 1.
Sample one
On the basis of target demographics, approximately 63,500 panelists were invited to participate in this survey by e-mail and other methods (e.g., messaging through online portals, text message, and in-app advertisements). Of those contacted, 3,178 panelists interacted with the survey by opening the survey invitation and/or survey link, and 1,606 completed the consent page; 158 of these panelists did not consent (9.8%). Of the panelists who consented, 671 did not meet eligibility criteria (46.3%), the majority of whom were ineligible due to health history (n = 574 reported a history of colorectal cancer or other predisposing condition). A further 220 respondents were excluded due to quota sampling (15.2%). Seventy-one respondents did not complete the survey in its entirety (4.9%), and an additional 67 were removed from the study for failing an attention check (i.e., one of three survey items that required specific responses; 4.6%). A total of n = 419 panelists completed the survey (75.2% of those who agreed to participate and were eligible).
The average age of respondents who completed the survey was 58.5 years (SD = 6.3). The sample consisted of n = 279 females (66.6%) and, as requested, equal proportions of non-Hispanic White/Caucasian, non-Hispanic Black/African American, and Hispanic/Latino/Spanish respondents (33% each). However, the a priori income quotas proposed for this study were not fully implemented. Because of difficulties acquiring participants who reported an annual household income of ≥$100,000, we elected to eliminate the income quota after two weeks of data collection. To acquire an adequate sample size to ensure statistical power for the parent study (i.e., n = 400), we used natural probability sampling to obtain the remaining participants. Ultimately, income levels were within 5% of the proportions requested, except that respondents with a reported income of ≥$100,000 constituted 10.5% of the final sample instead of the 20% initially proposed.
Sample two
The survey for sample two was distributed via e-mail to approximately 56,978 panelists based on target demographics. A total of 2,015 panelists interacted with the survey (i.e., clicked on the survey invitation and/or survey link), and 882 went on to complete the consent page. Three percent of these panelists did not consent (n = 27). Among those who consented, 23 (2.7%) did not meet eligibility criteria (due to age and gender). No respondents were screened out due to being over quota. Thirty-eight respondents did not complete the survey in its entirety (4.4%), and an additional 264 were removed from the study for failing an attention check (i.e., one of three survey items that required specific responses; 30.9%). A total of n = 530 panelists completed the survey (63.7% of those who consented and were eligible).
As intended, all respondents who completed the survey were female. On average, respondents were 33.9 years old (SD = 7.9). Respondents were predominantly White/Caucasian (73.5%); 9.4% were Black/African American, 8.8% Hispanic, 5.0% Asian, 1.2% American Indian, and 2.1% some other race/ethnicity. Thirty-nine percent of respondents were from the Southern United States. Roughly one-quarter (24.5%) were residents of the Western region, and the remaining respondents were from the Midwest (20.9%) and Northeast (15.6%).
Discussion
This study described the samples resulting from two online surveys that recruited participants using quota sampling through the same commercial research panel. A thorough description was provided of how quota sampling was used to obtain two targeted samples via an online panel: one racially and economically diverse sample of older adults and another geographically dispersed sample of younger adult women. Survey recruitment and participant flow within each sample were examined. Taken together, the results provide context and considerations for cancer researchers, and researchers in general, contemplating the use of commercially administered online research surveys.
The level of transparency regarding recruitment and participant flow reported in this study (e.g., number of panelists contacted, number of panelists who interacted with the survey, and analysis of over-quota exclusions) is greater than that typically reported in other recent studies using online research panels (19, 20). The information outlined indicates that commercial research platforms have access to large panels of research participants. In both cases, approximately 60,000 panelists were sent a survey invitation. About one-half (50.5% and 43.8% in samples one and two, respectively) of those who interacted with the invitation ultimately completed the consent page of the survey. Although the traditional calculation of response for each of these samples was very low (i.e., 3% to 7% of panelists interacted with the survey), these results are consistent with prior research examining response across multiple panel vendors (21) but lower than another Qualtrics panel study reporting an 18.7% response rate (10). However, among those who consented and were eligible for participation, most completed the survey (75.2% in sample one and 63.7% in sample two). For Internet-derived samples, this "completion rate" (i.e., the proportion of survey completers out of all eligible respondents who initiate the survey) is frequently reported (7, 9, 22). Our completion rates compare favorably with the typical response rates of epidemiologic studies, which have been declining for the past several decades (23). A review of case–control cancer studies conducted in 2001–2010 revealed median response rates of 75.6% for cases, 78.0% for medical source controls, and 53.0% for population controls (24). The median response rate of the 2017 Behavioral Risk Factor Surveillance System (BRFSS) survey was approximately 45% (25).
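For clarity, the two metrics contrasted above can be reproduced directly from the participant-flow counts reported in the Results. The short sketch below does so for sample one; the variable names are ours, and the counts come from the reported flow.

# Reproducing the two response metrics discussed above from the reported
# participant-flow counts for sample one.
invited = 63_500     # panelists sent a survey invitation
interacted = 3_178   # opened the survey invitation and/or survey link
eligible = 557       # consented and remained after eligibility and quota
                     # screening (1,606 - 158 - 671 - 220)
completed = 419      # final completed surveys

# Traditional response calculation: proportion of invited panelists who
# interacted with the survey (~5%).
traditional_response = interacted / invited

# "Completion rate": proportion of completers among all eligible
# respondents who initiated the survey (~75.2%).
completion_rate = completed / eligible

print(f"traditional response: {traditional_response:.1%}; "
      f"completion rate: {completion_rate:.1%}")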
Study-specific inclusion criteria were the primary reason for ineligibility in both samples; notably, more than 500 consented panelists with a history of colorectal cancer were excluded in sample one. The presence of these colorectal cancer survivors suggests Qualtrics may be a promising but overlooked platform for recruiting cancer survivors. Additional panelists in the first survey were screened out due to quota sampling (n = 220), whereas no panelists were excluded from survey two for being over quota. This difference is not entirely surprising given that no specific limits were set for the quotas on geographic regions in sample two. That the desired geographic variation was achieved without defining precise proportions is an important consideration for future study designs, as less stringent quotas could save time (faster sample acquisition), reduce expenses (complex quotas increase sample cost), and potentially reduce selection bias (by retaining otherwise eligible respondents).
Although online recruitment initially contacts larger numbers of potential participants than traditional methods, panelists meeting specific demographic criteria can be targeted effectively on the basis of information contained in panelists' profiles. There were few exclusions based on sociodemographic characteristics, and survey completers in both studies were consistent with the study inclusion criteria and set quotas (except for the highest income level). It may therefore be difficult to acquire more affluent participants. It should also be noted that sample one (with no inclusion or sampling criteria related to gender) had a relatively high proportion of female respondents (66.6%), while sample two (with no race/ethnicity quota) yielded a predominately White/Caucasian sample. These results support the use of quotas when demographic characteristics are germane to study aims. For example, cancer researchers may use quotas in case–control studies to identify controls matched on specific criteria (e.g., smoking history). However, large panels would be needed to target exposures with low prevalence. Researchers should carefully weigh the benefits and potential drawbacks of using quota sampling.
Several additional practical implications can be gleaned from the examination of these online surveys. First, researchers received more completed survey responses than were purchased (e.g., 18–29), and samples were acquired more quickly than traditional samples. Although both samples were acquired in a short period of time, it should be noted that sample one took three times longer than sample two, likely due to the older target population. Second, the proportion of eligible respondents who did not complete the survey was relatively low in both samples but lower in the sample of younger adult women (4.6% in sample two vs. 12.7% in sample one). Thus, forced item response may not be necessary to promote complete data and may even have contributed to the higher incomplete rate observed in sample one. In addition, only one "speeder" (i.e., a respondent with a total survey duration one-third or less of the median survey duration) was identified. Researchers should therefore be encouraged to use additional quality checks, such as attention checks, to safeguard against cheaters (i.e., respondents who rush through the survey and threaten data quality). In this analysis, the younger respondents in sample two were more likely than the older respondents in sample one to fail attention checks (31.7% vs. 12.0%, respectively). This finding may indicate that older respondents are more conscientious; alternatively, the forced response format of sample one may underlie the difference.
This study highlights the relative ease of obtaining diverse samples (i.e., one racially and economically diverse and the other geographically dispersed) via quota sampling and online recruitment methods. Implementing sampling quotas for race/ethnicity represents a major advantage over traditional sampling methods, which often yield predominantly White/Caucasian samples (8, 26, 27). Researchers who seek racial/ethnic diversity should utilize available representative samples whenever possible. When access to minorities is limited, however, online panel sampling with race quotas may be a valuable approach for reaching minority participants, as demonstrated in sample one and other panel samples (28, 29).
Finally, this study is one of the first to examine in depth how studies using online panels function in terms of sampling, initial recruiting, and participation. Our results provide practical insights for cancer researchers, and researchers in general, to consider when designing future online surveys and recruitment strategies. To make the most of these novel strategies, researchers should carefully consider the strengths and drawbacks of online survey features, including quota sampling and forced response. Our goal in documenting the methodologic aspects of recruitment and participation for two different study populations drawn from online panels was to help build transparency in reporting online panel samples and to provide a basis for comparison across different commercially available research panels. Like many other studies conducted using online panels, we were unable to fully ascertain how the panel was created or to describe those who did not interact with the survey. According to Qualtrics (A. Taylor; personal communication), additional information related to these panelists (e.g., number of invalid e-mails and nonrespondent characteristics) is not currently tracked. Another limitation of using commercial participant panels is that their overall size and the demographics of the underlying user populations are dynamic and often not available. Future studies should be proactive in negotiating for additional information and addressing unknowns in panel recruitment procedures. In doing so, researchers could raise the bar on the standards of information provided by suppliers of commercial panels. These are important steps toward strengthening survey methodologies in the rapidly changing landscape of "citizen science," in which the public actively engages in participatory research projects such as online panels and scientific registries.
In summary, online research panels and quota sampling techniques provide new opportunities for the acquisition of traditionally under-represented individuals or participants who meet highly specific inclusion criteria, an advantage over traditional sampling methods. Our results support leveraging online panels for cancer research. Future epidemiologic research using these methods to recruit targeted populations [e.g., cancer survivors (cases) and individuals (controls) residing in specific geographic areas, such as colorectal cancer "hotspots"] could alleviate the need for time- and cost-intensive methods such as mail-based and in-person correspondence. In comparison, participant incentives and administrative costs for Web-based surveys are substantially lower (30). In conclusion, the use of panel participants could be leveraged to reach specific population groups, maximize limited research budgets, and thereby enable novel cancer research focused on health disparities and cancer communication that is currently not feasible in either traditional small intervention studies or large population studies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: C.A. Miller, J.P.D. Guidry, B. Dahman
Development of methodology: C.A. Miller, J.P.D. Guidry, B. Dahman, M.D. Thomson
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.A. Miller, J.P.D. Guidry
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.A. Miller, J.P.D. Guidry, B. Dahman, M.D. Thomson
Writing, review, and/or revision of the manuscript: C.A. Miller, B. Dahman, M.D. Thomson
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.A. Miller
Study supervision: C.A. Miller, M.D. Thomson
Acknowledgments
Financial support was provided in part to the corresponding author (to C.A. Miller) by a Graduate Training in Disparities Research award GTDR14302086 from Susan G. Komen and an NCI T32 award (2T32CA093423).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.