Background: The current study considers potential nonresponse bias and data quality issues in the Multiethnic Cohort, a prospective study of lifestyle-cancer associations among adults ages 45 to 75 years from five ethnic groups in Hawaii and California.

Methods: We examined determinants of early versus later response to the baseline questionnaire using logistic regression with response wave regressed on measures of demographics, history of illness, health behaviors, medication, and supplement use.

Results: Participants who were more health conscious tended to respond earlier. Female sex, more education, personal experience with cancer, more physical activity, and regular use of aspirin were associated with early response. Race other than white and current smoking were associated with later response. Of note, African-Americans and Latinos, with lower response rates, and Japanese, with the highest response rate overall, were more likely to respond to a later mailing. Results were generally consistent across sex, age groups (under 65, 65+), and ethnic groups and over time. Although repeated mailings increased the proportion responding and the diversity of participants, the percent of missing item responses increased somewhat with response wave.

Conclusion: Multiple mailings are recommended for recruiting ethnic minority groups but the tradeoff may be more missing data among later respondents. (Cancer Epidemiol Biomarkers Prev 2008;17(2):447–54)

Mail survey methodology is appealing in epidemiologic research for reasons of cost and convenience. However, sampling error, adequacy of the sampling frame, measurement inaccuracies, and nonresponse must be considered for a well-designed postal study to be generalizable to the population of interest (1). Several recent epidemiology reports have focused on the issue of nonresponse error, which occurs when sampled persons who do not respond to a survey differ from those who do. Randomized trials and surveillance studies have investigated recruitment strategies to increase response rates, a standard approach that reduces nonresponse error by reducing the number of nonrespondents. These strategies have included providing a pen/pencil, offering financial incentives, using a culturally sensitive letter, and varying survey length (2-7). In contrast, reports of post hoc analyses have compared respondents with nonrespondents, or those who responded early with those who responded later to recruitment efforts (7-21); some of these make inferences about bias and the value of extensive recruitment methods. There have also been simulations showing how assumptions about nonrespondents or late respondents can affect prevalence estimates for disease or exposure (22, 23).

In the current study, we investigated differences between early and late respondents to a self-administered mail survey of lifestyle exposures that may increase the risk of cancer and other disease in the Multiethnic Cohort (MEC) using all measures available from the survey. Five specific ethnic groups were targeted for recruitment, which provides a unique opportunity to learn more about ethnic differences in recruitment efforts. The main findings presented here concern early versus late response based on eventual participants. Although it would be ideal to make comparisons with nonrespondents, presumed ethnicity is the only variable available for comparison. In addition, we examined differences between younger and retirement-aged adults and show how response completeness may be affected by repeatedly mailing questionnaires to increase the response rate.

Study Population

The MEC is a prospective study that was initiated to investigate lifestyle exposures in relation to disease outcomes, especially cancer. Recruitment took place from 1993 to 1996 in Hawaii and Los Angeles, California, and targeted five ethnic/racial groups: African-American, Native Hawaiian, Japanese-American, Latino, and White. The sampling frame was the driver's license files of the state of Hawaii and of the county of Los Angeles. It was supplemented by the voter's registration lists in Hawaii, to ensure that older Japanese who do not drive were adequately covered, and by the Health Care Financing Administration member list in California, to increase the number of older African-American men in the sampling frame. Sex is available from these sources, but detailed race/ethnicity is not. The Health Care Financing Administration list designates race as white, black, or other, and only African-Americans were recruited from it. Some Native Hawaiians were identified based on voter's registration information that listed eligibility to vote in elections only open to Native Hawaiians. Others in the sampling frame were assigned a presumed race/ethnic group based on names and residence, as follows. Last name lists were compiled from data from past epidemiology studies and public sources for the following groups: Japanese, Chinese, Korean, Filipino, Hawaiian, Samoan, and Latino; first name lists were compiled from public sources for Japanese and Hawaiians. The last names of individuals in the sampling frame were compared against the name lists, and a race/ethnicity was assigned if there was a match. The first names of individuals not assigned Japanese or Native Hawaiian were then compared against the first name lists; this step was intended to further identify Japanese and Native Hawaiian women with other married names. Remaining individuals were assigned a presumed race of white or African-American. In Hawaii, these individuals were assigned white; in Los Angeles, they were assigned African-American when they lived in a census tract that was >50% black in the 1990 census and white otherwise. Based on these steps, the mailings were targeted to the five ethnic/racial groups. A tracking database, including estimated ethnicity, was used to record recruitment efforts and final status for all persons in the sample.
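The name-and-residence assignment described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name lists are tiny hypothetical stand-ins, the function and variable names are invented here, and the sketch assumes that a first-name match takes precedence over a surname match for anyone not already assigned Japanese or Native Hawaiian, which is one possible reading of the description. The voter's registration and Health Care Financing Administration steps are omitted.

```python
from typing import Optional

# Hypothetical, tiny name lists for illustration only. The study compiled its
# lists from past epidemiology studies and public sources.
LAST_NAME_GROUPS = {
    "tanaka": "Japanese",
    "garcia": "Latino",
    "kealoha": "Native Hawaiian",
}
FIRST_NAME_GROUPS = {
    "keiko": "Japanese",
    "leilani": "Native Hawaiian",
}
FIRST_NAME_TARGETS = ("Japanese", "Native Hawaiian")


def presumed_ethnicity(first_name: str, last_name: str, location: str,
                       tract_pct_black: Optional[float] = None) -> str:
    """Assign a presumed race/ethnicity from names and residence."""
    # Step 1: look up the last name in the surname lists.
    group = LAST_NAME_GROUPS.get(last_name.strip().lower())

    # Step 2: anyone not yet assigned Japanese or Native Hawaiian is checked
    # against the first-name lists (assumed here to override a surname match).
    if group not in FIRST_NAME_TARGETS:
        first_match = FIRST_NAME_GROUPS.get(first_name.strip().lower())
        if first_match is not None:
            group = first_match
    if group is not None:
        return group

    # Step 3: remaining individuals are assigned by residence. In Los Angeles,
    # a census tract that is more than 50% black implies African-American.
    if (location == "Los Angeles" and tract_pct_black is not None
            and tract_pct_black > 50.0):
        return "African-American"
    return "White"
```

For example, under these hypothetical lists a person named Keiko Smith would be assigned Japanese through the first-name step, which mirrors the stated purpose of identifying women with other married names.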

The MEC design and recruitment methods have been described in detail elsewhere (24). In brief, >215,000 men and women ages 45 to 75 years became participants by completing a 26-page, self-administered survey received through the U.S. mail, with a cover letter describing the study. This survey is referred to as the baseline questionnaire. Both Spanish and English versions of the survey were sent to persons in Los Angeles with Latino surnames. The surveys were sent in seven batches to ∼100,000 persons each, spaced ∼6 months apart. Nonrespondents were sent the same questionnaire up to two additional times over a period of 3 to 7 months. The mailings to the nonrespondents occurred when the number of incoming completed surveys dropped substantially but were sometimes delayed to avoid sending questionnaires during holidays. The repeated postings within batch are referred to in this article as mailings, and the corresponding response to the mailing is referred to as the wave of response. In the present analysis of early versus late response, we excluded 6% of participants who were from batches that were mailed near the end of the initial cohort recruitment, as they were given less opportunity to respond, being sent a maximum of two surveys. These batches targeted Native Hawaiians and Latinos, which suggests their overall response rates would have been somewhat higher had they been given the same opportunity to respond. An additional <1% of participants were excluded because their surveys were received after an extended delay of >5 months after the third survey was mailed, suggesting extenuating circumstances in locating the participants or in their readiness to respond. The present analysis of response wave is based on a total of 201,461 cohort members. University of Hawaii and University of Southern California Institutional Review Boards reviewed and approved all MEC study protocols.

Follow-up Survey. A four-page follow-up survey was mailed to the cohort participants ∼5 years later, in 1999 to 2000. The follow-up survey was mailed up to four times (i.e., an initial survey and three repeats), and phone contact was also used to maximize the response rate.

Measures

We sought to explore differences between individuals who required more or less effort to recruit. Therefore, the key variable of interest in this study is the wave number in which the baseline questionnaire was returned. The baseline questionnaires were not tracked with a mailing number, and precise mailing dates were difficult to assess due to the bulk mailing procedures used by the U.S. Postal Service. Therefore, we defined wave number by the distribution of surveys returned over time, within batch and location. Figure 1 shows the baseline questionnaire receipt dates for one batch of Hawaii participants and illustrates the substantial increases in returns that suggest response to a follow-up mailing. The cutoff dates used in determining wave number were determined independently by two of the authors, with 100% agreement. These cutoff dates were also consistent with the approximate dates on which nonrespondents were mailed a duplicate survey. This procedure was followed for each batch by location.
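Once cutoff dates have been chosen for a batch and location, assigning a wave number to each returned survey is a simple lookup. The sketch below is illustrative only: the cutoff dates are hypothetical, the names are invented here, and treating a survey received on a cutoff date as belonging to the later wave is an assumption rather than something the text specifies.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical cutoff dates for one batch and location; the study chose its
# cutoffs from the observed distribution of return dates.
WAVE_CUTOFFS = [date(1993, 9, 1), date(1993, 12, 1)]


def wave_number(received: date, cutoffs=WAVE_CUTOFFS) -> int:
    """Return the wave (1, 2, or 3) implied by the receipt date.

    `cutoffs` must be sorted. A survey received on a cutoff date is counted
    in the later wave (an assumption made for this sketch).
    """
    return bisect_right(cutoffs, received) + 1
```

With two cutoffs the function returns 1 for dates before the first cutoff, 2 for dates from the first cutoff up to the day before the second, and 3 from the second cutoff onward.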

Figure 1.

Estimation of wave number using date the survey was returned by one batch of Hawaii cohort participants (1993-1994).


For the follow-up survey, we defined wave number based on return dates in a manner similar to that used for the baseline questionnaire, but we also had the number of surveys mailed in a tracking database. In addition, the follow-up questionnaire was administered by phone, when possible, to individuals who did not respond to a third mailing, and a fourth mailing was sent to individuals who had not responded and could not be contacted by phone. For some analyses, these individuals were assigned a wave of 3. We validated our approach for defining wave number by comparing the wave number assigned for the follow-up survey with the number of questionnaires mailed and found excellent agreement (κ = 0.93).
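The κ statistic reported above is a chance-corrected measure of agreement. As a rough illustration, an unweighted Cohen's κ between two classifications of the same individuals can be computed as below. The text does not state whether a weighted or unweighted κ was used, so the unweighted form is an assumption, and the function name is invented here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from the marginal category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

Here `ratings_a` could hold the wave assigned from return dates and `ratings_b` the number of questionnaires mailed, one pair per respondent.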

Information on demographics, history of prior medical conditions, reproductive history for women, lifestyle behaviors (e.g., diet, smoking, and physical activity), and medication and vitamin use was collected on the initial survey, and these items were considered in this analysis as potential determinants of response behavior. The specific variables included in this analysis are listed in Table 2. The family history items asked about parents' or siblings' incidence of breast, ovarian, prostate, colon, and “other” cancers. Individuals were asked if they had ever been informed by their doctors that they had any of a list of medical conditions. Dietary information was assessed via a validated food frequency questionnaire, including >180 food items, developed for this multiethnic population. Daily intake of dietary components was computed as described previously (24, 25). Body mass index (BMI) was calculated as self-reported weight in kilograms divided by the square of self-reported height in meters. The medication history items asked about regular use, defined as two times per week for 1 month or longer, currently or in the past. Vitamin and mineral use was recorded if supplements were taken at least once per week during the last year; use was asked for multivitamin/minerals and selected single supplements. Women were asked if they currently used or had ever used estrogen or progesterone for menopause or other reasons.
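The BMI definition above is the standard one, weight in kilograms divided by the square of height in meters. A minimal sketch follows; the function name and the input validation are illustrative choices, not part of the study's processing.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2
```

For instance, a self-reported weight of 70 kg and height of 1.75 m gives a BMI of about 22.9.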

Statistical Methods

All analyses were conducted using Statistical Analysis System (SAS) 9.1 (26). Logistic regression with wave number as the outcome was the primary analysis tool. Initially, polytomous logistic models were fit to an outcome variable with three levels for wave, but the findings for the second and third waves were generally consistent. Therefore, the final outcome variable collapsed wave numbers 2 and 3 and was modeled using a dichotomous logistic model. Odds ratios (OR) and 95% confidence intervals were computed. Independent variables were first tested for a bivariate relation to wave number using logistic regression. Two variables, current regular use of pain reliever other than aspirin and hypertension medication use, had no bivariate association with wave and were excluded from further analyses. All remaining variables were entered into a logistic regression model predicting response in the later waves, with separate models stratified by sex, sex-age group (under 65 years, 65+ years), and sex-ethnic group.
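Two steps in this paragraph lend themselves to a short illustration: collapsing the three-level wave variable into a dichotomous outcome, and converting a logistic regression coefficient into an OR with a 95% confidence interval. The sketch below is not the authors' SAS code. The function names are invented, the Wald interval is an assumption about how the intervals were formed, and the coefficient and standard error in the usage lines are hypothetical values, not estimates from the study.

```python
import math


def later_response(wave: int) -> int:
    """Collapse waves 2 and 3 into one 'later' category (1); wave 1 is 0."""
    if wave not in (1, 2, 3):
        raise ValueError("wave must be 1, 2, or 3")
    return int(wave >= 2)


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE.

    Uses the Wald interval exp(beta +/- z * se).
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_with_ci(beta=0.45, se=0.026)
```

An OR above 1 for a covariate in such a model would indicate higher odds of responding in a later wave.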

The pattern of response to the baseline questionnaire, defined by wave number, was compared with the wave number for the follow-up survey to test the notion that past behavior predicts future behavior. The χ² statistic for association was used for statistical comparison. In addition, the sex-specific logistic models were repeated for the follow-up survey using covariates from the baseline questionnaire.

Finally, we determined whether the extent of missing data on select variables increased with wave of response. Diet was considered missing if the calories computed from responses were implausible as described elsewhere (27). The Mantel-Haenszel χ² statistic (26) was used to test the trend in the percent missing over wave; in addition, logistic models were fit regressing missing status on wave of response (initial or later), controlling for age, sex, ethnicity, and language version of the questionnaire.
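For a 2 × k table with ordered columns, the Mantel-Haenszel χ² for linear association equals (n − 1)r², where r is the Pearson correlation between the column scores and the binary row indicator, and it has 1 degree of freedom. The following sketch computes it from counts. The use of integer scores 1, 2, 3 for wave is an assumption, the function name is invented, and any counts passed in would be the analyst's own, since the text reports only percentages.

```python
import math


def mantel_haenszel_trend(missing_counts, totals, scores=None):
    """Return (chi-square, p-value) for linear trend in a 2 x k table.

    missing_counts[i] is the number with missing data in column i and
    totals[i] is the number of respondents in column i.
    """
    k = len(totals)
    if len(missing_counts) != k:
        raise ValueError("missing_counts and totals must have equal length")
    scores = scores or list(range(1, k + 1))
    n = sum(totals)

    # Sums over individuals: x is the column score, y is 1 if missing else 0.
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_xx = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(missing_counts)          # also equals sum of y squared
    sum_xy = sum(s * m for s, m in zip(scores, missing_counts))

    sxx = sum_xx - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    if sxx == 0 or syy == 0:
        raise ValueError("statistic is undefined without variation")

    chi_square = (n - 1) * sxy ** 2 / (sxx * syy)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

With hypothetical counts of 10, 20, and 30 missing out of 100 respondents in each of three waves, the statistic is about 12.46.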

The response rate for the eligible sample of the five targeted ethnic groups was 25.6%, including in the denominator the 15% of the sample for whom mailings were undeliverable. Table 1 shows the response and undeliverable rates by group based on the estimated ethnicity/race. With undeliverable surveys included in the denominator, the response rate ranged from 17% for Latinos to 45% for Japanese. The undeliverable rate, which reflects addresses that were not valid and for which a forwarding address was not available, varied by ethnic group as well. These calculations exclude persons found to be ineligible for various reasons, including age, place of residence, and being deceased, the prevalences of which ranged from 0.4% to 3%. Undelivered surveys may have included some ineligible persons, but this could not be determined. The accuracy of the estimated ethnic/racial group was assessed based on cohort participants' self-reported ethnicity. In general, there was reasonable agreement (80%) between the presumed and self-reported categories in the targeted age group, especially for African-Americans (86%), Native Hawaiians (86%), and Japanese-Americans (94%); as expected, agreement was better for men than for women, likely because women's married names did not always match their ethnicity. The brief follow-up survey sent to cohort members 5 years later was completed by 78% of cohort members; 9% refused, 7% were deceased, and 6% were unreachable.
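The response rate among delivered surveys reported in Table 1 appears consistent with removing undeliverable surveys from the denominator, that is, dividing the overall response rate by the delivered fraction. The sketch below illustrates that arithmetic. The formula is inferred from the table values rather than stated in the text, and the function name is invented.

```python
def delivered_response_rate(response_pct: float,
                            undeliverable_pct: float) -> float:
    """Response rate (%) among delivered surveys.

    response_pct uses all eligible persons as the denominator;
    undeliverable_pct is the percent of eligible persons whose survey
    could not be delivered.
    """
    if not 0.0 <= undeliverable_pct < 100.0:
        raise ValueError("undeliverable_pct must be in [0, 100)")
    return response_pct / (1.0 - undeliverable_pct / 100.0)
```

Using the Table 1 values for whites (34.1% response, 17.5% undeliverable) gives about 41.3%, matching the tabulated figure; the Total row differs slightly from this calculation, which may reflect rounding of the published inputs.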

Table 1.

Response rates by ethnic/racial group for targeted MEC participants (1993-1996)

Ethnic/racial group (eligible n)* | Response rate (%) | Undeliverable rate (%) | Response rate for delivered surveys (%)
White (165,020) | 34.1 | 17.5 | 41.3
African-American (188,403) | 21.1 | 9.4 | 23.3
Native Hawaiian (20,248) | 33.7 | 15.6 | 39.9
Japanese-American (128,014) | 45.3 | 5.7 | 48.0
Latino (352,928) | 16.5 | 20.1 | 20.7
Total (854,613) | 25.6 | 15 | 30.2

* Based on estimated ethnicity used to target recruitment of racial/ethnic groups.

Based on five targeted ethnic groups.

Sample characteristics for participants in this analysis are summarized in Table 2. Approximately 65% of the total participant sample responded in the first wave, 23% in the second, and 12% in the third; males had ∼9% higher odds than females of responding later (OR, 1.09; 95% confidence interval, 1.07-1.11). Nineteen percent of participants were born outside of the United States, and 36% of Latinos completed a Spanish version questionnaire. Although the mean age at questionnaire completion was 60 (SD, 8.9) years, we divided the sample into age groups of under 65 (64%) and 65+ years (36%) to investigate whether older adults, who are less likely to be working, would have a different pattern of response.
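The sex difference quoted above can be checked approximately from the wave percentages in Table 2. The sketch below computes a crude OR for later response (wave 2 or 3) from the two proportions. It is an unadjusted back-of-the-envelope check with invented function names, not the model the authors fit.

```python
def odds(proportion: float) -> float:
    """Convert a proportion strictly between 0 and 1 to odds."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return proportion / (1.0 - proportion)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Crude odds ratio comparing two proportions."""
    return odds(p_group) / odds(p_reference)


# Later response = second plus third wave, from Table 2 (percent).
males_later = (23.6 + 12.6) / 100.0
females_later = (22.2 + 12.0) / 100.0
crude_or = odds_ratio(males_later, females_later)
print(round(crude_or, 2))  # prints 1.09
```

The crude value agrees with the reported OR of 1.09 to two decimal places.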

Table 2.

Sample characteristics for MEC participants (1993-1996)

Variable | Males (n = 90,963)* | Females (n = 110,498)*
Wave (%)
    First | 63.8 | 65.8
    Second | 23.6 | 22.2
    Third | 12.6 | 12.0
Demographics
    Age at questionnaire completion (range, 41-78), mean (SD) | 60.3 (8.9) | 59.7 (8.9)
    Ethnicity/race (%)
        White | 24.5 | 22.7
        African-American | 12.9 | 18.5
        Native Hawaiian | 6.3 | 6.5
        Japanese | 29.6 | 27.0
        Latino | 20.8 | 18.6
        Other | 6.0 | 6.7
    Location (%)
        Hawaii | 52.9 | 48.3
        Los Angeles, CA | 47.1 | 51.7
    Foreign born (%) | 18.7 | 19.2
    Years of education (range, 5-18), mean (SD) | 13.2 (3.3) | 13.0 (3.2)
    Completed Spanish questionnaire (%) | 8.1 | 7.6
    Currently married (%) | 77 | 59
    No. children (%)
        None | 15.2 | 12.8
        1-3 | 55.3 | 55.3
        ≥4 | 29.5 | 31.9
Health history (%)
    Family history of cancer | 33.1 | 38.2
    Personal history of cancer | 10.4 | 12.3
    Digestive disorders† | 23.2 | 24.0
    Hypertension | 40.4 | 38.5
    Asthma or other allergy | 20.0 | 30.8
    Other major illnesses‡ | 22.1 | 17.3
    BMI (range, 8.4-100.7), mean (SD) | 26.0 (4.2) | 25.8 (5.7)
Behavioral factors
    Energy intake in kcal/d (range, 417-8,433), mean (SD) | 2,378 (1,092) | 1,949 (942)
    Smoking (%)
        Never smoked | 29.9 | 56.7
        Former smoker | 51.8 | 29.1
        Current smoker | 18.3 | 14.2
    Moderate or vigorous physical activity, h/wk (range, 0-13.3), mean (SD) | 1.3 (1.5) | 1.1 (1.3)
Medication and supplement use (%)
    Current aspirin use | 23.1 | 17.5
    Current use of acetaminophen or other (nonaspirin) pain reliever | 16.7 | 26.5
    Former/current medication for peptic ulcer disease | 33.5 | 38.3
    Current other over-the-counter medication use§ | 17.7 | 25.3
    Current use of blood pressure medication | 25.5 | 27.2
    Ever taken progesterone and/or estrogen | NA | 46.5
    Current multivitamin use | 46.7 | 52.8
    Current vitamin C use | 34.7 | 39.5
    Current vitamin E use | 25.9 | 29.5
    Current calcium use | 13.9 | 39.9

Abbreviation: NA, not available.

* n is smaller for some variables due to missing data.

† Polyps, ulcer, gall stones, removal of stomach, or gall bladder.

‡ Heart attack, angina, stroke, and/or diabetes.

§ Includes antihistamines, antacids, and/or laxatives.

Table 3 shows model results predicting later response (i.e., in the second or third wave) for females and males, respectively. The results were generally consistent across age groups within sex (data not shown). Ethnicity was among the strongest effects, based on the magnitude of the ORs, with all nonwhite groups more likely to respond later compared with whites. Although Japanese were most likely to participate (response rate, 45%), they were 11% to 16% more likely to respond in a later wave compared with whites. Latinos, who were least likely to participate (response rate, 17%), had a likelihood of responding later, if they did participate, comparable with Japanese (female OR, 1.18; male OR, 1.06). However, persons responding to Spanish language surveys or who were foreign born were more likely to respond later, thus attenuating the effect of ethnicity for Latinos (female OR, 1.29; male OR, 1.15, when place of birth and language of survey were excluded from the models). In fact, Latinos and African-Americans had nearly identical patterns of response (later wave: Latino, 42%; African-American, 41%). African-Americans (response rate, 21%) had the highest ORs for the ethnicity effect (female OR, 1.57; male OR, 1.38), followed by Native Hawaiians (female OR, 1.32; male OR, 1.27; response rate, 34%), when compared with whites.

Table 3.

Association of variables with later response wave by sex, MEC participants (1993-1996)

Variable | Females (n = 95,545)*, OR (95% CI) | Males (n = 81,346)*, OR (95% CI)
Demographics
    Age at questionnaire completion (10-y intervals) | 1.01 (0.99-1.03) | 0.91 (0.90-0.93)
    Ethnicity/race (reference group = White)
        African-American | 1.57 (1.49-1.65) | 1.38 (1.30-1.48)
        Native Hawaiian | 1.32 (1.24-1.41) | 1.27 (1.19-1.36)
        Japanese | 1.16 (1.11-1.21) | 1.11 (1.06-1.16)
        Latino | 1.18 (1.11-1.25) | 1.06 (0.99-1.12)
        Other | 1.45 (1.36-1.55) | 1.42 (1.32-1.53)
    Hawaii vs Los Angeles (reference group) | 0.88 (0.84-0.91) | 0.88 (0.84-0.92)
    Foreign vs U.S. born (reference group) | 1.32 (1.26-1.38) | 1.22 (1.16-1.29)
    Years of education | 0.96 (0.96-0.97) | 0.95 (0.94-0.95)
    Completed Spanish questionnaire (reference group = English) | 1.33 (1.24-1.43) | 1.43 (1.33-1.54)
    Currently married (reference group = not married) | 0.98 (0.95-1.00) | 1.04 (1.00-1.08)
    No. children | 1.01 (1.00-1.02) | 1.02 (1.01-1.02)
Health history (reference group = none for categorical variables)
    Family history of cancer | 0.93 (0.90-0.96) | 0.94 (0.91-0.97)
    Personal history of cancer | 0.84 (0.81-0.88) | 0.92 (0.88-0.97)
    Digestive disease† | 0.98 (0.94-1.01) | 0.95 (0.92-0.99)
    Hypertension | 1.03 (1.00-1.06) | 0.98 (0.95-1.02)
    Asthma or other allergies | 0.90 (0.87-0.93) | 0.92 (0.89-0.95)
    Other major illness‡ | 1.01 (0.97-1.05) | 1.01 (0.97-1.05)
    BMI | 1.00 (1.00-1.01) | 1.00 (1.00-1.01)
Behavioral factors
    Energy intake (kcal/d, divided by 500) | 1.01 (1.00-1.01) | 1.00 (1.00-1.01)
    Smoking (reference group = Never smoked)
        Former smoker | 0.97 (0.94-1.00) | 0.90 (0.87-0.93)
        Current smoker | 1.01 (0.97-1.06) | 1.06 (1.01-1.10)
    Moderate or vigorous physical activity (h/wk) | 0.96 (0.94-0.97) | 0.97 (0.96-0.98)
Medication and supplement use (reference group = none for categorical variables)
    Current aspirin use | 0.93 (0.90-0.97) | 0.90 (0.86-0.93)
    Former/current medication for peptic ulcer disease | 0.95 (0.92-0.99) | 1.01 (0.97-1.05)
    Current other over-the-counter medication use§ | 0.97 (0.93-1.01) | 0.95 (0.91-0.99)
    Current multivitamin use | 1.02 (0.99-1.05) | 0.97 (0.94-1.01)
    Current vitamin C use | 1.04 (1.00-1.07) | 1.00 (0.97-1.05)
    Current vitamin E use | 1.02 (0.98-1.05) | 1.00 (0.96-1.05)
    Current calcium use | 0.98 (0.95-1.01) | 1.06 (1.02-1.12)
    Ever estrogen or progesterone use | 0.89 (0.87-0.92) | NA (NA)

Abbreviation: 95% CI, 95% confidence interval.

* All variables are adjusted for all other variables present in the model.

† Polyps, ulcer, gall stones, removal of stomach, or gall bladder.

‡ Heart attack, angina, stroke, and/or diabetes.

§ Includes antihistamines, antacids, and/or laxatives.

Persons with more years of education were more likely to respond in the initial wave, with the odds of a later response decreasing by 4% to 5% per year of education. Hawaii participants were more likely to have responded in the initial wave than participants from the Los Angeles area; however, there was little overlap in ethnic groups across the two locations. Almost all Latinos and African-Americans were recruited from Los Angeles. Sex differences were noted in the effects of marital status. Married men were somewhat more likely to respond later to the questionnaire, especially working-age men. Marital status was not related to later response for women.
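Because education enters the logistic model as a continuous variable, a per-year OR compounds multiplicatively across larger differences in schooling. The sketch below illustrates this using the per-year ORs from Table 3 (0.96 for females, 0.95 for males). The 4-year contrast is a hypothetical example, the function name is invented, and the calculation assumes the effect is linear on the log-odds scale across the whole range.

```python
def odds_ratio_for_difference(or_per_unit: float, units: float) -> float:
    """OR implied for a difference of `units`, given a per-unit OR.

    Assumes a linear effect on the log-odds scale, as in a logistic model
    with a single continuous term for the covariate.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** units


# Hypothetical contrast: 4 additional years of education.
female_or_4y = odds_ratio_for_difference(0.96, 4)
male_or_4y = odds_ratio_for_difference(0.95, 4)
```

Under these assumptions, 4 additional years of education would correspond to roughly 15% lower odds of later response for females and roughly 19% lower odds for males.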

Among health history variables, personal and family history of cancer were associated with response in the initial wave. Other medical conditions showed small but significant associations with questionnaire response time. Having asthma or other allergies was associated with response in the initial wave. Other chronic diseases, such as cardiovascular disease, digestive disease, and hypertension, were associated with response in select subgroups only (data not shown). BMI was not related to later response.

Among behavioral factors, energy intake was not related to wave of response, but more physical activity per week was related to response in the first wave. However, this was a small difference, with initial responders reporting a mean of 1.2 h per week of moderate and vigorous activity and later responders reporting 1.1 h. For all males and working-age females, former smokers were more likely to respond in the initial wave than those who never smoked, whereas current smokers were more likely to respond in a later wave for all males and for older females (data not shown).

Medication and supplement use showed some small associations, most in the direction of increasing the likelihood of response in the initial wave. Regular current use of aspirin was significant for all males as well as older females (data not shown). Use of antihistamines, antacids, and laxatives, combined into a single category of over-the-counter remedies, was significant for males, particularly in the older group. All females showed a small effect for hormone replacement (estrogen and/or progesterone).

The use of up to 26 variables in the multivariate models resulted in a nontrivial proportion of individuals with missing values (12.2%). To determine whether the missing data resulted in a biased sample, we compared two models containing only the demographic independent variables: one including participants with no missing data on the eight demographic variables (96.6% of the sample) and one restricted to participants with no missing data on any of the 26 model variables (87.8%). The models, analyzed by sex, showed consistent findings, suggesting that no bias was introduced due to missing data.

Subgroup analyses were conducted using all of the same variables (except ethnicity) within each of the five targeted ethnic groups by sex. Few differences were found compared with the overall models.

Comparison of these results with those from the follow-up questionnaire showed a significant relation between baseline and follow-up wave of response [χ²(10) = 6809.9; P < 0.0001; Table 4]. For each wave of baseline response, the highest percentage of persons responded in the initial wave of the brief follow-up survey. However, proportionally more wave 1 baseline respondents also responded in the first wave of follow-up. Similarly, wave 3 baseline respondents were proportionally more likely to respond in wave 3 of follow-up, as well as to be unreachable or to refuse to participate. The logistic models for later response to the follow-up questionnaire showed results similar to those of the baseline analysis. Most differences were in the same direction, although the level of significance varied.
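The 10 degrees of freedom quoted above match a table with six follow-up status categories and three baseline waves, since (6 − 1) × (3 − 1) = 10. A Pearson χ² test of association on a table of counts can be sketched as follows. The function name is invented, and the reported statistic cannot be reproduced from Table 4 directly because the table gives percentages rather than counts.

```python
def chi_square_association(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows, each a list of non-negative cell counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a positive total")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom
```

Applied to a 6 × 3 table of counts (follow-up status by baseline wave), the function returns a statistic with 10 degrees of freedom.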

Table 4.

Response rates to the follow-up questionnaire by wave of response to the baseline questionnaire, MEC participants (1999-2000)

Follow-up status (%) | Baseline wave 1 | Baseline wave 2 | Baseline wave 3
Wave 1 | 52.3 | 36.1 | 30.6
Wave 2 | 12.9 | 13.6 | 12.8
Wave 3* | 7.2 | 8.9 | 9.7
Phone interview | 14.3 | 21.1 | 23.3
Unreachable | 5.8 | 8.1 | 9.7
Refused | 7.6 | 12.2 | 14.0

* Follow-up wave 3 includes response to both third and fourth mailings.

Table 5 shows the percentage of missing data by response wave for seven variables used in baseline modeling; the percent missing generally increased with each wave of response. Across the seven variables, 12.8% of wave 3 respondents were missing at least one variable. Each variable was also tested in a logistic model controlling for ethnicity, Spanish language survey, age, and sex; those who responded later had 27% to 68% higher risk of missing data than those who responded to the initial mailing. Within ethnic groups, differences were generally in the same direction but sometimes attenuated (data not shown).
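The "any missing" measure described above flags a respondent when at least one of the seven variables is missing. A minimal sketch of that tabulation by wave follows. The record layout, field names, and use of `None` to mark a missing value are all hypothetical and are not taken from the study's data files.

```python
from collections import defaultdict

# Hypothetical field names standing in for the seven variables in Table 5.
VARIABLES = ["education", "us_born", "n_children", "weight",
             "diet", "smoking", "activity"]


def pct_any_missing_by_wave(records):
    """Percent of respondents in each wave missing at least one variable.

    Each record is a dict with a 'wave' key and one key per variable;
    a value of None (or an absent key) is treated as missing.
    """
    totals = defaultdict(int)
    with_missing = defaultdict(int)
    for record in records:
        wave = record["wave"]
        totals[wave] += 1
        if any(record.get(name) is None for name in VARIABLES):
            with_missing[wave] += 1
    return {wave: 100.0 * with_missing[wave] / totals[wave]
            for wave in sorted(totals)}
```

The resulting percentages by wave correspond in form to the "Any missing" row of Table 5.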

Table 5.

Percent with missing data in the baseline questionnaire by wave number, MEC participants (1993-1996)

Missing rate, % (n = 201,461)

| Variables | Wave 1 | Wave 2 | Wave 3 | P* |
| --- | --- | --- | --- | --- |
| Education | 1.0 | 1.4 | 1.8 | <0.0001 |
| Born in the United States | 0.3 | 0.5 | 0.4 | <0.0001 |
| No. children | 1.5 | 1.9 | 2.3 | <0.0001 |
| Weight | 0.6 | 1.1 | 1.1 | <0.0001 |
| Diet | 3.8 | 4.8 | 5.2 | <0.0001 |
| Smoking status | 1.2 | 1.9 | 2.3 | <0.0001 |
| Physical activity | 1.6 | 2.8 | 3.7 | <0.0001 |
| Any missing | 8.0 | 11.4 | 12.8 | <0.0001 |

*Mantel-Haenszel χ2 test for linear association.

Includes missing and implausible values.

Cases with one or more missing among the seven variables listed.
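
The test for linear association named in the Table 5 footnote can be sketched as follows. This is a minimal illustration, not the analysis code used in the study (which was run in SAS). It uses the standard form of the statistic, M² = (N − 1)r² with 1 degree of freedom, where r is the Pearson correlation between the wave score and a 0/1 missing indicator. The function name, the equally spaced wave scores (1, 2, 3), and the counts in the usage lines are all assumptions made for illustration; the per-wave counts are hypothetical because the tables report only percentages.

```python
import math


def mantel_haenszel_trend(missing, totals, scores=None):
    """Mantel-Haenszel chi-square for linear association (1 df).

    missing[i] = number of respondents with a missing value in wave i
    totals[i]  = number of respondents in wave i
    scores[i]  = ordinal score for wave i (assumed 1, 2, 3, ... if omitted)
    Returns (r, M2, p), where M2 = (N - 1) * r**2.
    Assumes at least one missing and one non-missing case overall.
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n = sum(totals)

    # Sums over individuals: x = wave score, y = 1 if missing, else 0.
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_x2 = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(missing)  # y**2 == y for a 0/1 indicator
    sum_xy = sum(s * m for s, m in zip(scores, missing))

    mean_x = sum_x / n
    mean_y = sum_y / n
    sxx = sum_x2 - n * mean_x ** 2
    syy = sum_y - n * mean_y ** 2
    sxy = sum_xy - n * mean_x * mean_y

    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r * r
    # Upper tail of a chi-square with 1 df: P(Z**2 > m2) = erfc(sqrt(m2 / 2)).
    p = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p


# Hypothetical counts: 1,000 respondents per wave, with missing rates of
# 1.0%, 1.4%, and 1.8% chosen to mirror the Education row of Table 5.
r, m2, p = mantel_haenszel_trend([10, 14, 18], [1000, 1000, 1000])
print(f"r = {r:.4f}, M2 = {m2:.2f}, P = {p:.3f}")
```

With wave sizes in the tens of thousands, as in the cohort, the same percentage differences would yield a much larger statistic, which is consistent with the very small P values reported in Table 5.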

We have shown that there are substantial associations between ethnicity and wave of response to a 26-page self-administered mail questionnaire, with nonwhite ethnic/racial groups more likely than whites to respond to a later mailing of the questionnaire. Japanese respondents had a higher response rate than whites but were more likely to respond later. Native Hawaiians were also likely to respond later, although their response rate was better than average for this sample. African-Americans and Latinos had lower response rates and were substantially more likely to respond in a later wave. Relatedly, use of the Spanish language questionnaire and being foreign born showed similar associations. Not surprisingly, more education was related to earlier response. In addition, a personal or family history of cancer, having asthma, regular use of aspirin, and engaging in more physical activity were associated with early response, whereas being a current smoker was associated with later response. We also show that most associations are present within ethnic groups and also apply to a follow-up questionnaire. Although the three mailings of questionnaires were costly, they were necessary for recruiting ethnic minorities and those with less healthy lifestyles. However, each successive wave of response was associated with more missing values, which effectively reduces the number of cases that can be used for multivariate analyses.

These findings are generally similar to findings reported in the literature. Several studies have shown that education and/or other proxies for socioeconomic level were inversely related to timing of response (9, 10, 13, 17, 18, 28). The association of current smoking with late response has also been reported (9, 10, 15, 17, 29). Several studies noted that more effort was needed to recruit Hispanic and African-American participants (20, 30-32). Some studies (12, 16, 20) reported findings that suggest personal salience of the topic was related to early response. Our early responders more often had personal experience with cancer, in a family member and/or themselves, whereas other diseases had mixed associations with response wave. Among the studies in the literature, there are notable differences in the timing of mailings, use of incentives, survey topic, and the burden of completing the questionnaire. The populations have been from several different countries, in specific age groups and sexes, and even included participants who were recruited previously for other research. Given these differences, common findings may be considered robust.

As important as these common findings is the context in which they are presented. Some have stated that repeated mailings or extra recruitment efforts did not affect prevalence estimates (e.g., chronic pain) or remove differences between responders and nonresponders (12, 16, 21, 29). Our findings suggest that repeated mailings make a meaningful difference for nonwhite ethnic groups. For example, the third wave accounted for 12.3% of respondents overall, yet 15.8% of Latinos and 14.7% of African-Americans. In contrast to ethnicity, wave had less practical importance for the recruitment of current smokers; 13.2% were recruited in wave 3. Similarly, the physical activity differences were nominal, and energy intake and BMI did not differ. The purpose of the MEC study is to investigate associations between lifestyle and cancer and other diseases among different ethnic groups. The definitive test will be whether there are differences in the lifestyle-disease associations between early and late responders, which we will examine when more follow-up data are available.

Some researchers conducting similar analyses have failed to support a “continuum of resistance” model (16, 33), which posits that initial responders are on a continuum with later responders and nonresponders, thus providing the basis for using late responders to assess potential nonresponse bias. It is certain that persons who respond later after repeated solicitations would be nonresponders if recruitment efforts were less rigorous, for any given cross-sectional sample. Our results for the follow-up questionnaire partially support this concept because those persons most difficult to recruit at baseline were more likely to refuse, be unreachable, or respond later on follow-up. However, 31% of late responders at baseline responded to the initial survey at follow-up.

As a practical matter, identifying potential participants for the four nonwhite ethnic groups was challenging, and it was necessary to recruit as many as possible within our lists given the constraints of money and time. Studies that have advocated for less rigorous recruitment efforts (12, 21, 29) had better initial (41-53%) and final (58-83%) mail response rates than we obtained. One of these studies sampled a more homogeneous population of retirement community residents (29) and another sampled previous research volunteers (12). Our response rate was 25.6%, including undeliverable addresses in the denominator (30.2% excluding them), for a study that required participants to complete a 26-page self-administered survey and asked them to join a longitudinal cohort study. Our response rate for African-Americans (21%) was consistent with another cancer prevention surveillance study that reported a 17.5% response for African-Americans (32). If potential participants are abundant for an ethnic group of interest, and depending on the research question, it may be advantageous to attempt recruitment with new individuals rather than resending questionnaires. In our sample, initial responders did not differ from later responders on energy intake and BMI and differed little on physical activity and smoking. For all ethnic groups, the first mailing yielded the highest response rate and fewer resources were needed for follow-up.
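
The two overall response rates quoted above differ only in the denominator, so their ratio gives a rough sense of how many mailed questionnaires were undeliverable. The short sketch below works through that arithmetic. It assumes both rates share the same numerator (completed questionnaires) and that the only difference in the denominators is the undeliverable addresses; the function name is hypothetical, and the resulting share is an implied approximation from rounded figures, not a number reported in the study.

```python
def implied_undeliverable_share(rate_incl, rate_excl):
    """Share of mailed questionnaires that were undeliverable.

    rate_incl = responders / all mailed (undeliverables in the denominator)
    rate_excl = responders / (all mailed - undeliverable)
    Dividing the two cancels the responders, leaving
    (all mailed - undeliverable) / all mailed.
    """
    return 1.0 - rate_incl / rate_excl


share = implied_undeliverable_share(0.256, 0.302)
print(f"{share:.1%}")  # prints 15.2%
```

Under these assumptions, roughly 15% of the addresses on the mailing lists would have been undeliverable.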

One recent simulation study (23) suggested that late respondents were likely to have error in their responses. Using missing data as an indicator of response quality, our results support that claim. We found that successive response waves had progressively more missing data even after adjusting for age, sex, language of questionnaire, and ethnicity. The percent missing never became unacceptable in any of the three mailings. However, for multivariate analyses, cases excluded due to missing data may affect generalizability.

A limitation of the current study was that we had little data for comparing respondents with nonrespondents, which is necessary to truly assess nonresponse bias. We had nonresponse data for the follow-up questionnaire, but that comparison occurred in a self-selected sample of previous participants. However, our cohort participants were similar to the U.S. Census data for Hawaii and Los Angeles on education and marital status, suggesting that the sample was representative (24). Our results are unlikely to apply to all studies, as our response wave definition was data driven and our mailing dates were not set at predetermined time intervals, as they were in many other studies.

In conclusion, our recommendation is to pursue the difficult-to-recruit participants through repeated contacts, especially when sample availability is limited. The increased response rate may serve to lessen nonresponse bias; however, missing data may be more problematic when less motivated participants are finally recruited. To compensate, we suggest the use of adjunct measures to calibrate and/or validate measurements on subsamples using more intensive data collection protocols as was previously done for dietary data (25).

Grant support: National Cancer Institute, U.S. Department of Health and Human Services grants R37 CA 54281 and R25 CA 90956.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Dillman DA. The design and administration of mail surveys. Ann Rev Soc 1991;17:225–49.
2. White E, Carney PA, Kolar AS. Increasing response to mailed questionnaires by including a pencil/pen. Am J Epidemiol 2005;162:261–6.
3. Rosoff PM, Werner C, Clipp EC, Guill AB, Bonner M, Demark-Wahnefried W. Response rates to a mailed survey targeting childhood cancer survivors: a comparison of conditional versus unconditional incentives. Cancer Epidemiol Biomarkers Prev 2005;14:1330–2.
4. Evans BR, Peterson BL, Demark-Wahnefried W. No difference in response rate to a mailed survey among prostate cancer survivors using conditional versus unconditional incentives. Cancer Epidemiol Biomarkers Prev 2004;13:277–8.
5. Hoffman SC, Burke AE, Helzlsouer KJ, Comstock GW. Controlled trial of the effect of length, incentives, and follow-up techniques on response to a mailed questionnaire. Am J Epidemiol 1998;148:1007–11.
6. Subar AF, Ziegler RG, Thompson FE, et al. Is shorter always better? Relative importance of questionnaire length and cognitive ease on response rates and data quality for two dietary questionnaires. Am J Epidemiol 2001;153:404–9.
7. Chretien JP, Chu LK, Smith TC, Smith B, Ryan MA. Demographic and occupational predictors of early response to a mailed invitation to enroll in a longitudinal health study. BMC Med Res Methodol 2007;7:6.
8. Bhatti P, Sigurdson AJ, Wang SS, et al. Genetic variation and willingness to participate in epidemiologic research: data from three studies. Cancer Epidemiol Biomarkers Prev 2005;14:2449–53.
9. Brogger J, Bakke P, Eide GE, Gulsvik A. Contribution of follow-up of nonresponders to prevalence and risk estimates: a Norwegian respiratory health survey. Am J Epidemiol 2003;157:558–66.
10. Chen RL, Wei L, Syme PD. Comparison of early and delayed respondents to a postal health survey: a questionnaire study of personality traits and neuropsychological symptoms. Eur J Epidemiol 2003;18:195–202.
11. de Winter A, Oldehinkel AJ, Veenstra R, Brunnekreef JA, Verhulst FC, Ormel J. Evaluation of non-response bias in mental health determinants and outcomes in a large sample of pre-adolescents. Eur J Epidemiol 2005;20:173–81.
12. Elliott AM, Hannaford PC. Third mailings in epidemiological studies: are they really necessary? Family Practice 2003;20:592–4.
13. Etter JF, Perneger TV. Analysis of non-response bias in a mailed health survey. J Clin Epidemiol 1997;50:1123–8.
14. Helasoja V, Prattala R, Dregval L, Pudule I, Kasmel A. Late response and item nonresponse in the Finbalt Health Monitor survey. Eur J Public Health 2002;12:117–23.
15. Korkeila K, Suominen S, Ahvenainen J, et al. Non-response and related factors in a nation-wide health survey. Eur J Epidemiol 2001;17:991–9.
16. Lahaut VM, Jansen HA, van de Mheen D, Garretsen HF, Verdurmen JE, van Dijk A. Estimating non-response bias in a survey on alcohol consumption: comparison of response waves. Alcohol Alcohol 2003;38:128–34.
17. Larroque B, Kaminski M, Bouvier-Colle MH, Hollebecque V. Participation in a mail survey: role of repeated mailings and characteristics of nonrespondents among recent mothers. Paediat Perinatal Epidemiol 1999;13:218–33.
18. Olowokure B, Caswell M, Duggal HV. Response patterns to a postal survey using a cervical screening register as the sampling frame. Public Health 2004;118:508–12.
19. Paganini-Hill A, Hsu G, Chao A, Ross RK. Comparison of early and late respondents to a postal health survey questionnaire. Epidemiology 1993;4:375–9.
20. Rogers A, Murtaugh MA, Edwards S, Slattery ML. Contacting controls: are we working harder for similar response rates, and does it make a difference? Am J Epidemiol 2004;160:85–90.
21. Siemiatycki J, Campbell S. Nonresponse bias and early versus all responders in mail and telephone surveys. Am J Epidemiol 1984;120:291–301.
22. Brenner H. Alternative approaches for estimating prevalence in epidemiologic surveys with two waves of respondents. Am J Epidemiol 1995;142:1236–45.
23. Stang A, Jockel KH. Studies with low response proportions may be less biased than studies with high response proportions. Am J Epidemiol 2004;159:204–10.
24. Kolonel LN, Henderson BE, Hankin JH, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 2000;151:346–57.
25. Stram DO, Hankin JH, Wilkens LR, et al. Calibration of the dietary questionnaire for a multiethnic cohort in Hawaii and Los Angeles. Am J Epidemiol 2000;151:358–70.
26. SAS OnlineDoc 9.1.3. Cary (NC): SAS Institute, Inc.; 2002–2005.
27. Nothlings U, Wilkens LR, Murphy SP, Hankin JH, Henderson BE, Kolonel LN. Meat and fat intake as risk factors for pancreatic cancer: the multiethnic cohort study. J Natl Cancer Inst 2005;97:1458–65.
28. Brambilla DJ, McKinlay SM. A comparison of responses to mailed questionnaires and telephone interviews in a mixed mode health survey. Am J Epidemiol 1987;126:962–71.
29. Paganini-Hill A, Hsu G. Smoking and mortality among residents of a California retirement community. Am J Public Health 1994;84:992–5.
30. Nelson KM, Geiger AM, Mangione CM. Racial and ethnic variation in response to mailed and telephone surveys among women in a managed care population. Ethnic Dis 2004;14:580–3.
31. Morris MC, Colditz GA, Evans DA. Response to a mail nutritional survey in an older bi-racial community population. Ann Epidemiol 1998;8:342–6.
32. Satia JA, Galanko JA, Rimer BK. Methods and strategies to recruit African Americans into cancer prevention surveillance studies. Cancer Epidemiol Biomarkers Prev 2005;14:718–21.
33. Lin IF, Schaeffer NC. Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly 1995;59:236–58.