Abstract
Valid and reliable self-report measures of cancer screening behaviors are important for evaluating efforts to improve adherence to guidelines. We evaluated test-retest reliability and validity of self-report of the fecal occult blood test (FOBT), sigmoidoscopy (SIG), colonoscopy (COL), and barium enema (BE) using the National Cancer Institute colorectal cancer screening (CRCS) questionnaire. A secondary objective was to evaluate reliability and validity by mail, telephone, and face-to-face survey administration modes. Consenting men and women, 51 to 74 years old, receiving care at a multispecialty clinic for at least 5 years who had not been diagnosed with colorectal cancer were stratified by prior CRCS status and randomized to survey mode (n = 857). Within survey mode, respondents were randomized to complete a second survey at 2 weeks, 3 months, or 6 months. Comparing self-report with administrative and medical records, concordance estimates were 0.91 for COL, 0.85 for FOBT, 0.85 for SIG, and 0.92 for BE. Overall sensitivity estimates were 0.91 for COL, 0.82 for FOBT, 0.76 for SIG, and 0.56 for BE. Specificity estimates were 0.91 for COL, 0.86 for FOBT, 0.89 for SIG, and 0.97 for BE. Sensitivity and specificity varied little by survey mode for any test. Report-to-records ratio showed overreporting for SIG (1.1), COL (1.15), and FOBT (1.57), and underreporting for BE (0.82). Reliability at all time intervals was highest for COL; there was no consistent pattern according to survey mode. This study provides evidence to support the use of the National Cancer Institute CRCS questionnaire to assess self-report with any of the three survey modes. (Cancer Epidemiol Biomarkers Prev 2008;17(4):758–67)
Introduction
Valid and reliable self-report measures of cancer screening behaviors are important for identifying correlates and predictors of behavior, evaluating the effectiveness of behavioral interventions, and monitoring progress and trends in adherence to cancer screening guidelines (1, 2). The implementation of the Health Insurance Portability and Accountability Act of 1996 limited access to medical records and is likely to increase the need to use self-reported data in epidemiologic, health services, and behavioral research (3-5). For colorectal cancer screening (CRCS), assessing the accuracy of self-reports is especially difficult because there are multiple types of acceptable screening tests [i.e., fecal occult blood test (FOBT), sigmoidoscopy (SIG), colonoscopy (COL), and barium enema (BE)]; the recommended time interval for test completion differs between tests, and the guidelines have changed over time (6). Adding to this complexity is the number of measures of utilization, e.g., initial, recent, up-to-date, ever.
In 1999, the National Cancer Institute (NCI) convened a group of experts to develop uniform descriptions of the tests and questions to measure CRCS behaviors (hereafter called the NCI CRCS questionnaire; ref. 6). The initial draft of the questionnaire underwent cognitive testing as part of the development of NCI's Health Information National Trends Survey in 2003. Qualitative data from focus groups (7-12) showed that many adults do not know or recognize the names of CRCS tests and that many were unable to distinguish between SIG and COL or between home-based and office-based FOBT (13, 14); therefore, the cognitive interviews focused on issues related to comprehension and interpretation of the questions and, to a lesser extent, on strategies respondents used to recall information. Findings from the cognitive interviews were consistent with previous findings and led to a revised questionnaire and a recommendation that CRCS test descriptions be provided prior to asking questions about test use (6).
The purpose of this report was to evaluate the reliability and validity of self-report for the four tests included in the NCI CRCS questionnaire (FOBT, SIG, COL, and BE). A secondary objective was to determine if validity estimates were equivalent across three modes of administration (mail, telephone, and face-to-face). We focused on survey mode because such differences may affect comparisons of estimates from national surveys (e.g., the National Health Interview Survey is administered face-to-face whereas the Behavioral Risk Factor Surveillance System is administered by telephone) and because these survey modes are commonly used in surveys and intervention studies. Although one study evaluated the accuracy of self-reports of mammography when ascertained by mail compared with telephone (15), to date, no one has compared self-report of any cancer screening behaviors when assessed by telephone versus face-to-face modes. Holbrook et al. (16) suggest that response quality may differ between telephone and face-to-face interviews due to the opportunity to establish greater trust and rapport between the respondent and the interviewer. We also evaluated the test-retest reliability of the NCI CRCS questionnaire over 2-week, 3-month, and 6-month intervals.
Materials and Methods
Study Setting, Population, and Recruitment
The study was approved by the University of Texas Health Science Center at Houston, Committee for the Protection of Human Subjects. The study setting was the Kelsey-Seybold Clinic (KSC), the largest multispecialty medical organization in Houston, Texas with a main campus and 17 satellite clinics. KSC delivers primary and specialty care to >400,000 patients and has a staff of 119 primary care physicians (family and internal medicine) and seven gastroenterologists. The prevalence of any CRCS among KSC patients, defined as home-based FOBT within the past year, SIG within the past 5 years, or COL within the past 10 years, was 54% in 2000 (17).
The study population was English-speaking men and women between 51 and 74 years of age, receiving primary care at KSC for at least 5 years. Although guidelines for COL recommend one every 10 years, we decided against requiring KSC enrollment for 10 years because of the limited number of patients meeting this more restrictive eligibility criterion. Patients with a personal history of colorectal cancer were excluded.
Eligibility status was determined from the KSC electronic administrative database. Potentially eligible participants were randomly selected every 2 weeks. Patients were mailed an invitation letter describing the study along with contact information for the Kelsey Research Foundation; staff at the Foundation facilitate research collaborations between KSC physicians and university researchers. The letter stated that patients could decline participation by calling the Foundation. A Foundation research assistant called patients who did not actively decline in order to confirm willingness to participate, ascertain eligibility, and enroll them in the study by obtaining verbal informed consent, and permission to review the medical record (as required by the Health Insurance Portability and Accountability Act). Patients were called at least six times before being classified as nonrespondents. Patients who refused were asked to give a reason for declining. Weekly, the Foundation staff sent enrollee names to University of Texas-Houston School of Public Health project staff who made all additional contacts. Recruitment began in September 2005 and follow-up ended in August 2007.
Study Design
We used a randomized design to assess equivalence of reliability and validity estimates for mail, telephone, and face-to-face modes of survey administration. We stratified randomization to survey mode by CRCS test status [single test within guidelines (FOBT, SIG, COL, or BE), multiple tests within guidelines, or no test within guidelines] as recorded in the KSC administrative database. Random numbers were generated using Stata version 10.0 and were used to assign patients with these test profiles to survey mode.
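For illustration only (the study's assignments were generated in Stata version 10.0), the following Python sketch shows one way stratified assignment to survey mode could be carried out; the stratum labels follow the text, while the data layout and seed are hypothetical.

```python
import random

MODES = ["mail", "telephone", "face-to-face"]

def assign_survey_mode(patients, seed=1):
    """patients: list of dicts with a 'stratum' key
    ('no test', 'single test', or 'multiple tests' within guidelines)."""
    rng = random.Random(seed)
    by_stratum = {}
    for patient in patients:
        by_stratum.setdefault(patient["stratum"], []).append(patient)
    for members in by_stratum.values():
        rng.shuffle(members)
        for i, patient in enumerate(members):
            patient["mode"] = MODES[i % len(MODES)]  # balanced within stratum
    return patients
```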
We defined adherence to guidelines as FOBT within the past year or SIG, COL, or BE within the past 5 years. We used a 5-year interval for COL, rather than 10 years as recommended by the American Cancer Society (18), to be consistent with our eligibility criteria (i.e., KSC patient for at least 5 years).
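A minimal sketch of this adherence rule, assuming a simple record of each patient's most recent test date; the field names and the 365-day year are illustrative, not the study's code.

```python
from datetime import date

# Guideline windows used in the study: FOBT within 1 year; SIG, COL, or BE within 5 years.
WINDOW_DAYS = {"FOBT": 365, "SIG": 5 * 365, "COL": 5 * 365, "BE": 5 * 365}

def adherent(test_type, most_recent_test_date, reference_date):
    """True if the most recent test of this type falls within its guideline window."""
    if most_recent_test_date is None:
        return False
    return (reference_date - most_recent_test_date).days <= WINDOW_DAYS[test_type]

# Example: a colonoscopy 4 years before the survey date counts as adherent.
print(adherent("COL", date(2001, 6, 15), date(2005, 9, 1)))  # True
```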
Self-report Questionnaire and Study Procedures
Vernon et al. (6) have described the development of the NCI CRCS questionnaire. We formatted two versions for this study. The mail version had explicit written instructions and detailed skip patterns. The telephone and face-to-face version contained interviewer scripts and prompts. Pilot testing resulted in reordering a few questions to help respondents follow the skip patterns appropriate for their CRCS test history. Both versions were professionally printed as eight-page color bubble-formatted booklets that could be scanned for data entry.
For each CRCS test, respondents were read, or in the mail version were instructed to read, a test description and then were asked whether they had ever heard of it, whether they had ever had it, when the most recent one had been completed, the reason for the test, and the facility location (KSC or elsewhere). Respondents who had heard of the test and had had it were asked to report whether it occurred within defined time intervals. For FOBT, the intervals were “a year ago or less, more than 1 but not more than 2 years ago, more than 2 but not more than 5 years ago, and more than 5 years ago.” For SIG, COL, and BE, the intervals were “a year ago or less, more than 1 but not more than 5 years ago, more than 5 but not more than 10 years ago, and more than 10 years ago.” Respondents were also asked the month and year of the most recent test.
We also included questions about sociodemographics (age, sex, race/ethnicity, education, and marital status), family history of colorectal cancer, use of the healthcare system (recency of last KSC visit and number of KSC and non-KSC visits during the past 5 years), and social desirability. Because prior studies consistently show that respondents tend to overreport cancer screening behaviors, we included a measure of social desirability, a construct defined as the tendency to respond to questions in socially or culturally sanctioned ways (19, 20). We used the 10-item version (21) of the original 33-item Marlowe-Crowne Social Desirability Scale (19). The response format was true/false and scores ranged from 0 (low social desirability) to 10 (high social desirability).
The first survey (hereafter called the validation survey) was used to compare patients' self-report against information in the medical record, electronic administrative database, and reports from non–KSC physicians. Upon completion of the validation survey, participants were asked whether they would be willing to complete a second survey at a future date (hereafter called the reliability survey). Consenting respondents were randomly assigned within the same survey mode to be re-interviewed at one of three time intervals—2 weeks, 3 months, or 6 months. The same protocols were followed in the reliability survey.
We followed the Dillman approach for mail surveys (22). Enrollees randomized to the mail survey received a packet consisting of a letter, survey booklet, postage-paid envelope, and pencil. Nonrespondents were mailed a second packet after 4 weeks and a final reminder postcard after 8 weeks. Sixteen weeks from the enrollment date, enrollees who did not return a mail survey were classified as nonrespondents. Enrollees randomized to the telephone and face-to-face surveys were called at least six times before being classified as nonrespondents. The day before the face-to-face interview, enrollees were called to confirm their appointment. Face-to-face interviews were conducted at the participant's home or at the KSC main campus, depending on the participant's preference (61% chose a home interview). Participants were mailed a $20 honorarium for each survey completed.
Medical Record Abstraction
Because there are errors in both administrative data and medical records (23, 24), we combined data on test type and date from the administrative database and medical record as our gold standard measure (hereafter called the combined medical record). The data sources were merged based on the study identification number, test type, and test date. Duplicate tests were deleted. If a patient had multiple tests, tests were sorted by recency and type. We abstracted dates of initial and most recent patient visits, test types and dates, reason (screening or diagnostic), test facility location (KSC or non-KSC), results, outcome of follow-up for an abnormal test, and where the data were found (administrative database, medical record, or both). For FOBT, we abstracted data only on the 3-day, home-based test.
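A short sketch of the merge-and-deduplicate step described above, assuming each source is reduced to (patient ID, test type, test date) rows; the record layout is hypothetical.

```python
def combine_records(admin_rows, chart_rows):
    """Each row is a (patient_id, test_type, test_date) tuple from one source.
    Returns each patient's tests, de-duplicated and sorted most recent first."""
    combined = set(admin_rows) | set(chart_rows)   # union drops exact duplicates
    by_patient = {}
    for patient_id, test_type, test_date in combined:
        by_patient.setdefault(patient_id, []).append((test_date, test_type))
    for tests in by_patient.values():
        tests.sort(reverse=True)                   # by recency, then test type
    return by_patient
```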
For quality control purposes, we randomly selected 10% of the records for re-review to assess interrater agreement. We evaluated agreement between abstractors based on the most recent test within guidelines and on all tests received within the past 5 years. The records of 81 patients were abstracted by three pairs of raters who reviewed between 10 and 65 records. Agreement for the most recent test within guidelines was 98% (κ, 0.96). Agreement for all tests within the past 5 years was 91% (κ, 0.89). When discrepancies occurred, the project director (R.W. Vojvodic) and database manager (S. Coan) reviewed the medical record and billing data and resolved the disagreement.
If a respondent reported a CRCS test from a non–KSC provider that was not recorded in the combined medical record, we requested permission to contact the provider. We mailed packets to providers containing a cover letter, a signed patient consent form, a CRCS test validation form, a $2 bill, and a postage-paid envelope. If we did not receive a response, we called the provider 2, 4, and 6 weeks after mailing the packet. A second packet without the incentive was mailed at the same time as the 4-week call. Providers were classified as nonrespondents after 8 weeks. Thirty of the 54 participants (55%) who reported a CRCS test from a non–KSC provider gave us permission, 7 actively refused, and 17 did not respond to our contact attempts. Of the 30 providers we attempted to contact, we had incomplete contact information for 4, 23 returned a completed validation form, and 3 were nonrespondents. Completed provider forms verified self-report of one FOBT and nine COLs. Providers reported no tests in the record for five patients, one provider had only the patient's self-report, and seven providers reported information that was less current than the KSC medical record. This information was included with KSC combined medical record data.
Statistical Analysis
We used χ2 statistics to assess whether sociodemographics and healthcare use differed according to randomized group; ANOVA was used to assess mean differences in social desirability scores. To evaluate the reliability and validity estimates, we used a criterion approach described below.
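For illustration, the same kinds of tests can be run with SciPy; the contingency table below echoes the gender counts in Table 1, and the social desirability scores are simulated, not study data.

```python
import numpy as np
from scipy import stats

# Chi-square test of a 2 x 3 table (characteristic by survey mode).
table = np.array([[186, 188, 190],   # female: face-to-face, mail, telephone
                  [94, 103, 96]])    # male
chi2, p, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA comparing mean social desirability scores across modes
# (scores here are simulated for the example).
rng = np.random.default_rng(0)
f2f, mail, phone = rng.normal(6.7, 2.0, size=(3, 100))
f_stat, p_anova = stats.f_oneway(f2f, mail, phone)

print(f"chi-square p = {p:.2f}, ANOVA p = {p_anova:.2f}")
```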
Reliability analyses. To assess test-retest reliability of the four CRCS tests, we compared a participant's responses to the validation survey with responses on the counterpart 2-week, 3-month, or 6-month reliability survey. We coded a participant's responses as consistent if the time interval between the survey date and the self-reported month and year was within guidelines on both the validation and reliability surveys or if no test within guidelines was reported on both surveys. If month and year were not reported, we used data from the time interval question. Missing values on either survey were counted as no test within guidelines. We excluded CRCS tests from these calculations if they were documented in the combined medical record as having been completed between surveys. In addition to the four CRCS test types, we also examined reliability for a combined measure of SIG and COL (i.e., endoscopy), defined as either test within the past 5 years.
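A minimal sketch of this consistency rule, with each survey response reduced to a within-guidelines yes/no/missing value (an assumed simplification of the survey data).

```python
def within_guidelines(response):
    """response: True (test within guidelines), False, or None (missing).
    Missing values are counted as no test within guidelines."""
    return bool(response)

def consistent(validation_response, reliability_response):
    """A pair of surveys is consistent if both indicate a test within guidelines
    or both indicate no test within guidelines."""
    return within_guidelines(validation_response) == within_guidelines(reliability_response)

print(consistent(True, True))    # True: both within guidelines
print(consistent(True, None))    # False: missing counted as no test
```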
To correct for chance agreement, we calculated κ coefficients (25). We used criteria recommended by Landis and Koch (26) to assess the adequacy of κ coefficients: values >0.80 indicate excellent agreement, between 0.61 and 0.80 substantial agreement, between 0.41 and 0.60 moderate agreement, 0.21 and 0.40 fair agreement, and <0.21 slight agreement.
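A short sketch of Cohen's κ computed from a 2 × 2 agreement table, with the Landis and Koch labels applied; the counts are illustrative.

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: survey 1, cols: survey 2)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_obs = np.trace(t) / n                                  # observed agreement
    p_exp = (t.sum(axis=0) * t.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

def landis_koch(kappa):
    """Adequacy labels used in the text (ref. 26)."""
    if kappa > 0.80:
        return "excellent"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "slight"

k = cohens_kappa([[60, 5], [8, 92]])   # illustrative counts
print(round(k, 2), landis_koch(k))
```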
Validity analyses. Validity was evaluated with four measures: concordance (raw percentage agreement), sensitivity, specificity, and the report-to-records ratio. The report-to-records ratio is the number of participants reporting a test (true positives plus false positives) divided by the number of participants with a test documented in the record (true positives plus false negatives). It is a measure of net bias in test reporting, with values >1.0 indicating overreporting and values <1.0 indicating underreporting (27). All measures were calculated for each CRCS test type and for endoscopy, as well as by mode of survey administration.
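The four measures reduce to simple functions of the true/false positive and negative counts; a sketch with made-up counts follows.

```python
def validity_measures(tp, fp, fn, tn):
    """Concordance, sensitivity, specificity, and report-to-records ratio
    from a 2 x 2 cross-classification of self-report vs. the combined record."""
    n = tp + fp + fn + tn
    concordance = (tp + tn) / n                 # raw percentage agreement
    sensitivity = tp / (tp + fn)                # documented tests correctly recalled
    specificity = tn / (tn + fp)                # absent tests correctly denied
    report_to_records = (tp + fp) / (tp + fn)   # >1.0 overreporting, <1.0 underreporting
    return concordance, sensitivity, specificity, report_to_records

# Illustrative counts only:
print([round(v, 2) for v in validity_measures(tp=113, fp=22, fn=25, tn=697)])
```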
We used test dates in the combined medical record to determine adherence to guidelines for each type (yes/no). Based on the response to the question about the month and year (or if missing, data from the interval question), respondents were classified as adherent or not for each test (yes/no). We then compared data from both sources to calculate the four validity measures. Month and year were reported for 61% of FOBTs, 60% of SIGs, 79% of COLs, and 39% of BEs.
Respondents with missing values for both questions and those who reported an unverified test from a non–KSC provider were classified as nonadherent. Missing values were 2% for FOBT, 3% for SIG, 4% for COL, and 5% for BE. Most missing data were from the mail survey.
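A sketch of this classification hierarchy (month/year first, the interval question as fallback, missing on both treated as nonadherent); the argument names are hypothetical.

```python
def self_reported_adherent(month_year_within, interval_within):
    """Each argument is True, False, or None (question not answered).
    Month/year takes precedence; the interval question is the fallback;
    missing on both questions is classified as nonadherent."""
    if month_year_within is not None:
        return month_year_within
    if interval_within is not None:
        return interval_within
    return False

print(self_reported_adherent(None, True))   # interval fallback -> adherent
print(self_reported_adherent(None, None))   # missing on both -> nonadherent
```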
In the primary analysis, persons with multiple tests within the guidelines documented in the combined medical record were included in the analyses for each test they had. A person without a given test within the guideline-specific time period was included in the analyses as a nontester (e.g., a person with an up-to-date FOBT but no COL was analyzed as a nontester in the COL analysis).
We used Tisnado et al.'s (28) criteria for evaluating the sensitivity and specificity of ambulatory care services: ≥0.9 indicates excellent agreement, ≥0.8 to <0.9 indicates good agreement, ≥0.7 to <0.8 indicates fair agreement, and <0.7 indicates poor agreement. We judged a measure to have adequate precision if the lower confidence limit was >0.70. For concordance, sensitivity, specificity, and report-to-records ratio, we calculated two-sided 95% confidence intervals.
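The precision criterion can be illustrated with a normal-approximation interval for a proportion; the study's exact interval method is not restated here, so this is only a sketch.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Point estimate and two-sided 95% CI for a proportion (normal approximation)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

p, lower, upper = proportion_ci(successes=189, n=232)   # e.g., a sensitivity estimate
print(round(p, 2), round(lower, 2), round(upper, 2), "adequate precision:", lower > 0.70)
```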
We conducted several ancillary analyses to assess effects on sensitivity or specificity estimates. First, we excluded respondents with missing data on when they had their most recent test, rather than assume that they were nonadherent (65 exclusions). A second analysis excluded respondents with gastrointestinal conditions: polyps, diverticulitis, or Crohn's disease (67 exclusions). In a third analysis, we examined whether the number of test types received within guidelines affected recall for each test by stratifying sensitivity estimates on single versus multiple-test status as recorded in the combined medical record. We hypothesized that patients with multiple CRCS tests would be more knowledgeable about screening and therefore would be more accurate (i.e., sensitivity would be higher) in their reporting than patients having only a single test. Likewise, we hypothesized that patients with no documented test in the combined medical record during the 5-year time period would be more accurate in reporting not having a particular test compared with patients who received other types of CRCS tests, but not the one being assessed. We examined this issue by stratifying specificity estimates for each test type according to nontester status. Finally, we measured adherence to CRCS guidelines using the interval question as the primary source of self-reported information and supplementing missing information with data on month and year. Almost all respondents provided a time interval: 98% of FOBTs, 95% of SIGs, 92% of COLs, and 92% of BEs.
Results
From September 2005 to December 2006, we invited 3,519 potential study candidates and were able to contact 3,028 (86%; Fig. 1). Of the 3,028 contacted, 2,167 (72%) were eligible. Of these, 1,163 (54%) refused prior to randomization. Among the refusals, 49% did not give a reason. Among the rest, reasons included lack of interest (27%), no time (15%), only being willing to do a specific survey mode (4%), and miscellaneous other reasons (5%). Of the 1,004 randomized, our overall response proportion was 40% (857 of 2,167 eligible candidates).
Figure 1. Sampling and recruitment of the study population from Kelsey-Seybold Clinic, Houston, Texas; 2005-2006.
Postrandomization validation survey completion by mode was 80% for face-to-face, 88% for mail, and 89% for telephone (P = 0.36). Although not statistically significant, there were more refusals to the face-to-face survey compared with the other two survey modes, whereas there were more nonrespondents to mail and telephone surveys compared with face-to-face surveys (Fig. 1).
There were few statistically significant differences in characteristics by survey mode (Table 1). Overall, most participants were in the younger age group, were female, white, married, had at least some college education, reported no family history of colorectal cancer, and had few visits to non–KSC providers. Respondents to the mail survey were somewhat less likely than respondents in other survey modes to report visiting a KSC provider in the past year; however, >88% in all groups had done so. Mail respondents also were more likely than the other groups to report fewer than six visits to a KSC provider within the past 5 years. Mean scores for the social desirability scale were higher for telephone respondents than for face-to-face respondents. As expected, because we stratified randomization according to prior CRCS test status, the groups did not differ on test status.
Characteristics of survey respondents by mode of survey administration, Kelsey-Seybold Clinic (KSC), Houston, Texas, 2005-2006
Characteristic | Face-to-face (n = 280), n (%) | Mail (n = 291), n (%) | Telephone (n = 286), n (%) | P* | df
Age (years)
  Mean | 58.8 | 59.4 | 59.4 | 0.37 | 2
  51-64 | 232 (82.9) | 231 (79.4) | 232 (81.1) | |
  ≥65 | 48 (17.1) | 60 (20.6) | 54 (18.9) | 0.57 | 2
Gender
  Female | 186 (66.4) | 188 (64.6) | 190 (66.4) | |
  Male | 94 (33.6) | 103 (35.4) | 96 (33.6) | 0.87 | 2
Race/ethnicity
  Non-Hispanic white | 178 (63.6) | 158 (54.3) | 173 (60.5) | |
  African American | 69 (24.6) | 84 (28.9) | 73 (25.5) | |
  Hispanic | 28 (10.0) | 31 (10.7) | 25 (8.7) | |
  Unreported† | 5 (1.8) | 18 (6.2) | 15 (5.2) | 0.48 | 4
Marital status
  Not married | 75 (26.8) | 67 (23.0) | 73 (25.5) | |
  Married/living with a partner | 205 (73.2) | 218 (74.9) | 213 (74.5) | |
  Unreported† | 0 (0.0) | 6 (2.1) | 0 (0.0) | 0.66 | 2
Education
  <High school | 4 (1.4) | 7 (2.4) | 12 (4.2) | |
  High school/GED | 28 (10.0) | 36 (12.4) | 28 (9.8) | |
  Some college | 96 (34.3) | 88 (30.2) | 83 (29.0) | |
  ≥College | 150 (53.6) | 155 (53.3) | 163 (57.0) | |
  Unreported† | 2 (0.7) | 5 (1.7) | 0 (0.0) | 0.32 | 6
Family history of colorectal cancer
  No | 244 (87.1) | 243 (83.5) | 249 (87.1) | |
  Yes | 31 (11.1) | 37 (12.7) | 29 (10.1) | |
  Unreported† | 5 (1.8) | 11 (3.8) | 8 (2.8) | 0.58 | 2
Last healthcare visit to KSC
  Within the past year | 270 (96.4) | 258 (88.7) | 272 (95.1) | |
  >1 y | 10 (3.6) | 26 (8.9) | 14 (4.9) | |
  Unreported† | 0 (0.0) | 7 (2.4) | 0 (0.0) | 0.01 | 2
No. visits to KSC in past 5 y
  0-5 | 24 (8.6) | 61 (21.0) | 28 (9.8) | |
  >5 | 256 (91.4) | 223 (76.6) | 258 (90.2) | |
  Unreported† | 0 (0.0) | 7 (2.4) | 0 (0.0) | 0.00 | 2
No. visits outside KSC in past 5 y
  None | 143 (51.1) | 156 (53.6) | 152 (53.1) | |
  1-2 | 61 (21.8) | 46 (15.8) | 65 (22.7) | |
  3-5 | 32 (11.4) | 42 (14.4) | 29 (10.1) | |
  >5 | 44 (15.7) | 40 (13.7) | 39 (13.6) | |
  Unreported† | 0 (0.0) | 7 (2.4) | 1 (0.3) | 0.32 | 6
Test status‡
  No tests | 114 (40.7) | 126 (43.3) | 113 (39.5) | |
  Single test | 107 (38.2) | 111 (38.1) | 126 (44.1) | |
  Multiple tests | 59 (21.1) | 54 (18.6) | 47 (16.4) | 0.43 | 4
Social desirability§
  Mean score (SD) | 6.64 (1.95) | 6.69 (2.11) | 7.08 (1.96) | 0.02 | 2
*Using χ2 P values and degrees of freedom (df). Results for mean age are from ANOVA (Prob > F).
†The category “unreported” was not included when calculating the χ2 statistic.
‡Based on the combined medical record.
§Results from ANOVA. In pairwise comparisons, the only significant difference was between face-to-face and telephone respondents. Omits 3 observations with a social desirability score less than 2.
Test-Retest Reliability Assessment
At the end of the validation survey, 99% of face-to-face, 98% of mail, and 99% of telephone survey participants agreed to do a reliability survey. Completion rates for the 661 reliability surveys we requested were 85% for face-to-face, 91% for mail, and 89% for telephone surveys (P = 0.11). Reasons for withdrawal between the surveys were nonresponse (n = 75), refusal (n = 9), lost to follow-up (n = 6), and illness or death (n = 4). We completed 567 reliability surveys, 185 at 2 weeks, 184 at 3 months, and 198 at 6 months.
The percentage of agreement between responses to the validation survey and the 2-week and 3-month reliability surveys was 90% or greater for all tests in all survey modes with only minor exceptions (Table 2). Except for COL and BE, agreement at 6 months was lower than the other time intervals but was never <80% for any of the tests or endoscopy.
Reliability of self-report by mode of survey administration at 2 wk, 3 mo, or 6 mo of follow-up survey (Kelsey-Seybold Clinic, Houston, TX; 2005-2006)
Test and survey mode | 2 wk: n | 2 wk: concordance (%) | 2 wk: κ* | 3 mo: n | 3 mo: concordance (%) | 3 mo: κ* | 6 mo: n | 6 mo: concordance (%) | 6 mo: κ*
Fecal occult blood test
  Overall | 165 | 89.7 | 0.74 | 160 | 90.9 | 0.74 | 150 | 85.2 | 0.58
  Face-to-face | 48 | 85.7 | 0.64 | 54 | 90.0 | 0.74 | 39 | 84.8 | 0.58
  Mail | 64 | 92.8 | 0.82 | 58 | 89.2 | 0.65 | 60 | 84.5 | 0.59
  Telephone | 53 | 89.8 | 0.76 | 48 | 94.1 | 0.82 | 51 | 86.4 | 0.56
Sigmoidoscopy (SIG)
  Overall | 168 | 90.8 | 0.77 | 170 | 92.4 | 0.81 | 170 | 86.7 | 0.67
  Face-to-face | 50 | 89.3 | 0.78 | 57 | 93.4 | 0.85 | 46 | 85.2 | 0.60
  Mail | 64 | 91.4 | 0.70 | 62 | 92.5 | 0.80 | 68 | 89.5 | 0.75
  Telephone | 54 | 91.5 | 0.79 | 51 | 91.1 | 0.76 | 56 | 84.9 | 0.62
Colonoscopy (COL)
  Overall | 177 | 95.7 | 0.90 | 162 | 93.6 | 0.85 | 171 | 92.4 | 0.83
  Face-to-face | 52 | 92.9 | 0.83 | 51 | 92.7 | 0.83 | 48 | 95.2 | 0.91
  Mail | 69 | 98.6 | 0.97 | 59 | 92.2 | 0.82 | 69 | 93.2 | 0.84
  Telephone | 56 | 94.9 | 0.89 | 52 | 96.3 | 0.90 | 56 | 88.9 | 0.74
Endoscopy (COL or SIG)
  Overall | 170 | 91.9 | 0.84 | 175 | 95.1 | 0.90 | 174 | 87.9 | 0.78
  Face-to-face | 49 | 87.5 | 0.73 | 58 | 95.1 | 0.90 | 48 | 88.9 | 0.78
  Mail | 66 | 94.3 | 0.88 | 63 | 94.0 | 0.88 | 68 | 89.5 | 0.79
  Telephone | 55 | 93.2 | 0.86 | 54 | 96.4 | 0.93 | 58 | 85.3 | 0.71
Barium enema
  Overall | 172 | 93.5 | 0.66 | 176 | 96.7 | 0.78 | 186 | 95.4 | 0.72
  Face-to-face | 50 | 90.9 | 0.68 | 57 | 95.0 | 0.77 | 49 | 92.5 | 0.46
  Mail | 67 | 95.7 | 0.64 | 66 | 100.0 | 1.00 | 72 | 97.3 | 0.87
  Telephone | 55 | 93.2 | 0.63 | 53 | 94.6 | 0.55 | 65 | 95.6 | 0.64
NOTE: Tests completed after the validation survey were excluded.
*Landis and Koch (26) define quality of interrater agreement (κ) as excellent (κ > 0.80); substantial (κ ≥ 0.61 and ≤0.80); moderate (κ ≥ 0.41 and ≤0.60); fair (κ ≥ 0.21 and ≤0.40); and slight (κ < 0.21).
At 2 weeks and 3 months, all κ coefficients met Landis and Koch's criteria (26) for excellent (>0.80) or substantial agreement (0.61-0.80; Table 2). At 6 months, except for telephone, κ coefficients for COL remained in the excellent range. For FOBT and SIG, κ coefficients decreased to the moderate (0.41-0.60) to substantial range. Within test type, except for BE, there was little variation in κ coefficients according to survey mode.
Validity Assessment
We had combined medical record data for 857 patients: 353 patients with no current CRCS test of any type, 344 with only one CRCS test within guidelines (47 FOBTs, 119 SIGs, 144 COLs, and 34 BEs), and 160 patients with more than one test within guidelines (133 with two tests, 26 with three tests, and 1 with four tests). In all, we had test data for 138 FOBTs, 219 SIGs, 232 COLs, and 103 BEs. Patterns for multiple testers showed that 91 persons had FOBT and COL, SIG, or BE, 48 had SIG and COL or BE, and 21 had COL and BE.
Concordance. Overall concordance estimates for all tests and endoscopy met the criteria for good agreement; estimates were >0.80 and the lower confidence limit exceeded 0.70 (Table 3). Although the differences were small, estimates for COL and BE were consistently higher than for FOBT or SIG. Estimates of concordance showed no substantial differences by survey mode for any CRCS test or endoscopy.
Concordance, sensitivity, specificity, and report-to-records ratio comparing self-report of adherence to colorectal cancer–screening guidelines with the combined medical record (“gold standard”) by colorectal cancer–screening test type and mode of survey administration (Kelsey-Seybold Clinic, Houston, TX; 2005-2006; n = 857)
Measure and survey mode | FOBT | SIG | COL | Endoscopy (COL or SIG) | BE
Concordance, n: estimate (95% CI)*
  Overall | 857: 0.85 (0.82-0.88) | 857: 0.85 (0.83-0.88) | 857: 0.91 (0.89-0.93) | 857: 0.85 (0.83-0.88) | 857: 0.92 (0.90-0.94)
  Face-to-face | 280: 0.84 (0.79-0.88) | 280: 0.84 (0.79-0.88) | 280: 0.91 (0.88-0.95) | 280: 0.85 (0.80-0.89) | 280: 0.89 (0.85-0.93)
  Mail | 291: 0.85 (0.81-0.90) | 291: 0.87 (0.82-0.91) | 291: 0.92 (0.89-0.95) | 291: 0.86 (0.81-0.90) | 291: 0.95 (0.92-0.97)
  Telephone | 286: 0.86 (0.82-0.91) | 286: 0.86 (0.82-0.90) | 286: 0.89 (0.85-0.93) | 286: 0.85 (0.81-0.90) | 286: 0.91 (0.88-0.95)
Sensitivity†, n: estimate (95% CI)*
  Overall | 138: 0.82 (0.75-0.89) | 219: 0.76 (0.70-0.83) | 232: 0.91 (0.87-0.94) | 420: 0.89 (0.85-0.92) | 103: 0.56 (0.44-0.69)
  Face-to-face | 45: 0.80 (0.74-0.86) | 83: 0.78 (0.72-0.85) | 71: 0.92 (0.88-0.96) | 141: 0.91 (0.85-0.96) | 39: 0.51 (0.45-0.58)
  Mail | 43: 0.81 (0.76-0.87) | 73: 0.74 (0.68-0.80) | 79: 0.94 (0.90-0.97) | 140: 0.86 (0.80-0.92) | 34: 0.68 (0.62-0.73)
  Telephone | 50: 0.84 (0.79-0.89) | 63: 0.76 (0.70-0.82) | 82: 0.87 (0.82-0.92) | 139: 0.88 (0.83-0.94) | 30: 0.50 (0.44-0.56)
Specificity†, n: estimate (95% CI)*
  Overall | 719: 0.86 (0.83-0.88) | 638: 0.89 (0.86-0.91) | 625: 0.91 (0.89-0.93) | 437: 0.82 (0.86-0.91) | 754: 0.97 (0.95-0.98)
  Face-to-face | 235: 0.84 (0.79-0.89) | 197: 0.86 (0.81-0.91) | 209: 0.91 (0.87-0.95) | 139: 0.78 (0.71-0.86) | 241: 0.95 (0.93-0.98)
  Mail | 248: 0.86 (0.81-0.91) | 218: 0.91 (0.87-0.95) | 212: 0.92 (0.88-0.95) | 151: 0.85 (0.79-0.91) | 257: 0.98 (0.96-1.00)
  Telephone | 246: 0.87 (0.82-0.91) | 223: 0.89 (0.84-0.93) | 204: 0.90 (0.86-0.94) | 147: 0.82 (0.76-0.89) | 256: 0.96 (0.94-0.99)
Report-to-records ratio (95% CI)
  Overall | 1.57 (1.43-1.72) | 1.10 (0.99-1.21) | 1.15 (1.04-1.26) | 1.07 (0.92-1.23) | 0.82 (0.66-0.97)
  Face-to-face | 1.62 (1.31-1.93) | 1.12 (0.94-1.30) | 1.18 (0.96-1.40) | 1.12 (0.90-1.34) | 0.79 (0.56-1.03)
  Mail | 1.63 (1.40-1.86) | 1.01 (0.84-1.13) | 1.16 (0.96-1.36) | 1.03 (0.83-1.23) | 0.82 (0.55-1.10)
  Telephone | 1.48 (1.24-1.72) | 1.16 (0.94-1.38) | 1.11 (0.94-1.28) | 1.07 (0.90-1.24) | 0.83 (0.51-1.16)
NOTE: Adherence to guidelines is defined as an annual FOBT or a SIG, COL, or BE within 5 y.
Concordance is the percentage of all respondents whose report (test or no test) agreed with the combined medical record. Sensitivity is the number of individuals who correctly recalled having the test divided by the number of individuals who had a test according to the combined medical record. Specificity is the number of individuals who correctly reported no test divided by the number of individuals with no test documented in the combined medical record. The report-to-records ratio is the number of participants reporting a test (true positives plus false positives) divided by the number of participants with a test documented in the combined medical record (true positives plus false negatives). It is a measure of net bias in test reporting, with values >1.0 indicating overreporting and values <1.0 indicating underreporting.
*95% confidence interval.
†Tisnado et al. (28) defined sensitivity or specificity as excellent if ≥0.90; good if ≥0.80; fair if ≥0.70; and poor if <0.70.
Sensitivity. Estimates were good for FOBT, fair for SIG, good to excellent for COL and endoscopy, and poor for BE. Only SIG and BE did not consistently meet the criterion of >0.7 for the lower confidence limit. Except for BE, sensitivity estimates showed little variation by survey mode for any of the tests (Table 3).
Specificity. Estimates were good for FOBT and endoscopy, good to excellent for SIG, and excellent for COL and BE (Table 3). Specificity was highest for BE. All estimates met the criterion of >0.7 for the lower confidence limit.
Report-to-records ratio. Overall estimates indicated overreporting for FOBT, SIG, COL, and endoscopy, and underreporting for BE (Table 3). In most cases, confidence intervals for SIG, COL, and endoscopy included 1.0. These patterns were generally consistent across survey modes for all tests.
Ancillary analyses. The effect of excluding respondents with missing values on sensitivity and specificity estimates was minimal. Almost all changes were in the mail survey mode which had the most missing values. Compared with the estimates in Table 3, sensitivity consistently increased (range, 0.01 to 0.09), and specificity consistently decreased (range, −0.01 to −0.02).
When we excluded patients with gastrointestinal conditions, estimates were minimally affected compared with those in Table 3. For FOBT, SIG, and COL, there was no consistent pattern of increase or decrease in sensitivity based on test type or survey mode (range, −0.04 to 0.03). Specificity estimates decreased minimally for FOBT (−0.01) but not for SIG or COL.
When we stratified sensitivity estimates on single versus multiple tests, estimates for FOBT were consistently lower among single testers (range, −0.03 to −0.08) and were consistently higher among multiple testers (range, 0.02 to 0.05). For SIG, COL, and BE, there was no consistent increase or decrease within survey mode by test status (range, −0.09 to 0.06). Overall estimates were unchanged or lower among single testers (range, −0.06 to 0) and slightly higher among multiple testers (range, 0.01 to 0.04).
When we stratified specificity estimates on nontester status, estimates among those with no documentation of FOBT or SIG in the combined medical record consistently increased (range, 0.03 to 0.08). Among those with documentation of a test in the combined medical record, estimates consistently decreased (range, −0.03 to −0.11). The pattern was reversed for COL; specificity decreased slightly among those with no documentation for COL (range, −0.02 to 0.00) and increased slightly among those with documentation (range, 0.00 to 0.03). For BE, there was no consistent pattern by nontester status (range, −0.03 to 0.03).
Measuring self-report using the interval question as the primary source of information and month and year as a supplemental source slightly decreased some of the sensitivity estimates for FOBT and BE (range, −0.01 to −0.02). For SIG, sensitivity estimates by survey mode improved (range, 0.02 to 0.05). Almost all specificity estimates for all test types decreased slightly (range, −0.01 to −0.03). However, these changes did not affect the rating (i.e., excellent, good, or fair) based on Tisnado et al.'s criteria (28).
Discussion
To our knowledge, this is the first study to assess the test-retest reliability of CRCS tests over defined time intervals and to examine, systematically, the effect of mode of administration on the reliability and validity of self-reports. Only one other study (29) has assessed test-retest reliability of CRCS measures. We could not compare results because those researchers used different prevalence definitions (“ever had” versus “adherence to guidelines”) and because the time interval between surveys in their study was not standardized (average, 77 days; range, 40 to 356). Our findings show that although there is some decline over time, participants show reasonably good recall of CRCS tests even after 6 months, particularly for COL.
Consistent with findings from a mammography study (15), we found no evidence that survey mode affects the validity of self-report. However, as reported in a meta-analysis in this issue of CEBP, Rauscher et al. (30) found that interviews conducted face-to-face tended to be associated with reduced self-report accuracy compared with telephone or self-administered surveys for cancer screening behaviors including mammography, Pap testing, FOBT, and endoscopy (SIG or COL).
Our validity estimates compare favorably with other studies. Rauscher et al. (30) report summary effect estimates of sensitivity and specificity for eight studies of FOBT (14, 31-37) and for four studies of SIG or COL (SIG and COL were combined in the meta-analysis; refs. 14, 32, 35, 37). For FOBT, the summary sensitivity estimate was 82%, the same as our overall estimate, and specificity was 78% compared with our 86%. For endoscopy, sensitivity was 79% compared with 89% in our study; for specificity, it was 90% compared with our 82%. Our sensitivity and specificity estimates for FOBT and BE were also similar to those reported by Partin et al. (38) who used the NCI CRCS questionnaire in a low-income veteran population; however, our report-to-records ratio for BE showed statistically significant underreporting whereas Partin et al. found statistically significant overreporting. Compared with Partin et al.'s results, we observed higher sensitivity and specificity for SIG and higher specificity for COL but similar sensitivity.
Applying Tisnado et al.'s (28) criteria, our sensitivity estimates were mostly excellent for COL, good for FOBT, and fair for SIG. Differences in accurate recall may be due to test characteristics or to respondents' familiarity with the tests. Qualitative research has consistently found that patients are confused about the distinction between SIG and COL (7-12). COL may be more memorable because it has received more attention in the media and because patients are sedated, need to arrange for transportation, and need to take a day off from work. Our high sensitivity estimates for COL support the view that patients accurately recall this experience, whereas the lower sensitivity estimates for SIG may be due to patients confusing it with COL (39). To see if patients who had a single documented SIG or COL were more likely to mislabel SIG as COL than vice versa, we compared the frequency of documented SIGs that were self-reported as COL with the frequency of documented COLs that were self-reported as SIG. Few confused these tests. Six of 57 (10%) self-reported COLs were documented SIGs, whereas 4 of 73 (5%) self-reported SIGs were documented COLs. In contrast, Partin et al. (38) found that 55% of false-positive reports for either endoscopy procedure had documentation for the other procedure. Future studies should examine differences in patterns of misreporting test types to understand the reasons for confusion.
A possible reason for FOBT false-positive reports may be recall bias regarding time period. Respondents may recall FOBT as occurring more recently than it did, often referred to as forward telescoping (40, 41). We examined the distribution of FOBT self-reports, measured both as month and year and within a 1-year interval, against test dates in the combined medical record that were within 12 months of the survey date. Approximately 75% accurately reported an FOBT within the past year in response to each question; however, only 61% provided month and year compared with 98% for the interval question. Given respondents' preference for the interval question and the comparable validity estimates found in our ancillary analysis, the interval question may be the preferred way to assess the recency of FOBT, at least in surveys requiring retrospective recall over long time intervals.
Our ancillary analyses showed that sensitivity and specificity estimates were minimally affected when we excluded respondents with missing values; however, the effect of these exclusions was to consistently increase sensitivity and decrease specificity. This could be a problem in studies that have a high percentage of uncertain or missing responses. Although we originally planned to include only patients who had CRCS tests for screening, data on the reason for the test were not consistently available in the KSC database; however, excluding patients with gastrointestinal conditions only minimally affected our estimates. Our stratified ancillary analyses confirmed our conjectures that respondents with multiple tests were more accurate in reporting test types than single testers and that those with no documented tests were better at reporting not having a particular test than those with a CRCS test (other than the one being asked about).
We found that respondents to the telephone survey scored higher on the Marlowe-Crowne Social Desirability Scale than mail or face-to-face respondents, although only the difference between telephone and face-to-face respondents was statistically significant (means were 7.08, 6.69, and 6.64, respectively). Substantively, these differences are small, and the report-to-records ratios did not suggest that overreporting due to social desirability response tendency was more likely to occur in the telephone survey compared with the other survey modes. Likewise, although mail respondents reported less healthcare utilization compared with telephone and face-to-face respondents (they were less likely to report a healthcare visit within the past year and were more likely to report fewer than six visits within the past 5 years), there was no pattern in the validity estimates to suggest that these factors differentially influenced reporting of CRCS behaviors.
The 40% participation rate among eligible patients may reduce the generalizability of our results. During recruitment, men and women with no documented record of CRCS were more likely to refuse participation in our study, a finding similar to that reported by others (38, 42-45). Although our sampling strategy ensured an adequate number of participants with no prior CRCS, it is unclear whether participants differed from those without prior CRCS who refused, in ways that might have affected our reliability and validity estimates. Although it may have limited generalizability, we chose a study setting with a stable patient population, strong data systems, and the on-site provision of endoscopy in order to have the most complete ascertainment of our gold standard measure. Only 7% of our study participants reported having CRCS outside KSC, and we attempted to contact those providers to verify self-reports of patients who gave us permission to do so.
Valid and reliable self-report measures are a critical component of cancer prevention and control research. Acceptable values for sensitivity and specificity vary depending on how the measures will be used. A measure with low sensitivity may be acceptable if the purpose is to evaluate the efficacy of a behavior change intervention, but if the purpose is to identify those who need screening, it may be desirable to have high sensitivity, and sacrifice specificity, in order not to miss unscreened persons.
Conclusion
This study provides empirical support for the proposition that the reliability and validity of the NCI CRCS questionnaire is comparable across the three modes of survey administration. Researchers should base their selection of survey mode on their research objectives and on characteristics of the target population such as literacy. Future research should continue to investigate other potential sources of error and bias in self-report as was done by Beebe et al. (46) in this issue of CEBP. Finally, as new screening technologies with different test characteristics are introduced, they will need to be validated.
Grant support: PRC SIP 19-04 U48 DP000057 from the Centers for Disease Control and Prevention (S.W. Vernon, P.M. Diamond, M.E. Fernandez, A. Greisinger, and R.W. Vojvodic) and by RO1 CA97263 (S.W. Vernon).