Experience from two decades of behavioral, health services, and epidemiologic research on breast and cervical cancer screening makes evident the need for investigators to use uniform definitions and measures of cancer screening behaviors (1). The lack of consistency in defining and measuring cancer screening behaviors limits the ability to compare prevalence estimates, patterns of association between screening and independent variables, and intervention effects across studies (2-6). For example, a recent review of the literature on the prevalence of consecutive, on-schedule mammography screening (4) found inconsistencies not only in the terminology used to describe consecutive, on-schedule mammography (e.g., “repeat,” “regular,” “adherence,” “compliance,” “annual,” “rescreen,” and “maintenance”) but also in the measures of on-schedule mammography. Measures differed in terms of the interval defining “sequential” (e.g., 12, 15, or 24 months apart), how many sequential mammograms a woman must have within a given time period to be considered on schedule, and the pattern of mammograms within a given time period (e.g., more than one in a lifetime, two in the past 6 years, or an age-appropriate number).
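Inconsistent definitions of "on schedule" can change who is classified as adherent. The sketch below assumes a hypothetical rule: at least two mammograms, with every pair of consecutive exams no more than a chosen number of months apart. It uses a made-up two-exam history. Under this rule, the same woman counts as on schedule with a 15- or 24-month interval but not with a 12-month one. The function names are illustrative only.

```python
from datetime import date
from typing import List


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_on_schedule(exam_dates: List[date], max_interval_months: int,
                   min_exams: int = 2) -> bool:
    """Hypothetical rule: at least `min_exams` exams, and every pair of
    consecutive exams no more than `max_interval_months` apart."""
    ordered = sorted(exam_dates)
    if len(ordered) < min_exams:
        return False
    return all(
        months_between(a, b) <= max_interval_months
        for a, b in zip(ordered, ordered[1:])
    )


# Made-up history: two mammograms 14 months apart.
history = [date(2001, 3, 1), date(2002, 5, 1)]
for interval in (12, 15, 24):
    print(interval, is_on_schedule(history, interval))
# prints:
# 12 False
# 15 True
# 24 True
```

A study that also fixed the number of exams or the observation window (e.g., two in the past 6 years) would need further parameters. Each added parameter is another point on which studies can differ.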
This report describes the development of a core set of self-report measures of colorectal cancer (CRC) screening behaviors for use in behavioral, health services, and epidemiologic research. Such an effort is timely because descriptive and intervention research on CRC screening has recently become a focus of attention (7). Moreover, a recent report from the Institute of Medicine (8) calls for the evaluation of programs to address progress in cancer prevention and early detection. Valid and reliable measures are needed to assess progress.
CRC is the second leading cause of cancer deaths in the United States, with an estimated 146,940 new cases and 56,730 deaths in 2004 (9). From 1990 to 1997, the average annual incidence rates of CRC were 52.7 and 36.6 per 100,000 in White men and women, respectively, and 58.3 and 45.2 per 100,000 in Black men and women (10). Regular screening with fecal occult blood test (FOBT) or with sigmoidoscopy facilitates earlier detection of CRC and lowers mortality (11-15). The early detection and removal of precancerous polyps may contribute to decreased incidence of CRC (16-18). Only 40% of CRCs are diagnosed at an early stage, when the 5-year survival rate exceeds 90% (19, 20). Collectively, these data support the benefits of screening to reduce morbidity and mortality from CRC (7). However, the use of CRC screening tests is low and has not increased substantially in recent years (21, 22). Behavioral Risk Factor Surveillance System data for persons over age 50 show that in 1999 only 20.6% of those surveyed had had a FOBT in the preceding year, compared with 19.6% in 1997, and only 33.6% had had a sigmoidoscopy in the preceding 5 years, compared with 30.3% in 1997 (21, 23). The most recent Behavioral Risk Factor Surveillance System data show little difference in FOBT use in the past year between 2001 and 2002 (23.9% and 21.8%). However, the prevalence of self-reported recent sigmoidoscopy (in the past 5 years) appears to be increasing (38.9% and 40.5% for 2001 and 2002, respectively) (24). Data from the National Health Interview Survey (NHIS) for 2000 show that 41% of men and 37.5% of women age 50 and older reported having either a FOBT within the past year or endoscopy within the past 5 years (22).
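The combined NHIS 2000 indicator (FOBT within the past year or endoscopy within the past 5 years) becomes a simple rule once dates are known. The minimal sketch below rests on two assumptions. First, exact dates of the most recent tests are available. Second, "within the past year" means on or after the same calendar date 1 year before the interview. This is not the NHIS's actual coding, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def within_years(test_date: Optional[date], interview: date, years: int) -> bool:
    """True if the test was done no earlier than `years` years before the interview."""
    if test_date is None or test_date > interview:
        return False
    try:
        cutoff = interview.replace(year=interview.year - years)
    except ValueError:  # interview on Feb 29 and the target year is not a leap year
        cutoff = interview.replace(year=interview.year - years, day=28)
    return test_date >= cutoff


def fobt_or_endoscopy_up_to_date(last_fobt: Optional[date],
                                 last_endoscopy: Optional[date],
                                 interview: date) -> bool:
    """FOBT within the past year OR endoscopy within the past 5 years."""
    return (within_years(last_fobt, interview, 1)
            or within_years(last_endoscopy, interview, 5))


interview = date(2000, 6, 15)
print(fobt_or_endoscopy_up_to_date(date(1998, 1, 10), date(1996, 7, 1), interview))  # True
print(fobt_or_endoscopy_up_to_date(None, date(1994, 1, 1), interview))               # False
```

In practice, respondents often cannot supply exact dates. That is one of the measurement problems discussed in this report.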
Measurement issues related to CRC screening are more challenging than those related to breast and cervical cancer screening, because there are multiple CRC screening test options that are sometimes recommended in combination, the interval for on-schedule screening differs for each test, and the technology is changing rapidly. We focus on self-report measures because they frequently are used for outcome measurement in descriptive and intervention studies (e.g., refs. 5, 6). Self-report also is the basis for national data to estimate prevalence and monitor time trends in cancer screening behaviors (22, 25).
Development of Self-Report Measures of CRC Screening
National Cancer Institute–Sponsored Meeting
The Division of Cancer Control and Population Sciences of the National Cancer Institute (NCI) sponsored a 1.5-day meeting in December 1999 to discuss and address problems related to defining and measuring CRC screening behaviors. Invitees were researchers with expertise in cancer screening who worked at the NCI, the American Cancer Society, or the Centers for Disease Control and Prevention or were funded by one of these organizations to conduct CRC screening research. (See Appendix 1 for a list of meeting participants.) During the meeting, a small work group was formed to develop a core set of CRC screening measures and standardized descriptions of the tests. Over the next 4 years, the group worked together by e-mail and conference call.
To identify variation and specific problems with CRC screening measures prior to the December 1999 meeting, investigators answered a series of questions about their measures, including what tests were offered in the study protocol; recommended time interval between tests; whether tests were described to study participants; mode of survey administration (e.g., in person, telephone, or mail); whether tests were done for screening or diagnosis, including follow-up to another CRC test; and whether home-based stool blood testing was distinguished from office-based tests done by digital rectal examination.
These responses and meeting discussions identified several key challenges. Investigators reported that many study participants in their CRC screening studies were unaware of CRC and CRC screening and were unfamiliar with the tests, their purpose, and how they were done. Sigmoidoscopy and colonoscopy were frequently confused. Other problems included distinguishing between home-based and office-based stool blood testing and between screening and diagnostic testing. Study participants also had problems recalling when and why the tests were done.
Expert evaluation was the initial approach used to develop and pretest questions for measuring CRC screening behaviors (26, 27). The work group was charged with evaluating measures from CRC screening intervention studies and national surveys, identifying potential problems, and developing a core set of measures for further evaluation using cognitive interviewing techniques (26-30). The group agreed to develop measures for stool blood testing, sigmoidoscopy, colonoscopy, and barium enema because these tests are recommended by professional organizations (31-33). The group also assessed the feasibility of using the same measures for mail, telephone, and face-to-face surveys. Extant survey measures and CRC test descriptions were provided by investigators who were funded as of January 1, 2000 by the NCI, the American Cancer Society, or the Centers for Disease Control and Prevention to conduct CRC screening research. Measures from the NHIS 2000 Cancer Control Module and the 1997 Behavioral Risk Factor Surveillance System also were included.
A database of measures, obtained from investigators' and national surveys, was compiled. Questions were grouped by content and by survey administration mode so similarities and differences could be easily compared. Work group members were asked to identify questions they thought should constitute a core set of measures, their preferred way to measure each CRC screening test, and the reasons for their choice. They also were asked to specify the characteristics of each CRC test they thought were essential to include in a test description.
Although there was considerable redundancy in how questions were asked across studies, there were noteworthy differences as well. For example, the stool blood test was variously called a “stool blood,” “blood stool,” “stool guaiac,” “hemoccult,” or “fecal occult blood” test, and it was inconsistently noted whether the test was done at home. Likewise, there were differences across studies in the terminology used to describe sigmoidoscopy (e.g., “flexible sigmoidoscopy,” “sigmoidoscopy,” and “sigmoidoscopic procedure”). Other differences across studies included the interval for test completion and the response categories used to determine when the most recent test was done. For example, some studies asked for the month and year; others used specified time intervals (e.g., less than 1 year or 1 to 2 years).
Test descriptions also varied considerably. They used different terms to describe the stool blood test and varied in describing dietary restrictions, in mentioning the number of cards or stool samples, and in specifying that the test should be done at home. Likewise, sigmoidoscopy and colonoscopy terminology was inconsistent (e.g., “colon” or “bowel”), as was the emphasis on different test characteristics such as the preparation, the part of the colon examined, the need for dietary restrictions, and whether sedation was used.
Based on comments from the work group, a tentative best set of measures was drafted and areas of disagreement were identified. An annotated version was sent to the work group for further comment. The following core questions were agreed on for all CRC screening tests: ever heard of the test; ever had the test; date of the most recent test; reason for the most recent test; date of the next most recent test; reason for the next most recent test; and number of tests in a defined time period. These questions covered all definitions of screening (e.g., initial, ever, recent, periodic, and on-schedule). Revisions were made and remaining areas of disagreement were identified. This version was reviewed by a psychologist experienced in using cognitive laboratory research methods to develop questions for national surveys, including the NHIS. Several gastroenterologists, a primary care physician, and a radiologist were asked to review the test definitions for distinguishing characteristics that would be recognized by persons who had had the tests. Questions were compiled for discussion and resolution in a conference call with the work group, after which the measures were revised. Cognitive testing was conducted on that version of the measures. After cognitive testing, the work group again revised and reviewed the measures.
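To show how the core items can support several definitions of screening, the sketch below derives "recent" and "on-schedule" screening for a single test. A test counts only when the reported reason indicates a routine checkup. All of the following are hypothetical and do not represent the work group's coding: the class and function names, the reason label "routine checkup", and the rule that two consecutive screening tests must fall within one recommended interval. The "number of tests in a defined time period" item is omitted. The "ever heard of" answer is recorded but not used in these derivations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical response category meaning "done as part of a routine checkup".
SCREENING_REASON = "routine checkup"


@dataclass
class CoreResponses:
    """One respondent's answers to the core items for a single CRC test."""
    ever_heard: bool
    ever_had: bool
    most_recent: Optional[date] = None
    most_recent_reason: Optional[str] = None
    next_most_recent: Optional[date] = None
    next_most_recent_reason: Optional[str] = None


def months_between(earlier: date, later: date) -> int:
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def recent_screening(r: CoreResponses, interview: date, interval_months: int) -> bool:
    """Most recent test was done for screening and falls within the interval."""
    return (r.ever_had
            and r.most_recent is not None
            and r.most_recent_reason == SCREENING_REASON
            and months_between(r.most_recent, interview) <= interval_months)


def on_schedule(r: CoreResponses, interview: date, interval_months: int) -> bool:
    """Recent screening plus a prior screening test within one interval of it."""
    return (recent_screening(r, interview, interval_months)
            and r.next_most_recent is not None
            and r.next_most_recent_reason == SCREENING_REASON
            and months_between(r.next_most_recent, r.most_recent) <= interval_months)


sig = CoreResponses(True, True, date(2001, 4, 1), "routine checkup",
                    date(1997, 2, 1), "routine checkup")
print(recent_screening(sig, date(2002, 1, 15), 60), on_schedule(sig, date(2002, 1, 15), 60))
# True True
```

If the earlier test had been reported as follow-up of a problem, the same dates would yield recent screening but not on-schedule screening. This is why the reason items matter.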
Cognitive interviewing techniques (26-30) were used to evaluate questions developed by the work group. These techniques typically are applied in a cognitive laboratory setting with 20 to 30 persons (34) to reveal the cognitive processes people use to answer survey questions. These processes include comprehending or interpreting the question, retrieving information from memory, forming a judgment about how to respond, and response editing or deciding how much information to reveal (refs. 27, 35-37; http://appliedresearch.cancer.gov/areas/cognitive/interview.pdf). This methodology frequently is used to pretest survey questions, including those in the NHIS (34), and has contributed to improved reliability and validity of self-reports of retrospective information by identifying and reducing sources of response error that may go unnoticed in field tests of survey instruments (38-40).
Because awareness of CRC screening is low (41-44) and respondents had difficulty recognizing and understanding CRC screening tests, we focused on issues related to comprehending or interpreting the questions and, to a lesser extent, on strategies respondents used to recall information. Cognitive testing was done on an interviewer-administered version of the questions. Specific cognitive interviewing techniques consisted of verbal report and included the “think-aloud” method and structured and spontaneous verbal probing (26-28, 30, 45). Interviewers used immediate retrospective probing in which probe questions were asked during the interview. Such questions included asking interviewees to elaborate on their answers, explain their understanding of key terms, paraphrase survey questions, and state their level of confidence in answering some of the questions. Testing was done in conjunction with the development of the NCI's Health Information National Trends Survey (http://dccps.nci.nih.gov/hcirb/hints.html).
In-person interviews were conducted by two doctoral-level psychologists trained in cognitive interviewing techniques. Interviewees were recruited through newspaper advertisements; 18 interviews were conducted in two rounds, over a total of 4 days, in January and May 2002 at Westat's focus group facility in Rockville, Maryland. Each interview lasted about an hour and participants were paid $50. Several coauthors (S.V., H.M., and C.K.) observed the interviews through a one-way mirror and occasionally entered the room at the end of an interview to ask follow-up questions. The Health Information National Trends Survey included persons over age 17; therefore, not all interviewees were in the age group for which CRC screening is recommended (31-33). Thirteen persons ages 50 years and older were recruited; 10 were female, 1 was African American, 1 was Native American, 2 were Asians whose second language was English, and the rest were White. Three were college graduates and three had some college education. Eligibility was determined by telephone. We attempted to recruit at least four persons who had not been screened for CRC by asking during the eligibility telephone call, "Have you ever been screened for colon cancer?" Four persons answered "no" to this question. However, during the cognitive interviews, one of them reported having had all four tests (i.e., stool blood testing, sigmoidoscopy, colonoscopy, and barium enema), and two reported having had stool blood tests. When these participants were asked why they had said that they had not been screened for CRC, it was evident that they did not understand the meaning of the word "screening." They considered these tests part of a routine health examination. Some participants were not even aware that these tests were done to check for cancer.
Comprehension was evaluated by assessing whether interviewees understood the terminology, including test descriptions, and whether they could distinguish among the tests. Whether or not interviewees had had CRC screening tests, they had difficulty recalling and pronouncing the names of the tests. When asked to name any CRC screening tests, several persons said they knew of tests to detect CRC but could not think of specific names. Some gave imprecise responses such as "endoscopic" or "wipe test." There was considerable variability in which terms were familiar to interviewees. For example, some had not heard of "barium enema" but were familiar with "lower gastrointestinal series." Likewise, about half were familiar with the phrase "stool blood test" and half with "FOBT." Interviewees' understanding of the descriptions of sigmoidoscopy and colonoscopy, and of the differences between them, was evaluated by reading the descriptions aloud and asking participants to repeat them in their own words. All interviewees were able to repeat major characteristics of both tests.
Interviewees had difficulty retrieving or recalling information about the specific dates of their screening tests, although most could report the year. Probing by the interviewer increased respondents' ability to report when a test had been done. Recalling dates for the stool blood test was more difficult than for other CRC tests.
Revision of the Measures
The measures were revised to address comprehension problems (e.g., not understanding the terminology and variability in understanding and interpreting key terms and phrases) by simplifying the language and using words that were likely to be familiar to respondents, even if the terminology lacked precision. For example, we used “checkup” instead of “screening” and “test” instead of “procedure.” If a test was known by several names (e.g., stool blood test and FOBT), we used both names in the initial description and in the first question of a series to increase understanding. As described above, problems related to comprehension (e.g., not understanding the meaning of the word “screening” and thus failing to report CRC tests done in the context of a regular checkup) were associated with false-negative reports. Test descriptions were revised to emphasize only the characteristics salient to respondents. An annotated table of the measures that describes in more detail the rationale for decisions made by the work group about question wording can be found at http://cancercontrol.cancer.gov/ACSRB/.
To address interviewees' difficulties in remembering or recognizing the names of CRC screening tests and, with few exceptions, their inability to describe the purpose or characteristics of a test before a description was read, we recommend that test descriptions be included as part of a study protocol for assessing screening. Because awareness of CRC and CRC screening is still low in the general population (41-44), we also recommend including a question on “ever heard of” the test to reduce the possibility of response error (26). In cognitive testing of questions about radon, Willis et al. (26) found that interviewees answered questions containing terms they did not understand or had never heard of. Asking whether a respondent has ever heard of each CRC screening test may decrease the chance that questions about a test will be answered incorrectly.
Our observation that confusion is particularly likely to occur in distinguishing sigmoidoscopy and colonoscopy is consistent with a previous validation study (46) and with data from the NHIS 2000 Cancer Control Module. The NHIS 2000 was the first national population survey to ask about types of endoscopic tests (i.e., proctoscopy, sigmoidoscopy, and colonoscopy); unexpectedly, the prevalence of screening colonoscopy was slightly higher than that for screening sigmoidoscopy (14.7% and 12.7%, respectively, for persons 50 years and older). NHIS interviewers did not read test descriptions unless a respondent requested it, and the interview did not include questions about whether respondents had ever heard of the tests. These circumstances could have led to inaccurate self-reports. Baier et al. (46) found that adding a phrase to the descriptions of the two endoscopic screening tests that stated sedation was usually given for colonoscopy but not for sigmoidoscopy improved the validity of self-reports for both tests.
Consistent with the cognitive research literature on information recall (38, 39, 47), respondents had difficulty remembering exact dates; however, the amount of difficulty varied by CRC test type. Because we do not know what question wording will elicit the most valid response about when a test was done, we recommend asking several questions to maximize the opportunity for respondents to accurately recall and report information. The approach to asking for this information may need to vary by mode of survey administration. For example, the approach taken by the NHIS, which is administered face to face by a trained interviewer, is hierarchical. Interviewers first ask for the month and year of the most recent test. Respondents who cannot recall the date are asked for the number of “days, weeks, months, years” since the last test and, if unable to reply, are then asked to estimate the time using categories (e.g., within the past year). Although this type of probing is not feasible for mail surveys, respondents could first be asked a more general question to orient them (e.g., using categories) and then be asked for the month and year.
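The hierarchical NHIS approach described above amounts to a fallback rule: use the most precise answer the respondent can give. The minimal sketch below is not the NHIS's coding. The following are all hypothetical: the class and function names, the category labels, the representative month value assigned to each closed category, and the approximate day and week conversions. A value of None at a level means the respondent could not answer that question, so the next level is used.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Approximate conversion of an elapsed-time answer to months.
UNIT_TO_MONTHS = {"days": 1 / 30.4, "weeks": 7 / 30.4, "months": 1.0, "years": 12.0}

# Hypothetical categories with an assumed representative value in months;
# the open-ended category has no midpoint and maps to None.
CATEGORY_MONTHS = {
    "within the past year": 6.0,
    "1 to 2 years ago": 18.0,
    "2 to 3 years ago": 30.0,
    "3 to 5 years ago": 48.0,
    "more than 5 years ago": None,
}


@dataclass
class DateAnswers:
    """Answers at each level of the hierarchy; None means 'don't know'."""
    month_year: Optional[Tuple[int, int]] = None  # (month, year)
    elapsed: Optional[Tuple[float, str]] = None   # (amount, unit), e.g. (2, "years")
    category: Optional[str] = None                # e.g. "within the past year"


def months_since_test(ans: DateAnswers, interview: date) -> Tuple[str, Optional[float]]:
    """Return the level used and the estimated months since the most recent test."""
    if ans.month_year is not None:
        month, year = ans.month_year
        return "month/year", float((interview.year - year) * 12 + (interview.month - month))
    if ans.elapsed is not None:
        amount, unit = ans.elapsed
        return "elapsed time", amount * UNIT_TO_MONTHS[unit]
    if ans.category is not None:
        return "category", CATEGORY_MONTHS.get(ans.category)
    return "missing", None


interview = date(2002, 1, 15)
print(months_since_test(DateAnswers(month_year=(3, 2001)), interview))        # ('month/year', 10.0)
print(months_since_test(DateAnswers(elapsed=(2, "years")), interview))        # ('elapsed time', 24.0)
print(months_since_test(DateAnswers(category="1 to 2 years ago"), interview))  # ('category', 18.0)
```

A mail questionnaire cannot probe, so it could collect the same answers in reverse order, with the category question first as an orienting question. The analysis would still prefer the most precise answer supplied.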
Appendix 2 shows the measures we propose. An annotated table of the measures can be found at http://cancercontrol.cancer.gov/ACSRB/. A subset of the measures was included on the NCI's Health Information National Trends Survey (http://dccps.nci.nih.gov/hcirb/hints.html). Researchers are encouraged to use these measures; to evaluate their reliability and validity in behavioral, health services, and epidemiologic studies; and to publish their findings.
This is the first systematic attempt of which we are aware to develop standardized self-report measures for any cancer screening behavior. To date, research in the behavioral sciences relevant to understanding and intervening to reduce mortality and morbidity from chronic diseases, including cancer, has focused largely on identifying correlates of a behavior and testing interventions. Less attention has been paid to developing reliable and valid measures of behaviors (48, 49). As emphasized by Gordis (50), at a minimum, the use of similar measures will enhance comparability among studies.
Although the measures proposed here have face validity, they need to be evaluated in a variety of field settings and diverse populations (51). They also need to be assessed for different modes of administration. While we intend these questions to be adaptable to telephone interviews and self-administration, they were tested only in face-to-face, interviewer-administered mode. Differences in question wording (e.g., ref. 52), response categories, and the order of questions within a survey also have implications for accuracy of self-report and should be systematically evaluated.
Methodologic studies are needed that use a variety of methods, including cognitive interviews, focus groups, small-scale laboratory experiments (e.g., assessing different ways of asking a question or mode of survey administration), and reliability and validity studies of self-reports in relation to different criterion or gold standards. Such studies could be nested in surveys (53), including national surveys such as the NHIS and the Behavioral Risk Factor Surveillance System, conducted in conjunction with behavioral interventions, or funded through NIH mechanisms such as small grants (R03) or exploratory and developmental grants (R21). The feasibility of conducting validation studies of CRC screening behaviors in medical care organizations under the Health Plan Employer Data and Information Set increased in May 2003 when the National Committee for Quality Assurance Committee on Performance Measurement approved a CRC screening measure in health plans (http://www.ncqa.org/communications/publications/nedispub.htm).
To date, very few published methodologic studies use cognitive laboratory research methods to assess measures of cancer screening behaviors. Most of what we have learned through cognitive testing of extant measures relates to the first cognitive task or process: comprehending or interpreting a question (e.g., refs. 38, 47). In addition to extending studies of comprehension to field settings that include diverse populations, more work needs to be done on the cognitive processes used to recall information and estimate frequency. Cognitive strategies used to answer questions about the frequency of a behavior include enumeration (recall of individual events) and estimation based on schemas (i.e., a pattern of events in which a specific behavior is embedded), among others (54, 55). Sudman et al. (47) and Warnecke et al. (38) found that women were more likely to use schemas, such as an annual checkup, to retrieve information about dates of past mammography and Papanicolaou smears. That is, they remembered the date of their annual examination and assumed that they had had screening tests in conjunction with that health care visit. In contrast, for stool blood testing, which was less common, women more often used counting, that is, recall of individual occurrences (47). Because most women in that study population had annual examinations, these findings may not generalize to populations in which regular checkups are less common, and research is needed to confirm and extend them. In particular, more research is needed on the strategies respondents use to recall when a particular type of screening test occurred. It is also unknown whether recall strategies differ by type of test, testing frequency, intervals between tests, and setting or context (e.g., an annual examination versus a referral to a specialist; refs. 39, 54, 55). For example, colonoscopy may be remembered as a discrete event because it is usually done on referral, occurs in a different setting, requires special preparation, and usually involves sedation.
Depending on the context in which sigmoidoscopy is done (e.g., by a primary care physician or referral to a gastroenterologist), it may be remembered either as part of a schema or as a discrete event. These possibilities could be evaluated using cognitive research methods.
Data also are needed on the cognitive processes respondents use to form judgments and edit their responses to questions about screening. For example, we know little about whether factors such as the sensitivity of the information requested, the perceived threat level of the questions, or the social desirability of the behavior cause interviewees to modify or edit their responses. In our cognitive interviews, respondents were very willing to discuss feelings, perceptions, and experiences related to performing or undergoing CRC tests. However, most interviewees had had one or more CRC tests, so this willingness may not reflect that of respondents who have never been tested.
As emphasized by Willis et al. (26), evaluating the validity of cognitive interviewing and of the questions developed through this process depends on the existence of criterion measures of survey question quality. The criterion sources most often used to evaluate self-reports of cancer screening are medical records and laboratory or administrative databases. Over the past 15 years, several studies have assessed the reliability and validity of self-report measures of mammography and Papanicolaou testing against the medical record, but measures of CRC screening behaviors have received scant attention (49). A summary measure of bias due to overreporting found that both stool blood testing and sigmoidoscopy were overreported, but overreporting was greater for sigmoidoscopy (49). Every criterion source (e.g., medical records) has limitations (56) that may have different effects on the evaluation of self-report accuracy.
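Validation against a criterion source is commonly summarized from a 2×2 table of self-report versus record. The sketch below computes sensitivity, specificity, and the report-to-records ratio, which is one widely used summary of net overreporting. We do not assert that this is the exact summary measure used in (49). The class name and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ValidationTable:
    """Self-report cross-tabulated against a criterion source (e.g., medical records)."""
    both_yes: int     # self-report yes, record yes
    report_only: int  # self-report yes, record no (possible overreport)
    record_only: int  # self-report no, record yes (possible underreport)
    both_no: int      # self-report no, record no

    def sensitivity(self) -> float:
        return self.both_yes / (self.both_yes + self.record_only)

    def specificity(self) -> float:
        return self.both_no / (self.both_no + self.report_only)

    def report_to_records_ratio(self) -> float:
        """Self-reported prevalence / record-based prevalence; > 1 means net overreporting."""
        return (self.both_yes + self.report_only) / (self.both_yes + self.record_only)


# Made-up counts for illustration only.
t = ValidationTable(both_yes=90, report_only=30, record_only=10, both_no=70)
print(t.sensitivity(), t.specificity(), t.report_to_records_ratio())
# 0.9 0.7 1.2
```

A ratio near 1 does not by itself indicate accuracy. With counts of 80, 20, 20, and 80, over- and underreports cancel, giving a ratio of 1.0 even though sensitivity and specificity are both 0.8. Because the criterion source has its own limitations (56), some discordant cells may reflect record errors rather than reporting errors.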
The effort reported here is an example of a broader concern about the quality of survey data (50) and, specifically, measurement error (57). To improve the accuracy of self-report and other measures, we need to understand and correct sources of measurement error, both random and systematic. As Gordis (50) pointed out, survey instruments rarely are subjected to peer review or published with a manuscript. This circumstance is changing as a result of increasing use of Web technologies and new federal legislation on data sharing (http://grants1.nih.gov/grants/policy/data_sharing/); however, most efforts to standardize measurement continue to be voluntary (J. Lipscomb, personal communication).
In light of recent legislation that limits researchers' access to medical records (Health Insurance Portability and Accountability Act of 1996) and the cost and effort required to collect such data (58), the use of and reliance on self-report data in behavioral, health services, and epidemiologic research are likely to increase. Therefore, it is even more important to maximize the opportunity to obtain reliable and valid self-report data so we can evaluate the effectiveness of behavioral interventions, synthesize data across studies, and monitor progress and trends in cancer screening adherence.
Appendix 1. Meeting Attendees
Dennis J. Ahnen, M.D.*;
*Members of the work group.
Helen Meissner, Ph.D., and Marion R. Nadel, Ph.D., M.P.H., did not attend the meeting but were later added to the work group.
Appendix 2. Core CRC Screening Questions
Grant support: At the time this work was done, B.K. Rimer was at the Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland.