Abstract
Background: Accurate measurement of people's risk perceptions is important for numerous bodies of research and in clinical practice, but there is no consensus about the best measure.
Objective: This study evaluated three measures of women's breast cancer risk perception by assessing their psychometric and test characteristics.
Design: A cross-sectional mailed survey to women from a primary care population asked participants to rate their chance of developing breast cancer in their lifetime on a 0% to 100% numerical scale and a verbal scale with five descriptive categories, and to compare their risk to others (seven categories). Six hundred three of 956 women returned the survey (63.1%), and we analyzed surveys from the 566 women without a self-reported personal history of breast or ovarian cancer.
Results: Scores on the numeric, verbal, and comparative measures were correlated with each other (r > 0.50), worry (r > 0.51), the Gail estimate (r > 0.26), and family history (r > 0.25). The numerical scale had the strongest correlation with annual mammogram (r = 0.19), and its correlation with the Gail estimate was unassociated with participants' sociodemographics. The numerical and comparative measures had the highest sensitivity (0.89-0.90) and specificity (0.99) for identifying women with very high risk perception. The numerical and comparative scale also did well in identifying women with very low risk perception, although the numerical scale had the highest specificity (0.96), whereas the comparative scale had the highest sensitivity (0.89).
Conclusion: Different measures of women's perceptions about breast cancer risk have different strengths and weaknesses. Although the numerical measure did best overall, the optimal measure depends on the goals of the measure (i.e., avoidance of false positives or false negatives). (Cancer Epidemiol Biomarkers Prev 2006;15(10):1893–8)
Introduction
Research in fields as varied as medicine, psychology, and marketing involves assessment of risk perceptions: beliefs about the likelihood of experiencing various adverse outcomes. For instance, many studies have assessed risk perception to test models that posit an association between risk perception and health behavior (1-14). Other research assesses risk perception to detect errors and biases in risk judgments (15-18), to assess the association between risk judgments and emotion (19-25), and to increase the accuracy of people's risk perceptions (11, 26-28). In sum, several important and large bodies of research depend on obtaining accurate assessments of people's risk perception.
However, there is little consistency in the approach used to measure risk perception. The most common measures include a 0% to 100% numerical measure, a verbal measure such as “not at all likely” to “extremely likely” or “very low” to “very high,” and a comparative measure for which respondents compare their risk to that of the average person using a scale such as “much lower than average” to “much higher than average” (29-34). Still other measures include “1 in x” scales (e.g., 1 in 200; ref. 35) or verbal response scales with different labels for each step on the scale (35) and different numbers of steps on the scale (36).
The number of different risk perception measures used by different investigators highlights the lack of consensus regarding the best risk perception measure. This lack of consensus arises, in part, from how little is known about the psychometric properties and relative merits of the various measures (35-38). Therefore, in a secondary analysis of an existing data set, we investigated the psychometric and test characteristics of three measures of breast cancer risk perception in a random sample of women from a primary care population. In addition to overall performance, we explicitly considered how the different items perform when identifying groups of particular interest for health behavior research and interventions, such as individuals who perceive themselves at very high or very low risk. Items were compared using traditional measurement theory on score distributions and construct validity (39). In addition, because there is no gold standard for perceived risk, we used latent class models to estimate the test characteristics (specificity and sensitivity) of each item for identifying women who perceive themselves at high risk and women who perceive themselves at low risk.
Materials and Methods
The study protocol was approved by the Institutional Review Board at the University of Pennsylvania.
Data Collection
A random sample of 1,200 adult women who had been seen by a University of Pennsylvania Health System primary care provider between 1996 and 1999 was identified through a billing database managed by the University of Pennsylvania Office of the Associate Dean for Health Services Research. From December 1999 to August 2003, surveys were mailed to the 1,016 of these patients who were not excluded by their primary care provider on the basis of being deceased, non-English speaking, too ill, or male. Of these 1,016 surveys, 47 were undeliverable, 5 were addressed to patients who had died, and 8 were addressed to patients who were too sick to participate. Of the remaining 956 women, 603 returned the survey (63.1%). Of these 603, the 566 without a self-reported personal history of breast or ovarian cancer were eligible for the study.
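The response-rate arithmetic in this paragraph can be checked with a short sketch. The variable names are illustrative; the counts are those reported above.

```python
# Counts reported in the Data Collection paragraph.
mailed = 1016
undeliverable = 47
deceased = 5
too_sick = 8
returned = 603

# Denominator for the response rate: mailed surveys minus those that
# could not have been completed.
could_respond = mailed - undeliverable - deceased - too_sick
response_rate = returned / could_respond

print(could_respond)               # 956
print(f"{response_rate:.1%}")      # 63.1%
```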
Measures
Risk Perception. Risk perception was assessed using the following measures: (a) a numerical measure [“What do you think your chance is of developing breast cancer in your lifetime? Please choose a number between 0% (no chance of breast cancer) and 100% (definitely will get breast cancer)”], (b) a verbal measure (“How would you rate your chance of developing breast cancer? Please check very low, moderately low, neither high nor low, moderately high or very high”), and (c) a comparative measure (“Overall, how do you think your chance of developing breast cancer compares to the average woman your age?” 1, much lower; 4, about the same; 7, much higher).
Breast Cancer Worry. Breast cancer worry was assessed with a two-item measure covering the frequency of worry and its effect on daily life (“How often do you worry about developing breast cancer?” and “How much does worrying about developing breast cancer interfere with your everyday life?”; both rated on a scale from 1, not at all, to 7, all the time). This measure has been previously validated (40).
Mammography Adherence. Mammography adherence was assessed by asking whether the participant had ever had a screening mammogram and if so, the month and year of the last screening mammogram following the procedures used in the Behavioral Risk Factor Surveillance System (41). All women 50 years of age and older who reported having a mammogram within the 12 months before receiving the completed questionnaire were coded as adherent to current recommendations for mammography. Women who provided only the year of the past mammogram were assumed to have undergone screening in the middle of that year (i.e., June). Analyzing mammogram data with and without this assumption yielded the same pattern of results.
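The adherence coding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of whole calendar months, and the treatment of never-screened women as nonadherent are assumptions.

```python
from datetime import date
from typing import Optional


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_adherent(age: int, survey_date: date, mammo_year: Optional[int],
                mammo_month: Optional[int] = None) -> Optional[bool]:
    """Code annual-mammography adherence for one respondent.

    Returns None for women under 50, for whom adherence was not coded.
    """
    if age < 50:
        return None
    if mammo_year is None:
        return False  # never screened (assumed to count as nonadherent)
    # Year-only reports are placed mid-year (June), as described in the text.
    month = mammo_month if mammo_month is not None else 6
    last_mammogram = date(mammo_year, month, 1)
    # Adherent if the last mammogram fell within the preceding 12 months.
    return months_between(last_mammogram, survey_date) <= 12


print(is_adherent(55, date(2001, 3, 15), 2000))   # True: June 2000 is 9 months earlier
print(is_adherent(55, date(2001, 9, 1), 2000))    # False: June 2000 is 15 months earlier
```

Rerunning the coding with year-only reports dropped instead of imputed would correspond to the sensitivity check mentioned in the text.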
Breast Cancer Risk Factors. Lifetime breast cancer risk was calculated using the Gail model (42). The risk factors included in the Gail model are age, degree of family history, age at menarche, age at first live birth and history of breast biopsy. Degree of family history was categorized by (a) ≥1 first- and second-degree relatives with breast or ovarian cancer; (b) ≥1 first-degree relatives, no second-degree relatives; (c) ≥1 second-degree relatives, no first-degree relatives; and (d) no first- or second-degree relatives. Age at first live birth was categorized as in the Gail model: none, <20 years, 20 to 24 years, 25 to 30 years, and >30 years.
Sociodemographic Characteristics. Age, educational attainment, and household income were measured using items from the Behavioral Risk Factor Surveillance System 1998 questionnaire (41). Race and ethnicity were collapsed into Black, White, and other.
Statistical Analyses
The data were analyzed using STATA 7.0 and additional software that estimates the error rates of diagnostic tests or measurements when there is no gold standard by applying maximum likelihood estimation methods to latent class models representing the observed data (S. Williams, personal communication, with data transfer via 3.5″ disk containing programs and examples).
We assessed the validity of each measure in several ways. Although the nomenclature for validity assessment is inconsistent, we used the traditional categories of construct validity, focusing on convergent validity (the degree to which each measure correlates with other measures of the same construct), discriminant validity (the degree to which each measure does not correlate with measures of different constructs), and predictive validity (the relationship between the measure and the criterion it is supposed to predict; ref. 39). Convergent validity was assessed by examining the correlation of each risk perception measure with the other risk perception measures. Discriminant validity was assessed by examining the correlation of each risk perception measure with a measure of breast cancer worry. Predictive validity was assessed by examining the correlation between each risk perception measure and measures of actual risk and adherence to screening mammography. The extent of family history was included in addition to the Gail estimate because, although the former is included in the latter, patients are aware of their family history but generally unaware of their Gail estimate. Analyses of annual mammography included only those respondents ≥50 years (n = 318). We also assessed the degree to which the correlation with the primary measure of absolute risk varied by race (Black versus White), household income (>$50,000 versus ≤$50,000), age (<50 versus ≥50 years), and education (completed college or more versus some college or less). This was tested by examining the interaction term of the risk perception measure and each sociodemographic variable when predicting the Gail estimate.
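One way to write the interaction test described above is shown below. The linear form is an assumption, since the text does not state the regression model used. Here G_i is the Gail estimate, R_i is the score on a given risk perception measure, and Z_i is a binary sociodemographic indicator (e.g., 1 for household income >$50,000).

```latex
G_i = \beta_0 + \beta_1 R_i + \beta_2 Z_i + \beta_3 \,(R_i \times Z_i) + \varepsilon_i ,
\qquad H_0 : \beta_3 = 0
```

Under this reading, a significant estimate of \beta_3 indicates that the association between the risk perception measure and the Gail estimate differs between the two sociodemographic groups, which is what the interaction P values in Table 2 report.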
Because we had no gold standard of very low or high risk perception, we used latent class analysis to assess the sensitivity and specificity of each measure as described by Walter and Irwig (43). In this method, all of the measures are assumed to be subject to error (i.e., misclassification) that is independent across measures. Initial estimates of the “true” classification of each participant (e.g., very high risk perception or not) are revised iteratively using maximum likelihood estimation until convergence occurs. The estimated variables are the false-positive and false-negative rates of each item and the prevalence of the outcome in the population. This methodology has been previously applied to measurement issues in epidemiologic studies (43, 44). To conduct these analyses, each risk perception measure was categorized into five categories with four cutoff points. For the comparative measure, the seven response options were collapsed into five categories: 1 and 2; 3; 4; 5; and 6 and 7. The five categories of the verbal measure remained the same. We categorized the numerical measure into five groups corresponding to the average lifetime risk of breast cancer: 0% to 5%, 6% to 10%, 11% to 15%, 16% to 50%, and >50%. Because there is debate about whether women are able to use a numerical scale as corresponding to the predicted risk of breast cancer, we examined three other categorizations of the 0% to 100% scale: the first using categories of 0% to 19%, 20% to 39%, 40% to 59%, 60% to 79%, and 80% to 100%; the second using categories of 0% to 5%, 6% to 10%, 11% to 15%, 16% to 20%, and 21% to 100%; and the third using the original categories but omitting the women who chose 50%, as this choice may reflect uncertainty rather than a specific level of risk perception (45).
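The iterative estimation described above can be illustrated with a generic expectation-maximization sketch for three dichotomized measures and two latent classes, assuming conditionally independent errors. This is not the software used in the study; the function names, starting values, and synthetic inputs are all hypothetical.

```python
from itertools import product


def fit_latent_class(counts, prev=0.2, sens=(0.8, 0.8, 0.8),
                     spec=(0.9, 0.9, 0.9), max_iter=5000, tol=1e-12):
    """EM for a two-class latent class model with three binary measures.

    `counts` maps each response pattern (x1, x2, x3), xi in {0, 1}, to its
    frequency. Returns (prevalence, sensitivities, specificities).
    """
    sens, spec = list(sens), list(spec)
    total = sum(counts.values())
    for _ in range(max_iter):
        # E-step: posterior probability that each pattern is truly "positive".
        post = {}
        for pattern in counts:
            p_pos, p_neg = prev, 1.0 - prev
            for j, x in enumerate(pattern):
                p_pos *= sens[j] if x else 1.0 - sens[j]
                p_neg *= 1.0 - spec[j] if x else spec[j]
            post[pattern] = p_pos / (p_pos + p_neg)
        # M-step: update parameters from the expected class memberships.
        n_pos = sum(counts[p] * post[p] for p in counts)
        n_neg = total - n_pos
        new_prev = n_pos / total
        new_sens = [sum(counts[p] * post[p] * p[j] for p in counts) / n_pos
                    for j in range(3)]
        new_spec = [sum(counts[p] * (1 - post[p]) * (1 - p[j]) for p in counts) / n_neg
                    for j in range(3)]
        change = max(abs(new_prev - prev),
                     *(abs(a - b) for a, b in zip(new_sens + new_spec, sens + spec)))
        prev, sens, spec = new_prev, new_sens, new_spec
        if change < tol:
            break
    return prev, sens, spec


def expected_counts(n, prev, sens, spec):
    """Expected pattern frequencies under the model (synthetic, not study data)."""
    counts = {}
    for pattern in product((0, 1), repeat=3):
        p_pos, p_neg = prev, 1.0 - prev
        for j, x in enumerate(pattern):
            p_pos *= sens[j] if x else 1.0 - sens[j]
            p_neg *= 1.0 - spec[j] if x else spec[j]
        counts[pattern] = n * (p_pos + p_neg)
    return counts


# Synthetic check with made-up generating values: the fit should approach them.
synthetic = expected_counts(1000, 0.10, (0.85, 0.60, 0.90), (0.95, 0.97, 0.92))
prev_hat, sens_hat, spec_hat = fit_latent_class(synthetic)
print(round(prev_hat, 3), [round(s, 3) for s in sens_hat], [round(s, 3) for s in spec_hat])
```

With three binary measures the model has seven free parameters and the data supply seven degrees of freedom, so the estimates are identified only up to a swap of the two class labels. Starting values with high sensitivity and specificity steer the fit toward the intended labeling.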
We calculated sensitivity and specificity for two outcomes: very high risk perception and very low risk perception. In our main approach (approach A), very high risk perception was defined as responses above the highest cutoff point on each measure (>50% for numeric, very high for verbal, >5 on the seven-point comparative measure) and very low risk perception was defined as responses below the lowest cutoff point on each scale (<6% for numeric, very low for verbal, and <3 on the 7-point comparative measure). However, multiple alternative approaches were used to define very high risk perception for the numerical measure to determine the effect of these alternative definitions on the results. In approach B, we used 50% as the cutoff and excluded participants with a 50% perceived risk on the numerical measure for predicting high risk because of the known tendency for people to use 50% to mean “I don't know” rather than a 50/50 chance (45, 46). In approach C, we used a cutoff of 20% for the numerical measure without this additional exclusion, and in approach D, we used a cutoff of 20% for the numerical measure and again excluded those with a 50% perceived risk on the numerical measure.
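The outcome definitions under approaches A through D can be sketched as simple dichotomization rules. The function names and the 1 to 5 coding of the verbal scale are illustrative, and treating the 20% cutoff as "strictly above 20%" is an assumption, because the text does not say whether 20% itself counts as high.

```python
from typing import Optional


def high_numeric(pct: float, approach: str = "A") -> Optional[bool]:
    """Very-high-risk-perception flag for the 0% to 100% numerical measure.

    Returns None when the respondent is excluded under the chosen approach.
    """
    cutoff = {"A": 50, "B": 50, "C": 20, "D": 20}[approach]
    if approach in ("B", "D") and pct == 50:
        return None        # 50% may mean "I don't know"; dropped in B and D
    return pct > cutoff    # strictly above the cutoff (assumed for C and D)


def high_verbal(score: int) -> bool:
    return score == 5      # 5 = "very high" on the five-category verbal scale


def high_comparative(score: int) -> bool:
    return score > 5       # 6 or 7 on the seven-point comparative scale


def low_flags(pct: float, verbal: int, comparative: int):
    """Very-low-risk-perception flags (numeric, verbal, comparative), approach A."""
    return (pct < 6, verbal == 1, comparative < 3)


print(high_numeric(50, "A"))   # False: 50% is not above the 50% cutoff
print(high_numeric(50, "B"))   # None: excluded under approach B
print(high_numeric(30, "C"))   # True: above the 20% cutoff
```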
Results
Respondents
The study population had a mean age of 52 years (with a range from 20 to 82 and 50% being over 50 years of age). Sixty-six percent of study respondents were White. Respondents encompassed a wide range of educational attainment and household income. Respondent characteristics and Gail model breast cancer risk factors are shown in Table 1. On average, respondents had a Gail lifetime risk of 8% with a range of 0.8% to 29%.
Table 1. Respondent characteristics and Gail model breast cancer risk factors (n = 566)
Characteristic | % of respondents
---|---
Age, y |
18-30 | 7
31-40 | 17
41-50 | 25
51-60 | 20
>60 | 30
Race/ethnicity |
Black | 34
White | 66
Education |
High school or less | 29
Some college | 29
College or higher | 42
Annual household income |
≤$30,000 | 22
$30,001-$50,000 | 22
$50,001-$70,000 | 20
>$70,000 | 36
Family history of breast or ovarian cancer |
None | 66
≥1 second-degree relatives, no first-degree relatives | 21
≥1 first-degree relatives, no second-degree relatives | 11
≥1 first- and second-degree relatives | 3
Age at first live birth, y |
No live births | 22
<20 | 10
20-24 | 19
25-30 | 22
≥30 | 26
History of breast biopsy |
Yes | 22
No | 78
Age at first period, y |
9-11 | 18
12-13 | 55
≥14 | 27
Gail estimate |
Average or below (≤12%) | 84
Above average (>12%) | 16
Distribution of Scores
As shown in Table 2, the Shapiro-Wilk tests indicate that none of the risk perception measures is normally distributed.
Table 2. Distribution of scores and construct validity of the risk perception measures
Criteria | Numerical (0-100) | Verbal (1-5) | Comparative (1-7)
---|---|---|---
Normality | | |
Shapiro-Wilk test (W)* | 0.976 (P < 0.0001) | 0.992 (P = 0.006) | 0.981 (P < 0.0001)
Convergent validity (correlation with other risk-perception measures) | | |
Verbal | 0.62 (P < 0.0001) | |
Comparative | 0.60 (P < 0.0001) | 0.72 (P < 0.0001) |
Discriminant and predictive validity | | |
Correlation with breast cancer worry | 0.51 (P < 0.0001) | 0.51 (P < 0.0001) | 0.52 (P < 0.0001)
Correlation with measures of actual risk | | |
Gail model estimate | 0.26 (P < 0.0001) | 0.35 (P < 0.0001) | 0.33 (P < 0.0001)
Degree of family history | 0.28 (P < 0.0001) | 0.27 (P < 0.0001) | 0.25 (P < 0.0001)
Correlation with breast cancer screening | | |
Annual mammogram | 0.19 (P = 0.0019) | 0.12 (P = 0.053) | 0.10 (P = 0.110)
Association with Gail risk affected by: | | |
Race (Black vs White)† | Not significant (P = 0.103) | White > Black (P = 0.034) | White > Black (P = 0.014)
Income (>$50,000 vs ≤$50,000)† | Not significant (P = 0.444) | Not significant (P = 0.061) | Higher income > lower income (P = 0.032)
Age (<50 vs ≥50 y)† | Not significant (P = 0.887) | Not significant (P = 0.555) | Not significant (P = 0.823)
Education (completed college or more vs less than college)† | Not significant (P = 0.525) | Not significant (P = 0.175) | More education > less education (P = 0.021)
*Value of 1 indicates complete normality; a significant P value indicates a nonnormal distribution.
†P value for interaction term.
Construct Validity
Table 2 reports our assessments of convergent, discriminant, and predictive validity. Each risk perception measure was significantly associated with each of the other measures (r ≥ 0.60 for all) and slightly less correlated with a measure of breast cancer worry (r ≥ 0.51 for all). Each risk perception measure was also significantly correlated with the Gail estimate (r ≥ 0.26 for all) and degree of family history (r ≥ 0.25 for all). However, only the numerical measure was significantly correlated with adherence to annual mammography (r = 0.19). The verbal and comparative measures were only marginally correlated with mammography adherence.
The correlation between the risk perception measures and Gail estimate was significantly stronger for Whites than Blacks for the verbal measure and the comparative measure. The correlation between the Gail estimate and the risk perception measure was not affected by income or education for the numerical measure, but was significantly stronger for those with higher income (>$50,000) and who completed college or more for the comparative measure. The correlation between Gail risk and the verbal measure was marginally stronger among those with higher income (>$50,000). The associations with Gail risk did not differ by age (being ≥50 versus <50 years) for any of the three risk perception measures.
Sensitivity and Specificity
The test characteristics of each measure for identifying women with very high risk perception and women with very low risk perception are reported in Table 3.
Table 3. Test characteristics of each measure for identifying women with very high and very low risk perception
Outcome | Approach A, N | Approach A, V | Approach A, C | Approach B, N | Approach B, V | Approach B, C | Approach C, N | Approach C, V | Approach C, C | Approach D, N | Approach D, V | Approach D, C
---|---|---|---|---|---|---|---|---|---|---|---|---
Very high, sensitivity | 0.89 | 0.37 | 0.89 | 0.91 | 0.75 | 0.86 | 0.89 | 0.31 | 1.00 | 0.89 | 0.37 | 0.90
Very high, specificity | 0.99 | 0.93 | 0.99 | 0.99 | 0.98 | 0.96 | 0.36 | 1.00 | 1.00 | 0.90 | 1.00 | 0.99
Very low, sensitivity | 0.74 | 0.81 | 0.74 | | | | | | | | |
Very low, specificity | 0.96 | 0.95 | 0.96 | | | | | | | | |
NOTE: Approach A: Using cutoff of 50% for numeric, very high for verbal, 6 on seven-point comparative measure (7—much higher than average) for predicting high risk, or 5% for numeric, very low for verbal, 2 on seven-point comparative measure (1—much lower than average) for predicting low risk. Approach B: Using cutoff of 50% numeric, very high for verbal, 6 on seven-point comparative measure (7—much higher than average) for predicting high risk, and dropping participants with a 50% perceived risk on the numerical scale for predicting high risk. Approach C: Using cutoff of 20% for numeric, very high for verbal, 6 on seven-point comparative measure (7—much higher than average) for predicting high risk. Approach D: Using cutoff of 20% for numeric, very high for verbal, 6 on seven-point comparative measure (7—much higher than average) and dropping participants with a 50% perceived risk on the numerical scale for predicting high risk.
Abbreviations: N, numeric; V, verbal; C, comparative.
The numerical and comparative measures had the highest sensitivity (0.89-0.90) and the verbal measure had the lowest sensitivity (0.37) for identifying women with very high risk perception. Similarly, the numerical and comparative measures exhibited the highest specificity (0.99) for excluding women who did not have very high risk perception. The verbal measure had the lowest specificity for excluding women who did not have very high risk perception, although it was still high (0.93). Overall, for identifying women with very high risk perception, the numerical and comparative measures had higher sensitivity and specificity than the verbal measure. Using these cutoffs, the latent class models estimated that 5% of respondents had a very high risk perception and 15% had a very low risk perception.
Alternative approaches for categorizing the numerical scale had some effect on the relative sensitivities and specificities of the different measures for identifying women with high risk perception (Table 3). In general, the numerical measure retained high sensitivity and specificity unless the threshold for high risk perception on the numerical measure was set at 20% (approach C). The verbal measure had lower sensitivity but relatively high specificity across the approaches tested. The comparative measure had relatively high sensitivity and specificity, with levels slightly above those of the numerical measure when a cutoff of 20% for high risk was used on the numerical scale (approach C) or when women who responded 50% as their risk on the numerical scale were omitted from the analysis.
For identifying women with very low risk perception, the numerical measure had the lowest sensitivity (0.74) and highest specificity (0.96), whereas the comparative measure had the highest sensitivity (0.89) and the lowest specificity (0.91). The verbal measure had an intermediate sensitivity at 0.81 and a relatively high specificity at 0.95.
Discussion
The results of this study offer some information to guide the decision of how to best measure cancer risk perception. Each of the three measures of risk perception that we included in this study had significant strengths and weaknesses. The comparative measure of perceived risk, which asked women to compare their risk to the average woman on a seven-point scale, had a nonnormal distribution, was strongly correlated with the other measures, and was moderately correlated with measures of actual risk. However, the comparative measure of perceived risk was less correlated with adherence to annual mammography and seemed to perform differently among women who differed in income, educational attainment, and race. The verbal measure of perceived risk, where a woman rated her risk on a five-point scale, also had a nonnormal distribution and was strongly correlated with the other measures and moderately correlated with measures of actual risk. However, the verbal measure of perceived risk had a correlation with mammography adherence of borderline statistical significance and its performance was not consistent across all sociodemographic characteristics. The numerical measure of perceived risk also had a nonnormal distribution, strong correlations with the other measures, and moderate correlations with measures of actual risk. In addition, it was significantly correlated with adherence to annual mammography and there was no evidence that its performance differed between women of different socioeconomic groups. Thus, the numerical measure had the best overall performance across these criteria and could be considered the default option for most studies.
The high correlations between the risk perception measures and the convergent validity in their positive associations with other related variables (e.g., breast cancer worry) also suggest that the three risk perception measures represent similar constructs. Although these high correlations suggest a single underlying construct, it is possible that the differential association of each risk perception measure with other variables indicates that each of the measures may represent a different aspect of risk perception, which may have implications for health behavior. There is a growing recognition that risk perception is a more complex construct than pure expected probability and is likely to include dimensions of affective response (47), as is suggested by the very strong association between the risk perception measures and breast cancer worry. Further research is needed to tease apart how these measures may relate to this more complex model of risk perception.
Although the criteria in Table 2 provide some guidance as to the overall performance of the measures, the calculations of sensitivity and specificity can inform the decision about which measure (and which cutoff) to use when there is a specific need to accurately identify individuals with very high or low risk perception. In these analyses, both the numerical measure and the comparative measure did the best—whether in identifying women who perceive themselves at very high risk or in identifying women who perceive themselves at very low risk. The test characteristics of the numerical and comparative measures were very similar for identifying women with very high risk perception; both had very low rates of false positives (1%) and reasonably low rates of false negatives (∼10%). It is important to note that we calculated the sensitivity and specificity of the measures using different cutoffs for the numerical measure; however, other cutoffs for very high risk or very low risk may be of interest and affect the results. As expected, as the threshold for calling a risk perception “very high” decreased, the specificity of the numerical measure decreased. Furthermore, the inclusion or removal of the 50% response on the numerical scale did not substantively affect our results, which suggests that despite known problems with this response (45, 46), individuals reporting a risk perception of 50% do not need to be excluded for the numerical scale to have reasonable test characteristics.
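The false-positive and false-negative rates quoted above follow directly from the reported sensitivity and specificity. A minimal sketch, using the approach A values reported for the numerical measure (the function name is illustrative):

```python
def error_rates(sensitivity: float, specificity: float) -> dict:
    """Convert sensitivity and specificity into misclassification rates."""
    return {
        "false_negative_rate": 1 - sensitivity,   # true cases that are missed
        "false_positive_rate": 1 - specificity,   # non-cases flagged as cases
    }


# Numerical measure, very high risk perception, approach A.
rates = error_rates(sensitivity=0.89, specificity=0.99)
print({name: round(value, 2) for name, value in rates.items()})
# {'false_negative_rate': 0.11, 'false_positive_rate': 0.01}
```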
The results of these analyses suggest that for the identification of women with very low risk perception, the comparative measure may be preferred if the goal is to minimize the rate of false negatives (classifying a woman who has very low risk perception as having average or high risk perception). This may be the goal in many settings, such as interventions to encourage cancer risk reduction behaviors among women at risk of not participating because of low perceived risk of cancer. However, if the goal is to maximize the specificity of the measure and minimize the number of false positives (classifying a woman with average or high risk perception as having low risk perception), the numerical measure would be best. Although this scenario seems less likely, it may apply to settings such as clinical trials of interventions targeted at low risk women where eligibility is dependent upon having a low risk perception.
The results of this study contribute to an ongoing discussion about the best approach to measuring risk perception. One previous study compared a comparative measure to a numerical measure and found that the comparative measure did better on various psychometric measures and subject ratings (37). Another study that compared verbal measures to numerical measures concluded that the verbal measures were subjectively better. Specifically, Diefenbach et al. (36) found that subjects rated a verbal measure as easier to use and as a better reflection of how they felt compared with a dichotomous measure and multiple numerical measures. In addition, people tend to have difficulty with numerical probabilities (“innumeracy”; ref. 48) and report a preference for expressing risk using words, not numbers (49, 50). On the other hand, there is substantial variability in how people interpret the verbal expressions of probability used to label the verbal scales, raising questions about the reliability of these measures and making it very difficult to translate verbal expressions into quantitative metrics (49, 51, 52).
Thus, to date, much debate and uncertainty have surrounded the choice of risk perception measures for clinical and research use. This study applies a clinical diagnostic tool to examine the strengths and weaknesses of three different risk perception measures. The results not only contribute to what is known about the strengths and weaknesses of each measure, but also extend this line of work by specifying which measures are best under which circumstances. As noted, previous work has generated mixed recommendations regarding the best approach to risk perception measurement. The approach used in this study suggests that, despite the problems revealed with it in other work, the numerical measure performs well (if not best) by most criteria, but that the optimal measure also depends on the relative costs of false negatives and false positives in risk perception measurement.
The results of this study must be considered within its limitations. The order of the risk perception measures was not randomized, and a previous study found that risk perception measure order affects responses to these measures (53). Because that study found that responses to measures that follow the comparative measure are lower than when they do not follow the comparative measure, responses to our verbal measure (which preceded the comparative measure) may have been higher than they would have been had the verbal measure followed the comparative measure. However, the size of this effect is small and it is unlikely to have substantively affected the overall ranking of the measures. A second limitation is that, as with any survey research, there is the potential for response bias. Although the response rate of 63% is respectable, it is possible that nonresponders differed from responders. However, we do not have information on the nonresponders to enable us to examine this possibility. Another concern is that we did not specifically test whether numeracy or literacy affected the performance of the measures, although we did examine the effect of education. Thus, it is possible that one of the measures performs less well among low numeracy or literacy groups. Finally, we did not assess the performance of multiple items compared with single items. Future work is needed to determine whether multiple item measures of risk perception are preferable to the single items considered here.
Nevertheless, the results of this study provide new information about the strengths and weaknesses of multiple breast cancer risk perception measures, including information about their relative strengths and weaknesses depending on the goals or circumstances of the inquiry. The numerical measure did well overall, but other measures may be preferred depending on the study question and the relative importance of avoiding false positives versus false negatives. The recommendations from this study for cancer risk perception measures can increase the validity of risk perception measures used in clinical practice and research. In addition, this information can contribute to greater consensus about risk-perception measurement and, therefore, greater consistency in measurement across studies.
Grant support: American Cancer Society Research Training Grant and Robert Wood Johnson Generalist Faculty Scholar Award (K. Armstrong).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Acknowledgments
We thank Barbara Weber, Jill Stopfer, Susan Domchek, Ellyn Micco, Amy Carney, and the women who participated in the study.