Background:

The US Health and Retirement Study (HRS) is an ongoing population-representative cohort of US adults ages >50 with rich data on health during aging. Self-reported cancer diagnoses have been collected since 1998, but they have not been validated. We compared self-reported cancer diagnoses in HRS interviews against diagnostic claims from linked Medicare records.

Methods:

Using HRS–Medicare linked data, we examined the validity of first incident cancer diagnoses self-reported in biennial interviews from 2000 to 2016 against ICD-9 and ICD-10 diagnostic claim records as the gold standard. Data were from 8,242 HRS participants ages ≥65 with at least 90% continuous enrollment in fee-for-service Medicare. We calculated the sensitivity, specificity, and κ for first incident invasive cancer diagnoses (all cancers combined, and each of bladder, breast, colorectal/anal, uterine, kidney, lung, and prostate cancers) cumulatively over the follow-up period and at each biennial study interview.

Results:

Overall, self-reports of first incident cancer diagnoses from 2000 to 2016 had 73.2% sensitivity and 96.2% specificity against Medicare claims (κ = 0.73). For specific cancer types, sensitivities ranged from 44.7% (kidney) to 75.0% (breast), and specificities ranged from 99.2% (prostate) to 99.9% (bladder, uterine, and kidney). Results were similar in sensitivity analyses restricted to individuals with 100% continuous fee-for-service Medicare enrollment and in analyses restricted to individuals with at least 24 months of Medicare enrollment.

Conclusions:

Self-reported cancer diagnoses in the HRS have reasonable validity for use in population-based research, and their validity is maximized with linkage to Medicare claims.

Impact:

These findings inform the use of the HRS for population-based cancer and aging research.

The confluence of rapid population aging with continually improving cancer survival rates in the United States is resulting in substantial growth of the older cancer survivor population (1, 2). By 2040, the population of US cancer survivors ages ≥65 is projected to grow by nearly 50% to approximately 19 million people (3). Thus, there is an increasing need for longitudinal, population-representative data for interdisciplinary cancer research among older adults. The population-based Surveillance, Epidemiology, and End Results (SEER) cancer registries capture about 28% of the US population and are widely used in cancer research, yet these samples are generally more urban and less white, have lower poverty, and have more HMO coverage than the general US population (4, 5).

The US Health and Retirement Study (HRS) is an ongoing population-representative cohort of adults ages >50 with rich biennial interview data on social, behavioral, economic, and health-related factors during aging, in addition to biomarker data, genetic data, and several administrative data linkages such as Medicare, Social Security, and pension data (6). Over 40,000 individuals have taken part in at least one HRS interview since its inception. The HRS has regularly collected self-reported cancer diagnoses since 1998 and may be a valuable resource for population-based cancer research. However, the self-reported cancer diagnoses in the HRS have not yet been validated. Prior validation studies of self-reported cancer diagnoses using cancer registry data have shown moderate sensitivity (63.7%–72.0%) and high specificity (>90%) of self-reported cancer diagnosis, agnostic of cancer site (7–9). Similarly, a validation of self-reports against Medicare claims found 77% sensitivity and 91% specificity (10). Sensitivity varies widely by cancer site, with breast cancer showing the highest sensitivity at >80% (7, 9, 11, 12). Importantly, sensitivity and specificity are affected by the study population that is sampled. A French study of adults over age 75 found 20.6% sensitivity and 99.4% specificity of self-reported cancer (13). Indeed, nearly all existing validation studies have found increased false-negative and false-positive responses among older individuals (8–12).

To evaluate the utility of self-reported cancer diagnosis in the HRS for future research on cancer and aging, we aimed to validate self-reported first incident cancer diagnoses from the HRS interviews against diagnostic claims from linked Medicare records. We used data from biennial HRS interviews from 2000 through 2016 linked to Medicare claims from 2000 through 2016 to evaluate the sensitivity, specificity, and κ for self-reported diagnosis of a first incident malignant cancer (all types), and for the seven most common cancer sites in adults ages ≥65 (breast, prostate, lung, colorectal/anal, uterine, kidney, and bladder; ref. 14).

Data source

The US Health and Retirement Study (HRS) is a nationally representative cohort of Americans ages >50 who are interviewed every two years (6). The HRS began in 1992 as a cohort of adults ages 51–61 years. It merged with the 1993 Asset and Health Dynamics of the Oldest Old (AHEAD) study of adults born before 1924 and expanded recruitment across birth cohorts until it comprised a cohort of adults ages >50 in 1998. “Refresher” cohorts ages 51–56 years are added every six years, to account for the aging of the original 1998 cohort (6). Across all interview years, approximately 80% of HRS participants who are Medicare enrollees consent to linking their Medicare records (6). In the present study, we used HRS interview data from 1998 to 2016 (1998 data were used for demographic information and to exclude individuals with a prior cancer history, not for analysis), including restricted data for self-reported cancer site, linked to Medicare Part A (inpatient, outpatient) and Part B (carrier) fee-for-service claims from 2000 to 2016 (15).

Study population

We included 17,495 HRS participants who were Medicare-eligible during our validation study period (January 2000–December 2016) and had no prior history of cancer (recorded either through self-report or Medicare claims). Eligible individuals who did not enroll in Medicare, were <65 at their Medicare eligibility date, became eligible for Medicare after their last HRS interview, or had a baseline interview prior to 1998 were excluded (N = 1,901). To ensure sufficient available claims for detecting cancer diagnoses, while minimizing the number of people excluded from the analysis due to a lack of continuous coverage, we excluded participants with less than 90% Parts A and B Medicare coverage during their eligible time period (N = 6,895), those with fewer than 12 months of follow-up (N = 235), and those missing baseline characteristics (N = 222), for a final analytic sample of 8,242 participants.
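
To make these restrictions concrete, the sketch below applies the main analytic filters described above to a hypothetical participant-level data frame. This is an illustration only, not the authors' code; the column names (age_at_eligibility, prior_cancer, pct_ffs_coverage, followup_months, missing_baseline) are hypothetical placeholders rather than actual HRS or Medicare variable names.

```python
# Illustrative sketch (not the study code): apply the principal sample
# restrictions to a participant-level DataFrame with hypothetical columns.
import pandas as pd

def build_analytic_sample(cohort: pd.DataFrame) -> pd.DataFrame:
    """Return the rows meeting the main inclusion criteria described in the text."""
    return cohort[
        (cohort["age_at_eligibility"] >= 65)      # ages >=65 at Medicare eligibility
        & (~cohort["prior_cancer"])               # no prior cancer by self-report or claims
        & (cohort["pct_ffs_coverage"] >= 0.90)    # >=90% Parts A and B fee-for-service coverage
        & (cohort["followup_months"] >= 12)       # at least 12 months of follow-up
        & (~cohort["missing_baseline"])           # complete baseline characteristics
    ]
```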

Collection of HRS data and written informed consent are conducted in compliance with the requirements of the University of Michigan's Institutional Review Board. This analysis was approved by the University of Michigan Health Sciences and Behavioral Sciences Institutional Review Board (HUM00170138), and additional informed consent was not necessary for the present analysis.

Identification of cancer diagnoses

Self-reported cancer diagnoses

First incident self-reported cancer diagnoses other than nonmelanoma skin cancer were identified in the HRS baseline interview (1998), follow-up interviews (2000–2016), and interviews at these same time points completed by a proxy respondent on behalf of the participant (usually a spouse or other family member who completes the interview when the participant is unable or unwilling to do so). For respondents who died between follow-ups, exit interviews were completed by a knowledgeable proxy within two years of the participant's death to ascertain any cancer diagnosis that occurred after the respondent's final HRS interview. At each biennial interview, HRS participants or their proxies are asked whether they had received a new cancer diagnosis (excluding minor skin cancers) since their last interview and, if so, the date of diagnosis and the organ or body part in which the cancer started (Supplementary Table S1; ref. 15).

Medicare claims–based cancer diagnoses

The cancer sites recorded in the HRS interview data were mapped onto ICD-9 and ICD-10 codes for identification in Medicare claims, resulting in a total of 52 malignant cancer types (Supplementary Table S1; refs. 16–19). Using the Medicare Chronic Conditions Warehouse cancer algorithm, claims-based cancer diagnoses were defined through the identification of a relevant ICD-9 or ICD-10 diagnosis code on either one inpatient claim or two outpatient or Part B carrier claims (20). The incident cancer diagnosis date was assigned as the first date the cancer diagnosis was detected on a claim within the two-year period.
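
As an illustration of the claim-counting rule described above, the sketch below flags a claims-based diagnosis when a qualifying ICD code appears on one inpatient claim or on at least two outpatient/carrier claims. It is a simplified sketch, not the Chronic Conditions Warehouse implementation: the column names (beneficiary_id, claim_type, icd_code, claim_date) are hypothetical, and the requirement that qualifying outpatient/carrier claims fall within a specified time window is omitted for brevity.

```python
# Simplified sketch of the "one inpatient claim, or two outpatient/carrier
# claims" detection rule. Columns and claim_type labels are hypothetical.
import pandas as pd

def claims_based_diagnoses(claims: pd.DataFrame, cancer_codes: set) -> pd.DataFrame:
    """Return one row per beneficiary with a claims-defined cancer diagnosis date."""
    hits = claims[claims["icd_code"].isin(cancer_codes)]

    inpatient = hits[hits["claim_type"] == "inpatient"]
    outpatient = hits[hits["claim_type"].isin(["outpatient", "carrier"])]

    # Rule 1: a single inpatient claim qualifies.
    inpatient_dx = inpatient.groupby("beneficiary_id")["claim_date"].min()

    # Rule 2: two or more outpatient/carrier claims qualify; the diagnosis
    # date is taken as the earliest qualifying claim date.
    counts = outpatient.groupby("beneficiary_id")["claim_date"].agg(["count", "min"])
    outpatient_dx = counts.loc[counts["count"] >= 2, "min"]

    # Keep the earliest date per beneficiary across both rules.
    dx = pd.concat([inpatient_dx, outpatient_dx]).groupby(level=0).min()
    return dx.rename("diagnosis_date").reset_index()
```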

Demographics

Demographic characteristics for participants were obtained from the HRS interview closest to their Medicare eligibility date and included participant age (in years, continuous), sex (male, female), race/ethnicity (non-Hispanic white, non-Hispanic Black, non-Hispanic other, Hispanic), living arrangement (married/partnered, unmarried living with other, unmarried living alone), education (less than high school, high school, some college, college or more), household wealth (mean (SD), in inflation-adjusted 1998 dollars), and number of comorbid conditions (0, 1, 2+).

Statistical analysis

We calculated the overall sensitivity, specificity, and κ for self-reported cancer diagnoses in the HRS interviews from 2000 through 2016, using Medicare claims-based diagnoses (2000–2016) as the gold standard. We first performed these analyses for any incident cancer diagnosis (regardless of type), and then for each of the seven most common cancers in older adults (breast, prostate, colorectal/anal, lung, bladder, uterine, and kidney). The κ statistic was calculated to quantify the concordance between HRS self-reports and Medicare claims while accounting for chance agreement (21). We also calculated the sensitivity, specificity, and κ for self-reported cancer diagnoses stratified by participant age (65–69, 70+), sex (male/female), and level of education (less than high school, high school, some college, and college graduate). The first incident self-reported cancer diagnosis in the period 2000–2016 was compared with any claims-identified diagnosis in the period 2000–2016 (the mean time between self-report and claim was 1.6 years). Because it is sensitive health information, self-reported cancer site is restricted-use data. At the time of this study, these data were available only through the 2014 HRS interview (15). Hence, cancer site–specific validations were calculated using Medicare claims data through April 2015, the date of the last interview in the 2014 HRS wave.
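
For reference, the sketch below shows how sensitivity, specificity, and Cohen's κ can be computed from paired binary indicators of self-reported and claims-based diagnoses. The input vectors are hypothetical, and this is a minimal Python illustration rather than the Stata code used for the analyses.

```python
# Minimal sketch: validity metrics from paired binary indicators
# (1 = diagnosis present, 0 = absent). Inputs are hypothetical.
def validity_metrics(self_report, claims):
    pairs = list(zip(self_report, claims))
    tp = sum(1 for s, c in pairs if s and c)          # self-report and claims agree: cancer
    fn = sum(1 for s, c in pairs if not s and c)      # claims cancer missed by self-report
    fp = sum(1 for s, c in pairs if s and not c)      # self-report not confirmed by claims
    tn = sum(1 for s, c in pairs if not s and not c)  # agree: no cancer
    n = tp + fn + fp + tn

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return sensitivity, specificity, kappa
```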

We also calculated the sensitivity, specificity, and κ for self-reported cancer diagnoses for each two-year HRS interview wave, as these figures may be useful for researchers conducting analyses using HRS data from specific time points. For each HRS interview wave beginning with 2000, we evaluated whether participants took part in the subsequent wave. Among participants who took part in two consecutive HRS interview waves, we included those who were eligible for Medicare for the full period between the two waves (approximately two years) with at least 90% fee-for-service coverage. Participants with an incident cancer were excluded from validity calculations in subsequent waves. We conducted three sensitivity analyses to evaluate the robustness of our primary analyses to changes in the required percentage of Medicare coverage and length of follow-up. First, we restricted our sample to individuals who had 100% Medicare fee-for-service coverage during their follow-up period. Second, we restricted to individuals with at least 24 months of follow-up in the Medicare data. Finally, we restricted to individuals with both 100% Medicare fee-for-service coverage and at least 24 months of follow-up in the Medicare data. All analyses were performed with Stata version 16.1 (StataCorp LLC).
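
To illustrate the wave-by-wave procedure described above, the sketch below applies the between-wave eligibility restrictions within each interview wave and reuses the validity_metrics helper from the previous sketch. The data frame and column names (wave_year, continuously_eligible, pct_ffs_coverage, prior_incident_cancer, self_reported_cancer, claims_cancer) are hypothetical placeholders, not the study's actual variables.

```python
# Illustrative sketch of wave-specific validity calculations. `wave_pairs`
# holds one row per participant per consecutive-wave pair; columns are hypothetical.
import pandas as pd

def wave_specific_metrics(wave_pairs: pd.DataFrame) -> pd.DataFrame:
    """Compute sensitivity, specificity, and kappa separately for each interview wave."""
    rows = []
    for wave, grp in wave_pairs.groupby("wave_year"):
        eligible = grp[
            grp["continuously_eligible"]            # Medicare-eligible between the two waves
            & (grp["pct_ffs_coverage"] >= 0.90)     # >=90% fee-for-service coverage
            & (~grp["prior_incident_cancer"])       # exclude cancers incident in earlier waves
        ]
        sens, spec, kappa = validity_metrics(
            eligible["self_reported_cancer"], eligible["claims_cancer"]
        )
        rows.append({"wave": wave, "sensitivity": sens, "specificity": spec, "kappa": kappa})
    return pd.DataFrame(rows)
```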

This analysis included 8,242 Medicare-eligible HRS participants ages ≥65 (Table 1). The mean age was 71.7 years (standard deviation 7.9 years). Most participants were women (59.3%), non-Hispanic white (79.4%), and married (64.3%). Most had a high school education or less (64.3%) and no comorbid conditions, and mean household wealth was $314,560 (Table 1). Overall, 20% of the sample reported a first incident cancer diagnosis over the period 2000 to 2016.

Table 1.

Characteristics of Medicare-eligible US Health and Retirement Study respondents ages 65 and over, 2000–2016a.

Characteristic  All participants, N = 8,242 (%)
Age, mean (SD) 71.7 (7.9) 
Male 3,354 (40.7) 
Race/ethnicity (missing n = 4) 
 Non-Hispanic white 6,532 (79.3) 
 Non-Hispanic Black 1,034 (12.6) 
 American Indian/Alaska Native 78 (1.0) 
 Asian/Pacific Islander 38 (0.5) 
 Non-Hispanic other 27 (0.3) 
 Hispanic 529 (6.4) 
Educationb 
 Less than high school 2,537 (30.8) 
 High school 2,759 (33.5) 
 Some college 1,480 (18.0) 
 College or more 1,465 (17.8) 
Household wealth, mean (SD), $c 292,341 (707,075) 
Charlson comorbidity 
 None 4,928 (59.8) 
 1 1,347 (16.3) 
 2+ 1,967 (23.9) 
Mean time from self-report to claim diagnosis (years) 1.6 

aData are based on participants with at least 90% fee-for-service Medicare coverage and at least 12 months of follow-up in the Medicare data.

bOne individual is missing an education value.

cWealth data were extracted from different waves (1998–2016), with the majority from the 1998 HRS interview, and were adjusted for inflation to 1998 dollars based on the interview year.

For the full period from 2000 to 2016, self-reports of first incident cancer diagnosis for any of the 52 cancer sites included in the HRS had an overall 73.2% sensitivity and 96.2% specificity against Medicare claims data (κ = 0.73; Table 2). For the site-specific cancer diagnoses, sensitivities ranged from 44.7% (kidney cancer) to 75.0% (breast cancer), and specificities ranged from 99.2% (prostate cancer) to 99.9% (bladder, uterine, and kidney cancers; Table 2). Accounting for chance agreement, the κ values for interview self-reports and Medicare claims ranged from 0.58 (kidney cancer) to 0.83 (breast cancer; Table 2). Contingency tables comparing self-reported and Medicare-defined cancer diagnoses are presented in Supplementary Tables S2 and S3.

Table 2.

Cumulative sensitivity, specificity, and kappa for first incident cancer diagnosis (any cancer and seven specific cancer types), HRS–Medicare enrollees with at least 90% continuous fee-for-service coverage and at least 12 months of Medicare follow-up, 2000–2016.

Cancer typea  Sensitivity  Specificity  Kappa
Any cancer type (N = 8,242)b 73.2% 96.2% 0.73 
Colorectal/anal (N = 8,192) 60.1% 99.7% 0.69 
Lung (N = 8,192) 59.2% 99.4% 0.68 
Breast (N = 4,855) 75.0% 99.8% 0.83 
Prostate (N = 3,337) 66.1% 99.2% 0.74 
Bladder (N = 8,192) 54.2% 99.9% 0.67 
Uterine (N = 4,855) 57.9% 99.9% 0.70 
Kidney (N = 8,192) 44.7% 99.9% 0.58 

aSite-specific cancer details are available through April 2015.

bTotal sample size including participants with and without cancer.

Table 3.

Sensitivity, specificity, and kappa for a first incident cancer diagnosis of any type (interview self-reports compared with Medicare claims as the gold standard), by HRS interview wave.

Any malignant cancer
Year  Sensitivity (%)  Specificity (%)  Kappa
2002 67.0 99.4 0.73 
2004 65.7 99.6 0.73 
2006 65.5 99.4 0.71 
2008 66.2 99.4 0.73 
2010 69.7 99.5 0.77 
2012 71.3 99.4 0.76 
2014 64.6 99.6 0.72 
2016 72.3 99.1 0.74 

Considering the first incident cancer diagnosis of any type by HRS interview wave, sensitivity values generally increased over time, ranging from 64.6% (2014 wave) to 72.3% (2016 wave). Specificity values were consistently high across waves, ranging from 99.1% in the 2016 wave to 99.6% in the 2004 and 2014 waves. κ was lowest in the 2006 wave (κ = 0.71) and highest in the 2010 wave (κ = 0.77; Table 3).

When restricting to individuals with 100% Medicare fee-for-service coverage during their follow-up (N = 7,320), individuals with at least 24 months of Medicare follow-up (N = 7,740), and individuals with both 100% coverage and 24 months of follow-up (N = 6,830), the results were all similar to those observed in our primary analyses (Tables 4–6). Stratified analyses generally showed lower sensitivity values for self-reports from individuals over age 70 and those with less education, but values were similar for males and females (Supplementary Tables S4–S6).

Table 4.

Sensitivity, specificity, and kappa for overall cancer and seven specific cancer types, restricted to individuals with 100% fee-for-service coverage and at least 12 months of Medicare follow-up.

Cancer type  Sensitivity  Specificity  Kappa
Any cancer type (N = 7,320) 72.9% 96.3% 0.73 
Colorectal/anal (N = 7,294) 59.8% 99.7% 0.70 
Lung (N = 7,294) 60.6% 99.4% 0.69 
Breast (N = 4,319) 75.1% 99.8% 0.83 
Prostate (N = 2,975) 66.0% 99.3% 0.74 
Bladder (N = 7,294) 51.8% 99.9% 0.65 
Uterine (N = 4,319) 60.8% 99.9% 0.72 
Kidney (N = 7,294) 47.3% 99.9% 0.60 
Table 5.

Sensitivity, specificity, and kappa for overall cancer and seven specific cancer types, restricted to individuals with at least 90% fee-for-service coverage and at least 24 months of Medicare follow-up.

Cancer type  Sensitivity  Specificity  Kappa
Any cancer type (N = 7,740) 73.2% 96.2% 0.73 
Colorectal/anal (N = 7,684) 59.5% 99.7% 0.69 
Lung (N = 7,684) 59.3% 99.5% 0.69 
Breast (N = 4,551) 75.7% 99.8% 0.84 
Prostate (N = 3,133) 66.7% 99.2% 0.75 
Bladder (N = 7,684) 54.4% 99.9% 0.67 
Uterine (N = 4,551) 57.9% 99.9% 0.70 
Kidney (N = 7,684) 45.1% 99.9% 0.58 
Table 6.

Sensitivity, specificity, and kappa for overall cancer and seven specific cancer types, restricted to individuals with 100% fee-for-service coverage and at least 24 months of Medicare follow-up.

Cancer type  Sensitivity  Specificity  Kappa
Any cancer type (N = 6,830) 72.9% 96.3% 0.73 
Colorectal/anal (N = 6,795) 59.1% 99.7% 0.70 
Lung (N = 6,795) 60.8% 99.5% 0.70 
Breast (N = 4,019) 75.9% 99.8% 0.84 
Prostate (N = 2,776) 66.7% 99.2% 0.75 
Bladder (N = 6,795) 51.9% 99.9% 0.65 
Uterine (N = 4,019) 60.8% 99.9% 0.72 
Kidney (N = 6,795) 47.9% 99.9% 0.60 

In this large, population-representative cohort of older US adults, the sensitivity of self-reported first incident cancer was moderate and differed by cancer site, while the specificity was consistently high across cancer sites. Similarly, κ values varied by site but showed moderate to substantial agreement overall (22). These results suggest that the HRS provides an additional and valuable resource for population-based cancer and aging studies, particularly for studies of breast cancer or of all cancer sites combined. Site-specific studies involving other common cancers should also consider using Medicare claims data to maximize the validity of diagnostic data. Results from the present study provide guidance on the most valid sources of cancer diagnostic information in the US HRS over time and across cancer sites.

Our findings build upon prior studies showing that the sensitivity of self-reported cancer diagnoses varies widely by cancer site, and that the specificity of self-reported cancer diagnoses is generally higher than the sensitivity (9, 10, 12, 13). Most prior studies validated self-reports against cancer registry data and generally found sensitivity values of 60% to 80% and specificity values over 90% (7–10, 13). Site-specific sensitivity values ranged from 17% for colorectal/anal cancer to 96% for breast cancer in these studies (7, 9, 11, 12). Several other studies found the highest sensitivity for self-reported breast cancer, with colorectal/anal cancer and prostate cancer also having higher sensitivity than many other cancers (7, 9, 12, 13). Although our sensitivity values were broadly similar to those of these prior studies, across cancer sites they were 10–20 percentage points lower than values observed in studies of younger populations, but substantially higher than in a French sample of adults over age 75, in which overall sensitivity was 20.6% (9, 12, 13). Although some of this difference could be due to using a different gold standard than previous studies, it is also likely due to differences in age distribution across samples. Older age has been associated with lower accuracy of self-reported cancer diagnoses, which could be due to impaired memory or an increased number of medical conditions (8, 9, 12). Our stratified results support this hypothesis, showing lower sensitivity values among participants over age 70 (Supplementary Table S4). Additionally, it is possible that for sites with low sensitivity values, participants may have reported having cancer in another primary site. Researchers interested in studying cancer in older populations should consider using medical claims, when available, to bolster the validity of self-reported diagnostic information.

Breast cancer had the highest sensitivity of the site-specific self-reported diagnoses in this study. Although prior studies have argued that breast cancer has the highest sensitivity because it has clear-cut diagnostic criteria (23), it is not the only cancer site included in this study with clear-cut diagnostic criteria. The reasons driving differences in sensitivity by cancer site are likely multifactorial. Breast cancer is one of the two most common cancers in the United States, along with prostate cancer, which had the second highest sensitivity in this study (24). Attention to these two cancers is high (25–27), and they are more commonly diagnosed at early stages, making it easier to attribute the cancer to a specific site, unlike cancers diagnosed at later stages with regional and distant spread (28). Although the cancers we included for the site-specific validation include the most common cancer types among older adults, awareness and reporting vary by cancer site and may influence older adults' recall or understanding of a diagnosis (25–27, 29).

The HRS provides an additional and valuable resource for cancer and aging research, as it includes over two decades of population-representative, longitudinal pre- and post-diagnosis data on a rich range of social, behavioral, genetic, geographic, and biomarker factors that are not available in existing claims databases and cancer registry data. Researchers wishing to maximize the validity of site-specific cancer diagnoses for adults ages 65 and over should consider using Medicare claims records linked to the HRS data. However, the increased validity of diagnostic data needs to be weighed against the restricted age range when linking to Medicare and the costs and time associated with claims linkage, compared with the HRS alone. As such, the HRS self-reported cancer data may be sufficient for many research questions and offer increased generalizability, lower cost, and more timely access to diagnostic information than the use of Medicare claims. Therefore, researchers who wish to use the HRS data for cancer and aging research should consider which diagnostic data source best suits their research question and needs.

Strengths and limitations

Strengths of this study include its large, nationally representative sample of US adults ages ≥65, with nearly two decades of follow-up data on self-reported and Medicare-recorded cancer diagnoses. We provide validation data both cumulatively and by two-year HRS interview wave across a 16-year period for all cancers combined as well as for seven common cancer types. Our results provide a useful resource for researchers who wish to use these data for specific time points or cancer sites. However, this study also has potential limitations that warrant comment. First, the use of Medicare claims diagnoses as the gold standard for cancer diagnosis may increase the likelihood of false positives and restricts the age of our analytic sample to adults ages ≥65, which may result in lower sensitivity and specificity of self-reports compared with younger populations. Cancer registry data are most commonly considered the gold standard for cancer diagnoses, given the granular collection of diagnostic information, including stage at diagnosis. However, prior studies investigating the validity of Medicare-recorded cancer diagnoses against SEER registry data have found high sensitivity and specificity of diagnoses in Medicare claims, especially when both inpatient and outpatient claims are used (30–32). To minimize the likelihood of a false-positive diagnostic claim, we required at least two outpatient encounters or one inpatient stay with a cancer diagnosis. We would expect any error in cancer assignment in Medicare claims to reduce sensitivity and specificity, which would make our estimates conservative. Second, linking HRS data to fee-for-service Medicare limits the generalizability of our sample to adults ages ≥65 with fee-for-service Medicare (33). Third, the HRS interview questions about cancer site asked about the organ in which the cancer first started, rather than directly asking about the type of cancer. Errors in reporting cancer site in response to this question may have reduced sensitivity and specificity for site-specific diagnoses.

Conclusions

As the population of older cancer survivors continues to grow, our findings highlight the utility of the US HRS as a population-representative, longitudinal data resource for research on cancer and aging. This work provides insight into when diagnoses self-reported in HRS study interviews alone are sufficient for research purposes, and when researchers should consider linking HRS data to Medicare claims to maximize the validity of diagnostic information. Researchers who wish to use the HRS data for cancer and aging research should consider which diagnostic data source best suits their research question and needs.

M.A. Mullins reports grants from NCI during the conduct of the study. L.P. Wallner reports grants from NCI during the conduct of the study as well as grants from American Cancer Society outside the submitted work. No disclosures were reported by the other authors.

M.A. Mullins: Conceptualization, supervision, investigation, methodology, writing–original draft, project administration, writing–review and editing. J.S. Kler: Data curation, software, formal analysis, validation, visualization, writing–review and editing. M.R. Eastman: Data curation, software, validation, visualization, writing–review and editing. M. Kabeto: Software, formal analysis, visualization, methodology, writing–review and editing. L.P. Wallner: Conceptualization, supervision, funding acquisition, methodology, writing–review and editing. L.C. Kobayashi: Conceptualization, resources, supervision, methodology, project administration, writing–review and editing.

This research was supported by the NCI at the NIH (R03CA241841 to L.C. Kobayashi and P30CA046952 to Dr. Eric Fearon) and the National Institute on Aging at the NIH (P30AG012846 to Dr. Vicki Freedman). M.A. Mullins received research support from the NCI institutional training grant T32-CA-236621. The US Health and Retirement Study is sponsored by the National Institute on Aging (U01AG009740) and is conducted by the University of Michigan. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the NCI or the National Institute on Aging.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. US Census Bureau. 2019 population estimates by demographic characteristics [cited 2021 Feb 21]. Available from: https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html.
2. Miller KD, Nogueira L, Mariotto AB, Rowland JH, Yabroff KR, Alfano CM, et al. Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin 2019;69:363–85.
3. Bluethmann SM, Mariotto AB, Rowland JH. Anticipating the ‘silver tsunami’: prevalence trajectories and comorbidity burden among older cancer survivors in the United States. Cancer Epidemiol Biomarkers Prev 2016;25:1029–36.
4. Warren JL, Klabunde CN, Schrag D, Bach PB, Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care 2002;40:IV3–IV18.
5. Zahnd WE, Jenkins WD, James AS, Izadi SR, Steward DE, Fogleman AJ, et al. Utility and generalizability of multi-state, population-based cancer registry data for rural cancer surveillance research in the United States. Cancer Epidemiol Biomarkers Prev 2018;27:1252–60.
6. Sonnega A, Faul JD, Ofstedal MB, Langa KM, Phillips JWR, Weir DR. Cohort profile: the Health and Retirement Study (HRS). Int J Epidemiol 2014;43:576–85.
7. Cho S, Shin A, Song D, Park JK, Kim Y, Choi J-Y, et al. Validity of self-reported cancer history in the Health Examinees (HEXA) study: a comparison of self-report and cancer registry records. Cancer Epidemiol 2017;50(A):16–21.
8. Cowdery SP, Stuart AL, Pasco JA, Berk M, Campbell D, Williams LJ. Validity of self-reported cancer: comparison between self-report versus cancer registry records in the Geelong Osteoporosis Study. Cancer Epidemiol 2020;68:101790.
9. Loh V, Harding J, Koshkina V, Barr E, Shaw J, Magliano D. The validity of self-reported cancer in an Australian population study. Aust N Z J Public Health 2014;38:35–38.
10. Brault MWM, Landon BEM, Zaslavsky AM. Validating reports of chronic conditions in the Medicare CAHPS survey. Med Care 2019;57:830–5.
11. Navarro C, Chirlaque MD, Tormo MJ, Pérez-Flores D, Rodríguez-Barranco M, Sánchez-Villegas A, et al. Validity of self-reported diagnoses of cancer in a major Spanish prospective cohort study. J Epidemiol Community Health 2006;60:593–9.
12. Parikh-Patel A, Allen M, Wright WE; California Teachers Study Steering Committee. Validation of self-reported cancers in the California Teachers Study. Am J Epidemiol 2003;157:539–45.
13. Berthier F, Grosclaude P, Bocquet H, Faliu B, Cayla F, Machelard-Roumagnac M. Prevalence of cancer in the elderly: discrepancies between self-reported and registry data. Br J Cancer 1997;75:445–7.
14. Surveillance Research Program, National Cancer Institute. Cancer query system: SEER incidence statistics [cited 2020 Aug 3]. Available from: https://seer.cancer.gov/canques/incidence.html.
15. Cancer site | Health and Retirement Study [cited 2021 Feb 21]. Available from: https://hrs.isr.umich.edu/data-products/restricted-data/available-products/9691.
16. Weiner MG, Livshits A, Carozzoni C, McMenamin E, Gibson G, Loren AW, et al. Derivation of malignancy status from ICD-9 codes. AMIA Annu Symp Proc 2003;2003:1050.
17. Centers for Disease Control and Prevention. Screening list of ICD-9-CM codes for casefinding [cited 2020 Oct 1]. Available from: https://www.cdc.gov/cancer/apps/ccr/icd9cm_codes.pdf.
18. 2018 ICD-10-CM casefinding list. SEER [cited 2020 Nov 1]. Available from: https://seer.cancer.gov/tools/casefinding/case2018-icd10cm.html.
19. Agency for Healthcare Research and Quality. AHRQ QI ICD-10-CM/PCS specification version 6.0, patient safety indicators appendix H [cited 2021 Feb 21]. Available from: https://www.qualityindicators.ahrq.gov/Downloads/Modules/PSI/V60-ICD10/TechSpecs/PSI_Appendix_H.pdf.
20. Condition categories. Chronic Conditions Data Warehouse [cited 2021 Mar 24]. Available from: https://www2.ccwdata.org/web/guest/condition-categories.
21. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
22. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med 2012;22:276–82.
23. Colditz GA, Martin P, Stampfer MJ, Willett WC, Sampson L, Rosner B, et al. Validation of questionnaire information on risk factors and disease outcomes in a prospective cohort study of women. Am J Epidemiol 1986;123:894–900.
24. Common cancer sites. Cancer Stat Facts, SEER [cited 2021 Mar 1]. Available from: https://seer.cancer.gov/statfacts/html/common.html.
25. Maggio LA, Ratcliff CL, Krakow M, Moorhead LL, Enkhbayar A, Alperin JP. Making headlines: an analysis of US government-funded cancer research mentioned in online media. BMJ Open 2019;9:e025783.
26. Jensen JD, Scherr CL, Brown N, Jones C, Christy K, Hurley RJ, et al. Public estimates of cancer frequency: cancer incidence perceptions mirror distorted media depictions. J Health Commun 2014;19:609–24.
27. Jensen JD, Moriarty CM, Hurley RJ, Stryker JE. Making sense of cancer news coverage trends: a comparison of three comprehensive content analyses. J Health Commun 2010;15:136–51.
28. Cancer of the breast (female). Cancer Stat Facts, SEER [cited February 10]. Available from: https://seer.cancer.gov/statfacts/html/breast.html.
29. Kealey E, Berkman CS. The relationship between health information sources and mental models of cancer: findings from the 2005 Health Information National Trends Survey. J Health Commun 2010;15:236–51.
30. Gold HT, Do HT. Evaluation of three algorithms to identify incident breast cancer in Medicare claims data. Health Serv Res 2007;42:2056–69.
31. Parlett LE, Beachler DC, Lanes S, Hoover RN, Cook MB. Validation of an algorithm for claims-based incidence of prostate cancer. Epidemiology 2019;30:466–71.
32. Cooper GS, Yuan Z, Stange KC, Dennis LK, Amini SB, Rimm AA. The sensitivity of Medicare claims data for case ascertainment of six common cancers. Med Care 1999;37:436–44.
33. Jacobson G, Damico A. Medicare Advantage 2017 spotlight: enrollment market update. KFF; 2017 [cited 2021 Feb 21]. Available from: https://www.kff.org/medicare/issue-brief/medicare-advantage-2017-spotlight-enrollment-market-update/.