Abstract
The US Health and Retirement Study (HRS) is an ongoing population-representative cohort of US adults ages >50 with rich data on health during aging. Self-reported cancer diagnoses have been collected since 1998, but they have not been validated. We compared self-reported cancer diagnoses in HRS interviews against diagnostic claims from linked Medicare records.
Using HRS–Medicare linked data, we examined the validity of first incident cancer diagnoses self-reported in biennial interviews from 2000 to 2016 against ICD-9 and ICD-10 diagnostic claim records as the gold standard. Data were from 8,242 HRS participants ages ≥65 with at least 90% continuous enrollment in fee-for-service Medicare. We calculated the sensitivity, specificity, and κ for first incident invasive cancer diagnoses (all cancers combined, and each of bladder, breast, colorectal/anal, uterine, kidney, lung, and prostate cancers) cumulatively over the follow-up and at each biennial study interview.
Overall, self-reports of first incident cancer diagnoses from 2000 to 2016 had 73.2% sensitivity and 96.2% specificity against Medicare claims (κ = 0.73). For specific cancer types, sensitivities ranged from 44.7% (kidney) to 75.0% (breast), and specificities ranged from 99.2% (prostate) to 99.9% (bladder, uterine, and kidney). Results were similar in sensitivity analyses restricted to individuals with 100% continuous fee-for-service Medicare enrollment and when restricted to individuals with at least 24 months of Medicare enrollment.
Self-reported cancer diagnoses in the HRS have reasonable validity for use in population-based research; validity is maximized with linkage to Medicare claims.
These findings inform the use of the HRS for population-based cancer and aging research.
Introduction
The confluence of rapid population aging with continually improving cancer survival rates in the United States is resulting in a substantial growth of the older cancer survivor population (1, 2). By 2040, the population of US cancer survivors ages ≥65 is projected to grow by nearly 50% to approximately 19 million people (3). Thus, there is an increasing need for longitudinal, population-representative data for interdisciplinary cancer research among older adults. The population-based Surveillance, Epidemiology and End Results (SEER) cancer registries capture about 28% of the US population and are widely used in cancer research, yet these samples are generally more urban and less white, with lower poverty and more HMO coverage, than the general US population (4, 5).
The US Health and Retirement Study (HRS) is an ongoing population-representative cohort of adults ages >50 with rich biennial interview data on social, behavioral, economic, and health-related factors during aging, in addition to biomarker data, genetic data, and several administrative data linkages such as Medicare, Social Security, and pension data (6). Over 40,000 individuals have taken part in at least one HRS interview since its inception. The HRS has regularly collected self-reported cancer diagnoses since 1998 and may be a valuable resource for population-based cancer research. However, the self-reported cancer diagnoses in HRS have not yet been validated. Prior validation studies of self-reported cancer diagnoses using cancer registry data have shown moderate sensitivity (63.7%–72.0%) and high specificity (>90%) of self-reported cancer diagnosis, agnostic of cancer site (7–9). Similarly, a validation of self-reports against Medicare claims found 77% sensitivity and 91% specificity (10). Sensitivity of cancer site-specific diagnoses varies widely by cancer site, with breast cancer showing the highest sensitivity at >80% (7, 9, 11, 12). Importantly, sensitivity and specificity are affected by the study population that is sampled. A French study of adults over age 75 found 20.6% sensitivity and 99.4% specificity of self-reported cancer (13). Indeed, nearly all existing validation studies have found increased false-negative and false-positive responses among older individuals (8–12).
To evaluate the utility of self-reported cancer diagnosis in the HRS for future research on cancer and aging, we aimed to validate self-reported first incident cancer diagnoses from the HRS interviews against diagnostic claims from linked Medicare records. We used data from biennial HRS interviews from 2000 through 2016 linked to Medicare claims from 2000 through 2016 to evaluate the sensitivity, specificity, and κ for self-reported diagnosis of a first incident malignant cancer (all types), and for the seven most common cancer sites in adults ages ≥65 (breast, prostate, lung, colorectal/anal, uterine, kidney, and bladder; ref. 14).
Materials and Methods
Data source
The US Health and Retirement Study (HRS) is a nationally representative cohort of Americans ages >50 who are interviewed every two years (6). The HRS began in 1992 as a cohort of adults ages 51–61 years. It merged with the 1993 Asset and Health Dynamics of the Oldest Old (AHEAD) study of adults born before 1924 and expanded recruitment across birth cohorts until it comprised a cohort of adults ages >50 in 1998. “Refresher” cohorts ages 51–56 years are added every six years, to account for the aging of the original 1998 cohort (6). Across all interview years, approximately 80% of HRS participants who are Medicare enrollees consent to linking their Medicare records (6). In the present study, we used HRS interview data from 1998 to 2016 (1998 data were used for demographic information and to exclude individuals with a prior cancer history, not for analysis), including restricted data for self-reported cancer site, linked to Medicare Part A (inpatient, outpatient) and Part B (carrier) fee-for-service claims from 2000 to 2016 (15).
Study population
We included 17,495 HRS participants who were Medicare-eligible during our validation study period (January 2000–December 2016) and had no prior history of cancer (recorded either through self-report or Medicare claims). Eligible individuals who did not enroll in Medicare, were <65 at their Medicare eligibility date, became eligible for Medicare after their last HRS interview, or had a baseline interview prior to 1998 were excluded (N = 1,901). To ensure sufficient available claims for detecting cancer diagnoses, while minimizing the number of people excluded from the analysis due to a lack of continuous coverage, participants with less than 90% Parts A and B Medicare coverage during their eligible time period (N = 6,895), those with fewer than 12 months follow-up (N = 235), or those missing baseline characteristics (N = 222) were excluded, for a final sample of 8,242.
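The exclusion counts above can be checked with a short tally. This is only an arithmetic sketch: the counts are taken from the paragraph above, while the function name and the assumption that the exclusions were applied sequentially in the order listed are illustrative and not stated by the authors.

```python
# Counts as reported in the Study population paragraph.
INITIAL_N = 17_495
EXCLUSIONS = [
    ("not enrolled, <65 at eligibility, eligible after last interview, "
     "or baseline interview before 1998", 1_901),
    ("less than 90% Parts A and B coverage", 6_895),
    ("fewer than 12 months of follow-up", 235),
    ("missing baseline characteristics", 222),
]


def apply_exclusions(initial, exclusions):
    """Return (reason, n_excluded, n_remaining) for each step.

    Assumes the steps are applied one after another in the listed order;
    the intermediate totals depend on that assumption, the final total
    does not.
    """
    remaining = initial
    steps = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((reason, n_excluded, remaining))
    return steps


steps = apply_exclusions(INITIAL_N, EXCLUSIONS)
for reason, n_excluded, remaining in steps:
    print(f"-{n_excluded:>6,}  {reason}  ({remaining:,} remaining)")

final_n = steps[-1][2]
```

The final count reproduces the analytic sample of 8,242 reported in the text.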
Collection of HRS data and written informed consent is done in compliance with requirements of the University of Michigan's Institutional Review Board. This analysis was approved by the University of Michigan Health Sciences and Behavioral Sciences Institutional Review Board (HUM00170138) and additional informed consent was not necessary for the present analysis.
Identification of cancer diagnoses
Self-reported cancer diagnoses
First incident self-reported cancer diagnoses other than nonmelanoma skin cancer were identified in the HRS baseline interview (1998), follow-up interviews (2000–2016), and interviews at these same time points completed by a proxy respondent on behalf of the participant (usually a spouse or other family member who completes the interview when the participant is unable or unwilling). For respondents who died between follow-ups, exit interviews were completed by a knowledgeable proxy within two years of a participant's death to ascertain any cancer diagnosis that occurred following the respondent's final HRS interview. At each biennial interview, HRS participants or their proxies are asked whether they had received a new cancer diagnosis (excluding minor skin cancers) since their last interview and, if so, the date of diagnosis and in which organ or body part the cancer started (Supplementary Table S1; ref. 15).
Medicare claims–based cancer diagnoses
The cancer sites recorded in the HRS interview data were mapped onto ICD-9 and ICD-10 codes for identification in Medicare claims, resulting in a total of 52 malignant cancer types (Supplementary Table S1; refs. 16–19). Using the Medicare Chronic Conditions Warehouse cancer algorithm, claims-based cancer diagnoses were defined through the identification of a relevant ICD-9 or ICD-10 diagnosis code on either one inpatient claim or two outpatient or Part B carrier claims (20). The incident cancer diagnosis date was assigned as the first date the cancer diagnosis was detected on a claim within the two-year period.
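The claims rule described above (one inpatient claim, or two outpatient or Part B carrier claims) can be sketched as follows. This is a minimal illustration, not the Chronic Conditions Warehouse implementation: the `Claim` record, the setting labels, and the reading of "the two-year period" as a 730-day window between the first and second qualifying claim are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """Hypothetical, simplified claim record."""
    claim_date: date
    setting: str           # "inpatient", "outpatient", or "carrier"
    has_cancer_code: bool  # True if a relevant ICD-9/ICD-10 code is present


def claims_based_diagnosis_date(claims: List[Claim],
                                window_days: int = 730) -> Optional[date]:
    """Return the first qualifying diagnosis date, or None.

    A diagnosis qualifies with one inpatient claim, or with an
    outpatient/carrier claim followed by at least one more cancer claim
    within `window_days` (assumed window). The date returned is that of
    the first claim in the qualifying set.
    """
    cancer = sorted((c for c in claims if c.has_cancer_code),
                    key=lambda c: c.claim_date)
    for i, first in enumerate(cancer):
        if first.setting == "inpatient":
            return first.claim_date
        for later in cancer[i + 1:]:
            if (later.claim_date - first.claim_date).days > window_days:
                break  # later claims are sorted, so none can qualify
            return first.claim_date
    return None
```

Under these assumptions, a single outpatient claim does not establish a diagnosis, whereas a single inpatient claim does.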
Demographics
Demographic characteristics for participants were obtained from the HRS interview closest to their Medicare eligibility date and included participant age (in years, continuous), sex (male, female), race/ethnicity (non-Hispanic white, non-Hispanic black, non-Hispanic other, Hispanic), living arrangement (married/partnered, unmarried living with other, unmarried living alone), education (less than high school, high school, some college, college or more), household wealth (mean (SD) in 1998 dollars adjusted for inflation), and comorbid conditions (0, 1, 2+).
Statistical analysis
We calculated the overall sensitivity, specificity, and κ for self-reported cancer diagnoses in the HRS interviews from 2000 through 2016, using Medicare claim-based diagnoses (2000–2016) as the gold standard. We first performed these analyses for any incident cancer diagnosis (regardless of type), and then for each of the seven most common cancers in older adults (breast, prostate, colorectal/anal, lung, bladder, uterine, and kidney). The κ statistic was calculated to quantify the concordance between HRS self-reports and Medicare claims, while accounting for chance agreement (21). We also calculated the sensitivity, specificity, and κ for self-reported cancer diagnoses stratified by participant age (65–69, 70+), sex (male/female), and level of education (less than high school, high school, some college, and college graduate). The first incident self-reported cancer diagnosis in the time period 2000–2016 was compared with any claim identified diagnosis in the period 2000–2016 (mean time between self-report and claim was 1.6 years). As sensitive health information, self-reported cancer site is restricted use data. At the time of this study, these data were available only through the 2014 HRS interview (15). Hence, cancer site–specific validations were calculated using Medicare claim data through April 2015, which was the date of the last interview in the 2014 HRS wave.
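For readers who want to reproduce these measures from a 2×2 contingency table, the sketch below computes sensitivity, specificity, and Cohen's κ, treating the claims-based diagnosis as the reference. The function name and the example counts are illustrative only; they are not the study's data, which are reported in the supplementary contingency tables.

```python
def validity_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp: self-report yes, claims yes    fp: self-report yes, claims no
    fn: self-report no,  claims yes    tn: self-report no,  claims no
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement between the two sources.
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts, chosen only to exercise the function.
sens, spec, kappa = validity_metrics(tp=80, fp=10, fn=20, tn=890)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.3f}")
```

Because κ corrects observed agreement for the agreement expected by chance, it can be noticeably lower than specificity when the condition is rare, which is consistent with the pattern in the site-specific results.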
We also calculated the sensitivity, specificity, and κ for self-reported cancer diagnoses for each two-year HRS interview wave, as these figures may be useful for researchers conducting analyses using HRS data from specific time points. For each HRS interview wave beginning with 2000, we evaluated the participants' participation in the subsequent wave. Among participants who took part in two consecutive HRS interview waves, we included those who were eligible for Medicare for the full period between the two waves (approximately two years) with at least 90% fee-for-service coverage. Participants with an incident cancer were excluded from validity calculations in subsequent waves. We conducted three sensitivity analyses to evaluate the robustness of our primary analyses to changes in percentage of Medicare coverage and length of follow-up. First, we restricted our sample to individuals who had 100% Medicare fee-for-service coverage during their follow-up period. Second, we restricted to individuals with at least 24 months of follow-up in the Medicare data. Finally, we restricted to individuals with both 100% Medicare fee-for-service coverage and at least 24 months of follow-up in the Medicare data. All analyses were performed with Stata version 16.1 (StataCorp LLC).
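The primary and sensitivity-analysis samples differ only in their coverage and follow-up thresholds, which can be expressed as a single filter. The sketch below assumes a hypothetical `Participant` record with a fee-for-service coverage fraction and months of Medicare follow-up; the field names and the toy cohort are illustrative, while the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical per-participant summary of Medicare linkage."""
    pid: str
    ffs_coverage: float    # fraction of eligible time with FFS Parts A and B
    followup_months: int   # months of follow-up in the Medicare data


def restrict(cohort, min_coverage=0.90, min_followup_months=12):
    """Keep participants meeting both thresholds (inclusive)."""
    return [p for p in cohort
            if p.ffs_coverage >= min_coverage
            and p.followup_months >= min_followup_months]


# Thresholds for the primary analysis and the three sensitivity analyses.
SCENARIOS = {
    "primary": dict(min_coverage=0.90, min_followup_months=12),
    "100% coverage": dict(min_coverage=1.00, min_followup_months=12),
    "24+ months": dict(min_coverage=0.90, min_followup_months=24),
    "100% coverage and 24+ months": dict(min_coverage=1.00,
                                         min_followup_months=24),
}

# Toy cohort for illustration only.
cohort = [
    Participant("a", 1.00, 30),
    Participant("b", 0.95, 30),
    Participant("c", 1.00, 18),
    Participant("d", 0.85, 40),
    Participant("e", 0.92, 10),
]
sizes = {name: len(restrict(cohort, **kw)) for name, kw in SCENARIOS.items()}
print(sizes)
```

Each restricted sample would then be passed through the same validity calculations as the primary sample.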
Results
This analysis included 8,242 Medicare-eligible HRS participants ages ≥65 (Table 1). The mean age was 71.7 years (standard deviation 7.9 years). Most participants were women (59.3%), non-Hispanic white (79.4%), and married (64.3%). Most had a high school degree or less (64.3%) and no comorbid conditions; mean household wealth was $314,560 (Table 1). Overall, 20% of the sample reported a first incident cancer diagnosis over the period 2000 to 2016.
Characteristic | All participants, N = 8,242 (%) |
---|---|
Age, mean (SD) | 71.7 (7.9) |
Male | 3,354 (40.7) |
Race/ethnicity (missing n = 4) | |
Non-Hispanic white | 6,532 (79.3) |
Non-Hispanic Black | 1,034 (12.6) |
American Indian/Alaska Native | 78 (1.0) |
Asian/Pacific Islander | 38 (0.5) |
Non-Hispanic other | 27 (0.3) |
Hispanic | 529 (6.4) |
Education (years)b | |
Less than high school | 2,537 (30.8) |
High school | 2,759 (33.5) |
Some college | 1,480 (18.0) |
College or more | 1,465 (17.8) |
Household wealth, mean (SD)c | 292,341 (707,075) |
Charlson comorbidity | |
None | 4,928 (59.8) |
1 | 1,347 (16.3) |
2+ | 1,967 (23.9) |
Mean time from self-report to claim diagnosis (years) | 1.6 |
aData based on 90% coverage for those with at least 12 months of follow-up in Medicare data.
bOne individual is missing an education value.
cData extracted from different waves (1998–2016), with the majority from the 1998 HRS interview, adjusting for inflation based on 1998 dollars using the interview year.
For the full period from 2000 to 2016, self-reports of first incident cancer diagnosis for any of the 52 cancer sites included in the HRS had an overall 73.2% sensitivity and 96.2% specificity against Medicare claim data (κ = 0.73; Table 2). For the site-specific cancer diagnoses, sensitivities ranged from 44.7% (kidney cancer) to 75.0% (breast cancer), and specificities ranged from 99.2% (prostate cancer) to 99.9% (bladder, uterine, and kidney cancers; Table 2). Accounting for chance agreement, the κ values for interview self-reports and Medicare claims ranged from 0.58 (kidney cancer) to 0.83 (breast cancer; Table 2). The contingency tables presenting self-reported versus Medicare-defined cancer diagnoses are presented in Supplementary Tables S2 and S3.
Cancer typea | Sensitivity | Specificity | Kappa |
---|---|---|---|
Any cancer type (N = 8,242)b | 73.2% | 96.2% | 0.73 |
Colorectal/anal (N = 8,192) | 60.1% | 99.7% | 0.69 |
Lung (N = 8,192) | 59.2% | 99.4% | 0.68 |
Breast (N = 4,855) | 75.0% | 99.8% | 0.83 |
Prostate (N = 3,337) | 66.1% | 99.2% | 0.74 |
Bladder (N = 8,192) | 54.2% | 99.9% | 0.67 |
Uterine (N = 4,855) | 57.9% | 99.9% | 0.70 |
Kidney (N = 8,192) | 44.7% | 99.9% | 0.58 |
aSite-specific cancer details are available through April 2015.
bTotal sample size including participants with and without cancer.
Year (any malignant cancer) | Sensitivity (%) | Specificity (%) | Kappa |
---|---|---|---|
2002 | 67.0 | 99.4 | 0.73 |
2004 | 65.7 | 99.6 | 0.73 |
2006 | 65.5 | 99.4 | 0.71 |
2008 | 66.2 | 99.4 | 0.73 |
2010 | 69.7 | 99.5 | 0.77 |
2012 | 71.3 | 99.4 | 0.76 |
2014 | 64.6 | 99.6 | 0.72 |
2016 | 72.3 | 99.1 | 0.74 |
Considering the first incident cancer diagnosis of any cancer type by HRS interview wave, sensitivity values generally increased over time, and ranged from 64.6% (2014 wave) to 72.3% (2016 wave). Specificity values also tended to increase over time and ranged from 99.1% in 2016 to 99.6% in the 2004 and 2014 waves. κ was lowest in the 2006 wave (κ = 0.71) and highest in the 2010 wave (κ = 0.77; Table 3).
When restricting to individuals with 100% Medicare fee-for-service coverage during their follow-up (N = 7,320), individuals with at least 24 months of Medicare follow-up (N = 7,740), and individuals with both 100% coverage and 24 months of follow-up (N = 6,830), the results were all similar to those observed in our primary analyses (Tables 4–6). Stratified analyses generally showed lower sensitivity values for self-reports from individuals over age 70 and those with less education, but values were similar for both males and females (Supplementary Tables S4–S6).
Cancer type | Sensitivity | Specificity | Kappa |
---|---|---|---|
Any cancer type (N = 7,320) | 72.9% | 96.3% | 0.73 |
Colorectal/anal (N = 7,294) | 59.8% | 99.7% | 0.70 |
Lung (N = 7,294) | 60.6% | 99.4% | 0.69 |
Breast (N = 4,319) | 75.1% | 99.8% | 0.83 |
Prostate (N = 2,975) | 66.0% | 99.3% | 0.74 |
Bladder (N = 7,294) | 51.8% | 99.9% | 0.65 |
Uterine (N = 4,319) | 60.8% | 99.9% | 0.72 |
Kidney (N = 7,294) | 47.3% | 99.9% | 0.60 |
Cancer type | Sensitivity | Specificity | Kappa |
---|---|---|---|
Any cancer type (N = 7,740) | 73.2% | 96.2% | 0.73 |
Colorectal/anal (N = 7,684) | 59.5% | 99.7% | 0.69 |
Lung (N = 7,684) | 59.3% | 99.5% | 0.69 |
Breast (N = 4,551) | 75.7% | 99.8% | 0.84 |
Prostate (N = 3,133) | 66.7% | 99.2% | 0.75 |
Bladder (N = 7,684) | 54.4% | 99.9% | 0.67 |
Uterine (N = 4,551) | 57.9% | 99.9% | 0.70 |
Kidney (N = 7,684) | 45.1% | 99.9% | 0.58 |
Cancer type | Sensitivity | Specificity | Kappa |
---|---|---|---|
Any cancer type (N = 6,830) | 72.9% | 96.3% | 0.73 |
Colorectal/anal (N = 6,795) | 59.1% | 99.7% | 0.70 |
Lung (N = 6,795) | 60.8% | 99.5% | 0.70 |
Breast (N = 4,019) | 75.9% | 99.8% | 0.84 |
Prostate (N = 2,776) | 66.7% | 99.2% | 0.75 |
Bladder (N = 6,795) | 51.9% | 99.9% | 0.65 |
Uterine (N = 4,019) | 60.8% | 99.9% | 0.72 |
Kidney (N = 6,795) | 47.9% | 99.9% | 0.60 |
Discussion
In this large, population-representative cohort of older US adults, the sensitivity of self-reported first incident cancer was moderate and differed by cancer site, while the specificity was consistently high across cancer sites. Similarly, κ values varied by site but showed moderate to substantial agreement overall (22). These results suggest that the HRS provides an additional and valuable resource for population-based cancer and aging studies, particularly for studies involving breast cancer, or all cancer sites combined. Cancer site-specific studies involving other common cancers should consider also using Medicare claim data to maximize the validity of diagnostic data. Results from the present study provide guidance on the most valid sources of cancer diagnostic information in the US HRS over time and across cancer sites.
Our findings build upon prior studies finding that sensitivity of self-reported cancer diagnoses varies widely by cancer site, and that the specificity of self-reported cancer diagnoses is generally higher than sensitivity (9, 10, 12, 13). In prior studies, most self-reports were validated against cancer registry data and generally found sensitivity values from 60% to 80% and specificity values over 90% (7–10, 13). Site-specific sensitivity values ranged from 17% for colorectal/anal cancer to 96% for breast cancer in these studies (7, 9, 11, 12). Several other studies found the highest sensitivity for self-reported breast cancer, with colorectal/anal cancer and prostate cancer also having higher sensitivity than many cancers (7, 9, 12, 13). Although our sensitivity values were similar to these prior studies, across cancer sites they were 10–20 percentage points lower than values observed in studies with younger populations, but substantially higher than in a French sample of adults over age 75 where overall sensitivity was 20.6% (9, 12, 13). Although some of this difference could be due to using a different gold standard than previous studies, this difference is also likely due to differences in age distribution across samples. Older age has been associated with lower accuracy in self-reported cancer diagnoses, which could be due to impaired memory or increased number of medical conditions (8, 9, 12). Our stratified results support this hypothesis, showing lower sensitivity values among participants over age 70 (Supplementary Table S4). Additionally, it is possible that among sites with low sensitivity values, participants may have reported having cancer in another primary site. Researchers interested in studying cancer in older populations should consider using medical claims to bolster the validity of self-reported diagnostic information when available.
Breast cancer had the highest sensitivity of the site-specific self-reported diagnoses in this study. Although prior studies have argued that breast cancer has the highest sensitivity because it has clear-cut diagnostic criteria (23), it is not the only cancer site included in this study with clear-cut diagnostic criteria. The reasons driving differences in sensitivity by cancer site are likely multifactorial. Breast cancer is one of the two most common cancers in the United States, along with prostate cancer, which had the second highest sensitivity in this study (24). Attention to these two cancers is high (25–27), and they are more commonly diagnosed at early stages, making it easier to attribute the cancer to a specific site, unlike cancers diagnosed at later stages with regional and distant spread (28). Although the cancers we included for the site-specific validation include the most common cancer types among older adults, awareness and reporting vary by cancer site and may influence older adults' recall or understanding of a diagnosis (25–27, 29).
The HRS provides an additional and valuable resource for cancer and aging research, as it includes over two decades of population-representative longitudinal pre- and post-diagnostic information on a rich range of social, behavioral, genetic, geographic, and biomarker data that are not available in existing claims databases and cancer registry data. Researchers wishing to maximize the validity of site-specific cancer diagnoses for adults ages 65 and over should consider using Medicare claim records linked to the HRS data. However, the increased validity of diagnostic data needs to be weighed against the restricted age range when linking to Medicare and the costs and time associated with claims linkage, when compared with HRS alone. As such, the HRS self-reported cancer data may be sufficient for many research questions and offer increased generalizability, lower cost, and more timely access to diagnostic information than the use of Medicare claims. Therefore, researchers who wish to use the HRS data for cancer and aging research should consider which diagnostic data source best suits their research question and needs.
Strengths and limitations
Strengths of this study include its large, nationally representative sample of US adults ages ≥65, with nearly two decades of follow-up data on self-reported and Medicare-recorded cancer diagnosis data. We provide validation data both cumulatively and over two-year HRS interview waves across a 16-year period for all cancers combined as well as seven common cancer types. Our results provide a useful resource to researchers who wish to access these data for specific time points or cancer sites. However, this study also has potential limitations that warrant comment. First, the use of Medicare claim diagnoses as a gold standard for cancer diagnosis may increase the likelihood of false positives and restrict the age of our analytic sample to adults ages ≥65, which may result in lower sensitivity and specificity of self-reports compared with younger populations. Cancer registry data are most commonly considered the gold standard for cancer diagnoses, given the granular collection of diagnostic information including stage at diagnosis. However, prior studies investigating the validity of Medicare-recorded cancer diagnoses against SEER registry data find high sensitivity and specificity of diagnoses in Medicare claims, especially when utilizing both inpatient and outpatient claims (30–32). To minimize the likelihood of a false-positive diagnostic claim, we required at least two outpatient encounters or one inpatient stay with a cancer diagnosis. We would expect any error in cancer assignment in Medicare claims to reduce sensitivity and specificity, which would result in our estimates being conservative values. Second, linking HRS data to fee-for-service Medicare limits the generalizability of our sample to adults ages ≥65 with fee-for-service Medicare (33). Third, the HRS interview questions about cancer site inquired about the organ in which the cancer first started, rather than directly asking about the type of cancer. Errors in reporting cancer site according to this interview question may have reduced sensitivity and specificity for site-specific diagnoses.
Conclusions
As the population of older cancer survivors continues to grow, our findings highlight the utility of the US HRS as a population-representative, longitudinal data resource for research on cancer and aging. This work provides insight into when diagnoses self-reported in HRS study interviews alone are sufficient for research purposes, and when researchers should consider linking HRS data to Medicare claims to maximize the validity of diagnostic information. Researchers who wish to use the HRS data for cancer and aging research should consider which diagnostic data source best suits their research question and needs.
Authors' Disclosures
M.A. Mullins reports grants from NCI during the conduct of the study. L.P. Wallner reports grants from NCI during the conduct of the study as well as grants from American Cancer Society outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
M.A. Mullins: Conceptualization, supervision, investigation, methodology, writing–original draft, project administration, writing–review and editing. J.S. Kler: Data curation, software, formal analysis, validation, visualization, writing–review and editing. M.R. Eastman: Data curation, software, validation, visualization, writing–review and editing. M. Kabeto: Software, formal analysis, visualization, methodology, writing–review and editing. L.P. Wallner: Conceptualization, supervision, funding acquisition, methodology, writing–review and editing. L.C. Kobayashi: Conceptualization, resources, supervision, methodology, project administration, writing–review and editing.
Acknowledgments
This research was supported by the NCI at the NIH (R03CA241841 to L.C. Kobayashi and P30CA046952 to Dr. Eric Fearon) and the National Institute on Aging at the NIH (P30AG012846 to Dr. Vicki Freedman). M.A. Mullins received research support from the NCI institutional training grant T32-CA-236621. The US Health and Retirement Study is sponsored by the National Institute on Aging (U01AG009740) and is conducted by the University of Michigan. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the NCI or the National Institute on Aging.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.