Abstract
Evaluations of cancer etiology and of the safety and effectiveness of cancer treatments are predicated on large numbers of patients with sufficient baseline and follow-up data. To assess the feasibility of using the FDA Sentinel System's electronic healthcare data for surveillance of malignancy onset and examination of product safety, this study examined patterns of enrollment surrounding new-onset cancers.
Using a retrospective cohort of patients based on administrative claims, we identified incident events of 19 cancers among 292.5 million health plan members from January 2000 to February 2020 using International Classification of Diseases (ICD) diagnosis codes. Annual incident cases were stratified by sex, age, medical and drug coverage, and insurer type. Descriptive statistics were calculated for observable time prior to and following diagnosis.
We identified 10,697,573 incident cancer events among members with medical coverage. When drug coverage was additionally required, the number of incident cancers was reduced by 41%. Medicare data contributed 61% of cases, with duration trends similar to those of other insurers. Mean duration of follow-up prior to diagnosis ranged from 4.0 to 4.6 years, whereas follow-up post diagnosis ranged from 1.1 to 3.3 years. Approximately a third (36.1%) had at least 2 years both prior to and following diagnosis.
The FDA Sentinel System's electronic healthcare data may be useful for characterizing relatively short latency cancer risk, examining cancer drug utilization and safety after diagnosis, and conducting surveillance for acute adverse events among patients with cancers.
A national distributed system with electronic health data, the Sentinel system provides opportunity for rapid pharmacoepidemiologic assessments relevant in oncology.
Introduction
Clinical trials that support medical product approvals are often limited in their ability to detect cancer risk (1). The limitations are partly due to small sample sizes and subsequent insufficient power to detect differences in rare events, relatively short trial durations, and controlled conditions which may not reflect real-world settings. Observational studies that retrospectively use existing data from electronic healthcare data sources may present an alternative option for evaluating cancer risk in the post-approval setting if the source of real-world data (RWD) is deemed relevant and reliable for supporting regulatory decision-making (2).
The FDA has issued several post-market requirement (PMR) studies to evaluate the risk of malignancy after drug exposure, an important and relatively frequently needed outcome to evaluate for approved drug products. In addition, the safety of required therapeutic cancer treatments is also a topic for investigation as oncology drugs constitute 27% of all new drug approvals in the United States since 2010 (3, 4). To enhance post-market drug safety capabilities across multiple therapeutic areas, the FDA has made use of longitudinal electronic healthcare data through the Sentinel Distributed Database to conduct studies in real-world settings (5). The FDA Sentinel Distributed Database constitutes a national post-marketing surveillance system that contains curated and routinely updated electronic healthcare data (6).
Although Sentinel's Active Risk Identification and Analysis (ARIA) system has been deemed sufficient to study short-term risk of lymphoma and renal cell carcinoma for select drug products (7–10), cancer data remain generally uncharacterized in the Sentinel System. Some important considerations are needed before utilizing data systems for population-based observational drug-safety studies conducted after drug approval, including member sample size and observation time in the database in relation to disease latency and treatment duration (11).
In this study, we seek to explore 19 different cancer outcomes to understand trends in cancer incidence and describe the availability of longitudinal information surrounding cancer onset, both before and after cancer diagnosis. This study was initiated to help understand whether the available members contributing to the data system have sufficient longitudinal experience in the database to support studies of cancer as a drug safety outcome, and to understand whether the data could support descriptive assessments of cancer therapeutic regimens.
Materials and Methods
Data source
The Sentinel System is a national distributed data network, which is largely comprised of medical claims covering over 700 million person-years of longitudinal observation time, and captures information on patient demographics, medical encounters, and outpatient pharmacy dispensings during health plan enrollment periods (12). Fourteen Data Partners, listed in the acknowledgments, contributed data for this study between 2000 and 2020. The Data Partners included eight integrated delivery systems and six claims-based national health service plans or healthcare organizations, one of which includes 100% fee-for-service Medicare. Sentinel uses a common data model for standardization and harmonization across Data Partners (13). This Sentinel project is a public health surveillance activity conducted under the authority of the FDA and accordingly, is not subject to Institutional Review Board oversight [Basic HHS Policy for Protection of Human Research Subjects, 45 CFR §46.102(l)(2)].
Study cohort
The study examined a retrospective cohort of persons continuously enrolled in a health plan with medical and drug coverage for at least 365 days prior to the cancer diagnosis of interest. In addition, we removed the drug coverage requirement in a sensitivity analysis to explore the impact on sample size. To accommodate administrative gaps in enrollment, we allowed gaps of up to 45 days. The study included health plan members of all ages from January 1, 2000, to February 29, 2020.
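The enrollment rule described above can be sketched in a few lines. This is an illustration only, not the Sentinel analytic code: the function names, the representation of enrollment as (start, end) date pairs, and the exact way gap days and lookback days are counted are all assumptions made for the example.

```python
from datetime import date

MAX_GAP_DAYS = 45        # allowable administrative gap, from the text
MIN_LOOKBACK_DAYS = 365  # continuous enrollment required before diagnosis


def bridge_spans(spans, max_gap_days=MAX_GAP_DAYS):
    """Merge (start, end) enrollment spans separated by at most max_gap_days."""
    merged = []
    for start, end in sorted(spans):
        if merged:
            # Uncovered days between the previous span and this one (assumed counting rule).
            gap = (start - merged[-1][1]).days - 1
            if gap <= max_gap_days:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


def is_eligible(spans, diagnosis_date, min_lookback_days=MIN_LOOKBACK_DAYS):
    """True if a bridged span covers the diagnosis date and at least
    min_lookback_days of enrollment precede it."""
    for start, end in bridge_spans(spans):
        if start <= diagnosis_date <= end:
            return (diagnosis_date - start).days >= min_lookback_days
    return False
```

In this sketch, a member whose coverage lapses for one month is treated as continuously enrolled, whereas a lapse longer than 45 days restarts the lookback clock at the next enrollment start.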
We identified primary invasive cancers of the following 19 sites: breast (separately for females, males), lung and bronchus, colon, rectum, thyroid, melanoma of the skin, non-Hodgkin lymphoma, pancreas, leukemia, kidney and renal pelvis, prostate, urinary bladder, liver and intrahepatic bile duct, esophagus, anus, cervix, endometrial, other endocrine. The cancers included are considered among the leading sites of new cancer cases in the United States by gender (14).
Defining cancer types
International Classification of Diseases (ICD) diagnosis codes were used to define the cancer sites. Cancer diagnosis codes were derived from the Chronic Conditions Data Warehouse (CCW) and reviewed by subject matter experts from the SEER program (Supplementary Table S1). Prior to confirming the coding definition for each cancer type, a trend analysis was conducted to assess definition integrity across the switch from the 9th revision (ICD-9-CM) to the 10th revision (ICD-10-CM). The trend analysis assessed potential discontinuities in the prevalence of each ICD code-defined condition in the 6 months prior to and following the October 2015 ICD transition (15).
We defined a cancer event based on observation of two diagnosis codes associated with the cancer of interest within 2 months of each other, in any care setting (16). Two diagnosis codes on the same day were not considered a qualifying event; secondary, in situ and benign cancers were excluded. Only the first qualifying incident diagnosis for each person was included, per cancer type. Incidence was defined with a 365-day washout for any diagnosis code specific to the site of interest. We assessed each cancer type individually, in addition to aggregated analyses with incidence defined as a new onset cancer after absence of any of the 19 cancer types of interest during the preceding 365 days. The aggregated evaluation was expanded to include a sensitivity analysis with event definitions of 3 and 6 months between two required diagnosis codes, instead of 2 months.
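To make the event definition concrete, the following sketch applies it to the diagnosis dates of one person for one cancer site. It is a simplified illustration under stated assumptions, not the study's implementation: "2 months" is approximated as 60 days, the function and parameter names are hypothetical, and exclusion of secondary, in situ, and benign codes is assumed to have happened upstream.

```python
from datetime import date

WINDOW_DAYS = 60    # "within 2 months"; the exact day count is an assumption
WASHOUT_DAYS = 365  # washout period from the text


def first_incident_date(dx_dates, enrollment_start,
                        window_days=WINDOW_DAYS, washout_days=WASHOUT_DAYS):
    """Return the date of the first qualifying diagnosis, or None.

    dx_dates: dates of claims carrying a diagnosis code for the site of interest.
    """
    days = sorted(set(dx_dates))  # codes on the same day collapse to one date
    for i, first in enumerate(days):
        # The full washout period must be observable in the database.
        enrolled_long_enough = (first - enrollment_start).days >= washout_days
        # No site-specific code during the washout period.
        clean_washout = i == 0 or (first - days[i - 1]).days > washout_days
        # A second code on a later day within the window confirms the event.
        confirmed = i + 1 < len(days) and (days[i + 1] - first).days <= window_days
        if enrolled_long_enough and clean_washout and confirmed:
            return first
    return None
```

Passing a larger `window_days` (for example, roughly 91 or 183 days) mimics the 3- and 6-month sensitivity definitions: pairs of codes that are too far apart under the primary definition can then qualify.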
Follow-up time
We captured each person's observable time in the database in the periods prior to and following the qualifying event. Observable time prior to the event began with the person's enrollment start and ended the day prior to the date of the first qualifying diagnosis code. Time following the event began on the day of the first qualifying diagnosis code and ended on the first of any of the following: disenrollment, end of Data Partner data availability, or death. Death in the Sentinel System is defined as a medical encounter with a discharge status of expired, or a record in the Death Table which is populated from the National Death Index, State Death files, tumor data, and other local sources.
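The two observation windows can be expressed as a short function. This is a sketch under assumptions rather than the Sentinel code: the function name and return shape are hypothetical, both windows are counted inclusively of their end dates, and when two censoring dates coincide the first one listed is reported as the reason.

```python
from datetime import date, timedelta


def observable_time(enrollment_start, diagnosis_date, disenrollment_date,
                    data_end_date, death_date=None):
    """Return (days_prior, days_following, reason_for_end) for one person."""
    # Prior window: enrollment start through the day before diagnosis.
    prior_end = diagnosis_date - timedelta(days=1)
    days_prior = (prior_end - enrollment_start).days + 1

    # Following window: diagnosis date through the earliest censoring date.
    end_points = {"disenrollment": disenrollment_date,
                  "end of data": data_end_date}
    if death_date is not None:
        end_points["death"] = death_date
    reason = min(end_points, key=end_points.get)
    days_following = (end_points[reason] - diagnosis_date).days + 1
    return days_prior, days_following, reason
```

For example, a member enrolled on January 1, 2010, diagnosed on January 1, 2012, and with a recorded death on December 31, 2013, would contribute 730 days before diagnosis and 731 days after it, with death as the reason for the end of follow-up.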
Analysis
Analyses were stratified by coverage type and source data (administrative claims collectively, Medicare individually, and integrated delivery system). For each cancer site, we summarized the frequency of cases and incidence rates by sex, age group (0–44, 45–64, 65+ years), and year of diagnosis. Descriptive statistics were calculated for follow-up time, cancer site, and sex. The frequency of incident cases was additionally examined by increasing length of duration (≥1, ≥2, ≥3, ≥4 years of patient longitudinal follow-up) separately for time prior to and following cancer diagnosis. The reason for the end of each patient's follow-up time in the system following diagnosis was examined across cancer sites.
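The two summary measures named in this section, an incidence rate per 10,000 eligible person-years and the share of cases meeting each follow-up threshold, reduce to simple arithmetic. The helper names below are hypothetical and any inputs used with them here are made up for illustration; they are not study data.

```python
def incidence_rate_per_10k(cases, person_years):
    """Crude incidence rate per 10,000 eligible person-years."""
    return 10_000 * cases / person_years


def share_with_at_least(durations_years, thresholds=(1, 2, 3, 4)):
    """Fraction of cases whose follow-up is at least each threshold (in years).

    Assumes a non-empty list of durations.
    """
    n = len(durations_years)
    return {t: sum(d >= t for d in durations_years) / n for t in thresholds}
```

Because the thresholds are cumulative, a case with 4.2 years of follow-up counts toward every category from ≥1 through ≥4 years, so the shares can only decrease as the threshold increases.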
Ethics statement
This Sentinel project is a public health surveillance activity conducted under the authority of the FDA and accordingly, is not subject to institutional review board oversight.
Data availability statement
The data generated in this study are not publicly available. Sentinel uses a distributed data approach in which Data Partners maintain physical and operational control of their own electronic health data after transforming it into a common data model. Sentinel does not save, maintain, or post individual level datasets to preserve patient privacy. However, related study results will be made available subsequent to publication on the Sentinel Initiative website (17).
Results
Cohort size by coverage type and data partner source
The assessment of potential impact of the ICD-10-CM transition on identification of cancer events in the Sentinel distributed database confirmed relatively consistent diagnostic code mappings across the 2015 switch from the ICD-9 to ICD-10 era, thereby allowing us to capture events using available codes across the study period (15). Among the 157 million remaining health plan members with medical coverage and drug enrollment coverage, we identified 5,166,982 members with at least one incident cancer diagnosis (Table 1) for a total of 6,638,134 cancer events. When changing the event definition to 3 and 6 months between two qualifying claims, the number of persons increased by 3% and 12%, respectively.
| | Medical coverage | | | | Medical and drug coverage | | | |
|---|---|---|---|---|---|---|---|---|
| | All Data Partnersᵃ | Administrative Claims | Medicareᵇ | Integrated Delivery System | All Data Partnersᵃ | Administrative Claims | Medicareᵇ | Integrated Delivery System |
| Eligible members | 226,953,399 | 212,783,554 | 50,231,064 | 14,169,845 | 156,959,321 | 143,578,255 | 34,635,110 | 13,381,066 |
| (% of All Data Partners) | | (94%) | (22%) | (6%) | | (91%) | (22%) | (9%) |
| Total diagnosis events | 10,697,573 | 10,234,169 | 6,693,274 | 463,404 | 6,638,134 | 6,203,001 | 4,162,167 | 435,133 |
| (% of All Data Partners) | | (95%) | (61%) | (5%) | | (93%) | (61%) | (7%) |
| Total members with any diagnosis, 2-month definitionᶜ | 8,318,422 | 7,926,806 | 5,067,625 | 391,616 | 5,166,982 | 4,798,353 | 3,140,739 | 368,629 |
| 3-Month definitionᶜ | 8,604,450 | 8,201,441 | 5,246,836 | 403,009 | 5,345,208 | 4,965,824 | 3,253,923 | 379,384 |
| (% change) | (3%) | (3%) | (4%) | (3%) | (3%) | (3%) | (4%) | (3%) |
| 6-Month definition | 9,294,367 | 8,869,478 | 5,689,126 | 424,889 | 5,775,214 | 5,375,208 | 3,535,363 | 400,006 |
| (% change) | (12%) | (12%) | (12%) | (8%) | (12%) | (12%) | (13%) | (9%) |
| Demographics (% of members with any diagnosis, 2-month definition) | | | | | | | | |
| Female | 4,110,168 | 3,902,410 | 2,422,048 | 207,758 | 2,659,230 | 2,463,435 | 1,609,090 | 195,795 |
| | (49%) | (49%) | (48%) | (53%) | (51%) | (51%) | (51%) | (53%) |
| Male | 4,208,038 | 4,024,200 | 2,645,577 | 183,838 | 2,507,578 | 2,334,763 | 1,531,649 | 172,815 |
| | (51%) | (51%) | (52%) | (47%) | (49%) | (49%) | (49%) | (47%) |
| 0–44 years of age | 309,265 | 282,907 | 29,664 | 26,358 | 201,423 | 176,796 | 24,856 | 24,627 |
| | (4%) | (4%) | (1%) | (7%) | (4%) | (4%) | (1%) | (7%) |
| 45–64 years of age | 1,739,728 | 1,587,222 | 376,928 | 152,506 | 1,135,324 | 991,994 | 271,085 | 143,330 |
| | (21%) | (20%) | (7%) | (39%) | (22%) | (21%) | (9%) | (39%) |
| 65+ years of age | 6,269,429 | 6,056,677 | 4,661,033 | 212,752 | 3,830,235 | 3,629,563 | 2,844,798 | 200,672 |
| | (75%) | (76%) | (92%) | (54%) | (74%) | (76%) | (91%) | (54%) |
ᵃ Counts presented in “All Data Partners” represent the aggregate of cancers captured in “Administrative Claims” and “Integrated Delivery System.”
ᵇ Counts presented in “Medicare” are a subset of the larger category “Administrative Claims.”
ᶜ The 2-, 3-, and 6-month definitions refer to the maximum time between two claims with ICD diagnosis codes for the same cancer site, which forms the basis for identifying members with cancers.
As would be expected because cancers occur more frequently among older individuals, Medicare, which accounted for 22% of eligible members with administrative claims, contributed 61% of cancer events (Table 1). Proportions of males and females were mostly balanced across coverage types and Data Partners, although the proportion of women was slightly higher in the integrated delivery systems. The majority of cancer diagnoses were among persons 65+ years of age, for both coverage types and all contributing Data Partners.
Review of cancer types
Figure 1A demonstrates that female breast cancer was the most common diagnosis among persons with medical and drug coverage, representing over 1 million women and an incidence rate of 36.5 per 10,000 eligible person-years (py). The cancer type with the smallest number of members diagnosed was male breast cancer, representing 9,751 members and an incidence rate of 0.4 per 10,000 eligible py. Among non-sex-specific cancers, lung cancer was diagnosed most frequently (869,654 members, 15.6 per 10,000 eligible py) and anal cancer was diagnosed least frequently (46,767 members, 0.8 per 10,000 eligible py). Among men, the most common diagnosis was prostate cancer, with over 1 million men and an incidence rate of 40.4 per 10,000 eligible py. Men had a higher incidence rate for all but three cancers: anal cancer, other endocrine cancers, and thyroid cancer (Table 2).
| Cancer type | Female, N | Female, incidence rate | Male, N | Male, incidence rate | 0–44 years, N | 0–44 years, incidence rate | 45–64 years, N | 45–64 years, incidence rate | 65+ years, N | 65+ years, incidence rate | Mean age in years (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Anal | 27,023 | 0.9 | 19,741 | 0.8 | 1,802 | 0.1 | 14,442 | 1.0 | 30,523 | 1.6 | 65.0 (13.2) |
| Breast, female | 1,064,111 | 36.5 | — | N/A | 48,002 | 4.2 | 277,775 | 38.7 | 738,370 | 70.6 | 63.3 (12.3) |
| Breast, male | — | N/A | 9,749 | 0.4 | 234 | 0.0 | 1,900 | 0.3 | 7,617 | 1.0 | 67.4 (12.3) |
| Cervical | 51,328 | 1.7 | — | N/A | 7,989 | 0.7 | 15,655 | 2.1 | 27,686 | 2.5 | 55.0 (14.1) |
| Colon | 276,193 | 9.2 | 242,241 | 9.4 | 11,691 | 0.5 | 92,761 | 6.5 | 414,002 | 22.4 | 68.2 (12.7) |
| Endometrial | 192,780 | 6.4 | — | N/A | 6,504 | 0.6 | 53,783 | 7.3 | 132,501 | 12.0 | 63.6 (11.4) |
| Esophageal | 24,171 | 0.8 | 67,020 | 2.6 | 1,266 | 0.1 | 19,102 | 1.3 | 70,830 | 3.8 | 67.7 (11.1) |
| Kidney or pelvis | 105,577 | 3.5 | 150,053 | 5.8 | 9,181 | 0.4 | 57,714 | 4.0 | 188,743 | 10.1 | 64.6 (13.4) |
| Leukemia | 113,704 | 3.8 | 132,734 | 5.1 | 16,428 | 0.7 | 40,712 | 2.9 | 189,309 | 10.2 | 62.4 (19.1) |
| Liver | 53,487 | 1.8 | 82,026 | 3.2 | 2,545 | 0.1 | 34,013 | 2.4 | 98,966 | 5.3 | 66.2 (11.7) |
| Lung | 448,364 | 15.0 | 421,251 | 16.3 | 7,299 | 0.3 | 146,584 | 10.3 | 715,771 | 38.6 | 69.4 (10.5) |
| Melanoma | 244,252 | 8.2 | 337,032 | 13.0 | 33,528 | 1.5 | 118,234 | 8.3 | 429,530 | 23.2 | 63.8 (13.8) |
| Non-Hodgkin's lymphoma | 175,777 | 5.9 | 183,222 | 7.1 | 21,998 | 1.0 | 70,544 | 5.0 | 266,467 | 14.4 | 64.3 (15.3) |
| Other endocrine | 251,076 | 8.4 | 12,135 | 0.5 | 11,975 | 0.5 | 70,617 | 5.0 | 180,630 | 9.7 | 62.6 (13.0) |
| Pancreatic | 104,070 | 3.5 | 94,550 | 3.6 | 2,550 | 0.1 | 32,216 | 2.3 | 163,862 | 8.8 | 69.0 (11.7) |
| Prostate | — | N/A | 1,017,028 | 40.4 | 2,369 | 0.2 | 181,139 | 26.6 | 833,558 | 121.1 | 68.7 (9.4) |
| Rectal | 95,520 | 3.2 | 102,761 | 4.0 | 6,738 | 0.3 | 48,814 | 3.4 | 142,737 | 7.7 | 65.6 (12.5) |
| Thyroid | 115,395 | 3.9 | 41,755 | 1.6 | 32,068 | 1.4 | 53,895 | 3.8 | 71,189 | 3.8 | 53.8 (14.4) |
| Urinary tract or bladder | 106,575 | 3.6 | 275,186 | 10.6 | 3,674 | 0.2 | 46,336 | 3.2 | 331,766 | 17.9 | 71.1 (11.3) |
The lowest and highest mean (SD) ages occurred among persons with thyroid cancer, 53.8 years (14.4), and bladder cancer, 71.1 years (11.3), respectively. Incidence rates increased with age for all cancers, although the highest incidence rate per age group shifted from female breast cancer to prostate cancer for persons 65+ years of age at diagnosis. Annual incidence rates for all cancer types, for 2000 to 2020, are presented in Fig. 1B. These rates reflect new cancer diagnoses among eligible members accrued and available for study in the Sentinel Distributed Database during the study period. Notably, 2010 marked the beginning of data availability for Medicare as a Data Partner, and only a select few Data Partners reported data into the first 2 months of 2020 at the time of the study.
Evaluation of follow-up time
Among members with any of the cancers of interest, nearly half (2,294,166/5,166,982 = 44.4%) had at least 4 years of observable time prior to the diagnosis, approximately a quarter (1,325,640/5,166,982 = 25.7%) had at least 4 years of observable time following the diagnosis, and more than a third (1,864,247/5,166,982 = 36.1%) had at least 2 years both prior to and following diagnosis. These proportions were comparable for men and women (Supplementary Table S2).
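The proportions quoted in this paragraph can be reproduced directly from the reported counts. The short check below uses only numbers stated in the text; the variable names and dictionary labels are illustrative.

```python
# Members with any incident cancer (medical and drug coverage), from the text.
members = 5_166_982

counts = {
    ">=4 years prior": 2_294_166,
    ">=4 years following": 1_325_640,
    ">=2 years prior and following": 1_864_247,
}

# Percentage of members in each category, rounded to one decimal place.
shares = {label: round(100 * n / members, 1) for label, n in counts.items()}
# shares -> 44.4, 25.7, and 36.1 percent, respectively
```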
By cancer type, observable time prior to the diagnosis of interest ranged from 366 days to 19.6 years (Supplementary Table S3). Following diagnosis, observable time ranged from 1 day to 18.6 years. The mean durations of these periods ranged, by cancer type, from 4.0 (SD 2.6) to 4.6 (SD 2.9) years prior to diagnosis and 1.1 (SD 1.6) to 3.3 (SD 2.8) years following diagnosis (Fig. 2A). Follow-up terminated most frequently because of disenrollment; death was the least common identified reason for ending follow-up. The exception was cancers with lower survival rates (lung, esophageal, liver, and pancreatic cancer), for which time following diagnosis ended due to death for more than 30% of the members (Fig. 2B).
Discussion
Stemming from the 21st Century Cures Act, efforts to harness information technology for evaluation of cancer treatments are of great public health interest (18, 19). Our findings suggest that Sentinel is a potentially valuable source of real-world data for drug-safety studies in the field of oncology. With over 10.5 million new-onset cancer events identified from 2000 to 2020, Sentinel may be useful for characterizing population-level cancer risk in the post-approval setting and for examining cancer drug utilization after diagnosis, safety outcomes after diagnosis for which drug use is hypothesized to be a major contributing factor, and acute adverse events. However, Sentinel may be limited in its ability to examine rare cancers that require larger sample sizes, or cancer risks of newly approved drugs for which the duration of exposure to the medical product is limited. As would be expected, the addition of Medicare as a Data Partner to the Sentinel Distributed Database in 2010 doubled the number of incident cancer events, which preferentially occur among older individuals.
Pharmacoepidemiology assessments of drug products post-approval require sufficient time for drug exposure-outcome observations. This study found duration of patient follow-up to be relatively consistent across cancer type prior to diagnosis. As expected, the duration of patient follow-up after cancer diagnosis varied by cancer type, reflecting the biology of certain cancer types having lower survival rates (lung 20.5%, esophageal 19.9%, liver 19.6%, pancreas 10.0%; ref. 14). We observed that for these cancers with shorter time following diagnosis, discontinuation was due to evidence of death for more than 30% of the members. Of note, capture of death information in the Sentinel Distributed Database is incomplete as it is limited mostly to in-hospital deaths and restricted by a 2- to 4-year data lag and Data Partner-specific dependencies (12).
As Sentinel continues to mature and evolve, further investigation may help to reassess its utility for rare cancers and provide insights with respect to appropriateness for post-market safety studies. That said, this study was not intended to validate a claims-based cancer diagnostic algorithm for any specific cancer type. Rather, we sought to broadly identify members with a specific cancer diagnosis to assess potential case numbers and observation times before and after a new cancer diagnosis for insured members in the Sentinel Distributed Database. For a broad cancer event identification, we required two claims with a cancer-specific ICD code within no more than 2 months (3 and 6 months in sensitivity analyses), to minimize potential misclassification of patients with cancers. For this reason, and because the underlying population (insured vs. general population) as well as the reporting mechanism (claims for care services versus facility reporting of individuals with cancer diagnoses) differ from the authoritative SEER Program cancer statistics, one might expect higher cancer incidence rates identified in this study than in SEER data. A crude comparison of cancer incidence rates among members 65 years old and older in the present Sentinel study population to those in the SEER 18 Registry 2000 to 2018 data confirms this assumption (20).
This analysis had additional limitations. This study did not include a comprehensive list of cancer types, and because the focus was on new-onset cancer diagnoses, analyses did not include secondary cancers, recurrences, or cancers in remission. Like other claims-based data systems, Sentinel lacks clinical details (cancer stage at diagnosis, tumor markers) that may be useful when considering cancer risk studies and new oncology treatments (21). However, the purpose of our analyses was to assess the potential use of the Sentinel Distributed Database to study drug exposure–outcome relationships in oncology. Cancers could be risk outcomes associated with therapeutics, or cancer therapeutics could be exposures of interest among patients with cancers. Our findings indicate that the Sentinel Distributed Database could be a valuable, large, readily analyzable data source for both purposes, with two caveats: despite the large number of cancer events identified, rare cancers may not be sufficiently represented; and pre-cancer observation times of members in the database may be too short to detect long-latency cancer risks. As such, Sentinel may constitute a first step to target more detailed research in combination with other real-world data sources, such as cancer registries or electronic healthcare records, that may enhance or augment information about cancer outcomes and cancer treatment utilization and safety (6).
In conclusion, a national distributed system with routinely updated electronic health data and standardized analytic tools provides opportunities for Sentinel to contribute rapid and large-scale assessments of cancer risks associated with therapeutics and may serve as a useful tool for assessing utilization and safety of cancer therapeutics. Enrolled time before and after a cancer diagnosis may provide sufficient duration of follow-up for studies in which cancer treatment-related adverse events are of relatively short latency, or studies in which cancer diagnosis is a component of the cohort eligibility. This study provides a resource for future studies aiming to answer a variety of cancer-related questions for FDA. We developed flexible, reusable, publicly available analytic code to assess impacts of the ICD-9-CM to ICD-10-CM coding era switch, sex-specific annual crude counts of new-onset cancers by age group and coverage type, and duration of observable time prior to and following incident cancer diagnoses. That said, considerations of relevancy and reliability are important when determining the appropriateness of a data source, including Sentinel, for generating real-world evidence in support of regulatory decision-making (2). Feasibility assessments that additionally consider study-specific eligibility criteria and medical product utilization are warranted prior to performing in-depth pharmacoepidemiologic studies examining a particular drug product and safety issue.
Authors' Disclosures
A.K. Wagner reports grants from FDA during the conduct of the study. C.E. Leonard reports grants from NIH, Veterans Affairs, and CDC; grants and other support from FDA; nonfinancial support from John Wiley & Sons; personal fees from University of Florida, University of Massachusetts, CIHR, and Consortium for Medical Marijuana Clinical Outcomes Research; other support from Reagan-Udall Foundation, Pfizer, and Sanofi outside the submitted work. No disclosures were reported by the other authors.
Disclaimer
The opinions and views expressed in this article are the authors’ alone and not those of the FDA nor the NIH.
Authors' Contributions
N.R. Haug: Data curation, formal analysis, writing–original draft, writing–review and editing. A.K. Wagner: Methodology, writing–review and editing. K.A. McGlynn: Methodology, writing–review and editing. C.E. Leonard: Methodology, writing–review and editing. M.D. Nguyen: Methodology, writing–review and editing. J.M. Major: Conceptualization, methodology, writing–original draft, writing–review and editing.
Acknowledgments
This project was supported by the FDA contract no. HHS75F40119D10037. The opinions in this manuscript are those of the authors and not necessarily those of the FDA. This study was performed while J.M. Major was employed with CDER; she is currently with CDRH. Many thanks are due to Data Partners who provided data used in the analysis. Data Partners who provided data used in the analysis included Aetna, a CVS Health company; Duke University School of Medicine, Department of Population Health Sciences, through the Centers for Medicare and Medicaid Services which provided data; Harvard Pilgrim Health Care Institute; HealthCore/Anthem Inc., Translational Research for Affordability and Quality; HealthPartners Institute; Humana Healthcare Research Inc.; Kaiser Permanente Colorado Institute for Health Research; Kaiser Permanente Hawaii Center for Integrated Health Care Research; Kaiser Permanente Mid-Atlantic States, Mid-Atlantic Permanente Research Institute; Kaiser Permanente Northern California, Division of Research; Kaiser Permanente Northwest Center for Health Research; Kaiser Permanente Washington Health Research Institute; Marshfield Clinic Research Institute; OptumInsight Life Sciences Inc.; Vanderbilt University Medical Center, Department of Health Policy, through the TennCare Division of the Tennessee Department of Finance & Administration which provided data.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).