Abstract
Identifying changes in geographic disparities of cancer mortality reveals locations where cancer prevention and control efforts should be targeted. We use recent cancer surveillance data to demonstrate the geographic disparity of major cancer mortality rates in the United States and how it has shifted compared with previous data.
This cross-sectional study used the 2018 to 2022 county-level mortality rates of colorectal, lung, breast, and prostate cancers from the Centers for Disease Control and Prevention mortality data. Suppressed death counts were imputed using spatial regression models. Getis–Ord Gi* statistics were used to evaluate the spatial clustering of county mortality. Identified hotspot counties were visualized and compared with the literature to assess changes in hotspot patterns.
A total of 3,108 US mainland counties were included. Cancer mortality rates were significantly higher in 244 counties for colorectal, 456 for lung, 147 for breast, and 180 for prostate cancers. Hotspot areas were central Appalachia (colorectal and lung cancers), Lower Mississippi Delta (colorectal, breast, and prostate cancers), Midwest (colorectal and lung cancers), north Michigan/Wisconsin (lung and prostate cancers), north Florida (lung cancer), and the West (prostate cancer).
West central Appalachia and the Lower Mississippi Delta continue to be hotspots for major cancer types. In contrast, the previously identified eastern North Carolina/Virginia hotspots have shrunk, east Oklahoma and north Florida have emerged as new hotspots for lung cancer, and several hotspots have emerged in the West for prostate cancer.
This study updated the analyses for geospatial disparity in major cancer mortality since 2018, illustrating recent changes in the disparity pattern and pinpointing areas that cancer prevention and control efforts should target.
Introduction
Cancer is a leading cause of death worldwide and the second leading cause in the United States. Two million cancer diagnoses and 611,720 cancer-related deaths are expected to occur in the United States in 2024 (1), approximating 1,676 deaths per day.
There are geographic disparities in cancer outcomes (e.g., incidence and mortality) in the United States that may arise because of differences in (i) demographic characteristics, such as age, sex, race, and ethnicity; (ii) health care support, such as accessibility to cancer screening, medical insurance, and primary care providers; (iii) behavioral risk factors, such as smoking, drinking, diet, and physical activity; and (iv) environmental risk factors, such as pollution (2). Various databases were established to monitor population-level cancer outcomes and are extensively used by the public and researchers to evaluate geographic disparity across the United States. The Surveillance, Epidemiology, and End Results database has collected state-wide registry data since the 1970s. Population-based survey databases, namely the Behavioral Risk Factor Surveillance System (BRFSS), National Health Interview Survey, and National Health and Nutrition Examination Survey (3), also collect data on cancer outcomes and common risk factors. Additionally, the Centers for Disease Control and Prevention (CDC) maintains the US Cancer Statistics database. Although these databases provide convenient direct visualization (e.g., Supplementary Figs. S1–S4) for evaluating geographic areas for cancer mortality rates, such maps are prone to bias because of the choice of cut points and unstable small-area estimates.
Hotspot analyses with spatial clustering incorporate neighborhood-area correlation and county-size heterogeneity and are preferred over simple mapping to locate geographic patterns of cancer mortality. Many researchers have used hotspot analyses of cancer mortality rates for specific subpopulations or states (4–14). Moore and colleagues (13) reported a hotspot analysis of breast cancer mortality rates from 2000 to 2015. Shreves and colleagues (14) analyzed the geographic pattern of lung cancer mortality rates from 2005 to 2018. Rogers and colleagues (6) examined geographic disparities in early-onset colorectal cancer mortality among men using 1999 to 2017 data. Siegel and colleagues (7) identified hotspots of colorectal cancer mortality using historical data from 1970 to 2009. All of the above studies analyzed county-level data. Other studies exist for less common cancer types and local areas using older data. As more recent cancer data have become available, the geospatial analysis of cancer mortality needs to be updated to provide a timely evaluation of geospatial disparity in the United States and to ensure that prevention services focus on priority areas.
Here, we conducted a national spatial hotspot analysis of mortality rates for four major cancers (colorectal, lung, breast, and prostate cancers), which together account for more than 44% of all cancer deaths, using the latest 2018 to 2022 CDC Wide-ranging ONline Data for Epidemiologic Research (WONDER) Underlying Cause of Death data. Identified hotspots were visualized and compared with the literature for potential shifts in high-mortality geospatial clusters. We also examined areas for common risk factors that underlie cancer mortality disparities. This is the first update for geospatial disparity in mortality of major cancer types since 2018 and since the COVID-19 pandemic. Findings illustrate changes in the disparity pattern and pinpoint areas that cancer prevention and control efforts should target.
Materials and Methods
We evaluated mortality rates of four cancer types: colorectal, lung, breast, and prostate cancers. Data were analyzed at the US county level. We included counties in the US mainland states and obtained their cancer mortality data for the years 2018 to 2022 from the CDC WONDER website. The CDC WONDER provides aggregated cancer mortality (identified by the International Classification of Diseases, Tenth Revision Cause List) data, with original data collected by state registries and provided to the National Vital Statistics System.
Death count imputation
We defined a pseudo $R^2$ to evaluate the goodness of fit of the spatial regression models. The pseudo $R^2$ was based on the sum of squared Pearson residuals (19), comparing the estimated log mortality rate from the above spatial regression model with that derived from the null model. The pseudo $R^2$ lies between 0 and 1, with larger values indicating a better fit to the data. Next, to impute the suppressed death counts, we truncated the predicted death counts at the suppression limits (i.e., between 1 and 9) and used the truncated values as the imputed death counts. The county cancer mortality rates were then calculated as the annual death counts divided by the population at risk.
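The goodness-of-fit measure and the truncation step can be sketched as follows. This is a minimal illustration in Python, not the authors' R implementation. It assumes Poisson-type Pearson residuals, a pseudo $R^2$ of the form one minus the ratio of fitted to null residual sums of squares, and a 5-year death total converted to an annual rate per 100,000. All function names are hypothetical.

```python
import numpy as np

def pseudo_r2(y, mu_model, mu_null):
    """Pseudo R^2 = 1 - SS_Pearson(fitted model) / SS_Pearson(null model).

    Assumption: Poisson-type variance, so the Pearson residual is
    (observed - expected) / sqrt(expected).
    """
    y, mu_model, mu_null = (np.asarray(a, dtype=float) for a in (y, mu_model, mu_null))
    ss_model = np.sum((y - mu_model) ** 2 / mu_model)
    ss_null = np.sum((y - mu_null) ** 2 / mu_null)
    return 1.0 - ss_model / ss_null

def impute_suppressed(pred_counts, lower=1, upper=9):
    """Truncate predicted death counts into the suppressed range [lower, upper]."""
    return np.clip(np.asarray(pred_counts, dtype=float), lower, upper)

def annual_rate_per_100k(total_deaths, population, years=5):
    """Annual mortality rate per 100,000 from a multi-year death total."""
    return total_deaths / years / population * 1e5
```

For example, `impute_suppressed([0.3, 4.2, 15.0])` returns values of 1, 4.2, and 9, so every imputed count stays inside the range that CDC WONDER suppresses.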
Hotspot analysis
We calculated the Getis–Ord Gi* statistics for spatial clustering (20). The Gi* statistic identifies spatial clusters, i.e., hotspots of counties with high mortality rates, based on their neighboring counties. It also provides measures of statistical significance based on permutation testing for both global and local spatial clustering, in which a larger global G test statistic (and smaller P value) indicates stronger spatial clustering. This clustering approach has been used in other cancer geospatial analyses (7, 13). We used queen contiguity to define neighborhood (i.e., neighbors share an edge or a vertex) and used the row-standardized adjacency matrix as the spatial weight matrix (18). The counties with imputed mortality rates were used to evaluate their neighboring counties' spatial clustering, but their own clustering was not calculated. Hotspot counties were visualized, and spatial clustering patterns were compared with those from prior studies to reveal the shifting and shrinking of previously identified hotspot counties.
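To make the local statistic concrete, the following Python sketch computes Gi* z-scores on a toy rectangular grid with row-standardized queen-contiguity weights. It is an illustration under stated assumptions, not the sfdep-based analysis used in this study: the grid stands in for county polygons, the function names are hypothetical, and the permutation test shuffles all values (a simplification of the conditional permutation typically used in spatial software).

```python
import numpy as np

def queen_weights(nrow, ncol):
    """Row-standardized queen-contiguity weights on a grid.

    Each cell is its own neighbor, which is what the star in Gi* denotes.
    """
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def local_gstar(x, W):
    """Getis-Ord Gi* z-score for every unit."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wsum = W.sum(axis=1)
    w2sum = (W ** 2).sum(axis=1)
    num = W @ x - xbar * wsum
    den = s * np.sqrt((n * w2sum - wsum ** 2) / (n - 1))
    return num / den

def permutation_pvalues(x, W, n_perm=999, seed=0):
    """One-sided pseudo p-values for high-value clustering (full shuffle)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    obs = local_gstar(x, W)
    exceed = np.zeros(x.size)
    for _ in range(n_perm):
        exceed += local_gstar(rng.permutation(x), W) >= obs
    return (exceed + 1) / (n_perm + 1)
```

On a 10 × 10 grid in which a 3 × 3 corner block has elevated rates, the largest z-score falls on the center of that block, because it is the only cell whose entire neighborhood is elevated.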
We conducted three sensitivity analyses. First, we used rook (instead of queen) contiguity (18) to define neighborhood (i.e., neighbors share an edge, not merely a vertex). Second, we adjusted the mortality rates to the age of the US 2010 census population. This was done by setting the county median age in the fitted spatial regression model to that of the year 2010 and "predicting" the mortality rates. The age-adjusted mortality rate was then calculated from the resulting predicted linear predictor. Third, we also calculated spatial clustering for the counties with imputed mortality. All analyses were conducted using R software. The spatial regression models were fitted using the spaMM package (v 4.5.0; ref. 21), and the spatial clustering analysis was conducted using the sfdep package (v 0.2.4). The data used in this study are publicly available from CDC WONDER. A description of detailed data processing and analytic methods is available in the Supplementary Materials. The R code is available at https://github.com/chongliang-luo/NationalCancerHotspot.
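The difference between the queen and rook neighbor definitions compared in the first sensitivity analysis can be illustrated on a toy grid. This Python sketch is not taken from the authors' R code. Grid cells stand in for counties, and the function name is hypothetical.

```python
def grid_neighbors(nrow, ncol, rule="queen"):
    """Map each grid cell to its neighbors under rook or queen contiguity."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # shared edges (rook)
    if rule == "queen":
        # queen also counts cells that touch only at a corner (shared vertex)
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    elif rule != "rook":
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbors = {}
    for r in range(nrow):
        for c in range(ncol):
            neighbors[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < nrow and 0 <= c + dc < ncol
            ]
    return neighbors
```

On a 3 × 3 grid, the center cell has 8 queen neighbors but only 4 rook neighbors, and every rook neighbor is also a queen neighbor. For real county polygons the two definitions differ less often, because few counties touch only at a single point.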
Data availability
The cancer mortality data were obtained from the CDC WONDER Underlying Cause of Death data (RRID: SCR_025830). We extracted county-level, Single Race (all race), 2018 to 2022 data. The cancer types are grouped by the International Classification of Diseases, Tenth Revision 113 Cause List: malignant neoplasms of the colon, rectum, and anus (C18–C21); malignant neoplasms of the trachea, bronchus, and lung (C33–C34); malignant neoplasm of the breast (C50); and malignant neoplasm of the prostate (C61).
The predictor data for death count imputation were obtained from the County Health Rankings 2018 data (https://www.countyhealthrankings.org/health-data), which originated from the BRFSS (RRID: SCR_012974) data. The population data (2020 total or male/female population and male/female ratio) were obtained from the R package tidycensus, which originated from the American Community Survey (RRID: SCR_011587). The median age (2020) and 2018 poverty rate data were obtained from the Social Explorer (https://www.socialexplorer.com/explore-maps). The county geography data were obtained from the US Census Bureau (https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). All the data were extracted on or around August 18, 2024. The processed data are available at https://github.com/chongliang-luo/NationalCancerHotspot.
Results
Mortality rates for the four major cancers and spatial clustering analysis are summarized in Table 1. A total of 3,108 counties were included. The proportion of counties with suppressed mortality data ranged from 6.8% (lung cancer) to 31.3% (prostate cancer). Lung cancer led to the highest mortality rates among the four cancers. The spatial error regression models achieved a good fit of the death counts, with the pseudo $R^2$ being 77.2%, 87.5%, 62.7%, and 63.8% for colorectal, lung, breast, and prostate cancer deaths, respectively. The detailed spatial error regression results are available in Supplementary Table S1.
Table 1. Summary of the mortality rates and spatial clustering analysis of the four major cancers in the United States, 2018 to 2022 data.
Cancer type | Number of counties with no suppression^a (total N = 3,108) | National average mortality per 100,000 | Number of hotspot counties |
---|---|---|---|
Colorectal | 2,572 (82.8%) | 16.4 | 244 (7.9%, 64 + 180) |
Lung | 2,897 (93.2%) | 42.0 | 456 (14.7%, 235 + 221) |
Breast (female) | 2,259 (73.8%) | 25.9 | 147 (4.7%, 49 + 98) |
Prostate (male) | 2,134 (68.7%) | 20.2 | 180 (5.8%, 68 + 112) |
NOTE: The parentheses in the last column contain the percentage of counties with available mortality data (the second column) and the number of hotspot counties with P < 0.01 + the number of hotspot counties with P < 0.05.
^a Death counts between 1 and 9 are suppressed in the CDC WONDER Underlying Cause of Death data.
Figure 1 (or Supplementary Figs. S5–S8) shows the results of spatial clustering analysis. The numbers of hotspot counties per cancer type are listed in Table 1. Colorectal cancer hotspots were clustered in central Appalachia, Lower Mississippi Delta, and also small areas in the Midwest (Fig. 1A; global G test statistic = 9.9; P < 1e–15). Lung cancer hotspots were clustered in central Appalachia, Missouri–Kentucky–Arkansas–Tennessee joint area, east Oklahoma, and small areas in north Michigan and north Florida (Fig. 1B; global G test statistic = 17.6; P < 1e–15). Breast cancer had less spatial disparity, with small hotspots clustered in the Lower Mississippi Delta and southeast Kansas areas (Fig. 1C; global G test statistic = 4.0; P < 1e–4). Unlike the other cancers, prostate cancer hotspots were primarily clustered in the West (California–Oregon–Idaho, north Arizona, and New Mexico) and also small clusters in north Wisconsin and Lower Mississippi Delta (Fig. 1D; global G test statistic = 7.1; P < 1e–12). The identified hotspot counties are listed in Supplementary Table S2. The average mortality rates in the identified hotspot, nonsignificant, and cold spot areas are shown in Fig. 2, in which the mortality rates in the hotspot counties on average were significantly higher than those in the nonsignificant or cold spot counties for all the four cancers.
Figure 1. Hotspots of major cancer mortality rates in the United States, 2018−2022. The hotspots were identified via Getis–Ord Gi* statistics and permutation tested for statistical significance. Hotspot clusters were manually circled. A, Colorectal cancer; B, lung cancer; C, breast cancer; D, prostate cancer. Data were obtained from the CDC WONDER website, and some counties are suppressed because of small death counts (5-year total between 1 and 9). Higher-resolution maps are presented in Supplementary Figs. S1–S4.
Figure 2. Mortality rates of the four major cancers by the identified hotspot clusters (United States, 2018−2022). The dashed horizontal lines indicate the national average mortality rates.
The results of the sensitivity analyses are presented in Supplementary Figs. S9–S20. In the first sensitivity analysis, which used the rook contiguity neighbor definition, few counties changed hotspot/cold spot status relative to the main result: 91 (2.9%), 69 (2.2%), 59 (1.9%), and 56 (1.8%) for colorectal, lung, breast, and prostate cancers, respectively. These counties changed from hotspot (or cold spot) to nonsignificant or vice versa, and no county changed from hotspot to cold spot or vice versa. This indicates that the hotspot clustering was robust to the choice of neighbor definition and spatial weight matrix. The other two sensitivity analyses facilitated age-adjusted mortality comparison with the literature and the inclusion of suppressed counties in hotspots.
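A comparison of this kind amounts to cross-tabulating the county labels from two runs. The following Python sketch shows one way to count changed counties and check for hotspot/cold spot flips. The labels `"hot"`, `"cold"`, and `"ns"` and the function name are hypothetical, and the example data are made up for illustration rather than taken from the study.

```python
from collections import Counter

def compare_classifications(main, alt):
    """Summarize label changes between two hotspot classifications.

    `main` and `alt` are equal-length sequences of labels, one per county,
    using the hypothetical labels "hot", "cold", and "ns" (nonsignificant).
    """
    if len(main) != len(alt):
        raise ValueError("classifications must cover the same counties")
    transitions = Counter(zip(main, alt))
    changed = sum(n for (a, b), n in transitions.items() if a != b)
    flipped = transitions[("hot", "cold")] + transitions[("cold", "hot")]
    return {
        "changed": changed,
        "share_changed": changed / len(main),
        "flipped": flipped,
        "transitions": transitions,
    }

# Made-up labels for five counties under queen and rook contiguity
queen = ["hot", "hot", "ns", "cold", "ns"]
rook = ["hot", "ns", "ns", "cold", "hot"]
summary = compare_classifications(queen, rook)
```

Here two of the five counties change status and none flips between hotspot and cold spot, which is the pattern of robustness described above.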
Discussion
Our study indicates that notable shifts in cancer mortality hotspot clustering occurred in recent years (2018–2022) compared with previous years. Siegel and colleagues (7) showed that colorectal cancer hotspots shifted from the large northeast and Midwest areas in 1970 to 1999 to the west central Appalachia, Lower Mississippi Delta, and eastern North Carolina/Virginia areas in 2000 to 2009. However, these eastern North Carolina/Virginia hotspots shrank in our study, and smaller areas in the Midwest (Iowa, Nebraska, and Kansas) emerged as new colorectal cancer hotspots. Shreves and colleagues (14) showed lung cancer hotspots in west central Appalachia and Lower Mississippi Delta areas from 2005 to 2018, whereas we show east Oklahoma, north Michigan, and north Florida as the new hotspots. For breast cancer, Moore and colleagues (13) indicated hotspots in the Lower Mississippi Delta, South Carolina, and eastern North Carolina/Virginia areas using 2000 to 2015 data, and our analysis shows a similar pattern, with new hotspots emerging in the southeast Kansas area. Prostate cancer hotspots were mostly clustered in the north before 2000 (22–24). A recent study (25) reported higher prostate cancer mortality rates in the West region in recent decades, and our analysis is the first to show that hotspots have emerged in the West.
Our study shows that, in general, many previously identified cancer mortality hotspots have shrunk, with the Lower Mississippi Delta remaining a persistent hotspot and east Oklahoma (for lung cancer), scattered areas in the Midwest (for colorectal cancer), and the West (for prostate cancer) emerging as new hotspots. To illustrate this trend, we included plots (Supplementary Fig. S21) showing the historical trend of cancer mortality in the United States and some representative states (e.g., Mississippi, Kansas, Oklahoma, and Michigan). Although all these states have experienced a gradual decline in mortality over the past two decades, their rate of decline has been slower than the national average, indicating that cancer prevention and control efforts may have varying impacts. This finding highlights the importance of implementing more efficient and targeted interventions in underprivileged areas (e.g., rural areas in the Midwest and Lower Mississippi Delta; refs. 26, 27). For example, Oklahoma has been identified as one of the states with the lowest rates of cancer screening (28, 29). This implies that there may be barriers to accessing and using cancer screening services in this area. Oklahoma is also one of the states with the highest proportion of Native American population (9.4%). Previous studies (30–32) have reported elevated risks of cancers and less screening among the Native American population. This disparity might be associated with factors such as mistrust of and dissatisfaction with the healthcare system (33, 34), which may be due to the history of colonialism, racism, and unethical use of Native American tribes’ research data (35). By focusing resources and initiatives on improving cancer screening rates in Oklahoma and other similar underprivileged areas, it is possible to address these disparities and improve cancer prevention and control efforts.
Although NCI-designated cancer centers make essential contributions to regional cancer prevention and treatment, there are few cancer centers in the Lower Mississippi Delta states (see Supplementary Figs. S1–S4; ref. 36).
Over a more prolonged period (e.g., since the 1970s), there has been a noteworthy shift in high cancer mortality areas from the north to the south. This transformation for colorectal cancer was demonstrated by Siegel and colleagues (7). Additionally, there is literature indicating a similar shift for prostate cancer. Early studies suggested that the strongest predictor of prostate cancer mortality changed from lower UV radiation–related vitamin D synthesis (1970–1994; ref. 22) to lack of access to medical care (1995–2000; ref. 23). However, it remains unclear why the prostate cancer mortality rates in recent decades are higher in the West region (25), with one possible reason being that men in the West and Midwest had the highest rate of active surveillance for low-risk prostate cancer (37). On the other hand, common cancer risk factors display a geospatial disparity, with a higher prevalence of risk factors in cancer mortality hotspots. For example, the Lower Mississippi Delta states (Louisiana, Mississippi, and Arkansas) have persistently higher poverty rates, and their decrease in the 3-year average poverty rate from 2009–2011 to 2019–2021 was not significant (38). The current cigarette use among adults is also among the highest in these states.
This study has several limitations. Public cancer mortality data are usually subject to suppression, especially for subgroups such as those stratified by race/ethnicity or age groups, making the evaluation of racial disparity difficult. To reduce the impact of suppressed counties on hotspot analysis of their neighbor counties, we decided to impute them using cancer risk factors. Risk factors could be extended (using cancer origin–specific risk factors) to improve the prediction and robustness of the hotspot analysis. Moreover, the hotspot analysis results may also depend on the geospatial clustering methods and parameters (13, 39). More extensive sensitivity analyses may be needed to make the identified hotspot patterns more robust.
Health equity is an essential objective of the Cancer Moonshot Initiative (40). Although the overall risk of death from cancer has been decreasing across the nation, significant geospatial disparities persist, leading to higher mortality rates in certain regions. Areas such as the Lower Mississippi Delta, Appalachia, and Oklahoma are now cancer mortality hotspots, indicating a rural–urban discrepancy. The disparity is deeply rooted in socioeconomic decline and differences in cancer prevention and control efforts. Although cancer prevention through lifestyle interventions is desirable, it can be challenging to implement effectively. Therefore, focusing on cancer control measures, such as promoting and improving access to screening and treatment, is often a more efficient approach for targeted interventions from a public policy perspective. By targeting these underresourced areas with tailored interventions, we can work toward reducing the geospatial disparity in cancer mortality and promoting health equity across the nation.
Authors’ Disclosures
A.S. James reports grants from the NCI during the conduct of the study. No disclosures were reported by the other authors.
Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Authors’ Contributions
C. Luo: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. S. Khan: Validation, methodology, writing–review and editing. L. Jin: Data curation, software, visualization. A.S. James: Writing–review and editing. G.A. Colditz: Supervision, methodology, writing–review and editing. B.F. Drake: Conceptualization, supervision, investigation, methodology, writing–review and editing.
Acknowledgments
The authors extend their gratitude to Dr. Heather Siedhoff for the meticulous language editing in the preparation of this manuscript. C. Luo’s work was supported by the NCI of the NIH under award number P50CA244431.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).