Background:

Disparities in the stage at diagnosis for breast cancer have been independently associated with various contextual characteristics. Understanding which combinations of these characteristics indicate highest risk, and where they are located, is critical to targeting interventions and improving outcomes for patients with breast cancer.

Methods:

The study included women diagnosed with invasive breast cancer between 2009 and 2018 from 680 U.S. counties participating in the Surveillance, Epidemiology, and End Results program. We used a machine learning approach called Classification and Regression Tree (CART) to identify county “phenotypes,” combinations of characteristics that predict the percentage of patients with breast cancer presenting with late-stage disease. We then mapped the phenotypes and compared their geographic distributions. These findings were further validated using an alternate machine learning approach called random forest.

Results:

We discovered seven phenotypes of late-stage breast cancer. Common to most phenotypes associated with high risk of late-stage diagnosis were high uninsured rate, low mammography use, high area deprivation, rurality, and high poverty. Geographically, these phenotypes were most prevalent in southern and western states, while phenotypes associated with lower percentages of late-stage diagnosis were most prevalent in the northeastern states and select metropolitan areas.

Conclusions:

The use of machine learning methods of CART and random forest together with geographic methods offers a promising avenue for future disparities research.

Impact:

Local interventions to reduce late-stage breast cancer diagnosis, such as community education and outreach programs, can use machine learning and geographic modeling approaches to tailor strategies for early detection and resource allocation.

Because of increased screening and awareness of symptoms in the past few decades, the rate of advanced stage breast cancer (for which prognosis is markedly poorer) has decreased dramatically (1). However, breast cancer remains the second leading cause of cancer death in U.S. women (1). In addition, segments of our population continue to bear a disproportionate burden and experience high rates of late-stage diagnosis.

Previous studies have linked place-based disparities in late-stage breast cancer (LSBC) to numerous contextual characteristics, including socioeconomic status (2, 3), neighborhood deprivation (4, 5), access to screening services (5–9), and availability of primary care physicians (10–12). Studies of LSBC traditionally have considered all predictor variables independently using a parametric regression framework (2–8, 10). However, such a framework does not lend itself to identifying homogeneous subgroups, nor is it efficient in detecting effect measure modification in predicting LSBC. For example, one variable might be important in predicting LSBC in a certain subgroup of a population but may not be as important in other subgroups of the population. Several studies have recognized this issue by stratifying their populations into subgroups by factors such as race/ethnicity (4, 11), income or poverty level (11, 12), and urban-rural status (4, 13). However, the choice of attributes on which to stratify the data may be subjective, and the selection of threshold values is often arbitrary. In addition, the complexity of a model increases substantially when considering stratifications along multiple dimensions. Therefore, more advanced solutions are needed to explore the correlates of LSBC among different subgroups of the population.

In this study, we identified “phenotypes,” or combinations of county characteristics that predict the percentage of women with breast cancer presenting with late-stage disease. We applied the machine learning technique known as classification and regression tree (CART) with a broad range of county-level characteristics harvested from various sources. This resulted in the classification of counties into phenotypes based on the most important predictors of LSBC. We then examined the geographic distribution of counties with high-risk phenotypes. These findings were further examined using random forest.

Identifying specific clusters of characteristics associated with late-stage diagnosis acknowledges the complex relationships among selected drivers of cancer disparities. It also offers researchers and practitioners a better framework for addressing disparities across heterogeneous and more highly specified groups.

Data source

This study used cancer incidence data from the Surveillance, Epidemiology, and End Results (SEER) program, a resource from the NCI. Data in the SEER program cover 34.6% of the U.S. population with 97% completeness within SEER regions (14). These data cover geographically diverse regions of the country and are broadly representative of the U.S. population along the dimensions of poverty and education (15). The SEER*Stat software was used to query and extract the data. Given the deidentified nature of the data, the Case Western Reserve University Institutional Review Board determined that this work did not involve human subjects research and was thus exempted from review.

Study population and variables of interest

The study included 20 of the 21 registries in the SEER program. We excluded the Alaska Native Tumor Registry because it does not cover cancer cases of all demographic groups (16). The included registries cover the U.S. states of California (Greater California, Los Angeles, San Francisco-Oakland, and San Jose-Monterey registries), Connecticut, Georgia (Atlanta, Greater Georgia, and Rural Georgia registries), Hawaii, Idaho, Iowa, Kentucky, Louisiana, Massachusetts, New Jersey, New Mexico, New York, Utah, and metropolitan areas of Detroit and Seattle-Puget Sound. A total of 732 counties or equivalents (i.e., parishes in Louisiana; for convenience, “counties” is used in the rest of the text) from the 20 SEER registries were included in the study. The outcome of interest was the county-level percentage of LSBC among women diagnosed with invasive breast cancer during the 10-year period between 2009 and 2018. For Massachusetts, only cases from 2009 through 2017 were included because the stage variable for 2018 was not available at the time of the study. Further explanation of the Massachusetts data can be found in Supplementary Figs. S1 and S2. We used the “Combined Summary Stage” variable from the SEER*Stat software, which classifies tumors into five categories: in situ, local, regional, distant, and unknown. We excluded in situ and unknown-stage cases from our analysis. We collapsed the stage variable into “early stage” (local stage only) and “late stage” (regional and distant stages). To mitigate stochastic variations in percentages of LSBC, we excluded counties with fewer than 16 late-stage cases over the study period, a strategy also adopted by SEER in displaying cancer statistics, resulting in a total of 680 counties in the study. For individuals with multiple tumor records, we selected only the first record.
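The outcome construction described above can be sketched in a few lines. This is an illustrative Python sketch only (the study itself used SEER*Stat, SAS, and R); the function name, the record layout of `(county_id, stage)` pairs, and the lowercase stage labels are assumptions made for the example, and the input is assumed to already contain only each woman's first tumor record.

```python
from collections import defaultdict

MIN_LATE_STAGE_CASES = 16  # counties with fewer late-stage cases are excluded

LATE_STAGES = {"regional", "distant"}   # collapsed into "late stage"
EARLY_STAGES = {"local"}                # collapsed into "early stage"


def county_lsbc_percentages(records):
    """Return {county_id: percentage of late-stage cases}.

    `records` is an iterable of (county_id, stage) pairs (hypothetical layout).
    In situ and unknown-stage records are ignored.
    """
    late = defaultdict(int)
    total = defaultdict(int)
    for county, stage in records:
        if stage in LATE_STAGES:
            late[county] += 1
            total[county] += 1
        elif stage in EARLY_STAGES:
            total[county] += 1
        # any other stage label (in situ, unknown) is skipped

    return {
        county: 100.0 * late[county] / total[county]
        for county in total
        if late[county] >= MIN_LATE_STAGE_CASES
    }
```

With this layout, a county with 16 regional and 16 local cases would be retained with a value of 50.0, whereas a county with only 15 late-stage cases would be dropped regardless of its total case count.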

County-level characteristics were harvested from the Census-American Community Survey (ACS; ref. 17), County Health Rankings & Roadmaps (CHR; ref. 18), Area Health Resources Files (AHRF; ref. 19), Behavioral Risk Factor Surveillance System (BRFSS; ref. 20), and FDA (21), as well as from SEER*Stat. In total, 53 variables were included in our models (Table 1). These variables were selected to represent several domains: health care resources, behavioral risk factors, population health status, demographic composition, and other measures of social determinants of health including income, education, occupation, housing, transportation, and neighborhood safety. Their relationships with diagnosis and treatment of breast cancer are explained in the “Conceptual reason for inclusion” column of Table 1. We temporally harmonized the data sources by selecting the years of these variables that overlapped with or were closest to the mid-years of the breast cancer data, with the assumption that no significant secular trends would substantively change any of the factors described in Table 1.
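The temporal harmonization rule can be illustrated as follows. This is a hedged Python sketch, not the authors' procedure: the function name is hypothetical, and the tie-breaking rule (prefer the earlier year when two years are equally close to the midpoint) is an assumption introduced only so that the example is deterministic.

```python
STUDY_YEARS = range(2009, 2019)  # diagnosis years 2009 through 2018


def closest_available_year(available_years, study_years=STUDY_YEARS):
    """Pick the available data year closest to the midpoint of the study period.

    Ties are broken in favor of the earlier year (an assumption of this sketch).
    """
    midpoint = (min(study_years) + max(study_years)) / 2  # 2013.5 for 2009-2018
    return min(available_years, key=lambda year: (abs(year - midpoint), year))
```

For example, if a hypothetical variable were available only for 2010 and 2016, this rule would select 2016, because it lies 2.5 years from the midpoint rather than 3.5.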

Table 1.

Definitions of predictor variables in CART and random forest analyses.

Variable  Year(s)  Source  Conceptual reason for inclusion
% Patients younger than 65 2009–2018 SEER Women ages 65 years and older may have better access to health insurance as they are mostly eligible for Medicare. 
Incidence rate of breast cancer (age-adjusted)   Incidence rate is a measure of the breast cancer burden in the population. 
Mammography use (age 40+) 2010 BRFSS Mammography screening helps detect breast cancer at an early stage (1). 
Mammography facilities per 100,000 population 2016 FDA The availability of mammography could impact the uptake of screening in an area (5–9). 
Hospitals per 100,000 population 2015 AHRF Hospitals could recommend and provide screening services for women. 
Community Health Centers per 100,000 population 2014  Community Health Centers could provide screening services or facilitate referrals for screening. 
Primary care physicians per 100,000 population 2014  Sufficient primary care physicians are essential for preventive cancer care, and referrals for diagnostic services when necessary (10–12). 
Obstetricians-gynecologists per 100,000 population 2015  Obstetricians and gynecologists are more likely to discuss and perform breast cancer screening and more likely to recognize breast cancer than other physicians (37, 38).
Radiologists per 100,000 population 2015  Radiologists are essential for the diagnosis and staging of breast cancer. 
Population estimate 2014  An indicator of urbanicity, which may be associated with screening uptake (39). 
% Urban population 2010  An indicator of urbanicity, which may be associated with screening uptake (39). 
Per capita income 2014  Lower income is associated with late-stage diagnosis of breast cancer (10, 26). 
Rural-urban continuum code 2013  An indicator of urbanicity, which may be associated with screening uptake (39) 
Urban Influence code 2013  An indicator of urbanicity, which may be associated with screening uptake (39) 
Health professional shortage area - primary care 2015  Sufficient availability of primary care physicians is essential for preventive cancer care, and referrals for diagnostic services when necessary (10–12). 
% Eligible for Medicare 2014  Medicare-eligible individuals are more likely to undergo all cancer preventive services (40).
Median household income 2014  Lower income has been associated with late-stage diagnosis of breast cancer (10, 26). 
% People in poverty 2014  Poverty has been associated with late-stage diagnosis of breast cancer (41). 
% Food stamp or SNAP recipients 2014  Receipt of SNAP benefits may be predictive of breast cancer tumor size (42). 
% Uninsured women (age 18–64) 2014  Women without health insurance are more likely to be diagnosed at late stage for breast cancer compared with those with insurance (43). 
% People under 200% of poverty (age 18–64) 2014  Lower income and poverty are associated with late stage at breast cancer diagnosis (10, 26, 41). 
% People with poor or fair health 2014 CHR Overall poor or fair health may be positively or negatively associated with mammography screening based on either increased health care contacts or competing health priorities (44). 
Poor physical health days 2014  More poor physical health days may be positively or negatively associated with mammography screening based on either increased health care contacts or competing health priorities (44). 
Poor mental health days 2014  People with more poor mental health days may have lower priority of screening in the context of managing other medical and life issues (44). 
Adult smoking 2014  Smoking may be inversely associated with use of colorectal cancer screening tests (45).
Excessive drinking 2014  Alcohol consumption may be associated with breast cancer screening rates (46). 
Teen births 2007–2013  Teenage women who bear a child are more likely to have lower socioeconomic status and psychologic distress in their later lives (47, 48). 
Children in poverty 2014  Child poverty could reflect long-term negative consequences for the population along various aspects of social determinants of health (49).
Low birthweight 2008–2014  Low birthweight may indicate maternal exposure to various health risks (50).
Adult obesity 2013  High body mass is associated with late-stage breast cancer at diagnosis (27, 51). 
Food environment index 2010&2014  The urgency of food insecurity may deprioritize the receipt of preventive screening services (52). 
Physical Inactivity 2013  Physical activity may be associated with use of colorectal cancer screening tests (45). 
Access to exercise opportunities 2010&2014  A study found that physical activity was also associated with use of colorectal cancer screening tests (45). 
Mammography use (Medicare age 67–69) 2014  Mammography screening helps detect breast cancer at an early stage (1). 
Social associations 2014  People with adequate social support had more health care access and fostered more productive relationships with their health care providers (53). 
Violent crime 2012–2014  Homicide rate in the neighborhoods of women's nearest screening facility is associated with breast cancer late-stage diagnosis (54). 
Severe housing problems 2009–2013  People with severe housing problems might have lower priority of screening in the context of managing other acute issues (55). 
% Racial/ethnic minorities 2012–2016 ACS Race and ethnicity have been strong predictors of late-stage breast cancer (26, 28). 
% Family with own children (age < 18)   Women with more children are less likely to receive follow-up of tests or seek care for symptoms suggestive of breast cancer (56). 
% Female-headed households   Women from neighborhoods with greater percentages of female-headed households may be at higher risk of LSBC (57). 
% Women with high school degree or higher (age 25+)   Women from neighborhoods with less educated people may be at higher risk of LSBC (30, 57). 
% People spoke English less than "very well" (age 18+)   Language may be a barrier to breast cancer screening (30, 58, 59). 
% Women in management/business/science/arts occupations (age 16+)   Occupation categories are associated with breast cancer stage at diagnosis (60, 61). 
% Women in service occupations (age 16+)    
% Women in sales and office occupations (age 16+)    
% Women in labor intensive occupations (age 16+)    
% Renter occupied households   Women living in areas with higher rates of renter occupied households are more likely to be diagnosed with LSBC (59).
% People moved residency in the past year   High-frequency residential change is potentially a marker for the clinical risk of behavioral and emotional problems (62). 
% Women workers who drove alone to work (age 16+)   Percentage of women driving alone to work is an indicator of vehicle availability, which is an indicator of spatial access to screening services (3).
% Women workers with ≥30 minutes travel time to work (age 16+)   Travel time to work may be an indicator of proximity to urban centers where most screening services are located, which in turn may be associated with cancer stage at diagnosis (6–9).
% Women (age 15–50) had a birth in the past 12 months   After a childbirth, mothers experience a transient increased risk of late-stage breast cancer (63). 
% Women unemployed among those in labor force (age 16+)   Women living in areas with higher rates of unemployment are more likely to be diagnosed with LSBC (59).
Area deprivation index 2014 R Package "Sociome" Neighborhood deprivation along various aspects of social determinants of health may be associated with LSBC (4, 5). 

Abbreviations: ACS, American Community Survey; AHRF, Area Health Resources Files; CHR, County Health Rankings & Roadmaps; BRFSS, Behavioral Risk Factor Surveillance System; FDA, Food and Drug Administration; SEER, Surveillance, Epidemiology, and End Results.

Given the nature of the study, all measures were aggregated at the county level; therefore, we did not account for variables at the individual level.

Statistical analysis

As described in detail below, machine learning methods (including CART and random forest) and geographic information systems were used to accomplish the objectives of this study. The outcome for all models was county-level percentage of LSBC and all variables in Table 1 were included as candidate predictors.

CART uses conditional inference to recursively partition data into smaller, more homogeneous groups characterized by combinations of predictors (22, 23). At each split, the data are divided into two groups according to a threshold value of one predictor, chosen so that the two resulting groups differ most in the outcome (23). The splitting procedure is applied repeatedly to each of the resulting groups by selecting the predictor with the lowest P value, based on a Pearson correlation test if the predictor variable is numeric or a Kruskal–Wallis test if the predictor variable is categorical (22). This procedure continues until all possible splits are exhausted or until a stopping criterion is met. In this study, we set the following stopping criteria: a maximum tree depth of 6 splits, a minimum of 80 counties in a terminal node, and lack of statistical significance for a variable split (P > 0.05). We also conducted a sensitivity analysis of the CART model with a minimum of 20 counties in a terminal node.
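The recursive partitioning and two of the stopping criteria (maximum depth and minimum terminal-node size) can be sketched as below. This Python sketch is illustrative only and deliberately swaps in a simpler split criterion: it chooses the split with the largest reduction in the sum of squared errors, rather than the significance tests described above, and it omits the significance-based stopping rule. The function names and the dictionary tree layout are assumptions of the example, not part of the R packages used in the study.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, outcomes, min_node):
    """Return (feature_index, threshold) with the largest SSE reduction, or None."""
    best, best_gain = None, 0.0
    parent = sse(outcomes)
    for j in range(len(rows[0])):
        # Every unique value except the largest is a candidate threshold.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, outcomes) if r[j] <= t]
            right = [y for r, y in zip(rows, outcomes) if r[j] > t]
            if len(left) < min_node or len(right) < min_node:
                continue  # would violate the minimum terminal-node size
            gain = parent - sse(left) - sse(right)
            if gain > best_gain:
                best, best_gain = (j, t), gain
    return best


def grow(rows, outcomes, depth=0, max_depth=6, min_node=80):
    """Grow a regression tree; terminal nodes store size and mean outcome."""
    split = best_split(rows, outcomes, min_node) if depth < max_depth else None
    if split is None:
        return {"n": len(outcomes), "mean": mean(outcomes)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, outcomes) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, outcomes) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_node),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_node),
    }
```

Raising `min_node` yields fewer, larger terminal nodes, which mirrors the trade-off between the main model (80 counties) and the sensitivity analysis (20 counties).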

CART was used to identify phenotypes associated with differing levels of LSBC. We defined a phenotype of LSBC as the combination of characteristics along a top-down path of a tree to a terminal node (a node without any further split), which includes a group of homogeneous counties with similar percentages of LSBC. Conceptually, this results in the identification of the combinations of county characteristics that predict the percentage of women with breast cancer presenting with late-stage disease.

Next, we visualized the identified phenotypes using geographic information systems and examined their distribution among regions. To optimize interpretability, the minimum number of counties in a terminal node of the CART model was set to 80 to limit the number of phenotypes presented on the map.

Random forest analysis was used to determine whether our CART model captured the most important variables in predicting the percentage of LSBC. While CART has advantages in variable identification and group classification, a major disadvantage of this single-tree model is that it is sensitive to changes in the data. Hence, the entire tree could be altered if, for example, additional counties are included in the model. In contrast, random forest analysis is more “stable” (24). It uses the same algorithm as CART, but instead of relying on only one tree, the algorithm creates and aggregates an ensemble of trees using random variable selection and bootstrap sampling (24). It then takes an overall average of these tree models' outputs as a prediction. Next, the mean decrease in accuracy was used to calculate variables' relative importance in predicting the outcome. We created 200,000 trees with all predictor variables included in the analysis. The number of variables randomly sampled as candidates at each split was set equal to the number of splits in the results of the CART model.
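The ensemble idea (bootstrap sampling, random variable selection, averaging, and a permutation-based importance measure) can be sketched in miniature. This Python sketch is a simplification under stated assumptions, not the randomForest implementation: each "tree" is a single-split stump, the importance is computed on the training data rather than on out-of-bag samples, rows must be tuples, and all function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, ys, features):
    """Best single split (lowest squared error) among the candidate features."""
    best = None
    for j in features:
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            ml, mr = mean(left), mean(right)
            err = (sum((y - ml) ** 2 for y in left)
                   + sum((y - mr) ** 2 for y in right))
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    if best is None:  # no valid split: fall back to the sample mean
        m = mean(ys)
        return lambda r: m
    _, j, t, ml, mr = best
    return lambda r: ml if r[j] <= t else mr


def fit_forest(rows, ys, n_trees=200, mtry=1, seed=0):
    """Average of stumps, each fit on a bootstrap sample with `mtry` random features."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), mtry)           # random variable selection
        forest.append(fit_stump([rows[i] for i in idx], [ys[i] for i in idx], feats))
    return lambda r: mean(tree(r) for tree in forest)


def mse(predict, rows, ys):
    return mean((predict(r) - y) ** 2 for r, y in zip(rows, ys))


def permutation_importance(predict, rows, ys, j, seed=0):
    """Increase in mean squared error after shuffling the values of feature j."""
    rng = random.Random(seed)
    col = [r[j] for r in rows]
    rng.shuffle(col)
    shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, col)]
    return mse(predict, shuffled, ys) - mse(predict, rows, ys)
```

A variable that the ensemble never relies on receives an importance near zero under this measure, whereas shuffling an influential variable degrades the predictions noticeably.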

Because both CART and random forest select only one of many variables at each split of their trees, they can handle highly correlated variables. However, the associations among candidate predictors remain unknown. For highly correlated predictors, CART selects only the one that most significantly splits the group, but this does not imply that the remaining predictors are not predictive of the outcome. Random forest partially addresses this issue with the rankings of variable importance. To further explore the associations among candidate predictors, we conducted a correlation analysis using Pearson correlation coefficients among all splitting variables in the CART model and the top 10 variables in the variable importance plot of the random forest, as well as the variable representing the proportion of racial/ethnic minorities.
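For reference, the pairwise correlation analysis can be sketched as below. This is a minimal Python illustration using the standard definition of the Pearson correlation coefficient; the function names and the dictionary-of-columns input are assumptions of the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each (name_a, name_b) pair to its Pearson coefficient.

    `columns` is a dict of variable name -> list of county values (hypothetical).
    """
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}
```

A coefficient close to 1 or -1 between a splitting variable and a variable absent from the tree would suggest that the absent variable may carry similar predictive information.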

SAS v9.4 and R v3.6.1 were used for the analyses, and ArcGIS Pro v2.7.0 and Tableau v2021.1 were used for mapping and visualization. R packages “rpart,” “partykit,” and “randomForest” were used for conducting machine learning analyses.

Within the 680 included counties between 2009 and 2018, there were 812,048 women diagnosed with invasive breast cancer, among whom 276,305 (34.0%) were diagnosed at a late stage. The median percentage of LSBC among the counties was 35.4%. The geographic distribution of counties by percentage of LSBC is presented in Fig. 1.

Figure 1.

Percentage of late-stage breast cancer at the time of diagnosis among SEER counties during years 2009–2018 (2009–2017 for Massachusetts). Maps are not on the same scale. Percentages were classified by Jenks natural breaks classification method.

We observed that counties in the Northeast states (New York, Massachusetts, Connecticut, and New Jersey) had lower percentages of LSBC compared with the majority of those in the south and west. The box plots showing the county distribution of LSBC by region are included in Supplementary Fig. S2.

The results of the CART analysis are shown in Fig. 2. Each path down to a terminal node of the tree represents a phenotype of LSBC with corresponding characteristics. P values on the splitting nodes (<0.05) suggest that the groups of counties split by the thresholds of the variables are statistically significantly different from each other in terms of percentage of LSBC. Counties within the same terminal node have similar percentages of LSBC and belong to the same phenotype. Phenotypes are classified into low-risk (LR), medium-risk (MR), and high-risk (HR) by their median percentage of LSBC among counties, with LR 1 having the lowest percentage and HR 3 having the highest percentage.

Figure 2.

Classification and regression tree analysis in predicting percentage of LSBC. Each path down to a terminal node represents a phenotype of LSBC. Box plots in the terminal nodes represent the percentages of LSBC among counties. Minimum number of counties in a terminal node was set to 80. Counties with an ADI greater than 99.7 were more deprived than 52.2% of all 680 counties in the study. Counties with a UIC less than or equal to 3 were metropolitan counties or micropolitan counties adjacent to a large metropolitan county; counties with a higher UIC were more rural.

The results show that among all candidate predictors, CART selected six variables as splitting nodes, with percentage of uninsured women ages 18–64 at the top, followed sequentially by percentage of mammography use among women ages 67–69 enrolled in Medicare, Area Deprivation Index (ADI), Urban Influence Code (UIC), percentage of people in poverty, and per capita income. The ADI is an index of social deprivation calculated from census variables, which incorporates 17 separate factors covering domains of education, employment, income, housing (costs and crowding), and transportation access (25). The ADI ranges from 30.5 to 154.5 among all counties in the study area (mean: 98.8, median: 98.5). A higher ADI indicates that the county is more deprived. An ADI greater than 99.7, as shown in the CART output, means that a county was more deprived than 52.2% of all counties. Counties with a higher UIC were more rural. Specifically, counties with a UIC less than or equal to 3 were metropolitan counties or micropolitan counties adjacent to a large metropolitan county.

HR 3 is the phenotype associated with the highest median percentage of LSBC (40.1%). It includes 89 counties and is characterized by a higher percentage of uninsured middle-aged women (>11.6%), greater area deprivation (ADI > 99.7), and more people under poverty (>26.1%). HR 2 is the phenotype associated with the second highest median percentage of LSBC (38.4%). HR 2 has the same levels of uninsured middle-aged women and area deprivation as HR 3 but has a lower poverty rate (≤26.1%) and, within that lower-poverty branch, higher per capita income (>32,946 U.S. dollars). HR 1 has a slightly lower median percentage of LSBC than HR 2 (37.0% vs. 38.4%), and the only difference between the two is per capita income (≤32,946 U.S. dollars for HR 1).

MR 1 is the largest group of counties (n = 139) among all phenotypes and has a median percentage of LSBC close to that of the overall study area (35.5% vs. 35.4%). Counties in MR 1 have higher rates of uninsured middle-aged women (>11.6%) but lower area deprivation (ADI ≤ 99.7).

Phenotypes LR 1, LR 2, and LR 3 have better LSBC outcomes than HR 1, HR 2, HR 3, and MR 1. The key difference between the LR phenotypes and the MR or HR phenotypes is the top splitting variable (uninsured middle-aged women), whose threshold of 11.6% separates the tree into two large branches. The variable that separates LR 1 from LRs 2 and 3 is mammography use among Medicare beneficiaries ages 67–69 years (>68.1% for LR 1). When mammography use is at a lower level (≤68.1%), UIC comes into play and differentiates LR 2 (urban) from LR 3 (rural).
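The splits reported for the main CART model can be read as a set of nested rules. The following is a minimal sketch, assuming the thresholds reported in the text and in Table 2; the function name, argument names, and the example county values are hypothetical and are not taken from the study's software or data.

```python
def assign_phenotype(uninsured_pct, mammography_pct, adi, uic,
                     poverty_pct, per_capita_income):
    """Return the phenotype label for one county using the reported thresholds."""
    # Top split: percentage of uninsured women ages 18-64.
    if uninsured_pct <= 11.6:
        # Left branch: mammography use among Medicare beneficiaries ages 67-69.
        if mammography_pct > 68.1:
            return "LR 1"
        # Lower mammography use: UIC <= 3 is treated as urban, higher as rural.
        return "LR 2" if uic <= 3 else "LR 3"
    # Right branch: Area Deprivation Index.
    if adi <= 99.7:
        return "MR 1"
    # Higher deprivation: poverty rate, then per capita income (U.S. dollars).
    if poverty_pct > 26.1:
        return "HR 3"
    return "HR 1" if per_capita_income <= 32946 else "HR 2"


# Hypothetical county, for illustration only.
example = assign_phenotype(uninsured_pct=15.0, mammography_pct=60.0, adi=110.0,
                           uic=8, poverty_pct=30.0, per_capita_income=25000)
print(example)  # HR 3
```

Written this way, the rules make it visible that mammography use and UIC are consulted only for counties in the lower-uninsured branch, while ADI, poverty, and per capita income are consulted only in the higher-uninsured branch.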

The sensitivity analysis of CART (with a minimum of 20 counties in a terminal node) presents additional splits compared with the main model, which include availability of obstetricians and gynecologists, access to exercise opportunities, breast cancer incidence rate, availability of primary care physicians, women in professional occupations, and Medicare eligibility (Supplementary Fig. S3). Note that UIC no longer appears in this model, suggesting potential correlations between UIC and other variables.
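The effect of the minimum terminal node size can be illustrated with a single split search. The sketch below is a generic least-squares split on one predictor, which may differ from the splitting criterion used in the study's software; the function names and the toy data are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, min_leaf):
    """Find the threshold t (rule: x <= t) that minimizes the total SSE of y,
    requiring at least min_leaf observations on each side.
    Returns (threshold, total_sse), or None if no split is allowed."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(min_leaf, n - min_leaf + 1):
        # Do not split between two observations with the same predictor value.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (pairs[i - 1][0], total)
    return best


# Toy data: one unusual county (x = 1) among eight.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [50, 30, 30, 30, 30, 30, 30, 30]
small_leaf = best_split(x, y, min_leaf=1)  # isolates the single unusual county
large_leaf = best_split(x, y, min_leaf=3)  # must keep at least 3 counties per side
no_split = best_split(x, y, min_leaf=5)    # no valid split with 8 counties
```

With a small minimum node size the search can carve out very small groups, which is one reason a lower minimum tends to produce additional splits, as in the sensitivity analysis.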

Figure 3 shows the geographic distribution of the phenotypes identified in the main CART model. We observed strong regional variations in the composition of phenotypes. For example, Massachusetts is the only region with a single phenotype, LR 1, the most favorable phenotype of LSBC, whereas California, Kentucky, and Georgia tend to have the greatest variability of phenotypes. LR 1 also covers a large number of counties in Upstate New York and Iowa. LR 2 is common in the metropolitan areas of the San Francisco Bay Area (CA), New York (NY, NJ, and CT), Seattle, Detroit, Louisville (KY), and Lexington (KY), in metropolitan areas in Upstate New York, and in urban areas in Hawaii and Iowa. LR 3 is mostly found in rural areas in Iowa, Kentucky, and Upstate New York. No counties in Georgia, Louisiana, or Idaho belong to any LR phenotype, and only one county in New Mexico (Los Alamos County) does. MR 1 covers most counties in Utah, Idaho, metropolitan Atlanta in Georgia, and coastal California. The three HR phenotypes (with higher rates of uninsured women, greater area deprivation, and the worst LSBC outcomes) are found mostly in rural counties of the southern and western states of Georgia, Louisiana, New Mexico, Idaho, and California. We also summarized these descriptions by phenotype in Table 2 and visualized the distribution of phenotypes by region in Supplementary Fig. S4.

Figure 3.

Geographic distribution of county phenotypes of LSBC identified by the CART. Maps are not on the same scale.

Table 2.

Characteristics of phenotypes and prevalent regions.

Phenotype (median % LSBC) | Characteristics associated with LSBC | Prevalent regions
LR 1 (30.6%) | Lower uninsured (≤11.6%), higher use of mammography (>68.1%) | Massachusetts, New York, Connecticut, Iowa
LR 2 (32.6%) | Lower uninsured (≤11.6%), lower use of mammography (≤68.1%), urban area | New York, New Jersey, Connecticut, Kentucky, Hawaii, California (San Francisco Bay), Iowa, Utah, Seattle Puget Sound, Detroit
LR 3 (34.5%) | Lower uninsured (≤11.6%), lower use of mammography (≤68.1%), rural area | Kentucky, Iowa, New York, Hawaii
MR 1 (35.5%) | Higher uninsured (>11.6%), lower area deprivation (≤99.7) | Utah, Idaho, California, New Jersey, Georgia, Louisiana, New Mexico
HR 1 (37.0%) | Higher uninsured (>11.6%), higher area deprivation (>99.7), lower poverty (≤26.1%), lower per capita income (≤$32,946) | Georgia, Kentucky, Louisiana, New Mexico
HR 2 (38.4%) | Higher uninsured (>11.6%), higher area deprivation (>99.7), lower poverty (≤26.1%), higher per capita income (>$32,946) | Louisiana, California, New Mexico, Idaho, Georgia, New York (Kings County)
HR 3 (40.1%) | Higher uninsured (>11.6%), higher area deprivation (>99.7), higher poverty (>26.1%) | Georgia, Kentucky, Louisiana, New Mexico, New York (Bronx County)

Note: Under prevalent regions, a state listed under multiple phenotypes refers to different counties within that state, not to overlapping areas.

The ranking of variables by their importance for predicting percentages of LSBC based on the random forest analysis is shown in Fig. 4. This dot plot ranks the variables in descending order relative to the most important predictor. Among all variables, uninsured rate among middle-aged women and mammography use among Medicare beneficiaries ages 67 to 69 are the two most important predictors of percentage of LSBC. Other important predictors are adult obesity, percentage of children in poverty, ADI, percentage of people under 200% of poverty, percentage of female-headed households, and percentage of people with poor or fair health. Detailed information regarding the calculation of variable importance with intermediate results, together with an alternative measure based on mean decrease in node impurity, is included in Supplementary Table S1.
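The relative scaling used in the dot plot, with the top variable set to 100%, amounts to dividing each raw importance score by the largest one. A minimal sketch follows; the function name and the raw scores are hypothetical placeholders, not values from Supplementary Table S1.

```python
def scale_importance(raw_scores):
    """Rescale raw importance scores so the largest equals 100,
    and return (variable, scaled score) pairs in descending order."""
    top = max(raw_scores.values())
    if top <= 0:
        raise ValueError("the largest importance score must be positive")
    scaled = {name: 100.0 * score / top for name, score in raw_scores.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores, for illustration only.
raw = {
    "adi": 1.0,
    "uninsured_women_18_64": 4.0,
    "adult_obesity": 2.0,
    "mammography_67_69": 3.0,
}
ranking = scale_importance(raw)
for name, score in ranking:
    print(f"{name}: {score:.0f}%")
```

Because only ratios to the top score are shown, the plot conveys the ordering and relative gaps between predictors rather than the absolute size of each importance measure.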

Figure 4.

Dot chart of random forest analysis showing variable importance for predicting counties with high proportion of LSBC. The most important variable is at the top and scaled to 100%. The importance of the rest of the variables is shown relative to the top one. The star sign (*) at the end of a variable indicates the variable is specific to females.


The correlation matrix in Supplementary Table S2 suggests that some of the candidate predictors were highly correlated, with absolute values of the correlation coefficient (cc) greater than 0.7. Notably, ADI was highly correlated with multiple variables, including women with a high school degree or above, per capita income, teen birth, percentage of people with poor or fair health, poverty, child poverty, and population below 200% of the poverty level (cc: −0.77, −0.74, 0.81, 0.85, 0.88, 0.89, and 0.90, respectively). The correlations between the percentage of women in racial/ethnic minority groups and other variables were moderate, with the largest cc at 0.65 with female-headed households.
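Screening a set of predictors for highly correlated pairs can be sketched as below. This assumes Pearson correlation and the 0.7 cutoff on the absolute value mentioned in the text; the type of correlation coefficient is an assumption, and the column names and toy values are hypothetical, not study data.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def highly_correlated(columns, cutoff=0.7):
    """Return (name_a, name_b, cc) for every pair with |cc| above the cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        cc = pearson(a, b)
        if abs(cc) > cutoff:
            flagged.append((name_a, name_b, round(cc, 2)))
    return flagged


# Toy county-level values, for illustration only.
toy = {
    "adi": [80, 90, 100, 110, 120],
    "poverty": [10, 14, 18, 25, 30],
    "income_thousands": [45, 40, 36, 30, 27],
    "uic": [1, 9, 2, 8, 3],
}
flagged_pairs = highly_correlated(toy)
```

In this toy example, the deprivation, poverty, and income columns are flagged against one another, while the UIC column is not flagged against any of them.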

Using CART analysis, our study identified low-, medium-, and high-risk phenotypes of LSBC consisting of county-level characteristics that were predictive of LSBC. These phenotypes were defined by combinations of indicators of uninsured rate, mammography use, area deprivation, urban-rural status, poverty rate, and per capita income. Among those, the importance of uninsured rate and mammography use was further evidenced by their top rankings in the random forest analysis.

When a smaller number of counties was allowed in a terminal node, additional characteristics came into play in the CART model. Surprisingly, the percentage of racial and ethnic minorities, a factor frequently emphasized in previous studies of LSBC (3, 4, 7, 10–13, 26–28), did not appear in the results of the CART models and was only ranked 13th in the random forest plot. Our study suggests that other contextual factors might have played more critical roles than the constructs of race and ethnicity themselves.

We also recognized the correlations among variables. Some predictors, although they did not appear in CART, could still have important implications because of their strong correlations with the outcome and the splitting variables. For example, several measures of population socioeconomic status were highly correlated with ADI, most of which also had relatively high rankings in the random forest plot. In addition, although adult obesity was not a splitting variable in CART, it was identified as a top-ranking variable in predicting LSBC by the random forest analysis, which is notable because it was the only physiologic factor associated with LSBC.

The spatial distribution of phenotypes shows that LR phenotypes, or those composed of counties with a relatively lower percentage of LSBC, were prevalent in northeastern states, Iowa, and select metropolitan areas, whereas HR phenotypes were mostly observed in the southern states of Georgia, Louisiana, New Mexico, and some rural areas in other states. The unbalanced distribution of phenotypes suggests that there were geospatial disparities in LSBC, and these disparities were strongly associated with population characteristics along multiple dimensions.

The geographic clustering of phenotypes suggests that the association between LSBC and various socioeconomic characteristics may be mediated by geography-related factors. We observed strong differences in the phenotype composition of states, such as northeastern states versus southern and western states. These state-level disparities could be related to differences in state-specific culture and policies. For example, states with stringent eligibility criteria for Medicaid enrollment may observe higher rates of uninsured among people with low socioeconomic status compared with states with less stringent criteria. This was especially the case after 2014, when some states expanded their Medicaid programs under the Patient Protection and Affordable Care Act, while others had not done so by the end of the study period (29).

We also noted within-state variations, especially in California, Kentucky, and Georgia, where counties in large metro areas generally had more favorable phenotypes than their rural neighbors. This may be due to the large differences in demographic and socioeconomic characteristics and availability of resources between rural and urban areas as indicated in earlier studies (9, 13). However, not all urban areas outperformed rural areas in LSBC. For example, several rural counties in northern California had LR phenotypes, while Los Angeles County and San Diego County, the state's two most populous counties, were of the MR phenotype. This suggests that there was a complex relationship between LSBC and urbanicity, which could involve associations with other variables, such as uninsured rates and mammography use as shown in the CART models.

To our knowledge, this is the first study that combines machine learning and geographic methods to explore the association between LSBC and population characteristics using cancer registry data. Prior studies have investigated associations between LSBC and various demographic, socioeconomic, and behavioral factors using geospatial analyses and parametric regression models (2–8, 10–12, 30–32). However, these studies did not evaluate the roles of these factors in specific subgroups of the population. For example, one indicator may be important to a certain group of a population but less important to other subgroups of the population. As indicated in Fig. 2, when mammography use was less than or equal to 68.1%, UIC was able to significantly distinguish LR 2 and LR 3 based on the percentage of LSBC. However, it was no longer used as a splitting node when mammography use was higher. Similarly, when uninsured rate was greater than 11.6%, neither mammography use nor rural-urban status was as important as the other variables that appeared in the right branch of the tree. By using the CART-defined phenotypes, our study identifies correlates that are specific to various subgroups of the population that share common characteristics.

There are several methodologic strengths that lend confidence to the study results. First, compared with parametric regression models (such as logistic regression), CART and random forest methods can deal with a greater number of predictor variables simultaneously without concerns about outliers, multicollinearity, heteroscedasticity, or distributional error structures that affect parametric procedures. Second, both CART and random forest methods are able to handle highly correlated data due to their variable selection and bootstrap sampling strategies. Finally, the identified phenotypes capture both the outcome and top predictor variables, allowing the examination of geographic patterns of LSBC from multiple aspects.

The main limitation of the current study is that individual regions in the study area were geographically disconnected, preventing the use of geospatial analyses designed for contiguous regions, such as the spatial scan statistic (33), the Local Indicators of Spatial Association (LISA; ref. 34), and the recently developed geographically weighted random forest (35, 36). A geospatial analysis on a contiguous study area, such as the contiguous United States, could help us understand the broader scope of the disparities in LSBC and provide insights into why neighboring areas present similar or different patterns. However, this limitation was tempered by the wide distribution of the study area across the country, which covered diverse populations that were comparable with the overall United States. Another limitation is that we were not able to incorporate additional individual-level characteristics in the analysis. Nevertheless, the findings of the study are still valid because the aim of our study was to discover community-level drivers of LSBC disparities.

In summary, our study shows that the use of machine learning and geographic methods is a promising avenue for future disparities research. The findings of our study suggest that the disparities of LSBC are associated with multiple characteristics of the population, and these associations vary greatly across geographies. Local interventions to reduce late-stage diagnosis of breast cancer, such as community education and outreach programs, should consider the characteristics of their communities; thus translational and implementation researchers should consider phenotype-tailored interventions.

W. Dong reports grants from the NCI during the conduct of the study; and grants from Cleveland Clinic Foundation outside the submitted work. U. Kim reports grants from National Center for Advancing Translational Sciences, National Institute of General Medical Sciences, and PhRMA Foundation during the conduct of the study. S.M. Koroukian reports grants from Centers for Disease Control and Prevention, Ohio Medicaid Technical Assistance and Policy Program, NIH, American Cancer Society, and Contracts from Cleveland Clinic Foundation, including a subcontract from Celgene Corporation outside the submitted work; and her spouse has stock ownership in American Renal Associates. No disclosures were reported by the other authors.

W. Dong: Conceptualization, resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. W.P. Bensken: Conceptualization, resources, investigation, methodology, writing–original draft, writing–review and editing. U. Kim: Conceptualization, resources, investigation, methodology, writing–original draft, writing–review and editing. J. Rose: Conceptualization, resources, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing. N.A. Berger: Conceptualization, resources, writing–original draft, writing–review and editing. S.M. Koroukian: Conceptualization, resources, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing.

The authors thank Will Penman of Composition Coaching for his advice and assistance in the earlier planning of this article. This study was funded by a grant from the NCI, Case Comprehensive Cancer Center (P30 CA043703) to S.M. Koroukian, J. Rose, and W. Dong. W. Dong and S.M. Koroukian are supported by a grant from the American Cancer Society (132678-RSGI-19-213-01-CPHPS) and by contracts from Cleveland Clinic Foundation, including a subcontract from Celgene Corporation. S.M. Koroukian is also supported by grants from the NIH (R15 NR017792, and UH3-DE025487) and the American Cancer Society (RWIA-20-111-02 RWIA). W.P. Bensken is supported by a grant from the National Institute on Minority Health and Health Disparities (F31MD015681). U. Kim is supported by grants from the National Institute of General Medical Sciences (5T32GM007250), National Center for Advancing Translational Sciences (5TL1TR002549), and the PhRMA Foundation (PDHO18). J. Rose is supported by grants from the National Institute of Dental and Craniofacial Research (1UH2DE025487-01), the National Heart Lung and Blood Institute (R01 HL153175), and the American Cancer Society (RWIA-20-111-02 RWIA). N.A. Berger is supported by grants from the NCI (2P50 CA150964, 2U54CA163060, P20CA233216, R25CA221718, and R25CA225461).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin 2021;71:7–33.
2. MacKinnon JA, Duncan RC, Huang Y, Lee DJ, Fleming LE, Voti L, et al. Detecting an association between socioeconomic status and late stage breast cancer using spatial analysis and area-based measures. Cancer Epidemiol Biomarkers Prev 2007;16:756–62.
3. Wang F, Luo L, McLafferty S. Healthcare access, socioeconomic factors and late-stage cancer diagnosis: an exploratory spatial analysis and public policy implication. Int J Public Pol 2010;5:237–58.
4. Spada NG, Geramita EM, Zamanian M, van Londen GJ, Sun Z, Sabik LM. Changes in disparities in stage of breast cancer diagnosis in Pennsylvania after the Affordable Care Act. J Women's Health 2021;30:324–31.
5. Anderson RT, Yang TC, Matthews SA, Camacho F, Kern T, Mackley HB, et al. Breast cancer screening, area deprivation, and later-stage breast cancer in Appalachia: does geography matter? Health Serv Res 2014;49:546–67.
6. Huang B, Dignan M, Han D, Johnson O. Does distance matter? Distance to mammography facilities and stage at diagnosis of breast cancer in Kentucky. J Rural Health 2009;25:366–71.
7. Henry KA, Boscoe FP, Johnson CJ, Goldberg DW, Sherman R, Cockburn M. Breast cancer stage at diagnosis: is travel time important? J Community Health 2011;36:933–42.
8. Onitilo AA, Liang H, Stankowski RV, Engel JM, Broton M, Doi SA, et al. Geographical and seasonal barriers to mammography services and breast cancer stage at diagnosis. Rural Remote Health 2014;14:2738.
9. Chandak A, Nayar P, Lin G. Rural-urban disparities in access to breast cancer screening: a spatial clustering analysis. J Rural Health 2019;35:229–35.
10. Wang F, McLafferty S, Escamilla V, Luo L. Late-stage breast cancer diagnosis and health care access in Illinois. Prof Geogr 2008;60:54–69.
11. Kuo TM, Mobley LR, Anselin L. Geographic disparities in late-stage breast cancer diagnosis in California. Health Place 2011;17:327–34.
12. Barry J, Breen N, Barrett M. Significance of increasing poverty levels for determining late-stage breast cancer diagnosis in 1990 and 2000. J Urban Health 2012;89:614–27.
13. McLafferty S, Wang F. Rural reversal? Rural-urban disparities in late-stage cancer risk in Illinois. Cancer 2009;115:2755–64.
14. Surveillance, Epidemiology, and End Results (SEER) Program SEER*Stat database: incidence - SEER Research plus limited-field data, 21 registries, Nov 2020 Sub (2000–2018) - linked to county attributes - total U.S., 1969–2019 counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021, based on the November 2020 submission; 2021.
15. Surveillance, Epidemiology, and End Results (SEER) Program. Characteristics of the SEER population compared with the total United States population; 2021. Available from: https://seer.cancer.gov/registries/characteristics.html.
16. Surveillance, Epidemiology, and End Results (SEER) Program. About the SEER registries; 2021. Available from: https://seer.cancer.gov/registries/.
17. American Community Survey (ACS); 2021. Available from: https://www.census.gov/programs-surveys/acs.
18. County Health Rankings & Roadmaps (CHR); 2021. Available from: https://www.countyhealthrankings.org/.
19. Area Health Resources Files (AHRF); 2021. Available from: https://data.hrsa.gov/topics/health-workforce/ahrf.
20. Behavioral Risk Factor Surveillance System (BRFSS); 2021. Available from: https://www.cdc.gov/brfss/index.html.
21. U.S. Food and Drug Administration (FDA); 2021. Available from: https://www.fda.gov/.
22. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Statist 2006;15:651–74.
23. Ryo M, Rillig MC. Statistically reinforced machine learning for nonlinear patterns and variable interactions. Ecosphere 2017;8.
24. Breiman L. Random forests. Mach Learn 2001;45:5–32.
25. Singh GK. Area deprivation and widening inequalities in US mortality, 1969–1998. Am J Public Health 2003;93:1137–43.
26. Lannin DR, Mathews HF, Mitchell J, Swanson MS, Swanson FH, Edwards MS. Influence of socioeconomic and cultural factors on racial differences in late-stage presentation of breast cancer. JAMA 1998;279:1801–7.
27. Jones BA, Kasl SV, Curnen MGM, Owens PH, Dubrow R. Severe obesity as an explanatory factor for the black/white difference in stage at diagnosis of breast cancer. Am J Epidemiol 1997;146:394–404.
28. Smigal C, Jemal A, Ward E, Cokkinides V, Smith R, Howe HL, et al. Trends in breast cancer by race and ethnicity: update 2006. CA Cancer J Clin 2006;56:168–83.
29. Le Blanc JM, Heller DR, Friedrich A, Lannin DR, Park TS. Association of Medicaid expansion under the Affordable Care Act with breast cancer stage at diagnosis. JAMA Surg 2020;155:752–8.
30. Tatalovich Z, Zhu L, Rolin A, Lewis DR, Harlan LC, Winn DM. Geographic disparities in late stage breast cancer incidence: results from eight states in the United States. Int J Health Geogr 2015;14:31.
31. McElroy JA, Remington PL, Gangnon RE, Hariharan L, Andersen LD. Identifying geographic disparities in the early detection of breast cancer using a geographic information system. Prev Chronic Dis 2006;3:A10.
32. de Oliveira NPD, de Camargo Cancela M, Martins LFL, de Souza DLB. A multilevel assessment of the social determinants associated with the late stage diagnosis of breast cancer. Sci Rep 2021;11:2712.
33. Kulldorff M. A spatial scan statistic. Commun Stat Theory Methods 1997;26:1481–96.
34. Anselin L. Local indicators of spatial association—LISA. Geographical Analysis 1995;27:93–115.
35. Georganos S, Grippa T, Gadiaga AN, Linard C, Lennert M, Vanhuysse S, et al. Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int 2019;36:121–36.
36. Luo Y, Yan J, McClure S. Distribution of the environmental and socioeconomic risk factors on COVID-19 death rate across continental USA: a spatial nonlinear analysis. Environ Sci Pollut Res Int 2021;28:6587–99.
37. Gusberg SB. The gynecologist and breast cancer. Isr J Med Sci 1981;17:843–6.
38. Frank E, Rimer BK, Brogan D, Elon L. US women physicians' personal and clinical breast cancer screening practices. J Womens Health Gend Based Med 2000;9:791–801.
39. Nguyen-Pham S, Leung J, McLaughlin D. Disparities in breast cancer stage at diagnosis in urban and rural adult women: a systematic review and meta-analysis. Ann Epidemiol 2014;24:228–35.
40. Meyer CP, Allard CB, Sammon JD, Hanske J, McNabb-Baltar J, Goldberg JE, et al. The impact of Medicare eligibility on cancer screening behaviors. Prev Med 2016;85:47–52.
41. Coughlin SS. Social determinants of breast cancer risk, stage, and survival. Breast Cancer Res Treat 2019;177:537–48.
42. Nguyen BC, Alawadi ZM, Roife D, Kao LS, Ko TC, Wray CJ. Do socioeconomic factors and race determine the likelihood of breast-conserving surgery? Clin Breast Cancer 2016;16:e93–7.
43. Berrian JL, Liu Y, Lian M, Schmaltz CL, Colditz GA. Relationship between insurance status and outcomes for patients with breast cancer in Missouri. Cancer 2021;127:931–7.
44. Zha N, Alabousi M, Patel BK, Patlas MN. Beyond universal health care: barriers to breast cancer screening participation in Canada. J Am Coll Radiol 2019;16:570–9.
45. Shapiro JA, Seeff LC, Nadel MR. Colorectal cancer-screening tests and associated health behaviors. Am J Prev Med 2001;21:132–7.
46. Mu L, Mukamal KJ. Alcohol consumption and rates of cancer screening: is cancer risk overestimated? Cancer Causes Control 2016;27:281–9.
47. Hoffman SD, Maynard RA, editors. Kids having kids: economic costs and social consequences of teen pregnancy. 2nd ed. Washington, DC: Urban Institute Press; 2008.
48. SmithBattle L, Freed P. Teen mothers' mental health. MCN Am J Matern Child Nurs 2016;41:31–6.
49. Griggs MJ, Walker R. The costs of child poverty for individuals and society: a literature review; 2008.
50. Bailey BA, Byrom AR. Factors predicting birth weight in a low-risk sample: the role of modifiable pregnancy health behaviors. Matern Child Health J 2007;11:173–9.
51. Cui Y, Whiteman MK, Flaws JA, Langenberg P, Tkaczuk KH, Bush TL. Body mass and stage of breast cancer at diagnosis. Int J Cancer 2002;98:279–83.
52. Mahmood A, Kedia S, Dillon P, Arshad H, Ray M. Food security status and breast cancer screening among women in the United States: evidence from Health and Retirement Study and Health Care and Nutrition Study. Research Square 2021.
53. Gage-Bouchard EA. Social support, flexible resources, and health care navigation. Soc Sci Med 2017;190:111–8.
54. Tarlov E, Zenk SN, Campbell RT, Warnecke RB, Block R. Characteristics of mammography facility locations and stage of breast cancer at diagnosis in Chicago. J Urban Health 2009;86:196–213.
55. Peek ME, Sayad JV, Markwardt R. Fear, fatalism and breast cancer screening in low-income African-American women: the role of clinicians and the health care system. J Gen Intern Med 2008;23:1847–53.
56. Weinmann S, Taplin SH, Gilbert J, Beverly RK, Geiger AM, Yood MU, et al. Characteristics of women refusing follow-up for tests or symptoms suggestive of breast cancer. JNCI Monographs 2005;2005:33–8.
57. Davidson PL, Bastani R, Nakazono TT, Carreon DC. Role of community risk factors and resources on breast carcinoma stage at diagnosis. Cancer 2005;103:922–30.
58. Crawford J, Ahmad F, Beaton D, Bierman AS. Cancer screening behaviours among South Asian immigrants in the UK, US and Canada: a scoping study. Health Soc Care Community 2016;24:123–53.
59. Roche LM, Niu X, Stroup AM, Henry KA. Research full report: disparities in female breast cancer stage at diagnosis in New Jersey: a spatial-temporal analysis. J Public Health Manag Pract 2017;23:477–86.
60. Liu Y, Zhang J, Huang R, Feng WL, Kong YN, Xu F, et al. Influence of occupation and education level on breast cancer stage at diagnosis, and treatment options in China: a nationwide, multicenter 10-year epidemiological study. Medicine 2017;96:e6641.
61. Pudrovska T, Carr D, McFarland M, Collins C. Higher-status occupations and breast cancer: a life-course stress approach. Soc Sci Med 2013;89:53–61.
62. Jelleyman T, Spencer N. Residential mobility in childhood and health outcomes: a systematic review. J Epidemiol Community Health 2008;62:584–92.
63. Wohlfahrt J, Andersen PK, Mouridsen HT, Melbye M. Risk of late-stage breast cancer after a childbirth. Am J Epidemiol 2001;153:1079–84.