Background:

Work is needed to better understand how joint exposure to environmental and economic factors influences cancer. We hypothesize that environmental exposures vary with socioeconomic status (SES) and urban/rural locations, and that areas with minority populations coincide with high economic disadvantage and pollution.

Methods:

To model joint exposure to pollution and SES, we develop a latent class mixture model (LCMM) with three latent variables (SES Advantage, SES Disadvantage, and Air Pollution) and compare the LCMM fit with K-means clustering. We ran an ANOVA to test for high exposure levels in non-Hispanic black populations. The analysis is at the census tract level for the state of North Carolina.

Results:

The LCMM was a better and more nuanced fit to the data than K-means clustering. Our LCMM had two sublevels (low, high) within each latent class. The worst levels of exposure (high SES disadvantage, low SES advantage, high pollution) are found in 22% of census tracts, while the best levels (low SES disadvantage, high SES advantage, low pollution) are found in 5.7%. Overall, 34.1% of the census tracts exhibit high disadvantage, 66.3% have low advantage, and 59.2% have high mixtures of toxic pollutants. Areas with higher SES disadvantage had significantly higher non-Hispanic black population density (NHBPD; P < 0.001), and NHBPD was higher in areas with higher pollution (P < 0.001).

Conclusions:

Joint exposure to air toxins and SES varies with rural/urban location and coincides with minority populations.

Impact:

Our model can be extended to provide a holistic modeling framework for estimating disparities in cancer survival.

See all articles in this CEBP Focus section, “Environmental Carcinogenesis: Pathways to Prevention.”

There is growing awareness that people are exposed to numerous environmental and social factors and that all of these may interact to influence an individual's susceptibility to disease or poor health. Our objective is to quantify the relationship between socioeconomic status (SES) and air quality and develop a framework for understanding the interplay of these factors and their relationship to disparities in cancer outcomes. Both factors are known to be associated with human health outcomes. The health disparities associated with low SES are wide ranging and include obesity (1), type 2 diabetes (2), and cancer (3–9). Negative health outcomes linked to air pollution include respiratory (10, 11) and cardiovascular (12–14) disease, increased rates of mortality (15, 16), and various cancers. Fine particulate matter (PM), volatile organic compounds (VOC), and other traffic-related air toxins have been linked to lung cancer (17–19); heavy metals (HM) have been implicated in tumor formation (20–22).

Several recent studies investigate the interplay of SES, air quality, and health. Coker and colleagues illustrate how pollution, built environment, and SES influence adverse birth outcomes in Los Angeles using Bayesian profile regression (23). Chi and colleagues use Cox proportional hazard models to study SES and the association between air pollution and cardiovascular disease (24). Vieira and colleagues also use Cox proportional hazard models to show the impact of SES and air pollution on geographic disparities in ovarian cancer survival in California (25). Weaver and colleagues illustrate the impact of fine PM and SES on cardiovascular health and diabetes by using Ward hierarchical clustering to identify clusters of SES factors and showing that areas of relative SES disadvantage have stronger associations between air pollution and negative health outcomes than areas of SES advantage (26).

In addition to studying air pollution in conjunction with SES when examining health outcomes, there has also been increased interest in the impact of multipollution exposure (MPE) on poor health. Recent studies highlight that measuring a single pollutant may not adequately capture the total environmental burden individuals face in a community (27–32), and methodologies have emerged to model exposure to multiple air toxins. For example, Keller and colleagues use a K-means clustering method to establish MPE for cohorts (33). Lenters and colleagues study the effects of MPE on birth weight using elastic nets (34). Zanobetti and colleagues present a clustering method to show how multipollutant mixtures impact mortality (35). Pirani and colleagues use a Bayesian approach to model MPE (36).

We present a latent class mixture model (LCMM) to quantify the relationship between SES and MPE. LCMMs are used to understand the latent grouping mechanisms behind correlated variables, such as pollutants and SES variables. Once latent classes or groupings are accounted for, the variables are assumed to be conditionally independent (37). The advantage of LCMMs is that they are easily visualized to demonstrate interrelationships of measures and can generate flexible summaries of many variables. For example, using LCMMs allows us to take a multitude of pollutants and derive discrete classifications that describe the joint impact of those pollutants. LCMMs have been used previously in epidemiologic studies (38), including those that model the impacts of MPE on human health (39) as well as SES on health (40). To our knowledge, this is the first application of LCMMs to study the interaction of MPE and SES together.
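The conditional independence assumption can be made concrete for the simplest case of a single categorical latent variable. The notation here is ours and is introduced only for illustration: $\mathbf{y}_i$ is the vector of $J$ observed indicators for census tract $i$, $C$ is the number of latent classes, $\pi_c$ is the prevalence of class $c$, and $f_j(\cdot \mid \boldsymbol{\theta}_{jc})$ is the class-specific density of indicator $j$. The model described in this paper has three correlated latent variables, so this is a simplified sketch rather than the fitted model.

```latex
% Mixture density under conditional independence within each latent class
f(\mathbf{y}_i) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \boldsymbol{\theta}_{jc}\right),
\qquad \sum_{c=1}^{C} \pi_c = 1 .

% Posterior probability that tract i belongs to class c (Bayes' rule)
P(c \mid \mathbf{y}_i) \;=\;
\frac{\pi_c \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \boldsymbol{\theta}_{jc}\right)}
     {\sum_{c'=1}^{C} \pi_{c'} \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \boldsymbol{\theta}_{jc'}\right)} .
```

The product over $j$ is where conditional independence enters: within a class, the indicators contribute separately, and any correlation among them in the pooled data is attributed to class membership.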

We develop an LCMM to quantify joint levels of exposure to pollution mixtures and SES at the census tract level in North Carolina (NC). We use this model to examine racial disparities in exposure and differences in urban versus rural communities. To highlight the strengths of using LCMM over other clustering methods, we compare our model with K-means clustering. While K-means clustering is a simple and effective algorithm for clustering correlated observations into groups, LCMM provides the option to incorporate expert knowledge by specifying the relationships between the observed variables and the latent variables. Furthermore, we present a framework that builds upon this research to delineate disparities in cancer outcomes. We use publicly available data on pollutants and indicators of SES. We focus on air toxins such as diesel PM, VOCs, and HMs. We collect social and economic variables known to be indicative of either SES advantage or disadvantage (40, 41). We chose NC as the study domain because of its diversity in SES, its presence of both rural and urban counties, and its heterogeneity in air pollution. We identify vulnerable populations in NC at the census tract level and gain a better understanding of the interplay between SES and MPE.

We acquired data on air pollutants from the U.S. Environmental Protection Agency's (EPA) National Air Toxics Assessment (NATA; ref. 42). The NATA has been conducted approximately every 3 years since 1996. For this analysis, we used the most recent data from 2014. In the NATA, the EPA estimates ambient concentrations of air toxins via air quality models that use emissions data from the National Emissions Inventory along with meteorologic and other data as inputs (43). The result is spatially resolved numerically modeled estimates of multiple pollutants.

The NATA provides estimates of hazardous air pollutants (HAP) and diesel PM. HAPs, a class of pollutants that are considered carcinogenic and associated with negative health outcomes, are regulated under the 1990 Clean Air Act (42). Diesel PM is included in the NATA as an air toxic, but not considered a HAP under the 1990 Clean Air Act (43). However, some scientific evidence suggests a possible link between exposure to pollution from heavy traffic and breast cancer, so we include diesel PM in our model (44). In addition, we include 11 VOCs [acetaldehyde, benzene, 1,3-butadiene, carbon tetrachloride, ethylbenzene, formaldehyde, hexane, methanol, methyl chloride (chloromethane), toluene, and xylene (mixed isomers)] and four HMs (lead, manganese, mercury, and nickel). Maps of each pollutant in NC, where each pollutant is presented as a percentile in relation to NC, are presented in Supplementary Figs. S1–S3.

Our socioeconomic data are from the American Community Survey (ACS), which has been conducted by the U.S. Census Bureau since 2005. We used the 1-year 2014 survey to align with the NATA data timeframe (45). We selected variables from the ACS that have been found to reflect both socioeconomic advantage and disadvantage (40). Palumbo and colleagues show that SES factors can be distinguished by latent advantage/disadvantage groups, providing a more flexible and nuanced way to describe SES as compared with other continuous metrics such as the neighborhood SES deprivation index (e.g., NSES). We downloaded the data using the R package tidycensus. The SES Disadvantage variables (40) include data on households such as the proportion of households with a single head of house or a female head of household and the proportion of people renting in a given census tract. We also report the proportion of crowded households, that is, houses having more than one person per room, which is generally considered a sign of overcrowded housing (46). We also include the proportion of people without a vehicle, with phone service, below the poverty line, relying on public assistance, and unemployed. The SES Disadvantage latent group includes data on race/ethnicity via the proportion of non-Hispanic black individuals in a given census tract. The SES Advantage variables are related to education and profession (40) and include the proportion of males and females in a professional occupation, and the proportion of people with less than a high school education. Maps of each SES variable in NC are provided in the Supplementary Materials (Supplementary Figs. S4 and S5).

LCMM

We employ LCMMs to investigate how pollution and SES interact by identifying latent classes underlying the observed data. Our model has three latent variables: SES Disadvantage, SES Advantage, and Toxins and Pollutants. We allowed covariance between the three latent variables. Our LCMM is illustrated in Fig. 1.

Figure 1.

LCMM for SES and Toxins and Pollutants. The variables in squares denote observed continuous variables and those in circles denote the categorical latent class variables. The arrows describe the relationships between the variables and latent classes.

To preprocess the data, we applied square root and log transformations to the SES and pollution data, respectively, to achieve normality for each group of variables because the SES variables are proportions, whereas the pollution data are continuous.
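The two transformations can be sketched as follows. This is a minimal Python illustration rather than the authors' R workflow; the function names and the example values are hypothetical, and the use of the natural logarithm is an assumption because the text does not state the base.

```python
import math

def sqrt_transform(proportions):
    """Square-root transform for proportion-type SES indicators in [0, 1]."""
    if any(p < 0 or p > 1 for p in proportions):
        raise ValueError("proportions must lie in [0, 1]")
    return [math.sqrt(p) for p in proportions]

def log_transform(concentrations):
    """Natural-log transform (assumed base) for strictly positive concentrations."""
    if any(c <= 0 for c in concentrations):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(c) for c in concentrations]

# Hypothetical values for three census tracts, for illustration only.
renting = [0.25, 0.56, 0.81]       # proportion of people renting
diesel_pm = [0.10, 0.31, 1.00]     # diesel PM concentration

renting_t = sqrt_transform(renting)
diesel_t = log_transform(diesel_pm)
```

The square root stretches small proportions apart, while the log compresses the long right tail typical of concentration data; both are applied variable by variable before model fitting.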

We ran several versions of the model, each with a different number of classes for each latent variable. For assessing model fit, typical metrics include model convergence; fit statistics such as BIC, AIC, and entropy; and class membership, that is, the percentage of census tracts classified under each level. To assess uncertainty in the best-fit model, we examine the probability of a census tract being assigned to a given level; if this probability is high, we can conclude that the assignment has low uncertainty.
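For readers unfamiliar with these fit statistics, the sketch below computes them from a log-likelihood and a matrix of posterior class probabilities. It uses the standard textbook definitions of AIC and BIC and the relative (normalized) entropy commonly reported by mixture-modeling software; we assume, but cannot confirm from the text, that these match the quantities the authors examined. All inputs are hypothetical.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_lik

def relative_entropy(posteriors):
    """Normalized entropy in [0, 1]; values near 1 mean crisp class assignment."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:                      # 0 * log(0) is treated as 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(n_classes))

# Hypothetical inputs: one perfectly classified tract and one ambiguous tract.
example_posteriors = [[1.0, 0.0], [0.5, 0.5]]
example_entropy = relative_entropy(example_posteriors)
example_aic = aic(log_lik=-100.0, n_params=5)
example_bic = bic(log_lik=-100.0, n_params=5, n_obs=100)
```

Lower AIC and BIC indicate a better trade-off between fit and complexity, whereas higher relative entropy indicates better class separation.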

We used MPlus [v 1.5 (1)] to run the analysis; it uses maximum likelihood estimation to estimate the level of exposure for each census tract. We ran the analysis from within R using the R package MplusAutomation, which calls MPlus from R to facilitate preprocessing and postprocessing of data from these models. To evaluate the presence of racial disparities in our data, we ran an ANOVA of non-Hispanic black population density versus the low/high SES Disadvantage, SES Advantage, and pollution latent classes using the R function lm().
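The logic of the ANOVA step can be illustrated with a one-way F statistic comparing an outcome across the low and high levels of a single latent class. This Python sketch is not the authors' lm() call, which included all three latent classes; the function name and the data are hypothetical, and the P value (which requires the F distribution) is deliberately left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Residual variation within each group.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical non-Hispanic black population proportions by disadvantage level.
low_disadvantage = [0.10, 0.20, 0.30]
high_disadvantage = [0.40, 0.50, 0.60]
f_stat, df_b, df_w = one_way_anova_f([low_disadvantage, high_disadvantage])
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the group means differ by more than within-group variation would explain.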

We also conducted K-means clustering on the SES and pollution variables as a comparison. We tested up to 15 clusters using the kmeans() function in R. We evaluated each model using NbClust(), which utilizes several metrics to choose the number of clusters that best fits the data. We computed AIC and BIC to compare the fit of the best K-means model with the LCMM.
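To show what the comparison method does, here is a minimal sketch of Lloyd's algorithm for K-means in plain Python. It is not the R kmeans() implementation used in the study, it does not select the number of clusters as NbClust() does, and the points are hypothetical two-variable tracts chosen to be well separated.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # initialize from k distinct points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:           # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    wss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, wss

# Hypothetical, well-separated data: three "low" and three "high" tracts.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers, wss = kmeans(points, k=2)
```

Unlike the LCMM, this procedure returns a hard label for every point and no probability of membership, which is the limitation the comparison is meant to highlight.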

We present summary statistics of each SES and pollution variable for a given census tract in NC as compared with the United States in Table 1. For both SES Advantage and SES Disadvantage, the NC averages are higher than the national average. For the pollutant variables, the opposite is true of most VOCs (1,3-butadiene, benzene, ethylbenzene, hexane, methyl chloride, and xylenes), diesel PM, lead compounds, mercury compounds, and nickel compounds. However, acetaldehyde, formaldehyde, and methanol are higher in NC than the national average.

Table 1.

We compared SES and pollution levels in NC with nationwide levels by examining summary statistics (mean, SD, and IQR) of the latent and observed variables.

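| Latent variable | Observed variable | US mean | US SD | US IQR | NC mean | NC SD | NC IQR |
|---|---|---|---|---|---|---|---|
| SES Disadvantage | Single parent | 0.29 | 0.16 | 0.21 | 0.52 | 0.14 | 0.19 |
| SES Disadvantage | Female head of house | 0.21 | 0.14 | 0.17 | 0.45 | 0.14 | 0.18 |
| SES Disadvantage | Crowded household | 0.04 | 0.05 | 0.04 | 0.13 | 0.09 | 0.12 |
| SES Disadvantage | Renting | 0.35 | 0.23 | 0.32 | 0.56 | 0.17 | 0.24 |
| SES Disadvantage | Phone service | 0.97 | 0.03 | 0.03 | 0.99 | 0.01 | 0.01 |
| SES Disadvantage | Below poverty line | 0.23 | 0.31 | 0.37 | 0.39 | 0.39 | 0.76 |
| SES Disadvantage | Public assistance | 0.03 | 0.03 | 0.03 | 0.12 | 0.08 | 0.09 |
| SES Disadvantage | Unemployed | 0.06 | 0.03 | 0.04 | 0.25 | 0.07 | 0.08 |
| SES Disadvantage | No vehicle | 0.10 | 0.13 | 0.09 | 0.24 | 0.12 | 0.15 |
| SES Disadvantage | Non-Hispanic black | 0.13 | 0.22 | 0.14 | 0.40 | 0.23 | 0.35 |
| SES Advantage | Males, professional | 0.07 | 0.07 | 0.08 | 0.20 | 0.15 | 0.20 |
| SES Advantage | Females, professional | 0.06 | 0.06 | 0.07 | 0.19 | 0.13 | 0.17 |
| SES Advantage | < High school education | 0.14 | 0.11 | 0.13 | 0.36 | 0.13 | 0.19 |
| Pollutants | 1,3-Butadiene (VOC) | 0.03 | 0.03 | 0.03 | 0.02 | 0.01 | 0.02 |
| Pollutants | Acetaldehyde (VOC) | 1.12 | 0.39 | 0.48 | 1.43 | 0.25 | 0.25 |
| Pollutants | Benzene (VOC) | 0.42 | 0.19 | 0.24 | 0.37 | 0.11 | 0.18 |
| Pollutants | Carbon tetrachloride (VOC) | 0.53 | 0.02 | 0.02 | 0.53 | 0.01 | 0.01 |
| Pollutants | Ethylbenzene (VOC) | 0.14 | 0.10 | 0.40 | 0.10 | 0.05 | 0.09 |
| Pollutants | Formaldehyde (VOC) | 1.38 | 0.42 | 0.12 | 1.65 | 0.26 | 0.34 |
| Pollutants | Hexane (VOC) | 0.25 | 0.27 | 0.65 | 0.15 | 0.09 | 0.09 |
| Pollutants | Methanol (VOC) | 1.36 | 0.76 | 0.21 | 1.47 | 0.36 | 0.45 |
| Pollutants | Methyl chloride (VOC) | 1.15 | 0.23 | 0.69 | 1.14 | 0.19 | 0.09 |
| Pollutants | Toluene (VOC) | 1.04 | 0.80 | 0.14 | 0.79 | 0.37 | 0.61 |
| Pollutants | Xylenes (VOC) | 0.49 | 0.37 | 0.91 | 0.36 | 0.20 | 0.34 |
| Pollutants | Diesel PM | 0.48 | 0.39 | 0.47 | 0.31 | 0.16 | 0.22 |
| Pollutants | Lead compounds (HM) | 0.0008 | 0.0015 | 0.0005 | 0.00047 | 0.0005 | 0.00021 |
| Pollutants | Manganese compounds (HM) | 0.0013 | 0.0035 | 0.0009 | 0.0016 | 0.0018 | 0.0008 |
| Pollutants | Mercury compounds (HM) | 0.0018 | 0.0005 | 0.0002 | 0.0017 | 0.00031 | 0.00018 |
| Pollutants | Nickel compounds (HM) | 0.0006 | 0.0018 | 0.0005 | 0.00025 | 0.00014 | 0.000082 |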

Note: For SES data, the mean represents the average proportion of people or households. For the pollution data, the mean represents the average concentration (μg/m3).

Abbreviation: IQR, interquartile range.

The best-fit model for NC comprised mixtures of two levels of Pollutants, two levels of SES Advantage, and two levels of SES Disadvantage, for a total of eight levels of exposure. We modeled the SES latent variables with two levels to reflect the findings in the Palumbo study (40). We ran several models with varying numbers of levels within the Pollutants latent variable. We found that including more than two levels resulted in model nonconvergence, so we limited our final model to two levels for the pollutant latent variable.

We estimated the mean of each observed variable under each of the eight levels and conclude that the levels or classes within each latent variable can be characterized as low and high. The latent variable profiles are displayed in Fig. 2 as bar plots, where the pollution variables are expressed as percentiles and the SES variables are expressed as percents. In the low pollution level, the majority of the pollutants fall around the 20th to 25th percentile, except for mercury, nickel, and methyl chloride (40th–50th percentile). In the high pollution level, almost every pollutant at least doubles; methyl chloride hardly changes, and mercury and nickel only increase roughly to the 60th percentile.

Figure 2.

Barplots of the estimated average value of the variables in the SES Disadvantage, SES Advantage, and Pollution categories for each latent class. We express each estimated average value as percentages for the SES variables and percentiles for the pollutants.

In areas with low SES Advantage, on average, the percentage of males and females in professional occupations is estimated to be low (<5%), while the percentage of people who did not graduate from high school is estimated to be close to 20%. Conversely, in high SES Advantage areas, the estimated percentage of people in professional occupations rose to roughly 10% for both males and females, while the percentage of the population without a high school degree dropped to around 5%. Examining the SES Disadvantage latent variable, the model estimates that the following variables are found in higher percentages in the high SES Disadvantage class than in the low SES Disadvantage class: no vehicle, public assistance, unemployed, crowded housing, renting, single householder, female householder, and below poverty line. Phone service was the only SES Disadvantage variable that barely differed between the two classes: it is close to 100% in high disadvantage areas and only slightly higher in low disadvantage areas.

We predicted the level of exposure for each census tract, based on the highest predicted class probability for each tract. Overall, 34.1% of the census tracts have high SES Disadvantage, 66.3% have low SES Advantage, and 59.2% have high mixtures of air pollutants. Our analysis shows that rural areas are sometimes observed under the highest levels of pollution, in addition to metropolitan areas. Areas of high SES Advantage are concentrated in urban/suburban and coastal areas, while SES Disadvantage dominates the eastern half of the state as well as many urban areas. Table 2 displays the number of census tracts in each level, as well as the number of those that are rural. In the “Census tracts” column, the values down the column all sum to the total number of census tracts in NC (N = 2,174). In the “Rural census tracts” column, each row is the percentage of census tracts in a given exposure level that are classified as rural. For example, in exposure Level 1, 162 of the 216 census tracts are rural, that is, 75%.
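The assignment rule and the two kinds of percentages described above can be reproduced with a few lines of Python. The counts are those reported in Table 2; the helper names and the example posterior vector are hypothetical.

```python
def modal_class(posterior):
    """Index of the class with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])

def percent(part, whole):
    """Percentage rounded to one decimal; 0.0 when the denominator is empty."""
    return round(100.0 * part / whole, 1) if whole else 0.0

# Table 2 counts: exposure level -> (census tracts, rural census tracts).
table2 = {
    1: (216, 162), 2: (0, 0), 3: (547, 321), 4: (124, 51),
    5: (561, 11), 6: (201, 29), 7: (47, 1), 8: (478, 60),
}

total_tracts = sum(tracts for tracts, _ in table2.values())

# "Census tracts" column: share of all NC tracts in each level.
share_of_state = {lvl: percent(t, total_tracts) for lvl, (t, _) in table2.items()}

# "Rural census tracts" column: share of each level's tracts that are rural.
rural_share = {lvl: percent(r, t) for lvl, (t, r) in table2.items()}

# Hypothetical posterior for one tract over three classes (0-indexed).
example_assignment = modal_class([0.1, 0.7, 0.2])
```

Note that the two columns use different denominators: the first divides by the statewide total of 2,174 tracts, and the second divides by the number of tracts within the same level.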

Table 2.

The LCMM predicts the level of joint exposure to pollutants, socioeconomic advantage, and socioeconomic disadvantage for each census tract in NC.

| Level | Description of exposure | Census tracts | Rural census tracts |
|---|---|---|---|
| 1 | High Disadvantage, Low Advantage, and Low Pollution | 216 (9.9%) | 162 (75%) |
| 2 | High Disadvantage, High Advantage, and Low Pollution | 0 (0%) | 0 (0%) |
| 3 | Low Disadvantage, Low Advantage, and Low Pollution | 547 (25.2%) | 321 (58.7%) |
| 4 | Low Disadvantage, High Advantage, and Low Pollution | 124 (5.7%) | 51 (41.1%) |
| 5 | Low Disadvantage, High Advantage, and High Pollution | 561 (25.8%) | 11 (2.0%) |
| 6 | Low Disadvantage, Low Advantage, and High Pollution | 201 (9.2%) | 29 (14.4%) |
| 7 | High Disadvantage, High Advantage, and High Pollution | 47 (2.2%) | 1 (2.1%) |
| 8 | High Disadvantage, Low Advantage, and High Pollution | 478 (22.0%) | 60 (12.6%) |

Note: This table shows the number of census tracts predicted to belong to each level of exposure. It also displays the number of these tracts that are classified as nonmetropolitan or rural. In the “Census tracts” column, values down the column sum to the total number of census tracts in NC (N = 2,174) and the percentages represent a percentage of that total. In the “Rural census tracts” column, each row is the percentage of census tracts in a given exposure level that are classified as rural. We defined rural tracts via the Federal Office of Rural Health Policy (47).

These patterns are visible in Fig. 3, which displays the estimated exposure level for each census tract in NC. Level 1 (high disadvantage, low advantage, and low pollution) dominates the rural areas of the eastern half of the state and comprises 9.9% of the census tracts. We did not observe any census tracts in Level 2 (high disadvantage, high advantage, and low pollution) because the combination of high disadvantage and high advantage is rare (40).

Figure 3.

Estimated latent class membership for each census tract in NC (2014). Each tract is categorized according to level of socioeconomic disadvantage (Disad.), advantage (Adv.), and multipollutant exposures (Poln.). Tracts without available data are denoted as NA.

Level 3 (low disadvantage, low advantage, and low pollution) dominates the rural areas of the western half of the state and comprises 25.2% of the census tracts. The most favorable level, Level 4 (low disadvantage, high advantage, and low pollution), comprises only 5.7% of the tracts and is found only in suburban areas throughout the state and on the coast.

Level 5 (low disadvantage, high advantage, and high pollution) is the most prevalent class, with 25.8% membership, and is concentrated in urban areas across NC. Level 6 (low disadvantage, low advantage, and high pollution) is found in a few urban areas in the central part of the state, as well as suburban mountain and inner coastal areas, and comprises 9.2% of the census tracts. The smallest nonempty exposure level is Level 7 (high disadvantage, high advantage, and high pollution), with only 2.2% membership; it is found exclusively in urban centers throughout NC. The most toxic class, Level 8 (high disadvantage, low advantage, and high pollution), can be found in 22% of the census tracts and is largely located in the urban areas of central NC as well as scattered in rural mountain, inner coastal, and coastal areas. This is one of the few levels to associate high pollution with rural areas.

To assess the uncertainty of these predictions, we examined the probabilities of a census tract being assigned to a given level. In the Supplementary Materials (Supplementary Fig. S6), we present a box plot which shows the distribution of assignment probabilities for seven of the eight latent variable levels. Level 2 is not represented because in our analysis, no census tracts were assigned to Level 2. In the case of all levels except Level 7, the median is above 0.80, leading us to believe that there is low uncertainty in the results of our analysis for these levels. The median for Level 7 is approximately 0.75.
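The uncertainty summary described above amounts to taking, for each tract, the posterior probability of its assigned level and then summarizing those probabilities by level. A minimal Python sketch follows; the function name and the posterior probabilities are hypothetical, and levels are 0-indexed here rather than numbered 1 to 8 as in the text.

```python
from collections import defaultdict
from statistics import median

def assignment_confidence(posteriors):
    """Median posterior probability of the assigned level, grouped by level."""
    by_level = defaultdict(list)
    for probs in posteriors:
        level = max(range(len(probs)), key=lambda c: probs[c])  # modal level
        by_level[level].append(probs[level])
    return {level: median(values) for level, values in by_level.items()}

# Hypothetical posteriors for five tracts over three levels.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.85, 0.05],
    [0.30, 0.60, 0.10],
]
confidence = assignment_confidence(posteriors)
```

In this toy input no tract is assigned to the third level, so it is absent from the result, which mirrors why a level with no assigned tracts cannot appear in a plot of assignment probabilities.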

Our results also provided insights into economic disparities based on race/ethnicity. Areas with higher SES Disadvantage had significantly higher black population density (P < 0.001). Similarly, black population density was higher in areas with higher pollution (P < 0.001).

For the K-means clustering analysis, we found that four clusters best fit the data. In the Supplementary Materials (Supplementary Figs. S7 and S8), we provide visualizations of the cluster characteristics and the census tract membership in NC. There was a similar pattern of unfavorable SES and pollution levels in urban areas and favorable levels in suburban areas. However, with only four clusters, the clustering from K-means is less nuanced than that from the LCMM. On the basis of AIC and BIC (lower values are better), the LCMM was a better fit to the data (BIC/AIC for LCMM = −57,345.93/−57,874.57; BIC/AIC for K-means = 29,646.9/29,019.16).

Model extension for cancer disparities

We have proposed an initial framework to link SES and environmental exposures. In Fig. 4, we demonstrate how this model can be extended as a means to delineate cancer outcomes disparities. Our future work involves utilizing this novel framework to determine potential modifiable factors that could be employed to reduce disparities in cancer outcomes. Each oval in Fig. 4 represents a domain of interest, measured or determined by a set of variables, similar to that shown in Fig. 1.

Figure 4.

Cancer disparities modeling framework. Ovals represent latent variables, informed by multiple indicators, as in the example shown in Fig. 1. Rectangles represent available data, used as outcomes or covariates. The dashed arrows demonstrate that race can inform the latent classes across domains, leading to disparities. Race can also inform the cancer outcome, and the model will estimate the separate effect of race on SES and on other domains from the remaining effect of race on outcomes.

Clinical care and access could be measured at the individual level, using variables such as type of insurance, whether the patient has a primary care physician, and proximity to care. Individual behavior could include physical activity, smoking, alcohol use, and other measures that imply higher (or lower) risk. The Biology and Genetics/Ancestry domain could incorporate high-dimensional data or a smaller set of markers thought to influence cancer outcomes. For high-dimensional data, a variable selection process would also have to be included. Other extensions, such as spatiotemporal correlation, will be critical to deepening our understanding of how exposures over time contribute to the development of disease. We continue to work with this ultimate framework in mind and will share techniques, software, and methods as they are extended. For the different cohorts and diseases that we study, some domains are critical while others may not be, so we will not always utilize every domain. Nevertheless, we consider this framework a starting point for thinking about how to model disparities in cancer outcomes. We recognize that many differences in exposures, access, and behaviors may all contribute to the outcome disparities we see. For example, in studies of breast cancer, we have utilized a biological domain of the subtyping biomarkers and incorporated its association with race as well as with SES. It is well known that African American women more frequently have triple-negative breast cancer, and this likelihood may vary with SES. This modeling framework and methodology provides the ability to flexibly model such relationships.

To our knowledge, this work provides the first framework for an exposure model based on a broad range of SES measures and environmental toxins. The model can also be extended to incorporate cohorts (or trials) with biological measures, clinical care and access, behavioral measures, and health outcomes. LCMMs have the advantage of providing a flexible approach for fitting and testing theoretical relationships. Furthermore, these structures and relationships can be illustrated via diagrams (e.g., Figs. 1 and 4), which, though representative of complex statistical models, can be used to bridge knowledge gaps and further research within multidisciplinary teams of experts by providing accessible visualizations of potential hypotheses to be tested. LCMMs also allow for uncertainty quantification, an essential modeling feature when working with results that have the potential to inform cancer treatment. We illustrate this advantage by comparing LCMM with a widely used clustering technique, K-means clustering, which not only underperformed in terms of fitting the data, but also lacked any measure of uncertainty in assigning census tracts to exposure clusters/classes. In addition, the LCMM approach can incorporate race and SES (as well as other measures), which are typically highly correlated, while taking that correlation into account. This allows us to interpret the impact of SES, which is informed by race, rather than assuming the effect of race is “removed” when controlling for SES. Furthermore, many potential exposures can be incorporated, the mixture of exposures can be tested, and, when planning interventions, areas with higher risk populations can be easily identified and prioritized.

Gray and colleagues (48) used modeled predicted surfaces to examine the relationship between air pollution exposure, race, and measures of SES in NC. They considered only PM2.5 and O3 as pollutants and assessed poverty, education, and income as area-level SES measures from the 2000 census, as well as the neighborhood deprivation index (41). Similar to our results, they found that PM2.5 was higher in areas with lower SES, higher deprivation, and higher minority population density. Weaver and colleagues (26) examined the joint impact of SES and PM2.5 on cardiovascular outcomes. They utilized a hierarchical clustering approach to identify SES groupings, where clusters 1 and 2 exhibited high proportions of black, impoverished, nonmanagerial, and unemployed populations and of single-parent households, while cluster 3 was urban, with a high proportion of college degrees and low proportions of poverty, nonmanagerial occupations, and unemployment. These groupings have some similarities to our Disadvantage and Advantage latent classes. In their model, all of the SES variables are in a single domain with multiple levels, while we consider two domains with multiple levels and utilize the association between these two domains. They also noted a higher impact of PM2.5 on cardiovascular outcomes in the lower SES areas. Brochu and colleagues (49), in a PM10 and PM2.5 model in the Northeastern United States, also found that annual PM was consistently and significantly higher in census tracts with lower socioeconomic position, based on cost-of-living-adjusted median household income. In a review of North American studies of criterion air pollutants and SES (50), most studies found a similar relationship of higher air pollutants in areas with lower SES. Some exceptions to this general pattern existed. For example, in a borough-specific analysis of New York City, the Bronx, Staten Island, and areas of Manhattan exhibited the opposite pattern. In Los Angeles, PM2.5 and O3 levels were similar across SES, but other pollutants were higher for lower SES.

Previous research has shown that when a latent class level has too few observations, it is not meaningful to include it in the model (51). Often in latent models, the AIC, BIC, and entropy continue to improve as the number of latent variable classes increases, without necessarily reflecting a better fit. This can be mitigated through cross-validation (52). In our analysis, each individual latent variable has sufficient membership in both the high and low levels (illustrated in Supplementary Fig. S9 of the Supplementary Materials). We did see sparse membership once we considered the combination of the high and low levels of SES and pollution. In fact, the probability of assignment to Level 2 is nonzero, but it is small in many areas. Thus, we are not surprised that no census tracts in NC were assigned to Level 2. We anticipate that if the analysis were repeated for a larger geographic region, we would see a nonzero but still small number of tracts assigned to Level 2. Our earlier work identified a small proportion of zip codes assigned to the combination of High Advantage, High Disadvantage (40). We think this particular combination represents neighborhoods that are in flux and that, viewed longitudinally, could represent gentrification or decline.
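As a rough illustration of the cross-validation idea cited above, the sketch below compares BIC with the mean held-out log-likelihood from k-fold cross-validation as the number of classes grows. It is a toy example under stated assumptions, not the model selection procedure used in this study: it fits a one-dimensional Gaussian mixture by a plain EM algorithm to simulated data, and all names (`fit_mixture`, `cv_loglik`, `simulate_two_classes`) and settings (five folds, 50 EM iterations, a floor of 0.05 on each class standard deviation) are hypothetical choices.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def class_logs(x, params):
    """Log of (class weight times class density) for each class at x."""
    return [math.log(w) + log_normal_pdf(x, m, s) for w, m, s in params]


def mixture_loglik(values, params):
    """Total log-likelihood of values under the mixture (log-sum-exp for stability)."""
    total = 0.0
    for x in values:
        logs = class_logs(x, params)
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total


def fit_mixture(values, n_classes, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture; returns a list of (weight, mean, sd)."""
    ordered = sorted(values)
    n = len(ordered)
    # start each class mean at an evenly spaced quantile of the data
    means = [ordered[int((k + 0.5) * n / n_classes)] for k in range(n_classes)]
    spread = (ordered[-1] - ordered[0]) / (2.0 * n_classes) or 1.0
    params = [(1.0 / n_classes, m, spread) for m in means]
    for _ in range(n_iter):
        # E-step: posterior class probabilities
        post = []
        for x in values:
            logs = class_logs(x, params)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            norm = sum(probs)
            post.append([p / norm for p in probs])
        # M-step: weighted updates, with floors to avoid degenerate classes
        updated = []
        for k in range(n_classes):
            nk = max(sum(p[k] for p in post), 1e-8)
            mean = sum(p[k] * x for p, x in zip(post, values)) / nk
            var = sum(p[k] * (x - mean) ** 2 for p, x in zip(post, values)) / nk
            updated.append((nk, mean, max(math.sqrt(var), 0.05)))
        total_w = sum(w for w, _, _ in updated)
        params = [(w / total_w, m, s) for w, m, s in updated]
    return params


def bic(values, params):
    """BIC with 3K - 1 free parameters (K means, K sds, K - 1 weights)."""
    n_free = 3 * len(params) - 1
    return n_free * math.log(len(values)) - 2.0 * mixture_loglik(values, params)


def cv_loglik(values, n_classes, n_folds=5, seed=0):
    """Mean held-out log-likelihood per observation from k-fold cross-validation."""
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    held_out = 0.0
    for f in range(n_folds):
        test = shuffled[f::n_folds]
        train = [x for i, x in enumerate(shuffled) if i % n_folds != f]
        held_out += mixture_loglik(test, fit_mixture(train, n_classes))
    return held_out / len(shuffled)


def simulate_two_classes(n_per_class=150, gap=4.0, seed=0):
    """Simulated data with two true classes separated by `gap` standard deviations."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
            + [rng.gauss(gap, 1.0) for _ in range(n_per_class)])


data = simulate_two_classes()
for k in range(1, 5):
    fitted = fit_mixture(data, k)
    print(k, "classes: BIC =", round(bic(data, fitted), 1),
          " held-out log-lik per obs =", round(cv_loglik(data, k), 3))
```

The held-out log-likelihood rewards an additional class only when it improves prediction for observations not used in fitting, which is why it can guard against adding sparsely populated classes.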

We recognize that the NATA and area-level SES data used do not represent actual individual exposures. However, associations between such area-level measures and health outcomes can be informative. Widely used for health research, including cancer studies (53–58), the NATA and ACS data are mainly useful for large-scale time-trend and spatial analyses and are of limited use for analyses at fine spatial and temporal scales. This limitation may be mitigated by using datasets with finer spatial and temporal resolution (e.g., EPA Federal Reference Method Air Quality Monitors or the Community Multiscale Air Quality Modeling System) and/or by interpolating the data to achieve higher resolutions (59). There is ongoing research to develop causal models of environmental exposure on health outcomes, and in that setting, it may be critical to use specific exposure data. Future work is needed to extend such causal models to the MPE and SES framework we propose for cancer disparities.

T. Hyslop reports grants from NCI during the conduct of the study and personal fees from AbbVie outside the submitted work (not related to submitted work). No potential conflicts of interest were disclosed by the other authors.

A. Larsen: Data curation, software, formal analysis, validation, writing–original draft, writing–review and editing. V. Kolpacoff: Data curation, formal analysis, writing–review and editing. K. McCormack: Conceptualization, software, writing–original draft, writing–review and editing. V. Seewaldt: Conceptualization, resources, writing–review and editing. T. Hyslop: Conceptualization, resources, validation, methodology, writing–original draft, project administration, writing–review and editing.

This work is supported by the NCI of the NIH under award number NCI R01CA220693 (awarded to V. Seewaldt and T. Hyslop, supporting A. Larsen, V. Seewaldt, and T. Hyslop), and this material is also based upon work supported by the National Science Foundation under grant number DGE 1545220 (awarded to C. Gunsch, supporting K. McCormack).

1. Scharoun-Lee M, Kaufman JS, Popkin BM, Gordon-Larsen P. Obesity, race/ethnicity and life course socioeconomic status across the transition from adolescence to adulthood. J Epidemiology Community Health 2009;63:133–9.
2. Stringhini S, Batty GD, Bovet P, Shipley MJ, Marmot MG, Kumari M, et al. Association of lifecourse socioeconomic status with chronic inflammation and type 2 diabetes risk: the Whitehall II prospective cohort study. PLoS Med 2013;10:e1001479.
3. Cramb SM, Mengersen KL, Baade PD. Identification of area-level influences on regions of high cancer incidence in Queensland, Australia: a classification tree approach. BMC Cancer 2011;11:311.
4. Rebbeck TR. Prostate cancer disparities by race and ethnicity: from nucleotide to neighborhood. Cold Spring Harb Perspect Med 2018;8:a030387.
5. Conroy SM, Shariff-Marco S, Koo J, Yang J, Keegan TH, Sangaramoorthy M, et al. Racial/ethnic differences in the impact of neighborhood social and built environment on breast cancer risk: the neighborhoods and breast cancer study. Cancer Epidemiol Biomarkers Prev 2017;26:541–52.
6. Palmer JR, Boggs DA, Wise LA, Adams-Campbell LL, Rosenberg L. Individual and neighborhood socioeconomic status in relation to breast cancer incidence in African-American women. Am J Epidemiol 2012;176:1141–6.
7. Robert SA, Strombom I, Trentham-Dietz A, Hampton JM, McElroy JA, Newcomb PA, et al. Socioeconomic risk factors for breast cancer: distinguishing individual- and community-level effects. Epidemiology 2004;15:442–50.
8. Webster TF, Hoffman K, Weinberg J, Vieira V, Aschengrau A. Community- and individual-level socioeconomic status and breast cancer risk: multilevel modeling on Cape Cod, Massachusetts. Environ Health Perspect 2008;116:1125–9.
9. Yost K, Perkins C, Cohen R, Morris C, Wright W. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control 2001;12:703–11.
10. Dominici F, Peng RD, Bell ML, Pham L, McDermott A, Zeger SL, et al. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. JAMA 2006;295:1127–34.
11. Rappold AG, Stone SL, Cascio WE, Neas LM, Kilaru VJ, Carraway MS, et al. Peat bog wildfire smoke exposure in rural North Carolina is associated with cardiopulmonary emergency department visits assessed through syndromic surveillance. Environ Health Perspect 2011;119:1415–20.
12. Feng J, Yang W. Effects of particulate air pollution on cardiovascular health: a population health risk assessment. PLoS One 2012;7:e33385.
13. Mar TF, Ito K, Koenig JQ, Larson TV, Eatough DJ, Henry RC, et al. PM source apportionment and health effects. 3. Investigation of inter-method variations in associations between estimated source contributions of PM2.5 and daily mortality in Phoenix, AZ. J Expo Sci Environ Epidemiol 2006;16:311–20.
14. Tong Y, Luo K, Li R, Pei Lu, Li A, Yang M, et al. Association between multi-pollutant mixtures pollution and daily cardiovascular mortality: an exploration of exposure-response relationship. Atmospheric Environ 2018;186:136–43.
15. Dominici F, Samet JM, Zeger SL. Combining evidence on air pollution and daily mortality from the 20 largest US cities: a hierarchical modelling strategy. J Royal Stat Soc Ser A 2000;163:263–302.
16. Klemm RJ, Mason RM Jr. Aerosol Research and Inhalation Epidemiological Study (ARIES): air quality and daily mortality statistical modeling-interim results. J Air Waste Manag Assoc 2000;50:1433–9.
17. Nyberg F, Gustavsson P, Järup L, Bellander T, Berglind N, Jakobsson R, et al. Urban air pollution and lung cancer in Stockholm. Epidemiology 2000;11:487–95.
18. Pope CA 3rd, Burnett RT, Thun MJ, Calle EE, Krewski D, Ito K, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA 2002;287:1132–41.
19. Raaschou-Nielsen O, Andersen ZJ, Beelen R, Samoli E, Stafoggia M, Weinmayr G, et al. Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for Air Pollution Effects (ESCAPE). Lancet Oncol 2013;14:813–22.
20. Carver A, Gallicchio VS. Heavy metals and cancer. In: Atroshi F, editor. Cancer causing substances. London: IntechOpen; 2017. Available from: https://www.intechopen.com/books/cancer-causing-substances/heavy-metals-and-cancer.
21. Zhao Q, Wang Y, Cao Ye, Chen A, Ren M, Ge Y, et al. Potential health risks of heavy metals in cultivated topsoil and grain, including correlations with human primary liver, lung and gastric cancer, in Anhui Province, Eastern China. Sci Total Environ 2014;470–1:340–7.
22. Matés JM, Segura JA, Alonso FJ, Márquez J. Roles of dioxins and heavy metals in cancer and neurological diseases using ROS-mediated mechanisms. Free Radic Biol Med 2010;49:1328–41.
23. Coker E, Liverani S, Su JG, Molitor J. Multi-pollutant modeling through examination of susceptible subpopulations using profile regression. Curr Environ Health Rep 2018;5:59–69.
24. Chi GC, Hajat A, Bird CE, Cullen MR, Griffin BA, Miller KA, et al. Individual and neighborhood socioeconomic status and the association between air pollution and cardiovascular disease. Environ Health Perspect 2016;124:1840–7.
25. Vieira VM, Villanueva C, Chang J, Ziogas A, Bristow RE. Impact of community disadvantage and air pollution burden on geographic disparities of ovarian cancer survival in California. Environ Res 2017;156:388–93.
26. Weaver AM, McGuinn L, Neas L, Mirowsky J, Devlin RB, Dhingra R, et al. Neighborhood sociodemographic effects on the associations between long-term PM2.5 exposure and cardiovascular outcomes and diabetes mellitus. Environ Epidemiol 2019;3:e038.
27. Johns DO, Stanek LW, Walker K, Benromdhane S, Hubbell B, Ross M, et al. Practical advancement of multipollutant scientific and risk assessment approaches for ambient air pollution. Environ Health Perspect 2012;120:1238–42.
28. Davalos AD, Luben TJ, Herring AH, Sacks JD. Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Ann Epidemiol 2017;27:145–53.e1.
29. Oakes M, Baxter L, Long TC. Evaluating the application of multipollutant exposure metrics in air pollution health studies. Environ Int 2014;69:90–9.
30. Stafoggia M, Breitner S, Hampel R, Basagaña X. Statistical approaches to address multi-pollutant mixtures and multiple exposures: the state of the science. Curr Environ Health Rep 2017;4:481–90.
31. Billionnet C, Sherrill D, Annesi-Maesano I. Estimating the health effects of exposure to multi-pollutant mixture. Annals Epidemiol 2012;22:126–41.
32. Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from air pollution: shifting from a single-pollutant to a multi-pollutant approach. Epidemiology 2010;21:187–94.
33. Keller JP, Drton M, Larson T, Kaufman JD, Sandler DP, Szpiro AA. Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts. Ann Appl Stat 2017;11:93–113.
34. Lenters V, Portengen L, Rignell-Hydbom A, Jönsson BoAG, Lindh CH, Piersma AH, et al. Prenatal phthalate, perfluoroalkyl acid, and organochlorine exposures and term birth weight in three birth cohorts: multi-pollutant models based on elastic net regression. Environ Health Perspect 2016;124:365–72.
35. Zanobetti A, Austin E, Coull BA, Schwartz J, Koutrakis P. Health effects of multi-pollutant profiles. Environ Int 2014;71:13–9.
36. Pirani M, Best N, Blangiardo M, Liverani S, Atkinson RW, Fuller GW. Analysing the health effects of simultaneous exposure to physical and chemical properties of airborne particles. Environ Int 2015;79:56–64.
37. Hagenaars JA, Mccutcheon AL, editors. Applied latent class analysis. Cambridge (UK): Cambridge University Press; 2002.
38. Sánchez BN, Budtz-Jørgensen E, Ryan LM, Hu H. Structural equation models: a review with applications to environmental epidemiology. J Am Stat Soc 2005;100:1443–55.
39. Hendryx M, Luo J. Latent class analysis of the association between polycyclic aromatic hydrocarbon exposures and body mass index. Environ Int 2018;121:227–31.
40. Palumbo A, Michael Y, Hyslop T. Latent class model characterization of neighborhood socioeconomic status. Cancer Causes Control 2016;27:445–52.
41. Messer LC, Laraia BA, Kaufman JS, Eyster J, Holzman C, Culhane J, et al. The development of a standardized neighborhood deprivation index. J Urban Health 2006;83:1041–62.
42. Hazardous Air Pollutants; [about 2 screens]. Available from: https://www.epa.gov/haps.
43. 2014 NATA: assessment results. Washington (DC): Environmental Protection Agency; 2018. Available from: https://www.epa.gov/national-air-toxics-assessment.
44. Hart JE, Bertrand KA, DuPre N, James P, Vieira VM, Tamimi RM, et al. Long-term particulate matter exposures during adulthood and risk of breast cancer incidence in the Nurses' Health Study II prospective cohort. Cancer Epidemiol Biomarkers Prev 2016;25:1274–6.
45. American Community Survey 1-year data (2011–2017). Washington (DC): U.S. Census Bureau. Available from: https://www.census.gov/programs-surveys/acs/data.html.
46. Blake KS, Kellerson RL, Simic A. Measuring overcrowding in housing. Washington (DC): U.S. Department of Housing and Urban Development, Office of Policy Development and Research; 2007 Sep. Project No.: 017-002. Available from: https://www.huduser.gov/Publications/pdf/Measuring_Overcrowding_in_Hsg.pdf.
47. Defining Rural Population; [about 3 screens]. Available from: https://www.hrsa.gov/rural-health/about-us/definition/index.html.
48. Gray SC, Edwards SE, Miranda ML. Race, socioeconomic status, and air pollution exposure in North Carolina. Environ Res 2013;126:152–8.
49. Brochu PJ, Yanosky JD, Paciorek CJ, Schwartz J, Chen JT, Herrick RF, et al. Particulate air pollution and socioeconomic position in rural and urban areas of the Northeastern United States. Am J Public Health 2011;101:S224–30.
50. Hajat A, Hsia C, O'Neill MS. Socioeconomic disparities and air pollution exposure: a global review. Curr Environ Health Rep 2015;2:440–450.
51. Eppig JS, Edmonds EC, Campbell L, Sanderson-Cimino M, Delano-Wood L, Bondi MW, et al. Statistically derived subtypes and associations with cerebrospinal fluid and genetic biomarkers in mild cognitive impairment: a latent profile analysis. J Int Neuropsychol Soc 2017;23:564–76.
52. Grimm KJ, Mazza GL, Davoudzadeh P. Model selection in finite mixture models: a k-fold cross-validation approach. Struct Equ Model A Multidiscip J 2017;24:246–56.
53. Krieger N. Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter?: the Public Health Disparities Geocoding Project. Am J Epidemiol 2002;156:471–82.
54. Krieger N. Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. Am J Public Health 1992;82:703–10.
55. Apelberg BJ, Buckley TJ, White RH. Socioeconomic and racial disparities in cancer risk from air toxics in Maryland. Environ Health Perspect 2005;113:693–9.
56. George BJ, Schultz BD, Palma T, Vette AF, Whitaker DA, Williams RW. An evaluation of EPA's National-Scale Air Toxics Assessment (NATA): comparison with benzene measurements in Detroit, Michigan. Atmospheric Environ 2011;45:3301–8.
57. Zhou Y, Li C, Huijbregts MA, Mumtaz MM. Carcinogenic air toxics exposure and their cancer-related health impacts in the United States. PLoS One 2015;10:e0140013.
58. McEntee JC, Ogneva-Himmelberger Y. Diesel particulate matter, lung cancer, and asthma incidences along major traffic corridors in MA, USA: a GIS analysis. Health Place 2008;14:817–28.
59. Lawson AB, Choi J, Cai B, Hossain M, Kirby RS, Liu J. Bayesian 2-stage space-time mixture modeling with spatial misalignment of the exposure in small area health data. J Agric Biol Environ Stat 2012;17:417–41.