Abstract
Geospatial science is the science of location or place that harnesses geospatial tools, such as geographic information systems (GIS), to understand the features of the environment according to their locations. Geospatial science has been transformative for cancer epidemiologic studies through enabling large-scale environmental exposure assessments. As the research paradigm for the exposome, or the totality of environmental exposures across the life course, continues to evolve, geospatial science will serve a critical role in determining optimal practices for how to measure the environment as part of the external exposome. The objectives of this article are to provide a summary of key concepts, present a conceptual framework that illustrates how geospatial science is applied to environmental epidemiology in practice and through the lens of the exposome, and discuss the following opportunities for advancing geospatial science in cancer epidemiologic research: enhancing spatial and temporal resolutions and extents for geospatial data; geospatial methodologies to measure climate change factors; approaches facilitating the use of patient addresses in epidemiologic studies; combining internal exposome data and geospatial exposure models of the external exposome to provide insights into biological pathways for environment–disease relationships; and incorporation of geospatial data into personalized cancer screening policies and clinical decision making.
Introduction
Location is a fundamental driver of human health, and includes the places where we are born, live, and work (1). Our environment, which includes the physical, chemical, and biological agents found in the air, water, and soil, as well as our lived experiences as part of the neighborhood contexts where we spend our time, is shaped by location; by extension, our locations can impact our health (2). Geospatial science is the science of location or place, typically using geospatial tools to understand, analyze, and visualize the features of the environment according to their locations (3). The most widely used geospatial tools are geographic information systems (GIS), which enable the geoprocessing, management, and analysis of location-based geospatial data (4). Geospatial data, which are georeferenced or tied to some location on Earth, include vector data that represent features in the environment as points (e.g., geocoded addresses), lines (e.g., roads), and polygons (e.g., counties, states, and other administrative units), and raster data that represent features as a matrix of cells or pixels (5, 6). Raster data are useful for depicting continuous phenomena such as temperature. Remote sensing data, which are captured from a distance using satellites and/or aircraft, are commonly available as raster data. Vector file formats include shapefiles, GeoJSON, and KML, and raster file formats include ASCII grid and GeoTIFF. Examples of both proprietary and open source geospatial software include Esri ArcGIS, QGIS, GeoDa, and R (7, 8).
There are several key publicly available geospatial data sources that are useful for epidemiologic research. The U.S. Census Bureau provides population, demographic, and economic data at different geographic scales, such as census tracts, block groups, and blocks, each of which has corresponding TIGER/Line shapefiles to depict their boundaries (9). The U.S. Environmental Protection Agency (EPA) provides geospatial data on environmental topics such as air and water quality, including the Air Quality System, Toxics Release Inventory, and Air Toxics Screening Assessment (AirToxScreen; formerly known as National Air Toxics Assessment; ref. 10). The U.S. Geological Survey (USGS) Earth Explorer is a portal for downloading remote sensing data including Normalized Difference Vegetation Index (NDVI) data for greenness/greenspace from expedited products of the Moderate Resolution Imaging Spectroradiometer and Visible Infrared Imaging Radiometer Suite satellite sensors (11). The USGS National Map provides access to topographic data including the 3D Elevation Program digital elevation models (DEM) (12). The U.S. Department of Agriculture (USDA) Geospatial Data Gateway is a clearinghouse for geospatial data such as the National Agriculture Imagery Program (13). Rural-Urban Commuting Area code data are another widely used USDA geospatial dataset for geographic delineations of rurality and urbanicity (14). The National Oceanic and Atmospheric Administration provides data on climate, meteorological, oceanic, and coastal factors (15). The National Aeronautics and Space Administration (NASA) Earthdata Search portal includes access to numerous datasets, including remote sensing data for Earth Observation (EO) data products from satellite missions including Landsat (16). 
The NASA Socioeconomic Data and Applications Center serves as a Distributed Active Archive Center for global geospatial data on environmental and socioeconomic factors such as on climate risk and vulnerability, environmental sustainability, land use and land cover, outdoor light at night, and natural disasters and hazards (17). The ArcGIS Data and Maps Collection enables online access to geospatial data on cartographic boundaries and sociodemographic factors using data sourced from authoritative entities, such as U.S. federal agencies, for analysis with ArcGIS software (18). ArcGIS Hub is a cloud platform for sharing and downloading open geospatial data, for example, from federal agencies such as the Department of Homeland Security Geospatial Management Office (19). OpenStreetMap is an editable global map facilitating the creation and analysis of crowdsourced geospatial data such as on the food environment (20, 21).
Geospatial Science to Understand the Environmental Epidemiology of Cancer
Geospatial science provides methods and tools that can be integrated into epidemiologic studies to assess and examine the factors to which we are exposed in the environment, harnessing information gleaned from participant locations within study populations and information on the environment associated with those locations (3, 22). Geospatial science has played a particularly prominent role in environmental epidemiology studies because GIS and other tools are well-suited to modeling environmental exposures as they are inherently geographically varying. Applications of geospatial science in epidemiology can include descriptive epidemiologic studies that characterize the environment and disease according to person, place, and time (4). Commonly used GIS tools for descriptive epidemiology include mapping and spatial cluster analyses [e.g., Moran's I for global spatial autocorrelation vs. Local Indicators of Spatial Association (LISA) for local spatial autocorrelation; ref. 23]. These methods can be utilized to visualize the distribution of environmental exposures and/or disease occurrence and burden, identify disproportionately impacted geographic areas, and highlight temporal trends.
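As an illustration of the global spatial autocorrelation statistic named above, the following is a minimal sketch of Moran's I in plain Python. The four-area layout, the binary contiguity weights, and the rate values are hypothetical and chosen only to make the arithmetic easy to follow; an applied analysis would typically rely on an established spatial statistics package and a permutation-based significance test.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of area values and a spatial weights matrix.

    weights[i][j] > 0 when areas i and j are treated as neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    # Sum of weighted cross-products of deviations between all pairs of areas.
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_products / variance_term)


# Hypothetical example: four areas arranged in a row, where only
# adjacent areas are neighbors (binary contiguity weights).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered_rates = [1.0, 2.0, 3.0, 4.0]    # similar values sit next to each other
alternating_rates = [1.0, 4.0, 1.0, 4.0]  # dissimilar values sit next to each other

i_clustered = morans_i(clustered_rates, chain_weights)
i_alternating = morans_i(alternating_rates, chain_weights)
print(i_clustered, i_alternating)
```

Positive values indicate that neighboring areas tend to have similar values (clustering), whereas negative values indicate that neighboring areas tend to differ (dispersion).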
These descriptive epidemiologic studies can be valuable in providing exploratory, hypothesis-generating insights to inform the pursuit of analytic epidemiologic studies, which are designed to test hypotheses regarding the association between an exposure and disease using a comparison group (e.g., observational studies such as cohort studies and case–control studies). A primary application of geospatial science in analytic studies in environmental epidemiology is linking geospatial datasets with a study population (using participant location information) to conduct environmental exposure assessments. These assessments determine the distribution of exposures within the study population and enable the subsequent quantification of an association between an environmental exposure and an outcome. In addition, geospatial linkages may be used to incorporate other geospatial data into studies to consider as potential confounders and/or effect modifiers.
Geospatial datasets include exposure models, or geospatial representations of environmental exposures, that arise from existing georeferenced datasets, such as the U.S. Census Bureau; remote sensing data products; and exposure models developed using geospatial data, environmental measurements, and/or advanced methodologies. Methods to develop geospatial exposure models include spatial interpolation, which is the prediction of a variable at unmeasured locations based on data sampled at known locations, and land use regression (LUR), which combines sampled data from specific locations, development of stochastic models using predictor variables obtained using GIS, and application to a large number of unsampled locations in a study area to capture small-area spatial variation (6, 24). Spatial interpolation methods include inverse distance weighting (IDW) and kriging. For example, IDW was used to create a raster surface of predicted concentrations of particulate matter <2.5 μm in diameter (PM2.5) using values from nearby air quality monitoring sites in the United States, in which the weight assigned to data from each included monitor was a function of the inverse of its distance to the prediction location (5, 6, 25). Kriging is a geostatistical method that goes beyond IDW by additionally accounting for spatial autocorrelation when predicting values at unmeasured locations, and has been used to develop an ultraviolet (UV) radiation exposure model incorporating the spatial autocorrelation of residuals from a regression model comprising key predictors of UV (e.g., ozone, cloud cover, elevation; refs. 5, 6, 26).
LUR has been used to model intraurban air pollution levels, for example, in a study examining lung cancer risk and exposure to ultrafine particles (UFP; PM <0.1 μm in diameter) assessed using LUR models that incorporated data from a UFP measurement campaign throughout the Los Angeles Basin of California (27, 28), as well as distance to airports, density of major roads, and traffic intensity (24). Ensemble modeling has been used to combine multiple machine learning models (e.g., neural network, random forest, gradient boosting) to predict ambient air pollution levels (29–31). Irrespective of the employed modeling methodology, validation of geospatial exposure models should be executed using approaches, including but not limited to cross-validation techniques and comparison to ground monitoring stations, to determine predictive performance in modeling the occurrence of an environmental exposure for population sciences research (26, 27).
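The inverse distance weighting and validation steps described above can be sketched together in a few lines of Python. This is a minimal illustration, not the method of any cited study: the monitor coordinates and concentrations are hypothetical, the distance power of 2 is a common default rather than a value taken from the source, and leave-one-out cross-validation is used as one simple instance of the cross-validation techniques mentioned.

```python
import math


def idw_predict(x, y, monitors, power=2.0):
    """Predict a value at (x, y) from monitors given as (x, y, value) tuples."""
    numerator = 0.0
    denominator = 0.0
    for mx, my, value in monitors:
        distance = math.hypot(x - mx, y - my)
        if distance == 0.0:
            return value  # prediction location coincides with a monitor
        weight = 1.0 / distance ** power  # closer monitors receive larger weights
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def loocv_rmse(monitors, power=2.0):
    """Leave-one-out cross-validation: predict each monitor from all the others."""
    squared_errors = []
    for i, (mx, my, observed) in enumerate(monitors):
        others = monitors[:i] + monitors[i + 1:]
        predicted = idw_predict(mx, my, others, power)
        squared_errors.append((predicted - observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical monitors: planar coordinates in km, PM2.5 in micrograms per cubic meter.
monitors = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]

midpoint_two_monitors = idw_predict(5.0, 0.0, monitors[:2])  # equidistant from both
at_monitor = idw_predict(10.0, 0.0, monitors)
residence_estimate = idw_predict(3.0, 4.0, monitors)
rmse = loocv_rmse(monitors)
print(midpoint_two_monitors, at_monitor, residence_estimate, rmse)
```

Repeating the prediction over every cell of a grid would yield a raster surface; the cross-validated error gives a first indication of how well the surface reproduces held-out measurements.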
As part of an environmental exposure assessment, geospatial exposure models are linked with participant locations available in study populations such as geocoded residential addresses (i.e., assigned latitude and longitude coordinates), census tracts, and zip codes. Geospatial linkages between participant locations and geospatial datasets occur through GIS-based overlays and distance or proximity-based metrics (e.g., using buffers created around addresses), all of which apply the fundamental principle underlying geography and geospatial science that phenomena closer in space are more related than phenomena farther away (32).
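The two linkage types described above, overlays and buffer-based proximity metrics, can be illustrated with a short Python sketch. It assumes planar (projected) coordinates in meters, and the area boundaries, exposure values, and facility locations are hypothetical. A ray-casting test stands in for the point-in-polygon overlay that GIS software performs.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside a polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray from the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_exposure(point, units):
    """Assign the exposure value of the areal unit that contains the point."""
    for unit in units:
        if point_in_polygon(point[0], point[1], unit["boundary"]):
            return unit["exposure"]
    return None  # point falls outside every unit


def count_within_buffer(point, features, radius_m):
    """Count point features located within a circular buffer around the point."""
    return sum(
        1 for fx, fy in features
        if math.hypot(fx - point[0], fy - point[1]) <= radius_m
    )


# Hypothetical areal units (two adjacent 1 km squares) with area-level exposure values.
units = [
    {"boundary": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)], "exposure": 8.5},
    {"boundary": [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)], "exposure": 12.0},
]
facilities = [(300, 400), (900, 900), (260, 410)]  # hypothetical point sources

address = (250, 400)  # hypothetical geocoded address
assigned = overlay_exposure(address, units)
nearby = count_within_buffer(address, facilities, radius_m=100)
print(assigned, nearby)
```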
An important aspect of geospatial linkages is the explicit consideration of spatial mismatches, in which the scale of geographic variables available in a study population may differ from that in a geospatial dataset. Methodologies to reconcile spatial mismatches include areal interpolation, which re-allocates data from source units to target units using weights based on their spatial relationships (e.g., the area of overlap). For example, a study using the Surveillance, Epidemiology, and End Results (SEER)-Medicare database linked participant zip codes with pesticide data reported for cadastral parcels using areal weighted interpolation to examine agricultural pesticide exposure and liver cancer risk (33). Other methods to address spatial mismatches include aggregation to upscale data to larger geographic units, which is facilitated when analyzing nested administrative units such as census tracts within counties, and utilization of geometric properties for geospatial linkages (e.g., centroids). The latter was implemented in a study of SEER cancer registries that used pixel centroids from a raster UV exposure model intersecting cancer patient counties of residence at diagnosis to assess ambient UV exposure in relation to liver cancer risk (34). An alternative to using centroids that is relevant to population sciences research is centers of population, which represent the population-weighted ‘average’ location of inhabitants within a polygon (35). Centers of population were used in a SEER study to estimate a measure of environmental circadian misalignment (i.e., solar jetlag) relevant to the majority of the population residing within a county (36). Although point locations corresponding to geocoded addresses can be linked with virtually any geospatial dataset using GIS, it is advisable to ground any inferences derived from such analyses in the context of likely within-unit variability of the exposure of interest (6).
Multilevel modeling, which accounts for within-unit correlation (e.g., individuals residing within the same census tract are assigned the same exposure value), can be implemented to adjust standard errors to account for clustering (37). For example, an epidemiologic study examining radon exposure and breast cancer risk used frailty models, an extension of the Cox proportional hazards model, to account for within-unit clustering introduced by linking participant geocoded residential addresses with a county-level geospatial exposure model (38).
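To make the areal weighted interpolation mentioned above concrete, the following Python sketch estimates a value for a target unit as the overlap-area-weighted mean of the source units it intersects. It is a simplified illustration under stated assumptions: units are axis-aligned rectangles rather than real zip code or parcel polygons, the variable is treated as a rate or density (so values are averaged rather than summed), and all coordinates and values are hypothetical.

```python
def overlap_area(a, b):
    """Overlap area of two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def areal_weighted_mean(target, sources):
    """Area-weighted mean of source values over the part of the target they cover.

    sources is a list of (rectangle, value) pairs.
    """
    total_area = 0.0
    weighted_sum = 0.0
    for rectangle, value in sources:
        area = overlap_area(target, rectangle)
        weighted_sum += area * value
        total_area += area
    if total_area == 0.0:
        return None  # no source unit intersects the target
    return weighted_sum / total_area


# Hypothetical target unit (e.g., a zip code area) and two source parcels
# with an application rate attached to each parcel.
target_unit = (0.0, 0.0, 4.0, 4.0)
parcels = [
    ((0.0, 0.0, 3.0, 4.0), 10.0),  # covers 12 of the target's 16 area units
    ((3.0, 0.0, 6.0, 4.0), 20.0),  # covers the remaining 4 area units
]

estimate = areal_weighted_mean(target_unit, parcels)
print(estimate)  # (12 * 10 + 4 * 20) / 16 = 12.5
```

For count variables (e.g., population), the analogous calculation would instead apportion each source total by the fraction of the source area that falls inside the target.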
Geospatial environmental exposure assessments have been transformative to research on the environmental etiology of cancer as this work can be efficiently conducted for large numbers of individuals within large-scale cohorts. For example, epidemiologic studies conducted across the world have linked geocoded participant addresses with geospatial air pollution exposure models, demonstrating that higher ambient air pollution exposure was associated with increased risk for lung cancer incidence and/or mortality (39, 40). These studies contributed to the 2013 International Agency for Research on Cancer (IARC) classification of particulate matter (PM) and outdoor air pollution as human carcinogens (41), the scientific evaluation process to update the 2021 World Health Organization (WHO) Air Quality Guidelines (42), and proposed revisions to the European Union Ambient Air Quality Directives that are ongoing as of 2023 (43), which serve as guidance in policy and scientific decision making. Thus, geospatial science continues to exert a substantive influence in environmental and cancer epidemiology research.
The Exposome: A Modern View of the Environment
Modern conceptualizations of the environment apply a holistic perspective that comprehensively incorporates all components of the world that we experience, which offers a more realistic understanding of our ambient surroundings. In particular, the exposome, or the totality of exposures across the life course, provides a research paradigm with which to understand what comprises the environment and how it impacts health (44, 45). Critical time periods of exposure may exist across the life course for a given disease outcome, representing important windows of susceptibility during which exposures will confer long-term risk for health outcomes (46, 47). The emphasis on life course exposures necessitates highlighting the role of the parental and/or pregnancy exposome during preconception and pregnancy, which is considered the starting point for quantifying a person's exposome (48, 49). The developing fetus is vulnerable to the effects of environmental exposures due to rapidly growing and developing organs, immature metabolism, and received doses that are greater relative to body weight, with maternal in utero exposures associated with fetal programming and short- and long-term health effects among offspring (48, 49). Early-life exposures in childhood represent another critical time window because biological systems are in various stages of development (50). Furthermore, there are unique exposure routes specific to children, which can result in relatively greater levels of environmental exposures (e.g., more time spent outdoors, soil, and dust ingestion), thus posing health risks over the lifetime (50).
In Price and colleagues (2022), the exposome was positioned within a multi-omics framework because of the need to reconcile divergent interpretations and thus applications of the exposome across scientific disciplines and the need to incorporate underlying endogenous processes to meaningfully understand the associations between the environment and health (51). Price and colleagues (2022) sought to implement an “uncoupling exposure and response” such that the exposome was “reattributed” to exclusively represent contact events with environmental exposures, and to further refine the concept of “functional exposomics” (51). The authors defined an environmental exposure as “a contact between external factor(s) (agent) and a biological entity occurring at an (exposure) interface” and how “a single exposure event (exposure period) is a continuous contact with an agent” (51). The “exposome” specifically refers to the measure of the totality of contact events that a person experiences. Contact events are dynamic in space and time and can occur in aggregate such as with exposure mixtures (e.g., multiple types of external factors) and/or multiple contact events, which are characteristic of real-world experiences. Contact events are measured using assessment methods (described below) such as questionnaires, participant locations linked with geospatial datasets, and omics. “Functional exposomics” was defined as the “systematic and comprehensive study of environmental exposure–phenotype interaction over a defined time-period” (51). Thus, the emphasis of functional exposomics is on understanding all external factors substantively contributing to the totality of traits or characteristics displayed by a person (i.e., phenome; ref. 51).
Figure 1 provides a conceptual framework illustrating how geospatial science is applied to environmental epidemiology in practice and through the lens of the exposome and environmental exposures. In contrast to traditional epidemiologic approaches, the exposome paradigm recognizes the utility of an expanded and dynamic exposure assessment across multiple domains, both internal and external, the integration of data across multiple scales of variation (e.g., omic to individual to population) over time and space, and the use of the resulting high-dimensional data to investigate multiple exposure–response relationships (44, 45). Through integrating multiple approaches and ontologies of the exposome (44, 45, 51–60), the environmental factors to which humans are exposed can be categorized according to agents in the personal environment (e.g., lifestyle, diet, occupation, household, microbes, psychosocial factors, others); natural environment (e.g., air pollution); built environment (e.g., green, blue, and gray spaces; ref. 61); and social environment [e.g., neighborhood contextual factors such as area deprivation, socioeconomic status, and other social determinants of health (SDOH); refs. 62, 63]. It should be noted that the policy environment, comprised of governmental laws, ordinances, and regulations and institutional policies, has typically been considered a separate category, although its impact could be directly or indirectly observed in the factors comprising the natural, built, and/or social environments (64).
To measure contact with these agents and the imprints they leave in our bodies, the exposome further includes the domains of the external and internal exposome (51–54, 58). The external exposome is measured using external assessment methods such as geospatial methods, questionnaires, sensors, mobile phones, and environmental measurements, while the internal exposome is measured using internal assessment methods such as multi-omics systems biology in the metabolome, methylome, adductome, proteome, transcriptome, etc. (51–54, 58, 65). The microbial exposome is another area of interest, referring to the totality of gut microbiome-related metabolites in body fluids or tissues of the host (66). The external exposome and internal exposome are interrelated and these measures can be evaluated in tandem to demonstrate environmental exposure-induced biological perturbations (51).
The external exposome is further categorized as specific or general, where the specific external exposome is comprised of personal, individual-level factors such as lifestyle, diet, and occupation that can be assessed using questionnaires (67, 68). The general external exposome is comprised of ambient, contextual factors in the natural, built, and social environments that can be captured using geospatial science methods because of their occurrence and variability according to location. In addition, self-reported measures of a person's perceptions of their environments can also be used (69). For example, neighborhood greenness can be assessed through self-report using questionnaires and/or by linking geocoded residential addresses with NDVI data (70, 71). Thus, both specific and general factors in the external exposome include agents in the personal, natural, built, and social environments. Furthermore, factors in the personal, natural, built, and social environments can impact personal decision-making, thus representing mediators in the association between the environment and health.
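The NDVI-based greenness linkage mentioned above can be sketched as follows. NDVI is computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), and a common way to summarize it for a participant is to average the raster cells around the geocoded address. The grid values, cell size, buffer radius, and the choice of a lower-left grid origin are all assumptions made for illustration; real raster products carry their own georeferencing conventions.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


def mean_ndvi_in_buffer(address_xy, ndvi_grid, origin_xy, cell_size, radius):
    """Mean NDVI over raster cells whose centers lie within `radius` of the address.

    Assumes origin_xy is the lower-left corner of the grid and that row 0 is
    the southernmost row (a simplification for this sketch).
    """
    ax, ay = address_xy
    x0, y0 = origin_xy
    selected = []
    for row_index, row in enumerate(ndvi_grid):
        for col_index, value in enumerate(row):
            center_x = x0 + (col_index + 0.5) * cell_size
            center_y = y0 + (row_index + 0.5) * cell_size
            if value is not None and math.hypot(center_x - ax, center_y - ay) <= radius:
                selected.append(value)
    return sum(selected) / len(selected) if selected else None


# Hypothetical 3 x 3 NDVI grid with 250 m cells and a 260 m buffer.
grid = [
    [0.2, 0.4, 0.6],
    [0.3, 0.5, 0.7],
    [0.1, 0.8, 0.9],
]
address = (375.0, 375.0)  # falls at the center of the middle cell

single_pixel = ndvi(nir=0.5, red=0.1)
greenness = mean_ndvi_in_buffer(address, grid, (0.0, 0.0), 250.0, 260.0)
print(single_pixel, greenness)
```

In this example the buffer captures the middle cell and its four edge-adjacent neighbors, while the diagonal cells (about 354 m away) are excluded.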
To subsequently pursue a geospatial-based environmental exposure assessment, participant locations are linked with geospatial datasets (e.g., exposure models) using methods, such as GIS, to conduct epidemiologic studies investigating the association between the environmental exposure and outcome of interest. The conceptual framework in Fig. 1 can be extended to epidemiologic studies of cancer and other disease outcomes. Thus, as we continue to refine our understanding of how to conceptualize and measure the environment, and through continued domain-specific contextualizations to advance the exposome from concept to practical applications, geospatial science will serve a critical role in determining best practices regarding how to measure the environment as part of the external exposome.
Beyond research, an exposome perspective has further potential benefits for shaping policy, as addressing a health issue that considers the totality of environmental exposures occurring at the individual and neighborhood levels would lend itself to crafting policies that holistically address cumulative life course exposures (64, 72). Intentionally distinguishing between factors within the different environments that impact our lives, from the personal, natural, built, to social, enables elucidating multiple and complex mechanistic pathways through which environmental exposures confer risk for disease.
Geospatial Analytics for Contemporary Cancer Population Sciences Research
The enduring impact of geospatial science in cancer epidemiology is reflected in the AACR Annual Meeting hosting the Geospatial Science Methods Workshop in April 2023 in Orlando, FL, which provided a state-of-the-art overview of new and emerging research in the field and future research directions (73). Geospatial science applications for measuring the external exposome in cancer epidemiologic studies were front and center, where a diversity of applied research examples was described spanning environmental exposures in the natural, built, and social environments and their associations with outcomes related to cancer incidence, mortality, and survival. These studies included research on PM2.5 air pollution in wildfire-burned areas and cancer survival using SEER cancer registries (74), outdoor light at night and endometrial cancer risk in the NIH-AARP Diet and Health Study (75), and neighborhood income inequality and colorectal cancer survival in the Women's Health Initiative (76). Several highlighted studies considered important features of the external exposome, including exposure mixtures and incorporating historical exposure estimates to address timing of exposures during critical windows across the life course (45). For example, in the Sister Study cohort, predictive k-means clustering was used to examine breast cancer risk in association with PM2.5 component mixtures based on participant residential addresses linked with a geospatial air pollution exposure model, thus evaluating air pollution as a complex, heterogeneous mixture of hazardous substances (77).
In another example, satellite imagery was used to enhance a historical database comprised of decades of information on point source emissions from industrial dioxin-emitting facilities, thus improving the accuracy of source stack locations, the measures of associations derived from epidemiologic studies, and the capacity to estimate exposure lags to address latency periods in investigating cancer outcomes (78).
Emerging Issues and Future Opportunities in Geospatial Science for Environmental and Cancer Epidemiology
The following topics represent areas for future work to advance the application of geospatial analytics to keep pace with modern challenges (and opportunities) in cancer population sciences research investigating the environment. These include methodological issues regarding spatial and/or temporal mismatches in data; the urgent need to craft and implement standard geospatial practices in the measurement and examination of climate change factors in epidemiologic research; approaches to protecting patient confidentiality while promoting the pursuit of scientific investigations; epidemiologic research integrating the external and internal exposome; and translational work in screening and clinician decision making utilizing geospatial methods.
Spatial and/or temporal mismatches
Spatial resolution is the level of detail at which we can depict the location and/or shape of features, and spatial extent refers to geographic coverage. Temporal resolution is the frequency at which data are available, and temporal extent is the time period of interest. Limitations in spatial and temporal resolution and extent can impact environmental exposure datasets as well as participant location data, thus creating mismatches affecting our ability to accurately assess exposures to an environmental factor. However, in addition to the methods mentioned earlier in the article (e.g., areal weighted interpolation), these issues can in part be addressed through the increasing availability of geospatial big data that are both spatially and temporally granular. This is a prominent trend with satellite imagery, as many high-resolution datasets are available for research and analysis on cloud-based platforms that provide access to high performance computing (79). Google Earth Engine provides a multi-petabyte geospatial data catalog that is freely available to academic, nonprofit, business, and government users for noncommercial use (79, 80). It is a successful example of an online platform using a cloud infrastructure that enables the execution of geospatial workflows to analyze geospatial big data without requiring the end user to set up local high performance computing resources (79, 80). A growing body of open-source geospatial software is being developed, such as the geemap Python package that leverages Google Earth Engine, to further enhance computation and visualization in geospatial data science and analyses (81–84).
Wearable sensors can provide highly resolved personal measurements of locations and ambient exposures for utilization in research (85, 86). Sensors, which include wearable personal devices and tools for measurement within an external environment (e.g., in-home monitoring), enable assessment of environmental exposures in the specific and general domains of the external exposome (87). Data derived from sensors include physical activity, sleep, noise, light, air pollution, and temperature (88, 89). Locational data from Global Positioning Systems (GPS)-enabled devices, such as smartphones, can provide information on georeferenced time–activity patterns (90). Yet challenges in the integration of sensors to measure the external exposome in environmental epidemiologic studies include participant adherence and data quality, feasibility of long-term exposure assessments, and storage and processing of big data generated from wearables (87).
There may also be limited capacity to assess time-varying exposures, particularly if participant locations are only available at one point in time (e.g., baseline addresses). Studies can acquire information on address changes to account for participant residential mobility, which is possible through the use of commercial databases (91). This may be conducted in conjunction with usage of residential address histories that are available in many established cohort studies (92). Improved ascertainment of residential histories is valuable for assessing historical long-term exposures and latency periods for environmental exposures and the development of cancer (93). These methodologies can be incorporated into epidemiologic studies as appropriate to fill gaps in and/or enhance spatial and temporal resolution and coverage.
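One way residential histories can feed a long-term exposure estimate is a time-weighted average over a lagged exposure window, sketched below in Python. The address periods, exposure values, 10-year lag, and 10-year window are hypothetical choices for illustration and are not drawn from any cited study; each address period is assumed to have already been linked to an exposure value.

```python
def lagged_window(reference_year, lag_years, duration_years):
    """Return (start, end) of an exposure window that ends `lag_years` before the reference year."""
    end = reference_year - lag_years
    return end - duration_years, end


def time_weighted_exposure(history, window_start, window_end):
    """Time-weighted mean exposure across addresses within a window.

    history is a list of (start_year, end_year, exposure) tuples, one per
    residence, with end_year exclusive.
    """
    total_years = 0.0
    weighted_sum = 0.0
    for start, end, exposure in history:
        years_in_window = min(end, window_end) - max(start, window_start)
        if years_in_window > 0:
            weighted_sum += years_in_window * exposure
            total_years += years_in_window
    if total_years == 0:
        return None  # no residential information covers the window
    return weighted_sum / total_years


# Hypothetical residential history with an exposure value linked to each address.
history = [
    (1990, 2000, 12.0),
    (2000, 2010, 8.0),
    (2010, 2020, 6.0),
]

window = lagged_window(reference_year=2015, lag_years=10, duration_years=10)
lagged_exposure = time_weighted_exposure(history, window[0], window[1])
print(window, lagged_exposure)  # (1995, 2005) and (5 * 12 + 5 * 8) / 10 = 10.0
```

With only a baseline address, the same participant would be assigned a single value for the whole window, which illustrates how residential mobility can contribute to exposure misclassification.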
Climate change
Climate change is a long-term change in the average weather patterns that have historically defined the Earth's local, regional, and global climates (94). It is associated with increases in the frequency, intensity, and severity of extreme weather events such as heat waves, heavy rainfall and flooding, dust and wind storms, and droughts, alterations in crop production, and changes in the transmissibility of vector-borne diseases (95). One consequence of climate change is the exacerbation of population-level exposures to particular cancer risk factors (96, 97). For example, the June 2023 wildfire “smoke wave” (98) from fires originating in Canada was extensive in scale, impacting New York City and other cities in North America (99). The major air pollutants emitted during wildfires include PM2.5 (100). Given the pervasive population-level impacts of climate change and how related factors, such as air pollution and meteorological variables of temperature and precipitation, are available as geospatial datasets, it is prudent to determine optimal and scientifically rigorous practices for how to measure these factors using geospatial science (101). Standard and consistent methods in climate change-related geospatial exposure measures and consideration of biological plausibility to chronic disease outcomes will promote etiologically relevant approaches, facilitate comparison of study results, enable effective synthesis and critical appraisal of epidemiologic research, and be valuable as we try to learn more about how climate change impacts cancer outcomes. A promising start is demonstrated in emerging work that considered how to meaningfully measure wildfire exposure: PM2.5 directly emitted from wildfires, proximity to a wildfire (within 20 km), and proximity to a wildfire evacuation zone boundary (102). More research is needed on this topic, which is of high public health significance.
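The proximity-based wildfire measure described above (residence within 20 km of a wildfire) can be illustrated with a great-circle distance calculation. This is a minimal sketch: wildfires are represented as hypothetical point locations rather than mapped perimeters, the coordinates are invented, and only the 20 km threshold comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_wildfire(address, fire_points, threshold_km=20.0):
    """True if any wildfire point lies within the threshold distance of the address."""
    return any(
        haversine_km(address[0], address[1], fire[0], fire[1]) <= threshold_km
        for fire in fire_points
    )


# Hypothetical geocoded address and wildfire locations as (latitude, longitude).
address = (38.50, -122.50)
close_fire = (38.60, -122.50)  # about 11 km to the north
far_fire = (39.50, -122.50)    # about 111 km to the north

exposed = near_wildfire(address, [close_fire, far_fire])
unexposed = near_wildfire(address, [far_fire])
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(exposed, unexposed, one_degree)
```

A perimeter-based analysis would instead measure the distance from the address to the nearest edge of the burned-area or evacuation-zone polygon.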
Approaches for protection of patient location data
Participant location data are required to conduct a geospatial-based exposure assessment in an epidemiologic study of cancer. Ideally, utilization of the most granular location data, namely geocoded residential addresses, would minimize the exposure measurement error associated with using relatively coarser variables such as counties or zip codes. However, the benefits of higher resolution address data must be balanced against concerns from data providers (e.g., cancer registries) regarding protection of patient privacy (103). Addresses are typically considered protected health information (PHI) and can be difficult to access and/or involve time-consuming efforts to navigate requisite approvals (103). Thus, it is worthwhile to devise alternative approaches that ensure patient confidentiality while maximizing internal validity in the pursuit of cancer epidemiologic research, such as imputation of approximate location from larger geographic units (104), geomasking/jittering of geocoded addresses (105, 106), and/or geospatial virtual data enclaves enabling secure remote access to and analysis of PHI (107). This is particularly relevant to consortium studies, such as the National Cancer Institute (NCI) Cohort Consortium, which comprises many cohorts and conducts large-scale pooled analyses enabling investigation of research questions that would otherwise be difficult to pursue, such as examining uncommon cancers (108). Consortium studies require data harmonization and data transfer agreements, which pose logistical challenges including potential differences in geocoding methods and in the linkage of geospatial data to pursue new hypotheses. One solution devised by the California Teachers Study, which participates in the NCI Cohort Consortium, is the development of a data warehouse to which investigators can apply for access to geocoded residential addresses to conduct geospatial research (109).
As technologies emerge, it will be important to continue exploring new ways to achieve usage of geospatial data that maximizes the precision of the geographic unit as well as the privacy of the individual.
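To make the geomasking/jittering approach noted above more concrete, the sketch below displaces a geocoded address by a random distance and direction within a ring ("donut") around the true location. This is one common form of geomasking and is not necessarily the method used in the cited studies. The minimum and maximum displacement distances are hypothetical defaults, and the conversion from meters to degrees uses a local flat-Earth approximation that is reasonable only for short displacements away from the poles.

```python
import math
import random

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def donut_geomask(lat, lon, min_m=100.0, max_m=500.0, rng=None):
    """Return a masked (lat, lon) displaced between min_m and max_m meters.

    min_m and max_m are illustrative values, not recommended thresholds.
    rng may be a seeded random.Random instance for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    # Sample the squared distance uniformly so that masked points are
    # uniformly distributed over the area of the ring.
    dist = math.sqrt(rng.uniform(min_m ** 2, max_m ** 2))
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Convert the north/east offsets from meters to degrees.
    dlat = (dist * math.cos(bearing)) / EARTH_RADIUS_M
    dlon = (dist * math.sin(bearing)) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)
```

The minimum distance prevents a masked point from landing on the true address, while the maximum distance bounds the exposure measurement error introduced by masking; choosing these values is a study-specific trade-off between privacy and precision.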
Integration of data from the external and internal exposome to elucidate biological mechanisms for environment–disease relationships
Using omics biomarkers of the internal exposome, the exposome paradigm can be applied to provide insights into how geospatial measures of the external exposome are associated with environmentally induced changes in measures of biological response (45). For example, a metabolome-wide association study used high-resolution metabolomic profiling of serum samples, based on liquid chromatography with high-resolution mass spectrometry, to identify metabolites and metabolic pathways associated with exposure to pesticide classes; exposure was determined using a GIS-based measure linking geocoded residential and occupational address histories with a pesticide exposure database (110). Another approach is the meet-in-the-middle method, which relates external exposome measures to intermediate biomarkers (e.g., untargeted omics) that are in turn prospectively examined in relation to a health outcome; the findings demonstrate biological pathways and thus plausibility for disease (59). This approach was implemented in an epidemiologic study using metabolomics to show perturbations in metabolic pathways associated with both geospatially assessed UFP air pollution exposure and with asthma and cardiovascular disease outcomes (111). Thus, geospatial measures of the external exposome can be combined with measures of the internal exposome for population-based exposure assessments and identification of biological pathways underlying environmental exposure-disease associations.
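The core logic of the meet-in-the-middle method can be sketched in a few lines of Python: identify omics features associated with the geospatial exposure, identify features associated with the outcome, and retain the overlap as candidate intermediate biomarkers. This is a deliberately simplified illustration. It assumes the per-feature p-values have already been estimated in separate regression models, and the feature names, p-values, and significance threshold below are hypothetical. Actual analyses typically also adjust for confounders and multiple testing and follow up with pathway enrichment.

```python
def meet_in_the_middle(exposure_pvalues, outcome_pvalues, alpha=0.05):
    """Return features associated with both the exposure and the outcome.

    exposure_pvalues -- dict mapping feature name to p-value from
                        exposure-feature models (assumed precomputed)
    outcome_pvalues  -- dict mapping feature name to p-value from
                        feature-outcome models (assumed precomputed)
    alpha            -- illustrative significance threshold
    """
    exposure_hits = {f for f, p in exposure_pvalues.items() if p < alpha}
    outcome_hits = {f for f, p in outcome_pvalues.items() if p < alpha}
    # Features in the overlap are candidate intermediate biomarkers.
    return sorted(exposure_hits & outcome_hits)


# Hypothetical example values, for illustration only.
example_exposure = {"feature_A": 0.001, "feature_B": 0.20, "feature_C": 0.03}
example_outcome = {"feature_A": 0.04, "feature_B": 0.01, "feature_C": 0.50}
overlap = meet_in_the_middle(example_exposure, example_outcome)
```

In this toy input, only `feature_A` passes the threshold in both sets of models and would be carried forward for pathway interpretation.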
Personalized cancer screening policies and clinical decision making using geospatial data
Effective early detection to improve cancer patient outcomes depends on the ability to accurately assess risk at a given time point, which subsequently informs the design of personalized screening regimens (112, 113). Because geospatial data are used to conduct epidemiologic studies of cancer etiology, geospatial data can also be used in cancer prevention and control efforts. For example, although low-dose CT lung cancer screening in former or current smokers aged ≥50 years reduces lung cancer mortality, a low percentage of this high-risk population is diagnosed with lung cancer, thus exposing many patients to unnecessary diagnostic procedures (114). In an extension of the Dutch-Belgian Randomized Lung Cancer Screening Trial (NELSON), investigators aim to improve the efficiency of lung cancer screening through the development of new risk prediction models, including polygenic risk scores and environmental risk scores (114). Polygenic risk scores will be constructed using genotype profiles and genetic variants from Genome-Wide Association Study (GWAS) data (114). Environmental risk scores will be created using patient home addresses linked with geospatial air pollution exposure data (114). It is worth noting that polygenic risk scores and environmental risk scores for non–geospatial-based factors have been developed for other cancers (115, 116). An environmental risk score for colorectal cancer was created using alcohol consumption, body mass index (BMI), diet, nonsteroidal anti-inflammatory drug use, occupational exposure, physical activity, and smoking in the UK Biobank (115). In another study based in Germany, an environmental risk score for breast cancer was created using age, age at first live birth, age at menarche, alcohol consumption, BMI, menopausal hormone therapy, number of first-degree relatives with breast cancer, and parity (116).
In addition to screening, patient residential information can be used in clinical decision making to promote testing and mitigation of hazardous environmental exposures, such as radon, which is a naturally occurring radioactive gas and an established lung carcinogen (117). Because radon is a modifiable risk factor, a clinical approach to assessing and mitigating cancer risk from radon exposure can be implemented, similar to what has been proposed for air pollution and cardiovascular disease (118). Using information from patient home addresses linked with geospatial data on radon concentrations, clinicians could provide guidance on interventions related to radon testing and/or radon mitigation to ensure a patient's indoor air radon concentrations are below US EPA recommended action levels (119–122). If relevant to the patient, since the presence of smoke particles increases the radiation dose from radon decay products (radon daughters), clinicians could provide resources to promote smoking cessation to reduce the radiation dose and thus reduce risk of respiratory diseases such as lung cancer (120). In addition, building on the discussion on air pollution above, an environmental risk score for lung cancer screening could incorporate residential radon exposure (123).
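The address-to-radon linkage described above can be sketched as a simple lookup followed by a comparison with the US EPA action level of 4 pCi/L (equivalent to 148 Bq/m3). The sketch assumes that a patient's geocoded address has already been assigned to a geographic unit and that a predicted radon concentration is available for that unit; the area identifiers, lookup table, and function names are hypothetical. An area-level prediction is not an indoor measurement, so the output is intended only to illustrate how a prompt for home radon testing might be generated.

```python
EPA_ACTION_LEVEL_PCI_L = 4.0  # US EPA radon action level, pCi/L
BQ_M3_PER_PCI_L = 37.0        # 1 pCi/L = 37 Bq/m3


def bq_m3_to_pci_l(value_bq_m3):
    """Convert a radon concentration from Bq/m3 to pCi/L."""
    return value_bq_m3 / BQ_M3_PER_PCI_L


def radon_flag(area_id, radon_by_area_pci_l):
    """Classify an area-level radon estimate against the EPA action level.

    area_id             -- identifier of the geographic unit containing the
                           patient's geocoded address (hypothetical)
    radon_by_area_pci_l -- dict mapping area identifiers to predicted radon
                           concentrations in pCi/L (assumed geospatial layer)
    """
    level = radon_by_area_pci_l.get(area_id)
    if level is None:
        return "no estimate available"
    if level >= EPA_ACTION_LEVEL_PCI_L:
        return "at or above action level"
    return "below action level"


# Hypothetical lookup table, for illustration only.
example_layer = {"area_001": 5.2, "area_002": bq_m3_to_pci_l(74.0)}
```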
An important consideration in any environmental risk communication is balancing the need to educate patients about the significance of an environmental pollutant for disease risk with an awareness of the patient's circumstances (124, 125). If the extent to which a patient can mitigate their residential exposure to a pollutant is limited, for example due to financial constraints, it is incumbent on the health care professional to deliver information together with strategies that are within the patient's means (e.g., subsidized radon testing). Risk communication should be anchored on the goal of empowering patients with knowledge in a way that is evidence-based, practical, safe, and congruent with their exposure risk to facilitate disease prevention, and that is also tailored to consider personal priorities, circumstances, and values to minimize unnecessary anxiety and/or stress (124, 125).
These examples demonstrate how similar approaches to the development, validation, and implementation of geospatial-based, environment-focused risk prediction models and/or geospatial-based measures based on patient residential exposure could be executed to improve and inform clinical practice and decision making (113), particularly as evidence mounts for the role of air pollution in the etiology of other cancers (93, 126–128).
Conclusions
In summary, geospatial science remains an important methodological tool in epidemiologic investigations of the environment and cancer. This article provided a contemporary assessment of how geospatial science continues to contribute to improving our understanding of the environmental etiology of cancer, for example, through geospatial analysis for the discovery of novel environmental risk factors for cancer outcomes and progressive methodologic enhancements for improved geospatial exposure modeling and exposure assessments (and thus elucidation of exposure–disease relationships). In the era of the exposome, geospatial science will serve a critical role in determining optimal and scientifically rigorous practices on how to measure key aspects of the environment as part of the external exposome. Particular opportunities for advancing geospatial science in cancer research include enhancing spatial and temporal resolutions and extents of environmental exposure and participant location data; consistent definitions to model and measure climate change factors; developing approaches that balance concerns of patient confidentiality with the successful conduct of epidemiologic research; combining geospatial-based external exposome measures with omics-based internal exposome measures to elucidate biological pathways for environment–disease associations; and the incorporation of geospatial data in personalized cancer screening policies and clinical decision making.
Authors' Disclosures
T. VoPham reports grants from National Institute of Diabetes and Digestive and Kidney Diseases during the conduct of the study. No disclosures were reported by the other authors.
Acknowledgments
This work was supported by grants from the NIH National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK; K01 DK125612, to T. VoPham). A.J. White is supported by the National Institute of Environmental Health Sciences (NIEHS) grant Z1A ES103332. R.R. Jones is supported by the NCI ZIA CP010125 – 28.