Abstract
We critically reviewed the validity and interpretation of two analytical approaches that have been used in the molecular epidemiological literature to investigate the role of gene-environment (GxE) interactions in disease (D) causation. Several studies have attempted to use biomarkers of biologically effective dose (BBED), such as polycyclic aromatic hydrocarbon-DNA and aflatoxin-albumin adducts, to assess possible GxE interactions. To truly determine whether BBED results from a GxE interaction that is causally implicated in disease development would require data on G, E, BBED, and D, and thus far, few studies have had data on each of these components. In the absence of data on an antecedent E, one approach has been to assess interactions between G and BBED on D and to interpret the results as providing information on the presence of GxE interactions. In the absence of data on G, another approach has been to control for E in analyses of BBED and D and to interpret nonnull risk estimates for BBED as reflecting the role of G. We show that neither approach is valid. Analyses of interactions between G and BBED cannot be used to draw conclusions about the presence or absence of GxE interactions. Similarly, analyses of BBED and D, controlling for E, do not provide insight into the role of G. We discuss how differences in the risk estimate for BBED, with and without control for E, may be interpreted.
Introduction
One of the premises of molecular epidemiology is that measures of biological parameters reflecting events along the causal chain between exposure and disease can provide insight into exposure-disease relationships (1, 2). These measures, referred to as biomarkers, have been classified into a series of categories based on their theoretical position and function on causal pathways (1, 2). These categories include biomarkers of exposure, BBED,3 biomarkers of altered function or effect, and biomarkers of susceptibility (Fig. 1). In this article, we focus on issues related to analyses of BBED: markers that measure the quantity of a xenobiotic that has entered the body, escaped detoxification, and interacted with a cellular molecular target (1, 2, 3). PAH-DNA adducts, AFB1-guanine adducts, and AFB1-albumin adducts are examples of BBED (1, 4). Interindividual variability in carcinogen-DNA or carcinogen-protein adduct levels is thought to result, in part, from GxE interactions between exposure levels (E) and polymorphisms in metabolic genes (G) (5, 6, 7).
An optimal test of the hypothesis that BBED is a biomarker resulting from a GxE interaction that is causally implicated in disease development would require information on (a) genetic polymorphisms related to the detoxification of the specific xenobiotic under investigation, (b) levels of exposure to the xenobiotic, (c) BBED, and (d) disease outcome. Thus far, relatively few studies have assessed associations between BBED and disease status; most studies in the molecular epidemiological literature have been transitional studies assessing associations among E, G, and BBED (4, 8, 9, 10, 11, 12). Because many of the etiologic studies have measures of some but not all of these variables, they have attempted to draw conclusions about the presence or absence of GxE interactions in disease development using three of the four elements described above. For example, several reports have examined this hypothesis either by assessing effect modification between BBED and a putative antecedent gene on disease risk or by assessing the association between BBED and D while controlling for a putative antecedent E. Modification of the association between BBED and D by the genetic factor, or a relationship between BBED and D that persisted after controlling for E, was interpreted as evidence for a GxE interaction in disease risk (4, 8, 9, 10, 11, 13). The validity of these interpretations is the focus of this article.
Two reports of nested case-control analyses have assessed effect modification of the association between AFB1-albumin adducts (the BBED) and hepatocellular carcinoma (D) by polymorphisms in the glutathione S-transferase genes GSTM1 and GSTT1 (G) (4, 13).4 Similarly, a nested case-control study assessed modification of the association between urinary AFB1-N7-guanine adducts and hepatocellular carcinoma by the GSTM1 polymorphism (11). The motivation for these studies was to test the hypothesis that individuals exposed to aflatoxin (E) who carry the deletion polymorphism (G) have higher concentrations of the reactive metabolite and, in turn, higher adduct levels (BBED) and thus higher disease risk. The theory is that the proteins coded for by the GST genes detoxify the reactive metabolite of aflatoxin, AFB1-8,9-epoxide, which binds to albumin and DNA to form AFB1 adducts. Individuals with the germ-line deletion polymorphism in these genes cannot detoxify the epoxide metabolite through these pathways and therefore, at the same level of exposure, are more likely to form adducts and develop disease (7). However, because there were no measures of the exposure (aflatoxin), effect modification between the biomarker and the genetic polymorphism on disease risk was measured instead of effect modification between the exposure and the polymorphism. The results of the former analysis were interpreted as if they were results of the latter.
Similarly, several reports [including ones co-authored by an author of this report (A. R.)] have assessed the association between BBED and D, controlling for E, to examine the GxE hypothesis. Specifically, these reports assessed the association between PAH-DNA adduct levels (BBED) and disease, controlling for a putative antecedent E (8, 9, 10, 14). A nonnull finding from this analysis was interpreted as evidence for a GxE interaction in disease formation. The reasoning behind this interpretation was that if variation in PAH-DNA adducts results from variation in exposure and metabolism, controlling for E will provide information on the effect of differences in metabolism, where these differences are thought to relate to unmeasured G (2, 9, 10). In retrospect, however, the validity of this interpretation is not clear: does controlling for a putative antecedent E allow us to make inferences about unmeasured G? And how should differences in risk estimates for BBED before and after control for E be interpreted?
Analyses of Intermediate Variables Cannot be Used to Assess Interactions between Antecedent Variables.
We argue that neither of these analytical approaches can provide evidence regarding the existence of GxE interactions. We begin with a discussion of the reasoning behind our conclusions and then provide illustrative numeric examples. The model of interest is that a GxE interaction leads to BBED, which in turn leads to disease. One can think of these variables as a causal chain in which G and E are interacting antecedents of BBED, which is an intermediate variable in causing the disease (Fig. 2). For a series of variables aligned on a causal chain, analyses of intermediate and downstream variables cannot provide information on unmeasured antecedent variables, because the intermediate variables partially integrate the effects of the antecedent variables. A well-known example of this occurs when intermediate variables are controlled for in an analysis of the association between an antecedent variable and disease. In such analyses, the effect of the antecedent variable can no longer be observed, because its effect is integrated into the intermediate variable that is held constant. Conversely, when both the antecedent and the intermediate variable are measured and the antecedent acts solely through its influence on the intermediate variable, controlling for the antecedent will have no effect on the relationship between the intermediate variable and the outcome, because the intermediate variable already includes the influence of the antecedent on the disease.
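Both halves of this argument can be checked with expected counts for a hypothetical chain E → BBED → D in which E acts only through BBED. The probabilities below are illustrative assumptions, not data from any cited study. Stratifying on BBED removes the E-D association, while stratifying on E leaves the BBED-D association unchanged.

```python
# Hypothetical chain E -> BBED -> D with no other path from E to D.
# Expected counts only (no sampling error); all numbers are illustrative.
N = 1000                      # subjects per exposure group (assumed)
P_BBED = {0: 0.10, 1: 0.40}   # P(BBED+ | E), hypothetical
RISK = {0: 0.01, 1: 0.05}     # P(D | BBED), hypothetical; depends on BBED only

# cells[(e, b)] = (persons, expected cases)
cells = {}
for e in (0, 1):
    for b in (0, 1):
        n = N * (P_BBED[e] if b else 1 - P_BBED[e])
        cells[(e, b)] = (n, n * RISK[b])

def risk(keys):
    """Pooled cumulative incidence over the listed (e, b) cells."""
    return sum(cells[k][1] for k in keys) / sum(cells[k][0] for k in keys)

# Crude E-D risk ratio: E looks harmful because it raises BBED.
crude_e = risk([(1, 0), (1, 1)]) / risk([(0, 0), (0, 1)])
# E-D risk ratio within each BBED stratum: the effect of E disappears.
e_given_b = [risk([(1, b)]) / risk([(0, b)]) for b in (0, 1)]
# BBED-D risk ratio, crude and within each E stratum: unchanged by control for E.
crude_b = risk([(0, 1), (1, 1)]) / risk([(0, 0), (1, 0)])
b_given_e = [risk([(e, 1)]) / risk([(e, 0)]) for e in (0, 1)]

print(f"crude E-D RR: {crude_e:.3f}")
print("E-D RR within BBED strata:", [round(x, 3) for x in e_given_b])
print(f"crude BBED-D RR: {crude_b:.3f}")
print("BBED-D RR within E strata:", [round(x, 3) for x in b_given_e])
```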
Conceptualized in this way, one can see why assessing effect modification between G and BBED (or, for that matter, between E and BBED) is not informative about the relationship between E and G in causing disease through BBED. If BBED is indeed the consequence of an interaction between E and G, the effect of this interaction on disease is already captured by the relationship between BBED and disease. An observation of effect modification between G and BBED is then the result of an interaction between G and the biomarker itself. For instance, the deletion polymorphism may cause a biologically effective dose to result in an early biological effect that would not have occurred without the deletion. Thus, observation of effect modification between G and BBED provides evidence for a genetic susceptibility to a process further down the causal chain from, rather than antecedent to, BBED.
In a similar manner, controlling for E in an analysis of the relationship between BBED and disease is not informative about the relationship between E and G in causing BBED. In the usual situation, controlling for an antecedent variable has no effect on the relationship between an intermediate variable and the outcome. It is important to recognize that virtually all antecedents act in interaction with other variables in causing outcomes. Thus, the effect of controlling for an antecedent whose interaction is unmeasured but posited will be no different from the effect of controlling for an antecedent whose interaction is similarly unmeasured but not posited (as in the well-known scenario described above). Controlling for the exposure will have no influence on the observed relationship between BBED and D, even if BBED is a result of the interaction between E and G.
Materials and Methods
In the following sections, we illustrate our perspective with three numerical examples of cohort data. We generated a series of example cohort studies that included measures of the variables necessary to assess the hypothesis of interest: exposure (E), a polymorphic gene (G), BBED, and disease (D). In each example, 2000 subjects were classified as having the wild-type genotype of a polymorphic gene, and 2000 subjects were classified as having the variant homozygous deletion polymorphism. Within each stratum of genotype, 1000 subjects were classified as exposed and 1000 as unexposed; exposure and genotype were independently distributed. Using this cohort as the background sample, we developed three different numerical examples, each involving one or more genetic interactions (Figs. 2, 3, and 4). For each numerical example, we calculated RRs for the associations between E and D and between BBED and D, by strata of G. This allows us to assess the interaction between E and G and between BBED and G. We also assessed the impact of controlling for E on the association between BBED and D: the data were stratified by E, and the stratum-specific RRs for the association between BBED and D were calculated. MH RRs for the BBED-D association were compared with the crude BBED-D RR to assess the presence of confounding. Through these methods, we performed the types of analyses used to test for GxE interactions when full information is unavailable. The results of these analyses were then compared with the true associations that we generated using the full information. Because these are idealized data, any difference in stratum-specific risk estimates represents true effect modification rather than sampling or measurement error. Our conclusions, therefore, apply to optimal situations. Sampling and measurement error, ubiquitous in real-life research, could only serve to amplify the invalidity of the results.
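The comparison of MH and crude RRs can be sketched as follows. This is a minimal implementation of the standard Mantel-Haenszel risk ratio for cumulative-incidence data; the `Stratum` class, the function names, and the counts are hypothetical and chosen only so that the stratifier acts as a confounder (RR of 2 in each stratum, but a larger RR after collapsing).

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """Cumulative-incidence counts for one stratum of the adjustment variable."""
    a: float   # cases among index subjects (e.g., BBED-positive)
    n1: float  # number of index subjects
    c: float   # cases among reference subjects (e.g., BBED-negative)
    n0: float  # number of reference subjects

def crude_rr(strata):
    """Risk ratio after collapsing the strata into a single 2 x 2 table."""
    a = sum(s.a for s in strata)
    n1 = sum(s.n1 for s in strata)
    c = sum(s.c for s in strata)
    n0 = sum(s.n0 for s in strata)
    return (a / n1) / (c / n0)

def mh_rr(strata):
    """Mantel-Haenszel risk ratio: sum(a*n0/T) / sum(c*n1/T), with T = n1 + n0."""
    num = sum(s.a * s.n0 / (s.n1 + s.n0) for s in strata)
    den = sum(s.c * s.n1 / (s.n1 + s.n0) for s in strata)
    return num / den

# Made-up counts: the RR is 2 in each stratum, but both the baseline risk and
# the proportion of index subjects differ across strata, so collapsing inflates it.
strata = [Stratum(a=10, n1=200, c=20, n0=800),
          Stratum(a=160, n1=800, c=20, n0=200)]
print(f"crude RR = {crude_rr(strata):.2f}, MH RR = {mh_rr(strata):.2f}")
```

A difference between the two estimates signals confounding by the stratifier; when the stratifier is not a confounder, as in the idealized examples below, the two coincide.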
It should be noted that in these models, interaction is defined on a multiplicative scale, the scale used in most studies that examine biomarkers.
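On this multiplicative scale, effect modification by G can be summarized as the ratio of the stratum-specific RRs, the quantity reported in the Results (e.g., 1.15 and 1.4 in example 2). In the sketch below, X stands for either E or BBED; the symbols X and θ are our notation, not the authors'. A value of θ = 1 indicates no multiplicative effect modification.

```latex
\mathrm{RR}_{XD \mid G=g} = \frac{P(D=1 \mid X=1,\ G=g)}{P(D=1 \mid X=0,\ G=g)},
\qquad
\theta_{X \times G} = \frac{\mathrm{RR}_{XD \mid G=1}}{\mathrm{RR}_{XD \mid G=0}}
```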
Results
Numerical Example 1: A True Interaction between G and E Causes BBED, which Causes Disease.
In numerical example 1, there is a true interaction between E and G on disease risk, indicated by the heterogeneity of the RRs for the exposure-disease relationship across strata defined by G (Fig. 2). When the analysis is carried out with information on both G and E, there is no problem, and the GxE interaction is detectable. However, analyses of BBED and D stratified by G (i.e., assessments of GxBBED interaction in causing disease) reveal no effect modification by genotype (Fig. 2). Similarly, the second proposed substitute analysis, controlling for E in analyses of BBED and D, does not provide any insight into GxE interaction: analyses of BBED and D adjusted for E yielded an MH RR identical to the crude RR; both RRs were 5.
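The structure of example 1 can be mimicked with expected counts in the 4000-subject design described in Materials and Methods. The probabilities below are hypothetical values chosen for illustration and are not the values in Fig. 2: G and E interact to produce BBED, and disease risk depends on BBED alone. The E-D RR differs across G strata, whereas the BBED-D RR does not, and the MH RR adjusted for E equals the crude RR.

```python
from itertools import product

N_CELL = 1000  # subjects per E x G cell, as in the article's design

def cohort(p_bbed, risk):
    """Expected persons and cases per (E, G, BBED) cell; no sampling error.

    p_bbed[(e, g)] is P(BBED+ | E=e, G=g); risk[(b, g)] is P(D | BBED=b, G=g).
    """
    cells = []
    for e, g, b in product((0, 1), repeat=3):
        p = p_bbed[(e, g)]
        n = N_CELL * (p if b else 1 - p)
        cells.append({"E": e, "G": g, "B": b, "n": n, "cases": n * risk[(b, g)]})
    return cells

def rr(cells, factor, **where):
    """Risk ratio for factor=1 vs factor=0 among cells matching `where`."""
    def risk_at(level):
        rows = [c for c in cells
                if c[factor] == level and all(c[k] == v for k, v in where.items())]
        return sum(c["cases"] for c in rows) / sum(c["n"] for c in rows)
    return risk_at(1) / risk_at(0)

def mh_rr(cells, factor, by):
    """Mantel-Haenszel risk ratio for `factor`, stratified on binary `by`."""
    num = den = 0.0
    for s in (0, 1):
        rows = [c for c in cells if c[by] == s]
        a = sum(c["cases"] for c in rows if c[factor] == 1)
        n1 = sum(c["n"] for c in rows if c[factor] == 1)
        b = sum(c["cases"] for c in rows if c[factor] == 0)
        n0 = sum(c["n"] for c in rows if c[factor] == 0)
        num += a * n0 / (n1 + n0)
        den += b * n1 / (n1 + n0)
    return num / den

# Hypothetical example-1-type model: GxE interaction produces BBED,
# and disease risk depends on BBED only (RR of 5 in both genotypes).
P_BBED = {(0, 0): 0.10, (1, 0): 0.30, (0, 1): 0.10, (1, 1): 0.60}
RISK = {(0, 0): 0.01, (1, 0): 0.05, (0, 1): 0.01, (1, 1): 0.05}
cells = cohort(P_BBED, RISK)

for g in (0, 1):
    print(f"G={g}: E-D RR {rr(cells, 'E', G=g):.2f}, BBED-D RR {rr(cells, 'B', G=g):.2f}")
print(f"BBED-D RR crude {rr(cells, 'B'):.2f}, MH adjusted for E {mh_rr(cells, 'B', 'E'):.2f}")
```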
This illustration represents the model invoked for the role of the GSTM1 and GSTT1 genes, both of which have a deletion polymorphism that precludes the detoxification of reactive metabolites through these pathways. These reactive metabolites may be detoxified by other metabolic gene products or may form carcinogen-DNA or carcinogen-protein adducts. In this example, we developed a model in which G modifies the effect of E but does not modify the effect of BBED; these data are consistent with the theoretical model. Even though there is a GxE interaction, the substitute models show no evidence of a BBED by G interaction. Therefore, using a G by BBED interaction as a proxy for an E by G interaction would not validly test the hypothesis of GxE interaction. Similarly, the BBED-to-D risk ratio controlling for E is identical to the crude BBED-to-D risk ratio; thus, control for E has no effect on the risk estimate. Contrary to what has previously been assumed, controlling for E in these analyses does not provide any indication of the presence of GxE interactions. If this is so, the question arises as to the correct interpretation of results that do indicate interaction in these substitute models. We illustrate this in the next two numerical examples.
Numerical Example 2: An Interaction between G and BBED Causes Disease.
In numerical example 2, E is an antecedent of BBED, but there is no GxE interaction in the pathway from E to BBED. There is, however, an interaction between G and BBED; i.e., G has an effect that occurs after BBED formation. Thus, this example posits an underlying model different from the one of theoretical interest. Analyses of E and D stratified by G revealed apparent effect modification (Fig. 3). However, this effect is driven by the interaction between BBED and G together with the correlation between E and BBED. That is, effect modification between G and a variable further upstream (here, the antecedent E) can reflect an interaction further downstream (between G and the consequence of E, i.e., BBED), but, as we saw in example 1, the reverse is not true. As expected, analyses of BBED and D stratified by G show the presence of effect modification (Fig. 3). In analyses of BBED and D, control for E does not alter the BBED RR; the crude and MH RRs are both 6.
Although our model posits biological interaction between G and BBED, we also observed effect modification between G and E. However, the extent of effect modification in analyses of G and E is smaller than that observed in analyses of G and BBED. The ratio of stratum-specific RRs for analyses of E and D stratified by G is 1.15, compared with a ratio of 1.4 for analyses of BBED and D stratified by G. This occurs because E is an antecedent of BBED, and so, in interaction analyses of E and G, E serves as a proxy measure for BBED. In a study that has measures of E, G, BBED, and D, comparisons of the extent of effect modification between E and G and between G and BBED could be used to infer where along the causal chain a biological interaction occurs. However, contrary to discussions in the literature, the observation of effect modification between BBED and G alone does not suggest the presence of a GxE interaction. Rather, it reflects an interaction further down the causal chain. Again, as in example 1, control for E does not alter the risk ratio calculated for BBED, and thus, control for E does not provide any additional information relevant to assessing interactions with G.
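The same pattern appears in a hypothetical example-2-type model built with expected counts. Here BBED depends on E only (no GxE), and G modifies the BBED-D RR (4 in G = 0 versus 10 in G = 1). These values are our own illustrative assumptions, not those of Fig. 3. The apparent E-by-G modification is weaker than the BBED-by-G modification, and control for E leaves the BBED-D RR unchanged.

```python
from itertools import product

N_CELL = 1000  # subjects per E x G cell, as in the article's design

def cohort(p_bbed, risk):
    """Expected persons and cases per (E, G, BBED) cell; no sampling error."""
    cells = []
    for e, g, b in product((0, 1), repeat=3):
        p = p_bbed[(e, g)]
        n = N_CELL * (p if b else 1 - p)
        cells.append({"E": e, "G": g, "B": b, "n": n, "cases": n * risk[(b, g)]})
    return cells

def rr(cells, factor, **where):
    """Risk ratio for factor=1 vs factor=0 among cells matching `where`."""
    def risk_at(level):
        rows = [c for c in cells
                if c[factor] == level and all(c[k] == v for k, v in where.items())]
        return sum(c["cases"] for c in rows) / sum(c["n"] for c in rows)
    return risk_at(1) / risk_at(0)

def mh_rr(cells, factor, by):
    """Mantel-Haenszel risk ratio for `factor`, stratified on binary `by`."""
    num = den = 0.0
    for s in (0, 1):
        rows = [c for c in cells if c[by] == s]
        a = sum(c["cases"] for c in rows if c[factor] == 1)
        n1 = sum(c["n"] for c in rows if c[factor] == 1)
        b = sum(c["cases"] for c in rows if c[factor] == 0)
        n0 = sum(c["n"] for c in rows if c[factor] == 0)
        num += a * n0 / (n1 + n0)
        den += b * n1 / (n1 + n0)
    return num / den

# Hypothetical example-2-type model: BBED depends on E only (no GxE),
# and G modifies the BBED-D risk ratio (4 in G=0, 10 in G=1).
P_BBED = {(0, 0): 0.10, (1, 0): 0.40, (0, 1): 0.10, (1, 1): 0.40}
RISK = {(0, 0): 0.01, (1, 0): 0.04, (0, 1): 0.01, (1, 1): 0.10}
cells = cohort(P_BBED, RISK)

e_ratio = rr(cells, "E", G=1) / rr(cells, "E", G=0)
b_ratio = rr(cells, "B", G=1) / rr(cells, "B", G=0)
print(f"ratio of stratum-specific RRs by G: E-D {e_ratio:.2f}, BBED-D {b_ratio:.2f}")
print(f"BBED-D RR crude {rr(cells, 'B'):.2f}, MH adjusted for E {mh_rr(cells, 'B', 'E'):.2f}")
```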
Numerical Example 3: An Interaction between G and E Causes BBED and an Interaction between G and BBED Causes Disease.
Numerical example 3 is the combination of examples 1 and 2, so there are both GxE and GxBBED interactions, both of which are of the same magnitude as used in examples 1 and 2. Analyses of E and D by strata of G revealed effect modification that is larger than that seen in example 1 (Fig. 4). Analyses of BBED and D by strata of G also reveal effect modification, but the magnitude of the effect is identical to that seen in example 2 (Fig. 4). In analyses of BBED and D, stratification by E reveals a small amount of effect modification. Among the exposed, the BBED to D risk ratio is 6.33 and among the unexposed the risk ratio is 6.00.
In example 3, we modeled biological interactions between G and E and between G and BBED. Here, we observe the strongest effect modification between E and G; this occurs because there is an interaction between E and G, E causes BBED, and there is an interaction between G and BBED. This observation reinforces the conclusion that if an interaction occurs between G and any marker on the causal chain, effect modification will be observed between G and any causal antecedent of that marker if the marker is not included in the statistical model. The strength of the effect modification diminishes as the antecedent variable is further removed along the causal chain from the point of biological interaction. We also observe that the effect modification between G and BBED in example 3 is the same as in example 2; i.e., the interaction between G and E does not affect the extent of effect modification observed between G and BBED. This reinforces our point that the presence or absence of effect modification between BBED and G cannot be used to draw conclusions about the presence or absence of interactions between G and E.
In example 3, analyses of BBED and D stratified by E suggest the presence of effect modification by E. This occurs because there is a higher proportion of subjects who are BBED positive and have the variant polymorphism in the exposed stratum compared with the unexposed stratum. Because there is an interaction between G and BBED and there is a higher proportion of subjects with both these traits in the exposed stratum, the association between BBED and D appears slightly stronger among the exposed. Additional stratification by G in each stratum of E reveals the same associations between BBED and D in each stratum of E (data available upon request).
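Combining the hypothetical GxE parameters for BBED formation with hypothetical GxBBED disease risks (again our own illustrative values, not those of Fig. 4) reproduces the behavior described above. The BBED-D RR differs between E strata only because the exposed stratum contains a larger share of BBED-positive variant carriers. Within each joint E-by-G stratum, the BBED-D RR depends on G alone.

```python
from itertools import product

N_CELL = 1000  # subjects per E x G cell, as in the article's design

def cohort(p_bbed, risk):
    """Expected persons and cases per (E, G, BBED) cell; no sampling error."""
    cells = []
    for e, g, b in product((0, 1), repeat=3):
        p = p_bbed[(e, g)]
        n = N_CELL * (p if b else 1 - p)
        cells.append({"E": e, "G": g, "B": b, "n": n, "cases": n * risk[(b, g)]})
    return cells

def rr(cells, factor, **where):
    """Risk ratio for factor=1 vs factor=0 among cells matching `where`."""
    def risk_at(level):
        rows = [c for c in cells
                if c[factor] == level and all(c[k] == v for k, v in where.items())]
        return sum(c["cases"] for c in rows) / sum(c["n"] for c in rows)
    return risk_at(1) / risk_at(0)

# Hypothetical example-3-type model: GxE produces BBED, and G also modifies
# the BBED-D risk ratio (4 in G=0, 10 in G=1).
P_BBED = {(0, 0): 0.10, (1, 0): 0.30, (0, 1): 0.10, (1, 1): 0.60}
RISK = {(0, 0): 0.01, (1, 0): 0.04, (0, 1): 0.01, (1, 1): 0.10}
cells = cohort(P_BBED, RISK)

for e in (0, 1):
    by_g = [round(rr(cells, "B", E=e, G=g), 2) for g in (0, 1)]
    print(f"E={e}: BBED-D RR {rr(cells, 'B', E=e):.2f}; within G=0/G=1: {by_g}")
```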
Discussion
Several publications have analyzed markers of BBED and disease outcome either by controlling for a putative antecedent exposure or by assessing effect modification between the BBED marker and a putative antecedent gene. Such analyses are commonly used in case-control and nested case-control studies of carcinogen-DNA or carcinogen-protein adducts (BBED) and cancer outcomes to test for evidence of gene-exposure interactions. However, as our numerical examples attest, such analyses of intermediate markers cannot provide evidence for interactions between antecedent variables.
Our examples were designed to reflect the hypothesized causal pathways in the cited publications. In these causal models, GSTM1 modifies the association between exposure and carcinogen-DNA adduct formation, which, in turn, is hypothesized to increase the risk of cancer. We set the prevalence of the variant genotype at 50%. For a continuous variable such as dietary aflatoxin exposure, it would be reasonable to dichotomize subjects into high versus low exposure categories at the median level of exposure; accordingly, we also set the prevalence of exposure at 50%. In many ways, this represents the optimal scenario for detecting GxE interactions, giving sufficient power for stratified analyses (15). We also wanted to create an idealized scenario, for instance, one with no measurement or sampling error, to show that these analytical approaches do not work even in the best of circumstances.
However, the exact prevalences of exposure and genotype in our examples are immaterial to the central point, which is that the cited analytical approaches do not provide evidence for or against the existence of GxE interactions. A comparison of examples 2 and 3 shows that the presence of a GxE interaction does not alter the extent of effect modification observed in analyses of GxBBED interaction. Both examples show a ratio of 1.4 (7/5) for the stratum-specific RRs representing the association between BBED and D, even though example 3 includes a GxE interaction that precedes the GxBBED interaction and example 2 has no interaction between G and E. Thus, analyses of GxBBED interactions do not provide any information about potential GxE interactions.
We have also shown that controlling for E in analyses of BBED and D does not generally alter the risk ratio for BBED. However, these examples assume that E has no effect on risk other than that mediated by BBED. If E also influences risk via another causal pathway, control for E will alter the observed risk ratio for BBED; in this case, E acts as a confounder. If the association between E and D that occurs through the secondary pathway is positive, control for E would be expected to diminish the risk ratio observed for BBED. Only when the second pathway is inversely associated with BBED, or when E has a protective effect through a second pathway, will control for E cause the BBED risk ratio to increase. In any case, however, a change in the BBED-disease relationship after control for E would not be an indication of a GxE interaction.
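The direction of this change can be checked with a small hypothetical model. In it, E raises the probability of being BBED-positive and also multiplies disease risk by a factor `direct_effect` through a pathway that bypasses BBED; the true BBED-D RR within each level of E is fixed at 5. All parameter values are assumptions chosen for illustration. A harmful direct path inflates the crude RR above the MH RR, a protective one deflates it, and with no direct path the two agree.

```python
def bbed_rrs(direct_effect, n_per_e=1000, p_bbed=(0.10, 0.40),
             base_risk=0.01, rr_bbed=5.0):
    """Crude and E-adjusted (Mantel-Haenszel) BBED-D risk ratios.

    Hypothetical model: E raises the probability of being BBED-positive
    (p_bbed[e]) and also multiplies disease risk by `direct_effect`
    through a pathway that bypasses BBED. The true BBED-D risk ratio
    within each level of E is rr_bbed.
    """
    strata = []  # (cases BBED+, n BBED+, cases BBED-, n BBED-) per level of E
    for e in (0, 1):
        n1 = n_per_e * p_bbed[e]
        n0 = n_per_e - n1
        r0 = base_risk * (direct_effect if e else 1.0)  # risk in BBED-negatives
        strata.append((n1 * rr_bbed * r0, n1, n0 * r0, n0))
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    crude = (a / n1) / (c / n0)
    num = sum(s[0] * s[3] / (s[1] + s[3]) for s in strata)
    den = sum(s[2] * s[1] / (s[1] + s[3]) for s in strata)
    return crude, num / den

for label, d in (("harmful", 2.0), ("none", 1.0), ("protective", 0.5)):
    crude, adjusted = bbed_rrs(d)
    print(f"direct effect of E {label}: crude {crude:.2f}, adjusted {adjusted:.2f}")
```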
In the context of interactions, it is important to remember that most determinations of BBED used in epidemiological studies are performed in blood samples and are assumed to be a surrogate for BBED determinations in the target tissue (3). Furthermore, most studies include measures taken at only one time point, and it has been shown that there is intraindividual variability in BBED levels (6). Additionally, the laboratory assays themselves are not completely reliable (16). Thus, there is likely to be a good deal of measurement error associated with BBED determinations. Likewise, there is likely to be error in the measurement of E, which in case-control studies might also be differential because of recall bias. Moreover, because one rationale for using BBED when exposures are ambient and occur through multiple routes is that BBED reduce measurement error, measures of E are likely to have more measurement error than those of BBED (2). It is well established that in stratified analyses of a risk factor, measurement error in the ascertainment of the risk factor can cause the stratum-specific risk estimates to diverge, giving the appearance of effect modification where none truly exists (17).
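One concrete way this can happen: if the prevalence of BBED positivity differs between genotype strata (as a GxE model implies), the same nondifferential misclassification of BBED biases the BBED-D RR by different amounts in each stratum. The sketch below assumes a hypothetical sensitivity of 0.8, specificity of 0.9, BBED prevalences of 0.2 and 0.5 in the two strata, and a true RR of 5 in both; none of these values comes from the cited studies.

```python
def observed_rr(prev, se=0.8, sp=0.9, n=2000, r1=0.05, r0=0.01):
    """BBED-D risk ratio after nondifferential misclassification of BBED.

    prev: true BBED-positive prevalence in the stratum (hypothetical)
    se, sp: assay sensitivity and specificity (hypothetical)
    r1, r0: true disease risks in truly BBED-positive and -negative subjects
    """
    true_pos, true_neg = n * prev, n * (1 - prev)
    # Subjects labelled BBED-positive: correctly classified positives plus
    # false positives, each group keeping its true disease risk.
    lab_pos_n = true_pos * se + true_neg * (1 - sp)
    lab_pos_cases = true_pos * se * r1 + true_neg * (1 - sp) * r0
    # Subjects labelled BBED-negative: false negatives plus true negatives.
    lab_neg_n = true_pos * (1 - se) + true_neg * sp
    lab_neg_cases = true_pos * (1 - se) * r1 + true_neg * sp * r0
    return (lab_pos_cases / lab_pos_n) / (lab_neg_cases / lab_neg_n)

for g, prev in ((0, 0.2), (1, 0.5)):
    print(f"G={g}: true RR 5.00, observed RR {observed_rr(prev):.2f}")
```

Both observed RRs are attenuated toward 1, but by different amounts, so the strata appear heterogeneous even though the true BBED-D RR is identical.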
In conclusion, we have illustrated that analyses of effect modification between G and BBED cannot be used to draw conclusions about biological interactions between G and E. The observation of true effect modification between G and BBED suggests the presence of a biological interaction further down the causal pathway between BBED and D. However, we caution that measurement error in BBED determinations can easily create the appearance of effect modification when none truly exists. Lastly, we have shown that, contrary to prior discussions, analyses of BBED and D that control for E do not provide any information regarding the presence of GxE interactions.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This work was supported by National Cancer Institute Grant KO7-CA92348 and a grant from the New York City Council Speaker’s Fund for Public Health Research.
The abbreviations used are: BBED, biomarkers of biologically effective dose; PAH, polycyclic aromatic hydrocarbon; AFB1, aflatoxin B1; GxE, gene-environment; MH RR, Mantel-Haenszel relative risk; RR, relative risk.
Similar analyses have been performed in a case-control study of lung cancer in which the interaction between PAH-DNA adduct levels in WBCs and GSTM1 genotype was assessed (8).
Acknowledgments
We thank Drs. Federica Perera and Regina Santella and Ulka Campbell for reviewing and providing us feedback on drafts of this manuscript. We also thank Dr. Alex Halim for his thoughtful comments.