The ascendancy of the molecular epidemiology approach (1, 2), simply defined as including biomarkers in population-based study designs, is clear from any survey of current population studies. Most scientists would agree that embedding advanced technologies in molecular epidemiology designs will be key to achieving breakthroughs. Given the increasingly central role of molecular epidemiology in unraveling major questions in cancer, there is a compelling argument to expand its scope to include two areas that are often absent, namely, behavior and outcome. Integrative epidemiology refers to population-based study designs incorporating information or biomarkers that add the behavior and outcome ‘wings’ to the familiar molecular epidemiology paradigm that extends from exposure to disease (see Fig. 1, integrative epidemiology).
Integrative epidemiology is simply the familiar molecular epidemiology paradigm with the ‘wings’, behavior and outcome, added.
First, with regard to behavior, it is well established that the exposures that account for the majority of human cancer, such as tobacco (3), alcohol (4), and diet/energy balance (5), have strong hereditary components. Yet, although even the very earliest molecular epidemiology studies emphasized investigation of genes that process exposures as potentially influencing cancer (6), the study of genes that contribute to the key exposures responsible for cancer in the population has often been left to behavioral geneticists who focus on more extreme clinical samples (e.g., alcoholics and the morbidly obese). The lack of attention by cancer epidemiologists seems shortsighted because the genetic contribution to the exposures themselves is at least as strong as the hereditary component of risk of the individual cancers (7). A comprehensive understanding of the genetic architecture of cancer, not to mention cancer etiology, will be deficient without understanding the genes that influence the likelihood of the exposure itself. Moreover, selected genes influence both the likelihood of exposure and the downstream effects [e.g., CYP2A6 for tobacco (8) and the alcohol dehydrogenase and aldehyde dehydrogenase gene families for alcohol (9)]. Understanding such pleiotropic effects will be critical to unraveling the complex interplay of genes and environment. Complementing the role of genes, and considering tobacco as an example, behavioral measures can include dependency (e.g., Fagerstrom test for nicotine dependence; ref. 10), psychological characteristics (e.g., depression and anxiety), traits (e.g., introversion and neuroticism), other substance use, psychiatric disorders (e.g., schizophrenia and eating disorders), or exposure markers (e.g., cotinine).
As behaviors associated with tobacco smoking, obesity, and alcohol consumption are resistant to modification, enhanced behavioral data on populations relevant to cancer may yield new prevention or therapeutic strategies or allow existing approaches to be better tailored to individuals. Just as expression/proteomic/genetic characterization of disease is expected to redefine disease categories and provide insights into molecular etiology, combined behavior and exposure information may result in more precise definitions of tobacco phenotypes (11).
Second, with respect to outcome, molecular epidemiology emphasizes cancer incidence, with the investigation of outcomes relegated to clinical trials, where, unfortunately, an emphasis on exposure, genetics, and biomarkers has, until recently, often been absent. Heredity influences treatment response and toxicity, and at least a few genes influence current therapy (12) and specifically cancer therapy (12). It is widely anticipated that genomics research (integrated with proteomics and expression) will eventually enhance risk stratification by genetic or biomarker-defined subsets, with the potential for targeted prevention and therapy (i.e., personalized medicine; ref. 13). Spitz has used the term integrative epidemiology (14) to advocate this approach, emphasizing that selected genes/markers may influence both risk and outcome and highlighting glutathione S-transferase family genes, cyclin D1, and matrix metalloproteinase 1 as examples. How common such associations are is unknown, but because we currently lack a comprehensive understanding of the common modifier genes for any major human cancer (15), it would seem clear that if both disease and outcome can be investigated using the same technology, biospecimens, and study platform, such an effort could be both informative and efficient.
There are two central advantages of integrative epidemiology. The first is that a given study platform can accommodate a greatly expanded group of investigations. Epidemiologists traditionally emphasize well-defined exposure and disease measures in their studies, along with the collection of biospecimens. Given this baseline commitment, adding the relevant behavior and outcome components is efficient, as often minor adjustments to existing designs (e.g., questionnaires, biospecimens, collection protocols, and informed consents) may capture large new classes of information. Even when more substantial resources are required, the overall effort will be far less than fielding a de novo study. Epidemiologists' traditional focus on data quality will enhance interpretation. The application of diverse high-technology approaches (genomics, proteomics, etc.), along with clinical and epidemiologic data and tissue resources, is expected to provide the molecular tools to both elucidate mechanism and refine risk and response (16, 17).
The second advantage is that integrative epidemiology allows investigation of characteristics (genes, biomarkers, etc.) that span more than one study feature (e.g., a gene with pleiotropic effects related to both disease and outcome, as noted above, or to both behavior and exposure). For example, tobacco smoking is related to lung cancer, and smokers who persist after diagnosis have increased second tumors and a poorer outcome (18). Isolated investigations will not reveal the actual complex roles of pleiotropic genes relevant to cancer; that requires integrative epidemiology studies that include the relevant study domains (e.g., disease and outcome). The additional data will also permit more sophisticated analysis to sort out causal pathways in the presence of mediator variables (19) and interaction (20).
Arguments against such approaches generally cite cost and complexity, in part because integrative studies require participation of diverse disciplines, with resulting large study teams. Both meta-analyses and recent findings from large scans (21) indicate that relative risks of susceptibility genes for common cancers are modest, so substantial sample size is mandatory to detect weak genetic signals, to achieve power to detect gene-environment and gene-gene effects, and to investigate subgroups. Accordingly, consortia have formed or are planned for virtually every major disease (22), and it is clear that team science featuring interdisciplinary teams will be an increasingly prominent trend (23). It follows that the increased cost will mandate that available efficiencies be applied to maximize the science. If we consider each study domain (behavior, outcome, exposure, genetics, disease, etc.) to be a ‘node’, and the connections between nodes to reflect potential areas for scientific investigation, adding behavior and outcome clearly enhances scientific opportunities (i.e., increases the number of nodes and therefore of possible connections) for a given study platform. For instance, if every pair of domains is treated as a potential connection, three domains (exposure, genetics, and disease) yield 3 connections, whereas five domains yield 10. Efficiency also improves with larger studies [i.e., the absolute cost of an integrative epidemiology study is greater but the marginal cost (per unit of information) is lower].
In conclusion, integrated designs that incorporate behavior and outcome data into population-based studies will help fully realize the benefits anticipated for molecular epidemiology. As we commit large resources to conducting high-technology studies, platforms that incorporate these elements will provide an enhanced scientific payoff.