Despite widespread enthusiasm for the candidate gene approach to identify common but “low penetrance” susceptibility genes that predict cancer risk (1), in the aggregate, association studies involving complex diseases have yielded a scant harvest (2). One review concluded recently that only 6 of 166 disease associations that were studied three or more times were replicated consistently (3). Another systematic review of genetic polymorphisms and breast cancer could not exclude a potential role for a variety of candidates, but neither could it verify significant risk associated with any specific gene (4). Earlier studies, as well as meta-analyses, involving putative susceptibility genes and lung cancer have been inconclusive. Therefore, uncertainty as to whether polymorphic variation in the MPO gene alters susceptibility to lung cancer is typical rather than unique, and we need to examine the issue in a broad context. This provides an opportunity to consider why well-conducted studies such as the reports by Feyler et al. (5) and Xu et al. (6) involving the relationship of the MPO gene polymorphism and lung cancer arrive at seemingly opposing conclusions.
I will frame the discussion of the genetics of complex disease as pertains to cancer in general and MPO and lung cancer in particular around the following questions.
Is there sound evidence that a hereditary component of common malignancies exists?
If inherited susceptibility to common cancers exists, is it important to identify?
What does the current evidence indicate?
If finding genetic susceptibility factors is important, why, after more than a decade of work, have they not been clearly identified? What can be done to resolve the question?
Evidence for an Influence of Heredity in Common Cancers
The preponderance of evidence from diverse approaches supports at least some hereditary component for virtually all of the major malignancies. Approaches include enumerating cancer in relatives of cases compared with the relatives of controls and twin studies that show enhanced concordance in monozygotic compared with dizygotic twins (7, 8). The high penetrance genes that create striking familial aggregations of cancer (i.e., the BRCA mutations and breast cancer) only explain a fraction of the genetic component of risk.
The earliest studies to evaluate candidate genes and tobacco-related cancers began in the pregenotyping era (i.e., before 1998). Probe drugs corresponding to potentially carcinogenic compounds metabolized by the genes of interest were administered and “metabolic phenotypes” constructed based on the distribution of recovered metabolites; genotypes were inferred. Of the first four metabolic polymorphisms studied by this approach (9, 10, 11, 12), only one association can be considered well established. Mechanistic (13) and population data (14) are reasonably concordant, and establish that NAT2 “slow acetylators” exhibit increased risk for bladder cancer. Evidence that polymorphisms in CYP1A1, GSTM1, or CYP2D6 influence susceptibility to lung cancer remains controversial. For the first two, mechanistic evidence is convincing (15), but meta-analyses now including a substantial number of studies exhibit only weak evidence for a small effect (16, 17, 18). For CYP2D6, both mechanistic evidence and population data are null to weakly suggestive (19).
Why Is It Important to Identify Low Penetrance Genes That Contribute to Cancer?
Given that the environment plays a dominant role in most common cancers (20, 21), why is identifying a “minor” genetic component important? The history of epidemiology indicates that progress in prevention (i.e., public health) and treatment (i.e., clinical) of cancer can occur without knowledge of the specific mechanism (or genes) that contribute to susceptibility. However, for many malignancies, substantive advances in both etiologic understanding (i.e., brain and prostate) and in therapy (i.e., lung) have been elusive. Given the slow progress, improved insight from understanding how genes promote or block cancer can be crucial. Moreover, even a modest contribution of a common genetic factor would imply a substantial attributable risk (22). A clear understanding of one or more susceptibility genes could provide improved mechanistic insight, clues to exposure mechanisms (i.e., a gene clearly related to susceptibility may suggest a specific etiologic agent), or may suggest a target for chemoprevention or other interventions. Even more importantly, such a finding can be seen as key to elucidating a fundamental question about cancer. Given that on the population level, most cancer depends on environmental factors, but on a cellular level, genetic lesions characterize cancer, understanding how these two processes intersect will be of fundamental importance. This is true even if in most instances, hereditary susceptibility genes are not the same as the genes somatically mutated.
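The point about attributable risk can be made concrete with Levin's formula for the population attributable fraction, PAF = p(RR - 1) / [1 + p(RR - 1)], where p is the prevalence of the at-risk genotype and RR is its relative risk. The sketch below is illustrative only: the prevalences and relative risks are hypothetical values chosen to contrast a common, low penetrance variant with a rare, high penetrance one, and are not taken from any study cited here.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: share of cases in the population attributable to the factor."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical scenarios (assumed values, for illustration only).
# A common variant carried by 40% of the population with a modest relative risk of 1.5:
common_modest = population_attributable_fraction(prevalence=0.40, relative_risk=1.5)
# A rare variant carried by 0.1% of the population with a relative risk of 10:
rare_strong = population_attributable_fraction(prevalence=0.001, relative_risk=10.0)

print(f"common, modest effect: {common_modest:.3f}")
print(f"rare, strong effect:   {rare_strong:.3f}")
```

Under these assumed inputs, the common variant accounts for a far larger share of cases than the rare one, despite its much smaller relative risk.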
What Does the Evidence Suggest and Do the Classic Criteria for Evaluation of Causality Help?
What can we conclude from the published MPO/lung cancer studies, and what are the implications for the broader literature? One starting point is to consider the major criteria for the evaluation of causality (23, 24) that include biological plausibility, strength and specificity of the association, temporality, and dose response.
MPO is an enzyme that generates reactive oxygen species and thereby contributes to bactericidal activity in white cells. The cost of enhanced killing by the more active forms of MPO is increased inflammation, and the balance of polymorphic gene variants that alter activity likely reflects selection pressure in different populations. In a manner similar to other polymorphic Phase I enzymes (i.e., CYP1A1), MPO converts polycyclic hydrocarbons such as benzo(a)pyrene to DNA-binding intermediates (25), potentially contributing to pulmonary carcinogenesis, as well as to other conditions such as arteriosclerosis and Alzheimer’s (26). A G-463-A variant in the promoter region of MPO alters an SP1 binding site and results in decreased transcriptional activity. One report indicates that the action of MPO is additionally catalyzed by nicotine, suggesting the possibility of an enhanced effect in smokers (27). The evidence for biological plausibility is generally supportive.
Strength of Association.
Before the studies in the current issue, the hypothesis of an association between an MPO variant and lung cancer had been addressed in six published studies: London et al. (28), LeMarchand et al. (29), Cascorbi et al. (30), Schabath et al. (31), Misra et al. (32), and Kantarci et al. (33). A detailed quantitative treatment of the data from all of the previous studies involving MPO and lung cancer exceeds the scope of this article, but considering the six earlier published studies, the summary risk ratio is 0.92 (95% confidence interval, 0.75–1.04) for heterozygotes and 0.66 (0.36–0.78) for AA homozygotes, with borderline nonsignificant heterogeneity. Three of the six studies are nominally “significant,” and five of the six studies exhibit an overall effect in the direction hypothesized. When the small positive study by Feyler et al. (5) and the large, convincingly null study by Xu et al. (6) are included, the summary risk ratios lose statistical significance, whereas heterogeneity becomes highly significant. Heterogeneity and loss of significance for the overall finding are entirely attributable to the Xu et al. (6) study (based on a “sensitivity analysis;” results not shown), consistent with its larger size and overall null results.
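The kind of summary and sensitivity analysis described here can be sketched as follows. This is a minimal illustration, assuming a fixed-effect inverse-variance model with Cochran's Q as the heterogeneity statistic; the pooling method actually used for the figures above is not stated in this article. All odds ratios and confidence intervals in the example are invented for illustration and are not the data from the MPO studies.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def pooled_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples with 95% intervals.
    Returns (pooled_or, ci_lower, ci_upper, cochran_q, degrees_of_freedom).
    """
    log_ors, weights = [], []
    for odds_ratio, lower, upper in studies:
        # Recover the standard error of log(OR) from the width of the 95% interval.
        se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * se_pooled),
        math.exp(pooled + Z_95 * se_pooled),
        q,
        len(studies) - 1,
    )


def leave_one_out(studies):
    """Sensitivity analysis: re-pool after omitting each study in turn."""
    return [pooled_fixed_effect(studies[:i] + studies[i + 1:]) for i in range(len(studies))]


# Hypothetical inputs (NOT the published MPO data): three small studies that
# lean protective, plus one large study with a null result.
small_studies = [(0.70, 0.45, 1.09), (0.65, 0.40, 1.06), (0.80, 0.50, 1.28)]
large_null_study = (1.05, 0.90, 1.22)

result_small = pooled_fixed_effect(small_studies)
result_all = pooled_fixed_effect(small_studies + [large_null_study])
sensitivity = leave_one_out(small_studies + [large_null_study])

for label, res in (("small studies only", result_small), ("all studies", result_all)):
    print(f"{label}: OR={res[0]:.2f} ({res[1]:.2f}, {res[2]:.2f}), Q={res[3]:.2f}, df={res[4]}")
```

With these invented inputs, adding the large null study pulls the pooled estimate toward 1, widens the disagreement among studies as measured by Q, and omitting that one study restores the original pooled estimate. This mirrors the qualitative pattern described above without reproducing the actual numbers.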
With regard to dose response, as indicated above, there is a tendency, but not significant evidence, in the summary data for the risk in heterozygotes (GA) to be intermediate between that of the two homozygous genotypes.
With regard to temporality, this criterion is inherent in the germ-line nature of the gene.
With regard to consistency, given the heterogeneity in the eight published studies, this criterion is not met.
Specificity of Association.
There is some evidence for an effect in other tobacco-related cancers (i.e., laryngeal although null in pharyngeal cancer; Ref. 34), whereas a study in lymphoma, a nontobacco-related malignancy, was null (35). Also generally consistent with the postulated mechanism are reported associations with atherosclerosis, stroke, liver fibrosis in hepatitis C (36), Alzheimer’s disease, and multiple sclerosis (37), as well as a finding of a relation of the MPO variant to DNA adducts levels in mammary tissue (38). However, all of these associations require additional confirmatory studies.
Taken together, these categories of evidence, weighed against the criteria for causality, do not allow a clear conclusion with regard to MPO and lung cancer.
Perspectives on Nonconfirmation?
Methodologists from different disciplines have offered diverse perspectives on the reasons why verification of gene-disease associations has been so inconsistent.
Virtually all of the observers note that power has been inadequate, particularly for interactive effects (i.e., gene-environment and gene-gene; Ref. 39). Although population stratification is both a real and theoretical problem, recent work suggests it does not pose a substantial source of bias sufficient to justify the inefficiency and other drawbacks of using family controls in population study settings (40).
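The power problem can be illustrated with a simple calculation. The sketch below assumes an unmatched case-control comparison of carrier frequency between equal numbers of cases and controls, uses the usual normal approximation for a two-sided test of two proportions, and ignores the negligible contribution of the opposite tail. The carrier frequency, odds ratio, and sample sizes are hypothetical and are not drawn from the studies discussed here.

```python
from math import sqrt
from statistics import NormalDist


def case_control_power(control_carrier_freq: float, odds_ratio: float,
                       n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a carrier-frequency difference.

    Normal approximation for a two-sided, two-proportion test with
    n_per_group cases and n_per_group controls.
    """
    p0 = control_carrier_freq
    # Carrier frequency among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    numerator = abs(p1 - p0) * sqrt(n_per_group) - z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
    denominator = sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    return NormalDist().cdf(numerator / denominator)


# Hypothetical scenario: 40% of controls carry the variant and the true odds ratio is 1.3.
power_small = case_control_power(0.40, 1.3, n_per_group=200)
power_large = case_control_power(0.40, 1.3, n_per_group=2000)

print(f"200 cases and 200 controls:   power = {power_small:.2f}")
print(f"2000 cases and 2000 controls: power = {power_large:.2f}")
```

Under these assumptions, a study with a few hundred subjects per group is more likely to miss a modest effect than to detect it. Tests of interaction would require subdividing the same subjects further, which is why power for gene-environment and gene-gene effects is lower still.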
Epidemiologists and statisticians have described poor control selection, design flaws, publication bias, disease misclassification (“phenotype” definition in genetics parlance), failure to obtain or poor quality exposure data, misclassification bias, excessive type 1 errors (false-positives), post-hoc and subgroup analysis, and multiple comparisons as the likely culprits (41, 42). Geneticists point to failure to consider the mode of inheritance, genetic and disease heterogeneity, failure to account for the linkage disequilibrium structure of the gene, inadequate characterization of the mutations that account for the genetic abnormalities, and the likelihood that genes identified only account for a small proportion of the variability in risk, i.e., power (43, 44, 45).
All of the above points are relevant; however, it may be that additional sources of complexity add to the challenge of identifying the specific genes that account for susceptibility to cancer. Three types of complexity will be considered.
In explaining failed attempts to find genes that account for major psychiatric conditions such as manic depression and schizophrenia, behavioral geneticists cite the difficulty in defining the phenotype (46). Cancer epidemiologists would call this “disease misclassification,” but generally argue that cancer studies enjoy the theoretical advantage of a diagnosis established by pathological tissue examination, rather than one at least historically based on solely clinical criteria. In fact, there is every reason to believe that what we clinically label and traditionally study as “cancer” will not have a one-to-one correspondence with a specific gene (47). Genes may be associated with some early aspect of carcinogenesis (i.e., DNA repair and metabolism), a late feature (i.e., metastasis), a particular histological variant (i.e., lung adenocarcinoma), a variety of molecular features (i.e., somatic mutations, expression pattern, etc.), or an exposure causally linked to the cancer such as tobacco.
An example illustrates the last point. Twin studies consistently demonstrate strong heritability for smoking (48, 49). Population studies have tended to rely on statistical adjustments for tobacco use to separate effects of smoking, but given the strong dependency of lung cancers on tobacco, this strategy may not be completely successful. Certain genes may contribute to both smoking and lung cancer, i.e., CYP2A6 is implicated in both the metabolism of nicotine and of carcinogenic nitrosamines (50).
Aside from exposures, component or antecedent conditions may be related to particular genes. Chronic obstructive pulmonary disease and, specifically, emphysema may comprise an independent risk factor for lung cancer (51). Emphysema exhibits a strong familial tendency and at least one gene, α-1-antitrypsin, is established as causing a more severe form of this condition (52). Because many of the genes and mechanisms that have been proposed for lung cancer such as GSTM1 (53) and epoxide hydrolase (54) have also been suggested as risk factors for emphysema, observed associations with either condition may depend on the degree to which the conditions overlap in a particular study setting.
These points illustrate how genes may be associated with exposures or conditions related to the disease of interest, rather than to the disease itself. More generally, whereas early studies of genetic susceptibility in lung cancer focused on the class of genes that controlled metabolism of chemical carcinogens, newly identified genes and pathways involving key features of neoplasia such as inflammation, angiogenesis, DNA repair, drug transport, immune response, metastasis, apoptosis, and others are under active investigation. Studies will need to consider and examine a variety of “endophenotypes” potentially associated with cancer, and expect that specific genes may exhibit a variable and not necessarily consistent degree of association with lung cancer itself.
In the late 1980s, as the first RFLP assays began to replace probe drug phenotyping, there was the naïve expectation that variation in function of complex P450s would be explained by a set of simple mutations. This expectation was not unreasonable given that family studies established, for example, for NAT2 and CYP2D6, that metabolic phenotypes segregated in families according to Mendelian expectations (55). With additional study their complexity has become apparent. Both NAT2 and CYP2D6 (>70 mutations) exhibit a variety of common and rare mutations, which ablate activity, increase activity (i.e., gene amplifications), or alter substrate specificity (56, 57). Whereas reasonable assessments of gene activity can be constructed from smaller subsets of the known variants, earlier studies that assessed only one or a few mutations can be seen in retrospect to have had little chance to identify associations. This also suggests that the hope that a given haplotype or SNP can accurately represent gene action, at least for the more complex genes, may not always be fulfilled. Linkage disequilibrium-based search strategies that depend on random SNPs may misclassify and, therefore, miss such gene systems (58, 59).
In the case of MPO, the regulatory site that has been consistently studied may not be the only source of hereditary variation. It is likely that other polymorphisms (some rare) as well as environmental factors contribute to biologically meaningful regulation of MPO (60). It is of interest to note that among rare individuals that lack the gene, increased infections are observed only in the presence of secondary factors (i.e., diabetes). This suggests that parallel pathways may “buffer” activity and that presently unknown combinations of genes may mediate parallel effects. This leads directly to the last aspect of complexity to be considered.
There has been much discussion regarding whether the effects of multiple genes are additive or multiplicative. Postulating simple and straightforward mechanisms is necessary to model statistical expectations, but in actuality nature may be more complex. For example, as noted above, selection has ensured that important pathways controlling processes that are critical in neoplasia (cell cycle control, immune response, and DNA repair) are redundant. As an example, P450s have overlapping substrate specificities (61). Many xenobiotics can be metabolized by more than one enzyme, albeit with differing kinetic parameters. In addition to the complexity involving specific substrates, biological endpoints (i.e., cell transformation) can likely be reached in diverse ways. Pathways to phenotypes may be anastomotic, i.e., several pathways lead to the same phenotype with homeostatic and compensating feedback mechanisms between them (62). These considerations imply that biologically meaningful effects may become apparent only when there are sufficient “hits” accrued to a given pathway. The current complex pattern of somatic mutations that characterize specific cancers may come into better focus when pathways are assessed. As a practical application, this would suggest that it will be worthwhile to consider a panel of “Phase I genes” together. Alternatively, a panel of genes that influence a key mediator such as nicotine, nitrosamines, or benzo(a)pyrene might be considered. Although perhaps exceeding the current analytical capacity and the biochemical and genetic database, integrating projected exposures with kinetic parameters for each relevant genetic variant may provide a more biologically realistic projection of exposure.
Given these issues, we briefly consider the prospects that technology will resolve the question of gene-cancer associations.
What is the Role of Technology in Resolving These Questions?
Accelerating technological progress is characteristic of our era, and there is justifiable enthusiasm that new approaches will improve molecular epidemiology studies and refine the ability to conduct studies to detect associations. High throughput genotyping using ever-smaller quantities of specimens obtained less invasively, increasing availability of SNPs from the Human Genome effort, and improved bioinformatics tools to facilitate information processing are examples that are currently impacting these studies.
There are two different orientations to approaching the problem of complex disease. The traditional approach begins with specific candidate genes such as MPO for which some population and mechanistic evidence already exists. Increasingly, technology affords the opportunity to rapidly and efficiently test specific variants against many DNA samples.
The alternative approach, based on the availability of the database and informatics tools to use genetic markers deriving from Human Genome research, and the potential for pooling DNA (63) and high throughput genotyping, is to conduct linkage disequilibrium mapping to identify regions that harbor candidate susceptibility genes. This approach may be conducted using an ambitious entire genome search (64) or may be limited to particular regions thought to harbor candidates based on other evidence. There is controversy about the degree to which the extent of linkage disequilibrium (which varies in different population settings and also across the genome) affects this strategy (65, 66). Limiting markers to include a smaller number of representative haplotypes may enhance the ability to identify regions that may harbor new susceptibility genes (67). Given the vast number of available SNPs, and the attendant problem of false positives, it will be important to develop a strategy for SNP selection that takes into account inferred function, biological plausibility, conservation across species, haplotype setting, and prior population data.
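The false-positive concern, and the value of selecting SNPs on prior evidence, can be illustrated with Bayes' rule. The sketch below is a simplified illustration and not a method proposed in this article: it computes the share of nominally significant findings expected to be false, given an assumed significance threshold, power, and prior probability that a tested variant is truly associated. The priors, power, and number of SNPs are hypothetical values.

```python
def false_positive_share(alpha: float, power: float, prior: float) -> float:
    """Expected fraction of 'significant' associations that are false positives.

    alpha: significance threshold applied to each test
    power: probability of detecting a true association
    prior: assumed probability that a tested variant is truly associated
    """
    false_positives = alpha * (1.0 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that holds the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical priors: a randomly chosen SNP versus a candidate supported by
# function, plausibility, and earlier population data.
random_snp = false_positive_share(alpha=0.05, power=0.80, prior=0.001)
selected_snp = false_positive_share(alpha=0.05, power=0.80, prior=0.10)

print(f"random SNP, prior 0.001:  {random_snp:.2f} of positives are false")
print(f"selected SNP, prior 0.10: {selected_snp:.2f} of positives are false")
print(f"per-test threshold for 100,000 SNPs: {bonferroni_threshold(0.05, 100_000):.1e}")
```

Under these assumed values, nearly all nominally significant results for randomly chosen SNPs would be false, whereas selecting variants with stronger prior support lowers that share considerably at the same threshold and power.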
A variety of new technologies will be briefly mentioned that complement both of these approaches by providing new candidate genes for evaluation and introducing new evidence to support biological plausibility. However, it should be noted that whatever technical approaches are used to suggest new candidates or enhance the evidence for old ones, verification in an appropriately designed population-based study will remain a benchmark requirement for scientific acceptance and any serious consideration of public health impact.
Some Specific Technologies
Animal models and various model systems provide insight into the genetics of cancer (68). Although there are important intrinsic limitations (different life span, organ and site specificity, complex human exposures, and so forth), the available mouse models should increasingly be a rich source of candidates for human study (69). Expression studies in tumors can provide indications of genes that are under or overexpressed in tumors, and by identifying patterns, suggest fundamental disease redefinition (70). Gene silencing by methylation can alter expression, and so identifying genes that have undergone this epigenetic alteration in critical tissues can identify elements in the transformation pathway. Cytogenetic changes were historically noted as early evidence that the genetic material was a key target in cancer. Newer approaches such as interphase genetics (i.e., fluorescence in situ hybridization) have delineated key regions with characteristic alterations in specific cancers. There is great enthusiasm that efforts to isolate protein/peptide markers of cancer will identify “early markers” for cancer and also identify critical genes or pathways.
There is no question that each of these approaches and others have the potential to identify new candidates and to provide new types of mechanistic support. Yet we will ultimately be left with the same task we currently face with regard to MPO, that is, examining the population evidence to establish whether the specific variant of interest is associated with disease.
Simple quantitative considerations and our current mechanistic understanding of MPO suggest that the hypothesis of an association of the polymorphic variant with lung cancer remains unproven but still plausible. Specifically, the question of whether the existing positive studies represent a real but small effect cannot be resolved with the available data. The broad “complexity” considerations noted above would suggest future studies might consider whether: (a) based on its mechanism or substrate specificity MPO might have a relation to some other endophenotype, i.e., inflammation or altered immune function; (b) other MPO genetic variants may exist and should be analyzed; and (c) other genes known to influence the disposition of benzo(a)pyrene and oxidative species should be comprehensively assessed and the possibility of nonadditive/nonmultiplicative relationships among genes in the same pathway explored.
What Can Be Recommended?
The earlier cited recommendations that studies will have to be of optimum design, quality, and size are relevant (71). Population-based designs including both case-control (72) and cohort (73) designs will be key to the identification and verification of candidate susceptibility genes (74). The advantage of the diverse approaches derives from the different opportunities afforded by each. For example, a cohort study can address multiple disease endpoints potentially allowing a simultaneous evaluation of the multiple cancers postulated to be related to MPO as well as other conditions (i.e., Alzheimer’s and arteriosclerosis-related diseases). A case-control study might include a more in-depth questionnaire to better define exposure subgroups or biospecimens to address finer mechanistic questions.
Meta- and pooled analyses that allow a compilation and limited quantitative treatment of existing data, both published and unpublished, will be a useful adjunct. Meta- and pooled analyses have well-known limitations, and while such efforts should be supported, it is unlikely that pooled approaches alone will resolve these questions. They will provide insight and help with subgroup questions (75) and establish a baseline for new investigations.
Application of the technological methods listed, as well as newer approaches, will provide a steady influx of new candidate genes with attractive mechanistic credentials, but confirmation of candidate genes on the population level will always be required. Studies with borderline power reporting new findings will be met with a greater degree of skepticism, whereas journals will find web and other formats to record null studies that heretofore have been considered too low priority to publish (76). Given the recognized obligate role of environmental factors in most common cancers and lung cancer in particular, careful exposure assessment will be essential to evaluate how risks attributable to genes and environment interplay. Technology will prove to be an asset, particularly when it allows issues of cancer causation to be considered in a broad context.
Epidemiological studies that intend to evaluate genes must be of substantial size. Diverse studies of various (i.e., both case-control and cohort) but sound design will be required, as verification in different populations and exposure settings will be important. Because population differences in both exposure and genes will be considerations, international cooperation will be a priority. Networking studies and consortia will be critical to encourage interdisciplinary and international participation. Leaders must craft incentives into the granting structure to reward such cooperation. If any of the complexity considerations noted earlier apply, the only way small but potentially important effects can be revealed and then confirmed is through cooperation between existing and planned large-scale efforts. From the earliest molecular epidemiology studies, the advantage of close cooperation between population scientists and laboratory investigators was recognized. The large studies and cooperative efforts envisioned will require an infrastructure and level of support capable of generating a rich population data and biospecimen resource to richly reward the new generation of interdisciplinary scientists ready to address these challenges.
The abbreviation used is: MPO, myeloperoxidase.