The tension between hypothesis-driven and exploratory research crosses scientific disciplines (1) but is particularly well illustrated in the current excitement about genome-wide association (GWA) studies. Standard linkage analysis has the potential to localize major susceptibility genes to within a few million base pairs using as few as 300 microsatellite markers for a genome-wide scan. However, it has increasingly been recognized that linkage analysis may not be powerful enough to detect genes involved in “complex diseases” like cancer, which are caused by multiple genes and multiple environmental factors, interacting in complicated ways. Thus, molecular epidemiologists are turning to candidate-gene association studies or studies of entire candidate pathways, driven by specific biological hypotheses, as an alternative approach. Now comes the prospect, first seriously proposed a decade ago by Risch and Merikangas (2), of testing virtually all ∼10 million common single nucleotide polymorphisms (SNP) in the human genome for associations with a given disease, either directly or by linkage disequilibrium with other SNPs. Recent developments in ultra-high-volume genotyping chip technology now make the study of as many as 500,000 SNPs commercially viable. Coupled with the extensive haplotype tagging SNP information being catalogued by the HapMap project (3), it now seems that this density may be sufficient to permit indirect tests of association with the majority of all common SNPs (4), although this fundamental assumption has recently been questioned (5). Does this development mark the end of pathway-driven research? I suggest that it is possible to marry the hypothesis-driven and exploratory approaches in a way that will make better use of this novel but expensive technology.
The first GWA study was published in 2002 (6), using an early 100 K version of the technology. Within the last year, several others have been published (7-10), and many other studies have been launched or proposed. Time will tell whether these early reports represent true positives, but the simultaneous publication in Science of two confirmatory studies (11, 12) for the Complement Factor H association with age-related macular degeneration along with the GWA scan has sparked great enthusiasm, and several reports have subsequently appeared confirming the association. However, this finding remains to be confirmed in population-based studies, which may be less vulnerable to selection bias and less weighted toward advanced cases.
Is the time really ripe for wholesale adoption of the GWA approach? The cost of the genotyping technology is bound to keep falling over the next few years, and many study design and analysis issues remain to be resolved. Yet, investigators may feel that if they do not hop on this bandwagon before everyone else does, they will be left out. But there simply are not enough resources to fund all eager investigators, even all those with already established and well-characterized cohorts or case-control samples for which the major expense would be the genotyping costs.
So how many GWA studies can the scientific community afford and how should they be prioritized relative to hypothesis-driven studies? By any standard, these will be expensive. Supposing a typical study might require at least 1,000 cases and 1,000 controls and at current genotyping costs of approximately US$1,000 per sample for a 500 K chip, the genotyping alone will cost more than US$2 M per study. DNA pooling offers the potential to dramatically reduce the genotyping cost, but substantial technical difficulties remain, and the power and false-positive rates of the approach relative to individual genotyping remain uncertain (13-17).
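The cost figure above is simple arithmetic, sketched here so that other sample sizes or per-sample prices can be substituted. The function name `genotyping_cost` is illustrative only; the inputs are the round numbers quoted in the text.

```python
def genotyping_cost(n_cases, n_controls, cost_per_sample):
    """Total genotyping cost (US$) when every subject is genotyped individually."""
    return (n_cases + n_controls) * cost_per_sample


# 1,000 cases and 1,000 controls at roughly US$1,000 per sample (500 K chip)
total = genotyping_cost(n_cases=1000, n_controls=1000, cost_per_sample=1000)
print(f"US${total:,}")  # US$2,000,000
```

Because the text gives 1,000 cases and 1,000 controls as a minimum, this figure is a floor rather than an estimate.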
Are multiple GWA studies really needed? Replication of entire scans is not a good use of limited resources, except perhaps for protection against false negatives; thus, there is little need for multiple studies of the same condition, but there will be many investigators well poised to propose such studies for any given disease. Competition for the best proposal(s) is a sensible approach, but rather than using standing disease-oriented study sections, in which GWA proposals would have to compete against lower-cost, hypothesis-driven proposals for the same disease, GWA proposals for a broad range of diseases should be evaluated against each other. At this early stage of the methodology, this would help prioritize the available funds among the diseases for which GWA studies are likely to be the most informative and would also help refine the methods by ensuring some uniformity of standards against which they would be judged.
It is encouraging that several recent NIH initiatives have taken steps in this direction, in addition to European initiatives, such as the Wellcome Trust Case-Control Consortium.
Both the GAIN and GEI initiatives plan to select studies by a rigorous peer review process (separately for each initiative). Applications for the GAIN initiative will be due in late April 2006, and genotyping may be under way as early as late summer 2006, a remarkably short interval compared with traditional grant cycles. In preparation for these various initiatives, Institutes have surveyed existing studies that may qualify and have considered their own priorities. But investigators who have not been privy to the planning process may find it difficult to meet the short deadline planned for at least the GAIN initiative, and it will be a challenge to organize an effective peer review process on such a short timeline with the uncertainties of several overlapping initiatives. Meanwhile, the funds available and priority given to various proposals that are currently under review through the traditional R01 mechanism could be affected by these new initiatives.
It is premature to issue rigid guidelines for the conduct of GWA studies, given the infancy of the field and the rapid evolution of the technology, design, and analysis methods, although potential applicants would benefit from some guidelines for review of proposals. In this spirit, the CIDR Access Committee has developed a set of criteria for investigators planning applications for their SNP genotyping service. The draft recommendations arose from a roundtable discussion at the October 2005 annual meeting of the International Genetic Epidemiology Society, chaired by Dan Schaid, with input from Leonid Kruglyak and David Clayton, among others. The draft criteria include such considerations as the potential societal benefit, the evidence in support of a genetic basis, plans for replication, data sharing, and localization of suggested associations, as well as a host of methodologic issues discussed below.
Several recent reviews (18-23) have discussed a range of methodologic challenges in the design and analysis of GWA studies in detail. Some of the issues that require careful consideration include DNA pooling versus individual genotyping; choice of genotyping platform and selection of panel of SNPs; use of multistage designs; control of, and allowance for, genotyping errors; population-based versus family-based designs and adjustment for population stratification; and multiple comparisons and criteria for claiming statistical significance.
Sample Size and Power
To get a sense of the magnitude of the task, it is helpful to consider some rough sample size requirements. Suppose one planned to test 500,000 single-SNP associations in a single-stage case-control study and wished to control the genome-wide type I error rate at α = 5% (i.e., an expected number of false positives across the entire genome of only 0.05), so that any statistically significant associations would be very likely to be true positives. A conservative Bonferroni correction would require a single-SNP significance level of α = 0.05/500,000 = 1 × 10⁻⁷. Table 1 indicates the numbers of case-control pairs that would be required to attain 95% power for a range of genetic relative risks and population allele frequencies. These numbers might be reduced by about a factor of two by using a multistage design, in which only the first sample would be tested on the complete panel, with subsequent samples tested on only a subset of the most significant markers (24-26). On the other hand, testing multiple genetic models, additional SNPs or haplotypes, subgroups, or interactions would require an even stricter significance level and larger sample sizes. Thus, these sample size requirements should be taken as only rough guidelines. Methods for significance testing in GWA studies are the subject of an important research area (27-35).
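As a rough illustration of how such requirements scale, the following sketch computes the Bonferroni threshold and an approximate number of case-control pairs. It is not the method behind Table 1. It assumes a multiplicative (per-allele) relative risk, Hardy-Weinberg equilibrium, a rare disease (so that controls carry the population allele frequency), and a two-sided normal-approximation test comparing allele frequencies between cases and controls. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def bonferroni_alpha(genome_wide_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return genome_wide_alpha / n_tests


def pairs_needed(allele_freq, relative_risk, alpha, power):
    """Approximate number of case-control pairs for an allele-frequency test.

    Assumptions (not taken from the article): multiplicative per-allele risk,
    Hardy-Weinberg equilibrium, rare disease, two-sided normal approximation.
    """
    p = allele_freq                                       # controls
    q = relative_risk * p / (relative_risk * p + 1 - p)   # cases
    p_bar = (p + q) / 2
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p * (1 - p) + q * (1 - q))) ** 2
    alleles_per_group = numerator / (q - p) ** 2
    return ceil(alleles_per_group / 2)  # two alleles per subject


alpha = bonferroni_alpha(0.05, 500_000)  # about 1e-7
for rr in (1.2, 1.5, 2.0):
    print(rr, pairs_needed(allele_freq=0.2, relative_risk=rr,
                           alpha=alpha, power=0.95))
```

The allele frequency of 0.2 and the relative risks in the loop are arbitrary inputs. The point of the sketch is the steep growth in sample size as the relative risk approaches 1.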
It is not clear that GWA studies should be held to the same standards of replicability as candidate gene studies. The huge cost of GWA studies, and thus the likelihood that few of them will ever be undertaken for any given disease, demands that they should be designed to have very high power, even if at the cost of relatively high false positive rates. If we are going to the bother of looking for the proverbial needle in a haystack, we want to be assured that we have a good chance of finding it, if it is really there, because we are not going to have the energy, or money, to do it all over again! Among such a large number of tests, there is no guarantee that the expected modest number of true positives will rank near the top of the list (36). For example, in the Ozaki et al. (6) GWA scan of 65,671 SNPs, the one functional association was less significant than 200 spurious associations that failed to replicate. To find a true association of the observed size (relative risk = 1.6) with at least 50% power, one would have to follow up >3,400 of the most significant associations! On the other hand, because the yield of positive associations could also be very large, the cost and manpower needed to follow up on each will be considerable. Should few of these associations be replicated, the societal investment in this high-risk experiment could quickly turn sour.
Multistage sampling designs can be thought of as a form of built-in replication, but if the same epidemiologic study design and population is used in the different stages, this is really just a form of statistical replication (albeit a more efficient one than conducting two or more separate studies; ref. 37). True scientific replication involves different investigators, studying different populations, using different study designs, with potentially different strengths and weaknesses. In a recent editorial, Nature Genetics has made this a formal requirement for publication: “Because meta-analysis has shown that many published associations could not be replicated, we now stipulate that the association should be observed in two independent cohorts” (38).
An example of such an article is the recent publication of four variants associated with myocardial infarction, found to be significant in two separate case-control samples of pooled DNA, followed by replication in a third independent case-control comparison using individual genotyping (9). Replication has also been addressed in an editorial in this journal (39), albeit in the context of candidate gene rather than GWA studies, and sparked an extensive series of follow-up commentaries (40-45). Without belaboring these points, suffice it to say that true scientific replication will be essential to make sense of the mass of associations likely to result from GWA studies, but we are hesitant to follow the lead of Nature Genetics by making rigid requirements for replication within a single report, particularly if attaining that goal were to come at the expense of multiple under-powered studies (45).
Perhaps the best way of ensuring scientific replication is to embrace a culture of data sharing, to facilitate investigators' pursuit of alternative analyses of each other's data: generating hypotheses they can attempt to replicate in their own data or replicating their own findings in independent data. The recently announced GAIN initiative is to be commended for embracing these principles: all genotype data will be made public as soon as they are generated and checked for quality, similar to the rules for the Human Genome and HapMap projects. The phenotype data will be made available as soon as the genotype data are, through an NIH access committee. The contributing investigators will have a 9-month window for exclusive right to publish but will receive the data at the same time as anyone else.
The sheer scale of GWA data sets, combined with the usual need to protect confidentiality, will pose formidable practical challenges to sharing data in a manner that will be genuinely useful. Although the GAIN plan is to post raw genomic data on individual subjects, standards could be developed for highly detailed aggregate data tables for studies where the informed consent would preclude public posting of individual data. As a minimum, these might include summary statistics for all single-SNP associations and details of the methods used to obtain the statistical results, but more thought is needed about whether this should include haplotype associations, gene-gene or gene-environment associations, or various subgroup analyses. Rather than routinely posting such an enormous number of associations, one might consider making the data available through an online “automatic hypothesis-testing machine” with a limited menu of user-specified criteria that could perform the first cut at a replication analysis. Promising associations could then be followed up with more detailed collaborative analyses by the investigators of both the discovery and replication studies before journal publication.
Testing of interaction effects poses particular challenges in the context of GWA studies (46), first because of the enormous number of potential interactions (2.5 × 10¹¹ pairwise SNP × SNP interactions in a typical whole genome scan, and similarly huge numbers for even a univariate scan of SNP associations with expression of all known genes; ref. 47), and second because of the general lack of specific prior hypotheses for any of them, beyond a vague belief that interactions are likely to be important. There is even more doubt about the reliability of interaction reports than for main effects, if only because of the larger multiple-comparison problem, the smaller sample sizes, and the low priors for any particular interaction, not to mention the poor track record of replication (44, 48-51).
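To make the scale of the interaction problem concrete, the short sketch below counts pairwise combinations for a 500,000-SNP panel. The panel size is the one used elsewhere in this commentary. That the quoted pairwise figure equals the square of the panel size is an observation about the arithmetic, not a statement of how it was derived; the number of distinct unordered pairs is about half as large, which does not change the order of magnitude.

```python
from math import comb

n_snps = 500_000

ordered_pairs = n_snps ** 2          # 250,000,000,000 = 2.5e11
unordered_pairs = comb(n_snps, 2)    # 124,999,750,000, about 1.25e11

# Bonferroni threshold if every distinct pair were tested once at a
# genome-wide type I error rate of 5%
alpha_per_pair = 0.05 / unordered_pairs

print(f"{ordered_pairs:.2e} ordered pairs")
print(f"{unordered_pairs:.2e} unordered pairs")
print(f"per-pair significance level: {alpha_per_pair:.1e}")
```

Either count implies a per-test significance level several orders of magnitude stricter than the single-SNP threshold, which is why the sample sizes needed for exhaustive interaction testing are so much larger.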
So how should such a vague belief about interactions be accommodated? In studying lung cancer, one might strongly suspect that smoking could be an important modifier (52), but would it make more sense to sample smokers or nonsmokers? Perhaps simple random sampling or oversampling the extremes of the distribution of relevant environmental factor(s) might be the best gamble, in the absence of more specific hypotheses, with the hope of testing interactions in the analysis. But then how much of the type I error rate is one willing to spend on testing interactions at the expense of power for detecting main effects? Is one perhaps better off limiting the testing of interactions to pairs of factors attaining some threshold for main effects?
Marchini et al. (53) have shown that exhaustive testing of all possible pairwise interactions can be more powerful than testing only the univariately significant ones, depending of course upon the true interaction model. For candidate gene pathways, Millstein et al. (54) have proposed a “Focused Interaction Testing Framework” that seems to outperform a leading alternative approach, Multifactor Dimension Reduction (55), for searching for multidimensional interactions without requiring main effects, but it remains to be seen whether these approaches will be useful for GWA studies (56).
In any event, identification of genes that mediate or modify the effects of environmental hazards, as well as gene-gene interactions, is an important priority of both the scientific community and government agencies, as reflected in the recent National Heart, Lung, and Blood Institute and GEI announcements. Indeed, the GEI initiative involves a substantial commitment to developing new environmental measurements and incorporating them into GWA studies. Ironically, however, the GEI initiative coincided with the announcement of the cancellation of the National Children's Study (57), which could have set the groundwork for prospective evaluation of gene-environment interactions at least for common early-onset conditions (58), although an adult cohort would likely be more useful for studying most cancers, while also allowing for enrollment of their offspring and subsequent generations (59).
A Unified Approach
Returning to the question posed at the outset about pathway-driven versus exploratory approaches, some attractive approaches to marrying the two are the “weighted false discovery rate” approach (60) and a Bayesian false-discovery rate approach (42). Essentially, the weighted false-discovery rate spends the false discovery rate nonuniformly across all associations tested by using prior knowledge (in their example, a prior linkage trace). Although Roeder et al. warn against trying many different weighting functions and choosing the one that yields the greatest number of statistically significant findings (or most satisfying set of them), hierarchical regression models (61, 62) offer a valid way of incorporating a broad range of prior genomic information (location relative to genes, putative function, evolutionary conservation, biological pathways, previous linkage or association findings, or “-omics” databases; refs. 39, 63) into a general framework for prioritizing associations from an initial GWA scan for follow-up in later stages of a multistage design or for replication in other studies. Current skepticism about the utility of at least presently available genomic information (43) may change as these bioinformatics tools mature.
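The idea of spending the false discovery rate nonuniformly can be sketched with one common formulation of a weighted Benjamini-Hochberg procedure: each P value is divided by a prior weight (weights rescaled to average 1), and the usual step-up rule is applied to the result. This is an illustrative sketch and not necessarily the exact procedure of ref. 60. The function name, the example P values, and the weights are all hypothetical.

```python
def weighted_bh(p_values, weights, q=0.05):
    """Indices rejected by a weighted Benjamini-Hochberg step-up rule.

    Weights must be positive; they are rescaled to average 1 so that
    up-weighting some tests is paid for by down-weighting others.
    """
    m = len(p_values)
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]
    adjusted = [min(1.0, p / w) for p, w in zip(p_values, scaled)]

    # Step-up rule: find the largest rank k with adjusted P <= q * k / m
    order = sorted(range(m), key=lambda i: adjusted[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])


p = [0.001, 0.02, 0.04, 0.30]

# Equal weights reduce to the ordinary Benjamini-Hochberg procedure
print(weighted_bh(p, [1, 1, 1, 1]))          # [0, 1]

# Up-weighting the third SNP (e.g., because it lies under a linkage peak)
# lets it be declared, at the expense of the down-weighted second SNP
print(weighted_bh(p, [0.5, 0.5, 2.5, 0.5]))  # [0, 2]
```

The second call illustrates the trade-off behind the warning against shopping among weighting functions: the weights determine which associations are declared, so they need to be fixed from prior information before the data are examined.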
Clearly, the next few years should be an exciting time as GWA studies get under way, and the methods for conducting them become better developed. The interpretation of the mass of data that will result can be expected to keep investigators and pundits entertained long into the future. The stakes are high. If the diseases and specific studies are chosen wisely, the methods effective, and the information shared widely, the payoff could be considerable. But careful preparation is essential, lest high-profile failures diminish the public's and the scientific community's support of such research in the future.