Recently, CEBP published a new editorial policy for association studies of genetic variants and disease (1). The policy gives high priority to studies that meet several criteria. The studies must (a) evaluate a variant in light of its possible interaction with other genes in a pathway and with exogenous exposures; (b) consider the biological plausibility of the reported association(s) and the strength of prior evidence; (c) provide good statistical power for the reported association(s) or lack thereof; and (d) evaluate statistical significance in light of the multiple tests conducted.
These criteria have the laudable goal of promoting clarity and consistency. However, they may be difficult to implement for two reasons. First, they require balancing the need to evaluate many variants at a reasonable overall type I error rate against the loss of power that a stringent per-variant error rate entails. Second, they are qualitative, so their interpretation can vary across authors, reviewers, and editors.
One way to resolve both of these difficulties is to introduce a quantitative credibility scale that automatically integrates the criteria. The Bayesian approach of Wacholder et al. (2) addresses the first three criteria, but does not deal with test multiplicity. The frequentist false discovery rate of Benjamini and Hochberg (3) provides a solution to the test multiplicity problem, but does not consider the first three criteria. By combining the Bayesian proposal of Wacholder et al. with the frequentist proposal of Benjamini and Hochberg, we can rank associations on a credibility scale that integrates all four criteria.
The ranking can be described in terms of the prior probability π that an association is true, based on its biological plausibility and on the findings of previous studies. The prior odds in favor of the association are then Oprior = π / (1 − π). Let us say that the study under review has power q to detect the association (if real), and that the study data support it with a P value equal to p. As shown by Wacholder et al., the posterior odds in favor of the association, given the study data, are Opost = Oprior × (q / p). Notice that the posterior odds increase with increasing prior odds and study power q, and with decreasing P value p.
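As a minimal numerical sketch of this calculation (the function and variable names below are illustrative, not from the cited references), consider a variant with prior probability π = 0.01 that is tested with 80% power and yields an observed P value of 0.001:

```python
# Sketch of the posterior-odds calculation described by Wacholder et al.
# Variable names (pi_prior, power_q, p_value) are illustrative.

def posterior_odds(pi_prior: float, power_q: float, p_value: float) -> float:
    """Posterior odds that the association is real: Opost = Oprior * (q / p)."""
    o_prior = pi_prior / (1.0 - pi_prior)  # prior odds pi / (1 - pi)
    return o_prior * (power_q / p_value)   # posterior odds Oprior * (q / p)

# A plausible variant (pi = 0.01), 80% power, observed P value of 0.001.
o_post = posterior_odds(pi_prior=0.01, power_q=0.8, p_value=0.001)
print(f"posterior odds ~ {o_post:.1f}")                        # about 8.1
print(f"posterior probability ~ {o_post / (1 + o_post):.2f}")  # about 0.89
```

The same P value can yield very different posterior odds under different priors: with π = 0.25 instead of 0.01, the posterior odds in this example rise from about 8 to about 267.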
The notion of posterior odds integrates into a single measure the concepts of biological plausibility, study power, and P value. Wacholder et al. (2) suggest that associations can be evaluated more meaningfully by considering their posterior odds in addition to their P values. However, their proposal pertains to single associations in isolation, and so does not consider multiple findings for several polymorphisms in several genes. To address this need, I suggest using the posterior odds in a Bayesian multiple testing procedure akin to the frequentist procedure proposed by Benjamini and Hochberg (3).
To illustrate this idea, suppose that a study evaluates m tests of association. Each of these associations can be classified jointly as real or spurious, and as “significant” or “nonsignificant” according to some rule based on the study data. For example, a common rule is the one that classifies as significant those associations having P values less than 0.05. The frequentist false discovery rate associated with the rule is the expected fraction of significant findings that are spurious. Benjamini and Hochberg (3) proposed a simple rule (the BH rule) for declaring a subset of the m associations significant, based on a prespecified value r between 0 and 1. Remarkably, the r-based BH rule is guaranteed to have a false discovery rate below 100r%, regardless of how many, and which, of the m associations are spurious. The BH rule can be considerably more powerful than the Bonferroni correction (3).
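A short sketch may make the BH rule concrete (the function name and example P values below are my own illustration of the standard step-up procedure, not code from reference 3):

```python
# Sketch of the Benjamini-Hochberg step-up rule at target FDR level r.
# Function name and example P values are illustrative.

def bh_significant(p_values: list[float], r: float) -> list[int]:
    """Return the indices of the associations declared significant."""
    m = len(p_values)
    # Sort the P values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest k (1-based) with p_(k) <= (k / m) * r.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * r / m:
            k_max = rank
    # Declare significant everything up to and including p_(k_max).
    return order[:k_max]

# Five tests at a target false discovery rate of 10%.
pvals = [0.001, 0.008, 0.040, 0.300, 0.900]
print(bh_significant(pvals, r=0.10))  # -> [0, 1, 2]
```

With these numbers, the three smallest P values are declared significant, whereas a Bonferroni correction at the same overall level (0.10 / 5 = 0.02 per test) would keep only the first two, illustrating the power advantage noted above.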
The BH rule is based only on P values, and thus considers neither the biological plausibility, nor the strength of prior evidence, nor the statistical power of the various associations tested. In research currently under way, I am investigating an r-based Bayesian rule that ranks the m associations with respect to the posterior odds that they are true. Like the frequentist BH rule, this Bayesian rule ensures that, on average, no more than 100r% of the significant associations are spurious. As a by-product, it also produces a meaningful ranking of the associations with respect to the posterior odds in their favor. Its implementation involves (a) enumerating the associations tested; (b) combining prior odds, power, and P value to calculate the posterior odds in favor of each association; (c) ranking the associations with respect to their posterior odds; and (d) using an r-based Bayesian false discovery rate rule to select the noteworthy ones, as sketched below.
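Because this rule is still under development, the sketch below is not the author's method; it is a hypothetical illustration of one standard way a Bayesian false discovery rate rule of this kind can be built, keeping the largest top-ranked set whose expected fraction of spurious associations stays below r:

```python
# Hypothetical sketch of steps (a)-(d). The author's rule is unpublished;
# this shows one standard Bayesian FDR construction, not his exact method.

def bayesian_fdr_select(post_odds: list[float], r: float) -> list[int]:
    """Keep the largest top-ranked set whose expected fraction of
    spurious associations stays below r."""
    # (c) Rank the associations from most to least credible.
    order = sorted(range(len(post_odds)),
                   key=lambda i: post_odds[i], reverse=True)
    selected: list[int] = []
    expected_spurious = 0.0
    for i in order:
        prob_true = post_odds[i] / (1.0 + post_odds[i])
        expected_spurious += 1.0 - prob_true  # expected spurious count so far
        # (d) Stop once the expected spurious fraction would exceed r.
        if expected_spurious / (len(selected) + 1) > r:
            break
        selected.append(i)
    return selected

# (a)-(b) One posterior odds value per association, computed as in the
# first sketch above.
odds = [8.1, 25.0, 0.4, 3.0, 0.05]
print(bayesian_fdr_select(odds, r=0.10))  # -> [1, 0]
```

Under a construction of this kind, the posterior odds do double duty: they order the associations by credibility and they determine how many can be declared noteworthy at the chosen level r.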
Is it time to implement the goals outlined by Rebbeck et al. (1) in such a quantitative way? Should we rank reported associations according to some formal measure of credibility, such as their posterior odds? This measure would help readers evaluate, systematically and consistently, the importance of observed associations or absences thereof, judged not only by their P values but also by their plausibility, the prior evidence supporting them, and the study power.
No quantitative measure (including the one proposed here) can escape some limitations [see Thomas and Clayton (4) for a thoughtful discussion of these], nor can it eliminate the need for subjective judgment calls. Nevertheless, a formal ranking would make these calls and their rationale more explicit.