Abstract
B55
Objectives: Molecular high-throughput data offer unprecedented opportunities for discovery, including new diagnostics and personalized treatments in a wide range of cancers. Given the novel nature of such data, standard analysis principles either do not apply or are not sufficient. At the same time, recent high-throughput analysis guidelines are not standardized and have not been independently validated, and there is no consensus regarding best practices. The present study illustrates these problems in two recent cancer studies by identifying subtle yet critical errors that jeopardize the studies' findings and conclusions. Methods to avoid such errors are proposed.
Methods: We examine the methodological validity of the highly cited studies [1] and [2] and identify subtle errors in them. We test the impact of these errors on the study conclusions by re-analyzing the original and simulated data using protocols from which the identified errors are systematically removed.
Results: Changes in the error metric, in the methods employed for its estimation and statistical testing, and in the classifier allow previously undetected predictive signals to be identified in 6 out of 7 datasets of [1]. This refutes the original study's conclusions that microarray data may not predict cancer outcomes and that studies require thousands of patients for the purpose of outcome prediction. Changes in the statistical tests for SNP selection and signature error estimation reveal that none of the SNPs identified by [2] are, in fact, statistically significant at the chosen level and that the original study's classifier does not perform better than chance.
Conclusions: Sound analysis of high-throughput data is critical to fulfilling the promise of "omics" cancer research. The field is in dire need of validated protocols and standardized best practices to protect researchers from critical errors and to allow them to use their data effectively and their resources efficiently.
References:
1. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005;365(9458):488-92.
2. Hu N, Wang C, Hu Y, Yang HH, Giffen C, Tang ZZ, et al. Genome-wide association study in esophageal cancer using GeneChip mapping 10K array. Cancer Res 2005 Apr 1;65(7):2542-6.
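Illustrative note: the Results state that a classifier "does not perform better than chance", but the abstract does not say which statistical test supports that comparison. One common way to make such a comparison is a label permutation test on cross-validated accuracy. The sketch below is a minimal illustration under that assumption, not the authors' protocol. The nearest-centroid classifier, the function names `cv_accuracy` and `permutation_p_value`, and the simulated data are all hypothetical choices made here.

```python
import numpy as np


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Cross-validated accuracy of a nearest-centroid classifier.

    Assumes binary labels coded 0/1 and that both classes appear in
    every training fold (true for reasonably balanced data).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Centroids use training samples only, so the held-out fold
        # never influences the model it is evaluated on.
        c0 = X_tr[y_tr == 0].mean(axis=0)
        c1 = X_tr[y_tr == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test_idx] - c0, axis=1)
        d1 = np.linalg.norm(X[test_idx] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Observed CV accuracy and its permutation p-value.

    The null distribution is built by shuffling the labels, which
    breaks any real link between features and outcome, and repeating
    the whole cross-validation on each shuffled copy.
    """
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y, seed=seed)
    null = np.array(
        [cv_accuracy(X, rng.permutation(y), seed=seed) for _ in range(n_perm)]
    )
    # The +1 terms count the observed labelling as one permutation,
    # so the p-value can never be exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    # Simulated data (hypothetical): 40 samples, 50 features,
    # of which the first 5 carry a class-dependent shift.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 50))
    X[y == 1, :5] += 2.0
    acc, p = permutation_p_value(X, y)
    print(f"CV accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```

The key design point is that every step that looks at the labels is repeated inside each permutation. A classifier would be judged no better than chance when its observed accuracy falls well inside the permutation null distribution.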
Second AACR International Conference on Molecular Diagnostics in Cancer Therapeutic Development; Sep 17-20, 2007; Atlanta, GA