Abstract
This work introduces an approach for identifying potential biomarkers from the analysis of microarray experiments. The approach requires no parameter adjustment, which favors consistent and reproducible results across different analysts. The key idea is to represent the biomarker identification task as a multicriteria mathematical optimization problem whose optimal solution is a set of genes that significantly change their expression level across different experiments. To enable this representation, a series of performance measures is obtained for each gene under analysis. We propose that these performance measures be p values obtained through nonparametric statistical comparison. A gene with minimal p values across all instances can be considered a potentially dominant gene. Because the analysis is data-based, however, considering multiple p values at a time will lead to conflicting behavior among them. This conflict introduces considerable subjectivity into the selection of important genes, since different researchers will use different techniques to deal with it. Our hypothesis is that an optimization-based approach can resolve this conflict with consistent and objective results.
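One common way to write such a multicriteria problem, offered here only as a sketch because the abstract does not state the exact formulation, is in terms of Pareto dominance. The symbols $G$ (the set of genes), $k$ (the number of statistical comparisons), and $p_j(g)$ (the p value of gene $g$ in comparison $j$) are notation assumed for illustration.

\[
\min_{g \in G} \; \bigl(p_1(g),\, p_2(g),\, \dots,\, p_k(g)\bigr)
\]

\[
g \prec h \iff p_j(g) \le p_j(h) \ \text{for all } j \in \{1, \dots, k\} \ \text{and} \ p_j(g) < p_j(h) \ \text{for at least one } j
\]

Under this reading, the genes that no other gene dominates form the set of best compromises among the conflicting p values, and no weights or thresholds have to be chosen to find it.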
Preliminary results from applying the proposed approach to the publicly available cervical cancer data of Wong et al. (2003) include the following prioritized lists of accession numbers for potential biomarkers of cervical cancer. The list with the first priority includes {AA488645, H22826}. The list with the second priority includes {AI553969, AA243749, T71316, AA460827}. The list with the third priority is {AA454831, AA913408}. The list with the fourth priority is {AA487237, AA446565}, and the list with the fifth priority includes {H23187}. A literature search on these genes indicates that some of them are highly likely to be biomarkers; at least H23187 is a confirmed biomarker for gastrointestinal stromal tumors, as reported by Parkkila et al. (2010).
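The abstract does not say how the priority levels are derived. One plausible reading, sketched below under that assumption, is to peel off successive non-dominated sets: the first list holds the genes that no other gene beats on every p value, the second list holds the genes that become non-dominated once the first list is removed, and so on. The gene names and p values in the example are made up for illustration and are not taken from the cervical cancer data; `dominates` and `priority_tiers` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p is no larger than q on every criterion
    and strictly smaller on at least one (smaller p value is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def priority_tiers(p_values: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Split genes into successive non-dominated sets.

    Tier 1 is the set of genes that no other gene dominates; each later
    tier is the non-dominated set of the genes that remain.
    """
    remaining = dict(p_values)
    tiers: List[List[str]] = []
    while remaining:
        front = sorted(
            gene
            for gene, p in remaining.items()
            if not any(dominates(q, p) for other, q in remaining.items() if other != gene)
        )
        tiers.append(front)
        for gene in front:
            del remaining[gene]
    return tiers


# Hypothetical genes with made-up p values from two comparisons.
example = {
    "gene_a": (0.001, 0.020),
    "gene_b": (0.010, 0.002),
    "gene_c": (0.005, 0.030),
    "gene_d": (0.020, 0.040),
    "gene_e": (0.300, 0.001),
}

tiers = priority_tiers(example)
for level, genes in enumerate(tiers, start=1):
    print(f"priority {level}: {genes}")
```

Note that the sketch takes no tuning parameters: the tiers depend only on the p values themselves, which is in the spirit of the parameter-free claim, although the authors' actual procedure may differ.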
The initial results are encouraging in terms of the high discrimination rate shown by the approach, the absence of parameters to adjust when identifying potential biomarkers, and the hierarchical structure in which the final information can be presented. Our research group is currently considering a validation experiment on the list of genes presented here. The proposed method can be used to perform secondary analysis of existing microarray databases with greater objectivity.
References:
1. Y. F. Wong, et al., “Expression Genomics of Cervical Cancer: Molecular Classification and Prediction of Radiotherapy Response by DNA Microarray,” Clinical Cancer Research, vol. 9, pp. 5486-5492, July 2003.
2. S. Parkkila, et al., “Carbonic anhydrase II. A novel biomarker for gastrointestinal stromal tumors,” Modern Pathology, vol. 23, pp. 743-750, 2010.
Citation Information: Cancer Prev Res 2010;3(12 Suppl):B45.