Gene expression arrays are widely used to analyze tumor samples in order to establish tumor identity or to find features of the tumors that may influence disease prognosis. Although arrays have revealed the great diversity of individual tumors, most research efforts have focused on classifying tumors according to established classification systems. In this study we directly tested whether the pathological classification of tumor samples was necessary or sufficient to group tumors into subgroups based on biological similarity. Microarray data from tumor samples were obtained from public databases. Individual data files were rigorously screened to remove poor-quality samples. The samples were initially grouped according to broad pathological divisions, and training and test sets were created. By working back and forth between clustering samples and selecting genes, an optimal grouping of samples was achieved in the training data, along with the list of genes that defined each group. The final classification was then tested on the independent test sets to validate the tumor groups. We found that the glioblastoma tumors clearly subclassified into distinct groups based on tumor grade, whereas most other tumor classes did not. Interestingly, ovarian tumors, endometrial tumors, and a few breast tumors classified as a single tumor group that might be further subgrouped, but not by site of origin. For sarcomas, the observed subclasses only partially recapitulated pathologically defined tumor types. Only for adenocarcinomas did the major subclassification depend on the tumor site of origin. A preliminary classification scheme is presented in which tumors are classified on a primary, a secondary, and sometimes a tertiary level.
One implication of this analysis is that many tumor classes that have traditionally been separated for clinical trials may be similar enough to warrant combined trials, whereas other groups of tumors may be inappropriately combined in current trials.
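
The alternation between clustering samples and selecting genes described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the gene-selection criterion, or the validation rule, so everything here is an assumption made for illustration: plain k-means for clustering, a between-group to within-group variance ratio for ranking genes, and nearest-centroid assignment for test samples. All function names (`cluster`, `select_genes`, `refine`, `classify`) are hypothetical.

```python
import random
from statistics import mean, pvariance


def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b, genes):
    """Squared Euclidean distance restricted to the given gene indices."""
    return sum((a[g] - b[g]) ** 2 for g in genes)


def cluster(samples, genes, k, n_iter=50, seed=0):
    """Assumed clustering step: plain k-means on the selected genes."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # k distinct samples as initial centers
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(s, centers[c], genes))
            for s in samples
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels


def score_gene(samples, labels, g):
    """Assumed criterion: variance of group means over mean within-group variance."""
    groups = {}
    for s, lab in zip(samples, labels):
        groups.setdefault(lab, []).append(s[g])
    if len(groups) < 2:
        return 0.0
    between = pvariance([mean(v) for v in groups.values()])
    within = mean([pvariance(v) for v in groups.values()])
    return between / (within + 1e-9)


def select_genes(samples, labels, n_genes):
    """Keep the n_genes genes that best separate the current groups."""
    ranked = sorted(
        range(len(samples[0])),
        key=lambda g: score_gene(samples, labels, g),
        reverse=True,
    )
    return sorted(ranked[:n_genes])


def refine(samples, k, n_genes, max_rounds=10):
    """Alternate clustering and gene selection until the grouping stops changing."""
    genes = list(range(len(samples[0])))
    labels = cluster(samples, genes, k)
    for _ in range(max_rounds):
        genes = select_genes(samples, labels, n_genes)
        new_labels = cluster(samples, genes, k)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, genes


def classify(sample, train, labels, genes):
    """Assign a held-out sample to the nearest training-group centroid."""
    groups = {}
    for s, lab in zip(train, labels):
        groups.setdefault(lab, []).append(s)
    centers = {lab: centroid(rows) for lab, rows in groups.items()}
    return min(centers, key=lambda lab: sq_dist(sample, centers[lab], genes))
```

In this sketch, `refine` would be run on the training set only, and `classify` would then be applied to each test-set sample to check whether the groups found in training hold up on independent data. Real expression data would also need normalization and a principled choice of the number of groups, neither of which is addressed here.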

Citation Information: Cancer Epidemiol Biomarkers Prev 2010;19(10 Suppl):A57.