One reason ovarian cancer is so deadly is that it is usually not diagnosed until it has reached an advanced stage. In this study, we developed a novel algorithm for identifying group biomarkers from gene expression data. Group biomarkers consist of genes that are coregulated across normal tissues and diseased tissues at different stages. Unlike prior sets of biomarkers identified by statistical methods, genes in group biomarkers are potentially involved in pathways related to different types of cancer development. They may serve as an alternative to the traditional single biomarkers or combinations of biomarkers used for the diagnosis of early-stage and/or recurrent ovarian cancer. We extracted group biomarkers by applying biclustering algorithms that we recently developed to the gene expression data of over 400 normal, cancerous, and diseased tissues. We identified several groups of coregulated genes that encode secreted proteins and exhibit expression levels in ovarian cancer that are at least 2-fold (in log2 scale) higher than in normal ovary and nonovarian tissues. In particular, three candidate group biomarkers exhibited a conserved biological pattern that may be used for early detection or detection of recurrence of ovarian cancer with specificity greater than 99% and sensitivity equal to 100%. We validated these group biomarkers using publicly available gene expression data sets downloaded from an NIH Web site (http://www.ncbi.nlm.nih.gov/geo). Statistical analysis showed that our methodology identified an optimum combination of genes that have the highest effect on the diagnosis of the disease compared with several computational techniques that we tested. Our study also suggests that single or group biomarkers correlate with the stage of the disease. [Mol Cancer Ther 2008;7(1):27–37]

Epithelial ovarian cancer is the most lethal form of gynecologic cancer and the fourth leading cause of cancer death among women in developed countries, claiming about 15,000 lives in the United States each year (1). One reason it is so deadly is the fact that ovarian cancer is not usually diagnosed until it has reached an advanced stage. Early detection can help prolong or save lives, but clinicians currently have no specific and sensitive screening method and the disease displays very subtle symptoms (2). The well-known CA-125 blood test and other imaging techniques, such as ultrasound and computed tomographic scan, or the combination of the CA-125 blood test with one of the above imaging techniques, are useful for tracking patients already diagnosed with ovarian cancer but have not proven sensitive enough to be used as an early diagnostic test (3).

In recent years, large-scale gene expression analyses have been done to identify differentially expressed genes in ovarian carcinoma (4–11). A common goal of these studies was to identify potential tumor markers for the diagnosis of early-stage ovarian cancer as well as to use these markers as targets for improved therapy and treatment of the disease during all stages.

Numerous computational tools have been developed to analyze gene expression data for biomarker discovery (12–19). Most focus on differential gene expression, which is tested by a simple calculation of the fold changes, t test, F test, scoring methods (12), or cluster analysis (13). Many other computational techniques based on a supervised learning approach have also been developed (e.g., support vector machine, ref. 15; naive Bayes method and Fisher discriminant analysis, refs. 16–19). Although most of these approaches have been successful in uncovering interesting patterns that can be used to discriminate between healthy and diseased tissues, computational tools for the identification of potential blood biomarkers are still not well developed or do not take into account all of the input variables. Most approaches only compare healthy and diseased tissues of the corresponding disease and do not take into consideration other tissues in the body that may produce the same protein as the diseased tissue. Therefore, potential biomarkers identified using these approaches may introduce false positives in a diagnostic blood test.

In this study, we first used a computational tool that we recently developed (20) to identify single biomarkers. Then, we used a second novel algorithm that we recently developed to identify group biomarkers (20–22). Group biomarkers correspond to a set of single biomarkers that exhibit coherent behavior across an ordered set of ovarian cancer tissue samples, representing distinct stages of the disease. That is, their expression level increases or decreases coherently during the progression of the disease. This unique pattern shows a correlation or coregulation among the set of genes that belong to the same group biomarkers, suggesting that they respond similarly to the same environmental conditions. Prior studies on different organisms have examined several biclusters of coregulated genes and showed that the genes in a given bicluster typically participate in a single pathway (23).

Our methodology for identifying single or group biomarkers is based on unifying techniques that are well understood and developed in the literature: gene expression data analysis, biclustering algorithms, and receiver operating characteristics (ROC) curves. Furthermore, our approach for identifying blood biomarkers is based on the observation that, if we are looking for biomolecular patterns in the blood that are caused by ovarian cancer, those patterns should only be present in the gene expression data of ovarian cancer tissue samples compared with the gene expression data of normal ovary tissue samples or any nonovarian healthy or diseased tissue samples.

We implemented our approach using the computer program Matlab and applied it to a comprehensive set of well-defined gene expression data corresponding to normal ovary, ovarian cancer, and nonovarian tissue samples. We identified three candidate group biomarkers that encode for secreted proteins, membrane proteins, and/or extracellular matrix proteins. These three candidate group biomarkers clearly discriminate between the sample sets, and they are promising candidates to be used for early detection or recurrence of ovarian cancer using a blood test. Statistical analysis showed that these group biomarkers have a much better detection performance than single biomarkers and combinations of biomarkers identified using other computational approaches. Our data also suggest that single or group biomarkers correlate with the stage of the disease.

Tissue Samples

Table 1 lists all of the tissue samples used in this study. They can be classified into four different sample sets: normal ovary, ovarian cancer, normal nonovarian, and diseased nonovarian. Normal ovaries were obtained from 62 women. Seven borderline ovarian tumors were obtained; these tumors are considered to be of low malignant potential and were not staged. Next, we obtained tissue samples of stage III or IV serous epithelial ovarian cancer derived from two different sites: 22 from the ovary itself and 16 from the omentum. Tissues were ranked from normal to low malignancy to highly malignant as follows: normal ovaries, borderline ovarian tumors, primary serous epithelial ovarian tumors present in ovarian tissues, and serous epithelial ovarian tumors present in omental tissues. None of the patients had been treated with chemotherapy before surgical resection of the tissues (10, 11).

Table 1.

Tissue samples used to generate gene expression data

| Tissue samples | No. samples | Age (y), mean (range) |
| --- | --- | --- |
| Normal ovarian tissues | | |
| Normal ovary | 62 | 51 (28–79) |
| Ovarian cancer tissues | | |
| Borderline ovarian cancer | 7 | 51 (25–81) |
| Papillary serous adenocarcinoma | 22 | 58 (29–79) |
| Omentum; papillary serous adenocarcinoma | 16 | 57 (29–79) |
| Normal nonovarian tissues | | |
| Adipose | 13 | 52 (14–86) |
| Cervix | 17 | 50 (34–62) |
| Colon | 16 | 57 (24–87) |
| Kidney | 12 | 60 (38–89) |
| Liver | 14 | 50 (22–90) |
| Lung | 18 | 55 (32–76) |
| Myometrium | 90 | 50 (14–84) |
| Skeletal muscle | 10 | 40 (14–75) |
| Small intestine | 10 | 62 (20–83) |
| Uterus | 17 | 46 (30–73) |
| Diseased nonovarian tissues | | |
| Degenerative surface of bone | 18 | 63 (43–85) |
| Kidney clear cell adenocarcinoma | | 79 (67–89) |
| Gallbladder with chronic inflammation | 14 | 35 (12–68) |
| Liver fibrosis | | 51 (33–67) |
| Myometrium leiomyoma | 33 | 47 (26–87) |
| Tonsils with lymphoid hyperplasia | 26 | 21 (10–42) |

Tissues were obtained from the University of Minnesota Cancer Center Tissue Procurement Facility on approval by the University of Minnesota Institutional Review Board. Tissue Procurement Facility employees obtained signed consent from each patient, allowing procurement of excess waste tissue and access to medical records. Bulk tumor and normal tissues were identified, dissected, and snap frozen in liquid nitrogen within 15 to 30 min of resection from the patient. Tissue sections were made from each sample, stained with H&E, and examined independently by two pathologists to confirm the pathological state of each sample. The integrity of the RNA was verified before use in gene array experiments (10, 11).

Gene Expression Matrix

The gene expression data were determined by Gene Logic using the Affymetrix GeneChip HG_U95A, which contains 12,651 known genes and 48,000 expressed sequence tags. The gene expression data were normalized using Affymetrix M.A.S. 4.0.1, and a log-floor data transform with a floor value of 1 was applied (24). After this process, the data ranged from 0 to 4. The data were then organized into three matrices defined as follows: matrix A is a 12,651 × 62 matrix that represents the gene expression of the 62 normal ovary tissue samples; matrix B = [B1 B2 B3] is a 12,651 × 45 matrix that represents the gene expression of the 45 ovarian cancer tissue samples; submatrix B1 is a 12,651 × 7 matrix that represents the gene expression of the 7 borderline ovarian cancer tissues; submatrix B2 is a 12,651 × 22 matrix that represents the gene expression of the 22 papillary serous adenocarcinoma tumors; submatrix B3 is a 12,651 × 16 matrix that represents the gene expression of the 16 omentum papillary serous adenocarcinoma samples; and matrix C is a 12,651 × 319 matrix that represents the gene expression of the 319 nonovarian tissues.

Identification of Single Biomarkers in Ovarian Carcinoma

Biomarkers specific for ovarian cancer should be highly expressed in ovarian cancer samples and low or absent in other samples, including normal ovaries and nonovarian tissues. Mathematically, they should correspond to the set of genes that are up-regulated in ovarian cancer tissue samples compared with normal ovary tissue samples and each set of nonovarian tissue samples. In this study, we assume that for a given gene to correspond to a potential biomarker it should be at least 2-fold (log2 scale) up-regulated in ovarian cancer tissue samples compared with normal ovary tissue samples and each set of nonovarian tissue samples [that is, log2(y / x) ≥ 2 and log2(y / z) ≥ 2, where x, y, and z correspond to the expression level of a gene in the healthy ovary, ovarian cancer, and nonovarian data, respectively]. In addition, the corresponding gene must exhibit a sensitivity of ≥90% at a specificity of ≥90%, with an accuracy of ≥90%. In this study, such a pattern is identified using the combination of the Robust Biclustering Algorithm (20) and the ROC approach defined below. To develop a diagnostic assay for the detection of ovarian cancer using a blood test, biomarkers should also correspond to genes that encode predicted secreted proteins, membrane proteins, and/or extracellular matrix proteins. These types of proteins are more likely to be present in the blood than proteins localized to the cell nucleus or cytoplasm.
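As an illustration of the fold-change criterion only, the following Python sketch checks log2(y / x) ≥ 2 and log2(y / z) ≥ 2 for one gene. It assumes that x, y, and z are positive summary expression levels on a linear scale; the function name, the dictionary of nonovarian tissue sets, and the numbers are hypothetical and are not taken from the study.

```python
import math


def passes_fold_change(y, x, z_by_tissue, min_log2_fc=2.0):
    """Check log2(y/x) and log2(y/z) against the threshold for every tissue set z.

    y: level in ovarian cancer, x: level in normal ovary,
    z_by_tissue: hypothetical mapping from nonovarian tissue set to its level.
    """
    if min(y, x, *z_by_tissue.values()) <= 0:
        raise ValueError("expression levels must be positive to take a log ratio")
    if math.log2(y / x) < min_log2_fc:
        return False
    # The criterion must hold against each nonovarian tissue set separately.
    return all(math.log2(y / z) >= min_log2_fc for z in z_by_tissue.values())


# Illustrative numbers only (not data from the study).
print(passes_fold_change(y=80.0, x=10.0, z_by_tissue={"lung": 5.0, "colon": 20.0}))
print(passes_fold_change(y=80.0, x=10.0, z_by_tissue={"lung": 5.0, "colon": 40.0}))
```

In the second call the gene fails because it is only 2 times (log2 ratio of 1) higher than in the hypothetical colon set, even though it passes against normal ovary.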

Biclustering Approach

We used the Robust Biclustering Algorithm that we recently developed (20) to identify biclusters with constant values. Consider the gene expression matrices A, B, and C defined above, with the set of rows (genes) G = {g_1, g_2, …, g_N} and the sets of conditions (tissue samples) S^A = {s_1^A, s_2^A, …, s_{M1}^A}, S^B = {s_1^B, s_2^B, …, s_{M2}^B}, and S^C = {s_1^C, s_2^C, …, s_{M3}^C}, respectively. We define a bicluster with constant values, that is, a subset of genes whose expression levels stay constant across a subset of conditions or tissue samples, as M_k^A = {I_k^A, J_k^A}, M_l^B = {I_l^B, J_l^B}, and M_m^C = {I_m^C, J_m^C} or, equivalently, as submatrices M_k^A = [M_k^A(i,j)], M_l^B = [M_l^B(i,j)], and M_m^C = [M_m^C(i,j)] of A, B, and C, respectively. Each I is a subset of the genes G, each J is a subset of the tissue samples S^A, S^B, or S^C, and M(i,j) is the expression level of the ith gene under the jth condition, with i ∈ I and j ∈ J. Potential biomarkers can then be identified using Eq. (1) below, with 1 ≤ k ≤ N_A, 1 ≤ l ≤ N_B, and 1 ≤ m ≤ N_C:

\[I = \bigcup_{k,l,m} \left( I_{k}^{A} \cap I_{l}^{B} \cap I_{m}^{C} \right) \tag{1}\]

In Eq. (1), N_A is the number of biclusters M_k^A = {I_k^A, J_k^A} with constant value x in the healthy ovary tissue sample data set, N_B is the number of biclusters M_l^B = {I_l^B, J_l^B} with constant value y (y >> x) in the ovarian cancer tissue sample data set, and N_C is the number of biclusters M_m^C = {I_m^C, J_m^C} with constant value z (z << y) in the nonovarian tissue sample data set.
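The set operation in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming that the gene sets of the biclusters have already been found; the function name and the toy gene identifiers are hypothetical.

```python
def candidate_genes(gene_sets_a, gene_sets_b, gene_sets_c):
    """Union over all (k, l, m) of I_k^A ∩ I_l^B ∩ I_m^C, as in Eq. (1)."""
    selected = set()
    for genes_a in gene_sets_a:          # biclusters in normal ovary data
        for genes_b in gene_sets_b:      # biclusters in ovarian cancer data
            for genes_c in gene_sets_c:  # biclusters in nonovarian data
                selected |= set(genes_a) & set(genes_b) & set(genes_c)
    return selected


# Toy gene sets (illustrative only).
sets_a = [{"g1", "g2"}, {"g3"}]
sets_b = [{"g1", "g3", "g4"}]
sets_c = [{"g1"}, {"g3", "g5"}]
print(sorted(candidate_genes(sets_a, sets_b, sets_c)))  # ['g1', 'g3']
```

Because intersection distributes over union, the same result is obtained by intersecting the union of the A sets, the union of the B sets, and the union of the C sets, which avoids the triple loop when many biclusters are present.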

Here, we considered each set of ovarian cancer tissues data separately and the expression level of the gene considered in the ovarian cancer tissue samples should be at least 2-fold (log2 scale) greater than the expression level of the same gene in normal ovary tissue samples and in each set of nonovarian tissue samples. Ideally, when dealing with blood biomarkers, we would like x = z = 0.

The statistical performance of a given bicluster M = [M(i,j)] with constant value was then evaluated using the following equation: for all rows, M(i,:) of M = [M(i,j)],

\[\max\bigl(M(i,:)\bigr) - \min\bigl(M(i,:)\bigr) \leq \delta \tag{2}\]

with δ → 0 (that is, δ is a real positive small number).
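The row-wise condition in Eq. (2) amounts to a simple range check. The sketch below assumes the bicluster is given as a list of rows (one row per gene); the function name and the example values are hypothetical.

```python
def is_constant_bicluster(submatrix, delta):
    """Return True if max(row) - min(row) <= delta for every row (gene)."""
    return all(max(row) - min(row) <= delta for row in submatrix)


# Illustrative values only: two genes across three tissue samples.
print(is_constant_bicluster([[3.1, 3.3, 3.2], [0.1, 0.0, 0.2]], delta=0.5))
print(is_constant_bicluster([[3.1, 1.0, 3.2]], delta=0.5))
```

The first submatrix satisfies the condition for both genes; the second does not, because the single gene varies by more than δ across the samples.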

ROC Approach

Given the gene expression data as defined above, the ROC approach first assumes that all genes correspond to potential biomarkers. Then, it uses the following criterion based on the detection performance exhibited by their respective ROC curve to select the ones with high specificity corresponding to high sensitivity and high accuracy.

For a given screening cutoff point, let a be the number of healthy ovary and nonovarian tissue samples (healthy and diseased) that screen positive, b the number of ovarian cancer tissue samples that screen positive, c the number of healthy ovary and nonovarian tissue samples that screen negative, and d the number of ovarian cancer tissue samples that screen negative. The sensitivity of a potential blood biomarker (Se) is the number of ovarian cancer tissue samples that screen positive divided by the total number of ovarian cancer tissue samples: Se = b / (b + d). The specificity of a potential blood biomarker (Sp) is the number of healthy ovary and nonovarian tissue samples that screen negative divided by the total number of healthy ovary and nonovarian tissue samples: Sp = c / (c + a). Using these variables, we compute the ROC function of each potential blood biomarker using the following equation: Se = f(1 - Sp). This function describes the relationship between the true-positive rate (sensitivity) and the false-positive rate (1 - specificity) for different screening cutoff points. Finally, the ROC methodology keeps all genes whose specificity, corresponding sensitivity, and accuracy all exceed the specified thresholds. The resultant family of genes will correspond to biomarkers that may then be evaluated for their use in the detection of ovarian cancer using a blood test.
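The quantities a, b, c, and d and the resulting Se and Sp can be sketched as follows. Two points are assumptions not stated in the text: a sample is taken to screen positive when its expression level is at or above the cutoff, and accuracy is computed with the usual definition (b + c) / (a + b + c + d). Function names and the example levels are hypothetical.

```python
def roc_points(cancer_levels, other_levels):
    """Return (cutoff, Se, Sp, accuracy) for every observed level used as a cutoff."""
    points = []
    for cutoff in sorted(set(cancer_levels) | set(other_levels)):
        b = sum(v >= cutoff for v in cancer_levels)  # cancer, screens positive
        d = len(cancer_levels) - b                   # cancer, screens negative
        a = sum(v >= cutoff for v in other_levels)   # non-cancer, screens positive
        c = len(other_levels) - a                    # non-cancer, screens negative
        se = b / (b + d)
        sp = c / (c + a)
        acc = (b + c) / (a + b + c + d)              # assumed definition of accuracy
        points.append((cutoff, se, sp, acc))
    return points


def meets_thresholds(cancer_levels, other_levels, min_se=0.9, min_sp=0.9, min_acc=0.9):
    """True if some cutoff reaches all three thresholds at once."""
    return any(se >= min_se and sp >= min_sp and acc >= min_acc
               for _, se, sp, acc in roc_points(cancer_levels, other_levels))


# Illustrative levels only (both lists must be non-empty).
print(meets_thresholds([3.0, 3.2, 2.9, 3.5], [0.5, 1.0, 0.2, 0.1]))
print(meets_thresholds([3.0, 1.0, 2.9, 3.5], [0.5, 1.2, 0.2, 0.1]))
```

In the second call one cancer sample falls below a non-cancer sample, so no single cutoff reaches 90% for both sensitivity and specificity.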

The P value of each identified single biomarker, that is, the probability of observing the given result or one more extreme by chance if the null hypothesis is true, was estimated using a two-sided test.

Identification of Group Biomarkers in Ovarian Carcinoma

Identification of group biomarkers was done using a randomly selected set of 40 of the 62 normal ovary tissues and 30 of the 45 ovarian cancer tissues (5 of the 7 borderline ovarian cancer tissues, 15 of the 22 papillary serous adenocarcinoma tumors, and 10 of the 16 omentum papillary serous adenocarcinoma metastases). Briefly, we applied Eqs. (1) and (2) and the ROC approach to the randomly selected set of data to uncover potential single biomarkers. Then, the gene expression data of the single biomarkers identified were sorted according to the progression of the disease. Given that we only had three different stages (normal ovary, borderline ovarian cancer, and primary ovarian cancer) and two different sites of ovarian cancer (ovary and the omentum), the stages were repeated periodically every three samples. Data were organized as D = [D1 D2 D3 D1 D2 D3 D1 D2 D3], where D1 is a column vector representing the expression level of one of the 40 randomly selected normal ovary tissues, D2 is a column vector representing the expression level of one of the 5 randomly selected borderline ovarian tissues, and D3 is a column vector representing the expression level of one of the 15 randomly selected papillary serous adenocarcinoma tumors or one of the 10 randomly selected omentum papillary serous adenocarcinoma metastases. Because we only had 5 randomly selected borderline samples, the maximum number of columns or tissue samples in D was 15. We therefore produced several D matrices, which used the same borderline data. Different matrices had different combinations of randomly selected normal ovary tissues and papillary serous adenocarcinoma of ovarian tumors or omentum papillary serous adenocarcinoma ovarian tumors. In all, we examined 8 such matrices using the order preserving biclustering algorithm of Tchagang and Tewfik (21, 22) and retained the genes that appeared as many times as possible in the same bicluster.
The order preserving biclustering algorithm has the advantage of being insensitive to the relative position of each tissue of a given kind. That is, it produces the same output under any permutation of the positions of the normal ovary tissues, papillary serous adenocarcinoma of ovarian tumors, or omentum papillary serous adenocarcinoma of ovarian tumors within each group of tissues (positions D1, D2, or D3).
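The column layout of one D matrix can be sketched as follows. This is an assumed reading of the construction described above: sample identifiers stand in for expression columns, each borderline sample is used once, and the normal and cancer samples are drawn at random without replacement. The function name, the identifiers, and the seed are hypothetical.

```python
import random


def build_d_columns(normal_ids, borderline_ids, cancer_ids, seed=0):
    """Return column labels ordered as D1 D2 D3 D1 D2 D3 ... (normal, borderline, cancer)."""
    rng = random.Random(seed)
    n_triplets = len(borderline_ids)            # borderline samples limit the width
    picked_normal = rng.sample(normal_ids, n_triplets)
    picked_cancer = rng.sample(cancer_ids, n_triplets)
    columns = []
    for d1, d2, d3 in zip(picked_normal, borderline_ids, picked_cancer):
        columns.extend([d1, d2, d3])
    return columns


normal = [f"N{i}" for i in range(40)]      # 40 selected normal ovary tissues
borderline = [f"B{i}" for i in range(5)]   # 5 selected borderline tissues
cancer = [f"C{i}" for i in range(25)]      # 15 primary + 10 omentum samples
columns = build_d_columns(normal, borderline, cancer)
print(len(columns))  # 15
```

Calling the function with different seeds gives different D matrices that share the same borderline columns, which mirrors the way several matrices were produced from one set of borderline data.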

In our problem, the bicluster identification step of refs. 21 and 22 consists of two substeps. In the first substep, the procedure enumerates all combinations of K tissues, where K ≥ K_min (the prespecified minimum number of tissues in a valid bicluster), from the M_D tissues in matrix D that could potentially appear in a valid bicluster. For each subset of K tissues, it then uses a row sort procedure that allows us to focus on the coherent evolution of gene expression levels rather than on the raw or processed expression levels. The output of this step is a matrix that contains, for each row (gene), the rank of each of the K tissues when the expression levels of that gene are ordered in a nondecreasing manner. This matrix is referred to as the “tissue rank matrix” and is used as the input to the main bicluster identification routine (22). In the second substep, the main bicluster identification routine identifies all valid coherent evolution patterns involving all genes and a set of K tissues simultaneously through a fast row sorting procedure. Note that this allows the algorithm to identify all possible valid biclusters without an exhaustive enumeration of all K! permutations of the K tissues. The procedure also yields biclusters in which one subset of genes is coherently up-regulated and another subset is coherently down-regulated across the K tissues. A final pruning step eliminates all biclusters that are completely included in larger ones (22).
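The idea of grouping genes by the ordering of their expression levels can be illustrated with a simplified Python sketch. It is not the algorithm of refs. 21 and 22: it handles a single fixed set of K tissues, breaks ties by column position (an assumption), and omits the enumeration of tissue subsets and the pruning step. All names and values are hypothetical.

```python
from collections import defaultdict


def tissue_order(row):
    """Tissue indices sorted by nondecreasing expression level (ties keep column order)."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def group_by_order(expression, min_genes=2):
    """Group genes whose expression levels induce the same ordering of the tissues."""
    groups = defaultdict(list)
    for gene, row in expression.items():
        groups[tissue_order(row)].append(gene)
    # Keep only orderings shared by at least min_genes genes.
    return {order: genes for order, genes in groups.items() if len(genes) >= min_genes}


# Columns: normal ovary, borderline, primary tumor (illustrative values only).
expression = {
    "g1": [0.1, 1.0, 2.5],
    "g2": [0.3, 0.9, 3.1],
    "g3": [2.0, 1.0, 0.2],
    "g4": [0.5, 2.2, 1.1],
}
print(group_by_order(expression))  # {(0, 1, 2): ['g1', 'g2']}
```

Only orderings that actually occur in the data are stored, so the sketch, like the routine described above, never enumerates all K! permutations of the tissues.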

The statistical significance of each identified group biomarker with G genes is assessed using Eq. (3), that is, the upper bound of the tail probability that a random data set of size I × J will contain an order preserving bicluster with G or more genes in it (21).

\[Z(J,G) = J! \sum_{i=G}^{I} \binom{I}{i} \left(\frac{1}{J!}\right)^{i} \left(1 - \frac{1}{J!}\right)^{I-i} \tag{3}\]

As long as that upper bound probability is smaller than any desired significance level, the identified group biomarker will be statistically significant.
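For small data sets, the bound in Eq. (3) can be evaluated directly. The sketch below is a straightforward transcription of the formula; the function and parameter names are hypothetical, and for a very large number of genes I the binomial coefficients become too large for floating-point arithmetic, so a log-space evaluation would be needed instead.

```python
from math import comb, factorial


def order_preserving_tail_bound(num_genes, num_tissues, group_size):
    """Upper bound Z(J, G) of Eq. (3), with I = num_genes, J = num_tissues, G = group_size."""
    p = 1 / factorial(num_tissues)  # chance that a random row follows one fixed ordering
    tail = sum(comb(num_genes, i) * p**i * (1 - p) ** (num_genes - i)
               for i in range(group_size, num_genes + 1))
    return factorial(num_tissues) * tail


# Hand-checkable case: I = 2, J = 2, G = 1 gives 2 * (0.5 + 0.25) = 1.5.
print(order_preserving_tail_bound(2, 2, 1))  # 1.5
# The bound shrinks quickly as the required group size grows.
print(order_preserving_tail_bound(54, 6, 3) > order_preserving_tail_bound(54, 6, 5))  # True
```

As the first example shows, the value is an upper bound and can exceed 1 for small J and G, in which case it carries no information about significance.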

Single Biomarker Algorithm

Using Eqs. (1) and (2) with δ < 1 and the ROC approach, we identified 54 genes that are up-regulated in ovarian cancer tissue samples at least 2-fold (log2 scale) compared with normal ovary tissue samples and each set of nonovarian healthy and diseased tissue samples used in this study (Table 2). The 54 genes achieved a specificity greater than or equal to 90% at a sensitivity greater than or equal to 90% using the ROC approach. The 54 genes encode predicted secreted proteins, membrane proteins, and/or extracellular proteins. Therefore, they represent proteins that have the potential to be present in the blood of ovarian cancer patients and may prove useful in an ovarian cancer diagnostic blood test.

Table 2.

Fifty-four genes identified by the single biomarker algorithm to be up-regulated in ovarian cancer tissue samples compared with normal ovary tissue samples and nonovarian tissue samples

| Fragment name | Gene name | Known gene symbol | Ovarian cancer borderline: fold change | Ovarian cancer borderline: P | Ovarian cancer primary: fold change | Ovarian cancer primary: P | Ovarian cancer omentum: fold change | Ovarian cancer omentum: P |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 33454_at | Agrin | AGRN | 2.2 | 5.2e-07 | 2.4 | 5.1e-21 | 2.3 | 5.5e-16 |
| 757_at | Annexin A2, Annexin A2 pseudogene 2 | ANXA2 | 3.0 | 3.8e-04 | | | 3.1 | 1.6e-07 |
| 35099_at | Apolipoprotein L1 | APOL1 | | | | | 2.1 | 1.9e-06 |
| 2011_s_at | BCL2-interacting killer (apoptosis-inducing) | (BIK, KIAA1654) | | | | | 4.7 | 8.3e-13 |
| 35822_at | B-factor properdin | BF* | | | 6.0 | 3.7e-15 | | |
| 41534_at | BH-protocadherin (brain-heart) | PCDH7 | 2.0 | 1.7e-04 | | | | |
| 1620_at | Cadherin 6, type 2, K-cadherin (fetal kidney) | CDH6 | 2.6 | 3.1e-03 | 4.4 | 6.2e-15 | 5.2 | 1.8e-18 |
| 41660_at | Cadherin, EGF LAG seven-pass G-type receptor 1 (flamingo homologue, Drosophila) | CELSR1* | 3.5 | 2.5e-04 | 3.8 | 5.4e-10 | 3.8 | 9.0e-09 |
| 36499_at | Cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homologue, Drosophila) | CELSR2* | 2.2 | 2.0e-10 | 2.0 | 4.7e-19 | 2.0 | 3.8e-19 |
| 37890_at | CD47 antigen (Rh-related antigen, integrin-associated signal transducer) | CD47 | 2.1 | 9.0e-06 | 2.4 | 1.1e-21 | 2.7 | 6.5e-22 |
| 39008_at | Ceruloplasmin (ferroxidase) | CP | | | | | 4.1 | 7.4e-12 |
| 431_at | Chemokine (C-X-C motif) ligand 10 | CXCL10 | | | | | 3.1 | 5.12e-15 |
| 36197_at | Chitinase 3-like 1 (cartilage glycoprotein-39) | CHI3L1 | | | | | 5.2 | 2.7e-12 |
| 33904_at | Claudin 3 | CLDN3 | | | 6.5 | 1.7e-15 | 6.6 | 1.0e-11 |
| 35276_at | Claudin 4 | CLDN4 | | | 5.9 | 1.6e-16 | 5.5 | 1.6e-11 |
| 38482_at | Claudin 7 | CLDN7 | | | 5.1 | 1.5e-10 | | |
| 37534_at | Coxsackie virus and adenovirus receptor | CXADR | | | | | 4.6 | 5.0e-07 |
| 35453_at | Dermatan sulfate proteoglycan 3 | DSPG3 | | | | | 2.8 | 3.7e-16 |
| 36643_at | Discoidin domain receptor family, member 1 | DDR1 | 2.1 | 7.2e-06 | 2.4 | 5.3e-18 | 2.2 | 7.6e-12 |
| 1007_s_at | Discoidin domain receptor family, member 1 | DDR1 | 2.1 | 2.5e-09 | 2.1 | 2.0e-28 | 2.1 | 1.6e-18 |
| 41586_at | Fibroblast growth factor 18 | FGF18 | 2.9 | 1.7e-05 | | | | |
| 41587_g_at | Fibroblast growth factor 18 | FGF18 | 3.7 | 1.6e-16 | | | | |
| 534_s_at | Folate receptor 1 (adult) | FOLR1 | | | 2.5 | 2.7e-05 | 2.6 | 4.9e-09 |
| 821_s_at | Folate receptor 1 (adult) | FOLR1 | | | 2.4 | 8.0e-11 | 3.1 | 7.6e-13 |
| 38749_at | G protein-coupled receptor 39, LY6/PLAUR domain containing 1 | GPR39, LYPDC1* | 6.0 | 3.1e-27 | 5.8 | 7.3e-54 | 5.6 | 8.6e-51 |
| 406_at | Integrin β4 | ITGB4, (A) | 3.2 | 1.4e-07 | 2.9 | 7.4e-13 | 2.4 | 1.5e-06 |
| 37554_at | Kallikrein 6 (neurosin, zyme) | KLK6 | | | | | 5.2 | 2.9e-19 |
| 38143_at | Kallikrein 7 (chymotryptic, stratum corneum) | KLK7, (C) | 3.3 | 2.1e-03 | 4.4 | 4.1e-04 | 4.7 | 1.4e-04 |
| 37131_at | Kallikrein 8 (neuropsin/ovasin) | KLK8, (C) | 5.4 | 5.4e-20 | 6.1 | 1.6e-70 | 6.2 | 1.5e-71 |
| 36838_at | Kallikrein 10 | KLK10 | | | 2.6 | 3.1e-16 | 3.0 | 1.5e-14 |
| 40035_at | Kallikrein 11 | KLK11 | | | 4.7 | 9.9e-15 | 5.2 | 1.3e-13 |
| 36929_at | Laminin β3 | LAMB3 | | | 3.7 | 1.9e-11 | | |
| 35280_at | Laminin γ2 | LAMC2* | 5.3 | 1.6e-08 | 5.0 | 2.6e-28 | | |
| 39583_at | Leucine-rich repeat neuronal 5 | LRRN5 | 2.4 | 1.1e-03 | | | | |
| 32821_at | Lipocalin 2 (oncogene 24p3) | LCN2*, (A) | 6.3 | 2.8e-08 | 5.1 | 5.7e-13 | 4.9 | 6.6e-10 |
| 40093_at | Lutheran blood group (Auberger b antigen included) | LU | | | 2.6 | 3.5e-14 | 2.8 | 1.1e-14 |
| 32072_at | Mesothelin | MSLN, (B), (C) | 3.4 | 2.5e-05 | 4.1 | 2.9e-19 | 4.5 | 1.5e-17 |
| 38784_g_at | Mucin 1, transmembrane | MUC1, (B) | 4.9 | 4.7e-13 | 4.8 | 2.8e-26 | 4.6 | 1.4e-16 |
| 927_s_at | Mucin 1, transmembrane | MUC1 | 4.5 | 5.7e-07 | 4.9 | 7.5e-18 | 4.1 | 1.5e-12 |
| 38783_at | Mucin 1, transmembrane | MUC1 | 6.2 | 7.5e-07 | 6.5 | 6.6e-16 | 5.9 | 5.5e-12 |
| 1083_s_at | Mucin 1, transmembrane | MUC1 | 3.6 | 3.6e-06 | 4.1 | 2.3e-17 | 3.9 | 5.9e-13 |
| 35912_at | Mucin 4, tracheobronchial | MUC4 | 3.1 | 7.9e-05 | | | | |
| 32625_at | Natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A) | NPR1 | | | | | 2.4 | 1.4e-06 |
| 33483_at | Neuromedin U | NMU | | | | | 4.3 | 3.2e-19 |
| 35663_at | Neuronal pentraxin II | NPTX2 | 2.0 | 1.9e-07 | | | | |
| 1985_s_at | Nonmetastatic cells 1, protein (NM23A) expressed in nonmetastatic cells 2, protein (NM23B) | (NME1, NME2) | | | | | 2.1 | 1.6e-11 |
| 33783_at | Plexin B1 | PLXNB1* | 2.3 | 1.7e-04 | 2.9 | 2.2e-12 | 2.8 | 6.9e-10 |
| 34780_at | Plexin B2 | PLXNB2 | 2.1 | 9.4e-08 | | | | |
| 41106_at | Potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4 | KCNN4 | 4.7 | 9.3e-07 | | | | |
| 41470_at | Prominin 1 | PROM1 | 3.8 | 7.2e-05 | 3.4 | 3.0e-04 | | |
| 32275_at | Secretory leukocyte protease inhibitor (antileukoproteinase) | SLPI* | 4.2 | 1.3e-04 | 4.2 | 7.6e-11 | 4.1 | 2.3e-08 |
| 39075_at | Sialidase 1 (lysosomal sialidase) | NEU1 | | | | | 2.5 | 4.5e-06 |
| 35207_at | Sodium channel, non-voltage-gated 1α | SCNN1A* | 5.6 | 3.7e-06 | 6.0 | 2.9e-17 | 6.2 | 6.7e-14 |
| 36609_at | Solute carrier family 1 (glial high-affinity glutamate transporter), member 3 | (DKFZP547J0410, SLC1A3) | | | | | 2.2 | 1.6e-08 |
| 35277_at | Spondin 1, extracellular matrix protein | SPON1 | | | | | 2.2 | 5.3e-10 |
| 575_s_at | Tumor-associated calcium signal transducer 1 | TACSTD1* | 5.4 | 4.4e-05 | 5.6 | 3.5e-13 | 5.4 | 1.1e-09 |
| 291_s_at | Tumor-associated calcium signal transducer 2 | TACSTD2* | 4.7 | 2.4e-06 | | | | |
| 33218_at | V-erb-b2 erythroblastic leukemia viral oncogene homologue 2, neuro/glioblastoma-derived oncogene homologue (avian) | ERBB2 | 2.0 | 2.7e-05 | 2.1 | 8.5e-16 | | |
| 33933_at | WAP four-disulfide core domain 2 | WFDC2, (B) | 4.8 | 7.9e-06 | 5.3 | 1.5e-17 | 5.2 | 2.1e-13 |
| 1887_g_at | Wingless-type MMTV integration site family, member 7A | WNT7A*, (A) | 3.0 | 7.8e-22 | 2.3 | 7.1e-22 | 3.4 | 2.2e-33 |

NOTE: “(A),” “(B),” and “(C)” are genes that belong to group biomarkers “A,” “B,” and “C,” respectively. Fold change relative to normal ovary tissues; P values relative to normal ovary tissues and nonovarian tissues.

Selection criteria: Up-regulated in ovarian cancer tissue samples at least 2-fold (in log2 scale) compared with normal ovary tissue samples and each set of nonovarian tissue samples. Specificity greater than or equal to 90% at a sensitivity greater than or equal to 90%. Genes code for proteins that are secreted, extracellular, or membranous.

*Genes not previously linked in the literature to ovarian cancer.
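The fold-change part of the selection criteria can be illustrated with a minimal Python sketch. It assumes that "2-fold (in log2 scale)" means a difference of at least 2 between mean log2 expression values, which is one reading of the criterion and not necessarily the exact computation used in the study. The function name, tissue labels, and expression values are hypothetical.

```python
from statistics import mean

def passes_fold_criterion(cancer, normal_ovary, nonovarian_sets, min_log2_fold=2.0):
    """Return True if the mean log2 expression in cancer samples exceeds the mean
    of normal ovary and of every nonovarian tissue set by at least min_log2_fold."""
    cancer_mean = mean(cancer)
    references = [normal_ovary, *nonovarian_sets.values()]
    return all(cancer_mean - mean(ref) >= min_log2_fold for ref in references)

# Hypothetical log2 expression values for one probe set (not taken from the study).
cancer = [9.1, 8.7, 9.4]
normal_ovary = [5.0, 5.2, 4.9]
nonovarian_sets = {"lung": [6.1, 6.3], "kidney": [5.5, 5.8]}

print(passes_fold_criterion(cancer, normal_ovary, nonovarian_sets))
```

The check is applied against each reference set separately, so a gene that is highly expressed in even one nonovarian tissue fails the criterion.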

We analyzed each stage of ovarian cancer separately. As shown in the following sections, many of the ovarian cancer biomarkers correlated with the stage of ovarian cancer. That is, some biomarkers performed well on some stages of ovarian cancer but less well on others. For example, chemokine (C-X-C motif) ligand 10 and chitinase 3-like 1 were found to be up-regulated in the omental metastases but were not up-regulated in the borderline ovarian cancer or primary ovarian cancer tissue relative to normal ovary (Table 2).

The genes listed in Table 2 include most of the potential biomarkers uncovered by previous studies (4–11) as well as an additional 13 that do not appear to have been mentioned in the literature before as potential ovarian cancer biomarkers (indicated by an asterisk). For example, the G protein-coupled receptor 39, LY6/PLAUR domain containing 1 (GPR39, LYPDC1) corresponds to a gene that has not been mentioned in the literature before as a potential ovarian cancer biomarker. GPR39, LYPDC1 is at least 6-fold (log2 scale) up-regulated in ovarian cancer tissue samples compared with normal ovary tissue samples and each set of nonovarian tissue samples used in this study (Fig. 1A). By ROC analysis, GPR39, LYPDC1 achieved a specificity greater than 90% for a sensitivity greater than 90% when used to detect ovarian cancer at each stage (Fig. 1B).

Figure 1.

A, mean expression level of GPR39, LYPDC1 in various tissues. GPR39, LYPDC1 is at least 6-fold (log2 scale) up-regulated in ovarian cancer tissue samples compared with normal ovary tissue samples and each set of nonovarian tissue samples used in this study. B, ROC analysis of GPR39, LYPDC1 on each stage of ovarian cancer [omentum papillary serous adenocarcinomas (♦), papillary serous adenocarcinomas of the ovary (▪), and borderline ovarian cancer (▴)] shows that with a specificity greater than 90%, GPR39, LYPDC1 achieves a sensitivity greater than 90% when used to detect ovarian cancer at each stage.


Group Biomarkers

Using the order-preserving technique described above on the gene expression data of the set of single biomarkers listed in Table 2, we identified three potential group biomarkers that exhibited unique and conserved biological patterns across the ranked data sets that we randomly generated. Because we were looking for the group of genes that exhibited coherent behavior across the largest number of ranked tissue samples in each of the eight matrices that we randomly generated and because each random matrix had 54 rows (genes) and 15 columns (ranked tissue samples), we fixed Kmin = 15.
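As a rough illustration of what "coherent behavior across ranked tissue samples" can mean, the following Python sketch groups genes whose expression values induce the same ordering of the columns. This is a simplified stand-in, not the biclustering algorithm of refs. 20 to 22: it only considers coherence across all columns (which appears to correspond to setting the minimum number of columns equal to the matrix width), and the gene names and values are hypothetical.

```python
from collections import defaultdict

def order_preserving_groups(expression, min_genes=2):
    """Group genes whose values induce the same ordering of the columns.

    expression maps a gene name to a list of values, one per ranked sample.
    Genes in the same group rise and fall together across all columns.
    """
    groups = defaultdict(list)
    for gene, values in expression.items():
        # Column indices sorted by expression value: the gene's order signature.
        signature = tuple(sorted(range(len(values)), key=lambda j: values[j]))
        groups[signature].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) >= min_genes]

# Hypothetical values across three ranked samples (normal, borderline, primary).
expression = {
    "gene_a": [2.0, 5.1, 7.3],
    "gene_b": [1.2, 3.3, 9.0],
    "gene_c": [6.0, 4.0, 2.5],
    "gene_d": [0.5, 2.2, 2.9],
}

print(order_preserving_groups(expression))
```

Here gene_a, gene_b, and gene_d increase from the first to the last sample and form one group, whereas gene_c decreases and is left out. With 15 real-valued columns, exact agreement of the full ordering is a strict requirement, which is one reason practical algorithms also search over subsets of columns.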

The three genes (ITGB4, LCN2, and WNT7A) identified with “A” in Table 2 represent the set of genes that belong to group biomarker “A,” Z = 3.7e-92. They exhibit a coherent behavior across the following ranked conditions: normal ovary, borderline ovarian cancer, and primary ovarian cancer. Integrin β4 (ITGB4) encodes for a membrane protein. It is a receptor for laminin and it plays a critical structural role in the hemidesmosomes of epithelial cells (25). It has been shown previously to be up-regulated in ovarian cancer (10). Lipocalin 2 (oncogene 24p3; LCN2) encodes for a secreted protein. It transports small lipophilic substances and forms a heterodimer with type V collagenase (MMP-9). Although LCN2 has been shown to be up-regulated in patients with renal cell carcinoma (26), little is mentioned in the literature about LCN2 and its role in ovarian cancer. Wingless-type MMTV integration site family, member 7A (WNT7A) encodes for a secreted protein that is present in the extracellular matrix. It is a ligand for members of the frizzled family of seven transmembrane receptors (27). It is a developmental protein; signaling by WNT7A allows sexual dimorphic development of the Mullerian ducts (27). WNT7A has been shown to be up-regulated in lung cancer patients (28), but no one has shown a role for WNT7A in ovarian cancer.

Figure 2A shows the expression profile of the three genes that belong to group biomarker “A” across one of the 8 randomly generated matrices. The three genes that belong to this group behave coherently across these ranked conditions. This pattern is conserved across most of the 8 matrices that we randomly generated (Supplementary Fig. S1; supplementary material for this article is available at Molecular Cancer Therapeutics Online, http://mct.aacrjournals.org/). This correlation may mean that they respond similarly to the same environmental conditions.

Figure 2.

Genes belonging to group biomarkers show a coherent pattern of expression. A, group biomarker “A” contains LCN2 (♦), WNT7A (▪), and ITGB4 (▴). B, group biomarker “B” contains MSLN (♦), WFDC2 (▪), and MUC1 (▴). C, group biomarker “C” contains KLK8 (♦), KLK7 (▪), and MSLN (▴). The Y axis corresponds to the expression level and the X axis shows a series of different samples ranked as follows: normal, borderline, and primary or normal, borderline, and omentum.


Three other genes (WFDC2, MUC1, and MSLN) labeled as “B” in Table 2 belong to group biomarker “B,” Z = 3.7e-92. They exhibit a coherent behavior across the following ranked conditions: normal ovary, borderline ovarian cancer, and primary ovarian cancer. WAP four-disulfide core domain 2 (WFDC2) encodes for a secreted protein that is expressed in several tumor cells, such as ovarian, colon, breast, lung, and renal (29). WFDC2, also known as HE4, has been shown to be highly up-regulated in ovarian cancer (30, 31). Mucin 1 (MUC1) encodes for a membrane protein that is also secreted. It may play a role in adhesive functions and in cell-cell interactions, metastasis, and signaling (32). MUC1 may provide a protective layer on epithelial surfaces (32). MUC1 has been shown to be highly up-regulated in ovarian cancer (33). Mesothelin (MSLN) encodes for a membrane protein. Its function is unknown, but it may play a role in cell adhesion. It has multiple transcripts due to alternative splicing. MSLN has been shown to be highly up-regulated in ovarian cancer (11, 34–36). Figure 2B shows the expression profile of the three genes that belong to group biomarker “B” across one of the 8 randomly generated matrices. This pattern is conserved across most of the 8 matrices that we randomly generated (data not shown). The genes in this group behave coherently across these ranked conditions.

Finally, the three genes (MSLN, KLK8, and KLK7) labeled as “C” in Table 2 belong to group biomarker “C,” Z = 3.7e-92. They exhibit a coherent behavior across the following ranked conditions: normal ovary, borderline ovarian cancer, and secondary ovarian cancer of the omentum. Kallikrein 8 (neuropsin/ovasin; KLK8) encodes for a secreted protein. KLK8 may be involved in epileptogenesis and hippocampal plasticity. KLK8 has been shown to be highly up-regulated in ovarian cancer (10, 11, 37, 38). Kallikrein 7 (chymotryptic, stratum corneum; KLK7) encodes for a secreted protein. KLK7 is highly up-regulated in ovarian cancer (10, 11) and is present at the apical membrane and in the cytoplasm at the invasive front.

Figure 2C shows the expression profile of the three genes that belong to group biomarker “C” across one of the 8 randomly generated matrices. This pattern is conserved across the 8 matrices that we randomly generated (data not shown).

Comparison of Group Biomarkers with Other Sets of Biomarkers Obtained with Alternative Statistical Approaches

We next performed statistical analysis and validation of our three group biomarkers on the entire set of gene expression data for the ovary tissue samples. Thus, we analyzed data from the 62 normal ovaries, 7 borderline ovarian cancers, 22 papillary serous adenocarcinomas, and 16 omentum papillary serous adenocarcinoma metastases. We also compared the performance of our three group biomarkers with that of the combinations of the best biomarkers identified using other computational approaches: F test, ROC approach, and clustering.

ROC plots of group biomarkers “A” (Fig. 3A), “B” (Fig. 3B), and “C” (Fig. 3C) were compared with the combination of the six best biomarkers identified using the F test: GPR39, KLK8, LAMC2, LCN2, SCNN1A, and TACSTD1 from the borderline data set; BF, CLDN3, CLDN4, GPR39, KLK8, and SCNN1A from the papillary serous adenocarcinoma data set; and CLDN3, CLDN4, GPR39, KLK8, SCNN1A, and WFDC2 from the omentum papillary serous adenocarcinoma data set. The six best genes identified using the Eisen clustering approach were CDH6, DDR1, GPR39, KLK8, LAMC2, and LCN2 from the borderline data set; CDH6, CLDN3, GPR39, KLK8, MUC1, and WFDC2 from the papillary serous adenocarcinoma data set; and CLDN3, CLDN4, GPR39, KLK8, SCNN1A, and WFDC2 from the omentum papillary serous adenocarcinoma data set. The five best genes identified using the ROC approach were DDR1, ITGB4, KLK8, LCN2, and WNT7A from the borderline data set; DDR1, KLK8, MSLN, MUC1, and WFDC2 from the papillary serous adenocarcinoma data set; and CD47, CLDN3, KLK7, KLK8, and MSLN from the omentum papillary serous adenocarcinoma data set.

Figure 3.

ROC curves comparison of group biomarkers “A” (A), “B” (B), and “C” (C), with the best genes uncovered using other computational techniques: F test (♦), ROC approach (•), and Eisen clustering (▴).


With specificity greater than 99%, each of our three group biomarkers achieved 100% sensitivity, with accuracy greater than 99% (Fig. 3). Thus, our group biomarkers outperformed the combination of the best biomarkers identified using the other three computational techniques.

Group Biomarkers Validation

We validated our three group biomarkers using publicly available sets of gene expression data downloaded from the NIH Web site (http://www.ncbi.nlm.nih.gov/geo).

Data sets GSM139377 to GSM139479 for ovarian cancer and normal ovary tissue samples were made available on April 9, 2007. These data sets contain the gene expression of 99 individual ovarian tumors (37 endometrioid, 41 serous, 13 mucinous, and 8 clear cell carcinomas) and 4 individual normal ovary samples. Each tissue was assayed on Affymetrix HG_U133A array, the data were processed using “Ann Arbor quantile-normalized trimmed-mean method” and normalized using “quantile-normalized trimmed-mean, log transformed with log[max(x + 50,0) + 50] using base 10 logarithms.”

Data sets GSM44671 to GSM44706 for nonovarian tissue samples were made available on April 5, 2005. These data sets contain the expression profiling of 36 types of normal tissue from different organs; RNA samples had been pooled from several donors then assayed on Affymetrix HG_U133A arrays. To compare these data with the above ovarian cancer and normal ovary gene expression data, we normalized this nonovarian data using the “log transformed with log[max(x + 50,0) + 50] using base 10 logarithms.”
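The quoted transform, log[max(x + 50,0) + 50] with base 10 logarithms, can be written as a short Python function. This is a minimal sketch of the formula exactly as quoted; the function name, the `offset` parameter, and the sample inputs are illustrative, and the quantile normalization and trimmed-mean steps are not reproduced.

```python
import math

def log_transform(x, offset=50.0):
    """Apply log10(max(x + offset, 0) + offset) to one expression value."""
    return math.log10(max(x + offset, 0.0) + offset)

# Illustrative raw intensities, including a negative background-corrected value.
for raw in (-100.0, 0.0, 900.0):
    print(raw, round(log_transform(raw), 4))
```

The inner max clamps strongly negative values so that the argument of the logarithm never drops below the offset, which keeps the transform defined for every input.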

Table 3 shows the different values of maximum sensitivities for specificity greater than or equal to 99% when our three group biomarkers were used to detect different types and stages of ovarian cancer on the publicly available gene expression data. At least one of our group biomarkers detected each stage and different type of ovarian cancer on the publicly available data set, except for stage II endometrioid and stage III mucinous. Interestingly, with 100% specificity, group biomarker “A” achieved 100% sensitivity on each type of ovarian cancer at stage I of the disease, suggesting a potential usefulness in detecting early-stage ovarian cancer compared with group biomarkers “B” and “C.”
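The quantity reported in Table 3, the maximum sensitivity at a specificity of at least 99%, can be sketched in Python as a threshold sweep over a single score per sample. How the three genes of a group biomarker are combined into one score is not specified here, so the sketch takes the scores as given; the function name and the example scores are hypothetical.

```python
def max_sensitivity_at_specificity(positives, negatives, min_specificity=0.99):
    """Sweep thresholds over the observed scores and return the highest
    sensitivity among thresholds whose specificity is at least min_specificity.
    A sample is called positive when its score is >= the threshold."""
    best = 0.0
    for threshold in sorted(set(positives) | set(negatives)):
        true_negatives = sum(score < threshold for score in negatives)
        specificity = true_negatives / len(negatives)
        if specificity >= min_specificity:
            true_positives = sum(score >= threshold for score in positives)
            best = max(best, true_positives / len(positives))
    return best

# Hypothetical scores: cancer samples (positives) and all other samples (negatives).
cancer_scores = [8.0, 7.5, 6.9, 4.0]
other_scores = [3.0, 3.5, 4.2, 5.0]

print(max_sensitivity_at_specificity(cancer_scores, other_scores))
```

With small sample counts, such as the individual stage and histology subsets in the validation data, a single misclassified sample changes the sensitivity by a large step, which is worth keeping in mind when reading the table.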

Table 3.

Maximum values of sensitivities for specificity greater than or equal to 99%

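| | No. tissue samples | Group biomarker “A” (%) | Group biomarker “B” (%) | Group biomarker “C” (%) |
|---|---|---|---|---|
| Stage I | | | | |
| Clear cell | | 100 | 99 | 100 |
| Endometrioid | 18 | 100 | 78 | 84 |
| Mucinous | | 100 | 100 | 75 |
| Serous | | 100 | 100 | 99 |
| Stage II | | | | |
| Clear cell | | 100 | 100 | 100 |
| Endometrioid | | 80 | 60 | 60 |
| Mucinous | | 100 | 100 | 100 |
| Serous | | 100 | 100 | 100 |
| Stage III | | | | |
| Clear cell | | 100 | 100 | 100 |
| Endometrioid | 11 | 90 | 100 | 90 |
| Mucinous | | 50 | 50 | 30 |
| Serous | 30 | 100 | 99 | 100 |
| Stage IV | | | | |
| Clear cell | | 100 | 100 | 100 |
| Endometrioid | | 100 | 100 | 100 |
| Serous | | 100 | 100 | 100 |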

In this study, we applied a novel set of biclustering algorithms and an ROC approach to well-defined gene expression data representing ovarian cancer, normal ovary, and nonovarian healthy and diseased tissue samples. We identified many significant patterns that encode for secreted proteins, membrane proteins, and/or extracellular matrix proteins that clearly discriminate between the gene expression data of ovarian cancer, normal ovary, and nonovarian tissues.

The advantage of using a biclustering algorithm is that it allows grouping together of subsets of genes that exhibit the same behavior across subsets of tissue samples. Therefore, the genes that belong to the same bicluster likely have similar responses to the same environmental condition. Thus, a biclustering algorithm approach will give more clinical and biological insight into the tissue samples analyzed and potential biomarkers uncovered.

A major difference between our ROC approach and other computational techniques based on ROC curves is that our definition of specificity includes the nonovarian tissues, whereas others do not (17). Other computational techniques classify samples based only on a comparison between healthy ovary and ovarian cancer tissue samples and do not account for other tissues in the body that may produce the same protein as the ovarian cancer tissue. Therefore, these approaches will yield less specific biomarkers than ours for a diagnostic blood test. The advantages of our ROC approach are twofold. A gene that achieves a high specificity at a high sensitivity is minimally expressed or not expressed in normal ovary and nonovarian tissues and is also highly expressed in ovarian cancer tissues. Therefore, it will represent a highly specific and sensitive single biomarker for ovarian cancer detection using a blood test.
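The effect of including nonovarian tissues in the definition of specificity can be shown with a small Python example. It is a sketch with made-up scores and a made-up threshold, intended only to show how a gene that looks perfectly specific against normal ovary can lose specificity once other tissues that express it are counted as negatives.

```python
def specificity(negative_scores, threshold):
    """Fraction of negative samples scoring below the threshold."""
    return sum(score < threshold for score in negative_scores) / len(negative_scores)

# Hypothetical expression scores (not taken from the study).
normal_ovary = [2.1, 2.4, 1.9, 2.8]
nonovarian = [2.0, 6.5, 7.1, 2.2]  # two nonovarian samples also express the gene
threshold = 5.0

ovary_only = specificity(normal_ovary, threshold)
pooled = specificity(normal_ovary + nonovarian, threshold)
print(ovary_only, pooled)
```

In this toy case the specificity is 1.0 when only normal ovary is used as the negative class and 0.75 when the nonovarian samples are pooled in.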

This study used the novel approach of group biomarkers as an alternative to the traditional single biomarkers or other combinations of biomarkers used to date for the detection of ovarian cancer using blood tests. Statistical analysis of the potential group biomarkers identified in this study showed that they outperform the combination of the best biomarkers identified using other computational approaches. We believe that our approach outperforms other computational techniques because there exists a correlation or coregulation among the genes that belong to the group biomarkers that we identified. In contrast, other techniques combined potential biomarkers without checking to see whether they are correlated or not.

Interestingly, our group biomarkers contain fewer genes (that is, maximum of three genes per group) and they do better than the combination of the best biomarkers identified by previous approaches, which contain more genes (that is, a minimum of five genes per group). Thus, our methodology identifies an optimum combination of genes that have the highest effect on the diagnosis of a disease. This suggests that the number of genes in a group biomarker is irrelevant, but how they behave together as a group is very important.

We statistically validated the group biomarkers identified in this study using publicly available gene expression data downloaded from a NIH Web site. Because the genes that we identified in this study encode for secreted proteins, they have the potential to be used as tumor markers for the detection of ovarian cancer in a diagnostic blood test. However, additional clinical studies assessing serum levels of the identified putative biomarkers are required to confirm their usefulness in the diagnosis and/or monitoring of ovarian cancer.

Grant support: University of Minnesota Graduate Program Grant-in-Aid of Research, Artistry, and Scholarship Program and NIH/National Cancer Institute grant NIH R01CA106878 (A.P.N. Skubitz).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank Gene Logic for providing the gene expression data and Diane Rauch and Sarah Bowell for procuring the tissue samples (University of Minnesota Cancer Center Tissue Procurement Facility).

1. American Cancer Society. Cancer facts and figures 2007. Atlanta: American Cancer Society; 2007.
2. Verheijen RHM, Von Mensdorff-Pouilly S, Van Kamp GJ, Kenemans P. CA 125: fundamental and clinical aspects. Cancer Biol 1999;9:117–24.
3. Bast RC, Jr. Early detection of ovarian cancer: new technologies in pursuit of a disease that is neither common nor rare. Trans Am Clin Climatol Assoc 2004;115:233–48.
4. Welsh JB, Zarrinkar PP, Sapinoso LM, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A 2001;98:1176–81.
5. Hough CD, Cho KR, Zonderman AB, Schwartz DR, Morin PJ. Coordinately up-regulated genes in ovarian cancer. Cancer Res 2001;61:3869–76.
6. Schummer M, Bumgarner RE, Nelson PS, et al. Comparative hybridization of an array of 21,500 ovarian cDNA's for the discovery of genes overexpressed in ovarian carcinomas. Gene 1999;238:375–85.
7. Ismail RS, Baldwin RL, Fang J, et al. Differential gene expression between normal and tumor derived ovarian epithelial cells. Cancer Res 2000;60:6744–9.
8. Ono K, Tanaka T, Tsunoda T, et al. Identification by cDNA microarray of genes involved in ovarian carcinogenesis. Cancer Res 2000;60:5007–11.
9. Santin AD, Zhan F, Bellone S, et al. Gene expression profiles in primary ovarian serous papillary tumours and normal ovarian epithelium: identification of candidate molecular markers for ovarian cancer diagnosis and therapy. Int J Cancer 2004;112:14–25.
10. Hibbs K, Skubitz KM, Pambuccian SE, et al. Differential gene expression in ovarian carcinoma. Identification of potential biomarkers. Am J Pathol 2004;165:397–414.
11. Skubitz APN, Pambuccian SE, Argenta AP, Skubitz KM. Differential gene expression identifies subgroups of ovarian carcinoma. Translational Res 2006;148:223–48.
12. Hedenfalk I, Duggan D, Chen Y, et al. Gene expression profiles in hereditary breast cancer. N Engl J Med 2001;344:539–48.
13. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998;95:14863–8.
14. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS. Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A 2000;97:12182–6.
15. Furey T, Cristianini N, Duffy N, Bednarski D, Schummer M, Haussler D. Support vector machines classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000;16:906–14.
16. Moler EJ, Chow ML, Mian IS. Analysis of molecular profile data using generative and discriminative methods. Physiol Genomics 2000;4:109–26.
17. Debashis G, Chinnaiyan AM. Classification and selection of biomarkers in genomic data using LASSO. J Biomed Biotechnol 2005;2:147–54.
18. Xiong M, Xiangzhong F, Jinying Z. Biomarker identification by feature wrappers. Genome Res 2001;11:1878–87.
19. Dudoit S, Fridlyand J, Speed TP. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 2002;97:77–87.
20. Tchagang AB, Tewfik AH. DNA Microarray data analysis: a novel biclustering algorithm approach. EURASIP J App Sig Proc 2006; article ID 59809.
21. Tewfik AH, Tchagang AB, Vertatschitsch L. Parallel identification of gene biclusters with coherent evolution. IEEE Trans Sig Proc 2006;54:2408–17.
22. Tchagang AB, Tewfik AH, Skubitz APN. Analysis of order preserving genes biclusters. Proceedings of IEEE International Workshop on Genomic Signal Processing and Statistics; 2006 May 28-30; College Station, TX; IEEE; 2006.
23. Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE Trans Comp Biol Bioinf 2004;1:24–45.
24. GeneLogic GX™ Explorer 2.0. A component of the Genesis Enterprise System™ user guide. Gene Logic, Inc.; 2003.
25. Inoue M, Tamai K, Shimizu H, et al. A homozygous missense mutation in the cytoplasmic tail of β4 integrin, G931D, that disrupts hemidesmosome assembly and underlies non-Herlitz junctional epidermolysis bullosa without pyloric atresia. J Invest Dermatol 2000;114:1061–4.
26. Süllentrop F, Moka D, Neubauer S, et al. 31P NMR spectroscopy of blood plasma: determination and quantification of phospholipid classes in patients with renal cell carcinoma. NMR Biomed 2002;15:60–8.
27. Bui TD, Lako M, Lejeune S, et al. Isolation of a full-length human WNT7A gene implicated in limb development and cell transformation, and mapping to chromosome 3p25. Gene 1997;189:25–9.
28. Calvo R, West J, Franklin W, et al. Altered HOX and WNT7A expression in human lung cancer. Proc Natl Acad Sci U S A 2000;97:12776–81.
29. Bouchard D, Morisset D, Bourbonnais Y, Tremblay GM. Proteins with whey-acidic-protein motifs and cancer. Lancet Oncol 2006;7:167–74.
30. Hellstrom I, Raycraft J, Hayden-Ledbetter M, et al. The HE4 (WFDC2) protein is a biomarker for ovarian carcinoma. Cancer Res 2003;63:3695–700.
31. Drapkin R, Von Horsten HH, Lin Y, et al. Human epididymis protein 4 (HE4) is a secreted glycoprotein that is overexpressed by serous and endometrioid ovarian carcinomas. Cancer Res 2005;65:2162–9.
32. Komatsu M, Carraway CAC, Fregien NL, Carraway KL. Reversible disruption of cell-matrix and cell-cell interactions by overexpression of sialomucin complex. J Biol Chem 1997;272:33245–54.
33. Feng H, Ghazizadeh M, Konishi H, Araki T. Expression of MUC1 and MUC2 mucin gene products in human ovarian carcinomas. Jpn J Clin Oncol 2002;32:525–9.
34. Chang K, Pastan I. Molecular cloning of mesothelin, a differentiation antigen present on mesothelium, mesotheliomas, and ovarian cancers. Proc Natl Acad Sci U S A 1996;93:136–40.
35. Muminova ZE, Strong TV, Shaw DR. Characterization of human mesothelin transcripts in ovarian and pancreatic cancer. BMC Cancer 2004;4:19.
36. Scholler N, Fu N, Yang Y, et al. Soluble member(s) of the mesothelin/megakaryocyte potentiating factor family are detectable in sera from patients with ovarian carcinoma. Proc Natl Acad Sci U S A 1999;96:11531–6.
37. Borgono CA, Kishi T, Scorilas A, et al. Human kallikrein 8 protein is a favorable prognostic marker in ovarian cancer. Clin Cancer Res 2006;12:1487–93.
38. Magklara A, Scorilas A, Katsaros D, et al. The human KLK8 (neuropsin/ovasin) gene: identification of two novel splice variants and its prognostic value in ovarian cancer. Clin Cancer Res 2001;7:806–11.