The Cancer Gene Anatomy Project database of the National Cancer Institute has thousands of expressed sequences, both known and novel, in the form of expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Using a data-mining tool called Digital Differential Display (DDD) from the Cancer Gene Anatomy Project database, ESTs from six different solid tumor types (breast, colon, lung, ovary, pancreas, and prostate) were analyzed for differential expression. An electronic expression profile and chromosomal map position of these hits were generated from the UniGene database. The hits were categorized into major classes of genes including ribosomal proteins, enzymes, cell surface molecules, secretory proteins, adhesion molecules, and immunoglobulins and were found to be differentially expressed in these tumor-derived libraries. Genes known to be up-regulated in prostate, breast, and pancreatic carcinomas were discovered by DDD, demonstrating the utility of this technique. Two hundred known genes and 500 novel sequences were discovered to be differentially expressed in these select tumor-derived libraries. Test genes were validated for expression specificity by reverse transcription-PCR, providing a proof of concept for gene discovery by DDD. A comprehensive database of hits can be accessed at http://www.fau.edu/cmbb/publications/cancergenes.htm. This solid tumor DDD database should facilitate target identification for cancer diagnostics and therapeutics.

With the expected completion of the human genome sequencing efforts in the next few years, over 100,000 new genes are likely to be discovered (1, 2, 3). From these vast numbers of new genes, new diagnostic and therapeutic targets for diseases like cancer are predicted to emerge (4). Only a subset of genes is expressed in a given cell, and the level of expression governs function. High-throughput gene expression technology is becoming a possibility for analyzing expression of a large number of sequences in diseased and normal tissues with the use of microarrays and gene chips (5, 6, 7, 8). A parallel way to initiate a search for genes relevant to cancer diagnostics and therapy is to data mine the sequence database (9, 10, 11, 12, 13). A large number of expressed sequences from diverse organ-, species-, and disease-derived cDNA libraries are being deposited in the form of ESTs³ in different databases.

The CGAP database of the NCI is an attractive starting point for cancer-specific gene discovery (13). The Human Tumor Gene Index was initiated by the NCI in 1997 with a primary goal of identifying genes expressed during development of human tumors in five major cancer sites: (a) breast; (b) colon; (c) lung; (d) ovary; and (e) prostate. This database consists of expression information (mRNA) of thousands of known and novel genes in diverse normal and tumor tissues. By monitoring the electronic expression profile of many of these sequences, it is possible to compile a list of genes that are selectively expressed in the cancers. Data-mining tools are becoming available to extract expression information about the ESTs derived from various CGAP libraries (9, 10, 12, 14). Currently, there are 1.5 million ESTs in the CGAP database, of which 73,000 are novel sequences. These sequences are also subclassified into those derived from libraries of normal, precancerous, or cancer tissues. We chose the DDD tool at the CGAP database to identify genes (both novel and known ESTs) that are selectively up- or down-regulated in six major solid tumor types (breast, colon, lung, ovary, pancreas, and prostate). Survey sequencing of mRNA gene products can provide an indirect means of generating gene expression fingerprints for cancer cells and their normal counterparts. DDD is a computational method of comparing these fingerprints. It is a quantitative method that enables the user to determine the fold differences between the libraries being compared, using a statistical method to quantitate the transcript levels. ESTs present in tumor-derived libraries were compared against all other libraries or against the corresponding normal libraries by DDD, and the hits showing >10-fold differences were compiled for each of the organ types. These hits were functionally classified into major classes of proteins. Genes belonging to ribosomal proteins, enzymes, receptors, binding proteins, secretory proteins, and cell adhesion molecules were identified as differentially expressed in these tumor types. A comprehensive database of hits was created, providing additional electronic expression data as well as the novel ESTs that were thus identified. This database can be accessed on the World Wide Web.⁴

Data-mining of CGAP Database.

The CGAP database was accessed,⁵ and the DDD tool was used according to the database instructions. DDD takes advantage of the UniGene database by comparing the number of times ESTs from different libraries were assigned to a particular UniGene cluster. Six different solid tumor-derived EST libraries (breast, colon, lung, ovary, pancreas, and prostate) with corresponding normal tissue-derived libraries were chosen for DDD (N = 110). To identify tumor- and organ-specific ESTs, all the other organ- and tumor-derived EST libraries (N = 327) were chosen for comparison with each of the six tumor types. The nature of the libraries (normal, pretumor, or tumor) was authenticated by comparison of the CGAP data with the UniGene database.⁶ Those few libraries showing discrepancies of definition between the two databases were excluded. The DDD was performed for each organ type individually. Using the online tool, DDD was performed with ESTs from tumors (pool A) against the corresponding normal organ (pool B) for the DDD2 method, or with ESTs from tumors (pool A) against all other organ- and tumor-derived cDNA libraries including the corresponding normal (pool B) for the DDD1 method. The output provided a numerical value in each pool denoting the fraction of sequences within the pool that mapped to the UniGene cluster, providing a dot intensity. Fold differences were calculated by using the ratio of pool A:pool B. Statistically significant hits (Fisher’s exact test) showing >10-fold differences were compiled, and a preliminary database was created. Hits were classified into major families using information generated from two web sites.⁷ Novel ESTs were compiled into a separate database. The UniGene database was accessed to establish an electronic expression profile (E-Northern) for each of the hits to facilitate tumor- and organ-selective gene discovery. The cytogenetic map position of the hits was also inferred from the UniGene page. A final database of ESTs that were up-regulated, were down-regulated, or showed absolute differences (+/−) in the six tumor types was created.⁴
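The fold-difference and significance filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CGAP tool's implementation: the function names, the example EST counts, and the significance level `alpha=0.05` are hypothetical (the text does not state the threshold used), and Fisher's exact test is written out with the standard library so the sketch is self-contained.

```python
from math import comb


def fold_difference(hits_a, total_a, hits_b, total_b):
    """Ratio of the pool A fraction to the pool B fraction for one UniGene cluster."""
    frac_a = hits_a / total_a
    frac_b = hits_b / total_b
    if frac_b == 0:
        # No ESTs in pool B: a plus/minus difference, not a finite fold change.
        return float("inf")
    return frac_a / frac_b


def fisher_exact_two_sided(hits_a, total_a, hits_b, total_b):
    """Two-sided Fisher's exact test on the 2x2 table of
    (ESTs in cluster / ESTs not in cluster) by (pool A / pool B)."""
    col_hits = hits_a + hits_b
    denom = comb(total_a + total_b, col_hits)

    def prob(x):
        # Hypergeometric probability that x of the cluster ESTs fall in pool A.
        return comb(total_a, x) * comb(total_b, col_hits - x) / denom

    p_obs = prob(hits_a)
    lo = max(0, col_hits - total_b)
    hi = min(total_a, col_hits)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def is_hit(hits_a, total_a, hits_b, total_b, fold_cutoff=10.0, alpha=0.05):
    """Apply the >10-fold cutoff plus a significance filter (alpha is an assumption)."""
    fold = fold_difference(hits_a, total_a, hits_b, total_b)
    p_value = fisher_exact_two_sided(hits_a, total_a, hits_b, total_b)
    return fold > fold_cutoff and p_value < alpha


# Hypothetical counts: 30 of 10,000 tumor ESTs versus 2 of 10,000 comparison ESTs.
print(fold_difference(30, 10_000, 2, 10_000))
print(is_hit(30, 10_000, 2, 10_000))
```

Down-regulated hits would use the same functions with the two pools swapped.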

Validation of DDD Hits.

Tumor and normal tissues were obtained from the Cooperative Human Tissue Network (Birmingham, AL). One μg of total RNA was reverse-transcribed using random hexamers and Superscript Reverse Transcriptase (Life Technologies, Inc., Gaithersburg, MD). One-fortieth of the cDNA was PCR-amplified using gene-specific primers. Primers were designed using the Primer 3 program on the web.⁸ Primers were chosen for the following 13 known genes that showed DDD specificity for colon tumors: (a) creatine kinase (Hs. 118843); (b) guanylate cyclase (Hs. 1085); (c) ETS-variant gene (Hs. 179214); (d) placental lactogen (Hs. 75984); (e) troponin (Hs. 73980); (f) titin (Hs. 172004); (g) fibrinogen (Hs. 90765); (h) homeobox transcription factor 1 (Hs. 1545); (i) homeobox transcription factor 2 (Hs. 7399); (j) myoglobin C (Hs. 118836); (k) cytokeratin 20 (Hs. 84905); (l) neurotensin receptor (Hs. 110642); and (m) transmembrane glycoprotein (Hs. 143133). In addition, primers were designed for 20 colon-specific novel ESTs. The primer sequences are available on request. The PCR parameters included 94°C for 7 min, followed by a 35–40-cycle amplification at 94°C for 45 s, 62°C–65°C for 45 s, and 72°C for 90 s, with a final extension at 72°C for 10 min. RT-minus controls and genomic DNA controls were routinely used to authenticate the RT-derived products. One-half of the amplified products were separated by electrophoresis on a 2% agarose gel and detected by ethidium bromide staining. Internal control actin RT-PCR was done on all samples simultaneously.
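As a quick check on the cycling program above, the following sketch adds up the programmed hold times and the RNA equivalent per reaction. The function names are hypothetical, ramp times between temperatures are ignored, and the RNA figure assumes the whole 1 μg was carried into cDNA, so the numbers are lower-bound estimates rather than measured values.

```python
def pcr_run_seconds(cycles, initial_denaturation_s=7 * 60, denaturation_s=45,
                    annealing_s=45, extension_s=90, final_extension_s=10 * 60):
    """Total programmed hold time for the cycling conditions given in the text."""
    per_cycle = denaturation_s + annealing_s + extension_s
    return initial_denaturation_s + cycles * per_cycle + final_extension_s


def rna_equivalent_ng(total_rna_ug=1.0, cdna_divisor=40):
    """Input RNA represented by the fraction of cDNA used in one PCR (in ng)."""
    return total_rna_ug * 1000 / cdna_divisor


for cycles in (35, 40):
    print(cycles, "cycles:", pcr_run_seconds(cycles) / 60, "min of hold time")
print("RNA equivalent per reaction:", rna_equivalent_ng(), "ng")
```

With these defaults the program holds for 122 min at 35 cycles and 137 min at 40 cycles, and each reaction represents 25 ng of input RNA.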

Electronic Profiling of Up-Regulated Genes in Solid Tumors.

The EST libraries representing each of the organ types used in the DDD protocol were chosen from the CGAP database library browser. The content of each of the libraries used was verified by comparison against the UniGene database.⁶ The DDD was performed by taking all the libraries representing tumors for each organ type and digitally comparing them against either the corresponding normal tissue-derived library (DDD2) or all the other libraries plus the corresponding normal in the database (DDD1). Pretumor libraries were not included among the other libraries and were analyzed separately (DDD3 and DDD4). Comparing the tumor libraries with all of the other libraries including the corresponding normal (DDD1) or with the corresponding normal only (DDD2) resulted in the identification of over 600 ESTs (data not shown). These hits were subdivided into known and novel ESTs. Approximately 10% of the hits showed varying levels of similarity (weak, moderate, and high) to Alu-containing repeat sequences. An interesting pattern emerged regarding the known ESTs thus identified. The majority of the known ESTs can be classified into distinct classes of genes. These included ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins. Over 80 genes were found to be up-regulated with >10-fold differences in comparison to normal and all other organs (Table 1). Ribosomal proteins were found to be up-regulated in breast- and prostate carcinoma-derived libraries, but not in the other four solid tumor-derived libraries. Known hits (enzymes) for select organ types (for example, prostate-specific antigen and prostatic acid phosphatase for prostate, several pancreatic enzymes for pancreas, tryptophan hydroxylase and DOPA decarboxylase in the lung, and mammaglobin for breast) were identified by the DDD protocol. The DDD also identified mucin-related proteins in the colon and folate receptor in the ovary carcinoma-derived EST libraries. The majority of the up-regulated genes were identified by both DDD1 and DDD2 protocols. These results suggest the potential utility of the DDD protocol to rapidly identify tumor-selective genes. An electronic Northern (sources of cDNAs) and cytogenetic map position for each of the hits was created from the UniGene page (data not shown). The above-mentioned DDD approach also resulted in the identification of up-regulated novel ESTs (data not shown). The vast majority of the ESTs were present in an organ- and tumor type-dependent manner. The results of these hits and additional hits (>5-fold), including the novel ESTs, can be viewed at the web site.⁴
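The pool definitions for DDD1 and DDD2 can be made concrete with a small sketch. The `Library` record, the `build_pools` function, and the example library names are hypothetical; the actual selection was made by hand in the CGAP library browser, so this only illustrates the selection logic described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    name: str    # hypothetical library identifier
    organ: str   # e.g. "colon"
    state: str   # "normal", "pretumor", or "tumor"


def build_pools(libraries, organ, method):
    """Return (pool_a, pool_b) for one organ under the DDD1 or DDD2 comparison."""
    pool_a = [lib for lib in libraries
              if lib.organ == organ and lib.state == "tumor"]
    if method == "DDD2":
        # Tumor versus the corresponding normal tissue only.
        pool_b = [lib for lib in libraries
                  if lib.organ == organ and lib.state == "normal"]
    elif method == "DDD1":
        # Tumor versus all other libraries, including the corresponding normal;
        # pretumor libraries are left out and analyzed separately.
        pool_b = [lib for lib in libraries
                  if lib not in pool_a and lib.state != "pretumor"]
    else:
        raise ValueError(f"unknown method: {method}")
    return pool_a, pool_b


libraries = [
    Library("colon_T1", "colon", "tumor"),
    Library("colon_N1", "colon", "normal"),
    Library("colon_P1", "colon", "pretumor"),
    Library("lung_T1", "lung", "tumor"),
    Library("lung_N1", "lung", "normal"),
]
pool_a, pool_b = build_pools(libraries, "colon", "DDD1")
print([lib.name for lib in pool_a], [lib.name for lib in pool_b])
```

Because DDD1 compares against other organs as well, a hit found by both methods is both tumor-selective and organ-selective, whereas a DDD2-only hit distinguishes tumor from normal within one organ.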

Electronic Profiling of Down-Regulated Genes in Solid Tumors.

Using the same DDD protocol (DDD1 and DDD2), a list of genes that are down-regulated by >10-fold was also compiled (Table 2). The majority of the down-regulated genes were discovered by the DDD2 protocol (tumor versus normal). Similar to the above-mentioned results, distinct members of ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins were discovered to be selectively down-regulated in an organ- and tumor type-dependent manner (N = 34). A complete listing of all the genes that are down-regulated (>5-fold) with their E-Northern and cytogenetic map position can be accessed at the web site.⁴

Electronic Prediction of Tumor-selective Genes by DDD.

Using the DDD1 and DDD2 protocols, a list of known genes that are predicted to be either present or absent in the tumor types (plus/minus differences) was compiled (Table 3). Fourteen genes were found to be present exclusively in the select solid tumor-derived libraries (Table 3). These included mammaglobin in breast, androgen receptor in prostate, γ-glutamyl transferase in lung, and neurotensin receptor in pancreas. Seventeen genes were found to be selectively absent in the solid tumors analyzed (Table 4). These included semenogelin in the prostate, apolipoprotein in the breast, and islet amyloid polypeptide in the pancreas.
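The three kinds of hits reported in Tables 1 to 4 (up-regulated, down-regulated, and plus/minus) can be told apart from the two pool fractions alone. The sketch below is an illustrative reading of those categories, not the tool's own logic; the function name, the label strings, and the example fractions are hypothetical, and the significance test is omitted for brevity.

```python
def classify_hit(tumor_fraction, comparison_fraction, fold_cutoff=10.0):
    """Label one UniGene cluster from its EST fractions in the tumor pool
    and in the comparison pool."""
    if tumor_fraction == 0 and comparison_fraction == 0:
        return "not detected"
    if comparison_fraction == 0:
        return "present only in tumor (+/-)"   # as in Table 3
    if tumor_fraction == 0:
        return "absent in tumor (+/-)"         # as in Table 4
    if tumor_fraction / comparison_fraction > fold_cutoff:
        return "up-regulated"                  # as in Table 1
    if comparison_fraction / tumor_fraction > fold_cutoff:
        return "down-regulated"                # as in Table 2
    return "below cutoff"


# Hypothetical fractions of ESTs mapping to a cluster in each pool.
for pair in [(0.003, 0.0002), (0.0001, 0.002), (0.001, 0.0), (0.0, 0.001)]:
    print(pair, "->", classify_hit(*pair))
```

Lowering `fold_cutoff` to 5 corresponds to the additional >5-fold hits that the text says are listed on the web site.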

Test Validation of DDD Hits by RT-PCR.

The electronic prediction by DDD was tested for expression relevance using cDNAs from a matched set of normal and tumor colon tissues (Fig. 1). Twelve known genes that showed plus/minus differences in the colon DDD analysis (Table 3 and Table 4) were analyzed by RT-PCR. Three of these 12 genes showed concordance with the DDD prediction. The results for these three genes are shown in Fig. 1. Creatine kinase (DDD2) expression was detected in the normal colon-derived but not the tumor colon-derived cDNAs, whereas guanylate cyclase (DDD7) and ETS variant gene (DDD12) were expressed in the same tumor but not in the normal colon-derived cDNAs. RT-PCR analysis of the remaining nine genes did not show the DDD-predicted specificity of expression (data not shown). Furthermore, analysis of 20 novel ESTs predicted to be specific for colon tumors by DDD demonstrated two of the ESTs to be specifically expressed in the tumor, consistent with the prediction of DDD (data not shown). The authenticity of the RT-derived products was established using RT-minus reactions. These results demonstrated that it is possible to rapidly validate the electronic prediction by DDD.
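The validation counts above translate into concordance rates as follows. This is only arithmetic on the numbers reported in the text (3 of 12 known genes, 2 of 20 novel ESTs); the function name is hypothetical, and the rates come from a single matched tissue pair.

```python
from fractions import Fraction


def concordance(confirmed, tested):
    """Fraction of DDD predictions confirmed by RT-PCR."""
    return Fraction(confirmed, tested)


known = concordance(3, 12)    # known genes with plus/minus differences
novel = concordance(2, 20)    # novel ESTs predicted to be colon tumor-specific

print(f"known genes: {float(known):.0%}")
print(f"novel ESTs:  {float(novel):.0%}")
```

That is, 25% of the tested known genes and 10% of the tested novel ESTs behaved as predicted, which is the practical meaning of the false-positive caveat raised in the discussion.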

Discovery of cancer genes is a major challenge facing cancer research (9, 13, 15). It is becoming increasingly clear that multiple genetic alterations are responsible for cancer development. The task of identifying cancer genes is complex due to the fact that only a subset of genes within a cell population is expressed. Whereas in the past, cancer gene discovery followed conventional methods such as mapping the gene, loss of heterozygosity, model organism studies, and so forth, the future of cancer gene discovery has to be able to make effective use of the large number of genes (predicted to be around 100,000 per cell) that will emerge from the human genome sequencing efforts (1, 2, 3, 4). These sequences are being deposited in the vast sequence databases, most of which are in the public domain. For cancer-specific gene discovery, the CGAP database of the NCI provides a comprehensive collection not only of expressed sequences in the form of ESTs but also of various data-mining tools to analyze the ESTs. The basis of the CGAP database is establishment of the molecular anatomy of the cancer cell by determining the repertoire of genes that are expressed in the cancer in a quantitative manner, so that a fingerprint can be established. By comparing the fingerprints with the normal cells, it should be possible to short list genes that are present in the cancer cells, which can be followed up for relevance.

High-throughput gene expression techniques (microarrays, Genechips) to identify cancer-specific genes are becoming available (5, 6, 7, 8); however, the technology is not cost effective for average laboratories. Furthermore, this method introduces a bias in that only a limited number of samples, usually one tumor- and one normal-derived RNA, can be used for the initial analysis. In view of the pharmacogenomic gene expression profile differences that are usually seen in different patients (8, 9), direct use of this technique may become limiting. In this context, data-mining the databases provides a parallel approach to rapidly establish transcript-based fingerprinting.

The CGAP database currently offers three different data-mining tools: (a) X-profiling; (b) SAGE analysis; and (c) DDD. The X-profiling method enables identification of mostly novel ESTs that are present or absent in a given library, but it does not permit quantitation. The SAGE and DDD options enable quantitation of transcript levels and identify both known and novel ESTs. The SAGE database was a recent addition to the CGAP database (10); currently, SAGE libraries are available for only a few organs. Hence, to establish a proof of concept of cancer gene discovery by data-mining, we chose the DDD protocol.

A database of ESTs, both known and novel, that are differentially expressed >10-fold was created for six major solid tumor types. An electronic Northern based on cDNA sources as well as the cytogenetic map position was created for each of these hits. The vast sequence database (1.5 million ESTs) was thus reduced to approximately 200 known genes and 500 novel ESTs for these six organ-derived tumor types. When these hits were subdivided into major classes of genes, the number of known hits was significantly reduced (10–30 per organ type). The majority of the hits were classifiable as ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, or immunoglobulins. Known organ type-specific genes were identified, including prostate-specific antigens, prostatic acid phosphatase, and androgen receptor for prostate; pancreatic enzymes such as islet amyloid polypeptide, lipases, elastases, and carboxypeptidases for pancreas; and mammaglobin for breast. Our recent demonstration of pancreatic tumor-specific expression of neurotensin receptor (20) was also corroborated by the DDD results.

Distinct ribosomal proteins were found to be up- and down-regulated in ovary-, pancreas-, and prostate cancer-derived libraries by DDD. A recent report (17) supports the selective involvement of the ribosomal proteins in prostate and breast tumors. Subtractive hybridization of prostatic hyperplasia from prostate tumors identified distinct ribosomal proteins (L4, L5, L7a, L23a, L30, L37, S14, and S18). Furthermore, L23a and S14 levels were shown to be elevated in PC-3 prostate carcinoma cells in comparison to normal prostatic epithelial cells (17). In addition, mutant p53 expression has been shown to induce overexpression of selective ribosomal proteins (17). These results demonstrate the utility of DDD for rapid cancer gene discovery.

Electronic expression profiling to identify cancer-specific genes is likely to have false positives. However, the true lead genes can be rapidly validated by RT-PCR using appropriate cDNAs. RT-PCR validation of test genes indicated an expression profile consistent with the DDD prediction for a subset of them. We chose 12 known genes predicted to be either up- or down-regulated in colon tumors by DDD. Three out of these 12 genes showed the expected specificity of expression. Evidence in the literature supports these three hits in select cancer types. Creatine kinase isoforms have been shown to be lost in colon carcinomas (18), consistent with our DDD findings. Similarly, the guanylate cyclase C isoform has recently been shown to be a biomarker for advanced colon carcinomas (19). The selective up-regulation of the ETS variant gene seen in the colon tumor, if validated in a large sample size, provides a novel diagnostic and therapy target for colon carcinomas. The ETS gene belongs to the ETS oncogene family (16), which has DNA binding ability. In addition, the DDD protocol predicted the expression specificity of neurotensin receptor in pancreatic tumors. We have recently identified this gene by independently data-mining the UniGene database and showed the expression specificity in the pancreatic tumors (20). Identification by the DDD protocol of several known genes that have already been shown to be of diagnostic and therapeutic value in prostate, breast, and pancreatic tumors suggests that it is possible to discover novel genes from the list of novel ESTs that we have uncovered. In support of this, our preliminary results with 20 colon-specific novel ESTs have led to the identification of 2 ESTs showing specificity of expression to the colon tumors. These novel ESTs can be subjected to additional data-mining tools [e.g., comparison with SAGE and X-profiling; contig construction to expand the sequences and motif recognition; and electronic Northern (14)] to further reduce the numbers before laboratory validation with relevant cDNAs.

The ability to reduce the number of hits from the vast and rapidly growing amount of sequence information in the sequence database is crucial to efficient gene discovery. The results presented in this report support the starting premise that by data-mining the CGAP database, it is possible to rapidly short list both known and novel ESTs for immediate follow-up studies. The database of hits we have generated using DDD for six different solid tumor types should provide a rapid starting point for discovery of both diagnostic and therapy targets.

We thank the CGAP database for providing access and the data-mining tools used in this study. We thank Jeanine Narayanan for editorial assistance.

Fig. 1.

RT-PCR validation of test hits identified by DDD. Three hits (creatine kinase MM-DDD2, guanylate cyclase 2C-DDD7, and ETS variant gene 3-DDD12) from Table 3 were chosen for expression specificity using a matched set of normal- and tumor colon-derived random primed cDNAs synthesized in the presence (RT+) or absence (RT−) of reverse transcriptase. NEG, template-minus negative control.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ Supported by an institutional startup grant (to R. N.). D. S. was supported by a grant from the Boca Raton Community Hospital Foundation. Colon tumor and normal tissues were provided by the Cooperative Human Tissue Network, which is funded by the National Cancer Institute.

³ The abbreviations used are: EST, expressed sequence tag; CGAP, Cancer Gene Anatomy Project; DDD, Digital Differential Display; RT, reverse transcription; Hs., human sequence; NCI, National Cancer Institute.

⁴ http://www.fau.edu/cmbb/publications/cancergenes.htm.

⁵ http://www.cgap.gov.

⁶ http://www.ncbi.nlm.nih.gov/UniGene/.

⁷ http://www.ncbi.nlm.nih.gov/Omin/ and the GeneCards site http://bioinformatics.weizmann.ac.il/cards/.

⁸ http://www.genome.wi.mit.edu//cgi-bin/primer/primer3_www.cgi.

Table 1

DDD of up-regulated genes in tumors

Hits (known ESTs) showing >10-fold differences in the indicated tumor-derived libraries in comparison with normal tissue-derived cDNA library and all other organ- and tumor-derived cDNA libraries were compiled. ESTs belonging to specific classes of genes were subclassified as indicated. UniGene number for each hit is shown. Electronic expression (E-Northern) for each of these hits was inferred from the UniGene database from the cDNA sources, and the chromosomal map position for each of these hits was inferred from the cytogenetic map.⁴

| Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
|---|---|---|---|---|---|---|---|
| Enzymes (N = 35) | | | | | | | |
| CYB561 | 153028 | 29 | | | | | |
| P450, subfamily XVII | 1363 | 26 | | | | | |
| ATP synthase, isoform 2 | 155751 | 20 | | | | | |
| Proteosome non-ATPase, 7 | 155543 | 10 | | | 18 | | |
| Glu. peroxidase 2a | 2702 | | 17 | | | | |
| Carbonic anhydrase I | 23118 | | 18 | | | | |
| Dopa decarboxylase | 150403 | | | 10 | | | |
| Tryptophan hydroxylase | 144563 | | | 27 | | | |
| ATPase, lysosomal | 24322 | | | | 13 | | |
| Serine protease 9 | 79361 | | | | 28 | | |
| Serine-like protease, 1 | 69423 | | | | 25 | 28 | |
| Myosin IXBB | 159629 | | | | 17 | | |
| ATPase isoform 1 | 64173 | | | | 15 | | |
| ADH 8 | 87539 | | | | 15 | | |
| ATPase, 9 kD | 24322 | | | | 13 | | |
| PKR | 177574 | | | | 10 | | |
| Cathepsin E | 1355 | | | | | 80 | |
| Panc. lipase | 102876 | | | | | 47 | |
| Ser. protease 2 | 241561 | | | | | 39 | |
| Ser. protease 1 | 241395 | | | | | 34 | |
| Elastase 1 | 21 | | | | | 38 | |
| Carboxy peptidase A2 | 89717 | | | | | 34 | |
| Carboxy peptidase A1 | 2879 | | | | | 33 | |
| Elastase 3 | 181289 | | | | | 32 | |
| Chymotrypsinogen B1 | 74502 | | | | | 32 | |
| Pancreatic lipase-related protein 1 | 73923 | | | | | 30 | |
| Carboxy peptidase B1 | 180884 | | | | | 28 | |
| DNase II | 118243 | | | | | 14 | |
| Elastase 3B | 183864 | | | | | 13 | |
| Urokinase | 77274 | | | | | 13 | |
| Ribosomal proteins (N = 6) | | | | | | | |
| S15a | 2953 | 11 | | | | | |
| S17 | 5174 | | | | | | 11 |
| S19 | 126707 | 16 | | | | | |
| L31 | 184014 | | | | | | 10 |
| L35 | 182825 | | | | | | 10 |
| L41 | 108124 | | | | | | 10 |
| Receptor/surface/membrane (N = 17) | | | | | | | |
| Claudin | 25640 | | 33 | | | | |
| Tumor necrosis factor receptor, 6b | 194676 | | 17 | | | | |
| Plakophilin 3 | 26557 | | 11 | | | | |
| Lymphocyte antigen complex | 77667 | | | | 17 | | |
| Mesothelin | 155981 | | | | 40 | | |
| Chloride channel 1A | 84974 | | | | 12 | | |
| Folate receptor 1 | 73769 | | | | 10 | | |
| CD74 | 84298 | | | | 10 | | |
| Keratin 19 | 182265 | | | | | 24 | |
| CEA | 220529 | | | | | 21 | |
| Keratin 17 | 2785 | | | | | 17 | |
| Trans membrane 4 superfamily 3 | 84072 | | | | | 13 | |
| Trans membrane 4 superfamily 4 | 11881 | | | | | 17 | |
| Non-specific cross-reacting antigen | 73848 | | | | | 61 | |
| T-cell receptor γ | 112259 | | | | | | 95 |
| IL-17 receptor | 129751 | | | | | | 43 |
| ERBB3 | 199067 | | | | | | 15 |
| Binding proteins (N = 24) | | | | | | | |
| Myosin | 170482 | 236 | | | | | |
| PDEF | 79414 | 194 | | | | | |
| Ubiquitin binding, p62 | 182248 | 27 | | | | | |
| HSFBP 1 | 158675 | 25 | | | | | |
| H2A histone | 795 | 23 | | | | | |
| Ald. dehydrogenase 6 | 75746 | | | | | 11 | |
| PSA/KLK2 | 181350 | | | | | | 77 |
| PSA/KLK3 | 171995 | | | | | | 53 |
| Acid phosphatase | 1852 | | | | | | 61 |
| Bruton’s tyrosine kinase | 178391 | | | | | | |
| Treacher Collins-Franceschetti syndrome 1 | 172727 | 18 | | | 17 | | |
| ATP binding cassette, subfamily B, member 6 | 107911 | 14 | | | | | |
| Homeobox A9 | 127428 | | 22 | | | | |
| Insulinoma-associated 1 | 89584 | | | 48 | | | |
| (TITF1) | 107764 | | | 34 | | | |
| ATP-binding cassette, subfamily C, 8 | 54470 | | | 30 | | | |
| I3 protein | 75922 | | | 16 | | | |
| H1 histone | 109804 | | | 15 | | | |
| Lectin binding (galectin 6) | 79339 | | | | 13 | | |
| LIM domain | 79691 | | | | 10 | | |
| Retinoic acid binding 2 | 183650 | | | | 10 | | |

Table 2

DDD of down-regulated genes in tumors

Hits showing >10-fold differences in normal organ-derived cDNA library in comparison with tumor-derived and other organ-derived libraries were compiled. The hits are classified as shown in Table 2.⁴

| Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
|---|---|---|---|---|---|---|---|
| Enzymes (N = 7) | | | | | | | |
| Creatine kinase, brain | 173724 | | 10 | | | | |
| TIMP 1ᵃ | 5831 | | | 10 | | | |
| Myosin light kinase | 211582 | | | 13 | | | |
| Superoxide dismutase 2 | 177781 | | | | | 20 | |
| Aldo-keto reductase | 78183 | | | | | 16 | |
| Amylase, α 2A | 75733 | | | | | 15 | |
| Elastase 3B | 183864 | | | | | 18 | |
| Receptor/surface/membrane (N = 8) | | | | | | | |
| Phospholemman-like | 92323 | | 15 | | | | |
| CEA | 220529 | | 14 | | | | |
| MHC class I, E | 181392 | | | 21 | | | |
| Surfactant C | 1074 | | | 19 | | | |
| Surfactant A1 | 177582 | | | 34 | | | |
| SB class HC antigen α | 914 | | | 15 | | | |
| Surfactant B | 76305 | | | 14 | | | |
| CD44 | 169610 | | | | | 22 | |
| Secretory proteins (N = 9) | | | | | | | |
| HB α 1 | 75792 | 12 | 140 | 13 | | | 30 |
| Hb γ A | 182167 | | 12 | 11 | | 22 | |
| Albumin | 75442 | | 29 | | 58 | 10 | |
| Placental lactogen | 75984 | | 39 | | | | |
| Hb β | 155376 | | 11 | | | | |
| TGFBI | 118787 | | | 12 | | | |
| CTGF | 75511 | | | 17 | | | |
| PLAB | 116577 | | | | 21 | | |
| Interleukin 8 | 624 | | | | | 301 | |
| Ribosomal proteins (N = 6) | | | | | | | |
| L9 | 157850 | | | | 16 | | |
| L17 | 82202 | | | | 11 | | |
| L21 | 184108 | | | | 15 | | |
| L23 | 234518 | | | | 15 | | |
| L37A | 184109 | | | | | 23 | |
| S19 | 126707 | | | | | 11 | |
| Binding proteins (N = 13) | | | | | | | |
| Myosin | 9615 | 35 | | | | | |
| Crystallin α B | 1940 | 19 | | | | | |
| SNC73 protein | 32225 | 10 | 12 | | | | |
| Galectin 4 | 5302 | | 18 | | | | |
| Myosin | 929 | | 20 | | | | |
| Pleckstrin | 77436 | | | 70 | | | |
| Actin, β | 180952 | | | 14 | | | |
| α-2-Macroglobulin | 74561 | | | 13 | | | |
| EIF4A1 | 129673 | | | 10 | | | |
| IGFBP5 | 103391 | | | | 83 | | |
| Transgelin | 75777 | | | | 14 | | |
| Karyopherin α 4 | 119500 | | | | 11 | | |
| TYR-MAP-β | 182238 | | | | | 10 | |
| Cell adhesion molecules (N = 2) | | | | | | | |
| Fibronectin | 118162 | | | 15 | | | |
| Vimentin | 2064 | | | | 12 | | |
| Immunoglobulins (N = 2) | | | | | | | |
| FCGRT | 160741 | | 11 | | | | |
| Ig κ | 156110 | | 10 | | | | |

ᵃ TIMP, tissue inhibitor of metalloproteinase 1; CEA, carcinoembryonic antigen; HB, hemoglobin; TGFB, transforming growth factor β; PLAB, prostate differentiation factor; SNC73, immunoglobulin heavy constant α 1; EIF, eukaryotic translation initiation factor; IGFBP, insulin-like growth factor-binding protein; TYR-MAP, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein; FCGRT, Fc fragment of IgG receptor.

Table 3

DDD to identify hits present in the tumor types

From the hits of Tables 2 and 3, ESTs present only in the tumor were compiled.⁴

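| Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
|---|---|---|---|---|---|---|---|
| Mammaglobin 1 | 46452 | +/− | | | | | |
| Guanylate cyclase 2C | 1085 | | +/− | | | | |
| Homeobox trans. factor 1ᵃ | 1545 | | +/− | | | | |
| Homeobox trans. factor 2 | 77399 | | +/− | | | | |
| Cyto keratin 20 | 84905 | | +/− | | | | |
| Ets variant | 179214 | | +/− | | | | |
| EGR4 | 3052 | | | +/− | | | |
| Dopamin rec. D2 | 73893 | | | +/− | | | |
| γ-glutamyl-transferase 2 | 211824 | | | +/− | | | |
| V κ 2 | 249231 | | | | +/− | | |
| Neurotensin receptor 1 | 110642 | | | | | +/− | |
| Transcobalamin I | 2012 | | | | | +/− | |
| Androgen receptor | 99915 | | | | | | +/− |
| Specific granular protein | 54431 | | | | | | +/− |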
ᵃ Trans, transcription; EGR, early growth response.

Table 4

DDD to identify hits absent in the tumor types

From the hits of Tables 2 and 3, ESTs absent only in the tumor were compiled.⁴

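| Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
|---|---|---|---|---|---|---|---|
| Apolipoprot. D | 75736 | +/− | | | | | |
| Albumin | 75442 | +/− | | +/− | | | |
| Troponin T1 | 73980 | | +/− | | | | |
| Titin | 172004 | | +/− | | | | |
| Creatine kinase | 118843 | | +/− | +/− | +/− | | |
| Actin α 1 | 1288 | | +/− | +/− | +/− | | |
| Myoglobin | 118836 | | +/− | +/− | | | |
| Vinculin | 75350 | | | +/− | | | |
| Myosin, heavy | 929 | | | +/− | | | |
| p53-response gene 2 | 118893 | | | +/− | | | |
| Placental lactogen | 75984 | | | +/− | +/− | +/− | |
| HB βᵃ | 155376 | | | | +/− | | |
| HB α 1 | 75792 | | | | +/− | +/− | |
| Islet amyloid polypeptide | 142255 | | | | | +/− | |
| Tissue factor pathway inhibitor 2 | 78045 | | | | | +/− | |
| IGF2 | 182167 | | | | | | +/− |
| Semenogelin I | 1968 | | | | | | +/− |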
ᵃ HB, hemoglobin; IGF, insulin-like growth factor.

Table 1A

Continued

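| Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
|---|---|---|---|---|---|---|---|
| SNC73 protein | 32225 | | | | | 12 | |
| Actin-related protein 2/3 complex, subu. 1B | 11538 | | | | | 11 | |
| Lumican | 79914 | | | | | 10 | |
| Thrombospondin 2 | 108623 | | | | | 10 | |
| Ca+ integrin BP | 10803 | | | | | 10 | |
| ELK4, ETS-domain protein | 169241 | | | | | | 60 |
| Myosin BP-C | 181368 | | | | | | 10 |
| Fatty acid BP-5 | 153179 | | | | | | 15 |
| Secretory proteins (N = 16) | | | | | | | |
| Trefoil factor 1 | 1406 | | 22 | | | | |
| Mucin 2 | 315 | | 22 | | | | |
| Pancreatitis-associated protein | 423 | | 18 | | | | |
| H. sapiens gallbladder mucin MUC5B mRNA | 102482 | | 11 | | | | |
| Epididymis-specific, whey-acidic protein type | 2719 | | | | 27 | | |
| Lutheran blood group | 155048 | | | | 21 | | |
| Progestagen-associated protein | 82269 | | | | 17 | | |
| TNF, 13 | 54673 | | | | 10 | | |
| Glucagon | 1460 | | | | | 93 | |
| Mucin 5 | 103707 | | | | | 84 | |
| Islet-derived 1β protein | 4158 | | | | | 51 | |
| Islet-derived 1α protein | 1032 | | | | | 21 | |
| Secreted frizzled-related protein 4 | 105700 | | | | | 17 | |
| Mucin 1, transmembrane | 89603 | | | | | 10 | |
| Microseminoprotein β | 183752 | | | | | | 37 |
| Small inducible cytokine | 153423 | | | | | | 18 |
| Immunoglobulins (N = 4) | | | | | | | |
| FCGRT | 160741 | 26 | | | | | |
| Ig λ gene cluster | 181125 | 10 | | | | | |
| Ig super family | 102171 | | | | 15 | | |
| Ig γ 3 | 140 | | | | | | 52 |
| Cell adhesion molecules (N = 6) | | | | | | | |
| Integrin β 5 | 149846 | 11 | | | | | |
| Collagen type XIII, α 1 | 211933 | | | | 23 | | |
| Laminin α 5 | 11669 | | | | 10 | | |
| Laminin β 3 | 75517 | | | | | 24 | |
| Laminin γ 2 | 54451 | | | | | 20 | |
| Collagen Type X α 1 | 179729 | | | | | 30 | |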
ᵃ Glu, glutathione; DOPA, aromatic L-amino acid decarboxylase; ADH, aldehyde dehydrogenase; PK, protein kinase; ser, serine; CEA, carcinoembryonic antigen; HSFBP, heat shock factor-binding protein; BP, binding protein; ald, aldehyde; PSA, prostate-specific antigen; TNF, tumor necrosis factor; FCGRT, Fc fragment of IgG receptor; I3, brain protein I3.

1
Robbins, R. J. Bioinformatics: essential infrastructure for global biology. J. Computat. Biol., 3: 465–478, 1996.
2
Andrade, M. A., and Sander, C. Bioinformatics: from genome data to biological sciences. Curr. Opin. Biotechnol., 8: 675–683, 1997.
3
Collins, F. S., Patrinos, A., Jordan, E., Chakravarti, A., Gesteland, R., and Walters, L. New goals for the U.S. Human Genome Project: 1998–2003. Science (Washington, DC), 282: 682–689, 1998.
4
Fannon, M. R. Gene expression in normal and disease states: identification of therapeutic targets. Trends Biotechnol., 14: 294–298, 1996.
5
Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14: 1675–1680, 1996.
6
Heller, R. A., Schena, M., Chai, A., Shalon, D., Bedilion, T., Gilmore, J., Woolley, D. E., and Davis, R. W. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc. Natl. Acad. Sci. USA, 94: 2150–2155, 1997.
7
Khan, J., Bittner, M. L., Saal, L. H., Teichmann, U., Azorsa, D. O., Gooden, G. C., Pavan, W. J., Trent, J. M., and Meltzer, P. S. cDNA microarrays detect activation of a myogenic transcription program by the PAX3-FKHR fusion oncogene. Proc. Natl. Acad. Sci. USA, 23: 13264–13269, 1999.
8
Elek, J., Park, K. H., and Narayanan, R. Microarray-based expression profiling in prostate tumors. In Vivo, 14: 173–182, 2000.
9
Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R., Vogelstein, B., and Kinzler, K. W. Gene expression profiles in normal and cancer cells. Science (Washington, DC), 276: 1268–1272, 1997.
10
Lal, A., Lash, A. E., Altschul, S. F., Velculescu, V., Zhang, L., McLendon, R. E., Marra, M. A., Prange, C., Morin, P. J., Polyak, K., Papadopoulos, N., Vogelstein, B., Kinzler, K. W., Strausberg, R. L., and Riggins, G. J. A public database for gene expression in human cancers. Cancer Res., 59: 5403–5407, 1999.
11
Nacht, M., Ferguson, A. T., Zhang, W., Petroziello, J. M., Cook, B. P., Yu, H. G., Maguire, S., Riley, D., Coppola, G., Landes, G. M., Madden, S. L., and Sukumar, S. Combining serial analysis of gene expression and array technologies to identify genes differentially expressed in breast cancer. Cancer Res., 59: 5464–5470, 1999.
12
Wheeler, D. L., Chappey, C., Lash, A. E., Leipe, D. D., Madden, T. L., Schuler, G. D., Tatusova, T. A., and Rapp, B. A. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 28: 10–14, 2000.
13
Strausberg, R. L., Dahl, C. A., and Klausner, R. D. New opportunities for uncovering the molecular basis of cancer. Nat. Genet., Spec No 17: 415–416, 1997.
14
Schmitt, A. O., Specht, T., Beckmann, G., Dahl, E., Pilarsky, C. P., Hinzmann, B., and Rosenthal, A. Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res., 27: 4251–4260, 1999.
15
Szallasi, Z. Bioinformatics. Gene expression patterns and cancer. Nat. Biotechnol., 16: 1292–1293, 1998.
16
Klemsz, M., Hromas, R., Raskind, W., Bruno, E., and Hoffman, R. PE-1, a novel ETS oncogene family member, localizes to chromosome 1q21–q23. Genomics, 20: 291–294, 1994.
17
Vaarala, M. H., Porvari, K. S., Kyllonen, A. P., Mustonen, M. V., Lukkarinen, O., and Vihko, P. T. Several genes encoding ribosomal proteins are over-expressed in prostate-cancer cell lines: confirmation of L7a and L37 over-expression in prostate-cancer tissue samples. Int. J. Cancer, 78: 27–32, 1998.
18
Joseph, J., Cardesa, A., and Carreras, J. Creatine kinase activity and isoenzymes in lung, colon and liver carcinomas. Br. J. Cancer, 76: 600–605, 1997.
19
Cagir, B., Gelmann, A., Park, J., Fava, T., Tankelevitch, A., Bittner, E. W., Weaver, E. J., Palazzo, J. P., Weinberg, D., Fry, R. D., and Waldman, S. A. Guanylyl cyclase C messenger RNA is a biomarker for recurrent stage II colorectal cancer. Ann. Intern. Med., 131: 805–812, 1999.
20
Elek, J., Pinzon, W., Park, K. H., and Narayanan, R. Relevant genomics of neurotensin receptor in cancer. Anticancer Res., 20: 53–58, 2000.