Abstract
The Cancer Gene Anatomy Project database of the National Cancer Institute contains thousands of expressed sequences, both known and novel, in the form of expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Using a data-mining tool called Digital Differential Display (DDD) from the Cancer Gene Anatomy Project database, we analyzed ESTs from six solid tumor types (breast, colon, lung, ovary, pancreas, and prostate) for differential expression. An electronic expression profile and the chromosomal map position of each hit were obtained from the UniGene database. The differentially expressed hits fell into major gene classes, including ribosomal proteins, enzymes, cell surface molecules, secretory proteins, adhesion molecules, and immunoglobulins. DDD recovered genes already known to be up-regulated in prostate, breast, and pancreatic carcinomas, demonstrating the utility of the technique. In total, 200 known genes and 500 novel sequences were found to be differentially expressed in these tumor-derived libraries. A subset of test genes was validated for expression specificity by reverse transcription-PCR, providing a proof of concept for gene discovery by DDD. A comprehensive database of hits can be accessed at http://www.fau.edu/cmbb/publications/cancergenes.htm. This solid tumor DDD database should facilitate target identification for cancer diagnostics and therapeutics.
Introduction
With the expected completion of the human genome sequencing efforts in the next few years, over 100,000 new genes are likely to be discovered (1, 2, 3). From these vast numbers of new genes, new diagnostic and therapeutic targets for diseases like cancer are predicted to emerge (4). Only a subset of genes is expressed in a given cell, and the level of expression governs function. High-throughput technologies such as microarrays and gene chips are making it possible to analyze the expression of large numbers of sequences in diseased and normal tissues (5, 6, 7, 8). A parallel way to initiate a search for genes relevant to cancer diagnostics and therapy is to data-mine the sequence databases (9, 10, 11, 12, 13). Large numbers of expressed sequences from diverse organ-, species-, and disease-derived cDNA libraries are being deposited in the form of ESTs³ in different databases.
The CGAP database of the NCI is an attractive starting point for cancer-specific gene discovery (13). The NCI initiated the Human Tumor Gene Index in 1997 with the primary goal of identifying genes expressed during the development of human tumors at five major cancer sites: (a) breast; (b) colon; (c) lung; (d) ovary; and (e) prostate. This database holds expression (mRNA) information for thousands of known and novel genes in diverse normal and tumor tissues. By monitoring the electronic expression profiles of these sequences, it is possible to compile a list of genes that are selectively expressed in cancers. Data-mining tools are becoming available to extract expression information about the ESTs derived from the various CGAP libraries (9, 10, 12, 14). The CGAP database currently contains 1.5 million ESTs, of which 73,000 are novel sequences. These sequences are further subclassified according to whether they derive from libraries of normal, precancerous, or cancer tissues. We used the DDD tool of the CGAP database to identify genes (both known and novel ESTs) that are selectively up- or down-regulated in six major solid tumor types (breast, colon, lung, ovary, pancreas, and prostate). Survey sequencing of mRNA gene products provides an indirect means of generating gene expression fingerprints for cancer cells and their normal counterparts, and DDD is a computational method for comparing these fingerprints. DDD is quantitative: it applies a statistical test to transcript counts to determine the fold difference between the libraries being compared. ESTs present in tumor-derived libraries were compared by DDD against all other libraries or against the corresponding normal libraries, and hits showing >10-fold differences were compiled for each organ type. These hits were then classified functionally into major protein classes.
Genes encoding ribosomal proteins, enzymes, receptors, binding proteins, secretory proteins, and cell adhesion molecules were found to be differentially expressed in these tumor types. We created a comprehensive database of the hits that also includes their electronic expression data and the novel ESTs identified. This database can be accessed on the World Wide Web.⁴
Materials and Methods
Data-mining of CGAP Database.
The CGAP database was accessed,⁵ and the DDD tool was used according to the database instructions. DDD makes use of the UniGene database by comparing the number of times ESTs from different libraries were assigned to a particular UniGene cluster. EST libraries from six solid tumor types (breast, colon, lung, ovary, pancreas, and prostate), together with the corresponding normal tissue-derived libraries, were chosen for DDD (N = 110). To identify tumor- and organ-specific ESTs, all the other organ- and tumor-derived EST libraries (N = 327) were chosen for comparison with each of the six tumor types. The nature of each library (normal, pretumor, or tumor) was authenticated by comparing the CGAP data with the UniGene database;⁶ the few libraries whose definitions differed between the two databases were excluded. DDD was performed separately for each organ type using the online tool, in two ways: for the DDD2 method, pool A contained the tumor-derived libraries and pool B the corresponding normal organ-derived libraries; for the DDD1 method, pool A again contained the tumor-derived libraries and pool B all other organ- and tumor-derived cDNA libraries, including the corresponding normal ones. For each UniGene cluster, the output gave a numerical value in each pool denoting the fraction of sequences within that pool that mapped to the cluster, displayed as a dot whose intensity reflects that value. Fold differences were calculated as the ratio of pool A to pool B. Statistically significant hits (Fisher’s exact test) showing >10-fold differences were compiled into a preliminary database. Hits were classified into major families using information from two web sites,⁷ and novel ESTs were compiled into a separate database. The UniGene database was accessed to establish an electronic expression profile (E-Northern) for each hit to facilitate tumor- and organ-selective gene discovery, and the cytogenetic map position of each hit was also inferred from its UniGene page.
A final database of ESTs that were up-regulated or down-regulated, or that showed absolute (+/−) differences, in the six tumor types was created.⁴
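The fold-difference and significance filter described above can be sketched in Python. This is a minimal illustration, not the CGAP implementation: it assumes each UniGene cluster is summarized by four hypothetical counts (cluster ESTs and total ESTs in pool A and in pool B), implements a two-sided Fisher’s exact test with the standard library, and assumes a significance cutoff of 0.05, which the text does not state. Clusters with no ESTs in one pool are reported as absolute (+/−) differences rather than given a finite fold value.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are pool A and pool B; columns are ESTs in the cluster and all
    other ESTs. With the margins fixed, the count in cell a follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row_a, row_b = a + b, c + d
    in_cluster = a + c
    denom = math.comb(row_a + row_b, in_cluster)

    def prob(x: int) -> float:
        # probability that x of the cluster's ESTs fall in pool A
        return math.comb(row_a, x) * math.comb(row_b, in_cluster - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, in_cluster - row_b), min(row_a, in_cluster)
    # the small relative tolerance keeps tables tied with the observed one
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def classify_cluster(hits_a: int, total_a: int, hits_b: int, total_b: int,
                     min_fold: float = 10.0,
                     alpha: float = 0.05) -> tuple[str, float, float]:
    """Return (call, fold, p) for one UniGene cluster; fold is pool A:pool B."""
    frac_a = hits_a / total_a  # fraction of pool A ESTs mapping to the cluster
    frac_b = hits_b / total_b  # fraction of pool B ESTs mapping to the cluster
    if frac_b > 0:
        fold = frac_a / frac_b
    else:
        fold = math.inf if frac_a > 0 else math.nan
    p = fisher_exact_two_sided(hits_a, total_a - hits_a, hits_b, total_b - hits_b)
    if p >= alpha:
        return "not significant", fold, p
    if frac_b == 0:
        return "present in tumor pool only (+)", fold, p
    if frac_a == 0:
        return "absent from tumor pool (-)", fold, p
    if fold > min_fold:
        return "up-regulated", fold, p
    if fold < 1 / min_fold:
        return "down-regulated", fold, p
    return "below fold cutoff", fold, p


if __name__ == "__main__":
    # hypothetical counts: 30 of 10,000 tumor ESTs vs 2 of 100,000 pool B ESTs
    print(classify_cluster(30, 10_000, 2, 100_000))
```

In this example the cluster is 150-fold enriched in pool A and passes the test, so it would be kept as an up-regulated hit; swapping the pools yields the mirror-image down-regulated call.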
Validation of DDD Hits.
Tumor and normal tissues were obtained from the Cooperative Human Tissue Network (Birmingham, AL). One μg of total RNA was reverse-transcribed using random hexamers and Superscript Reverse Transcriptase (Life Technologies, Inc., Gaithersburg, MD). One-fortieth of the cDNA was PCR-amplified using gene-specific primers designed with the web-based Primer3 program.⁸ Primers were chosen for the following 13 known genes that showed DDD specificity for colon tumors: (a) creatine kinase (Hs.118843); (b) guanylate cyclase (Hs.1085); (c) ETS-variant gene (Hs.179214); (d) placental lactogen (Hs.75984); (e) troponin (Hs.73980); (f) titin (Hs.172004); (g) fibrinogen (Hs.90765); (h) homeobox transcription factor 1 (Hs.1545); (i) homeobox transcription factor 2 (Hs.7399); (j) myoglobin C (Hs.118836); (k) cytokeratin 20 (Hs.84905); (l) neurotensin receptor (Hs.110642); and (m) transmembrane glycoprotein (Hs.143133). In addition, primers were designed for 20 colon-specific novel ESTs. The primer sequences are available on request. PCR consisted of an initial denaturation at 94°C for 7 min, followed by 35–40 cycles of 94°C for 45 s, 62°C–65°C for 45 s, and 72°C for 90 s, with a final extension at 72°C for 10 min. RT-minus controls and genomic DNA controls were routinely used to authenticate the RT-derived products. One-half of each amplified product was separated by electrophoresis on a 2% agarose gel and detected by ethidium bromide staining. Actin RT-PCR was performed in parallel on all samples as an internal control.
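The RT-PCR conditions above can be written as a small machine-readable program, which makes the cycling profile and the template input explicit. This is a hypothetical Python sketch: the `Step` class and helper names are illustrative, ramp times between temperatures are ignored, and the RNA-equivalent figure assumes complete reverse transcription of the 1 μg of total RNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: tuple[float, float]  # (low, high) in °C; equal for a fixed setpoint
    seconds: int


INITIAL = Step("initial denaturation", (94, 94), 7 * 60)
CYCLE = (
    Step("denaturation", (94, 94), 45),
    Step("annealing", (62, 65), 45),  # set between 62 and 65 °C per primer pair
    Step("extension", (72, 72), 90),
)
FINAL = Step("final extension", (72, 72), 10 * 60)


def hold_minutes(cycles: int) -> float:
    """Total time held at temperature for one run, ignoring ramp times."""
    if not 35 <= cycles <= 40:
        raise ValueError("the protocol specifies 35-40 cycles")
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL.seconds + cycles * per_cycle + FINAL.seconds) / 60


def rna_equivalent_ng(total_rna_ug: float = 1.0, cdna_divisor: int = 40) -> float:
    """RNA input represented in one PCR when 1/cdna_divisor of the cDNA is used."""
    return total_rna_ug * 1000 / cdna_divisor


if __name__ == "__main__":
    print(f"35 cycles: {hold_minutes(35):.0f} min; 40 cycles: {hold_minutes(40):.0f} min")
    print(f"template per PCR: about {rna_equivalent_ng():.0f} ng RNA equivalent")
```

With these values, a 35-cycle run spends about 122 min at temperature and a 40-cycle run about 137 min, and each PCR draws on roughly 25 ng of RNA equivalent.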
Results
Electronic Profiling of Up-Regulated Genes in Solid Tumors.
The EST libraries representing each organ type used in the DDD protocol were chosen from the CGAP database library browser, and the content of each library was verified by comparison against the UniGene database.⁶ DDD was performed by taking all the tumor-derived libraries for each organ type and digitally comparing them against either the corresponding normal tissue-derived libraries (DDD2) or all the other libraries in the database plus the corresponding normal (DDD1). Pretumor libraries were excluded from the pool of all other libraries and were analyzed separately (DDD3 and DDD4). Comparing the tumor libraries with all of the other libraries including the corresponding normal (DDD1), or with the corresponding normal only (DDD2), identified over 600 ESTs (data not shown). These hits were subdivided into known and novel ESTs. Approximately 10% of the hits showed varying levels of similarity (weak, moderate, and high) to Alu-containing repeat sequences. The majority of the known ESTs fell into distinct classes of genes: ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins. Over 80 genes were up-regulated >10-fold in comparison with both the corresponding normal and all other organs (Table 1). Ribosomal proteins were up-regulated in breast and prostate carcinoma-derived libraries but not in the other four solid tumor-derived libraries. The DDD protocol also identified known organ-specific hits, for example prostate-specific antigen and prostatic acid phosphatase for prostate, several pancreatic enzymes for pancreas, tryptophan hydroxylase and DOPA decarboxylase for lung, and mammaglobin for breast.
DDD also identified mucin-related proteins in the colon and folate receptor in the ovarian carcinoma-derived EST libraries. The majority of the up-regulated genes were identified by both the DDD1 and DDD2 protocols. These results suggest that the DDD protocol can rapidly identify tumor-selective genes. An electronic Northern (cDNA sources) and the cytogenetic map position of each hit were obtained from its UniGene page (data not shown). The same DDD approach also identified up-regulated novel ESTs (data not shown). The vast majority of the ESTs were present in an organ- and tumor type-dependent manner. These hits and additional hits (>5-fold), including the novel ESTs, can be viewed at the web site.⁴
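The difference between the two comparisons, DDD1 and DDD2, lies entirely in how pool B is assembled. The hypothetical Python sketch below encodes only the rules stated in the text: pool A holds the tumor-derived libraries of one organ; pool B is the corresponding normal libraries for DDD2, or every other non-pretumor library (corresponding normal included) for DDD1. The `Library` record and the example library names are illustrative assumptions, not CGAP identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    name: str
    organ: str  # e.g. "colon"
    state: str  # "normal", "pretumor", or "tumor"


def build_pools(libraries: list[Library], organ: str, method: str):
    """Return (pool_a, pool_b) for one organ under DDD1 or DDD2."""
    pool_a = [lib for lib in libraries if lib.organ == organ and lib.state == "tumor"]
    if method == "DDD2":
        # tumor versus the corresponding normal only
        pool_b = [lib for lib in libraries
                  if lib.organ == organ and lib.state == "normal"]
    elif method == "DDD1":
        # tumor versus every other library, corresponding normal included;
        # pretumor libraries are excluded here and analyzed separately
        pool_b = [lib for lib in libraries
                  if lib not in pool_a and lib.state != "pretumor"]
    else:
        raise ValueError(f"unsupported method: {method}")
    return pool_a, pool_b


if __name__ == "__main__":
    libs = [
        Library("colon tumor 1", "colon", "tumor"),
        Library("colon tumor 2", "colon", "tumor"),
        Library("colon normal", "colon", "normal"),
        Library("colon adenoma", "colon", "pretumor"),
        Library("lung tumor", "lung", "tumor"),
        Library("lung normal", "lung", "normal"),
    ]
    for method in ("DDD1", "DDD2"):
        a, b = build_pools(libs, "colon", method)
        print(method, [lib.name for lib in a], "vs", [lib.name for lib in b])
```

Because DDD1 places tumor libraries of other organs in pool B, a hit it reports is selective for both tumor and organ, whereas a DDD2 hit is only tumor-versus-normal within one organ; the EST counts summed over each pool would then feed a fold-difference test.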
Electronic Profiling of Down-Regulated Genes in Solid Tumors.
Using the same DDD protocols (DDD1 and DDD2), we also compiled a list of genes that are down-regulated by >10-fold (Table 2). The majority of the down-regulated genes were discovered by the DDD2 protocol (tumor versus normal). As with the up-regulated genes, distinct ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins were selectively down-regulated in an organ- and tumor type-dependent manner (N = 34). A complete listing of all genes down-regulated >5-fold, with their E-Northern and cytogenetic map positions, can be accessed at the web site.⁴
Electronic Prediction of Tumor-Selective Genes by DDD.
Using the DDD1 and DDD2 protocols, we compiled a list of known genes predicted to be either present or absent in the tumor types (plus/minus differences). Fourteen genes were found exclusively in the selected solid tumor-derived libraries (Table 3). These included mammaglobin in breast, androgen receptor in prostate, γ-glutamyl transferase in lung, and neurotensin receptor in pancreas. Seventeen genes were selectively absent from the solid tumors analyzed (Table 4). These included semenogelin in prostate, apolipoprotein in breast, and islet amyloid polypeptide in pancreas.
Test Validation of DDD Hits by RT-PCR.
The DDD predictions were tested for expression relevance using cDNAs from a matched set of normal and tumor colon tissues (Fig. 1). Twelve known genes that showed plus/minus differences in the colon DDD analysis (Tables 3 and 4) were analyzed by RT-PCR, and three of them showed concordance with the DDD prediction (Fig. 1). Creatine kinase (DDD2) was expressed in the normal colon-derived but not in the tumor colon-derived cDNA, whereas guanylate cyclase (DDD7) and ETS variant gene (DDD12) were expressed in the tumor but not in the normal colon-derived cDNA. RT-PCR analysis of the remaining nine genes did not show the DDD-predicted specificity of expression (data not shown). Furthermore, of 20 novel ESTs predicted by DDD to be specific for colon tumors, two were specifically expressed in the tumor, consistent with the DDD prediction (data not shown). The authenticity of the RT-derived products was established using RT-minus reactions. These results show that electronic predictions by DDD can be rapidly tested by RT-PCR.
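The scoring logic implied by this validation can be sketched as follows. This hypothetical Python sketch treats each gel as four yes/no band calls, discards any result with a band in an RT-minus lane or in the template-minus (NEG) control, and counts a gene as concordant when the observed pattern matches the DDD prediction. The class, function names, and example gels are illustrative assumptions, not data from this study. By the same measure, the results reported here correspond to 3 of 12 (25%) known genes and 2 of 20 (10%) novel ESTs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gel:
    normal_rt_plus: bool  # band in normal colon cDNA (RT+)
    tumor_rt_plus: bool   # band in tumor colon cDNA (RT+)
    rt_minus: bool        # band in either RT- lane (genomic DNA carry-over)
    neg_control: bool     # band in the template-minus (NEG) lane


PATTERNS = {
    (True, False): "normal only",
    (False, True): "tumor only",
    (True, True): "both",
    (False, False): "neither",
}


def observed_pattern(gel: Gel) -> str:
    """Classify a gel; any band in a control lane invalidates it."""
    if gel.rt_minus or gel.neg_control:
        return "invalid"
    return PATTERNS[(gel.normal_rt_plus, gel.tumor_rt_plus)]


def concordance(results: list[tuple[str, Gel]]) -> tuple[int, int]:
    """Return (concordant, valid) for (predicted pattern, gel) pairs."""
    observed = [(pred, observed_pattern(gel)) for pred, gel in results]
    valid = [(pred, obs) for pred, obs in observed if obs != "invalid"]
    return sum(pred == obs for pred, obs in valid), len(valid)


if __name__ == "__main__":
    example = [  # illustrative gels, not data from this study
        ("normal only", Gel(True, False, False, False)),
        ("tumor only", Gel(False, True, False, False)),
        ("tumor only", Gel(True, True, False, False)),  # no specificity
        ("tumor only", Gel(False, True, True, False)),  # RT- band: discarded
    ]
    hits, valid = concordance(example)
    print(f"{hits} of {valid} valid genes concordant")
```

Discarding control-contaminated gels before counting keeps a genomic DNA artifact from being scored as a tumor-specific band.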
Discussion
Discovery of cancer genes is a major challenge facing cancer research (9, 13, 15). It is increasingly clear that multiple genetic alterations are responsible for cancer development. Identifying cancer genes is complicated by the fact that only a subset of genes within a cell population is expressed. In the past, cancer gene discovery followed conventional methods such as gene mapping, loss-of-heterozygosity analysis, and model organism studies; in the future, it will have to make effective use of the large number of genes (predicted to be around 100,000) that will emerge from the human genome sequencing efforts (1, 2, 3, 4). These sequences are being deposited in vast sequence databases, most of which are in the public domain. For cancer-specific gene discovery, the CGAP database of the NCI provides a comprehensive collection not only of expressed sequences in the form of ESTs but also of data-mining tools to analyze them. The basis of the CGAP database is to establish the molecular anatomy of the cancer cell by quantitatively determining the repertoire of genes expressed in the cancer, so that a fingerprint can be established. By comparing these fingerprints with those of normal cells, it should be possible to shortlist genes present in the cancer cells for follow-up studies.
High-throughput gene expression techniques (microarrays, GeneChips) for identifying cancer-specific genes are becoming available (5, 6, 7, 8); however, the technology is not cost-effective for the average laboratory. Furthermore, these methods introduce a bias, in that only a limited number of samples, usually one tumor-derived and one normal-derived RNA, can be used for the initial analysis. Given the pharmacogenomic differences in gene expression profiles usually seen among patients (8, 9), direct use of this technique may be limiting. In this context, data-mining the databases provides a parallel approach to rapidly establish transcript-based fingerprints.
The CGAP database currently offers three data-mining tools: (a) X-profiling; (b) SAGE analysis; and (c) DDD. X-profiling identifies mostly novel ESTs that are present or absent in a given library, but it does not permit quantitation. SAGE and DDD quantitate transcript levels and identify both known and novel ESTs. The SAGE database is a recent addition to CGAP (10), and SAGE libraries are currently available for only a few organs. Hence, to establish a proof of concept for cancer gene discovery by data-mining, we chose the DDD protocol.
A database of known and novel ESTs that are differentially expressed >10-fold was created for six major solid tumor types, together with an electronic Northern based on cDNA sources and the cytogenetic map position of each hit. The vast sequence database (1.5 million ESTs) was thus reduced to approximately 200 known genes and 500 novel ESTs for these six organ-derived tumor types. Subdividing the known hits into major classes of genes reduced them further, to 10–30 per organ type. The majority of the hits were classifiable as ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, or immunoglobulins. Known organ type-specific genes were identified, including prostate-specific antigen, prostatic acid phosphatase, and androgen receptor for prostate; pancreatic enzymes such as lipases, elastases, and carboxypeptidases, as well as islet amyloid polypeptide, for pancreas; and mammaglobin for breast. DDD also corroborated our recent demonstration of pancreatic tumor-specific expression of neurotensin receptor (20).
DDD found distinct ribosomal proteins to be up- and down-regulated in ovary-, pancreas-, and prostate cancer-derived libraries. A recent report (17) supports the selective involvement of ribosomal proteins in prostate and breast tumors: subtractive hybridization between prostatic hyperplasia and prostate tumors identified distinct ribosomal proteins (L4, L5, L7a, L23a, L30, L37, S14, and S18), and L23a and S14 levels were elevated in PC-3 prostate carcinoma cells in comparison with normal prostatic epithelial cells (17). In addition, mutant p53 expression has been shown to induce overexpression of selective ribosomal proteins (17). These findings support the utility of DDD for rapid cancer gene discovery.
Electronic expression profiling to identify cancer-specific genes is likely to yield false positives. However, true lead genes can be rapidly validated by RT-PCR using appropriate cDNAs. We chose 12 known genes predicted by DDD to be either up- or down-regulated in colon tumors, and three of them showed the expected specificity of expression by RT-PCR. Evidence in the literature supports these three hits in selected cancer types. Creatine kinase isoforms have been shown to be lost in colon carcinomas (18), consistent with our DDD findings. Similarly, the guanylate cyclase C isoform has recently been shown to be a biomarker for advanced colon carcinomas (19). The selective up-regulation of the ETS variant gene in the colon tumor, if confirmed in a larger set of samples, would make it a novel diagnostic and therapeutic target for colon carcinomas. This gene belongs to the ETS oncogene family of DNA-binding proteins (16). In addition, the DDD protocol predicted the expression specificity of neurotensin receptor in pancreatic tumors; we recently identified this gene by independently data-mining the UniGene database and showed its expression specificity in pancreatic tumors (20). The identification by DDD of several known genes that have already proven to be of diagnostic and therapeutic value in prostate, breast, and pancreatic tumors suggests that novel genes can be discovered from the list of novel ESTs we have uncovered. In support of this, our preliminary analysis of 20 colon-specific novel ESTs identified 2 ESTs showing specificity of expression to colon tumors.
These novel ESTs can be subjected to additional data-mining tools [e.g., comparison with SAGE and X-profiling, contig construction to extend the sequences, motif recognition, and electronic Northern (14)] to further reduce their number before laboratory validation with relevant cDNAs.
The ability to reduce the number of hits from the vast and rapidly growing amount of information in sequence databases is crucial to efficient gene discovery. The results presented in this report support the starting premise that, by data-mining the CGAP database, it is possible to rapidly shortlist both known and novel ESTs for immediate follow-up studies. The database of hits we have generated using DDD for six solid tumor types should provide a rapid starting point for the discovery of both diagnostic and therapeutic targets.
Acknowledgments
We thank the CGAP database for providing access and the data-mining tools used in this study. We thank Jeanine Narayanan for editorial assistance.
Fig. 1. RT-PCR validation of test hits identified by DDD. Three hits (creatine kinase MM, DDD2; guanylate cyclase 2C, DDD7; and ETS variant gene 3, DDD12) from Table 3 were tested for expression specificity using a matched set of normal and tumor colon-derived random-primed cDNAs synthesized in the presence (RT+) or absence (RT−) of reverse transcriptase. NEG, template-minus negative control.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by an institutional startup grant (to R. N.). D. S. was supported by a grant from the Boca Raton Community Hospital Foundation. Colon tumor and normal tissues were provided by the Cooperative Human Tissue Network, which is funded by the National Cancer Institute.
³ The abbreviations used are: EST, expressed sequence tag; CGAP, Cancer Gene Anatomy Project; DDD, Digital Differential Display; RT, reverse transcription; Hs., human sequence; NCI, National Cancer Institute.
⁴ http://www.fau.edu/cmbb/publications/cancergenes.htm.
⁵ http://www.cgap.gov.
⁶ http://www.ncbi.nlm.nih.gov/UniGene/.
⁷ http://www.ncbi.nlm.nih.gov/Omin/ and the GeneCards site http://bioinformatics.weizmann.ac.il/cards/.
⁸ http://www.genome.wi.mit.edu//cgi-bin/primer/primer3_www.cgi.
Table 1 DDD of up-regulated genes in tumors
Hits (known ESTs) showing >10-fold differences in the indicated tumor-derived libraries in comparison with the normal tissue-derived cDNA libraries and all other organ- and tumor-derived cDNA libraries were compiled. ESTs belonging to specific classes of genes were subclassified as indicated, and the UniGene number of each hit is shown. Values are fold differences. Br, breast; Co, colon; Lu, lung; Ov, ovary; Pa, pancreas; Pr, prostate. The electronic expression (E-Northern) of each hit was inferred from the cDNA sources listed in the UniGene database, and its chromosomal map position from the cytogenetic map.⁴
Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
---|---|---|---|---|---|---|---|
Enzymes (N = 35) | |||||||
CYB561 | 153028 | 29 | |||||
P450, subfamily XVII | 1363 | 26 | |||||
ATP synthase, isoform 2 | 155751 | 20 | |||||
Proteasome non-ATPase, 7 | 155543 | 10 | 18 | ||||
Glu. peroxidase 2a | 2702 | 17 | |||||
Carbonic anhydrase I | 23118 | 18 | |||||
Dopa decarboxylase | 150403 | 10 | |||||
Tryptophan hydroxylase | 144563 | 27 | |||||
ATPase, lysosomal | 24322 | 13 | |||||
Serine protease 9 | 79361 | 28 | |||||
Serine-like protease, 1 | 69423 | 25 | 28 | ||||
Myosin IXBB | 159629 | 17 | |||||
ATPase isoform 1 | 64173 | 15 | |||||
ADH 8 | 87539 | 15 | |||||
ATPase, 9 kD | 24322 | 13 | |||||
PKR | 177574 | 10 | |||||
Cathepsin E | 1355 | 80 | |||||
Panc. lipase | 102876 | 47 | |||||
Ser. protease 2 | 241561 | 39 | |||||
Ser. protease 1 | 241395 | 34 | |||||
Elastase 1 | 21 | 38 | |||||
Carboxypeptidase A2 | 89717 | 34 | |||||
Carboxypeptidase A1 | 2879 | 33 | |||||
Elastase 3 | 181289 | 32 | |||||
Chymotrypsinogen B1 | 74502 | 32 | |||||
Pancreatic lipase-related protein 1 | 73923 | 30 | |||||
Carboxypeptidase B1 | 180884 | 28 | |||||
DNase II | 118243 | 14 | |||||
Elastase 3B | 183864 | 13 | |||||
Urokinase | 77274 | 13 | |||||
Ribosomal proteins (N = 6) | |||||||
S15a | 2953 | 11 | |||||
S17 | 5174 | 11 | |||||
S19 | 126707 | 16 | |||||
L31 | 184014 | 10 | |||||
L35 | 182825 | 10 | |||||
L41 | 108124 | 10 | |||||
Receptor/surface/membrane (N = 17) | |||||||
Claudin | 25640 | 33 | |||||
Tumor necrosis factor receptor, 6b | 194676 | 17 | |||||
Plakophilin 3 | 26557 | 11 | |||||
Lymphocyte antigen complex | 77667 | 17 | |||||
Mesothelin | 155981 | 40 | |||||
Chloride channel 1A | 84974 | 12 | |||||
Folate receptor 1 | 73769 | 10 | |||||
CD74 | 84298 | 10 | |||||
Keratin 19 | 182265 | 24 | |||||
CEA | 220529 | 21 | |||||
Keratin 17 | 2785 | 17 | |||||
Transmembrane 4 superfamily 3 | 84072 | 13 | |||||
Transmembrane 4 superfamily 4 | 11881 | 17 | |||||
Non-specific cross-reacting antigen | 73848 | 61 | |||||
T-cell receptor γ | 112259 | 95 | |||||
IL-17 receptor | 129751 | 43 | |||||
ERBB3 | 199067 | 15 | |||||
Binding proteins (N = 24) | |||||||
Myosin | 170482 | 236 | |||||
PDEF | 79414 | 194 | |||||
Ubiquitin binding, p62 | 182248 | 27 | |||||
HSFBP 1 | 158675 | 25 | |||||
H2A histone | 795 | 23 | |||||
Ald. dehydrogenase 6 | 75746 | 11 | |||||
PSA/KLK2 | 181350 | 77 | |||||
PSA/KLK3 | 171995 | 53 | |||||
Acid phosphatase | 1852 | 61 | |||||
Bruton’s tyrosine kinase | 178391 | ||||||
Treacher Collins-Franceschetti syndrome 1 | 172727 | 18 | 17 | ||||
ATP binding cassette, subfamily B, member 6 | 107911 | 14 | |||||
Homeobox A9 | 127428 | 22 | |||||
Insulinoma-associated 1 | 89584 | 48 | |||||
Thyroid transcription factor 1 (TITF1) | 107764 | 34 | |||||
ATP-binding cassette, subfamily C, 8 | 54470 | 30 | |||||
I3 protein | 75922 | 16 | |||||
H1 histone | 109804 | 15 | |||||
Lectin binding (galectin 6) | 79339 | 13 | |||||
LIM domain | 79691 | 10 | |||||
Retinoic acid binding 2 | 183650 | 10 |
Table 2 DDD of down-regulated genes in tumors
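Hits showing >10-fold differences in the normal organ-derived cDNA libraries in comparison with the tumor-derived and other organ-derived libraries were compiled. Values are fold differences. Br, breast; Co, colon; Lu, lung; Ov, ovary; Pa, pancreas; Pr, prostate. The hits are subclassified by gene class as indicated.⁴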
Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
---|---|---|---|---|---|---|---|
Enzymes (N = 7) | |||||||
Creatine kinase, brain | 173724 | 10 | |||||
TIMP 1a | 5831 | 10 | |||||
Myosin light kinase | 211582 | 13 | |||||
Superoxide dismutase 2 | 177781 | 20 | |||||
Aldo-keto reductase | 78183 | 16 | |||||
Amylase, α 2A | 75733 | 15 | |||||
Elastase 3B | 183864 | 18 | |||||
Receptor/surface/membrane (N = 8) | |||||||
Phospholemman-like | 92323 | 15 | |||||
CEA | 220529 | 14 | |||||
MHC class I, E | 181392 | 21 | |||||
Surfactant C | 1074 | 19 | |||||
Surfactant A1 | 177582 | 34 | |||||
SB class HC antigen α | 914 | 15 | |||||
Surfactant B | 76305 | 14 | |||||
CD44 | 169610 | 22 | |||||
Secretory proteins (N = 9) | |||||||
HB α 1 | 75792 | 12 | 140 | 13 | 30 | ||
Hb γ A | 182167 | 12 | 11 | 22 | |||
Albumin | 75442 | 29 | 58 | 10 | |||
Placental lactogen | 75984 | 39 | |||||
Hb β | 155376 | 11 | |||||
TGFBI | 118787 | 12 | |||||
CTGF | 75511 | 17 | |||||
PLAB | 116577 | 21 | |||||
Interleukin 8 | 624 | 301 | |||||
Ribosomal proteins (N = 6) | |||||||
L9 | 157850 | 16 | |||||
L17 | 82202 | 11 | |||||
L21 | 184108 | 15 | |||||
L23 | 234518 | 15 | |||||
L37A | 184109 | 23 | |||||
S19 | 126707 | 11 | |||||
Binding proteins (N = 13) | |||||||
Myosin | 9615 | 35 | |||||
Crystallin α B | 1940 | 19 | |||||
SNC73 protein | 32225 | 10 | 12 | ||||
Galectin 4 | 5302 | 18 | |||||
Myosin | 929 | 20 | |||||
Pleckstrin | 77436 | 70 | |||||
Actin, β | 180952 | 14 | |||||
α-2-Macroglobulin | 74561 | 13 | |||||
EIF4A1 | 129673 | 10 | |||||
IGFBP5 | 103391 | 83 | |||||
Transgelin | 75777 | 14 | |||||
Karyopherin α 4 | 119500 | 11 | |||||
TYR-MAP-β | 182238 | 10 | |||||
Cell adhesion molecules (N = 2) | |||||||
Fibronectin | 118162 | 15 | |||||
Vimentin | 2064 | 12 | |||||
Immunoglobulins (N = 2) | |||||||
FCGRT | 160741 | 11 | |||||
Ig κ | 156110 | 10 |
TIMP, tissue inhibitor of metalloproteinase 1; CEA, carcinoembryonic antigen; HB, hemoglobin; TGFB, transforming growth factor β; PLAB, prostate differentiation factor; SNC73, immunoglobulin heavy constant α 1; EIF, eukaryotic translation initiation factor; IGFBP, insulin-like growth factor-binding protein; TYR-MAP, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein; FCGRT, Fc fragment of IgG receptor.
Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
---|---|---|---|---|---|---|---|
Mammaglobin 1 | 46452 | +/− | |||||
Guanylate cyclase 2C | 1085 | +/− | |||||
Homeobox trans. factor 1a | 1545 | +/− | |||||
Homeobox trans. factor 2 | 77399 | +/− | |||||
Cytokeratin 20 | 84905 | +/− | |||||
Ets variant | 179214 | +/− | |||||
EGR4 | 3052 | +/− | |||||
Dopamine receptor D2 | 73893 | +/− | |||||
γ-glutamyl-transferase 2 | 211824 | +/− | |||||
V κ 2 | 249231 | +/− | |||||
Neurotensin receptor 1 | 110642 | +/− | |||||
Transcobalamin I | 2012 | +/− | |||||
Androgen receptor | 99915 | +/− | |||||
Specific granular protein | 54431 | +/− |
Trans, transcription; EGR, early growth response.
Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
---|---|---|---|---|---|---|---|
Apolipoprot. D | 75736 | +/− | |||||
Albumin | 75442 | +/− | +/− | ||||
Troponin T1 | 73980 | +/− | |||||
Titin | 172004 | +/− | |||||
Creatine kinase | 118843 | +/− | +/− | +/− | |||
Actin α 1 | 1288 | +/− | +/− | +/− | |||
Myoglobin | 118836 | +/− | +/− | ||||
Vinculin | 75350 | +/− | |||||
Myosin, heavy | 929 | +/− | |||||
p53-response gene 2 | 118893 | +/− | |||||
Placental lactogen | 75984 | +/− | +/− | +/− | |||
HB βa | 155376 | +/− | |||||
HB α 1 | 75792 | +/− | +/− | ||||
Islet amyloid polypeptide | 142255 | +/− | |||||
Tissue factor pathway inhibitor 2 | 78045 | +/− | |||||
IGF2 | 182167 | +/− | |||||
Semenogelin I | 1968 | +/− |
HB, hemoglobin; IGF, insulin-like growth factor.
Continued
Name | Hs.# | Br | Co | Lu | Ov | Pa | Pr |
---|---|---|---|---|---|---|---|
SNC73 protein | 32225 | 12 | |||||
Actin-related protein 2/3 complex, subu. 1B | 11538 | 11 | |||||
Lumican | 79914 | 10 | |||||
Thrombospondin 2 | 108623 | 10 | |||||
Ca+ integrin BP | 10803 | 10 | |||||
ELK4, ETS-domain protein | 169241 | 60 | |||||
Myosin BP-C | 181368 | 10 | |||||
Fatty acid BP-5 | 153179 | 15 | |||||
Secretory proteins (N = 16) | |||||||
Trefoil factor 1 | 1406 | 22 | |||||
Mucin 2 | 315 | 22 | |||||
Pancreatitis-associated protein | 423 | 18 | |||||
H. sapiens gallbladder mucin MUC5B mRNA | 102482 | 11 | |||||
Epididymis-specific, whey-acidic protein type | 2719 | 27 | |||||
Lutheran blood group | 155048 | 21 | |||||
Progestagen-associated protein | 82269 | 17 | |||||
TNF, 13 | 54673 | 10 | |||||
Glucagon | 1460 | 93 | |||||
Mucin 5 | 103707 | 84 | |||||
Islet-derived 1β protein | 4158 | 51 | |||||
Islet-derived 1α protein | 1032 | 21 | |||||
Secreted frizzled-related protein 4 | 105700 | 17 | |||||
Mucin 1, transmembrane | 89603 | 10 | |||||
Microseminoprotein β | 183752 | 37 | |||||
Small inducible cytokine | 153423 | 18 | |||||
Immunoglobulins (N = 4) | |||||||
FCGRT | 160741 | 26 | |||||
Ig λ gene cluster | 181125 | 10 | |||||
Ig super family | 102171 | 15 | |||||
Ig γ 3 | 140 | 52 | |||||
Cell adhesion molecules (N = 6) | |||||||
Integrin β 5 | 149846 | 11 | |||||
Collagen type XIII, α 1 | 211933 | 23 | |||||
Laminin α 5 | 11669 | 10 | |||||
Laminin β 3 | 75517 | 24 | |||||
Laminin γ 2 | 54451 | 20 | |||||
Collagen type X, α 1 | 179729 | 30 |
Glu, glutathione; DOPA, aromatic L-amino acid decarboxylase; ADH, aldehyde dehydrogenase; PK, protein kinase; ser, serine; CEA, carcinoembryonic antigen; HSFBP, heat shock factor-binding protein; BP, binding protein; ald, aldehyde; PSA, prostate-specific antigen; TNF, tumor necrosis factor; FCGRT, Fc fragment of IgG receptor; I3, brain protein I3.