Purpose: We evaluated the genome-wide gene expression profiles of various cancer cell lines to identify genes related to gastrointestinal tract cancer cells.

Experimental Design: Gene expression profiling of 27 cancer cell lines and 9 tissues was done using 7.5K human cDNA microarrays in an indirect design with Yonsei reference RNA composed of 11 cancer cell line RNAs. The significant genes were selected using significance analysis of microarrays in various sets of data. The selected genes were validated using real-time PCR analysis.

Results: After intensity-dependent, within-print-tip normalization by the loess method, we observed that the expression patterns of cell lines and tissues were substantially different and divided into two discrete clusters. Next, we selected 115 genes that discriminate gastrointestinal cancer cell lines from others using significance analysis of microarrays. Among the expression profiles of five gastric cancer cell lines, 66 genes were identified as differentially expressed genes related to the metastatic phenotype. YCC-16, which was established from the peripheral blood of one advanced gastric cancer patient, produced a unique gene expression pattern resembling the profiles of lymphoid cell lines. Quantitative real-time reverse transcription-PCR results of selected genes, including PXN, KRT8, and ITGB5, correlated with the microarray data and successfully discriminated the gastrointestinal tract cancer cell lines from hematologic malignant cell lines.

Conclusions: A gene expression database could serve as a useful source for the further investigation of cancer biology using the cell lines.

Established cancer cell lines differ in histologic and biological character from their tissues of origin, which are composed of various cell types. However, because of the difficulty in accessing human cancer and normal tissues, cell lines are still the most useful tools for cancer research. Many efforts have been made to understand the basic biological and genetic characteristics of cell lines, which might be useful in determining the proper cell line model for a given study. Sixty National Cancer Institute (NCI) cancer cell lines have been used for drug screening for more than four decades, which has contributed significantly to anticancer drug development (1–4). Based on their biological properties, the NCI's Developmental Therapeutics Program did a comprehensive gene expression analysis of these 60 NCI cell lines, which resulted in the identification of the genetic characteristics of each cell line (5). Additional studies suggested that drug sensitivity genes could be identified by evaluating the correlation between the gene expression profiles and the drug responsiveness in each cell line (6, 7).

Recent microarray technology has proven to be useful in cancer research, including cancer cell line characterization. Ji et al. (8) used a cDNA microarray to analyze the gene expression profile of 12 gastric cancer cell lines that were not included in the NCI studies. This study raised the possibility that RF1 and RF48, designated as gastric cancer cell lines by the American Type Culture Collection (Manassas, VA), could be misclassified. In addition, Sakakura et al. (9) reported on the differential gene expression profiles in gastric cancer cell lines established from primary tumor and malignant ascites. Furthermore, Virtanen et al. (10) showed that gene expression profiling, which is applicable to both lung cancer cell lines and lung tumors, could provide the technology to reclassify cell lines.

Although current treatments have significantly improved patient survival, cancers of the gastrointestinal tract remain the most common cause of cancer-related deaths in the world. However, the evaluation of individual molecules failed to elaborate the complex patterns of carcinogenesis and cancer progression in gastrointestinal cancer. Recent reports suggest that the phenotypic diversity of cancer might be related to corresponding diversity in their gene expression patterns (11–13).

We did a comprehensive gene expression analysis of various cancer cell lines with different tissue origins using a high-density cDNA microarray. Our goals in this study were to identify a subset of classifier genes that distinguish cancer cell lines of gastrointestinal tract origin from other cell lines and to identify genes differently expressed in five gastric cancer cell lines. In addition, because the expression pattern of known genes can show novel phenotypic aspects of the cell lines and tissues (11–13), we tried to explore the molecular characteristics of cancer cell lines with respect to their metastatic behavior to identify optimal cell lines for given biological experiments.

Cell Lines and Clinical Samples. Twenty cancer cell lines of various tissue origins (stomach: AGS; breast: MDA-MB-231, MDA-MB-435, and MCF7; colon: HCT116, COLO 205, and HT-29; liver: SK-HEP-1 and HepG2; lung: A549 and NCI-H460; lymphoid organ: HL-60, MOLT-4, and Raji; cervix: HeLa; fibrosarcoma: HT-1080; kidney: Caki-2; brain: U-87 MG; melanoma: SK-MEL-2; and pancreas: Capan-2) were obtained from the American Type Culture Collection. Seven cancer cell lines that were established at the Yonsei Cancer Metastasis Research Center (CMRC, Seoul, Korea), including YCC-B1, YCC-B2 (from pleural effusion of breast cancer patients), YCC-2, YCC-3, YCC-7 (from ascites of gastric cancer patients), YCC-16 (from the blood of a gastric cancer patient), and YCC-P1 (pancreas), were also studied. The cells were cultured and maintained in MEM with 10% fetal bovine serum (Life Technologies, Rockville, MD), 100 units/mL penicillin, and 0.1 mg/mL streptomycin (Life Technologies) at 37°C in a 5% CO2 incubator. Nine tissue samples (three colon cancer tissues, two normal colon tissues, three liver metastatic tumor tissues from colon cancer, and a single sample of normal liver tissue) were collected from patients who had undergone surgery at the Severance Hospital, Yonsei University College of Medicine. Tissue samples were immediately frozen in liquid nitrogen and stored at −80°C until further use.

RNA Preparation and Purification. Total RNA was extracted from the tissues and the cell lines using Trizol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. In the case of tissue RNA, extracted RNA was purified before probe preparation using an RNeasy kit (Qiagen, Hilden, Germany) according to the supplier's manual. The quantity and quality of RNA were evaluated using a GeneSpec III (Hitachi, Tokyo, Japan) and a Gel Documentation-Photo System (Vilber Lourmat, France), respectively.

Probe Preparation. Total RNA (50 μg) was directly labeled and transcribed to cDNA. We combined total RNA from the following 11 cancer cell lines of various tissue origins in equal quantities to prepare the Yonsei reference RNA (CMRC, Yonsei University College of Medicine): stomach, breast, colon, liver, lung, lymphoid and myeloid system, cervix, fibrous tissue, kidney, and brain cancer cell line. Cell lines were selected from various organs to ensure that the pooled RNA contained as many transcripts as possible (see below). Yonsei reference RNA was labeled with Cy3-dUTP (NEN Co., Boston, MA) and test samples were labeled with Cy5-dUTP (NEN).

The labeling was done at 42°C for 2 hours in a total volume of 30 μL containing 400 units SuperScript II (Life Technologies); 3 μL Cy5-dUTP (or Cy3-dUTP); 1.5 μL of each of dATP, dCTP, and dGTP; 0.6 μL dTTP; 300 mmol/L; 6 μL of 5× first-strand buffer; and 4 μg of modified oligo(dT) primer. Unincorporated nucleotides were removed by using a PCR purification kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Eluted probes were then mixed and supplemented with 20 μL of 1 μg/μL human Cot1 DNA (Life Technologies), 2 μL of 10 μg/μL polyadenylate RNA (Sigma, St. Louis, MO), and 2 μL of 10 μg/μL and 288 μL of 1 mol/L TE buffer. This probe mixture was concentrated using a Microcon-30 tube (Millipore, Bedford, MA), and the labeled mixture (48 μL) was mixed with 10.2 μL of 20× SSC and 1.8 μL of 10% SDS for hybridization.
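As a quick check on the hybridization mixture described above, the final volume and the approximate SSC and SDS concentrations can be worked out from the stated volumes. This is only an arithmetic sketch; it assumes that the 48 μL of concentrated probe itself contributes no SSC or SDS, which the text does not state.

```latex
% Final hybridization volume from the three stated components
V_{\text{total}} = 48 + 10.2 + 1.8 = 60\ \mu\text{L}

% Final SSC concentration, diluting the 20x stock
[\text{SSC}] = 20\times \cdot \frac{10.2}{60} = 3.4\times

% Final SDS concentration, diluting the 10% stock
[\text{SDS}] = 10\% \cdot \frac{1.8}{60} = 0.3\%
```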

Hybridization of Fluorescence-Labeled cDNA. We used the 7.5K human cDNA microarray (GenomicTree Co., Daejon, Korea), which contains 6,360 known genes and 152 expressed sequence tags. Before the hybridization, slides were preblocked in 10 mg/mL bovine serum albumin, 3.5× SSC, 0.1% SDS solution to prevent nonspecific hybridization, and probe mixtures were heated at 95°C for 2 minutes and centrifuged at 13,000 rpm for 2 minutes. After the probe was applied, the slides were hybridized in hybridization chambers (GenomicTree) at 65°C for 16 hours. After the hybridization, slides were washed in 2× SSC for 10 minutes, transferred to 0.1× SSC and 0.1% SDS for 10 minutes, and rinsed twice with 0.1× SSC for 10 minutes. After washing, slides were spun at 600 rpm for 5 minutes (Hanil Science Industrial Co., Incheon, Korea). Hybridized slides were scanned using a GenePix 4000B (Axon Instruments, Union City, CA) and the images were analyzed using GenePix Pro 3.0 (Axon Instruments).

Data Analysis. All the array data were normalized by intensity-dependent, within-print-tip normalization with a lowess fit (14, 15). We then selected genes with <20% missing values among the 27 experiments, leaving 3,658 genes for further analysis. Of the remaining genes, we determined a subset of classifier genes to distinguish cell lines of gastrointestinal origin from other cell lines using three steps (Fig. 1). First, we chose a subset of genes that classify the origin of cell lines using the multiclass significance analysis of microarrays (SAM; refs. 16, 17) algorithm (origin classifier genes; Fig. 1A). For group 1, cell lines of gastrointestinal tract origin, we used four gastric cancer (AGS, YCC-2, YCC-3, and YCC-7) and three colorectal cancer (HCT116, COLO 205, and HT-29) cell lines. The YCC-16 cell line was not used in any gene selection procedure because it was established from the blood of a gastric cancer patient. Group 3 comprised the hematologic cell lines Raji, HL-60, and MOLT-4, and group 2 consisted of the remaining 16 cell lines. Next, we selected a subset of classifier genes specific to cell lines by using two-class SAM (genes related to cell line establishment; Fig. 1B). Thereafter, by using our Difference Program (CMRC, Yonsei College of Medicine, Korea), we removed the subset of genes specific for cell lines from the subset of classifier genes specific for tissue origin (Fig. 1C). The Difference Program, based on Python (http://www.python.org), was developed by CMRC to easily and rapidly obtain the complement set of genes that belong only to a specific group of samples. Using Pearson correlation, we did two-way hierarchical clustering of selected genes and samples. TreeView was used to visualize the results using the GeneSpring program (Silicon Genetics, Inc., Redwood City, CA). Functional annotation was based on the Stanford Web site (http://genome-wwws.stanford.edu/cgi-bin/source/sourceSearch).
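For readers unfamiliar with the normalization step, the commonly used formulation of intensity-dependent, within-print-tip normalization can be written as follows. This is a sketch of the standard form, assumed here to correspond to the method cited in refs. 14 and 15; the symbols R, G, M, A, and c_i are notation introduced for illustration and do not appear in the original text.

```latex
% R: Cy5 (test sample) intensity of a spot; G: Cy3 (Yonsei reference RNA) intensity
M = \log_2 \frac{R}{G}, \qquad A = \tfrac{1}{2}\,\log_2 (R\,G)

% c_i(A): lowess curve fitted to the M-versus-A scatter of the spots
% printed by print-tip group i; it is subtracted from each spot in that group
M_{\text{norm}} = M - c_i(A)
```

Fitting a separate curve for each print-tip group corrects both the intensity-dependent dye bias and spatial differences between the pins that printed the array.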

Fig. 1

Scheme for finding a subset of genes differentially expressed in gastrointestinal cancer cell lines compared with cell lines derived from other tissue origins. A, selection of a subset of origin classifier genes unique to groups 1, 2, and 3 (group 1, cell lines of gastrointestinal tract origin, including AGS, YCC-2, YCC-3, YCC-7, HCT116, COLO 205, and HT-29; group 2, the other 16 of the 27 cell lines, except groups 1 and 3; group 3, hematologic cells, including Raji, HL-60, and MOLT-4). The YCC-16 cell line was not used in any gene selection procedure because it was established from the blood of a gastric cancer patient. B, selection of a subset of genes unique to cell lines (group T, three colon cancer tissues; group C, colon cancer cell lines HCT116, COLO 205, and HT-29). C, selection of gastrointestinal tract origin cell line classifier genes, a subset of genes that distinguish gastrointestinal cancer cell lines from other cancer cell lines with different origins. The Difference Program was developed at the CMRC, Yonsei University College of Medicine.

Real-time Reverse Transcription-PCR Analysis. For validation of the microarray data, we did real-time reverse transcription-PCR (RT-PCR) using RNA from 16 cell lines that had been used for the microarray experiment. The 16 cell lines tested were YCC-B2, MDA-MB-231, MCF7, HCT116, COLO 205, HT-29, SK-HEP-1, HepG2, HL-60, MOLT-4, Raji, AGS, YCC-2, YCC-3, YCC-7, and YCC-16. Six genes were randomly chosen from the selected gene set: dystroglycan 1 (DAG1, AA496691), integrin β5 (ITGB5, AA434397), keratin 8 (KRT8, AA598517), neuregulin 1 (NRG1, R72075), paxillin (PXN, AA430574), and suppression of tumorigenicity 14 (ST14, AA489246). The primer sequences of the six genes and β-actin were as follows (forward/reverse): DAG1 (268 bp) AGTATCCACACCAAAACCAG/CTTCACCTCAAAGTAGGTGC, ITGB5 (244 bp) CTGTCCATGAAGGATGACTT/TGTCCACTCTGTCTGTGAGA, KRT8 (259 bp) ACAAGTTTGCCTCCTTCATA/CGCTTATTGATCTCATCCTC, NRG1 (275 bp) AGTATCCACAGAAGGAGCAA/ATTCAACATGATGCAACAAA, PXN (159 bp) CGCTCTGTTTATAGTGACCC/AATCACAGGAATTGAAATGG, ST14 (145 bp) CTCCACTGAGTTTGTAAGCC/AGACCAGTAGTAGGCGATGA, and β-actin (98 bp) GGGAATTCAAAACTGGAACGGTGAAGG/GGAAGCTTATCAAAGTCCTCGGCCACA.

After cDNA was produced from 5 μg of total RNA in a reaction with oligo(dT) primer and 40 IU Moloney murine leukemia virus reverse transcriptase (MBI Fermentas, Hanover, MD), 2 μL of cDNA of each cell line were used for the real-time PCR assay. The total volume of the reaction mixture was 20 μL, which contained HotstarTaq DNA polymerase, QuantiTect SYBR Green PCR buffer, deoxynucleotide triphosphate mix, including dUTP, SYBR Green, 10 μL of QuantiTect SYBR Green PCR kit including 2.5 mmol/L MgCl2 (Qiagen, Valencia, CA), 2 μL of the cDNA, and 20 pmol of each primer in distilled water. PCR was done at 95°C for 15 minutes to activate the HotstarTaq DNA polymerase and then for 35 cycles of amplification at 95°C for 20 seconds, 50°C for 30 seconds, and 72°C for 45 seconds on a Rotor Gene 2072D real-time PCR machine (Corbett Research, New South Wales, Sydney, Australia). The amplified fluorescence signal in each specimen was measured at the late extension step of each cycle.

To quantify the level of gene expression, we evaluated the β-actin gene expression in 10-fold serially diluted human genomic DNA (Promega, Madison, WI). The standard curve was drawn by plotting the measured threshold cycle versus the arbitrary unit of the copies/reaction according to the β-actin gene expression of diluted genomic DNA. The threshold cycle (Ct) value was determined as the cycle number at which the fluorescence exceeded the threshold value. In the negative control, there was no fluorescent signal when the cycle number was increased to 35. For comparison with the microarray results, the calculated copies of each gene were divided by the copies of β-actin and then log transformed. The Pearson correlation coefficient was evaluated using S-PLUS 2000 (MathSoft, Inc., Seattle, WA) and the plotting was done using SigmaPlot 8.0 (SPSS, Inc., Chicago, IL).
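The quantification described above can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: the dilution series and Ct values are made up, the function names are hypothetical, a least-squares line of Ct against log10(copies) is assumed for the standard curve, and base 10 is assumed for the log transformation because the text does not specify the base.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get copies/reaction (arbitrary units)."""
    return 10 ** ((ct - intercept) / slope)


def normalized_log_expression(gene_ct, actin_ct, slope, intercept):
    """log10 of (gene copies / beta-actin copies); base 10 is an assumption."""
    gene = copies_from_ct(gene_ct, slope, intercept)
    actin = copies_from_ct(actin_ct, slope, intercept)
    return math.log10(gene / actin)


# Made-up 10-fold dilution series (arbitrary copy units) and Ct readings.
standard_copies = [1e1, 1e2, 1e3, 1e4, 1e5]
standard_ct = [34.7, 31.4, 28.1, 24.8, 21.5]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
value = normalized_log_expression(gene_ct=31.4, actin_ct=24.8,
                                  slope=slope, intercept=intercept)
print(slope, intercept, value)
```

Because a single β-actin standard curve is used for every gene, the normalized value reduces to the Ct difference between the gene and β-actin divided by the slope of the curve.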

Analysis of Global Gene Expression Patterns of Cancer Cell Lines. We did cDNA microarray testing with 7,500 cDNA spots, representing ∼6,512 genes, to investigate the comprehensive gene expression patterns within 27 cancer cell lines and 9 tissues. First, we used a hierarchical clustering algorithm to group genes as well as cell lines and tissue samples according to similarities in their expression patterns, using the 7,128 spots that passed simple filtering. Hierarchical clustering showed that the samples were grouped into two major branches, tissue samples and cell lines (data not shown). We observed distinct clusters in which cell lines with a common tissue origin clustered together: lymphoid organs, the gastrointestinal tract, and the remaining cell lines, with only a few exceptions. In addition, an interesting feature was that the differences in gene expression patterns among the 27 cell lines were much greater than those between normal and tumor tissues. However, normal tissues were separated from the tumor tissues.

Selection of a Subset of Genes That Discriminate Cancer Cell Lines of a Gastrointestinal Origin from Other Cancer Cell Lines of a Different Origin. Following the gene selection scheme in Fig. 1, we selected the significant genes from the 3,658 filtered genes. First, we used multiclass SAM to choose a subset of 199 cell origin classifier genes, which were assumed to differentiate cancer cell lines according to their origins, such as gastrointestinal tract, lymphoid organs, and remaining cell lines (false discovery rate 1%, δ = 0.5751; Fig. 1A). Next, by using two-class SAM, we selected a subset of 1,158 genes that were expressed only in cell lines compared with tissues (false discovery rate 10%, δ = 0.9765; Fig. 1B). Then, to identify the subset of genes specific for gastrointestinal tract cell lines, we filtered out the genes related to cell line establishment from the 199 origin classifier genes using our Difference Program. In this way, we selected 115 genes that might discriminate gastrointestinal cancer cell lines, including colon and stomach cancer cell lines, from others (Fig. 1C).
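The missing-value filter and the set-difference step used in this selection can be illustrated with a short Python sketch. It is not the CMRC Difference Program itself, whose code is not given here: the function names, gene identifiers, and expression values below are hypothetical, and the SAM steps are represented only by made-up gene sets.

```python
def filter_by_missing(expression, max_missing_fraction=0.20):
    """Keep genes whose fraction of missing values (None) is below the cutoff."""
    kept = {}
    for gene, values in expression.items():
        missing = sum(1 for v in values if v is None)
        if missing / len(values) < max_missing_fraction:
            kept[gene] = values
    return kept


def difference(origin_genes, cell_line_genes):
    """Return the genes in origin_genes that are absent from cell_line_genes."""
    return set(origin_genes) - set(cell_line_genes)


# Toy log-ratio data: 10 experiments per gene, None marks a missing value.
expression = {
    "gene_a": [0.4, 0.1, -0.3, 0.9, 1.2, 0.0, -0.5, 0.7, 0.2, 0.3],    # 0% missing
    "gene_b": [None, None, None, 0.5, 0.1, 0.2, -0.1, 0.4, 0.6, 0.0],  # 30% missing
    "gene_c": [1.1, None, 0.8, 0.9, 1.0, 1.3, 0.7, 1.2, 0.6, 0.9],     # 10% missing
}
filtered = filter_by_missing(expression)

# Hypothetical stand-ins for the multiclass and two-class SAM gene lists.
origin_classifier = {"gene_a", "gene_c"}
cell_line_specific = {"gene_c"}
gi_classifier = difference(origin_classifier, cell_line_specific)
print(sorted(filtered), sorted(gi_classifier))
```

Representing each gene list as a set makes the removal of cell line establishment genes a single set subtraction, independent of the order in which the lists were produced.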

Gene Expression Patterns Related to Tissue Origins in Various Cancer Cell Lines. Hierarchical clustering analysis with the selected 115 genes related to gastrointestinal tract tissue origins showed that the cell lines were generally classified based on tissue origin with a few exceptions, such as MCF7, YCC-B1, and NCI-H460, which clustered in the gastrointestinal branch (Fig. 2A). We observed that the gastric cancer cell lines YCC-2, YCC-3, and YCC-7 established at Yonsei CMRC were grouped with other gastrointestinal tract cancer cells, such as AGS, HT-29, COLO 205, and HCT116, obtained from the American Type Culture Collection (Fig. 2A). In addition, when the selected subset of genes was used for the clustering analysis, we observed that tissues of gastrointestinal origin were completely integrated into the gastrointestinal cell lines (Fig. 2A). However, in the case of the colon cancer cell lines COLO 205, HT-29, and HCT116, each cell line showed a unique expression pattern, suggesting a different biological phenotype in each cell line.

Fig. 2

Two-way hierarchical clustering with selected 115 genes related to gastrointestinal tract cancer cell lines. A, clustering with all samples showing that the 115 genes distinguish gastrointestinal samples from other cell lines of different origins; B, clustering among the gastric cancer cell lines; C, clustering among the gastric and lymphoid cancer cell lines. CT, colon tumor tissue; CN, colon normal tissue; LT, liver tumor tissue; LN, liver normal tissue; GI, gastrointestinal. Red, high relative levels of expression; blue, low relative levels of expression.

Gene Expression Patterns According to Metastatic Pattern in Gastric Cancer. By comparing gene expression profiles of five gastric cancer cell lines with different metastatic patterns (18), we discovered a subset of 66 differentially regulated genes in YCC-16, a cell line isolated from the blood of a gastric cancer patient (Tables 1 and 2), suggesting a subset of genes associated with hematogenous metastatic potential. Gastric cancer cell lines were divided into independent branches according to metastatic character: AGS (primary), YCC-2, YCC-3, and YCC-7 in one branch (peritoneal) and YCC-16 in another (hematogenous), as shown in TreeView (Fig. 2B). Among the 66 genes, 40 were down-regulated (Table 1) and 26 were up-regulated (Table 2) in the blood-borne YCC-16 cell line compared with the other four gastric cancer cell lines established from the peritoneal ascites of advanced gastric cancer patients. In addition, based on Pearson correlation, we observed that cancer cell lines of lymphoid origin clustered closely with YCC-16, which originated from blood-borne cancer cells of a gastric cancer patient (Fig. 2C). Among the 66 genes, 26 showed similar expression in the hematogenously metastasized YCC-16 gastric cancer cell line and the three lymphoid origin cell lines, with 22 down-regulated and 4 up-regulated genes (Tables 1 and 2).

Table 1

Among 66 genes differently expressed among five gastric cell lines based on their metastatic potential, 40 down-regulated genes in blood-borne YCC-16 cell line

Name                                    GenBank accession no.  Name                                      GenBank accession no.
myosin IC                               AA029956               myosin VI                                 AA625890
zinc finger protein 38                  AA088434               LCA homologue                             AA598513
vaccinia-related kinase-1               AA112979               pleiomorphic adenoma gene-like 2          AA704187
major vault protein                     AA158991               serine/threonine kinase 18                AA732873
KDEL receptor                           AA181085               KIAA1594 protein                          AA876039
butyrate response factor 1              AA424743               myocilin                                  AI971049
destrin (actin-depolymerizing factor)*  AA424824               keratin 14                                H44051
PXN                                     AA430574               hypothetical protein MGA 20576            H51645
ankyrin repeat-containing protein       AA434117               crystallin ζ (quinone reductase)*         R40946
ITGB5                                   AA434397               neuronal protein 4.1                      R71689
syntaxin 3A                             AA436871               topoisomerase DNA IIβ (180 kDa)           T59934
E74-like factor 2                       AA447783               splicing factor, arginine/serine-rich, 4  W87714
ribosome binding protein 1              AA447804               tetraspan 3                               AA284492
coronin, actin-binding protein 1C       AA456063               malate dehydrogenase 1, NAD               AA403295
HTOM34P                                 AA457118               hydroxysteroid dehydrogenase 4            AA487914
cyclin D1 (PRAD1)                       AA487486               copine III                                AA505111
ST14                                    AA489246               ribosomal protein S9                      AW074994
DAG1                                    AA496691               EST                                       H57136
heat shock protein 75                   AA497020               arfaptin 1                                T52363
KRT8                                    AA598517               keratin 13                                W60057

*Indicates 22 genes in which YCC-16 and lymphoid cancer cell lines show the similar gene expression patterns.

Table 2

Among 66 genes differently expressed among five gastric cell lines based on their metastatic potential, 26 up-regulated genes in blood-borne YCC-16 cell line

Name                                      GenBank accession no.
peptidase β                               N29844
lysylhydrogenase 2                        AA136707
microtubule-associated protein τ          AA199717
sequestosome 1                            AW074995
glutamic-oxaloacetic transaminase A       H22856
trinucleotide repeat containing 3         N57754
NRG1                                      R72075
protein kinase MUK                        AA053674
casein kinase A2                          AA054996
polymerase RNA III (DNA directed)         AA282063
cyclin-dependent kinase inhibitor 3       AA284072
CD151 antigen                             AA443118
paraoxonase 2                             AA446028
zinc finger protein 131                   AA448919
lactate dehydrogenase C                   AA453467
PTD017 protein                            AA464612
cartilage-associated protein              AA486278
thyroid receptor interacting protein 15   AA625651
BCL2-associated athanogene                AI017240
cyclin-dependent kinase inhibitor 1A      AI952615
acetyl-CoA acyltransferase 2              H07926
leptin receptor gene–related protein      H51066
replication factor C (activator 1) 3      N39611
integrin β-like 1                         N52533
TNFR1                                     W02761
integrin β1                               W67174

*Indicates four genes in which YCC-16 and lymphoid cancer cell lines show the similar gene expression patterns.

Comparison of Gene Expression between Real-time RT-PCR and Microarray Analysis. Among the 66 selected genes, we randomly selected 6 genes for validation using real-time RT-PCR: PXN, ITGB5, DAG1, KRT8, ST14, and NRG1. Sixteen cell lines of breast, gastric, colorectal, liver, and lymphoid origin were chosen. First, we evaluated the correlation between real-time RT-PCR and microarray. The Pearson correlation coefficients were PXN 0.86, KRT8 0.94, ITGB5 0.65, DAG1 0.45, ST14 0.5, and NRG1 −0.13. The comparison of PXN and KRT8 RNA expression levels between the two methods is displayed in Fig. 3A and B and showed good correlation (PXN r2 = 0.73, KRT8 r2 = 0.84). Then, we clustered five gastric, three colorectal cancer, and three lymphoid origin cell lines according to the real-time RT-PCR results for PXN and KRT8, the two genes with the highest correlation between the two methods (Fig. 3C). This clustering clearly discriminated the three lymphoid origin cell lines from the gastrointestinal tract cell lines. In particular, YCC-16, which resembled the lymphoid cell lines in the microarray data (Fig. 2B), was closely related to lymphoid cells (Fig. 3C). Next, to evaluate how well these real-time RT-PCR results discriminate gastrointestinal cancer types, we grouped the cell lines by tumor origin into colorectal, gastric, and hematologic malignant cells. When we compared the mean expression level of the six genes for each detection method, we observed that the expression patterns of gastrointestinal tract cell lines differed from those of hematologic malignant cells in both microarray (Fig. 4A) and real-time RT-PCR (Fig. 4B).
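The per-gene comparison between the two platforms amounts to a Pearson correlation between paired measurements across cell lines. The following Python sketch shows the calculation on made-up numbers; the values are not data from this study, and the function name is hypothetical (the original analysis was done in S-PLUS 2000).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired values for one gene across five cell lines:
# microarray log ratios and normalized, log-transformed RT-PCR values.
microarray = [-1.0, -0.5, 0.0, 0.5, 1.0]
rt_pcr = [-2.1, -0.9, 0.2, 0.8, 2.0]

r = pearson_r(microarray, rt_pcr)
r_squared = r * r  # coefficient of determination for a simple linear fit
print(r, r_squared)
```

The correlation is insensitive to the scale of either measurement, so the microarray log ratios and the β-actin-normalized RT-PCR values can be compared directly without converting them to common units.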

Fig. 3

Comparison of microarray data with quantitative real-time RT-PCR. A, a plot comparing the expression of PXN transcript in 16 cell lines using microarray and real-time RT-PCR. B, a plot of KRT8. X axis, log R/G ratio from microarray; Y axis, adjusted log (test sample copies/β-actin copies). C, clustering of five gastric, three colorectal cancer, and three lymphoid cell lines with PXN and KRT8.

Fig. 4

Different expression of six selected genes in various tumor types based on different detection methods. Mean expression levels of six genes in three gastric cancer cells, three colorectal cancer cells, and three lymphoid cells are shown. A, microarray results; B, real-time RT-PCR results.

cDNA microarray technologies allow us to comprehensively investigate cancer in various aspects at the RNA level (4–7). To understand cancer biology, the gene expression patterns measured in cell lines using microarrays can help to distinguish them at a molecular level in histologically complex cancer specimens (19). Although laser microdissection has made it possible to analyze the expression patterns of only the cancer cells from clinical tumor tissues (20), tumor heterogeneity and the significant role of stromal tissues cannot be neglected when drawing conclusions. Hence, to identify the “molecular portrait” of the tissues from which the cell lines were derived, we evaluated genome-wide expression profiles of 9 tissues and 27 cell lines. First, we confirmed that the tissues and the cell lines were clustered separately regardless of tissue origin (data not shown). Many reports have suggested that a given cell line differs from its original tissue both biologically and genetically (5, 10, 21). Possible reasons are that (a) the cell line is established from a selected clone among various cells in the tissue; (b) tissues are composed of a variety of cell types, such as fibroblasts, blood vessel cells, immune system cells, and fat cells; and (c) cells in an in vitro system continuously change to adapt to the artificial environment.

We specifically focused on genes significantly expressed in cell lines derived from the gastrointestinal tract because gastric and colon cancers are the leading cause of cancer death in Asia. Many suggestions have been made concerning the selection of specific gene sets from a variety of data sets. In our study, to determine the subset of genes related to the gastrointestinal tract cell lines, we first used direct two-class SAM comparing gastrointestinal cells with other cell lines (with or without lymphoid cells). However, the selected genes (93 genes with lymphoid cells and 45 genes without lymphoid cells, respectively) failed to distinguish the gastrointestinal tract cancer cells from others (data not shown). Therefore, we introduced our current indirect data analysis scheme (Fig. 1) and found that 115 genes successfully categorized the cells based on their origins, especially in the case of gastrointestinal tract cells (Fig. 2). Using the 115 selected genes, we observed that the MDA-MB-231 and YCC-B1 breast cancer cells and the NCI-H460 lung cancer cells lay inside the gastrointestinal tract cluster. This may be because those cancer cell lines are more heterogeneous than others (5, 10), and the 115 genes were selected primarily to differentiate the gastrointestinal tract cancer cells. Although there is still debate about the best false discovery rate, δ, optimal gene numbers, and grouping for gene selection, we showed that the specific gene sets selected by using SAM might be useful for understanding and categorizing the cancer cell lines based on their origins.

This gene set might be applied to identify the origin of cancer cells of ambiguous origin, especially when a gastrointestinal tract origin is suspected. Although it may not be conclusive for an exact diagnosis, it could be helpful as complementary information. Gene expression profiles of cancer cell lines could reflect novel aspects of the phenotype (11–13). Based on previous biological studies (18), including cell doubling time, tumorigenicity in soft agar or the xenograft model, and the motility assay, YCC-16 was the most aggressive of the five gastric cancer cell lines used in the present study. This may explain why YCC-16 did not cluster with the other gastric cancer cells. In addition, this cell line was established from the bloodstream, in contrast to the other cell lines, which were established from a primary tumor (AGS) or ascites (YCC series); this might explain why its gene expression clustered tightly with that of blood cells. In other words, if tumor cells show the same expression pattern as YCC-16 for this subset of genes, one might presume that those tumors can survive in the bloodstream and metastasize to distant organs. Our data may be consistent with the idea that some primary tumors are preprogrammed to spread to other organs and that this propensity could be detectable at the time of initial diagnosis and therefore be used as a prognostic indicator (22, 23).

It is well known that alterations in adhesion molecules and apoptosis-related genes facilitate cell escape from the primary site, induce resistance to cell death, and finally lead to metastasis (24–30). Among the selected genes, PXN; ITGB5; coronin, actin-binding protein 1C; BAG family molecular chaperone regulator-1; cyclin-dependent kinase inhibitor 1A (p21); and fibronectin receptor (integrin β1) are involved in cell adhesion or apoptosis. Members of the coronin family are involved in apoptosis, whereas BAG family molecular chaperone regulator-1 has antiapoptotic activity and increases the anti–cell death function of bcl-2 (http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch). In addition, p21 is known to inhibit apoptosis (25, 26). In YCC-16, the expression of coronin was down-regulated, whereas BAG family molecular chaperone regulator-1 and p21 were up-regulated. The resulting decrease in apoptosis could give cancer cells the opportunity to spread to distant organs with a high survival rate. Consistent with our data, increased expression of integrin β1 was reported to correlate with increased invasion and metastasis in some cancers (24, 29, 30).

Microarray results have usually required validation of gene expression by other methods such as RT-PCR or real-time PCR. This requirement arose when microarray quality was unsatisfactory because of problems in clone preparation, chip production, hybridization variation, or data analysis. However, with improved chip quality and hybridization techniques, recent microarray data have shown good correlation with RT-PCR and reliable biological meaning (17, 21, 22). We did real-time PCR analysis not only for technical validation of the microarray results but also for potential biological validation of the selected genes. In this study, we confirmed that the gene expression levels measured by the two methods were well correlated, especially for PXN and KRT8, and relatively well correlated for ITGB5, DAG1, and ST14. However, NRG1 showed a notably low correlation, which might be because several cell lines did not express NRG1 in real-time RT-PCR. The basic technological difference between microarray and real-time RT-PCR is the simultaneous competitive hybridization used in microarray. Moreover, we used a reference RNA composed of 11 cancer cell lines, so the microarray yields only the expression of each sample relative to that reference, whereas real-time RT-PCR detects the expression level of the sample itself. A method for direct integration of the two data sets has not yet been established. Regardless of the varying correlation between microarray and real-time RT-PCR analyses, we observed that the RT-PCR expression patterns of the six selected genes successfully discriminated gastrointestinal tract cells from lymphoid origin cells, suggesting that the genes were properly selected.
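
As an illustration of how agreement between the two platforms could be quantified for a single gene, the sketch below computes a Pearson correlation between microarray log ratios and real-time RT-PCR values across cell lines. The correlation measure, the function names, the toy values, and the handling of undetected samples (encoded here as `None` and dropped pairwise) are assumptions made for this example; the text above does not specify these details.

```python
from math import sqrt
from statistics import mean


def paired_values(microarray, rt_pcr):
    """Keep only cell lines measured by both methods (None = not detected)."""
    return [
        (m, p) for m, p in zip(microarray, rt_pcr) if m is not None and p is not None
    ]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): one gene measured in five cell lines, both series
# assumed to be on a log2 scale; the last cell line was not detected by RT-PCR.
toy_microarray = [-1.0, 0.0, 1.0, 2.0, 0.5]
toy_rt_pcr = [-2.1, 0.1, 1.9, 4.2, None]

toy_pairs = paired_values(toy_microarray, toy_rt_pcr)
toy_r = pearson_r(toy_pairs)
print(len(toy_pairs), round(toy_r, 3))
```

Dropping undetected samples is only one possible choice. When several cell lines do not express a gene, few pairs remain and the coefficient becomes unstable, which is one way a low correlation such as that seen for NRG1 could arise.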

In conclusion, evaluating the genome-wide gene expression profiles of various cancer cell lines provides significant genetic information for understanding cancer biology, the characteristics of tumor types, and the specific characteristics of each cell line. It may be helpful for identifying and categorizing optimal cell lines for a given experimental purpose.

Grant support: Korea Science and Engineering Foundation through the Cancer Metastasis Research Center at Yonsei University College of Medicine.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Monks A, Scudiero DA, Johnson GS, Paull KD, Sausville EA. The NCI-cancer drug screen: a smart screen to identify effectors of novel target. Anticancer Drug Des 1997;12:533–41.
2. Weinstein JN, Kohn KW, Grever MR, et al. Neural computing in cancer drug development: predicting mechanism of action. Science 1992;258:447–51.
3. Stinson SF, Alley MC, Fiebig HH, et al. Morphological and immunocytochemical characteristics of human tumor cell lines for use in a disease-oriented anticancer drug screen. Anticancer Res 1992;12:1035–53.
4. Paull KD, Shoemaker RH, Hodes L, et al. Display and analysis of patterns of differential activity of drugs against human tumor cell lines: development of mean graph and COMPARE algorithm. J Natl Cancer Inst 1989;81:1088–92.
5. Lee JC, Lashkari D, Shalon D, et al. Systemic variation in gene expression patterns in human cancer cell lines. Nat Genet 2000;24:227–35.
6. Scherf U, Ross DT, Waltham M, et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000;24:236–44.
7. Staunton JE, Slonim DK, Coller HA, et al. Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A 2001;98:10787–92.
8. Ji J, Chen X, Leung SY, et al. Comprehensive analysis of the gene expression profiles in human cancer cell lines. Oncogene 2002;21:6549–56.
9. Sakakura C, Hagiwara A, Nakanishi M, et al. Differential gene expression profiles of gastric cancer cells established from primary tumor and malignant ascites. Br J Cancer 2002;87:1153–61.
10. Virtanen C, Ishikawa Y, Honjoh D, et al. Integrated classification of lung tumors and cell lines by expression profiling. Proc Natl Acad Sci U S A 2002;99:12357–62.
11. DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997;278:680–6.
12. Iyer VR, Eisen MB, Ross DT, et al. The transcriptional program in the response of human fibroblasts to serum. Science 1999;283:83–7.
13. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet 1999;21:33–7.
14. Quackenbush J. Microarray data normalization and transformation. Nat Genet 2002;32:496–501.
15. Yang YH, Dudoit S, Luu P, et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple side systemic variation. Nucleic Acids Res 2002;30:e15.
16. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 2001;98:5116–21.
17. Xu Y, Selaru FM, Yin J, et al. Artificial neural networks and gene filtering distinguish between global gene expression profiles of Barrett's esophagus and esophageal cancer. Cancer Res 2002;62:3493–7.
18. Rha SY, Noh SH, Kwak HJ, et al. Comparison of biological phenotypes according to midkine expression in gastric cancer cells and their autocrine activities could be modulated by pentosan polysulfate. Cancer Lett 1997;118:37–46.
19. Sgroi DC, Teng S, Robinson G, LeVangie R, Hudson JR, Elkahoun AG. In vivo gene expression profile analysis of human breast cancer progression. Cancer Res 1999;59:5656–61.
20. Perou CM, Jeffery SS, Rijn MV, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A 1999;96:9212–7.
21. Welsh JB, Zarrinkar PP, Sapinoso LM, et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A 2001;98:1176–81.
22. Ramaswarmy S, Ross KN, Lander ES, Golub TR. A molecular signature of metastasis in primary solid tumors. Nat Genet 2003;31:1–6.
23. Van de vijver MJ, He YD, van'T Veer LJ, et al. A gene-expression signature as predictor of survival in breast cancer. N Engl J Med 2002;8:3788–95.
24. Arboleda MJ, Lyons JF, Kabbinavar FF, et al. Overexpression of AKT2/protein kinase Bβ leads to up-regulation of β1 integrins, increased invasion, and metastasis of human breast and ovarian cancer cells. Cancer Res 2003;63:196–206.
25. Fang M, Liu B, Schmidt M, Lu Y, Mendelsohn J, Fan Z. Involvement of p21Waf1 in mediating inhibition of paclitaxel-induced apoptosis by epidermal growth factor in MDA-MB-486 human breast cancer cells. Anticancer Res 2000;20:101–3.
26. Xu SQ, EL-Diery WS. p21(WAF1/CIP1) inhibits caspase cleavage by TRAIL death receptor DR4. Biochem Biophys Res Commun 2000;269:179–90.
27. Yawata A, Adachi M, Okuda H, et al. Prolonged cell survival enhances peritoneal dissemination of gastric cancer cells. Oncogene 1998;16:2681–6.
28. Huerta S, Harris DM, Jazirehi A, et al. Gene expression profile of metastatic colon cancer cells resistant to cisplatin-induced apoptosis. Int J Oncol 2003;663–70.
29. Fujita S, Watanabe M, Kubota T, Teramoto T, Kitajima M. Alteration of expression in integrin 1-subunit correlates with invasion and metastasis on colorectal cancer. Cancer Lett 1995;91:145–9.
30. Morini M, Mottolese M, Ferrari N, et al. The α3β1 integrin is associated with mammary carcinoma cell metastasis, invasion, and gelatinase B (MMP-9) activity. Int J Cancer 2000;87:336–42.