Abstract
Purpose: We evaluated the genome-wide gene expression profiles of various cancer cell lines to identify the gastrointestinal tract cancer cell–related genes.
Experimental Design: We profiled gene expression in 27 cancer cell lines and 9 tissues using 7.5K human cDNA microarrays in an indirect design, with Yonsei reference RNA composed of RNA from 11 cancer cell lines. Significant genes were selected using significance analysis of microarrays in various sets of data. The selected genes were validated using real-time PCR analysis.
Results: After intensity-dependent, within-print-tip normalization by the loess method, we observed that the expression patterns of cell lines and tissues were substantially different and fell into two discrete clusters. Next, we selected 115 genes that discriminate gastrointestinal cancer cell lines from others using significance analysis of microarrays. Among the expression profiles of five gastric cancer cell lines, 66 genes were identified as differentially expressed genes related to metastatic phenotype. YCC-16, which was established from the peripheral blood of a patient with advanced gastric cancer, produced a unique gene expression pattern resembling the profiles of lymphoid cell lines. Quantitative real-time reverse transcription-PCR results for selected genes, including PXN, KRT8, and ITGB5, correlated with the microarray data and successfully discriminated the gastrointestinal tract cancer cell lines from hematologic malignant cell lines.
Conclusions: A gene expression database could serve as a useful source for the further investigation of cancer biology using the cell lines.
INTRODUCTION
Established cancer cell lines differ from their tissues of origin, which are composed of various cell types with distinct histologic and biological characteristics. However, because of the difficulty in accessing human cancer and normal tissues, cell lines are still the most useful tools for cancer research. Many efforts have been made to understand the basic biological and genetic characteristics of cell lines, which might help in choosing the proper cell line model for a given study. Sixty National Cancer Institute (NCI) cancer cell lines have been used for drug screening for more than four decades, which significantly contributed to anticancer drug development (1–4). Based on their biological properties, the NCI's Developmental Therapeutics Program did a comprehensive gene expression analysis of these 60 NCI cell lines, which resulted in the identification of the genetic characteristics of each cell line (5). Additional studies suggested that drug sensitivity genes could be identified by evaluating the correlation between the gene expression profiles and the drug responsiveness in each cell line (6, 7).
Recent microarray technology has proven to be useful in cancer research, including cancer cell line characterization. Ji et al. (8) used a cDNA microarray to analyze the gene expression profile of 12 gastric cancer cell lines that were not included in the NCI studies. This study raised the possibility that RF1 and RF48, designated as gastric cancer cell lines by the American Type Culture Collection (Manassas, VA), could be misclassified. In addition, Sakakura et al. (9) reported on the differential gene expression profiles in gastric cancer cell lines established from primary tumor and malignant ascites. Furthermore, Virtanen et al. (10) showed that gene expression profiling, which is applicable to both lung cancer cell lines and lung tumors, could provide the technology to reclassify cell lines.
Although current treatments have significantly improved patient survival, cancers of the gastrointestinal tract remain the most common cause of cancer-related deaths in the world. However, the evaluation of individual molecules failed to elaborate the complex patterns of carcinogenesis and cancer progression in gastrointestinal cancer. Recent reports suggest that the phenotypic diversity of cancer might be related to corresponding diversity in their gene expression patterns (11–13).
We did a comprehensive gene expression analysis of various cancer cell lines with different tissue origins using a high-density cDNA microarray. Our goals in this study were to identify a subset of classifier genes that distinguish cancer cell lines of gastrointestinal tract origin from other cell lines and to identify genes differently expressed in five gastric cancer cell lines. In addition, because the expression pattern of known genes can show novel phenotypic aspects of the cell lines and tissues (11–13), we tried to explore the molecular characteristics of cancer cell lines with respect to their metastatic behavior to identify optimal cell lines for given biological experiments.
MATERIALS AND METHODS
Cell Lines and Clinical Samples. Twenty cancer cell lines of various tissue origins (stomach: AGS; breast: MDA-MB-231, MDA-MB-435, and MCF7; colon: HCT116, COLO 205, and HT-29; liver: SK-HEP-1 and HepG2; lung: A549 and NCI-H460; lymphoid organ: HL-60, MOLT-4, and Raji; cervix: HeLa; fibrosarcoma: HT-1080; kidney: Caki-2; brain: U-87 MG; melanoma: SK-MEL-2; and pancreas: Capan-2) were obtained from the American Type Culture Collection. Seven cancer cell lines that were established at Yonsei Cancer Metastasis Research Center (CMRC, Seoul, Korea), including YCC-B1, YCC-B2 (from pleural effusion of breast cancer patients), YCC-2, YCC-3, YCC-7 (from ascites of gastric cancer patients), YCC-16 (from the blood of gastric cancer patient), and YCC-P1 (pancreas), were also studied. The cells were cultured and maintained in MEM with 10% fetal bovine serum (Life Technologies, Rockville, MD) in 100 units/mL penicillin and 0.1 mg/mL streptomycin (Life Technologies) at 37°C in a 5% CO2 incubator. Nine tissue samples (three colon cancer tissues, two normal colon tissues, three liver metastatic tumor tissues from colon cancer, and a single sample of normal liver tissue) were collected from patients who had undergone surgery at the Severance Hospital, Yonsei University College of Medicine. Tissue samples were immediately frozen in liquid nitrogen and stored at −80°C until further use.
RNA Preparation and Purification. Total RNA was extracted from the tissues and the cell lines using Trizol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. Tissue RNA was further purified before probe preparation using an RNeasy kit (Qiagen, Hilden, Germany) according to the supplier's manual. The quantity and quality of RNA were evaluated using a GeneSpec III (Hitachi, Tokyo, Japan) and a Gel Documentation-Photo System (Vilber Lourmat, France), respectively.
Probe Preparation. Total RNA (50 μg) was directly labeled and transcribed to cDNA. We combined total RNA from the following 11 cancer cell lines of various tissue origins in equal quantities to prepare the Yonsei reference RNA (CMRC, Yonsei University College of Medicine): stomach, breast, colon, liver, lung, lymphoid and myeloid system, cervix, fibrous tissue, kidney, and brain cancer cell line. Cell lines were selected from various organs to ensure that the pooled RNA contained as many transcripts as possible (see below). Yonsei reference RNA was labeled with Cy3-dUTP (NEN Co., Boston, MA) and test samples were labeled with Cy5-dUTP (NEN).
The labeling was done at 42°C for 2 hours in a total volume of 30 μL containing 400 units SuperScript II (Life Technologies); 3 μL Cy5-dUTP (or Cy3-dUTP); 1.5 μL each of dATP, dCTP, and dGTP; 0.6 μL dTTP; 300 mmol/L; 6 μL of 5× first-strand buffer; and 4 μg of modified oligo(dT) primer. Unincorporated nucleotides were removed using a PCR purification kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Eluted probes were then mixed and supplemented with 20 μL of 1 μg/μL human Cot1 DNA (Life Technologies), 2 μL of 10 μg/μL polyadenylate RNA (Sigma, St. Louis, MO), and 2 μL of 10 μg/μL and 288 μL of 1 mol/L TE buffer. This probe mixture was concentrated using a Microcon-30 tube (Millipore, Bedford, MA), and the labeled mixture (48 μL) was mixed with 10.2 μL of 20× SSC and 1.8 μL of 10% SDS for hybridization.
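The final SSC and SDS concentrations of the hybridization mix follow from simple dilution arithmetic. The sketch below assumes that the three stated volumes (48 μL probe, 10.2 μL of 20× SSC, 1.8 μL of 10% SDS) are the only components of the final mix and that the concentrated probe contributes no SSC or SDS; the function name is illustrative.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution arithmetic: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume / total_volume

# Volumes in microliters, taken from the protocol text.
probe_ul = 48.0
ssc_ul = 10.2   # 20x SSC stock
sds_ul = 1.8    # 10% SDS stock
total_ul = probe_ul + ssc_ul + sds_ul

# Assumption: the probe itself adds no SSC or SDS.
final_ssc_x = final_concentration(20.0, ssc_ul, total_ul)
final_sds_pct = final_concentration(10.0, sds_ul, total_ul)

print(f"total volume: {total_ul:.1f} uL")
print(f"final SSC: {final_ssc_x:.2f}x, final SDS: {final_sds_pct:.2f}%")
```

Under these assumptions the mix comes to 60 μL at about 3.4× SSC and 0.3% SDS.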
Hybridization of Fluorescence-Labeled cDNA. We used the 7.5K human cDNA microarray (GenomicTree Co., Daejon, Korea), which contains 6,360 known genes and 152 expressed sequence tags. Before hybridization, slides were preblocked in a solution of 10 mg/mL bovine serum albumin, 3.5× SSC, and 0.1% SDS to prevent nonspecific hybridization, and probe mixtures were heated at 95°C for 2 minutes and centrifuged at 13,000 rpm for 2 minutes. After the probe was applied, the slides were hybridized in hybridization chambers (GenomicTree) at 65°C for 16 hours. After hybridization, slides were washed in 2× SSC for 10 minutes, transferred to 0.1× SSC and 0.1% SDS for 10 minutes, and rinsed twice with 0.1× SSC for 10 minutes. After washing, slides were spun at 600 rpm for 5 minutes (Hanil Science Industrial Co., Incheon, Korea). Hybridized slides were scanned using a GenePix 4000B (Axon Instruments, Union City, CA) and the images were analyzed using GenePix Pro 3.0 (Axon Instruments).
Data Analysis. All the array data were normalized by intensity-dependent, within-print-tip normalization with lowess fit (14, 15). We then selected genes with <20% missing values among the 27 experiments, leaving 3,658 genes for further analysis. From the remaining genes, we determined a subset of classifier genes to distinguish gastrointestinal origin from other cell lines in three steps (Fig. 1). First, we chose a subset of genes that classify the origin of the cell lines using the multiclass significance analysis of microarrays (SAM; refs. 16, 17) algorithm (origin classifier genes; Fig. 1A). For group 1, the gastrointestinal tract origin cell lines, we used four gastric cancer (AGS, YCC-2, YCC-3, and YCC-7) and three colorectal cancer (HCT116, COLO 205, and HT-29) cell lines. The YCC-16 cell line was not used in any gene selection procedure because it was established from the blood of a gastric cancer patient. Group 3 comprised the hematologic cells (Raji, HL-60, and MOLT-4), and group 2 consisted of the remaining 16 cell lines. Next, we selected a subset of classifier genes specific to cell lines by using two-class SAM (genes related to the cell line establishment; Fig. 1B). Thereafter, by using our Difference Program (CMRC, Yonsei College of Medicine, Korea), we removed the subset of genes specific for cell lines from the subset of classifier genes specific for tissue origin (Fig. 1C). The Difference Program, written in Python (http://www.python.org), was developed by CMRC to easily and rapidly obtain the complement set of genes, that is, the genes that belong only to a specific group of samples. Using Pearson correlation, we did two-way hierarchical clustering of selected genes and samples. TreeView was used to visualize the results using the GeneSpring program (Silicon Genetics, Inc., Redwood City, CA). Functional annotation is based on the Stanford Web site (http://genome-wwws.stanford.edu/cgi-bin/source/sourceSearch).
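The normalization step can be illustrated with a simplified sketch. It computes M (log2 Cy5/Cy3) and A (mean log2 intensity) for each spot, fits M against A separately within each print-tip group using a tricube-weighted local linear fit, and subtracts the fit. This is not the authors' implementation: the function names, the `span` value, and the omission of the robustness iterations used by full lowess are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def ma_values(cy5, cy3):
    """Return (M, A): log2 ratio and mean log2 intensity for one spot."""
    m = math.log2(cy5 / cy3)
    a = 0.5 * (math.log2(cy5) + math.log2(cy3))
    return m, a

def local_linear_fit(a_vals, m_vals, span=0.4):
    """Fit M as a smooth function of A using tricube-weighted local lines.

    Simplified lowess: one pass, no robustness iterations.
    """
    n = len(a_vals)
    k = max(2, int(math.ceil(span * n)))  # neighbours per local fit
    fitted = []
    for x0 in a_vals:
        dists = sorted(abs(x - x0) for x in a_vals)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth: k-th nearest distance
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(a_vals, m_vals):
            u = abs(x - x0) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate case: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted

def normalize_within_print_tip(spots, span=0.4):
    """spots: list of (print_tip_id, cy5, cy3). Returns normalized M per spot."""
    groups = defaultdict(list)
    for idx, (tip, cy5, cy3) in enumerate(spots):
        groups[tip].append((idx, *ma_values(cy5, cy3)))
    normalized = [None] * len(spots)
    for members in groups.values():
        a_vals = [a for _, _, a in members]
        m_vals = [m for _, m, _ in members]
        fit = local_linear_fit(a_vals, m_vals, span)
        for (idx, m, _), f in zip(members, fit):
            normalized[idx] = m - f
    return normalized
```

Fitting each print-tip group separately removes dye bias that varies with both spot intensity and position on the slide, which is the motivation for the within-print-tip design.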
Real-time Reverse Transcription-PCR Analysis. To validate the microarray data, we did real-time reverse transcription-PCR (RT-PCR) using RNA from 16 of the cell lines used in the microarray experiments. The 16 cell lines tested were YCC-B2, MDA-MB-231, MCF7, HCT116, COLO 205, HT-29, SK-HEP-1, HepG2, HL-60, MOLT-4, Raji, AGS, YCC-2, YCC-3, YCC-7, and YCC-16. Six genes were randomly chosen from the selected gene set: dystroglycan 1 (DAG1, AA496691), integrin β5 (ITGB5, AA434397), keratin 8 (KRT8, AA598517), neuregulin 1 (NRG1, R72075), paxillin (PXN, AA430574), and suppression of tumorigenicity 14 (ST14, AA489246). The primer sequences of the six genes and β-actin were as follows (forward/reverse): DAG1 (268 bp) AGTATCCACACCAAAACCAG/CTTCACCTCAAAGTAGGTGC, ITGB5 (244 bp) CTGTCCATGAAGGATGACTT/TGTCCACTCTGTCTGTGAGA, KRT8 (259 bp) ACAAGTTTGCCTCCTTCATA/CGCTTATTGATCTCATCCTC, NRG1 (275 bp) AGTATCCACAGAAGGAGCAA/ATTCAACATGATGCAACAAA, PXN (159 bp) CGCTCTGTTTATAGTGACCC/AATCACAGGAATTGAAATGG, ST14 (145 bp) CTCCACTGAGTTTGTAAGCC/AGACCAGTAGTAGGCGATGA, and β-actin (98 bp) GGGAATTCAAAACTGGAACGGTGAAGG/GGAAGCTTATCAAAGTCCTCGGCCACA.
After cDNA was produced from 5 μg of total RNA in a reaction with oligo(dT) primer and 40 IU Moloney murine leukemia virus reverse transcriptase (MBI Fermentas, Hanover, MD), 2 μL of cDNA from each cell line were used for the real-time PCR assay. The total volume of the reaction mixture was 20 μL, which contained HotstarTaq DNA polymerase, QuantiTect SYBR Green PCR buffer, deoxynucleotide triphosphate mix, including dUTP, SYBR Green, 10 μL of QuantiTect SYBR Green PCR kit including 2.5 mmol/L MgCl2 (Qiagen, Valencia, CA), 2 μL of the cDNA, and 20 pmol of each primer in distilled water. PCR was done at 95°C for 15 minutes to activate the HotstarTaq DNA polymerase, followed by 35 cycles of amplification at 95°C for 20 seconds, 50°C for 30 seconds, and 72°C for 45 seconds on a Rotor Gene 2072D real-time PCR machine (Corbett Research, Sydney, New South Wales, Australia). The amplified fluorescence signal in each specimen was measured at the late extension step of each cycle.
To quantify the level of gene expression, we evaluated β-actin gene expression in 10-fold serially diluted human genomic DNA (Promega, Madison, WI). The standard curve was drawn by plotting the measured threshold cycle against the arbitrary unit of copies/reaction according to the β-actin gene expression of the diluted genomic DNA. The threshold cycle (Ct) value was determined as the cycle number at which the fluorescence exceeded the threshold value. In the negative control, there was no fluorescent signal even when the cycle number was increased to 35. For comparison with the microarray results, the calculated copies of each gene were divided by the copies of β-actin and then log-transformed. The Pearson correlation coefficient was evaluated using S-PLUS 2000 (MathSoft, Inc., Seattle, WA) and the plotting was done using SigmaPlot 8.0 (SPSS, Inc., Chicago, IL).
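The quantification described here can be sketched as a straight-line standard curve of Ct against log10(copies), inverted to estimate copies and then normalized to β-actin. The dilution values below are hypothetical, the function names are illustrative, and the log base is an assumption (log2 here), because the text says only that the ratio was log converted.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies/reaction (arbitrary units)."""
    return 10 ** ((ct - intercept) / slope)

def relative_log_expression(ct_gene, ct_actin, slope, intercept):
    """log2 of (gene copies / beta-actin copies); log base is an assumption."""
    gene = copies_from_ct(ct_gene, slope, intercept)
    actin = copies_from_ct(ct_actin, slope, intercept)
    return math.log2(gene / actin)

# Hypothetical 10-fold dilution series (arbitrary copy units) and Ct readings.
dilution_copies = [1e1, 1e2, 1e3, 1e4, 1e5]
dilution_ct = [34.7, 31.4, 28.1, 24.8, 21.5]
slope, intercept = fit_standard_curve(dilution_copies, dilution_ct)

value = relative_log_expression(ct_gene=28.1, ct_actin=24.8,
                                slope=slope, intercept=intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, log2 ratio={value:.2f}")
```

Because the same curve converts both Ct values, the result depends only on the Ct difference and the fitted slope.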
RESULTS
Analysis of Global Gene Expression Patterns of Cancer Cell Lines. We did cDNA microarray experiments with 7,500 cDNA spots, representing ∼6,512 genes, to investigate the comprehensive gene expression patterns of 27 cancer cell lines and 9 tissues. First, using the 7,128 spots that passed simple filtering, we applied a hierarchical clustering algorithm to group genes as well as cell lines and tissue samples according to similarities in their expression patterns. Hierarchical clustering showed that the samples were grouped into two major branches, tissue samples and cell lines (data not shown). Cell lines with a common tissue origin clustered together, forming distinct clusters of lymphoid organs, the gastrointestinal tract, and the remaining cell lines, with only a few exceptions. Interestingly, the differences in gene expression patterns among the 27 cell lines were much greater than those between normal and tumor tissues. Nevertheless, normal tissues were separated from the tumor tissues.
Selection of a Subset of Genes That Discriminate Cancer Cell Lines of a Gastrointestinal Origin from Other Cancer Cell Lines of a Different Origin. Following the gene selection scheme in Fig. 1, we selected significant genes from the 3,658 filtered genes. First, we used multiclass SAM to choose a subset of 199 cell origin classifier genes, which were assumed to differentiate cancer cell lines according to their origins: gastrointestinal tract, lymphoid organs, and remaining cell lines (false discovery rate 1%, δ = 0.5751; Fig. 1A). Next, by using two-class SAM, we selected a subset of 1,158 genes specific to cell lines compared with tissues (false discovery rate 10%, δ = 0.9765; Fig. 1B). Then, to identify the subset of genes specific for gastrointestinal tract cell lines, we filtered out the genes related to cell line establishment from the 199 origin classifier genes using our Difference Program. In this way, we selected 115 genes that might discriminate gastrointestinal cancer cell lines, including colon and stomach cancer cell lines, from others (Fig. 1C).
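The filtering step performed with the Difference Program amounts to a set difference. The sketch below is not the CMRC program, whose code is not given here; the function name and gene identifiers are hypothetical placeholders.

```python
def difference(origin_genes, cell_line_genes):
    """Return genes in origin_genes that are absent from cell_line_genes.

    Input order is preserved so the result can be traced back to the SAM output.
    """
    excluded = set(cell_line_genes)
    return [gene for gene in origin_genes if gene not in excluded]

# Hypothetical identifiers standing in for the two SAM-selected gene lists.
origin_classifier = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"]
cell_line_specific = ["gene_b", "gene_d", "gene_x"]

gi_specific = difference(origin_classifier, cell_line_specific)
print(gi_specific)
```

With the counts reported above, going from 199 origin classifier genes to 115 implies that 84 of them also appeared in the 1,158-gene cell line set.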
Gene Expression Patterns Related to Tissue Origins in Various Cancer Cell Lines. Hierarchical clustering analysis with the selected 115 genes related to gastrointestinal tract tissue origins showed that the cell lines were generally classified based on tissue origin with a few exceptions, such as MCF7, YCC-B1, and NCI-H460, which clustered in the gastrointestinal branch (Fig. 2A). We observed that the gastric cancer cell lines YCC-2, YCC-3, and YCC-7 established at Yonsei CMRC were grouped with other gastrointestinal tract cancer cells, such as AGS, HT-29, COLO 205, and HCT116 obtained from American Type Culture Collection (Fig. 2A). In addition, when the selected subset of genes was used for the clustering analysis, we observed that tissues of gastrointestinal origin completely integrated into gastrointestinal cell lines (Fig. 2A). However, in case of colon cancer cell lines, including COLO 205, HT-29, and HCT116, each cell line showed a unique expression pattern, suggesting different biological phenotype in each cell line.
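Hierarchical clustering on Pearson correlation can be sketched in a few lines. The example below uses 1 − r as the distance and average linkage; the linkage rule is an assumption, because the text states only that Pearson correlation was used, and the sample names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r.

    profiles: dict mapping sample name -> list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        d, i, j = min((cluster_distance(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log-ratio profiles over four genes.
example = {
    "gi_1": [1.0, 2.0, 3.0, 4.0],
    "gi_2": [2.0, 4.0, 6.0, 8.5],
    "lymphoid_1": [4.0, 3.0, 2.0, 1.0],
    "lymphoid_2": [8.0, 6.0, 4.5, 2.0],
}
for left, right, d in average_linkage(example):
    print(left, right, round(d, 3))
```

In this toy input the two positively correlated pairs merge first and the two resulting clusters join last, mirroring how samples with similar profiles end up on the same branch.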
Gene Expression Patterns According to Metastatic Pattern in Gastric Cancer. By comparing the gene expression profiles of five gastric cancer cell lines with different metastatic patterns (18), we discovered a subset of 66 differentially regulated genes in YCC-16, a cell line isolated from the blood of a gastric cancer patient (Tables 1 and 2), suggesting a subset of genes associated with greater hematogenous metastatic potential. Gastric cancer cell lines were divided into independent branches according to metastatic character: AGS (primary), YCC-2, YCC-3, and YCC-7 (peritoneal) in one branch and YCC-16 (hematogenous) in another, as shown in TreeView (Fig. 2B). Among the 66 genes, 40 were down-regulated (Table 1) and 26 were up-regulated (Table 2) in the blood-borne YCC-16 cell line compared with the other four gastric cancer cell lines established from the peritoneal ascites of advanced gastric cancer patients. In addition, based on Pearson correlation, we observed that cancer cell lines of lymphoid origin clustered closely with YCC-16, which originated from blood-borne cancer cells of a gastric cancer patient (Fig. 2C). Of the 66 genes, 26 (22 down-regulated and 4 up-regulated) showed similar expression in the hematogenously metastasized YCC-16 gastric cancer cell line and the three lymphoid origin cell lines (Tables 1 and 2).
Table 1. Genes down-regulated in YCC-16 compared with the other four gastric cancer cell lines

Name | GenBank accession no. | Name | GenBank accession no. |
---|---|---|---|
myosin IC* | AA029956 | myosin VI | AA625890 |
zinc finger protein 38 | AA088434 | LCA homologue* | AA598513 |
vaccinia-related kinase-1 | AA112979 | pleiomorphic adenoma gene-like 2 | AA704187 |
major vault protein* | AA158991 | serine/threonine kinase 18 | AA732873 |
KDEL receptor* | AA181085 | KIAA1594 protein | AA876039 |
butyrate response factor 1* | AA424743 | myocilin* | AI971049 |
destrin (actin-depolymerizing factor)* | AA424824 | keratin 14 | H44051 |
PXN* | AA430574 | hypothetical protein MGA 20576* | H51645 |
ankyrin repeat-containing protein | AA434117 | crystallin ζ (quinone reductase)* | R40946 |
ITGB5 | AA434397 | neuronal protein 4.1* | R71689 |
syntaxin 3A* | AA436871 | topoisomerase DNA IIβ (180 kDa) | T59934 |
E74-like factor 2 | AA447783 | splicing factor, arginine/serine-rich, 4 | W87714 |
ribosome binding protein 1 | AA447804 | tetraspan 3* | AA284492 |
coronin, actin-binding protein 1C* | AA456063 | malate dehydrogenase 1, NAD* | AA403295 |
HTOM34P* | AA457118 | hydroxysteroid dehydrogenase 4 | AA487914 |
cyclin D1 (PRAD1)* | AA487486 | copine III* | AA505111 |
ST14 | AA489246 | ribosomal protein S9 | AW074994 |
DAG1* | AA496691 | EST* | H57136 |
heat shock protein 75* | AA497020 | arfaptin 1 | T52363 |
KRT8* | AA598517 | keratin 13 | W60057 |
*Indicates the 22 genes for which YCC-16 and the lymphoid cancer cell lines show similar gene expression patterns.
Table 2. Genes up-regulated in YCC-16 compared with the other four gastric cancer cell lines

Name | GenBank accession no. |
---|---|
peptidase β* | N29844 |
lysylhydrogenase 2 | AA136707 |
microtubule-associated protein τ | AA199717 |
sequestosome 1 | AW074995 |
glutamic-oxaloacetic transaminase A | H22856 |
trinucleotide repeat containing 3 | N57754 |
NRG1 | R72075 |
protein kinase MUK* | AA053674 |
casein kinase A2 | AA054996 |
polymerase RNA III (DNA directed) | AA282063 |
cyclin-dependent kinase inhibitor 3 | AA284072 |
CD151 antigen | AA443118 |
paraoxonase 2 | AA446028 |
zinc finger protein 131 | AA448919 |
lactate dehydrogenase C | AA453467 |
PTD017 protein* | AA464612 |
cartilage-associated protein | AA486278 |
thyroid receptor interacting protein 15 | AA625651 |
BCL2-associated athanogene* | AI017240 |
cyclin-dependent kinase inhibitor 1A | AI952615 |
acetyl-CoA acyltransferase 2 | H07926 |
leptin receptor gene–related protein | H51066 |
replication factor C (activator 1) 3 | N39611 |
integrin β-like 1 | N52533 |
TNFR1 | W02761 |
integrin β1 | W67174 |
*Indicates the four genes for which YCC-16 and the lymphoid cancer cell lines show similar gene expression patterns.
Comparison of Gene Expression between Real-time RT-PCR and Microarray Analysis. From the 66 selected genes, we randomly chose six for validation by real-time RT-PCR: PXN, ITGB5, DAG1, KRT8, ST14, and NRG1. Sixteen cell lines of breast, gastric, colorectal, liver, and lymphoid origin were chosen. First, we evaluated the correlation between real-time RT-PCR and microarray. The Pearson correlation coefficients were PXN 0.86, KRT8 0.94, ITGB5 0.65, DAG1 0.45, ST14 0.5, and NRG1 −0.13. The comparison of PXN and KRT8 RNA expression levels between the two methods is displayed in Fig. 3A and B and shows good correlation (PXN r2 = 0.73, KRT8 r2 = 0.84). Then, we clustered five gastric, three colorectal cancer, and three lymphoid origin cell lines according to the real-time RT-PCR results for PXN and KRT8, the two genes with the highest correlation between the two methods (Fig. 3C). The clustering clearly discriminated the three lymphoid origin cell lines from the gastrointestinal tract cell lines. In particular, YCC-16, which resembled the lymphoid cell lines in the microarray data (Fig. 2B), was closely related to lymphoid cells (Fig. 3C). Next, to evaluate how well the real-time RT-PCR gene expression results discriminate gastrointestinal cancer types, we grouped the cell lines by tumor origin into colorectal, gastric, and hematologic malignant cells. When we compared the mean expression levels of the six genes by detection method, we observed that the expression patterns of gastrointestinal tract cell lines differed from those of hematologic malignant cells in both microarray (Fig. 4A) and real-time RT-PCR (Fig. 4B).
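The per-gene agreement between the two platforms is a Pearson correlation over paired measurements across cell lines. A minimal sketch follows; the paired values are hypothetical and do not reproduce the coefficients reported above, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired measurement lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values for one gene across five cell lines:
# microarray log ratio vs. log-transformed RT-PCR ratio to beta-actin.
microarray_log_ratio = [-1.2, -0.4, 0.1, 0.9, 1.5]
rtpcr_log_ratio = [-2.0, -0.9, 0.3, 1.1, 2.4]

r = pearson_r(microarray_log_ratio, rtpcr_log_ratio)
print(f"r = {r:.2f}")
```

A coefficient near 1 means the two methods rank the cell lines consistently for that gene, even though their absolute scales differ.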
DISCUSSION
cDNA microarray technologies allow us to comprehensively investigate cancer in various aspects at the RNA level (4–7). To understand cancer biology, the gene expression patterns measured in cell lines using microarray can help to distinguish them at a molecular level in histologically complex cancer specimens (19). Although laser microdissection has made it possible to analyze the expression patterns of cancer cells alone from clinical tumor tissues (20), tumor heterogeneity and the significant role of stromal tissues cannot be neglected when drawing conclusions. Hence, to identify the “molecular portrait” of the tissues from which the cell lines were derived, we evaluated the genome-wide expression profiles of 9 tissues and 27 cell lines. First, we confirmed that the tissues and the cell lines clustered separately regardless of tissue origin (data not shown). Many reports have suggested that a given cell line differs from its original tissue both biologically and genetically (5, 10, 21). Possible reasons are (a) the cell line is established from a selected clone among various cells in the tissue; (b) tissues are composed of a variety of cell types, such as fibroblasts, blood vessel cells, immune system cells, and fat cells; and (c) cells in an in vitro system continuously change to adjust to the artificial environment.
We specifically focused on genes significantly expressed in cell lines derived from the gastrointestinal tract because gastric and colon cancers are the leading causes of cancer death in Asia. Many approaches have been suggested for selecting specific gene sets from a variety of data sets. In our study, to determine the subset of genes related to the gastrointestinal tract cell lines, we first used direct two-class SAM comparing gastrointestinal cells with other cell lines (with or without lymphoid cells). However, the selected genes (93 genes with lymphoid cells and 45 genes without lymphoid cells) failed to distinguish the gastrointestinal tract cancer cells from others (data not shown). Therefore, we introduced our current indirect data analysis scheme (Fig. 1) and found that 115 genes successfully categorized the cells based on their origins, especially in the case of gastrointestinal tract cells (Fig. 2). Using the 115 selected genes, we observed that the MDA-MB-231 and YCC-B1 breast cancer cells and the NCI-H460 lung cancer cells lay inside the gastrointestinal tract cluster. This may be because those cancer cell lines are more heterogeneous than others (5, 10), and the 115 genes were selected primarily to differentiate the gastrointestinal tract cancer cells. Although the best false discovery rate, δ, optimal gene number, and grouping for gene selection are still debated, we showed that the specific gene sets selected using SAM might be useful for understanding and categorizing cancer cell lines based on their origins.
This gene set might be applied to identify the origin of cancer cells of ambiguous origin, especially when a gastrointestinal tract origin is suspected. Although it may not be conclusive for an exact diagnosis, it could be helpful as complementary information. Gene expression profiles of cancer cell lines could reflect novel aspects of the phenotype (11–13). Based on previous biological studies (18), including the cell doubling time, tumorigenicity in soft agar or the xenograft model, and motility assay, YCC-16 was the most aggressive cell line among the five gastric cancer cell lines used in the present study. This explains why YCC-16 is not clustered with the other gastric cancer cells. In addition, this cell line was established from the bloodstream, in contrast to the other cell lines established from primary tumor (AGS) and ascites (YCC series), which might explain why its gene expression is tightly clustered with blood cells. In other words, if tumor cells show the same expression pattern as YCC-16 for this subset of genes, one might presume that those tumors may survive in the bloodstream and can metastasize to distant organs. Our data may be consistent with the idea that some primary tumors may be preprogrammed to spread to other organs and that this propensity could be detectable at the time of initial diagnosis and therefore be used as a prognostic indicator (22, 23).
It is well known that alterations in adhesion molecules and apoptosis-related genes facilitate cell escape from the primary site, induce resistance to cell death, and finally lead to metastasis (24–30). Among the selected genes, PXN, ITGB5, coronin, actin-binding protein 1C, BAG family molecular chaperone regulator-1, cyclin-dependent kinase inhibitor 1A (p21), and fibronectin receptor (integrin β1) are involved in cell adhesion or apoptosis. Members of the coronin family are involved in apoptosis, whereas BAG family molecular chaperone regulator-1 has antiapoptotic activity and increases the anti–cell death function of bcl-2 (http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch). In addition, p21 is known to inhibit apoptosis (25, 26). In YCC-16, the expression of coronin was down-regulated, whereas BAG family molecular chaperone regulator-1 and p21 were up-regulated. The decreased apoptosis could give cancer cells the opportunity to spread to distant organs with a high survival rate. Consistent with our data, increased expression of integrin β1 was reported to correlate with increased invasion and metastasis in some cancers (24, 29, 30).
Microarray results have usually required validation of gene expression by other methods, such as RT-PCR or real-time PCR. This practice began when microarray quality was unsatisfactory because of problems in clone preparation, chip production, hybridization variability, or inadequate data analysis. However, with improved chip quality and hybridization techniques, recent microarray data have shown good correlation with RT-PCR and reliable biological meaning (17, 21, 22). We did real-time PCR analysis not only for technical validation of the microarray results but also for potential biological validation of the selected genes. In this study, we confirmed that the gene expression levels from the two methods were well correlated, especially for PXN and KRT8, and relatively well correlated for ITGB5, DAG1, and ST14. However, NRG1 showed a markedly low correlation, which might be because several cell lines did not express NRG1 in real-time RT-PCR. A basic technological difference between the two methods is that microarray analysis relies on simultaneous competitive hybridization. Moreover, we used reference RNA pooled from 11 cancer cell lines, so the microarray yields expression data for each sample relative to that reference, whereas real-time RT-PCR detects the expression level of the sample itself. A method for directly integrating the two data sets has not yet been established. Regardless of the varying correlation between microarray and real-time RT-PCR analysis, we observed that the RT-PCR expression patterns of the six selected genes successfully discriminated gastrointestinal tract from lymphoid origin cells, suggesting that the genes were properly selected.
In conclusion, evaluation of genome-wide gene expression profiles of various cancer cell lines provides significant genetic information for understanding cancer biology, the characteristics of tumor types, and the specific characteristics of each cell line. It may be helpful for identifying and categorizing optimal cell lines for a given experimental purpose.
Grant support: Korea Science and Engineering Foundation through the Cancer Metastasis Research Center at Yonsei University College of Medicine.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.