The identification of genes that contribute to the biological basis for clinical heterogeneity and progression of prostate cancer is critical to accurate classification and appropriate therapy. We performed a comprehensive gene expression analysis of prostate cancer using oligonucleotide arrays with 63,175 probe sets to identify genes and expressed sequences with strong and uniform differential expression between nonrecurrent primary prostate cancers and metastatic prostate cancers. The mean expression value for >3,000 tumor-intrinsic genes differed by at least 3-fold between the two groups. This includes many novel ESTs not previously implicated in prostate cancer progression. Many differentially expressed genes participate in biological processes that may contribute to the clinical phenotype. One example was a strong correlation between high proliferation rates in metastatic cancers and overexpression of genes that participate in cell cycle regulation, DNA replication, and DNA repair. Other functional categories of differentially expressed genes included transcriptional regulation, signaling, signal transduction, cell structure, and motility. These differentially expressed genes reflect critical cellular activities that contribute to clinical heterogeneity and provide diagnostic and therapeutic targets.

Carcinoma of the prostate, the most common cancer in the United States, was expected to affect ∼198,000 individuals in 2001 (1). In recent years, there has been a dramatic increase in the proportion of patients diagnosed with tumors that are seemingly confined to the gland. This is the result of increased public awareness and the more widespread application of detection strategies based on measurements of the level of prostate-specific antigen in the blood (2). Early disease is clinically heterogeneous because many patients have an indolent course that does not significantly affect an individual patient’s survival. In contrast, once metastatic disease is documented on an imaging study, the majority of patients die from their tumors as opposed to other causes. This has led us to consider the disease as a series of “states” that include clinically localized tumors, and those that have metastasized, as a framework to assess the clinical and biological factors that are associated with specific phenotypes and outcomes (3). Understanding the biological basis for this clinical heterogeneity is critical to assess prognosis, to select therapy, and to assess treatment effects.

Tumor metastasis is the most clinically significant event in prostate cancer patients. Development of metastases requires that a cancer cell complete a series of steps involving complex interactions between tumor cells and the host (4). Cells from primary tumors must detach, invade stromal tissue, and penetrate vessels by which they disseminate. They must survive in the circulation to reach a secondary site in which they arrest because of physical size or binding to specific tissues. To form clinically significant tumors, metastatic cells must proliferate in the new microenvironment and recruit a blood supply. Those tumor cells growing at metastatic sites are then continually selected for growth advantage. This is a complex and dynamic process that is expected to involve alterations in many genes and transcriptional programs. Identification of genes, gene expression profiles, and biological pathways that contribute to metastasis will be of significant benefit to improved tumor classification and therapy. To address this question, we performed a genome-wide expression analysis and identified genes differentially expressed in primary and metastatic prostate cancers. These genes reflect the distinct clinical phenotype of these two cohorts, and provide insight into the biology of prostate cancer progression.

Samples.

Tissues from 3 noncancerous prostates, 23 primary prostate cancers, and 9 metastatic prostate cancers, obtained from therapeutic or diagnostic procedures performed as part of routine clinical management at MSKCC (see footnote 3) from 1993 to 1999, were analyzed. Clinical and pathological features of prostate cancer cases from which samples were obtained are presented in Table 1. Samples were snap-frozen in liquid nitrogen and stored at −80°C. Each sample was examined histologically using H&E-stained cryostat sections. Cells of interest were manually dissected from the frozen block, trimming away other tissues. Care was taken to remove nonneoplastic tissues from tumor samples. All of the studies were conducted under MSKCC Institutional Review Board-approved protocols.

Gene Expression Analysis.

RNA was extracted from frozen tissues by homogenization in guanidinium isothiocyanate-based buffer (RNeasy; Qiagen, Valencia, CA) and evaluated for integrity by denaturing agarose gel. Complementary DNA was synthesized from total RNA using a T7-promoter-tagged oligo(dT) primer. RNA target was synthesized by in vitro transcription and labeled with biotinylated nucleotides (Enzo Biochem, Farmingdale, NY). Labeled target was assessed by hybridization to Test2 arrays (Affymetrix, Santa Clara, CA) and detected with phycoerythrin-streptavidin (Molecular Probes, Eugene, OR) amplified with antistreptavidin antibody (Vector, Burlingame, CA). Gene expression analysis was carried out using Affymetrix U95 human gene arrays with 63,175 features for individual gene/EST clusters using instruments and protocols recommended by the manufacturer. The U95 set consists of five distinct microarrays (A through E), each containing probes for about 12,000 unique gene/EST transcripts. Two response measures, the Average Difference and Absolute Call, were extracted for each gene on every sample, as determined by default settings of Affymetrix Microarray Suite 4.0. Average Difference was used as the primary measure of expression level, and Absolute Call was retained as a secondary measure. Expression values on each array were multiplicatively scaled to have an average expression of 2500 across the central 96% of all genes on the array.
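The scaling step can be illustrated with a minimal sketch. It assumes that "central 96%" means discarding the lowest 2% and highest 2% of values on an array before averaging; the function name `scale_array` and its parameters are hypothetical and are not part of Microarray Suite.

```python
def scale_array(values, target=2500.0, central_fraction=0.96):
    """Multiplicatively scale one array so that its trimmed mean equals `target`.

    Assumption: the trimmed mean is taken over the central 96% of sorted
    values, i.e. 2% of the values are dropped from each tail.
    """
    if not values:
        raise ValueError("array has no expression values")
    ordered = sorted(values)
    n = len(ordered)
    # Number of values removed from each tail of the sorted list.
    cut = int(round(n * (1.0 - central_fraction) / 2.0))
    core = ordered[cut:n - cut] if n - 2 * cut > 0 else ordered
    trimmed_mean = sum(core) / len(core)
    if trimmed_mean == 0:
        raise ValueError("trimmed mean is zero; scale factor is undefined")
    factor = target / trimmed_mean
    # Every value on the array, including the trimmed tails, is rescaled.
    return [v * factor for v in values], factor
```

Because the factor is applied to every value on the array, ratios between genes within an array are unchanged; only the overall intensity level is made comparable across arrays.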

Data Analysis.

For oligonucleotide arrays, scanned image files were visually inspected for artifacts and analyzed using Microarray Suite v4.0 (Affymetrix). Differential expression was evaluated using several measures. Final ranking to obtain genes uniformly and strongly differentially expressed was determined by the following approach (fully described in supplementary information; see footnote 4). The expression data set was first filtered to include only those probe sets detecting genes with mean expression values that differed by at least 3-fold between the two groups (absolute base-ten logarithm of the ratio of the means ≥0.4771). Probes were then ranked based on the relative magnitude of the difference (t test) between the means of the two sample sets. Genes with expression differences likely attributable to contaminating nonneoplastic tissues were removed from the ranking. Datasets used for hierarchical clustering were normalized by standardizing each gene and array to mean = 0 and variance = 1. Hierarchical clustering and result display were performed using Cluster and TreeView software (5). Gene expression that was attributable to nonneoplastic tissues was identified by comparison with expression levels in nonneoplastic prostate samples (supplementary information). Specific genes corresponding to Unigene clusters were identified by GenBank accession number of the clone used to produce the oligonucleotide probe set and annotated through review of internet resources (see footnote 5).
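The filter-then-rank procedure can be sketched as follows. The text does not state which variant of the t test was used, so Welch's unequal-variance statistic is assumed here; the flooring of group means at 1.0 (to keep the ratio defined when Average Difference values are zero or negative) is also a hypothetical safeguard, as are all function and variable names.

```python
import math
from statistics import mean, variance

# log10(3) is about 0.4771, the cutoff quoted in the text for a 3-fold difference.
LOG10_FOLD_CUTOFF = math.log10(3)


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant of the t test)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # No within-group variation: the statistic is unbounded (or zero).
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def rank_probes(primary, metastatic, floor=1.0):
    """Filter probe sets by fold change, then rank the survivors by |t|.

    primary, metastatic: dicts mapping probe id -> list of expression values.
    Returns a list of (probe id, log10 ratio metastatic/primary, t statistic).
    """
    ranked = []
    for probe, p_values in primary.items():
        m_values = metastatic[probe]
        # Hypothetical safeguard: floor the means so the ratio stays positive.
        mean_p = max(mean(p_values), floor)
        mean_m = max(mean(m_values), floor)
        log_ratio = math.log10(mean_m / mean_p)
        if abs(log_ratio) < LOG10_FOLD_CUTOFF:
            continue  # less than 3-fold in either direction
        ranked.append((probe, log_ratio, welch_t(m_values, p_values)))
    ranked.sort(key=lambda row: abs(row[2]), reverse=True)
    return ranked
```

A positive log ratio marks a probe set that is higher in metastases and a negative one marks a probe set that is higher in primary tumors. Removal of probe sets attributable to nonneoplastic tissue is a separate step and is not shown.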

Immunohistochemistry.

Multitissue blocks of formalin-fixed, paraffin-embedded tissue corresponding to the samples used in this analysis were prepared using a tissue arrayer (Beecher Instruments, Silver Spring, MD). The blocks contained three representative 0.6 mm cores from diagnostic areas for each case. Immunohistochemical detection of Ki67 (mib1; Dako, Carpinteria, CA; 1:200) was carried out with standard streptavidin-biotin peroxidase methodology using formalin-fixed, paraffin-embedded tissue, and microwave antigen retrieval as described previously (6).

Quantitative Reverse Transcriptase-PCR.

Quantitative reverse transcriptase-PCR was performed using the LightCycler thermal cycler system (Roche Diagnostics, Basel, Switzerland). Fifty to 100 ng of total RNA were used as template with the LightCycler RNA SYBR Green I system (2.7× concentration; Roche Diagnostics) according to the manufacturer’s instructions. A typical protocol included reverse transcription at 61°C for 20 min and a denaturation step at 95°C for 2 min followed by 45 cycles with 95°C denaturation for 5 s, 60°C annealing for 5 s, and 72°C extension for 8 s. Detection of the fluorescent product was performed at the end of the extension period. A melting curve analysis was performed at the end of the PCR at 95°C for 5 s, 65°C for 15 s, and a final denaturation at 95°C for 0 s. Negative controls were run to confirm that the samples were not cross-contaminated. Data were analyzed with the LightCycler analysis software. A standard curve was created by serial dilution of the appropriate RNA template. Primers for TAGLN were: forward, 5′-GTCATTGGCCTTCAGATG-3′, and reverse, 5′-ACACCTCAAAGCTTG-3′; for MSMB: forward, 5′-TGTTTCTACACCTGTGGG-3′, and reverse, 5′-TGATAGGCATGGCTACAC-3′; and for 18S RNase: forward, 5′-GACATTGACCTCACCAAG-3′, and reverse, 5′-ATCTTCTTCAGTCGCTCC-3′.
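Quantification against a serial-dilution standard curve can be sketched as below. This is not the LightCycler software's algorithm; it assumes the usual log-linear relationship between crossing point and starting template amount, and the function names are hypothetical. Any dilution values passed to these functions would be illustrative inputs, not data from this study.

```python
import math


def fit_standard_curve(amounts_ng, crossing_points):
    """Least-squares fit of crossing point against log10(template amount)."""
    xs = [math.log10(a) for a in amounts_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(crossing_points) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, crossing_points))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(crossing_point, slope, intercept):
    """Interpolate the template amount corresponding to a crossing point."""
    return 10 ** ((crossing_point - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0
```

With a 10-fold dilution series, a slope near −3.3 cycles per decade corresponds to roughly a doubling of product per cycle. Relative transcript levels for a target gene could then be expressed as the interpolated amount divided by that of a reference transcript measured in the same sample; whether the authors normalized in this way is not stated in the text.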

Gene Expression Analysis Using Oligonucleotide Arrays.

We analyzed gene expression in prostate samples using hybridization of RNA target to oligonucleotide microarrays with 63,175 features for gene/EST clusters. Three nonneoplastic prostate samples, 23 primary prostate carcinomas, and 9 metastatic prostate cancers were evaluated. Expression of genes corresponding to 5,992 probe sets was reliably detected (absolute call of present) in all 23 primary prostate carcinomas; 34,518 probe sets detected gene expression in at least some cases; and 22,665 did not detect expression (absolute call of absent) in any of these samples. Gene expression corresponding to 7,713 probe sets was detected in all 9 metastatic prostate cancers, whereas 29,784 genes/ESTs were variably expressed, and 25,678 genes/ESTs were not detected in any. The highest proportion of expressed genes was represented by probe sets on the U95A microarray corresponding to genes with near full-length cDNAs, most of which are named. Analyses of the gene/EST clusters queried by the U95A array are described in detail below. Identical algorithms were applied to data from arrays U95B-E, which are composed primarily of probe sets for uncharacterized ESTs. Those results are available in the supplementary data.
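The three detection categories partition the probe sets, which can be checked against the reported counts (5,992 + 34,518 + 22,665 = 63,175 for primary tumors; 7,713 + 29,784 + 25,678 = 63,175 for metastases). A minimal sketch of such a tally follows. It assumes calls are coded as "P" (present), "M" (marginal), and "A" (absent), and that any probe set that is neither present in every sample nor absent in every sample counts as variably detected; the treatment of marginal calls and the function name are assumptions.

```python
def tally_calls(calls_by_probe):
    """Count probe sets detected in all, some, or none of the samples.

    calls_by_probe: dict mapping probe id -> sequence of call codes,
    one per sample ('P' present, 'M' marginal, 'A' absent).
    """
    counts = {"all": 0, "some": 0, "none": 0}
    for calls in calls_by_probe.values():
        if all(c == "P" for c in calls):
            counts["all"] += 1
        elif all(c == "A" for c in calls):
            counts["none"] += 1
        else:
            # Assumption: mixed or marginal calls count as variably detected.
            counts["some"] += 1
    return counts
```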

Identification of Differentially Expressed Genes.

An unsupervised analysis of gene expression in all 32 prostate cancers based on the hybridization results for the U95A array revealed a strong tendency for primary and metastatic tumors to have distinct expression profiles based on an average linkage hierarchical clustering algorithm (Fig. 1). This analysis revealed several groups of genes with distinct expression profiles for these two major tumor subdivisions. As expected, some of the gene expression differences that distinguished primary and metastatic tumors were contributed by the small amount of contaminating nonneoplastic prostate tissue present in the primary tumor samples (Fig. 1, bottom panel). These were further confirmed by a comparison with expression data from nonneoplastic prostate (see supplementary information). The remaining gene expression differences are expected to be intrinsic to tumor cells and reflect biological distinctions.
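For illustration, standardization to mean 0 and variance 1 followed by average linkage clustering of samples can be sketched in a few lines. This is not the Cluster program's implementation: the distance measure (1 minus the Pearson correlation) and the linkage variant (mean of all pairwise distances between two clusters) are assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Rescale a vector to mean 0 and variance 1 (fails if the vector is constant)."""
    mu, sd = mean(profile), pstdev(profile)
    return [(v - mu) / sd for v in profile]


def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles."""
    za, zb = standardize(a), standardize(b)
    return 1.0 - sum(x * y for x, y in zip(za, zb)) / len(za)


def average_linkage(profiles):
    """Agglomerative clustering of samples.

    profiles: dict mapping sample name -> list of expression values.
    Returns the merges in order as (members_a, members_b, distance).
    """
    names = list(profiles)
    # Pairwise distances between individual samples.
    dist = {frozenset((a, b)): correlation_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = mean(dist[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

If primary and metastatic samples have distinct profiles, the last merge in the returned list joins two clusters that largely separate the two groups, which corresponds to the top split of a dendrogram such as the one in Fig. 1.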

The high level of discrimination between primary and metastatic carcinomas based on an unsupervised analysis of expression profiles suggested that specific genes responsible for biological differences could be identified. Nine of the 23 primary sample patients experienced a recurrence. We compared the subset of 14 primary tumors from patients that did not recur with the 9 metastatic prostate cancers to identify differentially expressed genes in these clinically distinct groups. Primary tumors from patients who subsequently developed metastases were not included in the comparison. Genes that were uniformly and strongly differentially expressed were selected as described under “Data Analysis” in the “Materials and Methods” section. The expression data set was first filtered to include only those probe sets detecting genes with mean expression values that differed by at least 3-fold between the two groups. Probes were then ranked based on the relative magnitude of the difference (t test) between the means of the two sample sets. Genes with expression differences that were likely attributable to contaminating nonneoplastic tissues were removed from the ranking. A total of 3,436 of the 63,175 probe sets detected tumor-related differential gene expression of at least 3-fold. Of the U95A probe sets for near-full-length genes, 132 were overexpressed in primary tumors and 360 in metastases. The 100 most highly ranked tumor-intrinsic, U95A genes based on the t test statistic are listed in Table 2. The entire list of 3,436 differentially expressed sequences is available in the supplementary material.

Functional Attributes of Differentially Expressed Genes.

We reviewed available sources and assigned a general molecular or biological function to each gene in Table 2. The functional attributes of differentially expressed genes are expected to reflect biological differences between early-stage and advanced tumors. In keeping with this concept, 26 of the 100 most highly ranked genes are believed to play a role in some aspect of cell cycle regulation, DNA replication and repair, or mitosis including many genes, such as RFC5, TOP2A, RFC4 and MAD2L1, that are known to be up-regulated in highly proliferative cells (7). This finding correlated with the increased proliferation index of metastatic tumors used for gene expression analysis based on the immunohistochemical assessment of MKI67 (Fig. 2). Fifteen of the 100 highly ranked genes correspond to products potentially involved in signaling and signal transduction and 9 others may contribute to cell adhesion, cell migration, or extracellular matrix. These include HMMR, which encodes an extracellular matrix-binding protein believed to play a role in cell motility through the RAS-ERK signaling pathway (8), and INPP4A, the substrates of which are intermediates in pathways regulated through the AKT proto-oncogene (9).

An unexpectedly large proportion of highly differentially expressed genes are believed to be involved in the regulation of gene expression and gene product function. For example, of the 100 highest-ranked differentially expressed U95A genes, 13 encode products that are expected to function as transcription factors, components of the transcriptional complex, or other proteins contributing to the regulation of transcription; 3 others encode products believed to participate in RNA splicing or metabolism; 3 gene products are thought to participate in chromatin modifications that may also impact on transcriptional regulation; and 5 genes contribute to posttranscriptional regulation of protein function. These findings suggest that the development and progression of prostate cancer metastases are associated with many gene expression changes related to cell proliferation, interactions with the microenvironment, properties that might contribute to cell motility, activated signal transduction pathways, and the regulation of gene product synthesis and function.

Validation.

Gene expression values that were determined by oligonucleotide array were validated in several ways. Some gene transcripts were represented by more than one probe set, and each detected similar relative levels of expression. An analysis of Cancer Genome Anatomy Project, serial analysis of gene expression, and other published data verified expression in prostate samples for many genes (10, 11, 12, 13, 14). Some differentially expressed genes were selected for the measurement of transcript levels using a quantitative reverse transcriptase-PCR technique in the same samples used for microarray-based expression analysis. The results were in good agreement with the relative levels of expression as determined by oligonucleotide arrays (examples shown in Fig. 3). We also performed immunohistochemical analysis of tumor samples and established that protein expression of some gene products correlated with mRNA levels [e.g., MKI67 (Fig. 2) and PSA, AR (data not shown)]. Several of the genes that we identified as highly differentially expressed have been previously implicated in progression of cancer and help to validate our experimental approach. In addition to those mentioned above, other examples include STK11 (15), PTTG1 (16), STK15 (17), and MYBL2 (18). STK15 was overexpressed in metastatic prostate cancers and has previously been implicated in aggressive prostate cancer (17).

The heterogeneity of prostate cancers extends within clinical states and across states. At the extreme ends of the clinical spectrum are prognostically favorable localized tumors with a low biological potential for metastasis and tumors with a high propensity for early dissemination that are invariably lethal. These clinical phenotypes are related in part to the intrinsic biology of tumor cells and are reflected in the pattern of expression of specific genes. Using comprehensive gene expression analysis of tumor samples representing the nonmetastatic and metastatic phenotypes, we identified genes that were consistently and strongly differentially expressed and represent common and valid biological differences underlying clinical heterogeneity.

Few prior studies have used high-throughput gene expression analysis to study prostate cancer metastases. One reason is that well-preserved surgical tissue samples of metastatic prostate cancer are rare, which limits the availability of appropriate samples. Oligonucleotide arrays with 6800 probe sets were used to compare expression in three metastases to eight primary tumors (18). Five of the nine genes found to be commonly differentially expressed in that study also have average expression values for our sample sets that are in agreement with those data. In addition, two (EGR2 and EGR3) were differentially expressed at least 3-fold in the present study. In the other published analysis of metastatic tumors (12), the hierarchical clustering of cDNA array data was used to identify coordinately expressed groups of genes that were specifically differentially expressed in metastatic versus primary prostate cancer. Two of the genes in clusters that were overexpressed in metastatic prostate carcinomas (MTA1 and MYBL2) were also overexpressed in the present study. In fact, MYBL2 met our stringent criteria for strongly and commonly overexpressed genes in metastases. In a related study to identify genes associated with aggressive primary tumors, Singh et al. (19) identified five genes that were commonly used in models to predict the recurrence of carcinoma after radical prostatectomy. Two of the genes that were overexpressed in recurrent cases (HOXC6 and PDGFRA) were also overexpressed in our samples of metastatic carcinoma, and one gene overexpressed in nonrecurrent cases (SIAT1) was more strongly expressed in primaries, although none achieved the 3-fold difference filter used here. The partial agreement between these studies is encouraging in that, despite very different methodologies, genes that participate in the process of metastasis are being identified.

The predicted function of these differentially expressed genes provides a glimpse into the biology of prostate cancer progression. Although functional assignment is based on the limited published data, it is encouraging that many differentially expressed genes reflect biological distinctions and functional pathways previously implicated in aggressive disease. Included are cell cycle regulators and DNA replication and repair proteins that may drive cell proliferation; transcriptional regulators believed to play a role in development and differentiation; and proteins that play a role in signal transduction, cell structure, and cell interactions with environmental factors. The distinct expression patterns of these genes help to validate their role in the clinical phenotype of aggressive metastatic disease.

Some of the specific genes that are differentially expressed may identify critical functional pathways. One example is the MYBL2 gene, which was overexpressed in many metastatic tumors. MYBL2 activates CDC2 gene expression in proliferating fibroblasts (20). CDC2 is a catalytic subunit of a protein kinase complex that induces entry into mitosis. Cyclin E modulates the functional activity of these genes. All of them were overexpressed in our metastatic samples, which suggests that this pathway may be a critical component of cell cycle regulation in many metastatic prostate cancers. Identifying these genes in specific tumor samples may, therefore, provide diagnostic information and serve as a therapeutic target.

In addition to the limited number of genes discussed here, our analysis identified hundreds of poorly characterized EST clusters that likely represent novel genes of unknown function that were highly differentially expressed between primary and metastatic prostate cancers. The biological activity of these uncharacterized genes may be inferred from attributes of known genes with shared expression patterns. Therefore, many are likely to play important biological roles similar to those predicted for the known gene products discussed here. All of them warrant further study and may provide new insights into prostate cancer biology. Additional study of the complete molecular phenotype and patterns of variation will lead to a more in-depth understanding of the clinical heterogeneity of prostate cancers.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 Supported by NIH Grant U01 CA84999 (to W. L. G.).

3 The abbreviations used are: MSKCC, Memorial Sloan-Kettering Cancer Center; EST, expressed sequence tag.

4 Supplementary information for this article is available at Cancer Research Online (http://cancerres.aacrjournals.org).

5 Internet resources: Genecards, http://nciarray.nci.nih.gov/cards/; Locus Link, http://www.ncbi.nlm.nih.gov/LocusLink//index.html; Online Mendelian Inheritance in Man, http://neptune.nlm.nih.gov/entrez/query.fcgi?db=OMIM; Gene Ontology Browser, http://cgap.nci.nih.gov/Genes; and publications identified in PubMed, http://www.ncbi.nlm.nih.gov/PubMed/.

We thank Lishi Chen, David Kuo, Sandra Levcovici, Faye Taylor, and Michelle Pappas for technical and database assistance.

1. Greenlee R. T., Hill-Harmon M. B., Murray T., Thun M. Cancer statistics, 2001. CA Cancer J. Clin., 51: 15-36, 2001.
2. Potosky A., Feuer E., Levin D. Impact of screening on incidence and mortality of prostate cancer in the United States. Epidemiol. Rev., 23: 181-186, 2001.
3. Scher H. I., Heller G. Clinical states in prostate cancer: toward a dynamic model of disease progression. Urology, 55: 323-327, 2000.
4. Fidler I. J. Molecular biology of cancer: invasion and metastasis. In: DeVita V. T., Hellman S., Rosenberg S. A. (eds.), Cancer: Principles and Practice of Oncology, pp. 135-152. Philadelphia: Lippincott-Raven, 1997.
5. Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95: 14863-14868, 1998.
6. Lee S. B., Kolquist K. A., Nichols K., Englert C., Maheswaran S., Ladanyi M., Gerald W., Haber D. A. The EWS-WT1 translocation product induces PDGF-A in desmoplastic small round cell tumour. Nat. Genet., 17: 309-314, 1997.
7. Ross D. T., Scherf U., Eisen M. B., Perou C. M., Rees C., Spellman P., Iyer V., Jeffrey S. S., Van de Rijn M., Waltham M., et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet., 24: 227-235, 2000.
8. Zhang S., Chang M. C. Y., Zylka D., Turley S., Harrison R., Turley E. The hyaluronan receptor RHAMM regulates extracellular-regulated kinase. J. Biol. Chem., 273: 11342-11348, 1998.
9. Vyas P., Norris F. A., Joseph R., Majerus P. W., Orkin S. Inositol polyphosphate 4-phosphatase type 1 regulates cell growth downstream of transcription factor GATA-1. Proc. Natl. Acad. Sci. USA, 97: 13696-13701, 2000.
10. Welsh J. B., Zarrinkar P. P., Sapinoso L. M., Kern S. G., Behling C. A., Monk B. J., Lockhart D. J., Burger R. A., Hampton G. M. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc. Natl. Acad. Sci. USA, 98: 1176-1181, 2001.
11. Luo J., Duggan D. J., Chen Y., Sauvageot J., Ewing C. M., Bittner M. L., Trent J. M., Isaacs W. B. Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res., 61: 4683-4688, 2001.
12. Dhanasekaran S., Barrette T., Ghosh D., Shah R., Varambally S., Kurachi K., Pienta K., Rubin M., Chinnaiyan A. Delineation of prognostic biomarkers in prostate cancer. Nature (Lond.), 412: 822-826, 2001.
13. Waghray A., Schober M., Feroze F., Yao F., Virgin J., Chen Y. Identification of differentially expressed genes by serial analysis of gene expression in human prostate cancer. Cancer Res., 61: 4283-4286, 2001.
14. Nelson P. S., Clegg N., Eroglu B., Hawkins V., Bumgarner R., Smith T., Hood L. The prostate expression database (PEDB): Status and enhancements in 2000. Nucleic Acids Res., 28: 212-213, 2000.
15. Hemminki A., Markie D., Tomlinson I., Avizienyte E., Roth S., Loukola A., Bignell G., Warren W., Aminoff M., Höglund P., et al. A serine/threonine kinase gene defective in Peutz-Jeghers syndrome. Nature (Lond.), 391: 184-187, 1998.
16. Zhang X., Horwitz G. A., Prezant T. R., Valentini A., Nakashima M., Bronstein M. D., Melmed S. Structure, expression, and function of human pituitary tumor-transforming gene (PTTG). Mol. Endocrinol., 13: 156-166, 1999.
17. Zhou H., Kuang J., Zhong L., Kuo W., Gray J. W., Sahin A., Brinkley B. R., Sen S. Tumour amplified kinase STK15/BTAK induces centrosome amplification, aneuploidy and transformation. Nat. Genet., 20: 189-193, 1998.
18. Sala A., Watson R. B-Myb protein in cellular proliferation, transcription control, and cancer: latest developments. J. Cell. Physiol., 179: 245-250, 1999.
19. Singh D., Febbo P., Ross K., Jackson D., Manola J., Ladd C., Tamayo P., Renshaw A., D’Amico A., Richie J., Lander E., Loda M., Kantoff P., Golub T., Sellers W. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1: 203-209, 2002.
20. Oh I-H., Reddy E. The myb gene family in cell growth, differentiation and apoptosis. Oncogene, 18: 3017-3033, 1999.