Serial analysis of gene expression (SAGE) can be used to quantify gene expression in human tissues. Comparison of gene expression levels in neoplastic tissues with those seen in nonneoplastic tissues can, in turn, identify novel tumor markers. Such markers are urgently needed for highly lethal cancers like pancreatic adenocarcinoma, which typically presents at an incurable, advanced stage. The results of SAGE analyses of a large number of neoplastic and nonneoplastic tissues are now available online, facilitating the rapid identification of novel tumor markers. We searched an online SAGE database to identify genes preferentially expressed in pancreatic cancers as compared with normal tissues, comparing SAGE libraries derived from pancreatic adenocarcinomas with SAGE libraries derived from nonneoplastic tissues. Three promising tags were identified. Two of these tags corresponded to genes (lipocalin 2 and trefoil factor 2) previously shown to be overexpressed in pancreatic carcinoma, whereas the third corresponded to prostate stem cell antigen (PSCA), a recently discovered gene thought to be largely restricted to prostatic basal cells and prostatic adenocarcinomas. PSCA was expressed in four of the six pancreatic cancer SAGE libraries but not in the libraries derived from normal pancreatic ductal cells. We confirmed overexpression of the PSCA mRNA transcript in 14 of 19 pancreatic cancer cell lines by reverse transcription-PCR, and by immunohistochemistry we demonstrated PSCA protein overexpression in 36 of 60 (60%) primary pancreatic adenocarcinomas. In 59 of 60 cases, the adjacent nonneoplastic pancreas did not label for PSCA. PSCA is a novel tumor marker for pancreatic carcinoma with potential diagnostic and therapeutic implications. These results demonstrate that analyses of online SAGE databases can identify novel tumor markers.
SAGE³ is a recently described technique that yields a quantitative and comprehensive profile of cellular gene expression (1, 2). Briefly, cellular mRNA transcripts are converted to cDNA, which is then cleaved at specific sites by restriction enzymes to release short (10–14-bp) sequence fragments known as tags. Pairs of tags are ligated together into ditags, amplified by PCR, and then concatenated and sequenced as one long fragment of DNA. Each tag should uniquely identify a specific gene transcript because it corresponds to a defined sequence near the transcript’s 3′ terminus, as dictated by the restriction enzymes used (1). The abundance of each tag provides a quantitative measure of the corresponding transcript in the mRNA sample analyzed, so expression levels of specific transcripts can be compared between samples (2). This ability to quantify gene expression is a major advantage of SAGE over other methods of screening cDNA libraries for differentially expressed genes.
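To make the tag-counting step concrete, the following is a minimal Python sketch of how tags could be read out of one sequenced concatemer. It rests on assumptions not stated above: NlaIII (site CATG) as the anchoring enzyme, 10-bp tags, and ditags of 20–26 bp between successive anchoring sites. The function names and the example concatemer are hypothetical, and production SAGE software also discards duplicate ditags and linker-derived sequence, which this sketch omits.

```python
from collections import Counter

ANCHOR = "CATG"   # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10      # assumed tag length, read 3' of the anchoring site
MIN_DITAG, MAX_DITAG = 20, 26  # assumed ditag length window, anchors excluded

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer: str) -> Counter:
    """Split one sequenced concatemer into ditags and count their tags.

    Ditags are two tags joined tail to tail, so the left tag is read
    directly and the right tag is the reverse complement of the last
    TAG_LEN bases of the ditag.
    """
    counts: Counter = Counter()
    # Pieces between consecutive anchoring sites are candidate ditags;
    # the first and last pieces are flanking sequence and are discarded.
    for ditag in concatemer.upper().split(ANCHOR)[1:-1]:
        if not MIN_DITAG <= len(ditag) <= MAX_DITAG:
            continue  # skip fragments of implausible length
        counts[ditag[:TAG_LEN]] += 1
        counts[reverse_complement(ditag[-TAG_LEN:])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical concatemer: two copies of one ditag built from the PSCA
    # tag cited in the Results and an arbitrary second tag.
    ditag = "GCCCAGCATT" + "TTTT" + reverse_complement("TTCGGAACCA")
    print(extract_tags("TT" + "CATG" + ditag + "CATG" + ditag + "CATG" + "AA"))
```

Summing these counts over every concatemer sequenced from a library gives the raw tag abundance that is later normalized by library size.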
In the initial demonstration of the SAGE technique, a gene expression profile of the normal pancreas was constructed and validated by Northern blotting (1). Subsequently, Zhang et al. (2) used SAGE to demonstrate differences in expression patterns between colonic and pancreatic adenocarcinomas and normal colonic epithelium. Such applications of SAGE hold tremendous promise for the identification of diagnostic and/or prognostic markers of malignancy. Indeed, the above-referenced analyses identified several promising serum markers for pancreatic carcinoma, such as tissue inhibitor of metalloproteinase 1 (3).
Three recent advances have made it more feasible to analyze SAGE libraries for differentially expressed genes. First, rapid progress in the Human Genome Project has made it easier to map individual SAGE tags to specific genes (4): fewer tags now correspond to ESTs of unknown origin, and more can be assigned to known genes. Second, a large number of normal and neoplastic tissues have now been analyzed by SAGE, creating large databases for study. Third, much of this database is now available online to the general public (5, 6).⁴ As of February 1, 2001, this online database included 88 SAGE libraries comprising 3,632,974 tags.
Armed with these tools, we searched an online SAGE database to identify novel markers of pancreatic adenocarcinoma.
Materials and Methods
Building on the differentially expressed genes identified in our ongoing SAGE investigation of pancreatic cancer,⁵ we used the xProfiler program available online⁴ to compare gene expression patterns in pancreatic cancer with those in nonneoplastic tissues. This program allows the user to select SAGE libraries and then compare the tags in one group of online SAGE libraries with those in another group. We ran two queries to identify differentially expressed genes. In the first query, the pancreatic adenocarcinoma group comprised the SAGE libraries of four pancreatic cancer cell lines, which yielded 96,494 total tags (CAPAN1, 37,926 tags; CAPAN2, 23,222 tags; HS766T, 10,467 tags; and Panc1, 24,879 tags). The nonneoplastic comparison group in this query comprised the SAGE libraries of two short-term cultures of normal pancreatic duct epithelial cells, which yielded 64,577 tags (HX, 32,157 tags; and H126, 32,420 tags). In the second query, we expanded both groups. The pancreatic cancer group was expanded to include the SAGE libraries of two primary pancreatic adenocarcinomas (Panc 91-16113, 33,941 tags; Panc 96-6252, 35,745 tags) in addition to the four pancreatic cancer cell lines above, raising the total number of tags in this group to 166,180.
We also expanded the nonneoplastic group to include the SAGE libraries of normal colon epithelium (NC1 and NC2, 50,115 and 49,552 tags, respectively), normal ovarian surface epithelium (HOSE 4 and IOSE29-11, 48,113 and 48,498 tags, respectively), human microvascular endothelial cells (Duke HMVEC and Duke HMVEC + VEGF, 52,532 and 57,928 tags, respectively), normal luminal mammary epithelial cells purified with BER-EP4 antibody conjugated to magnetic beads (mammary epithelium and Br N, 49,137 and 37,558 tags, respectively), and normal prostate (Chen Normal Pr and normal prostate, 66,193 and 13,148 tags, respectively), in addition to the short-term cultures of normal pancreatic ductal epithelium described above (HX and H126). This raised the total number of tags in this group of 12 nonneoplastic SAGE libraries to 537,681. For each of the two queries, the program was set to display the 100 SAGE tags most likely to differ in expression by 10-fold between the two groups. The coefficient of variance cutoff was kept at its default value of 0%.
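The statistic that xProfiler uses to rank tags is not described here, so the sketch below substitutes a simpler, plainly named alternative: pool the tags of each group, express a tag’s abundance in tags per million, and test for enrichment in the cancer group with a one-sided Fisher’s exact test (a hypergeometric tail). The function names and the 0.5 pseudocount are illustrative assumptions; the example uses the PSCA counts reported in the Results.

```python
from math import comb


def per_million(count: float, library_size: int) -> float:
    """Normalize a raw tag count to tags per million tags sequenced."""
    return count / library_size * 1_000_000


def fisher_one_sided(a: int, n1: int, b: int, n2: int) -> float:
    """One-sided Fisher's exact test for a tag enriched in group 1.

    a, b   -- copies of the tag in the pooled group 1 and group 2 libraries
    n1, n2 -- total tags sequenced in each pooled group
    Under the null hypothesis, the a + b copies are spread between the groups
    in proportion to their sizes, so the group 1 count is hypergeometric.
    """
    k = a + b
    denominator = comb(n1 + n2, k)
    upper = min(k, n1)
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1))
    return tail / denominator


def fold_change(a: int, n1: int, b: int, n2: int, pseudocount: float = 0.5) -> float:
    """Ratio of normalized abundances; the pseudocount avoids dividing by zero."""
    return per_million(a + pseudocount, n1) / per_million(b + pseudocount, n2)


if __name__ == "__main__":
    # PSCA tag counts from the Results: 38 copies among 166,180 pancreatic
    # cancer tags, none among 64,577 normal pancreatic duct tags.
    print(per_million(38, 166_180))
    print(fold_change(38, 166_180, 0, 64_577))
    print(fisher_one_sided(38, 166_180, 0, 64_577))
```

A pseudocount is needed because a tag absent from one group (as PSCA is from the normal duct libraries) would otherwise give an infinite fold change.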
The names of genes and ESTs were identified from the tag sequences using an online resource from the National Center for Biotechnology Information.⁶
The online SAGE database also provides a “virtual Northern” tool, which displays the expression level of selected SAGE tags in every SAGE library in the database, allowing the user to visualize gene expression across multiple samples simultaneously.
Human cell lines AsPC1, BxPC3, CAPAN1, CFPAC1, HS766T, MiaPaCa2, and Panc1 were obtained from the American Type Culture Collection (Manassas, VA). The 12 PL cell lines (PL1–6, PL8–11, PL13, and PL14) were low-passage pancreatic carcinoma cell lines generously provided by Dr. Elizabeth Jaffee (7). An immortalized human pancreatic duct epithelial cell line (HPDE), established by transduction of the human papillomavirus 16 E6/E7 genes, was kindly provided by Dr. Ming-Sound Tsao (University of Toronto, Ontario, Canada). Cells were cultured in RPMI 1640 (Life Technologies, Inc., Rockville, MD) supplemented with 100 units/ml penicillin, 100 μg/ml streptomycin, 4 mM L-glutamine, and 10% FCS and were incubated at 37°C in a humidified atmosphere of 5% CO2 in air.
Total RNA was isolated from cultured cells using Trizol reagent (Life Technologies, Inc.). One microgram of total RNA from each sample was reverse transcribed to cDNA with the Superscript II kit (Life Technologies, Inc.) and oligo(dT)12–18 primer, according to the manufacturer’s instructions. PCR primers were designed to amplify a 207-bp TFF2 cDNA fragment (5′-ATGGATGCTGTTTCGACTCC-3′, sense; 5′-CAGACTTCGGGAAGAAGCAC-3′, antisense) and a 202-bp PSCA cDNA fragment (5′-CCACCCTTAACCCTGTGTTC-3′, sense; 5′-AAACTCCCAGGAACTCACGTC-3′, antisense). PCR conditions were as follows: initial denaturation at 95°C for 3 min; 30 cycles of amplification (95°C for 15 s, 60°C for 15 s, and 72°C for 20 s); and a final extension at 72°C for 4 min. PCR products were resolved by electrophoresis in a 2% agarose gel and stained with ethidium bromide. As a loading control, glyceraldehyde-3-phosphate dehydrogenase cDNA was amplified simultaneously.
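For readers reproducing the assay, the thermocycling program and primer pairs above can be captured as plain data, as in this hypothetical Python sketch. It computes the programmed hold time (ramping between temperatures, which depends on the instrument, is ignored) and the GC fraction of each primer; the structure and names are illustrative and are not part of the original protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermocycler hold: a temperature and how long it is held."""
    name: str
    temp_c: float
    seconds: int


# Program and primers transcribed from the text above.
INITIAL = Step("initial denaturation", 95, 180)
CYCLE = (
    Step("denaturation", 95, 15),
    Step("annealing", 60, 15),
    Step("extension", 72, 20),
)
N_CYCLES = 30
FINAL = Step("final extension", 72, 240)

# gene -> (sense primer, antisense primer, expected amplicon in bp)
PRIMERS = {
    "TFF2": ("ATGGATGCTGTTTCGACTCC", "CAGACTTCGGGAAGAAGCAC", 207),
    "PSCA": ("CCACCCTTAACCCTGTGTTC", "AAACTCCCAGGAACTCACGTC", 202),
}


def total_hold_seconds() -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer sequence."""
    return sum(base in "GC" for base in primer.upper()) / len(primer)


if __name__ == "__main__":
    print(total_hold_seconds() / 60, "min of programmed holds")
    for gene, (sense, antisense, amplicon_bp) in PRIMERS.items():
        print(gene, amplicon_bp, "bp",
              round(gc_fraction(sense), 2), round(gc_fraction(antisense), 2))
```

With these settings the holds total 32 min (180 s + 30 × 50 s + 240 s).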
A series of 60 well-characterized primary invasive pancreatic adenocarcinomas resected at The Johns Hopkins Hospital was selected solely on the basis of tissue availability. For each case, a representative formalin-fixed, paraffin-embedded tissue block containing both invasive pancreatic ductal adenocarcinoma and normal tissue was chosen for labeling. Unstained 4-μm sections were cut from each selected block and deparaffinized by routine techniques. The slides were steamed for 20 min in sodium citrate buffer (diluted to 1× from 10× heat-induced epitope retrieval buffer; Ventana-Bio Tek Solutions, Tucson, AZ). After cooling for 5 min, one slide from each block was labeled with a 1:200 dilution of a mouse monoclonal antibody to PSCA (clone 1G8, obtained from R. E. R.), and deeper sections of the same 60 blocks were labeled with a 1:700 dilution of the same antibody, using the Bio Tek 1000 automated stainer (Ventana-Bio Tek Solutions). Labeling was detected with biotinylated secondary antibodies, avidin-biotin complex, and 3,3′-diaminobenzidine, and sections were counterstained with hematoxylin. The extent and intensity of immunolabeling were evaluated jointly by three authors (P. A., R. E. W., and R. H. H.) using a multiobserver microscope. The extent of immunolabeling was categorized into five groups: 0%, negative; 1–25%, focal; and 26–50%, 51–75%, or 76–100%, diffuse. The intensity of immunolabeling was categorized as weak (+), moderate (++), strong (+++), or intense (++++). For the final statistical analyses, all focally labeled cases were categorized as “focal,” and all cases showing ≥26% labeling were categorized as “positive.” Control tissue (normal prostate) showed the expected selective epithelial labeling pattern, with no stromal labeling, at both the 1:200 and 1:700 dilutions.
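The scoring scheme above can be written as two small functions, shown here as a Python sketch. How a value falling between the stated integer boundaries (for example, 25.5%) would be assigned is not specified in the text; the sketch assumes each range is closed at its upper end, and the function names are illustrative.

```python
def extent_group(percent_labeled: float) -> str:
    """Assign one of the five extent groups used to score a slide."""
    if not 0 <= percent_labeled <= 100:
        raise ValueError("percent_labeled must lie between 0 and 100")
    if percent_labeled == 0:
        return "negative"
    if percent_labeled <= 25:
        return "focal"
    if percent_labeled <= 50:
        return "diffuse (26-50%)"
    if percent_labeled <= 75:
        return "diffuse (51-75%)"
    return "diffuse (76-100%)"


def analysis_category(percent_labeled: float) -> str:
    """Collapse the five groups into negative, focal, or positive (>=26%)."""
    group = extent_group(percent_labeled)
    return "positive" if group.startswith("diffuse") else group


if __name__ == "__main__":
    for pct in (0, 10, 40, 80):
        print(pct, extent_group(pct), analysis_category(pct))
```

Keeping the five-group score and the collapsed category separate mirrors the text: the finer groups describe each case, while the three-level category feeds the statistical analyses.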
The primary outcome for this study was overall survival within 5 years, measured from the date of surgery to the date of last follow-up or death. Survival data were censored if the patient was alive at the last follow-up or had died within 1 week of surgery. Kaplan-Meier survival curves were constructed and compared by PSCA status using the log-rank test. A Cox proportional hazards regression model was used to estimate the relative risk of mortality for single factors and the simultaneous contribution of the following baseline covariates: (a) tumor size (≥3.0 cm versus <3.0 cm); (b) resection margin status (positive versus negative); (c) resected lymph node status; (d) tumor differentiation (poorly differentiated versus well or moderately well differentiated); (e) tumor-node-metastasis (TNM) stage; (f) year of surgery; (g) patient age; (h) intraoperative blood loss (liters); and (i) PSCA status. Baseline demographic and clinical factors were compared by PSCA status. P values were computed by Wilcoxon’s rank-sum test for continuous variables and by Fisher’s exact test for discrete variables. All tests were two-sided. Statistical analyses were carried out using Stata version 7 (StataCorp, College Station, TX).
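The Kaplan-Meier estimator and the two-group log-rank test named above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the Stata code used in the study; the Cox model is omitted because fitting it properly calls for a statistics package, and the toy follow-up times in the example are made up.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each patient (any consistent unit)
    events -- 1 if the patient died at that time, 0 if censored
    Returns (time, estimated survival) at each time with at least one death.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leave the risk set at their own time
    at_risk = len(times)
    survival, curve = 1.0, []
    for t in sorted(leaving):
        if deaths[t]:
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value), 1 df."""
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        risk_set = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(risk_set)
        n_a = sum(1 for _, _, g in risk_set if g == "a")
        d = sum(1 for tt, e, _ in risk_set if tt == t and e)
        d_a = sum(1 for tt, e, g in risk_set if tt == t and e and g == "a")
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


if __name__ == "__main__":
    # Made-up toy follow-up data (months), not data from this study.
    print(kaplan_meier([3, 7, 7, 12, 20], [1, 1, 0, 1, 0]))
    print(log_rank([2, 5, 9, 14], [1, 1, 1, 0], [6, 11, 18, 24], [1, 0, 1, 0]))
```

Censored patients contribute to the risk set up to their last follow-up, which is how the censoring rules described above would enter the estimate.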
Results
In the first query, in which the four pancreatic cancer cell lines were compared with two short-term cultures of normal pancreatic ductal cells, 67 SAGE tags were identified as more frequently expressed in the cancer group than in the normal group, and 33 as more frequently expressed in the normal group than in the cancer group. In the second query, in which the four pancreatic cancer cell lines and two primary pancreatic cancers were compared with two short-term cultures of normal pancreatic ductal cells and 10 other nonneoplastic SAGE libraries, 74 SAGE tags were more frequently expressed in the cancer group than in the normal group, and 26 were more frequently expressed in the normal group than in the cancer group.
Four criteria were then used to narrow the candidate tags. First, only tags expressed more frequently in the pancreatic cancer groups were considered. Second, tags likely to correspond to normal entrapped pancreatic parenchyma or stromal elements (such as insulin, pancreatic polypeptide, and collagen type 1 and 3 α) were excluded. These tags were identified only with the second query, which included the libraries derived from primary pancreatic cancers that would be predicted to contain such nonneoplastic elements (8). Third, only tags corresponding to known genes were considered, so that tags corresponding to ESTs or rRNAs were excluded. Fourth, only tags appearing within the top 25 tags of both queries were considered.
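The four criteria act as successive filters on the candidate list; a minimal Python sketch follows. The record fields and the placeholder tags (TAG_A and so on) are hypothetical and are not the tags found in this study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateTag:
    tag: str
    gene: Optional[str]             # None if the tag maps only to an EST or rRNA
    higher_in_cancer: bool          # criterion 1
    parenchymal_or_stromal: bool    # criterion 2, e.g. insulin or collagen tags
    rank_query1: Optional[int]      # rank among displayed tags; None if absent
    rank_query2: Optional[int]


def _in_top(rank: Optional[int], top_n: int) -> bool:
    return rank is not None and rank <= top_n


def passes_criteria(c: CandidateTag, top_n: int = 25) -> bool:
    """Apply the four selection criteria in order."""
    return (
        c.higher_in_cancer                      # 1: overexpressed in cancer
        and not c.parenchymal_or_stromal        # 2: not entrapped normal tissue
        and c.gene is not None                  # 3: known gene, not EST/rRNA
        and _in_top(c.rank_query1, top_n)       # 4: top 25 in both queries
        and _in_top(c.rank_query2, top_n)
    )


def shortlist(candidates: List[CandidateTag], top_n: int = 25) -> List[str]:
    return [c.tag for c in candidates if passes_criteria(c, top_n)]


if __name__ == "__main__":
    demo = [
        CandidateTag("TAG_A", "gene A", True, False, 3, 5),
        CandidateTag("TAG_B", None, True, False, 1, 2),
    ]
    print(shortlist(demo))
```

Because criterion 4 requires a rank in both queries, a tag missing from either top-25 list is dropped regardless of how strongly it scores in the other.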
When these strategies were applied, three tags emerged as the most promising markers for pancreatic cancer. Two of these corresponded to genes that have been previously shown to be overexpressed in pancreatic carcinomas; these were lipocalin 2 (the human homologue of mouse oncogene 24p3) and TFF2 (9, 10, 11). The third tag was Hs.20166 (GCCCAGCATT), corresponding to the recently discovered PSCA gene (12). This tag was identified 38 times in the 166,180 tags derived from the pancreatic cancers, but was never identified in the 64,577 tags derived from normal pancreatic ductal epithelium. This gene was selected for further analysis.
Using the online SAGE Tag to Gene mapping and virtual Northern functions, we found that 13 of the 88 SAGE libraries in the database contained at least one copy of this tag (Fig. 1). These included four of the six (67%) pancreatic cancer SAGE libraries, along with SAGE libraries derived from normal prostate, primary prostate cancer, and seven other malignancies (listed in Fig. 1).
Of the 13 libraries that contained the PSCA tag, 10 contained only a single copy. Because of the error rate inherent in the sequencing used to generate SAGE libraries, tags identified only once in a library should be interpreted with caution, since they may arise from random sequencing errors. However, the three libraries with more than one PSCA tag included two pancreatic cancer cell lines, CAPAN1 and CAPAN2, which contained 19 and 17 copies of the tag, respectively. After normalization for the number of tags per library, the SAGE libraries derived from these cell lines contained over 15 times as many PSCA tags as those derived from any other tissue in the database (Fig. 1).
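Normalizing by library size and flagging single-copy tags can be sketched in Python as below. The CAPAN1 and CAPAN2 counts and library sizes come from the text; LIBRARY_X is a hypothetical single-copy library added for illustration, and the function names are assumptions.

```python
from typing import Dict, Tuple


def tags_per_million(count: int, library_size: int) -> float:
    """Normalize a raw tag count by library size."""
    return count / library_size * 1_000_000


def summarize(libraries: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, bool]]:
    """Map each library to (tags per million, single-copy flag).

    The single-copy flag marks counts that could plausibly be sequencing
    errors and should be interpreted cautiously.
    """
    return {
        name: (tags_per_million(count, size), count == 1)
        for name, (count, size) in libraries.items()
    }


if __name__ == "__main__":
    libraries = {
        "CAPAN1": (19, 37_926),     # count and library size from the text
        "CAPAN2": (17, 23_222),
        "LIBRARY_X": (1, 50_000),   # hypothetical single-copy library
    }
    for name, (tpm, single) in summarize(libraries).items():
        print(f"{name}: {tpm:.0f} tags per million; single copy: {single}")
```

For CAPAN1 this gives about 501 tags per million, in line with the roughly 500 tags per million noted for that cell line in the RT-PCR comparison, and about 732 tags per million for CAPAN2.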
Primers specific for the PSCA and TFF2 transcripts were designed, and RT-PCR was performed on RNA extracted from 19 pancreatic cancer cell lines. The TFF2 transcript was detected in 16 of the 19 cell lines and the PSCA transcript in 14 of the 19 (74%; Fig. 2). Of note, the RT-PCR results paralleled those of the SAGE analyses. For example, RT-PCR detected the PSCA transcript in cell line CAPAN1, whose online SAGE virtual Northern showed a high PSCA expression level of approximately 500 tags per million. Conversely, the PSCA tag was not found in the online SAGE library of cell line Panc1, consistent with the negative RT-PCR result we obtained for RNA extracted from this cell line.
With both the 1:200 and 1:700 dilutions of the monoclonal anti-PSCA antibody, normal pancreatic tissue did not label, with the exception of a single case in which atrophic pancreatic ducts in an area of chronic pancreatitis labeled weakly. Overall, 36 of 60 tumors (60%) labeled for PSCA (Table 1). In four cases, labeling was focal (1–25% of tumor cells labeled), whereas in four others, labeling was essentially uniform throughout the tumor (76–100% of tumor cells labeled). In the remaining 28 cases, 26–75% of the neoplastic cells were labeled. Labeling patterns were similar at both antibody dilutions; labeling was weaker at 1:700 but still present in all cases. Labeling was strong or intense (+++ or ++++) in most (28 of 36) tumors and clearly demarcated them from adjacent normal tissue (Fig. 3). Labeling was usually heterogeneous within the malignant glands, with some malignant cells labeling strongly and others completely negative. PSCA labeling often appeared accentuated at the luminal border of the neoplastic glands (Fig. 3D), and the luminal contents frequently labeled. PanIN lesions spanning a range of grades were identified on the sections studied (13): 50 duct profiles containing PanIN, derived from 17 cases. These PanINs labeled variably (Table 1). Of the 16 PanIN-1A lesions, 9 labeled and 7 did not. Of the 20 PanIN-1B lesions, 12 labeled (4 focally) and 8 did not. Of the seven PanIN-2 lesions, three labeled and four did not. Of the seven PanIN-3 lesions, only one labeled (focally) and six did not.
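The PanIN counts above can be cross-checked against Table 1 with a short tally. The Python sketch below assumes that labeled lesions not described as focal were scored as positive, which reproduces the 25 negative, 5 focal, and 20 positive PanINs in the table; the carcinoma counts (24, 4, and 32) likewise give 40%, 7%, and 53%. Names are illustrative.

```python
from typing import Dict, List, Tuple

# (lesions labeled, of which focal, lesions examined) per grade, from the text.
PANIN: Dict[str, Tuple[int, int, int]] = {
    "PanIN-1A": (9, 0, 16),
    "PanIN-1B": (12, 4, 20),
    "PanIN-2": (3, 0, 7),
    "PanIN-3": (1, 1, 7),
}


def tally(groups: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Return (negative, focal, positive) totals across groups.

    Assumes labeled lesions not described as focal count as positive.
    """
    negative = sum(total - labeled for labeled, _, total in groups.values())
    focal = sum(focal_n for _, focal_n, _ in groups.values())
    positive = sum(labeled - focal_n for labeled, focal_n, _ in groups.values())
    return negative, focal, positive


def percentages(counts: Tuple[int, ...], n: int) -> List[int]:
    """Whole-number percentages, as reported in Table 1."""
    return [round(100 * c / n) for c in counts]


if __name__ == "__main__":
    panin = tally(PANIN)
    print(panin, percentages(panin, 50))
    # Carcinomas: 24 unlabeled, 4 focal, 28 + 4 labeled in >=26% of cells.
    tumors = (60 - 36, 4, 28 + 4)
    print(tumors, percentages(tumors, 60))
```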
The presence or absence of PSCA immunoreactivity in the pancreatic adenocarcinomas was correlated with a variety of clinicopathological factors. PSCA immunoreactivity tended to be more frequent in pancreatic carcinomas from men than from women (68% versus 55%), although the difference was not statistically significant. No significant correlations were found between PSCA labeling and tumor size, lymph node status, margin status, race, age, or survival.
Discussion
This year, it has been estimated that 28,000 Americans will be diagnosed with pancreatic cancer, and 28,000 will die from it (14, 15). Tragically, patients are usually asymptomatic until the tumor has reached an advanced stage and is incurable with existing therapy. Current methods of early detection are inadequate. Therefore, there is a great need to develop new markers that will increase our ability to diagnose this deadly cancer.
We analyzed an online SAGE database and identified PSCA as a new and previously unsuspected marker for pancreatic carcinoma. We confirmed the presence of the PSCA mRNA transcript in pancreatic cancer cell lines by RT-PCR and verified protein expression by immunohistochemical analysis of 60 surgically resected pancreatic ductal adenocarcinomas. The two approaches gave broadly concordant results: PSCA was overexpressed in 74% (14 of 19) of the cell lines by RT-PCR and in 60% (36 of 60) of the resected carcinomas by immunohistochemistry. In 59 of the 60 pancreata examined immunohistochemically, the adjacent nonneoplastic pancreatic parenchyma did not label; in the remaining case, atrophic pancreatic parenchyma in an area of chronic pancreatitis labeled weakly.
PSCA encodes a 123-amino acid glycoprotein that is anchored to the cell membrane by a glycosylphosphatidylinositol anchor (12). RT-PCR and immunohistochemical studies have shown that it has a limited normal tissue distribution; it is expressed most strongly in the prostate, where it localizes to the basal cells, the putative stem cell compartment of the prostate. Moreover, PSCA is overexpressed in more than 80% of prostatic carcinomas, and its expression correlates with the aggressive features of high stage, high Gleason grade, and androgen independence (16). Normal pancreatic tissue does not express PSCA by Northern blotting (12) or by immunohistochemistry (16).
The finding of PSCA overexpression in pancreatic cancer has several immediate applications. Immunohistochemical labeling for PSCA could prove useful for diagnosis: because PSCA is not expressed in normal pancreas, its expression could support a diagnosis of pancreatic adenocarcinoma, particularly in small biopsy or cytopathology samples. However, because PSCA labeled some PanINs and one atrophic pancreas, PSCA labeling alone is not specific enough to establish a diagnosis of invasive carcinoma of the pancreas.
The immunohistochemical labeling pattern we observed for PSCA, with frequent accentuation at the luminal borders of the malignant glands, raises the possibility that PSCA is secreted into pancreatic juice or released into the blood. If so, tests could be devised to detect PSCA in blood, in duodenal or pancreatic fluid, or in stool, thereby providing a new marker of pancreatic malignancy. Supporting this possibility, 293T cells transfected with PSCA have been shown to secrete PSCA protein in vitro (12). However, because approximately one-third of pancreatic cancers do not overexpress PSCA, such a marker would not be expected to be 100% sensitive. Indeed, a growing body of evidence suggests that a panel of markers may be needed to screen for pancreatic cancer (3).
Finally, as a cell surface protein, PSCA has shown promise as a target for immunotherapy of advanced prostate cancer (17, 18). Jaffee et al. (19) have recently demonstrated that immunotherapy can be safe and effective in patients with pancreatic cancer, and our findings raise the possibility that PSCA may be a rational immune target in pancreatic cancers that overexpress it.
In summary, we demonstrate that searching an online SAGE database for tags differentially expressed in the libraries derived from neoplastic and nonneoplastic tissues can lead to the discovery of novel neoplastic markers.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by the Specialized Program of Research Excellence (SPORE) in Gastrointestinal Cancer p50-CA62924, The National Pancreas Foundation, and The Michael Rolfe Fund for pancreatic cancer research.
³ The abbreviations used are: SAGE, serial analysis of gene expression; PanIN, pancreatic intraepithelial neoplasia; PSCA, prostate stem cell antigen; TFF2, trefoil factor 2; RT-PCR, reverse transcription-PCR; EST, expressed sequence tag.
⁵ B. Ryu, J. Jones, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. Identification of differentially expressed genes by serial analysis of gene expression profiling in pancreatic cancer, manuscript in preparation.
| | Normal pancreas (n = 60) | PanIN (n = 50) | Infiltrating pancreatic adenocarcinoma (n = 60) |
|---|---|---|---|
| Negative | 59 (98%) | 25 (50%) | 24 (40%) |
| Focal | 0 | 5 (10%) | 4 (7%) |
| Positive | 1ᵃ (2%) | 20 (40%) | 32 (53%) |
ᵃ Weak labeling in a case of chronic pancreatitis.
We thank Jennifer A. Galford for her hard work in preparing the manuscript and Dr. Ming-Sound Tsao (University of Toronto, Ontario, Canada) for kindly providing the immortal human pancreatic duct epithelial cell line (HPDE).