Abstract
Bladder cancer is the fifth most commonly diagnosed malignancy in the United States and one of the most prevalent worldwide. It harbors a probability of recurrence of >50%; thus, rigorous, long-term surveillance of patients is advocated. Flexible cystoscopy coupled with voided urine cytology is the primary diagnostic approach, but cystoscopy is an uncomfortable, invasive procedure and the sensitivity of voided urine cytology is poor in all but high-grade tumors. Thus, improvements in noninvasive urinalysis assessment strategies would benefit patients. We applied gene expression microarray analysis to exfoliated urothelia recovered from bladder washes obtained prospectively from 46 patients with subsequently confirmed presence or absence of bladder cancer. Data from microarrays containing 56,000 targets was subjected to a panel of statistical analyses to identify bladder cancer-associated gene signatures. Hierarchical clustering and supervised learning algorithms were used to classify samples on the basis of tumor burden. A differentially expressed geneset of 319 gene probes was associated with the presence of bladder cancer (P < 0.01), and visualization of protein interaction networks revealed vascular endothelial growth factor and angiotensinogen as pivotal factors in tumor cells. Supervised machine learning and a cross-validation approach were used to build a 14-gene molecular classifier that was able to classify patients with and without bladder cancer with an overall accuracy of 76%. Our results show that it is possible to achieve the detection of bladder cancer using molecular signatures present in exfoliated tumor urothelia. Further investigation and validation of the cancer-associated profiles may reveal important biomarkers for the noninvasive detection and surveillance of bladder cancer. (Cancer Epidemiol Biomarkers Prev 2009;18(2):444–53)
Introduction
Cancer of the urinary bladder is among the five most common malignancies world-wide and one of the most prevalent (1). Early detection remains one of the most urgent issues in bladder cancer research. When detected early, the 5-year survival rate is ∼94%; thus, timely intervention dramatically increases patient survival rate. At presentation, >80% of bladder tumors are noninvasive papillary tumors, but the remaining 20% exhibit muscle invasion at the time of diagnosis and have a much less favorable prognosis. Although noninvasive lesions are treated by transurethral resection, >70% of patients with these lesions have disease recurrence during the first 2 years. If left untreated, these initially noninvasive lesions can progress to being muscle-invasive (2). The recurrence phenomenon of noninvasive bladder tumors makes bladder cancer one of the more prevalent cancers; specifically 500,000 Americans are currently under treatment for bladder cancer (3). Furthermore, once bladder tumors are identified and removed, patients will routinely require strict surveillance cystoscopy every 3 months for 2 years, then every 6 months for 2 years, then yearly thereafter. As with the early detection of a first carcinoma event, the timely diagnosis and treatment of disease recurrence can dramatically improve the patients' quality of life.
The detection and surveillance of bladder cancer involves visual inspection of the bladder for lesions by cystoscopic examination. Cystoscopy is an uncomfortable, invasive procedure associated with significant cost and possible infection and trauma. Thus, the development of noninvasive assessment strategies is desirable for both patients and health care providers. Voided urine cytology remains the method of choice for the noninvasive detection of bladder cancer lesions, with its major application being to recognize disease recurrence and early progression in tumor stage and grade. Voided urine cytology can be used to diagnose new malignancy, yet although it has a specificity of >93%, its sensitivity is only 25% to 40%, especially for low-grade and low-stage tumors (4, 5). Furthermore, this analysis is prone to interobserver variation, results are not available rapidly, and it is relatively expensive. Accordingly, a good deal of research has focused on identifying potential urine tumor markers with higher sensitivity than provided by urine cytology alone. Diagnostic protein markers for urinalysis have been developed commercially, but these tests also suffer from high false-positive rates (6). Other promising diagnostics include telomerase detection (7) or activity assays (8), microsatellite instability assays, and fluorescent in situ hybridization methods (9). However, these assays may have insufficient predictive power to be applied to the management of individual patients, and importantly, these techniques are complex and require skillful interpretation. Thus, the identification of alternative biomarkers for the early detection and surveillance of bladder cancer in noninvasively obtained material remains important for the management of patients with this disease.
The advent of high-throughput microarray gene expression technology has greatly enabled the search for clinically important disease biomarkers. Numerous exploratory studies have shown the potential value of gene expression signatures in tumor classification (10), diagnosis (11), and in assessing the risk of postsurgical disease recurrence (12-14) in many tissue types including bladder cancer (15-23). To date, gene expression profiling studies of urological clinical material have focused on the analysis of excised solid tumor tissue (15-23). These studies have identified gene signatures that are associated with tumor stage (15, 17, 18, 23), disease recurrence and outcome prediction (15, 16, 19, 20, 23), and subtype classification (16, 17). The fact that follow-up studies have validated some of the biomarkers in independent tissue collections shows the potential utility of microarray profiling of bladder source materials (24, 25); however, the molecular analysis of solid tissue is most applicable to the development of assays that will aid the histologic evaluation of biopsy or excised tumor material. To progress toward the development of novel molecular assays for noninvasively obtained material, the more clinically appropriate material for profiling is the urine, and/or the surface transitional urothelia that are naturally shed into the urine. We and others are performing proteomic profiling of soluble factors in urine (26, 27), but to our knowledge, no one has yet done gene expression profiling on shed urothelial cells obtained from patients with bladder cancer.
In this study, we investigated the feasibility of gene expression profiling of exfoliated urothelia to identify differentially expressed gene signatures associated with the presence or absence of bladder cancer. Urothelial cells were obtained from bladder washings from 20 patients with confirmed bladder cancer and 26 patients with no evidence of bladder cancer. Two rounds of linear amplification of the urothelial mRNA enabled us to obtain enough material for hybridization to Affymetrix U133 Plus 2.0 GeneChips, and the application of robust statistical analyses identified a set of differentially expressed genes associated with bladder cancer. Furthermore, the application of a recently developed feature selection algorithm (14, 28) revealed the optimal gene signature for discriminating between the two groups. Some of the genes identified in this study have been implicated in bladder cancer previously, but many have not. Further analysis of the data implicated a role for specific signaling pathways in the neoplastic urothelia. The ability to perform global gene expression profiling on the minimal material present in shed urothelia will greatly facilitate the identification and development of potential biomarkers for the detection of bladder cancer in noninvasively obtained patient samples.
Materials and Methods
Clinical Sampling and Processing
Under Institutional Review Board approval and informed consent, urothelial samples and associated clinical information were prospectively collected from 46 consecutive individuals with no previous history of urothelial carcinoma. This cohort served as our phase I (feasibility) study according to the International Consensus Panel on Bladder Tumor Markers (29). Patients were undergoing complete hematuria workup including office cystoscopy and upper tract imaging by computed tomography of the abdomen and pelvis (without and with i.v. contrast). Two different clinical cohorts were analyzed in this study. The first group (control) consisted of 26 subjects with a negative evaluation (i.e., normal imaging of upper tract and normal cystoscopy). The second group (experimental) consisted of 20 subjects with a visible bladder tumor detected by upper tract imaging and/or cystoscopy, which was later proven by evaluation of a biopsy to be urothelial carcinoma. Initially, 100 mL of voided urine was obtained and processed for RNA. However, normal subjects were noted to have RNA concentrations of <2 ng/μL. This concentration was judged too low to adequately perform microarray analysis. Thus, the decision was made to use bladder washings to generate the initial genomic profile.
Sampling of exfoliated urothelia for both cohorts was obtained by injection of 50 mL of saline into the bladder during the time of their office cystoscopy (barbotage). The saline solution was immediately aspirated and collected for subsequent analysis. Pertinent information on presentation, histologic grading and staging, therapy, and outcome were recorded (see Table 1). Each barbotage sample (50 mL) was assigned a unique identifying number before immediate delivery and laboratory processing. Urothelial cells were pelleted by centrifugation (600 × g, 4°C, 5 min), rinsed in PBS, pelleted again, and lysed by direct application of RNeasy lysis buffer (Qiagen). RNA samples were evaluated quantitatively and qualitatively using an Agilent Bioanalyzer 2000, before storage at −80°C.
Characteristic | Noncancer (%)* (n = 26) | Cancer (%) (n = 20)
---|---|---
Median age (range, y) | 65 (30-83) | 65 (36-82)
Male/female ratio | 16:10 | 17:3
Race | |
White | 17 (65) | 17 (85)
African American | 6 (23) | 3 (15)
Other | 3 (12) | 0 (0)
Tobacco use | 10 (46) | 14 (70)
Reported gross hematuria | 18 (69) | 14 (70)
Suspicious/positive urine cytology | 0 (0) | 7 (35)
Clinical stage | |
Ta | N/a | 9 (45)
T1 | N/a | 4 (20)
T2† | N/a | 7 (35)
Grade | |
1/2 | N/a | 9 (45)
3 | N/a | 11 (55)
Median tumor size (cm) | | 3.1
Median no. of tumors | | 1.7
Abbreviation: N/a, not applicable.
*Negative hematuria evaluation.
†One patient was noted to have carcinoma in situ on pathologic evaluation.
Gene Expression Profiling
RNA Processing and Hybridization. Gene expression profiling was done on Affymetrix Human Genome arrays according to standard protocols (Affymetrix). However, due to the paucity of RNA recovered from the urothelial samples (50-200 ng total RNA), a double amplification protocol was used (30). Preparation of labeled cRNA was done according to the Affymetrix Two-Cycle Target Labeling Assay (Affymetrix). Fragmented, biotinylated cRNA was hybridized to Affymetrix Human Genome U133 Plus 2.0 microarrays.
Generation of Expression Values. Microarray Suite version 5 (Affymetrix) was used to generate signal values but not detection calls due to the practice of double amplifications. Rather, a mixture Gaussian model was built to determine the “Absent/Present” calls (31). Quality control of each GeneChip experiment included the assessment of the 5′:3′ ratio. This index reflects not only the original level of RNA integrity but also the accuracy of sample processing (32). Any samples that had a 5′:3′ index <1 were removed from analysis.
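As an illustration of how a two-component Gaussian mixture could yield Absent/Present calls, the following minimal Python sketch fits the mixture by expectation-maximization to log-scale signal values and calls a probe present when its posterior probability of belonging to the higher-mean component exceeds 0.5. The function names, the initialization, the 0.5 cutoff, and the example values are assumptions made for illustration only; the model actually used is the one described in ref. 31.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_present(x, weights, means, sigmas):
    """Posterior probability that x belongs to component 1 (higher mean)."""
    p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
    p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
    total = p0 + p1
    return p1 / total if total > 0 else 0.5

def fit_two_gaussians(values, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sigmas = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the higher-mean component for each value
        resp = [posterior_present(x, weights, means, sigmas) for x in values]
        # M-step: re-estimate weight, mean, and spread of each component
        for k in (0, 1):
            r = [g if k == 1 else 1.0 - g for g in resp]
            total = sum(r)
            if total == 0:
                continue  # leave an empty component unchanged
            weights[k] = total / len(values)
            means[k] = sum(ri * x for ri, x in zip(r, values)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / total
            sigmas[k] = max(math.sqrt(var), 1e-3)  # floor avoids a zero spread
    return weights, means, sigmas

def present_calls(values, params, cutoff=0.5):
    return [posterior_present(x, *params) > cutoff for x in values]

# Hypothetical log2 signal values: six background-level and six expressed probes
signals = [4.0, 4.3, 4.1, 3.9, 4.2, 4.4, 8.0, 8.3, 7.9, 8.1, 8.4, 8.2]
params = fit_two_gaussians(signals)
calls = present_calls(signals, params)
```

In this toy example the two components are well separated, so the calls are unambiguous; with real array data the components overlap and the choice of cutoff matters.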
Tests of Significance. The two-sample Welch t statistic, which allows unequal variances, was used to identify genes that were differentially expressed between normal and tumor samples. The P value was used to assess the statistical significance for each gene. To correct for multiple-testing errors, the family-wise error rate was calculated using a permutation-based bootstrap step-down minP procedure. In this procedure, no specific parametric form was assumed for the distribution of the test statistics. The class labels of the samples were permuted 10,000 times, and for each permutation, two-sample Welch t statistics were computed for each gene. The permutation P value for a particular gene is the proportion of the permutations (out of 10,000) in which the permuted test statistic exceeds the observed test statistic in absolute value. The above analyses were conducted using Bioconductor and AnalyzeIt Tools.5
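The per-gene part of this procedure can be sketched in a few lines of Python. The example below computes the Welch t statistic for one gene and its raw permutation P value by shuffling class labels; it does not implement the step-down minP adjustment across genes, and the function names, the random seed, and the expression values are hypothetical. Ties are counted as "at least as extreme", a common convention that is an assumption here.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample Welch t statistic; the group variances may differ."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def permutation_p(a, b, n_perm=10_000, seed=0):
    """Fraction of label permutations whose |t| is at least the observed |t|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to the two classes at random
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_t(perm_a, perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical log2 expression values of one gene in tumor and normal samples
tumor = [8.1, 7.9, 8.4, 8.8, 8.2]
normal = [6.9, 7.2, 7.0, 7.4, 6.8, 7.1]
t_obs = welch_t(tumor, normal)
p_perm = permutation_p(tumor, normal, n_perm=2000, seed=1)
```

Because the null distribution is built from the data themselves, no parametric form is assumed for the test statistic, which matches the rationale given above.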
Gene Ontology and Pathway Analysis. Gene Ontology annotations were obtained from Affymetrix. Biological network relationships among significantly regulated genes were explored using the Pathway Studio and ResNet mammalian database (Ariadne Genomics, Inc.; ref. 33). All microarray data obtained in the course of this study.6
Derivation of a Diagnostic Gene Signature
To extract an accurate diagnostic signature from the microarray data, we applied a feature selection algorithm that we previously derived (13, 14, 28). The algorithm performs multivariate data analyses on high-dimensional data using well-established machine learning and numerical analysis techniques but without making any assumptions about the underlying data distribution. The algorithm can perform feature selection and classification simultaneously. We have previously applied this algorithm, and an earlier version of it, to the derivation of optimal prognostic classifiers in breast and prostate cancer (14, 28). To avoid possible overfitting of a computational model to training data, we used a rigorous experimental protocol with the leave-one-out cross validation method to estimate classifier variables and prediction performance (34). A receiver operating characteristic curve obtained by varying a decision threshold is used to provide a direct view on how a prediction approach performs at the different sensitivity and specificity levels. Here, specificity is defined as the probability that a patient who did not have bladder cancer was assigned to the normal group, and the sensitivity is the probability that a patient who had bladder cancer was assigned to the disease group. For details of the computational algorithm, see Sun et al. (13, 14, 28). The Matlab implementation of the algorithm is available upon request for validating the reported results and academic research.
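The evaluation protocol (leave-one-out cross validation followed by a threshold sweep to trace the receiver operating characteristic curve) can be illustrated independently of the classifier. The Python sketch below uses a nearest-centroid score as a deliberately simple stand-in for the feature selection/classification algorithm of refs. 13, 14, and 28; all function names and the toy data are hypothetical. In a full protocol, any feature selection would also have to be repeated inside each leave-one-out fold to avoid overfitting.

```python
from statistics import mean

def centroid(rows):
    return [mean(col) for col in zip(*rows)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_scores(xs, ys):
    """Leave-one-out decision scores; higher means more tumor-like (label 1)."""
    scores = []
    for i, sample in enumerate(xs):
        # Train on every sample except the one being predicted
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        normal_c = centroid([x for x, y in train if y == 0])
        tumor_c = centroid([x for x, y in train if y == 1])
        scores.append(sq_dist(sample, normal_c) - sq_dist(sample, tumor_c))
    return scores

def roc_points(scores, ys):
    """(threshold, sensitivity, specificity) for each candidate threshold."""
    n_pos = sum(ys)
    n_neg = len(ys) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, ys) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, ys) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

# Toy data: two features per sample; labels 0 = normal, 1 = tumor
xs = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 2.0], [3.2, 2.1], [2.9, 1.8]]
ys = [0, 0, 0, 1, 1, 1]
scores = loo_scores(xs, ys)
points = roc_points(scores, ys)
```

Each entry of `points` corresponds to one operating point on the receiver operating characteristic curve, so lowering the threshold trades specificity for sensitivity.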
Validation of Profiles in an Independent Data Set
Expression profiles of a panel of bladder tumors were generated by Dyrskjot et al. (17) in a study designed to identify gene expression in superficial tumors with respect to the presence of associated carcinoma in situ lesions. The study included 28 tumor biopsies from superficial urothelial carcinoma (stages Ta to T2) and 9 biopsies of normal bladder mucosa from patients with no history of bladder cancer. The study also contained 13 tumor biopsies from muscle-invasive urothelial carcinoma, but as we had no such cases in our cohort, we did not include these in the validation study. All of the samples were obtained directly from surgery after removal of tissue for routine pathologic examination. RNA from these samples was hybridized to Affymetrix U133A GeneChips. Preparation of cRNA, hybridization and image acquisition were done according to standard Affymetrix protocols. The microarray data for this study (accession # GSE3167) were retrieved from the Gene Expression Omnibus (National Center for Biotechnology Information). The mapping of the probes between the U133A and U133 Plus 2.0 GeneChip platforms was done using NetAffy Analysis Center.7
Results
Urothelial cell samples were obtained from a total of 46 patients for this study. After complete evaluation, 26 subjects had no evidence of bladder tumors, and 20 subjects had biopsy-confirmed urothelial carcinoma. The median age for all patients was 65 years. Urine cytology was done in all cases and was reported as nonsuspicious/no neoplasia in all cases in the control group, and as suspicious/positive in 7 (35%) of the tumor-bearing group. All tumor-bearing samples were obtained from early stage disease (stages Ta, T1, or T2). The patient cohort characteristics are summarized in Table 1.
Gene expression profiles of the 46 urothelial samples were obtained by hybridization to Affymetrix U133 Plus 2.0 arrays containing 54,613 targets (covering 47,000 transcripts). Due to the paucity of RNA recovered from the urothelial cell samples (ranging from 50-200 ng total RNA), a two-cycle amplification strategy was required to generate enough labeled cRNA for array hybridization (30). As evaluation of minimal starting RNA material is awkward, we added a postarray hybridization quality control strategy to remove samples of insufficient RNA quality from further analysis. A high post chip 5′:3′ ratio of >1 corresponds to high quality material and processing (32).
After appropriate normalization, two-way hierarchical cluster analysis resulted in a distinct separation of the samples into two groups and identified a geneset that had expression patterns associated with disease status. The dendrogram (Fig. 1) shows that one cluster contained 19 of the 20 tumor-bearing cases, and the second cluster contained the 26 noncancer cases and one tumor case. A total of 319 differentially expressed genes (P < 0.01) associated with these clusters were ranked by P value (see Supplementary Table S1 in Supplementary Data). The predominant compartmental class of genes in the set encoded integral membrane proteins (110 genes), and 53 genes encoded secreted proteins. These classes are of particular potential for development as biomarkers for urinalysis. The statistical significance of the gene expression correlations with disease status was further refined by calculation of the family-wise error rate. Results derived from 10,000 permutations of the class labels (tumor or normal) revealed that the top 45 genes had a family-wise error rate of <0.05 (Supplementary Table S2 in Supplementary Data; Table 2). Differentially expressed gene information was imported into “Pathway Studio” software (Ariadne Genomics Inc.). This analysis can reveal common regulators and associated pathway components within the data set, based on multiple citations (33). The analysis revealed connectivity between many of the genes associated with bladder disease status, but mapping these relationships showed that this connectivity is mediated through a few key factors that act as signaling hubs (Fig. 2). The two major hubs, vascular endothelial growth factor and angiotensinogen, both up-regulated in tumor cases, are linked directly biologically, and indirectly through three minor hubs FLT1, ANG, and ERBB2.
Gene ID | Gene symbol | Gene name | Ascribed function
---|---|---|---
Up-regulated | | |
1495 | CTNNA1 | Catenin (cadherin-associated protein), α 1 | Cell adhesion/negative regulation of apoptosis
7097 | TLR2 | Toll-like receptor 2 | Induction of apoptosis
388121 | TNFAIP8L3 | Tumor necrosis factor, α-induced protein 8-like 3 | Unknown
25930 | PTPN23 | Protein tyrosine phosphatase, nonreceptor type 23 | Tyrosine phosphatase
10331 | B3GNT3 | UDP-GlcNAc: βGal β-1,3-N-acetylglucosaminyltransferase 3 | Protein glycosylation
9570 | GOSR2 | Golgi SNAP receptor complex member 2 | Intracellular protein transport
9630 | GNA14 | Guanine nucleotide binding protein (G protein), α 14 | ADP-ribosylation
2782 | GNB1 | Guanine nucleotide binding protein (G protein), β polypeptide 1 | GTPase activity
26043 | UBXN7 | UBX domain protein 7 | Unknown
112399 | EGLN3 | Egl nine homologue 3 | Apoptosis/metal ion binding
6515 | SLC2A3 | Solute carrier family 2 (facilitated glucose transporter), member 3 | Glucose transmembrane transporter
54469 | ZFAND6 | Zinc finger, AN1-type domain 6 | DNA binding
7071 | KLF10 | Kruppel-like factor 10 | Transcription factor
4299 | AFF1 | AF4/FMR2 family, member 1 | Transcription factor
Down-regulated | | |
2064 | ERBB2 | Erb B2; Her2/neu | Receptor signaling protein tyrosine kinase
1755 | DMBT1 | Deleted in malignant brain tumors 1 | Cell differentiation, cell cycle regulation
25859 | PART1 | Prostate androgen-regulated transcript 1 | Unknown
23762 | OSBP2 | Oxysterol binding protein 2 | Lipid transport
138639 | PTPDC1 | Protein tyrosine phosphatase domain containing 1 | Protein dephosphorylation
4245 | MGAT1 | Mannosyl (α-1,3-)-glycoprotein β | Carbohydrate metabolism
4809 | NHP2L1 | NHP2 nonhistone chromosome protein 2-like 1 | Nuclear mRNA splicing
7627 | ZNF75A | Zinc finger protein 75a | Transcription factor
2330 | FMO5 | Flavin containing monooxygenase 5 | Electron transport
9825 | SPATA2 | Spermatogenesis associated 2 | Unknown
NOTE: Expression patterns are described as either up-regulated or down-regulated in the 20 tumor cases (Welch's t statistic P < 0.01 and family-wise error rate <0.05).
Gene expression differences between tumor and normal urothelial cell samples may implicate specific genes in malignant processes, and thus reveal insights into tumor biology. However, the feasibility of using such differences to classify unlabeled testing samples requires a different computational approach known as supervised machine learning. Here, we used our previously developed feature selection/classification algorithm to identify the gene signature that could most accurately diagnose the 46 cases with respect to the presence of bladder cancer. With this modeling classification approach, a 14-gene model reached 76% overall accuracy in predicting class label during leave-one-out cross validation (Table 3). This level of accuracy supports the feasibility of using gene expression differences to predict the identity of urothelial cell samples from patients of unknown clinical status. A receiver operating characteristic plot was used to illustrate how the accuracy of the classifier varies at different sensitivity and specificity levels (Fig. 3). For example, at a sensitivity of 90%, the molecular classifier had a specificity of 65%. The results compared very favorably with cytologic evaluation of this cohort, where only 35% of tumor cases were correctly diagnosed. The genetic signature correctly classified 35 of the 46 samples (76%), including 17 of 26 normal cases and 18 of 20 tumor cases. The mean expression of each of the 14 classifier signature genes in the 46 samples obtained from patients with, and without, disease was visualized by creating individual gene scatter plots (Fig. 4). The observed expression pattern for each of the 14 genes (P < 0.01) is also listed in Table 3. P values, computed using a standard Student's t test, quantify the up- or down-regulation of individual genes between patients with and without disease.
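The performance figures quoted above follow directly from the reported counts. The short Python check below recomputes them; the function name and dictionary keys are illustrative only.

```python
def summarize(tp, fn, tn, fp):
    """Sensitivity, specificity, and overall accuracy from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }

# Counts reported in the text: 18 of 20 tumor cases and 17 of 26 normal
# cases were classified correctly by the 14-gene model.
classifier = summarize(tp=18, fn=2, tn=17, fp=9)
percent = {name: round(100 * value) for name, value in classifier.items()}

# Cytology flagged 7 of the 20 tumor cases as suspicious/positive.
cytology_sensitivity = round(100 * 7 / 20)
```

Here `percent` evaluates to 90% sensitivity, 65% specificity, and 76% accuracy, and `cytology_sensitivity` to 35%, in agreement with the values given in the text.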
Gene symbol | P | Function
---|---|---
Up-regulated | |
Cyclooxygenase 1 | <0.01 | Oxidation/reduction, aerobic respiration
Angiotensinogen | <0.005 | Serpin peptidase inhibitor
WBSCR27 | <0.005 | Methyltransferase
B3GNT3 | <0.001 | Glycosyltransferase
PTPN23 | <0.005 | Protein tyrosine phosphatase
ND4 | <0.05 | NADH dehydrogenase (ubiquinone) activity
GRIPAP1 | <0.005 | Guanine nucleotide exchange factor for the Ras family
KIAA0841 | <0.005 | Hypothetical protein
LOC727916 | <0.001 | N/a
LOC100131581 | <0.005 | N/a
243525_at* | <0.005 | N/a
Down-regulated | |
DMBT1 | <0.005 | Scavenger receptor, immune response
237668_at* | <0.001 | Transmembrane glycoprotein
1559057_at* | <0.001 | N/a
NOTE: Distribution of each gene across all samples is depicted in Fig. 4. Expression patterns are described as either up-regulated or down-regulated in the 20 bladder cancer cases.
*Affymetrix probe ID; gene symbol unavailable.
Wherever possible, it is important to evaluate the performance of a disease classifier on independent data sets; however, given that our study is the first to profile exfoliated urothelia for bladder cancer detection, there are no directly comparable, independent data sets available. Furthermore, the majority of bladder cancer studies have used exclusively tumor tissue because their goal was to reveal genes associated with tumor subtype or outcome (14-23). The most appropriate publicly available microarray data set we could find was a study by Dyrskjot et al. (17) that was designed to identify genes associated with carcinoma in situ, and included analysis of material from 41 tumor tissues and biopsies of 9 normal bladder mucosa from patients with no history of bladder cancer. The gene expression levels of these tissue samples were measured using Affymetrix HG-U133A arrays. This is an earlier microarray format with some 20,000 targets and so less than half of the probes on the HG-U133 Plus 2.0 array used in our study are present on the HG-U133A array. Hence, it was not possible to validate the 14-gene prediction model directly on the solid tissue sample data. Instead, we checked the data distributions of genes associated with bladder cancer in our study that were present on both platforms. We first examined the distribution of our 45 discriminatory genes (family-wise error rate < 0.05) in the solid tissue sample data set. Nineteen of the 45 genes were on both platforms, and 8 of those 19 (PART1, ZFAND6, SPATA2, DMBT1, KLF10, UBXN7, chr9orf42, and WDR42A) were found to be significantly differentially expressed (P < 0.05) in the same direction in both urothelia and solid tissue profiles with respect to tumor versus normal cases (see Supplementary Table S3 in Supplementary Data). We then examined the distribution of the 14-gene tumor classifier on the solid tissue data set. In this case, only four genes from the tumor classifier were present on both platforms. 
Among the four genes, DMBT1 expression was significantly (P = 0.006) reduced in the solid tissue tumor samples in line with the urothelial cell data. Figure S1 in Supplementary Data shows a scatter plot depicting the distribution of DMBT1 expression in the 37 cases in the solid tissue data set (16).
Discussion
Detecting bladder cancer using diagnostic markers remains a challenge. The inadequate power of single markers may partly explain this. The concept that the presence or absence of one molecular marker will aid diagnostic or prognostic evaluation has not been borne out, which makes sense when one considers the complex interactions between various molecules within a single pathway, the cross-talk between molecular pathways, the redundancy of some pathways, and the oligoclonality of many tumors. There needs to be a paradigm shift from single-marker/single-pathway research to a more global assessment of bladder cancer. Looking for such a profile in bladder cancer requires not only high-throughput molecular profiling but also sophisticated bioinformatics tools for complex data analysis and pattern recognition.
The impetus for our search for bladder cancer biomarkers comes from the idea that an accurate biomarker can reduce the number of cystoscopies done each year, and thus, cut down the frequency of this invasive and costly procedure. The way to achieve this is to identify tumor-associated molecules that are available in noninvasively obtained urine samples, either as soluble factors, or within the genome and/or transcriptome of urothelial cells that are naturally shed from the bladder lining and can be readily recovered from urine samples. The proteomic component of urine is an excellent biomarker discovery source material, and indeed, we are developing techniques to maximize the analysis of urinary proteins (26), but proteomics does currently lag a little behind genomics in terms of target coverage and high-throughput technologies applicable to complex biological samples. Genomic profiling has been successfully applied to excised bladder tumor tissue specimens, and a panel of promising markers are undergoing validation in larger cohorts (15, 20, 34). Profiles gleaned from solid tissue specimens are confounded to some extent by cellular heterogeneity, and it is not clear whether candidate biomarkers present in solid tissue will necessarily translate to utility in noninvasively obtained urine specimens. Thus, solid tissue profiling data are perhaps more likely to augment histologic evaluation of excised tissue for tumor subtype classification, treatment options, and prognostication.
Global gene expression analysis of 46 urothelial specimens revealed that gene expression differences were sufficiently robust to distinguish bladder cancer cases from noncancer conditions. The complete listing and ranking of statistically significant genes is available in the Supplementary Data (Supplementary Table S1). Protein interaction analysis of the data from exfoliated urothelia revealed connectivity between many of the genes associated with bladder disease status, but mapping these relationships showed that this connectivity is mediated through a few key factors that act as signaling hubs. The two major hubs, vascular endothelial growth factor and angiotensinogen, both up-regulated in tumor cases, are biologically linked directly, and indirectly through the three minor hubs FLT1, ANG and ERBB2. The network shown in Fig. 2 places vascular endothelial growth factor at the center of multiple interactions. Beyond the genes in the differentially expressed geneset derived in this study, vascular endothelial growth factor regulates the expression of several extracellular factors, for example matrix metalloproteinases and urokinase-type plasminogen activator, which are believed to play a pivotal role in tumor growth by degrading extracellular matrix. The other major hub in the network, angiotensinogen, has multiple roles including being part of the tissue renin-angiotensin systems, which may be a local source of angiotensin II that has specific paracrine functions. Any shift in the balance of the tissue renin-angiotensin systems will have multiple effects on cell proliferation and angiogenesis in a tumor (35).
To extract an accurate diagnostic signature from the urothelial cell microarray data, we applied an improved feature selection algorithm that we previously derived (13, 14, 28). The algorithm addresses the major issues with prior work, including problems with computational complexity, solution accuracy, and capability to handle problems with extremely large data dimensionality (13). The key idea is to decompose an arbitrary complex model into a set of locally linear ones through local learning, and then estimate feature relevance globally within a large margin framework. We have experimentally shown that our algorithm is capable of handling problems with extremely large input data dimensionality, to a point far beyond that needed for gene expression data analysis, and we have successfully applied the algorithm to the derivation of optimal prognostic classifiers in breast and prostate cancer (14, 28). In this study, attempts to build a gene expression-based classifier of bladder cancer presence led to a 14-gene model that correctly predicted the status of 18 of the 20 cancer patients. A major advantage of deriving an accurate classifier signature with relatively few genes is that it facilitates downstream validation studies and the development of potential multiplex detection assays.
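The general idea of estimating feature relevance from local, margin-based comparisons can be illustrated with a minimal sketch. The code below is not the algorithm used in this study (13). It is a simplified RELIEF-style weighting scheme, substituted here for illustration, and it runs on synthetic data in which only the first two features carry class information. All function names, parameters, and data are hypothetical.

```python
import random


def relief_weights(X, y):
    """Simplified RELIEF-style feature weights (illustrative only).

    For each sample, find its nearest same-class neighbor (hit) and its
    nearest other-class neighbor (miss). A feature gains weight when it
    separates the sample from its miss more than from its hit.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        # L1 distance over all features
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))
        m = min(misses, key=lambda j: dist(X[i], X[j]))
        for k in range(d):
            w[k] += abs(X[i][k] - X[m][k]) - abs(X[i][k] - X[h][k])
    return [v / n for v in w]


def make_toy_data(n_per_class=20, n_noise=8, seed=0):
    """Synthetic two-class data: features 0 and 1 are informative,
    the remaining n_noise features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(4.0 * label, 1.0),
                           rng.gauss(-4.0 * label, 1.0)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


X, y = make_toy_data()
weights = relief_weights(X, y)
# Rank feature indices from most to least relevant
ranking = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
top_two = sorted(ranking[:2])
print("feature ranking (most relevant first):", ranking)
```

In this toy setting the informative features receive clearly larger weights than the noise features. A real expression data set would additionally require cross-validation around the selection step to avoid optimistic accuracy estimates.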
The performance of the disease status classifier was evaluated in silico using data from an independent bladder tumor study (17). Nineteen of the 45 top ranked genes in our study were present on the platform used in the validation data set, and 8 of these genes were significantly associated with bladder cancer in both urothelial and solid tissue data sets. Furthermore, of the 4 genes from our 14-gene classifier that were included in the solid tissue data set, expression of the DMBT1 gene was significantly reduced in cancer cases in both our study and the solid tissue study. As shown in the study by Dyrskjot et al. (16), the “normal” biopsy material does not include only epithelial cells, but also an undefined amount of supporting stromal tissue, such that the gene expression profile will be an average obtained from a complex tissue containing multiple specialized cell types. This highlights the advantage of using samples of exfoliated urothelia over solid tissues. In excised or biopsied tumor tissue samples, the epithelial component will likely be the majority of cellular material, whereas in normal healthy tissue samples, the epithelia may very well be in the minority. This discrepancy is not a problem when the sample is being used for morphologic or immunohistochemical evaluation, but it can be a problem when samples are being compared using global molecular profiling techniques, because the data are an average of gene expression obtained from a complex tissue sample. The use of exfoliated urothelia samples overcomes this discrepancy in that the vast majority of cells will be of epithelial origin, whether obtained from tumor-bearing or healthy individuals.
Given the differences in tissue sample, the different platforms used in the two studies and the fact that nonintersecting sets can perform similarly for classification due to coordinate expression, the observed overlap is encouraging for the use of urothelial cell sampling for bladder cancer detection and surveillance.
The present study has several limitations. First, although the number of subjects is similar to that of previously reported bladder cancer microarray studies (17-20), the present data are from a small phase I study designed to illustrate the feasibility of profiling urine. Second, because sample RNA concentrations were low in normal individuals (<2 ng/μL), bladder washings were used to generate the genomic profile. Subsequently, we assayed carefully obtained first morning voids and 24-hour urines, which produced RNA of similar quantity and quality to that obtained from bladder washes (data not shown); thus, if needed, the above data can be generated in a noninvasive manner. Third, although we validated our data against published microarray databases, we did not validate our specific 14-gene model. In accordance with the recommendation of the International Consensus Panel on Bladder Tumor Markers on the development of biomarkers, we will next perform a phase II study assessing the clinical utility of our 14-gene model in a diverse cohort (e.g., gross hematuria, voiding symptoms, urinary tract infection, and urolithiasis) and subsequently validate the results in a phase III study (29). We therefore did not see the need to validate our profile at this time, because it may not prove robust in the phase II study. Note that the RNA concentration obtained from 100 mL of voided urine would be sufficient to validate our genomic profile using quantitative PCR.
To our knowledge, this is the first report describing the global expression profiling of urothelia obtained from patients visiting the urology clinic. In this study, we have identified genes and diagnostic classifiers that can separate urothelial samples by disease status, and we have shown their association with solid tissue profiles in an independent data set. Currently, larger, confirmatory studies are under way before the development of clinically applicable tests, but the ability to profile the minimal urothelial component of urine, and the application of appropriate data analysis approaches, will greatly facilitate the development of noninvasive methods for the diagnosis and monitoring of both primary and recurrent bladder lesions. The data presented here suggest that it may be possible to detect and characterize bladder cancer based on gene expression analysis of urothelia. Such strategies, if generalizable, would allow for the reduction of invasive procedures, improve surveillance, and provide asymptomatic screening of high-risk populations.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: Florida Biomedical Research through a James and Esther King Award (C.J. Rosser), Flight Attendant Medical Research Institute (C.J. Rosser) and in part by the National Cancer Institute under grant RO1CA116161 (S. Goodison).
Note: Supplementary data for this article are available at Cancer Epidemiology Biomarkers and Prevention Online (http://cebp.aacrjournals.org/).
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.