Purpose: New techniques for the prediction of tumor behavior are needed, because statistical analysis has poor accuracy and is not applicable to the individual. Artificial intelligence (AI) may provide suitable methods. Whereas artificial neural networks (ANN), the best-studied form of AI, have been used successfully, their hidden networks remain an obstacle to their acceptance. Neuro-fuzzy modeling (NFM), another AI method, has a transparent functional layer and is without many of the drawbacks of ANN. We have compared the predictive accuracies of NFM, ANN, and traditional statistical methods for the behavior of bladder cancer.

Experimental Design: Experimental molecular biomarkers, including p53 and the mismatch repair proteins, and conventional clinicopathological data were studied in a cohort of 109 patients with bladder cancer. For all three of the methods, models were produced to predict the presence and timing of a tumor relapse.

Results: Both methods of AI predicted relapse with an accuracy ranging from 88% to 95%. This was superior to statistical methods (71–77%; P < 0.0006). NFM appeared better than ANN at predicting the timing of relapse (P = 0.073).

Conclusions: The use of AI can accurately predict cancer behavior. NFM has a similar or superior predictive accuracy to ANN. However, unlike the impenetrable “black-box” of a neural network, the rules of NFM are transparent, enabling validation from clinical knowledge and the manipulation of input variables to allow exploratory predictions. This technique could be used widely in a variety of areas of medicine.

New techniques to enable the prediction of individual cancer behavior are needed to improve patient care. The discovery of robust biomarkers combined with improved methods of data modeling is likely to deliver this ability. As with many forms of malignancy, the incidence of bladder cancer is increasing, with over 11,000 new cases in 1994 in England and Wales (1). At presentation, 70% of transitional cell carcinomas (TCCs) of the bladder are superficial and noninvasive. They can be managed by a combination of local endoscopic resection and intravesical chemotherapy. After treatment, 50% will recur as similar noninvasive lesions, and a smaller percentage (20%) will progress to muscle invasion. Muscle invasive tumors have a poor prognosis (50% 5-year survival rates) and require radical therapy if cure is to be achieved (2).

To date, the most reliable predictors of tumor behavior are the pathological stage and grade at diagnosis (TNM classification; Ref. 3). Specific tumors may also have additional prognostic information; for example, for superficial TCCs, the presence of carcinoma in situ and the state of the bladder 3 months after surgery are important prognostic factors (4). Whereas these parameters are useful for stratifying patients into subgroups, it is still impossible to predict the behavior of individual tumors. The development of molecular medicine has yielded many new molecules that may be useful as biomarkers to help the prediction of cancer behavior. Some of the most biologically promising are p53 and the MMR proteins hMSH2 and hMLH1. The p53 gene is estimated to be mutated in >50% of human cancers (5), and has been shown to predict recurrence and survival in bladder cancer (6). Loss of the MMR proteins (most frequently hMSH2 and hMLH1) results in microsatellite instability, and subsequent carcinogenesis in hereditary (7) and sporadic colorectal cancer (8). The resultant tumors have a distinctive phenotype with chemoresistance (9) and better than expected (for their stage) clinical outcomes (10). However, despite optimistic reports for these and other molecular biomarkers, to date no single molecule is sufficiently robust to use in routine clinical practice. It appears likely that the best results will be achieved with a panel of molecular markers.

An alternate solution to the problem of accurately predicting tumor behavior lies within the interpretation of data. Traditional statistical methods (e.g., LoR) are limited by their need for linear relationships between variables and for large datasets, and by their poor performance when large amounts of “noise” (inherent variation) contaminate a dataset. Thus, predictions are only accurate in 70% of tumors using the TNM classification (11). By using AI methods, such as NFM and ANNs, complex nonlinear relationships between dependent and independent variables can be identified within a population whose distribution may not be normal. ANNs, of which the most commonly used is the Multi-Layer Perceptron, have been applied to clinical medicine since 1989 (12). Whereas individual authors have shown that ANNs are superior to standard statistical analysis using the TNM staging system to predict breast, colorectal, and bladder cancer outcomes (11, 13), a recent review concluded that to date evidence suggests that ANNs may only outperform statistical methods in small datasets (14). Also, ANNs are not without problems. They can be “overtrained” to learn the inherent variation of a sample population and are nonrobust. They do not generalize across the specific problem range of variables, for either interpolation or extrapolation. More importantly, the network is hidden within a functional “black box.” Thus, it is difficult to gain insight into the model obtained from the data and to ensure that clinical and statistical sense prevails. As a result, statisticians are reluctant to believe in the validity of ANN (15). In addition, the weights attached to different variables are uninterpretable, making the interrogation of new variables difficult.

NFM is an alternate AI method, without many of these drawbacks of ANN. In contrast to ANN, which relies entirely on quantitative (i.e., numerical) ideas, NFM combines both quantitative and qualitative (i.e., linguistic) concepts (16). The model obtained from the data comprises a number of rules in parallel. Each rule is in the form “IF Input 1 is X1 AND Input 2 is X2… THEN output 1 is Y1… ” where X and Y are qualitative fuzzy labels. The labels are quantified to represent the strength or certainty of the particular input or output. The parallel rules are merged using fuzzy reasoning to produce the necessary quantitative output. Whereas the basic structure of an NFM model is similar to that for ANN, the model rules are entirely transparent, enabling interpretation and validation.
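
The rule structure described above can be illustrated with a minimal sketch. It is not the in-house NFM software used in this study: the triangular membership functions, the label breakpoints, the two example rules, and the weighted-average merging step are all assumptions chosen purely for illustration.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy label."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy labels; the breakpoints are illustrative, not fitted values.
LABELS = {
    "age": {"younger": (20, 45, 70), "older": (45, 70, 95)},
    "cigarettes": {"light": (-1, 0, 30), "heavy": (0, 30, 61)},
}

# Hypothetical rules of the form:
# IF age is <label> AND cigarettes is <label> THEN months to relapse is <value>.
RULES = [
    ({"age": "older", "cigarettes": "heavy"}, 6.0),
    ({"age": "younger", "cigarettes": "light"}, 36.0),
]


def predict_months(inputs):
    """Merge the parallel rules into one crisp output (firing-strength weighted average)."""
    weighted_sum = 0.0
    total_strength = 0.0
    for conditions, consequent in RULES:
        # AND is modeled here as the minimum of the membership degrees.
        strength = min(
            triangular(inputs[name], *LABELS[name][label])
            for name, label in conditions.items()
        )
        weighted_sum += strength * consequent
        total_strength += strength
    if total_strength == 0.0:
        raise ValueError("no rule fires for these inputs")
    return weighted_sum / total_strength
```

Because every rule and label is stored as readable data, each prediction can be traced back to the rules that fired and how strongly, which is the transparency property contrasted with ANN above.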

In the following study, the predictive abilities of two AI methods, NFM and ANN, are compared with traditional statistical methods using a cohort of TCCs. For each method, the results using both conventional clinicopathological data and experimental findings using putative molecular biomarkers are compared. To date and to our knowledge, this is the first study to report the use of NFM in cancer prediction.

Patients.

One hundred and nine patients with primary TCC of the bladder were studied. These represented a typical United Kingdom population of affected patients. The median age at diagnosis was 70 years (range, 34–90 years), and the majority of patients were male (n = 77; 69%). Each TCC had been treated at the Royal Hallamshire Hospital, Sheffield, United Kingdom, and had a clinical follow-up of at least 5 years (median, 6; range, 5–16). Standard immunohistochemistry was performed using commercially available antibodies. Abnormal expression of hMLH1 (17%), hMSH2 (17%), either MMR protein (24%), both MMR proteins (8%), and p53 (75%) was seen in the tumors (17). Abnormal (reduced) MMR expression was associated significantly with tumors of a more advanced stage (P = 0.01 for hMLH1 and P = 0.03 for hMSH2) and a worse differentiation (P = 0.03 for both hMLH1 and hMSH2), when compared with tumors with normal expression. By 5 years, tumors with reduced expression of either MMR protein had significantly fewer relapses (P = 0.03) than tumors with normal MMR expression. Abnormal (increased) p53 immunohistochemical expression was not significantly related to disease stage, grade, or subsequent relapse. Complete methodological and patient details are described elsewhere (17).

AI Modeling.

Two models were developed using both NFM and ANN as follows. A classifier predicted the likelihood of a tumor relapse (yes or no). A predictor predicted the timing of this relapse (months after surgery). These two models were combined in series, thus predicting if and when a relapse would occur, to produce the most clinically useful model. To discover the value of the putative molecular biomarkers, the data were analyzed twice in each model, as follows. For analysis A, only the conventional clinicopathological data were studied. In analysis B, both the conventional clinicopathological data and the three additional putative molecular biomarkers were studied.
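
The series arrangement of classifier and predictor can be sketched as follows. This is only an outline of the idea: the function name, the callable interfaces, and the 0.5 decision threshold are hypothetical and are not taken from the software used in the study.

```python
from typing import Callable, Mapping, Optional

Patient = Mapping[str, float]


def predict_if_and_when(
    patient: Patient,
    classifier: Callable[[Patient], float],
    predictor: Callable[[Patient], float],
    threshold: float = 0.5,  # hypothetical cut-off for calling a relapse
) -> Optional[float]:
    """Return the predicted months to relapse, or None if no relapse is predicted."""
    # Stage 1: the classifier decides whether a relapse is expected at all.
    if classifier(patient) < threshold:
        return None
    # Stage 2: only patients classified as relapsing are passed to the predictor.
    return predictor(patient)
```

Any fitted model (NFM, ANN, or regression) could be supplied as the two callables, which keeps the comparison between methods on the same footing.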

For both of the AI methods, the input variables used to train and test the models are shown in Table 1. The output variable from the models was either the occurrence of relapse (classifier) or the timing of relapse (predictor).

For both NFM and ANN, the models were trained on 90% of the patients before testing on the final 10%. This was then repeated until all of the patients had been used to test the model, via so-called “ensembling” and cross-validation (18). Whereas this was necessary for ANN because of the small dataset, it was also used for NFM to standardize the methods. In addition, NFM was modeled without cross-validation to assess the performance with small datasets.
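
The 90%/10% train-and-test rotation corresponds to 10-fold cross-validation, which can be sketched as below. The function name, the shuffling seed, and the way patients are dealt into folds are assumptions for illustration; the “ensembling” step mentioned above is not shown.

```python
import random


def k_fold_splits(n_patients, k=10, seed=0):
    """Yield (train_indices, test_indices) so that each patient is tested exactly once."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Deal the shuffled patients into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test
```

With 109 patients and k = 10, nine folds hold 11 patients and one holds 10, so each model is trained on roughly 90% of the cohort and tested on the rest.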

NFM.

The NFM analyses were performed via an extensive in-house suite of software (19) developed using commercially available software, Matlab (1999, Version 5.3; see footnote 4). The modeling procedure involves a number of iterative loops subsequent to careful data preparation and initialization of the starting model structure and parameters. These loops refine the model's parameters, simplify its structure and component terms to the minimum complexity consistent with the model (i.e., parsimonious modeling), and validate the results.

ANN.

The ANN models were produced using the Matlab Neural Network toolbox. For each session, 10 models were generated and the best was selected. Using this “best fit” model, the total data set was then retested to produce the final results. Training was performed using 50 iterative loops, after which number the accuracy of the model deteriorated.

Statistical Analysis.

To obtain a probability of tumor relapse, using traditional statistical analysis, LoR was performed using SPSS (Version 11.0, SPSS Inc.; Ref. 14). To obtain a statistical prediction of the timing of tumor relapse, LiR was performed using SPSS (Version 11.0, SPSS Inc.). Statistical comparisons of the accuracy of ANN, NFM, and LoR or LiR were performed using a t test. The relationship of the clinical and molecular variables was also assessed with a t test. A P < 0.05 was taken to be significant.
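
For orientation, the two regression models named above have the following standard textbook forms. The symbols ($x_i$ for the input variables, $\beta_i$ and $\gamma_i$ for the fitted coefficients, $n$ for the number of inputs, and $\hat{t}$ for the predicted time to relapse in months) are notation introduced here and do not appear in the original analysis.

```latex
% Logistic regression (LoR): probability of relapse
P(\text{relapse} \mid x_1, \dots, x_n)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)\right)}

% Linear regression (LiR): predicted time to relapse
\hat{t} = \gamma_0 + \sum_{i=1}^{n} \gamma_i x_i
```

In both models the inputs enter only through a linear combination, which is the restriction to linear relationships discussed for traditional statistical methods.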

AI Modeling.

The results of the classifier and predictor models generated using NFM, ANN, LoR, and LiR are described in Tables 2 and 3. For both models AI performs better than statistics, with NFM more accurate than ANN in all but one case (classifier, analysis B). The area under the receiver operating characteristic curve for NFM (0.98) is superior to that for both ANN (0.91–0.88) and LoR (0.47–0.49). In Table 3, the differences between the predicted and actual timing of relapse are shown as the RMS of the error values. As can be seen, NFM is superior to both ANN and LiR, with lower RMS values. When these are compared, both AI methods are significantly superior to LiR (P < 0.0006), and NFM is significantly better than ANN at predicting tumor relapse (P = 0.01) in the testing phase, but for the best fit model the P falls just short of significance (P = 0.079). If analyses A and B are compared individually, there is no difference between NFM and ANN for analysis A (P = 0.87), whereas for analysis B, NFM is superior to ANN (P < 0.001). The predictions of all three of the methods are shown graphically in Figs. 1 and 2. When cross-validation is omitted from the NFM model, the RMS values are, for analysis A, 7.6 (training), 10.3 (testing), and 8.5 (best fit), and for analysis B, 7.2 (training), 7.6 (testing), and 7.3 (best fit).
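
The two summary measures used above can be sketched in a few lines. These are generic definitions, not the code used to produce Tables 2 and 3; the function names are hypothetical, and computing the area under the curve by pairwise comparison of scores is an assumed (but standard) approach.

```python
import math


def rms_error(predicted, actual):
    """Root mean square of the differences between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(scores, relapsed):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen relapsing patient
    receives a higher score than a randomly chosen non-relapsing patient,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, relapsed) if y]
    neg = [s for s, y in zip(scores, relapsed) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, while a lower RMS value means the predicted relapse times lie closer to the observed ones.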

The accurate prediction of an individual patient’s tumor response to treatment is a Holy Grail of oncology. Indeed, recent discoveries in molecular medicine (9) and improvements in clinical treatments have made it now more important than ever to predict tumor behavior. We have shown that AI methods can predict tumor behavior with greater accuracy than traditional statistical methods and that NFM is superior to ANN in this study. Until the advent of AI, the best method of predicting tumor behavior was LoR. However, the poor predictive accuracy of this method (70–77% in this study and 69%–73% previously; Ref. 11) and the fact that a general probability is not applicable to an individual patient show the need for improvements. Using Fig. 1, if LiR were applied in clinical practice, those patients with late relapsing tumors (>40 months) would have had their most intensive cystoscopic surveillance too early for their actual relapse. On an individual basis, those 2 patients with relapse after 60 months (both predicted to recur by 20 months) may have been falsely reassured and discharged.

We have confirmed that ANN provides a powerful and accurate predictive method, and, as with other reports, that it is superior to traditional statistical methods (11, 20, 21, 22, 23). Despite optimistic reports on the application of ANN, the functional layers of the ANN are hidden, with uninterpretable weights attached to individual variables. This opacity remains an obstacle to the widespread introduction of ANN. Unlike previous studies, we have been able to compare ANN with NFM, which has transparent functional layers. Our current study has shown that NFM produces a significantly more accurate prediction than LoR and LiR, and is equivalent to or better than ANN. NFMs are particularly good when working with small datasets, as shown by the relative improvement of NFM over ANN when using analysis A, and the small improvement with the use of cross-validation for NFM.

Whereas the accuracy of NFM is important, it has many other benefits over ANN that promise to make it more acceptable to the clinical and scientific community. Unlike the black-box approach of ANN, the NFM approach is transparent. The problem-specific qualitative modeling representation can be easily translated into understandable medical terms. NFM uses the modeling abilities of fuzzy logic to complete a profile for each variable. This produces a set of parallel rules, which are summated in series and interpreted to produce a quantitative output. An example is shown in Fig. 3a. The top line, Rule 1, shows that a poorly differentiated (grade 3), superficially invasive tumor (stage T1), in a 70-year-old male current smoker (30 cigarettes/day), with abnormal p53 staining, but otherwise normal variables, will have a short time to tumor relapse.

Once trained on a dataset, the NFM suite is able to reduce the inputs to those that have the most influence. This is likely to assist clinicians in day-to-day clinical practice, where insufficient data may be present in patient case notes. In Fig. 3b, the NFM suite of programs has automatically reduced the nine inputs to the four most effective ones. These can be interpreted (Fig. 3c) to confirm that these are clinically sensible inputs (tumor grade, patient age, smoking history, and p53 expression). The automatic selection of these inputs can easily be validated; for example, authors have used multivariate analysis of large TCC series to show that grade is more important than stage in predicting relapse (24). This ability to prioritize the inputs will prove very useful in numerous clinical areas. Furthermore, because the fuzzy representation allows the predictive modeling rules to be understood, expertise can be incorporated into the selection of the inputs and manipulation of the model rules. As a result, nonsensible variables, which may be incidentally over-represented in the training dataset, can be removed or have their importance reduced. In addition, if there is medical knowledge available relevant to the NFM rule set, this can be added to the model easily. This will then enable the model to extrapolate and be more robust than other AI approaches.

The modeling abilities of NFM have additional benefits. It is possible to hypothetically change a single variable, while keeping the others constant, and produce new exploratory predictions. This allows both hypothesis testing and individual risk assessment to be performed. In Fig. 4, NFM has modeled the time to tumor relapse of an individual patient (in months) against the number of cigarettes smoked per day. The other inputs were fixed to represent a typical patient: a 70-year-old man with a poorly differentiated, superficially invasive TCC (G3pT1). From the resultant model, it can be seen that if smoking were reduced from 40/day to 20/day, the time to relapse would increase from 6 to 12 months.
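
This kind of exploratory prediction amounts to sweeping one input while holding the others at fixed values, as in the minimal sketch below. The helper name, the input names, and in particular `toy_model` are hypothetical: the toy model is a made-up linear stand-in chosen only so that the sweep has something to call, and it is not the fitted NFM model behind Fig. 4.

```python
def sweep_input(model, baseline, name, values):
    """Predict for each value of one input while every other input stays at its baseline."""
    results = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline patient is never mutated
        scenario[name] = value
        results.append((value, model(scenario)))
    return results


def toy_model(patient):
    """Hypothetical stand-in model: months to relapse falls as smoking rises."""
    return max(1.0, 18.0 - 3.0 * patient["cigarettes_per_day"] / 10.0)


# Hypothetical baseline representing a fixed "typical" patient.
baseline = {"age": 70, "grade": 3, "cigarettes_per_day": 30}
curve = sweep_input(toy_model, baseline, "cigarettes_per_day", [40, 30, 20, 10, 0])
```

Plotting the resulting pairs gives a curve of the same kind as Fig. 4, and the same helper can be pointed at any other input to explore its effect on the prediction.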

To our knowledge, this is the first report of the use of NFM to predict the behavior of cancer. We have demonstrated that NFM can successfully predict the occurrence and timing of tumor relapse, with similar or greater accuracy than ANN, LoR, and LiR. In addition, NFM appears to be superior to ANN, LoR, and LiR in its transparency, its ability to incorporate expertise, its performance with sparse data, its ability to select the most useful input criteria (parsimony), and its ability to allow predictions of outcome that result from changes in the value of individual inputs. These features suggest that NFM could be used as a valuable and versatile tool to address numerous clinical situations. Whereas the predictions have been performed using bladder cancer data, these methods are transferable to all other human malignancies.


1 Supported by a Medical Research Council Clinical Research Fellowship and the British Urological Foundation/Merck Sharp and Dohme 2001 Scholarship.

3 The abbreviations used are: TCC, transitional cell carcinoma; AI, artificial intelligence; NFM, neuro-fuzzy modeling; ANN, artificial neural network; LoR, logistic regression; LiR, linear regression; MMR, mismatch repair; TNM, Tumor-Node-Metastasis; RMS, root mean square.

4 Internet address: http://www.mathworks.com.

1. Quinn M. Cancer trends in England and Wales 1950–1999. SMPs No. 66 edition, The Stationery Office, 2000.
2. Stein J. P., Lieskovsky G., Cote R. J., Groshen S., Feng A. C., Boyd S., Skinner E., Bocher B., Thangathurai D., Mikhail M., Raghaven D., Skinner D. G. Radical cystectomy in the treatment of invasive bladder cancer; long term results in 1054 patients. J. Clin. Oncol., 19: 666-675, 2001.
3. Sobin L. H., Wittekind C. H. TNM Classification of Malignant Tumours, Ed. 5, 187-190, John Wiley and Sons, New York, 1997.
4. Parmar M. K. B., Freedman L. S., Hargreave T. B., Tolley D. A. Prognostic factors for recurrence and follow up policies in the treatment of superficial bladder cancer: interim report from the British Medical Research Council subgroup on superficial bladder cancer (Urological cancer working party). J. Urol., 142: 284-288, 1989.
5. Hollstein T., Sidransky D., Vogelstein B., Harris C. C. p53 mutations in human cancers. Science (Wash. DC), 253: 49, 1991.
6. Esrig D., Elmajian D., Groshen S., Freeman J. A., Stein J. P., Chen S. C., Nichols P. W., Skinner D. G., Jones P. A., Cote R. J. Accumulation of nuclear p53 and tumour progression in bladder cancer. N. Engl. J. Med., 331: 1259, 1994.
7. Aaltonen L. A., Peltomaki P., Leach F. S., Sistonen P., Pylkkanen L., Mecklin J. P., Jarvinen H., Powell S. M., Jen J., Hamilton S. R., et al. Clues to the pathogenesis of familial colorectal cancer. Science (Wash. DC), 260: 812-816, 1993.
8. Liu B., Nicolaides N. C., Markowitz S., Willson J. K., Parsons R. E., Jen J., Papadopolous N., Peltomaki P., de la Chapelle A., Hamilton S. R., et al. Mismatch repair gene defects in sporadic colorectal cancers with microsatellite instability. Nat. Genet., 9: 48-55, 1995.
9. Drummond J. T., Anthoney A., Brown R., Modrich P. Cisplatin and adriamycin resistance are associated with MutLα and mismatch repair deficiency in an ovarian tumour cell line. J. Biol. Chem., 271: 19645-19648, 1996.
10. Gryfe R., Kim H., Hsieh E. T., Aronson M. D., Holowaty E. J., Bull S. B., Redston M., Gallinger S. Tumor microsatellite instability and clinical outcome in young patients with colorectal cancer. N. Engl. J. Med., 342: 69-77, 2000.
11. Burke H. B., Goodman P. H., Rosen D. B., Henson D. E., Weinstein J. N., Harrell F. E. J., Marks J. R., Winchester D. P., Bostwick D. G. Artificial neural networks improve the accuracy of cancer survival prediction. Cancer (Phila.), 79: 857-862, 1997.
12. Baxt W. G. Use of an artificial neural network for data analysis in clinical decision-making: the diagnosis of acute coronary occlusion. Neural Computation, 2: 480-489, 1989.
13. Qureshi K. N., Naguib R. N. G., Hamdy F. C., Neal D. E., Mellon J. K. Neural network analysis of clinicopathological and molecular markers in bladder cancer. J. Urol., 163: 630-633, 2000.
14. Sargent D. J. Comparison of artificial neural networks with other statistical approaches. Cancer (Phila.), 91: 1636-1642, 2001.
15. Schwarzer G., Vach W., Schumacher M. On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat. Med., 19: 541-561, 2000.
16. Nie J., Linkens D. A. Fuzzy-Neural Control, Prentice Hall, London, 1995.
17. Catto J. W. F., Xinarianos G., Burton J. L., Meuth M., Hamdy F. C. Differential expression of hMLH1 and hMSH2 is related to bladder cancer grade, stage and prognosis, but not microsatellite instability. Int. J. Cancer, 105: 484-490, 2003.
18. Tarassenko L. A guide to neural computing applications, Arnold, London, 1998.
19. Chen M., Linkens D. A. A systematic neurofuzzy modelling framework with application to material property prediction. IEEE Trans SMC. Part B: Cybernetics, 31: 781-790, 2001.
20. Porter C., O’Donnel C., Crawford D. E., Gamito E. J., Errejon A., Genega E., Sotelo T., Tewari A. Artificial neural network model to predict biochemical failure after radical prostatectomy. Mol. Urol., 5: 159-162, 2001.
21. Potter S. R., Miller C., Mangold L. A., Jones K. A., Epstein J. I., Veltri R. W., Partin A. W. Genetically engineered neural networks for predicting prostate cancer progression after radical prostatectomy. Urology, 54: 791-795, 1999.
22. Tewari A., Issa M., El-Galley R., Strickler H., Peabody J., Pow-sang J., Shukla A., Wajsman Z., Robin M., Wei J., et al. Genetic adaptive neural network to predict biochemical failure after radical prostatectomy: a multi institutional study. Mol. Urol., 5: 163-169, 2001.
23. Ziada A. M., Lisle T. C., Snow P. B., Levine R. F., Miller G., Crawford D. E. Impact of different variables on the outcome of patients with clinically confined prostate cancer. Cancer (Phila.), 91: 1653-1660, 2001.
24. Millan-Rodriguez F., Chechile-Toniolo G., Salvador-Bayarri J., Palou J., Vicente-Rodriguez J. Multivariate analysis of the prognostic factors of primary superficial bladder cancer. J. Urol., 163: 73-78, 2000.