Purpose: The purpose of this research was to identify molecular clues to tumor progression and lymph node metastasis in esophageal cancer and to test their value as predictive markers.

Experimental Design: We explored the gene expression profiles in cDNA array data of a 36-tissue training set of esophageal squamous cell carcinoma (ESCC) by using generalized linear model-based regression analysis and a feature subset selection algorithm. By applying the identified optimal feature sets (predictive gene sets), we trained and developed ensemble classifiers consisting of multiple probabilistic neural networks combined with AdaBoosting to predict tumor stages and lymph node metastasis. We validated the classifier abilities with 18 independent cases of ESCC.

Results: Of 1289 cancer-related genes, we identified 71 whose expression correlated with tumor stage. Of these 71 genes, 47 differed significantly between the Tumor-Node-Metastasis pT1/2 and pT3/4 stages. Cell cycle regulators and transcription factors possibly promoting the growth of tumor cells were highly expressed in the early stages of ESCC, whereas adhesion molecules and extracellular matrix-related molecules possibly promoting invasiveness increased in the later stages. For lymph node metastasis, we identified 44 genes with predictive value, which included cell adhesion molecules and cell membrane receptors showing higher expression in node-positive cases and cell cycle regulators and intracellular signaling molecules showing higher expression in node-negative cases. The ensemble classifiers trained with the selected features predicted tumor stage and lymph node metastasis in the 18 validation cases with accuracies of 94.4% and 88.9%, respectively, demonstrating the reproducibility and predictive value of the identified features.

Conclusion: We suggest that these characteristic genes will provide useful information for understanding the malignant nature of ESCC as well as information useful for personalizing the treatments.

Esophageal cancer shows the poorest prognosis among the malignant tumors of the digestive tract. Although advances in diagnostic methods and health education have contributed to discovery of the disease in its early stages, many cases are still not detected until the advanced stage. Despite the use of modern surgical techniques in conjunction with multitreatment modalities such as radio- and chemotherapy, the overall 5-year survival rate remains 40–60% (1, 2, 3, 4). In an effort to better understand this disease and predict its clinical outcome, numerous molecular markers for tumor progression and prognosis, e.g., STMY3 (5), interleukin 6 (6), C1orf10 (7), and EC45 (8), and for lymph node metastasis, e.g., caveolin-1 (9), EphA2 (10), FAK (11), cystatin B (12), and MMP-12 (13), have been identified. This limited information, however, is not enough to clarify the carcinogenesis, tumor progression, and invasiveness of esophageal cancer; just as in the allegory about the blind men and the elephant, the complete picture of the pathophysiology of the disease remains elusive.

As distinct from these “single gene-oriented” approaches, expression profiling that collectively analyzes the expression of many genes using a cDNA array has drawn attention as a promising approach for uncovering molecular mechanisms independently of previous knowledge (14, 15, 16). In the current study, we analyzed gene expression profiles in a total of 36 esophageal cancers and their relation to pathological features based on the Tumor-Node-Metastasis staging. By using a generalized linear model-based regression analysis that allowed us to extract information on features relevant to discrete graded categories, we identified genes of which the expression was altered in association with tumor progression. Also, we identified genes of which the expressions were associated with lymph node metastasis by a feature-subset selection algorithm: this algorithm enabled us to optimize the statistical explanatory model for distinction, which we demonstrated in the classification of 18 independent cases for validation.

Patients and Samples.

We collected the primary cancer tissues obtained from a total of 54 patients with histologically verified esophageal squamous cell carcinomas (ESCCs) who underwent surgery from January 2001 to September 2003 in the Hokkaido University Hospital and 15 affiliated hospitals in Hokkaido prefecture, Japan. Only patients who agreed with the aim and contents of this study and who provided their written informed consent were included. One to five bulk tumor tissue samples of ∼5 mm-size were immediately cut from the esophagus resected by a standard surgical procedure, snap frozen in liquid nitrogen, and stored at −80°C until use. Part of each sample piece was cut and stained with H&E for verification of the presence of squamous cell carcinoma cells, and the sample pieces, of which the evaluated areas were composed of at least 50% tumor cells, were used for RNA extraction. All of the procedures in this portion of the study were approved by the Ethics Committee of Hokkaido University and the independent internal ethics committees of the affiliated hospitals.

Clinicopathological Parameters.

Histological subclassification and staging of the tumors were done by reviewing the specimens taken for pathological diagnosis, according to the Tumor-Node-Metastasis classification (17). The tumor status of each case was categorized for the pT, pN, and pM stages based on the Tumor-Node-Metastasis classification (Unio Internationalis Contra Cancrum, 6th edition). Pertinent major clinicopathological parameters are shown in Table 1.

Analysis of Gene Expression.

Each frozen tissue was crushed to powder in liquid nitrogen by using a CRYO-PRESS compressor (Microtec Nition, Chiba, Japan). Total RNA was extracted using TRIzol reagent (Invitrogen, Tokyo, Japan). Before mRNA extraction, total RNA was treated with 1 unit/μl DNase I and 10 units/μl RNase inhibitor (TOYOBO, Osaka, Japan) at 4°C for 15 min. Polyadenylated mRNA was extracted from total RNA using oligodeoxythymidylic acid magnetic beads (MagExtracter-mRNA-kit; TOYOBO). The mRNA (1.0 μg) was then reverse-transcribed using ReverTraAce reverse transcriptase (TOYOBO). The quality of the sample cDNA was checked by PCR amplification of β-actin, G3PDH, and α-tubulin. The cDNA was ethanol-precipitated and subjected to polyadenylate tailing with terminal deoxynucleotidyl transferase (TOYOBO). The polyadenylated cDNA was then amplified by PCR (25 cycles) for biotin-16-dUTP labeling with KOD dash DNA polymerase (TOYOBO). The labeled cDNA was ethanol-precipitated, denatured at 68°C, and used as a hybridization probe in 10 ml of PerfectHyb solution (TOYOBO). Hybridization was done overnight at 68°C on a GeneticLab in-house cDNA array of 1289 genes plus 11 housekeeping genes (GeneticLab, Sapporo, Japan). The membrane was washed three times with 2× SSC/0.1% SDS at 68°C and then three times with 0.1× SSC/0.1% SDS at 68°C. Chemiluminescent detection was done with a streptavidin-biotinylated alkaline phosphatase system (Imaging High – Chemilumi-Gene Navigator; TOYOBO) using CDP-Star as the luminogen, and the signals were photographed by a Fluor-S MultiImager (Nippon Bio-Rad Laboratories, Tokyo, Japan). Image analysis and quantification were performed with ImaGene 4.2 software (BioDiscovery Inc., Los Angeles, CA), averaging the signals from the adjusted regions of two symmetrically arranged spots for each gene, relative to background values.

Statistical Analyses and Array Data Analysis.

In this study we investigated changes in the gene-expression profile during tumor progression. Analysis of gene expression profiles and related statistical analyses were performed using programs originally developed in the MATLAB language, ver. 6.5 (MathWorks, Tokyo, Japan). To determine confounding relations among the histopathological parameters, a contingency table analysis (χ² test) was performed between pN status and pT status.

The array data in the present study consisted of 1289 dimensional variables (genes) that showed an exponential distribution. The data distribution of each sample exhibited a cumulative distribution curve of different shape, reflecting a biased size-shift (nonlinear offset effect) because of a different condition in each hybridization. Normalizations using expression values of housekeeping genes were not considered appropriate, because they often lost linearity due to overhybridization. Thus, we normalized the data by dividing the expression value of each sample by the mean value of the genes of which the values were within 0.1–1.2-fold of the median expression. We set this interval because of the stability of data values observed in a number of empirical cumulative distribution curves of the array data.
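The normalization rule described above can be sketched in a few lines. This is only an illustration in Python, not the authors' MATLAB program; the function name `normalize_sample` and its parameter names are hypothetical, and the 0.1–1.2-fold window around the median is taken from the text.

```python
from statistics import mean, median

def normalize_sample(values, low=0.1, high=1.2):
    """Scale one sample's expression values by the mean of its 'stable' genes.

    Stable genes are assumed to be those whose value lies within
    low..high times the sample median (0.1-1.2-fold in the text).
    """
    med = median(values)
    # Reference genes: values inside the window around the median.
    reference = [v for v in values if low * med <= v <= high * med]
    scale = mean(reference)
    # Divide every gene in the sample by the reference mean.
    return [v / scale for v in values]
```

Because the scale factor is derived from each sample's own median window, multiplying a whole sample by a constant leaves the normalized values unchanged, which is the property needed to remove hybridization-dependent size shifts.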

We divided the 54 cases into a training set of 36 cases and a validation set of 18. The pertinent clinicopathological data of the constituents of each set are shown in Table 1. They indicate an unbiased division of cases. We used the data of the training set to explore the characteristics of tumor progression (advancing pT stages) and metastatic potential (pN stages). We then used the validation set to evaluate the predictive ability of the selected features.

To detect genes that increased or decreased in correlation with the graded depth of invasion, we adopted a regression analysis on a generalized linear model (18). This model allows a reduction of the relation between an explanatory variable X (the expression value of each gene) and a response variable Y (4-categorized pT stages given dummy integer numbers 1–4) into a linear relation if the distribution of Y belongs to the exponential family of distributions, by transforming the expected value of Y with an appropriate link (kernel) function g:

$$g(E[Y]) = \alpha + \beta X$$

where α is the intercept and β is the regression coefficient. The probit function (the inverse of the Gaussian cumulative distribution function) was used for the link function. In the estimation of β, the maximum likelihood method is used to minimize the residuals (the weighted least-square fit). We applied this method to each of the 1289 genes and extracted those for which the P value of the regression coefficient β was < 0.05.

On the other hand, to determine the expression profiles that are characteristic of the presence or absence of lymph node metastasis and those that are characteristic of the pT1/2 versus pT3/4 stages, we selected genes on the basis of statistical difference between mean expressions of the classes (two-sided t test, P ≤ 0.05) in the 36-case training set. To determine whether the selected genes separated the samples accurately into their classes, we performed a classification using an expectation maximization algorithm that separates a mixture of different data distributions in iterative steps of maximum likelihood estimation (19). We also tested the separability of the distributions with a k-means clustering (20). In addition, we tested the differences of gene expressions between the classes by a nonparametric bootstrap resampling test that permitted us to check the stability of statistical differences independent of the data distribution (21). This procedure repeatedly and randomly resamples the data with replacement (allowing overlap) and compares the means of the resampled data with the means of the original samples (pivotal test, two-sided, α level 0.05). It guards against apparent differences that arise by chance from outlier values or biased distributions of values, which the parametric t test cannot detect.
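A bootstrap test for a difference in class means can be sketched as below. This is one common variant (resampling under a null hypothesis obtained by shifting both groups onto the pooled mean) and is an assumption, not necessarily the exact pivotal procedure used in the study; the function name, the seed, and the default number of resamples are hypothetical.

```python
import random
from statistics import mean

def bootstrap_mean_diff_test(a, b, n_resamples=10000, seed=0):
    """Two-sided bootstrap P value for the difference between two group means."""
    rng = random.Random(seed)
    mean_a, mean_b = mean(a), mean(b)
    observed = abs(mean_a - mean_b)
    pooled = mean(list(a) + list(b))
    # Shift each group onto the pooled mean so that the null hypothesis
    # (no difference in means) holds for the resampled data.
    a0 = [v - mean_a + pooled for v in a]
    b0 = [v - mean_b + pooled for v in b]
    hits = 0
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original group sizes.
        ra = rng.choices(a0, k=len(a0))
        rb = rng.choices(b0, k=len(b0))
        if abs(mean(ra) - mean(rb)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A gene would be retained when the returned value is below the chosen α level (0.05 in the text). Because the test relies only on resampling the observed values, it does not assume normally distributed expression data.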

We then performed a feature-subset selection that allows identification of a set of genes optimal for pattern classification from the selected genes to extract the essential part of the gene set. As the algorithm for the feature-subset selection, we used a sequential forward selection, which extensively explored a combination of genes minimizing the leave-one-out error rate of a k-nearest neighbor classifier. This algorithm, proposed by Whitney (22), adds the most significant feature (gene) that gives a minimal error rate for each step, from an empty set to the full set of the given data, and finally selects a series of gene sets yielding the grand minimal error rate. As a classifier we adopted the k-nearest neighbor method, which classifies each sample according to the class memberships of its k nearest points based on Euclidean distance in the feature space (20, 23). The leave-one-out cross-validation procedure designates one sample as a test set and the remaining N − 1 samples as a training set. The k-nearest neighbor rule developed on the training set was applied in turn to each left-out sample, and a cumulative error was calculated.
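The sequential forward selection with a leave-one-out k-nearest neighbor error can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, k = 3 is an arbitrary choice (the value of k is not given in the text), and only the first subset reaching the grand minimal error is returned, whereas the study retained all subsets sharing that minimum.

```python
def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    votes = [label for _, label in dists[:k]]
    # sorted() makes tie-breaking deterministic.
    return max(sorted(set(votes)), key=votes.count)

def loo_error(X, y, features, k=3):
    """Leave-one-out error rate of k-NN restricted to the given feature indices."""
    errors = 0
    for i in range(len(X)):
        train_x = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        query = [X[i][f] for f in features]
        if knn_predict(train_x, train_y, query, k) != y[i]:
            errors += 1
    return errors / len(X)

def sequential_forward_selection(X, y, k=3):
    """Greedily add the feature that minimizes the leave-one-out error."""
    selected, history = [], []
    remaining = list(range(len(X[0])))
    while remaining:
        best = min(remaining, key=lambda f: loo_error(X, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), loo_error(X, y, selected, k)))
    # First (smallest) subset that reaches the grand minimal error rate.
    return min(history, key=lambda item: item[1])
```

Here `X` is a list of samples, each a list of expression values, and `y` is the list of class labels. The greedy search evaluates on the order of the square of the number of candidate genes, which is consistent in scale with the several thousand models reported for the 88- and 87-gene candidate sets.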

The feature subset selection yields a number of feature (gene) sets that give the same grand minimal error rate. The gene sets were then analyzed for their selection tree structure with Dijkstra and depth-first search algorithms to extract, by pruning leaves and branches, the connected parts (vertices) that form the tree trunk (24, 25). This procedure based on the graph theory permits extraction of the essential part of the feature subsets, by discarding the possible local minimal solutions (prediction models) that are considered the major cause of deterioration of the generalization ability of a classifier. The obtained sets of genes were then used as the final diagnostic sets to form an ensemble classifier consisting of multiple probabilistic neural networks (PNN; Ref. 26) in which each component classifier was weighted with the AdaBoost algorithm (27). PNN is a space-dividing classifier like k-nearest neighbor, but uses the radial basis function to simulate Bayesian posterior probabilities for each data point instead of evaluating simple Euclidean distances as in k-nearest neighbor. With the use of AdaBoosting, each classifier (PNN) was weighted with leave-one-out in-sample errors in the training procedure to optimize the training and voting. Finally, to determine the performance of the ensemble classifier, we tested it with the 18-case validation set independent of the 36-case training set and evaluated the overall classification error.
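The core of a probabilistic neural network, as characterized above, can be sketched as a Gaussian radial-basis (Parzen window) density estimate per class. This is an illustration only: the function name `pnn_posteriors`, the smoothing parameter `sigma`, and the assumption of equal class priors are hypothetical choices that the text does not specify.

```python
import math

def pnn_posteriors(train_x, train_y, query, sigma=0.5):
    """Approximate class posteriors for `query` with Gaussian kernels.

    Each training sample contributes a radial basis function centered on
    itself; class scores are the average kernel responses (equal priors assumed).
    """
    sums, counts = {}, {}
    for x, label in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    density = {label: sums[label] / counts[label] for label in sums}
    total = sum(density.values())
    # Note: `total` can underflow to zero for queries far from all training
    # points; a larger sigma would be needed in that case.
    return {label: d / total for label, d in density.items()}
```

The predicted class is the one with the largest posterior. Unlike k-nearest neighbor, every training sample contributes to the decision, with a weight that decays smoothly with distance.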

In this study we attempted to extract gene expression profiles characteristic of the invasiveness and progression status of ESCCs, with a particular focus on lymph node metastasis (pN stage) and depth of tumor invasion (pT stage). A contingency table analysis (χ² test) showed no significant relationship between the pN stages and the pT stages (pN0: pT1, 7; pT2, 3; pT3, 14; pT4, 1; pN1: pT1, 3; pT2, 10; pT3, 14; pT4, 2 cases; χ² = 5.436, P = 0.1425). The cumulative distribution curves of the array data of the 54 cases before and after normalization are shown in Fig. 1. After normalization, the differences in shape among the exponential distributions were smoothed in all of the cases.
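The contingency table analysis can be checked directly from the case counts given above. The following sketch computes the Pearson χ² statistic for the 2 × 4 table and its P value for 3 degrees of freedom, using a closed-form survival function that is valid only for 3 degrees of freedom; the function names are hypothetical.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n
            stat += (observed - expected) ** 2 / expected
    return stat

def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Rows: pN0, pN1; columns: pT1, pT2, pT3, pT4 (counts from the text).
counts = [[7, 3, 14, 1],
          [3, 10, 14, 2]]
stat = chi_square_statistic(counts)
p_value = chi2_sf_df3(stat)
```

With these counts, `stat` and `p_value` should come out close to the reported χ² = 5.436 and P = 0.1425.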

We then investigated genes that altered their expression in association with tumor progression using a generalized linear model-based regression analysis in the 36-case training set data (pT1, 6; pT2, 10; pT3, 17; pT4, 3 cases). The analysis detected 47 genes up-regulated and 24 genes down-regulated in association with pT stages (Fig. 2,A; Table 2). The most conspicuous changes in gene expression occurred between the pT2 and pT3 stages. Therefore, we extracted a total of 88 genes that each showed a significant difference (two-sided t test, P < 0.05) in expression between pT1/2 cases (n = 16) and pT3/4 cases (n = 20) in the training set (Fig. 2,B). It was of note that the expression patterns of the 88 genes in the independent validation set (18 cases: pT1/2, 7 cases; pT3/4, 11 cases) were similar to those in the 36-case training set. Among these 88 genes, 47 were in common with the above 71 genes selected by the regression analysis. We confirmed the stability of statistical differences for these 88 genes with a nonparametric bootstrap test (10,000 resamplings). We tested the separability of the data subspaces (distributions) formed by the expression values of the 88 genes by using an expectation maximization algorithm. This algorithm misclassified 1 of the 36 cases (error rate, 2.8%). Another method, k-means clustering, misclassified 11 of 36 cases (error rate, 30.6%). To identify the essential features to optimally classify the training set data, we applied a sequential forward-selection algorithm that sequentially selected a better combination of genes based on leave-one-out error rates of a k-nearest neighbor classifier. The algorithm and the subsequent tree analysis using the Dijkstra and depth-first search algorithms finally selected 33 combinations of features (genes), consisting of 26–62 genes that classified the two classes with 100% accuracy (Fig. 2,C).

As the genes characteristic of lymph node metastasis, we extracted a total of 87 genes of which the expression showed a significant difference (two-sided t test, P < 0.05) between pN0 cases (n = 16) and pN1 cases (n = 20; Fig. 3,A). Again, the independent validation set (pN1, 9 cases; pN0, 9 cases) showed expression patterns similar to those of the training set, demonstrating the reproducibility of the expression profiles across the independent sets of samples. We confirmed the statistical differences by the bootstrap resampling test. A test of the separability of the data subspaces (87 dimensions) using the expectation maximization algorithm classified 31 of the 36 in-sample cases correctly (error rate, 13.9%). Also, the k-means clustering misclassified 11 of 36 cases (error rate, 30.6%). In the same way as described above, to prune features that made the margin between pN0 and pN1 classes and obtain an optimal set of features that characterize the lymph node status, we performed a feature-subset selection. The algorithm and the subsequent tree analysis finally selected 19 sets consisting of 11–44 genes that classified the two classes with 97.2% accuracy (Fig. 3,B; Table 3).

The 71 genes of which the expressions were altered in association with pT stages (Table 2) and the 87 genes that were differentially expressed between node-positive and node-negative cases shared 14 genes (NOL3, CAV2, CD47, CYP2D6, DRD2, DRD3, EPHB6, ITGB6, MDM2, MMP14, MUC2, GRIN2C, P2RY1, and TGFBI). We estimated the probability of the coincidental selection of these 14 genes with a Monte-Carlo simulation (100,000 iterations), which gave a P of 1.0 × 10⁻⁵. The remaining 57 of the 71 genes may specifically reflect the molecular characteristics of depth of tumor invasion (pT stage). Only 4 (CYP2D6, EPHB6, ITGB6, and MDM2) of the 71 genes were also included among the 44 genes that characterized lymph node metastasis (P = 0.07695, Monte-Carlo simulation). Whereas these may represent the common molecular events between graded depth of invasion and lymph node metastasis, most genes were different between the two categories, suggesting that depth of invasion and lymph node metastasis had distinct underlying molecular pathophysiologies.

To test the validity of the selected features, we tested the feature subsets composed of 33 combinations of genes (26–62 genes) for pT1/2 versus pT3/4 classification, as well as 19 combinations (11–44 genes) for pN1 versus pN0 classification, by using a PNN ensemble classifier. For each subset of features, a PNN classifier was formed, and its leave-one-out classification error was evaluated with the 36-case training set. The AdaBoost algorithm increased the weight of the misclassified cases and accordingly modified the successive training. Finally, the ensemble classification was done by summing up the weighted votes of all of the PNN classifiers for the independent validation-set cases and then by assigning each test case to the class with the larger vote-sum. As shown in Table 4, the overall prediction error rates for pT stages and pN stages were respectively 5.6% (17 of 18 cases correctly classified) and 11.1% (16 of 18 cases). Furthermore, we tested whether or not the 71 genes selected by the regression analysis could discriminate between the pT1/2 and pT3/4 classes in the 18-case validation set. The feature subset selection algorithm selected a total of 20 subsets consisting of 10–29 genes, giving an in-sample error rate of 2.8%. The final performance of the formed ensemble classifier was a 16.7% overall error rate (15 of 18 cases correctly classified; 2 of 7 pT1/2 and 1 of 11 pT3/4 misclassified). The finally selected 29 genes contained 18 in common with the above 62 finally selected features.
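The weighted voting step can be sketched as follows. The classifier weight used here, 0.5 · ln((1 − e)/e) for an error rate e, is the standard weight of discrete AdaBoost and is an assumption; the text states only that each PNN was weighted according to its leave-one-out in-sample error. The function names and the clipping constant `eps` are hypothetical.

```python
import math

def adaboost_weight(error, eps=1e-6):
    """Standard discrete-AdaBoost weight for a classifier with the given error rate."""
    # Clip so that a perfect (or useless) classifier yields a finite weight.
    e = min(max(error, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - e) / e)

def weighted_vote(predictions, errors):
    """Return the class with the largest sum of weighted votes.

    predictions: class label predicted by each component classifier for one case
    errors: in-sample error rate of each component classifier
    """
    sums = {}
    for label, err in zip(predictions, errors):
        sums[label] = sums.get(label, 0.0) + adaboost_weight(err)
    # sorted() makes the result deterministic if two sums are exactly equal.
    return max(sorted(sums), key=sums.get)
```

Under this weighting, a single classifier with a very low in-sample error can outvote several weaker classifiers, which is why the ensemble result can differ from a simple majority vote.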

Multiple genes are involved in the complex multistep process of carcinogenesis, tumor progression, and acquisition of invasiveness in esophageal cancer. The cDNA array technology has enabled us to analyze the expression profiles of thousands of genes simultaneously and to investigate the correlation between clinicopathological phenotypes and gene expression status. This technique provides a powerful means to stratify (or personalize) tumors into classes with distinct molecular pathophysiologies, although its value has not been completely proven. In esophageal cancer, some authors have reported that the gene expression profile of ESCC is different from those of pericancerous, normal, or dysplastic epithelia, implying the potential usefulness of the technique for the differential diagnosis of early ESCC from precancerous benign lesions (28, 29). To date, however, there has been no report that investigates expression profiles associated with specific malignant characters of ESCC. In the present study, we identified for the first time genes of which the expression was altered in correlation with the graded depth of invasion (pT stages) by applying a generalized linear model-based regression analysis to cDNA array data. The changes in gene expression were most conspicuous between pT2 and pT3 (invasion beyond the muscularis propria), suggesting that a major alteration in molecular pathophysiology occurs at a rather late stage of tumor progression. The validity of the identified genes was confirmed by the reproducibility of the expression patterns in the independent set of samples, and by the overall performance of the classifier starting from both feature sets selected by the regression analysis and binary-class comparison.

As shown in Table 2, the identified genes that increased their expression in association with advance of pT stages had a wide spectrum of functions, including cell adhesion molecules, extracellular matrix-related proteins, cell death regulators, and growth/differentiation factors. Among them, for instance, integrin α5 is known to activate signal transduction, enhance cell proliferation, suppress apoptosis, and thereby contribute to tumor progression in lung squamous cell carcinoma (30). ITGB6 also promotes the progression of oral squamous cell carcinoma by activating Fyn on binding to fibronectin (31), which showed higher expression in the progressed tumors in our study. Among cell death regulators, NOL3 inhibits apoptosis (32). LGALS1 is involved in neoplastic progression and proliferative activities (33). The matrix metalloproteinase MMP1 is associated with depth of invasion in esophageal cancer (34), whereas MMP7 and MMP10 have been shown to be prognostic factors in esophageal cancer (35, 36). TIMP1, which is an inhibitor of matrix metalloproteinases, correlated with the tumor progression in the present results, as one previous report indicated a positive correlation with tumor progression (37). VEGFC, which is a growth/differentiation factor, is known to promote esophageal tumor progression via lymphangiogenesis and angiogenesis (38). Taken together, these results indicate that tumor progression of ESCC is associated with factors that are relevant to the invasive characters of tumors.

The genes of which the expressions were decreased in association with advancing pT stage included several cell cycle regulators and transcription factors. Among the cell cycle regulators, CDC2L1 is known to induce Fas-mediated apoptosis in malignant melanoma cells (39), and RBL2, which is a member of the retinoblastoma gene family, is known to be a tumor suppressor gene that is positively correlated with the prognosis of ESCC (40). NCOA1 and NCOA2 are ligand-dependent transcription factors involved in a wide array of biological processes, including cell differentiation, reproduction, and homeostasis (41). PAX8 promotes thyroid follicular carcinoma progression by forming fusion proteins with PPARγ (42). These results suggest that the expression of cell cycle regulators acting as a tumor suppressor is maintained in the early pT stages of ESCC. Also, the transcription factors are implicated in the promotion of tumor cell proliferation through activation of multiple signal transductions. Additional investigation of these genes may provide a clue for interpretation of the mechanism by which malignancy is acquired in concert with tumor progression.

In regard to lymph node metastasis, which is another important factor for determination of clinical stage, we also confirmed the validity of the identified genes according to the reproducibility and performance of a classifier beyond the sample sets. It was here shown that cell adhesion proteins (CD33, GJB1, ITGA2B, ITGAX, and ITGB6) and cell membrane receptors (CD2, ERBB4, and CSF2RB) showed higher expression in node-positive cases. In contrast, cell cycle regulators (CDC16, CDK8, CCNF, and PPM1D) and intracellular signaling molecules (APOD, APOH, FRAP1, GAB1, NCK1, and MAPK14) were expressed in lower levels in node-positive cases. These results might indicate that the invasive character of tumor cells is determined by cellular interaction with the extracellular environment rather than the proliferative potentials. It is interesting that the spectrum of expressed genes in relation to lymph node metastasis is distinct from that for tumor progression.

In the present study, we used feature-subset selection to extract genes relevant to specific characteristics of esophageal cancers. This approach has been successfully used to extract essential features to classify cDNA array samples into categories (28, 43). If one merely selects features according to statistical differences, the selected data still contain noise that is not well separated as distinct spatial distributions, as was seen in the present study using the expectation maximization algorithm and k-means clustering (respective separation rates: 86.1% and 69.4% for lymph node status). This may often be the case in cDNA array data, which are rather susceptible to randomness in expressions and their measurements. Feature-subset selection that searches for optimal features in separation of the feature subspaces can extract genes of which the expression is consistently different in the given classes. In this regard, we used the k-nearest neighbor classifier rather than the more sophisticated classifiers that transform feature subspaces or unequally place weights on the feature values. We consider that such transformation or weighting can result in a local minimal solution that overestimates trivial features that do not have true biological significance. For the final validation of the selected features, we used a similar strategy: a PNN that divides the feature space with simulated Bayesian posterior probabilities. The ensemble classification using the AdaBoost algorithm combined the multiple feature sets (models) and performed well. The good classification performance obtained for the independent set of samples, by the use of a classifier system different from that used in feature selection, supports the validity of the selected features. We propose that the presently shown method provides a useful means to extract biologically significant differences from noisy cDNA array data.

Although the selected genes that are characteristic of tumor progression and invasiveness in ESCC displayed some interesting trends, these features do not reveal the whole picture of the complex gene network underlying the malignant nature of the disease, because they were selected from only a part (3–4%; 1289 genes) of all human genes. Hence, additional investigation is warranted to better understand the malignant nature of ESCC, using the selected features as clues.

Grant support: Ministry of Education, Science, Sports and Culture Japan, Grant-in-Aid for Scientific Research (B).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Requests for reprints: Mitsuhiro Tada, Division of Cancer-Related Genes, Institute for Genetic Medicine, Hokkaido University, Sapporo, 060-0815, Japan. Fax: 81-11-706-7870; E-mail: [email protected]

Fig. 1.

The cumulative distributions of cDNA array data of the 54 esophageal squamous cell carcinomas. Each line in the cascade-like picture represents a cumulative distribution curve of the expression values of 1289 genes in a case. A, before normalization, note that distributions are uneven across the cases, forming an irregularly shaped “cascade”; B, after normalization, the shape of the cascade was smoothed, especially in the lower 80 percentiles of genes.

Fig. 2.

Changes in the expression of genes correlated with tumor progression (pT stages). A, 24 genes (top) were decreased, and 47 genes (bottom) were increased along with the advance of pT stage by the generalized linear model-based regression analysis. Brighter color indicates higher expression. B, heat-map view of the 88 genes of which the expressions significantly differed between the pT1/2 (n = 16) and pT3/4 (n = 20) groups in the 36-case training set. The normalized relative expression for each transcript (rows) in each sample (columns) is shown in color (brighter color indicates higher expression). For comparison, the expressions of the genes in the pT1/2 (n = 7) and pT3/4 (n = 11) groups in the 18-case validation set are shown on the right of the panel. Note that the similar gene-expression patterns in the validation cases demonstrate the reproducibility of the expression profile in the independent set of samples. C, the process of feature-subset gene selection with predictive value for pT stages (pT1/2 versus pT3/4). Sequential addition of the most significant genes at each step, with a total of 3916 models tested, resulted in a minimal error rate of 0.0%.

Fig. 3.

Feature selection procedures for lymph node metastasis. A, expression pattern of the 87 genes whose expression differed significantly (P < 0.05) between pN1 cases (n = 20) and pN0 cases (n = 16) in the 36-case training set. For comparison, expression of the same genes in the pN1 (n = 9) and pN0 (n = 9) groups of the 18-case validation set is shown on the right of the panel. Note the similar gene-expression patterns in the validation cases. B, the process of selecting the feature subset with predictive value for pN stages (pN1 versus pN0). Sequential addition of the most significant gene at each step, with a total of 3828 models tested, resulted in a minimal error rate of 2.8%.
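The model counts reported for Fig. 2C and Fig. 3B are consistent with a sequential forward search that scores every remaining candidate at every step: with n candidate genes, step k scores n - k + 1 models, for n(n + 1)/2 in total. The short check below is a sketch under that assumption (the function name `models_tested` is hypothetical). It also shows that an error rate of 2.8% matches one misclassified case out of 36, assuming the error was estimated on the 36 training cases.

```python
def models_tested(n_candidates):
    """Total models scored when forward selection runs until every candidate is added."""
    return sum(n_candidates - step for step in range(n_candidates))


pt_models = models_tested(88)  # Fig. 2C: 88 candidate genes
pn_models = models_tested(87)  # Fig. 3B: 87 candidate genes
one_error_in_36 = round(100 * 1 / 36, 1)  # one misclassified training case, in percent

print(pt_models, pn_models, one_error_in_36)
```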

Table 1

Clinicopathological features in the 36 cases for training and the 18 cases for validation

Lymph node metastasis | Negative, training cases (n = 16) | Negative, test cases (n = 9) | Positive, training cases (n = 20) | Positive, test cases (n = 9)
Age (mean ± SD) | 66.2 ± 6.1 | 65.9 ± 7.3 | 63.7 ± 5.5 | 69.3 ± 8.4
Gender
 Male 14 15
 Female
Depth of invasion
 pT1
 pT2
 pT3
 pT4
pTNM M-status
 M0 16 14
 M1a
 M1b
pTNM stage
 I
 II A 11
 II B
 III 11
 IV A
 IV B
Table 2

Genes of which the expression correlated with depth of tumor invasion (pT stages)

Gene name | Symbol (a) | Accession (b) | β value
Positive correlation (c)
 Cell adhesion protein
  Fibronectin (d) | FN1 | M10905 | 0.26
  Integrin α5 (d,e) | ITGA5 | X06256 | 0.26
  Integrin β1 (d,e) | ITGB1 | X07979 | 1.19
  Integrin β3 (d,e) | ITGB3 | M20311 | 2.35
  Integrin β6 (d) | ITGB6 | M35198 | 1.65
  Laminin β1 | LAMB1 | M61951 | 0.85
  Laminin γ1 (d,e) | LAMC1 | M55210 | 0.95
 Cell cycle regulator
  Cell division cycle 27 (d,e) | CDC27 | U00001 | 1.94
  Nucleolar protein 3 | NOL3 | AF043244 | 0.43
  Apoptosis regulator | BID | AF042083 | 1.16
  Secreted frizzled-related protein 2 (d) | SFRP2 | AF017986 | 0.26
  Galectin 1 (d) | LGALS1 | J04456 | 0.06
 Cell membrane protein
  CD4 antigen (d,e) | CD4 | M12807 | 1.16
  CD58 antigen | CD58 | Y00636 | 1.08
  Lactadherin (d,e) | MFGE8 | U58516 | 0.45
  Member RAS oncogene family (d,e) | RAB5A | M28215 | 0.72
 Cell membrane receptor
  α-1D-adrenergic receptor (d) | ADRA1D | U03864 | 0.06
  CD47 antigen | CD47 | Z25521 | 0.99
  Dopamine receptor D2 | DRD2 | M30625 | 0.23
  Dopamine receptor D3 | DRD3 | U32499 | 0.40
  IFN-α receptor (d,e) | IFNAR1 | J03171 | 2.20
  Cholinergic receptor, muscarinic 1 | CHRM1 | X15263 | 0.86
  Glutamate receptor N-methyl D-aspartate 2C | GRIN2C | U77782 | 0.09
  P2Y purinoceptor 1 | P2RY1 | U42030 | 0.47
 Extracellular matrix related
  Cathepsin K (d) | CTSK | X82153 | 0.13
  Matrix metalloproteinase 1 (d) | MMP1 | X54925 | 0.22
  Matrix metalloproteinase 7 (d) | MMP7 | Z11887 | 0.47
  Matrix metalloproteinase 10 (d,e) | MMP10 | X07820 | 0.51
  Matrix metalloproteinase 14 | MMP14 | Z48481 | 0.26
  Mucin 2 | MUC2 | M74027 | 0.37
  Protein C (d) | PROC | K02059 | 1.03
  Tissue inhibitor of metalloproteinase 1 (d) | TIMP1 | X03124 | 0.03
  Tissue inhibitor of metalloproteinase 3 (d,e) | TIMP3 | U14394 | 0.77
 Growth/differentiation factor
  Growth differentiation factor 1 (d,e) | GDF1 | M62302 | 1.96
  Insulin-like growth factor binding protein 6+ | IGFBP6 | M62402 | 0.32
  Inhibin βA (d) | INHBA | M13436 | 0.14
  Chemokine ligand 4 (d,e) | CCL4 | J04130 | 1.07
  Chemokine (C-C motif) ligand 17 | CCL17 | D43767 | 1.12
  Vascular endothelial growth factor C (d,e) | VEGFC | U43142 | 0.98
 Intracellular signaling
  Nitric oxide synthase 2A | NOS2A | AB022318 | 1.10
  Ribosomal protein S6 kinase, 90kD, 1 (d,e) | RPS6KA1 | BC014966 | 0.63
  Transforming growth factor, β-induced, 68kDa (d) | TGFBI | M77349 | 0.15
 Others
  Molecular chaperone | SERPINH1 | X61598 | 0.16
  Caveolin 2 (d) | CAV2 | AA878149 | 0.47
  Activating transcription factor 3 | ATF3 | L19871 | 0.48
  DNA repair protein XRCC2 | XRCC2 | AF035587 | 0.35
  Cathepsin L (d,e) | CTSL | X12451 | 0.66
Negative correlation (f)
 Cell cycle regulator
  CDC-like kinase 2 (d,e) | CLK2 | L29216 | −0.51
  p53-binding protein MDM2 | MDM2 | Z12020 | −1.48
  PCTAIRE protein kinase 2 (d,e) | PCTK2 | X66360 | −1.24
  Cell division cycle 2-like 1 (d,e) | CDC2L1 | U04815 | −0.76
  Retinoblastoma-like 2 (p130) (d,e) | RBL2 | X76061 | −0.99
 Cell membrane receptor
  CD20 antigen | MS4A1 | X12530 | −0.53
  G protein-coupled receptor 65 (d,e) | GPR65 | U95218 | −1.20
  EphB6 (d,e) | EPHB6 | D83492 | −2.58
Table 2A

Continued

Gene name | Symbol (a) | Accession (b) | β value
 Growth/differentiation factor
  Chemokine receptor 7 | CCR7 | L31581 | −0.92
  Interleukin 18 (d,e) | IL18 | D49950 | −1.84
  Colony stimulating factor 1 (d,e) | CSF1 | M27087 | −2.16
 Intracellular signaling
  Dishevelled 3 (d) | DVL3 | AF006013 | −0.36
  G protein-coupled receptor kinase 5 (d,e) | GPRK5 | L15388 | −1.39
  G protein-coupled receptor kinase 6 (d,e) | GPRK6 | L16862 | −1.00
  RalA binding protein 1 (d,e) | RALBP1 | L42542 | −1.92
 Transcriptional factor
  Nuclear receptor coactivator 2 (d,e) | NCOA2 | X97674 | −2.67
  Nuclear receptor coactivator 1 | NCOA1 | U40396 | −1.61
  Signal transducer and activator of transcription 6 (d,e) | STAT6 | U16031 | −0.46
  Paired box gene 8 (d,e) | PAX8 | L19606 | −1.11
 Others
  Receptor interacting kinase 1 (d,e) | RIPK1 | U50062 | −0.80
  CD8 antigen | CD8A | M12828 | −0.97
  Topoisomerase (DNA) I (d) | TOP1 | J03250 | −0.58
  Cytochrome P450, subfamily IID, polypeptide 6 | CYP2D6 | M20403 | −0.63
  Keratin 18 (d) | KRT18 | M26326 | −0.23
(a) Symbol in LocusLink database.
(b) GenBank ID.
(c) Genes whose expression increased with the depth of invasion (pT stage).
(d) Genes whose expression differed significantly between pT1/2 and pT3/4 stages.
(e) Features selected for two-class discrimination of pT1/2 versus pT3/4 stages.
(f) Genes whose expression decreased with the depth of invasion (pT stage).
(g) β value: regression coefficient.
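To make the sign convention of the β value concrete, the sketch below fits a straight line of expression against pT stage and returns its slope. It is only an illustration under stated assumptions: the stage is coded 1 to 4, the generalized linear model is taken to be Gaussian with an identity link (which reduces to ordinary least squares), and the expression values are invented. The actual model specification used in the study is not given in this table.

```python
def regression_slope(stages, expression):
    """Least-squares slope of expression on pT stage (assumed stand-in for beta)."""
    n = len(stages)
    mean_x = sum(stages) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stages, expression))
    sxx = sum((x - mean_x) ** 2 for x in stages)
    return sxy / sxx


# Hypothetical normalized expression values for pT1..pT4.
stages = [1, 2, 3, 4]
beta_up = regression_slope(stages, [0.5, 1.0, 1.5, 2.0])    # rises with stage
beta_down = regression_slope(stages, [2.0, 1.5, 1.0, 0.5])  # falls with stage

print(beta_up, beta_down)
```

A positive slope corresponds to the "positive correlation" block of the table and a negative slope to the "negative correlation" block.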

Table 3

The selected optimal set of genes for classification into each class (lymph node positive and negative)

Gene name | Symbol (a) | Accession (b) | pN1:pN0 (c)
Higher expression (d)
 Cell adhesion proteins
  CD33 antigen (gp67) | CD33 | M23197 | 1.21
  Gap junction protein, β1, 32kDa | GJB1 | X04325 | 1.22
  Integrin αIIb | ITGA2B | J02764 | 1.17
  Integrin αX | ITGAX | Y00093 | 1.17
  Integrin β6 | ITGB6 | M35198 | 1.15
 Cell membrane receptor
  Lymphocyte-function antigen-2 | CD2 | M14362 | 1.23
  Avian erythroblastic leukemia viral oncogene homolog 4 | ERBB4 | L07868 | 1.17
  Colony-stimulating factor-2 receptor | CSF2RB | M59941 | 1.24
 Cell death regulator
  Mitotic arrest deficient-like 1 | MAD1L1 | L06895 | 1.20
  BCL2/adenovirus E1B 19kDa interacting protein 3 | BNIP3 | U15172 | 1.21
 Intracellular signaling
  CD38 antigen (p45) | CD38 | M34461 | 1.38
  Rho GDP dissociation inhibitor γ | ARHGDIG | AF080237 | 1.36
 Others
  Cytochrome P450 XVIIA1 | CYP17A1 | M14564 | 1.17
  Retroviral sequences NP2 | RVNP2 | M15971 | 1.19
  NADPH:quinone reductase | CRYZ | L31521 | 1.20
  Matrix metalloproteinase 13 | MMP13 | X75308 | 1.45
Lower expression (d)
 Cell cycle regulator
  Cell division cycle 16 homolog | CDC16 | AF164598 | 0.856
  Cyclin-dependent kinase 8 | CDK8 | X85753 | 0.781
  Cyclin F | CCNF | U17105 | 0.831
  Protein phosphatase 1D | PPM1D | U78305 | 0.801
 Intracellular signaling
  Apolipoprotein D | APOD | J02611 | 0.714
  Apolipoprotein H | APOH | X53595 | 0.815
  Rapamycin target protein | FRAP1 | L34075 | 0.845
  GRB2-associated binder-1 | GAB1 | U43885 | 0.715
  NCK adaptor protein 1 | NCK1 | X17576 | 0.794
  Mitogen-activated protein kinase 14 | MAPK14 | U19775 | 0.855
 Cell membrane receptor
  Androgen receptor | AR | M20132 | 0.810
  EphB6 | EPHB6 | D83492 | 0.875
  Insulin receptor | INSR | M10051 | 0.864
  Toll-like receptor 3 | TLR3 | U88879 | 0.777
  Toll-like receptor 6 | TLR6 | AB020807 | 0.763
 Metabolic enzyme
  Cytochrome P450, family 24 | CYP24A1 | L13286 | 0.772
  Cytochrome P450, subfamily IID, polypeptide 6 | CYP2D6 | M20403 | 0.769
  Phosphodiesterase 4A | PDE4A | L20965 | 0.842
  Phosphodiesterase 4C | PDE4C | Z46632 | 0.703
 Cell death regulator
  Caspase-activated DNase | DFFB | AF039210 | 0.841
  Programmed cell death 1 | PDCD1 | U64863 | 0.779
 DNA damage response
  FRAP-related protein 1 | ATR | U49844 | 0.883
  Mouse double min 2 | MDM2 | Z12020 | 0.688
 Growth/differentiation factor
  Bone morphogenetic protein 5 | BMP5 | M38693 | 0.831
  Chemokine ligand 1 | CCL1 | M57502 | 0.861
 Others
  Heat shock 70kDa protein 4 | HSPA4 | AB023420 | 0.793
  Retinoid X receptor, β | RXRB | M84820 | 0.873
  IFN-α | IFNA1 | V00537 | 0.766
(a) Symbol in LocusLink database.
(b) GenBank accession number.
(c) pN1:pN0: ratio of mean expression values (node-positive cases to node-negative cases).
(d) Expression status in node-positive cases compared with node-negative cases.
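The pN1:pN0 column can be read as the mean expression in node-positive cases divided by the mean in node-negative cases, with values above 1 listed under "higher expression" and values below 1 under "lower expression". The following sketch shows that calculation on invented numbers; the function names and the assumption that the means are taken over normalized expression values on a linear scale are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


def node_ratio(pn1_values, pn0_values):
    """Ratio of mean expression: node-positive (pN1) over node-negative (pN0)."""
    return mean(pn1_values) / mean(pn0_values)


def direction(ratio):
    """Label a ratio the way the table groups its rows."""
    if ratio > 1:
        return "higher"
    if ratio < 1:
        return "lower"
    return "equal"


# Hypothetical normalized expression values for one gene in each group.
ratio_up = node_ratio([1.3, 1.1, 1.2], [1.0, 0.9, 1.1])
ratio_down = node_ratio([0.7, 0.8, 0.9], [1.0, 1.0, 1.0])

print(round(ratio_up, 2), direction(ratio_up))
print(round(ratio_down, 2), direction(ratio_down))
```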

Table 4

Performance of the ensemble classifier for pT and pN stages in the 18-case validation set

Class | Case number | Mean ± SD (%) in training multiple PNNs (a) | Mean ± SD (%) in validating multiple PNNs (b) | Overall error by ensemble votes (c)
pT stages
 pT1/2 | 7 | 15.1 ± 4.8 | 15.5 ± 6.3 | 14.3 (1/7)
 pT3/4 | 11 | 7.8 ± 4.9 | 0.0 ± 0.0 | 0.0 (0/11)
pN stages
 pN1 | 9 | 11.3 ± 2.2 | 11.1 ± 0.0 | 11.1 (1/9)
 pN0 | 9 | 6.9 ± 6.2 | 11.1 ± 0.0 | 11.1 (1/9)
(a) Performance of individual probabilistic neural networks (PNNs) on the training set.
(b) Performance of individual PNNs on the validation set.
(c) Overall performance of the combined ensemble classifier.
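The per-class ensemble errors in the last column combine into the overall validation accuracies quoted in the abstract. The arithmetic is sketched below; the helper name `accuracy` is hypothetical, and the error and case counts are taken from the fractions shown in the table.

```python
def accuracy(errors_per_class, cases_per_class):
    """Overall accuracy (%) from per-class error counts and class sizes."""
    total_errors = sum(errors_per_class)
    total_cases = sum(cases_per_class)
    return 100.0 * (total_cases - total_errors) / total_cases


# pT stages: 1 error in 7 pT1/2 cases, 0 errors in 11 pT3/4 cases.
pt_accuracy = accuracy([1, 0], [7, 11])
# pN stages: 1 error in 9 pN1 cases, 1 error in 9 pN0 cases.
pn_accuracy = accuracy([1, 1], [9, 9])

print(round(pt_accuracy, 1), round(pn_accuracy, 1))
```

The results, 17 of 18 and 16 of 18 cases correct, round to 94.4% and 88.9%.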

We thank the following hospitals in Hokkaido prefecture, Japan, for providing esophageal cancer samples: Asahikawa City Hospital, Hakodate Central General Hospital, Hakodate Medical Association Hospital, Hokkaido Gastroenterology Hospital, Keiyukai Sapporo Hospital, Kitami Red Cross Hospital, Kiyota Hospital, Kushiro City Hospital, Kushiro Red Cross Hospital, National Hakodate Hospital, Obihiro-Kosei General Hospital, Oji General Hospital, Shinnittetsu Muroran General Hospital, Teine Keijinkai Hospital, and Tonan Hospital.
