Purpose: Gene expression microarray technologies have the potential to define molecular profiles that may identify specific phenotypes (diagnosis), establish a patient’s expected clinical outcome (prognosis), and indicate the likelihood of a beneficial effect of a specific therapy (prediction). We wished to develop optimal tissue acquisition, processing, and analysis procedures for exploring the gene expression profiles of breast core needle biopsies representing cancer and noncancer tissues.

Experimental Design: Human breast cancer xenografts were used to evaluate several processing methods for prospectively collecting adequate amounts of high-quality RNA for gene expression microarray studies. Samples were assessed for the preservation of tissue architecture and the quality and quantity of RNA recovered. An optimized protocol was applied to a small study of core needle breast biopsies from patients, in which we compared the molecular profiles from cancer with those from noncancer biopsies. Gene expression data were obtained using Research Genetics, Inc. NamedGenes cDNA microarrays. Data were visualized using simple hierarchical clustering and a novel principal component analysis-based multidimensional scaling. Data dimensionality was reduced by simple statistical approaches. Predictive neural networks were built using a multilayer perceptron and evaluated in an independent data set from snap-frozen mastectomy specimens.

Results: Processing tissue in RNALater preserves tissue architecture when biopsies are washed for 5 min on ice with ice-cold PBS before histopathological analysis. Cell margins are clear, tissue folding and fragmentation are not observed, and the integrity of the cores is maintained, allowing optimal pathological interpretation and preservation of important diagnostic information. Adequate amounts of high-quality RNA are recovered; 51 of 55 biopsies produced a median of 1.34 μg of total RNA (range, 100 ng to 12.60 μg). The choice between snap-freezing and RNALater processing does not affect RNA recovery or the molecular profiles obtained from biopsies. The neural network predictors accurately discriminate between predominantly cancer and noncancer breast biopsies.

Conclusions: The approaches generated in these studies provide a simple, safe, and effective method for prospectively acquiring and processing breast core needle biopsies for gene expression studies. Gene expression data from these studies can be used to build accurate predictive models that separate different molecular profiles. The data establish the use and effectiveness of these approaches for future prospective studies.

The emerging gene microarray technologies provide powerful new methodologies with which to address several important issues in breast cancer research. For example, it should be possible to define gene expression patterns that can identify specific phenotypes (diagnosis), establish a patient’s expected clinical outcome (prognosis), and indicate the likelihood of a beneficial effect of a specific therapy (prediction; Refs. 1 and 2). Gene microarray assays are performed on chips, glass slides, or filters and allow the comparison of gene expression profiles from two or more tissues or from the same tissue in different biological states (3). The technologies continue to develop, and there is considerable discussion regarding which technology has the greatest potential for the molecular profiling of tumors. Each of the major approaches has advantages and disadvantages, but the most important consideration is the ability of the technology to address the chosen hypothesis (4). Overall, there is no compelling evidence of major differences in the accuracy or reproducibility of the various microarray platforms (4, 5, 6). Studies that directly compare nylon-based cDNA arrays with glass-slide cDNA arrays and/or oligonucleotide chips consistently report that these platforms produce comparable data (5, 6, 7, 8).

Because gene expression technologies provide an assessment of mRNA abundance in a sample, all require the production of a probe, labeled with either a radioactive nucleotide or a fluorescent molecule, generated from either the total or the polyadenylated RNA isolated from the sample. Currently, it is not possible to isolate adequate amounts of high-quality RNA from what would otherwise be the most abundant source: the formalin-fixed, paraffin-embedded tumor specimens available in established tumor banks. Only fresh or appropriately frozen tissues provide RNA of the quality needed to prepare probes for hybridization to existing gene expression microarrays.

Although many institutions have frozen tumor banks, these may be of limited use in obtaining reproducible gene expression profiles for some breast cancers. For example, most are heavily biased toward large breast tumors (T3-T4), which poorly represent the small tumors now seen at initial diagnosis in many patients (9). A further concern with existing frozen tissue banks is the frequent lack of a standardized approach to tissue acquisition and processing. Tissue handling between excision and freezing can vary considerably: some tumors are frozen within seconds of excision, others are placed on wet or dry ice after excision, and some may stand for many minutes at room temperature before being placed in liquid nitrogen. Tissue processing is often critical for assessing various end points and can affect both RNA stability for RNA in situ hybridization and antigen stability/accessibility for immunohistochemistry (10).

The effect of tissue acquisition and processing on gene microarray data has not been widely addressed. Nonetheless, it is likely to be important for at least two critical parameters. The first is the preservation of high-quality RNA; most investigators acknowledge the importance of using only pure, high-quality RNA for gene microarray studies (11). The second is maintenance of a tissue’s gene expression profile. For example, hypoxic or stress responses can be induced in metabolically active cells. Oxygen deprivation begins with the loss of tissue perfusion upon excision and can trigger a hypoxic response, characterized by the altered expression of specific genes (12, 13). Several of these genes are transcription factors that further affect the expression of their target genes (13).

A difficulty with both factors is that either can alter a sample while still allowing RNA to be obtained, a probe to be generated, and a molecular profile to be produced after hybridization to a gene expression microarray. Subtle changes that are time, temperature, pH, and/or oxygen dependent could occur with sufficient variability that they are almost impossible to detect reproducibly. Some tumors with high metabolic activity may be more sensitive to hypoxia, producing a statistically valid and biologically plausible clustering that reflects tissue processing more than tissue biology. Where such changes are subtle, expression profiles might still appear grossly similar, complicating the assessment of tissue processing artifacts.

Given the bias of existing banks and the potential differences in tissue processing, many important questions in breast cancer biology may require prospective study designs. Such study designs are more valid for the exploration or validation of new predictive and prognostic factors. Whereas optimized tissue acquisition and processing strategies for prospective studies offer the opportunity for greater control of tissue quality than retrospective studies, these strategies have not been described. In this study, we wished to develop a standard tissue acquisition/processing method for prospective core needle breast biopsy sampling. This method should avoid the initial use of liquid nitrogen, preserve tissue architecture, and provide adequate concentrations of high-quality RNA for microarray analysis. We now report a simple tissue processing approach using a commercially available reagent (RNALater) that is applicable to prospective studies on core needle biopsies. RNA obtained from this approach was compared with RNA from snap-frozen human breast biopsies of neoplastic and nonneoplastic tissues, gene expression microarray data were obtained, and an accurate neural network capable of discriminating between these tissues was built and validated in an independent data set.

Breast Cancer Xenograft Studies.

MDA-MB-231 cells were inoculated into athymic nude mice as described previously (14, 15). Mice were sacrificed, and tumor tissue was obtained using sterile scissors and forceps. Needle biopsies were taken from the excised xenografts and placed into separate tubes containing 0.5 ml of RNALater (Ambion, Austin, TX) at room temperature. Samples were stored at various temperatures for 72 h and subsequently processed according to the scheme in Table 1. Each experimental condition was explored in duplicate samples. Tissues were embedded in OCT (BDH; Poole, Dorset, United Kingdom), and standard frozen sections were prepared from each sample. Subsequently, sections were stained with H&E and evaluated by the study pathologist. The remainder of the core was stored at −80°C, and total RNA was extracted for evaluation. All animal studies were performed under protocols approved by the Georgetown University Animal Care and Use Committee.

Patient Population.

Patients undergoing a diagnostic core needle or excisional biopsy at Georgetown University Hospital were eligible for the tissue acquisition protocol, in which additional cores were obtained for study purposes. All patients signed a written consent form approved by the Georgetown University Medical Center Institutional Review Board. Core biopsies provided by the radiologists were obtained with either mammographic or ultrasound guidance; those provided by the surgeons were obtained either after surgical exposure of the tumor or during a routine needle biopsy. A total of 1–4 cores were obtained from each patient for study purposes, depending on the size of the breast lesion. In addition, nine frozen breast tumor specimens were obtained from the Department of Oncology, University of Edinburgh (Edinburgh, Scotland, United Kingdom), for use in testing the accuracy of the neural networks in identifying tissues as malignant or nonmalignant. These samples were collected after appropriate patient consent and in accordance with the relevant United Kingdom legislation. In this study, the pathologist was blinded to all clinical information on all samples.

Collection and Handling of Human Breast Core Biopsies for Microarray Analysis.

Generally, 1–4 core needle biopsies (14-gauge needle) were obtained from each consenting patient. Randomly chosen cores were immediately snap-frozen in liquid nitrogen; others were individually placed in separate cryo-tubes containing 0.5 ml of RNALater solution. Snap-frozen tissues were placed in liquid nitrogen directly from the core biopsy needle, immediately upon removal from the patient. For the RNALater samples, core biopsies were placed in 500 μl of RNALater and maintained at 4°C for 24 h before snap-freezing. Each tube was labeled with the patient’s name, hospital number, and study number. Frozen samples were transferred to the Lombardi Cancer Center’s Tissue and Histopathology Shared Resource (Washington, DC) for processing.

Before removing the samples from the tube for frozen section preparation, each sample was washed for 5 min on ice with 500 μl of ice-cold sterile PBS (RNase free); otherwise, samples in RNALater will not freeze in the cryostat. Each core biopsy sample was then embedded separately in an OCT block. A frozen section was taken, stained with H&E, and examined by the study pathologist. OCT-embedded samples were maintained frozen at −70°C until the analysis of the main tumor mass was complete.

The study pathologist evaluated all biopsies to determine the presence of invasive cancer and to estimate the relative amounts of normal epithelium, stroma, and fat. Because samples were to be used for microarray analysis, the percentages of invasive cancer, normal epithelium, stroma, and fat were estimated relative to cell nuclei only. Provided this histological review offered no new clinical information important for patient care, biopsies suitable for microarray analysis were identified. This procedure ensured that tissue used for expression microarray analysis had no additional diagnostic relevance. This determination is important because RNA extraction destroys tissue architecture. Had any sample contained information that modified the surgical pathology diagnosis, that biopsy would not have been used; this situation did not occur in this study.

Once released for study, all patient identifiers were removed from each sample. The link between patient identifiers and study identifiers was held in a confidential database, access to which was restricted to the clinical study principal investigator and the data entry technician. The frozen clinical material, mostly embedded in OCT, was provided directly to the research laboratory for storage and/or processing. Upon receipt in the research laboratory, tissue was either stored at −80°C or processed immediately for RNA extraction.

Preparation and Quality Assessment of RNA from Frozen Tissues.

Frozen tissue was placed in a 1 × 1-inch plastic bag on dry ice and pulverized, and lysis buffer from the Qiagen RNeasy kit (Qiagen, Inc., Valencia, CA) was added. Each sample was then transferred to a 1.5-ml centrifuge tube, homogenized with a 1-ml syringe and an 18-gauge needle, added to the Qiagen spin column, and centrifuged to bind the RNA to the matrix. The column was washed with the buffers provided in the kit, and the RNA was finally eluted with distilled H2O. RNA concentration and purity were determined spectrophotometrically from the absorbance at 260 nm and the A260 nm/A280 nm ratio, respectively, using a Beckman DU640 Spectrophotometer (Beckman, Fullerton, CA).
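The spectrophotometric step reduces to simple arithmetic. The sketch below assumes the standard conversion of about 40 μg/ml of RNA per A260 unit in a 1-cm light path; the function names, the 1:10 dilution, and the 30-μl eluate volume in the example are hypothetical illustrations, not values reported in this study.

```python
# Standard conversion for RNA: A260 = 1.0 in a 1-cm path is about 40 ug/ml.
UG_PER_ML_PER_A260_RNA = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration (ug/ml) from absorbance at 260 nm."""
    return a260 * UG_PER_ML_PER_A260_RNA * dilution_factor


def a260_a280_ratio(a260, a280):
    """Purity index; relatively pure RNA typically gives a ratio close to 2.0."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def total_yield_ug(concentration_ug_per_ml, eluate_volume_ul):
    """Total RNA (ug) in the eluate: concentration x volume (ul converted to ml)."""
    return concentration_ug_per_ml * eluate_volume_ul / 1000.0


if __name__ == "__main__":
    # Hypothetical readings: sample diluted 1:10, RNA eluted in 30 ul
    conc = rna_concentration_ug_per_ml(0.25, dilution_factor=10)
    print(conc, a260_a280_ratio(0.25, 0.125), total_yield_ug(conc, 30))
```

Separating concentration (A260) from purity (A260/A280) keeps the two quantities distinct; total yield, rather than concentration, is what determines whether a biopsy reaches the 500 ng used per hybridization.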

Because using standard gel electrophoresis to assess RNA quality would require almost the entire RNA sample, we used an Agilent 2100 analyzer and RNA 6000 LabChip kits (RNA microelectroseparation and analysis; Agilent Technologies, New Castle, DE). A total of 100 ng of each RNA sample was loaded/well. The analyzer allows for visual examination of both the 18S and 28S rRNA bands as a measure of RNA integrity.
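In this study, RNA integrity was judged by visual examination of the 18S and 28S bands. As a numerical complement, the 28S:18S peak-area ratio from the electropherogram can be computed; intact mammalian total RNA typically shows a ratio of roughly 2:1, which falls as the RNA degrades. The sketch below is an illustration only: the function names, the peak areas, and the 1.5 acceptance threshold are hypothetical assumptions, not criteria used in this study.

```python
def rrna_ratio(area_28s: float, area_18s: float) -> float:
    """28S:18S ratio computed from electropherogram peak areas."""
    if area_18s <= 0:
        raise ValueError("18S peak area must be positive")
    return area_28s / area_18s


def looks_intact(area_28s: float, area_18s: float, min_ratio: float = 1.5) -> bool:
    """Flag a sample as intact if its 28S:18S ratio reaches min_ratio.

    min_ratio is a hypothetical threshold chosen for illustration only."""
    return rrna_ratio(area_28s, area_18s) >= min_ratio


if __name__ == "__main__":
    # Hypothetical peak areas for an intact and a partly degraded sample
    for name, a28, a18 in [("sample A", 210.0, 110.0), ("sample B", 60.0, 100.0)]:
        print(name, round(rrna_ratio(a28, a18), 2), looks_intact(a28, a18))
```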

Probe Generation for Gene Microarray Hybridizations.

Probes were generated as described previously (16). This method radiolabels both the sense and antisense probe strands and further increases probe specific activity by incorporating two radiolabeled nucleotides. Thus, tumors can be profiled on nylon filter arrays with as little as 100 ng of total RNA and without RNA amplification (7, 16). Although an adequate signal is generated with 100 ng of total RNA, such small amounts of input RNA will likely limit the ability to detect many lower-abundance mRNAs adequately and reproducibly. We used 500 ng of total RNA, which is sufficient to allow the use of approximately 70% of breast needle biopsies without either RNA amplification or pooling. None of the RNAs was amplified or pooled in the current study.

To synthesize the labeled cDNA probe, 500 ng of total RNA were incubated at 70°C for 10 min with 2 μg of oligodeoxythymidylate and then chilled on ice for 2 min. The primed RNA was incubated at 37°C for 90 min in a solution containing 1× first-strand buffer, 3 mM DTT, 1 mM dGTP/dTTP, 300 units of reverse transcriptase, 50 μCi of [33P]dCTP, and 50 μCi of [33P]dATP. The second strand was synthesized by adding 1× reaction buffer, 100 units of DNA polymerase I, 500 ng of random primers, 1 mM dGTP/dTTP, 50 μCi of [33P]dCTP, and 50 μCi of [33P]dATP, and the reaction was incubated for 2 h at 16°C. The radiolabeled probe was purified using a BioSpin-6 chromatography column (Bio-Rad) and denatured by boiling for 3 min. The purified probe was added to the hybridization roller tube containing the prehybridized GeneFilter and incubated for 12–18 h at 42°C in a Robin Scientific Roller Oven. For these studies, the NamedGenes filters (Research Genetics, Inc., Huntsville, AL) were used; each filter contains 4032 known genes, 192 housekeeping genes, and 192 control genes. Each hybridized GeneFilter was washed twice in 2× SSC, 1% SDS at 50°C for 20 min and once in 0.5× SSC, 1% SDS at 55°C for 15 min. Hybridization signals were detected by phosphorimaging using a Molecular Dynamics Storm PhosphorImager (Molecular Dynamics, Sunnyvale, CA). The sensitivity and reproducibility of these and other nylon filter-based cDNA microarrays have been widely reported (7, 17, 18, 19, 20).

Normalization of Data.

Pathways software algorithms (Research Genetics, Inc.) were used to correct for nonspecific binding of the probe to filter (background correction). Approaches for signal normalization, intended to correct for differences in probe specific activities, hybridizations, and other interexperiment variables, are diverse (11). In the present study, the average of all data points was used to calculate a normalization factor; the normalized intensity value for each spot was obtained by multiplying the normalization factor by the raw intensity (11).
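A minimal sketch of this global normalization, assuming the normalization factor is the ratio of a common target value to the mean of all background-corrected spot intensities on a filter. The exact formula used inside the Pathways software is not given here, so `target_mean` and the function name are hypothetical; only the "factor × raw intensity" step is taken from the text.

```python
from statistics import fmean


def global_mean_normalize(raw_intensities, target_mean=1.0):
    """Scale all spots on one filter so that their mean equals target_mean.

    Returns (normalization_factor, normalized_intensities); each normalized
    value is the factor multiplied by the raw (background-corrected) value."""
    mean_intensity = fmean(raw_intensities)
    if mean_intensity <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target_mean / mean_intensity
    return factor, [factor * x for x in raw_intensities]


if __name__ == "__main__":
    # Hypothetical spot intensities from two filters whose probes differ
    # roughly two-fold in specific activity
    filter_a = [100.0, 200.0, 300.0]
    filter_b = [210.0, 390.0, 600.0]
    for raw in (filter_a, filter_b):
        factor, normalized = global_mean_normalize(raw)
        print(round(factor, 5), [round(v, 3) for v in normalized])
```

After scaling, both filters share the same mean intensity, so a gene's relative position within each filter can be compared across hybridizations despite differences in probe specific activity.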

Analysis of Gene Microarray Data.

The optimal approach for analyzing the high-dimensional data generated by gene microarray studies remains unclear. The high dimensionality of these data is problematic, because most existing analytical methods perform more accurately in low-dimensional spaces (21). However, rather than making statistical inferences to identify and study functionally relevant genes, the goal of this study was to validate the tissue acquisition and processing methods and to demonstrate the applicability of this approach for building clinically relevant predictive models.

Recently, we devised a simple approach to the exploration of small studies with two experimental groups.⁵ This approach uses simple statistical analyses to reduce data dimensionality and identify subsets of discriminant genes and is similar in principle to that used by Hedenfalk et al. (22). Because the class of each sample (cancer versus noncancer) is known from the histopathological analyses, dimensionality can be reduced in a supervised manner by performing a series of statistical tests. The sole purpose of these tests was to select a group of genes for data visualization and analysis. Student’s t test and a t test for unequal variances (each of which assumes normally distributed data) and a nonparametric (distribution-free) Wilcoxon test were used. Although performing many tests inflates the type 1 error rate and so overestimates the number of significant differences, the incidence of false negatives should be smaller. Because the distribution of the data among and within replicate experiments and for individual genes cannot be determined (23), both log-transformed and nontransformed data were compared.
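The supervised, per-gene screening described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the rank-sum p-value is computed exactly by enumerating all 252 ways of splitting 10 samples into groups of 5, and only the Welch (unequal-variance) t statistic is shown, because its p-value requires a t distribution (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Two-sided exact Wilcoxon rank-sum p-value, found by enumerating every
    assignment of the pooled ranks to group A (252 splits for 5 versus 5)."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a = len(group_a)
    expected = n_a * (len(pooled) + 1) / 2
    observed = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            extreme += 1
    return extreme / total


def welch_t(group_a, group_b):
    """t statistic for the unequal-variance (Welch) t test."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def select_genes(expr, n_cancer, alpha=0.05):
    """expr maps gene name -> expression values with the n_cancer cancer
    samples first; returns {gene: p} for genes with rank-sum p < alpha."""
    selected = {}
    for gene, values in expr.items():
        p = exact_rank_sum_p(values[:n_cancer], values[n_cancer:])
        if p < alpha:
            selected[gene] = p
    return selected
```

One consequence worth noting: a rank-based test is unchanged by a log transformation (ranks are preserved by any monotonic transform), so comparing log-transformed and nontransformed data only matters for the t tests. With 5 versus 5 samples, the smallest attainable two-sided rank-sum p-value is 2/252 ≈ 0.008.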

Two reduced-dimensional data sets were selected: one comprising genes with Ps < 0.05 and one comprising genes with Ps < 0.02. Because of their marked biological differences, these phenotypes should be easily separable. Thus, the data were visualized using our Fisher separability-based multidimensional scaling approach, which projects high-dimensional data into a three-dimensional data space (24, 25). Because it has become widely used, visualization using the simple hierarchical clustering described by Eisen et al. (26) is also presented.
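The three-dimensional projection can be illustrated with ordinary principal component analysis, whose first three axes are the ones reported for the plots in this study. This is a minimal sketch using NumPy's SVD; it is plain PCA, not the authors' Fisher separability-based multidimensional scaling, and the function name and synthetic data (10 samples × 103 genes) are hypothetical.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project samples (rows of X) onto their first principal components.

    Returns the projected coordinates and the cumulative proportion of total
    variance captured by the first 1..n_components axes."""
    Xc = X - X.mean(axis=0)                 # center each gene
    # Right singular vectors of the centered matrix are the principal axes
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T       # coordinates for a 3-D scatter plot
    explained = s ** 2 / np.sum(s ** 2)     # variance proportion per axis
    return scores, np.cumsum(explained)[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 10 samples x 103 genes, with the first 5 samples shifted
    X = rng.normal(size=(10, 103))
    X[:5] += 1.5
    coords, cumvar = pca_project(X)
    print(coords.shape, np.round(cumvar, 2))
```

The cumulative proportions returned here correspond to the kind of figures quoted for the principal component axes (for example, 55%, 72%, and 79%); as the text cautions, such a plot shows data structure but is not a statistical test of separability.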

Generation and Testing of a Neural Network.

To determine whether the genes we selected could be used to separate cancer from noncancer tissues, a neural network was trained using the gene expression microarray data from five cancer biopsies and five noncancer biopsies. Neural networks can be considered as parallel computing systems consisting of many simple processors with many interconnections. The main advantages of neural networks are that they can learn complex nonlinear input-output relationships, use sequential training procedures, and adapt themselves to the data (27, 28).

The learning process involves updating the network architecture and connection weights so that the predictive model can efficiently perform a specific classification task. We used a multilayer perceptron to design a nonlinear neural classifier, using each gene’s expression level in the tissue samples as the input and the cancer versus noncancer phenotype of each sample as the output. Consequently, the network output comes to approximate the posterior Bayesian probability of a sample being either cancer or noncancer given its gene expression profile (27, 29, 30). Three experimental configurations were tested, using either the top 40, 80, and 103 dimensions (data set selecting the top 103 genes; P < 0.05) or the top 10, 20, and 30 dimensions (data set selecting the top 30 genes; P < 0.02). These top genes were selected based on their fold difference between cancer and noncancer and their respective Ps. Two prediction models were built, one with 3 hidden nodes and 8 inputs and one with 5 hidden nodes and 18 inputs. Mean-squared error estimates were used to explore network performance. The “leave-one-out” method was used for the initial training and testing of each neural network (27, 29, 30).
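The training and leave-one-out evaluation can be sketched with a small NumPy multilayer perceptron mirroring the 8-input, 3-hidden-node configuration. This is a minimal illustration, not the software used in the study: the function names, weight initialization, learning rate, epoch count, per-fold standardization of genes, and the synthetic data are all hypothetical choices. The network is trained by full-batch gradient descent on mean-squared error, so its sigmoid output approximates the probability of the cancer class.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def train_mlp(X, y, n_hidden=3, lr=1.0, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron (sigmoid units) by full-batch
    gradient descent on the mean-squared error against 0/1 labels."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)          # hidden activations, shape (n, n_hidden)
        out = sigmoid(h @ W2 + b2)        # network output, shape (n,)
        # Backpropagate d(MSE)/d(pre-activation) through output and hidden layers
        d_out = 2.0 * (out - y) * out * (1.0 - out) / n
        d_h = np.outer(d_out, W2) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Network output in (0, 1), read as the probability of 'cancer'."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


def leave_one_out_error(X, y, **train_kwargs):
    """Fraction of samples misclassified when each is held out in turn."""
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        # Standardize genes using the training fold only, to avoid leakage
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-9
        params = train_mlp((X[train] - mu) / sd, y[train], **train_kwargs)
        prob = predict(params, (X[i:i + 1] - mu) / sd)[0]
        errors += int((prob >= 0.5) != (y[i] == 1))
    return errors / n


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical data: 5 "cancer" and 5 "noncancer" samples, 8 genes
    y = np.array([1] * 5 + [0] * 5, dtype=float)
    X = rng.normal(0.0, 1.0, (10, 8)) + np.where(y[:, None] == 1, 2.0, -2.0)
    print("LOO error:", leave_one_out_error(X, y, n_hidden=3))
```

Leave-one-out retrains the network 10 times, each time predicting the one sample it never saw; with only 10 arrays this uses the data efficiently, but it cannot substitute for an independent test set such as the Edinburgh specimens.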

RNA Quality and Tissue Architecture from Xenograft Tissues Processed Using RNALater.

Recovery of high-quality RNA was optimal when OCT-embedded tissue samples were removed from frozen blocks using a small volume of RNALater, which thawed and softened the embedding medium before tissue extraction from the blocks. Thus, the frozen block was placed in a small plastic tray, with the embedded tissue facing up, and 500 μl of RNALater were pipetted on top. Using this method, seven of eight samples yielded high-quality RNA (Fig. 1 A; RNA integrity analyzed using the Agilent 2100 bioanalyzer and RNA 6000 LabChip kit). In previous experiments, where the OCT block was dissolved by vigorous shaking in a large volume of PBS and tissue fragments were recovered with a strainer, only 6 of 12 samples yielded fully intact RNA (data not shown).

Pathology was not interpretable from material frozen directly in RNALater and transferred to OCT without wash steps (Fig. 2A). This reflects inadequate freezing in the cryostat and consequent tissue folding during cutting. When washed in PBS:RNALater (1:6) for 2 h at 4°C, the tissue did not fold on cutting, but cell outlines appeared blurred, making pathological interpretation difficult (Fig. 2B). Washing in PBS:RNALater (1:6) for 5 min at 4°C also eliminated tissue folding, and the cell outlines now appeared distinct. Nonetheless, tissue fragmentation occurred in some specimens, making pathological interpretation suboptimal (Fig. 2C). Optimal preservation of tissue architecture was obtained by washing tissue for 5 min on ice with ice-cold PBS. Cell margins were clear, tissue folding and fragmentation were not observed, and the integrity of the cores was maintained, allowing optimal pathological interpretation (Fig. 2D). Similar data were obtained from a human breast core biopsy released to this study (Fig. 3, A–D). A scheme of the optimized tissue acquisition protocol is shown in Fig. 4.

Recovery of High-quality RNA from Human Breast Biopsies for Gene Microarray Studies.

Cores were removed from OCT by placing the frozen block in a small plastic tray, with the embedded tissue facing up, and pipetting 500 μl of RNALater on top. Intact cores were easily picked out of the OCT, which remained semisolid, using a sterile pipette tip. From a study of 55 breast needle biopsies, we obtained ≥100 ng of RNA from almost all samples (Table 2). The median yield (1.34 μg) shows that most biopsies produce sufficient RNA to generate data using 500 ng of total RNA. There was no significant difference between frozen and RNALater-processed biopsies in the mean amounts of total RNA recovered (Tables 2 and 3). Thus, prospectively collected breast needle biopsies, either directly snap-frozen or processed in RNALater, can produce adequate amounts of RNA for use in gene microarray studies.

A further requirement of gene expression microarray experiments is the isolation of high-quality RNA (11). Sufficient RNA was not recovered to allow for an assessment of RNA quality by both standard gel electrophoresis methods and gene microarray studies on the same samples. Because gel electrophoresis requires ∼1 μg of RNA, the Agilent 2100 “lab-on-a-chip” technology was used to assess RNA quality. This technology requires only 100 ng of RNA to determine quality, with specificity comparable with or better than that obtained from standard gel electrophoresis. Consequently, RNA quality can be assessed on samples that will later be subjected to gene microarray analysis. As is evident from Fig. 1 B, >90% of representative biopsies produced high-quality RNA.

Analysis of Core Needle Breast Biopsies and Visualization of Gene Expression Data.

To assess the applicability of the tissue processing procedure, we obtained total RNA from five randomly selected breast cancer biopsies and five randomly selected biopsies of noncancer tissue (Table 3). All tissues were evaluated by the study pathologist before release for our studies to ensure that the investigational cores contained no diagnostically useful information. Both biopsies processed in RNALater and biopsies frozen without RNALater were analyzed, and the two processing methods were approximately equally represented in each group (RNALater processed: cancer = 3; noncancer = 3). RNA was prepared, and probes were generated as described above. The mean amounts of RNA recovered by both methods were comparable (see also Table 2). Probes were hybridized to NamedGenes filters, and signal was measured using a Molecular Dynamics Storm PhosphorImager. Digitized images of the hybridized filter signals were imported into the Pathways software for background correction and normalization.

Normalized gene expression data were imported into the visualization algorithm, and scatter plots of the gene expression data were generated. We first reduced dimensionality by eliminating noninformative genes, excluding those genes whose expression was not likely to differ between the cancer and noncancer groups (multiple t tests, P > 0.05). A total of 103 genes met this criterion and were used to generate a three-dimensional (from 103-dimensional) plot of the data (Fig. 5A). The three axes are the first three principal components fitted to the cancer and noncancer molecular profile data. The cumulative proportion of the variance captured by each principal component axis is: (a) principal component axis 1, 55%; (b) principal component axis 2, 72%; and (c) principal component axis 3, 79%. We also applied hierarchical clustering, similar to approaches used by others (26), based on a distance matrix defined as 1 − Pearson’s correlation coefficient. This approach could not completely separate two cancers from the cluster of noncancers (Fig. 5B). Principal component analysis (PCA)-based multidimensional scaling visualization separated breast cancers (triangles) and noncancer tissues (circles) into linearly separable regions of gene expression data space. However, neither approach provides a statistical assessment of separability, only a visualization of data structure. Although the number of data points is limited, the multidimensional scaling visualization is consistent with our ability to identify a putative molecular profile that can separate neoplastic from nonneoplastic tissues.
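Hierarchical clustering on a 1 − Pearson correlation distance can be sketched in plain Python. The linkage rule used in this study is not stated, so average linkage (mean distance over all cross-cluster pairs) is an assumption here; the function names and the 4-gene sample profiles are hypothetical. Constant profiles would make the correlation undefined and are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of samples with distance 1 - Pearson r.

    profiles maps sample name -> expression vector. Returns the merge
    history as (cluster_a, cluster_b, distance) tuples, closest first."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = {name: [name] for name in names}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, ka in enumerate(keys):
            for kb in keys[i + 1:]:
                members_a, members_b = clusters[ka], clusters[kb]
                # Average linkage: mean distance over all cross-cluster pairs
                d = sum(dist[a, b] for a in members_a for b in members_b) / (
                    len(members_a) * len(members_b))
                if best is None or d < best[0]:
                    best = (d, ka, kb)
        d, ka, kb = best
        clusters[f"({ka},{kb})"] = clusters.pop(ka) + clusters.pop(kb)
        merges.append((ka, kb, d))
    return merges


if __name__ == "__main__":
    # Hypothetical 4-gene profiles for two cancer and two noncancer samples
    samples = {"c1": [1, 2, 3, 4], "c2": [2, 4, 6, 8.5],
               "n1": [4, 3, 2, 1], "n2": [8, 6, 4, 1]}
    for step in average_linkage(samples):
        print(step)
```

Because 1 − r depends only on the shape of a profile, not its overall level, this distance groups samples with similar patterns of up- and down-regulation even when their absolute intensities differ.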

This subset of genes is expected to include some false positives, reflecting the type 1 error associated with the selection. Consequently, data dimensionality was further reduced using more conservative criteria (P ≤ 0.02 and regulation ≥1.8-fold). We chose this fold-regulation threshold to include all differences of approximately 2-fold in mean gene expression levels between cancer and noncancer tissues. The analysis produced a 30-dimensional data set; 25 signals (genes) were up-regulated in the neoplastic biopsies (Table 4A), and 5 signals were up-regulated in the nonneoplastic biopsies (Table 4B). The ability of this subset to separate cancer from noncancer was also evaluated using both our PCA-based multidimensional scaling approach and simple hierarchical clustering. The cumulative proportion of the variance captured by each principal component axis is: (a) principal component axis 1, 64%; (b) principal component axis 2, 75%; and (c) principal component axis 3, 82%. Neoplastic and nonneoplastic tissues (Table 3) were now linearly separable in gene expression data space by both visualization methods (Fig. 6, A and B).
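The combined significance and fold-change criterion can be expressed as a small filter. In this sketch, fold change is assumed to be the ratio of mean normalized expression (cancer/noncancer), and a gene qualifies if it is at least 1.8-fold higher in either group, consistent with the table listing genes up-regulated in cancer and in noncancer tissues separately. The function names are hypothetical, and the P value is taken as an input from whichever test was applied.

```python
from statistics import mean


def fold_change(cancer, noncancer):
    """Ratio of mean expression, cancer / noncancer (>1 means higher in cancer)."""
    m_c, m_n = mean(cancer), mean(noncancer)
    if m_c <= 0 or m_n <= 0:
        raise ValueError("mean intensities must be positive")
    return m_c / m_n


def passes_filter(cancer, noncancer, p_value, max_p=0.02, min_fold=1.8):
    """Keep a gene if P <= max_p and it is >= min_fold up- or down-regulated."""
    fc = fold_change(cancer, noncancer)
    return p_value <= max_p and max(fc, 1.0 / fc) >= min_fold
```

Taking `max(fc, 1 / fc)` treats a 1.8-fold decrease the same as a 1.8-fold increase, so genes higher in noncancer tissue are retained on equal terms.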

Neural Network Predictors of Biopsy Phenotypes.

Having reduced the dimensionality, we needed to assess whether the expression patterns of the remaining genes in the 103- and 30-dimensional data sets contained useful discriminatory information. Thus, we assessed the ability of various gene subsets to train accurate neural network predictors that could separate predominantly cancer from noncancer tissues. The three configurations tested (1–3 hidden nodes) for genes within the 30- and 103-dimensional data sets are described in “Materials and Methods.” All were evaluated using the leave-one-out method. Although the number of microarrays from which the data were obtained is small (n = 10), each configuration achieved a 0% misclassification rate (network training) for cancer versus noncancer, whether in 103 or 30 dimensions and with either log10-transformed or nontransformed gene expression data.

Because the initial training and testing were done on the original data set from the Georgetown University samples, we tested the neural networks against an independent data set of nine frozen breast specimens from the University of Edinburgh. These were snap-frozen mastectomy specimens rather than core needle breast biopsies, but they should contain a mixture of cancer and noncancer cells and provide a strong and independent challenge for the neural network. The neural network model should accurately predict as cancer any biopsy comprising >80% cancer tissue.

Gene expression data were generated using the same Research Genetics filter technology and queried in the predictive model. For both the 103- and 30-gene data sets, the nontransformed data provided the more accurate models. Both models classified all nine samples as cancer. The pathologist who evaluated the samples for the training set subsequently performed histopathological analysis of stored samples of these tissues, and all nine were confirmed as ≥80% cancer specimens. Thus, no samples in the independent test data set were misclassified, supporting the neural networks’ predictive accuracy. When the log10-transformed data were used, the models misclassified 1 of 9 tumors (30 dimensions; 89% accurate) and 2 of 9 tumors (103 dimensions; 78% accurate). The lower classification rate with the 103 genes probably reflects the increased type 1 error associated with this data set and the failure to exclude some uninformative genes.

Genes Differentially Expressed between Breast Cancer and Noncancer Tissues.

The data in Table 4 show that the choice of t test has only a marginal effect on data selection for supervised dimension reduction. If we make no assumption regarding distribution of the data, approximately 1 in 3 genes would be rejected by relying solely on the nonparametric analyses, a ≥1.8-fold differential expression, and a cutoff of P ≤ 0.02. The 30 target cDNAs comprising the 30-dimensional data set are presented in Table 4.

Generally, prospective study designs are more valid for the exploration or validation of new predictive and prognostic factors. Retrospective breast cancer studies may be compromised by the bias toward larger tumors in many existing frozen tumor banks, whereas the average size of newly diagnosed breast tumors continues to decrease (9). Thus, many studies of the molecular biology of such early lesions may need to be done prospectively. Investigators at single academic institutions can often prospectively obtain frozen samples under a rigorous collection protocol. However, doing so at multiple institutions, or when local clinics and community physicians are also involved, can be problematic. A rapid, standard tissue processing approach should allow tissues from multiple institutions to be used in a controlled manner; for example, it should reduce changes in molecular profiles associated with differences in tissue acquisition and processing. Although these concerns have not been explored in detail experimentally, tissue processing clearly affects the performance of other molecular biological technologies applied to human biopsies and tumor tissues (10).

To address these issues, we conducted studies to identify an optimal tissue acquisition, processing, and analysis procedure for exploring the gene expression profiles of prospectively accrued breast core needle biopsies. Because RNA extraction destroys tissue architecture, we developed a novel tissue processing method that would allow us to obtain samples in a uniform manner, preserve RNA quality and quantity, and, most importantly, retain all potentially diagnostically relevant information.

Tissue placed in RNALater can be stored for up to 1 h at 37°C, 1 week at 25°C, or ≥1 month at 4°C and still retain fully intact RNA (31). Our data show that biopsies processed immediately in either liquid nitrogen or RNALater can yield sufficient amounts of high-quality RNA for nylon filter microarray analysis without RNA amplification. This amount of RNA is also adequate for amplification for use with other gene expression microarray technologies (32). If biopsies collected in RNALater are processed carefully, their tissue architecture can be maintained. This is clearly important because some small breast lesions can be completely removed by the biopsy procedure; such core biopsies should not be used for research studies if critical diagnostic information could be lost. Using the approaches described in this study, we estimate that approximately 90% of suitable core needle breast biopsies should produce sufficient material for gene expression microarray studies.
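In a multi-institution collection protocol, the RNALater storage limits quoted above could be checked automatically when samples arrive. The Python sketch below encodes only the three limits given in the text; treating "≥1 month" at 4°C as exactly 30 days is a conservative assumption, and the function name is hypothetical.

```python
from datetime import timedelta

# Storage limits for tissue in RNALater as quoted in the text (ref. 31),
# keyed by storage temperature in °C. "≥1 month" at 4 °C is treated
# conservatively as exactly 30 days (an assumption).
MAX_STORAGE = {
    37: timedelta(hours=1),
    25: timedelta(weeks=1),
    4: timedelta(days=30),
}


def within_rnalater_window(temperature_c, elapsed):
    """True if storage time at a listed temperature is within the quoted limit.

    Unlisted temperatures raise ValueError instead of being interpolated.
    """
    try:
        limit = MAX_STORAGE[temperature_c]
    except KeyError:
        raise ValueError(f"no stated limit for {temperature_c} °C") from None
    return elapsed <= limit


print(within_rnalater_window(25, timedelta(days=3)))   # True
print(within_rnalater_window(37, timedelta(hours=2)))  # False
```

Rejecting unlisted temperatures, rather than interpolating between them, keeps the check from claiming more than the manufacturer's figures support.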

Our studies demonstrate that the recovered RNA can be used to generate relevant gene expression microarray information. Its relevance is evident from our ability to identify differentially expressed genes associated with breast cancer cells and to build accurate neural network predictors that distinguish cancer from noncancer samples based solely on their gene expression profiles.

Among the differentially expressed genes in the reduced 30-dimensional space, we would expect to find some genes already implicated in breast cancer or known to be expressed in normal or neoplastic breast tissues. Consistent with this expectation, several genes of potential relevance were identified. For example, ceruloplasmin is up-regulated in neoplastic breast tissues, and elevated serum levels of ceruloplasmin are associated with recurrent breast cancers (33, 34). The BT2 glycoprotein is a milk protein (35) and might be expected to be expressed in tissues composed predominantly of breast epithelial cells.

ER protein status is routinely determined for cancer but not for noncancer biopsies. Because four of the five tumor biopsies were ER positive, we would also expect to find genes whose expression is known to be associated with ER or that modulate ER function. At least three genes meet these criteria: the aryl hydrocarbon receptor interacts with ER and affects its function (36), and the expression of both grb14 and α-catenin is associated with ER expression in breast tumors (Refs. 37 and 38; Table 5).

The discriminant power of the selected genes is evident from the accuracy of the neural networks built using the data from the initial five cancer and five noncancer biopsies. The ability to correctly identify independent samples as cancer shows that the genes of interest are expressed or repressed in patterns and at levels consistent with the model. This is an appropriate and rigorous test of the approach because the goal was to build molecular predictors rather than to identify functionally relevant genes. Building a predictor is also a much more efficient test of the selected genes than confirming their expression gene by gene with more standard assays such as Northern blot, RNase protection, or real-time PCR. Confirming the differential expression of each gene is unnecessary for building clinically relevant predictive models. Unlike in studies designed to identify functionally relevant genes, the discriminant power of each target cDNA signal on the array does not depend on whether that signal arises from hybridization to its expected mRNA.
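The text does not give the network architecture or training settings used for the multilayer perceptron, so the following is only a generic sketch: a one-hidden-layer perceptron (tanh hidden units, sigmoid output) trained by full-batch gradient descent on mean binary cross-entropy, written with NumPy. The data are synthetic, shaped like the training set described above (five cancer and five noncancer samples, 30 genes); hidden size, learning rate, epochs, and the data-generating shift are all assumptions, not the authors' configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Fit a one-hidden-layer perceptron (tanh hidden units, sigmoid output)
    by full-batch gradient descent on the mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)               # hidden activations, shape (n, hidden)
        p = sigmoid(h @ w2 + b2)               # predicted P(cancer), shape (n,)
        g_out = (p - y) / n                    # loss gradient w.r.t. the output logit
        g_hid = np.outer(g_out, w2) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 = w2 - lr * (h.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (X.T @ g_hid)
        b1 = b1 - lr * g_hid.sum(axis=0)
    return W1, b1, w2, b2


def predict(params, X):
    """Return 1 (cancer) where the output probability exceeds 0.5, else 0 (noncancer)."""
    W1, b1, w2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ w2 + b2) > 0.5).astype(int)


# Synthetic stand-in for the training data: 5 "cancer" and 5 "noncancer"
# samples, 30 genes; the first 10 genes are shifted upward in the cancer group.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 30))
X[:5, :10] += 3.0
y = np.array([1] * 5 + [0] * 5)

params = train_mlp(X, y)
print("training accuracy:", (predict(params, X) == y).mean())
```

As the text argues, training accuracy on ten samples says little by itself; the meaningful check is to apply `predict` to samples held out from training, as was done with the nine mastectomy specimens.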

The gene expression profile data and neural network performance suggest that, at least for samples with very different biologies, contamination of samples with ≥80% of other cell types may not confound molecular profiling analyses. Whether this observation can be extrapolated to other studies remains to be established. Nonetheless, the resource-intensive steps of microdissection and RNA amplification may not be required for all molecular profiling studies.

The tissue acquisition and processing methods, dimension reduction, data visualization approaches, and neural network analyses we describe may be useful in the design of larger prospective studies. We continue to develop other data visualization, normalization, and exploration algorithms that also may be of use in the analysis of gene expression microarray studies (24, 25, 39, 40).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1

Supported by USPHS from National Cancer Institute Grants 5R33CA83231 (to Y. W.), R01-CA/AG58022 (to R. C.), P50-CA58185 (to R. C.), and K12-CA76903 (to M. L.), the Department of Defense Grants BC980629 (to R. C.) and BC990358 (to R. C.), and CI-3 Cancer Research Fund-Lilly Clinical Investigator Award of the Damon Runyon-Walter Winchell Foundation (to V. S., D. F. H.).

5

Z. Gu. Association of interferon regulatory factor-1, nucleophosmin, nuclear factor-kappa-B and cAMP binding with acquired resistance to Faslodex (ICI 182,780), submitted for publication.

6

The abbreviations used are: PCA, principal component analysis; ER, estrogen receptor.

1. Hayes D. F., Bast R. C., Desch C. E., Fritsche H., Kemeny N. E., Jessup J. M., Locker G. Y., MacDonald J. S., Mennel R. G., Norton L., Ravdin P. M., Taube S., Winn R. J. Tumor marker utility grading system: a framework to evaluate clinical utility of tumor markers. J. Natl. Cancer Inst. (Bethesda), 88: 1456-1466, 1996.
2. Hayes D. F., Trock B., Harris A. Assessing the clinical impact of prognostic factors: when is “statistically significant” clinically useful? Breast Cancer Res. Treat., 52: 305-319, 1998.
3. Marx J. DNA arrays reveal cancer in its many forms. Science (Wash. DC), 289: 1670-1672, 2000.
4. Carulli J. P., Artinger M., Swain P. M., Root C. D., Chee L., Tulig C., Guerin J., Osborne M., Stein G., Lian J., Lomedico P. T. High throughput analysis of differential gene expression. J. Cell. Biochem., 30–31: 286-296, 1998.
5. Richmond C. S., Glasner J. D., Mau R., Jin H., Blattner F. R. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res., 27: 3821-3835, 1999.
6. Cox J. M. Applications of nylon membrane arrays to gene expression analysis. J. Immunol. Methods, 250: 3-13, 2001.
7. Bertucci F., Bernard K., Loriod B., Chang Y. C., Granjeaud S., Birnbaum D., Nguyen C., Peck K., Jordan B. R. Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples. Hum. Mol. Genet., 8: 1715-1722, 1999.
8. Granjeaud S., Bertucci F., Jordan B. R. Expression profiling: DNA arrays in many guises. Bioessays, 21: 781-790, 1999.
9. Morrow M., Schnitt S. J., Harris J. R. In situ carcinomas. In: J. R. Harris, M. E. Lippman, M. Morrow, and S. Hellman (eds.), Diseases of the Breast, pp. 355-373. Philadelphia: Lippincott-Raven, 1996.
10. Zeller R. In situ hybridization and immunohistochemistry. In: F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl (eds.), Current Protocols in Molecular Biology, pp. 14.1.1-14.14.8. New York: John Wiley & Sons, Inc., 2001.
11. Schuchhardt J., Beule D., Malik A., Wolski E., Eickhoff H., Lehrach H., Herzel H. Normalization strategies for cDNA microarrays. Nucleic Acids Res., 28: E47, 2000.
12. Wang G. L., Semenza G. L. General involvement of hypoxia-inducible factor 1 in transcriptional response to hypoxia. Proc. Natl. Acad. Sci. USA, 90: 4304-4308, 1993.
13. Wenger R. H., Rolfs A., Marti H. H., Bauer C., Gassmann M. Hypoxia, a novel inducer of acute phase gene expression in a human hepatoma cell line. J. Biol. Chem., 270: 27865-27870, 1995.
14. Leonessa F., Green D., Licht T., Wright A., Wingate-Legette K., Lippman J., Gottesman M. M., Clarke R. MDA435/LCC6 and MDA435/LCC6MDR1: ascites models of human breast cancer. Br. J. Cancer, 73: 154-161, 1996.
15. Clarke R. Human breast cancer cell line xenografts as models of breast cancer: the immunobiologies of recipient mice and the characteristics of several tumorigenic cell lines. Breast Cancer Res. Treat., 39: 69-86, 1996.
16. Sgroi D. C., Teng S., Robinson G., LeVangie R., Hudson J. R., Elkahloun A. G. In vivo gene expression profile analysis of human breast cancer progression. Cancer Res., 59: 5656-5661, 1999.
17. Walker J., Rigley K. Gene expression profiling in human peripheral blood mononuclear cells using high-density filter-based cDNA microarrays. J. Immunol. Methods, 239: 167-179, 2000.
18. McCormick S. M., Eskin S. G., McIntire L. V., Teng C. L., Lu C. M., Russell C. G., Chittur K. K. DNA microarray reveals changes in gene expression of shear stressed human umbilical vein endothelial cells. Proc. Natl. Acad. Sci. USA, 98: 8955-8960, 2001.
19. Herwig R., Aanstad P., Clark M., Lehrach H. Statistical evaluation of differential expression on cDNA nylon arrays with replicated experiments. Nucleic Acids Res., 29: E117, 2001.
20. Carlisle A. J., Prabhu V. V., Elkahloun A., Hudson J., Trent J. M., Linehan W. M., Williams E. D., Emmert-Buck M. R., Liotta L. A., Munson P. J., Krizman D. B. Development of a prostate cDNA microarray and statistical gene expression analysis package. Mol. Carcinog., 28: 12-22, 2000.
21. Hinneburg A., Keim D. A. Optimal grid-clustering: toward breaking the curse of dimensionality in high-dimensional clustering. In: M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie (eds.), Proceedings of the 25th Conference on Very Large Databases, pp. 506-517. San Francisco: Morgan Kaufmann, 1999.
22. Hedenfalk I., Duggan D., Chen Y., Radmacher M., Bittner M., Simon R., Meltzer P., Gusterson B., Esteller M., Kallioniemi O. P., Wilfond B., Borg A., Trent J. Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344: 539-548, 2001.
23. Wittes J., Friedman H. P. Searching for evidence of altered gene expression: a comment on statistical analysis of microarray data. J. Natl. Cancer Inst. (Bethesda), 91: 400-401, 1999.
24. Wang Y., Lin S. H., Li H., Kung S. Y. Data mapping by probabilistic modular networks and information theoretic criteria. IEEE Trans. Signal Processing, 46: 3378-3397, 1998.
25. Wang Y., Luo L., Freedman M. T., Kung S. Y. Probabilistic principal component subspaces: a hierarchical finite mixture model for data visualization. IEEE Trans. Neural Net., 11: 635-646, 2000.
26. Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95: 14863-14868, 1998.
27. Haykin S. Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice Hall, Inc., 1999.
28. Jain A. K., Duin R. P. W., Mao J. Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Machine Intel., 22: 4-37, 2000.
29. Adali T., Wang Y., Li H. Neural networks for biomedical signal processing. In: Y-H. Hu and J-N. Huang (eds.), Handbook of Neural Network Signal Processing, in press. Boca Raton, FL: CRC Press, Inc., 2001.
30. Ripley B. Pattern Recognition and Neural Networks. Cambridge, U.K.: Cambridge University Press, 1996.
31. RNALater Tissue Collection: RNA Stabilization Solution. In: Ambion Protocols and Manuals. Austin, TX: Ambion, 2001.
32. Eberwine J. Amplification of mRNA populations using aRNA generated from immobilized oligo(dT)-T7 primed cDNA. Biotechniques, 20: 584-591, 1996.
33. Ozyilkan O., Baltali E., Ozyilkan E., Tekuzman G., Kars A., Firat D. Ceruloplasmin level in women with breast disease. Preliminary results. Acta Oncol., 31: 843-846, 1992.
34. Schapira D. V., Schapira M. Use of ceruloplasmin levels to monitor response to therapy and predict recurrence of breast cancer. Breast Cancer Res. Treat., 3: 221-224, 1983.
35. Peterson J. A., Hamosh M., Scallan C. D., Ceriani R. L., Henderson T. R., Mehta N. R., Armand M., Hamosh P. Milk fat globule glycoproteins in human milk and in gastric aspirates of mother’s milk-fed preterm infants. Pediatr. Res., 44: 499-506, 1998.
36. Klinge C. M., Kaur K., Swanson H. I. The aryl hydrocarbon receptor interacts with estrogen receptor α and orphan receptors COUP-TFI and ERRα1. Arch. Biochem. Biophys., 373: 163-174, 2000.
37. Daly R. J., Sanderson G. M., Janes P. W., Sutherland R. L. Cloning and characterization of GRB14, a novel member of the GRB7 gene family. J. Biol. Chem., 271: 12502-12510, 1996.
38. Gonzalez M. A., Pinder S. E., Wencyk P. M., Bell J. A., Elston C. W., Nicholson R. I., Robertson J. F., Blamey R. W., Ellis I. O. An immunohistochemical examination of the expression of E-cadherin, α- and β/γ-catenins, and α2- and β1-integrins in invasive breast cancer. J. Pathol., 187: 523-529, 1999.
39. Wang Y., Lu J., Lee R. Y., Clarke R. Iterative normalization of cDNA microarray data. IEEE Trans. Inf. Technol. Biomed., 6: 29-37, 2002.
40. Lu J., Wang Y., Xuan J., Kung S. Y., Gu Z., Clarke R. Discriminative mining of gene microarray data. Proc. IEEE Neural Net Signal Processing, 11: 218-227, 2001.
41. Heid H. W., Winter S., Bruder G., Keenan T. W., Jarasch E. D. Butyrophilin, an apical plasma membrane-associated glycoprotein characteristic of lactating mammary glands of diverse species. Biochim. Biophys. Acta, 728: 228-238, 1983.
42. Sakurai H., Miyoshi H., Toriumi W., Sugita T. Functional interactions of transforming growth factor β-activated kinase 1 with IκB kinases to stimulate NF-κB activation. J. Biol. Chem., 274: 10641-10648, 1999.
43. Shibuya H., Yamaguchi K., Shirakabe K., Tonegawa A., Gotoh Y., Ueno N., Irie K., Nishida E., Matsumoto K. TAB1: an activator of the TAK1 MAPKKK in TGF-β signal transduction. Science (Wash. DC), 272: 1179-1182, 1996.
44. Bukholm I. K., Nesland J. M., Borresen-Dale A. L. Reexpression of E-cadherin, α-catenin and β-catenin, but not of γ-catenin, in metastatic tissue from breast cancer patients. J. Pathol., 190: 15-19, 2000.
45. Berger T., Brigl M., Herrmann J. M., Vielhauer V., Luckow B., Schlondorff D., Kretzler M. The apoptosis mediator mDAP-3 is a novel member of a conserved family of mitochondrial proteins. J. Cell Sci., 113: 3603-3612, 2000.
46. Levy-Strumpf N., Kimchi A. Death-associated proteins (DAPs): from gene identification to the analysis of their apoptotic and tumor suppressive functions. Oncogene, 17: 3331-3340, 1998.
47. Kunapuli S. P., Singh H., Singh P., Kumar A. Ceruloplasmin gene expression in human cancer cells. Life Sci., 40: 2225-2228, 1987.
48. Trombino A. F., Near R. I., Matulka R. A., Yang S., Hafer L. J., Toselli P. A., Kim D. W., Rogers A. E., Sonenshein G. E., Sherr D. H. Expression of the aryl hydrocarbon receptor/transcription factor (AhR) and AhR-regulated CYP1 gene transcripts in a rat model of mammary tumorigenesis. Breast Cancer Res. Treat., 63: 117-131, 2000.
49. Nguyen T. A., Hoivik D., Lee J. E., Safe S. Interactions of nuclear receptor coactivator/corepressor proteins with the aryl hydrocarbon receptor complex. Arch. Biochem. Biophys., 367: 250-257, 1999.
50. Yaegashi S., Sachse R., Ohuchi N., Mori S., Sekiya T. Low incidence of a nucleotide sequence alteration of the neurofibromatosis 2 gene in human breast cancers. Jpn. J. Cancer Res., 86: 929-933, 1995.
51. Kanai Y., Tsuda H., Oda T., Sakamoto M., Hirohashi S. Analysis of the neurofibromatosis 2 gene in human breast and hepatocellular carcinomas. Jpn. J. Clin. Oncol., 25: 1-4, 1995.
52. Preiherr J., Hildebrandt T., Klostermann S., Eberhardt S., Kaul S., Weidle U. H. Transcriptional profiling of human mammary carcinoma cell lines reveals PKW, a new tumor-specific gene. Anticancer Res., 20: 2255-2264, 2000.
53. Ugolini F., Adelaide J., Charafe-Jauffret E., Nguyen C., Jacquemier J., Jordan B., Birnbaum D., Pebusque M. J. Differential expression assay of chromosome arm 8p genes identifies Frizzled-related (FRP1/FRZB) and fibroblast growth factor receptor 1 (FGFR1) as candidate breast cancer genes. Oncogene, 18: 1903-1910, 1999.