The study of tumor suppressor gene function has been aided by the creation of discrete gene alterations in the mouse. One such example can be seen in the study of tumor suppressor gene function in general and the retinoblastoma (Rb) tumor suppressor in particular. Because the phenotype of a cell is a direct reflection of the gene activity within that cell, a comprehensive analysis of changes in gene activity resulting from the loss of Rb function has the potential to greatly enhance our understanding of Rb biology. We have used DNA microarray analysis to identify gene expression profiles in wild-type and Rb-null mouse embryo fibroblasts, as well as cells lacking other Rb family members, as an approach to developing a more complete understanding of Rb function. In so doing, we have identified gene expression phenotypes that characterize the loss of Rb function, that distinguish a Rb-null cell from a wild-type cell as well as a p107/p130-null cell, and that identify gene regulatory pathways unique to these events. Importantly, the Rb gene expression patterns can identify murine tumors that result from Rb loss of function. We suggest that this is an approach to the eventual understanding of gene regulatory pathways that define a phenotypic state, including those events that lead to tumor development.

The phenotype of an organism is defined as the visible properties that are produced by the interaction of the genotype and the environment. Genetic changes can have either dramatic or subtle effects on phenotype. One such example can be seen in the study of cancer phenotypes resulting from the loss of function of various tumor suppressor genes. For instance, the loss of Rb suppressor gene activity, originally observed as the inheritance of an eye tumor (1), is seen as contributing to a broad range of tumors. Rb is now recognized to be a critical regulatory activity governing the G1 to S-phase transition of cell cycle progression. By binding to the E2F family of transcription factors, Rb functions as an active repressor of transcription of E2F-responsive genes that are necessary for the initiation of DNA synthesis (2, 3). Dissolution of the Rb/E2F complexes, thereby freeing active E2F protein, is regulated at the G1/S-phase transition by the sequential phosphorylation of the Rb protein carried out by the D-type CDKs and cyclin E/cdk2. The CDKs are down-regulated by the activity of the CKIs (4). Thus, Rb regulates the transcriptional activity of the E2Fs by the cyclical association and subsequent activities of the CDKs and CKIs.

The three related proteins (Rb, p130, p107) are structurally quite similar, especially in the pocket domain, which is required for binding the E2Fs. Each family member is capable of inhibiting cell cycle progression and repressing transcription when expressed ectopically, indicating functional overlap among the family members (5). However, there are also clear biochemical differences in the three proteins. Most striking is the specificity for E2F interaction. Rb interacts with all E2Fs, whereas p107 and p130 bind E2Fs 4 and 5.

To further explore the distinct functions of the Rb family of proteins, Hurford et al. (6) demonstrated the differential activation of E2F-responsive genes comparing Rb−/− and p107/p130−/− knockout MEFs. Several E2F-responsive genes were assayed by Northern analysis, and it was determined that the cyclin E and p107 genes were derepressed in the Rb−/− MEFs. In the p107/p130−/− MEFs, the expression of cyclin E and p107 genes was normal but B-myb, cdc2, E2F1, TS, RRM2, and cyclin A2 were derepressed. These results thus provided evidence of differential function for the individual members of the Rb family with respect to repression of E2F-responsive gene expression. Nevertheless, the extent to which such specific control might exist, and the nature of the specific regulatory events controlled by individual members of the Rb family, are not clear.

To provide a more global evaluation of the gene expression events associated with the loss of the Rb family members, we have made use of DNA microarray analysis combined with statistical methodologies developed for the analysis of human cancer samples. In this latter instance, gene expression profiles have been identified that characterize the behavior of certain tumors; as such, these gene expression profiles represent molecular phenotypes of the tumors. For instance, our previous work has demonstrated that gene expression patterns generated by DNA microarray analysis can accurately identify breast cancer samples with respect to the status of the estrogen receptor (7). We have applied this approach, which makes use of multiple observations of a given cellular state to identify those genes that most highly correlate with that state, to develop gene expression phenotypes of Rb- and Rb family member-null cells.

Cells.

MEFs were dissected from C57Bl/6 wild-type, Rb-null or p107/p130-null 13.5-day embryos as described (8). MEFs were grown in DMEM containing 15% heat-inactivated FBS. MEFs were passaged three or four times after dissection. Cells were rendered quiescent by splitting 1:5 into DMEM containing 15% FBS and incubating overnight. The following morning the medium was removed, and the MEFs were rinsed once with DMEM containing 0.25% FBS. The cells were then incubated for 48 h in DMEM containing 0.25% FBS to generate a quiescent population of cells. Cell synchrony was assessed by propidium iodide labeling of the DNA and analyzed by flow cytometry.

RNA Preparation.

Total RNA was prepared by scraping cells into Trizol reagent according to the manufacturer’s instructions (Life Technologies, Inc.). Total RNA from tumor samples was prepared using RNeasy Midi kit according to the manufacturer’s instructions (Qiagen).

Analysis of Tumors.

Rb +/− C57Bl/6 mice were followed for 8–14 months from birth. When the mice displayed severe wasting, they were sacrificed by CO2 asphyxiation and dissected. Enlarged cellular masses corresponding to the pituitary and/or thyroid tissues were harvested and flash-frozen in liquid nitrogen. Total RNA was isolated by Qiagen RNeasy preparation and applied to Mu11K Affymetrix DNA microarrays. Six tumor samples were taken: pituitary and thyroid tumors from Rb+/− no. 1, thyroid tumor from Rb+/− no. 2, pituitary tumor from Rb+/− no. 3, thyroid tumor from Rb+/− no. 7 and thyroid tumor from Rb+/− no. 9. Normal thyroid (2) and brain (1) were harvested from three wild-type C57Bl/6 mice that were littermates of the Rb+/− tumor-ridden animals. Additional control samples were normal (one sample) and tumor-ridden pituitary (2 samples) tissues that were dissected from wild-type or ptd-FGFR4 transgenic animals. These transgenic mice develop pituitary tumors in a Rb-independent manner (9).

DNA Microarray Analysis.

All of the experiments used Affymetrix Mu11K A/B GeneChips. The targets for Affymetrix DNA microarray analysis were prepared as described by the manufacturer. GeneChip microarrays were hybridized with the targets for 16 h at 45°C, were washed, and were stained using the GeneChip Fluidics station according to the manufacturer’s instructions. DNA chips were scanned with the GeneChip scanner, and the signals were processed by the GeneChip expression analysis algorithm (v.2; Affymetrix). An initial analysis of the data generated from the microarray images was focused on simple determinations of fold change within the comparisons of wild-type samples versus knockout samples. The fold change of average difference values for the two experimental conditions, Rb-null and p107/p130-null, was calculated. The mean fold difference of the 9 Rb-null replicate experiments or 10 p130/p107-null experiments was then determined, as was the SD. For the factor analysis of expression data, we used methods essentially as described previously for the analysis of breast cancer samples (7). Briefly, the analysis uses binary regression models combined with singular value decompositions (SVDs) and with stochastic regularization using Bayesian analysis. A probability model estimates a classification probability for each of the two possible states (wild-type versus mutant) for each sample. This probability is structured as a probit regression model, in which the expression levels of genes are scored by regression parameters in a regression vector b. An analysis estimates this regression vector and the resulting classification probabilities for both training and validation samples. The estimated regression vector itself is important not only in defining the predictive classification but also in scoring genes as to their contribution to the classification.
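The probit structure described above can be illustrated with a short sketch. It assumes that factor scores for a sample are already available and shows only how a regression vector b maps them to a classification probability through the standard normal distribution function; the Bayesian estimation of b is omitted. The function name, the intercept argument, and all numerical values are hypothetical and are not estimates from this study.

```python
import math


def probit_probability(factor_scores, b, intercept=0.0):
    """Return Phi(intercept + x . b), the probit classification probability.

    factor_scores: factor (metagene) scores for one sample (hypothetical input).
    b: regression vector with one weight per factor.
    """
    linear = intercept + sum(x * w for x, w in zip(factor_scores, b))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(linear / math.sqrt(2.0)))


# Hypothetical regression vector over two factors and two example samples.
b = [2.0, 0.1]
samples = {"wild-type-like": [-1.5, 0.3], "mutant-like": [1.2, -0.4]}
for name, scores in samples.items():
    print(name, round(probit_probability(scores, b), 3))
```

A probability near 1 would correspond to the mutant state and a probability near 0 to the wild-type state under this coding, which is itself an assumption of the sketch.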

Previous work revealed distinct alterations in gene expression resulting from the loss of Rb function as well as the Rb-related genes p107 and p130 (6). These previous studies have generally used assays, such as Northern blot analysis, to assess the level of expression of a given gene as a function of Rb status. For the most part, these studies have examined genes also known to be regulated by the E2F family of transcription factors, consistent with the fact that at least part of the function of Rb is to mediate repression of transcription via E2F. A more comprehensive examination of gene expression can be achieved using DNA microarray analysis, which affords the opportunity to measure the expression of thousands of genes simultaneously. This approach has been applied to the study of E2F gene regulation and identified a number of additional targets that include not only DNA replication and cell cycle regulatory genes but also genes encoding mitotic activities (10, 11, 12, 13, 14, 15). Nevertheless, even this approach is limited in scope when it simply examines the expression of individual genes; at best, this represents a high-throughput Northern analysis.

An alternative approach is to take advantage of the complexity of the microarray data and to identify patterns of gene expression that typify a cellular phenotype. Because the genes expressed in a cell ultimately define the cellular phenotype, an analysis of this complexity has the potential to describe phenotypes in greater detail and clarity than can be achieved through morphological or various biochemical characteristics. We have taken advantage of this complexity, by using statistical methodologies that have the power to find patterns that typify a particular state, to characterize the distinctions between Rb-null and p130/p107-null cells. These methodologies have been applied in our recent studies to identify the differences in breast cancer samples that predict characteristics of clinical importance including estrogen receptor status and lymph node metastasis (7).

Gene Expression Phenotypes of Cells Null for Rb Family Members.

Given the past observations that link Rb family function with the cessation of cell growth, our strategy has been to examine the consequences of the loss of Rb or p130/p107 function as cells are induced to enter quiescence. We prepared RNA from wild-type primary MEFs as well as Rb-null and p107/p130-null cells that in each case were brought to quiescence by serum starvation. Cells were harvested 48 h after serum starvation, and DNA content was determined by propidium iodide staining and flow cytometry to assess the percentage of cells in the quiescent state (data not shown). RNA was prepared and gene expression was assayed by hybridization to high-density DNA microarrays (Affymetrix GeneChips). The Mu11K Affymetrix GeneChip DNA arrays contain 11,000 mouse gene sequences and expressed sequence tags, allowing for measures of a substantial fraction of the mouse genome. RNA prepared from quiescent wild-type cells or from Rb-null or p130/p107-null cells was used to generate probes for hybridization, which were applied to Affymetrix GeneChip arrays by established protocols. Because our focus was to identify patterns of expression that defined the nature of the Rb-null or p130/p107-null cell, we repeated the isolation of cells and RNA and the hybridization to the Affymetrix arrays a total of 9 times for the Rb-null experiments and 10 times for the p130/p107-null experiments. The hybridized chips were scanned and analyzed as described previously (10) and in the “Materials and Methods.” Average difference values were determined and were used for further analysis.

To take full advantage of the complexity of the gene expression data sets, we have made use of methodologies for analysis that we have applied recently to the study of human breast cancer samples (7). In this case, we focus on the genes the expression of which most highly correlates with the cellular state (Rb-null or p130/p107-null), rather than focusing on simple fold change, to identify gene expression patterns that define the Rb-null or p130/p107-null cellular state. To accomplish this, we first used the average difference values for each gene on the array over the 9 or 10 experiments and then identified those genes the expression of which most highly correlated with the genotype of interest. This group of genes was then used for binary regression analysis to identify factors, also known as principal components, that represented underlying structure in the data. The goal was to identify the patterns of gene expression that most highly correlate with and define the cellular state of interest (Rb-null or p130/p107-null). A factor analysis that aims to identify structure in the data in relation to a biological property represents a group of genes that together exhibit a consistent pattern of expression in relation to an observable phenotype.
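The gene-screening step described above, in which genes are ranked by how strongly their expression correlates with genotype, can be sketched as follows. This is a minimal illustration under stated assumptions: Pearson correlation with a 0/1 genotype label is used as the ranking statistic, and the function names, gene names, and expression values are all hypothetical. The subsequent factor and binary regression analysis is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_genes(expression, labels, top_k):
    """Return the top_k genes ranked by absolute correlation with the genotype label.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of 0 (wild type) / 1 (null), one per sample.
    """
    scored = [(abs(pearson(values, labels)), gene) for gene, values in expression.items()]
    scored.sort(reverse=True)
    return [gene for _, gene in scored[:top_k]]


# Hypothetical toy data: three wild-type and three null samples.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.0, 5.5, 4.8],  # higher in null samples
    "gene_b": [3.0, 2.9, 3.1, 3.0, 3.2, 2.8],  # unrelated to genotype
    "gene_c": [8.0, 7.5, 8.2, 2.0, 2.5, 1.9],  # lower in null samples
}
print(screen_genes(expression, labels, top_k=2))
```

Ranking on the absolute value of the correlation keeps genes that are either increased or decreased in the null cells.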

An illustration of the analysis of the Rb-null versus wild type based on two such factors is shown in Fig. 1. Whereas factor 2 does little to discriminate between the wild-type cells and the Rb-null cells, it is clear that factor 1 cleanly separates the samples. Very simply, this analysis reveals that factor 1 represents a group of genes the expression patterns of which clearly identify the difference between a wild-type cell and a Rb-null cell. A similar analysis for the difference between wild-type cells and the p130/p107-null cells again identifies one factor that clearly identifies the difference between the two cellular states.

In addition to the pair-wise discrimination of Rb-null or p130/p107-null versus wild type, we have also examined the potential difference between Rb and p130/p107. Once again, factor analysis was performed to identify gene expression profiles that could provide discrimination of cells lacking Rb versus cells lacking p130/p107. A two-factor plot of this analysis is shown in Fig. 1A. As was the case for the ability of gene expression patterns to discriminate a wild-type cell from a Rb- or p130/p107-null cell, similar analysis provided clear distinction between the Rb-null cell and the p130/p107-null cell, suggesting that the absence of these Rb family members results in distinct changes in gene expression. We have also analyzed the gene expression data from all three cellular states in the same analysis. As shown in Fig. 1B, a three-factor plot using all of the samples clearly reveals distinct gene expression factors that discriminate Rb−/− from p130/p107−/− cells.

The factors that can discriminate between a quiescent wild-type cell and a Rb-null quiescent cell are based on a collection of genes that together constitute this factor. Moreover, an analysis will typically yield multiple factors that have the capacity to provide some discrimination of the phenotypes. By combining these individual factors using linear regression, we can create a multifactor score that defines the ability to discriminate. We term this combined factor score a supergene or metagene to indicate the fact that this is a collection of gene expression values that defines the phenotype of interest. Fig. 2 depicts the estimates of classification probabilities based on the Rb or p130/p107 metagenes for each of the wild-type and the knockout samples, together with associated 90% probability intervals illustrating the degree of uncertainty. Although this analysis does not provide a measure of how well the metagenes can predict the status of a given sample, it does provide a useful visual assessment of how clearly the samples are classified based on the metagenes analysis.

The genes that constitute the metagenes can be ordered by the magnitude of the estimated regression value to provide one initial set of genes reflecting how much they contribute to this discrimination. A list of the top 200 genes, along with estimated regression parameters, can be found in the Supplementary Material. Fig. 3 depicts expression levels of the genes with each row representing an individual gene, ordered from top to bottom as a function of estimated regression coefficients. In Fig. 3, high expression is indicated by a white color and low expression by black. Distinctions in the pattern of expression of this group of 200 genes between the wild-type and Rb-null cells are evident from this figure. To more clearly illustrate the difference in the patterns of expression, the expression values were also plotted after normalization (for each gene, the mean expression value of all of the samples was calculated, subtracted from the value of each sample, and then divided by the SD). This plot, shown at the bottom, very clearly demonstrates the distinction in the expression patterns distinguishing wild-type cells from Rb-null cells. Likewise, a similar analysis of the genes providing the discrimination between wild-type cells and p130/p107-null cells is also shown in Fig. 3. Again, the difference in the pattern, both in the raw values of expression (top panels) and in the standardized analysis of the expression patterns (bottom panels), is clearly evident.
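The per-gene normalization described above (subtract the mean across samples, then divide by the SD) can be written as a short sketch. The function name and the example values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, because the text does not state which form of the SD was used.

```python
import statistics


def standardize_gene(values):
    """Center one gene's expression values on their mean and scale by the SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; assumed, not specified in the text
    if sd == 0:
        # A gene with no variation across samples carries no pattern to display.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical expression values for one gene across three samples.
print(standardize_gene([2.0, 4.0, 6.0]))
```

After this transformation every gene has mean 0 and unit SD across samples, so genes with very different absolute expression levels can be compared on a common scale.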

Metagenes Can Predict a Rb-null or p130/p107-null Phenotype.

Although the analyses presented in Figs. 1 and 2 identify clear differences in the wild-type, Rb-null, and p130/p107-null cells, it remains possible that these differences reflect simple chance differences that are unrelated to biological function. A more stringent test of the significance of these gene expression patterns can be obtained from cross-validation studies that ask whether a given factor (group of genes) can, in fact, predict the state of an unknown sample. That is, do these patterns of gene expression truly represent a gene expression phenotype that defines the state of a Rb-null or p130/p107-null cell? Indeed, this is a critical test of the data analysis, given the fact that the number of gene expression observations is very large relative to the number of samples (thousands versus tens). As such, there is a significant probability that gene expression values distinguish the two states merely by chance. A standard statistical approach to addressing this question makes use of out-of-sample cross-validation analysis. In this instance, we remove one of the samples from the analysis, use the remaining samples to compute a factor score that potentially represents the Rb-null state, and then ask whether this gene expression pattern can truly predict the status of the removed sample that is treated as an unknown. Each of the “one-at-a-time” analyses leads to a different subset of 200 screened genes. These subsets are highly overlapping but also show up additional genes as we move between hold-out cases. This variation in screened subsets reflects sample variability and inherent heterogeneity in expression profiles of the samples.
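The hold-out procedure described above can be sketched as follows. The key point illustrated is that gene screening is repeated inside every fold, without the held-out sample, so that the prediction for that sample is genuinely out-of-sample. For brevity, a nearest-centroid rule stands in for the Bayesian probit factor model used in this study; that substitution, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def top_genes(rows, labels, k):
    """Indices of the k genes most correlated (in absolute value) with the labels."""
    scored = []
    for g in range(len(rows[0])):
        column = [row[g] for row in rows]
        scored.append((abs(pearson(column, labels)), g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]


def centroid(rows, genes):
    """Mean expression of the selected genes over a group of samples."""
    return [sum(row[g] for row in rows) / len(rows) for g in genes]


def leave_one_out(rows, labels, k):
    """Predict each sample from a model built on all of the other samples."""
    predictions = []
    for held in range(len(rows)):
        train_rows = [r for i, r in enumerate(rows) if i != held]
        train_labels = [l for i, l in enumerate(labels) if i != held]
        genes = top_genes(train_rows, train_labels, k)  # re-screened in every fold
        c0 = centroid([r for r, l in zip(train_rows, train_labels) if l == 0], genes)
        c1 = centroid([r for r, l in zip(train_rows, train_labels) if l == 1], genes)
        x = [rows[held][g] for g in genes]
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        predictions.append(1 if d1 < d0 else 0)
    return predictions


# Hypothetical toy data: rows are samples, columns are genes.
rows = [
    [1.0, 3.0, 8.0], [1.2, 2.9, 7.5], [0.9, 3.1, 8.2],  # wild type
    [5.0, 3.0, 2.0], [5.5, 3.2, 2.5], [4.8, 2.8, 1.9],  # null
]
labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out(rows, labels, k=2))
```

Screening the genes once on all samples and only then holding samples out would leak information from the held-out sample into the gene list, which is why the screening call sits inside the loop.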

Fig. 4 illustrates the cross-validation predictions resulting from this formal and honest predictive analysis. Uncertainty intervals tend to be fairly wide for some of the samples, reflecting a certain ambiguity in the expression profiles of these cases relative to the 200 genes found to be most discriminatory among the other cases. Nevertheless, in each instance (Rb or p130/p107 versus wild type, or Rb versus p130/p107) the prediction of the cellular state is quite reliable, with all of the samples being accurately predicted.

Rb Metagenes Can Predict Rb-null Tumors.

The analyses shown in Fig. 4 demonstrate the ability of the Rb metagenes to accurately predict the Rb-null phenotype in MEFs. This raises the possibility that such an analysis could be extended to the study of tumors so as to predict the status of a pathway such as the Rb pathway based on a metagene analysis. That is, rather than determining whether the Rb gene is mutant, we determine whether there is a Rb-null gene expression signature. Humans that inherit one mutant Rb allele develop retinoblastoma. In contrast, mice that are heterozygous for a Rb-null allele develop pituitary and thyroid tumors, generally at 1 year of age, resulting from an additional mutational event of the wild-type allele of Rb (16). Thus, we have made use of a series of pituitary and thyroid tumors that developed in Rb-heterozygous mice.

RNA was prepared from the tumors, was used to generate a probe, and then was hybridized to Affymetrix Mu11K GeneChips. For comparison, we also analyzed RNA from normal pituitary tissue as well as pituitary tumors that derived from transgenic mice expressing the novel, NH2-terminally truncated isoform of FGF receptor 4, ptd-FGFR4 (9). Each hybridization was treated as an unknown state and was used for prediction of the Rb-null state based on the Rb metagenes defined in the fibroblast experiments.

The results of predictions of the various tumor and normal samples are shown in Fig. 5A. Each sample was predicted in the Rb metagene models, and the resulting probabilistic predictions are graphed as the relative probabilities of the sample exhibiting a Rb pattern. It is evident from this analysis that each of the Rb-null tumors (in red) is quite accurately predicted, based on the Rb metagenes, and readily distinguished from both the normal tissue samples and the pituitary tumors deriving from an alternate pathway (FGF activation). As a further control, we also used the p107/p130 metagenes derived from the fibroblast experiments in predictions of these tumor and normal samples. As shown in Fig. 5B, there was no discrimination of the samples based on these predictions, consistent with the fact that the tumors derive from the loss of Rb function rather than the loss of the other Rb family members.

The understanding of specificity of function within the Rb family has derived from a combination of studies of the biochemical properties of the proteins together with genetic analysis of mice and cells deficient for the individual proteins. Such studies have demonstrated clear differences in the specificity of interaction of Rb proteins with members of the E2F family. In particular, Rb is seen to interact with the entire group of E2Fs (E2F1–5) whereas p130 and p107 bind exclusively to either E2F4 or E2F5 (2, 3). The consequence of these interactions of p130 or p107 with E2F4 or E2F5 or the interaction of Rb with E2F1–3 is likely the same, i.e., the inhibition of E2F-specific transcription of the target genes the promoters of which contain E2F-binding sites. Considerable previous work would support the conclusion that the interaction of p130, and possibly p107, with E2F4 or E2F5 generates repressor complexes that actively repress the target genes. The interaction of Rb with E2F1–3 could also do the same, but other data would support a role for Rb in simply blocking the activation potential of this group of E2F proteins.

Previous work of Hurford et al. (6) identified differences in the genes that are deregulated in the absence of Rb or the other two Rb family members. By expanding the analysis to a genome-scale assay of gene expression and by making use of statistical methods that can identify distinct patterns of expression, we now provide evidence for distinct roles for Rb versus the two other family members in controlling the expression of genes with distinct cellular functions.

Distinct Roles for Rb and p130/p107 in the Control of Gene Expression.

The use of DNA microarray analysis to identify patterns of gene expression that result when cells enter a quiescent state in the absence of either Rb or p130/p107 has illustrated clear and distinct differences in the nature of the genes controlled by these related proteins. In particular, it is evident from the analysis of the Rb-deficient cells that the genes characterizing the Rb-null state include a substantial number of genes encoding DNA replication and cell cycle regulatory proteins (see Supplemental Table 1). These are the genes previously suggested to represent targets for action of the activating group of E2Fs (E2F1–3), many of which have been identified from prior work by ectopic overexpression of the activating E2Fs and subsequent Northern analysis or microarray profiling of gene expression (10, 11, 12, 13, 14, 17).

In contrast, the genes identified as deregulated in the cells lacking p130 and p107 are largely devoid of the DNA replication and cell cycle regulatory genes (see Supplemental Table 2). Rather, the genes that define a quiescent cell lacking p130 and p107 seem to represent a variety of genes that encode proteins generally involved in the regulation of cell growth such as the PDGF receptor, PDGF, Gas1, Gas2, Stat1, TGF-β3, and the Ttk kinase. A second group of genes, seen to be deregulated by the absence of p130/p107, are those encoding proteins known to be involved in maintaining the extracellular matrix, i.e., decorin, procollagen, fibulin, dynactin, and catenin, and known to be involved in signaling activities that link to the extracellular matrix including PDGF receptor α, TGF-β3, and connective tissue growth factor (18). A variety of studies have documented the role of cellular interactions that involve the extracellular matrix in the regulation of cell growth and differentiation (19). Taken together, these results would suggest a common aspect of the Rb-regulated group as those involved in cell cycle control, whereas the p130/p107 regulated group would appear to contain genes involved in the control of cell growth as cells exit the cell cycle in response to differentiation signals or as cells reenter the cell cycle after a growth-stimulatory event.

What might be the significance of this difference in specificity exhibited by the loss of Rb versus the loss of the p130/p107 proteins? One interpretation of these results would suggest a role for Rb in cell cycle control versus a role for p130/p107 in maintaining the quiescent state. Certainly, this view is consistent with the observation that Rb primarily associates with those E2Fs believed to be responsible for activating cell cycle genes, whereas p130/p107 associate with the E2Fs that characterize transcription repression in the quiescent cell. Nevertheless, it is also true that other studies have identified interactions of an E2F4 complex involving particularly p130 with the majority of E2F target genes, including S-phase genes, in a quiescent cell. If so, how then are there distinctions in the genes deregulated by the absence of Rb versus the absence of p130/p107?

One clue may be seen in various studies that have suggested a role for Rb in controlling the activating potential of E2Fs as opposed to forming a repressor complex. Indeed, previous work has provided evidence for distinct roles for Rb in controlling the activating E2Fs versus forming a repression complex (20, 21). In addition, we also note past work that has suggested a role for Rb in the control of E2F activity early in the process of cell cycle exit as cells are induced to undergo terminal differentiation. In particular, Rb is seen to undergo rapid dephosphorylation soon after HL60 cells are stimulated to differentiate with retinoic acid, coincident with an association with the available E2F proteins in the cell (22). Only later does the E2F4-p130 complex begin to accumulate. It is, thus, possible that Rb plays a role in the initial inactivation of E2F function, either by blocking the activation function of the E2F1–3 proteins or through formation of a repressor complex, or both. Possibly then, the E2F4-p130 complex serves to actively repress those genes as cells enter the quiescent state. Such a model would be consistent with the finding that the genes deregulated in the absence of Rb, but not p130/p107, are the E2F targets that drive the cell cycle and that are believed to be activated by the E2F1–3 proteins at G1-S phase or at G2. These same genes might also be subject to repression by the E2F4-p130 complex but might not appear as deregulated in the p130/p107-null cells because of the action of Rb in blocking activation of the genes. If true, the genes that are deregulated by the absence of p130/p107 would not be regulated by Rb. Indeed, at least some of these genes are known to not be regulated during the cell cycle but rather only as a function of growth stimulation.

Despite several observations that fail to detect an association of an E2F-Rb complex on chromatin, Rb may play a significant role in blocking the activation of E2F1–3, as Rb clearly interacts with these E2F proteins as measured in coimmunoprecipitation assays (23). In contrast, the E2F4-p130 complex is readily detected by a chromatin immunoprecipitation assay on promoters in quiescent cells (23). Thus, it is possible that the major role for Rb in the cell, relative to the control of E2F function, may be in negating the activation potential of the E2F proteins, possibly triggered by events associated with cell cycle exit, with the establishment of active repression by the E2F4-p130 complex following only later.

Gene Expression Phenotypes.

Finally, we believe that the approach taken here to identify gene expression patterns that distinguish a wild-type cell from a Rb-null cell from a p130/p107-null cell is of general applicability to the study of cellular function. Many studies aimed at identifying the gene expression events that distinguish two states rely on simple measurements of fold change. Although this approach does define gene expression differences, it is very difficult to determine which of these changes is most relevant to the biological phenotype. There are examples of genes, such as cyclin E and p57, in which the fold change is clearly different in the Rb-null or p107/p130-null cells, respectively. But many of the genes identified in the two analyses exhibit some deregulation in both cases. For instance, MIS5, mitotic kinesin-like protein 1, and dUTPase are highly weighted in the Rb discrimination (rank of 2, 15, and 21, respectively, based on regression weights) and exhibit a fold change (Rb versus wild type) of 8.7, 6.1, and 5.2, respectively. These same three genes contribute very little to defining the p130/p107 phenotype, with ranks below 200 based on regression weights. Yet, these three genes are increased in expression by 2.8-, 4.6-, and 2.6-fold, respectively, in the p130/p107-null cells. Likewise, apoD and Stat1 rank 1 and 2 for defining the p130/p107-null state but are below 200 in the Rb classification. Yet, apoD is increased 7.5-fold in the p130/p107-null cells versus 3.2-fold in the Rb-null cells; Stat1 is increased 6.9-fold in the p130/p107-null cells and 5.5-fold in the Rb-null cells. Clearly, a simple comparison of the fold change would not allow a conclusion as to the relative importance of the changes for a Rb-null cell or a p130/p107-null cell.

Predicting Oncogenic Events.

In contrast, the analysis of gene expression patterns that have the ability to reliably predict the cellular phenotypic state, as has been done in the case of various cancers, is based on statistically robust quantitative measures that provide a basis for determining significance. In short, the analysis of multiple samples, coupled with the ability to find patterns of expression that highly correlate, lends itself to more quantitative statistical measures of the gene expression differences that truly characterize the biological phenotype. In this example, the determination of a gene expression phenotype that reflects the deregulation of Rb or Rb family members is in principle no different from identifying the estrogen receptor phenotype of a breast cancer. In fact, we believe the connection is even more direct because we have shown that the Rb metagene, which recognizes a Rb-null cellular phenotype based on the pattern of expression of a collection of genes, also has the capacity to recognize a Rb-null state in a tumor. As such, we believe the value in this work can be in the development of a more complete understanding of the events that contribute to tumor development. A variety of past studies have documented the role of multiple genetic alterations in the development and progression of cancers. An ability to determine that a given tumor exhibited deregulation of the Rb pathway, possibly coupled with deregulation of several other tumor suppressors and oncogenes, has the potential to go well beyond the simple description of genetic alterations by providing a global view of the consequences of these alterations. The ability to deconvolute the complexities of the patterns, particularly as they intersect and alter one another, will be a significant challenge. But, we propose that the generation of signatures of various tumor suppressor and oncogene gene expression profiles will facilitate the eventual understanding of the tumor phenotype.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1

E.P.B. was supported by the Howard Hughes Medical Institute. J.R.N. is an investigator of the Howard Hughes Medical Institute.

2

Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org).

5

The abbreviations used are: Rb, retinoblastoma; CDK, cyclin-dependent kinase; CKI, cyclin kinase inhibitor; MEF, mouse embryo fibroblast; FBS, fetal bovine serum; FGF, fibroblast growth factor; ptd-FGFR4, pituitary tumor-derived, N-terminally truncated isoform of FGF receptor 4; PDGF, platelet-derived growth factor; TGF, transforming growth factor.

Fig. 1.

Metagene analysis identifies gene expression differences in Rb-null and p130/p107-null fibroblasts. A, two-factor analysis discriminating wild type (WT) versus Rb−/−, wild type versus p130/p107−/−, or Rb−/− versus p130/p107−/− cells. Individual samples are depicted in a scatter plot on the two dominant factors underlying the 200 genes selected in pure discrimination of the training cases. Each sample is indicated by a simple index number and is color coded: red, Rb−/− or p130/p107−/− cases; blue, wild-type cases. In the bottom panel, which plots Rb−/− versus p130/p107−/−, blue indicates Rb−/− and red indicates p130/p107−/−. B, three-factor analysis discriminating wild-type from Rb-null from p130/p107-null cells. The three sets of 200 genes providing the pairwise comparisons in A were pooled, giving a total of 437 genes. Factor analysis based on these genes identified three major factors that discriminate the samples as plotted in the figure. ○, wild-type samples; red, Rb−/− samples; blue, p130/p107−/− samples.

Fig. 2.

Classification of samples based on Rb- or p130/p107-null state. Fitted classification probabilities for training cases from the factor regression analysis. Values on the horizontal axis are estimates of the overall factor score in the regression; corresponding values on the vertical axis are fitted classification probabilities, with 90% probability intervals marked as dashed lines to indicate uncertainty about these estimates. Top two panels: blue, wild-type (WT) samples; red, Rb−/− or p130/p107−/− samples. Bottom panel: blue, Rb−/− samples; red, p130/p107−/− samples.

Fig. 3.

Gene expression patterns that discriminate between wild-type (WT), Rb-null, and p130/p107-null fibroblasts. Expression levels are depicted by color coding: black, the lowest level of expression, followed by red, orange, yellow, and then white, the highest level of expression. Each column in the figure represents all of the 200 genes from an individual sample, which are grouped according to wild-type or the indicated Rb family deletion. Each row, an individual gene, ordered from top to bottom according to regression coefficients. Top two panels, raw expression measures; lower two panels, normalized expression values (the mean value for each gene across all of the samples is subtracted from each value and then divided by the SD).
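
The normalization described for the lower panels (subtract each gene's mean across all samples, then divide by its SD) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `standardize_genes`, the use of the sample SD (`ddof=1`), and the handling of zero-variance genes are all assumptions, because the legend does not specify them.

```python
import numpy as np

def standardize_genes(expr):
    """Standardize each gene (row) across samples (columns).

    expr: array of shape (n_genes, n_samples) holding raw expression values.
    Returns an array of the same shape in which each gene has mean 0 and SD 1.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    # Assumption: sample SD (ddof=1); the legend does not say which SD was used.
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    # Assumption: genes with zero variance are left at 0 instead of dividing by 0.
    safe_sd = np.where(sd == 0, 1.0, sd)
    return (expr - mean) / safe_sd

# Toy example: two genes measured in three samples.
raw = np.array([[10.0, 20.0, 30.0],
                [5.0, 5.0, 5.0]])
z = standardize_genes(raw)
```

Standardizing per gene puts strongly and weakly expressed genes on a common scale, so the color coding reflects relative change across samples and not absolute abundance.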

Fig. 4.

Cross-validation analysis defines Rb-null and p130/p107-null gene expression phenotypes. One-at-a-time cross-validation predictions of classification probabilities for training cases from the factor regression analysis. Values on the horizontal axis, estimates of the overall factor score in the regression. Corresponding values on the vertical axis, estimated classification probabilities with corresponding 90% probability intervals marked as dashed lines to indicate uncertainty about these estimated values. The analysis and predictions for each sample are based on the screened subset of 200 most discriminatory genes. Top two panels: blue, the wild-type (WT) samples; red, the Rb−/− or p130/p107−/− samples; bottom panel: blue, the Rb−/− samples; red, the p130/p107−/− samples.
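
The one-at-a-time cross-validation scheme can be illustrated with a small sketch. It is not the factor regression model used in this study: as a simplified stand-in, each held-out sample is scored on the first singular factor of the screened genes and assigned to the nearer class mean, so it yields class labels where the figure shows probabilities. Repeating the gene screening inside every fold is also an assumption, as are the function names `screen_genes` and `loo_cross_validate` and the synthetic data.

```python
import numpy as np

def screen_genes(expr, labels, n_genes):
    """Return indices of the n_genes rows that best separate the two classes."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    pooled_sd = np.sqrt((a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0)
    stat = np.abs(a.mean(axis=1) - b.mean(axis=1)) / (pooled_sd + 1e-12)
    return np.argsort(stat)[::-1][:n_genes]

def loo_cross_validate(expr, labels, n_genes=200):
    """Leave-one-out predictions; expr has shape (n_genes_total, n_samples)."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    n_samples = expr.shape[1]
    predicted = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        train = np.arange(n_samples) != i
        x_train, y_train = expr[:, train], labels[train]
        # Screen and standardize using the training samples only.
        genes = screen_genes(x_train, y_train, n_genes)
        mean = x_train[genes].mean(axis=1, keepdims=True)
        sd = x_train[genes].std(axis=1, ddof=1, keepdims=True) + 1e-12
        z_train = (x_train[genes] - mean) / sd
        # Dominant factor of the screened genes: first left singular vector.
        u, _, _ = np.linalg.svd(z_train, full_matrices=False)
        loading = u[:, 0]
        scores = loading @ z_train
        m0 = scores[y_train == 0].mean()
        m1 = scores[y_train == 1].mean()
        # Score the held-out sample and assign it to the nearer class mean.
        z_test = (expr[genes, i:i + 1] - mean) / sd
        test_score = float(loading @ z_test[:, 0])
        predicted[i] = int(abs(test_score - m1) < abs(test_score - m0))
    return predicted

# Synthetic data: 50 genes, 12 samples, 10 genes shifted in class 1.
rng = np.random.default_rng(0)
labels = np.array([0] * 6 + [1] * 6)
expr = rng.normal(size=(50, 12))
expr[:10, labels == 1] += 6.0
predicted = loo_cross_validate(expr, labels, n_genes=10)
```

Keeping the screening step inside the loop means the held-out sample never influences which genes are selected, which avoids overly optimistic cross-validation results.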

Fig. 5.

Prediction of Rb status in mouse tumors. A, predictions of tumor and normal samples based on Rb metagenes. RNA was prepared from a series of mouse pituitary and thyroid tumors that were derived from animals heterozygous for Rb knockout or from pituitary tumors that were derived from transgenics that express a truncated form of FGF receptor 4 (ptd-FGFR4). Hybridization was carried out using Mu11K GeneChips and was analyzed as described in “Materials and Methods.” The results of the hybridization were then used in one-at-a-time cross-validation predictions of classification probabilities based on the trained pattern derived from the analysis of Rb-null fibroblasts as described in Fig. 4. Red, the Rb-null tumors; blue, the ptd-FGFR4 tumors. B, the p107/p130 metagenes do not predict the Rb-null tumors. The analysis described in A was repeated using the p107/p130 metagene predictor. Blue, the Rb-null tumors.

We thank Kaye Culler for assistance with the preparation of the manuscript, and Nannie Turner for media preparation.

1. Knudson A. G. Mutation and cancer: statistical study of retinoblastoma. Proc. Natl. Acad. Sci. USA, 68: 820-823, 1971.
2. Dyson N. The regulation of E2F by pRB-family proteins. Genes Dev., 12: 2245-2262, 1998.
3. Nevins J. R. Toward an understanding of the functional complexity of the E2F and retinoblastoma families. Cell Growth Differ., 9: 585-593, 1998.
4. Sherr C. J. Cancer cell cycles. Science (Wash. DC), 274: 1672-1677, 1996.
5. Harbour J. W., Dean D. C. Rb function in cell-cycle regulation and apoptosis. Nat. Cell Biol., 2: E65-E67, 2000.
6. Hurford R. K., Cobrinik D., Lee M-H., Dyson N. pRB and p107/p130 are required for the regulated expression of different sets of E2F responsive genes. Genes Dev., 11: 1447-1463, 1997.
7. West M., Blanchette C., Dressman H., Huang E., Ishida S., Spang R., Zuzan H., Olson J. A., Jr., Marks J. R., Nevins J. R. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA, 98: 11462-11467, 2001.
8. Robertson E. J. Embryo-derived stem cell lines. In: E. J. Robertson (ed.), Teratocarcinoma and Embryonic Stem Cells: A Practical Approach, pp. 71-112. Oxford, UK: IRL Press, 1987.
9. Ezzat S., Zheng L., Zhu X. F., Wu G. E., Asa S. L. Targeted expression of a human pituitary tumor-derived isoform of FGF receptor-4 recapitulates pituitary tumorigenesis. J. Clin. Investig., 109: 69-78, 2002.
10. Ishida S., Huang E., Zuzan H., Spang R., Leone G., West M., Nevins J. R. Role for E2F in the control of both DNA replication and mitotic functions as revealed from DNA microarray analysis. Mol. Cell. Biol., 21: 4684-4699, 2001.
11. Moroni M. C., Hickman E. S., Denchi E. L., Caprara G., Colli E., Cecconi F., Muller H., Helin K. Apaf-1 is a transcriptional target for E2F and p53. Nat. Cell Biol., 3: 552-558, 2001.
12. Muller H., Bracken A. P., Vernell R., Moroni M. C., Christians F., Grassilli E., Prosperini E., Vigo E., Oliner J. D., Helin K. E2Fs regulate the expression of genes involved in differentiation, development, proliferation, and apoptosis. Genes Dev., 15: 267-285, 2001.
13. Polager S., Kalma Y., Berkovich E., Ginsberg D. E2Fs up-regulate expression of genes involved in DNA replication, DNA repair and mitosis. Oncogene, 21: 437-446, 2002.
14. Ma Y., Croxton R., Moorer R. L., Cress W. D. Identification of novel E2F1-regulated genes by microarray. Arch. Biochem. Biophys., 399: 212-224, 2002.
15. Ren B., Cam H., Takahashi Y., Volkert T., Terragni J., Young R. A., Dynlacht B. D. E2F integrates cell cycle progression with DNA repair, replication, and G2/M checkpoints. Genes Dev., 16: 245-256, 2002.
16. Jacks T., Fazeli A., Schmitt E. M., Bronson R. T., Goodell M. A., Weinberg R. A. Effects of an Rb mutation in the mouse. Nature (Lond.), 359: 295-300, 1992.
17. DeGregori J., Kowalik T., Nevins J. R. Cellular targets for activation by the E2F1 transcription factor include DNA synthesis and G1/S regulatory genes. Mol. Cell. Biol., 15: 4215-4224, 1995.
18. Sano H., Yokode M., Takakura N., Takemura G., Doi T., Kataoka H., Sudo T., Nishikawa S., Fujiwara H., Nishikawa S. I., Kita T. Study on PDGF receptor b pathway in glomerular formation in neonate mice. Ann. N. Y. Acad. Sci., 947: 303-305, 2001.
19. Boudreau N., Bissell M. J. Extracellular matrix signaling: integration of form and function in normal and malignant cells. Curr. Opin. Cell Biol., 10: 640-646, 1998.
20. Ross J. F., Naar A., Cam H., Gregory R., Dynlacht B. D. Active repression and E2F inhibition by pRB are biochemically distinguishable. Genes Dev., 15: 392-397, 2001.
21. Ross J. F., Liu X., Dynlacht B. D. Mechanism of transcriptional repression of E2F by the retinoblastoma tumor suppressor protein. Mol. Cell, 3: 195-205, 1999.
22. Ikeda M-A., Jakoi L., Nevins J. R. A unique role for the Rb protein in controlling E2F accumulation during cell growth and differentiation. Proc. Natl. Acad. Sci. USA, 93: 3215-3220, 1996.
23. Takahashi Y., Rayman J. B., Dynlacht B. D. Analysis of promoter binding by the E2F and Rb families in vivo: distinct E2F proteins mediate activation and repression. Genes Dev., 14: 804-816, 2000.
