Abstract
Relationships between genetic alterations, such as co-occurrence or mutual exclusivity, are often observed in cancer, where their understanding may provide new insights into etiology and clinical management. In this study, we combined statistical analyses and computational modeling to explain patterns of genetic alterations seen in 178 patients with bladder tumors (either muscle-invasive or non–muscle-invasive). A statistical analysis on frequently altered genes identified pair associations, including co-occurrence or mutual exclusivity. Focusing on genetic alterations of protein-coding genes involved in growth factor receptor signaling, cell cycle, and apoptosis entry, we complemented this analysis with a literature search to focus on nine pairs of genetic alterations of our dataset, with subsequent verification in three other datasets available publicly. To understand the reasons and contexts of these patterns of associations while accounting for the dynamics of associated signaling pathways, we built a logical model. This model was validated first on published mutant mice data, then used to study patterns and to draw conclusions on counter-intuitive observations, allowing one to formulate predictions about conditions where combining genetic alterations benefits tumorigenesis. For example, while CDKN2A homozygous deletions occur in a context of FGFR3-activating mutations, our model suggests that additional PIK3CA mutation or p21CIP deletion would greatly favor invasiveness. Furthermore, the model sheds light on the temporal orders of gene alterations, for example, showing how mutual exclusivity of FGFR3 and TP53 mutations is interpretable if FGFR3 is mutated first. Overall, our work shows how to predict combinations of the major gene alterations leading to invasiveness through two main progression pathways in bladder cancer. Cancer Res; 75(19); 4042–52. ©2015 AACR.
Statistical analyses of bladder cancer genetic data reveal co-occurring or mutually exclusive genetic alterations for genes frequently altered in this cancer. The interactions between these genes are organized into an influence network based on literature analysis. We find that the sole network topology is not sufficient to explain some of the nine identified associations. To assess these associations while accounting for the dynamics of associated signaling pathways, we have developed a logical model. For the identified patterns, our model sheds light on aberrant activation of signaling pathways and provides predictions.
In an influence network, details of synthesis, degradation, phosphorylation, acetylation, or ubiquitination are abstracted into binary relations. Nodes (biochemical species or phenomena) are connected through directed, signed edges (denoting regulatory interactions). For instance, RB1 is known to sequester E2F1 by forming an inactive complex. This reaction is interpreted as RB1 inhibiting E2F1. Another example is the phenomenological node Proliferation, which is activated by both CyclinE1 and CyclinA. Throughout the article, when referring to the genes, their names will be italicized (e.g., CCND1), and the nodes of the network will be written in standard format (e.g., CyclinD1).
From the influence network, we define a discrete, dynamical model using the logical formalism. Our qualitative representation of genomic data (mutation, loss, or amplification of a gene locus) justifies a discrete modeling approach. Each node of the network is associated with a discrete variable representing its qualitative functional level. A Boolean variable is often sufficient to convey the role of the corresponding node: species are either active (ON) or inactive (OFF), thus able, or not, to act upon their targets. In some cases, more than two levels are needed to convey distinct functional roles. For instance, E2F1 mediates the transcription of cell-cycle genes, but when overexpressed, it activates genes of the apoptotic pathway. To distinguish between these two situations, E2F1 is associated with a multivalued variable (values 0, 1, or 2).
The variables describe the current node states, which evolve depending on logical rules. More precisely, the target level of each node is defined by a set of logical statements on the levels of the regulators of that node using logical connectors (denoted ! for NOT, & for AND, and ∣ for OR).
For example, the Boolean variable associated with Proliferation evolves as follows:
Statement (1) indicates that Proliferation is ON if one of the cyclins is present (otherwise Proliferation is OFF). The case of the multivalued variable associated with E2F3 (which stands for the isoform E2F3a) is more complex. Its logical rule includes one logical formula for each level:
Statement (2) specifies under which conditions the target level of E2F3 is 1: simultaneous absence of RB1 (inhibitor of E2F3), of CHEK1_2 (which stands for either CHEK1 or CHEK2) at its maximum value, and presence of RAS. Statement (3) specifies under which conditions E2F3 level is 2: E2F3a is induced by DNA damage in a CHEK1/2-dependent manner in the absence of RB1 and in the presence of RAS. For any other situation, the target value of E2F3 is 0.
Given a state of the model, that is, a vector of all node levels, some of the nodes may be called to update their levels as prescribed by their logical rules. Because we have no information about the velocities of these changes, we opt for the asynchronous update that defines as many successor states as the number of updated nodes. The resulting discrete dynamics is nondeterministic (a state may have several successors) and covers all potential behaviours of the network, compatible with the logical rules.
The attractors of the logical model refer to long-term behaviors (sets of states in which the dynamics is trapped). These attractors are stable states (all nodes are stable) or complex attractors (some nodes display oscillations).
In this modeling framework, perturbations of gene activity are defined as follows: variable associated with the perturbed node is constrained, overriding its logical rule. A gain of function, amplification, or overexpression is specified by maintaining a variable at 1 (or at its maximal level in the case of a multivalued variable), and is denoted by the broad term “overexpression,” whereas a loss of function or deletion is defined by maintaining a variable at 0 (denoted KO).
To interpret the model results, we associate observed biologic phenotypes with attractors of the discrete model. These attractors correspond to phenotypes specified by the states of the output nodes: Proliferation, Apoptosis (E2F1 or TP53-dependent), and Growth_arrest. From a given set of initial conditions, the possible fates of a cell are assessed in terms of the phenotypes that can be reached from these conditions and the probabilities associated to those phenotypes. Associating a phenotype with a stable state is straightforward, whereas a cyclic attractor may include oscillations between several phenotypes when the cell decision cannot be made. The probability of a phenotype is estimated as the proportion of trajectories leading to any attractor matching this phenotype. Input components are maintained constant (corresponding to external signals). While some input values lead to multistability (several possible phenotypes), some others drive the cell, deterministically, to a unique phenotype; for example, in most cases, the sole Apoptosis is possible when DNA_damage is ON.
Introduction
Accumulated data show specific patterns of genetic and epigenetic changes associated with each cancer type. These patterns include particular sets of altered genes, types (mutations, amplifications, losses) and relationships (mutual exclusivity or co-occurrence) between alterations. The underlying molecular network should, at least partly, explain such observations. So far, these patterns have been explained in terms of linear pathways: co-occurring mutations tend to target genes in parallel signaling pathways, whereas mutual exclusive alterations may implicate genes involved either in a common pathway, or in different progression pathways, i.e., in different tumor types (1–5). Mutual exclusivity could also involve genes that are synthetically lethal (6). These explanations are only hand waving arguments, though. Indeed, the static network structure-based analysis of these patterns has its limitations. Pathways involved in tumorigenesis are complex and interconnected, and it is therefore difficult to define the borders of a signaling pathway, and the notion of parallel or common pathways. Mathematical modeling may help to go further in these interpretations.
Here, we show how dynamical modeling can relate complex networks with observable biologic data (genomic alterations, mutations). Computational modeling allows to integrate and validate current knowledge about molecular mechanisms underlying cellular decisions. It supports mechanistic understanding and helps to formulate predictions when the network complexity defies intuition. This is particularly true when dealing with diseases such as cancer, which involve the deregulation of multiple and intricate pathways (7). Numbers of models have already proved useful in elucidating questions related to cancer biology (8–13).
Bladder cancer is frequent in Europe and North America, where it represents the fourth most common cancer in men and the ninth in women in terms of incidence (14). The high recurrence rate makes bladder cancer one of the most costly cancers to treat. Many observations of the patterns of genetic changes identified in bladder cancer studies remain unexplained or only partially explained in terms of underlying molecular mechanisms (15, 16). Bladder tumors progress along two main pathways: Ta and CIS (carcinoma in situ) pathways (17, 18). About 50% of diagnosed bladder carcinomas are Ta tumors, generally of low grade; 20% are T1 tumors and 30% are muscle-invasive tumors (T2–4). CIS consists of flat, high-grade lesions, rarely found in absence of other bladder tumors. Ta tumors often recur and progress rarely (5 to 10% of cases) but unpredictably to T1 and then to muscle-invasive tumors (T2–4), whereas CIS often progress to T1 and then to muscle-invasive tumors in circa 50% of the cases. It is believed that about 80% of muscle-invasive tumors develop through the CIS pathway. In bladder cancer, as in many cancer types, an important fraction of genetic alterations concerns genes coding for growth signaling factors and for G1–S regulators. Activating mutations of the fibroblast growth factor receptor 3 (FGFR3) gene are associated with the Ta pathway with a high frequency, but are rarely found in the CIS pathway (17). Besides FGFR3, mutated in about 45% of tumors (19), common genes recurrently genetically altered include other oncogenes such as the small GTPases HRAS (9% of cases) and KRAS (4% of cases), PIK3CA (subunit of the PI3K, 18% of cases), as well as tumor suppressors such as CDKN2A (16% of cases) and RB1 (20% of cases; ref. 52). Mutations affecting oncogenes are recurrent point mutations, whereas CDKN2A is mostly affected by losses involving the whole gene (20). RB1 is targeted by both point mutations and deletions (21).
Our goal is to understand how genetic alterations (mutations, homozygous deletions, or amplifications) combine to promote cancer tumorigenesis. More precisely, we aim to explore patterns (mutual exclusivity or co-occurrence), focusing on components often altered in bladder cancer and involved in growth factor signaling, cell-cycle entry, and triggering of apoptosis in response to DNA damage with the focus on the E2F pathway. It has been shown recently that the E2F pathway is not only involved in the control of proliferation but also in invasion and metastasis (22–24), justifying the study of this pathway to explore invasiveness in bladder tumors.
This study combines literature search, statistical analysis of relevant datasets and logical modeling of the related signaling network. Using an initial dataset of 178 tumors (CIT series) and three public datasets, statistical tests on pairs of alterations identify a list of co-occurring or mutually exclusive alterations. With our computational model, we analyze each association to identify the deregulated pathways and their contribution to tumorigenesis. In some cases (e.g., co-occurrence of FGFR3 and PIK3CA mutations), we found that the sole network topology cannot explain the alteration patterns identified by statistical analysis. It appears necessary to build a dynamical model to accurately and formally identify mechanisms activated in these patterns (25). When the model cannot straightforwardly account for these patterns, we search for contexts (other activating or inactivating mutations, amplifications or losses) that could explain the statistical results. Mathematical modeling provides insights into properties of involved cellular pathways. It is thus useful to understand mechanisms at play, as a complement to statistical methods, which uncover patterns of alterations. Here, our main goal is to highlight mechanisms affected in bladder tumors to propose successive events that may lead to high-stage tumors in both Ta and CIS progression pathways.
Materials and Methods
Data production and analysis
Bladder samples (CIT series).
One hundred seventy-eight bladder carcinomas, including 90 non–muscle-invasive tumors (50 pTa, 40 pT1) and 88 muscle-invasive tumors (32 pT2, 37 pT3, 19 pT4), were collected from patients treated surgically between 1988 and 2006 at Henri Mondor Hospital (Créteil, France), Institut Gustave Roussy (Villejuif, France) and Foch Hospital (Suresnes, France). All tumors were pathologically reviewed, staged according to the 2002 TNM classification (26, 27), and graded according to the 1973 WHO classification (28). All patients provided written informed consent, and ethics committees of all hospitals approved the study (Comité de Protection des Personnes de l'hôpital Henri Mondor, Comité de Protection des Personnes de Boulogne—Ambroise Paré, and Comité de Protection des Personnes de Bicêtre). All analyses were performed on the basis of anonymous patient data.
DNA extraction from tissues.
Immediately after surgery, tissue samples were frozen in liquid nitrogen and stored at −80°C until nucleic acid extraction. DNA were extracted from frozen human bladder tissues as described in (26).
Gene mutation analysis.
FGFR3 mutations were studied with the SNaPshot method (29). TP53 (exons 2–11), KRAS (exons 2–3), NRAS (exons 2–3), HRAS (exons 2–3) and PIK3CA (exons 2, 9, and 20) gene mutations were screened by direct sequencing with previously described primers and protocols (30, 31), available on request. All mutations were confirmed by sequencing both strands of a second, independent PCR product.
CGH array analysis.
DNA copy number was analysed for the 178 bladder tumors on the human genome-wide CIT-CGH array (V6) designed by the CIT-CGH Consortium. This array contains 4,434 sequence-verified bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) clones. Genomic alterations were determined using GLAD algorithm (32).
Multiplex ligation-dependent probe amplification analysis.
DNA copy number at the CDKN2A and RB1 loci was determined using a MLPA assay, as described in ref. 33. Bladder tumor DNA was analyzed with the P024B kit and P047 kits (MRC-Holland, Amsterdam, the Netherlands) for genomic analysis of CDKN2A/B and RB1, respectively. Two of the 14 control probes spanning chromosomal regions 9q21 and 11p12 were excluded from the RB1 copy number analysis because these regions are frequently altered in bladder tumors.
Testing mutual exclusivity and co-occurrence of genetic alterations.
Table 1 reports P values of the Fisher exact tests for all pairs of selected genetic alterations. For each alteration, we split the tumor samples into two groups with and without the alteration, and analyzed the corresponding contingency tables for pairs of alterations. The significance threshold was set to 5%. We performed the tests using R software.
Computational network modeling.
Results
Figure 1 depicts the workflow designed for this study. We first searched the literature on published co-occurrence and mutual exclusivity patterns, focusing on genetic alterations (mutations, homozygous losses, and amplifications) of genes known to be frequently altered in bladder cancer and reported in previous studies (37, 38). These genes code mainly for proteins involved in growth factor receptor signaling (EGFR, FGFR3, HRAS/KRAS/NRAS, PIK3CA), cell-cycle entry (RB1, RBL2, CDKN2A, CCND1, E2F3), and in triggering apoptosis in response to DNA damage (TP53, MDM2; Fig. 2B; Supplementary Material S1). Among the selected genes, we performed a statistical analysis on our dataset, referred to as CIT dataset, including 178 samples of bladder carcinomas, with non–muscle-invasive and muscle-invasive tumors (Fig. 2A; Supplementary Table S9). We finally verified the patterns deduced from both the literature search and the statistical analysis in three additional independent bladder tumor datasets [referred to as the Lindgren dataset (37), the Iyer dataset (38), and the TCGA dataset (39)].
A model including the selected genes was built using molecular facts extracted from scientific publications. It was validated against published phenotypes of mice mutants (Supplementary Table S3). Both the topological analysis of the network and the mathematical model were used to explain the patterns of alterations, formulate predictions such as expected effect of genetic contexts, or yet probe results from the data analysis. When possible, model predictions were verified in the datasets.
Data analysis of patterns of co-occurrence and mutual exclusivity
The identified associations concern: FGFR3, RAS, PIK3CA, CCND1, E2F3 (oncogenes) and RB1, TP53, CDKN2A (tumor suppressors). We organized these associations into four groups (Table 1): (1) mutual exclusivity or co-occurrence of FGFR3 mutations and genetic alterations of another oncogene; (2) co-occurrence or mutual exclusivity of FGFR3 mutations and alterations of tumor suppressors; (3) co-occurrence of TP53 mutations and E2F3 amplifications; and (4) co-occurrence of CDKN2A homozygous deletions and oncogenes besides FGFR3.
We found associations between genetic alterations in the literature (Table 1, column 2): exclusivity of FGFR3 and RAS-activating mutations (Table 1; row 1.1; ref. 16); co-occurrence of FGFR3 and PIK3CA-activating mutations (Table 1; row 1.4; refs. 15, 40); co-occurrence of FGFR3 mutations and CDKN2A homozygous deletions in muscle-invasive tumors (Table 1, row 2.1; ref. 33); exclusivity of FGFR3 and TP53 mutations, found when considering all tumors but disappearing when stratifying by stage (Table 1, row 2.2; ref. 19); and co-occurrence of E2F3 amplifications and RB1 deletions (or CDKN2A deletions as presented in ref. 41).
On CIT dataset, we performed the Fisher exact test of independence for each pair of alterations (Table 1, columns 3–5). Because some associations depend on the stage, we considered all tumors and separately: non–muscle-invasive (sup) and muscle-invasive tumours (inv). Note that stratifying tumors by both stage and grade would have resulted in subgroups too small to achieve statistically significant tests. We found associations not previously reported in the literature for bladder cancer: exclusivity of FGFR3 mutations and E2F3 amplifications (Table 1, row 1.2); exclusivity of FGFR3 mutations and CCND1 amplifications (Table 1, row 1.3); co-occurrence of TP53 mutations and E2F3 amplifications (Table 1, row 3); co-occurrence of CCND1 amplifications and CDKN2A homozygous deletions (Table 1, row 4.1, reported in other cancer types; refs. 42, 43); and co-occurrence of PIK3CA mutations and CDKN2A deletions (Table 1, row 4.2).
Proceeding with our workflow, we mined three publicly available datasets, searching for associations found in the literature and/or in our CIT dataset. We considered each dataset separately (Table 1, columns 6–10), as well as an ensemble gathering all muscle-invasive datasets (Lindgren, Iyer, and TCGA muscle-invasive tumors; Table 1, column 11). Non–muscle-invasive tumors, besides the CIT dataset, are only found in the Lindgren dataset. Among the published associations, all were confirmed in at least one of the four datasets, except for the last one: co-occurrence of E2F3 amplifications and RB1 deletions was not found significant in any dataset (not shown). We thus chose not to further study it. Eight out of these nine remaining patterns were verified in at least one of the three public datasets (Table 1, rows 1.1 and 4.1). Figure 3 recapitulates these associations.
Computational network modeling
From the influence network to the logical model.
On the basis of an extensive literature search and on our previous work (44), we built a generic simplified influence network around E2F-activating transcription factors in response to cell receptor activation (EGFR and FGFR3), growth inhibition (mainly representing TGFβ) and DNA damage, yet focusing on genes altered in bladder cancer. We considered the major players involved in both RB/E2F and TP53 pathways, controlled by the same transcription factor, E2F1, which included the genes identified in the data analysis.
The network of Fig. 4 summarizes information from the literature and from pathway databases, such as Reactome (45) or Atlas of Cancer Signaling Network (53). It includes 30 nodes and 84 interactions. The inputs, DNA_damage, EGFR_stimulus, FGFR3_stimulus and Growth_inhibitors, trigger different responses (also referred to as phenotypes): Apoptosis, Proliferation, and Growth_arrest. Readouts for these phenotypes are: presence of CyclinE1 (CCNE1) or CyclinA (CCNA2) for proliferation, of TP53 for E2F1-independent apoptosis, of E2F1 for E2F1-dependent apoptosis, and of p21CIP (CDKN1A), RBL2 or RB1 for growth arrest. These molecular readouts are considered as phenotype triggers, possibly followed by downstream events, not described here. For example, when either CyclinE1 or CyclinA is active, it indicates that the cell enters S-phase. Similarly, TP53 activation does not necessarily lead to apoptosis but rather triggers it (Supplementary Material S1).
Growth factors are separated into two stimuli, EGFR_stimulus and FGFR3_stimulus, the first activating EGF receptor (EGFR), and the second activating FGFR3. Pathways downstream of the two receptors differ based on SPRY activity. This interplay between EGFR and FGFR3 has been abstracted from Grieco and colleagues (12).
From the influence network, we defined the logical model, with 25 Boolean variables and five ternary variables. Each combination of input values thus defines a region of the state space with over 109 states. The logical rules governing model dynamics (see Quick Guide to Equations and Assumptions) are described in Supplementary Table S1. The model has 20 stable states and 5 cyclic attractors (Supplementary Table S2). These attractors correspond to phenotypes described by the values of Apoptosis, Proliferation, and Growth_arrest. In some cases, the input combination fully determines the resulting phenotype: for DNA_damage ON, there is a unique attractor where Apoptosis and Growth_arrest are ON and, conversely, this phenotype is only possible in the presence of DNA_damage. Some input values lead to two stable states (multistability): for DNA_damage OFF, FGFR3_stimulus and Growth_inhibitors ON, either Proliferation or Growth_arrest is possible. Some complex attractors show oscillations between two of the phenotypes: for EGFR_stimulus ON and the other inputs OFF, oscillations between Growth_arrest and Proliferation are observed.
To test the model coherence, we challenged it with published experiments on diverse cell types (mouse embryo and rat fibroblasts, murine retina, etc.) by targeting the corresponding network nodes, and checking that our model qualitatively reproduces mutant phenotypes (Supplementary Table S3).
Data interpretation using the model
In the light of the model properties, we verified the possible cooperating or exclusive mechanisms corresponding to associations. To support our findings, for each association, we considered the corresponding mutants and quantified the related phenotypes (Material and Methods, Table 1; Supplementary Fig. S1).
Mutual exclusivity between FGFR3 and RAS mutations is confirmed by our model. As shown in the network (Fig. 4), FGFR3 is upstream of RAS (in our model, FGFR3 directly activates RAS). Hence, for FGFR3-overexpressed mutant, RAS is active in all stable states and, consequently, additional mutations of RAS do not alter these phenotypes; there is no advantage for the tumor to mutate RAS when FGFR3 is already mutated. Note that single RAS-overexpressed mutant has a higher probability of Proliferation phenotype, which can also be reached through EGFR signaling (Table 2; Supplementary Fig. S1). In the case of single FGFR3 mutant, EGFR is always OFF (due to mutual inhibition of FGFR3 and EGFR through PKC; ref. 12), which implies that there are less trajectories leading to Proliferation. The model thus confirms the exclusivity of the two alterations: if FGFR3 is mutated, there is no advantage to further mutate RAS. For the same reasons, we predict that EGFR amplifications and RAS mutations are mutually exclusive. This could not be verified in the datasets due to the low percentage of both genetic alterations (46).
Mutual exclusivity between FGFR3 mutations and E2F3 amplifications, not reported in the literature but suggested in the CIT dataset, is also found in the Lindgren dataset and in pooled muscle-invasive samples. Because FGFR3 is upstream of E2F3 (Fig. 4), we expect that the genes activated by E2F3 are also activated by FGFR3. Indeed, the model shows that E2F3 is activated when FGFR3 activity is forced and simulations confirm that there is no advantage for the cancerous cell to amplify E2F3 if FGFR3 is already mutated (Table 2; Supplementary Fig. S1). Proliferation probability slightly decreases in the double mutant compared with single E2F3 overexpressed mutant for reasons similar to those evoked for the previous association.
Mutual exclusivity of FGFR3 mutations and CCND1 amplifications is suggested in CIT superficial tumors but has not been reported so far in the literature, and is not confirmed in other datasets. Our model shows an increase in proliferation when further amplifying CCND1 in an FGFR3-mutated tumor (Table 2; Supplementary Fig. S1), thus contradicting the exclusivity observed in our CIT data but in agreement with absence of this pattern in other datasets. CCND1 can be activated by FGFR3, but may have additional beneficial roles for the tumor, not explained by the model straightforwardly (dashed line in Fig. 3). Indeed, we find that in the single FGFR3 mutant (with DNA damage ON), the apoptotic phenotype is TP53-dependent, whereas it is also E2F1-dependent in the single CyclinD1 mutant. CyclinD1 presumably plays an additional role in triggering apoptosis by forcing E2F1 activation or inhibiting RBL2. The role of CCND1 gene in bladder would require further investigation.
Co-occurrence of FGFR3 and PIK3CA mutations is reported in the literature (15, 40) and found significant in the CIT and Lindgren datasets. As CyclinD1, PI3K is downstream of FGFR3 (Fig. 4). Consequently, when FGFR3 is active, so should be PI3K, which is not the case, because GRB2 is needed for PI3K activation. The sole interpretation of the network topology is thus not enough to explain the advantage to mutate both PI3K and FGFR3. To investigate this co-occurrence, we simulated the single mutants FGFR3-overexpressed or PI3K-overexpressed, and the double mutant FGFR3 PI3K overexpressed. From a qualitative point of view, no striking difference appears between the three mutants: the same phenotypes are reached, the only difference being the appearance of E2F1-dependent apoptosis in PI3K-overexpressed mutant (not shown) as previously observed with CyclinD1-overexpressed mutant. It seems advantageous in terms of Proliferation probabilities to mutate FGFR3 in PI3K-mutated tumours (“Null” phenotype corresponds to stable states with inputs all OFF, AKT pathway forced, and thus survival probably activated). From the single FGFR3-overexpressed to the double FGFR3 and PI3K-overexpressed mutant, only a slight increase in proliferation is observed (Table 2; Supplementary Fig. S1). We would expect PI3K-activating mutations to favor uncontrolled growth in an FGFR3-mutated context by promoting survival and blocking apoptosis. However, our model shows that it is not the case: to fully lead to uncontrolled proliferation, other checkpoints need to be deleted, for example, CDKN2A. The systematic analysis of multiple mutants predicts that indeed, a third deletion of CDKN2A (equivalent to p16INK4a KO in our simulations) abolishes all Growth_arrest stable states and thus, in absence of DNA damage, the sole reachable phenotype is Proliferation (Supplementary Tables S4–S7). We verified this observation in the data, but unfortunately, there are too few samples with the three alterations to perform significant statistical tests; however, among the 12 samples that carry the double mutations FGFR3 and PIK3CA, the two samples that are muscle-invasive have lost the two copies of CDKN2A.
Co-occurrence of FGFR3 mutations and CDKN2A homozygous deletions has been documented; Rebouissou and colleagues (33) reports that CDKN2A (hemizygous and homozygous) losses predict progression of FGFR3-mutated tumors, and CDKN2A homozygous deletion is associated with muscle-invasive tumors; Lindgren and colleagues consider CDKN2A deletions as a late event in tumorigenesis in FGFR3-mutated tumors (data from 2010; ref. 37). This co-occurrence is found in the CIT dataset and confirmed in the Iyer and TCGA datasets, suggesting that it is indeed prevalent in muscle-invasive tumors. In our model, in an FGFR3-mutated context, loss of CDKN2A shows a slight increase in proliferation (Table 2; Supplementary Fig. S1), hence no clear advantage for co-occurrence of these two alterations. As mentioned above, mutations of PIK3CA, but also deletions of CDKN1A (p21CIP) would drastically favour proliferation (Supplementary Tables S4 and S8). Recall that RB/E2F pathway is not only involved in proliferation but also in invasion and metastasis (22–24); thus, we anticipate that highly proliferative tumors are invasive (Supplementary Fig. S2 shows that expression of gene targets of the E2F1-3 transcription factors is correlated to tumor stages). p21CIP seems to be a good candidate for progression toward invasiveness. This is supported by the reported association of p21CIP with poor clinical outcome in urothelial bladder cancer (47). Unfortunately, there are two few samples to achieve statistical significance for p21CIP alterations in the CIT or TCGA datasets, even though among the five tumors that are both FGFR3-mutated and p21CIP-altered in TCGA (i.e., mutated or deleted), four are CDKN2A homozygously deleted.
The model also suggests that the double mutant EGFR overexpressed and CDKN2A deleted (equivalent to EGFR amplifications and CDKN2A homozygous loss) gives similar results as the triple mutant FGFR3-mutated, PI3K-mutated, and CDKN2A deletion: only Proliferation phenotypes are reached in absence of DNA damage. In the CIT dataset, EGFR-amplified tumors (which belong to the basal-like bladder tumor subtype; ref. 48) do not seem to lose CDKN2A more frequently than FGFR3-mutated tumors, and, more surprisingly, CDKN2A expression is increased when compared with the nonbasal tumors. This suggests two things: (i) CDKN2A may compensate transcriptionally for the loss of RB activation in these tumors; and (ii) in bladder, FGFR3 activates the cell cycle through CDK4/6 (CDKN2A-dependent), whereas EGFR activates the cell cycle in a CDKN2A-independent manner. In agreement with this idea, CCNB1 and CCNE2, whose actions are CDKN2A-independent, are both overexpressed in basal-like bladder tumors presenting an activation of EGFR target genes (48).
Mutual exclusivity of FGFR3 and TP53 mutations has been reported for all tumors rather than tumors separated by stage (19). In our model, when TP53 is mutated, Apoptosis is still reachable through E2F1, whereas Apoptosis is only TP53-dependent in FGFR3 single mutant (not shown). Model simulations indicate that TP53 mutations in an FGFR3-mutated context have little impact on proliferation (Table 2; Supplementary Fig. S1); however, mutating FGFR3 in a TP53-mutated context clearly increases Proliferation probability. The model shows that, when TP53 is already mutated, it is advantageous to mutate any gene from the cell-cycle machinery (here FGFR3, but amplification of any oncogene from our network would serve, as shown in the next association). From these results, we can only conclude that mutual exclusivity may concern FGFR3-mutated tumors.
We anticipate that EGFR amplifications and mutations of TP53 would be associated in muscle-invasive tumors. In the data, the amplifications for EGFR are rare (4 in CIT, 1 in Lindgren, 0 in Iyer, 10 in TCGA). But, recently, a subgroup of tumors was identified with overexpression of EGFR (either through transcriptional mechanisms or amplification) and indeed enriched with TP53 mutations (48).
Co-occurrence of TP53 mutations and E2F3 amplifications has not been reported but appears significant in CIT data. The model shows an increase in proliferation when comparing TP53 single mutant with the double mutant TP53-deleted E2F3-overexpressed, which is not the case when comparing E2F3 single mutant with the double mutant (Table 2; Supplementary Fig. S1). In other words, this co-occurrence is beneficial for the tumor cell when mutations of TP53 appear first. We looked more closely at the CIT dataset: There are 58 TP53-mutated and 11 E2F3-amplified samples in total. We found that among the 11 E2F3-amplified tumors, 9 are TP53-mutated, and 7 out of these 9 are muscle-invasive. In the Iyer dataset, 15 of 20 E2F3-amplified tumors are also TP53-mutated. As mentioned in the previous association, amplifying E2F3 might be one way to favor proliferation when tumors are already TP53-mutated.
Co-occurrence of CCND1 amplifications and CDKN2A homozygous deletions has been reported for head and neck squamous cell carcinomas (42, 43) and is found in CIT data. Simulations show that amplifying CCND1 (CyclinD1 overexpressed) alone has already an advantage over deleting CDKN2A (p16INK4a and p14ARF deletions in the model) alone in terms of proliferation (Table 2; Supplementary Fig. S1). It is known that CDKN2A inhibits CCND1 by forming a complex with the CDKs, CDK4 or CDK6. As a consequence, its loss facilitates the activation of CCND1 but does not necessarily promote proliferation. The double mutant shows an increase in proliferation when compared with both single mutants (Table 2; Supplementary Fig. S1). Our model thus confirms co-occurrence of these alterations. The role of CDKN2A in senescence is not considered here, and we anticipate that it may play an additional role that we cannot account for in this model.
Co-occurrence of PIK3CA mutations and CDKN2A homozygous deletions is similar to the previous case. The double mutation increases the probability to reach proliferation compared with single mutants. Indeed, the double mutant PI3K overexpressed and CDKN2A deleted (p16INK4a and p14ARF deletions) suppresses the Growth_arrest attractors (Table 2 and Supplementary Table S6; Supplementary Fig. S1).
Discussion
By performing literature search and data mining of four independent bladder cancer datasets, we identified nine patterns of co-occurrence and mutual exclusivity in genetic alterations affecting growth factor signaling pathways, cell cycle, and apoptosis. To explain the reasons and contexts for these patterns, we defined a mathematical model derived from an influence network encompassing the frequently altered genes (Fig. 4). We simulated the mutants corresponding to the patterns and provided different types of predictions. First, we concluded that, in some cases, co-occurrence needs to be accompanied by a third mutation to be associated with invasiveness, for example, PIK3CA mutations or p21CIP deletions in an FGFR3-mutated and CDKN2A-deleted context. Next, we found that the order of mutations might explain associations and concluded that some events occur late during tumorigenesis (e.g., co-occurrence of TP53 mutations and E2F3 amplifications).
In some cases, our model suggests that co-occurring genetic alterations prepare a context for more aggressive tumors (e.g., FGFR3 and PIK3CA mutations) or lead to more invasive tumors (e.g., E2F3 amplifications and TP53 mutations); and mutually exclusive alterations show redundant effect of the alterations on proliferation probabilities (e.g., FGFR3 and RAS mutations) or may be involved in different tumor types (FGFR3-mutated tumors associated to Ta pathway vs. TP53-mutated associated to CIS pathway). However, the model shows its limitations when it comes to distinguishing between the two hypotheses for mutual exclusivity. For instance, FGFR3 mutations and E2F3 amplifications are found mutually exclusive. In the model, they have the same downstream effect. Thus, we could conclude that one alteration only should be selected or the two alterations belong to different progression pathways as suggested by Lindgren (49). Because of Lindgren's results and the fact that E2F3 amplifications are co-occurring with TP53 mutations, we are tempted to associate E2F3 alterations to CIS progression pathway.
Looking at all nine associations studied with the model (Fig. 3), we can deduce (and confirm) that FGFR3 and PIK3CA mutations along with CDKN2A homozygous deletions are more associated with Ta progression pathway, whereas EGFR, E2F3 amplifications, and TP53 mutations are more associated with CIS progression pathway. p21CIP alterations (mutations or loss) seem to be associated with the Ta progression pathway in absence of PIK3CA mutations. The assignment of RAS mutations to one of the progression pathways was more difficult.
Because cancer is a disease involving multiple alterations and because our model only includes simplified representations of pathways, results from model analysis have to be interpreted with care. Moreover, the model refers to a single, idealized cell, thus ignoring crucial effects from its microenvironment. It would be tempting to suggest that phenotype probabilities predicted by the model for mutation patterns can have improved association to patient survival compared with mutational profile. Cox proportional hazard regression for the model probabilities of the phenotypes (Proliferation, Growth_arrest, Apoptosis) showed a significant association to patient survival. However, the same regression model estimated for the raw mutational profiles gives slightly more significant results (not shown). It is not yet realistic to expect that our mechanistic model would outcompete statistical approaches in predicting patient survival. This could be explained by the complexity of factors, which affects survival. Our model only focuses on the prediction of cancer cells to become invasive. Nevertheless, this does not compromise the cognitive value of the model to provide mechanistic insights into patterns of gene alterations observed in groups of patients, which is the main focus of the current study.
What the model shows with confidence is that reasoning in terms of pathways is not enough to explain the studied patterns. Linear signaling pathways, as often described in biology, are highly interconnected, and it becomes difficult to reason without a model to formally conclude on the roles of frequent mutations. While the model does not provide straightforward explanations for all associations, or cannot confirm some results of the CIT dataset statistical analysis (see CCND1 amplifications and FGFR3 mutations), it allows to explore the effects of single, double, or even triple mutations on the studied phenotypes. For these cases, it can provide qualitative trends (e.g., increase in Proliferation) and lead to the formulation of predictions. The model can propose contexts in which particular mutations lead to extreme cases. For instance, we can interpret proliferative phenotypes, if they are the only reachable stable states, as a very invasive situation in which the cell would have lost all protections against uncontrolled growth (this is the case for the triple mutant FGFR3 overexpressed and PI3K overexpressed and CDKN2A deleted). Our approach combining data analysis and mathematical modeling is able to shed some light on the mechanisms leading to tumorigenesis and allows an alternative interpretation of the genomic data.
Of course, some predictions remain to be checked in other public datasets, as soon as they are made available (e.g., the role of p21CIP in invasiveness). To draw our conclusions, we considered copy number and sequencing data, but many other events can happen downstream of gene activity. It would be appropriate to include other types of data, such as mRNA expression, DNA methylation, and protein level. We plan to include at least transcriptomic data in future analyses and anticipate that other genes will appear to play a role in the process of invasiveness.
Finally, we analyzed association patterns limiting ourselves to cell cycle and apoptosis entries. Some of our explanations may be incomplete because of the involvement of important genes in other cell fates such as senescence (e.g., TP53, CDKN2A, RBL2, etc.; ref. 50). Similarly, PTEN role will need to be further explored in pathways other than PI3K/AKT as reported in refs. 26, 51.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: E. Remy, S. Rebouissou, C. Chaouiya, A. Zinovyev, F. Radvanyi, L. Calzone
Development of methodology: E. Remy, C. Chaouiya, A. Zinovyev, L. Calzone
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Rebouissou, F. Radvanyi
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E. Remy, S. Rebouissou, C. Chaouiya, F. Radvanyi, L. Calzone
Writing, review, and/or revision of the manuscript: E. Remy, S. Rebouissou, C. Chaouiya, A. Zinovyev, F. Radvanyi, L. Calzone
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E. Remy, L. Calzone
Study supervision: E. Remy, C. Chaouiya, F. Radvanyi, L. Calzone
Acknowledgments
The authors thank Luca Grieco for help in modeling the interaction between EGFR and FGFR3, Gautier Stoll for MaBoSS simulations, Fanny Coffin and Pierre Gestraud for statistical consulting, Anne Biton and Elodie Chapeaublanc for data management/processing, Pedro T. Monteiro for Avatar simulations, and Denis Thieffry and Emmanuel Barillot for critical reading of the manuscript.
Grant Support
This work was supported by Agence Nationale de la Recherche (ANR-08-SYSC-003).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.