Abstract
A novel genome-wide screen that combines patient outcome analysis with array comparative genomic hybridization and mRNA expression profiling was developed to identify genes with copy number alterations, aberrant mRNA expression, and relevance to survival in glioblastoma. The method led to the discovery of physical gene clusters within the cancer genome with boundaries defined by physical proximity, correlated mRNA expression patterns, and survival relatedness. These boundaries delineate a novel genomic interval called the functional common region (FCR). Many FCRs contained genes of high biological relevance to cancer and were used to pinpoint functionally significant DNA alterations that were too small or infrequent to be reliably identified using standard algorithms. One such FCR contained the EphA2 receptor tyrosine kinase. Validation experiments showed that EphA2 mRNA overexpression correlated inversely with patient survival in a panel of 21 glioblastomas, and ligand-mediated EphA2 receptor activation increased glioblastoma proliferation and tumor growth via a mitogen-activated protein kinase–dependent pathway. This novel genome-wide approach greatly expanded the list of target genes in glioblastoma and represents a powerful new strategy to identify the upstream determinants of tumor phenotype in a range of human cancers. (Cancer Res 2006; 66(22): 10815-23)
Introduction
Glioblastoma is one of the most malignant of all brain tumors, with a median patient survival of 12 to 14 months (1). A detailed analysis of DNA and mRNA alterations in glioblastoma may lead to an understanding of why these tumors develop and to the identification of novel therapeutic targets. Expression profiling combined with clinical survival data have been widely used to identify changes in mRNA expression in glioblastoma and other human cancers (2–4). However, hundreds of genes with altered expression may be identified in such analyses, and it is often difficult to determine from these analyses alone which of the changes in mRNA expression are upstream, initiating determinants of tumor phenotype and which ones represent secondary changes or even “bystander effects” with no essential role in determining the neoplastic behavior of tumor cells.
In an attempt to identify the upstream genetic determinants of tumor phenotype, several investigators have combined microarray-based comparative genomic hybridization (aCGH) studies of tumor DNA with mRNA expression profiling in human cancers (4–11). However, the likely presence of random DNA alterations with little functional significance within the cancer genome and the adoption of somewhat arbitrary criteria for determining which DNA abnormalities are “significant” often result in an imprecise prediction of functionally relevant DNA alterations.
We present here a novel genome-wide method for identifying functionally relevant genetic changes that are difficult to identify reliably using standard aCGH analytic methods alone. We have called this integrated method Functional Genomic Analysis of RNA and DNA (FGARD) and have used it to show the existence of physical clusters of genes that are statistically related to survival in patients with glioblastoma and that display recurrent and correlated DNA and mRNA alterations. We have called these clusters functional common regions (FCR) to emphasize their dual physical and functionally relevant nature. We validated the FGARD approach by characterizing in detail one of the identified gene products, the EphA2 receptor tyrosine kinase. We present evidence that EphA2 expression is related to survival in glioblastoma, and that ligand-mediated activation of the receptor promotes glioblastoma cell proliferation and tumor growth in a mitogen-activated protein kinase (MAPK)–dependent manner. This functional genomic screening method may be generally applicable to a wide range of human cancers and can be used to identify genes related to a variety of clinical variables on a genome-wide scale.
Materials and Methods
Tumor samples and cell lines. Frozen primary human tissue samples of WHO grade 2, 3, and 4 astrocytomas were obtained from the Brain Tumor Tissue Bank at Brigham and Women's Hospital under the auspices of an Institutional Review Board (IRB)–approved study protocol. Non-tumor brain samples from surgical epilepsy procedures were also obtained from the Brain Tumor Tissue Bank. In some cases, DNA was extracted from paraffin-embedded tumor tissues obtained from the Department of Pathology at Brigham and Women's Hospital. Histology of glioblastoma samples was confirmed by H&E before inclusion in this study. Glioblastoma cell lines were obtained from the American Tissue Type Culture Collection (Manassas, VA; U87, U343, U251, and T98) or as a gift (D566 from D. Bigner, Duke University).
CGH microarray analysis. CGH microarray analysis was done as described (9) using genomic DNA isolated from 21 human glioblastoma samples. aCGH profiles were generated at the Arthur and Rochelle Belfer Cancer Genomic Center at Dana-Faber Cancer Institute. Control DNA was obtained from peripheral lymphoid cells and used as a normalization standard. The DNA was isolated using a commercially available method (Qiagen, Valencia, CA), digested, and random prime labeled and hybridized as described (9) using a spotted human cDNA microarray consisting of ∼14,160 cDNA clones (Human Clone Set 1, Agilent Technologies, Palo Alto, CA). This array provides a median resolution of ∼100 kb (9).
Log ratios of probe intensities between glioblastoma and control samples were plotted in the order of their physical location along the chromosome. Amplifications and deletions for each chromosome were identified using the clustering along chromosomes (CLAC) segmentation algorithm (12) using smoothing windows of either 3 or 5, and the results of these analyses were combined for maximal coverage. False discovery rates for the CLAC analysis ranged from 0.01 to 0.05. aCGH analyses were also done using other algorithms, such as the circular binary segmentation method, a Hidden Markov Model–based method, and various smoothing methods (13, 14) for validation, but we chose CLAC because of its intuitive algorithm, calculation of the false discovery rate, and clear visual representation. A comparison of these methods using both real and simulated data has been carried out by us previously (15).
mRNA expression profiling. Total RNA was isolated from an independent set of 21 frozen glioblastoma samples and 5 non-tumor human brain samples. Affymetrix U133A expression data for two additional non-tumor brain samples (16) was obtained from public databases (Gene Expression Omnibus: GSM18929 and GSM18930). The mRNA was reverse-transcribed to generate cDNA, which was then biotinylated and hybridized to Affymetrix HG-U133A oligonucleotide probe arrays (Affymetrix, Santa Clara, CA). Differentially expressed genes between glioblastoma and non-tumor brain samples were identified using the t test. To account for multiple hypothesis testing, we calculated both Ps and Qs (ref. 17; Q is similar to P but measures significance in terms of the false discovery rate). Because additional filtering for significant gene clusters is done subsequently, a conservative threshold value (unadjusted P = 0.05, which corresponds to Q = 0.03) was applied in this step. Unsupervised hierarchical clustering of differentially expressed probe sets was done using average linkage and 1 − r as the distance measure, where r is the Pearson correlation coefficient.
Survival analysis. Patient survival data was obtained from hospital records under an approved IRB human subjects protocol. Survival was calculated from the time of initial diagnosis of glioblastoma to the time of death and was measured in months. Using a Cox regression model, mRNA expression in a panel of 21 glioblastomas was analyzed to determine the relationship to patient survival. A list of genes significantly related to survival was generated using the significance level of P < 0.05 (corresponding to a false discovery rate of Q < 0.29). As in the differential expression analysis, this threshold was not stringent after correcting for multiple testing, but it was more important to minimize false negatives at this stage. Because the subsequent step involving cluster identification is robust, false positives do not have a large effect on the overall analysis. The log-rank test was used to confirm that a subset of genes that were chosen for more detailed study was predictive of patient survival.
Cluster detection algorithm. The idea behind the method is to scan along the genome and find clusters of genes correlated with survival. The Ps from the Cox regression are first discretized into 1s and 0s, with 1s corresponding to genes with P ≤ 0.05. The problem then is to find regions with a high density of 1s compared with 0s. Because the size of a cluster is not known in advance, every possible size (up to some limit) is considered at every probe location, and a score is calculated based on the binomial distribution. These scores allow comparison among clusters of varying sizes, and the sequences of 1s (usually with 0s interspersed) that are least likely to have arisen by chance are selected and ordered by their probabilities. Mathematically, those clusters whose scores are below a given threshold are identified using the following formula: where {ai} is the sequence of 1s and 0s along the chromosome, and L indicates the maximum size of the cluster (L = 15 in our computations). Further processing is done to sort out overlapping clusters. Other methods without the discretization of survival Ps may be possible, but it is not clear how to rescale the Ps effectively in those cases. The present method seems to work well and is relatively robust to the changes in the threshold value.
Additional details regarding this method are described in the Supplementary Methods.
Generation of stable clones expressing EphA2 short hairpin RNA. Two expression vectors containing short hairpin RNA (shRNA) sequences directed against EphA2 (Origene, Rockville, MD) were transfected into U343 glioma cells using LipofectAMINE 2000 (Invitrogen, San Diego, CA) according to the manufacturer's protocol. A control shRNA vector supplied by the manufacturer was used as a control. The cells were maintained in DMEM/F12 growth medium containing 10% serum and puromycin (0.5 μg/mL) for several weeks to select for stable clones. Two sets of clones, each expressing one of the two shRNAs directed against EphA2 and displaying a 60% reduction in EphA2 protein expression, were then used for further experiments.
Western blot analysis. Protein extracts from glioblastoma samples were prepared in SDS, separated by gel electrophoresis (10-12% Tris-HCl gel), and transferred to 0.2-μm nylon membranes at 100 V and 4°C for 90 minutes. Membranes were then blocked and incubated overnight in primary antibody. Primary antibodies used were anti-EphA2 (1:500; Santa Cruz Biotechnology, Santa Cruz, CA), anti-ephrinA1 (1:200; Santa Cruz Biotechnology), anti-TAX1BP1 (1:100; Santa Cruz Biotechnology), anti-cyclooxygenase-2 (1:50; Transduction Laboratories, Lexington, KY), and anti-SRI (1:1,000; Zymed, South San Francisco, CA). Membranes were then washed and incubated in biotinylated secondary antibodies for 1 hour. Specific immunoreactivity was visualized using enhanced chemiluminescence (Amersham, Piscataway, NJ).
Real-time PCR. Real-time PCR for EphA2 was done using total RNA from primary glioblastomas and from non-tumor brain samples, a Taqman PCR EphA2 probe kit, and an ABI 7300 Real Time PCR Thermocycler (Applied Biosystems, Foster City, CA) according to the manufacturer's protocol.
Reverse transcription-PCR for EphA2. Total RNA was isolated from primary glioblastomas and from non-tumor brain samples as a control using a commercially available kit (RNeasy, Qiagen). PCR primers were designed to amplify the full 2,930-bp sequence, with the exception of the NH2-terminal of 154 bp. Reliable PCR primers including this 154-bp region of the gene could not be obtained due to its high GC content. The PCR primers used were 5′-GGGACCTGATGCAGAACATC-3′ and 3′-AAGGTCGGCTTGGGAATATC-5′ and were derived from exon 1 and from the 3′-untranslated region of the human EphA2 gene.
Bromodeoxyuridine ELISA cell proliferation and 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide cell growth assays. Quantitation of cell proliferation was done using a bromodeoxyuridine (BrdUrd) ELISA assay (Roche Applied Science, Indianapolis, IN). Briefly, U343 glioblastoma cells were plated at a density of 1 × 104 per well in 96-well microtiter plates and maintained in growth medium containing 10% serum. The cells were then incubated in 1 μmol/L BrdUrd for 16 to 18 hours. Newly synthesized BrdUrd-DNA was assayed using colorimetric detection on an ELISA plate reader according to the manufacturer's protocol.
3-(4,5-Dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) cell growth assays (Roche Applied Science) were also done. Briefly, U343 glioblastoma cells were plated at a density of 1 × 104 per well in 96-well microtiter plates and maintained in growth medium containing 10% serum. MTT was added to the cultures and incubated at 37°C for 4 hours. MTT reaction product was then measured using an ELISA plate reader at 590 nm according to the manufacturer's protocol.
MAPK pathway studies. U343 glioblastoma cells were exposed to ephrinA1-Fc (2 μg/mL; Sigma-Aldrich, St. Louis, MO), and total protein was isolated for Western blot. Antibodies directed against total extracellular signal-regulated kinase (ERK), phosphorylated ERK1/2 (pT202/pY204), pan-c-Jun NH2-terminal kinase/stress-activated protein kinase 1 (pan-JNK/SAPK1), phosphorylated JNK (pT183/pY185), p38α, and phosphorylated p38 MAPK (pT180/pY182) were purchased from BD Biosciences (San Jose, CA). Immunoreactive bands were visualized using chemiluminescence. Assays were done in triplicate.
For MAPK inhibition studies, U343 cells were preincubated in the presence of p38 or MAP/ERK kinase (MEK) inhibitors for 1 hour before exposure to ephrinA1-Fc (2 μg/mL) for 1 hour. The inactive control compound (SB202474) and specific inhibitors for p38α/β MAPK (SB202190), p38 MAPK (SB203580), MAPK kinase/MEK (PD-98059), and MEK1/MEK2 (U0126) were purchased from Calbiochem (La Jolla, CA). All inhibitors were dissolved in DMSO and used at a concentration of 10 μmol/L. Cell proliferation was measured using the BrdUrd assay. MAPK inhibition assays were done in duplicate.
In vivo tumor assay. A mouse xenograft model was used to examine the effect of EphA2 on glioblastoma tumor growth. U343 glioblastoma cells (1 × 106) with stable expression of an shRNA vector directed against EphA2 (Origene) were implanted s.c. in the right flank of nude mice (n = 7) under the auspices of an approved animal subjects protocol. An equivalent number of U343 glioblastoma cells containing a control shRNA vector supplied by the manufacturer were implanted s.c. in the left flank. The mice were followed for a period of 7 weeks. Tumor volume for each group was measured weekly.
Results
Identification of FCRs. Commonly used aCGH segmentation algorithms detect regions of interest by identifying consecutive probes that show copy number alterations (12–14). The FGARD approach incorporates other types of data to select regions more likely to be functionally and clinically important. The method is based upon the observation that genes that are differentially expressed because of gains or losses at the DNA level often have neighboring genes that exhibit correlated changes in mRNA expression (18–21). We hypothesized that this phenomenon should lead to the formation of physical clusters of genes that exhibit statistically correlated functional relationships.
To identify these clusters, we first did expression analysis using 21 glioblastomas and 7 non-tumor brain samples. Unsupervised hierarchical clustering illustrated the marked distinction between the mRNA expression profiles of the glioblastoma and non-tumor brain samples (Supplementary Fig. S1). Several of the differences in gene expression were confirmed by Western blot analysis (Supplementary Fig. S2). Second, we used clinical annotations and Cox regression analysis to identify survival-related genes and ordered them according to their location within the genome. We then implemented a new algorithm based on the binomial distribution to look for clustering of survival-related genes within each region. Because the size and location of each putative cluster were unknown at the outset, the statistic was calculated for variable cluster sizes at every possible location, and the gene cluster that was least likely to have arisen by chance was selected. These clusters were examined for overlap and rank-ordered according to their Ps. Because this algorithm does not define clusters by the number of consecutive survival-related genes, it is robust to noise in the data. This approach is also relatively insensitive to the effect of variable gene density along the genome, as it identifies survival-related clusters independent of the number of genes within a specified genetic interval (Supplementary Fig. S3). On average, genes in close physical proximity within the genome display a slightly increased correlation in mRNA expression when compared with genes that are widely separated (19, 20). Based upon previous studies showing correlations between DNA copy number and regional expression changes, we sought to assess in our data set whether the genes within survival clusters had an even greater correlation among their mRNA expression patterns than similarly grouped genes that were not within these clusters. This phenomenon can be readily appreciated by examining known regions of DNA alteration containing genes of clinical relevance in glioblastoma. As shown in Fig. 1, amplicons containing the target genes EGFR or PDGFRA also contain adjacent genes that show highly correlated changes in gene expression within each tumor sample examined. In many cases, these “bystander” genes also show a statistical relationship to survival, thereby creating a cluster of survival-related genes that corresponds to the location of the DNA alteration.
To further investigate this phenomenon, we determined the probability density function of the correlations between mRNA expression patterns for probe sets located within the survival clusters (Fig. 2). For comparison, we calculated the distribution of correlations for mRNA probe sets in similarly sized genomic regions located outside of the survival clusters, as well as for randomly shuffled mRNA probe sets from throughout the genome. The density distribution plot of correlations for probe sets located within the survival clusters was asymmetrical and skewed to the right, indicating the presence of a significantly stronger correlation between mRNA expression patterns for genes located within the survival clusters than for genes located outside these clusters (P < 10−15, Kolmogorov-Smirnov test). This confirmed that our cluster-finding algorithm successfully identified clusters of survival-related genes with boundaries defined by common functional relationships, highly correlated mRNA expression patterns, and physical proximity. These boundaries, thus, delineate a novel genomic interval that we have called the FCR. Highly significant FCRs, as well as their associated survival-related target genes, are listed in Table 1. A complete list of FCRs may be found in Supplementary Table S1.
Chromosome . | Position (Mb) . | Binomial statistic . | FCR sequence . | Genes within FCR . |
---|---|---|---|---|
6p21.1 | 43.5-44.2 | 0.000124659 | 111110011 | EGFL9, TJP4, POLR1C, POLH, GTPBP2, MAD2L1BP, MRPS18A, VEGF, CAPN11 |
5q23.3-5q31.1 | 132.3-134 | 0.000124937 | 10101101110101 | AF5Q31, ZCCHC10, HSPA4, FSTL4, C5orf15, VDAC1, TCF7, SKP1A, PPP2CA, CDKL3, UBE2B, PHF15, SARA2 |
2p22.2-2p22.1 | 37.5-39 | 0.000159168 | 11111 | QPCT, CDC42EP3, CYP1B1, SFRS7 |
11p15.1 | 17.4-18 | 0.000159168 | 11111 | USH1C, MYOD1, KCNC1, DELGEF |
12q13.3 | 55.9-56.1 | 0.000164898 | 1111011 | NXPH4, SHMT2, LOC56901, KIAA1002, INHBC, INHBE |
5q31.3 | 139.5-140.1 | 0.000212289 | 101011001111 | PURA, PFDN1, DTR, MASK-BP3, ANKHD1, APBB3, PRO1580, CD14, NDUFA2, FLJ20195, DND1, HARS |
8p12-8p11.21 | 38-39.8 | 0.000263933 | 110011110010101 | STAR, LSM1, BAG4, DDHD2, WHSC1L1, FGFR1, TACC1, PLEKHA2, ADAM9, ADAM5, ADAM3A, ADAM18, ADAM2, INDO |
17q25.1 | 73.6-73.8 | 0.000561616 | 10110111 | ATP5H, KCTD2, SLC16A5, FLJ22160, HN1, SUMO2 |
1q32.1 | 200.9-201.3 | 0.000816591 | 101111 | ATP2B4, LAX, KIAA0663, SNRPE, SOX13, FLJ10761 |
13q21.32-13q21.33 | 65-68.5 | 0.000816591 | 110111 | KLHL1, SCA8 |
17p11.2 | 18.1-18.4 | 0.000816591 | 111011 | C17orf39, DRG2, MYO15A, LLGL1, FLII, TOP3A |
1q41-1q42.11 | 218.9-220.9 | 0.000915136 | 1111 | DUSP10, FLJ13840, FLJ12806, CAPN2 |
4p16.3 | 3-3.1 | 0.000915136 | 1111 | C4orf10, C4orf9, GRK4 |
4q28.2-4q28.3 | 129.3-139.1 | 0.000915136 | 1111 | FLJ21106, PGRMC2, PHF17 |
5q31.2 | 137.9-138.3 | 0.000915136 | 1111 | ETF1, HSPA9B, CTNNA1 |
5q31.3 | 140.1-140.1 | 0.000915136 | 1111 | NDUFA2, FLJ20195, DND1, HARS |
6p21.33 | 30.2-30.4 | 0.000915136 | 1111 | TRIM10, TRIM15, TRIM26 |
10q24.32 | 103.7-103.8 | 0.000915136 | 1111 | PITX3, GBF1, NFKB2 |
16p12.1-16p11.2 | 28.7-28.9 | 0.000915136 | 1111 | CLN3, EIF3S8, A2LP, TUFM |
18p11.22 | 9.1-9.5 | 0.000915136 | 1111 | NDUFV2, ANKRD12, TWSG1, RALBP1 |
20q13.12 | 46-47 | 0.000915136 | 1111 | SLC2A10, EYA2, PRKCBP1, NCOA3 |
17p13.1 | 7.5-7.8 | 0.000924093 | 11001100110011 | CHRNB1, POLR2A, TNFSF12, TNFSF13, SENP3, EIF4A1, CD68, MPDU1, SOX15, FXR2, SHBG, ATP1B2, TP53, FLJ10385 |
4q23 | 100.5-101.3 | 0.00143553 | 111001011 | ADH5, ADH6, ADH1A, ADH1B, ADH1C, ADH7, MTP, DAPP1, MAP2K1IP1 |
9p21.3 | 21.2-22 | 0.00143553 | 111001101 | IFNA4, IFNA21, IFNA10, IFNA8, IFNA5, KIAA1354, IFNA2, IFNA1, MTAP |
19p13.11 | 18.2-18.5 | 0.00143553 | 101011011 | PDE4C, JUND, LSM4, PGPEP1, GDF15, ISYNA1, ELL, FKBP8, MGC2749 |
1q41-1q42.11 | 217.3-221.4 | 0.001678561 | 110001001111001 | FLJ10326, RAB3-GAP150, MARK1, FLJ14146, FLJ20605, FLJ22390, HLX1, DUSP10, FLJ13840, FLJ12806, CAPN2, TP53BP2, KIAA0483, DEGS |
11p15.2-11p15.1 | 14.6-18 | 0.001678561 | 100101000011111 | PDE3B, CYP2R1, CALCA, CALCB, SMAP, PIK3C2A, NUCB2, ABCC8, USH1C, MYOD1, KCNC1, DELGEF |
12q13.13 | 52.1-52.7 | 0.001678561 | 101010011100011 | AMHR2, DKFZP564J157, PCBP2, MAP3K12, TARBP2, NPFF, ATF7, ATP5G2, KIAA1536, HOXC13, HOXC11, HOXC10, HOXC8, HOXC4, HOXC6 |
1p36.11 | 25.2-26.2 | 0.001679229 | 110010011011 | RHCE, RHD, ARH, MAN1C1, MGC2603, STMN1, PAFAH2, EXTL1, FLJ14050, ZNF593, CNKSR1, DKFZP434L0117 |
4q11-4q12 | 52.8-55.1 | 0.002445824 | 1010111 | SGCB, FLJ11850, USP46, RASL11B, FIP1L1, CHIC2, PDGFRA |
12q13.13 | 51.4-51.7 | 0.002445824 | 1110011 | HUMCYT2A, KRT3, KRT4, KRT18, EIF4B, PRO1843, TENC1 |
2q11.2 | 98.8-100.8 | 0.00305984 | 1010110101 | UNC50, MGAT4A, MGC42367, TSGA10, TXNDC9, EIF5B, REV1L, LAF4, CHST10, PDCL3 |
6p21.32 | 32.8-33.1 | 0.00305984 | 1101100101 | TAP2, PSMB8, TAP1, PSMB9, HLA-DMA, BRD2, HLA-DOA, HLA-DPA1 |
1q23.3 | 158.4-158.8 | 0.003939005 | 10111 | APOA2, NR1I3, MPZ, SDHC, FCGR2A |
3q27.2 | 186.4-187 | 0.003939005 | 10111 | MAP3K13, SENP2, IMP-2, SFRS10 |
3q29 | 197.4-198 | 0.003939005 | 11011 | KIAA0794, PAK2, SENP5 |
4p16.3 | 0.9-1.2 | 0.003939005 | 11101 | DGKQ, SLC26A1, IDUA, SPON2, CTBP1 |
6p22.1 | 27.3-27.5 | 0.003939005 | 10111 | PRSS16, POM121L2, ZNF204 |
6q21 | 112-114.2 | 0.003939005 | 11011 | FYN, WISP3, LAMA4, MARCKS |
7p15.2 | 26.9-27 | 0.003939005 | 11011 | HOXA5, HOXA6, HOXA7, HOXA9, HOXA10 |
14q11.2 | 22.9-23 | 0.003939005 | 10111 | KIAA1305, KIAA0323, C14orf124, CMA1, CTSG |
18q11.2 | 20.3-22.7 | 0.003939005 | 11101 | IMPACT, HRH4, SS18, TAF4B, AQP4 |
22q13.1 | 36.8-36.9 | 0.003939005 | 10111 | PLA2G6, MAFF, C22orf5, KIAA1660, CSNK1E |
Chromosome . | Position (Mb) . | Binomial statistic . | FCR sequence . | Genes within FCR . |
---|---|---|---|---|
6p21.1 | 43.5-44.2 | 0.000124659 | 111110011 | EGFL9, TJP4, POLR1C, POLH, GTPBP2, MAD2L1BP, MRPS18A, VEGF, CAPN11 |
5q23.3-5q31.1 | 132.3-134 | 0.000124937 | 10101101110101 | AF5Q31, ZCCHC10, HSPA4, FSTL4, C5orf15, VDAC1, TCF7, SKP1A, PPP2CA, CDKL3, UBE2B, PHF15, SARA2 |
2p22.2-2p22.1 | 37.5-39 | 0.000159168 | 11111 | QPCT, CDC42EP3, CYP1B1, SFRS7 |
11p15.1 | 17.4-18 | 0.000159168 | 11111 | USH1C, MYOD1, KCNC1, DELGEF |
12q13.3 | 55.9-56.1 | 0.000164898 | 1111011 | NXPH4, SHMT2, LOC56901, KIAA1002, INHBC, INHBE |
5q31.3 | 139.5-140.1 | 0.000212289 | 101011001111 | PURA, PFDN1, DTR, MASK-BP3, ANKHD1, APBB3, PRO1580, CD14, NDUFA2, FLJ20195, DND1, HARS |
8p12-8p11.21 | 38-39.8 | 0.000263933 | 110011110010101 | STAR, LSM1, BAG4, DDHD2, WHSC1L1, FGFR1, TACC1, PLEKHA2, ADAM9, ADAM5, ADAM3A, ADAM18, ADAM2, INDO |
17q25.1 | 73.6-73.8 | 0.000561616 | 10110111 | ATP5H, KCTD2, SLC16A5, FLJ22160, HN1, SUMO2 |
1q32.1 | 200.9-201.3 | 0.000816591 | 101111 | ATP2B4, LAX, KIAA0663, SNRPE, SOX13, FLJ10761 |
13q21.32-13q21.33 | 65-68.5 | 0.000816591 | 110111 | KLHL1, SCA8 |
17p11.2 | 18.1-18.4 | 0.000816591 | 111011 | C17orf39, DRG2, MYO15A, LLGL1, FLII, TOP3A |
1q41-1q42.11 | 218.9-220.9 | 0.000915136 | 1111 | DUSP10, FLJ13840, FLJ12806, CAPN2 |
4p16.3 | 3-3.1 | 0.000915136 | 1111 | C4orf10, C4orf9, GRK4 |
4q28.2-4q28.3 | 129.3-139.1 | 0.000915136 | 1111 | FLJ21106, PGRMC2, PHF17 |
5q31.2 | 137.9-138.3 | 0.000915136 | 1111 | ETF1, HSPA9B, CTNNA1 |
5q31.3 | 140.1-140.1 | 0.000915136 | 1111 | NDUFA2, FLJ20195, DND1, HARS |
6p21.33 | 30.2-30.4 | 0.000915136 | 1111 | TRIM10, TRIM15, TRIM26 |
10q24.32 | 103.7-103.8 | 0.000915136 | 1111 | PITX3, GBF1, NFKB2 |
16p12.1-16p11.2 | 28.7-28.9 | 0.000915136 | 1111 | CLN3, EIF3S8, A2LP, TUFM |
18p11.22 | 9.1-9.5 | 0.000915136 | 1111 | NDUFV2, ANKRD12, TWSG1, RALBP1 |
20q13.12 | 46-47 | 0.000915136 | 1111 | SLC2A10, EYA2, PRKCBP1, NCOA3 |
17p13.1 | 7.5-7.8 | 0.000924093 | 11001100110011 | CHRNB1, POLR2A, TNFSF12, TNFSF13, SENP3, EIF4A1, CD68, MPDU1, SOX15, FXR2, SHBG, ATP1B2, TP53, FLJ10385 |
4q23 | 100.5-101.3 | 0.00143553 | 111001011 | ADH5, ADH6, ADH1A, ADH1B, ADH1C, ADH7, MTP, DAPP1, MAP2K1IP1 |
9p21.3 | 21.2-22 | 0.00143553 | 111001101 | IFNA4, IFNA21, IFNA10, IFNA8, IFNA5, KIAA1354, IFNA2, IFNA1, MTAP |
19p13.11 | 18.2-18.5 | 0.00143553 | 101011011 | PDE4C, JUND, LSM4, PGPEP1, GDF15, ISYNA1, ELL, FKBP8, MGC2749 |
1q41-1q42.11 | 217.3-221.4 | 0.001678561 | 110001001111001 | FLJ10326, RAB3-GAP150, MARK1, FLJ14146, FLJ20605, FLJ22390, HLX1, DUSP10, FLJ13840, FLJ12806, CAPN2, TP53BP2, KIAA0483, DEGS |
11p15.2-11p15.1 | 14.6-18 | 0.001678561 | 100101000011111 | PDE3B, CYP2R1, CALCA, CALCB, SMAP, PIK3C2A, NUCB2, ABCC8, USH1C, MYOD1, KCNC1, DELGEF |
12q13.13 | 52.1-52.7 | 0.001678561 | 101010011100011 | AMHR2, DKFZP564J157, PCBP2, MAP3K12, TARBP2, NPFF, ATF7, ATP5G2, KIAA1536, HOXC13, HOXC11, HOXC10, HOXC8, HOXC4, HOXC6 |
1p36.11 | 25.2-26.2 | 0.001679229 | 110010011011 | RHCE, RHD, ARH, MAN1C1, MGC2603, STMN1, PAFAH2, EXTL1, FLJ14050, ZNF593, CNKSR1, DKFZP434L0117 |
4q11-4q12 | 52.8-55.1 | 0.002445824 | 1010111 | SGCB, FLJ11850, USP46, RASL11B, FIP1L1, CHIC2, PDGFRA |
12q13.13 | 51.4-51.7 | 0.002445824 | 1110011 | HUMCYT2A, KRT3, KRT4, KRT18, EIF4B, PRO1843, TENC1 |
2q11.2 | 98.8-100.8 | 0.00305984 | 1010110101 | UNC50, MGAT4A, MGC42367, TSGA10, TXNDC9, EIF5B, REV1L, LAF4, CHST10, PDCL3 |
6p21.32 | 32.8-33.1 | 0.00305984 | 1101100101 | TAP2, PSMB8, TAP1, PSMB9, HLA-DMA, BRD2, HLA-DOA, HLA-DPA1 |
1q23.3 | 158.4-158.8 | 0.003939005 | 10111 | APOA2, NR1I3, MPZ, SDHC, FCGR2A |
3q27.2 | 186.4-187 | 0.003939005 | 10111 | MAP3K13, SENP2, IMP-2, SFRS10 |
3q29 | 197.4-198 | 0.003939005 | 11011 | KIAA0794, PAK2, SENP5 |
4p16.3 | 0.9-1.2 | 0.003939005 | 11101 | DGKQ, SLC26A1, IDUA, SPON2, CTBP1 |
6p22.1 | 27.3-27.5 | 0.003939005 | 10111 | PRSS16, POM121L2, ZNF204 |
6q21 | 112-114.2 | 0.003939005 | 11011 | FYN, WISP3, LAMA4, MARCKS |
7p15.2 | 26.9-27 | 0.003939005 | 11011 | HOXA5, HOXA6, HOXA7, HOXA9, HOXA10 |
14q11.2 | 22.9-23 | 0.003939005 | 10111 | KIAA1305, KIAA0323, C14orf124, CMA1, CTSG |
18q11.2 | 20.3-22.7 | 0.003939005 | 11101 | IMPACT, HRH4, SS18, TAF4B, AQP4 |
22q13.1 | 36.8-36.9 | 0.003939005 | 10111 | PLA2G6, MAFF, C22orf5, KIAA1660, CSNK1E |
NOTE: Identification of FCRs. A novel algorithm based on the binomial distribution was used to identify clusters of survival-related genes throughout the genome. The statistic was calculated for variable cluster sizes at every possible location, and the gene cluster that was least likely to have arisen by chance was selected. These physical gene clusters, thus, displayed boundaries defined by physical proximity, correlated mRNA expression patterns, and survival relatedness and were called FCRs. These FCRs were examined for overlap and rank-ordered according to their Ps. Within each cluster, the presence of survival-related or survival-unrelated genes is indicated by either a 1 or a 0, respectively.
Identification of cryptic DNA alterations using FCR mapping. The identification of FCRs was derived solely from the clinical and mRNA expression data sets, with no contribution from direct DNA analysis. The independently derived FCRs were thus mapped onto a physical map of genome-wide copy number alterations of 21 primary glioblastomas generated by a cDNA aCGH platform with a median resolution of 100 kb. The effect of DNA alteration on the genesis of survival-related FCRs is clearly illustrated for EGFR and PDGFRA, two genes known to play a role in gliomagenesis (Fig. 1). Visual inspection of 104 FCRs indicated that >85% seemed to be associated with focal regions of DNA abnormality. Further analysis of 550 genes located within the FCRs indicated that 68% of the survival-related genes were associated with focal DNA alterations, whereas only 29% of the non–survival-related genes were associated with such alterations. This difference was statistically significant (P < 0.00001, proportion test). In addition to genes with a previously reported role in glioblastoma, a large number of survival-related genes not previously known to be amplified or deleted in glioblastoma were identified within FCRs (see Table 1).
Comparison with aCGH analysis. To place the ability of FGARD to identify functionally relevant DNA alterations into context, an analysis of the cDNA aCGH data set using a commonly used algorithm was done for comparison. This copy-number-only aCGH analysis differed from prior reports (21–27) in that it used mRNA expression and clinical data to locate novel high-probability target genes within the aCGH loci in a manner that has not been previously reported for glioblastoma. Genome-wide aCGH profiling of copy number alterations (CNA) in 21 glioblastomas using the CLAC algorithm (12) identified 82 aberrant loci occurring in at least 14% of the samples (Fig. 3). A frequency plot of the CNAs detected here closely matched that reported in a meta-analysis of 334 cases of astrocytoma analyzed by traditional CGH methods (ref. 22; data not shown). The median size of the identified loci was 400 kb, with >65% being <2 Mb. Previously reported loci in glioblastoma were identified in this analysis, including CNAs on chromosomes 12q15 (MDM2), 12q14.1 (CDK4), 9p21 (CDKN2A), 7p11.2 (EGFR), and 19p13.2 (CDKN2D; refs. 21–27).
The differentially expressed mRNA probe sets were cross-referenced with the list of survival-related mRNA probe sets and subsequently mapped to the CNAs identified by CLAC analysis. In this manner, 293 differentially expressed, survival-related genes residing in 36 loci were identified (see Supplementary Table S2).
Comparison of the FGARD results with these 36 survival CNAs defined by the aCGH algorithm revealed <10% overlap. FGARD identified 382 loci that were not detected by the CLAC aCGH analysis. Of the 36 survival CNAs defined by CLAC aCGH, 33 were also identified by FGARD (92%). Of the remaining three survival CNAs not detected by FGARD, all three contained a single survival-related gene, thus providing an explanation for why they were not detected by the FGARD cluster-based approach.
Increased expression of the EphA2 receptor tyrosine kinase in glioblastoma. To show the use of FGARD, we chose an FCR corresponding to a low-amplitude low-frequency amplicon for further study (listed in Supplementary Table S1). This FCR amplicon was located on chromosome 1p36.13 and covered ∼400 kb (Fig. 4A). We identified evidence for single copy number gain in 3 of 18 probes (17%) reporting at this locus. Because of its focal nature and low amplitude, this DNA alteration could not be reliably identified as significant using the CLAC aCGH algorithm alone. We chose EphA2 as the likely target gene within this FCR because of its location near the peak of the amplicon, its correlated degree of mRNA expression, its statistical relationship to survival, and its known biological function in other tissues.
Microarray analysis identified increased EphA2 mRNA expression in 5 of 21 glioblastoma tissue samples examined (24%). We obtained further confirmation of EphA2 mRNA overexpression in a subpopulation of glioblastomas when compared with non-tumor brain using real-time PCR (Fig. 4B). We did reverse transcription-PCR (RT-PCR) for EphA2 to look for splice variants of this protein in glioblastoma cells (Fig. 4B). mRNA for EphA2 RT-PCR was isolated from four primary glioblastomas and three non-tumor control brain samples. RT-PCR for EphA2 was also done in an additional primary glioblastoma and in five glioblastoma cell lines (U87, U343, D566, T98, and U251; data not shown). A single mRNA band corresponding to the expected size for EphA2 of 3 kb was detected in the 10 glioblastoma samples and in the 3 non-tumor brain samples examined.
Western blot analysis revealed increased EphA2 protein expression in glioblastoma when compared with non-tumor brain (Fig. 4C). Analysis of 14 glioblastoma and 3 non-tumor brain samples indicated that 9 of 14 glioblastomas (64%) showed increased EphA2 expression, with 5 of 14 (36%) showing strong expression when compared with non-tumor brain. Two of the seven low-grade astrocytomas examined (WHO grade 2) showed elevated EphA2 protein expression when compared with non-tumor brain (Fig. 4C). In addition, all glioblastoma cell lines examined showed high EphA2 expression (Fig. 4C).
EphrinA1 is an endogenous ligand for the EphA2 receptor (28). Microarray analysis in 21 glioblastomas revealed a 1.8-fold increase in ephrinA1 mRNA expression when compared with non-tumor brain (P < 0.026, unpaired t test). We confirmed increased ephrinA1 protein expression in glioblastoma by Western blot (Fig. 4C). In general, tumors that displayed elevated expression of EphA2 protein also displayed increased levels of ephrinA1 protein (Supplementary Fig. S4).
Initial FGARD analysis indicated that EphA2 mRNA expression was related to survival in patients with glioblastoma (P < 0.016, Cox regression). We have validated this result using Cox regression analysis of data from an independent set of 50 glioblastomas (ref. 29; P < 0.024). To examine this correlation further, we did a Kaplan-Meier analysis of EphA2 mRNA expression in our set of 21 glioblastomas. We divided the tumors into two groups based upon the mean level of EphA2 expression: those with high EphA2 expression (n = 10) and those with low expression (n = 11). Increased EphA2 mRNA expression correlated significantly with decreased survival in patients with malignant glioma (Fig. 4E; P < 0.034, Mantel-Cox log-rank test).
Functional role of EphA2 in glioblastoma. Activation of the EphA2 receptor by addition of the soluble EphA2 receptor ligand ephrinA1-Fc (2 μg/mL) resulted in a concentration-dependent increase in cell proliferation (P < 0.01, unpaired t test; Fig. 5A). The EphA2 receptor tyrosine kinase has been reported to activate the MAPK signaling pathway in several cell types (30). To investigate whether MAPK lies downstream of EphA2 activation in glioblastoma, we stimulated U343 glioblastoma cells with ephrinA1-Fc (2 μg/mL), collected total protein at various time points, and assayed by Western blot for evidence of ERK1/2, JNK/SAPK1, or p38 MAPK activation (Fig. 5B). Increased phosphorylation of ERK1/2, p38 MAPK, and JNK was observed 4 hours after EphA2 receptor activation. No change in the overall expression of these kinases was observed over the same time period. ERK1/2 phosphorylation was detected as early as 20 minutes after ephrinA1-Fc exposure (data not shown).
We sought additional evidence for EphA2-induced MAPK pathway activation in U343 cells using specific inhibitors for MEK (PD-98059 and U0126) and p38 MAPK (SB203580 and SB202190). We used the inactive compound SB202474 as a control (Fig. 5C). Inhibitors for either MEK or p38 MAPK decreased the EphA2-induced mitogenic response by up to 60% for MEK inhibitors (SB202190; n = 8; P < 0.0001, unpaired t test) and up to 80% for p38 MAPK inhibitors (U0126; n = 8; P < 0.0001, unpaired t test).
To confirm a role for the endogenous EphA2 receptor in the control of glioblastoma cell proliferation, we transfected an expression vector containing an shRNA sequence directed against EphA2 into U343 glioblastoma cells and selected stable clones. Western blot indicated a significant reduction in EphA2 protein expression in the shRNA-transfected cells when compared with cells transfected with a control vector (Fig. 5D). Knockdown of EphA2 expression decreased BrdUrd incorporation by 35% when compared with control-transfected U343 cells (P < 0.0004, unpaired t test). Separate experiments using a colorimetric MTT growth assay confirmed a 33% decrease in the growth of U343 glioblastoma cells after EphA2 knockdown (P < 0.001, unpaired t test; Fig. 5D). An additional shRNA directed against EphA2 also promoted cell growth when overexpressed in U343 glioblastoma cells (Supplementary Fig. S5).
We also assayed the role of EphA2 in glioblastoma tumor growth in vivo using a mouse xenograft model. U343 glioblastoma cells expressing EphA2 shRNA were implanted s.c. into nude mice. U343 glioblastoma cells transfected with a control shRNA vector were implanted on the contralateral side. Knockdown of endogenous EphA2 significantly inhibited glioblastoma tumor growth (n = 7; P < 0.0002, paired t test; Fig. 5E).
Discussion
We have developed a novel genomic screening method called FGARD that combines expression profiling and functional variables in humans (in this case survival) with aCGH to identify functionally relevant DNA alterations that may be too small or infrequent to be reliably identified as significant using aCGH data alone. The use of this method results in the identification of a highly selected subset of DNA alterations and associated target genes, all of which display multiple criteria predictive of biological importance in cancer. A significant fraction of the genes identified by FGARD in the current study are likely to represent the upstream determinants of tumor phenotype in glioblastoma and will thus serve as fruitful targets for further study and/or therapeutic intervention.
One finding of the FGARD method is the discovery of physical clusters of survival-related genes scattered throughout the cancer genome. These clusters demarcate a novel genomic interval called the FCR that has boundaries defined by physical proximity of the genes, correlated mRNA expression patterns, and functional significance. FCRs generally contained genes of high biological relevance to cancer, and >85% were associated with DNA alterations. Presumably, the minority of FCRs that were not associated with DNA alterations arose as a result of other factors that coordinately regulate gene expression on a regional basis such as gene duplication events with generation of similar promoters, DNA methylation, changes in chromosomal structure, etc. The advantage of the FGARD method remains relevant, even when currently available high-resolution aCGH platforms are used.
Direct comparison of FGARD with standard aCGH analysis revealed a significant advantage of FGARD for identifying functionally relevant DNA alterations. Using the additional information provided by the functional gene clustering effect described here, we were able to identify these alterations more accurately. The statistical clustering of functionally related genes, as evidenced by the increased correlation of mRNA expression within these clusters, and the association of the clusters with DNA alterations all contribute to the ability of FGARD to select for functionally relevant genetic alterations over background noise. Although solitary survival-related genes were sometimes associated with DNA alterations, we found that the occurrence of such associations was only 50% (n = 63), compared with a nearly 90% association rate that was observed when clusters of survival-related genes were examined (n = 100).
Some genes that are known to play a role in glioblastoma, such as PTEN and RB1, were not identified in this study. This was primarily because these genes were not significantly associated with survival in our series. Cox regression analysis of expression data in a larger, independent set of 50 glioblastomas (29) identified PTEN and RB1 as related to survival (P < 0.003 and P < 0.001, respectively). Thus, the use of larger data sets will allow for greater ability to correctly identify functionally relevant genes using the FGARD approach. However, in cases where mutations within a gene affect gene function, changes in mRNA expression may not correlate with survival.
The use of the FGARD approach was validated using an FCR containing the EphA2 receptor tyrosine kinase. EphA2 is a transmembrane receptor tyrosine kinase that is activated by ephrinA1, which is a cell surface molecule (28). The EphA2-ephrinA1 interaction is thought to play a role in contact inhibition, as stimulation of the EphA2 receptor by ephrinA1 under normal conditions inhibits proliferation and cell migration (31–34). Overexpression of endogenous EphA2 has been correlated with increased malignancy in a variety of human cancers, including pancreatic, prostatic, breast, and renal carcinomas (33–39). Enforced overexpression of the receptor using transfection techniques can transform epithelial cells (40), suggesting that EphA2 may act as an oncogene under some conditions. In contrast, several reports indicate that EphA2 receptor activation inhibits tumor growth and angiogenesis (32, 33, 41). This apparent contradiction may be explained by studies suggesting that the function of EphA2 may differ, depending upon whether it is expressed under conditions of cell-cell contact that allow for intercellular receptor activation, or whether it is expressed in malignant cells that lack normal intercellular interactions (42). The EphA2 receptor has a different epitope display in normal versus malignant cells, suggesting that it may exist in an altered state in tumors (42).
Although the Eph receptor family has been widely implicated in nervous system morphogenesis (28) and in some cancers (30–39), studies indicating a functional role for EphA2 in promoting brain tumor growth have not been published previously. We have used FGARD to identify EphA2 as a receptor tyrosine kinase that is overexpressed and inversely related to survival in a subset of glioblastomas. The endogenous EphA2 ligand ephrinA1 is also overexpressed in glioblastoma, indicating that this signaling system is dysregulated in these tumors. EphA2 activation increased glioblastoma cell proliferation and activated the MAPK/ERK pathway. Knockdown of endogenous EphA2 by RNA interference resulted in decreased proliferation and decreased tumor growth in a mouse xenograft model, supporting a role for EphA2 as a novel regulator of tumorigenesis in glioblastoma. Because EphA2 is expressed at relatively low levels in non-tumor adult brain, the possibility exists for the use of antagonists to this receptor in the treatment of malignant gliomas. Caution must be used in the development of such therapies, however, as decreased expression of other members of the Eph receptor family can promote tumorigenesis (43).
While this article was in preparation, two reports describing overexpression of EphA2 in glioblastoma were published (44, 45). One of these reports (44) examined the functional role of this overexpression and suggested that EphA2 activation inhibited anchorage-independent growth of glioblastoma cells. This report is in an apparent disagreement with reports that EphA2 overexpression can transform other cell types (40) and with our observations of increased proliferation and tumor growth in glioblastoma cells after EphA2 activation. One explanation for this discrepancy may lie in the duration of ligand exposure, as prolonged exposure to ephrinA1 leads to down-regulation of EphA2 receptor expression and function (data not shown). Further studies may help to reconcile the apparent pleiomorphic effects of EphA2 receptor activation in human gliomas and in other cancers.
Overall, these findings show the power of FGARD analysis for the study of human cancers on a genome-wide scale. This innovative method is not limited to the use of survival as a functional criterion. Other functional variables relevant to cancer biology, such as response to therapy or metastatic potential, could also be used in FGARD to identify genes with copy number–driven changes in expression that are upstream determinants of tumor phenotype.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
F. Liu, P.J. Park, and M.D. Johnson contributed equally to this work.
Acknowledgments
Grant support: Hagerty Fund Research Award (M.D. Johnson); Sontag Foundation Distinguished Scientist Award (M.D. Johnson); NIH grants K08 NS43482 (M.D. Johnson), K25 GM067825 (P.J. Park), R01 CA99041 (L. Chin), and P01CA095616 (L. Chin and R.A. DePinho); Brigham and Women's Hospital Institute for the Neurosciences Award (M.D. Johnson); Dana-Farber/Harvard Cancer Center Faculty Development Award (M.D. Johnson); NIH Roadmap for Medical Research grant U54LM008748 (P.J. Park); American Cancer Society Research Professorship (R.A. DePinho); Ellison Medical Foundation Senior Scholar (R.A. DePinho); Belfer Foundation Institute (R.A. DePinho); Goldhirsh Foundation Grant (E. Maher); Goldhirsch Foundation (L. Chin); NIH T32 grant (C. Bennan); and Massachusetts General Hospital Brian D. Silber Fund (A. Chakravarti).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.