The quality of cancer genomic and proteomic data relies upon the quality of the clinical specimens examined. Here, we show that data derived from non-microdissected glioblastoma multiforme tumor tissue is either masked or not accurate, producing correlations between genomic and proteomic data that lead to false classifications for therapeutic stratification. We analyzed the level of 133 key signaling proteins and phosphoproteins in laser capture microdissected (LCM) primary tumors from a study set of tissues used for the Cancer Genome Atlas (TCGA) profiling efforts, comparing the results to tissue-matched, nontumor cell–enriched lysates from adjacent sections. Among the analytes, 44%, including targets for clinically important inhibitors, such as phosphorylated mTOR, AKT, STAT1, VEGFR2, or BCL2, differed between matched tumor cell–enriched and nonenriched specimens (even in tumor sections with 90% tumor cell content). While total EGFR protein levels were higher in tumors with EGFR mutations, regardless of tumor cell enrichment, EGFR phosphorylation was increased only in LCM-enriched tumor specimens carrying EGFR mutations. Phosphorylated and total PTEN, which is highly expressed in normal brain, was reduced only in LCM-enriched tumor specimens with either PTEN mutation or loss in PTEN copy number, with no differences observed in non-microdissected samples. These results were confirmed in an independent, non-microdissected, publicly available protein data set from the TCGA database. Our findings highlight the necessity for careful upfront cellular enrichment in biospecimens that form the basis for targeted therapy selection and for molecular characterization efforts such as TCGA. Cancer Res; 74(3); 818–28. ©2013 AACR.
Tumor cells do not exist as a separate cellular entity but are part of a heterogeneous microecology that includes many different cell types such as fibroblasts, endothelial, nerve, and infiltrating cells of the immune system among others. It is becoming clear that this tumor microenvironment is an active participant in tumorigenesis (1, 2). Depending on the tumor type, location, and origin, the cellular makeup of the tumor tissue varies dramatically. Even within the brain, which is typically associated with few specialized cell types, high-grade glioma tissue contains transformed cancer cells, microglia, infiltrating lymphocytes, neural precursor cells, vascular endothelial cells, pericytes, and astrocytes (3). Understanding the contribution of each compartment or cell type to the disease is critical. In fact, if targeted therapy is to be successful, we need to understand what to target and where.
Coupling techniques that preselect specific tissue areas and/or cell types with downstream molecular profiling analysis may provide a powerful approach to personalized therapy. One such technique, laser capture microdissection (LCM), provides microscopically guided selective capture of individual cells or tissue areas (4); it has gained widespread acceptance and has been used successfully in numerous studies (5–7).
A large proportion of targeted therapeutics in oncology are kinase inhibitors (8), reflecting our current understanding of cancer as being most often a disease of signaling derangements. As kinases are the central feature of protein signaling pathways that depend on the posttranslational phosphorylation of proteins, accurate measurement of the functional signaling architecture in human diagnostic tissue is necessary to facilitate targeted treatment decisions. Past studies have revealed that protein pathway activation levels cannot be obtained by measuring genomic alterations or RNA transcript profiling alone, as these show little correlation with posttranslational protein modifications. One proteomic technique that is sensitive enough to enable measurement of the functional state of cell signaling pathways in LCM material is the reverse-phase protein microarray (RPMA) technology (9). RPMA allows the evaluation of several hundred samples on a single “dot blot” type array, can measure hundreds of phosphoproteins from only a few thousand cells, and has been widely adopted in research and clinical trial settings (for review, see ref. 10).
Past work has evaluated the impact of upfront cellular enrichment by LCM (11, 12). However, the overall impact of upfront cellular enrichment on drug target activation signatures and multi-omic data analysis with correlation between protein activation and gene mutation/copy number variation has not been systematically studied. Here, we have investigated the impact of tumor cell enrichment by LCM on signal protein pathway activation data of primary human glioblastoma multiforme tumor samples that represent a subset of glioblastoma multiforme tumors from the Cancer Genome Atlas (TCGA) project. The TCGA project is systematically charting genomic changes in 20 different cancers with the hope of determining driving mutations in tumor cells, stimulating the discovery of new drugs, and enabling targeted therapy utilizing currently available drugs (13, 14). Using available genomic TCGA data, we investigated whether upfront cellular enrichment via LCM produced proteomic data that better correlates with known genomic mutations and copy number variations in key cancer drivers such as EGFR and PTEN. Moreover, we have used this tissue set to determine whether very high endogenous tumor cell content (≥90%) diminishes the potential difference between enriched and nonenriched samples and would thus allow for omitting sample preprocessing, as is current TCGA practice. We then determined the impact of LCM on patient stratification for potential targeted therapy by interrogating differences in actionable/druggable signaling protein measurements in matched LCM versus non-LCM material.
Materials and Methods
Resected brain tumors were collected at Henry Ford Hospital (Detroit, MI) with written consent from patients in accordance with institutional guidelines and graded pathologically as glioblastoma multiforme according to the WHO criteria. Tumor samples were snap-frozen in liquid nitrogen within 1 hour of surgery. Frozen tissue from 39 newly diagnosed glioblastoma multiforme previously profiled by the TCGA (15) were used for this work; in most cases, tumor samples were cut from tissue adjacent to those sent to TCGA. Somatic mutations and copy number variation were obtained from cBio Cancer Genomics Portal (Supplementary Table S1; ref. 16).
Laser capture microdissection
Frozen tissue was cut into 8 μm sections with alternating sections of each sample being subjected to LCM or whole slide lysis (Fig. 1). LCM was conducted as previously described (17). Care was taken to selectively microdissect viable tumor cells without inclusion of necrotic tissue, blood vessels, body fluids, and normal brain parenchyma. Tumor cell selection was conducted during LCM by histologic analysis of the tissue specimen, under guidance of a board-certified pathologist, using standard surgical pathology criteria for frozen glioblastoma multiforme sections: nuclear anaplasia, mitotic activity, nuclear/cytoplasmic ratio, cellular density, microvascular hyperplasia, and necrosis (18). To collect whole slide lysates, tissue sections were lysed directly in extraction buffer, boiled for 8 minutes at 100°C, and frozen at −80°C until printing. Sections per sample were alternated between LCM and slide lysis, and the number of sections used for slide lysis was dependent on the number of sections used for LCM.
RPMA construction and staining
LCM tumor cell lysates were prepared and RPMA were printed and stained as previously described (10). Samples were printed along with a series of positive and negative control lysates consisting of cell lines treated with compounds that cause broad phosphoprotein increases (e.g., pervanadate, calyculin) and slides were stored desiccated at −20°C before staining. All antibodies used (Supplementary Table S2) were subjected to extensive validation for single band, appropriate molecular weight specificity by Western blotting as well as phosphorylation specificity through the use of cell lysate controls (e.g., HeLa ± pervanadate, Jurkat ± calyculin). Raw spot analysis data were acquired using ImageQuant 5.2 (Molecular Dynamics) and postprocessed using the Reverse-Phase Protein Microarray Analysis Suite (developed in-house; ref. 19).
Formalin-fixed, paraffin-embedded (FFPE) tissue sections were baked at 56°C for 30 minutes, deparaffinized in xylene, and rehydrated in a series of graded alcohols (100%, 95%, and 70%) with a final rinse in wash buffer (Dako). Immunostaining after heat-induced epitope retrieval (20 minutes at 95°C, followed by 20 minutes at room temperature in HIER buffer S2367, Dako) was conducted in a Dako Autostainer with an EnvisionSystem+HRP staining kit (Dako) with development in diaminobenzidine. Tissue sections were nuclear counterstained with hematoxylin (Dako) and Scott's Tap Water Substitute, and coverslips were applied with aqueous mounting medium (Faramount, Dako).
Two-way unsupervised hierarchical clustering analysis was prepared using JMP (version 5.1, SAS). Protein endpoints were only included in hierarchical clustering if all data points were available. High and low signaling clusters were determined according to the dendrogram. Mean comparisons of protein levels and phosphorylation were conducted in R (20) using Wilcoxon rank-sum or Student t test, depending on data normality (Shapiro–Wilk test). Data correlation (R2) was determined using GraphPad Prism (version 5.04; GraphPad Software Inc.), which was also used to prepare all of the bar graphs, box plots, and scatter plots. Differences between populations of PTEN copy number (0, −1, −2) were determined in R using pairwise Wilcoxon rank-sum with Bonferroni correction for multiple comparison. The Spearman ρ analysis was conducted using SciPy (21). P < 0.05 was chosen to indicate significance and P < 0.1 to indicate a trend. The EGFR/AKT/mTOR pathway activation score was determined by adding the respective rank of EGFR Tyr1068, AKT Ser473, and mTOR Ser2448 for each sample. Quartile differences between LCM and slide lysis samples were calculated by scoring each phosphoprotein level or pathway activation score according to quartiles (0, 1+, 2+, 3+) and comparing this quartile score between matched LCM and slide lysis samples.
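The pathway activation score and the quartile scoring described above can be sketched as follows. This is a minimal illustration in Python, not the implementation used in the study (which relied on R, JMP, and GraphPad Prism). The function names, the handling of ties (average ranks), the rank-based quartile boundaries, and the choice to compute quartiles separately within the LCM and slide lysis groups are all assumptions made for the example.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pathway_score(egfr_y1068, akt_s473, mtor_s2448):
    """Add the per-analyte ranks of each sample (EGFR/AKT/mTOR score)."""
    per_analyte = [average_ranks(v) for v in (egfr_y1068, akt_s473, mtor_s2448)]
    return [sum(sample_ranks) for sample_ranks in zip(*per_analyte)]


def quartile_scores(values):
    """Map each value to 0, 1, 2, or 3 (0, 1+, 2+, 3+) by rank quartile.

    Assumption: four equal-sized, rank-based bins within one sample group.
    """
    n = len(values)
    return [min(3, int(4 * (r - 1) / n)) for r in average_ranks(values)]


def quartile_shift(lcm_values, lysis_values):
    """Absolute quartile difference for each matched LCM/slide lysis pair.

    Both lists must be in the same patient order.
    """
    lcm_q = quartile_scores(lcm_values)
    lysis_q = quartile_scores(lysis_values)
    return [abs(a - b) for a, b in zip(lcm_q, lysis_q)]
```

The pathway score is computed on ranks rather than raw intensities, so analytes with different dynamic ranges contribute equally. A shift of 2 or more from `quartile_shift` corresponds to a classification change such as 0 to 2+ or 0 to 3+.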
Many protein endpoints differ significantly between enriched and nonenriched samples
To quantify the difference between tumor cell enrichment via LCM and no enrichment (slide lysis), we measured the mean total or phosphorylation level for 133 proteins. Of these, 44% showed a significant difference by mean comparison (P < 0.05) between LCM and slide lysis samples (Fig. 2A). Unsupervised hierarchical clustering analysis revealed that samples clustered largely by LCM or slide lysis as opposed to which tumor the matched pairs originated from, with only 1 of 39 patient-matched pairs clustering together (TCGA-06-0162, Supplementary Fig. S1). Limiting the samples to those containing ≥90% tumor reduced the difference between LCM and slide lysis to 14% of the analytes measured, whereas 23% of the measured signaling proteins were significantly different between LCM and slide lysis samples containing 50% to 89% tumor. Because of the low number of samples (n = 5), the difference between LCM and slide lysis was not determined for samples containing less than 50% tumor.
Next, we investigated individual LCM/slide lysis pairs to determine whether a general trend (e.g., a scalar increase or decrease from LCM to slide lysis) exists that would allow computational transformation of slide lysis data to LCM. However, as shown for 3 example phosphoproteins relevant for glioblastoma multiforme tumor biology (mTOR Ser2448, BCL2 Ser70, VEGFR2 Tyr951), phosphorylation levels did not generally increase or decrease from LCM to slide lysis but were protein- and patient-dependent (Fig. 2B). We also failed to find a correlation between protein phosphorylation levels in LCM/slide lysis matched pairs when focusing only on high tumor content samples (Fig. 2B). In fact, for VEGFR2 Tyr951, we even found a slight but significant inverse correlation, showing reduced levels of VEGFR2 Tyr951 phosphorylation in slide lysis samples with high phosphorylation levels in matched LCM pairs.
The impact of enrichment is protein-dependent
Using Spearman ρ analysis on all 133 protein endpoints, we identified the ErbB/HER family receptor tyrosine kinases (RTK) as constituting a significant portion of the small subset of proteins with high correlation between LCM and slide lysis samples (Supplementary Table S3). On the basis of these results, we then wanted to determine whether co-clustering of matched LCM and slide lysis samples differed between ErbB/HER proteins, which are known to be mostly expressed and activated in tumor cells (22), and their downstream signaling targets, which are more likely to be activated in non-tumor cells as well. Furthermore, we focused on the lowest and highest signaling clusters each because of their potential impact on patient stratification and selection for therapy. We found much more prominent co-clustering of matched dissected and undissected samples for ErbB/HER kinases compared with downstream targets (81% and 50% LCM/slide lysis co-clustering for ErbB/HER kinases in low and high signaling sample groups compared with 44% and 12% LCM/slide lysis co-clustering for downstream targets; Fig. 3). This indicates that the impact of LCM depends, in part, on how exclusively a protein is expressed or activated in tumor cells.
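For readers who want to reproduce a per-analyte concordance check of this kind, the Spearman ρ between matched LCM and slide lysis measurements can be sketched as below. The study used SciPy for this analysis; the version here is a dependency-free illustration that computes ρ as the Pearson correlation of average ranks. The function names and the example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(lcm_values, lysis_values):
    """Spearman rho for one analyte across patient-matched sample pairs."""
    return pearson_r(average_ranks(lcm_values), average_ranks(lysis_values))


# Hypothetical intensities for one analyte in five matched pairs.
lcm = [1.0, 2.0, 3.0, 4.0, 5.0]
lysis = [1.2, 3.1, 2.4, 5.5, 4.9]
rho = spearman_rho(lcm, lysis)  # rank orders differ by two adjacent swaps
```

Because ρ depends only on rank order, it is insensitive to a uniform scaling between the two preparations and asks only whether patients are ordered the same way by both.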
Whole-tissue extraction masks tumor biology and important DNA mutation–protein activation relationships
We then chose to investigate the EGF receptor (EGFR) and the phosphatase and tensin homolog (PTEN), which are clinically important proteins in glioblastoma multiforme tumor biology within the ErbB/HER signaling pathway and are known to be highly correlated with their cognate genomic mutational state (23, 24). Because of the more specific expression of EGFR in tumor cells compared with normal brain cells (22), we postulated a limited effect of tumor cell enrichment via LCM on the correlation of EGFR DNA mutation/copy number variation and EGFR (phospho)protein levels. In contrast, PTEN protein/phosphorylation level concordance with DNA mutation/copy number variation should be affected much more by LCM, because of its more ubiquitous expression in all cell types (Fig. 5D; ref. 25).
Of 30 patients with known EGFR mutation status, 5 patients (17%) had a mutation in EGFR (Supplementary Table S1). As expected, carriers of EGFR mutation showed significantly increased EGFR protein levels, both in LCM and in slide lysis samples (Fig. 4A). However, changes in EGFR phosphorylation (Tyr1173) were only found to be significant or showed a trend toward significance (Tyr1068) after LCM while being masked by high variation after lysing whole tissue (Fig. 4A).
We then investigated whether EGFR gene amplification would show the same trend. Of 36 patients with known EGFR copy number information, 15 (42%) had no or limited EGFR amplification (0/+1), whereas 21 patients (58%) had significant EGFR amplification (+2). As expected, we found a general increase in EGFR protein levels and phosphorylation (Tyr1068 and Tyr1173) corresponding with significant EGFR amplification (Fig. 4B). Although the increase in total EGFR and both phosphorylated forms of EGFR was significant (P < 0.05) in LCM samples, the increase in EGFR Tyr1173 phosphorylation was less pronounced and not significant in whole-tissue lysates. No significant impact of tumor content was found for either LCM or slide lysis samples (Fig. 4B).
Following this, we determined the impact of PTEN mutation on PTEN protein levels. Of the 39 TCGA samples used for this study, 30 had a known PTEN mutation status, with 23 patients displaying wild-type PTEN and 7 patients PTEN mutation (23%; Supplementary Table S1). A decrease in PTEN protein levels was found in tissue samples from patients with PTEN mutation only when the data were derived from cells enriched by LCM (P < 0.1; Fig. 5A). Moreover, constraining the analysis to high tumor content (≥90%) samples did not improve the ability to obtain accurate PTEN protein data in whole-tissue lysates (Fig. 5A). PTEN gene copy number was available for 36 samples, with 7 patients having no change in copy number (19%), 25 patients with single-copy loss (heterozygous deletion, 69%) and 4 patients with deep loss (possibly homozygous deletion, 11%). Deep loss of PTEN resulted in a significant decrease in PTEN protein levels and phosphorylation (Ser380), again identified only after LCM tumor cell enrichment (Fig. 5B).
Whole-tissue extraction masks expected DNA mutation/protein network activation determinants
Given the current emphasis on signaling networks in cancer biology, we sought to determine the effect of LCM on accurate determination of EGFR and PTEN mutation–driven phosphorylation of AKT (for review including current molecular targets of this signaling network in glioblastoma multiforme, see ref. 26). We found a highly significant inverse correlation between PTEN levels and AKT phosphorylation (Fig. 6A; AKT Ser473: R2 = 0.8, P < 0.01; AKT Thr308: R2 = 0.9, P < 0.005) in LCM samples of patients carrying PTEN mutations. This correlation was lost when analyzing nonenriched samples (AKT Ser473: R2 = 0.1, P > 0.5; AKT Thr308: R2 = 0.1, P > 0.6). As expected, no correlation was found between AKT phosphorylation and PTEN in patients without PTEN mutations (Fig. 6A). Similarly, EGFR mutation led to a significant correlation between EGFR and AKT activation in LCM samples (Fig. 6B; EGFR Tyr1068: R2 = 0.8, P < 0.05; EGFR Tyr1173: R2 = 0.9, P < 0.05), which is largely lost in nonenriched samples (EGFR Tyr1068: R2 = 0.3, P > 0.2; EGFR Tyr1173: R2 = 0.5, P > 0.1). Patients without EGFR mutation exhibited generally lower levels of EGFR phosphorylation and a lack of correlation with phosphorylated AKT (Fig. 6B).
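As an illustration of how an R2 value and the direction of a relationship such as the PTEN/AKT one can be obtained, the sketch below fits an ordinary least-squares line and reports the slope together with the coefficient of determination. It assumes a simple linear fit; the study computed correlations in GraphPad Prism, and the function name and the numbers here are made up for the example.

```python
def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, 1 - ss_res / ss_tot


# Hypothetical relative intensities: PTEN level versus AKT Ser473
# phosphorylation in five LCM samples.
pten = [0.2, 0.4, 0.5, 0.7, 0.9]
akt_s473 = [0.95, 0.70, 0.65, 0.40, 0.20]
slope, r2 = linear_fit_r2(pten, akt_s473)
# A negative slope with R^2 near 1 indicates a strong inverse relationship.
```

R2 alone does not carry the sign of the relationship, which is why the slope is returned alongside it.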
Independent validation confirms that whole-tissue extraction masks expected genomic–proteomic relationships
To confirm our findings, we took advantage of publicly available proteomic data from an independent TCGA subset of glioblastoma multiforme samples that were not subjected to any upfront tumor cell enrichment. Within this independent set of 123 samples with known mutation status, 29 patients displayed EGFR mutations (24%), whereas 37 displayed PTEN mutations (30%). Gene copy number information was available for 159 samples, with 76 (48%) having no or limited EGFR amplification (0/+1) and 83 (52%) having significant EGFR amplification (+2), whereas 16 patients had no change in PTEN copy number (10%), 124 patients had a single-copy loss (78%) and 18 patients a deep loss (11%). In this independent data set, and in keeping with our own data, EGFR protein and phosphorylation levels were also significantly increased in patients carrying EGFR mutations or significant EGFR gene copy number increases (Fig. 4C). However, as we also found in our undissected set, the TCGA data showed lack of the expected correlation between PTEN mutational status or PTEN copy number with PTEN protein levels (Fig. 5C).
Phosphoprotein measurements obtained from undissected tissue samples lead to large variance and inaccurate drug target activation conclusions
To evaluate the impact of LCM enrichment of tumor cells for potential downstream clinical applications, we chose 6 proteins that are both relevant for glioblastoma multiforme biology and known important drug targets for which the measurement could be envisioned for patient selection/stratification: EGFR, AKT, mTOR, STAT1, VEGFR2, and BCL2. Next, we scored tumors depending on the respective phosphorylation level of each protein by converting the continuous variable RPMA data into quartiles of relative phosphorylation (0, 1+, 2+, 3+) similar to IHC scoring. We also calculated an EGFR/AKT/mTOR pathway activation score for each sample, including EGFR Tyr1068, AKT Ser473, and mTOR Ser2448 measurements as previously described (27) to evaluate the impact of LCM enrichment within the context of a canonical signaling network measurement. Following this, we compared how patients would have been stratified for therapy based on drug target activation measurements using these scoring determinants. We found that even for samples with at least 90% tumor cell content, 28% of matched LCM and slide lysis samples showed significant pathway activation classification differences of 2 or more quartiles (e.g., 0 to 3+ or 0 to 2+ and vice versa; Fig. 7A). In addition, 3 individual proteins (STAT1 Tyr701, VEGFR2 Tyr951, and BCL2 Ser70) showed significant classification differences between matched LCM and slide lysis samples in at least 40% of patients (Fig. 7B). Samples with smaller amounts of endogenous tumor cell content had only a modest reduction in major drug target classification changes between undissected and LCM material.
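The proportion of patients whose classification changes by 2 or more quartiles can be tallied per analyte as in the sketch below. It takes quartile scores (0 to 3) that have already been assigned to matched LCM and slide lysis samples. The function name, the dictionary layout, and the example scores are hypothetical and do not reproduce the study data.

```python
def discordance_rates(lcm_quartiles, lysis_quartiles, min_shift=2):
    """Fraction of matched pairs per analyte that differ by >= min_shift.

    Both arguments map an analyte name to a list of quartile scores (0-3),
    with patients in the same order in the LCM and slide lysis lists.
    """
    rates = {}
    for analyte, lcm_scores in lcm_quartiles.items():
        lysis_scores = lysis_quartiles[analyte]
        shifts = [abs(a - b) for a, b in zip(lcm_scores, lysis_scores)]
        rates[analyte] = sum(s >= min_shift for s in shifts) / len(shifts)
    return rates


# Made-up quartile scores for five matched pairs.
lcm = {"STAT1 Tyr701": [0, 3, 1, 2, 3], "EGFR Tyr1068": [0, 1, 2, 3, 3]}
lysis = {"STAT1 Tyr701": [2, 3, 1, 0, 0], "EGFR Tyr1068": [0, 1, 3, 3, 2]}
rates = discordance_rates(lcm, lysis)
# In this toy example, 3 of 5 STAT1 pairs shift by two or more quartiles,
# whereas no EGFR pair does.
```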
The clinical utility of molecular profiling for personalized therapy is inherently tied to the accuracy and fidelity of the underpinning data used. Recent high-profile molecular characterization efforts such as the TCGA have sought to develop translational “-omic” databases wherein comprehensive molecular analysis of DNA, RNA, metabolomic, and (phospho)protein levels are being used to develop a systems level view of tumor biology with the ultimate hope of using such data for personalized therapy-based applications. Upfront cellular enrichment technologies such as LCM have been used extensively in conjunction with molecular profiling, and recent studies have revealed both the clinical accuracy of data obtained by LCM-based protein analysis (12, 28) and the impact of coincident stromal and epithelial co-contamination on molecular data (11, 29). However, none of these past efforts have systematically investigated the impact of LCM from clinical specimens on downstream data fidelity in the context of important drug target measurements and DNA mutation–protein phosphorylation analysis. The results presented herein provide evidence in support of the critical role of upfront tumor cell enrichment, especially in clinical patient samples that form the basis for targeted therapy selection and for important molecular characterization initiatives like the TCGA.
One may expect a limited impact of tumor cell enrichment for glioblastoma multiforme, where tumor cells are often assumed to be mixed with fewer cells of the microenvironment, owing to its high tumor cell density compared with normal brain parenchyma. In contrast to this assumption, we found more than 40% of the 133 proteins/phosphoproteins and nearly all important protein drug targets for clinical therapeutics that were measured to be significantly different between enriched and nonenriched samples (Fig. 2A). In fact, underpinned by the systemic differences in protein level and activation, only 1 of 39 enriched/nonenriched sample pairs coming from the same patient clustered together during unsupervised hierarchical clustering (Supplementary Fig. S1).
We hypothesized that the highest impact of LCM would be on the measurement of DNA mutation–protein level linkages of analytes with expression within both the tumor cells and the surrounding brain parenchyma, whereas measurements of proteins highly expressed in tumor cells would be less affected (Supplementary Fig. S2). This was supported by our observation that ErbB/HER RTKs, which are well known to be highly expressed/phosphorylated in tumor cells, make up a significant portion of the small subset of proteins with high correlation between LCM and slide lysis samples (Fig. 3; Supplementary Table S3). In addition, increased EGFR protein/phosphoprotein levels were observed for both LCM and slide lysis samples from patients with increased EGFR gene copy number. However, the increase in EGFR phosphorylation in carriers of EGFR mutations was not statistically significant in slide lysis samples due to high data variability (Fig. 4A). Only in an independent study set that consisted of a larger number of nonenriched samples (n = 159, publicly available TCGA data) was statistical significance reached. This indicates that for largely tumor cell–specific proteins, the analysis of large numbers of samples may overcome the observed increased patient-to-patient variation following whole-tissue lysis. In contrast to EGFR, PTEN is highly expressed in normal brain (25, 30). Loss of PTEN copy number is frequently found in glioblastoma multiforme (15) and is under intense investigation as a companion diagnostic marker for PI3K/AKT/mTOR targeting therapies. Moreover, we have recently shown that PTEN mutation status can be accurately determined by RPMA-based PTEN protein levels in lymphoblastic leukemia samples (31), providing confidence in the PTEN results we obtained herein. We observed decreased PTEN levels in tumors from patients with deep loss of PTEN or carriers of PTEN mutations only after LCM (Fig. 5).
Furthermore, and in contrast to EGFR, increasing the number of nonenriched samples by investigating the aforementioned large independent study set of publicly available TCGA data did not improve the ability to measure correlations between PTEN protein levels and PTEN mutation or variations in PTEN copy number (Fig. 5C). This indicates that for proteins/phosphoproteins that are ubiquitously expressed/activated both in tumor and non-tumor cells, the non-tumor cell signal also scales up with sample size (Supplementary Fig. S2) and that for many important phosphoproteins such as AKT, mTOR, ERK, etc., simply analyzing a larger number of nonenriched samples will not produce more accurate data or correlations. Compounding this issue is that for every given tissue sample, it will be impossible to know ahead of time the percent contribution and amount of each cell type. Thus, although the overall amount of significantly different analytes between LCM and slide lysis samples is reduced with higher tumor content [23% (50%–89% tumor) to 14% (90% tumor); Fig. 2A], several key signaling proteins do not benefit significantly from increased tumor cellularity (Fig. 7B).
Activated EGFR is known to lead to the downstream activation of AKT, whereas active PTEN causes a downstream reduction in AKT phosphorylation. However, AKT is a very “promiscuous” signaling molecule that interacts with and is activated by a plethora of upstream ligands and pathways. It is therefore not surprising that we only found a correlation between AKT phosphorylation and EGFR or PTEN when either was mutated and therefore constituted a major driving signaling influence (or lack of, as in the case of PTEN). However, the expected and observed correlation between (phospho-) EGFR levels, PTEN levels, and AKT phosphorylation disappeared in all cases if samples were not subjected to LCM-based tumor cell enrichment. Taken together, these observations indicate that expected network biology within tumor cells is more clearly revealed by upfront tumor cell enrichment.
On the basis of the observed differences between LCM and slide lysis samples, one may raise the possibility that the procedure of LCM itself might systemically change the proteome. However, the utility and validity of LCM have been shown in a multitude of publications by many research groups (4), including direct proteomic comparisons of LCM versus non-LCM matched tumor samples, which showed no alteration of the proteome by LCM (32). Moreover, recent publications have shown excellent and even better concordance of (phospho-)HER2 protein data obtained from microdissected tissue with matched immunohistochemical and FISH data (28). In our study, we found better linkage between proteomic and genomic data in LCM samples compared with matched undissected samples, indicating that LCM facilitates a correct snapshot of the tumor cell–specific proteome.
Apart from improving our understanding of tumor biology, measuring the activation of signaling proteins also provides a means of patient stratification for targeted therapy. Given their central role in tumorigenesis and metastatic progression, protein kinases are ideal targets for therapy, which is reflected in the high proportion of protein kinase inhibitors in the current drug pipeline for cancer therapy (8). On the basis of our analysis of important drug targets such as phosphorylated EGFR, STAT1, VEGFR2, BCL2, AKT, and mTOR, as well as EGFR/AKT/mTOR pathway activation, a significant number of tumors would have been inaccurately determined to have little to no drug target either at the individual or at the pathway level when actually they are highly activated, or vice versa, when using slide lysis samples (Fig. 7) even when limiting the analysis to samples with 90% tumor content or more. This raises concern that treatment selection based on data obtained from tumor specimens that were not microdissected (and independent of starting tumor cellularity) may generate incorrect decisions if the targeted therapy is directed toward tumor cells.
In an effort to ensure the appropriate quality and fidelity of data, restrictions such as minimum tumor content per sample are typically used as filters for biospecimen inclusion/exclusion. In a TCGA pilot set studying glioblastoma multiforme, the inclusion criteria were set at 80% tumor nuclei, which has led to the exclusion of 65% of glioblastoma multiforme biospecimens (15). Recently, TCGA has determined that current second-generation sequencing platforms are sensitive enough “that 60 percent [tumor nuclei] provides a sufficient proportion to generate high quality data in which the tumor's signal can be distinguished from other cells' signals,” which allows for a larger number of biospecimens to be included in the study (33). However, our data suggest that for the analysis of proteins/phosphoproteins, the practice of sample inclusion/exclusion based on tumor cell content does not lead to appropriate quality data, even if standards are set as high as 90% tumor nuclei. Only upfront tumor cell enrichment appears to selectively provide tumor cell–specific proteomic data.
A potential issue that could contribute some variability to the levels of proteins/phosphoproteins measured in our study was the impact of phosphoprotein changes that could occur between the start of surgery and the time when samples were snap-frozen. It is well-known that kinases and phosphatases are active during warm and cold ischemia time (34), which may affect the final protein/phosphoprotein measurements. Although it is impossible to know exactly what the impact of preanalytical variables was on the analytes we measured, the fidelity of the data (following LCM) for phosphorylated EGFR, PTEN, and AKT based on concordance with PTEN and EGFR mutation data was shown. Protein/phosphoprotein–DNA status correlations were conducted under the assumption that the tumor cells provided the major source of altered genetic information. Currently, efforts like the TCGA use the whole tissue sample itself as input for genetic and proteomic analysis. Although nearly all current genomic analyses are focused on the identification of driver DNA alterations within the tumor cell archive, efforts to understand changes in the genetic landscape of stromal cells, immune cells, etc., within the context of the tumor microenvironment are now beginning. As these studies unfold, the impact of tumor cell enrichment within the context of protein/phosphoprotein–DNA status correlations therein will need to be carefully scrutinized. It also remains to be seen what the impact on data accuracy obtained from global proteomic analysis (e.g., from mass spectrometry–based profiling) will be when tumor cell enrichment is not used. However, on the basis of the data provided herein and the significant degree of discrepancies and inaccuracies that our data revealed even in our circumscribed analysis of a limited number of proteins and DNA mutations, further evaluation of the impact of upfront tissue enrichment techniques on such analysis is warranted.
Disclosure of Potential Conflicts of Interest
L.A. Liotta is a consultant/advisory board member of Theranostics Health. E.F. Petricoin has ownership interest (including patents) in Theranostics Health, Inc. and Perthera, Inc. and has served as a consultant/advisory board member of Theranostics Health, Inc. and Perthera, Inc. No potential conflicts of interest were disclosed by the other authors.
Conception and design: C. Mueller, L.A. Liotta, E.F. Petricoin
Development of methodology: C. Mueller, L.A. Liotta, E.F. Petricoin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Mueller, A.C. deCarvalho, T. Mikkelsen, N.L. Lehman, V.S. Calvert, L.A. Liotta
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C. Mueller, L.A. Liotta, E.F. Petricoin
Writing, review, and/or revision of the manuscript: C. Mueller, A.C. deCarvalho, T. Mikkelsen, N.L. Lehman, L.A. Liotta, E.F. Petricoin
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Mueller, N.L. Lehman, V. Espina, L.A. Liotta, E.F. Petricoin
Study supervision: E.F. Petricoin
The authors thank Laila Poisson, PhD, for her contributions in procuring the public TCGA data.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.