Overall survival of patients with osteosarcoma (OS) has improved little in the past three decades, and better models for study are needed. OS is common in large dog breeds and is genetically inducible in mice, making the disease ideal for comparative genomic analyses across species. Understanding the level of conservation of intertumor transcriptional variation across species and how it is associated with progression to metastasis will enable us to more efficiently develop effective strategies to manage OS and to improve therapy. In this study, transcriptional profiles of OS tumors and cell lines derived from humans (n = 49), mice (n = 103), and dogs (n = 34) were generated using RNA sequencing. Conserved intertumor transcriptional variation was present in tumor sets from all three species and comprised gene clusters associated with cell cycle and mitosis and with the presence or absence of immune cells. Further, we developed a novel gene cluster expression summary score (GCESS) to quantify intertumor transcriptional variation and demonstrated that these GCESS values associated with patient outcome. Human OS tumors with GCESS values suggesting decreased immune cell presence were associated with metastasis and poor survival. We validated these results in an independent human OS tumor cohort and in 15 different tumor data sets obtained from The Cancer Genome Atlas. Our results suggest that quantification of immune cell absence and tumor cell proliferation may better inform therapeutic decisions and improve overall survival for OS patients.
Significance: This study offers new tools to quantify tumor heterogeneity in osteosarcoma, identifying potentially useful prognostic biomarkers for metastatic progression and survival in patients. Cancer Res; 78(2); 326–37. ©2017 AACR.
Osteosarcoma (OS) is the most common primary bone malignancy, accounting for ∼2% of childhood cancers. The total number of new human cases diagnosed each year in the United States is low, ∼400–600 annually, which presents a significant challenge to studying this rare but deadly cancer (1). The relative 5-year survival rates for all OS (∼60%) and for metastatic OS (∼20%) have not significantly changed since the mid-1980s (1–7). OS in dogs is much more common, with an overall incidence rate 30 to 50 times higher than humans, and a lifetime risk up to 10% in larger, pre-disposed breeds (8). OS tumors can also be generated in wild-type and predisposed mice via tissue-specific mobilization of the T2/ONC transposon by the Sleeping Beauty Transposase (9, 10).
While in humans OS is primarily a pediatric disease, in dogs the majority of cases occur in older dogs with an average onset at 7.5 years (8). Unfortunately ∼80% of dogs die as a result of metastasis within 1 year of diagnosis (11). Human dog and mouse OS share many clinical and molecular features and insight gained from one species may be translatable to the others (9–12).
Identifying and understanding molecular mechanisms of OS have lagged behind other cancer types, partly due to the small number of human tumors available for study (12, 13). Additionally, OS is characterized by complex karyotypes with highly variable structural and numerical chromosomal aberrations in humans, dogs, and mice (9, 14–16). Sequencing studies have identified recurring genetic alterations in OS, some of which are common to both human and dog OS (12, 14, 17–19). However, understanding of these genetic alterations has not led to improved outcomes for OS patients. There remains a critical need to develop diagnostic tests that can identify risk of OS progression at an early, pre-metastatic stage.
The goal of this study was to assess transcriptional variation in OS tumor samples, identify common patterns of expression across different species, and evaluate their association with metastases and clinical outcomes. To accomplish these goals, we developed the gene cluster expression summary score (GCESS) technique to identify and quantify transcriptional patterns present across datasets and across species. We then use the GCESS method to derive associations between metastatic progression, outcome, and transcriptional patterns in OS and extend these results to a wide range of human tumors. Based solely on our genome wide method, we show that the loss of immune cell transcripts as well as increases in cell-cycle transcripts are associated with poor outcomes in OS and extend these results to a wide range of human tumor types.
Materials and Methods
Biospecimen collection and processing
Human and dog biospecimens were collected from newly diagnosed OS patients prior to treatment with cytotoxic chemotherapy drugs. Specimens were obtained under protocols approved by either the University of Minnesota's Institutional Review Board or Institutional Animal Care and Use Committee (protocol numbers 0802A27363, 1101A94713, and 1312-31131A) or the University of Colorado Institutional Review Board or Institutional Animal Care and Use Committee (AMC 635040202, AMC 200201jm, AMC 2002141jm, 02905603(01)1F, and COMIRB 06-1008).
Human patient OS samples (n = 44) and normal bone samples (n = 3) were obtained from the University of Minnesota Biological Materials Procurement Network (UMN BioNet) or the Cooperative Human Tissue Network, both of which follow standardized patient consent protocols. Samples had been deidentified and only a limited amount of patient information was provided. Saos-2, U-2 OS, MG-63, and 143B human OS cell lines were purchased from ATCC and were authenticated by the University of Arizona Genetics Core using short tandem repeat profiling, as well as an osteoblast cell line, which was a gift from Dr. Richard G. Gorlick (Albert Einstein College of Medicine, New York, NY), were also sequenced. Supplementary Table S1 summarizes the available metadata characteristics for the samples used in this study.
Dog OS samples (total n = 31) were obtained from dogs with naturally occurring primary appendicular tumors, recruited between 1999 and 2012. The majority of the samples were from Rottweilers and Golden Retrievers. Specimens were obtained with owner consent under approved protocols as previously described (13). Also included in this study were two cell lines (OSCA-8 and OSCA-78) derived from primary OS tumor as previously described (20), and one dog osteoblast sample, CnOB, purchased from Cell Applications. The OSCA cell lines are available for distribution through Kerafast, Inc.
Mouse OS samples (n = 92 tumor, n = 11 cell line) were derived from previously established experimental models. Included in this study were 25 OS samples from mice with somatic induction of Trp53R270H expression in Osx1-expressing cells (9), 67 samples from Sleeping Beauty transposon accelerated OS in wild-type and Trp53R270H mice (10), and 11 cell lines established from mouse OS tumors (9).
RNA extraction from frozen tumor tissue and cell lines
Isolation of total RNA from tissues, avoiding areas of necrosis, and from cell lines was performed according to the recommended protocol for Ambion's TotalRNA kit from Thermo Fisher Scientific. Samples were quantified using fluorescence by RiboGreen dye (Thermo Fisher Scientific). RNA integrity was assessed using capillary electrophoresis (RIN > 6.5) with the Agilent 2100 BioAnalyzer system (Agilent Technologies).
Sequencing libraries of each sample were prepared using the TruSeq Library Preparation Kit (Illumina). Paired end sequencing (30–40 million reads per sample) was done at the University of Minnesota Genomics Center on a High Seq 2000 (Illumina). The raw FASTQ files are available at the NCBI Sequence Read Archive and linked to from Gene Expression Omnibus SuperSeries GSE87686.
Publicly available human data
RNA sequencing (RNA-Seq) FASTQ files and outcome related metadata for 35 additional human OS samples (HOS2) were obtained from dbGap:phs000699.v1.p1 (http://www.ncbi.nlm.nih.gov/gap; ref. 21). RNA-seq files were also obtained from 25 additional human OS samples (HOS3) available from previously published studies (9, 15, 17).
FASTQ files from the HOS2 cohort were mapped to the reference genome, and fragments per kilobase of transcript per million mapped (FPKM) values were calculated using the same protocols as the samples in this study, while FASTQ files from the HOS3 cohort were mapped to the reference genome and RSEM expression values were calculated using the The Cancer Genome Atlas (TCGA) RNA-sequencing protocol (22, 23). In this study, the samples from both of these cohorts were used for calculating pairwise Pearson correlation coefficients and were included in the transcriptome analyses described below. The HOS2 cohort included survival data and information on presence of metastasis at diagnosis, so this cohort was used for survival analysis.
GEO dataset series GSE21257, which consisted of genome-wide expression data of 53 human OS tumors produced from the Illumina human-6 v2.0 expression array, was downloaded for this study. Patient outcome data, including survival and metastasis, was included in this study.
TCGA data portal (https://gdc.cancer.gov/) was used to download survival times, death events, and RNA-Seq data for 5582 tumors as described in Supplementary Table S2.
Briefly, mapping was carried out using Tophat (24), Samtools (25), and Cufflinks (26) to generate FPKM values using reference genomes as described in Supplementary Methods. To minimize the effects of dividing FPKM values by numbers close to 0 and stochastic noise, 0.1 was added to each FPKM value (27, 28). The FPKM files are available at GEO GSE87686. Mapping statistics are summarized in Supplementary Table S3.
Due to the high correlations between human OS datasets from multiple sources, the human datasets were combined for further analyses (9, 15, 17, 21). The ∼8,000 most variable genes across the full same-species datasets were identified for clustering by species (standard deviation cutoffs: human > 0.93, mouse > 0.72, and dog > 0.79). Cluster 3.0 (C Clustering Library 1.52) was used to log2 transform and gene-mean-center the data and then perform hierarchical average linkage clustering using the Pearson similarity metric. Clustering data were visualized in Java TreeView (version 1.1.6r4). OptiType, a precision HLA typing tool (29), was used to identify human OS samples derived from the same patient.
Systematic identification of gene clusters
Gene clusters with a dendrogram node correlation > 0.60 and at least 60 individual genes were identified in each of the species datasets. The 0.6 Pearson correlation cutoff value was chosen, as it is a widely accepted conservative confidence threshold. The minimum cluster size of 60 was chosen to ensure that only larger transcriptional patterns were identified. Permuted and random datasets were used to show that these thresholds would not identify clusters in artificial datasets that do not contain meaningful transcription patterns. Gene clusters representing batch effects from the combination of different human samples were removed from further analyses.
Generation of control datasets
For each species, a random dataset and a permuted dataset were generated as controls. Random datasets were generated by randomly selecting values between −2 and 2 to replace the actual (mean-centered) values. Permuted datasets were generated by randomly reordering the values for each gene.
The GCESS is defined as the sum of expression values (log2-transformed and mean centered) of all genes in a particular defined cluster for a single sample. The GCESS quantifies transcriptional variation between tumors. It takes many correlated individual transcript data points and condenses them into a single value.
The Ingenuity Pathway Analysis (IPA) suite (Qiagen) was used to identify pathways associated with gene clusters.
Statistical significance was calculated using the log-rank test or by Fisher exact test depending on analysis and a P < 0.05 was considered significant. Kaplan–Meier (KM) survival plots were generated using the “survival” package in R (Version 0.98.1103; refs. 30, 31). The GCESS values were used to rank the tumors into quartile groups and the quartile groups were systematically tested for association with outcome.
Histopathology, immunohistochemistry, and additional statistical information is provided in Supplementary Methods.
OS tumors show a common transcriptional profile across species that is distinct from other types of tumors
To better understand the OS transcriptome, we performed RNA-seq analysis on mRNA libraries generated from human, dog, and mouse tumors and cell lines (Table 1). We hypothesized that despite the highly complex karyotypes associated with OS, tumors would maintain common transcriptional characteristics between species that were distinct from other types of tumors. To test this hypothesis, we calculated pairwise Pearson correlation coefficients within and between our OS cohort (HOS1) and two independent human OS tumor cohorts (HOS2 and HOS3) recently published by other groups (Fig. 1; refs. 9, 15, 17, 21). Strong correlations were observed within each human cohort (average intracohort correlations: HOS1 0.62, HOS2 0.67, and HOS3 0.66). Strong correlations were also observed between our cohort and the previously published cohorts (average intercohort correlations: HOS1–HOS2 0.61, HOS1–HOS3 0.53). In contrast, strong correlations were not observed between HOS1 and other human tumors obtained from TCGA (cervical 0.26, colorectal 0.41, glioblastoma 0.30, leukemia 0.24, prostate adenocarcinoma 0.34, and thyroid cancer 0.17), indicating that there are common transcriptional events that define OS pathology (Fig. 1A).
|Source .||Species .||OS tumors .||OS cell lines .||Other .||Total .|
|This study||Human||44||5||3 normal bone||52|
|This study||Dog||31||2||1 osteoblast cell line||34|
|Perry et al. (21)||Human||35||35|
|Previous studies (9, 15, 17)||Human||25||25|
|Source .||Species .||OS tumors .||OS cell lines .||Other .||Total .|
|This study||Human||44||5||3 normal bone||52|
|This study||Dog||31||2||1 osteoblast cell line||34|
|Perry et al. (21)||Human||35||35|
|Previous studies (9, 15, 17)||Human||25||25|
To establish the cross-species relevance of the dog (DOS; Fig. 1B) and mouse (MOS; Fig. 1C) OS tumors, we calculated both inter- and intraspecies correlations. Similar to humans, average intraspecies correlations were high in both dog (0.76) and mouse (0.68) samples. Dog and mouse OS samples were both correlated with the human OS tumors (HOS1-DOS 0.48, HOS1-MOS 0.52), but not with other human cancers (DOS-cervical 0.18, MOS-cervical 0.14, DOS-colorectal 0.31, MOS-colorectal 0.28, DOS-glioblastoma 0.21, MOS-glioblastoma 0.25, DOS-leukemia 0.14, MOS-leukemia 0.14, DOS-prostate adenocarcinoma 0.24, MOS-prostate adenocarcinoma 0.21, DOS-thyroid cancer 0.13, MOS-thyroid cancer 0.12), validating a cross-species analysis strategy to identify common transcriptional components in the development and progression of OS.
OS tumors show common transcriptional variation patterns across three species
We hypothesized that patterns of transcriptional variation would be conserved in OS tumors across species. To systematically assess transcriptional variations across human, dog, and mouse OS tumors, each dataset was first analyzed independently using average linkage clustering. Tightly correlated and large gene clusters representing patterns of intertumor transcriptional variation were defined as having both a dendrogram node correlation greater than 0.6 and a minimum of 60 genes. A total of 9 human, 5 dog, and 11 mouse gene clusters were identified (Fig. 2A, Supplementary Table S4). To ensure that these clusters were not artifacts, similarly sized random datasets and shuffled permutations of the real data were generated and clustered. No clusters passed the threshold criteria in either the random or permuted datasets, validating that the clusters observed in the real RNA-Seq data represent true transcriptional variation (Supplementary Figs. S1–S9).
To determine whether the identified gene clusters represented sets of genes common across the 3 species, pairwise percent overlaps were calculated between each gene cluster from a single species and all clusters in the other 2 species. The resulting percentage overlap values were clustered for each species pair. Several clusters of genes showed significant overlap across species (Fig. 2B–D; Supplementary Table S4), indicating conserved patterns of transcriptional variation in OS tumor samples.
To describe the potential biological significance of these common gene clusters, enriched functional annotations were identified using IPA software (Qiagen). The most conserved cluster across species was composed of transcripts that were independently highly enriched in genes associated with cell cycle and mitotic functions. The next two highly conserved clusters across species were each independently enriched in transcripts relating to immune cell functions (Fig. 2D, Supplementary Table S5). A gene cluster enriched for muscle differentiation genes was observed in the human and mouse tumors but it was not present in the dog tumors.
We next asked if the genes in the two immune cell gene clusters represented a broad spectrum of immune cells or were enriched for particular leukocytes. We compared these immune cell gene clusters to a gene signature matrix (LM22), which consists of 547 genes and was created to identify genes unique to 11 different subtype populations of immune cells (32). Overall, our human immune cell annotated gene clusters were highly enriched in the LM22 gene transcripts (Supplementary Table S4). Using the LM22 annotations to determine the types of active leukocytes present (as represented by their unique transcripts), we found genes representing monocytes to be most enriched in human cluster-1 and genes representing T cells to be most enriched in human gene cluster-8 (Supplementary Table S4). Results from assessing enrichment of the LM22 genes in the dog and mouse immune cell gene clusters also supported these results (Supplementary Table S4).
The GCESS technique quantifies tumor transcriptional variation
The GCESS was developed to reduce the high dimensionality of expression profiles generating a normalized and easily comparable value per sample for use in further association analyses. The GCESS is defined as the sum of expression values (log2-transformed and mean centered) of all genes in a particular gene cluster for a single sample. A negative GCESS indicates relative underexpression of the group of genes in that sample compared with all of the samples in the analysis set, a positive GCESS indicates overexpression, and a GCESS close to 0 (zero) indicates mean expression. The GCESS summarizes the relative transcript levels of many correlated genes into a single value. This value is calculated for each cluster in each tumor. This allows for tumors to be rank ordered by summary score, which is based on the observed transcriptional data. Multiple summary scores were generated for each tumor sample, allowing for the independent comparison of the impact of each identified gene cluster, thereby achieving an unbiased dimensional reduction.
To better characterize the conserved OS tumor transcriptional variation, GCESS values were calculated for the identified for all identified gene clusters in each sample (Fig. 3A and Supplementary Table S6). In addition to human, dog, and mouse OS tumor samples, we also analyzed tumor-derived cell lines, normal human bone cells, and dog osteoblasts. The distribution of GCESS values (Fig. 3B–D) shows distinct patterns for cells/cell lines, normal bone, and tumor samples.
Tumor-derived cell lines had higher cell-cycle GCESS values compared with tumor tissues, while normal human bone had cell-cycle GCESS values corresponding to the lower range of GCESS values obtained from tumor samples (Fig. 3B). Dog osteoblasts had an extremely low cell-cycle GCESS. These results are consistent with the upregulation of cell-cycle genes in a subset of highly proliferating OS tumors (13).
Tumor-derived cell lines had lower immune GCESS values compared with the majority of tumor tissues in all three species datasets. The immune cell GCESS of the normal human bone sample was within the middle of the range seen in human tumor samples. Dog osteoblasts had GCESS values corresponding to the tumor-derived cell lines (Fig. 3C and D). These results are consistent with the absence of immune cells in cell lines and the variable presence of immune cells in tumor-derived tissues from all 3 species.
To validate the variable presence of immune cells indicated by GCESS immune cluster scores, we evaluated 10 FFPE sections derived from canine OS tumors, which were also sequenced, for the presence of T cells and macrophages within tumor stroma by immunohistochemistry (Supplementary Fig. S10A–S10D and Supplementary Table S7). Immunohistochemistry staining supported the transcriptome derived GCESS data for both MAC387 and CD3 staining. Stromal MAC387 was not observed in the 3 tumors with the lowest GCESS scores for the immune cell cluster-1 and occasional MAC387-positive cells were observed for the 3 tumors with the highest immune cluster-1 GCESS scores. One of the middle range tumors showed the presence of MAC387 positive staining cells within the stroma while 3 others did not. For CD3, only the sample with a positive GCESS immune cluster-2 score showed T cells present within the tumor stroma. Of note, the next four tumors sorted by immune cluster-2 GCESS score all showed MAC387-positive cells within the tumor stroma.
Minimization of multiple testing errors
We hypothesized that patterns of transcriptional variation present in tumors should indicate associations with patient outcome and that methodological data analyses improvements would be necessary to reveal these associations. Associations between OS patient outcomes and individual gene transcript levels are commonly calculated using the KM estimator, a nonparametric statistic that estimates the survival function based on censored lifetime data. KM survival analyses were calculated using groups defined by individual transcript levels for our dog and human HOS2 datasets. Potentially significant events were present following comparison of all transcripts (P < 0.00001; Supplementary Fig. S11) used in clustering. To assess whether these associations were false positives or not, similarly sized random datasets and shuffled permutations of the real data were also analyzed. Random and shuffled permutations led to similarly “significant events” as were observed in the real data due to multiple testing of ∼8 to 9,000 individual tests, indicating that methodological improvements were necessary (Supplementary Fig. S11).
Two routes are typically used to minimize the effects of multiple testing in statistical analyses. The first entails increasing sample numbers in order to surpass multiple testing corrections via increased statistical power. This was not feasible in this case. A second approach, which we used here, was to drastically decrease the number of tests performed (dimensionality reduction) by testing the gene cluster GCESS values rather than all individual transcript values. The GCESS was used to rank the tumors into quartile (Q) groups and systematically compare Q1- (lowest GCESSs) vs. Q234, Q12 vs. Q34, and Q123 vs. Q4. (Fig. 3A) Using this approach, associations between outcome and immune cell GCESS could be systematically examined without prior knowledge of the type of association present. Examination of randomly generated and real, but permuted datasets using the full pipeline did not identify gene clusters, limiting association analyses to gene clusters defined by nonrandom transcript patterns.
Systematic examination of GCESSs with tumor outcomes identifies poor patient outcome association with high expression of cell-cycle cluster genes and low expression of immune cell cluster genes
Applying the GCESS approach to the dog data revealed an association between increased transcript levels of cell-cycle genes and decreased time to death. Significantly worse outcomes were associated with the higher GCESS groups in both the Q123 vs. Q4 and Q12 vs. Q34 comparisons. Similarly, the human outcome analysis revealed that the highest cell cycle GCESSs (Q4) also had a strong trend towards decreased patient survival times (Q123 vs. Q4). Analyzing the human Immune-1 GCESSs demonstrated that human patients with the lowest immune cell cluster GCESSs had significantly shorter times to death (Q1 vs. Q234). The Immune-2 GCESSs also showed a trend towards association between low GCESSs and decreased time to death (Fig. 3; Supplementary Figs. S12–S15).
Association between GCESS and patient outcome validated in an independent cohort
If the conserved tumor transcriptional variation and association between GCESS and patient outcome revealed by this methodology are generally observable in OS tumors, they should also be observable in older array hybridization-based data. To validate this hypothesis, we analyzed GSE21257, an Illumina mRNA array dataset, which contained genome wide expression data for 53 tumors from patients with known outcome data, including survival and metastasis (33). Following a strategy similar to the one used for the RNA-seq data, 14 highly correlated gene clusters were identified. Gene overlap analyses comparing clusters derived from array and RNA-seq data sets identified gene clusters from the array that corresponded to the RNA-seq defined Immune-1, Immune-2, cell-cycle, and muscle transcript clusters (Fig. 4A). KM survival analyses utilizing sample GCESSs generated from each of these array gene clusters again showed that patients whose tumors had low immune cell GCESSs were more likely to succumb to OS (Fig. 4B–D; Supplementary Figs. S16 and S17).
A KM-analysis examining the time to metastasis was also performed. Low immune cell GCESSs or high cell cycle GCESSs were strongly associated with faster progression to metastasis (P = 0.0001 and P = 0.02, respectively; Supplementary Figs. S18–S20). These results independently validate the initial human data set findings as well as the general utility of the methodology described independent of experimental platform.
Low immune-2 GCESS associated with metastasis in human and mouse samples
Following necropsy, many of the mice had observable metastases. Scores from samples with metastases had lower Immune-2 scores than samples where metastatic tumors were not observed. In both human datasets, Immune-2 scores were lower in samples where metastases were present at diagnosis. This difference became significant when patients with metastasis in less than one year were combined with patients where metastasis was observed at diagnosis. The significance became even stronger when patients who showed metastasis at any point were compared with patients where metastasis was not observed. These findings indicate that the Immune-2 score has prognostic potential for determining the likelihood of a tumor metastasizing in human patients (Fig. 4E).
OS-derived cell-cycle and immune cell GCESSs correlate with poor clinical outcomes across multiple tumor types
We hypothesized that the patterns of transcriptional variations we found to be associated with outcomes in OS may also be relevant across many different types of tumors. To determine if increased cell-cycle transcripts and decreased immune cell transcripts are each also associated with poor clinical outcome measures across different tumor types, GCESSs were calculated for each of the TCGA tumor datasets (using the genes comprising the respective human OS clusters to identify each tumor's cell cycle and immune cluster), which were then subjected to KM-survival analysis. High cell-cycle GCESSs were significantly associated with poor survival outcomes in kidney renal clear cell carcinoma (KIRC), and clear trends were apparent in liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), pancreatic adenocarcinoma (PAAD), head and neck squamous cell carcinoma (HNSC), and cutaneous melanoma (SKCM; Table 2). Low immune cell GCESSs from human OS gene cluster 1 were significantly associated with poor survival outcomes in SKCM, and clear trends were apparent in LUAD, colon adenocarcinoma (COAD), and cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC; Table 2). Low immune cell GCESSs derived from human OS gene cluster 8 were significantly associated with poor survival outcomes in SKCM and clear trends were visible in, HNSC, LUAD, LIHC, COAD, CESC, and breast invasive carcinoma (BRCA). These results indicate that the survival associations between cell-cycle and immune transcript expression levels observed in OS are also present in a wide range of tumor types and that the GCESS methodology is capable of observing these associations in datasets where improved sample power exists.
Many diverse genetic events, including a catalog of rare events, have been reported to lead to OS formation, progression, and metastasis. Our data indicate that loss of immune cell infiltration and increased levels of cell-cycle transcripts are two specific transcriptional prognostic biomarkers for metastasis, and overall poor survival of OS patients. These transcriptional signatures also had prognostic utility across many types of human tumors, suggesting they are common transcriptional markers of pathological progression. Further, these two transcriptional patterns appear to be independent (Supplementary Table S6; Supplementary Fig. S21).
Tumor transcription can be conceptualized as resulting from loss of control of a series of independent transcriptional modules (gene clusters). The GCESS technique described in this paper generates a single meaningful value for each module in a tumor. The GCESS value can then be used for phenotype association discovery, thereby minimizing the multiple testing risks in datasets underpowered for meaningful genome-wide association analyses. These types of errors are clearly described for SNP association studies but remain prevalent in genome-wide studies utilizing large numbers of parallel analyses. Empirical testing of single-gene–based strategies to associate tumor transcript level with outcome revealed that for every 20 random transcripts tested, 1 false positive prognostic transcript was likely to be identified (using an uncorrected P < 0.05). If 1,000 transcripts were tested, then this would result in 50 likely false positives. Many genome-wide analyses of RNA-seq or array data routinely involve testing thousands of transcripts, which would potentially result in hundreds of false positives. This may explain why many transcription-based prognostic tests fail to be reproducible in independent cohorts.
Our previous work identified increased levels of cell-cycle transcripts in cell lines generated from OS tumors with worse outcomes (13). We have now extended this work to OS tumors and provide a novel methodology for identifying transcriptionally related gene clusters, even if they are not the primary component present in the dataset, and testing their association with a variety of outcomes while minimizing errors from multiple testing. These results are highly consistent with the CINSARC signature (Complexity INdex in SARComas) predictive of metastasis-free survival. Many of the genes identified encode for proteins involved in cell-cycle/mitosis, cytokinesis, mitotic checkpoint, and DNA damage repair (34, 35).
Our data also indicate that cell-cycle–mediated events have more prognostic potential for dog disease progression compared with human disease. We speculate this is largely due to differences in therapeutic regimens and overall commitment to therapy; however, it may also represent a fundamental difference between human and dog immune responses. Another potential confounding factor is the general characteristic of dog OS to progress much faster than typically observed in either normal or late-onset human OS. Dogs with OS may not survive long enough for the role of the immune cells to become observable. In specific dog breeds, the immune system may also have a reduced capability to recognize and respond to tumor cells.
Transcriptional profiles derived from grossly dissected tumors have variable types and quantities of cells present, including stromal and immune cells in addition to cancer cells. Our GCESS technique provides a straightforward method to indicate the relative abundance of immune cells in the tumor tissue sample, which can also be generally applied to any component present in a transcriptional dataset. Importantly, we compared results from our method with a previously published tool (ESTIMATE) that infers tumor purity (36). We found that in the naturally occurring human and dog OS tumor samples there was a strong correlation between our immune cell GCESS value and the ESTIMATE immune score.
The GCESS method provides a useful tool to distinguish between osteosarcoma tumor subtypes that seem to have distinctly different likelihoods of metastatic progression as a result of the presence of decreased numbers of immune cells within the tumor stroma. Supporting our conclusion is the lack of expression of these genes in OS cell lines, high correlation to the ESTIMATE scores, validation in multiple independent datasets, and a growing body of literature on the potential, variable immunogenicity in cancers such as OS.
As a genomically chaotic disease, conventional wisdom suggests OS must provide an abundance of antigens that typically would result in increased recognition by the host immune system, yet somehow OS tumors and their metastases have proven capable of escaping immune surveillance. Our results reinforce the theory that immune cell infiltration and monitoring is a normal process in bone tissue and suggest that disruption of this process in a subset of tumors is associated with increased tumor mortality and progression to metastasis (Supplementary Fig. S22). Determining whether the observed decrease in immune cells is intrinsically controlled by the tumor, or extrinsically by the immune system, or both, is an important, future research question in OS and other cancers.
Analyses of the TCGA data showed that the association between decreased immune cell GCESS and outcome was strongest in cutaneous melanoma (SKCM), a data set containing a large number of metastatic samples. Importantly, melanoma patients respond to immune checkpoint blockade therapy (37), and our results indicate that this same approach may have clinical utility in some OS patients. Furthermore, these results support an association between absence of immune cells, metastasis, and clinical outcome. Where immune cell outcome associations were seen in the TCGA the effects were more significant using the T cell–enriched cluster relative to the monocyte enriched cluster. On the contrary, in our OS datasets the monocyte enriched gene cluster was more significant (lower P values) than the T cell–enriched gene cluster.
We conclude that using multispecies datasets provides unique opportunities and insights to identify biologically meaningful gene signatures and to identify aspects of the immune response that can be manipulated therapeutically to improve the quality of life and outcomes of children with bone cancer, as well as patients with other sarcomas and solid tumors.
Disclosure of Potential Conflicts of Interest
D.A. Largaespada is Chief Science Officer for Surrogen/Recombinetics Inc. and Chair of the Board for B-MoGen, Inc., reports receiving a commercial research grant from Genentech, has ownership interest (including patents) in Surrogen/Recombinetics, B-MoGen, Inc., Immunsoft, Inc., and NeoClone, and is a consultant/advisory board member for NeoClone Inc. and Immusoft, Inc. J.F. Modiano has ownership interest in a patent application. No potential conflicts of interest were disclosed by the other authors.
Conception and design: M.C. Scott, R.S. LaRue, L.G. Spector, D.A. Largaespada, J.F. Modiano, S. Subramanian, A.L. Sarver
Development of methodology: M.C. Scott, A.E. Sarver, R.S. LaRue, J. Varshney, D.A. Largaespada, J.F. Modiano, S. Subramanian, A.L. Sarver
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Varshney, N.K. Wolf, B.S. Moriarity, T.D. O'Brien, L.G. Spector, D.A. Largaespada, J.F. Modiano, S. Subramanian
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M.C. Scott, N.A. Temiz, A.E. Sarver, R.S. LaRue, S.K. Rathe, T.D. O'Brien, L.G. Spector, J.F. Modiano, S. Subramanian, A.L. Sarver
Writing, review, and/or revision of the manuscript: M.C. Scott, N.A. Temiz, A.E. Sarver, S.K. Rathe, B.S. Moriarity, T.D. O'Brien, L.G. Spector, D.A. Largaespada, J.F. Modiano, S. Subramanian, A.L. Sarver
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.C. Scott, R.S. LaRue, J. Varshney, J.F. Modiano, S. Subramanian, A.L. Sarver
Study supervision: D.A. Largaespada, J.F. Modiano, S. Subramanian, A.L. Sarver
Other (obtained funding to support research study): J.F. Modiano
The authors would like to thank John Garbe for technical assistance in the RNA-seq data processing pipeline, Rebecca S. Larue for running the RNA-seq data processing pipeline, Colleen Forster and the Clinical and Translational Science Institute's Bionet laboratory for optimization and performance of immunohistochemistry, the Minnesota Supercomputing Institute for computing time and storage space, and the University of Minnesota Genomics Center for sequencing services. This work was generously supported by the Zach Sobiech Osteosarcoma Fund, Keren Wyckoff Rein in Sarcoma Foundation, grants to A.L. Sarver NCI (CA211249), J.F. Modiano NCI (CA208529) and Morris Animal Foundation (D13CA-032), D.A. Largaespada NCI (CA113636) and ACS (#123939), and A.E. Sarver NIH (CA099936), S. Subramanian ACS (RSG-13-381-01) and the Masonic Cancer Center Comprehensive Cancer Center Support Grant (CA077598).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.