Abstract
Purpose: Improved understanding of the molecular basis underlying oral squamous cell carcinoma (OSCC) aggressive growth has significant clinical implications. Herein, cross-species genomic comparison of carcinogen-induced murine and human OSCCs with indolent or metastatic growth yielded results with surprising translational relevance.
Experimental Design: Murine OSCC cell lines were subjected to next-generation sequencing (NGS) to define their mutational landscape, to define novel candidate cancer genes, and to assess for parallels with known drivers in human OSCC. Expression arrays identified a mouse metastasis signature, and we assessed its representation in four independent human datasets comprising 324 patients using weighted voting and gene set enrichment analysis. Kaplan–Meier analysis and multivariate Cox proportional hazards modeling were used to stratify outcomes. A quantitative real-time PCR assay based on the mouse signature coupled to a machine-learning algorithm was developed and used to stratify an independent set of 31 patients with respect to metastatic lymphadenopathy.
Results: NGS revealed conservation of human driver pathway mutations in mouse OSCC, including in Trp53, mitogen-activated protein kinase, phosphoinositide 3-kinase, NOTCH, JAK/STAT, and Fat1-4. Moreover, comparative analysis between The Cancer Genome Atlas and mouse samples defined AKAP9, MED12L, and MYH6 as novel putative cancer genes. Expression analysis identified a transcriptional signature predicting aggressiveness and clinical outcomes, which were validated in four independent human OSCC datasets. Finally, we harnessed the translational potential of this signature by creating a clinically feasible assay that stratified patients with OSCC with a 93.5% accuracy.
Conclusions: These data demonstrate surprising cross-species genomic conservation that has translational relevance for human oral squamous cell cancer. Clin Cancer Res; 20(11); 2873–84. ©2014 AACR.
In this study, we present our genomic analysis of a carcinogen-induced mouse model of oral squamous cell carcinoma (OSCC) that has surprising conservation with human OSCC with significant translational implications. First, we identified that as a group, mouse OSCC cell lines contain the same driver mutations described in human OSCC. In addition, this analysis highlighted novel candidate cancer genes that may affect the biology of human OSCC. Second, we identified a mouse-based metastasis signature that was highly conserved in four independent human OSCC datasets. The translational relevance of these data for OSCC patients lies in the current deficiencies in risk stratification that result in many patients receiving unnecessary surgery or chemotherapy. Therefore, we developed a proof of principle, clinically relevant assay from this mouse signature that has significant potential for human OSCC risk stratification.
Introduction
Aggressive carcinogen-induced oral squamous cell carcinomas (OSCC) are difficult to treat due to locoregional recurrences. In contrast, more indolent lesions can be treated with single-modality surgical intervention with low morbidity and favorable outcomes. Histologic criteria, such as perineural or lymphovascular invasion and tumor depth, harbingers of early spread to regional lymph nodes, are commonly used to predict tumor behavior (1, 2). In addition, among clinical staging criteria, metastatic lymphadenopathy is one of the best predictors of a poor prognosis as it likely reflects aggressive primary tumor biology (3–5; seer.cancer.gov/statfacts/html/oralcav.html). This staging is especially challenging in early disease as 20% of these patients have pathologically identifiable disease that is clinically undetectable. Thus, all “high-risk” patients undergo neck dissection operations, which prove to be unnecessary in nearly 80% of clinically node-negative patients. However, there is a dearth of studies delineating markers predictive of lymph node involvement, and genetic stratification approaches are at an early stage (6, 7). In addition, the molecular underpinnings of aggressive OSCC growth and metastasis remain largely undefined (5, 8).
Next-generation sequencing (NGS) of human head and neck squamous cell carcinomas (HNSCC), of which OSCCs are a significant subset, has confirmed previously identified aberrations (e.g., TP53 and CDKN2A) and has also defined novel NOTCH and FAT gene mutations along with frequent phosphoinositide 3-kinase (PI3K) pathway mutations (9–14). In addition, other mitogenic cascades, such as RAS and JAK/STAT, are altered at lower frequencies. In contrast, mutations that distinguish indolent from aggressive human OSCC remain undefined. Genomic approaches to identify signatures that predict metastatic behavior in OSCC have been described but none have approached the clinical impact of tests available for breast cancer and ocular melanoma (15–19). Importantly, molecular clues reflecting metastatic regulators have not arisen from these biomarker studies.
To better understand the genomic basis of the aggressive OSCC phenotype, we used our recently described carcinogen-induced mouse oral cancer (MOC) cell line model (20). These MOC lines, which parallel the distinct phenotypes seen in human disease, are either CD44low and indolent, or CD44high and aggressive/metastatic. Herein, we used genomic approaches to (i) define parallels to human OSCC, (ii) understand the transcriptomic differences that underlie both phenotypes, and (iii) translate this information into a clinically relevant context. Remarkably, despite differences in species and carcinogen exposure, many of the same drivers implicated in humans were altered in MOC lines, revealing highly conserved pathways in OSCC tumorigenesis. In addition, we identified a gene expression signature associated with metastasis that was conserved from mouse to four distinct human datasets, uncovering potential promoters of aggressive OSCC. Finally, we successfully translated this signature into a platform for potential clinical application. Together, this analysis identifies novel pathways associated with aggressive growth and metastasis that may contribute functionally to cancer progression and lead to improved diagnostics.
Materials and Methods
Study approval
Mouse studies were performed and human specimens were obtained under approved protocols of Washington University Animal Studies and the Human Research Protection Office, respectively.
MOC cell line model
Cell lines were generated, characterized, and propagated as described (20). Further analysis since their initial description revealed that the MOC7 and MOC10 lines were derived from MOC2 and were thus renamed MOC2-7 and MOC2-10 (data not shown). MOC2LN was generated from a lymph node bearing metastatic MOC2. Primary C57BL/6 oral keratinocytes were generated by microdissection of oral mucosa from wild-type mice (Taconic), generating single-cell suspensions and growing to near confluence using keratinocyte media (CellNTec). Media were then changed to MOC line media for 24 hours before RNA isolation.
Exome capture and sequencing
Genomic DNA from MOC cells was extracted using the DNeasy Blood & Tissue Kit (Qiagen) and was constructed into Illumina libraries according to the manufacturer's protocol (Illumina Inc.). One microgram of the size-fractionated Illumina library was hybridized to the Agilent mouse exome reagent. After the 24-hour, 42°C hybridization, we added DynaBeads Streptavidin-coated magnetic beads to selectively remove the biotinylated Agilent probes and hybridized cDNA library fragments. The beads were washed, and the captured library fragments were released into solution using NaOH. The recovered fragments were PCR amplified according to the manufacturer's protocol using 11 PCR cycles. Illumina library quantification was completed using the KAPA SYBR FAST qPCR Kit (KAPA Biosystems). The quantitative PCR (qPCR) result was used to determine the quantity of library necessary to produce 180,000 clusters on a single lane of the Illumina GAIIx. One lane of 100-bp paired-end data was generated for each captured sample on the HiSeq2000 (Illumina).
Mutation detection and annotation
As normal tissue from the mice bearing the parental tumors was not available, these mutation calls were compared with the reference C57BL/6 genome for MOC1, 22, and 23 or to the CXCR3−/− exome that we generated in this analysis for MOC2 and 2-10. Sequence data from each tumor and the C57BL/6 genome were aligned independently to NCBI Build 37 of the mouse reference using BWA 0.5.9 and deduplicated using Picard 1.29 (http://picard.sourceforge.net). Sample variants were called using Samtools (Version 0.1.7a, revision #599). Somatic single-nucleotide variants (SNV) were detected using VarScan 2 (http://varscan.sourceforge.net) with the following parameters: min-coverage 30, min-var-freq 0.08, normal purity 1, P value 0.10, somatic P value 0.001, validation 1, and SomaticSniper. Somatic indels were extracted using GATK (Version 3 http://genome.cshlp.org/cgi/reprint/gr.107524.110v1) and Pindel. All predicted variants were filtered to remove false positives due to potential homopolymer artifacts (variants found in homopolymers with sequence length ≥ 5 were removed), strand-specific sequence artifacts, ambiguously mapped data (the average mapping quality difference between the reference-supporting reads and variant-supporting reads >30), and low-quality data at the beginning and end of reads (variants supported exclusively by bases observed in first or last 10% of the reads). Variants with an allele frequency <8% were removed. Initial variant transcript annotation was based on NCBI mouse build37. Because of lack of a true matched normal tissue, we had more somatic single-nucleotide polymorphism (SNP) than expected, so we removed “clustered” SNPs using our internal cluster filter, which allowed a maximum of 2 variants per 0.5 MB genome region and also filtered out mouse dbSNPs. To identify any sample-specific mutations, variant allele frequency was calculated for all the SNVs using an internally developed tool Bam2ReadCount (unpublished), which counts the number of reads supporting the reference and variant alleles. We accessed The Cancer Genome Atlas (TCGA) HNSCC mutational data from (http://gdac.broadinstitute.org/runs/analyses__latest/reports/cancer/HNSC-TP/MutSigNozzleReportCV/nozzle.html).
Expression microarrays
MOC line and primary oral keratinocyte total RNA were isolated using the RNeasy Kit (Qiagen) and subjected to gene expression profiling using Illumina MouseRef-8 Expression BeadChips (Illumina). Raw expression data were subjected to cubic spline normalization in GenomeStudio (version 2011.1). Microarray data are available in NCBIs GEO (GSE50041). Principal component analysis (PCA), ANOVA, and hierarchical clustering were performed with Partek Genomics Suite (version 6.6) using a significance of P < 0.01 as a threshold for gene inclusion. Significance analysis of microarrays (SAM), Version 4.0 was used to generate a ranked gene list, and a threshold of q < 10% was then used to select the most highly significant genes that were up- or downregulated in indolent versus aggressive mouse cell lines. These lists were used as signature gene sets for gene set enrichment analysis (GSEA). Human OSCC expression datasets were accessed via public databases, and information about patient selection, demographics, tumor staging, and treatment outcomes was reported in their original publications or on the TCGA data portal. For the TCGA dataset, we analyzed RNA-seq data from 134 cases of OSCC with sufficient annotation to determine regional lymph node involvement.
Quantitative reverse transcriptase PCR
Total RNA was isolated from MOC cell lines (RNeasy; Qiagen) and converted to cDNA using the High Capacity cDNA Reverse Transcription Kit (ABI). Taqman Gene Expression Assays with glyceraldehyde-3-phosphate dehydrogenase (Gapdh) controls were performed in duplicate using the Taqman Fast Advanced Master Mix (ABI) on an ABI Step One Plus. Relative expression for each probe was then calculated using the comparative Ct method.
Iterative GSEA-based enrichment
GSEA software and a complete description of the algorithm are provided online by the Broad Institute (http://broadinstitute.org/GSEA, ref. 21). Each published OSCC dataset was formatted for GSEA and classified by regional lymph node involvement or stage. GSEA was applied to each dataset using the two lists of significantly up- and downregulated genes in indolent versus aggressive mouse cell lines. The enrichment scores assigned by GSEA were then used to trim away genes that were oppositely enriched to produce two new, trimmed ranked gene lists derived from each human dataset. GSEA was performed again using the trimmed lists from each dataset against each of the other human datasets; e.g., the lists trimmed by the Fred Hutchinson Cancer Research Center (FHCRC) dataset were tested against the MD Anderson (MDA) dataset, and vice versa, resulting in six pairs of lists (Fig. 4A). This process was continued for another round, producing the final lists that had been trimmed based on enrichment of the mouse genes in all three human expression sets.
Development of support vector machine–based clinical assay
Five formalin-fixed, paraffin-embedded (FFPE) sections (10 μM) each from surgically treated patients with OSCC were obtained and tumor areas marked by a board-certified head and neck pathologist (J.S. Lewis Jr). These areas were microdissected and combined for each individual tumor. RNA was harvested using RecoverAll (Ambion) and converted to cDNA using the High Capacity cDNA Reverse Transcription Kit (ABI). Using pooled Taqman Gene Expression Assays [for 42 discriminating and 3 housekeeping genes (GAPDH, ACTIN, and UBC)] and Taqman Pre-Amp Master Mix (ABI), all samples were preamplified for 14 cycles. Samples were then diluted 20-fold and assayed in duplicate for individual genes using Taqman probes and Gene Expression Master Mix on an ABI Step One Plus. ΔCt values were calculated by subtracting the geometric mean of the mean Ct values of the three endogenous control genes from the mean ΔCt of each discriminating gene. The 42 genes were refined into the 19-gene set by support vector machine (SVM) analysis (http://www.chibi.ubc.ca/cgi-bin/nph-SVMsubmit.cgi) on a 17-tumor training set with known pathologic status. SVM was able to accurately classify the 17 tumors when submitted as unknowns using the 19-gene set data. With the trained SVM, we then submitted data from 13 independent tumors, again with known pathologic status, as unknowns for classification.
For fresh biopsy samples, we acquired RNA prepared from freshly frozen OSCC tumor samples at surgery from the Siteman Cancer Center Tissue Procurement Core. RNA was processed as above for analysis with discriminating and housekeeping genes. We were able to refine the 42 genes into a 10-gene list for fresh samples. Using these probes, we trained the SVM with new data from 16 fresh biopsy tumors. Subsequently, 18 independent test set OSCCs were analyzed and stratified as above.
Statistical analysis
Weighted voting was performed using GenePattern version 3.3.3 for classification of human tumor microarray data (http://www.broadinstitute.org/cancer/software/genepattern). For weighted voting, the gene expression data for the oral cancer aggressiveness and metastasis predictor (OCAMP-A) signature genes were collected from the UW/FHCRC published dataset. Weighted voting was performed on the entire set, followed by leave-one-out cross-validation, which identified a subset of 26 tumors with correct calls and high confidence (>0.4). This subset was then used as a training set to reclassify the rest of the samples. After identifying the 118 genes of the OCAMP-B signature, leave-one-out cross-validation was performed using the weighted voting algorithm to reclassify samples according to the OCAMP-B signature. The Kaplan–Meier survival analysis was performed on the reclassified UW/FHCRC and MDA samples using clinical follow-up data available with the datasets. Cross-tabulation was used to explore the relationship of OCAMP with clinical tumor–node–metastasis (TNM) stage. The impact of gene signature on survival was evaluated using the product limit Kaplan–Meier method. The log-rank test was used for comparison of survival curves and was evaluated at α level of 0.05. Stratified analysis was used to explore role of the signature within TNM clinical stage categories. Statistical analysis was performed in IBM-SPSS (v20.0).
Results
Murine OSCC model and NGS
Previously, we described a 7,12 dimethylbenzanthracene (DMBA)-induced mouse cell line model of OSCC where, upon transplantation, individual lines displayed fixed in vivo phenotypes (Fig. 1A). The indolent lines all formed tumors in RAG2−/− immunodeficient mice but only MOC1 and MOC22 grew in wild-type mice. Of the aggressive lines, MOC2-7 and MOC2-10 were derived from the MOC2 line, but the MOC2-10 line was included in the current analysis because it uniquely displayed lung in addition to lymph node metastasis. Note that we used the flank model for ease of tumor measurement, but lymph node metastasis was also observed upon orthotopic transplantation (20). MOC growth behaviors were consistent with human OSCC clinical behavior, leading us to investigate whether their somatic alterations were also congruent. NGS was performed on 3 indolent and 2 related aggressive/metastatic lines with excellent coverage depth (Supplementary Fig. S1A).
Mutation overview
Many somatic nonsynonymous SNVs (nsSNV) were identified in these lines as expected for carcinogen-treated tumors (Fig. 1B, Supplementary Fig. S1B, Supplementary Tables S1 and S2, and ref. 22). Observed mutations were consistent with the known predilection of DMBA for first, A:T → T:A (range, 48.7%–59.3% of total), and second, G:C → T:A (range, 14.4%–17.6%), transversions (Fig. 1C, Supplementary Table S3, ref. 23) overlapping with G:C → T:A mutations described for HNSCC (11).
Conservation of candidate driver mutations between MOC lines and human HNSCC
We next compared the MOC line mutations with the 32 most significantly mutated genes from the TCGA HNSCC effort (Supplementary Table S4). Surprisingly, as a group, the MOC lines bore mutations in many of these same genes with seven of the top ten genes altered in human HNSCC also carrying mutations in the MOC lines (Table 1). We also asked whether drivers described in human OSCC were present in MOC lines. Recent work has highlighted the NOTCH, PI3K, mitogen-activated protein kinase (MAPK), JAK/STAT, FAT families, and Trp53 as pathways critical for HNSCC (9–13). Again, MOC lines bore mutations in the same driver pathways described for HNSCC and changes were typical of the DMBA spectrum as described above (Fig. 1C, Supplementary Table S5). Whereas all MOC lines had mutations in Trp53, MAPK, and the FAT family of genes, only the indolent cell lines showed NOTCH, JAK/STAT, and PI3K pathway mutations. Mutations in FAT1 (12%) and FAT4 (10%) have been identified in human HNSCC (13) and these genes in addition to FAT2 and FAT3 were altered in MOC lines. Other candidate driver mutations included CASP8 in MOC22, which is altered in 8% to 10% of human HNSCC typically in association with HRAS mutations (11, 13). Indels did not segregate into either indolent or aggressive growth categories (Supplementary Table S6). Copy number and tumor heterogeneity could not be reliably evaluated, as normal tissue from the parental mice was not available. Thus, as a group, MOC lines had alterations in the most commonly mutated genes and driver pathways in HNSCC, reflecting an unexpected conservation in the mutational landscape, despite differences in the species, specific carcinogen used to derive the lines, and overall numbers of mutations.
. | MOC1 . | MOC22 . | MOC23 . | MOC2/2-10 . |
---|---|---|---|---|
1. CDKN2A | — | — | — | — |
2. TP53 | V170E and splice site | T122P, K384* | Splice site | E225* |
3. JUB | — | — | — | — |
4. CASP8 | — | R88S | — | — |
5. PIK3CA | Other | Other | Other | Other |
6. NSD1 | — | — | N265I | — |
7. NOTCH1 | splice site | Other | G1195R + splice site | — |
8. FAT1 | E2054D | D2756V | K243* | W3896R |
9. MLL2 | — | S4691* | Q1695L | Q3647L |
10. EPHA2 | — | K946* | — | — |
16. HRAS | Q61L | Q61L | Q61L | Other |
. | MOC1 . | MOC22 . | MOC23 . | MOC2/2-10 . |
---|---|---|---|---|
1. CDKN2A | — | — | — | — |
2. TP53 | V170E and splice site | T122P, K384* | Splice site | E225* |
3. JUB | — | — | — | — |
4. CASP8 | — | R88S | — | — |
5. PIK3CA | Other | Other | Other | Other |
6. NSD1 | — | — | N265I | — |
7. NOTCH1 | splice site | Other | G1195R + splice site | — |
8. FAT1 | E2054D | D2756V | K243* | W3896R |
9. MLL2 | — | S4691* | Q1695L | Q3647L |
10. EPHA2 | — | K946* | — | — |
16. HRAS | Q61L | Q61L | Q61L | Other |
NOTE: “Other” indicates that mutations in other genes involved in this pathway were identified. See Supplementary Table S5.
Novel candidate cancer genes
As common MOC line mutations may represent novel OSCC promoters, our analysis identified the A kinase anchoring protein Akap9, mediator complex component Med12l, and Myh6 as potential candidates (Supplementary Tables S5, S7, and S8). AKAP and MED protein families were mutated in the TCGA cohort (analyzed via cBio; ref. 24) in which we found that 9 members of the AKAP family were altered in 20.4% of tumors, with AKAP9 changes in 7% (Fig. 1D). Six components of the mediator complex were mutated in 14.7% of cases, with MED12L changes in 5% (Fig. 1E). Of note, MED1 mutations were previously identified in 5% of HNSCC (11). Importantly, MutSigCV analysis did not identify any of these genes as significantly mutated in TCGA when analyzed individually. However, together, the mutations in several AKAP family members and mediator components suggest that these pathways may be relevant promoters. Further analysis identified 5% and 9% rates for AKAPs and 13% and 17.5% for MEDs in two independent HNSCC datasets [(12) and (11), respectively]. Finally, very recent work using an RNAi in vivo screen identified MYH9 as a putative cancer gene in squamous cell carcinoma and we identified the related MYH6 gene as commonly mutated in MOC lines (25). The TCGA dataset shows equivalent mutation rates for both these genes. In addition, THSD7A, MUC5B, LRP2, and LAMA1 gene mutations were common in MOC lines (Supplementary Tables S7 and S8) and were also present in the TCGA cohort (Supplementary Fig. S2). These alterations illustrate not only the conservation of structural parallels between mouse and human OSCC but also the ability of the mouse model to highlight novel tumor promoters.
Growth phenotype–specific mutations
Although NGS confirmed MOC and human OSCC conservation, analysis of mutations specific to indolent or aggressive lines and lymph node versus lung metastatic lines was inconclusive likely due in part to the limited numbers of samples (Supplementary Tables S7 and S9). We next approached this question by comparing our mouse sequencing data with mutations unique to lymph node metastasis negative (N0, 62 patients) versus positive (N+, 84 patients) OSCC samples from TCGA. We identified 3,273 N0 and 3,097 N+ mutations exclusive to each nodal status subset (Supplementary Table S10). There was no significant difference in the average number of mutations between N0 and N+ patients regardless of smoking status (Supplementary Fig. S3 and Supplementary Table S11). This analysis showed 17 genes commonly mutated in mouse indolent and human N0 tumors and 55 common genes mutated in mouse aggressive and human N+ tumors (Supplementary Table S12). However, none of these common genes were mutated at high frequency in the human N+ or N0 datasets (Supplementary Tables S13 and S14). Finally, isolated analysis of the human TCGA data also showed that nodal status–specific mutations occur infrequently in both the metastatic and nonmetastatic tumors (from cBio portal, data not shown). Together, this analysis suggests that the aggressive OSCC phenotype is not clearly a result of somatic exome changes but rather may be driven instead by epigenetic or transcriptional alterations.
Expression microarray analysis
Next, we interrogated MOC lines and primary C57BL/6 oral keratinocytes to identify transcriptomic promoters of aggressiveness. As expected, PCA showed that all related aggressive lines clustered together. The indolent MOC1 and MOC22 clustered near each other and were only slightly separated from normal oral keratinocytes. In contrast, MOC23 showed a distinct distribution consistent with it being a unique subtype that grows only in RAG2−/− immunodeficient mice (Fig. 2A).
Unsupervised hierarchical clustering demonstrated a metastasis signature (Fig. 2B) and SAM identified specific differentially expressed genes at a false-discovery rate of <10% (Supplementary Fig. S4 and Supplementary Table S15), which were confirmed by ANOVA (P ≤ 0.01). The mouse signature was divided into significantly downregulated (260) or upregulated (218) genes in indolent versus aggressive lines (Supplementary Table S16). Expression patterns for genes described in human metastatic tumors, such as MUC1, SLPI, and TACSTD2, were conserved in mouse OSCC (26–28). We identified several upregulated transcription factors, including Eomes, Nkx2-3, Foxa1, Hnf1b, Meis1, and E2f4, that were previously not described in OSCC and may be central to controlling global programs of aggressiveness.
The dramatic differences in expression between indolent and aggressive lines for Nkx2-3 and Foxa1 were confirmed by quantitative real-time PCR (qPCR; Fig. 2C). Finally, Hoxb7 and Bmp4 were implicated as candidate promoters of lung metastasis as they were overexpressed in MOC2-10 versus MOC2 (Fig. 2D) and are two key candidate promoters of distant metastasis in the MOC model.
Cross-species signature conservation
We next asked whether the mouse signature predicted outcomes in patients with human OSCC. Using microarray data from a carcinogen-induced, human papilloma virus–negative cohort of 97 patients with OSCC (UW/FHCRC), we stratified patients based on enrichment of the mouse signature by weighted voting (16). Using the Kaplan–Meier analysis, disease-specific survival (DSS) was statistically significantly worse for subjects in the group with the more aggressive signature as compared with those with the less aggressive signature (50% versus 80% 5-year DSS, Fig. 3A, P < 0.01). Thus, we termed this signature the Oral Cancer Aggressiveness and Metastasis Predictor (OCAMP-A).
To identify overlap between mouse and human signatures, we used GSEA, which allows comparison of data from different platforms and species (21). Three independent datasets of human OSCC (UW/FHCRC, 97 patients), MD Anderson (MD, 71 patients, ref. 29), and TCGA (134 patients) were first classified by stage (UW/FHCRC) or regional lymph node metastasis as surrogate markers of tumor aggressiveness and then independently analyzed by GSEA with OCAMP-A. In all cases, there was enrichment of OCAMP-A in human tumors (Fig. 3B–D) that was statistically significant for the TCGA (normalized enrichment score, NES = 1.6; nominal P value < 0.001) and UW/FHCRC (NES = 1.43; nominal P value < 0.05) but not MD datasets (NES = 0.9 and nominal P value = 0.57).
Despite the high P value for the MD dataset, we noted substantial overlap of enriched genes among the three human datasets. We used iterative GSEA-based enrichment (Fig. 4A, Supplementary Fig. S5A–S5F) to identify commonly enriched genes in all three human datasets and eliminate mouse-specific transcripts [Fig. 4B and C, designated OCAMP-B (118 genes)]. Because initial analysis for the MD set was not significant, we reassessed significance of OCAMP-B. The Kaplan–Meier analysis showed a statistically significant worse overall and disease-specific survival for patients with the aggressive OCAMP-B signature (Fig. 4D, P < 0.001 and Supplementary Fig. S6, P < 0.05). Similar analysis on patients with OSCC from the TCGA dataset was limited by the availability of follow-up data. However, OCAMP-B was predictive of both OS and DSS in the UW/FHCRC dataset (P < 0.01, Supplementary Fig. S7).
In current OSCC management, clinically node-negative patients (the cN0 patient—i.e., those with no suspicious neck lymph nodes by palpation or imaging) undergo neck dissection surgery depending on specific features of the primary tumor to pathologically identify occult nodal metastases. As this approach leads to unnecessary surgery in nearly 80% of patients, our goal is to identify gene expression in the primary tumor predictive of outcomes and occult metastatic disease among newly diagnosed and untreated patients. Thus, we used clinical rather than pathologic TNM staging (available only for the MD Anderson dataset) and found that OCAMP-B status defines unique prognostic subgroups within clinical stages I/II and III/IV (Fig. 4E and Supplementary Table S17A). Multivariate modeling showed a statistically significant independent effect of OCAMP-B such that patients with an aggressive signature were 3.9 times more likely to die (adjusted for TNM HR value; 95% confidence interval, 1.52–10.03; Supplementary Table S17B). Finally, we sought to compare the performance of the OCAMP-B signature with histopathologic grading. Of 18 patients who were cN0 but pathologically N+ (pN+, i.e., clinically did not have nodal disease but harbored disease on pathologic analysis after neck dissection), all 18 had the aggressive gene signature. In addition, of 24 cN+ and pN+ patients, all harbored the OCAMP-B aggressive signature. Finally, of 24 patients who were cN0 and pN0, all had the indolent signature (Fig. 4F, Supplementary Table S17C). Given that OCAMP-B was generated with overlaps between 3 datasets, the above stratification was not surprising. Toward independent confirmation of OCAMP-B performance, we used a 22-patient OSCC dataset from the University of Pennsylvania (30) and saw excellent stratification (21/22 tumors correct) with respect to lymph node metastatic status (Fig. 5A, Supplementary Table S18). Robust follow-up data for more complex analysis were not reported for this dataset. Together, these findings demonstrate that OCAMP-B allows disease outcome stratification at initial presentation based on results from the primary tumor.
A multigene assay to stratify OSCC by lymph node status
As knowledge of lymph node metastatic status of OSCC is critical in clinical decision making, including whether to suggest neck surgery for early stage cancers, we next asked whether the mouse signature could be translated into a diagnostic test as described for ocular melanoma (Fig. 5B, ref. 17). For training sets, we used 17 FFPE or 16 fresh biopsies specimens from the primary tumor of patients with OSCC from Washington University with known pathologic status. Using a Taqman platform and an SVM-learning algorithm (31), 42 discriminating genes were refined into 19 or 10 that classified the FFPE or fresh tumor set with 100% accuracy, respectively. Test sets of 13 independent FFPE or 18 fresh biopsy tumors were then subjected to the assay and analyzed as unknowns by the trained SVM. Accurate lymph node classification of 12 of 13 FFPE or 17 of 18 fresh tumors was achieved (Fig. 5C, Supplementary Table S19). Importantly, no N+ samples were classified as N0 and the two N0 samples classified as N+ were from larger T3 and T4 tumors. These data represent proof of principle that the OCAMP can be translated for clinical stratification of patients with OSCC.
Discussion
Here, we used genomic approaches including exome sequencing and transcriptional profiling to delineate the genetic basis of aggressive growth in the MOC model and in particular focused on its fidelity with human OSCC. Two obvious constraints of our approach were the limited number of different lines and that the aggressive lines were related. Despite these limitations, the MOC lines manifested the breadth of clinical scenarios observed in human OSCC. Our data showed that MOC lines as a group contained mutations in the majority of commonly mutated HNSCC genes, in driver pathways described in human OSCC and, in addition, highlighted potential new driver mutations. No recurrent mutations associated with aggressive growth were identified. However, transcriptomic analysis revealed a mouse metastasis signature that contained both known and novel candidates for promoters of aggressiveness. Even though this signature was derived from a small number of cell lines, we were surprised it was conserved in four independent human datasets including from the TCGA effort. Using iterative GSEA, we then developed a consensus 118-transcript metastasis predictor. Finally, using the mouse signature, we were able to develop a preliminary clinically applicable test for genetic stratification of OSCC. Thus, these data have significant potential implications for understanding the biology, prognosis, and therapy of human OSCC.
Recent genomics studies to define distinct HNSCC oncogenic driver classes revealed a major functional role for the PI3K pathway (10, 12). We found PI3K pathway mutations in MOC22 and 23 but their functional relevance has not been evaluated. As expected, all MOC lines shared RAS pathway mutations due to the predilection of DMBA for RAS mutations (20). Relevant to the HRAS-mutant group of human OSCC (12), MOC22 was found to have both HRAS and CASP8 mutations. Interestingly, KRAS was mutated in aggressive MOC lines and NRAS in MOC23; however, these alleles are less common in human OSCC. Importantly, based on the initial description of enhanced ERK1/2 activation in CD44+ aggressive lines (20), we have initiated a MEK inhibitor (trametinib) clinical trial in patients with OSCC (NCT01553851). Future studies will address the functional contribution of putative drivers. Our focus on the genetic contribution of a conserved mouse to human transcriptional signature supports the existence of a distinct program of aggressiveness in MOC2 and MOC2-10 and human OSCC that is independent of common driver mutations.
Analysis of the aggressiveness biomarker panel revealed several intriguing candidate promoters, most notably the lineage-specific transcription factor Nkx2-3 that, in addition to other tissues, is normally expressed in the developing tongue, floor of mouth, and mandible (32). The NKX family of homeodomain transcription factors has been implicated in a variety of malignancies with lung adenocarcinoma serving as a prime example where Nkx2-1 has a dual role in tumor promotion and metastasis (33). Interestingly, recent work has shown that the Foxa1 pioneer factor partners with Nkx2-1 (34), and our analysis shows that Foxa1 is also upregulated in Nkx2-3–expressing aggressive tumors. Finally, with regard to MOC2-10, we identified Hoxb7, which has been implicated in poor outcomes in OSCC (35) and Bmp4, which promotes breast cancer metastasis (36), as candidate regulators of lung metastasis in OSCC. Thus, this approach of murine modeling is highly useful, and it supports the generation of additional lines to assess the frequency of recurrent mutations, to extend genotype–phenotype correlations, and to undertake further detailed mechanistic work. Finally, although carcinogenesis with DMBA results in a high number of mutations, it clearly identifies conserved cross-species pathways in contrast with defined oncogene-driven models, perhaps because it allows the natural biology of OSCC to emerge.
Several groups have used expression analyses on human OSCC specimens to develop predictive genetic biomarkers (15, 16, 19). Van Hooff and colleagues prospectively showed that their signature had 86% sensitivity and 44% specificity for metastasis in early-stage OSCC (19). This signature had an 89% negative predictive value for metastasis in early-stage OSCC lesions, but clinical application of the test would still result in either under or overtreatment of significant numbers of patients. Thus, the exact utility of this assay in clinical practice remains to be defined and more robust assays are desirable. The OCAMP signature offers a unique biomarker for human OSCC, as it does not have significant overlap with work described to date (Supplementary Table S20). We successfully translated the OCAMP signature into a robust assay using a straightforward platform and anticipate rapid progression to larger samples and eventual validation in a prospective fashion. Further work focused on defining the molecular basis of OSCC aggressiveness using the high-fidelity MOC platform may identify additional novel therapeutic approaches for human OSCC.
Disclosure of Potential Conflicts of Interest
J.S. Lewis Jr received honoraria from the speakers' bureau of the American Society of Clinical Pathology and has an expert testimony from Stamos and Trucco LLP. A patent application for the metastasis signature is pending (Washington University/R. Uppaluri). No potential conflicts of interest were disclosed by the other authors.
Disclaimer
This publication is solely the responsibility of the authors and does not necessarily represent the views of NIH/National Center for Research Resources.
Authors' Contributions
Conception and design: M.D. Onken, A.E. Winkler, N.P. Judd, G.P. Dunn, J.F. Piccirillo, R. Uppaluri
Development of methodology: M.D. Onken, A.E. Winkler, E.R. Mardis, R. Uppaluri
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.E. Winkler, V. Chalivendra, J.H. Law, E.R. Mardis, R. Uppaluri
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M.D. Onken, A.E. Winkler, K.-L. Kanchi, V. Chalivendra, J.H. Law, C.G. Rickert, D. Kallogjeri, G.P. Dunn, J.F. Piccirillo, J.S. Lewis Jr., E.R. Mardis, R. Uppaluri
Writing, review, and/or revision of the manuscript: M.D. Onken, A.E. Winkler, K.-L. Kanchi, V. Chalivendra, C.G. Rickert, D. Kallogjeri, N.P. Judd, G.P. Dunn, J.F. Piccirillo, J.S. Lewis Jr., E.R. Mardis, R. Uppaluri
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.D. Onken, A.E. Winkler, K.-L. Kanchi, V. Chalivendra, R. Uppaluri
Study supervision: R. Uppaluri
Acknowledgments
The authors thank Dr. J. William Harbour for support during early phases of microarray analysis. R. Uppaluri thanks Dr. Richard Chole for sustained support.
Grant Support
This work is supported by The Genome Institute at Washington University for sequencing and analysis via a center-initiated project (NIH/NHGRI 5U54HG003079; Richard Wilson, principal investigator). NIH P30DC04665 supports the RCAVS Histology (Brian Faddis and Dorothy Edwards) and biostatistics cores (to D. Kallogjeri). It is also supported by the Siteman Cancer Frontier Fund, Tissue Procurement (NCI Cancer Center Support Grant #P30 CA91842) and GTAC (NCCR ICTS/CTSA UL1TR000448 and NCI-CA91842). M.D. Onken is supported by NIGMS #R01GM38542 (John Cooper, principal investigator).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.