Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a data set of 39,787 solid tumors sequenced using a clinically targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivaling the performance of WGS-based methods. GDD-ENS can also guide diagnoses of rare type and cancers of unknown primary and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows could provide clinically relevant tumor-type predictions to guide treatment decisions in real time.

Significance:

We describe a highly accurate tumor-type prediction model, designed specifically for clinical implementation. Our model relies only on widely used cancer gene panel sequencing data, predicts across 38 distinct cancer types, and supports integration of patient-specific nongenomic information for enhanced decision support in challenging diagnostic situations.

See related commentary by Garg, p. 906.

This article is featured in Selected Articles from This Issue, p. 897

Knowledge of a patient's tumor type is crucial for clinical decision-making in cancer, guiding therapy choice and clinical trial enrollment. Tumor-type diagnosis is typically performed via histologic and immunohistochemical (IHC) analyses, but these techniques may be inconclusive, especially if a patient's tumor is poorly differentiated, or when distinguishing between independent primary tumors and clonally related metastases. For cancers of unknown primary (CUP), which account for 3% to 5% of patients diagnosed in the United States today or 60,000 to 100,000 cases per year, the true tissue of origin is unknown and therefore cannot guide treatment, limiting the number of available FDA-approved therapies and resulting in overall poor prognosis (1–3). Overall, as incorrect tumor-type diagnoses can lead to misinformed treatment decisions and poor patient outcomes, there is a clinical need to improve tumor-type diagnosis methods, as a form of enhanced decision support for precision oncology (4).

Large-scale, pan-cancer genomic analyses have shown that many genomic alterations and signatures are associated with specific tumor types. Some alterations are largely specific to a single type, e.g., APC loss-of-function mutations in colorectal cancer and TMPRSS2-ERG fusions in prostate cancer (5–7). Other alterations can narrow the search: TERT promoter mutations, for example, are predominantly found in glioblastoma, bladder, thyroid, and skin cancers (8). These associations between genomic alterations and tumor type could provide valuable diagnostic information when other indicators of tumor type are inconclusive. As such, clinical genomic profiling of tumors, already commonly used to identify therapeutically targetable alterations and guide the selection of precision oncology treatments, can be leveraged to guide and refine cancer-type diagnoses (9).

A major limitation of existing genomic-based tumor-type classifiers is their lack of clinical feasibility and scope. Currently, the most accurate classifiers rely on whole-genome sequencing (WGS) or whole-exome sequencing (WES) which are not in routine clinical use, in part due to the associated cost of performing WGS or WES for every patient (10–17). Even as the sequencing cost decreases, the infrastructure required for clinical tumor WGS/WES analysis, variant interpretation, and reporting does not exist at scale. As a result of these limitations, many of these classifiers are derived from small data sets across limited cancer types, likely not representative of true cancer type frequencies seen in larger patient populations, and therefore rendering a significant proportion of patients with cancer unpredictable within the context of the model (18–20). The expected performance of these models on excluded cancer types (i.e., those not available for classification by the model) is heretofore undescribed.

Clinical genomic sequencing is often performed using cancer gene sequencing panels, which target specific mutations in genes known to be recurrently altered in different cancer types. Unlike WGS or WES, these can be readily applied at a broad scale. Notably, Memorial Sloan Kettering's Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) panel is an FDA-authorized clinical test that profiles somatic and germline alterations in over 500 known cancer genes and has been used to sequence more than 75,000 patients to date (21). Developing a classifier from MSK-IMPACT data specifically would resolve many limitations related to clinical feasibility, relying on readily accessible clinical genomic data derived from a large patient cohort representative of true cancer-type incidences.

As such, our Center previously established a random-forest classifier to assist in genome-derived diagnosis (GDD-RF) to predict tumor type from MSK-IMPACT data. GDD-RF provides accurate predictions across 22 major cancer types, with only slightly decreased performance from WGS-based models, and has since been prospectively incorporated into a clinical decision-support pipeline (22). Since GDD-RF was released, the MSK-IMPACT data set more than quadrupled in size, potentially enabling improved accuracy on the original 22 types, expansion to previously excluded tumor types, and inclusion of additional genomic features. Additionally, deep-learning architectures have been shown to improve performance on similar tasks in the context of WES, WGS, and transcriptomic data, but have yet to be explored in the context of targeted cancer gene panels (11, 13, 16, 23). Here, we describe Genome-Derived-Diagnosis Ensemble (GDD-ENS): a deep-learning system designed to directly address these challenges, as well as others through specific modifications for enhanced clinical utility.

Our new model now distinguishes 38 tumor types with higher accuracy than GDD-RF. These 38 types represent 97% of solid tumors available for classification within the MSK-IMPACT cohort. Despite relying only on targeted panel sequencing, the prediction accuracy for GDD-ENS equals or exceeds the accuracy of WGS classifiers. GDD-ENS also allows for easy incorporation of additional, patient-specific clinical information and provides guidance on rare type and CUP diagnoses. Significantly, GDD-ENS has already been implemented prospectively to provide predictions and prediction-specific feature importance values to clinicians in real time.

Clinical Cohort and Development of Ensemble Neural-Net Model

To build GDD-ENS, we aggregated a discovery cohort of 42,694 solid tumors profiled by MSK-IMPACT between 2014 and 2020 with adequate sequence coverage and tumor content for GDD-ENS development (Fig. 1A; Supplementary Table S1–S3; Methods). Previously, we had developed a model, GDD-RF, on a smaller cohort and feature set (22). Against this baseline, we added newly collected samples, as well as new genomic features that we had previously established were predictive of tumor type (13). Improvements to the model, features, and training set were performed sequentially from the baseline GDD-RF model (Supplementary Table S4; Methods). The larger training set also allowed us to add 16 additional cancer types to the model; doing so reduces the proportion of excluded patients from 15% of the discovery cohort (for 22 types) to 3.1%. These excluded samples represent 45 different ultra-rare cancer subtypes, the majority of which have fewer than 15 relevant samples per type. After removing the excluded samples, we split the resulting discovery cohort into training and testing sets for model development. To account for the large variability in the overall sample size for the 38 types, we upsampled smaller cancer types to include a minimum of 350 examples per type during training. Any samples from patients included in the training set were removed from our testing set, so that the final sizes of both sets were 32,816 samples in the training set and 6,971 in the test set (Fig. 1A and B).
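The upsampling step described above can be illustrated with a minimal sketch. It assumes that "upsampled" means drawing extra examples with replacement from each cancer type until that type reaches 350 examples, as stated in the Figure 1B caption; the function name, the toy labels, and the toy counts are hypothetical and are not taken from the GDD-ENS code.

```python
import random
from collections import defaultdict

def upsample_minimum(samples, labels, min_per_type=350, seed=0):
    """Resample rare types with replacement until each has min_per_type examples."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_type[label].append(sample)

    out_samples, out_labels = [], []
    for label, members in by_type.items():
        shortfall = max(0, min_per_type - len(members))
        # Keep every original example, then draw the shortfall with replacement.
        drawn = members + [rng.choice(members) for _ in range(shortfall)]
        out_samples.extend(drawn)
        out_labels.extend([label] * len(drawn))
    return out_samples, out_labels

# Toy data: one common type and one rare type (hypothetical labels and counts).
samples = [f"s{i}" for i in range(520)]
labels = ["common"] * 500 + ["rare"] * 20
up_samples, up_labels = upsample_minimum(samples, labels)
```

In this sketch, types that already meet the minimum are left unchanged, so only the rare type grows. Applying the resampling to the training set only, after the patient-level split, keeps duplicated examples out of the test set.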

Figure 1.

Overview of GDD-ENS model. A, Cohort diagram, detailing samples used to form a training and testing cohort. Hematologic refers to blood-based cancers (e.g., leukemias) sequenced using MSK-IMPACT before the development of a separate nonsolid tumor assay. B, Training set distribution of cancer types included after expanding GDD-RF to GDD-ENS, colored by model inclusion. Any type with fewer than 350 examples is upsampled via replacement during training. NSCLC, non–small cell lung cancer; GIST, gastrointestinal stromal tumor; SQC, squamous cell carcinoma; SCLC, small cell lung cancer; PNET, pancreatic neuroendocrine tumor; Lu-NET, lung neuroendocrine tumor; GI-NET, gastrointestinal neuroendocrine tumor; Carc., carcinoma; MPNST, malignant peripheral nerve sheath tumor. C, Distribution of informative feature types. “Other” refers to clinical features or numerical features representing overall mutational burden across categories. Allele-specific hotspot features annotate the specific hotspot mutation (e.g., KRAS G12C), whereas gene-specific hotspot features only specify the gene altered (KRAS hotspot). CNAs, copy-number alterations. D, GDD-ENS workflow from patient to output. All nonclinical features are derived from the MSK-IMPACT sequencing assay and then fed into GDD-ENS. GDD-ENS reports the top three tumor-type predictions with confidence estimates, along with the 10 most important features for the top prediction on a patient-specific basis. Workflow overview created with BioRender (https://www.biorender.com/).

Genomic features were derived from MSK-IMPACT data in several broad categories: mutations and indels, focal amplifications and deletions, broad copy-number gains and losses, structural rearrangements and fusions, mutational signatures, tumor mutation burden, microsatellite instability (MSI) score, and sex (Fig. 1C; refs. 21, 24–27). Feature categories were annotated to varying degrees of specificity; for example, we included separate gene-level features indicating the presence of any mutation, truncating mutation, or known cancer hotspot mutation for single-nucleotide variants and indels across the 341 genes included in all MSK-IMPACT panels, as well as both single-base substitution counts and precomputed scores for 8 mutational signatures (Supplementary Table S5). We show that incorporating features from all broad categories improves model performance (Supplementary Fig. S1; Supplementary Methods). Our final model includes 4,487 informative and interpretable features derived from the MSK-IMPACT panel and analysis pipeline.
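To make the idea of annotating one alteration at several levels of specificity concrete, the following is a minimal sketch of how a single mutation might set gene-level, truncating, gene-specific hotspot, and allele-specific hotspot indicators. The feature naming scheme, the `encode_mutations` helper, and the toy hotspot list are assumptions for illustration only; they do not reproduce the actual GDD-ENS feature table (Supplementary Table S5).

```python
def encode_mutations(mutations, hotspots):
    """Turn a list of mutation calls into binary features at several specificities."""
    features = {}
    for m in mutations:
        gene = m["gene"]
        features[f"{gene}_mut"] = 1                    # any mutation in the gene
        if m["truncating"]:
            features[f"{gene}_truncating"] = 1         # truncating mutation in the gene
        if (gene, m["change"]) in hotspots:
            features[f"{gene}_hotspot"] = 1            # gene-specific hotspot
            features[f"{gene}_{m['change']}"] = 1      # allele-specific hotspot
    return features

# Hypothetical hotspot list and mutation calls for one tumor.
hotspots = {("KRAS", "G12C"), ("KRAS", "G12D")}
mutations = [
    {"gene": "KRAS", "change": "G12C", "truncating": False},
    {"gene": "APC", "change": "R1450*", "truncating": True},
]
features = encode_mutations(mutations, hotspots)
```

Features absent from the returned dictionary are implicitly zero. Under this encoding, a KRAS G12C tumor and a KRAS G12D tumor share the gene-level and gene-specific hotspot features but differ in the allele-specific one.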

The architecture of our final model is a hyperparameter ensemble of 10 individual multilayer perceptrons (MLP; Fig. 1D, Methods). We specifically chose this architecture because individual neural-net models have been shown to have poorly calibrated confidence estimates and ensembling is a well-established technique to improve calibration while also improving model performance (28). The training set was divided into 10 training and validation folds, where each validation set represented 10% of the full training set, and each model was trained on the remaining 90%. Models were initialized at the same starting parameters and allowed to optimize to different final parameter values as a result of their unique training and validation sets. The selected hyperparameters varied considerably among the models (Supplementary Table S6); often, this diversity improves generalization and out-of-distribution detection (29). For each sample, the 10 individual MLPs provide a softmax output across all potential tumor types, which is then averaged across all 10 models to return a final confidence estimate for each type. The type with the highest confidence after averaging represents the predicted type for the sample.
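The averaging step can be sketched as follows. This is an illustrative example, not the GDD-ENS implementation: it assumes each ensemble member exposes raw logits per tumor type, and the three toy models, three type names, and logit values are hypothetical (the paper uses 10 MLPs and 38 types).

```python
import math

def softmax(logits):
    """Numerically stable softmax over one model's logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(per_model_logits, type_names):
    """Average per-model softmax outputs and return the top type with its confidence."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    mean = [sum(p[k] for p in probs) / n_models for k in range(len(type_names))]
    best = max(range(len(type_names)), key=mean.__getitem__)
    return type_names[best], mean[best], mean

# Three hypothetical ensemble members scoring one sample across three types.
type_names = ["Colorectal", "Pancreatic", "NSCLC"]
per_model_logits = [
    [2.0, 1.0, 0.0],
    [0.5, 1.5, 0.0],
    [3.0, 0.0, 0.0],
]
predicted, confidence, mean_probs = ensemble_predict(per_model_logits, type_names)
```

Because each softmax output sums to one, the averaged vector also sums to one and can be read as a confidence per type. In this toy case one member disagrees with the other two, which pulls the averaged confidence for the winning type down relative to the most confident member.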

Classification Accuracy

We report the overall performance of our classifier on the held-out test set from the discovery cohort (Table 1; Supplementary Table S7). We measured performance using a wide variety of metrics but focused on overall prediction accuracy (proportion of correct predictions across all samples) and macro-precision (class-averaged precision, or the precision of each type averaged, across all 38 types) during optimization and development. The performance of each individual MLP model before ensembling ranged from 73.9% to 77.0% accuracy and 57.5% to 62.9% macro-precision on the test set (Supplementary Table S6). After ensembling, accuracy was 78.8% and macro-precision was 64.2% (Table 1). Accuracy increases when expanding to the second-highest (87.0%) and third-highest confidence prediction (90.2%) for each sample, as does macro-precision (75.8% and 78.1%, respectively). Moreover, each GDD-ENS prediction is returned with a confidence estimate, representing the model's estimated probability of correct classification, which we can use to distinguish between high-confidence (≥0.75) and low-confidence (<0.75) predictions (Supplementary Fig. S2; Supplementary Table S8; Supplementary Methods). In most clinical scenarios, we would only consider high-confidence predictions as potentially informative, and low-confidence predictions are less likely to factor into clinical decision-making. The average prediction confidence for all GDD-ENS predictions was 0.84, with 71.9% (5,013/6,971) of all test samples yielding high-confidence predictions. When restricting to predictions above the high-confidence threshold, accuracy increases to 92.7% and macro-precision to 87.7% (Fig. 2A). Strikingly, GDD-ENS's high-confidence prediction accuracy and macro-precision are comparable with accuracies reported for WGS and WES-based classifiers, despite being generated from a panel-based data set (Table 1; Methods; refs. 10, 11, 13–16). 
For each method, we also compared the number of predictable types and the in-distribution proportion, or the percentage of discovery cohort samples represented by each specific model's cancer type labels (Methods). GDD-ENS has the largest number of cancer types and the highest in-distribution proportion, indicating that it can provide relevant predictions for a larger proportion of patients with cancer than existing models. GDD-ENS also represents a significant improvement in accuracy, calibration, and in-distribution proportion from our original model, across all confidence levels (Table 1).
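The metrics discussed above can be written out in a few lines. The sketch below computes overall accuracy, macro-precision, and the same metrics restricted to predictions at or above the 0.75 confidence threshold stated in the text. The toy labels and confidences are hypothetical, and treating a never-predicted type as contributing zero precision is an assumed convention rather than one confirmed by the paper.

```python
from collections import Counter

HIGH_CONF = 0.75  # high-confidence threshold stated in the text

def accuracy(y_true, y_pred):
    """Proportion of correct predictions across all samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision(y_true, y_pred, classes):
    """Precision computed per type, then averaged with equal weight per type."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    # Assumption: a type that is never predicted contributes a precision of 0.
    per_class = [correct[c] / predicted[c] if predicted[c] else 0.0 for c in classes]
    return sum(per_class) / len(classes)

def high_confidence_subset(y_true, y_pred, conf, threshold=HIGH_CONF):
    """Keep only the predictions whose confidence meets the threshold."""
    keep = [i for i, c in enumerate(conf) if c >= threshold]
    return [y_true[i] for i in keep], [y_pred[i] for i in keep]

# Hypothetical test set with three types.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "A", "C", "B"]
conf = [0.90, 0.80, 0.95, 0.50, 0.85, 0.60]

overall_acc = accuracy(y_true, y_pred)
overall_macro = macro_precision(y_true, y_pred, ["A", "B", "C"])
hc_true, hc_pred = high_confidence_subset(y_true, y_pred, conf)
hc_acc = accuracy(hc_true, hc_pred)
hc_fraction = len(hc_true) / len(y_true)
```

Macro-precision weights every type equally, so rare types with poor precision lower it more than they lower overall accuracy, which is consistent with the gap between the two metrics reported for the full test set.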

Table 1.

GDD-ENS performance and comparison with WGS/WES classifiers.

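| Model | Data set | Types | Accuracy | Macro-prec. | % In-dist. | % High conf. |
|---|---|---|---|---|---|---|
| DeepTumour (13) | WGS | 24 | 91% | 91% | 73 | - |
| CUPLR (11) | WGS | 33 | 89% | 78% | 92 | - |
| Salvadores-SVM (10) | WGS | 18 | 91% | 86% | 74 | - |
| MuAt (14) | WGS | 24 | 89% | 87% | 73 | - |
| Soh-SVM (15) | WES | 28 | 77% | 78% | 84 | - |
| CPEM (16) | WES | 31 | 84% | 83% | 85 | - |
| MuAt (14) | WES | 20 | 64% | 66% | 74 | - |
| GDD-RF (22) | MSK-IMPACT | 22 | 74% | 71% | 85 | - |
| GDD-ENS | MSK-IMPACT | 38 | 79% | 64% | 97 | - |
| CUPLR (11), high conf. | WGS | 33 | 96% | 82% | 92 | 82 |
| GDD-RF (22), high conf. | MSK-IMPACT | 22 | 91% | 87% | 85 | 62 |
| GDD-ENS, high conf. | MSK-IMPACT | 38 | 93% | 88% | 97 | 72 |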

NOTE: WGS and WES-based methods perform better than panel-based approaches in general, as these approaches generate more data that can be used to derive additional informative features, like regional mutation density. However, high-confidence GDD-ENS predictions perform similarly to or better than most models, on a larger set of cancer types that covers a greater percentage of the solid tumor data set. Bold indicates best performing models for each metric.

Abbreviations: Macro-prec., macro-precision; % In-dist., in-distribution proportion, i.e., the percentage of our solid tumor discovery cohort predictable by the classifier's specific training labels; % High conf., proportion of outputs above the high-confidence threshold.

Figure 2.

GDD-ENS performance across cancer types. A, Row-normalized confusion matrix of high-confidence predictions across cancer types. Off-diagonal values give the proportion of each row (predicted cancer type) that belongs to each column (true type). The off-diagonal value for true prostate cancer predicted as ependymoma reflects a single prostate cancer sample that was predicted as ependymoma, likely due to low tumor burden, as it just passed the inclusion thresholds for tumor content. Due to rounding, some rows may not sum to 100. Top bar graphs indicate test set count (top) and type-specific recall across confidence levels. The left bar graphs represent high-confidence positive predictive value (high conf. PPV, right) and type-specific precision. Sorted by overall type precision on both axes. B, Calibration plot for GDD-single models and GDD-ENS ensemble. The X-axis represents maximum confidence after binning outputs into five equally sized ranges, and the Y-axis represents the overall accuracy of all predictions within that range. The blue line represents the GDD-ENS model; dark gray line and shaded regions represent mean GDD-Single accuracy and 95% confidence interval, respectively. Expected calibration error (ECE) calculated as per Methods. C, Shapley value score distributions for correct GDD-ENS predictions across individual types (left, middle) and all predictions (right). Importance score proportion represents the proportion of total Shapley value scores after summing the absolute Shapley value per feature across the specified subset, normalized as per Methods.

We next sought to characterize the classifier accuracy at the level of individual cancer types. Individual type performance ranges from 14.3% to 90.7% before accounting for confidence thresholds (Supplementary Fig. S3; Supplementary Table S9). The majority of incorrect predictions are low-confidence (75.3%); when only looking at high-confidence predictions, all cancer types have ≥50% precision.

Incorrect high-confidence predictions often reflect tumor biology (Fig. 2A). For example, 10.1% of all high-confidence misclassifications arise between endometrial and ovarian cancer (37/366 predictions). It is well established that high-grade serous or endometrioid ovarian cancers have very similar genomic characteristics to endometrial cancers, making these cancer types particularly challenging to distinguish (30). Despite these errors, GDD-ENS correctly identified at least one endometrial cancer that was initially misannotated as ovarian cancer. Endometrial cancer was predicted with 0.998 confidence in a patient who was originally diagnosed with ovarian cancer (endometrioid subtype), but a subsequent clonally related tumor specimen was obtained from the same patient and rediagnosed as endometrial cancer, suggesting that the original diagnosis was incorrect.

Additionally, two tumors annotated as non-uveal melanoma received high-confidence and nearly high-confidence predictions of uveal melanoma (0.91 and 0.74, respectively). This was unexpected, as the genomics of uveal melanoma is notably distinct from other melanomas in that they harbor GNAQ/GNA11 mutations rather than alterations in the MAPK pathway more common in cutaneous, acral, and mucosal melanoma. However, on further inspection, these two cases both represented the very rare primary CNS melanoma subtype. Previous studies on primary CNS melanomas reported mutation profiles that are more similar to uveal melanomas and are notably distinct from other melanoma subtypes (31). Indeed, a genomic review of the primary CNS melanoma cases found that both harbored either GNAQ or GNA11 mutations, explaining their GDD-ENS predictions of uveal melanoma.

To ensure that our classifier was applicable across multiple races and ethnicities, we used genetically inferred ancestry as a proxy to assess accuracy across four major ancestry groups (32). Despite an overrepresentation of European ancestry within the discovery cohort, the proportion of high-confidence predictions and high-confidence accuracy within each ancestry was not significantly different from the overall high-confidence proportion and accuracy for any ancestry (Fisher exact test, P = 0.05) (Supplementary Fig. S4A and S4B; Supplementary Methods). Similarly, we show that GDD-ENS performance is largely invariant across histopathologically estimated tumor purities. With the exception of the 6% of our test set representing very low purity tumors (≤10%), where GDD-ENS performance decreases to 70% overall (89% at high confidence), we observed a consistently high accuracy across all other purities (Supplementary Fig. S5; Supplementary Table S10; Methods; Supplementary Methods). Finally, we also explored whether GDD-ENS could be directly applied to other genomic panels using a masked feature analysis and publicly available data from AACR-GENIE (Methods; Supplementary Methods; refs. 33–35). We found that the differences among panel designs result in only minor losses in accuracy and that GDD-ENS can be applied without modification to data from other centers, but performs best for panels that report copy number and use a matched normal reference similar to MSK-IMPACT (Supplementary Table S11).

Lastly, we confirmed that GDD-ENS is well calibrated by comparing the calibration of the full ensemble model with that of each individually trained MLP prior to ensembling (Fig. 2B). In a well-calibrated model, the confidence value output by the model is approximately equal to the probability that its prediction is correct. Calibration is particularly important when model predictions could be used to guide clinical decisions. Both the calibration plots and the expected calibration error (ECE) show that GDD-ENS is well calibrated and confirm previous reports that ensemble models are better calibrated than individual MLPs (28).
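As a worked illustration of the calibration metric, the sketch below computes ECE in its commonly used form: predictions are grouped into equal-width confidence bins, and the absolute gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The five-bin default mirrors the binning described in the Figure 2B caption; the exact procedure in Methods may differ, and the toy values are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted mean |accuracy - mean confidence| over equal-width confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((c, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(acc - mean_conf)
    return ece

# Hypothetical predictions: four at 0.9 confidence (three correct), two at 0.5 (one correct).
confidences = [0.9, 0.9, 0.9, 0.9, 0.5, 0.5]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

Here the 0.5 bin is perfectly calibrated (accuracy 0.5), while the 0.9 bin is overconfident by 0.15 and holds four of the six predictions, giving an ECE of 0.10. A lower value indicates that reported confidences track observed accuracy more closely.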

Prediction-Specific Feature Importance

To provide insight into the factors driving GDD-ENS's predictions, we report feature importance using Shapley values (36). Shapley values represent the proportion of each output that is contributed by individual features for nonlinear systems; their interpretation is similar to the proportion of variance explained in regression models (Methods).

We validated this approach for our model by aggregating Shapley values for all correctly predicted instances of each tumor type within the test set, identifying the strongest positive associations per type (Methods). Known associations between genomic alterations and tumor types were recaptured through this analysis across all cancer types (Fig. 2C; Supplementary Fig. S6; Supplementary Table S12). We specifically highlight colorectal cancer and cutaneous squamous cell carcinoma, two types with drastically different sample sizes and type-specific accuracies for which Shapley value analysis nonetheless identified relevant associations. Colorectal cancer is one of the most represented cancer types included in our training set, and we find that APC truncating mutations are the most important features for colorectal cancer predictions, as expected (5, 6). Cutaneous squamous cell carcinoma has a smaller sample size with only 8 total correct predictions in our test set, but its top predictive features, UV-signature and NOTCH1/2 mutations, are also consistent with the genomics for this cancer type, indicating that GDD-ENS learns important associations even from small numbers of training examples (37).

To understand what features are driving correct predictions overall, we first aggregated Shapley values across broad feature categories, previously described and highlighted in Fig. 1C, across all correct predictions. We found that, in aggregate, features derived from gene-level mutations were the most informative category, followed closely by copy-number alterations (CNA; Fig. 2C). When considered per cancer type, CNAs were the most important for 24 of 38 cancer types, and mutations for 11 types (Supplementary Figs. S6–S8; Supplementary Tables S12–S14; Methods; Supplementary Methods). This discrepancy occurs because CNAs are often the most important for rarer cancer types, whereas mutations are informative across almost all included cancer types (Supplementary Fig. S7).
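The category-level aggregation can be sketched as follows, assuming per-prediction Shapley values have already been computed by some attribution tool. The sketch sums absolute Shapley values per broad feature category across a set of predictions and reports each category's share of the total, following the description in the Figure 2C caption. The feature names, category labels, and values are hypothetical, and the normalization used in Methods may differ.

```python
from collections import defaultdict

def category_importance(shap_rows, feature_category):
    """Share of total absolute Shapley value attributed to each feature category."""
    totals = defaultdict(float)
    for row in shap_rows:  # one {feature: shapley_value} dict per prediction
        for feature, value in row.items():
            totals[feature_category[feature]] += abs(value)
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}

# Hypothetical mapping from features to broad categories.
feature_category = {
    "APC_truncating": "mutation",
    "KRAS_hotspot": "hotspot",
    "chr8q_gain": "CNA",
}
# Hypothetical Shapley values for two correct predictions.
shap_rows = [
    {"APC_truncating": 0.4, "KRAS_hotspot": 0.1, "chr8q_gain": -0.1},
    {"APC_truncating": 0.2, "KRAS_hotspot": 0.0, "chr8q_gain": 0.2},
]
proportions = category_importance(shap_rows, feature_category)
```

Using absolute values means that features pushing a prediction away from a type count toward importance as much as features pushing toward it. Restricting `shap_rows` to the correct predictions of a single type gives a per-type breakdown, whereas passing all correct predictions gives the aggregate view.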

For only three cancer types was a broad category other than mutations or CNAs the most informative. Unsurprisingly, the most informative feature category for melanoma was mutational signatures, whereas uveal melanoma and sex-cord stromal tumors were driven by hotspot-based features. In uveal melanoma predictions, these were GNAQ/GNA11 hotspot mutations, a well-established genomic association as previously discussed (31). In sex-cord stromal tumors, these were FOXL2 hotspot mutations, specifically FOXL2 C134W. This mutation has recently been shown to drive the formation of granulosa cell tumors (the largest subtype within the sex-cord stromal cancer type) in mice (38), and its prominence within this large, pan-cancer prediction set suggests that these hotspots are a potential differential biomarker for the disease.

We then analyzed whether any features were significantly associated with the prediction of certain cancer types. For this, we expanded our analysis to all predictions, regardless of correctness, and conducted Mann–Whitney U tests to determine if any of the top-scoring features were significantly associated with certain cancer types (Methods; Supplementary Methods). After applying Bonferroni correction, 1,655 features were associated with specific cancer types with a P value of <0.05, which are presented in Supplementary Table S15. We broadly characterized significant associations across features relating to signatures, structural variants, and arm-level copy numbers in Supplementary Methods and Supplementary Fig. S9A–S9C.
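The testing procedure can be illustrated with a small, self-contained sketch: a two-sided Mann-Whitney U test comparing a feature's Shapley scores in predictions of one cancer type against all other predictions, followed by Bonferroni correction across tests. This is a simplified version that uses the normal approximation without tie or continuity correction; the paper does not state which variant was used, and a statistics library would normally be preferred in practice. The score values and function names are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical Shapley scores for one feature: predictions of one type vs. all others.
scores_in_type = [0.10 + 0.01 * i for i in range(10)]
scores_other = [0.01 * i for i in range(10)]
u_stat, p_raw = mann_whitney_u(scores_in_type, scores_other)
adjusted = bonferroni([p_raw, 0.04, 0.5])
```

In this toy case every score in the first group exceeds every score in the second, so U equals the product of the two group sizes and the raw P value is small. Bonferroni correction then scales each P value by the number of feature and type combinations tested before comparing it with 0.05.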

Finally, to justify our choice of feature set and show the benefit of incorporating driver alterations at multiple levels of specificity, we highlight the Shapley values for KRAS-associated features. KRAS is a well-known driver gene present across many cancer types, but the landscape of KRAS alterations varies significantly across cancers (39). Using the strongest Shapley effect scores as before, we observe this variation when evaluating correct predictions across pancreatic cancer, non–small cell lung cancer (NSCLC), esophagogastric cancer, and colorectal cancer (Supplementary Fig. S10; Supplementary Methods). KRAS G12C is only associated with NSCLC predictions, whereas KRAS G12D, G12R, and G12V are more often implicated in pancreatic cancer, and KRAS amplifications are of the highest importance to esophagogastric cancer. All 341 genes included in the MSK-IMPACT panel are assessed and included as features in this way, allowing for GDD-ENS to learn and distinguish between specific driver alterations across cancer types, and likely accounting for the increase in performance compared with previous driver-based classifiers (13).

Performance on Excluded Samples

In clinical applications of tumor-type classifiers, it is important to consider all samples, not just those with tumor types represented in the classifier output. GDD-ENS's 38 cancer types are nearly comprehensive, with only 3.1% of samples (n = 1,321) in the discovery cohort coming from ultra-rare cancer types excluded from the 38 distinct classifiable types. Often it is possible to identify these so-called ‘out-of-distribution’ samples based on GDD-ENS's output. GDD-ENS predictions on these cases had an average confidence of 63.3%, with only 471 of 1,321 representing high-confidence predictions (35.7%; Fig. 3A), i.e., the majority of the excluded cancer type samples are correctly labeled as low-confidence, thus minimizing their impact in practice. Among the 471 high-confidence predictions assigned to the excluded types, we observed that cases were often predicted as more common cancer types within proper organ systems, likely because most cancers from the same organ system arise from similar molecular pathways (Fig. 3B). For example, over half of all small bowel and appendiceal carcinoma samples were predicted as colorectal. We therefore annotated all high-confidence excluded samples into one of 10 corresponding broad organ systems and determined how frequently the GDD-ENS predictions aligned with these annotations (Supplementary Table S13; Methods; ref. 40). For this analysis, we removed 36 samples that did not correspond to any of the 10 major organ systems described, resulting in 435 samples with distinct organ system annotations. Overall, 66% of samples were predicted within the expected organ system (288/435), indicating that high-confidence predictions on rare cancer samples can still guide predictions for the majority of cancer types and organ systems, even when the true correct type is not included in training.
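The organ-system concordance analysis can be sketched as below. The example keeps only high-confidence predictions on excluded samples that carry an organ-system annotation, maps each predicted type to its organ system, and counts agreements. The mapping, the case records, and the function name are hypothetical; the actual annotations are those described in the Supplementary material.

```python
def organ_system_concordance(cases, type_to_system, threshold=0.75):
    """Count high-confidence predictions whose organ system matches the annotated one."""
    kept = [
        c for c in cases
        if c["confidence"] >= threshold and c["true_system"] is not None
    ]
    hits = sum(type_to_system[c["predicted_type"]] == c["true_system"] for c in kept)
    return hits, len(kept)

# Hypothetical mapping from predicted type to broad organ system.
type_to_system = {
    "Colorectal": "Gastrointestinal",
    "NSCLC": "Thoracic",
    "Breast": "Breast",
}
# Hypothetical excluded-type samples with their GDD-ENS-style predictions.
cases = [
    {"true_system": "Gastrointestinal", "predicted_type": "Colorectal", "confidence": 0.92},
    {"true_system": "Gastrointestinal", "predicted_type": "NSCLC", "confidence": 0.80},
    {"true_system": "Thoracic", "predicted_type": "NSCLC", "confidence": 0.60},
    {"true_system": None, "predicted_type": "Breast", "confidence": 0.90},
    {"true_system": "Thoracic", "predicted_type": "NSCLC", "confidence": 0.75},
]
hits, total = organ_system_concordance(cases, type_to_system)
concordance = hits / total
```

The low-confidence case and the case without an organ-system annotation are dropped before counting, mirroring the removal of unannotated samples described above. With the counts reported in the text, the same ratio is 288/435, or about 66%.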

Figure 3.

GDD-ENS performance on excluded cancer samples. A, Empirical cumulative distribution function comparing the probability of in-distribution test set and excluded samples (left). 72% of the in-distribution test set are high confidence, compared with only 36% of excluded samples. The relative fraction of excluded samples among the entire discovery cohort indicates the vast majority of solid tumors are in-distribution (right); conf., confidence. B, Row-normalized confusion matrix of the organ system of the true type for excluded samples vs. the high-confidence GDD-ENS prediction organ system. Colors represent the broad organ systems annotated; many predictions are conserved within correct organ systems. Rows and columns are ordered by the total number of excluded type samples within each organ system (right), and the number of conserved predictions for each cancer type (top). GDD-ENS predictions correspond to types from the same organ system for 290/440 excluded samples with specific organ system annotations (66%).

Similarly, we report the performance of GDD-ENS on samples previously removed from training due to very low tumor content. Although we observed an expected decrease in performance due to likely false-negative mutation calls, GDD-ENS still achieves 85% accuracy for the 40% of samples with high-confidence predictions (Supplementary Table S16).

Incorporating Additional Clinical Data into Classification

Nongenomic features such as clinical history, histopathologic features, and site(s) of metastasis are often available at diagnosis and may further inform cancer type. A natural way to incorporate these features is to compute a new cancer type prior distribution conditioned on these data. For example, to capture information about metastatic sites, we computed the cancer type distributions conditioned on each of 19 major metastatic sites using the annotated site of biopsy used for next-generation sequencing. These priors suggest that metastatic site is often informative about tumor type—55% of skin metastases submitted for genomic profiling originated from primary breast cancers, whereas 57% of pleural metastatic samples submitted originate from primary NSCLC (Fig. 4A; Supplementary Fig. S11; ref. 40).

Figure 4.

Adaptable prior distribution enables the incorporation of nongenomic information for enhanced predictions. A, Proportion of all in-distribution discovery cohort samples that are either primary or metastatic (top) or have broad histologic annotations (middle, top) per cancer type. Underlying distributions of metastatic site (middle) and histology (bottom) for all in-distribution discovery cohort samples across 19 metastatic sites and 2 histologic subtypes. Heatmaps are row normalized but only show 10 types (full heatmap in Supplementary Fig. S11). Met., metastatic; Prim., primary; Adeno., adenocarcinoma; SQC, squamous cell carcinoma. B, Overview of adaptable prior methodology. C, Flow of results for combination prior using both metastatic site and histology for all test set examples. Arrow base represents preadjustment category; arrowhead represents post-adjustment. Circle arrows indicate the number of samples that did not change categories after adjustment, i.e., 4,618 samples that were correct and high confidence before and after applying the prior. D, Walkthrough of a patient with head and neck squamous cell cancer predicted bladder by GDD-ENS with 0.85 confidence. After applying priors specific to the site of metastasis and annotated histology, the sample was correctly predicted with high confidence (0.96). Overview and flow diagram created with BioRender (https://biorender.com/).

To allow users to incorporate this and other such information in practice, GDD-ENS permits its predictions to be flexibly recalibrated to account for these features without additional training on nongenomic information. In this “adaptable prior” model, GDD-ENS is augmented with a naïve Bayes classifier that incorporates the GDD-ENS output and adjusts for any additional nongenomic features, where each feature is represented by its own distribution over cancer types (Fig. 4B; Methods).
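One way such a recalibration could work is sketched below. The exact formulation is given in Methods and is not reproduced here; this sketch assumes that each nongenomic feature is conditionally independent of the genomic evidence given cancer type, so that the genomic output is multiplied by the ratio of each feature-conditioned distribution to a baseline distribution and then renormalized. The function name, arguments, and toy numbers are hypothetical.

```python
def apply_adaptable_priors(genomic_probs, baseline_prior, feature_priors):
    """Recalibrate genomic class probabilities with nongenomic evidence.

    genomic_probs:  {type: p(type | genomics)} from the classifier.
    baseline_prior: {type: p(type)} in the training cohort (must be > 0).
    feature_priors: list of {type: p(type | feature)}, one per feature.
    Assumes a naive Bayes factorization (an assumption of this sketch).
    """
    scores = {}
    for cancer_type, p in genomic_probs.items():
        score = p
        for prior in feature_priors:
            # Bayes' rule: p(feature | type) is proportional to
            # p(type | feature) / p(type).
            score *= prior.get(cancer_type, 0.0) / baseline_prior[cancer_type]
        scores[cancer_type] = score
    total = sum(scores.values())
    if total == 0:
        # No type is compatible with the priors; fall back to genomic output.
        return dict(genomic_probs)
    return {t: s / total for t, s in scores.items()}


# Toy two-class example with made-up numbers.
adjusted = apply_adaptable_priors(
    genomic_probs={"Bladder": 0.85, "HNSCC": 0.15},
    baseline_prior={"Bladder": 0.5, "HNSCC": 0.5},
    feature_priors=[{"Bladder": 0.1, "HNSCC": 0.9}],
)
```

Because the classifier itself is untouched, any number of feature distributions can be supplied, including none, in which case the genomic output is returned unchanged.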

We demonstrate the utility of this approach using two broad, nongenomic sources of information: metastatic biopsy site and tumor histology category. For the metastatic biopsy site, we use the 19 major groupings described above to map metastatic biopsy site labels to a broader anatomic site category. We then aggregate all biopsy site-cancer type pairs across all metastatic samples within the training set to compute distributions for the frequency of each cancer type at the biopsy site (Methods). For the histology-based prior, we focus on carcinomas, which represent the majority of our solid tumor data set, and developed priors based on the two major carcinoma subtypes: adenocarcinomas and squamous cell carcinomas. Augmenting genomic predictions with prior information from these broad systems allows for improvements in accuracy and macro-precision when applied to relevant subsets of the test set (Table 2; Supplementary Fig. S12). Furthermore, the frequency of high-confidence, correct predictions increases. In cases where samples had both histology and metastatic site annotations, we incorporated multiple sources of prior information within our naïve Bayes classifier approach, which returned similar performance (Fig. 4C and Table 2).
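The aggregation of biopsy site and cancer type pairs into per-site frequency distributions might look like the following sketch. The function name and the optional `pseudocount` smoothing parameter are hypothetical additions for illustration; the text does not state whether any smoothing was used.

```python
from collections import Counter, defaultdict


def site_conditioned_priors(pairs, all_types, pseudocount=0.0):
    """Estimate p(cancer type | metastatic biopsy site) from training pairs.

    pairs: iterable of (site, cancer_type) tuples from metastatic samples.
    all_types: list of all cancer types the classifier can output.
    pseudocount: hypothetical smoothing term (0.0 gives raw frequencies).
    """
    counts = defaultdict(Counter)
    for site, cancer_type in pairs:
        counts[site][cancer_type] += 1

    priors = {}
    for site, type_counts in counts.items():
        total = sum(type_counts.values()) + pseudocount * len(all_types)
        priors[site] = {
            t: (type_counts[t] + pseudocount) / total for t in all_types
        }
    return priors


# Toy data with made-up labels and counts.
toy_pairs = [("Pleura", "NSCLC"), ("Pleura", "NSCLC"), ("Pleura", "Breast")]
toy_priors = site_conditioned_priors(toy_pairs, ["NSCLC", "Breast", "Bladder"])
```

The same counting approach would apply to the histology-based prior, with the histologic category taking the place of the biopsy site.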

Table 2.

Adaptable prior results.

| Prior | Rel. samples | | Accuracy | % High conf. | High conf. accuracy |
|---|---|---|---|---|---|
| Met. site | N = 2,166 | Before | 78.2% | 72.0 | 92.2% |
| | | After | 80.8% | 76.3 | 92.6% |
| Histology | N = 4,574 | Before | 82.1% | 75.3 | 93.5% |
| | | After | 84.3% | 81.6 | 92.5% |
| Combined | N = 1,518 | Before | 82.1% | 75.4 | 94.1% |
| | | After | 87.7% | 85.2 | 93.8% |
| Any relevant | N = 5,222 | Before | 80.4% | 73.9 | 92.8% |
| | | After | 83.6% | 80.7 | 92.2% |

NOTE: Accuracy metrics before and after applying adaptable prior to relevant samples for either metastatic site, histology, or both, where “Combined” represents the samples with both types of annotations, and “Any relevant” represents all test samples after applying either metastatic, histology, or combined priors. “Rel. samples” indicates the number of samples within the test set which could potentially be recalibrated given the samples’ annotations. Overall accuracy and high-confidence proportion increase for all three priors.

Abbreviations: Conf., confidence; Rel., relevant.

We highlight a specific case where applying both adaptable priors was instrumental in producing an accurate prediction. As demonstrated in Fig. 4D, we identified a lung metastasis from a patient with head and neck squamous cell carcinoma (HNSCC) that GDD-ENS initially predicted as bladder cancer with 0.85 probability, representing a high-confidence, incorrect prediction. After applying both priors, the sample was predicted correctly as HNSCC with 0.96 confidence. Therefore, applying both available priors not only corrected the prediction but also increased the prediction confidence enough to surpass the high-confidence threshold for the correct type. Given the success of adjusting outputs given both priors when available, we have made our final adaptable prior extension flexible to multiple sources of prior information, including discrete priors that could be derived from a patient's specific clinical history.

CUP Analysis

An important potential application of our method is for CUP samples, so that tumor types can be identified through confident genomic-based predictions even when histology is inconclusive. Within the MSK-IMPACT discovery cohort set, there are 1,586 unknown/mixed cancer types, of which 1,441 samples are defined as true CUP samples derived from 1,400 patients. To assess unbiased GDD-ENS performance on CUP samples, we removed any corresponding in-distribution samples from CUP patients within our training set and retrained GDD-ENS before generating predictions across all the CUP cohorts (Methods). Although we cannot definitively assess the accuracy of these predictions because most of these tumors remain unclassified by standard histology-based criteria, we identified 26 patients where the CUP sample was followed by a subsequent positively diagnosed MSK-IMPACT sample performed on the primary tumor, in which the paired samples were found to be clonally related (Methods). GDD-ENS predicted the later-confirmed tumor type for 21 of 26 patients (80.8% accuracy), with the correct indication appearing in the top three predicted types for 24 of 26 cases. Fifteen of 26 samples were predicted with high confidence, 100% of which represented correct predictions (15/15). These correct predictions spanned 8 distinct cancer types, indicating that GDD-ENS has the capability to accurately guide CUP diagnoses and treatment decisions when histologic conclusions are indeterminate for multiple cancers (Supplementary Table S17).

This high accuracy encouraged us to expand our analyses to all remaining CUPs, focusing on the high-confidence predictions to quantify model power in the absence of later-confirmed diagnoses. GDD-ENS returned high-confidence predictions for 45.6% of samples, across 36 of 38 potential cancer types (Fig. 5A). As these cases represent extremely challenging diagnostic scenarios, we further analyzed samples with extremely high-confidence GDD-ENS predictions; overall, 400 predictions on CUP samples had >95% confidence, and 255 had >99% confidence, suggesting broad clinical utility of GDD-ENS when applied to cases of indeterminate histopathology-based diagnosis. In total, 906 CUP samples had relevant metastatic site annotations within the 19 previously defined metastatic site labels, 320 had broad histology annotations (e.g., squamous cell carcinoma not otherwise specified), and 190 samples had both, so we applied our adaptable prior distributions to recalibrate CUP predictions. The proportion of high-confidence predictions increased after applying each prior (from 47.1% to 50.8% for the metastatic biopsy prior, from 42.1% to 54.1% for the histology prior, and from 42.1% to 58.4% for the combination prior), indicating that adaptable priors for metastatic site and histology can help increase confidence and inform predictions in CUP patients when available.

Figure 5.

GDD-ENS predictions on CUP patients can identify targetable alterations. A, Distribution of GDD-ENS predictions for all 1,441 CUP patients. B, Bar plot indicating the total number of patients with targetable alterations after GDD-ENS predictions (bottom axis). Only 550 CUP patients had potentially actionable alterations at Level 3B or higher before GDD-ENS predictions; the top axis indicates the overall proportion of these patients with identified alterations. C, Most frequently identified targetable alterations among high-confidence GDD-ENS predictions only. Targetable alterations representing a specific allele change or structural variant are annotated as such; otherwise, counts represent a combination of broad oncogenic mutations across multiple alteration types and allele changes.

Lastly, a major difficulty in CUP treatment is that most precision oncology therapies are only approved by the US Food and Drug Administration for specific cancer-type indications. For example, although a patient with an ERBB2 amplification and a breast cancer diagnosis would have on-label access to trastuzumab as well as newer antibody–drug conjugates such as T-DM1 and T-DXd, these potent drugs are not approved for patients with CUP (41). We sought to determine if GDD-ENS predictions could increase the number of CUP patients potentially eligible for available therapies.

We used OncoKB, a precision oncology knowledge base, to annotate the clinical actionability of observed genomic alterations for both CUP and the GDD-ENS predicted cancer-type diagnosis, with Level 1 representing an FDA-recognized biomarker predictive of response to an FDA-approved drug within the given cancer type and Level 3B representing a predictive biomarker in another cancer type (42). Among the 1,441 CUP patients, only 550 had potentially targetable alterations at Level 3B or higher, of whom only 71 (12.9% of 550) harbored Level 1 alterations. These Level 1 alterations were linked to pan-cancer FDA approvals, namely BRAF V600E (dabrafenib plus trametinib), NTRK fusions (entrectinib or larotrectinib), RET fusions (selpercatinib), and microsatellite instability-high tumors (pembrolizumab). Thus, the remaining 479 CUP patients could potentially benefit from tumor-type identification.

After assigning GDD-ENS predictions to these CUP samples, we identified Level 1 actionable alterations in 101 additional patients with high-confidence tumor-type predictions, representing a 2.4-fold increase (Fig. 5B). The most common targetable alterations identified among these high-confidence predictions were KRAS G12C mutations (NSCLC), ERBB2 amplifications (esophagogastric and breast cancer), and PIK3CA oncogenic mutations (breast cancer; Fig. 5C). Notably, an additional 12 CUP tumors with BRAF V600E mutations were predicted with high confidence to be colorectal cancer, a diagnosis that would change the approved Level 1 treatment from dabrafenib plus trametinib to encorafenib plus cetuximab (43). For those patients in which the top GDD-ENS cancer type did not return any actionable alterations, we expanded our assessment to include lower-confidence predictions, or targetable alterations within the top 3 predicted cancer types (Fig. 5B, Methods). With this broadest scope, 263 of 550 (47.8%) patients with potentially targetable alterations were connected with FDA-recognized treatments, a 3.7-fold increase from the initial 71 patients identified prior to tumor-type classification. Although GDD-ENS predictions alone are not sufficient to result in the change of a CUP diagnosis to a definitive type for every case, this analysis indicates the potential clinical utility of our classifier in clarifying challenging clinical cases using purely genomic panel sequencing data.
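The fold increases quoted above can be checked with a few lines of arithmetic. This sketch assumes the 2.4-fold figure compares the original 71 patients plus the 101 additional patients against the original 71, and that the 3.7-fold figure compares 263 against 71; the variable names are illustrative.

```python
# Counts taken directly from the text.
potentially_actionable = 550   # CUP patients with Level 3B or higher alterations
level1_before = 71             # Level 1 alterations before tumor-type prediction
additional_high_conf = 101     # additional Level 1 patients, high-confidence only
level1_broadest = 263          # Level 1 patients under the broadest scope

# Patients who could benefit from tumor-type identification.
could_benefit = potentially_actionable - level1_before

# Fold increases relative to the initial 71 patients.
fold_high_conf = (level1_before + additional_high_conf) / level1_before
fold_broadest = level1_broadest / level1_before

# Proportions of the potentially actionable subset.
pct_before = 100 * level1_before / potentially_actionable
pct_broadest = 100 * level1_broadest / potentially_actionable

print(f"{could_benefit} patients, {fold_high_conf:.1f}-fold, {fold_broadest:.1f}-fold")
print(f"{pct_before:.1f}% before, {pct_broadest:.1f}% after")
```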

We present a deep-learning tumor-type classifier, GDD-ENS, which predicts across 38 distinct cancer types using targeted DNA sequencing data. Specifically, our model is trained using data from the MSK-IMPACT panel, which captures exons from only 341 genes, most of which are common cancer drivers. Despite this, GDD-ENS's high-confidence predictions have performance characteristics comparable with or better than state-of-the-art classifiers trained on WGS data. Unlike previous studies indicating that tumor-type classifiers built on driver mutations alone have poor performance (13), we show that with sufficient training data set size, targeted gene panels can support highly accurate tumor-type classification. Our model features are also well designed to support a panel-based classifier. Building from our prior model (22), we capture SNVs, indels, and CNAs at multiple levels of specificity (i.e., hotspot and allele-specific features for each gene).

Although our feature set and model choices already perform well, we acknowledge that other methods of feature representation and model architectures could improve the accuracy of GDD-ENS. We currently represent SNVs, indels, and CNAs at multiple levels of specificity (i.e., hotspot and allele-specific features for each gene). Incorporating driver alterations in this way allows the broad spectrum of alterations within a single driver to be represented in our feature set, which GDD-ENS can then use to distinguish similar mutations across diverse cancer types. However, this also creates a large, sparse feature set. Future work to define or select features through unsupervised or self-supervised learning could produce more succinct representations (14). Similarly, we chose our feature set to be mostly binary for the sake of interpretability, but future work could investigate incorporating signals from mutations and CNAs as continuous features (e.g., VAF). Finally, other model architectures such as convolutional neural networks or recurrent neural networks could be worth further exploration.

Most GDD-ENS predictions are high confidence (71.9%), and these predictions are reported across a much-expanded set of distinct cancer types (Table 1). Expanding to include more cancer types results in some individual types having poor performance when observing all predictions; however, we find that simply thresholding outputs based on confidence removes the majority of incorrect predictions, as high-confidence precision values for all cancer types are ≥50%, and 6 types reach 100% (overall average 88%). As these types in some cases represent broad cancer labels of multiple distinct subtypes, future work to develop subtype-specific classifiers should be explored. It should also be noted that our reported accuracy estimate is likely a lower bound. Some predictions that are discordant from the reported, true diagnosis labels could be correct predictions that are misdiagnosed or mislabeled within our test set. Although these labeling and diagnosis errors are likely infrequent, GDD-ENS can be used to identify these cases for further review, which would also improve overall classifier performance.

In developing a tumor-type classifier specifically for clinical deployment and use, it is important that the model is comprehensive, in that it is able to provide tumor-type predictions across the majority of patients with cancer. The 38 cancer types recognized by GDD-ENS cover 96.9% of patients within our real-world patient cohort, representing the vast majority of samples. We also describe expected performance on any rare excluded types, in order to enhance interpretation of the model's result in a clinical setting. The majority of the 3.1% of patients with excluded cancer types are assigned low-confidence predictions by GDD-ENS or have predictions within the correct organ systems, indicating that GDD-ENS can guide diagnoses even on untrained, rare cancer types. Future work could be directed to develop methods that enable auxiliary rare type classification and detection even with small sample sizes, specifically using few-shot learning for even rarer subtypes with no related cancer types in the in-distribution setting, as these are more likely to have distinct or unrecognizable genomics (44).

We demonstrate the utility of an adaptable prior distribution function that enables prediction-specific recalibration and show its potential for improving model performance in multiple contexts. This framework enables clinicians to incorporate clinical information such as the site of biopsy and patient history into tumor-type predictions without training the classifier on this information, and could easily be expanded to other contexts, including other genomic sources or entirely new data modalities. For example, RNA expression or DNA methylation profiles are often strong indicators of cell state and tissue/cell of origin, though these assays are less frequently performed in the clinical setting. When these data are available, the adaptable prior offers a way to incorporate these data, potentially supplementing an independent GDD-ENS classification. Although models that are purely trained on RNA expression or DNA methylation (or from other data sources such as pathology or radiology slides) can predict tumor type well, a purely DNA panel-based classifier like GDD-ENS has important advantages (45–47). Tumor-type predictors using expression and methylation-based data are often trained on smaller data sets that encompass fewer cancer types. The training data are often derived from primary tumor samples and, as such, the models are more sensitive to confounding by nontumor tissue contamination when applied to metastases (48, 49). In contrast, because GDD-ENS is based on data from a routinely applied assay in a clinical setting, it is trained and tested on large cohorts of primary and metastatic samples from a broad range of cancers, providing a clinically relevant estimate of performance.

We have shown that GDD-ENS would be relatively easy to extend to inputs drawn from other targeted assays because most genes targeted by MSK-IMPACT are also targeted in other large panels. GDD-ENS generalizes well in these cases by generating accurate predictions using only these overlapping genes. Retraining GDD-ENS on the MSK-IMPACT cohort using only data from these overlapping genes, or fine-tuning GDD-ENS using a small amount of data from the other panels, could overcome any potential performance losses due to different methods of variant calling, sample preparation, and tumor/normal sequencing regimes. To accommodate this retraining, we provide a framework for generating a feature table and code for training such a new model. Alternatively, one could also explore using the adaptable prior mechanism to incorporate missing data and new features.

Lastly, we provide GDD-ENS predictions across a set of CUP patients, with 45.6% of patients predicted with high confidence and 27.8% above 95% confidence. For a set of 26 sample pairs where patient initial samples were confirmed through later patient samples, GDD-ENS accurately classifies early CUP samples in 21 of these patients and achieves 100% accuracy for the 15 samples predicted with high confidence, which would have enabled faster guided treatments had these predictions been available to confirm diagnoses in real time. Furthermore, we show that GDD-ENS predictions, when applied to patients with CUP, could expand the number of patients with potentially targetable Level 1 alterations, increasing the proportion of actionable patients from 13% to 48% of a potentially actionable subset. The success of GDD-ENS in patients with CUP could potentiate the model for the classification of early cancer samples, an open area of research for many early detection models. As such, work to expand GDD-ENS to cell-free DNA sequencing data is currently ongoing (50).

In summary, this study demonstrates a highly accurate tumor-type classification model that was designed to specifically improve upon the clinical relevance and applicability of existing, genomic-based tumor-type classifiers through the use of a large, fixed-panel data set and a deep learning model. The potential clinical relevance of our model is immediate: GDD-ENS has already been incorporated into MSK-IMPACT's workflow to provide cancer-type predictions and flag cases for deeper diagnostic review and workup in real time. Overall, GDD-ENS is uniquely positioned for tangible, clinical impact and is capable of providing broad and comprehensive decision support for the many cancer diagnostic challenges.

Study Design

The purpose of this study was to develop and evaluate an algorithm for tumor-type classification using genomic features derived from fixed-panel genomic sequencing data. To do so, we collected a data set of 47,938 samples across 43,625 patients profiled by the MSK-IMPACT clinical series. All patients completed written informed consent for somatic mutation profiling in a CLIA-compliant laboratory and for enrollment on a research protocol approved by the Memorial Sloan Kettering Cancer Center Institutional Review Board (NCT01775072). Samples from rare or nondistinct cancer types, duplicated patient samples, and low tumor content samples were set aside for later analysis; all remaining samples formed a discovery patient cohort of 39,787 samples. We used data from this cohort to train, test, and validate our model, generating 4,487 features per sample. The classifier is a hyperparameter ensemble of 10 multilayer perceptrons. Hyperparameters were optimized separately for each model during training using a Gaussian process, and model outputs were averaged to produce the final classification. During model development, we optimized for overall model accuracy and macro-precision (class-averaged precision), as well as the proportion of high-confidence samples (≥75% probability) and the proportion of out-of-distribution samples. We also developed methods for extended clinical utility by incorporating nongenomic features for output recalibration and by reporting performance on CUP.
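The averaging step of the ensemble and the high-confidence cutoff can be illustrated with a short sketch. It assumes each ensemble member emits a probability vector over the same ordered set of classes; the function name is hypothetical, and the training and Gaussian process optimization of the individual models are not shown.

```python
def ensemble_predict(member_probs, threshold=0.75):
    """Average per-model probability vectors and apply a confidence cutoff.

    member_probs: list of probability vectors (one list per ensemble member),
                  all using the same class order.
    threshold: high-confidence cutoff (>= 0.75, as stated in the text).
    Returns (class_index, confidence, is_high_confidence).
    """
    n_models = len(member_probs)
    n_classes = len(member_probs[0])

    # Element-wise mean of the member outputs.
    mean_probs = [
        sum(member[k] for member in member_probs) / n_models
        for k in range(n_classes)
    ]

    best = max(range(n_classes), key=mean_probs.__getitem__)
    confidence = mean_probs[best]
    return best, confidence, confidence >= threshold


# Toy example with two members and two classes (made-up numbers).
toy_result = ensemble_predict([[0.9, 0.1], [0.7, 0.3]])
```

Averaging tends to lower the reported confidence when members disagree, which is one reason an ensemble can produce more usable confidence estimates than a single network.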

Patients and Samples

The discovery cohort included 80 distinct cancer types, which were determined and recorded in real time, along with primary or metastatic site classification, as part of the clinical workup of each case. Surgical pathology reports for all cases were reviewed by molecular pathology fellows in order to ensure accurate diagnoses of each sample and patient, and annotated using appropriate OncoTree label codes representing their detailed type and subtype (51). All cancer types and subtypes observed within this initial discovery cohort are annotated in Supplementary Table S2.

For classification purposes, we implemented a tumor content threshold consistent with the GDD-RF model, in order to ensure all samples had sufficient tumor content. Biopsy samples with low tumor signal would likely have reduced sensitivity for detection of genomic alterations, copy-number changes, signatures, and fusions. This sensitivity reduction would disproportionately affect cancer types with lower frequencies of the genomic alterations, so all samples for which all variants detected had a somatic mutant allele frequency less than 10% and with CNAs with an absolute log ratio less than 0.2 were excluded, removing 4,872 samples of our initial cohort.
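The exclusion rule above can be expressed as a small predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and a sample with no detected variants and no CNA calls is treated as low tumor content, which the text does not address explicitly.

```python
def has_low_tumor_content(variant_vafs, cna_log_ratios,
                          vaf_cutoff=0.10, log_ratio_cutoff=0.2):
    """Flag a sample as low tumor content.

    variant_vafs: somatic mutant allele frequencies (0-1) of detected variants.
    cna_log_ratios: log ratios of the sample's copy-number alterations.
    A sample is flagged only if every variant is below the VAF cutoff AND
    every CNA has an absolute log ratio below the log-ratio cutoff.
    """
    all_vafs_low = all(vaf < vaf_cutoff for vaf in variant_vafs)
    all_cnas_low = all(abs(lr) < log_ratio_cutoff for lr in cna_log_ratios)
    return all_vafs_low and all_cnas_low
```

For example, `has_low_tumor_content([0.05, 0.08], [0.1, -0.15])` flags the sample, whereas a single variant at 30% allele frequency or a single CNA with a log ratio of -0.5 is enough to retain it.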

We excluded 1,586 CUP samples, as well as any samples diagnosed as a heme-related subtype, to focus on classifiable solid tumor samples. Twenty-two major cancer types were selected for the initial development of GDD-RF, which all had at least 40 independent tumors in the original MSK-2017 training set. Given the performance of the initial classifier and sample counts within the expanded data set, these 22 types were refined and expanded to a final set of 38 types selected for classifier development. Only 1,321 samples (3.1% of the high tumor content, solid tumor discovery cohort) were excluded because their cancer type was not among these 38. Thirty-seven of 38 types represented distinct tumor types derived from OncoTree annotation and pathology review (51). Only Ano-Genital is a combination of two originally distinct types, made up of both cervical cancer and anal squamous cell carcinomas, as these types were found to be indistinguishable genomically during classification, likely because the development of both types is strongly associated with HPV.

The final discovery cohort was split into a training and testing set in an 80:20 split using simple randomization. We removed all redundant patient samples from patients with more than one sample of the same diagnosed cancer type or repeated patient examples in training from our testing set. Our final training and testing cohort included 32,816 and 6,971 samples, respectively. Patient age, sex, and other key demographic characteristics of this cohort are shown in Supplementary Table S3.

Accessing Data

Data for all samples used in this study are available through AACR Project GENIE as part of release 14.0 (52). We have provided GENIE Sample IDs and the corresponding classes (i.e., training, low tumor content, out of distribution) for our entire cohort in Supplementary Table S1. Instructions for downloading GENIE data are available at https://www.synapse.org/#!Synapse:syn7222066/wiki/410922. The raw files used to generate our feature table include data_clinical_patient.txt, data_clinical_sample.txt, data_mutations_extended.txt, data_CNA.txt, data_sv.txt, and data_cna_hg19.seg.

Feature Derivation

GDD-ENS includes 4,487 features representing mutations, indels, copy-number rearrangements, fusions, and signatures at varying degrees of specificity. These features represent informative features detected at least once within the discovery cohort training set; any all-zero features were removed. Although the majority of features included are retained from GDD-RF and are therefore aggregated in the same way, we detail each feature category below, highlighting changes specific to GDD-ENS. All features included are detailed in Supplementary Table S5.

Mutations.

A binary feature indicating the presence or absence of a nonsynonymous missense mutation for each of the 341 genes present in the original MSK-IMPACT panel was annotated for each sample in the discovery cohort, as well as an additional binary feature specifying the presence of a truncating mutation in these genes. The overall mutational burden attributed to SNVs and INDELs, separately, was included as a numerical feature.

Hotspots.

We used a data set of 194 predefined cancer hotspot mutations to generate binary features for the presence or absence of gene-specific hotspots for each of the 341 genes in the original MSK-IMPACT panel. These hotspots were aggregated from cancerhotspots.org (26, 27). We also included 2,725 binary features for the presence or absence of any specific hotspot allele within our predefined hotspot list.

CNAs.

The presence or absence of focal amplifications and deep deletions for each of the 341 genes in the panel were included as binary features. Furthermore, binary features indicating the presence or absence of genomic gains and losses for each chromosome arm were calculated from MSK-IMPACT data. Chromosome arms were defined using genomic coordinates in the GRCh37/hg19 human genome assembly and considered gained or lost if >50% of the arm was affected by segment gain or loss with an absolute value of ±0.2. Lastly, we included a numerical feature representing the overall mutational burden attributed to CNAs, defined as the percentage of the autosomal genome affected by copy-number gains or losses, as derived from sample segmented log-ratio data.
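
The arm-level rule above can be sketched as follows. This is an illustrative sketch, not the study implementation: it assumes non-overlapping segments given as hypothetical (start, end, log_ratio) tuples on a single chromosome, and it treats the 0.2 threshold as inclusive, which the text does not specify.

```python
def arm_call(segments, arm_start, arm_end, threshold=0.2, min_fraction=0.5):
    """Call one chromosome arm as 'gain', 'loss', or 'neutral'.

    segments: non-overlapping (start, end, log_ratio) tuples (hypothetical layout).
    An arm is called when more than min_fraction of its length is covered by
    segments whose log-ratio magnitude reaches the threshold.
    """
    arm_length = arm_end - arm_start
    gained = lost = 0
    for start, end, log_ratio in segments:
        overlap = min(end, arm_end) - max(start, arm_start)
        if overlap <= 0:
            continue  # segment lies outside this arm
        if log_ratio >= threshold:
            gained += overlap
        elif log_ratio <= -threshold:
            lost += overlap
    if gained / arm_length > min_fraction:
        return "gain"
    if lost / arm_length > min_fraction:
        return "loss"
    return "neutral"
```

Each call then maps to two binary features per arm, one for gain and one for loss.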

Structural Variants.

There are several intronic regions included in the MSK-IMPACT panel designed to assay genes that are commonly rearranged in cancer for structural variants. We identified 13 intragenic fusions and 2 intergenic fusions for feature inclusion from the literature. In GDD-ENS, we included binary features for the presence or absence of these structural variants on a gene-specific level, an update from the feature representation in GDD-RF, which reported presence or absence on a fusion-specific level. Focusing on the broad genes involved in common structural variants instead of limiting genes to common fusion partners reduced the sparsity of our data set.

Signatures.

The presence of 8 different mutational signatures was calculated for each sample with at least 10 synonymous or nonsynonymous mutations. Any signature representing more than 40% of mutations was annotated as present.

In updating feature representations for GDD-ENS, we expanded the number of included signatures. We also included the single-base substitution counts in numerical form for each of the 96 possible substitutions as a way of expanding differential signature strength across samples beyond a binary representation.
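
The binary signature rule can be sketched as below. This is a minimal illustration: the input is a hypothetical mapping from signature name to the fraction of mutations attributed to it, and encoding samples below the mutation-count minimum as all zeros is an assumption.

```python
def signature_flags(signature_fractions, n_mutations, min_mutations=10, min_fraction=0.4):
    """Return a 0/1 flag per signature.

    signature_fractions: dict of signature name -> fraction of mutations (hypothetical).
    Samples with fewer than min_mutations mutations get all-zero flags (assumption).
    """
    if n_mutations < min_mutations:
        return {name: 0 for name in signature_fractions}
    return {name: int(fraction > min_fraction)
            for name, fraction in signature_fractions.items()}
```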

Clinical Features.

Patient sex is included as a binary feature, as certain cancer types are sex specific. In GDD-ENS, we also include the MSIsensor score, a measure of microsatellite instability (MSI) status, as a numerical value (24).

GDD-ENS Model Development

Stepwise Model Evaluation (GDD-RF to GDD-ENS).

The original GDD-RF model is a multiclass, random-forest ensemble algorithm. Although this model was well-suited for the initial problem and training set, when expanding our data set to include nearly 5 times as many samples and new cancer types, it became necessary to explore other algorithms and feature sets. In order to justify these adaptations, we performed a series of stepwise experiments (Supplementary Table S4). First, using the original training set of 7,791 samples across 22 cancer types, we updated the model architecture from a random forest to a single MLP. We then updated the features from GDD-RF's representation as described in the Feature Derivation section of the methods. Next, we updated the data set to use the full discovery cohort described. We then updated the architecture again to a hyperparameter ensemble, then finally updated the types from 22 to 38. Each described adaptation was coupled with an improvement in most accuracy metrics, or in the case of the type expansion, a major decrease in the proportion of out-of-distribution patients.

As a final means of comparison, we directly compare the random forest and hyperparameter ensemble architectures in two ways: first, we train an ensemble model using the smaller GDD-RF training set, cancer types, and features that we compare to GDD-RF's overall performance; second, we train a random forest on GDD-ENS’ training set, cancer types, and features, which we compare to GDD-ENS overall performance. For the first comparison, we assess performance on a separate held-out test set of 11,644 samples, described in GDD-RF's initial study (22). As expected, the ensemble trained on the original random-forest data set performs better than GDD-RF across all metrics. GDD-ENS also outperforms the random forest built on the updated training set, features, and types, indicating that the ensemble neural-net architecture positively contributes to the predictive capability of our model, as hypothesized.

Training Procedure.

We first trained a single, fully connected feed-forward neural network, which we called GDD-NN. During training, smaller cancer types were upsampled such that each type had a minimum of 350 examples available. We used a Bayesian optimization approach to select the following hyperparameters: number of hidden layers, number of neurons per hidden layer, learning rate, dropout rate, and weight decay. The final layer of the network was a softmax output, representing the probability distribution of the 38 tumor types. The type with the greatest softmax probability represented the predicted type, and the maximum probability value represented prediction confidence.
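
Two pieces of this procedure, the upsampling of small types and the reading of a prediction from the softmax output, can be sketched in plain Python. This is a minimal sketch under stated assumptions: upsampling is done by drawing with replacement, and the function names are illustrative rather than taken from the GDD-ENS codebase.

```python
import math
import random
from collections import defaultdict

def upsample(labels, min_per_type=350, seed=0):
    """Return sample indices in which every type has at least min_per_type entries."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for index, label in enumerate(labels):
        by_type[label].append(index)
    indices = []
    for members in by_type.values():
        indices.extend(members)
        shortfall = min_per_type - len(members)
        if shortfall > 0:
            # Draw extra copies with replacement (assumed resampling scheme).
            indices.extend(rng.choices(members, k=shortfall))
    return indices

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def predict(logits, type_names):
    """Return (predicted type, confidence) from the final-layer logits."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return type_names[best], probabilities[best]
```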

The model was trained using Adam, with a batch size of 32 for 200 epochs. Initial hyperparameters were set to the default Adam parameters but were updated during GDD-NN development to represent relevant initializations and optimization ranges (Supplementary Table S6). We specifically trained the model using gp_minimize from scikit-optimize, the scikit-learn-based optimization library (version 0.22, RRID:SCR_002577), which models hyperparameter performance with a Gaussian process, using gp_hedge as the acquisition function. Each time overall validation accuracy improved, the model was saved with the given hyperparameter set. To prevent overfitting, we implemented an early-stopping mechanism that halted model development after either 5 model updates or 500 calls to gp_minimize. Final model performance was reported on a held-out test set after model training.

As neural nets are often overconfident when presented with training examples with limited data, we developed our ensemble model GDD-ENS, as these architectures have been shown to effectively reduce calibration error. To do so, we split our training set into 10 stratified training and validation sets and trained 10 separate models such that individual hyperparameters were selected across each validation set, using the same process and upsampled training set as described for GDD-NN. We used scikit-learn's StratifiedShuffleSplit to ensure that each fold is made up of similar proportions of each cancer type and is thus representative of the entire cohort. The softmax outputs of the 10 models were averaged to provide a final prediction and confidence value. We assessed and compared model calibration for GDD-NN and GDD-ENS using expected calibration error (ECE). Briefly, this represents the difference between expected confidence and observed accuracy within binned confidence levels, normalized by the number of samples within each confidence level.
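
The ensemble averaging and the calibration metric can be illustrated with a short sketch. It assumes equal-width confidence bins and a default of 10 bins, neither of which is specified in the text, and the function names are illustrative.

```python
def ensemble_average(member_probabilities):
    """Average softmax vectors from all ensemble members for one sample."""
    n_members = len(member_probabilities)
    n_types = len(member_probabilities[0])
    return [sum(member[t] for member in member_probabilities) / n_members
            for t in range(n_types)]

def expected_calibration_error(confidences, correct, n_bins=10):
    """Sample-weighted mean of |accuracy - mean confidence| over confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_confidence)
    return ece
```

A perfectly calibrated model would have an ECE of zero, since accuracy would match mean confidence in every bin.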

Each model was implemented and trained in PyTorch 1.4.0 (RRID:SCR_018536). All code was written in Python 3.7.6 (RRID:SCR_001658).

Comparison with WGS and WES-Based Classifiers.

For each WGS and WES-based model, we compared the total number of types included to the total number for GDD-ENS. However, these labels were often inconsistent across classifiers, in that some types were more broadly annotated than others, e.g., “Sarcoma” in GDD-ENS vs. “Osteosarcoma” in other models. These represent different subsets of sarcoma patients; a more specific label, while informative, would be incorrect for any sarcomas that are not osteosarcomas. In order to compare the proportion of patients with cancer that are “in-distribution” for each of the classifiers, we calculated in-distribution proportion, defined as the proportion of the solid tumor samples with sufficient tumor content within the discovery cohort that are classifiable with each model's distinct classifier label set and annotations. In cases where classifiers did not specify included subtypes within a broad cancer type (e.g., only Sarcoma), we included any within the broadest cancer type labeling.

Shapley Values

We used the SHAP Python package (version 0.37.0, RRID:SCR_021362) to calculate feature importance values per prediction. The upsampled training set was summarized using the SHAP kmeans function to represent baseline values for each cancer type. We then implemented a Kernel Explainer to generate prediction-specific importance values. The Kernel SHAP method is model-agnostic and uses a specially weighted local linear regression to estimate Shapley values for GDD-ENS, given the background distribution from the summarized training distribution. This method was used to calculate important features per prediction across the test set. We report the absolute Shapley value for each feature, specifying whether the importance value was derived from the presence or absence of each feature, excluding SBS transition counts from final outputs due to their lack of interpretability.

To validate this method, we aggregated all feature importance values for correct test set predictions within each cancer type. We calculated per-sample feature effect scores by multiplying the reported Shapley value score for the feature by an indicator signifying whether or not the feature was present within the sample (i.e., 1 or −1). This is because a negative Shapley value score, combined with the absence of a feature, indicates a positive effect of that feature on the predicted cancer type. Within the context of a single cancer type, features with negative overall effect scores represent features that are highly predictive of other cancer types. Therefore, to determine the individual features that were most predictive of each type, we remove any features with negative effect scores and then normalize them by the sum of all positive effect scores.
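
The effect-score calculation can be sketched as follows. This is an illustration under assumptions: per-sample scores are aggregated within a cancer type by summation, which the text does not state explicitly, and the function names are hypothetical.

```python
def effect_scores(shap_values, present):
    """Per-sample effect scores: Shapley value times +1 (present) or -1 (absent)."""
    return [value * (1 if is_present else -1)
            for value, is_present in zip(shap_values, present)]

def normalized_positive_effects(samples, feature_names):
    """Aggregate effect scores for one cancer type and keep the positive ones.

    samples: list of (shap_values, present) pairs, one per correct prediction.
    Returns feature -> share of the total positive effect.
    """
    totals = [0.0] * len(feature_names)
    for shap_values, present in samples:
        for i, score in enumerate(effect_scores(shap_values, present)):
            totals[i] += score  # summation across samples is an assumption
    positive = {name: total for name, total in zip(feature_names, totals) if total > 0}
    norm = sum(positive.values())
    return {name: total / norm for name, total in positive.items()} if norm else {}
```

A negative Shapley value for an absent feature therefore contributes a positive score, matching the reasoning given above.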

As mutation and copy-number-based features were represented in varying degrees of granularity (i.e., mutation presence, truncating mutations, amplifications and deletions, and hotspot mutations for a single gene), we further aggregated these features to individual units to assess the overall importance of each feature unit to correct predictions within each cancer type. We report the top importance of broad features per cancer type, as well as across organ systems (Fig. 2C; Supplementary Figs. S6 and S8; Supplementary Tables S12 and S14).

In a separate analysis, we aggregated Shapley values even further using broad feature categories (i.e., mutation, hotspot, and copy number) to determine which classes of features were most important for correct predictions across types and organ systems (Supplementary Figs. S7 and S8). Organ system annotations and assignments are detailed in Supplementary Table S13, adapted from Nguyen et al. (40). We also perform this analysis on all correct predictions regardless of cancer type, to determine which features are driving correct predictions overall. Lastly, we compare top Shapley values for correct versus incorrect predictions using these organ annotations and evaluating both in and out-of-distribution samples in Supplementary Fig. S8 and the Supplementary Methods.

Finally, in order to justify our choice of features, we analyzed Shapley values for KRAS-associated features only, across any cancer type which included KRAS in the top 10 features from this analysis (Supplementary Fig. S10).

For final implementation, we report the top 10 absolute Shapley values across interpretable features per model prediction.

Identifying Significantly Associated Shapley Values and Cancer Types.

We conducted Mann–Whitney U tests across each combination of feature and cancer type within our test set predictions: the distribution of Shapley value effect scores for a feature across all test samples predicted as that cancer type was compared with the distribution of effect scores for that feature across all other test samples not predicted as that cancer type. We then applied Bonferroni correction and found that 1,655 features were significantly associated with specific cancer types with P < 0.05, reported in Supplementary Table S15.
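
For illustration, the testing procedure can be sketched in plain Python. This is a simplified sketch, not the study code: it uses a two-sided normal approximation without tie or continuity correction, whereas a statistics library would normally be used, and the significance threshold is passed in explicitly.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value) via a normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))

def bonferroni_significant(p_values, alpha):
    """Keep (feature, cancer type) pairs whose Bonferroni-adjusted P is below alpha."""
    n_tests = len(p_values)
    adjusted = {pair: min(1.0, p * n_tests) for pair, p in p_values.items()}
    return {pair: p for pair, p in adjusted.items() if p < alpha}
```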

Classification of Excluded Types

We classified 1,321 solid tumor samples from our initial discovery cohort as rare, excluded types due to their small sample size and low likelihood to be genomically distinct. These represent 33 distinct tumor types and 23 subtypes of in-distribution cancer types. Organ system labeling for in-distribution and rare types was performed using OncoTree labeling methods (51). Each sample was assigned to a relevant organ system using the same method as previously described for in-distribution samples. Analysis of rare types and their corresponding organ system predictions was restricted to the 10 major systems previously described: any samples from cancer types that corresponded to multiple organ systems or organ systems outside of these 10 systems were excluded from organ system analysis (Supplementary Table S13).

Additional methods for attempting automated out-of-distribution (i.e., rare, excluded type) separation using ensemble statistics or external techniques are described in Supplementary Methods.

Adaptable Prior Methodology

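The adaptable prior distribution function recalibrates probabilities from GDD-ENS outputs given prediction-specific prior likelihoods using a variation of a Naïve Bayes classifier. We define P(yt) as the likelihood of cancer type t for sample y. We want to adjust P(yt) given sample- and type-specific genomic (g) and nongenomic evidence, such as metastatic site (m), histology (h), or clinical history (c). In other words, we want to find Padapt(yt) = P(yt|X), where X = (g, m, h, c).

Using Naïve Bayes, this expands, up to a normalizing constant that does not depend on t, to

$$P(y_t \mid X) \propto P(y_t)\,\frac{P(g \mid y_t)}{P(g)}\,\frac{P(m \mid y_t)}{P(m)}\,\frac{P(h \mid y_t)}{P(h)}\,\frac{P(c \mid y_t)}{P(c)}$$

As P(yt|g) represents the genomic probability of type t for sample y outputted by GDD-ENS, and P(yt|g)/P(yt) = P(g|yt)/P(g), we simplify the prior to

$$P_{\mathrm{adapt}}(y_t) \propto P(y_t \mid g)\,\frac{P(m \mid y_t)}{P(m)}\,\frac{P(h \mid y_t)}{P(h)}\,\frac{P(c \mid y_t)}{P(c)}$$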

In the case that only a single source of additional clinical information is available to guide predictions, all unrelated terms can be simplified, such that the final prior only reflects GDD-ENS probabilities and updated likelihoods given the sample-specific additional information.

We calculate P(yt|X) for all t to return a final adjustment array which is then normalized, potentially identifying a new prediction and confidence estimate for each sample.

We validate the adaptable prior using two broad systems: metastatic biopsy site and histology. For the biopsy site adaptable prior, biopsy site labels were mapped to a set of 20 broad organ or organ system labels using ICD billing codes (41). P(m|yt) and P(m) are derived from the metastatic, upsampled training set. For each organ system, we derived site-specific priors given the frequency of each cancer type within the metastatic, upsampled training set seen at that site, using pseudocounts to allow flexibility for unseen (type, met_site) pairs. We also include an “Original” prior for any metastatic site label with too few examples for prior distribution calculation, that is, a metastatic site label only seen in testing or corresponding to the “Other” system label (as per ref. 41). Metastatic biopsy site adaptable prior results were then calculated from the metastatic samples within the test set.

Histology site prior was similarly calculated for any samples within the two broad classes of carcinomas: adenocarcinoma and squamous cell carcinoma. We derived the frequency of each type within each histology using subtype frequencies across the nonduplicated training set and did not include pseudocounts as prior likelihoods for histologies are nonflexible. To report the results on test samples, we assume that any sample with a true type within either histology would be observable at the time of diagnosis without requiring a true type indication, and adjust based on annotated subtype.

When combining multiple priors, that is, for metastatic samples with histology annotated, we multiply across all included priors as per Naïve Bayes classification methodology previously described. Patient-specific clinical history priors are not pseudocount adjusted during implementation. Adjusted predictions are calculated similarly regardless of context, multiplying P(yt|g) by the adjustment ratio for each type, normalizing, and then reporting highest probability type and corresponding confidence value.
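
The adjustment step can be sketched as follows. This is a minimal sketch under stated assumptions: site likelihoods P(m|yt) are estimated from hypothetical (cancer_type, site) training pairs with a Laplace-style pseudocount, and P(m) is omitted because it is constant across types and cancels when the adjusted probabilities are normalized. The function names and the pseudocount value are illustrative.

```python
from collections import Counter

def site_likelihoods(train_pairs, cancer_types, site, pseudocount=1.0):
    """Estimate P(m | y_t) for one biopsy site with pseudocount smoothing.

    train_pairs: list of (cancer_type, site) tuples (hypothetical layout).
    """
    type_totals = Counter(cancer_type for cancer_type, _ in train_pairs)
    at_site = Counter(cancer_type for cancer_type, s in train_pairs if s == site)
    n_sites = len({s for _, s in train_pairs})
    return {t: (at_site[t] + pseudocount) / (type_totals[t] + pseudocount * n_sites)
            for t in cancer_types}

def adjust(genomic_probs, *likelihood_sets):
    """Multiply P(y_t | g) by each prior term, normalize, and report the top type."""
    adjusted = dict(genomic_probs)
    for likelihoods in likelihood_sets:
        for t in adjusted:
            adjusted[t] *= likelihoods.get(t, 1.0)  # missing term leaves t unchanged
    total = sum(adjusted.values())
    adjusted = {t: p / total for t, p in adjusted.items()}
    best = max(adjusted, key=adjusted.get)
    return best, adjusted[best], adjusted
```

Passing several likelihood dictionaries to adjust corresponds to multiplying across priors, for example a site prior and a histology prior for the same sample.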

CUP Analysis

To identify CUP samples with later-confirmed diagnoses, we first screened the MSK-IMPACT clinical series with the following criteria: (i) the patient had two or more samples successfully profiled by MSK-IMPACT; (ii) one or more of the samples were annotated as CUP; (iii) one or more of the samples were not annotated as CUP. Samples of low tumor content or low quality were excluded as described above. Each patient was manually reviewed by a molecular pathologist (JC or JCC) for clinical and histologic findings as well as corresponding IMPACT sequencing results on the tumors, and the clonal relationship and tumor origin were determined by shared genomic alterations among tumors. Patients who had pathologic diagnoses and primary sites successfully established on the CUP samples from subsequent sequencing results were identified for the analysis. For all CUP predictions, we retrained GDD-ENS after removing any samples from patients with a CUP diagnosis regardless of the timeline for CUP indication to remove any potential prediction biases from training, using the same training procedure previously described for GDD-ENS. Results for later-confirmed CUP patients are in Supplementary Table S17. Adaptable priors were also applied identically for CUP patients.

We used the OncoKB Annotator tool to annotate actionable alterations for all CUP samples (42). We first annotated the baseline level alterations for CUP patients, and then reran this analysis using the top predicted indication for each CUP patient, before thresholding on confidence. Any patient that was found to have a targetable alteration at baseline was not included in new patient counts when analyzing targetable alterations for CUP after GDD-ENS predictions, although these overlapping cases represent relevant therapeutic regimen changes and thus were highlighted separately during analysis. We repeated this process for the second and third-highest prediction, similarly removing any patients with an actionable alteration identified from a top prediction when analyzing patient counts after incorporating the second prediction, etc. Gene and alteration frequency was calculated from only the high-confidence GDD-ENS predictions.

Data and Materials Availability

Data and code for the generation of full feature tables, as well as code for model training, individual predictions, and the adaptable prior, are available at https://github.com/mmdarmofal/GDD_ENS. All other data and full results are present in the paper or the Supplementary Materials. All data used to generate the feature table are available for download on the AACR Project GENIE database (release 14.0); we have provided instructions for downloading raw data in Methods (52). There is also a CodeOcean compute capsule at https://codeocean.com/capsule/3410308/.

M.F. Berger reports personal fees from Eli Lilly, AstraZeneca, and Paige.AI and other support from BoundlessBio and SOPHiA Genetics outside the submitted work; in addition, M.F. Berger has a patent for classifier models to predict tissue of origin from targeted tumor DNA sequencing pending. No disclosures were reported by the other authors.

M. Darmofal: Conceptualization, data curation, software, formal analysis, validation, visualization, methodology, writing–original draft, writing–review and editing. S. Suman: Software, validation. G. Atwal: Conceptualization, software, supervision, methodology. M. Toomey: Resources, data curation, software. J. Chen: Resources, data curation. J.C. Chang: Resources, data curation. E. Vakiani: Data curation, methodology. A.M. Varghese: Resources, data curation. A. Balakrishnan Rema: Software. A. Syed: Software. N. Schultz: Resources, supervision. M.F. Berger: Conceptualization, supervision, funding acquisition, methodology, project administration, writing–review and editing. Q. Morris: Conceptualization, resources, supervision, funding acquisition, methodology, writing–review and editing.

We gratefully acknowledge the Marie Josée and Henry R. Kravis Center for Molecular Oncology and the MSK Molecular Diagnostics Service. We would like to thank the patients and their families. We would also like to thank Hannes Bretschneider, Chenlian Fu, Dig Vijay Kumar Yarlagadda, Aziz Zafar, Julia Simundza, and Peter Zhang for their insights and helpful comments during the preparation of this manuscript and codebase. This work was supported by NIH/NCI Cancer Center Support Grant P30 CA008748 (QM and MB) and NIH/NCI Grant R25 CA233208 (MB).

Note: Supplementary data for this article are available at Cancer Discovery Online (http://cancerdiscovery.aacrjournals.org/).

References

1. Pavlidis N, Briasoulis E, Hainsworth J, Greco FA. Diagnostic and therapeutic management of cancer of an unknown primary. Eur J Cancer 2003;39:1990–2005.
2. Varghese AM, Arora A, Capanu M, Camacho N, Won HH, Zehir A, et al. Clinical and molecular characterization of patients with cancer of unknown primary in the modern era. Ann Oncol 2017;28:3015–21.
3. Kato S, Alsafar A, Walavalkar V, Hainsworth J, Kurzrock R. Cancer of unknown primary in the molecular era. Trends Cancer 2021;7:465–77.
4. Greco FA. Molecular diagnosis of the tissue of origin in cancer of unknown primary site: useful in patient management. Curr Treat Options Oncol 2013;14:634–42.
5. Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 2017;23:703–13.
6. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020;578:82–93.
7. Wang Z, Wang Y, Zhang J, Hu Q, Zhi F, Zhang S, et al. Significance of the TMPRSS2:ERG gene fusion in prostate cancer. Mol Med Rep 2017;16:5450–8.
8. Gupta S, Vanderbilt CM, Lin YT, Benhamida JK, Jungbluth AA, Rana S, et al. A pan-cancer study of somatic TERT promoter mutations and amplification in 30,773 tumors profiled by clinical genomic sequencing. J Mol Diagn 2021;23:253–63.
9. Malone ER, Oliva M, Sabatini PJB, Stockley TL, Siu LL. Molecular profiling for precision cancer therapies. Genome Med 2020;12:8.
10. Salvadores M, Mas-Ponte D, Supek F. Passenger mutations accurately classify human tumors. PLoS Comput Biol 2019;15:e1006953.
11. Nguyen L, Van Hoeck A, Cuppen E. Machine learning-based tissue of origin classification for cancer of unknown primary diagnostics using genome-wide mutation features. Nat Commun 2022;13:4013.
12. Dietlein F, Eschner W. Inferring primary tumor sites from mutation spectra: a meta-analysis of histology-specific aberrations in cancer-derived cell lines. Hum Mol Genet 2014;23:1527–37.
13. Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, Danyi A, et al. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat Commun 2020;11:728.
14. Sanjaya P, Maljanen K, Katainen R, Waszak SM, Genomics England Research C, Aaltonen LA, et al. Mutation-Attention (MuAt): deep representation learning of somatic mutations for tumour typing and subtyping. Genome Med 2023;15:47.
15. Soh KP, Szczurek E, Sakoparnig T, Beerenwinkel N. Predicting cancer type from tumour DNA signatures. Genome Med 2017;9:104.
16. Lee K, Jeong HO, Lee S, Jeong WK. CPEM: accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network. Sci Rep 2019;9:16927.
17. Moon I, LoPiccolo J, Baca SC, Sholl LM, Kehl KL, Hassett MJ, et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat Med 2023;29:2057–67.
18. Ri I, Kawata J, Nagai A, Muto K. Expectations, concerns, and attitudes regarding whole-genome sequencing studies: a survey of cancer patients, families, and the public in Japan. J Hum Genet 2023;68:281–5.
19. Cuppen E, Elemento O, Rosenquist R, Nikic S, IJzerman M, Zaleski ID, et al. Implementation of whole-genome and transcriptome sequencing into clinical cancer care. JCO Precis Oncol 2022;6:e2200245.
20. Zhao EY, Jones M, Jones SJM. Whole-genome sequencing in cancer. Cold Spring Harb Perspect Med 2019;9:a034579.
21. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn 2015;17:251–64.
22. Penson A, Camacho N, Zheng Y, Varghese AM, Al-Ahmadie H, Razavi P, et al. Development of genome-derived tumor type prediction to inform clinical cancer care. JAMA Oncol 2020;6:84–91.
23. Comitani F, Nash JO, Cohen-Gogo S, Chang AI, Wen TT, Maheshwari A, et al. Diagnostic classification of childhood cancer using multiscale transcriptomics. Nat Med 2023;29:656–66.
24. Niu B, Ye K, Zhang Q, Lu C, Xie M, McLellan MD, et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 2014;30:1015–6.
25. Middha S, Zhang L, Nafa K, Jayakumaran G, Wong D, Kim HR, et al. Reliable pan-cancer microsatellite instability assessment by using targeted next-generation sequencing data. JCO Precis Oncol 2017;2017:PO.17.00084.
26. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol 2016;34:155–63.
27. Chang MT, Bhattarai TS, Schram AM, Bielski CM, Donoghue MTA, Jonsson P, et al. Accelerating discovery of functional mutant alleles in cancer. Cancer Discov 2018;8:174–83.
28. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. PMLR 2017:1321–30.
29. Yang PY, Yang YH, Zhou BB, Zomaya AY. A review of ensemble methods in bioinformatics. Curr Bioinform 2010;5:296–308.
30. Zhong H, Chen H, Qiu H, Huang C, Wu Z. A multiomics comparison between endometrial cancer and serous ovarian cancer. PeerJ 2020;8:e8347.
31. van de Nes J, Gessi M, Sucker A, Moller I, Stiller M, Horn S, et al. Targeted next-generation sequencing reveals unique mutation profile of primary melanocytic tumors of the central nervous system. J Neurooncol 2016;127:435–44.
32. Arora K, Tran TN, Kemel Y, Mehine M, Liu YL, Nandakumar S, et al. Genetic ancestry correlates with somatic differences in a real-world clinical cancer sequencing cohort. Cancer Discov 2022;12:2552–65.
33. Garcia EP, Minkovsky A, Jia Y, Ducar MD, Shivdasani P, Gong X, et al. Validation of OncoPanel: a targeted next-generation sequencing assay for the detection of somatic variants in cancer. Arch Pathol Lab Med 2017;141:751–8.
34. Beaubier N, Tell R, Lau D, Parsons JR, Bush S, Perera J, et al. Clinical validation of the tempus xT next-generation targeted oncology sequencing assay. Oncotarget 2019;10:2384–96.
35. Milbury CA, Creeden J, Yip WK, Smith DL, Pattani V, Maxwell K, et al. Clinical and analytical validation of FoundationOne(R)CDx, a comprehensive genomic profiling assay for solid tumors. PLoS One 2022;17:e0264138.
36. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions: advances in neural information processing systems. Curran Associates, Inc; 2017.
37. South AP, Purdie KJ, Watt SA, Haldenby S, den Breems NY, Dimon M, et al. NOTCH1 mutations occur early during cutaneous squamous cell carcinogenesis. J Invest Dermatol 2014;134:2630–8.
38. Llano E, Todeschini AL, Felipe-Medina N, Corte-Torres MD, Condezo YB, Sanchez-Martin M, et al. The oncogenic FOXL2 C134W mutation is a key driver of granulosa cell tumors. Cancer Res 2023;83:239–50.
39. Lee JK, Sivakumar S, Schrock AB, Madison R, Fabrizio D, Gjoerup O, et al. Comprehensive pan-cancer genomic landscape of KRAS altered cancers and real-world outcomes in solid tumors. NPJ Precis Oncol 2022;6:91.
40. Nguyen B, Fong C, Luthra A, Smith SA, DiNatale RG, Nandakumar S, et al. Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients. Cell 2022;185:563–75.
41. Boekhout AH, Beijnen JH, Schellens JH. Trastuzumab. Oncologist 2011;16:800–10.
42. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol 2017;2017:PO.17.00011.
43. Tabernero J, Velez L, Trevino TL, Grothey A, Yaeger R, Van Cutsem E, et al. Management of adverse events from the treatment of encorafenib plus cetuximab for patients with BRAF V600E-mutant metastatic colorectal cancer: insights from the BEACON CRC study. ESMO Open 2021;6:100328.
44. Li HY, Tian SY, Li Y, Fang QM, Tan RB, Pan YJ, et al. Modern deep learning in bioinformatics. J Mol Cell Biol 2020;12:823–7.
45. Koelsche C, von Deimling A. Methylation classifiers: brain tumors, sarcomas, and what's next. Genes Chromosomes Cancer 2022;61:346–55.
46. Mohammed M, Mwambi H, Mboya IB, Elbashir MK, Omolo B. A stacking ensemble deep learning approach to cancer type classification based on TCGA data. Sci Rep 2021;11:15626.
47. Divate M, Tyagi A, Richard DJ, Prasad PA, Gowda H, Nagaraj SH. Deep learning-based pan-cancer classification model reveals tissue-of-origin specific gene expression signatures. Cancers 2022;14:1185.
48. Zhao Y, Pan Z, Namburi S, Pattison A, Posner A, Balachander S, et al. CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 2020;61:103030.
49. Modhukur V, Sharma S, Mondal M, Lawarde A, Kask K, Sharma R, et al. Machine learning approaches to classify primary and metastatic cancers using tissue of origin-based DNA methylation profiles. Cancers 2021;13:3768.
50. Brannon AR, Jayakumaran G, Diosdado M, Patel J, Razumova A, Hu Y, et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat Commun 2021;12:3770.
51. Kundra R, Zhang H, Sheridan R, Sirintrapun SJ, Wang A, Ochoa A, et al. OncoTree: a cancer classification system for precision oncology. JCO Clin Cancer Inform 2021;5:221–30.
52. Consortium, A.P.G. AACR project GENIE: powering precision medicine through an international consortium. Cancer Discov 2017;7:818–31.

This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
