Purpose:

The clinical behavior of ampullary adenocarcinoma varies widely. Targeted tumor sequencing may better define biologically distinct subtypes to improve diagnosis and management.

Experimental Design:

The hidden-genome algorithm, a multilevel meta-feature regression model, was trained on a prospectively sequenced cohort of 3,411 patients (1,001 pancreatic adenocarcinoma, 165 distal bile-duct adenocarcinoma, 2,245 colorectal adenocarcinoma) and subsequently applied to targeted panel DNA-sequencing data from ampullary adenocarcinomas. Genomic classification (i.e., colorectal vs. pancreatic) was correlated with standard histologic classification [i.e., intestinal (INT) vs. pancreatobiliary (PB)] and clinical outcome.

Results:

Colorectal genomic subtype prediction was primarily influenced by mutations in APC and PIK3CA, tumor mutational burden, and DNA mismatch repair (MMR)–deficiency signature. Pancreatic genomic-subtype prediction was dictated by KRAS gene alterations, particularly KRAS G12D, KRAS G12R, and KRAS G12V. Distal bile-duct adenocarcinoma genomic subtype was most influenced by copy-number gains in the MDM2 gene. Despite high (73%) concordance between immunomorphologic subtype and genomic category, there was significant genomic heterogeneity within both histologic subtypes. Genomic scores with higher colorectal probability were associated with greater survival compared with those with a higher pancreatic probability.

Conclusions:

The genomic classifier provides insight into the heterogeneity of ampullary adenocarcinoma and improves stratification, which is dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing and obviates subjective histologic interpretation.

Translational Relevance

Ampullary adenocarcinomas are classified into intestinal or pancreatobiliary subtypes based on histologic criteria, with potentially different clinical behavior. The incorporation of targeted tumor sequencing may better define biologically distinct phenotypes of ampullary adenocarcinoma to improve clinical diagnosis and management. Following training of the hidden-genome algorithm, a multilevel meta-feature regression model, on related malignancies of the pancreas, distal bile duct, and intestine, the molecular taxonomy was applied to an institutional cohort of patients with ampullary adenocarcinoma. The genomic-classifier methodology revealed significant heterogeneity among ampullary cancers. The genomic classifier better stratified the divergent outcomes in ampullary cancer, which were dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing, does not depend on subjective histologic interpretation, holds promise for improving identification of distinct clinical subtypes for risk stratification, and may guide selection for multimodality therapies.

Accurate classification of tumors is essential to guide management and inform prognosis. Traditionally, site of origin and histopathologic subtype have defined a tumor and its expected clinical phenotype. Targeted tumor sequencing is an increasingly available technology that allows more precise classification utilizing patterns of genomic alterations. Moreover, such analyses may help quantify tumor heterogeneity, which is important not only for understanding tumor biology but also for explaining differences in clinical behavior and for tailoring treatment.

The hidden-genome classifier (1, 2) is a powerful tool that offers deeper insight into tumor biology. Although there is a robust literature describing the association between specific genomic alterations and individual tumor types, frequent and highly conserved variants account for only a small fraction of observed variants. Deep sequencing has revealed millions of unique somatic mutations, and often, more than 90% of somatic variants are singletons (3). The hidden-genome classifier uses multilevel meta-feature regression to draw on both common and rare variants and to accommodate previously unobserved variants when determining cancer type (1).

Ampullary adenocarcinoma is a heterogeneous disease that may benefit from hidden-genome methodology. Ampullary adenocarcinoma is an uncommon malignancy of the periampullary region (4–7), the junction of the biliary, pancreatic, and digestive tracts. As such, adenocarcinomas arising within the duodenal ampullary complex have variable clinical phenotypes, likely due to differences in cell of origin (8). Although two major histologic types – intestinal (INT) and pancreatobiliary (PB) – have been identified (9, 10), morphology-based subtype classification is unreliable for prognostication (11). Additionally, molecular genotyping has not improved prognostic stratification relative to histology alone (12). Thus, identification of distinct clinical subtypes for risk stratification and improving selection for multimodality therapies is a critical unmet need.

Herein, we report a molecular taxonomy of ampullary adenocarcinoma using hidden-genome methodology based on a broad array of genomic alterations identified by targeted tumor sequencing. Additionally, we used the model to quantify the degree of genomic heterogeneity in individual samples, which included therapeutically actionable alterations.

Patient cohort

After institutional review board (IRB) approval, we identified consecutive patients treated for ampullary adenocarcinoma at Memorial Sloan Kettering Cancer Center (MSKCC) with targeted next-generation sequencing (NGS) of their tumor. All specimens were sequenced using the Memorial Sloan Kettering–Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) assay, a clinically validated targeted NGS panel that can detect mutations, copy-number alterations, and select rearrangements (13). Demographic, clinical, pathologic, and outcome data were abstracted from an institutional database and patients' medical records. All patients provided written informed consent. The study was conducted in accordance with the US Common Rule.

Pathologic assessment of ampullary adenocarcinomas

Tumors were reviewed by a gastrointestinal pathologist blinded to the genotypic classifier. The histologic category was assigned using established criteria (14) as INT, PB, or mixed INT/PB (mixed). The Ang immunophenotypic classification was assigned as “INT”, “PB”, or “ambiguous”, as previously described (15).

Hidden-genome classification

For probabilistic classification of the ampullary tumors based on their genomic profiles, we trained three hidden-genome group-lasso regularized multi-logistic models – (i) a three-class model trained on 2,245 colorectal adenocarcinomas, 165 distal bile-duct adenocarcinomas, and 1,001 pancreatic adenocarcinomas (n = 3,411), (ii) a four-class model that added 254 gastric tumors (n = 3,665) to the three-class model, and (iii) a four-class model that added 69 small-bowel adenocarcinomas (n = 3,480) to the three-class model. The training cohort included patients treated at MSKCC with available MSK-IMPACT sequencing of their primary tumor. Various descriptive statistics for the training data cohort are provided in Supplementary Table S1. Colorectal adenocarcinoma was used to represent intestinal genomics in the three-class model, given the rarity of primary small-bowel adenocarcinoma and the overlap of driver mutations (16).

The hidden-genome model utilized the following predictors: (i) normalized binary indicators for 250 discriminative individual variants observed in the MSK-IMPACT cancer-gene panels, (ii) normalized number of mutations observed in each MSK-IMPACT gene, (iii) normalized counts of mutations associated with each of the 96 possible single-base substitution categories [ref. 17; SBS-96; each considering the mutated base (6 possibilities), along with the immediate 5′ and 3′ flanks (4 possibilities each), resulting in $6 \times 4 \times 4 = 96$ categories], (iv) square root of the total number of mutations observed in the tumor, (v) binary indicators of copy-number loss and gain at each of 476 genes present in the MSK-IMPACT panel, and (vi) average copy-number log-ratio computed at 782 chromosome cytogenetic bands spanning the 22 autosomes (18, 19). Predictors (ii) to (iv) can be interpreted as scalar projections of three meta-features, namely, the gene itself, SBS-96, and an intercept meta-feature vector of 1, respectively (Supplementary Methods), along the direction of the mutation-profile vector. The mutation contexts embodied in these meta-features combine information in the individual variants, including rare variants, thereby permitting a highly informative dimension reduction of the ultra-high dimensional mutation-profile vector. The model also includes several discriminative hotspot variants with substantial residual effects not explained by mutation context.
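As an illustration of predictor group (iii), the 96 SBS categories can be enumerated directly from the six substitution classes and the four possible bases on each flank. The sketch below is not part of the published pipeline; the bracketed label format (e.g., T[C>T]T) and the function name are conventions assumed here for readability.

```python
from itertools import product

# Six pyrimidine-centred substitution classes used by the SBS-96 scheme.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
BASES = "ACGT"  # possible 5' and 3' flanking bases


def sbs96_categories():
    """Enumerate every (substitution, 5' flank, 3' flank) combination.

    Labels use an assumed 5'[ref>alt]3' format, e.g. 'T[C>T]T'.
    """
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


categories = sbs96_categories()
print(len(categories))  # 96
print(categories[:3])   # ['A[C>A]A', 'A[C>A]C', 'A[C>A]G']
```

Counting the mutations of a tumor that fall into each of these labels, and then normalizing, would give one vector of 96 values per tumor.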

We used the fitted hidden-genome model to predict parental cancer sites for the sample of ampullary adenocarcinoma specimens. For each ampullary tumor, the predicted class probabilities were used to produce a soft classification of the tumors (i.e., percentage colorectal, pancreatic, and biliary). An associated hard classification was subsequently obtained by assigning each tumor to the class with the highest predicted probability if the highest probability was ≥0.5; otherwise, the hard class was “indeterminate”.
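The hard-classification rule described above can be sketched in a few lines. The function name and the example probabilities are hypothetical and are not patient data; only the 0.5 threshold and the "indeterminate" label come from the text.

```python
def hard_class(probs, threshold=0.5):
    """Return the most probable class if it reaches the threshold,
    otherwise the label 'indeterminate'."""
    label, p = max(probs.items(), key=lambda item: item[1])
    return label if p >= threshold else "indeterminate"


# Hypothetical soft classifications (illustrative values only).
tumor_a = {"colorectal": 0.80, "pancreatic": 0.15, "distal bile duct": 0.05}
tumor_b = {"colorectal": 0.40, "pancreatic": 0.35, "distal bile duct": 0.25}

print(hard_class(tumor_a))  # colorectal
print(hard_class(tumor_b))  # indeterminate
```

With three classes, a tumor can have a clear top class and still be labeled indeterminate whenever no single probability reaches 0.5, as in the second example.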

Statistical analysis

Continuous data are expressed as medians and interquartile ranges (IQR) and were compared between groups using the Wilcoxon rank-sum test. Categorical variables are expressed as frequencies and percentages and were compared using the Fisher exact test. Overall survival (OS) was defined from time of diagnosis to death or last follow-up; for the surgically resected subset, OS was additionally evaluated using time from resection to death or last follow-up. OS was estimated with Kaplan–Meier methods and compared with the log-rank test. All tests were two-sided, and P < 0.05 was considered significant. SAS (version 9.4; SAS Institute) or R (version 4.0.1; R Foundation for Statistical Computing) was used for all analyses.
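For readers unfamiliar with the Kaplan–Meier (product-limit) estimator, a minimal standard-library sketch follows. The analyses in this study were run in SAS or R; the function below and its follow-up times are hypothetical and serve only to show how censored observations leave the risk set without lowering the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up in months (not study data).
curve = kaplan_meier([5, 8, 12, 12, 20, 30], [1, 0, 1, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

In this toy example the estimate drops only at months 5, 12, and 30; the patients censored at months 8 and 20 reduce the number at risk for later steps.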

Patient cohort

A cohort of 76 patients with ampullary adenocarcinoma was identified (Table 1), with a median age of 60.5 years (IQR 50.0–68.0); 62% were male and 57 (75%) underwent resection (pancreaticoduodenectomy). Median tumor diameter was 2.3 cm (IQR 1.7–2.8), and the majority (72.2%) had American Joint Committee on Cancer (AJCC) 8th edition stage III disease. Adverse histopathologic features were common: poor differentiation (n = 19; 35.2%), lymphovascular invasion (n = 39; 75.0%), and perineural invasion (n = 37; 72.6%). By immuno-histologic assessment, 21 patients had INT, 50 patients had PB, and 5 patients had mixed-subtype tumors.

Table 1.

Demographic, clinical, tumor, and treatment-related variables for all patients.

                                              Surgery
Variable, N (%) or median (IQR) or mean (SD)  No resection (n = 19)   Resection (n = 57)    Total (n = 76)
Age, median (IQR) (a)                         61 (51–67)              60 (48–70)            60.5 (50–68)
Gender, female                                6 (31.6%)               23 (40.4%)            29 (38.2%)
CEA (ng/mL), median (IQR) (b)                 7.2 (3.8–40.9)          2.5 (1.4–4.1)         3.8 (1.9–12.0)
CA 19-9 (U/mL), median (IQR) (c)              74 (18.5–1869)          58 (24–285)           58 (19–319)
AJCC stage (d)
  I–II                                        0 (0%)                  13 (24.1%)            13 (17.8%)
  III                                         2 (10.5%)               39 (72.2%)            41 (56.2%)
  IV                                          17 (89.5%)              2 (3.7%)              19 (26.0%)
Grade (e)
  Well                                        1 (5.3%)                2 (3.7%)              3 (4.1%)
  Moderate                                    11 (57.9%)              33 (61.1%)            44 (60.3%)
  Poor                                        7 (36.8%)               19 (35.2%)            26 (35.6%)
Immunomorphologic subtype
  INT                                         5 (26.3%)               16 (28.1%)            21 (27.6%)
  PB                                          13 (68.4%)              37 (64.9%)            50 (65.8%)
  Mixed                                       1 (5.3%)                4 (7.0%)              5 (6.6%)
T-stage (f)
  T1–2                                        –                       15 (28.9%)            –
  T3                                          –                       37 (71.2%)            –
  T4                                          –                       2 (3.7%)              –
Nodal positivity (g)                          –                       39 (75.0%)            –
Tumor size (cm), median (IQR) (h)             –                       2.3 (1.7–2.8)         –
Margin negative (R0) (i)                      –                       49 (89.1%)            –
Lymphovascular invasion (j)                   –                       39 (75.0%)            –
Perineural invasion (k)                       –                       37 (72.6%)            –
Chemotherapy
  Neoadjuvant                                 –                       3 (5.3%)              –
  Adjuvant                                    –                       40 (70.2%)            –
Radiation, adjuvant                           –                       17 (33.3%)            –

(a) Age data missing for 2 patients.

(b) CEA data missing for 34 patients.

(c) CA 19-9 data missing for 29 patients.

(d) AJCC stage data missing for 3 patients.

(e) Grade data missing for 3 patients.

(f) T-stage data missing for 5 patients.

(g) Nodal-positivity data missing for 5 patients.

(h) Tumor-size data missing for 7 patients.

(i) Margin data missing for 2 patients.

(j) Lymphovascular-invasion data missing for 5 patients.

(k) Perineural-invasion data missing for 6 patients.

Generating a hidden-genome model of ampullary adenocarcinoma

To visualize the discriminative signals of the hidden-genome methodology when applied to the three-class training set, we performed a principal component analysis of all active predictors (i.e., selected in group-lasso) in the fitted model (Supplementary Fig. S1). Following a Uniform Manifold Approximation and Projection (UMAP) analysis (20) on the resultant first 50 principal components, an approximate two-dimensional embedding of all active predictors for each training-set tumor was created (Fig. 1A). There was distinct separation between colorectal and pancreatic tumors, suggesting unique genomic signatures of these two cancer types. Distal bile-duct adenocarcinomas, in contrast, did not harbor strong tissue-specific signals but rather a mixture of colorectal- and pancreatic-specific genomic information. Still, many fell “closer” to pancreatic tumors, consistent with their established histologic similarities (21, 22). A complete heatmap displaying the values of all active predictors in the hidden-genome model across all training-set tumors (grouped by cancer sites) is displayed in Supplementary Fig. S2.

Figure 1.

Visualizing the training data and the fitted three-class hidden-genome model. A, Scatter plots showing two-dimensional embeddings of the lasso-selected active genomic predictors in the training data. Each point in each scatter represents the genomic profile of a single training-set tumor and is color-coded according to its cancer site. For each tumor, the lasso-selected active genomic features from the fitted hidden-genome model are first collected from the training data; then the first 50 principal components of these active genomic features are computed; finally, two-dimensional UMAPs of these 50 principal components are computed to produce approximate two-dimensional representations of the genomic profiles of all training-set tumors. B, Log one-vs.-rest ORs of top 5 predictors (with largest absolute log ORs) in each predictor group in the fitted hidden-genome model. Each bar represents the change in the log odds of a tumor being classified into the corresponding cancer site, relative to not being classified into that site, for a one-standard deviation increase in the associated predictor from its mean, while keeping all other predictors fixed at their respective means. Predictors from the same group (variants, genes, SBSs, gene copy-number loss/gain indicators, and cytoband copy-number average log ratios) are grouped. The cancer-type–specific log ORs due to one-standard deviation increases from the means of individual predictors while keeping all other predictors fixed at their respective means are plotted for the 5 most informative predictors in each predictor group. C, The precision-recall AUCs for one-vs.-rest comparisons specific to each training cancer site category, obtained from the pre-validated hidden-genome multinomial logistic predictive probabilities for tumors in that site, are plotted as horizontal blue bars. The final row displays the ‘Average’ of the individual class-specific AUCs. The darkened area on each bar represents the corresponding baseline AUC associated with a null classifier that randomly assigns class labels to tumors. CNA, copy number alterations.

To visualize the tissue-site–specific signals of the most informative predictors in the training data set, we plotted the estimated average ORs for a tumor being classified into a given site relative to not being classified into that site (one-vs.-rest, Supplementary Methods; Fig. 1B). KRAS had a large positive OR for pancreatic cancer, with the hotspots KRAS G12D, KRAS G12V, and the more pancreas-specific KRAS G12R hotspot providing additional discriminative information captured by the “residual” effects at the variant level. However, hotspot KRAS G13D was more specific to colorectal cancer, where its residual effect produced a large positive log OR. In contrast, the APC gene had a large OR for colorectal cancer, while its hotspot APC I1307K had a small OR for that site. There were two predictors with large positive log ORs in distal bile-duct adenocarcinoma, namely, copy-number gain in the MDM2 gene and the SBS-96 category C>T T.T. Complete log ORs of all active predictors in the fitted model are provided as Supplementary data.
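The one-vs.-rest log OR plotted in Fig. 1B can be illustrated with a small multi-logistic (softmax) sketch. It assumes standardized predictors, so that holding all predictors at their means leaves only the class intercepts and a one-standard deviation increase in one predictor adds that predictor's class-specific coefficient. The intercepts and coefficients below are hypothetical and are not the fitted values from the study.

```python
import math


def softmax(scores):
    """Convert class scores (linear predictors) into probabilities."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def logit(p):
    return math.log(p / (1 - p))


def one_vs_rest_log_or(intercepts, coefs):
    """Change in log odds of each class (vs. all others) when one
    standardized predictor moves from its mean to mean + 1 SD."""
    base = softmax(intercepts)
    shifted = softmax({k: intercepts[k] + coefs[k] for k in intercepts})
    return {k: logit(shifted[k]) - logit(base[k]) for k in intercepts}


# Hypothetical values for a KRAS-like predictor (illustration only).
intercepts = {"colorectal": 0.8, "pancreatic": 0.0, "distal bile duct": -1.8}
coefs = {"colorectal": -0.5, "pancreatic": 1.2, "distal bile duct": 0.0}

log_ors = one_vs_rest_log_or(intercepts, coefs)
for site, value in log_ors.items():
    print(site, round(value, 2))
```

Note that a class with a zero coefficient can still have a nonzero one-vs.-rest log OR, because its probability changes when the other classes gain or lose weight.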

Finally, to assess the predictive accuracy of the fitted hidden-genome model, we computed one-vs.-rest precision-recall AUC for each training site, and subsequently obtained as an overall measure the average of the site-specific AUCs, through prevalidated (23) predictive probabilities (see Supplementary Methods). Note that a precision-recall AUC, unlike an ROC AUC, adjusts for class-size imbalances (which necessarily occur in one-vs.-rest comparisons) and thus produces a robust measure of predictive performance of a multi-class classifier. As depicted in Fig. 1C, the fitted hidden-genome model achieved near perfect predictive accuracies in the colorectal (AUC = 0.99) and pancreatic (AUC = 0.94) genomic groups. The bile-duct group, in contrast, had a noticeably smaller AUC (0.46), likely reflecting the absence of bile-duct–specific discriminative genomic signals, heterogeneity, and small sample size. The precision-recall AUCs achieved by the hidden-genome classifier were well above the null baseline values across all cancer sites. The average of the site-specific AUCs was 0.79 (corresponding to a null baseline of 0.33), demonstrating strong overall classification performance of the fitted model.
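The null baselines quoted above can be reproduced from the training-set sizes if one assumes, as is standard, that the expected precision of a classifier assigning labels at random equals the class prevalence. The sketch below also includes average precision as one common estimator of the precision-recall AUC; the study's exact estimator is described in the Supplementary Methods and may differ, and the labels and scores used here are hypothetical.

```python
# Training-set sizes reported for the three-class model.
training_counts = {"colorectal": 2245, "pancreatic": 1001, "distal bile duct": 165}
total = sum(training_counts.values())

# Assumed null baseline: class prevalence in the one-vs.-rest comparison.
baselines = {site: n / total for site, n in training_counts.items()}
average_baseline = sum(baselines.values()) / len(baselines)

for site, value in baselines.items():
    print(site, round(value, 3))
print("average", round(average_baseline, 2))  # 0.33


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Hypothetical one-vs.-rest labels and predicted probabilities.
ap = average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
```

Under this assumption the per-site baselines are unequal (lowest for the small distal bile-duct class), but their average is exactly one third for any three-class problem, which matches the 0.33 quoted in the text.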

Analogous heatmaps, UMAPs, and ORs for the active predictors, along with precision-recall AUCs for the fitted four-class training-set models (with gastric and small bowel as the additional training sites, respectively), are displayed in Supplementary Figures S3–S6. The additional sites, gastric and small bowel, lacked strong discriminative signals and, similar to distal bile-duct tumors, harbored mixtures of pancreatic- and colorectal-specific genomic signals (Supplementary Figures S3A and S5A). This was reflected in smaller one-vs.-rest ORs (in absolute log scale; Supplementary Figures S3B and S5B) and precision-recall AUCs (Supplementary Figures S3C and S5C).

Predicting parental sites for ampullary tumors

Applying the trained model to the cohort of 76 ampullary adenocarcinomas, we observed a high degree of concordance between the genomic prediction and pathologic subtype (Fig. 2A). For INT subtype classified by histology and IHC, 76.2% of the samples were genomically predicted to be colorectal site of origin based on the hidden-genome model, with a median predicted probability of 80%. For the PB-subtype adenocarcinoma, 56.0% were genomically predicted as pancreatic site of origin, with a median predicted probability of 55%, representing a higher degree of heterogeneity. The distal bile-duct adenocarcinoma genomic signature was rarely the dominant profile, with a median predicted probability of 5% for INT subtypes and 10% for PB subtypes. Of note, addition of gastric and small-bowel adenocarcinoma to the three-class genomic model did not improve the diagnostic accuracy of the system (Supplementary Fig. S7).

Figure 2.

Application of hidden-genome methodology to test set of ampullary adenocarcinoma specimens. A, Boxplot for hidden-genome–predicted probabilities stratified by immunomorphologic subtype. Separately for ampullary tumors in the three pathologic subtype groups, i.e., intestinal (INT), mixed, and pancreatobiliary (PB) (displayed across columns), the predicted probabilities (along the vertical axis) for colorectal, distal bile-duct, and pancreatic cancer types (along the horizontal axis) obtained from the fitted hidden-genome model are plotted as boxplots. The plots show that most ampullary tumors with INT histologic subtype have high predicted probabilities for colorectal, and most PB subtypes have high pancreatic predicted probabilities. B, Swimmer plot displaying soft and hard classification probabilities. For all ampullary tumors (plotted across rows; the tumor IDs are displayed along the left-most column), the predicted probabilities of having colorectal (blue), distal bile duct (orange), and pancreatic (pink) as their parental cancer types are plotted along the 2nd column (from left), and the predicted hard classes are plotted along the 3rd column (from left). The “indeterminate” hard classes are displayed as gray bars. The right-most column shows the immunohistologic classifications, which are used to group the rows. C, Table of genomic hard classification and the associated immunomorphologic classification.

The predicted probabilities from the genomic classifier for each patient sample are summarized in a probability swimmer plot (multiple horizontal barplots with a common horizontal axis), stratified by immuno-histologic subtype (Fig. 2B). The INT subtypes predominantly showed a colorectal genomic profile, although there was a wide range (3%–100%). There was similar heterogeneity for the PB subtypes; the pancreatic genomic profile ranged from 1% to 98%. Interestingly, of the 5 patients with mixed INT-PB subtype, nearly all (n = 4) had a dominant genomic profile characterized by the colorectal signature (range, 72%–99%).

Genomic-predicted probabilities were converted to a hard classification by assigning each tumor to the single category whose predicted probability reached the 50% threshold. Overall, concordance between the hard classification and immuno-histologic subtype was 73.2% (Fig. 2C). Of the 21 patients with INT subtype by histology and IHC, 16 (76.2%) had a dominant genomic profile consistent with colorectal. For example, case P-0003602 was INT subtype by IHC, which was concordant with a colorectal genotype (100% colorectal; 0% pancreatic; 0% distal bile duct). There were classic colorectal-cancer genomic features, including KRAS G12C mutation and gain of chromosome 13. Similarly, case P-0023740 was INT subtype by IHC, which was concordant with a colorectal genotype (97%) and included PIK3CA mutation and the presence of a microsatellite instability (MSI) signature (score = 46). In contrast, PB had greater genomic heterogeneity. Of the 50 patients with PB subtype by histology and IHC, 28 patients (56.0%) had a dominant genomic profile defined by the pancreatic genotype. The remaining patients included 7 colorectal, 8 distal bile-duct, and 7 indeterminate. Hard classification into the distal bile-duct subset was infrequent, and nearly always from the PB subtype (8 of 9; 89%). When including small-bowel adenocarcinoma in the four-class classifier, the predicted probability of small bowel was low (range 0%–6%) and never the predicted hard-classified genomic origin (Supplementary Fig. S8).
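The text does not break down how the 73.2% concordance was computed. One reading that reproduces the figure from the counts given above, offered here as an assumption rather than the authors' stated method, treats PB tumors hard-classified as either pancreatic or distal bile duct as concordant and excludes the 5 mixed-subtype tumors from the denominator.

```python
# Counts taken from the text; the grouping below is an assumed reading.
int_to_colorectal = 16   # INT tumors hard-classified as colorectal
pb_to_pancreatic = 28    # PB tumors hard-classified as pancreatic
pb_to_bile_duct = 8      # PB tumors hard-classified as distal bile duct

concordant = int_to_colorectal + pb_to_pancreatic + pb_to_bile_duct
evaluable = 21 + 50      # INT + PB tumors; mixed subtype excluded

concordance_pct = round(100 * concordant / evaluable, 1)
print(concordance_pct)  # 73.2
```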

Determining the most influential predictors of ampullary tumors

We used the Jensen–Shannon importance metric (Supplementary Methods) to identify the five most influential predictors in each individual tumor hard prediction, collectively producing a list of 64 unique predictors with the largest influences across all tumors (Fig. 3). The KRAS gene had a strong influence on nearly all pancreatic hard predictions. Conversely, colorectal hard predictions were influenced by both the APC gene mutations and tumor mutational burden (TMB). Finally, distal bile-duct adenocarcinoma hard predictions were strongly influenced by copy-number gains in the MDM2 gene.
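The exact definition of the Jensen–Shannon importance metric is given in the Supplementary Methods and is not reproduced here. As a rough sketch, one can assume that the importance of a predictor for a given tumor is the Jensen–Shannon divergence between the predicted class probabilities with the predictor at its observed value and with the predictor reset to a reference value. The divergence itself is standard; the probability vectors below are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen–Shannon divergence (log base 2) between two discrete
    probability distributions given as equal-length sequences."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical class probabilities (colorectal, pancreatic, distal bile duct).
with_predictor = [0.10, 0.85, 0.05]     # predictor at its observed value
without_predictor = [0.45, 0.40, 0.15]  # predictor reset to a reference value

importance = js_divergence(with_predictor, without_predictor)
print(round(importance, 3))
```

With base-2 logarithms the divergence is bounded between 0 (identical predictions) and 1 (completely different predictions), which makes values comparable across predictors and tumors.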

Figure 3.

Visualizing the effects of most influential predictors in individual predictions of ampullary tumors. The Jensen–Shannon importance metrics for 64 most influential predictors (along the columns) are plotted against all 76 ampullary tumors (along the rows) as a heatmap. The rows (tumors) are grouped according to their predicted hard classes, and the columns (predictors) are grouped according to their types.

Prognostication by genomic profile

After a median follow-up of 26.9 months (IQR 13.8–45.6), median OS was 50.6 months (IQR 34.6–85.63). There was no significant difference in OS between the INT and PB subtypes as defined by histology and IHC, overall or in the surgically resected cohort (log-rank P = 0.129 and P = 0.783, respectively; Fig. 4A and B). In contrast, in the cohort of patients classified into the colorectal or pancreatic hard genomic groups (n = 58), there was a trend toward improved survival in patients in the colorectal genomic group (P = 0.089; Fig. 4C). Moreover, the predicted genomic probabilities of the colorectal and pancreatic subtypes correlated with predicted 72-month survival probability (Fig. 4D). In the bivariate gradient plot, genomic scores with higher colorectal probability were associated with higher survival probability, whereas higher pancreatic probabilities were associated with lower survival probability.

Figure 4.

Prognostication according to hidden-genome predicted probabilities. A, Kaplan–Meier survival plot of OS in all patients, stratified by immunomorphologic subtype. B, Kaplan–Meier survival plot of OS in resected patients, stratified by immunomorphologic subtype. C, Kaplan–Meier survival plot of OS, stratified by genomic hard classification of colorectal vs. pancreatic subtypes. D, Bivariate gradient plot of predicted probability of genomic category (pancreatic: x-axis; colorectal: left y-axis) and predicted survival at 72 months (color gradation; right y-axis). The immunomorphologic category is denoted by the shape of each point and summarized in the key.

Application to indeterminate clinical scenarios

There were several patients with mixed PB/INT histology based on IHC staining. We queried survival of these patients with mixed subtypes according to the dominant genomic profile present. Patient P-0022573 had morphologic features of PB subtype, yet IHC staining was predictive of INT subtype (MUC1+, MUC2+, CK20+, CDX2+). The genomic classifier predicted a higher likelihood of pancreatic genotype, with a canonical KRAS G12D mutation. The patient's clinical course reflected this, with recurrence 5.5 months and death 19.5 months after resection. Patient P-0035477 had morphologic features of mixed subtype along with ambiguous IHC (MUC1+, CK20+, CDX2). The genomic classifier predicted intestinal subtype with 98% probability; there were a total of 72 mutations (62.3/M) with an MSH2 germline variant and MSI (score = 43). The genomic profile correctly predicted a favorable clinical course, with the patient remaining disease-free for over 10 years after resection. Patient P-0012334 had a morphologically mixed tumor, but IHC was not performed because of inadequate tissue. After application of the genomic algorithm, a poor prognosis was predicted by the distal bile-duct (75%) and pancreatic (22%) subtypes; the patient died 18 months after diagnosis. Patient P-0002503 had mixed morphology and also inadequate tissue for IHC. The patient survived 71 months after resection, consistent with the 70% predicted probability of colorectal genomic subtype.

We developed a genomic classifier trained on the mutational profiles of related gastrointestinal malignancies to stratify ampullary adenocarcinomas using routinely collected genomic-sequencing data. The genomic classifier had high concordance with existing histologic and IHC subtypes and improved quantification of the heterogeneity present in individual patient tumors, which may underlie the range of clinical phenotypes observed. Such genomic knowledge provides therapeutic targets, both immediately actionable ones and potential candidates, and may, following validation in larger cohorts, help guide diagnosis and prognosis.

Our genomic classifier used multilevel meta-feature regression to extract both common and rare variants in training data and incorporate previously unobserved variants when applied to ampullary adenocarcinoma samples. Such methodology has previously been applied successfully to classify “unknown” tumors (1), but this study represents the first successful application for tumor-subtype classification. The genomic-subtype classifier relied on common differences between pancreatic, distal bile-duct, and colorectal adenocarcinomas, which were recapitulated in distinct clinical subtypes of ampullary adenocarcinoma. There was a high degree of concordance between the intestinal immunomorphological subtype and the colorectal genomic profile, characterized by APC and PIK3CA gene mutations, TMB, MSI mutational signatures, and copy-number gains of chromosomes 13 and 20 – all well-known genomic characteristics of colorectal tumors and small-bowel adenocarcinomas (16). Likewise, the pancreatobiliary immunomorphological subtype was most frequently associated with the pancreatic genomic profile, with frequent KRAS mutations. These results align with recent findings that disruptions in Wnt signaling (most commonly by APC mutation) and MSI are more frequently observed in INT subtype tumors, whereas PB subtypes are more likely to harbor KRAS and TP53 mutations (24, 25). Perkins and colleagues also demonstrated that KRAS mutations were more frequent in PB subtype defined by immunomorphology (12); however, other genomic distinctions between the subtypes were not identified, likely due to the rarity of any single variant. This is a common limitation of single-gene mutational analyses, which is overcome by the genomic classifier methodology that additionally utilizes rare genomic variants.

Certain genomic alterations identified herein hold immediate therapeutic value. For instance, TMB was a strong determinant of the colorectal hard classifier. Checkpoint blockade has demonstrated activity in tumors with high TMB (26), and may hold therapeutic value for these subsets. A point of future study will be to determine whether a pancreatic versus colorectal genomic subtype warrants a tailored systemic chemotherapeutic regimen (e.g., FOLFIRINOX for pancreatic-subtype ampullary adenocarcinoma given its successes in the adjuvant treatment of resectable pancreatic adenocarcinoma).

Additionally, the genomic classifier improved diagnosis in patients with mixed-morphology tumors. Even after refinement of histologic classification based on IHC criteria, there was a small cohort that could not be binarized into an intestinal/pancreatobiliary classification. True mixed-type ampullary carcinomas thus appear to be a clinical entity, and the genomic analysis demonstrated the significant heterogeneity that may underlie the inability to “fit” certain tumors into a hard two-category classification. Here, the genomic classifier was particularly useful in guiding prognostication by quantifying the relative proportions of intestinal, pancreatic, and bile duct profiles in a given sample. Importantly, there was a continuum of survival outcomes that was linearly related to the relative proportion of favorable intestinal and unfavorable pancreatobiliary genomic profiles in any given tumor sample.

It remains to be determined in larger, independent samples whether the genomic classifier can reliably be used for prognostication. We observed a trend toward improved OS in the colorectal versus the pancreatic genomic group, and the genomic grouping outperformed traditional histologic subtypes in predicting long-term survival. The PB subtype has been associated with inferior survival outcomes relative to the INT subtype in some (7, 27) but not all studies (4, 28, 29). Genomic profiles that account for marked heterogeneity may allow for more accurate and consistent classification with substratification into prognostically distinct groups. Multi-institutional collaboration will be required to adequately study this rare disease entity and to evaluate the genomic classifier in a larger data set where it can be incorporated into multivariable models along with known prognostic variables.
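For readers less familiar with the survival comparisons discussed here, the Kaplan–Meier (product-limit) estimate of OS can be sketched in a few lines of Python. This is a generic textbook estimator, not the analysis code used in the study; the function name `kaplan_meier` and the toy follow-up data are hypothetical and do not correspond to any patient in the cohort.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient (e.g., months)
    events : 1 if death was observed at that time, 0 if censored
    Returns (time, estimated survival) at each time with >= 1 death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time; patients
        # censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical toy data (months of follow-up, event indicator).
toy_times = [6, 12, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t} months, estimated survival = {s:.2f}")
```

Applying the same function separately to each genomic or histologic group yields stratified curves of the kind described in the figure; a formal comparison between groups would additionally require a test such as the log-rank test, which is not shown.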

Several limitations warrant emphasis. First, MSK-IMPACT is increasingly used at our institution, but there is nonrandom referral for targeted sequencing, which may introduce bias into the cases included in these analyses. Second, small-bowel adenocarcinoma may have been a preferable training input to colorectal adenocarcinoma, given its anatomic proximity to the ampulla. However, evaluation of small-bowel adenocarcinoma in our expanded four-class model showed that it did not have a distinct genomic signature apart from colorectal adenocarcinoma and added little value to site-of-origin prediction. Third, tumor heterogeneity may impact the precision of the genomic classifier; we cannot rule out that geographic mapping of subclones would identify differing genomic profiles in different regions of the same tumor (25). Fourth, the majority of specimens were collected at resection, but a minority of genomic sequencing was performed on a biopsy of a metastasis or on the primary tumor after systemic therapy. Genomic profiles evaluated by MSK-IMPACT are conserved between primary and metastatic lesions for several malignancies (30, 31), but this has not been adequately studied for ampullary adenocarcinoma. Lastly, the applicability of this hidden-genome framework to other NGS platforms remains an open question. The MSK-IMPACT assay used at our institution interrogates over 400 genes, but these genes are covered by the majority of large NGS assays because they are commonly mutated in cancer. By comparison, the Foundation Medicine platform analyzes 324 similar genes, including MDM2 copy-number gains. We anticipate that the hidden-genome algorithm can be retooled for other NGS platforms, and are seeking such data to test this hypothesis.

Our analyses suggest that genomic criteria can assist in the accurate diagnosis and prognostication of ampullary adenocarcinoma. The genomic heterogeneity shown in our model may be related to multiple cells of origin, and the identification of broad differences between genomic subtypes suggests potential subtype-specific therapeutic strategies that may improve survival for these patients.

S. Chakraborty reports grants from NCI during the conduct of the study. J.A. Drebin's spouse is employed by American Regent Pharmaceuticals. T.P. Kingham reports personal fees from Olympus Surgical outside the submitted work. A.C. Wei reports personal fees from Histosonics, Medtronic, AstraZeneca, and Celgene and nonfinancial support from Bayer and Intuitive Surgical outside the submitted work. No disclosures were reported by the other authors.

The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

S. Chakraborty: Conceptualization, resources, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft. B.L. Ecker: Conceptualization, data curation, visualization, writing–original draft, project administration, writing–review and editing. K. Seier: Software, formal analysis, visualization, writing–review and editing. V.G. Aveson: Conceptualization, resources, data curation, funding acquisition, writing–review and editing. V.P. Balachandran: Supervision, writing–review and editing. J.A. Drebin: Supervision, writing–review and editing. M.I. D'Angelica: Supervision, writing–review and editing. T.P. Kingham: Supervision, writing–review and editing. C.S. Sigel: Data curation, formal analysis, investigation, methodology, writing–review and editing. K.C. Soares: Supervision, writing–review and editing. E. Vakiani: Supervision, writing–review and editing. A.C. Wei: Supervision, writing–review and editing. R. Chandwani: Data curation, formal analysis, supervision, validation, methodology, writing–review and editing. M. Gonen: Conceptualization, methodology, project administration, writing–review and editing. R. Shen: Conceptualization, resources, software, formal analysis, supervision, methodology, project administration, writing–review and editing. W.R. Jarnagin: Conceptualization, resources, supervision, writing–original draft, project administration, writing–review and editing.

This work was funded by NCI awards P30 CA008748, R01 CA251339 (to R. Shen), and U01 CA238444 (to W.R. Jarnagin).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Chakraborty S, Begg CB, Shen R. Using the “Hidden” genome to improve classification of cancer types. Biometrics 2020 Sep 11 [Epub ahead of print].
2. Chakraborty S, Martin A, Guan Z, Begg CB, Shen R. Mining mutation contexts across the cancer genome to map tumor site of origin. Nat Commun 2021;12:3051.
3. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;174:1034–5.
4. Ecker BL, Vollmer CM, Behrman SW, Allegrini V, Aversa J, Ball CG, et al. Role of adjuvant multimodality therapy after curative-intent resection of ampullary carcinoma. JAMA Surg 2019;154:706–14.
5. Howe JR, Klimstra DS, Moccia RD, Conlon KC, Brennan MF. Factors predictive of survival in ampullary carcinoma. Ann Surg 1998;228:87–94.
6. Berberat PO, Künzli BM, Gulbinas A, Ramanauskas T, Kleeff J, Müller MW, et al. An audit of outcomes of a series of periampullary carcinomas. Eur J Surg Oncol 2009;35:187–91.
7. Neoptolemos JP, Moore MJ, Cox TF, Valle JW, Palmer DH, McDonald AC, et al. Effect of adjuvant chemotherapy with fluorouracil plus folinic acid or gemcitabine vs observation on survival in patients with resected periampullary adenocarcinoma: the ESPAC-3 periampullary cancer randomized trial. JAMA 2012;308:147–56.
8. O'Connell JB, Maggard MA, Manunga J, Tomlinson JS, Reber HA, Ko CY, et al. Survival after resection of ampullary carcinoma: a national population-based study. Ann Surg Oncol 2008;15:1820–7.
9. Robert P-E, Leux C, Ouaissi M, Miguet M, Paye F, Merdrignac A, et al. Predictors of long-term survival following resection for ampullary carcinoma: a large retrospective French multicentric study. Pancreas 2014;43:692–7.
10. Chang DK, Jamieson NB, Johns AL, Scarlett CJ, Pajic M, Chou A, et al. Histomolecular phenotypes and outcome in adenocarcinoma of the ampulla of Vater. J Clin Oncol 2013;31:1348–56.
11. Reid MD, Balci S, Ohike N, Xue Y, Kim GE, Tajiri T, et al. Ampullary carcinoma is often of mixed or hybrid histologic type: an analysis of reproducibility and clinical relevance of classification as pancreatobiliary versus intestinal in 232 cases. Mod Pathol 2016;29:1575–85.
12. Perkins G, Svrcek M, Bouchet-Doumenq C, Voron T, Colussi O, Debove C, et al. Can we classify ampullary tumours better? Clinical, pathological and molecular features. Results of an AGEO study. Br J Cancer 2019;120:697–702.
13. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, et al. Memorial Sloan Kettering-integrated mutation profiling of actionable cancer targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn 2015;17:251–64.
14. Nagtegaal ID, Odze RD, Klimstra D, Paradis V, Rugge M, Schirmacher P, et al. The 2019 WHO classification of tumours of the digestive system. Histopathology 2020;76:182–8.
15. Ang DC, Shia J, Tang LH, Katabi N, Klimstra DS. The utility of immunohistochemistry in subtyping adenocarcinoma of the ampulla of Vater. Am J Surg Pathol 2014;38:1371–9.
16. Schrock AB, Devoe CE, McWilliams R, Sun J, Aparicio T, Stephens PJ, et al. Genomic profiling of small-bowel adenocarcinoma. JAMA Oncol 2017;3:1546–53.
17. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101.
18. Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet 2003;12:1037–44.
19. Cheung VG, Nowak N, Jang W, Kirsch IR, Zhao S, Chen XN, et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 2001;409:953–8.
20. Becht E, McInnes L, Healy J, Duterte CA, Kwok IWH, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 2018 Dec 3 [Epub ahead of print].
21. Collins AL, Wojcik S, Liu J, Frankel WL, Alder H, Yu L, et al. A differential microRNA profile distinguishes cholangiocarcinoma from pancreatic adenocarcinoma. Ann Surg Oncol 2014;21:133–8.
22. Takenami T, Maeda S, Karasawa H, Suzuki T, Furukawa T, Morikawa T, et al. Novel biomarkers distinguishing pancreatic head cancer from distal cholangiocarcinoma based on proteomic analysis. BMC Cancer 2019;19:318.
23. Tibshirani RJ, Efron B. Pre-validation and inference in microarrays. Stat Appl Genet Mol Biol 2002;1:Article1.
24. Gingras M-C, Covington KR, Chang DK, Donehower LA, Gill AJ, Ittmann MM, et al. Ampullary cancers harbor ELF3 tumor suppressor gene mutations and exhibit frequent WNT dysregulation. Cell Rep 2016;14:907–19.
25. Yachida S, Wood LD, Suzuki M, Takai E, Totoki Y, Kato M, et al. Genomic sequencing identifies ELF3 as a driver of ampullary carcinoma. Cancer Cell 2016;29:229–40.
26. Marabelle A, Fakih M, Lopez J, Shah M, Shapira-Frommer R, Nakagawa K, et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol 2020;21:1353–65.
27. Westgaard A, Tafjord S, Farstad IN, Cvancarova M, Eide TJ, Mathisen O, et al. Pancreatobiliary versus intestinal histologic type of differentiation is an independent prognostic factor in resected periampullary adenocarcinoma. BMC Cancer 2008;8:170.
28. Zhou H, Schaefer N, Wolff M, Fischer HP. Carcinoma of the ampulla of Vater: comparative histologic/immunohistochemical classification and follow-up. Am J Surg Pathol 2004;28:875–82.
29. Colussi O, Voron T, Pozet A, Hammel P, Sauvanet A, Bachet JB, et al. Prognostic score for recurrence after Whipple's pancreaticoduodenectomy for ampullary carcinomas; results of an AGEO retrospective multicenter cohort. Eur J Surg Oncol 2015;41:520–6.
30. Brannon AR, Vakiani E, Sylvester BE, Scott SN, McDermott G, Shah RH, et al. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions. Genome Biol 2014;15:454.
31. Yaeger R, Chatila WK, Lipsyc MD, Hechtman JF, Cercek A, Sanchez-Vega F, et al. Clinical sequencing defines the genomic landscape of metastatic colorectal cancer. Cancer Cell 2018;33:125–36.