Abstract
The clinical behavior of ampullary adenocarcinoma varies widely. Targeted tumor sequencing may better define biologically distinct subtypes to improve diagnosis and management.
The hidden-genome algorithm, a multilevel meta-feature regression model, was trained on a prospectively sequenced cohort of 3,411 patients (1,001 pancreatic adenocarcinoma, 165 distal bile-duct adenocarcinoma, 2,245 colorectal adenocarcinoma) and subsequently applied to targeted panel DNA-sequencing data from ampullary adenocarcinomas. Genomic classification (i.e., colorectal vs. pancreatic) was correlated with standard histologic classification [i.e., intestinal (INT) vs. pancreatobiliary (PB)] and clinical outcome.
Colorectal genomic subtype prediction was primarily influenced by mutations in APC and PIK3CA, tumor mutational burden, and DNA mismatch repair (MMR)–deficiency signature. Pancreatic genomic-subtype prediction was dictated by KRAS gene alterations, particularly KRAS G12D, KRAS G12R, and KRAS G12V. Distal bile-duct adenocarcinoma genomic subtype was most influenced by copy-number gains in the MDM2 gene. Despite high (73%) concordance between immunomorphologic subtype and genomic category, there was significant genomic heterogeneity within both histologic subtypes. Genomic scores with higher colorectal probability were associated with greater survival compared with those with a higher pancreatic probability.
The genomic classifier provides insight into the heterogeneity of ampullary adenocarcinoma and improves stratification, which is dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing and obviates subjective histologic interpretation.
Ampullary adenocarcinomas are classified into intestinal or pancreatobiliary subtypes based on histologic criteria, with potentially different clinical behavior. The incorporation of targeted tumor sequencing may better define biologically distinct phenotypes of ampullary adenocarcinoma to improve clinical diagnosis and management. Following training of the hidden-genome algorithm, a multilevel meta-feature regression model, on related malignancies of the pancreas, distal bile duct, and intestine, the molecular taxonomy was applied to an institutional cohort of patients with ampullary adenocarcinoma. The genomic-classifier methodology revealed significant heterogeneity among ampullary cancers. The genomic classifier better stratified the divergent outcomes in ampullary cancer, which were dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing, is not subject to subjective histologic interpretation, and holds promise for improving identification of distinct clinical subtypes for risk stratification and may guide selection for multimodality therapies.
Introduction
Accurate classification of tumors is essential to guide management and inform prognosis. Traditionally, site of origin and histopathologic subtype have defined a tumor and its expected clinical phenotype. Targeted tumor sequencing is an increasingly available technology that allows more precise classification utilizing patterns of genomic alterations. Moreover, such analyses may help quantify tumor heterogeneity, which is important not only for understanding tumor biology but also differences in clinical behavior and potential tailored treatment.
The hidden-genome classifier (1, 2) is a powerful tool that offers deeper insight into tumor biology. Although there is a robust literature describing the association between specific genomic alterations and individual tumor types, frequent and highly conserved variants comprise a small number of observed variants. Deep sequencing has revealed millions of unique somatic mutations, and often, more than 90% of somatic variants are singletons (3). The hidden-genome classifier uses multilevel meta-feature regression to utilize both common and rare variants and incorporates previously unobserved variants to determine cancer type (1).
Ampullary adenocarcinoma is a heterogeneous disease that may benefit from hidden-genome methodology. Ampullary adenocarcinoma is an uncommon malignancy of the periampullary region (4–7), the junction of the biliary, pancreatic, and digestive tracts. As such, adenocarcinomas arising within the duodenal ampullary complex have variable clinical phenotypes, likely due to differences in cell of origin (8). Although two major histologic types – intestinal (INT) and pancreatobiliary (PB) – have been identified (9, 10), morphology-based subtype classification is unreliable for prognostication (11). Additionally, molecular genotyping has not improved prognostic stratification relative to histology alone (12). Thus, identification of distinct clinical subtypes for risk stratification and improving selection for multimodality therapies is a critical unmet need.
Herein, we report a molecular taxonomy of ampullary adenocarcinoma using hidden-genome methodology based on a broad array of genomic alterations identified by targeted tumor sequencing. Additionally, we used the model to quantify the degree of genomic heterogeneity in individual samples, which included therapeutically actionable alterations.
Materials and Methods
Patient cohort
After institutional review board (IRB) approval, we identified consecutive patients treated for ampullary adenocarcinoma at Memorial Sloan Kettering Cancer Center (MSKCC) who had targeted next-generation sequencing (NGS) of their tumor. All specimens were sequenced using the Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) assay, a clinically validated targeted NGS panel that can detect mutations, copy-number alterations, and select rearrangements (13). Demographic, clinical, pathologic, and outcome data were abstracted from an institutional database and patients' medical records. All patients provided written informed consent. The study was conducted in accordance with the US Common Rule.
Pathologic assessment of ampullary adenocarcinomas
Tumors were reviewed by a gastrointestinal pathologist blinded to the genotypic classifier. The histologic category was assigned using established criteria (14) as INT, PB, or mixed INT/PB (mixed). The Ang immunophenotypic classification was assigned as “INT”, “PB”, or “ambiguous”, as previously described (15).
Hidden-genome classification
For probabilistic classification of the ampullary tumors based on their genomic profiles, we trained three hidden-genome group-lasso regularized multi-logistic models – (i) a three-class model trained on 2,245 colorectal adenocarcinomas, 165 distal bile-duct adenocarcinomas, and 1,001 pancreatic adenocarcinomas (n = 3,411), (ii) a four-class model that added 254 gastric tumors (n = 3,665) to the three-class model, and (iii) a four-class model that added 69 small-bowel adenocarcinomas (n = 3,480) to the three-class model. The training cohort included patients treated at MSKCC with available MSK-IMPACT sequencing of their primary tumor. Various descriptive statistics for the training data cohort are provided in Supplementary Table S1. Colorectal adenocarcinoma was used to represent intestinal genomics in the three-class model, given the rarity of primary small-bowel adenocarcinoma and the overlap of driver mutations (16).
The hidden-genome model utilized the following predictors: (i) normalized binary indicators for 250 discriminative individual variants observed in the MSK-IMPACT cancer-gene panels, (ii) normalized number of mutations observed in each MSK-IMPACT gene, (iii) normalized counts of mutations associated with each of the 96 possible single-base substitution categories [ref. 17; SBS-96; each considering the substitution type (6 possibilities), along with the immediately adjacent 5′ and 3′ flanking bases (4 possibilities each), resulting in $6 \times 4 \times 4 = 96$ categories], (iv) square-root of the total number of mutations observed in the tumor, (v) binary indicators of copy-number loss and gain at each of 476 genes present in the MSK-IMPACT panel, and (vi) average copy-number log-ratio computed at 782 chromosome cytogenetic bands spanning the 22 autosomes (18, 19). Predictors (ii) to (iv) can be interpreted as scalar projections of three meta-features, namely, the gene itself, SBS-96, and an intercept meta-feature vector of 1, respectively (Supplementary Methods), along the direction of the mutation-profile vector. The mutation contexts embodied in these meta-features combine information in the individual variants, including rare variants, thereby permitting a highly informative dimension reduction of the ultra-high dimensional mutation-profile vector. The model also includes several discriminative hotspot variants with substantial residual effects not explained by mutation context.
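The count of 96 substitution categories in predictor (iii) can be checked by direct enumeration. The following is a minimal sketch, not the authors' code: the function name `sbs96_categories` and the bracketed label format (e.g., `T[C>T]T`) are illustrative assumptions, and the six substitution classes are written from the pyrimidine base, as is conventional for SBS-96.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, written from the pyrimidine base of the mutated pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_categories():
    """Return one label per (substitution, 5' flank, 3' flank) combination.

    The label format 5'[REF>ALT]3' is an illustrative choice.
    """
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


categories = sbs96_categories()
print(len(categories))  # 96
```

Under this labeling, a C>T substitution flanked by T on both sides would appear as `T[C>T]T`.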
We used the fitted hidden-genome model to predict parental cancer sites for the sample of ampullary adenocarcinoma specimens. For each ampullary tumor, the predicted class probabilities were used to produce a soft classification of the tumors (i.e., percentage colorectal, pancreatic, and biliary). An associated hard classification was subsequently obtained by assigning each tumor to the class with the highest predicted probability if the highest probability was ≥0.5; otherwise, the hard class was “indeterminate”.
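The soft-to-hard conversion described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the study code: the function name `hard_class`, the class labels, and the two example probability vectors are hypothetical; only the 0.5 threshold and the "indeterminate" fallback come from the text.

```python
def hard_class(probs, threshold=0.5):
    """Map soft class probabilities to a hard label, or 'indeterminate'.

    probs     -- dict of class name -> predicted probability (sums to 1)
    threshold -- minimum top probability required for a hard call
    """
    label, p_max = max(probs.items(), key=lambda kv: kv[1])
    return label if p_max >= threshold else "indeterminate"


# Hypothetical soft classifications, for illustration only.
example_a = {"colorectal": 0.80, "pancreatic": 0.15, "biliary": 0.05}
example_b = {"colorectal": 0.40, "pancreatic": 0.35, "biliary": 0.25}

print(hard_class(example_a))  # colorectal
print(hard_class(example_b))  # indeterminate
```

With three classes, a tumor whose largest predicted probability is below 0.5 has no majority class, which is why it is left unassigned rather than forced into the nearest category.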
Statistical analysis
Continuous data are expressed as medians and interquartile range (IQR) and compared between groups using Wilcoxon rank sum test. Categorical variables are expressed as frequencies and percentages and compared using Fisher exact test. Overall survival (OS) was defined from time of diagnosis to death or last follow-up; for the surgically resected subset, OS was additionally evaluated using time from resection to death or last follow-up. OS was estimated with Kaplan–Meier methods and compared with the log–rank test. All tests were two-sided, and P < 0.05 was considered significant. SAS (version 9.4; SAS Institute) or R (version 4.0.1; R Foundation for Statistical Computing) was used for all analyses.
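For readers unfamiliar with the Kaplan–Meier estimator, a minimal pure-Python sketch of the product-limit calculation is given here. It is not the SAS or R code used in the study; the function name `kaplan_meier` and the five follow-up records are hypothetical and serve only to show how censored observations leave the risk set without lowering the survival estimate.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up durations (e.g., months from diagnosis)
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all records sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored records both leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months), for illustration only.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

In this toy example the estimate steps down at months 5, 8, and 12; the record censored at month 8 still counts as at risk at that time, following the usual convention.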
Results
Patient cohort
A cohort of 76 patients with ampullary adenocarcinoma was identified (Table 1), with a median age of 60.5 years (IQR 50.0–68.0); 62% were male and 57 (75%) underwent resection (pancreaticoduodenectomy). Median tumor diameter was 2.3 cm (IQR 1.7–2.8), and the majority (72.2%) had American Joint Committee on Cancer (AJCC) 8th edition stage III disease. Adverse histopathologic features were common: poor differentiation (n = 19; 35.2%), lymphovascular invasion (n = 39; 75.0%), and perineural invasion (n = 37; 72.6%). By immuno-histologic assessment, 21 patients had INT, 50 patients had PB, and 5 patients had mixed-subtype tumors.
| Variable, N (%) or median (IQR) or mean (SD) | Level | Surgery: no resection (n = 19) | Surgery: resection (n = 57) | Total (n = 76) |
|---|---|---|---|---|
| Age (years), median (IQR)^a | | 61 (51–67) | 60 (48–70) | 60.5 (50–68) |
| Gender, female | | 6 (31.6%) | 23 (40.4%) | 29 (38.2%) |
| CEA (ng/mL), median (IQR)^b | | 7.2 (3.8–40.9) | 2.5 (1.4–4.1) | 3.8 (1.9–12.0) |
| CA 19-9 (U/mL), median (IQR)^c | | 74 (18.5–1869) | 58 (24–285) | 58 (19–319) |
| AJCC stage^d | I–II | 0 (0%) | 13 (24.1%) | 13 (17.8%) |
| | III | 2 (10.5%) | 39 (72.2%) | 41 (56.2%) |
| | IV | 17 (89.5%) | 2 (3.7%) | 19 (26.0%) |
| Grade^e | Well | 1 (5.3%) | 2 (3.7%) | 3 (4.1%) |
| | Moderate | 11 (57.9%) | 33 (61.1%) | 44 (60.3%) |
| | Poor | 7 (36.8%) | 19 (35.2%) | 26 (35.6%) |
| Immunomorphologic subtype | INT | 5 (26.3%) | 16 (28.1%) | 21 (27.6%) |
| | PB | 13 (68.4%) | 37 (64.9%) | 50 (65.8%) |
| | Mixed | 1 (5.3%) | 4 (7.0%) | 5 (6.6%) |
| T-stage^f | T1–2 | – | 15 (28.9%) | – |
| | T3 | – | 37 (71.2%) | – |
| | T4 | – | 2 (3.7%) | – |
| Nodal positivity^g | | – | 39 (75.0%) | – |
| Tumor size (cm), median (IQR)^h | | – | 2.3 (1.7–2.8) | – |
| Margin negative (R0)^i | | – | 49 (89.1%) | – |
| Lymphovascular invasion^j | | – | 39 (75.0%) | – |
| Perineural invasion^k | | – | 37 (72.6%) | – |
| Chemotherapy | Neoadjuvant | – | 3 (5.3%) | – |
| | Adjuvant | – | 40 (70.2%) | – |
| Radiation, adjuvant | | – | 17 (33.3%) | – |
^a Age data missing for 2 patients.
^b CEA data missing for 34 patients.
^c CA 19-9 data missing for 29 patients.
^d AJCC stage data missing for 3 patients.
^e Grade data missing for 3 patients.
^f T-stage data missing for 5 patients.
^g Nodal-positivity data missing for 5 patients.
^h Tumor-size data missing for 7 patients.
^i Margin data missing for 2 patients.
^j Lymphovascular-invasion data missing for 5 patients.
^k Perineural-invasion data missing for 6 patients.
Generating a hidden-genome model of ampullary adenocarcinoma
To visualize the discriminative signals of the hidden-genome methodology when applied to the three-class training set, we performed a principal component analysis of all active predictors (i.e., selected in group-lasso) in the fitted model (Supplementary Fig. S1). Following a Uniform Manifold Approximation and Projection (UMAP) analysis (20) on the resultant first 50 principal components, an approximate two-dimensional embedding of all active predictors for each training-set tumor was created (Fig. 1A). There was distinct separation between colorectal and pancreatic tumors, suggesting unique genomic signatures of these two cancer types. Distal bile-duct adenocarcinomas, in contrast, did not harbor strong tissue-specific signals but rather a mixture of colorectal- and pancreatic-specific genomic information. Still, many fell “closer” to pancreatic tumors, consistent with their established histologic similarities (21, 22). A complete heatmap displaying the values of all active predictors in the hidden-genome model across all training-set tumors (grouped by cancer sites) is displayed in Supplementary Fig. S2.
To visualize the tissue-site specific signals of the most informative predictors in the training data set, we plotted the estimated average ORs for being classified to a given site relative to not being classified to that site (one-vs.-rest, Supplementary Methods; Fig. 1B). KRAS had a large positive log-OR for pancreatic cancer, with the hotspots KRAS G12D, KRAS G12V, and the more pancreas-specific KRAS G12R hotspot providing additional discriminative information captured by the “residual” effects at the variant level. However, hotspot KRAS G13D was more specific to colorectal cancer, where its residual effect produced a large positive log-OR. In contrast, the APC gene had a large OR for colorectal cancer, while its hotspot APC I1307K had a small OR for that site. There were two predictors with large positive log-ORs in distal bile-duct adenocarcinoma, namely, copy-number gain in the MDM2 gene and the SBS-96 category C>T T.T. Complete log-ORs of all active predictors in the fitted model are provided as Supplementary data.
Finally, to assess the predictive accuracy of the fitted hidden-genome model, we computed one-vs.-rest precision-recall AUC for each training site, and subsequently obtained as an overall measure the average of the site-specific AUCs, through prevalidated (23) predictive probabilities (see Supplementary Methods). Note that a precision-recall AUC, unlike an ROC AUC, adjusts for class-size imbalances (which necessarily occur in one-vs.-rest comparisons) and thus produces a robust measure of predictive performance of a multi-class classifier. As depicted in Fig. 1C, the fitted hidden-genome model achieved near perfect predictive accuracies in the colorectal (AUC = 0.99) and pancreatic (AUC = 0.94) genomic groups. The bile-duct group, in contrast, had a noticeably smaller AUC (0.46), likely reflecting the absence of bile-duct–specific discriminative genomic signals, heterogeneity, and small sample size. The precision-recall AUCs achieved by the hidden-genome classifier were well above the null baseline values across all cancer sites. The average of the site-specific AUCs was 0.79 (corresponding to a null baseline of 0.33), demonstrating strong overall classification performance of the fitted model.
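The null baselines mentioned above follow from a general property of precision-recall curves: a classifier with no discriminative ability has expected precision equal to the prevalence of the positive class, so the one-vs.-rest null baseline for each site is that site's share of the training set. A short sketch using the training-set sizes reported in the Methods (the variable names are illustrative):

```python
# Training-set sizes for the three-class model, as reported in the Methods.
train_counts = {"colorectal": 2245, "pancreatic": 1001, "distal bile duct": 165}
total = sum(train_counts.values())  # 3411

# One-vs.-rest null baseline for precision-recall AUC = class prevalence.
baselines = {site: n / total for site, n in train_counts.items()}
mean_baseline = sum(baselines.values()) / len(baselines)

for site, p in baselines.items():
    print(f"{site}: {p:.3f}")
print(f"mean: {mean_baseline:.2f}")  # mean: 0.33
```

Because the prevalences sum to 1, their average over three classes is always 1/3, which matches the stated overall baseline of 0.33. It also puts the bile-duct result in context: its site-specific baseline is roughly 0.05, so an AUC of 0.46 is low in absolute terms but still far above chance.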
Analogous heatmaps, UMAP, and OR for the active predictors, along with precision-recall AUCs for the fitted four-class training-set models (with gastric and small bowel as the additional training sites, respectively) are displayed in Supplementary Figures S3–S6. The additional sites, gastric and small bowel, lacked strong discriminative signals and, similar to distal bile-duct tumors, harbored mixtures of pancreatic- and colorectal-specific genomic signals (Supplementary Figures S3A and S5A). These were reflected in smaller one-vs.-rest ORs (in absolute log scale; Supplementary Figures S3B and S5B), and precision-recall AUCs (Supplementary Fig. S3C and S5C).
Predicting parental sites for ampullary tumors
Applying the trained model to the cohort of 76 ampullary adenocarcinomas, we observed a high degree of concordance between the genomic prediction and pathologic subtype (Fig. 2A). For INT subtype classified by histology and IHC, 76.2% of the samples were genomically predicted to be colorectal site of origin based on the hidden-genome model, with a median predicted probability of 80%. For the PB-subtype adenocarcinoma, 56.0% were genomically predicted as pancreatic site of origin, with a median predicted probability of 55%, representing a higher degree of heterogeneity. The distal bile-duct adenocarcinoma genomic signature was rarely the dominant profile, with a median predicted probability of 5% for INT subtypes and 10% for PB subtypes. Of note, addition of gastric and small-bowel adenocarcinoma to the three-class genomic model did not improve the diagnostic accuracy of the system (Supplementary Fig. S7).
The cumulative probabilities of the gene classifier for each patient sample are summarized in a probability swimmer plot (multiple horizontal barplots with a common horizontal axis), stratified by immuno-histologic subtype (Fig. 2B). The INT subtypes predominately expressed a colorectal genomic profile, although there was a wide range (3%–100%). There was similar heterogeneity for the PB subtypes; the pancreatic genomic profile ranged from 1% to 98%. Interestingly, of the 5 patients with mixed INT-PB subtype, nearly all (n = 4) had a dominant genomic profile characterized by the colorectal signature (range, 72%–99%).
Genomic-predicted probabilities were converted to a hard classifier by assigning each tumor to the single genomic class whose predicted probability reached the 50% threshold. Overall, concordance between the hard classification and immuno-histologic subtype was 73.2% (Fig. 2C). Of the 21 patients with INT subtype by histology and IHC, 16 (76.2%) had a dominant genomic profile consistent with colorectal. For example, case P-0003602 was INT subtype by IHC, which was concordant with a colorectal genotype (100% colorectal; 0% pancreatic; 0% distal bile duct). There were classic colorectal-cancer genomic features, including KRAS G12C mutation and gain of chromosome 13. Similarly, case P-0023740 was INT subtype by IHC, which was concordant with a colorectal genotype (97%) and included PIK3CA mutation and the presence of a microsatellite instability (MSI) signature (score = 46). In contrast, PB had greater genomic heterogeneity. Of the 50 patients with PB subtype by histology and IHC, 28 patients (56.0%) had a dominant genomic profile defined by the pancreatic genotype. The remaining 22 patients were classified as colorectal (n = 7), distal bile duct (n = 8), or indeterminate (n = 7). Hard classification into the distal bile-duct subset was infrequent, and nearly always from the PB subtype (8 of 9; 89%). When including small-bowel adenocarcinoma in the four-class model, the predicted probability of small bowel was low (range 0%–6%) and never the predicted hard-classified genomic origin (Supplementary Fig. S8).
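The text does not spell out how the 73.2% concordance was computed. One reading that reproduces the figure from the counts reported above, offered here as an assumption rather than the authors' stated definition, treats INT tumors as concordant when hard-classified as colorectal, treats PB tumors as concordant when hard-classified as either pancreatic or distal bile duct, and excludes the 5 mixed-subtype tumors from the denominator:

```python
# Counts taken from the paragraph above; the concordance rule is an assumption.
int_total, int_colorectal = 21, 16
pb_total, pb_pancreatic, pb_bile_duct = 50, 28, 8

concordant = int_colorectal + pb_pancreatic + pb_bile_duct  # 52
evaluable = int_total + pb_total                            # 71 (mixed excluded)
concordance = concordant / evaluable

print(f"{concordance:.1%}")  # 73.2%
```

If only pancreatic calls were counted as concordant for PB tumors, the same counts would give 44 of 71, so the reported figure appears to depend on grouping the bile-duct class with the pancreatobiliary subtype.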
Determining the most influential predictors of ampullary tumors
We used the Jensen–Shannon importance metric (Supplementary Methods) to identify the five most influential predictors in each individual tumor hard prediction, collectively producing a list of 64 unique predictors with the largest influences across all tumors (Fig. 3). The KRAS gene had a strong influence on nearly all pancreatic hard predictions. Conversely, colorectal hard predictions were influenced by both the APC gene mutations and tumor mutational burden (TMB). Finally, distal bile-duct adenocarcinoma hard predictions were strongly influenced by copy-number gains in the MDM2 gene.
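The exact importance metric is defined in the Supplementary Methods and is not reproduced here. As background only, the sketch below computes the Jensen–Shannon divergence between two discrete probability distributions, the quantity on which such a metric is presumably built. The comparison of class probabilities predicted with and without a given predictor is a hypothetical illustration, and the numbers are invented.

```python
from math import log2


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical class probabilities (colorectal, pancreatic, biliary) for one
# tumor, predicted with and without a single predictor's contribution.
with_predictor = [0.10, 0.85, 0.05]
without_predictor = [0.40, 0.45, 0.15]

print(round(jensen_shannon(with_predictor, without_predictor), 3))
```

A larger divergence would indicate that removing the predictor shifts the predicted class probabilities more, which is one natural way to rank predictors within an individual tumor.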
Prognostication by genomic profile
After a median follow-up of 26.9 months (IQR 13.8–45.6), median OS was 50.6 months (IQR 34.6–85.63). There was no significant difference in OS between the INT and PB subtypes as defined by histology and IHC, overall or in the surgically resected cohort (log-rank P = 0.129 and P = 0.783, respectively; Fig. 4A and B). In contrast, in the cohort of patients classified into the colorectal or pancreatic hard genomic groups (n = 58), there was a trend toward improved survival in the colorectal genomic group that did not reach statistical significance (P = 0.089; Fig. 4C). Moreover, the predicted genomic probabilities of the colorectal and pancreatic subtypes correlated with predicted 72-month survival probability (Fig. 4D). In the bivariate gradient plot, genomic scores with higher colorectal probability were associated with higher survival probability, whereas higher pancreatic probabilities were associated with lower survival probability.
Application to indeterminate clinical scenarios
There were several patients with mixed PB/INT histology based on IHC staining. We queried survival of these patients with mixed subtypes according to the dominant genomic profile present. Patient P-0022573 had morphologic features of PB subtype, yet IHC staining was predictive of INT subtype (MUC1+, MUC2+, CK20+, CDX2+). The genomic classifier predicted higher likelihood of pancreatic genotype, with canonical KRAS G12D mutation. The patient's clinical course reflected this, with recurrence 5.5 months and death 19.5 months after resection. Patient P-0035477 had morphologic features of mixed subtype along with ambiguous IHC (MUC1+, CK20+, CDX2−). The genomic classifier predicted intestinal subtype with 98% probability; there were a total of 72 mutations (62.3/M) with an MSH2 germline variant and MSI (score = 43). The genomic profile correctly predicted a favorable clinical course, with the patient remaining disease-free for over 10 years after resection. Patient P-0012334 had a morphologically mixed tumor, but IHC was not performed because of inadequate tissue. After application of the genomic algorithm, a poor prognosis was predicted by the distal bile-duct (75%) and pancreatic (22%) subtypes; the patient died 18 months after diagnosis. Patient P-0002503 had mixed morphology and also inadequate tissue for IHC. The patient survived 71 months after resection, consistent with the 70% predicted probability of colorectal genomic subtype.
Discussion
We developed a genomic classifier trained on the mutational profiles of related gastrointestinal malignancies to stratify ampullary adenocarcinomas using routinely collected genomic-sequencing data. The genomic classifier had high concordance with existing histologic and IHC subtypes and improved quantification of heterogeneity present in individual patient tumors, which may underlie the range of clinical phenotypes observed. Such genomic knowledge provides therapeutic targets – both immediately actionable, as well as potential candidates – and may, following validation in larger cohorts, help guide diagnosis and prognosis.
Our genomic classifier used multilevel meta-feature regression to extract both common and rare variants in training data and incorporate previously unobserved variants when applied to ampullary adenocarcinoma samples. Such methodology has previously been applied successfully to classify “unknown” tumors (1), but this study represents the first successful application for tumor-subtype classification. The genomic-subtype classifier relied on common differences between pancreatic, distal bile-duct, and colorectal adenocarcinomas, which were recapitulated in distinct clinical subtypes of ampullary adenocarcinoma. There was a high degree of concordance between the intestinal immunomorphological subtype and the colorectal genomic profile, characterized by APC and PIK3CA gene mutations, TMB, MSI mutational signatures, and copy-number gains of chromosomes 13 and 20 – all well-known genomic characteristics of colorectal tumors and small-bowel adenocarcinomas (16). Likewise, the pancreatobiliary immunomorphological subtype was most frequently associated with the pancreatic genomic profile, with frequent KRAS mutations. These results align with recent findings that disruptions in Wnt signaling (most commonly by APC mutation) and MSI are more frequently observed in INT subtype tumors, whereas PB subtypes are more likely to harbor KRAS and TP53 mutations (24, 25). Perkins and colleagues also demonstrated that KRAS mutations were more frequent in PB subtype defined by immunomorphology (12); however, other genomic distinctions between the subtypes were not identified, likely due to the rarity of any single variant. This is a common limitation of single-gene mutational analyses, which is overcome by the genomic classifier methodology that additionally utilizes rare genomic variants.
Certain genomic alterations identified herein hold immediate therapeutic value. For instance, TMB was a strong determinant of the colorectal hard classifier. Checkpoint blockade has demonstrated activity in tumors with high TMB (26), and may hold therapeutic value for these subsets. A point of future study will be to determine whether a pancreatic versus colorectal genomic subtype warrants a tailored systemic chemotherapeutic regimen (e.g., FOLFIRINOX for pancreatic-subtype ampullary adenocarcinoma given its successes in the adjuvant treatment of resectable pancreatic adenocarcinoma).
Additionally, the genomic classifier improved diagnosis in patients with mixed-morphology tumors. Even after refinement of histologic classification based on IHC criteria, there was a small cohort that could not be binarized into an intestinal/pancreatobiliary classification. True mixed-type ampullary carcinomas thus appear to be a clinical entity, and the genomic analysis demonstrated the significant heterogeneity that may underlie the inability to “fit” certain tumors into a hard two-category classification. Here, the genomic classifier was particularly useful in guiding prognostication by quantifying the relative proportions of intestinal, pancreatic, and bile duct profiles in a given sample. Importantly, there was a continuum of survival outcomes that was linearly related to the relative proportion of favorable intestinal and unfavorable pancreatobiliary genomic profiles in any given tumor sample.
It remains to be determined in larger, independent samples if the genomic classifier can reliably be used for prognostication. We observed a trend to improved OS in the colorectal versus the pancreatic genomic groups, which outperformed traditional histologic subtypes in predicting long-term survival. The PB subtype has been associated with inferior survival outcomes relative to the INT subtype in some (7, 27) but not all studies (4, 28, 29). Genomic profiles that account for marked heterogeneity may allow for more accurate and consistent classification with substratification into prognostically-distinct groups. Multi-institutional collaboration will be required to adequately study this rare disease entity and to evaluate the genomic classifier in a larger data set where it can be incorporated into multivariable models along with known prognostic variables.
Several limitations warrant emphasis. First, MSK-IMPACT is increasingly used at our institution, but there is nonrandom referral for targeted sequencing, which may introduce bias into the cases included in these analyses. Second, small-bowel adenocarcinoma, given its anatomic proximity to the ampulla, might have been a preferable training input to colorectal adenocarcinoma. However, evaluation of small-bowel adenocarcinoma in our expanded four-class model showed that small-bowel adenocarcinoma did not have a distinct genomic signature apart from colorectal adenocarcinoma, adding little value to site-of-origin prediction. Third, tumor heterogeneity may impact the precision of the genomic classifier; we cannot rule out that geographic mapping of subclones would identify differing genomic profiles in different regions of the same tumor (25). Fourth, the majority of specimens were collected at resection, but a minority of genomic sequencing was performed on a biopsy of a metastasis or on the primary tumor after systemic therapy. Genomic profiles evaluated by MSK-IMPACT are conserved between primary and metastatic lesions for several malignancies (30, 31), but this has not been adequately studied for ampullary adenocarcinoma. Lastly, the applicability of this hidden-genome framework to other NGS platforms remains an open question. The MSK-IMPACT assay used at our institution interrogates over 400 genes, yet these genes are covered by the majority of large NGS assays because they are commonly mutated in cancer. By comparison, the Foundation Medicine platform analyzes 324 similar genes, including MDM2 copy-number gains. We anticipate that the hidden-genome algorithm can be retooled for other NGS platforms, and are seeking such data to test this hypothesis.
Our analyses suggest genomic criteria can assist in accurate diagnosis and prognostication of ampullary adenocarcinoma. Genomic heterogeneity shown in our model may be related to the multiple cells-of-origin, and identification of broad differences between genomic subtypes suggest potential subtype-specific therapeutic strategies that may improve survival for these patients.
Authors' Disclosures
S. Chakraborty reports grants from NCI during the conduct of the study. J.A. Drebin's spouse is employed by American Regent Pharmaceuticals. T.P. Kingham reports personal fees from Olympus Surgical outside the submitted work. A.C. Wei reports personal fees from Histosonics, Medtronic, AstraZeneca, and Celgene and nonfinancial support from Bayer and Intuitive Surgical outside the submitted work. No disclosures were reported by the other authors.
Disclaimer
The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Authors' Contributions
S. Chakraborty: Conceptualization, resources, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft. B.L. Ecker: Conceptualization, data curation, visualization, writing–original draft, project administration, writing–review and editing. K. Seier: Software, formal analysis, visualization, writing–review and editing. V.G. Aveson: Conceptualization, resources, data curation, funding acquisition, writing–review and editing. V.P. Balachandran: Supervision, writing–review and editing. J.A. Drebin: Supervision, writing–review and editing. M.I. D'Angelica: Supervision, writing–review and editing. T.P. Kingham: Supervision, writing–review and editing. C.S. Sigel: Data curation, formal analysis, investigation, methodology, writing–review and editing. K.C. Soares: Supervision, writing–review and editing. E. Vakiani: Supervision, writing–review and editing. A.C. Wei: Supervision, writing–review and editing. R. Chandwani: Data curation, formal analysis, supervision, validation, methodology, writing–review and editing. M. Gonen: Conceptualization, methodology, project administration, writing–review and editing. R. Shen: Conceptualization, resources, software, formal analysis, supervision, methodology, project administration, writing–review and editing. W.R. Jarnagin: Conceptualization, resources, supervision, writing–original draft, project administration, writing–review and editing.
Acknowledgments
This work was funded by NCI awards P30 CA008748, R01 CA251339 (to R. Shen), and U01 CA238444 (to W.R. Jarnagin).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.