Background: The evolutionary trajectory of a growing tumor is shaped by the selective pressures of the tissue microenvironment, or ecological niche, in which it originates. As new mutations arise, so does the potential for new cellular phenotypes, or subclones, with varying degrees of fitness. This intratumor clonality is known to be an important factor in predicting cancer progression, survival, therapeutic response, and drug resistance. Yet parsing and quantifying the subclones that drive ongoing adaptation remains challenging in noisy, bulk-sequenced tumor biopsies sampled from a single site at a single timepoint.

Given noisy sequencing data, a common framework for understanding subclonal dynamics in single tumor biopsies is to use variant allele frequency (VAF) distributions to capture ongoing selection. However, existing VAF-based methods each have distinct limitations: single statistics compress the data into one value without accounting for noise; classical likelihood-free methods such as approximate Bayesian computation (ABC) can be very slow because new simulations are required for each patient; and mixture models depend on the VAF distribution containing clean, distinct peaks in order to detect subclones and assign their parameters.
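To make the compression problem concrete, the sketch below builds a VAF distribution from read counts and compares it with a single summary value. It is a minimal illustration, not part of the authors' method: the helper names, the read counts, and the use of mean VAF as the summary are hypothetical stand-ins (published single statistics, such as goodness-of-fit tests of the neutral 1/f model, are more elaborate but likewise reduce the distribution to one number). Two tumors with clearly different VAF profiles receive the same summary, while their histograms remain distinguishable.

```python
from statistics import mean


def vafs_from_reads(alt_reads, depths):
    """VAF of each mutation: reads supporting the variant / total reads at the site."""
    return [a / d for a, d in zip(alt_reads, depths)]


def vaf_histogram(vafs, n_bins=10):
    """Keep the full VAF distribution as a normalized histogram over [0, 1]."""
    counts = [0] * n_bins
    for v in vafs:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [c / len(vafs) for c in counts]


depths = [100] * 6  # hypothetical sequencing depth at each of six mutated sites
# Hypothetical tumor A: one cluster of mutations near VAF 0.25
tumor_a = vafs_from_reads([24, 25, 26, 24, 25, 26], depths)
# Hypothetical tumor B: a cluster near VAF 0.45 plus a low-frequency cluster near 0.05
tumor_b = vafs_from_reads([44, 45, 46, 4, 5, 6], depths)

print(mean(tumor_a), mean(tumor_b))  # both about 0.25: the single statistic cannot tell them apart
print(vaf_histogram(tumor_a))        # all mass in the 0.2-0.3 bin
print(vaf_histogram(tumor_b))        # mass split between the 0.0-0.1 and 0.4-0.5 bins
```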

Method: We developed a method that combines stochastic cancer evolution simulations with Bayesian neural networks to estimate subclonal dynamics in single bulk-sequenced tumor biopsies using only mutation frequency information. Compared to existing methods, this simulation-based deep learning approach offers three advantages: it uses the complete VAF distribution rather than compressing it into a single statistic; it separates simulation and training from prediction, so estimates for new tumors are fast (amortization); and it learns concise embeddings of the data that yield more accurate predictions (representation learning).
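The amortization idea can be illustrated with a deliberately small, dependency-free sketch. Everything here is an assumption for illustration: the toy simulator (a clonal peak at VAF 0.5, a neutral tail whose cumulative mutation count scales as 1/f, and an optional subclone cluster), the normal approximation to binomial read sampling, and the hypothetical names `simulate_vafs`, `train`, and `predict`. In place of a Bayesian neural network it uses a plain nearest-centroid classifier on the full VAF histogram, so it shows the workflow (simulate and train once, then classify many tumors cheaply) rather than the authors' model.

```python
import math
import random

DEPTH = 100   # hypothetical sequencing depth (reads per site)
N_BINS = 20   # resolution of the VAF histogram over [0, 1]


def observed_vaf(true_vaf, rng):
    """Add read-sampling noise (normal approximation to Binomial(DEPTH, true_vaf))."""
    sd = math.sqrt(DEPTH * true_vaf * (1.0 - true_vaf))
    alt_reads = round(rng.gauss(DEPTH * true_vaf, sd))
    return min(max(alt_reads, 0), DEPTH) / DEPTH


def simulate_vafs(selected, rng, n_clonal=100, n_tail=300, n_sub=200,
                  fmin=0.05, fmax=0.5):
    """Toy tumor: clonal mutations, a neutral tail, and optionally one subclone."""
    vafs = [observed_vaf(0.5, rng) for _ in range(n_clonal)]
    for _ in range(n_tail):
        # Inverse-CDF draw from a density proportional to 1/f^2 on [fmin, fmax],
        # so the cumulative mutation count scales as (1/f - 1/fmax).
        u = rng.random()
        f = 1.0 / (1.0 / fmin - u * (1.0 / fmin - 1.0 / fmax))
        vafs.append(observed_vaf(f, rng))
    if selected:
        cell_fraction = rng.uniform(0.4, 0.8)  # subclone frequency in the tumor
        vafs += [observed_vaf(cell_fraction / 2, rng) for _ in range(n_sub)]
    return vafs


def histogram(vafs):
    """Use the whole normalized VAF histogram as the feature vector."""
    counts = [0] * N_BINS
    for v in vafs:
        counts[min(int(v * N_BINS), N_BINS - 1)] += 1
    return [c / len(vafs) for c in counts]


def train(n_per_class, seed=0):
    """Simulate labeled tumors once and store one mean histogram per class."""
    rng = random.Random(seed)
    centroids = {}
    for selected in (False, True):
        hists = [histogram(simulate_vafs(selected, rng)) for _ in range(n_per_class)]
        centroids[selected] = [sum(col) / n_per_class for col in zip(*hists)]
    return centroids


def predict(centroids, vafs):
    """Return True (positive selection) or False (neutral); no new simulations."""
    h = histogram(vafs)
    dist = {label: sum((a - b) ** 2 for a, b in zip(h, c))
            for label, c in centroids.items()}
    return min(dist, key=dist.get)


model = train(n_per_class=100)            # simulation cost paid once, up front
rng = random.Random(1)
truth = [i % 2 == 1 for i in range(100)]  # alternate neutral / selected test tumors
accuracy = sum(predict(model, simulate_vafs(t, rng)) == t for t in truth) / len(truth)
print(f"accuracy on fresh simulated tumors: {accuracy:.2f}")
```

Because `train` pays the whole simulation cost up front, classifying each new tumor costs one histogram and two distance computations; a rejection-ABC workflow would instead run fresh simulations for every patient.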

Results: Our approach is significantly more accurate than existing methods at distinguishing positive selection from neutral evolution, estimating the number of subclones, and estimating subclone frequency. Furthermore, because the approach is based on neural networks, transfer learning can be used to re-optimize our models for additional evolutionary predictions. In this way, we also estimate subclone fitness, subclone emergence time, and mutation rate, with a mean percentage error below 5% in all cases.
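The abstract reports a mean percentage error below 5% but does not spell out the formula. The sketch below assumes the common definitions: a signed mean percentage error (MPE), 100/n times the sum of (true - predicted) / true, and its absolute counterpart (MAPE). The function names and the parameter values are hypothetical and are not results from the study; they only show why the distinction matters, since signed errors of opposite sign can cancel.

```python
def mean_percentage_error(true_vals, pred_vals):
    """Signed MPE in percent; over- and under-estimates can cancel out."""
    n = len(true_vals)
    return 100.0 * sum((t - p) / t for t, p in zip(true_vals, pred_vals)) / n


def mean_absolute_percentage_error(true_vals, pred_vals):
    """MAPE in percent; every error adds to the total regardless of sign."""
    n = len(true_vals)
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(true_vals, pred_vals)) / n


# Hypothetical true vs. estimated values of one parameter (e.g. mutation rate).
true_vals = [10.0, 20.0, 40.0]
pred_vals = [10.5, 19.0, 40.0]

mpe = mean_percentage_error(true_vals, pred_vals)            # 0.0: the -5% and +5% terms cancel
mape = mean_absolute_percentage_error(true_vals, pred_vals)  # about 3.33
print(f"MPE = {mpe:.2f}%, MAPE = {mape:.2f}%")
```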

Conclusion: The integration of explicit cancer evolution simulations with Bayesian neural networks provides a new avenue for disentangling subclonal dynamics in growing tumors. Applied to whole-genome and exome-sequenced tumors from 10 cancer types, our method recovers both positively selected and neutral subclonal dynamics.

Citation Format: Tom W. Ouellette, Philip Awadalla. Disentangling subclonal dynamics in growing tumors using stochastic simulations and Bayesian neural networks [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 6090.