Introduction: Li-Fraumeni syndrome (LFS) is a rare hereditary cancer predisposition syndrome. Germline mutations of the TP53 tumor suppressor gene are the underlying cause in >80% of patients with LFS and are associated with a spectrum of early-onset cancers and an increased risk of second tumors, even in the absence of a family history of cancer. We have previously developed and implemented a comprehensive lifelong clinical surveillance protocol for individuals with a germline TP53 mutation. We set out to make this screening process more targeted by building a predictive model of age of cancer onset. We accomplished this by applying machine learning methods to germline methylation data.

Methods: We used the Toronto Hospital for Sick Children (SickKids) LFS family cohort to build our predictive model of age of onset. In all, 74 patients had germline methylation data, consisting of ~450,000 probe sites. We subset these data by identifying probes that fall into regions differentially methylated between LFS patients and cancer patients with wild-type TP53; the probes in these regions were used as predictors. Because age of sample collection was highly correlated with age of onset (r² ~ 0.90), we corrected for confounding using a two-fold strategy: (1) we extracted the variation of each probe that is independent of the age of sample collection (the residual after regressing the probe on the age of sample collection) and used these residuals as predictors in our model, and (2) we tested our models on the task of predicting the age of sample collection for LFS patients who do not have cancer. The former provided more robust predictions, while the latter verified that we are in fact predicting age of onset rather than simply the age at which the sample was collected.
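
As an illustration of step (1), the residualization of a single probe can be sketched as follows. This is a minimal sketch assuming a simple linear regression of the probe on age of sample collection; the abstract does not specify the regression form, and the function name `residualize` and the numeric values are hypothetical, not data from the study.

```python
from statistics import fmean


def residualize(values, ages):
    """Return the residuals of one probe after a simple linear regression on age.

    values: methylation values of a single probe, one per sample.
    ages:   age of sample collection for the same samples.
    Assumption: a straight-line fit (intercept + slope * age) per probe.
    """
    mean_v, mean_a = fmean(values), fmean(ages)
    # Ordinary least-squares slope and intercept for one predictor.
    sxx = sum((a - mean_a) ** 2 for a in ages)
    sxy = sum((a - mean_a) * (v - mean_v) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_a
    # What is left after removing the fitted age trend is the predictor.
    return [v - (intercept + slope * a) for a, v in zip(ages, values)]


# Illustrative values only (hypothetical, not from the cohort).
ages = [1.0, 3.0, 8.0, 15.0, 30.0]
probe = [0.42, 0.45, 0.47, 0.56, 0.70]
residuals = residualize(probe, ages)
```

By construction, these residuals have zero mean and are linearly uncorrelated with age of sample collection, which is why they can serve as predictors that do not simply encode when the sample was taken.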

Results: Our machine learning model achieved an 86% correlation between true and predicted age of onset. Additionally, we tested the ability of our models to predict whether an individual will be diagnosed before or after the age of 4; our classification model achieved 91% accuracy on average. We verified that our model does not simply predict age of sample collection by using our cohort of LFS patients who do not have cancer (n = 37), whose distribution of age of sample collection matched that of the patients used in our model. The model had no predictive power for the age of sample collection in this cohort, confirming that it predicts age of cancer onset in LFS patients with TP53 mutations rather than age of sample collection.
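
The two reported metrics can be computed along the following lines. This is a sketch under stated assumptions: the correlation is taken to be a Pearson correlation, and the before/after classification is illustrated by thresholding ages at the cutoff, whereas the abstract describes a separate classification model. The helper names and the example ages are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def threshold_accuracy(true_ages, predicted_ages, cutoff):
    """Fraction of patients placed on the correct side of the age cutoff.

    Assumption: an onset age strictly below the cutoff counts as "before".
    """
    hits = sum(
        (t < cutoff) == (p < cutoff)
        for t, p in zip(true_ages, predicted_ages)
    )
    return hits / len(true_ages)


# Illustrative ages in years (hypothetical, not from the cohort).
true_onset = [1, 2, 3, 5, 10, 20]
predicted_onset = [2, 1, 5, 6, 9, 18]

correlation = pearson_r(true_onset, predicted_onset)
accuracy = threshold_accuracy(true_onset, predicted_onset, cutoff=4)
```

Applying the same two functions to predictions of age of sample collection in the cancer-free cohort would give the negative control described above: a model that tracks onset rather than sampling age should score poorly there.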

Conclusions: We developed two predictive models for age of cancer onset in LFS patients that achieve high accuracy, both when predicting the age of onset as a continuous variable (86% correlation) and when predicting whether cancer onset will occur before or after the age of 4 (91% accuracy). Our models could help clinicians target high-risk patients for screening, lower the cost of treatment, and raise the likelihood of survival among LFS patients.

Citation Format: Benjamin M. Brew, David Malkin, Lauren Erdman, Andrea Doria, Jason Berman, Adam Shlien, Tanya Guha, Ana Novokmet, Anna Goldenberg. Methylation accurately predicts age of cancer onset in patients with Li Fraumeni Syndrome [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 973. doi:10.1158/1538-7445.AM2017-973