Background. The prevalence of cancer gene mutations varies across ancestry groups. For example, EGFR variants are more frequent in non-small cell lung cancer among people with Asian ancestry, and KRAS variants are observed more frequently in colorectal cancer among patients with African American ancestry. Additionally, many histological subtypes of cancer differ in prevalence between people of different ancestries. However, many cancer studies lack the statistical power to identify such differences. The large cohort of patients who have undergone comprehensive cancer genomic profiling at Foundation Medicine may provide a useful starting point for characterizing these mutation patterns across ancestry groups.

Methods. To infer ancestry for de-identified samples, we overlaid the SNPs targeted by each of our comprehensive genomic profiling tests (FoundationOne, FoundationOne CDx, FoundationOne Heme) onto Phase 3 1000 Genomes data. Using an established approach, we projected the SNPs onto the top five principal components and used random forest ensemble learning to train a classifier for each bait set. Ten-fold cross-validation indicated that this approach achieves 98-99% precision and recall across the different genomic profiling tests.
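
The projection and classification steps can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic genotypes in place of the 1000 Genomes reference panel; the population labels, sample counts, SNP count, and forest size are placeholders chosen for illustration and are not taken from this work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical reference panel: 5 population labels, 100 samples each, 500 SNPs.
populations = ["AFR", "AMR", "EAS", "EUR", "SAS"]
n_per_pop, n_snps = 100, 500

# Synthetic stand-in for real genotypes: each population gets its own allele
# frequencies, and each genotype is an alternate-allele count of 0, 1, or 2.
allele_freqs = rng.uniform(0.05, 0.95, size=(len(populations), n_snps))
X = np.vstack(
    [rng.binomial(2, freqs, size=(n_per_pop, n_snps)) for freqs in allele_freqs]
)
y = np.repeat(populations, n_per_pop)

# Project onto the top five principal components, then classify with a random
# forest. Keeping PCA inside the pipeline refits it on each training fold.
model = make_pipeline(
    PCA(n_components=5, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 10-fold cross-validation, scored by macro-averaged precision and recall.
scores = cross_validate(
    model, X, y, cv=10, scoring=["precision_macro", "recall_macro"]
)
mean_precision = scores["test_precision_macro"].mean()
mean_recall = scores["test_recall_macro"].mean()
print(f"precision={mean_precision:.3f} recall={mean_recall:.3f}")
```

In practice, one such model would be trained per bait set, restricted to the SNPs that the corresponding test targets. The synthetic populations here are far easier to separate than real reference data, so the printed scores say nothing about real-world performance.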

Results. Ancestry calls were made on over 170,000 de-identified samples consented for research. Initial analyses indicated that classification of American samples was less robust than that of other groups. To address this, we trained classifiers on a per-chromosome basis and re-assigned samples that exhibited less than 80% consensus across chromosomes to an admixture group. The overall prevalence of patient ancestry in the dataset is 75.9% European, 8.3% African, 4.7% East Asian, 0.8% South Asian, 0.8% American, and 9.5% admixed. From the resulting data, we summarize cancer types that are well-represented across populations, identifying at least 28 tumor types for which we likely have power to identify ancestry-dependent somatic mutations.
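
The per-chromosome consensus rule can be sketched as follows. This is an illustrative reading and not the exact procedure used: it assumes consensus is the share of chromosome-level calls that agree with the most common call, and the function name, the `admixed` fallback label, and the example of 22 chromosome-level calls are hypothetical.

```python
from collections import Counter


def consensus_call(per_chromosome_calls, min_consensus=0.8, fallback="admixed"):
    """Return the majority label if enough chromosome-level calls agree.

    Assumption: consensus is the fraction of per-chromosome calls that match
    the most common label. Samples below min_consensus get the fallback label.
    """
    if not per_chromosome_calls:
        raise ValueError("need at least one per-chromosome call")
    label, count = Counter(per_chromosome_calls).most_common(1)[0]
    share = count / len(per_chromosome_calls)
    return label if share >= min_consensus else fallback


# Hypothetical samples, each with 22 chromosome-level calls.
concordant = ["EUR"] * 20 + ["AMR"] * 2   # 20 of 22 chromosomes agree
discordant = ["AMR"] * 12 + ["EUR"] * 10  # 12 of 22 chromosomes agree

print(consensus_call(concordant))  # EUR
print(consensus_call(discordant))  # admixed
```

A sample whose calls agree on exactly 80% of chromosomes keeps its majority label under this sketch, since only samples with less than 80% consensus are re-assigned.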

Discussion. The dataset described here contains a previously unavailable set of cancer types that can be mined for ancestry-dependent cancer-driving alterations. Results of these analyses will be presented. The ancestry classification approach described in this work can be applied to a range of genomic profiling tests, and refinements of this approach can be integrated into clinical trials and, ultimately, clinical care to better elucidate varied biologic behavior across advanced cancers.

Citation Format: Justin Newberg, Caitlin Connelly, Garrett Frampton. Determining patient ancestry based on targeted tumor comprehensive genomic profiling [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1599.