Although The Cancer Genome Atlas (TCGA) has produced an unprecedented amount of data molecularly characterizing thirty-two tumor types, the usability of the data has been restricted by its download model. The Genomic Data Commons (GDC) is a new database for querying and retrieving all NCI genomic-level data, allowing users to identify regions of interest and download the information for just those segments (BAM slicing). All data contained in the GDC will be mapped to the genome (aligned), and sequence variants will be called in a uniform fashion. In future releases, the GDC will incorporate a variety of analytical tools, obviating the need to download the raw data in order to analyze it. However, the ethnic diversity represented in the data is sparse; a new NCI initiative will try to address that deficiency by targeting accrual of minorities (as well as early-onset cases) in five different tumor types.

Citation Format: Jean C. Zenklusen. The Cancer Genome Atlas: a primer on 2.5 PB of high quality data. [abstract]. In: Proceedings of the Ninth AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2016 Sep 25-28; Fort Lauderdale, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2017;26(2 Suppl):Abstract nr IA25.