People of different races and ethnicities differ in their likelihood of being diagnosed with certain types of cancer and in their response to therapeutic agents. However, genome-wide association studies of cancer have typically assumed that genetic effects are fixed across ethnicities, and they have rarely compared findings across ethnic groups, especially for the Hispanic population. Baptist Health South Florida enrolled more than 3,000 patients in the Total Cancer Care Protocol, a prospective longitudinal outcomes study designed to bank tissue and clinical information that scientists and clinicians can use to advance cancer treatment. Tissue samples were snap-frozen within 15 minutes of surgical removal, macrodissected to ≥85% tumor purity, and quantified for percent malignancy, cellularity, stroma, normal tissue, and necrosis. Longitudinal clinical and pathologic data were extracted from patients' electronic medical records to annotate the biospecimens. We analyzed gene expression and sequencing data from the 443 enrolled cancer patients whose tumors were molecularly profiled. All cases had gene expression data, and 32% (n=140) also had next-generation sequencing data. The most common primary cancers were breast (44.2%, n=196), followed by large bowel (18.9%, n=84), lung (15.8%, n=70), and uterine (9.5%, n=42). What makes this dataset unique is the racial and ethnic distribution of its patients: the majority were Hispanic (51.7%, n=229), specifically of Cuban, Mexican, Puerto Rican, Central American, and South American origin, followed by White (36.3%, n=161), Black (6.1%, n=27), other racial/ethnic groups (5.0%, n=22), and unknown (1.0%, n=4). Microarrays from all sites of origin were normalized with IRON (Iterative Rank-Order Normalization) against the median sample.
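As a rough illustration of normalizing arrays against a median sample, the sketch below picks the real array closest to the per-gene median profile as the reference and shifts every array's log2 intensities so that its median matches the reference. This is a deliberately simplified global-scaling stand-in, not IRON itself, which works from iteratively selected rank-invariant genes before fitting a per-array normalization curve; the function name and the genes-by-samples log2 input layout are hypothetical.

```python
import numpy as np


def median_reference_scale(log2_expr: np.ndarray) -> tuple[np.ndarray, int]:
    """Simplified global-scaling normalization against a 'median sample'.

    log2_expr: genes x samples matrix of log2 intensities (hypothetical layout).
    NOTE: this is NOT the IRON algorithm; it only illustrates choosing one
    representative array and scaling every other array to it.
    """
    # Per-gene median across all arrays: a synthetic "typical" profile.
    gene_medians = np.median(log2_expr, axis=1, keepdims=True)
    # The reference is the real array closest (Euclidean distance) to that profile.
    distances = np.linalg.norm(log2_expr - gene_medians, axis=0)
    ref_idx = int(np.argmin(distances))
    reference = log2_expr[:, ref_idx]
    # Offset of each array's median from the reference median; subtracting it on
    # the log2 scale is a multiplicative rescaling on the linear scale.
    offsets = np.median(log2_expr, axis=0) - np.median(reference)
    normalized = log2_expr - offsets
    return normalized, ref_idx
```

The reference array is left unchanged (its offset is zero), so every other array ends up on its intensity scale.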
An RNA quality-related technical artifact was observed and corrected by subtracting the first principal component of a partial least squares (PLS) model trained against the RNA integrity number (RIN), a measure of RNA quality. All study samples were then extracted from this master normalized, quality-corrected dataset for gene expression analysis. For the sequencing data, reads were aligned to the human reference genome (hs37d5) with the Burrows-Wheeler Aligner (BWA), duplicate reads were marked with Picard-Tools, and indel realignment and base quality score recalibration were performed with GATK. Here we describe the variation observed among racial and ethnic groups for each primary cancer in this representative South Florida population.
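The RIN correction can be sketched as a one-component PLS1 fit: genes are centered, a weight vector proportional to each gene's covariance with RIN defines the first latent component, and that component's contribution (sample scores times gene loadings) is subtracted from the expression matrix. The abstract calls this the first principal component of the PLS model; reading it as the first PLS latent component, along with the samples-by-genes layout, the centering, and the function name, are assumptions of this sketch rather than details of the study's pipeline.

```python
import numpy as np


def remove_first_pls_component(expr: np.ndarray, rin: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Subtract the first PLS1 component of expr (samples x genes) fitted to RIN.

    Sketch only: a single NIPALS-style PLS1 step; any scaling used in the
    original pipeline is unknown and not reproduced here.
    """
    x_mean = expr.mean(axis=0)
    xc = expr - x_mean                  # center each gene
    yc = rin - rin.mean()               # center RIN
    w = xc.T @ yc                       # weights: covariance of each gene with RIN
    w /= np.linalg.norm(w)              # unit-length weight vector
    t = xc @ w                          # sample scores on the first PLS component
    p = xc.T @ t / (t @ t)              # gene loadings for that component
    # Deflate: remove the RIN-associated component, then restore gene means.
    corrected = xc - np.outer(t, p) + x_mean
    return corrected, t
```

After deflation the centered corrected matrix is orthogonal to the removed scores `t`, so the RIN-aligned direction no longer contributes to the expression values, while each gene's mean is preserved.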
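The sequencing steps map onto a conventional pre-processing chain. The helper below only assembles command lines and runs nothing; it assumes `bwa mem`, Picard's `SortSam` and `MarkDuplicates`, and GATK 3.x (`RealignerTargetCreator`, `IndelRealigner`, `BaseRecalibrator`, `PrintReads`), since indel realignment is not part of GATK 4. The aligner mode, the coordinate-sort step, the read-group string, the known-sites VCF, the jar paths, and all file names are hypothetical placeholders, not the study's actual settings.

```python
def build_preprocessing_commands(
    sample: str,
    fastq1: str,
    fastq2: str,
    ref: str = "hs37d5.fa",                  # hypothetical path to BWA-indexed hs37d5 FASTA
    known_sites: str = "dbsnp.vcf",          # hypothetical known-variants VCF for BQSR
    picard_jar: str = "picard.jar",          # hypothetical Picard jar path
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical GATK 3.x jar path
) -> list[dict]:
    """Assemble argv lists for a BWA -> Picard -> GATK 3.x run (not executed)."""
    # bwa expands the literal "\t" sequences in the read-group string into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    intervals = f"{sample}.intervals"
    realigned_bam = f"{sample}.realigned.bam"
    recal_table = f"{sample}.recal.table"
    final_bam = f"{sample}.recal.bam"
    picard = ["java", "-jar", picard_jar]
    gatk = ["java", "-jar", gatk_jar]
    return [
        # 1. Align paired-end reads; bwa mem writes SAM to standard output.
        {"argv": ["bwa", "mem", "-R", read_group, ref, fastq1, fastq2], "stdout": sam},
        # 2. Coordinate-sort, since MarkDuplicates expects sorted input.
        {"argv": picard + ["SortSam", f"I={sam}", f"O={sorted_bam}",
                           "SORT_ORDER=coordinate"], "stdout": None},
        # 3. Mark (not remove) duplicate reads and index the output BAM.
        {"argv": picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                           f"M={sample}.dup_metrics.txt", "CREATE_INDEX=true"], "stdout": None},
        # 4-5. Local realignment around indels (GATK 3.x tools).
        {"argv": gatk + ["-T", "RealignerTargetCreator", "-R", ref,
                         "-I", dedup_bam, "-o", intervals], "stdout": None},
        {"argv": gatk + ["-T", "IndelRealigner", "-R", ref, "-I", dedup_bam,
                         "-targetIntervals", intervals, "-o", realigned_bam], "stdout": None},
        # 6-7. Base quality score recalibration: build the error model, then apply it.
        {"argv": gatk + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned_bam,
                         "-knownSites", known_sites, "-o", recal_table], "stdout": None},
        {"argv": gatk + ["-T", "PrintReads", "-R", ref, "-I", realigned_bam,
                         "-BQSR", recal_table, "-o", final_bam], "stdout": None},
    ]
```

Each step could then be run with `subprocess.run(step["argv"], check=True)`, redirecting standard output to `step["stdout"]` when it is set (only the BWA step writes SAM to standard output). Keeping command assembly separate from execution makes the chain easy to inspect before anything touches the data.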

Citation Format: Zuanel Diaz, Arpit Mehta, Zasha Pou, Muni B. Rubens, Don Parris, Miguel Villalona-Calero. Analysis of a cancer gene expression and next-generation sequencing dataset representing hispanic predominant South Florida population [abstract]. In: Proceedings of the Eleventh AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2018 Nov 2-5; New Orleans, LA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2020;29(6 Suppl):Abstract nr B054.