Abstract
International cancer registries make real-world genomic and clinical data available, but their joint analysis remains a challenge. AACR Project GENIE, an international cancer registry collecting data from 19 cancer centers, makes data from >130,000 patients publicly available through the cBioPortal for Cancer Genomics (https://genie.cbioportal.org). For 25,000 patients, additional real-world longitudinal clinical data, including treatment and outcome data, are being collected by the AACR Project GENIE Biopharma Collaborative using the PRISSMM data curation model. Several thousand of these cases are now also available in cBioPortal. We have significantly enhanced the functionalities of cBioPortal to support the visualization and analysis of this rich clinico-genomic linked dataset, as well as datasets generated by other centers and consortia. Examples of these enhancements include (i) visualization of the longitudinal clinical and genomic data at the patient level, including timelines for diagnoses, treatments, and outcomes; (ii) the ability to select samples based on treatment status, facilitating a comparison of molecular and clinical attributes between samples before and after a specific treatment; and (iii) survival analysis estimates based on individual treatment regimens received. Together, these features provide cBioPortal users with a toolkit to interactively investigate complex clinico-genomic data to generate hypotheses and make discoveries about the impact of specific genomic variants on prognosis and therapeutic sensitivities in cancer.
Enhanced cBioPortal features allow clinicians and researchers to effectively investigate longitudinal clinico-genomic data from patients with cancer, which will improve exploration of data from the AACR Project GENIE Biopharma Collaborative and similar datasets.
Introduction
Over the past decade, large-scale sequencing projects, including The Cancer Genome Atlas (TCGA; ref. 1) and institutional clinical genomics initiatives such as MSK-IMPACT (2), have generated data for tens of thousands of tumors. These initiatives have led to numerous findings that have changed our understanding of cancer and the way we treat it. It remains challenging, however, to explore the full context in which these tumors arise and how they respond to treatment, because detailed clinical data on the patients and their tumors are often lacking. Although many hospitals collect these data in addition to tumor profiling, they are often stored in separate databases and not presented in a uniform manner, which limits the joint analysis of clinical and genomic data. We have extended the cBioPortal for Cancer Genomics (3, 4) significantly over the years to enable such joint analysis.
The cBioPortal was originally designed to support the TCGA dataset and other large-scale cancer genomic data. These datasets often consist of only a single genomically characterized sample per patient (e.g., pretreatment primary tumors for TCGA) and limited clinical data. The cBioPortal interface includes four interconnected views: (i) “Study” view for the interactive exploration of genomic and clinical data in a dataset, (ii) “Query/Results” view for the detailed analysis and visualization of genomic alterations in specific genes, (iii) “Comparison” view for the comparison of genomic and clinical features in two or more cohorts, and (iv) “Patient” view for the visualization and interpretation of genomic and clinical data in a patient. The cBioPortal is integrated with various other databases to aid interpretation of the genomic data, including OncoKB (5), CIViC (6), MyCancerGenome (7), Cancer Hotspots (8), and gnomAD (9).
A follow-up initiative to TCGA with a focus on untreated as well as treated tumors is the American Association for Cancer Research (AACR) Project GENIE (Genomics Evidence Neoplasia Information Exchange). AACR Project GENIE is an international cancer registry of real-world genomic and limited clinical data (10). Established in 2015, the registry aggregates genomic data from clinical tumor sequencing and accompanying clinical patient data from 19 cancer centers around the world. The latest public release (v13.0, released in January 2023) contains data from 167,423 tissue samples from 148,268 patients, spanning >100 cancer types. The data include genomic variants from targeted sequencing panels, comprising somatic mutations, copy-number alterations, and rearrangements and fusions. Clinical data elements for this main dataset consist of sex, race, ethnicity, detailed cancer type, sample origin (primary or metastasis), and vital status. The cBioPortal was leveraged to host these data and allow users to explore the dataset in their browser (https://genie.cbioportal.org). Access is granted to anyone accepting the terms of use, which allow free use of the data but protect patient privacy by prohibiting re-identification of individuals and redistribution of data without permission. We improved the performance of cBioPortal to better handle large numbers of samples, added support for structural variants including rearrangements and fusions, and enabled the calculation of gene alteration frequencies across cohorts with different gene panels.
The limited clinical data available in Project GENIE sparked the creation of the Project GENIE Biopharma Collaborative (BPC). Project GENIE BPC, initiated in 2019, is a research collaboration with 10 biopharmaceutical companies, with the goal of obtaining comprehensive clinical and genomic data from an estimated 25,000 de-identified patients treated at a subset of the centers participating in AACR Project GENIE. Electronic health records were curated, and patient outcomes were defined using the Pathology; Radiology; Imaging; Signs and Symptoms; tumor Markers; Medical (PRISSMM) oncology assessments phenomic data system (11, 12). PRISSMM is a set of phenomic data standards and tools for the characterization and communication of structured information about cancer status and treatment outcomes for patients with solid tumors. Curated data include exact diagnosis date, time of sample acquisition and sequencing, detailed drug data along with treatment start and stop dates, as well as detailed information about response. The first datasets curated, a non–small cell lung cancer (NSCLC) cohort consisting of 1,846 patients and a colorectal cancer dataset consisting of 1,485 patients, are now publicly available through cBioPortal (https://genie.cbioportal.org/) and Synapse (https://www.synapse.org/genie). A breast cancer dataset will be released in mid-2023.
We report a significantly enhanced cBioPortal interface for visualization and analysis of the longitudinal data from GENIE BPC, now displaying details of diagnoses, treatments, and outcome. The software now allows selection of sequenced samples before or after specific treatments to allow genomic comparisons; survival analysis based on treatment history and disease progression is also now possible. These enhancements provide cBioPortal users with more powerful options to explore longitudinal clinico-genomic data. We discuss several potential analyses in the Results section.
Materials and Methods
Longitudinal clinico-genomic data from AACR GENIE BPC
The AACR Project GENIE BPC NSCLC 2.0-public cohort (GENIE BPC NSCLC) is described in detail in Choudhury and colleagues (13). Briefly, the NSCLC BPC cohort contains 2,004 samples from 1,846 patients, randomly selected from samples in the GENIE 11.1-public release, from four centers: Dana-Farber Cancer Institute (DFCI), Memorial Sloan Kettering Cancer Center (MSK), Vanderbilt-Ingram Cancer Center (VICC), and Princess Margaret Cancer Centre - University Health Network (UHN). The following selection criteria were applied for the GENIE BPC NSCLC cohort: NSCLC OncoTree (14) diagnosis, stage I to IV at diagnosis, and at least 18 years of age at the time of sequencing. In addition, patients needed a genomic sequencing report performed between January 1, 2014, and December 31, 2017, and at least 2 years of potential follow-up. The cancer with associated genomic sequencing that met the inclusion criteria is hereafter referred to as the index cancer.
All clinical data were curated using the PRISSMM framework. Curation with PRISSMM requires reviewing all pathology reports, reports of imaging studies (exclusive of plain films other than mammography), and at least one medical oncology note per month from the time of index cancer diagnosis. The duration of all systemic anticancer treatments, including chemotherapy, targeted therapy, and immunotherapy, is extracted from both pharmacy records and medical oncology notes, which allows for documentation of treatments administered outside of the submitting institution. Any 8-week gap between treatment dates is taken to mark the start of a new treatment. Leveraging these assessments, progression-free survival (PFS) is derived from several starting time points, including from diagnosis and from the start of each drug regimen. Data curation further includes detailed annotations that capture attributes of diagnoses, sequenced samples, and treatments. All cancer diagnoses, including NSCLC diagnoses and other cancer diagnoses, were recorded for each patient. Cancer diagnosis curation includes staging and other relevant cancer-specific variables. For example, smoking history and pleural invasion are added to the base curation model for patients with NSCLC. In addition to sample attributes reported in the main consortium (assay type, sequencing center, age at sequencing, cancer type, and sample type), other clinically relevant elements are included, such as Programmed Death-Ligand 1 (PD-L1) testing and results, when available.
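The 8-week gap rule can be illustrated with a minimal sketch. This is not the PRISSMM curation code: the function name `split_into_treatments`, the reading of "8-week gap" as a gap of at least 56 days, and the example dates are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: an "8-week gap" is read here as a gap of at least 56 days.
GAP = timedelta(weeks=8)


def split_into_treatments(admin_dates, gap=GAP):
    """Group administration dates into treatments.

    A date that follows the previous one by `gap` or more opens a new
    treatment. Returns a list of (start_date, end_date) tuples.
    """
    treatments = []
    for d in sorted(admin_dates):
        if treatments and d - treatments[-1][-1] < gap:
            treatments[-1].append(d)   # continues the current treatment
        else:
            treatments.append([d])     # first date, or gap reached: new treatment
    return [(t[0], t[-1]) for t in treatments]


# Hypothetical administration dates for one patient and one drug.
example_dates = [
    date(2016, 1, 4), date(2016, 1, 25), date(2016, 2, 15),
    date(2016, 6, 6), date(2016, 6, 27),
]
treatment_intervals = split_into_treatments(example_dates)
```

In this toy input, the 16-week pause between mid-February and early June separates the dates into two treatments, each with its own start and stop date.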
BPC data were submitted by the centers to GENIE's bioinformatics partner Sage Bionetworks via Synapse (https://www.synapse.org/genie). A large subset of the BPC data elements were selected, and when necessary, reformatted to the cBioPortal file format by Sage Bionetworks (https://docs.cbioportal.org/file-formats/). Only data elements associated with the index cancer, such as diagnosis information and treatment for index cancer, are visualized in cBioPortal. In addition, we selected three outcome variables: overall survival (OS), PFS based on imaging (PFS-I), and PFS based on medical oncologist assessment (PFS-M), each from first index cancer diagnosis and from first administration of a regimen for visualization. We restricted the regimen outcome data to the 20 most frequent regimens. All other longitudinal data (sample acquisition, sequencing, diagnosis, treatment; assessments from imaging, pathology, and medical oncologist notes) were captured in the patient timeline files.
The mutation data and accompanying gene panel information, which are part of the main GENIE consortium, were previously processed as follows. Submitted VCF data are converted to MAF and are annotated using Genome Nexus (https://www.genomenexus.org; ref. 15). Because DFCI, VICC, and UHN perform tumor-only sequencing, mutations are filtered for recurrent artifacts and germline events reported by ExAC (16). Germline variants are further filtered out using the gnomAD (9) allele frequency threshold of 0.0005 or greater. In addition to mutation data, the centers contributed discrete copy-number data, segmented copy-number data, rearrangements, and gene fusions.
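As a rough illustration of the allele-frequency step, the sketch below drops variants whose population allele frequency is at or above the 0.0005 threshold. It is not the GENIE pipeline: the record layout, the field name `gnomad_af`, and the frequency values are hypothetical, and the ExAC and recurrent-artifact filters are omitted.

```python
GNOMAD_AF_THRESHOLD = 0.0005  # threshold stated in the text


def is_likely_germline(variant, threshold=GNOMAD_AF_THRESHOLD):
    """Flag a variant whose population allele frequency meets the threshold."""
    af = variant.get("gnomad_af")  # hypothetical field; None means not observed
    return af is not None and af >= threshold


def filter_germline(variants):
    """Keep only variants that are not flagged as likely germline."""
    return [v for v in variants if not is_likely_germline(v)]


# Illustrative records; the frequencies are made up, not real gnomAD values.
variants = [
    {"gene": "EGFR", "protein_change": "L858R", "gnomad_af": None},
    {"gene": "TP53", "protein_change": "P72R", "gnomad_af": 0.66},
    {"gene": "ALK", "protein_change": "G1202R", "gnomad_af": 0.0},
]
somatic_candidates = filter_germline(variants)
```

Treating a missing frequency as "not germline" is a design choice of this sketch; a production filter would need an explicit policy for variants absent from the reference database.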
Additional information on the data from the main GENIE consortium and the GENIE BPC can be found in their respective data guides, which are both available from https://genie.cbioportal.org/login.jsp. A comprehensive collection of materials about cBioPortal, including tutorial slides and webinar recordings, can be found at https://docs.cbioportal.org/user-guide/.
Data availability
The data are publicly available in cBioPortal (https://genie.cbioportal.org). The complete clinical and processed genomic data can be downloaded from Synapse (https://www.synapse.org/genie). All other raw data are available upon request from the corresponding author.
Results
The cBioPortal for AACR Project GENIE (https://genie.cbioportal.org/) hosts all of the genomic and clinical data from 19 cancer centers, allowing users to explore the data through the web interface. The BPC cohorts, with additional longitudinal clinical data for four institutions as described in the Materials and Methods section, are now also available from the same website. Here, we describe the enhancements that were made to cBioPortal to better support these data and provide detailed use-case examples of this new functionality to analyze the GENIE BPC NSCLC cohort.
Integration of Project GENIE and GENIE BPC data into cBioPortal
The AACR Project GENIE data include somatic mutations, DNA copy-number alterations, as well as rearrangements and gene fusions, all derived from targeted sequencing panels (Fig. 1). Although much functionality existed in cBioPortal already, several enhancements were made, including (i) performance improvements to enable querying across large numbers of samples (>160K samples in release v13), (ii) support for structural variant data, and (iii) improved handling of gene panels when calculating gene alteration frequencies by using profiled samples as the denominator rather than all samples. The additional GENIE BPC longitudinal clinical data that are being collected for 25,000 of the Project GENIE patients motivated the development of additional features, including improved visualization of diagnoses, treatment, and outcome; selection of sequenced samples before or after specific treatments to allow genomic comparisons; and survival analysis based on treatment history and disease progression. This functionality is explained in more detail in the following sections.
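The effect of using profiled samples as the denominator can be shown with a small sketch. It is not the cBioPortal implementation: the function name, the panel names, and the sample records are hypothetical.

```python
def alteration_frequency(gene, samples, panels):
    """Return (altered, profiled, frequency) for `gene`.

    Only samples whose gene panel covers `gene` are counted in the
    denominator; frequency is None when no sample was profiled.
    """
    profiled = [s for s in samples if gene in panels[s["panel"]]]
    altered = [s for s in profiled if gene in s["altered_genes"]]
    frequency = len(altered) / len(profiled) if profiled else None
    return len(altered), len(profiled), frequency


# Hypothetical panels: PANEL_B does not cover ALK.
panels = {
    "PANEL_A": {"EGFR", "ALK", "TP53"},
    "PANEL_B": {"EGFR", "TP53"},
}
samples = [
    {"id": "S1", "panel": "PANEL_A", "altered_genes": {"ALK"}},
    {"id": "S2", "panel": "PANEL_A", "altered_genes": set()},
    {"id": "S3", "panel": "PANEL_B", "altered_genes": {"TP53"}},
    {"id": "S4", "panel": "PANEL_B", "altered_genes": set()},
]
alk_result = alteration_frequency("ALK", samples, panels)
naive_alk_frequency = 1 / len(samples)  # all samples as denominator
```

Here ALK is altered in one of the two samples that were actually profiled for it, whereas dividing by all four samples would halve the apparent frequency.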
Figure 1. Integration of the longitudinal clinico-genomic data from AACR GENIE BPC in cBioPortal. Multidimensional GENIE BPC datasets have been harmonized across the four participating centers. AACR Project GENIE includes molecular and basic clinical data from 19 institutions. GENIE BPC enriches these data for four institutions with additional clinical longitudinal elements. The data are publicly available in cBioPortal (https://genie.cbioportal.org). The complete clinical and genomic data can be downloaded from Synapse (https://www.synapse.org/genie).
Identification of resistance mutations in a patient
The Patient view in cBioPortal depicts all genomic and clinical data for a single patient over the course of their treatment (Fig. 2). This visualization yields a descriptive picture of a patient's treatment history and mutational profiles. The timeline of a patient with NSCLC with four metastatic biopsies obtained at different time points indicates when disease improved or worsened in relation to the different treatments over time (Fig. 2A).
Figure 2. Visualization of longitudinal clinical and genomic profile of a patient in GENIE BPC. A, The timeline shows biopsies and resections, diagnoses, lines of treatments, as well as assessments from imaging and medical oncology. B, The genomic event tables show detailed information about mutations, structural variants, and copy-number alterations in each of the samples. The mutations table for instance shows the effect on the protein and the variant allele frequency. The samples column in each of the tables indicates in which sample(s) the event was found; a dash indicates that the sample was not profiled for that gene. All genomic events have an annotation column with information from other resources about that event, including, for example, OncoKB, which is indicated by a blue target icon. Hovering over those icons gives additional information in a tooltip. C, The tooltip is shown for the OncoKB annotation of ALK G1202R. It mentions that the ALK G1202R mutation identified in sample 4 is a known resistance mutation to crizotinib.
To help with variant interpretation and guide clinical decisions, variants are annotated with information from OncoKB (5), a precision oncology knowledgebase that contains biological and clinical information about genomic alterations in cancer (Fig. 2B and C). In the first biopsy, an EGFR L858R mutation was identified. This mutation is a biomarker for sensitivity to several FDA-approved drugs (OncoKB Level 1), and it prompted the use of erlotinib as treatment. When disease worsened, an ALK fusion event was discovered in the second sequenced sample and crizotinib was added to the treatment regimen (OncoKB Level 1). In the fourth sample, an ALK resistance mutation was found (OncoKB Level R2), which likely led to another change in treatment. Note that although this case has four profiled samples, 92% of the NSCLC cohort has only one profiled sample. In the following section, we therefore focus on how to identify potential resistance alterations in a cohort containing cases with at least one profiled sample.
Exploration of resistance mutations in patient cohorts
Using the tyrosine kinase inhibitor crizotinib as an example, the following case study demonstrates how cBioPortal can be used to identify known resistance mutations acquired in the course of treatment in lung cancer (Fig. 3; ref. 17). The same selection and comparison process can be applied to other treatments to identify and visualize potential acquired resistance mutations that are not yet known, for the purpose of hypothesis generation. Using the Study view, one can select samples that were obtained before or after crizotinib treatment and start a comparison (Fig. 3A). The Comparison view has functionality to compare alterations in these two groups (Fig. 3B). There are 80 sequenced samples that were obtained in patients prior to receiving crizotinib and 25 that were obtained after the patient received crizotinib; there were 8 patients with samples sequenced before and after crizotinib. There is a notable difference in mutation events in ALK: there are 3 mutations in the samples sequenced prior to crizotinib treatment, but 9 in samples sequenced after crizotinib treatment, as expected based on findings previously reported in the literature. In Fig. 3C, this difference in ALK mutations is visualized in an OncoPrint, showing that the samples sequenced after crizotinib contain both the ALK fusion and the ALK missense mutation. Hovering over these events indicates that these are known resistance mutations according to OncoKB. The mutation (“lollipop”) diagram below shows that the known resistance mutations are all in the kinase domain.
Figure 3. Visualization and analysis of GENIE BPC dataset in cBioPortal. A, Study view. Overview of clinical and genomic data of the GENIE BPC NSCLC cohort; samples can be selected by their treatment status and then compared. B, Comparison view. Example comparison of samples collected before and/or after a patient was treated with crizotinib to identify genomic alterations potentially responsible for acquired resistance to treatment. C, Results view. OncoPrint with resistance alterations in post-crizotinib samples and a lollipop highlighting resistance mutations in the kinase domain of ALK. D, Survival analysis, showing a comparison of survival between TP53 mutant versus wild-type patients with NSCLC.
Three caveats apply to this comparison. First, each group can include multiple samples from a single patient, which can skew the results. Selecting one sample per patient per group would avoid this. Second, the samples before and after a treatment are not necessarily from the same patient. One can restrict the comparison to only those patients with paired samples by using the Overlap tab on the Comparison view. Third, it is important to consider the context in which a particular treatment was given. The comparison shown serves as a starting point, and we encourage further exploration of patients’ treatment regimens using the Patient view. The cBioPortal interface allows for creating groups of any subset of patients and samples, enabling users to limit the comparison to a specific context of interest.
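The grouping logic behind such a comparison, together with the first two caveats, can be sketched as follows. This is an illustration rather than cBioPortal code: the record layout, the use of day offsets, the rule that a sample taken on the treatment start day counts as post-treatment, and the choice of the earliest sample per patient are all assumptions.

```python
def split_by_treatment(samples, treatment_start):
    """Assign samples to 'pre' or 'post' relative to each patient's start day.

    Assumption: a sample taken on the start day itself counts as 'post'.
    Patients without a recorded start day never received the treatment
    and are skipped.
    """
    groups = {"pre": [], "post": []}
    for s in samples:
        start = treatment_start.get(s["patient"])
        if start is None:
            continue
        groups["pre" if s["day"] < start else "post"].append(s)
    return groups


def one_per_patient(group, latest=False):
    """Keep one sample per patient: the earliest by default, or the latest."""
    chosen = {}
    for s in sorted(group, key=lambda s: s["day"], reverse=latest):
        chosen.setdefault(s["patient"], s)
    return list(chosen.values())


def paired_patients(groups):
    """Patients with at least one sample in both groups."""
    pre = {s["patient"] for s in groups["pre"]}
    post = {s["patient"] for s in groups["post"]}
    return pre & post


# Hypothetical data: days are offsets from diagnosis.
samples = [
    {"id": "S1", "patient": "P1", "day": 10},
    {"id": "S2", "patient": "P1", "day": 400},
    {"id": "S3", "patient": "P1", "day": 700},
    {"id": "S4", "patient": "P2", "day": 30},
    {"id": "S5", "patient": "P3", "day": 500},
    {"id": "S6", "patient": "P4", "day": 5},
]
treatment_start = {"P1": 200, "P2": 100, "P3": 300}

groups = split_by_treatment(samples, treatment_start)
post_unique = one_per_patient(groups["post"])
paired = paired_patients(groups)
```

Which sample to keep per patient (earliest, latest, or closest to the treatment start) depends on the question being asked, so the `latest` flag is only one of several reasonable options.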
Exploratory survival comparison between TP53 altered and unaltered cases
To enable exploration of the outcome data collected under the PRISSMM data model, the survival analysis visualizations in cBioPortal were extended to include estimation from various starting time points and various definitions of PFS. For example, alterations in TP53 may be associated with worse progression-free survival from diagnosis based on imaging reports (PFS-I): the Kaplan–Meier estimates suggest slightly longer PFS-I for TP53 wild-type than for TP53 mutant cases (Fig. 3D). There are various ways in which one can further explore potential differences in survival for patients with and without TP53 mutations, for example, by changing the start point of the survival analysis from diagnosis to the start of any treatment regimen. In addition to PFS-I, one can also use PFS based on the medical oncologist's assessment (PFS-M) to assess whether the associations between genomic alterations and survival vary with the criteria used to define a progression event.
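For readers unfamiliar with the estimator, a minimal Kaplan–Meier sketch in plain Python is given here. It is not the cBioPortal implementation, and the follow-up times and event indicators are invented for illustration; changing the starting time point amounts to measuring `times` from a different origin (for example, regimen start instead of diagnosis).

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up durations from the chosen starting point
    events: 1 if the event (e.g., progression) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # events plus censored cases at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months for a small group of patients.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Running the function separately on two groups (for instance, TP53 mutant and wild-type cases) yields the two step curves that a survival plot compares; a formal comparison would additionally require a test such as the log-rank test.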
Discussion
The enhanced cBioPortal features for longitudinal data analysis greatly improve exploration of the AACR GENIE BPC dataset, made available through the GENIE instance of the cBioPortal for Cancer Genomics (https://genie.cbioportal.org). These new features further improve a user's ability to generate hypotheses about the relationships between genomic alterations and clinical outcomes. Although not all datasets in cBioPortal have the same clinical and genomic data as in the GENIE BPC cohort, many of the features described here will also be useful for datasets where only a subset of data elements are available. Hence, all instances of cBioPortal including the public cBioPortal (https://cbioportal.org) with >30K monthly visitors as well as local institutional installations (>70 globally to date) can make use of these new functionalities.
To our knowledge, the GENIE BPC cohort is the first dataset that combines complete and detailed clinical longitudinal and genomic sequencing data across several centers. Even within a single institution, it can already be a Herculean task to combine these data elements, as they are often stored across multiple databases and in heterogeneous formats. In addition, certain data elements might not be collected consistently, resulting in missing data. Combining clinical and genomic data sources across multiple centers results in additional complexities. For instance, there are different ontologies for cancer classification such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT; https://www.snomed.org/) or International Classification of Diseases for Oncology (ICD-O; ref. 18). Similarly for drug ontologies, one might use SNOMED-CT or the National Cancer Institute thesaurus (NCIt; https://ncithesaurus.nci.nih.gov/ncitbrowser/). Curation under the PRISSMM data model addresses harmonization of key data elements. Another difficulty when combining data generated at different centers is the use of different gene panels; each center uses a different gene panel, and gene panels change over time. The cBioPortal software addresses this concern, by storing gene panel information for each sequenced tumor sample, taking differences in panel coverage into account when calculating alteration frequencies by using profiled samples for a gene as the denominator rather than all samples, and visualizing whether a sample was profiled for a gene of interest in both the OncoPrint and the Patient view.
The cBioPortal software makes complex data accessible and interpretable, allowing a broad audience of researchers and clinicians to explore the data. Several components of analyses presented in the portal require careful consideration. It is important to note that although the portal visualizes complex data and can be used for generating hypotheses, users should work with statisticians to address specific research questions while formally investigating and accounting for missing data, confounders, and biases inherent in the data. Importantly, the selection bias that occurs by building a cohort of patients who underwent genomic testing is not currently accounted for as part of the analyses presented in cBioPortal, but investigating and accounting for this bias is critical for statistical inference (19).
cBioPortal is a powerful resource for the generation of hypotheses, making exploratory data analysis accessible to all users. The addition of comprehensive clinical data from GENIE BPC is an important step towards the advancement of precision oncology and personalizing treatment trajectories. Data exploration should be guided by clinical and statistical input to maintain rigorous research practices. Several features are implemented in cBioPortal to assist with this, including a minimum sample size threshold for which comparisons are made (n > 5 per group), to prevent presenting potentially misleading results. In addition, to prevent false-positive findings, q values correcting for multiple testing are presented by default; P values are hidden and must be manually unhidden by the user.
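The text does not name the multiple-testing procedure behind the q values. The sketch below assumes the Benjamini-Hochberg procedure, a common way to derive q values from P values; the function name and the example P values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each q value is the minimum
    # of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values from four gene-level comparisons.
example_p = [0.001, 0.04, 0.03, 0.2]
example_q = benjamini_hochberg(example_p)
```

In this toy input, two comparisons with P values below 0.05 end up with q values just above 0.05, which illustrates why reporting q values by default guards against over-reading individual P values.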
We plan to continue improving cBioPortal for longitudinal data analysis. In particular, we want to enable querying for samples that were obtained during specific time intervals in relation to an event rather than only before or after a particular treatment. We also plan to further expand on building a cohort based on treatment received by allowing users to select a particular order of consecutive treatments and events, as opposed to creating a cohort of patients based on a single treatment that was received. The Patient view already allows detailed exploration of the clinico-genomic profiling of a patient including their treatment history. Being able to seamlessly retrieve cases with similar clinico-genomic features will enable clinicians and researchers to build cohorts for further investigation.
Authors' Disclosures
P. Lukasse reports grants and other support from Dana Farber Cancer Institute during the conduct of the study and other support from Dana Farber Cancer Institute outside the submitted work. R. Sheridan reports stock and other ownership interests: 2Seventy Bio Inc., Abbvie Inc., Agios Pharmaceuticals Inc., Alnylam Pharmaceuticals Inc., Alphabet Inc., Amazon Inc., Amgen Inc., Berkshire Hathaway Inc., Biogen Inc., Bluebird Bio Inc., Bristol Myers Squibb, ClearPoint Neuro Inc., Clementia Pharmaceuticals Inc., GSK plc, General Electric, Johnson & Johnson, Merck & Co. Inc., Northwest Biotherapeutics, Organon & Co., Pfizer Inc., Philip Morris Intl Inc. (Vectura Fertin Pharma), Portola Pharmaceuticals (Alexion Pharmaceuticals), Viatris Inc. S. Brown reports other support from AACR during the conduct of the study. J.A. Lavery reports other support from AACR during the conduct of the study. K.S. Panageas reports other support from AACR Project GENIE Biopharmaceutical Consortium during the conduct of the study. M.L. LeNoue-Newton reports other support from AACR during the conduct of the study and other support from GE Healthcare outside the submitted work. J.L. Warner reports grants and nonfinancial support from AACR during the conduct of the study; grants, other support, and personal fees from NIH, HemOnc.org LLC, Melax Tech, Westat, Flatiron, Roche, and ASCO outside the submitted work. C. Nichols reports grants from AACR during the conduct of the study. K.L. Kehl reports grants from AACR during the conduct of the study. G.J. Riely reports grants from AACR during the conduct of the study and grants from Novartis, Mirati, Roche, Takeda, Lilly, Rain Therapeutics, and Verastem outside the submitted work. D. Schrag reports grants from GRAIL, Inc. during the conduct of the study and personal fees from JAMA, University of North Carolina, Ontario Institute of Cancer Research, and Mayo Clinic outside the submitted work. J. Lee reports ownership of Abbott and Gilead Sciences stock. 
J. Lee is also an Associate Director of AACR Project GENIE (AACR Project GENIE received funding from Amgen, Inc., Bristol-Myers Squibb Company, Bayer HealthCare Pharmaceuticals Inc., Merck Sharp & Dohme Corp., Pfizer, AstraZeneca UK Limited, Genentech, Novartis, Boehringer Ingelheim, and Janssen Pharmaceuticals, Inc.). M.V. Fiandalo reports other support from Genentech, Novartis, Amgen, Merck, Bayer, Janssen, Boehringer Ingelheim, Bristol Myers Squibb, AstraZeneca, and Pfizer during the conduct of the study. S.M. Sweeney reports grants from Amgen, AstraZeneca, Bristol Myers Squibb, Boehringer-Ingelheim, Bayer Pharmaceuticals, Inc., Genentech, Janssen, Merck, Novartis, and Pfizer during the conduct of the study. T.J. Pugh reports grants and personal fees from AstraZeneca, Chrysalis Biomedical Advisors, Merck, SAGA Diagnostics, and Roche/Genentech outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
I. de Bruijn: Conceptualization, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. R. Kundra: Data curation, methodology, writing–original draft, writing–review and editing. B. Mastrogiacomo: Data curation, methodology, writing–original draft, writing–review and editing. T.N. Tran: Formal analysis, investigation, visualization. L. Sikina: Software, visualization. T. Mazor: Formal analysis, validation, investigation, visualization, writing–original draft, writing–review and editing. X. Li: Software, visualization. A. Ochoa: Software. G. Zhao: Software. B. Lai: Software, visualization. A. Abeshouse: Software. D. Baiceanu: Software. E. Ciftci: Software. U. Dogrusoz: Software, supervision. A. Dufilie: Software. Z. Erkoc: Software. E. Garcia Lara: Software. Z. Fu: Software. B.E. Gross: Software, supervision. C. Haynes: Software. A. Heath: Supervision. D.M. Higgins: Supervision, validation, investigation. P. Jagannathan: Software. K. Kalletla: Software. P. Kumari: Data curation, validation, investigation. J. Lindsay: Software. A. Lisman: Software. B. Leenknegt: Software. P. Lukasse: Software. D. Madala: Software. R. Madupuri: Software. P. van Nierop: Software. O. Plantalech: Software. J. Quach: Software. A. Resnick: Supervision. S.Y.A. Rodenburg: Software. B.A. Satravada: Software. F. Schaeffer: Software. R. Sheridan: Software. J. Singh: Software. R. Sirohi: Software. S.O. Sumer: Software. S. van Hagen: Software, supervision, validation, investigation. A. Wang: Software. M. Wilson: Software. H. Zhang: Software. K. Zhu: Software. N. Rusk: Writing–review and editing. S. Brown: Formal analysis, validation, writing–review and editing. J.A. Lavery: Formal analysis, validation, writing–review and editing. K.S. Panageas: Supervision, writing–review and editing. J.E. Rudolph: Conceptualization, supervision. M.L. LeNoue-Newton: Methodology, writing–review and editing. J.L. Warner: Supervision, writing–review and editing. X. Guo: Software. H. Hunter-Zinck: Software. T.V. Yu: Software, writing–review and editing. S. Pillai: Data curation. C. Nichols: Data curation, software. S.M. Gardos: Supervision. J. Philip: Supervision. G. BPC Core Team: Resources, data curation, formal analysis, validation, investigation, methodology, project administration, writing–review and editing. A. Project GENIE Consortium: Resources, data curation, validation, investigation, methodology, project administration, writing–review and editing. K.L. Kehl: Conceptualization, supervision. G.J. Riely: Conceptualization, supervision. D. Schrag: Conceptualization, supervision, methodology, writing–review and editing. J. Lee: Conceptualization, supervision. M.V. Fiandalo: Supervision. S.M. Sweeney: Conceptualization, supervision. T.J. Pugh: Supervision. C. Sander: Supervision. E. Cerami: Resources, supervision. J. Gao: Conceptualization, software, supervision, investigation, visualization, methodology, writing–original draft, writing–review and editing. N. Schultz: Conceptualization, supervision, writing–original draft, writing–review and editing.
Acknowledgments
The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to open data. The full list of AACR Project GENIE Consortium members and AACR Project GENIE BPC Core Team members can be found in the Supplementary Data (AACR Project GENIE Consortium Members, AACR Project GENIE BPC Core Team Members). Interpretations are the responsibility of study authors. While this manuscript focuses on the newly developed clinico-genomic features for cBioPortal, it builds on over a decade of work supported by several organizations. The authors would like to thank our funders: the American Association for Cancer Research; the NCI, through ITCR grant U24 CA274633, GDAN grant U24 CA264028, and HTAN grant U24 CA233243; the Marie-José and Henry R. Kravis Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center; Dana-Farber Cancer Institute; the Prostate Cancer Foundation; and Google Summer of Code. Support for this work was provided to Memorial Sloan Kettering Cancer Center by a core grant from the National Cancer Institute (P30 CA008748). The authors would also like to thank all past funders: Stand Up 2 Cancer; the Ben & Catherine Ivy Foundation; the NCI, as a TCGA Genome Data Analysis Center (GDAC; NCI-U24CA143840); NCRR, as the National Resource for Network Biology (NRNB) Research Resource (RR 031228–02); the Starr Cancer Consortium; the Breast Cancer Research Foundation; the Adenoid Cystic Carcinoma Research Foundation; the POETIC Consortium; the Cholangiocarcinoma Foundation; the Robertson Foundation; and the Parker Institute for Cancer Immunotherapy. Finally, the authors thank all users of cBioPortal, as their feedback has directly influenced many of the features described here.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).