After years of intensive research to systematically catalogue genetic alterations with a causative role in cancer, their exploitation in clinical oncology is now potentially within reach. Large-scale efforts for the systematic molecular characterization of human tumors, such as The Cancer Genome Atlas (TCGA), have demonstrated their great informative potential for categorizing molecularly circumscribed tumor subpopulations that feature specific genetic lesions. However, the ultimate goal of “personalized medicine” requires the validation of such lesions as therapeutic targets and the definition of biomarkers that accurately predict sensitivity to rational treatments. These aims have not yet been fulfilled, because the technical limitations and the descriptive nature of endeavors such as TCGA have limited researchers' ability to link sequencing data with clinical outcomes in terms of prognosis and response to treatment. Accordingly, Nature recently stated:

The end of TCGA (expected at the end of 2015) represents an opportunity for the field to balance its cancer-genomics projects more evenly between cataloguing mutations and studying their functional significance. Functional studies have had short shrift, whereas “sequencing [is] a simple concept, and easier to communicate to policy-makers and the public” has taken the lead. Correcting that imbalance will lead to exciting discoveries for science and for patients (see Nature 508, 287-288; 2014).

Patient-derived xenografts (PDX) offer a new means of addressing this issue by combining the flexibility of preclinical analysis with the instructive value of population-based studies. Deep biological and molecular characterization of a large number of established tumor grafts could give PDX-based approaches the statistical robustness needed to conduct reliable genotype-response correlation studies at the population level.

To this end, international efforts are ongoing to establish consortia in which groups from multiple countries share PDX models and their biomolecular characterization, in order to build a higher-order resource available to the scientific community. Among these, EurOPDX brings together researchers from 14 EU countries who have collectively generated 1500+ PDX models from 30+ tumor types. These collaborative frameworks offer unique opportunities for the generation, development and validation of new hypotheses in cancer treatment and diagnosis. However, the technical implementation of dedicated data-sharing platforms for PDX biobanks remains a major challenge.

Two of the major experimental advantages of PDX-based approaches over “classical” methods become major hurdles for efficient data sharing:

a. Data heterogeneity. Typical biobanks usually offer information on sample availability accompanied by a limited set of descriptive data (molecular, clinical, histopathological annotations), which are normally shared by all the samples derived from an individual patient. In PDX collections, every collected sample gives rise to a virtually unlimited genealogy of derived tumors, which are serially propagated by transplantation from mouse to mouse. Thus, besides tracing sample availability, a further level of complexity is needed to track the experimental details specific to a subsample within a collection (e.g., the number of xenotransplants that preceded the generation of the actual sample, or the drugs administered to the mouse bearing this specific tumor and how these affected its growth).

b. Data dynamics. Oncogenomic portals that allow mining of deeply characterized tumor samples are based on batch releases of rather static data. Once generated, the data are available to the public, which is allowed to read and mine them but not to write to the central repository. Partly because the original sample is no longer available for further testing, the data flow is always unidirectional. This paradigm is completely reversed in PDX networks, where sample availability is usually not a rate-limiting step. Indeed, the possibility to incrementally stratify and integrate multiple layers of information generated by diverse laboratories at different times is one of the key added values of PDX-based approaches over standard methods. Moreover, the relatively short lifecycle of typical xenografts renders the biobank itself extremely dynamic, requiring dedicated efforts to facilitate prompt and frequent updates from multiple distributed users.
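As an illustration of the sample-level tracking described in point (a), the following is a minimal sketch of how a PDX genealogy might be represented. It is not taken from any existing platform: the class names `Sample` and `Treatment`, the field names, and the convention that the patient sample counts as passage 0 are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Treatment:
    """A drug given to the mouse bearing a sample (hypothetical schema)."""
    drug: str
    growth_effect: str  # free-text outcome, e.g. "regression"


@dataclass
class Sample:
    """One node in a PDX genealogy (hypothetical schema).

    `parent` is the sample this one was transplanted from;
    the original patient sample has no parent.
    """
    sample_id: str
    parent: Optional["Sample"] = None
    treatments: List[Treatment] = field(default_factory=list)

    def passage_number(self) -> int:
        """Count the xenotransplants between the patient sample and this one.

        Assumption: the patient sample itself is passage 0.
        """
        count = 0
        node = self
        while node.parent is not None:
            count += 1
            node = node.parent
        return count

    def lineage(self) -> List[str]:
        """Return sample ids from the patient sample down to this one."""
        ids = []
        node: Optional[Sample] = self
        while node is not None:
            ids.append(node.sample_id)
            node = node.parent
        return list(reversed(ids))


# Usage with made-up identifiers
patient = Sample("P1")
first_graft = Sample("P1-X1", parent=patient)
second_graft = Sample(
    "P1-X2",
    parent=first_graft,
    treatments=[Treatment(drug="drug_A", growth_effect="regression")],
)
```

Storing only a parent link keeps each record small while still allowing the full history of any subsample to be reconstructed on demand, which suits a collection that grows with every transplantation.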

To tackle these issues, we implemented a prototype platform for the coordinated management, storage and sharing of PDX-related data. The application, called the Laboratory Assistant Suite (LAS), aims to reduce operator-dependent error in data generation, tracking, and storage by minimizing manual data entry and by guiding users through dedicated SOPs. LAS also integrates complex multidimensional data (including in vivo experiments and molecular, biobanking, and storage information) by managing multiple independent databases linked in a network, which can be queried through a dedicated graphical interface. The tool is amenable to multiuser/multicenter deployment and allows high-granularity management of data access privileges, enabling flexible data-sharing and embargo policies.
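To make the notion of granular access privileges and embargo policies concrete, the sketch below shows one possible per-record read check. It does not describe how LAS works internally: the `Record` class, its fields, and the `can_read` rule (owners always read; other groups read only if explicitly granted access and only once any embargo has expired) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Record:
    """A shareable data record (hypothetical schema)."""
    record_id: str
    owner_group: str
    shared_with: FrozenSet[str] = frozenset()
    embargo_until: Optional[date] = None  # None means no embargo


def can_read(record: Record, user_group: str, today: date) -> bool:
    """Decide whether a group may read a record on a given day."""
    # The owning group always has access to its own data.
    if user_group == record.owner_group:
        return True
    # While the embargo is active, no other group may read the record.
    if record.embargo_until is not None and today < record.embargo_until:
        return False
    # After the embargo, only explicitly listed groups may read it.
    return user_group in record.shared_with


# Usage with made-up group names and dates
record = Record(
    record_id="R1",
    owner_group="lab_A",
    shared_with=frozenset({"lab_B"}),
    embargo_until=date(2016, 6, 1),
)
```

Evaluating the policy per record, rather than per database, is what would let a contributor release some data to partners while keeping newer results under embargo.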

Citation Format: Andrea Bertotti. Heterogeneous data management and integration for living biobanks. [abstract]. In: Proceedings of the AACR Special Conference: Patient-Derived Cancer Models: Present and Future Applications from Basic Science to the Clinic; Feb 11-14, 2016; New Orleans, LA. Philadelphia (PA): AACR; Clin Cancer Res 2016;22(16_Suppl):Abstract nr IA08.