SP 3

Data management and Multiscale Computational Modeling

The MultiscaleHCC consortium aims to develop novel models that reproduce the growth of hepatocellular carcinoma (HCC, liver cancer) under the most common therapies (Sorafenib and transarterial chemoembolisation, or TACE). Such models provide invaluable insights into the liver tumor's anatomy and its response to treatment, ultimately supporting the refinement of known therapies. To create sound and robust models, the data in our clinical studies are generated by several complementary high-throughput technologies (genome sequencing, protein and metabolite measurements, medical imaging) and interpreted in combination. Because these datasets are large and complex (“big data”), even a few measurements cannot be evaluated without efficient, automated computer algorithms. Furthermore, a robust strategy for the storage and management of meta information is crucial in order to integrate and analyze the heterogeneous data consistently.

Our project fulfills these requirements by designing a consistent strategy to store and manage the “big data” generated by the MultiscaleHCC consortium, to annotate them with clinically relevant parameters, and to automate their processing and statistical analysis. In this way, a single consistent system captures every step, from the experimental design to the data interpretation and the final validation of novel prediction models for liver carcinoma. The consistent modelling and curation of all data is crucial for the sustainability of the research conducted by the MultiscaleHCC consortium. Our strategy is built upon the IT infrastructure of the Quantitative Biology Center (QBiC) in Tübingen, which provides redundant storage of data and facilitates their time-efficient processing through high-performance computing.

Aside from implementing the technical requirements of our project, we are involved in the bioinformatics analysis and interpretation of the data.
This includes mapping data onto biological networks in order to develop new ideas about the (patho)physiological mechanisms involved in tumor growth. We pass this information on to our consortium partners. In doing so, we support the development of novel tumor models and the refinement of existing ones, to achieve a better understanding of liver cancer.

During the course of the clinical and co-clinical trials, which are at the heart of the MultiscaleHCC project, we have collected and processed the acquired data and made them available to all partners. To this end, we have extended our existing data models for capturing omics-based experiments to clinical and preclinical biomedical imaging experiments. In this manner, we ensure that a comprehensive analysis of seemingly disparate datasets can be implemented and standardized.

Moreover, to aid physicians in making therapeutic decisions based on the results of tumor gene sequencing, we have developed a pipeline that reduces the results to key driver genes and indicates whether these can be targeted by drugs directly or indirectly. The pipeline condenses the large output of genomic sequencing into a tabular overview, which can be printed as a document for use in clinical routine and helps to maintain concise records of therapeutic decisions based on genomic testing.

Further work will focus on automating the holistic data analysis methods that prove generally applicable as more patients are enrolled in the clinical trial. This will allow participants in this consortium, as well as future projects with similar aims, to take advantage of automatic analysis workflows.
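
To illustrate the idea behind the gene-sequencing summary pipeline described above, the following is a minimal Python sketch, not the consortium's actual implementation. It assumes the sequencing output has already been reduced to a list of variant records. The gene names, drug names, lookup tables, and the functions `summarize_variants` and `to_table` are hypothetical placeholders. A real pipeline would draw on curated knowledge bases.

```python
import csv
import io

# Hypothetical lookup tables (placeholders, not real clinical annotations).
DRIVER_GENES = {"GENE_A", "GENE_B", "GENE_C"}
DRUG_TARGETS = {
    "GENE_A": ("direct", "DRUG_X"),    # the drug acts on the gene product itself
    "GENE_B": ("indirect", "DRUG_Y"),  # the drug acts elsewhere in the same pathway
}

COLUMNS = ["gene", "variant", "targeting", "drug"]


def summarize_variants(variants):
    """Keep only variants in driver genes and annotate their drug targeting."""
    rows = []
    for record in variants:
        gene = record["gene"]
        if gene not in DRIVER_GENES:
            continue  # drop variants outside the driver gene list
        targeting, drug = DRUG_TARGETS.get(gene, ("none", ""))
        rows.append({
            "gene": gene,
            "variant": record["variant"],
            "targeting": targeting,
            "drug": drug,
        })
    return sorted(rows, key=lambda row: (row["gene"], row["variant"]))


def to_table(rows):
    """Render the summary as tab-separated text suitable for printing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    example = [
        {"gene": "GENE_A", "variant": "variant_1"},
        {"gene": "GENE_D", "variant": "variant_2"},  # not a driver gene
        {"gene": "GENE_C", "variant": "variant_3"},  # driver without a known drug
    ]
    print(to_table(summarize_variants(example)))
```

Keeping the filtering step separate from the rendering step means the same summary rows could feed a printed report or a structured record of the therapeutic decision.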

Keywords: Bioinformatics, data integration