SP4 - COMMITMENT

Multi-OMICs transfer learning

To better describe, and eventually diagnose, complex diseases such as schizophrenia, the most promising approach appears to be the integration of multiple data types: molecular data (gene expression data, epigenetic data, etc.), imaging data (such as brain MRI), and clinical and lifestyle data. Using advanced machine-learning approaches, we can combine such heterogeneous datasets to build signatures, or digital “fingerprints”, of a disease, which can be used to detect and classify the disease in a potential patient. One challenge of this approach is the heterogeneous nature of these data types, which describe the disease at very different levels (from the molecular to the macroscopic level). Hence, as part of this project, we want to develop meaningful ways to project these data types into a common description space, for example at the level of pathway activities. This has the advantage that distinct data modalities can be described in a unified framework, and is expected to improve the predictive potential of machine-learning models.

In addition, the datasets available for different patient cohorts are often difficult to compare directly, as the cohorts may not have the same data types available. For some cohorts only genetic data are provided, while others might have expression data. Hence, a further challenge will be to apply predictive models based on some data types (for example gene expression or imaging data) to other patient cohorts for which only some of these data, or even different data types, are available. Such a strategy is called “transfer learning”, by which a trained model is transferred to a distinct biological context.
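As an illustration of what a common description space could look like, the following minimal sketch projects two toy cohorts with different measurements onto the same pathway-level features. Everything in it is an assumption made for illustration only: the gene names (G1 to G4), the pathway definitions, the toy values, and the choice of the mean standardized value of member features as the "pathway activity". It is not the method that will be developed in the project.

```python
from statistics import mean, pstdev

def zscore_features(samples):
    """Standardize each feature across the samples of one cohort.

    `samples` is a list of dicts mapping feature name -> measured value.
    """
    features = sorted(set().union(*samples))
    stats = {}
    for f in features:
        values = [s[f] for s in samples if f in s]
        sd = pstdev(values)
        # Guard against constant features (zero standard deviation).
        stats[f] = (mean(values), sd if sd > 0 else 1.0)
    return [
        {f: (v - stats[f][0]) / stats[f][1] for f, v in s.items()}
        for s in samples
    ]

def pathway_activity(sample, pathways):
    """Score each pathway as the mean standardized value of its measured members."""
    scores = {}
    for name, members in pathways.items():
        measured = [sample[g] for g in members if g in sample]
        scores[name] = mean(measured) if measured else None
    return scores

# Hypothetical pathway definitions (placeholders, not real annotations).
PATHWAYS = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4"]}

# Cohort 1: all four features measured (e.g. expression-like values).
cohort_1 = [
    {"G1": 5.0, "G2": 7.0, "G3": 1.0, "G4": 2.0},
    {"G1": 9.0, "G2": 11.0, "G3": 1.0, "G4": 4.0},
]
# Cohort 2: a different modality covering only part of the features.
cohort_2 = [
    {"G1": 0.2, "G3": 0.8},
    {"G1": 0.6, "G3": 0.4},
]

scores_1 = [pathway_activity(s, PATHWAYS) for s in zscore_features(cohort_1)]
scores_2 = [pathway_activity(s, PATHWAYS) for s in zscore_features(cohort_2)]

# Both cohorts are now described by the same pathway-level features.
print(sorted(scores_1[0]), sorted(scores_2[0]))
```

Because both cohorts end up with the same feature names, a model trained on the pathway scores of one cohort could in principle be applied to the other, which is the situation the transfer-learning strategy is meant to address. How well such a transfer works in practice is exactly the question the project sets out to study.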

We will derive the most appropriate strategies to perform such a model transfer, to ensure a wide applicability of our predictive models. To do so, we will use previously developed unsupervised learning approaches based on matrix factorization or neural networks, and extend them to work on distributed datasets. As a result, we will provide a toolbox of validated methods which can be used to

(1) predict and classify the disease in any available patient cohort, and

(2) compare the digital fingerprint to those obtained in comorbid diseases such as diabetes or cardiovascular diseases.

Keywords: machine-learning; multi-omics