SP1 - SASKit

Learning the best biomarkers for pancreatic cancer, stroke and their comorbidity, based on human, animal and in-vitro data

The bioinformatics work will focus on analyzing the transcriptomics and protein data, as well as blood counts and other clinical/phenotypic data, together with public data, towards biomarker identification. We aim for easy-to-interpret diagnostic/theranostic biomarkers for deterioration/recovery and, ultimately, for therapeutic intervention. Machine learning of biomarkers will start with simple correlation-based approaches that, in principle, can include any measurements for which several time points or conditions are available. The highest-correlation interactions and correlation-based subnetworks will then be investigated in detail. For example, we expect the longitudinal change in inflammatory blood components (e.g. neutrophils) to correlate with inflammation-related gene expression across tissues, age groups and overall senescence status. We will also run the most promising standard omics analyses, such as Gene Ontology and pathway enrichment analyses. As endpoint data such as progression/survival or recovery data become available, Cox proportional hazards models, support vector machines (SVMs), random forests and (deep) neural networks will be used.
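The correlation-based starting point described above could be sketched roughly as follows. This is only a minimal illustration, assuming each measurement is a list of values taken at the same time points or conditions; the function names (pearson, rank_correlated_pairs), the measurement names and all numbers are hypothetical toy values, not project data. Pearson correlation is used here for simplicity, and a rank-based measure might be preferable in practice.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_correlated_pairs(measurements, top_n=3):
    """Return the top_n measurement pairs ordered by absolute correlation."""
    scored = []
    for name_a, name_b in combinations(sorted(measurements), 2):
        r = pearson(measurements[name_a], measurements[name_b])
        scored.append((name_a, name_b, r))
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:top_n]


# Hypothetical toy data: four time points per measurement (invented values).
toy_measurements = {
    "neutrophil_count": [4.0, 5.0, 6.5, 8.0],
    "IL6_expression": [1.0, 1.4, 2.1, 2.9],
    "unrelated_marker": [3.0, 1.0, 4.0, 2.0],
}

top_pairs = rank_correlated_pairs(toy_measurements, top_n=1)
for name_a, name_b, r in top_pairs:
    print(f"{name_a} ~ {name_b}: r = {r:.3f}")
```

Pairs ranked this way would then be the candidates for the detailed investigation of interactions and subnetworks.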
An important component of our analysis will be the parallelogram approach, in which we extrapolate expression data for one species/tissue combination once data for the three other combinations are known. This will allow us to estimate expression values for human diseased tissue (for which we cannot obtain biological material) based on expression data from patient blood and from blood and diseased tissue of the mouse models. The extrapolation step itself can use various formulas, and we will specifically optimize these for interaction, subnetwork and pathway activation data. During preliminary investigations, we found no suitable publicly available datasets, but we successfully contacted the authors of van der Velpen (2016) and obtained their data on isoflavone effects in blood and fat, in rat and human. We then tested a simple parallelogram approach at the pathway level. Indeed, the ratio of log fold changes between blood and fat in rat allowed good predictions of pathway activity in human fat on the basis of human blood. The sum of log fold changes for the KEGG cellular senescence pathway was predicted with an error of only 3%.
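One simple reading of the ratio-based extrapolation described above is sketched below. It assumes that a pathway score is the sum of log fold changes over the pathway's transcripts, and that the human fat score is estimated by scaling the human blood score with the rat fat/blood ratio. The exact formula used in the preliminary test is not specified here, so this is an illustration only; the function names, gene labels and all numbers are hypothetical and do not reproduce the reported result.

```python
def pathway_score(log_fold_changes, pathway_genes):
    """Sum log fold changes over the pathway's transcripts that were measured."""
    return sum(log_fold_changes[g] for g in pathway_genes if g in log_fold_changes)


def parallelogram_predict(human_blood, rat_blood, rat_fat):
    """Estimate the human fat score from the three known corners.

    Assumed formula: human_fat ~ human_blood * (rat_fat / rat_blood).
    """
    if rat_blood == 0:
        raise ValueError("rat blood score is zero; the ratio is undefined")
    return human_blood * (rat_fat / rat_blood)


def relative_error(predicted, observed):
    """Absolute prediction error relative to the observed value."""
    return abs(predicted - observed) / abs(observed)


# Hypothetical pathway membership and log fold changes (invented values).
pathway = ["geneA", "geneB", "geneC"]
rat_blood_lfc = {"geneA": 0.5, "geneB": 1.0, "geneC": 0.5}
rat_fat_lfc = {"geneA": 1.0, "geneB": 1.5, "geneC": 0.5}
human_blood_lfc = {"geneA": 0.25, "geneB": 0.5, "geneC": 0.25}

predicted_human_fat = parallelogram_predict(
    pathway_score(human_blood_lfc, pathway),
    pathway_score(rat_blood_lfc, pathway),
    pathway_score(rat_fat_lfc, pathway),
)
print(f"predicted human fat pathway score: {predicted_human_fat:.2f}")
```

Where the fourth corner is actually measured, as in the rat/human blood/fat dataset, relative_error can be used to compare the prediction against the observed score.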


Parallelogram Approach: Log fold changes are summed over the transcripts contributing to cellular senescence and to other pathways, and the resulting PBMC/fat ratios are plotted for human versus rat. Values close to 1.0 indicate a strong similarity between blood and fat. In this simple test of the idea, the black “dot” for cellular senescence lies close to the middle (1.0/1.0), together with the “dots” of many other pathways, suggesting good extrapolations for these.