Institute of Bioinformatics · University Medicine Greifswald · Ernst-Moritz-Arndt University of Greifswald
Large-scale data sets arising in modern biology are often so complex that manual analysis is infeasible. Computers are indispensable in this situation, and automated machine learning tools can be key to extracting information from high-dimensional data. Problems frequently encountered in biological data analysis include recognizing and identifying (noisy) patterns in large, high-dimensional data sets, correlating such patterns with biological or clinical phenotypes, and predicting phenotypes from new data.
We develop and apply machine learning algorithms to analyze and classify large-scale data sets, and we build predictive models of biological processes based on high-dimensional experimental data. Our work encompasses supervised and unsupervised machine learning tools for this purpose, which we employ in collaborative research projects to elucidate biological function.
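As a minimal illustration of the supervised setting (predicting a phenotype label from a high-dimensional profile), the sketch below uses a nearest-centroid classifier on a tiny, made-up data set. The classifier, the function names `fit_centroids` and `predict`, and the data are hypothetical choices for illustration only, not a description of the methods used in our projects; real expression data would additionally need normalization, feature selection and cross-validation.

```python
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import fmean   # floating-point mean (Python 3.8+)


def fit_centroids(samples, labels):
    """Compute one mean profile (centroid) per phenotype label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    # zip(*rows) iterates over the feature columns of each group.
    return {
        label: [fmean(col) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign x to the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical toy data: 4 samples x 3 "genes", two phenotype labels.
train_x = [
    [5.1, 0.2, 3.0],
    [4.8, 0.4, 2.9],
    [1.0, 3.9, 3.1],
    [1.3, 4.2, 2.8],
]
train_y = ["tumor", "tumor", "normal", "normal"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [4.9, 0.5, 3.0]))
print(predict(centroids, [1.1, 4.0, 3.0]))
```

Nearest centroid is chosen here only because it fits in a few lines; the third "gene" has nearly the same value in both groups and contributes almost nothing to the decision, which hints at why selecting informative features matters in high-dimensional data.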
Individual genes are known to correlate with certain phenotypic traits, for example an increased risk for specific diseases. One example out of many is sickle cell disease, a severe condition caused by a single mutation in a hemoglobin gene.
For complex diseases, no single gene is responsible; instead, a combination of several to many genes ultimately causes the disease. Cancer is an example of a complex disease: for most cancers, a combination of environmental factors and genetic predisposition underlies the development and progression of the tumor. If these combinatorial patterns were known, they could be used not only for diagnosis or staging, but also to understand a particular patient's disease in more detail, up to the point where treatment can be tailored to the individual patient's genomic profile.
We work on the development of supervised and unsupervised methods to identify such predictive patterns in large scale data, pursuing the following aims:
Network inference deals with the problem of reconstructing a gene regulatory or signal transduction network from observations of the network's behavior. Given only observational data, for example measurements over time or after certain interventions, the task is to infer how the system functions internally. This is a difficult inverse problem that has attracted much attention in engineering ("system identification") and has tremendous potential for applications in biology. We work at the forefront of developing new methods for network reconstruction in biological applications, using, for example, Bayesian models or nonlinear dynamic systems.
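To make the inverse problem concrete, the sketch below assumes a deliberately simple linear, discrete-time model x[t+1] = A x[t] + noise, in which A[i][j] is the influence of gene j on gene i. It simulates one intervention experiment per gene from a known, hypothetical matrix `A_TRUE`, estimates A by ordinary least squares, and reports off-diagonal entries above a hypothetical threshold as regulatory edges. This is a textbook system-identification baseline written in plain Python, not the Bayesian or nonlinear methods mentioned above; all names, weights and parameters are illustrative assumptions.

```python
import random


def solve(m, b):
    """Solve the square system m @ a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(m, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n  # back substitution
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a


def simulate(a_true, x0, steps, noise_sd, rng):
    """Simulate x[t+1] = A x[t] + Gaussian process noise; return the trajectory."""
    traj = [x0]
    for _ in range(steps):
        x = traj[-1]
        traj.append([sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + rng.gauss(0.0, noise_sd)
                     for row in a_true])
    return traj


def infer_network(trajectories):
    """Least-squares estimate of A from consecutive state pairs, one target gene per row."""
    pairs = [(tr[t], tr[t + 1]) for tr in trajectories for t in range(len(tr) - 1)]
    n = len(pairs[0][0])
    # Normal equations (X^T X) a_i = X^T y_i, where X stacks the states x[t].
    xtx = [[sum(x[j] * x[k] for x, _ in pairs) for k in range(n)] for j in range(n)]
    return [solve(xtx, [sum(x[j] * y[i] for x, y in pairs) for j in range(n)])
            for i in range(n)]


def edges(a_hat, threshold):
    """Map (regulator, target) -> weight for |weight| > threshold, excluding self-terms."""
    n = len(a_hat)
    return {(j, i): a_hat[i][j] for i in range(n) for j in range(n)
            if i != j and abs(a_hat[i][j]) > threshold}


A_TRUE = [
    [0.8, 0.0, 0.0],   # gene 0: self-decay only
    [0.5, 0.7, 0.0],   # gene 1: activated by gene 0
    [0.0, -0.6, 0.9],  # gene 2: repressed by gene 1
]
rng = random.Random(0)
# One hypothetical "intervention" per gene: start with that gene raised to 1.
experiments = [simulate(A_TRUE, [1.0 if g == k else 0.0 for g in range(3)], 8, 0.005, rng)
               for k in range(3)]
A_hat = infer_network(experiments)
print(edges(A_hat, threshold=0.2))
```

With noise this small relative to the perturbations, the only edges above the threshold should be gene 0 → gene 1 (positive weight) and gene 1 → gene 2 (negative weight). Real data are noisier, often nonlinear, and typically contain more genes than informative measurements, in which case plain least squares is underdetermined and regularized or Bayesian approaches become necessary.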