Institute of Bioinformatics · University Medicine Greifswald · University of Greifswald

The inference of signal transduction and gene regulatory networks (GRN) from data is a major challenge in systems biology. Viewing signal transduction or gene regulation as a black box, the aim is to reconstruct the underlying processes from observational or interventional data. Sophisticated machine learning algorithms and computational methods from the system identification field can be used for this purpose. We develop algorithms for the inference of networks from high-throughput experimental data, using a variety of mathematical and computational tools, and based on different types of experimental data.

Gene expression is tightly controlled in cells. Depending on external and internal conditions in the cell, genomic DNA is transcribed to messenger RNA, which is subsequently translated to protein. If this protein is a transcription factor, it can in turn activate or inactivate the transcription of other genes, leading to complicated genetic regulatory networks. Noncoding RNAs can further complicate this picture, and can also influence gene or protein expression. Our aim in the following projects is to understand these regulatory networks in more detail, using computational methods.

In this project, we have embedded probabilistic Boolean networks into a Bayesian statistical learning approach. We propose a specific prior distribution on model parameters that drives learning toward so-called scale-free network topologies, in which a few hub genes are highly connected while most genes in the network are of only minor regulatory significance. Together with the probabilistic approach and the underlying Boolean network model, this allows the inference of fairly large networks while taking properties of real biological networks into account. We have successfully applied this approach to simulated data and to real experimental data on the yeast cell cycle. Results show excellent performance even for relatively large networks with several hundred genes, and have led to the identification of new hubs in the yeast transcriptional network.
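The effect of a prior that favors hub-dominated topologies can be illustrated with a minimal sketch. It is not the prior used in the project: it assumes, purely for illustration, an unnormalized power-law score P(k) ∝ (k + 1)^(-γ) applied to the out-degree k of each gene, and the function name `log_scale_free_prior` and the exponent `gamma` are hypothetical.

```python
import math


def log_scale_free_prior(adjacency, gamma=2.5):
    """Unnormalized log-prior over a network topology.

    adjacency[i][j] == 1 means gene i regulates gene j.
    Assumption (not from the project): each out-degree k contributes
    -gamma * log(k + 1), i.e. a power law in (k + 1).
    """
    total = 0.0
    for row in adjacency:
        out_degree = sum(row)
        total += -gamma * math.log(out_degree + 1)
    return total


# Two toy networks with the same number of edges (four each).
hub_network = [
    [0, 1, 1, 1, 1],  # gene 0 is a hub regulating all other genes
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
spread_network = [
    [0, 1, 0, 0, 0],  # a chain: every gene regulates exactly one other gene
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

hub_score = log_scale_free_prior(hub_network)
spread_score = log_scale_free_prior(spread_network)
```

Because the logarithm is concave, concentrating edges on one gene costs less than spreading them evenly, so under this assumed prior the hub network receives the higher score.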

We have developed a novel approach to infer gene regulatory networks from time series gene expression data, using ordinary and delay differential equations to describe the system's dynamic behavior. The model is based on biochemical reaction kinetics, so its parameters correspond directly to reaction properties. Parameters of the differential equations and the network topology are estimated simultaneously using simulated annealing or Markov chain Monte Carlo methods, the latter permitting the computation of distributions over network topologies and model parameters.

To circumvent overfitting and handle noisy experimental data, we embed our differential equations into a Bayesian framework, and encode prior expectations on model parameters and network topologies into a prior distribution. Our particular choice of prior distribution is designed to drive network inference to sparse solutions, where the number of significant regulations between different genes is kept low.
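The combination of a differential equation model with a sparsity-inducing prior can be sketched as follows. This is a simplified illustration under stated assumptions, not the model used in the project: it assumes linear dynamics dx_i/dt = Σ_j w_ij x_j − d·x_i, Euler integration, Gaussian measurement noise, and a Laplace (L1-type) prior on the interaction weights. All names and parameter values are hypothetical.

```python
def simulate(weights, x0, decay=0.5, dt=0.1, steps=20):
    """Euler integration of dx_i/dt = sum_j weights[i][j] * x_j - decay * x_i."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = [
            sum(weights[i][j] * x[j] for j in range(n)) - decay * x[i]
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory


def log_posterior(weights, data, x0, sigma=0.1, prior_scale=0.5):
    """Unnormalized log-posterior: Gaussian log-likelihood plus Laplace log-prior."""
    predicted = simulate(weights, x0, steps=len(data) - 1)
    squared_error = sum(
        (p - d) ** 2
        for p_row, d_row in zip(predicted, data)
        for p, d in zip(p_row, d_row)
    )
    log_likelihood = -squared_error / (2 * sigma ** 2)
    # Assumed sparsity prior: every nonzero weight is penalized by |w| / prior_scale.
    log_prior = -sum(abs(w) for row in weights for w in row) / prior_scale
    return log_likelihood + log_prior


# Noise-free toy data generated from a sparse two-gene network (gene 0 activates gene 1).
true_weights = [[0.0, 0.0], [0.8, 0.0]]
x0 = [1.0, 0.2]
data = simulate(true_weights, x0)

dense_weights = [[0.0, 0.3], [0.8, 0.0]]  # adds a spurious edge
empty_weights = [[0.0, 0.0], [0.0, 0.0]]  # drops the true edge

score_true = log_posterior(true_weights, data, x0)
score_dense = log_posterior(dense_weights, data, x0)
score_empty = log_posterior(empty_weights, data, x0)
```

In this toy setting the sparse generating network scores higher than both the network with a spurious edge and the empty network. A function of this kind is the target that simulated annealing or Markov chain Monte Carlo would explore.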

Signal transduction processes control the flow of information in a cell, and determine, for example, how the cell senses and processes information from its environment. While experimental techniques exist that allow the observation of cellular phenotypes after targeted interventions, such as RNA interference or other knockdown or knockout screens, the inference of signal transduction networks from such perturbation data is a daunting task. We are developing computational tools to infer networks after RNAi knockdowns or similar perturbations, when phenotypic effects are observed after each knockdown at one or several time points.

One option is to use a dynamic Bayesian network model. Here, signaling proteins are modeled as either active or inactive, and signal propagation through the network is modeled stochastically. One can then define a likelihood function that gives the probability of observing particular phenotypes after knockdowns. Using Bayes' theorem, one can sample from the posterior distribution over network topologies and model parameters, and derive probabilities for alternative models. These probabilities can in turn guide experimental design by identifying the experiment that would be most informative for refining the network topology further.
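The idea of scoring alternative topologies by the likelihood of knockdown phenotypes can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the project: signal propagation is deterministic, the phenotype is the activity of a single reporter node, each observation is wrong with a fixed error rate, the prior over topologies is uniform, and the posterior is computed by enumerating three hypothetical candidate networks rather than by sampling.

```python
def reporter_active(edges, source, reporter, knocked_down):
    """Propagate activity from the source along edges, skipping the knocked-down node."""
    if source == knocked_down:
        return False
    active = {source}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for parent, child in edges:
            if parent == node and child != knocked_down and child not in active:
                active.add(child)
                frontier.append(child)
    return reporter in active


def likelihood(edges, observations, source, reporter, error_rate=0.1):
    """Probability of the observed phenotypes given a topology (assumed noise model)."""
    p = 1.0
    for knocked_down, observed in observations.items():
        predicted = reporter_active(edges, source, reporter, knocked_down)
        p *= (1 - error_rate) if predicted == observed else error_rate
    return p


def posterior(candidates, observations, source, reporter):
    """Bayes' theorem with a uniform prior over the candidate topologies."""
    scores = {
        name: likelihood(edges, observations, source, reporter)
        for name, edges in candidates.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical candidate topologies over nodes S (source), A, B, R (reporter).
candidates = {
    "chain": [("S", "A"), ("A", "B"), ("B", "R")],
    "parallel": [("S", "A"), ("A", "R"), ("S", "B"), ("B", "R")],
    "a_only": [("S", "A"), ("A", "R")],
}
# Observed reporter activity without knockdown (None) and after knocking down A or B.
observations = {None: True, "A": False, "B": True}

probabilities = posterior(candidates, observations, source="S", reporter="R")
best_model = max(probabilities, key=probabilities.get)
```

Here only the `a_only` topology explains all three observations, so it receives most of the posterior mass. With many nodes, enumeration becomes infeasible, which is why sampling from the posterior is needed in practice.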

Another approach developed in our group formulates the network inference challenge as a linear optimization problem. We assume that signal transduction within a network can be described as an information flow: a protein influences the proteins further downstream in the network topology, so the knockdown of an individual gene affects its child nodes. Based on these assumptions, the proposed linear model uses the observed data to find the network topology that minimizes the edge weights. Using the simplex algorithm, the models can be solved quickly and efficiently even for large data sets.

Yet another network inference method developed in our group is based on Bayesian networks in combination with probabilistic Boolean thresholding. Here, we use distributed evolutionary Markov chain Monte Carlo to deal with the high dimensionality of the target distribution. The genetic operators of the evolutionary method have been modified and adapted to the proposed target function. Biological prior knowledge can easily be incorporated into the network inference task. The approach preferentially uses time-course data, in which perturbation effects are considered directly, but it can also be applied to steady-state or incomplete time-series data.
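The general idea behind evolutionary Markov chain Monte Carlo can be sketched on a toy problem. The sketch below is not the group's implementation: it runs a small population of chains at different temperatures in a single process, uses only a mutation move (single bit flip) and an exchange move between neighboring chains, and omits crossover. The target function is a hypothetical stand-in that simply rewards agreement with a reference edge vector.

```python
import math
import random


def log_target(state, reference):
    """Hypothetical stand-in for a log-posterior over binary edge indicators."""
    return -2.0 * sum(a != b for a, b in zip(state, reference))


def evolutionary_mcmc(reference, temperatures=(1.0, 2.0, 4.0, 8.0),
                      iterations=2000, seed=0):
    rng = random.Random(seed)
    n = len(reference)
    population = [[0] * n for _ in temperatures]
    scores = [log_target(state, reference) for state in population]
    best_state, best_score = list(population[0]), scores[0]

    for _ in range(iterations):
        # Mutation: flip one random bit in each chain (Metropolis acceptance
        # at the chain's own temperature; the proposal is symmetric).
        for c, temperature in enumerate(temperatures):
            proposal = list(population[c])
            position = rng.randrange(n)
            proposal[position] = 1 - proposal[position]
            new_score = log_target(proposal, reference)
            delta = (new_score - scores[c]) / temperature
            if delta >= 0 or rng.random() < math.exp(delta):
                population[c], scores[c] = proposal, new_score

        # Exchange: propose swapping the states of two neighboring chains.
        c = rng.randrange(len(temperatures) - 1)
        delta = (scores[c + 1] - scores[c]) * (
            1 / temperatures[c] - 1 / temperatures[c + 1]
        )
        if delta >= 0 or rng.random() < math.exp(delta):
            population[c], population[c + 1] = population[c + 1], population[c]
            scores[c], scores[c + 1] = scores[c + 1], scores[c]

        # Keep track of the best state visited by any chain.
        for state, score in zip(population, scores):
            if score > best_score:
                best_state, best_score = list(state), score

    return best_state, best_score


reference = [1, 0, 1, 1, 0, 0, 1, 0]
best_state, best_score = evolutionary_mcmc(reference)
```

The hotter chains accept unfavorable moves more readily and pass promising states down to the coldest chain through exchange moves, which is what helps such samplers explore high-dimensional, multimodal target distributions.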

- **M. Böck**, S. Ogishima, H. Tanaka, S. Kramer, **L. Kaderali** (2012). *Hub-Centered Gene Network Reconstruction using Automatic Relevance Determination.* PLoS ONE, in press.
- **J. Mazur**, **L. Kaderali** (2011). *Bayesian Experimental Design for the Inference of Gene Regulatory Networks.* In: Proceedings of the Fifth International Workshop on Machine Learning in Systems Biology, Vienna, Austria, July 20-21, 2011. Stefan Kramer and Neil Lawrence (Eds.), 54-58.
- N. Radde, **L. Kaderali** (2010). *A Bayes regularized ODE Model for the Inference of Gene Regulatory Networks.* In: Das, Carayea, Hsu, Welch, Handbook of Research: Computation Methodologies in Gene Regulatory Networks. IGI-Global.
- **J. Mazur**, **D. Ritter**, G. Reinelt, **L. Kaderali** (2009). *Reconstructing Nonlinear Dynamic Models of Gene Regulation using Stochastic Sampling.* BMC Bioinformatics 10:448.
- **L. Kaderali**, E. Dazert, U. Zeuge, M. Frese, R. Bartenschlager (2009). *Reconstructing Signaling Pathways from RNAi Data using Probabilistic Boolean Threshold Networks.* Bioinformatics, 25(17), 2229-2235, doi:10.1093/bioinformatics/btp375.
- N. Radde, **L. Kaderali** (2009). *Inference of an Oscillating Model for the Yeast Cell Cycle.* Discrete Applied Mathematics 157, 2285-2295, doi:10.1016/j.dam.2008.06.036.
- **L. Kaderali**, N. Radde (2008). *Inferring Gene Regulatory Networks from Expression Data.* In: A. Kelemen, A. Abraham, Y. Chen (Eds.), Computational Intelligence in Bioinformatics. Studies in Computational Intelligence 94, Springer-Verlag, Heidelberg.
- **D. Ritter** (2008). *Machine Network Learning - Bayesian Inference of Gene Regulatory Networks with Differential Equations using Stochastic Simulation.* Master Thesis, Faculty of Mathematics and Computer Science, University of Heidelberg (with Prof. Dr. G. Reinelt).
- **M. Böck** (2008). *Bayesian learning of Boolean regulatory networks derived from expression data.* Diploma Thesis, Bioinformatics, Technical University of Munich (with Prof. Dr. S. Kramer).
- **G. Klingbeil** (2007). *Inference of Biochemical Networks using a Hybrid Approach.* Thesis, CUBIC Graduate Course in Bioinformatics, University of Cologne.
- N. Radde, **L. Kaderali** (2007). *Bayesian Inference of Gene Regulatory Networks using Gene Expression Time Series Data.* S. Hochreiter and R. Wagner (Eds.): BIRD 2007, LNBI Lecture Notes in Bioinformatics, 4414, 1-15.