Bioinformatics and Cheminformatics

Advancing the life sciences through machine learning and data mining

Bioinformatics has become essential to understanding what makes living creatures tick. In the 21st century, biology is a data-rich science in which algorithms matter.

Detailed topics: 


Proteomics

Proteins are the main functional units in a living cell. The ability to quickly and reliably quantify the different proteins present in a tissue is therefore a critical capability in biological and pharmaceutical research, and is increasingly adopted in healthcare. However, finding a reliable experimental protocol by hand is arduous, and interpreting the resulting measurements is difficult. In the InSPECtor project, we collaborate on algorithms to model, predict, optimize, and interpret proteomics experiments.

Drug-target interactions

By studying drug-target interactions, we can learn how small molecules bind to proteins. One possible application is drug repurposing, in which we try to find new and useful functions for existing drugs, thereby avoiding the clinical trials and costs involved in developing new drugs.

Gene function prediction

Gene function prediction, or functional genomics, is about predicting the functions of genes. A gene may have multiple functions at the same time, and biologists have organised these functions into hierarchies. This makes gene function prediction a key application of hierarchical multi-label classification (HMC) methods. We have implemented a number of these methods in our system Clus.
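To make the hierarchy constraint behind HMC concrete, here is a minimal Python sketch. It is not how Clus works internally. It only illustrates the usual convention that a gene annotated with a class is also annotated with all of that class's ancestors. The class identifiers and the `PARENTS` table are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy: each class maps to its direct parents.
# A class may have several parents, so this also covers DAG-shaped hierarchies.
PARENTS: Dict[str, Set[str]] = {
    "A": set(),
    "A.1": {"A"},
    "A.1.1": {"A.1"},
    "B": set(),
    "B.1": {"B"},
    "AB": {"A", "B"},  # a class with two parents
}


def with_ancestors(labels: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the label set closed under the 'parent of' relation."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in closed:
            closed.add(label)
            stack.extend(parents[label])
    return closed


def is_consistent(labels: Iterable[str], parents: Dict[str, Set[str]]) -> bool:
    """Check that every label comes together with all of its direct parents."""
    label_set = set(labels)
    return all(parents[label] <= label_set for label in label_set)


if __name__ == "__main__":
    gene_labels = with_ancestors({"A.1.1", "AB"}, PARENTS)
    print(sorted(gene_labels))
    print(is_consistent(gene_labels, PARENTS))
```

A prediction that assigns "A.1" without "A" would fail the consistency check, which is the kind of output a hierarchical multi-label method has to avoid.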

Network-based interpretation of unstructured gene lists

Omics experiments are commonly used in wet lab practice to identify leads involved in interesting phenotypes. These experiments often result in unstructured gene lists, which are challenging to interpret in terms of pathways or mode of action. To aid in the interpretation of such gene lists, we developed PheNetic, a decision-theoretic method based on ProbLog that exploits publicly available information, captured in a comprehensive interaction network, to obtain a mechanistic view of the listed genes. PheNetic selects from the interaction network the sub-networks highlighted by these gene lists. PheNetic is developed in collaboration with Dries De Maeyer and Prof. Kathleen Marchal.
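The idea of reasoning over an uncertain interaction network can be illustrated with a small Python sketch. It is neither PheNetic's algorithm nor ProbLog itself. It only mimics the semantics of probabilistic facts: each edge is present independently with its stated probability, and the probability of a query is the total weight of the possible worlds in which the query holds. The gene names, the edge probabilities, and the treatment of edges as undirected are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical toy network: each interaction exists with the given probability.
EDGES: Dict[Edge, float] = {
    ("geneA", "geneB"): 0.9,
    ("geneB", "geneC"): 0.5,
    ("geneA", "geneC"): 0.2,
}


def connected(present: List[Edge], source: str, target: str) -> bool:
    """Depth-first search over the undirected edges present in one world."""
    neighbours: Dict[str, List[str]] = {}
    for u, v in present:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def connection_probability(edges: Dict[Edge, float], source: str, target: str) -> float:
    """Sum the weights of all possible worlds in which source reaches target."""
    items = list(edges.items())
    total = 0.0
    # Exhaustive enumeration is exponential in the number of edges,
    # so this is only suitable for tiny illustrative networks.
    for world in product([True, False], repeat=len(items)):
        weight = 1.0
        present: List[Edge] = []
        for keep, (edge, prob) in zip(world, items):
            weight *= prob if keep else 1.0 - prob
            if keep:
                present.append(edge)
        if connected(present, source, target):
            total += weight
    return total


if __name__ == "__main__":
    print(connection_probability(EDGES, "geneA", "geneC"))
```

In this toy network, geneA reaches geneC either through the direct edge or through geneB, so the result equals 1 - (1 - 0.2) * (1 - 0.9 * 0.5) = 0.56. Real inference engines avoid the exhaustive enumeration used here.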

Software:

  • NSPDK: the Neighborhood Subgraph Pairwise Distance Kernel is a fast graph kernel with state-of-the-art QSAR generalization performance.
  • PIUS: Peptide Identification by Unbiased Search. PIUS searches for peptides in the six-frame translation of the complete genome.
  • ProbLog: ProbLog programs are logic programs in which some of the facts are annotated with probabilities. ProbLog is a tool that allows you to intuitively build programs that encode not only complex interactions between large sets of heterogeneous components but also the inherent uncertainties present in real-life situations.
  • kLog: a system for statistical relational learning, powered by graph kernels. It is developed in collaboration with Prof. Paolo Frasconi (Florence).
  • DMax Chemistry Assistant: a QSAR data mining system.
  • Clus: a predictive clustering system.
  • PMCSFG: Pairwise Maximum Common Subgraph Feature Generation.
  • CP-DT: Predicts the probability of tryptic cleavages in denatured proteins.
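
The six-frame translation mentioned in the PIUS entry can be sketched in a few lines of Python. This is an illustrative sketch, not PIUS code. It assumes the standard genetic code, a DNA string over A, C, G, and T, and hypothetical frame labels "+1" to "-3". Codons containing other characters are rendered as "X", and stop codons as "*".

```python
from itertools import product
from typing import Dict

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate three forward frames and three reverse-complement frames."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCC").items():
        print(name, protein)
```

Searching all six frames makes it possible to match peptides that lie outside annotated genes, at the cost of a much larger search space.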
Selected publications: