Bioinformatics has become essential to understanding what makes living creatures tick. In the 21st century, biology is a data-rich science where algorithms matter.
Proteins are the main functional units in a living cell. The ability to quickly and reliably quantify the different proteins present in a tissue is therefore a critical capability in biological and pharmaceutical research, and it is increasingly being adopted in healthcare. However, finding a reliable experimental protocol by hand is arduous, and interpreting the measurements is difficult. In the InSPECtor project, we collaborate on algorithms to model, predict, optimize, and interpret proteomics experiments.
By studying drug-target interactions, we can learn how small molecules bind to proteins. One possible application is drug repurposing, in which we try to find new and useful functions for existing drugs, thereby avoiding part of the clinical trials and costs involved in developing new drugs.
Gene function prediction
Gene function prediction, a central task in functional genomics, aims to predict the functions of genes. A gene may have multiple functions at the same time, and biologists have organised these functions into hierarchies. This makes gene function prediction a key application of hierarchical multi-label classification (HMC) methods. We have implemented a number of these methods in our system Clus.
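To make the hierarchical constraint concrete, here is a minimal sketch, assuming a hypothetical toy hierarchy of function classes. It shows two ideas common to HMC settings: annotations are closed upward (a gene with a specific function also has every ancestor function), and predicted class scores are kept consistent so that a child never scores higher than its parent. The class names, scores, and functions such as `propagate` and `enforce_consistency` are illustrative only; this is not how Clus is implemented or called.

```python
from typing import Dict, Optional, Set

# Hypothetical toy function hierarchy: child -> parent (None marks a top-level class).
PARENT: Dict[str, Optional[str]] = {
    "metabolism": None,
    "metabolism/amino_acid": "metabolism",
    "metabolism/amino_acid/biosynthesis": "metabolism/amino_acid",
    "transport": None,
}

def ancestors(label: str) -> Set[str]:
    """Return all ancestors of a class, excluding the class itself."""
    result: Set[str] = set()
    parent = PARENT[label]
    while parent is not None:
        result.add(parent)
        parent = PARENT[parent]
    return result

def propagate(labels: Set[str]) -> Set[str]:
    """Close a gene's annotations upward: each function implies its ancestors."""
    closed = set(labels)
    for label in labels:
        closed |= ancestors(label)
    return closed

def enforce_consistency(scores: Dict[str, float]) -> Dict[str, float]:
    """Cap each class score at its (already capped) parent score; missing scores count as 0."""
    fixed: Dict[str, float] = {}

    def capped(label: str) -> float:
        if label in fixed:
            return fixed[label]
        value = scores.get(label, 0.0)
        parent = PARENT[label]
        if parent is not None:
            value = min(value, capped(parent))
        fixed[label] = value
        return value

    for label in PARENT:
        capped(label)
    return fixed

if __name__ == "__main__":
    print(sorted(propagate({"metabolism/amino_acid/biosynthesis", "transport"})))
    print(enforce_consistency({
        "metabolism": 0.6,
        "metabolism/amino_acid": 0.8,
        "metabolism/amino_acid/biosynthesis": 0.7,
        "transport": 0.1,
    }))
```

Capping child scores at the parent's score is one simple way to obtain hierarchy-consistent output. Actual HMC methods may instead build the hierarchy into the learner itself.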
Network-based interpretation of unstructured gene lists
Omics experiments are commonly used in wet-lab practice to identify leads involved in interesting phenotypes. These experiments often produce unstructured gene lists, and interpreting such lists in terms of pathways or mode of action is challenging. To aid this interpretation, we developed PheNetic, a decision-theoretic method based on ProbLog that exploits publicly available information, captured in a comprehensive interaction network, to obtain a mechanistic view of the listed genes. PheNetic selects from this interaction network the sub-networks highlighted by the gene lists. PheNetic is developed in collaboration with Dries De Maeyer and Prof. Kathleen Marchal.
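The ProbLog foundation can be illustrated with a minimal sketch, assuming a hypothetical three-gene interaction network in which each interaction is an independent probabilistic fact. The probability of a query, here "is geneA connected to geneC?", is the total probability of all possible worlds (subsets of edges) in which the query holds. This brute-force enumeration only shows the semantics. It is not PheNetic's decision-theoretic sub-network selection, and ProbLog's actual inference relies on knowledge compilation rather than enumerating worlds. The gene names, probabilities, and functions are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical undirected interactions, each annotated with a probability.
EDGES: Dict[Edge, float] = {
    ("geneA", "geneB"): 0.9,
    ("geneB", "geneC"): 0.8,
    ("geneA", "geneC"): 0.3,
}

def connected(present: List[Edge], source: str, target: str) -> bool:
    """Depth-first reachability check using only the edges present in one world."""
    frontier, seen = [source], {source}
    while frontier:
        node = frontier.pop()
        if node == target:
            return True
        for u, v in present:
            for a, b in ((u, v), (v, u)):  # treat interactions as undirected
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return False

def query_probability(edges: Dict[Edge, float], source: str, target: str) -> float:
    """Sum the probabilities of all 2^n possible worlds in which source reaches target."""
    items = list(edges.items())
    total = 0.0
    for choices in product([True, False], repeat=len(items)):
        world_prob = 1.0
        present: List[Edge] = []
        for (edge, p), chosen in zip(items, choices):
            world_prob *= p if chosen else 1.0 - p
            if chosen:
                present.append(edge)
        if connected(present, source, target):
            total += world_prob
    return total

if __name__ == "__main__":
    print(round(query_probability(EDGES, "geneA", "geneC"), 3))  # prints 0.804
```

The result matches the closed form 1 - (1 - 0.3)(1 - 0.9 * 0.8) = 0.804, because the direct edge and the two-step path are independent. In ProbLog itself, each interaction would be a probabilistic fact such as `0.9::interacts(a, b).`, and the path relation would be a logic rule.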
- InSPECtor: An integrated informatics platform for mass spectrometry-based protein assays (2013-2017)
- SNIPER: Statistical Network-based Inference for Proteomics Experiment Reasoning (2013-2014)
- Natural and Artificial Genetic Variation in Microbes (2010-2017)
- Elaboration of the CellPhInDER platform (2011-2015)
- NSPDK: the Neighborhood Subgraph Pairwise Distance Kernel is a fast graph kernel with state-of-the-art QSAR generalization performance.
- PIUS: Peptide Identification by Unbiased Search. PIUS searches for peptides in the six-frame translation of the complete genome.
- ProbLog: ProbLog programs are logic programs in which some of the facts are annotated with probabilities. ProbLog lets you intuitively build programs that encode not only complex interactions between large sets of heterogeneous components but also the uncertainties inherent in real-life situations.
- kLog: a system for statistical relational learning, powered by graph kernels. It is developed in collaboration with Prof. Paolo Frasconi (Florence).
- DMax Chemistry Assistant: a QSAR data mining system.
- Clus: a predictive clustering system.
- PMCSFG: Pairwise Maximum Common Subgraph Feature Generation.
- CP-DT: Predicts the probability of tryptic cleavages in denatured proteins.
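Two preprocessing steps behind the PIUS and CP-DT entries can be sketched as follows. Six-frame translation translates the three reading frames of a DNA sequence and the three frames of its reverse complement. An in-silico tryptic digest then cuts the resulting protein sequences into peptides. The digest below assumes the common textbook rule (trypsin cleaves after K or R, but not before P) and is deterministic, whereas CP-DT predicts cleavage probabilities. The sketch is not the PIUS or CP-DT implementation, and the function names are illustrative.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate complete codons left to right; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame_translation(dna: str) -> List[str]:
    """Return the three forward-strand frames followed by the three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]

def tryptic_digest(protein: str) -> List[str]:
    """Cut after K or R unless the next residue is P; no missed cleavages are modelled."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

if __name__ == "__main__":
    for frame in six_frame_translation("ATGGCCAAACGTCCGTAA"):
        # Split at stop codons before digesting each stretch of amino acids.
        for segment in frame.split("*"):
            print(frame, "->", tryptic_digest(segment))
```

A fixed cleavage rule is the simplest assumption. Predicting per-site cleavage probabilities, as CP-DT does, makes it possible to rank candidate peptides rather than treat every rule-conforming site as certain.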
- Fannes, T., Vandermarliere, E., Schietgat, L., Degroeve, S., Martens, L., Ramon, J. (2013). Predicting tryptic cleavage from proteomics data using decision tree ensembles. Journal of Proteome Research, 12 (5), 2253-2259.
- De Maeyer, D., Cloots, L., Renkens, J., De Raedt, L., Marchal, K. (2013). PheNetic: Network-based interpretation of unstructured gene lists in E. coli. Molecular BioSystems, 9 (7), 1594-1603.
- Kelchtermans, P., Bittremieux, W., De Grave, K., Degroeve, S., Ramon, J., Laukens, K., Valkenborg, D., Barsnes, H., Martens, L. (2013). Machine learning applications in proteomics research: How the past can boost the future. Proteomics.
- Schietgat, L., Costa, F., Ramon, J., De Raedt, L. (2011). Effective feature construction by maximum common subgraph sampling. Machine Learning, 83 (2), 137-161.
- Costa, F., De Grave, K. (2010). Fast neighborhood subgraph pairwise distance kernel. Proceedings of the 27th International Conference on Machine Learning (ICML-2010), Haifa, Israel.
- De Grave, K., Costa, F. (2010). Molecular graph augmentation with rings and functional groups. Journal of Chemical Information and Modeling, 50 (9), 1660-1668.
- Schietgat, L., Vens, C., Struyf, J., Blockeel, H., Kocev, D., Dzeroski, S. (2010). Predicting gene function using hierarchical multi-label decision tree ensembles. BMC Bioinformatics, 11 (2).
- ElShal, S., Tranchevent, L.-C., Sifrim, A., Ardeshirdavani, A., Davis, J., Moreau, Y. (2015). Beegle: from literature mining to disease-gene discovery. Nucleic Acids Research.
- Popovic, D., Sifrim, A., Davis, J., Moreau, Y., De Moor, B. (2015). Problems with the nested granularity of feature domains in bioinformatics: the eXtasy case. BMC Bioinformatics.