| 09h00 |
Prof. dr. ir. Yves Moreau, Department of Electrical Engineering, K.U.Leuven
Candidate Gene Prioritization by Genomic Data Fusion
Despite significant advances in omics techniques, identifying the genes that cause rare genetic diseases and understanding the molecular networks underlying those diseases remain difficult. Gene prioritization attempts to integrate multiple, heterogeneous data sources to identify the candidate genes most likely to be associated with, or causative of, a disorder. Such strategies are useful both to support clinical genetic diagnosis and to speed up biological discovery. Genomic data fusion algorithms are rapidly maturing statistical and machine learning techniques that integrate complex, heterogeneous information (such as sequence similarity, interaction networks, expression data, annotation, or biomedical literature) towards prioritization, clustering, or prediction. Text mining is a particularly powerful methodology underlying genomic data fusion. We present a number of gene prioritization strategies, focusing on kernel methods and network analysis. We illustrate these approaches through several applications in genetic diagnosis and disease gene discovery. We also go beyond learning methods as such by addressing how such strategies can be embedded into the daily practice of geneticists, mostly through collaborative knowledge bases that integrate tightly with prioritization and network analysis methods.
|
| 09h55 |
Prof. dr. Ross D. King, Aberystwyth University, Wales, UK
Automating Biology using Robot Scientists
A Robot Scientist is a physically implemented robotic system that
applies techniques from artificial intelligence to execute cycles of
automated scientific experimentation. A Robot Scientist can
automatically execute cycles of hypothesis formation, selection of
efficient experiments to discriminate between hypotheses, execution of
experiments using laboratory automation equipment, and analysis of
results. We have developed the Robot Scientist “Adam” to investigate
yeast (Saccharomyces cerevisiae) functional genomics. Adam has
autonomously identified genes encoding locally “orphan” enzymes in
yeast. This is the first time a machine has discovered novel scientific
knowledge. To describe Adam's research we have developed an ontology and
logical language. Use of these produced a formal argument involving over
10,000 different research units that relates Adam's 6.6 million biomass
measurements to its conclusions. We are now developing the Robot
Scientist “Eve” to automate drug screening and QSAR development.
|
| 10h50 | Coffee |
| 11h10 |
Prof. dr. Andrea Passerini, Università degli Studi di Trento, Italy
Frankenstein Junior: a Relational Learning Approach toward Protein Engineering
Protein engineering is the process of developing novel proteins with
useful functions. Rational design aims at exploiting available
knowledge to suggest promising mutations to be later verified by
site-directed mutagenesis. Machine learning techniques have been
extensively employed for predicting characteristics of proteins
(e.g. stability) from sequence information. A naive approach to
protein engineering consists of trying all possible mutations of a
certain sequence and evaluating each resulting mutant by these
predictors. However, this approach is computationally infeasible when
multiple mutations have to be jointly evaluated.
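To give a sense of scale for the naive enumeration described above, the following sketch counts the mutants that differ from a given sequence at exactly k positions. It assumes the 20 standard amino acids, so 19 alternatives per position; the function name `count_mutants` and the sequence length of 100 are illustrative assumptions, not taken from the talk.

```python
from math import comb

# Assumption: 20 standard amino acids, minus the wild-type residue at each site.
ALTERNATIVES_PER_SITE = 19


def count_mutants(seq_len: int, k: int) -> int:
    """Number of sequences differing from the wild type at exactly k positions."""
    # Choose which k positions mutate, then one of 19 substitutions for each.
    return comb(seq_len, k) * ALTERNATIVES_PER_SITE ** k


if __name__ == "__main__":
    # Hypothetical 100-residue protein, one to four simultaneous mutations.
    for k in range(1, 5):
        print(k, count_mutants(100, k))
```

For a 100-residue sequence this yields 1,900 single mutants but 1,786,950 double mutants, and the count keeps growing by orders of magnitude with each additional joint mutation.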
We propose a simple relational learning approach for protein engineering. First, we learn a set of relational rules from mutation data; then we use them to generate a set of candidate mutations that are most likely to improve protein function, e.g. by conferring resistance to a certain inhibitor or improving activity on a specific substrate. Encouraging preliminary results were obtained in predicting HIV drug resistance mutations. We will discuss the potential and the limitations of the approach and suggest some directions for future research. |
| 12h05 |
Prof. dr. Manfred Jaeger, Aalborg Universitet, Denmark
Factorial Clustering with an Application to Plant Distribution Data
We propose a latent variable approach for multiple clustering of
categorical data. We use logistic regression models for the conditional
distribution of observable features given the latent cluster variables.
This model supports an interpretation of the different clusterings as
representing distinct, independent factors that determine the distribution
of the observed features. We apply the model to the analysis of
plant distribution data, where multiple clusterings are of interest
for identifying the major underlying factors that determine
the vegetation in a geographical region.
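One way a model of this kind could be written down is sketched below. This is an assumption for illustration, not the speaker's actual formulation: it takes the observed features $X_1, \dots, X_m$ to be binary (for example, presence or absence of a plant species), lets $Z_1, \dots, Z_K$ be independent latent cluster variables (one per clustering), and uses hypothetical parameters $\beta$ with $\sigma$ the logistic function.

```latex
% Assumed form: logistic regression of each feature on the latent cluster assignments
P(X_j = 1 \mid Z_1 = z_1, \dots, Z_K = z_K)
  = \sigma\Bigl(\beta_{j,0} + \sum_{k=1}^{K} \beta_{j,k,z_k}\Bigr),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}

% Marginal likelihood of an observation x = (x_1, \dots, x_m),
% with the latent factors independent a priori
P(x) = \sum_{z_1, \dots, z_K} \; \prod_{k=1}^{K} P(Z_k = z_k)
       \prod_{j=1}^{m} P(X_j = x_j \mid z_1, \dots, z_K)
```

Under this reading, each clustering contributes its own additive term to the log-odds of a feature, which is what allows the clusterings to be interpreted as separate factors.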
|
| 13h00 | Sandwich lunch |
| 14h00 |
Public PhD defense of Kurt De Grave, Department of Computer Science, K.U.Leuven
Predictive Quantitative Structure-Activity Relationship Models and their use for the Efficient Screening of Molecules
We explore two avenues where machine learning can help drug discovery:
predictive models of in vivo or in vitro effects of molecules (known as
Quantitative Structure-Activity Relationship or QSAR models), and the selection
of efficient experiments based on such models.
In the first part, we present methods to improve the predictive power of graph
kernel based molecule classifiers. The bias of existing graph kernels can be
improved by augmenting atom-bond graphs with functional groups. This novel
representation allows a machine learning algorithm to use both high-level
functional and low-level atomic information, without any change to the kernel or
learning algorithm. In internal validation tests, we observe consistently higher
AUROCs for all tested kernels.
We also introduce a novel, efficient graph kernel called the Neighborhood
Subgraph Pairwise Distance Kernel. The feature space of this kernel is the space
of pairs of topological balls and the interpair distance. Using this kernel, a
standard support vector machine outperforms existing methods in the prediction
of all investigated target properties: mutagenicity, in vivo toxicity, antiviral
activity, and cancer suppression.
In the second part, we tackle the problem of efficient experimentation in drug
discovery using optimization assisted by a learned surrogate model and we
evaluate different experiment selection strategies. The algorithm is extended to
accommodate drug discovery needs, such as the selection of many parallel
experiments. The algorithm is integrated in an automated drug discovery
platform, the robot scientist Eve. It is also applied to the optimization of the
design of nanofiltration membranes.
The candidate gives a 40-minute presentation in Dutch, with English slides, followed by an examination and a deliberation by the jury. |
| 15h45 | Reception |

The reception, lunch and coffee break take place in the thermodynamics museum adjacent to the auditorium.
Download the Ph.D. thesis (PDF, 6.5 MB).
There will be complimentary paperbacks available at the defense.
You can also order a paperback from Amazon.com.
