SIM 2009, Joint Conference: SRL ILP MLG

ILP

Oral Presentation

Grant Anderson and Bernhard Pfahringer
"Relational Random Forests based on Random Relational Rules"

Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, R4F, for generating Random Forests over relational data. R4F employs randomly generated relational rules as fully self-contained Boolean tests inside each node in a tree and thus can be viewed as an instance of dynamic propositionalization. The implementation of R4F allows for the simultaneous or parallel growth of all the branches of all the trees in the ensemble in an efficient shared, but still single-threaded way. Experiments favorably compare R4F to both FORF and the combination of static propositionalization together with standard Random Forests. Various strategies for tree initialization and splitting of nodes, as well as resulting ensemble size, diversity, and computational complexity of R4F are also investigated. A longer version of this paper has been accepted by IJCAI2009.

Joris Gillis and Jan Van den Bussche
"Induction of Relational Algebra Expressions"

We consider the induction of relational algebra expressions from examples consisting of a relational database and an output relation. This problem fits perfectly in the ILP context but has not been investigated in specific detail. We discuss the importance of negation (difference, complementation, universal quantification), propose a new heuristic to take complementation into account, and propose the use of cylindrical set algebra as a more flexible form for searching expressions. We present some modest experimental results which nevertheless show that our approach allows the induction of typical relational database queries involving universal quantification, such as Codd's relational division operator.

Tamas Horvath, Gerhard Paass, Frank Reichartz and Stefan Wrobel
"A Logic-Based Approach to Relation Extraction from Texts"

In recent years, text mining has moved far beyond the classical problem of text classification with an increased interest in more sophisticated processing of large text corpora, such as, for example, evaluations of complex queries. This and several other tasks are based on the essential step of relation extraction. This problem becomes a typical application of learning logic programs by considering the dependency trees of sentences as relational structures and examples of the target relation as ground facts of a target predicate. In this way, each example is represented by a definite first-order Horn-clause. We show that Plotkin's LGG operator can effectively be applied to such clauses and propose a simple and effective divide-and-conquer algorithm for listing a certain set of LGGs. We use these LGGs as binary features and obtain the final hypothesis by applying SVM to the corresponding feature vectors. Empirical results on the ACE--2003 benchmark dataset indicate that the performance of our approach is comparable with state-of-the-art kernel methods.

Katsumi Inoue, Koichi Furukawa and Ikuo Kobayashi
"Abducing Rules with Predicate Invention"

This paper addresses discovery of unknown relations from incomplete network data using abduction. Given a network information such as causal relations and metabolic pathways, we want to infer missing links and nodes in the network to account for observations. This is implemented in SOLAR, an automated deduction system for consequence finding, using first-order representation of algebraic relations and full-clausal ground formulas of network information. Abduction by SOLAR is powerful enough to infer unknown rules and to realize predicate invention by inferring unknown causes. In particular, we point out the importance of existentially quantified formulas to express hypotheses including new variables representing missing nodes to be introduced. We also show an application of rule abduction to discover knack in performing skillful tasks.

Ondřej Kuželka and Filip Zelezny
"Block-Wise Construction of Acyclic Relational Features with Monotone Relevancy"

We describe an algorithm for constructing a set of acyclic conjunctive relational features by combining smaller conjunctive blocks. Unlike traditional level-wise approaches which preserve the monotonicity of frequency, our block-wise approach preserves a form of monotonicity of relevancy, which is important in propositionalization employed in the context of classification learning. With pruning based on this property, our block-wise approach efficiently scales to features including tens of first-order literals, far beyond the reach of state-of-the art propositionalization or inductive logic programming systems.

Jens Lehmann and Christophe Haase
"Ideal Downward Refinement in the EL Description Logic"

With the proliferation of the Semantic Web, there has been a rapidly rising interest in description logics, which form the logical foundation of the W3C standard ontology language OWL. While the number of OWL knowledge bases grows, there is an increasing demand for tools assisting knowledge engineers in building up and maintaining their structure. For this purpose, concept learning algorithms based on refinement operators have been investigated. In this paper, we provide an ideal refinement operator for the popular description logic E L and show that it is computationally feasible on large knowledge bases.

Marco Lippi, Manfred Jaeger, Paolo Frasconi and Andrea Passerini
"Relational Information Gain"

We introduce relational information gain, a refinement scoring function measuring the informativeness of newly introduced variables. The gain can be interpreted as a conditional entropy in a well-defined sense and can be efficiently approximately computed. In conjunction with simple greedy general-to-specific search algorithms such as FOIL, it yields an efficient and competitive algorithm in terms of predictive accuracy, compactness of the learned theory, and ability to learn a good structure for probabilistic logical models.

Wannes Meert, Jan Struyf and Hendrik Blockeel
"CP-Logic Theory Inference with Contextual Variable Elimination and Comparison to BDD Based Inference Methods"

There is a growing interest in languages that combine probabilistic models with logic to represent complex domains involving uncertainty. Causal Probabilistic Logic (CP-logic) is such a probabilistic logic language, which has been designed to model causal processes. This paper investigates inference algorithms for CP-logic; these are crucial for developing learning algorithms. It proposes a new CP-logic inference method based on contextual variable elimination and compares this method to variable elimination and to inference methods based on binary decision diagrams.

Lilyana Mihalkova and Matthew Richardson
"Speeding up Inference in Statistical Relational Learning by Clustering Similar Query Literals"

Markov logic networks (MLNs) have been successfully applied to several challenging problems by taking a "programming language" approach where a set of formulas is hand-coded and weights are learned from data. Because inference plays an important role in this process, "programming" with an MLN would be significantly facilitated by speeding up inference. We present a new meta-inference algorithm that exploits the repeated structure frequently present in relational domains to speed up existing inference techniques. Our approach first clusters the query literals and then performs full inference for only one representative from each cluster. The clustering step incurs only a one-time up-front cost when weights are learned over a fixed structure.

Stephen Muggleton, Jose Santos and Alireza Tamaddoni-Nezhad
"Towards a System Based on Asymmetric Relative Minimal Generalisation"

Over the last decade Inductive Logic Programming systems have been dominated by use of top-down refinement search techniques. In this paper we re-examine the use of bottom-up approaches to the construction of logic programs. In particular, we explore an asymmetric variant of Plotkin's Relative Least General Generalisation (RLGG) which is based on subsumption relative to a bottom clause. With Plotkin's RLGG, clause length grows exponentially in the number of examples. By contrast, in the Golem system, the length of ij-determinate RLGG clauses were shown to be polynomially bounded for given values of i and j. However, the determinacy restrictions made Golem inapplicable in many key application areas, including the learning of chemical properties from atom and bond descriptions. In this paper we show that with Asymmetric Relative Minimal Generalisations (or ARMGs) relative to a bottom clause, clause length is bounded by the length of the initial bottom clause. ARMGs , therefore do not need the determinacy restrictions used in Golem but still have a polynomial-time construction. An algorithm is described for constructing ARMGs together with some initial results on the clause lengths for some well-known ILP applications. In a longer follow-up paper we aim to extend the analysis and implementation of a system called ProGolem which combines bottom-clause construction in Progol with a Golem control strategy for constructing multiple clause predicate definitions.

Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid Keirouz and David Page
"An ILP Approach to Model and Classify Hexose Binding Sites"

Hexoses play a key role in many cellular pathways and in the regulation of development and disease mechanisms. As such, hexose binding proteins and their binding properties are of great interest to biomedical researchers. Current protein-sugar computational models are based, at least partially, on prior biochemical knowledge. We investigate the empirical support for biochemical findings by comparing ILP-induced rules to actual biochemical results. Our method matches lab findings, is able to identify hexose binding sites, and in addition reveals a Trp-Glu dependency. Our classifier achieves a similar accuracy as other black-box classifiers while providing insight into the discriminating process.

Louis Oliphant, Elizabeth Burnside and Jude Shavlik
"Boosting First-Order Clauses for Large, Skewed Data sets"

Creating an effective ensemble of clauses for large, skewed data sets requires finding a diverse, high-scoring set of clauses and then combining them in such a way as to maximize predictive accuracy. We have adapted the RankBoost algorithm in order to maximize area under the recall-precision curve, a much better metric when working with highly skewed data sets than ROC curves. We have also explored a range of possibilities for the weak hypotheses used by our modified RankBoost algorithm beyond using individual clauses. We provide results on four large, skewed data sets showing that our modified RankBoost algorithm outperforms the original on area under the recall-precision curves.

Taisuke Sato, Masakazu Ishihata and Katsumi Inoue
"Constraint-based Probabilistic Modeling for Statistical Abduction"

We introduce a new framework for logic-based probabilistic modeling called constraint-based probabilistic modeling which defines conditional distributions P(. | KB) over independent propositional variables constrained by a first-order knowledge base KB. We prove generative models such as PCFGs and discriminative models such as CRFs have an equivalent CBPM (constraint-based probabilistic model) as long as they are discrete. We apply this generality to abductive reasoning with KB allowing cyclic dependencies and derive an EM algorithm for the parameter learning of CBPMs.

Lisa Torrey and Jude Shavlik
"Policy Transfer via Markov Logic Networks"

We propose an improved method for transfer in reinforcement learning via Markov Logic Networks. Our goal is to extract relational knowledge from a source task and use it to speed up learning in a related target task. We do so by learning a Markov Logic Network that describes the source-task policy, and then using it for decision making in the early learning stages of the target task. Through experiments in the RoboCup simulated-soccer domain, we show that this approach performs better than a previous MLN transfer method that modeled a value function rather than a policy.

Poster Presentation

Elma Akand, Michael Bain and Mark Temple
"Learning Responsive Cellular Betworks by Integrating Genomics and Proteomics Data"

Systems biology is an important application area for ILP. We investigate learning networks of activity in response to stress in yeast cells. Diverse heterogeneous empirical data is used rather than curated knowledge sources. We find that although this representation is restricted and predictive accuracy is low this can still lead to useful outputs.

Kamal Ali, Kevin Leung, Tolga Konik, Dongkyu Choi and Dan Shapiro
"Knowledge-Directed Theory Revision"

Using domain knowledge to speed up learning is widely accepted but theory revision of such knowledge continues to use general syntactic operators. Using such operators for theory revision of teleoreactive logic programs is especially expensive since such programs are often used to control games in which proof of a top-level goal involves playing a game. In such contexts, one should have the option to complement general theory revision with domain-specific knowledge. We have found it easy to extract and express such knowledge. Using American football as an example, we use ICARUS’s multiplayer teleoreactive logic programming ability to encode a coach agent which infers faults during a game and at its conclusion applies procedural attachments to fix the programs of other agents. Our results show effective learning using as few as twenty training examples. We also show that structural changes made by such revision can produce performance gains that cannot be matched using algorithms that do not make structural changes.

Laura-Andreea Antanas, Ingo Thon, Martijn van Otterlo, Niels Landwehr and Luc De Raedt
"Probabilistic Logical Sequence Models for Video"

Understanding complex dynamic scenes of real-world activities from low-level sensor data is of central importance for intelligent systems. The main difficulty lies in the fact that complex scenes are best described in high-level, logical formalisms, while sensor data usually consists of many low-level features. We first propose a method to obtain a logical representation of real-world dynamic scenes based on input video stream solely. We focus on representing the video data using probabilistic relational sequences as a natural way to incorporate sensor uncertainty. They allow us to work with structured terms, and in addition they capture the inherent uncertainty of object detection. Further on, we employ r-grams as the probabilistic logical learning model for this application. In a first step we use r-grams in a simple setting and we show their viability in card games. We then show how r-grams can be upgraded to deal with uncertain observations.

Jianzhong Chen and Stephen Muggleton
"Decision-Theoretic Logic Programs"

We propose a new Probabilistic ILP (PILP) framework, Decision-theoretic Logic Programs (DTLPs), in the paper. DTLPs extend PILP models by integrating decision-making features developed in Statistical Decision Theory area. Both decision-theoretic knowledge (e.g. utilities) and probabilistic knowledge (e.g. probabilities) can be represented and dealt with in DTLPs. An implementation of DTLPs using Stochastic Logic Programs (SLPs) is introduced and a DTLP parameter learning algorithm is discussed accordingly. The representation and methods are tested by performing regression on the traditional mutagenesis dataset.

Anton Dries and Luc De Raedt
"Towards Clausal Discovery for Stream Mining"

With the increasing popularity of data streams it has become time to adapt logical and relational learning techniques for dealing with streams. In this note, we present our preliminary results on upgrading the clausal discovery paradigm towards the mining of streams. In this setting, there is a stream of interpretations and the goal is to learn a clausal theory that is satisfied by these interpretations. Furthermore, in data streams the interpretations can be read (and processed) only once.

Daan Fierens
"On the Relationship between Logical Bayesian Networks and Probabilistic Logic Programming based on the Distribution Semantics"

A significant part of current research on ILP deals with probabilistic logical models. Over the last decade many logics or languages for representing such models have been introduced. There is currently a great need for insight into the relationships between all these languages. One class of languages are those that extend probabilistic models with elements of logic, such as in the language of Logical Bayesian Networks (LBNs). Some other languages follow the converse strategy of extending logic programs with a probabilistic semantics, often in a way similar to that of Sato's distribution semantics. In this paper we study the relationship between the language of LBNs and languages based on the distribution semantics. Concretely, we define a mapping from LBNs to theories in the Independent Choice Logic (ICL). We also show that this mapping provides us with a learning algorithm for ICL.

Beatriz García Jiménez, Agapito Ledezma Espino and Araceli Sanchis de Miguel
"Modular Multi-Relational Framework for Gene Group Function Prediction"

Determining the functions of genes is essential for understanding how the metabolisms work, and for trying to solve their malfunctions. Genes usually work in groups rather than isolated, so functions should be assigned to gene groups and not to individual genes. Moreover, the genetic knowledge has many relations and is very frequently changeable. Thus, a propositional ad-hoc approach is not appropriate to deal with the gene group function prediction domain. We propose the Modular Multi-Relational Framework (MMRF), which faces the problem from a relational and flexible point of view. The MMRF consists of several modules covering all involved domain tasks (grouping, representing and learning using computational prediction techniques). A specific application is described, including a relational representation language, where each module of MMRF is individually instantiated and refined for obtaining a prediction under specific given conditions.

Frank Hoonakker, Nicolas Lachiche, Alexandre Varnek and Alain Wagner
"Condensed Graph of Reaction: Considering a Chemical Reaction As one Single Pseudo Molecule"

Chemical reactions always involve several molecules of two types – reactants and products. On the other hand, Quantitative Structure Activity Relationship (QSAR) methods are used to predict physic-chemical or biological properties of individual molecules. In this article, we propose to use Condensed Graph of Reaction (CGR) approach merging all molecules involved in a reaction into one molecular graph. This allows one to consider reactions as pseudo-molecules and to develop QSAR models based on fragment descriptors. Here Substructure Molecular Fragment descriptors calculated from CGRs have been used to build quantitative models for the rate constant of SN2 reactions in water. Three common attribute-value regression algorithms (linear regression, support vector machine, and regression trees) have been evaluated.

Srihari Kalgi, Chirag Gosar, Ganesh Ramakrishnan and Ashwin Srinivasan
"Application of Theory of Optimal Search to ILP"

Most ILP systems attempt to find a hypothesis by searching through a space of possible alternatives, the principal problem is that the size of the search space makes it impossible to perform even a complete (but non-exhaustive) search. Hence, greedy approaches such as FOIL and restricted variants of branch and bound techniques (Aleph), etc. are used to restrict the search space. In this paper, we will concern ourselves principally with systems like Progol, in which search proceeds by examining subspaces constrained by a set of most-specific clause ⊥1, ⊥2, ..., ⊥n (where n here will be the number of positive examples) using a greedy covering procedure. In each iteration of the greedy procedure a Progol-like system selects a positive example ei, construct a bottom clause ⊥i using the example and the background knowledge B and then searches for the best clause hi in a lattice lat(⊥i) subsuming ⊥i. Here "best" is in terms of some objective function (like pos-neg). Once this best clause hi in the lattice lat(⊥i) is found, all positive examples made redundant by B ∧ hi are removed from further consideration; and the search continues. A number of questions still remain, for example: How is the ei to be selected? How much effort should we expend on any one search? Current ILP systems answer these in ad hoc ways. It is our aim to show how a theory of optimal search can be applied to answer such questions.

Hassan Khosravi, Oliver Schulte and Bahareh Bina
"Virtual Joins With Nonexistent Links"

Many approaches to multi-relational learning require the computation of database frequencies in the presence of nonexistent links. The corresponding ILP problem is to compute the number of groundings that satisfy a given conjunction of literals in a relational database, where one or more of the literals is negated. We present a fast new dynamic programming algorithm for this problem. The database table joins performed by our algorithm are restricted to joins of tables already existing in the database. Evaluation on three data sets confirms the efficiency of our algorithm; computing frequencies for negated literals added about 15% to the cost of computing frequencies for positive literals only.

Tolga Konik, Negin Nejati and Ugur Kuter
"Inductive Generalization of Analytically Learned Goal Hierarchies"

We describe a new approach for learning procedural knowledge represented as Teleoreactive Logic Programs (TLPs) using relational behavior traces as input. A TLP organizes task decomposition skills hierarchically and attaches them explicitly defined goals. Our approach integrates analytical learning with inductive generalization in order to learn these skills. The analytical component predicts the goal dependencies given a successful behavior trace and produces the structure of the skill hierarchy. The inductive component generates successful and failed traces using those skills and refines their applicability conditions with an ILP component. We claim that explicit goal representation of TLPs allows the learning system to determine positive and negative examples used in learning. We claim that our approach uses less expert effort compared to a purely inductive approach and may perform better compared to a purely analytical approach.

Julien Lesbegueries, Agnès Braud and Nicolas Lachiche
"A Propositionalisation that Preserves More Continuous Attribute Domains"

The aim of this paper is to extend local transformation functions for propositionalisation. We follow the works of (Krogel et al., 2001) by adding a transformation function that reverses the thresholding problem. Indeed, we propose an innovative method that manages numeric attributes with the hypothesis that features can be transformed without use of discretisation nor aggregation.

Francesca Alessandra Lisi and Floriana Esposito
"Nonmonotonic Onto-Relational Learning"

In this paper we carry on the work on Onto-Relational Learning by investigating the impact of having disjunctive Datalog with default negation either in the language of hypotheses or in the language for the background theory. The inclusion of nonmonotonic features strengthens the ability of our ILP framework to deal with incomplete knowledge by performing some form of commonsense reasoning. One such ability can turn out to be useful in application domains, such as the Semantic Web, that require that kind of reasoning.

Stephen Muggleton, Aline Paes, Vítor Santos Costa and Gerson Zaverucha
"Chess Revision: Acquiring the Rules of Chess Variants through FOL Theory Revision from Examples"

The game of chess has been a major testbed for research in artificial intelligence since it requires focus on intelligent reasoning. Particularly, several challenges arise to machine learning systems when inducing a model describing the rules of chess, including the creation and definition of the examples, the learning of a model which correctly represents the official rules of the game, covering all the branches and restrictions of the correct moves, and the comprehensibility of such a model. Besides, the game of chess has inspired the creation of numerous variants, ranging from faster to more challenging or to regional versions of the game. The question arises if it is possible to take advantage of an initial classifier of chess as a starting point to obtain classifiers for the different variants. We approach this problem as an instance of theory revision from examples. The initial classifier of chess is inspired by a FOL theory approved by a chess expert and the examples are defined as sequences of moves within a game. Starting from a standard revision system, we argue that abduction and negation are also required to best address this problem. Experimental results show the effectiveness of our approach.

Sriraam Natarajan, Gautam Kunapuli, Ciaran O'Reilly, Richard Maclin, Trevor Walker, David Page and Jude Shavlik
"ILP for Bootstrapped Learning: A Layered Approach to Automating the ILP Set-Up Problem"

This paper introduces a new type of application for ILP called Bootstrapped Learning (BL). BL brings several challenges to ILP, including the need to automate the “ILP Set-Up” problem; small numbers of examples, in some cases no negative examples; and the need to “bootstrap” or to automatically base learning in part on the results of earlier learning sessions. The paper introduces BL, discusses the general challenges it raises for ILP, and presents initial results in ILP for BL.

Niels Pahlavi and Stephen Muggleton
"Higher-order Logic Learning"

This paper presents Higher-order Logic Learning (HOLL), which consists of generalizing logic-based Machine Learning, and more particularly Inductive Logic Programming (ILP), from the first-order to the higher-order context. The power of expressivity of Higher-order Logic (HOL) is used to improve significantly the learning capacity and efficiency of logic-based Machine Learning -- by both allowing to learn new tasks but also to perform better than first-order Machine Learning systems on some already learnable problems. We describe a Higher-order ILP system, called λProgol, adapting the ILP system Progol and based on the HOL formalism λProlog, along with its first implementation and promising results on motivating worked examples. We intend to extend the implementation, tests and evaluation of λProgol further, and to develop a theory of HOLL.

Anup Patel, Ganesh Ramakrishnan and Pushpak Bhattacharya
"Incorporating Linguistic Expertise Using ILP for Named Entity Recognition in Data Hungry Indian Languages"

Developing linguistically sound and data-compliant rules for named entity annotation is usually an intensive and time consuming process for any developer or linguist. In this work, we present the use of two Inductive Logic Programming (ILP) techniques to construct rules for extracting instances of various named entity classes thereby reducing the efforts of a linguist/developer. Using ILP for rule development not only reduces the amount of effort required but also provides an interactive framework wherein a linguist can incorporate his intuition about named entities such as in form of mode declarations for refinements (suitably exposed for ease of use by the linguist) and the background knowledge (in the form of linguistic resources). We have a small amount of tagged data - approximately 3884 sentences for Marathi and 22748 sentences in Hindi. The paucity of tagged data for Indian languages makes manual development of rules more challenging, However, the ability to fold in background knowledge and domain expertise in ILP techniques comes to our rescue and we have been able to develop rules that are mostly linguistically sound that yield results comparable to rules hand-crafted by linguists. The ILP approach has two advantages over the approach of hand-crafting all rules: (i) the development time reduces by a factor of 240 when ILP is used instead of involving a linguist for the entire rule development and (ii) the ILP technique has the computational edge that it has a complete and consistent view of all significant patterns in the data at the level of abstraction specified through the mode declarations. The point (ii) enables the discovery of rules that could be missed out by the linguist and also makes it possible to scale the rule development to a larger training dataset. The rules thus developed could be optionally edited by linguistic experts and consolidated either (a) through default ordering (as in TILDE[1]) or (b) with an ordering induced using [2] or (c) by using the rules as features in a statistical graphical model such a conditional random field (CRF) [3]. We report results using WARMR [4] and TILDE to learn rules for named entities of Indian languages namely Hindi and Marathi.

Scott Proper and Prasad Tadepalli
"Transfer Learning via Relational Templates"

Transfer Learning refers to learning of knowledge in one domain that can be applied to a different domain. In this paper, we view transfer learning as generalization of knowledge in a richer representation language that includes multiple subdomains as parts of the same superdomain. We employ relational templates of different specificity to learn pieces of additive value functions. We show significant transfer of learned knowledge across different subdomains of a real-time strategy game by generalizing the value function using relational templates.

Oliver Ray, Ken Whelan and Ross King
"Automatic Revision of Metabolic Networks through Logical Analysis of Experimental Data"

This paper presents a nonmonotonic ILP approach for the automatic revision of metabolic networks through the logical analysis of experimental data. The method extends previous work in two respects: by suggesting revisions that involve both the addition and removal of information; and by suggesting revisions that involve a combination of gene function, enzyme inhibition and metabolic reactions. Our proposal is based on a new declarative theory of metabolism expressed in a nonmonotonic logic programming formalism. With respect to this theory, a mixture of abductive and inductive inference is used to compute a set of minimal revisions needed to make a given network consistent with some observed data. In this way, we describe how a hybrid reasoning system was able to usefully revise a state-of-the-art metabolic network in order to better account for real-world experimental data acquired by an autonomous laboratory platform known as the Robot Scientist.

Lothar Richter, Regina Augustin and Stefan Kramer
"Finding Relational Associations in HIV Resistance Mutation Data"

HIV therapy optimization is a hard task due to rapidly evolving mutations leading to drug resistance. Over the past five years, several machine learning approaches have been developed for decision support, mostly to predict therapy failure from the genotypic sequence of viral proteins and additional factors. In this paper, we define a relational representation for an important part of the data, namely the sequences of a viral protein (reverse transcriptase), their mutations, and the drug resistance(s) associated with those mutations. The data were retrieved from the Los Alamos National Laboratories' (LANL) HIV databases. In contrast to existing work in this area, we do not aim directly for predictive modeling, but take one step back and apply descriptive mining methods to develop a better understanding of the correlations and associations between mutations and resistances. In our particular application, we use the Warmr algorithm to detect non-trivial patterns connecting mutations and resistances. Our findings suggest that well-known facts can be rediscovered, but also hint at the potential of discovering yet unknown associations.

Hannes Schulz, Kristian Kersting and Andreas Karwath
"ILP, the Blind, and the Elephant: Euclidean Embedding of Co-Proven Queries"

Relational data is complex. This complexity makes one of the basic steps of ILP difficult for the untrained user: understanding the data. And if the user cannot easily understand it, she gets frustrated soon or draws incomplete conclusions. The situation is very much as in the parable of the blind men and the elephant that appears in many cultures. In this tale the blind work independently and with quite different pieces of information, thereby drawing very different conclusions about the nature of the beast. In contrast, visual representations make it easy to shift from one perspective to another while exploring and analyzing data. This paper describes a method for embedding interpretations and queries into a single, common Euclidean space based on their co-proven statistics. We demonstrate our method on real-world datasets showing that ILP results can indeed be captured at a glance.

Ashwin Srinivasan and Ganesh Ramakrishnan
"Parameter Screening and Optimisation for ILP using Designed Experiments"

Reports of experiments conducted with an Inductive Logic Programming system rarely describe how specific values of hyperparameters of the system are arrived at when constructing models. Usually, no attempt is made to identify sensitive hyperparameters, and those that are used are often given "factory-supplied" default values, or values obtained from some non-systematic exploratory analysis. The immediate consequence of this is, of course, that it is not clear if better models could have been obtained if some form of hyperparameter selection and optimisation had been performed. Questions follow inevitably on the experiments themselves: specifically, are all algorithms being treated fairly, and is the exploratory phase sufficiently well-defined to allow the experiments to be replicated? In this paper, we investigate the use of hyperparameter selection and optimisation techniques grouped under the study of experimental design. Screening and response surface methods determine, in turn, sensitive hyperparameters and good values for these hyperparameters. Screening is done here by constructing a stepwise regression model relating the utility of an ILP system's hypothesis to its input hyperparameters, using systematic combinations of values of input hyperparameters (technically speaking, we use a two-level fractional factorial design of the input hyperparameters). The hyperparameters used by the regression model are taken to be the sensitive hyperparameters for the system for that application. We then seek an assignment of values to these sensitive hyperparameters that maximise the utility of the ILP model. This is done using the technique of constructing a local "response surface". The hyperparameters are then changed following the path of steepest ascent until a locally optimal value is reached. This combined use of hyperparameter selection and response surface-driven optimisation has a long history of application in industrial engineering, and its role in ILP is investigated using two well-known benchmarks. The results suggest that computational overheads from this preliminary phase are not substantial, and that much can be gained, both on improving system performance and on enabling controlled experimentation, by adopting well-established procedures such as the ones proposed here.

Gabriel Synnaeve, Andrei Doncescu and Katsumi Inoue
"Kinetic Models for Logic-Based Hypothesis Finding in Metabolic Pathways"

In this paper we propose a logical model of Glycolysis and Pentose phosphate pathways of E.Coli that allows the analysis of the dynamical response of a biological system perturbed by a pulse of glucose. Our goal is to give a better comprehension of the physiological state of the cell from a better interpretation of the interactions between metabolic and signaling networks. Starting with the discretization of concentrations of some metabolites considered in steady state, we fully explain our approach to build a symbolic model applied to kinetics. Finally, hypothesis finding produces logical formulas on the metabolites through abduction which cannot be measured during dynamical state.

Ingo Thon
"Don’t Fear Optimality: Sampling for Probabilistic-Logic Sequence Models"

OOne of the current challenges in artificial intelligence is modeling dy- namic environments that change due to the actions or activities undertaken by people or other agents. The task of inferring hidden states, e.g. the activities or intentions of people, based on observations is called filtering. Standard proba- bilistic models such as Dynamic Bayesian Networks are able to solve this task efficiently using approximative methods such as particle filters. However, these models do not support logical or relational representations. The key contribution of this paper is the upgrade of a particle filter algorithm for use with a prob- abilistic logical representation through the definition of a proposal distribution. The performance of the algorithm depends largely on how well this distribution fits the target distribution. We adopt the idea of logical compilation into Binary Decision Diagrams for sampling. This allows us to use the optimal proposal dis- tribution which is normally prohibitively slow.

Daisuke Tokoro, Ai Fukunaga, Kiyoharu Hamaguchi, Toshinobu Kashiwabara and Shin-ichi Minato
"Exploiting Global Structures in Bayesian Network Compilation by Zero-suppressed BDDs"

This paper addresses an algorithm for probabilistic inference for Bayesian Networks (BNs). Our algorithm uses Zero-suppressed BDDs for compiling BNs. We introduce "separation variable" to reflect global structures of BNs, which provides more compact ZBDDs. We show some experimental results to compare our method with the state-of-the-art tool.

Hiroaki Watanabe and Stephen Muggleton
"Can ILP be Applied to Large Dataset?"

There exist large data in science and business. Existing ILP systems cannot be applied effectively for data sets with 10000 data points. In this paper, we consider a technique which can be used to apply for more than 10000 data by simplifying it. Our approach is called Approximative Generalisation and can compress several data points into one example. In case that the original examples are mixture of positive and negative examples, the resulting example is ascribed in probability values representing proportion of positiveness. Our longer term aim is to apply on large Chess endgame database to allow well controlled evaluations of the technique. In this paper we start by choosing a simple game of Noughts and Crosses and we apply mini-max backup algorithm to obtain database of examples. These outcomes are compacted using our approach and empirical results show this has advantage both in accuracy and speed. In further work we hope to apply the approach to large database of both natural and artificial domains.

MLG

Oral Presentation

Christian Desrosiers and George Karypis
"Within-network Classification Using Local Structure Similarity"

This paper presents a novel collective classification approach that classifies entities of a network using their local structure. Moreover, a new node similarity measure based on random walks, which takes into account label uncertainty and the degree of nodes, is introduced. Through experimentation on real-life datasets from different domains, we show our method to outperform several state-of-the-art approaches for this problem.

Christian Desrosiers, Alain Hertz, Philippe Galinier and Hansen Pierre
"Mining Graphs to Discover New Theorems in Mathematics"

This paper introduces a new data mining problem of characterizing a class of graphs with a set of forbidden subgraphs. Efficient methods for this problem are presented, and the potential of these methods illustrated on the task of discovering novel and significant results in the field of graph theory.

Christoph Lippert, Nino Shervashidze and Oliver Stegle
"Relational Models for Generating Labeled Real-world Graphs"

Analyzing and understanding the structure of social networks and other real-world graphs has become a major area of research in the field of data mining. An important problem setting is the creation of realistic synthetic graphs that resemble real-world social networks. While a range of efficient algorithms for this task have been proposed, current methods solely take the network topology into account ignoring any node labels. We propose a probabilistic approach to synthetic graph generation with node labels, building on concepts from relational learning.

Andreas Maunz, Christoph Helma and Stefan Kramer
"Large-Scale Graph Mining Using Backbone Refinement Classes"

We present a new approach to large-scale graph mining based on so-called backbone refinement classes. The method efficiently mines tree-shaped subgraph descriptors under minimum frequency and significance constraints, using classes of fragments to reduce feature set size and running times, defined in terms of fragments sharing a common back- bone. The method is able to optimize structural inter-feature entropy as opposed to occurrences, which is characteristic for open or closed fragment mining. In the experiments, the proposed method reduces feature set sizes by >90% and >30% compared to complete tree mining and open tree mining, respectively. Evaluation using cross-validation runs shows that their classification accuracy is similar to the complete set of trees but significantly better than that of open trees. Compared to open or closed fragment mining, a large part of the search space can be pruned due to an improved statistical constraint (dynamic upper bound adjustment), which is confirmed in the experiments in lower running times compared to static upper bound pruning. Further analysis using large-scale datasets confirms that the novel descriptors render large training sets feasible which previously might have been intractable.

Galileo Mark Namata and Lise Getoor
"A Pipeline Approach to Graph Identification"

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show how the performance of graph identification is sensitive to the inter-dependencies among these inferences.

Juuso Parkkinen, Janne Sinkkonen, Adam Gyenge and Samuel Kaski
"A Block Model Suitable for Sparse Graphs"

We introduce a new generative block model for graphs. Vertices (nodes) have mixed memberships in margin components, and edges arise from a multinomial defined over the cartesian product of the margin components. The model is able to represent block structures of "non-community" type, that is, it is able to model linkage between margin components. Compared to earlier mixed membership stochastic blockmodels which have a Bernoulli parameterization for the generation of links between each margin component pair, in the new model collapsed Gibbs samplers need to represent only those interactions with realized data in them, making possible large and sparse block models.

Leander Schietgat, Fabrizio Costa, Jan Ramon and Luc De Raedt
"Maximum Common Subgraph Mining: A Fast and Effective Approach towards Feature Generation"

There exists a wide variety of local graph mining approaches that search for frequent, correlated or closed patterns in graphs. All these methods return very large sets of patterns which can then be used as features to build classifiers. Here we take a different approach: rather than mining for all optimal local patterns, we randomly sample from the set of maximum common subgraphs. The advantages are that maximum common subgraphs are easier to compute than frequent or correlated patterns (for the class of outerplanar graphs this can even be accomplished in polynomial time), and that the resulting features lead to classification models that achieve significantly better predictive performance than models built on the patterns returned by traditional mining approaches.

Shankar Vembu, Thomas Gärtner and Mario Boley
"MAP Structured Prediction by Sampling"

We consider maximum a posteriori parameter estimation for structured prediction with exponential family models. In this setting, the main difficulty lies in the computation of the partition function and the first-order moment of the sufficient statistics. We consider the case that efficient algorithms for exact uniform sampling from the output space exist. This assumption is orthogonal to the typical assumptions made in structured output learning. It holds, in particular, for the highly relevant problem of sampling potent drugs. Under our uniform sampling assumption, we show that exactly computing the partition function is intractable, but it can be approximated efficiently. Furthermore, we show that the first-order moment of the sufficient statistics can be approximated and that we can sample according to the estimated distribution. We also present some application settings.

Poster Presentation

Albert Bifet and Ricard Gavalda
"Adaptive XML Tree Mining on Evolving Data Streams"

We propose a new method to classify trees, using closed and maximal frequent trees. Closed trees maintain the same information as frequent trees using less space and maximal trees maintain approximate information. We use them to reduce the number of classification features. We present a new framework for data stream tree classification. For the first component of our classification framework, using a methodology based in Galois Lattice Theory, we present three closed tree mining algorithms: an incremental one IncTreeMiner, a sliding-window based one, WinTreeMiner, and finally one that mines closed trees adaptively from data streams, AdaTreeMiner. To the best of our knowledge this is the first work on tree classification in streaming data varying with time. We give a first experimental evaluation of the proposed classification method.

Mario Boley, Tamas Horvath, Axel Poigne and Stefan Wrobel
"New Results on Listing Closed Sets of Strongly Accessible Set Systems"

In a former work we defined different graph mining settings raising the problem of listing all closed sets of strongly accessible set systems and proposed a DFS-algorithm that solves this problem with polynomial delay and incremental polynomial space. In this extended abstract we summarize our new results on listing closed sets of strongly accessible set systems. In particular, (i) we show that while listing all closed sets is intractable for accessible set systems, it becomes tractable for the class of strongly accessible set systems, (ii) give a listing algorithm for this problem that has not only better complexity, but provides also an algorithmic characterization of strongly accessible set systems, and (iii) present new results on the relationship between closure operators and support-closedness of patterns as defined in data mining.

Charles Bouveyron, Hugh Chipman and Etienne Côme
"Supervised Classification and Visualization of Social Networks Based on a Probabilistic Latent Space Model"

Graph-structured networks are widely used to represent relationships between persons in organizations or communities. Recently, the need of classifying and visualizing such data has suddenly grown due to the emergence of a large number of social network websites.~We propose in this paper two supervised approaches for learning a latent space model of the network taking into account both the observed class labels and the graph structure. The first proposed approach introduces the class information through the conditional model of the link existence between two nodes whereas the second one considers the class labels as new observed variables. The learned models are then used to project and classify new nodes.

Mickey Brautbar
"Online Learning a Binary Labeling of a Graph"

We investigate the problem of online learning a binary labeling of the vertices for a given graph. We design an algorithm, Majority, to solve the problem and show its optimality on clique graphs. For general graphs we derive a relevant mistake bound that relates the algorithm's performance to the cut size (the number of edges between vertices with opposite labeling) and the maximum independent set in the graph. We next introduce a novel complexity measure of the true labeling - the frontier and relate the number of mistakes incurred by Majority to this measure. This allows us to show, in contrast to previous known approaches, that our algorithm works well even when the cut size is bigger than the number of vertices. A detailed comparison with previous results is given.

Thomas Cason, Pierre-Antoine Absil and Paul Van Dooren
"Iterative Methods for Low Rank Approximation of Graph Similarity Matrices"

In this paper, we analyze an algorithm to compute a low-rank approximation of the similarity matrix between nodes of two graphs. This problem can be reformulated as an optimization problem of a continuous function Phi(S) = tr[S^T * M²(S)] where S is constrained to have unit Frobenius norm, and M² is a non-negative linear map. We restrict the feasible set to the smooth manifold of rank-k matrices with unit Frobenius norm and k identical singular values. We analyze the convergence properties of our algorithm and prove that accumulation points are stationary points of Phi(S). We finally compare our method in terms of speed and accuracy to the full rank algorithm.

Christophe Costa Florencio, Jan Ramon, Jonny Daenen, Jan Van den Bussche and Dries Van Dyck
"Context-free Graph Grammars as a Language-bias Mechanism for Graph Pattern Mining"

Many graph mining tasks involve search over languages of graphs. Generating all elements of a graph language is a non-trivial task, and there are often additional constraints, e.g. to allow for efficient pruning. There are several approaches to generate graphs. For some applications, however, it is desirable to constrain the search and only consider a subclass of all graphs. For this, the use of language bias has been considered, e.g. in the field of inductive logic programming. In the context of graph mining, one would expect of a similar general language bias mechanism that it allows one to specify many different classes of graphs in a declarative way. In the present paper, we propose the use of context-free graph grammars for this purpose. Graph grammars are well known, and a wide variety of classes of graphs can be specified using graph grammars. This makes them suitable for use as an expressive and flexible mechanism for graph pattern language bias. Examples of context-free graph languages include: all trees; all cycles; all outerplanar graphs; all graphs of treewidth at most k. A particularly elegant approach to graph grammars has been developed by Bauderon and Courcelle. As has been pointed out before, Courcelle's writings might be somewhat inaccessible for readers with little algebraic background. This paper has three purposes: (1) review Courcelle's graph grammars (more precisely: the HR-equation systems) in a simple and, hopefully, accessible manner, (2) define a canonical form for bounded treewidth graphs based on Arnborg's treewidth checking algorithm and (3) show how these can be used as a tractable formalism for the generation of graph patterns.

Christophe Costa Florencio
"Identification of Hyperedge-Replacement Graph Grammars"

The last few years have seen an increasing interest in mining and learning from graphs. Most work in this area has aimed at finding approximate solutions to real-world problems, without much concern for more fundamental issues. This paper is intended as a step towards an approach that is theoretically more sound, by applying ideas from the field of grammar induction. We present results concerning the learning of hypergraph grammars. We consider identification in the limit from both derivation trees and graphs. We show that, by restricting the complexity of grammars, a learnable class of derivation languages is obtained. By exploiting a known invariance result this class is extended to a class that is learnable from hypergraphs.

Guillaume Damiand, Colin de la Higuera, Jean-Christophe Janodet, Emilie Samuel and Christine Solnon
"A Polynomial Algorithm for Subisomorphism of Open Plane Graphs"

We address the problem of searching for a pattern in a plane graph, that is, a planar drawing of a planar graph. We define plane subgraph isomorphism and give a polynomial algorithm for this problem. We show that this algorithm may be used even when the pattern graph has holes.

Frank Eichinger and Klemens Böhm
"Towards Scalability of Graph-Mining Based Bug Localisation"

(Semi-)automated bug localisation is an important issue in software engineering. Recent techniques based on call graphs and graph mining can locate bugs in relatively small programs, but do not scale for real-world applications. In this paper we describe a bug-localisation approach based on graph mining that has this property, at least according to preliminary experiments. Our main contribution is the definition and analysis of class-level call graphs, with encouraging results.

Risi Kondor and Nino Shervashidze
"The Graphlet Spectrum"

Current graph kernels suffer from two limitations: graph kernels based on counting particular types of subgraphs ignore the relative position of these subgraphs to each other, while graph kernels based on algebraic methods are limited to graphs without node labels. In this paper we present the graphlet spectrum, a system of graph invariants derived by means of group representation theory that captures information about the number as well as the position of labeled subgraphs in a given graph. In our experimental evaluation the graphlet spectrum outperforms state-of-the-art graph kernels.

Olana Missura and Thomas Gärtner
"Classification on Graphs for Dynamic Difficulty Adjustment: More Questions Than Answers"

Motivated by a problem of dynamic difficulty adjustment in computer games, we propose a new learning setting. We argue that video games require a mechanism for dynamic difficulty adjustment. Such mechanism can be formulated as a classification problem on a graph that combines the properties of online and active learning. The main contribution of this paper is the clarification of this novel learning setting and a summary of open questions.

Tsuyoshi Murata
"New Modularity for Evaluating Communities in Bipartite Networks"

Detecting dense subnetworks (communities) and evaluating them are practically important for finding similar vertices in a network. Although Newman's modularity is widely used for evaluating network division, it is for unipartite networks composed of only one vertex type. As the attempts for evaluating divisions of bipartite networks, Guimera and Barber propose bipartite modularities. The author proposes another bipartite modularity that allows one-to-many correspondence of communities of different vertex types.

Alberto Pinto
"A Statistical Generalization of Music Graph Spectral Mining"

Graph spectral representation of pitch class sequences provides an effective tool for mining structural music information. Here we show how it is possible to extend this approach by embedding rhythmical information into graph representation through a complex valued function defined on the graph edge set allowing for both pitch-class and rhythmical modelling through coefficients of variation. Experiments run on MIREX 05 dataset show that spectral distances calculated from this representation outperform existing methods.

SRL

Oral Presentation

Matthias Broecheler and Lise Getoor
"Probabilistic Similarity Logic"

Many interesting research problems, such as ontology alignment and collective classification, require probabilistic and collective inference over imprecise evidence. Existing approaches are typically ad-hoc and problem-specific, requiring significant effort to devise and implement and provide poor generalizability. In this paper, we introduce probabilistic similarity logic (PSL), a simple, yet powerful language for describing problems which require probabilistic reasoning about similarity where, in addition to reasoning probabilistically, we want to capture both logical constraints and imprecision. We prove that PSL inference is polynomial and describe an efficient implementation which uses an off-the-shelf convex optimization package. Lastly, we outline a wide range of application areas for PSL.

Rodrigo de Salvo Braz, Sriraam Natarajan, Hung Bui, Jude Shavlik and Stuart Russell
"Anytime Lifted Belief Propagation"

Lifted first-order probabilistic inference has been receiving increasing attention. To date, all lifted inference methods require a model to be shattered against itself and evidence, before inference start. In many situations this will lead to great fracturing of the model, producing a new one that is not far from propositionalized and therefore canceling the benefits of lifted inference. Even when shattering is not so severe, intuition suggests it should not involve an entire model, but only the most relevant parts to the evidence and query at hand. We present an algorithm that corresponds to this intuition by performing shattering during belief propagation inference, on a as-needed basis, starting on the most relevant parts of a model first. The trade-off is having an (exact) bound on the query's belief rather than an exact belief. In many problems one can find a very good bound for only a fraction of the cost for computing an exact solution. Moreover, the bound converges to the exact solution as inference and shattering converge to the entire model. Interestingly, this algorithm mirrors theorem-proving, helping to close the gap between probabilistic and logic inference.

Tuyen Huynh and Raymond Mooney
"Max-Margin Weight Learning for Markov Logic Networks"

Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental result shows that the proposed approach generally achieves higher F1 score than the current best discriminative weight learner for MLNs.

Jacek Kisyński and David Poole
"Lifted Aggregation in Directed First-order Probabilistic Models"

There is an ongoing effort to design efficient lifted inference algorithms for first-order probabilistic models. This paper focuses on directed first-order models that require an aggregation operator when a parent random variable is parameterized by logical variables that are not present in a child random variable. We describe our work on extending Milch et al.’s C-FOVE lifted inference algorithm with efficient lifted aggregation procedures.

Aniruddh Nath and Pedro Domingos
"A Language for Relational Decision Theory"

In recent years, many representations have been proposed that combine graphical models with aspects of first-order logic, along with learning and inference algorithms for them. However, the problem of extending decision theory to these representations remains largely unaddressed. In this paper, we propose a framework for relational decision theory based on Markov logic, which treats weighted first-order clauses as templates for features of Markov networks. By allowing clauses to have utility weights as well as probability weights, very rich utility functions can be represented. In particular, both classical planning and Markov decision processes are special cases of this framework.

Sebastian Riedel
"Cutting Plane MAP Inference for Markov Logic"

Abstract In this work we present Cutting Plane Inference (CPI) for MAP inference in Markov Logic. CPI incrementally solves partial Ground Markov Networks, adding formulae only if they are violated in the current solution. We show dramatic improvements in terms of efficiency, and discuss scenarios where CPI is likely to be fast.

Nima Taghipour, Wannes Meert, Jan Struyf and Hendrik Blockeel
"First-Order Bayes-Ball for CP-Logic"

Efficient probabilistic inference is key to the success of statistical relational learning. One issue that affects inference cost is the presence of irrelevant random variables. The Bayes-ball algorithm can identify such irrelevant variables in a propositional Bayesian network. This paper presents a lifted version of Bayes-ball, which works directly on the first-order level, and shows how this algorithm applies to CP-logic inference.

Volker Tresp, Yi Huang, Markus Bundschus and Achim Rettinger
"Materializing and Querying Learned Knowledge"

The Semantic Web (SW) presents new challenges to statistical relational learning. One of the main features of SW data is that it is notoriously incomplete. Consider friend-of-a-friend (FOAF) data. The purpose of the FOAF project is to create a web of machine-readable pages describing people, their relationships, and people's activities and interests, using SW technology. Obviously people vary in their motivation to communicate private information such that person-specific information varies from almost nothing to great detail. Another feature of SW data is the great sparsity of relational information such as friendship: friends in the knowledge base are a small subset of all potential friends. Finally, there is a variety of additional untested and potentially conflicting schema information specified by users. We are convinced that the data situation in the FOAF setting will typical for many other SW domains. Naturally, missing information can be supplemented. First, deductive reasoning can be used to complement factual knowledge based on axioms. The advantage is that deduced facts can be queries with powerful query languages such as SQL for relational data bases and SPARQL for SW data. The disadvantage is that, generally, valid axioms are rare and many true facts in an application cannot be derived. Inductive reasoning is the second way for gaining additional knowledge by exploiting regularities in the data. The advantage of inductive reasoning is that one can perform inference on a large number of statements. Disadvantages are that only the probability of the truth values of statements can be derived and that machine learning needs to be performed by machine learning experts. The goal of the work presented here is to apply statistical relational learning in sparse and unreliable domains such as the SW. Several requirements need to be fulfilled. The statistical relational learning algorithms need to be able to deal with sparse data typical for SW domains, they must be scalable, they need to be easily configurable, finally it should be possible to integrate the learning results into querying. For the user there should be almost no difference between querying facts, deduced facts and induced information. We define an extension of SPARQL, which allows the integration of the learned probabilistic statements into querying. Statements that can be inferred via logical reasoning can readily be integrated into learning and querying. We study learning algorithms that are suitable for the resulting high-dimensional sparse data matrix. We present experimental results using a friend-of-a-friend data set.

Poster Presentation

Ronen Brafman and Yagil Engel
"Lifted Optimization for Relational Preference Rules"

In reasoning about preference we often have knowledge about the desirable behavior or state of a system of agents/objects, that applies to different instantiations of this system, while instantiations may differ in the number and properties of concrete objects. With this motivation we previously proposed relational preference rules, which is a formalism for modeling preference and value information about complex command and control systems. Preference rules allow to specify weights on first-order statements regarding properties (attributes) of object classes. The rules determine a value for any given set of grounded objects and an assignment to their properties. A major reasoning task with preference rules is to find an assignment to the attributes of the objects in the system, that maximizes this value. In the current work we focus on the computational problem of performing lifted optimization; that is finding an optimal assignment without explicit enumeration of the space of assignments to the grounded set of objects. As we show, lifted reasoning with preferences turn out to be significantly different from lifted probabilistic inference.

Maurice Bruynooghe, Daan Fierens, Bernd Gutmann, Angelika Kimmig, Niels Landwehr, Wannes Meert, Ingo Thon and Luc De Raedt
"An Exercise with Statistical Relational Learning Systems"

We report on two exercises in modeling, inference and learning with seven statistical relational learning systems and use this as a basis for a simple and preliminary comparison between these systems.

Michael Chiang and David Poole
"Incremental EM for Latent-class Relational Models"

Latent-class relational models have found much use in key areas such as recommender systems and social network analysis, where hidden group information about domain individuals can help improve predictive accuracy. Fitting such models, however, is generally infeasible due to the large number of latent quantities that must be inferred, and approximate inference techniques such as Gibbs sampling and variational Bayes are required. In this work, we show how an instance of Neal and Hinton's incremental EM algorithm can be derived for fitting latent-class relational models tractably. Empirical evaluations show good accuracy of models learned with the proposed algorithm in numerous social networks and movie rating domains.

James Cussens
"On Generative Parameterisations of Markov Logic Networks"

A Markov logic network uses weighted clauses to define a probability distribution over Herbrand interpretations. PRISM programs also define distributions over Herbrand interpretations. In this paper we show how to represent any Markov logic network as a PRISM program, and then consider the pros and cons of so doing.

Jan De Belder, Wim De Smet, Raquel Mochales Palau and Marie-Francine Moens
"Does Google own Youtube? Entity Relationship Extraction with Minimal Supervision"

This paper advances state of the art entity relationship extraction from text with minimal supervision in two ways. The contributions are a new weighting scheme, that solves different types of bias caused by having only a few training examples, and a new method to construct a support vector machine in the Multiple Instance Learning (MIL) setting, based on Weighted Least Squares (WLS).

Arjen Hommersom, Nivea Ferreira and Peter Lucas
"Learning Parameters of Chain Logic Theories"

There has been a considerable amount of research on statistical relational learning (SRL) during the past decade, much of it focusing on generalised probabilistic logics. SRL formalisms are typically based on Bayesian or Markov networks. Chain graphs (CGs) generalise both acyclic directed graph (ADG) models, i.e., Bayesian networks, and undirected graph (UG) models, i.e., Markov networks, as they may contain both undirected and directed edges. As a consequence they constitute an attractive point of departure for SRL. In this paper, we start with a logical generalisation of chain graphs, called chain logic, to elaborate a method for learning the parameters of chain logic theories.

Kristian Kersting and Zhao Xu
"Relational Gaussian Processes for Learning Preference Relations"

Gaussian processes have successfully been used to learn preferences among entities. They are attractive because they provide Bayesian approaches for model selection and probabilistic inference. For many entities encountered in real-world applications, however, there are complex relations between them. Considering a preference model taking these relations into account should improve learning performance as it allows for preference information between entities to be shared. In this paper, we show that this is indeed the case. Specifically, we propose a probabilistic kernel model to preference learning based on Silva et al.'s mixed graph Gaussian processes. A new prior distribution, enhanced with relational graph kernels, is proposed to capture the correlations between preferences in the Bayesian framework. We show competitive results against baseline methods when demonstrated on publicly available LETOR datasets.

Angelika Kimmig, Bernd Gutmann and Vítor Santos Costa
"Trading Memory for Answers: Towards Tabling ProbLog"

ProbLog is a recent probabilistic extension of Prolog, where facts can be labeled with mutually independent probabilities that they belong to a randomly sampled program. The implementation of ProbLog on top of the YAP-Prolog system provides various inference algorithms that calculate the success probability of a query, i.e. the probability that the query is provable in a randomly sampled program. We discuss extensions of these algorithms with tabling that broaden the class of problems that can be handled. First, exploiting structure sharing can speed up inference in domains where different proofs of a query share many subgoals. Second, we extend exact inference to deal with negated ground subgoals in clause bodies.

Stanley Kok and Pedro Domingos
"Hypergraph Lifting for Structure Learning in Markov Logic Networks"

Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Learning MLN structure from a relational database involves learning the clauses and weights. The state-of-the-art MLN structure learners all involve some element of greedily generating candidate clauses, and are susceptible to local optima. To address this problem, we present an approach that directly utilizes the data in constructing candidates. A relational database can be viewed as a hypergraph with constants as nodes and relations as hyperedges. We find paths of true ground atoms in the hypergraph that are connected via their arguments. To make this tractable (there are exponentially many paths in the hypergraph), we lift the hypergraph by jointly clustering the constants to form higher-level concepts, and find paths in it. We variabilize the ground atoms in each path, and use them to form clauses, which are evaluated using a pseudo-likelihood measure. In our experiments on three real-world datasets, we find that our algorithm outperforms the state-of-the-art approaches.

Matt Michelson and Sofus Macskassy
"Layered, Multivariate Anomaly Explanations: A First Look"

An anomaly is a data point that deviates dramatically from some set of related data points, based on some common metric. While there is significant research on finding anomalous records within data sources, we are unaware of research on explaining why those anomalous data points are actually anomalous (beyond their deviance from the "standard" value for the metric). Generating these explanations is the focus of this paper.

Sriraam Natarajan, Prasad Tadepalli, Gautam Kunapuli and Jude Shavlik
"Knowledge Intensive Learning - Directed vs Undirected SRL Models"

SRL models have become attractive for allowing the domain expert to succuintly encode the knowledge about the domain. The goal in this work is to present work in progress of comparing two kinds of models in their ability to use this knowledge: directed models with combining rules and undirected models based on weighted logics (MLNs). We compare the models in 2 real-world datasets and present the empirical results. From initial experiments, it appears that the directed models perform better when there are small number of rules while the undirected models improve their performance when there are a large number of weakly predictive rules.

Vítor Santos Costa and Aline Paes
"On Connecting PRISM to CLP(BN)"

The last few years have seen significant progress in Statistical Relational Learning. This progress requires the design and implementation of languages that can be used to express uncertainty over relational data. These languages differ on both the representation they use for structure, with logic-based formalisms being quite popular, and on the representations for uncertainty, with both directed and undirected graphical models being used. Recently, there has been interest on understanding the relationship between these different languages. Establishing a mapping between two different languages is useful to provide insight in understanding the language and sharing implementation technology and experience. In this work, we study the relation between CLP(BN) and PRISM. We will focus on the PRISM to CLP(BN) mapping first, following an example-based approach.

Joost Vennekens, Angelika Kimmig, Theofrastos Mantadelis, Bernd Gutmann, Luc De Raedt and Maurice Bruynooghe
"From ProbLog to First Order Logic: A First Exploration"

ProbLog is a system that allows a user to compute (or approximate) the probability of a query in a theory consisting of Horn clauses guarded by probabilistic facts. This paper explores whether it is feasible to generalize the approach towards theories consisting of first order logic formulas guarded by probabilistic facts.

Hiroaki Watanabe and Stephen Muggleton
"Abstracting Probabilistic Relational Model by Specialisation"

Knowledge abstraction is an essential activity of human intelligence. This paper introduces a logic-based specialisation algorithm for relational domains under uncertainty. The specialisation algorithm takes a pair of existentially quantified conjunctions of first-order literals and computes a most general specialisation of the pair by adopting Plotkin's least general generalisation algorithm. The representation and specialisation algorithm provide a natural way to upgrade non-relational probabilistic models to relational probabilistic models.

Extra Material

Abstracts List

ILP-2009 - 19th International Conference on Inductive Logic Programming

MLG-2009 - 7th International Workshop on Mining and Learning with Graphs

SRL-2009 - International Workshop on Statistical Relational Learning

ILP

Oral Presentation

Poster Presentation

MLG

Oral Presentation

Poster Presentation

SRL

Oral Presentation

Poster Presentation