Book review

Logic for Learning: Learning Comprehensible Theories from Structured Data by John W. Lloyd, Springer-Verlag, 2003, hardcover, ISBN 3-540-42027-4, x + 256 pages.


The publication of a new book by John Lloyd, nearly twenty years after the first edition of his classic Foundations of Logic Programming saw the light of day, has to count as a major event in the computational logic community. Much has happened in two decades, and those expecting a third, extended edition of Foundations of Logic Programming will be in for a surprise. This time, John has tried his hand at machine learning, and his main aim in Logic for Learning is to demonstrate ``the rich and fruitful interplay between the fields of computational logic and machine learning''.

In this book review I will try to assess to what extent this aim has been achieved. In doing so I will attempt to approach matters from the point of view of a computational logician (even though I am more of a machine learner myself). As with any book review, the main questions are ``who should read this book?'' and ``what's in it for them?''. I should perhaps add that John Lloyd and I are no strangers, our appointments in Bristol having overlapped for about two years in 1997-1999. After John's return to Australia we continued to publish joint papers on subjects closely related to Logic for Learning. While this means I am sympathetic towards the general approach advocated in Logic for Learning, I do not believe this seriously affects my capacity for critical assessment, which of course in an academic review is chiefly related to the question ``would I have written the same book?''.

The most obvious connection between machine learning and computational logic is in the area of knowledge representation. Both the data input to a learning system and the models it outputs need to be expressed in some kind of declarative formalism (assuming a symbolic learner; non-symbolic learners such as neural networks only need a declarative formalism for the data). The sophistication required of the declarative language depends on the nature of the data. Logic for Learning is concerned with learning from rich and complex data, witness its subtitle Learning Comprehensible Theories from Structured Data. The logic used in this book is correspondingly rich: a typed higher-order logic, with a rewrite-based inference system combining aspects of functional and logic programming. This will sound familiar to some readers, as Lloyd has in the past worked along these lines on the declarative programming language Escher. However, Escher is not mentioned in the book, presumably because a fully-fledged Escher implementation is neither available nor needed for the kind of machine learning applications considered in this book.

The fact that the logic is higher-order means that a wider range of data types is available than the usual data constructor types that can be used to build tree-like structures. A set can be identified with its characteristic function, which can be represented by a $\lambda$-abstraction. By the same token, we can have multisets or bags ($\lambda$-abstractions mapping into integers rather than booleans). Sets and multisets are represented extensionally as finite look-up tables: for instance, the set $\{1,2,3\}$ is represented by the higher-order term

\begin{displaymath}\lambda x.\ \mbox{\it if}\ x=1\ \mbox{\it then}\ \top\ \mbox{\it else}\ \mbox{\it if}\ x=2\ \mbox{\it then}\ \top\ \mbox{\it else}\ \mbox{\it if}\ x=3\ \mbox{\it then}\ \top\ \mbox{\it else}\ \bot \end{displaymath}
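For readers more at home with functional programming, the following Haskell fragment is a rough analogue of this representation; it is my own illustration rather than the book's Escher-style notation, and the names mySet and myBag are invented for the example.

\begin{verbatim}
-- A set of Ints as its characteristic function: the finite
-- look-up table for {1,2,3} written as nested conditionals.
mySet :: Int -> Bool
mySet x = if x == 1 then True
          else if x == 2 then True
          else if x == 3 then True
          else False

-- A multiset (bag) maps elements to multiplicities instead of booleans.
myBag :: Int -> Int
myBag x = if x == 1 then 2          -- 1 occurs twice
          else if x == 3 then 1     -- 3 occurs once
          else 0                    -- everything else is absent

main :: IO ()
main = do
  print (map mySet [0 .. 4])        -- [False,True,True,True,False]
  print (map myBag [0 .. 4])        -- [0,2,0,1,0]
\end{verbatim}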

Describing individuals that make up a data set for learning by higher-order terms is elegant and powerful. The type signature provides the meta-data that is exploited by the learning system to construct the search space for possible classification models. Furthermore, the inductive nature of the type signature facilitates the proof of formal results (e.g., that a distance measure defined on pairs of higher-order terms is a metric).

Higher-order logic is also used extensively in the construction of classification models. A wide range of classifiers is used in machine learning; Logic for Learning describes in some detail a learning system called ALKEMY that constructs higher-order decision trees. Internal nodes in such a tree contain predicates (i.e., formulas with a single free variable whose type is the type of the individuals), and for a given individual we take one of the two branches emanating from a node depending on whether the predicate is true of that individual. The tree's leaves are then labelled with predicted classes. The main task of a decision tree learner is to decide which predicates are to be used for the tree's internal nodes, and here again higher-order logic comes into play: such predicates are constructed using a predicate rewrite system. The advantage of using higher-order logic here is that these predicates can be described by function composition, so that variables can be dispensed with. Here is an example used in the book, from the well-known toy problem of classifying trains:

\begin{displaymath}\mbox{\it setExists}_1\ (\wedge_2\ (\mbox{\it projLength} \circ (= \mbox{\it Short}))\ (\mbox{\it projKind} \circ (= \mbox{\it Closed})))\end{displaymath}

This predicate is true if the train, represented as a set of cars, contains a car satisfying a single predicate, which itself is a conjunction of two predicates, each of which consists of a projection function (since cars are represented by tuples) composed with a predicate comparing the result of the projection to a constant value. The use of higher-order logic here is elegant but not crucial, as it does not offer additional power over the first-order refinement operators or hypothesis grammars that are used in inductive logic programming (ILP).
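To make the variable-free, compositional style concrete, here is a rough Haskell rendering of this predicate. It is my own sketch, not ALKEMY's notation: the names setExists1, and2 and hasShortClosedCar are invented, and cars are simplified to pairs of attributes rather than the book's richer tuples.

\begin{verbatim}
-- Cars are tuples of attributes; a train is a (list-encoded) set of cars.
data Length = Short | Long   deriving (Eq, Show)
data Kind   = Open  | Closed deriving (Eq, Show)
type Car    = (Length, Kind)
type Train  = [Car]

projLength :: Car -> Length          -- projections onto the tuple components
projLength = fst
projKind :: Car -> Kind
projKind = snd

setExists1 :: (Car -> Bool) -> Train -> Bool
setExists1 = any                     -- "some element of the set satisfies p"

and2 :: (a -> Bool) -> (a -> Bool) -> a -> Bool
and2 p q x = p x && q x              -- pointwise conjunction of two predicates

-- The predicate above: the train contains a car that is short and closed.
hasShortClosedCar :: Train -> Bool
hasShortClosedCar =
  setExists1 (and2 ((== Short) . projLength) ((== Closed) . projKind))

main :: IO ()
main = print (hasShortClosedCar [(Long, Open), (Short, Closed)])   -- True
\end{verbatim}

Note that no variable ranging over cars appears anywhere in the definition of hasShortClosedCar: the predicate is assembled entirely by composing functions, which is exactly the point being made.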

Logic for Learning is a fairly slim volume consisting of six main chapters, titled Introduction, Logic, Individuals, Predicates, Computation, and Learning. This organisation has an inherent logic, although it does mean that the reader will not see any learning until she has read more than 200 pages. However, the introductory chapter provides a very helpful ``shortest path to the machine learning applications''. Chapter 1 is also remarkable in another respect: it provides an excellent, intuitive and very readable introduction to the rest of the book. The remaining chapters are written in the terse and mathematical style familiar to readers of Foundations of Logic Programming. As such, the book is geared more towards computational logicians who are interested in machine learning, while machine learners who want to know what computational logic has to offer may find the mathematical approach a little unforgiving (but not insurmountable). Literature references are collected at the end of each chapter rather than being sprinkled throughout the text. Each chapter also has a list of (almost exclusively mathematical) exercises.

Of course the use of computational logic in machine learning is not new: ILP has been a very active research area for well over twenty years, with its own annual conference. However, readers interested in an overview of ILP and how the field has developed should look elsewhere. What Logic for Learning has to offer is a very personal view on the interplay between computational logic and machine learning. It is almost a rational reconstruction of what ILP could have been, had it used Escher-style higher-order logic rather than Prolog from the beginning. This is a worthy and fascinating exercise, as it could be argued that a considerable amount of work in ILP has been concerned with repairing the restrictions and shortcomings of Prolog, and it is refreshing - even for ILP experts - to see things approached from such a different perspective.

In conclusion, this book contributes to the trend towards interdisciplinarity that can be observed in many scientific disciplines, and particularly in computer science. We can also perceive this trend in computational logic, which is branching out to application areas such as the Semantic Web and e-Science. As Logic for Learning argues convincingly, machine learning is another very promising application area for computational logic, and this book provides an excellent basis for logicians who want to move in that direction. I recommend that they follow it up with an introductory text on machine learning - such as Tom Mitchell's Machine Learning (McGraw-Hill, 1997) or Ian Witten and Eibe Frank's Data Mining (Morgan Kaufmann, 2000) - as Logic for Learning doesn't contain any material on heuristics, overfitting, tree pruning, and other standard machine learning techniques. The book can also be used as a textbook in a mathematically oriented advanced graduate course.

My main criticism of the book is that it would have been even more convincing if it had demonstrated how a range of machine learning techniques - not just decision tree learning - could have benefited from the higher-order logic treatment. I would have wanted to see material on classification rule learning, association rule learning, learning of probabilistic models, and so on. As it stands, the book offers a wealth of technical material but gives only one or two examples of how it could be used in machine learning, which made me feel like someone who has climbed a tall tower only to be given a few minutes to admire the view. I would also have liked to see some serious applications to real-world, highly structured data. But it is always easy to say to an author: ``This is great stuff, why didn't you write more?'' The bottom line is that it is indeed great stuff, which deserves to be taken seriously by any computational logician who wants to stay ahead of the game.


PETER FLACH
University of Bristol, United Kingdom