
Inductive LP, EBG and Uncertainty

Appeared in Volume 9/1, February 1996

Keywords: inductive LP.


ries+@cs.cmu.edu
Klaus Ries
6th November 1995

There is so much work on learning with clause logics that I am curious whether any of it has been extended to a logic that incorporates uncertainty, e.g. in the guise of probabilities or of fuzzy logic.

My interest is especially in restricted forms of logic that allow induction over huge databases. My research objective is to find better statistical language models, and a model that uses just the last two words to estimate the next (a trigram) is by far the best. I wonder whether logical structures could contribute here.
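A minimal sketch of the kind of model I mean, in Python (maximum-likelihood trigram estimates from counts; the corpus and names are only illustrative):

    from collections import defaultdict

    # Maximum-likelihood trigram model: estimate P(w | u, v) from counts.
    def train_trigram(words):
        context_counts = defaultdict(int)   # counts of contexts (u, v)
        trigram_counts = defaultdict(int)   # counts of triples (u, v, w)
        for u, v, w in zip(words, words[1:], words[2:]):
            context_counts[(u, v)] += 1
            trigram_counts[(u, v, w)] += 1
        return context_counts, trigram_counts

    def p_trigram(context_counts, trigram_counts, u, v, w):
        c = context_counts.get((u, v), 0)
        return trigram_counts.get((u, v, w), 0) / c if c else 0.0

    words = "the cat sat on the mat the cat sat on the hat".split()
    cc, tc = train_trigram(words)
    print(p_trigram(cc, tc, "the", "cat", "sat"))   # -> 1.0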

One possibility is to partition the history of the text using decision trees (e.g. with ID3 or CART). Since decision trees are unable to handle two or more independent features without duplicating subtrees, I wonder whether a restricted form of logic - say some "FUZZY-Probabilistic-Datalog without function symbols" :) - has been explored which inductively derives stochastic models.
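To illustrate the duplication problem: if two binary features of the history influence the next word independently, the tree must still test the second feature separately under both outcomes of the first, replicating structure. A toy sketch (both features are made up for the example):

    # Two hypothetical, independent history features.
    def f1(history):
        return history[-1] in {"the", "a"}              # last word is a determiner

    def f2(history):
        return any(w.endswith("ed") for w in history)   # a past-tense form occurred

    def tree_partition(history):
        """ID3/CART-style tree: 2 independent features force 4 leaves,
        with the f2 test duplicated under both outcomes of f1."""
        if f1(history):
            return "leaf_11" if f2(history) else "leaf_10"
        else:
            return "leaf_01" if f2(history) else "leaf_00"

    print(tree_partition("he walked to the".split()))   # -> leaf_11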


pereira@research.att.com
Fernando Pereira
8th November 1995

I don't know the origin of the folk-pseudo-result that "trigrams are best" among current statistical language models, but it's just not true: 4- and 5-grams perform significantly better than trigrams on some speech recognition tasks (see the results of last year's ARPA NAB tests, especially the AT&T entry, which used 5-grams). But I take it that your real interest is to move beyond n-grams to models with more structure.
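Part of the reason higher-order n-grams can win is smoothing: raw 5-gram counts are far too sparse on their own, so estimates are interpolated with lower orders. A rough sketch (Jelinek-Mercer-style linear interpolation; the weights are placeholders that would be tuned on held-out data):

    from collections import defaultdict

    LAMBDAS = [0.4, 0.3, 0.2, 0.1]   # weights for orders 5, 4, 3, 2 (illustrative)

    def train(words, max_order=5):
        counts = defaultdict(int)
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
        return counts

    def p_interp(counts, history, w, max_order=5):
        """Interpolated P(w | history) over orders 5 down to 2."""
        p = 0.0
        for lam, n in zip(LAMBDAS, range(max_order, 1, -1)):
            if len(history) < n - 1:
                continue                        # not enough context for this order
            ctx = tuple(history[-(n - 1):])
            c_ctx = counts.get(ctx, 0)
            if c_ctx:
                p += lam * counts.get(ctx + (w,), 0) / c_ctx
        return p

    counts = train("a b c d e a b c d f".split())
    print(p_interp(counts, "a b c d".split(), "e"))   # -> 0.5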

A first question you may want to consider is whether the applications you are interested in require models that output probability estimates for configurations/analyses, or instead just need to make some decision (e.g. does a noun phrase start/end between these two consecutive words?). In the latter case, you don't need a probabilistic model (although one may still be worth considering), and existing concept-learning frameworks may be applicable. See for example the work of Ray Mooney (UT Austin), specifically his NLP applications, of William Cohen (AT&T), or, further from logic programming, of Eric Brill (Johns Hopkins).
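To make the contrast concrete, here is a toy sketch of the two interfaces (the rule and the numbers are placeholders, not a real model):

    # Two interfaces to the same task (NP boundary detection).

    def p_np_starts_between(prev_word, next_word):
        """Probabilistic model: an estimate of P(boundary | context)."""
        return 0.8 if next_word in {"the", "a", "an"} else 0.1

    def np_starts_between(prev_word, next_word):
        """Pure decision: a learned yes/no rule, as a concept learner
        would produce; no probability estimate is ever formed."""
        return next_word in {"the", "a", "an"}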

Daphne Koller (Stanford) has suggested connections between her earlier work on probabilistic logics and Bayesian networks. I don't know how far that work has gone though.


ries+@cs.cmu.edu
Klaus Ries
8th November 1995

I would like to have a probabilistic model, or at least one that can incorporate or be combined with the acoustic information. No comment from me on trigrams vs. 4- and 5-grams :) Large decision trees also perform much better, but the type of modelling they use seems to be really defective as well.
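The reason I insist on probabilities: in recognition the language model has to be combined with the acoustic model via Bayes' rule, W* = argmax_W P(A|W) P(W), usually in the log domain with an empirically tuned language-model scale. A toy sketch (all numbers are placeholders):

    import math

    LM_SCALE = 10.0   # illustrative; tuned empirically in real systems

    def combined_score(acoustic_logprob, lm_prob):
        # log P(A|W) + scale * log P(W)
        return acoustic_logprob + LM_SCALE * math.log(lm_prob)

    hypotheses = {                         # word string: (log P(A|W), P(W))
        "recognize speech":   (-120.0, 1e-4),
        "wreck a nice beach": (-118.0, 1e-7),
    }
    best = max(hypotheses, key=lambda w: combined_score(*hypotheses[w]))
    print(best)   # the language-model term favours "recognize speech"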

Pereira writes:
See for example the work of Ray Mooney (UT Austin)...

I am aware of this research, but I really want a probabilistic model, and I am willing to pay for that in terms of the expressiveness of the underlying logic. Decision trees have a very simple logic; on the other hand, most of the learning algorithms for decision trees have a reasonable stochastic explanation.
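What I mean by a stochastic explanation: choosing splits by information gain, as ID3 does, amounts to choosing the split whose leaf distributions maximize the likelihood of the training data. A minimal sketch:

    import math
    from collections import Counter

    def entropy(labels):
        if not labels:
            return 0.0
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(labels, split):
        """Gain of a boolean split; maximizing it maximizes the
        training-data log-likelihood of the two leaf distributions."""
        left = [y for y, s in zip(labels, split) if s]
        right = [y for y, s in zip(labels, split) if not s]
        n = len(labels)
        return entropy(labels) - (len(left) / n) * entropy(left) \
                               - (len(right) / n) * entropy(right)

    print(information_gain(["cat", "cat", "dog", "dog"],
                           [True, True, False, False]))   # -> 1.0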

Pereira continues:
Daphne Koller (Stanford) has suggested connections between her earlier work on probabilistic logics and Bayesian networks.

I have just visited her WWW pages at:
http://robotics.stanford.edu/users/daphne/bio.html

They seem to be close to what I'm looking for.
