Session 3 - ProbLog inference

ProbLog’s inference consists of a number of stages and alternative pipelines, which are shown in the following overview.

../_images/pipeline.png

All of these components can be accessed through the ProbLog package in Python. You can install this package from the PyPi repository: pip install problog --user --pre. (ProbLog is under active development, so it is a good idea to use the development version (--pre)).

Mutually exclusive proofs

Before we discuss the standard inference pipeline, let us first look at an example.

digraph probabilistic_graph { rankdir=LR; 1 -> 3 [label="0.1"]; 1 -> 2 [label="0.6"]; 2 -> 5 [label="0.4"]; 2 -> 6 [label="0.3"]; 3 -> 4 [label="0.3"]; 4 -> 5 [label="0.8"]; 5 -> 6 [label="0.2"]; }

0.6::edge(1,2). 0.1::edge(1,3). 0.4::edge(2,5). 0.3::edge(2,6). 0.3::edge(3,4). 0.8::edge(4,5). 0.2::edge(5,6). path(X,Y) :- edge(X,Y). path(X,Y) :- edge(X,Z), path(Z,Y). query(path(1,6)).

Standard pipeline

Stage 1: grounding

Grounding is the transformation from a first-order model to a propositional formula.

Let us revisit the model of the game from the tutorial slides.

0.4 :: heads.
0.3 :: col(1,red); 0.7 :: col(1,blue).
0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
win :- heads, col(_,red).
win :- col(1,C), col(2,C).
query(win).

We can ground this model using the command problog ground game.pl. This will use ProbLog’s builtin Prolog engine to generate a propositional program.

0.2::col(2,red); 0.5::col(2,blue).
0.3::col(1,red); 0.7::col(1,blue).
0.4::heads.
win :- heads, col(1,red).
win :- heads, col(2,red).
win :- col(1,red), col(2,red).
win :- col(1,blue), col(2,blue).
query(win).

This process is query-based, which means that facts that are irrelevant to the query are omitted, for example, the fact col(2,green) does not appear in the ground version. The rules for win are grounded such that all variables have been removed.

Stage 2: breaking cycles

Unlike standard Prolog, ProbLog can deal with program that are defined in a cyclical way.

0.25::stress(1). 0.35::stress(2). 0.2::influences(1,2). 0.2::influences(2,1). smokes(X) :- stress(X). smokes(X) :- influences(Y, X), smokes(Y). query(smokes(1)). query(smokes(2)).

When we ground this program, we get the following program, which is clearly cyclical.

0.25::stress(1). 0.35::stress(2). 0.2::influences(2,1). 0.2::influences(1,2). smokes(1) :- stress(1). smokes(1) :- influences(2,1), smokes(2). smokes(2) :- stress(2). smokes(2) :- influences(1,2), smokes(1). query(smokes(1)). query(smokes(2)).

Before we can compile such a program, we need to remove these cyclic dependencies. In standard logic semantics, the model (smokes(1), smokes(2)) is a valid model for this formula. In logic programming this is not the case.

0.25::stress(1). 0.35::stress(2). 0.2::influences(2,1). 0.2::influences(1,2). smokes(1) :- stress(1). smokes(1) :- influences(2,1), stress(2). smokes(2) :- stress(2). smokes(2) :- influences(1,2), stress(1). query(smokes(1)). query(smokes(2)).

Stage 3: knowledge compilation

ProbLog supports different knowledge compilation techniques, such as Ordered Binary Decision Diagrams (OBDD), Sentential Decision Diagrams (SDD) and smoothed deterministic, decomposable negation normal form (d-DNNF). All of these have the property that disjunctive branches are mutually exclusive, and in that way they address the disjoint sum problem.

On the command line, you can select the technique using the option -k {sdd,ddnnf,bdd}.

Let us look at the compiled versions of this small program:

0.1::burglary.
0.2::earthquake.
alarm :- burglary.
alarm :- earthquake.
0.7::hears_alarm(john).
calls(john) :- alarm, hears_alarm(john).
query(calls(john)).
../_images/alarm_formula.png

The uncompiled formula (note that the OR branches are not mutually exclusive).

../_images/alarm_bdd.png

The formula compiled as a Binary Decision Diagram.

../_images/alarm_sdd.png

The formula compiled as a Sentential Decision Diagram

../_images/alarm_ddnnf.png

The formula compiled into smoothed deterministic, decomposable negation normal form.

Stage 4: Weighted Model Counting

Each of the compiled representations allows for efficient weighted model counting.

Variations

Sampling

ProbLog can be used to sample possible worlds. This can be used to generate data or to perform inference through sampling.

0.4::c(A,a) ; 0.2::c(A,b) ; 0.1::c(A,c); 0.3::stop(A). seq(N, []) :- stop(N). seq(N, [H|T]) :- c(N, H), N1 is N + 1, seq(N1, T). seq(S) :- seq(0, S). query(seq(_)).

Using sampling you could answer queries that have infinite support (which is not possible with the standard inference).

For example, you could query for the probability of obtaining a sequence with an a as the second character.

q :- seq([_,a|_]).
query(q).

You can run this example with problog sample example.pl --estimate -N 10000 (which will generate 10000 samples). (The correct answer is 0.28.)

Sampling also supports continuous distributions. Details can be found in this paper:

  • A. Dries. Declarative data generation with ProbLog. International symposium on information and communication technology, 2015. PDF
  • This type of inference is also used in the related system Distributional Clauses.

    Forward compilation

    One of the bottlenecks in the standard ProbLog pipeline is cycle-breaking. We can avoid this step by using forward compilation based on a logical TP-operator.

  • J. Vlasselaer, G. Van den Broeck, A. Kimmig, W. Meert and L. De Raedt. Anytime inference in probabilistic logic programs with Tp-compilation. Proceedings of 24th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1852 - 1858, 2015. PDF
  • Most Probable Explanation

    ProbLog can also be used to find the most-probable explanation given some evidence.

    0.6::edge(1,2). 0.1::edge(1,3). 0.4::edge(2,5). 0.3::edge(2,6). 0.3::edge(3,4). 0.8::edge(4,5). 0.2::edge(5,6). path(X,Y) :- edge(X,Y). path(X,Y) :- edge(X,Z), path(Z,Y). evidence(path(1,6)).