Probabilistic-Programming Datalog

Probabilistic-Programming Datalog (PPDL) is an extension of Datalog with probabilities proposed in Barani et al., Declarative Statistical Modeling with Datalog, arxiv 2015. We’ll model some of their examples.

Introduction: Clients and Branches

Here’s a ProbLog variant of their motivating example about clients visiting branches of a service provider, replacing the Poisson with a finite distribution to keep support finite.

% input relations (filled arbitrarily) % client(id, branch, avg_num_of_visits) client(a,b1,2). client(b,b2,2). client(c,b2,3). % preferred_client(id, branch, avg_num_of_visits) preferred_client(a,b1,2). preferred_client(b,b2,2). % active(branch) active(b1). % fixed distribution centered around the average M (assumes M >=2) 0.1::dist(C,B,M,MM2) ; 0.2::dist(C,B,M,MM1) ; 0.4::dist(C,B,M,M) ; 0.2::dist(C,B,M,MP1) ; 0.1::dist(C,B,M,MP2) :- MM2 is M-2, MM1 is M-1, MP1 is M+1, MP2 is M+2. % directly modeling the rules; note that they all share the same distribution (which is what the paper wants to achieve) visits(C,B,X) :- client(C,B,L), dist(C,B,L,X). visits(C,B,X) :- preferred_client(C,B,L), dist(C,B,L,X). visits(C,B,X) :- client(C,B,L),active(B), dist(C,B,L,X). query(visits(_,_,_)). % this is a ProbLog variant of the transformed program in the paper % that explicitly makes intermediate steps % 1. define the shared cause for sampling ex_y_visits_flip(C,B,L) :- client(C,B,L). ex_y_visits_flip(C,B,L) :- preferred_client(C,B,L). ex_y_visits_flip(C,B,L) :- client(C,B,L),active(B). % 2. sample (this is implicit in their translation) visits_flip(C,B,X,L) :- ex_y_visits_flip(C,B,L), dist(C,B,L,X). % 3. project out the parameter visits_hat(C,B,X) :- visits_flip(C,B,X,L). query(visits_hat(_,_,_)).

Here’s a variant that illustrates the noisy-or effect in the case where each rule uses its own distribution.

% input relations (filled arbitrarily) % client(id, branch, avg_num_of_visits) client(a,b1,2). client(b,b2,2). client(c,b2,3). % preferred_client(id, branch, avg_num_of_visits) preferred_client(a,b1,2). preferred_client(b,b2,2). % active(branch) active(b1). % fixed distribution centered around the average M (assumes M >=2) 0.1::basic(C,B,M,MM2) ; 0.2::basic(C,B,M,MM1) ; 0.4::basic(C,B,M,M) ; 0.2::basic(C,B,M,MP1) ; 0.1::basic(C,B,M,MP2) :- MM2 is M-2, MM1 is M-1, MP1 is M+1, MP2 is M+2. 0.1::pref(C,B,M,MM2) ; 0.2::pref(C,B,M,MM1) ; 0.4::pref(C,B,M,M) ; 0.2::pref(C,B,M,MP1) ; 0.1::pref(C,B,M,MP2) :- MM2 is M-2, MM1 is M-1, MP1 is M+1, MP2 is M+2. 0.1::active(C,B,M,MM2) ; 0.2::active(C,B,M,MM1) ; 0.4::active(C,B,M,M) ; 0.2::active(C,B,M,MP1) ; 0.1::active(C,B,M,MP2) :- MM2 is M-2, MM1 is M-1, MP1 is M+1, MP2 is M+2. % directly modeling the rules; note that they all share the same distribution (which is what the paper wants to achieve) visits(C,B,X) :- client(C,B,L), basic(C,B,L,X). visits(C,B,X) :- preferred_client(C,B,L), pref(C,B,L,X). visits(C,B,X) :- client(C,B,L),active(B), active(C,B,L,X). query(visits(_,_,_)).

Earthquakes

% the edb from figure 3: different types of properties in cities with different burglary rates home(np1,napa). home(np2,napa). home(yc1,yucaipa). business(np3,napa). business(yc1,yucaipa). city(napa,0.03). city(yucaipa,0.01). % ProbLog encoding of rules in figure 1 0.01::earthquake(C,true); 0.99::earthquake(C,false) :- city(C,R). unit(H,C) :- home(H,C). unit(B,C) :- business(B,C). R::burglary(X,C,true) ; RR::burglary(X,C,false) :- unit(X,C), city(C,R), RR is 1-R. 0.6::trig(X,true) ; 0.4::trig(X,false) :- unit(X,C),earthquake(C,true). 0.9::trig(X,true) ; 0.1::trig(X,false) :- burglary(X,C,true). alarm(X) :- trig(X,true). query(earthquake(_,_)). query(burglary(_,_,_)). query(trig(_,_)). query(alarm(_)). % ProbLog encoding of transformed rules in figure 2 exY_earthquake_flip_2(C,y,0.01) :- city(C,R). unit_hat(H,C) :- home(H,C). unit_hat(B,C) :- business(B,C). exY_burglary_flip_3(X,C,y,R) :- unit_hat(X,C),city(C,R). exY_trig_flip_2(X,y,0.6) :- unit_hat(X,C), earthquake_hat(C,true). exY_trig_flip_2(X,y,0.9) :- burglary_hat(X,C,true). alarm_hat(X) :- trig_hat(X,true). earthquake_hat(C,D) :- earthquake_flip_2(C,D,P). burglary_hat(X,C,Y) :- burglary_flip_3(X,C,Y,R). trig_hat(X,Y) :- trig_flip_2(X,Y,P). P::earthquake_flip_2(C,true,P) ; PP::earthquake_flip_2(C,false,P) :- exY_earthquake_flip_2(C,y,P), PP is 1-P. R::burglary_flip_3(X,C,true,R); RR::burglary_flip_3(X,C,false,R) :- exY_burglary_flip_3(X,C,y,R), RR is 1-R. P::trig_flip_2(X,true,P) ; PP::trig_flip_2(X,false,P) :- exY_trig_flip_2(X,y,P), PP is 1-P. query(earthquake_hat(_,_)). query(burglary_hat(_,_,_)). query(trig_hat(_,_)). query(alarm_hat(_)).

Infinite examples

% use typing to restrict to finite set of queries dom(X) :- between(0,5,X). % program 1 (single infinite model) % r(X,f(X)) :- r(Y,X) with f(X) = 2X with prob 1 % r(0,1) % ProbLog style r(0,1). 1.0::r(X,X2) :- r(_,X), X2 is 2*X, dom(X). % closer to delta-notation in paper s(0,1). s(Y,DY) :- s(_,Y), delta(Y,DY). 1.0::delta(Y,DY) :- dom(Y), DY is 2*Y. query(r(_,_)). query(s(_,_)).
% use typing to restrict to finite set of queries dom(X) :- between(0,5,X). % program 2 (one finite, one infinite model) % same as program 1, but start with (0,1) or (0,0) % ProbLog style 0.5::r(0,0); 0.5::r(0,1). 1.0::r(X,X2) :- r(_,X), X2 is 2*X, dom(X). % closer to delta-notation in paper s(0,F) :- flip(0.5,F). s(Y,DY) :- s(_,Y), delta(Y,DY). 1.0::delta(Y,DY) :- dom(Y), DY is 2*Y. 0.5::flip(0.5,0); 0.5::flip(0.5,1). query(r(_,_)). query(s(_,_)).
% use typing to restrict to finite set of queries dom(X) :- between(0,5,X). % program 3 (all models infinite) % as program 1, but splitting on 2X / 2X+1 % ProbLog style r(0,1). 0.5::r(X,X2); 0.5::r(X,X3) :- r(_,X), X2 is 2*X, X3 is X2+1, dom(X). % closer to delta-notation in paper s(0,1). s(Y,DY) :- s(_,Y), delta(Y,DY). 0.5::delta(Y,DY); 0.5::delta(Y,DY1) :- dom(Y), DY is 2*Y, DY1 is DY+1. query(r(_,_)). query(s(_,_)).
% use typing to restrict to finite set of queries dom(X) :- between(0,5,X). % program 4 (all models infinite except for one) % combining programs 2 and 3 % ProbLog style 0.5::r(0,0); 0.5::r(0,1). 0.5::r(X,X2); 0.5::r(X,X3) :- r(_,X), X2 is 2*X, X3 is X2+1, dom(X). % closer to delta-notation in paper s(0,F) :- flip(0.5,F). 0.5::flip(0.5,0); 0.5::flip(0.5,1). s(Y,DY) :- s(_,Y), delta(Y,DY). 0.5::delta(Y,DY); 0.5::delta(Y,DY1) :- dom(Y), DY is 2*Y, DY1 is DY+1. query(r(_,_)). query(s(_,_)).