Probabilistic Context Free Grammars

While probabilistic context free grammars are a natural fit for the probabilistic logic programming system PRISM, they are somewhat more challenging to model in ProbLog. The reason is that ProbLog uses stochastic memoization, while PCFGs (and PRISM) make a fresh probabilistic choice every time the same non-terminal is encountered. In ProbLog, we thus need to “deactivate” stochastic memoization by adding suitable extra arguments, similar to the time argument in ProbLog models of HMMs. While doing so, we further need to ensure that these identifiers correctly interact with ProbLog’s approach to tackling the disjoint sum problem. In the following, we illustrate two approaches to achieve this, where we assume a PCFG in Chomsky normal form, i.e., non-terminals rewrite to either a single terminal or two non-terminals.

Derivation-based Encoding

The first encoding explicitly tracks leftmost derivations in the grammar via sequences of states, where each state consists of a list of non-terminals already seen (in reverse order as it is easier to add an element to the front of a list) and a list of non-terminals still to be expanded. State transitions are given by the rules with the head of the second list as left hand side, and take place with the probabilities of the corresponding right hand sides. Note that this encoding is clearly suboptimal from an efficiency point of view, as it makes all derivations explicit and thus does not allow sharing of partial results between derivations.

The first example is a simple grammar generating all strings over {a,b}:

% s is the starting non-terminal state([],[s]). % s -> 0.3 :: a | 0.3 :: b | 0.4 :: s s 0.3::state([a|W],S) ; 0.3::state([b|W],S) ; 0.4::state(W,[s,s|S]) :- state(W,[s|S]). word(W) :- reverse(W,WR),state(WR,[]). reverse(L,R) :- reverse(L,[],R). reverse([],L,L). reverse([H|T],A,R) :- reverse(T,[H|A],R). query(word([_])). query(word([_,_])). query(word([_,_,_])).

The second example generates well-formed sequences of brackets, using a for the opening and b for the closing one.

% s is the starting non-terminal state([],[s]). % S -> 0.4 :: O M | 0.6 :: S S 0.4::state(W,[o,m|S]) ; 0.6::state(W,[s,s|S]) :- state(W,[s|S]). % O -> a state([a|W],S) :- state(W,[o|S]). % M -> 0.7 :: b | 0.3 :: S C 0.7::state([b|W],S) ; 0.3::state(W,[s,c|S]) :- state(W,[m|S]). % C -> b state([b|W],S) :- state(W,[c|S]). word(W) :- reverse(W,WR), state(WR,[]). reverse(L,R) :- reverse(L,[],R). reverse([],L,L). reverse([H|T],A,R) :- reverse(T,[H|A],R). query(word([_,_])). query(word([_,_,_,_])). query(word([_,_,_,_,_,_])).

Encoding with Nonterminal-Counters

Instead of carrying around the full derivation for bookkeeping, we can also remember how often we have seen each non-terminal that offers a probabilistic choice of rules. This can be seen as numbering all occurrences of a non-terminal in a leftmost derivation from left to right, using that number (instead of the exact context the non-terminal is expanded in) to distinguish the different uses. Similar in spirit to the well-known CYK parsing algorithm, at each application of a rule rewriting one non-terminal into two non-terminals, we split the word into two (non-empty) parts in all possible ways. This encoding often significantly reduces the number of distinct random variables created and thus improves scalability.

Here’s the first example rewritten to this encoding:

% s -> 0.3 :: a | 0.3 :: b | 0.4 :: s s 0.3::s_to(N,a); 0.3::s_to(N,b); 0.4::s_to(N,ss). s([a],s(N,NN)) :- s_to(N,a), NN is N+1. s([b],s(N,NN)) :- s_to(N,b), NN is N+1. s(L,s(N,NN)) :- s_to(N,ss), NI is N+1, split(L,L1,L2), s(L1,s(NI,NS)), s(L2,s(NS,NN)). % s is the starting symbol, and we start with the counter for s being 0 word(L) :- s(L,s(0,_)). % split(+T,-P,-S) splits the list T into two non-empty sublists P(refix) and S(uffix) % note that T needs to have fixed length for this to terminate split([A,B|C],[A],[B|C]). split([A,B|C],[A|D],E) :- split([B|C],D,E). % we need to fix the length of the list to ensure split/3 has a finite number of solutions query(word([_])). query(word([_,_])). query(word([_,_,_])).

And here is the new encoding for the bracketing example:

% S -> 0.4 :: O M | 0.6 :: S S 0.4::s_to(N,om); 0.6::s_to(N,ss). s(L,s(N,NN),m(M,MM)) :- s_to(N,om), NI is N+1, split(L,L1,L2), o(L1,s(NI,NS),m(M,MS)), m(L2,s(NS,NN),m(MS,MM)). s(L,s(N,NN),m(M,MM)) :- s_to(N,ss), NI is N+1, split(L,L1,L2), s(L1,s(NI,NS),m(M,MS)), s(L2,s(NS,NN),m(MS,MM)). % O -> a o([a],s(N,N),m(M,M)). % M -> 0.7 :: b | 0.3 :: S C 0.7::m_to(N,b); 0.3::m_to(N,sc). m([b],s(N,N),m(M,MM)) :- m_to(M,b),MM is M+1. m(L,s(N,NN),m(M,MM)) :- m_to(M,sc), MI is M+1, split(L,L1,L2), s(L1,s(N,NS),m(MI,MS)), c(L2,s(NS,NN),m(MS,MM)). % C -> b c([b],s(N,N),m(M,M)). % s is the starting symbol, and we start with the counters for both s and m 0 word(L) :- s(L,s(0,_),m(0,_)). % split(+T,-P,-S) splits the list T into two non-empty sublists P(refix) and S(uffix) % note that T needs to have fixed length for this to terminate split([A,B|C],[A],[B|C]). split([A,B|C],[A|D],E) :- split([B|C],D,E). % we need to fix the length of the list to ensure split/3 has a finite number of solutions query(word([_,_])). query(word([_,_,_,_])). query(word([_,_,_,_,_,_])).

As explicitly incorporating counters into the rules as above is cumbersome, here’s a general purpose encoding of the approach that allows for easy exchange of the grammar. It also adds a predicate tree/2 that generates a nested term representing a parse tree.

%%% model-specific part % - one ground fact for start/1 giving the starting symbol % - ground facts for nonterminal/1 listing all nonterminal symbols % - for each nonterminal, an AD defining its rules, with a shared variable as last argument, where % rule(X,Y,Z,_) is a rule X -> Y Z with nonterminals Y and Z % rule(X,A,_) is a rule X -> a with terminal "a" % % S -> 0.4 :: O M | 0.6 :: S S % O -> a % M -> 0.7 :: b | 0.3 :: S C % C -> b start(s). nonterminal(s). nonterminal(m). nonterminal(o). nonterminal(c). 0.4::rule(s,o,m,T); 0.6::rule(s,s,s,T). rule(o,a,_). 0.7::rule(m,b,T); 0.3::rule(m,s,c,T). rule(c,b,_). % we need to fix the length of the list to ensure split/3 has a finite number of solutions query(word([_,_,_,_])). query(tree([a,b,a,b,a,b],_)). %%% model-independent part % X -> Y Z parse(W,X,CIn,COut) :- get_counter(X,CIn,N), rule(X,Y,Z,N), increment_counter(X,CIn,CC), split(W,W1,W2), parse(W1,Y,CC,CCC), parse(W2,Z,CCC,COut). % X -> a parse([A],X,CIn,COut) :- get_counter(X,CIn,N), rule(X,A,N), increment_counter(X,CIn,COut). word(W) :- findall(X,nonterminal(X),NTs), init_counter(NTs,Counter), start(S), parse(W,S,Counter,_). % building parse tree tree(W,T) :- findall(X,nonterminal(X),NTs), init_counter(NTs,Counter), start(S), parse_tree(W,S,Counter,_,T). % X -> Y Z parse_tree(W,X,CIn,COut,Tree) :- Tree =.. [X,Left,Right], get_counter(X,CIn,N), rule(X,Y,Z,N), increment_counter(X,CIn,CC), split(W,W1,W2), parse_tree(W1,Y,CC,CCC,Left), parse_tree(W2,Z,CCC,COut,Right). % X -> a parse_tree([A],X,CIn,COut,Tree) :- Tree =.. [X,A], get_counter(X,CIn,N), rule(X,A,N), increment_counter(X,CIn,COut). %%% auxiliaries % split(+T,-P,-S) splits the (fixed length) list T into two non-empty sublists P(refix) and S(uffix) % note that T needs to have fixed length for this to terminate split([A,B|C],[A],[B|C]). split([A,B|C],[A|D],E) :- split([B|C],D,E). get_counter(X,Cs,N) :- member(count(X,N),Cs). member(A,[A|_]). member(A,[H|T]) :- A \= H, member(A,T). increment_counter(X,[count(X,N)|Cs],[count(X,NN)|Cs]) :- NN is N+1. increment_counter(X,[count(Y,N)|Cs],[count(Y,N)|NCs]) :- X \== Y, increment_counter(X,Cs,NCs). init_counter([],[]). init_counter([H|T],[count(H,0)|C]) :- init_counter(T,C).