Parameter Learning From Queries
Go to the directory ~/yap-6/packages/ProbLog/problog_examples/ and start YAP. Then type :- [learn_graph]. This query loads the example file below. It contains the probabilistic graph from above, but the probabilities are unknown.
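For concreteness, a minimal session could look as follows (assuming the YAP executable is named yap and is on your PATH):

$ cd ~/yap-6/packages/ProbLog/problog_examples/
$ yap
?- [learn_graph].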
t(0.9)::dir_edge(1,2).
t(0.8)::dir_edge(2,3).
t(0.6)::dir_edge(3,4).
t(0.7)::dir_edge(1,6).
t(0.5)::dir_edge(2,6).
t(0.4)::dir_edge(6,5).
t(0.7)::dir_edge(5,3).
t(0.2)::dir_edge(5,4).
Instead of a probability, every fact has a t( ) prefix. The t stands for tunable and indicates that ProbLog should learn the probability. The number between the parentheses is the ground-truth probability; it is ignored by the learning algorithm, and if you do not know the ground truth, you can write t(_). After learning, the ground truth is used to estimate the distance between the learned model parameters and the ground-truth model parameters.
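As a small illustration of the syntax, the following sketch shows the annotation variants side by side; the edges 4→7 and 7→8 do not occur in the example graph and are purely hypothetical:

t(0.9)::dir_edge(1,2).   % tunable; 0.9 is the ground truth, used only for evaluation
t(_)::dir_edge(4,7).     % tunable with unknown ground truth (hypothetical edge)
0.5::dir_edge(7,8).      % fixed probability, left untouched by learning (hypothetical edge)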
Furthermore, the example file contains training examples:
example(1,path(1,2),0.94).
example(2,path(1,3),0.81).
example(3,path(1,4),0.54).
example(4,path(1,5),0.70).
example(5,path(1,6),0.87).
example(6,path(2,3),0.85).
example(7,path(2,4),0.57).
...
Every example has a unique identifier, a query, and a target probability. The first example, for instance, says that the probability of a path from 1 to 2 is 0.94. Instead of queries, you can also give proofs as training examples; they are encoded as the conjunction of the probabilistic facts used in the proof:
...
example(16,(dir_edge(2,3),dir_edge(2,6),dir_edge(6,5)),0.032).
...
This example encodes that there is a path from 2 to 5 via node 6 and that this proof has probability 0.032. Apart from training examples, you can also specify test examples, which are ignored during learning but are used afterwards to check the performance of the model.
...
test_example(21,path(2,1),0.94).
test_example(22,path(3,1),0.81).
test_example(23,path(4,1),0.54).
test_example(24,path(5,1),0.70).
test_example(25,path(6,1),0.87).
...
Please note that the ID namespace is shared with the training examples; you may only reuse an ID if the queries are identical.
After the example file is loaded, you can start the learning algorithm by typing :- do_learning(10)., where 10 is the number of iterations you want to perform. Alternatively, you can use :- do_learning(N,Epsilon)., in which case learning stops after N iterations or as soon as the difference in Mean Squared Error (MSE) between two consecutive iterations becomes smaller than Epsilon, whichever happens first.
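For example (the values 50 and 0.0001 are arbitrary choices for illustration):

?- do_learning(10).           % run exactly 10 iterations
?- do_learning(50, 0.0001).   % stop after 50 iterations, or earlier once the MSE
                              % difference between two iterations drops below 0.0001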
Afterwards you can quit YAP and inspect the folder output, which is created as a subfolder of the directory in which you started YAP (here, ~/yap-6/packages/ProbLog/problog_examples/output); to change this location, use the corresponding flag (see below). The folder contains the file log.dat, which records, in CSV format, the MSE on the training and test set for every iteration, the timings, and some metrics on the gradient. The files factprobs_N.pl contain the fact probabilities after the Nth iteration, and the files predictions_N.pl contain the estimated probabilities for each training and test example; by default, these files are generated only every fifth iteration.
Settings for Learning (learning.yap)
For the learning module, use :- problog_flags. to get an overview of all options, :- set_problog_flag(Name,Value). to change an option, and :- problog_flag(Name,Value). to obtain the current value of a flag.
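As a sketch, the following queries inspect and change a flag; the flag name output_directory is an assumption here, so check the output of :- problog_flags. for the exact names available in your installation:

?- problog_flags.                                    % list all flags with their current values
?- set_problog_flag(output_directory, 'my_output').  % assumed flag name: redirect the output files
?- problog_flag(output_directory, Value).            % look up the current value of the flag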
Continue with the tutorial on parameter learning from partial interpretations (LFI-ProbLog).