Performace of Binary Clauses / Continuations

Appeared in Volume 6/3, August 1993

Keywords: benchmarks.

vbeiren@kiwi.uia.ac.be
Luc Van Beirendonck
8th April 1993

I have read that an alternative WAM architecture is used in BinProlog: Clauses are first transformed into binary clauses and then a continuation is applied at run-time.

P. Tarau (BinProlog) argues that this is considerably faster than the usual approaches [1]. However, B. Demoen and A. Marien (ProLog by BIM) argue that this architecture is slower in most cases than a classical WAM [2].

Has anybody had performance experience with the continuation architecture (in comparison to a classical WAM architecture)?

[1] A compiler and simplified abstract machine for the execution of binary metaprograms, Technical report, Universite de Moncton, Canada

[2] Implementation of Prolog as binary definite programs, Procs Second Russian Conference on Logic Programming, 1991 (LNAI 592)

lindgren@sics.se
Thomas Lindgren
9th April, 1993

The question seems to be how a goal stacking abstract machine compares to an environment stacking one. In other respects, the architectures seem similar.

Warren compares goal stacking and environment stacking [3]. Goal stacking is a bit awkward, since unsafe variables have to be checked, the body code may be hard to optimize, and goals can't be trimmed. He drops goal stacking mainly because of unsafe variables. In the next section, Warren describes how environment stacking and goal stacking can be reconciled:
P :- Q,R,S.

becomes:

P :- Q, Z1.
Z1 :- R, Z2.
Z2 :- S.

Note the similarity to binarization! Demoen does a similar thing in a paper on binary programs.

In Paul Tarau's engine, there is no goal stack; instead, the success continuation is allocated on the heap. Demoen and Marien also followed this approach. There are no unsafe variables (these have disappeared in WAM-style Prologs such as Aquarius and BIM also). The drawback is that these objects end up on the heap instead.

In a discussion last summer on comp.lang.prolog, Tim Lindholm wrote that not having unsafe variables increased heap consumption by 50-100% in Quintus, on a suite of benchmarks. They did not measure runtime effects. Writing entire goals to the heap increases this overhead somewhat.

Bart Demoen noted in the same discussion:
Two more remarks: I want to make a link with BinProlog (by the way, the relation between BinProlog and BIMProlog as not the same as between Nu-Prolog and Mu-Prolog !) or a continuation based implementation of Prolog: continuations on the heap actually means also environments on the heap, so that it in fact puts uninitialized 'permanent' variables on the heap. Still, it trails as much as a WAM implementation with unsafe variables. And it is difficult to get rid of the extra dereference every time the variable is accessed.

(The other remark was that permanent variables can be eliminated by program

transformation, as shown by Proietti & Petterossi in PLILP'91.)

As far as I can see, there are some overheads of a continuation-based implementation that require some thought to solve. On the other hand, Paul Tarau's work shows that perhaps these overheads are less severe than one may think. BinProlog is an emulated implementation, though; the difference may be greater in a 'fully compiled' implementation. Or perhaps not.

With such a conclusive answer, I pause to catch my breath.

[3] D.H.D Warren, An Abstract Prolog Instruction Set, Technical Note 309, October 1983. That's a tenth-year anniversary this year!

tarau@IRO.UMontreal.CA
Paul Tarau
9th April 1993

There's indeed some extra work and heap consumption with an AND-continuation passing Prolog engine. But there's much less 'bureaucracy' than with environment-stack based-WAMs. The simplification is so visible that every one who writes a 'BinWAM' emulator will probably understand it (I still believe this is not true for the WAM).

I agree with the common opinion that in theory the WAM uses resources more carefully then the 'BinWAM' but the constants involved are definitely higher. (Some previous discussions on the constants involved in linear unification algorithms point to similar things).

BinProlog's speed does not come strictly from its differences with standard WAM but from the simplifications that allows you to do a few essential things well: elimination of the interpretation overhead, code reuse, a simplified term representation, less tests, and less questions asked about things that are the same, most of the time.

A description of the implementation can be found by ftp at:
ftp://clement.info.umoncton.ca/BinProlog/UNCOMPRESSED/papers/TechRep92-3/TR92-03.tex

It is accurate except for some benchmarks that are now 10-20% faster.

Thomas Lindgren writes:
P :- Q,R,S.
becomes:

P :- Q, Z1.
Z1 :- R, Z2.
Z2 :- S.

The most important difference is that the WAM hard-wires this transformation as its environment-stack optimization. Obviously if you write out this work in C you can do it faster than in Prolog but you are committed to one particular implementation.

On the other hand, if you play with it at source level (partial evaluation, some unification reordering and variable elimination as shown in the Demoen-Marien papers) or if you consider alternative body decompositions you often can do things that are better. A new EBC transformation and some insights about binary versus standard WAM can be found in Ulrich Neumerkel's PhD thesis (ulrich@mips.complang.tuwien.ac.at) and some of his recent papers.

Thomas Lindgren writes:
In Paul Tarau's engine, there is no goal stack; instead, the success continuation is allocated on the heap. Demoen and Marien also followed this approach. There are no unsafe variables (these have disappeared in WAM-style Prologs such as Aquarius and BIM also). The drawback is that these objects end up on the heap instead.

The fact that something ends up on the heap is not necessarily a drawback. One factor is that with the stack you 'push-and-pop' while with the heap you 'push' N-times and eventually 'pop' once with some form of generational copying garbage collector. Or you put somewhere a 'read barrier' and copy incrementally and then start from the beginning again. However, I know that caching might suffer if this is not implemented carefully.

My current belief is that there is no 'philosophical reason' why the 'BinWAM' is inherently slower than standard WAM. The question is really an empirical one.

Thomas Lindgren writes:
BinProlog is an emulated implementation, though; the difference may be greater in a 'fully compiled' implementation.

I am afraid this might be true, at least until BinProlog goes native or C-expanded.

vbeiren@kiwi.uia.ac.be
Luc Van Beirendonck
13th April 1993

I believe that Paul Tarau's net message can be interpreted as saying that the WAM is designed so that it:

uses as little stack/heap space as possible
runs as fast as possible

The second objective is only realized within the bounds of the first objective.

Continuations seem to reverse the priorities of this objectives: all frame variables (in classical WAM terminology) are pushed on the heap, where they have a longer life time (more space consumption). On the other hand, the absense of a frame stack simplifies the run-time process (there is less 'bureaucracy', and no interpretation overhead).

So, continuations seem to pay space for speed (although the speed profit is questionable, due to the term pushing overhead). I agree with this evolution: space issues are less a problem now-a-days, while performance is still a bottle neck (even on fast machines).

Therefore I have the following question: Do more radical strategies (than continuations) exist that optimize speed at the cost of incremented space?