Why is Interning Atoms and Functors Nonbacktrackable?

Appeared in Volume 7/1, February 1994

Keywords: backtracking.

vanroy@prl.dec.com
Peter Van Roy
15th November 1993

In all Prolog systems I know about, inserting atoms and functors into the symbol table is a nonbacktrackable operation. Why is this? Has anyone studied the trade-off of making this operation backtrackable? Is there a fundamental reason why this operation should be nonbacktrackable? I can imagine a database application that needs lots of fresh atoms, and in this case creating them once might be more efficient. Has anyone done any measurements or back-of-the-envelope calculations to justify this?

tarau@cs.sfu.ca
Paul Tarau
15th November 1993

In BinProlog this is backtrackable but as a 'bloc' of insertions, on (re)loading a file.

BinProlog uses time-stamping to wipe out everything that's more recent than a given interesting return point. More generally, this extends to other forms of state information. For instance the blackboard supporting global logical variables (in upcoming release 2.10) and the dynamic database is also subject to this time-stamped clean-up. This gets rid of annoying reinitialization and database cleanup operations for the user while simplifying the implementation considerably.

Making them atomically backtrackable creates two problems: you have to trail entries in the hashing table, strings, each dynamic clause.

Note that this is not necessarily inefficient or unpleasant to implement. However, making every insertion backtrackable means that your top-level loop is also backtrackable in some sense. This may induce a too drastic change in the user interface and users may be quite sensible to that.

I think that what should be backtrackable is a global design decision. Things to also consider are I/O predicates and dynamic database updates. By the way, suppose you have a flag indicating that update of flags is backtrackable. Should updating such a flag be backtrackable?

lee@munta.cs.mu.oz.au
Lee Naish
16th November 1993

A few years ago we had an honours student do some modifications to NU-Prolog related to the database interface(s). One of them was to make inserting atoms into the symbol table backtrackable (unless they were used in something like assert/1). For database queries involving joins (where many atoms were used, but most were on failed derivations) it saved quite a lot of space and was also significantly faster. The reason it was faster was that it avoided expensive expansions of the symbol table. Unfortunately (I think) his work was never incorporated into the mainstream version of NU-Prolog.

pereira@alta.research.att.com
Fernando Pereira
16th November 1993

You can't make interning fully backtrackable. After all, you may build a clause with the interned symbols, assert it, and backtrack over the whole thing. The clause is still accessible, and it would be very peculiar not to be able to connect new goals or clauses to the asserted clause just because its symbols have suddenly lost their connection with their names, not to mention the havoc this would create in clause indexing.

You could of course 'globally intern' the symbols used in any clause being asserted, just as the terms in such a clause are copied. In other words, you could have a backtrackable symbol table and a nonbacktrackable one, and promote symbols from the backtrackable one to the nonbacktrackable one if a persistent reference to them (eg. via an asserted clause) is created.

Historically, I believe the main reason why no one thought of this is that the most obvious way of building a symbol table is just to have a global hash table, etc. I certainly don't remember any discussion of this issue when DEC-10 Prolog was being written. In any case, before last call optimization was available, most substantial loops (eg. the read loop in Prolog itself) were of the repeat, compute, side-effect, fail kind, so backtrackable interning would hardly make sense.

micha@ecrc.de
Micha Meier
16th November 1993

Peter Van Roy writes:
I can imagine a database application that needs lots of fresh atoms, and in this case creating them once might be more efficient.

As a matter of fact, the opposite is the case. The databases are huge and they can store billions of different strings. You can never dream of storing them all in a Prolog symbol table. In database applications you want atoms or strings which are removed upon backtracking.

You can note that there is something rotten in the usual Prolog symbol table when you start to parallelize it :-)

In my opinion, the Prolog system should recognize when an atom is a long-living one and when it can be disposed of on backtracking. I think ALS Prolog had the concept of interned and uninterned atoms, the latter not being stored in the symbol table at all.

Those atoms that are never unified with other atoms need no symbol table entry in the first place and thus they can be easily removed on backtracking. We have adopted a compromise - we have both atoms and explicit strings. The atoms are inserted into the symbol table and stay there, until garbage collected. Strings are never stored in the symbol table, and by default they are on the global stack where they disappear after backtracking. Strings occurring in compiled or recorded terms are assigned an entry in the symbol table. This scheme requires the user to say which symbol needs a symbol table entry and which not, but in fact no one else knows better if the entry is needed or not. This scheme works fine.

chik@icot.or.jp
Takashi Chikayama
16th November 1993

With nonbacktrackable assert or record, there may be atoms and functors interned in a failing branches but adhering to the asserion/record databases, which makes it difficult.

Peter Van Roy writes:
I can imagine a database application that needs lots of fresh atoms, and in this case creating them once might be more efficient.

Garbage collecting atoms and/or functors would be an easier solution. I remember some Lisp system had a feature called something like "garbage-collect-truely-useless-atoms". It disposes of atoms with no references except from the atom name table, and no setq'ed value, function definition nor property list entries.

jlothian@castle.ed.ac.uk
J Lothian
17th November 1993

I seem to remember that the Perlog persistent Prolog developed at Aberdeen has garbage collection of atoms and functors, presumably using some kind of reference-counting setup. There must be somebody out there who was involved with that work?