Dismal Indexing

Appeared in Volume 8/3, August 1995

Keywords: indexing.

conway@munta.cs.mu.oz.au
Thomas Charles Conway
16th April 1995

I am curious to know why most Prolog implementations have such dismal indexing (usually limited to the top functor, first argument) when indexing has such a big impact on performance? Why don't implementations provide indexing at least on the top level functors of all head arguments?

In the Mercury implementation we merge the multiple clauses of a predicate into a single clause with an explicit disjunction. For example:

p([], 0).
p([X|Xs], Sum) :-
  p(Xs, Sum0), Sum is Sum0 + X.

becomes:

p(H1, H2) :-
  (
   H1 = [], H2 = 0
  ;
   H1 = [X|Xs], H2 = Sum,
   p(Xs, Sum0), Sum is Sum0 + X
  ).

We then do indexing on all disjunctions. This means that where Prolog programmers will often introduce a small auxiliary predicates to get better indexing, Mercury programmers can mostly use an explicit disjunction. This removes the need for the overhead of a predicate call, and often results in better code readability.

pereira@research.att.com
Fernando Pereira
16th April 1995

Thomas Conway writes:
I am curious to know why most Prolog implementations have such dismal indexing (usually limited to the top functor, first argument) when indexing has such a big impact on performance? Why don't implementations provide indexing at least on the top level functors of all head arguments?

Mostly historic precedent. The majority of Prolog programs that were around when David Warren designed the DEC-10 Prolog compiler benefitted enormously from first argument indexing. Of course more sophisticated indexing might have made things even better, but it would also make the compiler and the generated code bigger. In those days, size was a much more serious consideration than it is today. In the WAM, David maintained the same simple indexing scheme (actually, not in the first version). Again, the payoff of 1st-arg indexing for typical programs of the time as great, but the payoff for more thorough indexing is unclear (of course, we were already conditioned in our programming by the existence of 1st-arg indexing in DEC-10 Prolog). Without detailed mode and determinacy inference (which I guess Mercury uses), full indexing would require monster decision graphs (constrast that with the decision graphs for SML patterns, which can be straightforward because they only do 1-way matching without the need for backtracking). Again, given limited resources (of compiler writers and machines to run the code), there seemed to be many more important things to focus on at the time.

ok@goanna.cs.rmit.edu.au
Richard A. O'Keefe
24th April 1995

Thomas Conway writes:

The answers are (1) cost, (2) cost, and (3) cost.

First I should make two points.

not all commercial Prolog systems are so restricted
don't sneer at first-argument indexing: early Prologs had none (several early Prolog systems - not from Edinburgh or Quintus - talked about indexing, but all it meant was that they could find the right predicate in one step).

Without some built-in indexing, you are sunk. Indexing on the principal functor of the first argument is a sufficient building block to synthesise any indexing pattern you want.

There are several things to remember about Prolog.

Dynamic data base changes
Incremental compilation

There are Prolog compilers that wait until they have a whole predicate before compiling it (typically insisting that the whole predicate be one contiguous block in one file), but that precludes some quite sensible coding styles.

These mean that you really want an indexing method that works well in the presence of dynamic insertions (1,2) and deletions (1). (3) Multimodal use of predicates.

Now, suppose you have an N argument flat relation with ground unit clauses (a data base). To get indexing in all cases, you have to deal with 2**N possible modes.

Take as an example:

	mark(Student#, Subject#, Assignment#, Mark)

In our data base, there are about 2600 student ids. At a rough guess, in any one year, there'd be around 20000 marks in this relation. So that's 20000 entries, to be multiplied by 2**14 = 16 indexes required.

Even if we took into account the functional dependency:

	Student# x Subject# x Assignment# -> Mark

there are still 8 reasonable modes, one of which admittedly requires only a trivial index. Do you really want 7 index tables automatically built for a relation with 20000 clauses?

Now, what do commercial data base systems do? It is an important thing about relational data bases that there are no 'mode' declarations; any relation (even a view) can be used in any mode. They therefore have exactly the same indexing problem (simplified by the fact that all their "clauses" are ground units). They require you to explicitly ask for indices to be built.

So a reasonable compromise, which some commercial Prologs offer, is to allow the programmer to specify which arguments and perhaps combinations of arguments are to be indexed. (There used to be a Quintus library package that built multiple single-argument indices given a declaration; by now I would hope that this is built in.)

However, there's one more problem: variables. I suppose one way to cope would be to say "if you declare that a predicate is to be indexed on a certain argument, then that argument must be instantiated in all clauses". At first it was quite tricky working out a way to make first-argument indexing work in the presence of interspersed clauses with variable first arguments. All the obvious ways of doing this go exponential when you try to index on several arguments at once.

Conway continues:
In the Mercury implementation ...

Yes, but Mercury has strict modes (unlike Prolog) and block compilation (unlike Edinburgh Prologs).

Conway continues:
we merge the multiple clauses of a predicate into a single clause with an explicit disjunction

Yes, that's easy. In effect Kliger did the same thing for FCP. I did it for my (unfinished) Prolog-D. It helps a treat if you have compile-time information about modes and block compilation.

conway@munta.cs.mu.oz.au
Thomas Charles Conway
30th April 1995

From my limited experience, it appears that the strict types and modes used by Mercury are well worth the restrictions they impose. Not only do they enable efficient indexing (In many cases, the Mercury implementation is able to achieve indexing that is quicker than switches in C), but they enable the compiler to warn about many errors that often occur and can take ages of tedious debugging to find such as missing or duplicate clauses. (Duplicate clauses are mainly a problem if they cause choice points to hang around.)