
The Occurs Check

Appeared in Volume 6/2, May 1993

Keywords: occurs check.

smk@dcs.ed.ac.uk
Stefan Kahrs
17th December 1992

Matthew Huntbach writes:
I have written programs which rely on the occurs check not happening, since they intentionally build circular structures. What is so wrong with this - after all, it's a standard programming feature in lazy functional languages?

If you allow cycles, you change the term universe. A solution to a goal t=u, where t and u are first-order terms, should be a substitution sigma (a mapping from variables to first-order terms) such that sigma(t)=sigma(u). Because sigma is a first-order substitution, the resulting terms are first-order too.

If you are using unification without the occurs check, then the substitution might not be first-order: the substituted terms can contain cycles. One can express this semantically by adding a fixpoint operator to the term universe.

You are right that it is often useful to have cyclic terms; but they live in this different term universe, and Prolog's =/2 is an incomplete unification there, e.g.

A = f(A), B = f(B), A = B

This conjunction usually fails to terminate, although it should either fail (in the Herbrand universe) or succeed (in the extended universe). It is possible to extend unification with a cycle check to make it terminate even for cyclic terms (some Prologs have such a thing; I even know a programming language which uses unification with cycle check for type inference), but this is less efficient than ordinary =/2. Probably, it is similar in efficiency to unification with the occurs check.
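
For concreteness, here is a sketch of how the three behaviours typically show up at a Prolog top level (details vary between systems; unify_with_occurs_check/2 is the sound unification provided by the draft standard and by several implementations):

?- unify_with_occurs_check(A, f(A)).
% occurs check: fails cleanly

?- A = f(A).
% no occurs check: quietly builds a cyclic binding for A

?- A = f(A), B = f(B), A = B.
% ordinary =/2: usually loops on the final goal;
% a unifier with a cycle check succeeds instead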

torkel@SICS.se
Torkel Franzen
17th December 1992

Matthew Huntbach writes:
I have written programs which rely on the occurs check not happening...

There's nothing wrong. However, it's a good idea, both from a theoretical and a practical point of view, to know what constraint system one is working in. A Prolog without an occurs check, which only partially supports circular term-like structures, promotes the use of obscure programming tricks and widens the gap between theory and practice.

pereira@alice.att.com
Fernando Pereira
17th December 1992

Matthew Huntbach writes:
I have written programs which rely on the occurs check not happening...

If you think of Prolog programs as logic programs where the logic is (a fragment of) first-order logic, then the answers derived without the occurs check are in general unsound. For example, consider the first-order question:

for all x. there exists y. r(x,y) ?= there exists x. r(x, x)

(where ?= means "is the rhs a logical consequence of the lhs?"). The answer is negative: e.g. take r to be the successor relation on the natural numbers. Now convert the question to Prolog by Skolemization. The same question maps to the Prolog question:

r(X, f(X)) ?= r(U, U)

Without the occurs check, this succeeds! Lest you think this is an artificial example of no practical importance, let me say that I have been badly bitten by the lack of an occurs check in several natural-language parsers and generators. Nonetheless, the efficiency reasons that led to Prolog's unification not having the check are pretty compelling, so I do not advocate adding it to Prolog in all cases. A clever compiler might be able to recognize many cases of practical importance in which the check is not needed, and only call the more expensive unification with an occurs check in the remaining cases. Alternatively, careful programmers can do some of that analysis by hand, and put explicit calls to a sound unification when necessary.
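
The same point can be made at a Prolog top level. A sketch, unifying the two terms with =/2 and then with the sound unification predicate (unify_with_occurs_check/2, where available):

?- r(X, f(X)) = r(U, U).
% no occurs check: succeeds with the cyclic binding U = f(U) -
% the unsound 'proof' of: there exists x. r(x,x)

?- unify_with_occurs_check(r(X, f(X)), r(U, U)).
% sound unification: fails, as first-order logic requires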

If you want to program with cyclic terms, you have gone outside bare first-order logic to a system in which terms are interpreted in a different way. The theoretical framework and practical value of that alternative were established with the Prolog II work at Marseille, and there's certainly a sound way of thinking about Prolog with cyclic terms. However, there is no free lunch. Sound and terminating unification of cyclic terms can be more costly than unsound Prolog unification (the reverse can also happen for acyclic terms with a lot of reentrancy, though) and that cost is incurred at every unification in a program.

miller@cs.rochester.edu
18th December 1992

The RHET language (a Prolog deviant) does allow you to turn the occurs check on and off, and shows only about a 15% degradation when it is on. However, this is an interpreted system with a very general unification operator. Current work on a compiled replacement leads me to believe that compiled, specialized unification would indeed degrade many-fold if it (always) needed to do an occurs check, but it is unclear whether this is in fact the case. The compiler may be able to detect that an occurs check provably isn't needed for a substantial subset of the cases that occur in day-to-day Prolog-style code. I do not yet have any numbers to back up this opinion. However, looking at the problem more abstractly...

Graph unification that does not detect infinite unifiability can run in linear time, O(E+V), where E and V are the numbers of edges and vertices in the graph. (Prolog terms can be represented as graphs, obviously.) See:

M.S. Paterson and M.N. Wegman, "Linear unification," J. Comput. Syst. Sci., vol. 16, pp. 158-167, Apr. 1978

Graph unification that does detect infinite unifiability can run in just over linear time (specifically O(alpha E + V), where alpha is the inverse Ackermann function). This result is mentioned in the above paper, but an example is given in:

J.S. Vitter and R.A. Simons, "New Classes for Parallel Complexity: A Study of Unification and Other Complete Problems for P", IEEE Trans. on Computers, Vol. C-35, No. 5, May 1986, pp. 403-417.

Note that this algorithm does not come up with the substitution; it just detects whether the terms are unifiable, infinitely unifiable, or neither.

Essentially what this tells us is that detecting infinite unifiability and coming up with a substitution (stronger than just an occurs check) is only "slightly worse" than not doing so. For practical problems, the inverse Ackermann function is a constant. However, this does not address the all-important "constant factor" by which we really live our lives: even if I gave you a polynomial function for reducing NP-complete problems to P problems, it would be of tremendous theoretical importance, but of little practical importance if it looked like (((10**10)**10)**10)n**312875. So it is really in this constant factor that the occurs check becomes important, and that is where we are hard pressed to look to theory for any concrete answers (it's easy to show that checking that a substitution is valid is linear, but it's hard to show any sort of constant factors for anything, since that becomes architecture-dependent). We can only come up with experiments, and say things like "so far, it looks pretty bad to me".

If you want to argue against the efficiency position, it really isn't an argument to say that in this particular program, it makes little difference. You have to come up with an algorithm for doing compiled unification and infinite unification that is at least as efficient as the best known work for doing unification, on a particular architecture.

Personally, I feel compelled by the early Prolog results, so I've never pushed hard to even try to do infinite unification efficiently.

debray@majestix.cs.uoregon.edu
Saumya K. Debray
17th December 1992

Fernando Pereira writes:
Nonetheless, the efficiency reasons that led to Prolog's unification not having the check are pretty compelling, so I do not advocate adding it to Prolog in all cases.

While I believe this statement, it would be interesting to actually see some numbers (preferably for programs other than nrev/2) showing how much more expensive unification becomes with the occurs check. Do the implementors of Sepia, NU-Prolog, and other systems that provide both kinds of unification have any numbers?

pereira@alice.att.com
Fernando Pereira
18th December 1992

Let me refine and qualify my assertion. In typical programs that I have written or read, fairly large terms are passed around. With a naive implementation of the occurs check, each such term has to be fully examined whenever it is passed as an actual parameter to a predicate. For instance, append/3 becomes O(N^2) rather than O(N). However, in many of these cases the complex term is being unified with a variable occurring only once in the head, or with a term containing variables which occur only once in the head. The same situation arises with variable instances which can be shown by global flow analysis to be "new" (that is, not occurring anywhere else in the current resolvent). In those cases, the occurs check is not needed. This is just scratching the surface; much more sophisticated analyses have been published.
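
A minimal sketch of the append/3 claim, using the textbook definition. A naive occurs check scans a term before binding a variable to it, so each recursion step pays O(N) and the whole call pays O(N^2):

% Textbook append/3. Under a naive occurs check, binding Xs to the
% tail of the input list requires scanning that tail at every step:
% O(N) work per step, O(N^2) for the whole call.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).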

Except for the lack of an occurs check, the original pure Prolog is sound, and many interesting deductive tasks can be coded in pure Prolog. Furthermore, extralogical features like assert can be used in constrained logical ways, e.g. for lemma storage, and the resulting programs would be sound with respect to well-defined logical systems if they were executed with the occurs check. The problem with the lack of an occurs check is that it can hit you in unforeseen and mysterious ways, while the nonlogical parts of Prolog hit you only if you use them unwisely.

torkel@SICS.se
Torkel Franzen
17th December 1992

This is putting it too strongly, since such answers are perfectly sound relative to the interpretation of a (pure) program as the union of the clauses and the identities true in a suitable universe of rational trees. The logic involved is still first order logic, it's just that we have a somewhat stronger interpretation of programs. Generally speaking, these distinctions are much simpler if we think in terms of wedding different constraint theories to Horn clauses, as originally suggested by Colmerauer.

pereira@alice.att.com
Fernando Pereira
18th December 1992

No, it is not too strong. I think you are confusing deduction in an interpreted system (e.g. Horn clauses over the domain of finite trees, or of rational trees) with deduction that yields consequences true in any model, that is, valid consequences. If I use resolution to show that P |- there exists an x. Q (by deriving the empty clause from P /\ for all x. not Q), I have shown P |= there exists an x. Q; that is, for any model M such that M |= P there is an assignment a for x such that M, a |= Q. In particular (going back to my example), if I conclude by resolution that P |- there exists an x. r(x,x), then in every model of P there is an element that stands in the r relation with itself.

With P = for all u. r(u, f(u)), from which there exists an x. r(x,x) follows by resolution without the occurs check, the model in which r is the successor relation on N and f the successor function refutes there exists an x. r(x,x). Thus resolution without the occurs check is unsound as a method for deriving valid consequences in (uninterpreted) first-order logic. It could still be sound as a method for deriving true consequences over some class of models, e.g. rational trees. But that is a very different thing.

When using logic for knowledge representation, we often do not know what models to consider; rather, we are interested in characterizing logically the class of models of interest. Thus we need to work with valid consequence, not just true consequence in particular models. On the other hand, if we are interested in using logic to define relations over particular structures, e.g. rational trees, then resolution modulo a particular equational theory can be the right way to go. Still, it is important not to confuse those two alternatives.

peter@infko.uni-koblenz.de
Peter Baumgartner
18th December 1992

Saumya K. Debray writes:
While I believe this statement, it would be interesting to actually see some numbers

That's a topic we are interested in, too. Often, people working on the sound omission of the occurs check write that carrying out sound unification is always prohibitively expensive. But has anybody investigated this systematically?

In order to support the efficiency argument for omitting the occurs check, we carried out some small experiments with SEPIA-Prolog. We ran reverse/2 on a list of about 1000 terms. The result was astonishing (at least to us): performance degrades by only about 10-20% when using sound unification. It was necessary to work on (moderately) large terms to notice a difference at all.
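
(The posting does not say which definition of reverse/2 was used; presumably it was the usual naive reverse, whose cost is dominated by head unifications:)

% Naive reverse - the classic unification-heavy micro-benchmark.
reverse([], []).
reverse([X|Xs], R) :- reverse(Xs, R0), append(R0, [X], R).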

Perhaps our observation is not true in general, and the result is due to the clever implementation of SEPIA. Does anyone know about the SEPIA implementation? Are there any better examples supporting the efficiency argument?

jamie@cs.sfu.ca
Jamie Andrews
18th December 1992

Torkel Franzen writes:
This is putting it too strongly, since such answers are perfectly sound relative to the interpretation of a (pure) program as the union of the clauses and the identities true in a suitable universe of rational trees.

I was going to object to this, when I realized what Torkel meant: any answers that you do get out of systems without occurs check are sound relative to the corresponding theory of rational trees.

However, there are still completeness problems, with cases like:

X=f(X), Y=f(Y), X=Y

where such systems typically loop instead of succeeding, and even cases like:

X=f(X,g(X)), Y=f(Y,h(Y)), X=Y

where such systems may loop where they should fail (unfolding the two bindings eventually forces g(...) = h(...), a functor clash). My guess would be that applications which rely on the absence of the occurs check to construct cyclic structures would be likely to run into these problems when comparing terms. On the other hand, I've never written such an application, so I could be wrong.

Another point that comes out of this is that there isn't just one "unification without occurs check"; we have to be more precise about exactly what algorithm is being used before we can tell how it's going to behave. I'm not even convinced that all such straightforward unification algorithms would be sound with respect to rational trees. I seem to dimly remember someone having characterized the precise theory solved by some unification without the occurs check. Does anyone remember the reference?

Also, I sometimes tell people that Martelli and Montanari's and Paterson and Wegman's linear unification algorithms are not used because either (a) the constant factor is too high or (b) the space requirements are too high. Can anyone point to something comparing the time and space requirements of these algorithms? As I recall, M&M and P&W only proved that their algorithms were linear.

ludemann@quintus.com
Peter Ludemann
18th December 1992

Saumya K. Debray writes:
While I believe this statement, it would be interesting to actually see some numbers

An extreme example is append/3 for difference lists (no flames on the terminology, please!). Without the occurs check, the cost is O(1); with the occurs check, the cost is O(N) in the size of the terms (in other words, it has the same cost as regular append/3). Normally, the programmer knows that the occurs check isn't needed, so this is a big win.
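
A sketch of the predicate in question, in the usual Front-Back difference-list notation (the predicate name is ours):

% Difference-list append: a single head unification, no traversal.
% Without the occurs check this is O(1); a naive occurs check must
% scan the list being bound, making it O(N) like ordinary append/3.
app_dl(A-B, B-C, A-C).

% ?- app_dl([1,2|T1]-T1, [3,4|T2]-T2, L-[]).
% L = [1,2,3,4]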

Without the occurs check, "infinite" or "rational" terms can be produced. I've seen one practical use for them: representing the fixpoint of a recursive program as a term. No doubt, there are other uses.

pereira@alice.att.com
Fernando Pereira
19th December 1992

Jamie Andrews writes:
I seem to dimly remember someone having characterized the precise theory solved by some unification without occurs check. Does anyone remember the reference?

Joxan Jaffar and Peter J. Stuckey, Semantics of Infinite Tree Logic Programming, Theoretical Computer Science, Vol. 46, pp. 141-158, 1986.

Jamie Andrews continues:
Also, I sometimes tell people that Martelli and Montanari's and Paterson and Wegman's linear unification algorithms are not used because either (a) the constant factor is too high or (b) the space requirements are too high.

(a) is a problem, but the main problem is that those algorithms are linear in the sum of the sizes of the two terms being unified, while (in most practical cases) the Robinson unification algorithm without the occurs check is linear in the size of the smaller term.

Notice that most unifications in Prolog involve some arbitrarily large constructed term and a clause head, which is typically small and of bounded size. I say above "in most practical cases" because the Robinson algorithm without the occurs check is linear in the size of the tree unfolding of the smaller term, but the tree unfolding of a term may be exponential in the size of (the graph representation of) the term, and thus in the number of variable substitution steps needed to construct the term.

In contrast, the linear unification algorithms, as well as the most efficient algorithms for rational term unification (all based on some version of UNION-FIND) keep track of which subterms they have visited and don't revisit them.

Another source of overhead for the rational term unification algorithms is that in general they side-effect not just the leaves (variable cells) but also internal (complex term) nodes. These side-effects must be recorded so that they can be undone on backtracking.

torkel@SICS.se
Torkel Franzen
18th December 1992

Fernando Pereira writes:
No, it is not too strong. I think you are confusing deduction in an interpreted system with deduction that yields consequences true in any model, that is, valid consequences.

There is no confusion involved. It is merely a matter of "valid consequences of what". The answers yielded by a Prolog system using unification without the occurs check are consequences in first order logic of the clauses of the program together with the first order identity theory of rational trees.

dumant@prl.dec.com
Bruno Dumant
21st December 1992

Saumya Debray writes:
While I believe this statement, it would be interesting to actually see some numbers ...

Some benchmarks were published in a paper about the architecture design of a RISC processor for Prolog, in which the occurs check is always performed if necessary. The design uses the WAM, and takes advantage of the fact that only get_value instructions really need to call the occurs check routine. According to the results, only 5% of the time is spent performing the occurs check! The benchmarks include quite large programs, of up to 5600 lines.

With some analysis (a simple groundness analysis, or more sophisticated analysis schemes), the overhead of the occurs check on a compiled Prolog can be negligible.
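
To make the get_value point concrete, here is a hedged illustration (our example, not from the paper): only a repeated head variable compiles to get_value, which unifies two possibly instantiated terms and so may need the check; a first occurrence compiles to get_variable, which merely binds a fresh variable and can never create a cycle.

% p(X, X): the second X compiles to get_value; this is the one spot
% where an occurs check may be needed, unless analysis shows an
% argument is ground (or otherwise safe) at every call site.
p(X, X).

% q(X, Y): both head variables are first occurrences (get_variable),
% so no occurs check is ever needed for this head.
q(X, Y).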

Reference:
P. Dreussen, W. Rosentiel, K.E. Schauser and J. Wedeck, Architecture Design of a RISC Processor for Prolog, in Euromicro '89, 1989.

smolka@dfki.uni-sb.de
Gert Smolka
21st December 1992

I seriously doubt that rational tree unification pays any significant efficiency penalty in practice over Edinburgh-style unification without the occurs check. As Torkel has mentioned before, unification in SICStus is rational tree unification, and SICStus seems to be rather efficient.

Isn't it strange that the forthcoming Prolog standard will explicitly accept implementations of unification that may not terminate on certain terms? I certainly understand that finite tree unification is needed for some interesting applications. However, given the efficiency and the clean logical status of rational tree unification, I would expect rational tree unification to be taken as the default and finite tree unification to be accommodated as a built-in predicate.
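
A minimal sketch of what such a built-in could look like, assuming rational tree unification as the default =/2 and an acyclicity test acyclic_term/1 (such a test exists in some systems; the names here are illustrative):

% Finite (Herbrand) unification on top of rational tree unification:
% unify first, then fail if the result contains a cycle.
unify_finite(X, Y) :- X = Y, acyclic_term(X).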

By the way, was it Edinburgh or Marseille which first dropped the occurs check?

pereira@alice.att.com
Fernando Pereira
20th December 1992

Torkel Franzen writes:
It is merely a matter of "valid consequences of what". The answers yielded by a Prolog system using unification without occurs check are consequences in first order logic of the clauses of the program together with the first order identity theory of rational trees.

I agree. Adding the first-order theory of rational trees to the left of the |- is precisely restricting the class of models under consideration from arbitrary first-order models to rational tree models. For some applications this is fine, but as I pointed out earlier, for many others it is not. In any case, one needs to be very clear about what one is assuming to the left of the |-.

torkel@SICS.se
Torkel Franzen
20th December 1992

This is not so, since the class of rational tree models cannot be characterized by any set of first order axioms, any more than the finite tree models.

In a more perspicuous description of Prolog-style Horn clause programming, I think it's a good idea to separate the constraint system from the resolution mechanism. Whatever the constraint system used, (a simplified) Prolog delivers answers to a question A(x) in the form C(x,y), where C(x,y) is a conjunction of constraints. It always holds that:

(1) C(x,y) -> A(x)

is a logical consequence of the program, read as a set of Horn clauses. Of course (1) does not imply that there is an x such that A(x); for this to follow, the constraint theory must be used.

mj@cs.brown.edu
Mark Johnson
21st December 1992

This is a nice way of viewing things, but I don't know any Prolog that gives you all of the constraints C. Consider the program:

p(a) :- q(X,X).
q(Y, f(Y)).

SICStus does the following:

| ?- p(X).
X = a ? ;
no
| ?- q(X,X). 
X = f(f(f(f(f(f(f(f(f(f(f(f(f(f ...

That is, SICStus only returns constraints related to variables appearing in the goal. But constraints that have only cyclic solutions may be associated with other variables.

On a related issue, are there systems that permit user-specified constraints, perhaps by means of some kind of escape system built into the unifier?

torkel@SICS.se
Torkel Franzen
21st December 1992

Mark Johnson writes:
This is a nice way of viewing things, but I don't know any Prolog that gives you all of the constraints C.

True; but this standard behavior can be understood from the point of view of the model. In the theoretical model, your clauses have a canonical form:

p(X) :- X=a, q(Y,Y).
q(X,Y) :- Y=f(X).

and the Prolog execution ends with the goal X=a, Y=f(Y). Since this is a conjunction of constraints, there is nothing more for the resolution mechanism to do, and the constraint solver decides whether this is a solution to be presented to the user. However, it is in general a difficult task for the constraint solver to massage a solution into a perspicuous, human-readable form. In the present implementation of Prolog, a solution of the form C(X) & C'(Y) to a query A(X) is reduced to a simpler equivalent form C(X) - equivalent in the sense that "there exists Y (C(X) & C'(Y))" is equivalent in the constraint system to C(X). This, of course, is a good thing if the system doesn't have routines to handle the printing of C'(Y).

However, your example doesn't have anything to do with circular terms in particular. You might as well define q by the clause q(b,b).

A clarification: if we take "logical consequence of the program" in my original formulation to mean consequence in predicate logic with identity, then q(b,b) cannot be used in the example. (I prefer to formulate the soundness property in predicate logic without identity.) However, q might be defined by q(X,Y) :- X=Y*Y with constraints over the real numbers.

pereira@alice.att.com
Fernando Pereira
20th December 1992

Torkel Franzen writes:
This is not so, since the class of rational tree models cannot be characterized by any set of first order axioms, any more than the finite tree models.

It cannot be characterized by any finite set of first-order formulas. Notice that I used "theory", not "program", in my note. The term "theory" in logic means any deductively closed set of formulas. In fact, for a finite signature you can characterize the finite tree models by a finite set of equality and inequality formulas - that's precisely what the Clark completion uses. I don't have the relevant papers on rational tree models handy, but I believe the same is true for them (papers by Joxan Jaffar and Michael Maher?).

Anyway, this is just fine print, the main points have already been made clear:

  1. the occurs check is needed for sound deduction in pure first-order logic;

  2. unification without the occurs check is sound for deduction over rational tree models;

  3. Prolog unification (by this I mean what was in Prolog originally) can construct rational terms, but may not terminate when unifying them;

  4. there are various unification algorithms for rational trees, the asymptotically most efficient are all based on UNION-FIND (DFA equivalence).

torkel@SICS.se
Torkel Franzen
21st December 1992

Fernando Pereira writes:
It cannot be characterized by any finite set of first-order formulas.

Finiteness is not relevant here. It is a matter of what can be expressed in first order logic. By compactness, any set of first order formulas which has a model with the Herbrand universe {0, s(0), s(s(0)), ...} as its domain will have countable models containing elements different from (the interpretations of) the terms 0, s(0), etc. Therefore it is not correct to say that adding the (complete) first order identity theory of finite trees to the left of the |- is precisely restricting the class of models under consideration from arbitrary first order models to finite tree models. Similarly for rational trees. Such a restriction cannot be achieved in first order logic.

pereira@alice.att.com
Fernando Pereira
21st December 1992

Gert Smolka writes:
I seriously doubt that rational tree unification pays any significant efficiency penalties in practise over Edinburgh-style unification without occurs check.

I don't know what "rather efficient" means without actual numbers to back it up. But all the rational unification algorithms I know of need to make and trail more assignments, and chase more pointers, than the original Prolog unification without the occurs check. I would be happy if it were shown, over a well-chosen mix of programs, that the overhead in question is negligible. But I've not seen such a demonstration.

Gert Smolka continues:
However, given the efficiency and the clean logical status of rational tree unification, I would expect that rational tree unification is taken as default and finite tree unification is accommodated as a built-in predicate

The reason is that the efficiency issues mentioned above have not been addressed to everyone's satisfaction.

Gert Smolka continues:
By the way, was it Edinburgh or Marseille which first dropped the occurs check?

Marseille. A very detailed and interesting account of the design history of the first Prolog, written by Alain Colmerauer and Philippe Roussel, will appear in the forthcoming proceedings of the History of Programming Languages Conference (April 1993). Their choice was based on the considerations that I have mentioned before. However, Alain eventually changed his mind, as shown by the exclusive use of rational unification in Prolog II. Nevertheless, the efficiency tradeoffs might still be different for fast compiled Prologs than for interpreted systems like the original Prolog and Prolog II: when the cost of clause chasing goes down, the relative cost of term chasing goes up.

mj@cs.brown.edu
Mark Johnson
21st December 1992

Torkel Franzen writes:
However, your example doesn't have anything to do with circular terms in particular. You might as well define q by the clause q(b,b).

Only that the constraints that q introduces, viz. X=f(X), are not satisfiable with respect to the standard equality theory. The answer SICStus Prolog returns gives the user no indication of this. By the way, I don't think this is a problem with SICStus, only that it doesn't solve all the problems involved with cyclic solutions.

I'm interested in this approach. I know of work by Hoefeld and Smolka, "Definite Relations over Constraint Languages", and by Chen and D.S. Warren, "C-Logic of Complex Objects" and "Abductive Logic Programming", which describes the decomposition into relational and constraint components. Any other pointers to relevant literature?

jamie@cs.sfu.ca
Jamie Andrews
23rd December 1992

Another relevant paper is "Constraint Simplification Rules", by Thom Fruehwirth (thom@ecrc.de), ECRC tech report 92-18. He describes a framework of delay declarations and special clause-like rules which are used by the system to store and simplify constraints. It looks like a very general system.

torkel@SICS.se
Torkel Franzen
21st December 1992

The theoretical treatment of constraint LP has been solidifying lately, and there are several related formal frameworks. I think a "standard treatment" may well soon emerge. My remarks have been prompted by work here at SICS on AKL (the Andorra Kernel Language), which has a reasonable concurrent constraint LP subset. The theory of that subset is quite close in many ways to the work of Hoefeld and Smolka, but there are some points of difference in the formal treatment, partly having to do with concurrency. A paper on this has been in the pipeline for a long time and may or may not eventually emerge; those who are fanatically interested in this sort of thing can get a copy from me.

smolka@dfki.uni-sb.de
Gert Smolka
22nd December 1992

A recent paper by Ralf Treinen and myself in JICSLP'92 studies incremental decision methods for satisfiability, entailment and negation for rational trees (actually, for a proper generalization of Colmerauer's rational tree system that also accommodates features). The full paper describes a WAM-like abstract machine providing for an efficient incremental implementation of the decision methods.

Both papers can be obtained by anonymous FTP from:
ftp://duck.dfki.uni-sb.de/pub/papers

Gert Smolka and Ralf Treinen, Records for Logic Programming, In Proc. of the 1992 Joint Int. Conf. and Symposium on LP, 1992, The MIT Press, Cambridge, Mass.

Gert Smolka and Ralf Treinen, Records for Logic Programming, German Research Center for Artificial Intelligence (DFKI), 1992, Research Report, RR-92-23, Stuhlsatzenhausweg 3, 6600 Saarbrucken 11, Germany
