Strong Typing and Meta-programming

Appeared in Volume 8/2, May 1995

Keywords: typing, meta-level.

vanroy@dfki.uni-sb.de
Peter Van Roy
23rd November 1994

Several posters have been suggesting that Prolog would be better if it were strongly typed, without any real evidence to support this assertion.

Here's a strong argument against strong typing.

In Prolog, programs and terms have exactly the same syntax, and it is easy to manipulate programs as data. This is second nature to Prolog programmers, and most good ones have used preprocessors and program generators to good advantage. Prolog supports this well, in particular, the ==/2 comparison effectively treats variables as atoms, so that terms representing programs can be manipulated with ease.

So here's a question for advocates of strong typing: how can I manipulate programs as data in a strongly typed language in a way that is as simple and natural as Prolog's?

pereira@alta.research.att.com
Fernando Pereira
24th November 1994

This old chestnut is popular among Lisp and Prolog afficionados. But in any decent strongly-typed language (eg. SML, Modula-3) you can define datatypes for the abstract syntax of the language and write syntax-driven program manipulation programs. Pattern-matching helps in writing such programs, thus SML is an especially good candidate for this. The other thing you need is an easily-callable parser for the language returning abstract syntax, the analogue of Prolog's read/1, and a way of compiling from abstract syntax analogous to Prolog's assert/1. That is, you need an open compiler, like the one in the latest version of SML/NJ. This is the short answer.

The longer answer is that I've never seen or written myself more obscure and buggy programs than those that manipulate Prolog terms intended to represent Prolog programs. The promiscuous use of unbound logical variables to represent program variables leads to complicated case analysis using var/1 and cut, and to the danger of accidentally binding those variables. Even the trivial problem of translating DCG rules to Prolog has several of those pitfalls, and almost all such translators are or have been buggy. Some of the most clever but most obnoxiously unmaintainable parts of Prolog compilers I have seen involve taking advantage of variable promiscuity to propagate variable occurrence information. And so on.

Sure, the identification of terms and programs is seductive, and leads to very concise hacks. But it leads to unmaintainable, unsafe code.

And now for a final heresy. I've written lots of Prolog, and lots of C. Writing the Prolog code has often been more gratifying, because something is up and running much sooner. But the C programs tend to have far fewer runtime bugs once they compile and link, because even the lame C type system catches many type bugs that Prolog lets by.

vanroy@dfki.uni-sb.de
Peter Van Roy
24th November 1994

Fernando Pereira writes:
But in any decent strongly-typed language (eg. SML, Modula-3) you can define datatypes for the abstract syntax of the language and write syntax-driven program manipulation programs. Pattern-matching helps in writing such programs, thus SML is an especially good candidate for this. The other thing you need is an easily-callable parser for the language returning abstract syntax ...

In other words, you've back-patched the language with a second syntax that corresponds to the original program syntax. One has to open another manual and start reading. How many syntaxes do we need in this world? How can such a second syntax, in the case of SML, manipulate functions defined with pattern-matching in a readable way?

FP: the identification of terms and programs is seductive, and leads to very concise hacks. But it leads to unmaintainable, unsafe code.

On the contrary, keeping source code simple by hiding bookkeeping through a preprocessor improves maintainability and safety.

FP: a final heresy. I've written lots of Prolog, and lots of C. Writing the Prolog code has often been more gratifying, because something is up and running much sooner. But the C programs tend to have far fewer runtime bugs once they compile and link, because even the lame C type system catches many type bugs that Prolog lets by.

This is true up to a point (it depends what kind of runtime bugs you look at). But it's a very one-sided picture. For example, since Prolog is usually incrementally compiled with an interactive top level, bugs are easier to catch immediately, rather than having to write 1000's of lines before one can even start the chase.

thom@ecrc.de
Thom Fruehwirth
23rd November 1994

It is easy to manipulate programs as data in a typed logic language, e.g. Prolog with polymorphic types:

T. Frühwirth, "Using Meta-interpreters for Polymorphic Type Checking", 2nd Workshop on Meta-Programming in Logic (META 90), K.U. Leuven, Belgium, April 1990.

contains types like the following (':' read as "has type"):

(G1,G2):goal :- G1:goal, G2:goal.
bagof(X,G,L):goal :- X:A, G:goal, Z:list(A).

I don't think it is easy to treat programs as data in Prolog once you want to do something non-trivial. One reason is the syntax of terms - with feature terms it would be easier.

pereira@alta.research.att.com
Fernando Pereira
24th November 1994

Peter Van Roy writes:
In other words, you've back-patched the language with a second syntax that corresponds to the original program syntax.

No I haven't. I'm representing the abstract syntax of language L as L expressions. In languages like Lisp and Prolog this seems unnecessary, because their concrete syntax in very close to its representation as expressions. But not close enough. I mentioned the problem with variables. Another problem for Prolog has been the proper representation of clause bodies, eg. is a clause body a list of literals or a tree of formulas.

PVR: One has to open another manual and start reading. How many syntaxes do we need in this world?

Just one. The representation of L abstract syntax in L uses L alone. The only additional piece of information you need is the syntactic constructor signature. But you really need that information for Prolog already, because operators like "," disguise the abstract syntax to some extent.

The real reason why "programs as data" works to some extent in Prolog and Lisp is that their concrete syntax is so simple (impoverished?). And even then, writing a syntactic processor for Prolog that takes into account all the syntactic operators (",". ";", "->", variables as goals, etc) correctly is sufficiently tricky that an explicitly-typed abstract syntax might well help.

PVR: How can such a second syntax, in the case of SML, manipulate functions defined with pattern-matching in a readable way?

Again, it's not a second syntax, it's just an explicit abstract syntax represented as SML expressions. Now you might argue that even the base syntax of SML is too complex for this kind of processing (I'm agnostic on the point), but that's a different question from the one under discussion, the advantages of strong typing for meta-programming.

PVR: keeping source code simple by hiding bookkeeping through a preprocessor improves maintainability and safety.

Preprocessors like DCGs are very convenient when they fit the problem. But the crucial point is that bugs in recursive decomposition of abstract syntax are more easily caught in a strongly-typed setting.

In response to my comment 'And now for a final heresy.', Peter Van Roy wrote:

This is true up to a point (it depends what kind of runtime bugs you look at). But it's a very one-sided picture. For example, since Prolog is usually incrementally compiled with an interactive top level, bugs are easier to catch immediately, rather than having to write 1000's of lines before one can even start the chase.

I wasn't comparing Prolog with C as whole languages. I was pointing out the advantages of strong typing in program development. The fact is that even with all the horrible unsafeties of C, its strong typing helped me catch the great majority of errors before running the program. Of course incremental program development is better, which you can do as well in strongly-typed languages like SML as in Prolog. The point is that many runtime bugs in Prolog (or Lisp) programs would never have happened in a strongly-typed setting, because they would have been caught by the type checker. Even with incremental development, debugging at runtime is less reliable and more laborious than having a type-checker help.

Think of it another way. Type-cheking is cheap, effective (if partial) automatic program verification. How could one want to reject such convenient help? We could as well reject compilers and go back to assembler: after all even the most machine-oriented language, eg. C, stops you from doing things that you could do in assembler.

vanroy@dfki.uni-sb.de
Peter Van Roy
25th November 1994

In response to my statement: 'How many syntaxes do we need in this world?', Fernando Pereira writes:
Just one. The representation of L abstract syntax in L uses L alone.

Not just one. It uses L, but it is not L. It is a set of definitions that you can write in L. You might as well say that the C language includes all the declarations of a C parser written in C. The plain fact is: the amount of stuff the programmer must keep in his head is greater. This is by default a bad thing, unless countered by quite strong arguments.

There is a language design trade-off here. For many applications a reflective syntax is not needed, so a library to handle it can be added. For many applications strong typing is very useful, so it can be included by default in the language. The opposite design decisions were taken for Prolog. There is no notion of "badness" or "goodness" here. It's just that Prolog and (your favorite strongly-typed non-reflective language) are good for different tasks.

FP: The real reason why "programs as data" works to some extent in Prolog and Lisp is that their concrete syntax is so simple (impoverished?).

A syntax does not have to be simple (or complex) to be a good one. These are orthogonal issues, although simplicity helps. For example, it is much easier to develop a program to build expressions and evaluate them in Prolog or Lisp than in csh.

The point of my original posting was not to say that strong typing is bad. The point is that for certain applications, the dynamic typing and reflective syntax of Prolog are a good thing. I did not claim, as you seem to imply, that they are good for everything. For small program-generation and knowledge-massaging kinds of tasks however, Prolog is very good.