
Compiler Writing Using DCGs

Appeared in Volume 8/1, February 1995

Keywords: DCGs.

alberto@cs.umd.edu
Jose Alberto Fernandez R
22nd June 1993

Seven or more years ago, I was teaching a class in compiler construction and decided to make the students do all the implementation in Prolog. The development time was reduced enormously. Using DCGs for the attribute grammar made abstract-tree code generation an afternoon's work. Generating code, optimizing, and allocating registers are all easy because you can work symbolically, and can try out, and redesign, strategies very quickly.

The biggest problem was syntactic error detection, and resynchronizing the parser after an error had been found. I have not seen anything about how to do error detection and synchronization using DCG grammars. The NLP work usually does not help, because it assumes that the input is "correct"; if it is not, the parsers simply fail.

srin@cs.utexas.edu
Srinivas Nedunuri
24th June 1993

I agree with Jose: in comparison with DCGs, yacc and lex aren't very user-friendly. True, writing a grammar for a top-down parser (such as the DCG parser) means contorting your grammar a little, whereas LR parsers are supposed to accept the more "natural" grammars as is. (Even this is not quite true; yacc, for example, "recommends" that you use left-recursive rules instead of right-recursive ones so you don't overflow the symbol stack. Top-down parsers don't recommend, they insist on non-left-recursive rules!) On the other hand, inherited attributes are really troublesome to implement in yacc - you have to search the value stack to get at them. I don't consider that a very "natural" style of programming. DCGs, on the other hand, give you synthesized and inherited attributes very naturally through unification.
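
For example (a minimal sketch; the grammar and token names are illustrative), Env below is an inherited attribute passed down the parse, and Val a synthesized attribute passed back up, both handled by plain unification:

:- use_module(library(lists)).    % member/2; autoloaded in some systems

expr(Env, Val) --> term(Env, V0), expr_rest(Env, V0, Val).

expr_rest(Env, Acc, Val) -->
    ['+'], term(Env, V),
    { Acc1 is Acc + V },          % fold the value as we go
    expr_rest(Env, Acc1, Val).
expr_rest(_, Val, Val) --> [].

term(Env, Val) --> [id(X)], { member(X=Val, Env) }.   % look up the inherited environment
term(_, Val) --> [num(Val)].

A query such as phrase(expr([x=3], V), [id(x), '+', num(4)]) then yields V = 7.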

What is undoubtedly true is that yacc produces very efficient parsers. DCGs run several orders of magnitude slower, which makes them useless for realistic compiler writing. If there were a standard, efficient implementation of DCGs, I'm sure they would be more widely used.

pereira@alta.research.att.com
Fernando Pereira
24th June 1993

The standard technique (standard in the folklore, that is; I know of no complete and formal presentation) is to convert the DCG to an equivalent Prolog program that, when run by Prolog, behaves as a bottom-up parser for the language of the original grammar. With a suitably chosen mapping, the resulting parser has the valid-prefix property, or an approximation thereof, making error detection and recovery much easier than for the original DCG executed directly by Prolog. This technique is typically an instantiation (by hand) of a standard grammar-to-pushdown-automaton mapping, e.g. the LALR(1) construction. It is possible, in principle, to extend the standard grammar-to-automaton mappings to some classes of DCGs, but the theoretical fine print and suitable grammar compilation systems have not been fully developed yet.
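
For a flavour of the idea, here is one simple member of that family of mappings, a hand-applied left-corner transformation (the grammar is illustrative, not the LALR(1) construction). The left-recursive grammar loops under Prolog's direct top-down execution; the transformed version consumes the left corner first and then extends it bottom-up, so at every step it has consumed only a valid prefix:

% Original: e --> e, ['+'], t.    e --> t.    t --> [num].
% Left-corner transformed:
e --> t, e_rest.                  % recognise the left corner first ...
e_rest --> ['+'], t, e_rest.      % ... then extend it bottom-up
e_rest --> [].
t --> [num].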

Much recent work on NLP, in Prolog or not, is concerned with parsing incomplete, corrupted or extra-grammatical input. Many more or less ad hoc techniques are being tried, although the obvious connection with error recovery techniques in parsing (especially with least-cost error recovery) has hardly been taken advantage of.

ted@nmsu.edu
Ted Dunning
24th June 1993

Another reference of interest is the article by the Erlang group about the use of Prolog to prototype Erlang. You can obtain it by FTP from:
ftp://euagate.eua.ericsson.se/pub/erling/info/prac_appl_prolog.ps

The paper is called:
J. L. Armstrong, S. R. Virding and M. C. Williams, Use of Prolog for developing a new programming language. In: The Practical Application of Prolog, London, April 1992.

ulfni@ida.liu.se
Ulf Nilsson
24th June 1993

Fernando Pereira writes:
...convert automatically a DCG into another Prolog program that when executed directly by Prolog behaves as a bottom-up parser for the language of the original DCG.

I did some work on this a few years back. Instead of compiling the DCG using the standard approach, I wrote a program which compiled the DCG into an SLR(1) parser. As long as the underlying CFG was SLR(1), the resulting parser was deterministic (provided, of course, that the semantic actions were deterministic). As a side-effect I obtained a working parser even if the CFG was not SLR(1); however, shift-reduce conflicts resulted in a non-deterministic parser. The translation is quite simple, and the only problem I encountered is that semantic actions may be permuted.

Nilsson, U., AID: An Alternative Implementation of DCGs. New Generation Computing 4(4), pp. 383-399, 1986.

pereira@alta.research.att.com
Fernando Pereira
25th June 1993

Srinivas Nedunuri writes:
What is undoubtedly true is that yacc produces very efficient parsers. DCGs run several orders of magnitude slower, which makes them useless for realistic compiler writing.

What's the evidence for this claim? Ten years ago or so, I played a bit with comparing lex/yacc-based compilers with hand-translations of DCGs to bottom-up parsers, to see if there would be a great payoff in using lex/yacc for a Prolog reader. I don't remember the actual numbers, but they were nothing like several orders of magnitude; more like less than one order of magnitude. Remember, the yacc runtime is an interpreter for a certain class of deterministic pushdown automata, and yacc puts more effort into encoding parse tables compactly than into the speed of the parser.

srin@cs.utexas.edu
Srinivas Nedunuri
28th June 1993

I can believe that. I was actually referring to the usual top-down parser that the built-in DCG translator gives you, which does backtrack extensively. However, it should be quite straightforward to write a top-down translator that inserts cuts in appropriate places for a given class of grammars, say LL(1).
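
A sketch of what such a translator might emit (the grammar and token names are hypothetical): in an LL(1) grammar each alternative is selected by its first token, so a cut placed right after that token commits the choice and removes all backtracking:

stmt --> [if], !, expr, [then], stmt, opt_else.
stmt --> [while], !, expr, [do], stmt.
stmt --> [id(_)], [':='], expr.       % last alternative needs no cut

opt_else --> [else], !, stmt.
opt_else --> [].

expr --> [id(_)], !.
expr --> [num(_)].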

jukpa@ida.liu.se
Jukka Paakki
28th June 1993

I have played with an alternative DCG implementation based on LL(1) grammars. The implementation method was akin to the conventional one, a direct translation into Prolog. Because of this, the determinism did not actually provide any speed-up over the standard backtracking scheme. The actual goal of the implementation was not efficiency, but rather automatic syntactic error recovery (something that is totally missing from conventional DCGs). The error recovery facility was easy to obtain, thanks to restricting the DCG notation to have an LL(1) CFG counterpart.

References:

J. Paakki: A Practical Implementation of DCGs. In: Proc. of the Third Int. Workshop on Compiler Compilers, Schwerin, 1990. Report 8/1990, Akademie der Wissenschaften der DDR, 1990, pp. 249-257.

J. Paakki, K. Toppola: An Error-Recovering Form of DCGs. Acta Cybernetica 9(3), 1990, pp. 211-221.

ulrich@mips.complang.tuwien.ac.at
Ulrich Neumerkel
23rd June 1993

Jose Alberto Fernandez writes:
The biggest problem was syntactic error detection.

Parsing with errors means that you want to describe the relation between a string, an AST (or whatever) and a list of error messages.

So instead of
:- phrase(nonterminal(AST),Tokens).

you could use:
:- phrase(nonterminal(AST,Errors,[]),Tokens).

where
:- phrase(nonterminal(AST,[],[]),Tokens).
should be equivalent to the original query.

Now the remaining problem is to encode error messages in some way into the grammar. This is indeed not that easy, because you have to describe all ungrammatical sentences within the grammar. A discussion in the context of NLP can be found in [1] on page 178.

The advantage of such parsing is that error messages might be more precise. Typical errors can be incorporated with ease:

comma(Errors,Errors) --> [','].                               % correct input: no error added
comma(['Period instead of comma'|Errors],Errors) --> ['.'].   % common typo: report it and go on
comma(['Comma missing'|Errors],Errors) --> [].                % missing token: report it and go on
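
For example, running the rule on a wrong token reports the error and still succeeds:

?- phrase(comma(Errors, []), ['.']).
Errors = ['Period instead of comma'].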
Another, more general way is to encode the skipping of unparsable parts.
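
A sketch of that idea (the synchronisation token and the message text are illustrative): skip tokens until a semicolon and record a single error for the whole region:

recover(['Unparsable text skipped'|Errors],Errors) --> skip_to_semi.

skip_to_semi --> [';'].
skip_to_semi --> [_], skip_to_semi.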

It seems preferable to use iterative deepening to search for a parse with the smallest number of error messages: parsing then stays cheap for correct sentences and becomes expensive only for incorrect ones. Otherwise, the grammar would be cluttered with lots of cuts.
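
In Prolog this search is a one-liner (a sketch; nonterminal//3 stands for the error-threading nonterminal above): length/2 on an unbound list enumerates error lists of length 0, 1, 2, ..., so the first parse found carries the fewest error messages:

parse_least_errors(AST, Errors, Tokens) :-
    length(Errors, _),                         % try 0 errors, then 1, then 2, ...
    phrase(nonterminal(AST, Errors, []), Tokens).

Note that this does not terminate if no parse exists at any error count.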

It is, indeed, much more elegant to use an extended DCG formalism that hides the error lists and adds information about the positions of the error messages (Peter Van Roy's EDCGs, for instance).

As an alternative approach you could use continuations for recovery.

[1] Berwick, Robert C., Principle-Based Parsing. In: Peter Sells, Stuart M. Shieber and Thomas Wasow (eds.), Foundational Issues in Natural Language Processing, MIT Press, Ch. 4, pp. 115-226, 1991.
