Prev Next Up Home Keys Figs Search New

Demand Driven Lists for DCGs?

Appeared in Volume 8/1, February 1995

Keywords: DCGs.

geoff@coral.cs.jcu.edu.au
Geoff Sutcliffe
10th November 1994

One of the researchers in Building Management is using DCGs to do some parsing of data files. He has a problem that the files are very large, so he cannot get all the tokens into a list to be submitted to the DCG. I've been trying to dream up a way of making 'demand driven lists' (ala Scheme's streams) for DCGs, but have failed. The idea is that the tokens should only be read from the file when the DCG needs them for the parse. If backtracking is required in the parse then the tokens remain in memory as part of the list. Has anyone a solution to this?

vannoord@let.rug.nl
Gertjan van Noord
10th November 1994

For a start, you probably shouldn't use lists in this case. Furthermore, if you're lucky you're using a Prolog in which you are allowed to redefine 'C'. What you could do is use two pointers instead of a difference list. The predicate 'C'(P0, Terminal, P) could then be defined in such a way that it searches for the P'th member of the datafile. Furthermore, you memoize this predicate such that upon backtracking you don't have to redo this work.

pereira@alta.research.att.com
Fernando Pereira
11th November 1994

Here's a technique that I have used in Quintus Prolog. It works, although it is on the slow side. In the DCG, instead of using [X] for a terminal symbol X, use `X, with the following operator and predicate definitions.

:- op(500, fx, `).

`(C, @(P0,Stream), @(P,Stream)) :-
  stream_position(Stream, _, P0),
  get0(Stream, C),
  stream_position(Stream, P).
If the grammar start symbol is S and the file to parse is File, define:

parse_from_file(S, File) :-
  open(File, read, Stream),
  stream_position(Stream, Pos0),
  phrase(S, @(Pos0,Stream), @(Pos, _)),
  stream_position(Stream, _, Pos),
  get0(Stream, C), is_end_of_file(C),    % optional: file is completely parsed
  close(Stream).
The relative slowness is due to the constant fiddling with stream positions, which involves going through several layers of Prolog, C, I/O libraries and system calls. But using assert to implement some form of streams would be even worse.

J_Hamer@cs.auckland.ac.nz
John Hamer
14th November 1994

A 1Mb file will consume about 8Mb of memory when read into a list. If you tokenise the file first, you will reduce the memory required by a factor of 10 or more.

What is in the data files? If they have a regular format, you may be able to use that regularity to construct a much more convenient and efficient representation.

The problem with demand-driven lists is that they tend to be very much slower than in-memory lists.

Prev Next Up Home Keys Figs Search New