State-of-the-art Garbage Collection (GC)

Appeared in Volume 7/3, August 1994

Keywords: garbage.


lindgren@sics.se 
Thomas Lindgren
3rd March 1994

Mikael Pettersson writes:

What's the state-of-the-art in garbage collectors for Prolog?

I had an idea to implement a generational copying collector for a specification language of mine(*), once I observed that the trail could do double duty as the Remembered Set of the generational collector. But that would seem to inhibit the trail optimizations as described by A"it-Kaci's WAM tutorial.

So what do Prolog implementors do? Mark-sweep with sliding compaction?

(*) On the surface very non-Prologish, but the implementation uses a WAM-like run-time model.

As it turns out, Johan Bevemyr and I have been working on that very issue. As you note, the trail stores pointers from older generations to newer ones, which is useful for generations. Early reset is done by our collector. We have tried both copying collection and generational copying collection; the costs are some extra trailing and less memory recovery on backtracking (both are so far entirely negligible) and an extra stack, and both algorithms are faster (net) than the Appleby mark-sweep collector. Our implementations are basically adaptions of the standard copying and generational copying algorithms to the situation of Prolog/WAM.

One of Prolog's particular pains is the use of '@<'/2 and similar primitives. If your language disallows these, things become simpler.

Others that have worked on copying collection are Touati and Hama, who were first as far as I know, and Bekkers, Ridoux and Ungaro. The latter algorithm is quite straightforward, but doesn't handle @ From what I know, Prolog implementations seem to use mark-sweep along the lines of Appleby et al. That one is mildly generational as well.

Our paper is submitted, give me a mail for a draft. The others are given below.

[1] Appleby, Carlsson, Haridi, Sahlin, Garbage collection for Prolog based on WAM, CACM, June 1988.

[2] Touati, Hama, A light-weight Prolog garbage collector, FGCS, 1988.

[3] Bekker, Ridoux, Ungaro, Dynamic memory management for Prolog programs, IWMM'92 (LNCS 637), 1992.

Also, our department has generated a couple of papers on GC, check for technical reports by Barklund and Barklund and Millroth as well.


jb@sics.se 
Johan Bevemyr
4th March 1994

The problem with the '@<'/2 primitive is that it is used for ordering Prolog terms. Most Prologimplementations use the address of a variable for ordering them.

The problem with using a copying garbage collector is that the relative positions of the variables on the heap are not preserved.


ridoux@irisa.fr 
Olivier Ridoux
4th March 1994

Yes, but for tracing, debugging, displaying terms, the relative positions are not enough. One needs a stable identification of the variables. No compacting GC will offer stable addressing of the variables.

However, a stable addressing of the variables is not the only means for a stable identification. So, one can still use a compacting GC, even a copying GC!

Another stable identification scheme is simply to label variables.


mlevy@csr.uvic.ca 
Michael Levy
4th March 1994

Mikael Pettersson wrote:

What's the state-of-the-art in garbage collectors for Prolog?

Bill Older and John Rummell describe an incremental GC for WAM-Based Prolog implementations (Proc. Joint Intnl Conf. and Symp. on Logic Programming, Washington DC, 1992, MIT press, K. Apt (Ed), pp. 369-383). Their approach has a few advantages for our Prolog-to-C system, one of which is that is amortises the cost of GC uniformly, which means that the overall system will behave in a manner more consistent with the C programmers view (i.e. that programs don't suddenly suspend for no explicit reason). More importantly for us, we can restrict the times we need to check H against the top of the heap, avoiding a penalty that we would incur if we had to check it every time it is incremented by a WAM operation.

The problem is, I'm not totally convinced by the argument given in the paper (probably because I don't fully understand the agorithm). The question I wonder about is whether Rummell and Older's algorithm really finds all the garbage. I started implementing the algorithm in our system, but I have not had time to verify that all garbage is always collected. Has anyone else looked at their algorithm? If it does work, it seems to me to be a very nice alternative to the usual mark-and-sweep algorithms, but maybe I am missing some subtle point.


lindgren@sics.se 
Thomas Lindgren
5th March 1994

Michael Levy writes:

More importantly for us, we can restrict the times we need to check H against the top of the heap, avoiding a penalty that we would incur if we had to check it every time it is incremented by a WAM operation.

Just a quick note: you can avoid this in at least two other ways:

1. Generate a trap when memory runs out, garbage collect and return from the trap. For instance, set the page beyond the heap to read-only or protected otherwise.

2. Generate explicit heap checks. You can compute (an upper bound of) the memory needed by most operations (the others can do their GC checking themselves) and test once per chunk.

Both of these are used in real systems. In fact, I think SML/NJ used (1) previously and (2) nowadays. We are using (2) in Reform Prolog.


jb@sics.se 
Johan Bevemyr
4th March 1994

In response to Olivier Ridoux. I'm not convinced variables need to be internally labeled in order to implement an atractive debugger. For example, does the Opium debugger by Mireille Ducasse [1] rely on labeled variables for most (any?) of its nice features?

If we agree that variables need to be labled then the tricky part is to do this with minimal resource consumption (execution time and memory).

Assuming a WAM based implementation I can think of at least two methods for labeling variables:

1) Extending the representation of variables with an extra word containing the label.

2) Tag unbound variables with a special unbound-tag and use the value field of the variable to store the label.

The problem with the first approach is that more memory is consumed, and initializing unbound variables becomes more expensive (a unique label has to be constructed and written for each variable). I believe Eclipse from ECRC use some similar techinque.

The second approach makes trailing more expensive as both the address of the bound variable and its label have to be saved on the trail. Also, my experience is that using unbound-tags slow down an emulator based WAM implementation by approximately 5%.

I think Touati and Hama [2] suggested the use of a hash table of labeling interesting variables. The hash table is updated and garbage collected during garbage collection of the heap.

Are there any other reasonable ways of labeling/ordering variables?

[1] M. Ducasse, An Extendable Trace Analyser to Support Automated Debugging, Technical Report ECRC-92-33, ECRC GMBH, Arabellastr. 17, D-8000 Munchen 81, Germany.

[2] H. Touati, T. Hama, A light-weight prolog garbage collector, Proceedings of the International Conference on Fifth Generation Computing Systems, 1988.


jonas@groucho.csd.uu.se 
Jonas Barklund
9th March 1994

In reply to mlevy@csr.uvic.ca's message. If you were interested in Older & Rummel's paper then you might want to have a look at "Garbage Cut for Garbage Collection of Iterative Prolog Programs" by Barklund and Millroth in Proc. 1986 Symp. on Logic Programming. (It seems to me that O&R have rediscovered essentially the same thing.) The idea in our paper was that if you have a deterministic iterative program, you can reclaim garbage very inexpensively between iterations. Many programs seem to follow this pattern (though certainly not all). We also provided a cut operator with exactly the same observed behaviour as ordinary cut but that actually initiated a garbage collection. In our example program, a Prolog compiler compiling a 200 predicate program, doing 200 such small gc's (one for each predicate compiled) was faster than four big gc's. The paging behaviour was also greatly improved.


pereira@alta.research.att.com 
Fernando Pereira
8th March 1994

Thomas Lindgren writes:

1. Generate a trap when memory runs out, garbage collect and return from the trap. For instance, set the page beyond the heap to read-only or protected otherwise.

Unfortunately, most OS implementations are not too helpful in what they support reliably on a memory protection fault. Most attempts to use SIGSEGV to do it in various Unix implementations on various kinds of hardware that I have seen failed because not enough context was saved in all cases by processors or trap handlers.