Net Talk
edited by Roberto Bagnara



Documenting Large Prolog Systems
From: David Poole


What experiences do people have documenting large Prolog systems?

We are looking for something more than predicate or module-level documentation; a way for a new user to understand the structure of a large Prolog system. Something like UML, but for relations not just describing objects. Has anyone tried documenting a Prolog program in UML? Our intuition is that this is not a good place to start, but I thought I would ask about what are good places to start. Any experience you have would be appreciated.

The Prolog program we are using is part of a larger system that is being documented in UML. I'd appreciate any feedback from anyone who has tried to document large Prolog systems.





From: A.L.
Subject: Re: Documenting Large Prolog Systems

I am using Z, strictly speaking, some subset of Object-Z. Prolog system is encapsulated as Java classes, therefore UML can be used on the client side.

Maybe Z is not the ideal tool, but good enough...



From: Gregory Bourassa
Subject: Re: Documenting Large Prolog Systems

I have documented large Prolog systems, and it is not too hard.  That said, any of the off-the-shelf documentation tools like UML will be tortured if you try to use them -- they have no representation for backtracking, failure, unification, and multiple solutions from backtracking.

Your best bet is to assume an "Encyclopoedia of predicates" approach and document the main predicates -- not the helpers which are only there to support interesting predicates.  For each predicate or relation you should document:

...and probably other things, but they escape my memory just now -- the above constitutes a good start

If you can, partition the system into "packages".   There is usually a set of list-handling and other predicates which are just "utilities" used everywhere,  then there will be other modules or packages which you can recognise, which do things like file access,   IP communications, pretty-printing,  graphics,  and finally the various parts of the core behaviour of your system.

Let the rest of the project team know that the Prolog subsystem is meta-relational and meta-object-oriented,  so they will have to deal with your documentation as something more than they think they see -- they will think they are seeing something very like a C-language API.    Don't stand for such complacency.



From: Brian Hulley
Subject: Re: Documenting Large Prolog Systems
Gregory Bourassa wrote:
> support interesting predicates.  For each predicate or relation you should
> document:
>     Name
>     Arity
>     Known flow patterns (bound on call,  free on call for each
> argument)...e.g. (var, var, nonvar) (var,var,var) (novar,var,nonvar), or
> (free,free.bound) etc., or (i,i,o)...whatever notation you like as long as
> you do it.

This reminds me of a paper I came across a while ago (unfortunately I don't remember the authors' names) where they automatically detected the sets of argument mode assignments for each predicate to see if the program was "simply well-moded" (to enable optimizations when compiling). Ie you start from your application's top level predicate eg go/0, and recursively traverse the rest of the program asserting the modes at each stage. (Presumably you could do something similar working bottom up for the case where you have an API rather than an app.)

>     Some comments on the expected terms which can be passed as arguments

It would be interesting to know if anyone has automatically generated this info (ie the type lattice)- on the face of it it seems possible
for some sub-systems of predicates (ie those which don't construct atoms/functors from character codes or user input etc)



From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems

Brian Hulley wrote:
> This reminds me of a paper I came across a while ago (unfortunately I
> don't remember the authors' names) where they automatically detected
> the sets of argument mode assignments for each predicate to see if the
> program was "simply well-moded" (to enable optimizations when
> compiling). Ie you start from your application's top level predicate eg
> go/0, and recursively traverse the rest of the program asserting the
> modes at each stage. (Presumably you could do something similar working
> bottom up for the case where you have an API rather than an app.)

The standard name for this is mode inference, and there have been many papers on it, mostly in the eighties. It never got anywhere much, for two reasons.

First, the search space (the possible sets of predicate/mode pairs) is too large to be searched exhaustively, and even the best heuristics that people came up with don't cut it down sufficiently. The only way to keep the process feasible was to introduce approximations. As the program becomes larger, these approximations come to dominate the output; which means that the larger the program, and thus the more you need the tools, the less the tool tells you.

Second, if you apply inference to a real program, all you get is information about how information actually *does* flow in the program;
it doesn't tell you how it *should* flow, or where the actual flow differs from the desired flow. Even if the user does a visual diff of the output of the inference against their expectations, these systems don't really help in tracking down the sources of discrepancies.

The problems reinforce each other. The more approximations the inference makes, the harder the job of doing the visual diff is, and the more bugs exist in the program, the more approximations must be made.

Suppose you have a predicate whose mode should be (in,out), i.e. the first argument is input (caller should supply a ground term) but the second is output (caller should supply an fresh, unaliased variable). Suppose you have a bug in which one clause out of several switches the roles of the two arguments. Since a straight mode inference algorithm takes the code as gospel, it cannot infer (in,out) as the mode; it must infer (in or out, in or out), a mode that contains no meaningful information. This uncertainty propagates to the predicate's callers and to their callees, and from there to the rest of the program.

>> Some comments on the expected terms which can be passed as arguments

> It would be interesting to know if anyone has automatically generated
> this info (ie the type lattice)- on the face of it it seems possible
> for some sub-systems of predicates (ie those which don't construct
> atoms/functors from character codes or user input etc)

While type inference usually means inference of the types of predicates' arguments, there is a form which tries to infer the definitions of types themselves; I think you are asking for both. While the first is definitely feasible and in fact is a routine part of many languages (e.g. Haskell, and it is available though not recommended for Mercury), there are good reasons for not doing the second. There are usually much fewer types than predicates, so defining them explicitly is less work for the programmer, and any mistakes by the inference of type definitions are magnified hugely when you try to use its output in type inference of predicate arguments. In most people's opinion, the cost/benefit ratio of type definition inference is very far from favourable.




From: Brian Hulley
Subject: Re: Documenting Large Prolog Systems


Zoltan Somogyi wrote:
>  it cannot infer (in,out) as the mode; it must infer (in or out, in or out),

I would have thought that the goal of mode inference would be to infer a *set* of mode assignments for each predicate, thus the above bug would return the set, {(in, out), (out,in)} which would at least tell you that the predicate is being used in more than one way, which might help a tool flag possible bugs since I would have expected most predicates in a large program to only have one flow (except the very general utility predicates like member/2 etc)

My perceived problem with systems which require flow declarations is that they can break the flow of thought from programmer to predicate (!) in the same way that the need to first decide how to reify a domain into objects and methods prevents exploratory programming in object oriented languages. Also, the concept of in/out seems somehow, to me at least, not quite the same as the pure concept of unification thus I would not want to have to think in terms of in/out during the incarnation of some prolog-thought into a prolog predicate.

However such declarations are obviously very useful later on, for code maintenance, generation of efficient code etc, hence the desire to generate them automatically.

Of course the above is just my personal experience and no doubt others find such declarations helpful right from the beginning - I'll have another look at the Mercury website.



From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems

I wrote:
> it cannot infer (in,out) as the mode; it must infer (in or out, in or out),

Brian Hulley wrote:
> I would have thought that the goal of mode inference would be to infer
> a *set* of mode assignments for each predicate, thus the above bug
> would return the set, {(in, out), (out,in)} which would at least tell
> you that the predicate is being used in more than one way, which might
> help a tool flag possible bugs

Yes. But if the incorrect analysis of one predicate contanimates the analysis of many other predicates, then the programmer is faced with a whole slew of messages saying "this predicate has more than one mode, and is therefore likely to be buggy" with no clue as to which of these are cause and which are effect.

> since I would have expected most
> predicates in a large program to only have one flow (except the very
> general utility predicates like member/2 etc)

That is true. Our statistics on the Mercury system, taken at various times over the years, have consistently said that the average number of modes per predicate is in the 1.03 to 1.05 range. As you say, it is mostly (although not exclusively) utility predicates in the library that have more than one mode. Since some of these have more than two modes, the fraction of predicates with more than one mode is probably around 2-3%.

Those statistics are for one program; other programs may yield different numbers. However, I wouldn't expect the numbers to be *too* different.

> My perceived problem with systems which require flow declarations is
> that they can break the flow of thought from programmer to predicate

There is nothing preventing you from writing the code first and adding the declarations later, as long as you add the declarations (at least for the predicates exported from a module) before you compile the code.

> Also, the concept of in/out seems somehow, to me at
> least, not quite the same as the pure concept of unification

Unification is used in several ways in Prolog. Most of the time, it is used purely for parameter passing, and for that, thinking in terms of in/out is useful. More rarely, it is used for solving Herbrand constraints. When we designed Mercury in 1993-94, we didn't know how to handle constraints (of any kind) while keeping the properties we wanted (ability to handle I/O in a declarative manner, reliability from checked declarations, maintainability for the same reason, and speed), so we didn't include constraints. The HAL project later explored the integration of constraints into Mercury, first as a separate language (HAL) that compiles to Mercury, and now as language extensions in Mercury itself.

The point is, parameter passing is fundamentally simple and should be fast. Constraint solving isn't anywhere near that simple in general, and can't be made that fast. Using the same mechanism for both may be intellectually parsimonious, but it is not a sound engineering choice if performance is a concern, which it is in programming languages.

> thus I
> would not want to have to think in terms of in/out during the
> incarnation of some prolog-thought into a prolog predicate.

This is where people differ, partly because of psychological differences and partly because they tackle different tasks. If I had to code a problem which can be modelled as Herbrand constraints, I wouldn't think in terms of in/out either, but in pretty much all other cases, I would.

> However such declarations are obviously very useful later on, for code
> maintenance, generation of efficient code etc, hence the desire to
> generate them automatically.

The Mercury compiler can infer the types, modes and determinisms of predicates that are local to their module; it will print out the inferred declarations, and you can cut-and-paste them into the source file if you like. We have found that it is usually better not to take advantage of this capability, and not only because the mode inference quite weak (the other two are much better). Several people who originally disliked the thought of declarations have come around to accepting and appreciating them, as they work with other people's code.



From: Tom Breton
Subject: Re: Documenting Large Prolog Systems

Zoltan Somogyi wrote:
> The HAL project later explored the integration of constraints
> into Mercury, first as a separate language (HAL) that compiles to Mercury,
> and now as language extensions in Mercury itself.

Can you elaborate on those language extensions?  

My information must be out of date.  Last I saw, the HAL docs mentioned only extending Mercury with a few new node labels like "new" and "old", and the Mercury docs didn't seem to mention HAL or constraints.



From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems


I wrote:
> The HAL project later explored the integration of constraints
> into Mercury, first as a separate language (HAL) that compiles to Mercury,
> and now as language extensions in Mercury itself.

Tom Breton wrote:
> Can you elaborate on those language extensions? 

The main one is that you can now declare a type to be a solver type. A solver type has two faces. Everywhere in the program except its defining module, it stands for a data structure (e.g. a float or a term) on which constraints may be placed; its instantiation state is usually "any". Inside the defining module, i.e. the module that implements the constraint solver, its type will typically be different (e.g. it may be an integer representing an index into a table) and its instantiation states will also be different (typically ground). The solver writer must convert between the two views when necessary, and is required to ensure that the operations on the internal representation faithfully implement the semantics of the public solver type.

Recent versions of the Mercury reference manual have more detail on solver types. If you are interested in the design rationale, you'll have to search the mercury-developer mailing list, available from the Mercury web site.

> My information must be out of date.  Last I saw, the HAL docs
> mentioned only extending Mercury with a few new node labels like "new"
> and "old", and the Mercury docs didn't seem to mention HAL or
> constraints.

The Mercury web site didn't say much about HAL because HAL was being built by a different group (mostly across town), and we figured that having stuff about HAL on our web page was a double maintenance problem. So we just had a link to the HAL page.

The Mercury docs don't trumpet constraints yet because that capability is still being worked on, and will be for a while. We don't want people to unknowingly start relying on features that may be changed in the near future. Of course, we can always use beta testers, and any feedback would be appreciated.



From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems

Gregory Bourassa wrote:
> For each predicate or relation you should
> document:
>    Name
>    Arity
>    Known flow patterns (bound on call,  free on call for each
> argument)...e.g. (var, var, nonvar) (var,var,var) (novar,var,nonvar),  or
> (free,free.bound) etc., or (i,i,o)...whatever notation you like as long as
> you do it.
>    Some comments on the expected terms which can be passed as arguments
> -- this is very important to maintainers.  Let's be practical, rarely can
> every argument can be of every functor type and still work!
>    Whether or not failure can happen
>    Whether or not the predicate can backtrack (deterministic or
> nondeterministic)

Just for the record, the above pieces of information (which is a large part of the list from the post I am replying to) are all present in Mercury's type, mode and determinism declarations for a predicate.

We designed Mercury to have this information in declarations because our experience, just as Gregory's, led us to believe that maintainers are often lost without this information. However, in Mercury, the information in these declarations is verified by the compiler, so maintainers can rely on it. In Prolog, it is all to easy to update the code without updating the documentation.



From: Lindsey Spratt
Subject: Re: Documenting Large Prolog Systems

I discussed the implementation of large system (SPARCL) by having an  abstraction hierarchy of subsystems and modules, where a subsystem was  composed of several modules. The implementation discussion was  organized by subsystem (subsystems can share modules). The discussion  of the implementation of a system was organized in part using module  dependency graphs. (The module dependency graphs were programmatically  generated.)

This discussion was chapter 6 of my PhD thesis which can be found at:
http://homepage.mac.com/lspratt/papers/PhD_Dissertation.dir/SPARCL-Vis.%20Set%20Logic%2C%20PDF.sit

[Editor's note: Lindsey later added that the PhD thesis is also available
                in .zip format at
                http://homepage.mac.com/lspratt/papers/PhD_Dissertation.dir/Spratt1996-SPARCL.zip]

The modules in my discussion are fairly large in many cases and for  documentation purposes are perhaps like the "packages" Gregory Bourassa  mentions. I think something like the encylopedic approach Gregory  advocates would be useful, but only in the context of documentation of  the architecture or higher levels of abstraction of the system (such as  the subsystem and module levels).