Net Talk
edited by Roberto Bagnara
|
Documenting Large Prolog Systems
From: David Poole
What experiences do people
have documenting large Prolog systems?
We are looking for something more than predicate or module-level
documentation; a way for a new user to understand the structure of a
large Prolog system. Something like UML, but for relations not just
describing objects. Has anyone tried documenting a Prolog program in
UML? Our intuition is that this is not a good place to start, but I
thought I would ask about what are good places to start. Any experience
you have would be appreciated.
The Prolog program we are using is part of a larger system that is
being documented in UML. I'd appreciate any feedback from anyone who
has tried to document large Prolog systems.
|
From: A.L.
Subject: Re: Documenting Large Prolog Systems
I am using Z, strictly speaking, some subset of Object-Z. Prolog system
is encapsulated as Java classes, therefore UML can be used on the
client side.
Maybe Z is not the ideal tool, but good enough...
From: Gregory Bourassa
Subject: Re: Documenting Large Prolog Systems
I have documented large Prolog systems, and it is not too
hard. That said, any of the off-the-shelf documentation tools
like UML will be tortured if you try to use them -- they have no
representation for backtracking, failure, unification, and multiple
solutions from backtracking.
Your best bet is to assume an "Encyclopoedia of predicates" approach
and document the main predicates -- not the helpers which are only
there to support interesting predicates. For each predicate or
relation you should document:
- Name
- Arity
- Known flow patterns (bound on call, free on call for each
argument)...e.g. (var, var, nonvar) (var,var,var)
(novar,var,nonvar), or (free,free.bound) etc., or
(i,i,o)...whatever notation you like as long as you do it.
- A comment for each flow pattern
- Some comments on the expected terms which can be passed as
arguments -- this is very important to maintainers. Let's be
practical, rarely can every argument can be of every functor type and
still work!
- Whether or not failure can happen
- What is returned on multiple entries due to backtracking
- Whether or not the predicate can backtrack (deterministic or
nondeterministic)
- Whether or not you can expect any system-level exceptions and
what they might be
- What data (facts) if any are expected to exist in the environment
at run-time for the predicate to succeed
...and probably other things, but they escape my memory just now --
the above constitutes a good start
If you can, partition the system into "packages". There is
usually a set of list-handling and other predicates which are just
"utilities" used everywhere, then there will be other modules or
packages which you can recognise, which do things like file
access, IP communications, pretty-printing,
graphics, and finally the various parts of the core behaviour of
your system.
Let the rest of the project team know that the Prolog subsystem is
meta-relational and meta-object-oriented, so they will have to
deal with your documentation as something more than they think they see
-- they will think they are seeing something very like a C-language
API. Don't stand for such complacency.
From: Brian Hulley
Subject: Re: Documenting Large Prolog Systems
Gregory Bourassa wrote:
> support
interesting predicates. For each predicate or relation you should
> document:
>
Name
>
Arity
>
Known flow patterns (bound on call, free on
call for each
>
argument)...e.g. (var, var, nonvar) (var,var,var) (novar,var,nonvar), or
>
(free,free.bound) etc., or (i,i,o)...whatever notation you like as long
as
> you do
it.
This reminds me of a paper I came across a while ago (unfortunately I
don't remember the authors' names) where they automatically detected
the sets of argument mode assignments for each predicate to see if the
program was "simply well-moded" (to enable optimizations when
compiling). Ie you start from your application's top level predicate eg
go/0, and recursively traverse the rest of the program asserting the
modes at each stage. (Presumably you could do something similar working
bottom up for the case where you have an API rather than an app.)
>
Some comments on the expected terms which can be
passed as arguments
It would be interesting to know if anyone has automatically generated
this info (ie the type lattice)- on the face of it it seems possible
for some sub-systems of predicates (ie those which don't construct
atoms/functors from character codes or user input etc)
From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems
Brian Hulley wrote:
> This
reminds me of a paper I came across a while ago (unfortunately I
> don't
remember the authors' names) where they automatically detected
> the sets
of argument mode assignments for each predicate to see if the
> program
was "simply well-moded" (to enable optimizations when
>
compiling). Ie you start from your application's top level predicate eg
> go/0,
and recursively traverse the rest of the program asserting the
> modes at
each stage. (Presumably you could do something similar working
> bottom
up for the case where you have an API rather than an app.)
The standard name for this is mode inference, and there have been many
papers on it, mostly in the eighties. It never got anywhere much, for
two reasons.
First, the search space (the possible sets of predicate/mode pairs) is
too large to be searched exhaustively, and even the best heuristics
that people came up with don't cut it down sufficiently. The only way
to keep the process feasible was to introduce approximations. As the
program becomes larger, these approximations come to dominate the
output; which means that the larger the program, and thus the more you
need the tools, the less the tool tells you.
Second, if you apply inference to a real program, all you get is
information about how information actually *does* flow in the program;
it doesn't tell you how it *should* flow, or where the actual flow
differs from the desired flow. Even if the user does a visual diff of
the output of the inference against their expectations, these systems
don't really help in tracking down the sources of discrepancies.
The problems reinforce each other. The more approximations the
inference makes, the harder the job of doing the visual diff is, and
the more bugs exist in the program, the more approximations must be
made.
Suppose you have a predicate whose mode should be (in,out), i.e. the
first argument is input (caller should supply a ground term) but the
second is output (caller should supply an fresh, unaliased variable).
Suppose you have a bug in which one clause out of several switches the
roles of the two arguments. Since a straight mode inference algorithm
takes the code as gospel, it cannot infer (in,out) as the mode; it must
infer (in or out, in or out), a mode that contains no meaningful
information. This uncertainty propagates to the predicate's callers and
to their callees, and from there to the rest of the program.
>> Some
comments on the expected terms which can be passed as arguments
> It would
be interesting to know if anyone has automatically generated
> this
info (ie the type lattice)- on the face of it it seems possible
> for some
sub-systems of predicates (ie those which don't construct
>
atoms/functors from character codes or user input etc)
While type inference usually means inference of the types of
predicates' arguments, there is a form which tries to infer the
definitions of types themselves; I think you are asking for both. While
the first is definitely feasible and in fact is a routine part of many
languages (e.g. Haskell, and it is available though not recommended for
Mercury), there are good reasons for not doing the second. There are
usually much fewer types than predicates, so defining them explicitly
is less work for the programmer, and any mistakes by the inference of
type definitions are magnified hugely when you try to use its output in
type inference of predicate arguments. In most people's opinion, the
cost/benefit ratio of type definition inference is very far from
favourable.
From: Brian Hulley
Subject: Re: Documenting Large Prolog Systems
Zoltan Somogyi wrote:
> it
cannot infer (in,out) as the mode; it must infer (in or out, in or out),
I would have thought that the goal of mode inference would be to infer
a *set* of mode assignments for each predicate, thus the above bug
would return the set, {(in, out), (out,in)} which would at least tell
you that the predicate is being used in more than one way, which might
help a tool flag possible bugs since I would have expected most
predicates in a large program to only have one flow (except the very
general utility predicates like member/2 etc)
My perceived problem with systems which require flow declarations is
that they can break the flow of thought from programmer to predicate
(!) in the same way that the need to first decide how to reify a domain
into objects and methods prevents exploratory programming in object
oriented languages. Also, the concept of in/out seems somehow, to me at
least, not quite the same as the pure concept of unification thus I
would not want to have to think in terms of in/out during the
incarnation of some prolog-thought into a prolog predicate.
However such declarations are obviously very useful later on, for code
maintenance, generation of efficient code etc, hence the desire to
generate them automatically.
Of course the above is just my personal experience and no doubt others
find such declarations helpful right from the beginning - I'll have
another look at the Mercury website.
From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems
I wrote:
> it
cannot infer (in,out) as the mode; it must infer (in or out, in or out),
Brian Hulley wrote:
> I would
have thought that the goal of mode inference would be to infer
> a *set*
of mode assignments for each predicate, thus the above bug
> would
return the set, {(in, out), (out,in)} which would at least tell
> you that
the predicate is being used in more than one way, which might
> help a
tool flag possible bugs
Yes. But if the incorrect analysis of one predicate contanimates the
analysis of many other predicates, then the programmer is faced with a
whole slew of messages saying "this predicate has more than one mode,
and is therefore likely to be buggy" with no clue as to which of these
are cause and which are effect.
> since I
would have expected most
>
predicates in a large program to only have one flow (except the very
> general
utility predicates like member/2 etc)
That is true. Our statistics on the Mercury system, taken at various
times over the years, have consistently said that the average number of
modes per predicate is in the 1.03 to 1.05 range. As you say, it is
mostly (although not exclusively) utility predicates in the library
that have more than one mode. Since some of these have more than two
modes, the fraction of predicates with more than one mode is probably
around 2-3%.
Those statistics are for one program; other programs may yield
different numbers. However, I wouldn't expect the numbers to be *too*
different.
> My
perceived problem with systems which require flow declarations is
> that
they can break the flow of thought from programmer to predicate
There is nothing preventing you from writing the code first and adding
the declarations later, as long as you add the declarations (at least
for the predicates exported from a module) before you compile the code.
> Also,
the concept of in/out seems somehow, to me at
> least,
not quite the same as the pure concept of unification
Unification is used in several ways in Prolog. Most of the time, it is
used purely for parameter passing, and for that, thinking in terms of
in/out is useful. More rarely, it is used for solving Herbrand
constraints. When we designed Mercury in 1993-94, we didn't know how to
handle constraints (of any kind) while keeping the properties we wanted
(ability to handle I/O in a declarative manner, reliability from
checked declarations, maintainability for the same reason, and speed),
so we didn't include constraints. The HAL project later explored the
integration of constraints into Mercury, first as a separate language
(HAL) that compiles to Mercury, and now as language extensions in
Mercury itself.
The point is, parameter passing is fundamentally simple and should be
fast. Constraint solving isn't anywhere near that simple in general,
and can't be made that fast. Using the same mechanism for both may be
intellectually parsimonious, but it is not a sound engineering choice
if performance is a concern, which it is in programming languages.
> thus I
> would
not want to have to think in terms of in/out during the
>
incarnation of some prolog-thought into a prolog predicate.
This is where people differ, partly because of psychological
differences and partly because they tackle different tasks. If I had to
code a problem which can be modelled as Herbrand constraints, I
wouldn't think in terms of in/out either, but in pretty much all other
cases, I would.
> However
such declarations are obviously very useful later on, for code
>
maintenance, generation of efficient code etc, hence the desire to
> generate
them automatically.
The Mercury compiler can infer the types, modes and determinisms of
predicates that are local to their module; it will print out the
inferred declarations, and you can cut-and-paste them into the source
file if you like. We have found that it is usually better not to take
advantage of this capability, and not only because the mode inference
quite weak (the other two are much better). Several people who
originally disliked the thought of declarations have come around to
accepting and appreciating them, as they work with other people's code.
From: Tom Breton
Subject: Re: Documenting Large Prolog Systems
Zoltan Somogyi wrote:
> The HAL
project later explored the integration of constraints
> into
Mercury, first as a separate language (HAL) that compiles to Mercury,
> and now
as language extensions in Mercury itself.
Can you elaborate on those language extensions?
My information must be out of date. Last I saw, the HAL docs
mentioned only extending Mercury with a few new node labels like "new"
and "old", and the Mercury docs didn't seem to mention HAL or
constraints.
From: Zoltan Somogyi
Subject: Re: Documenting Large Prolog Systems
I wrote:
> The HAL
project later explored the integration of constraints
> into
Mercury, first as a separate language (HAL) that compiles to Mercury,
> and now
as language extensions in Mercury itself.
Tom Breton wrote:
> Can you
elaborate on those language extensions?
The main one is that you can now declare a type to be a solver type. A
solver type has two faces. Everywhere in the program except its
defining module, it stands for a data structure (e.g. a float or a
term) on which constraints may be placed; its instantiation state is
usually "any". Inside the defining module, i.e. the module that
implements the constraint solver, its type will typically be different
(e.g. it may be an integer representing an index into a table) and its
instantiation states will also be different (typically ground). The
solver writer must convert between the two views when necessary, and is
required to ensure that the operations on the internal representation
faithfully implement the semantics of the public solver type.
Recent versions of the Mercury reference manual have more detail on
solver types. If you are interested in the design rationale, you'll
have to search the mercury-developer mailing list, available from the
Mercury web site.
> My
information must be out of date. Last I saw, the HAL docs
>
mentioned only extending Mercury with a few new node labels like "new"
> and
"old", and the Mercury docs didn't seem to mention HAL or
>
constraints.
The Mercury web site didn't say much about HAL because HAL was being
built by a different group (mostly across town), and we figured that
having stuff about HAL on our web page was a double maintenance
problem. So we just had a link to the HAL page.
The Mercury docs don't trumpet constraints yet because that capability
is still being worked on, and will be for a while. We don't want people
to unknowingly start relying on features that may be changed in the
near future. Of course, we can always use beta testers, and any
feedback would be appreciated.
From: Zoltan
Somogyi
Subject: Re: Documenting Large Prolog Systems
Gregory Bourassa wrote:
> For each
predicate or relation you should
> document:
>
Name
>
Arity
>
Known flow patterns (bound on call, free on call for each
>
argument)...e.g. (var, var, nonvar) (var,var,var)
(novar,var,nonvar), or
>
(free,free.bound) etc., or (i,i,o)...whatever notation you like as long
as
> you do
it.
>
Some comments on the expected terms which can be passed as arguments
> -- this
is very important to maintainers. Let's be practical, rarely can
> every
argument can be of every functor type and still work!
>
Whether or not failure can happen
>
Whether or not the predicate can backtrack (deterministic or
>
nondeterministic)
Just for the record, the above pieces of information (which is a large
part of the list from the post I am replying to) are all present in
Mercury's type, mode and determinism declarations for a predicate.
We designed Mercury to have this information in declarations because
our experience, just as Gregory's, led us to believe that maintainers
are often lost without this information. However, in Mercury, the
information in these declarations is verified by the compiler, so
maintainers can rely on it. In Prolog, it is all to easy to update the
code without updating the documentation.
From: Lindsey
Spratt
Subject: Re: Documenting Large Prolog Systems
I discussed the implementation of large system (SPARCL) by having
an abstraction hierarchy of subsystems and modules, where a
subsystem was composed of several modules. The implementation
discussion was organized by subsystem (subsystems can share
modules). The discussion of the implementation of a system was
organized in part using module dependency graphs. (The module
dependency graphs were programmatically generated.)
This discussion was chapter 6 of my PhD thesis which can be found at:
[Editor's note: Lindsey later added that the PhD thesis is also
available
in .zip format at
http://homepage.mac.com/lspratt/papers/PhD_Dissertation.dir/Spratt1996-SPARCL.zip]
The modules in my discussion are fairly large in many cases and
for documentation purposes are perhaps like the "packages"
Gregory Bourassa mentions. I think something like the encylopedic
approach Gregory advocates would be useful, but only in the
context of documentation of the architecture or higher levels of
abstraction of the system (such as the subsystem and module
levels).