# What does Subset Mean?

Appeared in Volume 8/3, August 1995

Keywords: sets.

Richard A. O'Keefe

A few months ago, in the comp.lang.prolog newsgroup, I offered the following code for subset/2:

member(X, [X|_]).
member(X, [_|Xs]) :-
member(X, Xs).

subset([], _).
subset([X|Xs], Ys) :-
member(X, Ys),
subset(Xs, Ys).
and pointed out that the query

?- subset([A,B], [1,2,3]).

finds all nine solutions:

B = 1, A = 1 ;  B = 2, A = 1 ;  B = 3, A = 1 ;
B = 1, A = 2 ;  B = 2, A = 2 ;  B = 3, A = 2 ;
B = 1, A = 3 ;  B = 2, A = 3 ;  B = 3, A = 3 .
Gerard Ellis challenged me about this, claiming that there are three solutions, not nine. I decided to follow it up because it is an interesting question. Everything hinges on the precise definition of the meaning of the subset/2 predicate.

I started from the meaning:

subset(Small, Large) is to be true if and only if Small is a list containing the elements of a set in no particular order, possibly containing duplicates. Large is a list containing the elements of a set in no particular order, possibly containing duplicates. Also, the set that Small represents is a subset of the set that Large represents.

In particular, this means that:

subset(X, [1])

has infinitely many solutions:

X = [] ; X = [1] ; X = [1,1] ; X = [1,1,1] ; ...

The fact that duplicates are allowed in Small meant that the solutions [1,1], [2,2], and [3,3] are allowed. These are solutions with two-element lists representing one-element sets.

You have to be explicit about your intended representation of sets. In my view, there is only really one good choice in plain Prolog (a list in sorted order without duplicates); there are also extensions of Prolog with direct support for sets.

Suppose we now decide that a list representing a set should not contain duplicates, and we say:

subset(Small, Large) is only to be used when Large is a list containing the elements of a set in no particular order, there being no duplicates. It is then true when Small is a list containing the elements of a set in no particular order, there being no duplicates. Also, the set that Small represents is a subset of the set that Large represents.

In that case, we need different code, because the meaning is different.

subset([], _).
subset([X|Xs], Set) :-
select(X, Set, Set1),
subset(Xs, Set1).

select(X, [X|R], R).
select(X, [H|T], [H|R]) :-
select(X, T, R).
The use of select/3 instead of member/2 ensures that no element can be selected twice. (This is a standard technique.) With this definition, the query:

?- subset(X, [1]).

has precisely two answers: X = [] ; X = [1]

and the query:

?- subset([A,B], [1,2,3]).

B = 2, A = 1 ;  B = 3, A = 1 ;  B = 1, A = 2 ;
B = 3, A = 2 ;  B = 1, A = 3 ;  B = 2, A = 3.
The three sets {1,2}, {1,3}, and {2,3} are each represented by two lists. It is right that the Prolog code has six answers, because there really are six different lists (without duplicate elements) representing two- element subsets of {1,2,3}.

If we want a one-to-one correspondence between lists and sets, we have to use a data structure with this correspondence built in, such as the sorted list representation. We can build a subset predicate which assumes this representation in Large and thereby implicitly yields it in Small:

subset([], _).
subset([X|Xs], Set) :-
append(_, [X|Set1], Set),
subset(Xs, Set1).

append([], L, L).
append([H|T], L, [H|R]) :-
append(T, L, R).
With this definition, the query:

?- subset([A,B], [1,2,3]).

B = 2, A = 1 ; B = 3, A = 1 ; B = 3, A = 2.

Note, however, that the query:

?- subset([A,B], [3,2,1]).

yields answers where the first argument is not a sorted list. The implementation assumes the validity of the second argument.

If you want a subset predicate where the second argument is in no particular order, but you still get only one list for each possible subset, you have to force the order you want. One way to do that is this:

subset([], _).
subset([X|Xs], Set) :-
select(X, Set, Set1),
subset(Xs, X, Set1).

subset([], _, _).
subset([X|Xs], Z, Set) :-
select(X, Set, Set1),
X @> Z,
subset(Xs, X, Set1).
which will yield sorted subsets whether the second argument is sorted or not, and whether the second argument contains duplicates or not. It requires only that the second argument be ground.

The important point here is: what counts as a valid solution to the question subset(S, L), and hence what counts as a correct implementation of the predicate, depends on what meaning you want subset/2 to have. In other words, it depends on exactly which lists are allowed as representatives of any given set. Other set data structures are possible.

Richard A. O'Keefe
Dept. of Computer Science, RMIT
Melbourne, Australia
Email: ok@goanna.cs.rmit.edu.au
URL: http://www.cs.rmit.edu.au/~ok