» ASP Competition

Errata

Errors were found in encodings and/or checkers of three benchmarks after the competition took place. Below we discuss these errors in detail and their possible impact on the outcome of the competition.

The errors on this page have been identified by Broes De Cat and Johan Wittocx.

We offer our sincere apologies for these errors.

  • Labyrinth: The labyrinth encoding on Asparagus assumes that a solution can always be found by pushing the row or column containing the goal. It is obvious that this is not always the case, for example in the following problem instance depicted below. This problem has a solution in one step, by pushing the middle column to north (push(2,n,1)). The shortest solution found by the asparagus encoding has two actions (push(3,e,1), push(2,n,2)).




    Unfortunately, this error was not detected because all solvable test and competition instances of this benchmark could be solved by a series of pushes to the tile containing the goal.

    The Asparagus encoding was used by at least 8 teams.

  • LabyrinthOptimize: the Asparagus encoding for the labyrinth benchmark was used also in the optimization version of this problem.

    The Asparagus encoding was used by 3 teams.

  • EdgeMatching: The Asparagus encoding and the DLV encoding contain the rule
    "rotate(R,0,R) :- rotation(R)." with a type mismatch on R: in "rotate(R,0,R)", R should range over {left,right,top,bottom} whereas in "rotation(R)", R ranges over {0,90,180,270}. As a consequence, the predicate "rotation" contains wrong tuples for rotation 0 and the constraints ensuring the color matches are trivially satisfied for any pair of tiles involving an unrotated tile. The checker of EdgeMatching is based on the Asparagus encoding and therefore, did not detect this error.

    The picture below shows an erroneous constellation of two tiles that is accepted by the checker of the edgematching problem (with 2 tiles) for the input instance available here.



    The Asparagus encoding was used by at least 8 teams, the DLV encoding only by the DLV team.



Impact on the competition results

In all three cases, the errors in the encodings make the benchmark problems computationally easier for the ASP solvers that use them. In the case of Labyrinth, the Asparagus encoding solves a variant of the benchmark that is computationally simpler than the actual benchmark. In the case of EdgeMatching, the same can be said for the encodings of Asparagus and of DLV.

In the three benchmarks, teams that did not use the erroneous encodings had a clear disadvantage. These are:

  • Labyrinth and LabyrinthOptimize: DLV and IDP
  • EdgeMatching: bpsolver-CLP(FD), IDP and Enfragmo
Inspection of the detailed results of the competition show that in the track "Optimization problems" to which LabyrinthOptimize belongs, margins between different contenders were quite large and the error most likely had no impact on the ranking.

The impact is possibly larger in the track "Decision problems in NP", to which Labyrinth and EdgeMatching belong. If we drop these 2 benchmarks and we compare total number of solved instances, then Potassco still leads with a good margin, followed by single system teams IDP, Claspfolio and Cmodels: here, IDP overtakes Claspfolio and Cmodels. This does not guarantee that IDP would have been first single system in this track. Indeed, rankings were made on the basis of weighted sums of number of solved instances for individual benchmarks and these weights depended on various parameters.

The only fully correct way to recompute rankings would be to rerun all solvers on all instances of the reduced set of benchmarks. We have decided against this.


Lessons for the future

In our opinion, a modeling competition like this ASP competition is valuable and gives valuable information on various aspect of modeling and solving (see the competition report).

Unfortunately, in such a competition, there is an inherent risk that bugs go unnoticed or even occur in the programs that should check for correctness. Based on our experience, here are some ideas that would have led to an early detection of the above problems and could help to reduce the risk for errors in future modeling competitions:

  • Multiple checkers: To test correctness of solutions, one could use more than one checker program. This would have shown the problem with EdgeMatching.

  • Visualization methods: The organizers could provide, for each benchmark, a visualization method for benchmark solutions. This is the way the bugs in Labyrinth and EdgeMatching were discovered. Several good visualization programs for structures/answer sets are available.

  • Apply the "small scope hypothesis": if there are bugs in an encoding, then probably they will show up in small domains. This idea could be exploited in various ways.

    • Exhaustive list of solutions for some test instances: The organizers could provide for some small problem instances an exhaustive list of all solutions. Since every benchmark encoding can easily be converted in a checker, each team could verify that its encoding accepts all solutions.

    • Exhaustive list of test instances: In the previous idea, it is important that the set of problem instances is large enough. ASP or FO(.) benchmark encodings can easily be converted in a generator that for a given input domain, generates all combinations of problem instances and solutions within this domain. The organizers could provide all combined solutions in the context of a fixed small domain.

    • Number of solutions: The above method might be infeasible because even for a small domain, the amount of problem instances and solutions may be prohibitive. A more compact method might be to provide the number of solutions for a given instance, or the number of combinations of instances and solutions for a given small domain.

    • A test script: Alternatively, the organizers could provide a script that generates a large number of combinations of instances and solutions and tests whether a benchmark encoding accepts all of them. The input of this script would be the participants benchmark encoding and a (small) domain or (small) problem instance.

For this competition, several combinations of the above techniques would have led to a timely detection of the above errors.