Datasets

Format: The datasets are in an annotated transaction format with class labels. Every line is one transaction: a space-separated list of item identifiers (numbered from 0), followed by a final entry that is either 1 or 0 and represents the class label.
The meaning of every identifier is given in the header of the file: lines of the form @<nr>: ... describe item number <nr>, and the @class: ... line describes the two classes. To parse the files correctly, ignore empty lines and all lines starting with @ or %. (The format is a combination of the FIMI format with annotations in the style of the ARFF format.)
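
The parsing rule above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name parse_transactions and the sample header lines are made up for illustration, and it assumes every data line holds only integers with the class label last.

```python
def parse_transactions(lines):
    """Parse annotated transaction lines into (transactions, labels).

    Empty lines and lines starting with '@' or '%' are skipped.
    The last integer on each data line is taken as the class label.
    """
    transactions, labels = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("@", "%")):
            continue  # header annotation, comment, or blank line
        *items, label = [int(tok) for tok in line.split()]
        transactions.append(items)
        labels.append(label)
    return transactions, labels


# Hypothetical file content; the header text is illustrative only.
sample = [
    "@relation example",
    "@class: ...",
    "@0: ...",
    "% a comment line",
    "",
    "0 2 5 1",
    "1 3 0",
]
transactions, labels = parse_transactions(sample)
```

To read a real file, pass the open file object instead of the list, for example parse_transactions(open(path)), where path is whichever dataset file was downloaded.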

Sources: The original datasets were collected from the UCI machine learning repository. More datasets can be found in the FIMI repository, but they are not annotated.

Preprocessing: The preprocessing steps applied to each dataset are recorded in the @relation tag of its file.

Properties: Different datasets have different properties and will behave differently. A key property to watch is density (the fraction of 1's in the binary transaction-by-item representation). Traditional itemset mining focussed on very large and sparse datasets (see the FIMI competitions). In constraint-based mining, dense datasets are considered harder to mine because of the large number of candidates. For discriminative itemset mining, class labels are given; the number of positive transactions is indicated below for each dataset.
The number of itemsets (standard, as well as the closed and maximal condensed representations) is also given for each dataset, both for verification of correctness and as a guideline for usage. LCM ver. 4 was used to find them.
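
As a rough illustration of the density and positive-count properties mentioned above, the following Python sketch computes both from already-parsed data. The names density and count_positive are hypothetical. It assumes the class label is not counted as an item and, unless n_items is passed explicitly, that the number of items is the largest identifier plus one; the figures reported for the datasets may rest on different conventions.

```python
def density(transactions, n_items=None):
    """Fraction of 1's in the binary transaction-by-item matrix.

    Assumption: n_items defaults to (largest item identifier + 1),
    since identifiers are numbered from 0.
    """
    if not transactions:
        return 0.0
    if n_items is None:
        n_items = max((max(t) for t in transactions if t), default=-1) + 1
    if n_items == 0:
        return 0.0
    # Count each item at most once per transaction.
    ones = sum(len(set(t)) for t in transactions)
    return ones / (len(transactions) * n_items)


def count_positive(labels):
    """Number of transactions whose class label is 1."""
    return sum(1 for label in labels if label == 1)


# Toy data: 3 transactions over items 0..3, class labels kept separately.
toy_transactions = [[0, 1], [2, 3], [0, 3]]
toy_labels = [1, 0, 1]
toy_density = density(toy_transactions)      # 6 ones in a 3 x 4 matrix
toy_positive = count_positive(toy_labels)
```

A density close to 1 indicates a dense dataset in the sense used above, while the sparse datasets of the FIMI competitions have densities much closer to 0.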