HMC Software and Datasets
This page contains supporting materials for the paper C. Vens, J. Struyf, L. Schietgat, S. Džeroski, H. Blockeel, "Decision trees for hierarchical multi-label classification", Machine Learning 73(2):185-214, 2008.
- Download pre-print in PDF format.
Software
The Clus-HMC algorithm is implemented in the Clus system.
Datasets
The datasets used in our experimental comparison come from the field of functional genomics. Amanda Clare kindly provided us with the datasets. The original versions can be found here. We kept the input features but added new class labels. In a first version, we took annotations from the MIPS Functional Catalogue (FunCat); in a second version, we took Gene Ontology terms. The table below shows the details.
| | FunCat | Gene Ontology |
| Scheme version | 2.1 (2007/01/09) | 1.2 (2007/04/11) |
| Yeast annotations | 2007/03/16 | 2007/04/07 |
| Total classes | 1362 | 22960 |
| Average number of classes per dataset | 492 (6 levels) | 3997 (14 levels) |
| Average number of labels per example | 8.8 (3.2 most specific) | 35.0 (5.0 most specific) |
The datasets are stored in Weka's ARFF format and are ready to be used with Clus. For each dataset, there are three ARFF files: train, valid, and test. In our article, the valid file was used to tune the F-test stopping criterion. The final model, constructed on the union of train and valid, was then evaluated on test.
| FunCat annotated datasets | Gene Ontology annotated datasets |
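The train/valid/test protocol described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not the Clus implementation: the function `select_and_evaluate`, the callables `fit` and `score`, the toy data, and the candidate parameter values are all hypothetical names and numbers chosen for the example.

```python
from typing import Callable, List, Sequence, Tuple


def select_and_evaluate(
    train: Sequence[float],
    valid: Sequence[float],
    test: Sequence[float],
    candidates: Sequence[float],
    fit: Callable[[Sequence[float], float], float],
    score: Callable[[float, Sequence[float]], float],
) -> Tuple[float, float]:
    """Tune one parameter on valid, refit on train+valid, score once on test."""
    # Step 1: pick the candidate whose model, trained on train only,
    # scores best on the validation set (higher score is better).
    best = max(candidates, key=lambda f: score(fit(train, f), valid))
    # Step 2: rebuild the final model on the union of train and valid.
    final_model = fit(list(train) + list(valid), best)
    # Step 3: the test set is touched only once, for the final evaluation.
    return best, score(final_model, test)


# Toy stand-ins (assumptions for illustration only): the "model" is a scaled
# mean, and the score is the negated absolute error against the evaluation mean.
def toy_fit(data: Sequence[float], f: float) -> float:
    return f * sum(data) / len(data)


def toy_score(model: float, data: Sequence[float]) -> float:
    return -abs(model - sum(data) / len(data))


train_set: List[float] = [1.0, 3.0]
valid_set: List[float] = [2.0, 2.0]
test_set: List[float] = [2.0]

best_param, test_score = select_and_evaluate(
    train_set, valid_set, test_set, [0.5, 1.0, 2.0], toy_fit, toy_score
)
```

In a real run, `fit` would invoke the learner with a given F-test significance level and `score` would compute the chosen evaluation measure on the held-out file.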
Extra materials
- Example settings files to be used with Clus (to run Clus in HMC mode):
- Optimal HMC ftest values for all datasets (Clus automatically selects the optimal f-value for the area under the average PR curve with the above settings files).
- Scripts to run the SC and HSC settings (run_sc.pl, run_hsc.pl), and scripts to create precision-recall curves based on a fixed set of thresholds (prcurves.pl, computepr.pl, ipol_pr.pl). These are also included in the "data/church_FUN" directory that is distributed with the Clus software.
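As a rough illustration of what computing a precision-recall curve on a fixed set of thresholds involves, the following Python sketch pools true positives, false positives, and false negatives over all classes at each threshold. It is not a port of the Perl scripts listed above: the function name `pr_points`, the pooling over classes, the toy scores, and the convention of reporting precision 1.0 when nothing is predicted are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def pr_points(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with counts pooled over all classes.

    scores[i][j] is the predicted score of example i for class j;
    labels[i][j] is 1 if example i truly has class j, else 0.
    """
    points = []
    for t in thresholds:
        tp = fp = fn = 0
        for score_row, label_row in zip(scores, labels):
            for s, y in zip(score_row, label_row):
                predicted = s >= t  # a class is predicted if its score reaches t
                if predicted and y:
                    tp += 1
                elif predicted and not y:
                    fp += 1
                elif not predicted and y:
                    fn += 1
        # Assumed convention: precision is 1.0 when no class is predicted.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((t, precision, recall))
    return points


# Two examples, two classes (made-up numbers).
toy_scores = [[0.9, 0.2], [0.6, 0.7]]
toy_labels = [[1, 0], [0, 1]]
points = pr_points(toy_scores, toy_labels, [0.5, 0.8])
```

Using the same thresholds for every dataset and method yields curves that can be compared point by point; how the points are interpolated and how the area under the curve is computed is left out of this sketch.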
Questions?
Please direct questions about Clus-HMC/HSC/SC to Celine Vens, Jan Struyf, and Leander Schietgat.


