This page contains supporting materials for the paper C. Vens, J. Struyf, L. Schietgat, S. Džeroski, H. Blockeel, "Decision trees for hierarchical multi-label classification", Machine Learning 73(2):185-214, 2008.
The datasets used in our experimental comparison are from the field of functional genomics. Amanda Clare kindly provided us with the datasets. The original versions can be found here. We kept the input features but added new class labels. In a first version, we took annotations from the MIPS Functional Catalogue (FunCat); in a second version, we took Gene Ontology (GO) terms. The table below shows the details.
|Annotation scheme||FunCat||Gene Ontology|
|Scheme version||2.1 (2007/01/09)||1.2 (2007/04/11)|
|Avg. number of classes per dataset||492 (6 levels)||3997 (14 levels)|
|Avg. number of labels per example||8.8 (3.2 most specific)||35.0 (5.0 most specific)|
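The two figures in the last row differ because an example annotated with a class is implicitly annotated with all of that class's ancestors. The following minimal sketch illustrates the difference for a tree-shaped hierarchy such as FunCat. It assumes labels are written as paths with `/` between levels (for example `01/01/03`); that encoding, the function name `expand_with_ancestors`, and the example labels are illustrative assumptions, not taken from the dataset files. It does not apply to the Gene Ontology, where a class can have several parents.

```python
def expand_with_ancestors(most_specific):
    """Return the given labels plus all their ancestors in a tree hierarchy.

    Assumes each label is a '/'-separated path from the root, so every
    proper prefix of the path names an ancestor class.
    """
    expanded = set()
    for label in most_specific:
        parts = label.split("/")
        # Each prefix of the path is an ancestor (the full path is the label itself).
        for depth in range(1, len(parts) + 1):
            expanded.add("/".join(parts[:depth]))
    return expanded


# Hypothetical example: 2 most specific labels expand to 5 labels in total.
labels = expand_with_ancestors({"01/01/03", "02/05"})
print(sorted(labels))  # ['01', '01/01', '01/01/03', '02', '02/05']
```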
The datasets are stored in Weka's ARFF format and are ready to be used with Clus. Each dataset comes as three ARFF files: train, valid, and test. In our article, the valid file was used to tune the F-test stopping criterion. The final model, built on the union of train and valid, was evaluated on test.
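To reproduce the final step of this protocol, the train and valid files have to be combined into one training file. The sketch below shows one way to do this with plain text processing. It assumes that both files share an identical header (everything up to and including the `@data` line), which should hold for splits of the same dataset but is worth checking. The function names `split_arff` and `merge_arff` are hypothetical helpers and not part of Clus or Weka.

```python
def split_arff(text):
    """Split ARFF text into (header lines incl. '@data', non-empty data lines)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # The ARFF '@data' keyword is case-insensitive.
        if line.strip().lower() == "@data":
            rows = [row for row in lines[i + 1:] if row.strip()]
            return lines[: i + 1], rows
    raise ValueError("no @data line found")


def merge_arff(train_text, valid_text):
    """Return ARFF text with the train header and the rows of both files.

    Assumption: the valid file declares exactly the same attributes as train,
    so its header can be dropped.
    """
    header, train_rows = split_arff(train_text)
    _, valid_rows = split_arff(valid_text)
    return "\n".join(header + train_rows + valid_rows) + "\n"


# Tiny inline example; real use would read the train and valid files from disk.
train = "@relation demo\n@attribute x numeric\n@data\n1\n2\n"
valid = "@relation demo\n@attribute x numeric\n@data\n3\n"
print(merge_arff(train, valid))
```

The merged text can then be written to a new file and used as the training set, with the test file left untouched for evaluation.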
|FunCat annotated datasets||Gene Ontology annotated datasets|
Please direct questions about Clus-HMC/HSC/SC to Celine Vens, Jan Struyf, and Leander Schietgat.