HMC Software and Datasets
This page contains supporting materials for the paper L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, S. Džeroski, "Predicting gene function using hierarchical multi-label decision tree ensembles", BMC Bioinformatics 2010, 11:2.
Software
The Clus-HMC and Clus-HMC-Ens algorithms are implemented in the Clus system.
Datasets
The datasets used in our experimental comparison come from the field of functional genomics and were kindly provided by Amanda Clare, Zafer Barutcuoglu, Tim Hughes and Fritz Roth. Datasets D1-D18 originate from the organisms S. cerevisiae and A. thaliana and have annotations from the MIPS Functional Catalogue (FunCat) and the molecular function branch of the Gene Ontology; their original versions can be found here. Dataset D19 originates from the organism M. musculus and has annotations from the three branches of the Gene Ontology; the original data can be found here.
The datasets are stored in Weka's ARFF format and are ready to be used with Clus. Each dataset comes as three ARFF files: train, valid, and test. In our article, valid was used to tune the F-test stopping criterion; the final model, built on the union of train and valid, was evaluated on test.
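Building the final model on the union of train and valid requires a combined training file. The Python sketch below shows one way to perform that merge. It assumes both ARFF files declare exactly the same attributes, which should be verified for each dataset, and the helper names `split_arff` and `merge_arff` are hypothetical and not part of Clus.

```python
def split_arff(text):
    """Split ARFF text into (header_lines, data_rows) at the @DATA marker."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "@data":
            header = lines[: i + 1]
            # Keep only non-empty rows that are not '%' comments.
            rows = [
                row for row in lines[i + 1:]
                if row.strip() and not row.lstrip().startswith("%")
            ]
            return header, rows
    raise ValueError("no @DATA line found")


def attribute_lines(header):
    """Return the @ATTRIBUTE declarations of a header, whitespace-trimmed."""
    return [h.strip() for h in header if h.strip().lower().startswith("@attribute")]


def merge_arff(train_text, valid_text):
    """Return ARFF text with train's header, then train rows, then valid rows."""
    train_header, train_rows = split_arff(train_text)
    valid_header, valid_rows = split_arff(valid_text)
    # Assumption: the two files must declare identical attributes.
    if attribute_lines(train_header) != attribute_lines(valid_header):
        raise ValueError("attribute declarations differ between files")
    return "\n".join(train_header + train_rows + valid_rows) + "\n"
```

In practice the two inputs would be read from the train and valid files of one dataset, and the result written to a new file that the Clus settings file points to as training data.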
S. cerevisiae datasets
| FunCat annotated datasets | Gene Ontology annotated datasets |
A. thaliana datasets
| FunCat annotated datasets | Gene Ontology annotated datasets |
M. musculus dataset
Parameter settings for Clus-HMC(-Ens)
- Example settings files to be used with Clus-HMC(-Ens):
- To run Clus-HMC-Ens, include the command line option "-forest" when running Clus.
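As a rough illustration of the option above, the Python sketch below assembles a Clus command line with and without `-forest`. Only the `-forest` option comes from this page. The jar name `Clus.jar`, the `java -jar` launch style, and the settings file name `example.s` are assumptions for illustration and should be adapted to your Clus installation.

```python
import subprocess


def clus_command(settings_file, ensemble=False, jar="Clus.jar"):
    """Build the argument list for a Clus run.

    The jar name and the 'java -jar' launch style are assumptions.
    """
    cmd = ["java", "-jar", jar]
    if ensemble:
        # "-forest" requests Clus-HMC-Ens instead of a single Clus-HMC tree.
        cmd.append("-forest")
    cmd.append(settings_file)
    return cmd


def run_clus(settings_file, ensemble=False, jar="Clus.jar"):
    """Run Clus; raises CalledProcessError if the process exits non-zero."""
    subprocess.run(clus_command(settings_file, ensemble, jar), check=True)
```

Calling `clus_command("example.s", ensemble=True)` yields the argument list for an ensemble run, while omitting `ensemble` gives the single-tree Clus-HMC invocation with the same settings file.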
Data files for figures in the paper
- Pooled AUPRC comparison (Fig. 3): csv file
- Average AUPRC comparison (Fig. 7): csv file
- Average precision at C4.5H/M's recall (Fig. 8): csv file
- AUROC comparison (Fig. 12): csv file
Questions?
Please direct questions about Clus-HMC(-Ens) to Leander Schietgat, Celine Vens, and Jan Struyf.


