CIMCP, correlated and discriminative itemset mining.
CIMCP is a correlated/discriminative itemset mining system that uses constraint programming. It prunes the search space efficiently for correlation measures such as information gain and chi-square, and for other discriminative measures such as accuracy and Laplace.
The system can also mine all itemsets that could be optimal according to a correlation measure, namely all the itemsets on the convex hull in PN-space.
CIMCP is fully compatible with, and builds on, FIM_CP and Gecode, an open and efficient constraint solver written in C++.
Download latest version: CIMCP 2.1
Read the Download and Installation instructions.
Questions and bug reports can be sent to tias.guns@cs.kuleuven.be
When writing a publication that uses this software, you can cite the KDD 2009 paper as follows:
Siegfried Nijssen, Tias Guns, Luc De Raedt: Correlated Itemset Mining in ROC Space: A Constraint Programming Approach. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09), Paris, France, 2009.
The Models
The models are divided into three categories, depending on their constraints:
- Convex functions: Models that optimize a convex (correlation) function, such as information gain, chi-square, gini index and Fisher score.
- Monotone functions: Models that optimize a monotone function, such as accuracy, relative accuracy and Laplace.
- Convex hull: A model that finds all the itemsets on the convex hull in PN-space.
Every model has a link to its source file. Ignore the rest of the Doxygen documentation, as it does not capture the structure that most of the models use.
For these models we implemented a custom constraint in Gecode. If you want to do something similar, look at the source of constraint_Fconvex.cpp. An example of a stand-alone constraint is FconvexBoolGq; the file also contains further propagators to support reification.
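The kind of bound that a constraint on a convex measure can exploit can be sketched in Python as follows. This is not the code in constraint_Fconvex.cpp. It only illustrates the general fact that a convex function over a rectangle of possible (p, n) coverage values reaches its maximum at one of the four corners. The names convex_upper_bound and toy_measure, and the choice P = N = 10, are assumptions made for the illustration.

```python
def convex_upper_bound(f, p_lo, p_hi, n_lo, n_hi):
    """Upper bound of a convex function f(p, n) over the rectangle
    [p_lo, p_hi] x [n_lo, n_hi]: the maximum over its four corners."""
    corners = [(p_lo, n_lo), (p_lo, n_hi), (p_hi, n_lo), (p_hi, n_hi)]
    return max(f(p, n) for p, n in corners)


def toy_measure(p, n, P=10, N=10):
    """Hypothetical convex measure: squared difference of the
    positive and negative coverage rates."""
    return (p / P - n / N) ** 2


# If the bound is below a required threshold, no itemset whose coverage
# lies in the rectangle can reach the threshold, so the branch can be pruned.
bound = convex_upper_bound(toy_measure, 0, 3, 0, 4)
can_prune = bound < 0.5
```

Here p and n stand for the number of positive and negative transactions covered by an itemset, and the rectangle stands for the lower and upper bounds on those counts during search.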
Information Gain | (source code preview: cimcp_infgain.cpp) |
Description: This model finds discriminative patterns, using information gain as measure
Usage example:
Specific options for cimcp_infgain:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
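As an illustration of the measure itself, information gain can be written as a function of a pattern's coverage in PN-space. The sketch below uses the textbook definition and assumes p and n are the covered positive and negative transactions and P and N the class totals; the exact formulation in cimcp_infgain.cpp may differ.

```python
import math


def entropy(a, b):
    """Binary entropy (in bits) of a group with a positives and b negatives."""
    total = a + b
    if total == 0:
        return 0.0
    h = 0.0
    for count in (a, b):
        if count > 0:
            q = count / total
            h -= q * math.log2(q)
    return h


def information_gain(p, n, P, N):
    """Entropy reduction obtained by splitting the data into the
    transactions covered by the pattern and those not covered."""
    total = P + N
    covered = p + n
    uncovered = total - covered
    return (entropy(P, N)
            - covered / total * entropy(p, n)
            - uncovered / total * entropy(P - p, N - n))
```

A pattern that covers exactly one class gets the maximal gain, and a pattern that covers both classes in proportion to their size gets a gain of zero.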
Chi-Square | (source code preview: cimcp_chi2.cpp) |
Description: This model finds discriminative patterns, using chi-square correlation as measure
Usage example:
Specific options for cimcp_chi2:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
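For reference, the chi-square statistic of a pattern can be computed from its 2x2 contingency table. The sketch below uses the standard Pearson definition, with p and n the covered positive and negative transactions and P and N the class totals; these names are assumptions, and the computation in cimcp_chi2.cpp may be organised differently.

```python
def chi_square(p, n, P, N):
    """Pearson chi-square statistic of the 2x2 table
    (covered / not covered) x (positive / negative)."""
    total = P + N
    observed = [[p, n], [P - p, N - n]]
    row_totals = [p + n, total - (p + n)]
    col_totals = [P, N]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty rows or columns
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat
```

With this definition a pattern that separates the classes perfectly scores P + N, and a pattern that is independent of the class scores zero.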
Gini Index | (source code preview: cimcp_gini.cpp) |
Description: This model finds discriminative patterns, using gini index as measure
Usage example:
Specific options for cimcp_gini:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
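A minimal sketch of a gini-based measure in PN-space, assuming the usual "reduction in gini impurity" definition. The function names are illustrative, and cimcp_gini.cpp may scale or arrange the terms differently.

```python
def gini_impurity(a, b):
    """Gini impurity of a group with a positives and b negatives."""
    total = a + b
    if total == 0:
        return 0.0
    return 1.0 - (a / total) ** 2 - (b / total) ** 2


def gini_gain(p, n, P, N):
    """Impurity reduction obtained by splitting the data into the
    transactions covered by the pattern and those not covered."""
    total = P + N
    covered = p + n
    uncovered = total - covered
    return (gini_impurity(P, N)
            - covered / total * gini_impurity(p, n)
            - uncovered / total * gini_impurity(P - p, N - n))
```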
Fisher Score | (source code preview: cimcp_fisher.cpp) |
Description: This model finds discriminative patterns, using Fisher score as measure
Usage example:
Specific options for cimcp_fisher:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
Accuracy | (source code preview: cimcp_accuracy.cpp) |
Description: This model finds discriminative patterns, using accuracy as measure
Usage example:
Specific options for cimcp_accuracy:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0.5
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
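To make the monotone case concrete, the sketch below uses one common definition of rule accuracy: the fraction of transactions classified correctly when the pattern predicts the positive class. This definition and both function names are assumptions, and cimcp_accuracy.cpp may define the measure differently. Because such a measure grows with p and shrinks with n, its best possible value over a range of coverages is found at the largest p and the smallest n.

```python
def accuracy(p, n, P, N):
    """Covered positives and uncovered negatives count as correct."""
    return (p + (N - n)) / (P + N)


def accuracy_upper_bound(p_hi, n_lo, P, N):
    """Best accuracy reachable when at most p_hi positives and
    at least n_lo negatives are covered."""
    return accuracy(p_hi, n_lo, P, N)
```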
Relative Accuracy | (source code preview: cimcp_accuracyRel.cpp) |
Description: This model finds discriminative patterns, using relative accuracy as measure
Usage example:
Specific options for cimcp_accuracyRel:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
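The page does not define relative accuracy. As an assumption, the sketch below uses weighted relative accuracy from the subgroup discovery literature, which compares the class distribution inside the pattern's cover with the overall class distribution. The definition in cimcp_accuracyRel.cpp may differ.

```python
def weighted_relative_accuracy(p, n, P, N):
    """Coverage of the pattern times the gain in positive rate
    inside the cover compared with the whole dataset."""
    total = P + N
    covered = p + n
    if covered == 0:
        return 0.0
    return covered / total * (p / covered - P / total)
```

Under this definition the value is zero when the pattern covers both classes in proportion to their size, and positive when positives are over-represented in the cover.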
Laplace | (source code preview: cimcp_laplace.cpp) |
Description: This model finds discriminative patterns, using laplace as measure
Usage example:
Specific options for cimcp_laplace:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
    -delta (floating point value) default: 0
        delta parameter
    -alpha (unsigned int) default: 1
        alpha parameter
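For orientation, the Laplace estimate for a two-class problem is usually written as (p + 1) / (p + n + 2), where p and n are the covered positive and negative transactions. The sketch below assumes that standard form; whether cimcp_laplace.cpp uses exactly this expression is not stated on this page.

```python
def laplace(p, n):
    """Laplace-corrected precision of a pattern for two classes."""
    return (p + 1) / (p + n + 2)
```

The estimate increases with p and decreases with n, which is what places it among the monotone measures: a pattern covering nothing scores 0.5, and the score approaches 1 as more positives and no negatives are covered.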
Convex Hull | (source code preview: cimcp_convexhull.cpp) |
Description: This model finds all patterns on the convex hull (for both classes)
Usage example:
Specific options for cimcp_convexhull:
    -output (none, normal, cpvars) default: normal
        type of output of solutions
          none: do not output solutions
          normal: print solutions (FIMI-style)
          cpvars: print the CP variables of the solutions)
    -cclause (unsigned int) default: 1
        coverage constraint using clause ?
    -datafile (filename with extention) default: example.txt
        filename of dataset to use (any name)
    -solfile (filename with extention) default:
        filename to write solutions to (any name)
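What "on the convex hull in PN-space" means can be illustrated on a small set of points. The sketch below is not how cimcp_convexhull.cpp works, since the system finds hull itemsets during constraint-based search. It applies the standard monotone chain algorithm to coverage points that are already known, written as (n, p) pairs. The example points and the inclusion of the two end points (0, 0) and (N, P) are assumptions made for the illustration.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex hull of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for q in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    upper = []
    for q in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical patterns with (n, p) coverage, for class sizes P = N = 5.
coverage = {"a": (1, 4), "b": (3, 3), "c": (4, 1), "d": (2, 2)}
hull = convex_hull(list(coverage.values()) + [(0, 0), (5, 5)])
on_hull = sorted(name for name, point in coverage.items() if point in hull)
```

In this example patterns a and c lie on the hull, one on each side of the diagonal, which corresponds to one pattern per class. Patterns b and d lie on the diagonal and are not hull vertices.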