Recent updates
- "CPSM: Constraint Programming for Sequence Mining" to be presented at CPAIOR 2015, paper and software
- "Modeling in MiningZinc" detailed instructions and examples, used at the ICON summer school 2014.
- Slides of my Frontiers of AI talk @ ECAI 2014, "Constraint Solving in Data Mining".
- "CCCG: Constrained Clustering using Column Generation" presented at CPAIOR 2014, paper and software
- "MiningZinc: A Language for Constraint-based Mining" presented at IJCAI 2013, paper and software
- "Mining local staircase patterns" presented at CoClus workshop 2012, paper and software
Introduction
Constraint Programming for Itemset Mining (CP4IM) is a declarative approach to constraint-based itemset mining.
Instead of hand-crafting imperative algorithms, in constraint programming you declaratively specify a problem by means of the constraints it needs to satisfy. A generic solver will then effectively search for the solutions that satisfy the constraints.
In L. De Raedt, T. Guns, S. Nijssen. Constraint programming for itemset mining, KDD 2008,
we have shown that this is a viable approach for many pattern mining problems such as frequent, closed, cost-based itemset mining and more.
An accessible introduction is available in T. Guns, S. Nijssen, L. De Raedt. Itemset mining: A constraint programming perspective, Artificial Intelligence 175(12-13), 2011.
This website hosts an overview of our publications, the software we have developed as well as the datasets we used.
We additionally aim to gather information on the use of constraint programming for pattern mining in general.
Contact Tias Guns with questions or comments regarding this website.
Constraint-Based Mining
Mining all (or top-k) itemsets that satisfy the constraints.- L. De Raedt, T. Guns, S. Nijssen. Constraint programming for itemset mining, KDD 2008. PDF related video Presents FIM_CP, the most flexible itemset mining framework to date. Some capabilities of the framework:
- Different interestingness measures including frequent itemsets, discriminating itemsets and emerging itemsets.
- Different condensed representations including closed itemset mining (formal concept learning), delta-closed and maximal itemset mining.
- Different item constraints where one can put a treshold on both the minimum or the maximum value. For example on the size of the itemset, or in case the cost over every item is known, the total or average cost of the itemset.
- Combining constraints all of the above constraints can be easily combined, as well as any other constraint that one can express. This requires minimal work thanks to the declarative specification language.
- S. Nijssen, T. Guns, L. De Raedt. Correlated itemset mining in ROC space: A constraint programming approach, KDD 2009. PDF video Presents CIMCP, the discriminative/correlated itemset mining framework with the most effective pruning to date.
- Any convex or monotone function using the number of positive and negative examples covered can be used. Examples of convex functions are information gain, chi-square, gini index and fisher score. Examples of monotone functions are accuracy, relative accuracy and laplace.
- Many different domains use this kind of functions, including contrast set mining, emerging pattern mining and subgroup discovery, but it has also been named k-optimal pattern discovery, interesting itemset mining, emerging itemset mining, discriminative itemset mining and correlated itemset mining. Lastly, many of the functions used have their origin in rule learning.
- Other constraints can be easily added in the search, as CIMCP is built on FIM_CP.
- L. De Raedt, T. Guns, S. Nijssen. Constraint programming for data mining and machine learning, AAAI 2010. PDF slides An AAAI 'nectar track' paper with a 5 page synthesis on the KDD 08 and KDD 09 papers and an outlook on the future. For a more detailed version, see our AIJ journal paper.
- S. Nijssen, T. Guns. Integrating constraint programming and itemset mining, ECML PKDD 2010. PDF slides Shows how a framework can be built that inherits the generality of CP systems and the efficiency of specialized mining algorithms.
- T. Guns, S. Nijssen, L. De Raedt. Itemset mining: A constraint programming perspective, Artificial Intelligence 175(12-13), 2011. PDF related video A journal paper of the KDD 08 and 09 papers, with many examples and detailed comparisons with existing mining techniques. Software and examples available online.
Also uses corrmine, a specialised correlated itemset miner using the same principles of CIMCP, but written from scratch in C++. Because it implements just this one task, it is faster and more scalable then CIMCP, but less general.
Software and examples available online.
Software available online.
Pattern Set Mining
Mining a small set (e.g. of size k) of patterns that together satisfy the constraints.-
T. Guns, S. Nijssen, L. De Raedt. k-Pattern set mining under constraints, IEEE TKDE 2011.
PDF
Winner of the Innovation Award at the Lorentz workshop on Mining Patterns and Subgroups, 2010.
Presents both a high-level language in which to express k-pattern set mining problems, as well as a transformation from this language to constraint programming. We show how many known data mining and machine learning tasks can be seen as instances of k-pattern set mining, including concept learning, conceptual clustering, tiling and redescription mining. These tasks only differ in the constraints that are imposed, both at the level of individual patterns (local) and the entire pattern set (global).
- T. Guns, S. Nijssen, L. De Raedt. Evaluating pattern set mining strategies in a constraint programming framework, PAKDD 2011. PDF slides Studies, compares and evaluates search strategies for pattern set mining. Employs concept-learning as a benchmark and uses CP4IM to formulate and implement the key components of the different strategies, including the k-pattern set mining strategy.
- T. Guns, S. Nijssen, A. Zimmermann, L. De Raedt. Declarative heuristic search for pattern set mining, DPM (ICDMW) 2011. PDF slides Traditional constraint programming uses exhaustive search. This paper explores the use of heuristic search in a declarative constraint programming language.
Software available online.
Software available online.
Declarative Comet models available online.
More Settings
- T. Guns, S. Hong, M. Kathleen, S. Nijssen. Cis-regulatory module detection using constraint programming, BIBM 2010. PDF A bio-informatics application in which the CP4IM framework is used and extended with a domain-specific constraint.
- S. Nijssen, A. Jiminez, T. Guns. Constraint-based pattern mining in multi-relational databases, DPM (ICDMW) 2011. PDF Constraint-based multi-relational mining using a declarative graphical language (close to ER) that can be translated into constraint programming. Supports many constraints, including (anti) monotonicity and closedness constraints, over complex aggregates involving multiple relations.
- L. De Raedt, S. Nijssen. Machine Learning and Data Mining: Challenges and Opportunities for Constraint Programming, CP 2011 tutorial. slides
- L. De Raedt, S. Nijssen. Towards Programming Languages for Machine Learning and Data Mining, ISMIS 2011 invited talk. DOI
Software available online.
Software available online (soon).