Project IQ: Inductive Queries for Mining Patterns and Models

1 September 2005 to 31 August 2008


At present, there is no generally accepted framework for data mining, and the quest for such a framework is a major research priority. The most promising approach to this task is that of inductive databases (IDBs), which contain not only data but also patterns. Patterns can be either local patterns, such as frequent itemsets, which describe properties of a subset of the data, or global models, such as decision trees, which are predictive in nature and characterize the whole dataset. In an IDB, inductive queries can be used to generate, manipulate, and apply patterns.

The IDB framework is appealing as a theory for data mining because it employs declarative queries instead of ad hoc procedural constructs. Declarative queries are often formulated using constraints (for example, a query may ask for all itemsets that occur in at least a given number of records), so inductive querying is closely related to constraint-based data mining. The IDB framework is also appealing for data mining applications because it supports the process of knowledge discovery in databases (KDD): the results of one inductive query can be used as input for another, which makes multi-step KDD scenarios possible.

Project IQ aims to develop the theory of, and practical approaches to, inductive querying of global models, as well as methods for answering complex inductive queries that involve both local patterns and global models. Building on these results, the project will develop showcase IDBs in bioinformatics that enable users to query data about drug activity, gene expression, gene function, and protein sequences.

Objectives:

  • To develop a sound theoretical understanding of inductive querying that enables one to build effective inductive database systems and to apply them to significant real-life applications.
  • To develop a number of significant showcase applications of inductive databases in bioinformatics. The key criterion for success will be whether the new inductive querying techniques contribute to new scientific insights. This may be measured by the number and quality of scientific publications in the application domain and by comparing the performance of the new inductive querying techniques with that of existing data mining techniques.
  • To further develop the relevant theory, representations, and primitives for local pattern and global model mining, and to integrate these into expressive inductive query languages that enable one to discover new knowledge from data. The key criterion for success will be whether the new inductive query languages and techniques support a wide range of pattern representations and a wide range of data mining tasks, such as prediction and clustering.
  • To identify the database issues raised by inductive querying, such as optimization, scaling to large datasets, and integration.
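To make the idea of a constraint-based inductive query, and of feeding one query's answer into the next, more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project's query language or any IQ software: the toy transactions, the Apriori-style level-wise search, and the helper names `frequent_itemsets`, `inductive_query`, and `show` are all hypothetical. A query is modeled as a minimum support threshold plus further predicates on each pattern; because every answer is again a set of patterns, it can be passed to another query.

```python
from itertools import combinations

# Hypothetical toy data: each record is a set of items.
TRANSACTIONS = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("ac"),
    frozenset("bcd"),
    frozenset("abcd"),
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining of all itemsets with support >= min_support.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    size = 1
    while level:
        counts = {s: support(s, transactions) for s in level}
        frequent = [s for s, n in counts.items() if n >= min_support]
        for s in frequent:
            result[s] = counts[s]
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate with an infrequent k-subset (support is anti-monotone).
        size += 1
        frequent_set = set(frequent)
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent_set for sub in combinations(c, size - 1))
        ]
    return result


def inductive_query(patterns, *constraints):
    """Select the patterns that satisfy every constraint (hypothetical helper).

    Each constraint is a predicate over (pattern, support); the answer is again
    a pattern set, so it can be used as input to a further query.
    """
    return {p: n for p, n in patterns.items() if all(c(p, n) for c in constraints)}


def show(patterns):
    """Render a pattern set as sorted strings for deterministic output."""
    return sorted("".join(sorted(p)) for p in patterns)


if __name__ == "__main__":
    patterns = frequent_itemsets(TRANSACTIONS, min_support=3)
    print(show(patterns))  # ['a', 'ab', 'ac', 'b', 'bc', 'bd', 'c', 'd']

    # Query 1: frequent itemsets that contain item "b" and have at least two items.
    q1 = inductive_query(patterns, lambda p, n: "b" in p, lambda p, n: len(p) >= 2)
    print(show(q1))  # ['ab', 'bc', 'bd']

    # Query 2 takes the answer of query 1 as its input (a two-step scenario).
    q2 = inductive_query(q1, lambda p, n: "d" not in p)
    print(show(q2))  # ['ab', 'bc']
```

In this sketch only the minimum support constraint is pushed into the search, where its anti-monotonicity allows candidates to be pruned early; the other constraints are applied afterwards as filters. Deciding which constraints can be pushed into the mining step is one example of the optimization issues mentioned in the objectives.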