The 5th International Workshop on Knowledge Discovery in Inductive Databases (KDID'06)
Invited Talk by Kiri Wagstaff
Jet Propulsion Laboratory, California Institute of Technology, USA
Value, Cost, and Sharing: Open Issues in Constrained Clustering
Abstract
Clustering is an important tool for data mining, since it can identify major patterns or trends without any supervision (labeled data). Over the past five years, semi-supervised clustering methods have become very popular, as they can address problems in which some, but not all, supervisory information is available – a situation that commonly occurs with large, real-world databases. These constrained clustering methods began by incorporating pairwise constraints and have developed into more general methods that can learn appropriate distance metrics. However, several important open questions have arisen: 1) Given the recent observation that some constraint sets can adversely impact performance, how can we determine the utility of a given constraint set, prior to clustering? 2) How can we minimize the effort required of the user, by actively soliciting only the most useful constraints? 3) When and how should constraints be propagated or shared with neighboring points? In this talk, I present these questions and suggest future directions for constrained clustering research.
Biography
Kiri Wagstaff is a senior researcher at the Jet Propulsion Laboratory in Pasadena, CA. Her focus is on developing new machine learning methods and applying them to challenging problems such as clustering or classifying surface features and predicting crop yield from Earth orbit. She is particularly interested in onboard data analysis for spacecraft, enabling missions with higher capability and autonomy. Her dissertation, "Intelligent Clustering with Instance-Level Constraints," helped initiate work in the machine learning community on constrained clustering methods.
Homepage: http://www.litech.org/~wkiri/