MASSON: Discovering Commonalities in Collection of Objects
Using Genetic Programming
Tae-Wan Ryu and Christoph F. Eick
Department of Computer Science
University of Houston
Houston, Texas 77204-3475
{twryu,ceick}@cs.uh.edu
Abstract
The current flood of data makes automatic tools for searching and analyzing
data necessary, especially for complex databases. Accordingly, knowledge
discovery in databases is receiving increasing attention. This paper
centers on the problem of discovering the common characteristics that are
shared by a set of objects belonging to an object-oriented database. In our
approach, commonalities within a set of objects are described by
object-oriented queries that compute this set of objects. The paper discusses
the architecture of a knowledge discovery system, called MASSON, which employs
genetic programming to find such queries, and presents an example run to
illustrate how the system works. We show how interesting queries that describe
commonalities within a set of objects are automatically generated, modified,
evaluated, and selected, and we discuss how the MASSON system conducts the
search for the "best" query. Specific problems, such as the generation of
constants in queries, the handling of type violations and other constraints
when creating object-oriented queries, and query evaluation, are discussed in
some detail.
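
The generate, modify, evaluate, and select cycle summarized above can be
illustrated with a small sketch. It is not MASSON's implementation: it assumes
objects are plain dictionaries, restricts queries to a single comparison
predicate instead of evolving object-oriented query trees, and uses mutation
only. The names Query, evaluate, fitness, random_query, mutate, and search, the
sample data, and the overlap-based fitness measure are all hypothetical choices
made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A one-predicate query: attr > const or attr <= const (assumed form)."""
    attr: str
    op: str
    const: float

    def matches(self, obj):
        value = obj[self.attr]
        return value > self.const if self.op == ">" else value <= self.const


def evaluate(query, db):
    """Return the indices of the objects that the query computes."""
    return {i for i, obj in enumerate(db) if query.matches(obj)}


def fitness(query, db, target):
    """Overlap between the query result and the target set (1.0 = identical)."""
    result = evaluate(query, db)
    union = result | target
    return len(result & target) / len(union) if union else 1.0


def random_query(db, rng):
    """Generate a query; the constant is drawn from values stored for the
    chosen attribute, so attribute and constant always agree in type."""
    attr = rng.choice(sorted(db[0]))
    const = rng.choice([obj[attr] for obj in db])
    return Query(attr, rng.choice([">", "<="]), const)


def mutate(query, db, rng):
    """Modify a query by flipping its operator or redrawing its constant."""
    if rng.random() < 0.5:
        return Query(query.attr, "<=" if query.op == ">" else ">", query.const)
    new_const = rng.choice([obj[query.attr] for obj in db])
    return Query(query.attr, query.op, new_const)


def search(db, target, generations=20, pop_size=10, seed=0):
    """Keep the fitter half of the population and refill it with mutants."""
    rng = random.Random(seed)
    population = [random_query(db, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda q: fitness(q, db, target), reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), db, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda q: fitness(q, db, target))


# Hypothetical sample data: the target set holds the objects with age > 30.
DB = [
    {"age": 25, "salary": 40},
    {"age": 30, "salary": 52},
    {"age": 41, "salary": 48},
    {"age": 58, "salary": 75},
    {"age": 35, "salary": 39},
]
TARGET = {2, 3, 4}
```

Calling search(DB, TARGET) returns the fittest query found for the sample
target set. Drawing constants from values already stored in the database is
one simple way to sidestep type violations in this toy setting; it is an
assumption of the sketch, not a description of how MASSON handles them.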