MASSON: Discovering Commonalities in Collection of Objects

Using Genetic Programming

Tae-Wan Ryu and Christoph F. Eick
Department of Computer Science
University of Houston
Houston, Texas 77204-3475
{twryu,ceick}@cs.uh.edu


Abstract


Given the current flood of data, automatic tools for searching and analyzing data are necessary, especially for complex databases. Accordingly, knowledge discovery in databases is receiving increasing attention. This paper centers on the problem of discovering the common characteristics shared by a set of objects belonging to an object-oriented database. In our approach, commonalities within a set of objects are described by object-oriented queries that compute this set of objects. The paper discusses the architecture of a knowledge discovery system, called MASSON, which employs genetic programming to find such queries, and presents an example run to illustrate how the system works. We show how interesting queries that describe commonalities within a set of objects are automatically generated, modified, evaluated, and selected, and we discuss how MASSON conducts the search for the "best" query. Specific problems, such as the generation of constants in queries, coping with type violations and other constraints when creating object-oriented queries, and query evaluation, are discussed in some detail.
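To make the idea of "queries that compute a set of objects" concrete, the following is a minimal sketch of an evolutionary search for such a query. It is not MASSON's implementation: the toy collection `DB`, its attributes, the Jaccard-based `fitness`, and all function names are assumptions made for illustration, and a query is simplified to a flat conjunction of conditions rather than the tree-structured programs that genetic programming normally evolves. Constants are drawn from values present in the data, and the operator is chosen to match the constant's type, which loosely mirrors the constant-generation and type-violation issues mentioned above.

```python
import random

# Hypothetical toy object collection; attribute names are illustrative only.
DB = [
    {"id": 1, "age": 25, "dept": "CS"},
    {"id": 2, "age": 41, "dept": "CS"},
    {"id": 3, "age": 38, "dept": "EE"},
    {"id": 4, "age": 52, "dept": "CS"},
    {"id": 5, "age": 29, "dept": "EE"},
    {"id": 6, "age": 47, "dept": "ME"},
]
ATTRS = ["age", "dept"]
OPS = {
    "==": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
}
MAX_CONDITIONS = 3  # assumed size limit to keep queries short


def run_query(query, db):
    """Return the ids of objects satisfying every (attr, op, const) condition."""
    return {o["id"] for o in db if all(OPS[op](o[attr], c) for attr, op, c in query)}


def fitness(query, db, target):
    """Jaccard similarity between the query result and the target set."""
    result = run_query(query, db)
    union = result | target
    return len(result & target) / len(union) if union else 0.0


def random_condition(db, rng):
    """Build one condition; the constant is sampled from the data itself."""
    attr = rng.choice(ATTRS)
    const = rng.choice(db)[attr]
    # Avoid type violations: strings only get equality tests.
    op = "==" if isinstance(const, str) else rng.choice(["<=", ">="])
    return (attr, op, const)


def mutate(query, db, rng):
    """Either append a new condition or replace an existing one."""
    child = list(query)
    if len(child) < MAX_CONDITIONS and rng.random() < 0.5:
        child.append(random_condition(db, rng))
    else:
        child[rng.randrange(len(child))] = random_condition(db, rng)
    return child


def crossover(a, b, rng):
    """Join a prefix of one parent with a suffix of the other."""
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child[:MAX_CONDITIONS] or list(a)  # never return an empty query


def search(db, target, generations=30, pop_size=20, seed=0):
    """Evolve queries toward the target set; returns (best_query, best_fitness)."""
    rng = random.Random(seed)
    population = [[random_condition(db, rng)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda q: fitness(q, db, target), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, best survive
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), db, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda q: fitness(q, db, target))
    return best, fitness(best, db, target)
```

Calling `search(DB, {2, 4})` returns the best query found together with its fitness; a fitness of 1.0 would mean the query selects exactly the target objects, for example the conjunction `dept == "CS"` and `age >= 41` in this toy data.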
