COSC 6340 Project2
OLAP, Decision Trees, and Clustering
using MS SQL Server 2000

This project centers on using OLAP and data mining technology for a FoodMart database. It is a group project, and groups should subdivide the work among their members, but it is recommended that each student does Part1 on her/his own (you can help each other) to become familiar with OLAP.

In the second part, "interesting" data analysis problems for the FoodMart database have to be identified, and the MS SQL Server 2000 model generation functions will be used to solve the problems your group selected. If you have any doubts about selecting "good" data analysis problems, feel free to visit Dr. Eick during his office hour. In general, Haili Tu is responsible for Part1 and Part2 of the project, and Dr. Eick is responsible for Part2 and Part3 of the project. Moreover, if you find a "very interesting" problem with respect to the FoodMart database that does not completely match the task specification (given below), feel free to discuss the "alternative" task with Dr. Eick in order to determine its suitability for the project.

Finally, the results of the project have to be summarized in a report, and each group will give a 12-minute presentation about its project results in the last week of the semester.

Part1: OLAP

1.1 Get Familiar with the Multi-Dimensional Model

1.2 Create a Data Cube on Your Own

1.3 Browse the Data Cube Inventory


Part2: Create a Decision Tree Data Mining Model

Analyze the schema of the FoodMart Database. Identify two "interesting" attributes in the database. Using MS SQL Server 2000, create two models (one for each attribute) to predict the value of each attribute. Evaluate the goodness of your predictions. Looking at the decision tree you obtained, what can be said about what is important for predicting a particular value of this attribute? Extra credit will be given if this prediction is done across different levels of granularity with respect to the available dimensions of the chosen data analysis problem. Also feel free to create additional data cubes to facilitate conducting the project.
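One possible way to evaluate the goodness of a prediction is to compare the predicted values against the actual values on cases that were not used for training, and to compare the resulting accuracy against the trivial strategy of always predicting the most frequent value. The Python sketch below illustrates this idea only; it is not part of MS SQL Server 2000. The function names and the "low"/"mid"/"high" labels are hypothetical, and the sketch assumes that the actual and predicted values have been exported from the database as two lists of equal length.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of cases where the predicted value equals the actual value."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def majority_baseline(actual):
    """Accuracy obtained by always predicting the most frequent actual value."""
    if not actual:
        raise ValueError("need a non-empty list")
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of values occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical held-out cases (made-up labels, not taken from FoodMart).
actual = ["high", "low", "low", "mid", "low", "high", "low", "mid"]
predicted = ["high", "low", "low", "low", "low", "mid", "low", "mid"]

print("accuracy:", accuracy(actual, predicted))
print("baseline:", majority_baseline(actual))
for (a, p), n in sorted(confusion_counts(actual, predicted).items()):
    print(f"actual={a:5s} predicted={p:5s} count={n}")
```

A model whose accuracy is not clearly above the baseline has learned little about the attribute, even if the accuracy itself looks high. The confusion counts show which values the model tends to mix up.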

Part3: Create a Clustering Model

Analyze the schema of the FoodMart Database and identify a class of objects in the database that are "interesting" to be clustered. Create an MS clustering model for this class of objects. Interpret the clusters; what can be said about the objects belonging to a particular cluster? Are the results of the clustering algorithm stable?
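One possible way to approach the stability question is to run the clustering more than once (for example with different parameter settings or on different samples of the objects) and to measure how strongly the resulting assignments agree. The Python sketch below computes the Rand index, the fraction of object pairs on which two clusterings agree; this is one of several possible measures and is offered as an illustration, not as the required method. It assumes that the cluster assignment of each object has been exported as a list of cluster ids in the same object order; the function name and the sample runs are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs that both clusterings treat the same way.

    A pair counts as an agreement if both clusterings put the two objects
    in the same cluster, or both put them in different clusters.
    Cluster ids themselves do not need to match between the two runs.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Hypothetical cluster ids for six objects from three clustering runs.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 2]  # same grouping as run_1, only the ids differ
run_3 = [0, 0, 0, 1, 2, 2]  # the third object moved to another cluster

print(rand_index(run_1, run_2))
print(rand_index(run_1, run_3))
```

Values close to 1 across several runs suggest a stable result; clearly lower values suggest that the clusters depend on the particular run.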



There should be a single, comprehensive report per group. In addition to presenting your results of Part1, Part2, and Part3, your report should address the following questions (if your group members disagree, state the majority opinions and minority opinions when answering the questions below):
  • Did you find anything unexpected while doing Part2 and Part3?
  • Evaluate the graphical features to view data cubes. Do you believe that OLAP really helps data analysts "make sense out of data"?
  • Give an evaluation of the MS SQL Server 2000 OLAP, decision tree, and clustering capabilities.
  • Give a summary of what percentage of the group's time was spent conducting each task, on report writing, and on talk preparation.
  • Did you like the project?
  • Which database system did you like more (and why?): MS SQL Server or Oracle 8i?

Deadlines: Group reports are due on Tu., April 23, and group presentations are scheduled during the Th., April 25 class.