Homework 2: Machine Learning, Neural Networks, and Knowledge Discovery
Dr. Eick's Graduate AI-class (COSC 6368)


available points: 45 points
Problem 4 is a group task; problem 5 is an individual task.
Deadlines: The group report is due Nov. 11, 11 p.m.; other material is due in class on Nov. 13, 2001.
last updated: October 17, 2:43p

4) Group Project: Knowledge Discovery for the NBA using Decision Trees (using a decision tree tool, learning about data analysis and knowledge discovery --- 35 points) The goal of this project is to explore how decision tree tools can help in predicting the field goal percentage (FG%) of a basketball player, using the following NBA-Player Data Set and the popular decision tree tool C5.0. Field goal percentage is classified as HIGH (46% or more) or LOW (less than 46%).
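The class definition above can be written down as a small helper. This is only a sketch: the function name `fg_class` and the assumption that FG% is given in percent (e.g. 47.3, not 0.473) are illustrative and not part of the assignment.

```python
HIGH_THRESHOLD = 46.0  # percent; taken from the HIGH/LOW definition above


def fg_class(fg_percent: float) -> str:
    """Map a field goal percentage (in percent) to the class HIGH or LOW."""
    # 46% itself counts as HIGH ("46% or more").
    return "HIGH" if fg_percent >= HIGH_THRESHOLD else "LOW"
```

If the data set stores FG% as a fraction, the value would have to be multiplied by 100 first.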

Tasks to be solved and questions to be answered:

Convert the NBA data set into the proper C5.0 format. When evaluating the performance of the learned trees on the data set, use 3-fold cross validation: take the NBA data set and create a benchmark by partitioning it 3 times into a training set of approx. 120 players and a test set of 60 players (make sure that the two classes occur with similar frequency in the three sets). Run C5.0 with various parameter settings, and collect and interpret the results. Report the best decision tree or trees you found, together with the parameter settings that were used to generate them.

Analyze the decision trees that had a high accuracy in predicting good shooters. What do they tell us about the classification problem we were trying to solve? For example: Did the decision tree approach work well or badly for the problem at hand? Did we find anything interesting that distinguishes successful shooters from unsuccessful shooters in the NBA? Did we find which attributes have a strong impact on a player's capability to have a high field goal percentage? Did your results match your expectations? Were your results stable? Did you obtain any interesting observations concerning the size of the learnt decision trees; is there a particular size that works best for the classification problem?
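One way to build the 3-fold benchmark with balanced classes is sketched below. All names here (`stratified_folds`, `train_test_splits`, `to_c50_line`, the `"class"` key, and any attribute names passed in) are hypothetical. The comma-separated line with the class label last is an assumption about the C5.0 .data layout, which should be checked against the C5.0 documentation and your own .names file.

```python
import random
from collections import defaultdict


def stratified_folds(players, k=3, seed=0):
    """Split players into k folds with roughly equal class frequencies.

    `players` is a list of dicts, each with a "class" key (HIGH or LOW).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for p in players:
        by_class[p["class"]].append(p)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the shuffled players of one class round-robin over the folds.
        for i, p in enumerate(members):
            folds[i % k].append(p)
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs; each fold serves as the test set once."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


def to_c50_line(player, attributes):
    """Format one player as a comma-separated case, class label last."""
    values = [str(player[a]) for a in attributes]
    return ",".join(values + [player["class"]])
```

With 180 players this gives three test sets of 60 players, each paired with a training set of the remaining 120. Each pair can then be written to its own training and test file before running C5.0.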

Moreover, write an 8-9 page (single-spaced) report that summarizes the findings and results of your project, and prepare a 15-minute group presentation for Tuesday, November 13, 2001, during our regular class hours.

5) Neural Networks (Individual task: paper and pencil --- 10 points) Give a neural network (with an architecture of your own choice) that uses the step activation function and computes the following function X=f(A,B,C) of the inputs A, B, and C:

A B C X
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1

Show how your neural network computes the answer for the third and fourth examples.
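A candidate network can be checked against the truth table with a few lines of code. The sketch below uses one possible architecture (two hidden units and one output unit); it is not the only solution and not necessarily the intended one. The weights, the thresholds, and the convention that the step function outputs 1 for inputs greater than or equal to 0 are all assumptions made for illustration.

```python
def step(x):
    """Step activation: 1 if the net input is >= 0, else 0 (assumed convention)."""
    return 1 if x >= 0 else 0


def network(a, b, c):
    """Hypothetical 3-2-1 network; C also feeds the output unit directly."""
    h_and = step(a + b - 1.5)    # fires only when A = 1 and B = 1
    h_nor = step(-a - b + 0.5)   # fires only when A = 0 and B = 0
    # Output unit: fires if any of h_and, h_nor, C is 1.
    return step(h_and + h_nor + c - 0.5)


# Truth table from the problem statement: (A, B, C) -> X
TRUTH_TABLE = {
    (0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 1,
}

# Collect every input on which the candidate disagrees with the table.
mismatches = [abc for abc, x in TRUTH_TABLE.items() if network(*abc) != x]
```

For the third example (A=0, B=1, C=0), both hidden units receive a net input of -0.5 and stay at 0, so the output unit sees -0.5 and returns 0. For the fourth example (A=0, B=1, C=1), the hidden units are again 0, but C raises the output unit's net input to 0.5, so it returns 1.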