4) Group Project: Knowledge Discovery for the NBA using Decision Trees (using
a decision tree tool, learning about data analysis and knowledge
discovery --- 35 points)
The goal of this project is to explore how decision tree
tools can help predict the field goal percentage (FG%) of a basketball
player, using the NBA-Player Data Set and the popular decision tree
tool C5.0. Field goal percentage is treated as a two-valued class:
HIGH (46% or more) and LOW (less than 46%).
Tasks to be solved and questions to be answered: Convert the NBA data set into
the proper C5.0 format.
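A minimal sketch of the conversion step, assuming the usual See5/C5.0 file layout: a `.names` file that lists the classes first and then one definition per attribute, and a `.data` file with one comma-separated case per line and the class as the last value. The attribute names `height` and `minutes` and the key `fg_pct` are hypothetical placeholders, not the columns of the actual NBA-Player Data Set. FG% itself is left out of the attribute list because it defines the class.

```python
def fg_class(fg_percent, threshold=46.0):
    """Map a field goal percentage to the class label used in this project."""
    return "HIGH" if fg_percent >= threshold else "LOW"


def to_c50(players, attributes, fg_key="fg_pct"):
    """Return (names_text, data_text) for a list of player dicts.

    `attributes` are assumed to be numeric, so each is declared `continuous`.
    `fg_key` is a hypothetical name for the FG% column.
    """
    # Classes first, then one definition per attribute (assumed .names layout).
    names_lines = ["HIGH, LOW."]
    names_lines += [f"{name}: continuous." for name in attributes]

    # One case per line: attribute values, then the class label last.
    data_lines = []
    for player in players:
        values = [str(player[name]) for name in attributes]
        values.append(fg_class(player[fg_key]))
        data_lines.append(",".join(values))

    return "\n".join(names_lines) + "\n", "\n".join(data_lines) + "\n"


# Hypothetical two-player example, not real data.
example_players = [
    {"height": 201, "minutes": 30.5, "fg_pct": 46.0},
    {"height": 190, "minutes": 12.0, "fg_pct": 41.2},
]
names_text, data_text = to_c50(example_players, ["height", "minutes"])
```

Writing `names_text` to `nba.names` and `data_text` to `nba.data` (the file stem `nba` is again only an example) would give C5.0 a matching pair of input files.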
When evaluating performance on the data set, use 3-fold cross validation:
take the NBA data set and create a benchmark by partitioning it three times
into a training set of approx. 120 players and a test set of 60 players
(make sure that the two classes occur with similar frequency in each of
the three sets). Run C5.0 with various parameter settings, then collect
and interpret the results. Report the best decision tree or trees you
found, together with the parameter settings that were used to generate them.
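The 3-fold benchmark can be sketched as follows. This is one possible way to build the partitions, assuming the data set holds about 180 players (implied by the 120/60 split) and that each player already carries a HIGH or LOW label. The function names are illustrative only. Shuffling each class separately and dealing its members round-robin into the folds keeps the class frequencies similar in every training and test set.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Split case indices into k folds with similar class frequencies."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled cases of this class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k=3, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical label vector: 90 HIGH and 90 LOW players.
labels = ["HIGH"] * 90 + ["LOW"] * 90
splits = list(train_test_splits(labels))
```

Each of the three `(train, test)` pairs can then be written out as a separate training file and test file for C5.0, so that every parameter setting is evaluated on the same benchmark.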
Analyze the decision trees that had a high accuracy in predicting
good shooters: what do they tell us about
the classification problem we were trying to solve?
For example: Did the decision tree approach work well or badly for the problem
at hand? Did we find anything interesting that distinguishes successful
shooters from unsuccessful shooters in
the NBA? Did we find which attributes have a strong impact on
a player's ability to achieve a high field goal percentage? Did your results
match your expectations? Were your results stable? Did you make any interesting
observations concerning the size of the learned decision trees; is there a
particular size that works best for the classification problem?
Moreover, write
an 8-9 page (single-spaced)
report that summarizes your findings and the results of the project, and prepare
a 15-minute group presentation for Tu., November 13, 2001 during our regular
class hours.
5) Neural Networks (Individual task: paper and pencil --- 10 points)
Give a neural network (with an architecture of your
own choice) that uses the step
activation function and computes the following function X=f(A,B,C)
based on inputs A, B, C:
A | B | C | X |
---|---|---|---|
0 | 0 | 0 | 1 |
0 | 0 | 1 | 1 |
0 | 1 | 0 | 0 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 1 |
1 | 1 | 1 | 1 |
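A paper-and-pencil design can be checked mechanically against the table above. The sketch below assumes one common convention for the step function (output 1 when the net input is at least 0, otherwise 0); if the course uses a different threshold convention, adjust `step` accordingly. The `candidate` network is a deliberately incomplete placeholder that simply copies input C, included only to show how mismatches are reported. It is not a solution.

```python
def step(z):
    """Step activation; assumed convention: fire when the net input is >= 0."""
    return 1 if z >= 0 else 0


def unit(weights, bias, inputs):
    """One threshold unit: weighted sum of the inputs plus bias, then step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Target function X = f(A, B, C), copied from the truth table.
TARGET = {
    (0, 0, 0): 1,
    (0, 0, 1): 1,
    (0, 1, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 0): 0,
    (1, 0, 1): 1,
    (1, 1, 0): 1,
    (1, 1, 1): 1,
}


def mismatches(network):
    """Return (inputs, got, expected) for every row the network gets wrong."""
    wrong = []
    for inputs, expected in TARGET.items():
        got = network(*inputs)
        if got != expected:
            wrong.append((inputs, got, expected))
    return wrong


def candidate(a, b, c):
    """Placeholder network: a single unit that outputs C (not a solution)."""
    return unit([0, 0, 1], -0.5, [a, b, c])


wrong_rows = mismatches(candidate)
```

Replacing `candidate` with your own composition of `unit` calls (for example, hidden units feeding an output unit) and checking that `mismatches` returns an empty list confirms that the design reproduces all eight rows.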