Assignment2: Machine Learning and KDD, Decision Trees, and Neural Networks
Dr. Eick's Graduate AI-class (COSC 6368)

Deadlines: problem 6, 8, 9: Tu., Oct. 15 in class; problem 7: Th., October 31 in class.
last updated: September 30, 7:50p
Value: 100 (6:6, 7:70 8:12 9:12)

Problems 6, 8, 9 are "short" paper and pencil problems, wheras in problem 7 you will you will get some exposure with respect to "making sense of data" and you will also use a data mining tool.

Problem 6 --- Information Gain Heuristic

We would like to predict the gender of a person based on two binary attributes: leg-cover (pants or skirts) and beard (beard or bare-faced). We assume we have a data set of 20000 individuals, 10000 of which are male and 10000 of which are female. 80% of the 10000 males are barefaced. Skirts are present on 50% of the females. All females are bare-faced and no male wears a skirt.

A. Compute the information gain of using the attribute leg-cover for predicting gender!
B. What is the information gain of using the attribute beard to predict gender?
C. Based on yours answers to A and B which attribute should be used as the root of a decision tree that predicts the gender of a person?

Problem 7 --- Knowledge Discovery for the NBA using Decision Trees

The goal of this project is to explore how decision tree tools can help in predicting the free throw percentage (FT) of a basketball player using the following NBA-Player Data Set and the popular decision tree tool C5.0. Free throw percentages are considered to be either HIGH (80% and more), and LOW (less than 80%). The NBA dataset has been converted into the following C5.0 processable raw NBA data set with the following NBA data set attribute definitions. Go to the C5.0 website and download and install the system on your own computer. More information about decision trees can be found by following the above link.

A. Data set preparation for 4-fold cross validation: When evaluating the (testing) accuracy of decision trees to predict "good" free throw shooters use 4-fold cross validation. Take the raw nba data set and partion it into 4 disjoint data sets of approximately equal size and with positive and negative examples occuring with approximately the same percentages. Create 4 training sets using the union of 3 of the 4 sets, and use the remaining set as the test set for this training set.
B. Find the best pruning parameter settings: Run C5.0 with various pruning options and try to find the "best" parameter setting that results in the highest accuracy for the classification problem. In this project accuracy is defined as average testing performance for the 4 training sets you constructed in step A. Explain how your "best" pruning parameter setting was obtained; moreover, report accuracy, training performance, and average decision tree size for the "best" and 4 other pruning parameter settings.
C. Sensitivity analysis for the obtained results: Now look at the 4 decision trees that were obtained by running C5.0 with the "best" pruning parameter setting (you determined in step B) for the 4 training sets. Additionally, create a fifth decision tree by running C5.0 for the complete nba data set (that contains all 184 examples) with the "best" parameter setting (include these five decision trees in your report). Compare the 5 decision trees --- are they similar or do the look significantly different? What commonalities do they share and in what aspect are they different? What conclusion do you draw from your answers to the previous questions?
D. Final Evaluation: What did you learn from the experiments in steps B and C? What do the results tell us about the classification problem we are trying to solve; e.g. did the decision tree approach work well/work badly for the problem at hand; did you find anything interesting that distinguishes successful free throw shooters from not so successful ones in the NBA (e.g. is there anything nba scouts can learn from your results)? Did your results match your expectations?

Discuss and summarize your results and findings in a 5-6 page (single spaced) report. Include the 5 decision trees you obtained in Step C as an appendix in the report (you can refer to the figures in your report).

Problem 8 --- Neural Networks

Give a neural network (having an architecture of your own choice) that uses the step activation function (your are not allowed to use other activation functions) and computes the following function f based on inputs A, B, and C (X=f(A,B,C)):


Show how your neural network computes the answer for the third and fourth example.

Problem 9 --- Perceptron and Backpropagation Learning Algorithms

Assume we have theperceptron that is depicted in Fig. 1 that has two regular inputs, X1 and X2, and an extra fixed input X3, which always has the value 1.
Neural Network Architectures
The perceptron's output is given as the function:

Out= If (w1*X1 + w2*X2 + w3*X3) > 0 then 1 else 0

Note that using the extra input, X3, we can achieve the same effect as changing the perceptron's threshold by changing w3. Thus, we can use the same simple perceptron learning rule presented in our textbook to control this threshold as well.

A. We want to teach the perceptron to recognize the function X1 XOR X2 with the following training set:


Show the change in the weights of the perceptron for every presentation of a training instance. Assume the initial weights are: w1=0.2, w2=0.2, w3=0.2 Important: Do the iterations according to the order of the samples in the training set. When you finish the four samples go over them again. You should stop the iterations once you get convergence, or when the values you get indicate that there will be no convergence. In either case explain your decision to stop the iterations. Assume in your computations that the learning rate a is 0.2.


B. This time, instead of being limited to a single perceptron, we will introduce hidden units and use a different activation function. Our new network is depicted in Fig. 2. Assume that the initial weights are w14 = 0.3, w15 = 0.1, w24 = 0.2, w25 = 0.6, w34 = 0.1, w35 = 0.1, w36 = 0.1, w46 = 0.5, and w56 = 0.2. The training set is the same as in (a). Use g=0.1 as your learning rate. Show what the new weights would be after using the backpropagation algorithm for two updates using just the first two training instances. Use g(x) = 1/(1+e**(-x)) as the activation function; that is g'(x)=(e**(-x))/(1+e**(-x))**2).


C. Will the perceptron you used for problem A ever learn the XOR function? What about the multi-layered neural network you used when you solved problem B? Give reasons for your answers!