Homework 3+4: Uncertainty, Belief Networks, Knowledge-based Systems, Machine Learning, Decision Trees and Knowledge Discovery.
Dr. Eick's Graduate AI-class (COSC 6368)

Remark: This is a first draft of homeworks 3 and 4; the final version of homework 3 will be available on Oct. 20.
available points: 210 points
homework 3: problems 11, 12, 13, 14, 15.
recommendation: do problems 11 and 12 prior to the midterm exam, and start working on problem 13 prior to the midterm exam.
homework 4: problems 9, 16, 17, 18
last updated: November 19, 11:43a

11) Probability (paper and pencil --- 7 points)
Textbook problem 14.4

12) Bayes Theorem (paper and pencil --- 5 points)
a) The following predicates are given:
Rain := "It will rain tomorrow"
Cloudy := "The sky is cloudy today"
Humid := "It is humid today"
Cold := "It is cold today"
Moreover, P(Rain)=0.1, P(Cloudy|Rain)=0.8, P(Humid|Rain)=0.9, P(Cold|Rain)=0.2, P(Cloudy)=0.6, P(Humid)=0.8, P(Cold)=0.4.

Will it (usually) rain tomorrow --- compute:
P(Rain|Cloudy and Humid and Cold) and
P(Rain|Cloudy and Humid)

b) Bayes's theorem is usually applied under the so-called conditional independence assumption. Explain this assumption by referring to the example in a) (explain what was assumed to be independent in your solution of problem a).
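
Remark: the following is a minimal Python sketch of how the quantities in a) can be combined, not an official solution. It assumes that the pieces of evidence are independent of each other both given Rain and unconditionally, so that the joint probability of the evidence factors into the given marginals; the function names posterior and p_rain_given are hypothetical.

```python
from math import prod

def posterior(prior, likelihoods, evidence_marginals):
    """P(H | e1..en) = P(H) * prod P(ei | H) / prod P(ei).

    Assumption: the ei are independent given H AND independent
    unconditionally, so both joint terms factor into products.
    """
    return prior * prod(likelihoods) / prod(evidence_marginals)

# Numbers taken from part a) of the problem statement.
p_rain = 0.1
p_given_rain = {"Cloudy": 0.8, "Humid": 0.9, "Cold": 0.2}
p_marginal = {"Cloudy": 0.6, "Humid": 0.8, "Cold": 0.4}

def p_rain_given(*evidence):
    """Posterior probability of Rain for the named pieces of evidence."""
    return posterior(
        p_rain,
        [p_given_rain[e] for e in evidence],
        [p_marginal[e] for e in evidence],
    )

print(p_rain_given("Cloudy", "Humid", "Cold"))
print(p_rain_given("Cloudy", "Humid"))
```

Without the second (unconditional) independence assumption the denominator P(Cloudy and Humid and Cold) could not be obtained from the numbers given.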


13) Belief Networks --- Getting the Structure right! (Paper & Pencil (but you are allowed to use a belief network tool, if you find this helpful) -- 29 Points)
Solve the Telescope Problem (problem 15.3 of the textbook)! Give reasons for your answers when responding to questions a-e of problem 15.3! There is also a typo: d. suppose M1=1 and M2=3...
Additionally, assume that the probability of a telescope being out of focus is 0.02, that the probability of overcounting by one star is 0.05, and that the probability of undercounting by one star is 0.05; moreover, restrict your analysis to the case that N is limited to 0, 1, 2, 3, or 4 stars! If there are any probabilities missing, just assume that they are evenly distributed (with respect to the set of states they might be in)!

14) Belief Networks --- Getting the Numbers right! (Belief Network Development & Constraint Satisfaction -- 52 Points)
Assume a belief network with a particular structure is given for Huntington disease. Furthermore, the following constraints have been provided with respect to the assumed belief network structure (for details see the Word file that contains the network structure and the constraints, or the Word file with the constraints of HD-GBBN only). This knowledge has been obtained through extraction from a Huntington Disease Profile and through interviewing domain experts. Provide probability tables for the given belief network structure (using Netica or any other belief network tool) that implement those constraints correctly. Submit the results of running your belief network for the following cases:
a) nothing is known
b) patient has Chorea
c) test showed CAG-repeat is 44
d) test showed CAG-repeat of 60 and psychiatric disturbances
e) patient has psychiatric disturbances and abnormalities in Cognition
f) patient has positive family history and has been symptom free and is 48 years old.

Write a 1-page report that briefly discusses how you solved the problem. In summary, you submit a 1-page report, your belief network, and the answers your belief network provided for questions a-f.

15) RETE Algorithm (Paper and Pencil ---- 19 points)
a) Give the RETE-network for the following CLIPS-rule (a line was cut off in part b; the changed part is shown in red):

(defrule Santa 
  (P ?x ?y 2) (Q ?y 3) (R ?x 3 ?z)
  =>
  ...)

b) Assume that the working memory contains: (P 2 2 2) (P 2 3 2) (Q 3 3) (Q 7 3) (Q 2 3) (Q 2 4) (R 2 3 7) (R 3 3 7). Indicate the tokens that are stored in each node of the network.
c) Assume (P 3 2 2) is inserted into the working memory. Which computations have to be done when the RETE algorithm is used?
d) Assume now (Q 2 3) is removed from the working memory. Which computations have to be done when the RETE algorithm is used?
e) If the conditions of the Santa rule were reordered, would your answers to questions c) and d) change?
f) What are the main ideas of the RETE algorithm? Why is it popular for implementing forward chaining rule-based systems?
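
Remark: the following Python sketch illustrates what kind of information alpha and beta memories hold for the Santa rule, using the working memory of part b). It is not a RETE implementation: it recomputes every memory from scratch, whereas RETE updates the memories incrementally on each insertion or removal. The join order (P with Q on ?y, then the result with R on ?x) and the names build_memories, alpha_P, beta_PQ and so on are assumptions made for illustration.

```python
# Working memory from part b), one tuple per fact.
WM = [("P", 2, 2, 2), ("P", 2, 3, 2), ("Q", 3, 3), ("Q", 7, 3),
      ("Q", 2, 3), ("Q", 2, 4), ("R", 2, 3, 7), ("R", 3, 3, 7)]

def build_memories(wm):
    """Recompute the memories for (P ?x ?y 2) (Q ?y 3) (R ?x 3 ?z)."""
    # Alpha memories: facts passing the constant tests of a single pattern.
    alpha_p = [f for f in wm if f[0] == "P" and f[3] == 2]  # (P ?x ?y 2)
    alpha_q = [f for f in wm if f[0] == "Q" and f[2] == 3]  # (Q ?y 3)
    alpha_r = [f for f in wm if f[0] == "R" and f[2] == 3]  # (R ?x 3 ?z)
    # First join: P and Q must agree on ?y.
    beta_pq = [(p, q) for p in alpha_p for q in alpha_q if p[2] == q[1]]
    # Second join: the partial match and R must agree on ?x.
    beta_pqr = [(p, q, r) for (p, q) in beta_pq
                for r in alpha_r if p[1] == r[1]]
    return {"alpha_P": alpha_p, "alpha_Q": alpha_q, "alpha_R": alpha_r,
            "beta_PQ": beta_pq, "beta_PQR": beta_pqr}

memories = build_memories(WM)
for name, tokens in memories.items():
    print(name, tokens)
```

Calling build_memories again after adding or removing a fact shows which memories change; a real RETE network would touch only those memories instead of rebuilding all of them.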

16) Decision Trees (Paper and Pencil --- 14 points)
Construct the decision tree C4.5 would generate for the following dataset (updated on Nov. 19, 1999):

A1  A2  A3  A4  Class
0   0   0   0   C1
2   0   0   0   C1
1   1   0   0   C1
1   1   0   0   C2
1   1   1   1   C2
2   1   1   0   C2
2   0   0   1   C2
1   1   0   1   C2

Indicate all computations that resulted in the construction of your submitted tree (especially discuss how the information gain heuristic was used).
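
Remark: the following Python sketch shows how entropy and information gain can be computed for the root node of the dataset above. It covers plain information gain for a single split only; it does not build the tree, and it does not include refinements used by C4.5 such as the gain ratio or pruning. The names ROWS, entropy and information_gain are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["A1", "A2", "A3", "A4"]
# Dataset of problem 16: (A1, A2, A3, A4, Class).
ROWS = [
    (0, 0, 0, 0, "C1"),
    (2, 0, 0, 0, "C1"),
    (1, 1, 0, 0, "C1"),
    (1, 1, 0, 0, "C2"),
    (1, 1, 1, 1, "C2"),
    (2, 1, 1, 0, "C2"),
    (2, 0, 0, 1, "C2"),
    (1, 1, 0, 1, "C2"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy before the split minus the weighted entropy after it."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRS)}
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
```

The same two functions can be applied again to the subset of rows that reaches each child node in order to choose the next test.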

17) Knowledge Discovery using Decision Trees (using a decision tree tool, learning about data analysis and knowledge discovery --- 66 points) The goal of this project is to explore how decision tree tools can help in predicting

In this project you can either focus on one of the two data analysis problems, or on both. For each data analysis problem, take the NBA-dataset and create a benchmark by partitioning it 3 times into a training set of 120 players and a test set of 60 players (make sure that the three classes are represented somewhat equally in the three sets). Run C5.0 with various parameter settings and collect the results. Report the best decision tree you found. Analyze the decision tree you came up with --- what does it tell us about the classification problem we were trying to solve; e.g. did the decision tree approach work well or badly for the problem at hand? Did we find anything interesting that distinguishes guards, forwards, and centers in the NBA? Did we find which attributes have a strong impact on a player's capability to be a good free throw shooter? Did your results match your expectations? Were your results stable? Write a 5-6 page (single spaced) report that summarizes your findings and results.
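
Remark: one possible way to create the three benchmark partitions is a stratified random split, sketched below in Python. The player records are placeholder data (180 hypothetical players, 60 per position), not the NBA-dataset, and the function name stratified_split and its parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(records, label_of, test_fraction=1/3, seed=0):
    """Split records into (train, test), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the input is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Placeholder data: (name, position) pairs, NOT the real dataset.
players = [(f"player{i}", pos)
           for i, pos in enumerate(["guard", "forward", "center"] * 60)]

# Three different partitions, one per random seed.
splits = [stratified_split(players, lambda p: p[1], seed=s) for s in range(3)]
for train, test in splits:
    print(len(train), len(test))
```

With real data the class sizes will rarely divide evenly, so the test set may come out slightly above or below 60 players and may need a small manual adjustment.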

18) Ontologies (18 points)
Read the Chandrasekaran&...'s article centering on ontologies, and write a 150 word essay that addresses most of the following questions and topics (you are allowed to skip one or two questions/topics): What reasons does the author give why ontologies are important? Do you agree with what the author is saying? Is the list of reasons that the author gives complete (if not, give other reasons not listed in the paper)? Give a list of applications for which ontologies are or might become important, and briefly discuss what role ontologies play in the context of the listed applications. What kind of ontology tools / ontology technologies are needed to support these applications?