2011 Course Syllabus
Basic Course Information
Instructor: Dr. Christoph F. Eick
office hours (589 PGH): TU 2:30-3:30p and TH 11a-noon
e-mail: ceick@uh.edu
Teaching Assistant: Chun-sheng Chen
office hours (577 PGH): TU noon-1p & 2:30-3:30p
e-mail:
Link to Chun-sheng's COSC 6342 Website
class meets: TU/TH 1-2:30p
cancelled classes: Tu., April 26, 2011
Makeup class: Tu., May 3, 2011 (in 301 AH)
last class:
teaching class room: AH 301
Course Materials
Required Text:
Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2010
Recommended Texts: Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Course Elements and their Weights
Due to the more theoretical nature of machine learning, there will
be somewhat more emphasis on exams and on understanding the course material
presented in the lecture and textbook. However, there will be two
hands-on projects and a group project, as well as 4 graded homeworks, which
together count about 39% towards the overall grade. In 2011 the weights of
the different parts of the course are as follows:
Midterm Exam 27%
Final Exam 33%
Attendance 1%
Project1 16%
Project2 8%
Project3 9%
Homeworks 6%
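To make the arithmetic concrete, here is a minimal Python sketch that combines component scores using the weights above; the example scores are invented for illustration, not real grades:

```python
# Weighted overall grade for COSC 6342, Spring 2011.
# The weights come from the syllabus; the example scores are hypothetical.
WEIGHTS = {
    "midterm": 0.27, "final": 0.33, "attendance": 0.01,
    "project1": 0.16, "project2": 0.08, "project3": 0.09,
    "homeworks": 0.06,
}

def overall_grade(scores):
    """Combine per-component scores (0-100) into the weighted average."""
    return sum(WEIGHTS[name] * score for name, score in scores.items())

example = {"midterm": 80, "final": 85, "attendance": 100,
           "project1": 90, "project2": 75, "project3": 88, "homeworks": 95}
print(round(overall_grade(example), 2))  # -> 84.67 (hypothetical)
```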
2011 Homeworks and Projects
Graded Homework1
Graded Homework2
Graded Homework3+4
Project1: Using Machine Learning to Make Money in Horse
Race Betting (HorseRaceExample,
Project1 Discussions,
Preference Learning Tutorial,
Wolverhampton Statistics)
Project2: Group Project---Exploring
a Subfield of Machine Learning (Project2
Group Presentation Schedule).
Project3: Application and Evaluation of Temporal Difference
Learning (Project
Description, RST-World)
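Since Project3 centers on temporal difference learning, the following minimal Python sketch illustrates the tabular TD(0) update on a toy environment; the 1-D corridor used here is a stand-in assumption for illustration, not the course's RST-World:

```python
import random

# Minimal tabular TD(0) sketch for Project3-style experiments.
# Hypothetical environment: states 0..5 in a corridor, reward +1 on
# reaching state 5, episodes start at state 0, random policy.
N_STATES, GAMMA, ALPHA = 6, 0.9, 0.1
V = [0.0] * N_STATES  # state-value estimates

def step(s):
    """Random policy: move left or right; returns (next_state, reward, done)."""
    s2 = max(0, s - 1) if random.random() < 0.5 else s + 1
    if s2 == N_STATES - 1:
        return s2, 1.0, True
    return s2, 0.0, False

for _ in range(2000):
    s, done = 0, False
    while not done:
        s2, r, done = step(s)
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma*V(s').
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])  # values should increase toward the goal state
```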
Tentative Course Organization Spring 2011
Topic 1: Introduction to Machine Learning
Topic 2: Supervised Learning
Topic 3: Bayesian Decision Theory (excluding Belief Networks)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA
Topic 7: Clustering1: Mixture Models, K-Means and EM
Topic 8: Non-Parametric Methods Centering on kNN and density estimation
Topic 9: Clustering2: Density-based Approaches
Topic 10: Decision Trees
Topic 11: Comparing Classifiers
Topic 12: Combining Multiple Learners
Topic 13: Support Vector Machines
Topic 14: More on Kernel Methods
Topic 15: Naive Bayes' and Belief Networks
Topic 16: Applications of Machine Learning---Urban Driving, Netflix, etc.
Topic 18: Reinforcement Learning
Topic 19: Neural Networks
Topic 20: Computational Learning Theory
The topic numbers are unchanged from the 2009 offering of the course;
the main reason for this is that the names of the PowerPoint files then
remain the same.
Prerequisites
The course is mostly self-contained. However, students taking the course
should have sound software development skills and some basic knowledge of
statistics.

2011 Transparencies and Other Teaching Material
Course Organization ML Spring 2011
Topic 1: Introduction to Machine Learning (Eick/Alpaydin
Introduction, Tom Mitchell's Introduction
to ML---only slides 1-8 and 15-16 will be used)
Topic 2: Supervised Learning
(examples of classification techniques: Decision
Trees, k-NN)
Topic 3: Bayesian Decision Theory (excluding Belief Networks)
Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML (read
Bishop Chapter 1 in conjunction with this material; not covered
in 2011)
Topic 5: Parametric Model Estimation
Topic 6: Dimensionality Reduction Centering on PCA
(PCA Tutorial, Arindam
Banerjee's More Formal Discussion of the Objectives of
Dimensionality Reduction)
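As a companion to the PCA material, here is a minimal Python/NumPy sketch of PCA via eigendecomposition of the sample covariance matrix; the data is synthetic, for illustration only:

```python
import numpy as np

# Minimal PCA sketch: center the data, eigendecompose the covariance
# matrix, and project onto the top principal directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 synthetic samples, 5 features
Xc = X - X.mean(axis=0)                  # center the data
C = np.cov(Xc, rowvar=False)             # 5x5 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh: for symmetric matrices, ascending
order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
W = eigvecs[:, order[:2]]                # keep the top-2 principal directions
Z = Xc @ W                               # project onto the 2-D subspace
print(Z.shape, eigvals[order][:2])       # (200, 2) and the top-2 variances
```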
Topic 7: Clustering1: Mixture Models, K-Means and EM
(Introduction to Clustering, Modified Alpaydin transparencies,
Top 10 Data Mining Algorithms paper)
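For Topic 7, a minimal Python/NumPy sketch of k-means (Lloyd's algorithm) on synthetic data; k, the data, and the initialization are illustrative assumptions:

```python
import numpy as np

# Minimal k-means sketch: alternate assignment and centroid-update steps
# until the centers stop moving. Data: two synthetic 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers

for _ in range(100):
    # Assignment step: each point goes to its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points
    # (keeping the old center if a cluster happens to be empty).
    new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print(centers)  # should land near (0,0) and (5,5)
```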
Topic 8: Non-Parametric Methods Centering on kNN and Density
Estimation (kNN, Non-Parametric
Density Estimation, Summary Non-Parametric
Density Estimation, Editing and
Condensing Techniques to Enhance kNN, Toussaint's survey paper on
editing, condensing and proximity graphs)
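For Topic 8, a minimal Python/NumPy sketch of a plain k-nearest-neighbor classifier (without the editing/condensing refinements discussed in the readings); the data and choice of k are illustrative:

```python
import numpy as np

# Minimal kNN sketch: majority vote among the k training points closest to x.
def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(d)[:k]               # indices of the k closest points
    return np.bincount(y_train[nearest]).argmax()  # majority class label

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # -> 1
```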
Topic 9: Clustering 2: Density-based Clustering
(DBSCAN paper,
DENCLUE2 paper)
Topic 10: Decision Trees
Topic 11: Comparing Classifiers
Topic 12: Ensembles: Combining Multiple Learners
for Better Accuracy
Topic 13: Support Vector Machines (Eick: Introduction
to Support Vector Machines, Alpaydin on
Support Vectors and the Use of Support Vector Machines for
Regression, PCA, and Outlier Detection (only transparencies which
carry the word "cover" will be discussed),
Smola/Schoelkopf Tutorial on Support Vector
Regression)
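For Topic 13, a minimal sketch of training a soft-margin SVM with an RBF kernel; the use of scikit-learn and the synthetic data are assumptions for illustration, not part of the course materials:

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Minimal SVM sketch: fit a soft-margin classifier with an RBF kernel
# on two synthetic blobs and inspect the resulting support vectors.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
clf = svm.SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(len(clf.support_vectors_), clf.score(X, y))  # #support vectors, accuracy
```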
Topic 14: More on Kernel Methods (Arindam
Banerjee on Kernels,
Nuno Vasconcelos Kernel Lecture,
Bishop on Kernels; only transparencies 13-25 and
30-35 of the excellent Vasconcelos (Homepage) lecture
will be covered in 2011)
Topic 15: Naive Bayes and Belief Networks (Eick on Naive Bayes,
Eick on Belief Networks,
Bishop on Belief Networks)
Topic 16: Successful Applications of Machine Learning
Topic 18: Reinforcement Learning (Alpaydin on RL
(not used),
Eick on RL---try to understand those
transparencies; Using Reinforcement
Learning for Robot Soccer,
Kaelbling's RL Survey Article---read
sections 1, 2, 3, 4.1 and 4.2 centering on what was
discussed in the lecture)
Topic 20: Computational Learning Theory (Greiner
on PAC Learning, ...)
Review May 3, 2011 (2009 Exam2 Solution
Sketches, 2009 Exam3 Solution
Sketches)
Remark: The teaching material will be extended and possibly
corrected during the course of the semester.

Grading
Each student must have a weighted average of 74.0 or higher on the
exams of the course in order to receive a grade of "B-" or better
for the course.
Students will be responsible for material covered in the
lectures and assigned in the readings. All homeworks and
project reports are due on the date specified; no late submissions
will be accepted. This policy will be strictly enforced.
Translation of numeric scores to letter grades:
A: 100-90  A-: 90-86  B+: 86-82  B: 82-77  B-: 77-74  C+: 74-70
C: 70-66  C-: 66-62  D+: 62-58  D: 58-54  D-: 54-50  F: 50-0
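A minimal Python sketch of this mapping (the syllabus lists shared boundary values such as 90 in both adjacent ranges; treating each lower bound as inclusive is my assumption):

```python
# Map a numeric overall grade to a letter grade using the syllabus cutoffs.
CUTOFFS = [(90, "A"), (86, "A-"), (82, "B+"), (77, "B"), (74, "B-"),
           (70, "C+"), (66, "C"), (62, "C-"), (58, "D+"), (54, "D"),
           (50, "D-")]

def letter_grade(score):
    for lo, letter in CUTOFFS:
        if score >= lo:        # lower bounds treated as inclusive (assumption)
            return letter
    return "F"                 # everything below 50

print(letter_grade(84.67))  # -> "B+" (the hypothetical score computed above)
```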
Only machine-written solutions to homeworks and assignments will be
accepted (the only exceptions are figures and complex formulas).
Be aware that our only source of information is what you have turned in;
if we cannot understand your solution, you will receive a low score.
Moreover, students should not throw away returned assignments or tests.
Students may discuss course material and homeworks, but must take special
care to discern the difference between collaborating in order to increase
understanding of course materials and collaborating on the homework /
course project
itself. We encourage students to help each other understand course
material to clarify the meaning of homework problems or to discuss
problem-solving strategies, but it is not permissible for one
student to help or be helped by another student in working through
homework problems or the course project. If, in discussing course materials and problems,
students believe that their like-mindedness from such discussions could be
construed as collaboration on their assignments, students must cite each
other, briefly explaining the extent of their collaboration. Any
assistance that is not given proper citation may be considered a violation
of the Honor Code, and might result in a grade of F
in the course and in further prosecution.
Master's Thesis and Dissertation Research in Data Mining and Machine Learning
If you plan to perform a dissertation or Master's thesis project in the areas of
data mining or machine learning, I strongly recommend taking the
"Data Mining" course; moreover, I also suggest taking at least one, preferably two, of the following
courses: Pattern Classification (COSC 6343), Artificial
Intelligence (COSC 6368), or Machine Learning (COSC 6342). Furthermore, knowing
about evolutionary computing (COSC 6367) will be helpful, particularly
for designing novel data mining algorithms.
Moreover, having basic knowledge of data structures, software design, and databases is important when conducting
data mining projects; therefore, taking COSC 6320, COSC 6318, or COSC 6340 is also a good choice.
Taking a course that teaches high performance computing is also
desirable, because most data mining algorithms are very resource intensive.
Because a lot of data mining projects have to deal with images, I
suggest taking at least one of the many
biomedical image processing courses that are offered in our curriculum. Finally, having some knowledge
of the following fields is a plus: software engineering, numerical optimization techniques, statistics,
and data visualization. Also be aware that having sufficient background in the areas listed above
is a prerequisite for consideration for a thesis or dissertation project in the area of data mining.
I will not serve as your MS thesis or dissertation advisor if you do not have basic knowledge
of data mining, machine learning, statistics, and related areas. Similarly, you
will not be hired as an RA for a
data mining project without some background in data mining.

Machine Learning Resources
ICML 2011 (ICML is the #1 Machine
Learning Conference)
Carlos Guestrin's
2009 CMU Machine Learning Course
Andrew Ng's Stanford
Machine
Learning Course
Andrew Moore's Statistical Data Mining Tutorial
Christopher Bishop IET/CBS Turing Lecture
Alpaydin Textbook
Webpage