- I enjoyed teaching the course and like to wish everybody an interesting summer! The final grade distribution for COSC 6342 in Spring 2014 was: A:6 A-:7 B+:9 B:8 B-:1 C+:3. Detailed grade summaries will be posted on Wellington's website no later than May 15. The final exam will not be be returned to students; however, you can see your final exam at the following times in 575 (or 573) PGH: Thursday, May 15, 3:30-4:30p and Thursday, August 28, 3:30-4:30p. If you have some other concerns about the grading of the course also visit us at that time.
- The results of Project1 and all other projects and homeworks, are still subject to curving; in general, your results of Homework1, Homework2, Project 1 and Project2 will be combined using a magic formular into a single number and this number will then converted using another magic formula into a number grade which will then count 41% towards your course grade. The posted results of the midterm exam are already curved; therefore, you will already know your number grade which will conunt 26% to your overall grade.
- In the course, like in most other graduate courses in our department, no makeup opportunities will be given to students to makeup for poor exam, homework, and project grades.
- 2013 Reading assignment (page number refer to the second edition of the textbook): Jan. 20: 1-17; Jan. 24: 21-28 & 30-43; Jan. 26: 46-58; Feb. 1: 61-66; Feb. 3: 66-83, Feb. 8: 87-94, 100-102; Feb. 10: 109-116; Feb. 14: 116-120, 125-128, 133-134 (ISOMAP); Feb. 16: 142-149, Feb. 20: 149-157; 163-171; Feb. 26: 174-179 185-192; March 2: 192-203; March 6: Read DBSCAN paper; March 25: 447-463 (lecture does not follow much the textbook); March 30: 319-343, April 4: 309-317, April 8: 319-327; April 12: 328-337; April 13:Read first 3 columns of Smola Support Vector Regression Tutorial; April 19: 483-503; April 27: 387-402.
- 2014 Course Syllabus
## Basic Course Information

Instructor: Dr. Christoph F. Eick

office hours (573 PGH): MO 4-5p and WE 1-2p (through April 30)

e-mail: ceick@uh.edu

Teaching Assistant: Wellington Cabrera

office hours (575 PGH): MO 1-2p and MO 4-5p

Wellington's COSC 6342 Website

class meets: M/W 2:30-4p

cancelled classes: none

Makeup class:

## Course Materials

**Required Text:**Ethem Alpaydin, Introduction to Machine Learning, MIT Press, Second Edition, 2010.

**Recommended Texts:**Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.## Some Important Dates for the Spring 2014 Semester

We., March 5: 45 minute review of course material

Mo., March 17: Midterm Exam

January 20, March 10, March 12: No lecture

Mo., April 21 and We., April 23: Student presentations Project2

Mo., April 28: last lecture

Mo., April 28: 45 minute review of course material

Mo., May 5, 2p: Final Exam

## Course Elements and their Weights

Due to the more theoretical nature of machine learning there will be a little more emphasis on exams and on understanding the course material presented in the lecture and textbook. However, there will be 2-3 projects and 2-3 graded homeworks (typically containing 15-20 problems total) which count about 40% towards the overall grade; 3-4 of these activities will be group projects / group homeworks. In 2014 the weights of the different parts of the course are as follows:

Midterm Exam 26%

Final Exam 32%

Attendance 1%

Project1: 20%

Project2: 13%

Homeworks: 8%## 2014 Course Projects, Reviews and Homeworks

Project1: Q-Learning for a Pickup Dropoff World ( PD-World)

Homework1 (Group Homework; usually 3 students)

**Homework Groups**:

Group1: Aleti,Sampath Vinayak Barman,Arko Bector,Shiwani

Group2: Cao,Can Chen,Juan Chilluveru,Sri Lakshmi

Group3: Dhawad,Priyanka Ghosh,Sayan Gopisetti,Narayana Jagdeesh

Group4: Gopisetti,Satya Deepthi Khalil,Nacer Kulkarni,Amita A

Group5: Liu,Tzu Hua Majeti,Dinesh Matusevich,David Sergio

Group6: Memariani,Ali Miao,Yang Mirsharif,Seyyedeh Qazale

Group7: MohammadMoradi,SeyyedHessamAldin Mughal,Zakariyya Saeed Nandamuri,Anil Kumar

Group8: Nguyen,Toan Bao Pavuluri,Bhagyasri Ponnada,Venkata Sai Karthik

Group9: Ravipati,Prudhvi Rayanapati,Krishna Swathi Sant,Prajakta Avinash

Group10: Siebeneich,Eric J. Uppaluri,Chaitanya Wei,Li

Group11: Wu,Yuhang Xing,Yutao Yang,Yajun Zhang,Xiaolu

Review 1 March 5, 2014 (contains solutions to the 2013 Homework1 and a few other problems from exams and other homeworks; also briefly discusses some Solution Sketches of the 2013 Midterm Exam)

Homework2&3 (final draft; same groups as homework1; due April 15, 11p)

Project2: Making Sense of Data-Apply Machine Learning to an Interesting Dataset/Problem (Group Project; Discussion Project2 2014, More on Project2)

**Project2 Groups**:

Group A: Aleti,Sampath Vinayak Gopisetti,Satya Deepthi Khalil,Nacer Kulkarni,Amita A

Group B: Barman,Arko Bector,Shiwani Chilluveru,Sri Lakshmi Nguyen,Toan Bao

Group C: Cao,Can Chen,Juan Pavuluri,Bhagyasri Siebeneich,Eric J.

Group D: Dhawad,Priyanka Ghosh,Sayan Gopisetti,Narayana Jagdeesh

Group E: Liu,Tzu Hua Majeti,Dinesh Matusevich,David Sergio

Group F: Memariani,Ali Miao,Yang Mirsharif,Seyyedeh Qazale MohammadMoradi,SeyyedHessamAldin

Group G: Mughal,Zakariyya Saeed Nandamuri,Anil Kumar Uppaluri,Chaitanya Wei,Li

Group H: Ponnada,Venkata Sai Karthik Ravipati,Prudhvi Rayanapati,Krishna Swathi Sant,Prajakta Avinash

Group I: Wu,Yuhang Xing,Yutao Yang,Yajun Zhang,Xiaolu

Review2 on Monday, April 28, 2014 (Exam3 in 2009, 2013 Machine Learning Final Exam, Solution Sketches 2011 Homework3)## Course Topics

Topic 1: Introduction to Machine Learning Topic 2: Supervised Learning Topic 3: Bayesian Decision Theory (excluding Belief Networks) Topic 5: Parametric Model Estimation Topic 6: Dimensionality Reduction Centering on PCA Topic 7: Clustering1: Mixture Models, K-Means and EM Topic 8: Non-Parametric Methods Centering on kNN and density estimation Topic 9: Clustering2: Density-based Approaches Topic 10 Decision Trees Topic 11: Comparing Classifiers Topic 12: Combining Multiple Learners Topic 13: Support Vector Machines Topic 14: More on Kernel Methods Topic 15: Naive Bayes' and Belief Networks Topic 16: Neural Networks Topic 17: Active Learning Topic 18: Reinforcement Learning Topic 19: Hidden Markov Models Topic 20: Computational Learning Theory

Topics covered and order of Topic Coverage in 2014: Topic1, Topic18, Topic2, Topic3, Topic5, Topic6, Topic7, Topic8, Topic10, Topic12, Topic11, Topic15, Topic19, Topic13, Topic14, and Topic20.## Prerequisites

The course is mostly self-contained. However, students taking the course should have sound software development skills, and some basic knowledge of statistics.

## 2014 Transparencies and Other Teaching Material

Course Organization ML Spring 2014

Topic 1: Introduction to Machine Learning(Eick/Alpaydin Introduction, Tom Mitchell's Introduction to ML---only slides 1-8 and 15-16 will be used)

Topic 2: Supervised Learning (examples of classification techniques: Decision Trees, k-NN)

Topic 3: Bayesian Decision Theory (excluding Belief Networks) and Naive Bayes (Eick on Naive Bayes)

Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML (read Bishop Chapter1 in conjuction with this material; not covered in 2011)

Topic 5: Parametric Model Estimation

Topic 6: Dimensionality Reduction Centering on PCA (PCA Tutorial, Arindam Banerjee's More Formal Discussion of the Objectives of Dimensionality Reduction)

Topic 7: Clustering1: Mixture Models, K-Means and EM (Introduction to Clustering, Modified Alpaydin transparencies, Top 10 Data Mining Algorithms paper)

Topic 8: Non-Parametric Methods Centering on kNN and Density Estimation(kNN, Non-Parametric Density Estimation, Summary Non-Parametric Density Estimation, Editing and Condensing Techniques to Enhance kNN, Toussant's survey paper on editing, condesing and proximity graphs)

Topic 9: Clustering 2: Density-based Clustering (DBSCAN paper, DENCLUE2 paper)

Topic 10: Decision Trees

Topic 11: Comparing Classifiers

Topic 12: Ensembles: Combining Multiple Learners for Better Accuracy

Topic 13: Support Vector Machines (Eick: Introduction to Support Vector Machines, Alpaydin on Support Vectors and the Use of Support Vector Machines for Regression, PCA, and Outlier Detection (only transperencies which carry the word "cover" will be discussed), Smola/Schoelkopf Tutorial on Support Vector Regression)

Topic 14: More on Kernel Methods(Arindam Banerjee on Kernels, Nuno Vasconcelos Kernel Lecture, Bishop on Kernels; only transparencies 13-25 and 30-35 of the excellent Vasconcelos( Homepage) slides 13-19, 22-24, 30-35 will be covered in 2013)

Topic 15: Naive Bayes and Belief Networks(Eick on Naive Bayes, Eick on Belief Networks (used in the lecture), Bishop on Belief Networks (not used in the lecture, but might be useful for preparing for the final exam)

Topic 16: Successful Application of Machine Learning

Topic 17: Active Learning (might be covered in 2014, if enough time)

Topic 18: Reinforcement Learning (Alpaydin on RL (not used), Eick on RL---try to understand those transparencies; Using Reinforcement Learning for Robot Soccer, Sutton "Ideas of RL" Video (to be shown and discussed in part in 2013), Kaelbling's RL Survey Article---read sections 1, 2, 3, 4.1, 4.2, 8.1 and 9 centering on what was discussed in the lecture)

Topic 19: Brief Introduction to Hidden Markov Models (HMM in BioInformatics)

Topic 20: Computational Learning Theory(Greiner on PAC Learning,...)

Some old Exams: 2009 Exam2 Solution Sketches, 2009 Exam3 Solution Sketches, 2014 Midterm Exam Solution Sketches.

Remark: The teaching material will be extended and possibly corrected during the course of the semester.

## Grading

Each student has to have a**weighted average of 74.0 or higher in the exams of the course**in order to receive a grade of "B-" or better for the course. Students will be responsible for material covered in the lectures and assigned in the readings. All homeworks and project reports are due at the date specified. No late submissions will be accepted after the due date. This policy will be strictly enforced.Translation number to letter grades:

A:100-90 A-:90-86 B+:86-82 B:82-77 B-:77-74 C+:74-70

C: 70-66 C-:66-62 D+:62-58 D:58-54 D-:54-50 F: 50-0Only machine written solutions to homeworks and assignments are accepted (the only exception to this point are figures and complex formulas) in the assignments. Be aware of the fact that our only source of information is what you have turned in. If we are not capable to understand your solution, you will receive a low score. Moreover, students should not throw away returned assignments or tests.

Students may discuss course material and homeworks, but must take special care to discern the difference between

**collaborating**in order to increase understanding of course materials and collaborating on the homework / course projects. We encourage students to help each other understand course material to clarify the meaning of homework problems or to discuss problem-solving strategies, but it is**not**permissible for one student to help or be helped by another student in working through individual homework problems and in individual course projects. If, in discussing course materials and problems, students believe that their like-mindedness from such discussions could be construed as collaboration on their assignments, students must cite each other, briefly explaining the extent of their collaboration. Any assistance that is not given proper citation may be considered a violation of the Honor Code, and might result in obtaining a grade of F in the course, and in further prosecution.## 2013 Course Projects and Homeworks

Project1 (Project Description Project1): Making Sense of Data---Apply Machine Learning to a Dataset (Group Project; typically 2 students per group)

Project 2: Read, Understand, Summarize and Review a Machine Learning Paper (Project2 Post-Analysis; Individual Project Part1; group project Part2; 6 paper candidates to review include: C1, C2, C3, C4, C5 and C6. Papers on Reviewing: RV1, RV2)

Project 3: Application and Evaluation of Temporal Difference Learning (Individual Project; RST-World)

Homework1 (Individual)

Homework2 (Group Homework)

Homework3 (Individual)

## 2011 Homeworks and Projects

Graded Homework1

Graded Homework2

Graded Homework3+4

Project1: Using Machine Learning to Make Money in Horse Race Betting (HorseRaceExample, Project1 Discussions, Preference Learning Tutorial, Wolverhampton Statistics)

Project2: Group Project---Exloring a Subfield of Machine Learning (Project2 Group Presentation Schedule).

Project3: Application and Evaluation of Temporal Difference Learning (Project Description, RST-World)

## Master Thesis and Dissertation Research in Data Mining and Machine Learning

If you plan to perform a dissertation or Master thesis project in the areas of data mining or machine learning, I strongly recommend to take the "Data Mining" course; moreover, I also suggest to take at least one, preferably two, of the following courses: Pattern Classification (COSC 6343), Artificial Intelligence (COSC 6368) or Machine Learning (COSC 6342). Furthermore, knowing about evolutionary computing (COSC 6367) will be helpful, particularly for designing novel data mining algorithms. Moreover, having basic knowledge in data structures, software design, and databases is important when conducting data mining projects; therefore, taking COSC 6320, COSC 6318 or COSC 6340 is a also good choice. Moreover, taking a course that teaches high preformance computing is also desirable, because most data mining algorithms are very resource intensive. Because a lot of data mining projects have to deal with images, I suggest to take at least one of the many biomedical image processing courses that are offered in our curriculum. Finally, having some knowledge in the following fields is a plus: software engineering, numerical optimization techniques, statistics, and data visualization.Also be aware of the fact that having sufficient background in the above listed areas is a prerequisite for consideration for a thesis or dissertation project in the area of data mining. I will not serve as your MS thesis or dissertation advisor, if you have do not have basic knowledge in data mining, machine learning, statistics and related areas. Similarly, you will not be hired as a RA for a data mining project without having some background in data mining.

## Machine Learning Resources

Soybean Grand Challenge: Make some money with Machine Learning!

ICML 2013 (ICML is the #1 Machine Learning Conference)

Carlos Guestrin's 2009 CMU Machine Learning Course

Andrew Ng's Stanford Machine Learning Course

Andrew Moore's Statistical Data Mining Tutorial

Christoph Bishop IET/CBS Turing Lecture

Alpaydin Textbook Webpage