www: Rachsuda's Website

e-mail: rachsuda@cs.uh.edu

office hours: TU 10-11a TH 11a-noon

The course discusses the most important data mining techniques and provides background knowledge on how to approach a data mining task. Moreover, the application of data mining techniques will be studied in homework assignments and a course project.

office hours (589 PGH): TU 2:30-3:30p TH 11:30a-12:30p

TA: ?

office: ??? PGH

e-mail: ???@cs.uh.edu officehours: ??p

class meets: TU/TH 1-2:30p in 232 PGH

cancelled classes: Tu., Nov. 29, 2005

makeup classes: Friday, Nov. 11, 2-5p (double makeup class for Rita and Nov. 29, 2005)

class room: 232 PGH

- Jiawei Han and Micheline Kamber, *Data Mining: Concepts and Techniques*, Morgan Kaufmann Publishers, 2001, ISBN 1-55860-489-8
- Link to Data Mining Book Home Page

- P.-N. Tan, M. Steinbach, and V. Kumar, *Introduction to Data Mining*, Addison Wesley
- Link to Book HomePage

- The grades should be available Thursday night; check Rachsuda's webpage after 6p. Solutions to some homework problems can also be found on her website.
- The final exam will not be returned to students; however, you can look at your final exam: Tu., Jan. 4, 11a-noon; Th., Jan. 20, 4:30-5:30p; Tu., Jan. 25, 5-6p.
- Solution Sketches for Homework1, for a few problems of Homework2, and for Exam/Quiz1 are available now!
- Because the course is being offered for the first time, the teaching plan is preliminary and subject to change.
- Approx. 80% of the teaching material covered in the class originates from the Han and Tan introductions to data mining.

Homework2 (due on Mo., October 10, 10a, electronic submission)

Homework3 (due on Tu., Nov. 8, in class)

Homework4 (due on Sa., Dec. 3, 11p, electronic submission)

1 paper review

4 assignments/homeworks (these contain medium-size KDD projects, short review problems, and tasks, some of which might involve programming or using a KDD-tool; other problems are intended as preparation for the exams of the course)

Homework2: Mo., October 10, 10a (electronic submission)

Exam1: Th., October 13

Homework3: Tu., November 8, in class

Exam2: Th., November 10, 2005

Homework4: Sa., Dec. 3, 2005

Final Exam: ...

II Exploratory Data Analysis (the transparencies are "self-explanatory")

III Data Preprocessing (covers Han Chapter3; finalized set of transparencies to be used in the second week of the semester)

IV Concept Description (Han Chapter5; transparency 40 and higher will not be discussed in the lecture)

V Introduction to Classification: Basic Concepts and Decision Trees (last transparency modified and 4 transparencies added on Sept. 28, 2005)

VI More on Classification: Instance-based Learning and Support Vector Machines (NN-Classifiers and Support Vector Machines (updated on Sept. 28, 2005), Editing and Condensing Techniques for NN-Classifiers)

VII Introduction to Similarity Assessment and Clustering

VIII More on Clustering: Grid-based, Hierarchical and Density-based Clustering (more on AGNES and DBSCAN), Critical Issues with Respect to Clustering, Supervised Clustering.

IX Brief Introduction to Data Cubes (Han Chapter 2 centering on cubes)

X Association Rules

XI Paper Walk Throughs

XII Mining Complex Types of Data Part1, Part2 (covers Han Chapter 9 in part)

XIII Spatial Data Mining (Spatial Databases, Spatial Data Mining (very long), Spatial Data Mining Transparencies covered in class)

XIV Final Words

Remark: Topics with '*' might not be covered due to a lack of time.

Student Assignments for the DBSCAN-Paper:
- Section 1+2: Wu, Ma
- Section 3: Jefferson, Thomas
- Section 4, 4.1: Wadwani, Patel
- Section 4.2: Ma, Wu
- Section 5: Thomas, Jefferson
- Section 6: Patel, Wadwami

Student Assignments for the Sarawawi Paper:
- Section 1: Arigai, Belokrylov
- Section 2: Alkan, Vaezian
- Sections 3.1, 3.2, 3.3: Belokrylov, Arigai
- Section 3.4, 3.5, 3.6: Saveran, Tabbaa
- Section 4.1: Vaezian, Alkan
- Section 4.2, 5: Tabbaa, Saverna

Translation of numeric scores to letter grades:

A:100-90 A-:90-86 B+:86-82 B:82-77 B-:77-74 C+:74-70

C:70-66 C-:66-62 D+:62-58 D:58-54 D-:54-50 F:50-0
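
The grade ranges above can be read as a simple threshold lookup. The following is only an illustrative sketch, not the official grading procedure: the function name `letter_grade` and the list `GRADE_FLOORS` are hypothetical, and because adjacent ranges share their boundary values (for example, 90 appears under both A and A-), the sketch assumes that a score exactly on a boundary receives the higher grade.

```python
# Hypothetical helper for illustration; not part of the course materials.
# Each pair is (lower bound of the range, letter grade), taken from the table.
# ASSUMPTION: a score exactly on a shared boundary (e.g. 90) gets the higher
# grade. The syllabus does not say how boundary scores are resolved.
GRADE_FLOORS = [
    (90, "A"), (86, "A-"), (82, "B+"), (77, "B"), (74, "B-"), (70, "C+"),
    (66, "C"), (62, "C-"), (58, "D+"), (54, "D"), (50, "D-"), (0, "F"),
]


def letter_grade(score: float) -> str:
    """Map a numeric score in [0, 100] to a letter grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # The list is sorted from highest to lowest floor, so the first floor
    # that the score reaches determines the grade.
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    raise AssertionError("unreachable: the last floor is 0")


print(letter_grade(88))    # A-
print(letter_grade(76.5))  # B-
```

Under the opposite boundary assumption (ties go to the lower grade), only the comparison would change from `>=` to `>`.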

Only machine-written (typed) solutions to homeworks and assignments are accepted; the only exceptions are figures and complex formulas. Be aware that our only source of information is what you have turned in. If we are unable to understand your solution, you will receive a low score. Moreover, students should not throw away returned assignments or tests.

Students may discuss course material and homeworks, but must take special
care to discern the difference between **collaborating** in order to increase
understanding of course materials and collaborating on the homework /
course project
itself. We encourage students to help each other understand course
material to clarify the meaning of homework problems or to discuss
problem-solving strategies, but it is **not** permissible for one
student to help or be helped by another student in working through
homework problems and in the course project. If, in discussing course materials and problems,
students believe that their like-mindedness from such discussions could be
construed as collaboration on their assignments, students must cite each
other, briefly explaining the extent of their collaboration. Any
assistance that is not given proper citation may be considered a violation
of the Honor Code, and might result in obtaining a grade of F
in the course, and in further prosecution.

KDD 2005 Conference

IEEE International Conference on Data Mining (ICDM) Website

PKDD 2005 (European KDD Conference)

UIUC Data Mining Group

Microsoft DMX Group

Penn Data Mining Group

UMN Spatial Database and Spatial Data Mining Group

Vrije Universiteit Amsterdam Data Mining Group

Data Mining and Machine Learning Group University of Helsinki

Data Mining at Massey University

UH's Data Mining and Machine Learning Group (UH-DMML)