last updated: April 24, 2026


COSC 6335: Data Mining in Spring 2026 (Dr. Eick)



2024 COSC 6335 Syllabus

Goals of the Data Mining Course

Data mining centers on finding novel, interesting, valid, and potentially useful patterns in data; it aims at transforming a large amount of data into a well of knowledge. Data mining has become a very important field in industry as well as in academia. The course covers most of the important data mining techniques and the basics of data science, and it provides background knowledge on how to conduct a data mining project. Topics covered in the course include exploratory data analysis, classification and prediction, clustering and similarity assessment, association analysis, outlier and anomaly detection, and interpreting and evaluating data analysis/data mining results. Basic visualization techniques and statistical methods will also be introduced. Moreover, hands-on data mining experience will be provided in three assignments. You will also get some practical experience in evaluating data mining results from your fellow students and from data mining publications. Finally, you will learn how to use and program in R, a popular environment for statistics, visualization, and data mining.

The topics of the course have some overlap with what is taught in the Machine Learning course (COSC 6342). To reduce this overlap, this course places a little less emphasis on learning classification and prediction models (this topic will be covered "more quickly"), and more emphasis will be put on Data Science Basics, Exploratory Data Analysis, Association Analysis, Clustering, Outlier Detection, and Data Set Augmentation. There will be 3 assignments and 2 "paper walkthroughs" in 2026. There will also be 3 group activities in 2026: Assignment2, leading paper walkthroughs, and Group Homework Credit (a more detailed description of the group activities is given below).

Comments concerning this website

If you have any comments concerning this website, send e-mail to: ceick@uh.edu

Basic Course Information

Instructor: Dr. Christoph F. Eick
office hours: TU 4-5p, TH 9-10a
e-mail: ceick@uh.edu
TA: Janet Anagli
e-mail: jyanagli@CougarNet.UH.EDU
TA office hours: MO+TU 9-10a (on MS Teams)
class meets: TU/TH 11:30-1
class room: SEC 203
classes taught by others: February 5
cancelled classes: Tuesday, April 14 (likely)

Course Materials

Objectives of the Data Mining Course

Highly Recommended Text:
P.-N. Tan, M. Steinbach, A. Karpatne, and V. Kumar: Introduction to Data Mining,
Addison Wesley, Second Edition, 2019
Link to Book HomePage

Recommended Texts: NIST/SEMATECH e-Handbook of Statistical Methods (good online source covering exploratory data analysis, statistics, modelling, and prediction)

2026 Course Organization

I. Introduction to Data Mining
II. Density Estimation
III. Similarity Assessment
IV. Outlier and Anomaly Detection
V. Autoencoders
VI. Data Science Basics and Exploratory Data Analysis
VII. Clustering
VIII. Supervised Learning: Basic Concepts, Decision Trees, Neural Networks and Deep Learning
IX. Association Analysis: Rule, Sequence, Graph and Collocation Mining
X. Data Storytelling (not covered in 2026)
XI. Preprocessing

Important Course Dates

Thursday, February 5: Assignment1 Lab
Tu., Feb. 24: In-class Activity (30 minutes) and Lockdown Browser Mock Exam (10 minutes)
Thursday, March 5, 11:30a: First Course Exam
Tu., March 10: "Fast" Task2 Topic Overview Presentations (120-150 seconds for each group)
Thursday, March 12: Last Lecture before Spring Break
March 17+19: no lecture (Spring Break)
Th., March 26: First Paper Walkthrough
Th+Tu, April 2+7: Task2 Student Presentations; groups G, H, and I present on April 2.
Tu., April 14: no class!
Th., April 16: Second Paper Walkthrough
Thursday, April 30, 11:30a: Second Course Exam

News COSC 6335 (Data Mining) Spring 2026

Course Elements and Their Weights

These are the final weights for Spring 2026:
Exams (2): 51% (Exam1 (March 5, 2026): 24%, Exam2 (April 30, 2026): 27%)
Assignments (3 tasks): 41%
Group Homework Credit (4%) and Paper Walkthroughs (2%): 6%
Attendance: 2%
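To illustrate how these weights combine, here is a minimal Python sketch that computes a weighted course number grade. It assumes every component has already been converted into a curved number grade between 0 and 100; the curving and normalization steps described under COSC 6335 Grading are not modeled, and the names `WEIGHTS` and `course_number_grade` are hypothetical.

```python
# Hypothetical sketch: combine Spring 2026 component number grades (0-100 each)
# into a course number grade using the weights listed above.
WEIGHTS = {
    "exam1": 0.24,
    "exam2": 0.27,
    "assignments": 0.41,
    "group_homework_credit": 0.04,
    "paper_walkthroughs": 0.02,
    "attendance": 0.02,
}


def course_number_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of per-component number grades.

    Assumes each component in WEIGHTS is present in `scores` and is already
    a curved number grade between 0 and 100.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # The weights add up to 1.0 (24% + 27% + 41% + 4% + 2% + 2% = 100%).
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    example = {
        "exam1": 83, "exam2": 80, "assignments": 88,
        "group_homework_credit": 90, "paper_walkthroughs": 90, "attendance": 92,
    }
    print(round(course_number_grade(example), 2))
```

With the made-up example scores, the weighted sum is 19.92 + 21.6 + 36.08 + 3.6 + 1.8 + 1.84 = 84.84.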

2026 Assignments

Task1: Outlier Detection for a Houston Weather Dataset (second draft; tentative deadline: Feb. 26; 2023 Houston Weather Dataset). Task1 is an individual task and centers on density estimation, autoencoders, and outlier detection.

Task2: Group Project (tentative deadline: March 30) centering on classification/prediction and clustering; each group will find an "interesting" dataset and choose its own topic, which must be approved by Feb. 20 at the latest. Groups will present their Task2 results on April 2 and 7!

Task3: Collocation Mining for a Building Dataset (deadline: April 24; More Information about Task3; Building Dataset (KML file)). Task3 is an individual task centering on association analysis.

Task Weights: Task1: 90 points, Task2: 70 points (subject to change; might go up or down), Task3: 48 points.
Remark: The three task scores will be added, and the sum will be curved and converted into a number grade which counts 41% towards your course grade.

2026 Group Activities

Groups will perform Assignment2, summarize paper sections and lead the discussion in the two paper walkthroughs, and make a 13-18 minute group homework credit (GHC) presentation. GHC tasks include presenting solutions to homework-style problems, leading in-class discussions, and demoing tools. Each group gets a different GHC task.

2026 Group Homework Credit

Group A and Group B Tasks (Group A will present on Feb. 12 and Group B will present on Feb. 17)
Group C Task (will present on Feb. 24)
Group D Task (will present on March 3)
Group E Task (scheduled for March 24)
Group F Task (scheduled for April 2)
Group H Task (scheduled for April 21)
Group G Task (scheduled for April 23)
Group I Task (scheduled for April 28)

Remarks: Tasks will be assigned at least 4 days before your group's presentation date!

Paper Walkthroughs

2026 Paper Walkthrough Info (scheduled for March 26 and April 16)

Dr. Eick's COSC 6335 2026 Lecture Notes

I Introduction to Data Mining (Part1, Part2, Part3; covers Chapter 1 and Section 2.1)
II Data Science Basics / Exploratory Data Analysis (covers chapter 3 of the first edition of the Tan book)
III R (Arko's Short Intro Into R (used in Lab), Scatter Plot Code, Decision Trees in R, Some useful code for Task1 (to be discussed on Sept. 16), Some other code for Task2 (not discussed in the lecture), Computing Statistical Summaries in the Presence of Missing Values (NA) (not discussed in the lecture), Functions and Loops in R (useful, but not discussed in the lecture))
IV Similarity Assessment
V Naive, Parametric (MLE Example) and Non-Parametric Density Estimation
VI Autoencoders (presented by Janet on Feb. 5)
VII Outlier Detection (extended and updated on Feb. 6, 2026)
VIII Clustering: Introduction to Clustering, Density-based Clustering, Hierarchical Clustering, DENCLUE, EM, R-scripts demoing K-means/medoids and DBSCAN and Cluster Validity.
IX Supervised Learning (Introduction to Classification: Basic Concepts and Decision Trees, Overfitting, Introduction to Neural Networks. More on NNs: Simon J.D. Prince, Understanding Deep Learning, MIT Press, 2023.
Textbook Resources (contains slides as well as the book in pdf format, which you can download); relevant content: Fitting Models and Regularization; K-Nearest Neighbors and Support Vector Machines (partially covered in 2026, as we are running out of time)).
XI Association Analysis: Association Rule Mining, Sequence Mining (not covered in 2026).
XII Introduction to Spatial Data Mining (not covered in 2026)
XIII Data Preprocessing for Data Mining (will already be covered in early September)
XIV Advanced Clustering (will cover CLIQUE, DENCLUE, FCM and SNN)
XV Data Storytelling (not or only briefly covered in 2026)

Old Webpages of COSC 6335: 2013 and 2009.

In-Class Activity

Feb. 24 Activity (Group Activity; group size 4-5)

COSC 6335 Grading

Dr. Eick uses the following number grade scale for graduate courses: A: 100-90, A-: 90-85, B+: 85-82, B: 82-77, B-: 77-74, C+: 74-70, C: 70-66, C-: 66-62, D+: 62-58, D: 58-54, D-: 54-50, F: 50-0. Exam and assignment (problem set) scores are curved after they have been determined: exam scores are curved immediately, whereas assignment scores are normalized and added, and at the end of the semester your total for the 3 assignments is converted into a number grade.
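The scale above is a threshold lookup, sketched below in Python. Because adjacent ranges share their endpoints (for example, 90 appears in both "A: 100-90" and "A-: 90-85"), the sketch assumes that a grade on a shared boundary receives the higher letter grade; the scale itself does not say which side is inclusive. The names `SCALE` and `letter_grade` are hypothetical.

```python
# Hypothetical sketch of the graduate number-grade scale listed above.
# Assumption: a grade on a shared boundary (e.g. 90) receives the higher
# letter grade; the published scale does not state which side is inclusive.
SCALE = [  # (lower bound, letter grade), ordered from highest to lowest
    (90, "A"), (85, "A-"), (82, "B+"), (77, "B"), (74, "B-"),
    (70, "C+"), (66, "C"), (62, "C-"), (58, "D+"), (54, "D"),
    (50, "D-"), (0, "F"),
]


def letter_grade(number_grade: float) -> str:
    """Map a number grade between 0 and 100 to a letter grade using SCALE."""
    if not 0 <= number_grade <= 100:
        raise ValueError("number grade must lie between 0 and 100")
    for lower_bound, letter in SCALE:
        if number_grade >= lower_bound:
            return letter
    raise AssertionError("unreachable: the last lower bound is 0")
```

For example, under this assumption a number grade of 90 maps to "A" and 89.9 maps to "A-".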

Number grades higher than 95 are rarely used, except for truly exceptional performances and outliers; a grade of 95 already represents a letter grade of A+. Exams in graduate courses are usually curved so that the exam number grade average is in the range 81-84, depending on how well the students performed in the particular exam. Dr. Eick first determines the class's performance in the exam, selects an average, and then curves the exam accordingly. Number grades do not directly correspond to the percentage of points obtained: if the average percentage is high, percentages will be downgraded, and if the average percentage is low, percentages will be upgraded.

2026 Exam1 Curving Function: h(x)= round(82.9+((x - 31.55)*0.87))
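The curving function is linear: each raw point above or below 31.55 (presumably the raw class average, although the page does not say so) moves the curved number grade by 0.87, and a raw score of 31.55 maps to 82.9 before rounding. The Python sketch below implements h(x); the function name `curve_exam1` is hypothetical, and it assumes that "round" means rounding half up, because Python's built-in `round()` rounds halves to the nearest even integer.

```python
import math


def curve_exam1(raw_score: float) -> int:
    """Curved number grade for 2026 Exam1: h(x) = round(82.9 + (x - 31.55) * 0.87).

    Assumption: "round" means rounding half up, so floor(v + 0.5) is used
    instead of Python's round(), which rounds halves to the nearest even integer.
    """
    curved = 82.9 + (raw_score - 31.55) * 0.87
    return math.floor(curved + 0.5)
```

For example, a raw score of 40 gives 82.9 + 8.45 * 0.87 = 90.2515, which rounds to 90, and a raw score of 20 gives 72.8515, which rounds to 73.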

Old COSC 6335 News Items

Data Sets

Complex8
Complex9 with 8% Gaussian Noise Added
Bank Note Authentication

2024 Exams

Midterm1 Exam (scheduled for Oct. 10, 2024, 2:30p in 203 SEC, Review List, October 8, 2024 Review for the Midterm Exam, Solution Sketches 2022 Midterm Exam, Solution Sketches 2023 Midterm Exam)

Midterm2 Exam (2024 Review List, Nov. 21 Review for the Nov. 26, 2024 Exam) has been scheduled for Tuesday, November 26, 2024, 2:30p (Some solution sketches for the 12/09/22 Final Exam; Some solution sketches for the 12/07/23 Final Exam)

Course exams will be open book/notes paper exams; however, the use of cell phones and computers during the exam is not allowed; basic calculators are okay.

Midterm1 counts 22% and Midterm2 counts 24% toward the course grade.

Attendance 2024

Attendance counts 2% towards the course grade. Attendance will be taken from Tuesday, August 27 through the remainder of the semester. Only F2F attendance counts. In total, 24 attendances will be taken (August (2), September (7), October (9), November (6)). Your number of attendances will be converted into a number grade as follows:
24-23: 92, 22-21: 91, 20-19: 90, 18: 88, 17: 86, 16: 84, 15: 81, 14: 78, 13: 75, 12: 71, 11: 67, 10: 63, 9: 59, 8-0: 55.
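The conversion above is a threshold lookup; the Python sketch below encodes the 2024 table. It assumes the number of F2F attendances (out of 24) is already known, and the names `ATTENDANCE_TABLE` and `attendance_grade` are hypothetical.

```python
# Hypothetical sketch of the 2024 attendance-to-number-grade conversion above.
# Each entry is (minimum number of attendances, resulting number grade),
# ordered from the highest threshold to the lowest.
ATTENDANCE_TABLE = [
    (23, 92), (21, 91), (19, 90), (18, 88), (17, 86), (16, 84), (15, 81),
    (14, 78), (13, 75), (12, 71), (11, 67), (10, 63), (9, 59), (0, 55),
]


def attendance_grade(attended: int, total: int = 24) -> int:
    """Convert a count of F2F attendances (out of `total`) into a number grade."""
    if not 0 <= attended <= total:
        raise ValueError(f"attendances must lie between 0 and {total}")
    for minimum, grade in ATTENDANCE_TABLE:
        if attended >= minimum:
            return grade
    raise AssertionError("unreachable: the last threshold is 0")
```

For example, 22 attendances fall into the "22-21" range and yield 91, while any count of 8 or fewer yields 55.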