last updated: April 24, 2026


COSC 6335: Data Mining in Spring 2026
(Dr. Eick)
Goals of the Data Mining Course
Data mining centers on finding novel, interesting, valid, and potentially useful patterns in data. It aims
at transforming a large amount of data into a well of knowledge. Data mining has become a very
important field in industry as well as academia. The course covers most of the important data
mining techniques, covers the Basics of Data Science, and provides
background knowledge on how to conduct a data mining project.
Topics covered in the course include exploratory data analysis, classification and prediction,
clustering and similarity assessment, association analysis, outlier and anomaly detection, and
interpreting and evaluating data analysis/data mining results. Basic visualization techniques
and statistical methods will also be introduced. Moreover, hands-on data mining experience will be
provided in three assignments. You will also get some practical experience in evaluating data
mining results from your fellow students and data mining publications. Finally, you will learn how to use and
program in the popular statistics, visualization, and data mining environment R. The topics
of the course have some overlap with what is taught in the Machine Learning (COSC 6342) course. To
reduce this overlap, this course places a little less emphasis on learning
classification and prediction models (this topic will be covered "more quickly"),
and more emphasis will be put on Data Science Basics, Exploratory Data Analysis,
Association Analysis, Clustering, Outlier Detection and Data Set Augmentation.
There will be 3 assignments and 2 "paper walkthroughs" in 2026.
There will be 3 group activities in 2026: Assignment2, leading paper walkthroughs,
and Group Homework Credit (you will find a more
detailed description of the group activities below).
Comments concerning this website
If you have any comments
concerning this website, send e-mail
to: ceick@uh.edu
Basic Course Information
Instructor: Dr. Christoph F. Eick
office hours: TU 4-5p TH 9-10a
e-mail: ceick@uh.edu
TA: Janet Anagli
e-mail: jyanagli@CougarNet.UH.EDU
TA office hours: MO+TU 9-10a (on MS Teams)
class meets: TU/TH 11:30-1
class room: SEC 203
classes taught by others: February 5
cancelled classes: Likely, Tuesday, April 14
Course Materials
Objectives Data Mining Course
Highly Recommended Text:
- P.-N. Tan, M. Steinbach, and V. Kumar: Introduction to Data Mining,
  Addison Wesley, Second Edition, 2019
- Link to Book HomePage
Recommended Texts:
NIST/SEMATECH e-Handbook of Statistical Methods (good online
source covering exploratory data analysis, statistics, modelling and prediction)
2026 Course Organization
I. Introduction to Data Mining
II. Density Estimation
III. Similarity Assessment
IV. Outlier and Anomaly Detection
V. Autoencoders
VI. Data Science Basics and Exploratory Data Analysis
VII. Clustering
VIII. Supervised Learning: Basic Concepts, Decision Trees, Neural Networks and Deep Learning
IX. Association Analysis: Rule, Sequence, Graph and Collocation Mining
X. Data Storytelling (not covered in 2026)
XI. Preprocessing
Important Course Dates
Thursday, February 5: Assignment1 Lab
Tu., Feb. 24: In-class Activity (30 minutes) and Lockdown Browser Mock Exam (10 minutes)
Thursday, March 5, 11:30a: First Course Exam
Tu., March 10: "Fast" Task2 Topic Overview Presentations (120-150 seconds for each group)
Thursday, March 12: Last Lecture before Spring Break
March 17+19: no lecture (Spring Break)
Th., March 26: First Paper Walkthrough
Th+Tu, April 2+7: Task2 Student Presentations; groups G, H and I present on April 2.
Tu., April 14: no class!
Th., April 16: Second Paper Walkthrough
Thursday, April 30, 11:30a: Second Course Exam
News COSC 6335 (Data Mining) Spring 2026
- Exam2 has been scheduled for April 30, 11:30a-12:45p in our class room. Here is the
2026 Review List for Exam2!
- Task3 is due Friday, April 24 end of the day!
- Please upload your GHC presentation slides in the GHC channel of the 6335 MS Teams page.
- GHC: Group H will present April 21, Group G will present April 23, Group I will present April 28!
- We graded Exam1: Overall, the 2026 performance for Exam1 was more positive than negative.
On the positive side, a large number of students received scores above 35/50 (number grades of 86 or higher) and the Exam1 number grade
average is 82.89. On the negative side, five students had scores below 20.
- The scores for Task1 have been posted; the score average was 88% of the available points (90).
Course Elements and Their Weights
These are the final weights for Spring 2026:
Exams (2): 51% (Exam1 (March 5, 2026): 24%, Exam2 (April 30, 2026): 27%)
Assignments (3 tasks): 41%
Group Homework Credit (4%) and Paper Walkthroughs (2%): 6%
Attendance: 2%
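As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course number grade. It assumes that every component has already been converted to a number grade on a 0-100 scale and that the components are combined as a plain weighted sum; the dictionary keys, the function name course_grade and the sample grades are made up for illustration and are not an official grading tool.

```python
# Final Spring 2026 weights as listed above (fractions of the course grade).
WEIGHTS = {
    "exam1": 0.24,
    "exam2": 0.27,
    "assignments": 0.41,
    "ghc": 0.04,
    "paper_walkthroughs": 0.02,
    "attendance": 0.02,
}


def course_grade(number_grades):
    """Weighted sum of component number grades.

    Assumption: each value is already a number grade on a 0-100 scale and
    the course grade is a plain weighted sum of these values.
    """
    missing = set(WEIGHTS) - set(number_grades)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * number_grades[name] for name in WEIGHTS)


# Hypothetical student; these number grades are invented for the example.
example = {
    "exam1": 86,
    "exam2": 80,
    "assignments": 88,
    "ghc": 90,
    "paper_walkthroughs": 90,
    "attendance": 92,
}
print(round(course_grade(example), 2))
```

Because the assignments carry 41% and the two exams together carry 51%, these components dominate the result; the remaining 8% shifts the total only slightly.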
2026 Assignments
Task1: Outlier Detection For a Houston Weather Dataset
(second draft; tentative deadline: Feb. 26; 2023 Houston Weather Dataset);
Task1 is an individual task and centers on density estimation, autoencoders and outlier detection.
Task2: Group Project (tentative deadline: March 30)
centering on classification/prediction and clustering;
each group will find an "interesting" dataset and choose their own topic, which will be approved by Feb. 20 at the latest.
Groups will present their Task2 results in presentations on April 2 and 7!
Task3: Collocation Mining for a Building Dataset (deadline: April 24;
More Information about Task3; Building Dataset (KML file)).
Individual task centering on association analysis.
Task Weights: Task1: 90 points, Task2: 70 points (subject to change; might
go up or down), Task3: 48 points.
Remark: The three task scores will be added, and the sum will be curved and converted into a number grade which
counts 41% towards your course grade.
2026 Group Activities
Groups will perform Assignment2, summarize paper sections and lead the discussion in
the two paper walkthroughs, and make a 13-18 minute group homework credit (GHC) presentation. GHC tasks include
presenting solutions to homework-style problems, leading in-class discussions and demoing tools. Each group gets a different GHC task.
2026 Group Homework Credit
Group A and Group B Tasks (Group A will present on Feb. 12 and Group B will present on Feb. 17)
Group C Task (will present on Feb. 24)
Group D Task (will present on March 3)
Group E Task (scheduled for March 24)
Group F Task (scheduled for April 2)
Group H Task (scheduled for April 21)
Group G Task (scheduled for April 23)
Group I Task (scheduled for April 28)
Remarks: Tasks will be assigned at least 4 days before your group's presentation date!
Paper Walkthroughs
2026 Paper Walkthrough Info (scheduled for March 26 and April 16)
Dr. Eick's COSC 6335 2026 Lecture Notes

I Introduction to Data Mining (Part1, Part2,
Part3--- covers chapter 1 and Section 2.1)
II Data Science Basics / Exploratory Data Analysis (covers
chapter 3 of the first edition of the Tan book)
III R (Arko's Short Intro Into R (used in Lab),
Scatter Plot Code, Decision Trees in R, Some useful code for Task1 (to be discussed
on Sept. 16),
Some other code for Task2 (not discussed in the lecture),
Computing Statistical Summaries in the Presence of Missing Values (NA) (not discussed in the lecture),
Functions and Loops in R (useful, but not discussed in the lecture))
IV Similarity Assessment
V Naive, Parametric (MLE Example)
and Non-Parametric Density Estimation
VI Autoencoders (presented by Janet on Feb. 5)
VII Outlier Detection (extended and updated on Feb. 6, 2026)
VIII Clustering: Introduction to Clustering, Density-based Clustering, Hierarchical Clustering,
DENCLUE, EM,
R-scripts demoing K-means/medoids and
DBSCAN and Cluster Validity.
IX Supervised Learning (Introduction to Classification: Basic Concepts and Decision
Trees,
Overfitting,
Introduction to Neural Networks.
More on NNs: Simon J.D. Prince, Understanding Deep Learning, MIT Press, 2023.
Textbook Resources
(contains slides as well as the book in pdf format,
which you can download); relevant content: Fitting Models and
Regularization; K-Nearest Neighbors and
Support Vector Machines (partially covered in 2026, as we are running out of time)).
XI Association Analysis: Association Rule Mining,
Sequence Mining (not covered in 2026).
XII Introduction to Spatial Data Mining (not covered in 2026)
XIII Data Preprocessing for Data Mining (will already be covered early September)
XIV Advanced Clustering (will cover CLIQUE, DENCLUE, FCM and SNN)
XV Data Storytelling (not or only briefly covered in 2026)
Old Webpages of COSC 6335: 2013 and
2009.
In-Class Activity
Feb. 24 Activity (Group Activity; groupsize 4-5)
COSC 6335 Grading
Dr. Eick uses the following number grade scale for graduate courses: A: 100-90, A-: 90-85, B+: 85-82, B: 82-77,
B-: 77-74, C+: 74-70, C: 70-66, C-: 66-62, D+: 62-58, D: 58-54, D-: 54-50, F: 50-0. Exam and problem set scores
are curved after your raw exam and task scores
have been determined. Exam scores are curved immediately, whereas problem set scores are normalized and added;
ultimately, your total for the 3 assignments is converted at the end of the semester into a number grade.
Number grades higher than 95 are rarely used, except for
truly exceptional performances and outliers; a grade of 95
already represents a letter grade of A+. Exams of graduate courses are usually
curved so that the exam number grade average is in the
range 81-84, depending on how well the students performed in the particular exam. Dr. Eick
first determines the class's performance in the exam and selects an average,
and then the exam is curved accordingly.
Number grades do not directly correspond to the percentage obtained: if the
class's percentage average is high, percentages
will be downgraded, and if it is low, percentages will be upgraded.
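The number grade scale above can be read as a simple lookup, sketched here in Python. The ranges in the scale share their boundary values (for example, 90 appears in both A and A-); this sketch assumes that a boundary value belongs to the higher letter grade, which the scale does not state explicitly. The names SCALE and letter_grade are illustrative only.

```python
# Lower bounds of each letter grade, taken from the scale above.
# Assumption: a boundary value (e.g. exactly 90) counts for the higher grade.
SCALE = [
    (90, "A"), (85, "A-"), (82, "B+"), (77, "B"), (74, "B-"),
    (70, "C+"), (66, "C"), (62, "C-"), (58, "D+"), (54, "D"), (50, "D-"),
]


def letter_grade(number_grade):
    """Map a number grade (0-100) to a letter grade using the scale above."""
    if not 0 <= number_grade <= 100:
        raise ValueError("number grade must be between 0 and 100")
    for lower_bound, letter in SCALE:
        if number_grade >= lower_bound:
            return letter
    return "F"


for grade in (95, 86, 82.89, 49):
    print(grade, "->", letter_grade(grade))
```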
2026 Exam1 Curving Function: h(x)= round(82.9+((x - 31.55)*0.87))
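The curving function above can be tried out directly; the following Python sketch simply transcribes h(x). The raw scores in the loop are sample inputs chosen for illustration (Exam1 was scored out of 50 points). Reading the formula, a raw score of 31.55 maps to about 82.9, and each additional raw point adds 0.87 number grade points. Python's built-in round rounds exact .5 ties to the nearest even integer; the source does not say which tie-breaking convention is intended.

```python
def curve_exam1(raw_score):
    """2026 Exam1 curving function h(x) = round(82.9 + (x - 31.55) * 0.87)."""
    return round(82.9 + (raw_score - 31.55) * 0.87)


# Sample raw scores (out of 50), chosen only to illustrate the curve.
for raw in (20, 31.55, 35, 50):
    print(raw, "->", curve_exam1(raw))
```

A raw score of 35 comes out as 86, which matches the news item stating that scores above 35/50 correspond to number grades of 86 or higher.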
Old COSC 6335 News Items
- One personal comment about Tuesday's Task2 presentations: Several students left immediately after or shortly
after their group's presentation without listening to the presentations of the other groups. Another problem is students
showing up for the lecture 20+ minutes late; I understand a few students might show up a few minutes late
due to their instructor's finishing late and having to walk 10-15 minutes to SEC 203. I find those behaviors completely unacceptable!
- The order in which topics will be covered in 2026 is quite different from the order the topics were covered
in 2024. Moreover, the peer reviewing tool Kritik, which was used in the course's teaching in 2022 and 2024, will
not be used in 2026.
Moreover, in 2026 students will present more and lead discussions, in comparison to the 2024 teaching of
the course; therefore, there will be fewer traditional lectures.
- Always download documents from the course website, as it always stores the most recent version of
the respective documents.
Data Sets
Complex8
Complex9 with 8% Gaussian Noise Added
Bank Note Authentication
2024 Exams
Midterm1 Exam (scheduled for Oct. 10, 2024, 2:30p in 203 SEC, Review List,
October 8, 2024 Review for the Midterm Exam, Solution Sketches 2022 Midterm
Exam, Solution Sketches 2023 Midterm
Exam)
Midterm2 Exam (2024 Review List, Nov. 21 Review
for the Nov. 26, 2024 Exam) has been scheduled for Tuesday, November 26, 2024, 2:30p (Some solution
sketches for the 12/09/22 Final Exam; Some solution
sketches for the 12/07/23 Final Exam)
Course exams will be open book/notes paper exams; however, the use of cell phones and
computers during the exam is not allowed; basic calculators are okay.
Midterm1 counts 22% and Midterm2 counts 24% toward the course grade.
Attendance 2024
Attendance counts 2% towards the course grade.
Attendance will be taken starting Tuesday, August 27 throughout the remainder of the semester. Only
F2F attendance counts. Therefore, 24 attendances will be taken
(August (2), September (7), October (9), November (6)). Your number of attendances will be converted as follows into a number grade:
24-23: 92, 22-21: 91, 20-19: 90, 18: 88, 17: 86, 16: 84, 15: 81, 14: 78, 13: 75, 12: 71, 11: 67, 10: 63, 9: 59, 8-0: 55.
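The 2024 attendance conversion above amounts to a small lookup table, sketched here in Python. The function name attendance_grade and the input check are illustrative additions; the grade values are copied from the list above.

```python
# Number grades for 9 to 18 attendances, copied from the 2024 conversion list.
SINGLE_COUNTS = {
    18: 88, 17: 86, 16: 84, 15: 81, 14: 78,
    13: 75, 12: 71, 11: 67, 10: 63, 9: 59,
}


def attendance_grade(attendances):
    """Convert a 2024 attendance count (0-24) into a number grade."""
    if not 0 <= attendances <= 24:
        raise ValueError("attendance count must be between 0 and 24")
    if attendances >= 23:
        return 92
    if attendances >= 21:
        return 91
    if attendances >= 19:
        return 90
    # Counts of 8 or fewer all map to 55.
    return SINGLE_COUNTS.get(attendances, 55)


for count in (24, 20, 15, 8):
    print(count, "->", attendance_grade(count))
```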