last updated: Dec. 6, 2024


COSC 6335: Data Mining in Fall 2024 (Dr. Eick)



2024 COSC 6335 Syllabus

Goals of the Data Mining Course

Data mining centers on finding novel, interesting, valid, and potentially useful patterns in data. It aims at transforming large amounts of data into a well of knowledge. Data mining has become a very important field in industry as well as academia. The course covers most of the important data mining techniques and the basics of data science, and provides background knowledge on how to conduct a data mining project. Topics covered in the course include exploratory data analysis, classification and prediction, clustering and similarity assessment, association analysis, outlier and anomaly detection, and interpreting and evaluating data analysis/data mining results. Basic visualization techniques and statistical methods will also be introduced. Moreover, hands-on data mining experience will be provided in three Problem Sets. You will also get some practical experience in evaluating data mining results from your fellow students and from data mining publications. Finally, you will learn how to use and program in R, the popular statistics, visualization, and data mining environment.

The topics of the course have some overlap with what is taught in the Machine Learning (COSC 6342) course. To reduce this overlap, this course places a little less emphasis on learning classification and prediction models (this topic will be covered "more quickly" and not a lot of points are allocated in the problem sets to this topic), and more emphasis is put on Data Science Basics, Exploratory Data Analysis, Association Analysis, Clustering, and Outlier Detection.

Comments concerning this website

If you have any comments concerning this website, send e-mail to: ceick@uh.edu

Basic Course Information

Instructor: Dr. Christoph F. Eick
office hours: TU 4:10-4:55p TH 8:50-10a (on MS Teams)
e-mail: ceick@uh.edu
TA: Md. Mahin
e-mail: mdmahin3@gmail.com
TA office hours: MO 12:30-1:30p TU 12:30-1:30p (on MS Teams)
class meets: TU/TH 2:30-4p
class room: SEC 203
classes taught by others: Mahin will teach Sept. 24 and Oct. 31!
cancelled classes: Tuesday, September 10 and Tuesday, November 12.
Makeup class: Friday, November 1, 3:30-4:45p (MS Teams online lecture)

Course Materials

Objectives Data Mining Course

Highly Recommended Text:
P.-N. Tan, M. Steinbach, and V. Kumar: Introduction to Data Mining,
Addison Wesley, Second Edition,
Link to Book HomePage

Recommended Texts:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
Morgan Kaufmann Publishers, Third Edition, 2011.
Link to Data Mining Book Home Page

NIST/SEMATECH e-Handbook of Statistical Methods (good online source covering exploratory data analysis, statistics, modelling, and prediction)

2024 Course Organization

I.	Introduction to Data Mining
II.	Data Science Basics and Exploratory Data Analysis
III.	Preprocessing
IV.	Brief Introduction to Peer Reviewing and Using Kritik for It
V.	Labs: Using R and Python for Data Science and Data Mining
VI.	Density Estimation
VII.	Outlier and Anomaly Detection
VIII.	Introduction to Clustering and Similarity Assessment
IX.	Data Storytelling
X.	Classification: Basic Concepts, Decision Trees, and Neural Networks
XI.	Introduction to Deep Learning Centering on Autoencoders
XII.	Reviewing Data Mining Papers
XIII.	Association Analysis: Rule, Sequence, Graph, and Collocation Mining
XIV.	Spatial Data Mining
XV.	Advanced Clustering

Important Course Dates

Thursday, August 29: Lab taught by Mahin (in preparation for Task1; bring laptop)
Thursday, October 10, 2:30p: First Course Exam (Review List 2023 Midterm Exam)
Thursday, October 31, 2:30p: "Introduction to Deep Learning" and Task4 Lab taught by Mahin.
Tuesday, November 26, 2:30p: Second Course Exam

News COSC 6335 (Data Mining) Fall 2024