last updated: December 15, 2023


COSC 6335: Data Mining in Fall 2023 (Dr. Eick)



2023 COSC 6335 Syllabus

Goals of the Data Mining Course

Data mining centers on finding novel, interesting, valid, and potentially useful patterns in data. It aims at transforming a large amount of data into a well of knowledge. Data mining has become a very important field in industry as well as academia. The course covers most of the important data mining techniques and the basics of data science, and provides background knowledge on how to conduct a data mining project. Topics covered include exploratory data analysis, classification and prediction, clustering and similarity assessment, association analysis, outlier and anomaly detection, and interpreting and evaluating data analysis/data mining results. Basic visualization techniques and statistical methods will also be introduced. Moreover, hands-on data mining experience will be provided in three Problem Sets. You will also get some practical experience in evaluating data mining results from your fellow students and from data mining publications. Finally, you will learn how to use and program in R, a popular statistics, visualization, and data mining environment. The topics of the course overlap somewhat with what is taught in the Machine Learning (COSC 6342) course. To reduce this overlap, this course places a little less emphasis on learning classification and prediction models (this topic will be covered "more quickly", and not many problem set points are allocated to it) and more emphasis on Data Science Basics, Exploratory Data Analysis, Association Analysis, Clustering, and Outlier Detection.

Comments concerning this website

If you have any comments concerning this website, send e-mail to: ceick@uh.edu

Basic Course Information

Instructor: Dr. Christoph F. Eick
office hours: TU 4:10-5p TH 8:50-10a (on MS Teams)
e-mail: ceick@uh.edu
TA: Md. Mahin
e-mail: mdmahin3@gmail.com
TA office hours: MO 12:30-1:30p TU 12:30-1:30p (on MS Teams)
2014 TA website: Arko Barman's COSC 6335 Website
class meets: TU/TH 2:30-4p
class room: S 120
classes taught by others: Tuesday, October 3; Tuesday, November 14; Thursday, November 16. Moreover, Mahin will teach 3-4 labs of 30-75 minutes each as part of the lecture to provide background knowledge for some problem set tasks. Makeup class (if necessary): TBDL

Course Materials

Objectives Data Mining Course

Highly Recommended Text:
P.-N. Tan, M. Steinbach, and V. Kumar: Introduction to Data Mining,
Addison Wesley, Second Edition,
Link to Book HomePage

Recommended Texts:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
Morgan Kaufmann Publishers, Third Edition, 2011.
Link to Data Mining Book Home Page

NIST/SEMATECH e-Handbook of Statistical Methods (good online source covering exploratory data analysis, statistics, modelling, and prediction)

2023 Course Organization

I.	Introduction to Data Mining
II.	Data Science Basics and Exploratory Data Analysis
III.	Preprocessing
IV.	Brief Introduction to Peer Reviewing and using Kritik for it
V.	Labs: Using R and Python for Data Science and Data Mining
VI.	Introduction to Clustering and Similarity Assessment
VII.	Data Storytelling
VIII.	Density Estimation
IX.	Outlier and Anomaly Detection
X.	Classification: Basic Concepts and Decision Trees, Support Vector Machines and Neural Networks
XI.	Introduction to Deep Learning Centering on Autoencoders
XII.	Reviewing Data Mining Papers
XIII.	Association Analysis: Rule, Sequence, Graph and Collocation Mining
XIV.	Spatial Data Mining
XV.	Advanced Clustering

Important Course Dates

Tuesday, September 5: Lab taught by Mahin (in preparation for Task 1)
Tuesday, October 17, 2:30p: Midterm Exam (Review List 2023 Midterm Exam)
Thursday, December 7, 2p: Final exam

News COSC 6335 (Data Mining) Fall 2023