COSC 6335: Data Mining in Fall 2020 (Dr. Eick)

Goals of the Data Mining Course

Data mining centers on finding novel, interesting, valid, and potentially useful patterns in data. It aims at transforming large amounts of data into a well of knowledge. Data mining has become a very important field in industry as well as academia. The course covers most of the important data mining techniques, covers the basics of data science, and provides background knowledge on how to conduct a data mining project. Topics covered in the course include exploratory data analysis, classification and prediction, clustering and similarity assessment, association analysis, outlier and anomaly detection, and interpreting and evaluating data analysis/data mining results. Basic visualization techniques and statistical methods will also be introduced. Moreover, hands-on data mining experience will be provided in three Problem Sets. You will also get some practical experience in evaluating data mining results from your fellow students and from data mining publications. Finally, you will learn how to use and program in R, the popular statistics, visualization, and data mining environment. The topics of the course have some overlap with what is taught in the Machine Learning (COSC 6342) course. To reduce this overlap, this course places a little less emphasis on learning classification and prediction models (this topic will be covered "more quickly" and not a lot of points are allocated to it in the problem sets) and more emphasis on Data Science Basics, Exploratory Data Analysis, Association Analysis, Clustering, and Outlier Detection.

Basic Course Information

Instructor: Dr. Christoph F. Eick
Office hours: TU 4-5p TH 11:30a-12:30p (on MS Teams)
TA: Nour Smaoui
TA office hours: TU 1-2p TH 4-5p (on MS Teams)
Class meets: TU/TH 2:30-4p
Classes taught by others: none
Cancelled classes: TBDL
Makeup class (if necessary): none
Lecture not taught by Dr. Eick: Th., October 29

Course Materials

COSC 6335 Syllabus Fall 2020
Objectives Data Mining Course

Highly Recommended Text:
P.-N. Tan, M. Steinbach, and V. Kumar: Introduction to Data Mining,
Addison-Wesley, Second Edition
Link to Book Home Page

Recommended Texts:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques
Morgan Kaufmann Publishers, Third Edition, 2011.
Link to Data Mining Book Home Page

NIST/SEMATECH e-Handbook of Statistical Methods (good online source covering exploratory data analysis, statistics, modelling and prediction)

Course Content

1. Introduction to Data Mining
2. Exploratory Data Analysis
3. A Short Introduction to R
4. Brief Discussion of Peer Reviewing and Using Kritik for It
5. Introduction to Classification: Basic Concepts, Decision Trees, Support Vector Machines, k-NN, and Neural Networks
6. Association Analysis: Rule, Sequence, Graph and Collocation Mining
7. Outlier and Anomaly Detection
8. Introduction to Density Estimation
9. Introduction to Clustering and Similarity Assessment
10. On Convolutional Neural Networks and Autoencoders in Particular and on Deep Learning in General
11. More on Clustering: Density-based Clustering, EM, Hierarchical Clustering and Cluster Validity
12. A Brief Introduction to Spatial Data Mining
13. Data Preprocessing

Tentative Course Schedule

Th., September 3, 3:05p: Likely, R Lab (given by Nour)
Th., October 8: 35-minute Review for 2020 Midterm Exam
Tu., October 13: Midterm Exam (Review List 2020 Midterm Exam)
Th., November 5, 11p: Deadline Collocation Mining Group Project
Th., November 26: no class (Thanksgiving)
Tu., December 1: Peer Reviewing and Kritik Discussion
Th., December 3: last lecture including a Review for the 2020 Final Exam
Th., December 10, 2p: Final Exam (Review List 2020 Final Exam)
Tu., December 22, 3:30p: Collocation Mining Group Project Post Analysis (only if there is some interest in having this meeting)

