I suggest you visit this webpage "one last time" in the window Dec. 28-31, 2024, as Mahin
and I will post "a lot of interesting things" shortly after Christmas.
2024 Problem Set Tasks
Task1: Exploratory Data Analysis for an Abalone Dataset; Task1 is a
peer-review group task which is due Friday, September 20, 11p in Kritik.
Mahin's August 29 Lab.
Task2: Outlier Detection For a Houston Weather Dataset; Task2 is an individual,
peer-reviewed task
and is due Thursday, October 17, 11p in Kritik.
Task3: Clustering an Earthquake and a Weather Dataset; Task3 is an individual task which is due
Sunday, November 3, end of the day in MS Teams.
Task4: Deep Learning Centering on Auto-Encoders and Generative Models; individual task which
is due on Monday, November 11, end of the day in MS Teams.
Task5: Reviewing a Data Mining Paper
(ICDM 2022 paper to review, Task5 Discussion; Task5 is a peer-reviewed
group task which is due on Nov. 16 in Kritik, followed by peer reviewing other groups' reviews of the paper)
Problem Set Tasks and Schedule
Task 1 (Group Task): August 31-Sept. 26: Exploratory Data Analysis
Task 2: Sept. 26-Oct. 22: Outlier Detection; Task2 is due on Oct. 18 in Kritik, followed by peer reviewing.
Task 3: Oct. 16-Nov. 2: Clustering (no peer reviewing)
Task 4: Nov. 3-11: Deep Learning Task (no peer reviewing)
Task 5 (Group Task): Nov. 12-Nov. 22: Data Mining Paper Reviewing Task
The tentative weights of the 5 problem set tasks are as follows: Task 1 = 18%,
Task 2 = 29%,
Task 3 = 21%,
Task 4 = 16%,
Task 5 = 16%.
The problem set tasks count 47% toward the course grade!
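The tentative task weights above can be combined as a plain weighted average. This is only a minimal sketch: it assumes each task grade is already a number grade on a 0-100 scale, and it ignores any normalization or curving that is applied to the actual scores. The function name and the example grades are hypothetical.

```python
# Tentative 2024 task weights from the list above (percent of the problem set total).
TASK_WEIGHTS = {"Task 1": 18, "Task 2": 29, "Task 3": 21, "Task 4": 16, "Task 5": 16}


def problem_set_score(task_grades):
    """Weighted average of per-task number grades (assumed to be on a 0-100 scale)."""
    total_weight = sum(TASK_WEIGHTS.values())
    weighted_sum = sum(TASK_WEIGHTS[task] * task_grades[task] for task in TASK_WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical grades, for illustration only.
example_grades = {"Task 1": 85, "Task 2": 90, "Task 3": 80, "Task 4": 88, "Task 5": 92}
print(round(problem_set_score(example_grades), 2))
```

Because Task 2 carries the largest weight (29%), a change in the Task 2 grade moves the total almost twice as much as the same change in Task 4 or Task 5.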
2024 Exams
Midterm1 Exam (scheduled for Oct. 10, 2024, 2:30p in 203 SEC, Review List,
October 8, 2024 Review for the Midterm Exam, Solution Sketches 2022 Midterm
Exam, Solution Sketches 2023 Midterm
Exam)
Midterm2 Exam (2024 Review List, Nov. 21 Review
for the Nov. 26, 2024 Exam) has been scheduled for Tuesday, November 26, 2024, 2:30p (Some solution
sketches for the 12/09/22 Final Exam; Some solution
sketches for the 12/07/23 Final Exam)
Course exams will be open book/notes paper exams; however, the use of cell phones and
computers during the exam is not allowed; basic calculators are okay.
Midterm1 counts 23% and Midterm2 counts 25% toward the course grade.
Attendance 2024
Attendance counts 2% towards the course grade.
Attendance will be taken starting Tuesday, August 27 throughout the remainder of the semester. Only
F2F attendance counts. Therefore, 24 attendances will be taken
(August (2), September (7), October (9), November (6)). Your number of attendances will be converted as follows into a number grade:
24-23: 92, 22-21: 91, 20-19: 90, 18: 88, 17: 86, 16: 84, 15: 81, 14: 78, 13: 75, 12: 71, 11: 67, 10: 63, 9: 59, 8-0: 55.
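The conversion above is a simple lookup table. The following sketch encodes it as given; the function name is hypothetical, and rejecting counts outside 0-24 is an assumption made for the example.

```python
# Conversion table from the line above: number of attendances -> number grade.
ATTENDANCE_GRADE = {
    24: 92, 23: 92, 22: 91, 21: 91, 20: 90, 19: 90,
    18: 88, 17: 86, 16: 84, 15: 81, 14: 78, 13: 75,
    12: 71, 11: 67, 10: 63, 9: 59,
}


def attendance_grade(count):
    """Return the number grade for a given attendance count (0 to 24)."""
    if not 0 <= count <= 24:
        raise ValueError("attendance count must be between 0 and 24")
    # Counts of 8 or fewer all map to 55.
    return ATTENDANCE_GRADE.get(count, 55)


print(attendance_grade(20))
```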
Dr. Eick's COSC 6335 2024 Lecture Notes
I Introduction to Data Mining (Part1, Part2,
Peer Reviewing and Kritik,
Part3--- covers chapter 1 and Section 2.1)
II Data Science Basics (formerly called Exploratory Data Analysis) (covers
chapter 3 of the first edition of the Tan book)
III R (Arko's Short Intro Into R (used in Lab),
Scatter
Plot Code, Decision Trees in R, Some useful code for Task1 (to be discussed
on Sept. 16),
Some other code for Task1 (not discussed in the lecture),
Computing Statistical Summaries In the Presence of Missing Values (NA) (not discussed in the lecture),
Functions
and Loops in R (useful, but not discussed in the lecture))
IV Peer Reviewing and Kritik (
Introduction to Kritik Video (we
will watch this video during the lecture!))
V Naive, Parametric
and Non-Parametric Density Estimation
VI Introduction to Similarity Assessment
and Clustering
VII More on Clustering: Density-based Clustering, Hierarchical Clustering,
DENCLUE, EM,
R-scripts demoing K-means/medoids and
DBSCAN, Randomized Hill Climbing and Cluster Validity.
VIII Outlier Detection
IX Classification (Introduction to Classification: Basic Concepts and Decision
Trees,
Overfitting,
Neural
Networks Part1
(3blue1brown: What is a
Neural Network? (will show the first 12:30 of this video)),
Neural Networks Part2, kNN-Classifiers and Support Vector Machines)
X Deep Learning (Introduction to Deep Learning (will watch and discuss some MIT Deep Learning
Bootcamp Videos), Review Neural Network Basics, Autoencoders, Language Models and Convolutional
neural networks (CNN); taught by Mahin on October 31, 2024, More on VAEs (not covered in 2023!))
XI Association Analysis: Association Rule Mining,
Sequence Mining.
XII Introduction to Spatial Data Mining
XIII Data Preprocessing for Data Mining (will already be covered in early September)
XIV Advanced Clustering (will cover CLIQUE, FCM and EM)
XV Data Storytelling
Old Webpages of COSC 6335: 2013 and
2009.
Data Sets
Complex8
Complex9 with 8% Gaussian Noise Added
Bank Note Authentication
Group Homework Credit
2024 GHC Groups and Contact E-mails
In this activity, which is called group homework credit, each group formed for this activity
receives a different, usually homework-style, problem (other tasks include demos and leading discussions);
the group presents its solution during the lecture (10-14 minutes)
and shares its solution in the form of a
Word or pptx file. The groups and e-mail addresses of the group members have been posted in the 'Group Homework Credit' channel
of this section's MS Team. Below is a list of the already assigned tasks and associated groups and presentation dates;
tasks will be added as we move along with the teaching of the course; tasks will be posted at
least 5 days before a group's presentation date!
2024 Schedule and Tasks
Groups A and B will present on Sept. 12 and Group C will present on
Sept. 17 (Tasks for groups A, B, C)
Group D will present on Sept. 19 and Group E will present on Sept. 26 (Group D and E Tasks)
Group F Task (will present on Th., October 3!)
Group G Task (will present on Oct. 8)
Group H, I, J and K Tasks (Group H will present on Oct. 22, Group I will present
on October 29, Group J will present on Nov. 1 (online) and Group K will present and lead a discussion on November 7)
Group L and M Tasks (group L will present Nov. 14 and group M will present Nov. 19)
Group N Task (will present on November 21)
2023 COSC 6335 Grading
Dr. Eick uses the following number grade scale for graduate courses: A: 100-90, A-: 90-85, B+: 85-82, B: 82-77,
B-: 77-74, C+: 74-70, C: 70-66, C-: 66-62, D+: 62-58, D: 58-54, D-: 54-50, F: 50-0. Exams and Problemsets
are still curved after your exam and task scores
have been determined. Exam scores are curved immediately and problem set scores are normalized and added;
ultimately, your Problemset total is converted at the end of the semester into a number grade which
counts about 48% to the overall number grade score which is then converted into a letter grade.
Number grades higher than 95 are rarely used---except for
truly exceptional performances and outliers; a grade of 95
already represents a letter grade of A+… Exams of graduate courses are usually
curved so that the exam number grade average is in the
range 81-84, depending on how well the students performed in the particular exam. Dr. Eick
first determines the class’ performance in the exam and selects an average
and then the exam is curved accordingly.
Number grades do not directly correspond to the percentage obtained: for
high percentage averages, percentages
will be downgraded, and for low percentage averages, percentages will be upgraded.
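The number grade scale given earlier in this section can be read as a threshold lookup. The sketch below is illustrative only: the scale lists shared boundaries (for example, 90 appears in both A and A-), and the code assumes that a boundary value receives the higher letter grade, which the text does not state. The function name is hypothetical.

```python
# Lower bounds of each letter grade, taken from the scale in this section.
# Assumption: a grade exactly on a boundary (e.g. 90) gets the higher letter.
GRADE_FLOORS = [
    (90, "A"), (85, "A-"), (82, "B+"), (77, "B"), (74, "B-"),
    (70, "C+"), (66, "C"), (62, "C-"), (58, "D+"), (54, "D"), (50, "D-"),
]


def letter_grade(number_grade):
    """Map a (curved) number grade on a 0-100 scale to a letter grade."""
    for floor, letter in GRADE_FLOORS:
        if number_grade >= floor:
            return letter
    return "F"


print(letter_grade(83))
```

Note that the input is the curved number grade, not the raw percentage of points obtained in an exam.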
As far as individual Kritik tasks are concerned, creation scores were weighted by 66%, written evaluation scores by 15%, grading scores
by 10% and feedback scores by 9%. As far as the grading of the 2 group tasks is concerned:
Dr. Eick read your collocation mining reports and created his own creation score for each group project,
based on his impressions and his evaluation of
the quantity and the quality of the work you did for the collocation mining task;
this score will be combined with the
creation score your group received from your student peers (by averaging the two scores and then
converting the result into a number grade). The same procedure was used
to obtain a final creation score for Task 7, except Nour and not Dr. Eick took a look at your reviews. Moreover, no feedback with
respect to written comments is solicited from the groups in Kritik group projects. However, your written
comments for Task 6 will be graded by Dr. Eick and for Task 7 by Nour, and this written comment score will be combined (counting 15%) with
your group's creation score (counting 85%) when the final group task grade is computed. In contrast to individual Kritik projects
there will be no feedback and grading scores for group projects.
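The weighting described above can be sketched as follows. Several things here are assumptions: the feedback weight is taken to be 9% so that the four individual weights sum to 100%, all component scores are assumed to be on the same 0-100 scale, and the conversion into a number grade mentioned in the text is omitted. All function and key names are hypothetical.

```python
# Weights for individual Kritik tasks (feedback weight of 9% is an assumption
# chosen so that the four weights sum to 100%).
INDIVIDUAL_WEIGHTS = {
    "creation": 0.66,
    "written_evaluation": 0.15,
    "grading": 0.10,
    "feedback": 0.09,
}


def individual_task_score(scores):
    """Weighted sum of the four Kritik component scores (each assumed 0-100)."""
    return sum(INDIVIDUAL_WEIGHTS[name] * scores[name] for name in INDIVIDUAL_WEIGHTS)


def group_task_score(instructor_creation, peer_creation, written_comments):
    """Group task: average the two creation scores, then weight 85% / 15%."""
    creation = (instructor_creation + peer_creation) / 2
    return 0.85 * creation + 0.15 * written_comments


# Hypothetical scores, for illustration only.
print(individual_task_score(
    {"creation": 80, "written_evaluation": 90, "grading": 70, "feedback": 100}
))
print(group_task_score(instructor_creation=80, peer_creation=90, written_comments=100))
```

Group tasks have no grading or feedback components, which is why the group formula uses only the creation score and the written comment score.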
Comments about COSC 6335 Exams: to reduce the probability of cheating, the COSC 6335 final exam was designed to
be slightly too long. If it turns out that the exam was more than slightly too long, be aware of the fact that all
students took the same exam, and that the final exam is still subject to curving to possibly rectify some problems with
exam length.
In most of Dr. Eick's exams students who score more than 75% of the available points in the exam usually will get a grade of A- or better.
Moreover, I believe that graduate students should be "challenged" to demonstrate their skills during final exams. On the other
hand, this semester's
midterm exam was not very challenging, in my opinion,
and more importantly, it was much too short, encouraging cheating.
Dr. Eick also concluded---after using them a lot for 2 semesters---that multiple choice exams are not appropriate to
assess particular skills, and that often multiple choice exams
test a student's capability with respect to differences in natural language semantics rather than assessing if students actually
understand and can apply what was taught in a course.
Peer Review using Kritik
Moreover, peer review will be a component of this course: you will evaluate work of other students
in the course and work of other peers in
the field of data mining; Kritik will be used for producing
and evaluating peer evaluations. There will be a $29+tax student fee to get Kritik access; however, if you
are a really poor graduate student, feel free to contact Dr. Eick about subsidizing this fee! Moreover, we received
2 'free' accounts for deserving students!
Kritik Tasks: There is a 24-hour grace period for each Kritik task. You are allowed to use this grace period for
up to 5 submissions. "Draft" rubrics for the peer reviewed tasks in the problem sets can be found in the ProblemSet
channel in DM2000 and ultimately you will find the "final" rubric in Kritik. As these rubrics tell
you how your submissions will be graded by your peers, Nour and Dr. Eick, it might be worthwhile to look
at those rubrics closely.
Polls
Nov. 19, 2024 Poll about Problemset Tasks and Kritik
Graduate Research Opportunities
UH-DAIS Research Overview
2023 Problem Sets
ProblemSet1 (Task1:
Exploratory Data Analysis for a Basel Weather Dataset, Task2: Develop
an Intelligent Tool which Compares Boxplots; individual
tasks; you should read the specification of Task1 by August 30, and should start working on Task1 on
Sept. 5!)
ProblemSet2 (contains Task3 a clustering group task, and Task4 an individual, peer reviewed
outlier detection task)
ProblemSet3 (contains Task5, a peer reviewed group task in which you will review a
data mining paper which is due on November 14, 2023; Short Discussion
Concerning Reviewing Data Mining Papers).
2023 Groups for Tasks 3 and 5; Task 1, 2, and 4 are individual tasks!
2023 Weights for the ProblemSet Tasks: Task1:4, Task2:4, Task3:4, Task4:5, Task5:1.5.
2023 Weights of Course Elements: Midterm Exam:21%, Final Exam:27%; Group Homework Credit: 3%; Attendance:2%; Problem
Set Tasks:47%.
Old COSC 6335 News Items
- It is unacceptable that more than 50% of the students in COSC 6335 have shown up 5-25 minutes late for class in the last
two weeks!
- Task3 has been posted.
- Due to multiple student requests, the lecture on Tu., Sept. 10 has been cancelled; it will be
made up on Friday, Sept. 27, 1:15p-2:30p (as an online lecture via MS Teams).
- We will be taking class attendance starting August 27, 2024.
- The GHC tasks for groups A-E have been posted; see below!
- Please make sure that your Kritik accounts are active by September 15 at the latest.
- Always download documents from the course website, as it always stores the most recent version of
the respective documents.