The results of the Kritik Poll have been posted in a channel with the same name in the DM2020 team.
Please, take a look!
COSC 6335 Grading
Dr. Eick uses the following number grade scale: A: 100-90, A-: 90-86, B+: 86-82, B: 82-77,
B-: 77-74, C+: 74-70, C: 70-66, C-: 66-62, D+: 62-58, D: 58-54, D-: 54-50, F: 50-0. Exams and problem sets
are curved after your exam and task scores have been determined. Exam scores are curved immediately,
whereas problem set scores are normalized and added up; at the end of the semester, your problem set
total is converted into a number grade, which counts about 48% toward the overall number grade.
The overall number grade is then converted into a letter grade.
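The number grade scale above can be read as a lookup from a number grade to a letter grade. The following is a minimal sketch in Python, not Dr. Eick's actual procedure: the names GRADE_FLOORS and letter_grade are made up for illustration, and the sketch assumes that a score lying exactly on a boundary (for example 90) receives the higher of the two letter grades, which the scale itself does not specify.

```python
# Lower bound of each letter grade, taken from the scale stated above.
# Assumption: a score exactly on a boundary gets the higher letter grade.
GRADE_FLOORS = [
    (90, "A"), (86, "A-"), (82, "B+"), (77, "B"), (74, "B-"),
    (70, "C+"), (66, "C"), (62, "C-"), (58, "D+"), (54, "D"),
    (50, "D-"), (0, "F"),
]


def letter_grade(number_grade):
    """Map a number grade in [0, 100] to a letter grade."""
    if not 0 <= number_grade <= 100:
        raise ValueError("number grade must be between 0 and 100")
    # Floors are listed from highest to lowest, so the first match wins.
    for floor, letter in GRADE_FLOORS:
        if number_grade >= floor:
            return letter
    raise AssertionError("unreachable: 0 is the lowest floor")


if __name__ == "__main__":
    for score in (95, 83.5, 74, 49.9):
        print(score, letter_grade(score))
```

Note that this only covers the last step; the curving that happens before a number grade is obtained is not modeled here.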
Number grades higher than 95 are rarely used, except for
truly exceptional performances and outliers; a grade of 95
already represents a letter grade of A+. Exams of graduate courses are usually
curved so that the exam number grade average is in the
range 81-84, depending on how well the students performed in the particular exam. Dr. Eick
first determines the class's performance in the exam, selects an average,
and then curves the exam accordingly.
Number grades do not directly correspond to the percentage of points obtained: when the
class percentage average is high, percentages are adjusted downward, and when it is low,
percentages are adjusted upward.
As far as individual Kritik tasks are concerned, creation scores were weighted by 66%, written evaluation scores by 15%, grading scores
by 10%, and feedback scores by 9%. As far as the grading of the 2 group tasks is concerned:
Dr. Eick read your collocation mining reports and created his own creation score for each group project,
based on his impressions and his evaluation of the
quantity and the quality of the work you did for the collocation mining task. This score is combined with the
creation score your group received from your student peers by averaging the two scores and then
converting the result into a number grade. The same procedure was used
to obtain a final creation score for Task 7, except that Nour, not Dr. Eick, took a look at your reviews. Moreover, no feedback with
respect to written comments is solicited from the groups in Kritik group projects. However, your written
comments for Task 6 will be graded by Dr. Eick and for Task 7 by Nour, and this written comment score will be combined (counting 15%) with
your group's creation score (counting 85%) when the final group task grade is computed. In contrast to individual Kritik projects,
there will be no feedback and grading scores for group projects.
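The weighting described above can be sketched as two small Python functions. This is an illustration under stated assumptions, not the actual grading code: the function and parameter names are hypothetical, all inputs are assumed to already be on the same 0-100 scale (the conversion into number grades mentioned above is not specified and is left out), and the feedback weight is taken to be 9%, the value that makes the four individual-task weights sum to 100%.

```python
def individual_task_score(creation, evaluation, grading, feedback):
    """Weighted score for an individual Kritik task.

    Weights: creation 66%, written evaluation 15%, grading 10%.
    Assumption: feedback counts 9%, so that the weights sum to 100%.
    """
    return 0.66 * creation + 0.15 * evaluation + 0.10 * grading + 0.09 * feedback


def group_task_score(instructor_creation, peer_creation, written_comment):
    """Score for a group task (no feedback or grading components).

    The creation score is the average of the score given by Dr. Eick
    (or Nour for Task 7) and the score given by student peers; it counts
    85%, and the written comment score counts 15%.
    """
    creation = (instructor_creation + peer_creation) / 2
    return 0.85 * creation + 0.15 * written_comment


if __name__ == "__main__":
    print(round(individual_task_score(90, 80, 70, 100), 2))
    print(round(group_task_score(88, 80, 90), 2))
```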
Comments about COSC 6335 Exams: to reduce the probability of cheating, the COSC 6335 final exam was designed to
be slightly too long. If it turns out that the exam was more than slightly too long, be aware that all
students took the same exam, and that the final exam is still subject to curving to possibly rectify some problems with
In most of Dr. Eick's exams, students who score more than 75% of the available points usually get a grade of A- or better.
Moreover, I believe that graduate students should be "challenged" to demonstrate their skills during final exams. On the other
hand, this semester's midterm exam was not very challenging, in my opinion,
and more importantly it was much too short, encouraging cheating.
Dr. Eick also concluded, after using them a lot for 2 semesters, that multiple choice exams are not appropriate for
assessing particular skills, and that multiple choice exams often
test a student's sensitivity to differences in natural language semantics rather than assessing whether students actually
understand and can apply what was taught in a course.
Course Elements and Their Tentative Weights for 2020
Problem Set Tasks: 48%
Spontaneous Online Credit (small exploratory tasks will be given to the small groups during the lecture) and Attendance: 4%
Midterm Exam: 19%
Final Exam: 29%
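As an illustration of how the tentative weights above combine, the following Python sketch computes an overall number grade from the four course elements. The dictionary keys, the function name, and the example scores are hypothetical, and the sketch assumes each element has already been converted to a (curved) number grade on a 0-100 scale.

```python
# Tentative 2020 weights from the list above; keys are illustrative names.
WEIGHTS = {
    "problem_sets": 0.48,
    "online_credit_and_attendance": 0.04,
    "midterm": 0.19,
    "final": 0.29,
}


def overall_number_grade(scores):
    """Weighted sum of per-element number grades (each on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError("missing scores for: " + ", ".join(sorted(missing)))
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    example = {
        "problem_sets": 88,
        "online_credit_and_attendance": 100,
        "midterm": 80,
        "final": 85,
    }
    print(round(overall_number_grade(example), 2))
```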
Problem Sets contain paper and pencil tasks which review your understanding of basic data mining
concepts and algorithms, tasks which use data mining tools, and small and medium sized data analysis/data mining projects,
and tasks in which you evaluate data mining results of other students and data mining publications.
Some tasks will be group tasks. There will be three Problem Sets in Fall 2020:
Problem Set1: Exploratory Data Analysis, Classification,
and Evaluating Data Analysis Results
(Cleaned Pima Indian Diabetes Dataset)
Problem Set2: Outlier Detection and Collocation Mining (Task6
Problem Set3: Data Mining Paper Reviewing and Clustering (Some
Discussion of Task7; some potentially useful material for Task8:
Loops and Functions in R, Randomized
Two of the tasks in the Problem Sets will be group tasks:
There will be a collocation mining group project in ProblemSet2,
and you will conduct a group review of a data mining paper in ProblemSet3.
Peer Assessment is a new element of COSC 6335: you will get some practical experience in evaluating
data mining results of your fellow students as well as data mining publications. Kritik
will be used for the peer assessment tasks of the course. Each student taking
this course will need to pay a $15+tax usage fee for the
Dr. Eick's COSC 6335 2020 Lecture Notes
I Introduction to Data Mining (Part1, Part2,
Part3--- covers chapter 1 and Section 2.1)
II Exploratory Data Analysis (covers
chapter 3 of the first edition of the Tan book)
III R (Arko's Short Intro Into R (used in Lab),
Plot Code, Decision Trees in R,
Some useful code for Task1,
Computing Statistical Summaries in the Presence of Missing Values (NA),
and Loops in R)
IV Peer Reviewing and Kritik (
Introduction to Kritik Video (we will
watch this video during the lecture!))
V Classification (Introduction to Classification: Basic Concepts and Decision
(3blue1brown: What is a
Neural Network? (will show the first 12:30 of this video)),
Neural Networks Part2, kNN-Classifiers and Support Vector Machines)
VI Association Analysis: Association Rule Mining,
Sequence and Graph Mining, Collocation
VII Outlier Detection
VIII A Brief Introduction to Naive, Parametric
and Non-Parametric Density Estimation
IX Introduction to Similarity Assessment
X Two Popular Deep Learning Approaches: Convolutional Neural Networks and
Autoencoders (by Rishabh Sharma)
XI More on Clustering: Density-based Clustering and Hierarchical Clustering,
R-scripts demoing K-means/medoids and
DBSCAN, Randomized Hill Climbing and Cluster Validity.
XII A Brief Introduction to Spatial Data Mining
XIII Data Preprocessing for Data Mining
Other Material: COSC 6335 Grading
Old Webpages of COSC 6335: 2013 and
Interactive Course Elements and Peer Review using Kritik
The course will try out some "new" approaches to reduce student isolation during online teaching. Each
student taking COSC 6335
belongs to two groups (a small one with 2 students and a larger group of 4-5 students) and some of
the course problem set tasks
will involve activities of the larger groups. Problem Set2 and Set3 will contain a single
task for the larger groups. Small tasks will be assigned to small groups for
online credit as we move along during the semester; groups
present their findings/solutions during the lecture using a single (rarely 2) PowerPoint slide. Also mail your
solution slide to Dr. Eick with your group name and
the names of the group members in the header of the slide, so
that he can add those slides to the COSC 6335 teaching material!
Moreover, peer review will be a new component of this course: you will evaluate work of other students
in the course and work of other peers in
the field of data mining; Kritik will be used for producing
and evaluating peer evaluations. There will be a $15+tax student fee to get Kritik access; however, if you
are a really poor graduate student, feel free to contact Dr. Eick about subsidizing this fee!
Kritik Tasks: There is a 24-hour grace period for each Kritik task. You are allowed to use this grace period for
up to 5 submissions. "Draft" rubrics for the peer-reviewed tasks in the problem sets can be found in the ProblemSet
channel in DM2020, and ultimately you will find the "final" rubric in Kritik. As these rubrics tell
you how your submissions will be graded by your peers, Nour and Dr. Eick, it might be worthwhile to look
at those rubrics closely.
As we are trying out
some of those approaches for the first time, I would like to ask you for a little patience, as there might be some
startup problems; not everything might work well initially, and some of the new
approaches might not work as well as we expect them to.
Online Teaching Information
In general, the course will be taught 100% online, with lectures held on
MS Teams TU+TH 2:30-4p! The 2020 course team in MS Teams is called
DM2020. The
pass code to join the team is: 707alkj. I also suggest joining the
team as soon as possible, as by doing so you will receive all relevant course information!
For taking the course you will
need to use your UH-cougarnet account: if you do not have a working cougarnet account,
fix this problem as soon as possible. As we move along
with the course various channels will be created for course
discussions, content, problem sets and polls in DM2020! Dr. Eick will schedule his Th. office hours
as a separate meeting in DM2020 (Th. 11:30a-12:30p) that you can join the same way you join the course
lecture. If you have some private matters to discuss, a private channel will be used to ensure your privacy.
There is no separate meeting for his office hour
Tu. 4-5; join the Tuesday lecture Teams event if you want to meet him.
COVID-19 Related Matters Interfering with Taking COSC 6335
Moreover, if you face serious COVID-19-related problems taking COSC 6335, for example with
obtaining real-time access to course meetings or with not being able to be physically in Houston,
please send Dr. Eick an e-mail!
Other Ideas for COSC 6335
Another idea is to reach out to industry to sponsor COSC 6335 course activities; e.g.
we could have the "Collocation Mining Group Project (sponsored by Company X)". If you have any good ideas and/or
expertise on obtaining such sponsorships, feel free to contact Nour or Dr. Eick!
2020FA-27014-COSC6335-Data Mining: Nour's Dec. 10, 2p Final Exam Instructions
The COSC 6335 Final Exam consists of a single part, containing 4 single answer multiple-choice questions (1
of the presented n alternatives is correct) and 13 free text questions.
Questions will be presented in a random sequential order and you will not
have the opportunity to change answers to previous questions. For most exam problems multiple versions
exist, and questions are assigned to students randomly.
You will need to take the exam between 2:00 pm and 3:45 pm on Thursday, Dec 10, 2020. You will have 105 minutes to
complete the exam. The exam will be available at 2:00 pm on the Blackboard Assignments section
with the name "COSC 6335: Data Mining 2020 Final Exam".
Be mindful of the weight of each question. The available scores for the 17 questions will be:
8-5-5-7-8-3-3-3-5-8-4-5-3-3-8-9-4 for a total of 91 points. You may not see the questions in this order, as the
question order will also be randomized.
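To make the question weights concrete, the short Python sketch below checks the point total and derives a rough time budget per question. The proportional time split is only an illustrative assumption (it is not part of the exam instructions), and the names QUESTION_POINTS, EXAM_MINUTES and minutes_for are made up for this example.

```python
# Points for the 17 questions, as listed above (actual order is randomized).
QUESTION_POINTS = [8, 5, 5, 7, 8, 3, 3, 3, 5, 8, 4, 5, 3, 3, 8, 9, 4]
EXAM_MINUTES = 105

TOTAL_POINTS = sum(QUESTION_POINTS)


def minutes_for(points):
    """Time budget for a question if time is split in proportion to points.

    Assumption: a proportional split; this is a rough guide only.
    """
    return points * EXAM_MINUTES / TOTAL_POINTS


def percent_of_points(raw_points):
    """Raw exam points expressed as a percentage of the available points."""
    return 100 * raw_points / TOTAL_POINTS


if __name__ == "__main__":
    print(len(QUESTION_POINTS), "questions,", TOTAL_POINTS, "points")
    for pts in sorted(set(QUESTION_POINTS)):
        print(pts, "points: about", round(minutes_for(pts), 1), "minutes")
```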
CASA Monitor will NOT be used for this exam. You won't be asked to show your webcam or share your screen.
- Important Note: UH academic honesty rules apply to the taking of the 6335 Final Exam.
If you have any questions during the exam please feel free to contact me at:
Nour: +1 832-866-7732
or by chat on DM2020 Teams.
Best of luck to everyone!
Online Credit Tasks Fall 2020
Task A: Demo and Evaluation of Tools which Create Histograms
Task B: Demo and Evaluation of Tools which Create Histograms
Task C: Scatter Plot Interpretation
Task D: Comparing two Age Distribution Histograms
Task E: Comparing two Box Plots
Task F: Example Information and GINI Gain Computations
Task G: Parametric Density Estimation for an Example Dataset
Task H: Using Maximum Likelihood to Get a Parametric Model for a given Dataset
Task I: Application of Non-parametric Density Estimation to an Example Dataset
Task J: Demo of the Expectation Maximization (EM) Algorithm
Task K: Design of a Distance Function for a Supermarket Customer Dataset
Task L: Running PAM/K-Medoids for an Example Dataset
Task N: Leading a Discussion about the DENCLUE Density-based Clustering Algorithm
Task O: Cluster Evaluation using Silhouette