The Home Page of the
Graduate Database Course COSC 6340
Spring 2005

Basic Course Information COSC 6340

Instructor: Dr. Christoph F. Eick
Office: 589 PGH
Office hours: TU 2:30-4p TH 1-2p


Class meets: TU/TH 11:30am-1:00pm
Class room: 232 PGH
cancelled classes: Th., Feb. 3, 2005 and Th., March 24; there will also be no class on March 15+17 due to Spring Break.
makeup class: Fr., April 8, 1-3:45p

Course Materials

Required Text:
Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems,
McGraw Hill, Third Edition, 2002.
Call number:
Link to Textbook Homepage
Recommended:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques
Morgan Kaufmann Publishers, 2001, ISBN 1-55860-489-8
Link to Data Mining Book Home Page
Other books with relevant material:
Ramez Elmasri and Shamkant Navathe, Fundamentals of Database Systems, Third Edition
Addison Wesley ISBN: 0-8053-1755-4
Link to Navathe's Book/Course Page

News COSC 6340 Spring 2005

  • I enjoyed teaching the course; I was impressed with most of the Lab2 and Lab3 projects. I would like to wish everybody a good Summer 2005!
  • Here are the letter grades for COSC 6340 for Spring 2005. In Spring 2005 the grade distribution was as follows: A:3, A-:1, B+:5, B:0, B-:1, C+:1.
  • A detailed grade report will be posted on this website within the next 10 days (by May 23, 2005 at the latest). If you would like to pick up your graded homeworks, you can do so in June 2005 (I have office hours on Tuesdays in June) or between August 20 and September 8, 2005. Homeworks that have not been picked up by September 9, 2005 will be discarded.
  • The final exam will not be returned to students; however, you can see your final exam: Tu., June 7, 2:30-4p, Fr., August 26, 11a-noon, and on Tu., Aug. 30, 3:30-4:30p.
  • Students who have already given their Lab2/3 presentation: please e-mail me your transparencies so that I can put them up on the web. Thanks!
  • The updated lecture schedule can be found in: Course Information and Introduction to Data Management (last updated on March 2, 2005)
  • The course will be taught much as it was in Spring 2004. However, this year's class will be much more project-oriented and will cover the database aspects of data mining in more depth; on the other hand, there will be less coverage of object-oriented databases, physical database design, and the semantic web.
  • The webpage will be updated as the course evolves. If you take the course, please check it at least once a week!

    Prerequisites

    The class prerequisites are COSC 3480 or equivalent (undergraduate database class) and MATH 3336. The first 4 weeks of the COSC 6340 lectures will review some undergraduate material, but the review will be quite fast-paced and far from complete. If you have neither taken an undergraduate database class nor acquired the necessary knowledge in a different setting, you will have to work quite hard in the first 4 weeks of the semester. That said, most students with a weak database background have had no problems doing well in the course.

    Material Covered in COSC 6340

    Students are assumed to be familiar with most of the material in chapters 1-5, 7-11 and 18 of the required textbook, which is typically covered in undergraduate database courses such as COSC 3480 in our undergraduate program. If you took a database class a long time ago, studying chapters 1-5 prior to February 1 is highly recommended.

    The course is subdivided into five parts (chapter numbers refer to chapters in the second edition of the textbook).

    Moreover, a significant portion of lecture time will be allocated to the discussion of homeworks, old exams, and assignment problems.

    Important Dates for Spring 2005

    Tu., March 1: Undergraduate Material Review Exam
    Tu., April 14: midterm exam
    Tu., May 10, 11a(!): Final Exam

    Spring 2005 Labs and Homeworks

    Lab1 (2004 Lab-TA homepage --- contains useful links concerning Lab1+2 and on how to use Oracle; SQL-Server 2000 Pets Script (donated by Ross Wright))
    Lab2 (last updated on March 15, 2005, 4p)
    Lab3

    Homework1
    Homework2
    Homework3

    Weights of the Different Parts of the Course in 2005

    First Exam=15%, Second Exam=22%; Final Exam=30%, H1+H2+H3=6%, Presentation=3%, Lab1=2%, Lab2=11%, Lab3=11%.
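
As a rough illustration of how these weights combine, the following Python sketch computes a weighted course score. The function name `course_score` and the dictionary keys are hypothetical, and the sketch assumes that every component is scored on a 0-100 scale and that the three homeworks are reported as one combined score (the page does not say how the 6% is split among H1, H2 and H3).

```python
# Weights from the 2005 breakdown, expressed as percentages.
WEIGHTS = {
    "first_exam": 15,
    "second_exam": 22,
    "final_exam": 30,
    "homeworks": 6,       # H1+H2+H3 combined (assumption: one combined score)
    "presentation": 3,
    "lab1": 2,
    "lab2": 11,
    "lab3": 11,
}


def course_score(scores):
    """Return the weighted course score (0-100).

    `scores` maps each component name in WEIGHTS to a score on an
    assumed 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100 for the weights listed above
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight
```

With these weights the three exams together account for 67% of the score and the homeworks, presentation and labs for the remaining 33%.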

    Spring 2004 Projects and Other Activities for COSC 6340

    Useful Information on how to install and use ORACLE
    Lab1 (using SQL)
    Lab2 (using PL/SQL)
    Research Review Group Project
    Assignment1
    Assignment2
    Ungraded Homeworks (see homework section)

    Group Project 2004

    2004 Groups and their Topics.

    2004 Group Project Websites

    Group One: Multi-Relational Data Mining
    Group Two: Scalable Clustering Algorithms
    Team 3: Sequence Data Mining
    Team 4: Grid Data Management
    Team 5: the Semantic Web Group (I also strongly recommend that you read The W3C Semantic Web Activity Statement that is part of the review material for the COSC 6340 final exam).
    Team 6: Constructing Data-centric Web Applications

    Fall 2003 Lab Projects for COSC 3480

    Project1
    Project2
    Project3
    Project4 (still subject to change)

    If you have never used a relational database management system before, it might not be a bad idea to do some of those projects.

    Oracle 9i

    If you are interested in installing Oracle on your own computer, go to the Oracle website, register as a user, click the download button, and follow our Installation Instructions. Moreover, Oracle 9 is also installed on machines in the 3rd and 5th floor labs in the PGH building. Each student receives an account number that allows him/her to use the departmental version of Oracle.

    Lecture Schedule Spring 2004

    see Course Information and Introduction to Data Management

    Class Transparencies Spring 2005

  • 0 Course Information and Introduction to Data Management (updated on January 11, 2005)
  • I Basic Concepts of Databases
  • II Implementation of Relational Operators, Query Optimization, and Physical Database Design
  • III Relational Database Design
  • IV Introduction to KDD and Making Sense of Data
  • V Internet Databases and XML
  • VI Object-Oriented Databases
  • VII Summary COSC 6340

    Relevant pages from the Han/Kamber book in 2005 (updated March 3, 2005)

  • chapter 1 pages 1-9 (sections 1.1 and 1.2 only)
  • chapter 2 pages 39-68
  • chapter 6 pages 225-239 (excluding 6.2.4) and 269-271 (6.7 summary only)
  • chapter 7 pages 279-292
  • chapter 8 pages 335-354 and 370-376 (grid-based methods)

    Important Database Conferences

    International Conference on Data Engineering (ICDE)
    International Conference on Very Large Databases (VLDB)
    International ACM SIGMOD Conference on Management of Data

    Important KDD Conferences

    KDD
    PKDD
    ICDM

    2003 Projects and Graded Homeworks

    2003 Specification Graded Homework1 (due We., March 19 (electronically) and Tu., April 1 (hard copy only))
    Specification Graded Homework2 2003 (due on Saturday, April 26, 2003, 11p (electronic submission)) new!!
    2003 Project1 (due Sa., March 15, 2003, 9p; electronic submission)
    2003 Project2 (student presentations are scheduled for April 17 and April 22, 2003).

    Spring 2003 Project2 Groups

    Decision Tree Learning for Large Data Sets --- Group1
    Decision Tree Learning on Very Large Data Sets --- Group2
    Clustering Large Data Sets Group
    Genomic Database Group
    Querying and Mining Data Streams Group
    Iceberg Queries Group

    Grading

    There will be an undergraduate material review exam, a midterm exam, and a final exam. Each student has to have a weighted average of 74.0 or higher in the exams of the course in order to receive a grade of "B-" or better for the course. Students will be responsible for material covered in the lectures and assigned in the readings. All homeworks and project reports are due at the date specified. No late submissions will be accepted after the due date. This policy will be strictly enforced. Course grades will be computed using a weight of approx. 72% for the exams, and a weight of 28% for the homeworks/assignments/group projects.

    Translation number to letter grades:
    A:100-90 A-:90-86 B+:86-82 B:82-77 B-:77-74 C+:74-70
    C: 70-66 C-:66-62 D+:62-58 D:58-54 D-:54-50 F: 50-0
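
The translation table can be read as a simple lookup, sketched below in Python. Two points are assumptions rather than stated policy: each lower bound is treated as inclusive (the table does not say which grade a boundary score such as 90 or 86 receives), and a student whose exam average is below 74.0 is capped at "C+", since the page only says that a "B-" or better requires an exam average of at least 74.0. The function names are hypothetical.

```python
# Lower bounds from the translation table, highest grade first.
# Assumption: each bound is inclusive, e.g. a score of exactly 90 is an "A".
CUTOFFS = [
    (90, "A"), (86, "A-"), (82, "B+"), (77, "B"), (74, "B-"), (70, "C+"),
    (66, "C"), (62, "C-"), (58, "D+"), (54, "D"), (50, "D-"),
]


def letter_grade(score):
    """Translate a number grade (0-100) into a letter grade."""
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    return "F"


def course_letter(score, exam_average):
    """Letter grade for the course, applying the exam-average requirement.

    Assumption: if the weighted exam average is below 74.0, the grade is
    capped at "C+", the best grade below "B-".
    """
    if exam_average < 74.0 and score >= 74:
        return "C+"
    return letter_grade(score)
```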

    Only machine-written solutions to homeworks and assignments are accepted (the only exceptions are figures and complex formulas). Be aware that our only source of information is what you have turned in. If we cannot understand your solution, you will receive a low score. Moreover, students should not throw away returned assignments or tests.

    Students may discuss course material and homeworks, but must take special care to discern the difference between collaborating in order to increase understanding of course materials and collaborating on the homework / course project itself. We encourage students to help each other understand course material, to clarify the meaning of homework problems, or to discuss problem-solving strategies, but it is not permissible for one student to help or be helped by another student in working through homework problems or in the course project. If, in discussing course materials and problems, students believe that their like-mindedness from such discussions could be construed as collaboration on their assignments, students must cite each other, briefly explaining the extent of their collaboration. Any assistance that is not given proper citation may be considered a violation of the Honor Code, and might result in a grade of F in the course and in further prosecution.

    CS lab: you are responsible for protecting your own files. If you leave files on computers and other students turn in these files as their solution for a course project, you are violating the university's academic honesty code. One way to prevent other students from copying your solutions is to edit and save all your SQL files on your floppy disk (and run the system using the data on the floppy). Alternatively, you could create local folders with your files on the hard drive and remove all files from the folder before you log off (you could even write a script that does the cleanup).
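
A cleanup script of the kind mentioned above could look like the following minimal Python sketch. It is an illustration only, not a departmental tool: the function name `clean_work_folder` is hypothetical, and the sketch assumes that everything inside the folder may be deleted permanently, so copy your work to your own disk first.

```python
import shutil
from pathlib import Path


def clean_work_folder(folder):
    """Delete everything inside `folder` but keep the folder itself.

    Returns the number of top-level entries that were removed.
    Deletion is permanent; nothing is moved to a recycle bin.
    """
    folder = Path(folder)
    removed = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subfolder and its contents
        else:
            entry.unlink()         # remove a regular file or a symlink
        removed += 1
    return removed
```

Before logging off you would call, for example, `clean_work_folder("C:/temp/my_lab_files")`, where the path is a placeholder for whatever local folder you created.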

    Communication with the teaching staff

    We strongly encourage students to come to my office hours or to talk to me directly after class. If a homework clarification is posted after a student has completed an assignment, the student should contact us as soon as possible to check if the assumptions s/he made are going to be accepted.

    Please do not e-mail us with grading questions. If you want us/me to explain why I took points off, you can talk to me/us during our office hours.

    Course Exams

    UG Material Review Exam

    Review sheet for the 2004 UG Review Exam (to be given on Th., Feb. 26, 2004).
    Solutions to a Sample Exam0 (in Word)
    Number Grades Review Exam
    Review List for 2002 Exam0

    Midterm Exam

    Exam1 (March 30, 2000) (in Word)
    2000 Midterm Solutions (in Word)
    Solution Sketches April 2002 Midterm Exam

    Final Exam

    The 2002 final exam is scheduled for Tuesday, May 6, 11a-2p in ... --- be aware of the room change!!
    Review list for the 2002 Final Exam (new!!!)
    2000 Final Exam (was given on May 9, 2000)
    Solution Sketches Spring 2002 Final Exam (was given on May 7, 2002)

    Database Qualifying Exam

    The first part of the qualifying exam will be the COSC 6340 final exam. The second part of the qualifying exam will be a separate 90 minute exam that centers on the contents of 4 scientific papers and on material that was covered in the midterm exam (but not in the final exam). The second part is scheduled for Friday, May 9, 2003 10:30a-noon in room 315 PGH. 5 Scientific Papers for the 2003 QE (you can check out the papers for copying during my office hours, but the papers can also easily be found by doing a web search). Review List for Part2 of the Qualifying Exam

    Course Homeworks and Projects Spring 2000

    Useful Links

    Leftovers

  • A draft of the first graded homework (solution for SQL-problem 4a3 added on Feb. 28, 2004; also gives solutions for the relational algebra problems 4a-c) is available now; problems 1-3 are due Tu., Feb. 22 in class. Some typos and inconsistencies in the homework specification have been updated on Feb. 17, 1p. (Solution Sketches Problems 2 + 3 Homework1) 2005 class.
  • If you have no background in data management, I strongly recommend that you read and study section 4.2 and chapter 5 of your textbook by Feb. 21, 2005.
  • Lab2 is due Mo., April 4, 11a; reports that are received after the deadline but before Mo., April 4, 11p receive a 4% penalty; reports that are received before We., April 6, 11a receive a 10% penalty; reports that are received after April 6, 11a will not be graded.
  • The grading of the Second Exam (midterm exam) has been completed. The number grade average of the midterm exam was 84.5. The midterm exam will count approximately 22% towards the final course grade. The midterm exam will be returned to students and briefly discussed in class on Th., April 28.
  • This semester we will have 3 labs: Lab1 (2004 Lab-TA homepage --- contains useful links concerning Lab1+2 and on how to use Oracle) is a practical warmup exercise in using and understanding relational databases. Lab2 (last updated on March 29, 2005, 4p) will center on the design and implementation of a database to store data mining datasets and data mining results. Lab3 (new!!) will center on implementing and evaluating a data mining technique (different students will use different data mining techniques) or on providing additional capabilities on top of the database you design. This is the schedule for the three labs: Lab1 (March 3-12, 2005); Lab2 (March 12-April 2, 2005); Lab3 (March 21-April 23, 2005).

last updated: May 13, 2005, 1p