Overview
This is a graduate-level course in information retrieval (IR) and text mining. The course is intended to develop foundations in IR and Web data/text mining with a focus on solving real-world problems. Throughout the course, emphasis will be placed on tying IR techniques to specific real-world applications through hands-on experience.
Administrative details
Office hours
Instructor office hours: MW 1:00 - 2:00 PM, PGH XXX
TA: XXXXXX (xxx @ uh . edu)
TA office hours: MW 1:00 - 2:00 PM, PGH XXX
Prerequisites
The course requires a basic background in mathematics and sufficient programming skills. It will help if you have taken and done well in one or more equivalent courses such as Algorithms, Artificial Intelligence, or Numerical Methods, or if you have some background in linear algebra. The course does, however, review the required mathematical foundations. Sufficient experience building projects in a high-level programming language (e.g., Java) will prove beneficial.
Note: This course has some overlap with Data Mining (COSC 6335) and Natural Language Processing (COSC 6342), especially in some topics in supervised learning and clustering. However, the focus here is more applied (as opposed to theoretical), and the goal is to understand their applications in IR. These topics are covered to make this course standalone. Having taken either of those courses is not essential, but it will definitely be useful.
Reading Materials
Textbooks:
IIR: Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press. 2008.
Companion website with online version for reading.
WDM: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Bing Liu; Springer, 1st Edition.
Reference materials:
SI: Statistical Inference, Casella and Berger. Cengage Learning; 2nd edition.
Online resources (OR) per topic as appearing in the schedule below.
Lecture notes
Grading
Component | Contribution |
HW0 | x% |
HW1 | x% |
HW2 | x% |
HW3 | x% |
Project 1 | x% |
Project 2 | x% |
Final (Must pass to pass the course!) | x% |
Homework/Project due dates
Assignment | Due date |
HW0 | xx/xx |
HW1 | xx/xx |
HW2 | xx/xx |
Project 1 | xx/xx |
Project 2 | xx/xx |
Schedule of topics
Class | Topic(s) | Resources: Readings, Slides, Lecture notes, Papers, etc. |
M, 8/25 | Introduction
Course administrivia, semester plan, course goals; Information Retrieval in a nutshell |
Readings:
OR01: IR tutorials by (1) [Zhou et al. 2006] (2) K. Radinsky |
x, x/xx | Retrieval Models in IR
Boolean retrieval; Vector space model |
Readings:
OR02: Chapter 1 IIR, Chapter 6 IIR. OR02: Lecture notes/slides by H. Schütze on Boolean retrieval, Vector space model. Programming resources, tools, libraries for projects and homeworks: Lucene, Nutch |
x, x/xx | Relevance feedback and Probabilistic IR
Relevance feedback and query expansion; Probabilistic retrieval |
Readings: Chapter 9, 11 IIR |
x, x/xx | Evaluation and Language Models
Evaluation metrics; Language models for IR |
Readings:
OR13: Chapter 8 IIR, Chapter 12 IIR. OR14: Lecture notes/slides by H. Schütze on Evaluation metrics for IR, Language models for IR. Programming resources, tools, libraries for projects and homeworks: RankLib: implementation of ranking algorithms |
X, X/XX | Machine Learning Review
Basic concepts: Data and features; Decision trees; Naive Bayes; Support Vector Machines; Classifier evaluation |
Readings: Chapter 3 WDM (3.1, 3.2, 3.3, 3.5, 3.6, 3.8)
Programming resources, tools, libraries for projects and homeworks: SVM: SVMLight, Mallet, LingPipe |
x, x/xx | Web Search basics
Information and Text Retrieval basics; Web search: Text retrieval models; Meta Search: Combining multiple rankings |
Reading(s): Chapter 6 WDM (6.1, 6.2, 6.3, 6.8, 6.9) |
x, x/xx | Link Analysis I
Social Network Analysis; PageRank and Random Walk models; TrustRank and Topical PageRank; HITS algorithm |
Reading(s): Chapter 6 WDM (6.4), Chapter 7 WDM (7.1, 7.3, 7.4), Chapter 19, 21 IIR.
Papers on (1) TrustRank and (2) Topic-sensitive PageRank. Programming resources, tools, libraries for projects and homeworks: JUNG, RankLib |
x, x/xx | Link Analysis II
Ranking evaluation metrics (revisited): MAP, NDCG, P@n; Markov Random Fields for IR; LETOR: Combining multiple rankers |
Reading(s): Chapter 6 WDM (6.4), Chapter 7 WDM (7.1, 7.3, 7.4), Chapter 19, 21 IIR.
MRF and BP tutorial. MRFs for IR/Fraud detection: [Metzler and Croft, 2005], [Metzler and Croft, 2007], [Pandit et al., 2007], [Fei et al., 2013]. LETOR tutorials by (1) T. Liu (Chapter 1 reading), (2) H. Li (comprehensive reference); application papers [Mukherjee et al., 2013], [Mukherjee et al., 2012]. Programming resources, tools, libraries for projects and homeworks: JUNG, RankLib, Belief Propagation on MRFs |
x, x/xx | Latent Variable Models for IR I
Stat review I: Distributions, Hierarchical models; Stat review II: Bayes nets, PGMs; Latent variable models: Statistical topic models |
Readings: TBA |
x, x/xx | Latent Variable Models for IR II
Modeling query intent and interesting things; Topic modeling for IR; Ad-hoc IR using statistical topic/language models |
Readings: TBA. Utilizing Topic Models in IR; Topic models for Ad-hoc IR |
x, x/xx | Adversarial Information Retrieval
Web spam; Collaborative filtering; Opinion spam |
Readings: TBA AIRWeb |