**Overview**

This is an advanced course on Natural Language Processing and applied NLP in Web mining. The course is intended to develop advanced skills in NLP and in Web data and text mining via NLP applications. The broader goal is twofold: (1) gain a thorough understanding of statistical NLP techniques (e.g., latent variable models, graphical models in NLP) and learn to build tools for solving practical text mining problems; (2) explore recent papers in the field through presentations, talks, critiques, and defenses. Throughout the course, a large emphasis is placed on tying NLP techniques to specific real-world applications through hands-on experience. The course covers fundamental topics in statistical machine learning and touches upon various topics in NLP for the Web.

**Administrative details**

**Office hours**

Instructor office hours: M 2-4 pm

**Prerequisites**

The course requires a decent background in mathematics and sufficient programming skills. Having taken and done well in one or more equivalent courses/topics such as Algorithms, Data Mining, Machine Learning, or Natural Language Processing, or having a decent background in probability/statistics, will be helpful. The course, however, reviews and covers the required mathematical and statistical foundations. Sufficient experience building projects in a high-level programming language (e.g., Java) is required.

**Required reference materials:**

Online resources (OR) per topic, as they appear in the schedule below.
**Course Materials including books and lecture notes**

**Grading**

| Component | Contribution |
| --- | --- |
| Project | 25% |
| Paper Presentations | 55% |
| Critique | 15% |
| Class Participation | 5% |

**Paper Reading Assignments/Project due dates**

| Assignment | Due date |
| --- | --- |
| Project | 4/18 |
| Paper: Domain Adaptation with Structural Correspondence Learning [Blitzer et al., 2006]. Presenter/Defender: Fan. Critique: Yifan | Next regular meeting |
| Paper: Distance Metric Learning for Large Margin Nearest Neighbor Classification [Weinberger et al., 2006]. Presenter/Defender: Marjan. Critique: Huijie | Next regular meeting |
| Paper: One-Class SVMs for Document Classification [Manevitz et al., 2001]. Presenter/Defender: Santosh. Critique: Marjan | Next regular meeting |
| Paper: Hinge Loss Markov Random Fields [Bach et al., 2013]. Presenter/Defender: Dainis. Critique: Huijie | Next regular meeting |
| Paper: AFRAID: Fraud Detection via Active Inference in Time-evolving Social Networks [Vlasselaer et al., 2015]. Presenter/Defender: Huijie. Critique: Santosh | Next regular meeting |
| Paper: Efficient Estimation of Word Representations in Vector Space [Mikolov et al., 2013]. Ref. for background: [1], [2]. Presenter/Defender: Yifan. Critique: Dainis | Next regular meeting |
| Paper: Learning Latent Representations for Domain Adaptation using Supervised Word Clustering [Xiao et al., 2013]. Presenter/Defender: Fan. Critique: Santosh | Next regular meeting |
| Paper: Co-Training for Domain Adaptation [Chen et al., 2011]. Presenter/Defender: Marjan. Critique: Fan | Next regular meeting |
| Paper: Supervised Random Walks [Backstrom and Leskovec, 2011]. Presenter/Defender: Santosh. Critique: Huijie | Next regular meeting |
| Paper: Distributed Representations of Words and Phrases and their Compositionality [Mikolov et al., 2013]. Presenter/Defender: Dainis. Critique: Yifan | Next regular meeting |
| Paper: Understanding and Combating Link Farming in the Twitter Social Network [Ghosh et al., 2012]. Presenter/Defender: Huijie. Critique: Santosh | Next regular meeting |
| Paper: DeepWalk: Online Learning of Social Representations [Perozzi et al., 2014]. Presenter/Defender: Yifan. Critique: Santosh | Next regular meeting |
| Paper: Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach [Glorot et al., 2011]. Presenter/Defender: Fan. Critique: Marjan | Next regular meeting |
| Paper: CRF Autoencoders for Unsupervised Structured Prediction [Ammar et al., 2014]. Presenter/Defender: Yifan. Critique: Fan | Next regular meeting |
| Paper: Authorship Verification as a One-Class Classification Problem [Koppel and Schler, 2004]. Presenter/Defender: Marjan. Critique: Dainis | Next regular meeting |
| Paper: Unsupervised Cross-Domain Word Representation Learning [Bollegala et al., 2015]. Presenter/Defender: Dainis. Critique: Fan | Next regular meeting |
| Paper: Deep Semantic Frame-based Deceptive Opinion Spam Analysis [Kim et al., 2015]. Presenter/Defender: Huijie. Critique: Marjan | Next regular meeting |
| Paper: Collective Opinion Spam Detection [Rayana and Akoglu, 2015]. Presenter/Defender: Santosh. Critique: Dainis | Next regular meeting |
| Paper: Learning with Marginalized Corrupted Features [Maaten et al., 2013]. Presenter/Defender: Fan. Critique: Marjan | Next regular meeting |
| Paper: Bidirectional LSTM-CRF Models for Sequence Tagging [Huang et al., 2015]. Ref. for background: [1]. Presenter/Defender: Dainis. Critique: Yifan | Next regular meeting |
| Paper: Joint Modeling of Opinion Expression Extraction and Attribute Classification [Yang and Cardie, 2014]. Presenter/Defender: Yifan. Critique: Dainis | Next regular meeting |
| Paper: Frustratingly Easy Domain Adaptation [Daume III, 2007]. Presenter/Defender: Marjan. Critique: Dainis | Next regular meeting |
| Paper: From Word Embeddings to Document Distances [Kusner et al., 2015]. Presenter/Defender: Santosh. Critique: Fan | Next regular meeting |
| Paper: BIRDNEST: Bayesian INference for Review Rating Fraud [Hooi et al., 2015]. Presenter/Defender: Huijie. Critique: Santosh | Next regular meeting |

**Schedule of topics**

Each topic below lists its resources: readings, slides, lecture notes, papers, pointers to useful materials, etc.

**Brief Introduction to NLP**

Topics: course administrivia, semester plan, and course goals; NLP resources; language as a probabilistic phenomenon; word collocations; NLP and text retrieval basics; text categorization; introduction to topics to be covered in the course.

Required readings:
- Lecture notes/slides
- Chapter 1, FSNLP (Sections 1.2.3, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.4.4)
- Boolean retrieval slides by H. Schutze
- Boolean retrieval [Manning et al., 2008] (up to Section 1.4)
- F. Keller's tutorial on Naive Bayes + notes of A. Moore for the graph view (Slide 8)
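As a companion to the Naive Bayes readings above, here is a minimal sketch of multinomial Naive Bayes text categorization with add-one smoothing. The toy documents and labels are made up for illustration and are not part of the course materials:

```python
import math
from collections import Counter, defaultdict

# Toy training set for binary text categorization (made-up example data).
train = [
    ("buy cheap pills now", "spam"),
    ("cheap pills cheap offer", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project meeting notes", "ham"),
]

# Count class frequencies and per-class word frequencies.
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for doc, label in train:
    for w in doc.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(doc):
    scores = {}
    for c in class_counts:
        # Log prior P(c).
        score = math.log(class_counts[c] / len(train))
        total = sum(word_counts[c].values())
        for w in doc.split():
            # Add-one (Laplace) smoothed log likelihood P(w | c).
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("cheap pills offer"))  # spam
print(predict("monday meeting"))     # ham
```

The same decision rule underlies the graph view in A. Moore's notes: each class node "generates" the word counts of a document independently.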

**Statistical Foundations I: Basics**

Topics: probability theory; conditional probability and independence.

Required readings:
- Lecture notes/slides
- Chapter 2, FSNLP (Sections 2.1.1-2.1.10); Chapter 1, SI (selected topics covered in class and solved examples)
- OR01: X. Zhu's notes on mathematical background for NLP
- Slides:
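Conditional probability and independence can be checked directly by enumerating a finite sample space. A small sketch using two fair dice (a classroom-style example of my own, not from the readings):

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability under the uniform measure on omega, as an exact fraction.
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] + w[1] == 8   # event A: the sum is 8
B = lambda w: w[0] % 2 == 0      # event B: the first die is even

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_a_given_b = prob(lambda w: A(w) and B(w)) / prob(B)
print(p_a_given_b)   # 1/6
print(prob(A))       # 5/36: since P(A | B) != P(A), A and B are dependent
```

Independence would require P(A | B) = P(A); here conditioning on an even first die raises the chance of a sum of 8.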

**Statistical Foundations II: Random Variables and Distributions**

Topics: random variables; density and mass functions; mean and variance; common families of distributions; multiple random variables: joints and marginals.

Required readings:
- Lecture notes/slides
- Chapter 2, SI (Theorem 2.1.10; Sections 2.2, 2.2.1, 2.2.2, 2.2.3, 2.2.5, 2.3.1, 2.3.2, 2.3.4; and topics covered in class)
- Chapter 3, SI (all sections + worked-out examples up to 3.4); focus on the distributions/problems covered in class and skip other topics
- Chapter 4, SI (Sections 4.1, 4.1.1-4.1.6, 4.1.10-4.1.12, 4.2.1-4.2.5)
- OR02: K. Zhang's notes on common families of distributions with worked-out examples [skip the hypergeometric and negative binomial distributions and focus on the ones covered in class]

Optional recommended reading/solved examples:
- OR03: Notes on joints and marginals, with worked-out examples, by S. Fan
- OR04: Tutorial on joints and marginals by M. Osborne [contains NLP-specific examples]
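The mechanics of marginalizing a joint pmf, and of computing a mean and variance from the resulting marginal, can be sketched in a few lines. The 2x3 joint table below is made up purely for illustration:

```python
# Joint pmf P(X = x, Y = y) as a dict; a made-up table for illustration.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.30}

# Marginals: sum the joint over the other variable.
p_x = {x: sum(p for (xv, _), p in joint.items() if xv == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yv), p in joint.items() if yv == y) for y in (0, 1, 2)}

# Mean and variance of X computed from its marginal.
mean_x = sum(x * p for x, p in p_x.items())                  # 0.6
var_x = sum((x - mean_x) ** 2 * p for x, p in p_x.items())   # 0.24
print(p_x, p_y, mean_x, var_x)
```

Note that both marginals sum to 1, a quick sanity check worth running on any joint table built by hand.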

**Hierarchical Models and Mixture Distributions**

Topics: parameter estimation: MLE vs. MAP; prior, posterior, and conjugate priors; Binomial-Poisson hierarchy; Beta-Binomial hierarchy.

Required readings:
- Lecture notes/slides + Chapter 4, SI (Sections 4.4, 4.4.1, 4.4.2, 4.4.5-4.4.6)
- OR05: P. Robinson's notes on parameter estimation [Slides 1-35]

Optional reference:
- OR06: Notes on conjugate models by P. Lam [Slides 1-49]
- Conjugate priors for common families of distributions
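The MLE-vs.-MAP contrast in the Beta-Binomial hierarchy fits in a few lines of arithmetic. The counts and the Beta(2, 2) prior below are made-up numbers for illustration; the conjugacy fact itself (Beta prior + Binomial likelihood gives a Beta posterior) is standard:

```python
# MLE vs. MAP for a coin's heads probability under a Beta-Binomial model.
# Beta(a, b) is conjugate to the Binomial likelihood, so observing
# h heads and t tails gives the posterior Beta(a + h, b + t).
heads, tails = 7, 3
a, b = 2.0, 2.0   # Beta prior pseudo-counts (made up for this example)

# MLE: maximize the likelihood alone.
mle = heads / (heads + tails)                            # 0.7

# MAP: mode of the Beta(a + heads, b + tails) posterior.
map_est = (a + heads - 1) / (a + b + heads + tails - 2)  # 8/12 ~ 0.667
print(mle, map_est)
```

The prior pseudo-counts pull the MAP estimate toward 0.5; as the data grow, the two estimates converge.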

**Text Clustering: Semantic Clustering and Topic Models**

Topics: latent semantics and the clustering problem; introduction to Bayes nets and PGMs; Latent Dirichlet Allocation; learning and evaluating topic models.

Required readings:
- Lecture notes + stat review: sampling from distributions (previous slides/lecture notes)
- OR13: Tutorial by D. Blei (Slides 1-17)
- OR14: Gibbs sampling tutorial by M. Bahadori (Slides 1, 3-5, 7, 16-20, 22)
- Gibbs sampler derivation for Latent Dirichlet Allocation
- Comprehensive explanation/derivation of LDA by Gregor Heinrich
- LDA Gibbs sampler implementation [Java/Eclipse project]

Programming resources, tools, and libraries for projects and homeworks:
- Mallet, LingPipe
- Java Topic Modeling Toolkit [with implementation of Labeled LDA]
- Matlab Topic Modeling Toolkit [with implementation of the Author-Topic model] [implementations of advanced models]
- G. Heinrich's LDA and statistics base classes for sampling-based algorithms in Java
- Supervised Topic Models

Optional recommended reading for research/projects:
- Understanding Gibbs sampling, with a derivation for the (unsupervised) Naive Bayes model [Resnik and Hardisty, 2010]
- D. Blei's tutorial on Dirichlet priors (Slides 32-39)
- LDA Gibbs sampler derivation (Chapter 2) by Yi Wang
- Author-Topic model [Rosen-Zvi et al., 2004]; derivation and details
- Applications of topic models (NIPS Workshop)
- Topic coherence metric for evaluating topic models [Mimno et al., 2011]
- Generic Gibbs sampling for topic models by G. Heinrich
- Supervised Topic Models
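As a companion to the Gibbs-sampler derivations listed above, here is a compact collapsed Gibbs sampler for LDA. It is a Python sketch rather than one of the Java projects cited; the tiny corpus, the number of topics, and the hyperparameters are made up for illustration:

```python
import random

random.seed(0)

# Tiny made-up corpus; for real projects use the toolkits above (e.g., Mallet).
docs = [["apple", "banana", "apple", "fruit"],
        ["goal", "match", "score", "goal"],
        ["fruit", "banana", "match", "apple"]]
K, alpha, beta = 2, 0.5, 0.1
vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Count tables maintained by the collapsed sampler.
ndk = [[0] * K for _ in docs]        # document-topic counts
nkw = [[0] * V for _ in range(K)]    # topic-word counts
nk = [0] * K                         # tokens per topic
z = []                               # topic assignment per token
for d, doc in enumerate(docs):
    z.append([])
    for w in doc:
        t = random.randrange(K)      # random initialization
        z[d].append(t)
        ndk[d][t] += 1
        nkw[t][vocab.index(w)] += 1
        nk[t] += 1

for _ in range(200):                 # Gibbs sweeps over all tokens
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t, wi = z[d][i], vocab.index(w)
            # Remove the token's current assignment from the counts...
            ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
            # ...then resample its topic from the full conditional
            # p(z = k | rest) propto (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
            weights = [(ndk[d][k] + alpha) * (nkw[k][wi] + beta) / (nk[k] + V * beta)
                       for k in range(K)]
            t = random.choices(range(K), weights=weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1

# Smoothed per-topic word distributions (phi), as in the Heinrich notes.
phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
print(phi)
```

The document-topic distributions (theta) are recovered the same way from `ndk`; the toolkits listed above add the engineering needed for real corpora (sparse counts, burn-in, multiple chains).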

**Sentiment Analysis and Psycholinguistics**

Topics: aspect extraction; deception and opinion spam.

Required readings:
- Lecture notes + slides + selected topics (covered in lectures) from Chapter 11, WDM

Programming resources, tools, and libraries for projects and homeworks:
- Pos/Neg sentiment lexicon, SentiWordNet, deep learning for sentiment analysis

Optional topics/concepts useful for research/projects:
- Papers on opinion spam: [Mukherjee et al., 2013], [Mukherjee et al., 2012]
- Papers on topic modeling: [Blei et al., 2003], [Resnik and Hardisty, 2010]
- Aspect and Sentiment Model: [Jo and Oh, 2011], slides, accompanying data and source code
- Other relevant papers: [Lin and He, 2009], [Zhao et al., 2010], [Mukherjee and Liu, 2012]
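A simple starting point for projects using the sentiment lexicons listed above is lexicon-based polarity scoring with negation flipping. The tiny lexicon below is made up; a real project would load SentiWordNet or the Pos/Neg lexicon instead:

```python
# Minimal lexicon-based polarity scorer; the word lists are made up for
# illustration (substitute a real lexicon such as SentiWordNet).
pos_words = {"good", "great", "excellent", "love"}
neg_words = {"bad", "poor", "terrible", "hate"}
negators = {"not", "never", "no"}

def polarity(text):
    score, tokens = 0, text.lower().split()
    for i, tok in enumerate(tokens):
        s = (tok in pos_words) - (tok in neg_words)   # +1, -1, or 0
        # Flip polarity if the previous token is a negator ("not good").
        if s and i > 0 and tokens[i - 1] in negators:
            s = -s
        score += s
    return score

print(polarity("great phone, love the screen"))           # 2
print(polarity("not good and the battery is terrible"))   # -2
```

This word-counting baseline is exactly what the aspect- and deception-oriented papers above improve on with topic models and supervised learning.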