CS6120: Natural Language Processing
Spring 2016 Syllabus
This schedule is subject to change. Check back as the class progresses.
Why NLP?
Language Models
- n-gram models, naive Bayes classifiers, probability, estimation
- We also played the Shannon game, guessing the next letter from the previous n letters.
- Readings for Jan. 20: Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In EMNLP, 2002.
- Victor Chahuneau, Kevin Gimpel, Bryan R. Routledge, Lily Scherlis, and Noah A. Smith. Word Salad: Relating Food Prices and Descriptions. In EMNLP, 2012.
- Reading for Jan. 27: C. E. Shannon. Prediction and Entropy of Printed English. The Bell System Technical Journal, January 1951.
- Background: Jurafsky & Martin, chapter 4
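As a concrete illustration of the n-gram estimation and Shannon-game ideas above, here is a minimal bigram model sketch in Python; the toy corpus and maximum-likelihood setup are invented for illustration (the course worked with larger data):

```python
from collections import Counter

# Hypothetical toy corpus for illustration only.
corpus = "the cat sat on the mat . the dog sat on the log .".split()

# Bigram and context counts for maximum-likelihood estimation.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p(word, prev):
    """MLE bigram probability P(word | prev)."""
    return bigrams[(prev, word)] / contexts[prev]

def predict_next(prev):
    """Shannon-game-style guess: most probable continuation of `prev`."""
    candidates = [w for (c, w) in bigrams if c == prev]
    return max(candidates, key=lambda w: p(w, prev))

print(p("cat", "the"))      # 0.25: "the" occurs 4 times as a context, "the cat" once
print(predict_next("sat"))  # "on"
```

Note that pure MLE assigns zero probability to unseen bigrams; smoothing (Jurafsky & Martin, chapter 4) addresses this.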
Regular Languages
- history of NLP research, the Chomsky hierarchy, regular
expressions, (weighted) finite-state automata and
transducers
- Readings for Feb. 3: Kevin Knight and Jonathan Graehl. Machine Transliteration. Computational Linguistics, 24(4), 1998.
- Ryan Cotterell, Nanyun Peng, and Jason Eisner. Stochastic Contextual Edit Distance and Probabilistic FSTs. In ACL, 2014.
- Background on NLP with unweighted finite-state machines: Karttunen, Chanod, Grefenstette, and Schiller. Regular expressions for language engineering. Journal of Natural Language Engineering, 1997. We discussed the main points and interesting examples from this paper in class, but you can read it for more derivations and examples.
- More background: Jurafsky & Martin, chapter 2
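To make the automaton idea concrete, here is a minimal sketch of simulating an unweighted deterministic finite-state automaton in Python; the state names and the even-number-of-b's language are invented for illustration:

```python
# DFA over the alphabet {a, b} accepting strings with an even number of b's.
transitions = {
    ("even", "a"): "even",
    ("even", "b"): "odd",
    ("odd", "a"): "odd",
    ("odd", "b"): "even",
}
start, accepting = "even", {"even"}

def accepts(s):
    """Run the DFA on string s and report whether it ends in an accepting state."""
    state = start
    for ch in s:
        state = transitions[(state, ch)]
    return state in accepting

print(accepts("abba"))  # True  (two b's)
print(accepts("ab"))    # False (one b)
```

A weighted automaton or transducer, as in the readings above, would attach a weight (or an output symbol) to each transition and combine weights along a path.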
Noisy Channel and Hidden Markov Models
Context-Free Grammars and Parsers
Log-Linear Models
- also known as: logistic regression, and maximum entropy (maxent) models; directly modeling the conditional probability of output given input, rather than the joint probability of input and output (and then applying Bayes' rule)
- Background: Jurafsky & Martin, sections 6.6-6.7; N. Smith, Appendix C
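As a sketch of "directly modeling the conditional probability of output given input," here is a tiny log-linear (maxent) classifier in Python; the feature function, weights, and sentiment labels are invented for illustration, and the weights would normally be learned, not hand-set:

```python
import math

def features(x, y):
    """Hypothetical indicator features pairing each input word with the label."""
    return {(w, y): 1.0 for w in x}

# Hand-set weights for illustration; training would fit these to data.
weights = {("great", "pos"): 2.0, ("awful", "neg"): 2.0, ("movie", "pos"): 0.1}

def score(x, y):
    return sum(weights.get(f, 0.0) * v for f, v in features(x, y).items())

def p(y, x, labels=("pos", "neg")):
    """P(y | x): a softmax over label scores, normalized only over labels."""
    z = sum(math.exp(score(x, yp)) for yp in labels)
    return math.exp(score(x, y)) / z

print(p("pos", ["a", "great", "movie"]))  # about 0.89
```

The normalizer sums only over the small label set, never over inputs; that is what makes the model conditional rather than joint.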
Models with Structured Outputs
- models that decide among combinatorially many outputs,
e.g. sequences of tags or dependency links; locally normalized
(action-based) models such as Maximum Entropy Markov Models
(MEMMs); globally normalized models such as linear-chain
Conditional Random Fields (CRFs)
- Background: Jurafsky & Martin, section 6.8; N. Smith, sections 3.1-3.5
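Deciding among combinatorially many tag sequences is tractable because the score factors locally; the Viterbi algorithm finds the argmax for HMMs, MEMMs, and linear-chain CRFs alike (for a CRF, the global normalizer does not change the argmax). A minimal sketch in Python, with invented toy scores:

```python
def viterbi(obs, tags, emit, trans):
    """Return the highest-scoring tag sequence for `obs` under additive scores."""
    # best[t] = (score of best path ending in tag t, that path)
    best = {t: (emit(obs[0], t), [t]) for t in tags}
    for o in obs[1:]:
        best = {
            t: max(
                ((s + trans(tp, t) + emit(o, t), path + [t])
                 for tp, (s, path) in best.items()),
                key=lambda sp: sp[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda sp: sp[0])[1]

# Toy scores for illustration: "the" looks like a determiner, and
# nouns like to follow determiners.
emit = lambda w, t: 1.0 if (w == "the") == (t == "DT") else 0.0
trans = lambda tp, t: 1.0 if (tp, t) == ("DT", "NN") else 0.0
print(viterbi(["the", "dog"], ["DT", "NN"], emit, trans))  # ['DT', 'NN']
```

Replacing `max` with log-sum-exp in the same recurrence gives the forward algorithm used to compute the CRF normalizer.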
Semantics
- logical form:
lambda expressions, event semantics, quantifiers, intensional
semantics; first steps in computational semantics: semantic
role labeling, combinatory categorial grammar (CCG)
- Background: Jurafsky & Martin,
chapters 18-20; see
also NLTK book,
chapter 10
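Lambda expressions for quantifiers can be sketched directly in Python, since lambdas are first-class; the tiny model of entities and predicates below is invented for illustration (the NLTK book, chapter 10, develops this properly with a logic language):

```python
# A hypothetical miniature model: a domain of entities and some predicates.
entities = {"fido", "rex", "felix"}
dog = lambda x: x in {"fido", "rex"}
barks = lambda x: x in {"fido", "rex"}
meows = lambda x: x == "felix"

# "every" as a generalized quantifier: lambda P. lambda Q. forall x. P(x) -> Q(x)
every = lambda P: lambda Q: all(Q(x) for x in entities if P(x))

print(every(dog)(barks))  # True:  every dog barks
print(every(dog)(meows))  # False: no dog meows
```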
Machine Translation
- word-based alignment models; phrase-based models;
syntactic and tree-based models; learning from comparable
corpora
- Background: Jurafsky & Martin,
chapter 25
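The word-based alignment idea can be sketched with IBM Model 1 trained by EM; the two-sentence German-English "corpus" below is a standard toy example, invented here for illustration:

```python
from collections import defaultdict

# Hypothetical parallel corpus: (foreign sentence, English sentence) pairs.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]

f_vocab = {f for fs, _ in corpus for f in fs}
# t[(f, e)] approximates P(f | e); initialized uniformly.
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(30):  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for f in fs:
            # E-step: spread f's count over the e's it could align to.
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: renormalize the translation table.
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]

print(t[("haus", "house")])  # approaches 1.0: EM resolves the ambiguity
```

Because "das" co-occurs with "the" in both sentence pairs, EM pushes its probability mass there, which in turn forces "haus" onto "house" and "buch" onto "book".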