IS U900/CS G224 Topics in Information Science/Natural Language Processing
Course Syllabus Spring 2007

Instructor: Prof. Carole Hafner, Office 446 WVH, Tel. 617-373-5116
Course web site:
Prof. Hafner Office Hours: Mon 4-6 p.m., Fri 1:45 - 3:30 p.m.




Course Description

This course provides an introduction to the computational analysis of human language, the ongoing effort to create computer programs that can communicate with people in natural language, and current applications of the natural language field such as intelligent text retrieval, information extraction, and question answering. Topics include: computational models of grammar and automatic parsing; statistical language models and the analysis of large text corpora; natural language semantics and programs that understand language; and language use by intelligent agents. Required coursework includes understanding and applying language models, and implementing working programs that analyze and interpret natural language text. Prerequisite: CS U370 Object-Oriented Design.

There are two textbooks for this class: 
Other materials will be provided as handouts or links to Web sites.

Approximate Schedule of Topics and Readings

Note: details will be added to this broad outline throughout the semester.  Check the "last modified" date below to see whether the syllabus has changed since you last reviewed it.

WEEK(s)   TOPICS AND READINGS

          Getting started in text analysis
              NLTK Preface and Ch 1
              NLTK Ch 2, Appendix 14

          English morphology and the lexicon
              NLTK Sec 3.1-3.4
              JM Sec 3.0, 3.1, 3.2 (pp. 65-70), 3.4, 3.5, App B
              Supplementary readings

          Word frequency-based statistics
          Part of speech tagging
              NLTK Sec 3.5
              JM Sec 6.0-6.3
              NLTK Ch 4, JM Ch 8

          Formal grammars and parsing I
              NLTK Ch 7-8, JM Ch 10

7 - 8     Lexical semantics and information retrieval
          NLP community resources and activities
          Discussion of term project
              JM Ch 16, 17

9 - 10    Language and meaning
              NLTK Ch 11, JM Ch 14, 15, 19

11 - 13   Topics and readings to be determined by student interest
          Note: no class on March 30 (will follow Monday schedule)

Last class: Student Project Presentations

Course Administration and Rules

Approximately one third of the student's grade will be determined by individual homework, one third by two exams (a midterm exam and a final exam, weighted equally), and one third by a term project. To receive a passing grade in the course, you must earn a passing grade on all three components. Class participation will also be taken into account in determining the course grade. Late assignments may be discounted, and very late assignments may be discarded.
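As a rough illustration of the weighting described above, the following Python sketch combines the three components. It assumes every score is on a 0-100 scale, and the function name course_score is hypothetical. It does not model class participation, late penalties, or the requirement to pass each component, because the syllabus gives no numbers for those; the weights are also stated only as approximate.

```python
def course_score(homework, midterm, final, project):
    """Combine component scores (assumed 0-100 each) using the stated weights.

    Hypothetical helper for illustration only; not an official grade calculator.
    """
    # The midterm and final are weighted equally within the exam component.
    exams = (midterm + final) / 2
    # Homework, exams, and the term project each count for about one third.
    return (homework + exams + project) / 3


if __name__ == "__main__":
    # Example: homework 90, midterm 80, final 70, project 85.
    print(round(course_score(90, 80, 70, 85), 2))
```

Under these assumptions each exam contributes about one sixth of the total, so a single exam moves the course score half as much as the homework or project component does.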

Academic (Dis)Honesty: The individual assignments must be each student's own work. Any group projects assigned must be the work of the students in the group. Plagiarism or copying will result in official University disciplinary review. Security is an important aspect of computer science. Students are required to take reasonable measures to protect their work (such as setting permissions on files to prevent others from reading them, and not leaving problem solutions in locations where other students may see them).

A CCIS Unix account is required to access course materials.  To learn how to get an account, go to:

There are no make-up exams in this course. Normally, a student who misses an exam will receive a grade of 0 on that exam. Under unusual circumstances (such as documented serious illness), the student's grade on a missed exam will be replaced by the grade on the final exam.

Last modified: January 11, 2007