CS6120: Natural Language Processing

Final Project

Due: 11:59pm, Monday, 24 April 2017

Instructions

The project for this course may be any system that uses empirical data to perform a modeling, prediction, or explanation task that can, at least in part, be quantitatively evaluated. By empirical data, we mean data generated by human language speakers other than you. You might annotate it, if necessary, or use others' annotations. By quantitative evaluation, we mean any method that can rank the performance of alternative models or approaches to the problem you are trying to solve. How will you know if you're making progress? If you are using supervised learning, this is usually easy: hold out some of your annotated data as a test set, train on the rest, and evaluate an objective function on the held-out data to measure how well your model's predictions match the target annotations. If you are performing exploratory data analysis or explanatory modeling, e.g., to determine the factors behind language change, you might come up with partial evaluations that determine how robust your model is to varying your assumptions.
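As an illustration of the held-out evaluation described above, here is a minimal Python sketch. Everything in it is an assumption made for the example, not a requirement of the assignment: the toy sentiment data, the function names (split_held_out, accuracy, train_majority_baseline, keyword_model), the 20% test fraction, and the choice of accuracy as the objective function. It shows how two candidate approaches can be ranked on the same held-out set.

```python
import random
from collections import Counter


def split_held_out(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and set aside a held-out test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predict, test_set):
    """Fraction of held-out examples whose predicted label matches the annotation."""
    correct = sum(1 for text, label in test_set if predict(text) == label)
    return correct / len(test_set)


def train_majority_baseline(train_set):
    """Baseline that always predicts the most frequent training label."""
    majority_label = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda text: majority_label


def keyword_model(text):
    """Hypothetical hand-written rule, used here only as a second approach to compare."""
    negative_words = {"bad", "boring", "awful", "dull"}
    return "neg" if negative_words & set(text.split()) else "pos"


# Hypothetical toy data; a real project would use text produced by other speakers.
data = [
    ("great movie", "pos"), ("awful plot", "neg"), ("loved it", "pos"),
    ("boring and long", "neg"), ("a fine cast", "pos"), ("bad acting", "neg"),
    ("really fun", "pos"), ("dull script", "neg"), ("quite moving", "pos"),
    ("not my thing", "neg"),
]

train, test = split_held_out(data, test_fraction=0.2, seed=0)
models = {
    "majority baseline": train_majority_baseline(train),
    "keyword rule": keyword_model,
}

# Rank the approaches by their score on the same held-out data.
for name, model in sorted(models.items(), key=lambda kv: -accuracy(kv[1], test)):
    print(f"{name}: {accuracy(model, test):.2f}")
```

With a test set this small the ranking is not meaningful; the point is only that every approach is scored on data it did not see during training, using the same metric.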

Short Pitch by 6 p.m., 2 March

Before class on 2 March, please send the instructor an email with:

one paragraph on what your model will do,
an example of the kind of data you might use, and
at least two example papers by different authors that describe related work.

We will then discuss whether your proposed project is too big or too small, whether a change in focus might be helpful, etc.

Short Presentation in Class on 20 April

In the last class of the term, we'll have short presentations about all of the projects. Each student will have three minutes to talk. You should send me (Prof. Smith) one PDF slide by 4pm on Thursday, 20 April, so that I have time to compile the slides in the correct order. Because you'll be limited to one slide and three minutes, you should not try to explain the whole project and its background; instead, think about which models, datasets, or results would be most interesting for everyone else in the class to hear about.

Please let me know as soon as possible if you cannot make it to this class so that we can schedule an alternate presentation time.

Final Report by 11:59pm, Monday, 24 April 2017

You should email Prof. Smith a final report on your project in PDF form, along with separate files for code and links to any datasets you used. (Please don't submit datasets by email.) The report, which should be about eight to ten pages, should describe in detail what you implemented, what the results of your quantitative evaluation were, and what conclusions you draw for future work in this area. Although you don't need to perform an extensive literature review, you should cite previous work on the problem, especially for models you are using as baselines or comparisons. When reporting quantitative results, please include tables and/or graphs, and discuss the results in the narrative.