Assigned: Friday, 22 February 2013
Due: 11:59pm, Wednesday, 13 March 2013
Generating sentences from a grammar [10 points]
First, download the initial grammar: grammar1. The file contains self-explanatory comments, which are delimited by #. The basic format is:

1 VP Verb NP

Ignore the leading "1" for now. The first symbol is the left-hand side of a context-free rewrite rule; the remaining one or more symbols are right-hand-side terminals and non-terminals. There is no typographic distinction enforced between terminals and non-terminals; rather, any symbol that does not appear on a left-hand side is by definition a terminal.
Now let's get down to work:
./generate grammar1 5

In other words, print five random sentences from the grammar grammar1 to standard output.
You should see output like:
the president ate every sandwich !
Your program should start with the ROOT
symbol and recursively choose rewrite rules until
termination. In grammar1, there are three possible expansions of ROOT; you should
have an equal chance of choosing each of them. Don't
hardwire this grammar into your program. Read it anew each
time it runs, so that you can modify the grammar
later.
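The generation procedure can be sketched as follows. This is a minimal sketch in Python, assuming the rule format above (weight, left-hand side, right-hand-side symbols, with # comments); the names load_grammar and generate are illustrative, not required:

```python
import random
from collections import defaultdict

def load_grammar(path):
    """Read rules of the form 'weight LHS RHS...'; '#' starts a comment."""
    rules = defaultdict(list)          # LHS -> list of (weight, RHS tuple)
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()   # drop comments and whitespace
            if not line:
                continue
            weight, lhs, *rhs = line.split()
            rules[lhs].append((float(weight), tuple(rhs)))
    return rules

def generate(rules, symbol="ROOT"):
    """Recursively expand symbol; a symbol with no rules is a terminal."""
    if symbol not in rules:
        return [symbol]
    weights = [w for w, _ in rules[symbol]]
    _, rhs = random.choices(rules[symbol], weights=weights)[0]
    return [word for sym in rhs for word in generate(rules, sym)]
```

With every weight equal to 1, random.choices picks each expansion with equal probability, as required, and a call like print(" ".join(generate(load_grammar("grammar1")))) prints one sentence.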
Save and hand in 10 random sentences generated by this first version of the program.
Copy grammar1 to grammar2 and
modify it. Instead of just "1", allow any non-negative real
number at the beginning of a rule. For instance, you can
assert that the is a more common determiner
than a or every like so:
4 Det the
1.5 Det a
0.5 Det every

In particular, these three rules would be chosen with probabilities (2/3, 1/4, 1/12): each weight divided by the total weight of rules with the same left-hand side.
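As a sanity check on that arithmetic, each rule's probability is its weight divided by the sum of the weights for that left-hand side:

```python
weights = {"the": 4, "a": 1.5, "every": 0.5}   # the Det rules above
total = sum(weights.values())                  # 6.0
probs = {det: w / total for det, w in weights.items()}
# the: 4/6 = 2/3, a: 1.5/6 = 1/4, every: 0.5/6 = 1/12
```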
Play around with the weights in grammar2
and see how your generated sentences change. Can you
make the average sentence length much longer or
shorter?
Hand in the new probabilized generation program, your
modified grammar2
, and ten random
sentences.
Describing English with context-free rules [15 points]
Copy grammar2 to grammar3. Modify the latter so that, in
addition to the strings of grammar2
, it can also
generate phenomena like:
Alex kissed a sandwich .
that Alex kissed a sandwich perplexed the president .
the president thought that Alex kissed a sandwich .
the president smiled .
Alex ate a sandwich and wanted a pickle .
the president and the chief of staff understood that Alex smiled .

In addition to new terminals, like Alex, you'll need to add new non-terminals. Of course, you don't want to generate just the six sentences above, but others with the same constructions.
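For instance, sentential complements and coordination might be handled with rules along these lines (a sketch only — the non-terminal names Sbar, VerbT, and VerbI are invented for illustration, and you will need to integrate such rules with the ones already in your grammar):

1 VP VerbT Sbar    # sentential complement: "thought that Alex kissed a sandwich"
1 Sbar that S
1 S Sbar VP        # clausal subject: "that ... perplexed the president"
1 VP VP and VP     # "ate a sandwich and wanted a pickle"
1 NP NP and NP     # "the president and the chief of staff"
1 VP VerbI         # intransitive: "the president smiled"
1 NP Alex          # proper noun
1 VerbT thought
1 VerbT understood
1 VerbI smiled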
Hand in your expanded grammar.
Copy grammar3 to grammar4. Modify it
to model some new constructions of English that you find
interesting. Explain in comments to your grammar file what you
are doing.
Context-free parsing [25 points]
Using the programming language of your choice, write a recognizer based on Earley's algorithm, as discussed in class. The recognizer should read grammars in the format you've been writing above, and read input sentences in the same space-separated format your generator used for its output. In other words, you should be able to parse the sentences generated by your own grammar. To get sufficient speed, you may want to implement the left-corner indexing trick discussed in class. (Other tricks may not offer as much benefit per unit of implementation time within the scope of this assignment.)
For each input sentence that is in the language defined by the grammar, output true; otherwise, output false.
Since this is an unweighted parser, you don't have to do the bookkeeping of weights or decide among competing alternatives when attaching completed rules.
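The core of such a recognizer is compact. The sketch below is a plain agenda-driven Earley recognizer in Python, without the left-corner filter; it assumes the grammar is given as a dict mapping each non-terminal to a list of right-hand-side tuples, and that the grammar contains no empty (epsilon) rules:

```python
def earley_recognize(grammar, words, start="ROOT"):
    """Return True iff words is in the language of grammar.

    grammar: dict mapping non-terminal -> list of RHS tuples.
    Chart items are (lhs, rhs, dot, origin). Assumes no epsilon rules.
    """
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                      # PREDICT
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and words[i] == sym:         # SCAN
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # COMPLETE
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])
```

The input is accepted exactly when the final chart column contains a completed start-symbol item with origin 0, i.e., a start rule spanning the whole sentence.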
Extra credit: Competitive grammar writing
This exercise is based on Jason Eisner and Noah Smith's article
of that name. Using the grammar you've handed in
as grammar4
, we will generate 10 new sentences and
judge whether they are in fact grammatical English. (You can't
win this competition by generating Chinese or Finnish or
Navajo.) Then, your grammatical sentences will be sent to be
parsed by everyone else's Earley recognizers and their
grammars. You get five points for each grammatical
sentence you generate that no one else's grammar can parse.