Assignment 1: Lexer
Due: 10 pm, Monday, September 19.
Assignment 1 is to build a lexer for Appel's Tiger language using
ML-Lex. The Tiger language is defined in Appendix A of the course
textbook. Documentation on ML-Lex is available on the class web page,
and also (in less detail) in chapter two of the textbook.
Skeleton files to get you started are available in the $TIGER/chap2/
directory, which you can find at Appel's textbook homepage.
You should submit:
- A tiger.lex file, with the source for your lexer.
- Any other source files you wrote to support your lexer.
- A text file describing
  - The members of your team
  - How you handled comments
  - How you handled errors
  - How you handled end-of-file
  - Anything else you think is of interest about your lexer
You're expected to write clean code; just getting it to work is not enough.
Your lexer should use the error-reporting machinery in Appel's ErrorMsg
module (see file $TIGER/chap2/errormsg.sml), or something equivalent
that you write yourself.
In particular, error messages should report positions as line-number
and column offsets, not as a bare character offset from the beginning
of the file.
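One common way to get line/column positions is to record where each line
starts as newlines are scanned, then look up a character offset in that
table when an error is reported. The sketch below shows the idea in Python
rather than SML, purely as a language-neutral illustration. The names
LineTable, newline, locate, and build_table are hypothetical and are not
part of Appel's ErrorMsg module, and the choice of 0-based offsets with
1-based lines and columns is an assumption.

```python
from bisect import bisect_right


class LineTable:
    """Maps 0-based character offsets to 1-based (line, column) pairs.

    Hypothetical helper for illustration; not Appel's ErrorMsg API.
    """

    def __init__(self):
        # Offset at which each line begins; line 1 begins at offset 0.
        self.line_starts = [0]

    def newline(self, offset):
        """Record a newline character seen at `offset`."""
        self.line_starts.append(offset + 1)

    def locate(self, offset):
        """Return the (line, column) of the character at `offset`."""
        # Number of line starts at or before `offset` is the line number.
        line = bisect_right(self.line_starts, offset)
        column = offset - self.line_starts[line - 1] + 1
        return line, column


def build_table(source):
    """Scan `source` once, recording every newline (as a lexer rule would)."""
    table = LineTable()
    for i, ch in enumerate(source):
        if ch == "\n":
            table.newline(i)
    return table
```

In a real lexer the call to newline would sit in the rule that matches a
newline character, including newlines that occur inside comments and
string literals, so that later positions stay accurate.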
Words to the wise: Relative to, say, a parser, it's not hard to build a lexer,
but:
- Tiger's lexical grammar has some complexities that may take you more
  work to handle than you might initially suspect:
  - Comments nest in Tiger.
  - The string-literal syntax is especially complex, involving a lot
    of subcases, some of which are fairly complex in their own right.
  - Having all of the above machinery interact with your line-number
    tracking machinery has its own complications.
- If this is your first experience ever programming in SML, you'll
  need to allocate time generously to deal with coming up to speed
  on the language.
- Does your lexer do the right thing if eof occurs inside a comment
  or string-literal? How about illegal escape codes in string literals?
See the course text for more information,
in particular chapter two and appendix A.
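To make the nesting and end-of-file questions concrete, here is a minimal
sketch of a depth counter for nested /* ... */ comments. It is written in
Python as a language-neutral illustration, not as SML or ML-Lex code. The
function name skip_comment and the use of ValueError to signal an
unterminated comment are assumptions made for this example; in ML-Lex the
same idea would more likely be expressed with a start state and a counter.

```python
def skip_comment(source, start):
    """Return the offset just past the comment that opens at `start`.

    Assumes source[start:start+2] == "/*". Raises ValueError if the end
    of the input is reached while any comment level is still open.
    (Hypothetical helper, for illustration only.)
    """
    if not source.startswith("/*", start):
        raise ValueError("no comment opener at offset %d" % start)
    depth = 0
    i = start
    while i < len(source):
        if source.startswith("/*", i):
            depth += 1          # entering one more nesting level
            i += 2
        elif source.startswith("*/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i        # outermost comment is closed
        else:
            i += 1
    # Ran out of input with depth > 0: end-of-file inside a comment.
    raise ValueError("unterminated comment opened at offset %d" % start)
```

Note that the error case reports where the comment was opened, which is
usually more helpful to the user than the position of end-of-file.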
–Olin
CS4410