Assignment 1: OCaml warmup, part 2: trees

8.12

Assignment 1: OCaml warmup, part 2: trees🔗

Due: Tue 01/16 at 8:59pm

git clone

Almost every assignment we work on in this course will involve transforming list- or tree-shaped data from one form to another. In this assignment, you will be working with traversing a tree and producing results, and traversing a list and producing a tree.

1 Evaluating arithmetic🔗

Let’s define a data type for describing simple arithmetic expressions:

type arith =
  | Plus of arith * arith
  | Times of arith * arith
  | Variable of string
  | Num of int

It should be mnemonically apparent what each case ought to represent. Note that there is no notion of a “parentheses expression”: the parenthesization is implicit in the tree structure. For example,

"Math" syntax		`arith`
`3 * (4 + 5)`		`Times(Num 3, Plus(Num 4, Num 5))`
`(3 * 4) + 5`		`Plus(Times(Num 3, Num 4), Num 5)`

This is a miniature language, and we reasonably want to do typical things with it: evaluate it and get an answer, or print it back out as text. Evaluation is straightforward, except for handling variables: we need some sort of environment to look up their values.

In the starter file expr.ml, we have given you the arith data type, and a type for environments. Implement the five functions in that file:

(* Looks up the given variable name in the given environment *)
get : env string -> int option

(* Determines whether the given environment contains a binding for
   the given variable *)
contains : env string -> bool

(* Adds a new variable-->value mapping to an environment, and
   produces a new environment *)
add : env string int -> env

(* Evaluates the given expression in the given environment *)
evaluate : arith env -> int

(* Neatly prints the given expression as a string *)
pretty : arith -> string

Enhance the provided starter file test.ml to thoroughly test your functions. Think carefully about how to test them, and be sure to check for sneaky edge cases: get used to thinking about that now, while the languages are still small, so that it becomes second-nature when the languages get bigger!

2 Parsing S-expressions🔗

In this part, you will be working on obtaining a tree of data from a string input: in other words, you will be parsing the data. Generally speaking, manually parsing an arbitrary language is tedious, so we will be working with a particularly simple language instead: s-expressions.

The first step in parsing a file is tokenization, that is, a function

tokenize : string -> token list

You can think of this as breaking up the string into its component words and punctuation marks. The starter file you are given contains a definition of this function, and a definition of the token type, but those definitions are slightly more intricate than above. When you make a mistake writing a program in almost any language, the compiler will give you an error message that contains the location of the error in the source. Accordingly, we need to define

(* startline, startcol, endline, endcol *)
type pos = int * int * int * int

To define tokens, we could explicitly include position information in each token, but we’ll be more general: we’ll let tokens include any information, and position information will often be handy:

type 'a tok =
  | LPAREN of 'a
  | RPAREN of 'a
  | TSym of string * 'a
  | TInt of int * 'a
  | TBool of bool * 'a

Now the tokenize function actually has signature

tokenize : string -> pos tok list

Explain in a short paragraph how you think this function works. You should be sure to explain why it uses List.fold_left, what the try/with construction is for, what the call to regexp is likely doing, and how the function keeps track of position information.

The starter file also gives you a type definition of s-expressions, which have also been decorated:

type 'a sexp =
  | Sym of string * 'a
  | Int of int * 'a
  | Bool of bool * 'a
  | Nest of 'a sexp list * 'a

The last function in the file produces a value of type result, which is predefined as:

type ('a , 'b) result =
  | Ok of 'a
  | Error of 'b

Define a function

parse_toks : pos tok list -> (pos sexp list, string) result

that parses a given list of tokens with position information (the pos tok list argument) into a result: either it successfully produces a list of valid s-expressions from the token list, or else it should produce an error message explaining the error...and referencing the location at which the error occurred. For example:

(parse_toks (tokenize "(a b)"))          ==> Ok [Nest ([Sym ("a", (0, 1, 0, 2));
                                                        Sym ("b", (0, 3, 0, 4))],
                                                       (0, 0, 0, 5))]

(parse_toks (tokenize "(a (b true) 3)")) ==> Ok [Nest
                                                  ([Sym ("a", (0, 1, 0, 2));
                                                    Nest ([Sym ("b", (0, 4, 0, 5));
                                                           Bool (true, (0, 6, 0, 10))],
                                                          (0, 3, 0, 11));
                                                    Int (3, (0, 12, 0, 13))],
                                                   (0, 0, 0, 14))]


(parse_toks (tokenize "(a"))             ==> Error "Unmatched left paren at line 0, col 0"

(parse_toks (tokenize "(a (b c"))        ==> Error "Unmatched left paren at line 0, col 3"

Hint: the signature above is not general enough to be sufficient to parse s-expressions, and you’ll need one or more helper functions. First consider what a plausible “subtask” might be, to make progress on producing a pos sexp list. Next, consider the data definition for sexp carefully, and look very carefully at the second example above: at the point just after the token true and before the following RPAREN, how might you describe your intermediate state? What have you built up so far, what are you in the process of building, and what is left over?

Hint: you can and should solve this problem by looking at only one token at a time. Trying to “look ahead” in the list is pretty much guaranteed not to work, so you need to use some other technique to match up left and right parentheses. One mnemonic I was shown a long time ago: to check if an s-expression is balanced, mentally start a running count at zero, scan across the expression from left to right, increment the count at each left parenthesis, and decrement it at each right parenthesis. If the count is zero at the end, and never became negative, then you’ve got a correctly-matched s-expression. How might you carry the idea of this across into your code? (You won’t need to maintain an actual int counter; you’re implicitly maintaining a stack...)

Define a function
```
parse : string -> (pos sexp list, string) result
```
that is more convenient to use than parse_toks.
Test your parser carefully.

3 Grading standards🔗

For this assignment, you will be graded on

whether your code compiles,
whether your code implements the specification (functional correctness),
whether you thoroughly test every method that you write, and
how readable your code is (indented well, commented well, etc).

4 Submission🔗

4.1 Deliverables🔗

Your submission should include all the provided files; you should not need to create any new ones.

Please ensure that your code compiles! On this assignment, half of your grade is for correctness as determined by automated testing, so code that doesn’t compile is subject to a 50% penalty up front. Zip the provided files together:

$ zip submission.zip Makefile sexp.ml expr.ml test.ml

and submit that zip.

1	Evaluating arithmetic
2	Parsing S-expressions
3	Grading standards
4	Submission