Lecture 9.2: Interpreting Calc

Overview

Interpreting Calc
Adding new data types: Booleans and Control Flow

1. Interpreting Calc

Last class, we defined a grammar, discussed concrete and abstract syntax, and showed the following AST representation:

type expr =
  | Add of expr * expr
  | Mul of expr * expr
  | Nat of int

We then showed how a parser can take concrete syntax, like

(3 + 4) * 5

and transform it into the AST

Mul(Add(3, 4), 5).

Today, we will tackle the "semantics" of Calc, specifically asking the question: "what does a program mean?" In the case of (3 + 4) * 5 we want the program to mean 35.

For our simple type of expr, how to implement this meaning is simple: we recursively traverse the tree to form integers.

let rec eval (x : expr) : int = 
  match x with
  | Nat n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

There is a very subtle, but important point happening here. When we write

(3 + 4) * 5

in our concrete syntax, we first parse this piece of raw syntax into the AST Mul(Add(Nat 3, Nat 4), Nat 5). At this point, the names Mul and Add are still uninterpreted; we could have just as easily called them Mario and Luigi. We then give semantics to these names by choosing to interpret Add via OCaml's + and Mul by *.

Properties of Evaluation

We can already state useful properties about calc programs using this AST representation. For example, that addition is commutative:

∀ (e1 e2 : expr), eval (Add(e1, e2)) = eval (Add(e1, e2))

And that multiplication is distributive over addition:

∀ (e0 e1 e2 : expr), eval (Mul(e0, Add(e1, e2))) = eval (Add(Mul(e0, e1), Mul(e0, e2)))

Note that simple properties like the ones above only hold for expressions because they denote pure, total expressions. Indeed, as we go to more expressive languages, properties like these start to break down. Let's see an example of this in OCaml. Given the below setup:

let r = ref 0

let f () = (r := !r + 1; !r)
let g () = (r := !r * 2; !r)

The following are two valid OCaml expressions:

(f ()) + (g ())

and

(g ()) + (f ())

Is it the case that addition commutes here? No:

utop # (f ()) + (g ());;
- : int = 1

utop # (g ()) + (f ());;
- : int = 3

What's going on here? + in OCaml evaluates right to left; and since f and g are touching shared mutable memory, the order in which they evaluate will depend on their final results.

Adding Booleans

Simple calculator languages are nice, but they hardly count as "real" programming languages. Let's fix that! First, we're going to add a new type of values: booleans. In particular, we will add the boolean constants true/false; boolean operators for &&, ||, and not; a way to create Booleans by checking for equality; and control flow via if/then/else.

To do that, we first need to add the corresponding nodes to the AST:

type expr =
  | Add of expr * expr
  | Mul of expr * expr
  | And of expr * expr
  | Or of expr * expr
  | Not of expr
  | Ite of expr * expr * expr
  | Eq of expr * expr
  | Nat of int
  | Bool of bool

Our type of expr is already getting large, so let's do a little work up front to make it a bit smaller:

type value = | Nat of int | Bool of bool

type op = | Add | Mul | And | Or | Not | Eq

type expr =
  | Const of value
  | Op of op * expr list
  | Ite of expr * expr * expr

We split expr into just three cases: constant values, which can either be natural numbers or booleans; an application of an operator (such as Add) to a list of subexpressions; and if/then/else, which we will keep separate.

Now, we need to give expr semantics. How to do so? Before, we just let eval have type expr -> int; this won't work any more, because certain expressions (e.g., Op (Not, [Const (Bool false)])) should evaluate to booleans. Luckily, we have a fix: we just let expressions evaluate to values. In this sense, we can think of value as the type of "fully evaluated" things; while expr is the type of things which are yet to be evaluated.

let rec eval (e : expr) : value = ...

Let's now think about how eval should be implemented. In the Const case, this is easy; we just return the value. But how to implement an operator (e.g., Add)?

let rec eval (e : expr) : value = 
  match e with
  | Const v -> v
  | Op (o, es) -> ???
  | Ite (cond, e1, e2) -> ???

The first thing we can do is evaluate all of the subexpressions for Op.

| Op (o, es) -> let vs = List.map eval es in ...

At this point, we have an operator o, a list of argument values vs, and we want to return a new value. Let's make a helper function for that:

let eval_op (o : op) (vs : value list) : value = 
  match o, vs with
  | Add, [Nat i; Nat j] -> Nat (i + j)
  | Mul, [Nat i; Nat j] -> Nat (i * j)
  | And, [Bool x; Bool y] -> Bool (x && y)
  | Or,  [Bool x; Bool y] -> Bool (x || y)
  | Not,  [Bool x] -> Bool (not x)
  | Eq, [v1; v2] -> Bool (v1 = v2)
  | _, _ -> failwith "eval_op: got unexpected arguments"

How eval_op works is, for each op, we try to see if vs is of the right "shape": for example, if we are evaluating an Add, then vs should hold exactly two natural numbers. If it is, we can straightforwardly evaluate. If not, we will throw an exception. There is a subtle point here with Eq: eq will work with any two values, because we can use OCaml's = built-in function on value to compare them.

Other choices we can make

This semantics will throw an error if we try to evaluate false + false (similarly to how in OCaml, we will get a type error in this case). However, not all languages work like this. In Python:

>>> False + False
0

While one perspective might be that this makes + more useful, since it can work on more things, another perspective is that it gives + a more complicated semantics, which increases the chance of people misusing it. For example, if we are evaluating x + y, we might want this to fail in the event that earlier on, we got a type error which caused x to be a boolean.

Now that we have eval_op, we can implement eval. Our implementation of if/then/else will be similar, in that we will pattern match on the value, and fail if it is unexpected:

let rec eval (e : expr) : value = 
  match e with
  | Const v -> v
  | Op (o, es) -> let vs = List.map eval es in eval_op o vs
  | Ite (cond, e1, e2) -> 
    match eval cond with 
    | Bool b -> if b then eval e1 else eval e2
    | _ -> failwith "Ite: didn't get a boolean"

Note here that, just like how Add is implemented in terms of +, we implement Ite in terms of OCaml's native if/then/else construct. This also means that (by design!) if we are evaluating Ite (cond, e1, e2), we don't evaluate e2 if cond evaluates to Bool true. This is the reason we don't have Ite be an operator: unlike an operator, we don't evaluate both branches of the conditional.

Note

What would it look like if we did evaluate both branches of the conditional? Why does OCaml not behave like this?

The completed code for this lecture can be found here.