Lecture 9.2: Interpreting Calc
Overview
- Interpreting Calc
- Adding new data types: Booleans and Control Flow
Interpreting Calc
Last class, we defined a grammar, discussed concrete and abstract syntax, and showed the following AST representation:
type expr =
| Add of expr + expr (* + and * are come from our /definitional/ or /meta/ language *)
| Mul of expr * expr
| Nat of int
We then showed how a parser can take concrete syntax, like
and transform it into the ASTToday, we will tackle the "semantics" of Calc, specifically asking the question:
"what does a program mean?" In the case of (3 + 4) * 5 we want the program to
mean 35.
For our simple type of expr, how to implement this meaning is simple: we recursively traverse the tree to form integers.
let rec eval (x : expr) : int =
match x with
| Nat n -> n
| Add (a, b) -> eval a + eval b
| Mul (a, b) -> eval a * eval b
;;
let eval_tests () =
assert (eval (Nat 12) = 12);
assert (eval Add((Nat 1), (Nat 2)) = 3);
assert (eval (Mul(Nat 7, Add((Nat 1), (Nat 2)))) = 21);
print_endline "these are some examples of semantics!";;
There is a very subtle, but important point happening here. When we write
in our concrete syntax, we first parse this piece of raw syntax into the ASTMul(Add(Nat 3, Nat 4), Nat 5). At this point, the names Mul and Add are still uninterpreted; we could have just as easily called them Mario and Luigi. We then give semantics to these names by choosing to interpret Add via OCaml's + and Mul by *.
Properties of Evaluation
We can already state useful properties about calc programs using this AST representation. For example, that addition is commutative:
And that multiplication is distributive over addition:
Let's formalize this in OCaml (our first formalized property since break!).
let rec forall
?(ntrials : int = 1000)
(gen : unit -> 'a)
(prop : 'a -> bool)
: 'a option
= if ntrials <= 0
then None
else
let x = gen () in
if prop x
then forall ~ntrials:(ntrials - 1) gen prop
else Some x
let rec gen_expr (mx : int) () : expr =
let gen_nat () = Nat (Random.int 50) in
let go n =
if n <= 0
then gen_nat ()
else
match Random.int 3 with
| 0 -> gen_nat ()
| 1 -> Add (go (n - 1), go (n - 1))
| 2 -> Mul (go (n - 1), go (n - 1))
| _ -> failwith "IMPOSSIBLE"
in
go (Random.int mx)
let gen_pair f g () = (f (), g ())
let prop_add_comm (a, b) : bool
eval (Add(a, b)) = eval (Add(b, a))
let prop_m_dist_a (a, b, c) : bool
eval (Mul(c, Add(a, b)) = eval (Mul(Add(c, a), Add(c, b)))
let test_add_comm () =
forall (gen_pair (gen_expr 20) (gen_expr 20)) prop_add_comm
"pure and total" vs "impure and partial"
Note that simple properties like the ones above only hold for expressions because they denote pure, total expressions.
Indeed, as we go to more expressive languages, properties like these start to break down. Let's see an example of this in OCaml.
Given the below setup:
The following are two valid OCaml expressions: andIs it the case that addition commutes here? No:
What's going on here? + in OCaml evaluates right to left; and since f and g are touching shared mutable memory, the order in which they evaluate will depend on their final results.
There's another angle to this: What if we added division to our calculator language? Then our valid properties might break the testing harness. Meaning, if we get an expression like the following:
let rec eval (x : expr) : int =
match x with
...
| Div (a, b) -> eval a / eval b
;;
(* where e2 is generated to be some Div(???, 0) *)
eval (Add(e1, e2)) = eval (Add(e2, e1))
We would need eval to be a partial function -- it doesn't have a semantics over all of the expressions and we will get a division by zero error. This is a problem: we won't get back a useful counter-example at all and the entire PBT harness will crash.
There are many different ways we can handle this. Can you name one?
- don't generate bad expressions:
∀ e1 e2, e1 <> Div(_, Nat 0) → e2 <> Div(_, Nat 0) → ... - make our
evalfunction return anoptionoreithertype. - add a try-catch in our
forallfunction: there are subtle issues here. What if (in another langauge) eval on the left crashes but eval on the right does not, and this is an important detail of our property? - add a smarter notion of
=which can handle the above.
Adding Booleans
Simple calculator languages are nice, but they hardly count as "real" programming languages. Let's fix that!
First, we're going to add a new type of values: booleans. In particular, we will add the boolean constants true/false; boolean operators for &&, ||, and not; a way to create Booleans by checking for equality; and control flow via if/then/else.
Let's discuss, at a high level, how to introduce Booleans into our semantics. We want the following new features:
- boolean values
- connectives: and, or, ot
- if-then-else
- equality on values.
To do that, we first need to add the corresponding nodes to the AST:
type expr =
| Add of expr * expr
| Mul of expr * expr
| And of expr * expr
| Or of expr * expr
| Not of expr
| If of expr * expr * expr (* this is a design decision! *)
| Eq of expr * expr
| Nat of int
| Bool of bool
Notice that If is not taking in a bool * expr * expr, then it would be relying on OCaml to provide boolean values. We could allow for this by enabling support for this in our parser, but it would amount to introducing a static notion of Booleans. We also could pull out a separate bool_expr ADT, which might be a bit more flexible, we could interleave bool_expr with some nat_expr, but this would be approximately the same idea. These small design decisions propagate quite quickly into our interpreter!
type value = | Nat of int | Bool of bool
type expr =
(* to unify the types, we will introduce this Const constructor *)
| Const of value
...
let rec eval (x : expr) : value =
match x with
| Const v -> v
| Add (a, b) ->
(match eval a, eval b with
| Nat a, Nat b -> a + b
| _ -> failwith "+: expected two natural numbers")
| Mul (a, b) ->
(match eval a, eval b with
| Nat a, Nat b -> a * b
| _ -> failwith "*: expected two natural numbers")
| If (a, b, c) ->
(match eval a with
| Bool true -> eval b
| Bool false -> eval c
| _ -> failwith "if: expected a boolean")
| And (a, b) ->
(* (match eval a, eval b with *)
(* | Bool a, Bool b -> a && b *)
(* | _ -> failwith "&&: expected two Booleans") *)
(match eval a with
| Bool true -> eval b
| Bool false -> Bool false
| _ -> failwith "and: expected LHS boolean")
;;
And in this way results in short-circuiting. How does OCaml deal with this?
utop # (false && (failwith "!"));;
- : bool = false
utop # (true && (failwith "!"));;
Exception: Failure "!".
When we short-circuit our And expression, it may risk loosing commutativity depending our our language design.
Our type of expr is already getting large, so let's do a little work up front to make it a bit smaller:
type value = | Nat of int | Bool of bool
type op = | Add | Mul | And | Or | Not | Eq
type expr =
| Const of value
| Op of op * expr list
| If of expr * expr * expr
We split expr into just three cases: constant values, which can either be natural numbers or booleans; an application of an operator (such as Add) to a list of subexpressions; and if/then/else, which we will keep separate.
Now, we need to give expr semantics. How to do so? Before, we just let eval have type expr -> int; this won't work any more, because certain expressions (e.g., Op (Not, [Const (Bool false)])) should evaluate to booleans. Luckily, we have a fix: we just let expressions evaluate to values. In this sense, we can think of value as the type of "fully evaluated" things; while expr is the type of things which are yet to be evaluated.
Let's now think about how eval should be implemented. In the Const case, this is easy; we just return the value. But how to implement an operator (e.g., Add)?
let rec eval (e : expr) : value =
match e with
| Const v -> v
| Op (o, es) -> ???
| If (cond, e1, e2) -> ???
The first thing we can do is evaluate all of the subexpressions for Op.
o, a list of argument values vs, and we want to return a new value. Let's make a helper function for that:
let eval_op (o : op) (vs : value list) : value =
match o, vs with
| Add, [Nat i; Nat j] -> Nat (i + j)
| Mul, [Nat i; Nat j] -> Nat (i * j)
| And, [Bool x; Bool y] -> Bool (x && y)
| Or, [Bool x; Bool y] -> Bool (x || y)
| Not, [Bool x] -> Bool (not x)
| Eq, [v1; v2] -> Bool (v1 = v2)
| _, _ -> failwith "eval_op: got unexpected arguments"
eval_op works is, for each op, we try to see if vs is of the right "shape": for example, if we are evaluating an Add, then vs should hold exactly two natural numbers. If it is, we can straightforwardly evaluate. If not, we will throw an exception. There is a subtle point here with Eq: eq will work with any two values, because we can use OCaml's = built-in function on value to compare them.
Other choices we can make
This semantics will throw an error if we try to evaluate false + false (similarly to how in OCaml, we will get a type error in this case).
However, not all languages work like this. In Python:
While one perspective might be that this makes + more useful, since it can work on more things, another perspective is that
it gives + a more complicated semantics, which increases the chance of people misusing it. For example, if we are evaluating x + y, we might want this to fail in the event that earlier on, we got a type error which caused x to be a boolean.
Now that we have eval_op, we can implement eval. Our implementation of if/then/else will be similar, in that we will pattern match on the value, and fail if it is unexpected:
let rec eval (e : expr) : value =
match e with
| Const v -> v
| Op (o, es) -> let vs = List.map eval es in eval_op o vs
| If (cond, e1, e2) ->
match eval cond with
| Bool b -> if b then eval e1 else eval e2
| _ -> failwith "If: didn't get a boolean"
Note here that, just like how Add is implemented in terms of +, we implement If in terms of OCaml's native if/then/else construct.
This also means that (by design!) if we are evaluating If (cond, e1, e2), we don't evaluate e2 if cond evaluates to Bool true.
This is the reason we don't have If be an operator: unlike an operator, we don't evaluate both branches of the conditional.
Note
What would it look like if we did evaluate both branches of the conditional? Why does OCaml not behave like this?
The completed code for this lecture can be found here.