Defining an interpreter for a small language

6.10

Defining an interpreter for a small language

We’ve been using DrRacket to help us write and run programs in BSL and ISL this semester, and we’ve relied on its ability to evaluate our code and produce either an answer or an error message. But how exactly does it do that? Writing an interpreter for a language is one of the ways programming-language researchers and implementors understand and create new languages. DrRacket is itself a program: can we design our way to building an interpreter for a small langauge?

1 What makes a language?

As we’ve introduced BSL and ISL this semester, we’ve consistently introduced three things:

The kinds of values our language can manipulate: things like Numbers, Booleans, Symbols, structure values, etc;
The syntax of expressions we use to manipulate values: things like define, if, function application, etc;
The semantics or meaning of each of those expression forms

Those three pieces define a language. Moreover, the semantics of the language is essentially a function from expressions to values: it assigns meaning to each expression by determining what its result should be.

For our language, we’re going to pick a very small set of values and a small set of expressions to manipulate them. Building a bigger language doesn’t introduce any new essential concepts for us here.

2 Data definitions

Sample Problem What kinds of values can we manipulate in ISL+Lambda? What is a minimal set of interesting values to work with?

Starting from atomic data, we’ve worked with booleans, numbers, symbols, strings and images. For our purposes, we really only need booleans and numbers: the other three are just "other kinds of stuff we work with using functions", just like numbers. If we can figure out how to work with numbers, that’s good enough.

Moving up, we work with structured (and recursive) data. These are more complicated; we’ll leave them out for now.

Most importantly, we have functions, which are the workhorse of BSL and ISL. Functions come in two varieties: the built-in ones, and the ones we write ourselves. We’ll start with defining our own functions, and incorporate built-in ones later.

Sample Problem Design a data definition for Values.

Here is one possible definition:

; A Value is one of
; - Number
; - Boolean
; - CustomFunction
; - BuiltInFunction

We’ll come back to those last two cases later, as we discover what they need to be.

Sample Problem What kinds of expressions do we manipulate in ISL+Lambda? What is a minimal set of interesting expressions to work with?

First, recall that define is not an expression, because it does not produce a value. Likewise, check-expect is not an expression. So we’ll throw those away. Similarly, since we are ignoring structures for now, we can ignore define-struct, too.

What’s left? For booleans, we need the ability to write if expressions...and that’s pretty much enough. (We can write cond expressions using a lot of ifs, so cond is not essential — we say it’s "syntactic sugar", because having it around makes writing programs "sweeter".)

Working with numbers is done entirely through function calls, so we don’t need any special-purpose expressions for that.

Writing our own functions requires lambda expressions. Using functions requires being able to call them, so we need some syntax for function-application.

Finally, using the parameters of a function means we need to have names in our program.

Sample Problem Design a data definition for Expressions.

Here is one possible definition:

; A Expr is one of
; - Number
; - Boolean
; - Identifier
; - (list 'if Expr Expr Expr)
; - (list 'fun [List-of Identifier] Expr)
; - [NEList-of Expr]
; A Identifier is any Symbol except any Keywords
; A Keyword is one of
; - 'if
; - 'fun

We pick 'fun rather than 'λ or 'lambda so that we don’t get too confused when looking at Expressions versus looking at BSL code. Also, notice that we isolate 'fun and 'if from the set of possible Identifiers — just as in BSL/ISL, we can’t use a keyword as a variable name.

Exercise 4 Construct at least ten examples of Exprs, covering all the cases. Be sure some of the examples nest expressions inside each other.

Here’s how we would represent the function that takes an argument named 'x and returns 5 if it’s true, and 6 if it’s false:

'(fun (x) (if x 5 6))

If you’re not familiar with the quote syntax, read Intermezzo 2. We will use it freely in these notes.

Note! We’re creating a data definition to represent programs of our toy language. Those data definitions happen to look a lot like BSL itself. That’s not just a coincidence: our expressions above are all examples of s-expressions — the "s" stands for "syntax", and it’s a big part of the design history of LISP, Scheme and DrRacket.

If you find yourself getting confused whether something is an Expr or "just" a ISL expression, rely on signatures to help you out.

Exercise 5 Design the function keyword=?, that takes a BSL value and a Keyword, and checks if they’re equal.

; keyword=? : Any Keyword -> Boolean
; Is this value equal to the given keyword?
(define (keyword=? val kw)
(and (symbol? val) (symbol=? val kw)))
(check-expect (keyword=? 42 'if) #false)
(check-expect (keyword=? 'fun 'fun) #true)

Exercise 6 Construct the template for Expr.

Since Expr is a union data-definition, we’ll certainly use a cond, but since the last three clauses are all non-empty lists, we’ll have to be a bit careful.

(define (expr-temp expr)
  (cond
    [(number? expr) ...]
    [(boolean? expr) ...]
    [(symbol? expr) ...]
    [else
     (local ((define first-expr (first expr)))
       (cond
         [(keyword=? first-expr 'if)
          (... (expr-temp (second expr)) ; the condition of the if
           ... (expr-temp (third expr))  ; the "then" clause
           ... (expr-temp (fourth expr)) ; the "else" clause
           ...)]
         [(keyword=? first-expr 'fun)
          (... (second expr) ; the arguments of the function
           ... (expr-temp (third expr)) ; the function body
           ...)]
         [else
          (... (expr-temp (first expr))    ; the function
           ... (map expr-temp (rest expr)) ; the function arguments
           ...)]))]))

3 Evaluation

Our goal is to develop a function eval, that takes a Expr and returns a Value. We’ll build this function in several stages.

3.1 First attempt: just numbers and booleans

Exercise 7 Fill in the number and boolean cases of the template above. Complete the case of 'if expressions. Be careful! How many possible outcomes can there be for an 'if expression?

Numbers and booleans evaluate to themselves, so they’re easy to fill in:

; Expr -> Value
; Evaluate the given expression to an answer
(define (eval expr)
  (cond
    [(number? expr) expr]
    [(boolean? expr) expr]
    ...))

In the 'if case, what can happen? We must evaluate the expression that represents the condition, to see which branch to take. If it’s true, evaluate the first branch. If it’s false, evaluate the second. If it’s neither, we have to stop with an error, because the program doesn’t make sense!

; Is this value #true, and nothing else?
; Any -> Boolean
(define (true? v)
  (and (boolean? v) v))
; Expr -> Value
; Evaluate the given expression to an answer
(define (eval expr)
  (cond
    [(number? expr) expr]
    [(boolean? expr) expr]
    [(symbol? expr) ...]
    [else
     (local ((define first-expr (first expr)))
       (cond
         [(keyword=? first-expr 'if)
          (local ((define cond-val (expr-temp (second expr)))) ; evaluate the condition
            (cond
              [(true? cond-val) (eval (third expr))]   ; if it's true, use the "then" clause
              [(false? cond-val) (eval (fourth expr))] ; if it's false, use the "else" clause
              [else (error "If expects a boolean, got " cond-val)]))]
         ...))]))

Exercise 8 Write several tests for eval. Confirm that it behaves as expected.

3.2 Including identifiers

Suppose we wrote a tiny program in BSL:

x

DrRacket would complain that x is not defined, and rightfully so.

But a slightly larger program works perfectly fine:

((λ(x) (+ x 2)) 5)

Here, x is in scope when it’s used, and therefore the program succeeds. Yet the use of x is exactly the same expression as our prior program. So clearly, DrRacket is keeping track of what variables are in scope, so that they can be used.

In order to implement our language correctly, we must also keep track of what variables are bound.

Exercise 9 Design a function all-bound?, that takes a Expr and returns true if all the uses of variables in the program are properly bound, and false if any of them are not.

The set of variables that are in scope will grow or shrink through different parts of the code. We need a data definition to describe this set:

; An Environment (Env) is a [List (list Identifier Value)]
; INTERPRETATION: Each pair in the list associates a variable name to
; its value.

This is very similar to other "mapping names to values" situations we’ve seen (Wiki in class, History on the project). We can similarly implement a lookup function, to find the value associated with a name:

; Identifier Env -> Value
(define (lookup name env)
  (cond
    [(empty? env) (error name "not defined!")]
    [(cons? env)
     (if (symbol=? (first (first env)) name)
         (second (first env))
         (lookup name (rest env)))]))

Sample Problem Define several interesting test cases for lookup.

The most interesting situation we might encounter is finding a name appear in our environment multiple times. If that occurs, lookup will return the value that is closest to the front of the list, i.e. the most recently-added binding. This is great! We’ve reproduced a crucial feature of how scope works: we’ve implemented shadowing, where the most recent binding of a name takes precedence over any earlier ones.

Taken together, environments and the lookup function imply that we might be able to extend our eval function with an extra Env parameter, that accumulates the currently available bindings.

; Expr Env -> Value
; Evaluate the given expression to an answer
; ACCUMULATOR: env includes all the variables that are currently in scope
(define (eval expr env)
  (cond
    [(number? expr) expr]
    [(boolean? expr) expr]
    [(symbol? expr) (lookup expr env)]
    [else
     (local ((define first-expr (first expr)))
       (cond
         [(keyword=? first-expr 'if)
          (local ((define cond-val (eval (second expr) env))) ; evaluate the condition
            (cond
              [(true? cond-val) (eval (third expr) env)]   ; if it's true, use the "then" clause
              [(false? cond-val) (eval (fourth expr) env)] ; if it's false, use the "else" clause
              [else (error "If expects a boolean, got " cond-val)]))]
         ...))]))

Exercise 10 Test this new evaluator thoroughly. Mix and match identifiers, expressions, and different environments.

3.3 Implementing our own functions

Suppose that we wrote a slightly larger program:

(define foo (λ(x) x))
(foo 5)

DrRacket would accept this program and produce 5, because the use of x is bound by the argument to the lambda.

A slightly larger program works well, too:

(define foo
(λ(a)
(λ(b) (+ a b))))
(define bar (foo 3))
(define baz (foo 5))
(bar 5)

On the last line, the call to bar will produce 8, even though neither a nor b are still in scope when bar is called. Instead, as we’ve explained it so far, bar gets bound to "(λ (b) (+ a b)), where we substitute 3 for a". This is semantically correct, and is a good intuition, but it’s a horribly inefficient implementation strategy. Perhaps DrRacket keeps track of a once and for all? That doesn’t work, because (baz 5) should produce something distinct from (bar 5), even though both are defined using the same (λ (b) (+ a b)) expression... Instead, we come to the idea that evaluating a function must somehow keep track of the environment that exists at the moment we evaluated it. This concept is called a closure, and it’s a pretty powerful implementation strategy.

; A CustomFunction is a (make-closure [List-of Identifier] Expr Env)
; INTERPRETATION: the argument names and the body of a function,
; packaged with the environment that existed at the moment the closure was created.
(define-struct closure [args body env])

Now, evaluating a function is really easy: we just create a (make-closure ...). The challenge will be in using it for function applications. We know that the rule for evaluating an application expression is to evaluate the function itself to a value (which we now have decided will be a closure), then to evaluate the arguments to values, and finally to apply the function to those values. Let’s separate that last step into a helper function.

; Expr Env -> Value
; Evaluate the given expression to an answer
; ACCUMULATOR: env includes all the variables that are currently in scope
(define (eval expr env)
  (cond
    [(number? expr) expr]
    [(boolean? expr) expr]
    [(symbol? expr) (lookup expr env)]
    [else
     (local ((define first-expr (first expr)))
       (cond
         [(keyword=? first-expr 'if)
          (local ((define cond-val (eval (second expr) env))) ; evaluate the condition
            (cond
              [(true? cond-val) (eval (third expr) env)]   ; if it's true, use the "then" clause
              [(false? cond-val) (eval (fourth expr) env)] ; if it's false, use the "else" clause
              [else (error "If expects a boolean, got " cond-val)]))]
         [(keyword=? first-expr 'fun)
          (make-closure (second expr) ; the args
                        (third expr)  ; the body – not evaluated!
                        env           ; the environment)]

         [else
          (local ((define eval-fun (eval (first expr) env))
                  (define eval-args (map (λ(arg) (eval arg env)) (rest expr))))
           (do-application eval-fun eval-args))]))]))

The application step itself must first check that its supposed function is in fact a closure – after all, (5 6) doesn’t make sense. Once we know we have a closure, we can tear it apart:

; Value [List-of Value] -> Value
; Applies the given function-value to the given argument-values
(define (do-application fun-val args-vals)
  (cond
    [(closure? fun-val)
     (... (closure-args fun-val)
          (closure-body fun-val)
          (closure-env fun-val))]
    [else (error "not a function: " fun-val)]))

Exercise 11 Finish this function!

We have a list of argument names (the closure-args), and a list of argument values (args-vals). We must check that those are the same length, or else we have a mistake in how we called the function. If they are the same length, then we must create a new environment that includes everything from the existing environment, along with binding all the argument names to the argument values...and then just evaluate the body in the new environment.

; Value [List-of Value] -> Value
; Applies the given function-value to the given argument-values
(define (do-application fun-val args-vals)
  (cond
    [(closure? fun-val)
     (cond
       [(= (length (closure-args fun-val)) (length args-vals))
        (local ((define args-bindings
                        ; combine each argument name with the corresponding
                        ; argument value
                        (map (λ(arg val) (list arg val))
                             (closure-args fun-val)
                             args-vals))
                ; the new environment includes the arguments and the old environment
                (define new-env (append args-bindings (closure-env fun-val))))
          (eval (closure-body fun-val) new-env))]
       [else (error "Invalid function call: wrong number of arguments")])]
    [else (error "not a function: " fun-val)]))

This definition handles shadowing properly: since the function arguments come before the old environment, they shadow any existing definitions.

Exercise 12 Confirm that this shadowing works properly.

Exercise 13 Test the new functions thoroughly.

3.4 Including built-in functions

We might want to include existing DrRacket functions into our environment, so that we can use them in our programs. For example, we might want to write

'((fun (n) (add-one (num-square n))) 5)

and have an environment where add-one actually was ISL’s add1, and num-square was actually sqr. (Notice the names are different: the names in ISL have no meaning in our expressions – we define the names by what we put in the environment.)

To do this, we define a new kind of value:

; A BuiltinFunction is a (make-builtin ISLFunction)
; INTERPRETATION: A way to include ISL functions into our environments
(define-struct builtin [fun])

We will also need to enhance our do-application function:

; Value [List-of Value] -> Value
; Applies the given function-value to the given argument-values
(define (do-application fun-val args-vals)
  (cond
    [(closure? fun-val)
     (cond
       [(= (length (closure-args fun-val)) (length args-vals))
        (local ((define args-bindings
                        ; combine each argument name with the corresponding
                        ; argument value
                        (map (λ(arg val) (list arg val))
                             (closure-args fun-val)
                             args-vals))
                ; the new environment includes the arguments and the old environment
                (define new-env (append args-bindings (closure-env fun-val))))
          (eval (closure-body fun-val) new-env))]
       [else (error "Invalid function call: wrong number of arguments")])]
    [(builtin? fun-val)
     (apply (builtin-fun fun-val) args-vals)]
    [else (error "not a function: " fun-val)]))

Here we use the apply function that takes a Racket function and a list of Racket values, and applies the function to the values.

Now, if we build an environment

(define AN-ENV (list (list 'add-one (make-builtin add1))
(list 'num-square (make-builtin sqr))))

we can use ISL builtin functions in our own code. And now we’re done!

4 Summary

Building a small language is relatively easy. We started with the concepts of Values and Expressions, and we chose representations for them that were convenient for us to create and process (using s-expressions). Then we systematically and incrementally built up an evaluator, the function that gives meaning to expressions by evaluating them to values. Our function was based heavily on the template for expressions, and we found ourselves reaching for just one additional accumulator parameter beyond that template-driven design. As we implemented CustomFunctions, we wound up with mutually-recursive data definitions (because now Values contained Exprs inside a closure-body), so our evaluator split into two mutually recursive functions, eval and do-application. Nothing in this program is particularly difficult, but it is sophisticated, and rereading it a few times to make sure you understand it all is definitely worthwhile.

If you found this material surprisingly interesting, definitely take a Programming Languages course, where you’ll learn a lot more about how languages are designed and what additional features we can add to them. If you want to know more about "real languages" like DrRacket and Java and others are implemented, consider taking a Compilers course after PL.

← prev up next →

	General Information
	Advice
	Texts
	Office Hours/ Contact
	Syllabus
	Labs
	Problem Sets
	Old Exams
	The Style
	Interesting Articles
	Peer Review
	Course Contract
	The Design Recipe
	Lecture notes from the end-of-semester

	Defining an interpreter for a small language
	Shrinking a language, part 1
	Defining a perceptron to classify messages