Assignment 8: Fer-de-lance:   Anonymous, first-class functions
1 Language
1.1 Syntax
1.2 Semantics
2 Implementation
2.1 Memory Layout and Function Values
2.2 Computing and Storing Free Variables
2.3 Restoring Saved Variables
3 Examplar
4 Recommended TODO List
5 List of Deliverables
6 Grading Standards
7 Submission
8.12

Assignment 8: Fer-de-lance: Anonymous, first-class functions🔗

Due: Tue 3/19 Wed 03/20 at 11:59pm

git clone

In this compiler you’ll implement Functions Defined by Lambdas.

There is relatively little starter code this time.

1 Language🔗

1.1 Syntax🔗

Fer-de-lance starts with the same semantics as Egg-eater, and makes four significant changes:

In addition, I’ve made a couple of minor syntactic changes and bug-fixes that you can ignore if you wish:

type 'a program = 'a expr

and 'a expr =
    ...
  | ELambda of 'a bind list * 'a expr * 'a (* Arbitrary argument bindings, including tuples *)
  | EApp of 'a expr * 'a expr list
  | ELetRec of 'a binding list * 'a expr * 'a (* syntactically restricted to only BName bindings *)


type 'a cexpr =
    ...
  | CLambda of string list * 'a aexpr (* only simple name arguments *)
  | CApp of 'a immexpr * 'a immexpr list

The concrete syntax of lambda expressions requires parentheses surrounding the whole expression, and around the arguments (spaces are not significant):

let add = (lambda (x, y): x + y) in
add(5, 6)

The concrete syntax of let rec expressions is restricted to only permit binding names to values; it does not permit the fancier tuple bindings or underscore bindings tht we allow elsewhere. But for AST simplicity, we reuse 'a binding.

In the interests of expediency, print has been restored to being a Prim1 built-in expression, rather than a runtime-provided function. You do not have to provide support for arbitrary runtime-provided functions...though you’re encouraged to try. The implementations for print, input and equal have been provided for you, though you’re encouraged to enhance the printing of functions to be more informative.

1.2 Semantics🔗

Functions should behave just as if they followed a substitution-based semantics. This means that when a function is constructed, the program should store any variables that they reference that aren’t part of the argument list, for use when the function is called. This naturally matches the semantics of function values in languages like OCaml and Python.

There are several updates to errors as a result of adding first-class functions:

2 Implementation🔗

2.1 Memory Layout and Function Values🔗

Functions are stored in memory with the following layout:

image

For example, in this program:

let x = 10 in
let y = 12 in
let f = (lambda(z): x + y + z) in
f(5)

The memory layout of the lambda would be:

image

There is one argument (z), so 1 is stored for arity. There are two free variables—x and yso the corresponding values are stored in contiguous addresses (20 to represent 10 and 24 to represent 12). If the function stored three variables instead of two, then padding would not be needed. It is your choice whether to store the arity and the number of free variables as raw numbers or as Fer-de-lance numbers (i.e. as 1 and 2 or as 2 and 4) — but be sure to explain which choice you made and why.

Function values are stored in variables and registers as the address of the first word in the function’s memory, but with an additional 0x5 (0b101 in binary) added to the value to act as a tag.

The value layout is now:

Bit pattern

     

Value type

0xWWWWWWW[www0]

     

Number

0xFFFFFFF[1111]

     

True

0x7FFFFFF[1111]

     

False

0xWWWWWWW[w001]

     

Tuple

0xWWWWWWW[w101]

     

Function

2.2 Computing and Storing Free Variables🔗

An important part of saving function values is figuring out the set of variables that need to be stored, and storing them on the heap. Our compiler needs to generate code to store all of the free variables in a function — all the variables that are used but not defined by an argument or let binding inside the function. So, for example, x is free and y is not in:

(lambda(y): x + y)

In this next expression, z is free, but x and y are not, because x is bound by the let expression.

(lambda(y): let x = 10 in x + y + z)

Note that if these examples were the whole program, well-formedness would signal an error that these variables are unbound. However, these expressions could appear as sub-expressions in other programs, for example:

let x = 10 in
let f = (lambda(y): x + y) in
f(10)

In this program, x is not unbound — it has a binding in the first branch of the let. However, relative to the lambda expression, it is free, since there is no binding for it within the lambda’s arguments or body.

You will write a function free_vars that takes an aexpr and returns the set of free variables (as a list):

let free_vars (ae : 'a aexpr) : (string list) =
  ...

You may need to write one or more helper functions for free_vars, that keep track of an environment. Then free_vars can be used when compiling CLambda to fetch the values from the surrounding environment, and store them on the heap. In the example of heap layout above, the free_vars function should return ["x", "y"], and that information can be used in conjunction with env to perform the necessary mov instructions. Note that you really want to return a set, with no duplicates. You may want to look into using StringSet.t for this (we have defined StringSet for you), or manage duplicates in your string lists manually. (You might also want to analogously redefine envt to use a StringMap, rather than the association-list you currently have.)

This means that the generated code for a lambda will look much like it did before, but with an extra step to move the stored variables:

  jmp after1
temp_closure_1:
  <code for body of closure>
after1:
  mov [R15 + 0], <arity>
  mov [R15 + 8], temp_closure_1
  mov [R15 + 16], <number of closed variables>
  mov [R15 + 24], <var1>
  ... and so on for each variable to store
  mov RAX, R15
  add RAX, 5
  add R15, <heap offset amount>

2.3 Restoring Saved Variables🔗

The description above outlines how to store the free variables of a function. They also need to be restored when the function is called, so that each time the function is called, they can be accessed.

In this assignment we’ll treat the stored variables as if they were a special kind of local variable, and reallocate space for them on the stack at the beginning of each function call. So each function body will have an additional part of the prelude that restores the variables onto the stack, and their uses will be compiled just as local variables are. This lets us re-use much of our infrastructure of stack offsets and the environment.

The outline of work here is:

The second and third points are straightforward applications of ideas we’ve seen already — copying appropriate values from the heap into the stack, and using the environment to make variable references look at the right locations on the stack.

The first point requires a little more design work. If we try to fill in the body of temp_closure_1 above, we immediately run into the issue of where we should find the stored values in memory. We’d like some way to, say, move the address of the function value into RAX so we could start copying values onto the stack:

temp_closure_1:
  push RBP
  mov RBP, RSP
  mov RAX, <function value?>

  push QWORD [RAX + 19]  ;; NOTE: why 19?
  push QWORD [RAX + 27]  ;; NOTE: why 27?
  ... and so on ...

  ... continue with reserving stack frame space ...

But how do we get access to the function value? The list of instructions for temp_closure_1 may be run for many different instantiations of the function, so they can’t all look in the same place.

To solve this, we are going to augment the calling convention in Fer-de-lance to pass along the function value when calling a function. That is, we will push one extra time after pushing all the arguments, and add on the function value itself from the caller. So, for example, in a call like:

f(4, 5)

We would generate code for the caller like:

mov RAX, [RBP-8]   ;; (or wherever the variable f happens to be)
<code to check that RAX is tagged 0b101, and has arity 2>
push 10
push 8
push RAX           ;; BE CAREFUL that this is still the tagged value
mov RAX, [RAX + 3] ;; the address of the code pointer for the function value
call RAX           ;; call the function
add RSP, 24        ;; since we pushed two arguments and the function value, adjust RSP by three slots

(Note: Be careful when implementing tail-calls to functions now, since their arity is now one less than the number of stack slots actually needed...)

Now the function value is available on the stack, accessible just as an argument (e.g. with [RBP+16]), so we can use that in the prelude for copying all the saved variables out of the closure and into their more typical local-variable stack slots:

temp_closure_1:
  push RBP
  mov RBP, RSP
  mov RAX, [RBP+16]

  push QWORD [RAX + 19]  ;; NOTE: why 19?
  push QWORD [RAX + 27]  ;; NOTE: why 27?
  ... and so on ...

  ... continue with reserving stack frame space ...

3 Examplar🔗

We hope to provide an Examplar implementation for Fer-de-lance, much as we did with Cobra. It’s not ready yet; I hope to post it later this week. For this Examplar, focus on expected output versus crashing/erroring, but don’t focus on specific error messages in the output.

Note: in Cobra’s Examplar, occasionally multiple students would produce temp-files whose names collided. This should no longer happen: Examplar now produces temp files named /tmp/examplar_<username>_#####_output. You should no longer collide with each other, and you should also occasionally cleanup any lingering temp temp files with rm /tmp/examplar_<username>_*.

4 Recommended TODO List🔗

5 List of Deliverables🔗

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

DO NOT SUBMIT YOUR .git DIRECTORY! For that matter, don’t submit your output or _build directories. Basically, run make clean and then submit a zip of the remaining directory.

6 Grading Standards🔗

For this assignment, you will be graded on

7 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.