Assignment 8: Fer-de-lance: Anonymous, first-class functions
Due: Wed 3/19 at 9:00pm
git clone
In this compiler you’ll implement Functions Defined by Lambdas.
There is relatively little starter code this time.
1 Language
1.1 Syntax
Fer-de-lance starts with the same semantics as Egg-eater, and makes four significant changes:
It adds the notion of a lambda expression for creating anonymous function values.
The function-position in an application expression may now be any expression, not just an identifier.
To write anonymous, (mutually-)recursive functions, we introduce a new
let rec
expression to allow recursive bindings.In the presence of
let rec
, we no longer technically needdef
to define functions; however, we preserve it for backward compatibility. To make mutual recursion explicit, we create a notion of “declaration groups”, written asdef foo(...): ... and def bar(...): ...
These appear in the AST as adecl list list
, where each inner list is the group of mutually-recursive functions.
In addition, I’ve made a couple of minor syntactic changes and bug-fixes that you can ignore if you wish:
You may specify the keyword
shadow
to stipulate that a binding is permitted to shadow any other bindings of the same name. You may update (or implement) shadow-checking rules to accommodate this.The AST for applications includes a “call_type” argument, which starts off as
Unknown
and you can deduce whether functions areSnake
orNative
, if this is of help in organizing. Otherwise you may ignore this parameter.
type 'a program = 'a expr
and 'a expr =
...
| ELambda of 'a bind list * 'a expr * 'a (* Arbitrary argument bindings, including tuples *)
| EApp of 'a expr * 'a expr list
| ELetRec of 'a binding list * 'a expr * 'a (* syntactically restricted to only BName bindings *)
type 'a cexpr =
...
| CLambda of string list * 'a aexpr (* only simple name arguments *)
| CApp of 'a immexpr * 'a immexpr list
The concrete syntax of lambda
expressions requires parentheses
surrounding the whole expression, and around the arguments (spaces are not significant):
let add = (lambda (x, y): x + y) in
add(5, 6)
The concrete syntax of let rec
expressions is restricted to only permit
binding names to values; it does not permit the fancier tuple bindings or
underscore bindings tht we allow elsewhere. But for AST simplicity, we reuse
'a binding
.
In the interests of expediency, print
has been restored to being a
Prim1
built-in expression, rather than a runtime-provided function. You
do not have to provide support for arbitrary runtime-provided
functions...though you’re encouraged to try. The implementations for
print
, input
and equal
have been provided for you, though
you’re encouraged to enhance the printing of functions to be more informative.
1.2 Semantics
Functions should behave just as if they followed a substitution-based semantics. This means that when a function is constructed, the program should store any variables that they reference that aren’t part of the argument list, for use when the function is called. This naturally matches the semantics of function values in languages like OCaml and Python.
There are several updates to errors as a result of adding first-class functions:
There is no longer a well-formedness error for an arity mismatch. It is a runtime error (that should report "arity mismatch", at least).
The value in function position may not be a function (for example, a user may erroneously apply a number), which should raise a runtime error that reports "non-function".
There should still be a (well-formedness) check for duplicate argument names, but there is, naturally, no longer a check for duplicate function declarations. (This will be covered by any shadowing checks you do for repeated bindings of a name.)
There is a new well-formedness error for
let rec
declarations whose right hand sides are not all lambda expressions.You should report
DuplicateId
well-formedness errors if alet
orlet rec
contains multiple bindings of the same name.
2 Implementation
2.1 Memory Layout and Function Values
Functions are stored in memory with the following layout:
For example, in this program:
let x = 10 in
let y = 12 in
let f = (lambda(z): x + y + z) in
f(5)
The memory layout of the lambda
would be:
There is one argument (z
), so 1
is stored for arity. There are two free
variables—x
and y
—20
to represent 10 and 24
to represent 12). If the function
stored three variables instead of two, then padding would not be needed.
It is your choice whether to store the arity and the number of free variables as raw
numbers or as Fer-de-lance numbers (i.e. as 1
and 2
or as 2
and 4
) —
Function values are stored in variables and registers as the address
of the first word in the function’s memory, but with an additional 0x5
(0b101
in binary) added to the value to act as a tag.
The value layout is now:
Bit pattern |
| Value type |
|
| Number |
|
| True |
|
| False |
|
| Tuple |
|
| Function |
2.2 Computing and Storing Free Variables
An important part of saving function values is figuring out the set of
variables that need to be stored, and storing them on the heap. Our compiler
needs to generate code to store all of the free variables in a function —x
is free and y
is not in:
(lambda(y): x + y)
In this next expression, z
is free, but x
and y
are not, because x
is
bound by the let
expression.
(lambda(y): let x = 10 in x + y + z)
Note that if these examples were the whole program, well-formedness would signal an error that these variables are unbound. However, these expressions could appear as sub-expressions in other programs, for example:
let x = 10 in
let f = (lambda(y): x + y) in
f(10)
In this program, x
is not unbound —let
. However, relative to the lambda
expression, it is free, since
there is no binding for it within the lambda
’s arguments or body.
You will write a function free_vars
that takes an aexpr
and returns the set
of free variables (as a list):
let free_vars (ae : 'a aexpr) : (string list) =
...
You may need to write one or more helper functions for free_vars
, that keep
track of an environment. Then free_vars
can be used when compiling CLambda
to fetch the values from the surrounding environment, and store them on the
heap. In the example of heap layout above, the free_vars
function should
return ["x", "y"]
, and that information can be used in conjunction with env
to perform the necessary mov
instructions. Note that you really want to
return a set, with no duplicates. You may want to look into using
StringSet.t
for
this (we have defined StringSet
for you), or manage duplicates in
your string lists manually. (You might also want to analogously
redefine envt
to use a StringMap
, rather than the association-list you
currently have.)
This means that the generated code for a lambda
will look much like it did
before, but with an extra step to move the stored variables:
jmp after1
temp_closure_1:
<code for body of closure>
after1:
mov [R15 + 0], <arity>
mov [R15 + 8], temp_closure_1
mov [R15 + 16], <number of closed variables>
mov [R15 + 24], <var1>
... and so on for each variable to store
mov RAX, R15
add RAX, 5
add R15, <heap offset amount>
2.3 Restoring Saved Variables
The description above outlines how to store the free variables of a function. They also need to be restored when the function is called, so that each time the function is called, they can be accessed.
In this assignment we’ll treat the stored variables as if they were a special kind of local variable, and reallocate space for them on the stack at the beginning of each function call. So each function body will have an additional part of the prelude that restores the variables onto the stack, and their uses will be compiled just as local variables are. This lets us re-use much of our infrastructure of stack offsets and the environment.
The outline of work here is:
At the top of the function, get a reference to the address at which the function’s stored variables are in memory
Add instructions to the prelude of each function that restore the stored variables onto the stack, given this address
Assuming this stack layout, compile the function’s body in an environment that will look up all variables, whether stored, arguments, or let-bound, in the correct location
The second and third points are straightforward applications of ideas we’ve
seen already —
The first point requires a little more design work. If we try to fill in the
body of temp_closure_1
above, we immediately run into the issue of where we
should find the stored values in memory. We’d like some way to, say, move the
address of the function value into RAX
so we could start copying values onto
the stack:
temp_closure_1:
push RBP
mov RBP, RSP
mov RAX, <function value?>
push QWORD [RAX + 19] ;; NOTE: why 19?
push QWORD [RAX + 27] ;; NOTE: why 27?
... and so on ...
... continue with reserving stack frame space ...
But how do we get access to the function value? The list of instructions for
temp_closure_1
may be run for many different instantiations of the function,
so they can’t all look in the same place.
To solve this, we are going to augment the calling convention in Fer-de-lance
to pass along the function value when calling a function. That is, we will
push
one extra time after pushing all the arguments, and add on the function
value itself from the caller. So, for example, in a call like:
f(4, 5)
We would generate code for the caller like:
mov RAX, [RBP-8] ;; (or wherever the variable f happens to be)
<code to check that RAX is tagged 0b101, and has arity 2>
push 10
push 8
push RAX ;; BE CAREFUL that this is still the tagged value
mov RAX, [RAX + 3] ;; the address of the code pointer for the function value
call RAX ;; call the function
add RSP, 24 ;; since we pushed two arguments and the function value, adjust RSP by three slots
(Note: Be careful when implementing tail-calls to functions now, since their arity is now one less than the number of stack slots actually needed...)
Now the function value is available on the stack, accessible just as an
argument (e.g. with [RBP+16]
), so we can use that in the prelude for
copying all the saved variables out of the closure and into their more typical
local-variable stack slots:
temp_closure_1:
push RBP
mov RBP, RSP
mov RAX, [RBP+16]
push QWORD [RAX + 19] ;; NOTE: why 19?
push QWORD [RAX + 27] ;; NOTE: why 27?
... and so on ...
... continue with reserving stack frame space ...
3 Recommended TODO List
Move over code from past assignments and/or lecture code to get the basics going. There is intentionally less support code this time to put less structure on how errors are reported, etc. Note that the initial state of the tests will not run even simple programs until you get things started.
Implement a
desugar
phase that translates declaration groups (i.e.decl list
s) into explict let-rec bindings (i.e. anELetRec
of the associated list of declarations).Implement ANF for
ELambda
. Hint: it’s quite similar to what needed to be done to ANF a declaration.Implement the compilation of
CLambda
andCApp
, ignoring stored variables. You’ll deal with storing and checking the arity and code pointer, and generating and jumping over the instructions for a function. Test as you go.Implement
free_vars
, testing as you go. You can test with the helpertfvs
, which takes a name, an expression string, and a list of identifiers, and checks thatfree_vars
returns the same list of strings (in any order).Implement storing and restoring of variables in the compilation of
CLambda
andCApp
Implement support for
ELetRec
. 4410: You need to support just a single recursive declaration. 6410: You need to support multiple, mutually recursive declarations.Restore support for
print
,equal
, and any other runtime-provided functions, by generating an appropriate closure value.
4 List of Deliverables
all your modified files
tests in an OUnit test module (
test.ml
)any test input programs (
input/*/*.fdl
files)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
DO NOT SUBMIT YOUR .git
DIRECTORY! For that matter, don’t submit
your output
or _build
directories. Basically, run
make clean
and then submit a zip of the remaining directory.
5 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
6 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.