Assignment 10: Racer Snake: Register allocation
Due: Wed 4/20 at 8:59pm
git clone
In this compiler you’ll implement register allocation, to replace the naive stack allocation scheme. After all, Register Allocation Contributes to Efficient Runtime...
There is relatively little starter code this time.
1 Language
Racer starts with the same semantics and runtime behavior as Garter, and simply revisits the naive stack-allocation scheme to assign locations to let-bound variables.
2 Implementation
Read the TODO list below for a recommended order of operations for this assignment. Here, we’ll focus on the new code, rather than the refactorings of old code.
Cache the free variables on every expression.
Define an interference graph for a given program.
Color the graph such that no edge has two endpoints of the same color.
Map the coloring back to actual locations
Implement a dead-variable elimination pass
2.1 Caching free variables
You have already implemented free_vars
in Fer-de-lance, to
compute the set of free variables of a given expression. In this
assignment, you’re going to need to access those free variable sets
repeatedly, so rather than recomputing them over and over (which would
be exponential in the size of your AST), you should design a function
free_vars_cache
that takes a tagged aprog
and walks the AST,
computing the free variables at every expression, and produces a new
aprog
that stores those free variable sets.
2.2 Defining an interference graph
Consider a let-expression let x = e in b
. In order for
b
to run correctly, all of its free variables must not be
placed wherever we place x
. Accordingly, we define
inteference edges between x
and every free variable within
b
. Implicitly, this means we need to add nodes to the
interference graph for each variable mentioned in this whole
expression. Note that we do not add interference edges between
x
and any free variables of e
, since x
isn’t
defined until after e
executes.
Consider a lambda-expression lam(x, y, ...): b end
. We build
an interference graph for b
. To that graph, add interference
edges for every pair of free variables of the entire lambda (i.e., the
free variables of b
except for the arguments x
,
y
, etc.), since those are precisely the values that are packed
into the closure and that need to be unpacked to distinct locations
when the closure gets called.
Consider a let-rec expression
let rec x1 = e1, x2 = e2, ... in b
.
By well-formedness, each of the e#
expressions is a
lambda. We therefore collect the union of the free variables of all
the lambdas, and add interference edges with each of the x#
variables. We also add interference edges between each pair of
x#
variables. (Note that this technically introduces
some potentially-unneeded interference edges, but the bookkeeping
needed to avoid the consequences of them (in how you fill in the
bodies of the closures) is sufficiently painful to make these edges
worthwhile.)
When building the interference graph for an aexpr
expression,
note that every cexpr
cannot introduce additional interference
edges...except for CIf
, which can contain aexpr
s that
themselves contain further let-bindings or lambdas. In particular,
do not recur into any lambda expressions; only use their cached
free-variables set. I.e. the interference graph for the lambda itself
should not be merged with the interference graph for the expression
containing that lambda expression.
2.3 Graph coloring
Given an interference graph, we need to color each node such that no
edge of the graph has two endpoints with the same color. Our set of
“colors” consists of each of the general-purpose registers that are
not already in use by our program. This would be R10
,
R11
, R12
, R13
, R14
, and RBX
, possibly
excluding one or two of those as temporary registers. (You are
welcome to try to include the six argument-passing registers
RSI
, RDI
, RCX
, RDX
, R8
, R9
, though
additional stack-push
and pop
bookkeeping will be needed
to preserve them across function-calls. If you’ve been pushing
arguments onto the stack for your calling convention, then you should
definitely include these registers in your available set of registers
to use!) Additionally, we can create arbitrarily many more “colors”
corresponding to stack slots RBP-8
, RBP-16
, etc. Each of
these colors is an arg
.
Graph coloring is rather inconveniently an NP-complete problem to solve optimally, though there are many heuristics. We’re going to implement the so-called smallest-last algorithm. We’ll break the process of graph coloring into two steps: first, producing a worklist of nodes that we’ll process in order, and then processing each node of that worklist in turn. Our worklist will be a stack. To determine the order to process the nodes of the graph:
Initialize the worklist to the empty stack.
Find the node in the graph of smallest degree. Push it onto the worklist, and remove that node (and all its edges) from the interference graph.
Repeat this process with the reduced graph, repeatedly removing the node of highest (remaining) degree and pushing onto the stack, until the graph is empty.
To actually color the nodes:1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.
Create a mapping from node names to colors (i.e. an
arg envt
), initialized to the empty mapping.For each node on the stack, examine all of its neighbors in the original interference graph, and collect all their currently-known colors. Find the minimum color that isn’t used yet by that set of colors, and allocate that new minimum color to that node. Obviously, we prefer register “colors” to stack-slot colors!
Repeat until the stack is empty.
This arg envt
is the final environment for that lambda.
You will compute an interference graph and color it, for each function
in your program (including our_code_starts_here
...), and you’ll
store these resulting environments in a map from function names to
environments —arg envt envt
that matches the
return type of naive_stack_alloc
, so your register allocator
function should be a drop-in replacement for that simpler function.
2.4 Dead variable elimination
(Required for CS6410; optional for CS4410)
Consider the program let x = e1 in let y = e2 in x
. According
to our rules for interference graph construction, x
and
y
interfere. However, y
is dead: it is never
used after its definition. Therefore we don’t even need to store its
value at all! In general, though, we can’t eliminate the computation
of e2
, since it may have side effects. Instead, we can detect
this scenario, where a variable defined by a let is not used within
its own body, and transform the ALet
into an ASeq
.
The analogous situation for let-rec expressions is trickier. Consider
a let-rec let rec x1 = e1, x2 = e2, ... in b
where some of the
bindings are not used in any of the other bindings. Because of
well-formedness, the right hand sides of those bindings are lambdas
—e#
, and find the
set of unused x#
variables (that aren’t in that union)...and
just delete those bindings entirely.
Implement these two transformations as a new phase of your compiler,
after free_vars_cache
and before register-allocation.
3 Recommended TODO List
Move over code from past assignments and/or lecture code to get the basics going.
Implement
free_vars_cache
, to take a taggedaprog
and produce anaprog
augmented with the set of free variables at every single expression position. Test as you go.Duplicate your existing
naive_stack_allocation
implementation into the body ofregister_allocation
. These two functions should already be wired up to the compilation pipeline incompile.ml
.Modify
register_allocation
to initially callfree_vars_cache
, and then refactor its behavior to depend entirely on those cached results: at this point you should have code that works identically tonaive_stack_allocation
, but whose code is ready for the next step.Implement
interfere
for everything other than lambda expressions (i.e. skipCLambda
andALetRec
). Test very carefully: minor bugs here can make later allocation decisions very tricky to debug. Write several test programs that avoid let-rec and lambdas, but use variables in interesting ways: work your way back up through all the languages we’ve seen. Confirm that all let-free Adder programs use at most one register. Confirm that all let-free Boa programs use at most as many registers as the rightmost depth of the infix expression. Confirm that any programs using lets use an expected number of registers. Confirm that the branches of if expresions don’t interfere with each other in Cobra programs. Etc...Implement
color_graph
, testing as you go.Revise
register_allocation
to useinterfere
andcolor_graph
to make allocation decisions, rather than just assigning stack slots to all variables. You should be able to run existing integration tests by omitting builtins, for now.Add support for
CLambda
andALetRec
tointerfere
. Test very carefully.
4 List of Deliverables
all your modified files
tests in an OUnit test module (
test.ml
)any test input programs (
input/*/*.racer
files)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
DO NOT SUBMIT YOUR .git
DIRECTORY! For that matter, don’t submit
your output
or _build
directories. Basically, run
make clean
and then submit a zip of the remaining directory.
5 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
6 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline.
1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.