Assignment 10: Racer Snake: Register allocation
Due: Wed 4/09 at 8:59pm
git clone
In this compiler you’ll implement register allocation, to replace the naive stack allocation scheme. After all, Register Allocation Contributes to Efficient Runtime...
There is relatively little starter code this time. The main change is a new
command-line flag, -alloc <naive|register>, to configure your compiler
to use either the former naive allocation strategy or your new register
allocation strategy. You can also use this command line flag in your options
files for your tests; there is not a test helper-function in test.ml to
do this for you, though you can easily add one.
1 Language
Racer starts with the same semantics and runtime behavior as Garter, and simply revisits the naive stack-allocation scheme to assign locations to let-bound variables.
2 Implementation
Read the TODO list below for a recommended order of operations for this assignment. Here, we’ll focus on the new code, rather than the refactorings of old code.
Cache the free variables on every expression.
Define an interference graph for a given program.
Color the graph such that no edge has two endpoints of the same color.
Map the coloring back to actual locations
Implement a dead-variable elimination pass
2.1 Caching free variables
You have already implemented free_vars in Fer-de-lance, to
compute the set of free variables of a given expression. In this
assignment, you’re going to need to access those free variable sets
repeatedly, so rather than recomputing them over and over (which would
be exponential in the size of your AST), you should design a function
free_vars_cache that takes a tagged aprog and walks the AST,
computing the free variables at every expression, and produces a new
aprog that stores those free variable sets.
2.2 Defining an interference graph
In order to construct the interference graph for a given program, we compute the live variables for every expression in the program, and use them in constructing the necessary graphs. Conceptually, this is a (pseudocode) pipeline
|> (add_phase ... compute_free_vars) (* produces a StringSet of free vars *)
|> (add_phase ... compute_live_vars_also) (* produces a pair of StringSets of free vars and live vars *)
|> (add_phase locate_bindings register_allocation)Instead of breaking this down into two separate steps, you might instead want
to calculate \(\mathit{LIVE}_{in}\)[\(\cdot\)] sets and build the interference graphs
simultaneously. Either approach is fine. (Note that if you choose this
two-step approach, you’ll need to slightly modify the signature of
naive_stack_alloc to match.)
Consider a let-expression let x = e in b. In order for
b to run correctly, each element of \(\mathit{LIVE}_{in}\)[b]
must not be placed wherever we place x, and any two variables in that
set must conflict with each other. Accordingly, add nodes (if needed) for each
variable in \(\mathit{LIVE}_{in}\)[b] and for x, edges between x and
every variable in \(\mathit{LIVE}_{in}\)[b], and edges between each pair of variables
in \(\mathit{LIVE}_{in}\)[b]. Note that we do not add interference
edges between x and any free variables of e, since
x isn’t defined until after e executes.
You should deduce the appropriate graph for ASeq expressions from the
rules for ALet expressions above.
Consider a let-rec expression let rec x1 = e1, x2 = e2, ... in b. By
well-formedness, each of the e# expressions is a lambda. We therefore
collect the union of the free variables of all the lambdas and
\(\mathit{LIVE}_{in}\)[b], and add interference edges with each of the x#
variables. We also add interference edges between each pair of x#
variables. (Note that this technically introduces some
potentially-unneeded interference edges, but the bookkeeping needed to avoid
the consequences of them (in how you fill in the bodies of the closures) is
sufficiently painful to make these edges worthwhile.)
When building the interference graph for an aexpr expression,
note that every cexpr cannot introduce additional interference
edges...except for CIf, which can contain aexprs that
themselves contain further let-bindings or lambdas. In particular,
do not recur into any lambda expressions; only use their cached
free-variables set. I.e. the interference graph for the lambda itself
should not be merged with the interference graph for the expression
containing that lambda expression.
2.3 Graph coloring
Given an interference graph, we need to color each node such that no
edge of the graph has two endpoints with the same color. Our set of
“colors” consists of each of the general-purpose registers that are
not already in use by our program. This would be R10,
R11, R12, R13, R14, and RBX, possibly
excluding one or two of those as temporary registers. (You are
welcome to try to include the six argument-passing registers
RSI, RDI, RCX, RDX, R8, R9, though
additional stack-push and pop bookkeeping will be needed
to preserve them across function-calls. If you’ve been pushing
arguments onto the stack for your calling convention, then you should
definitely include these registers in your available set of registers
to use!) Additionally, we can create arbitrarily many more “colors”
corresponding to stack slots RBP-8, RBP-16, etc. Each of
these colors is an arg.
Graph coloring is rather inconveniently an NP-complete problem to solve optimally, though there are many heuristics. We’re going to implement the so-called smallest-last algorithm. We’ll break the process of graph coloring into two steps: first, producing a worklist of nodes that we’ll process in order, and then processing each node of that worklist in turn. Our worklist will be a stack. To determine the order to process the nodes of the graph:
Initialize the worklist to the empty stack.
Find the node in the graph of smallest degree. Push it onto the worklist, and remove that node (and all its edges) from the interference graph.
Repeat this process with the reduced graph, repeatedly removing the node of smallest (remaining) degree and pushing onto the stack, until the graph is empty.
To actually color the nodes:1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.
Create a mapping from node names to colors (i.e. an
arg envt), initialized to the empty mapping.For each node on the stack, examine all of its neighbors in the original interference graph, and collect all their currently-known colors. Find the minimum color that isn’t used yet by that set of colors, and allocate that new minimum color to that node. Obviously, we prefer register “colors” to stack-slot colors!
Repeat until the stack is empty.
This arg envt is the final environment for that lambda.
You will compute an interference graph and color it, for each function
in your program (including our_code_starts_here...), and you’ll
store these resulting environments in a map from function names to
environments —arg envt envt that matches the
return type of naive_stack_alloc, so your register allocator
function should be a drop-in replacement for that simpler function.
2.4 Dead variable elimination
(Required for CS6410; optional for CS4410)
Consider the program let x = e1 in let y = e2 in x. According
to our rules for interference graph construction, x and
y interfere. However, y is dead: it is never
used after its definition. Therefore we don’t even need to store its
value at all! In general, though, we can’t eliminate the computation
of e2, since it may have side effects. Instead, we can detect
this scenario, where a variable defined by a let is not used within
its own body, and transform the ALet into an ASeq.
The analogous situation for let-rec expressions is trickier. Consider
a let-rec let rec x1 = e1, x2 = e2, ... in b where some of the
bindings are not used in any of the other bindings. Because of
well-formedness, the right hand sides of those bindings are lambdas
—e#, and find the
set of unused x# variables (that aren’t in that union)...and
just delete those bindings entirely.
Implement these two transformations as a new phase of your compiler,
after free_vars_cache and before register-allocation.
3 Recommended TODO List
Move over code from past assignments and/or lecture code to get the basics going.
Implement
free_vars_cache, to take a taggedaprogand produce anaprogaugmented with the set of free variables at every single expression position. Test as you go.Duplicate your existing
naive_stack_allocationimplementation into the body ofregister_allocation. These two functions should already be wired up to the compilation pipeline incompile.ml.Modify
register_allocationto initially callfree_vars_cache, and then refactor its behavior to depend entirely on those cached results: at this point you should have code that works identically tonaive_stack_allocation, but whose code is ready for the next step.Implement
interferefor everything other than lambda expressions (i.e. skipCLambdaandALetRec). Test very carefully: minor bugs here can make later allocation decisions very tricky to debug. Write several test programs that avoid let-rec and lambdas, but use variables in interesting ways: work your way back up through all the languages we’ve seen. Confirm that all let-free Adder programs use at most one register. Confirm that all let-free Boa programs use at most as many registers as the rightmost depth of the infix expression. Confirm that any programs using lets use an expected number of registers. Confirm that the branches of if expresions don’t interfere with each other in Cobra programs. Etc...Implement
color_graph, testing as you go.Revise
register_allocationto useinterfereandcolor_graphto make allocation decisions, rather than just assigning stack slots to all variables. You should be able to run existing integration tests by omitting builtins, for now.Add support for
CLambdaandALetRectointerfere. Test very carefully.
4 List of Deliverables
all your modified files
tests in an OUnit test module (
test.ml)any test input programs (
input/*/*.racerfiles)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
DO NOT SUBMIT YOUR .git DIRECTORY! For that matter, don’t submit
your output or _build directories. Basically, run
make clean and then submit a zip of the remaining directory.
5 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
6 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.
1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.
