Assignment 10: Racer Snake: Register allocation

8.14

Assignment 10: Racer Snake: Register allocation🔗

Due: Wed 4/09 at 8:59pm

git clone

In this compiler you’ll implement register allocation, to replace the naive stack allocation scheme. After all, Register Allocation Contributes to Efficient Runtime...

There is relatively little starter code this time. The main change is a new command-line flag, -alloc <naive|register>, to configure your compiler to use either the former naive allocation strategy or your new register allocation strategy. You can also use this command line flag in your options files for your tests; there is not a test helper-function in test.ml to do this for you, though you can easily add one.

1 Language🔗

Racer starts with the same semantics and runtime behavior as Garter, and simply revisits the naive stack-allocation scheme to assign locations to let-bound variables.

2 Implementation🔗

Read the TODO list below for a recommended order of operations for this assignment. Here, we’ll focus on the new code, rather than the refactorings of old code.

Cache the free variables on every expression.
Define an interference graph for a given program.
Color the graph such that no edge has two endpoints of the same color.
Map the coloring back to actual locations
Implement a dead-variable elimination pass

2.1 Caching free variables🔗

You have already implemented free_vars in Fer-de-lance, to compute the set of free variables of a given expression. In this assignment, you’re going to need to access those free variable sets repeatedly, so rather than recomputing them over and over (which would be exponential in the size of your AST), you should design a function free_vars_cache that takes a tagged aprog and walks the AST, computing the free variables at every expression, and produces a new aprog that stores those free variable sets.

2.2 Defining an interference graph🔗

In order to construct the interference graph for a given program, we compute the live variables for every expression in the program, and use them in constructing the necessary graphs. Conceptually, this is a (pseudocode) pipeline

|> (add_phase ... compute_free_vars) (* produces a StringSet of free vars *)
|> (add_phase ... compute_live_vars_also) (* produces a pair of StringSets of free vars and live vars *)
|> (add_phase locate_bindings register_allocation)

Instead of breaking this down into two separate steps, you might instead want to calculate \(\mathit{LIVE}_{in}\)[\(\cdot\)] sets and build the interference graphs simultaneously. Either approach is fine. (Note that if you choose this two-step approach, you’ll need to slightly modify the signature of naive_stack_alloc to match.)

Consider a let-expression let x = e in b. In order for b to run correctly, each element of \(\mathit{LIVE}_{in}\)[b] must not be placed wherever we place x, and any two variables in that set must conflict with each other. Accordingly, add nodes (if needed) for each variable in \(\mathit{LIVE}_{in}\)[b] and for x, edges between x and every variable in \(\mathit{LIVE}_{in}\)[b], and edges between each pair of variables in \(\mathit{LIVE}_{in}\)[b]. Note that we do not add interference edges between x and any free variables of e, since x isn’t defined until after e executes.

You should deduce the appropriate graph for ASeq expressions from the rules for ALet expressions above.

Consider a let-rec expression let rec x1 = e1, x2 = e2, ... in b. By well-formedness, each of the e# expressions is a lambda. We therefore collect the union of the free variables of all the lambdas and \(\mathit{LIVE}_{in}\)[b], and add interference edges with each of the x# variables. We also add interference edges between each pair of x# variables. (Note that this technically introduces some potentially-unneeded interference edges, but the bookkeeping needed to avoid the consequences of them (in how you fill in the bodies of the closures) is sufficiently painful to make these edges worthwhile.)

When building the interference graph for an aexpr expression, note that every cexpr cannot introduce additional interference edges...except for CIf, which can contain aexprs that themselves contain further let-bindings or lambdas. In particular, do not recur into any lambda expressions; only use their cached free-variables set. I.e. the interference graph for the lambda itself should not be merged with the interference graph for the expression containing that lambda expression.

2.3 Graph coloring🔗

Given an interference graph, we need to color each node such that no edge of the graph has two endpoints with the same color. Our set of “colors” consists of each of the general-purpose registers that are not already in use by our program. This would be R10, R11, R12, R13, R14, and RBX, possibly excluding one or two of those as temporary registers. (You are welcome to try to include the six argument-passing registers RSI, RDI, RCX, RDX, R8, R9, though additional stack-push and pop bookkeeping will be needed to preserve them across function-calls. If you’ve been pushing arguments onto the stack for your calling convention, then you should definitely include these registers in your available set of registers to use!) Additionally, we can create arbitrarily many more “colors” corresponding to stack slots RBP-8, RBP-16, etc. Each of these colors is an arg.

Graph coloring is rather inconveniently an NP-complete problem to solve optimally, though there are many heuristics. We’re going to implement the so-called smallest-last algorithm. We’ll break the process of graph coloring into two steps: first, producing a worklist of nodes that we’ll process in order, and then processing each node of that worklist in turn. Our worklist will be a stack. To determine the order to process the nodes of the graph:

Initialize the worklist to the empty stack.
Find the node in the graph of smallest degree. Push it onto the worklist, and remove that node (and all its edges) from the interference graph.
Repeat this process with the reduced graph, repeatedly removing the node of smallest (remaining) degree and pushing onto the stack, until the graph is empty.

To actually color the nodes:1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.

Create a mapping from node names to colors (i.e. an arg envt), initialized to the empty mapping.
For each node on the stack, examine all of its neighbors in the original interference graph, and collect all their currently-known colors. Find the minimum color that isn’t used yet by that set of colors, and allocate that new minimum color to that node. Obviously, we prefer register “colors” to stack-slot colors!
Repeat until the stack is empty.

This arg envt is the final environment for that lambda.

You will compute an interference graph and color it, for each function in your program (including our_code_starts_here...), and you’ll store these resulting environments in a map from function names to environments — this is the arg envt envt that matches the return type of naive_stack_alloc, so your register allocator function should be a drop-in replacement for that simpler function.

2.4 Dead variable elimination🔗

(Required for CS6410; optional for CS4410)

Consider the program let x = e1 in let y = e2 in x. According to our rules for interference graph construction, x and y interfere. However, y is dead: it is never used after its definition. Therefore we don’t even need to store its value at all! In general, though, we can’t eliminate the computation of e2, since it may have side effects. Instead, we can detect this scenario, where a variable defined by a let is not used within its own body, and transform the ALet into an ASeq.

The analogous situation for let-rec expressions is trickier. Consider a let-rec let rec x1 = e1, x2 = e2, ... in b where some of the bindings are not used in any of the other bindings. Because of well-formedness, the right hand sides of those bindings are lambdas — which do not have any interesting side effects, since they do not execute anything at the moment of the let-binding. Their only side effect is to allocate memory, but that memory can immediately be garbage-collected since the bindings are already dead. Accordingly, find the union of the free variables of the e#, and find the set of unused x# variables (that aren’t in that union)...and just delete those bindings entirely.

Implement these two transformations as a new phase of your compiler, after free_vars_cache and before register-allocation.

3 Recommended TODO List🔗

Move over code from past assignments and/or lecture code to get the basics going.
Implement free_vars_cache, to take a tagged aprog and produce an aprog augmented with the set of free variables at every single expression position. Test as you go.
Duplicate your existing naive_stack_allocation implementation into the body of register_allocation. These two functions should already be wired up to the compilation pipeline in compile.ml.
Modify register_allocation to initially call free_vars_cache, and then refactor its behavior to depend entirely on those cached results: at this point you should have code that works identically to naive_stack_allocation, but whose code is ready for the next step.
Implement interfere for everything other than lambda expressions (i.e. skip CLambda and ALetRec). Test very carefully: minor bugs here can make later allocation decisions very tricky to debug. Write several test programs that avoid let-rec and lambdas, but use variables in interesting ways: work your way back up through all the languages we’ve seen. Confirm that all let-free Adder programs use at most one register. Confirm that all let-free Boa programs use at most as many registers as the rightmost depth of the infix expression. Confirm that any programs using lets use an expected number of registers. Confirm that the branches of if expresions don’t interfere with each other in Cobra programs. Etc...
Implement color_graph, testing as you go.
Revise register_allocation to use interfere and color_graph to make allocation decisions, rather than just assigning stack slots to all variables. You should be able to run existing integration tests by omitting builtins, for now.
Add support for CLambda and ALetRec to interfere. Test very carefully.

4 List of Deliverables🔗

all your modified files
tests in an OUnit test module (test.ml)
any test input programs (input/*/*.racer files)

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

DO NOT SUBMIT YOUR .git DIRECTORY! For that matter, don’t submit your output or _build directories. Basically, run make clean and then submit a zip of the remaining directory.

5 Grading Standards🔗

For this assignment, you will be graded on

Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage

6 Submission🔗

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.

1A fairly simple greedy argument shows that we definitely don’t need more colors than one plus the maximum degree of any node in the graph. The reason the smallest-last ordering tends to work well is that each step shrinks the degress of the remaining vertices, which means that the largest degree ever encountered in this process (the so-called "degeneracy" of a graph) is lower than than the original largest degree. By coloring the vertices in the reverse order of their removal, we ensure that all the vertices so far can be colored in at most the maximum degree encountered so far, which in turn is at most the degeneracy of the original graph.

1	Language
2	Implementation
3	Recommended TODO List
4	List of Deliverables
5	Grading Standards
6	Submission