Lecture 14: Automatic Memory Management
Now that our language has heap-allocated values (lambdas and tuples), executing a program will rapidly produce heap allocations, many of which quickly become obsolete. But we currently have no way of freeing those objects on the heap, so we will eventually exhaust our heap space and our program will crash. One non-solution would be to introduce a manual deletion primitive, whereby programmers could explicitly free some heap-allocated value. This has several problems:
Programmers might forget to free their data, which leaves us exactly where we currently are.
Programmers might free too soon, leading to dangling pointers and (probably exploitable!) crashes.
Programmers can only reference the values inside a tuple, but they have no way of accessing the closed-over contents of a closure. If a closure closed over some heap-allocated value, and that closure were the only thing still referring to that value, then freeing the closure would inevitably leak the value. Any manual solution to this would require the programmer to maintain references to everything inside the closure, thereby “opening” it and mostly negating the benefits of closures.
But most annoyingly, programmers never had to do anything explicit in order to allocate memory, so it seems unfair to force them to manually deallocate it. Instead, we need to design some sort of automated mechanism to reclaim memory that is no longer needed.
Unfortunately, the idea of “memory that is no longer needed” is not a computational one. A program like
let x = (1, 2, 3) in if false then x else reallyLongRunningCode() end
never actually uses x again, despite it being in
scope for the duration of the long-running function. Since we can’t
peer into the future to find out if a value won’t eventually be
used, we’ll use a proxy for this that might be computable: we’ll check
if a value cannot ever be accessed again. If a value cannot be
accessed, even in theory, then it definitely cannot be used and
therefore can be reclaimed.
There are several techniques we might use to accomplish this, with varying trade-offs.
1 Reference Counting
At the simplest intuitive level, the idea behind reference counting is simply to imbue every heap-allocated object with a counter tracking how many references to it currently exist. If we can reliably detect when references are created or destroyed, we can keep the counter up to date, and if the counter ever drops to zero we know that the value can never be used again.
To track the count of an object, we need to detect every time a reference to it is taken and every time a reference is lost. We can’t do this purely at the code-generation level; it needs to be detected semantically in the ANF’ed code.
When we create an object, we’ll initialize its count to 1.
Every time the value is used, we increment the counter. This is probably most easily implemented by creating a new prim1, IncRef, that takes a value, increments its reference count, and evaluates to that value.
Every time a variable goes out of scope, we decrement the reference count of whatever it contained. (Note: this interacts quite badly with tail position and tail calls, since it implies that the body of a let is no longer in tail position; some additional work must be done before the let completes!) We don’t necessarily need an additional prim1 for this, since every reference we create gets let-bound by ANFing, and we can change the code generation for ALet accordingly.
When a value’s reference count drops to zero, we can reclaim that value. This entails recursively decrementing the reference counts of any items pointed to by this one (if it’s a tuple or closure containing further references), and those values’ counts may in turn drop to zero as well; a sketch of this bookkeeping follows this list.
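To make that bookkeeping concrete, here is a minimal sketch of the two runtime operations in C. It assumes every heap value carries its count alongside a field count and its fields, and it pretends every field is itself a heap reference; the names obj, inc_ref, and dec_ref are illustrative rather than our runtime’s actual ones, which would also need to untag values and skip non-pointer fields.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct obj {
      uint64_t rc;          /* the reference count                     */
      uint64_t num_fields;  /* how many references this value contains */
      struct obj *fields[]; /* tuple contents / closed-over values     */
    } obj;

    /* IncRef: bump the count and evaluate to the value itself. */
    obj *inc_ref(obj *v) {
      v->rc++;
      return v;
    }

    /* DecRef: drop the count; at zero, recursively release this value's
       contents, then reclaim its storage (via free, assuming a
       malloc-based heap for this sketch). */
    void dec_ref(obj *v) {
      if (--v->rc == 0) {
        for (uint64_t i = 0; i < v->num_fields; i++)
          dec_ref(v->fields[i]);
        free(v);
      }
    }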
Reference counting requires that we associate a counter with every heap-allocated value. We might naively allocate a map for this in our runtime, whose keys are the addresses of our heap-allocated values, but this has two pragmatic problems. First, it implies that every time we access a value on the heap we have to also access some map in some wholly-unrelated region of memory, which won’t be great for performance. Second, we have to have enough space to store that map! The solution to both of these is to extend every heap allocation with a header word, an extra slot that stores the count. That way we amortize the allocation cost (if there’s room for the value at all, it’ll include room for its counter) and store the count adjacent to the value (which improves memory locality).
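In code, the header word simply means allocating one extra slot and initializing it before handing the value back. A minimal sketch, assuming a bump allocator over a flat array (heap, heap_ptr, and HEAP_WORDS are made-up names for illustration):

    #include <stdint.h>
    #include <stddef.h>

    #define HEAP_WORDS (1 << 20)
    uint64_t heap[HEAP_WORDS];
    size_t heap_ptr = 0;  /* index of the next free word */

    /* Allocate n payload words, plus one header word for the count.
       (Out-of-memory handling omitted for brevity.) */
    uint64_t *alloc(size_t n_words) {
      uint64_t *p = &heap[heap_ptr];
      heap_ptr += n_words + 1;  /* the +1 is the amortized counter cost    */
      p[0] = 1;                 /* a fresh value starts with one reference */
      return p + 1;             /* the value proper lives past the header  */
    }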
Because reference counting can detect the moment a value’s reference count drops to zero, it can eagerly reclaim values as soon as they become unreachable. This keeps memory usage as low as it can be for any given program.
Reference counting has one major drawback, though, which precludes it from reclaiming all unreachable values: any cyclic data will maintain a permanently positive reference count, because each value in the cycle contains a reference to the next one. As a result, unreachable cycles of data can accumulate in a reference-counted program and leak memory. To counteract this, languages have introduced the concept of weak references, which are references that deliberately don’t increment the reference count of their target. This means that weak references might dangle and refer to no-longer-allocated memory, which makes them somewhat more fragile to use; additionally, developers must manually sprinkle them through their code to break up any cycles, to ensure those cycles can eventually be collected. (Ironically, weak references are frequently implemented using something like the pre-allocated map idea that we discarded earlier!)
Since programs frequently use a lot of references and a lot of variables with short scopes, reference counting imposes a large churn of count updates that frequently just cancel out. Many ideas have been put forward in the research literature on ways to delay or coalesce batches of reference-count updates, in the hope that they mostly cancel and can be eliminated altogether; these ideas work pretty well in practice. (There is also one tricky corner case to consider: it is conceivable that a long-running program might somehow saturate the reference count for a value and even cause the count to overflow, at which point the program would crash horribly. While this is unlikely, production systems do have to consider it, since the safety and correctness of our programs depend upon the correctness of the garbage collector.)
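To see how the churn cancels, consider a let-bound alias: binding the variable performs an increment, and the variable leaving scope performs the matching decrement. In terms of the C sketch above (use is an arbitrary stand-in for the let-body):

    void use(obj *v);  /* stand-in for whatever the body does */

    void alias_demo(obj *x) {
      /* naive compilation of: let y = x in use(y) end */
      obj *y = inc_ref(x);  /* binding y takes a reference             */
      use(y);               /* the body runs; x is still live here too */
      dec_ref(y);           /* y goes out of scope                     */
      /* The increment and decrement exactly cancel: a coalescing
         scheme would recognize the pair and emit neither. */
    }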
2 Mark & Sweep
In the preceding section, we tried to keep an accurate count of exactly how many references a value might have. But...we don’t really care about that exact value: if it’s zero, the value can be reclaimed, and if it’s positive, it can’t. That’s merely one bit of information, and maybe there’s a cheaper way to calculate that?
This leads to the idea of mark and sweep collection. By analogy, mark-and-sweep operates the way housekeeping does at a hotel: when guests check out, they pack up everything they care about and take it with them; housekeeping can assume that everything that was left behind is garbage, and can sweep it all up.
To implement this idea, we’ll need to periodically pause our program to run the mark-and-sweep process. (Continuing with the analogy: housekeeping can’t clean up a room simultaneously with the next guests arriving and unpacking.) But what should we mark? If the essence of reference-counting was to indicate that a value could be discarded, here we want to ensure that a value should be kept. To do this we need to examine all the places where we can find our values: on the stack, and in registers. These are known as roots, and collectively form the root set of our program: everything reachable from those roots, no matter how indirectly, is “live” and worth keeping, and anything else is “dead” and worth reclaiming.
In order to mark our values, we’ll need to find one bit that we can steal away from our representations. This won’t be a tag-bit like we use to distinguish different types of values, but rather needs to be a bit in the heap-allocated data: after all, we care about the value itself, not each individual alias to that value! To do this, we could easily steal the sign-bit from our tuple-length or closure-size fields: we’re pretty unlikely to ever have a single tuple or closure with more than \(2^{63}\) items in it. Our approach will therefore be:
Starting with each root (in registers or on the stack), mark the value by following the (untagged) pointer and setting its leading bit, to indicate it’s still alive. Recursively follow the value’s contents and mark anything they reference, short-circuiting if we reach a value that’s already marked. (A sketch of this traversal appears after these steps.)
Once the marking phase has finished, we can sweep up memory somehow. This will entail sweeping through all of our heap, rather than focusing on our root set.
Once the sweep phase is complete, we can recur through our roots again and clear the mark bits, to reset for the next time we need to run garbage collection.
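Here is a minimal sketch of the marking traversal in C, under the assumptions above: each heap value’s first word is its length/size field, whose sign bit we steal as the mark bit, and a hypothetical helper child_pointer untags field i and returns it if it is a heap reference (or NULL otherwise).

    #include <stdint.h>
    #include <stddef.h>

    #define MARK_BIT ((uint64_t)1 << 63)

    /* Hypothetical: untag field i of v; returns a heap pointer or NULL. */
    uint64_t *child_pointer(uint64_t *v, uint64_t i);

    void mark(uint64_t *v) {
      if (v[0] & MARK_BIT) return;    /* already marked: short-circuit    */
      v[0] |= MARK_BIT;               /* set the leading bit: still alive */
      uint64_t n = v[0] & ~MARK_BIT;  /* the true length, bit stripped    */
      for (uint64_t i = 0; i < n; i++) {
        uint64_t *child = child_pointer(v, i);
        if (child) mark(child);
      }
    }

The collector calls mark once per root it finds in registers or stack slots; the sweep and the final bit-clearing pass are separate traversals.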
Notice that this mark-and-sweep process completely avoids the leakage of unreachable cyclic data: since values start off with their mark bit unset, and since our traversal cannot access unreachable data (by definition!), we’ll never mark the values in an unreachable cycle, and so that cycle will appear dead and be reclaimed.
To sweep up memory, we need to enhance our current notion of “where
the next allocation goes”. Currently, we simply store the next
available address in R15
, and bump that pointer every time we
allocate a new value. However, by freeing up some subset of our heap
we now have “holes”, which we ought to fill with new values. (This
observation applies to reference-counting as well.) To accomplish
this, we can exploit the fact that our minimum heap allocation size is
two words: for every object that we reclaim, we can turn it into a
linked-list cell whose first word is its size, whose second word is a
pointer to the next linked-list cell, and whose remaining words are
“junk”. This is known as a free list, and it tracks the free
space in our memory. We could even treat our initial R15
as a
pointer to the head of a 1-element linked list, whose next-pointer is
null and whose size is however big our heap is. When we allocate a
value, we need to find some cell of the free list whose size is large
enough to store our new allocation, and then we carve off however much
of that cell we need, return it as the newly allocated value, and
update the bookkeeping in the free-list to remove that allocated
space. As memory gets used over time, though, we’ll likely develop
fragmentation: a free-list containing lots of small cells, none
of which are individually large enough to store our desired values,
but which collectively have plenty of space. Accordingly, we can run
another mark-and-sweep pass, which will consolidate adjacent free
cells of the free list, to defragment memory and hopefully find a
contiguous chunk of sufficient space.
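Here is a minimal sketch of first-fit allocation from such a free list, in C with made-up names; note how the two-word minimum allocation size guarantees that every hole has room for its size word and next-pointer.

    #include <stdint.h>
    #include <stddef.h>

    /* Each hole doubles as a list cell: word 0 is its size in words,
       word 1 points to the next hole. */
    typedef struct free_cell {
      uint64_t size;
      struct free_cell *next;
    } free_cell;

    free_cell *free_list;  /* initially one cell spanning the whole heap */

    uint64_t *free_list_alloc(uint64_t words) {
      for (free_cell **prev = &free_list; *prev; prev = &(*prev)->next) {
        free_cell *cell = *prev;
        if (cell->size < words) continue;  /* too small: keep looking */
        if (cell->size - words >= 2) {
          /* Carve the allocation off the front; the remainder is still
             big enough to be a cell of its own. */
          free_cell *rest = (free_cell *)((uint64_t *)cell + words);
          rest->size = cell->size - words;
          rest->next = cell->next;
          *prev = rest;
        } else {
          /* A remainder under two words is too small to track, so hand
             over the entire cell instead. */
          *prev = cell->next;
        }
        return (uint64_t *)cell;
      }
      return NULL;  /* no hole fits: time to collect and defragment */
    }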
Mark-and-sweep has as its primary disadvantage that it requires pausing the program while the marking phase runs. This can lead to stuttering and unpredictable program performance, which is often undesirable. Refinements to the mark-and-sweep approach use a tri-color marking algorithm to distinguish “values that are currently thought to be dead” (known as “white” values), “live values that don’t point to those supposedly-dead values” (known as “black” values), and “live values that might point to potentially-dead values” (known as “gray” values). The gray values form a frontier between the live (black) values and the dead (white) values, and this frontier can be kept updated concurrently with program execution: this helps eliminate the unwanted pauses while garbage collection runs.
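A minimal sketch of that tri-color bookkeeping, phrased as a worklist of gray values (the names are made up, and the genuinely hard part, keeping this frontier consistent while the program concurrently mutates objects, is not shown):

    #include <stdint.h>
    #include <stddef.h>

    typedef enum { WHITE, GRAY, BLACK } color;  /* WHITE = presumed dead */

    typedef struct tobj {
      color c;
      uint64_t num_fields;
      struct tobj *fields[];
    } tobj;

    tobj *worklist[1024];  /* the gray frontier (fixed size for brevity) */
    size_t gray_count = 0;

    /* Shade a value gray: it is live, but its children are unexamined. */
    void shade(tobj *v) {
      if (v != NULL && v->c == WHITE) {
        v->c = GRAY;
        worklist[gray_count++] = v;
      }
    }

    /* Blacken gray values one at a time, shading their children; once
       the frontier empties, everything still WHITE is reclaimable. */
    void drain(void) {
      while (gray_count > 0) {
        tobj *v = worklist[--gray_count];
        for (uint64_t i = 0; i < v->num_fields; i++)
          shade(v->fields[i]);
        v->c = BLACK;
      }
    }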
3 Copying Collection
Managing a free list and keeping track of memory fragmentation seems frustratingly indirect: once we’ve established which data is live, it seems wasteful to have to scan through all of memory to build a free list! Perhaps we can eliminate the free list as well, such that every time we collect memory we automatically compact our used memory so that we never have fragmentation to begin with. To accomplish this, we have to abandon an invariant that we never even stated: that once a value is allocated, it stays where it was put until it eventually is deallocated. If we can somehow move values around in memory, as part of the marking phase, then we can avoid the fragmentation issue. This leads to a third notion of garbage collection known as a copying collector.
More to come...
4 Hybrid approaches
There is a beautiful duality to reference counting versus mark-and-sweep or copying collection. Reference counting keeps track of dead data, while the others track live data. Reference counting walks through values as they become dead and frees them; copying collection walks through values while they’re alive and preserves them. Reference counting is slow when huge amounts of data are suddenly freed; copying collection is slow when huge amounts of data are preserved.
Accordingly, it might make sense to try to build a combined approach that uses the best attributes of both styles of garbage collection. Essentially we can partition memory into several smaller sub-heaps, and run different garbage collection algorithms on different subheaps at different rates. Since many values are created and live for only a short while before becoming garbage (they “live fast and die young”), we can build a generational garbage collector, where values are created in a small “nursery” mini-heap, which we can collect frequently since it’s small. Any values that outlive a GC pass over the nursery are moved into a larger “mature” mini-heap, which we can collect less frequently. We do have to ensure that no values in the mature heap hold references into the nursery, so that we can garbage-collect the nursery frequently and without having to look at any other memory. These hybrid approaches tend to work very well in practice, and most production systems (to my knowledge) indeed use some sort of hybrid approach.
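To give a flavor of what maintaining that separation involves: one standard mechanism (not developed in these notes) is a write barrier, a check on every pointer store. Rather than forbidding mature-to-nursery references outright, the variant sketched below records them in a side table that serves as extra roots for nursery collections. Every name here, and the fixed-size table, is purely illustrative.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    extern uint64_t *nursery_start, *nursery_end;  /* bounds of each sub-heap */
    extern uint64_t *mature_start,  *mature_end;

    uint64_t **remembered[4096];  /* mature slots pointing into the nursery */
    size_t remembered_count = 0;

    static bool in_nursery(uint64_t *p) { return p >= nursery_start && p < nursery_end; }
    static bool in_mature(uint64_t *p)  { return p >= mature_start  && p < mature_end;  }

    /* Every store of a heap pointer into a heap slot goes through here. */
    void write_field(uint64_t **slot, uint64_t *value) {
      if (in_mature((uint64_t *)slot) && in_nursery(value))
        remembered[remembered_count++] = slot;  /* extra root for nursery GC */
      *slot = value;
    }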