Assignment 10: Garter:   Garbage Collection
1 Garbage-collection
1.1 Setting up GC
1.2 Finding the root set:   walking the stack
1.3 Copying values:   the core of the algorithm (depth-first version)
1.4 Copying values:   breadth-first version
1.5 Running with multiple heap sizes
1.6 Testing
2 List of Deliverables
3 Grading Standards
4 Submission
7.2

Assignment 10: Garter: Garbage Collection

Due: Wed 04/03 at 8:59pm

git clone

In this assignment you’ll implement a GARbage-collected, Type-Enforced(?) Runtime.

1 Garbage-collection

1.1 Setting up GC

Now that you have mutation, garbage collection becomes interesting: you have the potential for cycles and for unreachable data, i.e. garbage to be collected. Now, you need to implement garbage collection. The garbage collector will be run on every single allocation attempt: in other words, instead of only manually bumping ESI every time you want to allocate some memory, you will first call try_gc, which will collect garbage if necessary so that you can do the allocation.1You may optimize this, by using the HEAP_END variable to check whether a garbage-collection is needed and skipping it when possible. But this optimization is not mandatory. It will abort your program if garbage collection could not allocate a new semispace heap, so try_gc should never return NULL.

/*
  Try to reserve the desired number of bytes of memory, and free garbage if
  needed.  Fail (and exit the program) if there is insufficient memory.  Does
  not actually allocate the desired number of bytes of memory; the caller
  will do that.

  Arguments:

    int* alloc_ptr - the current top of the heap (which we store in ESI), where
                     the next allocation should occur, if possible
    int bytes_needed - the number of bytes of memory we want to allocate
                       (including padding)
    int* cur_frame - the base pointer of the topmost stack frame of our code
                     (i.e., EBP)
    int* cur_stack_top - the stack pointer of the topmost stack frame of our
                         code (i.e., ESP)

  Returns:
    The new top of the heap (i.e. the new value of ESI) after garbage collection.
    Does not actually allocate bytes_needed space.

  Side effect:
    Also updates HEAP_END to point to the new end of the heap, if it's changed
*/
int* try_gc(int* alloc_ptr, int bytes_needed, int* cur_frame, int* cur_stack_top);

The provided code includes a dummy implementation of a garbage collector: it does not collect anything, but does check whether there is sufficient space for the desired allocation.

1.2 Finding the root set: walking the stack

You will need to be able to walk the Garter stack frames to find any pointers in those frames. We will assume that the Garter stack is contiguous, and that there are no C frames in there (as might happen if Garter code called C code which called back into Garter code). We therefore need to mark the beginning and end of the Garter stack. To do so, we’ll employ a trick. In main.c, we will declare a global variable int* STACK_BOTTOM. As part of the prelude of our_code_starts_here, once we’ve set EBP correctly, we will mov STACK_BOTTOM, EBP, and thereby inform our runtime about exactly where the Garter stack will end.

We have provided you with some skeleton (and untested) code in gc that will walk the Garter stack frames, from the provided stack-top down to STACK_BOTTOM. Along the way, it will try to copy any heap-allocated values, if needed.

1.3 Copying values: the core of the algorithm (depth-first version)

At its heart, Cheney’s semispace collector works with two values: the address of a value to be copied if needed, and the address where to copy values if needed. Note: the description of Cheney’s algorithm online is a breadth-first traversal of the heap, whereas the description in this section is depth-first. The breadth-first approach is presented next. Implement either of them.

Let’s call the value to be copied garter_val, and its address garter_val_addr. Call the destination address heap_top.

Be careful to keep straight the differences between garter_val_addr, which is where the Garter value is currently stored and which must be updated; garter_val, which may be a primitive value or a tagged heap-allocated thing; and heap_thing, which is the data pointed-to by garter_val and which may be a tuple, a closure or a forwarding pointer.

I recommend running your program with very small heap sizes (20 words or so), and calling smart_print_heap after every call to copy_if_needed, to see how your heap is changing over the course of the algorithm.

1.4 Copying values: breadth-first version

(Note: the signature of copy_if_needed and the implementation of gc will need to change to support breadth-first traversal.)

Instead of calling copy_if_needed recursively, we separate the two steps. First, we walk the stack and copy each value if needed, without recurring. That is, we copy the contents of each heap_thing, update the value at the old garter_val location to be a forwarding pointer, and return the newly allocated location, which we use to update the pointer in garter_val_addr. Once we’ve done this, our roots will all point into the to-space, and the to-space will contain heap-allocated values that all point into the from-space.

Next, we loop over the to-space, one heap-thing at a time, until we run out of values in the to-space. (This is a tricky iteration, since it modifies the number of values in the to-space as the iteration proceeds. Nevertheless it must terminate, when implemented correctly, since every step moves one heap-thing into the to-space, and there are finite number of heap-things to move.) For each value within the heap-thing, copy it to the to-space if needed. If the value points to a forwarding pointer, just update the value to point to the forwarded value directly.

The advantage of the breadth-first traversal is that it is non-recursive and uses no stack space, and you can easily inspect the state of the heap after each iteration of the loop. The disadvantage is that it does not keep linked items near each other in contiguous memory the way the depth-first traversal does.

1.5 Running with multiple heap sizes

We’ve extended the command line for main.c to accept a heap size (as an integer in words) as its first command-line argument. If none is provided, a default heap size of 100K words is allocated.

1.6 Testing

This works mostly as before, except that there are a few additional forms for checking things relative to the garbage collector. As mentioned, the main program is parameterized over an integer argument that allows you to select the size of the heap in terms of (4-byte) words. This is exposed through the testing library as well, so you can write:

tgc "gctest" 10 "(1, 2)" "(1, 2)"

and this will run the test with a heap size of 10.

You can also test for specific errors, for example in the case that there will never be enough memory to fit the required data:

tgcerr "gctest" 10 "(1, (3, (4, 5)))" "Out of memory"

Finally, you can use tvgc to run a tgc test with valgrind, to improve errors on segfaults and check memory.

2 List of Deliverables

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

DO NOT SUBMIT YOUR .git DIRECTORY! For that matter, don’t submit your output or _build directories. Basically, run make clean and then submit a zip of the remaining directory.

3 Grading Standards

For this assignment, you will be graded on

4 Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline.

1You may optimize this, by using the HEAP_END variable to check whether a garbage-collection is needed and skipping it when possible. But this optimization is not mandatory.