8.16

Homework 8🔗

home work!

Programming Language ISL

Due Date: Wednesday March 19, 9pm

Purpose Part 2 of Project, plus more practice with local and list abstractions

Expectations

----------------------------------------------------------------------------- Errata
  • added that find-chunk should return -1 if nothing found

  • clarified that initialize-hbs should build as many levels as possible, up to the level where there are only two chunks

  • fixed typo: "(exp 2 (sub1 (length hbs))) should be (expt 2 (sub1 (length hbs)))

  • sample code was corrected

-----------------------------------------------------------------------------

Introduction

For this assignment, you will be implementing part 2 of the 3-part project for the course. This document is a summary synopsis of what you must implement for this project. It is not a deep explanation of the motivations and design; that will have been covered in class.

For HW6, you developed a set of useful functions that allowed you to conveniently manipulate a list of Booleans as though it were an array of values, allowing you to examine and modify individual elements in the list by index, e.g., test to see if the 8th value is #true, or setting the 15th value to a #false.

In this homework, you will be using those functions to implement what is commonly referred to as a "bitmap". A bitmap is a list of Booleans where each element corresponds to and represents a specific entry in a sequence of things. In our case, the "things" are blocks of storage: the map corresponds to the "in-use" state of each of the sequence of storage locations in a hard drive in a computer (hereafter referred as simply a "drive"). Note that instead of using actual bits (a single binary on/off value inside a computer), for our map, we are using the closest data type we have in Racket: a list of Booleans. Bits can be 0 or 1, while Booleans are #false or #true, so it’s a good fit. (In actuality, we will be building a series of bitmaps arranged as a hierarchical list; so, a [List-of [List-of Boolean]]–more details follow below.)

Getting Back to our Bitmap

The storage drive inside a computer consists of a sequence of numbered "blocks", with each block able to store a fixed number of bytes; the specific size of our blocks doesn’t matter. A user will request storage in units of one or more blocks. At any given time, some of the blocks on our drive will be in use (allocated to some file), or unused, meaning it is not currently part of any file, and free to be given out sometime in the future. We need a map of some kind to keep track of which blocks are currently in use and which blocks are free: bitmaps to the rescue! We will be using a bitmap to keep track of the current state of each of our blocks, by setting the corresponding bit in our map to #true if the block is free, or #false if the block is allocated (i.e., in use by a file). Note that we are not managing the blocks in files! Just the free space.

Your system will provide the user with functions that allow them to request a number of sequentially-located blocks: what we refer to as a "chunk". The user does not care about where on the drive any particular chunks are allocated from: just that the blocks in a given chunk are contiguous and sequential. If a user makes two consecutive requests, the second chunk could be located earlier in the drive than the first chunk.

Your system will also provide the user with functions to give blocks back, freeing them up, i.e., deallocate a chunk. This would be called if a user decided to delete a file, for example.

Lastly, you will also design functions to allow them to get information about the current state of the storage, such as how many blocks are available (i.e., currently marked free).

All of these functions will take as one of their key parameters a data structure representing a hierarchical bitmap, which is actually a list of bitmaps– again, details on this later. The functions will operate on the hierarchical bitmap, possibly modifying it and returning an updated map.

The single most important property of your system is that it must allocate and subsequently free up spaces correctly: in other words, it must only allocate blocks that are currently free, which it must then mark as now allocated, and when later freed, it must immediately make them available again for future allocation.

A part of this design was already done in HW6: you already implemented the ability to represent a bitmap as a list of Booleans, and to manage the bitmap by using find-first-true to search for available blocks, set-to-false to allocate a block by changing the corresponding bit from free to used, (#true to #false), and set-to-true to free a block back up by changing the bit from used to free. In this Part 2, we will be extending that functionality to allow the user to ask for a chunk of blocks at a time.

From Blocks to Chunks

In this part of the project, we will extend the functionality to honor requests for multi-block chunks of space, where all of the blocks in that chunk are contiguous and sequential. Acquiring space in chunks allows the user to significantly speed up reading from and writing to the drive. So, if the user requests a 4-block chunk, we might allocate blocks 4, 5, 6, 7, but never 4, 5, 6, 8 (not contiguous), nor 8, 7, 6, 5 (not sequential).

(A digression for terminology: for the sake of brevity, we will hereafter refer to a chunk of n contiguous blocks as a "n-chunk"; for example, a chunk of 4 contiguous blocks will be referred to as a "4-chunk".)

One way to implement allocating chunks would be to design a function that sequentially scans the bitmap, looking for a continuous run of the requested number of free blocks; i.e., an uninterrupted sequence of #true’s in the list. However, you will not do this! For a number of important reasons, this is not a good way to implement chunk allocation. We will not go into details here, just take our word for it.

Instead, you will implement what we will call a hierarchical bitmap set–HBS for short. Instead of a single bitmap–a [List-of Boolean] that keeps track of each block separately– you will implement a [List-of [List-of Boolean]]; in other words, a series of interrelated bitmaps. The last, or "bottom", bitmap in the list, will be a map of 1-chunks, i.e., individual blocks, where each #true will indicate a single free block. We will refer to this as the "Level-0 bitmap" in the HBS.

Then, at each higher level (i.e., earlier in the list-of-lists) a bit will represent a chunk that is twice the size of the chunks in the level below it. Therefore, for the next-to-last bitmap in our list, which is the Level-1 bitmap, each bit represents a chunk consisting of a pair of blocks. Bit 0 will represent blocks 0 and 1, bit 1 will represent blocks 2 and 3, and bit i will represent blocks 2i and 2i+1.

At the level above that, i.e., the 3rd bitmap from the end, will be the Level-2 bitmap, where every bit represents a 4-chunk, with bit 0 representing blocks {0,1,2,3}, bit 1 representing {4,5,6,7}, and so on. Notice the pattern: a Level-n bitmap will represent chunks of size 2^n (2 to the nth power).

This implies that the top level bitmap, i.e., the very first bitmap in our list of bitmaps, will represent the biggest chunk size in our system, and the last bitmap will represent the smallest chunk size, i.e., individual blocks. We order it this way–largest chunk size bitmap first, down to single blocks last–because that design allows us to use classic structural recursion, where (rest hbs) will always itself be a valid HBS. The top level, i.e., the first bitmap in the list (whether the actual top of the entire HBS, or just the current top in our recursive call), will have a chunk size you can easily compute from the number of bitmaps in our list: it will be (expt 2 (sub1 (length hbs))).

The following is an illustration of what the layers of bitmaps would look like. Note that we stretched out at the various levels to show what drive blocks each bit at each level would represent; in reality, the upper bitmaps would each be only half the length of the bitmap below it. (In the figure, we represented #true values as 1’s, and #false as 0’s to make it visually clearer.)

; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; |               0               |               1               |
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; |       1       |       0       |       0       |       0       |
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; |   0   |   0   |   1   |   0   |   0   |   0   |   0   |   0   |
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
; | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
; +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(Ignore the actual values inside the bitmap example above–it was just meant to illustrate the relative alignment of the bits at various levels. We will discuss the possible layout of actual values in the next section.)

NOTE: your functions will be called with an existing HBS. That means that the number of levels is pre-defined. Also, every request from the user will be for a chunk of some power-of-2 size; they might ask for a 1-chunk, or a 4-chunk, but never a 3-chunk or 6-chunk, for example. Furthermore, we guarantee that every bitmap in the HBS, including the top one, will have length of at least 2. We also guarantee that the total number of blocks in your drive (i.e., the length of the bottom bitmap) will be a power of 2. These guarantees will make your design much simpler.

The hierarchical design, based on managing an assorment of power-of-2-sized chunks, makes allocating and freeing chunks very easy: you just have to go down to the appropriate level and look for a free (i.e., #true) bit. You will never have to search for a range of bits: always just a single bit. However, it’s not that simple and boring :-). When you allocate a chunk at a given level, depending on your design, you might have to scan some levels above in case what you allocated also formed part of an available chunk at higher levels. For example, the 2-chunk {2, 3} would also have been part of the 4-chunk {0, 1, 2, 3}, which could also have been marked free.

IMPORTANT: there is an absolutely essential requirement for the HBS allocation strategy: you must allocate chunks in such a way as to avoid unnecessarily disrupting larger free chunks. For example, assume you have an 8-block drive, numbering from 0, and blocks 4 and 5 are currently allocated (marked #false). If the user requests a 2-chunk, the requirement is that you must allocate the 2-chunk {6, 7}, because if you instead allocated either {0, 1} or {2, 3}, it would unnecessarily disrupt the 4-chunk {0, 1, 2, 3}, meaning you wouldn’t subsequently be able to honor a request for a 4-chunk.

Note that this design does not allow allocation of what we will call "misaligned chunks": a contiguous set of available blocks that is big enough, but not aligned on a power-of-2 boundary. For example, If only blocks 2, 3, 4, 5 were free, we could in theory fulfill a request for a 4-chunk, because the range {2, 3, 4, 5} is contiguous and sequential. However, in our design, because that particular span of 4 blocks partially overlaps two different 4-chunks–{0,1,2,3} and {4,5,6,7}–it would not show up as a free chunk in the Level-2 bitmap, and would therefore never be allocated in response to a request for a 4-chunk. That is one of the accepted deficiencies of this hierarchical design, and considered an acceptable tradeoff for the other benefits of this design. Note that not only would you not have to find and return the span {2, 3, 4, 5}, but it would be technically invalid to do so; you should just return an error to the user.

Normal Form

[NOTE: In class, we discussed that there were multiple possible designs for filling in the bits in our bitmaps: we discussed two in particular, one where each free block was represented as being part of a unique chunk, and at the highest possible level; and one where at every level, every available chunk was explicitly represented as a #true. In the former, a completely free drive would be represented with #true’s in every bit in the top-level bitmap, and #false’s at every other level. In the latter design, the HBS for an empty drive would have every bit set to #true at every level. In lecture, we mentioned that the latter design was much more complicated to implement. As this project was being refined, we decided that it was too complicated, so you will all be implementing the first version, and that is what we describe in this document.]

We already described the design of the data type: a hierarchical list-of-list-of-Booleans structure. However, given that structure, there are still various ways to represent a given set of free and used blocks as a pattern of bits. We now detail the specific way you should lay out the data. The design is meant to minimize the maintenance complexity of having to propagate allocations and frees both upwards and downwards that other designs entail, without losing any of the contiguous allocation capabilities.

The design represents each free block uniquely at the highest level it qualifies at, and only at that level, meaning we represent each free block only once within the various levels, and then as part of the largest free chunk possible. So, if we had blocks {4, 5, 6, 7} free, it would not be marked as 4 free blocks on Level-0, nor as 2 free 2-chunks on Level-1, but solely as a free 4-chunk on Level-2. Similarly, if only blocks {2, 3, 4, 5} were free, they would be represented as the free 2-chunks {2, 3} and {4, 5} (since they cannot be represented as a single free 4-chunk, due to alignment).

The implication of this is that at any level’s bitmap, we should not have any "pairs" of both-true bits. Note that by "pair", we do not mean just any adjacent two bits: instead, we are conceptually breaking a bitmap up into two-bit sets in a "buddy" system, so bits 0 and 1 are a pair, bits 2 and 3 are the next pair, and so on. We are saying that at any given level, both members of a pair cannot both be true. Why not? Since each matched pair corresponds to a chunk at the next higher level, to maximize our chunk sizes, that pair would be represented as a free chunk at the next higher level, not as separate free chunks at the current level.

This would occur at the time a user frees a chunk: you go down to the level the user is freeing into, and set the bit to #true. But then, you examine the pairs at the current level to see if you now have any cases where both of the pair are free, and if so, you must combine those into a single chunk at the next higher level. We refer to this process as "coallescing". You must also clear those two bits at the current level, since they don’t exist as separate pieces anymore.

There is one other complication: If you do not find any #true bits at the requested level, it does not mean there are no available chunks of the requested size; it just means some of those chunks might be being represented as part of a bigger chunk in a level above. Since you can use some of that larger chunk to fulfill the user’s request, you must allow for that. How to handle this? You must implement what we will call "splitting": if there are no chunks immediately visible at the requested level, you must search at increasingly higher levels, and if you find a chunk there, you must split it: turn the found #true bit to #false, and create two new #true bits at the next level down in the corresponding positions. Note that if you had to search more than one level up, you would have to do this split multiple times. However, you don’t keep splitting everything: just one chunk per level. For example, say the user requested a 2-chunk, but the smallest you could find was an 8-chunk. You would split the 8-chunk into two 4-chunks, then one of those 4-chunks into two 2-chunks, and finally, allocate and return one of those newly-created 2-chunks. You would be left with no fee 8-chunks, one 4-chunk, and one 2-chunk. This assortment of spare chunks of various sizes that is created by the splitting process will be handy for the next request, which is one of the features of this algorithm.

An example: starting with a completely empty 8-block disk (i.e., every block is free to be allocated), imagine if the user were to make a series of allocation requests, first for a 1-chunk then a 2-chunk, then another 20chunk.

The initial list-of-bitmaps before any requests would be:

(list (list #t #t)
      (list #f #f #f #f)
      (list #f #f #f #f #f #f #f #f))

Note that the entire disk is free, but there "appear" to be no free 1-chunks or 2-chunks; that is only appearance. The map says the entire disk is available as two bit 4-chunks. However, clearly, chunks can be taken at any level! The strategy is to start with the level specified by the user’s requested chunk size, and if there are no explicit free chunks (i.e., #true bits) at that level, to then try the next higher level, and if you find a free chunk there, grab part of it, by first splitting it into two chunks at the desired level. This process might require actually propagating the request more than one level upwards, which would then entail splitting more than once.

In our example, if a user requests a 1-chunk from the completely free drive, you will first search the requested level, Level-0 (the last bitmap in the list) but see no free chunks. You will try Level-1, again failing. You will finally try Level-2, find an available 4-chunk, and split it into the two corresponding 2-chunks, temporarily getting:

(list (list #f #t)
      (list #t #t #f #f)
      (list #f #f #f #f #f #f #f #f))

You would then split one of the newly-created 2-chunks into 2 1-chunks:

(list (list #f #t)
      (list #f #t #f #f)
      (list #t #t #f #f #f #f #f #f))

and finally, allocate one of the new 1-chunks:

(list (list #f #t)
      (list #f #t #f #f)
      (list #f #t #f #f #f #f #f #f))
... returning block 0 to caller.

The second request, for a 2-chunk, would immediately find the available 2-chunk at Level-1, mark it as allocated, and return the 2-chunk {2, 3}, resulting in:

(list (list #f #t)
      (list #f #f #f #f)
      (list #f #t #f #f #f #f #f #f))
(Actually, you would return just the block number 2; the user will then infer they were allocated blocks 2 and 3, since they requested a 2-chunk.)

The third request, for another 2-chunk, would fail at Level-1 (no #true’s there anymore), so you would try Level-2, and find a free 4-chunk in bit 1. You would split that into two 2-chunks at Level-1, then allocate one of those to return to the caller, resulting in:

(list (list #f #f)
      (list #f #f #f #t)
      (list #f #t #f #f #f #f #f #f))

Now, deallocating space: If a user frees up a chunk at a given level, you must first set that bit to #true. But then, to maintain normal form, you must check the freed bit’s pairmate (remember: not always the bit after: might be the bit before, depending on whether it’s an odd- or even-numbered bit). If both are #true, you must clear both of them, and set the corresponding bit at the next higher level to be now #true. You must then propagate this coallescing upwards until you hit a level with no free pairs.

In our example, let us say the user first frees back up the 1-chunk {0}: you would set Level-0 bit 0 to #true. Then, you would examine its pairmate bit 1, which is also #true, so you would coallesce those into the corresponding 2-chunk: bit 0 on Level-1. You would then check Level-1 bit 0’s pairmate, which is Level-1 bit 1, which is not free, so coallescing is done, and you end up with:

(list (list #f #f)
      (list #t #f #f #t)
      (list #f #f #f #f #f #f #f #f))

Next, free up the 2-chunk {2, 3} will start at Level-1, setting bit 1 to #true. Since its pairmate bit 0 is also free, you would clear both those bits and set the the corresponding bit at Level-2: bit 0. That bit’s pairmate (Level-2 bit 1) is not free, so end of coallescing:

(list (list #t #f)
      (list #f #f #f #t)
      (list #f #f #f #f #f #f #f #f))

Lastly, the user frees up the 2-chunk {4, 5}: you start by setting Level-1 bit 2 to #true. You then check its pairmate bit 3, which is also #true, so you coallesce: set both to #false and set the corresponding bit 1 on Level-2 to be #true, leaving:

(list (list #t #t)
      (list #f #f #f #f)
      (list #f #f #f #f #f #f #f #f))

NB: we do not collesce the two #true pairmates at Level-2, because that is the very top level, and there is no level above to coallesce into!

Provided Data Designs:

You can assume the existence of the following data type:

; A HierarchicalBitmapSet (HBS) is a [List-of [List-of Boolean]]

(Note that this is just a shorthand synonym.)

The following data definition must be used as the return type for several of your functions:
(define-struct hbs-alloc [block hbs])
; A HBSAllocResult is a (make-hbs-alloc Integer HBS)
; Represents the result of an allocation call, where:
; - block is the starting block number for the chunk that was allocated;
;   or -1 if no space available
; - hbs is the updated HBS after the allocation
(define HBSAR-1 (make-hbs-alloc 1
                                (list (list #f #t)
                                      (list #t #f #f #f))))
(define (hbsar-temp hbsar)
  (... (hbs-alloc-block hbsar) ...
       (hbs-temp (hbs-alloc-hbs hbsar) ...)))

We are also providing the following data definition; using this, you can alternatively define a HierarchicalBitmapSet as a [List-of ListOfBoolPairs]]. You are free to completely ignore this–it may or may not be useful to you, depending on the rest of your design.

; A ListOfBoolPairs (LoBP) is one of:
; - empty
; - (cons Boolean (cons Boolean ListOfBoolPairs))
; Represents an empty list, or the first pair of Booleans in an
; even-length list of Bools,
(define LOBP-0 '())
(define LOBP-1 (list #false #false))
(define LOBP-2 (append (list #true #true) LOBP-1))
(define (lobp-temp lobp)
  (...
   (cond [(empty? lobp) ...]
         [(cons? lobp) (... (first lobp) ...
                           (first (rest lobp)) ...
                           (lobp-temp (rest (rest lobp))) ...)])))

Note that the above is structurally compatible with a simple [List-of Boolean]– it just adds some constraints on the length of the list. So, any function that returns a ListOfBoolPairs, or any data type that contains this as a member, can handle this type as though it were a plain [List-of Boolean].

Exercise 1 Support Functions: For this part of the assignment, you will design functions to support status queries about the underlying storage:

  • Design the function blocks-remaining that consumes a HBS and produces a count of the total number of free blocks (not chunks) available on the drive.

  • Design the function find-chunk that consumes a HBS and a chunk size (which will always be a power-of-2) and produce the index of the first block of a free chunk of the requested size. Note that this function only returns a Integer, since it does not modify the HBS. This should report the exact same chunk of blocks as what a true allocation would return, but without actually making any changes to the HBS. Return -1 if no chunk of the appropriate size could be found.

  • Design the function initialize-hbs that consumes a single bitmap of the free blocks on a drive, and produces the corresponding HBS, set up according to the representation described earlier in this document. So, the signature for this should be:

    initialize-hbs : [List-of Boolean] -> [List-of [List-of Boolean]]

This function should create as many levels in the bitmap as appropriate; i.e., it should go up to the maximal chunksize that still follows the rule of having at least two bits.

Exercise 2 Allocation Function: For this part of the assignment, you will design the function that allocates a chunk and updates the map appropriately.

Design the function alloc-chunk that consumes a HBS and a chunk size (which will always be a power-of-2) and produces an HBSAllocResult (given above), which includes index of the first block of the free chunk of the requested size that was allocated, as well as the resulting HSB modified to reflect the current allocation. If the allocation fails because no appropriate chunk could be found, the block number in the result structure should be -1, and the HBS should be returned unmodified.

IMPORTANT: this function must return first available chunk of the appropriate size, given the restriction that it will only return aligned chunks, and the requirement that you should not unnecessarily break up bigger chunks. For example, if you have a 16-block drive, and blocks 0, 3, 12, and 13 are currently allocated, you must return the 2-chunk {14, 15}, for the following reasons: While the user would be happy with being allocated blocks {1, 2}, your system would not return it because they do not belong to the same 2-chunk (block 1 is part of the 2-chunk {0, 1}, and block 2 is part of {2, 3}). And while the chunk {8, 9} would again make the user and our chunk-based system happy, it would unnecessarily break up the 4-chunk {8, 9, 10, 11}. Allocating {14, 15} is ideal because it is not needlessly breaking up a larger chunk: the 4-chunk {12, 13, 14, 15} is already broken up because {12, 13} is already allocated.

Exercise 3 Freeing (i.e., Deallocation): For this part, you will design a function that frees a chunk back up, updating the map appropriately.

Design the function free-chunk that consumes a HBS, a chunk size, and the starting block of the chunk, and produces an updated HBS that reflects the deallocation. Again, the chunk size will be a power-of-2, and the starting block will be appropriately aligned (i.e., a multiple of chunk size). You can assume the chunk being freed is valid.

This function is very easy to describe, but much more complex to design. Be generous with adding helper functions.

Final Thoughts As stated in class multiple times, this is the first real thinking assignment. The more you actually work out the design in your head, the less code you will find you need to write, and surprisingly, the simpler it will be, too. Good luck!