5.92

3 1/27: Dictionaries

A dictionary is a commonly used data structure that maps keys to values. Like an actual dictionary on your bookshelf, the central operation is to lookup a value given its key, just like you would lookup a definition given a word. Dictionaries also provide operations to add, remove, and update their key-value mappings. Here’s an example interaction with the kinds of dictionaries we will build in this lab:

> (define d (empty-dict . set 1 "one" . set 2 "two" . set 3 "three"))
> (d . lookup 1)

"one"

> (d . lookup 2)

"two"

> (d . set 1 "uno" . lookup 1)

"uno"

> (d . set 1 "uno" . lookup 2)

"two"

> (d . has-key? 4)

#f

> (d . has-value? "two")

#t

An interface for dictionaries

Before we build anything, we should first specify what a dictionary is.

; A Key is a Nat
 
; An [Dict V] implements:
; has-key : Key -> Boolean
; Does the given key exist?
 
; lookup : Key -> V
; The value mapped to by the given key
; Assumption: the key exists
 
; set : Key V -> [Dict V]
; Set the given key-value mapping

Notice that a dictionary [Dict V] is parameterized by the type of its values, V. This enables us to give precise contracts to the methods.

Dictionaries as lists of pairs

Like most structures we encounter in programming, dictionaries can be implemented in many different ways. Let’s begin with a simple representation: a list of key-value pairs.

; A [ListDict V] is one of:
;  - (ld-empty%)
;  - (ld-cons% Key V [ListDict V])
; and implements [Dict V]

One thing to note is that the assumptions on methods are part of the interface that classes have to implement. If an assumption is impossible to meet for a given case, then this method does not need to be implemented. For example, does the lookup method make sense for an ld-empty%? Consider a concrete example like (send (new ld-empty%) lookup 5).

Exercise 1. Define the classes ld-empty% and ld-cons%.

Exercise 2. Define the method has-key? for ListDicts.

Exercise 3. Define the method lookup for ListDicts.

Exercise 4. Define the method set for ListDicts.

How good are your tests?—make sure you pass this one:

(check-expect ((new ld-empty%) . set 1 "one"
                               . set 2 "two"
                               . set 1 "uno"
                               . lookup 1)
              "uno")

Exercise 5. Here’s an interesting property relating the set and has-key? methods: for all dictionaries d, keys k, and values v, if you ask has-key? k after setting the mapping kv, then you should always get true.

Write this property down as a function that takes an [Dict V], a Key, a V, and returns true if the property is satisfied by the particular inputs given. (If it ever returns false, then the property is invalid!)

Exercise 6. As above, write a property relating set and lookup.

To randomly test these properties we need a way to randomly generate dictionaries.

Exercise 7. Define a function random-list-dict that takes a number n and builds a [ListDict Key Nat] of size n with random key-value mappings. (It’s not important that the size is exactly n.)

Exercise 8. Randomly test your properties: for each property, call check-expect a large number of times, testing the property on random inputs each time.

Dictionaries as sorted lists of pairs

What ListDict do you get back if you set the same key twice?

((ld-empty%) . set 1 "one" . set 1 "uno")

Does your set method hunt through the list and remove old mappings for the key 1, or do you tolerate multiple occurrences of the same key by ignoring all but the first? Either way works fine, but the question suggests we use a less ambiguous representation to avoid the issue in the first place.

; A [SortedListDict V] is one of:
;  - (sld-empty%)
;  - (sld-cons Key V [SortedListDict V])
; and implements [Dict V]
; 
; Invariant: The keys are sorted (increasing).

Exercise 9. Define the classes sld-empty% and sld-cons%.

Exercise 10. Define the methods has-key?, lookup, and set for SortedListDicts. Pay attention to the invariant!

How about randomized testing? The properties we wrote above are generic over any Dicts, so we can reuse them to test our SortedListDicts.

Exercise 11. Define a function random-sorted-list-dict, similar to random-list-dict above.

Identify and abstract out their common behavior.

Exercise 12. Randomly test your Dict properties on random SortedListDicts.

The methods you wrote for SortedListDict should preserve its sorting invariant, but nothing prevents client code from combining sld-empty% and sld-cons% in ways that violate it. Modules enable us to hide our implementation details and prevent clients from violating our internal invariants.

And while we’re thinking about module boundaries, let’s take a moment to organize the rest of our code, too.

Exercise 13. Split your code into separate modules:
  • "list-dict.rkt" contains the ListDict implementation, and provides only an empty ListDict

  • "sorted-list-dict.rkt" contains the SortedListDict implementation, and provides only an empty SortedListDict

  • "random-tests.rkt" requires all the other modules, and specifies and randomly tests properties for the various kinds of dictionaries

The Dict interface stands alone, ready for other modules to implement or use it. The ListDict implementation follows the Dict interface and tests its individual methods. The SortedListDict implementation does the same. And the randomized testing code acts as our “client”: it lives outside the module boundaries of each implementation and thus can only interact with them in approved ways—for example, it has no way to violate the sorting invariant in SortedListDict.

Dictionaries as binary search trees

Oh snap!—I just thought of an even better way to represent dictionaries. Searching through a binary search tree (BST) is way faster than searching through a list, even a sorted one, so let’s use that to build a better kind of dictionary.

; A [TreeDict V] is one of:
;  - (td-leaf%)
;  - (td-node% Key V [TreeDict V] [TreeDict V])
; and implements [Dict V]
; 
; Invariant: The keys form a BST.

So each key-value mapping is stored at the node of a BST ordered by the keys, which means that if I’m a node, then my key is bigger than all the keys to my left and smaller than all the keys to my right. This ordered structure enables us to find a given key—and thus its value—in O(log n) time.

Two points to be careful about:
  • Only keys follow the BST ordering; values are just along for the ride.

  • This is different than what we saw in class today.

Exercise 14. Define the classes td-leaf% and td-node% in a new module "tree-dict.rkt".

Exercise 15. Define Dict methods on TreeDicts. Know your invariant!—make sure you both preserve and exploit it.

Exercise 16. Define a function random-tree-dict, similar to random-list and random-sorted-list. (If you abstracted things properly before, this should be a one-liner.)

Exercise 17. Randomly test your Dict properties on random TreeDicts.

More useful dictionaries

Dictionaries tend to have many more behaviors than the three we’ve been playing with so far.

Exercise 18. Add a method has-value? to Dicts that tests whether a given value is present. (Invariants won’t help you here!)

Specify and randomly test a property relating set and has-value?.

Exercise 19. Add a method update to Dicts that takes a key k and an update function f : V -> V, and updates k’s value by applying f to it. The method should assume that the key k already exists in the dictionary.

Specify and randomly test a property relating update and lookup.

Exercise 20. Add a method extend to Dicts that takes another Dict (who knows which kind!) and combines all of its mappings with this’ mappings. What if the mappings from the two dictionaries overlap?—pick an easy-to-understand policy and document it in your contract/purpose.

Specify and randomly test a property relating extend and has-key?.

Specify and randomly test a property relating extend and lookup.