On this page:
20.1 Canonical forms
20.2 Integrity checking
20.3 Ordered binary trees
20.4 Exercises
20.4.1 Queues
5.3.3.8

20 Constructors

20.1 Canonical forms

Today we’re going to look more at the concept of invariants. Invariants often let us write code that takes advantage of the fact that we know some property, the invariant, of our data. We saw this last class using sorted lists of numbers. Today we’re going to examine a new example: fractions.

A fraction can be represented as a compound data that consists of two numbers representing the numerator and denominator:

;; A Fraction is a (new fraction% Integer Integer).
(define-class fraction%
  (fields numerator denominator))

The problem here is that we’d like to consider the fractions:

(new fraction% 1 2)
(new fraction% 2 4)

as representing the same number, namely 1/2, but these are different representations of the same information. The issue with fractions is a recurring issue we’ve seen with information that allows for multiple representations (sets are another example).

There are a couple approaches to solving this issue:

  1. Represents information is some canonical way.

  2. Codify the interpretation of data as a program.

The first approach basically eliminates the problem of multiple representations by picking a unique representation for any given piece of information. For example, with fractions, we might choose to represent all fractions in lowest terms. This means any fraction admits only a single representation and therefore any fractions which are interpreted as "the same" have exactly the same structure. (This approach is not always possible or feasible.)

The second approach requires us to write a program (a function, a method, etc.) that determines when two pieces of data are interpreted as representing the same information. For example, we could write a method that converts fractions to numbers and compares them for numerical equality; or we simplify the fraction to lowest terms and compare them structurally.

Along the lines of the second approach, let’s consider adding the following method:

fraction%

;; to-number : -> Number
;; Convert this fraction to a number.
(define (to-number)
  (/ (this . numerator)
     (this . denominator)))

This method essentially embodies our interpretation of the fraction% class of data. It doesn’t help with this issues:

(check-expect (new fraction% 1 2)
              (new fraction% 2 4))

But of course now we can write our tests to rely on this interpretation function:

(check-expect ((new fraction% 1 2) . to-number)
              ((new fraction% 2 4) . to-number))

But what if we wanted to go down the second route? We could define a method that computes a fraction in lowest terms:

fraction%

;; simplify : -> Fraction
;; Simplify a fraction to lowest terms.
(check-expect ((new fraction% 3 6) . simplify)
              (new fraction% 1 2))

We can use the gcd function to compute the greatest common denominator of the terms:

fraction%

(define (simplify)
  (new fraction%
       (/ (this . numerator)
          (gcd (this . numerator)
               (this . denominator)))
       (/ (this . denominator)
          (gcd (this . numerator)
               (this . denominator)))))

This allows us to structurally compare two fractions that have been simplified to lowest terms:

(check-expect ((new fraction% 3 6) . simplify)
              ((new fraction% 1 2) . simplify))

But it does not prevent us from constructing fractions that are not in lowest terms, which is what we were aiming for — we want it to be an invariant of fraction% objects that they are in their simplest form. One possibility is to define a constructor function that consumes a numerator and denominator and constructs a fraction% object in lowest terms:

;; fract-constructor : Number Number -> Fraction
;; construct a fraction in lowest terms.
(define (fract-constructor n d)
  (new fraction%
       (/ n (gcd n d))
       (/ d (gcd n d))))

So we are able to write a new function with the behavior we want and it establishes our invariant. That’s good, but there are still some inconveniences:

If we want to have a stronger guarantee that we maintain the lowest term invariant, we need a stronger mechanism for enforcing our discipline at construction-time. The idea is to allow arbitrary computation to occur between the call to a constructor and the initialization of an object. To enable this mechanism, we need to bump the language level up to class/2.

All class/1 programs continue to work in class/2. The main difference is that we now the ability to write constructors.

fraction%

(constructor (n d)
  ;;...some expression that uses the fields form to return values
  ;;   for all of the fields...
  ...)

The constructor form can take any number of arguments and must use the fields to initialize each of the fields. If you leave off the constructor form, a default constructor is generated as:

(constructor (n d)
  (fields n d))

And in general if you have n fields, the defaults constructor looks like:

(constructor (field1 field2 ... fieldn)
  (fields field1 field2 ... fieldn))

But by writing our own constructor, we can insert computation to convert arguments in a canonical form. For our fraction% class, we can use the following code:

;; Number Number -> Fraction
(constructor (n d)
  (fields (/ n (gcd n d))
          (/ d (gcd n d))))

This code is used every time we have a (new fraction% Number Number) expression. Since this is the only way to construct a fraction, we know that all fractions are represented in lowest terms. It is simply impossible, through both error or malice, to construct an object that does not have this property.

Returning to our simplify method; we don’t really need it any longer. (We could, if need be, re-write the code to take advantage of the invariant and give a correct implementation of simplify as (define (simplify) this), since all fractions are already simplified.) Likewise, we no longer need the fract-constructor function.

Finally, we get to the point we wanted:
(check-expect (new fraction% 1 2)
              (new fraction% 2 4))

Q: Can you have multiple constructor?

A: No. We’ve been thinking about multiple constructors, but we don’t have a strong story for them yet. Remember: you can always write functions and you can think of these as alternative constructors.

That brings up another feature in the class/2 language — constructors and functions are treated more uniformly now: you may leave off the new keyword when constructing objects.

Examples:

> (define-class posn%
    (fields x y))
> (new posn% 2 3)

(object:posn% 2 3)

> (posn% 4 5)

(object:posn% 4 5)

Q: Can you have a different number of arguments to the constructor than to the number of fields?

A: Yes. There’s no relation between the number of arguments to your constructor and the number of fields in the object being constructed.

One thing to note is that printing values has changed. You’ll notice that fraction% values no longer print as (new fraction% Number Number), but instead as (object:fraction% Number Number). This is because by adding arbitrary computation at construction-time, there’s no longer a close relationship between a call to a constructor and the contents of an object. So in printing values we have a choice to make: either print the constructor call, which doesn’t tell us about the contents of the object, or print the contents of the object, which doesn’t tell us about the call to the constructor. We chose the latter.

Q: Can you call methods on the object being constructed?

A: No. What would they do? Suppose you invoked a method that referred to fields of this object — those things just don’t exist yet.

Some languages allow this. Java for example, will let you invoke methods from within constructors and should those methods reference fields that are not initialized, bad things happen. (This is just poor language design, and descends from Sir Tony Hoare’s "Billion Dollar Mistake": the null reference.)

20.2 Integrity checking

Beyond computing canonical forms, constructors are also useful for checking the integrity of data given to a constructor. For example, suppose we are writing a class to represent dates in terms of their year, month, and day of the month. Now, what if we’re given the 67th day of March in the year -17? What should that data represent? Maybe it should be March 40 (because as we heard in class, (= 40 (- 67 17)); maybe it should be May 6th, 17 B.C., maybe it should May 6th, 17 years before the UNIX epoch of 1970; maybe it should be March 5, 17 A.D., which we arrive at by mod’ing 67 by the number of days in March and making the year positive; or maybe... this data is just bogus and we should raise an error and refuse to continue computing.

Let’s see how we can implement a simple form of integrity checking in a constructor. We will implement a class to represent dates and raise an error in case of a situation like the above.

;; A Date is (date% Number Number Number).
;; Interp: Year Month Day.
;; Year must be positive.
;; Month must be in [1,12].
;; Day must be in [1,31].
(define-class date%
  (fields year month day))

We can still construct meaningless dates, so what we would like to do is check the inputs to a constructor make some sense. This let’s us establish the integrity of all date% objects — if you have your hands on a date% object, you can safely assume it satisfies the specification we’ve given in the data definition.

The simplest way to satisfy the specification is with this constructor:

date%

(constructor (y m d)
  (error "I didn't like this date!"))

This is known as a "sound" solution in the program verification community. Notice: if you have your hands on a date% object, you can safely assume it satisfies the specification we’ve given in the data definition. Why? Because you cannot construct a date% object.

We’d like to do better by accepting more legitimate dates. Here is one that accepts all the things deemed acceptable in our specification (this is both "sound" and "complete"):

date%

(constructor (y m d)
  (cond [(<= y 0) (error "year was negative or zero")]
        [(or (> m 12) (< m 1)) (error "month too big or too small")]
        [(or (> d 31) (< d 1)) (error "day too big or too small")]
        [else (fields y m d)]))

Example:

> (date% 2011 3 67)

day too big or too small

It is still possible to construct meaningless dates, such as February 31, 2011. However, more stringent validation is just some more code away, and since we are more concerned with the concept of integrity checking than in a robust date library, we won’t go into the details.

Thus we can establish invariants with computation, or we can reject inputs that don’t have the invariant we want to maintain. And we can combine these approaches. (You may want to compute fractions in lowest terms and reject 0 as a denominator in fraction%, for example.)

20.3 Ordered binary trees

Now we want to look at a slightly larger program and how we use constructors to enforce important invariants. In this section, we want to develop a representation of sorted lists of numbers, which is what we did in Invariants of Data Structures, but this time we’re going to represent a sorted list of numbers as an ordered binary tree.

An ordered binary tree looks like this:

    *

   / \

  *   3

 / \

1   2

Notice that there is data only at the leaves of the tree and that if you traverse the leaves in left-to-right order, you recover the sorted list of numbers. Thus there is an important invariant about this data structure: whenever we have an ordered binary tree node, the left sub-tree is sorted and the right sub-tree is sorted and and numbers in the left sub-tree are smaller than or equal to all the numbers in the right sub-tree.

Here is our data and class definition for ordered binary trees:

;; A OBT is one of:
;; - (node% OBT OBT)
;; - (leaf% Number)
(define-class leaf%
  (fields number))
(define-class node%
  (fields left right))

Some examples:

(leaf% 7)
(node% (leaf% 1) (leaf% 2))

Now, is this an example?

(node% (leaf% 7) (leaf% 2))

This example points out that we are currently missing the specification of our invariant in the data definition:

;; A OBT is one of:
;; - (node% OBT OBT)
;; - (leaf% Number)
;; Invariant: numbers are in ascending order from left to right.

What happens if we try to construct something that violates our invariant? Nothing – we just construct bogus things. Now how could enforce this ascending order invariant?

Well, let’s first think about the leaf% case. We are given a number and we need to construct an ordered binary tree, meaning all the numbers in the tree are in ascending order. Since we are constructing a tree with only one number in it, it’s trivial to enforce this invariant—it’s always true!

Now consider the node% case. We are given two ordered binary trees. What does that mean? It means the numbers of each tree are in ascending order. But wait—isn’t that the property we are trying to enforce? Yes. Notice that if we assume this of the inputs and guarantee this of the constructed value, then it must be true of all OBTs; i.e. the assumption was valid. If this reasoning seems circular to you, keep in mind this is not unlike "the magic of recursion", which is not magic at all, but seems to be since it lets you assume the very function you are writing works in recursive calls on structurally smaller inputs. If you do the right thing in the base case, and if on that assumption of the recursive call, you can construct the correct result, then that assumption about the recursive call was valid and your program is correct for all inputs.

OK, so the challenge at hand is not in verifying that the input OBTs posses the invariant, but in guaranteeing that the result of the constructor possesses it. If we can do that, than we know the given OBTs must have the property.

But now this assumption is not sufficient to guarantee that the default constructor works:

node%

;; OBT OBT -> OBT
(constructor (a b)
  (fields a b))

Why? Although we know that the left and right sub-tree are OBTs, we know nothing about the relationship between the left and right sub-tree, which was an important part of the invariant. Consider for example, the OBTs:

(node% (leaf% 4) (leaf% 5))
(node% (leaf% 2) (leaf% 3))

Independently considered, these are definitely OBTs. However, if we construct a node% out of these two trees, we get:

(node% (node% (leaf% 4) (leaf% 5))
       (node% (leaf% 2) (leaf% 3)))

which is definitely not an OBT. (Thus we have broken the stated contract on the constructor.)

We could correctly compute an OBT by determining that, in this example, the first given tree needs to be the right sub-tree and the second given tree needs to be the left sub-tree. We can make such a determination based on the maximum and minimum numbers in each of the given trees, and that suggest the following constructor:

node%

;; OBT OBT -> OBT
(constructor (a b)
  (cond [(<= (b . max) (a . min))
         (fields b a)]
        [(<= (a . max) (b . min))
         (fields a b)]
        [else
         ...]))

The max and min methods are easily dismissed from our wish list:

leaf%

(define (min)
  (this . n))
(define (max)
  (this . n))

node%

(define (min)
  (this . left . min))
(define (max)
  (this . right . max))

At this point, our constructor does the right thing when given two OBTs that do not overlap, as in the example we considered, but a troubling pair of examples to ponder over is:

(node% (leaf% 2) (leaf% 4))
(node% (leaf% 3) (leaf% 5))

Again, considered independently, these are definitely OBTs, but there’s no way to construct an ordered binary tree with one of these as the left and the other as the right; either order you pick will be wrong. This case is the else clause of our constructor. What should we do? One solution is just to reject this case and raise and error:

node%

;; OBT OBT -> OBT
(constructor (a b)
  (cond [(<= (b . max) (a . min))
         (fields b a)]
        [(<= (a . max) (b . min))
         (fields a b)]
        [else
         (error "trees overlap")]))

But really this again fails to live up to the stated contract since we should be able to take any two OBTs and construct an OBT out of them. We know that if the trees overlap, we can’t simple make a node with them as sub-trees; we have to do something more sophisticated. Here’s an idea: insert all of the elements of one into the other. So long as we make this insertion do the right thing, our constructor will succeed in maintaining the invariant properly.

So if we indulge in some wishful thinking and suppose we have a insert-tree in our interface:

;; insert-tree : OBT -> OBT
;; Insert all elements of this tree into the given one.

then we can write the constructor as follows:

node%

;; OBT OBT -> OBT
(constructor (a b)
  (cond [(<= (b . max) (a . min))
         (fields b a)]
        [(<= (a . max) (b . min))
         (fields a b)]
        [else
         (local [(define t (a . insert-tree b))]
           (fields (t . left) (t . right)))]))

That leaves insert-tree to be written. First let’s consider the case of inserting a leaf% into a tree. If we again rely on some wishful thinking and relegate the work to another method that inserts a number into a list, we can easily write insert-tree for the leaf% case:

leaf%

(define (insert-tree other)
  (send other insert (this . number)))

In the node% case, if we first consider the template (the inventory of what we have available to use), we have:

node%

(define (insert-tree other)
  (this . left . insert-tree other) ...
  (this . right . insert-tree other) ...)

But here we don’t really want to insert the left tree into the other and the right into the other. We want to insert the right tree into the other, then insert the left tree into that one (other permutations of the order of insertions would work, too). That leads us to:

node%

(define (insert-tree other)
  (this . left . insert-tree (this . right . insert-tree other)))

We have only a single item remaining on our wish list—we need to implement the insert method for inserting a single number into a tree.

First let’s consider the case of inserting a number into a leaf%. If we have a leaf and we insert a number into it, we know we get a node with two leaves. But where should the inserted number go? One solution is to compare the inserted number against the existing number to determine which side the number should go to:

leaf%

(define (insert m)
  (node% (leaf% (the-real-min n m))
         (leaf% (the-real-max n m))))

In the case of inserting a number into a node, we compare the number against the maximum of the left sub-tree to determine if the number should be inserted in the left or right:

node%

(define (insert n)
  (cond [(> n (this . left . max))
         (node% (this . left)
                (this . right . insert n))]
        [else
         (node% (this . left . insert n)
                (this . right))]))
20.4 Exercises
20.4.1 Queues

The University Registrar is instituting a new course registration system, in which each student will wait in a “Virtual Line” until every student ahead of them has registered. A simple way to represent a line (also known as a queue) is by using a list. But this representation makes it slow to add somebody to the end of the line (or to take somebody off the front of the line, depending on whether the front of the list represents the front or rear of the line).

In order to provide maximal waiting efficiency, you have been tasked with implementing a representation that uses two lists! The key idea of this fancy representation is that one list will represent some portion of the front of the line, while the other will represent the remainder of the line in reverse order. So if you’re the first element of the first list, you are at the head of the line. On the other hand, if you’re the first element of the second list, you are the very last person in line.

Here is the interface for queues:
;; A [IQ X] implements:
 
;; head : -> X
;; Produce the element at the head of this queue.
;; Assume: this queue is not empty.
 
;; deq : -> [IQ X]      (Short for "dequeue")
;; Produces a new queue like this queue, but without
;; this queue's first element.
;; Assume: this queue is not empty.
 
;; enq : X -> [IQ X]    (Short for "enqueue")
;; Produce new queue with given element added to the
;; END of this queue.
 
;; emp? : -> Boolean
;; Is this queue empty?

The head and deq operations require that the queue be non-empty when they are used, but this can be assumed and these operations do not need to check for an empty queue.

Further, the Registrar’s office has just learned about invariants, and insists on maintaining the following invariant about all of their queues:

if the front of the queue is empty, the whole queue must also be empty.

The Registrar’s office has given you three tasks to prepare their Virtual Line for its launch later this semester: