Lecture 2.3: Defining New Types; more experience writing properties

So far, we have learned how to write and specify OCaml code using built-in types, such as lists, pairs, and option types. Today, we will focus on user-defined types, which will enable us to write and specify many more types of programs.

Binary trees

Let's look at an implementation of a binary tree type in OCaml, where the leaves are integers:

type bintree = | Leaf of int | Branch of bintree * bintree

This defines an algebraic data type, or ADT, which has a number of constructors --- here, Leaf and Branch --- which each have arguments: Leaf has one argument of type int, while Branch has two arguments of type bintree.

Note

While bintree has two arguments of type bintree, you should, in this case, not think of this as one argument of the pair type bintree * bintree.

Let's look at a few binary trees, and how they can be represented in OCaml.

    .
   / \
  2   7

This would be written Branch (Leaf 2, Leaf 7).

    .
   / \
  .   42
 / \
0  -1

This would be written Branch (Branch (Leaf 0, Leaf (-1)), Leaf 42).

Why trees?

(Binary) trees are used all over in computer science. Here, we will think of them as data structures (supporting efficient implementations of maps and sets); later on in this class, we will use trees to represent the syntax of a simple programming language.

Instead of writing trees directly, we can generate them using recursive functions. Let's create a generator for trees:

let rec mk_random_tree (n : int) = 
    if n <= 0 then Leaf (Random.int 42)
              else Branch (mk_random_tree (n - 1), mk_random_tree (n - 1))

As well as a stringifier for printing:

let rec string_of_bintree (t : bintree) = 
    match t with
    | Leaf i -> string_of_int i
    | Branch (x, y) -> "{" ^ string_of_bintree x ^ ", " ^ string_of_bintree y ^ "}"

Using binary trees

To use a binary tree, we can pattern match on it, similar to lists:

let rec sum_tree (t : bintree) = 
    match t with
    | Leaf i -> i 
    | Branch (x, y) -> sum_tree x + sum_tree y

Another example:

let rec flip_tree (t : bintree) = 
    match t with
    | Leaf i -> Leaf i
    | Branch (x, y) -> Branch (flip_tree y, flip_tree x)

Formalizing a property about `flip_tree`

To check whether flip_tree really is performing a flip, we can write a few properties. First, we can see that, for most inputs, flip_tree won't return the same tree as the one it takes as input:

let _ = 
    let t = mk_random_tree 3 in 
    if not (t = flip_tree t) then
        ()
    else 
        print_endline ("Found tree that stays the same: " ^ string_of_bintree t)

In all likelihood, mk_random_tree will cause the above equality to fail. However, this is certainly not a theorem! Any symmetric tree will cause it to succeed:

let t = Branch (Leaf 2, Leaf 2)
print_endline (string_of_bool (t = flip_tree t)) (* prints true *)

Equality on ADTs

As long as your ADT does not contain functions in it, the equality operator = will work on the ADT (as it does for bintree).

Note

Thus, the following theorem is true: $\exists t, t = \mathsf{fliptree}(t).$

But it is not true that $\forall t, t = \mathsf{fliptree}(t)$.

While it's not always true that t = flip_tree t, it definitely is true that t = flip_tree (flip_tree t). In other words, if you flip a tree twice, you get back to the original tree. In other words, flip_tree is involutive: applying the same function twice gets you back to the original value.

let flip_tree_inv () = 
    let t = mk_random_tree 4 in
    if t = flip_tree (flip_tree t) then 
        () (* OK *)
    else 
        print_endline ("Found a counterexample: " ^ string_of_bintree t)

What's algebraic about ADTs?

Another name for an ADT such as

type int_or_bool = | Int of int | Bool of bool

is a sum type, because it can be thought of as the disjoint sum of the two constructors: it is either an Int, or a Bool. We can think of this as being represented as the following equation: $$ \texttt{int_or_bool} = \mathsf{Int} + \mathsf{Bool} $$

Our bintree example is similar, except that the equation is recursive: $$ \mathsf{bintree} = \mathsf{int} + (\mathsf{bintree} * \mathsf{bintree}) $$

More on ADTs

Let's look at few more examples of ADTs, and how you might use them. Suppose you're creating a graphics application that lets you create various shapes and give them colors. First, we can define colors by an enumeration:

type color = Red | Green | Blue | Orange | Yellow | Purple

Here, the type is just an enumeration between six choices, without any data attached. While we could represent color with an integer (e.g., 0 for red, 1 for green, and so on), using an explicit enumeration with names is useful for code clarity. For example, let's write a function to compute complementary colors:

let complementary (c : color) =
    match c with
    | Red -> Green
    | Green -> Red
    | Blue -> Orange
    | Orange -> Blue
    | Yellow -> Purple
    | Purple -> Yellow

(As you might have noticed, this is another example of an involutive function.)

Next, let's define a type for shapes:

type shape = | Rectangle of int * int (* width and height *)
             | Circle of int (* radius *)
             | Square of int (* side length *)

Finally, we want to define a type that holds both a color and a shape. While we could do this with a pair type (color * shape), there is another way: by using a record:

type blob = 
    { b_color : color; b_shape : shape }

This defines a record, which must hold a field b_color of type color, and b_shape of type shape. Let's create one and analyze it:

let red_rect = { b_color = Red; b_shape = Rectangle (2, 3) }

let _ = 
    match red_rect.b_shape with 
    | Rectangle (_, _) -> print_endline "Got a rectangle!"
    | Circle _ -> print_endline "Got a circle!"
    | Square _ -> print_endline "Got a square!"