Skip to content

Lecture 4.2: Specifying with a reference implementation

tagged trees

Last class we worked with bintrees:

type bintree =
  | Leaf int
  | Branch of (bintree * bintree)

Today we will modify this slightly and look at a tagged_bintree:

type entry = {
  tag : string;
  value : int
}

type tagged_bintree =
  | Leaf of entry
  | Branch of tagged_bintree * tagged_bintree

This binary tree, at its leaves, is now storing two things: a tag, given by a string; and a value, given by an int. We store these two entries in a record called entry for convenience.

This binary tree could be storing account balances: perhaps the tag is the account name, while the value is the amount of money in the account. the tree structure is "users" and sometimes you have users that link accounts.

We might want to perform a sum on a subset of the tags. To do this, we are going to perform a tree filter: given some tag, give me all the values of the tag:

let rec sum_tagged (q : string) (bt : bintree) : int =
  match bt with
  | Leaf { value; tag } -> if q = tag then value else 0
  | Branch (l, r)  -> sum_tagged l + sum_tagged r

While this works, perhaps are finding that it is very slow, because the total set of accounts is large (say, 20 gigabytes). Also, sometimes you noticed that the set of accounts you are interested in, those tagged with "A", is actually small: maybe only a few thousand! This gives you an idea: let's precompute the 'A' accounts, store them to the side, and then sum those up separately. We'll collect the pre-computed values with collect_tagged, and use a simple sum function to arrive at our value:

let rec collect_tagged (t : string) (bt : bintree) : int list = 
    match bt with
    | Leaf { tag; value }  -> if t = tag then [value] else []
    | Branch (l, r) -> collect_tagged t l @ collect_tagged t r
    (* (@) is list concatenation from the OCaml standard library *)

let rec sum (xs : int list) : int =
  match xs with
  | [] -> 0
  | x :: xs -> x + sum xs

And let's make some randomly generated bintrees to look at examples:

let names = ["Alice"; "Bob"; "Charlie"]

let gen_tag (xs : string list) : unit -> string =
  fun () -> List.nth xs (Random.int (List.length xs))

let gen_entry : unit -> entry =
  fun () -> { tag = gen_tag names () ; value = Random.int 100 }

let rec gen_tagged_bintree (n : int) : unit -> tagged_bintree =
  fun () ->
   if n <= 0
   then Leaf (gen_entry ())
   else
     if Random.int 9 = 0
     then Leaf (gen_entry ())
     else Branch ( gen_tagged_bintree (n - 1) ()
                  , gen_tagged_bintree (n - 1) ())

How can we reason about the correctness of our optimization? We specify that the optimized code is equivalent to the original:

\[\forall q ∈ \texttt{string}, \forall bt \in \texttt{bintree}, \texttt{sum_tagged}(q, bt) = \texttt{sum}(\texttt{collect_tagged}(q, bt)).\]

Next, we want to formally state our specification in OCaml:

let prop_sum_tagged_spec
    (sum_tagged : string -> tagged_bintree -> int)
    (collect_tagged : string -> tagged_bintree -> int list)
    (sum : int list -> int)
    : string -> tagged_bintree -> bool =
  fun q bt -> 
    sum_tagged q bt = sum (collect_tagged q bt)

But we will still need a testing harness!

let rec forall_tagged_bintree
    (gen_tag : unit -> string)
    (gen : unit -> tagged_bintree)
    (prop : tagged_bintree -> bool)
    (trials : int)
    : (string * tagged_bintree) option = 
  if trials <= 0
  then None
  else
    let tag = gen_tag () in
    let bt = gen () in
    if prop tag bt
    then forall_tagged_bintree gen_tag gen prop (trials - 1)
    else Some (tag, bt)

Next class, we will remove this boilerplate and finally talk about polymorphism in OCaml.