Lecture 4.1: Polymorphism

Throughout the past few weeks, we have implemented a lot of similar-looking functions. For example, we can append two int lists together:

let rec append_ints (xs : int list) (ys : int list) : int list = 
    match xs with
    | [] -> ys
    | x :: xs' -> x :: append_ints xs' ys

At the same time, we can append two bool lists:

let rec append_bools (xs : bool list) (ys : bool list) : bool list = 
    match xs with
    | [] -> ys
    | x :: xs' -> x :: append_bools xs' ys

Similarly, we can map over a list of ints:

let rec map_ints (f : int -> int) (xs : int list)  : int list = 
    match xs with
    | [] -> []
    | x :: xs' -> f x :: map_ints f xs'

while we can also map over strings:

let rec map_strings (f : string -> string) (xs : string list)  : string list = 
    match xs with
    | [] -> []
    | x :: xs' -> f x :: map_strings f xs'

If you look at these implementations, we can see that even though append_ints and append_bools have different types, they have the same code. Ideally, we won't have to duplicate this code. We can do so by using polymorphism. In short, polymorphism lets us assign a more general type to a single implementation of append that works for both int and bool. Let's look at how append is specified:

let rec append (xs : 'a list) (ys : 'a list) : 'a list = 
    match xs with
    | [] -> ys
    | x :: xs' -> x :: append xs' ys

Note that the implementation of append is just the same as append_ints and append_bools. The difference is in the types: instead of taking in int list / bool list, we take in 'a list (where we pronounce 'a as "alpha"). The 'a here is a type variable: a variable standing for a type which can be later replaced with a concrete type. When we type this into utop, we see we get the following:

val append : 'a list -> 'a list -> 'a list = <fun>

Compare this to append_ints, which has type int list -> int list -> int list. We can get the latter from the former by specializing the type: by substituting 'a to int, we get the type of append_ints.

In OCaml, the process of specializing more general types to get more specific ones (like 'a list into int list) happens automatically whenever we use the general function. For example, let's call append directly on some arguments:

utop # append [1; 2; 3] [4; 5; 6];;
- : int list = [1; 2; 3; 4; 5; 6]

utop # append [false; false] [true; true]
- : bool list = [false; false; true; true]

When we call append on int lists, we get back an int list; similarly, when we call append on bool list, we get back a bool list. This works for every type, including function types and list types themselves:

utop # append [[1; 2; 3]; [4; 5; 6]] [[7]; [8]];;
- : int list list = [[1; 2; 3]; [4; 5; 6]; [7]; [8]]

utop # append [(fun (x : int) -> x + 1); (fun (x : int) -> x + 2)] [(fun (x : int) -> x)];;
- : (int -> int) list = [<fun>; <fun>; <fun>]

We can also see how type specialization works by partially applying append:

utop # append [1];;
- : int list -> int list = <fun>

Again, OCaml is comparing append's general type ('a list -> 'a list -> 'a list) to the input (int list). To make this work, OCaml knows that if 'a list must match int list, 'a must be int. Thus, in this line of code, OCaml will specialize append's type to be int list -> int list -> int list. Partially applying [1], we get int list -> int list.

A Polymorphic `map` function

Let's take a look at map now. Looking at map_ints and map_strings, one valid choice for the polymorphic version is below:

let rec map (f : 'a -> 'a) (xs : 'a list) : 'a list = 
    match xs with 
    | [] -> []
    | x :: xs' -> f x :: map f xs'

Just like for append, map can specialize to behave the same as map_ints and map_strings. However, we can do better. Consider this function:

let rec map_int_to_string (f : int -> string) (xs : int list) : string list = 
    match xs with
    | [] -> []
    | x :: xs' -> f x :: map_int_to_string f xs'

Here, we are writing a mapper from int lists to string lists, again with the same piece of code. How could we make this generic? We can use two type variables:

let rec map (f : 'a -> 'b) (xs : 'a list) : 'b list = 
    match xs with 
    | [] -> []
    | x :: xs' -> f x :: map f xs'

From this more general map function, we can recover map_ints by setting 'a = int and 'b = int, and map_strings by setting 'a = string and 'b = string. We can recover map_int_to_string by setting 'a = int and 'b = string. While the first polymorphic version (with just 'a) also type checks, it is less general, since it forces the input and output lists to have the same type. (Whether you want the more general or more specific version depends on what you're doing!)

Just like with append, we can see the type specialize when we partially apply map:

utop # let f (x : int) = string_of_int x ^ "abc";;
val f : int -> string = <fun>

utop # map f ;;
- : int list -> string list = <fun>

Type Inference

So far in the semester, we have been careful to annotate all types. What happened if we didn't do that?

utop # let f x = x ;;
val f : 'a -> 'a = <fun>

Above, we can see that, without type annotations, OCaml will choose the most general polymorphic type. The identity function f works on any type, so OCaml will give it type 'a -> 'a.

If we look at a more complicated example:

utop # let g x y = x ;;
val g : 'a -> 'b -> 'a = <fun>

We can see that the most general type for g says that, for any types 'a and 'b, g has type 'a -> 'b -> 'b. This makes sense, because g returns its first argument, so the return type must be the same as the first argument; while g doesn't use the second argument, so it may be anything else.

What if we add operations?

utop # let f x y = x + y ;;
val f : int -> int -> int = <fun>

Since we use + (which must take ints), OCaml knows that not any 'a will work for x and y (since we would be, for example, adding bools if 'a = bool). Thus, OCaml will automatically force 'a = int, resulting in the above type.

Going back to our examples

utop # let rec append xs ys = 
    match xs with
    | [] -> ys
    | x :: xs' -> x :: append xs' ys ;;
val append : 'a list -> 'a list -> 'a list = <fun>

Since :: (cons on lists) requires the head of the list to be the same as the tail, and we return ys in the [] case, OCaml will figure out that xs and ys must have the same type of contents.

utop # let rec map f xs =
       match xs with
       | [] -> []
       | x :: xs' -> f x :: map f xs';;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>

Here, OCaml chooses the more general polymorphic type for map, where we use two type variables.

A generic `forall`

Last class, we saw our now-familiar friend, forall, which lets us run tests many times:

let rec forall_bintree (gen : unit -> bintree) (prop : bintree -> bool) (trials : int) : bintree option =
    if trials <= 0 then     
        None (* No counter-example found *)
    else 
        let xs = gen () in 
        if prop xs then 
            forall_bintree gen prop (trials - 1)
        else Some xs

This was suspiciously close to what we saw on Wednesday:

let rec forall_list (gen : unit -> int list) (prop : int list -> bool) (trials : int) : (int list) option =
    if trials <= 0 then     
        None (* No counter-example found *)
    else 
        let xs = gen () in 
        if prop xs then 
            forall_list gen prop (trials - 1)
        else Some xs

Because we have two pieces of code with the same bodies but different types, there is likely a polymorphic type that captures both of them. Let's see it:

utop # let rec forall (gen : unit -> 'a) (prop : 'a -> bool) (trials : int) : 'a option =
    if trials <= 0 then     
        None (* No counter-example found *)
    else 
        let xs = gen () in 
        if prop xs then 
            forall gen prop (trials - 1)
        else Some xs ;;
val forall : (unit -> 'a) -> ('a -> bool) -> int -> 'a option = <fun>

We can now freely use our generic forall:

utop # forall (fun _ -> Random.int 100) (fun x -> 1 + x = x + 1) 32;;
- : int option = None

utop # let gen_pair () = let x = Random.int 100 in let y = Random.int 50 in (x, x + y);;
val gen_pair : unit -> int * int = <fun>

utop # forall gen_pair (fun p -> fst p <= snd p) 32 ;;
- : (int * int) option = None

Combining generators together

Polymorphism --- and in particular, this more general version of forall --- is incredibly powerful because it allows us to very fluidly build up new generators from smaller ones. Last week, we saw the following generator for building lists:

let gen_list (len_bound : int) : unit -> int list = fun _ -> 
    let len = Random.int len_bound in 
    List.init len (fun _ -> Random.int 100)

We can now generalize this to generating lists of any type, by passing in a generator:

let gen_list (len_bound : int) (gen : unit -> 'a) : unit -> 'a list = fun _ -> 
    let len = Random.int len_bound in 
    List.init len gen

We will call gen_list above a generator combinator, because it builds new generators out of old ones. Let's see a few more combinators:

let gen_pair (genA : unit -> 'a) (genB : unit -> 'b) : unit -> 'a * 'b = fun _ ->
    let x = genA () in 
    let y = genB () in 
    (x, y)

let gen_option (gen : unit -> 'a) : unit -> 'a option = fun _ -> 
    if Random.bool then 
        None
    else Some (gen ())

let gen_mix (genA : unit -> 'a) (genB : unit -> 'a) : unit -> 'a = fun _ ->
    if Random.int 10 <= 5 then
        genA ()
    else genB ()

We can plug these directly into forall to easily make interesting tests. For example, let's verify that the theorem from HW1 is true:

\[ \forall p \in \texttt{bool}, \forall q \in \texttt{bool}, (\neg p \implies q) = (p \vee q). \]

One way of showing this is by using forall on pairs of booleans, by using gen_pair to combine boolean generators together:

utop # forall (gen_pair (fun _ -> Random.bool) (fun _ -> Random.bool)) (fun (p, q) -> (if not p then q else true) = (p || q)) 32;;
- : (bool * bool) option = None

Note

Above, we are using a new feature of fun we haven't seen before: for pairs, we can pattern match directly on the argument, instead of giving a name to the pair.

Datatypes with type parameters

A closely related feature to polymorphism is the ability to define new types that can depend on type variables. To a certain degree, we've already seen this with lists. Let's now take a look at the definition of list from the standard library. We will simplify it somewhat to see what's going on:

type 'a list = 
    | Nil
    | Cons of 'a * 'a list

The new thing here is that before the datatype definition we may write a type variable, such as 'a. In turn, the constructors of the datatype (Nil and Cons) Now, let's see what happens when we build up some lists:

utop # Cons (1, Cons (2, Nil)) ;;
- : int list = Cons (1, Cons (2, Nil))

This is what we expect. Let's see what happens if we mismatch the types:

utop # Cons (1, Cons (false, Nil)) ;;
Error: this expression (false) has type bool but an expression was expected of type int

We get a type error at the false. This is because we are checking the inner Cons inside of the outer one, holding a 1; thus, at this point, we already know that 'a = int. Thus, when we get to false, we are checking false against 'a; but we know that 'a = int, so this gives the type error. If we swapped the 1 and false, we would get the opposite error.

Finally, let's see what happens if we use Nil on its own:

utop # Nil;;
- : 'a list = Nil

On its own, Nil does not tell us what 'a should be, so it stays unspecialized.

Generic Trees

Let's now write a few datatypes to use this new feature:

type 'a tree = 
    | Leaf of 'a
    | Branch of 'a tree * 'a tree

Here, we are defining a generic binary tree which holds things of type 'a. For example:

utop # Branch (Leaf 1, Leaf 2);;
- : int tree = Branch (Leaf 1, Leaf 2)

Just like with lists, we can write generic algorithms over them.

let rec flip (t : 'a tree) : 'a tree = 
    match t with 
    | Leaf x -> Leaf x
    | Branch (t1, t2) -> Branch (flip t2, flip t1)

Testing it out:

utop # let t = Branch (Branch (Leaf 1, Leaf 2), Branch (Leaf 3, Leaf 4));; 
val t : int tree = Branch (Branch (Leaf 1, Leaf 2), Branch (Leaf 3, Leaf 4)) 

utop # flip t ;;
- : int tree = Branch (Branch (Leaf 4, Leaf 3), Branch (Leaf 2, Leaf 1))

We can map over trees, just like with lists. Remember that the generic type of map is ('a -> 'b) -> 'a list -> 'b list. Thus, swapping list with tree, we get:

let rec map_tree (f : 'a -> 'b) (t : 'a tree)  : 'b tree = 
    match t with
    | Leaf x -> Leaf (f x)
    | Branch (t1, t2) -> Branch (map_tree f t1, map_tree f t2)