Lecture 5.2: Manipulating Modules

Last class, we saw how to implement immutable stacks by hiding its implementation in an interface file. Let's see that again:

my_stack.mli

type t 
val empty : t 
val push : int -> t -> t 
val peek : t -> int option 
val pop : t -> t

Above, we have the .mli file for a simple stack interface. An interface file can contain types (either abstract declarations, like above, or full type definitions like type q = int list), as well as declarations for values, which we write using val. Then, we have the .ml file (the implementation):

my_stack.ml

type t = int list
let empty : t = []
let push : int -> t -> t = fun x xs -> x :: xs
let peek : t -> int option = fun xs ->
    match xs with
    | [] -> None
    | x :: _ -> Some x
let pop : t -> t = fun xs ->
    match xs with
    | [] -> []
    | _ :: xs' -> xs'

We can get very far with this style of writing modules. However, OCaml lets us do more: we can also write modules inline, in a single file. Let's replicate our stack below in this style:

module type MyStackSig = sig
    type t 
    val empty : t 
    val push : int -> t -> t 
    val peek : t -> int option 
    val pop : t -> t
end

module MyStack : MyStackSig = struct 
    type t = int list
    let empty : t = []
    let push : int -> t -> t = fun x xs -> x :: xs
    let peek : t -> int option = fun xs ->
        match xs with
        | [] -> None
        | x :: _ -> Some x
    let pop : t -> t = fun xs ->
        match xs with
        | [] -> []
        | _ :: xs' -> xs'
end

First, we write the stack interface. This is given by a module type, which is secretly what an .mli file is -- just a sequence of type and value declarations. Then, we have the module, which is the implementation --- this is what goes in the .ml file. Let's see an example use:

let rec build_stack_rec (n : int) (acc : MyStack.t) : MyStack.t = 
    if n <= 0 then acc else 
        let x = Random.int 50 in 
        build_stack_rec (n - 1) (MyStack.push x acc)

let build_stack n = build_stack_rec n MyStack.empty

let pop_twice (q : MyStack.t) = 
    MyStack.pop (MyStack.pop q)

Now, since we are focusing on abstraction: what happens if we try to break the abstraction boundary?

let bad = 
    let q = MyStack.empty in 
    24 :: q (* Fails: this expression (q) has type MyStack.t but an expression was expected of type int list *)

We get the error we want. Again, why do we want this? We want to prevent users of our code from doing bad things. Let's now see what happens if we change things a bit:

module MyStack (* : MyStackSig *) = struct 
    (* ... all same as before ... *)
    end

The only difference here is that we leave off the module type. Then, our bad example will type check:

let bad = 
    let q = MyStack.empty in 
    24 :: q (* OK: MyStack.t is aliased to int list *)

What's going on here is that, when we give our module a module type, we are effectively sealing it. Inside of MyStack, we are allowed to know that the stack is a list; but outside, we only know it behaves as a MyStackSig: we have a type t, but that's it.

Let's now see some more examples. Below is a simple interface for sets of strings:

module type StringSet = sig
    type t
    val empty : t
    val insert : string -> t -> t
    val remove : string -> t -> t
    val mem : string -> t -> bool
end

While below is an interface for maps, where the keys are strings:

module type StringMap = sig
    type 'a t
    val empty : 'a t
    val insert : string -> 'a -> 'a t -> 'a t
    val remove : string -> 'a t -> 'a t
    val lookup : string -> 'a t -> 'a option
end

On API Design

Designing a good interface for your code is one of, if not the, most important idea in programming. If the interface is too narrow, it might be restrictive. For example, what if we want to count the number of elements in a StringSet.t? We are out of luck.

But if the interface is too wide, then this presents other issues. What if we added a function elems : t -> string list? This forces our set interface to "remember" all of the elements we have stored in the set. While this might sound reasonable, consider this implementation of sets:

module HashSet : StringSet = struct
    type t = string list
    let empty = []
    (* 
        Digest.string : string -> string
        computes a hash of the input.
    *)
    let insert xs x = Digest.string x :: xs
    let remove xs x = List.filter (fun y -> y <> Digest.string x) xs
    let mem xs x = List.mem (Digest.string x) xs
end

In the above implementation, we don't store the elements themselves -- but instead hashes of the elements. This could have massive implications: the hashes stored above are only 128 bits (16 bytes), so if we were storing 1MB files by their hashes instead, we would be saving around over 60,000x storage space! Thus, we need to be very careful about what goes in an API. Too few things harms usability, while too many things forces implementations to actually implement the API.

This choice comes down to what the API is actually for. There are many ways in which sets are used. Are we storing elements for later? Or are we just testing for the presence of certain elements?

Cognitive Burden

Another important facet of program correctness --- equally important as formal testing --- is decreasing cognitive burden. While you can use testing to make sure that your code is correct, another approach is to set up your APIs so that it is hard to create bad code. Here's another API for sets:

module type OtherStringSet = sig
    type t 
    (* Add an element to a set, returning the set and its index into the set. *)
    val add : string -> t -> (t * int)

    (* Now, we must work with indices. *)
    val get_nth : int -> t -> string option
    val remove_nth : int -> t -> t
    val mem : int -> t -> bool
end

Which one is easier to use? I would say the original one, because the module type is closer to what I think of when I think of a set. This other interface, while as functional as before, is much harder to use, and therefore it's more likely that bugs will arise.

Constraining Implementations via Abstract Types

Let's look now deeper at abstract types by thinking about a few examples.

module type S = sig
    type t
    val x : t
    val y : t
end

module M : S = struct .... end

Suppose we have a module M of module type S. Then how can we create a value of type t? We can only make reference to x and y. Can we do anything with these values? Not really. We can compare them for equality, but that's about it. (We shouldn't do this, though -- because this breaks through the abstraction boundary.)

Booleans as an abstract type

What about this?

module type S2 = sig
    type t
    val tru : t
    val fls : t 
    val ite : t -> 'a -> 'a -> 'a
end

module M2 : S2 = struct ...  end

What can we do here? We can have tru : t and fls : t, and we can case on a t by using ite. While we can case on a t, there are still only two values of t we can use. This means that M2.t is in bijection with the booleans: any boolean can be turned into a t, and vice versa. We can actually implement these functions:

let t_of_bool (b : bool) : M2.t = if b then M2.true else M2.fls
let bool_of_t (x : M2.t) : bool = M2.ite x true false

Let's now implement it:

module M : S2 = struct
    type t = unit option
    let true = None
    let fls = Some ()
    let ite o x y = match o with | None -> x | Some _ -> y
end

Natural numbers as an abstract type

One more:

module type S3 = sig
    type t
    val z : t
    val s : t -> t
end
module M3 : S3 = struct ... end

How many values of type t can we create? Infinitely many! There is one value of t for every natural number. Zero corresponds to z, while n + 1 corresponds to s n. We can make this bijection explicit by adding a way to recurse on values:

module type S4 = sig
    type t
    val z : t
    val s : t -> t
    (* `recurse x v f` means: if x = zero, then v; else, compute the answer for "x - 1", and apply f to it. *)
    val recurse : t -> 'a -> ('a -> 'a) -> 'a
end
module M4 : S3 = struct ... end

We can now convert to and from (nonnegative) integers:

let int_of_t (x : M4.t) : int = 
    M4.rec x 0 (fun x -> x + 1)

let rec t_of_int (x : int) : M4.t = 
    if x <= 0 then M4.z else 
        M4.s (t_of_int (x - 1))

Let's implement this interface in two different ways:

module M4 : S3 = struct 
    type t = unit list
    let z = []
    let s xs = () :: xs
    let rec recurse xs v f = 
        match xs with
        | [] -> v
        | _ :: xs' -> f (recurse xs' v f)
end

module M5 : S3 = struct
    type t = float
    let z = 0.0
    let s x = x +. 1.0
    let rec recurse x v f = if x <= 0.0 then v else f (recurse (x -. 1.0) v f)
end

module M6 : S3 = struct
    type t = | Z | S of t
    type z = Z
    let s x = S x
    let rec recurse x v f =
        match x with
        | Z -> v
        | S x' -> f (recurse x' v f)
end