Lecture 5.2: Manipulating Modules

Last class, we saw how to implement immutable stacks by hiding its implementation in an interface file. Let's see that again:

myset.mli

type 'a set = 'a list
val empty : 'a set
val mem : 'a -> 'a set -> bool
val add : 'a -> 'a set -> 'a set
val remove : 'a -> 'a set -> 'a set

Above, we have the .mli file for a simple stack interface. An interface file can contain types (either abstract declarations, like above, or full type definitions like type q = int list), as well as declarations for values, which we write using val. Then, we have the .ml file (the implementation):

myset.ml

(* {1 2 3 3} = {1 2 3} = {3 2 1} *)

type 'a set = 'a list

let empty : 'a set
  = []

let rec mem (x : 'a)   (s : 'a set) : bool
  = match s with
  | [] -> false
  | x' :: s' -> if x' = x then true else mem x s'

let add (el : 'a) (s : 'a set) : 'a set
  = if mem el s then s else el :: s
  (* = el :: s *)

let rec remove (x : 'a) (s : 'a set) : 'a set
  = match s with
  | [] -> []
  | x' :: s' -> if x' = x then s' else x' :: remove x s'

We can get very far with this style of writing modules. However, OCaml lets us do more: we can also write modules inline, in a single file. Let's replicate our stack below in this style:

(* keeping it simple: just for ints *)
module type MyIntSetSig = sig
  type t
  val empty : t
  val mem : int -> t -> bool
  val add : int -> t -> t
  val remove : int -> t -> t
end

module MyIntSet : MyIntSetSig = struct 
  type t = int list
  let empty : t = []
  let rec mem (x : int) (s : t) : bool = List.mem x s
  let add (el : int) (s : t) : t = el :: s
  let rec remove (x : int) (s : t) : t
    = List.filter (fun x' -> x' <> x) s
end

We can also make this polymorphic

module type MySetSig = sig
  type 'a t 
  val empty : 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val remove : 'a -> 'a t -> 'a t 
end

module MySet : MySetSig = struct 
  type 'a t = 'a list
  let empty : 'a t = []
  let rec mem (x : 'a) (s : 'a t) : bool = List.mem x s
  let add (el : 'a) (s : 'a t) : 'a t = el :: s
  let rec remove (x : 'a) (s : 'a t) : 'a t
    = List.filter ((<>) x) s

end

First, we write the stack interface. This is given by a module type, which is secretly what an .mli file is -- just a sequence of type and value declarations. Then, we have the module, which is the implementation --- this is what goes in the .ml file. Let's see an example use:

let ex0 = MySet.empty ;;
let ex1 =
  MySet.add 1 (
  MySet.add 2 (
  MySet.add 3 xs ));;

(* scoped open *)
let ex2 = let open Myset in
  add 1 (add 2 (add 3 empty))

(* local open *)
let ex3 = Myset.(add 1 (add 2 (add 3 empty)))

let rec build_set_acc (acc : int MySet.t) (n : int) : int MySet.t =
  if n <= 0
  then acc
  else
    let x = Random.int 10 in
    build_set_acc (MySet.add x acc) (n - 1)

let build_set (n : int) = build_set_acc MySet.empty n 

let build_set : int -> MySet.t = build_set_acc MySet.empty

(* utop *)
utop# let xs = build_set 10;;
utop# MySet.mem 3 xs;;
utop# MySet.mem 4 xs;;
utop# MySet.mem 5 xs;;

Now, since we are focusing on abstraction: what happens if we try to break the abstraction boundary?

let bad = 
    let q = MySet.empty in 
    24 :: q (* Fails: this expression (q) has type MySet.t but an expression was expected of type int list *)

We get the error we want. Again, why do we want this? We want to prevent users of our code from doing bad things. Let's now see what happens if we change things a bit:

module MySet (* : MySetSig *) = struct 
    (* ... all same as before ... *)
    end

The only difference here is that we leave off the module type. Then, our bad example will type check:

let bad = 
    let q = MySet.empty in 
    24 :: q (* OK: MySet.t is aliased to int list *)

What's going on here is that, when we give our module a module type, we are effectively sealing it. Inside of MySet, we are allowed to know that the stack is a list; but outside, we only know it behaves as a MySetSig: we have a type t, but that's it.

Let's now see some more examples. Below is a simple interface for sets of strings:

module type StringSet = sig
  type t
  val empty : t
  val mem : string -> t -> bool
  val add : string -> t -> t
  val remove : string -> t -> t 
end

On API Design

Designing a good interface for your code is one of, if not the, most important idea in programming. If the interface is too narrow, it might be restrictive. For example, what if we want to count the number of elements in a StringSet.t? We are out of luck.

But if the interface is too wide, then this presents other issues. What if we added a function elems : t -> string list? This forces our set interface to "remember" all of the elements we have stored in the set. While this might sound reasonable, consider this implementation of sets:

module HashSet : StringSet = struct
    type t = string list
    let empty = []
    (* 
        Digest.string : string -> string
        computes a hash of the input.
    *)
    let add xs x = Digest.string x :: xs
    let mem xs x = List.mem (Digest.string x) xs
    let remove xs x = List.filter (fun y -> y <> Digest.string x) xs
end

In the above implementation, we don't store the elements themselves -- but instead hashes of the elements. This could have massive implications: the hashes stored above are only 128 bits (16 bytes), so if we were storing 1MB files by their hashes instead, we would be saving around over 60,000x storage space! Thus, we need to be very careful about what goes in an API. Too few things harms usability, while too many things forces implementations to actually implement the API.

This choice comes down to what the API is actually for. There are many ways in which sets are used. Are we storing elements for later? Or are we just testing for the presence of certain elements?

Cognitive Burden

Another important facet of program correctness --- equally important as formal testing --- is decreasing cognitive burden. While you can use testing to make sure that your code is correct, another approach is to set up your APIs so that it is hard to create bad code. Here's another API for sets:

module type OtherStringSet = sig
    type t 
    (* Add an element to a set, returning the set and its index into the set. *)
    val add : string -> t -> (t * int)

    (* Now, we must work with indices. *)
    val get_nth : int -> t -> string option
    val remove_nth : int -> t -> t
    val mem : int -> t -> bool
end

Which one is easier to use? I would say the original one, because the module type is closer to what I think of when I think of a set. This other interface, while as functional as before, is much harder to use, and therefore it's more likely that bugs will arise.