Lecture 2.1: More about functions; strings and printing; specifying `max3`

Last time, we started to cover functions in OCaml; let's see more about that here.

Multi-argument functions

Because functions are not special in OCaml relative to other types (e.g., integers or strings), they can return other functions:

utop # let f = fun (x : int) -> fun (y : int) -> x + y;;
val f : int -> int -> int = <fun>

Here, f has type int -> int -> int. This should be read as int -> (int -> int) (i.e., -> is right associative); in other words, f is a function which, given an int, returns a function of type int -> int.

Instead of thinking of f as a function which returns a function, we can also think of it as a function which takes two arguments. Indeed, f's declaration can be abbreviated as follows:

utop # let f = fun (x : int) (y : int) -> x + y;;
val f : int -> int -> int = <fun>

Additionally, an alternative (often more convenient) syntax for writing functions is as follows:

utop # let f (x : int) = x + 1;;
val f : int -> int = <fun>

All three of the above declarations for f are equivalent.

This style of writing functions also works with multiple arguments:

utop # let f (x : int) (y : int) = x + y;;
val f : int -> int -> int = <fun>

Annotating return types

When you write functions like this, you can annotate the return type:

utop # let f (x : int) (y : int) : int = x + y;;
val f : int -> int -> int = <fun>

You can do this with regular values, too:

utop # let x : int = 3;;
val x : int = 3

Partial Application

Since multi-argument functions in OCaml are just functions which return functions, we can partially apply them:

utop # let f (x : int) (y : int) = x + y;;
val f : int -> int -> int = <fun>

utop # let add_two = f 2;;
val add_two : int -> int = <fun>

utop # add_two 8 ;;
val - : int = 10

Higher-order functions

Functions can also be higher-order, meaning they can take functions as inputs (because function types are treated the same as any other type):

utop # let f (g : int -> int) = g 2;;
val f : (int -> int) -> int = <fun>

utop # f (fun (x : int) -> x * 5);;
- : int = 10

Note that f's type is (int -> int) -> int; this is a single argument function that takes a function of type int -> int as its argument. This is different than int -> int -> int, which takes two arguments of type int.

Exercise

What will this evaluate to?

utop # let f (g : int -> int) (x : int) = g (x + 1);;
val f : (int -> int) -> int -> int = <fun>

utop # f (fun x -> x * 5) 3;;
_ : int = ???

Exercise

What will this evaluate to?

utop # let f (g : (int -> int) -> int) (x : int) = g (fun y -> x + y);;
val f : ((int -> int) -> int) -> int -> int = <fun>

utop # f (fun h -> h 42) 10;;
- : int = ???

Strings and Printing

Now that we have seen functions, we can analyze the program from the end of last class:

tst.ml

let x = 3

let y = x + 1

let _ = 
    (* Print the value of y *)
    print_endline ("Value of y: " ^ string_of_int y)

First, string_of_int has type int -> string. (In general, functions named X_of_Y in OCaml are used to convert things of type Y to type X.) Then, print_endline has type string -> unit; you can verify this in utop or in VSCode. The result type of print_endline, unit, is a type that only has one value, ():

utop # ();;
- : unit = ()

We use unit to denote the return types of functions that are used not for their return values, but instead for their side effects. print_endline doesn't return anything useful; instead, when we run it, it prints its argument to the terminal.

utop # print_endline "this is being printed";;
this is being printed
- : unit = ()

The second line is the result of print_endline, while the third line is the return value, ().

Example

utop # print_endline ("Two plus two is " ^ string_of_int 4);;
Two plus two is 4
- : unit = ()

Now, let's combine print_endline and functions together:

utop # let printer (x : int) = print_endline ("Value of x: " ^ string_of_int x);;
val printer : int -> unit = <fun>

utop # printer 42;;
Value of x: 42
- : unit = ()

And now the opposite:

utop # let apply_with_42 (f : int -> unit) = f 42;;
val apply_with_42 : (int -> unit) -> unit = <fun>

utop # apply_with_42 (fun (_ : int) -> ());;
- : unit = ()

utop # apply_with_42 (fun (x : int) -> print_endline ("Value of x: " ^ string_of_int x));;
Value of x: 42
- : unit = ()

(Above, remember that _ is a placeholder variable name when you don't want to use the variable.)

Sequencing

Side effects are generated when an expression is run. Now, let's see what happens when we run multiple expressions at once:

utop # let _ = print_endline "hi" in 
       let _ = print_endline "there" in 
       42;;
hi
there
- : int = 42

This expression evaluated as follows:

print_endline "hi" evaluates; produces its side effect; and returns ();
print_endline "there" evaluates; produces side effect; and returns (); then, finally,
We return 42.

Whenever we write let x = e1 in e2, for any e1 or e2, e1 will evaluate and produce a side effect before e2 runs.

This pattern of using let _ = .. in .. is so common that OCaml lets us abbreviate it like so:

utop # print_endline "hi"; print_endline "there"; 42;;
hi
there
- : int = 42

Whenever we have a single semicolon, like e1; e2, and e1 has type unit (so it is only run for its side effects) this means the exact same thing as let _ = e1 in e2.

Now, let's combine functions, printing, and ; together:

utop # let print_twice (s : string) = print_endline s; print_endline s;;
val print_twice : string -> unit = <fun>

utop # print_twice "hi";;
hi
hi
- : unit = ()

Exercise

What will this do?

utop # let print_and_add (x : int) (y : int) = 
       print_endline ("Adding " ^ string_of_int x ^ " and " ^ string_of_int y);
       x + y;;
val print_and_add : int -> int -> int = <fun>

utop # print_and_add 3 4;;
Adding 3 and 4
- : int = 7

Exercise

What will happen below?

let maybe_print (x : int) (s : string) = 
    if x = 2 then print_endline s else ()

let _ = 
    maybe_print 3 "three";
    maybe_print 4 "four"

Exercise

utop # let apply_and_print (f : int -> int) (x : int) = 
       print_endline (string_of_int (f x))
val apply_and_print : (int -> int) -> int -> unit = <fun>

utop # apply_and_print (fun (x : int) -> 
    print_endline "Inside of f";
    x + 1) 42;; 
Inside of f
43
- : unit = ()

Specifying a "maximum" function

Already with the ingredients that we have --- namely, higher-order functions --- we can already start to study what it means to write a specification. We will do so with a case study, a three-valued version of max:

(* Return the highest of the three inputs. *)
let max3 (i : int) (j : int) (k : int) = ...

How do we know we got it right?

The specification in the comment is pretty unambiguous. But given this implementation..

let max3 (i : int) (j : int) (k : int) = 
    if i >= j then 
        if i >= k then 
            i
        else 
            k
    else 
        if j >= k then 
            j
        else k

how do we know it is right? We could manually inspect it, but that won't work all the time (think about max4!), and is really easy to get wrong.

Instead, we want to implement the comment specification as a piece of formal logic, implemented in OCaml. How would we do so?
First, before trying to implement it, let's write down the mathematical property max3 should have.

\(\forall i, \forall j, \forall k, \mathsf{max3}(i, j, k) >= i \wedge \mathsf{max3}(i, j, k) >= j \wedge\mathsf{max3}(i, j, k) >= k.\)

Is this correct? Is this sufficient? In other words, we are looking for a formula that encodes our intuition. If there is a solution for \(\mathsf{max3}\) that is different than what we actually want, that means there's a problem with the formula. I claim there are functions that make the above formula true, but you wouldn't count as a valid version.

Here's a fixed version: \(\forall i, \forall j, \forall k, \mathsf{max3}(i, j, k) >= i \wedge \mathsf{max3}(i, j, k) >= j \wedge\mathsf{max3}(i, j, k) >= k \wedge \mathsf{max3}(i, j, k) \in \{i,j,k\}.\)

Next, we can implement the property. How could we turn this into a piece of OCaml code?

let max3_spec (f : int -> int -> int -> int) (i : int) (j : int) (k : int) : bool = 
    let z = f i j k in 
    z >= i && z >= j && z >= k && (z == i || z == j || z == k)

Finally, we can test the property. Given max3, how should we test if it fits max3_spec? We can use Random.int.

let tester (f : int -> int -> int -> int) = 
    let i = Random.int 1000 in 
    let j = Random.int 1000 in 
    let k = Random.int 1000 in 
    if max3_spec f i k j then 
        () (* Do nothing *)
    else 
        print_endline ("Found a counterexample: (" ^ 
                        string_of_int i ^ ", " ^ 
                        string_of_int j ^ ", " ^ 
                        string_of_int k ^ ")")

On our good version of max3, tester should output nothing. But with a bad version:

let max3_bad (i : int) (j : int) (k : int) = 
    i

tester should give us a counter-example of max3_bad's correctness.

Note

What happens if we change the random range to something smaller (such as 3)?

This is the essence of property-based testing, which can be broken into these three steps:

Mathematically model a specification of correctness;
Implement the model; and
Test the model against an implementation.

We will go deeper into each of these three steps over the next few weeks.

Lecture 2.1: More about functions; strings and printing; specifying max3

Multi-argument functions

Partial Application

Higher-order functions

Strings and Printing

Sequencing

Specifying a "maximum" function

Lecture 2.1: More about functions; strings and printing; specifying `max3`