Assignment 0: OCaml warmup, part 1:   basics
1 Setup
1.1 Using login.khoury.northestern.edu
1.2 On your own computer
1.3 OSX Instructions
1.4 Windows Instructions
2 Programming in OCaml — The Basics
2.1 The Simplest Program
2.2 Defining and Calling Functions
2.3 Recursive Functions
2.4 Testing with OUnit
2.5 Exercises
3 Programming in OCaml — Datatypes
3.1 Binary Trees with type
3.2 Manipulating Data with match
3.3 Exercises
4 Programming in OCaml — Lists and Parametric Polymorphism
4.1 Linked Lists, By Hand
4.2 Linked Lists, Built-in
4.3 Exercises
5 Tuples
6 The option Type
7 Grading standards
8 Submission
8.1 Deliverables
8.2 Instructions
8.12

Assignment 0: OCaml warmup, part 1: basics🔗

Due: Sun 01/14 at 8:59pm

git clone

We will use three programming languages in this course: OCaml, C (not C++), and x64 assembly. Depending on your experience so far, you should all have some background in C and possibly x64, and the C we use won’t be surprising. You may not have seen OCaml before, and warrants an introduction.

This writeup serves as both a reference for some of the OCaml topics we’ll need in the course, and your first assignment. You should do all the numbered Exercises throughout the document for your first assignment.

1 Setup🔗

1.1 Using login.khoury.northestern.edu🔗

Add the following lines to your .bash_profile file:

# OPAM configuration
export OPAMROOT=/net/proj/ocaml/.opam
export PATH=/net/proj/ocaml/opam:$PATH
. /net/proj/ocaml/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true

Log out and log back in. You should now have a working ocaml, with the libraries above already installed for you. Confirm this by running ocaml --version to ensure that it’s on your path and runnable.

1.2 On your own computer🔗

OPAM is a package manager for OCaml. If you know of pip or easy_install for Python, it’s the same idea. OCaml is installed on the department machines, but you need to do a small bit of setup to use some libraries we’ll need for the course. Assuming you’re using Bash as your shell, open up the file .profile in your home directory (with e.g. emacs ~/.profile), and add this line at the bottom, if it’s not already there (replacing my username with your own):

test -r /home/blerner/.opam/opam-init/init.sh && . /home/blerner/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true

Then run:

$ source ~/.bashrc

(In this and all subsequent assignments, the $ indicates the command prompt; you do not type that character!) This makes it so each time you open a terminal, the build commands you run for OCaml will be able to use some libraries we need for the course.

Be sure to install a recent version of OPAM – the latest is currently v2.1.5. Ubuntu, at least, has not yet updated its packages to v2.0.6, so you’ll need to use a PPA or install a binary in order to get a newer version – the latest is currently v2.1.5. (WARNING! If you are using the Fish shell with older OPAM versions (2.0.5 or lower), setting up OPAM will possibly clobber your MANPATH environment variable, such that trying to run man <anything> will fail. If you’re stuck with this older version: Run opam config var prefix to find out the path where OPAM has installed your current setup. On my machine, that gives /home/blerner/.opam/4.09.0. Within that directory, edit the file .opam-switch/environment, and comment out the line that sets MANPATH with a #. You will need to redo this step every time you run an opam install or opam update command.)

Once you’ve installed OPAM, install the needed OCaml packages:

$ opam init
# this takes a while
$ opam switch create 4.14.1
$ opam install extlib ounit batteries
# wait a while
# follow the directions for setting up your environment for opam, then
# the Makefile should work for you

1.3 OSX Instructions🔗

If you want to work on your OSX laptop, you can install OCaml and OPAM via Homebrew first, then install the necessary packages manually:

$ brew install ocaml opam

Then follow the Setup instructions above.

1.4 Windows Instructions🔗

Sorry, it’s not feasible for me to support Windows, BSD, or any other more exotic platforms. You’re free to try building things on those platforms, but make sure your assignments run on the department machines before submitting. The assignments should work on either OSX or on the department machines; I’ll do my best to support both and note any exceptions as time goes on. When in doubt, use a department machine.

So far, I’ve had some success using Windows Subsystem for Linux, specifically the newer version 2. Within WSL, install Ubuntu, and then continue with the generic Linux instructions above.

2 Programming in OCaml — The Basics🔗

This section covers some basics first, and then gives a structured introduction to how we’ll program in OCaml in this course.

2.1 The Simplest Program🔗

OCaml files are written with a .ml extension. An OCaml file is somewhat similar to a Python file: when run, it evaluates the file directly (unlike, for example, C++, which designates a special main function). An OCaml file consists of

For example:

open Printf

let message = "Hello world";;
(printf "%s\n" message)

The first line includes the built-in library for printing, which provides functions similar to fprintf and printf from stdlib in C. The next two lines define a constant named message, and then call the printf function with a format string (where %s means “format as string”), and the constant message we defined on the line before.

Put this in a file called hello.ml and run it with:

$ ocaml hello.ml
Hello world

Most of our programs will be more complex than this, and much of the infrastructure for the “main” function will be provided, but it’s useful to see the simplest case first.

2.2 Defining and Calling Functions🔗

One thing that we’ll do over and over again is define functions. Here’s an example of a function definition in OCaml:

let max (n : int) (m : int) : int =
  if n > m then n else m;;

It uses the keyword let followed by the name of the function (here, max). Next comes the parameter list. Each parameter is enclosed in parentheses, and has the parameter’s name, then a colon, then its type. So n and m are the parameters of max, and they are both type int. The final colon followed by int describes the return type of the function. Then there is an = sign, followed by the function body.

This declaration is similar to the following in C/C++/Java:

int max(int n, int m) {
  if(n > m) { return n; }
  else { return m; }
}

One notable difference is that the OCaml function does not have any return statements. We’ll talk about how to think about the “return” value of a function without return statements next. It’s also important to note that the declaration in OCaml ends in ;;. This is a required syntactic convention for all top-level declarations in OCaml.

We’ll get to a more robust notion of testing in a little bit. We can check that max works by defining a useful top-level call with print statements:

open Printf

let max (n : int) (m : int) : int =
  if n > m then n else m;;

(printf "Should be 5: %d\n" (max 5 4));
(printf "Should be 4: %d\n" (max 3 4));
(printf "Should be 4: %d\n" (max 4 4));

You can copy this program into a file called max.ml and run it with ocaml max.ml.

There are a few things to explain here. First, the syntax for function calls in OCaml is different than you may be used to. Instead of writing

some_function(arg1, arg2, ...)

// for example

max(4, 5)

The surrounding parentheses can be omitted in some cases, but it’s always safe to include them to be clear. as we would in C++ or Python, in OCaml we write

(some_function arg1 arg2 ...)

(* for example *)

(max 4 5)

This is just a useful model; everything you know about stacks and memory diagrams is still true (and in fact, we’ll talk about stacks in quite a bit of detail this semester). But substitution is a very helpful model for reasoning in the style of programming we’ll do. That’s just a syntactic difference. There’s also a useful distinction in how I prefer to think about what happens when we call a function in OCaml. Rather than thinking about a call to max creating a new stack frame, let’s think about what happens if we substitute the provided argument values for the parameters in the body of max, and continue by evaluating the function body.

So, for example, the call to max below takes a step to the substituted form:

   (max 4 5)

=> if 4 > 5 then 4 else 5

Then we can think about how the if expression takes steps. First, it evaluates the conditional part, and based on that value being true or false, it evaluates one or the other branch:

   if 4 > 5 then 4 else 5

=> if false then 4 else 5

=> 5

From this sequence of steps, we say that (max 4 5) evaluates to 5. This gives us a way to think about evaluation that doesn’t require a notion of a return statement.

With this idea of substitution in mind, we can think about how the sequence of printf expressions we wrote will evaluate:

  (printf "Should be 5: %d\n" (max 5 4));
  (printf "Should be 4: %d\n" (max 3 4));
  (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 5: %d\n" (if 5 > 4 then 5 else 4));
   (printf "Should be 4: %d\n" (max 3 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 5: %d\n" (if true then 5 else 4));
   (printf "Should be 4: %d\n" (max 3 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 5: %d\n" 5);
   (printf "Should be 4: %d\n" (max 3 4));
   (printf "Should be 4: %d\n" (max 4 4));

The rule for semicolon-separated sequences is that they are evaluated in order, and the value resulting from each expression is ignored once it is done.

=> <internal-to-OCaml-printing of "Should be 5: 5\n">;
   (printf "Should be 4: %d\n" (max 3 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 4: %d\n" (max 3 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 4: %d\n" (if 3 > 4 then 3 else 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 4: %d\n" (if false then 3 else 4));
   (printf "Should be 4: %d\n" (max 4 4));

=> (printf "Should be 4: %d\n" 4);
   (printf "Should be 4: %d\n" (max 4 4));

=> <internal-to-OCaml-printing of "Should be 4: 4\n">;
   (printf "Should be 4: %d\n" (max 4 4));

... and so on

2.3 Recursive Functions🔗

This is because, as we’ll see, there are lots of trees to process in a compiler, and trees have a fundamentally recursive structure. A lot of the code we write this semester will be recursive functions. OCaml distinguishes between functions that can contain recursive calls and functions that cannot. We saw the latter kind above in max which simply used the let keyword. We can define a recursive function by using let rec:

let rec factorial (n : int) : int =
  if n <= 1 then 1
  else n * (factorial (n - 1));;

The substitution-based rules are a little more interesting when thinking about evaluating a call to factorial:

  (factorial 3)

=> (if 3 <= 1 then 1 else 3 * (factorial (3 - 1)))

=> (if false then 1 else 3 * (factorial (3 - 1)))

=> (3 * (factorial (3 - 1)))

=> (3 * (factorial 2))

=> (3 * (if 2 <= 1 then 1 else 2 * (factorial (2 - 1)))

...

=> (3 * (2 * (factorial (2 - 1))))

...

=> (3 * (2 * 1))

=> 6

Here, we can see the chained multiplications “stack up” during the recursive calls. Writing this in a substitution-based style makes it easy to track where the return values of function calls end up.

2.4 Testing with OUnit🔗

Testing by printing values becomes pretty onerous when we want to write more than a few examples. In this course, we’ll use a library called OUnit to write tests.

With OUnit, we will write declarations in one file, and test them in another. The code provided in your checkout has two files: functions.ml, which you’ll fill in with some implementations in the rest of the exercises, and test.ml, which will contain tests. This will become a common layout for how we write our programs in this course.

The weird-looking >:: operator is described here in terms of more basic concepts, if you’re interested. It’s just a shorthand for constructing a test value with a name. A test in OUnit is a name paired with a function of one argument. The function uses one of several different test predicates to check a computation; the one we’ll use most commonly is assert_equal. The syntax >:: is used to combine the name and the function together into a test. Here’s an example:

open OUnit2
let check_fun _ = (* a function of one argument *)
  assert_equal (2 + 2) 4;;

let my_first_test = "my_first_test">::check_fun;;

Most of this is just boilerplate that you won’t have to think much, if at all, about. But it’s useful to explain it once. Now my_first_test is a named test value. Please be careful not to use spaces in test names, because running the test suite will produce one log file per test, and those files will have spaces in their names, which may cause some headaches.

Note that we used an underscore when defining the parameter of check_fun; we can use an underscore to indicate to OCaml that we don’t care about the argument (there needs to be a parameter because of how the testing library works, even though we won’t use the parameter). We can run our test by creating a suite out of a list of tests,Again, please avoid spaces in the suite name. and running the suite:

let suite = "suite">:::[my_first_test];;
run_test_tt_main suite

The command used is not ocaml in this case, but a wrapper around ocaml called ocamlfind that knows how to search your system for packages installed with e.g. OPAM. Candidly, OCaml’s build system can be a little onerous, so I’m not teaching it explicitly. If you want to use OCaml for a large project outside this course, I recommend learning about the corebuild tool that comes with Real World OCaml. To build and run the given skeleton, use the provided Makefile that does the work of building for you. In this case, you just need to run

$ make test
$ ./test

in order to run the tests.

We can also add tests that fail to see what happens:

let check_fun2 _ = (* a failing test *)
  assert_equal (2 + 2) 5;;

let my_second_test = "my_second_test">::check_fun2;;

If we add this test to the suite and run, we get a failure:

$ ./test
.F
==============================================================================
Error: suite:1:my_second_test.

File "/Users/joe/.../oUnit-suite-prob#02.log", line 2, characters 1-1:
Error: suite:1:my_second_test (in the log).

Raised at file "src/oUnitAssert.ml", line 45, characters 8-27
Called from file "src/oUnitRunner.ml", line 46, characters 13-26

not equal
------------------------------------------------------------------------------
Ran: 2 tests in: 0.14 seconds.
FAILED: Cases: 2 Tried: 2 Errors: 0 Failures: 1 Skip:  0 Todo: 0 Timeouts: 0.

This output identifies the failing test by name (my_second_test), though it doesn’t tell us much more than that. Another annoying thing about the way we wrote those tests is that defining a new function for every test causes significant extra typing. To get a little more information out, we can pass assert_equal an optional argument that specifies how to turn the values under test into a string for printing. We can bundle that up inside a function that creates the test with its name. So, for example, we can define a function that creates tests comparing integers to integers:

let t_int name value expected = name>::
  (fun _ -> assert_equal expected value ~printer:string_of_int);;

let my_third_test = t_int "my_third_test" (2 + 2) 7;;
let my_fourth_test = t_int "my_fourth_test" (2 + 2) 4;;

If we add these two tests to the suite, we see a much more useful failure report that says expected: 7 but got: 4. I’ll often provide useful helper functions for testing with examples, but you may also decide to write your own for different kinds of tests as the semester goes on.

(Note: the funny ~printer: syntax is a labeled argument, and is a mechanism for passing arguments by name rather than just by position. Functions must be defined to expect labeled arguments; you can’t simply assume that an arbitrary argument happens to have a name. Additionally, labeled arguments might be optional, and there might be a default value supplied for them. Labeled, optional arguments are prohibited from being the final argument for a function (why do you suppose that must be?). If you want to supply a different value-to-string converter for assert_equal, you’d need to write ~printer:your_function_here.)

2.5 Exercises🔗

  1. Implement fibonacci as an OCaml function that takes an integer n and returns the \(n^{th}\) Fibonacci number. Write out the evaluation of (fibonacci 3) in substitution style.

  2. Write tests for max and fibonacci using t_int.

3 Programming in OCaml — Datatypes🔗

Programming with only integers, we wouldn’t make much progress on building a compiler. The next thing we need to do is understand how to create new datatypes in OCaml, and program with them.

3.1 Binary Trees with type🔗

Let’s start with a datatype we all ought to know well: binary trees. We know we’ll need to represent a binary tree node somehow, which has a value and two children. For now, let’s say the value has to be a string. In OCaml, we can define such a node using the keyword type:

type btnode =
  | Leaf
  | Node of string * btnode * btnode

Translated into English, this reads:

A binary tree node is either a leaf of the tree, which has no fields, or a node, which has three fields: a string, a binary tree node, and another binary tree node.

This defines what we call constructors for Leaf and Node, which we can use to construct trees. Here are a few examples of trees and their corresponding btnode value:

    "a"       Node("a",

   /   \        Node("b", Leaf, Leaf), Node("c", Leaf, Leaf))

"b"     "c"

    "a"       Node("a",

   /            Node("b", Leaf, Leaf), Leaf)

"b"

    "a"       Node("a",

   /            Node("b",

"b"               Leaf, Node("c", Leaf, Leaf)), Leaf)

   \

    "c"

In C++ implementation of binary trees, we would commonly use NULL to represent a Leaf; in Fundies 2 or OOD, we would likely use a dedicated Leaf class instead. Unlike NULL in C++, Leaf can only be a btnode and can never be confused for a value of some other type. Each position with no child corresponds to a Leaf, and the others correspond to uses of Node. We call Leaf and Node variants of the btnode type.

3.2 Manipulating Data with match🔗

The next question is how to work with these values. For example, how can we construct an in-order concatenation of the strings in a btnode as we’ve defined it? That is, how do we fill in this function:

let rec inorder_str (btn : btnode) : string =
  ...

The next feature we need to introduce is match, which allows us to examine which variant of a type a particular value has, and extract the values of its fields. Here are some examples:

let m1 = match Leaf with
  | Leaf -> true
  | Node(s, left, right) -> false;;

(* m1 is true *)

let m2 = match Leaf with
  | Leaf -> 44
  | Node(s, left, right) -> 99;;

(* m2 is 44 *)

let m3 = match Node("a", Leaf, Leaf) with
  | Leaf -> "z"
  | Node(s, left, right) -> s;;

(* m3 is "a" *)

let m4 = match Node("a", Node("b", Leaf, Leaf), Leaf) with
  | Leaf -> "z"
  | Node(s, left, right) ->
    match left with
      | Leaf -> "y"
      | Node(s2, left2, right2) -> s2;;

(* m4 is "b" *)

From these examples, we can see how match must work. It inspects the value after the match keyword, and selects the branch that corresponds to the variant of that value. Then it extracts the fields from the value, and substitutes them for the names given in the branch. Let’s use the m4 example to make that concrete:

  match Node("a", Node("b", Leaf, Leaf), Leaf) with
    | Leaf -> "z"
    | Node(s, left, right) ->
      match left with
        | Leaf -> "y"
        | Node(s2, left2, right2) -> s2

(* substitute Node("b", Leaf, Leaf) for left *)

=> match Node("b", Leaf, Leaf) with
     | Leaf -> "y"
     | Node(s2, left2, right2) -> s2

(* substitute "b" for s2 *)

=> "b"

With match available, we can now fill in the body for inorder_str. We can start by writing out a skeleton of the match structure for a btnode, as most functions over btnodes will need to match on the node to decide what to do.

let rec inorder_str (bt : btnode) : string =
  match bt with
    | Leaf -> ...
    | Node(s, left, right) -> ...

Now we can ask what the inorder traversal should yield in the case of a leaf of the tree (or an empty tree altogether). In this case, that ought to be an empty string. So the Leaf case should be filled in with "". How about for Nodes? We know an inorder traversal should have the elements to the left in order, then the current node, then the elements to the right. We can get the elements to either side via a recursive call, and then we just need one more piece of information, which is that ^ is the operator for concatenating strings in OCaml:

let rec inorder_str (bt : btnode) : string =
  match bt with
    | Leaf -> ""
    | Node(s, left, right) ->
      (inorder_str left) ^ s ^ (inorder_str right)

3.3 Exercises🔗

  1. This is a trick question.Write a test function t_string that’s like t_int, but tests for equality of strings. Can you write a function that produces a string form of the results like t_int did for integers?

  2. Write at least five interesting tests for inorder_str.

  3. Write out the substitution-based evaluation of inorder_str on a tree with at least 3 nodes.

  4. Write a function size that takes a btnode and produces an integer that is the number of Nodes in the tree.

  5. Write a function height that takes a btnode and produces an integer that is the height of the tree.

  6. Make sure to test the above two functions.

4 Programming in OCaml — Lists and Parametric Polymorphism🔗

4.1 Linked Lists, By Hand🔗

Since we’ve seen binary trees, it’s natural to think about a similar definition for the nodes of a linked list. One OCaml datatype we could write is:

type llist =
  | Empty
  | Link of string * llist

That is, a list is either Empty (the end of the list), or a Link of a string onto another list. Of course, this would require that we write additional datatype declarations for lists of numbers, lists of booleans, lists of binary trees, and so on, if we needed those shapes of data. The natural solution is to make the datatype generic over the kind of data it uses. OCaml lets us do this by defining datatypes with type variables that can be filled with any type. Type variables are written with a leading ' character:

This idea corresponds to the use of templates in C++, or generic types in Java.

type 'a llist =
  | Empty
  | Link of 'a * 'a llist

The types of the fields in Link have changed with this addition. The first field can now hold a value of the list’s type, and the second must hold a llist that contains elements of that type as well. That is, this describes a homogeneous linked list, where all elements will have the same type. We won’t have much use for heterogeneous lists, which would introduce different complications.

Lets say we want to write a function sum that takes a llist of numbers and adds them up. We now need to describe its type in terms of the contents, which will be an int:

let rec sum (l : int llist) : int =
  match l with
    | Empty -> 0
    | Link(first, rest) -> first + (sum rest)

When we construct llists, we do not need to provide any extra type information — OCaml figures it out for us. For example, we can write:

let l1 = Link(1, Empty);;
let l2 = Link("a", Empty);;

Here, l1 will have type int llist, and l2 will have type string llist.

4.2 Linked Lists, Built-in🔗

It turns out that our definition of llist above is important enough that a version of it is built into OCaml, just with slightly different names and syntax. The built-in equivalent of Empty is written [], and Link(first, rest) is written first :: rest. The syntax [a; b; c] is shorthand for a::b::c::[]. (Whitespace is not required before or after the colons and semicolons here, but is useful for readability.) The type of built-in lists is 'a list, which can be specialized for any list contents. For example, we could rewrite sum above as:

let rec sum2 (l : int list) : int =
  match l with
    | [] -> 0
    | first::rest -> first + (sum2 rest)

And we could test it by creating tests with t_int:

(* these would go in the suite list *)
t_int "sum2_empty" (sum2 []) 0;
t_int "sum2_single" (sum2 [5]) 5;
t_int "sum2_longer" (sum2 [3; 4; 5]) 12;
t_int "sum2_longer2" (sum2 (3::4::5::[])) 12;

Note that the last two tests mean the same thing; they are just different ways of writing the same list containing 3, 4, and 5.

Since lists are quite a fundamental structure, we will end up using them frequently; handy functions to use with lists are here, and we’ll talk about them more as we build up more experience with OCaml. I’ll mention here, since it’s slightly tricky to search for, that OCaml provides two ways to append lists:

(* This function ... *)
List.append [1;2;3] [4;5;6] ====> [1;2;3;4;5;6]

(* ... is synonymous with this infix operator *)
[1;2;3] @ [4;5;6] ====> [1;2;3;4;5;6]

4.3 Exercises🔗

  1. Write and test a function increment_all that takes an int list and produces a new int list with all the elements increased by 1.

  2. Write and test a function long_strings that takes a string list and an int and produces a new string list that contains all the strings that had length greater than the given int. You can get the length of a string with the function String.length. Other string functions are documented here.

  3. Write and test a function every_other that takes a 'a list (a list with elements of any one type), and produces a new list that contains every other element from the given list, starting with the first element.

  4. Write and test a function sum_all that takes an int list list (that is, a list of lists of integers), and returns an int list that contains the sums of the sub-lists.

5 Tuples🔗

There are many times in programs where we wish to return more than one value. For example, when returning a pair of key and value from a hash-table data structure, when returning an average and its standard deviation, or when representing a two (or three)-dimensional point, to name a few.

OCaml has a built-in way of handling these cases called tuples. To create a tuple, we enclose two or more values in parentheses:

let tup = (1, "a", []);;

To access the values in a tuple, we can use a special kind of let binding, where we give names to the positions of a tuple value:

let tup = (1, "a", []);;
let (one, a, empty_list) = tup;
(*
  one is 1
  a is "a"
  empty_list is []
*)

Since pairs — tuples of exactly two elements — are quite common, there are also two built-in functions, fst and snd, that get the first and second component of a two-element tuple, respectively.

The type of a tuple is written with * characters separating the components’ types, and surrounded by parentheses.

let increment_snd (t : (string * int)) : (string * int) =
  (fst t, 1 + (snd t));;

(increment_snd ("a", 5)) (* returns the pair ("a", 6) *)

Warning! It’s really tempting to write [1, 2, 3] and think you have an int list. You don’t! You actually have a one-element list, of type (int * int * int) list. Commas are reserved for creating tuple values, and semicolons are used for creating list values. This typo can cause some very confusing error messages, where the types reported in the error bear little resemblance to what you think ought to be in your code...

In the second part of this assignment, you will use tuples to represent source locations of tokens, when parsing S-expressions (and of course, tuples will be useful for many other things, later!).

6 The option Type🔗

A common way of handling failure that we’ve already seen is raising exceptions with failwith. This works well when an operation is truly nonsensical. However, it forces programs to use a different class of features — exceptions and exception handlers — to handle failing behaviors. Sometimes, the failure of an operation is a reasonable outcome, and having a way to report a failure, or the absence of an answer, with a normal value rather than an exception is quite useful.

Consider the problem of finding and returning the first element in a list that matches a particular predicate:

let rec find (l : 'a list) (pred : 'a -> bool) : 'a =
  match l with
    | [] -> failwith "Not found"
    | x::xs -> if pred x then x else find xs pred;;

(find [1;2;3] (fun n -> n > 4);; (* raises an error *)
(find [1;2;3] (fun n -> n > 2);; (* returns 3 *)

When the element isn’t found, we cannot return a value of type 'a, because the algorithm hasn’t found one. It seems we have to throw an error, as there is nothing left for us to do. This certainly limits the utility of find, which now needs a companion contains if is to be useful on lists that aren’t already known to have a matching element.

It would be convenient if we had a value that represented that there is no appropriate value to return in the empty case. Similarly, it would be useful to have the counterpart, a representation of being able to provide some appropriate value. OCaml provides just such a datatype, called option, which is built-in. If we wrote the definition ourselves, it would look like:

type 'a option =
  | None
  | Some of 'a

That is, an option is either None, which we can use to indicate failure or the lack of an appropriate value, or Some, which contains a single field that is a value of the option’s type. To write find using option, we would write:

let rec find2 (l : 'a list) (pred : 'a -> bool) : 'a option =
  match l with
    | [] -> None
    | x::xs -> if pred x then Some(x) else find2 xs pred;;

(find2 [1;2;3] (fun n -> n > 4);; (* returns None *)
(find2 [1;2;3] (fun n -> n > 2);; (* returns Some(3) *)

Now a program that calls find, rather than using an exception handler to manage the not found case, can simply match on the option that is returned to decide what to do.

Note that options aren’t always better than exceptions, as sometimes it’s difficult for the caller to know what to do when None is returned. But in many cases, when “failure” is something that the caller can reasonably react to, returning an option is a much more natural choice.

(For those coming from C/C++ backgrounds, note that an 'a option is very different from a “pointer that might be NULL”. Because of ML’s type system, you can’t simply “dereference the pointer”: the only thing you can do with a value of a datatype is match on it, which means you can never forget to check if the value is actually None.)

7 Grading standards🔗

For this assignment, you will be graded on

8 Submission🔗

8.1 Deliverables🔗

Your submission should include the three files provided: Makefile, functions.ml, and test.ml.

Please ensure that your code compiles! On this assignment, half of your grade is for correctness as determined by automated testing, so code that doesn’t compile is subject to a 50% penalty up front. Zip the files together:

$ zip submission.zip Makefile functions.ml test.ml

and submit that zip. (Note: we may or may not wind up using automated unit tests on the handin server itself, but we will run the unit tests ourselves offline.)

8.2 Instructions🔗

You will submit your homework at https://handins.khoury.northeastern.edu. Follow the instructions here on how to use the server.