Lecture 7.2: Mutability

So far in class, we have only worked with immutable data, such as the following functions:

let add_one x = x + 1

let f () = 
    let x = 1 in 
    let y = add_one x in 
    let x = 37 in 
    print_endline ("y: " ^ string_of_int y)

This will print

y: 2

The reason for this is that the second assignment of x to 37 is rebinding the value of x. Thus, the old version of x is, in a sense, still there: we just can't get access to it because we have shadowed the name x with a new binding. Thus, y still is equal to 2, even after we re-assign x to a new value.

Sometimes, we do want y to recompute and change when x changes. We can do that through making x a reference, and having y recompute whenever we want the value:

let add_one (x : int ref) = !x + 1

let f () = 
    let x : int ref = ref 1 in (* Create a new reference *)
    let y : unit -> int = fun _ -> add_one x in 
    x := 37; (* Mutate x to hold 37 *)
    print_endline ("y: " ^ y ()) (* Prints "y: 38")

While int/bool/string all stand for pure values (i.e., a specific integer, such as 42), a value of type int ref is a memory cell: a box which holds an integer. We create a memory cell by the function ref, which has type 'a -> 'a ref. In other words, it can take in something of any type 'a, and return a memory cell which holds an 'a. (Note here that ref means two different things: it is a function to create a new memory cell; and it is also a parameterized type of those memory cells.) Then, if x has type int ref, we can:

peek inside the box by calling !x, returning an int; and
modify the box by writing x := i, where i : int; this returns a unit.

Why have we waited until now to introduce mutability? The semantics can get tricky very fast. Let's see a few examples of increasing complexity:

(* Evaluates to 22 *)
let test1 () = 
    let y = ref 10 in 
    let f x = x + !y  in
    y := 20;
    f 2

Here's another one:

(* Evaluates to 13 *)
let test2 () = 
    let f x = 
        let y = ref 10 in 
        y := !y + 1;
        x + !y
    in
    let y = ref 0 in 
    f 2

How about this?

let test3 () = 
    (* Evaluates to 13 *)
    let f = 
        let y = ref 10 in
        fun x -> 
            y := !y + 1;
            x + !y
    in
    let _ = f 0 in 
    f 1

Hiding references inside of functions

Let's create a function f : unit -> int that returns an incremented number each time it's called:

let f =
    let y = ref 0 in 
    fun () -> 
        y := !y + 1;
        !y

Storing arbitrary types

You can store anything inside of a reference, including a function:

let p_to_f : (int -> int) ref = ref (fun x -> x) in 
let double_p_to_f () = 
    let f = !p_to_f in 
    p_to_f := (fun x -> f x * 2)
in
double_p_to_f ();
double_p_to_f ();
(* Evaluates to 8 *)
(! p_to_f) 2

Aliasing

A memory cell is represented in memory as an address. You can think of there being a global heap, or "scratchpad", that's holding the current state of values. For example, when we evaluate let x = ref 32, we can think of x's value being the heap address 0x1234. Then, we also have, on the side:

(* *** HEAP *** *)

(* Memory address 0x1234 holds 32 inside *)
0x1234 |-> 32

Then, if we modify x -- say, by evaluating x := 0 -- we return (), but get a new heap:

(* *** HEAP *** *)

(* Memory address 0x1234 holds 32 inside *)
0x1234 |-> 0

Since x's value is not the integer itself, but rather an address to the integer, we can think about what happens in the following program:

let x = ref 32 in 
let y = x in 
y := 0;
!x

This evaluates to 32. The runtime value of x is a memory location (e.g., our 0x1234), and when we let y = x, we are setting y to this same memory location. Thus, if we modify y, there is no guarantee that x stays the same.

This problem is called aliasing: references (pointers to memory cells) may point to the same location, and thus affect each other. Let's see how this could affect the following function:

let f (x : int ref) (y : int ref) = 
    y := 0;
    !x

Now, what happens when we call f like so?

let x = ref 2 in 
let y = ref 4 in 
f x y

This evaluates to 2, because x and y are different memory cells. But what about this?

let x = ref 2 in 
f x x

What gets evaluated here? We get 0 back. This is concerning, because the type of f has no information about whether x and y point to the same or different memory locations. Thus, as soon as we have functions, we can't really be confident that any ref doesn't coincide with any other one. We won't discuss this in more detail, but this is exactly the problem that Rust tackles.