Lecture 3.2: Working with Specifications
Last class, we discussed what specifications are; now, we will discuss what we can do with them.
Let's take a look at the spec for sort : int list -> int list
, which has two parts:
- (P1) The result should be sorted: if \(\mathsf{sort}(xs) = ys\), then \(\forall 0 \leq i < j < |ys|, ys[i] \leq ys[j]\).
- (P2) The result should have the same number of each element: if \(\mathsf{sort}(xs) = ys\), then \(\forall i, \mathsf{count}(xs, i) = \mathsf{count}(ys, i)\), where \(\mathsf{count}(xs, i)\) counts the number of occurences of \(i\) in \(xs\).
Formalizing a spec
To do anything interesting with this specification, we will first formalize it by encoding it into a programming language.
let rec all (p : int -> bool) (xs : int list) =
match xs with
| [] -> true
| x :: xs' -> p x && all p xs'
let rec sorted (xs : int list) =
match xs with
| [] -> true
| _ :: [] -> true
| i :: j :: xs' -> i <= j && sorted (j :: xs')
let rec count (xs : int list) (y : int) =
match xs with
| [] -> 0
| x::xs' -> (if x = y then 1 else 0) + count xs' y
let equal_elements (xs : int list) (ys : int list) =
all (fun (x : int) ->
count xs x = count ys x
) xs
&&
all (fun (x : int) ->
count xs x = count ys x
) ys
let sort_spec (xs : int list) (sort : int list -> int list) =
let ys = sort xs in
sorted ys && equal_elements xs ys
We made some specific coding choices above that made our two logical into a computable check that can actually be run.
Indeed, sort_spec xs sort
returns true if and only if sort
returns the correct answer on xs
.
There are some tricky points here.
- Why is our
sorted
function the same as our predicate for the list being sorted? - What about
equal_elements
?
What to do with a formalized spec?
What we have done above by defining sort_spec
is show that, for a given list xs
, we can decide whether sort
behaves correctly on that list xs
. Correctness of an algorithm on a specific input is usually easy to directly check. What's harder is showing that sort
behaves correctly on all inputs:
In general, questions like this one above is undecidable: no computer program, given the code for sort
, can automatically compute with absolute certainty whether the above formula is true. This is mostly due to the fact that int list
is infinite (modulo restrictions in OCaml about the sizes of integers), so we can't exhaustively check every case. However, even though this problem is undecidable, there are still a lot of things you can do.
Decidability
We can generalize the above discussion to talk about decidability in general. Given a set \(S\) (encoded via logical specs), the decision problem for \(S\) is the task of asking whether a given function \(f\) is inside of \(S\) or not. The decision problem for \(S\) is decidable if there exists a program decide
that, given \(f\), can always output a correct yes/no answer for whether \(f \in S\). A problem is undecidable if there does not exist such a program.
Thus, this set is decidable:
while this one isn't:
Property-Based Testing
Given a specification \(S\) given via logical specs, we can use property-based testing, or PBT, to test whether a given function \(f\) is in \(S\). The essence of PBT is to reinterpret all universal quantifiers as probabilistic checks, where we sample from a distribution (often many times). While we can't decide if \(f \in S\) with absolute certainty, PBT still allows us to automatically find counter-examples with minimal effort.
Since our logical spec for sort
is \(\forall xs, \texttt{sort_spec}(xs, \texttt{sort})\), we see we have one quantifier that we need to sample for. We sample this quantifier using a generator:
let gen_uniform_list (len_bound : int) : unit -> int list = fun () ->
let len = Random.int len_bound in
(* Standard library function to create lists using a function *)
List.init len (fun _ -> Random.int 100)
A note about Random
In OCaml, Random.int
generates random values based on a pseudorandom number generator (PRNG), which creates random values given a seed. If you use Random.int
(or any other function in Random
) and you want it to actually be random, you need to call Random.self_init ()
first. Alternatively, you can fix the seed by calling Random.init 32
.
In general, a generator for t
will have type unit -> t
; thus, we can think of gen_uniform_list
as being a function from integers (for the length bound) to generators for int list
.
We now create a sampler for the quantifier, which we will suggestively call forall
:
let rec forall_list (gen : unit -> int list) (prop : int list -> bool) (trials : int) : (int list) option =
if trials <= 0 then
None (* No counter-example found *)
else
let xs = gen () in
if prop xs then
forall_list gen prop (trials - 1)
else Some xs
Above, we are using the OCaml standard library functions for List. We will use them more going forward. If all tests succeed, we return
None
; else, we returnSome xs
, wherexs
is the counter-example of correctness. Additionally, we are abstracting over the generator we will use for the input.
Testing a sort function
Now that we have set things up, we can test whether a particular function is a good sorting algorithm. Let's use insertion sort:
let rec insert (x : int) (xs : int list) : int list =
match xs with
| [] -> [x]
| y :: ys -> if x <= y then x :: y :: ys else y :: insert x ys
let rec isort (xs : int list) =
match xs with
| [] -> []
| y :: ys -> insert y (isort ys)
Now, to test it:
utop# forall_list (gen_uniform_list 10) (fun xs ->
sort_spec xs isort
) 32;;
- : int list option = None
utop# let bozo_sort (xs : int list) = xs ;;
val bozo_sort : int list -> int list
utop # forall_list (gen_uniform_list 5) (fun xs -> sort_spec xs bozo_sort) 32;;
- : int list option = Some [62; 10; 65; 96]
We can see if we introduce a bug into isort
, PBT can catch it:
let rec insert' (x : int) (xs : int list) : int list =
match xs with
| [] -> [x]
| y :: ys -> if x <= y then y :: x :: ys else y :: insert x ys
let rec isort' (xs : int list) =
match xs with
| [] -> []
| y :: ys -> insert' y (isort' ys)
Unsoundness of PBT
While PBT is easy to use, its limitation is that it's unsound: given a function \(f\), PBT may say that \(f \in S\) (since all tests pass), but it might not actually be true that \(f \in S\). Let's explore unsoundness with an example:
let sort : int list -> int list = List.sort Int.compare ;;
let bad_sort (xs : int list) =
if List.length xs >= 300 && sorted xs then
[]
else
sort xs
What will PBT say about bad_sort
, if we use the gen_list
generator? In all likelihood, it will not catch any bug because
it's extremely unlikely to generate a list of length 300
that is sorted.
utop # forall (gen_uniform_list 500) (fun xs -> sort_spec xs bad_sort) 1000;;
- : int list option = None
What this shows us is that PBT is as only as good as the generators. Let's create a new generator that will (hopefully) catch the bug:
let gen_sorted (len : int) : unit -> int list = fun () ->
let rec aux (k : int) : int list * int =
if k <= 0 then ([], 0) else
let (xs, last) = aux (k - 1) in
let x = last + Random.int 10 in
(xs @ [x], x)
in
fst (aux (Random.int len))
Here, gen_sorted
is a random generator that only outputs sorted lists. We can then catch the bug by using gen_sorted
instead of gen_list
.
utop # forall_list (gen_sorted 500) (fun xs -> sort_spec xs bad_sort) 10;;
- : int list option = Some [3; 6; ...]
If we wanted, we could even fold both generators into one, by flipping a coin and choosing one to run:
let one_of (f : unit -> int list) (g : unit -> int list) (len : int) : unit -> int list =
if Random.int 10 <= 7 then
gen_uniform_list len
else
gen_sorted len
Which generator to use?
Above, we only caught the bug in bad_sort
because we knew it was testing for a sorted input (and thus we created a generator for sorted lists). This shows us two things:
- Uniform random inputs do not catch all corner cases --- especially corner cases that assume a lot of "structure" on the inputs (such as the input being sorted).
- To deploy PBT, sometimes it is useful to inspect the implementation of the code you're analyzing to learn about which generators to use. If you see that the code is testing the input for property
X
, it might be good to generate inputs that satisfy propertyX
.
Aside: Program Verification
Another route to checking sort
works on all inputs is to mathematically
prove it, by using proof assistants, such as Rocq,
Lean, ACL2,
and Dafny, among many others. These tools are incredibly
powerful, as they let you actually prove that sort
is correct for all inputs
--- not just probabilistically check it is correct for some. The tradeoff,
though, is that these tools are quite a bit more involved to use. Indeed, the
user has to often write many proof annotations in addition to their functions
and specifications. Indeed, very often the user needs to write more "proof
annotations" than actual code by a 5:1 ratio (or even higher!).
While we won't discuss program verification more in this class, the art of choosing a specification, and encoding it into a programming language, is necessary for both PBT and verification (and is quite useful even if you aren't testing).
Another example for unsoundness
Is it true that \(x \leq x + 1\) if \(x\) is an OCaml int
?
Let's see if it is:
utop #
let test (gen_int : unit -> int) =
let i = gen_int () in
i <= i + 1 ;;
- : (unit -> int) -> bool
utop # test (fun _ -> Random.int 500);;
- : bool = true (* This will always return true *)
But if we change the distribution :
This is because OCaml ints have a maximum value, and wrap around if we exceed that maximum value:
Whether this matters will depend heavily on your use case. Are you using int
to store quantities in an inventory system? You likely won't hit this number (around 4000 trillion). But if you are sampling unique integers from the space of all integers, then you need to be careful!