
Intro to ITPs, Part 1

Throughout the semester, we've seen the enormous variety of logical invariants we can express using Property Based Testing.

  • The obvious weakness of PBT, which we have discussed many times, is that it only tests a certain subset of the inputs.
  • For that to be useful, we have to be sure that we are testing a representative subset.
  • To know whether it is representative, we may have to already know where bugs may appear: possibly not a reasonable assumption!

Despite that, it is remarkably flexible and useful.

However, if our code had a bug that only shows up on larger inputs (or particular uncommon cases), this testing might not find it.

For many applications, this is good enough -- it's certainly better than how the vast majority of software is tested -- but there are domains (medical devices, cryptography, fly-by-wire systems, etc) where we want to do better, and logic gives us tools.

How can we possibly reason about data of arbitrary size? Or at least, arbitrary size that will fit into our computers? We certainly can't run on all inputs.

  • The key idea: the same as how we write recursive functions! These, in principle, work on data of arbitrary size.

While you cannot construct an infinitely long list, the following length function will correctly work on any list you can construct (of which there are, in practical terms, infinitely many):

let rec len l =
  match l with
  | [] -> 0
  | _ :: rest -> 1 + len rest

This may seem basic, since you've been dealing with these kinds of functions for a while, but it's worth reflecting on that fact: this three-line function will work on any list you can create.

Now let's move a little closer to math, and think about a first proof.

Consider one of the simplest possible functions:

(* f : Natural -> Natural *)
(* adds 0 to its input *)

And we want to show that f(x) = x for all natural numbers. If we were happy testing a few random numbers, we could use PBT. But we want to know this holds for any natural number.

In particular, we want to be sure that the function doesn't do, for example, this:

let f x =
  if x > (Int.max_int - 30)
  then 0
  else x

i.e., for any number beyond the cutoff (here, 30 below the maximum representable integer), instead of returning the same number, it just returns 0. How can we prove that our function doesn't do that?

To understand this, we have to think more carefully about how we define natural numbers, as our proof is going to rely upon that exact definition. There are multiple ways of doing that, but in Rocq, they are defined as:

Inductive nat : Type :=
  O : nat
| S : nat -> nat.

i.e., a nat is either O (zero), or it is the successor of some other nat. This is an inductive definition: it is defined in terms of itself. All terms in Rocq have types, including constructors (O and S) and types themselves (nat).
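For example, the number 3 is literally three applications of S to O; the usual decimal notation is just shorthand for that, and the ordinary operations work on this representation:

Check S (S (S O)).              (* 3 : nat *)
Eval compute in S (S O) + S O.  (* 3 *)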

This looks just like an OCaml ADT!

Inductive data types are data types defined in terms of themselves. OCaml ADTs are not quite inductive when they involve things like mutable state, where there is some mismatch with the underlying execution model of the language. In general, though, you can think "ADT = inductive data type".

To prove our theorem, that is, that \(\forall x : nat, f(x) = x\), we need to show that it holds for all natural numbers. To do this, we appeal to the inductive structure of the data: we show that the property holds for 0, and then show that, assuming it holds for some arbitrary Nat m, that it holds for S m (i.e., one more than m).

By just showing those two cases, just like with our recursive len function, we can show that the property holds for any natural number: any natural number is constructed by repeated applications of S on top of O, and thus we can repeatedly appeal to the second case of our proof until we get down to the base case, and then appeal to the first case.
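Rocq generates this two-case reasoning principle automatically when nat is defined; we can inspect it:

Check nat_ind.
(* nat_ind : forall P : nat -> Prop,
     P 0 ->
     (forall n : nat, P n -> P (S n)) ->
     forall n : nat, P n *)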

Because proofs can get somewhat unwieldy, we will usually use tactics to help construct them. Tactics are procedures that can inspect the current partially constructed proof and add to it. All theorems in Rocq use tactics by default, so when you start working on a proof, you are switched to tactic mode.

This allows us, among other things, to construct trivial equality proofs with reflexivity and to unfold the definition of f. Note that, in general, we can only directly prove trivial equalities (using reflexivity); for more interesting ones, we often have to rewrite them (using rewrite) into a trivial equality.

Definition f (x : nat) := x.

Theorem f_is_good : forall x, f x = x.
Proof.
  intros x.
  induction x.
  - reflexivity.
  - unfold f. reflexivity.
Qed.

A couple of other important things in this, our first proof: first, note that we have the same forall construct as we've seen before. Rocq is a higher-order logic, which means that we can quantify over terms (we'll see exactly what that means later). Next, note that our theorem statement is the type of our proof term: it's what we return. This is the type of an equality proof (= is actually just another inductive type, though one for which there is lots of built-in support).
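Since the theorem statement is a type and the proof is a term of that type, we can ask Rocq to display the term the tactics constructed for us (the exact shape of the output may vary by version):

Print f_is_good.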

Let's see a counter-example:

Definition f_bad (x : nat) := if Nat.ltb x 100 then 0 else x.

Theorem f_bad_is_bad : forall x, f_bad x = x.
Proof.
  intro x.
  induction x.
  - unfold f_bad. reflexivity.
  - unfold f_bad. admit. (* stuck: for x below 100, f_bad x is 0, not x *)
Admitted.

The idea that we construct proofs as functional programs, and the types are the theorem statements, is a very important idea, often called the Curry-Howard isomorphism (or "proofs-as-programs," "propositions-as-types"), and it first showed up in a theorem prover called AUTOMATH, created by the mathematician and computer scientist N.G. de Bruijn in 1968.

It's worth actually reflecting a bit on the history of these theorem provers, and really, the notion of truth that they give us.

What is proof?

To start, we need to be clear about what a proof is, and in particular, what it means within the Rocq Theorem Prover. First, what is a proof? A proof is an argument for some statement, described so that someone else can follow, and in doing so be convinced of the statement. Proofs are thus, historically, very social and contextual things: a step that the reader will be able to jump over can be left out, for example.

What about counterexamples?

Related to proofs, and in some sense, simpler, are counter-examples. These are refutations of statements by means of a concrete example.

An example is the statement "all prime numbers are odd": this can be refuted by giving the counter-example 2, which is both a prime number and not odd. In fact, that finding contradictions is often easier is the reason why many proofs in math assume the statement to be proved is not true, and then use that to derive a contradiction.

It's also how Property Based Testing works!

There is still context, of course: what is a prime number, what is odd, etc: every bit of mathematics and logic builds upon some shared social understanding of the meaning of certain things.

These are understandings that people, over centuries (or millennia), have found useful in trying to understand the world.

What is mechanized proof?

Much more recently, in efforts nearly as old as computers themselves (the 1950s), people have been trying to use computers to carry out, or assist in carrying out, proofs.

The appeal is two-fold: first, it can be a natural fit:

  • if proofs are intended to be a series of steps, each of which follows according to some logical rule from the previous, computers are quite good at following steps without forgetting ones in the middle.

  • Humans are certainly more creative, but this comes at a cost: we sometimes make leaps without explanation.

  • Of course, sometimes those leaps are correct, and some of the most creative, baffling mathematics arises from those kinds of jumps (e.g., some of the number theory of Ramanujan, who said his results appeared to him in dreams).

  • However, sometimes these leaps are difficult to confirm, or end up turning out to be not true!

Second, it lowers cognitive burden

  • even from the beginning, there was the thought that if proof was merely the application of logical steps, one after another, perhaps proof could be rendered mechanical, and people could focus on the "interesting" parts:

  • developing new theories, adding new rules, and posing new conjectures (to be automatically proved or disproved).

It turned out that the initial optimism was somewhat misplaced:

while for limited theories computers can come up with proofs on their own (SMT solvers are probably the most amazing result from that realm), beyond that, things become intractable.

Indeed, for most interesting math, there are simply too many possible approaches, and there is no tractable procedure by which a proof will be discovered.

This led to an alternative approach, and a new set of tools, proof assistants, of which Rocq is a new incarnation.

In a proof assistant, a person is primarily responsible for constructing the proof, but each step that they carry out is checked to be valid. Thus, while proofs cannot be discovered automatically (at least, not in general), even quite rich proofs can be checked automatically, and that still turns out to be incredibly valuable, as it eliminates a serious potential risk. Assuming one trusts the proof checker (which we'll talk about), one need not trust the person who wrote the proof, only that the statement says what we want it to say. Given that proofs can often be orders of magnitude more complicated than theorem statements, this is a huge benefit.

What's in Rocq?

There are two key ideas in Rocq. The first we mentioned earlier,

  • the idea that theorems are types and proofs are terms.

  • While some people are enamored with this from a theoretical perspective,

  • its appeal is quite practical: it means we can reuse key ideas across programming and proving.

  • Indeed, while in some ways, mechanized theorem proving will feel quite different, in some ways, it should feel familiar:

  • like learning a new programming language, just a very very different one.

There's another idea, which actually comes from an entirely different theorem prover, developed around the same time as de Bruijn's AUTOMATH.

  • LCF (or Edinburgh LCF), created by Robin Milner in 1972,
  • the main idea was to have a trusted core that protected logical truth, and then have a flexible language with which to define conveniences for writing proofs.

As Milner worked with LCF, he found:

  • it was not user friendly: mechanized proofs could be very tedious;
  • it would be much easier if you could leverage programming to automate the tedious parts.

These procedures, called tactics, are a key idea in Rocq:

  • we pretty much won't write proofs without using some tactics!

We used some tactics in the proofs above. reflexivity is a simple one: it constructs a proof of equality between two terms that are equivalent. Equality is itself an inductive type, with a single constructor that allows reflexive proofs, and so reflexivity is really just a convenience over the constructor eq_refl. rewrite is one of the main tactics we'll use going forward: it is the main way that we use equalities in our proofs.
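As a small sketch of rewrite at work (plus_n_O is the standard-library lemma stating forall n, n = n + 0, available with the standard prelude):

Theorem add_zero : forall n : nat, n + 0 = n.
Proof.
  intros n.
  rewrite <- plus_n_O.  (* rewrites n + 0 to n, leaving the goal n = n *)
  reflexivity.
Qed.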

While we are motivated to move to Rocq by proving properties involving infinite types, we will begin our journey at the beginning: with Rocq as a programming language.

Installing Rocq

Please follow instructions for your platform at https://rocq-prover.org/install

Programming in Rocq

Rocq is a typed functional programming language. Its type system is more powerful than nearly all others, but it shares many features with other typed functional languages like OCaml, Haskell, Scala, etc.

For example: all values have types, which we can inspect with Check; this is part of the interactive nature of Rocq. After each of these commands, we can run the file "to the point" -- i.e., where our cursor is. If you are using VSCode, there is a horizontal arrow near the top of the window (and a corresponding hotkey). Other editors have a similar mechanism.

Check true.
Check 42.

We can make Definitions:

Definition m : nat := 1. (* m is a natural number *)
Definition n : nat := 0.
Definition b1 : bool := true. (*  b1 is a boolean *)
Definition b2 : bool := false.

Here we've given explicit type annotations to each of the definitions. Often, Rocq can infer these, but we will usually write them explicitly both to avoid any ambiguity and because it can make it easier to read.

We can, of course, ask for the type of definitions and expressions using them.

Check m.            (* output: nat *)
Check n.
Check n + 0.        (* nat *)
Check m * (n + 0).  (* nat *)
Check b1.           (* bool *)
(* Check b1 && b2. Doesn't work. *)
Check (b1 && b2)%bool.     (* "&&" is the Boolean and *)
Check (b1 || b2)%bool.     (* Boolean or *)

What happened with the last two? Rocq has a very flexible syntax system, and one mechanism of that is that infix notation (e.g., &&) is defined in what are called "scopes". You can create and add to these yourself, and to show all the currently available ones, use:

Print Scopes.

From that, we can see bool_scope, which has the "delimiting key" bool. Any expression can be wrapped with (...)%key and the notation within will use the scope described by key.

You can also open scopes:

Open Scope bool_scope.

After which you don't need to use the delimiter, which is done by default for nat_scope (which is why + and * worked above).
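For instance, with bool_scope now open, the conjunction that previously required the %bool delimiter can be checked directly (reusing the b1 and b2 definitions from above):

Check b1 && b2.   (* bool *)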

And since it is a programming language, we can also evaluate expressions:

Eval compute in 5 * 4.         (* 20 *)
Eval compute in m + 2.         (* 3 *)
Eval compute in b1 && b2.      (* false *)

Some notations and operations are defined in other modules, which we can import with Require Import:

(* Eval compute in (m + 7) < 2. Doesn't work*)
Print Scope nat_scope.
From Stdlib Require Arith.
Open Scope nat_scope.

Eval compute in (m + 7) < 2. 

Definition can also be used to create (non recursive) functions, which are expressions that are abstracted over (typed) arguments:

From Stdlib Require Import String.
Definition g (s : string) : bool := Nat.ltb 0 (String.length s).

Here, we are using an operation String.length (functions from modules can always be qualified by their module name) to calculate how long a string is. Note that we are not using <, despite it existing, as that would move us into the world of proof -- something we will come back to. Instead, we use the "ltb" (less-than boolean) function from the natural number module to compare.
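A quick sanity check of the same idea, as a self-contained sketch (nonempty is a name made up here; the string literals need string_scope):

From Stdlib Require Import String.
Open Scope string_scope.

(* true exactly when the string has at least one character *)
Definition nonempty (s : string) : bool := Nat.ltb 0 (String.length s).

Eval compute in nonempty "hello".  (* true *)
Eval compute in nonempty "".       (* false *)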

We can also use other typical data structures, like lists. But recursive functions, in Rocq, are defined with Fixpoint, not with Definition.

Fixpoint my_length (l : list nat) : nat :=
  (match l with
   | nil => 0
   (* | cons _ xs => 1 + my_length xs *)
   | _ :: xs => 1 + my_length xs
   end)%list.

This additionally shows pattern matching, which can be used both to discriminate between different cases (here, the nil and cons cases) and to unpack the data and give names to its parts (in the cons case, we only want the rest of the list; we also use the infix syntax for cons, written ::).
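We can evaluate my_length on a small list to confirm it behaves like the OCaml len from earlier:

Eval compute in my_length (1 :: 2 :: 3 :: nil)%list.   (* 3 *)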

But we aren't really here to learn about typed functional programming, no matter how rich. Our goal is to prove properties of programs. So how do we get from a typed functional programming language to theorem proving?

Minimal propositional logic

We'll start with a simple problem, perhaps the simplest: how to express, and prove theorems about, minimal propositional logic. That is, the language that only has logical variables and implications. This will allow us to become familiar with what it means to express theorems and prove them, and understand what that means!

In Rocq, as we've said, everything has a type, including logical statements, or propositions. The type of propositions is Prop, and we can declare that we have arbitrary ones using Parameter:

Parameter P Q R : Prop.

Here, we've declared three variables that all have type Prop. We can read this as "consider arbitrary propositions P, Q, R".

Parameter works for any type; we could write "consider arbitrary natural numbers, or arbitrary lists, or arbitrary graphs, etc".

Now, Rocq is both a programming language and a tool for expressing and proving theorems. It could have added separate rules for logic: implication as an arrow \(\Rightarrow\), and then logical rules for introduction (abstraction) and elimination (modus ponens), and many more rules for different logical constructions, but it turns out that the programming language is rich enough to express the logic. In the context of minimal propositional logic, rather than having a separate implication arrow, we can re-use the same arrow \(\rightarrow\) that we use for functions.

Thus Rocq makes no distinction between a function that takes P and produces Q and a theorem that, given P, yields Q. This is partly made possible because function arrows in Rocq are dependent: the output type can depend on the value of the input. This is quite important for theorems, as, e.g., in the f_is_good example above: x is the value of the input, and it is used in the output type (a proof about equality). It can also be useful for programming, though we will not explore this much: e.g., you might encode invariants of a data structure in the types, and the output could then relate in sophisticated ways to the input.
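A tiny sketch of dependency (refl_at is a name made up here): the value of the argument x appears in the result type.

Definition refl_at (x : nat) : x = x := eq_refl.
Check refl_at.   (* forall x : nat, x = x *)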

If a function arrow is an implication, then what is a proof of such a statement?

i.e., we can write down the following theorem:

Parameter P Q R : Prop.

Theorem T : (P -> Q) -> P -> Q.

Which we can read as: given a theorem from P to Q and a proof of P, we can prove Q. The proof is a convincing argument that one can proceed as described. A particularly convincing argument is a procedure that given the inputs, produces the output. What procedure could that be? Again, we can appeal to our understanding of this as a type in a programming language. The procedure is a program with the given type. We can complete the proof as follows:

Parameter P Q R : Prop.

Theorem T : (P -> Q) -> P -> Q.
Proof.
  exact (fun Hpq Hp => Hpq Hp).
Qed.

i.e., our proof consists of a single tactic, exact, which completes a proof with a term that inhabits the type that is the goal of the theorem (or the goal of whatever part of the theorem the tactic is used in -- since it is used at the beginning, here it is the entire theorem). Here, it is a function that takes Hpq and Hp and returns Hpq applied to Hp.

We could write exactly the same thing in OCaml (where variable names must begin with a lowercase letter):

(* t : ('p -> 'q) -> 'p -> 'q *)
let t (hpq : 'p -> 'q) (hp : 'p) : 'q =
  hpq hp

Note that the type takes (P -> Q) as an argument and returns a function from P to Q. For compactness, we wrote the term as a function of two arguments, rather than as a series of functions of one argument, but the latter would be equivalent:

Parameter P Q R : Prop.

Theorem T : (P -> Q) -> P -> Q.
Proof.
  exact (fun Hpq => fun Hp => Hpq Hp).
Qed.

We could also write this without the anonymous function, as:

Parameter P Q R : Prop.

Definition F (Hpq : P -> Q) (p : P) : Q := Hpq p.

Theorem T : (P -> Q) -> P -> Q.
Proof.
  exact F.
Qed.

This same choice, between functions defined with or without an explicit lambda, is present in OCaml, where the following two are identical:

let myfun x y = x + y
let myfun = fun x y -> x + y

As an exercise, construct a term for the following theorem:

Parameter P Q R : Prop.
Theorem T' : (P -> Q -> R) -> Q -> P -> R.
Proof.
  exact _. (* replace _ with your term *)
Qed.