Lecture 22: Generalizing hypotheses
1 Purpose
Explore how to construct proofs when the induction hypotheses are too weak.
2 Outline
In HW2, we defined a compiler from AddExpressions to a stack of Instrs: i.e., a simple stack-based calculator. We wrote a specification for compiler correctness, and tested that using PBT. Today, we’ll recreate that same example in Lean, and this time, rather than testing the correctness on some randomly generated examples, we’ll prove it correct.
Here, we define an inductive type for ArithExpr, and a function to evaluate them. I’m showing a naming convenience that Lean has here: if you have a value v of type foo and you write v.bar, Lean will look for a function called foo.bar and call it on v. That is, v.bar is shorthand for foo.bar v. By naming our function ArithExpr.eval, we can write e₁.eval to mean ArithExpr.eval e₁. Also, in the definition of ArithExpr.eval, we can write pattern-matching cases without the ArithExpr prefix that would normally be needed.
I’ve also used subscripts – you can write them by typing \_1.
inductive ArithExpr :=
| num (n : Nat)
| add (e₁ e₂ : ArithExpr)
def ArithExpr.eval : ArithExpr → Nat
| num n => n
| add e₁ e₂ => e₁.eval + e₂.eval
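As a quick sanity check of the dot-notation shorthand (the name sample is my own, just for illustration), we can build an expression and evaluate it:

def sample : ArithExpr :=
  ArithExpr.add (ArithExpr.add (ArithExpr.num 1) (ArithExpr.num 2)) (ArithExpr.num 3)
#eval sample.eval  -- 6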
Now, we need to implement our stack calculator, and a translation (or compiler) to it.
inductive Instr :=
| push (n : Nat)
| add
def Instr.eval : List Instr → List Nat → List Nat
| [], s => s
| push n :: is, s => Instr.eval is (n :: s)
| add :: is, n1 :: n2 :: s => Instr.eval is (Nat.add n1 n2 :: s)
| add :: _, _ => []
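We can try the stack evaluator on a small hand-written program (this snippet is just an illustration, not part of the homework):

#eval Instr.eval [Instr.push 1, Instr.push 2, Instr.add] []  -- [3]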
As before, we define Instr as either push n or add, and define an evaluation function for instructions. This time, pattern matching made the definition easier, so we didn’t bother to split it into one function on single instructions and another on sequences.
Now finally, we define our compile function:
def ArithExpr.compile : ArithExpr → List Instr
| num n => [Instr.push n]
| add e₁ e₂ => e₁.compile ++ e₂.compile ++ [Instr.add]
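Putting the pieces together, we can check on one example (which is not a proof, of course) that compiling and then running agrees with direct evaluation:

#eval Instr.eval ((ArithExpr.add (ArithExpr.num 1) (ArithExpr.num 2)).compile) []  -- [3]
#eval (ArithExpr.add (ArithExpr.num 1) (ArithExpr.num 2)).eval                     -- 3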
Here I’ve chosen to use ArithExpr as the namespace for the function, since that is the input type, so I can write e₁.compile to mean ArithExpr.compile e₁.
Now, we can prove that the compiler is correct. First, we write the statement of compiler correctness, leaving the proof blank:
theorem compile_correct1 :
forall e : ArithExpr,
Instr.eval (e.compile) [] = [e.eval] := by sorry
This says exactly what we want: if we compile an expression, and then run it, we end up with a stack that has a single element: the number we would get if we had evaluated the expression. Let’s see if we can prove it:
theorem compile_correct2 :
forall e : ArithExpr,
Instr.eval (e.compile) [] = [e.eval] :=
by
intro e
induction e
case num n => rfl
case add e₁ e₂ ih₁ ih₂ =>
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
sorry
We proceed by induction on the structure of our expressions, which makes sense, since they are inductive structures, and our compiler follows that structure.
We can prove the base case by unfolding the definitions of compile and both evaluations. In the other case, we can do the same unfoldings, but then things go wrong. Let’s look carefully at where we are stuck:
e₁ e₂: ArithExpr
ih₁: Instr.eval (ArithExpr.compile e₁) [] = [ArithExpr.eval e₁]
ih₂: Instr.eval (ArithExpr.compile e₂) [] = [ArithExpr.eval e₂]
⊢ Instr.eval (ArithExpr.compile e₁ ++ ArithExpr.compile e₂ ++
[Instr.add]) [] = [ArithExpr.eval e₁ + ArithExpr.eval e₂]
The last lines show what we need to prove: that evaluating the result of compiling e₁ and e₂, appended to each other and followed by Instr.add, is the same as evaluating e₁ and e₂ and adding them. Seems reasonable enough, but we are stuck. The problem is that our induction hypotheses can’t help us: they are the only thing we know about e₁ and e₂, but we can’t use them directly, since they say what happens when we evaluate just the compiled e₁ or e₂ alone, not when we evaluate the larger program that we have.
When using an interactive theorem prover, it can be tempting to just move forward, doing whatever seems like it might work. While for simple proofs this may work, it is often a bad idea in general. Instead, whenever you aren’t sure what the next step should be, stop and think about why you are stuck, and why you think the theorem is true (or not true) – use that to move forward.
Here, one of the problems is that our obligation is about Instr.evaling a bunch of things in sequence, but our hypotheses are only about parts of that. We would like to use the induction hypotheses to "step through" the evaluation. Let’s define an auxiliary theorem to help us do exactly that:
Note: to go back and define such a helper theorem, we’ll often use the "pause" button in the Lean Infoview, so we can define the theorem while still having access to the state of the proof when we got stuck (as often we’ll be copying parts of it to make the theorem statement). Once we’ve defined the theorem, we can go back to where we got stuck and resume.
theorem instr_eval_app1 :
forall (is is' : List Instr),
Instr.eval (is ++ is') [] =
Instr.eval is' (Instr.eval is []) := by sorry
Now, at this point, we have two possibilities. We can either stop and prove this new theorem, or we can assume that it is true, use it to prove our original theorem, and then come back and prove the auxiliary later, once we know it’s useful.
Which you do depends both on your style and on how much you trust the auxiliary theorem to be true! If you do a bunch of work on the main theorem and then find out the auxiliary theorem is false, you’ve wasted a lot of time. On the other hand, if you do a bunch of work proving a helper theorem and it turns out not to be what you need, you’ve also wasted time!
Here, we’re pretty sure that the helper theorem is true, so we’ll come back to prove it later. Let’s see if it’ll help us prove our theorem.
theorem compile_correct3 :
forall e : ArithExpr,
Instr.eval (e.compile) [] = [e.eval] :=
by
intro e
induction e
case num n =>
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
case add e₁ e₂ ih₁ ih₂ =>
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
rw [instr_eval_app1]
rw [instr_eval_app1]
rw [ih₁]
sorry
This seemed to help, but we’re still stuck. Let’s look at what we have:
e₁ e₂: ArithExpr
ih₁: Instr.eval (ArithExpr.compile e₁) [] = [ArithExpr.eval e₁]
ih₂: Instr.eval (ArithExpr.compile e₂) [] = [ArithExpr.eval e₂]
⊢ Instr.eval [Instr.add]
(Instr.eval (ArithExpr.compile e₂)
[ArithExpr.eval e₁]) =
[ArithExpr.eval e₁ + ArithExpr.eval e₂]
So, first we were able to use our lemma to turn our single Instr.eval into a sequence of nested Instr.evals, and on the innermost one we can use our induction hypothesis. But now we are stuck, because we want to use our other induction hypothesis (ih₂), but it doesn’t apply, as it tells us how ArithExpr.compile e₂ runs on an empty stack, not on a stack with ArithExpr.eval e₁ on it!
This is a case where our induction hypotheses are not strong enough; indeed, our theorem statement itself is too weak to be proved by induction. This may seem like a strange claim, that we need to make the theorem more general in order to prove it, but when proving by induction, the theorem you are proving is also what the induction hypotheses are constructed from, so if you need stronger induction hypotheses, you may need a more general theorem. If we can prove the generalized version, we can then derive the original theorem as a corollary.
Let’s define a new theorem. Note that we also have to generalize our helper lemma, or else it will not apply!
theorem instr_eval_app2 :
forall (is is' : List Instr) (s : List Nat),
Instr.eval (is ++ is') s =
Instr.eval is' (Instr.eval is s) :=
by sorry
theorem compile_correct_helper :
forall (e : ArithExpr) (s : List Nat),
Instr.eval (e.compile) s = (e.eval :: s) :=
by
intro e
induction e
case num n =>
intro s
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
case add e₁ e₂ ih₁ ih₂ =>
intro s
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
rw [instr_eval_app2]
rw [instr_eval_app2]
rw [ih₁]
rw [ih₂]
simp only [Instr.eval]
sorry
theorem compile_correct :
forall (e : ArithExpr),
Instr.eval (e.compile) [] = [e.eval] :=
fun e => compile_correct_helper e []
Now, we almost have the result: the only issue is that the order of our operands is backwards. We could finish this proof by appealing to the commutativity of addition, but before doing that, it’s worth wondering why this happened. If we do, we realize that our definition of compile is wrong! Since we are pushing onto a stack, we should be compiling the second operand first, then the first. Or, perhaps the Instr.add instruction should swap its operands when it computes. Once we fix that, the proof goes through.
...not quite
As it turns out, there is a bit more to the story. When we go back to prove the instr_eval_app1 theorem, we’ll probably realize that it is actually not provable (or true!). The problem is that, the way we handle add instructions, we cannot actually decompose two sequences of instructions. For example:
add, push 1
will run, as a whole program, to the empty stack (since when it reaches the add, the stack is empty, so it stops running). But if we decompose it into the two instruction sequences add and push 1, then, according to our lemma, we should be able to run the first (which results in an empty stack) and then run the second using the stack that resulted from the first. That means the decomposed version would run to the stack [1], which is not the same!
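We can confirm the mismatch by running both sides with the Instr.eval defined above (this check is mine, just to make the counterexample concrete):

-- Whole program: hits add on an empty stack, so evaluation stops with []
#eval Instr.eval [Instr.add, Instr.push 1] []                -- []
-- Decomposed per the lemma: run [add] first, then [push 1] on its result
#eval Instr.eval [Instr.push 1] (Instr.eval [Instr.add] [])  -- [1]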
Really, the issue is that the way we handle errors does not compose: there is no way to distinguish an erroneous program from one that simply results in an empty stack. If we add that distinction, by having Instr.eval return not a stack but one of two possibilities, either a stack or a value indicating an error, we can fix this. We can do this using the standard Option type from the standard library: it is a data type with two constructors, one that contains a value (Option.some v) and one that contains nothing (Option.none). It makes the proof a little more complicated, but it all goes through:
def ArithExpr.compile : ArithExpr → List Instr
| num n => [Instr.push n]
| add e₁ e₂ => e₂.compile ++ e₁.compile ++ [Instr.add]
def Instr.eval : List Instr → List Nat → Option (List Nat)
| [], s => Option.some s
| push n :: is, s => Instr.eval is (n :: s)
| add :: is, n1 :: n2 :: s => Instr.eval is (Nat.add n1 n2 :: s)
| add :: _, _ => Option.none
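A quick informal check (again, my own examples) shows that the error is now visible, so the earlier counterexample no longer conflates failure with an empty stack:

#eval Instr.eval [Instr.add, Instr.push 1] []                -- none
#eval Instr.eval [Instr.push 2, Instr.push 1, Instr.add] []  -- some [3]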
theorem instr_eval_app :
forall (is is' : List Instr) (s s' : List Nat),
Instr.eval is s = Option.some s' ->
Instr.eval (is ++ is') s =
Instr.eval is' s' :=
by intros is
induction is <;> intros is' s s' H
case nil =>
rw [List.nil_append]
cases H
rfl
case cons i is IH =>
cases i
case push n =>
simp only [Instr.eval]
simp only [Instr.eval] at H
have H1 : List.append is is' = is ++ is' := by rfl
rw [H1]
rw [IH]
assumption
case add =>
cases s
case nil => simp only [Instr.eval] at H
case cons n1 s =>
cases s
case nil => simp only [Instr.eval] at H
case cons n2 s =>
simp only [Instr.eval]
simp only [Instr.eval] at H
have H1 : List.append is is' = is ++ is' := by rfl
rw [H1]
rw [IH]
assumption
theorem compile_correct_helper :
forall (e : ArithExpr) (s : List Nat),
Instr.eval (e.compile) s = Option.some (e.eval :: s) :=
by
intro e
induction e
case num n =>
intro s
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
case add e₁ e₂ ih₁ ih₂ =>
intro s
simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
rw [instr_eval_app]
-- at this point, have to be careful when solving for s'
case a =>
rw [instr_eval_app]
apply ih₁
case a =>
apply ih₂
simp only [Instr.eval]
rfl
theorem compile_correct :
forall (e : ArithExpr),
Instr.eval (e.compile) [] = Option.some [e.eval] :=
by intros e
apply compile_correct_helper