On this page:
1 Purpose
2 Outline
8.7

Lecture 22: Generalizing hypotheses

1 Purpose

Explore how to construct proofs when the induction hypotheses are too weak.

2 Outline

In HW2, we defined a compiler from AddExpressions to a stack of Instrs: i.e., a simple stack-based calculator. We wrote a specification for compiler correctness, and tested that using PBT. Today, we’ll recreate that same example in Lean, and this time, rather than testing the correctness on some randomly generated examples, we’ll prove it correct.

Here, we define an inductive type for ArithExpr, and a function to evaluate them. I’m showing a naming convenience that Lean has here, where if you have a value v of type foo, if you write v.bar, Lean will look for a function called foo.bar and call it on v. i.e., v.bar is shorthand for foo.bar v. That means that by writing ArithExpr.eval, we can write e_1.eval to mean ArithExpr.eval e_1. Also, in the definition of ArithExpr.eval, we can write pattern matching cases without the ArithExpr prefix that would normally be needed.

I’ve also used subscripts – you can write them by typing \_1.

inductive ArithExpr :=
  | num (n : Nat)
  | add (e₁ e₂ : ArithExpr)
def ArithExpr.eval : ArithExpr  Nat
  | num n => n
  | add e₁ e₂ => e₁.eval + e₂.eval

Now, we need to implement our stack calculator, and translation (or compiler) to it.

inductive Instr :=
  | push (n : Nat)
  | add
def Instr.eval : List Instr  List Nat  List Nat
  | [], s => s
  | push n :: is, s => Instr.eval is (n :: s)
  | add :: is, n1 :: n2 :: s => Instr.eval is (Nat.add n1 n2 :: s)
  | add :: _, _ => []

As before, we define instr as either push n or add, and define an evaluation function for instructions. This time, pattern matching made it easier to define, so we didn’t bother to separate out into a separate function on single instructions and one on sequences.

Now finally, we define our compile function:

def ArithExpr.compile : ArithExpr  List Instr
  | num n => [Instr.push n]
  | add e₁ e₂ => e₁.compile ++ e₂.compile ++ [Instr.add]

Here I’ve chosen to use ArithExpr as the namespace for the function, as that is the input type, so I can write e_1.compile to mean ArithExpr.compile e_1.

Now, we can prove that the compiler is correct. First, we write the statement of compiler correctness, leaving the proof blank:

theorem compile_correct1 : 
  forall e : ArithExpr, 
    Instr.eval (e.compile) [] = [e.eval] := by sorry

This says exactly what we want: if we compile an expression, and then run it, we end up with a stack that has a single element: the number we would get if we had evaluated the expression. Let’s see if we can prove it:

theorem compile_correct2 :
  forall e : ArithExpr, 
    Instr.eval (e.compile) [] = [e.eval] :=
  by 
    intro e
    induction e
    case num n => rfl 
    case add e₁ e₂ ih₁ ih₂ => 
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
      sorry

We proceed by induction on the structure of our expressions, which makes sense, since they are inductive structures, and our compiler follows that structure.

We can prove the base case by unfolding the definitions of compile and both evaluations. In the other case, we can do the same unfoldings, but then things go wrong. Let’s look carefully at where we are stuck:

e₁ e₂: ArithExpr
ih₁: Instr.eval (ArithExpr.compile e₁) [] = [ArithExpr.eval e₁]
ih₂: Instr.eval (ArithExpr.compile e₂) [] = [ArithExpr.eval e₂]
 Instr.eval (ArithExpr.compile e₁ ++ ArithExpr.compile e₂ ++
  [Instr.add]) [] = [ArithExpr.eval e₁ + ArithExpr.eval e₂]

The last lines show what we need to prove: that evaluating the result of compiling e₁ and e₁, appended to each other followed by Instr.add, is the same as evaluating e₁ and e₂ and adding them. Seems reasonable enough, but we are stuck. The problem is that our inductive hypotheses can’t help us: they are the only thing we know about e₁ and e₂, but directly we can’t use them, since they say what happens when we evaluate just the compiled e₁ or e₂, not when we evaluate the larger program that we have.

When using an interactive theorem prover, it can be tempting to just move forward, doing whatever seems like it might work. While for simple proofs this may work, it is often a bad idea in general. Instead, whenever you aren’t sure what the next step should be, stop and think about why you are stuck, and why you think the theorem is true (or not true) – use that to move forward.

Here, one of the problems is that our obligation is about Instr.evaling a bunch of things in sequence, but our hypotheses are only about parts of that. We would like to use the induction hypotheses to "step through" the evaluation.

Let’s define an auxiliary theorem to help us do exactly that:

Note: to go back and define such a helper theorem, we’ll often use the "pause" button in the Lean Infoview, so we can define the theorem while still having access to the state of the proof when we got stuck (as often we’ll be copying parts of it to make the theorem statement). Once we’ve defined the theorem, we can go back to where we got stuck and resume.

theorem instr_eval_app1 : 
  forall (is is' : List Instr),
    Instr.eval (is ++ is') [] = 
      Instr.eval is' (Instr.eval is []) := by sorry

Now, at this point, we have two possibilities. We can either stop and prove this new theorem, or we can assume that it is true, and then use it to prove our original theorem, and then come back and prove the auxiliary later, once you know its useful.

Which you do depends both on your style, but also how much you trust the auxiliary theorem to be true! If you do a bunch of work on the main theorem and then find out the auxiliary theorem is false, you’ve wasted a lot of time. On the other hand, if you do a bunch of work proving a helper theorem and it turns out not to be what you need, you’ve also wasted time!

Here, we’re pretty sure that the helper theorem is true, so we’ll come back to prove it later. Let’s see if it’ll help us prove our theorem.

theorem compile_correct3 :
  forall e : ArithExpr, 
    Instr.eval (e.compile) [] = [e.eval] :=
  by 
    intro e
    induction e
    case num n =>
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
    case add e₁ e₂ ih₁ ih₂ => 
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
      rw [instr_eval_app1]
      rw [instr_eval_app1]
      rw [ih₁]
      sorry

This seemed to help, but we’re still stuck. Let’s look at what we have:

e e₂: ArithExpr
ih₁: Instr.eval (ArithExpr.compile e₁) [] = [ArithExpr.eval e₁]
ih₂: Instr.eval (ArithExpr.compile e₂) [] = [ArithExpr.eval e₂]
 Instr.eval [instr.add]
  (Instr.eval (ArithExpr.compile e₂)
    [ArithExpr.eval e₁]) =
  [ArithExpr.eval e₁ + ArithExpr.eval e₂]

So, first we were able to use our lemma to turn our single Instr.eval into a sequence of nested Instr.evals. And the innermost one we can use our induction hypothesis on. But now we are stuck, because we want to use our other induction hypothesis (ih₂), but it doesn’t apply, as it tells us how ArithExpr.compile e₂ runs in an empty stack, not in a stack with ArithExpr.eval e₁ on it!

This is a case where our induction hypotheses are not strong enough, and indeed, our entire theorem is not strong enough to be proved. This may seem like a strange claim, that we need to make the theorem more general in order to prove it, but when proving via induction, the theorem you are proving is also how the induction hypotheses are constructed, so if you need stronger induction hypotheses, it may mean you need a more general theorem. Then if we can prove this, we can define the original theorem as a corollary.

Let’s define a new theorem. Note that we also have to generalize our helper lemma, or else it will not apply!

theorem instr_eval_app2 : 
  forall (is is' : List Instr) (s : List Nat),
    Instr.eval (is ++ is') s = 
      Instr.eval is' (Instr.eval is s) := 
 by sorry

theorem compile_correct_helper :
  forall (e : ArithExpr) (s : List Nat), 
    Instr.eval (e.compile) s = (e.eval :: s) :=
  by 
    intro e
    induction e
    case num n =>
      intro s
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
    case add e₁ e₂ ih₁ ih₂ => 
      intro s
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
      rw [instr_eval_app2]
      rw [instr_eval_app2]
      rw [ih₁]
      rw [ih₂]
      simp only [Instr.eval]
      sorry

theorem compile_correct :
  forall (e : ArithExpr),
    Instr.eval (e.compile) [] = [e.eval] := 
    fun e => compile_correct_helper e []

Now, we almost have the result: the only issue is that the order of our operands is backwards. Now, we could finish this proof by appealing to the commutativity of addition, but before doing that, its worth wondering why this happened. If we do that, we realize that our definition of compile is wrong! Since we are pushing onto a stack, we should be compiling the second operand first, then the first. Or, perhaps the Instr.add instruction should swap when it computes. Once we fix that, the proof goes through.

...not quite

As it turns out, there is a bit more to the story. When we go back to prove the instr_eval_app1 theorem, we’ll probably realize that it is actually not provable (or true!). The problem is, the way we handle add instructions, we cannot actually decompose two sequences of instructions! For example:

add, push 1

Will run, as a whole program, to the empty stack (since when it reaches the first add, the stack is empty, so it stops running), but if we decompose it into the two instruction sequences:

add

and

push 1

Then, according to our lemma, we should be able to run the first (which will result in an empty stack), and then run the second using the stack that resulted from the first. Which would mean the decomposed version would run to the stack [1], which is not the same!

Really, the issue is that the way that we handle errors does not compose: there is no way of distinguishing an erroneous program from one that simply results in an empty stack. If we add that distinction, by having Instr.eval return not a stack, but one of two possibilities: either a stack, or another value indicating an error, we can fix this. We can do this using the standard Option type from the standard library: it is a data type with two constructors: one that contains a value (Option.some v), and the other that contains nothing (Option.none). It makes the proof a little more complicated, but it all goes through:

def ArithExpr.compile : ArithExpr  List Instr
  | num n => [Instr.push n]
  | add e₁ e₂ => e₂.compile ++ e₁.compile ++ [Instr.add]

def Instr.eval : List Instr  List Nat  Option (List Nat)
| [], s => Option.some s
| push n :: is, s => Instr.eval is (n :: s)
| add :: is, n1 :: n2 :: s => Instr.eval is (Nat.add n1 n2 :: s)
| add :: _, _ => Option.none

theorem instr_eval_app : 
  forall (is is' : List Instr) (s s' : List Nat),
    Instr.eval is s = Option.some s' ->
    Instr.eval (is ++ is') s = 
      Instr.eval is' s' := 
 by intros is
    induction is <;> intros is' s s' H
    case nil =>
      rw [List.nil_append]
      cases H
      rfl
    case cons i is IH =>
      cases i
      case push n => 
        simp only [Instr.eval]
        simp only [Instr.eval] at H
        have H1 : List.append is is' = is ++ is' := by rfl
        rw [H1]
        rw [IH]
        assumption
      case add =>
        cases s
        case nil => simp only [Instr.eval] at H
        case cons n1 s => 
          cases s
          case nil => simp only [Instr.eval] at H
          case cons n2 s =>
            simp only [Instr.eval]
            simp only [Instr.eval] at H
            have H1 : List.append is is' = is ++ is' := by rfl
            rw [H1]
            rw [IH]
            assumption

theorem compile_correct_helper :
  forall (e : ArithExpr) (s : List Nat), 
    Instr.eval (e.compile) s = Option.some (e.eval :: s) :=
  by 
    intro e
    induction e
    case num n =>
      intro s
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
    case add e₁ e₂ ih₁ ih₂ => 
      intro s
      simp only [ArithExpr.compile, Instr.eval, ArithExpr.eval]
      rw [instr_eval_app]
      -- at this point, have to be careful when solving for s'
      case a =>
        rw [instr_eval_app]
        apply ih₁
        case a =>
          apply ih₂
      simp only [Instr.eval]
      rfl

theorem compile_correct :
  forall (e : ArithExpr),
    Instr.eval (e.compile) [] = Option.some [e.eval] := 
 by intros e
    apply compile_correct_helper