Lecture 11.3: Control Flow and Compiler Correctness

Previously, we defined a compiler that consists of two compiler passes:

let compile (e : CalcLang.expr) : StackLang.program =
  let e' = uniquify StrMap.empty e in
  let prg = lower e' in
  prg

Our compiler consists of two passes: a uniquify pass to ensure we don't need to worry about name shadowing, and a lower pass to turn our CalcLang programs into equivalent StackLang ones.

That said, we have been ignoring how to implement control flow and branching in StackLang, and our lower function currently throws a runtime error if we pass in a CalcLang expression containing ITE.

Control flow with jumps

We can describe an if-then-else expression in CalcLang as a "branching" statement: if the guard is true, we travel down the first branch; otherwise, we travel down the second. StackLang, however, is represented as a straight-line program: a list of instructions to execute. So we will need similar operations to work on these lists. Let's take a look at the following CalcLang program:

CalcLang: if true then 10 else 20

We can represent each component of this expression as a StackLang subprogram:

the guard true is defined by the StackLang program [Push 1],
the equivalent StackLang program for the true branch simply pushes 10 onto the stack: [Push 10],
likewise with the false branch we will push 20 onto the stack: [Push 20].

If we naively concatenate these programs, we will end up with a program of [Push 1; Push 10; Push 20], which will produce a stack of three values. Instead, what we would like to happen is the following:

execute the instructions in the guard program, producing a value of 0 or 1,
if the guard is 1, continue to evaluate until we finish all instructions in the first sub-program. Once this is done, jump to the end of subprogram 2.
if the guard is 0, jump to the second sub-program and continue to execute instructions.

To capture these two different jump instructions, we will introduce new expressions in our AST.

Label l a pseudo-instruction that acts as a place to start executing instructions
Jump l will perform an unconditional jump to label l
JumpIfZero l will jump forward to label l but only if the top of the stack is false (i.e., 0).

Our simple ITE-expression will step in the following way:

CalcLang: if true then 10 else 20
StackLang: [Push 1; JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200]

If the guard returns false, or 0, JumpIfZero 100 will skip over the next two instructions: the truthy branch of Push 10 and the subsequent Jump instruction, landing us squarely in the else branch. From there, we can continue to execute as usual. If the guard returns true, we will ignore JumpIfZero 2, evaluate the truthy branch, and Jump over two instructions — the Label 100 and the else branch.

Looking at the configurations during execution, we would see the following:

- step 0: ⟨[Push 1; JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200],       [], {}⟩
- step 1: ⟨        [JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200],    1::[], {}⟩
- step 2: ⟨                        [Push 10; Jump 200; Label 100; Push 20; Label 200],    1::[], {}⟩
- step 3: ⟨                                 [Jump 200; Label 100; Push 20; Label 200],10::1::[], {}⟩
- step 4: ⟨                                                [Label 200],10::1::[], {}⟩
- step 5: ⟨                                                         [],10::1::[], {}⟩

If we had a guard of false, this program would evaluate the other branch:

- step 0: ⟨[Push 0; JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200],       [], {}⟩
- step 1: ⟨        [JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200],    0::[], {}⟩
- step 2: ⟨                                           [Label 100; Push 20; Label 200],    0::[], {}⟩
- step 3: ⟨                                                      [Push 20; Label 200],    0::[], {}⟩
- step 4: ⟨                                                               [Label 200],20::1::[], {}⟩
- step 5: ⟨                                                                   [],20::1::[], {}⟩

Let's update our definition of step and fill in the cases for Label, Jump and JumpIfZero. First we will need a way to skip over parts of the program:

let find_label { lbl } prog : instr list =
  let pc_opt = List.find_index (fun instr -> instr = Label { lbl }) prog in
  match pc_opt with
  | Some pc -> List.drop pc prog
  | None -> failwith ("Undefined label " ^ string_of_int lbl)

let step (cfg : config) : config =
  match cfg with
  | ([],  vs, env) -> cfg
  | (hd::tl, vs, env) ->
    (match hd with
    ...
    | Label l -> (tl, vs, env)
    | Jump l -> (find_label l tl, vs, env)
    | JumpIfZero l ->
       (match vs with
       | [] -> failwith "Not enough on stack"
       | v :: vs' ->
         if v = 0
         then (find_label l tl, vs', env)
         else (tl, vs', env)))

A quick validation with utop corroborates the example we had before:

utop #  execute ([Push 0; JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200]);;
- : value = 20

utop #  execute ([Push 1; JumpIfZero 100; Push 10; Jump 200; Label 100; Push 20; Label 200]);;
- : value = 10

Next, we'll formalize this reasoning for our calc-to-stack lowering pass:

let rec lower (e : CalcLang.expr) : StackLang.program =
  match e with
  ...
  | ITE (g, t, f) ->
     let guard_code  = go g in
     let truthy_code   = go t in
     let falsey_label  = fresh_label () in
     let falsey_code   = go f in
     let finish_label  = fresh_label () in

     guard_code  @ [StackLang.JumpIfZero falsey_label] @
     truthy_code @ [StackLang.Jump finish_label; StackLang.Label falsey_label] @
     falsey_code @ [StackLang.Label finish_label]

In order to generate labels, we will invoke fresh_label, which maintains a global mutable reference for the pass. This function looks like the following:

let next_label = ref 0
let fresh_label () =
  let lbl = !next_label in
  next_label := lbl + 1;
  lbl

With this, we have a code-complete compiler! Let's come up with a few more calc examples that introduce if-then-else expressions in interesting ways.

utop # open CalcLang;;
utop # open StackLang;;

utop # compile(Let ("x", ITE (Value (Bool false), Value (Nat 4), Value (Nat 10)),
                 App(Add, [Var "x"; Value (Nat 4)])));;
- : program = [Push 0; JumpIfZero 0; Push 4; Jump 1; Label 0;
               Push 10; Label 1; Set "x#0"; Get "x#0"; Push 4; AppInstr Add]

utop # compile(ITE (Value (Bool true),
            Let("x", Value (Nat 10), Var "x"),
            Let("x", Value (Nat 20), Var "x")));;
- : program =
[Push 1; JumpIfZero 0; Push 10; Set "x#0"; Get "x#0"; Jump 1;
 Label 0; Push 20; Set "x#0"; Get "x#0"; Label 1]