Lecture 8: Revisiting ANF: Encoding A-Normal Form with Types
Our prior implementations of ANF conversion were functions of the form
anf : tag expr -> unit expr
, that produced a value of the same expr
data type, but one that happened to satisfy the is_anf
predicate we
defined. This was certainly expedient:
We didn’t need to define new data types,
Our intuitions about the semantics of
expr
carried over to ANF’edexpr
s,All our utility functions (like pretty-printing) just worked with ANF’ed expressions
etc.
But it has downsides, primarily that we lose a lot of the power and guarantees that ML’s type system could provide us. If we define our types correctly, ML can assure us that
The result of
anf
is guaranteed to be an expression in A-normal form,Any functions that do further work on ANF’ed expressions are not missing any steps —
unlike our current ones, which have a catch-all case that throws an exception claiming that any remaining expressions just aren’t ANF.
Let’s try to define these types, then revisit the transformation functions.
1 Defining ANF expressions
Recall our definition of is_anf
:
let rec is_anf (e : 'a expr) : bool =
match e with
| EPrim1(_, e, _) -> is_imm e
| EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2
| ELet(binds, body, _) ->
List.for_all (fun (_, e, _) -> is_anf e) binds
&& is_anf body
| EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els
| _ -> is_imm e
and is_imm e =
match e with
| ENumber _ -> true
| EId _ -> true
| EBool _ -> true
| _ -> false
;;
The goal was to ensure that every primitive operation only received immediate operands, which were anything whose value was available without any further computation: namely, constants and identifiers. Now, if we try to define a new expression type for ANF expressions, we’ll want to encode that invariant in the types.
Exercise
Try to define a new expression type,
'a aexpr
, that describes this ANF invariant.
It’s easiest to start from the immediate values:
type 'a immexpr = (* immediate expressions *)
| ImmNum of int * 'a
| ImmBool of bool * 'a
| ImmId of string * 'a
The next level up from immediate expressions are compound expressions, which will include basically everything else:
(* Attempt #1 *)
type 'a comp_expr =
| CImmExpr of 'a immexpr (* base cases *)
| CPrim1 of prim1 * 'a immexpr * 'a (* argument must be immediate *)
| CPrim2 of prim2 * 'a immexpr * 'a immexpr * 'a (* arguments must be immediate *)
| CIf of 'a immexpr * 'a comp_expr * 'a comp_expr (* condition must be immediate *)
| CLet of (string * 'a comp_expr) list * 'a comp_expr * 'a (* ugh. *)
This data definition correctly encodes our stated ANF property, but that
CLet
definition is still pretty unwieldy. In particular, it permits
multiple bindings within a single CLet
(because of the list
), and it
allows let-bindings to appear in the binding-expressions themselves. This in
turn means that our binding expressions can be arbitrarily complex, which
would be nice to avoid, if possible.
We can correct the multiple bindings part just by removing the list
, but
correcting the other flaw requires a separation of comp_expr
into two
distinct, mutually recursive types:
type 'a cexpr = (* compound expressions *)
| CIf of 'a immexpr * 'a aexpr * 'a aexpr * 'a
| CPrim1 of prim1 * 'a immexpr * 'a
| CPrim2 of prim2 * 'a immexpr * 'a immexpr * 'a
| CImmExpr of 'a immexpr
and 'a aexpr = (* anf expressions *)
| ALet of string * 'a cexpr * 'a aexpr * 'a
| ACExpr of 'a cexpr
Note in particular that ALet
accepts binding expressions that are
restricted to cexpr
, which cannot itself be a let-binding. Also notice
that CIf
expressions contain branches that are aexpr
s: the branches
might well need let-bindings, but we don’t want to execute any of the code for
the branch that isn’t taken. So we can’t lift all the let-bindings out of the
if-expressions; we need to permit them to remain inside its branches.
2 Defining anf
using these types
Exercise
Translate your definition of
anf
to produce values of typeunit aexpr
, instead ofunit expr
. Hint: one of the two versions will be a much easier starting point than the other...
From Lecture 4: Conditionals and A-Normal Form, we have an attempted signature of
let anf_without_types (e : tag expr) : unit expr =
let rec help (e : tag expr) : (unit expr * (string * unit expr) list) =
...
in ...
The intuition behind the return value of help
was to return an answer
expression that was an immediate, followed by a binding-list of ANF’ed
expressions that formed the context in which the answer made sense. We can
translate this into our new types as
let anf (e : tag expr) : unit aexpr =
let rec help (e : tag expr) : (unit immexpr * (string * unit aexpr) list) =
...
in ...
But this type is overly strict. Consider the complete program
3 + 5
. In order to call help
on this expression, we have to return
an unit immexpr
, which means we need to effectively produce a translated
program of let temp = 3 + 5 in temp
. But nothing in our definition of
ANF requires that the final expression of our program be an immediate! We
merely require that all operators have immediate operands. So we need at least
one helper with a less-stringent signature, to allow us to return non-immediate
answers. Fortunately, we now have an intermediate type, between “all ANF
expressions” ('a aexpr
) and “all immediate expressions” ('a immexpr
),
namely compound expressions ('a cexpr
).
Note that similar analyses holds for the branches of a conditional, and the
right-hand side of a let-binding. These inform our choices of where to use
'a aexpr
and where to use 'a cexpr
in our type definitions above.
Exercise
Complete this analysis, and justify to yourself why the data definitions above have the desired properties.
So, now that we have three types for our ANF expressions instead of just one, we’ll need three mutually recursive functions:
let anf (e : tag expr) : unit aexpr =
let rec helpA (e : tag expr) : unit aexpr = ...
and helpC (e : tag expr) : (unit cexpr * (string * unit cexpr) list) = ...
and helpI (e : tag expr) : (unit immexpr * (string * unit cexpr) list) = ...
in
helpA e
;;
helpC
and helpI
will have similar
signatures, but their answers will be compound or immediate, respectively. By
contrast, helpA
produces a complete ANF expression, so its return type has
no need for a setup context.2.1 helpA
: obtaining full expressions
The helpA
function will be the only one that actually needs to collapse
the setup lists into nested ALet
expressions. To obtain such a setup
list, it will need to call either helpC
or helpI
on its argument. A
single example suffices to show which choice is better:
(* The highlighted expressions are pretty-printed exprs, not ML *)
helpI ~hl:1:s~(1 + 2)~hl:1:e~ ===> ~hl:1:s~let temp = 1 + 2 in temp~hl:1:e~
helpC ~hl:1:s~(1 + 2)~hl:1:e~ ===> ~hl:1:s~1 + 2~hl:1:e~
Since any cexpr
can be treated directly as an aexpr
(by using
ACExpr
) without introducing any unnecessary temp variables, we’d rather
have helpA
call helpC
. Once it has done so, it obtains an answer and
a setup list. We need to choose an ordering on that list that we’ll use to
define order of execution. The easiest choice to read is to let the first item
in the list be the first binding to execute. Accordingly, helpA
needs to
fold its setup list around the answer, starting from the right:
let rec anfA e : unit aexpr =
let (ans, ans_setup) = helpC e in
List.fold_right (* from right to left, *)
(fun (bind, exp) body -> ALet(bind, exp, body, ())) (* wrap each binding around the body *)
ans_setup (* from the list of setup bindings *)
(ACExpr ans) (* starting from the answer *)
Do Now!
Confirm that this definition of
helpA
will produce the claimed answer above. Confirm that usinghelpI
instead would produce the other answer above.
2.2 helpC
: obtaining compound expressions
We sketch just some of the cases for helpC
; complete the others as an
exercise.
and helpC (e : tag expr) : (unit cexpr * (string * unit cexpr) list) =
match e with
| EPrim1(op, arg, _) ->
(* Convert the argument to an immediate *)
let (arg_imm, arg_setup) = helpI arg in
(* Produce the appropriate cexpr, along with its context *)
(CPrim1(op, arg_imm, ()), arg_setup)
| EPrim2 _
| EIf _
| ELet _ -> (* complete these *)
| _ -> let (imm, setup) = helpI in (CImmExpr imm, setup) (* just wrap the immediate value *)
2.3 helpI
: obtaining immediate expressions
We sketch just some of the cases for helpI
; complete the others as an
exercise.
and helpI (e : tag expr) : (unit immexpr * (string * unit cexpr) list) =
match e with
| ENumber(n, _) -> (ImmNum(n, ()), []) (* Immediate values are immediate *)
| EBool(b, _) -> (ImmBool(b, ()), [])
| EId(name, _) -> (ImmId(name, ()), [])
| EPrim1(op, arg, tag) ->
(* We can't just return CPrim, so create a temp variable *)
let tmp = sprintf "unary_%d" tag in
(* Convert the argument *)
let (arg_imm, arg_setup) = helpI arg in
(* Return the **temp variable**, but include the CPrim as a new setup binding *)
(ImmId(tmp, ()), arg_setup @ [(tmp, CPrim1(op, arg_imm, ()))])
| EPrim2 _
| EIf _
| ELet _ -> (* complete these *)
3 Benefits of these types
Simply by compiling this code, ML’s totality checking will
guarantee for us that we’ve covered all the expr
cases, in each
match
expression of helpA
, helpC
and helpI
, which means our
translation didn’t miss any expression forms. This was true of the simpler
transformation last time, too. However, all subsequent compilation functions
that now work over aexpr
s also get guarantees that we haven’t
missed any cases. With the previous code, our compile
function would look
something like
let rec compile (e : tag expr) si env : instruction list =
match e with
| ... (* various cases *)
| _ -> failwith "Not in ANF"
This last catch-all case basically precludes ML from telling us we missed a
valid ANF expression, which means we must resort to a thorough test suite to
check whether we ever hit that case unexpectedly. By contrast, every
expression of type aexpr
is, by definition, a valid ANF expression, so we
can avoid this catch-all, and get compile-time checking back. For a language
as small as ours currently is, this benefit is admittedly small, but as
languages grow to dozens of expression forms or more, having this totality
checking is a very useful safety net.