Our prior implementations of ANF conversion were functions of the form
anf : tag expr -> unit expr, that produced a value of the same
data type, but one that happened to satisfy the
is_anf predicate we
defined. This was certainly expedient:
We didn’t need to define new data types,
Our intuitions about the semantics of
exprcarried over to ANF’ed
All our utility functions (like pretty-printing) just worked with ANF’ed expressions
But it has downsides, primarily that we lose a lot of the power and guarantees that ML’s type system could provide us. If we define our types correctly, ML can assure us that
The result of
anfis guaranteed to be an expression in A-normal form,
Any functions that do further work on ANF’ed expressions are not missing any steps —
unlike our current ones, which have a catch-all case that throws an exception claiming that any remaining expressions just aren’t ANF.
Let’s try to define these types, then revisit the transformation functions.
Recall our definition of
let rec is_anf (e : 'a expr) : bool = match e with | EPrim1(_, e, _) -> is_imm e | EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2 | ELet(binds, body, _) -> List.for_all (fun (_, e, _) -> is_anf e) binds && is_anf body | EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els | _ -> is_imm e and is_imm e = match e with | ENumber _ -> true | EId _ -> true | EBool _ -> true | _ -> false ;;
The goal was to ensure that every primitive operation only received immediate operands, which were anything whose value was available without any further computation: namely, constants and identifiers. Now, if we try to define a new expression type for ANF expressions, we’ll want to encode that invariant in the types.
Try to define a new expression type,
'a aexpr, that describes this ANF invariant.
It’s easiest to start from the immediate values:
type 'a immexpr = (* immediate expressions *) | ImmNum of int64 * 'a | ImmBool of bool * 'a | ImmId of string * 'a
The next level up from immediate expressions are compound expressions, which will include basically everything else:
(* Attempt #1 *) type 'a comp_expr = | CImmExpr of 'a immexpr (* base cases *) | CPrim1 of prim1 * 'a immexpr * 'a (* argument must be immediate *) | CPrim2 of prim2 * 'a immexpr * 'a immexpr * 'a (* arguments must be immediate *) | CIf of 'a immexpr * 'a comp_expr * 'a comp_expr (* condition must be immediate *) | CLet of (string * 'a comp_expr) list * 'a comp_expr * 'a (* ugh. *)
This data definition correctly encodes our stated ANF property, but that
CLet definition is still pretty unwieldy. In particular, it permits
multiple bindings within a single
CLet (because of the
list), and it
allows let-bindings to appear in the binding-expressions themselves. This in
turn means that our binding expressions can be arbitrarily complex, which
would be nice to avoid, if possible.
We can correct the multiple bindings part just by removing the
correcting the other flaw requires a separation of
comp_expr into two
distinct, mutually recursive types:
type 'a cexpr = (* compound expressions *) | CIf of 'a immexpr * 'a aexpr * 'a aexpr * 'a | CPrim1 of prim1 * 'a immexpr * 'a | CPrim2 of prim2 * 'a immexpr * 'a immexpr * 'a | CImmExpr of 'a immexpr and 'a aexpr = (* anf expressions *) | ALet of string * 'a cexpr * 'a aexpr * 'a | ACExpr of 'a cexpr
Note in particular that
ALet accepts binding expressions that are
cexpr, which cannot itself be a let-binding. Also notice
CIf expressions contain branches that are
aexprs: the branches
might well need let-bindings, but we don’t want to execute any of the code for
the branch that isn’t taken. So we can’t lift all the let-bindings out of the
if-expressions; we need to permit them to remain inside its branches.
Translate your definition of
anfto produce values of type
unit aexpr, instead of
unit expr. Hint: one of the two versions will be a much easier starting point than the other...
From Lecture 4: Conditionals and A-Normal Form, we have an attempted signature of
let anf_without_types (e : tag expr) : unit expr = let rec help (e : tag expr) : (unit expr * (string * unit expr) list) = ... in ...
The intuition behind the return value of
help was to return an answer
expression that was an immediate, followed by a binding-list of ANF’ed
expressions that formed the context in which the answer made sense. We can
translate this into our new types as
let anf (e : tag expr) : unit aexpr = let rec help (e : tag expr) : (unit immexpr * (string * unit aexpr) list) = ... in ...
But this type is overly strict. Consider the complete program
3 + 5. In order to call
help on this expression, we have to return
unit immexpr, which means we need to effectively produce a translated
let temp = 3 + 5 in temp. But nothing in our definition of
ANF requires that the final expression of our program be an immediate! We
merely require that all operators have immediate operands. So we need at least
one helper with a less-stringent signature, to allow us to return non-immediate
answers. Fortunately, we now have an intermediate type, between “all ANF
'a aexpr) and “all immediate expressions” (
namely compound expressions (
Note that similar analyses holds for the branches of a conditional, and the
right-hand side of a let-binding. These inform our choices of where to use
'a aexpr and where to use
'a cexpr in our type definitions above.
Complete this analysis, and justify to yourself why the data definitions above have the desired properties.
So, now that we have three types for our ANF expressions instead of just one, we’ll need three mutually recursive functions:
let anf (e : tag expr) : unit aexpr = let rec helpA (e : tag expr) : unit aexpr = ... and helpC (e : tag expr) : (unit cexpr * (string * unit cexpr) list) = ... and helpI (e : tag expr) : (unit immexpr * (string * unit cexpr) list) = ... in helpA e ;;
helpIwill have similar signatures, but their answers will be compound or immediate, respectively. By contrast,
helpAproduces a complete ANF expression, so its return type has no need for a setup context.
helpA function will be the only one that actually needs to collapse
the setup lists into nested
ALet expressions. To obtain such a setup
list, it will need to call either
helpI on its argument. A
single example suffices to show which choice is better:
(* The highlighted expressions are pretty-printed exprs, not ML *) helpI ~hl:1:s~(1 + 2)~hl:1:e~ ===> ~hl:1:s~let temp = 1 + 2 in temp~hl:1:e~ helpC ~hl:1:s~(1 + 2)~hl:1:e~ ===> ~hl:1:s~1 + 2~hl:1:e~
cexpr can be treated directly as an
aexpr (by using
ACExpr) without introducing any unnecessary temp variables, we’d rather
helpC. Once it has done so, it obtains an answer and
a setup list. We need to choose an ordering on that list that we’ll use to
define order of execution. The easiest choice to read is to let the first item
in the list be the first binding to execute. Accordingly,
helpA needs to
fold its setup list around the answer, starting from the right:
let rec helpA e : unit aexpr = let (ans, ans_setup) = helpC e in List.fold_right (* from right to left, *) (fun (bind, exp) body -> ALet(bind, exp, body, ())) (* wrap each binding around the body *) ans_setup (* from the list of setup bindings *) (ACExpr ans) (* starting from the answer *)
Confirm that this definition of
helpAwill produce the claimed answer above. Confirm that using
helpIinstead would produce the other answer above.
We sketch just some of the cases for
helpC; complete the others as an
and helpC (e : tag expr) : (unit cexpr * (string * unit cexpr) list) = match e with | EPrim1(op, arg, _) -> (* Convert the argument to an immediate *) let (arg_imm, arg_setup) = helpI arg in (* Produce the appropriate cexpr, along with its context *) (CPrim1(op, arg_imm, ()), arg_setup) | EPrim2 _ | EIf _ | ELet _ -> (* complete these *) | _ -> let (imm, setup) = helpI in (CImmExpr imm, setup) (* just wrap the immediate value *)
We sketch just some of the cases for
helpI; complete the others as an
and helpI (e : tag expr) : (unit immexpr * (string * unit cexpr) list) = match e with | ENumber(n, _) -> (ImmNum(n, ()), ) (* Immediate values are immediate *) | EBool(b, _) -> (ImmBool(b, ()), ) | EId(name, _) -> (ImmId(name, ()), ) | EPrim1(op, arg, tag) -> (* We can't just return CPrim, so create a temp variable *) let tmp = sprintf "unary_%d" tag in (* Convert the argument *) let (arg_imm, arg_setup) = helpI arg in (* Return the **temp variable**, but include the CPrim as a new setup binding *) (ImmId(tmp, ()), arg_setup @ [(tmp, CPrim1(op, arg_imm, ()))]) | EPrim2 _ | EIf _ | ELet _ -> (* complete these *)
Simply by compiling this code, ML’s totality checking will
guarantee for us that we’ve covered all the
expr cases, in each
match expression of
helpI, which means our
translation didn’t miss any expression forms. This was true of the simpler
transformation last time, too. However, all subsequent compilation functions
that now work over
aexprs also get guarantees that we haven’t
missed any cases. With the previous code, our
compile function would look
let rec compile (e : tag expr) si env : instruction list = match e with | ... (* various cases *) | _ -> failwith "Not in ANF"
This last catch-all case basically precludes ML from telling us we missed a
valid ANF expression, which means we must resort to a thorough test suite to
check whether we ever hit that case unexpectedly. By contrast, every
expression of type
aexpr is, by definition, a valid ANF expression, so we
can avoid this catch-all, and get compile-time checking back. For a language
as small as ours currently is, this benefit is admittedly small, but as
languages grow to dozens of expression forms or more, having this totality
checking is a very useful safety net.