Assignment 3: Boa: Adding new operators
Due: Thursday 01/30 at 8:59pm
git clone 
In this compiler, you’ll enhance your existing work with Binary Operators and Arithmetic.
Reminder: Test names cannot have spaces; this is due to the way the Makefile relies
on test names being used for filenames.
1 The Boa Language
1.1 Concrete Syntax
The concrete syntax of Boa is:1To read the given parser.mly file,
start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means
that “the parse result of a program should be an Exprs.expr value”
(decorated with a pair of start- and end-positions), and then %start program
means that the entirety of the file should be parseable as a
program.  The %token lines indicate the literals in the program (and,
optionally, their corresponding types), and the %left line indicates that
the specified infix operators are left-associative.  From there, hopefully the
grammar should be relatively straightforward to read.  The expressions in
braces on each line tell the parser “when you successfully parse the
grammatical pattern on the left, construct a new value as follows...”
‹expr› let ‹bindings› in ‹expr› if ‹expr› : ‹expr› else: ‹expr› ‹binop-expr› ‹binop-expr› NUMBER IDENTIFIER add1 ( ‹expr› ) sub1 ( ‹expr› ) ‹expr› + ‹expr› ‹expr› - ‹expr› ‹expr› * ‹expr› ( ‹expr› ) ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
As in Adder, a let expression can have one or more bindings.
1.2 Abstract Syntax
type prim1 =
  | Add1
  | Sub1
type prim2 =
  | Plus
  | Minus
  | Times
type 'a bind =
  (* the 3rd component is any information specifically about the identifier, like its position *)
  (string * 'a expr * 'a)
and 'a expr =
  | ELet of 'a bind list * 'a expr * 'a
  | EPrim1 of prim1 * 'a expr * 'a
  | EPrim2 of prim2 * 'a expr * 'a expr * 'a
  | EIf of 'a expr * 'a expr * 'a expr * 'a
  | ENumber of int * 'a
  | EId of string * 'a1.3 Semantics
In addition to the semantics of Adder, we now have infix binary operators
(addition, subtraction and multiplication), that are evaluated
leftmost-innermost first (i.e., the standard left-to-right order that obeys
parentheses), and conditional expressions.  An EIf expression evaluates its
condition, then evaluates its then-branch if the condition is non-zero, and
evaluates its else-branch if the condition was zero.
To compile these expressions, we need a few more assembly instructions:
type instruction = ...
  | ISub of arg * arg
  | IMul of arg * arg
  | ILabel of string
  | ICmp of arg * arg
  | IJne of string
  | IJe of string
  | IJmp of stringThe sub and imul instructions are analogous to add, that take
two arguments, apply their respective operations, and place their results in
RAX.2We use imul rather than mul for signed multiplication.
I suppose we should name the variant IImul, then, but that’s clunky and we’re
never going to need mul itself anyway...
Labels let us name the first of a sequence of instructions, akin to
how we label our_code_starts_here: to begin our code.  The cmp
instruction compares its two arguments, and sets a processor flag to
EQUAL or NOT-EQUAL accordingly.  This flag is something separate
from any registers we’ve worked with so far, and doesn’t have an explicit name.
Instead, we can test the value of this flag with the jump instructions:
jne will jump control to the named label if the flag is NOT-EQUAL,
and je will jump control to the named label when the flag is
EQUAL.  Finally, jmp will unconditionally jump to the named label.
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure:
The types for your AST and assembly instructions (
types.ml). You don’t need to edit this file.A lexer and parser for the Boa language (
lexer.mllandparser.mly), which together will produceexprvalues tagged by a pair of source locations (Lexing.positionvalues) indicating the start and end of each expression. You don’t need to edit these files.A main program (
main.ml) that uses the parser and compiler to produce assembly code from an input Boa text file. You don’t need to edit this.A
Makefilethat buildsmain.ml, builds a tester for Boa that you will modify (test.ml), and manipulates assembly programs created by the Boa compiler. You don’t need to edit theMakefile, but you will edittest.ml.3Note: the.PRECIOUSpseudo-target ensures that the Makefile won’t delete your assembly files when it’s finished, so that you can manually inspect the results afterward for debugging purposes.An OCaml program (
runner.ml) that works in concert with theMakefileto allow you to compile and run a Boa program from within OCaml, which is quite useful for testing. You don’t need to editrunner.ml.A pretty-printer (
pretty.ml) for expressions. There are a few pretty-printers in this file of different levels of detail, mostly for your edification. The simplest,string_of_expr, produces a string that’s in Boa syntax, and pretty close to the input program (with some extra parentheses, just for clarity). You should be able to read and understand this function with little difficulty; it’s a straightforward tree-traversing function. Additionally,string_of_poswill print out a textual rendition of a (start,end) pair of source positions.The two functions
ast_of_exprandast_of_pos_exprwill pretty-print the program in a form that’s closer to ML’s syntax, so as to better visualize the actual expression value you’re working with. The_posvariant will also print out the position information, if it’s available. This function is a lot trickier to read, as it relies on ML’sFormatlibrary. (Actually writing a pretty-printer is a distraction from our goal, but having a pretty printer is often a welcome debugging aid...) You’re welcome to try to figure out how this function works, in tandem with the library documentation (which, frankly, is much easier to understand when you have an example program that uses it!).You do not need to modify this file, but you’re welcome to enhance it if you define additional types.
All of your edits —test.ml and compile.ml.
3 Implementing a Compiler for Boa
Again, the primary task of writing the Boa compiler is simple to state: take an
instance of the expr datatype and turn it into a list of assembly
instructions.  But since we now have more complicated expressions, we need to
worry about simplifying the program, first.
3.1 Checking for scoping problems
In Adder, we asked you to confirm that no Let expression bound two
identifiers with the same name, and to confirm that no expression referred to
an unbound identifier.  You likely interspersed that code with the compiler
itself.  While this was fine for Adder, let’s refactor this code into something
more maintainable.
Define the function
check_scope (e : (Lexing.position * Lexing.position) expr) : unitthat abstracts out these two checks. This function returns nothing of any consequence; it just raises any appropriateBindingErrors as its only side-effect. (Hint: you may find the functionignore : 'a -> unitto be useful.)
3.2 Tagging
Let type tag = int.
Design (and test!) a function
tag : 'a expr -> tag exprthat gives every expression and let-binding a unique numerical id.Hint: you will likely want to define a helper function of type
'a expr -> tag -> (tag expr * tag)that takes an expression and the starting tag to use, and returns both the tagged expression and the next available tag for use. You may also want to write a functionuntag : 'a expr -> unit exprthat strips out whatever information is decorating the expression; this is most useful for writing small test cases.
3.3 Renaming
As described in the lecture notes, ANF transformations can inadvertently introduce new shadowing errors.
Design (and test!) a function
rename : tag expr -> tag exprthat uniquely renames all names in the program such that there are no longer any shadowing collisions, and all name bindings are unique.Hint: you may pick any naming convention to produce new unique names; a convenient one may be
sprintf "%s#%d" name tag, wherenameis the original name, andtagis the output of your previous function for this node.Hitnt: you will need a helper function that takes in an environment of renamings from original names to new unique names, which you will update in the
ELetcase and use in theEIdcase. Be careful about the ordering of this list, and about the ordering of the bindings you produce when transforming theELet!
3.4 Converting to A-Normal Form
A-Normal Form asserts that throughout a program, any function-call or operator
expression contains arguments that are immediate: that is, are numbers
or identifiers, and therefore don’t perform any computation of their own.
Additionally, we can think of the decision in an EIf expression to be an
operation, so we also need to ensure that the condition is immediate.  The
following function is a reliable predicate for checking this property:
let rec is_anf (e : 'a expr) : bool =
  match e with
  | EPrim1(_, e, _) -> is_imm e
  | EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2
  | ELet(binds, body, _) ->
     List.for_all (fun (_, e, _) -> is_anf e) binds
     && is_anf body
  | EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els
  | _ -> is_imm e
and is_imm e =
  match e with
  | ENumber _ -> true
  | EId _ -> true
  | _ -> false
;;Design (and test!) a function
anf : tag expr -> unit exprthat takes any tagged expression and produces a new expression that is in A-Normal Form. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with tag information, but the output will just be decorated with unit values. You will very likely need to write one or more helper functions.There are several ways to define this function. The simplest is to start with a helper
help : tag expr -> (unit expr * (string * unit expr) list)that takes an expression and returnsThe final answer of the program
A list of bindings that form the context that need to run as “setup,” in which the answer makes sense.
All of these expressions (the answer and the bound values) must all be in A-Normal Form.
When you need to generate fresh names (i.e., unique names that aren’t used anywhere in the expression), generate names of the form
"$reason_###", where reason is “prim1”, “prim2”, “if”, etc., and the numbers come from the expression’s tag.Then, define a helper function that takes this resulting answer-and-context pair, and collapses it into a single
unit exprthat is the final, converted program. (You will need to be careful about the order of bindings in your context list, and the order in which they are processed by this second helper.)(6410 students: required; 4410 students: optional) Once you’ve defined
anfas above, you may notice that it produces unnecessarily-verbose output, with extra let-bindings that could be avoided. For example,1 + 2should not need any let-bindings, since all the arguments are immediate. Split yourhelpfunction into two mutually-recursive helpers,helpCcan return an answer that is any ANF expressionhelpImust return an answer that is an immediate expression
such that you use as few let-bindings as possible. (For what it’s worth, in my reference solution,
helpCcallshelpIfour times and itself three times; converselyhelpIcallshelpCjust once and itself six times.)
3.5 Compilation
Enhance your
compilefunction from Adder to compile the new expression forms in Boa. You may want to restrict its signature to only accept tagged expressions that are in A-Normal Form. Remember that the invariant forcompileis that it always leaves its answer inRAX; this invariant will definitely help you!
4 Recommendations
Here’s an order in which you could consider tackling the implementation:
Write some tests for the input and output of
anffor nestedEPrim1expressions to understand what the ANF transformation looks like on those examples.Finish the
EIfcase forcompile_expr(with immediate conditions) so you can run simple programs with if. You will also need to fill in the printing of the relevant assembly instructions, so that they can be output...Write some tests for the input and output of performing
anfon if-expressions, again to get a feel for what the transformation looks like.Work through both the
anfimplementation and theEPrim2case ofcompile_expr. Write tests as you go.Work through both the
anfimplementation and theELetcase ofcompile_expr. Write tests as you go.
5 Testing the Compiler
The predefined testing helpers for Boa are mostly the same as for Adder; see
Assignment 2 for details.  I’ve added tprog, which is similar to t
except it takes the filename of a Boa input file as input, rather than source
text, and teprog, which is similar but expects an error to occur.  There
are also tanf, which tests your anf function alone, and ta,
which attempts to compile a program that’s already in ANF (so that you can test
your compiler independently of testing your ANF conversion).
Testing is truly important. My initial draft of a solution for this assignment had an incredibly silly typo, that was causing my program to behave in a very odd manner. I thought I could figure it out just by looking at the final assembly output. Instead I had to go back and test each individual function, at which point the error became immediately obvious.
Write several test suites, one for each major function that you’re trying to
test.  Sequence them together at the end of test.ml to run them all.
6 Running main
Running your own programs is the same as with Adder, except you’ll give them
the .boa file extension.
7 List of Deliverables
your
compile.mlany additional modules you saw fit to write
tests in an OUnit test module (
test.ml)any test input programs (
input/*.boafiles)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
8 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
9 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.
1To read the given parser.mly file,
start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program, which means
that “the parse result of a program should be an Exprs.expr value”
(decorated with a pair of start- and end-positions), and then %start program
means that the entirety of the file should be parseable as a
program.  The %token lines indicate the literals in the program (and,
optionally, their corresponding types), and the %left line indicates that
the specified infix operators are left-associative.  From there, hopefully the
grammar should be relatively straightforward to read.  The expressions in
braces on each line tell the parser “when you successfully parse the
grammatical pattern on the left, construct a new value as follows...”
2We use imul rather than mul for signed multiplication.
I suppose we should name the variant IImul, then, but that’s clunky and we’re
never going to need mul itself anyway...
3Note: the .PRECIOUS pseudo-target ensures that
the Makefile won’t delete your assembly files when it’s finished, so that you
can manually inspect the results afterward for debugging purposes.
