Assignment 3: Boa: Adding new operators
Due: Thursday 01/30 at 8:59pm
git clone
In this compiler, you’ll enhance your existing work with Binary Operators and Arithmetic.
Reminder: Test names cannot have spaces; this is due to the way the Makefile
relies
on test names being used for filenames.
1 The Boa Language
1.1 Concrete Syntax
The concrete syntax of Boa is:1To read the given parser.mly
file,
start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program
, which means
that “the parse result of a program
should be an Exprs.expr
value”
(decorated with a pair of start- and end-positions), and then %start program
means that the entirety of the file should be parseable as a
program
. The %token
lines indicate the literals in the program (and,
optionally, their corresponding types), and the %left
line indicates that
the specified infix operators are left-associative. From there, hopefully the
grammar should be relatively straightforward to read. The expressions in
braces on each line tell the parser “when you successfully parse the
grammatical pattern on the left, construct a new value as follows...”
‹expr› let ‹bindings› in ‹expr› if ‹expr› : ‹expr› else: ‹expr› ‹binop-expr› ‹binop-expr› NUMBER IDENTIFIER add1 ( ‹expr› ) sub1 ( ‹expr› ) ‹expr› + ‹expr› ‹expr› - ‹expr› ‹expr› * ‹expr› ( ‹expr› ) ‹bindings› IDENTIFIER = ‹expr› IDENTIFIER = ‹expr› , ‹bindings›
As in Adder, a let
expression can have one or more bindings.
1.2 Abstract Syntax
type prim1 =
| Add1
| Sub1
type prim2 =
| Plus
| Minus
| Times
type 'a bind =
(* the 3rd component is any information specifically about the identifier, like its position *)
(string * 'a expr * 'a)
and 'a expr =
| ELet of 'a bind list * 'a expr * 'a
| EPrim1 of prim1 * 'a expr * 'a
| EPrim2 of prim2 * 'a expr * 'a expr * 'a
| EIf of 'a expr * 'a expr * 'a expr * 'a
| ENumber of int * 'a
| EId of string * 'a
1.3 Semantics
In addition to the semantics of Adder, we now have infix binary operators
(addition, subtraction and multiplication), that are evaluated
leftmost-innermost first (i.e., the standard left-to-right order that obeys
parentheses), and conditional expressions. An EIf
expression evaluates its
condition, then evaluates its then-branch if the condition is non-zero, and
evaluates its else-branch if the condition was zero.
To compile these expressions, we need a few more assembly instructions:
type instruction = ...
| ISub of arg * arg
| IMul of arg * arg
| ILabel of string
| ICmp of arg * arg
| IJne of string
| IJe of string
| IJmp of string
The sub
and imul
instructions are analogous to add
, that take
two arguments, apply their respective operations, and place their results in
RAX
.2We use imul
rather than mul
for signed multiplication.
I suppose we should name the variant IImul
, then, but that’s clunky and we’re
never going to need mul
itself anyway...
Labels let us name the first of a sequence of instructions, akin to
how we label our_code_starts_here:
to begin our code. The cmp
instruction compares its two arguments, and sets a processor flag to
EQUAL or NOT-EQUAL accordingly. This flag is something separate
from any registers we’ve worked with so far, and doesn’t have an explicit name.
Instead, we can test the value of this flag with the jump instructions:
jne
will jump control to the named label if the flag is NOT-EQUAL,
and je
will jump control to the named label when the flag is
EQUAL. Finally, jmp
will unconditionally jump to the named label.
2 Starter code for this assignment
You’ve been given a starter codebase that has several pieces of infrastructure:
The types for your AST and assembly instructions (
types.ml
). You don’t need to edit this file.A lexer and parser for the Boa language (
lexer.mll
andparser.mly
), which together will produceexpr
values tagged by a pair of source locations (Lexing.position
values) indicating the start and end of each expression. You don’t need to edit these files.A main program (
main.ml
) that uses the parser and compiler to produce assembly code from an input Boa text file. You don’t need to edit this.A
Makefile
that buildsmain.ml
, builds a tester for Boa that you will modify (test.ml
), and manipulates assembly programs created by the Boa compiler. You don’t need to edit theMakefile
, but you will edittest.ml
.3Note: the.PRECIOUS
pseudo-target ensures that the Makefile won’t delete your assembly files when it’s finished, so that you can manually inspect the results afterward for debugging purposes.An OCaml program (
runner.ml
) that works in concert with theMakefile
to allow you to compile and run a Boa program from within OCaml, which is quite useful for testing. You don’t need to editrunner.ml
.A pretty-printer (
pretty.ml
) for expressions. There are a few pretty-printers in this file of different levels of detail, mostly for your edification. The simplest,string_of_expr
, produces a string that’s in Boa syntax, and pretty close to the input program (with some extra parentheses, just for clarity). You should be able to read and understand this function with little difficulty; it’s a straightforward tree-traversing function. Additionally,string_of_pos
will print out a textual rendition of a (start,end) pair of source positions.The two functions
ast_of_expr
andast_of_pos_expr
will pretty-print the program in a form that’s closer to ML’s syntax, so as to better visualize the actual expression value you’re working with. The_pos
variant will also print out the position information, if it’s available. This function is a lot trickier to read, as it relies on ML’sFormat
library. (Actually writing a pretty-printer is a distraction from our goal, but having a pretty printer is often a welcome debugging aid...) You’re welcome to try to figure out how this function works, in tandem with the library documentation (which, frankly, is much easier to understand when you have an example program that uses it!).You do not need to modify this file, but you’re welcome to enhance it if you define additional types.
All of your edits —test.ml
and compile.ml
.
3 Implementing a Compiler for Boa
Again, the primary task of writing the Boa compiler is simple to state: take an
instance of the expr
datatype and turn it into a list of assembly
instructions. But since we now have more complicated expressions, we need to
worry about simplifying the program, first.
3.1 Checking for scoping problems
In Adder, we asked you to confirm that no Let
expression bound two
identifiers with the same name, and to confirm that no expression referred to
an unbound identifier. You likely interspersed that code with the compiler
itself. While this was fine for Adder, let’s refactor this code into something
more maintainable.
Define the function
check_scope (e : (Lexing.position * Lexing.position) expr) : unit
that abstracts out these two checks. This function returns nothing of any consequence; it just raises any appropriateBindingError
s as its only side-effect. (Hint: you may find the functionignore : 'a -> unit
to be useful.)
3.2 Tagging
Let type tag = int
.
Design (and test!) a function
tag : 'a expr -> tag expr
that gives every expression and let-binding a unique numerical id.Hint: you will likely want to define a helper function of type
'a expr -> tag -> (tag expr * tag)
that takes an expression and the starting tag to use, and returns both the tagged expression and the next available tag for use. You may also want to write a functionuntag : 'a expr -> unit expr
that strips out whatever information is decorating the expression; this is most useful for writing small test cases.
3.3 Renaming
As described in the lecture notes, ANF transformations can inadvertently introduce new shadowing errors.
Design (and test!) a function
rename : tag expr -> tag expr
that uniquely renames all names in the program such that there are no longer any shadowing collisions, and all name bindings are unique.Hint: you may pick any naming convention to produce new unique names; a convenient one may be
sprintf "%s#%d" name tag
, wherename
is the original name, andtag
is the output of your previous function for this node.Hitnt: you will need a helper function that takes in an environment of renamings from original names to new unique names, which you will update in the
ELet
case and use in theEId
case. Be careful about the ordering of this list, and about the ordering of the bindings you produce when transforming theELet
!
3.4 Converting to A-Normal Form
A-Normal Form asserts that throughout a program, any function-call or operator
expression contains arguments that are immediate: that is, are numbers
or identifiers, and therefore don’t perform any computation of their own.
Additionally, we can think of the decision in an EIf
expression to be an
operation, so we also need to ensure that the condition is immediate. The
following function is a reliable predicate for checking this property:
let rec is_anf (e : 'a expr) : bool =
match e with
| EPrim1(_, e, _) -> is_imm e
| EPrim2(_, e1, e2, _) -> is_imm e1 && is_imm e2
| ELet(binds, body, _) ->
List.for_all (fun (_, e, _) -> is_anf e) binds
&& is_anf body
| EIf(cond, thn, els, _) -> is_imm cond && is_anf thn && is_anf els
| _ -> is_imm e
and is_imm e =
match e with
| ENumber _ -> true
| EId _ -> true
| _ -> false
;;
Design (and test!) a function
anf : tag expr -> unit expr
that takes any tagged expression and produces a new expression that is in A-Normal Form. Because this program will include some new expressions, we’re willing to discard any decorations we had on the expression, which explains why the input could be decorated with tag information, but the output will just be decorated with unit values. You will very likely need to write one or more helper functions.There are several ways to define this function. The simplest is to start with a helper
help : tag expr -> (unit expr * (string * unit expr) list)
that takes an expression and returnsThe final answer of the program
A list of bindings that form the context that need to run as “setup,” in which the answer makes sense.
All of these expressions (the answer and the bound values) must all be in A-Normal Form.
When you need to generate fresh names (i.e., unique names that aren’t used anywhere in the expression), generate names of the form
"$reason_###"
, where reason is “prim1”, “prim2”, “if”, etc., and the numbers come from the expression’s tag.Then, define a helper function that takes this resulting answer-and-context pair, and collapses it into a single
unit expr
that is the final, converted program. (You will need to be careful about the order of bindings in your context list, and the order in which they are processed by this second helper.)(6410 students: required; 4410 students: optional) Once you’ve defined
anf
as above, you may notice that it produces unnecessarily-verbose output, with extra let-bindings that could be avoided. For example,1 + 2
should not need any let-bindings, since all the arguments are immediate. Split yourhelp
function into two mutually-recursive helpers,helpC
can return an answer that is any ANF expressionhelpI
must return an answer that is an immediate expression
such that you use as few let-bindings as possible. (For what it’s worth, in my reference solution,
helpC
callshelpI
four times and itself three times; converselyhelpI
callshelpC
just once and itself six times.)
3.5 Compilation
Enhance your
compile
function from Adder to compile the new expression forms in Boa. You may want to restrict its signature to only accept tagged expressions that are in A-Normal Form. Remember that the invariant forcompile
is that it always leaves its answer inRAX
; this invariant will definitely help you!
4 Recommendations
Here’s an order in which you could consider tackling the implementation:
Write some tests for the input and output of
anf
for nestedEPrim1
expressions to understand what the ANF transformation looks like on those examples.Finish the
EIf
case forcompile_expr
(with immediate conditions) so you can run simple programs with if. You will also need to fill in the printing of the relevant assembly instructions, so that they can be output...Write some tests for the input and output of performing
anf
on if-expressions, again to get a feel for what the transformation looks like.Work through both the
anf
implementation and theEPrim2
case ofcompile_expr
. Write tests as you go.Work through both the
anf
implementation and theELet
case ofcompile_expr
. Write tests as you go.
5 Testing the Compiler
The predefined testing helpers for Boa are mostly the same as for Adder; see
Assignment 2 for details. I’ve added tprog
, which is similar to t
except it takes the filename of a Boa input file as input, rather than source
text, and teprog
, which is similar but expects an error to occur. There
are also tanf
, which tests your anf
function alone, and ta
,
which attempts to compile a program that’s already in ANF (so that you can test
your compiler independently of testing your ANF conversion).
Testing is truly important. My initial draft of a solution for this assignment had an incredibly silly typo, that was causing my program to behave in a very odd manner. I thought I could figure it out just by looking at the final assembly output. Instead I had to go back and test each individual function, at which point the error became immediately obvious.
Write several test suites, one for each major function that you’re trying to
test. Sequence them together at the end of test.ml
to run them all.
6 Running main
Running your own programs is the same as with Adder, except you’ll give them
the .boa
file extension.
7 List of Deliverables
your
compile.ml
any additional modules you saw fit to write
tests in an OUnit test module (
test.ml
)any test input programs (
input/*.boa
files)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
8 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
9 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.khoury.northeastern.edu/ by the above deadline.
1To read the given parser.mly
file,
start from the line that says %type <(Lexing.position * Lexing.position) Exprs.expr> program
, which means
that “the parse result of a program
should be an Exprs.expr
value”
(decorated with a pair of start- and end-positions), and then %start program
means that the entirety of the file should be parseable as a
program
. The %token
lines indicate the literals in the program (and,
optionally, their corresponding types), and the %left
line indicates that
the specified infix operators are left-associative. From there, hopefully the
grammar should be relatively straightforward to read. The expressions in
braces on each line tell the parser “when you successfully parse the
grammatical pattern on the left, construct a new value as follows...”
2We use imul
rather than mul
for signed multiplication.
I suppose we should name the variant IImul
, then, but that’s clunky and we’re
never going to need mul
itself anyway...
3Note: the .PRECIOUS
pseudo-target ensures that
the Makefile won’t delete your assembly files when it’s finished, so that you
can manually inspect the results afterward for debugging purposes.