Assignment 2: Adder:   A starter language
1 The Adder Language
1.1 Concrete Syntax
1.2 Abstract Syntax
1.3 Semantics
2 Starter code for this assignment
3 Implementing a Compiler for Adder
3.1 Writing the Compiler
3.2 Testing the Compiler
3.3 Running main
4 List of Deliverables
5 Grading Standards
6 Submission
7.2

Assignment 2: Adder: A starter language

Due: Tue 01/22 at 8:59pm

git clone

Note: The starter code for this assignment contains answers for the previous assignment. If you are taking a late day on that assignment, you must not view this code until you have completed the previous one.

Note: You will be working with partners on this assignment. Make sure that you’ve requested a team on https://handins.ccs.neu.edu/ well before the deadline, so that we can approve the team. Working solo is not allowed.

In this assignment you’ll implement a compiler for a small language called Adder (because it primarily adds things).

Reminder: Test names cannot have spaces; this is due to the way the Makefile relies on test names being used for filenames.

1 The Adder Language

In each of the next several assignments, we’ll introduce a language that we’ll implement. We’ll start small, and build up features incrementally. We’re starting with Adder, which has just a few features – defining variables, and primitive operations on numbers.

There are a few pieces that go into defining a language for us to compile.

1.1 Concrete Syntax

The concrete syntax of Adder is:

‹expr›: | NUMBER | IDENTIFIER | ( let ( ‹bindings› ) ‹expr› ) | ( add1 ‹expr› ) | ( sub1 ‹expr› ) ‹bindings›: | ( IDENTIFIER ‹expr› ) | ( IDENTIFIER ‹expr› ) ‹bindings›

Here, a let expression can have one or more bindings. An IDENTIFIER is any non-empty sequence of letters and digits (starting with a letter). Any highlighted text is a literal token, meaning the programmer must type exactly those characters or keywords.

1.2 Abstract Syntax

The abstract syntax of Adder is an OCaml datatype, and corresponds nearly one-to-one with the concrete syntax.

type prim1 =
  | Add1
  | Sub1

type 'a expr =
  | Number of int * 'a
  | Prim1 of prim1 * 'a expr * 'a
  | Let of (string * 'a expr) list * 'a expr * 'a
  | Id of string * 'a

1.3 Semantics

An Adder program always evaluates to a single integer. Numbers evaluate to themselves (so a program just consisting of Number(5) should evaluate to the integer 5). Primitive expressions perform addition or subtraction by one on their argument. Let bindings should evaluate all the binding expressions to values one by one, and after each, store a mapping from the given name to the corresponding value in both (a) the rest of the bindings, and (b) the body of the let expression. Identifiers evaluate to whatever their current stored value is. There are several examples further down to make this concrete.

The compiler should signal an error if:

Here are some examples of Adder programs:

Concrete Syntax

     

Abstract Syntax

     

Answer

5

     

Number(5)

     

5

(sub1 (add1 (sub1 5)))

     

Prim1(Sub1, Prim1(Add1, Prim1(Sub1, Number(5))))

     

4

(let ((x 5))
   (add1 x))

     

Let([("x", Number(5))],
    Prim1(Add1, Id("x")))

     

6

(let ((x 5)
      (y (sub1 x)))
  (sub1 y))

     

Let([("x", Number(5));
     ("y", Prim1(Sub1, Id("x")))],
    Prim1(Sub1, Id("y")))

     

3

2 Starter code for this assignment

You’ve been given a starter codebase that has several pieces of infrastructure:

All of your edits — which will be to write the compiler for Adder, and test it — will happen in test.ml and compile.ml.

3 Implementing a Compiler for Adder

3.1 Writing the Compiler

The primary task of writing the Adder compiler is simple to state: take an instance of the expr datatype and turn it into a list of assembly instructions. The provided compiler skeleton is set up to do just this, broken up over a few functions.

The first is

compile : pos expr -> instruction list

which takes a expr value (abstract syntax, tagged with source location information) and turns it into a list of assembly instructions, represented by the instruction type. Use only the provided instruction types for this assignment; we will be gradually expanding this set of instructions as the semester progresses. This function has an associated helper that takes some extra arguments to track the variable environment and stack offset.

The other component you need to implement is:

to_asm_string : instruction list -> string

which renders individual instances of the instruction datatype into a string representation of the instruction (this is done for you for mov). This second step is straightforward, but forces you to understand the syntax of the assembly code you are generating. Most of the compiler concepts happen in the first step, where we are generating assembly instructions from abstract syntax. Do use this assembly guide or ask! — if you have questions about the concrete syntax of an instruction.

3.2 Testing the Compiler

The test file has two helper functions that will be useful to you:

t : string -> string -> string -> OUnit.test

The first string given to t (test) is a test name, followed by an adder program (in concrete syntax) to compile and evaluate, followed by a string for the expected output of the program (this will just be an integer in quotes). This helper compiles, links, and runs the given program, and if the compiler ends in error, it will report the error message as a string. This includes problems building at the assembler/linker level, as well as any explicit failwith statements in the compiler itself.

te : string -> string -> string -> OUnit.test

The first string given to te (test-error) is a test name, followed by a program in concrete syntax to compile and evaluate, followed by a string that is a substring of the expected error message. For example, in the starter code there is a test that fails with the substring "not yet", because the Let case fails with an exception that mentions it is "not yet implemented". You should use this helper to explicitly test for the two error cases mentioned above. Your compiler should generate such errors by calling raise1Calling failwith "some string" function in OCaml is equivalent to calling raise (Failure "some string") – that is, it fails with a very generic Failure type of exception. You can define your own exception types, allowing for more fine-grained error handling than if everything were just a single, simple Failure exception.2If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors thare enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true. on some value of the appropriate exception type (which we have provided for you) and with an appropriately descriptive and distinctive message.

Both of these functions store the assembly file your compiler generated in the output/ directory, with the name of the test suffixed by .s, if it was possible to generate. So, for example, the starter tests generate a file called test1.s in the output/ directory, containing the compiled code for that case (assuming the file parses and the compiler even gets to run – right now, the starter code doesn’t quite get that far). This can be useful for debugging. If you hand-edit a generated assembly file (to tweak it for an attempted fix), you can then run:

$ make output/test1.run

to trigger the build from that assembly file. This can be helpful if you think you’re generating mostly-correct code, but just want to try a small edit to fix something up. It’s also helpful if you want to hand-write a small assembly example: you can create .s files from scratch in the output directory to experiment with, if you want to practice with assembly instructions without the compiler in the way.

For files that built completely, the generated *.run executable is also stored in output/.

To run all of your tests, run

$ make test
$ ./test

3.3 Running main

The main program built with make main takes a single file as its command-line argument, and outputs the compiled assembly string on standard out. If you want to write files containing Adder code, you can write files in the input/ directory with the suffix .adder, and then run make output/yourfile.run, which will trigger the same process that happens in test.ml – the compiler will run on your adder file, and it will attempt to link it with main.c to create an executable. The intermediate files will be stored in output/yourfile.s, output/yourfile.o, and output/yourfile.run.

You can run the file by executing:

$ ./output/yourfile.run

4 List of Deliverables

At a minimum, you must provide

Preferably, provide a zip containing all the sources of your code (including the Makefile, runner.ml, sexp.ml and main.ml files) so that we can simply download your code and run it. Please do not provide your .git/ or _build/ directories, or any other extraneous files! (The easiest way to ensure this is to run make clean before archiving your code directory.)

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

5 Grading Standards

We have provided you with several starter functions, and they have all been fully annotated with types. If you define any additional top-level functions, please annotate them in the same way. (Local definitions do not have to be annotated, but they shouldn’t be too hard to decipher, either.) There are no formal OCaml style guidelines, though ocaml.org defines some good ones to follow.

For this assignment, you will be graded on

(We may attempt to run each of your tests against each of your classmates’ compilers. If so, part of grading your tests will be how well they trip up bugs in your classmates, while still working properly with our reference solution. We don’t have this infrastructure set up yet, so this may not happen.)

6 Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline.

1Calling failwith "some string" function in OCaml is equivalent to calling raise (Failure "some string") – that is, it fails with a very generic Failure type of exception. You can define your own exception types, allowing for more fine-grained error handling than if everything were just a single, simple Failure exception.2If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors thare enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true.

2If you stop to think about it for a bit, this error type is rather weird: all other data types in OCaml have a finite set of constructors thare enumerated in a type definition, but these exceptions can be declared anywhere in your program, and do not have an a priori known finite set... It takes special work in OCaml’s type system to support this, and the exception type is the only type for which this is true.