Assignment 4: Cobra: Multiple types of values
1 The Cobra Language
1.1 Concrete Syntax
1.2 Abstract Syntax
1.3 Semantics
2 Examples
3 Implementation strategies
3.1 Rendering errors
3.2 Memory Layout and Calling C Functions
3.3 New Assembly Constructs
3.4 Some software engineering considerations
3.5 Testing Functions
3.5.1 Unit testing parts of the compiler
3.5.2 Integration testing the compiler
4 Recommended TODO List
5 Running main
6 List of Deliverables
7 Grading Standards
8 Submission

Assignment 4: Cobra: Multiple types of values

Due: Friday 02/18 at 8:59pm

git clone

In this compiler, you’ll deal with COded Binary RepresentAtions of values.

1 The Cobra Language

1.1 Concrete Syntax

The concrete syntax of Cobra is very similar to Boa, with a few new additions.

‹prim1›: ...
  | !
  | print
  | isbool
  | isnum

‹expr›: ...
  | true
  | false
  | ‹expr› < ‹expr›
  | ‹expr› > ‹expr›
  | ‹expr› <= ‹expr›
  | ‹expr› >= ‹expr›
  | ‹expr› == ‹expr›
  | ‹expr› && ‹expr›
  | ‹expr› || ‹expr›

1.2 Abstract Syntax

The abstract syntax is also very similar to Boa’s:

type prim1 = ...
  | Print
  | IsBool
  | IsNum
  | Not

type prim2 = ...
  | And
  | Or
  | Greater
  | GreaterEq
  | Less
  | LessEq
  | Eq

type 'a expr = ...
  | EBool of bool * 'a

1.3 Semantics

The semantics of booleans are straightforward. The semantics of EIf changes slightly: its condition must evaluate to a boolean, and it branches on the truth or falsehood of that value, rather than whether it’s nonzero.

With the addition of two types to the language, there are two main changes that ripple through the implementation: the representation of values, and the possibility of errors.

There is one other major addition, which is the print primitive, discussed more below.

The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:

You should augment the provided print function in main.c to print these values correctly: true and false should print as those words, and numbers should print out as the underlying number being represented.
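As a minimal sketch of the tag checks this print function needs, the C below assumes one possible encoding: a number n is stored as n shifted left by one bit (so the low bit is 0), and booleans have the low bit set, with true as 0xFFFFFFFFFFFFFFFF and false as 0x7FFFFFFFFFFFFFFF. Those constants and the helper names encode_num, decode_num and format_value are assumptions for illustration; substitute whatever representation you chose. The print here returns its argument unchanged, since compiled code needs to keep using that value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED encoding (hypothetical; substitute your own):
   numbers:  n << 1, so the low (tag) bit is 0
   booleans: low bit 1; true = all ones, false = all ones except the top bit */
#define BOOL_TRUE  0xFFFFFFFFFFFFFFFFULL
#define BOOL_FALSE 0x7FFFFFFFFFFFFFFFULL

static uint64_t encode_num(int64_t n) { return (uint64_t)n << 1; }

/* The tagged value is even, so dividing by 2 exactly undoes the shift,
   including for negative numbers. */
static int64_t decode_num(uint64_t v) { return (int64_t)v / 2; }

/* Writes the printed form of v into buf (at most n bytes, NUL-terminated). */
static void format_value(uint64_t v, char *buf, size_t n) {
  if ((v & 1) == 0) {
    snprintf(buf, n, "%" PRId64, decode_num(v));
  } else if (v == BOOL_TRUE) {
    snprintf(buf, n, "true");
  } else if (v == BOOL_FALSE) {
    snprintf(buf, n, "false");
  } else {
    snprintf(buf, n, "Unknown value: %#" PRIx64, v);
  }
}

/* Prints the value on its own line, then returns it unchanged so the
   compiled code can go on using it (it comes back in RAX). */
uint64_t print(uint64_t val) {
  char buf[64];
  format_value(val, buf, sizeof buf);
  printf("%s\n", buf);
  return val;
}
```

For example, format_value(encode_num(-7), buf, sizeof buf) leaves "-7" in buf. Keeping the formatting separate from the printf call makes the tag logic easy to unit-test from C.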

You should raise errors in the following cases:

These error messages should be printed on standard error. The other operators never fail, and so never produce errors.

We add two new primitive operations, isbool and isnum. These two operations have an effective type of Any -> Bool, and will return true if the argument they receive is indeed a boolean or number, respectively.
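Under an assumed encoding where numbers have low bit 0 and booleans have low bit 1 (true = 0xFFFFFFFFFFFFFFFF, false = 0x7FFFFFFFFFFFFFFF, differing only in the top bit), isbool and isnum reduce to inspecting the low bit. The hypothetical C functions below compute the answer without branching, using only AND, XOR, shift and OR, which is roughly the instruction sequence compiled code could emit. The names cobra_isbool, cobra_isnum and cobra_not are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED encoding (hypothetical): numbers have low bit 0; booleans have
   low bit 1, and true/false differ only in the top bit. */
#define BOOL_TRUE  0xFFFFFFFFFFFFFFFFULL
#define BOOL_FALSE 0x7FFFFFFFFFFFFFFFULL
#define BOOL_BIT   0x8000000000000000ULL

/* isbool: move the tag bit into the top bit, then OR in false's bits.
   Tag 1 yields true, tag 0 yields false. Never fails. */
static uint64_t cobra_isbool(uint64_t v) {
  return ((v & 1) << 63) | BOOL_FALSE;
}

/* isnum: the same, but with the tag bit flipped first. Never fails. */
static uint64_t cobra_isnum(uint64_t v) {
  return (((v & 1) ^ 1) << 63) | BOOL_FALSE;
}

/* Not: flips the top bit. Only meaningful after the caller has checked
   that v is a boolean. */
static uint64_t cobra_not(uint64_t v) { return v ^ BOOL_BIT; }
```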

The last required primitive operation is print, which prints its single argument to the command line, and then returns it. The print function in main.c explicitly returns its argument; you will need to retrieve and use that value in your compiled output.

2 Examples

  1. The expression

    let x = 1 in
    let y = print(x + 1) in
    print(y + 2)

    will output

    2
    4
    4

    The first 2 comes from the first print expression. The first 4 comes from the second print expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.

  2. The expression

    if 54: true else: false

    prints (on standard error) something like:

    Error: if expected a boolean, got 54
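To make the new EIf semantics concrete, here is a hypothetical C model of the decision compiled code makes before branching, assuming numbers carry a 0 low bit (n << 1) and true/false are 0xFFFFFFFFFFFFFFFF/0x7FFFFFFFFFFFFFFF. The function name if_branch and its return convention are illustrative only; in real output, the -1 case corresponds to jumping to an error label.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED encoding (hypothetical). */
#define BOOL_TRUE  0xFFFFFFFFFFFFFFFFULL
#define BOOL_FALSE 0x7FFFFFFFFFFFFFFFULL

/* Returns 1 for the then-branch and 0 for the else-branch. Returns -1 and
   writes an error message into msg if the condition is not a boolean. */
static int if_branch(uint64_t cond, char *msg, size_t n) {
  if ((cond & 1) == 0) {  /* tag bit 0: a number, not a boolean */
    snprintf(msg, n, "Error: if expected a boolean, got %" PRId64,
             (int64_t)cond / 2);
    return -1;
  }
  return cond == BOOL_TRUE;  /* branch on truth, not on nonzero-ness */
}
```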

3 Implementation strategies

3.1 Rendering errors

To display error messages on standard error, you’ll need to use a call something like:

fprintf(stderr, "Error: arithmetic expected a number");

I recommend that you design a function void error(int errCode) in main.c that handles all errors in a uniform manner. Then, you should add a suffix to your generated assembly (i.e., change what goes in compile_anf_to_string) that looks something like:

err_arith_not_num:
  mov RDI, <whatever your error code for arithmetic-op-didn't-get-numbers is>
  call error

You are welcome to make this signature more elaborate, to pass the mistaken value into the error handler so that it can be printed as part of the error message. We sketched this possibility out in class.
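One way to flesh out the more elaborate signature is sketched below. The error codes, the message wording (apart from the two messages shown in this handout), the exit status, and the value encoding (numbers shifted left by one; true = 0xFFFFFFFFFFFFFFFF, false = 0x7FFFFFFFFFFFFFFF) are all hypothetical choices. Under the System V calling convention, generated code would load the code into RDI and the offending value into RSI before call error.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMED encoding (hypothetical). */
#define BOOL_TRUE  0xFFFFFFFFFFFFFFFFULL
#define BOOL_FALSE 0x7FFFFFFFFFFFFFFFULL

/* Hypothetical error codes; each error label in the assembly would load
   one of these into RDI. */
enum {
  ERR_COMP_NOT_NUM = 1,
  ERR_ARITH_NOT_NUM = 2,
  ERR_LOGIC_NOT_BOOL = 3,
  ERR_IF_NOT_BOOL = 4
};

static const char *error_text(int code) {
  switch (code) {
    case ERR_COMP_NOT_NUM:   return "comparison expected a number";
    case ERR_ARITH_NOT_NUM:  return "arithmetic expected a number";
    case ERR_LOGIC_NOT_BOOL: return "logic expected a boolean";
    case ERR_IF_NOT_BOOL:    return "if expected a boolean";
    default:                 return "unknown error";
  }
}

/* Builds "Error: <text>, got <value>" into buf, decoding val for display. */
static void error_message(int code, uint64_t val, char *buf, size_t n) {
  if ((val & 1) == 0) {
    snprintf(buf, n, "Error: %s, got %" PRId64, error_text(code),
             (int64_t)val / 2);
  } else if (val == BOOL_TRUE || val == BOOL_FALSE) {
    snprintf(buf, n, "Error: %s, got %s", error_text(code),
             val == BOOL_TRUE ? "true" : "false");
  } else {
    snprintf(buf, n, "Error: %s, got unknown value %#" PRIx64,
             error_text(code), val);
  }
}

/* Uniform handler: the code arrives in RDI and the offending value in RSI. */
void error(int code, uint64_t val) {
  char buf[128];
  error_message(code, val, buf, sizeof buf);
  fprintf(stderr, "%s\n", buf);
  exit(code);  /* hypothetical choice: use the error code as exit status */
}
```

Splitting message construction from the exit makes the messages testable without terminating the test process.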

3.2 Memory Layout and Calling C Functions

In order to set up the stack properly to call C functions, like print and your error functions, it’s necessary to make a few changes to what we had in Boa.
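One change you will likely need concerns alignment: the System V x86-64 ABI requires RSP to be a multiple of 16 at every call instruction. Assuming a prologue of push RBP; mov RBP, RSP (which leaves RSP 16-byte aligned, provided the caller was aligned before its call) and one 8-byte slot per local variable, the amount to subtract from RSP can be computed from the result of count_vars as sketched here. frame_bytes is a hypothetical helper name, and any extra values you push before a call would need to be accounted for separately.

```c
#include <assert.h>

/* Bytes to reserve with `sub RSP, N` for num_vars locals, under the
   ASSUMED frame layout described above. */
static long frame_bytes(int num_vars) {
  long raw = 8L * num_vars;       /* one 8-byte slot per variable */
  return (raw + 15) / 16 * 16;    /* round up to a multiple of 16 */
}
```

With 3 variables, for example, 24 bytes are needed but 32 are reserved, leaving one unused padding slot.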

If you write any functions in main.c that you need to be available in your assembly, you need to declare them in the assembly via:

;; In your generated assembly
extern <exported name of function>

They also need to be declared in main.c such that the compiler won’t mangle their names, via

// In main.c
extern <your function signature here> asm("exported name of function");

If you forget either of these, your code will not link correctly.

3.3 New Assembly Constructs

3.4 Some software engineering considerations

Our compiler is getting sufficiently sophisticated that the pipeline is several stages long already. Bugs can occur in almost any stage of the pipeline. As a result, one very useful debugging trick is to engineer the compiler to save each stage of the pipeline, and print them all out if requested. Until now, our compiler has produced a string, or thrown an exception. We have already seen the use of the ('a, 'b) result type to return either an Ok value or an Error message. In phases.ml, I’ve combined these ideas for you.

First, the file contains several constructors for naming each phase of the compiler, and it defines an 'a pipeline to be a result containing either

The 'a part of the answer is whatever the most recent compiler phase produced as its result.

Next, I’ve defined a few helper functions to add a new phase onto a growing pipeline. Look carefully at the type for add_phase: it takes a function that transforms an 'a into a 'b, and an 'a pipeline, and produces a 'b pipeline as a result. It applies the function to the current 'a answer, turns the resulting 'b answer into a phase, and adds it to the growing list. add_phase should be used for phases of the compiler that you don’t expect to ever fail: any exceptions that arise are internal compiler errors. By contrast, add_err_phase takes a function that takes the current 'a result and produces a ('b, exn list) result; that is, it might reasonably produce multiple error messages.

To use these functions, look at the end of compile.ml:

let compile_to_string (prog : sourcespan program pipeline) : string pipeline =
  prog
  |> (add_phase well_formed check_scope)
  |> (add_phase tagged tag)
  |> (add_phase renamed rename)
  |> (add_phase anfed (fun p -> tag (anf p)))
  |> (add_phase result compile_prog)
;;

(Note: As written, all of these phases either produce an answer or raise a single exception. You may want to modify one or more of these phases to be used by add_err_phase, depending on how you report errors.)

The |> operator is reverse function application: x |> f is the exact same thing as (f x), but it allows us to write pipelines in a neatly chained form.

You are welcome to add phases to your compiler, should you wish to.

Finally, look at main.ml. If you run ./main -t yourFile.cobra, the compiler will print out the trace of the entire pipeline. If you leave out the -t option, it will print the output assembly just as before.

3.5 Testing Functions

3.5.1 Unit testing parts of the compiler

These are the same as they were for Boa. ANF is provided, and hasn’t changed aside from the addition of new primitives and EBool. So your tests should focus on te and t tests.

If your program exits with -10 as its exit code, it has probably segfaulted, meaning it tried to access memory that was not allocated. If you’re familiar with tools like valgrind, you can run valgrind output/some_test.run to get a little more feedback. This can often tip you off to what is wrong with your memory usage: sometimes you’ll see code trying to jump to a constant that appears in your program, or other obvious signs that something is off in the stack. Also, if you’ve done all your stack management correctly, valgrind will report a clean run for your program! (You may need to install the libc6-dbg package.)

3.5.2 Integration testing the compiler

In addition to the unit-testing functions, I have added some structure to the input/ directory, and added support for this in your test.ml file. There are now four subdirectories:

To specify the intended output or error messages of these programs, read the README file in each directory, which explains what files you should create.

Finally, the input_file_test_suite () in test.ml will run all the programs in your input/ directory as part of your oUnit test suite. I suspect this will be a much easier way to produce larger-scale test cases than writing everything in test.ml directly.

4 Recommended TODO List

Here’s an order in which you could consider tackling the implementation:

  1. Fix the print function in main.c so that it prints out the right output. It will need to check for the tag using C bitwise operators, and use printf or one of its variants to print the right value.

  2. Take a first shot at figuring out how to increase the stack appropriately by using count_vars.

  3. Fill in the EPrim1 case for everything but print, and figure out how to check for errors, and call the “non-number” error reporting function. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine step 2.

  4. Implement compiling print to assembly by moving the argument into RDI (as the error-handler snippet in section 3.1 does), then calling print. Be aware that if the call doesn’t work, it may be because of step 2 again. Test as you go, and in particular test interesting sequences of print expressions and let-bindings to make sure your stack integrity is preserved before and after calls.

  5. Fill in all of the EPrim2 cases, using the error reporting from step 3. Test as you go.

  6. Complete the if case and test as you go.
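For step 5, many EPrim2 cases need no untagging at all under a shift-by-one number encoding, because (2x) + (2y) = 2(x + y) and doubling preserves signed order (ignoring overflow). The C model below illustrates this, assuming numbers have low bit 0 and true/false are 0xFFFFFFFFFFFFFFFF/0x7FFFFFFFFFFFFFFF, so that bitwise AND and OR of two booleans give logical and/or. The op_result type and function names are hypothetical; ok == 0 stands in for jumping to the appropriate error label, both operands are taken as already evaluated, and overflow checking is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED encoding (hypothetical). */
#define BOOL_TRUE  0xFFFFFFFFFFFFFFFFULL
#define BOOL_FALSE 0x7FFFFFFFFFFFFFFFULL

/* ok == 0 models jumping to an error label instead of producing val. */
typedef struct { int ok; uint64_t val; } op_result;

/* One OR plus one test checks both tags at once: any low bit set means
   at least one operand is not a number. */
static int both_nums(uint64_t a, uint64_t b) { return ((a | b) & 1) == 0; }
static int both_bools(uint64_t a, uint64_t b) { return (a & b & 1) == 1; }

static op_result cobra_plus(uint64_t a, uint64_t b) {
  if (!both_nums(a, b)) return (op_result){0, 0};
  return (op_result){1, a + b};  /* 2x + 2y == 2(x + y): still tagged */
}

static op_result cobra_less(uint64_t a, uint64_t b) {
  if (!both_nums(a, b)) return (op_result){0, 0};
  /* Signed comparison of 2x and 2y agrees with comparing x and y. */
  return (op_result){1, (int64_t)a < (int64_t)b ? BOOL_TRUE : BOOL_FALSE};
}

static op_result cobra_and(uint64_t a, uint64_t b) {
  if (!both_bools(a, b)) return (op_result){0, 0};
  return (op_result){1, a & b};  /* only the top bit differs */
}

static op_result cobra_or(uint64_t a, uint64_t b) {
  if (!both_bools(a, b)) return (op_result){0, 0};
  return (op_result){1, a | b};
}
```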

5 Running main

Running your own programs is the same as with Boa, except you’ll give them the .cobra file extension.

You can also use the -t command-line flag (described above) to print out a trace of the compilation process.

6 List of Deliverables

Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!

DO NOT SUBMIT YOUR .git DIRECTORY! For that matter, don’t submit your output or _build directories.

7 Grading Standards

For this assignment, you will be graded on

8 Submission

Wait! Please read the assignment again and verify that you have not forgotten anything!

Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline.
