Assignment 4: Cobra: Multiple types of values
Due: Thurs 02/07 at 8:59pm
git clone
In this compiler, you’ll deal with COded Binary RepresntAtions of values.
1 The Cobra Language
1.1 Concrete Syntax
The concrete syntax of Cobra is very similar to Boa, with a few new additions.
‹prim1› ... ! print isbool isnum ‹expr› ... true false ‹expr› < ‹expr› ‹expr› > ‹expr› ‹expr› <= ‹expr› ‹expr› >= ‹expr› ‹expr› == ‹expr› ‹expr› && ‹expr› ‹expr› || ‹expr›
1.2 Abstract Syntax
The abstract syntax is very similar to Boa, also:
type prim1 = ...
| Print
| IsBool
| IsNum
| Not
type prim2 = ...
| And
| Or
| Greater
| GreaterEq
| Less
| LessEq
| Eq
type 'a expr = ...
| EBool of bool * 'a
1.3 Semantics
The semantics of booleans are straightforward. The semantics of EIf
changes slightly: its condition must evaluate to a boolean, and it branches on
the truth or falsehood of that value, rather than whether it’s nonzero.
With the addition of two types to the language, there are two main changes that ripple through the implementation:
The representation of values
The possibility of errors
There is one other major addition, which is the print
primitive, discussed
more below.
The representation of values requires a definition. We’ll use the following representations for the Cobra runtime:
true
will be represented as the constant0xFFFFFFFF
. (NOTE: in class, we only specified the outermost bits; this is mandating that the middle bits all be set, too.)false
will be represented as the constant0x7FFFFFFF
. (NOTE: in class, we only specified the outermost bits; this is mandating that the middle bits all be set, too.)numbers will be represented with a zero in the rightmost bit, by shifting them left 1 bit. So, for example, 2 is represented as
0x00000004
.
You should augment the provided print
function in main.c
to print
these values correctly: true
and false
should print as those words,
and numbers should print out as the underlying number being represented.
You should raise errors in the following cases:
-
,+
,*
,add1
andsub1
should raise an error (by printing it out) with the substring"arithmetic expected a number"
if the operation’s argument(s) are not numbers. You can print more information than this if you like, but the message must have at least this substring.<
,<=
,>
and>=
should raise an error with the substring"comparison expected a number"
if the arguments are not both numbers.+
,-
,*
,add1
andsub1
should raise an error with the substring"overflow"
if the result overflows, and falls outside the range representable in 31 bits. Thejo
instruction will be useful: it jumps if the last instruction overflowed.if
should raise an error with the substring"if expected a boolean"
if the conditional value is not a boolean.and
andor
should raise an error with the substring"logic expected a boolean"
if the arguments are not both booleans.
These error messages should be printed on standard error. The other operators never fail, and so never produce errors.
We add two new primitive operations, isbool
and isnum
. These two
operations have an effective type of Any -> Bool
, and will return
true
if the argument they receive is indeed a boolean or number,
respectively.
The last required primitive operation is print
, which prints its single
argument to the command line, and then returns it. The print
function in
main.c
explicitly returns its argument; you will need to retrieve and
use that value in your compiled output.
2 Examples
The expression
let x = 1 in let y = print(x + 1) in print(y + 2)
will output
2 4 4
The first 2 comes from the first
print
expression. The first 4 comes from the secondprint
expression. The final line prints the answer of the program as usual, so there’s an “extra” 4.The expression
if 54: true else: false
prints (on standard error) something like:
Error: expected a boolean in if, got 54
3 Implementation strategies
3.1 Rendering errors
To display error messages on standard error, you’ll need to use a call something like:
fprintf(stderr, "Error: expected a number");
I recommend that you design a function void error(int errCode)
in
main.c
, that handles all errors in uniform manner. Then, you should add
a suffix to your generated assembly (i.e., change what goes in
compile_anf_to_string
) that looks something like:
err_arith_not_num:
push <whatever your error code for arithmetic-op-didn't-get-numbers is>
call error
You are welcome to make this signature more elaborate, to pass the mistaken value into the error handler so that it can be printed as part of the error message.
3.2 Memory Layout and Calling C Functions
In order to set up the stack properly to call C functions, like print
and your
error functions, it’s necessary to make a few changes to what we had in Boa.
Allocating stack space ahead of time: At the start of our generated code (which we now recognize is a function body participating in a C runtime stack), we need to make sure we make enough stack space for all variables we create, and reserve that space ahead of time. To do this, we move
ESP
to point to a location that is N words away (so N * 4 bytes for us), where N is the greatest number of variables we need at once. This is actually tricky to compute to be fully optimal (teaser for later in the semester: by "tricky" I mean NP-hard), but it’s easy to get an OK heuristic – we can compute the maximum depth of nested definitions.To do this, we need the
count_vars
function. This is a straightforward function and not very interesting right now, so I’ve provided it. You need to add instructions to compiled output in order to make sure the correct space is allocated on the stack by subtracting the right amount from esp. You may choose to implement this via a simple increment ofESP
, or you may choose to explicitlypush
an appropriate number of zeros onto the stack, to clear out any uninitialized values. (The former is more efficient; the latter might be helpful for debugging.)Using the Base Pointer: In addition, this means that all variable references need to happen from
EBP
rather than fromESP
. This is becauseESP
can change while we are pushing arguments onto the stack for other function calls, so ebp is the place we should trust for consistent offsets.Participating in the C stack: As a C function callee (from
main
) and caller (oferror
functions andprint
), our code has some responsibilities. First, we need to store the old base pointer upon entry, and update the base pointer to hold the current top of the stack (which includes the return pointer into main, for example). This is why the typical top two lines of most C functions are:push ebp mov ebp, esp
Similarly, when we’re done with the function, we need to restore the stack pointer to its old location, and put the old base pointer value back. This is why the last lines before a
ret
in a C function are often:mov esp, ebp pop ebp
Other Responsibilities: If we were using registers beyond
eax
,ebp
, andesp
, we’d be responsible for storing some of them as callee, and some as caller. But we’re not going to go into those details for this assignment. Since we aren’t using those registers, it has no effect on our code’s behavior.
If you write any functions in main.c
that you need to be available in
your assembly, you need to declare them in the assembly via:
;; In your generated assembly
extern <exported name of function>
They also need to be declared in main.c
such that the compiler won’t
mangle their names, via
// In main.c
extern <your function signature here> asm("exported name of function");
If you forget either of these, your code will not link correctly.
3.3 New Assembly Constructs
Sized
: You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:cmp [ebp-8], 0
because the assembler doesn’t know if the program should compare a four-byte zero, a one-byte zero, or something in between into memory starting at
[ebp-8]
. To solve this, you can supply a size:cmp [ebp-8], DWORD 0
The
DWORD
keyword tells the assembler to use the "double word" size for 0, which corresponds to 32 bits. AWORD
corresponds to 16 bits, and aBYTE
corresponds to 8 bits. To get a sized argument, you can use theSized
constructor fromarg
.HexConst
: Sometimes it’s nice to read things in hex notation as opposed to decimal constants. I’ve provided a newHexConst
arg that’s useful for this case, and augmentedarg_to_asm
for you to support it. It has the same meaning asConst
, and just prints out differently.IPush
,IPop
: These two instructions manage values on the stack.push
adds a value at the current location ofESP
, and decrementsESP
to point past the added value (remember, the stack grows toward numerically-smaller addresses, even though we draw it as growing upward in diagrams).pop
incrementsESP
and moves the value at the locationESP
was pointing to into the provided arg.ICall
: Acall
does two things:Pushes the next code location onto the stack (just like a
push
), which becomes the return pointerPerforms an unconditional jump to the provided label
Note that
call
does not affectEBP
, which the program must maintain on its own.IShr
,IShl
,ISar
: Bit shifting operations.sar
(shift-arithmetic-right) preserves the sign bit of the value being shifted, whileshr
(shift-right) simply fills in a zero bit.IAnd
,IOr
,IXor
: Bitwise logical operations.ITest
: Performs a bitwise-and of the two arguments, and records whether the result was zero, signed, or overflowed. This can be used in tandem with jumps likejo
orjz
.IJo
,IJno
: Jump to the provided label if the last arithmetic operation did/did not overflowILineComment
andIInstrComment
: these allow you to add a comment on a line by itself, or in-line after an instruction. These might be helpful for readability of your generated output, while you’re debugging.
3.4 Testing Functions
These are the same as they were for Boa. ANF is provided, and hasn’t changed
aside from the addition of new primitives and EBool
. So your tests should
focus on te
and t
tests.
If your programs exits with -10 as their exit code, it probably has segfaulted,
meaning it tried to access memory that was not allocated. If you’re familiar
with tools like valgrind, you can run valgrind output/some_test.run
in
order to get a little more feedback. This can sometimes tip you off quite well
as to how memory is off, since sometimes you’ll see code trying to jump to a
constant that’s in your code, or other obvious tells that there’s something off
in the stack. Also, if you’ve done all your stack management correctly,
valgrind will report a clean run for your program!1You may need to
install the libc6-dbg
package. If that doesn’t work, you may need to
install libc6-dbg:i386
(which seems like an odd package name, but it
does exist).
4 Recommended TODO List
Here’s an order in which you could consider tackling the implementation:
Fix the
print
function inmain.c
so that it prints out the right output. It will need to check for the tag using C bitwise operators, and useprintf
or one of its variants to print the right value.Take a first shot at figuring out how to increase the stack appropriately by using
count_vars
.Fill in the
EPrim1
case for everything butprint
, and figure out how to check for errors, and call the "non-number" error reporting function. Test as you go. Be aware that if the function call segfaults, it may be because you need to refine step 2.Implement compiling
print
to assembly by pushing appropriate arguments, thencall
ingprint
. Be aware that if the call doesn’t work, it may be because of step 2 again. Test as you go; be aware that you should test interesting sequences of print expressions and let-bindings to make sure your stack integrity is good before and after calls.Fill in all of the
EPrim2
cases, using the error-reporting from the last step. Test as you go.Complete the if case and test as you go.
5 Running main
Running your own programs is the same as with Boa, except you’ll give them
the .cobra
file extension.
6 List of Deliverables
your
compile.ml
any additional modules you saw fit to write
tests in an OUnit test module (
test.ml
)any test input programs (
input/*.cobra
files)any additional files that are needed to build your tests (e.g.
main.c
, etc)
Again, please ensure the makefile builds your code properly. The black-box tests will give you an automatic 0 if they cannot compile your code!
DO NOT SUBMIT YOUR .git
DIRECTORY! For that matter, don’t submit
your output
or _build
directories.
7 Grading Standards
For this assignment, you will be graded on
Whether your code implements the specification (functional correctness),
the clarity and cleanliness of your code, and
the comprehensiveness of your test coverage
8 Submission
Wait! Please read the assignment again and verify that you have not forgotten anything!
Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline.
1You may need to
install the libc6-dbg
package. If that doesn’t work, you may need to
install libc6-dbg:i386
(which seems like an odd package name, but it
does exist).