
Lecture 1: Introduction

1 What is a Compiler?

We all have an intuitive understanding of what a program is: it’s something that instructs a computer to do something. But the language in which we tend to write our programs is nothing like the language that the computer understands natively. Something must translate the source code of our programs into a form the computer understands.

Conceptually, there are two ways this could happen: first, we could write a program that takes the source code of our program and interprets it on the fly, translating as it goes. This is essentially what DrRacket does when we run programs in it. The downside to this approach is that we must keep both the source code and the interpreter around whenever we want to run the program.

Second, we could write a program that translates the source code of our program into a form the machine can understand directly. Once this translation step has completed, we no longer need the source or the translator; we can just execute the resulting program directly.

In other words, a compiler is simply a function that maps an input string to an output string,

compiler : String -> String

where we typically consider the input and output strings to be programs in two different languages:

compiler : SourceProgram -> TargetProgram

For example, here are some well-known compilers:

gcc, clang : C          -> Binary          (* a.out, .exe *)
javac      : Java       -> JvmByteCode     (* .class *)
scalac     : Scala      -> JvmByteCode
ocamlc     : OCaml      -> OCamlByteCode   (* .cmo *)
ocamlopt   : OCaml      -> Binary
gwt        : Java       -> JavaScript      (* .js *)
v8         : JavaScript -> Binary
nasm       : X64        -> Binary
pdftex     : LaTeX      -> PDF
pandoc     : Markdown   -> PDF, HTML or DOCX
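The compilers we write in this course fit the same `SourceProgram -> TargetProgram` mold, with x64 assembly text as the target. As a minimal sketch, here is about the smallest possible compiler in OCaml: its source language is a single integer literal, and it emits NASM-syntax assembly. The entry label `our_code_starts_here` is a hypothetical name that an assumed C run-time would call; the result is left in `rax`, where the System V AMD64 calling convention expects an integer return value.

```ocaml
(* A minimal sketch of compile : String -> String.
   Source programs: a single integer literal.
   Target programs: NASM-syntax x64 assembly.
   our_code_starts_here is a hypothetical entry label for an assumed C run-time. *)
let compile (source : string) : string =
  (* "Parse" the source: fail loudly if it is not an integer literal. *)
  let n =
    match Int64.of_string_opt (String.trim source) with
    | Some n -> n
    | None -> failwith ("not a number: " ^ source)
  in
  (* Emit assembly that puts the number in rax (the return register)
     and returns to the caller. *)
  String.concat "\n"
    [ "section .text";
      "global our_code_starts_here";
      "our_code_starts_here:";
      Printf.sprintf "  mov rax, %Ld" n;
      "  ret";
      "" ]

let () = print_string (compile "42")
```

Everything interesting about a real compiler happens between the input string and the output string; the rest of the course fills in that middle.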

Key requirements on the output program:

1. It has the same meaning (“semantics”) as the input program.

2. It is executable in the relevant context (VM, microprocessor, web browser).
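Requirement 1 can be made precise by comparing two routes from a program to its value. The OCaml sketch below uses a hypothetical toy language of integer constants, addition and multiplication. `interp` evaluates a program directly (the interpreter approach), while `compile` translates it into instructions for an assumed, simple stack machine that `run` executes. The compiler preserves semantics when `run (compile e) = interp e` for every program `e`; the code only checks this on an example, it does not prove it.

```ocaml
(* Hypothetical toy source language: integer arithmetic. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Hypothetical target language: instructions for a simple stack machine. *)
type instr =
  | Push of int
  | IAdd
  | IMul

(* The meaning of a source program, given directly by an interpreter. *)
let rec interp (e : expr) : int =
  match e with
  | Num n -> n
  | Add (a, b) -> interp a + interp b
  | Mul (a, b) -> interp a * interp b

(* The compiler: emit code for the operands first, then the operator. *)
let rec compile (e : expr) : instr list =
  match e with
  | Num n -> [ Push n ]
  | Add (a, b) -> compile a @ compile b @ [ IAdd ]
  | Mul (a, b) -> compile a @ compile b @ [ IMul ]

(* The meaning of a target program: execute it and read the final stack. *)
let run (code : instr list) : int =
  let step stack i =
    match i, stack with
    | Push n, _ -> n :: stack
    | IAdd, y :: x :: rest -> (x + y) :: rest
    | IMul, y :: x :: rest -> (x * y) :: rest
    | _ -> failwith "stack underflow"
  in
  match List.fold_left step [] code with
  | [ v ] -> v
  | _ -> failwith "ill-formed program"

(* Semantics preservation, checked on one example: (2 + 3) * 4 = 20. *)
let () =
  let e = Mul (Add (Num 2, Num 3), Num 4) in
  assert (run (compile e) = interp e);
  Printf.printf "%d\n" (run (compile e))
```

An interpreter like `interp` is also a handy reference when testing a compiler: any program on which the two disagree exposes a bug.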

1.1 A Bit of History

Compilers were invented to avoid writing machine code by hand.

Richard Hamming, The Art of Doing Science and Engineering, p. 25:

In the beginning we programmed in absolute binary... Finally, a Symbolic Assembly Program was devised – after more years than you are apt to believe during which most programmers continued their heroic absolute binary programming. At the time [the assembler] first appeared I would guess about 1% of the older programmers were interested in it – using [assembly] was “sissy stuff”, and a real programmer would not stoop to wasting machine capacity to do the assembly.

John A.N. Lee, Dept. of Computer Science, Virginia Polytechnic Institute:

One of von Neumann’s students at Princeton recalled that graduate students were being used to hand assemble programs into binary for their early machine. This student took time out to build an assembler, but when von Neumann found out about it he was very angry, saying that it was a waste of a valuable scientific computing instrument to use it to do clerical work.

1.2 What does a Compiler look like?

An input source program is converted to an executable binary in many stages:

• Parsed into a data structure called an Abstract Syntax Tree

• Checked to make sure the code is well-formed (and well-typed)

• Simplified into some convenient Intermediate Representation

• Optimized into an equivalent but faster program

• Generated into x64 assembly

• Linked against a run-time (usually written in C)
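These stages compose like ordinary functions. The OCaml sketch below chains toy versions of some of them for a hypothetical language of integers, `add1` and `sub1`, written in prefix form (for example `add1 sub1 add1 5`). Every stage name and behavior here is an assumption for illustration: parsing splits on spaces, checking rejects literals outside the signed 32-bit range, optimization folds constants, and code generation emits x64 instructions as text. The simplification and linking stages are omitted.

```ocaml
(* Hypothetical AST for a tiny language of integers, add1 and sub1. *)
type expr =
  | Num of int
  | Add1 of expr
  | Sub1 of expr

(* Parsing: prefix tokens such as "add1 sub1 5" become an AST. *)
let parse (src : string) : expr =
  let tokens = List.filter (fun t -> t <> "") (String.split_on_char ' ' src) in
  let rec go = function
    | "add1" :: rest -> Add1 (go rest)
    | "sub1" :: rest -> Sub1 (go rest)
    | [ tok ] -> (
        match int_of_string_opt tok with
        | Some n -> Num n
        | None -> failwith ("parse error: bad token " ^ tok))
    | _ -> failwith "parse error"
  in
  go tokens

(* Checking: reject literals outside the signed 32-bit range (an assumed
   restriction, chosen only to give this stage something to check). *)
let check (e : expr) : expr =
  let lo = Int32.to_int Int32.min_int and hi = Int32.to_int Int32.max_int in
  let rec ok = function
    | Num n -> lo <= n && n <= hi
    | Add1 e' | Sub1 e' -> ok e'
  in
  if ok e then e else failwith "check error: integer literal out of range"

(* Optimization: constant folding, e.g. Add1 (Num 4) becomes Num 5. *)
let rec optimize (e : expr) : expr =
  match e with
  | Num _ -> e
  | Add1 e' -> (match optimize e' with Num n -> Num (n + 1) | e'' -> Add1 e'')
  | Sub1 e' -> (match optimize e' with Num n -> Num (n - 1) | e'' -> Sub1 e'')

(* Code generation: x64 instructions (as text) that leave the result in rax. *)
let rec codegen (e : expr) : string list =
  match e with
  | Num n -> [ Printf.sprintf "mov rax, %d" n ]
  | Add1 e' -> codegen e' @ [ "add rax, 1" ]
  | Sub1 e' -> codegen e' @ [ "sub rax, 1" ]

(* The whole pipeline is just function composition. *)
let compile (src : string) : string =
  src |> parse |> check |> optimize |> codegen |> String.concat "\n"

let () = print_endline (compile "add1 sub1 add1 5")
```

Running it prints `mov rax, 6`, because the optimizer has already folded the program to a constant; calling `codegen (parse "add1 5")` directly, without optimizing, yields `mov rax, 5` followed by `add rax, 1`.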

1.3 What is CS 4410?

• A bridge between two worlds

• High-level: ML (CS 2500)

• Machine Code: X64/ARM (CS 3650)

A sequel to both those classes.

• How to write a compiler for a small-ish functional language (roughly ISL+) to x64:

1. Parsing

2. Checking & Validation

3. Simplification & Normalization

4. Optimization

5. Code Generation

• But also, how to write complex programs

• Design

• Implement

• Test

• Iterate

3 What will we do?

A compiler that translates a high-level language directly to x64 is hard to build in a single step, so we will write many compilers, each adding new features:

• Numbers and increment/decrement

• Local Variables

• Nested Binary Operations

• Booleans, Branches and Dynamic Types

• Functions

• Tuples and Structures

• Lambdas and closures

• Garbage Collection
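As a taste of the local-variables step, here is one way such variables might be compiled: each `let`-bound name gets its own stack slot, and an environment maps names to slot indices. The AST, the slot layout (slot `i` at `[rsp - 8*i]`) and the instruction strings are assumptions for illustration, not a required design; as a simplification, the sketch assumes nothing else uses the memory below `rsp` while this code runs.

```ocaml
(* Hypothetical AST: numbers, add1, variables and single-binding let. *)
type expr =
  | Num of int
  | Add1 of expr
  | Id of string
  | Let of string * expr * expr

(* Environment: variable name -> stack slot index (1, 2, ...). *)
type env = (string * int) list

(* Assumed layout: slot i lives 8*i bytes below rsp. *)
let slot_addr (i : int) : string = Printf.sprintf "[rsp - %d]" (8 * i)

(* [si] is the next free stack slot; the result always ends up in rax. *)
let rec compile_expr (e : expr) (env : env) (si : int) : string list =
  match e with
  | Num n -> [ Printf.sprintf "mov rax, %d" n ]
  | Add1 e' -> compile_expr e' env si @ [ "add rax, 1" ]
  | Id x -> (
      (* List.assoc_opt finds the innermost binding, so shadowing works. *)
      match List.assoc_opt x env with
      | Some i -> [ "mov rax, " ^ slot_addr i ]
      | None -> failwith ("unbound variable: " ^ x))
  | Let (x, bound, body) ->
      (* Evaluate the bound expression, save it in slot si,
         then compile the body with x mapped to that slot. *)
      compile_expr bound env si
      @ [ Printf.sprintf "mov %s, rax" (slot_addr si) ]
      @ compile_expr body ((x, si) :: env) (si + 1)

let () =
  compile_expr (Let ("x", Num 5, Add1 (Id "x"))) [] 1
  |> List.iter print_endline
```

Threading the next free slot `si` through the recursion means nested `let`s never overwrite each other's values, and an unbound name is caught at compile time instead of producing bad assembly.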

At this point you will have a language akin to ISL+, but there are many more extensions we can try:

• Optimizations

• Static type enforcement

• Mutable variables

• Objects

• ...

We may not get to implementing all of these, but we will see where they each fit into the architecture of a compiler.
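Static type enforcement, for instance, fits in as one more checking pass over the AST that runs before code generation. A minimal sketch for a hypothetical language with integers, booleans, `+` and `if` follows; the type names, the `Type_error` exception and the error messages are illustrative assumptions. It rejects an ill-typed program such as `if 1 then 2 else 3` before any code is generated.

```ocaml
(* Hypothetical source language with two types. *)
type typ = TInt | TBool

type expr =
  | Num of int
  | Bool of bool
  | Plus of expr * expr
  | If of expr * expr * expr

exception Type_error of string

(* Compute the type of an expression, or raise Type_error. *)
let rec type_of (e : expr) : typ =
  match e with
  | Num _ -> TInt
  | Bool _ -> TBool
  | Plus (a, b) ->
      if type_of a = TInt && type_of b = TInt then TInt
      else raise (Type_error "+: expects two integers")
  | If (c, t, f) ->
      if type_of c <> TBool then raise (Type_error "if: condition must be a boolean");
      let tt = type_of t and tf = type_of f in
      if tt = tf then tt
      else raise (Type_error "if: branches must have the same type")

let () =
  (match type_of (If (Bool true, Num 1, Plus (Num 2, Num 3))) with
   | TInt -> print_endline "well-typed: int"
   | TBool -> print_endline "well-typed: bool");
  try ignore (type_of (If (Num 1, Num 2, Num 3)))
  with Type_error msg -> print_endline ("rejected: " ^ msg)
```

Without such a pass, the same mistake could only be caught at run time, which is what the dynamic-type checks earlier in the feature list do.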

4 What will you learn?

• Core principles of compiler construction

• Managing the Stack & Heap

• Type Checking

• Intermediate forms

• Optimization

• Several new languages

• OCaml to write the compiler

• C to write the “run-time”

• X64 compilation target

• More importantly, how to write a large program

• How to use types for design

• How to add new features / refactor

• How to test & validate