Assignment 6: Taipan:   Checking and inferring types
1 The Taipan Language
1.1 Concrete Syntax
1.2 Abstract Syntax
1.3 Semantics
2 Requirements for both CS4410 and CS6410
3 Requirements for CS4410
4 Requirements for CS6410
7.2

Assignment 6: Taipan: Checking and inferring types

Due: Wed 02/27 at 9:00pm

git clone

In this assignment, you’ll be statically checking and/or inferring the types of expressions and functions in input programs. In other words, you’ll be dealing with TYPE ANnotations.

1 The Taipan Language

1.1 Concrete Syntax

The only changes made to the concrete syntax of Diamondback have to do with handling types. First, we will optionally group multiple function definitions together, analogous to let rec ... and ..., to support mutually recursive functions. Second, we will allow type annotations on the arguments and return types of function definitions. Third, we will allow type annotations on let-bindings. Finally, we define syntax for types.

‹program›: | ‹declgroups› ‹expr› | ‹expr› ‹declgroups›: | ‹declgroup› | ‹declgroup› ‹declgroups› ‹declgroup›: | ‹decl› | ‹decl› and ‹declgroup› ‹decl›: | def IDENTIFIER < ‹tyids› > ( ‹ids› ) -> ‹typ› : ‹expr› | def IDENTIFIER ( ‹ids› ) -> ‹typ› : ‹expr› | def IDENTIFIER ( ‹ids› ) : ‹expr› ‹ids›: | ε | ‹bind› , ‹ids› ‹bind›: | IDENTIFIER | IDENTIFIER : ‹typ› ‹typ›: | IDENTIFIER | ‹tyid› | ( ‹typs› -> ‹typ› ) ‹typs›: | ‹typ› | ‹typ› , ‹typs› ‹tyid›: IDENTIFIER ‹tyids›: | ε | ‹tyid› , ‹tyids› ‹expr›: ...

The first and second ‹decl› definitions annotates the function with a return type; the other leaves the return type blank. The angle-brackets delimit type variables; you may only specify type variables if you bother to specify a return type. The ‹bind› definitions optionally annotate a binding with a type. Note that there must be no spaces before the < or ( tokens.

The ‹declgroup› construction corresponds to the ML definition let rec ... and ... form. The functions within a ‹declgroup› can refer to each other, or to any earlier definitions; unlike Diamondback, we will declare that not all definitions are mutually in scope, and so we prohibit forward-references to functions that have not yet been defined. (This will be needed for inferring polymorphic types for functions.)

1.2 Abstract Syntax

The abstract syntax is not much changed from Diamondback, other than the four changes described above.

type 'a typ =
  | TyBlank of 'a (* nothing was specified *)
  | TyCon of string * 'a (* type constants like Int or Bool *)
  | TyVar of string * 'a (* type variables, like 'a *)
  | TyArr of 'a typ list * 'a typ * 'a (* function types, ('a, 'b, 'c -> 'd) etc. *)
  | TyApp of 'a typ * 'a typ list * 'a (* type application, like 'a list *)

type 'a scheme =
  | SForall of string list * 'a typ * 'a (* e.g. Forall 'a, 'b, (some type using 'a, 'b) *)

type 'a decl =
  | DFun of string * (string * 'a) list * 'a scheme * 'a expr * 'a
         (* name,    args,                poly.type,  body,     tag *)
type 'a program =
  | Program of 'a decl list list * 'a expr * 'a (* one 'a decl list = one declgroup *)

We use TyBlank to indicate the source program did not specify a type. If a function had no type annotations at all, then it would have a scheme involving a lot of TyBlanks, e.g.

# Has scheme SForall([], TyArr([TyBlank, TyBlank], TyBlank))
def whatever(x, y): x

We add one expression form, purely for debugging use, which has no concrete syntax. The EAnnot expression allows you to decorate any expression and ascribe it a type:

type 'a expr =
    ...
  | EAnnot of 'a expr * 'a typ * 'a

At code-generation time, this expression should simply compile to its nested expression. At type-checking time, you should check that the nested expression indeed has the desired annotated type. You can also use this expression form to insert type descriptions into your program. For example,

let rec decorate (e : 'a expr) : 'a expr =
  match e with
  | ENumber(_, a) -> EAnnot(e, TyCon("Int", a))
  | EBool(_, a) -> EAnnot(e, TyCon("Bool", a))
  | EPrim1(Add1, e, a) -> EAnnot(EPrim1(Add1, decorate e, a), TyCon("Int", a))
  | ...

This may not be of much use during type-checking or type-inference, but it probably is of enormous use during debugging your code, by printing out a transformed AST that includes all the type-checks you think should be there...

1.3 Semantics

Only functions and operators can have polymorphic types, which we separate out and refer to as type schemes. An individual let-binding shall always have a monomorphic type. (Note that it could have a type that is variable, e.g.

# Should have type scheme, Forall 'a, 'a -> 'a
def identity(x):
  let y = x in y # y should have type 'a

Within the body of the function, the variable y has a single generic type, namely 'a.)

2 Requirements for both CS4410 and CS6410

You must change your scoping rules for functions, such that forward references to functions not in your current ‹declgroup› are not in scope.

You should add the new phases to your compiler pipeline as needed, and extend the phases.ml file accordingly.

We have provided you with a few new errors to report:

We have also given you a debug_printf function that behaves just like printf, except it’s controlled by the -d command-line flag for ./main.

3 Requirements for CS4410

You must implement a type-checking algorithm. A program is type-checkable if all function definitions have fully annotated their arguments and return types, and if all let-bindings have annotated their expected types. (In other words, you should never have to “guess” at the type of an expression.) You do not have to concern yourself with polymomrphic types for prim1 operations like isbool or isnum, or prim2 operations like ==. Instead, split these operations into pairs of operations, isboolN and isboolB, etc., that operate solely on numbers or on booleans. (Yes, that makes isboolN and its kin particularly useless functions — but that’s part of the benefit of static types.)

Design a function

type_check (p : sourcespan program) : sourcespan program fallible

that returns the first type error it encounters, or returns the program if it successfully type checks. You do not have to produce multiple type errors.

Test thoroughly.

4 Requirements for CS6410

You must implement a type-inference algorithm. This is intricate, and involves many substeps. Lecture 11: Type Inference lays out the outline of the functions you’ll need to implement, and what they all do. This webpage provides a good writeup of the type-inference algorithm, though it is marginally different than the one we will implement, and their code is written in Haskell rather than OCaml.

Design a function

let type_synth (p : sourcespan program) : sourcespan program fallible = ...

that returns the program itself if the program can infer a type successfully, or the first error it encounters otherwise. You do not have to produce mutliple type errors.

I recommend you use the EAnnot expression form, so that instead of returning the original input program, you return a version of the program where all expressions have been wrapped in EAnnot expressions whose types contain the inferred types you’ve deduced during inference.

The skeleton code given to you has slightly different signatures than the lecture notes; in particular, it contains additional “bookkeeping” parameters to carry along as needed: