Assignment 6: Taipan: Checking and inferring types

7.2

Assignment 6: Taipan: Checking and inferring types

Due: Wed 02/27 at 9:00pm

git clone

In this assignment, you’ll be statically checking and/or inferring the types of expressions and functions in input programs. In other words, you’ll be dealing with TYPE ANnotations.

1 The Taipan Language

1.1 Concrete Syntax

The only changes made to the concrete syntax of Diamondback have to do with handling types. First, we will optionally group multiple function definitions together, analogous to let rec ... and ..., to support mutually recursive functions. Second, we will allow type annotations on the arguments and return types of function definitions. Third, we will allow type annotations on let-bindings. Finally, we define syntax for types.

‹program›: | ‹declgroups› ‹expr› | ‹expr› ‹declgroups›: | ‹declgroup› | ‹declgroup› ‹declgroups› ‹declgroup›: | ‹decl› | ‹decl› and ‹declgroup› ‹decl›: | def IDENTIFIER < ‹tyids› > ( ‹ids› ) -> ‹typ› : ‹expr› | def IDENTIFIER ( ‹ids› ) -> ‹typ› : ‹expr› | def IDENTIFIER ( ‹ids› ) : ‹expr› ‹ids›: | ε | ‹bind› , ‹ids› ‹bind›: | IDENTIFIER | IDENTIFIER : ‹typ› ‹typ›: | IDENTIFIER | ‹tyid› | ( ‹typs› -> ‹typ› ) ‹typs›: | ‹typ› | ‹typ› , ‹typs› ‹tyid›: ’ IDENTIFIER ‹tyids›: | ε | ‹tyid› , ‹tyids› ‹expr›: ...

The first and second ‹decl› definitions annotates the function with a return type; the other leaves the return type blank. The angle-brackets delimit type variables; you may only specify type variables if you bother to specify a return type. The ‹bind› definitions optionally annotate a binding with a type. Note that there must be no spaces before the < or ( tokens.

The ‹declgroup› construction corresponds to the ML definition let rec ... and ... form. The functions within a ‹declgroup› can refer to each other, or to any earlier definitions; unlike Diamondback, we will declare that not all definitions are mutually in scope, and so we prohibit forward-references to functions that have not yet been defined. (This will be needed for inferring polymorphic types for functions.)

1.2 Abstract Syntax

The abstract syntax is not much changed from Diamondback, other than the four changes described above.

type 'a typ =
  | TyBlank of 'a (* nothing was specified *)
  | TyCon of string * 'a (* type constants like Int or Bool *)
  | TyVar of string * 'a (* type variables, like 'a *)
  | TyArr of 'a typ list * 'a typ * 'a (* function types, ('a, 'b, 'c -> 'd) etc. *)
  | TyApp of 'a typ * 'a typ list * 'a (* type application, like 'a list *)

type 'a scheme =
  | SForall of string list * 'a typ * 'a (* e.g. Forall 'a, 'b, (some type using 'a, 'b) *)

type 'a decl =
  | DFun of string * (string * 'a) list * 'a scheme * 'a expr * 'a
         (* name,    args,                poly.type,  body,     tag *)
type 'a program =
  | Program of 'a decl list list * 'a expr * 'a (* one 'a decl list = one declgroup *)

We use TyBlank to indicate the source program did not specify a type. If a function had no type annotations at all, then it would have a scheme involving a lot of TyBlanks, e.g.

# Has scheme SForall([], TyArr([TyBlank, TyBlank], TyBlank))
def whatever(x, y): x

We add one expression form, purely for debugging use, which has no concrete syntax. The EAnnot expression allows you to decorate any expression and ascribe it a type:

type 'a expr =
    ...
  | EAnnot of 'a expr * 'a typ * 'a

At code-generation time, this expression should simply compile to its nested expression. At type-checking time, you should check that the nested expression indeed has the desired annotated type. You can also use this expression form to insert type descriptions into your program. For example,

let rec decorate (e : 'a expr) : 'a expr =
  match e with
  | ENumber(_, a) -> EAnnot(e, TyCon("Int", a))
  | EBool(_, a) -> EAnnot(e, TyCon("Bool", a))
  | EPrim1(Add1, e, a) -> EAnnot(EPrim1(Add1, decorate e, a), TyCon("Int", a))
  | ...

This may not be of much use during type-checking or type-inference, but it probably is of enormous use during debugging your code, by printing out a transformed AST that includes all the type-checks you think should be there...

1.3 Semantics

Only functions and operators can have polymorphic types, which we separate out and refer to as type schemes. An individual let-binding shall always have a monomorphic type. (Note that it could have a type that is variable, e.g.

# Should have type scheme, Forall 'a, 'a -> 'a
def identity(x):
  let y = x in y # y should have type 'a

Within the body of the function, the variable y has a single generic type, namely 'a.)

2 Requirements for both CS4410 and CS6410

You must change your scoping rules for functions, such that forward references to functions not in your current ‹declgroup› are not in scope.

You should add the new phases to your compiler pipeline as needed, and extend the phases.ml file accordingly.

We have provided you with a few new errors to report:

DeclArity is unchanged from Diamondback.
NoType(name, loc) indicates that the given name wasn’t annotated with a type, at the given location
ShouldBeFunction(name, loc, typ) indicates that the given name was defined at the given location, and had the given type which somehow was not a TyArr function-type.
TypeMismatch(loc, expected, actual, why) indicates a type mismatch at the given location, with the expected and actual types as given. The why parameter is a list of reasons, which we encourage you to use to debug your code. We have provided you with four reasons, and you may add more if you want to:
- InferExp(e) means you were trying to infer the type for this expression
- Message(str) just prints out whatever message you like
- Unify(msg, t1, t2) indicates that you were attempting to unify two types, and you may supply a diagnostic message
- Instantiate(name, scheme) indicates that you were instantiating the polymorphic type-scheme for the given named function.
For example, my type-inferencer will produce the following diagnostic output:
```
Source:
(3 == true)

Type error at input/simple.taipan, 1:0-1:9: expected Bool but got Int, because
   Different base types
   trying to unify 'X_2 and Bool (because arguments of arrows)
   trying to unify ('X_2, 'X_2 -> Bool) and (Int, Bool -> 'prim2ret_1) (because use of operator)
   trying to infer type for (3 == true) at input/simple.taipan, 1:0-1:9
```
Adding these diagnostics was invaluable in debugging some odd corner cases of my type-inference code. If you choose not to use them, you can just supply [] as the why argument.

We have also given you a debug_printf function that behaves just like printf, except it’s controlled by the -d command-line flag for ./main.

3 Requirements for CS4410

You must implement a type-checking algorithm. A program is type-checkable if all function definitions have fully annotated their arguments and return types, and if all let-bindings have annotated their expected types. (In other words, you should never have to “guess” at the type of an expression.) You do not have to concern yourself with polymomrphic types for prim1 operations like isbool or isnum, or prim2 operations like ==. Instead, split these operations into pairs of operations, isboolN and isboolB, etc., that operate solely on numbers or on booleans. (Yes, that makes isboolN and its kin particularly useless functions — but that’s part of the benefit of static types.)

Design a function

type_check (p : sourcespan program) : sourcespan program fallible

that returns the first type error it encounters, or returns the program if it successfully type checks. You do not have to produce multiple type errors.

Test thoroughly.

4 Requirements for CS6410

You must implement a type-inference algorithm. This is intricate, and involves many substeps. Lecture 11: Type Inference lays out the outline of the functions you’ll need to implement, and what they all do. This webpage provides a good writeup of the type-inference algorithm, though it is marginally different than the one we will implement, and their code is written in Haskell rather than OCaml.

Design a function

let type_synth (p : sourcespan program) : sourcespan program fallible = ...

that returns the program itself if the program can infer a type successfully, or the first error it encounters otherwise. You do not have to produce mutliple type errors.

I recommend you use the EAnnot expression form, so that instead of returning the original input program, you return a version of the program where all expressions have been wrapped in EAnnot expressions whose types contain the inferred types you’ve deduced during inference.

The skeleton code given to you has slightly different signatures than the lecture notes; in particular, it contains additional “bookkeeping” parameters to carry along as needed:

unify carries around a (loc : sourcespan) that is the original source location of the expression whose type you are trying to infer, and a (reasons : reason list) of why your inference has proceeded to state that it has. You should use both of these parameters when you raise a TypeMismatch exception, and you should accumulate additional reasons as you recur through unification.
infer_exp also takes in a list of reasons, for the same purpose as above. Additionally, it produces a triple of a type substitution, an inferred type, and an annotated expression. The first two results are as described in the lecture notes; the third can use the EAnnot form as described above.
infer_decl has the same signature as infer_exp, except it applies to individual function declarations.
infer_group applies to a group of mutually recursive functions. It produces the inferred type scheme and an annotated declaration group, but does not need to return the type substitution. (Why not?)

1	The Taipan Language
2	Requirements for both CS4410 and CS6410
3	Requirements for CS4410
4	Requirements for CS6410