Assignment 6: Taipan: Checking and inferring types
Due: Wed 02/27 at 9:00pm
git clone
In this assignment, you’ll be statically checking and/or inferring the types of expressions and functions in input programs. In other words, you’ll be dealing with TYPE ANnotations.
1 The Taipan Language
1.1 Concrete Syntax
The only changes made to the concrete syntax of
Diamondback have to do with handling types. First,
we will optionally group multiple function definitions together, analogous to
let rec ... and ...
, to support mutually recursive functions. Second, we
will allow type annotations on the arguments and return types of function
definitions. Third, we will allow type annotations on let-bindings. Finally,
we define syntax for types.
‹program› ‹declgroups› ‹expr› ‹expr› ‹declgroups› ‹declgroup› ‹declgroup› ‹declgroups› ‹declgroup› ‹decl› ‹decl› and ‹declgroup› ‹decl› def IDENTIFIER < ‹tyids› > ( ‹ids› ) -> ‹typ› : ‹expr› def IDENTIFIER ( ‹ids› ) -> ‹typ› : ‹expr› def IDENTIFIER ( ‹ids› ) : ‹expr› ‹ids› ε ‹bind› , ‹ids› ‹bind› IDENTIFIER IDENTIFIER : ‹typ› ‹typ› IDENTIFIER ‹tyid› ( ‹typs› -> ‹typ› ) ‹typs› ‹typ› ‹typ› , ‹typs› ‹tyid› ’ IDENTIFIER ‹tyids› ε ‹tyid› , ‹tyids› ‹expr› ...
The first and second ‹decl› definitions annotates the function with a return type; the other leaves the return type blank. The angle-brackets delimit type variables; you may only specify type variables if you bother to specify a return type. The ‹bind› definitions optionally annotate a binding with a type. Note that there must be no spaces before the < or ( tokens.
The ‹declgroup› construction corresponds to the ML
definition let rec ... and ...
form. The functions within a
‹declgroup› can refer to each other, or to any
earlier definitions; unlike Diamondback, we will declare that not all
definitions are mutually in scope, and so we prohibit forward-references to
functions that have not yet been defined. (This will be needed for inferring
polymorphic types for functions.)
1.2 Abstract Syntax
The abstract syntax is not much changed from Diamondback, other than the four changes described above.
type 'a typ =
| TyBlank of 'a (* nothing was specified *)
| TyCon of string * 'a (* type constants like Int or Bool *)
| TyVar of string * 'a (* type variables, like 'a *)
| TyArr of 'a typ list * 'a typ * 'a (* function types, ('a, 'b, 'c -> 'd) etc. *)
| TyApp of 'a typ * 'a typ list * 'a (* type application, like 'a list *)
type 'a scheme =
| SForall of string list * 'a typ * 'a (* e.g. Forall 'a, 'b, (some type using 'a, 'b) *)
type 'a decl =
| DFun of string * (string * 'a) list * 'a scheme * 'a expr * 'a
(* name, args, poly.type, body, tag *)
type 'a program =
| Program of 'a decl list list * 'a expr * 'a (* one 'a decl list = one declgroup *)
We use TyBlank
to indicate the source program did not specify a type. If
a function had no type annotations at all, then it would have a scheme
involving a lot of TyBlank
s, e.g.
# Has scheme SForall([], TyArr([TyBlank, TyBlank], TyBlank))
def whatever(x, y): x
We add one expression form, purely for debugging use, which has no concrete
syntax. The EAnnot
expression allows you to decorate any expression and
ascribe it a type:
type 'a expr =
...
| EAnnot of 'a expr * 'a typ * 'a
At code-generation time, this expression should simply compile to its nested expression. At type-checking time, you should check that the nested expression indeed has the desired annotated type. You can also use this expression form to insert type descriptions into your program. For example,
let rec decorate (e : 'a expr) : 'a expr =
match e with
| ENumber(_, a) -> EAnnot(e, TyCon("Int", a))
| EBool(_, a) -> EAnnot(e, TyCon("Bool", a))
| EPrim1(Add1, e, a) -> EAnnot(EPrim1(Add1, decorate e, a), TyCon("Int", a))
| ...
This may not be of much use during type-checking or type-inference, but it probably is of enormous use during debugging your code, by printing out a transformed AST that includes all the type-checks you think should be there...
1.3 Semantics
Only functions and operators can have polymorphic types, which we separate out and refer to as type schemes. An individual let-binding shall always have a monomorphic type. (Note that it could have a type that is variable, e.g.
# Should have type scheme, Forall 'a, 'a -> 'a
def identity(x):
let y = x in y # y should have type 'a
Within the body of the function, the variable y
has a single generic
type, namely 'a
.)
2 Requirements for both CS4410 and CS6410
You must change your scoping rules for functions, such that forward references to functions not in your current ‹declgroup› are not in scope.
You should add the new phases to your compiler pipeline as needed, and extend
the phases.ml
file accordingly.
We have provided you with a few new errors to report:
DeclArity
is unchanged from Diamondback.NoType(name, loc)
indicates that the given name wasn’t annotated with a type, at the given locationShouldBeFunction(name, loc, typ)
indicates that the given name was defined at the given location, and had the given type which somehow was not aTyArr
function-type.TypeMismatch(loc, expected, actual, why)
indicates a type mismatch at the given location, with the expected and actual types as given. Thewhy
parameter is a list ofreason
s, which we encourage you to use to debug your code. We have provided you with four reasons, and you may add more if you want to:InferExp(e)
means you were trying to infer the type for this expressionMessage(str)
just prints out whatever message you likeUnify(msg, t1, t2)
indicates that you were attempting to unify two types, and you may supply a diagnostic messageInstantiate(name, scheme)
indicates that you were instantiating the polymorphic type-scheme for the given named function.
For example, my type-inferencer will produce the following diagnostic output:
Source: (3 == true) Type error at input/simple.taipan, 1:0-1:9: expected Bool but got Int, because Different base types trying to unify 'X_2 and Bool (because arguments of arrows) trying to unify ('X_2, 'X_2 -> Bool) and (Int, Bool -> 'prim2ret_1) (because use of operator) trying to infer type for (3 == true) at input/simple.taipan, 1:0-1:9
Adding these diagnostics was invaluable in debugging some odd corner cases of my type-inference code. If you choose not to use them, you can just supply
[]
as thewhy
argument.
We have also given you a debug_printf
function that behaves just like
printf
, except it’s controlled by the -d
command-line flag for ./main
.
3 Requirements for CS4410
You must implement a type-checking algorithm. A program is
type-checkable if all function definitions have fully annotated their arguments
and return types, and if all let-bindings have annotated their expected types.
(In other words, you should never have to “guess” at the type of an
expression.) You do not have to concern yourself with polymomrphic types for
prim1
operations like isbool
or isnum
, or prim2
operations like ==
. Instead, split these operations into pairs of
operations, isboolN
and isboolB
, etc., that operate solely on
numbers or on booleans. (Yes, that makes isboolN
and its kin particularly
useless functions —
Design a function
type_check (p : sourcespan program) : sourcespan program fallible
that returns the first type error it encounters, or returns the program if it successfully type checks. You do not have to produce multiple type errors.
Test thoroughly.
4 Requirements for CS6410
You must implement a type-inference algorithm. This is intricate, and involves many substeps. Lecture 11: Type Inference lays out the outline of the functions you’ll need to implement, and what they all do. This webpage provides a good writeup of the type-inference algorithm, though it is marginally different than the one we will implement, and their code is written in Haskell rather than OCaml.
Design a function
let type_synth (p : sourcespan program) : sourcespan program fallible = ...
that returns the program itself if the program can infer a type successfully, or the first error it encounters otherwise. You do not have to produce mutliple type errors.
I recommend you use the EAnnot
expression form, so that instead of
returning the original input program, you return a version of the program where
all expressions have been wrapped in EAnnot
expressions whose types
contain the inferred types you’ve deduced during inference.
The skeleton code given to you has slightly different signatures than the lecture notes; in particular, it contains additional “bookkeeping” parameters to carry along as needed:
unify
carries around a(loc : sourcespan)
that is the original source location of the expression whose type you are trying to infer, and a(reasons : reason list)
of why your inference has proceeded to state that it has. You should use both of these parameters when youraise
aTypeMismatch
exception, and you should accumulate additional reasons as you recur through unification.infer_exp
also takes in a list of reasons, for the same purpose as above. Additionally, it produces a triple of a type substitution, an inferred type, and an annotated expression. The first two results are as described in the lecture notes; the third can use theEAnnot
form as described above.infer_decl
has the same signature asinfer_exp
, except it applies to individual function declarations.infer_group
applies to a group of mutually recursive functions. It produces the inferred type scheme and an annotated declaration group, but does not need to return the type substitution. (Why not?)