Types

Our specifications and implementations of value-of define the dynamic semantics of expressions, which concern what happens when expressions are executed.

Expressions can also have static semantics, which concern the properties of expressions that can be deduced without executing the expressions.

Type safety is an important property of expressions. Whether type safety is static or dynamic depends upon the programming language. If type safety is a static property, then we say the language is strongly typed.

Some programming languages allow the type of an expression to be calculated without executing the expression. We say these languages are statically typed.

A language can be statically typed without being strongly typed. C and C++, for example, are statically typed but not strongly typed, because type safety is not a static property of those languages. The problem is that the C/C++ type system is unsound. Although the type of a C/C++ expression can be calculated statically, that type is not always a reliable prediction of the expression's value at run time.

For example:

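#include <stdio.h>
#include <stdlib.h>

/* Stores 3.14159 through p and returns the value read back. */
double f (double * p) {
  *p = 3.14159;
  return *p;
}

int main (int argc, char* argv[]) {
  int n = 12345;
  /* The conversion through (void *) compiles without complaint,
     but p now points at an int, not at a double. */
  double * p = (void *) &n;
  double x = f(p);    /* stores a double (typically 8 bytes) over n (typically 4) */
  double y = f(p+1);  /* stores past the end of n: undefined behavior */
  double z = f(p+2);  /* and further still */
  printf ("n =  %d\n", n);
  printf ("x =  %lf\n", x);
  return 0;
}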

What output is printed by that program?

On one machine, the output was:

n =  1074340345
x =  3.141590

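On another machine, the program terminated with a segmentation fault. The problem is that the program is not type-safe: f stores a double through a pointer whose target is actually an int, and the calls f(p+1) and f(p+2) store doubles into memory beyond n. On the first machine, n ended up holding the high-order 32 bits of the IEEE 754 double 3.14159 (0x400921F9, which is 1074340345).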

Definition of Type Safety for LET, PROC, or LETREC

  1. For every evaluation of a variable, the variable is bound.
  2. For every evaluation of a difference expression (diff-exp exp1 exp2), the values of exp1 and exp2 are both numbers.
  3. For every evaluation of an expression of the form (zero?-exp exp1), the value of exp1 is a number.
  4. For every evaluation of a conditional expression (if-exp exp1 exp2 exp3), the value of exp1 is a boolean.
  5. For every evaluation of a procedure call (call-exp exp1 exp2), the value of exp1 is a procedure.

If one of those conditions is violated, we call it a type error. (Hence not all errors are type errors.) We say that a LET, PROC, or LETREC program is type-safe if and only if its execution cannot possibly involve a type error.

Not all LET, PROC, and LETREC programs are type-safe. That leads to the following question:

Is type safety a static property of LET, PROC, or LETREC?

That's the same as asking whether LET, PROC, and LETREC are strongly typed.

It so happens that LET is strongly typed. That is not terribly interesting, however, because LET is not a very expressive language. For example, there is no LET expression exp such that, for all integer values n, let x = n in exp evaluates to the absolute value of n. For another example, it is not possible to write an infinite loop in the LET language.

The interesting question is whether PROC is strongly typed.

If PROC were strongly typed, then type safety would be a static property of PROC programs. In other words, there would be some algorithm that takes an arbitrary PROC program as input and decides whether the program is type-safe. In particular, that algorithm would be able to decide whether an arbitrary program of the form

    if <expression>
       then (0 0)
       else (0 0)

is type-safe.

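Provided the evaluation of <expression> itself involves no type error, a program of that form is type-safe if and only if <expression> does not halt: if it halts, then either its value is not a boolean or the program goes on to evaluate (0 0), which calls a number, and either way a type error occurs. If PROC were strongly typed, therefore, then there would be some algorithm that takes an arbitrary expression as input and decides whether the expression halts.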

Theorem. For all Turing-complete programming languages, the halting problem is undecidable.

PROC (unlike LET) is Turing-complete. Because the halting problem is undecidable, no algorithm is able to decide whether an arbitrary PROC program is type-safe. In other words, PROC is not strongly typed. Since PROC is a proper subset of the LETREC language, LETREC is not strongly typed either.

The undecidability of the halting problem tells us that no general-purpose programming language can be strongly typed, assuming type safety and strong typing are defined as above.

That's not the answer we want.

We can't have the answer we want.

We can, however, change the definition of type safety and/or strong typing so we can pretend to have the answer we want. The standard way to do that is:

  1. Define a static type system.
  2. Define the well-typed programs.
  3. Show that every well-typed program is type-safe in the sense defined earlier.
  4. Show that well-typedness is statically decidable.
  5. Pretend that's good enough.

That last step means we redefine "strongly typed" to mean:

  1. well-typedness is a static property
  2. well-typedness implies type safety

Assigning a Type to an Expression

We'll start by defining a static type system for PROC:

Typing rules for PROC

(type-of (const-exp num) tenv) = int


(type-of (var-exp var) tenv) = tenv(var)


(type-of exp1 tenv) = int
(type-of exp2 tenv) = int
--------------------------------------------------------------------
(type-of (diff-exp exp1 exp2) tenv) = int


(type-of exp1 tenv) = int
----------------------------------------------------------------
(type-of (zero?-exp exp1) tenv) = bool


(type-of exp1 tenv) = bool
(type-of exp2 tenv) = t
(type-of exp3 tenv) = t
--------------------------------------------------------------------
(type-of (if-exp exp1 exp2 exp3) tenv) = t


(type-of exp1 tenv) = t1
(type-of body [var1:t1]tenv) = t
------------------------------------------------------------------------
(type-of (let-exp var1 exp1 body) tenv) = t


(type-of body [var1:t1]tenv) = t2
----------------------------------------------------------------------------
(type-of (proc-exp var1 body) tenv) = (t1 -> t2)


(type-of exp1 tenv) = (t1 -> t2)
(type-of exp2 tenv) = t1
--------------------------------------------------------------------
(type-of (call-exp exp1 exp2) tenv) = t2
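One way to make these rules concrete is to represent types as data and write one function per rule. The C sketch below is hypothetical (Type, mk_arrow, type_equal, and the *_rule functions are invented for illustration): each function takes the types of the subexpressions and returns the type the rule concludes, or NULL when the rule's hypotheses cannot be satisfied. The proc-exp rule is deliberately absent, because its hypothesis mentions a type t1 that is not the type of any subexpression.

```c
#include <assert.h>
#include <stdlib.h>

/* PROC types: int, bool, or (t1 -> t2). */
typedef enum { T_INT, T_BOOL, T_ARROW } TypeKind;

typedef struct Type {
    TypeKind kind;
    struct Type *arg;     /* t1 in (t1 -> t2), else NULL */
    struct Type *result;  /* t2 in (t1 -> t2), else NULL */
} Type;

static Type INT_TYPE  = { T_INT,  NULL, NULL };
static Type BOOL_TYPE = { T_BOOL, NULL, NULL };

Type *mk_arrow(Type *arg, Type *result) {
    Type *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = T_ARROW; t->arg = arg; t->result = result;
    return t;
}

/* Structural equality of types. */
int type_equal(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind != T_ARROW) return 1;
    return type_equal(a->arg, b->arg) && type_equal(a->result, b->result);
}

/* diff-exp: both operands int, result int. */
Type *diff_rule(Type *t1, Type *t2) {
    return (t1->kind == T_INT && t2->kind == T_INT) ? &INT_TYPE : NULL;
}

/* zero?-exp: operand int, result bool. */
Type *zero_rule(Type *t1) {
    return t1->kind == T_INT ? &BOOL_TYPE : NULL;
}

/* if-exp: test bool, both branches of the same type t, result t. */
Type *if_rule(Type *t1, Type *t2, Type *t3) {
    return (t1->kind == T_BOOL && type_equal(t2, t3)) ? t2 : NULL;
}

/* call-exp: operator (t1 -> t2), operand t1, result t2. */
Type *call_rule(Type *rator, Type *rand) {
    if (rator->kind != T_ARROW || !type_equal(rator->arg, rand)) return NULL;
    return rator->result;
}
```

For example, call_rule(mk_arrow(&INT_TYPE, &BOOL_TYPE), &INT_TYPE) concludes bool, while call_rule(&INT_TYPE, &INT_TYPE) fails, which is exactly the (0 0) situation.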

The next step is to define

Well-typed PROC Programs

Definition. A PROC program (a-program exp) is well-typed if and only if there exists some type t such that the typing rules for PROC can be used to prove (type-of exp tenv0) = t

where tenv0 = [i:int,v:int,x:int] is the initial type environment that specifies the types of all variables bound in the standard initial environment.
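A type environment such as tenv0 can be represented as a linked list of bindings, where [var1:t1]tenv conses a new binding onto the front of tenv. The following C sketch assumes that representation (TEnv, extend_tenv, apply_tenv, and init_tenv are hypothetical names) and, for brevity, supports only the base types int and bool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_INT, T_BOOL } BaseType;

/* A type environment: a linked list of (variable, type) bindings. */
typedef struct TEnv {
    const char *var;
    BaseType type;
    const struct TEnv *rest;
} TEnv;

/* [var:type]tenv : extend tenv with one binding in front. */
const TEnv *extend_tenv(const char *var, BaseType type, const TEnv *tenv) {
    TEnv *e = malloc(sizeof *e);
    assert(e != NULL);
    e->var = var; e->type = type; e->rest = tenv;
    return e;
}

/* tenv(var): returns 1 and stores the type if var is bound, else 0.
   The first (innermost) binding wins, so inner bindings shadow outer ones. */
int apply_tenv(const TEnv *tenv, const char *var, BaseType *out) {
    for (; tenv != NULL; tenv = tenv->rest)
        if (strcmp(tenv->var, var) == 0) { *out = tenv->type; return 1; }
    return 0;
}

/* tenv0 = [i:int, v:int, x:int] */
const TEnv *init_tenv(void) {
    return extend_tenv("i", T_INT,
           extend_tenv("v", T_INT,
           extend_tenv("x", T_INT, NULL)));
}
```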


The next step is to prove

Type Soundness

Theorem. If P is a well-typed PROC program, then P is type-safe.

That theorem is proved by induction on the number of calls to value-of that occur during the evaluation of P.


The next step is to prove that well-typedness is statically decidable.

The usual way to prove the decidability of some problem is to describe an algorithm that decides the problem. Such an algorithm is said to be a decision procedure.

It is easy to describe a decision procedure for determining whether a LET program is well-typed:

Decision Procedure for Well-typedness of LET Programs

Algorithm. Given a LET program (a-program exp), use the following algorithm to decide whether exp is well-typed with respect to the initial type environment tenv0.

If exp is a constant expression, then exp is well-typed with respect to tenv.

If exp is a variable x, then exp is well-typed with respect to tenv if and only if x is bound in the type environment tenv.

If exp is of the form (diff-exp exp1 exp2), then exp is well-typed with respect to tenv if and only if both exp1 and exp2 are well-typed in the type environment tenv and are of type int.

If exp is of the form (zero?-exp exp1), then exp is well-typed with respect to tenv if and only if exp1 is well-typed in the type environment tenv and is of type int.

If exp is of the form (if-exp exp1 exp2 exp3), then exp is well-typed with respect to tenv if and only if exp1, exp2, and exp3 are well-typed in the type environment tenv, exp1 is of type bool, and exp2 and exp3 are of the same type.

If exp is of the form (let-exp var1 exp1 body), then exp is well-typed with respect to tenv if and only if exp1 is well-typed in the type environment tenv and body is well-typed in the type environment [var1:t1]tenv, where t1 is the type of exp1.

That decision procedure is just the obvious algorithm that uses the typing rules for the LET language to compute the type of an expression.
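The decision procedure translates almost line for line into a recursive function. The C sketch below assumes a hypothetical abstract syntax for LET (the Exp struct, its E_* tags, and the helper mk are invented for illustration) and returns T_ERR whenever an expression is not well-typed; the let case builds [var1:t1]tenv as a stack-allocated binding in front of tenv.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* LET has only int and bool; T_ERR means "not well-typed". */
typedef enum { T_INT, T_BOOL, T_ERR } Type;

typedef enum { E_CONST, E_VAR, E_DIFF, E_ZERO, E_IF, E_LET } ExpKind;

typedef struct Exp {
    ExpKind kind;
    int num;                   /* E_CONST */
    const char *var;           /* E_VAR, and the bound variable of E_LET */
    struct Exp *e1, *e2, *e3;  /* subexpressions; for E_LET, e1 = exp1, e2 = body */
} Exp;

typedef struct TEnv { const char *var; Type type; const struct TEnv *rest; } TEnv;

static Exp *mk(ExpKind k, int num, const char *var, Exp *e1, Exp *e2, Exp *e3) {
    Exp *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = k; e->num = num; e->var = var; e->e1 = e1; e->e2 = e2; e->e3 = e3;
    return e;
}

/* Returns the type of exp in tenv, or T_ERR if exp is not well-typed. */
Type type_of(const Exp *exp, const TEnv *tenv) {
    switch (exp->kind) {
    case E_CONST:
        return T_INT;
    case E_VAR:                 /* well-typed iff the variable is bound */
        for (; tenv != NULL; tenv = tenv->rest)
            if (strcmp(tenv->var, exp->var) == 0) return tenv->type;
        return T_ERR;
    case E_DIFF:
        return (type_of(exp->e1, tenv) == T_INT &&
                type_of(exp->e2, tenv) == T_INT) ? T_INT : T_ERR;
    case E_ZERO:
        return type_of(exp->e1, tenv) == T_INT ? T_BOOL : T_ERR;
    case E_IF: {
        Type t2 = type_of(exp->e2, tenv), t3 = type_of(exp->e3, tenv);
        if (type_of(exp->e1, tenv) != T_BOOL || t2 == T_ERR || t2 != t3)
            return T_ERR;
        return t2;
    }
    case E_LET: {
        Type t1 = type_of(exp->e1, tenv);
        if (t1 == T_ERR) return T_ERR;
        TEnv extended = { exp->var, t1, tenv };   /* [var1:t1]tenv */
        return type_of(exp->e2, &extended);
    }
    }
    return T_ERR;
}
```

For example, with x bound to int, let y = -(x,1) in if zero?(y) then 0 else y has type int, while zero?(zero?(x)) is rejected.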

If we try to extend that algorithm to proc expressions, we run into a problem:

(type-of body [var1:t1]tenv) = t2
----------------------------------------------------------------------------
(type-of (proc-exp var1 body) tenv) = (t1 -> t2)

With the other rules, every type that occurs in the hypotheses of the rule is either a fixed type (such as int) or is the type of some subexpression. With proc expressions, however, it looks like we'd have to guess the type of the bound variable.

There are two standard ways to deal with this problem:

  1. Make the programmer tell us the type of the bound variable.
  2. Make the type checker infer the type of the bound variable.

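Each of these approaches has its own advantages and disadvantages:

  1. Making the programmer tell us the type of the bound variable keeps the type checker simple, but places the burden on the programmer.
  2. Making the type checker infer the type of the bound variable frees the programmer from writing types, but places the burden on the implementors of the language.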

Historically, most programming languages have been designed by the same people who implement them, so most programming languages place the burden on users instead of implementors. Although that is a fairly trivial basis for making such an important design decision, it really does seem to have been the most influential factor in most programming languages.

CHECKED: A Type-Checked Language

If we make the programmer tell us the types of bound variables, then we'll have to change the syntax of PROC and LETREC to accommodate those types. Section 7.3 of our textbook shows one possible syntax, and also implements a type checker along the lines of the type checker for LET.

INFERRED: A Language with Type Inference

Section 7.4 of our textbook changes the syntax yet again. This is unnecessary for PROC and LETREC, but it facilitates certain extensions of PROC and LETREC that are explored in the exercises.

Type Inference for PROC and LETREC Programs

We will show how to infer types for the original syntax of LET, PROC, and LETREC.

We have already described an algorithm that decides whether a LET program is well-typed; that algorithm works by inferring a type for every subexpression that appears within the program.

To extend that algorithm to the PROC and LETREC languages, we must find some alternative to guessing the type of each bound variable.

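When designing algorithms, there are two standard techniques for avoiding such guesswork:

  1. Try every possible guess.
  2. Postpone the guess: introduce a variable that stands for the unknown, record the constraints it must satisfy, and solve those constraints later.

The first of those techniques generally leads to an exponential algorithm, which is impractical. For type inference (and many other static analyses), the second technique is standard.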

Type Inference Algorithm for LETREC Programs

Algorithm. Given a LETREC program, translate the program into an equivalent program (a-program exp) by renaming all bound variables so no variable is bound in more than one place. Invent a fresh type variable tx for every variable x that is bound within exp. Invent a fresh type variable te for every subexpression e that occurs within exp. Use the following algorithm to generate the type constraints that the tx and te must satisfy. Then solve those type constraints for the values of the tx and te.

If the type constraints have a solution, then the program is well-typed. If the type constraints are not satisfiable, then the program is not well-typed.

Algorithm for Generating Type Constraints from LETREC Programs

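Algorithm. Given a LETREC expression exp, this algorithm returns the type constraints generated by exp itself. The constraints for a whole program are the union of the constraints generated by each of its subexpressions.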

If exp is (const-exp n), then the type constraints are
    texp = int

If exp is (var-exp x), then the type constraints are
    texp = tx

If exp is (diff-exp e1 e2), then the type constraints are
    te1 = int
    te2 = int
    texp = int

If exp is (zero?-exp e1), then the type constraints are
    te1 = int
    texp = bool

If exp is (if-exp e1 e2 e3), then the type constraints are
    te1 = bool
    te2 = texp
    te3 = texp

If exp is (let-exp x1 e1 body), then the type constraints are
    tx1 = te1
    tbody = texp

If exp is (proc-exp x1 body), then the type constraints are
    texp = (tx1 -> tbody)

If exp is (call-exp e1 e2), then the type constraints are
    te1 = (te2 -> texp)

If exp is (letrec-exp x0 x1 e1 body), then the type constraints are
    tx0 = (tx1 -> te1)
    texp = tbody
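The constraint generation rules can be sketched in C as a single traversal. This sketch assumes the renaming step has already been done, so every subexpression carries a unique number (its te) and every bound variable a unique number (its tx); the Exp and Term structs, the E_* and TV_* tags, and gen itself are hypothetical names. It only records the constraints, as pairs of type terms in a fixed-size array; it does not solve them.

```c
#include <assert.h>
#include <stdlib.h>

/* Type terms: t_e for subexpressions, t_x for bound variables,
   plus int, bool, and (t1 -> t2). */
typedef enum { TV_EXP, TV_VAR, TY_INT, TY_BOOL, TY_ARROW } TermKind;
typedef struct Term { TermKind kind; int id; struct Term *arg, *res; } Term;

typedef enum { E_CONST, E_VAR, E_DIFF, E_ZERO, E_IF, E_LET,
               E_PROC, E_CALL, E_LETREC } ExpKind;
typedef struct Exp {
    ExpKind kind;
    int id;        /* number of this node's type variable t_e */
    int x0, x1;    /* E_VAR/E_LET/E_PROC use x1; E_LETREC uses x0 and x1 */
    struct Exp *e1, *e2, *e3;
} Exp;

typedef struct { Term *lhs, *rhs; } Constraint;

#define MAX_CONSTRAINTS 256
static Constraint constraints[MAX_CONSTRAINTS];
static int n_constraints = 0;

static Term *term(TermKind k, int id, Term *arg, Term *res) {
    Term *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = k; t->id = id; t->arg = arg; t->res = res;
    return t;
}
static Term *t_exp(const Exp *e) { return term(TV_EXP, e->id, NULL, NULL); }
static Term *t_var(int x)        { return term(TV_VAR, x, NULL, NULL); }
static Term *t_int(void)         { return term(TY_INT, 0, NULL, NULL); }
static Term *t_bool(void)        { return term(TY_BOOL, 0, NULL, NULL); }
static Term *arrow(Term *a, Term *r) { return term(TY_ARROW, 0, a, r); }

static void emit(Term *lhs, Term *rhs) {
    assert(n_constraints < MAX_CONSTRAINTS);
    constraints[n_constraints].lhs = lhs;
    constraints[n_constraints].rhs = rhs;
    n_constraints++;
}

/* Emits the constraints for every subexpression (children first),
   one case per rule above. */
void gen(const Exp *exp) {
    if (exp->e1) gen(exp->e1);
    if (exp->e2) gen(exp->e2);
    if (exp->e3) gen(exp->e3);
    switch (exp->kind) {
    case E_CONST: emit(t_exp(exp), t_int()); break;
    case E_VAR:   emit(t_exp(exp), t_var(exp->x1)); break;
    case E_DIFF:  emit(t_exp(exp->e1), t_int()); emit(t_exp(exp->e2), t_int());
                  emit(t_exp(exp), t_int()); break;
    case E_ZERO:  emit(t_exp(exp->e1), t_int()); emit(t_exp(exp), t_bool()); break;
    case E_IF:    emit(t_exp(exp->e1), t_bool()); emit(t_exp(exp->e2), t_exp(exp));
                  emit(t_exp(exp->e3), t_exp(exp)); break;
    case E_LET:   /* e1 = bound expression, e2 = body */
                  emit(t_var(exp->x1), t_exp(exp->e1));
                  emit(t_exp(exp->e2), t_exp(exp)); break;
    case E_PROC:  /* e1 = body */
                  emit(t_exp(exp), arrow(t_var(exp->x1), t_exp(exp->e1))); break;
    case E_CALL:  emit(t_exp(exp->e1), arrow(t_exp(exp->e2), t_exp(exp))); break;
    case E_LETREC: /* e1 = procedure body, e2 = letrec body */
                  emit(t_var(exp->x0), arrow(t_var(exp->x1), t_exp(exp->e1)));
                  emit(t_exp(exp), t_exp(exp->e2)); break;
    }
}
```

For proc (x) -(x, 1), gen emits six constraints, the last being tproc = (tx -> tdiff).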

We will need an algorithm for solving sets of type constraints, but that algorithm will turn out to be little more than a systematic algorithm for substituting equals for equals. Before we consider that systematic algorithm, let's infer the types for this example:

    let a = proc (i) proc (j) -(i, -(0,j))
    in letrec m (x) = proc (y) if zero?(x)
                                 then 0
                                 else ((a ((m -(x,1)) y)) y)
       in ((m 11) 12)

The type variables that are associated with the bound variables are

    ta    a
    tm    m
    ti    i
    tj    j
    tx    x
    ty    y

The type variables that are associated with the subexpressions are

    t1    11
    t2    12
    t3    m
    t4    (m 11)
    t5    ((m 11) 12)
    t6    x
    t7    zero?(x)
    t8    0
    t9    x
    t10   1
    t11   -(x,1)
    t12   m
    t13   (m -(x,1))
    t14   y
    t15   ((m -(x,1)) y)
    t16   a
    t17   (a ((m -(x,1)) y))
    t18   y
    t19   ((a ((m -(x,1)) y)) y)
    t20   if zero?(x) then 0 else ((a ((m -(x,1)) y)) y)
    t21   proc (y) if zero?(x) then ... else ...
    t22   letrec m (x) = proc (y) ... in ((m 11) 12)
    t23   0
    t24   j
    t25   -(0,j)
    t26   i
    t27   -(i, -(0,j))
    t28   proc (j) -(i, -(0,j))
    t29   proc (i) proc (j) -(i, -(0,j))
    t30   let a = ... in letrec m (x) = ... in ...

The type constraints that are generated by those subexpressions are

    t1 = int

    t2 = int

    t3 = tm

    t3 = (t1 -> t4)

    t4 = (t2 -> t5)

    t6 = tx

    t6 = int
    t7 = bool

    t8 = int

    t9 = tx

    t10 = int

    t9 = int
    t10 = int
    t11 = int

    t12 = tm

    t12 = (t11 -> t13)

    t14 = ty

    t13 = (t14 -> t15)

    t16 = ta

    t16 = (t15 -> t17)

    t18 = ty

    t17 = (t18 -> t19)

    t7 = bool
    t8 = t20
    t19 = t20

    t21 = (ty -> t20)

    tm = (tx -> t21)
    t22 = t5

    t23 = int

    t24 = tj

    t23 = int
    t24 = int
    t25 = int

    t26 = ti

    t25 = int
    t26 = int
    t27 = int

    t28 = (tj -> t27)

    t29 = (ti -> t28)

    ta = t29
    t30 = t22

Any solution to those type constraints will prove that the program is well-typed. To obtain a decision procedure, however, we will need a systematic solver that always finds a solution if a solution exists, and always terminates with a failure notice if no solution exists.
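A standard choice of systematic solver is first-order unification with an occurs check, which substitutes equals for equals as it goes and fails exactly when the constraints have no solution. The C sketch below uses that standard technique, which is not necessarily the solver these notes go on to develop; the names (Ty, unify, resolve, solve_example) and the numbering of type variables (t1 through t30 as 1 to 30, and ta, tm, ti, tj, tx, ty as 31 to 36) are assumptions made for illustration. It solves the constraints listed above for the example program, omitting duplicates.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TVAR, TINT, TBOOL, TARROW } Kind;
typedef struct Ty { Kind kind; int var; struct Ty *arg, *res; } Ty;

#define MAX_VARS 64
static Ty *binding[MAX_VARS];  /* binding[v]: the type t_v equals, or NULL */

static Ty *mk(Kind k, int var, Ty *arg, Ty *res) {
    Ty *t = malloc(sizeof *t);
    assert(t != NULL);
    t->kind = k; t->var = var; t->arg = arg; t->res = res;
    return t;
}
Ty *tv(int v) { assert(v >= 0 && v < MAX_VARS); return mk(TVAR, v, NULL, NULL); }
Ty *tint(void) { return mk(TINT, 0, NULL, NULL); }
Ty *tbool(void) { return mk(TBOOL, 0, NULL, NULL); }
Ty *tarrow(Ty *a, Ty *r) { return mk(TARROW, 0, a, r); }

/* Follows bindings to an unbound variable or a non-variable type. */
static Ty *walk(Ty *t) {
    while (t->kind == TVAR && binding[t->var] != NULL) t = binding[t->var];
    return t;
}

/* Occurs check: does variable v occur in t under the current bindings? */
static int occurs(int v, Ty *t) {
    t = walk(t);
    if (t->kind == TVAR) return t->var == v;
    if (t->kind == TARROW) return occurs(v, t->arg) || occurs(v, t->res);
    return 0;
}

/* Makes a and b equal by extending the bindings; returns 0 if impossible. */
int unify(Ty *a, Ty *b) {
    a = walk(a); b = walk(b);
    if (a->kind == TVAR && b->kind == TVAR && a->var == b->var) return 1;
    if (a->kind == TVAR) {
        if (occurs(a->var, b)) return 0;
        binding[a->var] = b;
        return 1;
    }
    if (b->kind == TVAR) return unify(b, a);
    if (a->kind != b->kind) return 0;
    if (a->kind == TARROW) return unify(a->arg, b->arg) && unify(a->res, b->res);
    return 1;  /* int = int or bool = bool */
}

/* Substitutes the final bindings throughout t. */
Ty *resolve(Ty *t) {
    t = walk(t);
    if (t->kind == TARROW) return tarrow(resolve(t->arg), resolve(t->res));
    return t;
}

enum { TA = 31, TM, TI, TJ, TX, TY };

/* The example's constraints; returns 1 if they are satisfiable. */
int solve_example(void) {
    Ty *eqs[][2] = {
        { tv(1), tint() },  { tv(2), tint() },  { tv(3), tv(TM) },
        { tv(3), tarrow(tv(1), tv(4)) },    { tv(4), tarrow(tv(2), tv(5)) },
        { tv(6), tv(TX) },  { tv(6), tint() },  { tv(7), tbool() },
        { tv(8), tint() },  { tv(9), tv(TX) },  { tv(10), tint() },
        { tv(9), tint() },  { tv(11), tint() }, { tv(12), tv(TM) },
        { tv(12), tarrow(tv(11), tv(13)) }, { tv(14), tv(TY) },
        { tv(13), tarrow(tv(14), tv(15)) }, { tv(16), tv(TA) },
        { tv(16), tarrow(tv(15), tv(17)) }, { tv(18), tv(TY) },
        { tv(17), tarrow(tv(18), tv(19)) }, { tv(8), tv(20) },
        { tv(19), tv(20) }, { tv(21), tarrow(tv(TY), tv(20)) },
        { tv(TM), tarrow(tv(TX), tv(21)) }, { tv(22), tv(5) },
        { tv(23), tint() }, { tv(24), tv(TJ) }, { tv(24), tint() },
        { tv(25), tint() }, { tv(26), tv(TI) }, { tv(26), tint() },
        { tv(27), tint() }, { tv(28), tarrow(tv(TJ), tv(27)) },
        { tv(29), tarrow(tv(TI), tv(28)) }, { tv(TA), tv(29) },
        { tv(30), tv(22) },
    };
    size_t n = sizeof eqs / sizeof eqs[0];
    for (size_t i = 0; i < n; i++)
        if (!unify(eqs[i][0], eqs[i][1])) return 0;
    return 1;
}
```

Solving these constraints gives t30 = int for the whole program, ta = (int -> (int -> int)) for a, and tm = (int -> (int -> int)) for m.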


Last updated 24 March 2008.
