Types

Our specifications and implementations of value-of specify and implement the dynamic semantics of expressions.

Expressions can also have static semantics, which concern the properties of expressions that can be deduced without executing the expressions.

Type safety is an important property of expressions. Whether type safety is static or dynamic depends upon the programming language. If type safety is a static property, then we say the language is strongly typed.

Some programming languages allow the type of an expression to be calculated without executing the expression. We say these languages are statically typed.

A language can be statically typed without being strongly typed. C and C++, for example, are statically typed but not strongly typed, because type safety is not a static property of those languages. The problem is that the C/C++ type system is unsound. Although the type of a C/C++ expression can be calculated statically, that type is not always a reliable prediction of the expression's value at run time.

For example:

#include <stdio.h>
#include <stdlib.h>

double f (double * p) {
  *p = 3.14159;
  return *p;
}

int main (int argc, char* argv[]) {
  int n = 12345;
  double * p = (void *) &n;
  double x = f(p);
  double y = f(p+1);
  double z = f(p+2);
  printf ("n =  %d\n", n);
  printf ("x =  %lf\n", x);
}

What output is printed by that program?

On one machine, the output was:

n =  1074340345
x =  3.141590

On another machine, the program terminated with a segmentation fault. The problem is that the program is not type-safe.
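The first output line can be reproduced without running any C. The following Python sketch assumes (the notes do not say) that the first machine used IEEE 754 doubles, 4-byte ints, and big-endian byte order, so that n occupies the high-order half of the 8 bytes written by *p = 3.14159. The names raw and n_after are illustrative.

```python
import struct

# Assumptions, not stated in the notes: IEEE 754 double, 4-byte int,
# big-endian byte order (n overlays the first 4 bytes of the double).
raw = struct.pack(">d", 3.14159)             # the 8 bytes stored through p
n_after = struct.unpack(">i", raw[:4])[0]    # the 4 of those bytes that n occupies
print("n = ", n_after)
```

Under those assumptions n_after is 1074340345, the value printed on the first machine. The other 4 bytes of the double land outside n, which is why the behavior differs from machine to machine.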

Definition of Type Safety for LET, PROC, or LETREC

For every evaluation of a variable, the variable is bound.
For every evaluation of a difference expression (diff-exp exp₁ exp₂), the values of exp₁ and exp₂ are both numbers.
For every evaluation of an expression of the form (zero?-exp exp₁), the value of exp₁ is a number.
For every evaluation of a conditional expression (if-exp exp₁ exp₂ exp₃), the value of exp₁ is a boolean.
For every evaluation of a procedure call (call-exp exp₁ exp₂), the value of exp₁ is a procedure.

If one of those conditions is violated, we call it a type error. (Hence not all errors are type errors.) We say that a LET, PROC, or LETREC program is type-safe if and only if its execution cannot possibly involve a type error.
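To make the five conditions concrete, here is a minimal Python sketch of an interpreter that checks each of them during evaluation. It is not the value-of from the textbook: the tuple representation of expressions, the dictionary environments, and the names TypeErrorInProgram, Closure, and is_num are all assumptions made for illustration.

```python
class TypeErrorInProgram(Exception):
    """Raised when one of the five conditions above is violated."""

class Closure:
    def __init__(self, var, body, env):
        self.var, self.body, self.env = var, body, env

def is_num(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, int) and not isinstance(v, bool)

def value_of(exp, env):
    tag = exp[0]
    if tag == "const":
        return exp[1]
    if tag == "var":
        if exp[1] not in env:                      # condition 1
            raise TypeErrorInProgram("unbound variable " + exp[1])
        return env[exp[1]]
    if tag == "diff":
        v1, v2 = value_of(exp[1], env), value_of(exp[2], env)
        if not (is_num(v1) and is_num(v2)):        # condition 2
            raise TypeErrorInProgram("diff-exp operand is not a number")
        return v1 - v2
    if tag == "zero?":
        v1 = value_of(exp[1], env)
        if not is_num(v1):                         # condition 3
            raise TypeErrorInProgram("zero?-exp operand is not a number")
        return v1 == 0
    if tag == "if":
        v1 = value_of(exp[1], env)
        if not isinstance(v1, bool):               # condition 4
            raise TypeErrorInProgram("if-exp test is not a boolean")
        return value_of(exp[2] if v1 else exp[3], env)
    if tag == "let":
        _, var, e1, body = exp
        return value_of(body, {**env, var: value_of(e1, env)})
    if tag == "proc":
        return Closure(exp[1], exp[2], env)
    if tag == "call":
        f, arg = value_of(exp[1], env), value_of(exp[2], env)
        if not isinstance(f, Closure):             # condition 5
            raise TypeErrorInProgram("call-exp operator is not a procedure")
        return value_of(f.body, {**f.env, f.var: arg})
    raise ValueError("unknown expression")
```

In this sketch a program is type-safe exactly when no evaluation of it can raise TypeErrorInProgram. For example, value_of(("call", ("const", 0), ("const", 0)), {}) raises it, because 0 is not a procedure.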

Not all LET, PROC, and LETREC programs are type-safe. That leads to the following question:

Is type safety a static property of LET, PROC, or LETREC?

That's the same as asking whether LET, PROC, and LETREC are strongly typed.

It so happens that LET is strongly typed. That is not terribly interesting, however, because LET is not a very expressive language. For example, there is no LET expression exp such that, for all integer values n, let x = n in exp evaluates to the absolute value of n. For another example, it is not possible to write an infinite loop in the LET language.

The interesting question is whether PROC is strongly typed.

If PROC were strongly typed, then type safety would be a static property of PROC programs. In other words, there would be some algorithm that takes an arbitrary PROC program as input and decides whether the program is type-safe. In particular, that algorithm would be able to decide whether an arbitrary program of the form

    if <expression>
       then (0 0)
       else (0 0)

is type-safe.

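Programs of that form are type-safe if and only if the evaluation of <expression> does not halt. If the evaluation halts with a boolean, then (0 0) is evaluated and its operator 0 is not a procedure; if it halts with any other value, the test of the conditional is not a boolean; and if it halts because of a type error of its own, the program is not type-safe either. If PROC were strongly typed, therefore, then there would be some algorithm that takes an arbitrary expression as input and decides whether the expression halts.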

Theorem. For all Turing-complete programming languages, the halting problem is undecidable.

PROC (unlike LET) is Turing-complete. Because the halting problem is undecidable, no algorithm is able to decide whether an arbitrary PROC program is type-safe. In other words, PROC is not strongly typed. Since PROC is a proper subset of the LETREC language, LETREC is not strongly typed either.
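The reduction used in that argument can be written down as a short Python sketch. The tuple representation of PROC expressions and the names wrap and make_halting_decider are illustrative assumptions. The parameter is_type_safe stands for the decider that the argument shows cannot exist, so the sketch only records how such a decider would be used.

```python
def wrap(expr):
    """Build the program  if <expr> then (0 0) else (0 0)  as a tuple."""
    bad_call = ("call", ("const", 0), ("const", 0))
    return ("if", expr, bad_call, bad_call)

def make_halting_decider(is_type_safe):
    """Turn a (hypothetical) type-safety decider into a halting decider."""
    def halts(expr):
        # The wrapped program is type-safe exactly when expr never halts.
        return not is_type_safe(wrap(expr))
    return halts
```

Since no halting decider exists for a Turing-complete language, no total and correct is_type_safe can be supplied for PROC.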

The undecidability of the halting problem tells us that no general purpose programming language can be strongly typed, assuming type safety and strong typing are defined as above.

That's not the answer we want.

We can't have the answer we want.

We can, however, change the definition of type safety and/or strong typing so we can pretend to have the answer we want. The standard way to do that is:

Define a static type system.
Define the well-typed programs.
Show that every well-typed program is type-safe in the sense defined earlier.
Show that well-typedness is statically decidable.
Pretend that's good enough.

That last step means we redefine strongly typed to mean

well-typedness is a static property
well-typedness implies type safety

Assigning a Type to an Expression

We'll start by defining a static type system for PROC:

Typing rules for PROC

(type-of (const-exp num) tenv) = int

(type-of (var-exp var) tenv) = tenv(var)

(type-of exp₁ tenv) = int
(type-of exp₂ tenv) = int
--------------------------------------------------------------------
(type-of (diff-exp exp₁ exp₂) tenv) = int

(type-of exp₁ tenv) = int
----------------------------------------------------------------
(type-of (zero?-exp exp₁) tenv) = bool

(type-of exp₁ tenv) = bool
(type-of exp₂ tenv) = t
(type-of exp₃ tenv) = t
--------------------------------------------------------------------
(type-of (if-exp exp₁ exp₂ exp₃) tenv) = t

(type-of exp₁ tenv) = t₁
(type-of body [var₁:t₁]tenv) = t
------------------------------------------------------------------------
(type-of (let-exp var₁ exp₁ body) tenv) = t

(type-of body [var₁:t₁]tenv) = t₂
----------------------------------------------------------------------------
(type-of (proc-exp var₁ body) tenv) = (t₁ → t₂)

(type-of exp₁ tenv) = (t₁ → t₂)
(type-of exp₂ tenv) = t₁
--------------------------------------------------------------------
(type-of (call-exp exp₁ exp₂) tenv) = t₂

The next step is to define

Well-typed PROC Programs

Definition. A PROC program (a-program exp) is well-typed if and only if there exists some type t such that the typing rules for PROC can be used to prove (type-of exp tenv₀) = t

where tenv₀ = [i:int,v:int,x:int] is the initial type environment that specifies the types of all variables bound in the standard initial environment.

Can you give an example of a PROC program that is not well-typed?
Can you give an example of a type-safe PROC program that is not well-typed?
Can you give an example of a type-safe PROC program that never terminates?
Can you give an example of a well-typed PROC program that never terminates?

The next step is to prove

Type Soundness

Theorem. If P is a well-typed PROC program, then P is type-safe.

That theorem is proved by induction on the number of calls to value-of that occur during the evaluation of P.

The next step is to prove that well-typedness is statically decidable.

The usual way to prove the decidability of some problem is to describe an algorithm that decides the problem. Such an algorithm is said to be a decision procedure.

It is easy to describe a decision procedure for determining whether a LET program is well-typed:

Decision Procedure for Well-typedness of LET Programs

Algorithm. Given a LET program (a-program exp), decide whether exp is well-typed with respect to the initial type environment tenv₀ by applying the following rules recursively, starting with tenv = tenv₀.

If exp is a constant expression, then exp is well-typed with respect to tenv.

If exp is a variable x, then exp is well-typed with respect to tenv if and only if x is bound in the type environment tenv.

If exp is of the form (diff-exp exp₁ exp₂), then exp is well-typed with respect to tenv if and only if both exp₁ and exp₂ are well-typed in the type environment tenv and are of type int.

If exp is of the form (zero?-exp exp₁), then exp is well-typed with respect to tenv if and only if exp₁ is well-typed in the type environment tenv and is of type int.

If exp is of the form (if-exp exp₁ exp₂ exp₃), then exp is well-typed with respect to tenv if and only if exp₁, exp₂, and exp₃ are well-typed in the type environment tenv, exp₁ is of type bool, and exp₂ and exp₃ are of the same type.

If exp is of the form (let-exp var₁ exp₁ body), then exp is well-typed with respect to tenv if and only if exp₁ is well-typed in the type environment tenv and body is well-typed in the type environment [var₁:t₁]tenv, where t₁ is the type of exp₁.

That decision procedure is just the obvious algorithm that uses the typing rules for the LET language to compute the type of an expression.
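A minimal Python sketch of that decision procedure follows. Expressions are assumed to be tuples such as ("diff", e1, e2), type environments are assumed to be dictionaries, and the names NotWellTyped, type_of, TENV0, and is_well_typed are illustrative rather than taken from the textbook.

```python
class NotWellTyped(Exception):
    pass

def type_of(exp, tenv):
    """Compute the type of a LET expression, or raise NotWellTyped."""
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        if exp[1] not in tenv:
            raise NotWellTyped("unbound variable " + exp[1])
        return tenv[exp[1]]
    if tag == "diff":
        if type_of(exp[1], tenv) != "int" or type_of(exp[2], tenv) != "int":
            raise NotWellTyped("diff-exp needs two ints")
        return "int"
    if tag == "zero?":
        if type_of(exp[1], tenv) != "int":
            raise NotWellTyped("zero?-exp needs an int")
        return "bool"
    if tag == "if":
        t1 = type_of(exp[1], tenv)
        t2 = type_of(exp[2], tenv)
        t3 = type_of(exp[3], tenv)
        if t1 != "bool" or t2 != t3:
            raise NotWellTyped("ill-typed conditional")
        return t2
    if tag == "let":
        _, var, e1, body = exp
        t1 = type_of(e1, tenv)                      # type of the bound expression
        return type_of(body, {**tenv, var: t1})     # [var:t1]tenv
    raise NotWellTyped("not a LET expression")

# The initial type environment tenv0 = [i:int, v:int, x:int].
TENV0 = {"i": "int", "v": "int", "x": "int"}

def is_well_typed(exp):
    try:
        type_of(exp, TENV0)
        return True
    except NotWellTyped:
        return False
```

The procedure terminates because every recursive call is on a strictly smaller subexpression, and it never has to guess a type.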

If we try to extend that algorithm to proc expressions, we run into a problem:

(type-of body [var₁:t₁]tenv) = t₂
----------------------------------------------------------------------------
(type-of (proc-exp var₁ body) tenv) = (t₁ → t₂)

With the other rules, every type that occurs in the hypotheses of the rule is either a fixed type (such as int) or is the type of some subexpression. With proc expressions, however, it looks like we'd have to guess the type of the bound variable.

There are two standard ways to deal with this problem:

Make the programmer tell us the type of the bound variable.
Make the type checker infer the type of the bound variable.

Each of these approaches has its own advantages and disadvantages:

Make the programmer tell us the type of the bound variable.
- Advantage: The language is easier to implement.
- Advantage: Programs are easier to understand.
- Advantage: More sophisticated types can be expressed.
Make the type checker infer the type of the bound variable.
- Advantage: Programs are easier to write.
- Advantage: Programs are less cluttered.
- Advantage: Programs are more general, hence more reusable.

Historically, most programming languages have been designed by the same people who implement them, so most programming languages place the burden on users instead of implementors. Although that is a fairly trivial basis for making such an important design decision, it really does seem to have been the most influential factor in most programming languages.

CHECKED: A Type-Checked Language

If we make the programmer tell us the types of bound variables, then we'll have to change the syntax of PROC and LETREC to accommodate those types. Section 7.3 of our textbook shows one possible syntax, and also implements a type checker along the lines of the type checker for LET.
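As a rough illustration of why declared types remove the guesswork, here is a Python sketch of a checker for a fragment with annotated proc expressions. It does not follow the textbook's syntax: the tuple form ("proc", var, declared_type, body), the representation of procedure types as ("->", t1, t2), and all names are assumptions, and only constants, variables, differences, procedures, and calls are covered.

```python
class NotWellTyped(Exception):
    pass

def type_of(exp, tenv):
    tag = exp[0]
    if tag == "const":
        return "int"
    if tag == "var":
        if exp[1] not in tenv:
            raise NotWellTyped("unbound variable " + exp[1])
        return tenv[exp[1]]
    if tag == "diff":
        if type_of(exp[1], tenv) != "int" or type_of(exp[2], tenv) != "int":
            raise NotWellTyped("diff-exp needs two ints")
        return "int"
    if tag == "proc":
        # t1 is read from the program text, so nothing has to be guessed.
        _, var, t1, body = exp
        t2 = type_of(body, {**tenv, var: t1})
        return ("->", t1, t2)
    if tag == "call":
        t_rator = type_of(exp[1], tenv)
        t_rand = type_of(exp[2], tenv)
        if not (isinstance(t_rator, tuple) and t_rator[1] == t_rand):
            raise NotWellTyped("operator and operand types do not match")
        return t_rator[2]
    raise NotWellTyped("unsupported expression")
```

For example, with succ = ("proc", "n", "int", ("diff", ("var", "n"), ("const", -1))), the sketch assigns succ the type ("->", "int", "int") and assigns ("call", succ, ("const", 5)) the type "int".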

INFERRED: A Language with Type Inference

Section 7.4 of our textbook changes the syntax yet again. This is unnecessary for PROC and LETREC, but it facilitates certain extensions of PROC and LETREC that are explored in the exercises.

Type Inference for PROC and LETREC Programs

We will show how to infer types for the original syntax of LET, PROC, and LETREC.

We have already described an algorithm that decides whether a LET program is well-typed; that algorithm works by inferring a type for every subexpression that appears within the program.

To extend that algorithm to the PROC and LETREC languages, we must find some alternative to guessing the type of each bound variable.

When designing algorithms, there are two standard techniques for avoiding such guesswork:

Make all possible guesses, and explore each guess in a parallel computation or concurrent thread, pruning computations/threads that don't work out.
Defer each guess by replacing it with a fresh variable. Collect a set of equations or other constraints that these variables must satisfy. Solve that set of equations or constraints to find values for the variables.

The first of those techniques generally leads to an exponential algorithm, which is impractical. For type inference (and many other static analyses), the second technique is standard.

Type Inference Algorithm for LETREC Programs

Algorithm. Given a LETREC program, translate the program into an equivalent program (a-program exp) by renaming all bound variables so no variable is bound in more than one place. Invent a fresh type variable t_x for every variable x that is bound within exp. Invent a fresh type variable t_e for every subexpression e that occurs within exp. Use the following algorithm to generate the type constraints that the t_x and t_e must satisfy. Then solve those type constraints for the values of the t_x and t_e.

If the type constraints have a solution, then the program is well-typed. If the type constraints are not satisfiable, then the program is not well-typed.

Algorithm for Generating Type Constraints from LETREC Programs

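Algorithm. Given a LETREC expression exp, return the set of type constraints generated from exp: the constraints listed below for exp itself, together with the constraints generated recursively from each of its subexpressions.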

If exp is (const-exp n), then the type constraints are
    t_exp = int
If exp is (var-exp x), then the type constraints are
    t_exp = t_x
If exp is (diff-exp e1 e2), then the type constraints are
    t_e1 = int
    t_e2 = int
    t_exp = int
If exp is (zero?-exp e1), then the type constraints are
    t_e1 = int
    t_exp = bool
If exp is (if-exp e1 e2 e3), then the type constraints are
    t_e1 = bool
    t_e2 = t_exp
    t_e3 = t_exp
If exp is (let-exp x1 e1 body), then the type constraints are
    t_x1 = t_e1
    t_body = t_exp
If exp is (proc-exp x1 body), then the type constraints are
    t_exp = (t_x1 → t_body)
If exp is (call-exp e1 e2), then the type constraints are
    t_e1 = (t_e2 → t_exp)
If exp is (letrec-exp x0 x1 e1 body), then the type constraints are
    t_x0 = (t_x1 → t_e1)
    t_exp = t_body
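The cases above translate almost line for line into code. The following Python sketch is one possible rendering under stated assumptions: expressions are tuples such as ("call", e1, e2), bound variables have already been renamed apart, the type variable for a bound variable x is the string "t_x", each subexpression occurrence gets a fresh string "t1", "t2", and so on, and a procedure type is the tuple ("->", t1, t2). The name generate is illustrative.

```python
import itertools

def generate(exp, fresh=None):
    """Return (t_exp, constraints), where constraints is a list of (lhs, rhs) pairs."""
    if fresh is None:
        fresh = ("t%d" % i for i in itertools.count(1))
    cs = []

    def sub(e):
        # Generate constraints for a subexpression and collect them.
        t, c = generate(e, fresh)
        cs.extend(c)
        return t

    tag = exp[0]
    if tag == "const":
        t_exp = next(fresh)
        cs.append((t_exp, "int"))
    elif tag == "var":
        t_exp = next(fresh)
        cs.append((t_exp, "t_" + exp[1]))
    elif tag == "diff":
        t1, t2 = sub(exp[1]), sub(exp[2])
        t_exp = next(fresh)
        cs.extend([(t1, "int"), (t2, "int"), (t_exp, "int")])
    elif tag == "zero?":
        t1 = sub(exp[1])
        t_exp = next(fresh)
        cs.extend([(t1, "int"), (t_exp, "bool")])
    elif tag == "if":
        t1, t2, t3 = sub(exp[1]), sub(exp[2]), sub(exp[3])
        t_exp = next(fresh)
        cs.extend([(t1, "bool"), (t2, t_exp), (t3, t_exp)])
    elif tag == "let":
        _, x1, e1, body = exp
        t_e1, t_body = sub(e1), sub(body)
        t_exp = next(fresh)
        cs.extend([("t_" + x1, t_e1), (t_body, t_exp)])
    elif tag == "proc":
        _, x1, body = exp
        t_body = sub(body)
        t_exp = next(fresh)
        cs.append((t_exp, ("->", "t_" + x1, t_body)))
    elif tag == "call":
        t1, t2 = sub(exp[1]), sub(exp[2])
        t_exp = next(fresh)
        cs.append((t1, ("->", t2, t_exp)))
    elif tag == "letrec":
        _, x0, x1, e1, body = exp
        t_e1, t_body = sub(e1), sub(body)
        t_exp = next(fresh)
        cs.extend([("t_" + x0, ("->", "t_" + x1, t_e1)), (t_exp, t_body)])
    else:
        raise ValueError("unknown expression")
    return t_exp, cs
```

A fresh type variable is created per occurrence rather than per distinct expression, so two occurrences of the same variable receive different type variables that are both constrained to equal t_x.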

We will need an algorithm for solving sets of type constraints, but that algorithm will turn out to be little more than a systematic algorithm for substituting equals for equals. Before we consider that systematic algorithm, let's infer the types for this example:

    let a = proc (i) proc (j) -(i, -(0,j))
    in letrec m (x) = proc (y) if zero?(x)
                                 then 0
                                 else ((a ((m -(x,1)) y)) y)
       in ((m 11) 12)

The type variables that are associated with the bound variables are

    t_a    a
    t_m    m
    t_i    i
    t_j    j
    t_x    x
    t_y    y

The type variables that are associated with the subexpressions are

    t₁    11
    t₂    12
    t₃    m
    t₄    (m 11)
    t₅    ((m 11) 12)
    t₆    x
    t₇    zero?(x)
    t₈    0
    t₉    x
    t₁₀   1
    t₁₁   -(x,1)
    t₁₂   m
    t₁₃   (m -(x,1))
    t₁₄   y
    t₁₅   ((m -(x,1)) y)
    t₁₆   a
    t₁₇   (a ((m -(x,1)) y))
    t₁₈   y
    t₁₉   ((a ((m -(x,1)) y)) y)
    t₂₀   if zero?(x) then 0 else ((a ((m -(x,1)) y)) y)
    t₂₁   proc (y) if zero?(x) then ... else ...
    t₂₂   letrec m (x) = proc (y) ... in ((m 11) 12)
    t₂₃   0
    t₂₄   j
    t₂₅   -(0,j)
    t₂₆   i
    t₂₇   -(i, -(0,j))
    t₂₈   proc (j) -(i, -(0,j))
    t₂₉   proc (i) proc (j) -(i, -(0,j))
    t₃₀   let a = ... in letrec m (x) = ... in ...

The type constraints that are generated by those subexpressions are

    t₁ = int

    t₂ = int

    t₃ = t_m

    t₃ = (t₁ → t₄)

    t₄ = (t₂ → t₅)

    t₆ = t_x

    t₆ = int
    t₇ = bool

    t₈ = int

    t₉ = t_x

    t₁₀ = int

    t₉ = int
    t₁₀ = int
    t₁₁ = int

    t₁₂ = t_m

    t₁₂ = (t₁₁ → t₁₃)

    t₁₄ = t_y

    t₁₃ = (t₁₄ → t₁₅)

    t₁₆ = t_a

    t₁₆ = (t₁₅ → t₁₇)

    t₁₈ = t_y

    t₁₇ = (t₁₈ → t₁₉)

    t₇ = bool
    t₈ = t₂₀
    t₁₉ = t₂₀

    t₂₁ = (t_y → t₂₀)

    t_m = (t_x → t₂₁)
    t₂₂ = t₅

    t₂₃ = int

    t₂₄ = t_j

    t₂₃ = int
    t₂₄ = int
    t₂₅ = int

    t₂₆ = t_i

    t₂₅ = int
    t₂₆ = int
    t₂₇ = int

    t₂₈ = (t_j → t₂₇)

    t₂₉ = (t_i → t₂₈)

    t_a = t₂₉
    t₃₀ = t₂₂

Any solution to those type constraints will prove that the program is well-typed. To obtain a decision procedure, however, we will need a systematic solver that always finds a solution if a solution exists, and always terminates with a failure notice if no solution exists.
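One way to build such a solver is standard unification with an occurs check, which these notes have not yet introduced, so the following Python sketch is an assumption about how the solver might look rather than the algorithm the course goes on to present. Constraints are assumed to be (lhs, rhs) pairs in which "int" and "bool" are the base types, any other string is a type variable, and ("->", t1, t2) is a procedure type. The names is_var, apply, occurs, and solve are illustrative.

```python
def is_var(t):
    return isinstance(t, str) and t not in ("int", "bool")

def apply(subst, t):
    """Substitute equals for equals: replace solved variables in t."""
    if is_var(t):
        return apply(subst, subst[t]) if t in subst else t
    if isinstance(t, tuple):
        return ("->", apply(subst, t[1]), apply(subst, t[2]))
    return t

def occurs(v, t):
    """Does type variable v occur inside type t?"""
    if isinstance(t, tuple):
        return occurs(v, t[1]) or occurs(v, t[2])
    return t == v

def solve(constraints):
    """Return a substitution (a dict) or None if the constraints are unsatisfiable."""
    subst = {}
    work = list(constraints)
    while work:
        lhs, rhs = work.pop()
        lhs, rhs = apply(subst, lhs), apply(subst, rhs)
        if lhs == rhs:
            continue
        if is_var(lhs):
            if occurs(lhs, rhs):
                return None            # no finite type satisfies t = (... t ...)
            subst[lhs] = rhs
        elif is_var(rhs):
            if occurs(rhs, lhs):
                return None
            subst[rhs] = lhs
        elif isinstance(lhs, tuple) and isinstance(rhs, tuple):
            work.append((lhs[1], rhs[1]))   # argument types must agree
            work.append((lhs[2], rhs[2]))   # result types must agree
        else:
            return None                # e.g. int = bool, or int = (t1 -> t2)
    return subst
```

For instance, solve applied to the three constraints t₃ = t_m, t₃ = (t₁ → t₄), and t₁ = int from the example yields a substitution under which t_m becomes ("->", "int", "t4"), while the pair of constraints t₁ = int and t₁ = bool yields None.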

Last updated 24 March 2008.