Shrinking a language, part 1

6.10

Shrinking a language, part 1

We defined a small language yesterday that could implement simple functions. We could handle numbers, booleans, our own definitions of functions, and builtin functions. Today, we’re going to shrink our language: we are going to try to encode numbers and booleans (and the functions that work with them) using nothing but lambdas. Our goal is not to create a convenient language to use, but rather to see just how minimal of a language we can get that still suffices to compute whatever we want to compute. In fact, while we’re at it, let’s limit ourselves to lambdas of just a single argument, just to see how minimal we can get.

Our goal for today will be to translate the following function from ISL into nothing but lambdas:

; Nat -> Nat
(define (fact n)
  (if (zero? n)
      1
      (* n (fact (sub1 n)))))

Sample Problem Identify all the concepts we need to translate, in order to simplify this into only lambdas.

Starting from the simplest, we need:

Booleans: true, false and if
Numbers: 1, zero?, * and sub1
Definitions and recursion

We’ll solve all but the last one today; we’ll come back to it next time.

1 Booleans

Our strategy for defining each of these topics is to identify the constants we need to work with, and some defining equations that describe the requirements we have for them to work properly. For Booleans, it’s easy: there are three constants, TRUE, FALSE and IF, and we have the following requirements:

Actually, these equations are ever so slightly incomplete. We’ll see why, next time.

IF TRUE a b ===> a
IF FALSE a b ===> b

In words, these two equations simply describe our intuition about how if is supposed to work: if the condition is true it evaluates to the first branch, and if it’s false it evaluates to the second.

How can we define TRUE and FALSE using nothing but lambdas? Limiting ourselves to lambdas means giving up some obvious abilities: among them, we can’t check whether a value is equal to a particular constant or not, because we can’t check lambdas for equality. All we can do with lambdas is call them. So we have to shift our perspective from asking what the constants are to asking what the constants do.

The key insight is to think of these constants as choosing between two options. TRUE is simply the function that, when given two options, always picks the first one, and FALSE is the function that, when given two options, always picks the second. This is enough to define them:

; Any Any -> Any
(define TRUE.v1 (λ(a b) a))
(define FALSE.v1 (λ(a b) b))

For reasons that will become clearer later, it will be convenient to define these functions slightly differently. Instead of taking two arguments at once, let’s redefine them to take one argument and return (a function that takes one more argument and finally returns a value). In other words,

; A BOOL is a function of signature Any -> [Any -> Any],
; and is one of:
(define TRUE (λ(a) (λ(b) a)))
(define FALSE (λ(a) (λ(b) b)))

Now we need to define IF, to work with these two values, so that it behaves as we required. The purpose of IF is to choose one of its branches, based on which Boolean it receives. Fortunately, our encoding of Booleans is really good at choosing between two options!

Sample Problem Define IF, and test it.

; BOOL -> [Any -> [Any -> Any]]
(define IF (λ(cond) (λ(then) (λ(else) ((cond then) else)))))

Again, we’re defining IF using lambdas with just a single argument, nesting them inside each other.

A slight caution: the derivations below are not written to follow exactly the evaluation order that DrRacket would use. I write it this way to keep things clearer, and we can actually prove that in this setting, either evaluation order is allowed. You are encouraged to work through the derivations in the order DrRacket would evaluate. For example, the first step below should substitute both IF and TRUE...

Let’s see whether these definitions work with our requirements.

(((IF TRUE) x) y)
  ; {substitute IF}
  ==> ((((λ(cond) (λ(then) (λ(else) ((cond then) else)))) TRUE) x) y)
  ; {apply the lambdas}
  ==> ((TRUE x) y)
  ; {substitute TRUE}
  ==> (((λ(a) (λ(b) a)) x) y)
  ; {apply the lambdas}
  ==> x
(((IF FALSE) x) y)
  ; {substitute IF}
  ==> ((((λ(cond) (λ(then) (λ(else) ((cond then) else)))) FALSE) x) y)
  ; {apply the lambdas}
  ==> ((FALSE x) y)
  ; {substitute TRUE}
  ==> (((λ(a) (λ(b) b)) x) y)
  ; {apply the lambdas}
  ==> y

This seems to meet our definitions, so we have a working encoding of Booleans and if, using nothing but lambdas.

2 Pairs

In the previous lecture we ignored structured data entirely. Let’s remedy that a bit here. We won’t handle arbitrary structure definitions; we’ll just work with a single definition of a pair of values. (We can use pairs to encode everything else: lists, trees, graphs, etc. So conceptually, we only need to support just this one kind of structure.)

Sample Problem Suppose we wrote (define-struct pair [left right]) in ISL. What functions would we be defining?

We only need three constants to describe a pair: MK-PAIR to construct a pair, and LEFT and RIGHT to get the pieces of a pair. We don’t need PAIR?, since pairs are the only structure we have. (This is fortunate, since we have no way of asking a given lambda whether it is a pair or not.) Our constants will have the signatures

; MK-PAIR : Any -> Any -> Pair
; LEFT : Pair -> Any
; RIGHT : Pair -> Any

and the defining equations

(LEFT (MK-PAIR a b)) ===> a
(RIGHT (MK-PAIR a b)) ===> b

Exercise 14 Design encodings for these three constants.

The key insight here is that a pair is useless until we take it apart into its pieces, and doing so amounts to choosing which piece we want. That suggests the following encoding:

; A PAIR is a [BOOL -> Any]
; INTERPRETATION: when given TRUE, the pair will return its left value;
; and when given FALSE, it will return its right value
(define MK-PAIR (λ(a) (λ(b) (λ(choose) ((choose a) b)))))
(define LEFT (λ(pair) (pair TRUE)))
(define RIGHT (λ(pair) (pair FALSE)))

Let’s see whether these definitions work with our requirements.

(LEFT ((MK-PAIR x) y))
  ; {substitute LEFT}
  ==> ((λ(pair) (pair TRUE)) ((MK-PAIR x) y))
  ; {apply the lambda}
  ==> (((MK-PAIR x) y) TRUE)
  ; {substitute MK-PAIR}
  ==> ((((λ(a) (λ(b) (λ(choose) ((choose a) b)))) x) y) TRUE)
  ; {apply the lambdas}
  ==> ((TRUE x) y)
  ; {by the definition of TRUE}
  ==> x
(RIGHT ((MK-PAIR x) y))
  ; {substitute RIGHT}
  ==> ((λ(pair) (pair FALSE)) ((MK-PAIR x) y))
  ; {apply the lambda}
  ==> (((MK-PAIR x) y) FALSE)
  ; {substitute MK-PAIR}
  ==> ((((λ(a) (λ(b) (λ(choose) ((choose a) b)))) x) y) FALSE)
  ; {apply the lambdas}
  ==> ((FALSE x) y)
   ; {by the definition of FALSE}
  ==> y

This seems to meet our definitions, so we have a working encoding of Pairs, using nothing but lambdas.

3 Numbers

First, let’s agree that we’re not going to implement all possible numbers. Implementing complex numbers isn’t technically challenging: a complex number is simply a pair of real numbers...and we have pairs now. Implementing real numbers is very challenging, but can be implemented by something called Dedekind cuts, which is well beyond our scope but can be encoded by a predicate function on fractions...and we already have Booleans. A fraction is nothing more than a pair of integers...and we have pairs. And an integer is nothing more than the difference between two natural numbers...which is another pair. So all we need to implement are the natural numbers.

We’ve seen data definitions of naturals already: either zero, or one more than some natural. The essence of such numbers appears to be to count how many times to do something: in other words, these are the Repeaters you’ve worked with in lab.

Exercise 15 Finish Lab 12 Lambda on repeaters.

; A NAT is a function (X) [X -> X] -> [X -> X]
; Every NAT will look something like (λ(f) (λ(x) ... ))
(define ZERO  (λ(f) (λ(x) x)))
(define ONE   (λ(f) (λ(x) (f x))))
(define TWO   (λ(f) (λ(x) (f (f x)))))
(define THREE (λ(f) (λ(x) (f (f (f x))))))
...

Looking at our list of requirements for implementing fact, we need to be able to implement zero?. How can we tell if a number is zero or not? What uniquely distinguishes ZERO from all the other numbers? If we look at the definitions of ZERO, ONE etc., we might notice that ZERO is the only function that returns x directly: all the other ones return f applied to something. This gives us a hook we can use. The signature of NAT is (X) [X -> X] -> [X -> X], for any X we choose. What if we choose BOOL? Then we can read the signature as saying, "if you give a NAT a predicate and an initial BOOL value, it will give back a BOOL." Even better, it will give that initial value back only when the number is zero; otherwise it will call the function. So, we can set the initial value to TRUE, and the predicate to (λ (dont-care) FALSE):

(define ZERO?
(λ(n)
((n (λ(dont-care) FALSE)) TRUE)))

Exercise 16 Test this function to confirm that it works for several numbers.

3.1 Making numbers bigger

We need to design ADD1, which ought to take in a NAT and produce a new NAT that represents the next larger number. Therefore, its signature is

; ADD1 : NAT -> NAT

Sample Problem Determine the general scaffolding for ADD1. What can we deduce just from the signature and data definition?

We know the function must take in a NAT – call that argument n. We know it must return a NAT, something that looks like (λ (f) (λ (x) ...)). So we have

(define ADD1
(λ(n)
(λ(f) (λ(x) ...))))

To fill in the dots, observe that we want to do something n+1 times to a value, and the something we want to do is given by f, and the value we want to use is given by x. We know, from the meaning of our encoding, that ((n f) x) will repeatedly apply f to x a total of n times, and all we need to do is apply it once more:

(define ADD1
(λ(n)
(λ(f) (λ(x) (f ((n f) x))))))

Once we have ADD1, we can define arbitrary addition. That is, a function that takes two NATs and produces a new NAT that represents their sum. Here, we don’t need the (λ (f) (λ (x) ...)) scaffold, if we’re a bit clever. To add one number m to another number n, we just need to add 1 a bunch of times. How many times?

(define ADD
(λ(m) (λ(n)
((m ADD1) n))))

Sample Problem Given addition, define multiplication.

To multiply two numbers, we just need to add a number to itself a bunch of times. How many times, and counting up from what?

(define TIMES
(λ(m) (λ(n)
((m (ADD n)) ZERO))))

Exercise 17 Given multiplication, define exponentiation.

3.2 Making numbers smaller

The challenge of implementing subtraction is subtle. Our encoding of natural numbers as repeaters menas that all we can do with these values is call them or ignore them; we can’t "partially" call them to repeat "fewer times". Said another way, if we have a repeater that applies a function f a total of n times, we can’t in general "undo" one of those calls to f – instead, we have to figure out a way to never have called it in the first place.

If we can’t count "down from n", then maybe we can count "up from ZERO." One reasonable intuition is to say "let’s define a function that takes its argument n, then starts an accumulator x from ZERO and checks whether (ADD1 x) is equal to n – if it is, then x is our answer, otherwise, ADD1 to it and recur." The problem is we need to figure out how to define equality on numbers, and right now the only number we know how to check for is ZERO. So we’d have to subtract one number from the next and compare to zero...and we don’t have subtraction yet. So this approach doesn’t quite work.

But the gist of the idea is promising, though: maybe we can define a function that, when repeated exactly n times, will give us n-1? Let’s build a simple table of inputs and outputs to see what we expect:

n	(SUB1 n)
0	??
1	0
2	1
3	2
4	3
...	...
n	n - 1

This is one of the foundational axioms of Peano arithmetic. Axioms 1 and 6 correspond exactly to our NAT data definition...and our encoding actually very neatly matches this: the original notation of repeaters used s and z instead of f and x, because these two variables serve the exact same roles as "successor" and "zero".

Almost all of the table is obvious – after all, we know how subtraction is supposed to work! – but the top-right corner is tricky: zero is not the successor of any natural number. We can define it to be zero, since we aren’t working with any negative numbers: this is known as saturating arithmetic. Still, we’ve run into difficulty: while it’s easy to see how to get from each value in the left column to the one beneath it (by adding 1), it’s not so easy how to handle the right column, because of that glitch in the top-right corner.

The very clever insight here is to shift our perspective: instead of trying to consider the right column on its own, let’s consider each row as a value — specifically, a pair! Now we can ask the question, "is there a way to compute the next row from the current row?"

Sample Problem Solve this!

It’s still the case that we can get the left item of each pair by using ADD1 on the left item of the previous row. For the right item, instead of trying to compute it from the right item of the previous row, we can just copy the left item of the previous row. In other words, by thinking of this process as a function on pairs, we’ve given ourselves an accumulator parameter to hold on to the previous number, and this is exactly what we need to compute SUB1.

Our code for SUB1 will look like this

(define SUB1
  (λ(n)
    (RIGHT                      ; take the right value of the pair that we get by
      ((n (λ(row)               ;   repeat n times: given a row,
            (MK-PAIR            ;   make a pair where we
              (ADD1 (LEFT row)) ;     add 1 to the left item of the prior row, then copy
              (LEFT row))))     ;     the left item of the prior row to the right of this row
       (MK-PAIR ZERO ZERO)))))  ;   starting from a pair of zeros

This is truly elegant: we’ve used pairs to give ourselves an accumulator parameter, revised our perspective to think about repeaters as acting on rows of a table instead of just individual numbers, revised our perspective of subtraction to be a process of counting up, and in the end we obtain a function that works perfectly.

Exercise 18 Given SUB1, define arbitrary subtraction.

4 Recursion

At this point, we can translate our code to be

; NAT -> NAT
(define fact
  (λ(n)
    (IF (ZERO? n)
        ONE
        (* n (fact (SUB1 n))))))

We’ve translated everything in our function...except the recursive call itself. That will have to wait for another day.

5 Summary

The encodings we’ve discussed today are called the Church encodings, named after Alonzo Church, and the language that we’ve reduced to, which consists solely of variables, lambdas and applications, is called the Lambda calculus. There is a wealth of research and programming language theory that stems from this, and if you’re interested, there’s far more to explore here.

← prev up next →

	General Information
	Advice
	Texts
	Office Hours/ Contact
	Syllabus
	Labs
	Problem Sets
	Old Exams
	The Style
	Interesting Articles
	Peer Review
	Course Contract
	The Design Recipe
	Lecture notes from the end-of-semester

	Defining an interpreter for a small language
	Shrinking a language, part 1
	Defining a perceptron to classify messages