Lecture 20: Recursion & Induction
1 Purpose
Talk about recursion, templates, induction principles, etc.
2 Outline
In Discrete, you learned about mathematical proofs by induction. For example, a proof you may have done goes something like this:
Prove that the number of subsets of a set of n items is 2^n.
We proceed by induction over n, a natural number.
In the base case, when n = 0, we have no items, so there is one subset, the empty set \emptyset, and so we are done, since 2^0 is 1.
In the inductive case, we consider arbitrary n = k + 1.
Now, our inductive hypothesis is that there are 2^k subsets of a set of k items. To show that there are 2^{k+1} = 2 * 2^k subsets when you add one more item, it's sufficient to copy all the subsets and either include or exclude the new item in each copy.
That’s induction. How does this connect to recursion?
Well, the exact same proof can be written as a functional program, over numbers, that follows the "template" for natural numbers. Here we use lists to represent sets, for syntactic convenience, and to be able to identify the "items", we’ll use numbers: in particular, the kth item will be the number k.
Focus first on the program; come back to the signature.
(: subsets-proof
   (Function (arguments (n Natural))
             (result (AllOf (List (List Natural))
                            (lambda (l) (= (length l) (expt 2 n)))))))
(define (subsets-proof n)
  (cond [(zero? n) (list (list))]
        [else (let* ([k (sub1 n)]
                     [ksubsets (subsets-proof k)])
                (append ksubsets
                        (map (lambda (s) (cons n s)) ksubsets)))]))

(check-contract subsets-proof)

1 success(es) 0 failure(s) 0 error(s) 1 test(s) run
In particular, you can see the exact same base case and inductive cases: they are the two branches of the cond. In the inductive case, we take our inductive hypothesis (a recursive call), and then produce the result by including all those elements, plus another copy where we added the new element to each.
I’ve written the theorem as the signature: it’s dependent, since what you are trying to prove is an equality that involves the input to the function.
We could also write the theorem as a separate piece of code from the proof, which is the recursive program.
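For example, one sketch of that (using the same define-contract form that appears later in this lecture for TwoFiveProp; the name SubsetsCount is just illustrative) would be to name the property separately and reference it from the signature:

(define-contract (SubsetsCount n)
  (Immediate (check (lambda (l) (= (length l) (expt 2 n))))))

(: subsets-proof
   (Function (arguments (n Natural))
             (result (AllOf (List (List Natural)) (SubsetsCount n)))))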
We could also write the theorem as a comment. How, or if, proofs get checked is a separate idea, and can be done many ways (including reading them by hand!). But induction, which is a mechanism of proof, is fundamentally connected to recursion, which is a mechanism of writing programs. This is an old idea: it goes back at least to 1969, possibly earlier: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence
That is, it is not just a coincidence that this works out. Let's take another example from CS1800:
You want to show that, using just $2 and $5 bills, you can make any dollar amount of $4 or more.
How do you prove that?
The logical statement you wish to prove is:
\forall n. n \ge 4 \implies \exists t f. n = 2t + 5f
Doing this directly is hard, though maybe doable. Doing it via induction goes as follows:
First, you want to formulate what you are doing induction over. Here, it is the natural number n.
Next, what is the property? In this case, we can say P(n) = \exists t f. n = 2t + 5f.
Thus we can reformulate what we want to show as:
\forall n. n \ge 4 \implies P(n)
This then allows us to set up the induction. Our base case will be 4: the smallest value of n we care about. And then our inductive case is to show that, given P(k), we can prove P(k+1).
We can prove P(4) = \exists t f. 4 = 2t + 5f by setting t = 2 and f = 0.
For the inductive case, we have as an assumption that for some k, \exists t_k f_k. k = 2t_k + 5f_k, and we need to show that there exist some t and f such that k+1 = 2t + 5f. Adding 1 to both sides of the assumption gives k + 1 = 2t_k + 5f_k + 1, so we need 2t + 5f = 2t_k + 5f_k + 1. How do we solve for t and f?
If f_k is not 0, then you can rewrite this as 2t + 5f = 2t_k + 5(f_k - 1) + 5 + 1 = 2(t_k + 3) + 5(f_k - 1), and thus t = t_k + 3 and f = f_k - 1.
What if f_k is 0? Well, we know t_k is 2 or greater; it cannot be smaller, as our assumption is that the smallest number we represent is 4. Thus, 2t + 5f = 2t_k + 1 = 2(t_k - 2) + 4 + 1 = 2(t_k - 2) + 5(1), and thus t = t_k - 2 and f = 1.
As before, we can write this proof as a functional program, over numbers. Again, I put the theorem into the contract so that we can test that it works, but the proof is the program (and just like on-paper proofs, it would need to be read to understand if the inductive argument is sound).
(define-contract (TwoFiveProp n)
  (Immediate (check (lambda (t) (= n (+ (* 2 (first t)) (* 5 (second t))))))))

(: two-five-universal-proof
   (Function (arguments (n (AllOf Natural (lambda (x) (>= x 4)))))
             (result (AllOf (Tuple Natural Natural) (TwoFiveProp n)))))
(define (two-five-universal-proof n)
  (cond [(= n 4) (list 2 0)]
        [else (let* ([rec (two-five-universal-proof (sub1 n))]
                     [t_k (first rec)]
                     [f_k (second rec)])
                (if (zero? f_k)
                    (list (- t_k 2) 1)
                    (list (+ t_k 3) (- f_k 1))))]))

(check-contract two-five-universal-proof)

1 success(es) 0 failure(s) 0 error(s) 1 test(s) run
Different induction principles, different recursion schemes (templates)
Both of the above proofs used induction over natural numbers. This is how many simple proofs are done, and how induction is usually first taught, as it's the simplest case. Natural numbers, viewed as self-referential data (either 0 or the successor of another natural number), are also the simplest possible data. But the deep connection between induction and recursion holds for more sophisticated data, and indeed, more interesting proofs, and more interesting programs, require more sophisticated structure.
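Written as a template in the same ellipsis style as the list templates later in this outline, the natural number "template" has a base case for 0 and a recursive call on the predecessor (the names here are just illustrative); both programs above are instances of this shape, with the base case at 0 or at 4:

(define (nat-temp n)
  (cond [(zero? n) ...]
        [else (... n ... (nat-temp (sub1 n)) ...)]))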
Since programs are proofs, our signatures, where they exist, are the theorems that we are proving. Why is that the case? Well, a proof shows that, given certain hypotheses, a conclusion can be reached. A program, similarly, shows that given certain inputs, an output can be computed. The theorem says what the hypotheses and result are, and the signatures similarly describe what the inputs and output are.
This means that most programs, even very important ones, may be proving rather dull theorems. For example, a function that counts the elements in a list:
(: len (-> (List Any) Natural))
(define (len l)
  (cond [(empty? l) 0]
        [(cons? l) (add1 (len (rest l)))]))
is proving the logical statement:
\forall L. \exists n
i.e., given any list, we can produce a number (possibly a different one, possibly the same one). What is the proof that the above recursive program corresponds to? Well, the standard list template corresponds to the simplest induction scheme on lists: given a property P(L) you want to prove, you need to prove P(empty) and given P(L_k), you need to show that for any v, P((cons~v~L_k)).
So, in our base case, we witness the number 0.
In our inductive case, we know P(L_k) holds, i.e., there exists some number k. So in that case, the number we decide to witness for P((cons~v~L_k)) is k+1.
Just as you learned to do strong induction on natural numbers, where in the inductive case, you assume not only P(k) when proving P(k+1) but you assume P(i) for any i <= k, there are different induction schemes for other data structures as well.
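For instance, here is a sketch (reusing the TwoFiveProp contract and signature style from above; the function name is just illustrative) of the $2/$5 proof redone against a different scheme on the naturals: two base cases, and an inductive case that uses the result for n - 2 rather than n - 1.

(: two-five-strong
   (Function (arguments (n (AllOf Natural (lambda (x) (>= x 4)))))
             (result (AllOf (Tuple Natural Natural) (TwoFiveProp n)))))
(define (two-five-strong n)
  (cond [(= n 4) (list 2 0)]   ; base case: 4 = 2*2 + 5*0
        [(= n 5) (list 0 1)]   ; base case: 5 = 2*0 + 5*1
        [else                  ; n >= 6, so n-2 >= 4: recur on n-2 and add one $2 bill
         (let ([rec (two-five-strong (- n 2))])
           (list (add1 (first rec)) (second rec)))]))

The case analysis from the earlier proof disappears entirely: adding one $2 bill to the representation of n - 2 always works.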
The "right" induction principle makes it easier to write a program
Different induction principles correspond to different recursion schemes for programs, and just like different inductive schemes may allow you to prove (or not prove) certain theorems, different recursion schemes can make programs much easier to write.
For example, a simple change is, for lists, to have two base cases: one for empty lists and one for lists with one element, and then a normal inductive case for any list with more than one element. That corresponds to a program like the following:
(define-contract (Maybe T) (OneOf False T))

(: list-max (-> (List Integer) (Maybe Integer)))
(define (list-max l)
  (cond [(empty? l) #f]
        [(and (cons? l) (empty? (rest l))) (first l)]
        [else (max (first l) (list-max (rest l)))]))

(check-contract list-max)

1 success(es) 0 failure(s) 0 error(s) 1 test(s) run
Note something interesting about how this code was structured, vs using the standard approach with only a single base case (the "standard" list template):
(define-contract (Maybe T) (OneOf False T))

(: list-max (-> (List Integer) (Maybe Integer)))
(define (list-max l)
  (cond [(empty? l) #f]
        [(cons? l) (let ([r (list-max (rest l))])
                     (if (equal? r #f)
                         (first l)
                         (max (first l) r)))]))

(check-contract list-max)

1 success(es) 0 failure(s) 0 error(s) 1 test(s) run
The total "lines of code" (a meaningless measure) is similar, but in the former, almost all the "logic" is in the template: the structure of the recursion itself. Indeed, if we delete all the parts that are specific to the problem, we are left with the two structures:
(define (two-base-temp l)
  (cond [(empty? l) ...]
        [(and (cons? l) (empty? (rest l))) (... (first l) ...)]
        [else (... (first l) ... (two-base-temp (rest l)))]))
(define (list-temp l)
  (cond [(empty? l) ...]
        [(cons? l) (... (first l) ... (list-temp (rest l)) ...)]))
When we look at these, and then compare to the solutions, we notice that the former template has almost all the code in it! All we did was delete some ellipses and add a max call. The latter, on the other hand, had to do a bit of (somewhat tricky) work: since the empty case returns a special value, we have to first call recursively, and then inspect that to see if we got that special value, because we can only call max if we didn’t.
For such small programs, it’s a significant difference. And the reason is that for this problem, there are two base cases: an empty list and a list with one element, and then one recursive case.
Since most programs we write do not correspond to complex theorems, the main benefit of thinking about induction principles is the separation of the particular details of the problem from the structure of the approach. With theorems, it's often impossible (or very hard) to complete the proof if you use the wrong induction principle; with programs, you can often come up with a solution regardless, but figuring out the right structure can result in vastly simpler programs.
Connection to for-loops, languages without good support for recursion
The idea of thinking first about the structure of computation, and then the details, is not limited to when we can structure the computation closely to the mathematical induction principle of the data. Indeed, the proliferation of iterators (and different patterns of iteration) is a good example of this: rather than writing a manual for loop, indexing into a data structure sequentially, construct a pattern of traversal (perhaps linear, perhaps not) that produces elements one after another.
For linear data structures, this ends up extremely similar to the induction principle. For trees, the linear nature of for loops starts to limit things, because the structure of the code can no longer follow the structure of the data, which is not linear: it involves branches and merges. In some cases, you can essentially proceed by flattening first, in which case the iteration approach will work, but otherwise there is going to be a lot of extra code complexity if you don't follow the recursive structure (so, even in "imperative" or "OO" languages, dealing with trees should sometimes be done recursively).
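As a rough illustration, here is a sketch of a template for binary trees (leaf?, node?, node-value, node-left, and node-right are hypothetical names for whatever tree representation you have); note the two recursive calls, which do not map directly onto a single linear loop:

(define (tree-temp t)
  (cond [(leaf? t) ...]
        [(node? t) (... (node-value t)
                        (tree-temp (node-left t))
                        (tree-temp (node-right t)) ...)]))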
When structural recursion fails
Some problems cannot be solved via structural recursion. Viewed inductively, in order to solve P(k), we cannot just use some P(k-i); rather, we might need to solve P(l) where l is not "smaller" than k. A better way of framing that is that we probably end up needing to solve P(f(k)) – i.e., we perform some operation on k that yields a new instance of the problem that we need to solve. A classic example that you had in HW recently was DFS: you end up calling recursively on the same graph (maze) and a larger set of visited nodes.
In some cases, there is an input that is smaller, but it’s not the example provided by the induction principle. In other cases (like the graph search), no input is getting smaller (the important input, the visited nodes, is actually getting bigger).
In these cases, in addition to having to prove that this works, we have a new obligation that was completely absent above: that the code terminates. Code that follows by induction terminates by construction: to compute P(k), we compute P(0) and then P(1) using P(0) and so on until we get to P(k); since k is finite, this is guaranteed to terminate as long as each individual proof step terminates. With generative recursion, we need a separate termination argument. In the case of DFS, it is guaranteed to terminate since the visited set gains an element on every recursive call, and can never exceed the size of the graph.
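As a rough sketch of that shape (assuming a hypothetical (neighbors graph node) operation that lists the nodes adjacent to node), note that the recursive call passes the same graph, but a strictly larger visited list:

;; helper: has this node already been visited?
(define (seen? node visited)
  (ormap (lambda (v) (equal? v node)) visited))

;; can we reach `to` from `from` in `graph`, avoiding `visited` nodes?
(define (reachable? graph from to visited)
  (cond [(equal? from to) #t]
        [(seen? from visited) #f]
        [else (ormap (lambda (next)
                       ;; same graph, larger visited list
                       (reachable? graph next to (cons from visited)))
                     (neighbors graph from))]))

Each recursive call adds a node that was not previously in visited, and visited can only contain nodes of the graph, so the recursion must bottom out; that is the separate termination argument.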
2.1 Using these lessons to more easily write code
While we're reflecting on lessons from Fundies 1: we've noticed many of you don't write tests, or at least not until we force you to, which means after you've already written code that you think is correct but that fails our test cases (and so isn't actually correct). Something you may not have realized: if you are solving easy problems, writing tests before code is, as you probably think, somewhat tedious. But if you are solving hard problems, writing the tests before writing the code makes things easier: it's easier to think through the problem while writing tests, you're more likely to consider the edge cases, and the tests will then automatically debug your code as you write it. As you become more experienced, more and more of the problems you work on will be complex, so getting good at this skill will, like learning to separate structure from details, make you a better programmer.
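For instance, before writing list-max above, you might have started with tests like these (a sketch, assuming a check-expect-style test form as in the student languages); the empty-list and one-element tests are exactly what force you to confront the two base cases:

(check-expect (list-max (list)) #f)          ; empty list: no maximum
(check-expect (list-max (list 7)) 7)         ; one element: it is the max
(check-expect (list-max (list 3 9 2)) 9)
(check-expect (list-max (list -5 -1 -3)) -1) ; all negative: still works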
The lessons above should let you solve problems much more easily. These are theoretical ideas that have very practical consequences, and they allow those who are proficient in them to be much more effective programmers.
The important steps are:
1. Look at the data provided as input to the problem. What is its structure? Is it a list? A tree? A graph? Some combination of these things?
2. What is the structure of computation that the problem you are solving needs? This needs to involve (1), since you are figuring out how you need to use what parts of the data structure. Getting good at recognizing this takes time, but the critical idea is that this should come before trying to actually solve the problem. Diving in before you know the structure is a good way to make a mess.
3. Once you have (2), you should be able to write a scaffold (or template) of a solution. If it's a straightforward recursive problem on a recursive data structure, this code should look like the templates from Fundies 1. But all problems can (and should) be approached this way. You might end up with a template that involves an accumulator (or several); a sketch of one appears below. You might have a template for a generative recursion solution, in which case part of the structure of the computation is thinking about why it should terminate.
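For instance, a list template with an accumulator might be sketched like this (the names are illustrative; the accumulator holds whatever information about the elements already seen the problem needs):

(define (list-temp/acc l acc)
  (cond [(empty? l) (... acc ...)]
        [(cons? l) (list-temp/acc (rest l)
                                  (... (first l) ... acc ...))]))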