Skip to content

Lecture 1.2: Class Overview

In the age of AI coding assistants, code is cheap: I can ask an LLM to implement an algorithm given only natural language prompts. Does this make programming obsolete?

There is a good reason to not think so. Programming is much more than just writing code. In many ways, programming is the art of manipulating code: combining pieces of code together, modifying code to match a new use case, and testing / debugging / verifying that the code is running correctly. Manipulating code in these ways --- particularly ensuring that code is running correctly --- requires a deep understanding of not only what the code is doing, but also our intent for the code: exactly what should it be doing? In other words, what is the program's specification?

How important these specifications are depends on their use case. It might not be too important to deeply specify a phone game. But what about the software running on airplanes? Or the software controlling this building's HVAC in the winter? Or the software controlling a pacemaker? Would you trust this code if it was written by an LLM? What about a sleep-deprived software engineer?

This course is all about specification: how to write them, how to use them, and how to reason about them.

The Need for Formal Specs

Imagine you are working on the backend for a new payment processing system for a bank, where each payment may have multiple inputs for a single output (imagine splitting a check). Your initial prototype will, given the list of senders and receivers and a total amount to be sent, split the total amount among all senders and generate a request for them to pay. In something like Python, this would look like:

# Generates request to send `amount`, divided evenly, from each sender to receiver
def mk_request(senders, receiver, amount):
    amount_each = amount / len(senders)
    # Truncate to two decimal places
    amount_each = math.trunc(100 * amount_each) / 100
    requests = []
    for s in senders:
        requests.append((s, receiver, amount_each))
    return requests

So far, so good. Your new feature takes off and now your banking app has a text field where people can generate requests. In principle, you have satisfied the specification; indeed, you can see it working with a few test cases:

mk_request(['a', 'b'], 'c', 100) # Outputs [('a', 'c', 50.0), ('b', 'c', 50.0)]
mk_request(['a', 'b', 'c'], 'd', 60) # Outputs [('a', 'd', 20.0), ('b', 'd', 20.0), ('c', 'd', 20.0)] 

But what is wrong with this function?

  • What happens if the app lets you call mk_request where senders = []? This would let you make the app crash. This could introduce a serious security vulnerability!
  • What happens if you call mk_request(['a', 'b', 'c'], 'd', 100)? The receiver only gets back 99.99. While one cent isn't a big deal for a single transaction, imagine a billion of them.
  • Suppose you don't properly validate the input amount. What could go wrong? (What if amount is negative?)

Note that none of these issues were captured by the simple specification we wrote in the comment. While one could expand the comments to capture all of the above corner cases, this approach of manual inspection does not scale. Imagine a 5000-line piece of code written by three people (with varying level of expertise), over five years, and alongside LLMs!

Instead, this course will teach you how to reason rigourously and mathematically about the code in order to eliminate correctness issues or security vulnerabilities in a principled way.

Writing Formal Specs

The first step is to think of mk_request not as a Python function, but as a mathematical function from List String * String * Int to List (String * String * Int). But due to the possibility of Python throwing a division-by-zero error, let's model it as a partial function. To that end, we will write mkrequest(S, r, a) = R to mean, if we provide the inputs S, r, a to mk_request, it terminates successfully and returns requests R. (For now, I am using capital letters to denote lists).

Now, let's formalize the properties we want mk_request to have. First and foremost, it should not crash: $$ \forall S, \forall r, \forall a, \exists R, \verb|mk_request|(S,r,a) = R. $$

Next, the amount of money flowing through the system should be conserved: $$ \forall S, \forall r, \forall a, \forall R, \verb|mk_request|(S, r, a) = R \implies a = \Sigma_{(s, r, i) \in R}\ i.$$

Any other properties? Does the above imply that each sender is paying same amount? We need to specify this too:

\[ \forall S, \forall r, \forall a, \forall R, \verb|mk_request|(S, r, a) = R \implies \forall (s, r, x) \in R, \forall (s', r, x') \in R, x = x'. \]

Next, we could say that if \(a\) is negative, then the function is undefined? $$ \forall S, \forall r, \forall a, a < 0 \implies \not \exists R,\ \verb|mk_request|(S, r, a) = R. $$

Why have formal specs?

There are three reasons:

  • They are precise. Formal specifications, like the ones above, leave no room for ambiguity, and thus no room for misinterpretation. While there are more and less precise ways to write English ("send amount dollars from senders to receiver" versus "the sum of all money sent from senders should total to exactly equal"), formal specs force you to write without ambiguity.
  • They are decoupled from implementations. Formal specs are properties about implementations, and thus are able to stand alone, independent of any implementation. Because of this, we can write formal specs first, before implementing anything. Programming in this style is extremely effective at preventing bugs.
  • They are executable. Given a formal spec, if written in the right way (as we will do in this class), one can turn it into a test that evaluates whether a given implementation meets that specification. This gives us a way to not only write specifications, but also evaluate the correctness of our software!

Testing Formal Specs

Let's see how a formal spec can turned into an executable test. I will show it in Python pseudocode, but we will soon move to a different language.

First, we will create a testing harness, which will allow us to run an arbitrary test, rel, against the inputs and outputs of mk_request:

def mk_request_tester(rel):
    for _ in range(1000):
        S = random_senders()
        r = random_receiver()
        a = random_amount()
        R = mk_request(S, r, a)
        if rel(S, r, a, R) == False:
            return False
    return True

Here, rel is a function that takes the inputs to mk_request (the senders, receiver, and amount) and output (the requests made) and returns a boolean (either True or False). In this way, rel is encoding a relation between the inputs and outputs of mk_request. We can encode our specs above (except the one about not crashing --- we will talk about this later) as a particular instance of rel. Finally, given rel, mk_request_tester will run mk_request many times on random inputs until rel returns False. If rel returns False on any input, we will return False (because the test has failed); thus, we return True if rel succeeded on all tests.

Let's now use mk_request_tester to implement the conservation property:

def money_conserved_relation(S, r, a, R):
    return (a == sum(x for (_, _, x) in R))

def money_conserved():
    return mk_request_tester(money_conserved_relation)

This course will teach you how to formalize, implement, and test properties like these for a variety of different kinds of programs, just like our example above.

Module 1: Programming and Property-Based Testing in OCaml

While we could do this course in any language, we will use OCaml, a functional, statically-typed programming language. OCaml is very popular both in academia and in industry (used by companies such as Jane Street) due to its practical design that lets you write simple code that runs efficiently and is easy to reason about (great for specifications!).

In this module, we will gain experience programming and specifying programs in OCaml. By the end of this part of the class, you will be able to formalize mathematical properties of code and deploy a specific kind of testing, property-based testing, to check whether your programs meet their specifications.

Module 2: Engineering with Abstractions

Next, we will dive deeper in OCaml and its capabilities for helping you write better, more correct, programs. In particular, OCaml helps you program with modularity --- breaking programming tasks into smaller ones that can be built independently --- and abstraction --- ensuring that these smaller tasks are actually independent.

For example, take a web browser. How do you test a web browser? There is no clear specification. However, there is a clear specification of subcomponents of the browser (e.g., networking libraries, JSON parsing, and file management). By programming with modularity and abstraction, you will learn how to apply property-based testing to larger, complex software systems by decomposing them into smaller ones with manageable specifications.

Module 3: Analyzing Programming Languages

Finally, at the end of the course, we will learn about analyzing programming languages themselves: how to specify a language's syntax and semantics, and how to build static analyses of languages. The main lesson of this section is that programs are data structures, just like binary search trees and association lists. And just like binary search trees, we can implement recursive algorithms over programs and use property-based testing to specify and analyze the correctness of these implementations.

By the end of this module, you will develop and analyze a simple domain specific programming language (a DSL), and use it to enforce correctness guarantees of programs.

Logistics

  • We will have one homework a week, due Sunday 11:59pm; the first one will be released this Friday.