Lecture 1.2: Class Overview
In the age of AI coding assistants, code is cheap: I can ask an LLM to implement an algorithm given only natural language prompts. Does this make programming obsolete?
There is a good reason to not think so. Programming is much more than just writing code. In many ways, programming is the art of manipulating code: combining pieces of code together, modifying code to match a new use case, and testing / debugging / verifying that the code is running correctly. Manipulating code in these ways --- particularly ensuring that code is running correctly --- requires a deep understanding of not only what the code is doing, but also our intent for the code: exactly what should it be doing? In other words, what is the program's specification?
How important these specifications are depends on their use case. It might not be too important to deeply specify a phone game. But what about the software running on airplanes? Or the software controlling this building's HVAC in the winter? Or the software controlling a pacemaker? Would you trust this code if it was written by an LLM? What about a sleep-deprived software engineer?
This course is all about specification: how to write them, how to use them, and how to reason about them.
The Need for Formal Specs
Imagine you are working on the backend for a new payment processing system for a bank, where each payment may have multiple inputs for a single output (imagine splitting a check). Your initial prototype will, given the list of senders and receivers and a total amount to be sent, split the total amount among all senders and generate a request for them to pay. In something like Python, this would look like:
# Generates request to send `amount`, divided evenly, from each sender to receiver
def mk_request(senders, receiver, amount):
amount_each = amount / len(senders)
# Truncate to two decimal places
amount_each = math.trunc(100 * amount_each) / 100
requests = []
for s in senders:
requests.append((s, receiver, amount_each))
return requests
So far, so good. Your new feature takes off and now your banking app has a text field where people can generate requests. In principle, you have satisfied the specification; indeed, you can see it working with a few test cases:
mk_request(['a', 'b'], 'c', 100) # Outputs [('a', 'c', 50.0), ('b', 'c', 50.0)]
mk_request(['a', 'b', 'c'], 'd', 60) # Outputs [('a', 'd', 20.0), ('b', 'd', 20.0), ('c', 'd', 20.0)]
But what is wrong with this function?
- What happens if the app lets you call
mk_request
wheresenders = []
? This would let you make the app crash. This could introduce a serious security vulnerability! - What happens if you call
mk_request(['a', 'b', 'c'], 'd', 100)
? The receiver only gets back99.99
. While one cent isn't a big deal for a single transaction, imagine a billion of them. - Suppose you don't properly validate the input
amount
. What could go wrong? (What ifamount
is negative?)
Note that none of these issues were captured by the simple specification we wrote in the comment. While one could expand the comments to capture all of the above corner cases, this approach of manual inspection does not scale. Imagine a 5000-line piece of code written by three people (with varying level of expertise), over five years, and alongside LLMs!
Instead, this course will teach you how to reason rigourously and mathematically about the code in order to eliminate correctness issues or security vulnerabilities in a principled way.
Writing Formal Specs
The first step is to think of mk_request
not as a Python function, but as a
mathematical function from List String * String * Int
to List (String * String * Int)
. But due to the
possibility of Python throwing a division-by-zero error, let's model it as a
partial function. To that end, we will write mkrequest(S, r, a) = R
to mean, if we
provide the inputs S, r, a
to mk_request
, it terminates successfully and
returns requests R
. (For now, I am using capital letters to denote lists).
Now, let's formalize the properties we want mk_request
to have. First and foremost, it should not crash:
$$ \forall S, \forall r, \forall a, \exists R, \verb|mk_request|(S,r,a) = R. $$
Next, the amount of money flowing through the system should be conserved: $$ \forall S, \forall r, \forall a, \forall R, \verb|mk_request|(S, r, a) = R \implies a = \Sigma_{(s, r, i) \in R}\ i.$$
Any other properties? Does the above imply that each sender is paying same amount? We need to specify this too:
Next, we could say that if \(a\) is negative, then the function is undefined? $$ \forall S, \forall r, \forall a, a < 0 \implies \not \exists R,\ \verb|mk_request|(S, r, a) = R. $$
Why have formal specs?
There are three reasons:
- They are precise. Formal specifications, like the ones above, leave no room for ambiguity, and thus no room for misinterpretation.
While there are more and less precise ways to write English ("send
amount
dollars fromsenders
toreceiver
" versus "the sum of all money sent fromsenders
should total to exactlyequal
"), formal specs force you to write without ambiguity. - They are decoupled from implementations. Formal specs are properties about implementations, and thus are able to stand alone, independent of any implementation. Because of this, we can write formal specs first, before implementing anything. Programming in this style is extremely effective at preventing bugs.
- They are executable. Given a formal spec, if written in the right way (as we will do in this class), one can turn it into a test that evaluates whether a given implementation meets that specification. This gives us a way to not only write specifications, but also evaluate the correctness of our software!
Testing Formal Specs
Let's see how a formal spec can turned into an executable test. I will show it in Python pseudocode, but we will soon move to a different language.
First, we will create a testing harness, which will allow us to run an arbitrary test, rel
, against the inputs and outputs of mk_request
:
def mk_request_tester(rel):
for _ in range(1000):
S = random_senders()
r = random_receiver()
a = random_amount()
R = mk_request(S, r, a)
if rel(S, r, a, R) == False:
return False
return True
Here, rel
is a function that takes the inputs to mk_request
(the senders, receiver, and amount) and output (the requests made) and returns a boolean (either True
or False
). In this way, rel
is encoding a relation between the inputs and outputs of mk_request
. We can encode our specs above (except the one about not crashing --- we will talk about this later) as a particular instance of rel
. Finally, given rel
, mk_request_tester
will run mk_request
many times on random inputs until rel
returns False
. If rel
returns False
on any input, we will return False
(because the test has failed); thus, we return True
if rel
succeeded on all tests.
Let's now use mk_request_tester
to implement the conservation property:
def money_conserved_relation(S, r, a, R):
return (a == sum(x for (_, _, x) in R))
def money_conserved():
return mk_request_tester(money_conserved_relation)
This course will teach you how to formalize, implement, and test properties like these for a variety of different kinds of programs, just like our example above.
Module 1: Programming and Property-Based Testing in OCaml
While we could do this course in any language, we will use OCaml, a functional, statically-typed programming language. OCaml is very popular both in academia and in industry (used by companies such as Jane Street) due to its practical design that lets you write simple code that runs efficiently and is easy to reason about (great for specifications!).
In this module, we will gain experience programming and specifying programs in OCaml. By the end of this part of the class, you will be able to formalize mathematical properties of code and deploy a specific kind of testing, property-based testing, to check whether your programs meet their specifications.
Module 2: Engineering with Abstractions
Next, we will dive deeper in OCaml and its capabilities for helping you write better, more correct, programs. In particular, OCaml helps you program with modularity --- breaking programming tasks into smaller ones that can be built independently --- and abstraction --- ensuring that these smaller tasks are actually independent.
For example, take a web browser. How do you test a web browser? There is no clear specification. However, there is a clear specification of subcomponents of the browser (e.g., networking libraries, JSON parsing, and file management). By programming with modularity and abstraction, you will learn how to apply property-based testing to larger, complex software systems by decomposing them into smaller ones with manageable specifications.
Module 3: Analyzing Programming Languages
Finally, at the end of the course, we will learn about analyzing programming languages themselves: how to specify a language's syntax and semantics, and how to build static analyses of languages. The main lesson of this section is that programs are data structures, just like binary search trees and association lists. And just like binary search trees, we can implement recursive algorithms over programs and use property-based testing to specify and analyze the correctness of these implementations.
By the end of this module, you will develop and analyze a simple domain specific programming language (a DSL), and use it to enforce correctness guarantees of programs.
Logistics
- We will have one homework a week, due Sunday 11:59pm; the first one will be released this Friday.