Lecture 1: Intro

8.15

Lecture 1: Intro🔗

1 Purpose🔗

Introduce the class, give high-level motivations.

2 Outline🔗

Program synthesis is the idea of creating programs from high-level specifications automatically. The field itself is incredibly old, yet only recently is becoming mainstream following the advancement of something you’ve become familiar with: AI. AI, particularly the use of natural language to generate code through LLMs, will undoubtedly play a significant role in program synthesis; it has already gained traction in industry to accelerate the development velocity of companies. While we can’t predict the languages or tools that you’ll use after you graduate, AI-based program synthesis is guaranteed to be involved.

Why does this matter, and what relevance does it have to the course? Program synthesis tools are already quite good at going from prompts to code and will continue to improve. However, these tools are fallible and can make mistakes:

They can produce code that seems plausibly correct, but isn’t – the code has subtle bugs.
They can produce code that has security vulnerabilities.

The question naturally arises: how can we ensure these tools do neither? One thing we could do is check the outputted code the tools produce, but what kind of checks do we want to perform on that code? One trivial check would be to see if the code runs, but logically incorrect code can still run! So, we need more rigorous checks to be performed on the outputted code: we need to check its behavior.

A specification is a precise description of what the code should do. check-expects from the Racket student languages are an example of specifications you’ve already been writing! Is the behavior that we observe from running the code something we expected – in other words, does it conform to our description? When we have rigorous specifications, program synthesis tools can be turned loose, as only synthesized programs that satisfy our rigorous specification will be accepted, and we can be reasonably sure that those generated programs are correct!

This course teaches two primary ideas:

What are specifications? How do we write them?
How can we connect specifications to programs?

While we believe the former – writing specifications – will be skills that become mainstream over the next decade or so, knowing the latter will position you to be able to help build the tools that take synthesis mainstream. Certainly, working on the models themselves requires other skills: modern AI techniques involving deep learning, neural networks, etc. but we’re not concerned with that: these models exist and will continue to get better. In this class, we’re interested in how we can connect the models to reality, and that’s where specifications come in.

We’ll use a variety of techniques in this class – and there are many more that we won’t cover – but, having learned the fundamentals, you’ll be well-positioned to use and build these tools. So, let’s get started.

What is logic?

What is computation?

Are these things connected?

Goal of this class: Learn how to write down precise descriptions of what code should do, and to discover how these properties can relate to the programs you are building.

Stop and think: what is a property?

Problem with descriptions: imprecision.

One of most valuable domains: medical devices. Consider designing an automatic insulin pump. It has electronics that periodically measure blood sugar, and separately can deliver insulin to correct when blood sugar goes too high. Having too little insulin in people with Type 1 diabetes is obviously a problem: hence the insulin pump, but delivering too much insulin is also bad. Clearly, the software that reads the measurements and decides how much insulin to deliver is quite important. How do we ensure it does the right thing?

First, we need to have a precise way of describing its behavior. You can try to do this in English, and likely, will start that way, but English has all sorts of ambiguities: take the word "or". In English, it usually means exclusive or, but sometimes might mean non-exclusive or. In cases where a little ambiguity is fine (or indeed, desired: literature would be quite boring if not for this!), this is fine, but when it comes to places where we want the exact behavior to be understood and checked (by doctors and scientists, in the case of the insulin pump), that’s not great.

So instead, we turn to more formal languages. In this, logic is a great asset, as it allows us to give precise meaning to what we want.

How would we begin to describe how an insulin pump should work? One key idea, especially when describing systems that operate over time, is to describe certain temporal properties that it should have. For example, there might be some dosage that should never be exceeded over a certain time interval, no matter what else the system was doing. Consider that the main control function for the pump took two arguments: a history of blood sugar readings, and a history of insulin doses (each, say, every second), and we need to determine what dosage to give at the next second (which, next time our control function is called, will be the entry at 0 SecondsAgo). We could represent each of those as function types, so the type of our control function might be:

; control : [SecondsAgo -> BloodSugar] [SecondsAgo -> InsulinGiven] -> InsulinGiven

If we wanted to never exceed a certain total dosage, say MAX-DOSE-OVER-T, over a certain interval T, we could express that logically with the statement:

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad \sum_{i=1}^{T-1} Rx(i) \lt \text{MAX-DOSE-OVER-T} \implies \\ \quad control(Hx, Rx) \lt \text{MAX-DOSE-OVER-T} - \sum_{i=1}^{T-1} Rx(i)

There might be simpler properties we also want to hold: never giving more than MAX-DOSE at a single time, regardless of the readings or history of dosages:

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad control(Hx,Rx) \lt \text{MAX-DOSE}

And then, of course, once we get past the safety requirements, we can start expressing properties about what we should give in terms of dosing: clearly, these requirements need to come from experts, but they can then be expressed in terms of properties that use the history of readings & history of dosings. Let’s say we wanted to express that if the blood sugar reading average was above 200 over the past 120 readings, and less than 10 units of insulin given (intentionally not giving units, to not give inaccurate medical information), deliver 1 unit.

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad ((\sum_{i=1}^{120} Hx(i)/120) \ge 200) \land ((\sum_{i=1}^{120} Rx(i)) \lt 10) \implies \\ \quad control(Hx, Rx) = 1

All of these have used a logical operator from first-order logic: forall. We use this to express that these are properties that should hold no matter what values come in.

This course is about writing specifications. We want to connect the logical operators that you learned about in Discrete with actual code, as seen above. But it is also about learning how to use those specifications.

Throughout it all, we’ll use a language designed specifically for this course called LSL (Logical Student Language), which extends the Intermediate Student Language from CS2500 with support for logical reasoning and tooling. We’ll learn how to formulate logical properties as code, and how to use property-based testing (PBT) to partially check those properties and look for counter-examples. We’ll show how type-based invariants (signatures) can be tracked and checked, and how much richer properties can be expressed in similar ways. You can install the language from within DrRacket (please install the latest version 8.15) by going to File -> Install Package..., and then searching for lsl.

We’ll learn that the representation of properties also matters. While the function (map) representation used in the example above might seem natural, functions are somewhat opaque, and we might have an easier time reasoning about a sequence of readings. We can make that change, and then actually run some of our properties.

For example, consider a prototype (and wrong) implementation of the above control function.

#lang lsl

(: control (-> [List Natural] [List Natural] Natural))
(define (control Hx Rx)
1)

We want to specify how this should work, translating the math above into code:

We need a few helpers:

(: get-n-readings (-> [List Natural] Natural [List Natural]))
(define (get-n-readings l n)
  (cond [(zero? n) '()]
        [(and (positive? n) (empty? l)) (build-list n (λ(_) 0))]
        [(and (positive? n) (cons? l)) (cons (first l)
                                             (get-n-readings (rest l)
                                                             (sub1 n)))]))
(: sum ( -> [List Integer] Integer))
(define (sum lon) (apply + lon))
(: average (-> [List Real] Real))
(define (average lon) (/ (sum lon) (length lon)))

And then we can translate, pretty closely, our specification from math:

(define MAX-DOSE-OVER-T 10)
(define T 60)
(define MAX-DOSE 2)
(: control-prop (-> [List Natural] [List Natural] True))
(define (control-prop Hx Rx)
  (let ([already-given (sum (get-n-readings Rx (- T 1)))])
    (if (< already-given MAX-DOSE-OVER-T)
        (< (control Hx Rx) (- MAX-DOSE-OVER-T already-given))
        #t)))

We’ve changed the representation of our histories to lists instead of functions, as with many of our techniques, the primary goal is to find counter-examples, and lists will print out, whereas functions are opaque. The question of data representation in order to support reasoning is an important one, and one that we’ll return to.

We’ve written the property as a predicate, but given it a signature that says it should always return true. We’ll come back to exactly what this means, in more detail. We can then ask LSL to check this contract for us, which will look for counter-examples to it:

(check-contract control-prop 2000)
1 success(es) 0 failure(s) 0 error(s) 1 test(s) run

In this case, it does!

If we fixed our control function to take history into account, we could fix this. In addition to complex properties like this one, we could write simpler ones, e.g.:

(: control-prop-2 (-> [List Natural] [List Natural] True))
(define (control-prop-2 Hx Rx)
(< (control Hx Rx) MAX-DOSE))

(check-contract control-prop-2)
1 success(es) 0 failure(s) 0 error(s) 1 test(s) run

3 Administrative🔗

We’ll cover most of the actual logistics next class, as we want to have the first class be a bit more substantive, but a few important details:

1. There is one Homework per week, and they are all autograded. You are welcome to work with a partner on them.

2. The first one will be released soon, but isn’t due until next Thursday.

contents ← prev up next →

	Syllabus
	Schedule
	Office Hours
	Lectures
	Homework
	ISL+ Recap

	Lecture 1: Intro
	Lecture 2: Logistics, Specifications in LSL
	Lecture 3: Propositional Logic in Code
	Lecture 4: Using LSL
	Lecture 5: Using LSL II
	Lecture 6: Design / Specification Recipe
	Lecture 7: Relational Specifications
	Lecture 8: Dependent Contracts
	Lecture 9: Itemizations
	Lecture 10: Recursive Contracts
	Lecture 11: More Recursive Data
	Lecture 12: Continuing with Recursive Data
	Lecture 13: Using LSL III
	Lecture 14: Specifications with Functions
	Lecture 15: Polymorphism
	Lecture 16: Polymorphism II
	Lecture 17: Data Abstraction in OO Languages
	Lecture 18: Programming Against Interfaces
	Exam 1
	Lecture 20: Recursion & Induction
	Lecture 21: Reasoning about Mutable State
	Lecture 22: Value Mutation
	Lecture 23: More Pointer Equality
	Lecture 24: Temporal Specifications
	Lecture 25: Concurrency
	Lecture 26: Concurrency II
	Lecture 27: Concurrency Practice
	Lecture 28: Big-O
	Slack Day
	Lecture 30: More Performance
	Lecture 31: State Machines
	Lecture 32: Resources
	Special Topics: Interactive Theorem Proving with Rocq
	Special Topics: Verification with Dafny