On this page:
1 Purpose
2 Outline
3 Administrative
8.15

Lecture 1: Intro🔗

1 Purpose🔗

Introduce the class, give high-level motivations.

2 Outline🔗

Program synthesis is the idea of creating programs from high-level specifications automatically. The field itself is incredibly old, yet only recently is becoming mainstream following the advancement of something you’ve become familiar with: AI. AI, particularly the use of natural language to generate code through LLMs, will undoubtedly play a significant role in program synthesis; it has already gained traction in industry to accelerate the development velocity of companies. While we can’t predict the languages or tools that you’ll use after you graduate, AI-based program synthesis is guaranteed to be involved.

Why does this matter, and what relevance does it have to the course? Program synthesis tools are already quite good at going from prompts to code and will continue to improve. However, these tools are fallible and can make mistakes:

The question naturally arises: how can we ensure these tools do neither? One thing we could do is check the outputted code the tools produce, but what kind of checks do we want to perform on that code? One trivial check would be to see if the code runs, but logically incorrect code can still run! So, we need more rigorous checks to be performed on the outputted code: we need to check its behavior.

A specification is a precise description of what the code should do. check-expects from the Racket student languages are an example of specifications you’ve already been writing! Is the behavior that we observe from running the code something we expected – in other words, does it conform to our description? When we have rigorous specifications, program synthesis tools can be turned loose, as only synthesized programs that satisfy our rigorous specification will be accepted, and we can be reasonably sure that those generated programs are correct!

This course teaches two primary ideas:

While we believe the former – writing specifications – will be skills that become mainstream over the next decade or so, knowing the latter will position you to be able to help build the tools that take synthesis mainstream. Certainly, working on the models themselves requires other skills: modern AI techniques involving deep learning, neural networks, etc. but we’re not concerned with that: these models exist and will continue to get better. In this class, we’re interested in how we can connect the models to reality, and that’s where specifications come in.

We’ll use a variety of techniques in this class – and there are many more that we won’t cover – but, having learned the fundamentals, you’ll be well-positioned to use and build these tools. So, let’s get started.

What is logic?

What is computation?

Are these things connected?

Goal of this class: Learn how to write down precise descriptions of what code should do, and to discover how these properties can relate to the programs you are building.

Stop and think: what is a property?

Problem with descriptions: imprecision.

One of most valuable domains: medical devices. Consider designing an automatic insulin pump. It has electronics that periodically measure blood sugar, and separately can deliver insulin to correct when blood sugar goes too high. Having too little insulin in people with Type 1 diabetes is obviously a problem: hence the insulin pump, but delivering too much insulin is also bad. Clearly, the software that reads the measurements and decides how much insulin to deliver is quite important. How do we ensure it does the right thing?

First, we need to have a precise way of describing its behavior. You can try to do this in English, and likely, will start that way, but English has all sorts of ambiguities: take the word "or". In English, it usually means exclusive or, but sometimes might mean non-exclusive or. In cases where a little ambiguity is fine (or indeed, desired: literature would be quite boring if not for this!), this is fine, but when it comes to places where we want the exact behavior to be understood and checked (by doctors and scientists, in the case of the insulin pump), that’s not great.

So instead, we turn to more formal languages. In this, logic is a great asset, as it allows us to give precise meaning to what we want.

How would we begin to describe how an insulin pump should work? One key idea, especially when describing systems that operate over time, is to describe certain temporal properties that it should have. For example, there might be some dosage that should never be exceeded over a certain time interval, no matter what else the system was doing. Consider that the main control function for the pump took two arguments: a history of blood sugar readings, and a history of insulin doses (each, say, every second), and we need to determine what dosage to give at the next second (which, next time our control function is called, will be the entry at 0 SecondsAgo). We could represent each of those as function types, so the type of our control function might be:

; control : [SecondsAgo -> BloodSugar] [SecondsAgo -> InsulinGiven] -> InsulinGiven

If we wanted to never exceed a certain total dosage, say MAX-DOSE-OVER-T, over a certain interval T, we could express that logically with the statement:

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad \sum_{i=1}^{T-1} Rx(i) \lt \text{MAX-DOSE-OVER-T} \implies \\ \quad control(Hx, Rx) \lt \text{MAX-DOSE-OVER-T} - \sum_{i=1}^{T-1} Rx(i)

There might be simpler properties we also want to hold: never giving more than MAX-DOSE at a single time, regardless of the readings or history of dosages:

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad control(Hx,Rx) \lt \text{MAX-DOSE}

And then, of course, once we get past the safety requirements, we can start expressing properties about what we should give in terms of dosing: clearly, these requirements need to come from experts, but they can then be expressed in terms of properties that use the history of readings & history of dosings. Let’s say we wanted to express that if the blood sugar reading average was above 200 over the past 120 readings, and less than 10 units of insulin given (intentionally not giving units, to not give inaccurate medical information), deliver 1 unit.

\forall (Hx : SecondsAgo \rightarrow BloodSugar) (Rx : SecondsAgo \rightarrow InsulinGiven).\\ \quad ((\sum_{i=1}^{120} Hx(i)/120) \ge 200) \land ((\sum_{i=1}^{120} Rx(i)) \lt 10) \implies \\ \quad control(Hx, Rx) = 1

All of these have used a logical operator from first-order logic: forall. We use this to express that these are properties that should hold no matter what values come in.

This course is about writing specifications. We want to connect the logical operators that you learned about in Discrete with actual code, as seen above. But it is also about learning how to use those specifications.

Throughout it all, we’ll use a language designed specifically for this course called LSL (Logical Student Language), which extends the Intermediate Student Language from CS2500 with support for logical reasoning and tooling. We’ll learn how to formulate logical properties as code, and how to use property-based testing (PBT) to partially check those properties and look for counter-examples. We’ll show how type-based invariants (signatures) can be tracked and checked, and how much richer properties can be expressed in similar ways. You can install the language from within DrRacket (please install the latest version 8.15) by going to File -> Install Package..., and then searching for lsl.

We’ll learn that the representation of properties also matters. While the function (map) representation used in the example above might seem natural, functions are somewhat opaque, and we might have an easier time reasoning about a sequence of readings. We can make that change, and then actually run some of our properties.

For example, consider a prototype (and wrong) implementation of the above control function.

(: control (-> [List Natural] [List Natural] Natural))
(define (control Hx Rx)
  1)

We want to specify how this should work, translating the math above into code:

We need a few helpers:
(: get-n-readings (-> [List Natural] Natural [List Natural]))
(define (get-n-readings l n)
  (cond [(zero? n) '()]
        [(and (positive? n) (empty? l)) (build-list n (λ(_) 0))]
        [(and (positive? n) (cons? l)) (cons (first l)
                                             (get-n-readings (rest l)
                                                             (sub1 n)))]))
(: sum ( -> [List Integer] Integer))
(define (sum lon) (apply + lon))
(: average (-> [List Real] Real))
(define (average lon) (/ (sum lon) (length lon)))

And then we can translate, pretty closely, our specification from math:

(define MAX-DOSE-OVER-T 10)
(define T 60)
(define MAX-DOSE 2)
(: control-prop (-> [List Natural] [List Natural] True))
(define (control-prop Hx Rx)
  (let ([already-given (sum (get-n-readings Rx (- T 1)))])
    (if (< already-given MAX-DOSE-OVER-T)
        (< (control Hx Rx) (- MAX-DOSE-OVER-T already-given))
        #t)))

We’ve changed the representation of our histories to lists instead of functions, as with many of our techniques, the primary goal is to find counter-examples, and lists will print out, whereas functions are opaque. The question of data representation in order to support reasoning is an important one, and one that we’ll return to.

We’ve written the property as a predicate, but given it a signature that says it should always return true. We’ll come back to exactly what this means, in more detail. We can then ask LSL to check this contract for us, which will look for counter-examples to it:

(check-contract control-prop 2000)

--------------------

interaction-area tests > Unnamed test

FAILURE

params:     '(#<procedure:.../syntax/compile.rkt:183:7> 2000)

name:       check-contract

location:   eval:14:0

discovered a counterexample

  counterexample: (control-prop '(1 0) '(3 4 1 1))

  error:

    control-prop: contract violation

      expected: True

      given: #f

      blaming: program (as server)

--------------------

0 success(es) 1 failure(s) 0 error(s) 1 test(s) run

In this case, it does!

If we fixed our control function to take history into account, we could fix this. In addition to complex properties like this one, we could write simpler ones, e.g.:

(: control-prop-2 (-> [List Natural] [List Natural] True))
(define (control-prop-2 Hx Rx)
  (< (control Hx Rx) MAX-DOSE))
(check-contract control-prop-2)

1 success(es) 0 failure(s) 0 error(s) 1 test(s) run

3 Administrative🔗

We’ll cover most of the actual logistics next class, as we want to have the first class be a bit more substantive, but a few important details:

1. There is one Homework per week, and they are all autograded. You are welcome to work with a partner on them.

2. The first one will be released soon, but isn’t due until next Thursday.