13 Synthesis


Purpose: Explore synthesis & how it connects to the design recipe

Pre-lab Setup

Please make sure you have Python 3 installed (we’ve tested on 3.9 and 3.11, and assume 3.10 and 3.12 will work). You also need to install the OpenAI package. If you’ve installed Python 3, you should also have pip (run it as either pip or pip3), and can then run the following command in Terminal (on macOS) or PowerShell (on Windows):

pip3 install openai

If you have done Python development before, you are welcome to set this up within a virtual environment, but given that we have no other requirements, installing openai globally should be enough.

You probably want a text editor that can at least syntax-highlight Python. At this point, perhaps the most common suggestion is VSCode; you can use that, or anything else that you’d like.

Background

In this lab, you will use a research prototype being built by your professor and several students. The goal behind the system is to use LLM-driven synthesis behind the scenes, but to never show generated code to the programmer. The primary reason for exploring this idea is a well-known phenomenon in the research literature called "automation bias", which means that people are more likely to trust the results of automated systems than the same results when they come from people. Given that LLMs can produce buggy code, code with security vulnerabilities, etc., this is a particular risk that is currently under-explored. Our approach is to structure our synthesis so that it never shows the code, and thus avoids the issue of programmers skimming past mistakes in generated code (since, especially with synthesized code, most of it will look reasonable!).

How do we do that? We have programmers write down specifications, currently in the form of signatures, purpose statements, and tests. We then use those both to guide the synthesis (as is typically done) and, in the case of the unit tests, to validate that the generated code conforms to the expected behavior. If it doesn’t, the system automatically re-prompts (with the failed test or the error raised) for a corrected version and tries to run the test suite again.
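The generate, validate, and re-prompt cycle described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the library's implementation: fake_model, first_failure, synthesize, and the canned CANDIDATES list are all hypothetical names, and a real system would query an LLM where this sketch returns canned source text.

```python
# Hypothetical stand-in for the model: canned candidate implementations,
# returned one per attempt. A real system would send a prompt (including any
# feedback from the previous failure) to an LLM instead.
CANDIDATES = [
    "def get_in_cents(dollars, cents):\n    return dollars + cents\n",  # buggy
    "def get_in_cents(dollars, cents):\n    return dollars * 100 + cents\n",
]


def fake_model(attempt, feedback):
    """Return candidate source code; `feedback` is unused by this stub."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]


def first_failure(func, tests):
    """Return a description of the first failing test, or None if all pass."""
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as err:  # an error raised by the candidate also fails
            return f"{args} raised {err!r}"
        if actual != expected:
            return f"{args} returned {actual!r}, expected {expected!r}"
    return None


def synthesize(name, tests, max_attempts=3):
    """Generate, validate, and re-prompt; return a passing function or None."""
    feedback = None
    for attempt in range(max_attempts):
        namespace = {}
        exec(fake_model(attempt, feedback), namespace)  # load the candidate
        candidate = namespace[name]
        feedback = first_failure(candidate, tests)
        if feedback is None:
            return candidate  # every test passed
    return None  # give up after max_attempts


get_in_cents = synthesize("get_in_cents", [((1, 0), 100), ((5, 5), 505)])
print(get_in_cents(2, 30))  # 230
```

Here the first candidate fails its first test, so the loop asks again and accepts the second candidate. The caller only ever receives a function that passed every test, never the source text.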

Concretely, we have implemented this as a library in Python. We know some of you have not programmed in Python: that’s fine. Part of the goal of this is that you don’t actually write any code! You need to know how to write data definitions ("dataclasses"), function headers, purpose statements, and unit tests (which will require knowing how to construct examples of data), and that should be it.

How It Works

An example (complete) use of the library is the following (this shows all the features of Python that we need):

from synth import *

@dataclass
class Money:
    dollars : int
    cents : int

@synthfunc
@synthdepend(Money)
@synthexpect(Money(1,0), 100)
@synthexpect(Money(5,5), 505)
def get_in_cents(money : Money) -> int:
    """ converts money into a total number of cents """
    pass

Aside from the @synth parts, this is equivalent to the following ISL+ code:

#lang htdp/isl+

(define-struct money [dollars cents])
;; A Money is a (make-money Integer Integer)

;; get-in-cents : Money -> Integer
;; converts Money into a total number of cents
(check-expect (get-in-cents (make-money 1 0)) 100)
(check-expect (get-in-cents (make-money 5 5)) 505)
(define (get-in-cents money) ...)

Let’s explain the Python version, line by line. On line 1, we import everything we need from our library (we’ll cover how to get the library next). On lines 3-6, we define a data definition; this is effectively a struct called Money that has a dollars field and a cents field. The fields have type annotations, but these are not enforced (they are effectively comments, just structured).
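To see that the annotations really are unenforced, here is a small standalone check. It imports dataclass directly from the standard library rather than from synth, so it runs without the lab code.

```python
from dataclasses import dataclass


@dataclass
class Money:
    dollars: int
    cents: int


ok = Money(1, 50)
odd = Money("one", None)  # no error: annotations are not checked at runtime

print(ok)                  # Money(dollars=1, cents=50)
print(odd)                 # Money(dollars='one', cents=None)
print(ok == Money(1, 50))  # True: dataclasses compare field by field
```

The last line matters for testing: because dataclass instances compare by their fields, an expected value written out in a test can be checked with == against whatever a function returns.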

Next (lines 8-14), we get to a function definition. Our library operates by assuming you start with an empty function definition: pass (line 14) is a statement in Python that stands for "do nothing". Since Python is whitespace sensitive, you cannot have an empty block; instead, you can put pass. So this is a function with one argument (called money, with type Money) that returns an int, and it does nothing. A string that appears as the first part of the body of the function (line 13) is called a docstring: this is a purpose statement, and like the type annotations, it is supported by the language (it cannot go somewhere else and work the same way). Triple-quoted strings allow line breaks to appear in the string.

Finally (lines 8-11), we have the four lines that start with @synth. These declare to our library that we want this function to be synthesized. They run from the function up, though the only one whose order matters is @synthfunc: this must happen "last", i.e., it must be the first one in line order. We describe the decorators that take arguments as follows:

@synthexpect(arg1, arg2, ..., ret): asserts that the to-be-synthesized function f, when called on the arguments, f(arg1,arg2,...), should return the value ret. This is both used as a guide for synthesis, and also, the generated code is validated by running this test. This uses equality (==) to check equivalence with the return value.

@synthexpect(arg1, arg2, ..., predicate): alternate form, where predicate is a function that returns a boolean. In this case, we expect the predicate to return True when applied to the result of calling the function on the arguments, i.e., predicate(f(arg1,arg2,...)) should be True. This allows you to write tests for functions that are non-deterministic (e.g., random shuffling), where you might not be able to determine the exact output, but you can write a function to check properties of it (the only actual Python code you may need to write during the lab!).

@synthdepend(identifier, ...): declares that there is a data definition (e.g., Money), function (e.g., get_in_cents), or module (e.g., random, though note that a few, including math and random, are included by default) that may be needed in order to synthesize this function. Any number of dependencies can be declared at once, and the decorator can be used multiple times. This serves both as a hint to the synthesis and as a restriction: only these and a very small number of core data structures are available to the generated code (i.e., functions that are in scope but not declared as dependencies cannot be used in synthesized functions), so one of the tasks when designing functions is deciding what they might need to use.
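As an illustration of the predicate form, here is one way a property check for a shuffling function might look. The names is_shuffle_of and shuffle_stand_in are hypothetical, and shuffle_stand_in is a hand-written stand-in (not synthesized code) so that the example runs on its own.

```python
import random


def is_shuffle_of(original):
    """Build a predicate that checks a result is a reordering of `original`."""
    def predicate(result):
        # Same elements with the same multiplicities, in any order.
        return sorted(result) == sorted(original)
    return predicate


def shuffle_stand_in(items):
    """Hand-written stand-in for a function we would want synthesized."""
    shuffled = list(items)
    random.shuffle(shuffled)
    return tuple(shuffled)


check = is_shuffle_of((1, 2, 3, 4))
print(check(shuffle_stand_in((1, 2, 3, 4))))  # True
print(check((1, 2, 3)))                       # False: an element is missing

# In a lab file, such a predicate would go in the last position, e.g.
#   @synthexpect((1, 2, 3, 4), is_shuffle_of((1, 2, 3, 4)))
```

A predicate like this checks that no element was lost or duplicated, but not that the order is actually random, which is hard to test with a single call.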

Given this as a file, when you load it, e.g., in the Python REPL, you might see this:

>>> from examples.dollar_change import *

[synth] get_in_cents: ✓ ✓ (caching valid response)

>>>

This shows that, as it is loading, it is synthesizing an implementation of get_in_cents. You can’t see here, of course, but there is quite a pause between the ":" and the first check mark — this is the time to actually query OpenAI and get back a response. The check-marks show that the generated code passed both given tests (if it failed, it would report that, what the failure was, and would attempt to find a new implementation).

If you load it again, and haven’t changed the specification (tests, docstring, depends, etc.), it will use the previously generated version:

>>> from examples.dollar_change import *

[synth] get_in_cents: ✓ (cached)

>>>

In this case, no queries are made, and it is very fast. The general workflow, for this lab, is to define function headers, purpose statements, and tests, and then try to generate the function. If it works (and assuming you have sufficient tests), then you move on to the next function. Assuming you don’t need to update anything about the previous function, it will load instantly on subsequent loads of the module: only the current function you are working on will take time. Of course, if you discover a bug, you may have to add more tests, but this will never cause you to have to dig into the buggy code: you add/fix tests, and that function, plus all functions depending on the updated function will automatically be regenerated.

Of course, it may turn out that the function you want to generate is too complicated for the model to figure out: in that case, just as with code, define a helper first, generate that, and then add it as a dependency to the original function you were trying to generate.

Sample Code

We have provided, for this lab, a zip file that includes both our library code and a Python file "starter.py". If you open that file up, you’ll see it has an example at the top that involves clocks, and then below that, data definitions for a blackjack game. To start, load the file by starting Python from the command line:

$ python3

Python 3.11.6 (main, Oct 2 2023, 20:46:14) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>> from starter import *

[synth] add_second: ✓ (cached)

>>>

The first time you load it, it will not show (cached), and it will take a few seconds. Once it has loaded, you can test the generated function by running it on an example input:

>>> add_second(Clock(10,0,0))

Clock(hours=10, minutes=0, seconds=1)

Note that you do not see the code (while there are certainly ways for you to poke around and figure out how to do so, please, for this lab, do not do that! You can use ChatGPT outside of this lab where you can see the generated code: we’d like you to use this system as it is designed, which is that you treat the generated code as a black box).

To see what happens if there is a problem, we can artificially introduce a bug in our test cases. If we introduced enough bugs, perhaps the synthesis would find a different function, but especially for simple functions, the other tests and the purpose statement will most likely still lead to the right code, so the one incorrect test will cause validation to fail. So let’s change the expected output of the first test from Clock(10,1,2) to Clock(10,1,1) (i.e., it doesn’t increment).

Now we quit Python (Ctrl-D, or typing "quit()") and load it again.

>>> from starter import *

[synth] add_second: ✓ 𐄂 failed validation

add_second([Clock(hours=10, minutes=1, seconds=1)]) ≠ Clock(hours=10, minutes=1, seconds=1)

[synth] add_second (attempt 2/3): ✓ 𐄂 failed validation

add_second([Clock(hours=10, minutes=1, seconds=1)]) ≠ Clock(hours=10, minutes=1, seconds=1)

[synth] add_second (attempt 3/3): ✓ 𐄂 failed validation

add_second([Clock(hours=10, minutes=1, seconds=1)]) ≠ Clock(hours=10, minutes=1, seconds=1)

[synth] add_second: abandoning

>>>

Here we can see that it passed one test and failed the other, and it shows the failing test. Then it attempts to synthesize again. This still doesn’t work, and the last attempt still doesn’t work, so the system abandons the attempt, and puts a placeholder that will error if you try to call the function:

>>> add_second(Clock(0,1,2))

Traceback (most recent call last):

...

synth.lib.synthfunc.NotSynthesized: NotSynthesized: No implementation of add_second was sucessfully synthesized.

(Note that when we changed the test, we changed the cache key, and so we got a new synthesis effort. If we change the test back, we will go back to the old key, and won’t have to generate anything!)
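One plausible way to get this behavior is to derive the cache key from a hash of the whole specification. The sketch below is an assumption about how such a key could be computed, not a description of what the library does; spec_key, the purpose statement text, and the choice to represent tests as strings are all hypothetical.

```python
import hashlib
import json


def spec_key(name, docstring, tests, depends):
    """Hash everything the programmer wrote into a stable cache key."""
    payload = json.dumps(
        {"name": name, "doc": docstring, "tests": tests, "depends": sorted(depends)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


doc = "adds one second to the clock"  # hypothetical purpose statement
original = spec_key("add_second", doc, [["Clock(10,1,1)", "Clock(10,1,2)"]], ["Clock"])
edited = spec_key("add_second", doc, [["Clock(10,1,1)", "Clock(10,1,1)"]], ["Clock"])
restored = spec_key("add_second", doc, [["Clock(10,1,1)", "Clock(10,1,2)"]], ["Clock"])

print(original == edited)    # False: changing a test changes the key
print(original == restored)  # True: changing it back finds the old entry
```

Under this scheme, editing any part of the specification produces a new key and therefore a fresh synthesis attempt, while reverting the edit lands on the previously cached result.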

Generating Blackjack

The main task of this lab is to make a function-only version of the game of Blackjack. Function-only means that there will be no user interface: you will interact with the game by calling functions at the interactive Python prompt. But the code you create will be similar to the code needed for an actual game, just without the interface!

A Short Summary of Blackjack

Blackjack is a card game where there are some number of players (you will start with just one!) and a dealer. The players and the dealer are all dealt cards, and the goal is to have the highest sum of cards without going over 21. Number cards count as their number, face cards (Jack, Queen, King) count as 10, and Aces count as either 1 or 11. Initially, each player and the dealer are dealt two cards. A player can "hit", asking for an additional card. The dealer, instead, must follow a mechanical rule: if they have less than 17, they must take another card; if they have 17 or over, they must "stay". Betting is always between a player and the dealer, and generally, the idea is that the player either loses what they bet (if they lose) or doubles it (if they win). There are more complicated rules related to betting, splitting, etc., which we will not include.
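When writing tests, it helps to have the scoring rules pinned down precisely enough to work out expected values by hand. The following is a rough reference sketch of those rules, not code for the lab itself: hand_value and dealer_must_hit are hypothetical names, hands are given as plain rank strings rather than the starter code's Card values, and counting each Ace as 11 unless that would go over 21 is an assumption about how "1 or 11" is resolved.

```python
def hand_value(ranks):
    """Best total for a hand given rank strings such as "4", "K", or "A"."""
    total = 0
    aces = 0
    for rank in ranks:
        if rank == "A":
            aces += 1
            total += 11  # count each Ace as 11 to begin with
        elif rank in ("J", "Q", "K"):
            total += 10
        else:
            total += int(rank)
    # Downgrade Aces from 11 to 1, one at a time, while the hand is over 21.
    while total > 21 and aces > 0:
        total -= 10
        aces -= 1
    return total


def dealer_must_hit(ranks):
    """The dealer's mechanical rule: take a card on less than 17."""
    return hand_value(ranks) < 17


print(hand_value(("4", "K")))             # 14
print(hand_value(("4", "K", "2", "10")))  # 26, over 21
print(hand_value(("A", "K")))             # 21
print(hand_value(("A", "A", "9")))        # 21, one Ace counted as 1
print(dealer_must_hit(("10", "6")))       # True
print(dealer_must_hit(("10", "7")))       # False
```

Hands with Aces and totals right at 17 and 21 are the edge cases most worth covering in your own test cases.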

Starter Code

We have given you, as starter code, a data definition for an initial version of this: a definition of a Card, and a game state BlackjackGame that currently hardcodes the dealer’s hand, and starts with an empty deck.

We’d like you to start by making a version of the game that lets you carry out the interaction below. Note that it’s fine if the deck overlaps with the dealer’s cards (i.e., when filling the deck with add_to_deck, you don’t need to exclude those two cards; instead, just add as many distinct cards as specified).

>>> gm = shuffle_deck(add_to_deck(BlackjackGame(), 52))

>>> gm1 = deal_initial(gm)

>>> render_player_hand(gm1)

'4♠ K♠'

>>> gm2 = hit(gm1)

>>> render_player_hand(gm2)

'4♠ K♠ 2♦'

>>> gm3 = hit(gm2)

>>> render_player_hand(gm3)

'4♠ K♠ 2♦ 10♣'

>>> winner(gm3)

'dealer'

You should try not to write any Python code, even if you know how! You may need to define additional helpers not included above, and depend upon them to get these functions working.

Note how most of the functions take and receive the game state; that means most @synthexpect declarations will have the form:

@synthexpect(
  BlackjackGame(player_cards = (Card("♥", "10"),), deck = (Card("♥", "7"),)),
  BlackjackGame(player_cards = (Card("♥", "7"), Card("♥", "10")), deck = tuple()))

This could be a test for hit. One oddity of Python: since we want immutable lists, we are using tuples (which are essentially immutable lists). Tuple literals are written with parentheses, e.g., (1,2,3), but if there is only one element, Python cannot tell whether we intend a single-element tuple or are just using parentheses for grouping. To resolve this, Python has you write a single trailing comma, e.g., (1,). You can see this in use in the example above.
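A few lines at the Python prompt make the tuple rules concrete. The card strings here are just placeholders, not the starter code's Card values.

```python
single = (1,)      # one-element tuple: note the trailing comma
not_a_tuple = (1)  # just the integer 1 in grouping parentheses

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
print(tuple())                     # (), the empty tuple

# Tuples are immutable: "adding" a card builds a new tuple.
hand = ("4♠",)
new_hand = hand + ("K♠",)
print(new_hand)  # ('4♠', 'K♠')
print(hand)      # ('4♠',), the original is unchanged
```

Forgetting the trailing comma is an easy mistake to make when writing a test with a one-card hand or a one-card deck, so it is worth double-checking those cases.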

Extending the Game

Now you should change it so that the dealer starts out with an empty hand, and add a deal_dealer function that gives the dealer another card whenever, according to the rules of the game, the dealer would have to take one. For simplicity, you can have the dealer fill their entire hand at once (you don’t have to worry about alternating between dealer cards and player cards).

Now add betting. You first have to add the amount of money that the player has to the game state. Note how once you do that, (most of) your functions are regenerated: this is because the dataclass that they depend on changed. Since the change was pretty trivial (adding a field), this refactoring shouldn’t cause any of the regenerations to fail.

Next, add a player_bet function that takes an amount from what the player has and adds it to their wager. You can enforce that this must be done before the player has cards. Then, you need to add a call function that indicates that the player is finished getting cards: at this point, if they are the winner, they get double their wager added back to their pool of money. If they lose, they get nothing. Either way, all cards and the wager are cleared for another round.

At this point, you’ve added a bit. Now let’s do a more involved change: we want both partners to be able to play the game! Both will be playing against the dealer, so the fact that you’ll be able to see your partner’s cards is okay. However, each of you needs a separate hand and the ability to wager, hit, and call separately. For each round, the dealer still has only a single set of cards.

At this point, you should have a game that you can both play! Test it out! What happens if you play enough rounds that the deck runs out of cards? Do you get a sensible error? Did you include test cases for that possibility?

Continuing with this after lab

Unfortunately, unlike with most labs, you won’t really be able to continue with this after lab. While you will clearly have access to the code, the API key that provides access to the OpenAI API will be turned off, so all the queries will fail. Obviously, if you have your own access to their API and use your own API key, you are welcome to keep experimenting with this (and if you do so, please let us know!).

Submit your work

Unlike previous labs, you must submit your work on this one! Please go to Gradescope and turn in whatever you have at the end of lab. You must submit this (and complete the Survey) to get credit for the lab, but you will only be graded on having done something.

Survey

Complete the survey that is posted on the board!

Before You Go...

That’s the last lab of the semester! Just two lectures and one homework to go and you are done with the accelerated version of CS2500. Congratulations on all you have achieved! Consider the kinds of problems you can solve now that would have seemed impossible at the beginning of September; you can and should feel proud. We certainly are of you!
