This is an example of a small experiment, and how it can lead to an interesting research question. This creates a research track. This example will hopefully motivate others to either develop their own research track, or pursue an existing one.

This web page will be extended into multiple research tracks --- perhaps on the Course Wiki, when it is made available. The dining philosophers problem for multi-threaded programming is well known. If a mutex is used to "protect" each individual fork, then a deadlock can result. Such a dining philosophers program with just two threads (two philosophers) was run through the (simple) McMini model checker, and it reported a trace that leads to deadlock. So, one of the developers of McMini (Luka Jovanovic, of this class) showed the thread schedule to an LLM (GPT4), and asked it why the deadlock occurs. GPT4's response follows.

At the end of this note, you'll find a suggested research track.


User:
Why is there a deadlock, what happened:

About to exec into ./test/deadlock_program/philosophers_mutex_deadlock
Trace 1, *** DEADLOCK DETECTED ***
THREAD BACKTRACE
thread 0: starts
thread 0: pthread_mutex_init(1, _)
thread 0: pthread_mutex_init(2, _)
thread 0: pthread_create(1)
thread 0: pthread_create(2)
thread 1: starts
thread 1: pthread_mutex_lock(1)
thread 2: starts
thread 2: pthread_mutex_lock(2)
0, 0, 0, 0, 0, 1, 1, 2, 2, 
END
THREAD STATES
thread 0: pthread_join(1, _)
thread 1: pthread_mutex_lock(2)
thread 2: pthread_mutex_lock(1)
END
***** Model checking completed! *****
*** DEADLOCK DETECTED ***
  (Trace number: 1)
Number of transitions: 34
Number of traces: 3

GPT4:
The given trace provides information about a simple example of a classic 
deadlock problem involving two mutexes (or locks) and two threads. Here's a 
breakdown of what's happening:

1. Thread 0 initializes two mutexes:
   - `pthread_mutex_init(1, _)`
   - `pthread_mutex_init(2, _)`

2. Thread 0 creates two threads (Thread 1 and Thread 2):
   - `pthread_create(1)`
   - `pthread_create(2)`

3. Both threads start running:
   - `thread 1: starts`
   - `thread 2: starts`

4. Thread 1 locks Mutex 1:
   - `thread 1: pthread_mutex_lock(1)`

5. Thread 2 locks Mutex 2:
   - `thread 2: pthread_mutex_lock(2)`

6. Now, we hit the deadlock:
   - Thread 1 tries to lock Mutex 2, but Mutex 2 is already held by Thread 2.
   - `thread 1: pthread_mutex_lock(2)`
   - Thread 2 tries to lock Mutex 1, but Mutex 1 is already held by Thread 1.
   - `thread 2: pthread_mutex_lock(1)`

7. Both threads are stuck, waiting for the other to release a mutex. This 
creates a deadlock situation, as neither thread can proceed.

The main problem here is the order in which the threads attempt to acquire the 
mutexes. If both threads always tried to acquire the mutexes in the same order 
(either 1-2 or 2-1), this deadlock would not occur. However, in the given 
trace, the threads attempt to acquire the mutexes in opposite orders, leading 
to the deadlock.

To fix this deadlock, you'd typically ensure that both threads always try to 
acquire the mutexes in the same order (e.g., always lock Mutex 1 before 
Mutex 2). Another common solution is to use a try-lock mechanism (like 
`pthread_mutex_trylock()`) which attempts to lock the mutex and immediately 
returns if it's already locked, allowing the thread to either retry or perform 
another action without blocking.

    LLMs often now have plugins. Two types of plugins are typical:
  1. domain-specific knowledge (the "fine tuning" phase of LLMs that we've seen);
  2. searching the web to see if a statement is true; and
  3. oracles that include extended reasoning to immediately decide if a statement is true.

An example of an oracle in formal verification is an add-on module that can directly provide correct answers in a limited domain. An example of this might be a machine learning system that wishes to prove theorems about algebra. Given the laws of arithmetic, it could eventually prove that 1001 + 1000 > 2000. But if an arithmetic oracle exists, it can immediately determine if the statement is true.

RESEARCH IDEA: Suppose in a program we want to start with condition 'A' and end with condition 'Z'. An LLM can provide a summary of the issues, and the model checker can perform local checks, and then instruct the LLM on how to correct them. Further, the simple model checker, McMini, can be instrumented to provide additional information that might help the LLM.

More succinctly:

"Can a model checker be used as an oracle for an LLM that uses few-shot training to reason about code and summarize program traces?"

(One imagines a "conversation" between the LLM and the model checker. This is analogous to how Code LLMs are used today with a compiler/interpreter as the oracle. The oracle tells the LLM if some proposed source code will satisfy some given input-output constraints.)

RELATED RESOURCES AND TOPICS: Maxwell Pirtle (in this class) and Luka Jovanovic have developed a simple, but performant model checker, on which the text above is based. Its simplicity allows us to experiment with perturbing its behavior, search order, hints, etc., with knowledge gained from LLMs or other sources. As resources, you can find:

The literature on AI-based Programming and automatic development of unit tests (see Suggested Readings should also be relevant.

Francesca Lucchetti (in this class) will be presenting some combination of the following three papers/tech reports on Mechanistic Interpretability. We can review these papers (which will later be presented), and suggested course readings, for ideas about how to develop this theme. The three papers follow:


Code LLMs: We started this research track with a question about summarizing traces. This seems close to the concept of Code LLMs. Here are a few papers from the literature that may be interesting from this point of view.