Web Resources for CS7670
(Computer Systems Seminar)
Instructor:
Gene Cooperman
Fall, 2024
PLACE: 435 Ryder Hall
HOURS: Tuesday and Friday, 3:25 pm - 5:05 pm
Course Syllabus
Please see the separate syllabus document
on this web site.
How To Do Research (and the course wiki, and topic-sentences.py)
There is a directory with a series of short
essays on how to do research, and how to read and write technical
papers. Please read that section thoroughly.
Course Wiki:
The Wiki home page is here, for the course.
(Restricted to CS 7670 students only; use Khoury credentials.)
I have now improved topic-sentences.py. (Click on the link to the script, and copy-paste into your local file.)
It works reasonably well when given a markdown or latex file as input, and can output
html, pdf, and even plain text or markdown or latex (topic sentences only).
But this Python script is still a work in progress.
Feel free to report any bugs that you see.
The topic-sentences.py program extracts the first sentence of each paragraph, so that it's easy to read any
document and see if it flows globally. A typical usage might be:
Download your favorite paper from arxiv.org, using the "TeX Source" file (.tar.gz), then untar it, and run the script on main.tex:
python3 topic-sentences.py /tmp/gene/arxiv/main.tex --html
which results in:
The file /tmp/gene/arxiv/main-summary.html was written.
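For the curious, the core idea of the script can be sketched in a few lines of Python. (This is a simplified illustration, not the actual topic-sentences.py; the real script also handles latex and markdown input and several output formats.)

    import re
    import sys

    def topic_sentences(text):
        """Keep only the first sentence of each blank-line-separated paragraph."""
        summary = []
        for para in re.split(r"\n\s*\n", text):     # a blank line starts a new paragraph
            para = " ".join(para.split())           # collapse internal line breaks
            if not para:
                continue
            # Take everything up to the first '.', '!', or '?' at a word boundary.
            match = re.match(r".*?[.!?](?=\s|$)", para)
            summary.append(match.group(0) if match else para)
        return "\n\n".join(summary)

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:
            print(topic_sentences(f.read()))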
Course Vision
In 2024, the research topics will center around code LLMs.
However, the course will also accommodate a variety of other tracks
in computer systems, not necessarily related to LLMs.
The largest (and probably most popular) track will be
LLMs. Code LLMs are a sub-field that is only now emerging.
A central question of this track is:
How can LLMs help us to debug?
Other related questions are:
- How can LLMs engage in a multi-stage dialog of "human and machine",
investigating a bug?
- How can LLMs help us generate unit tests?
- To what extent can LLMs generate code from specifications?
- Since LLMs often create bugs when they generate code, how can
one use a dialog to help an LLM fix its own bugs?
- To what extent can LLMs reason about code logic,
given that LLMs are notoriously bad at mathematical reasoning?
If you are looking to gain an understanding of how LLMs can help
in debugging, I strongly recommend reading the paper:
ChatDBG: An AI-Powered Debugging Assistant
(and note the
ChatDBG github repo, which is open source)
As befits the infancy of this emerging field of code LLMs, this paper
appeared in March, 2024 (only 6 months before this course).
In this jewel of a paper, ChatDBG is applied to "bugs in a collection
of student labs from two introductory computer science courses" largely
emphasizing small programs written in scripting languages. The authors note:
"For the Python programs, a single query led to an actionable bug fix 67%
of the time; one additional follow-up query increased the success rate
to 85%."
The components of this successful demonstration are:
- Enriched Stack Trace
- "Take the Wheel" (allow ChatDBG to run debugger commands executing
parts of the code)
- Targeted Question (ask a question specific to the failure)
- Dialog (extend the chat with a second query)
This architecture conforms with the following working hypothesis by the
instructor's research team:
Three prerequisites for using code LLMs to debug are:
- The code LLM must be presented with sufficient example programs
as training data.
(ChatDBG relies on ChatGPT having already been
trained on many instances of Python and Jupyter notebooks.)
- The usual human-centered debug outputs must be
enhanced or instrumented with greater context,
in order for the LLM to "understand" the debugging output.
Enhanced debug outputs can include:
a more detailed stack trace; an annotated dynamic execution trace;
values of variables that changed, and
relevant source code for each call frame.
(For example, ChatDBG's enriched stack
includes a window of 10
source lines, types and values of variables for each frame,
and suppression of detailed information for call frames from
standard libraries. A Python sketch of this idea appears after this list.)
- Dialogs should be restricted to standard, formulaic questions
that have been shown to be successful with LLMs.
(For example, "How did this variable become
null?")
Resources to explore this vision
For students who are interested in a more "hands-on" approach to this subject,
we can offer a concrete problem: debugging of multi-threaded programs.
This is an important research problem of the instructor's research team.
Students who become interested will have the opportunity to continue
that research as co-authors after the semester is finished.
(It is important to keep a sharp line between the course and any
extended joint research, for ethical reasons, so that students
are not exploited.)
In this hands-on approach, the research team
has built a model checker (McMini), which is easy to install,
and which detects race conditions, deadlocks, etc.,
in multi-threaded C/C++ programs.
While no LLM has yet been seriously used in this situation, McMini already
satisfies two of the prerequisites for an LLM:
- Enriched debugging traces:
A bug is presented as
an "enriched" dynamic execution trace. See for example the following
case:
McMini tutorial, "Analyzing the bug for bufferSize==1", for the
flavor of the execution trace. (Or start the tutorial from the beginning
for a full understanding.)
- Take-the-wheel:
McMini also has a command mcmini-gdb, which allows a user
to execute a trace within GDB, both going forward and backward.
This would become the basis for developing an
analog of ChatDBG's "take the wheel" feature.
The mcmini-gdb command used above provides
the enriched execution trace through the
Python API for GDB. Once one has seen examples of GDB augmented by Python,
it is easy to extend such examples arbitrarily in Python, as in the sketch below.
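For the flavor of such an extension, here is a minimal sketch of a custom GDB command written with GDB's Python API. (The command name enriched-bt is hypothetical; this is not part of McMini or mcmini-gdb.)

    # enriched_bt.py: load inside GDB with:  (gdb) source enriched_bt.py
    import gdb

    class EnrichedBacktrace(gdb.Command):
        """Print a backtrace annotated with the local variables of each frame."""

        def __init__(self):
            super().__init__("enriched-bt", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            frame = gdb.newest_frame()
            while frame is not None:
                gdb.write("frame: %s\n" % (frame.name() or "??"))
                try:
                    for symbol in frame.block():
                        if symbol.is_variable or symbol.is_argument:
                            gdb.write("  %s = %s\n" % (symbol.name, symbol.value(frame)))
                except RuntimeError:
                    gdb.write("  (no symbol information)\n")
                frame = frame.older()

    EnrichedBacktrace()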
Source Material for Introductory Lectures
This was originally based on the version of the seminar from 2023.
While some of this continues to be useful, this description has
been augmented based on the interests of the class.
We will begin the course (after looking at ChatDBG for motivation)
with an overview of GPUs and LLMs, and how
they impact computer systems, and vice versa.
You'll find the "raw" material for my lectures in this:
Overview of GPUs and LLMs
We have a course Wiki at:
CS 7670 Course Wiki
Note also a page that will evolve with suggested papers for reading
and/or presentation:
Papers to Consider Reading
The course Wiki also hosts research tracks,
and I have already created a first research track, on:
A Research Track: Can a model
checker serve as an oracle to help an LLM reason about code
and program traces?
Typical Questions in Writing a Survey Paper
Study of Mechanistic Interpretability
- What are the 'layers' of an LLM?
(read only that one section)
- Reversing Transformer to understand In-context Learning with Phase change & Feature dimensionality
(April, 2024; updated discussion of the original Anthropic paper;
this review is more readable than the original paper,
"In-context Learning and Induction Heads",
from March, 2022)
- Key definitions and concepts:
- logit (the unnormalized weight for a "neuron")
- layers (see above)
- MLP: Multi-Layer Perceptron (layers of neurons, related to a
traditional neural net for deep learning)
- loss function: In training, the input leads to a predicted output.
In addition to using backpropagation to adjust the weights,
the loss function measures (on average) the
difference between the actual output vector and a perfectly
predicted output vector (the output vector of the input-output training
example). In practice, the loss never becomes zero; when it nears
an asymptote, we can stop the training.
- types of attention heads: copying head (copy tokens of importance);
induction head (completing a pattern, e.g., a skip-trigram in a two-layer
transformer: [A] ... [B] → [C] for tokens A, B, and C; recall that induction in logic means to infer a new fact from many previous facts)
- Attention block: The neurons employed in a particular attention
head. Since all attention heads operate in parallel, the "neurons"
are divided into "blocks", one block for each attention head.
- A transformer circuit can be represented as individual tensors,
W_O (output), W_V (value),
W_Q (query), and W_K (key), which
allow for computational efficiency.
- But in the article "Reversing Transformer to understand In-context
Learning ...", we see that conceptually, we can multiply
to get an OV circuit (W_O W_V),
and a QK circuit (W_Q W_K), which
conceptually yield more insight into the functionality
of an LLM. (As before, see the article. A numpy sketch of these
circuits appears after this list.)
- W_O W_V and W_Q W_K are
low-rank matrices (the mathematical concept
of "sparse information": high dimension, but low rank),
while the four individual matrices have "dense information",
but are hard to interpret mechanistically.
- epoch -- W_V is applied to the residual stream, and
then the attention heads, and then W_O. This is
one epoch. Then the output becomes the new input, and another
epoch commences.
- Superposition Hypothesis for steering LLM with Sparse AutoEncoder
(April, 2024; another excellent summary of Anthropic work)
- QUESTION: If there are neurons that activate based on specific topics, could activating certain neurons force generation on those topics?
ANSWER: Sparse autoencoder features can be used to intervene on and steer transformer generation.
- Related Papers of Anthropic:
- Toy Models of Superposition (Sept., 2022)
- Superposition, Memorization, and Double Descent (Jan., 2023)
- Distributed Representations: Composition & Superposition
(May, 2023; click on "Read Paper")
- Steering of LLMs (to discuss; also, see
Papers to Consider Reading)
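As promised above, here is a minimal numpy sketch of the QK and OV circuits. (The dimensions are hypothetical, and the transpose in the QK product is one common convention, following from the shapes chosen here.)

    import numpy as np

    d_model, d_head = 64, 8                    # hypothetical dimensions
    rng = np.random.default_rng(0)
    W_Q = rng.normal(size=(d_head, d_model))   # query
    W_K = rng.normal(size=(d_head, d_model))   # key
    W_V = rng.normal(size=(d_head, d_model))   # value
    W_O = rng.normal(size=(d_model, d_head))   # output

    # The QK circuit governs which source tokens a destination token attends to;
    # the OV circuit governs how an attended-to token affects the output.
    QK = W_Q.T @ W_K                           # (d_model, d_model)
    OV = W_O @ W_V                             # (d_model, d_model)

    # Each circuit is high-dimensional (64 x 64) but low rank (at most 8):
    print(np.linalg.matrix_rank(QK), np.linalg.matrix_rank(OV))   # prints: 8 8

This makes the "sparse information" point concrete: the 64 x 64 circuit matrices have rank at most d_head = 8.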
Study of Code LLMs
TODO
Scientific Writing
Written technical
communication is a key skill. There are rules (even recipes)
for good technical
writing, and I intend to teach those rules and recipes.
For example, there are global recipes, such as reading just
the first sentence of each paragraph for content; and there
are local recipes such as the "Structure of Prose" in
"The Science of Scientific Writing"
(from American Scientist). Good technical
writing is something that you can take with you for a
lifetime.
The "big discovery" in this article is that you need to be aware
what the reader is anticipating will be in the next sentence.
If you can understand what the reader is anticipating next,
then you can "fill in the blanks for the reader". Work with
the reader's preconceptions --- not against.
- Rule #1: Gain a distance from your writing, so that you can
see how it appears to others.
TRICK TO GAIN DISTANCE: Read only the topic sentence of your paragraphs.
Does it tell a story?
If you are joining the class remotely, please go to:
this Zoom link.
Twinkle Jain has kindly offered to maintain this link.
(We'll risk placing this link on a publicly visible course web page,
without using a waiting room for now. It will be obvious if
some unknown person joins us.)
LLMs (e.g., ChatGPT) and other Destabilizing
Technologies in Computer Systems
This course will emphasize reading of papers and presentations.
Each student will also either write a paper on some topic (perhaps
a survey; perhaps ideas for future research), or else write some
experimental prototype code to test the feasibility of some approach
(and then write a document on the results found).
The course is intended to emphasize "blue sky" dreaming about what
might be possible. This is intended to promote a spirit of fun
exploration and dialogue, where there are no wrong ideas, and anything
is possible. My own job will be to connect some of the "blue sky"
ideas with impromptu lectures on existing approaches in computer systems
and related subjects, to better understand what parts are feasible today,
and what tools are needed to make other ideas feasible in the landscape
of tomorrow.
If you have a good background in Computer Systems, then you should be
well prepared. (I define a good background to mean that you
can read the man page for 'mmap', and then easily use mmap and munmap
in an interesting way in a program; see the sketch below.) As a seminar course, the course
is designed to be flexible and respectful of the student's time, while
also providing an in-depth experience of some of the questions at the
forefront of computer systems research.
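(As a quick self-test of that criterion, here is a minimal sketch in Python, whose mmap module wraps the same mmap/munmap system calls; the criterion above refers to the C interface, but the flavor is the same. The file path is arbitrary.)

    import mmap

    # Create a small file to map.
    with open("/tmp/demo.dat", "wb") as f:
        f.write(b"hello world\n")

    # Map the file into memory, modify it through the mapping, and unmap.
    with open("/tmp/demo.dat", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)   # like mmap(): map the whole file
        mm[0:5] = b"HELLO"              # write through the memory mapping
        mm.flush()                      # like msync(): push changes to the file
        mm.close()                      # like munmap()

    print(open("/tmp/demo.dat", "rb").read())   # b'HELLO world\n'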
EXAMPLE: ChatGPT and Algorithmic Debugging
In order to start us off with "blue sky" dreaming, I'll present here my own dream. You're welcome to collaborate on this dream, or to collaboratively invent a dream of your own.
This dream is to analyze the potential of using ChatGPT and similar LLMs (Large Language Models) as "cognitive assistants" in writing application code. (I will often refer
to ChatGPT as a generic example of an LLM. Personally, I've been playing with Google Gemini. For an example, see this request to Gemini to write code for testing for malloc bugs such as double-free.)
In the past, one spent a lot of time carefully designing code, with enough assertions, unit tests, and functional tests to ensure correctness. With ChatGPT, we can now take a plain-English specification (even an ambiguous specification) and rapidly write code, along with some simple tests of that code. But the code will be filled with bugs.
So, code-writing is turned on its head: Instead of carefully writing
correct code, we knowingly and rapidly write incorrect
code, and then try to fix the code. Luckily, there
is interesting past work on how to debug an algorithmic
specification. Curiously enough, this subject is called Algorithmic
Debugging, which originally developed out of this 40-year-old
PhD
thesis by Ehud Shapiro.
In a nutshell, Algorithmic Debugging looks at a buggy program with an
example input-output pair that demonstrates a bug. If the top-level
routine of the program call graph calls routines A, B, and C, then the
system presents to the programmer (or to an oracle) the input-output
pairs for each routine, and asks if one of the pairs is a bug. If B
has a bug, it presents the routines X, Y, and Z that are called by B,
and asks which of those routines has a bug. By breaking a program into
its components, it can then identify the component that has the bug.
If that component is small enough, then we can provide ChatGPT with a
plain-English specification for the routine, and even try to
ask ChatGPT to fix the bug in the routine.
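To make this recursive structure concrete, here is a minimal Python sketch of the idea. (The data structures are hypothetical, and this simplifies Shapiro's method, whose divide-and-query strategy is more refined.)

    class Call:
        """One node of the call tree: a routine and its observed input-output pair."""
        def __init__(self, name, inputs, output, children=()):
            self.name, self.inputs, self.output = name, inputs, output
            self.children = list(children)

    def find_buggy_call(node, oracle):
        """Return the innermost call whose input-output pair the oracle rejects.

        oracle(name, inputs, output) answers True if the pair is correct.
        The oracle is traditionally the programmer; in this dream, it could be
        an LLM given a plain-English specification of each routine.
        """
        if oracle(node.name, node.inputs, node.output):
            return None                       # this call behaved correctly
        for child in node.children:
            culprit = find_buggy_call(child, oracle)
            if culprit is not None:
                return culprit                # the bug lies inside a callee
        return node                           # all callees are correct: the bug is here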
If successful, a side benefit of this approach is to encourage
programmers to emphasize a more compositional approach to
programming, in which the communication patterns between components
are kept simple. A more formal version of this principle is the Principle of Least
Knowledge (aka the Law of Demeter, named after the Demeter project
of Prof. Lieberherr).
Review of Guide for a Well-Structured Research Paper
- Macro structure: Read the topic sentences of each paragraph.
Does it tell a story? If not, try breaking up a paragraph,
merging a paragraph, or inserting a new paragraph to fill
a logical gap in your writing. (Note that the topic of a
paragraph may actually occupy the first two short sentences,
or else a truncated first sentence ending in ';' or ':'.)
- As a mechanical aid, I am providing the following Python script:
topic-sentences.py.
It should figure out what is a paragraph (whether it's
latex (a blank line introduces a paragraph) or markdown ('#' introduces
a paragraph)), and print your topic sentences.
Beware: Some systems use python, while others (like Khoury login)
use python3. Adjust line 1 of the script for your system.
- Micro structure (within a paragraph):
Recall the lessons from the classic paper,
"The Science of Scientific Writing".
- Technical sentences are almost always of the form:
Subject-Verb-Object.
The stress is the object of your current sentence.
The topic is the subject. The stress of the current sentence
must match the topic of your next sentence.
(This is important because of "reader expectations".
If you defy the reader's expectations, you do so at your
own peril. The responsibility for being read resides
solely with the writer.)
- The stress of the current sentence doesn't always match
exactly the topic of your next sentence. The topic of the
next sentence might be: an example ("For example"); a contrasting
situation ("However"); a special case ("In particular");
a generality ("Indeed"); a logical conclusion ("Therefore"); and so on.
Don't leave your reader to guess what is the connection between
your stress and the next topic. Clue the reader in by introducing
the next sentence with a connecting phrase. If you don't use
a connecting phrase, then the default assumption is that the
next topic already matches exactly the previous stress.
- And a third sub-principle, not discussed in class, is to
beware of pronouns. Technical documents usually keep pronouns
to an absolute minimum --- because of concerns for precise
language. If a pronoun could match more than one noun phrase,
then the reader's expectations are violated, and the reader must
stop to search for the intended noun phrase. Ideally, if you must
use a pronoun, it should refer to the last noun phrase used,
or maybe the second-to-last if it is nearby.
(CONSIDER: "I'm looking at the boy on the hill with the telescope.
I'm looking straight at it!" Probably "it" is the "telescope". But
in precise technical language, it's better to repeat "hill"
or "telescope" -- whichever is intended.)
(There is also a parallel construction in some situations. For example:
"There are three cases. Case 1 is .... Case
2 is .... Case 3 is ...."
So, if stress+topic doesn't fit in a few sentences, maybe your
text falls into this special case. In this special case,
the first parallel sentence is a template, and each remaining
parallel sentence should use the same template.)
Given this macro-structure (topic sentence of each paragraph) and
micro-structure (previous sentence "stress" and the next sentence "topic"),
you should then use these criteria for debugging your writing.
If a structure rule is violated, then use this to consider rewriting:
- Maybe you can split a paragraph, or insert a new paragraph
to make sure that there are no gaps in your logical story.
- If the previous sentence stress doesn't match the following
sentence topic, then consider splitting the paragraph exactly
at the point where the topic stops matching the previous stress.
- If you're almost never using connecting phrases to say why
the previous sentence "stress" and the next sentence "topic"
are matching, then maybe it's because your "stress" and "topic"
are not matching at all! Normally, we don't bother with a
connecting phrase when the stress and topic are only matching
approximately. If they do indeed match approximately but not
exactly, then it should be easy to insert connecting phrases to
explain how they match ("However", "For example", etc.).
If you're not sure how to reorganize a paragraph or a sentence, then
ask an LLM (Google Gemini, ChatGPT, etc.). For example, you can ask:
"Please split the following paragraph into several paragraphs: ..."
If your paragraph is too "choppy" (maybe because of poor matching
of stress-and-topic), try asking an LLM:
"Please rephrase the following paragraph so that it flows better: ..."
This is playing to the strengths of an LLM. An LLM tries to
predict what should be the next phrase. So, an LLM will notice
when the next phrase is not what it would have predicted, and it
will automatically create a new sentence or a new paragraph where
the original text fails to match the LLM's prediction.
Of course, LLMs are not perfect. You will still have to exercise
editorial control. But if you begin with your own first draft (maybe
even with stream-of-consciousness, just to get words onto paper), then
the second stage will be much easier: Now you can let the LLM be the
"author", and you will be the "editor". :-)
Office Hours:
Office hours are after class on Tuesday and Friday, and by appointment.
My office is in 336 WVH. My phone is (617) 373-8686.
(Also, feel free to drop into my office (336 WVH) or lab
(370 WVH) at any time. Interesting research ideas are often
best developed outside of a rigid schedule.)