Web Resources for CS7670
(Computer Systems Seminar)
Instructor:
Gene Cooperman
Fall, 2024
PLACE: 435 Ryder Hall
HOURS: Tuesday and Friday, 3:25 pm - 5:05 pm
Course Syllabus
Please see the separate syllabus document
on this web site.
How To Do Research (and the course wiki, and topic-sentences.py)
There is a directory with a series of short
essays on how to do research, and how to read and write technical
papers. Please read that section thoroughly.
Course Wiki:
The Wiki home page is here, for the course.
(Restricted to CS 7670 students only; use Khoury credentials.)
I have now improved topic-sentences.py. (Click on the link to the script, and copy-paste into your local file.)
It works reasonably well when given a markdown or latex file as input, and can output
html, pdf, and even plain text or markdown or latex (topic sentences only).
But this Python script is still a work in progress.
Feel free to report any bugs that you see.
The topic-sentences.py program extracts the first sentence of each paragraph, so that it's easy to read any
document and see if it flows globally. A typical usage might be:
Download your favorite paper from arxiv.org, using the "TeX Source" file (.tar.gz), then untar it, and run the script on main.tex:
python3 topic-sentences.py /tmp/gene/arxiv/main.tex --html
which results in:
The file /tmp/gene/arxiv/main-summary.html was written.
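For the curious, the core idea of the script can be sketched in a few lines of Python. (This is a simplified illustration, not the actual topic-sentences.py; the real script also handles latex and markdown input and several output formats.)

    import re
    import sys

    def topic_sentences(text):
        """Keep only the first sentence of each blank-line-separated paragraph."""
        summary = []
        for para in re.split(r"\n\s*\n", text):     # a blank line starts a new paragraph
            para = " ".join(para.split())           # collapse internal line breaks
            if not para:
                continue
            # Take everything up to the first '.', '!', or '?' at a word boundary.
            match = re.match(r".*?[.!?](?=\s|$)", para)
            summary.append(match.group(0) if match else para)
        return "\n\n".join(summary)

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:
            print(topic_sentences(f.read()))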
Course Vision
In 2024, the research topics will center around code LLMs.
However, the course will also accommodate a variety of other tracks
in computer systems, not necessarily related to LLMs.
The largest (and probably most popular) track will be
LLMs. Code LLMs are a sub-field that is only now emerging.
A central question of this track is:
How can LLMs help us to debug?
Other related questions are:
- How can LLMs engage in a multi-stage dialog of "human and machine",
investigating a bug?
- How can LLMs help us generate unit tests?
- To what extent can LLMs generate code from specifications?
- Since LLMs often create bugs when they generate code, how can
one use a dialog to help an LLM fix its own bugs?
- To what extent can LLMs reason about code logic,
given that LLMs are notoriously bad at mathematical reasoning?
If you are looking to gain an understanding of how LLMs can help
in debugging, I strongly recommend reading the paper:
ChatDBG: An AI-Powered Debugging Assistant
(and note the
ChatDBG github repo, which is open source)
As befits the infancy of this emerging field of code LLMs, this paper
appeared in March, 2024 (only 6 months before this course).
In this jewel of a paper, ChatDBG is applied to "bugs in a collection
of student labs from two introductory computer science courses" largely
emphasizing small programs written in scripting languages. The authors note:
"For the Python programs, a single query led to an actionable bug fix 67%
of the time; one additional follow-up query increased the success rate
to 85%."
The components of this successful demonstration are:
- Enriched Stack Trace
- "Take the Wheel" (allow ChatDBG to run debugger commands executing
parts of the code)
- Targeted Question (ask a question specific to the failure)
- Dialog (extend the chat with a second query)
This architecture conforms with the following working hypothesis by the
instructor's research team:
Three prerequisites for using code LLMs to debug are:
- The code LLM must be presented with sufficient example programs
as training data.
(ChatDBG relies on ChatGPT having already been
trained on many instances of Python and Jupyter notebooks.)
- The usual human-centered debug outputs must be
enhanced or instrumented with greater context,
in order for the LLM to "understand" the debugging output.
Enhanced debug outputs can include:
a more detailed stack trace; an annotated dynamic execution trace;
values of variables that changed, and
relevant source code for each call frame.
(For example, ChatDBG's enriched stack
includes a window of 10
source lines, types and values of variables for each frame,
and suppression of detailed information for call frames from
standard libraries. A Python sketch of this idea appears after this list.)
- Dialogs should be restricted to standard, formulaic questions
that have been shown to be successful with LLMs.
(For example, "How did this variable become
null?")
Resources to explore this vision
For students who are interested in a more "hands-on" approach to this subject,
we can offer a concrete problem: debugging of multi-threaded programs.
This is an important research problem of the instructor's research team.
Students who become interested will have the opportunity to continue
that research as co-authors after the semester is finished.
(It is important to keep a sharp line between the course and any
extended joint research, for ethical reasons, so that students
are not exploited.)
In this hands-on approach, the research team
has built a model checker (McMini), which is easy to install,
and which detects race conditions, deadlocks, etc.,
in multi-threaded C/C++ programs.
While no LLM has yet been seriously used in this situation, McMini already
satisfies two of the prerequisites for an LLM:
- Enriched debugging traces:
A bug is presented as
an "enriched" dynamic execution trace. See for example the following
case:
McMini tutorial, "Analyzing the bug for bufferSize==1", for the
flavor of the execution trace. (Or start the tutorial from the beginning
for a full understanding.)
- Take-the-wheel:
McMini also has a command mcmini-gdb, which allows a user
to execute a trace within GDB, both going forward and backward.
This would become the basis for developing an
analog of ChatDBG's "take the wheel" feature.
The mcmini-gdb command used above provides
the enriched execution trace through the
Python API for GDB. Once one has seen examples of GDB augmented by Python,
it is easy to extend such examples arbitrarily in Python, as in the sketch below.
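For the flavor of such an extension, here is a minimal sketch of a custom GDB command written with GDB's Python API. (The command name enriched-bt is hypothetical; this is not part of McMini or mcmini-gdb.)

    # enriched_bt.py: load inside GDB with:  (gdb) source enriched_bt.py
    import gdb

    class EnrichedBacktrace(gdb.Command):
        """Print a backtrace annotated with the local variables of each frame."""

        def __init__(self):
            super().__init__("enriched-bt", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            frame = gdb.newest_frame()
            while frame is not None:
                gdb.write("frame: %s\n" % (frame.name() or "??"))
                try:
                    for symbol in frame.block():
                        if symbol.is_variable or symbol.is_argument:
                            gdb.write("  %s = %s\n" % (symbol.name, symbol.value(frame)))
                except RuntimeError:
                    gdb.write("  (no symbol information)\n")
                frame = frame.older()

    EnrichedBacktrace()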
Source Material for Introductory Lectures
This was originally based on the version of the seminar from 2023.
While some of this continues to be useful, this description has
been augmented based on the interests of the class.
We will begin the course (after looking at ChatDBG for motivation)
with an overview of GPUs and LLMs, and how
they impact computer systems, and vice versa.
You'll find the "raw" material for my lectures in this:
Overview of GPUs and LLMs
We have a course Wiki at:
CS 7670 Course Wiki
Note also a page that will evolve with suggested papers for reading
and/or presentation:
Papers to Consider Reading
The course Wiki also hosts research tracks,
and I have already created a first research track, on:
A Research Track: Can a model
checker serve as an oracle to help an LLM reason about code
and program traces?
Typical Questions in Writing a Survey Paper
Study of Mechanistic Interpretability
- What are the 'layers' of an LLM?
(read only that one section)
- Reversing Transformer to understand In-context Learning with Phase change & Feature dimensionality
(April, 2024; updated discussion of the original Anthropic paper;
this review is more readable than the original paper,
"In-context Learning and Induction Heads",
from March, 2022)
- Key definitions and concepts:
- logit (the unnormalized weight for a "neuron")
- layers (see above)
- MLP: Multi-Layer Perceptron (layers of neurons, related to a
traditional neural net for deep learning)
- loss function: In training, the input leads to a predicted output.
In addition to using backpropagation to adjust the weights,
the loss function measures (on average) the
difference between the actual output vector and a perfectly
predicted output vector (the output vector of the input-output training
example). In practice, the loss never becomes zero; when it nears
an asymptote, we can stop the training.
- types of attention heads: copying head (copy tokens of importance);
induction head (completing a pattern, e.g., a skip-trigram in a two-layer
transformer: [A] ... [B] → [C] for tokens A, B, and C; recall that induction in logic means to infer a new fact from many previous facts)
- Attention block: The neurons employed in a particular attention
head. Since all attention heads operate in parallel, the "neurons"
are divided into "blocks", one block for each attention head.
- A transformer circuit can be represented as individual tensors,
W_O (output), W_V (value),
W_Q (query), and W_K (key), which
allow for computational efficiency.
- But in the article "Reversing Transformer to understand In-context
Learning ...", we see that conceptually, we can multiply
to get an OV circuit (W_O W_V),
and a QK circuit (W_Q W_K), which
conceptually yield more insight into the functionality
of an LLM. (As before, see the article. A numpy sketch of these
circuits appears after this list.)
- W_O W_V and W_Q W_K are
low-rank matrices (the mathematical concept
of "sparse information": high dimension, but low rank),
while the four individual matrices have "dense information",
but are hard to interpret mechanistically.
- epoch -- W_V is applied to the residual stream, and
then the attention heads, and then W_O. This is
one epoch. Then the output becomes the new input, and another
epoch commences.
- Superposition Hypothesis for steering LLM with Sparse AutoEncoder
(April, 2024; another excellent summary of Anthropic work)
- QUESTION: If there are neurons that activate based on specific topics, could activating certain neurons force generation on those topics?
ANSWER: Sparse autoencoder features can be used to intervene on and steer transformer generation.
- Related Papers of Anthropic:
- Toy Models of Superposition (Sept., 2022)
- Superposition, Memorization, and Double Descent (Jan., 2023)
- Distributed Representations: Composition & Superposition
(May, 2023; click on "Read Paper")
- Steering of LLMs (to discuss; also, see
Papers to Consider Reading)
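As promised above, here is a minimal numpy sketch of the QK and OV circuits. (The dimensions are hypothetical, and the transpose in the QK product is one common convention, following from the shapes chosen here.)

    import numpy as np

    d_model, d_head = 64, 8                    # hypothetical dimensions
    rng = np.random.default_rng(0)
    W_Q = rng.normal(size=(d_head, d_model))   # query
    W_K = rng.normal(size=(d_head, d_model))   # key
    W_V = rng.normal(size=(d_head, d_model))   # value
    W_O = rng.normal(size=(d_model, d_head))   # output

    # The QK circuit governs which source tokens a destination token attends to;
    # the OV circuit governs how an attended-to token affects the output.
    QK = W_Q.T @ W_K                           # (d_model, d_model)
    OV = W_O @ W_V                             # (d_model, d_model)

    # Each circuit is high-dimensional (64 x 64) but low rank (at most 8):
    print(np.linalg.matrix_rank(QK), np.linalg.matrix_rank(OV))   # prints: 8 8

This makes the "sparse information" point concrete: the 64 x 64 circuit matrices have rank at most d_head = 8.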
Study of Code LLMs
TODO
Scientific Writing
Written technical
communication is a key skill. There are rules (even recipes)
for good technical
writing, and I intend to teach those rules and recipes.
For example, there are global recipes, such as reading just
the first sentence of each paragraph for content; and there
are local recipes such as the "Structure of Prose" in
"The Science of Scientific Writing"
(from American Scientist). Good technical
writing is something that you can take with you for a
lifetime.
The "big discovery" in this article is that you need to be aware
what the reader is anticipating will be in the next sentence.
If you can understand what the reader is anticipating next,
then you can "fill in the blanks for the reader". Work with
the reader's preconceptions --- not against.
- Rule #1: Gain a distance from your writing, so that you can
see how it appears to others.
TRICK TO GAIN DISTANCE: Read only the topic sentence of your paragraphs.
Does it tell a story?
If you are joining the class remotely, please go to:
this Zoom link.
Twinkle Jain has kindly offered to maintain this link.
(We'll risk placing this link on a publicly visible course web page,
without using a waiting room for now. It will be obvious if
some unknown person joins us.)
LLMs (e.g., ChatGPT) and other Destabilizing
Technologies in Computer Systems
This course will emphasize reading of papers and presentations.
Each student will also either write a paper on some topic (perhaps
a survey; perhaps ideas for future research), or else write some
experimental prototype code to test the feasibility of some approach
(and then write a document on the results found).
The course is intended to emphasize "blue sky" dreaming about what
might be possible. This is intended to promote a spirit of fun
exploration and dialogue, where there are no wrong ideas, and anything
is possible. My own job will be to connect some of the "blue sky"
ideas with impromptu lectures on existing approaches in computer systems
and related subjects, to better understand what parts are feasible today,
and what tools are needed to make other ideas feasible in the landscape
of tomorrow.
If you have a good background in Computer Systems, then you should be
well prepared. (I define a good background to mean that you
can read the man page for 'mmap', and then easily use mmap and munmap
in an interesting way in a program; see the sketch below.) As a seminar course, the course
is designed to be flexible and respectful of the student's time, while
also providing an in-depth experience of some of the questions at the
forefront of computer systems research.
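(As a quick self-test of that criterion, here is a minimal sketch in Python, whose mmap module wraps the same mmap/munmap system calls; the criterion above refers to the C interface, but the flavor is the same. The file path is arbitrary.)

    import mmap

    # Create a small file to map.
    with open("/tmp/demo.dat", "wb") as f:
        f.write(b"hello world\n")

    # Map the file into memory, modify it through the mapping, and unmap.
    with open("/tmp/demo.dat", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)   # like mmap(): map the whole file
        mm[0:5] = b"HELLO"              # write through the memory mapping
        mm.flush()                      # like msync(): push changes to the file
        mm.close()                      # like munmap()

    print(open("/tmp/demo.dat", "rb").read())   # b'HELLO world\n'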
EXAMPLE: ChatGPT and Algorithmic Debugging
In order to start us off with "blue sky" dreaming, I'll present here my own dream. You're welcome to collaborate on this dream, or to collaboratively invent a dream of your own.
This dream is to analyze the potential of using ChatGPT and similar LLMs (Large Language Models) as "cognitive assistants" in writing application code. (I will often refer
to ChatGPT as a generic example of an LLM. Personally, I've been playing with Google Gemini. For an example, see this request to Gemini to write code for testing for malloc bugs such as double-free.)
In the past, one spent a lot of time carefully designing code, with enough assertions, unit tests, and functional tests to ensure correctness. With ChatGPT, we can now take a plain-English specification (even an ambiguous specification) and rapidly write code, along with some simple tests of that code. But the code will be filled with bugs.
So, code-writing is turned on its head: Instead of carefully writing
correct code, we knowingly and rapidly write incorrect
code, and then try to fix the code. Luckily, there
is interesting past work on how to debug an algorithmic
specification. Curiously enough, this subject is called Algorithmic
Debugging, which originally developed out of this 40-year-old
PhD
thesis by Ehud Shapiro.
In a nutshell, Algorithmic Debugging looks at a buggy program with an
example input-output pair that demonstrates a bug. If the top-level
routine of the program call graph calls routines A, B, and C, then the
system presents to the programmer (or to an oracle) the input-output
pairs for each routine, and asks if one of the pairs is a bug. If B
has a bug, it presents the routines X, Y, and Z that are called by B,
and asks which of those routines has a bug. By breaking a program into
its components, it can then identify the component that has the bug.
If that component is small enough, then we can provide ChatGPT with a
plain-English specification for the routine, and even try to
ask ChatGPT to fix the bug in the routine.
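To make this recursive structure concrete, here is a minimal Python sketch of the idea. (The data structures are hypothetical, and this simplifies Shapiro's method, whose divide-and-query strategy is more refined.)

    class Call:
        """One node of the call tree: a routine and its observed input-output pair."""
        def __init__(self, name, inputs, output, children=()):
            self.name, self.inputs, self.output = name, inputs, output
            self.children = list(children)

    def find_buggy_call(node, oracle):
        """Return the innermost call whose input-output pair the oracle rejects.

        oracle(name, inputs, output) answers True if the pair is correct.
        The oracle is traditionally the programmer; in this dream, it could be
        an LLM given a plain-English specification of each routine.
        """
        if oracle(node.name, node.inputs, node.output):
            return None                       # this call behaved correctly
        for child in node.children:
            culprit = find_buggy_call(child, oracle)
            if culprit is not None:
                return culprit                # the bug lies inside a callee
        return node                           # all callees are correct: the bug is here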
If successful, a side benefit of this approach is to encourage
programmers to emphasize a more compositional approach to
programming, in which the communication patterns between components
are kept simple. A more formal version of this principle is the Principle of Least
Knowledge (aka the Law of Demeter, named after the Demeter project
of Prof. Lieberherr).
Review of Guide for a Well-Structured Research Paper
- Macro structure: Read the topic sentences of each paragraph.
Does it tell a story? If not, try breaking up a paragraph,
merging a paragraph, or inserting a new paragraph to fill
a logical gap in your writing. (Note that the topic of a
paragraph may actually occupy the first two short sentences,
or else a truncated first sentence ending in ';' or ':'.)
- As a mechanical aid, I am providing the following Python script:
topic-sentences.py.
It should figure out what is a paragraph (whether it's
latex (a blank line introduces a paragraph) or markdown ('#' introduces
a paragraph)), and print your topic sentences.
Beware: Some systems use python, while others (like Khoury login)
use python3. Adjust line 1 of the script for your system.
- Micro structure (within a paragraph):
Recall the lessons from the classic paper,
"The Science of Scientific Writing".
- Technical sentences are almost always of the form:
Subject-Verb-Object.
The stress is the object of your current sentence.
The topic is the subject. The stress of the current sentence
must match the topic of your next sentence.
(This is important because of "reader expectations".
If you defy the reader's expectations, you do so at your
own peril. The responsibility for being read resides
solely with the writer.)
- The stress of the current sentence doesn't always match
exactly the topic of your next sentence. The topic of the
next sentence might be: an example ("For example"); a contrasting
situation ("However"); a special case ("In particular");
a generality ("Indeed"); a logical conclusion ("Therefore"); and so on.
Don't leave your reader to guess what is the connection between
your stress and the next topic. Clue the reader in by introducing
the next sentence with a connecting phrase. If you don't use
a connecting phrase, then the default assumption is that the
next topic already matches exactly the previous stress.
- And a third sub-principle, not discussed in class, is to
beware of pronouns. Technical documents usually keep pronouns
to an absolute minimum --- because of concerns for precise
language. If a pronoun could match more than one noun phrase,
then the reader's expectations are violated, and the reader must
stop to search for the intended noun phrase. Ideally, if you must
use a pronoun, it should refer to the last noun phrase used,
or maybe the second-to-last if it is nearby.
(CONSIDER: "I'm looking at the boy on the hill with the telescope.
I'm looking straight at it!" Probably "it" is the "telescope". But
in precise technical language, it's better to repeat "hill"
or "telescope" -- whichever is intended.)
(There is also a parallel construction in some situations. For example:
"There are three cases. Case 1 is .... Case
2 is .... Case 3 is ...."
So, if stress+topic doesn't fit in a few sentences, maybe your
text falls into this special case. In this special case,
the first parallel sentence is a template, and each remaining
parallel sentence should use the same template.)
Given this macro-structure (topic sentence of each paragraph) and
micro-structure (previous sentence "stress" and the next sentence "topic"),
you should then use these criteria for debugging your writing.
If a structure rule is violated, then use this to consider rewriting:
- Maybe you can split a paragraph, or insert a new paragraph
to make sure that there are no gaps in your logical story.
- If the previous sentence stress doesn't match the following
sentence topic, then consider splitting the paragraph exactly
at the point where the topic stops matching the previous stress.
- If you're almost never using connecting phrases to say why
the previous sentence "stress" and the next sentence "topic"
are matching, then maybe it's because your "stress" and "topic"
are not matching at all! Normally, we don't bother with a
connecting phrase when the stress and topic are only matching
approximately. If they do indeed match approximately but not
exactly, then it should be easy to insert connecting phrases to
explain how they match ("However", "For example", etc.).
If you're not sure how to reorganize a paragraph or a sentence, then
ask an LLM (Google Gemini, ChatGPT, etc.). For example, you can ask:
"Please split the following paragraph into several paragraphs: ..."
If your paragraph is too "choppy" (maybe because of poor matching
of stress-and-topic), try asking an LLM:
"Please rephrase the following paragraph so that it flows better: ..."
This is playing to the strengths of an LLM. An LLM tries to
predict what should be the next phrase. So, an LLM will notice
when the next phrase is not what it would have predicted, and it
will automatically create a new sentence or a new paragraph where
the original text fails to match the LLM's prediction.
Of course, LLMs are not perfect. You will still have to exercise
editorial control. But if you begin with your own first draft (maybe
even with stream-of-consciousness, just to get words onto paper), then
the second stage will be much easier: Now you can let the LLM be the
"author", and you will be the "editor". :-)
Office Hours:
Office hours are after class on Tuesday and Friday, and by appointment.
My office is in 336 WVH. My phone is (617) 373-8686.
(Also, feel free to drop into my office (336 WVH) or lab
(370 WVH) at any time. Interesting research ideas are often
best developed outside of a rigid schedule.)