Web Resources for CS7670 (Computer Systems Seminar)

Instructor: Gene Cooperman
Fall, 2024
HOURS: Tuesday and Friday, 3:25 pm - 5:05 pm
This web page is mostly from when the course was taught in Fall, 2023. The course will have a similar emphasis in Fall, 2024. This web page will evolve. Please check back later.

Ideally, the student will have had a graduate Computer Systems course or the equivalent (e.g., advanced undergrad work, or a significant systems project). If you are unsure of your background, see if you can read about a new system call (for example, man mmap) and then write a short program using this system call.

But even if the student had only a standard undergraduate course, my plan is to offer a small number of optional, _light_ homeworks at the beginning for students who are not sure if they have the background. I will then use this as a diagnostic (not for a grade), and offer a 1-on-1 rapid tutorial to help catch them up to the level assumed for this course.

Source Material for Introductory Lectures

We will begin the course with an overview of GPUs and LLMs, and how they impact computer systems, and vice versa. You'll find the "raw" material for my lectures in this:
Overview of GPUs and LLMs

Note also a page that will evolve with suggested papers for reading and/or presentation:
Papers to Consider Reading

And once a course wiki is available, we will create wiki pages with research tracks. I have created a first research track already, on:
A Research Track: Can a model checker serve as an oracle to help an LLM reason about code and program traces?

We now have a course Wiki at:

CS 7670 course wiki

Scientific Writing

Written technical communication is a key skill. There are rules (even recipes) for good technical writing, and I intend to teach those rules and recipes. For example, there are global recipes, such as reading just the first sentence of each paragraph for content; and there are local recipes such as the "Structure of Prose" in
     "The Science of Scientific Writing"
(from American Scientist). Good technical writing is something that you can take with you for a lifetime.

The "big discovery" in this article is that you need to be aware of what the reader is anticipating will be in the next sentence. If you can understand what the reader is anticipating next, then you can "fill in the blanks for the reader". Work with the reader's preconceptions --- not against them.

If you are joining the class remotely, please go to: this Zoom link. Twinkle Jain has kindly offered to maintain this link. (We'll risk placing this link on a publicly visible course web page, without using a waiting room for now. It will be obvious if some unknown person joins us.)

LLMs (e.g., ChatGPT) and other Destabilizing Technologies in Computer Systems

This course will emphasize reading of papers and presentations. Each student will also either write a paper on some topic (perhaps a survey; perhaps ideas for future research), or else write some experimental prototype code to test the feasibility of some approach (and then write a document on the results found).

The course is intended to emphasize "blue sky" dreaming about what might be possible. This is intended to promote a spirit of fun exploration and dialogue, where there are no wrong ideas, and anything is possible. My own job will be to connect some of the "blue sky" ideas with impromptu lectures on existing approaches in computer systems and related subjects, to better understand what parts are feasible today, and what tools are needed to make other ideas feasible in the landscape of tomorrow.

If you have a good background in Computer Systems, then you should have the right background. (I define a good background to mean that you can read the man page for 'mmap', and then easily use mmap and munmap in an interesting way in a program.) As a seminar course, the course is designed to be flexible and respectful of the student's time, while also providing an in-depth experience of some of the questions at the forefront of computer systems research.
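As a rough illustration of the mmap self-check mentioned above, here is a minimal sketch. It is written in Python, whose standard mmap module wraps the underlying system call; a C version would call mmap and munmap directly. This sketch is not part of the course material, and the function names uppercase_file_in_place and demo are hypothetical.

```python
import mmap
import os
import tempfile


def uppercase_file_in_place(path):
    """Map the file at `path` into memory and uppercase its bytes in place.

    Returns the number of bytes mapped (0 for an empty file).
    """
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # Length 0 maps the whole file; ACCESS_WRITE gives a write-through mapping.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        try:
            # Slice assignment writes to the mapped memory, and so to the file.
            mm[:] = mm[:].upper()
            mm.flush()  # push changes to the file (msync on POSIX systems)
        finally:
            mm.close()  # release the mapping (munmap on POSIX systems)
        return size


def demo():
    """Create a temporary file, modify it through a mapping, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, mmap\n")
        uppercase_file_in_place(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The point of the exercise is that the file is never written with an explicit write call after it is mapped: the change is made through ordinary memory operations on the mapping, and it is visible when the file is read back.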

EXAMPLE: ChatGPT and Algorithmic Debugging

In order to start us off with "blue sky" dreaming, I'll present here my own dream. You're welcome to collaborate on this dream, or to collaboratively invent a dream of your own. This dream is to analyze the potential of using ChatGPT and similar LLMs (Large Language Models) as "cognitive assistants" in writing application code. (I will often refer to ChatGPT as a generic example of an LLM. Personally, I've been playing with Google Bard. For an example, see this request to Bard to write code for testing for malloc bugs such as double-free.) In the past, one spent a lot of time carefully designing code, with enough assertions, unit tests, and functional tests to ensure correctness. With ChatGPT, we can now take a plain-English specification (even an ambiguous specification) and rapidly write code, along with some simple tests of that code. But the code will be filled with bugs.

So, code-writing is turned on its head: Instead of carefully writing correct code, we knowingly and rapidly write incorrect code, and then try to fix the code. Luckily, there is interesting past work on how to debug an algorithmic specification. Curiously enough, this subject is called Algorithmic Debugging, which originally developed out of this 40-year-old PhD thesis by Ehud Shapiro.

In a nutshell, Algorithmic Debugging looks at a buggy program with an example input-output pair that demonstrates a bug. If the top-level routine of the program call graph calls routines A, B, and C, then the system presents to the programmer (or to an oracle) the input-output pairs for each routine, and asks whether any of the pairs is incorrect. If B has a bug, it presents the routines X, Y, and Z that are called by B, and asks which of those routines has a bug. By breaking a program into its components, it can then identify the component that has the bug. If that component is small enough, then we can provide ChatGPT with a plain-English specification for the routine, and even try to ask ChatGPT to fix the bug in the routine.

If successful, a side benefit of this approach is to encourage programmers to emphasize a more compositional approach to programming, in which the communication patterns between components are kept simple. A more formal version of this principle is the Principle of Least Knowledge (aka the Law of Demeter, named after the Demeter project of Prof. Lieberherr).
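The top-down search of Algorithmic Debugging described above can be sketched in a few lines. This is a simplified illustration, not Shapiro's original formulation: it assumes that a call tree with input-output pairs has already been recorded, and that an oracle (a programmer, or possibly an LLM or a model checker) can judge each pair. The names Call, find_buggy_call, SPEC, and oracle are hypothetical, and the oracle here is simulated by a table of reference implementations.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Call:
    """One node of a recorded call tree: a routine plus its observed input and output."""
    routine: str
    args: Tuple[Any, ...]
    result: Any
    children: List["Call"] = field(default_factory=list)


def find_buggy_call(call: Call, oracle: Callable[[Call], bool]) -> Call:
    """Given a call whose input-output pair is known to be wrong, locate the culprit.

    The oracle returns True when a call's input-output pair is correct.
    """
    for child in call.children:
        if not oracle(child):
            # A callee produced a wrong answer: descend into it.
            return find_buggy_call(child, oracle)
    # Every callee behaved correctly, so the bug is in this routine's own body.
    return call


# Simulated oracle (an assumption for this sketch): reference implementations.
SPEC = {
    "sum_of_squares": lambda xs: sum(x * x for x in xs),
    "square": lambda x: x * x,
    "multiply": lambda a, b: a * b,
}


def oracle(call: Call) -> bool:
    return SPEC[call.routine](*call.args) == call.result


# A hypothetical recorded trace in which multiply(3, 3) wrongly returned 10.
trace = Call("sum_of_squares", ([1, 2, 3],), 15, [
    Call("square", (1,), 1, [Call("multiply", (1, 1), 1)]),
    Call("square", (2,), 4, [Call("multiply", (2, 2), 4)]),
    Call("square", (3,), 10, [Call("multiply", (3, 3), 10)]),
])

if __name__ == "__main__":
    culprit = find_buggy_call(trace, oracle)
    print(culprit.routine, culprit.args)
```

In this sketch the oracle is asked only about the callees of a routine already known to be wrong, so the number of questions grows with the depth and fan-out of the call tree rather than with the size of the whole program. Once the culprit is a small routine, that routine and its plain-English specification are what one would hand to the LLM.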

Review of Guide for a Well-Structured Research Paper

  1. Macro structure: Read the topic sentence of each paragraph. Do the topic sentences tell a story? If not, try breaking up a paragraph, merging a paragraph, or inserting a new paragraph to fill a logical gap in your writing. (Note that the topic of a paragraph may actually fill two short beginning sentences, or else a truncated first sentence ending in ';' or ':'.)
  2. Micro structure (within a paragraph): Recall the lessons from the classic paper, "The Science of Scientific Writing".
    1. Technical sentences are almost always of the form: Subject-Verb-Object. The stress is the object of your current sentence. The topic is the subject. The stress of the current sentence must match the topic of your next sentence. (This is important because of "reader expectations". If you defy the reader's expectations, you do so at your own peril. The responsibility for being read rests solely with the writer.)
    2. The stress of the current sentence doesn't always match exactly the topic of your next sentence. The topic of the next sentence might be: an example ("For example"); a contrasting situation ("However"); a special case ("In particular"); a generality ("Indeed"); a logical conclusion ("Therefore"); and so on. Don't leave your reader to guess what the connection is between your stress and the next topic. Clue the reader in by introducing the next sentence with a connecting phrase. If you don't use a connecting phrase, then the default assumption is that the next topic already matches exactly the previous stress.
    3. And a third sub-principle, not discussed in class, is to beware of pronouns. Technical documents usually use pronouns at an absolute minimum --- because of concerns for precise language. If a pronoun could match more than one noun phrase, then the reader's expectations are violated, and the reader must stop to search for the intended noun phrase. Ideally, if you must use a pronoun, it should refer to the last noun phrase used, or maybe the second-to-last if it is nearby. (CONSIDER: "I'm looking at the boy on the hill with the telescope. I'm looking straight at it!" Probably "it" is the "telescope". But in precise technical language, it's better to repeat "hill" or "telescope" -- whichever is intended.)
    (There is also a parallel construction in some situations. For example: "There are three cases. Case 1 is .... Case 2 is .... Case 3 is ...." So, if stress+topic doesn't fit in a few sentences, maybe your text falls into this special case. In this special case, the first parallel sentence is a template, and each remaining parallel sentence should use the same template.)

Given this macro-structure (topic sentence of each paragraph) and micro-structure (previous sentence "stress" and the next sentence "topic"), you should then use these criteria for debugging your writing. If a structure rule is violated, then use the violation as a cue to consider rewriting:

  1. Maybe you can split a paragraph, or insert a new paragraph to make sure that there are no gaps in your logical story.
  2. If the previous sentence stress doesn't match the following sentence topic, then consider splitting the paragraph exactly where the following sentence topic fails to match the previous stress.
  3. If you're almost never using connecting phrases to say why the previous sentence "stress" and the next sentence "topic" match, then maybe it's because your "stress" and "topic" do not match at all! Normally, we don't bother with a connecting phrase when the stress and topic match only approximately. If they do indeed match approximately but not exactly, then it should be easy to insert connecting phrases to explain how they match ("However", "For example", etc.).
If you're not sure how to reorganize a paragraph or a sentence, then ask an LLM (Google Bard, ChatGPT, etc.). For example, you can ask:
"Please split the following paragraph into several paragraphs: ..."
If your paragraph is too "choppy" (maybe because of poor matching of stress-and-topic), try asking an LLM:
"Please rephrase the following paragraph so that it flows better: ..."

This is playing to the strengths of an LLM. An LLM tries to predict what the next phrase should be, so it will notice when the next phrase is not what it would have predicted. The LLM will then tend to create a new sentence or a new paragraph where the original text fails to match its prediction.

Of course, LLMs are not perfect. You will still have to exercise editorial control. But if you begin with your own first draft (maybe even with stream-of-consciousness, just to get words onto paper), then the second stage will be much easier: Now you can let the LLM be the "author", and you will be the "editor". :-)

Office Hours: Office hours are after class on Tuesday and Friday, and by appointment. My office is 336 WVH. My phone is (617) 373-8686. (Also, feel free to drop into my office (336 WVH) or lab (370 WVH) at any time. Interesting research ideas are often best developed outside of a rigid schedule.)