CS 5600: Review for Midterm
Gene Cooperman

The following provides a summary on some of the topics that we studied more intensively and links to further information. You are still responsible for all of the pre-midterm readings listed on the syllabus -- not just what is listed here.
  1. Shell (processes, process startup, signals, file descriptors, entry in file table ("file handle"))
    1. Process API: fork(), execvp(), waitpid() (see man pages and Chap. 5 of OSTEP)
    2. Signals: signal(), kill() (see man pages)
    3. File descriptors: open(), read(), write(), dup(), dup2(), close() (see man pages)
    4. stdin, stdout, stderr (file descriptors: 0, 1, 2)
    5. Line 55: Process table in xv6
    6. Line 1019: struct task_struct (from Linux source code)
      NOTE: A Linux "task" is a generalized concept that can be a process (i.e., the primary thread of a process) or an arbitrary thread within a process.
    7. Line 11: Process startup (from xv6:exec.c:exec())
    8. Files and Directories (from Chap. 39 of OSTEP)
    9. Line 1: entry in file table (a "file handle")
      • Each file handle includes a reference count to that file. The reference count primarily counts the number of file descriptors referring to it. When no file descriptors refer to it, it can be removed.
        (EXAMPLE: Recall that when implementing pipes in your shell assignment, if the parent process forgot to close the file descriptor associated with the input to the pipe, then the child process reading the output of the pipe would never see an EOF: end-of-file.)
    10. Line 13: i-node entry in master block on disk
      • Each i-node (index node) for a file includes a reference count. The reference count can includes several types of references: file handle, directory containing a file, mapped region (mmap with MAP_SHARED (for shared memory)), etc. When no file descriptors refer to it, it can be removed.
  2. The ELF Format (ELF interpreter, dynamic and static linking, programmer's model)
    1. The ELF format
    2. NOTE: We are skipping the GOT (global offset table) and PLT (procedure linkage table) of ELF due to lack of time.
  3. Virtual Memory - concepts: page table, MMU, TLB (translation lookaside buffer), working set, page fault, page fault handler, thrashing, spatial locality, temporal locality, demand paging, page replacement algorithm, FIFO page replacement, LRU page replacement
    1. Paging: Introduction (Chap. 18 of OSTEP)
    2. Paging: Faster Translations (TLBs) (Chap. 19 of OSTEP)
    3. Clock algorithm (second-chance algorithm; often part of a more general lecture on virtual memory): (The clock algorithm is an example of a page replacement algorithm.)
      1. Animation of clock algorithm (unfortunately in Thai language, but enough English to be interesting)
      2. Page Replacement Algorithms (by Emmett Witchel, U. Texas)
      3. Virtual Memory (by Northwestern University; Clock algorithm: Slides 13 and 14)
      4. Virtual Memory (by Junfeng Yang, Columbia U.)
      5. Virtual Memory (by Carl Burch, Hendrix College)
    4. mmap(), mprotect(), /proc/PID/maps, /proc/PID/fd (see man pages)
    5. any additional concepts of mini-DMTCP assignments
  4. Threads (and concurrency)
    1. Threads -- As an initial framework for understanding threads, thnk of them in analogy to the core code of the shell. A shell might use fork(), execvp(), and waitpid(). A thread might use pthread_create(), a start function (specified in pthread_create()), and pthread_join().
      1. Concurrency and Threads (Chap. 26 of OSTEP)
      2. Thread API (Chap. 27 of OSTEP)
    2. Locks (Concurrency)
      1. Locks (from Chap. 28 of OSTEP) (pthread_mutex_lock(), pthread_mutex_unlock(); mutual exclusion; critical section)
      2. Semaphores (from Chap. 31 of OSTEP) (sem_post(), sem_wait(): Producer/Consumer (Bounded Buffer) problem: Chap. 31.4; Rest of chapter 31)
        • Producer-consumer problem: One semaphore counts work items in the buffer, and one semaphore to count empty slots in the buffer.
      3. Condition Variables (from Chap. 30 of OSTEP) (pthread_cond_signal(); pthread_cond_wait(); The general abstraction is usually called a monitor.)
        Monitors are an easier way to implement semaphores. See section 30.3 of OSTEP (Covering Conditions) for a traditional problem that is best implemented with monitors. Because I was not able to find a site on the web that presents the proper balance between the monitor abstraction, and the implementation of monitors with pthread_cond_XXX(), I have added a web page reviewing monitors.
        1. A monitor is a large critical section, with a "cutout" where blocked threads go to wait
        2. There is a mutex lock to protect the critical section of a monitor. The functions pthread_mutex_lock() and pthread_mutex_unlock() are used to define the critical section associated with the monitor.
        3. The "cutout", where blocked threads go to wait, is implemented by calls that are similar to sem_post()/sem_wait(). Specifically, the calls pthread_cond_signal() and pthread_cond_wait() are analogous to sem_post() and sem_wait(), respectively. They should be called only from within the monitor.
        4. Hence, the "count variable" from the implementation of a semaphore would be accessed and modified only from inside the critical section.
        5. pthread_cond_wait: used inside implementation of sem_wait; a thread calls it if count went negative and they need to block.
        6. pthread_cond_signal: used inside implementation of sem_post; a thread calls it to wake up all threads in the "cutout". Each thread from the cutout then waits its turn to grab the mutex lock and enter the critical section.
        7. In many implementations (including Linux), a thread may receive a spurious wakeup (for example, if it receives a signal while it is blocked). For this reason, in well-structured code, any thread that is woken up should ideally test a condition and then call pthread_cond_wait again if the condition is not satisfied. This is why the calls pthread_cond_wait/pthread_cond_signal refer to a condition variable. The condition variable is a generalization of the count variable that is used by a semaphore.
  5. Scheduling - concepts: fairness, progress, FIFO, round-robin, time-slicing, scheduling quantum (time for one slice), overlap of computation with I/O
    1. swtch.S (from xv6)
    2. Scheduling: Multi-level Feedback (Chap. 8 of OSTEP)
    3. Multi-CPU Scheduling
    4. NOTE: We are skipping multi-level queues (queues with priorities) due to lack of time. You will find a chapter on them in OSTEP, Chapt. 9, but you do not need this information for the exam.