We have described in class described in class a hierarchy of
thread synchronization approaches:

* pure shared memory (examples:  Dekker's algorithm, Petersen's algorithm)
    - See Wikipedia: https://en.wikipedia.org/wiki/Mutual_exclusion#Software_solutions
* test-and-set or compare-and-swap (depending on the CPU instruction set)
* lock-free (non-blocking; built on top of compare-and-swap)
    - See Wikipedia:
       https://en.wikipedia.org/wiki/Non-blocking_linked_list
       https://en.wikipedia.org/wiki/Non-blocking_algorithm
       https://en.wikipedia.org/wiki/ABA_problem
         (see https://course.ccs.neu.edu/cs7600/parent/thread-synch/aba-problem.pdf )
* mutex (related concepts:  spinlock as a type of busy waiting, 'man futex')
* semaphore (and 'man pthread_barrier' can easily be built on top of semaphores)
* condition variables (and 'man pthread_rwlock_rdlock/wrlock/unlock' easily
                       built on top of condition variables)

As we move through the hierarchy, we move from the very efficient to the
less efficient.  However, in return, we gain the ability to more _easily_
construct code for general cases of thread synchronization.

At the most general end of the hierarchy, the discussion of the paper by
Lu et al. in (ostep.org: Chapter 32: Concurrency Bugs).  In addition to
deadlock bugs, Lu et al. identify violations of atomicity and ordering
as frequent bugs in real-world software.  The original paper of Lu et al.
is very readable:
  "Learning from Mistakes — A Comprehensive Study on Real World
  Concurrency Bug Characteristics" by Shan Lu, Soyeon Park, Eunsoo Seo,
  Yuanyuan Zhou. ASPLOS '08, March 2008, Seattle, Washington.
And this is part of a an excelent series of papers on "Concurrency/Multicore":
  http://opera.ucsd.edu/pub_concurrency.html
And there is an equally interesting series of papers on "Bug Detection" in:
  http://opera.ucsd.edu/pub_system_depend.html#bug-detection
This is from http://opera.ucsd.edu/pub_chronological.htm, via the home
page of Yuanyuan Zhou.

 The first in-depth study of concurrency bugs in real software, and the basis
for this chapter. Look at Y.Y. Zhou’s or Shan Lu’s web pages for many more
interesting papers on bugs.

At the lowest level in the hieararchy, the paper by Sarita Adve et al.:
  Sarita V. Adve, Kourosh Gharachorloo:
  "Shared Memory Consistency Models: A Tutorial". Computer 29(12): 66-76 (1996)
is excellent.  (Actually, I have a dim memory of a different paper by Adve that
I will look for.)  The paper discusses how different memory models are affected by:
 1. compiler optimizations: reordering instructions involving shared variables
 2. hardware re-ordering of instructions and of write buffers
There are weak and strong memory models, with sequential consistency
the strongest of them all, and the one that we usually reason with.
But most modern CPUs use weaker memory models.  These models are called
_memory consistency_ models.

Finally, we should not neglect the 'volatile' keyword of C, C++ and Java.
This forces the compiler to load from RAM (by invalidating a cache line)
whenever the declared variable is read.  (And this happens to any
local variable (or global???) when a function call to a new function is made.)

And last, there are the MESI states for handling _cache coherency_
for multiple CPU core caches (L1 caches).  I'll leave you to use these
keywords to look it up yourself, if you are interested.  But here's one
place to start:
  https://en.wikipedia.org/wiki/MESI_protocol