We saw two versions of a parallel sum - one using fork and shared memory (via mmap with MAP_SHARED) and one using threads, where memory is shared implicitly (globals live in the data segment if initialized; file-scope variables can use the static specifier)
Both versions had a data race: Thread A reads sum from memory; Thread B reads sum from memory, adds its part, and writes sum back; Thread A then adds its part to the stale value and writes sum back to memory, discarding the work done by Thread B
We need mutual exclusion, i.e., threads are mutually excluded from running a piece of code that needs the shared variable (a critical section)
Idea: some sort of “lock”
We (try to) acquire a lock before we enter a critical piece of code accessing/modifying shared memory
When we acquire the lock, we do our modification
After we are done, we release the lock
Roughly (and incorrectly):

    while (is_locked(var_lock)) {
        sleep(1);
    }
    lock(var_lock);
    do_important_stuff(var);
    unlock(var_lock);

If we do it using just plain shared variables, we run into the same problem: a data race (why? between the is_locked() check and the lock() call, another thread can slip in and take the lock first)
We need support from the OS to ensure locking and unlocking are performed atomically
A few ways to achieve this
pthread_mutex_init() - initialize with attributes; returns 0 on success
pthread_mutex_lock() - acquire the mutex (0 on success)
pthread_mutex_unlock() - release the mutex (0 on success); one of the pending lock() calls in other threads will then return, acquiring the mutex
A more general primitive: a semaphore - just an integer with some operations associated with it
A lock (mutex) is just a special case of a semaphore
Idea: if the semaphore is 0, we have to wait, if the semaphore > 0, we’re good to go
sem_init - initialize a semaphore
sem_wait - waits for the semaphore to become nonzero, then decrements it by 1 atomically
sem_post - increments semaphore by one atomically
If we want the semaphore to be shared, we need to allocate it as a shared variable
Example - using a semaphore as a lock (max value 1)
sem = 1
proc A             proc B             sem
sem_wait(sem);     .                  0
do_work();         sem_wait(sem);     0   // sem_wait blocks in B
do_more_work();    .                  0
.                  .                  0
.                  .                  0
.                  .                  0
sem_post(sem);     .                  1   // sem_wait returns in B
.                  do_work();         0
sem_wait(sem);     do_more_work();    0   // sem_wait blocks in A
.                  .                  0
.                  sem_post(sem);     1   // sem_wait returns in A
do_work();         .                  0

See sum_semaphores.c
Problem? It’s slower than the sequential example!!
Try
$ time ./sem-sum
and observe how much time is spent in the kernel (the "sys" line). Compare this to the other versions.
Kernel is doing a lot of extra work managing our semaphore
Semaphores with higher values are used, for example, for counting resources