How to Use Mutexes
by Gene Cooperman (gene@ccs.neu.edu)
Copyright (c) 2021 -- all rights reserved

The framework for a mutex is clear:

  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  pthread_mutex_lock(&mutex);
  /* Critical section */
  pthread_mutex_unlock(&mutex);

In general, a mutex is used to enforce a critical section.  It is the
responsibility of the programmer, not the operating system, to place a
mutex lock around any code in which no more than a single thread should
execute.

In principle, one can do all multi-threaded programming with a single
mutex ("one big lock").  No matter what a thread wants to do, if it
involves a global, shared data structure, then the thread can acquire the
"one big lock", do its work, and then release the "one big lock".  But
this is inefficient, because it prevents two threads from working in
parallel.  So, a thread should try to spend as little time as possible in
the critical section, since the critical section prevents the use of
parallelism.  The later notes present special thread constructs (e.g.,
the semaphore and the condition variable) that allow more parallelism,
while at the same time spending as little time as possible within a
critical section.  This avoids the inefficient paradigm of "one big lock".

The critical section may be used in at least two distinct cases:

A. Temporarily gaining the right to use a _resource_.  A "resource" is a
   technical term best described through a simple example.  For example,
   maybe we don't want two threads to simultaneously write to the same
   file.  The right (permission) to write to that file is a resource.
   When we lock the mutex, we "acquire that resource".  When we unlock
   the mutex, we "release that resource".

B. Safely accessing and modifying some "state variables", which we will
   call _guard variables_.  The associated mutex will be called a
   _guard mutex_.  In this scenario, the programmer might use a
   three-phase strategy, shown by three functions for a resource:

  int resource_in_use = 0; // Global shared var, initialized to "not in use".
  pthread_mutex_t resource_mutex = PTHREAD_MUTEX_INITIALIZER;
  ...
  int acquire_resource(...) {
    int resource_acquired = 0; // false
    pthread_mutex_lock(&resource_mutex);
    if (! resource_in_use) {
      resource_in_use = 1;
      resource_acquired = 1; // true
    }
    pthread_mutex_unlock(&resource_mutex);
    return resource_acquired;
  }
  void use_resource(...) {
    /* Go ahead and use the resource */
  }
  int release_resource(...) {
    // Technically, we don't need the locks in this case.
    // We include the locks as a safer design in case we might
    //   read another guard variable or modify more than one guard variable.
    pthread_mutex_lock(&resource_mutex);
    resource_in_use = 0;
    pthread_mutex_unlock(&resource_mutex);
    return 1; // This always succeeds.
  }

The problem with gaining this greater parallelism is that it risks
deadlock.  There are at least two examples of how deadlock happens.

The first case (e.g., a naive solution to the dining philosophers
problem) is one in which each "fork" has a mutex around it.  The "fork"
is a _resource_ (i.e., the right to use the fork).  The problem occurs
because a thread (philosopher) needs two resources (forks), and it can
block while holding one resource and waiting for the second resource.
(This is sometimes called the "circular-wait" criterion.)

A second example of how deadlock happens is when a thread locks a mutex,
and then continues to hold the resource, but fails to accomplish its
goal.  This deadlock situation is a case of the thread entering a
"hold-and-wait" scenario.  This might happen because there is a problem
when using the resource.  It may happen if the thread calls
pthread_exit() or another thread calls pthread_kill() on the first
thread.  The original thread then dies while holding the lock.
_Luckily_, if a second thread calls pthread_kill() on the first, and the
signal cannot be caught (e.g., there is no signal handler), then the
entire process dies along with the target thread.
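To make the circular-wait scenario more concrete, here is a minimal
sketch (not part of the original notes) of two philosophers and two
forks, each fork guarded by its own mutex.  The names fork_mutex,
naive_philosopher, and safe_philosopher are hypothetical, chosen only
for illustration.  In the naive version each philosopher locks "its own"
fork first, so each can block while holding one fork and waiting for the
other.  The standard fix sketched here is to impose a single global lock
order (always lock the lower-numbered fork first).

  #include <pthread.h>
  #include <stdio.h>

  pthread_mutex_t fork_mutex[2] = { PTHREAD_MUTEX_INITIALIZER,
                                    PTHREAD_MUTEX_INITIALIZER };

  /* Naive version (can deadlock):  philosopher 0 locks fork 0 then
   * fork 1, while philosopher 1 locks fork 1 then fork 0.  Each may
   * block while holding one fork and waiting for the other
   * (circular wait). */
  void *naive_philosopher(void *arg) {
    int id = *(int *)arg;
    pthread_mutex_lock(&fork_mutex[id]);
    pthread_mutex_lock(&fork_mutex[1 - id]);   /* may block forever */
    /* ... eat ... */
    pthread_mutex_unlock(&fork_mutex[1 - id]);
    pthread_mutex_unlock(&fork_mutex[id]);
    return NULL;
  }

  /* Safer version:  always lock the lower-numbered fork first.
   * A single global lock order breaks the circular wait. */
  void *safe_philosopher(void *arg) {
    int id = *(int *)arg;
    int first = (id <= 1 - id) ? id : 1 - id;  /* lower-numbered fork */
    int second = 1 - first;
    pthread_mutex_lock(&fork_mutex[first]);
    pthread_mutex_lock(&fork_mutex[second]);
    printf("Philosopher %d is eating.\n", id);
    pthread_mutex_unlock(&fork_mutex[second]);
    pthread_mutex_unlock(&fork_mutex[first]);
    return NULL;
  }

  int main() {
    pthread_t th[2];
    int id[2] = {0, 1};
    for (int i = 0; i < 2; i++)
      pthread_create(&th[i], NULL, safe_philosopher, &id[i]);
    for (int i = 0; i < 2; i++)
      pthread_join(th[i], NULL);
    return 0;
  }

The fix does not shrink the critical sections; it only guarantees that
no philosopher can hold one fork while waiting forever for the other.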
In addition to the use case for mutexes of critical sections, there is a
second, more specialized use case.  This is when a mutex is used around
code that simply reads and modifies some variables.  It is impossible
for this situation to cause deadlock.  (There can be no circular-wait or
hold-and-wait.)

Even though there can be no deadlock, this use case is very important.
We will call the mutex in this use case a "guard mutex".  The variables
that it protects can be called "guard variables" or "state variables".
In order to use this correctly, we should not read or modify the guard
variables without using the guard mutex.  (There is a specialized case
where we could read a guard variable safely without the mutex, but this
practice is dangerous.  See "atomicity violation", below.)

The use of guard mutexes is powerful, because it allows us to test
whether deadlock would occur before locking a mutex around a general
critical section.  So, we can detect potential deadlock before it
happens.  In this strategy, we can potentially defend against deadlock.
This is especially useful to break situations like circular-wait.

The use of guard mutexes has been generalized.  The standard template
for using condition variables works well specifically because the mutex
in that template is used _only_ to protect the user's guard variables.
So, the guard mutex used with condition variables can _never_ cause
deadlock.  (Of course, the condition variables themselves can cause
deadlock, if we are not careful about how we use pthread_cond_wait().)

COMMENTS IN DEPTH:

In production code, many functions, such as sem_wait(), can be
interrupted by a signal (e.g., SIGWINCH because someone resized the
terminal window).  But pthread_mutex_lock() is never interrupted by a
signal.  If it has not yet returned, then it does not yet hold the lock.
And if it does not yet hold the lock, then it continues execution inside
pthread_mutex_lock() (just as would happen to a user-defined function
when a signal arrives).  Therefore, as stated in the man page,
pthread_mutex_lock() shall not return an error code of EINTR.  It always
succeeds, but only after acquiring the lock.

Atomicity violation:

Three of the most common bugs in multi-threaded programs are deadlock,
atomicity violation, and order violation.  An atomicity violation occurs
when two consecutive statements by a single thread fail because a second
thread executes between the two statements.  The simplest example of an
atomicity violation is just:

  next_task_index++;

This line of code looks like it is atomic, but when we expand it into
assembly language, we see that it (a) reads next_task_index from RAM
into a register; (b) increments that register; and (c) writes
next_task_index from the register back into RAM.  Because there was no
mutex lock, two threads on two cores can: (a) read next_task_index at
almost the same time; (b) increment it locally; and (c) then write it at
almost the same time.  One of the two increments is then lost.  This
creates a bug in the program due to "atomicity violation".
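As a minimal sketch (not in the original notes) of how a guard mutex
repairs this atomicity violation, the increment can be placed in a
critical section.  The variable next_task_index comes from the example
above; the mutex name task_mutex and the function get_next_task_index()
are hypothetical, chosen only for illustration.

  #include <pthread.h>

  int next_task_index = 0;               // guard variable (shared)
  pthread_mutex_t task_mutex = PTHREAD_MUTEX_INITIALIZER;  // its guard mutex

  /* Each worker thread takes the next task index here.  The
   * read-increment-write now executes as a critical section,
   * so no update of next_task_index can be lost. */
  int get_next_task_index(void) {
    pthread_mutex_lock(&task_mutex);
    int my_index = next_task_index++;    // safe: only one thread at a time
    pthread_mutex_unlock(&task_mutex);
    return my_index;
  }

Worker threads then call get_next_task_index() instead of touching
next_task_index directly, which is exactly the discipline described
above for guard variables.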
Note that 'man pthread_mutex_init' describes how to create variants of
the standard mutex, by setting an attribute with pthread_mutexattr_init()
and then passing that attribute to pthread_mutex_init().

Not all attribute variants are provided on all Linux/UNIX systems.  It's
best to use these variants primarily for debugging, since they are not
as efficient as the default mutex.  Nevertheless, some attributes may be
convenient in special cases, such as recursive mutexes, in which you can
nest one mutex lock/unlock pair inside another pair, both operating on
the same mutex.  (A short sketch of a recursive mutex appears at the end
of these notes.)

One Big Lock (early Linux)

Originally, the Linux kernel had a single lock (equivalent to a mutex
lock) for use in all cases of thread synchronization.  This was safe,
but slow.  It was impossible for two threads to execute two unrelated
operations at the same time.  This single lock was slowly split into
multiple locks (multiple mutexes) to make the Linux kernel able to take
advantage of multiple cores for performance.
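As promised above, here is a minimal sketch (not part of the original
notes) of a recursive mutex, assuming a POSIX system that provides
PTHREAD_MUTEX_RECURSIVE.  The names rec_mutex, outer(), and inner() are
hypothetical, chosen only for illustration.

  #include <pthread.h>

  pthread_mutex_t rec_mutex;

  void init_recursive_mutex(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&rec_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
  }

  void inner(void) {
    pthread_mutex_lock(&rec_mutex);   // same thread locks again: okay here
    /* ... */
    pthread_mutex_unlock(&rec_mutex);
  }

  void outer(void) {
    pthread_mutex_lock(&rec_mutex);
    inner();                          // nested lock/unlock pair on the same mutex
    pthread_mutex_unlock(&rec_mutex); // released only after the last unlock
  }

With the default (non-recursive) mutex, the nested pthread_mutex_lock()
in inner() would typically deadlock the calling thread, which is one
reason these attribute variants are mainly useful in special cases.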