fork - to execute other programs
map/foldl?
- map is parallel - each function application is independent of the others
- foldl is not, but it can often be split into multiple smaller child foldls and a parent foldl
- to parallelize: fork different processes (and mmap if needed)

POSIX Thread API = pthreads
- implemented in the pthread library (man 7 pthreads or man 3 pthread)
- a simple interface for creating and managing threads
Threads are created using pthread_create
Arguments:
- a pointer to a pthread_t handle for the new thread
- thread attributes (NULL to use the default attributes)
- the function the new thread should run
- the argument to pass to that function

Let's do a hello thread
How do we wait? - pthread_join (takes the thread handle and a pointer for the return value)
Other useful functions:
- pthread_self()
- pthread_cancel()
- pthread_exit()

Rewrite sum using threads
Confirm that we still have a data race
Try timing?
Next time: how to avoid data races and what new problems does that cause?
The data race on sum, step by step:

    Thread A loads sum from memory
    Thread B loads sum from memory
    Thread B increments sum
    Thread B stores sum in memory
    Thread A increments sum
    Thread A stores sum in memory (discarding the work done by Thread B)

Which memory is shared?
- memory mapped explicitly (mmap with MAP_SHARED)
- global variables in the data segment (if initialized)
- local variables declared with the static specifier (the static keyword)

We need mutual exclusion, i.e., threads are mutually excluded from running a piece of code that needs the shared variable (a critical section)
Idea: some sort of “lock”
We (try to) acquire a lock before we enter a critical piece of code accessing/modifying shared memory
When we acquire the lock, we do our modification
After we are done, we release the lock
Roughly (and incorrectly):

    while (is_locked(var_lock)) {
        sleep(1);
    }
    lock(var_lock);
    do_important_stuff(var);
    unlock(var_lock);

If we do it using just plain shared variables, we run into the same problem: a data race (why? - two threads can both see is_locked return false and then both take the lock)
We need support from the OS to ensure locking and unlocking are performed atomically
A few ways to achieve this
pthread mutex API:
- pthread_mutex_init() - initialize with attributes; returns 0 on success
- pthread_mutex_lock() - returns 0 on success
- pthread_mutex_unlock() - returns 0 on success; a blocked lock() call in another thread will then return, acquiring the mutex
- pthread_mutex_trylock()
A more general primitive: the semaphore
- a semaphore is just an integer with some operations associated with it
- a lock (mutex) is just a special case of a semaphore
Idea: if the semaphore is 0, we have to wait, if the
semaphore > 0, we’re good to go
- sem_init - initialize a semaphore
- sem_wait - waits for the semaphore to become != 0, then decrements it by 1, atomically
- sem_post - increments the semaphore by one, atomically
If we want the semaphore to be shared between processes, we need to allocate it in shared memory
Example - using a semaphore as a lock (max value 1)

    sem = 1

    proc A             proc B             sem
    sem_wait(sem);                         0
    do_work();         sem_wait(sem);      0   // sem_wait blocks in B
    do_more_work();    .                   0
    .                  .                   0
    sem_post(sem);     .                   1   // sem_wait returns in B
    .                  do_work();          0
    sem_wait(sem);     do_more_work();     0   // sem_wait blocks in A
    .                  .                   0
    .                  sem_post(sem);      1   // sem_wait returns in A
    do_work();         .                   0

See sum_semaphores.c
Problem? It’s slower than the sequential example!!
Try
$ time ./sem-sum
and observe how much time is spent in the kernel. Compare this to the other versions
The kernel is doing a lot of extra work managing our semaphore
Semaphores with higher values are used, for example, for counting resources