Commit | Line | Data |
---|---|---|
f3f54ffa IM |
1 | Generic Mutex Subsystem |
2 | ||
3 | started by Ingo Molnar <mingo@redhat.com> | |
9161f540 | 4 | updated by Davidlohr Bueso <davidlohr@hp.com> |
f3f54ffa | 5 | |
9161f540 DB |
6 | What are mutexes? |
7 | ----------------- | |
f3f54ffa | 8 | |
9161f540 DB |
9 | In the Linux kernel, mutexes refer to a particular locking primitive |
10 | that enforces serialization on shared memory systems, and not only to | |
11 | the generic term referring to 'mutual exclusion' found in academia | |
12 | or similar theoretical text books. Mutexes are sleeping locks which | |
13 | behave similarly to binary semaphores, and were introduced in 2006[1] | |
14 | as an alternative to these. This new data structure provided a number | |
15 | of advantages, including simpler interfaces, and at that time smaller | |
16 | code (see Disadvantages). | |
f3f54ffa | 17 | |
9161f540 | 18 | [1] http://lwn.net/Articles/164802/ |
f3f54ffa | 19 | |
9161f540 DB |
20 | Implementation |
21 | -------------- | |
f3f54ffa | 22 | |
9161f540 DB |
23 | Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h |
24 | and implemented in kernel/locking/mutex.c. These locks use a three | |
25 | state atomic counter (->count) to represent the different possible | |
26 | transitions that can occur during the lifetime of a lock: | |
f3f54ffa | 27 | |
9161f540 DB |
28 | 1: unlocked |
29 | 0: locked, no waiters | |
30 | negative: locked, with potential waiters | |
f3f54ffa | 31 | |
9161f540 DB |
32 | In its most basic form it also includes a wait-queue and a spinlock |
33 | that serializes access to it. CONFIG_SMP systems can also include | |
34 | a pointer to the lock task owner (->owner) as well as a spinner MCS | |
35 | lock (->osq), both described below in (ii). | |
f3f54ffa | 36 | |
9161f540 DB |
37 | When acquiring a mutex, there are three possible paths that can be |
38 | taken, depending on the state of the lock: | |
f3f54ffa | 39 | |
9161f540 DB |
40 | (i) fastpath: tries to atomically acquire the lock by decrementing the |
41 | counter. If it was already taken by another task it goes to the next | |
42 | possible path. This logic is architecture specific. On x86-64, the | |
43 | locking fastpath is 2 instructions: | |
f3f54ffa | 44 | |
9161f540 DB |
45 | 0000000000000e10 <mutex_lock>: |
46 | e21: f0 ff 0b lock decl (%rbx) | |
47 | e24: 79 08 jns e2e <mutex_lock+0x1e> | |
f3f54ffa IM |
48 | |
49 | the unlocking fastpath is equally tight: | |
50 | ||
9161f540 DB |
51 | 0000000000000bc0 <mutex_unlock>: |
52 | bc8: f0 ff 07 lock incl (%rdi) | |
53 | bcb: 7f 0a jg bd7 <mutex_unlock+0x17> | |
54 | ||
55 | ||
56 | (ii) midpath: aka optimistic spinning, tries to spin for acquisition | |
57 | while the lock owner is running and there are no other tasks ready | |
58 | to run that have higher priority (need_resched). The rationale is | |
59 | that if the lock owner is running, it is likely to release the lock | |
60 | soon. The mutex spinners are queued up using MCS lock so that only | |
61 | one spinner can compete for the mutex. | |
62 | ||
63 | The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock | |
64 | with the desirable properties of being fair and with each cpu trying | |
65 | to acquire the lock spinning on a local variable. It avoids expensive | |
66 | cacheline bouncing that common test-and-set spinlock implementations | |
67 | incur. An MCS-like lock is specially tailored for optimistic spinning | |
68 | for sleeping lock implementation. An important feature of the customized | |
69 | MCS lock is that it has the extra property that spinners are able to exit | |
70 | the MCS spinlock queue when they need to reschedule. This further helps | |
71 | avoid situations where MCS spinners that need to reschedule would continue | |
72 | waiting to spin on mutex owner, only to go directly to slowpath upon | |
73 | obtaining the MCS lock. | |
74 | ||
75 | ||
76 | (iii) slowpath: last resort, if the lock is still unable to be acquired, | |
77 | the task is added to the wait-queue and sleeps until woken up by the | |
78 | unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. | |
79 | ||
80 | While formally kernel mutexes are sleepable locks, it is path (ii) that | |
81 | makes them more practically a hybrid type. By simply not interrupting a | |
82 | task and busy-waiting for a few cycles instead of immediately sleeping, | |
83 | the performance of this lock has been seen to significantly improve a | |
84 | number of workloads. Note that this technique is also used for rw-semaphores. | |
85 | ||
86 | Semantics | |
87 | --------- | |
88 | ||
89 | The mutex subsystem checks and enforces the following rules: | |
90 | ||
91 | - Only one task can hold the mutex at a time. | |
92 | - Only the owner can unlock the mutex. | |
93 | - Multiple unlocks are not permitted. | |
94 | - Recursive locking/unlocking is not permitted. | |
95 | - A mutex must only be initialized via the API (see below). | |
96 | - A task may not exit with a mutex held. | |
97 | - Memory areas where held locks reside must not be freed. | |
98 | - Held mutexes must not be reinitialized. | |
99 | - Mutexes may not be used in hardware or software interrupt | |
100 | contexts such as tasklets and timers. | |
101 | ||
102 | These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. | |
103 | In addition, the mutex debugging code also implements a number of other | |
104 | features that make lock debugging easier and faster: | |
105 | ||
106 | - Uses symbolic names of mutexes, whenever they are printed | |
107 | in debug output. | |
108 | - Point-of-acquire tracking, symbolic lookup of function names, | |
109 | list of all locks held in the system, printout of them. | |
110 | - Owner tracking. | |
111 | - Detects self-recursing locks and prints out all relevant info. | |
112 | - Detects multi-task circular deadlocks and prints out all affected | |
113 | locks and tasks (and only those tasks). | |
114 | ||
115 | ||
116 | Interfaces | |
117 | ---------- | |
118 | Statically define the mutex: | |
119 | DEFINE_MUTEX(name); | |
120 | ||
121 | Dynamically initialize the mutex: | |
122 | mutex_init(mutex); | |
123 | ||
124 | Acquire the mutex, uninterruptible: | |
125 | void mutex_lock(struct mutex *lock); | |
126 | void mutex_lock_nested(struct mutex *lock, unsigned int subclass); | |
127 | int mutex_trylock(struct mutex *lock); | |
128 | ||
129 | Acquire the mutex, interruptible: | |
130 | int mutex_lock_interruptible_nested(struct mutex *lock, | |
131 | unsigned int subclass); | |
132 | int mutex_lock_interruptible(struct mutex *lock); | |
133 | ||
134 | Acquire the mutex, interruptible, if dec to 0: | |
135 | int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); | |
136 | ||
137 | Unlock the mutex: | |
138 | void mutex_unlock(struct mutex *lock); | |
139 | ||
140 | Test if the mutex is taken: | |
141 | int mutex_is_locked(struct mutex *lock); | |
f3f54ffa IM |
142 | |
143 | Disadvantages | |
144 | ------------- | |
145 | ||
9161f540 DB |
146 | Unlike its original design and purpose, 'struct mutex' is larger than |
147 | most locks in the kernel. E.g: on x86-64 it is 40 bytes, almost twice | |
148 | as large as 'struct semaphore' (24 bytes) and 8 bytes shy of the | |
149 | 'struct rw_semaphore' variant. Larger structure sizes mean more CPU | |
150 | cache and memory footprint. | |
f3f54ffa | 151 | |
9161f540 DB |
152 | When to use mutexes |
153 | ------------------- | |
f3f54ffa | 154 | |
9161f540 DB |
155 | Unless the strict semantics of mutexes are unsuitable and/or the critical |
156 | region prevents the lock from being shared, always prefer them to any other | |
157 | locking primitive. |