Commit | Line | Data |
---|---|---|
1da177e4 | 1 | Locking scheme used for directory operations is based on two |
c2b38989 JJS |
2 | kinds of locks - per-inode (->i_mutex) and per-filesystem |
3 | (->s_vfs_rename_mutex). | |
1da177e4 LT |
4 | |
5 | For our purposes all operations fall in 5 classes: | |
6 | ||
7 | 1) read access. Locking rules: caller locks directory we are accessing. | |
8 | ||
9 | 2) object creation. Locking rules: same as above. | |
10 | ||
11 | 3) object removal. Locking rules: caller locks parent, finds victim, | |
12 | locks victim and calls the method. | |
13 | ||
14 | 4) rename() that is _not_ cross-directory. Locking rules: caller locks | |
15 | the parent, finds source and target, if target already exists - locks it | |
16 | and then calls the method. | |
17 | ||
18 | 5) link creation. Locking rules: | |
19 | * lock parent | |
20 | * check that source is not a directory | |
21 | * lock source | |
22 | * call the method. | |
23 | ||
24 | 6) cross-directory rename. The trickiest in the whole bunch. Locking | |
25 | rules: | |
26 | * lock the filesystem | |
27 | * lock parents in "ancestors first" order. | |
28 | * find source and target. | |
29 | * if old parent is equal to or is a descendent of target | |
30 | fail with -ENOTEMPTY | |
31 | * if new parent is equal to or is a descendent of source | |
32 | fail with -ELOOP | |
33 | * if target exists - lock it. | |
34 | * call the method. | |
35 | ||
36 | ||
37 | The rules above obviously guarantee that all directories that are going to be | |
38 | read, modified or removed by method will be locked by caller. | |
39 | ||
40 | ||
41 | If no directory is its own ancestor, the scheme above is deadlock-free. | |
42 | Proof: | |
43 | ||
44 | First of all, at any moment we have a partial ordering of the | |
45 | objects - A < B iff A is an ancestor of B. | |
46 | ||
47 | That ordering can change. However, the following is true: | |
48 | ||
49 | (1) if object removal or non-cross-directory rename holds lock on A and | |
50 | attempts to acquire lock on B, A will remain the parent of B until we | |
51 | acquire the lock on B. (Proof: only cross-directory rename can change | |
52 | the parent of object and it would have to lock the parent). | |
53 | ||
54 | (2) if cross-directory rename holds the lock on filesystem, order will not | |
55 | change until rename acquires all locks. (Proof: other cross-directory | |
56 | renames will be blocked on filesystem lock and we don't start changing | |
57 | the order until we had acquired all locks). | |
58 | ||
59 | (3) any operation holds at most one lock on non-directory object and | |
60 | that lock is acquired after all other locks. (Proof: see descriptions | |
61 | of operations). | |
62 | ||
63 | Now consider the minimal deadlock. Each process is blocked on | |
64 | attempt to acquire some lock and already holds at least one lock. Let's | |
65 | consider the set of contended locks. First of all, filesystem lock is | |
66 | not contended, since any process blocked on it is not holding any locks. | |
c2b38989 | 67 | Thus all processes are blocked on ->i_mutex. |
1da177e4 LT |
68 | |
69 | Non-directory objects are not contended due to (3). Thus link | |
70 | creation can't be a part of deadlock - it can't be blocked on source | |
71 | and it means that it doesn't hold any locks. | |
72 | ||
73 | Any contended object is either held by cross-directory rename or | |
74 | has a child that is also contended. Indeed, suppose that it is held by | |
75 | operation other than cross-directory rename. Then the lock this operation | |
76 | is blocked on belongs to child of that object due to (1). | |
77 | ||
78 | It means that one of the operations is cross-directory rename. | |
79 | Otherwise the set of contended objects would be infinite - each of them | |
80 | would have a contended child and we had assumed that no object is its | |
81 | own descendent. Moreover, there is exactly one cross-directory rename | |
82 | (see above). | |
83 | ||
84 | Consider the object blocking the cross-directory rename. One | |
85 | of its descendents is locked by cross-directory rename (otherwise we | |
670e9f34 | 86 | would again have an infinite set of contended objects). But that |
1da177e4 LT |
87 | means that cross-directory rename is taking locks out of order. Due |
88 | to (2) the order hadn't changed since we had acquired filesystem lock. | |
89 | But locking rules for cross-directory rename guarantee that we do not | |
90 | try to acquire lock on descendent before the lock on ancestor. | |
91 | Contradiction. I.e. deadlock is impossible. Q.E.D. | |
92 | ||
93 | ||
94 | These operations are guaranteed to avoid loop creation. Indeed, | |
95 | the only operation that could introduce loops is cross-directory rename. | |
96 | Since the only new (parent, child) pair added by rename() is (new parent, | |
97 | source), such loop would have to contain these objects and the rest of it | |
98 | would have to exist before rename(). I.e. at the moment of loop creation | |
99 | rename() responsible for that would be holding filesystem lock and new parent | |
100 | would have to be equal to or a descendent of source. But that means that | |
101 | new parent had been equal to or a descendent of source since the moment when | |
102 | we had acquired filesystem lock and rename() would fail with -ELOOP in that | |
103 | case. | |
104 | ||
105 | While this locking scheme works for arbitrary DAGs, it relies on | |
106 | ability to check that directory is a descendent of another object. Current | |
107 | implementation assumes that directory graph is a tree. This assumption is | |
108 | also preserved by all operations (cross-directory rename on a tree that would | |
109 | not introduce a cycle will leave it a tree and link() fails for directories). | |
110 | ||
111 | Notice that "directory" in the above == "anything that might have | |
112 | children", so if we are going to introduce hybrid objects we will need | |
113 | either to make sure that link(2) doesn't work for them or to make changes | |
114 | in is_subdir() that would make it work even in presence of such beasts. |