The cgroup freezer is useful to batch job management systems which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

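As a rough illustration of the "walk /proc" step, the sketch below copies a
little per-task information out of /proc for every PID listed in a cgroup's
tasks file. The function name snapshot_tasks and the output layout are
hypothetical, and this is not a real checkpoint: an actual checkpoint tool
would have to capture far more state than the two files read here.

```shell
# snapshot_tasks CGROUP_DIR OUT_DIR
# Hypothetical helper: for each PID in CGROUP_DIR/tasks, save a copy of
# /proc/<pid>/status and /proc/<pid>/cmdline under OUT_DIR/<pid>/.
snapshot_tasks() {
    cgroup_dir=$1
    out_dir=$2
    while read -r pid; do
        # A task may have exited since the list was written; skip it.
        [ -d "/proc/$pid" ] || continue
        mkdir -p "$out_dir/$pid"
        cat "/proc/$pid/status" > "$out_dir/$pid/status" 2>/dev/null || true
        # cmdline is NUL-separated; turn the separators into spaces.
        tr '\0' ' ' < "/proc/$pid/cmdline" > "$out_dir/$pid/cmdline" 2>/dev/null || true
    done < "$cgroup_dir/tasks"
}
```

The information is only consistent if the cgroup is frozen before the call
and thawed after it.
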
Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells:

    $ echo $$
    16644
    $ bash
    $ echo $$
    16690

    From a second, unrelated bash shell:
    $ kill -SIGSTOP 16690
    $ kill -SIGCONT 16690

    <at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

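The visibility of SIGCONT can be shown with a short sketch in which a child
shell installs a trap for it. The log file, the one-second loop, and the
sleep intervals are assumptions made only for this illustration.

```shell
# A child bash that records every SIGCONT it receives in a log file.
log=$(mktemp)
bash -c 'trap "echo caught SIGCONT >> \"$0\"" CONT; while :; do sleep 1; done' "$log" &
child=$!

sleep 1                      # give the child time to install its trap
kill -STOP "$child"          # stop the child ...
kill -CONT "$child"          # ... and resume it; the trap fires in the child
sleep 2                      # the trap runs once the child's current sleep ends

kill -TERM "$child"
wait "$child" 2>/dev/null || true
cat "$log"
```

The child sees the resume and runs arbitrary code in response, which is the
behaviour a transparent freeze has to avoid.
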
Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
Reading will return the current state.

Note that freezer.state doesn't exist in the root cgroup, which means the
root cgroup is non-freezable.

* Examples of usage :

    # mkdir /containers
    # mount -t cgroup -ofreezer freezer /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

    # cat /containers/0/freezer.state
    THAWED

to freeze all tasks in the container :

    # echo FROZEN > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    FREEZING
    # cat /containers/0/freezer.state
    FROZEN

to unfreeze all tasks in the container :

    # echo THAWED > /containers/0/freezer.state
    # cat /containers/0/freezer.state
    THAWED

This is the basic mechanism which should do the right thing for user space
tasks in a simple scenario.

It's important to note that freezing can be incomplete. In that case we return
EBUSY. This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

    1) Userspace cancels the freezing operation by writing "THAWED" to
       the freezer.state file
    2) Userspace retries the freezing operation by writing "FROZEN" to
       the freezer.state file (writing "FREEZING" is not legal
       and returns EINVAL)
    3) The tasks that blocked the cgroup from entering the "FROZEN"
       state disappear from the cgroup's set of tasks.
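
The retry and cancel options above can be combined in a small helper. This is
a minimal sketch, not part of the freezer interface: the function name
freeze_cgroup, the default of 10 attempts, and the one-second delay between
attempts are all assumptions.

```shell
# freeze_cgroup DIR [ATTEMPTS]
# Hypothetical helper: DIR is a cgroup directory containing freezer.state.
# Returns 0 once the state reads FROZEN; otherwise cancels the freeze by
# writing THAWED and returns 1.
freeze_cgroup() {
    fc_dir=$1
    fc_attempts=${2:-10}
    fc_i=0
    while [ "$fc_i" -lt "$fc_attempts" ]; do
        # The write may fail (for example with EBUSY); the state read
        # below decides whether another attempt is needed.
        echo FROZEN > "$fc_dir/freezer.state" 2>/dev/null || true
        if [ "$(cat "$fc_dir/freezer.state")" = "FROZEN" ]; then
            return 0
        fi
        fc_i=$((fc_i + 1))
        sleep 1
    done
    # Still not frozen: cancel the partial freeze (option 1 above).
    echo THAWED > "$fc_dir/freezer.state"
    return 1
}
```

With the container from the usage examples, a call might look like
"freeze_cgroup /containers/0 5 || echo 'could not freeze'".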