          CPU frequency and voltage scaling code in the Linux(TM) kernel


                         L i n u x    C P U F r e q

                      C P U F r e q   G o v e r n o r s

                   - information for users and developers -


                    Dominik Brodowski  <linux@brodo.de>
            some additions and corrections by Nico Golde <nico@ngolde.de>



   Clock scaling allows you to change the clock speed of the CPUs on the
    fly. This is a nice method to save battery power, because the lower
            the clock speed, the less power the CPU consumes.


Contents:
---------
1.   What is a CPUFreq Governor?

2.   Governors In the Linux Kernel
2.1  Performance
2.2  Powersave
2.3  Userspace
2.4  Ondemand
2.5  Conservative

3.   The Governor Interface in the CPUfreq Core



1. What Is A CPUFreq Governor?
==============================

Most cpufreq drivers (in fact, all except one, longrun) or even most
CPU frequency scaling algorithms only allow the CPU to be set to a
single frequency. In order to offer dynamic frequency scaling, the
cpufreq core must be able to tell these drivers a "target frequency".
So these specific drivers will be transformed to offer a "->target"
call instead of the existing "->setpolicy" call. For "longrun", all
stays the same, though.

How is it decided which frequency within the CPUfreq policy should be
used? That's done using "cpufreq governors". Two are already in this
patch: the already existing "powersave" and "performance" governors,
which set the frequency statically to the lowest or highest frequency,
respectively. At least two more such governors will be ready for
addition in the near future, and likely many more, as there are
various different theories and models about dynamic frequency scaling
around. Because cpufreq offers a generic interface to scaling
governors, they can be tested extensively, and the best one can be
selected for each specific use.

Basically, it's the following flow graph:

CPU can be set to switch independently     |     CPU can only be set
      within specific "limits"             |   to specific frequencies

                             "CPUfreq policy"
            consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                      \
                        /                        \
                       /             the cpufreq governor decides
                      /              (dynamically or statically)
                     /               what target_freq to set within
                    /                the limits of policy->{min,max}
                   /                              \
                  /                                \
    Using the ->setpolicy call,            Using the ->target call,
        the limits and the                  the frequency closest
         "policy" is set.                   to target_freq is set.
                                            It is assured that it
                                            is within policy->{min,max}
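
The right-hand branch of the graph can be modelled in a few lines of C.
This is a userspace sketch, not kernel code: struct policy_limits and
closest_freq() are hypothetical names, frequencies are assumed to be in
kHz, and the real core may select a table entry according to the
"relation" argument of the target call rather than strictly the nearest
one. What it does show is the guarantee from the graph: whatever
target_freq is requested, the chosen frequency stays within
policy->{min,max}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a policy's limits in kHz (not struct cpufreq_policy). */
struct policy_limits {
	unsigned int min;
	unsigned int max;
};

/*
 * Pick the table entry that lies within [min, max] and is closest to
 * target_freq. Returns 0 when no table entry lies within the limits.
 */
static unsigned int closest_freq(const unsigned int *table, size_t n,
				 const struct policy_limits *lim,
				 unsigned int target_freq)
{
	unsigned int best = 0, best_dist = 0;
	size_t i;

	assert(lim->min <= lim->max);
	for (i = 0; i < n; i++) {
		unsigned int f = table[i];
		unsigned int dist;

		if (f < lim->min || f > lim->max)
			continue;	/* never leave the policy limits */
		dist = f > target_freq ? f - target_freq : target_freq - f;
		if (best == 0 || dist < best_dist) {
			best = f;
			best_dist = dist;
		}
	}
	return best;
}
```

With a table of 800, 1200, 1600 and 2000 MHz and limits of 1000..1800
MHz, a request for 2000 MHz yields 1600 MHz, because 2000 MHz lies
outside the policy.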
79 | ||
80 | ||
81 | 2. Governors In the Linux Kernel | |
82 | ================================ | |
83 | ||
84 | 2.1 Performance | |
85 | --------------- | |
86 | ||
87 | The CPUfreq governor "performance" sets the CPU statically to the | |
88 | highest frequency within the borders of scaling_min_freq and | |
89 | scaling_max_freq. | |
90 | ||
91 | ||
594dd2c9 | 92 | 2.2 Powersave |
1da177e4 LT |
93 | ------------- |
94 | ||
95 | The CPUfreq governor "powersave" sets the CPU statically to the | |
96 | lowest frequency within the borders of scaling_min_freq and | |
97 | scaling_max_freq. | |
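
Both static governors reduce to picking one end of the allowed range. A
minimal userspace sketch, assuming the driver exposes a table of
available frequencies in kHz; the table and both function names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* "performance": highest table frequency (kHz) in [min, max], 0 if none. */
static unsigned int performance_pick(const unsigned int *table, size_t n,
				     unsigned int min, unsigned int max)
{
	unsigned int best = 0;
	size_t i;

	assert(min <= max);
	for (i = 0; i < n; i++)
		if (table[i] >= min && table[i] <= max && table[i] > best)
			best = table[i];
	return best;
}

/* "powersave": lowest table frequency (kHz) in [min, max], 0 if none. */
static unsigned int powersave_pick(const unsigned int *table, size_t n,
				   unsigned int min, unsigned int max)
{
	unsigned int best = 0;
	size_t i;

	assert(min <= max);
	for (i = 0; i < n; i++)
		if (table[i] >= min && table[i] <= max &&
		    (best == 0 || table[i] < best))
			best = table[i];
	return best;
}
```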


2.3 Userspace
-------------

The CPUfreq governor "userspace" allows the user, or any userspace
program running with UID "root", to set the CPU to a specific frequency
by making a sysfs file "scaling_setspeed" available in the CPU-device
directory.
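
From a program, setting the speed amounts to writing a number into that
file. A minimal sketch, assuming the value is given in kHz (as for the
other cpufreq sysfs files) and that the caller supplies the path; on a
typical system it would be something like
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed, with the
"userspace" governor selected for that CPU. set_speed_khz() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a frequency in kHz to a scaling_setspeed file.
 * Returns 0 on success, -1 if the file cannot be opened or written
 * (for example when not running as root).
 */
static int set_speed_khz(const char *path, unsigned int khz)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%u\n", khz) > 0;
	if (fclose(f) != 0)	/* sysfs may report errors only on close */
		ok = 0;
	return ok ? 0 : -1;
}
```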


2.4 Ondemand
------------

The CPUfreq governor "ondemand" sets the CPU frequency depending on the
current usage. To do this, the CPU must have the capability to switch
the frequency very quickly. A number of parameters are accessible
through sysfs files:

sampling_rate: measured in us (microseconds, 10^-6 seconds), this is how
often you want the kernel to look at the CPU usage and to make decisions
on what to do about the frequency. Typically this is set to values of
around '10000' or more. Its default value is (compare with
users-guide.txt):
transition_latency * 1000
Be aware that the transition latency is in ns and sampling_rate is in
us, so by default sampling_rate shows the same number as
cpuinfo_transition_latency.
The sampling rate should always be adjusted with the transition latency
in mind. To set the sampling rate 750 times as high as the transition
latency in bash (as said, the default factor is 1000), do:
echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
    >ondemand/sampling_rate
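
The unit conversion is easy to get wrong, so here is the arithmetic of
the shell command above as a small C helper. The names are hypothetical;
the factor of 1000 is the default stated in the text. Multiplying the
latency (ns) by the factor gives a time in ns, and dividing by 1000
converts it to us, which is why the default leaves the number unchanged.

```c
#include <assert.h>

/* Default factor applied to the transition latency (from the text). */
#define DEFAULT_LATENCY_FACTOR 1000ULL

/*
 * latency_ns comes from cpuinfo_transition_latency (ns); the result is
 * a sampling_rate in us: (latency_ns * factor) ns / 1000.
 */
static unsigned long long sampling_rate_us(unsigned long long latency_ns,
					   unsigned long long factor)
{
	assert(factor > 0);
	return latency_ns * factor / 1000;
}
```

For a 10000 ns transition latency, the default factor gives 10000 us
(10 ms), and a factor of 750 gives 7500 us.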

show_sampling_rate_min:
The sampling rate is limited by the hardware transition latency:
transition_latency * 100
or by kernel restrictions:
If CONFIG_NO_HZ is set, the limit is a fixed 10ms.
If CONFIG_NO_HZ is not set or the nohz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us  (20ms)
HZ=250:  min=80000us  (80ms)
HZ=100:  min=200000us (200ms)
The higher of the kernel and hardware restrictions is shown and used
as the minimum sampling rate.
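
The rule "take the higher of the two limits" can be sketched as below.
Assumptions, all hypothetical: the hardware term uses the same ns-to-us
convention as the default sampling rate; the nohz flag stands for
"CONFIG_NO_HZ set and not disabled"; and the kernel term is written as
20 scheduler ticks, which reproduces the three HZ values listed above
but is only inferred from them.

```c
#include <assert.h>

/* Kernel restriction in us; 'nohz' is a hypothetical flag. */
static unsigned long long kernel_min_us(int nohz, unsigned int hz)
{
	assert(hz > 0);
	if (nohz)
		return 10000;			/* fixed 10 ms */
	return 20ULL * 1000000 / hz;		/* 20 ticks: matches the HZ table */
}

/* Minimum sampling rate in us: the higher of hardware and kernel limits. */
static unsigned long long min_sampling_rate_us(unsigned long long latency_ns,
					       int nohz, unsigned int hz)
{
	unsigned long long hw = latency_ns * 100 / 1000;	/* ns -> us */
	unsigned long long k = kernel_min_us(nohz, hz);

	return hw > k ? hw : k;
}
```

With a 10000 ns latency and HZ=1000, the hardware limit (1000 us) is
below the kernel limit, so 20000 us is used.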

show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT.

up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to decide to increase the
frequency. For example, when it is set to its default value of '95',
the CPU needs to be, on average, more than 95% in use between the
checking intervals for the kernel to decide that the CPU frequency
needs to be increased.
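
The up_threshold test compares a load percentage against the threshold.
This sketch assumes the load is the non-idle fraction of each sampling
interval; load_percent() and should_increase() are hypothetical names,
and the real governor does more than this (it also decides when to lower
the frequency).

```c
#include <assert.h>

/*
 * Busy percentage over one sampling interval, from the wall-clock time
 * and idle time that elapsed during it (both in the same unit).
 */
static unsigned int load_percent(unsigned long long wall,
				 unsigned long long idle)
{
	assert(idle <= wall);
	if (wall == 0)
		return 0;
	return (unsigned int)(100 * (wall - idle) / wall);
}

/* Nonzero if the load is "more than up_threshold % in use". */
static int should_increase(unsigned int load, unsigned int up_threshold)
{
	return load > up_threshold;
}
```

With the default of 95, an interval that was 96% busy triggers an
increase, while one that was exactly 95% busy does not.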

ignore_nice_load: this parameter takes a value of '0' or '1'. When
set to '0' (its default), all processes are counted towards the
'cpu utilisation' value. When set to '1', the processes that are
run with a 'nice' value will not count (and thus be ignored) in the
overall usage calculation. This is useful if you are running a CPU
intensive calculation on your laptop whose completion time you do not
care about: you can 'nice' it and prevent it from taking part in the
decision of whether to increase your CPU frequency.
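
One way to model this switch is to move niced time from the busy side to
the idle side of the load calculation. The sketch assumes the governor
can read idle and nice time for each interval; load_percent_nice() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Load percentage over one sampling interval. With ignore_nice_load
 * set, time spent running niced processes is treated as idle time.
 */
static unsigned int load_percent_nice(unsigned long long wall,
				      unsigned long long idle,
				      unsigned long long nice,
				      int ignore_nice_load)
{
	if (ignore_nice_load)
		idle += nice;		/* niced work does not count as busy */
	assert(idle <= wall);
	if (wall == 0)
		return 0;
	return (unsigned int)(100 * (wall - idle) / wall);
}
```

A CPU that spent 80% of an interval on a niced job and 10% idle reports
90% load with ignore_nice_load=0, but only 10% with ignore_nice_load=1.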

sampling_down_factor: this parameter controls the rate at which the
kernel makes a decision on when to decrease the frequency while running
at top speed. When set to 1 (the default), decisions to reevaluate load
are made at the same interval regardless of current clock speed. But
when set to greater than 1 (e.g. 100), it acts as a multiplier for the
scheduling interval for reevaluating load when the CPU is at its top
speed due to high load. This improves performance by reducing the
overhead of load evaluation and helping the CPU stay at its top speed
when truly busy, rather than shifting back and forth in speed. This
tunable has no effect on behavior at lower speeds/lower CPU loads.
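
A simplified sketch of how the factor stretches the interval. It
assumes "top speed" simply means the current frequency equals the policy
maximum; the real governor also takes into account that it got there
because of high load, which this sketch ignores. next_sample_delay_us()
is a hypothetical name.

```c
#include <assert.h>

/*
 * Delay (us) before the next load evaluation: sampling_rate multiplied
 * by sampling_down_factor at top speed, unchanged otherwise.
 */
static unsigned long long next_sample_delay_us(unsigned long long sampling_rate_us,
					       unsigned int sampling_down_factor,
					       unsigned int cur_freq,
					       unsigned int max_freq)
{
	assert(sampling_down_factor >= 1);
	if (cur_freq == max_freq)
		return sampling_rate_us * sampling_down_factor;
	return sampling_rate_us;
}
```

With sampling_rate at 10000 us and a factor of 100, a CPU at top speed
is re-evaluated only once per second, while one below top speed is still
sampled every 10 ms.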


2.5 Conservative
----------------

The CPUfreq governor "conservative", much like the "ondemand"
governor, sets the CPU frequency depending on the current usage. It
differs in behaviour in that it gracefully increases and decreases the
CPU speed rather than jumping to max speed the moment there is any load
on the CPU. This behaviour is more suitable in a battery powered
environment. The governor is tweaked in the same manner as the
"ondemand" governor through sysfs, with the addition of:

freq_step: this describes the percentage step by which the CPU
frequency is smoothly increased and decreased. By default the CPU
frequency will increase in 5% chunks of your maximum CPU frequency. You
can change this value to anywhere between 0 and 100, where '0' will
effectively lock your CPU at a speed regardless of its load, whilst
'100' will, in theory, make it behave identically to the "ondemand"
governor.
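
The step arithmetic can be sketched as follows, assuming frequencies in
kHz and that each step is freq_step percent of the maximum frequency, as
the text describes; step_khz() and step_up() are hypothetical names. The
two edge cases from the text fall out directly: a step of 0 never moves,
and a step of 100 reaches the maximum in one go.

```c
#include <assert.h>

/* Step size in kHz: freq_step percent of the maximum frequency. */
static unsigned int step_khz(unsigned int max_freq, unsigned int freq_step)
{
	assert(freq_step <= 100);
	return (unsigned int)((unsigned long long)max_freq * freq_step / 100);
}

/* One upward step from cur, never exceeding max_freq. */
static unsigned int step_up(unsigned int cur, unsigned int max_freq,
			    unsigned int freq_step)
{
	unsigned int s = step_khz(max_freq, freq_step);

	assert(cur <= max_freq);
	return (max_freq - cur < s) ? max_freq : cur + s;
}
```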

down_threshold: same as the 'up_threshold' found for the "ondemand"
governor but for the opposite direction. For example, when set to its
default value of '20', it means that the CPU usage needs to be below
20% between samples to have the frequency decreased.
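
Putting both thresholds and freq_step together, one simplified decision
step of this governor might look like the sketch below. It is not the
governor's code: conservative_next() is a hypothetical name, the load is
assumed to be a percentage computed elsewhere, frequencies are in kHz,
and the up_threshold of 80 used in the example is chosen only for
illustration.

```c
#include <assert.h>

/* One simplified conservative-governor step; returns the new frequency. */
static unsigned int conservative_next(unsigned int cur, unsigned int min,
				      unsigned int max, unsigned int load,
				      unsigned int up_threshold,
				      unsigned int down_threshold,
				      unsigned int freq_step)
{
	unsigned int step;

	assert(min <= cur && cur <= max);
	assert(down_threshold < up_threshold && freq_step <= 100);
	step = (unsigned int)((unsigned long long)max * freq_step / 100);

	if (load > up_threshold)	/* busy: one step up, capped at max */
		return (max - cur < step) ? max : cur + step;
	if (load < down_threshold)	/* quiet: one step down, floored at min */
		return (cur - min < step) ? min : cur - step;
	return cur;			/* between thresholds: keep the speed */
}
```

With a 2 GHz maximum and the default 5% step, a CPU at 1 GHz with 10%
load moves down to 900 MHz, while at 50% load it stays where it is.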


3. The Governor Interface in the CPUfreq Core
=============================================

A new governor must register itself with the CPUfreq core using
"cpufreq_register_governor". The struct cpufreq_governor, which has to
be passed to that function, must contain the following values:

governor->name -        A unique name for this governor
governor->governor -    The governor callback function
governor->owner -       THIS_MODULE for the governor module (if
                        appropriate)
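
To illustrate the registration contract (unique name, callback, owner),
here is a userspace mock. Everything in it is hypothetical: struct
mock_governor mirrors only the three fields listed above, owner is a
plain pointer standing in for THIS_MODULE, and mock_register_governor()
merely imitates the "unique name" rule. The real
cpufreq_register_governor() is a kernel function and is not called this
way.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct cpufreq_policy. */
struct mock_policy { unsigned int cpu, min, max, cur; };

/* Hypothetical stand-in for struct cpufreq_governor. */
struct mock_governor {
	const char *name;					/* unique name */
	int (*governor)(struct mock_policy *policy, unsigned int event);
	const void *owner;					/* THIS_MODULE stand-in */
};

#define MAX_GOVERNORS 8
static const struct mock_governor *registry[MAX_GOVERNORS];
static int n_registered;

/* Returns 0 on success, -1 on a duplicate name, bad argument or full table. */
static int mock_register_governor(const struct mock_governor *gov)
{
	int i;

	if (gov == NULL || gov->name == NULL || gov->governor == NULL)
		return -1;
	for (i = 0; i < n_registered; i++)
		if (strcmp(registry[i]->name, gov->name) == 0)
			return -1;	/* names must be unique */
	if (n_registered == MAX_GOVERNORS)
		return -1;
	registry[n_registered++] = gov;
	return 0;
}

/* Example callback that accepts every event and does nothing. */
static int noop_governor(struct mock_policy *policy, unsigned int event)
{
	assert(policy != NULL);
	(void)event;
	return 0;
}
```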

The governor->governor callback is called with the current (or to-be-set)
cpufreq_policy struct for that CPU, and an unsigned int event. The
following events are currently defined:

CPUFREQ_GOV_START:   This governor shall start its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_STOP:    This governor shall end its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_LIMITS:  The limits for CPU policy->cpu have changed to
                     policy->min and policy->max.
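
A governor callback is essentially a switch over these events. The
sketch below is a userspace mock of a "performance"-like governor; the
MOCK_GOV_* constants, struct mock_policy and mock_driver_target() are
hypothetical stand-ins for the CPUFREQ_GOV_* events, struct
cpufreq_policy and the driver call. On START and LIMITS it moves to
policy->max, which keeps the frequency inside the (possibly new) limits.

```c
#include <assert.h>

/* Hypothetical event codes; not the kernel's CPUFREQ_GOV_* values. */
enum mock_event { MOCK_GOV_START, MOCK_GOV_STOP, MOCK_GOV_LIMITS };

struct mock_policy { unsigned int cpu, min, max, cur; int active; };

/* Stand-in for the driver call: only records the new frequency. */
static int mock_driver_target(struct mock_policy *p, unsigned int freq)
{
	assert(freq >= p->min && freq <= p->max);
	p->cur = freq;
	return 0;
}

/* A performance-style callback: always run at policy->max. */
static int perf_like_governor(struct mock_policy *p, unsigned int event)
{
	switch (event) {
	case MOCK_GOV_START:
		p->active = 1;
		return mock_driver_target(p, p->max);
	case MOCK_GOV_LIMITS:
		if (!p->active)
			return 0;
		return mock_driver_target(p, p->max);	/* re-apply new limits */
	case MOCK_GOV_STOP:
		p->active = 0;
		return 0;
	default:
		return -1;				/* unknown event */
	}
}
```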

If you need to send other "events" from outside your driver, _only_
use the cpufreq_governor_l(unsigned int cpu, unsigned int event) call to
the CPUfreq core to ensure proper locking.


The CPUfreq governor may call the CPU processor driver using one of
these two functions:

int cpufreq_driver_target(struct cpufreq_policy *policy,
                          unsigned int target_freq,
                          unsigned int relation);

int __cpufreq_driver_target(struct cpufreq_policy *policy,
                            unsigned int target_freq,
                            unsigned int relation);

target_freq must be within policy->min and policy->max, of course.
What's the difference between these two functions? When your governor
is still in a direct code path of a call to governor->governor, the
per-CPU cpufreq lock is still held in the cpufreq core, and there's
no need to lock it again (in fact, this would cause a deadlock). So
use __cpufreq_driver_target only in these cases. In all other cases
(for example, when there's a "daemonized" function that wakes up
every second), use cpufreq_driver_target to lock the cpufreq per-CPU
lock before the command is passed to the cpufreq processor driver.
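
The locking rule can be modelled in userspace with a pthread mutex
standing in for the per-CPU cpufreq lock. All names are hypothetical:
driver_target() plays the role of cpufreq_driver_target (takes the lock
itself), and driver_target_locked() plays the role of
__cpufreq_driver_target (caller already holds the lock); the leading
double underscore is avoided because such identifiers are reserved in
userspace C. The lock_held flag is only a debugging aid for the sketch.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t policy_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;		/* debug flag: is policy_lock taken? */
static unsigned int cur_freq;	/* pretend driver state */

/* Caller must already hold policy_lock (like __cpufreq_driver_target). */
static int driver_target_locked(unsigned int freq)
{
	assert(lock_held);
	cur_freq = freq;	/* "pass the command to the driver" */
	return 0;
}

/* Takes policy_lock itself (like cpufreq_driver_target). */
static int driver_target(unsigned int freq)
{
	int ret;

	pthread_mutex_lock(&policy_lock);
	lock_held = 1;
	ret = driver_target_locked(freq);
	lock_held = 0;
	pthread_mutex_unlock(&policy_lock);
	return ret;
}

/* Governor callback: the "core" already holds policy_lock here, so
 * calling driver_target() would take the non-recursive mutex a second
 * time and deadlock. */
static int governor_cb(unsigned int freq)
{
	return driver_target_locked(freq);
}

/* Stand-in for the core calling governor->governor under the lock. */
static int core_calls_governor(unsigned int freq)
{
	int ret;

	pthread_mutex_lock(&policy_lock);
	lock_held = 1;
	ret = governor_cb(freq);
	lock_held = 0;
	pthread_mutex_unlock(&policy_lock);
	return ret;
}
```

A timer-driven or "daemonized" path would call driver_target(), while
anything reached from inside the core's call to the governor uses
driver_target_locked().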