Commit | Line | Data |
---|---|---|
abd41443 TT |
1 | Event Tracing |
2 | ||
3 | Documentation written by Theodore Ts'o | |
95b69608 | 4 | Updated by Li Zefan and Tom Zanussi |
abd41443 | 5 | |
143c145e LZ |
6 | 1. Introduction |
7 | =============== | |
abd41443 TT |
8 | |
9 | Tracepoints (see Documentation/trace/tracepoints.txt) can be used | |
10 | without creating custom kernel modules to register probe functions | |
11 | using the event tracing infrastructure. | |
12 | ||
13 | Not all tracepoints can be traced using the event tracing system; | |
14 | the kernel developer must provide code snippets which define how the | |
15 | tracing information is saved into the tracing buffer, and how the | |
143c145e | 16 | tracing information should be printed. |
abd41443 | 17 | |
143c145e LZ |
18 | 2. Using Event Tracing |
19 | ====================== | |
20 | ||
21 | 2.1 Via the 'set_event' interface | |
22 | --------------------------------- | |
abd41443 TT |
23 | |
24 | The events which are available for tracing can be found in the file | |
52ad51e7 | 25 | /sys/kernel/debug/tracing/available_events. |
abd41443 TT |
26 | |
27 | To enable a particular event, such as 'sched_wakeup', simply echo it | |
52ad51e7 | 28 | to /sys/kernel/debug/tracing/set_event. For example: |
abd41443 | 29 | |
52ad51e7 | 30 | # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event |
abd41443 | 31 | |
143c145e LZ |
32 | [ Note: '>>' is necessary, otherwise it will firstly disable |
33 | all the events. ] | |
abd41443 TT |
34 | |
35 | To disable an event, echo the event name to the set_event file prefixed | |
36 | with an exclamation point: | |
37 | ||
52ad51e7 | 38 | # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event |
143c145e LZ |
39 | |
40 | To disable all events, echo an empty line to the set_event file: | |
41 | ||
52ad51e7 | 42 | # echo > /sys/kernel/debug/tracing/set_event |
abd41443 | 43 | |
143c145e | 44 | To enable all events, echo '*:*' or '*:' to the set_event file: |
abd41443 | 45 | |
52ad51e7 | 46 | # echo *:* > /sys/kernel/debug/tracing/set_event |
abd41443 TT |
47 | |
48 | The events are organized into subsystems, such as ext4, irq, sched, | |
49 | etc., and a full event name looks like this: <subsystem>:<event>. The | |
50 | subsystem name is optional, but it is displayed in the available_events | |
51 | file. All of the events in a subsystem can be specified via the syntax | |
52 | "<subsystem>:*"; for example, to enable all irq events, you can use the | |
53 | command: | |
54 | ||
52ad51e7 | 55 | # echo 'irq:*' > /sys/kernel/debug/tracing/set_event |
143c145e LZ |
56 | |
57 | 2.2 Via the 'enable' toggle | |
58 | --------------------------- | |
59 | ||
52ad51e7 | 60 | The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy |
143c145e LZ |
61 | of directories. |
62 | ||
63 | To enable event 'sched_wakeup': | |
64 | ||
52ad51e7 | 65 | # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
143c145e LZ |
66 | |
67 | To disable it: | |
68 | ||
52ad51e7 | 69 | # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable |
143c145e LZ |
70 | |
71 | To enable all events in sched subsystem: | |
72 | ||
52ad51e7 | 73 | # echo 1 > /sys/kernel/debug/tracing/events/sched/enable |
143c145e | 74 | |
6afb1c65 | 75 | To enable all events: |
143c145e | 76 | |
52ad51e7 | 77 | # echo 1 > /sys/kernel/debug/tracing/events/enable |
143c145e LZ |
78 | |
79 | When reading one of these enable files, there are four results: | |
80 | ||
81 | 0 - all events this file affects are disabled | |
82 | 1 - all events this file affects are enabled | |
83 | X - there is a mixture of events enabled and disabled | |
84 | ? - this file does not affect any event | |
85 | ||
020e5f85 LZ |
86 | 2.3 Boot option |
87 | --------------- | |
88 | ||
89 | In order to facilitate early boot debugging, use boot option: | |
90 | ||
91 | trace_event=[event-list] | |
92 | ||
03d646e6 LZ |
93 | event-list is a comma separated list of events. See section 2.1 for event |
94 | format. | |
020e5f85 | 95 | |
143c145e LZ |
96 | 3. Defining an event-enabled tracepoint |
97 | ======================================= | |
98 | ||
99 | See The example provided in samples/trace_events | |
100 | ||
95b69608 TZ |
101 | 4. Event formats |
102 | ================ | |
103 | ||
104 | Each trace event has a 'format' file associated with it that contains | |
105 | a description of each field in a logged event. This information can | |
106 | be used to parse the binary trace stream, and is also the place to | |
107 | find the field names that can be used in event filters (see section 5). | |
108 | ||
109 | It also displays the format string that will be used to print the | |
110 | event in text mode, along with the event name and ID used for | |
111 | profiling. | |
112 | ||
113 | Every event has a set of 'common' fields associated with it; these are | |
114 | the fields prefixed with 'common_'. The other fields vary between | |
115 | events and correspond to the fields defined in the TRACE_EVENT | |
116 | definition for that event. | |
117 | ||
118 | Each field in the format has the form: | |
119 | ||
120 | field:field-type field-name; offset:N; size:N; | |
121 | ||
122 | where offset is the offset of the field in the trace record and size | |
123 | is the size of the data item, in bytes. | |
124 | ||
125 | For example, here's the information displayed for the 'sched_wakeup' | |
126 | event: | |
127 | ||
bfc672dc | 128 | # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format |
95b69608 TZ |
129 | |
130 | name: sched_wakeup | |
131 | ID: 60 | |
132 | format: | |
133 | field:unsigned short common_type; offset:0; size:2; | |
134 | field:unsigned char common_flags; offset:2; size:1; | |
135 | field:unsigned char common_preempt_count; offset:3; size:1; | |
136 | field:int common_pid; offset:4; size:4; | |
137 | field:int common_tgid; offset:8; size:4; | |
138 | ||
139 | field:char comm[TASK_COMM_LEN]; offset:12; size:16; | |
140 | field:pid_t pid; offset:28; size:4; | |
141 | field:int prio; offset:32; size:4; | |
142 | field:int success; offset:36; size:4; | |
143 | field:int cpu; offset:40; size:4; | |
144 | ||
145 | print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, | |
146 | REC->prio, REC->success, REC->cpu | |
147 | ||
148 | This event contains 10 fields, the first 5 common and the remaining 5 | |
149 | event-specific. All the fields for this event are numeric, except for | |
150 | 'comm' which is a string, a distinction important for event filtering. | |
151 | ||
152 | 5. Event filtering | |
153 | ================== | |
154 | ||
155 | Trace events can be filtered in the kernel by associating boolean | |
156 | 'filter expressions' with them. As soon as an event is logged into | |
157 | the trace buffer, its fields are checked against the filter expression | |
158 | associated with that event type. An event with field values that | |
159 | 'match' the filter will appear in the trace output, and an event whose | |
160 | values don't match will be discarded. An event with no filter | |
161 | associated with it matches everything, and is the default when no | |
162 | filter has been set for an event. | |
163 | ||
164 | 5.1 Expression syntax | |
165 | --------------------- | |
166 | ||
167 | A filter expression consists of one or more 'predicates' that can be | |
168 | combined using the logical operators '&&' and '||'. A predicate is | |
169 | simply a clause that compares the value of a field contained within a | |
170 | logged event with a constant value and returns either 0 or 1 depending | |
171 | on whether the field value matched (1) or didn't match (0): | |
172 | ||
173 | field-name relational-operator value | |
174 | ||
175 | Parentheses can be used to provide arbitrary logical groupings and | |
176 | double-quotes can be used to prevent the shell from interpreting | |
177 | operators as shell metacharacters. | |
178 | ||
179 | The field-names available for use in filters can be found in the | |
180 | 'format' files for trace events (see section 4). | |
181 | ||
182 | The relational-operators depend on the type of the field being tested: | |
183 | ||
184 | The operators available for numeric fields are: | |
185 | ||
186 | ==, !=, <, <=, >, >= | |
187 | ||
188 | And for string fields they are: | |
189 | ||
190 | ==, != | |
191 | ||
192 | Currently, only exact string matches are supported. | |
193 | ||
95b69608 TZ |
194 | 5.2 Setting filters |
195 | ------------------- | |
196 | ||
197 | A filter for an individual event is set by writing a filter expression | |
198 | to the 'filter' file for the given event. | |
199 | ||
200 | For example: | |
201 | ||
bfc672dc | 202 | # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup |
95b69608 TZ |
203 | # echo "common_preempt_count > 4" > filter |
204 | ||
205 | A slightly more involved example: | |
206 | ||
bfc672dc | 207 | # cd /sys/kernel/debug/tracing/events/signal/signal_generate |
95b69608 TZ |
208 | # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter |
209 | ||
210 | If there is an error in the expression, you'll get an 'Invalid | |
211 | argument' error when setting it, and the erroneous string along with | |
212 | an error message can be seen by looking at the filter e.g.: | |
213 | ||
bfc672dc | 214 | # cd /sys/kernel/debug/tracing/events/signal/signal_generate |
95b69608 TZ |
215 | # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter |
216 | -bash: echo: write error: Invalid argument | |
217 | # cat filter | |
218 | ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash | |
219 | ^ | |
220 | parse_error: Field not found | |
221 | ||
222 | Currently the caret ('^') for an error always appears at the beginning of | |
223 | the filter string; the error message should still be useful though | |
224 | even without more accurate position info. | |
225 | ||
226 | 5.3 Clearing filters | |
227 | -------------------- | |
228 | ||
229 | To clear the filter for an event, write a '0' to the event's filter | |
230 | file. | |
231 | ||
232 | To clear the filters for all events in a subsystem, write a '0' to the | |
233 | subsystem's filter file. | |
234 | ||
235 | 5.3 Subsystem filters | |
236 | --------------------- | |
237 | ||
238 | For convenience, filters for every event in a subsystem can be set or | |
239 | cleared as a group by writing a filter expression into the filter file | |
88393161 | 240 | at the root of the subsystem. Note however, that if a filter for any |
95b69608 TZ |
241 | event within the subsystem lacks a field specified in the subsystem |
242 | filter, or if the filter can't be applied for any other reason, the | |
243 | filter for that event will retain its previous setting. This can | |
244 | result in an unintended mixture of filters which could lead to | |
245 | confusing (to the user who might think different filters are in | |
246 | effect) trace output. Only filters that reference just the common | |
247 | fields can be guaranteed to propagate successfully to all events. | |
248 | ||
249 | Here are a few subsystem filter examples that also illustrate the | |
250 | above points: | |
251 | ||
88393161 | 252 | Clear the filters on all events in the sched subsystem: |
95b69608 TZ |
253 | |
254 | # cd /sys/kernel/debug/tracing/events/sched | |
255 | # echo 0 > filter | |
256 | # cat sched_switch/filter | |
257 | none | |
258 | # cat sched_wakeup/filter | |
259 | none | |
260 | ||
261 | Set a filter using only common fields for all events in the sched | |
88393161 | 262 | subsystem (all events end up with the same filter): |
95b69608 TZ |
263 | |
264 | # cd /sys/kernel/debug/tracing/events/sched | |
265 | # echo common_pid == 0 > filter | |
266 | # cat sched_switch/filter | |
267 | common_pid == 0 | |
268 | # cat sched_wakeup/filter | |
269 | common_pid == 0 | |
270 | ||
271 | Attempt to set a filter using a non-common field for all events in the | |
88393161 | 272 | sched subsystem (all events but those that have a prev_pid field retain |
95b69608 TZ |
273 | their old filters): |
274 | ||
275 | # cd /sys/kernel/debug/tracing/events/sched | |
276 | # echo prev_pid == 0 > filter | |
277 | # cat sched_switch/filter | |
278 | prev_pid == 0 | |
279 | # cat sched_wakeup/filter | |
280 | common_pid == 0 |