Commit | Line | Data |
---|---|---|
28ccfe49 JD |
1 | # lttng-analyses |
2 | ||
3 | This repository contains various scripts to extract monitoring data and metrics | |
4 | from LTTng kernel traces. | |
5 | ||
0f582ce8 | 6 | As opposed to other diagnostic or monitoring solutions, this approach is designed |
898ff628 JG |
7 | to allow users to record their system's activity with a low overhead, wait |
8 | for a problem to occur and then diagnose its cause offline. | |
28ccfe49 | 9 | |
0f582ce8 AB |
10 | This solution allows the user to target hard to find problems and dig until the |
11 | root cause is found. | |
28ccfe49 | 12 | |
0f582ce8 | 13 | This README describes the implemented analyses as well as how to use them. |
28ccfe49 JD |
14 | |
15 | ## Requirements | |
e3de48f7 | 16 | * LTTng >= 2.5 |
898ff628 | 17 | * Babeltrace >= 1.2 (with python bindings built) |
e3de48f7 | 18 | * Python >= 3.4 |
28ccfe49 | 19 | |
8a9f7bc2 JD |
20 | ## Install |
21 | ### Ubuntu (12.04 and 14.04 at least) | |
e3de48f7 | 22 | ```bash |
851bf235 JD |
23 | apt-get install -y software-properties-common (or python-software-properties on 12.04) |
24 | apt-add-repository -y ppa:lttng/ppa | |
25 | apt-get update | |
26 | apt-get -y install lttng-tools babeltrace lttng-modules-dkms python3-babeltrace python3-progressbar | |
8a9f7bc2 | 27 | git clone https://github.com/lttng/lttng-analyses.git |
94cdc371 JD |
28 | cd lttng-analyses |
29 | ./install.sh (or run from the local directory) | |
898ff628 JG |
30 | ``` |
31 | ||
8a9f7bc2 JD |
32 | ### Other distros |
33 | Please refer to the [LTTng documentation](http://lttng.org/download/) to | |
34 | install LTTng and the [Babeltrace | |
35 | README](http://git.efficios.com/?p=babeltrace.git;a=blob_plain;f=README;hb=HEAD) | |
00508253 | 36 | to install ```babeltrace``` with the python bindings. Optionally install the |
94cdc371 | 37 | ```progressbar``` python module, and then: |
8a9f7bc2 | 38 | |
898ff628 | 39 | ```bash |
e3de48f7 | 40 | git clone https://github.com/lttng/lttng-analyses.git |
94cdc371 JD |
41 | cd lttng-analyses |
42 | ./install.sh (or run from the local directory) | |
e3de48f7 | 43 | ``` |
28ccfe49 JD |
44 | |
45 | ## Trace creation | |
b5019297 JD |
46 | Here are the basic commands to create a trace, for more information on the |
47 | LTTng setup, please refer to the [LTTng | |
48 | documentation](http://lttng.org/docs/#doc-tracing-the-linux-kernel) | |
49 | ||
e3de48f7 JD |
50 | ### Automatic |
51 | From the cloned git tree: | |
52 | ```bash | |
53 | ./lttng-analyses-record | |
28ccfe49 | 54 | ``` |
e3de48f7 JD |
55 | |
56 | ### Manual | |
57 | ```bash | |
851bf235 JD |
58 | lttng create |
59 | lttng enable-channel -k bla --subbuf-size=4M | |
0e634170 | 60 | lttng enable-event -k sched_switch,block_rq_complete,block_rq_issue,block_bio_remap,block_bio_backmerge,netif_receive_skb,net_dev_xmit,sched_process_fork,sched_process_exec,lttng_statedump_process_state,lttng_statedump_file_descriptor,lttng_statedump_block_device,writeback_pages_written,mm_vmscan_wakeup_kswapd,mm_page_free,mm_page_alloc,block_dirty_buffer,irq_handler_entry,irq_handler_exit,softirq_entry,softirq_exit,softirq_raise -c bla |
851bf235 JD |
61 | lttng enable-event -k --syscall -a -c bla |
62 | lttng start | |
63 | ..do stuff... | |
64 | lttng stop | |
65 | lttng destroy | |
28ccfe49 JD |
66 | ``` |
67 | ||
e3de48f7 | 68 | ### Remote trace creation |
28ccfe49 | 69 | You can also create a trace on a server and send it to a remote host. The |
0f582ce8 AB |
70 | remote host only needs to run `lttng-relayd -d` and be reachable over the network. |
71 | The only difference with the above commands is the tracing session's creation: | |
e3de48f7 | 72 | ```bash |
851bf235 | 73 | lttng create -U net://<remote-host> |
3d184c69 | 74 | ``` |
28ccfe49 | 75 | |
e3de48f7 JD |
76 | ## Implemented analyses |
77 | ||
78 | * CPU usage for the whole system | |
79 | * CPU usage per-process | |
80 | * Process CPU migration count | |
81 | * Memory usage per-process (as seen by the kernel) | |
82 | * Memory usage system-wide (as seen by the kernel) | |
83 | * I/O usage (syscalls, disk, network) | |
84 | * I/O operations log (with latency and usage) | |
85 | * I/O latency statistics (open, read, write, sync operations) | |
86 | * I/O latency frequency distribution | |
87 | * Interrupt handler duration statistics (count, min, max, average stdev) | |
88 | * Interrupt handler duration top | |
89 | * Interrupt handler duration log | |
90 | * Interrupt handler duration frequency distribution | |
91 | * SoftIRQ handler latency statistics | |
92 | * Syscalls usage statistics | |
93 | ||
898ff628 JG |
94 | All of the analyses share the same code architecture making it possible |
95 | to filter by timerange, process name, PID, min and max values using the | |
96 | same command-line options. Also note that reported timestamps can | |
97 | optionally be expressed in the GMT timezone to allow easy sharing between | |
98 | teams. | |
e3de48f7 | 99 | |
898ff628 JG |
100 | The project's architecture makes it easy to add new analyses or to reuse |
101 | the analysis backend in external tools which may then present the results | |
102 | in their own format (as opposed to text). | |
e3de48f7 | 103 | |
fe109453 | 104 | ## Examples |
898ff628 JG |
105 | After having collected your trace, any script contained in this repository |
106 | can be used to run an analysis. Read on for some examples! | |
e3de48f7 JD |
107 | |
108 | ### I/O | |
109 | #### I/O latency stats | |
110 | ```bash | |
fe109453 | 111 | $ ./lttng-iolatencystats mytrace/ |
e3de48f7 JD |
112 | Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936] |
113 | Syscalls latency statistics (usec): | |
114 | Type Count Min Average Max Stdev | |
115 | ----------------------------------------------------------------------------------------- | |
116 | Open 45 5.562 13.835 77.683 15.263 | |
117 | Read 109 0.316 5.774 62.569 9.277 | |
118 | Write 101 0.256 7.060 48.531 8.555 | |
119 | Sync 207 19.384 40.664 160.188 21.201 | |
120 | ||
121 | Disk latency statistics (usec): | |
122 | Name Count Min Average Max Stdev | |
123 | ----------------------------------------------------------------------------------------- | |
124 | dm-0 108 0.001 0.004 0.007 1.306 | |
125 | ``` | |
126 | ||
127 | #### I/O latency frequency distribution | |
128 | ```bash | |
fe109453 | 129 | $ ./lttng-iolatencyfreq mytrace/ |
e3de48f7 JD |
130 | Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936] |
131 | Open latency distribution (usec) | |
28ccfe49 | 132 | ############################################################################### |
0f582ce8 AB |
133 | 5.562 ███████████████████████████████████████████████████████████████████ 25 |
134 | 9.168 ██████████ 4 | |
135 | 12.774 █████████████████████ 8 | |
136 | 16.380 ████████ 3 | |
137 | 19.986 █████ 2 | |
138 | 23.592 0 | |
139 | 27.198 0 | |
140 | 30.804 0 | |
141 | 34.410 ██ 1 | |
142 | 38.016 0 | |
143 | 41.623 0 | |
144 | 45.229 0 | |
145 | 48.835 0 | |
146 | 52.441 0 | |
147 | 56.047 0 | |
148 | 59.653 0 | |
149 | 63.259 0 | |
150 | 66.865 0 | |
151 | 70.471 0 | |
152 | 74.077 █████ 2 | |
e3de48f7 JD |
153 | ``` |
154 | ||
bb0e7459 JD |
155 | #### I/O latency top |
156 | ```bash | |
157 | $ ./lttng-iolatencytop analysis-20150115-120942/ --limit 3 --minsize 2 | |
158 | Checking the trace for lost events... | |
159 | Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313] | |
160 | Top open syscall latencies (usec) | |
161 | Begin End Name Duration (usec) Size Proc PID Filename | |
162 | [12:18:50.432950815,12:18:50.870648568] open 437697.753 N/A apache2 31517 /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13) | |
163 | [12:18:52.946080165,12:18:52.946132278] open 52.113 N/A apache2 31588 /var/lib/php5/sess_mr9045p1k55vin1h0vg7rhgd63 (fd=13) | |
164 | [12:18:46.800846035,12:18:46.800874916] open 28.881 N/A apache2 31591 /var/lib/php5/sess_r7c12pccfvjtas15g3j69u14h0 (fd=13) | |
165 | [12:18:51.389797604,12:18:51.389824426] open 26.822 N/A apache2 31520 /var/lib/php5/sess_4sdb1rtjkhb78sabnoj8gpbl00 (fd=13) | |
166 | ||
167 | Top read syscall latencies (usec) | |
168 | Begin End Name Duration (usec) Size Proc PID Filename | |
169 | [12:18:37.256073107,12:18:37.256555967] read 482.860 7.00 B bash 10237 unknown (origin not found) (fd=3) | |
170 | [12:18:52.000209798,12:18:52.000252304] read 42.506 1.00 KB irqbalance 1337 /proc/interrupts (fd=3) | |
171 | [12:18:37.256559439,12:18:37.256601615] read 42.176 5.00 B bash 10237 unknown (origin not found) (fd=3) | |
172 | [12:18:42.000281918,12:18:42.000320016] read 38.098 1.00 KB irqbalance 1337 /proc/interrupts (fd=3) | |
173 | ||
174 | Top write syscall latencies (usec) | |
175 | Begin End Name Duration (usec) Size Proc PID Filename | |
176 | [12:18:49.913241516,12:18:49.915908862] write 2667.346 95.00 B apache2 31584 /var/log/apache2/access.log (fd=8) | |
177 | [12:18:37.472823631,12:18:37.472859836] writev 36.205 21.97 KB apache2 31544 unknown (origin not found) (fd=12) | |
178 | [12:18:37.991578372,12:18:37.991612724] writev 34.352 21.97 KB apache2 31589 unknown (origin not found) (fd=12) | |
179 | [12:18:39.547778549,12:18:39.547812515] writev 33.966 21.97 KB apache2 31584 unknown (origin not found) (fd=12) | |
180 | ||
181 | Top sync syscall latencies (usec) | |
182 | Begin End Name Duration (usec) Size Proc PID Filename | |
183 | [12:18:50.162776739,12:18:51.157522361] sync 994745.622 N/A sync 22791 None (fd=None) | |
184 | [12:18:37.227867532,12:18:37.232289687] sync_file_range 4422.155 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32) | |
185 | [12:18:37.238076585,12:18:37.239012027] sync_file_range 935.442 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32) | |
186 | [12:18:37.220974711,12:18:37.221647124] sync_file_range 672.413 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32) | |
187 | ``` | |
188 | ||
e3de48f7 JD |
189 | #### I/O operations log |
190 | ```bash | |
fe109453 | 191 | $ ./lttng-iolog mytrace/ |
e3de48f7 JD |
192 | [10:58:26.221618530,10:58:26.221620659] write 2.129 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5) |
193 | [10:58:26.221623609,10:58:26.221628055] read 4.446 50.00 B /usr/bin/x-term 11793 /dev/ptmx (fd=24) | |
194 | [10:58:26.221638929,10:58:26.221640008] write 1.079 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5) | |
195 | [10:58:26.221676232,10:58:26.221677385] read 1.153 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5) | |
196 | [10:58:26.223401804,10:58:26.223411683] open 9.879 N/A sleep 12420 /etc/ld.so.cache (fd=3) | |
197 | [10:58:26.223448060,10:58:26.223455577] open 7.517 N/A sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3) | |
198 | [10:58:26.223456522,10:58:26.223458898] read 2.376 832.00 B sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3) | |
0f582ce8 | 199 | [10:58:26.223918068,10:58:26.223929316] open 11.248 N/A sleep 12420 (fd=3) |
e3de48f7 JD |
200 | [10:58:26.231881565,10:58:26.231895970] writev 14.405 16.00 B /usr/bin/x-term 11793 socket:[45650] (fd=4) |
201 | [10:58:26.231979636,10:58:26.231988446] recvmsg 8.810 16.00 B Xorg 1827 socket:[47480] (fd=38) | |
202 | ``` | |
203 | ||
204 | ||
205 | #### I/O usage top | |
206 | ```bash | |
fe109453 | 207 | $ ./lttng-iousagetop traces/pgread-writes |
e3de48f7 JD |
208 | Timerange: [2014-10-07 16:36:00.733214969, 2014-10-07 16:36:18.804584183] |
209 | Per-process I/O Read | |
210 | ############################################################################### | |
211 | ██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 4.00 B net 16.00 MB unknown | |
212 | █████ 1.72 MB lttng-consumerd (2619) 0 B file 0 B net 1.72 MB unknown | |
213 | █ 398.13 KB postgres (4219) 121.05 KB file 277.07 KB net 8.00 B unknown | |
214 | 256.09 KB postgres (1348) 0 B file 255.97 KB net 117.00 B unknown | |
215 | 204.81 KB postgres (4218) 204.81 KB file 0 B net 0 B unknown | |
216 | 123.77 KB postgres (4220) 117.50 KB file 6.26 KB net 8.00 B unknown | |
217 | Per-process I/O Write | |
218 | ############################################################################### | |
0f582ce8 AB |
219 | ██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 8.00 MB net 8.00 MB unknown |
220 | ██████ 2.20 MB postgres (4219) 2.00 MB file 202.23 KB net 0 B unknown | |
221 | █████ 1.73 MB lttng-consumerd (2619) 0 B file 887.73 KB net 882.58 KB unknown | |
222 | ██ 726.33 KB postgres (1165) 8.00 KB file 6.33 KB net 712.00 KB unknown | |
223 | 158.69 KB postgres (1168) 158.69 KB file 0 B net 0 B unknown | |
224 | 80.66 KB postgres (1348) 0 B file 80.66 KB net 0 B unknown | |
28ccfe49 | 225 | Files Read |
e3de48f7 JD |
226 | ############################################################################### |
227 | ██████████████████████████████████████████████████ 8.00 MB anon_inode:[lttng_stream] (lttng-consumerd) 'fd 32 in lttng-consumerd (2619)' | |
228 | █████ 834.41 KB base/16384/pg_internal.init 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)' | |
229 | █ 256.09 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)' | |
230 | █ 174.69 KB pg_stat_tmp/pgstat.stat 'fd 9 in postgres (4218)', 'fd 9 in postgres (1167)' | |
231 | 109.48 KB global/pg_internal.init 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)' | |
232 | 104.30 KB base/11951/pg_internal.init 'fd 7 in postgres (4218)' | |
233 | 12.85 KB socket (lttng-sessiond) 'fd 30 in lttng-sessiond (384)' | |
234 | 4.50 KB global/pg_filenode.map 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)' | |
235 | 4.16 KB socket (postgres) 'fd 9 in postgres (4226)' | |
236 | 4.00 KB /proc/interrupts 'fd 3 in irqbalance (1104)' | |
19952e0f | 237 | Files Write |
e3de48f7 JD |
238 | ############################################################################### |
239 | ██████████████████████████████████████████████████ 8.00 MB socket:[56371] (lttng-consumerd) 'fd 30 in lttng-consumerd (2619)' | |
240 | █████████████████████████████████████████████████ 8.00 MB pipe:[53306] (lttng-consumerd) 'fd 12 in lttng-consumerd (2619)' | |
241 | ██████████ 1.76 MB pg_xlog/00000001000000000000000B 'fd 31 in postgres (4219)' | |
242 | █████ 887.82 KB socket:[56369] (lttng-consumerd) 'fd 26 in lttng-consumerd (2619)' | |
243 | █████ 882.58 KB pipe:[53309] (lttng-consumerd) 'fd 18 in lttng-consumerd (2619)' | |
244 | 160.00 KB /var/lib/postgresql/9.1/main/base/16384/16602 'fd 14 in postgres (1165)' | |
245 | 158.69 KB pg_stat_tmp/pgstat.tmp 'fd 3 in postgres (1168)' | |
246 | 144.00 KB /var/lib/postgresql/9.1/main/base/16384/16613 'fd 12 in postgres (1165)' | |
247 | 88.00 KB /var/lib/postgresql/9.1/main/base/16384/16609 'fd 11 in postgres (1165)' | |
248 | 78.28 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)' | |
249 | Block I/O Read | |
250 | ############################################################################### | |
251 | Block I/O Write | |
252 | ############################################################################### | |
0f582ce8 AB |
253 | ██████████████████████████████████████████████████ 1.76 MB postgres (pid=4219) |
254 | ████ 160.00 KB postgres (pid=1168) | |
e3de48f7 | 255 | ██ 100.00 KB kworker/u8:0 (pid=1540) |
0f582ce8 AB |
256 | ██ 96.00 KB jbd2/vda1-8 (pid=257) |
257 | █ 40.00 KB postgres (pid=1166) | |
e3de48f7 JD |
258 | 8.00 KB kworker/u9:0 (pid=4197) |
259 | 4.00 KB kworker/u9:2 (pid=1381) | |
28ccfe49 JD |
260 | Disk nr_sector |
261 | ############################################################################### | |
e3de48f7 | 262 | ███████████████████████████████████████████████████████████████████ 4416.00 sectors vda1 |
28ccfe49 JD |
263 | Disk nr_requests |
264 | ############################################################################### | |
e3de48f7 | 265 | ████████████████████████████████████████████████████████████████████ 177.00 requests vda1 |
28ccfe49 JD |
266 | Disk request time/sector |
267 | ############################################################################### | |
e3de48f7 | 268 | ██████████████████████████████████████████████████████████████████ 0.01 ms vda1 |
28ccfe49 JD |
269 | Network recv_bytes |
270 | ############################################################################### | |
e3de48f7 | 271 | ███████████████████████████████████████████████████████ 739.50 KB eth0 |
0f582ce8 | 272 | █████ 80.27 KB lo |
28ccfe49 JD |
273 | Network sent_bytes |
274 | ############################################################################### | |
e3de48f7 | 275 | ████████████████████████████████████████████████████████ 9.36 MB eth0 |
28ccfe49 | 276 | ``` |
19952e0f | 277 | |
21167679 JD |
278 | ### Syscalls |
279 | #### Statistics | |
280 | ```bash | |
281 | $ ./lttng-syscallstats mytrace/ | |
282 | Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313] | |
283 | Per-TID syscalls statistics (usec) | |
284 | find (22785), 71847 syscalls Count Min Average Max Stdev Return values | |
285 | syscall_entry_getdents 14240 0.380 364.301 43372.450 1629.390 {'success': 14240} | |
286 | syscall_entry_close 14236 0.233 0.506 4.932 0.217 {'success': 14236} | |
287 | syscall_entry_fchdir 14231 0.252 0.407 5.769 0.117 {'success': 14231} | |
288 | syscall_entry_open 7123 0.779 2.321 12.697 0.936 {'ENOENT': 4, 'success': 7119} | |
289 | syscall_entry_newfstatat 7118 1.457 143.562 28103.532 1410.281 {'success': 7118} | |
290 | syscall_entry_openat 7118 1.525 2.411 9.107 0.771 {'success': 7118} | |
291 | syscall_entry_newfstat 7117 0.272 0.654 8.707 0.248 {'success': 7117} | |
292 | syscall_entry_write 573 0.298 0.715 8.584 0.391 {'success': 573} | |
293 | syscall_entry_brk 27 0.615 5.768 30.792 7.830 {'success': 27} | |
294 | syscall_entry_rt_sigaction 22 0.227 0.283 0.589 0.098 {'success': 22} | |
295 | syscall_entry_mmap 12 1.116 2.116 3.597 0.762 {'success': 12} | |
296 | syscall_entry_mprotect 6 1.185 2.235 3.923 1.148 {'success': 6} | |
297 | syscall_entry_read 5 0.925 2.101 6.300 2.351 {'success': 5} | |
298 | syscall_entry_access 4 1.166 2.530 4.202 1.527 {'ENOENT': 4} | |
299 | syscall_entry_ioctl 4 0.342 1.151 2.280 0.873 {'ENOTTY': 2, 'success': 2} | |
300 | syscall_entry_rt_sigprocmask 3 0.325 0.570 0.979 0.357 {'success': 3} | |
301 | syscall_entry_munmap 2 3.006 5.399 7.792 ? {'success': 2} | |
302 | syscall_entry_dup2 2 0.250 0.562 0.874 ? {'success': 2} | |
303 | syscall_entry_setpgid 1 0.945 0.945 0.945 ? {'success': 1} | |
304 | syscall_entry_newuname 1 1.240 1.240 1.240 ? {'success': 1} | |
305 | syscall_entry_execve 1 7277.974 7277.974 7277.974 ? {'success': 1} | |
306 | syscall_entry_fcntl 1 ? 0.000 0.000 ? {} | |
307 | ----------------------------------------------------------------------------------------------------------------- | |
308 | apache2 (31517), 1131 syscalls Count Min Average Max Stdev Return values | |
309 | syscall_entry_fcntl 192 ? 0.000 0.000 ? {} | |
310 | syscall_entry_newfstat 156 0.237 0.484 1.102 0.222 {'success': 156} | |
311 | syscall_entry_read 144 0.307 1.602 16.307 1.698 {'EAGAIN': 27, 'success': 117} | |
312 | syscall_entry_access 96 0.705 1.580 3.364 0.670 {'ENOENT': 84, 'success': 12} | |
313 | syscall_entry_newlstat 84 0.459 0.738 1.456 0.186 {'ENOENT': 21, 'success': 63} | |
314 | syscall_entry_newstat 74 0.735 2.266 11.212 1.772 {'ENOENT': 24, 'success': 50} | |
315 | syscall_entry_lseek 72 0.317 0.522 0.915 0.112 {'success': 72} | |
316 | syscall_entry_close 39 0.471 0.615 0.867 0.069 {'success': 39} | |
317 | syscall_entry_open 36 2.219 12162.689 437697.753 72948.868 {'success': 36} | |
318 | syscall_entry_getcwd 28 0.287 0.701 1.331 0.277 {'success': 28} | |
319 | syscall_entry_poll 27 1.080 1139.669 2851.163 856.723 {'success': 27} | |
320 | ``` | |
321 | ||
e3de48f7 JD |
322 | ### IRQ |
323 | #### Handler duration and raise latency statistics | |
324 | ```bash | |
fe109453 | 325 | $ ./lttng-irqstats mytrace/ |
e3de48f7 JD |
326 | Timerange: [2014-03-11 16:05:41.314824752, 2014-03-11 16:05:45.041994298] |
327 | Hard IRQ Duration (us) | |
0f582ce8 | 328 | count min avg max stdev |
e3de48f7 | 329 | ----------------------------------------------------------------------------------| |
0f582ce8 AB |
330 | 1: <i8042> 30 10.901 45.500 64.510 18.447 | |
331 | 42: <ahci> 259 3.203 7.863 21.426 3.183 | | |
332 | 43: <eth0> 2 3.859 3.976 4.093 0.165 | | |
333 | 44: <iwlwifi> 92 0.300 3.995 6.542 2.181 | | |
e3de48f7 JD |
334 | |
335 | Soft IRQ Duration (us) Raise latency (us) | |
0f582ce8 | 336 | count min avg max stdev | count min avg max stdev |
e3de48f7 JD |
337 | ----------------------------------------------------------------------------------|------------------------------------------------------------ |
338 | 1: <TIMER_SOFTIRQ> 495 0.202 21.058 51.060 11.047 | 53 2.141 11.217 20.005 7.233 | |
339 | 3: <NET_RX_SOFTIRQ> 14 0.133 9.177 32.774 10.483 | 14 0.763 3.703 10.902 3.448 | |
340 | 4: <BLOCK_SOFTIRQ> 257 5.981 29.064 125.862 15.891 | 257 0.891 3.104 15.054 2.046 | |
341 | 6: <TASKLET_SOFTIRQ> 26 0.309 1.198 1.748 0.329 | 26 9.636 39.222 51.430 11.246 | |
342 | 7: <SCHED_SOFTIRQ> 299 1.185 14.768 90.465 15.992 | 298 1.286 31.387 61.700 11.866 | |
343 | 9: <RCU_SOFTIRQ> 338 0.592 3.387 13.745 1.356 | 147 2.480 29.299 64.453 14.286 | |
344 | ``` | |
345 | ||
346 | #### Handler duration frequency distribution | |
347 | ```bash | |
fe109453 | 348 | $ ./lttng-irqfreq --timerange [16:05:42,16:05:45] --irq 44 --stats mytrace/ |
0f582ce8 | 349 | Timerange: [2014-03-11 16:05:42.042034570, 2014-03-11 16:05:44.998914297] |
e3de48f7 | 350 | Hard IRQ Duration (us) |
0f582ce8 | 351 | count min avg max stdev |
e3de48f7 | 352 | ----------------------------------------------------------------------------------| |
0f582ce8 | 353 | 44: <iwlwifi> 72 0.300 4.018 6.542 2.164 | |
e3de48f7 JD |
354 | Frequency distribution iwlwifi (44) |
355 | ############################################################################### | |
0f582ce8 AB |
356 | 0.300 █████ 1.00 |
357 | 0.612 ██████████████████████████████████████████████████████████████ 12.00 | |
358 | 0.924 ████████████████████ 4.00 | |
359 | 1.236 ██████████ 2.00 | |
360 | 1.548 0.00 | |
361 | 1.861 █████ 1.00 | |
362 | 2.173 0.00 | |
363 | 2.485 █████ 1.00 | |
364 | 2.797 ██████████████████████████ 5.00 | |
365 | 3.109 █████ 1.00 | |
366 | 3.421 ███████████████ 3.00 | |
367 | 3.733 0.00 | |
368 | 4.045 █████ 1.00 | |
369 | 4.357 █████ 1.00 | |
370 | 4.669 ██████████ 2.00 | |
371 | 4.981 ██████████ 2.00 | |
372 | 5.294 █████████████████████████████████████████ 8.00 | |
373 | 5.606 ████████████████████████████████████████████████████████████████████ 13.00 | |
374 | 5.918 ██████████████████████████████████████████████████████████████ 12.00 | |
375 | 6.230 ███████████████ 3.00 | |
e3de48f7 JD |
376 | ``` |
377 | ||
378 | ### Others | |
379 | There are a lot of other scripts, we encourage you to try them and read the | |
380 | ```--help``` to see all the available options. | |
381 | ||
173fa485 | 382 | ## Work in progress |
19952e0f JD |
383 | Track the page cache and extract the latencies associated with pages flush to disk. |
384 | In order to do that, we rely on the assumption that the pages are flushed in a FIFO | |
385 | order. It might not be 100% accurate, but it already gives great results : | |
386 | ||
387 | An example here when saving a file in vim: | |
173fa485 | 388 | ``` |
19952e0f JD |
389 | [19:57:51.173332284 - 19:57:51.177794657] vim (31517) syscall_entry_fsync(fd = 4 <blabla>) = 0, 4.462 ms |
390 | 1 dirty page(s) were flushed (assuming FIFO): | |
391 | vim (31517): 1 pages | |
392 | - blabla : 1 pages | |
393 | 13 active dirty filesystem page(s) (known): | |
394 | redis-server (2092): 2 pages | |
395 | - /var/log/redis/redis-server.log : 2 pages | |
396 | vim (31517): 2 pages | |
397 | - .blabla.swp : 2 pages | |
398 | lttng-consumerd (6750): 9 pages | |
399 | - unknown (origin not found) : 9 pages | |
400 | ``` | |
401 | ||
402 | An other example when running the 'sync' command: | |
173fa485 | 403 | ``` |
19952e0f JD |
404 | [19:57:53.046840755 - 19:57:53.072809609] sync (31554) syscall_entry_sync(fd = <unknown>) = 0, 25.969 ms |
405 | 23 dirty page(s) were flushed (assuming FIFO): | |
406 | redis-server (2092): 2 pages | |
407 | - /var/log/redis/redis-server.log : 2 pages | |
408 | vim (31517): 9 pages | |
409 | - /home/julien/.viminfo.tmp : 6 pages | |
410 | - .blabla.swp : 3 pages | |
411 | lttng-consumerd (6750): 12 pages | |
412 | - unknown (origin not found) : 12 pages | |
413 | ``` | |
414 | ||
c6bb95df | 415 | PostgreSQL with 'sys_fdatasync': |
19952e0f JD |
416 | ``` |
417 | [13:49:39.908599447 - 13:49:39.915930730] postgres (1137) sys_fdatasync(fd = 7 </var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008>) = 0, 7.331 ms | |
418 | 2 pages allocated during the period | |
419 | 88 dirty page(s) were flushed (assuming FIFO): | |
420 | postgres (1137): 88 pages | |
421 | - /var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008 : 88 pages | |
422 | 68 last dirtied filesystem page(s): | |
423 | postgres (2419): 68 pages | |
424 | - base/11951/18410 : 46 pages | |
425 | - base/11951/18407 : 10 pages | |
426 | - base/11951/18407_fsm : 6 pages | |
427 | - base/11951/18410_fsm : 6 pages | |
428 | ``` | |
429 | ||
898ff628 | 430 | Detecting a fight for the I/O between a huge write and postgresql: |
19952e0f JD |
431 | ``` |
432 | [13:49:47.242730583 - 13:49:47.442835037] python (2353) sys_write(fd = 3 </root/bla>, count = 102395904) = 102395904, 200.104 ms | |
433 | 34760 pages allocated during the period | |
434 | woke up kswapd during the period | |
435 | 10046 pages written on disk | |
436 | freed 33753 pages from the cache during the period | |
437 | 1397 last dirtied filesystem page(s): | |
438 | python (2353): 1325 pages | |
439 | - /root/bla : 1325 pages | |
440 | postgres (2419): 72 pages | |
441 | - base/11951/18419 : 72 pages | |
442 | ``` | |
443 | ||
e3de48f7 JD |
444 | ## Limitations |
445 | ||
446 | The main limitation of this project is the fact that it can be quite slow to | |
447 | process a large trace. This project is a work in progress and we focus on the | |
0f582ce8 AB |
448 | problem-solving aspect. Therefore, features have been prioritized over |
449 | performance for now. | |
e3de48f7 | 450 | |
898ff628 JG |
451 | One other aspect is the fact that the state is not persistent; the trace has |
452 | to be re-processed if another analysis script is to be used on the same trace. | |
453 | Some scripts belonging to the same category allow the combination of multiple | |
454 | analyses into a single pass (see ```--freq```, ```--log```, ```--usage```, | |
e3de48f7 JD |
455 | ```--latencystats```, etc). We are planning to add a way to save the state |
456 | and/or create an interactive environment to allow the user to run multiple | |
457 | analyses on the same trace without having to process the trace every time. | |
458 | ||
459 | ||
19952e0f | 460 | ## Conclusion |
898ff628 JG |
461 | We hope you have fun trying this project and please remember it is a work in |
462 | progress; feedback, bug reports and improvement ideas are always welcome! |