fix: lost link to the LTTng documentation
[deliverable/lttng-analyses.git] / README.md
CommitLineData
28ccfe49
JD
1# lttng-analyses
2
3This repository contains various scripts to extract monitoring data and metrics
4from LTTng kernel traces.
5
0f582ce8 6As opposed to other diagnostic or monitoring solutions, this approach is designed
898ff628
JG
7to allow users to record their system's activity with a low overhead, wait
8for a problem to occur and then diagnose its cause offline.
28ccfe49 9
0f582ce8
AB
10This solution allows the user to target hard to find problems and dig until the
11root cause is found.
28ccfe49 12
0f582ce8 13This README describes the implemented analyses as well as how to use them.
28ccfe49
JD
14
15## Requirements
e3de48f7 16* LTTng >= 2.5
898ff628 17* Babeltrace >= 1.2 (with python bindings built)
e3de48f7 18* Python >= 3.4
28ccfe49
JD
19
20## Install on Ubuntu (12.04 and 14.04 at least)
e3de48f7 21```bash
851bf235
JD
22apt-get install -y software-properties-common (or python-software-properties on 12.04)
23apt-add-repository -y ppa:lttng/ppa
24apt-get update
25apt-get -y install lttng-tools babeltrace lttng-modules-dkms python3-babeltrace python3-progressbar
898ff628
JG
26```
27
898ff628 28```bash
e3de48f7
JD
29git clone https://github.com/lttng/lttng-analyses.git
30```
28ccfe49
JD
31
32## Trace creation
b5019297
JD
33Here are the basic commands to create a trace, for more information on the
34LTTng setup, please refer to the [LTTng
35documentation](http://lttng.org/docs/#doc-tracing-the-linux-kernel)
36
e3de48f7
JD
37### Automatic
38From the cloned git tree:
39```bash
40./lttng-analyses-record
28ccfe49 41```
e3de48f7
JD
42
43### Manual
44```bash
851bf235
JD
45lttng create
46lttng enable-channel -k bla --subbuf-size=4M
0e634170 47lttng enable-event -k sched_switch,block_rq_complete,block_rq_issue,block_bio_remap,block_bio_backmerge,netif_receive_skb,net_dev_xmit,sched_process_fork,sched_process_exec,lttng_statedump_process_state,lttng_statedump_file_descriptor,lttng_statedump_block_device,writeback_pages_written,mm_vmscan_wakeup_kswapd,mm_page_free,mm_page_alloc,block_dirty_buffer,irq_handler_entry,irq_handler_exit,softirq_entry,softirq_exit,softirq_raise -c bla
851bf235
JD
48lttng enable-event -k --syscall -a -c bla
49lttng start
50..do stuff...
51lttng stop
52lttng destroy
28ccfe49
JD
53```
54
e3de48f7 55### Remote trace creation
28ccfe49 56You can also create a trace on a server and send it to a remote host. The
0f582ce8
AB
57remote host only needs to run `lttng-relayd -d` and be reachable over the network.
58The only difference with the above commands is the tracing session's creation:
e3de48f7 59```bash
851bf235 60lttng create -U net://<remote-host>
3d184c69 61```
28ccfe49 62
e3de48f7
JD
63## Implemented analyses
64
65* CPU usage for the whole system
66* CPU usage per-process
67* Process CPU migration count
68* Memory usage per-process (as seen by the kernel)
69* Memory usage system-wide (as seen by the kernel)
70* I/O usage (syscalls, disk, network)
71* I/O operations log (with latency and usage)
72* I/O latency statistics (open, read, write, sync operations)
73* I/O latency frequency distribution
74* Interrupt handler duration statistics (count, min, max, average stdev)
75* Interrupt handler duration top
76* Interrupt handler duration log
77* Interrupt handler duration frequency distribution
78* SoftIRQ handler latency statistics
79* Syscalls usage statistics
80
898ff628
JG
81All of the analyses share the same code architecture making it possible
82to filter by timerange, process name, PID, min and max values using the
83same command-line options. Also note that reported timestamps can
84optionally be expressed in the GMT timezone to allow easy sharing between
85teams.
e3de48f7 86
898ff628
JG
87The project's architecture makes it easy to add new analyses or to reuse
88the analysis backend in external tools which may then present the results
89in their own format (as opposed to text).
e3de48f7 90
fe109453 91## Examples
898ff628
JG
92After having collected your trace, any script contained in this repository
93can be used to run an analysis. Read on for some examples!
e3de48f7
JD
94
95### I/O
96#### I/O latency stats
97```bash
fe109453 98$ ./lttng-iolatencystats mytrace/
e3de48f7
JD
99Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936]
100Syscalls latency statistics (usec):
101Type Count Min Average Max Stdev
102-----------------------------------------------------------------------------------------
103Open 45 5.562 13.835 77.683 15.263
104Read 109 0.316 5.774 62.569 9.277
105Write 101 0.256 7.060 48.531 8.555
106Sync 207 19.384 40.664 160.188 21.201
107
108Disk latency statistics (usec):
109Name Count Min Average Max Stdev
110-----------------------------------------------------------------------------------------
111dm-0 108 0.001 0.004 0.007 1.306
112```
113
114#### I/O latency frequency distribution
115```bash
fe109453 116$ ./lttng-iolatencyfreq mytrace/
e3de48f7
JD
117Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936]
118Open latency distribution (usec)
28ccfe49 119###############################################################################
0f582ce8
AB
120 5.562 ███████████████████████████████████████████████████████████████████ 25
121 9.168 ██████████ 4
12212.774 █████████████████████ 8
12316.380 ████████ 3
12419.986 █████ 2
12523.592 0
12627.198 0
12730.804 0
12834.410 ██ 1
12938.016 0
13041.623 0
13145.229 0
13248.835 0
13352.441 0
13456.047 0
13559.653 0
13663.259 0
13766.865 0
13870.471 0
13974.077 █████ 2
e3de48f7
JD
140```
141
bb0e7459
JD
142#### I/O latency top
143```bash
144$ ./lttng-iolatencytop analysis-20150115-120942/ --limit 3 --minsize 2
145Checking the trace for lost events...
146Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313]
147Top open syscall latencies (usec)
148Begin End Name Duration (usec) Size Proc PID Filename
149[12:18:50.432950815,12:18:50.870648568] open 437697.753 N/A apache2 31517 /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
150[12:18:52.946080165,12:18:52.946132278] open 52.113 N/A apache2 31588 /var/lib/php5/sess_mr9045p1k55vin1h0vg7rhgd63 (fd=13)
151[12:18:46.800846035,12:18:46.800874916] open 28.881 N/A apache2 31591 /var/lib/php5/sess_r7c12pccfvjtas15g3j69u14h0 (fd=13)
152[12:18:51.389797604,12:18:51.389824426] open 26.822 N/A apache2 31520 /var/lib/php5/sess_4sdb1rtjkhb78sabnoj8gpbl00 (fd=13)
153
154Top read syscall latencies (usec)
155Begin End Name Duration (usec) Size Proc PID Filename
156[12:18:37.256073107,12:18:37.256555967] read 482.860 7.00 B bash 10237 unknown (origin not found) (fd=3)
157[12:18:52.000209798,12:18:52.000252304] read 42.506 1.00 KB irqbalance 1337 /proc/interrupts (fd=3)
158[12:18:37.256559439,12:18:37.256601615] read 42.176 5.00 B bash 10237 unknown (origin not found) (fd=3)
159[12:18:42.000281918,12:18:42.000320016] read 38.098 1.00 KB irqbalance 1337 /proc/interrupts (fd=3)
160
161Top write syscall latencies (usec)
162Begin End Name Duration (usec) Size Proc PID Filename
163[12:18:49.913241516,12:18:49.915908862] write 2667.346 95.00 B apache2 31584 /var/log/apache2/access.log (fd=8)
164[12:18:37.472823631,12:18:37.472859836] writev 36.205 21.97 KB apache2 31544 unknown (origin not found) (fd=12)
165[12:18:37.991578372,12:18:37.991612724] writev 34.352 21.97 KB apache2 31589 unknown (origin not found) (fd=12)
166[12:18:39.547778549,12:18:39.547812515] writev 33.966 21.97 KB apache2 31584 unknown (origin not found) (fd=12)
167
168Top sync syscall latencies (usec)
169Begin End Name Duration (usec) Size Proc PID Filename
170[12:18:50.162776739,12:18:51.157522361] sync 994745.622 N/A sync 22791 None (fd=None)
171[12:18:37.227867532,12:18:37.232289687] sync_file_range 4422.155 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
172[12:18:37.238076585,12:18:37.239012027] sync_file_range 935.442 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
173[12:18:37.220974711,12:18:37.221647124] sync_file_range 672.413 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
174```
175
e3de48f7
JD
176#### I/O operations log
177```bash
fe109453 178$ ./lttng-iolog mytrace/
e3de48f7
JD
179[10:58:26.221618530,10:58:26.221620659] write 2.129 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
180[10:58:26.221623609,10:58:26.221628055] read 4.446 50.00 B /usr/bin/x-term 11793 /dev/ptmx (fd=24)
181[10:58:26.221638929,10:58:26.221640008] write 1.079 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
182[10:58:26.221676232,10:58:26.221677385] read 1.153 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
183[10:58:26.223401804,10:58:26.223411683] open 9.879 N/A sleep 12420 /etc/ld.so.cache (fd=3)
184[10:58:26.223448060,10:58:26.223455577] open 7.517 N/A sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3)
185[10:58:26.223456522,10:58:26.223458898] read 2.376 832.00 B sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3)
0f582ce8 186[10:58:26.223918068,10:58:26.223929316] open 11.248 N/A sleep 12420 (fd=3)
e3de48f7
JD
187[10:58:26.231881565,10:58:26.231895970] writev 14.405 16.00 B /usr/bin/x-term 11793 socket:[45650] (fd=4)
188[10:58:26.231979636,10:58:26.231988446] recvmsg 8.810 16.00 B Xorg 1827 socket:[47480] (fd=38)
189```
190
191
192#### I/O usage top
193```bash
fe109453 194$ ./lttng-iousagetop traces/pgread-writes
e3de48f7
JD
195Timerange: [2014-10-07 16:36:00.733214969, 2014-10-07 16:36:18.804584183]
196Per-process I/O Read
197###############################################################################
198██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 4.00 B net 16.00 MB unknown
199█████ 1.72 MB lttng-consumerd (2619) 0 B file 0 B net 1.72 MB unknown
200█ 398.13 KB postgres (4219) 121.05 KB file 277.07 KB net 8.00 B unknown
201 256.09 KB postgres (1348) 0 B file 255.97 KB net 117.00 B unknown
202 204.81 KB postgres (4218) 204.81 KB file 0 B net 0 B unknown
203 123.77 KB postgres (4220) 117.50 KB file 6.26 KB net 8.00 B unknown
204Per-process I/O Write
205###############################################################################
0f582ce8
AB
206██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 8.00 MB net 8.00 MB unknown
207██████ 2.20 MB postgres (4219) 2.00 MB file 202.23 KB net 0 B unknown
208█████ 1.73 MB lttng-consumerd (2619) 0 B file 887.73 KB net 882.58 KB unknown
209██ 726.33 KB postgres (1165) 8.00 KB file 6.33 KB net 712.00 KB unknown
210 158.69 KB postgres (1168) 158.69 KB file 0 B net 0 B unknown
211 80.66 KB postgres (1348) 0 B file 80.66 KB net 0 B unknown
28ccfe49 212Files Read
e3de48f7
JD
213###############################################################################
214██████████████████████████████████████████████████ 8.00 MB anon_inode:[lttng_stream] (lttng-consumerd) 'fd 32 in lttng-consumerd (2619)'
215█████ 834.41 KB base/16384/pg_internal.init 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
216█ 256.09 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)'
217█ 174.69 KB pg_stat_tmp/pgstat.stat 'fd 9 in postgres (4218)', 'fd 9 in postgres (1167)'
218 109.48 KB global/pg_internal.init 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
219 104.30 KB base/11951/pg_internal.init 'fd 7 in postgres (4218)'
220 12.85 KB socket (lttng-sessiond) 'fd 30 in lttng-sessiond (384)'
221 4.50 KB global/pg_filenode.map 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
222 4.16 KB socket (postgres) 'fd 9 in postgres (4226)'
223 4.00 KB /proc/interrupts 'fd 3 in irqbalance (1104)'
19952e0f 224Files Write
e3de48f7
JD
225###############################################################################
226██████████████████████████████████████████████████ 8.00 MB socket:[56371] (lttng-consumerd) 'fd 30 in lttng-consumerd (2619)'
227█████████████████████████████████████████████████ 8.00 MB pipe:[53306] (lttng-consumerd) 'fd 12 in lttng-consumerd (2619)'
228██████████ 1.76 MB pg_xlog/00000001000000000000000B 'fd 31 in postgres (4219)'
229█████ 887.82 KB socket:[56369] (lttng-consumerd) 'fd 26 in lttng-consumerd (2619)'
230█████ 882.58 KB pipe:[53309] (lttng-consumerd) 'fd 18 in lttng-consumerd (2619)'
231 160.00 KB /var/lib/postgresql/9.1/main/base/16384/16602 'fd 14 in postgres (1165)'
232 158.69 KB pg_stat_tmp/pgstat.tmp 'fd 3 in postgres (1168)'
233 144.00 KB /var/lib/postgresql/9.1/main/base/16384/16613 'fd 12 in postgres (1165)'
234 88.00 KB /var/lib/postgresql/9.1/main/base/16384/16609 'fd 11 in postgres (1165)'
235 78.28 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)'
236Block I/O Read
237###############################################################################
238Block I/O Write
239###############################################################################
0f582ce8
AB
240██████████████████████████████████████████████████ 1.76 MB postgres (pid=4219)
241████ 160.00 KB postgres (pid=1168)
e3de48f7 242██ 100.00 KB kworker/u8:0 (pid=1540)
0f582ce8
AB
243██ 96.00 KB jbd2/vda1-8 (pid=257)
244█ 40.00 KB postgres (pid=1166)
e3de48f7
JD
245 8.00 KB kworker/u9:0 (pid=4197)
246 4.00 KB kworker/u9:2 (pid=1381)
28ccfe49
JD
247Disk nr_sector
248###############################################################################
e3de48f7 249███████████████████████████████████████████████████████████████████ 4416.00 sectors vda1
28ccfe49
JD
250Disk nr_requests
251###############################################################################
e3de48f7 252████████████████████████████████████████████████████████████████████ 177.00 requests vda1
28ccfe49
JD
253Disk request time/sector
254###############################################################################
e3de48f7 255██████████████████████████████████████████████████████████████████ 0.01 ms vda1
28ccfe49
JD
256Network recv_bytes
257###############################################################################
e3de48f7 258███████████████████████████████████████████████████████ 739.50 KB eth0
0f582ce8 259█████ 80.27 KB lo
28ccfe49
JD
260Network sent_bytes
261###############################################################################
e3de48f7 262████████████████████████████████████████████████████████ 9.36 MB eth0
28ccfe49 263```
19952e0f 264
e3de48f7
JD
265### IRQ
266#### Handler duration and raise latency statistics
267```bash
fe109453 268$ ./lttng-irqstats mytrace/
e3de48f7
JD
269Timerange: [2014-03-11 16:05:41.314824752, 2014-03-11 16:05:45.041994298]
270Hard IRQ Duration (us)
0f582ce8 271 count min avg max stdev
e3de48f7 272----------------------------------------------------------------------------------|
0f582ce8
AB
2731: <i8042> 30 10.901 45.500 64.510 18.447 |
27442: <ahci> 259 3.203 7.863 21.426 3.183 |
27543: <eth0> 2 3.859 3.976 4.093 0.165 |
27644: <iwlwifi> 92 0.300 3.995 6.542 2.181 |
e3de48f7
JD
277
278Soft IRQ Duration (us) Raise latency (us)
0f582ce8 279 count min avg max stdev | count min avg max stdev
e3de48f7
JD
280----------------------------------------------------------------------------------|------------------------------------------------------------
2811: <TIMER_SOFTIRQ> 495 0.202 21.058 51.060 11.047 | 53 2.141 11.217 20.005 7.233
2823: <NET_RX_SOFTIRQ> 14 0.133 9.177 32.774 10.483 | 14 0.763 3.703 10.902 3.448
2834: <BLOCK_SOFTIRQ> 257 5.981 29.064 125.862 15.891 | 257 0.891 3.104 15.054 2.046
2846: <TASKLET_SOFTIRQ> 26 0.309 1.198 1.748 0.329 | 26 9.636 39.222 51.430 11.246
2857: <SCHED_SOFTIRQ> 299 1.185 14.768 90.465 15.992 | 298 1.286 31.387 61.700 11.866
2869: <RCU_SOFTIRQ> 338 0.592 3.387 13.745 1.356 | 147 2.480 29.299 64.453 14.286
287```
288
289#### Handler duration frequency distribution
290```bash
fe109453 291$ ./lttng-irqfreq --timerange [16:05:42,16:05:45] --irq 44 --stats mytrace/
0f582ce8 292Timerange: [2014-03-11 16:05:42.042034570, 2014-03-11 16:05:44.998914297]
e3de48f7 293Hard IRQ Duration (us)
0f582ce8 294 count min avg max stdev
e3de48f7 295----------------------------------------------------------------------------------|
0f582ce8 29644: <iwlwifi> 72 0.300 4.018 6.542 2.164 |
e3de48f7
JD
297Frequency distribution iwlwifi (44)
298###############################################################################
0f582ce8
AB
2990.300 █████ 1.00
3000.612 ██████████████████████████████████████████████████████████████ 12.00
3010.924 ████████████████████ 4.00
3021.236 ██████████ 2.00
3031.548 0.00
3041.861 █████ 1.00
3052.173 0.00
3062.485 █████ 1.00
3072.797 ██████████████████████████ 5.00
3083.109 █████ 1.00
3093.421 ███████████████ 3.00
3103.733 0.00
3114.045 █████ 1.00
3124.357 █████ 1.00
3134.669 ██████████ 2.00
3144.981 ██████████ 2.00
3155.294 █████████████████████████████████████████ 8.00
3165.606 ████████████████████████████████████████████████████████████████████ 13.00
3175.918 ██████████████████████████████████████████████████████████████ 12.00
3186.230 ███████████████ 3.00
e3de48f7
JD
319```
320
321### Others
322There are a lot of other scripts, we encourage you to try them and read the
323```--help``` to see all the available options.
324
173fa485 325## Work in progress
19952e0f
JD
326Track the page cache and extract the latencies associated with pages flush to disk.
327In order to do that, we rely on the assumption that the pages are flushed in a FIFO
328order. It might not be 100% accurate, but it already gives great results :
329
330An example here when saving a file in vim:
173fa485 331```
19952e0f
JD
332[19:57:51.173332284 - 19:57:51.177794657] vim (31517) syscall_entry_fsync(fd = 4 <blabla>) = 0, 4.462 ms
333 1 dirty page(s) were flushed (assuming FIFO):
334 vim (31517): 1 pages
335 - blabla : 1 pages
336 13 active dirty filesystem page(s) (known):
337 redis-server (2092): 2 pages
338 - /var/log/redis/redis-server.log : 2 pages
339 vim (31517): 2 pages
340 - .blabla.swp : 2 pages
341 lttng-consumerd (6750): 9 pages
342 - unknown (origin not found) : 9 pages
343```
344
345An other example when running the 'sync' command:
173fa485 346```
19952e0f
JD
347[19:57:53.046840755 - 19:57:53.072809609] sync (31554) syscall_entry_sync(fd = <unknown>) = 0, 25.969 ms
348 23 dirty page(s) were flushed (assuming FIFO):
349 redis-server (2092): 2 pages
350 - /var/log/redis/redis-server.log : 2 pages
351 vim (31517): 9 pages
352 - /home/julien/.viminfo.tmp : 6 pages
353 - .blabla.swp : 3 pages
354 lttng-consumerd (6750): 12 pages
355 - unknown (origin not found) : 12 pages
356```
357
c6bb95df 358PostgreSQL with 'sys_fdatasync':
19952e0f
JD
359```
360[13:49:39.908599447 - 13:49:39.915930730] postgres (1137) sys_fdatasync(fd = 7 </var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008>) = 0, 7.331 ms
361 2 pages allocated during the period
362 88 dirty page(s) were flushed (assuming FIFO):
363 postgres (1137): 88 pages
364 - /var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008 : 88 pages
365 68 last dirtied filesystem page(s):
366 postgres (2419): 68 pages
367 - base/11951/18410 : 46 pages
368 - base/11951/18407 : 10 pages
369 - base/11951/18407_fsm : 6 pages
370 - base/11951/18410_fsm : 6 pages
371```
372
898ff628 373Detecting a fight for the I/O between a huge write and postgresql:
19952e0f
JD
374```
375[13:49:47.242730583 - 13:49:47.442835037] python (2353) sys_write(fd = 3 </root/bla>, count = 102395904) = 102395904, 200.104 ms
376 34760 pages allocated during the period
377 woke up kswapd during the period
378 10046 pages written on disk
379 freed 33753 pages from the cache during the period
380 1397 last dirtied filesystem page(s):
381 python (2353): 1325 pages
382 - /root/bla : 1325 pages
383 postgres (2419): 72 pages
384 - base/11951/18419 : 72 pages
385```
386
e3de48f7
JD
387## Limitations
388
389The main limitation of this project is the fact that it can be quite slow to
390process a large trace. This project is a work in progress and we focus on the
0f582ce8
AB
391problem-solving aspect. Therefore, features have been prioritized over
392performance for now.
e3de48f7 393
898ff628
JG
394One other aspect is the fact that the state is not persistent; the trace has
395to be re-processed if another analysis script is to be used on the same trace.
396Some scripts belonging to the same category allow the combination of multiple
397analyses into a single pass (see ```--freq```, ```--log```, ```--usage```,
e3de48f7
JD
398```--latencystats```, etc). We are planning to add a way to save the state
399and/or create an interactive environment to allow the user to run multiple
400analyses on the same trace without having to process the trace every time.
401
402
19952e0f 403## Conclusion
898ff628
JG
404We hope you have fun trying this project and please remember it is a work in
405progress; feedback, bug reports and improvement ideas are always welcome!
This page took 0.397462 seconds and 5 git commands to generate.