Merge pull request #18 from abusque/decouple-cputop
[deliverable/lttng-analyses.git] / README.md
1 # lttng-analyses
2
3 This repository contains various scripts to extract monitoring data and metrics
4 from LTTng kernel traces.
5
6 As opposed to other diagnostic or monitoring solutions, this approach is designed
7 to allow users to record their system's activity with a low overhead, wait
8 for a problem to occur and then diagnose its cause offline.
9
10 This solution allows the user to target hard to find problems and dig until the
11 root cause is found.
12
13 This README describes the implemented analyses as well as how to use them.
14
15 ## Requirements
16 * LTTng >= 2.5
17 * Babeltrace >= 1.2 (with python bindings built)
18 * Python >= 3.4
19
20 ## Install
21 ### Ubuntu (12.04 and 14.04 at least)
22 ```bash
23 apt-get install -y software-properties-common (or python-software-properties on 12.04)
24 apt-add-repository -y ppa:lttng/ppa
25 apt-get update
26 apt-get -y install lttng-tools babeltrace lttng-modules-dkms python3-babeltrace python3-progressbar
27 git clone https://github.com/lttng/lttng-analyses.git
28 cd lttng-analyses
29 ./install.sh (or run from the local directory)
30 ```
31
32 ### Other distros
33 Please refer to the [LTTng documentation](http://lttng.org/download/) to
34 install LTTng and the [Babeltrace
35 README](http://git.efficios.com/?p=babeltrace.git;a=blob_plain;f=README;hb=HEAD)
36 to install ```babeltrace``` with the python bindings. Optionally install the
37 ```progressbar``` python module, and then:
38
39 ```bash
40 git clone https://github.com/lttng/lttng-analyses.git
41 cd lttng-analyses
42 ./install.sh (or run from the local directory)
43 ```
44
45 ## Trace creation
46 Here are the basic commands to create a trace, for more information on the
47 LTTng setup, please refer to the [LTTng
48 documentation](http://lttng.org/docs/#doc-tracing-the-linux-kernel)
49
50 ### Automatic
51 From the cloned git tree:
52 ```bash
53 ./lttng-analyses-record
54 ```
55
56 ### Manual
57 ```bash
58 lttng create
59 lttng enable-channel -k bla --subbuf-size=4M
60 lttng enable-event -k sched_switch,block_rq_complete,block_rq_issue,block_bio_remap,block_bio_backmerge,netif_receive_skb,net_dev_xmit,sched_process_fork,sched_process_exec,lttng_statedump_process_state,lttng_statedump_file_descriptor,lttng_statedump_block_device,writeback_pages_written,mm_vmscan_wakeup_kswapd,mm_page_free,mm_page_alloc,block_dirty_buffer,irq_handler_entry,irq_handler_exit,softirq_entry,softirq_exit,softirq_raise -c bla
61 lttng enable-event -k --syscall -a -c bla
62 lttng start
63 ..do stuff...
64 lttng stop
65 lttng destroy
66 ```
67
68 ### Remote trace creation
69 You can also create a trace on a server and send it to a remote host. The
70 remote host only needs to run `lttng-relayd -d` and be reachable over the network.
71 The only difference with the above commands is the tracing session's creation:
72 ```bash
73 lttng create -U net://<remote-host>
74 ```
75
76 ## Implemented analyses
77
78 * CPU usage for the whole system
79 * CPU usage per-process
80 * Process CPU migration count
81 * Memory usage per-process (as seen by the kernel)
82 * Memory usage system-wide (as seen by the kernel)
83 * I/O usage (syscalls, disk, network)
84 * I/O operations log (with latency and usage)
85 * I/O latency statistics (open, read, write, sync operations)
86 * I/O latency frequency distribution
87 * Interrupt handler duration statistics (count, min, max, average stdev)
88 * Interrupt handler duration top
89 * Interrupt handler duration log
90 * Interrupt handler duration frequency distribution
91 * SoftIRQ handler latency statistics
92 * Syscalls usage statistics
93
94 All of the analyses share the same code architecture making it possible
95 to filter by timerange, process name, PID, min and max values using the
96 same command-line options. Also note that reported timestamps can
97 optionally be expressed in the GMT timezone to allow easy sharing between
98 teams.
99
100 The project's architecture makes it easy to add new analyses or to reuse
101 the analysis backend in external tools which may then present the results
102 in their own format (as opposed to text).
103
104 ## Examples
105 After having collected your trace, any script contained in this repository
106 can be used to run an analysis. Read on for some examples!
107
108 ### I/O
109 #### I/O latency stats
110 ```bash
111 $ ./lttng-iolatencystats mytrace/
112 Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936]
113 Syscalls latency statistics (usec):
114 Type Count Min Average Max Stdev
115 -----------------------------------------------------------------------------------------
116 Open 45 5.562 13.835 77.683 15.263
117 Read 109 0.316 5.774 62.569 9.277
118 Write 101 0.256 7.060 48.531 8.555
119 Sync 207 19.384 40.664 160.188 21.201
120
121 Disk latency statistics (usec):
122 Name Count Min Average Max Stdev
123 -----------------------------------------------------------------------------------------
124 dm-0 108 0.001 0.004 0.007 1.306
125 ```
126
127 #### I/O latency frequency distribution
128 ```bash
129 $ ./lttng-iolatencyfreq mytrace/
130 Timerange: [2015-01-06 10:58:26.140545481, 2015-01-06 10:58:27.229358936]
131 Open latency distribution (usec)
132 ###############################################################################
133 5.562 ███████████████████████████████████████████████████████████████████ 25
134 9.168 ██████████ 4
135 12.774 █████████████████████ 8
136 16.380 ████████ 3
137 19.986 █████ 2
138 23.592 0
139 27.198 0
140 30.804 0
141 34.410 ██ 1
142 38.016 0
143 41.623 0
144 45.229 0
145 48.835 0
146 52.441 0
147 56.047 0
148 59.653 0
149 63.259 0
150 66.865 0
151 70.471 0
152 74.077 █████ 2
153 ```
154
155 #### I/O latency top
156 ```bash
157 $ ./lttng-iolatencytop analysis-20150115-120942/ --limit 3 --minsize 2
158 Checking the trace for lost events...
159 Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313]
160 Top open syscall latencies (usec)
161 Begin End Name Duration (usec) Size Proc PID Filename
162 [12:18:50.432950815,12:18:50.870648568] open 437697.753 N/A apache2 31517 /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
163 [12:18:52.946080165,12:18:52.946132278] open 52.113 N/A apache2 31588 /var/lib/php5/sess_mr9045p1k55vin1h0vg7rhgd63 (fd=13)
164 [12:18:46.800846035,12:18:46.800874916] open 28.881 N/A apache2 31591 /var/lib/php5/sess_r7c12pccfvjtas15g3j69u14h0 (fd=13)
165 [12:18:51.389797604,12:18:51.389824426] open 26.822 N/A apache2 31520 /var/lib/php5/sess_4sdb1rtjkhb78sabnoj8gpbl00 (fd=13)
166
167 Top read syscall latencies (usec)
168 Begin End Name Duration (usec) Size Proc PID Filename
169 [12:18:37.256073107,12:18:37.256555967] read 482.860 7.00 B bash 10237 unknown (origin not found) (fd=3)
170 [12:18:52.000209798,12:18:52.000252304] read 42.506 1.00 KB irqbalance 1337 /proc/interrupts (fd=3)
171 [12:18:37.256559439,12:18:37.256601615] read 42.176 5.00 B bash 10237 unknown (origin not found) (fd=3)
172 [12:18:42.000281918,12:18:42.000320016] read 38.098 1.00 KB irqbalance 1337 /proc/interrupts (fd=3)
173
174 Top write syscall latencies (usec)
175 Begin End Name Duration (usec) Size Proc PID Filename
176 [12:18:49.913241516,12:18:49.915908862] write 2667.346 95.00 B apache2 31584 /var/log/apache2/access.log (fd=8)
177 [12:18:37.472823631,12:18:37.472859836] writev 36.205 21.97 KB apache2 31544 unknown (origin not found) (fd=12)
178 [12:18:37.991578372,12:18:37.991612724] writev 34.352 21.97 KB apache2 31589 unknown (origin not found) (fd=12)
179 [12:18:39.547778549,12:18:39.547812515] writev 33.966 21.97 KB apache2 31584 unknown (origin not found) (fd=12)
180
181 Top sync syscall latencies (usec)
182 Begin End Name Duration (usec) Size Proc PID Filename
183 [12:18:50.162776739,12:18:51.157522361] sync 994745.622 N/A sync 22791 None (fd=None)
184 [12:18:37.227867532,12:18:37.232289687] sync_file_range 4422.155 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
185 [12:18:37.238076585,12:18:37.239012027] sync_file_range 935.442 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
186 [12:18:37.220974711,12:18:37.221647124] sync_file_range 672.413 N/A lttng-consumerd 19964 /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
187 ```
188
189 #### I/O operations log
190 ```bash
191 $ ./lttng-iolog mytrace/
192 [10:58:26.221618530,10:58:26.221620659] write 2.129 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
193 [10:58:26.221623609,10:58:26.221628055] read 4.446 50.00 B /usr/bin/x-term 11793 /dev/ptmx (fd=24)
194 [10:58:26.221638929,10:58:26.221640008] write 1.079 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
195 [10:58:26.221676232,10:58:26.221677385] read 1.153 8.00 B /usr/bin/x-term 11793 anon_inode:[eventfd] (fd=5)
196 [10:58:26.223401804,10:58:26.223411683] open 9.879 N/A sleep 12420 /etc/ld.so.cache (fd=3)
197 [10:58:26.223448060,10:58:26.223455577] open 7.517 N/A sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3)
198 [10:58:26.223456522,10:58:26.223458898] read 2.376 832.00 B sleep 12420 /lib/x86_64-linux-gnu/libc.so.6 (fd=3)
199 [10:58:26.223918068,10:58:26.223929316] open 11.248 N/A sleep 12420 (fd=3)
200 [10:58:26.231881565,10:58:26.231895970] writev 14.405 16.00 B /usr/bin/x-term 11793 socket:[45650] (fd=4)
201 [10:58:26.231979636,10:58:26.231988446] recvmsg 8.810 16.00 B Xorg 1827 socket:[47480] (fd=38)
202 ```
203
204
205 #### I/O usage top
206 ```bash
207 $ ./lttng-iousagetop traces/pgread-writes
208 Timerange: [2014-10-07 16:36:00.733214969, 2014-10-07 16:36:18.804584183]
209 Per-process I/O Read
210 ###############################################################################
211 ██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 4.00 B net 16.00 MB unknown
212 █████ 1.72 MB lttng-consumerd (2619) 0 B file 0 B net 1.72 MB unknown
213 █ 398.13 KB postgres (4219) 121.05 KB file 277.07 KB net 8.00 B unknown
214 256.09 KB postgres (1348) 0 B file 255.97 KB net 117.00 B unknown
215 204.81 KB postgres (4218) 204.81 KB file 0 B net 0 B unknown
216 123.77 KB postgres (4220) 117.50 KB file 6.26 KB net 8.00 B unknown
217 Per-process I/O Write
218 ###############################################################################
219 ██████████████████████████████████████████████████ 16.00 MB lttng-consumerd (2619) 0 B file 8.00 MB net 8.00 MB unknown
220 ██████ 2.20 MB postgres (4219) 2.00 MB file 202.23 KB net 0 B unknown
221 █████ 1.73 MB lttng-consumerd (2619) 0 B file 887.73 KB net 882.58 KB unknown
222 ██ 726.33 KB postgres (1165) 8.00 KB file 6.33 KB net 712.00 KB unknown
223 158.69 KB postgres (1168) 158.69 KB file 0 B net 0 B unknown
224 80.66 KB postgres (1348) 0 B file 80.66 KB net 0 B unknown
225 Files Read
226 ###############################################################################
227 ██████████████████████████████████████████████████ 8.00 MB anon_inode:[lttng_stream] (lttng-consumerd) 'fd 32 in lttng-consumerd (2619)'
228 █████ 834.41 KB base/16384/pg_internal.init 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
229 █ 256.09 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)'
230 █ 174.69 KB pg_stat_tmp/pgstat.stat 'fd 9 in postgres (4218)', 'fd 9 in postgres (1167)'
231 109.48 KB global/pg_internal.init 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
232 104.30 KB base/11951/pg_internal.init 'fd 7 in postgres (4218)'
233 12.85 KB socket (lttng-sessiond) 'fd 30 in lttng-sessiond (384)'
234 4.50 KB global/pg_filenode.map 'fd 7 in postgres (4218)', 'fd 7 in postgres (4219)', 'fd 7 in postgres (4220)', 'fd 7 in postgres (4221)', 'fd 7 in postgres (4222)', 'fd 7 in postgres (4223)', 'fd 7 in postgres (4224)', 'fd 7 in postgres (4225)', 'fd 7 in postgres (4226)'
235 4.16 KB socket (postgres) 'fd 9 in postgres (4226)'
236 4.00 KB /proc/interrupts 'fd 3 in irqbalance (1104)'
237 Files Write
238 ###############################################################################
239 ██████████████████████████████████████████████████ 8.00 MB socket:[56371] (lttng-consumerd) 'fd 30 in lttng-consumerd (2619)'
240 █████████████████████████████████████████████████ 8.00 MB pipe:[53306] (lttng-consumerd) 'fd 12 in lttng-consumerd (2619)'
241 ██████████ 1.76 MB pg_xlog/00000001000000000000000B 'fd 31 in postgres (4219)'
242 █████ 887.82 KB socket:[56369] (lttng-consumerd) 'fd 26 in lttng-consumerd (2619)'
243 █████ 882.58 KB pipe:[53309] (lttng-consumerd) 'fd 18 in lttng-consumerd (2619)'
244 160.00 KB /var/lib/postgresql/9.1/main/base/16384/16602 'fd 14 in postgres (1165)'
245 158.69 KB pg_stat_tmp/pgstat.tmp 'fd 3 in postgres (1168)'
246 144.00 KB /var/lib/postgresql/9.1/main/base/16384/16613 'fd 12 in postgres (1165)'
247 88.00 KB /var/lib/postgresql/9.1/main/base/16384/16609 'fd 11 in postgres (1165)'
248 78.28 KB socket:[8893] (postgres) 'fd 9 in postgres (1348)'
249 Block I/O Read
250 ###############################################################################
251 Block I/O Write
252 ###############################################################################
253 ██████████████████████████████████████████████████ 1.76 MB postgres (pid=4219)
254 ████ 160.00 KB postgres (pid=1168)
255 ██ 100.00 KB kworker/u8:0 (pid=1540)
256 ██ 96.00 KB jbd2/vda1-8 (pid=257)
257 █ 40.00 KB postgres (pid=1166)
258 8.00 KB kworker/u9:0 (pid=4197)
259 4.00 KB kworker/u9:2 (pid=1381)
260 Disk nr_sector
261 ###############################################################################
262 ███████████████████████████████████████████████████████████████████ 4416.00 sectors vda1
263 Disk nr_requests
264 ###############################################################################
265 ████████████████████████████████████████████████████████████████████ 177.00 requests vda1
266 Disk request time/sector
267 ###############################################################################
268 ██████████████████████████████████████████████████████████████████ 0.01 ms vda1
269 Network recv_bytes
270 ###############################################################################
271 ███████████████████████████████████████████████████████ 739.50 KB eth0
272 █████ 80.27 KB lo
273 Network sent_bytes
274 ###############################################################################
275 ████████████████████████████████████████████████████████ 9.36 MB eth0
276 ```
277
278 ### Syscalls
279 #### Statistics
280 ```bash
281 $ ./lttng-syscallstats mytrace/
282 Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313]
283 Per-TID syscalls statistics (usec)
284 find (22785) Count Min Average Max Stdev Return values
285 - getdents 14240 0.380 364.301 43372.450 1629.390 {'success': 14240}
286 - close 14236 0.233 0.506 4.932 0.217 {'success': 14236}
287 - fchdir 14231 0.252 0.407 5.769 0.117 {'success': 14231}
288 - open 7123 0.779 2.321 12.697 0.936 {'success': 7119, 'ENOENT': 4}
289 - newfstatat 7118 1.457 143.562 28103.532 1410.281 {'success': 7118}
290 - openat 7118 1.525 2.411 9.107 0.771 {'success': 7118}
291 - newfstat 7117 0.272 0.654 8.707 0.248 {'success': 7117}
292 - write 573 0.298 0.715 8.584 0.391 {'success': 573}
293 - brk 27 0.615 5.768 30.792 7.830 {'success': 27}
294 - rt_sigaction 22 0.227 0.283 0.589 0.098 {'success': 22}
295 - mmap 12 1.116 2.116 3.597 0.762 {'success': 12}
296 - mprotect 6 1.185 2.235 3.923 1.148 {'success': 6}
297 - read 5 0.925 2.101 6.300 2.351 {'success': 5}
298 - ioctl 4 0.342 1.151 2.280 0.873 {'success': 2, 'ENOTTY': 2}
299 - access 4 1.166 2.530 4.202 1.527 {'ENOENT': 4}
300 - rt_sigprocmask 3 0.325 0.570 0.979 0.357 {'success': 3}
301 - dup2 2 0.250 0.562 0.874 ? {'success': 2}
302 - munmap 2 3.006 5.399 7.792 ? {'success': 2}
303 - execve 1 7277.974 7277.974 7277.974 ? {'success': 1}
304 - setpgid 1 0.945 0.945 0.945 ? {'success': 1}
305 - fcntl 1 ? 0.000 0.000 ? {}
306 - newuname 1 1.240 1.240 1.240 ? {'success': 1}
307 Total: 71847
308 -----------------------------------------------------------------------------------------------------------------
309 apache2 (31517) Count Min Average Max Stdev Return values
310 - fcntl 192 ? 0.000 0.000 ? {}
311 - newfstat 156 0.237 0.484 1.102 0.222 {'success': 156}
312 - read 144 0.307 1.602 16.307 1.698 {'success': 117, 'EAGAIN': 27}
313 - access 96 0.705 1.580 3.364 0.670 {'success': 12, 'ENOENT': 84}
314 - newlstat 84 0.459 0.738 1.456 0.186 {'success': 63, 'ENOENT': 21}
315 - newstat 74 0.735 2.266 11.212 1.772 {'success': 50, 'ENOENT': 24}
316 - lseek 72 0.317 0.522 0.915 0.112 {'success': 72}
317 - close 39 0.471 0.615 0.867 0.069 {'success': 39}
318 - open 36 2.219 12162.689 437697.753 72948.868 {'success': 36}
319 - getcwd 28 0.287 0.701 1.331 0.277 {'success': 28}
320 - poll 27 1.080 1139.669 2851.163 856.723 {'success': 27}
321 - times 24 0.765 0.956 1.327 0.107 {'success': 24}
322 - setitimer 24 0.499 5.848 16.668 4.041 {'success': 24}
323 - write 24 5.467 6.784 16.827 2.459 {'success': 24}
324 - writev 24 10.241 17.645 29.817 5.116 {'success': 24}
325 - mmap 15 3.060 3.482 4.406 0.317 {'success': 15}
326 - munmap 15 2.944 3.502 4.154 0.427 {'success': 15}
327 - brk 12 0.738 4.579 13.795 4.437 {'success': 12}
328 - chdir 12 0.989 1.600 2.353 0.385 {'success': 12}
329 - flock 6 0.906 1.282 2.043 0.423 {'success': 6}
330 - rt_sigaction 6 0.530 0.725 1.123 0.217 {'success': 6}
331 - pwrite64 6 1.262 1.430 1.692 0.143 {'success': 6}
332 - rt_sigprocmask 6 0.539 0.650 0.976 0.162 {'success': 6}
333 - shutdown 3 7.323 8.487 10.281 1.576 {'success': 3}
334 - getsockname 3 1.015 1.228 1.585 0.311 {'success': 3}
335 - accept4 3 5174453.611 3450157.282 5176018.235 ? {'success': 2}
336 Total: 1131
337 ```
338
339 ### IRQ
340 #### Handler duration and raise latency statistics
341 ```bash
342 $ ./lttng-irqstats mytrace/
343 Timerange: [2014-03-11 16:05:41.314824752, 2014-03-11 16:05:45.041994298]
344 Hard IRQ Duration (us)
345 count min avg max stdev
346 ----------------------------------------------------------------------------------|
347 1: <i8042> 30 10.901 45.500 64.510 18.447 |
348 42: <ahci> 259 3.203 7.863 21.426 3.183 |
349 43: <eth0> 2 3.859 3.976 4.093 0.165 |
350 44: <iwlwifi> 92 0.300 3.995 6.542 2.181 |
351
352 Soft IRQ Duration (us) Raise latency (us)
353 count min avg max stdev | count min avg max stdev
354 ----------------------------------------------------------------------------------|------------------------------------------------------------
355 1: <TIMER_SOFTIRQ> 495 0.202 21.058 51.060 11.047 | 53 2.141 11.217 20.005 7.233
356 3: <NET_RX_SOFTIRQ> 14 0.133 9.177 32.774 10.483 | 14 0.763 3.703 10.902 3.448
357 4: <BLOCK_SOFTIRQ> 257 5.981 29.064 125.862 15.891 | 257 0.891 3.104 15.054 2.046
358 6: <TASKLET_SOFTIRQ> 26 0.309 1.198 1.748 0.329 | 26 9.636 39.222 51.430 11.246
359 7: <SCHED_SOFTIRQ> 299 1.185 14.768 90.465 15.992 | 298 1.286 31.387 61.700 11.866
360 9: <RCU_SOFTIRQ> 338 0.592 3.387 13.745 1.356 | 147 2.480 29.299 64.453 14.286
361 ```
362
363 #### Handler duration frequency distribution
364 ```bash
365 $ ./lttng-irqfreq --timerange [16:05:42,16:05:45] --irq 44 --stats mytrace/
366 Timerange: [2014-03-11 16:05:42.042034570, 2014-03-11 16:05:44.998914297]
367 Hard IRQ Duration (us)
368 count min avg max stdev
369 ----------------------------------------------------------------------------------|
370 44: <iwlwifi> 72 0.300 4.018 6.542 2.164 |
371 Frequency distribution iwlwifi (44)
372 ###############################################################################
373 0.300 █████ 1.00
374 0.612 ██████████████████████████████████████████████████████████████ 12.00
375 0.924 ████████████████████ 4.00
376 1.236 ██████████ 2.00
377 1.548 0.00
378 1.861 █████ 1.00
379 2.173 0.00
380 2.485 █████ 1.00
381 2.797 ██████████████████████████ 5.00
382 3.109 █████ 1.00
383 3.421 ███████████████ 3.00
384 3.733 0.00
385 4.045 █████ 1.00
386 4.357 █████ 1.00
387 4.669 ██████████ 2.00
388 4.981 ██████████ 2.00
389 5.294 █████████████████████████████████████████ 8.00
390 5.606 ████████████████████████████████████████████████████████████████████ 13.00
391 5.918 ██████████████████████████████████████████████████████████████ 12.00
392 6.230 ███████████████ 3.00
393 ```
394
395 ### Others
396 There are a lot of other scripts, we encourage you to try them and read the
397 ```--help``` to see all the available options.
398
399 ## Work in progress
400 Track the page cache and extract the latencies associated with pages flush to disk.
401 In order to do that, we rely on the assumption that the pages are flushed in a FIFO
402 order. It might not be 100% accurate, but it already gives great results :
403
404 An example here when saving a file in vim:
405 ```
406 [19:57:51.173332284 - 19:57:51.177794657] vim (31517) syscall_entry_fsync(fd = 4 <blabla>) = 0, 4.462 ms
407 1 dirty page(s) were flushed (assuming FIFO):
408 vim (31517): 1 pages
409 - blabla : 1 pages
410 13 active dirty filesystem page(s) (known):
411 redis-server (2092): 2 pages
412 - /var/log/redis/redis-server.log : 2 pages
413 vim (31517): 2 pages
414 - .blabla.swp : 2 pages
415 lttng-consumerd (6750): 9 pages
416 - unknown (origin not found) : 9 pages
417 ```
418
419 An other example when running the 'sync' command:
420 ```
421 [19:57:53.046840755 - 19:57:53.072809609] sync (31554) syscall_entry_sync(fd = <unknown>) = 0, 25.969 ms
422 23 dirty page(s) were flushed (assuming FIFO):
423 redis-server (2092): 2 pages
424 - /var/log/redis/redis-server.log : 2 pages
425 vim (31517): 9 pages
426 - /home/julien/.viminfo.tmp : 6 pages
427 - .blabla.swp : 3 pages
428 lttng-consumerd (6750): 12 pages
429 - unknown (origin not found) : 12 pages
430 ```
431
432 PostgreSQL with 'sys_fdatasync':
433 ```
434 [13:49:39.908599447 - 13:49:39.915930730] postgres (1137) sys_fdatasync(fd = 7 </var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008>) = 0, 7.331 ms
435 2 pages allocated during the period
436 88 dirty page(s) were flushed (assuming FIFO):
437 postgres (1137): 88 pages
438 - /var/lib/postgresql/9.1/main/pg_xlog/000000010000000000000008 : 88 pages
439 68 last dirtied filesystem page(s):
440 postgres (2419): 68 pages
441 - base/11951/18410 : 46 pages
442 - base/11951/18407 : 10 pages
443 - base/11951/18407_fsm : 6 pages
444 - base/11951/18410_fsm : 6 pages
445 ```
446
447 Detecting a fight for the I/O between a huge write and postgresql:
448 ```
449 [13:49:47.242730583 - 13:49:47.442835037] python (2353) sys_write(fd = 3 </root/bla>, count = 102395904) = 102395904, 200.104 ms
450 34760 pages allocated during the period
451 woke up kswapd during the period
452 10046 pages written on disk
453 freed 33753 pages from the cache during the period
454 1397 last dirtied filesystem page(s):
455 python (2353): 1325 pages
456 - /root/bla : 1325 pages
457 postgres (2419): 72 pages
458 - base/11951/18419 : 72 pages
459 ```
460
461 ## Limitations
462
463 The main limitation of this project is the fact that it can be quite slow to
464 process a large trace. This project is a work in progress and we focus on the
465 problem-solving aspect. Therefore, features have been prioritized over
466 performance for now.
467
468 One other aspect is the fact that the state is not persistent; the trace has
469 to be re-processed if another analysis script is to be used on the same trace.
470 Some scripts belonging to the same category allow the combination of multiple
471 analyses into a single pass (see ```--freq```, ```--log```, ```--usage```,
472 ```--latencystats```, etc). We are planning to add a way to save the state
473 and/or create an interactive environment to allow the user to run multiple
474 analyses on the same trace without having to process the trace every time.
475
476
477 ## Conclusion
478 We hope you have fun trying this project and please remember it is a work in
479 progress; feedback, bug reports and improvement ideas are always welcome!
This page took 0.052626 seconds and 5 git commands to generate.