1 Demonstrations of fileslower, the Linux eBPF/bcc version. 2 3 4 fileslower shows file-based synchronous reads and writes slower than a 5 threshold. For example: 6 7 # ./fileslower 8 Tracing sync read/writes slower than 10 ms 9 TIME(s) COMM PID D BYTES LAT(ms) FILENAME 10 0.000 randread.pl 4762 R 8192 12.70 data1 11 8.850 randread.pl 4762 R 8192 11.26 data1 12 12.852 randread.pl 4762 R 8192 10.43 data1 13 14 This showed a few reads from a "randread.pl" program, each 8 Kbytes in size, 15 and from a "data1" file. These all had over 10 ms latency. 16 17 This "latency" is measured from when the read or write was issued at the VFS 18 interface, to when it completed. This spans everything: block device I/O (disk 19 I/O), file system CPU cycles, file system locks, run queue latency, etc. This 20 is a better measure of the latency suffered by applications reading from the 21 file system than measuring this down at the block device interface. 22 23 Note that this only traces file reads and writes: other file system operations 24 (eg, directory operations, open(), fflush()) are not traced. 25 26 27 The threshold can be provided as an argument. Eg, I/O slower than 1 ms: 28 29 # ./fileslower 1 30 Tracing sync read/writes slower than 1 ms 31 TIME(s) COMM PID D BYTES LAT(ms) FILENAME 32 0.000 randread.pl 6925 R 8192 1.06 data1 33 0.082 randread.pl 6925 R 8192 2.42 data1 34 0.116 randread.pl 6925 R 8192 1.78 data1 35 0.153 randread.pl 6925 R 8192 2.31 data1 36 0.330 randread.pl 6925 R 8192 1.14 data1 37 0.345 randread.pl 6925 R 8192 1.52 data1 38 0.359 randread.pl 6925 R 8192 1.04 data1 39 0.532 randread.pl 6925 R 8192 2.56 data1 40 0.609 supervise 1892 W 18 3.65 status.new 41 0.610 randread.pl 6925 R 8192 1.37 data1 42 0.614 randread.pl 6925 R 8192 3.04 data1 43 0.729 randread.pl 6925 R 8192 2.90 data1 44 0.755 randread.pl 6925 R 8192 1.12 data1 45 0.762 randread.pl 6925 R 8192 2.62 data1 46 0.771 randread.pl 6925 R 8192 1.07 data1 47 0.816 randread.pl 6925 R 8192 10.50 data1 48 0.983 randread.pl 6925 R 8192 1.73 data1 49 0.989 randread.pl 6925 R 8192 2.12 data1 50 0.992 randread.pl 6925 R 8192 2.17 data1 51 1.001 randread.pl 6925 R 8192 1.93 data1 52 1.007 randread.pl 6925 R 8192 2.03 data1 53 1.210 randread.pl 6925 R 8192 1.82 data1 54 1.213 randread.pl 6925 R 8192 2.58 data1 55 1.219 randread.pl 6925 R 8192 2.20 data1 56 1.430 randread.pl 6925 R 8192 1.01 data1 57 1.448 randread.pl 6925 R 8192 2.22 data1 58 [...] 59 60 There's now much more output (this spans only 1.4 seconds, the previous output 61 spanned 12 seconds), and the lower threshold is catching more I/O. 62 63 64 In the following example, the file system caches were dropped before running 65 fileslower, and then in another session a "man ls" was executed. The command 66 and files read from disk can be seen: 67 68 # echo 3 > /proc/sys/vm/drop_caches; ./fileslower 1 69 Tracing sync read/writes slower than 1 ms 70 TIME(s) COMM PID D BYTES LAT(ms) FILENAME 71 0.000 bash 9647 R 128 5.83 man 72 0.050 man 9647 R 832 19.52 libmandb-2.6.7.1.so 73 0.066 man 9647 R 832 15.79 libman-2.6.7.1.so 74 0.123 man 9647 R 832 56.36 libpipeline.so.1.3.0 75 0.135 man 9647 R 832 9.79 libgdbm.so.3.0.0 76 0.323 man 9647 R 4096 59.52 locale.alias 77 0.540 man 9648 R 8192 11.11 ls.1.gz 78 0.558 man 9647 R 72 6.97 index.db 79 0.563 man 9647 R 4096 5.12 index.db 80 0.723 man 9658 R 128 12.06 less 81 0.725 man 9656 R 128 14.52 nroff 82 0.779 man 9655 R 128 68.86 tbl 83 0.814 nroff 9660 R 128 14.55 locale 84 0.830 pager 9658 R 4096 28.27 .lesshst 85 0.866 man 9654 R 128 163.12 preconv 86 0.980 nroff 9684 R 128 13.80 groff 87 0.999 groff 9684 R 4096 14.29 DESC 88 1.036 groff 9685 R 128 5.94 troff 89 1.038 groff 9686 R 128 7.76 grotty 90 1.065 troff 9685 R 4096 6.33 R 91 1.082 troff 9685 R 4096 10.52 BI 92 1.096 troff 9685 R 4096 8.70 troffrc 93 1.176 troff 9685 R 4096 80.12 composite.tmac 94 1.195 troff 9685 R 4096 19.20 fallbacks.tmac 95 1.202 troff 9685 R 4096 6.79 tty.tmac 96 1.221 troff 9685 R 4096 7.87 man.local 97 2.977 supervise 1876 W 18 4.23 status.new 98 99 This caught an individual I/O reaching 163.12 ms, for the "preconv" file. While 100 the file system cache was flushed, causing these to need to be read from disk, 101 the duration here may not be entirely disk I/O: it can include file system 102 locks, run queue latency, etc. These can be explored using other commands. 103 104 105 USAGE message: 106 107 # ./fileslower -h 108 usage: fileslower.py [-h] [-p PID] [-a] [min_ms] 109 110 Trace slow synchronous file reads and writes 111 112 positional arguments: 113 min_ms minimum I/O duration to trace, in ms (default 10) 114 115 optional arguments: 116 -h, --help show this help message and exit 117 -p PID, --pid PID trace this PID only 118 -a, --all-files include non-regular file types 119 120 examples: 121 ./fileslower # trace sync file I/O slower than 10 ms (default) 122 ./fileslower 1 # trace sync file I/O slower than 1 ms 123 ./fileslower -p 185 # trace PID 185 only 124