Demonstrations of zfsslower, the Linux eBPF/bcc version.


zfsslower shows ZFS reads, writes, opens, and fsyncs slower than a threshold.
It has been written to work on ZFS on Linux (http://zfsonlinux.org). For
example:

# ./zfsslower
Tracing ZFS operations slower than 10 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
06:31:28 dd             25570  W 131072  38784     303.92 data1
06:31:34 dd             25686  W 131072  38784     388.28 data1
06:31:35 dd             25686  W 131072  78720     519.66 data1
06:31:35 dd             25686  W 131072  116992    405.94 data1
06:31:35 dd             25686  W 131072  153600    433.52 data1
06:31:36 dd             25686  W 131072  188672    314.37 data1
06:31:36 dd             25686  W 131072  222336    372.33 data1
06:31:36 dd             25686  W 131072  254592    309.59 data1
06:31:37 dd             25686  W 131072  285440    304.52 data1
06:31:37 dd             25686  W 131072  315008    236.45 data1
06:31:37 dd             25686  W 131072  343424    193.54 data1
06:31:38 dd             25686  W 131072  370560    286.07 data1
06:31:38 dd             25686  W 131072  396672    251.92 data1
[...]

This shows writes to a "data1" file, each taking well over the 10 ms
threshold. The slowest, on the 3rd line of output, reached 519.66 ms for a
128 Kbyte write by the "dd" command.

This latency is measured from when the operation was issued from the VFS
interface to the file system (via the ZFS POSIX layer), to when it completed.
This spans everything: block device I/O (disk I/O), file system CPU cycles,
file system locks, run queue latency, etc. This is a better measure of the
latency suffered by applications reading from the file system than measuring
latency down at the block device interface. A minimal sketch of how this
timing is performed is included after the next example.

Note that this only traces the common file system operations previously
listed: other file system operations (eg, inode operations including
getattr()) are not traced.


A threshold of 0 will trace all operations. Warning: the output will be
verbose, as it will include all file system cache hits.

# ./zfsslower 0
Tracing ZFS operations
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
06:36:07 dd             32242  O 0       0           0.01 data1
06:36:07 dd             32242  W 131072  0           0.25 data1
06:36:07 dd             32242  W 131072  128         0.03 data1
06:36:07 dd             32242  W 131072  256         0.04 data1
06:36:07 dd             32242  W 131072  384         0.04 data1
06:36:07 dd             32242  W 131072  512         0.04 data1
06:36:07 dd             32242  W 131072  640         0.03 data1
06:36:07 dd             32242  W 131072  768         0.03 data1
06:36:07 dd             32242  W 131072  896         0.04 data1
06:36:07 dd             32242  W 131072  1024        0.28 data1
06:36:07 dd             32242  W 131072  1152        0.04 data1
06:36:07 dd             32242  W 131072  1280        0.03 data1
[...]
06:36:07 dd             32242  W 131072  13824       0.04 data1
06:36:07 dd             32242  W 131072  13952       0.04 data1
06:36:07 dd             32242  W 131072  14080       0.04 data1
06:36:07 dd             32242  W 131072  14208     398.92 data1
06:36:07 dd             32242  W 131072  14336       0.04 data1
06:36:07 dd             32242  W 131072  14464       0.04 data1
06:36:07 dd             32242  W 131072  15104       0.03 data1
[...]

The output now includes the open operation for this file ("O"), and then the
writes. Most of the writes are very fast, with only occasional outliers in
the hundreds of milliseconds.

Fortunately this is not a real-world environment: I set up a zpool on top of
an XFS file system for testing purposes. More debugging using other tools
would be needed to explain these outliers: possibly XFS flushing.
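
zfsslower makes the timing measurement mentioned above by instrumenting the
ZFS POSIX layer (zpl_*) entry points with kprobes and kretprobes:
timestamping on entry, and computing the delta on return. As a simplified
illustration of that approach (a sketch only, not the tool's actual source;
it assumes your ZFS build still exports a zpl_read() kernel symbol, which
newer ZFS releases have renamed), a minimal bcc program to time zpl_read()
might look like this:

#!/usr/bin/python
# zplread_sketch.py: print zpl_read() calls slower than 10 ms (sketch).
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32);

int trace_entry(struct pt_regs *ctx)
{
    u32 tid = bpf_get_current_pid_tgid();    // lower 32 bits: thread ID
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);                 // stash the entry timestamp
    return 0;
}

int trace_return(struct pt_regs *ctx)
{
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                            // missed the entry probe
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    start.delete(&tid);
    if (delta_us > 10000)                    // 10 ms threshold
        bpf_trace_printk("slow zpl_read: %llu us\\n", delta_us);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="zpl_read", fn_name="trace_entry")
b.attach_kretprobe(event="zpl_read", fn_name="trace_return")
print("Tracing zpl_read() slower than 10 ms... Hit Ctrl-C to end.")
b.trace_print()

The real tool instruments writes, opens, and fsyncs as well, and emits
events to user space with the details shown in the columns above (PID,
bytes, offset, filename) rather than using bpf_trace_printk().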


Here's a random read workload, showing operations slower than 1 ms (a simple
Python stand-in for the randread.pl load generator is sketched at the end of
this file):

# ./zfsslower 1
Tracing ZFS operations slower than 1 ms
TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
06:47:30 randread.pl    15431  R 8192    97840       1.03 data1
06:47:30 randread.pl    15431  R 8192    416744      1.12 data1
06:47:31 randread.pl    15431  R 8192    228856      1.96 data1
06:47:31 randread.pl    15431  R 8192    452248      1.02 data1
06:47:31 randread.pl    15431  R 8192    315288      5.90 data1
06:47:31 randread.pl    15431  R 8192    752696      1.20 data1
06:47:31 randread.pl    15431  R 8192    481832      1.39 data1
06:47:31 randread.pl    15431  R 8192    673752      1.39 data1
06:47:31 randread.pl    15431  R 8192    691736      1.01 data1
06:47:31 randread.pl    15431  R 8192    694776      1.78 data1
06:47:31 randread.pl    15431  R 8192    403328      3.75 data1
06:47:31 randread.pl    15431  R 8192    567688      1.08 data1
06:47:31 randread.pl    15431  R 8192    694280      1.31 data1
06:47:31 randread.pl    15431  R 8192    669280      1.06 data1
06:47:31 randread.pl    15431  R 8192    426608      1.56 data1
06:47:31 randread.pl    15431  R 8192    42512       1.01 data1
06:47:31 randread.pl    15431  R 8192    22944       1.33 data1
06:47:31 randread.pl    15431  R 8192    427432      1.48 data1
06:47:31 randread.pl    15431  R 8192    261320      1.28 data1
06:47:31 randread.pl    15431  R 8192    132248      1.23 data1
06:47:31 randread.pl    15431  R 8192    96936       1.04 data1
06:47:31 randread.pl    15431  R 8192    482800      2.63 data1
[...]


A -j option will print just the fields (parsable output, csv). Offsets are
in bytes and times in microseconds, as the column suffixes indicate:

# ./zfsslower -j 1
ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
252305490911,randread.pl,17922,R,8192,163446784,1156,data1
252305493852,randread.pl,17922,R,8192,321437696,1129,data1
252305498839,randread.pl,17922,R,8192,475152384,1154,data1
252305505515,randread.pl,17922,R,8192,49094656,1082,data1
252305506774,randread.pl,17922,R,8192,470401024,1245,data1
252305509265,randread.pl,17922,R,8192,553246720,2412,data1
252305512365,randread.pl,17922,R,8192,20963328,1093,data1
252305513755,randread.pl,17922,R,8192,304111616,1350,data1
252305583330,randread.pl,17922,R,8192,166174720,1154,data1
252305593913,randread.pl,17922,R,8192,175079424,1241,data1
252305602833,randread.pl,17922,R,8192,305340416,3307,data1
252305608663,randread.pl,17922,R,8192,655958016,2704,data1
252305611212,randread.pl,17922,R,8192,40951808,1033,data1
252305614609,randread.pl,17922,R,8192,318922752,2687,data1
252305623800,randread.pl,17922,R,8192,246734848,2983,data1
252305711125,randread.pl,17922,R,8192,581795840,1091,data1
252305728694,randread.pl,17922,R,8192,710483968,1034,data1
252305762046,randread.pl,17922,R,8192,329367552,1405,data1
252305798215,randread.pl,17922,R,8192,44482560,1030,data1
252305806748,randread.pl,17922,R,8192,660602880,1069,data1
252305826360,randread.pl,17922,R,8192,616144896,2327,data1
[...]
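
Since this output is machine readable, it is easy to post-process. As a
hypothetical example (this helper is not part of bcc), a few lines of Python
can summarize the latency distribution of the traced operations:

#!/usr/bin/python
# csv_summary.py: summarize zfsslower -j output read from stdin (sketch).
# Usage: ./zfsslower -j 1 > out.csv, then: ./csv_summary.py < out.csv
import csv
import sys

# DictReader takes the field names from the header row printed by -j
lat_us = sorted(int(row["LATENCY_us"]) for row in csv.DictReader(sys.stdin))
if lat_us:
    n = len(lat_us)
    print("events: %d" % n)
    print("p50:    %d us" % lat_us[n // 2])
    print("p99:    %d us" % lat_us[min(n - 1, int(n * 0.99))])
    print("max:    %d us" % lat_us[-1])
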

USAGE message:

# ./zfsslower -h
usage: zfsslower [-h] [-j] [-p PID] [min_ms]

Trace common ZFS file operations slower than a threshold

positional arguments:
  min_ms             minimum I/O duration to trace, in ms (default 10)

optional arguments:
  -h, --help         show this help message and exit
  -j, --csv          just print fields: comma-separated values
  -p PID, --pid PID  trace this PID only

examples:
    ./zfsslower             # trace operations slower than 10 ms (default)
    ./zfsslower 1           # trace operations slower than 1 ms
    ./zfsslower -j 1        # ... 1 ms, parsable output (csv)
    ./zfsslower 0           # trace all operations (warning: verbose)
    ./zfsslower -p 185      # trace PID 185 only
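
The randread.pl workload used in the examples above is just a program that
issues small reads at random offsets in a loop. A rough Python 3 equivalent
(hypothetical; not shipped with bcc) that could be used to reproduce a
similar load:

#!/usr/bin/python3
# randread.py: issue 8 KB reads at random offsets forever (Ctrl-C to stop).
# Usage: ./randread.py /pool/data1
import os
import random
import sys

fd = os.open(sys.argv[1], os.O_RDONLY)
size = os.fstat(fd).st_size
try:
    while True:
        # pick a random offset and read 8 KB from it
        os.pread(fd, 8192, random.randrange(0, max(1, size - 8192)))
finally:
    os.close(fd)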