Home | History | Annotate | Download | only in tools
      1 Demonstrations of zfsslower, the Linux eBPF/bcc version.
      2 
      3 
      4 zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold.
      5 It has been written to work on ZFS on Linux (http://zfsonlinux.org). For
      6 example:
      7 
      8 # ./zfsslower 
      9 Tracing ZFS operations slower than 10 ms
     10 TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
     11 06:31:28 dd             25570  W 131072  38784     303.92 data1
     12 06:31:34 dd             25686  W 131072  38784     388.28 data1
     13 06:31:35 dd             25686  W 131072  78720     519.66 data1
     14 06:31:35 dd             25686  W 131072  116992    405.94 data1
     15 06:31:35 dd             25686  W 131072  153600    433.52 data1
     16 06:31:36 dd             25686  W 131072  188672    314.37 data1
     17 06:31:36 dd             25686  W 131072  222336    372.33 data1
     18 06:31:36 dd             25686  W 131072  254592    309.59 data1
     19 06:31:37 dd             25686  W 131072  285440    304.52 data1
     20 06:31:37 dd             25686  W 131072  315008    236.45 data1
     21 06:31:37 dd             25686  W 131072  343424    193.54 data1
     22 06:31:38 dd             25686  W 131072  370560    286.07 data1
     23 06:31:38 dd             25686  W 131072  396672    251.92 data1
     24 [...]
     25 
     26 This shows writes to a "data1" file, each taking well over the 10 ms threshold.
     27 the slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte
     28 write by the "dd" command.
     29 
     30 This "latency" is measured from when the operation was issued from the VFS
     31 interface to the file system (via the ZFS POSIX layer), to when it completed.
     32 This spans everything: block device I/O (disk I/O), file system CPU cycles,
     33 file system locks, run queue latency, etc. This is a better measure of the
     34 latency suffered by applications reading from the file system than measuring
     35 this down at the block device interface.
     36 
     37 Note that this only traces the common file system operations previously
     38 listed: other file system operations (eg, inode operations including
     39 getattr()) are not traced.
     40 
     41 
     42 A threshold of 0 will trace all operations. Warning: the output will be
     43 verbose, as it will include all file system cache hits.
     44 
     45 # ./zfsslower 0
     46 Tracing ZFS operations
     47 TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
     48 06:36:07 dd             32242  O 0       0           0.01 data1
     49 06:36:07 dd             32242  W 131072  0           0.25 data1
     50 06:36:07 dd             32242  W 131072  128         0.03 data1
     51 06:36:07 dd             32242  W 131072  256         0.04 data1
     52 06:36:07 dd             32242  W 131072  384         0.04 data1
     53 06:36:07 dd             32242  W 131072  512         0.04 data1
     54 06:36:07 dd             32242  W 131072  640         0.03 data1
     55 06:36:07 dd             32242  W 131072  768         0.03 data1
     56 06:36:07 dd             32242  W 131072  896         0.04 data1
     57 06:36:07 dd             32242  W 131072  1024        0.28 data1
     58 06:36:07 dd             32242  W 131072  1152        0.04 data1
     59 06:36:07 dd             32242  W 131072  1280        0.03 data1
     60 [...]
     61 06:36:07 dd             32242  W 131072  13824       0.04 data1
     62 06:36:07 dd             32242  W 131072  13952       0.04 data1
     63 06:36:07 dd             32242  W 131072  14080       0.04 data1
     64 06:36:07 dd             32242  W 131072  14208     398.92 data1
     65 06:36:07 dd             32242  W 131072  14336       0.04 data1
     66 06:36:07 dd             32242  W 131072  14464       0.04 data1
     67 06:36:07 dd             32242  W 131072  15104       0.03 data1
     68 [...]
     69 
     70 The output now includes the open operation for this file ("O"), and then the
     71 writes. Most of the writes are very fast, with only an occasional outlier that
     72 is in the hundreds of milliseconds.
     73 
     74 Fortunately this is not a real world environment: I setup a zpool on top of a
     75 XFS file system for testing purposes. More debugging using other tools will
     76 explain these outliers: possibly XFS flushing.
     77 
     78 
     79 Here's a random read workload, and showing operations slower than 1 ms:
     80 
     81 # ./zfsslower 1
     82 Tracing ZFS operations slower than 1 ms
     83 TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
     84 06:47:30 randread.pl    15431  R 8192    97840       1.03 data1
     85 06:47:30 randread.pl    15431  R 8192    416744      1.12 data1
     86 06:47:31 randread.pl    15431  R 8192    228856      1.96 data1
     87 06:47:31 randread.pl    15431  R 8192    452248      1.02 data1
     88 06:47:31 randread.pl    15431  R 8192    315288      5.90 data1
     89 06:47:31 randread.pl    15431  R 8192    752696      1.20 data1
     90 06:47:31 randread.pl    15431  R 8192    481832      1.39 data1
     91 06:47:31 randread.pl    15431  R 8192    673752      1.39 data1
     92 06:47:31 randread.pl    15431  R 8192    691736      1.01 data1
     93 06:47:31 randread.pl    15431  R 8192    694776      1.78 data1
     94 06:47:31 randread.pl    15431  R 8192    403328      3.75 data1
     95 06:47:31 randread.pl    15431  R 8192    567688      1.08 data1
     96 06:47:31 randread.pl    15431  R 8192    694280      1.31 data1
     97 06:47:31 randread.pl    15431  R 8192    669280      1.06 data1
     98 06:47:31 randread.pl    15431  R 8192    426608      1.56 data1
     99 06:47:31 randread.pl    15431  R 8192    42512       1.01 data1
    100 06:47:31 randread.pl    15431  R 8192    22944       1.33 data1
    101 06:47:31 randread.pl    15431  R 8192    427432      1.48 data1
    102 06:47:31 randread.pl    15431  R 8192    261320      1.28 data1
    103 06:47:31 randread.pl    15431  R 8192    132248      1.23 data1
    104 06:47:31 randread.pl    15431  R 8192    96936       1.04 data1
    105 06:47:31 randread.pl    15431  R 8192    482800      2.63 data1
    106 [...]
    107 
    108 
    109 A -j option will print just the fields (parsable output, csv):
    110 
    111 # ./zfsslower -j 1
    112 ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
    113 252305490911,randread.pl,17922,R,8192,163446784,1156,data1
    114 252305493852,randread.pl,17922,R,8192,321437696,1129,data1
    115 252305498839,randread.pl,17922,R,8192,475152384,1154,data1
    116 252305505515,randread.pl,17922,R,8192,49094656,1082,data1
    117 252305506774,randread.pl,17922,R,8192,470401024,1245,data1
    118 252305509265,randread.pl,17922,R,8192,553246720,2412,data1
    119 252305512365,randread.pl,17922,R,8192,20963328,1093,data1
    120 252305513755,randread.pl,17922,R,8192,304111616,1350,data1
    121 252305583330,randread.pl,17922,R,8192,166174720,1154,data1
    122 252305593913,randread.pl,17922,R,8192,175079424,1241,data1
    123 252305602833,randread.pl,17922,R,8192,305340416,3307,data1
    124 252305608663,randread.pl,17922,R,8192,655958016,2704,data1
    125 252305611212,randread.pl,17922,R,8192,40951808,1033,data1
    126 252305614609,randread.pl,17922,R,8192,318922752,2687,data1
    127 252305623800,randread.pl,17922,R,8192,246734848,2983,data1
    128 252305711125,randread.pl,17922,R,8192,581795840,1091,data1
    129 252305728694,randread.pl,17922,R,8192,710483968,1034,data1
    130 252305762046,randread.pl,17922,R,8192,329367552,1405,data1
    131 252305798215,randread.pl,17922,R,8192,44482560,1030,data1
    132 252305806748,randread.pl,17922,R,8192,660602880,1069,data1
    133 252305826360,randread.pl,17922,R,8192,616144896,2327,data1
    134 [...]
    135 
    136 
    137 USAGE message:
    138 
    139 # ./zfsslower -h
    140 usage: zfsslower [-h] [-j] [-p PID] [min_ms]
    141 
    142 Trace common ZFS file operations slower than a threshold
    143 
    144 positional arguments:
    145   min_ms             minimum I/O duration to trace, in ms (default 10)
    146 
    147 optional arguments:
    148   -h, --help         show this help message and exit
    149   -j, --csv          just print fields: comma-separated values
    150   -p PID, --pid PID  trace this PID only
    151 
    152 examples:
    153     ./zfsslower             # trace operations slower than 10 ms (default)
    154     ./zfsslower 1           # trace operations slower than 1 ms
    155     ./zfsslower -j 1        # ... 1 ms, parsable output (csv)
    156     ./zfsslower 0           # trace all operations (warning: verbose)
    157     ./zfsslower -p 185      # trace PID 185 only
    158