Demonstrations of nfsdist, the Linux eBPF/bcc version.

nfsdist traces NFS reads, writes, opens, and getattrs, and summarizes their
latency as a power-of-2 histogram. For example:

./nfsdist.py

Tracing NFS operation latency... Hit Ctrl-C to end.

operation = read
     usecs               : count     distribution
         0 -> 1          : 4        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 7107     |**************                          |
        16 -> 31         : 19864    |****************************************|
        32 -> 63         : 1494     |***                                     |
        64 -> 127        : 491      |                                        |
       128 -> 255        : 1810     |***                                     |
       256 -> 511        : 6356     |************                            |
       512 -> 1023       : 4860     |*********                               |
      1024 -> 2047       : 3070     |******                                  |
      2048 -> 4095       : 1853     |***                                     |
      4096 -> 8191       : 921      |*                                       |
      8192 -> 16383      : 122      |                                        |
     16384 -> 32767      : 15       |                                        |
     32768 -> 65535      : 5        |                                        |
     65536 -> 131071     : 2        |                                        |
    131072 -> 262143     : 1        |                                        |

operation = write
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 9        |                                        |
        64 -> 127        : 19491    |****************************************|
       128 -> 255        : 3064     |******                                  |
       256 -> 511        : 940      |*                                       |
       512 -> 1023       : 365      |                                        |
      1024 -> 2047       : 312      |                                        |
      2048 -> 4095       : 119      |                                        |
      4096 -> 8191       : 31       |                                        |
      8192 -> 16383      : 84       |                                        |
     16384 -> 32767      : 31       |                                        |
     32768 -> 65535      : 5        |                                        |
     65536 -> 131071     : 3        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 1        |                                        |

operation = getattr
     usecs               : count     distribution
         0 -> 1          : 27       |****************************************|
         2 -> 3          : 2        |**                                      |
         4 -> 7          : 3        |****                                    |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 2        |**                                      |
       512 -> 1023       : 2        |**                                      |

operation = open
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 2        |****************************************|

In this example you can see that the read traffic is rather bi-modal, with
about 27K reads falling within 8 - 31 usecs and about 19K reads spread between
128 and 8191 usecs. Write traffic is largely clustered in the 64 - 127 usecs
bracket. The faster read traffic is probably being served from a file system
cache, and the slower traffic from disk. The writes are so consistently fast
because this example was run on a couple of VMs, and I believe the hypervisor
was caching all the write traffic to memory.

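As a quick sanity check on those mode totals, you can sum the bucket counts
from the read histogram above (the 27K and 19K figures are just these sums,
rounded):

# Bucket counts copied from the "operation = read" histogram above.
fast = 7107 + 19864                            # 8 -> 31 usecs
slow = 1810 + 6356 + 4860 + 3070 + 1853 + 921  # 128 -> 8191 usecs
print(fast, slow)                              # 26971 18870
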
This "latency" is measured from when the operation was issued from the VFS
interface to the file system, to when it completed. This spans everything:
RPC latency, network latency, file system CPU cycles, file system locks, run
queue latency, etc. This is a better measure of the latency suffered by
applications reading from an NFS share, and can better expose problems
experienced by NFS clients.

Note that this only traces the common NFS operations (read, write, open, and
getattr). I chose to include getattr as a significant percentage of NFS
traffic ends up being getattr calls, which are a good indicator of problems
with an NFS server.

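To make the mechanism concrete, here is a minimal sketch of how this kind of
VFS-level timing can be done with bcc. This is not nfsdist's actual source
(see nfsdist.py for that): it lumps all four operations into a single
histogram, and it assumes the nfs_file_read, nfs_file_write, nfs_file_open,
and nfs_getattr kernel symbols exist, which can vary by kernel version
(NFSv4 opens, for example, may go through nfs4_file_open instead):

#!/usr/bin/python
# Minimal sketch: time NFS operations from VFS-level entry to return,
# summarizing latency as a power-of-2 histogram (all ops combined).
from bcc import BPF
import time

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32);    // thread id -> entry timestamp
BPF_HISTOGRAM(dist);     // power-of-2 latency buckets

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;        // missed the entry probe; skip
    u64 delta = (bpf_ktime_get_ns() - *tsp) / 1000;  // nsecs -> usecs
    dist.increment(bpf_log2l(delta));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
for fn in ("nfs_file_read", "nfs_file_write", "nfs_file_open", "nfs_getattr"):
    b.attach_kprobe(event=fn, fn_name="trace_entry")
    b.attach_kretprobe(event=fn, fn_name="trace_return")

print("Tracing NFS operation latency... Hit Ctrl-C to end.")
try:
    time.sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

Because the timestamp is recorded at function entry and the delta at function
return, everything in between (RPC, network, locks, run queue) is included,
which is exactly why the measurement reflects what applications experience.
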
An optional interval and a count can be provided, as well as -m to show the
distributions in milliseconds. For example:

./nfsdist -m 1 5
Tracing NFS operation latency... Hit Ctrl-C to end.

11:02:39:

operation = write
     msecs               : count     distribution
         0 -> 1          : 1        |                                        |
         2 -> 3          : 24       |********                                |
         4 -> 7          : 114      |****************************************|
         8 -> 15         : 9        |***                                     |
        16 -> 31         : 1        |                                        |
        32 -> 63         : 1        |                                        |

11:02:40:

operation = write
     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 11       |***                                     |
         4 -> 7          : 111      |****************************************|
         8 -> 15         : 13       |****                                    |
        16 -> 31         : 1        |                                        |

11:02:41:

operation = write
     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 21       |******                                  |
         4 -> 7          : 137      |****************************************|
         8 -> 15         : 3        |                                        |

This shows a write workload, with writes hovering primarily in the 4 - 7 ms
range.

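Under the hood, this per-interval output is the standard bcc histogram loop:
sleep for the interval, print the histogram with a timestamp, then clear it
so each summary stands alone. A sketch of that loop, reusing the hypothetical
b object from the earlier sketch (with its delta divided by 1000000 for
milliseconds rather than 1000 for microseconds when -m is used):

# Interval loop sketch: 5 summaries at 1 second intervals, as in the
# "./nfsdist -m 1 5" example above. Assumes "b" from the earlier sketch.
from time import sleep, strftime

interval, count = 1, 5
dist = b["dist"]
for _ in range(count):
    sleep(interval)
    print("%s:" % strftime("%H:%M:%S"))
    dist.print_log2_hist("msecs")
    dist.clear()    # reset buckets so intervals don't accumulate
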
USAGE message:

./nfsdist -h
usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count]

Summarize NFS operation latency

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -T, --notimestamp   don't include timestamp on interval output
  -m, --milliseconds  output in milliseconds
  -p PID, --pid PID   trace this PID only

examples:
    ./nfsdist            # show operation latency as a histogram
    ./nfsdist -p 181     # trace PID 181 only
    ./nfsdist 1 10       # print 1 second summaries, 10 times
    ./nfsdist -m 5       # 5s summaries, milliseconds
    161