1 Demonstrations of nfsdist, the Linux eBPF/bcc version. 2 3 nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their 4 latency as a power-of-2 histogram. For example: 5 6 7 ./nfsdist.py 8 9 Tracing NFS operation latency... Hit Ctrl-C to end. 10 11 operation = read 12 usecs : count distribution 13 0 -> 1 : 4 | | 14 2 -> 3 : 0 | | 15 4 -> 7 : 0 | | 16 8 -> 15 : 7107 |************** | 17 16 -> 31 : 19864 |****************************************| 18 32 -> 63 : 1494 |*** | 19 64 -> 127 : 491 | | 20 128 -> 255 : 1810 |*** | 21 256 -> 511 : 6356 |************ | 22 512 -> 1023 : 4860 |********* | 23 1024 -> 2047 : 3070 |****** | 24 2048 -> 4095 : 1853 |*** | 25 4096 -> 8191 : 921 |* | 26 8192 -> 16383 : 122 | | 27 16384 -> 32767 : 15 | | 28 32768 -> 65535 : 5 | | 29 65536 -> 131071 : 2 | | 30 131072 -> 262143 : 1 | | 31 32 operation = write 33 usecs : count distribution 34 0 -> 1 : 0 | | 35 2 -> 3 : 0 | | 36 4 -> 7 : 0 | | 37 8 -> 15 : 1 | | 38 16 -> 31 : 0 | | 39 32 -> 63 : 9 | | 40 64 -> 127 : 19491 |****************************************| 41 128 -> 255 : 3064 |****** | 42 256 -> 511 : 940 |* | 43 512 -> 1023 : 365 | | 44 1024 -> 2047 : 312 | | 45 2048 -> 4095 : 119 | | 46 4096 -> 8191 : 31 | | 47 8192 -> 16383 : 84 | | 48 16384 -> 32767 : 31 | | 49 32768 -> 65535 : 5 | | 50 65536 -> 131071 : 3 | | 51 131072 -> 262143 : 0 | | 52 262144 -> 524287 : 1 | | 53 54 operation = getattr 55 usecs : count distribution 56 0 -> 1 : 27 |****************************************| 57 2 -> 3 : 2 |** | 58 4 -> 7 : 3 |**** | 59 8 -> 15 : 0 | | 60 16 -> 31 : 0 | | 61 32 -> 63 : 0 | | 62 64 -> 127 : 0 | | 63 128 -> 255 : 0 | | 64 256 -> 511 : 2 |** | 65 512 -> 1023 : 2 |** | 66 67 operation = open 68 usecs : count distribution 69 0 -> 1 : 0 | | 70 2 -> 3 : 0 | | 71 4 -> 7 : 0 | | 72 8 -> 15 : 0 | | 73 16 -> 31 : 0 | | 74 32 -> 63 : 0 | | 75 64 -> 127 : 0 | | 76 128 -> 255 : 0 | | 77 256 -> 511 : 2 |****************************************| 78 79 80 In this example you can see that the read traffic is rather bi-modal, with about 81 26K reads falling within 8 - 30 usecs and about 18K reads spread between 128 - 82 8191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket. 83 The faster read traffic is probably coming from a filesystem cache and the slower 84 traffic from disk. The reason why the writes are so consistently fast is because 85 this example test was run on a couple of VM's and I believe the hypervisor was 86 caching all the write traffic to memory. 87 88 This "latency" is measured from when the operation was issued from the VFS 89 interface to the file system, to when it completed. This spans everything: 90 RPC latency, network latency, file system CPU cycles, file system locks, run 91 queue latency, etc. This is a better measure of the latency suffered by 92 applications reading from a NFS share and can better expose problems 93 experienced by NFS clients. 94 95 Note that this only traces the common NFS operations (read, write, open and 96 getattr). I chose to include getattr as a significant percentage of NFS 97 traffic end up being getattr calls and are a good indicator of problems 98 with an NFS server. 99 100 An optional interval and a count can be provided, as well as -m to show the 101 distributions in milliseconds. For example: 102 103 ./nfsdist -m 1 5 104 Tracing NFS operation latency... Hit Ctrl-C to end. 105 106 11:02:39: 107 108 operation = write 109 msecs : count distribution 110 0 -> 1 : 1 | | 111 2 -> 3 : 24 |******** | 112 4 -> 7 : 114 |****************************************| 113 8 -> 15 : 9 |*** | 114 16 -> 31 : 1 | | 115 32 -> 63 : 1 | | 116 117 11:02:40: 118 119 operation = write 120 msecs : count distribution 121 0 -> 1 : 0 | | 122 2 -> 3 : 11 |*** | 123 4 -> 7 : 111 |****************************************| 124 8 -> 15 : 13 |**** | 125 16 -> 31 : 1 | | 126 127 11:02:41: 128 129 operation = write 130 msecs : count distribution 131 0 -> 1 : 0 | | 132 2 -> 3 : 21 |****** | 133 4 -> 7 : 137 |****************************************| 134 8 -> 15 : 3 | | 135 136 This shows a write workload, with writes hovering primarily in the 4-7ms range. 137 138 USAGE message: 139 140 141 ./nfsdist -h 142 usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count] 143 144 Summarize NFS operation latency 145 146 positional arguments: 147 interval output interval, in seconds 148 count number of outputs 149 150 optional arguments: 151 -h, --help show this help message and exit 152 -T, --notimestamp don't include timestamp on interval output 153 -m, --milliseconds output in milliseconds 154 -p PID, --pid PID trace this PID only 155 156 examples: 157 ./nfsdist # show operation latency as a histogram 158 ./nfsdist -p 181 # trace PID 181 only 159 ./nfsdist 1 10 # print 1 second summaries, 10 times 160 ./nfsdist -m 5 # 5s summaries, milliseconds 161