1 Demonstrations of nfsslower, the Linux eBPF/bcc version. 2 3 nfsslower show NFS reads, writes, opens and getattrs, slower than a 4 threshold. For example: 5 6 ./nfsslower.py 7 Tracing NFS operations that are slower than 10 ms 8 TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 9 11:25:16 dd 21295 W 1048576 15360 14.84 1.test 10 11:25:16 dd 21295 W 1048576 16384 12.73 1.test 11 11:25:16 dd 21295 W 1048576 17408 24.27 1.test 12 11:25:16 dd 21295 W 1048576 18432 22.93 1.test 13 11:25:16 dd 21295 W 1048576 19456 14.65 1.test 14 11:25:16 dd 21295 W 1048576 20480 12.58 1.test 15 11:25:16 dd 21297 W 1048576 6144 10.50 1.test.w 16 11:25:16 dd 21297 W 1048576 7168 16.65 1.test.w 17 11:25:16 dd 21297 W 1048576 8192 13.01 1.test.w 18 11:25:16 dd 21297 W 1048576 9216 14.06 1.test.w 19 20 This shows NFS writes from dd each 1MB in size to 2 different files. The 21 writes all had latency higher than 10ms. 22 23 This "latency" is measured from when the operation was issued from the VFS 24 interface to the file system, to when it completed. This spans everything: 25 RPC latency, network latency, file system CPU cycles, file system locks, run 26 queue latency, etc. This is a better measure of the latency suffered by 27 applications reading from a NFS share and can better expose problems 28 experienced by NFS clients. 29 30 Note that this only traces the common NFS operations (read,write,open and 31 getattr). I chose to include getattr as a significant percentage of NFS 32 traffic end up being getattr calls and are a good indicator of problems 33 with an NFS server. 34 35 The threshold can be provided as an argument. E.g. I/O slower than 1 ms: 36 37 ./nfsslower.py 1 38 Tracing NFS operations that are slower than 1 ms 39 TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME 40 11:40:16 cp 21583 R 131072 0 4.35 1.test 41 11:40:16 cp 21583 R 131072 256 1.87 1.test 42 11:40:16 cp 21583 R 131072 384 2.99 1.test 43 11:40:16 cp 21583 R 131072 512 4.19 1.test 44 11:40:16 cp 21583 R 131072 640 4.25 1.test 45 11:40:16 cp 21583 R 131072 768 4.65 1.test 46 11:40:16 cp 21583 R 131072 1280 1.08 1.test 47 11:40:16 cp 21583 R 131072 1408 3.29 1.test 48 11:40:16 cp 21583 R 131072 1792 3.12 1.test 49 11:40:16 cp 21583 R 131072 3712 3.55 1.test 50 11:40:16 cp 21583 R 131072 3840 1.12 1.test 51 11:40:16 cp 21583 R 131072 4096 3.23 1.test 52 11:40:16 cp 21583 R 131072 4224 2.73 1.test 53 11:40:16 cp 21583 R 131072 4352 2.73 1.test 54 11:40:16 cp 21583 R 131072 4480 6.09 1.test 55 11:40:16 cp 21583 R 131072 5120 4.40 1.test 56 [...] 57 58 This shows all NFS_READS that were more than 1ms. Depending on your 59 latency to your fileserver, you might need to tweak this value to 60 remove 61 62 A threshold of 0 will trace all operations. Warning: the output will be 63 verbose, as it will include all file system cache hits. 64 65 ./nfsslower.py 0 66 Tracing NFS operations 67 11:56:50 dd 21852 W 1048576 0 0.42 1.test 68 11:56:50 dd 21852 W 1048576 1024 0.46 1.test 69 11:56:50 dd 21852 W 1048576 2048 0.36 1.test 70 11:56:50 cp 21854 G 0 0 0.35 1.test 71 11:56:50 cp 21854 O 0 0 0.33 1.test 72 11:56:50 cp 21854 G 0 0 0.00 1.test 73 11:56:50 cp 21854 R 131072 0 0.07 1.test 74 11:56:50 cp 21854 R 131072 128 0.02 1.test 75 11:56:50 cp 21854 R 131072 256 0.02 1.test 76 11:56:50 cp 21854 R 131072 384 0.02 1.test 77 11:56:50 cp 21854 R 131072 512 0.02 1.test 78 11:56:50 cp 21854 R 131072 640 0.02 1.test 79 11:56:50 cp 21854 R 131072 768 0.02 1.test 80 11:56:50 cp 21854 R 131072 896 0.02 1.test 81 11:56:50 cp 21854 R 131072 1024 0.02 1.test 82 11:56:50 cp 21854 R 131072 1152 0.02 1.test 83 11:56:50 cp 21854 R 131072 1280 0.02 1.test 84 11:56:50 cp 21854 R 131072 1408 0.02 1.test 85 11:56:50 cp 21854 R 131072 1536 0.02 1.test 86 11:56:50 cp 21854 R 131072 1664 0.02 1.test 87 11:56:50 cp 21854 R 131072 1792 0.02 1.test 88 11:56:50 cp 21854 R 131072 1920 0.02 1.test 89 11:56:50 cp 21854 R 131072 2048 0.02 1.test 90 11:56:50 cp 21854 R 131072 2176 0.04 1.test 91 11:56:50 cp 21854 R 131072 2304 0.02 1.test 92 11:56:50 cp 21854 R 131072 2432 0.03 1.test 93 11:56:50 cp 21854 R 131072 2560 0.03 1.test 94 11:56:50 cp 21854 R 131072 2688 0.02 1.test 95 11:56:50 cp 21854 R 131072 2816 0.03 1.test 96 11:56:50 cp 21854 R 131072 2944 0.02 1.test 97 11:56:50 cp 21854 R 0 3072 0.00 1.test 98 11:56:50 ls 21855 G 0 0 0.00 1.test 99 11:56:50 ls 21856 G 0 0 0.36 music 100 11:56:50 ls 21856 G 0 0 0.00 music 101 11:56:50 ls 21856 G 0 0 0.00 test 102 11:56:50 ls 21856 G 0 0 0.00 ff 103 11:56:50 ls 21856 G 0 0 0.00 34.log 104 11:56:50 ls 21856 G 0 0 0.00 vmlinuz-linux 105 11:56:50 ls 21856 G 0 0 0.00 2.test 106 11:56:50 ls 21856 G 0 0 0.00 rt.log 107 11:56:50 ls 21856 G 0 0 0.00 1.lod 108 11:56:50 ls 21856 G 0 0 0.00 COPYRIGHT.txt 109 11:56:50 ls 21856 G 0 0 0.00 gg 110 11:56:50 ls 21856 G 0 0 0.00 qw.log 111 11:56:50 ls 21856 G 0 0 0.00 README.md 112 11:56:50 ls 21856 G 0 0 0.00 1.log 113 114 The output now includes open operations ("O"), and reads ("R") wand getattrs ("G"). 115 A cp operation 116 117 118 A -j option will print just the fields (parsable output, csv): 119 120 ./nfsslower.py -j 0 121 ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE 122 87054476520,dd,22754,W,1048576,0,425,1.test 123 87054482916,dd,22754,W,1048576,1048576,320,1.test 124 87054488179,dd,22754,W,1048576,2097152,389,1.test 125 87054511340,cp,22756,G,0,0,371,1.test 126 87054511685,cp,22756,O,0,0,306,1.test 127 87054511700,cp,22756,G,0,0,2,1.test 128 87054512325,cp,22756,R,131072,0,56,1.test 129 87054512432,cp,22756,R,131072,131072,22,1.test 130 87054512520,cp,22756,R,131072,262144,32,1.test 131 87054512600,cp,22756,R,131072,393216,21,1.test 132 87054512678,cp,22756,R,131072,524288,21,1.test 133 87054512803,cp,22756,R,131072,655360,56,1.test 134 135 This may be useful for visualizing with another tool, for example, for 136 producing a scatter plot of ENDTIME vs LATENCY, to look for time-based 137 patterns. 138 139 USAGE message: 140 141 usage: nfsslower.py [-h] [-j] [-p PID] [min_ms] 142 143 Trace READ, WRITE, OPEN and GETATTR NFS calls slower than a threshold,supports NFSv{3,4} 144 145 positional arguments: 146 min_ms Minimum IO duration to trace in ms (default=10ms) 147 148 optional arguments: 149 -h, --help show this help message and exit 150 -j, --csv just print fields: comma-separated values 151 -p PID, --pid PID Trace this pid only 152 153 ./nfsslower # trace operations slower than 10ms 154 ./nfsslower 1 # trace operations slower than 1ms 155 ./nfsslower -j 1 # ... 1 ms, parsable output (csv) 156 ./nfsslower 0 # trace all nfs operations 157 ./nfsslower -p 121 # trace pid 121 only 158 159