1 Demonstrations of argdist. 2 3 4 argdist probes functions you specify and collects parameter values into a 5 histogram or a frequency count. This can be used to understand the distribution 6 of values a certain parameter takes, filter and print interesting parameters 7 without attaching a debugger, and obtain general execution statistics on 8 various functions. 9 10 For example, suppose you want to find what allocation sizes are common in 11 your application: 12 13 # ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size' 14 [01:42:29] 15 p:c:malloc(size_t size):size_t:size 16 COUNT EVENT 17 [01:42:30] 18 p:c:malloc(size_t size):size_t:size 19 COUNT EVENT 20 [01:42:31] 21 p:c:malloc(size_t size):size_t:size 22 COUNT EVENT 23 1 size = 16 24 [01:42:32] 25 p:c:malloc(size_t size):size_t:size 26 COUNT EVENT 27 2 size = 16 28 [01:42:33] 29 p:c:malloc(size_t size):size_t:size 30 COUNT EVENT 31 3 size = 16 32 [01:42:34] 33 p:c:malloc(size_t size):size_t:size 34 COUNT EVENT 35 4 size = 16 36 ^C 37 38 It seems that the application is allocating blocks of size 16. The COUNT 39 column contains the number of occurrences of a particular event, and the 40 EVENT column describes the event. In this case, the "size" parameter was 41 probed and its value was 16, repeatedly. 42 43 Now, suppose you wanted a histogram of buffer sizes passed to the write() 44 function across the system: 45 46 # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len' 47 [01:45:22] 48 p:c:write(int fd, void *buf, size_t len):size_t:len 49 len : count distribution 50 0 -> 1 : 0 | | 51 2 -> 3 : 2 |************* | 52 4 -> 7 : 0 | | 53 8 -> 15 : 2 |************* | 54 16 -> 31 : 0 | | 55 32 -> 63 : 6 |****************************************| 56 [01:45:23] 57 p:c:write(int fd, void *buf, size_t len):size_t:len 58 len : count distribution 59 0 -> 1 : 0 | | 60 2 -> 3 : 11 |*************** | 61 4 -> 7 : 0 | | 62 8 -> 15 : 4 |***** | 63 16 -> 31 : 0 | | 64 32 -> 63 : 28 |****************************************| 65 64 -> 127 : 12 |***************** | 66 [01:45:24] 67 p:c:write(int fd, void *buf, size_t len):size_t:len 68 len : count distribution 69 0 -> 1 : 0 | | 70 2 -> 3 : 21 |**************** | 71 4 -> 7 : 0 | | 72 8 -> 15 : 6 |**** | 73 16 -> 31 : 0 | | 74 32 -> 63 : 52 |****************************************| 75 64 -> 127 : 26 |******************** | 76 ^C 77 78 It seems that most writes fall into three buckets: very small writes of 2-3 79 bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes. 80 81 But these are writes across the board -- what if you wanted to focus on writes 82 to STDOUT? 83 84 # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1' 85 [01:47:17] 86 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 87 len : count distribution 88 0 -> 1 : 0 | | 89 2 -> 3 : 0 | | 90 4 -> 7 : 0 | | 91 8 -> 15 : 1 |****************************************| 92 16 -> 31 : 0 | | 93 32 -> 63 : 1 |****************************************| 94 [01:47:18] 95 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 96 len : count distribution 97 0 -> 1 : 0 | | 98 2 -> 3 : 0 | | 99 4 -> 7 : 0 | | 100 8 -> 15 : 2 |************* | 101 16 -> 31 : 0 | | 102 32 -> 63 : 3 |******************** | 103 64 -> 127 : 6 |****************************************| 104 [01:47:19] 105 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1 106 len : count distribution 107 0 -> 1 : 0 | | 108 2 -> 3 : 0 | | 109 4 -> 7 : 0 | | 110 8 -> 15 : 3 |********* | 111 16 -> 31 : 0 | | 112 32 -> 63 : 5 |*************** | 113 64 -> 127 : 13 |****************************************| 114 ^C 115 116 The "fd==1" part is a filter that is applied to every invocation of write(). 117 Only if the filter condition is true, the value is recorded. 118 119 You can also use argdist to trace kernel functions. For example, suppose you 120 wanted a histogram of kernel allocation (kmalloc) sizes across the system, 121 printed twice with 3 second intervals: 122 123 # ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size' 124 [01:50:00] 125 p::__kmalloc(size_t size):size_t:size 126 size : count distribution 127 0 -> 1 : 0 | | 128 2 -> 3 : 0 | | 129 4 -> 7 : 0 | | 130 8 -> 15 : 6 |****************************************| 131 [01:50:03] 132 p::__kmalloc(size_t size):size_t:size 133 size : count distribution 134 0 -> 1 : 0 | | 135 2 -> 3 : 0 | | 136 4 -> 7 : 0 | | 137 8 -> 15 : 22 |****************************************| 138 16 -> 31 : 0 | | 139 32 -> 63 : 0 | | 140 64 -> 127 : 5 |********* | 141 128 -> 255 : 2 |*** | 142 143 Occasionally, numeric information isn't enough and you want to capture strings. 144 What are the strings printed by puts() across the system? 145 146 # ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str' 147 [01:53:54] 148 p:c:puts(char *str):char*:str 149 COUNT EVENT 150 2 str = Press ENTER to start. 151 152 It looks like the message "Press ENTER to start." was printed twice during the 153 10 seconds we were tracing. 154 155 What about reads? You could trace gets() across the system and print the 156 strings input by the user (note how "r" is used instead of "p" to attach a 157 probe to the function's return): 158 159 # ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0' 160 [02:12:23] 161 r:c:gets():char*:$retval:$retval!=0 162 COUNT EVENT 163 1 (char*)$retval = hi there 164 3 (char*)$retval = sasha 165 8 (char*)$retval = hello 166 167 Similarly, we could get a histogram of the error codes returned by read(): 168 169 # ./argdist -i 10 -c 1 -H 'r:c:read()' 170 [02:15:36] 171 r:c:read() 172 retval : count distribution 173 0 -> 1 : 29 |****************************************| 174 2 -> 3 : 11 |*************** | 175 4 -> 7 : 0 | | 176 8 -> 15 : 3 |**** | 177 16 -> 31 : 2 |** | 178 32 -> 63 : 22 |****************************** | 179 64 -> 127 : 5 |****** | 180 128 -> 255 : 0 | | 181 256 -> 511 : 1 |* | 182 512 -> 1023 : 1 |* | 183 1024 -> 2047 : 0 | | 184 2048 -> 4095 : 2 |** | 185 186 In return probes, you can also trace the latency of the function (unless it is 187 recursive) and the parameters it had on entry. For example, we can identify 188 which processes are performing slow synchronous filesystem reads -- say, 189 longer than 0.1ms (100,000ns): 190 191 # ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000' 192 [01:08:48] 193 r::__vfs_read():u32:$PID:$latency > 100000 194 COUNT EVENT 195 1 $PID = 10457 196 21 $PID = 2780 197 [01:08:49] 198 r::__vfs_read():u32:$PID:$latency > 100000 199 COUNT EVENT 200 1 $PID = 10457 201 21 $PID = 2780 202 ^C 203 204 It looks like process 2780 performed 21 slow reads. 205 206 Occasionally, entry parameter values are also interesting. For example, you 207 might be curious how long it takes malloc() to allocate memory -- nanoseconds 208 per byte allocated. Let's go: 209 210 # ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10 211 [01:11:13] 212 ns per byte : count distribution 213 0 -> 1 : 0 | | 214 2 -> 3 : 4 |***************** | 215 4 -> 7 : 3 |************* | 216 8 -> 15 : 2 |******** | 217 16 -> 31 : 1 |**** | 218 32 -> 63 : 0 | | 219 64 -> 127 : 7 |******************************* | 220 128 -> 255 : 1 |**** | 221 256 -> 511 : 0 | | 222 512 -> 1023 : 1 |**** | 223 1024 -> 2047 : 1 |**** | 224 2048 -> 4095 : 9 |****************************************| 225 4096 -> 8191 : 1 |**** | 226 227 It looks like a tri-modal distribution. Some allocations are extremely cheap, 228 and take 2-15 nanoseconds per byte. Other allocations are slower, and take 229 64-127 nanoseconds per byte. And some allocations are slower still, and take 230 multiple microseconds per byte. 231 232 You could also group results by more than one field. For example, __kmalloc 233 takes an additional flags parameter that describes how to allocate memory: 234 235 # ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size' 236 [03:42:29] 237 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 238 COUNT EVENT 239 1 flags = 16, size = 152 240 2 flags = 131280, size = 8 241 7 flags = 131280, size = 16 242 [03:42:30] 243 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 244 COUNT EVENT 245 1 flags = 16, size = 152 246 6 flags = 131280, size = 8 247 19 flags = 131280, size = 16 248 [03:42:31] 249 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 250 COUNT EVENT 251 2 flags = 16, size = 152 252 10 flags = 131280, size = 8 253 31 flags = 131280, size = 16 254 [03:42:32] 255 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size 256 COUNT EVENT 257 2 flags = 16, size = 152 258 14 flags = 131280, size = 8 259 43 flags = 131280, size = 16 260 ^C 261 262 The flags value must be expanded by hand, but it's still helpful to eliminate 263 certain kinds of allocations or visually group them together. 264 265 argdist also has basic support for kernel tracepoints. It is sometimes more 266 convenient to use tracepoints because they are documented and don't vary a lot 267 between kernel versions. For example, let's trace the net:net_dev_start_xmit 268 tracepoint and print out the protocol field from the tracepoint structure: 269 270 # argdist -C 't:net:net_dev_start_xmit():u16:args->protocol' 271 [13:01:49] 272 t:net:net_dev_start_xmit():u16:args->protocol 273 COUNT EVENT 274 8 args->protocol = 2048 275 ^C 276 277 Note that to discover the format of the net:net_dev_start_xmit tracepoint, you 278 use the tplist tool (tplist -v net:net_dev_start_xmit). 279 280 281 Occasionally, it is useful to filter certain expressions by string. This is not 282 trivially supported by BPF, but argdist provides a STRCMP helper you can use in 283 filter expressions. For example, to get a histogram of latencies opening a 284 specific file, run this: 285 286 # argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))' 287 [02:16:38] 288 [02:16:39] 289 [02:16:40] 290 $latency/1000 : count distribution 291 0 -> 1 : 0 | | 292 2 -> 3 : 0 | | 293 4 -> 7 : 0 | | 294 8 -> 15 : 0 | | 295 16 -> 31 : 2 |****************************************| 296 [02:16:41] 297 $latency/1000 : count distribution 298 0 -> 1 : 0 | | 299 2 -> 3 : 0 | | 300 4 -> 7 : 0 | | 301 8 -> 15 : 1 |********** | 302 16 -> 31 : 4 |****************************************| 303 [02:16:42] 304 $latency/1000 : count distribution 305 0 -> 1 : 0 | | 306 2 -> 3 : 0 | | 307 4 -> 7 : 0 | | 308 8 -> 15 : 1 |******** | 309 16 -> 31 : 5 |****************************************| 310 [02:16:43] 311 $latency/1000 : count distribution 312 0 -> 1 : 0 | | 313 2 -> 3 : 0 | | 314 4 -> 7 : 0 | | 315 8 -> 15 : 1 |******** | 316 16 -> 31 : 5 |****************************************| 317 318 319 Here's a final example that finds how many write() system calls are performed 320 by each process on the system: 321 322 # argdist -c -C 'p:c:write():int:$PID;write per process' -n 2 323 [06:47:18] 324 write by process 325 COUNT EVENT 326 3 $PID = 8889 327 7 $PID = 7615 328 7 $PID = 2480 329 [06:47:19] 330 write by process 331 COUNT EVENT 332 9 $PID = 8889 333 23 $PID = 7615 334 23 $PID = 2480 335 336 337 USAGE message: 338 339 # argdist -h 340 usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v] 341 [-c] [-T TOP] [-H specifier] [-C[specifier] [-I header] 342 343 Trace a function and display a summary of its parameter values. 344 345 optional arguments: 346 -h, --help show this help message and exit 347 -p PID, --pid PID id of the process to trace (optional) 348 -z STRING_SIZE, --string-size STRING_SIZE 349 maximum string size to read from char* arguments 350 -i INTERVAL, --interval INTERVAL 351 output interval, in seconds (default 1 second) 352 -d DURATION, --duration DURATION 353 total duration of trace, in seconds 354 -n COUNT, --number COUNT 355 number of outputs 356 -v, --verbose print resulting BPF program code before executing 357 -c, --cumulative do not clear histograms and freq counts at each interval 358 -T TOP, --top TOP number of top results to show (not applicable to 359 histograms) 360 -H specifier, --histogram specifier 361 probe specifier to capture histogram of (see examples 362 below) 363 -C specifier, --count specifier 364 probe specifier to capture count of (see examples 365 below) 366 -I header, --include header 367 additional header files to include in the BPF program 368 as either full path, or relative to current working directory, 369 or relative to default kernel header search path 370 371 Probe specifier syntax: 372 {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label] 373 Where: 374 p,r,t,u -- probe at function entry, function exit, kernel tracepoint, 375 or USDT probe 376 in exit probes: can use $retval, $entry(param), $latency 377 library -- the library that contains the function 378 (leave empty for kernel functions) 379 category -- the category of the kernel tracepoint (e.g. net, sched) 380 signature -- the function's parameters, as in the C header 381 type -- the type of the expression to collect (supports multiple) 382 expr -- the expression to collect (supports multiple) 383 filter -- the filter that is applied to collected values 384 label -- the label for this probe in the resulting output 385 386 EXAMPLES: 387 388 argdist -H 'p::__kmalloc(u64 size):u64:size' 389 Print a histogram of allocation sizes passed to kmalloc 390 391 argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16' 392 Print a frequency count of how many times process 1005 called malloc 393 with an allocation size of 16 bytes 394 395 argdist -C 'r:c:gets():char*:$retval#snooped strings' 396 Snoop on all strings returned by gets() 397 398 argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte' 399 Print a histogram of nanoseconds per byte from kmalloc allocations 400 401 argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC' 402 Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC 403 404 argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5 405 Print frequency counts of how many times writes were issued to a 406 particular file descriptor number, in process 1005, but only show 407 the top 5 busiest fds 408 409 argdist -p 1005 -H 'r:c:read()' 410 Print a histogram of error codes returned by read() in process 1005 411 412 argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000' 413 Print frequency of reads by process where the latency was >0.1ms 414 415 argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000' 416 Print a histogram of read sizes that were longer than 1ms 417 418 argdist -H \ 419 'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1' 420 Print a histogram of buffer sizes passed to write() across all 421 processes, where the file descriptor was 1 (STDOUT) 422 423 argdist -C 'p:c:fork()#fork calls' 424 Count fork() calls in libc across all processes 425 Can also use funccount.py, which is easier and more flexible 426 427 argdist -H 't:block:block_rq_complete():u32:args->nr_sector' 428 Print histogram of number of sectors in completing block I/O requests 429 430 argdist -C 't:irq:irq_handler_entry():int:args->irq' 431 Aggregate interrupts by interrupt request (IRQ) 432 433 argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337 434 Print frequency of function addresses used as a pthread start function, 435 relying on the USDT pthread_start probe in process 1337 436 437 argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \ 438 -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec' 439 Print histograms of sleep() and nanosleep() parameter values 440 441 argdist -p 2780 -z 120 \ 442 -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1' 443 Spy on writes to STDOUT performed by process 2780, up to a string size 444 of 120 characters 445 446 argdist -I 'kernel/sched/sched.h' \ 447 -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining' 448 Trace on the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined 449 in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel 450 package. So this command needs to run at the kernel source tree root directory 451 so that the added header file can be found by the compiler. 452