Home | History | Annotate | Download | only in tools
      1 Demonstrations of argdist.
      2 
      3 
      4 argdist probes functions you specify and collects parameter values into a
      5 histogram or a frequency count. This can be used to understand the distribution
      6 of values a certain parameter takes, filter and print interesting parameters
      7 without attaching a debugger, and obtain general execution statistics on
      8 various functions.
      9 
     10 For example, suppose you want to find what allocation sizes are common in
     11 your application:
     12 
     13 # ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
     14 [01:42:29]
     15 p:c:malloc(size_t size):size_t:size
     16         COUNT      EVENT
     17 [01:42:30]
     18 p:c:malloc(size_t size):size_t:size
     19         COUNT      EVENT
     20 [01:42:31]
     21 p:c:malloc(size_t size):size_t:size
     22         COUNT      EVENT
     23         1          size = 16
     24 [01:42:32]
     25 p:c:malloc(size_t size):size_t:size
     26         COUNT      EVENT
     27         2          size = 16
     28 [01:42:33]
     29 p:c:malloc(size_t size):size_t:size
     30         COUNT      EVENT
     31         3          size = 16
     32 [01:42:34]
     33 p:c:malloc(size_t size):size_t:size
     34         COUNT      EVENT
     35         4          size = 16
     36 ^C
     37 
     38 It seems that the application is allocating blocks of size 16. The COUNT
     39 column contains the number of occurrences of a particular event, and the
     40 EVENT column describes the event. In this case, the "size" parameter was 
     41 probed and its value was 16, repeatedly.
     42 
     43 Now, suppose you wanted a histogram of buffer sizes passed to the write()
     44 function across the system:
     45 
     46 # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
     47 [01:45:22]
     48 p:c:write(int fd, void *buf, size_t len):size_t:len
     49      len                 : count     distribution
     50          0 -> 1          : 0        |                                        |
     51          2 -> 3          : 2        |*************                           |
     52          4 -> 7          : 0        |                                        |
     53          8 -> 15         : 2        |*************                           |
     54         16 -> 31         : 0        |                                        |
     55         32 -> 63         : 6        |****************************************|
     56 [01:45:23]
     57 p:c:write(int fd, void *buf, size_t len):size_t:len
     58      len                 : count     distribution
     59          0 -> 1          : 0        |                                        |
     60          2 -> 3          : 11       |***************                         |
     61          4 -> 7          : 0        |                                        |
     62          8 -> 15         : 4        |*****                                   |
     63         16 -> 31         : 0        |                                        |
     64         32 -> 63         : 28       |****************************************|
     65         64 -> 127        : 12       |*****************                       |
     66 [01:45:24]
     67 p:c:write(int fd, void *buf, size_t len):size_t:len
     68      len                 : count     distribution
     69          0 -> 1          : 0        |                                        |
     70          2 -> 3          : 21       |****************                        |
     71          4 -> 7          : 0        |                                        |
     72          8 -> 15         : 6        |****                                    |
     73         16 -> 31         : 0        |                                        |
     74         32 -> 63         : 52       |****************************************|
     75         64 -> 127        : 26       |********************                    |
     76 ^C
     77 
     78 It seems that most writes fall into three buckets: very small writes of 2-3
     79 bytes, medium writes of 32-63 bytes, and larger writes of 64-127 bytes.
     80 
     81 But these are writes across the board -- what if you wanted to focus on writes
     82 to STDOUT?
     83 
     84 # ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1'
     85 [01:47:17]
     86 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
     87      len                 : count     distribution
     88          0 -> 1          : 0        |                                        |
     89          2 -> 3          : 0        |                                        |
     90          4 -> 7          : 0        |                                        |
     91          8 -> 15         : 1        |****************************************|
     92         16 -> 31         : 0        |                                        |
     93         32 -> 63         : 1        |****************************************|
     94 [01:47:18]
     95 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
     96      len                 : count     distribution
     97          0 -> 1          : 0        |                                        |
     98          2 -> 3          : 0        |                                        |
     99          4 -> 7          : 0        |                                        |
    100          8 -> 15         : 2        |*************                           |
    101         16 -> 31         : 0        |                                        |
    102         32 -> 63         : 3        |********************                    |
    103         64 -> 127        : 6        |****************************************|
    104 [01:47:19]
    105 p:c:write(int fd, void *buf, size_t len):size_t:len:fd==1
    106      len                 : count     distribution
    107          0 -> 1          : 0        |                                        |
    108          2 -> 3          : 0        |                                        |
    109          4 -> 7          : 0        |                                        |
    110          8 -> 15         : 3        |*********                               |
    111         16 -> 31         : 0        |                                        |
    112         32 -> 63         : 5        |***************                         |
    113         64 -> 127        : 13       |****************************************|
    114 ^C
    115 
    116 The "fd==1" part is a filter that is applied to every invocation of write().
    117 Only if the filter condition is true, the value is recorded.
    118 
    119 You can also use argdist to trace kernel functions. For example, suppose you
    120 wanted a histogram of kernel allocation (kmalloc) sizes across the system,
    121 printed twice with 3 second intervals:
    122 
    123 # ./argdist -i 3 -n 2 -H 'p::__kmalloc(size_t size):size_t:size'
    124 [01:50:00]
    125 p::__kmalloc(size_t size):size_t:size
    126      size                : count     distribution
    127          0 -> 1          : 0        |                                        |
    128          2 -> 3          : 0        |                                        |
    129          4 -> 7          : 0        |                                        |
    130          8 -> 15         : 6        |****************************************|
    131 [01:50:03]
    132 p::__kmalloc(size_t size):size_t:size
    133      size                : count     distribution
    134          0 -> 1          : 0        |                                        |
    135          2 -> 3          : 0        |                                        |
    136          4 -> 7          : 0        |                                        |
    137          8 -> 15         : 22       |****************************************|
    138         16 -> 31         : 0        |                                        |
    139         32 -> 63         : 0        |                                        |
    140         64 -> 127        : 5        |*********                               |
    141        128 -> 255        : 2        |***                                     |
    142 
    143 Occasionally, numeric information isn't enough and you want to capture strings.
    144 What are the strings printed by puts() across the system?
    145 
    146 # ./argdist -i 10 -n 1 -C 'p:c:puts(char *str):char*:str'
    147 [01:53:54]
    148 p:c:puts(char *str):char*:str
    149         COUNT      EVENT
    150         2          str = Press ENTER to start.
    151 
    152 It looks like the message "Press ENTER to start." was printed twice during the
    153 10 seconds we were tracing.
    154 
    155 What about reads? You could trace gets() across the system and print the 
    156 strings input by the user (note how "r" is used instead of "p" to attach a
    157 probe to the function's return):
    158 
    159 # ./argdist -i 10 -n 1 -C 'r:c:gets():char*:(char*)$retval:$retval!=0'
    160 [02:12:23]
    161 r:c:gets():char*:$retval:$retval!=0
    162         COUNT      EVENT
    163         1          (char*)$retval = hi there
    164         3          (char*)$retval = sasha
    165         8          (char*)$retval = hello
    166 
    167 Similarly, we could get a histogram of the error codes returned by read():
    168 
    169 # ./argdist -i 10 -c 1 -H 'r:c:read()'
    170 [02:15:36]
    171 r:c:read()
    172      retval              : count     distribution
    173          0 -> 1          : 29       |****************************************|
    174          2 -> 3          : 11       |***************                         |
    175          4 -> 7          : 0        |                                        |
    176          8 -> 15         : 3        |****                                    |
    177         16 -> 31         : 2        |**                                      |
    178         32 -> 63         : 22       |******************************          |
    179         64 -> 127        : 5        |******                                  |
    180        128 -> 255        : 0        |                                        |
    181        256 -> 511        : 1        |*                                       |
    182        512 -> 1023       : 1        |*                                       |
    183       1024 -> 2047       : 0        |                                        |
    184       2048 -> 4095       : 2        |**                                      |
    185 
    186 In return probes, you can also trace the latency of the function (unless it is
    187 recursive) and the parameters it had on entry. For example, we can identify
    188 which processes are performing slow synchronous filesystem reads -- say,
    189 longer than 0.1ms (100,000ns):
    190 
    191 # ./argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
    192 [01:08:48]
    193 r::__vfs_read():u32:$PID:$latency > 100000
    194         COUNT      EVENT
    195         1          $PID = 10457
    196         21         $PID = 2780
    197 [01:08:49]
    198 r::__vfs_read():u32:$PID:$latency > 100000
    199         COUNT      EVENT
    200         1          $PID = 10457
    201         21         $PID = 2780
    202 ^C
    203 
    204 It looks like process 2780 performed 21 slow reads.
    205 
    206 Occasionally, entry parameter values are also interesting. For example, you
    207 might be curious how long it takes malloc() to allocate memory -- nanoseconds
    208 per byte allocated. Let's go:
    209 
    210 # ./argdist -H 'r:c:malloc(size_t size):u64:$latency/$entry(size);ns per byte' -n 1 -i 10
    211 [01:11:13]
    212      ns per byte         : count     distribution
    213          0 -> 1          : 0        |                                        |
    214          2 -> 3          : 4        |*****************                       |
    215          4 -> 7          : 3        |*************                           |
    216          8 -> 15         : 2        |********                                |
    217         16 -> 31         : 1        |****                                    |
    218         32 -> 63         : 0        |                                        |
    219         64 -> 127        : 7        |*******************************         |
    220        128 -> 255        : 1        |****                                    |
    221        256 -> 511        : 0        |                                        |
    222        512 -> 1023       : 1        |****                                    |
    223       1024 -> 2047       : 1        |****                                    |
    224       2048 -> 4095       : 9        |****************************************|
    225       4096 -> 8191       : 1        |****                                    | 
    226 
    227 It looks like a tri-modal distribution. Some allocations are extremely cheap,
    228 and take 2-15 nanoseconds per byte. Other allocations are slower, and take
    229 64-127 nanoseconds per byte. And some allocations are slower still, and take
    230 multiple microseconds per byte.
    231 
    232 You could also group results by more than one field. For example, __kmalloc
    233 takes an additional flags parameter that describes how to allocate memory:
    234 
    235 # ./argdist -c -C 'p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size'
    236 [03:42:29]
    237 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
    238         COUNT      EVENT
    239         1          flags = 16, size = 152
    240         2          flags = 131280, size = 8
    241         7          flags = 131280, size = 16
    242 [03:42:30]
    243 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
    244         COUNT      EVENT
    245         1          flags = 16, size = 152
    246         6          flags = 131280, size = 8
    247         19         flags = 131280, size = 16
    248 [03:42:31]
    249 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
    250         COUNT      EVENT
    251         2          flags = 16, size = 152
    252         10         flags = 131280, size = 8
    253         31         flags = 131280, size = 16
    254 [03:42:32]
    255 p::__kmalloc(size_t size, gfp_t flags):gfp_t,size_t:flags,size
    256         COUNT      EVENT
    257         2          flags = 16, size = 152
    258         14         flags = 131280, size = 8
    259         43         flags = 131280, size = 16
    260 ^C
    261 
    262 The flags value must be expanded by hand, but it's still helpful to eliminate
    263 certain kinds of allocations or visually group them together.
    264 
    265 argdist also has basic support for kernel tracepoints. It is sometimes more
    266 convenient to use tracepoints because they are documented and don't vary a lot
    267 between kernel versions. For example, let's trace the net:net_dev_start_xmit
    268 tracepoint and print out the protocol field from the tracepoint structure:
    269 
    270 # argdist -C 't:net:net_dev_start_xmit():u16:args->protocol'
    271 [13:01:49]
    272 t:net:net_dev_start_xmit():u16:args->protocol
    273         COUNT      EVENT
    274         8          args->protocol = 2048
    275 ^C
    276 
    277 Note that to discover the format of the net:net_dev_start_xmit tracepoint, you
    278 use the tplist tool (tplist -v net:net_dev_start_xmit).
    279 
    280 
    281 Occasionally, it is useful to filter certain expressions by string. This is not
    282 trivially supported by BPF, but argdist provides a STRCMP helper you can use in
    283 filter expressions. For example, to get a histogram of latencies opening a
    284 specific file, run this:
    285 
    286 # argdist -c -H 'r:c:open(char *file):u64:$latency/1000:STRCMP("test.txt",$entry(file))'
    287 [02:16:38]
    288 [02:16:39]
    289 [02:16:40]
    290      $latency/1000       : count     distribution
    291          0 -> 1          : 0        |                                        |
    292          2 -> 3          : 0        |                                        |
    293          4 -> 7          : 0        |                                        |
    294          8 -> 15         : 0        |                                        |
    295         16 -> 31         : 2        |****************************************|
    296 [02:16:41]
    297      $latency/1000       : count     distribution
    298          0 -> 1          : 0        |                                        |
    299          2 -> 3          : 0        |                                        |
    300          4 -> 7          : 0        |                                        |
    301          8 -> 15         : 1        |**********                              |
    302         16 -> 31         : 4        |****************************************|
    303 [02:16:42]
    304      $latency/1000       : count     distribution
    305          0 -> 1          : 0        |                                        |
    306          2 -> 3          : 0        |                                        |
    307          4 -> 7          : 0        |                                        |
    308          8 -> 15         : 1        |********                                |
    309         16 -> 31         : 5        |****************************************|
    310 [02:16:43]
    311      $latency/1000       : count     distribution
    312          0 -> 1          : 0        |                                        |
    313          2 -> 3          : 0        |                                        |
    314          4 -> 7          : 0        |                                        |
    315          8 -> 15         : 1        |********                                |
    316         16 -> 31         : 5        |****************************************|
    317 
    318 
    319 Here's a final example that finds how many write() system calls are performed
    320 by each process on the system:
    321 
    322 # argdist -c -C 'p:c:write():int:$PID;write per process' -n 2
    323 [06:47:18]
    324 write by process
    325         COUNT      EVENT
    326         3          $PID = 8889
    327         7          $PID = 7615
    328         7          $PID = 2480
    329 [06:47:19]
    330 write by process
    331         COUNT      EVENT
    332         9          $PID = 8889
    333         23         $PID = 7615
    334         23         $PID = 2480
    335 
    336 
    337 USAGE message:
    338 
    339 # argdist -h
    340 usage: argdist [-h] [-p PID] [-z STRING_SIZE] [-i INTERVAL] [-n COUNT] [-v]
    341                [-c] [-T TOP] [-H specifier] [-C[specifier] [-I header]
    342 
    343 Trace a function and display a summary of its parameter values.
    344 
    345 optional arguments:
    346   -h, --help            show this help message and exit
    347   -p PID, --pid PID     id of the process to trace (optional)
    348   -z STRING_SIZE, --string-size STRING_SIZE
    349                         maximum string size to read from char* arguments
    350   -i INTERVAL, --interval INTERVAL
    351                         output interval, in seconds (default 1 second)
    352   -d DURATION, --duration DURATION
    353 			total duration of trace, in seconds
    354   -n COUNT, --number COUNT
    355                         number of outputs
    356   -v, --verbose         print resulting BPF program code before executing
    357   -c, --cumulative      do not clear histograms and freq counts at each interval
    358   -T TOP, --top TOP     number of top results to show (not applicable to
    359                         histograms)
    360   -H specifier, --histogram specifier
    361                         probe specifier to capture histogram of (see examples
    362                         below)
    363   -C specifier, --count specifier
    364                         probe specifier to capture count of (see examples
    365                         below)
    366   -I header, --include header
    367                         additional header files to include in the BPF program
    368                         as either full path, or relative to current working directory,
    369                         or relative to default kernel header search path
    370 
    371 Probe specifier syntax:
    372         {p,r,t,u}:{[library],category}:function(signature)[:type[,type...]:expr[,expr...][:filter]][#label]
    373 Where:
    374         p,r,t,u    -- probe at function entry, function exit, kernel tracepoint,
    375                       or USDT probe
    376                       in exit probes: can use $retval, $entry(param), $latency
    377         library    -- the library that contains the function
    378                       (leave empty for kernel functions)
    379         category   -- the category of the kernel tracepoint (e.g. net, sched)
    380         signature  -- the function's parameters, as in the C header
    381         type       -- the type of the expression to collect (supports multiple)
    382         expr       -- the expression to collect (supports multiple)
    383         filter     -- the filter that is applied to collected values
    384         label      -- the label for this probe in the resulting output
    385 
    386 EXAMPLES:
    387 
    388 argdist -H 'p::__kmalloc(u64 size):u64:size'
    389         Print a histogram of allocation sizes passed to kmalloc
    390 
    391 argdist -p 1005 -C 'p:c:malloc(size_t size):size_t:size:size==16'
    392         Print a frequency count of how many times process 1005 called malloc
    393         with an allocation size of 16 bytes
    394 
    395 argdist -C 'r:c:gets():char*:$retval#snooped strings'
    396         Snoop on all strings returned by gets()
    397 
    398 argdist -H 'r::__kmalloc(size_t size):u64:$latency/$entry(size)#ns per byte'
    399         Print a histogram of nanoseconds per byte from kmalloc allocations
    400 
    401 argdist -C 'p::__kmalloc(size_t size, gfp_t flags):size_t:size:flags&GFP_ATOMIC'
    402         Print frequency count of kmalloc allocation sizes that have GFP_ATOMIC
    403 
    404 argdist -p 1005 -C 'p:c:write(int fd):int:fd' -T 5
    405         Print frequency counts of how many times writes were issued to a
    406         particular file descriptor number, in process 1005, but only show
    407         the top 5 busiest fds
    408 
    409 argdist -p 1005 -H 'r:c:read()'
    410         Print a histogram of error codes returned by read() in process 1005
    411 
    412 argdist -C 'r::__vfs_read():u32:$PID:$latency > 100000'
    413         Print frequency of reads by process where the latency was >0.1ms
    414 
    415 argdist -H 'r::__vfs_read(void *file, void *buf, size_t count):size_t:$entry(count):$latency > 1000000' 
    416         Print a histogram of read sizes that were longer than 1ms
    417 
    418 argdist -H \
    419         'p:c:write(int fd, const void *buf, size_t count):size_t:count:fd==1'
    420         Print a histogram of buffer sizes passed to write() across all
    421         processes, where the file descriptor was 1 (STDOUT)
    422 
    423 argdist -C 'p:c:fork()#fork calls'
    424         Count fork() calls in libc across all processes
    425         Can also use funccount.py, which is easier and more flexible 
    426 
    427 argdist -H 't:block:block_rq_complete():u32:args->nr_sector'
    428         Print histogram of number of sectors in completing block I/O requests
    429 
    430 argdist -C 't:irq:irq_handler_entry():int:args->irq'
    431         Aggregate interrupts by interrupt request (IRQ)
    432 
    433 argdist -C 'u:pthread:pthread_start():u64:arg2' -p 1337
    434         Print frequency of function addresses used as a pthread start function,
    435         relying on the USDT pthread_start probe in process 1337
    436 
    437 argdist -H 'p:c:sleep(u32 seconds):u32:seconds' \
    438         -H 'p:c:nanosleep(struct timespec *req):long:req->tv_nsec'
    439         Print histograms of sleep() and nanosleep() parameter values
    440 
    441 argdist -p 2780 -z 120 \
    442         -C 'p:c:write(int fd, char* buf, size_t len):char*:buf:fd==1'
    443         Spy on writes to STDOUT performed by process 2780, up to a string size
    444         of 120 characters 
    445 
    446 argdist -I 'kernel/sched/sched.h' \
    447         -C 'p::__account_cfs_rq_runtime(struct cfs_rq *cfs_rq):s64:cfs_rq->runtime_remaining'
    448         Trace on the cfs scheduling runqueue remaining runtime. The struct cfs_rq is defined
    449         in kernel/sched/sched.h which is in kernel source tree and not in kernel-devel
    450         package.  So this command needs to run at the kernel source tree root directory
    451         so that the added header file can be found by the compiler.
    452