Demonstrations of syscount, the Linux/eBPF version.


syscount summarizes syscall counts across the system or a specific process,
with optional latency information. It is very useful for general workload
characterization, for example:

# syscount
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:39:04]
SYSCALL             COUNT
write               10739
read                10584
wait4                1460
nanosleep            1457
select                795
rt_sigprocmask        689
clock_gettime         653
rt_sigaction          128
futex                  86
ioctl                  83
^C

These are the top 10 entries; you can get more by using the -T switch. Here,
the output indicates that the write and read syscalls were very common, followed
immediately by wait4, nanosleep, and so on. By default, syscount counts across
the entire system, but we can point it to a specific process of interest:

# syscount -p $(pidof dd)
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:40:21]
SYSCALL             COUNT
read              7878397
write             7878397
^C

Indeed, dd's workload is a bit easier to characterize. Occasionally, the count
of syscalls is not enough, and you'd also want an aggregate latency:

# syscount -L
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:41:32]
SYSCALL                   COUNT        TIME (us)
select                       16      3415860.022
nanosleep                   291        12038.707
ftruncate                     1          122.939
write                         4           63.389
stat                          1           23.431
fstat                         1            5.088
[unknown: 321]               32            4.965
timerfd_settime               1            4.830
ioctl                         3            4.802
kill                          1            4.342
^C

The select and nanosleep calls are responsible for a lot of time, but remember
these are blocking calls. This output was taken from a mostly idle system. Note
the "unknown" entry -- syscall 321 is the bpf() syscall, which is not in the
table used by this tool (borrowed from strace sources).

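To make the "unknown" entry and the TIME column a bit more concrete, here is a
minimal Python sketch. It is not syscount's actual code: the small table below
is a hypothetical stand-in for the strace-derived table (x86_64 numbers
assumed), and syscall_name and mean_latency_us are illustrative helper names.

```python
# Hypothetical stand-in for the tool's syscall table (x86_64 numbers assumed).
SYSCALLS = {0: "read", 1: "write", 23: "select", 35: "nanosleep"}

def syscall_name(nr):
    """Return the syscall name, or an "[unknown: N]" label if nr is not in the table."""
    return SYSCALLS.get(nr, "[unknown: %d]" % nr)

def mean_latency_us(count, total_us):
    """Average time per call, derived from the COUNT and TIME (us) columns."""
    return total_us / count if count else 0.0

print(syscall_name(321))                          # bpf() is missing from the table
print("%.1f" % mean_latency_us(16, 3415860.022))  # select, from the output above
print("%.1f" % mean_latency_us(291, 12038.707))   # nanosleep, from the output above
```

Dividing TIME by COUNT this way shows select averaging roughly 213 ms per call
while nanosleep averages about 41 us, which fits the reminder that the large
totals come from a few long blocking calls.
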
Another direction would be to understand which processes are making a lot of
syscalls, thus responsible for a lot of activity. This is what the -P switch
does:

# syscount -P
Tracing syscalls, printing top 10... Ctrl+C to quit.
[09:58:13]
PID    COMM               COUNT
13820  vim                  548
30216  sshd                 149
29633  bash                  72
25188  screen                70
25776  mysqld                30
31285  python                10
529    systemd-udevd          9
1      systemd                8
494    systemd-journal        5
^C

This is again from a mostly idle system over an interval of a few seconds.

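The difference between the default view and -P comes down to which key the
counts are grouped by. The following user-space Python sketch only models that
grouping and the top-N ranking; the event tuples and the top_counts helper are
made up for illustration and say nothing about how the tool collects its data.

```python
from collections import Counter

# Hypothetical event stream: one (pid, comm, syscall) tuple per syscall seen.
events = [
    (13820, "vim", "read"),
    (13820, "vim", "write"),
    (30216, "sshd", "read"),
    (13820, "vim", "read"),
    (30216, "sshd", "select"),
]

def top_counts(events, by_process=False, top=10):
    """Count events by syscall name (default) or by (pid, comm), and return
    the `top` largest entries, highest count first."""
    if by_process:
        keys = ((pid, comm) for pid, comm, _ in events)
    else:
        keys = (name for _, _, name in events)
    return Counter(keys).most_common(top)

print(top_counts(events))                   # grouped by syscall
print(top_counts(events, by_process=True))  # grouped by process, like -P
```

Changing `top` plays the role of the -T switch: the grouping stays the same and
only the number of rows printed changes.
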
Sometimes you only care about failed syscalls -- these are the ones that
might be worth investigating with follow-up tools like opensnoop, execsnoop,
or trace. Use the -x switch for this; the following example also demonstrates
the -i switch, for printing at predefined intervals:

# syscount -x -i 5
Tracing failed syscalls, printing top 10... Ctrl+C to quit.
[09:44:16]
SYSCALL             COUNT
futex                  13
getxattr               10
stat                    8
open                    6
wait4                   3
access                  2
[unknown: 321]          1

[09:44:21]
SYSCALL             COUNT
futex                  12
getxattr               10
[unknown: 321]          2
wait4                   1
access                  1
pause                   1
^C

Similar to -x/--failures, sometimes you only care about certain syscall
errors like EPERM or ENOENT -- these are the ones that might be worth
investigating with follow-up tools like opensnoop, execsnoop, or
trace. Use the -e/--errno switch for this; the following example combines it
with the -i switch to print ENOENT failures every 5 seconds:

# syscount -e ENOENT -i 5
Tracing syscalls, printing top 10... Ctrl+C to quit.
[13:15:57]
SYSCALL                   COUNT
stat                       4669
open                       1951
access                      561
lstat                        62
openat                       42
readlink                      8
execve                        4
newfstatat                    1

[13:16:02]
SYSCALL                   COUNT
lstat                     18506
stat                      13087
open                       2907
access                      412
openat                       19
readlink                     12
execve                        7
connect                       6
unlink                        1
rmdir                         1
^C

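The -x and -e filters can be thought of as simple tests on the syscall return
value. Here is a minimal Python sketch of that idea, assuming the raw return
value of a failed syscall is the negated errno; parse_errno and keep are
hypothetical names, not functions from the tool.

```python
import errno

def parse_errno(arg):
    """Resolve an ERRNO argument given as a number ("2") or a name ("ENOENT")."""
    try:
        return int(arg)
    except ValueError:
        pass
    value = getattr(errno, arg.upper(), None)
    if not isinstance(value, int):
        raise ValueError("unknown errno: %r" % arg)
    return value

def keep(ret, failures=False, want_errno=None):
    """Decide whether a syscall return value passes the -x / -e style filters."""
    if want_errno is not None:
        return ret == -want_errno   # failed with exactly this error
    if failures:
        return ret < 0              # any failure (return < 0)
    return True                     # no filter: count everything

enoent = parse_errno("ENOENT")
print(keep(-enoent, want_errno=enoent))       # an ENOENT failure is counted
print(keep(-errno.EPERM, want_errno=enoent))  # a different error is skipped
print(keep(0, failures=True))                 # a successful call is skipped by -x
```
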
USAGE:
# syscount -h
usage: syscount.py [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T TOP] [-x]
                   [-e ERRNO] [-L] [-m] [-P] [-l]

Summarize syscall counts and latencies.

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace only this pid
  -i INTERVAL, --interval INTERVAL
                        print summary at this interval (seconds)
  -d DURATION, --duration DURATION
                        total duration of trace, in seconds
  -T TOP, --top TOP     print only the top syscalls by count or latency
  -x, --failures        trace only failed syscalls (return < 0)
  -e ERRNO, --errno ERRNO
                        trace only syscalls that return this error (numeric or
                        EPERM, etc.)
  -L, --latency         collect syscall latency
  -m, --milliseconds    display latency in milliseconds (default:
                        microseconds)
  -P, --process         count by process and not by syscall
  -l, --list            print list of recognized syscalls and exit

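For readers who want to see how the switches above combine, here is a small
argparse sketch that mirrors the help text. It is an illustration only, not the
tool's source: the argument types and the default of 10 for -T (taken from the
"printing top 10" banner) are assumptions.

```python
import argparse

def build_parser():
    """Build a parser that mirrors the options listed in the help text above."""
    p = argparse.ArgumentParser(
        prog="syscount.py",
        description="Summarize syscall counts and latencies.")
    p.add_argument("-p", "--pid", type=int, help="trace only this pid")
    p.add_argument("-i", "--interval", type=int,
                   help="print summary at this interval (seconds)")
    p.add_argument("-d", "--duration", type=int,
                   help="total duration of trace, in seconds")
    p.add_argument("-T", "--top", type=int, default=10,  # default is assumed
                   help="print only the top syscalls by count or latency")
    p.add_argument("-x", "--failures", action="store_true",
                   help="trace only failed syscalls (return < 0)")
    p.add_argument("-e", "--errno",
                   help="trace only syscalls that return this error")
    p.add_argument("-L", "--latency", action="store_true",
                   help="collect syscall latency")
    p.add_argument("-m", "--milliseconds", action="store_true",
                   help="display latency in milliseconds")
    p.add_argument("-P", "--process", action="store_true",
                   help="count by process and not by syscall")
    p.add_argument("-l", "--list", action="store_true",
                   help="print list of recognized syscalls and exit")
    return p

# Same switches as the "syscount -e ENOENT -i 5" example.
args = build_parser().parse_args(["-e", "ENOENT", "-i", "5"])
print(args.errno, args.interval, args.top)
```
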