Home | History | Annotate | Download | only in tools
      1 Demonstrations of runqlen, the Linux eBPF/bcc version.
      2 
      3 
      4 This program summarizes scheduler queue length as a histogram, and can also
      5 show run queue occupancy. It works by sampling the run queue length on all
      6 CPUs at 99 Hertz.
      7 
      8 As an example, here is an idle system:
      9 
     10 # ./runqlen.py
     11 Sampling run queue length... Hit Ctrl-C to end.
     12 ^C
     13      runqlen       : count     distribution
     14         0          : 1776     |****************************************|
     15 
     16 This shows a zero run queue length each time it was sampled.
     17 
     18 And now a heavily loaded system:
     19 
     20 # ./runqlen.py
     21 Sampling run queue length... Hit Ctrl-C to end.
     22 ^C
     23      runqlen       : count     distribution
     24         0          : 1068     |****************************************|
     25         1          : 642      |************************                |
     26         2          : 369      |*************                           |
     27         3          : 183      |******                                  |
     28         4          : 104      |***                                     |
     29         5          : 42       |*                                       |
     30         6          : 13       |                                        |
     31         7          : 2        |                                        |
     32         8          : 1        |                                        |
     33 
     34 Now there is often threads queued, with one sample reaching a queue length
     35 of 8. This will cause run queue latency, which can be measured by the bcc
     36 runqlat tool.
     37 
     38 
     39 Here's an example of an issue that runqlen can indentify. Starting with the
     40 system-wide summary:
     41 
     42 # ./runqlen.py 
     43 Sampling run queue length... Hit Ctrl-C to end.
     44 ^C
     45      runqlen       : count     distribution
     46         0          : 1209     |****************************************|
     47         1          : 372      |************                            |
     48         2          : 73       |**                                      |
     49         3          : 3        |                                        |
     50         4          : 1        |                                        |
     51         5          : 0        |                                        |
     52         6          : 0        |                                        |
     53         7          : 237      |*******                                 |
     54 
     55 This shows there is often a run queue length of 7. Now using the -C option to
     56 see per-CPU histograms:
     57 
     58 # ./runqlen.py -C
     59 Sampling run queue length... Hit Ctrl-C to end.
     60 ^C
     61 
     62 cpu = 0
     63      runqlen       : count     distribution
     64         0          : 257      |****************************************|
     65         1          : 64       |*********                               |
     66         2          : 5        |                                        |
     67         3          : 0        |                                        |
     68         4          : 0        |                                        |
     69         5          : 0        |                                        |
     70         6          : 1        |                                        |
     71 
     72 cpu = 1
     73      runqlen       : count     distribution
     74         0          : 226      |****************************************|
     75         1          : 90       |***************                         |
     76         2          : 11       |*                                       |
     77 
     78 cpu = 2
     79      runqlen       : count     distribution
     80         0          : 264      |****************************************|
     81         1          : 52       |*******                                 |
     82         2          : 8        |*                                       |
     83         3          : 1        |                                        |
     84         4          : 0        |                                        |
     85         5          : 0        |                                        |
     86         6          : 1        |                                        |
     87         7          : 0        |                                        |
     88         8          : 1        |                                        |
     89 
     90 cpu = 3
     91      runqlen       : count     distribution
     92         0          : 0        |                                        |
     93         1          : 0        |                                        |
     94         2          : 0        |                                        |
     95         3          : 0        |                                        |
     96         4          : 0        |                                        |
     97         5          : 0        |                                        |
     98         6          : 0        |                                        |
     99         7          : 327      |****************************************|
    100 
    101 cpu = 4
    102      runqlen       : count     distribution
    103         0          : 255      |****************************************|
    104         1          : 63       |*********                               |
    105         2          : 9        |*                                       |
    106 
    107 cpu = 5
    108      runqlen       : count     distribution
    109         0          : 244      |****************************************|
    110         1          : 78       |************                            |
    111         2          : 3        |                                        |
    112         3          : 2        |                                        |
    113 
    114 cpu = 6
    115      runqlen       : count     distribution
    116         0          : 253      |****************************************|
    117         1          : 66       |**********                              |
    118         2          : 6        |                                        |
    119         3          : 1        |                                        |
    120         4          : 1        |                                        |
    121 
    122 cpu = 7
    123      runqlen       : count     distribution
    124         0          : 243      |****************************************|
    125         1          : 74       |************                            |
    126         2          : 6        |                                        |
    127         3          : 1        |                                        |
    128         4          : 0        |                                        |
    129         5          : 1        |                                        |
    130         6          : 2        |                                        |
    131 
    132 The run queue length of 7 is isolated to CPU 3. It was caused by CPU binding
    133 (taskset). This can sometimes happen by applications that try to auto-bind
    134 to CPUs, leaving other CPUs idle while work is queued.
    135 
    136 
    137 runqlat accepts an interval and a count. For example, with -T for timestamps:
    138 
    139 # ./runqlen.py -T 1 5
    140 Sampling run queue length... Hit Ctrl-C to end.
    141 
    142 19:51:34
    143      runqlen       : count     distribution
    144         0          : 635      |****************************************|
    145         1          : 142      |********                                |
    146         2          : 13       |                                        |
    147         3          : 0        |                                        |
    148         4          : 1        |                                        |
    149 
    150 19:51:35
    151      runqlen       : count     distribution
    152         0          : 640      |****************************************|
    153         1          : 136      |********                                |
    154         2          : 13       |                                        |
    155         3          : 1        |                                        |
    156         4          : 0        |                                        |
    157         5          : 0        |                                        |
    158         6          : 0        |                                        |
    159         7          : 0        |                                        |
    160         8          : 0        |                                        |
    161         9          : 0        |                                        |
    162         10         : 1        |                                        |
    163 
    164 19:51:36
    165      runqlen       : count     distribution
    166         0          : 603      |****************************************|
    167         1          : 170      |***********                             |
    168         2          : 16       |*                                       |
    169         3          : 1        |                                        |
    170         4          : 0        |                                        |
    171         5          : 0        |                                        |
    172         6          : 0        |                                        |
    173         7          : 0        |                                        |
    174         8          : 0        |                                        |
    175         9          : 1        |                                        |
    176 
    177 19:51:37
    178      runqlen       : count     distribution
    179         0          : 617      |****************************************|
    180         1          : 154      |*********                               |
    181         2          : 20       |*                                       |
    182         3          : 0        |                                        |
    183         4          : 0        |                                        |
    184         5          : 0        |                                        |
    185         6          : 0        |                                        |
    186         7          : 0        |                                        |
    187         8          : 0        |                                        |
    188         9          : 0        |                                        |
    189         10         : 0        |                                        |
    190         11         : 1        |                                        |
    191 
    192 19:51:38
    193      runqlen       : count     distribution
    194         0          : 603      |****************************************|
    195         1          : 161      |**********                              |
    196         2          : 24       |*                                       |
    197         3          : 4        |                                        |
    198 
    199 The spikes in run queue length of 11 are likely threads waking up at the same
    200 time (a thundering herd), and then are scheduled and complete their execution
    201 quickly.
    202 
    203 
    204 The -O option prints run queue occupancy: the percentage of time that there
    205 was work queued waiting its turn. Eg:
    206 
    207 # ./runqlen.py -OT 1 
    208 Sampling run queue length... Hit Ctrl-C to end.
    209 
    210 19:54:53
    211 runqocc: 41.09%
    212 
    213 19:54:54
    214 runqocc: 41.85%
    215 
    216 19:54:55
    217 runqocc: 41.47%
    218 
    219 19:54:56
    220 runqocc: 42.35%
    221 
    222 19:54:57
    223 runqocc: 40.83%
    224 [...]
    225 
    226 This can also be examined per-CPU:
    227 
    228 # ./runqlen.py -COT 1 
    229 Sampling run queue length... Hit Ctrl-C to end.
    230 
    231 19:55:03
    232 runqocc, CPU 0    32.32%
    233 runqocc, CPU 1    26.26%
    234 runqocc, CPU 2    38.38%
    235 runqocc, CPU 3   100.00%
    236 runqocc, CPU 4    26.26%
    237 runqocc, CPU 5    32.32%
    238 runqocc, CPU 6    39.39%
    239 runqocc, CPU 7    46.46%
    240 
    241 19:55:04
    242 runqocc, CPU 0    35.00%
    243 runqocc, CPU 1    32.32%
    244 runqocc, CPU 2    37.00%
    245 runqocc, CPU 3   100.00%
    246 runqocc, CPU 4    43.43%
    247 runqocc, CPU 5    31.31%
    248 runqocc, CPU 6    28.00%
    249 runqocc, CPU 7    31.31%
    250 
    251 19:55:05
    252 runqocc, CPU 0    43.43%
    253 runqocc, CPU 1    32.32%
    254 runqocc, CPU 2    45.45%
    255 runqocc, CPU 3   100.00%
    256 runqocc, CPU 4    29.29%
    257 runqocc, CPU 5    36.36%
    258 runqocc, CPU 6    36.36%
    259 runqocc, CPU 7    30.30%
    260 
    261 19:55:06
    262 runqocc, CPU 0    40.00%
    263 runqocc, CPU 1    38.00%
    264 runqocc, CPU 2    31.31%
    265 runqocc, CPU 3   100.00%
    266 runqocc, CPU 4    31.31%
    267 runqocc, CPU 5    28.28%
    268 runqocc, CPU 6    31.00%
    269 runqocc, CPU 7    29.29%
    270 [...]
    271 
    272 
    273 USAGE message:
    274 
    275 # ./runqlen -h
    276 usage: runqlen [-h] [-T] [-O] [-C] [interval] [count]
    277 
    278 Summarize scheduler run queue length as a histogram
    279 
    280 positional arguments:
    281   interval         output interval, in seconds
    282   count            number of outputs
    283 
    284 optional arguments:
    285   -h, --help       show this help message and exit
    286   -T, --timestamp  include timestamp on output
    287   -O, --runqocc    report run queue occupancy
    288   -C, --cpus       print output for each CPU separately
    289 
    290 examples:
    291     ./runqlen            # summarize run queue length as a histogram
    292     ./runqlen 1 10       # print 1 second summaries, 10 times
    293     ./runqlen -T 1       # 1s summaries and timestamps
    294     ./runqlen -O         # report run queue occupancy
    295     ./runqlen -C         # show each CPU separately
    296