1 Demonstrations of runqlat, the Linux eBPF/bcc version. 2 3 4 This program summarizes scheduler run queue latency as a histogram, showing 5 how long tasks spent waiting their turn to run on-CPU. 6 7 Here is a heavily loaded system: 8 9 # ./runqlat 10 Tracing run queue latency... Hit Ctrl-C to end. 11 ^C 12 usecs : count distribution 13 0 -> 1 : 233 |*********** | 14 2 -> 3 : 742 |************************************ | 15 4 -> 7 : 203 |********** | 16 8 -> 15 : 173 |******** | 17 16 -> 31 : 24 |* | 18 32 -> 63 : 0 | | 19 64 -> 127 : 30 |* | 20 128 -> 255 : 6 | | 21 256 -> 511 : 3 | | 22 512 -> 1023 : 5 | | 23 1024 -> 2047 : 27 |* | 24 2048 -> 4095 : 30 |* | 25 4096 -> 8191 : 20 | | 26 8192 -> 16383 : 29 |* | 27 16384 -> 32767 : 809 |****************************************| 28 32768 -> 65535 : 64 |*** | 29 30 The distribution is bimodal, with one mode between 0 and 15 microseconds, 31 and another between 16 and 65 milliseconds. These modes are visible as the 32 spikes in the ASCII distribution (which is merely a visual representation 33 of the "count" column). As an example of reading one line: 809 events fell 34 into the 16384 to 32767 microsecond range (16 to 32 ms) while tracing. 35 36 I would expect the two modes to be due the workload: 16 hot CPU-bound threads, 37 and many other mostly idle threads doing occasional work. I suspect the mostly 38 idle threads will run with a higher priority when they wake up, and are 39 the reason for the low latency mode. The high latency mode will be the 40 CPU-bound threads. More analysis with this and other tools can confirm. 41 42 43 A -m option can be used to show milliseconds instead, as well as an interval 44 and a count. For example, showing three x five second summary in milliseconds: 45 46 # ./runqlat -m 5 3 47 Tracing run queue latency... Hit Ctrl-C to end. 48 49 msecs : count distribution 50 0 -> 1 : 3818 |****************************************| 51 2 -> 3 : 39 | | 52 4 -> 7 : 39 | | 53 8 -> 15 : 62 | | 54 16 -> 31 : 2214 |*********************** | 55 32 -> 63 : 226 |** | 56 57 msecs : count distribution 58 0 -> 1 : 3775 |****************************************| 59 2 -> 3 : 52 | | 60 4 -> 7 : 37 | | 61 8 -> 15 : 65 | | 62 16 -> 31 : 2230 |*********************** | 63 32 -> 63 : 212 |** | 64 65 msecs : count distribution 66 0 -> 1 : 3816 |****************************************| 67 2 -> 3 : 49 | | 68 4 -> 7 : 40 | | 69 8 -> 15 : 53 | | 70 16 -> 31 : 2228 |*********************** | 71 32 -> 63 : 221 |** | 72 73 This shows a similar distribution across the three summaries. 74 75 76 A -p option can be used to show one PID only, which is filtered in kernel for 77 efficiency. For example, PID 4505, and one second summaries: 78 79 # ./runqlat -mp 4505 1 80 Tracing run queue latency... Hit Ctrl-C to end. 81 82 msecs : count distribution 83 0 -> 1 : 1 |* | 84 2 -> 3 : 2 |*** | 85 4 -> 7 : 1 |* | 86 8 -> 15 : 0 | | 87 16 -> 31 : 25 |****************************************| 88 32 -> 63 : 3 |**** | 89 90 msecs : count distribution 91 0 -> 1 : 0 | | 92 2 -> 3 : 2 |** | 93 4 -> 7 : 0 | | 94 8 -> 15 : 1 |* | 95 16 -> 31 : 30 |****************************************| 96 32 -> 63 : 1 |* | 97 98 msecs : count distribution 99 0 -> 1 : 0 | | 100 2 -> 3 : 0 | | 101 4 -> 7 : 0 | | 102 8 -> 15 : 0 | | 103 16 -> 31 : 28 |****************************************| 104 32 -> 63 : 2 |** | 105 106 msecs : count distribution 107 0 -> 1 : 1 |* | 108 2 -> 3 : 0 | | 109 4 -> 7 : 0 | | 110 8 -> 15 : 0 | | 111 16 -> 31 : 27 |****************************************| 112 32 -> 63 : 4 |***** | 113 [...] 114 115 For comparison, here is pidstat(1) for that process: 116 117 # pidstat -p 4505 1 118 Linux 4.4.0-virtual (bgregg-xxxxxxxx) 02/08/2016 _x86_64_ (8 CPU) 119 120 08:56:11 AM UID PID %usr %system %guest %CPU CPU Command 121 08:56:12 AM 0 4505 9.00 3.00 0.00 12.00 0 bash 122 08:56:13 AM 0 4505 7.00 5.00 0.00 12.00 0 bash 123 08:56:14 AM 0 4505 10.00 2.00 0.00 12.00 0 bash 124 08:56:15 AM 0 4505 11.00 2.00 0.00 13.00 0 bash 125 08:56:16 AM 0 4505 9.00 3.00 0.00 12.00 0 bash 126 [...] 127 128 This is a synthetic workload that is CPU bound. It's only spending 12% on-CPU 129 each second because of high CPU demand on this server: the remaining time 130 is spent waiting on a run queue, as visualized by runqlat. 131 132 133 Here is the same system, but when it is CPU idle: 134 135 # ./runqlat 5 1 136 Tracing run queue latency... Hit Ctrl-C to end. 137 138 usecs : count distribution 139 0 -> 1 : 2250 |******************************** | 140 2 -> 3 : 2340 |********************************** | 141 4 -> 7 : 2746 |****************************************| 142 8 -> 15 : 418 |****** | 143 16 -> 31 : 93 |* | 144 32 -> 63 : 28 | | 145 64 -> 127 : 119 |* | 146 128 -> 255 : 9 | | 147 256 -> 511 : 4 | | 148 512 -> 1023 : 20 | | 149 1024 -> 2047 : 22 | | 150 2048 -> 4095 : 5 | | 151 4096 -> 8191 : 2 | | 152 153 Back to a microsecond scale, this time there is little run queue latency past 1 154 millisecond, as would be expected. 155 156 157 Now 16 threads are performing heavy disk I/O: 158 159 # ./runqlat 5 1 160 Tracing run queue latency... Hit Ctrl-C to end. 161 162 usecs : count distribution 163 0 -> 1 : 204 | | 164 2 -> 3 : 944 |* | 165 4 -> 7 : 16315 |********************* | 166 8 -> 15 : 29897 |****************************************| 167 16 -> 31 : 1044 |* | 168 32 -> 63 : 23 | | 169 64 -> 127 : 128 | | 170 128 -> 255 : 24 | | 171 256 -> 511 : 5 | | 172 512 -> 1023 : 13 | | 173 1024 -> 2047 : 15 | | 174 2048 -> 4095 : 13 | | 175 4096 -> 8191 : 10 | | 176 177 The distribution hasn't changed too much. While the disks are 100% busy, there 178 is still plenty of CPU headroom, and threads still don't spend much time 179 waiting their turn. 180 181 182 A -P option will print a distribution for each PID: 183 184 # ./runqlat -P 185 Tracing run queue latency... Hit Ctrl-C to end. 186 ^C 187 188 pid = 0 189 usecs : count distribution 190 0 -> 1 : 351 |******************************** | 191 2 -> 3 : 96 |******** | 192 4 -> 7 : 437 |****************************************| 193 8 -> 15 : 12 |* | 194 16 -> 31 : 10 | | 195 32 -> 63 : 0 | | 196 64 -> 127 : 16 |* | 197 128 -> 255 : 0 | | 198 256 -> 511 : 0 | | 199 512 -> 1023 : 0 | | 200 1024 -> 2047 : 0 | | 201 2048 -> 4095 : 0 | | 202 4096 -> 8191 : 0 | | 203 8192 -> 16383 : 1 | | 204 205 pid = 12929 206 usecs : count distribution 207 0 -> 1 : 1 |****************************************| 208 2 -> 3 : 0 | | 209 4 -> 7 : 1 |****************************************| 210 211 pid = 12930 212 usecs : count distribution 213 0 -> 1 : 0 | | 214 2 -> 3 : 0 | | 215 4 -> 7 : 0 | | 216 8 -> 15 : 0 | | 217 16 -> 31 : 1 |****************************************| 218 32 -> 63 : 0 | | 219 64 -> 127 : 1 |****************************************| 220 221 pid = 12931 222 usecs : count distribution 223 0 -> 1 : 0 | | 224 2 -> 3 : 0 | | 225 4 -> 7 : 1 |******************** | 226 8 -> 15 : 0 | | 227 16 -> 31 : 0 | | 228 32 -> 63 : 0 | | 229 64 -> 127 : 0 | | 230 128 -> 255 : 0 | | 231 256 -> 511 : 0 | | 232 512 -> 1023 : 2 |****************************************| 233 234 pid = 12932 235 usecs : count distribution 236 0 -> 1 : 0 | | 237 2 -> 3 : 0 | | 238 4 -> 7 : 0 | | 239 8 -> 15 : 0 | | 240 16 -> 31 : 0 | | 241 32 -> 63 : 0 | | 242 64 -> 127 : 0 | | 243 128 -> 255 : 1 |****************************************| 244 256 -> 511 : 0 | | 245 512 -> 1023 : 1 |****************************************| 246 247 pid = 7 248 usecs : count distribution 249 0 -> 1 : 0 | | 250 2 -> 3 : 426 |************************************* | 251 4 -> 7 : 457 |****************************************| 252 8 -> 15 : 16 |* | 253 254 pid = 9 255 usecs : count distribution 256 0 -> 1 : 0 | | 257 2 -> 3 : 0 | | 258 4 -> 7 : 425 |****************************************| 259 8 -> 15 : 16 |* | 260 261 pid = 11 262 usecs : count distribution 263 0 -> 1 : 0 | | 264 2 -> 3 : 10 |****************************************| 265 266 pid = 14 267 usecs : count distribution 268 0 -> 1 : 0 | | 269 2 -> 3 : 8 |****************************************| 270 4 -> 7 : 2 |********** | 271 272 pid = 18 273 usecs : count distribution 274 0 -> 1 : 414 |****************************************| 275 2 -> 3 : 0 | | 276 4 -> 7 : 20 |* | 277 8 -> 15 : 8 | | 278 279 pid = 12928 280 usecs : count distribution 281 0 -> 1 : 0 | | 282 2 -> 3 : 0 | | 283 4 -> 7 : 1 |****************************************| 284 8 -> 15 : 0 | | 285 16 -> 31 : 0 | | 286 32 -> 63 : 0 | | 287 64 -> 127 : 1 |****************************************| 288 289 pid = 1867 290 usecs : count distribution 291 0 -> 1 : 0 | | 292 2 -> 3 : 0 | | 293 4 -> 7 : 0 | | 294 8 -> 15 : 15 |****************************************| 295 16 -> 31 : 1 |** | 296 32 -> 63 : 0 | | 297 64 -> 127 : 0 | | 298 128 -> 255 : 4 |********** | 299 300 pid = 1871 301 usecs : count distribution 302 0 -> 1 : 0 | | 303 2 -> 3 : 0 | | 304 4 -> 7 : 0 | | 305 8 -> 15 : 2 |****************************************| 306 16 -> 31 : 0 | | 307 32 -> 63 : 0 | | 308 64 -> 127 : 0 | | 309 128 -> 255 : 0 | | 310 256 -> 511 : 0 | | 311 512 -> 1023 : 1 |******************** | 312 313 pid = 1876 314 usecs : count distribution 315 0 -> 1 : 0 | | 316 2 -> 3 : 0 | | 317 4 -> 7 : 0 | | 318 8 -> 15 : 1 |****************************************| 319 16 -> 31 : 0 | | 320 32 -> 63 : 0 | | 321 64 -> 127 : 0 | | 322 128 -> 255 : 0 | | 323 256 -> 511 : 1 |****************************************| 324 325 pid = 1878 326 usecs : count distribution 327 0 -> 1 : 0 | | 328 2 -> 3 : 0 | | 329 4 -> 7 : 0 | | 330 8 -> 15 : 0 | | 331 16 -> 31 : 3 |****************************************| 332 333 pid = 1880 334 usecs : count distribution 335 0 -> 1 : 0 | | 336 2 -> 3 : 0 | | 337 4 -> 7 : 0 | | 338 8 -> 15 : 3 |****************************************| 339 340 pid = 9307 341 usecs : count distribution 342 0 -> 1 : 0 | | 343 2 -> 3 : 0 | | 344 4 -> 7 : 0 | | 345 8 -> 15 : 1 |****************************************| 346 347 pid = 1886 348 usecs : count distribution 349 0 -> 1 : 0 | | 350 2 -> 3 : 0 | | 351 4 -> 7 : 1 |******************** | 352 8 -> 15 : 2 |****************************************| 353 354 pid = 1888 355 usecs : count distribution 356 0 -> 1 : 0 | | 357 2 -> 3 : 0 | | 358 4 -> 7 : 0 | | 359 8 -> 15 : 3 |****************************************| 360 361 pid = 3297 362 usecs : count distribution 363 0 -> 1 : 0 | | 364 2 -> 3 : 0 | | 365 4 -> 7 : 0 | | 366 8 -> 15 : 1 |****************************************| 367 368 pid = 1892 369 usecs : count distribution 370 0 -> 1 : 0 | | 371 2 -> 3 : 0 | | 372 4 -> 7 : 0 | | 373 8 -> 15 : 0 | | 374 16 -> 31 : 1 |******************** | 375 32 -> 63 : 0 | | 376 64 -> 127 : 0 | | 377 128 -> 255 : 0 | | 378 256 -> 511 : 0 | | 379 512 -> 1023 : 2 |****************************************| 380 381 pid = 7024 382 usecs : count distribution 383 0 -> 1 : 0 | | 384 2 -> 3 : 0 | | 385 4 -> 7 : 0 | | 386 8 -> 15 : 4 |****************************************| 387 388 pid = 16468 389 usecs : count distribution 390 0 -> 1 : 0 | | 391 2 -> 3 : 0 | | 392 4 -> 7 : 0 | | 393 8 -> 15 : 3 |****************************************| 394 395 pid = 12922 396 usecs : count distribution 397 0 -> 1 : 1 |****************************************| 398 2 -> 3 : 0 | | 399 4 -> 7 : 0 | | 400 8 -> 15 : 1 |****************************************| 401 16 -> 31 : 1 |****************************************| 402 32 -> 63 : 0 | | 403 64 -> 127 : 1 |****************************************| 404 405 pid = 12923 406 usecs : count distribution 407 0 -> 1 : 0 | | 408 2 -> 3 : 0 | | 409 4 -> 7 : 1 |******************** | 410 8 -> 15 : 0 | | 411 16 -> 31 : 0 | | 412 32 -> 63 : 0 | | 413 64 -> 127 : 2 |****************************************| 414 128 -> 255 : 0 | | 415 256 -> 511 : 0 | | 416 512 -> 1023 : 1 |******************** | 417 1024 -> 2047 : 1 |******************** | 418 419 pid = 12924 420 usecs : count distribution 421 0 -> 1 : 0 | | 422 2 -> 3 : 0 | | 423 4 -> 7 : 2 |******************** | 424 8 -> 15 : 4 |****************************************| 425 16 -> 31 : 1 |********** | 426 32 -> 63 : 0 | | 427 64 -> 127 : 0 | | 428 128 -> 255 : 0 | | 429 256 -> 511 : 0 | | 430 512 -> 1023 : 0 | | 431 1024 -> 2047 : 1 |********** | 432 433 pid = 12925 434 usecs : count distribution 435 0 -> 1 : 0 | | 436 2 -> 3 : 0 | | 437 4 -> 7 : 0 | | 438 8 -> 15 : 0 | | 439 16 -> 31 : 0 | | 440 32 -> 63 : 0 | | 441 64 -> 127 : 1 |****************************************| 442 443 pid = 12926 444 usecs : count distribution 445 0 -> 1 : 0 | | 446 2 -> 3 : 1 |****************************************| 447 4 -> 7 : 0 | | 448 8 -> 15 : 1 |****************************************| 449 16 -> 31 : 0 | | 450 32 -> 63 : 0 | | 451 64 -> 127 : 0 | | 452 128 -> 255 : 0 | | 453 256 -> 511 : 0 | | 454 512 -> 1023 : 1 |****************************************| 455 456 pid = 12927 457 usecs : count distribution 458 0 -> 1 : 1 |****************************************| 459 2 -> 3 : 0 | | 460 4 -> 7 : 1 |****************************************| 461 462 463 A -L option will print a distribution for each TID: 464 465 # ./runqlat -L 466 Tracing run queue latency... Hit Ctrl-C to end. 467 ^C 468 469 tid = 0 470 usecs : count distribution 471 0 -> 1 : 593 |**************************** | 472 2 -> 3 : 829 |****************************************| 473 4 -> 7 : 300 |************** | 474 8 -> 15 : 321 |*************** | 475 16 -> 31 : 132 |****** | 476 32 -> 63 : 58 |** | 477 64 -> 127 : 0 | | 478 128 -> 255 : 0 | | 479 256 -> 511 : 13 | | 480 481 tid = 7 482 usecs : count distribution 483 0 -> 1 : 8 |******** | 484 2 -> 3 : 19 |******************** | 485 4 -> 7 : 37 |****************************************| 486 [...] 487 488 489 And a --pidnss option (short for PID namespaces) will print for each PID 490 namespace, for analyzing container performance: 491 492 # ./runqlat --pidnss -m 493 Tracing run queue latency... Hit Ctrl-C to end. 494 ^C 495 496 pidns = 4026532870 497 msecs : count distribution 498 0 -> 1 : 40 |****************************************| 499 2 -> 3 : 1 |* | 500 4 -> 7 : 0 | | 501 8 -> 15 : 0 | | 502 16 -> 31 : 0 | | 503 32 -> 63 : 2 |** | 504 64 -> 127 : 5 |***** | 505 506 pidns = 4026532809 507 msecs : count distribution 508 0 -> 1 : 67 |****************************************| 509 510 pidns = 4026532748 511 msecs : count distribution 512 0 -> 1 : 63 |****************************************| 513 514 pidns = 4026532687 515 msecs : count distribution 516 0 -> 1 : 7 |****************************************| 517 518 pidns = 4026532626 519 msecs : count distribution 520 0 -> 1 : 45 |****************************************| 521 2 -> 3 : 0 | | 522 4 -> 7 : 0 | | 523 8 -> 15 : 0 | | 524 16 -> 31 : 0 | | 525 32 -> 63 : 0 | | 526 64 -> 127 : 3 |** | 527 528 pidns = 4026531836 529 msecs : count distribution 530 0 -> 1 : 314 |****************************************| 531 2 -> 3 : 1 | | 532 4 -> 7 : 11 |* | 533 8 -> 15 : 28 |*** | 534 16 -> 31 : 137 |***************** | 535 32 -> 63 : 86 |********** | 536 64 -> 127 : 1 | | 537 538 pidns = 4026532382 539 msecs : count distribution 540 0 -> 1 : 285 |****************************************| 541 2 -> 3 : 5 | | 542 4 -> 7 : 16 |** | 543 8 -> 15 : 9 |* | 544 16 -> 31 : 69 |********* | 545 32 -> 63 : 25 |*** | 546 547 Many of these distributions have two modes: the second, in this case, is 548 caused by capping CPU usage via CPU shares. 549 550 551 USAGE message: 552 553 # ./runqlat -h 554 usage: runqlat.py [-h] [-T] [-m] [-P] [--pidnss] [-L] [-p PID] 555 [interval] [count] 556 557 Summarize run queue (scheduler) latency as a histogram 558 559 positional arguments: 560 interval output interval, in seconds 561 count number of outputs 562 563 optional arguments: 564 -h, --help show this help message and exit 565 -T, --timestamp include timestamp on output 566 -m, --milliseconds millisecond histogram 567 -P, --pids print a histogram per process ID 568 --pidnss print a histogram per PID namespace 569 -L, --tids print a histogram per thread ID 570 -p PID, --pid PID trace this PID only 571 572 examples: 573 ./runqlat # summarize run queue latency as a histogram 574 ./runqlat 1 10 # print 1 second summaries, 10 times 575 ./runqlat -mT 1 # 1s summaries, milliseconds, and timestamps 576 ./runqlat -P # show each PID separately 577 ./runqlat -p 185 # trace PID 185 only 578