Demonstrations of tcptop, the Linux eBPF/bcc version.


tcptop summarizes throughput by host and port. Eg:

# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
19:46:24 loadavg: 1.86 2.67 2.91 3/362 16681

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
16648  16648        100.66.3.172:22       100.127.69.165:6684        1      0
16647  sshd         100.66.3.172:22       100.127.69.165:6684        0   2149
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0
14458  sshd         100.66.3.172:22       100.127.69.165:7165        0      0

PID    COMM         LADDR6                           RADDR6                            RX_KB  TX_KB
16681  sshd         fe80::8a3:9dff:fed5:6b19:22      fe80::8a3:9dff:fed5:6b19:16606        1      1
16679  ssh          fe80::8a3:9dff:fed5:6b19:16606   fe80::8a3:9dff:fed5:6b19:22           1      1
16680  sshd         fe80::8a3:9dff:fed5:6b19:22      fe80::8a3:9dff:fed5:6b19:16606        0      0

This example output shows two listings of TCP connections, for IPv4 and IPv6.
If there is only traffic for one of these, then only one group is shown.

The output in each listing is sorted by total throughput (send then receive),
and when printed it is rounded down (floor) to whole Kbytes. The example output
shows PID 16647, sshd, transmitted 2149 Kbytes during the tracing interval.
The remaining sshd sessions (PIDs 14374 and 14458) had such low throughput
that they rounded down to zero.

All TCP sessions, including over loopback, are included.

The session with the process name (COMM) of 16648 is really a short-lived
process with PID 16648 whose process name we didn't catch when printing the
output. If this behavior is a serious issue for you, you can modify the
tool's code to include bpf_get_current_comm() in the key structs, so that it's
fetched during the event and will always be seen. I did it this way to start
with, but it was measurably increasing the overhead of this tool, so I switched
to the asynchronous model.

The overhead is relative to TCP event rate (the rate of tcp_sendmsg() and
tcp_recvmsg() or tcp_cleanup_rbuf()).
Due to buffering, this should be lower than the packet rate. You can measure
the rate of these functions using funccount. Testing on some sample
production servers found total rates of 4k to 15k per second, and the CPU
overhead at these rates ranged from 0.5% to 2.0% of one CPU. Your workloads
may have higher rates, and therefore higher overhead, or lower rates.


I much prefer not clearing the screen, so that historic output is in the
scroll-back buffer, and patterns or intermittent issues can be better seen.
You can do this with -C:

# tcptop -C
Tracing... Output every 1 secs. Hit Ctrl-C to end

20:27:12 loadavg: 0.08 0.02 0.17 2/367 17342

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17287  17287        100.66.3.172:22       100.127.69.165:57585       3      1
17286  sshd         100.66.3.172:22       100.127.69.165:57585       0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:13 loadavg: 0.08 0.02 0.17 1/367 17342

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17286  sshd         100.66.3.172:22       100.127.69.165:57585       1   7761
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:14 loadavg: 0.08 0.02 0.17 2/365 17347

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17286  17286        100.66.3.172:22       100.127.69.165:57585       1   2501
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:15 loadavg: 0.07 0.02 0.17 2/367 17403

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17349  17349        100.66.3.172:22       100.127.69.165:10161       3      1
17348  sshd         100.66.3.172:22       100.127.69.165:10161       0      1
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:16 loadavg: 0.07 0.02 0.17 1/367 17403

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17348  sshd         100.66.3.172:22       100.127.69.165:10161    3333      0
14374  sshd         100.66.3.172:22       100.127.69.165:25219       0      0

20:27:17 loadavg: 0.07 0.02 0.17 2/366 17409

PID    COMM         LADDR                 RADDR                  RX_KB  TX_KB
17348  17348        100.66.3.172:22       100.127.69.165:10161    6909      2

You can disable the loadavg summary line with -S if needed.
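Rows where COMM shows a number (16648, 17286, 17287, 17348, 17349 in the examples) follow from the asynchronous model described earlier. The Python sketch below shows how one listing could be assembled from per-connection byte counts: rows sorted by total throughput, byte counts floored to whole Kbytes, and the process name looked up only at print time. It is a sketch under stated assumptions, not the tool's source. It assumes that "total" means sent plus received bytes, that the name is read from /proc/PID/comm when printing, and that the PID is printed in its place when that lookup fails. The names build_listing(), pid_to_comm() and print_listing(), the (pid, laddr, raddr) key layout, and the sample byte counts are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for this sketch: (pid, laddr, raddr) -> (tx_bytes, rx_bytes)
Key = Tuple[int, str, str]
Row = Tuple[int, str, str, str, int, int]


def pid_to_comm(pid: int) -> str:
    """Look up the process name at print time (assumed via /proc/PID/comm).
    If the process has already exited, or /proc is unavailable, fall back
    to printing the PID itself in the COMM column."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().rstrip("\n")
    except OSError:
        return str(pid)


def build_listing(throughput: Dict[Key, Tuple[int, int]]) -> List[Row]:
    """Return (PID, COMM, LADDR, RADDR, RX_KB, TX_KB) rows, busiest first."""
    rows = []
    # Assumption: total throughput = sent + received bytes; sort descending.
    for (pid, laddr, raddr), (tx, rx) in sorted(
        throughput.items(), key=lambda kv: sum(kv[1]), reverse=True
    ):
        # Integer division floors to whole Kbytes, so light sessions print 0.
        rows.append((pid, pid_to_comm(pid), laddr, raddr, rx // 1024, tx // 1024))
    return rows


def print_listing(rows: List[Row]) -> None:
    """Print rows in a column layout chosen to match the sample output."""
    print("%-6s %-12s %-21s %-21s %6s %6s"
          % ("PID", "COMM", "LADDR", "RADDR", "RX_KB", "TX_KB"))
    for pid, comm, laddr, raddr, rx_kb, tx_kb in rows:
        print("%-6d %-12.12s %-21s %-21s %6d %6d"
              % (pid, comm, laddr, raddr, rx_kb, tx_kb))


if __name__ == "__main__":
    # Made-up byte counts; these PIDs exceed Linux's pid_max limit, so they
    # never exist and COMM falls back to the PID, like an exited process.
    sample = {
        (4194400, "100.66.3.172:22", "100.127.69.165:25219"): (300, 200),
        (4194401, "100.66.3.172:22", "100.127.69.165:6684"): (2200000, 1500),
    }
    print_listing(build_listing(sample))
```

Because the name lookup happens when printing rather than in the traced event, a process that exits within the interval loses its name. That is the trade-off the text describes for not calling bpf_get_current_comm() per event.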


USAGE:

# tcptop -h
usage: tcptop.py [-h] [-C] [-S] [-p PID] [interval] [count]

Summarize TCP send/recv throughput by host

positional arguments:
  interval           output interval, in seconds (default 1)
  count              number of outputs

optional arguments:
  -h, --help         show this help message and exit
  -C, --noclear      don't clear the screen
  -S, --nosummary    skip system summary line
  -p PID, --pid PID  trace this PID only

examples:
    ./tcptop           # trace TCP send/recv by host
    ./tcptop -C        # don't clear the screen
    ./tcptop -p 181    # only trace PID 181
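The overhead figures given earlier (4k to 15k TCP events per second costing 0.5% to 2.0% of one CPU) can be turned into a rough per-event cost for estimating overhead at other rates. This is a back-of-the-envelope sketch with loud assumptions: that overhead scales linearly with the TCP event rate, and that the 4k/s rate pairs with the 0.5% figure and the 15k/s rate with the 2.0% figure (the text only gives the two ranges). Under those assumptions, 0.5% of one CPU is 5 ms of CPU time per second, or about 1.25 us per event at 4k events/s, and 2.0% at 15k events/s is about 1.33 us per event. The helper names and the 50k events/s workload are hypothetical.

```python
def per_event_cost_us(events_per_sec: float, cpu_fraction: float) -> float:
    """CPU microseconds per traced event, assuming overhead is linear in the
    event rate. cpu_fraction is a fraction of one CPU (0.005 == 0.5%)."""
    return cpu_fraction * 1_000_000 / events_per_sec


def cpu_percent_at(events_per_sec: float, cost_us: float) -> float:
    """Estimated percent of one CPU consumed at the given event rate."""
    return events_per_sec * cost_us / 1_000_000 * 100


if __name__ == "__main__":
    # Assumed pairing: low rate with low overhead, high rate with high overhead.
    low = per_event_cost_us(4_000, 0.005)    # 0.5% of one CPU at 4k events/s
    high = per_event_cost_us(15_000, 0.020)  # 2.0% of one CPU at 15k events/s
    print(f"per-event cost: {low:.2f} to {high:.2f} us")
    # Hypothetical busier workload, using the more pessimistic per-event cost.
    print(f"estimated overhead at 50k events/s: "
          f"{cpu_percent_at(50_000, high):.1f}% of one CPU")
```

Plugging in an event rate measured on your own system with funccount gives only a ballpark figure; the actual overhead should be measured.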