Lua Tools for BCC
-----------------

This directory contains Lua tooling for [BCC][bcc]
(the BPF Compiler Collection).

BCC is a toolkit for creating userspace and kernel tracing programs. By
default, it comes with a library `libbcc`, some example tooling and a Python
frontend for the library.

Here we present an alternate frontend for `libbcc` implemented in LuaJIT. This
lets you write the userspace part of your tracer in Lua instead of Python.

Since LuaJIT is a JIT-compiled language, tracers implemented in `bcc-lua`
exhibit significantly reduced overhead compared to their Python equivalents.
This is particularly noticeable in tracers that actively use the table APIs to
get information from the kernel.

If your tracer makes extensive use of `BPF_MAP_TYPE_PERF_EVENT_ARRAY` or
`BPF_MAP_TYPE_HASH`, you may find the performance characteristics of this
implementation very appealing: LuaJIT can compile much of the call chain that
processes the events to native code, and this wrapper has been designed to
benefit from such JIT compilation.

## Quickstart Guide

The following instructions assume Ubuntu 14.04 LTS.

1. Install a **very new kernel**. Version 4.3 or later is required.

    ```
    VER=4.4.2-040402
    PREFIX=http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.2-wily/
    REL=201602171633
    wget ${PREFIX}/linux-headers-${VER}-generic_${VER}.${REL}_amd64.deb
    wget ${PREFIX}/linux-headers-${VER}_${VER}.${REL}_all.deb
    wget ${PREFIX}/linux-image-${VER}-generic_${VER}.${REL}_amd64.deb
    sudo dpkg -i linux-*${VER}.${REL}*.deb
    ```

2. Install the `libbcc` binary packages and `luajit`.

    ```
    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys D4284CDD
    echo "deb https://repo.iovisor.org/apt trusty main" | sudo tee /etc/apt/sources.list.d/iovisor.list
    sudo apt-get update
    sudo apt-get install libbcc luajit
    ```

3. Run one of the examples to ensure `libbcc` is properly installed.

    ```
    sudo ./bcc-probe examples/lua/task_switch.lua
    ```

## LuaJIT BPF compiler

It is also possible to write Lua functions and compile them transparently to BPF bytecode. Here is a simple socket filter example:

```lua
local S = require('syscall')
local bpf = require('bpf')
local map = bpf.map('array', 256)
-- Kernel-space part of the program
local prog = assert(bpf(function ()
    local proto = pkt.ip.proto  -- Get byte (ip.proto) from frame at [23]
    xadd(map[proto], 1)         -- Increment packet count
end))
-- User-space part of the program
local sock = assert(bpf.socket('lo', prog))
for i=1,10 do
    local icmp, udp, tcp = map[1], map[17], map[6]
    print('TCP', tcp, 'UDP', udp, 'ICMP', icmp, 'packets')
    S.sleep(1)
end
```

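The map keys `1`, `6` and `17` in the user-space loop are the IANA protocol numbers for ICMP, TCP and UDP. As a minimal sketch, the lookup can be given names and wrapped in a formatter. The `PROTO` table and `report` function are hypothetical helpers, not part of the `bpf` module, and a plain Lua table stands in for the BPF map so the snippet runs without a kernel.

```lua
-- IANA protocol numbers, used as array-map keys in the socket filter above.
local PROTO = { ICMP = 1, TCP = 6, UDP = 17 }

-- Format one report line from any table-like object indexed by protocol number.
-- `counters` is a plain Lua table here, standing in for the BPF map (assumption).
local function report(counters)
    local parts = {}
    for _, name in ipairs({ 'TCP', 'UDP', 'ICMP' }) do
        -- In LuaJIT, tonumber() also accepts 64-bit cdata; missing entries count as 0.
        local count = tonumber(counters[PROTO[name]]) or 0
        parts[#parts + 1] = string.format('%s=%d', name, count)
    end
    return table.concat(parts, ' ')
end

print(report({ [1] = 3, [6] = 10, [17] = 0 }))
```
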
Another application of BPF programs is attaching to probes for [perf event tracing][tracing]. That means you can trace events inside the kernel (or user space) and then collect results, for example a histogram of `sendto()` latency, off-CPU time stack traces, syscall latency, and so on. While kernel probes and perf events have an unstable ABI, a dynamic language lets us create and use the proper type based on the tracepoint ABI at runtime.

The runtime automatically recognizes reads that need a helper to be accessed. The type casts denote the source of the objects. For example, the [bashreadline][bashreadline] example prints the commands entered in all running bash shells:

```lua
local ffi = require('ffi')
local bpf = require('bpf')
-- Perf event map
local sample_t = 'struct { uint64_t pid; char str[80]; }'
local events = bpf.map('perf_event_array')
-- Kernel-space part of the program
bpf.uprobe('/bin/bash:readline', function (ptregs)
    local sample = ffi.new(sample_t)
    sample.pid = pid_tgid()
    ffi.copy(sample.str, ffi.cast('char *', ptregs.ax)) -- Cast `ax` to string pointer and copy to buffer
    perf_submit(events, sample)                         -- Write sample to perf event map
end, true, -1, 0)
-- User-space part of the program
local log = events:reader(nil, 0, sample_t) -- Must specify PID or CPU_ID to observe
while true do
    log:block()               -- Wait until event reader is readable
    for _,e in log:read() do  -- Collect available reader events
        print(tonumber(e.pid), ffi.string(e.str))
    end
end
```

Here the cast to `struct pt_regs` flags the source of the data as probe arguments, which means any pointer derived from this structure points to kernel memory and a helper is needed to access it. Casting `ptregs.ax` to a pointer is then required for `ffi.copy` semantics; otherwise it would be treated as a `u64` and only its value would be copied. Type detection is automatic most of the time (socket filters and `bpf.tracepoint`), but not with uprobes and kprobes.

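The difference between copying through a pointer and copying an integer value can be seen in ordinary LuaJIT FFI code, outside the BPF compiler. The sketch below is only an analogy: it runs in user space, passes an explicit length to `ffi.copy` (standard LuaJIT requires one for pointer sources, unlike the compiled form above), and uses the hypothetical string `'echo hello'` in place of a real `readline()` result.

```lua
local ffi = require('ffi')

-- Same layout as `sample_t` in the example above
local sample_t = ffi.typeof('struct { uint64_t pid; char str[80]; }')

local line = 'echo hello'                    -- stand-in for the readline() result
local src = ffi.cast('const char *', line)   -- pointer to the string's bytes

-- Pointer semantics: copy the bytes the pointer refers to.
-- The struct is zero-filled on creation, so `str` stays NUL-terminated.
local sample = sample_t()
ffi.copy(sample.str, src, #line)
local copied = ffi.string(sample.str)

-- Integer semantics: only the address itself is kept, not the text behind it.
local addr = tonumber(ffi.cast('uintptr_t', src))

print(copied, addr ~= 0)
```
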
### Installation

```bash
$ luarocks install bpf
```

### Examples

See `examples/lua` directory.

### Helpers

* `print(...)` is a wrapper for `bpf_trace_printk`; the output can be read with `cat /sys/kernel/debug/tracing/trace_pipe`
* `bit.*` library **is** supported (`lshift, rshift, arshift, bnot, band, bor, bxor`)
* `math.*` library is *partially* supported (`log2, log, log10`)
* `ffi.cast()` is implemented (including structures and arrays)
* `ffi.new(...)` allocates memory on the stack; initializers are NYI
* `ffi.copy(...)` copies memory (possibly using helpers) between stack/kernel/registers
* `ntoh(x[, width])` - convert from network to host byte order
* `hton(x[, width])` - convert from host to network byte order
* `xadd(dst, inc)` - exclusive (atomic) add, equivalent to `*dst += inc` if Lua had a `+=` operator

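Output from the `print(...)` helper can also be collected from Lua instead of `cat`. The following is a minimal sketch using only the standard `io` library; `read_lines` is a hypothetical helper, not part of the `bpf` module. Reading the trace pipe typically requires root and blocks until the kernel side has printed something, so the call against the real path is left as a comment.

```lua
local TRACE_PIPE = '/sys/kernel/debug/tracing/trace_pipe'

-- Read up to `limit` lines from `path`.
-- Returns a table of lines, or nil and an error message if the file cannot be opened.
local function read_lines(path, limit)
    local f, err = io.open(path, 'r')
    if not f then return nil, err end
    local lines = {}
    if limit > 0 then
        for line in f:lines() do
            lines[#lines + 1] = line
            if #lines >= limit then break end
        end
    end
    f:close()
    return lines
end

-- Usage (blocks until trace output is available):
-- for _, l in ipairs(assert(read_lines(TRACE_PIPE, 10))) do print(l) end
```
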
Below is a list of BPF-specific helpers:

* `time()` - return current monotonic time in nanoseconds (uses `bpf_ktime_get_ns`)
* `cpu()` - return current CPU number (uses `bpf_get_smp_processor_id`)
* `pid_tgid()` - return caller `tgid << 32 | pid` (uses `bpf_get_current_pid_tgid`)
* `uid_gid()` - return caller `gid << 32 | uid` (uses `bpf_get_current_uid_gid`)
* `comm(var)` - write current process name (uses `bpf_get_current_comm`)
* `perf_submit(map, var)` - submit variable to perf event array BPF map
* `stack_id(map, flags)` - return stack trace identifier from stack trace BPF map
* `load_bytes(off, var)` - helper for direct packet access with `skb_load_bytes()`

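Because `pid_tgid()` and `uid_gid()` pack two 32-bit values into one 64-bit integer, user-space code usually has to split them again before printing. A minimal sketch using LuaJIT's 64-bit cdata arithmetic follows; `split64` is a hypothetical helper, and the packed value is built by hand here rather than read from a real event.

```lua
local ffi = require('ffi')

-- Split a 64-bit value into its upper and lower 32-bit halves.
local function split64(v)
    v = ffi.cast('uint64_t', v)
    local hi = tonumber(v / 2^32)   -- unsigned 64-bit integer division
    local lo = tonumber(v % 2^32)   -- remainder is the low 32 bits
    return hi, lo
end

-- Hypothetical value laid out as `tgid << 32 | pid`: tgid 1234, pid 5678
local packed = ffi.cast('uint64_t', 1234) * 2^32 + 5678
local tgid, pid = split64(packed)
print(tgid, pid)
```
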
### Current state

* Not all LuaJIT bytecode opcodes are supported *(notable mentions below)*
* Closures (`UCLO`) will probably never be supported, although you can use upvalues inside a compiled function.
* Type narrowing is opportunistic. Numbers are 64-bit by default, but 64-bit immediate loads are not supported (e.g. `local x = map[ffi.cast('uint64_t', 1000)]`)
* Tail calls (`CALLT`) and iterators (`ITERI`) are NYI (as of now)
* Arbitrary ctype **is** supported both for map keys and values
* Basic optimisations such as constant propagation, partial DCE, liveness analysis and speculative register allocation are implemented, but there's no control flow analysis yet. This means the compiler can see when values are used and where dead stores occur, but there's no rewriter pass to eliminate them.
* No register sub-allocations, no aggressive use of caller-saved `R1-5`, no aggressive narrowing (this would require variable range assertions and variable relationships)
* Slices whose length is not 1/2/4/8 are NYI (requires allocating memory on the stack and using a pointer type)


[bcc]: https://github.com/iovisor/bcc
[tracing]: http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html
[bashreadline]: http://www.brendangregg.com/blog/2016-02-08/linux-ebpf-bcc-uprobes.html