1 # Compiling P4 to EBPF
2
3 Mihai Budiu - mbudiu (a] barefootnetworks.com
4
5 September 22, 2015
6
7 ## Abstract
8
9 This document describes a prototype compiler that translates programs
10 written in the P4 programming languages to eBPF programs. The
11 translation is performed by generating programs written in a subset of
12 the C programming language, that are converted to EBPF using the BPF
13 Compiler Collection tools.
14
15 The compiler code is licensed under an [Apache v2.0 license]
16 (http://www.apache.org/licenses/LICENSE-2.0.html).
17
18 ## Preliminaries
19
20 In this section we give a brief overview of P4 and EBPF. A detailed
21 treatment of these topics is outside the scope of this text.
22
23 ### P4
24
25 P4 (http://p4.org) is a domain-specific programming language for
26 specifying the behavior of the dataplanes of network-forwarding
27 elements. The name of the programming language comes from the title
28 of a paper published in the proceedings of SIGCOMM Computer
29 Communications Review in 2014:
30 http://www.sigcomm.org/ccr/papers/2014/July/0000000.0000004:
31 "Programming Protocol-Independent Packet Processors".
32
33 P4 itself is protocol-independent but allows programmers to express a
34 rich set of data plane behaviors and protocols. The core P4
35 abstractions are:
36
37 * Header definitions describe the format (the set of fields and their
38 sizes) of each header within a packet.
39
40 * Parse graphs (finite-state machines) describe the permitted header
41 sequences within received packets.
42
43 * Tables associate keys to actions. P4 tables generalize traditional
44 forwarding tables; they can be used to implement routing tables,
45 flow lookup tables, access-control lists, etc.
46
47 * Actions describe how packet header fields and metadata are manipulated.
48
49 * Match-action units stitch together tables and actions, and perform
50 the following sequence of operations:
51
52 * Construct lookup keys from packet fields or computed metadata,
53
54 * Use the constructed lookup key to index into tables, choosing an
55 action to execute,
56
57 * Finally, execute the selected action.
58
59 * Control flow is expressed as an imperative program describing the
60 data-dependent packet processing within a pipeline, including the
61 data-dependent sequence of match-action unit invocations.
62
63 P4 programs describe the behavior of network-processing dataplanes. A
64 P4 program is designed to operate in concert with a separate *control
65 plane* program. The control plane is responsible for managing at
66 runtime the contents of the P4 tables. P4 cannot be used to specify
67 control-planes; however, a P4 program implicitly specifies the
68 interface between the data-plane and the control-plane.
69
70 The P4 language is under active development; the current stable
71 version is 1.0.2 (see http://p4.org/spec); a reference implementation
72 of a compiler and associated tools is freely available using a Apache
73 2 open-source license (see http://p4.org/code).
74
75 ### EBPF
76
77 #### Safe code
78
79 EBPF is a acronym that stands for Extended Berkeley Packet Filters.
80 In essence EBPF is a low-level programming language (similar to
81 machine code); EBPF programs are traditionally executed by a virtual
82 machine that resides in the Linux kernel. EBPF programs can be
83 inserted and removed from a live kernel using dynamic code
84 instrumentation. The main feature of EBPF programs is their *static
85 safety*: prior to execution all EBPF programs have to be validated as
86 being safe, and unsafe programs cannot be executed. A safe program
87 provably cannot compromise the machine it is running on:
88
89 * it can only access a restricted memory region (on the local stack)
90
91 * it can run only for a limited amount of time; during execution it
92 cannot block, sleep or take any locks
93
94 * it cannot use any kernel resources with the exception of a limited
95 set of kernel services which have been specifically whitelisted,
96 including operations to manipulate tables (described below)
97
98 #### Kernel hooks
99
100 EBPF programs are inserted into the kernel using *hooks*. There are
101 several types of hooks available:
102
103 * any function entry point in the kernel can act as a hook; attaching
104 an EBPF program to a function `foo()` will cause the EBPF program to
105 execute every time some kernel thread executes `foo()`.
106
107 * EBPF programs can also be attached using the Linux Traffic Control
108 (TC) subsystem, in the network packet processing datapath. Such
109 programs can be used as TC classifiers and actions.
110
111 * EBPF programs can also be attached to sockets or network interfaces.
112 In this case they can be used for processing packets that flow
113 through the socket/interface.
114
115 EBPF programs can be used for many purposes; the main use cases are
116 dynamic tracing and monitoring, and packet processing. We are mostly
117 interested in the latter use case in this document.
118
119 #### EBPF Tables
120
121 The EBPF runtime exposes a bi-directional kernel-userspace data
122 communication channel, called *tables* (also called maps in some EBPF
123 documents and code samples). EBPF tables are essentially key-value
124 stores, where keys and values are arbitrary fixed-size bitstrings.
125 The key width, value width and table size (maximum number of entries
126 that can be stored) are declared statically, at table creation time.
127
128 In user-space tables handles are exposed as file descriptors. Both
129 user- and kernel-space programs can manipulate tables, by inserting,
130 deleting, looking up, modifying, and enumerating entries in a table.
131
132 In kernel space the keys and values are exposed as pointers to the raw
133 underlying data stored in the table, whereas in user-space the
134 pointers point to copies of the data.
135
136 #### Concurrency
137
138 An important aspect to understand related to EBPF is the execution
139 model. An EBPF program is triggered by a kernel hook; multiple
140 instances of the same kernel hook can be running simultaneously on
141 different cores.
142
143 Each table however has a single instances across all the cores. A
144 single table may be accessed simultaneously by multiple instances of
145 the same EBPF program running as separate kernel threads on different
146 cores. EBPF tables are native kernel objects, and access to the table
147 contents is protected using the kernel RCU mechanism. This makes
148 access to table entries safe under concurrent execution; for example,
149 the memory associated to a value cannot be accidentally freed while an
150 EBPF program holds a pointer to the respective value. However,
151 accessing tables is prone to data races; since EBPF programs cannot
152 use locks, some of these races often cannot be avoided.
153
154 EBPF and the associated tools are also under active development, and
155 new capabilities are added frequently. The P4 compiler generates code
156 that can be compiled using the BPF Compiler Collection (BCC)
157 (https://github.com/iovisor/bcc)
158
159 ## Compiling P4 to EBPF
160
161 From the above description it is apparent that the P4 and EBPF
162 programming languages have different expressive powers. However,
163 there is a significant overlap in their capabilities, in particular,
164 in the domain of network packet processing. The following image
165 illustrates the situation:
166
167 ![P4 and EBPF overlap in capabilities](scope.png)
168
169 We expect that the overlapping region will grow in size as both P4 and
170 EBPF continue to mature.
171
172 The current version of the P4 to EBPF compiler translates programs
173 written in the version 1.1 of the P4 programming language to programs
174 written in a restricted subset of C. The subset of C is chosen such
175 that it should be compilable to EBPF using BCC.
176
177 ```
178 -------------- -------
179 P4 ---> | P4-to-EBPF | ---> C ----> | BCC | --> EBPF
180 -------------- -------
181 ```
182
183 The P4 program only describes the packet processing *data plane*, that
184 runs in the Linux kernel. The *control plane* must be separately
185 implemented by the user. The BCC tools simplify this task
186 considerably, by generating C and/or Python APIs that expose the
187 dataplane/control-plane APIs.
188
189 ### Dependencies
190
191 EBPF programs require a Linux kernel with version 4.2 or newer.
192
193 In order to use the P4 to EBPF compiler the following software must be installed:
194
195 * The compiler itself is written in the Python (v2.x) programming
196 language.
197
198 * the P4 compiler front-end: (https://github.com/p4lang/p4-hlir).
199 This is required for parsing the P4 programs.
200
201 * the BCC compiler collection tools: (https://github.com/iovisor/bcc).
202 This is required for compiling the generated code. Also, BCC comes
203 with a set of Python utilities which can be used to implement
204 control-plane programs that operate in concert with the kernel EBPF
205 datapath.
206
207 The P4 to EBPF compiler generates code that is designed for being used
208 as a classifier using the Linux TC subsystem.
209
210 Furthermore, the test code provided is written using the Python (v3.x)
211 programming language and requires several Python packages to be
212 installed.
213
214 ### Supported capabilities
215
216 The current version of the P4 to EBPF compiler supports a relatively
217 narrow subset of the P4 language, but still powerful enough to write
218 very complex packet filters and simple packet forwarding engines. In
219 the spirit of open-source "release early, release often", we expect
220 that the compiler's capabilities will improve gradually.
221
222 * Packet filtering is performed using the `drop()` action. Packets
223 that are not dropped will be forwarded.
224
225 * Packet forwarding is performed by setting the
226 `standard_metadata.egress_port` to the index of the destination
227 network interface
228
229 Here are some limitations imposed on the P4 programs:
230
231 * Currently both the ingress and the egress P4 pipelines are executed
232 at the same hook (wherever the user chooses to insert the generated
233 EBPF program). In the future the compiler should probably generate
234 two separate EBPF programs.
235
236 * arbitrary parsers can be compiled, but the BCC compiler will reject
237 parsers that contain cycles
238
239 * arithmetic on data wider than 32 bits is not supported
240
241 * checksum computations are not implemented. In consequence, programs
242 that IP/TCP/UDP headers will produce incorrect packet headers.
243
244 * EBPF does not offer support for ternary or LPM tables
245
246 * P4 cloning and recirculation and not supported
247
248 * meters and registers are not supported; only direct counters are
249 currently supported. EBPF can potentially support registers and
250 arbitrary counters, so these may appear in the future.
251
252 * learning (i.e. `generate_digest`) is not implemented
253
254 ### Translating P4 to C
255
256 To simplify the translation, the P4 programmer should refrain using
257 identifiers whose name starts with `ebpf_`.
258
259 The following table provides a brief summary of how each P4 construct
260 is mapped to a corresponding C construct:
261
262 #### Translating parsers
263
264 P4 Construct | C Translation
265 ----------|------------
266 `header_type` | `struct` type
267 `header` | `struct` instance with an additional `valid` bit
268 `metadata` | `struct` instance
269 parser state | code block
270 state transition | `goto` statement
271 `extract` | load/shift/mask data from packet buffer
272
273 #### Translating match-action pipelines
274
275 P4 Construct | C Translation
276 ----------|------------
277 table | 2 EBPF tables: second one used just for the default action
278 table key | `struct` type
279 table `actions` block | tagged `union` with all possible actions
280 `action` arguments | `struct`
281 table `reads` | EBPF table access
282 `action` body | code block
283 table `apply` | `switch` statement
284 counters | additional EBPF table
285
286 ### Code organization
287
288 The compiler code is organized in two folders:
289
290 * `compiler`: the complete compiler source code, in Python v2.x
291 The compiler entry point is `p4toEbpf.py`.
292
293 * `test`: testing code and data. There are two testing programs:
294 * `testP4toEbpf.py`: which compiles all P4 files in the testprograms folder
295
296 * `endToEndTest.py`: which compiles and executes the simple.p4
297 program, and includes a simple control plane
298
299 Currently the compiler contains no installation capabilities.
300
301 ### Invoking the compiler
302
303 Invoking the compiler is just a matter of invoking the python program
304 with a suitable input P4 file:
305
306 ```
307 p4toEbpf.py file.p4 -o file.c
308 ```
309
310 #### Compiler options
311
312 The P4 compiler first runs the C preprocessor on the input P4 file.
313 Some of the command-line options are passed directly to the
314 preprocessor.
315
316 The following compiler options are available:
317
318 Option | Meaning
319 -------|--------
320 `-D macro` | Option passed to C preprocessor
321 `-I path` | Option passed to C preprocessor
322 `-U macro` | Option passed to C preprocessor
323 `-g [router|filter]` | Controls whether the generated code behaves like a router or a filter.
324 `-o outoutFile` | writes the generated C code to the specified output file.
325
326 The `-g` option controls the nature of the generated code:
327
328 * `-g filter` generates a filter; the only P4 action that has an
329 effect is the `drop()` action. Setting metadata in P4 (e.g.,
330 `egress_port`) has no effect.
331
332 * `-g router` generates a simple router; both `drop()` and
333 `egress_port` impact packet processing.
334
335 #### Using the generated code
336
337 The resulting file contains the complete data structures, tables, and
338 a C function named `ebpf_filter` that implements the P4-specified
339 data-plane. This C file can be manipulated using the BCC tools;
340 please refer to the BCC project documentation and sample test files of
341 the P4 to EBPF source code for an in-depth understanding. A minimal
342 Python program that compiles and loads into the kernel the generated
343 file into EBPF is:
344
345 ```
346 #!/usr/bin/env python3
347 from bcc import BPF
348
349 b = BPF(src_file="file.c", debug=0)
350 fn = b.load_func("ebpf_filter", BPF.SCHED_CLS)
351 ```
352
353 ##### Connecting the generated program with the TC
354
355 The EBPF code that is generated is intended to be used as a classifier
356 attached to the ingress packet path using the Linux TC subsystem. The
357 same EBPF code should be attached to all interfaces. Note however
358 that all EBPF code instances share a single set of tables, which are
359 used to control the program behavior.
360
361 The following code fragment illustrates how the EBPF code can be
362 hooked up to the `eth0` interface using a Python program. (The `fn`
363 variable is the one produced by the previous code fragment).
364
365 ```
366 from pyroute2 import IPRoute
367
368 ipr = IPRoute()
369 interface_name="eth0"
370 if_index = ipr.link_lookup(ifname=interface_name)[0]
371 ipr.tc("add", "ingress", if_index, "ffff:")
372 ipr.tc("add-filter", "bpf", if_index, ":1", fd=fn.fd,
373 name=fn.name, parent="ffff:", action="ok", classid=1)
374 ```
375