FlatBuffers white paper {#flatbuffers_white_paper}
=======================

This document tries to shed some light onto the "why" of FlatBuffers, a
new serialization library.

## Motivation

Back in the good old days, performance was all about instructions and
cycles. Nowadays, processing units have run so far ahead of the memory
subsystem that making an efficient application should start and finish
with thinking about memory. How much of it you use. How you lay it out
and access it. How you allocate it. When you copy it.

Serialization is a pervasive activity in a lot of programs, and a common
source of memory inefficiency, with lots of temporary data structures
needed to parse and represent data, and inefficient allocation patterns
and locality.

If it were possible to do serialization with no temporary objects,
no additional allocation, no copying, and good locality, this could be
of great value. Serialization systems usually don't manage this because
it runs counter to forwards/backwards compatibility and to platform
specifics like endianness and alignment.

FlatBuffers is what you get if you try anyway.

In particular, FlatBuffers' focus is on mobile hardware (where memory
size and memory bandwidth are even more constrained than on desktop
hardware), and applications that have the highest performance needs:
games.

## FlatBuffers

*This is a summary of FlatBuffers functionality, with some rationale.
A more detailed description can be found in the FlatBuffers
documentation.*

### Summary

A FlatBuffer is a binary buffer containing nested objects (structs,
tables, vectors, ...) organized using offsets so that the data can be
traversed in-place just like any pointer-based data structure.
Unlike most in-memory data structures, however, it uses strict rules of
alignment and endianness (always little) to ensure these buffers are
cross-platform. Additionally, for objects that are tables, FlatBuffers
provides forwards/backwards compatibility and general optionality of
fields, to support most forms of format evolution.

You define your object types in a schema, which can then be compiled to
C++ or Java for low to zero overhead reading & writing.
Optionally, JSON data can be dynamically parsed into buffers.

### Tables

Tables are the cornerstone of FlatBuffers, since format evolution is
essential for most applications of serialization. Typically, dealing
with format changes is something that can be done transparently during
the parsing process of most serialization solutions out there.
But a FlatBuffer isn't parsed before it is accessed.

Tables get around this by using an extra indirection to access fields,
through a *vtable*. Each table comes with a vtable (which may be shared
between multiple tables with the same layout), which records where the
fields of that particular kind of table instance are stored.
The vtable may also indicate that a field is not present (because this
FlatBuffer was written with an older version of the software, or simply
because the information was not necessary for this instance, or deemed
deprecated), in which case a default value is returned.

Tables have a low overhead in memory (since vtables are small and
shared) and in access cost (an extra indirection), but provide great
flexibility. Tables may even cost less memory than the equivalent
struct, since fields do not need to be stored when they are equal to
their default.

FlatBuffers additionally offers "naked" structs, which do not offer
forwards/backwards compatibility, but can be even smaller (useful for
very small objects that are unlikely to change, e.g. a coordinate
pair or an RGBA color).

### Schemas

While schemas reduce some generality (you can't just read any data
without having its schema), they have a lot of upsides:

- Most information about the format can be factored into the generated
  code, reducing memory needed to store data, and time to access it.

- The strong typing of the data definitions means less error
  checking/handling at runtime (less can go wrong).

- A schema enables us to access a buffer without parsing.

FlatBuffer schemas are fairly similar to those of the incumbent,
Protocol Buffers, and generally should be readable to those familiar
with the C family of languages. We chose to improve upon the features
offered by .proto files in the following ways:

- Deprecation of fields instead of manual field id assignment.
  Extending an object in a .proto means hunting for a free slot among
  the numbers (preferring lower numbers since they have a more compact
  representation). Besides being inconvenient, it also makes removing
  fields problematic: either you keep them, which doesn't make it
  obvious that the field shouldn't be read/written anymore and still
  generates accessors, or you remove them, but then you risk that old
  data using that field is still around by the time someone reuses its
  field id, with nasty consequences.

- Differentiating between tables and structs (see above). Effectively
  all table fields are `optional`, and all struct fields are
  `required`.

- Having a native vector type instead of `repeated`.
  This gives you a length without having to collect all items, and in
  the case of scalars provides for a more compact representation, and
  one that guarantees adjacency.

- Having a native `union` type instead of using a series of `optional`
  fields, all of which must be checked individually.

- Being able to define defaults for all scalars, instead of having to
  deal with their optionality at each access.

- A parser that can deal with both schemas and data definitions (JSON
  compatible) uniformly.

<br>