FlatBuffers white paper    {#flatbuffers_white_paper}
=======================

This document tries to shed some light on the "why" of FlatBuffers, a
new serialization library.

## Motivation

Back in the good old days, performance was all about instructions and
cycles. Nowadays, processing units have run so far ahead of the memory
subsystem that making an efficient application should start and finish
with thinking about memory. How much of it you use. How you lay it out
and access it. How you allocate it. When you copy it.

Serialization is a pervasive activity in a lot of programs, and a common
source of memory inefficiency, with lots of temporary data structures
needed to parse and represent data, and inefficient allocation patterns
and poor locality.

If it were possible to do serialization with no temporary objects,
no additional allocation, no copying, and good locality, this could be
of great value. The reason serialization systems usually don't manage
this is that it runs counter to forwards/backwards compatibility and
to platform specifics like endianness and alignment.

FlatBuffers is what you get if you try anyway.

In particular, the focus of FlatBuffers is on mobile hardware (where
memory size and memory bandwidth are even more constrained than on
desktop hardware), and on the applications that have the highest
performance needs: games.

## FlatBuffers

*This is a summary of FlatBuffers functionality, with some rationale.
A more detailed description can be found in the FlatBuffers
documentation.*

### Summary

A FlatBuffer is a binary buffer containing nested objects (structs,
tables, vectors, ...) organized using offsets so that the data can be
traversed in-place just like any pointer-based data structure. Unlike
most in-memory data structures, however, it uses strict rules of
alignment and endianness (always little) to ensure these buffers are
cross-platform. Additionally, for objects that are tables, FlatBuffers
provides forwards/backwards compatibility and general optionality of
fields, to support most forms of format evolution.
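
As a rough illustration of traversing offset-based, little-endian data
in place, the C++ sketch below reads a 32-bit value through an offset
without unpacking the buffer into other objects. The helper names
(`ReadU32LE`, `FollowOffset`, `MakeExampleBuffer`), the 12-byte example
buffer, and the choice of making an offset relative to its own position
are assumptions made for this example. They are not the FlatBuffers API
and not necessarily its exact binary layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a 32-bit little-endian value at `pos`. Using shifts instead of a
// pointer cast gives the same result on little- and big-endian hosts.
inline uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t pos) {
  assert(pos + 4 <= buf.size());
  return static_cast<uint32_t>(buf[pos]) |
         static_cast<uint32_t>(buf[pos + 1]) << 8 |
         static_cast<uint32_t>(buf[pos + 2]) << 16 |
         static_cast<uint32_t>(buf[pos + 3]) << 24;
}

// Follow the offset stored at `pos`. Assumption for this sketch: the target
// lives `offset` bytes after the position of the offset itself.
inline std::size_t FollowOffset(const std::vector<uint8_t>& buf,
                                std::size_t pos) {
  return pos + ReadU32LE(buf, pos);
}

// Hypothetical buffer: an offset at byte 0 that points at a value at byte 8.
inline std::vector<uint8_t> MakeExampleBuffer() {
  return {
      8,    0, 0, 0,  // bytes 0..3: offset to the value, relative to byte 0
      0,    0, 0, 0,  // bytes 4..7: padding
      0x2A, 0, 0, 0,  // bytes 8..11: the value 42
  };
}
```

Reading the value is then `ReadU32LE(buf, FollowOffset(buf, 0))`: one
offset lookup and one load, with no intermediate objects allocated.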

You define your object types in a schema, which can then be compiled to
C++ or Java for low- to zero-overhead reading and writing.
Optionally, JSON data can be dynamically parsed into buffers.

### Tables

Tables are the cornerstone of FlatBuffers, since format evolution is
essential for most applications of serialization. Most serialization
solutions deal with format changes transparently during parsing.
But a FlatBuffer isn't parsed before it is accessed.

Tables get around this by using an extra indirection to access fields,
through a *vtable*. Each table comes with a vtable (which may be shared
between multiple tables with the same layout), which records where the
fields of that particular table instance are stored.
The vtable may also indicate that a field is not present (because this
FlatBuffer was written with an older version of the software, or simply
because the information was not necessary for this instance, or deemed
deprecated), in which case a default value is returned.
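
The lookup described above can be sketched in C++ as follows. This is a
minimal model under stated assumptions, not the real FlatBuffers
encoding or generated code: the vtable here is a 16-bit field count
followed by one 16-bit offset per field, an offset of 0 means "not
present", every field is a 16-bit scalar, and the names `ReadU16LE`,
`GetFieldOr` and `MakeExampleTable` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a 16-bit little-endian value at `pos`.
inline uint16_t ReadU16LE(const std::vector<uint8_t>& buf, std::size_t pos) {
  assert(pos + 2 <= buf.size());
  return static_cast<uint16_t>(buf[pos] | (buf[pos + 1] << 8));
}

// Assumed layout for this sketch:
//   vtable: [u16 field_count][u16 offset of field 0][u16 offset of field 1]...
//   table:  [u16 reserved header][field data...]
// A field offset is relative to the start of the table; 0 means "absent".
inline uint16_t GetFieldOr(const std::vector<uint8_t>& buf,
                           std::size_t vtable_pos, std::size_t table_pos,
                           std::size_t field_index, uint16_t default_value) {
  const uint16_t field_count = ReadU16LE(buf, vtable_pos);
  // The writer used an older schema that did not know this field yet.
  if (field_index >= field_count) return default_value;
  const uint16_t field_offset =
      ReadU16LE(buf, vtable_pos + 2 + 2 * field_index);
  // The writer knew the field but chose not to store it.
  if (field_offset == 0) return default_value;
  return ReadU16LE(buf, table_pos + field_offset);
}

// Hypothetical buffer: a vtable at byte 0 describing two fields, and a table
// at byte 6 that stores only field 0.
inline std::vector<uint8_t> MakeExampleTable() {
  return {
      2,   0,  // vtable: two fields known to the writer
      2,   0,  // field 0 is stored 2 bytes into the table
      0,   0,  // field 1 is not stored
      0,   0,  // table: reserved header (bytes 6..7)
      150, 0,  // table: value of field 0 (bytes 8..9)
  };
}
```

With this buffer, `GetFieldOr(buf, 0, 6, 0, 100)` reads the stored value
150, while asking for field 1 or for a field the writer never knew about
falls back to the supplied default. The cost over a direct struct access
is the extra read of the vtable entry.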

Tables have a low overhead in memory (since vtables are small and
shared) and in access cost (an extra indirection), but provide great
flexibility. Tables may even cost less memory than the equivalent
struct, since fields do not need to be stored when they are equal to
their default.

FlatBuffers additionally offers "naked" structs, which do not offer
forwards/backwards compatibility, but can be even smaller (useful for
very small objects that are unlikely to change, such as a coordinate
pair or an RGBA color).
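
To make the trade-off concrete, here is a small C++ sketch of what a
fixed-layout struct implies: every field is always present at a known
position, so no vtable or per-field metadata is needed. The `Rgba` type
and the `ReadRgba` helper are illustrative assumptions, not code
generated by FlatBuffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A fixed-layout color: four bytes, no optional fields, no indirection.
struct Rgba {
  uint8_t r, g, b, a;
};

// The layout is part of the format, so any change to it breaks old data.
static_assert(sizeof(Rgba) == 4, "Rgba must occupy exactly four bytes");
static_assert(offsetof(Rgba, a) == 3, "fields are stored in declared order");

// Read an Rgba stored at byte position `pos` of `buf`.
inline Rgba ReadRgba(const std::vector<uint8_t>& buf, std::size_t pos) {
  assert(pos + sizeof(Rgba) <= buf.size());
  Rgba color;
  std::memcpy(&color, buf.data() + pos, sizeof(Rgba));
  return color;
}
```

The static assertions capture why such a type cannot evolve: adding or
reordering a field changes the size or the positions that existing
readers rely on.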

### Schemas

While schemas reduce some generality (you can't just read any data
without having its schema), they have a lot of upsides:

-   Most information about the format can be factored into the generated
    code, reducing memory needed to store data, and time to access it.

-   The strong typing of the data definitions means less error
    checking/handling at runtime (less can go wrong).

-   A schema enables us to access a buffer without parsing.

FlatBuffer schemas are fairly similar to those of the incumbent,
Protocol Buffers, and generally should be readable to those familiar
with the C family of languages. We chose to improve upon the features
offered by .proto files in the following ways:

-   Deprecation of fields instead of manual field id assignment.
    Extending an object in a .proto means hunting for a free slot among
    the numbers (preferring lower numbers since they have a more compact
    representation). Besides being inconvenient, it also makes removing
    fields problematic: either you keep the field, which does not make
    it obvious that it shouldn't be read or written anymore and still
    generates accessors, or you remove it and risk that old data using
    that field is still around by the time someone reuses the field id,
    with nasty consequences.

-   Differentiating between tables and structs (see above). Effectively
    all table fields are `optional`, and all struct fields are
    `required`.

-   Having a native vector type instead of `repeated`. This gives you a
    length without having to collect all items, and in the case of
    scalars provides for a more compact representation, and one that
    guarantees adjacency.

-   Having a native `union` type instead of using a series of `optional`
    fields, all of which must be checked individually.

-   Being able to define defaults for all scalars, instead of having to
    deal with their optionality at each access.

-   A parser that can deal with both schemas and data definitions (JSON
    compatible) uniformly.
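
The benefit of a native vector type can be sketched in C++ as well. The
layout assumed here (a 32-bit little-endian length followed directly by
contiguous 16-bit elements) and the helper names `VectorLength`,
`VectorAt` and `MakeExampleVector` are illustrative choices for this
example rather than a statement of the FlatBuffers encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: [u32 length][length x u16 elements], all little-endian,
// with the elements stored back to back.
inline uint32_t VectorLength(const std::vector<uint8_t>& buf,
                             std::size_t vec_pos) {
  assert(vec_pos + 4 <= buf.size());
  return static_cast<uint32_t>(buf[vec_pos]) |
         static_cast<uint32_t>(buf[vec_pos + 1]) << 8 |
         static_cast<uint32_t>(buf[vec_pos + 2]) << 16 |
         static_cast<uint32_t>(buf[vec_pos + 3]) << 24;
}

// Element `i` sits at a position computed from the index alone, so access
// does not require scanning the preceding elements.
inline uint16_t VectorAt(const std::vector<uint8_t>& buf, std::size_t vec_pos,
                         std::size_t i) {
  assert(i < VectorLength(buf, vec_pos));
  const std::size_t pos = vec_pos + 4 + 2 * i;
  assert(pos + 2 <= buf.size());
  return static_cast<uint16_t>(buf[pos] | (buf[pos + 1] << 8));
}

// Hypothetical buffer holding the vector [10, 20, 300].
inline std::vector<uint8_t> MakeExampleVector() {
  return {
      3,    0,    0, 0,  // length = 3
      10,   0,           // element 0
      20,   0,           // element 1
      0x2C, 0x01,        // element 2 = 0x012C = 300
  };
}
```

The length is available from a single read, and each element is found by
index arithmetic, which is what adjacency of scalars buys.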

<br>