Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to `0` /
`NULL`).

Each field is optional: It does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forwards
and backwards compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still
    read correctly, and give you the default value when read. Older code
    will simply ignore the new field.
    If you want to have flexibility to use any order for fields in your
    schema, you can manually assign ids (much like Protocol Buffers),
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply
    stop writing them into your data for almost the same effect.
    Additionally you can mark them as `deprecated` as in the example
    above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more
    (careful: this may break code!).

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this
topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the `Vec3` example). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

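
To make the "stored in-line, no virtual table" point concrete, here is a rough Python sketch of what the `Vec3` struct from the example amounts to in memory: three little-endian 32-bit floats and nothing else. The helper names `pack_vec3` and `unpack_vec3` are made up for this illustration and are not part of any FlatBuffers API.

```python
import struct

# Hypothetical stand-in for the Vec3 struct from the example schema:
# three little-endian 32-bit floats, with no per-field metadata.
VEC3_FORMAT = "<fff"


def pack_vec3(x, y, z):
    """Serialize the three coordinates back to back."""
    return struct.pack(VEC3_FORMAT, x, y, z)


def unpack_vec3(data, offset=0):
    """Read a Vec3 that sits in-line at `offset` inside a larger byte string."""
    return struct.unpack_from(VEC3_FORMAT, data, offset)


vec3_bytes = pack_vec3(1.0, 2.0, 3.0)
print(len(vec3_bytes))          # 12
print(unpack_vec3(vec3_bytes))  # (1.0, 2.0, 3.0)
```

Because every field always occupies its fixed slot, there is no room to leave a field out or to add one later, which is why structs cannot evolve the way tables can.
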
### Types

Built-in scalar types are:

-   8 bit: `byte` (`int8`), `ubyte` (`uint8`), `bool`

-   16 bit: `short` (`int16`), `ushort` (`uint16`)

-   32 bit: `int` (`int32`), `uint` (`uint32`), `float` (`float32`)

-   64 bit: `long` (`int64`), `ulong` (`uint64`), `double` (`float64`)

The type names in parentheses are aliases, so that for example
`uint8` can be used in place of `ubyte`, and `int32` can be used in
place of `int` without affecting code generation.

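
As a quick cross-check of the sizes listed above, the following Python sketch maps each scalar type name to a format code of the standard `struct` module and resolves aliases first. The dictionaries and the `scalar_size` helper are illustrative names only; nothing here is generated by `flatc`.

```python
import struct

# Schema scalar type -> struct format code (standard sizes, little-endian).
SCALAR_FORMATS = {
    "byte": "b", "ubyte": "B", "bool": "?",
    "short": "h", "ushort": "H",
    "int": "i", "uint": "I", "float": "f",
    "long": "q", "ulong": "Q", "double": "d",
}

# Alias -> canonical name, as listed in parentheses above.
ALIASES = {
    "int8": "byte", "uint8": "ubyte",
    "int16": "short", "uint16": "ushort",
    "int32": "int", "uint32": "uint", "float32": "float",
    "int64": "long", "uint64": "ulong", "float64": "double",
}


def scalar_size(type_name):
    """Return the size in bytes of a scalar type, accepting aliases."""
    canonical = ALIASES.get(type_name, type_name)
    return struct.calcsize("<" + SCALAR_FORMATS[canonical])


print(scalar_size("uint8"))    # 1
print(scalar_size("short"))    # 2
print(scalar_size("float32"))  # 4
print(scalar_size("ulong"))    # 8
```
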
Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported; instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

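
The `uint` to `int` case can be sketched in Python by writing four bytes as an unsigned value and reading the same bytes back as a signed one. `reinterpret_uint_as_int` is a hypothetical helper that stands in for the `reinterpret_cast` mentioned above.

```python
import struct


def reinterpret_uint_as_int(value):
    """Store `value` as a 32-bit unsigned int, then read the bytes as signed."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# High bit clear: old and new readers agree.
print(reinterpret_uint_as_int(100))         # 100

# High bit set: the same bytes now read as a negative number.
print(reinterpret_uint_as_int(0x80000000))  # -2147483648
```
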
### (Default) Values

Values are a sequence of digits. Values may be optionally followed by a decimal
point (`.`) and more digits, for float constants, or optionally prefixed by
a `-`. Floats may also be in scientific notation, optionally ending with an `e`
or `E`, followed by a `+` or `-` and more digits.

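
Read literally, the description above corresponds to a small regular expression. The Python sketch below is only a transcription of that paragraph, not the lexer `flatc` actually uses, which may accept more forms; `VALUE_PATTERN` and `looks_like_value` are names invented for the example.

```python
import re

# Optional "-", digits, optional ".digits", optional exponent with a sign,
# exactly as the paragraph above describes it.
VALUE_PATTERN = re.compile(r"-?\d+(\.\d+)?([eE][+-]\d+)?")


def looks_like_value(text):
    """Return True if `text` matches the literal form described above."""
    return VALUE_PATTERN.fullmatch(text) is not None


print(looks_like_value("150"))     # True
print(looks_like_value("-1.5"))    # True
print(looks_like_value("2.5E-3"))  # True
print(looks_like_value("1.5.2"))   # False
```
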
Only scalar values can have defaults; non-scalar (string/vector/table) fields
default to `NULL` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below) but are generated in code,
so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

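
The effect of changing a default can be seen in a toy model in which a "buffer" is just a dictionary of the fields that were actually written. This is a simplification for illustration, not the real wire format, and `write_table` and `read_field` are hypothetical helpers.

```python
def write_table(values, defaults):
    """Keep only the fields whose value differs from the writer's default."""
    return {name: value for name, value in values.items()
            if value != defaults[name]}


def read_field(stored, name, defaults):
    """Return the stored value, or the reader's default when it is absent."""
    return stored.get(name, defaults[name])


old_defaults = {"mana": 150, "hp": 100}
stored = write_table({"mana": 150, "hp": 80}, old_defaults)
print(stored)                                    # {'hp': 80}
print(read_field(stored, "mana", old_defaults))  # 150

# The same data read by code generated from a schema with a new default:
new_defaults = {"mana": 200, "hp": 100}
print(read_field(stored, "mana", new_defaults))  # 200
```

The writer meant `150`, but since that value was never stored, a reader built with the changed default silently sees `200`.
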
### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.

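
The numbering rule and the need to cope with unknown values can be sketched as follows. `assign_enum_values` and `enum_name` are hypothetical helpers written for this example, and the `"<unknown>"` fallback is just one possible policy for values added by a newer schema.

```python
def assign_enum_values(entries):
    """Number enum entries; `entries` is a list of (name, value_or_None)."""
    values = {}
    next_value = 0  # the default first value
    for name, explicit in entries:
        value = next_value if explicit is None else explicit
        values[name] = value
        next_value = value + 1
    return values


def enum_name(values, number, fallback="<unknown>"):
    """Look up a name, tolerating numbers this schema version doesn't know."""
    for name, value in values.items():
        if value == number:
            return name
    return fallback


color = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])
print(color)                # {'Red': 1, 'Green': 2, 'Blue': 3}
print(enum_name(color, 2))  # Green
print(enum_name(color, 7))  # <unknown>
```
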
### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a hidden field with the suffix `_type` is generated that
holds the corresponding enum value, allowing you to know which type to
cast to at runtime.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, in C++ only, for a vector of unions
(and types). In the example IDL file above, use `[Any]` to add a
vector of `Any` to the `Monster` table.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.

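
For readers who want to see what such a check boils down to, here is a minimal Python sketch that inspects bytes 4-7 directly. `buffer_has_identifier` is a hypothetical helper, not the generated `MonsterBufferHasIdentifier`, and the 8-byte header is fabricated for illustration rather than being a complete FlatBuffer.

```python
import struct

IDENTIFIER_OFFSET = 4  # bytes 0-3 hold the offset to the root table
IDENTIFIER_LENGTH = 4


def buffer_has_identifier(buf, identifier):
    """Return True if bytes 4-7 of `buf` spell out `identifier`."""
    if len(identifier) != IDENTIFIER_LENGTH:
        raise ValueError("identifier must be exactly 4 characters long")
    end = IDENTIFIER_OFFSET + IDENTIFIER_LENGTH
    return len(buf) >= end and buf[IDENTIFIER_OFFSET:end] == identifier.encode("ascii")


# Fabricated header: a 32-bit root offset followed by the identifier.
header = struct.pack("<I", 8) + b"MYFI"

print(buffer_has_identifier(header, "MYFI"))  # True
print(buffer_has_identifier(header, "OTHR"))  # False
```
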
Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### RPC interface declarations

You can declare RPC calls in a schema, which define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used. There is preliminary support for gRPC through the `--grpc` code
generator; see `grpc/tests` for an example.

### Comments & documentation

Comments may be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore, code should stop using this data. Old data may still contain this
    field, but it won't be accessible anymore by newer code. Note that if you
    deprecate a field that was previously required, old code may fail to
    validate new data (when using the optional verifier).
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will trigger an assert, and also
    the verifier will fail on buffers that have missing required fields. Note
    that if you add this attribute to an existing field, this will only be
    valid if existing data always contains this field / existing code always
    writes this field.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
-   `bit_flags` (on an enum): the values of this field indicate bits,
    meaning that any value N specified in the schema will end up
    representing 1<<N, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flatbuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.
-   `flexbuffer` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flexbuffer data. The generated
    code will then produce a convenient accessor for the FlexBuffer root.
-   `key` (on a field): this field is meant to be used as a key when sorting
    a vector of the type of table it sits in. Can be used for in-place
    binary search.
-   `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
    value during JSON parsing is allowed to be a string, which will then be
    stored as its hash. The value of the attribute is the hashing algorithm to
    use, one of `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
    There should generally not be any reason to use this flag.
-   `native_*`: several attributes have been added to support the [C++ object
    based API](@ref flatbuffers_cpp_object_based_api). All such attributes
    are prefixed with the term `native_`.

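
To give a feel for what the `hash` attribute stores, here is a small Python sketch of the 32-bit FNV-1 and FNV-1a functions, using the standard FNV offset basis and prime. It assumes the string is hashed as its UTF-8 bytes; whether `flatc` treats every input exactly this way is not something this document states, so treat the sketch as an approximation of the idea rather than a reference implementation.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193
MASK32 = 0xFFFFFFFF


def fnv1a_32(text):
    """FNV-1a: XOR each byte in, then multiply by the prime."""
    value = FNV32_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        value ^= byte
        value = (value * FNV32_PRIME) & MASK32
    return value


def fnv1_32(text):
    """FNV-1: multiply by the prime, then XOR each byte in."""
    value = FNV32_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        value = (value * FNV32_PRIME) & MASK32
        value ^= byte
    return value


print(hex(fnv1a_32("a")))  # 0xe40c292c
print(hex(fnv1_32("a")))   # 0x50c5d7e
```
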
## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though it can be made
    to output them quoted using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
-   Similarly, for unions, these need to be specified with two fields much like
    you do when serializing from code. E.g. for a field `foo`, you must
    add a field `foo_type: FooOne` right before the `foo` field, where
    `FooOne` would be the table out of the union you want to use.
-   A field that has the value `null` (e.g. `field: null`) is intended to
    have the default value for that field (thus has the same effect as if
    that field wasn't specified at all).
-   It has some built in conversion functions, so you can write for example
    `rad(180)` wherever you'd normally write `3.14159`.
    Currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
    `tan`, `acos`, `asin`, `atan`.

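
The handling of flag enums in the list above can be sketched in Python: each symbolic name maps to a bit index, the `bit_flags` rule turns index N into `1<<N`, and the names in the string are OR-ed together. The enum `Abilities` and its members are invented for this example, and `parse_flags` is a hypothetical helper rather than the parser's real code.

```python
def parse_flags(text, enum_name, bit_indices):
    """OR together the symbolic names in a space-separated flags string."""
    result = 0
    for token in text.split():
        prefix, _, short_name = token.rpartition(".")
        if prefix and prefix != enum_name:
            raise ValueError(f"unexpected enum prefix in {token!r}")
        result |= 1 << bit_indices[short_name]
    return result


# Hypothetical schema: enum Abilities : ubyte (bit_flags) { Fly, Swim, Dig }
abilities = {"Fly": 0, "Swim": 1, "Dig": 2}

print(parse_flags("Fly Dig", "Abilities", abilities))                       # 5
print(parse_flags("Abilities.Fly Abilities.Swim", "Abilities", abilities))  # 3
```
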
When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit Unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
    not in the JSON spec (see http://json.org/), but is needed to be able to
    encode arbitrary binary in strings to text and back without losing
    information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of its flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields only few of which are actually
used is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union. Unions do have a cost however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because optional fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.

    423 
    424 ### Style guide
    425 
    426 Identifiers in a schema are meant to translate to many different programming
    427 languages, so using the style of your "main" language is generally a bad idea.
    428 
    429 For this reason, below is a suggested style guide to adhere to, to keep schemas
    430 consistent for interoperation regardless of the target language.
    431 
    432 Where possible, the code generators for specific languages will generate
    433 identifiers that adhere to the language style, based on the schema identifiers.
    434 
    435 - Table, struct, enum and rpc names (types): UpperCamelCase.
    436 - Table and struct field names: snake_case. This is translated to lowerCamelCase
    437   automatically for some languages, e.g. Java.
    438 - Enum values: UpperCamelCase.
    439 - namespaces: UpperCamelCase.
    440 
    441 Formatting (this is less important, but still worth adhering to):
    442 
    443 - Opening brace: on the same line as the start of the declaration.
    444 - Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.
    445 
    446 For an example, see the schema at the top of this file.
    447 
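
The snake_case to lowerCamelCase translation mentioned in the list above is simple enough to sketch. `snake_to_lower_camel` is a hypothetical helper that only shows the general idea; the real code generators may treat edge cases (digits, repeated underscores, reserved words) differently.

```python
def snake_to_lower_camel(name):
    """Turn a snake_case identifier into lowerCamelCase."""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


print(snake_to_lower_camel("hp"))                 # hp
print(snake_to_lower_camel("hit_points"))         # hitPoints
print(snake_to_lower_camel("max_hit_points"))     # maxHitPoints
```
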
## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed. We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (it will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible. Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here. If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Any older data written that had 0 values were not written to
the buffer, and rely on the default value to be recreated. These will now have
those values appear as `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.

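
The reordering cases above can be replayed with a toy model in which data is stored by field id and ids come either from declaration order or from explicit `id` values. This sketch deliberately ignores the rule that a union field consumes two ids, and `assign_ids` and `read` are hypothetical helpers, not part of FlatBuffers.

```python
def assign_ids(fields):
    """Map field name -> id; `fields` is a list of (name, explicit_id_or_None)."""
    explicit = [field_id for _, field_id in fields if field_id is not None]
    if explicit and len(explicit) != len(fields):
        raise ValueError("`id` must be used on all fields or on none")
    if not explicit:
        # Without explicit ids, declaration order decides.
        return {name: index for index, (name, _) in enumerate(fields)}
    if sorted(explicit) != list(range(len(fields))):
        raise ValueError("ids must form a contiguous range starting at 0")
    return dict(fields)


def read(stored_by_id, ids, name, default=0):
    """Read a field through the reader's own name -> id mapping."""
    return stored_by_id.get(ids[name], default)


original = assign_ids([("a", None), ("b", None)])
old_data = {original["a"]: 10, original["b"]: 20}

# table { c:int; a:int; b:int; } shifts every id by one.
reordered = assign_ids([("c", None), ("a", None), ("b", None)])
print(read(old_data, reordered, "a"))  # 20

# table { c:int (id: 2); a:int (id: 0); b:int (id: 1); } keeps the old ids.
with_ids = assign_ids([("c", 2), ("a", 0), ("b", 1)])
print(read(old_data, with_ids, "a"))   # 10
print(read(old_data, with_ids, "c"))   # 0
```

In the reordered schema, asking for `a` returns the value that was written as `b`, which is exactly the incompatibility described above; with explicit ids the old data still reads correctly and the new field falls back to its default.
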
### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

In FlatBuffers, this also holds for everything except scalar values.

FlatBuffers by default will not write fields that are equal to the default
value (for scalars), sometimes resulting in significant space savings.

However, this also means testing whether a field is "present" is somewhat
meaningless, since it does not tell you if the field was actually written by
calling `add_field` style calls, unless you're only interested in this
information for non-default values.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this behavior, and writes fields even if they are equal to
the default. You can then use `IsFieldPresent` to query this.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. The cool thing
is that structs don't take up any more space than the scalar they represent.

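
The difference `force_defaults` makes can be illustrated with a toy builder. `ToyBuilder` and `is_field_present` are stand-ins invented for this sketch; they mimic the behavior described above and are not the real `FlatBufferBuilder` or `IsFieldPresent`.

```python
class ToyBuilder:
    """Toy stand-in for a builder that may skip default-valued scalars."""

    def __init__(self, defaults, force_defaults=False):
        self.defaults = defaults
        self.force_defaults = force_defaults
        self.stored = {}

    def add_field(self, name, value):
        # Default-valued scalars are skipped unless force_defaults is on.
        if self.force_defaults or value != self.defaults[name]:
            self.stored[name] = value


def is_field_present(stored, name):
    """Presence only means that the value was physically written."""
    return name in stored


plain = ToyBuilder({"hp": 100})
plain.add_field("hp", 100)
print(is_field_present(plain.stored, "hp"))   # False

forced = ToyBuilder({"hp": 100}, force_defaults=True)
forced.add_field("hp", 100)
print(is_field_present(forced.stored, "hp"))  # True
```
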
   [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language