Writing a schema    {#flatbuffers_guide_writing_schema}
================

The syntax of the schema language (aka IDL, [Interface Definition Language][])
should look quite familiar to users of any of the C family of
languages, and also to users of other IDLs. Let's look at an example
first:

    // example IDL file

    namespace MyGame;

    attribute "priority";

    enum Color : byte { Red = 1, Green, Blue }

    union Any { Monster, Weapon, Pickup }

    struct Vec3 {
      x:float;
      y:float;
      z:float;
    }

    table Monster {
      pos:Vec3;
      mana:short = 150;
      hp:short = 100;
      name:string;
      friendly:bool = false (deprecated, priority: 1);
      inventory:[ubyte];
      color:Color = Blue;
      test:Any;
    }

    root_type Monster;

(`Weapon` and `Pickup` are not defined as part of this example.)

### Tables

Tables are the main way of defining objects in FlatBuffers, and consist
of a name (here `Monster`) and a list of fields. Each field has a name,
a type, and optionally a default value (if omitted, it defaults to `0` /
`NULL`).

Each field is optional: it does not have to appear in the wire
representation, and you can choose to omit fields for each individual
object. As a result, you have the flexibility to add fields without fear of
bloating your data. This design is also FlatBuffers' mechanism for forward
and backward compatibility. Note that:

-   You can add new fields in the schema ONLY at the end of a table
    definition. Older data will still
    read correctly, and give you the default value when read. Older code
    will simply ignore the new field.
    If you want to have flexibility to use any order for fields in your
    schema, you can manually assign ids (much like Protocol Buffers),
    see the `id` attribute below.

-   You cannot delete fields you don't use anymore from the schema,
    but you can simply
    stop writing them into your data for almost the same effect.
    Additionally you can mark them as `deprecated` as in the example
    above, which will prevent the generation of accessors in the
    generated C++, as a way to enforce the field not being used any more.
    (Careful: this may break code!)

-   You may change field names and table names, if you're ok with your
    code breaking until you've renamed them there too.

See "Schema evolution examples" below for more on this
topic.

### Structs

Similar to a table, only now none of the fields are optional (so no defaults
either), and fields may not be added or be deprecated. Structs may only contain
scalars or other structs. Use this for
simple objects where you are very sure no changes will ever be made
(as is quite clear in the example `Vec3`). Structs use less memory than
tables and are even faster to access (they are always stored in-line in their
parent object, and use no virtual table).

### Types

Built-in scalar types are:

-   8 bit: `byte` (`int8`), `ubyte` (`uint8`), `bool`

-   16 bit: `short` (`int16`), `ushort` (`uint16`)

-   32 bit: `int` (`int32`), `uint` (`uint32`), `float` (`float32`)

-   64 bit: `long` (`int64`), `ulong` (`uint64`), `double` (`float64`)

The type names in parentheses are alias names such that for example
`uint8` can be used in place of `ubyte`, and `int32` can be used in
place of `int` without affecting code generation.

Built-in non-scalar types:

-   Vector of any other type (denoted with `[type]`). Nesting vectors
    is not supported; instead you can wrap the inner vector in a table.

-   `string`, which may only hold UTF-8 or 7-bit ASCII. For other text encodings
    or general binary data use vectors (`[byte]` or `[ubyte]`) instead.

-   References to other tables or structs, enums or unions (see
    below).

You can't change types of fields once they're used, with the exception
of same-size data where a `reinterpret_cast` would give you a desirable result,
e.g. you could change a `uint` to an `int` if no values in current data use the
high bit yet.

### (Default) Values

Values are a sequence of digits. Values may be optionally followed by a decimal
point (`.`) and more digits, for float constants, or optionally prefixed by
a `-`. Floats may also be in scientific notation, optionally ending with an `e`
or `E`, followed by a `+` or `-` and more digits.

Only scalar values can have defaults; non-scalar (string/vector/table) fields
default to `NULL` when not present.

You generally do not want to change default values after they're initially
defined. Fields that have the default value are not actually stored in the
serialized data (see also Gotchas below) but are generated in code,
so when you change the default, you'd
now get a different value than from code generated from an older version of
the schema. There are situations, however, where this may be
desirable, especially if you can ensure a simultaneous rebuild of
all code.

### Enums

Define a sequence of named constants, each with a given value, or
increasing by one from the previous one. The default first value
is `0`. As you can see in the enum declaration, you specify the underlying
integral type of the enum with `:` (in this case `byte`), which then determines
the type of any fields declared with this enum type.

Only integer types are allowed, i.e. `byte`, `ubyte`, `short`, `ushort`, `int`,
`uint`, `long` and `ulong`.

Typically, enum values should only ever be added, never removed (there is no
deprecation for enums). This requires code to handle forwards compatibility
itself, by handling unknown enum values.
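
As a rough illustration of handling unknown enum values on the reading side, the following Python sketch mirrors the `Color` enum from the example schema. The `decode_color` helper and its `fallback` parameter are hypothetical names chosen for this sketch, not part of any FlatBuffers API; generated code in your target language may expose enum values differently.

```python
from enum import IntEnum
from typing import Optional


class Color(IntEnum):
    # Mirrors `enum Color : byte { Red = 1, Green, Blue }` from the example schema.
    Red = 1
    Green = 2
    Blue = 3


def decode_color(raw: int, fallback: Optional[Color] = None) -> Optional[Color]:
    """Map a raw stored value to Color, tolerating values added by newer schemas."""
    try:
        return Color(raw)
    except ValueError:
        # Hypothetical policy: an unknown value yields the fallback instead of raising.
        return fallback
```

Whether to fall back, skip the object, or report an error is an application decision; the point is only that the reader must not assume every stored value is one it knows.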

### Unions

Unions share a lot of properties with enums, but instead of new names
for constants, you use names of tables. You can then declare
a union field, which can hold a reference to any of those types, and
additionally a field with the suffix `_type` is generated that holds
the corresponding enum value, allowing you to know which type to cast
to at runtime.

It's possible to give an alias name to a type in a union. This way a type can
even be used to mean different things depending on the name used:

    table PointPosition { x:uint; y:uint; }
    table MarkerPosition {}
    union Position {
      Start:MarkerPosition,
      Point:PointPosition,
      Finish:MarkerPosition
    }

Unions contain a special `NONE` marker to denote that no value is stored, so
that name cannot be used as an alias.

Unions are a good way to be able to send multiple message types as a FlatBuffer.
Note that because a union field is really two fields, it must always be
part of a table; it cannot be the root of a FlatBuffer by itself.

If you have a need to distinguish between different FlatBuffers in a more
open-ended way, for example for use as files, see the file identification
feature below.

There is experimental support, in C++ only, for a vector of unions
(and types). In the example IDL file above, use `[Any]` to add a
vector of `Any` to the `Monster` table.

### Namespaces

These will generate the corresponding namespace in C++ for all helper
code, and packages in Java. You can use `.` to specify nested namespaces /
packages.

### Includes

You can include other schema files in your current one, e.g.:

    include "mydefinitions.fbs";

This makes it easier to refer to types defined elsewhere. `include`
automatically ensures each file is parsed just once, even when referred to
more than once.

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

### Root type

This declares what you consider to be the root table (or struct) of the
serialized data. This is particularly important for parsing JSON data,
which doesn't include object type information.

### File identification and extension

Typically, a FlatBuffer binary buffer is not self-describing, i.e. it
needs you to know its schema to parse it correctly. But if you
want to use a FlatBuffer as a file format, it would be convenient
to be able to have a "magic number" in there, like most file formats
have, to be able to do a sanity check to see if you're reading the
kind of file you're expecting.

Now, you can always prefix a FlatBuffer with your own file header,
but FlatBuffers has a built-in way to add an identifier to a
FlatBuffer that takes up minimal space, and keeps the buffer
compatible with buffers that don't have such an identifier.

You can specify in a schema, similar to `root_type`, that you intend
for this type of FlatBuffer to be used as a file format:

    file_identifier "MYFI";

Identifiers must always be exactly 4 characters long. These 4 characters
will end up as bytes at offsets 4-7 (inclusive) in the buffer.

For any schema that has such an identifier, `flatc` will automatically
add the identifier to any binaries it generates (with `-b`),
and generated calls like `FinishMonsterBuffer` also add the identifier.
If you have specified an identifier and wish to generate a buffer
without one, you can always still do so by calling
`FlatBufferBuilder::Finish` explicitly.

After loading a buffer, you can use a call like
`MonsterBufferHasIdentifier` to check if the identifier is present.
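
The byte layout described above can be checked by hand. The following Python sketch is only an illustration of that layout, assuming a plain buffer whose identifier sits at offsets 4-7; `has_identifier` is a hypothetical helper written for this example, and in practice you would normally use the generated call for your language.

```python
def has_identifier(buf: bytes, identifier: str) -> bool:
    """Return True if bytes 4-7 (inclusive) of buf equal the 4-character identifier."""
    expected = identifier.encode("ascii")
    if len(expected) != 4:
        raise ValueError("file identifiers must be exactly 4 characters long")
    # Bytes 0-3 are not interpreted here; only offsets 4-7 are compared.
    return len(buf) >= 8 and buf[4:8] == expected
```

For example, `has_identifier(data, "MYFI")` on the contents of a loaded file gives a cheap sanity check before any further parsing is attempted.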

Note that this is best for open-ended uses such as files. If you simply wanted
to send one of a set of possible messages over a network for example, you'd
be better off with a union.

Additionally, by default `flatc` will output binary files as `.bin`.
This declaration in the schema will change that to whatever you want:

    file_extension "ext";

### RPC interface declarations

You can declare RPC calls in a schema, that define a set of functions
that take a FlatBuffer as an argument (the request) and return a FlatBuffer
as the response (both of which must be table types):

    rpc_service MonsterStorage {
      Store(Monster):StoreResponse;
      Retrieve(MonsterId):Monster;
    }

What code this produces and how it is used depends on the language and RPC
system used; there is preliminary support for gRPC through the `--grpc` code
generator, see `grpc/tests` for an example.

### Comments & documentation

May be written as in most C-based languages. Additionally, a triple
comment (`///`) on a line by itself signals that a comment is documentation
for whatever is declared on the line after it
(table/struct/field/enum/union/element), and the comment is output
in the corresponding C++ code. Multiple such lines per item are allowed.

### Attributes

Attributes may be attached to a declaration, behind a field, or after
the name of a table/struct/enum/union. These may either have a value or
not. Some attributes like `deprecated` are understood by the compiler;
user defined ones need to be declared with the attribute declaration
(like `priority` in the example above), and are
available to query if you parse the schema at runtime.
This is useful if you write your own code generators/editors etc., and
you wish to add additional information specific to your tool (such as a
help text).

Currently understood attributes:

-   `id: n` (on a table field): manually set the field identifier to `n`.
    If you use this attribute, you must use it on ALL fields of this table,
    and the numbers must be a contiguous range from 0 onwards.
    Additionally, since a union type effectively adds two fields, its
    id must be that of the second field (the first field is the type
    field and not explicitly declared in the schema).
    For example, if the last field before the union field had id 6,
    the union field should have id 8, and the union's type field will
    implicitly be 7.
    IDs allow the fields to be placed in any order in the schema.
    When a new field is added to the schema it must use the next available ID.
-   `deprecated` (on a field): do not generate accessors for this field
    anymore, code should stop using this data. Old data may still contain this
    field, but it won't be accessible anymore by newer code. Note that if you
    deprecate a field that was previously required, old code may fail to
    validate new data (when using the optional verifier).
-   `required` (on a non-scalar table field): this field must always be set.
    By default, all fields are optional, i.e. may be left out. This is
    desirable, as it helps with forwards/backwards compatibility, and
    flexibility of data structures. It is also a burden on the reading code,
    since for non-scalar fields it requires you to check against NULL and
    take appropriate action. By specifying this attribute, you force code that
    constructs FlatBuffers to ensure this field is initialized, so the reading
    code may access it directly, without checking for NULL. If the constructing
    code does not initialize this field, it will get an assert, and also
    the verifier will fail on buffers that have missing required fields.
    Note that if you add this attribute to an existing field, this will only be
    valid if existing data always contains this field / existing code always
    writes this field.
-   `force_align: size` (on a struct): force the alignment of this struct
    to be something higher than what it is naturally aligned to. Causes
    these structs to be aligned to that amount inside a buffer, IF that
    buffer is allocated with that alignment (which is not necessarily
    the case for buffers accessed directly inside a `FlatBufferBuilder`).
    Note: currently not guaranteed to have an effect when used with
    `--object-api`, since that may allocate objects at alignments less than
    what you specify with `force_align`.
-   `bit_flags` (on an enum): the values of this enum indicate bits,
    meaning that any value N specified in the schema will end up
    representing `1<<N`, or if you don't specify values at all, you'll get
    the sequence 1, 2, 4, 8, ...
-   `nested_flatbuffer: "table_name"` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flatbuffer data, for which the
    root type is given by `table_name`. The generated code will then produce
    a convenient accessor for the nested FlatBuffer.
-   `flexbuffer` (on a field): this indicates that the field
    (which must be a vector of ubyte) contains flexbuffer data. The generated
    code will then produce a convenient accessor for the FlexBuffer root.
-   `key` (on a field): this field is meant to be used as a key when sorting
    a vector of the type of table it sits in. Can be used for in-place
    binary search.
-   `hash` (on a field): this is an (un)signed 32/64 bit integer field, whose
    value during JSON parsing is allowed to be a string, which will then be
    stored as its hash. The value of the attribute is the hashing algorithm to
    use, one of `fnv1_32`, `fnv1_64`, `fnv1a_32`, `fnv1a_64`.
-   `original_order` (on a table): since elements in a table do not need
    to be stored in any particular order, they are often optimized for
    space by sorting them by size. This attribute stops that from happening.
    There should generally not be any reason to use this flag.
-   `native_*`: several attributes have been added to support the [C++ object
    based API](@ref flatbuffers_cpp_object_based_api). All such attributes
    are prefixed with the term "native_".


## JSON Parsing

The same parser that parses the schema declarations above is also able
to parse JSON objects that conform to this schema. So, unlike other JSON
parsers, this parser is strongly typed, and parses directly into a FlatBuffer
(see the compiler documentation on how to do this from the command line, or
the C++ documentation on how to do this at runtime).

Besides needing a schema, there are a few other changes to how it parses
JSON:

-   It accepts field names with and without quotes, like many JSON parsers
    already do. It outputs them without quotes as well, though it can be made
    to output them with quotes using the `strict_json` flag.
-   If a field has an enum type, the parser will recognize symbolic enum
    values (with or without quotes) instead of numbers, e.g.
    `field: EnumVal`. If a field is of integral type, you can still use
    symbolic names, but values need to be prefixed with their type and
    need to be quoted, e.g. `field: "Enum.EnumVal"`. For enums
    representing flags, you may place multiple inside a string
    separated by spaces to OR them, e.g.
    `field: "EnumVal1 EnumVal2"` or `field: "Enum.EnumVal1 Enum.EnumVal2"`.
-   Similarly, for unions, these need to be specified with two fields much like
    you do when serializing from code. E.g. for a field `foo`, you must
    add a field `foo_type: FooOne` right before the `foo` field, where
    `FooOne` would be the table out of the union you want to use.
-   A field that has the value `null` (e.g. `field: null`) is intended to
    have the default value for that field (thus has the same effect as if
    that field wasn't specified at all).
-   It has some built in conversion functions, so you can write for example
    `rad(180)` wherever you'd normally write `3.14159`.
    Currently supports the following functions: `rad`, `deg`, `cos`, `sin`,
    `tan`, `acos`, `asin`, `atan`.

When parsing JSON, it recognizes the following escape codes in strings:

-   `\n` - linefeed.
-   `\t` - tab.
-   `\r` - carriage return.
-   `\b` - backspace.
-   `\f` - form feed.
-   `\"` - double quote.
-   `\\` - backslash.
-   `\/` - forward slash.
-   `\uXXXX` - 16-bit unicode code point, converted to the equivalent UTF-8
    representation.
-   `\xXX` - 8-bit binary hexadecimal number XX. This is the only one that is
    not in the JSON spec (see http://json.org/), but is needed to be able to
    encode arbitrary binary in strings to text and back without losing
    information (e.g. the byte 0xFF can't be represented in standard JSON).

It also generates these escape codes back again when generating JSON from a
binary representation.

When parsing numbers, the parser is more flexible than JSON.
The format of numeric literals is closer to that of C/C++.
According to the [grammar](@ref flatbuffers_grammar), it accepts the following
numerical literals:

-   An integer literal can have any number of leading zero `0` digits.
    Unlike C/C++, the parser ignores a leading zero, not interpreting it as the
    beginning of an octal number.
    The numbers `[081, -00094]` are equal to `[81, -94]` decimal integers.
-   The parser accepts unsigned and signed hexadecimal integer numbers.
    For example: `[0x123, +0x45, -0x67]` are equal to `[291, 69, -103]` decimals.
-   The format of floating-point numbers is fully compatible with C/C++ format.
    If a modern C++ compiler is used the parser accepts hexadecimal and special
    floating-point literals as well:
    `[-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan]`.
    The exponent suffix of a hexadecimal floating-point number is mandatory.

    Extended floating-point support was tested with:
    - x64 Windows: `MSVC2015` and higher.
    - x64 Linux: `LLVM 6.0`, `GCC 4.9` and higher.

-   For compatibility with a JSON lint tool all numeric literals of scalar
    fields can be wrapped in a quoted string:
    `"1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true"`.

## Guidelines

### Efficiency

FlatBuffers is all about efficiency, but to realize that efficiency you
require an efficient schema. There are usually multiple choices on
how to represent data that have vastly different size characteristics.

It is very common nowadays to represent any kind of data as dictionaries
(as in e.g. JSON), because of its flexibility and extensibility. While
it is possible to emulate this in FlatBuffers (as a vector
of tables with key and value(s)), this is a bad match for a strongly
typed system like FlatBuffers, leading to relatively large binaries.
FlatBuffer tables are more flexible than classes/structs in most systems,
since having a large number of fields only a few of which are actually
used is still efficient. You should thus try to organize your data
as much as possible such that you can use tables where you might be
tempted to use a dictionary.

Similarly, strings as values should only be used when they are
truly open-ended. If you can, always use an enum instead.

FlatBuffers doesn't have inheritance, so the way to represent a set
of related data structures is a union.
Unions do have a cost however,
so an alternative to a union is to have a single table that has
all the fields of all the data structures you are trying to
represent, if they are relatively similar / share many fields.
Again, this is efficient because optional fields are cheap.

FlatBuffers supports the full range of integer sizes, so try to pick
the smallest size needed, rather than defaulting to int/long.

Remember that you can share data (refer to the same string/table
within a buffer), so factoring out repeating data into its own
data structure may be worth it.

### Style guide

Identifiers in a schema are meant to translate to many different programming
languages, so using the style of your "main" language is generally a bad idea.

For this reason, below is a suggested style guide to adhere to, to keep schemas
consistent for interoperation regardless of the target language.

Where possible, the code generators for specific languages will generate
identifiers that adhere to the language style, based on the schema identifiers.

- Table, struct, enum and rpc names (types): UpperCamelCase.
- Table and struct field names: snake_case. This is translated to lowerCamelCase
  automatically for some languages, e.g. Java.
- Enum values: UpperCamelCase.
- namespaces: UpperCamelCase.

Formatting (this is less important, but still worth adhering to):

- Opening brace: on the same line as the start of the declaration.
- Spacing: Indent by 2 spaces. None around `:` for types, on both sides for `=`.

For an example, see the schema at the top of this file.

## Gotchas

### Schemas and version control

FlatBuffers relies on new field declarations being added at the end, and earlier
declarations to not be removed, but be marked deprecated when needed.
We think
this is an improvement over the manual number assignment that happens in
Protocol Buffers (and which is still an option using the `id` attribute
mentioned above).

One place where this is possibly problematic however is source control. If user
A adds a field, generates new binary data with this new schema, then tries to
commit both to source control after user B already committed a new field also,
and just auto-merges the schema, the binary files are now invalid compared to
the new schema.

The solution of course is that you should not be generating binary data before
your schema changes have been committed, ensuring consistency with the rest of
the world. If this is not practical for you, use explicit field ids, which
should always generate a merge conflict if two people try to allocate the same
id.

### Schema evolution examples

Some examples to clarify what happens as you change a schema:

If we have the following original schema:

    table { a:int; b:int; }

And we extend it:

    table { a:int; b:int; c:int; }

This is ok. Code compiled with the old schema reading data generated with the
new one will simply ignore the presence of the new field. Code compiled with the
new schema reading old data will get the default value for `c` (which is 0
in this case, since it is not specified).

    table { a:int (deprecated); b:int; }

This is also ok. Code compiled with the old schema reading newer data will now
always get the default value for `a` since it is not present. Code compiled
with the new schema now cannot read nor write `a` anymore (any existing code
that tries to do so will result in compile errors), but can still read
old data (they will ignore the field).

    table { c:int; a:int; b:int; }

This is NOT ok, as this makes the schemas incompatible.
Old code reading newer
data will interpret `c` as if it was `a`, and new code reading old data
accessing `a` will instead receive `b`.

    table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }

This is ok. If your intent was to order/group fields in a way that makes sense
semantically, you can do so using explicit id assignment. Now we are compatible
with the original schema, and the fields can be ordered in any way, as long as
we keep the sequence of ids.

    table { b:int; }

NOT ok. We can only remove a field by deprecation, regardless of whether we use
explicit ids or not.

    table { a:uint; b:uint; }

This is MAYBE ok, and only in the case where the type change is the same size,
like here. If old data never contained any negative numbers, this will be
safe to do.

    table { a:int = 1; b:int = 2; }

Generally NOT ok. Any older data written that had 0 values were not written to
the buffer, and rely on the default value to be recreated. These will now have
those values appear as `1` and `2` instead. There may be cases in which this
is ok, but care must be taken.

    table { aa:int; bb:int; }

Occasionally ok. You've renamed fields, which will break all code (and JSON
files!) that use this schema, but as long as the change is obvious, this is not
incompatible with the actual binary buffers, since those only ever address
fields by id/offset.
<br>

### Testing whether a field is present in a table

Most serialization formats (e.g. JSON or Protocol Buffers) make it very
explicit in the format whether a field is present in an object or not,
allowing you to use this as "extra" information.

In FlatBuffers, this also holds for everything except scalar values.

FlatBuffers by default will not write fields that are equal to the default
value (for scalars), sometimes resulting in significant space savings.
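
The elision of default values can be pictured with a toy Python model. This is not the real wire format: a plain dict stands in for the stored fields, and `DEFAULTS`, `write_table` and `read_field` are hypothetical names used only in this sketch. The `force_defaults` parameter loosely imitates the builder option of the same name.

```python
# Defaults taken from the example Monster table (mana = 150, hp = 100).
DEFAULTS = {"mana": 150, "hp": 100}


def write_table(values: dict, force_defaults: bool = False) -> dict:
    """Keep only the scalar fields that would actually be stored."""
    stored = {}
    for name, value in values.items():
        # A value equal to the schema default is skipped unless forced.
        if force_defaults or value != DEFAULTS[name]:
            stored[name] = value
    return stored


def read_field(stored: dict, name: str) -> int:
    """Absent fields read back as the schema default."""
    return stored.get(name, DEFAULTS[name])


buf = write_table({"mana": 150, "hp": 80})
forced = write_table({"mana": 150, "hp": 80}, force_defaults=True)
```

In this model `mana` was set explicitly to its default of 150 and is still absent from `buf`, while `read_field(buf, "mana")` returns 150 either way.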

However, this also means testing whether a field is "present" is somewhat
meaningless, since it does not tell you if the field was actually written by
calling `add_field` style calls, unless you're only interested in this
information for non-default values.

Some `FlatBufferBuilder` implementations have an option called `force_defaults`
that circumvents this behavior, and writes fields even if they are equal to
the default. You can then use `IsFieldPresent` to query this.

Another option that works in all languages is to wrap a scalar field in a
struct. This way it will return null if it is not present. The cool thing
is that structs don't take up any more space than the scalar they represent.

   [Interface Definition Language]: https://en.wikipedia.org/wiki/Interface_description_language