======================
Nanopb: Basic concepts
======================

.. include :: menu.rst

This page outlines the underlying concepts of the nanopb design.

.. contents::

Proto files
===========
All Protocol Buffers implementations use .proto files to describe the message
format. The point of these files is to be a portable interface description
language.

Compiling .proto files for nanopb
---------------------------------
Nanopb uses Google's protoc compiler to parse the .proto file, and then a
Python script to generate the C header and source code from it::

    user@host:~$ protoc -omessage.pb message.proto
    user@host:~$ python ../generator/nanopb_generator.py message.pb
    Writing to message.h and message.c
    user@host:~$

Modifying generator behaviour
-----------------------------
Using generator options, you can set maximum sizes for fields in order to
allocate them statically. The preferred way to do this is to create an .options
file with the same name as your .proto file::

   # Foo.proto
   message Foo {
      required string name = 1;
   }

::

   # Foo.options
   Foo.name max_size:16

For more information on this, see the `Proto file options`_ section in the
reference manual.

.. _`Proto file options`: reference.html#proto-file-options

Streams
=======

Nanopb uses streams for accessing the data in encoded format.
The stream abstraction is very lightweight, and consists of a structure (*pb_ostream_t* or *pb_istream_t*) which contains a pointer to a callback function.

There are a few generic rules for callback functions:

#) Return false on IO errors. The encoding or decoding process will abort immediately.
#) Use *state* to store your own data, such as a file descriptor.
#) *bytes_written* and *bytes_left* are updated by pb_write and pb_read.
#) Your callback may be used with substreams. In this case *bytes_left*, *bytes_written* and *max_size* have smaller values than the original stream. Don't use these values to calculate pointers.
#) Always read or write the full requested length of data. For example, POSIX *recv()* needs the *MSG_WAITALL* parameter to accomplish this.
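
The last rule can be illustrated with a POSIX file descriptor, where *write()* may accept fewer bytes than requested. The following is a minimal sketch, not part of nanopb: *write_all* is a hypothetical helper that a stream callback could call, and it assumes a POSIX system.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not part of nanopb): keep calling write() until
 * all count bytes have been written, or a real error occurs. */
static bool write_all(int fd, const uint8_t *buf, size_t count)
{
    while (count > 0)
    {
        ssize_t n = write(fd, buf, count);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n <= 0)
            return false;       /* IO error: report failure (rule 1) */

        buf += n;               /* short write: advance and try again */
        count -= (size_t)n;
    }
    return true;
}
```

An output callback could then store the file descriptor behind *state* and return the result of *write_all* directly.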

Output streams
--------------

::

 struct _pb_ostream_t
 {
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
 };

The *callback* for an output stream may be NULL, in which case the stream simply counts the number of bytes written. In this case, *max_size* is ignored.

Otherwise, if *bytes_written* + bytes_to_be_written is larger than *max_size*, pb_write returns false before doing anything else. If you don't want to limit the size of the stream, pass SIZE_MAX.

**Example 1:**

This is the way to get the size of the message without storing it anywhere::

 Person myperson = ...;
 pb_ostream_t sizestream = {0};
 pb_encode(&sizestream, Person_fields, &myperson);
 printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout::

 bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*) stream->state;
    return fwrite(buf, 1, count, file) == count;
 }

 pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};
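
The counting and size-limit behaviour described in this section can be modelled in a few lines. This is a simplified sketch and not nanopb's actual implementation: *sketch_ostream_t*, *sketch_write* and *buffer_callback* are hypothetical stand-ins, and the overflow-safe form of the size check is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for pb_ostream_t so that the sketch compiles on its own. */
typedef struct sketch_ostream_s sketch_ostream_t;
struct sketch_ostream_s
{
    bool (*callback)(sketch_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Models the described write behaviour: a NULL callback only counts bytes,
 * otherwise the size limit is checked before the callback is invoked. */
static bool sketch_write(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    if (stream->callback != NULL)
    {
        /* Written as a subtraction so it cannot overflow with SIZE_MAX. */
        if (stream->bytes_written > stream->max_size ||
            count > stream->max_size - stream->bytes_written)
            return false;

        if (!stream->callback(stream, buf, count))
            return false;
    }

    stream->bytes_written += count;
    return true;
}

/* Example callback: append to a caller-provided buffer. The write position
 * lives in state, not in bytes_written (see the substream rule). */
static bool buffer_callback(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    uint8_t *dest = (uint8_t *)stream->state;
    memcpy(dest, buf, count);
    stream->state = dest + count;
    return true;
}
```

With a 4-byte limit, writing 3 bytes succeeds and a further 2-byte write is rejected without touching the buffer. A zero-initialized stream accepts any write and only advances *bytes_written*.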

Input streams
-------------
For input streams, there is one extra rule:

#) You don't need to know the length of the message in advance. After getting an EOF error when reading, set *bytes_left* to 0 and return false. *pb_decode* will detect this and, if the EOF was in a proper position, it will return true.

Here is the structure::

 struct _pb_istream_t
 {
    bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
 };

The *callback* must always be a function pointer. *bytes_left* is an upper limit on the number of bytes that will be read. You can use SIZE_MAX if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

::

 bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*)stream->state;
    bool status;

    if (buf == NULL)
    {
        /* Skip count bytes; stop early at EOF. */
        while (count > 0 && fgetc(file) != EOF)
            count--;
        return count == 0;
    }

    status = (fread(buf, 1, count, file) == count);

    if (feof(file))
        stream->bytes_left = 0;

    return status;
 }

 pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};
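
The same rules apply when the data comes from memory instead of a file. The sketch below is only an illustration with hypothetical stand-in types (*sketch_istream_t*, *buffer_state_t*, *buffer_read*); nanopb ships its own buffer helper, *pb_istream_from_buffer*, for real use. The read position is kept in *state* so that nothing depends on *bytes_left*, and a NULL *buf* is treated as a request to skip bytes, mirroring the stdin example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for pb_istream_t so that the sketch compiles on its own. */
typedef struct sketch_istream_s sketch_istream_t;
struct sketch_istream_s
{
    bool (*callback)(sketch_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
};

/* Hypothetical callback state: current read position and end of data. */
typedef struct
{
    const uint8_t *pos;
    const uint8_t *end;
} buffer_state_t;

static bool buffer_read(sketch_istream_t *stream, uint8_t *buf, size_t count)
{
    buffer_state_t *st = (buffer_state_t *)stream->state;

    if ((size_t)(st->end - st->pos) < count)
    {
        /* Not enough data left: signal EOF as described above. */
        stream->bytes_left = 0;
        return false;
    }

    if (buf != NULL)            /* buf == NULL: skip the bytes */
        memcpy(buf, st->pos, count);

    st->pos += count;           /* always consume the full length */
    return true;
}
```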

Data types
==========

Most Protocol Buffers datatypes have directly corresponding C datatypes: for example, int32 maps to int32_t, float to float and bool to bool. However, the variable-length datatypes are more complex:

1) Strings, bytes and repeated fields of any type map to callback functions by default.
2) If there is a special option *(nanopb).max_size* specified in the .proto file, string maps to null-terminated char array and bytes map to a structure containing a char array and a size field.
3) If *(nanopb).type* is set to *FT_INLINE* and *(nanopb).max_size* is also set, then bytes map to an inline byte array of fixed size.
4) If there is a special option *(nanopb).max_count* specified on a repeated field, it maps to an array of whatever type is being repeated. Another field will be created for the actual number of entries stored.

=============================================================================== =======================
      field in .proto                                                           autogenerated in .h
=============================================================================== =======================
required string name = 1;                                                       pb_callback_t name;
required string name = 1 [(nanopb).max_size = 40];                              char name[40];
repeated string name = 1 [(nanopb).max_size = 40];                              pb_callback_t name;
repeated string name = 1 [(nanopb).max_size = 40, (nanopb).max_count = 5];      | size_t name_count;
                                                                                | char name[5][40];
required bytes data = 1 [(nanopb).max_size = 40];                               | typedef struct {
                                                                                |    size_t size;
                                                                                |    pb_byte_t bytes[40];
                                                                                | } Person_data_t;
                                                                                | Person_data_t data;
required bytes data = 1 [(nanopb).max_size = 40, (nanopb).type = FT_INLINE];    | pb_byte_t data[40];
=============================================================================== =======================

The maximum lengths are checked at runtime. If a string, bytes field or array exceeds the allocated length, *pb_decode* will return false.

Note: for the *bytes* datatype, the field length checking may not be exact.
The compiler may add some padding to the *pb_bytes_t* structure, and the nanopb runtime doesn't know how much of the structure size is padding. Therefore it uses the whole length of the structure for storing data, which is not very smart but shouldn't cause problems. In practice, this means that if you specify *(nanopb).max_size=5* on a *bytes* field, you may be able to store 6 bytes there. For the *string* field type, the length limit is exact.

Field callbacks
===============
When a field has dynamic length, nanopb cannot statically allocate storage for it. Instead, it allows you to handle the field in whatever way you want, using a callback function.

The `pb_callback_t`_ structure contains a function pointer and a *void* pointer called *arg* you can use for passing data to the callback. If the function pointer is NULL, the field will be skipped. A pointer to the *arg* is passed to the function, so that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding and decoding modes. In encoding mode, the callback is called once and should write out everything, including field tags. In decoding mode, the callback is called repeatedly for every data item.

.. _`pb_callback_t`: reference.html#pb-callback-t

Encoding callbacks
------------------
::

    bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, void * const *arg);

When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.

Usually you can use `pb_encode_tag_for_field`_ to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ instead to specify a wire type of *PB_WT_STRING*.

If the callback is used in a submessage, it will be called multiple times during a single call to `pb_encode`_. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, it is called only once.

.. _`pb_encode`: reference.html#pb-encode
.. _`pb_encode_tag_for_field`: reference.html#pb-encode-tag-for-field
.. _`pb_encode_tag`: reference.html#pb-encode-tag

This callback writes out a dynamically sized string::

    bool write_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
    {
        char *str = get_string_from_somewhere();
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (uint8_t*)str, strlen(str));
    }
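
To make the wire-format terms used in this section concrete, the following sketch builds a packed repeated field by hand, without calling nanopb. It relies on the Protocol Buffers wire format, in which a tag is the varint ``(field_number << 3) | wire_type`` and a packed array uses the length-delimited wire type 2, the one *PB_WT_STRING* selects. The helpers *put_varint* and *put_packed* are hypothetical names for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Write value as a base-128 varint; returns the number of bytes written. */
static size_t put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80)
    {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Encode count values as one packed field: tag, payload length, values.
 * The caller must provide enough room (at most 10 bytes per varint). */
static size_t put_packed(uint8_t *out, uint32_t field_number,
                         const uint64_t *values, size_t count)
{
    uint8_t scratch[10];
    size_t payload = 0, n = 0, i;

    for (i = 0; i < count; i++)          /* first pass: measure the payload */
        payload += put_varint(scratch, values[i]);

    n += put_varint(out + n, ((uint64_t)field_number << 3) | 2);
    n += put_varint(out + n, payload);

    for (i = 0; i < count; i++)          /* second pass: write the values */
        n += put_varint(out + n, values[i]);

    return n;
}
```

The measuring pass shows why the length must be known before the data is written, which is one plausible reason a callback inside a submessage has to produce the same amount of data on every call.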

Decoding callbacks
------------------
::

    bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void **arg);

When decoding, the callback receives a length-limited substream that reads the contents of a single field. The field tag has already been read. For *string* and *bytes*, the length value has already been parsed, and is available at *stream->bytes_left*.

The callback will be called multiple times for repeated fields. For packed fields, you can either read multiple values until the stream ends, or leave it to `pb_decode`_ to call your function over and over until all values have been read.

.. _`pb_decode`: reference.html#pb-decode

This callback reads multiple integers and prints them::

    /* PRIu64 requires #include <inttypes.h> */
    bool read_ints(pb_istream_t *stream, const pb_field_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%" PRIu64 "\n", value);
        }
        return true;
    }

Field description array
=======================

To use the *pb_encode* and *pb_decode* functions, you need an array of pb_field_t constants describing the structure you wish to encode. This description is usually autogenerated from the .proto file.

For example, this submessage in the Person.proto file::

 message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
 }

generates this field description array for the structure *Person_PhoneNumber*::

 const pb_field_t Person_PhoneNumber_fields[3] = {
    PB_FIELD(  1, STRING  , REQUIRED, STATIC, Person_PhoneNumber, number, number, 0),
    PB_FIELD(  2, ENUM    , OPTIONAL, STATIC, Person_PhoneNumber, type, number, &Person_PhoneNumber_type_default),
    PB_LAST_FIELD
 };

Oneof
=====
Protocol Buffers supports `oneof`_ sections. Here is an example of ``oneof`` usage::

 message MsgType1 {
     required int32 value = 1;
 }

 message MsgType2 {
     required bool value = 1;
 }

 message MsgType3 {
     required int32 value1 = 1;
     required int32 value2 = 2;
 }

 message MyMessage {
     required uint32 uid = 1;
     required uint32 pid = 2;
     required uint32 utime = 3;

     oneof payload {
         MsgType1 msg1 = 4;
         MsgType2 msg2 = 5;
         MsgType3 msg3 = 6;
     }
 }

Nanopb will generate ``payload`` as a C union and add an additional field ``which_payload``::

  typedef struct _MyMessage {
    uint32_t uid;
    uint32_t pid;
    uint32_t utime;
    pb_size_t which_payload;
    union {
        MsgType1 msg1;
        MsgType2 msg2;
        MsgType3 msg3;
    } payload;
  /* @@protoc_insertion_point(struct:MyMessage) */
  } MyMessage;

``which_payload`` indicates which of the ``oneof`` fields is actually set.
The user is expected to set the field manually using the correct field tag::

  MyMessage msg = MyMessage_init_zero;
  msg.payload.msg2.value = true;
  msg.which_payload = MyMessage_msg2_tag;

Notice that neither the ``which_payload`` field nor the unused fields in ``payload``
will consume any space in the resulting encoded message.
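
On the receiving side, ``which_payload`` is typically inspected before touching the union. The sketch below is a hedged illustration: the type definitions and tag macros are hand-written stand-ins for the generated header so that the example compiles on its own, the tag values assume that each tag equals the field number from the .proto file, and *payload_summary* is a hypothetical function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the generated header (assumptions, see above). */
typedef uint_least16_t pb_size_t;

typedef struct { int32_t value; } MsgType1;
typedef struct { bool value; } MsgType2;
typedef struct { int32_t value1; int32_t value2; } MsgType3;

typedef struct {
    uint32_t uid;
    uint32_t pid;
    uint32_t utime;
    pb_size_t which_payload;
    union {
        MsgType1 msg1;
        MsgType2 msg2;
        MsgType3 msg3;
    } payload;
} MyMessage;

#define MyMessage_msg1_tag 4
#define MyMessage_msg2_tag 5
#define MyMessage_msg3_tag 6

/* Reduce whichever payload is set to a single integer; -1 if none is set. */
static int32_t payload_summary(const MyMessage *msg)
{
    switch (msg->which_payload)
    {
        case MyMessage_msg1_tag:
            return msg->payload.msg1.value;
        case MyMessage_msg2_tag:
            return msg->payload.msg2.value ? 1 : 0;
        case MyMessage_msg3_tag:
            return msg->payload.msg3.value1 + msg->payload.msg3.value2;
        default:
            return -1;          /* which_payload is 0: nothing was set */
    }
}
```

Only the union member selected by ``which_payload`` is read, since the other members share the same storage and hold no meaningful data.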

.. _`oneof`: https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field

Extension fields
================
Protocol Buffers supports a concept of `extension fields`_, which are
additional fields to a message, but defined outside the actual message.
The definition can even be in a completely separate .proto file.

The base message is declared as extensible by the keyword *extensions* in
the .proto file::

 message MyMessage {
     .. fields ..
     extensions 100 to 199;
 }

For each extensible message, *nanopb_generator.py* declares an additional
callback field called *extensions*. The field and the associated datatype
*pb_extension_t* form a linked list of handlers. When an unknown field is
encountered, the decoder calls each handler in turn until either one of them
handles the field, or the list is exhausted.

The actual extensions are declared using the *extend* keyword in the .proto,
and are in the global namespace::

 extend MyMessage {
     optional int32 myextension = 100;
 }

For each extension, *nanopb_generator.py* creates a constant of type
*pb_extension_type_t*. To link together the base message and the extension,
you have to:

1. Allocate storage for your field, matching the datatype in the .proto.
   For example, for an *int32* field, you need an *int32_t* variable to store
   the value.
2. Create a *pb_extension_t* constant, with pointers to your variable and
   to the generated *pb_extension_type_t*.
3. Set the *message.extensions* pointer to point to the *pb_extension_t*.

An example of this is available in *tests/test_encode_extensions.c* and
*tests/test_decode_extensions.c*.

.. _`extension fields`: https://developers.google.com/protocol-buffers/docs/proto#extensions

Message framing
===============
Protocol Buffers does not specify a method of framing the messages for transmission.
This is something that must be provided by the library user, as there is no one-size-fits-all
solution. Typical needs for a framing format are to:

1. Encode the message length.
2. Encode the message type.
3. Perform any synchronization and error checking that may be needed depending on application.

For example, UDP packets already fulfill all the requirements, and TCP streams typically only
need a way to identify the message length and type. Lower level interfaces such as serial ports
may need a more robust frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing formats:

1. Functions *pb_encode_delimited* and *pb_decode_delimited* prefix the message data with a varint-encoded length.
2. Union messages and oneofs are supported in order to implement top-level container messages.
3. Message IDs can be specified using the *(nanopb_msgopt).msgid* option and can then be accessed from the header.
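
The varint length prefix mentioned in the first helper can be sketched without nanopb, for instance to parse frames on a peer that does not use the library. This is an illustration under stated assumptions, not the library's implementation: *frame_message* and *parse_frame* are hypothetical helpers, and the parser limits the prefix to ``sizeof(size_t)`` bytes for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Prefix len payload bytes with a varint length. Returns the total frame
 * size, or 0 if out (capacity cap) is too small. */
static size_t frame_message(uint8_t *out, size_t cap,
                            const uint8_t *payload, size_t len)
{
    size_t n = 0;
    size_t rest = len;

    do
    {
        uint8_t byte = (uint8_t)(rest & 0x7F);
        rest >>= 7;
        if (rest != 0)
            byte |= 0x80;       /* more length bytes follow */
        if (n >= cap)
            return 0;
        out[n++] = byte;
    } while (rest != 0);

    if (len > cap - n)
        return 0;
    memcpy(out + n, payload, len);
    return n + len;
}

/* Parse the length prefix from avail received bytes. Returns true only if
 * the prefix is complete and the whole payload has already arrived. */
static bool parse_frame(const uint8_t *in, size_t avail,
                        size_t *payload_len, size_t *prefix_len)
{
    size_t value = 0, i;

    for (i = 0; i < avail && i < sizeof(size_t); i++)
    {
        value |= (size_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0)
        {
            *payload_len = value;
            *prefix_len = i + 1;
            return value <= avail - (i + 1);
        }
    }
    return false;               /* prefix incomplete or too long */
}
```

On a TCP stream, a receiver could call *parse_frame* on its accumulated bytes after every read and hand the payload to the decoder once it returns true.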

Return values and error handling
================================

Most functions in nanopb return bool: *true* means success, *false* means failure. There is also some support for error messages for debugging purposes: the error messages go in *stream->errmsg*.

The error messages help in identifying the underlying cause of the error. The most common error conditions are:

1) Running out of memory, i.e. stack overflow.
2) Invalid field descriptors (would usually mean a bug in the generator).
3) IO errors in your own stream callbacks.
4) Errors that happen in your callback functions.
5) Exceeding the max_size or bytes_left of a stream.
6) Exceeding the max_size of a string or array field.
7) Invalid protocol buffers binary message.
    393