// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package gob manages streams of gobs - binary values exchanged between an
Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
arguments and results of remote procedure calls (RPCs) such as those provided by
package "net/rpc".

The implementation compiles a custom codec for each data type in the stream and
is most efficient when a single Encoder is used to transmit a stream of values,
amortizing the cost of compilation.

Basics

A stream of gobs is self-describing.  Each data item in the stream is preceded by
a specification of its type, expressed in terms of a small set of predefined
types.  Pointers are not transmitted, but the things they point to are
transmitted; that is, the values are flattened. Nil pointers are not permitted,
as they have no value. Recursive types work fine, but
recursive values (data with cycles) are problematic.  This may change.

To use gobs, create an Encoder and present it with a series of data items as
values or addresses that can be dereferenced to values.  The Encoder makes sure
all type information is sent before it is needed.  At the receive side, a
Decoder retrieves values from the encoded stream and unpacks them into local
variables.
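
As a minimal sketch of that workflow (an illustration, not part of the package),
the program below round-trips one value through an in-memory buffer. The Point
type and the roundTrip helper are hypothetical names; a bytes.Buffer stands in
for the connection between the two sides.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical payload type used only for illustration.
type Point struct{ X, Y int }

// roundTrip encodes p into a buffer and decodes it into a fresh variable.
func roundTrip(p Point) (Point, error) {
	var buf bytes.Buffer // stands in for a network connection
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return Point{}, err
	}
	var q Point
	err := gob.NewDecoder(&buf).Decode(&q) // Decode needs a pointer
	return q, err
}

func main() {
	q, err := roundTrip(Point{22, 33})
	fmt.Println(q, err)
}
```

In a long-lived stream, the same Encoder and Decoder would be reused for every
value so that the type description is sent only once.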

Types and Values

The source and destination values/types need not correspond exactly.  For structs,
fields (identified by name) that are in the source but absent from the receiving
variable will be ignored.  Fields that are in the receiving variable but missing
from the transmitted type or value will be ignored in the destination.  If a field
with the same name is present in both, their types must be compatible. Both the
receiver and transmitter will do all necessary indirection and dereferencing to
convert between gobs and actual Go values.  For instance, a gob type that is
schematically,

	struct { A, B int }

can be sent from or received into any of these Go types:

	struct { A, B int }	// the same
	*struct { A, B int }	// extra indirection of the struct
	struct { *A, **B int }	// extra indirection of the fields
	struct { A, B int64 }	// different concrete value type; see below

It may also be received into any of these:

	struct { A, B int }	// the same
	struct { B, A int }	// ordering doesn't matter; matching is by name
	struct { A, B, C int }	// extra field (C) ignored
	struct { B int }	// missing field (A) ignored; data will be dropped
	struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.

Attempting to receive into these types will draw a decode error:

	struct { A int; B uint }	// change of signedness for B
	struct { A int; B float }	// change of type for B
	struct { }			// no field names in common
	struct { C, D int }		// no field names in common
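
The matching rules above can be exercised with a short program. This is only a
sketch: the Sent, Reordered and Unrelated types and the transfer helper are
hypothetical names introduced for the example.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Sent is the transmitted type.
type Sent struct{ A, B int }

// Reordered has a different field order, extra indirection and a different
// integer size for B, plus a field C that the sender does not have.
type Reordered struct {
	B *int64
	A int
	C int
}

// Unrelated shares no field names with Sent.
type Unrelated struct{ C, D int }

// transfer encodes src and decodes the resulting stream into dst (a pointer).
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var r Reordered
	err := transfer(Sent{A: 1, B: 2}, &r)
	fmt.Println(err, r.A, *r.B, r.C)

	var u Unrelated
	fmt.Println(transfer(Sent{A: 1, B: 2}, &u)) // expected to be a decode error
}
```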

Integers are transmitted two ways: arbitrary precision signed integers or
arbitrary precision unsigned integers.  There is no int8, int16 etc.
discrimination in the gob format; there are only signed and unsigned integers.  As
described below, the transmitter sends the value in a variable-length encoding;
the receiver accepts the value and stores it in the destination variable.
Floating-point numbers are always sent using IEEE-754 64-bit precision (see
below).

Signed integers may be received into any signed integer variable: int, int16, etc.;
unsigned integers may be received into any unsigned integer variable; and floating
point values may be received into any floating point variable.  However,
the destination variable must be able to represent the value or the decode
operation will fail.
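
A small sketch of the range rule, assuming a hypothetical helper named
decodeInto8: an int is sent and received into an int8, which succeeds only when
the value fits.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// decodeInto8 sends v as a signed integer and receives it into an int8.
func decodeInto8(v int) (int8, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return 0, err
	}
	var out int8
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	fmt.Println(decodeInto8(100)) // fits in int8
	fmt.Println(decodeInto8(300)) // does not fit; Decode reports an error
}
```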

Structs, arrays and slices are also supported. Structs encode and decode only
exported fields. Strings and arrays of bytes are supported with a special,
efficient representation (see below). When a slice is decoded, if the existing
slice has capacity the slice will be extended in place; if not, a new array is
allocated. Regardless, the length of the resulting slice reports the number of
elements decoded.

In general, if allocation is required, the decoder will allocate memory. If not,
it will update the destination variables with values read from the stream. It does
not initialize them first, so if the destination is a compound value such as a
map, struct, or slice, the decoded values will be merged elementwise into the
existing variables.
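
The merging behaviour is easy to miss, so here is a minimal sketch of it. The
Config type and mergeDecode helper are hypothetical. Because zero-valued struct
fields are omitted from the transmission, a field the sender left at zero keeps
whatever value the destination already held.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Config is a hypothetical settings struct.
type Config struct {
	Name    string
	Retries int
}

// mergeDecode sends src and decodes it on top of the existing contents of dst.
func mergeDecode(src Config, dst *Config) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	dst := Config{Name: "old", Retries: 5}
	// Retries is zero in the sent value, so it is not transmitted.
	err := mergeDecode(Config{Name: "new"}, &dst)
	fmt.Println(dst, err)
}
```

To avoid stale values, decode into a freshly zeroed variable.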

Functions and channels will not be sent in a gob. Attempting to encode such a value
at the top level will fail. A struct field of chan or func type is treated exactly
like an unexported field and is ignored.

Gob can encode a value of any type implementing the GobEncoder or
encoding.BinaryMarshaler interfaces by calling the corresponding method,
in that order of preference.

Gob can decode a value of any type implementing the GobDecoder or
encoding.BinaryUnmarshaler interfaces by calling the corresponding method,
again in that order of preference.

Encoding Details

This section documents the encoding, details that are not important for most
users. Details are presented bottom-up.

An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
(high byte first) byte stream holding the value, preceded by one byte holding the
byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
256 is transmitted as (FE 01 00).
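
The rule can be sketched in a few lines. This is an illustrative
reimplementation, not the package's internal code; encodeUnsigned and
hexUnsigned are names chosen for the example.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodeUnsigned returns the wire bytes for u following the rule above.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	// Build the minimal-length big-endian representation.
	var be []byte
	for ; u > 0; u >>= 8 {
		be = append([]byte{byte(u)}, be...)
	}
	// Prefix with the negated byte count, stored in a single byte.
	return append([]byte{byte(-len(be))}, be...)
}

// hexUnsigned renders the encoding as lowercase hex for easy comparison.
func hexUnsigned(u uint64) string {
	return hex.EncodeToString(encodeUnsigned(u))
}

func main() {
	for _, u := range []uint64{0, 7, 256} {
		fmt.Println(u, hexUnsigned(u))
	}
}
```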

A boolean is encoded within an unsigned integer: 0 for false, 1 for true.

A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
upward contain the value; bit 0 says whether they should be complemented upon
receipt.  The encode algorithm looks like this:

	var u uint
	if i < 0 {
		u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
	} else {
		u = (uint(i) << 1) // do not complement i, bit 0 is 0
	}
	encodeUnsigned(u)

The low bit is therefore analogous to a sign bit, but making it the complement bit
instead guarantees that the largest negative integer is not a special case.  For
example, -129=^128=(^256>>1) encodes as (FE 01 01).
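
A runnable version of the fragment above, again as an illustrative sketch rather
than the package's own code. It uses fixed 64-bit types, and the helper names
(encodeUnsigned, encodeSigned, hexSigned) are assumptions made for the example.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodeUnsigned implements the unsigned rule described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var be []byte
	for ; u > 0; u >>= 8 {
		be = append([]byte{byte(u)}, be...)
	}
	return append([]byte{byte(-len(be))}, be...)
}

// encodeSigned folds the sign into bit 0 and encodes the result as unsigned.
func encodeSigned(i int64) []byte {
	var u uint64
	if i < 0 {
		u = (^uint64(i) << 1) | 1 // complement i, bit 0 is 1
	} else {
		u = uint64(i) << 1 // do not complement i, bit 0 is 0
	}
	return encodeUnsigned(u)
}

// hexSigned renders the encoding as lowercase hex.
func hexSigned(i int64) string {
	return hex.EncodeToString(encodeSigned(i))
}

func main() {
	for _, i := range []int64{-129, 22, 65, -65} {
		fmt.Println(i, hexSigned(i))
	}
}
```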

Floating-point numbers are always sent as a representation of a float64 value.
That value is converted to a uint64 using math.Float64bits.  The uint64 is then
byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
exponent and high-precision part of the mantissa go first.  Since the low bits are
often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
three bytes (FE 31 40).
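
The same steps, sketched as code. The helper names are hypothetical, and
bits.ReverseBytes64 is used here for the byte reversal; the package itself may
perform the reversal differently.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math"
	"math/bits"
)

// encodeUnsigned implements the unsigned rule described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var be []byte
	for ; u > 0; u >>= 8 {
		be = append([]byte{byte(u)}, be...)
	}
	return append([]byte{byte(-len(be))}, be...)
}

// encodeFloat converts f to its IEEE-754 bits, reverses the bytes so the
// exponent ends up in the low-order bytes, and encodes the result as unsigned.
func encodeFloat(f float64) []byte {
	return encodeUnsigned(bits.ReverseBytes64(math.Float64bits(f)))
}

// hexFloat renders the encoding as lowercase hex.
func hexFloat(f float64) string {
	return hex.EncodeToString(encodeFloat(f))
}

func main() {
	fmt.Println(hexFloat(17.0))
}
```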

Strings and slices of bytes are sent as an unsigned count followed by that many
uninterpreted bytes of the value.

All other slices and arrays are sent as an unsigned count followed by that many
elements using the standard gob encoding for their type, recursively.

Maps are sent as an unsigned count followed by that many key, element
pairs. Empty but non-nil maps are sent, so if the receiver has not allocated
one already, one will always be allocated on receipt unless the transmitted map
is nil and not at the top level.

In slices and arrays, as well as maps, all elements, even zero-valued elements,
are transmitted, even if all the elements are zero.

Structs are sent as a sequence of (field number, field value) pairs.  The field
value is sent using the standard gob encoding for its type, recursively.  If a
field has the zero value for its type (except for arrays; see above), it is omitted
from the transmission.  The field number is defined by the type of the encoded
struct: the first field of the encoded type is field 0, the second is field 1,
etc.  When encoding a value, the field numbers are delta encoded for efficiency
and the fields are always sent in order of increasing field number; the deltas are
therefore unsigned.  The initialization for the delta encoding sets the field
number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
delta = 1, unsigned value = 7 or (01 07).  Finally, after all the fields have been
sent a terminating mark denotes the end of the struct.  That mark is a delta=0
value, which has representation (00).

Interface types are not checked for compatibility; all interface types are
treated, for transmission, as members of a single "interface" type, analogous to
int or []byte - in effect they're all treated as interface{}.  Interface values
are transmitted as a string identifying the concrete type being sent (a name
that must be pre-defined by calling Register), followed by a byte count of the
length of the following data (so the value can be skipped if it cannot be
stored), followed by the usual encoding of concrete (dynamic) value stored in
the interface value.  (A nil interface value is identified by the empty string
and transmits no value.) Upon receipt, the decoder verifies that the unpacked
concrete item satisfies the interface of the receiving variable.
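
A minimal sketch of sending an interface value. Shape, Square and roundTrip are
hypothetical names; the essential points are the call to Register for the
concrete type and passing a pointer to the interface variable so that the value
is transmitted as an interface rather than as its concrete type.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape is a hypothetical interface type.
type Shape interface{ Area() int }

// Square is a concrete type that satisfies Shape.
type Square struct{ Side int }

// Area returns the area of the square.
func (s Square) Area() int { return s.Side * s.Side }

func init() {
	// The concrete type must be registered on both sides of the stream.
	gob.Register(Square{})
}

// roundTrip sends s as an interface value and receives it into a Shape.
func roundTrip(s Shape) (Shape, error) {
	var buf bytes.Buffer
	// Pass a pointer to the interface so Encode sees the interface type.
	if err := gob.NewEncoder(&buf).Encode(&s); err != nil {
		return nil, err
	}
	var out Shape
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Square{Side: 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Area())
}
```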

The representation of types is described below.  When a type is defined on a given
connection between an Encoder and Decoder, it is assigned a signed integer type
id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
the type of v and all its elements and then it sends the pair (typeid, encoded-v)
where typeid is the type id of the encoded type of v and encoded-v is the gob
encoding of the value v.

To define a type, the encoder chooses an unused, positive type id and sends the
pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
description, constructed from these types:

	type wireType struct {
		ArrayT  *arrayType
		SliceT  *sliceType
		StructT *structType
		MapT    *mapType
	}
	type arrayType struct {
		CommonType
		Elem typeId
		Len  int
	}
	type CommonType struct {
		Name string // the name of the struct type
		Id   int    // the id of the type, repeated so it's inside the type
	}
	type sliceType struct {
		CommonType
		Elem typeId
	}
	type structType struct {
		CommonType
		Field []*fieldType // the fields of the struct.
	}
	type fieldType struct {
		Name string // the name of the field.
		Id   int    // the type id of the field, which must be already defined
	}
	type mapType struct {
		CommonType
		Key  typeId
		Elem typeId
	}

If there are nested type ids, the types for all inner type ids must be defined
before the top-level type id is used to describe an encoded-v.

For simplicity in setup, the connection is defined to understand these types a
priori, as well as the basic gob types int, uint, etc.  Their ids are:

	bool        1
	int         2
	uint        3
	float       4
	[]byte      5
	string      6
	complex     7
	interface   8
	// gap for reserved ids.
	WireType    16
	ArrayType   17
	CommonType  18
	SliceType   19
	StructType  20
	FieldType   21
	// 22 is slice of fieldType.
	MapType     23

Finally, each message created by a call to Encode is preceded by an encoded
unsigned integer count of the number of bytes remaining in the message.  After
the initial type name, interface values are wrapped the same way; in effect, the
interface value acts like a recursive invocation of Encode.

In summary, a gob stream looks like

	(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*

where * signifies zero or more repetitions and the type id of a value must
be predefined or be defined before the value in the stream.

Compatibility: Any future changes to the package will endeavor to maintain
compatibility with streams encoded using previous versions.  That is, any released
version of this package should be able to decode data written with any previously
released version, subject to issues such as security fixes. See the Go compatibility
document for background: https://golang.org/doc/go1compat

See "Gobs of data" for a design discussion of the gob wire format:
https://blog.golang.org/gobs-of-data
*/
package gob

/*
Grammar:

Tokens starting with a lower case letter are terminals; int(n)
and uint(n) represent the signed/unsigned encodings of the value n.

GobStream:
	DelimitedMessage*
DelimitedMessage:
	uint(lengthOfMessage) Message
Message:
	TypeSequence TypedValue
TypeSequence:
	(TypeDefinition DelimitedTypeDefinition*)?
DelimitedTypeDefinition:
	uint(lengthOfTypeDefinition) TypeDefinition
TypedValue:
	int(typeId) Value
TypeDefinition:
	int(-typeId) encodingOfWireType
Value:
	SingletonValue | StructValue
SingletonValue:
	uint(0) FieldValue
FieldValue:
	builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
InterfaceValue:
	NilInterfaceValue | NonNilInterfaceValue
NilInterfaceValue:
	uint(0)
NonNilInterfaceValue:
	ConcreteTypeName TypeSequence InterfaceContents
ConcreteTypeName:
	uint(lengthOfName) [already read=n] name
InterfaceContents:
	int(concreteTypeId) DelimitedValue
DelimitedValue:
	uint(length) Value
ArrayValue:
	uint(n) FieldValue*n [n elements]
MapValue:
	uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
SliceValue:
	uint(n) FieldValue*n [n elements]
StructValue:
	(uint(fieldDelta) FieldValue)*
*/

/*
For implementers and the curious, here is an encoded example.  Given
	type Point struct {X, Y int}
and the value
	p := Point{22, 33}
the bytes transmitted that encode p will be:
	1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
	01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
	07 ff 82 01 2c 01 42 00
They are determined as follows.

Since this is the first transmission of type Point, the type descriptor
for Point itself must be sent before the value.  This is the first type
we've sent on this Encoder, so it has type id 65 (0 through 64 are
reserved).

	1f	// This item (a type descriptor) is 31 bytes long.
	ff 81	// The negative of the id for the type we're defining, -65.
		// This is one byte (indicated by FF = -1) followed by
		// ^-65<<1 | 1.  The low 1 bit signals to complement the
		// rest upon receipt.

	// Now we send a type descriptor, which is itself a struct (wireType).
	// The type of wireType itself is known (it's built in, as is the type of
	// all its components), so we just need to send a *value* of type wireType
	// that represents type "Point".
	// Here starts the encoding of that value.
	// Set the field number implicitly to -1; this is done at the beginning
	// of every struct, including nested structs.
	03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
		// structType starts with an embedded CommonType, which appears
		// as a regular structure here too.
	01	// add 1 to field number (now 0); start of embedded CommonType.
	01	// add 1 to field number (now 0, the name of the type)
	05	// string is (unsigned) 5 bytes long
	50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
	01	// add 1 to field number (now 1, the id of the type)
	ff 82	// wireType.structType.CommonType._id = 65
	00	// end of embedded wireType.structType.CommonType struct
	01	// add 1 to field number (now 1, the field array in wireType.structType)
	02	// There are two fields in the type (len(structType.field))
	01	// Start of first field structure; add 1 to get field number 0: field[0].name
	01	// 1 byte
	58	// structType.field[0].name = "X"
	01	// Add 1 to get field number 1: field[0].id
	04	// structType.field[0].typeId is 2 (signed int).
	00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
	01	// Add 1 to get field number 0: field[1].name
	01	// 1 byte
	59	// structType.field[1].name = "Y"
	01	// Add 1 to get field number 1: field[1].id
	04	// structType.field[1].typeId is 2 (signed int).
	00	// End of structType.field[1]; end of structType.field.
	00	// end of wireType.structType structure
	00	// end of wireType structure

Now we can send the Point value.  Again the field number resets to -1:

	07	// this value is 7 bytes long
	ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
	01	// add one to field number, yielding field 0
	2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
	01	// add one to field number, yielding field 1
	42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
	00	// end of structure

The type encoding is long and fairly intricate but we send it only once.
If p is transmitted a second time, the type is already known so the
output will be just:

	07 ff 82 01 2c 01 42 00
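
To inspect these bytes directly, a sketch along the following lines can be used.
The messages helper is a hypothetical name. It encodes the same Point twice on
one Encoder and returns each message as hex; the first is expected to carry the
type descriptor and the second only the value. The exact type id in the output
is not asserted here.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/hex"
	"fmt"
)

// Point matches the type used in the walkthrough above.
type Point struct{ X, Y int }

// messages encodes p twice on the same Encoder and returns the hex form of
// the bytes written by each call.
func messages(p Point) (first, second string, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err = enc.Encode(p); err != nil {
		return "", "", err
	}
	first = hex.EncodeToString(buf.Bytes())
	buf.Reset() // keep only the bytes of the next message
	if err = enc.Encode(p); err != nil {
		return "", "", err
	}
	second = hex.EncodeToString(buf.Bytes())
	return first, second, nil
}

func main() {
	first, second, err := messages(Point{22, 33})
	fmt.Println(first)
	fmt.Println(second)
	fmt.Println(err)
}
```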

A single non-struct value at top level is transmitted like a field with
delta tag 0.  For instance, a signed integer with value 3 presented as
the argument to Encode will emit:

	03 04 00 06

Which represents:

	03	// this value is 3 bytes long
	04	// the type number, 2, represents an integer
	00	// tag delta 0
	06	// value 3
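
This case can be checked with a few lines of code. The encodeIntHex helper is a
hypothetical name used only for this sketch.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/hex"
	"fmt"
)

// encodeIntHex encodes v as a top-level value and returns the bytes as hex.
func encodeIntHex(v int) (string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf.Bytes()), nil
}

func main() {
	s, err := encodeIntHex(3)
	fmt.Println(s, err)
}
```

No type descriptor precedes the value because int is one of the predefined
types.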

*/