:mod:`struct` --- Interpret bytes as packed binary data
=======================================================

.. module:: struct
   :synopsis: Interpret bytes as packed binary data.

**Source code:** :source:`Lib/struct.py`

.. index::
   pair: C; structures
   triple: packing; binary; data

--------------

This module performs conversions between Python values and C structs represented
as Python :class:`bytes` objects.  This can be used in handling binary data
stored in files or from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
argument.  This refers to objects that implement the :ref:`bufferobjects` and
provide either a readable or read-writable buffer.  The most common types used
for that purpose are :class:`bytes` and :class:`bytearray`, but many other types
that can be viewed as an array of bytes implement the buffer protocol, so that
they can be read/filled without additional copying from a :class:`bytes` object.


Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(format, v1, v2, ...)

   Return a bytes object containing the values *v1*, *v2*, ... packed according
   to the format string *format*.  The arguments must match the values required by
   the format exactly.


.. function:: pack_into(format, buffer, offset, v1, v2, ...)

   Pack the values *v1*, *v2*, ... according to the format string *format* and
   write the packed bytes into the writable buffer *buffer* starting at
   position *offset*.  Note that *offset* is a required argument.


.. function:: unpack(format, buffer)

   Unpack from the buffer *buffer* (presumably packed by ``pack(format, ...)``)
   according to the format string *format*.  The result is a tuple even if it
   contains exactly one item.  The buffer's size in bytes must match the
   size required by the format, as reflected by :func:`calcsize`.


.. function:: unpack_from(format, buffer, offset=0)

   Unpack from *buffer* starting at position *offset*, according to the format
   string *format*.  The result is a tuple even if it contains exactly one
   item.  The buffer's size in bytes, minus *offset*, must be at least
   the size required by the format, as reflected by :func:`calcsize`.


.. function:: iter_unpack(format, buffer)

   Iteratively unpack from the buffer *buffer* according to the format
   string *format*.  This function returns an iterator which will read
   equally-sized chunks from the buffer until all its contents have been
   consumed.  The buffer's size in bytes must be a multiple of the size
   required by the format, as reflected by :func:`calcsize`.

   Each iteration yields a tuple as specified by the format string.

   .. versionadded:: 3.4


.. function:: calcsize(format)

   Return the size of the struct (and hence of the bytes object produced by
   ``pack(format, ...)``) corresponding to the format string *format*.


.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

.. index::
   single: @ (at); in struct format strings
   single: = (equals); in struct format strings
   single: < (less); in struct format strings
   single: > (greater); in struct format strings
   single: ! (exclamation); in struct format strings

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system.  For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian).  Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the latter uses standard size and adds no alignment padding.

The form ``'!'`` represents network byte order, which is always big-endian.  It
is provided as a convenience for those who cannot remember whether network byte
order is big-endian or little-endian.
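
A minimal sketch of the prefixes in the table above, packing one arbitrary
integer (``0x01020304``, chosen only for illustration) with each of them.  The
``'<'``, ``'>'`` and ``'!'`` results are the same on every platform; the
``'@'`` size in the last comment is platform-dependent and is only the typical
value, not a guarantee.

.. code-block:: python

   import struct
   import sys

   value = 0x01020304  # arbitrary example value

   # Explicit byte order with standard size: 'I' is always 4 bytes here.
   little = struct.pack('<I', value)
   big = struct.pack('>I', value)
   network = struct.pack('!I', value)  # '!' is network order, i.e. big-endian

   print(little)          # b'\x04\x03\x02\x01'
   print(big)             # b'\x01\x02\x03\x04'
   print(network == big)  # True

   # '=' uses the host's byte order with standard size, so it matches
   # whichever explicit form corresponds to sys.byteorder.
   host = struct.pack('=I', value)
   expected = little if sys.byteorder == 'little' else big
   print(host == expected)  # True

   # '=' never inserts alignment padding; '@' (the default) may.
   print(struct.calcsize('=ci'))  # 5
   print(struct.calcsize('@ci'))  # platform-dependent; typically 8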

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | bytes of length 1  | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(1), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``n``  | :c:type:`ssize_t`        | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``N``  | :c:type:`size_t`         | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``e``  | \(7)                     | float              | 2              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(6)       |
+--------+--------------------------+--------------------+----------------+------------+

.. versionchanged:: 3.3
   Added support for the ``'n'`` and ``'N'`` formats.

.. versionchanged:: 3.6
   Added support for the ``'e'`` format.


Notes:

(1)
   .. index:: single: ? (question mark); in struct format strings

   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99.  If this type is not available, it is simulated using a :c:type:`char`.  In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`.  They are always available in standard modes.

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.

   .. versionchanged:: 3.2
      Use of the :meth:`__index__` method for non-integers is new in 3.2.

(4)
   The ``'n'`` and ``'N'`` conversion codes are only available for the native
   size (selected as the default or with the ``'@'`` byte order character).
   For the standard size, you can use whichever of the other integer formats
   fits your application.

(5)
   For the ``'f'``, ``'d'`` and ``'e'`` conversion codes, the packed
   representation uses the IEEE 754 binary32, binary64 or binary16 format (for
   ``'f'``, ``'d'`` or ``'e'`` respectively), regardless of the floating-point
   format used by the platform.

(6)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character).  The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system.  The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

(7)
   The IEEE 754 binary16 "half precision" type was introduced in the 2008
   revision of the `IEEE 754 standard <ieee 754 standard_>`_.  It has a sign
   bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored),
   and can represent numbers between approximately ``6.1e-05`` and ``6.5e+04``
   at full precision.  This type is not widely supported by C compilers: on a
   typical machine, an unsigned short can be used for storage, but not for math
   operations.  See the Wikipedia page on the `half-precision floating-point
   format <half precision format_>`_ for more information.


A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; however, a count and its
format must not be separated by whitespace.

For the ``'s'`` format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 separate
one-byte characters.  If a count is not given, it defaults to 1.  For packing,
the string is truncated or padded with null bytes as appropriate to make it
fit.  For unpacking, the resulting bytes object always has exactly the
specified number of bytes.  As a special case, ``'0s'`` means a single, empty
string (while ``'0c'`` means 0 characters).

When packing a value ``x`` using one of the integer formats (``'b'``,
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
then :exc:`struct.error` is raised.

.. versionchanged:: 3.1
   In 3.0, some of the integer formats wrapped out-of-range values and
   raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller.  The bytes of the string follow.
If the string passed in to :func:`pack` is too long (longer than the count
minus 1), only the leading ``count-1`` bytes of the string are stored.  If the
string is shorter than ``count-1``, it is padded with null bytes so that exactly
``count`` bytes in all are used.  Note that for :func:`unpack`, the ``'p'``
format character consumes ``count`` bytes, but the string returned can never
contain more than 255 bytes.

.. index:: single: ? (question mark); in struct format strings

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`.  When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.



.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine on which :c:type:`long` is 4 bytes.

A basic example of packing/unpacking three integers::

    >>> from struct import *
    >>> pack('hhl', 1, 2, 3)
    b'\x00\x01\x00\x02\x00\x00\x00\x03'
    >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
    (1, 2, 3)
    >>> calcsize('hhl')
    8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

    >>> record = b'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

    >>> pack('ci', b'*', 0x12131415)
    b'*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, b'*')
    b'\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

    >>> pack('llh0l', 1, 2, 3)
    b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. note::

      The compiled versions of the most recent format strings passed to
      :class:`Struct` and the module-level functions are cached, so programs
      that use only a few format strings needn't worry about reusing a single
      :class:`Struct` instance.

   Compiled Struct objects support the following methods and attributes:

   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(buffer)

      Identical to the :func:`unpack` function, using the compiled format.
      The buffer's size in bytes must equal :attr:`size`.


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      The buffer's size in bytes, minus *offset*, must be at least
      :attr:`size`.


   .. method:: iter_unpack(buffer)

      Identical to the :func:`iter_unpack` function, using the compiled format.
      The buffer's size in bytes must be a multiple of :attr:`size`.

      .. versionadded:: 3.4

   .. attribute:: format

      The format string used to construct this Struct object.

      .. versionchanged:: 3.7
         The format string type is now :class:`str` instead of :class:`bytes`.

   .. attribute:: size

      The calculated size of the struct (and hence of the bytes object produced
      by the :meth:`pack` method) corresponding to :attr:`format`.


.. _half precision format: https://en.wikipedia.org/wiki/Half-precision_floating-point_format

.. _ieee 754 standard: https://en.wikipedia.org/wiki/IEEE_floating_point#IEEE_754-2008
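
As a sketch of the :class:`Struct` methods described above, the following
reuses one compiled object for a hypothetical record layout (two little-endian
unsigned shorts followed by a signed char) and a small, made-up buffer holding
two records.  The layout and values are illustrative assumptions only.

.. code-block:: python

   import struct

   # Hypothetical layout: two little-endian unsigned shorts and a signed char.
   record = struct.Struct('<HHb')

   print(record.format)  # <HHb
   print(record.size)    # 5 (2 + 2 + 1 bytes, no padding with '<')

   # Write two records into a preallocated, writable buffer.
   buf = bytearray(record.size * 2)
   record.pack_into(buf, 0, 4658, 264, 8)
   record.pack_into(buf, record.size, 1, 2, -3)

   # Read the second record in place, starting at its offset.
   print(record.unpack_from(buf, record.size))  # (1, 2, -3)

   # Walk every record; len(buf) is a multiple of record.size.
   for serialnum, school, gradelevel in record.iter_unpack(buf):
       print(serialnum, school, gradelevel)

Keeping the compiled ``record`` object also keeps the layout and its
:attr:`~Struct.size` in one place, so offsets such as ``record.size`` stay
consistent with the format string.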