:mod:`struct` --- Interpret bytes as packed binary data
=======================================================

.. module:: struct
   :synopsis: Interpret bytes as packed binary data.

**Source code:** :source:`Lib/struct.py`

.. index::
   pair: C; structures
   triple: packing; binary; data

--------------

This module performs conversions between Python values and C structs represented
as Python :class:`bytes` objects.  This can be used in handling binary data
stored in files or from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Several :mod:`struct` functions (and methods of :class:`Struct`) take a *buffer*
argument.  This refers to objects that implement the :ref:`bufferobjects` and
provide either a readable or read-writable buffer.  The most common types used
for that purpose are :class:`bytes` and :class:`bytearray`, but many other types
that can be viewed as an array of bytes implement the buffer protocol, so that
they can be read/filled without additional copying from a :class:`bytes` object.

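As a minimal sketch of the *buffer* argument (the variable names are
illustrative only), a :class:`bytearray` and a :class:`memoryview` over it can
both be read, and a writable buffer can be filled in place::

   import struct

   raw = bytearray(b'\x01\x00\x02\x00')     # writable buffer
   view = memoryview(raw)                   # view of the same bytes, no copy

   from_bytearray = struct.unpack('<HH', raw)
   from_view = struct.unpack('<HH', view)

   # pack_into() writes into the bytearray; the change is visible via the view.
   struct.pack_into('<H', raw, 0, 7)
   first_byte = view[0]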

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(format, v1, v2, ...)

   Return a bytes object containing the values *v1*, *v2*, ... packed according
   to the format string *format*.  The arguments must match the values required by
   the format exactly.

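   A minimal sketch, using the ``'<'`` prefix so that the result does not
   depend on the host platform (the variable names are illustrative only)::

      import struct

      # '<' selects little-endian byte order, standard sizes and no padding.
      packed = struct.pack('<HI', 1, 2)   # unsigned short, unsigned int

      # Supplying too few values for the format raises struct.error.
      try:
          struct.pack('<HI', 1)
          mismatch_detected = False
      except struct.error:
          mismatch_detected = True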

.. function:: pack_into(format, buffer, offset, v1, v2, ...)

   Pack the values *v1*, *v2*, ... according to the format string *format* and
   write the packed bytes into the writable buffer *buffer* starting at
   position *offset*.  Note that *offset* is a required argument.

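   For instance, assuming a preallocated :class:`bytearray` as the writable
   buffer (a sketch with illustrative names)::

      import struct

      buf = bytearray(8)                       # writable buffer, all zero bytes
      struct.pack_into('>HH', buf, 2, 1, 2)    # write 4 bytes starting at offset 2
      contents = bytes(buf)                    # bytes outside the span stay zero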

.. function:: unpack(format, buffer)

   Unpack from the buffer *buffer* (presumably packed by ``pack(format, ...)``)
   according to the format string *format*.  The result is a tuple even if it
   contains exactly one item.  The buffer's size in bytes must match the
   size required by the format, as reflected by :func:`calcsize`.

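   A short sketch of both points, the one-item tuple and the exact size
   requirement (the variable names are illustrative only)::

      import struct

      # Even a single field comes back as a tuple, so unpack it explicitly.
      (value,) = struct.unpack('<I', b'\x2a\x00\x00\x00')

      # A buffer of 5 bytes does not match the 4 bytes required by '<I'.
      try:
          struct.unpack('<I', b'\x2a\x00\x00\x00\x00')
          size_checked = False
      except struct.error:
          size_checked = True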

.. function:: unpack_from(format, buffer, offset=0)

   Unpack from *buffer* starting at position *offset*, according to the format
   string *format*.  The result is a tuple even if it contains exactly one
   item.  The buffer's size in bytes, minus *offset*, must be at least
   the size required by the format, as reflected by :func:`calcsize`.

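   As a sketch, assuming a buffer in which two unrelated bytes precede the
   fields of interest and extra bytes follow them (the layout is made up for
   illustration)::

      import struct

      data = b'\xff\xff' + struct.pack('<HH', 3, 4) + b'trailing'

      # Start reading at offset 2; bytes after the two fields are ignored.
      fields = struct.unpack_from('<HH', data, 2)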

.. function:: iter_unpack(format, buffer)

   Iteratively unpack from the buffer *buffer* according to the format
   string *format*.  This function returns an iterator which will read
   equally-sized chunks from the buffer until all its contents have been
   consumed.  The buffer's size in bytes must be a multiple of the size
   required by the format, as reflected by :func:`calcsize`.

   Each iteration yields a tuple as specified by the format string.

   .. versionadded:: 3.4

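   A minimal sketch, assuming three fixed-size records stored back to back
   (the record layout is made up for illustration)::

      import struct

      # Each record is an unsigned short followed by an unsigned char (3 bytes).
      records = b''.join(struct.pack('<HB', n, n * 2) for n in range(3))

      # One tuple is produced per 3-byte chunk.
      rows = list(struct.iter_unpack('<HB', records))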

.. function:: calcsize(format)

   Return the size of the struct (and hence of the bytes object produced by
   ``pack(format, ...)``) corresponding to the format string *format*.

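   For example, with standard sizes (a sketch; the native size of the same
   fields may differ between platforms)::

      import struct

      size = struct.calcsize('<hhl')          # 2 + 2 + 4 bytes, no padding
      packed = struct.pack('<hhl', 1, 2, 3)   # has exactly ``size`` bytes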

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

.. index::
   single: @ (at); in struct format strings
   single: = (equals); in struct format strings
   single: < (less); in struct format strings
   single: > (greater); in struct format strings
   single: ! (exclamation); in struct format strings

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.

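The effect of the prefix characters in the table above can be sketched as
follows (the variable names are illustrative only; the ``'='`` result depends
on the host, so it is compared against ``sys.byteorder`` rather than a fixed
value)::

   import struct
   import sys

   little = struct.pack('<I', 1)     # least significant byte first
   big = struct.pack('>I', 1)        # most significant byte first
   network = struct.pack('!I', 1)    # network order, same as big-endian

   # '=' uses the host byte order, but standard sizes and no padding.
   host_order = struct.pack('=I', 1)
   expected = little if sys.byteorder == 'little' else big

   # Standard modes never insert pad bytes: 1 byte for 'c' plus 4 for 'i'.
   unpadded_size = struct.calcsize('<ci')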

.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | bytes of length 1  | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(1),\(3)  |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``n``  | :c:type:`ssize_t`        | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``N``  | :c:type:`size_t`         | integer            |                | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``e``  | \(7)                     | float              | 2              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(5)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | bytes              |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(6)       |
+--------+--------------------------+--------------------+----------------+------------+

.. versionchanged:: 3.3
   Added support for the ``'n'`` and ``'N'`` formats.

.. versionchanged:: 3.6
   Added support for the ``'e'`` format.


Notes:

(1)
   .. index:: single: ? (question mark); in struct format strings

   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`.  They are always available in standard modes.

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.

   .. versionchanged:: 3.2
      Use of the :meth:`__index__` method for non-integers is new in 3.2.

(4)
   The ``'n'`` and ``'N'`` conversion codes are only available for the native
   size (selected as the default or with the ``'@'`` byte order character).
   For the standard size, you can use whichever of the other integer formats
   fits your application.

(5)
   For the ``'f'``, ``'d'`` and ``'e'`` conversion codes, the packed
   representation uses the IEEE 754 binary32, binary64 or binary16 format (for
   ``'f'``, ``'d'`` or ``'e'`` respectively), regardless of the floating-point
   format used by the platform.

(6)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

(7)
   The IEEE 754 binary16 "half precision" type was introduced in the 2008
   revision of the `IEEE 754 standard <ieee 754 standard_>`_. It has a sign
   bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored),
   and can represent numbers between approximately ``6.1e-05`` and ``6.5e+04``
   at full precision. This type is not widely supported by C compilers: on a
   typical machine, an unsigned short can be used for storage, but not for math
   operations. See the Wikipedia page on the `half-precision floating-point
   format <half precision format_>`_ for more information.

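As a rough illustration of the 'Standard size' column and of notes (5) and (7),
the following sketch queries a few standard sizes and round-trips a value
through the half-precision ``'e'`` format (the variable names are illustrative
only; ``'e'`` requires Python 3.6 or later)::

   import struct

   # Standard sizes depend only on the format character.
   sizes = {code: struct.calcsize('<' + code) for code in 'bhilqefd'}

   # 1.0 is exactly representable in binary16 and occupies two bytes.
   half_one = struct.pack('<e', 1.0)

   # 0.1 is not exactly representable, so the round trip loses precision.
   (approx,) = struct.unpack('<e', struct.pack('<e', 0.1))
   error = abs(approx - 0.1)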

A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; however, a count and its
format character must not be separated by whitespace.

For the ``'s'`` format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting bytes object always has exactly the specified number
of bytes.  As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).
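
The difference between a repeat count and the ``'s'`` length can be sketched
as follows (the variable names are illustrative only)::

   import struct

   same_layout = struct.calcsize('<4h') == struct.calcsize('<hhhh')

   padded = struct.pack('<5s', b'ab')             # padded with null bytes
   truncated = struct.pack('<5s', b'abcdefgh')    # cut down to 5 bytes
   (chunk,) = struct.unpack('<5s', padded)        # one bytes object of length 5

   # With 'c' the count is a repeat count: three separate 1-byte values.
   chars = struct.unpack('<3c', b'abc')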

When packing a value ``x`` using one of the integer formats (``'b'``,
``'B'``, ``'h'``, ``'H'``, ``'i'``, ``'I'``, ``'l'``, ``'L'``,
``'q'``, ``'Q'``), if ``x`` is outside the valid range for that format
then :exc:`struct.error` is raised.

.. versionchanged:: 3.1
   In 3.0, some of the integer formats wrapped out-of-range values and
   raised :exc:`DeprecationWarning` instead of :exc:`struct.error`.
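
A minimal sketch of the range check, using ``'B'``, whose valid range is 0 to
255 (the variable names are illustrative only)::

   import struct

   in_range = struct.pack('<B', 255)      # largest value that fits

   try:
       struct.pack('<B', 256)             # one past the valid range
       range_checked = False
   except struct.error:
       range_checked = True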

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller.  The bytes of the string follow.  If the string passed in to
:func:`pack` is too long (longer than the count minus 1), only the leading
``count-1`` bytes of the string are stored.  If the string is shorter than
``count-1``, it is padded with null bytes so that exactly count bytes in all
are used.  Note that for :func:`unpack`, the ``'p'`` format character consumes
``count`` bytes, but that the string returned can never contain more than 255
bytes.
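
For example, with a count of 5 (a sketch; the variable names are illustrative
only)::

   import struct

   short = struct.pack('<5p', b'ab')           # length byte, data, null padding
   clipped = struct.pack('<5p', b'abcdefgh')   # only count-1 = 4 bytes are kept

   # Unpacking consumes all 5 bytes but returns only the stored string.
   (text,) = struct.unpack('<5p', short)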

.. index:: single: ? (question mark); in struct format strings

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.

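A short sketch of the ``'?'`` behavior in standard mode (the variable names
are illustrative only)::

   import struct

   # Packing uses the truth value of the argument, whatever its type.
   truthy = struct.pack('<?', 'non-empty')
   falsy = struct.pack('<?', [])

   # Any non-zero byte unpacks as True.
   (flag,) = struct.unpack('<?', b'\x05')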


.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   b'\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

    >>> record = b'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

    >>> pack('ci', b'*', 0x12131415)
    b'*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, b'*')
    b'\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

    >>> pack('llh0l', 1, 2, 3)
    b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment do not enforce any alignment.

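Because the native results above vary between machines, a platform-independent
way to observe the same effect is to compare sizes instead of bytes.  The
sketch below assumes that the alignment of a C ``long`` equals its size, which
holds on common platforms but is not guaranteed (the variable names are
illustrative only)::

   import struct

   long_size = struct.calcsize('@l')            # native size of a C long

   # Native mode: the trailing '0l' pads the end to the alignment of long.
   native_size = struct.calcsize('@llh0l')

   # Standard mode: no alignment is enforced, so '0l' adds nothing.
   standard_size = struct.calcsize('<llh0l')    # 4 + 4 + 2 bytes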

.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. note::

      The compiled versions of the most recent format strings passed to
      :class:`Struct` and the module-level functions are cached, so programs
      that use only a few format strings needn't worry about reusing a single
      :class:`Struct` instance.

   Compiled Struct objects support the following methods and attributes:

   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(buffer)

      Identical to the :func:`unpack` function, using the compiled format.
      The buffer's size in bytes must equal :attr:`size`.


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      The buffer's size in bytes, minus *offset*, must be at least
      :attr:`size`.


   .. method:: iter_unpack(buffer)

      Identical to the :func:`iter_unpack` function, using the compiled format.
      The buffer's size in bytes must be a multiple of :attr:`size`.

      .. versionadded:: 3.4

   .. attribute:: format

      The format string used to construct this Struct object.

      .. versionchanged:: 3.7
         The format string type is now :class:`str` instead of :class:`bytes`.

   .. attribute:: size

      The calculated size of the struct (and hence of the bytes object produced
      by the :meth:`pack` method) corresponding to :attr:`format`.

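   A minimal sketch of reusing one compiled :class:`Struct` for several
   records (the record layout and variable names are made up for
   illustration)::

      import struct

      header = struct.Struct('<HHI')            # compile the format once
      buf = bytearray(header.size * 2)          # room for two records

      header.pack_into(buf, 0, 1, 2, 3)
      header.pack_into(buf, header.size, 4, 5, 6)

      first = header.unpack_from(buf)           # offset defaults to 0
      rows = list(header.iter_unpack(buf))      # one tuple per record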

.. _half precision format: https://en.wikipedia.org/wiki/Half-precision_floating-point_format

.. _ieee 754 standard: https://en.wikipedia.org/wiki/IEEE_floating_point#IEEE_754-2008
    475