:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings.  This can be used in handling binary data stored in files or
from network connections, among other sources.  It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking.  This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct.  To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format.  The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5

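As a minimal sketch of :func:`pack_into`, the snippet below writes two values into the middle of a preallocated buffer. The buffer size, the offset and the values are illustrative assumptions, and ``bytearray`` is only one possible writable buffer type.

```python
import struct

# Hypothetical 8-byte scratch buffer, initially all zero bytes.
buf = bytearray(8)

# Write two unsigned 16-bit big-endian values starting at offset 2.
# The offset argument cannot be omitted.
struct.pack_into('>HH', buf, 2, 0x0102, 0x0304)

# Bytes outside the written range are left untouched.
result = bytes(buf)
```

Only the four bytes at offsets 2 to 5 change; the rest of the buffer keeps its previous contents.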

.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format.  The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).

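The following sketch illustrates a round trip through :func:`pack` and :func:`unpack`, and the exact-length requirement. The format ``'<hhl'`` is an arbitrary choice; the explicit ``'<'`` prefix is used here only so that the result does not depend on the host platform.

```python
import struct

fmt = '<hhl'   # little-endian, standard sizes: 2 + 2 + 4 bytes

packed = struct.pack(fmt, 1, 2, 3)
size = struct.calcsize(fmt)
values = struct.unpack(fmt, packed)   # always a tuple

# A string that is one byte too long is rejected with struct.error.
try:
    struct.unpack(fmt, packed + b'\x00')
    length_enforced = False
except struct.error:
    length_enforced = True
```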

.. function:: unpack_from(fmt, buffer[, offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5

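Because :func:`unpack_from` only needs *at least* ``calcsize(fmt)`` bytes after the offset, it can step through a buffer that holds several records. The record layout below (a 16-bit id followed by an 8-bit flag) is a made-up example, not something defined by this module.

```python
import struct

fmt = '<HB'                          # hypothetical record layout
record_size = struct.calcsize(fmt)   # 3 bytes in standard mode

# Two records followed by unrelated trailing data.
data = struct.pack(fmt, 1, 10) + struct.pack(fmt, 2, 20) + b'trailing'

records = []
for i in range(2):
    # Extra bytes after the record are ignored.
    records.append(struct.unpack_from(fmt, data, i * record_size))
```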

.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data.  They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked.  In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression.  This is always combined with native byte order.

Standard size depends only on the format character;  see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero.  See :ref:`struct-examples`.

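A short sketch of how the prefix character affects the packed bytes and the computed size. The value ``1`` and the ``'ci'`` layout are arbitrary; the native size is deliberately not stated as a fixed number because it depends on the platform and compiler.

```python
import struct
import sys

little = struct.pack('<I', 1)    # least significant byte first
big = struct.pack('>I', 1)       # most significant byte first
network = struct.pack('!I', 1)   # network order is big-endian

# '=' follows the host byte order but keeps standard sizes and no padding.
native_order = struct.pack('=I', 1)
matches_host = native_order == (little if sys.byteorder == 'little' else big)

standard_size = struct.calcsize('=ci')   # 1 + 4 bytes, no pad bytes
native_size = struct.calcsize('@ci')     # platform-dependent, may include padding
```

On many common platforms ``native_size`` is 8 because three pad bytes are inserted before the ``int``, but that figure is an observation about typical C compilers rather than a guarantee.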

.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types.  The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``.  When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | string of length 1 | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(5), \(3) |
+--------+--------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`.  They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing.  If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried.  However, the use
   of :meth:`__int__` is deprecated, and will raise :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.

(4)
   For the ``'f'`` and ``'d'`` conversion codes, the packed representation uses
   the IEEE 754 binary32 (for ``'f'``) or binary64 (for ``'d'``) format,
   regardless of the floating-point format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

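Notes (3) and (4) can be illustrated with a small sketch. The ``Index`` class is a hypothetical helper written for this example; it is not part of the module. The float part assumes nothing beyond the IEEE 754 binary32 layout named in note (4).

```python
import struct

class Index(object):
    """Hypothetical non-integer type that can stand in for an integer."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

# The integer codes call __index__ to obtain the value to pack.
packed_index = struct.pack('<h', Index(258))   # 258 == 0x0102

# 'f' always uses IEEE 754 binary32, so 1.0 has a fixed encoding ...
one_single = struct.pack('>f', 1.0)

# ... and values such as 0.1 lose precision on a round trip through 'f'.
roundtrip = struct.unpack('<f', struct.pack('<f', 0.1))[0]
```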

A format character may be preceded by an integral repeat count.  For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format
character must not be separated by whitespace, though.

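A brief sketch of repeat counts and whitespace handling; the values packed are arbitrary, and the ``'<'`` prefix is only there to make the result platform-independent.

```python
import struct

counted = struct.pack('<4h', 1, 2, 3, 4)
spelled_out = struct.pack('<hhhh', 1, 2, 3, 4)

# Whitespace between two formats is ignored.
spaced = struct.pack('<2h 2h', 1, 2, 3, 4)
```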

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting string always has exactly the specified number of
bytes.  As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).

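The ``'s'`` rules above can be sketched as follows. The input strings are arbitrary; they are written with a ``b`` prefix so the snippet behaves the same on Python 2.7 (where it is a plain string) and on Python 3.

```python
import struct

padded = struct.pack('<5s', b'ab')           # padded with null bytes
truncated = struct.pack('<3s', b'abcdef')    # cut to 3 bytes
default_count = struct.pack('<s', b'ab')     # count defaults to 1

# Unpacking returns all 5 bytes, including the padding.
unpacked = struct.unpack('<5s', padded)

# '10s' yields one 10-byte string, '10c' yields ten 1-byte strings.
digits = b'0123456789'
one_item = struct.unpack('<10s', digits)
ten_items = struct.unpack('<10c', digits)

empty = struct.unpack('<0s', b'')            # a single empty string
```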

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is smaller.
The bytes of the string follow.  If the string passed in to :func:`pack` is too
long (longer than the count minus 1), only the leading ``count-1`` bytes of the
string are stored.  If the string is shorter than ``count-1``, it is padded with
null bytes so that exactly count bytes in all are used.  Note that for
:func:`unpack`, the ``'p'`` format character consumes count bytes, but that the
string returned can never contain more than 255 characters.

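A minimal sketch of the Pascal string layout, using arbitrary inputs. The leading length byte is what lets :func:`unpack` drop the padding again.

```python
import struct

# 5 bytes in total: length byte, 2 data bytes, 2 pad bytes.
short = struct.pack('<5p', b'ab')

# Too long for 3 bytes: only count-1 == 2 data bytes are kept.
clipped = struct.pack('<3p', b'abcdef')

# Unpacking consumes all 5 bytes but returns only the stored string.
restored = struct.unpack('<5p', short)
```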

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type.  A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used.  For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

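The sketch below probes the pointer size of the running interpreter and shows that ``'P'`` is rejected outside native mode. The actual size is platform-dependent, so no particular value is assumed.

```python
import struct

# Size of a C pointer on this platform, in bytes.
pointer_size = struct.calcsize('P')

# An all-zero pointer unpacks to the integer 0.
null_pointer = struct.unpack('P', b'\x00' * pointer_size)

# 'P' is not available with '=' (or '<', '>', '!').
try:
    struct.calcsize('=P')
    available_in_standard_mode = True
except struct.error:
    available_in_standard_mode = False
```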

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.

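A small sketch of the ``'?'`` behaviour in standard mode; the objects packed here are arbitrary examples of false and true values.

```python
import struct

# Packing uses the truth value of the argument, whatever its type.
packed_false = struct.pack('<?', [])       # empty list is false
packed_true = struct.pack('<?', 'text')    # non-empty string is true

# Any non-zero byte unpacks as True.
from_five = struct.unpack('<?', b'\x05')
from_zero = struct.unpack('<?', b'\x00')
```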


.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

    >>> record = 'raymond   \x32\x12\x08\x01\x08'
    >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

    >>> from collections import namedtuple
    >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
    >>> Student._make(unpack('<10sHHb', record))
    Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

    >>> pack('ci', '*', 0x12131415)
    '*\x00\x00\x00\x12\x13\x14\x15'
    >>> pack('ic', 0x12131415, '*')
    '\x12\x13\x14\x15*'
    >>> calcsize('ci')
    8
    >>> calcsize('ic')
    5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

    >>> pack('llh0l', 1, 2, 3)
    '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.

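The effect of the trailing zero-count code can be checked without assuming a particular size for ``long``. This sketch compares sizes only; the native figures are left uncomputed in the comments because they vary by platform.

```python
import struct

# Native mode: the trailing '0l' may add pad bytes at the end.
native_plain = struct.calcsize('@llh')
native_aligned = struct.calcsize('@llh0l')
tail_padding = native_aligned - native_plain   # platform-dependent, never negative

# Standard mode: '0l' adds nothing, since no alignment is enforced.
standard_plain = struct.calcsize('<llh')
standard_aligned = struct.calcsize('<llh0l')
```

With 4-byte longs aligned on 4-byte boundaries, as the example above assumes, ``tail_padding`` would be 2.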

.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.  Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.

    404