:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. This can be used in handling binary data stored in files or
from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).


.. function:: unpack_from(fmt, buffer[,offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter is standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.
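
As a minimal sketch of the prefix characters in the table above, the same
16-bit value can be packed with each explicit byte order and compared. The
variable names are illustrative only, and the ``b''`` literals in the checks
are an assumption made so that the snippet also runs on Python 3.

```python
import struct
import sys

# Explicit byte order: the same 16-bit value in two layouts.
little = struct.pack('<H', 1)
big = struct.pack('>H', 1)
network = struct.pack('!H', 1)   # '!' selects big-endian, like '>'

# '=' uses native byte order with standard sizes, so it agrees with
# whichever explicit order matches sys.byteorder.
native_std = struct.pack('=H', 1)
expected = little if sys.byteorder == 'little' else big

# Standard sizes are fixed; the native size of 'l' depends on the platform.
std_long_size = struct.calcsize('=l')
native_long_size = struct.calcsize('@l')
```

Only ``native_long_size`` may differ between platforms; every other value is
determined by the format string alone.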

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with '<', '>', '=', and '!'.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.


.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types. The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``. When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | string of length 1 | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(5), \(3) |
+--------+--------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing. If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried.
   However, the use of :meth:`__int__` is deprecated, and will raise
   :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.

(4)
   For the ``'f'`` and ``'d'`` conversion codes, the packed representation uses
   the IEEE 754 binary32 (for ``'f'``) or binary64 (for ``'d'``) format,
   regardless of the floating-point format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.


A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count like for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting string always has exactly the specified number of
bytes. As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).
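
The difference between a repeat count and the ``'s'`` length count described
above can be sketched as follows. The variable names are illustrative, and
``b''`` literals are used on the assumption that the snippet should also run
on Python 3.

```python
import struct

# A repeat count expands the format character: '4h' is the same as 'hhhh'.
same_layout = (struct.pack('<4h', 1, 2, 3, 4) ==
               struct.pack('<hhhh', 1, 2, 3, 4))

# For 's' the count is a length in bytes, not a repeat count.
padded = struct.pack('5s', b'hi')                  # short input is null-padded
truncated = struct.pack('2s', b'hello')            # long input is cut to fit
one_field = struct.unpack('10s', b'0123456789')    # a single 10-byte string
ten_fields = struct.unpack('10c', b'0123456789')   # ten 1-byte strings
empty = struct.unpack('0s', b'')                   # a single empty string
```

Here ``one_field`` is a 1-tuple while ``ten_fields`` holds ten separate
items, which is the distinction the paragraph above draws.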

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is smaller.
The bytes of the string follow. If the string passed in to :func:`pack` is too
long (longer than the count minus 1), only the leading ``count-1`` bytes of the
string are stored. If the string is shorter than ``count-1``, it is padded with
null bytes so that exactly count bytes in all are used. Note that for
:func:`unpack`, the ``'p'`` format character consumes count bytes, but that the
string returned can never contain more than 255 characters.

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be ``True`` when unpacking.



.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.
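
The ``'p'`` and ``'?'`` conversions described under Format Characters do not
depend on byte order, so the following minimal sketch should behave the same on
any host. The variable names are illustrative, and ``b''`` literals are used on
the assumption that the snippet should also run on Python 3.

```python
import struct

# 'p': one length byte followed by the data, in exactly `count` bytes.
short_value = struct.pack('6p', b'abc')     # length byte 3, data, null padding
long_value = struct.pack('4p', b'abcdef')   # only count-1 = 3 data bytes kept
roundtrip = struct.unpack('6p', short_value)

# '?': the truth value of each argument is packed as 0 or 1 ...
flags = struct.pack('<??', [], 'x')
# ... and any non-zero byte unpacks as True.
decoded = struct.unpack('<??', b'\x00\x7f')
```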

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', '*', 0x12131415)
   '*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, '*')
   '\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment does not enforce any alignment.


.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*.
   Creating a Struct object once and calling its methods is more efficient
   than calling the :mod:`struct` functions with the same format since the
   format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.
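
A minimal sketch of reusing one compiled :class:`Struct` object is shown here.
The record layout (``'<HHI'``), the name ``header`` and the use of a
``bytearray`` as the writable buffer are illustrative assumptions, not part of
the module; the snippet is written so that it also runs on Python 3.

```python
import struct

# Hypothetical record layout: two 16-bit fields and one 32-bit field,
# little-endian with standard sizes (8 bytes in total).
header = struct.Struct('<HHI')

# The format is compiled once; the methods mirror the module functions.
packed = header.pack(1, 2, 3)
unpacked = header.unpack(packed)

# pack_into() and unpack_from() operate on a buffer at a given offset.
buf = bytearray(4 + header.size)
header.pack_into(buf, 4, 7, 8, 9)
fields = header.unpack_from(buf, 4)
```

The first four bytes of ``buf`` are left untouched because packing starts at
offset 4, and ``len(packed)`` equals ``header.size``.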