:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines
              compatible with gzip.

--------------

For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net.  There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order.  This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*.  (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.)  The result
   is an unsigned 32-bit integer.  If *value* is present, it is used as
   the starting value of the checksum; otherwise, a default value of 1
   is used.  Passing in *value* allows computing a running checksum over the
   concatenation of several inputs.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   .. versionchanged:: 3.0
      Always returns an unsigned value.
      To generate the same numeric value across all Python versions and
      platforms, use ``adler32(data) & 0xffffffff``.
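
As a minimal sketch of the running-checksum behaviour described for :func:`adler32`, the snippet below feeds each result back in as *value*.  The chunk contents are made up for illustration.

```python
import zlib

# Hypothetical input that arrives in pieces.
chunks = [b"hello, ", b"world"]

# Start from 1, the documented default starting value, and pass each
# result back in as *value* for the next chunk.
running = 1
for chunk in chunks:
    running = zlib.adler32(chunk, running)

# A single pass over the concatenated input gives the same checksum.
whole = zlib.adler32(b"".join(chunks))
```

Here ``running`` and ``whole`` end up equal, so a large input can be checksummed without holding all of it in memory.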


.. function:: compress(data, level=-1)

   Compresses the bytes in *data*, returning a bytes object containing compressed data.
   *level* is an integer from ``0`` to ``9`` or ``-1`` controlling the level of compression;
   ``1`` (Z_BEST_SPEED) is fastest and produces the least compression, ``9`` (Z_BEST_COMPRESSION)
   is slowest and produces the most.  ``0`` (Z_NO_COMPRESSION) is no compression.
   The default value is ``-1`` (Z_DEFAULT_COMPRESSION).  Z_DEFAULT_COMPRESSION represents a default
   compromise between speed and compression (currently equivalent to level 6).
   Raises the :exc:`error` exception if any error occurs.

   .. versionchanged:: 3.6
      *level* can now be used as a keyword parameter.
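
The following sketch compares a few *level* values for :func:`compress`.  The sample data is an assumption chosen to be highly repetitive; real inputs will compress differently.

```python
import zlib

# Hypothetical, highly repetitive sample input.
data = b"spam and eggs " * 1000

stored = zlib.compress(data, level=0)   # Z_NO_COMPRESSION
fast = zlib.compress(data, level=1)     # Z_BEST_SPEED
best = zlib.compress(data, level=9)     # Z_BEST_COMPRESSION
default = zlib.compress(data)           # level=-1, Z_DEFAULT_COMPRESSION

# Collect the output sizes for comparison.
sizes = {
    "input": len(data),
    "level 0": len(stored),
    "level 1": len(fast),
    "level 9": len(best),
    "default": len(default),
}
```

Level ``0`` still wraps the data in the zlib container, so its output is slightly larger than the input.  Every variant decompresses to the same bytes with :func:`decompress`.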


.. function:: compressobj(level=-1, method=DEFLATED, wbits=MAX_WBITS, memLevel=DEF_MEM_LEVEL, strategy=Z_DEFAULT_STRATEGY[, zdict])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once.

   *level* is the compression level -- an integer from ``0`` to ``9`` or ``-1``.
   A value of ``1`` (Z_BEST_SPEED) is fastest and produces the least compression,
   while a value of ``9`` (Z_BEST_COMPRESSION) is slowest and produces the most.
   ``0`` (Z_NO_COMPRESSION) is no compression.  The default value is ``-1`` (Z_DEFAULT_COMPRESSION).
   Z_DEFAULT_COMPRESSION represents a default compromise between speed and compression
   (currently equivalent to level 6).

   *method* is the compression algorithm. Currently, the only supported value is
   :const:`DEFLATED`.

   The *wbits* argument controls the size of the history buffer (or the
   "window size") used when compressing data, and whether a header and
   trailer are included in the output.  It can take several ranges of values,
   defaulting to ``15`` (MAX_WBITS):

   * +9 to +15: The base-two logarithm of the window size, which
     therefore ranges between 512 and 32768.  Larger values produce
     better compression at the expense of greater memory usage.  The
     resulting output will include a zlib-specific header and trailer.

   * −9 to −15: Uses the absolute value of *wbits* as the
     window size logarithm, while producing a raw output stream with no
     header or trailing checksum.

   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
     window size logarithm, while including a basic :program:`gzip` header
     and trailing checksum in the output.

   The *memLevel* argument controls the amount of memory used for the
   internal compression state. Valid values range from ``1`` to ``9``.
   Higher values use more memory, but are faster and produce smaller output.

   *strategy* is used to tune the compression algorithm. Possible values are
   :const:`Z_DEFAULT_STRATEGY`, :const:`Z_FILTERED`, :const:`Z_HUFFMAN_ONLY`,
   :const:`Z_RLE` (zlib 1.2.0.1) and :const:`Z_FIXED` (zlib 1.2.2.2).

   *zdict* is a predefined compression dictionary. This is a sequence of bytes
   (such as a :class:`bytes` object) containing subsequences that are expected
   to occur frequently in the data that is to be compressed. Those subsequences
   that are expected to be most common should come at the end of the dictionary.

   .. versionchanged:: 3.3
      Added the *zdict* parameter and keyword argument support.
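
To illustrate the three *wbits* ranges accepted by :func:`compressobj`, here is a small sketch that produces the same payload in each container format.  The helper name ``deflate`` and the sample payload are hypothetical.

```python
import zlib

# Hypothetical sample payload.
data = b"example payload " * 64


def deflate(payload, wbits):
    """Compress *payload* in one shot with the given *wbits* (illustrative helper)."""
    co = zlib.compressobj(wbits=wbits)
    return co.compress(payload) + co.flush()


zlib_stream = deflate(data, 15)    # +15: zlib header and trailer
raw_stream = deflate(data, -15)    # -15: raw stream, no header or checksum
gzip_stream = deflate(data, 31)    # 16 + 15: gzip header and trailing checksum
```

Each stream has to be decompressed with a matching *wbits* value, for example ``zlib.decompress(raw_stream, -15)``.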


.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. The
   result is an unsigned 32-bit integer. If *value* is present, it is used
   as the starting value of the checksum; otherwise, a default value of 0
   is used.  Passing in *value* allows computing a running checksum over the
   concatenation of several inputs.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   .. versionchanged:: 3.0
      Always returns an unsigned value.
      To generate the same numeric value across all Python versions and
      platforms, use ``crc32(data) & 0xffffffff``.
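
A short sketch of :func:`crc32` used incrementally, checked against the widely published CRC-32 check value for the ASCII string ``123456789``.  The split point is arbitrary.

```python
import zlib

# The standard CRC-32 check input, split at an arbitrary point.
first, second = b"1234", b"56789"

# Chain the calls by passing the previous result as *value*.
partial = zlib.crc32(first)
running = zlib.crc32(second, partial)

# One call over the whole input yields the same value.
whole = zlib.crc32(first + second)
check_value = 0xCBF43926  # published check value for b"123456789"
```

Both ``running`` and ``whole`` equal ``check_value``.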


.. function:: decompress(data, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)

   Decompresses the bytes in *data*, returning a bytes object containing the
   uncompressed data.  The *wbits* parameter depends on
   the format of *data*, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer.  Raises the :exc:`error` exception if any error occurs.

   .. _decompress-wbits:

   The *wbits* parameter controls the size of the history buffer
   (or "window size"), and what header and trailer format is expected.
   It is similar to the parameter for :func:`compressobj`, but accepts
   more ranges of values:

   * +8 to +15: The base-two logarithm of the window size.  The input
     must include a zlib header and trailer.

   * 0: Automatically determine the window size from the zlib header.
     Only supported since zlib 1.2.3.5.

   * −8 to −15: Uses the absolute value of *wbits* as the window size
     logarithm.  The input must be a raw stream with no header or trailer.

   * +24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm.  The input must include a gzip header and
     trailer.

   * +40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm, and automatically accepts either
     the zlib or gzip format.

   When decompressing a stream, the window size must not be smaller
   than the size originally used to compress the stream; using a too-small
   value may result in an :exc:`error` exception. The default *wbits* value
   corresponds to the largest window size and requires a zlib header and
   trailer to be included.

   *bufsize* is the initial size of the buffer used to hold decompressed data.  If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`.

   .. versionchanged:: 3.6
      *wbits* and *bufsize* can be used as keyword arguments.
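
The sketch below shows how the *wbits* value passed to :func:`decompress` selects the expected container format.  It assumes the standard :mod:`gzip` module is available to produce a gzip-format input; the payload is made up.

```python
import gzip
import zlib

# Hypothetical payload, wrapped once in each container format.
data = b"the same payload in two containers"
zlib_blob = zlib.compress(data)   # zlib header and trailer
gzip_blob = gzip.compress(data)   # gzip header and trailer

# 32 + 15 accepts either the zlib or the gzip format.
from_zlib = zlib.decompress(zlib_blob, wbits=47)
from_gzip = zlib.decompress(gzip_blob, wbits=47)

# The default wbits requires a zlib header, so gzip input is rejected.
try:
    zlib.decompress(gzip_blob)
    rejected = False
except zlib.error:
    rejected = True
```

Passing ``wbits=31`` (16 + 15) would accept only the gzip input.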

.. function:: decompressobj(wbits=MAX_WBITS[, zdict])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once.

   The *wbits* parameter controls the size of the history buffer (or the
   "window size"), and what header and trailer format is expected.  It has
   the same meaning as `described for decompress() <#decompress-wbits>`__.

   The *zdict* parameter specifies a predefined compression dictionary. If
   provided, this must be the same dictionary as was used by the compressor that
   produced the data that is to be decompressed.

   .. note::

      If *zdict* is a mutable object (such as a :class:`bytearray`), you must not
      modify its contents between the call to :func:`decompressobj` and the first
      call to the decompressor's ``decompress()`` method.

   .. versionchanged:: 3.3
      Added the *zdict* parameter.
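
A minimal round trip with a predefined dictionary might look like the following.  The dictionary and message contents are invented for illustration.

```python
import zlib

# Hypothetical dictionary holding fragments expected in every message.
zdict = b'{"name": "", "status": "active"}'
message = b'{"name": "alice", "status": "active"}'

# Compress with the dictionary...
co = zlib.compressobj(zdict=zdict)
packed = co.compress(message) + co.flush()

# ...and decompress with the very same dictionary.
do = zlib.decompressobj(zdict=zdict)
unpacked = do.decompress(packed) + do.flush()

# Without the dictionary the stream cannot be decoded.
try:
    zlib.decompress(packed)
    needs_dict = False
except zlib.error:
    needs_dict = True
```

Both sides must agree on the dictionary out of band, since it is not stored in the compressed stream.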


Compression objects support the following methods:


.. method:: Compress.compress(data)

   Compress *data*, returning a bytes object containing compressed data for at least
   part of the data in *data*.  This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method.  Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a bytes object containing the remaining compressed
   output is returned.  *mode* can be selected from the constants
   :const:`Z_NO_FLUSH`, :const:`Z_PARTIAL_FLUSH`, :const:`Z_SYNC_FLUSH`,
   :const:`Z_FULL_FLUSH`, :const:`Z_BLOCK` (zlib 1.2.3.4), or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`.  All constants except :const:`Z_FINISH`
   allow compressing further bytestrings of data, while :const:`Z_FINISH` finishes the
   compressed stream and prevents compressing any more data.  After calling :meth:`flush`
   with *mode* set to :const:`Z_FINISH`, the :meth:`compress` method cannot be called again;
   the only realistic action is to delete the object.
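
As a sketch of how the flush modes differ in practice, the example below uses :const:`Z_SYNC_FLUSH` after each message so that a decompressor on the other end can recover it immediately, and finishes the stream with the default :const:`Z_FINISH`.  The messages are hypothetical.

```python
import zlib

# Hypothetical messages sent one at a time over a single compressed stream.
messages = [b"first message\n", b"second message\n"]

co = zlib.compressobj()
do = zlib.decompressobj()
received = []

for msg in messages:
    # Z_SYNC_FLUSH emits everything buffered so far but keeps the stream open.
    packet = co.compress(msg) + co.flush(zlib.Z_SYNC_FLUSH)
    received.append(do.decompress(packet))

# The default mode, Z_FINISH, terminates the stream.
tail = co.flush()
received.append(do.decompress(tail))
```

After each sync flush the decompressor returns the complete message; once ``tail`` has been processed, ``do.eof`` is true.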


.. method:: Compress.copy()

   Returns a copy of the compression object.  This can be used to efficiently
   compress a set of data that share a common initial prefix.
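
The common-prefix use of :meth:`Compress.copy` can be sketched as follows.  The prefix and the two bodies are made-up data.

```python
import zlib

# Hypothetical shared prefix and per-document bodies.
prefix = b"HEADER: shared preamble\n" * 100
bodies = {"a": b"body of document A", "b": b"body of document B"}

# Compress the shared prefix once.
base = zlib.compressobj()
head = base.compress(prefix)

variants = {}
for name, body in bodies.items():
    # Each copy continues from the state reached after the prefix.
    branch = base.copy()
    variants[name] = head + branch.compress(body) + branch.flush()
```

Each entry in ``variants`` is a complete stream that decompresses to ``prefix + body``, yet the prefix was only compressed once.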


Decompression objects support the following methods and attributes:


.. attribute:: Decompress.unused_data

   A bytes object which contains any bytes past the end of the compressed data. That is,
   this remains ``b""`` until the last byte that contains compression data is
   available.  If the whole bytestring turned out to contain compressed data, this is
   ``b""``, an empty bytes object.


.. attribute:: Decompress.unconsumed_tail

   A bytes object that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer.  This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. attribute:: Decompress.eof

   A boolean indicating whether the end of the compressed data stream has been
   reached.

   This makes it possible to distinguish between a properly-formed compressed
   stream, and an incomplete or truncated one.

   .. versionadded:: 3.3
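
The following sketch shows :attr:`eof` and :attr:`unused_data` on a truncated stream and on a stream followed by extra bytes.  The payload and the trailing marker are assumptions for illustration.

```python
import zlib

# Hypothetical payload.
payload = b"payload " * 50
blob = zlib.compress(payload)

# Truncated input: the end of the stream is never seen.
truncated = zlib.decompressobj()
truncated.decompress(blob[:-5])
truncated_eof = truncated.eof

# Complete input followed by unrelated bytes.
complete = zlib.decompressobj()
output = complete.decompress(blob + b"TRAILER")
complete_eof = complete.eof
leftover = complete.unused_data
```

Here ``truncated_eof`` is false and ``complete_eof`` is true, and ``leftover`` holds the bytes that followed the compressed stream.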


.. method:: Decompress.decompress(data, max_length=0)

   Decompress *data*, returning a bytes object containing the uncompressed data
   corresponding to at least part of the data in *data*.  This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method.  Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is non-zero then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed; and unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`. This bytestring must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue.  If *max_length* is zero
   then the whole input is decompressed, and :attr:`unconsumed_tail` is empty.

   .. versionchanged:: 3.6
      *max_length* can be used as a keyword argument.
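
One way to bound memory use with *max_length* is to loop over :attr:`unconsumed_tail`, as in this sketch.  The 4096-byte limit and the input are arbitrary choices.

```python
import zlib

# Hypothetical input that expands to far more than it occupies compressed.
original = b"\0" * 100_000
blob = zlib.compress(original)

do = zlib.decompressobj()
pieces = []
pending = blob

while pending:
    # Never return more than 4096 bytes per call.
    piece = do.decompress(pending, max_length=4096)
    pieces.append(piece)
    # Input that was not processed must be fed back in.
    pending = do.unconsumed_tail

# Collect any output still held inside the decompressor.
pieces.append(do.flush())
```

Every piece returned by :meth:`decompress` is at most 4096 bytes long, and joining all pieces reproduces the original data.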


.. method:: Decompress.flush([length])

   All pending input is processed, and a bytes object containing the remaining
   uncompressed output is returned.  After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object.  This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.
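
A small sketch of saving decompressor state with :meth:`Decompress.copy`.  The data and the midpoint split are arbitrary assumptions.

```python
import zlib

# Hypothetical data made of 200 short, distinct lines.
original = b"".join(b"line %d\n" % i for i in range(200))
blob = zlib.compress(original)
half = len(blob) // 2

do = zlib.decompressobj()
first = do.decompress(blob[:half])

# Save the state reached after the first half of the input.
snapshot = do.copy()

# Both objects can continue independently from the same point.
rest = do.decompress(blob[half:]) + do.flush()
rest_again = snapshot.decompress(blob[half:]) + snapshot.flush()
```

Both continuations produce the same remaining output, so a saved snapshot lets a reader resume from the middle of the stream without reprocessing the first half.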


Information about the version of the zlib library in use is available through
the following constants:


.. data:: ZLIB_VERSION

   The version string of the zlib library that was used for building the module.
   This may be different from the zlib library actually used at runtime, which
   is available as :const:`ZLIB_RUNTIME_VERSION`.


.. data:: ZLIB_RUNTIME_VERSION

   The version string of the zlib library actually loaded by the interpreter.

   .. versionadded:: 3.3


.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.
    331