:mod:`zlib` --- Compression compatible with :program:`gzip`
===========================================================

.. module:: zlib
   :synopsis: Low-level interface to compression and decompression routines compatible with
              gzip.


For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net.  There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.

zlib's functions have many options and often need to be used in a particular
order.  This documentation doesn't attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.

For reading and writing ``.gz`` files see the :mod:`gzip` module.

The available exception and functions in this module are:


.. exception:: error

   Exception raised on compression and decompression errors.


.. function:: adler32(data[, value])

   Computes an Adler-32 checksum of *data*.  (An Adler-32 checksum is almost as
   reliable as a CRC32 but can be computed much more quickly.)  If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used.  This allows computing a running checksum over the
   concatenation of several inputs.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

.. note::
   To generate the same numeric value across all Python versions and
   platforms, use ``adler32(data) & 0xffffffff``.  If you are only using
   the checksum in packed binary format this is not necessary, as the
   return value is the correct 32-bit binary representation
   regardless of sign.

.. versionchanged:: 2.6
   The return value is in the range [-2**31, 2**31-1]
   regardless of platform.  In older versions the value is
   signed on some platforms and unsigned on others.

.. versionchanged:: 3.0
   The return value is unsigned and in the range [0, 2**32-1]
   regardless of platform.

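The running-checksum behaviour described for :func:`adler32` can be sketched as
follows.  This is only an illustration: the chunk contents are made up, and
bytes literals are used on the assumption that the example should run unchanged
on both Python 2.6+ and Python 3.

```python
import zlib

# Hypothetical input, split into pieces as if read from a stream.
chunks = [b"spam, ", b"eggs, ", b"and ham"]

# Start from the checksum of empty input, then feed each result back in
# as the starting value for the next chunk.
running = zlib.adler32(b"")
for chunk in chunks:
    running = zlib.adler32(chunk, running)

# Mask so the value is the same unsigned integer on every Python version.
running &= 0xffffffff
whole = zlib.adler32(b"".join(chunks)) & 0xffffffff
```

Here ``running`` and ``whole`` are equal, so a large input can be checksummed
piece by piece without holding it all in memory.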

.. function:: compress(string[, level])

   Compresses the data in *string*, returning a string containing compressed data.
   *level* is an integer from ``0`` to ``9`` controlling the level of compression;
   ``1`` is fastest and produces the least compression, ``9`` is slowest and
   produces the most.  ``0`` is no compression.  The default value is ``6``.
   Raises the :exc:`error` exception if any error occurs.

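As a rough illustration of the *level* argument, the sketch below compresses the
same made-up, highly repetitive input at several levels.  Actual sizes depend on
the data, so no particular ratio should be assumed.

```python
import zlib

# Hypothetical sample data; repetitive text compresses well.
data = b"the quick brown fox jumps over the lazy dog\n" * 200

fast = zlib.compress(data, 1)      # fastest, least compression
default = zlib.compress(data)      # default level
best = zlib.compress(data, 9)      # slowest, most compression
stored = zlib.compress(data, 0)    # level 0: data is stored, not compressed

sizes = {
    "input": len(data),
    "level 0": len(stored),
    "level 1": len(fast),
    "default": len(default),
    "level 9": len(best),
}
```

Every one of these outputs decompresses back to ``data`` with
:func:`decompress`; only the size and the time taken differ.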

.. function:: compressobj([level[, method[, wbits[, memlevel[, strategy]]]]])

   Returns a compression object, to be used for compressing data streams that won't
   fit into memory at once.  *level* is an integer from ``0`` to ``9`` or ``-1``,
   controlling the level of compression; ``1`` is fastest and produces the least
   compression, ``9`` is slowest and produces the most.  ``0`` is no compression.
   The default value is ``-1`` (Z_DEFAULT_COMPRESSION), which represents a default
   compromise between speed and compression (currently equivalent to level 6).

   *method* is the compression algorithm. Currently, the only supported value is
   ``DEFLATED``.

   The *wbits* argument controls the size of the history buffer (or the
   "window size") used when compressing data, and whether a header and
   trailer are included in the output.  It can take several ranges of values.
   The default is 15.

   * +9 to +15: The base-two logarithm of the window size, which
     therefore ranges between 512 and 32768.  Larger values produce
     better compression at the expense of greater memory usage.  The
     resulting output will include a zlib-specific header and trailer.

   * -9 to -15: Uses the absolute value of *wbits* as the
     window size logarithm, while producing a raw output stream with no
     header or trailing checksum.

   * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the
     window size logarithm, while including a basic :program:`gzip` header
     and trailing checksum in the output.

   *memlevel* controls the amount of memory used for internal compression state.
   Valid values range from ``1`` to ``9``. Higher values use more memory,
   but are faster and produce smaller output. The default is 8.

   *strategy* is used to tune the compression algorithm. Possible values are
   ``Z_DEFAULT_STRATEGY``, ``Z_FILTERED``, and ``Z_HUFFMAN_ONLY``. The default
   is ``Z_DEFAULT_STRATEGY``.

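The three *wbits* ranges listed above can be compared with a small sketch.  The
helper name ``compress_with`` and the payload are hypothetical; arguments are
passed positionally in the order given in the signature.

```python
import zlib

data = b"example payload " * 64  # made-up input


def compress_with(wbits):
    """Compress ``data`` in one go using the given wbits value."""
    # level 6, the DEFLATED method, then the requested wbits
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


zlib_stream = compress_with(15)        # zlib header and trailer
raw_stream = compress_with(-15)        # raw stream, no header or trailer
gzip_stream = compress_with(16 + 15)   # gzip header and trailing checksum

# The gzip format starts with the two magic bytes 0x1f 0x8b.
has_gzip_magic = gzip_stream[:2] == b"\x1f\x8b"
```

Each stream has to be decompressed with a matching *wbits* value, for example
``zlib.decompress(raw_stream, -15)`` for the raw variant.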

.. function:: crc32(data[, value])

   .. index::
      single: Cyclic Redundancy Check
      single: checksum; Cyclic Redundancy Check

   Computes a CRC (Cyclic Redundancy Check) checksum of *data*. If *value* is
   present, it is used as the starting value of the checksum; otherwise, a fixed
   default value is used.  This allows computing a running checksum over the
   concatenation of several inputs.  The algorithm is not cryptographically
   strong, and should not be used for authentication or digital signatures.  Since
   the algorithm is designed for use as a checksum algorithm, it is not suitable
   for use as a general hash algorithm.

   This function always returns an integer object.

.. note::
   To generate the same numeric value across all Python versions and
   platforms, use ``crc32(data) & 0xffffffff``.  If you are only using
   the checksum in packed binary format this is not necessary, as the
   return value is the correct 32-bit binary representation
   regardless of sign.

.. versionchanged:: 2.6
   The return value is in the range [-2**31, 2**31-1]
   regardless of platform.  In older versions the value would be
   signed on some platforms and unsigned on others.

.. versionchanged:: 3.0
   The return value is unsigned and in the range [0, 2**32-1]
   regardless of platform.

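The masking advice in the note, and the "packed binary format" it mentions, can
be illustrated as below.  The use of :mod:`struct` and the little-endian layout
are assumptions made for the example, not something this module prescribes.

```python
import struct
import zlib

data = b"123456789"  # conventional CRC test input

# Mask to get the same unsigned value on every Python version.
crc = zlib.crc32(data) & 0xffffffff

# Pack as a 4-byte little-endian unsigned integer (layout chosen for
# illustration only).
packed = struct.pack("<I", crc)

# A running checksum over two pieces matches the checksum of the whole.
running = zlib.crc32(data[4:], zlib.crc32(data[:4])) & 0xffffffff
```

For this input ``crc`` is ``0xCBF43926``, the commonly quoted check value for
CRC-32, and ``running`` equals ``crc``.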

.. function:: decompress(string[, wbits[, bufsize]])

   Decompresses the data in *string*, returning a string containing the
   uncompressed data.  The *wbits* parameter depends on
   the format of *string*, and is discussed further below.
   If *bufsize* is given, it is used as the initial size of the output
   buffer.  Raises the :exc:`error` exception if any error occurs.

   .. _decompress-wbits:

   The *wbits* parameter controls the size of the history buffer
   (or "window size"), and what header and trailer format is expected.
   It is similar to the parameter for :func:`compressobj`, but accepts
   more ranges of values:

   * +8 to +15: The base-two logarithm of the window size.  The input
     must include a zlib header and trailer.

   * 0: Automatically determine the window size from the zlib header.
     Only supported since zlib 1.2.3.5.

   * -8 to -15: Uses the absolute value of *wbits* as the window size
     logarithm.  The input must be a raw stream with no header or trailer.

   * +24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm.  The input must include a gzip header and
     trailer.

   * +40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as
     the window size logarithm, and automatically accepts either
     the zlib or gzip format.

   When decompressing a stream, the window size must not be smaller
   than the size originally used to compress the stream; using a too-small
   value may result in an :exc:`error` exception. The default *wbits* value
   is 15, which corresponds to the largest window size and requires a zlib
   header and trailer to be included.

   *bufsize* is the initial size of the buffer used to hold decompressed data.  If
   more space is required, the buffer size will be increased as needed, so you
   don't have to get this value exactly right; tuning it will only save a few calls
   to :c:func:`malloc`.  The default size is 16384.

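A minimal sketch of how the *wbits* values above interact with the input format
follows.  The helper ``make_stream`` and the sample data are hypothetical; the
streams are produced with :func:`compressobj` purely so that all three formats
are available for the demonstration.

```python
import zlib

data = b"auto-detect me " * 32  # made-up input


def make_stream(wbits):
    """Produce a complete compressed stream in the format selected by wbits."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


zlib_stream = make_stream(15)
gzip_stream = make_stream(16 + 15)
raw_stream = make_stream(-15)

# 32 + 15 accepts either a zlib or a gzip header.
auto_wbits = 32 + 15
from_zlib = zlib.decompress(zlib_stream, auto_wbits)
from_gzip = zlib.decompress(gzip_stream, auto_wbits)

# A raw stream carries no header, so the negative value is given explicitly.
from_raw = zlib.decompress(raw_stream, -15)

# With the default wbits a zlib header is required, so a gzip stream fails.
try:
    zlib.decompress(gzip_stream)
    mismatch_detected = False
except zlib.error:
    mismatch_detected = True
```

The automatic mode is convenient when the producer of the data is not known in
advance, but it does not cover raw streams.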

.. function:: decompressobj([wbits])

   Returns a decompression object, to be used for decompressing data streams that
   won't fit into memory at once.

   The *wbits* parameter controls the size of the history buffer (or the
   "window size"), and what header and trailer format is expected.  It has
   the same meaning as `described for decompress() <#decompress-wbits>`__.

Compression objects support the following methods:


.. method:: Compress.compress(string)

   Compress *string*, returning a string containing compressed data for at least
   part of the data in *string*.  This data should be concatenated to the output
   produced by any preceding calls to the :meth:`compress` method.  Some input may
   be kept in internal buffers for later processing.


.. method:: Compress.flush([mode])

   All pending input is processed, and a string containing the remaining compressed
   output is returned.  *mode* can be selected from the constants
   :const:`Z_SYNC_FLUSH`, :const:`Z_FULL_FLUSH`, or :const:`Z_FINISH`,
   defaulting to :const:`Z_FINISH`.  :const:`Z_SYNC_FLUSH` and
   :const:`Z_FULL_FLUSH` allow compressing further strings of data, while
   :const:`Z_FINISH` finishes the compressed stream and prevents compressing any
   more data.  After calling :meth:`flush` with *mode* set to :const:`Z_FINISH`,
   the :meth:`compress` method cannot be called again; the only realistic action is
   to delete the object.

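The difference between the flush modes can be sketched with a compressor and a
decompressor used side by side.  The message contents are made up, and the
pairing with a decompression object is only there to show what a receiver could
decode at each point.

```python
import zlib

comp = zlib.compressobj()
decomp = zlib.decompressobj()

# Z_SYNC_FLUSH emits everything compressed so far but keeps the stream open.
first = comp.compress(b"first message\n") + comp.flush(zlib.Z_SYNC_FLUSH)
received_first = decomp.decompress(first)

# The same compression object can still accept more data.
second = comp.compress(b"second message\n") + comp.flush(zlib.Z_SYNC_FLUSH)
received_second = decomp.decompress(second)

# Z_FINISH (the default mode) terminates the stream.
tail = comp.flush()
received_tail = decomp.decompress(tail) + decomp.flush()
```

After each sync flush the receiver can recover the full message sent so far;
the final flush carries only the end-of-stream marker and trailer, so
``received_tail`` is empty.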

.. method:: Compress.copy()

   Returns a copy of the compression object.  This can be used to efficiently
   compress a set of data that share a common initial prefix.

   .. versionadded:: 2.5

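One way the common-prefix use of :meth:`Compress.copy` might look is sketched
here.  The prefix and record bodies are invented for the example.

```python
import zlib

# Hypothetical data: every record starts with the same long prefix.
prefix = b"HEADER: shared by every record\n" * 20
bodies = [b"record one\n", b"record two\n"]

base = zlib.compressobj()
prefix_out = base.compress(prefix)   # compress the shared prefix only once

streams = []
for body in bodies:
    comp = base.copy()               # resume from the state after the prefix
    streams.append(prefix_out + comp.compress(body) + comp.flush())
```

Each entry in ``streams`` is a complete stream that decompresses to the prefix
followed by its own body, while the work of compressing the prefix is done once.
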
Decompression objects support the following methods, and two attributes:


.. attribute:: Decompress.unused_data

   A string which contains any bytes past the end of the compressed data. That is,
   this remains ``""`` until the last byte that contains compression data is
   available.  If the whole string turned out to contain compressed data, this is
   ``""``, the empty string.

   The only way to determine where a string of compressed data ends is by actually
   decompressing it.  This means that when compressed data is part of a
   larger file, you can only find the end of it by reading data and feeding it
   followed by some non-empty string into a decompression object's
   :meth:`decompress` method until the :attr:`unused_data` attribute is no longer
   the empty string.

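The procedure described for :attr:`unused_data` can be sketched as a loop that
feeds a larger blob in pieces.  The blob layout, the trailing marker and the
16-byte piece size are all assumptions made for the illustration.

```python
import zlib

payload = b"compressed part " * 10
# Hypothetical container: compressed data followed by unrelated bytes.
blob = zlib.compress(payload) + b"TRAILING BYTES"

decomp = zlib.decompressobj()
recovered = b""
fed = 0
for start in range(0, len(blob), 16):
    piece = blob[start:start + 16]
    recovered += decomp.decompress(piece)
    fed += len(piece)
    if decomp.unused_data:
        # Bytes past the end of the compressed stream have been seen.
        break

# Offset in blob at which the compressed data ends.
end_of_stream = fed - len(decomp.unused_data)
trailing = blob[end_of_stream:]
```

``recovered`` holds the original payload and ``trailing`` holds whatever
followed the compressed data in the blob.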

.. attribute:: Decompress.unconsumed_tail

   A string that contains any data that was not consumed by the last
   :meth:`decompress` call because it exceeded the limit for the uncompressed data
   buffer.  This data has not yet been seen by the zlib machinery, so you must feed
   it (possibly with further data concatenated to it) back to a subsequent
   :meth:`decompress` method call in order to get correct output.


.. method:: Decompress.decompress(string[, max_length])

   Decompress *string*, returning a string containing the uncompressed data
   corresponding to at least part of the data in *string*.  This data should be
   concatenated to the output produced by any preceding calls to the
   :meth:`decompress` method.  Some of the input data may be preserved in internal
   buffers for later processing.

   If the optional parameter *max_length* is non-zero then the return value will be
   no longer than *max_length*. This may mean that not all of the compressed input
   can be processed, and unconsumed data will be stored in the attribute
   :attr:`unconsumed_tail`. This string must be passed to a subsequent call to
   :meth:`decompress` if decompression is to continue.  If *max_length* is not
   supplied then the whole input is decompressed, and :attr:`unconsumed_tail` is an
   empty string.

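A typical loop built on *max_length* and :attr:`unconsumed_tail` might look like
the sketch below.  The 4096-byte limit and the input are arbitrary choices for
the example.

```python
import zlib

original = b"0123456789" * 10000   # made-up, highly compressible input
compressed = zlib.compress(original)

decomp = zlib.decompressobj()
pieces = []
pending = compressed
while pending:
    # Produce at most 4096 bytes of output per call.
    piece = decomp.decompress(pending, 4096)
    pieces.append(piece)
    # Input that was not consumed must be fed back on the next call.
    pending = decomp.unconsumed_tail

# Collect any output still held in internal buffers.
pieces.append(decomp.flush())
result = b"".join(pieces)
```

This bounds the memory used per call, which is useful when the compressed input
comes from an untrusted source and may expand to a very large size.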

.. method:: Decompress.flush([length])

   All pending input is processed, and a string containing the remaining
   uncompressed output is returned.  After calling :meth:`flush`, the
   :meth:`decompress` method cannot be called again; the only realistic action is
   to delete the object.

   The optional parameter *length* sets the initial size of the output buffer.


.. method:: Decompress.copy()

   Returns a copy of the decompression object.  This can be used to save the state
   of the decompressor midway through the data stream in order to speed up random
   seeks into the stream at a future point.

   .. versionadded:: 2.5

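The checkpointing idea behind :meth:`Decompress.copy` can be sketched as
follows.  The data and the choice of the midpoint as the checkpoint are
assumptions made for illustration.

```python
import zlib

# Made-up input consisting of numbered lines.
data = b"".join(("line %d\n" % n).encode("ascii") for n in range(2000))
compressed = zlib.compress(data)
half = len(compressed) // 2

decomp = zlib.decompressobj()
head = decomp.decompress(compressed[:half])
checkpoint = decomp.copy()   # snapshot of the state at the midpoint

# Finish the stream from the original object...
tail = decomp.decompress(compressed[half:]) + decomp.flush()

# ...and again from the snapshot, without re-reading the first half.
tail_again = checkpoint.decompress(compressed[half:]) + checkpoint.flush()
```

Both continuations yield the same output, so a saved copy can act as a restart
point for later seeks into the second half of the stream.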

.. seealso::

   Module :mod:`gzip`
      Reading and writing :program:`gzip`\ -format files.

   http://www.zlib.net
      The zlib library home page.

   http://www.zlib.net/manual.html
      The zlib manual explains the semantics and usage of the library's many
      functions.

    303