:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | Mode             | Action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this.  If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is expected to be positioned at offset 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* (default ``9``) to
   specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``.  :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks.  No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`.  The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
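
As a rough illustration of the two mode syntaxes described for :func:`tarfile.open`, the sketch below writes a tiny gzip-compressed archive to a temporary file and reads it back with transparent detection (``'r'``), then pushes some bytes through an in-memory stream (``'w|gz'`` and ``'r|gz'``). The helper names, file names and payload are invented for the example, and the code is written so that it should run on Python 2.7 as well as Python 3.

```python
import io
import os
import shutil
import tarfile
import tempfile


def file_roundtrip(write_mode, read_mode):
    """Write one member with write_mode, then list the names via read_mode."""
    workdir = tempfile.mkdtemp()
    try:
        source = os.path.join(workdir, "hello.txt")
        with open(source, "wb") as f:
            f.write(b"hello\n")
        archive = os.path.join(workdir, "sample.tar.gz")
        tar = tarfile.open(archive, write_mode)
        tar.add(source, arcname="hello.txt")
        tar.close()
        tar = tarfile.open(archive, read_mode)
        names = tar.getnames()
        tar.close()
        return names
    finally:
        shutil.rmtree(workdir)


def stream_roundtrip(payload):
    """Push payload through an in-memory 'w|gz' stream and read it via 'r|gz'."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w|gz")
    info = tarfile.TarInfo("data.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    tar.close()          # finishes the stream; buf itself stays open

    buf.seek(0)
    tar = tarfile.open(fileobj=buf, mode="r|gz")
    member = tar.next()  # a stream is read strictly front to back
    data = tar.extractfile(member).read()
    tar.close()
    return member.name, data


names = file_roundtrip("w:gz", "r")       # 'r' detects the gzip compression
streamed = stream_roundtrip(b"abc" * 100)
```

In the stream variant each member has to be consumed before moving on to the next one, because the underlying object is never rewound.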


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
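
A minimal sketch of :func:`is_tarfile` used as a pre-check, assuming two throwaway files in a temporary directory (both file names are made up for the example):

```python
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
try:
    # A plain text file: not something tarfile can read.
    text_path = os.path.join(workdir, "notes.txt")
    with open(text_path, "wb") as f:
        f.write(b"just some text\n")

    # A real (uncompressed) archive containing that file.
    tar_path = os.path.join(workdir, "notes.tar")
    tar = tarfile.open(tar_path, "w")
    tar.add(text_path, arcname="notes.txt")
    tar.close()

    results = {
        "notes.txt": tarfile.is_tarfile(text_path),
        "notes.tar": tarfile.is_tarfile(tar_path),
    }
finally:
    shutil.rmtree(workdir)
```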


.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.
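
The following sketch shows one way these exceptions might be handled when opening data of unknown origin. ``try_open`` is a hypothetical helper, and the archive is built in memory purely for the demonstration; catching :exc:`TarError` also covers its subclasses such as :exc:`ReadError`.

```python
import io
import tarfile


def try_open(data, mode):
    """Return the member names, or the name of the tarfile exception raised."""
    try:
        tar = tarfile.open(fileobj=io.BytesIO(data), mode=mode)
    except tarfile.TarError as exc:       # base class of the other exceptions
        return type(exc).__name__
    try:
        return tar.getnames()
    finally:
        tar.close()


# Build a small gzip-compressed archive in memory.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w:gz")
info = tarfile.TarInfo("hello.txt")
info.size = 5
tar.addfile(info, io.BytesIO(b"hello"))
tar.close()
gz_data = buf.getvalue()

strict = try_open(gz_data, "r:")     # insists on "no compression"
lenient = try_open(gz_data, "r")     # transparent detection
garbage = try_open(b"not a tar archive", "r")
```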


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled.  If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.
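
To make the lookup rules above concrete, here is a small sketch that stores the same name twice in an in-memory archive and then queries it. ``build_archive`` and the member names are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def build_archive(entries):
    """Return the bytes of an uncompressed tar holding (name, data) pairs."""
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    for name, data in entries:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    tar.close()
    return buf.getvalue()


raw = build_archive([("a.txt", b"old"), ("b.txt", b"bee"), ("a.txt", b"newer")])

tar = tarfile.open(fileobj=io.BytesIO(raw), mode="r")
names = tar.getnames()                        # archive order, duplicates kept
sizes = [m.size for m in tar.getmembers()]    # same order as getnames()
latest_size = tar.getmember("a.txt").size     # last occurrence wins
try:
    tar.getmember("missing.txt")
    missing_raises = False
except KeyError:
    missing_raises = True
tar.close()
```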


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5
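
One possible form of the "prior inspection" mentioned in the warning is a generator that is passed as *members*. The sketch below is an assumption about what a conservative policy could look like, not a complete defence: ``safe_members`` is a hypothetical helper that keeps only regular files and directories whose resolved path stays below the destination, and it silently drops links and device nodes.

```python
import os
import tarfile


def safe_members(members, dest):
    """Yield only plain files and directories that would land inside dest."""
    root = os.path.realpath(dest)
    for member in members:
        target = os.path.realpath(os.path.join(root, member.name))
        inside = target == root or target.startswith(root + os.sep)
        if inside and (member.isreg() or member.isdir()):
            yield member


# Quick self-check with hand-made TarInfo objects (no archive needed).
candidates = [
    tarfile.TarInfo("ok.txt"),
    tarfile.TarInfo("sub/also-ok.txt"),
    tarfile.TarInfo("../escape.txt"),
    tarfile.TarInfo("/absolute.txt"),
    tarfile.TarInfo("sub/../../escape.txt"),
]
accepted = [m.name for m in safe_members(candidates, "unpacked")]
```

With a real archive the call would look like ``tar.extractall(dest, members=safe_members(tar, dest))``, since iterating over a :class:`TarFile` yields its :class:`TarInfo` objects.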


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned. If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only.  It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.
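
A short sketch of :meth:`extractfile` reading a member without writing anything to disk. The archive is assembled in memory first, and the member names ``notes.txt`` and ``docs`` are arbitrary choices for the example.

```python
import io
import tarfile

# Build a tiny archive in memory: one regular file and one directory entry.
buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w")
text = b"first line\nsecond line\n"
info = tarfile.TarInfo("notes.txt")
info.size = len(text)
tar.addfile(info, io.BytesIO(text))
folder = tarfile.TarInfo("docs")
folder.type = tarfile.DIRTYPE
tar.addfile(folder)
tar.close()

tar = tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r")
member_file = tar.extractfile("notes.txt")
lines = member_file.readlines()               # read-only, file-like access
member_file.close()
directory_result = tar.extractfile("docs")    # neither a file nor a link
tar.close()
```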


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead.  For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.
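
The sketch below combines *arcname* with a *filter* that returns :const:`None` to leave members out. ``skip_bytecode`` and the directory layout are hypothetical; *filter* is passed by keyword, as recommended above.

```python
import os
import shutil
import tarfile
import tempfile


def skip_bytecode(tarinfo):
    """Filter for TarFile.add(): drop *.pyc members, keep everything else."""
    if tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo


workdir = tempfile.mkdtemp()
try:
    package = os.path.join(workdir, "src")
    os.mkdir(package)
    for filename in ("mod.py", "mod.pyc"):
        with open(os.path.join(package, filename), "wb") as f:
            f.write(b"# placeholder\n")

    archive = os.path.join(workdir, "pkg.tar")
    tar = tarfile.open(archive, "w")
    # Store the directory under the name "pkg" instead of its real path.
    tar.add(package, arcname="pkg", filter=skip_bytecode)
    tar.close()

    tar = tarfile.open(archive)
    names = sorted(tar.getnames())
    tar.close()
finally:
    shutil.rmtree(workdir)
```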


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.  You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. note::
      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file.  The file is either named by *name*, or
   specified as a file object *fileobj* with a file descriptor.  If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*'s
   :attr:`~file.name` attribute, or the *name* argument.

   You can modify some of the :class:`TarInfo`'s attributes before you add it
   using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying.  This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.
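
As an illustration of the :meth:`gettarinfo` and :meth:`addfile` pairing, the sketch below takes the metadata of a temporary file, adjusts a few attributes and then adds the file under a new name. The paths and the replacement owner are arbitrary values chosen for the example.

```python
import io
import os
import shutil
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
try:
    path = os.path.join(workdir, "report.txt")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers\n")

    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w")
    info = tar.gettarinfo(path, arcname="report.txt")  # metadata from os.stat
    info.name = "archive/report-final.txt"             # rename the member
    info.uid = info.gid = 0                            # drop local ownership
    info.uname = info.gname = "root"
    with open(path, "rb") as f:                        # binary mode
        tar.addfile(info, f)
    tar.close()
finally:
    shutil.rmtree(workdir)

tar = tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r")
stored = tar.getmember("archive/report-final.txt")
summary = (stored.size, stored.uid, stored.uname)
tar.close()
```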


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6
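
A minimal sketch of writing a pax global header and reading it back through :attr:`TarFile.pax_headers`. The ``comment`` keyword and its value are just sample data, and the ``u''`` literals are used so that the strings are unicode on Python 2 as well.

```python
import io
import tarfile

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                   pax_headers={u"comment": u"nightly backup"})
info = tarfile.TarInfo("data.txt")
info.size = 4
tar.addfile(info, io.BytesIO(b"data"))
tar.close()

# Opening for reading parses the global header that precedes the first member.
tar = tarfile.open(fileobj=io.BytesIO(buf.getvalue()), mode="r")
global_headers = dict(tar.pax_headers)
tar.close()
```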


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type.  *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`.  To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is a character device, a block device or a FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names, sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
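
The filename limits listed above can be observed with a small experiment. In the sketch below, ``can_store`` is a hypothetical helper that tries to write an empty member with the given name in each format; it assumes that a name which does not fit the ustar header is reported as :exc:`ValueError`.

```python
import io
import tarfile


def can_store(name, fmt):
    """Return True if a member called name can be written using format fmt."""
    tar = tarfile.open(fileobj=io.BytesIO(), mode="w", format=fmt)
    try:
        tar.addfile(tarfile.TarInfo(name))
        return True
    except ValueError:          # raised when a header field does not fit
        return False
    finally:
        tar.close()


long_name = "x" * 300           # no "/" to split on, so too long for ustar
results = {
    "ustar": can_store(long_name, tarfile.USTAR_FORMAT),
    "gnu": can_store(long_name, tarfile.GNU_FORMAT),
    "pax": can_store(long_name, tarfile.PAX_FORMAT),
}
```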

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged.  Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names. These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
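
The claim that pax archives carry non-ASCII names as *UTF-8* can be checked by looking at the raw bytes of a small archive. This is only a sketch: the member name is made up, the unicode name is passed in directly, and the check looks for the ``path`` record of the pax extended header. How the name comes back when the archive is read depends on the Python version and the *encoding* in use, so the example does not test that part.

```python
import io
import tarfile

name = u"caf\u00e9.txt"         # contains one non-ASCII character

buf = io.BytesIO()
tar = tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT)
tar.addfile(tarfile.TarInfo(name))
tar.close()
raw = buf.getvalue()

# The pax extended header stores the name as a UTF-8 encoded "path" record.
has_utf8_record = (b"path=" + name.encode("utf-8")) in raw
```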