:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip, bz2 and lzma compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip`, :mod:`bz2` and :mod:`lzma` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for all variants of the *sparse* extension
  including restoration of sparse files.

* read/write support for the POSIX.1-2001 (pax) format.

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.

.. versionchanged:: 3.3
   Added support for :mod:`lzma` compression.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``. Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'r:xz'``       | Open for reading with lzma compression.     |
   +------------------+---------------------------------------------+
   | ``'x'`` or       | Create a tarfile exclusively without        |
   | ``'x:'``         | compression.                                |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:gz'``       | Create a tarfile with gzip compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:bz2'``      | Create a tarfile with bzip2 compression.    |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'x:xz'``       | Create a tarfile with lzma compression.     |
   |                  | Raise a :exc:`FileExistsError` exception    |
   |                  | if it already exists.                       |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+
   | ``'w:xz'``       | Open for lzma compressed writing.           |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'``, ``'a:bz2'`` or ``'a:xz'`` is not possible. If *mode*
   is not suitable to open a certain (compressed) file for reading,
   :exc:`ReadError` is raised. Use *mode* ``'r'`` to avoid this. If a
   compression method is not supported, :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a :term:`file object`
   opened in binary mode for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, ``'x:gz'``,
   ``'x:bz2'``, :func:`tarfile.open` accepts the keyword argument
   *compresslevel* (default ``9``) to specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket :term:`file object` or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access, see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|xz'``  | Open an lzma compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|xz'``  | Open an lzma compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. class:: TarFile

   Class for reading and writing tar archives. Do not use this class directly:
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.
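
The mode strings accepted by :func:`tarfile.open` and the check performed by
:func:`is_tarfile` can be tried out in a throwaway directory. The following is
a minimal sketch rather than a recommended recipe; the file names
``hello.txt`` and ``sample.tar.gz`` are arbitrary placeholders chosen for
illustration.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "hello.txt")        # placeholder file name
    archive = os.path.join(tmp, "sample.tar.gz")   # placeholder archive name
    with open(source, "w") as f:
        f.write("hello\n")

    # 'w:gz' selects gzip compression explicitly when writing.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="hello.txt")

    # is_tarfile() reports whether the module can read the given file.
    looks_like_tar = tarfile.is_tarfile(archive)
    not_a_tar = tarfile.is_tarfile(source)

    # Plain 'r' detects the compression transparently.
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

    # 'r:' insists on an uncompressed archive, so the gzip file is rejected.
    try:
        tarfile.open(archive, "r:")
    except tarfile.ReadError:
        strict_mode_failed = True
    else:
        strict_mode_failed = False

print(looks_like_tar, not_a_tar, names, strict_mode_failed)
```

Using ``'r'`` for reading keeps the calling code independent of how the
archive was compressed, which is why the mode table marks it as recommended.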


The :mod:`tarfile` module defines the following exceptions:


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 3.2
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. *name* may be a :term:`path-like object`.
   It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file, ``'w'`` to create a new file overwriting an existing
   one, or ``'x'`` to create a new file only if it does not already exist.

   If *fileobj* is given, it is used for reading or writing data.
   If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError`
   exceptions. If ``2``, all *non-fatal* errors are raised as :exc:`TarError`
   exceptions as well.

   The *encoding* and *errors* arguments define the character encoding to be
   used for reading or writing the archive and how conversion errors are going
   to be handled. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   The *pax_headers* argument is an optional dictionary of strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.

   .. versionchanged:: 3.5
      The ``'x'`` (exclusive creation) mode was added.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True, *, members=None)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced. If optional *members* is
   given, it must be a subset of the list returned by :meth:`getmembers`.

   .. versionchanged:: 3.5
      Added the *members* parameter.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*. *path* may be a :term:`path-like object`.
   File attributes (owner, mtime, mode) are set unless *set_attrs* is false.

   If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
   are used to set the owner/group for the extracted files. Otherwise, the named
   values from the tarfile are used.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.

   .. versionchanged:: 3.2
      Added the *set_attrs* parameter.

   .. versionchanged:: 3.5
      Added the *numeric_owner* parameter.

   .. versionchanged:: 3.6
      The *path* parameter accepts a :term:`path-like object`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file or a link, an
   :class:`io.BufferedReader` object is returned. Otherwise, :const:`None` is
   returned.

   .. versionchanged:: 3.3
      Return an :class:`io.BufferedReader` object.


.. method:: TarFile.add(name, arcname=None, recursive=True, *, filter=None)

   Add the file *name* to the archive. *name* may be any type of file
   (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an
   alternative name for the file in the archive. Directories are added
   recursively by default. This can be avoided by setting *recursive* to
   :const:`False`. Recursion adds entries in sorted order.
   If *filter* is given, it
   should be a function that takes a :class:`TarInfo` object argument and
   returns the changed :class:`TarInfo` object. If it instead returns
   :const:`None` the :class:`TarInfo` object will be excluded from the
   archive. See :ref:`tar-examples` for an example.

   .. versionchanged:: 3.2
      Added the *filter* parameter.

   .. versionchanged:: 3.7
      Recursion adds entries in sorted order.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   it should be a :term:`binary file`, and
   ``tarinfo.size`` bytes are read from it and added to the archive. You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a :term:`file object` *fileobj* with a file descriptor.
   *name* may be a :term:`path-like object`. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*'s
   :attr:`~io.FileIO.name` attribute, or the *name* argument. The name
   should be a text string.

   You can modify
   some of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.

   .. versionchanged:: 3.6
      The *name* parameter accepts a :term:`path-like object`.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.



.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. classmethod:: TarInfo.frombuf(buf, encoding, errors)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   Raises :exc:`HeaderError` if the buffer is invalid.


.. classmethod:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 3.2
      Use ``'surrogateescape'`` as the default for the *errors* argument.


A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file name, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.


A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tarfile-commandline:
.. program:: tarfile

Command-Line Interface
----------------------

.. versionadded:: 3.4

The :mod:`tarfile` module provides a simple command-line interface to interact
with tar archives.

If you want to create a new tar archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m tarfile -c monty.tar life-of-brian_1979/

If you want to extract a tar archive into the current directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar

You can also extract a tar archive into a different directory by passing the
directory's name:

.. code-block:: shell-session

    $ python -m tarfile -e monty.tar other-dir/

For a list of the files in a tar archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m tarfile -l monty.tar


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <tarfile>
               --list <tarfile>

   List files in a tarfile.

.. cmdoption:: -c <tarfile> <source1> ... <sourceN>
               --create <tarfile> <source1> ... <sourceN>

   Create tarfile from source files.

.. cmdoption:: -e <tarfile> [<output_dir>]
               --extract <tarfile> [<output_dir>]

   Extract tarfile into the current directory if *output_dir* is not specified.

.. cmdoption:: -t <tarfile>
               --test <tarfile>

   Test whether the tarfile is valid or not.

.. cmdoption:: -v, --verbose

   Verbose output.

.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       # The trailing space keeps "is" apart from the description that follows.
       print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="")
       if tarinfo.isreg():
           print("a regular file.")
       elif tarinfo.isdir():
           print("a directory.")
       else:
           print("something else.")
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 GiB. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 GiB and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names; sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format. This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.
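
The practical difference between the three writable formats shows up with
long member names. The sketch below is an illustration under stated
assumptions, not a specification of the formats: the 150-character name
``LONG_NAME`` and the helper ``try_format`` are made up for this example, and
the name deliberately contains no ``/``, so the ustar writer has no way of
splitting it into a prefix and a name. In the CPython implementation this is
expected to surface as a :exc:`ValueError` when the header is built.

```python
import io
import tarfile

LONG_NAME = "x" * 150   # made-up member name, longer than 100 characters
PAYLOAD = b"hello"


def try_format(fmt):
    """Write one member named LONG_NAME using *fmt*.

    Return the name read back from the archive, or None if the
    format could not store the name.
    """
    buf = io.BytesIO()
    info = tarfile.TarInfo(LONG_NAME)
    info.size = len(PAYLOAD)
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(info, io.BytesIO(PAYLOAD))
    except ValueError:
        # Raised while building the header if the name does not fit.
        return None
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        return tar.getnames()[0]


results = {
    "ustar": try_format(tarfile.USTAR_FORMAT),
    "gnu": try_format(tarfile.GNU_FORMAT),
    "pax": try_format(tarfile.PAX_FORMAT),
}

for label, name in results.items():
    print(label, "stored the long name" if name == LONG_NAME else "rejected it")
```

Passing *format* explicitly, as done here, avoids depending on
:const:`DEFAULT_FORMAT`.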

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding *UTF-8*.

The details of character conversion in :mod:`tarfile` are controlled by the
*encoding* and *errors* keyword arguments of the :class:`TarFile` class.

*encoding* defines the character encoding to use for the metadata in the
archive. The default value is :func:`sys.getfilesystemencoding` or ``'ascii'``
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If *encoding* is not set
appropriately, this conversion may fail.

The *errors* argument defines how characters are treated that cannot be
converted. Possible values are listed in section :ref:`error-handlers`.
The default scheme is ``'surrogateescape'`` which Python also uses for its
file system calls, see :ref:`os-filenames`.

In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
because all the metadata is stored using *UTF-8*. *encoding* is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
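
The effect of *encoding* can be observed by writing the same non-ASCII member
name in two formats and reading it back with different settings. This is a
rough sketch under assumptions: the member name ``café.txt`` and the helpers
``write_archive`` and ``read_name`` are invented for the illustration, and the
exact damaged string is a detail of the default ``'surrogateescape'`` handler.

```python
import io
import tarfile

NAME = "caf\u00e9.txt"   # invented member name with one non-ASCII character


def write_archive(fmt, encoding):
    """Return the bytes of an archive holding one empty member called NAME."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding=encoding) as tar:
        tar.addfile(tarfile.TarInfo(NAME))
    return buf.getvalue()


def read_name(data, encoding):
    """Return the first member name, decoding metadata with *encoding*."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:", encoding=encoding) as tar:
        return tar.getnames()[0]


gnu = write_archive(tarfile.GNU_FORMAT, "iso8859-1")
pax = write_archive(tarfile.PAX_FORMAT, "iso8859-1")

gnu_right = read_name(gnu, "iso8859-1")   # same encoding as the writer
gnu_wrong = read_name(gnu, "utf-8")       # mismatched encoding
pax_any = read_name(pax, "utf-8")         # pax keeps the name as UTF-8

print(ascii(gnu_right), ascii(gnu_wrong), ascii(pax_any))
```

With the GNU format the reader has to know the writer's encoding, while the
pax archive yields the original name regardless of the *encoding* passed when
reading.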