:mod:`tarfile` --- Read and write tar archive files
===================================================

.. module:: tarfile
   :synopsis: Read and write tar-format archive files.


.. versionadded:: 2.3

.. moduleauthor:: Lars Gustäbel <lars@gustaebel.de>
.. sectionauthor:: Lars Gustäbel <lars@gustaebel.de>

**Source code:** :source:`Lib/tarfile.py`

--------------

The :mod:`tarfile` module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
Use the :mod:`zipfile` module to read or write :file:`.zip` files, or the
higher-level functions in :ref:`shutil <archiving-operations>`.

Some facts and figures:

* reads and writes :mod:`gzip` and :mod:`bz2` compressed archives
  if the respective modules are available.

* read/write support for the POSIX.1-1988 (ustar) format.

* read/write support for the GNU tar format including *longname* and *longlink*
  extensions, read-only support for the *sparse* extension.

* read/write support for the POSIX.1-2001 (pax) format.

  .. versionadded:: 2.6

* handles directories, regular files, hardlinks, symbolic links, fifos,
  character devices and block devices and is able to acquire and restore file
  information like timestamp, access permissions and owner.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, \*\*kwargs)

   Return a :class:`TarFile` object for the pathname *name*. For detailed
   information on :class:`TarFile` objects and the keyword arguments that are
   allowed, see :ref:`tarfile-objects`.

   *mode* has to be a string of the form ``'filemode[:compression]'``; it defaults
   to ``'r'``.
   Here is a full list of mode combinations:

   +------------------+---------------------------------------------+
   | mode             | action                                      |
   +==================+=============================================+
   | ``'r' or 'r:*'`` | Open for reading with transparent           |
   |                  | compression (recommended).                  |
   +------------------+---------------------------------------------+
   | ``'r:'``         | Open for reading exclusively without        |
   |                  | compression.                                |
   +------------------+---------------------------------------------+
   | ``'r:gz'``       | Open for reading with gzip compression.     |
   +------------------+---------------------------------------------+
   | ``'r:bz2'``      | Open for reading with bzip2 compression.    |
   +------------------+---------------------------------------------+
   | ``'a' or 'a:'``  | Open for appending with no compression. The |
   |                  | file is created if it does not exist.       |
   +------------------+---------------------------------------------+
   | ``'w' or 'w:'``  | Open for uncompressed writing.              |
   +------------------+---------------------------------------------+
   | ``'w:gz'``       | Open for gzip compressed writing.           |
   +------------------+---------------------------------------------+
   | ``'w:bz2'``      | Open for bzip2 compressed writing.          |
   +------------------+---------------------------------------------+

   Note that ``'a:gz'`` or ``'a:bz2'`` is not possible. If *mode* is not suitable
   to open a certain (compressed) file for reading, :exc:`ReadError` is raised. Use
   *mode* ``'r'`` to avoid this. If a compression method is not supported,
   :exc:`CompressionError` is raised.

   If *fileobj* is specified, it is used as an alternative to a file object opened
   for *name*. It is supposed to be at position 0.

   For modes ``'w:gz'``, ``'r:gz'``, ``'w:bz2'``, ``'r:bz2'``, :func:`tarfile.open`
   accepts the keyword argument *compresslevel* (default ``9``) to
   specify the compression level of the file.

   For special purposes, there is a second format for *mode*:
   ``'filemode|[compression]'``. :func:`tarfile.open` will return a :class:`TarFile`
   object that processes its data as a stream of blocks. No random seeking will
   be done on the file. If given, *fileobj* may be any object that has a
   :meth:`read` or :meth:`write` method (depending on the *mode*). *bufsize*
   specifies the blocksize and defaults to ``20 * 512`` bytes. Use this variant
   in combination with e.g. ``sys.stdin``, a socket file object or a tape
   device. However, such a :class:`TarFile` object is limited in that it does
   not allow random access; see :ref:`tar-examples`. The currently
   possible modes:

   +-------------+--------------------------------------------+
   | Mode        | Action                                     |
   +=============+============================================+
   | ``'r|*'``   | Open a *stream* of tar blocks for reading  |
   |             | with transparent compression.              |
   +-------------+--------------------------------------------+
   | ``'r|'``    | Open a *stream* of uncompressed tar blocks |
   |             | for reading.                               |
   +-------------+--------------------------------------------+
   | ``'r|gz'``  | Open a gzip compressed *stream* for        |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'r|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | reading.                                   |
   +-------------+--------------------------------------------+
   | ``'w|'``    | Open an uncompressed *stream* for writing. |
   +-------------+--------------------------------------------+
   | ``'w|gz'``  | Open a gzip compressed *stream* for        |
   |             | writing.                                   |
   +-------------+--------------------------------------------+
   | ``'w|bz2'`` | Open a bzip2 compressed *stream* for       |
   |             | writing.                                   |
   +-------------+--------------------------------------------+


.. class:: TarFile

   Class for reading and writing tar archives.
   Do not use this class directly;
   use :func:`tarfile.open` instead. See :ref:`tarfile-objects`.


.. function:: is_tarfile(name)

   Return :const:`True` if *name* is a tar archive file that the :mod:`tarfile`
   module can read.


.. class:: TarFileCompat(filename, mode='r', compression=TAR_PLAIN)

   Class for limited access to tar archives with a :mod:`zipfile`\ -like interface.
   Please consult the documentation of the :mod:`zipfile` module for more details.
   *compression* must be one of the following constants:


   .. data:: TAR_PLAIN

      Constant for an uncompressed tar archive.


   .. data:: TAR_GZIPPED

      Constant for a :mod:`gzip` compressed tar archive.


   .. deprecated:: 2.6
      The :class:`TarFileCompat` class has been removed in Python 3.


.. exception:: TarError

   Base class for all :mod:`tarfile` exceptions.


.. exception:: ReadError

   Is raised when a tar archive is opened that either cannot be handled by the
   :mod:`tarfile` module or is somehow invalid.


.. exception:: CompressionError

   Is raised when a compression method is not supported or when the data cannot be
   decoded properly.


.. exception:: StreamError

   Is raised for the limitations that are typical for stream-like :class:`TarFile`
   objects.


.. exception:: ExtractError

   Is raised for *non-fatal* errors when using :meth:`TarFile.extract`, but only if
   :attr:`TarFile.errorlevel`\ ``== 2``.


The following constants are available at the module level:

.. data:: ENCODING

   The default character encoding: ``'utf-8'`` on Windows, the value returned by
   :func:`sys.getfilesystemencoding` otherwise.


.. exception:: HeaderError

   Is raised by :meth:`TarInfo.frombuf` if the buffer it gets is invalid.

   .. versionadded:: 2.6


Each of the following constants defines a tar archive format that the
:mod:`tarfile` module is able to create. See section :ref:`tar-formats` for
details.


.. data:: USTAR_FORMAT

   POSIX.1-1988 (ustar) format.


.. data:: GNU_FORMAT

   GNU tar format.


.. data:: PAX_FORMAT

   POSIX.1-2001 (pax) format.


.. data:: DEFAULT_FORMAT

   The default format for creating archives. This is currently :const:`GNU_FORMAT`.


.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   :ref:`archiving-operations`
      Documentation of the higher-level archiving facilities provided by the
      standard :mod:`shutil` module.

   `GNU tar manual, Basic Tar Format <https://www.gnu.org/software/tar/manual/html_node/Standard.html>`_
      Documentation for tar archive files, including GNU tar extensions.


.. _tarfile-objects:

TarFile Objects
---------------

The :class:`TarFile` object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a :class:`TarInfo`
object, see :ref:`tarinfo-objects` for details.

A :class:`TarFile` object can be used as a context manager in a :keyword:`with`
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
:ref:`tar-examples` section for a use case.

.. versionadded:: 2.7
   Added support for the context management protocol.

.. class:: TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors=None, pax_headers=None, debug=0, errorlevel=0)

   All following arguments are optional and can be accessed as instance attributes
   as well.

   *name* is the pathname of the archive. It can be omitted if *fileobj* is given.
   In this case, the file object's :attr:`name` attribute is used if it exists.

   *mode* is either ``'r'`` to read from an existing archive, ``'a'`` to append
   data to an existing file or ``'w'`` to create a new file overwriting an existing
   one.

   If *fileobj* is given, it is used for reading or writing data. If it can be
   determined, *mode* is overridden by *fileobj*'s mode. *fileobj* will be used
   from position 0.

   .. note::

      *fileobj* is not closed when :class:`TarFile` is closed.

   *format* controls the archive format. It must be one of the constants
   :const:`USTAR_FORMAT`, :const:`GNU_FORMAT` or :const:`PAX_FORMAT` that are
   defined at module level.

   .. versionadded:: 2.6

   The *tarinfo* argument can be used to replace the default :class:`TarInfo` class
   with a different one.

   .. versionadded:: 2.6

   If *dereference* is :const:`False`, add symbolic and hard links to the archive. If it
   is :const:`True`, add the content of the target files to the archive. This has no
   effect on systems that do not support symbolic links.

   If *ignore_zeros* is :const:`False`, treat an empty block as the end of the archive.
   If it is :const:`True`, skip empty (and invalid) blocks and try to get as many members
   as possible. This is only useful for reading concatenated or damaged archives.

   *debug* can be set from ``0`` (no debug messages) up to ``3`` (all debug
   messages). The messages are written to ``sys.stderr``.

   If *errorlevel* is ``0``, all errors are ignored when using :meth:`TarFile.extract`.
   Nevertheless, they appear as error messages in the debug output when debugging
   is enabled. If ``1``, all *fatal* errors are raised as :exc:`OSError` or
   :exc:`IOError` exceptions. If ``2``, all *non-fatal* errors are raised as
   :exc:`TarError` exceptions as well.

   The *encoding* and *errors* arguments control the way strings are converted to
   unicode objects and vice versa. The default settings will work for most users.
   See section :ref:`tar-unicode` for in-depth information.

   .. versionadded:: 2.6

   The *pax_headers* argument is an optional dictionary of unicode strings which
   will be added as a pax global header if *format* is :const:`PAX_FORMAT`.

   .. versionadded:: 2.6


.. classmethod:: TarFile.open(...)

   Alternative constructor. The :func:`tarfile.open` function is actually a
   shortcut to this classmethod.


.. method:: TarFile.getmember(name)

   Return a :class:`TarInfo` object for member *name*. If *name* cannot be found
   in the archive, :exc:`KeyError` is raised.

   .. note::

      If a member occurs more than once in the archive, its last occurrence is assumed
      to be the most up-to-date version.


.. method:: TarFile.getmembers()

   Return the members of the archive as a list of :class:`TarInfo` objects. The
   list has the same order as the members in the archive.


.. method:: TarFile.getnames()

   Return the members as a list of their names. It has the same order as the list
   returned by :meth:`getmembers`.


.. method:: TarFile.list(verbose=True)

   Print a table of contents to ``sys.stdout``. If *verbose* is :const:`False`,
   only the names of the members are printed. If it is :const:`True`, output
   similar to that of :program:`ls -l` is produced.


.. method:: TarFile.next()

   Return the next member of the archive as a :class:`TarInfo` object, when
   :class:`TarFile` is opened for reading. Return :const:`None` if no more
   members are available.


.. method:: TarFile.extractall(path=".", members=None)

   Extract all members from the archive to the current working directory or
   directory *path*. If optional *members* is given, it must be a subset of the
   list returned by :meth:`getmembers`. Directory information like owner,
   modification time and permissions are set after all members have been extracted.
   This is done to work around two problems: A directory's modification time is
   reset each time a file is created in it. And, if a directory's permissions do
   not allow writing, extracting files to it will fail.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``.

   .. versionadded:: 2.5


.. method:: TarFile.extract(member, path="")

   Extract a member from the archive to the current working directory, using its
   full name. Its file information is extracted as accurately as possible. *member*
   may be a filename or a :class:`TarInfo` object. You can specify a different
   directory using *path*.

   .. note::

      The :meth:`extract` method does not take care of several extraction issues.
      In most cases you should consider using the :meth:`extractall` method.

   .. warning::

      See the warning for :meth:`extractall`.


.. method:: TarFile.extractfile(member)

   Extract a member from the archive as a file object. *member* may be a filename
   or a :class:`TarInfo` object. If *member* is a regular file, a file-like object
   is returned.
   If *member* is a link, a file-like object is constructed from the
   link's target. If *member* is none of the above, :const:`None` is returned.

   .. note::

      The file-like object is read-only. It provides the methods
      :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`seek`, :meth:`tell`,
      and :meth:`close`, and also supports iteration over its lines.


.. method:: TarFile.add(name, arcname=None, recursive=True, exclude=None, filter=None)

   Add the file *name* to the archive. *name* may be any type of file (directory,
   fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name
   for the file in the archive. Directories are added recursively by default. This
   can be avoided by setting *recursive* to :const:`False`. If *exclude* is given
   it must be a function that takes one filename argument and returns a boolean
   value. Depending on this value the respective file is either excluded
   (:const:`True`) or added (:const:`False`). If *filter* is specified it must
   be a function that takes a :class:`TarInfo` object argument and returns the
   changed :class:`TarInfo` object. If it instead returns :const:`None` the :class:`TarInfo`
   object will be excluded from the archive. See :ref:`tar-examples` for an
   example.

   .. versionchanged:: 2.6
      Added the *exclude* parameter.

   .. versionchanged:: 2.7
      Added the *filter* parameter.

   .. deprecated:: 2.7
      The *exclude* parameter is deprecated, please use the *filter* parameter
      instead. For maximum portability, *filter* should be used as a keyword
      argument rather than as a positional argument so that code won't be
      affected when *exclude* is ultimately removed.


.. method:: TarFile.addfile(tarinfo, fileobj=None)

   Add the :class:`TarInfo` object *tarinfo* to the archive. If *fileobj* is given,
   ``tarinfo.size`` bytes are read from it and added to the archive.
   You can
   create :class:`TarInfo` objects directly, or by using :meth:`gettarinfo`.

   .. note::
      On Windows platforms, *fileobj* should always be opened with mode ``'rb'`` to
      avoid problems with the file size.


.. method:: TarFile.gettarinfo(name=None, arcname=None, fileobj=None)

   Create a :class:`TarInfo` object from the result of :func:`os.stat` or
   equivalent on an existing file. The file is either named by *name*, or
   specified as a file object *fileobj* with a file descriptor. If
   given, *arcname* specifies an alternative name for the file in the
   archive, otherwise, the name is taken from *fileobj*'s
   :attr:`~file.name` attribute, or the *name* argument.

   You can modify some
   of the :class:`TarInfo`'s attributes before you add it using :meth:`addfile`.
   If the file object is not an ordinary file object positioned at the
   beginning of the file, attributes such as :attr:`~TarInfo.size` may need
   modifying. This is the case for objects such as :class:`~gzip.GzipFile`.
   The :attr:`~TarInfo.name` may also be modified, in which case *arcname*
   could be a dummy string.


.. method:: TarFile.close()

   Close the :class:`TarFile`. In write mode, two finishing zero blocks are
   appended to the archive.


.. attribute:: TarFile.posix

   Setting this to :const:`True` is equivalent to setting the :attr:`format`
   attribute to :const:`USTAR_FORMAT`, :const:`False` is equivalent to
   :const:`GNU_FORMAT`.

   .. versionchanged:: 2.4
      *posix* defaults to :const:`False`.

   .. deprecated:: 2.6
      Use the :attr:`format` attribute instead.


.. attribute:: TarFile.pax_headers

   A dictionary containing key-value pairs of pax global headers.

   .. versionadded:: 2.6


.. _tarinfo-objects:

TarInfo Objects
---------------

A :class:`TarInfo` object represents one member in a :class:`TarFile`.
Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does *not* contain the file's data itself.

:class:`TarInfo` objects are returned by :class:`TarFile`'s methods
:meth:`getmember`, :meth:`getmembers` and :meth:`gettarinfo`.


.. class:: TarInfo(name="")

   Create a :class:`TarInfo` object.


.. method:: TarInfo.frombuf(buf)

   Create and return a :class:`TarInfo` object from string buffer *buf*.

   .. versionadded:: 2.6
      Raises :exc:`HeaderError` if the buffer is invalid.


.. method:: TarInfo.fromtarfile(tarfile)

   Read the next member from the :class:`TarFile` object *tarfile* and return it as
   a :class:`TarInfo` object.

   .. versionadded:: 2.6


.. method:: TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='strict')

   Create a string buffer from a :class:`TarInfo` object. For information on the
   arguments see the constructor of the :class:`TarFile` class.

   .. versionchanged:: 2.6
      The arguments were added.

A ``TarInfo`` object has the following public data attributes:


.. attribute:: TarInfo.name

   Name of the archive member.


.. attribute:: TarInfo.size

   Size in bytes.


.. attribute:: TarInfo.mtime

   Time of last modification.


.. attribute:: TarInfo.mode

   Permission bits.


.. attribute:: TarInfo.type

   File type. *type* is usually one of these constants: :const:`REGTYPE`,
   :const:`AREGTYPE`, :const:`LNKTYPE`, :const:`SYMTYPE`, :const:`DIRTYPE`,
   :const:`FIFOTYPE`, :const:`CONTTYPE`, :const:`CHRTYPE`, :const:`BLKTYPE`,
   :const:`GNUTYPE_SPARSE`. To determine the type of a :class:`TarInfo` object
   more conveniently, use the ``is*()`` methods below.


.. attribute:: TarInfo.linkname

   Name of the target file, which is only present in :class:`TarInfo` objects
   of type :const:`LNKTYPE` and :const:`SYMTYPE`.


.. attribute:: TarInfo.uid

   User ID of the user who originally stored this member.


.. attribute:: TarInfo.gid

   Group ID of the user who originally stored this member.


.. attribute:: TarInfo.uname

   User name.


.. attribute:: TarInfo.gname

   Group name.


.. attribute:: TarInfo.pax_headers

   A dictionary containing key-value pairs of an associated pax extended header.

   .. versionadded:: 2.6

A :class:`TarInfo` object also provides some convenient query methods:


.. method:: TarInfo.isfile()

   Return :const:`True` if the :class:`TarInfo` object is a regular file.


.. method:: TarInfo.isreg()

   Same as :meth:`isfile`.


.. method:: TarInfo.isdir()

   Return :const:`True` if it is a directory.


.. method:: TarInfo.issym()

   Return :const:`True` if it is a symbolic link.


.. method:: TarInfo.islnk()

   Return :const:`True` if it is a hard link.


.. method:: TarInfo.ischr()

   Return :const:`True` if it is a character device.


.. method:: TarInfo.isblk()

   Return :const:`True` if it is a block device.


.. method:: TarInfo.isfifo()

   Return :const:`True` if it is a FIFO.


.. method:: TarInfo.isdev()

   Return :const:`True` if it is one of character device, block device or FIFO.


.. _tar-examples:

Examples
--------

How to extract an entire tar archive to the current working directory::

   import tarfile
   tar = tarfile.open("sample.tar.gz")
   tar.extractall()
   tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
a generator function instead of a list::

   import os
   import tarfile

   def py_files(members):
       for tarinfo in members:
           if os.path.splitext(tarinfo.name)[1] == ".py":
               yield tarinfo

   tar = tarfile.open("sample.tar.gz")
   tar.extractall(members=py_files(tar))
   tar.close()

How to create an uncompressed tar archive from a list of filenames::

   import tarfile
   tar = tarfile.open("sample.tar", "w")
   for name in ["foo", "bar", "quux"]:
       tar.add(name)
   tar.close()

The same example using the :keyword:`with` statement::

    import tarfile
    with tarfile.open("sample.tar", "w") as tar:
        for name in ["foo", "bar", "quux"]:
            tar.add(name)

How to read a gzip compressed tar archive and display some member information::

   import tarfile
   tar = tarfile.open("sample.tar.gz", "r:gz")
   for tarinfo in tar:
       print tarinfo.name, "is", tarinfo.size, "bytes in size and is",
       if tarinfo.isreg():
           print "a regular file."
       elif tarinfo.isdir():
           print "a directory."
       else:
           print "something else."
   tar.close()

How to create an archive and reset the user information using the *filter*
parameter in :meth:`TarFile.add`::

    import tarfile
    def reset(tarinfo):
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo
    tar = tarfile.open("sample.tar.gz", "w:gz")
    tar.add("foo", filter=reset)
    tar.close()


.. _tar-formats:

Supported tar formats
---------------------

There are three tar formats that can be created with the :mod:`tarfile` module:

* The POSIX.1-1988 ustar format (:const:`USTAR_FORMAT`). It supports filenames
  up to a length of at best 256 characters and linknames up to 100 characters. The
  maximum file size is 8 gigabytes. This is an old and limited but widely
  supported format.

* The GNU tar format (:const:`GNU_FORMAT`). It supports long filenames and
  linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
  standard on GNU/Linux systems. :mod:`tarfile` fully supports the GNU tar
  extensions for long names, sparse file support is read-only.

* The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
  format with virtually no limits. It supports long filenames and linknames, large
  files and stores pathnames in a portable way. However, not all tar
  implementations today are able to handle pax archives properly.

  The *pax* format is an extension to the existing *ustar* format. It uses extra
  headers for information that cannot be stored otherwise. There are two flavours
  of pax headers: Extended headers only affect the subsequent file header, global
  headers are valid for the complete archive and affect all following files. All
  the data in a pax header is encoded in *UTF-8* for portability reasons.

There are some more variants of the tar format which can be read, but not
created:

* The ancient V7 format. This is the first tar format from Unix Seventh Edition,
  storing only regular files and directories. Names must not be longer than 100
  characters, there is no user/group name information. Some archives have
  miscalculated header checksums in case of fields with non-ASCII characters.

* The SunOS tar extended format.
  This format is a variant of the POSIX.1-2001
  pax format, but is not compatible.

.. _tar-unicode:

Unicode issues
--------------

The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (that all other formats are merely variants of)
is that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a *UTF-8* system cannot be read
correctly on a *Latin-1* system if it contains non-ASCII characters. Names (i.e.
filenames, linknames, user/group names) containing these characters will appear
damaged. Unfortunately, there is no way to autodetect the encoding of an
archive.

The pax format was designed to solve this problem. It stores non-ASCII names
using the universal character encoding *UTF-8*. When a pax archive is read,
these *UTF-8* names are converted to the encoding of the local file system.

The details of unicode conversion are controlled by the *encoding* and *errors*
keyword arguments of the :class:`TarFile` class.

The default value for *encoding* is the local character encoding. It is deduced
from :func:`sys.getfilesystemencoding` and :func:`sys.getdefaultencoding`. In
read mode, *encoding* is used exclusively to convert unicode names from a pax
archive to strings in the local character encoding. In write mode, the use of
*encoding* depends on the chosen archive format. In case of :const:`PAX_FORMAT`,
input names that contain non-ASCII characters need to be decoded before being
stored as *UTF-8* strings. The other formats do not make use of *encoding*
unless unicode objects are used as input names.
These are converted to 8-bit
character strings before they are added to the archive.

The *errors* argument defines how characters are treated that cannot be
converted to or from *encoding*. Possible values are listed in section
:ref:`codec-base-classes`. In read mode, there is an additional scheme
``'utf-8'`` which means that bad characters are replaced by their *UTF-8*
representation. This is the default scheme. In write mode the default value for
*errors* is ``'strict'`` to ensure that name information is not altered
unnoticed.
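
As an illustration of the pax behaviour described in this section, the following
is a minimal sketch rather than a reference implementation. It assumes a current
Python 3 interpreter, where member names are text strings; the helper name
``roundtrip_names`` and the member names are hypothetical and chosen only for
this example. It writes two empty members to an in-memory archive in
:const:`PAX_FORMAT` and reads their names back:

```python
import io
import tarfile


def roundtrip_names(names, fmt=tarfile.PAX_FORMAT):
    """Write empty members called *names* to an in-memory archive, then list it.

    Hypothetical helper for illustration; it is not part of the tarfile module.
    """
    buf = io.BytesIO()
    # Mode "w" writes an uncompressed archive to the given file object.
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for name in names:
            # A bare TarInfo has size 0, so no file data needs to be supplied.
            tar.addfile(tarfile.TarInfo(name))
    # The TarFile does not close buf; rewind it so reading starts at position 0.
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


# One ASCII-only name and one name containing a non-ASCII character (e-acute).
result = roundtrip_names(["plain.txt", "caf\u00e9.txt"])
print(result)
```

Because the pax extended header stores the non-ASCII name as *UTF-8*, the name
read back should not depend on the local character encoding. Passing
:const:`USTAR_FORMAT` or :const:`GNU_FORMAT` as ``fmt`` instead makes the stored
name depend on the *encoding* and *errors* settings discussed above.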