:mod:`zipfile` --- Work with ZIP archives
=========================================

.. module:: zipfile
   :synopsis: Read and write ZIP-format archive files.

.. moduleauthor:: James C. Ahlstrom <jim (a] interet.com>
.. sectionauthor:: James C. Ahlstrom <jim (a] interet.com>

**Source code:** :source:`Lib/zipfile.py`

--------------

The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append, and list a ZIP file. Any
advanced use of this module will require an understanding of the format, as
defined in `PKZIP Application Note`_.

This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is, ZIP files that are more than 4 GiB in size). It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow as it is
implemented in native Python rather than C.

The module defines the following items:

.. exception:: BadZipFile

   The error raised for bad ZIP files.

   .. versionadded:: 3.2


.. exception:: BadZipfile

   Alias of :exc:`BadZipFile`, for compatibility with older Python versions.

   .. deprecated:: 3.2


.. exception:: LargeZipFile

   The error raised when a ZIP file would require ZIP64 functionality but it has
   not been enabled.


.. class:: ZipFile
   :noindex:

   The class for reading and writing ZIP files. See section
   :ref:`zipfile-objects` for constructor details.


.. class:: PyZipFile
   :noindex:

   Class for creating ZIP archives containing Python libraries.


.. class:: ZipInfo(filename='NoName', date_time=(1980,1,1,0,0,0))

   Class used to represent information about a member of an archive. Instances
   of this class are returned by the :meth:`.getinfo` and :meth:`.infolist`
   methods of :class:`ZipFile` objects.
   Most users of the :mod:`zipfile` module
   will not need to create these, but only use those created by this
   module. *filename* should be the full name of the archive member, and
   *date_time* should be a tuple containing six fields which describe the time
   of the last modification to the file; the fields are described in section
   :ref:`zipinfo-objects`.


.. function:: is_zipfile(filename)

   Returns ``True`` if *filename* is a valid ZIP file based on its magic number,
   otherwise returns ``False``. *filename* may be a file or file-like object too.

   .. versionchanged:: 3.1
      Support for file and file-like objects.


.. data:: ZIP_STORED

   The numeric constant for an uncompressed archive member.


.. data:: ZIP_DEFLATED

   The numeric constant for the usual ZIP compression method. This requires the
   :mod:`zlib` module.


.. data:: ZIP_BZIP2

   The numeric constant for the BZIP2 compression method. This requires the
   :mod:`bz2` module.

   .. versionadded:: 3.3

.. data:: ZIP_LZMA

   The numeric constant for the LZMA compression method. This requires the
   :mod:`lzma` module.

   .. versionadded:: 3.3

   .. note::

      The ZIP file format specification has included support for bzip2 compression
      since 2001, and for LZMA compression since 2006. However, some tools
      (including older Python releases) do not support these compression
      methods, and may either refuse to process the ZIP file altogether,
      or fail to extract individual files.


.. seealso::

   `PKZIP Application Note`_
      Documentation on the ZIP file format by Phil Katz, the creator of the format and
      algorithms used.

   `Info-ZIP Home Page <http://www.info-zip.org/>`_
      Information about the Info-ZIP project's ZIP archive programs and development
      libraries.


.. _zipfile-objects:

ZipFile Objects
---------------


.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True)

   Open a ZIP file, where *file* can be either a path to a file (a string) or a
   file-like object. The *mode* parameter should be ``'r'`` to read an existing
   file, ``'w'`` to truncate and write a new file, ``'a'`` to append to an
   existing file, or ``'x'`` to exclusively create and write a new file.
   If *mode* is ``'x'`` and *file* refers to an existing file,
   a :exc:`FileExistsError` will be raised.
   If *mode* is ``'a'`` and *file* refers to an existing ZIP
   file, then additional files are added to it. If *file* does not refer to a
   ZIP file, then a new ZIP archive is appended to the file. This is meant for
   adding a ZIP archive to another file (such as :file:`python.exe`). If
   *mode* is ``'a'`` and the file does not exist at all, it is created.
   If *mode* is ``'r'`` or ``'a'``, the file should be seekable.

   *compression* is the ZIP compression method to use when writing the archive,
   and should be :const:`ZIP_STORED`, :const:`ZIP_DEFLATED`,
   :const:`ZIP_BZIP2` or :const:`ZIP_LZMA`; unrecognized
   values will cause :exc:`NotImplementedError` to be raised. If :const:`ZIP_DEFLATED`,
   :const:`ZIP_BZIP2` or :const:`ZIP_LZMA` is specified but the corresponding module
   (:mod:`zlib`, :mod:`bz2` or :mod:`lzma`) is not available, :exc:`RuntimeError`
   is raised. The default is :const:`ZIP_STORED`.

   If *allowZip64* is ``True`` (the default), :mod:`zipfile` will create ZIP
   files that use the ZIP64 extensions when the archive is larger than 4 GiB.
   If it is false, :mod:`zipfile` will raise an exception when the ZIP file
   would require ZIP64 extensions.

   If the file is created with mode ``'w'``, ``'x'`` or ``'a'`` and then
   :meth:`closed <close>` without adding any files to the archive, the appropriate
   ZIP structures for an empty archive will be written to the file.

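
   As a minimal sketch of how the *mode* values described above behave (the
   temporary directory and the member name ``hello.txt`` are illustrative
   assumptions, and :const:`ZIP_DEFLATED` assumes that :mod:`zlib` is
   available):

   .. code-block:: python

      import os
      import tempfile
      import zipfile

      workdir = tempfile.mkdtemp()
      archive = os.path.join(workdir, "demo.zip")

      # 'w' creates (or truncates) the archive; closing it without adding
      # members still writes the structures for a valid, empty archive.
      with zipfile.ZipFile(archive, "w"):
          pass
      is_valid_empty = zipfile.is_zipfile(archive)

      # 'a' appends a member to the existing archive.
      with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
          zf.writestr("hello.txt", "hello")

      # 'x' refuses to overwrite a file that already exists.
      try:
          zipfile.ZipFile(archive, "x")
          exclusive_failed = False
      except FileExistsError:
          exclusive_failed = True

      with zipfile.ZipFile(archive) as zf:
          names = zf.namelist()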

   ZipFile is also a context manager and therefore supports the
   :keyword:`with` statement. In the example, *myzip* is closed after the
   :keyword:`with` statement's suite is finished, even if an exception occurs::

      from zipfile import ZipFile

      with ZipFile('spam.zip', 'w') as myzip:
          myzip.write('eggs.txt')

   .. versionadded:: 3.2
      Added the ability to use :class:`ZipFile` as a context manager.

   .. versionchanged:: 3.3
      Added support for :mod:`bzip2 <bz2>` and :mod:`lzma` compression.

   .. versionchanged:: 3.4
      ZIP64 extensions are enabled by default.

   .. versionchanged:: 3.5
      Added support for writing to unseekable streams.
      Added support for the ``'x'`` mode.

   .. versionchanged:: 3.6
      Previously, a plain :exc:`RuntimeError` was raised for unrecognized
      compression values.


.. method:: ZipFile.close()

   Close the archive file. You must call :meth:`close` before exiting your program
   or essential records will not be written.


.. method:: ZipFile.getinfo(name)

   Return a :class:`ZipInfo` object with information about the archive member
   *name*. Calling :meth:`getinfo` for a name not currently contained in the
   archive will raise a :exc:`KeyError`.


.. method:: ZipFile.infolist()

   Return a list containing a :class:`ZipInfo` object for each member of the
   archive. The objects are in the same order as their entries in the actual ZIP
   file on disk if an existing archive was opened.


.. method:: ZipFile.namelist()

   Return a list of archive members by name.


.. method:: ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False)

   Access a member of the archive as a binary file-like object. *name*
   can be either the name of a file within the archive or a :class:`ZipInfo`
   object. The *mode* parameter, if included, must be ``'r'`` (the default)
   or ``'w'``. *pwd* is the password used to decrypt encrypted ZIP files.
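
   A hedged sketch tying the lookup methods above to :meth:`~ZipFile.open`;
   the in-memory buffer and the member names are illustrative assumptions:

   .. code-block:: python

      import io
      import zipfile

      # Build a small archive in memory so the example needs no files on disk.
      buffer = io.BytesIO()
      with zipfile.ZipFile(buffer, "w") as zf:
          zf.writestr("eggs.txt", b"spam and eggs")
          zf.writestr("ham.txt", b"ham")

      with zipfile.ZipFile(buffer) as zf:
          names = zf.namelist()          # member names, in archive order
          info = zf.getinfo("eggs.txt")  # ZipInfo for a single member
          size = info.file_size          # uncompressed size in bytes

          # open() accepts either a member name or a ZipInfo object.
          with zf.open(info) as member:
              data = member.read()

          # Looking up a name that is not in the archive raises KeyError.
          try:
              zf.getinfo("missing.txt")
              missing_raises = False
          except KeyError:
              missing_raises = True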

   The object returned by :meth:`~ZipFile.open` is also a context manager and
   therefore supports the :keyword:`with` statement::

      from zipfile import ZipFile

      with ZipFile('spam.zip') as myzip:
          with myzip.open('eggs.txt') as myfile:
              print(myfile.read())

   With *mode* ``'r'`` the file-like object
   (``ZipExtFile``) is read-only and provides the following methods:
   :meth:`~io.BufferedIOBase.read`, :meth:`~io.IOBase.readline`,
   :meth:`~io.IOBase.readlines`, :meth:`__iter__`,
   :meth:`~iterator.__next__`. These objects can operate independently of
   the ZipFile.

   With ``mode='w'``, a writable file handle is returned, which supports the
   :meth:`~io.BufferedIOBase.write` method. While a writable file handle is open,
   attempting to read or write other files in the ZIP file will raise a
   :exc:`ValueError`.

   When writing a file, if the file size is not known in advance but may exceed
   2 GiB, pass ``force_zip64=True`` to ensure that the header format is
   capable of supporting large files. If the file size is known in advance,
   construct a :class:`ZipInfo` object with :attr:`~ZipInfo.file_size` set, and
   use that as the *name* parameter.

   .. note::

      The :meth:`.open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object. You will appreciate this when trying to read a
      ZIP file that contains members with duplicate names.

   .. versionchanged:: 3.6
      Removed support of ``mode='U'``. Use :class:`io.TextIOWrapper` for reading
      compressed text files in :term:`universal newlines` mode.

   .. versionchanged:: 3.6
      :meth:`open` can now be used to write files into the archive with the
      ``mode='w'`` option.

   .. versionchanged:: 3.6
      Calling :meth:`.open` on a closed ZipFile will raise a :exc:`ValueError`.
      Previously, a :exc:`RuntimeError` was raised.


.. method:: ZipFile.extract(member, path=None, pwd=None)

   Extract a member from the archive to the current working directory; *member*
   must be its full name or a :class:`ZipInfo` object. Its file information is
   extracted as accurately as possible. *path* specifies a different directory
   to extract to. *pwd* is the password used for encrypted files.

   Returns the normalized path created (a directory or new file).

   .. note::

      If a member filename is an absolute path, a drive/UNC sharepoint and
      leading (back)slashes will be stripped, e.g.: ``///foo/bar`` becomes
      ``foo/bar`` on Unix, and ``C:\foo\bar`` becomes ``foo\bar`` on Windows.
      All ``".."`` components in a member filename will be removed, e.g.:
      ``../../foo../../ba..r`` becomes ``foo../ba..r``. On Windows, illegal
      characters (``:``, ``<``, ``>``, ``|``, ``"``, ``?``, and ``*``) are
      replaced by an underscore (``_``).

   .. versionchanged:: 3.6
      Calling :meth:`extract` on a closed ZipFile will raise a
      :exc:`ValueError`. Previously, a :exc:`RuntimeError` was raised.


.. method:: ZipFile.extractall(path=None, members=None, pwd=None)

   Extract all members from the archive to the current working directory. *path*
   specifies a different directory to extract to. *members* is optional and must
   be a subset of the list returned by :meth:`namelist`. *pwd* is the password
   used for encrypted files.

   .. warning::

      Never extract archives from untrusted sources without prior inspection.
      It is possible that files are created outside of *path*, e.g. members
      that have absolute filenames starting with ``"/"`` or filenames with two
      dots ``".."``. This module attempts to prevent that.
      See :meth:`extract` note.

   .. versionchanged:: 3.6
      Calling :meth:`extractall` on a closed ZipFile will raise a
      :exc:`ValueError`.
      Previously, a :exc:`RuntimeError` was raised.


.. method:: ZipFile.printdir()

   Print a table of contents for the archive to ``sys.stdout``.


.. method:: ZipFile.setpassword(pwd)

   Set *pwd* as default password to extract encrypted files.


.. method:: ZipFile.read(name, pwd=None)

   Return the bytes of the file *name* in the archive. *name* is the name of the
   file in the archive, or a :class:`ZipInfo` object. The archive must be open for
   read or append. *pwd* is the password used for encrypted files and, if specified,
   it will override the default password set with :meth:`setpassword`. Calling
   :meth:`read` on a ZipFile that uses a compression method other than
   :const:`ZIP_STORED`, :const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2` or
   :const:`ZIP_LZMA` will raise a :exc:`NotImplementedError`. An error will also
   be raised if the corresponding compression module is not available.

   .. versionchanged:: 3.6
      Calling :meth:`read` on a closed ZipFile will raise a :exc:`ValueError`.
      Previously, a :exc:`RuntimeError` was raised.


.. method:: ZipFile.testzip()

   Read all the files in the archive and check their CRCs and file headers.
   Return the name of the first bad file, or else return ``None``.

   .. versionchanged:: 3.6
      Calling :meth:`testzip` on a closed ZipFile will raise a
      :exc:`ValueError`. Previously, a :exc:`RuntimeError` was raised.


.. method:: ZipFile.write(filename, arcname=None, compress_type=None)

   Write the file named *filename* to the archive, giving it the archive name
   *arcname* (by default, this will be the same as *filename*, but without a drive
   letter and with leading path separators removed). If given, *compress_type*
   overrides the value given for the *compression* parameter to the constructor for
   the new entry.
   The archive must be open with mode ``'w'``, ``'x'`` or ``'a'``.

   .. note::

      There is no official file name encoding for ZIP files. If you have Unicode file
      names, you must convert them to byte strings in your desired encoding before
      passing them to :meth:`write`. WinZip interprets all file names as encoded in
      CP437, also known as DOS Latin.

   .. note::

      Archive names should be relative to the archive root, that is, they should not
      start with a path separator.

   .. note::

      If ``arcname`` (or ``filename``, if ``arcname`` is not given) contains a null
      byte, the name of the file in the archive will be truncated at the null byte.

   .. versionchanged:: 3.6
      Calling :meth:`write` on a ZipFile created with mode ``'r'`` or
      a closed ZipFile will raise a :exc:`ValueError`. Previously,
      a :exc:`RuntimeError` was raised.


.. method:: ZipFile.writestr(zinfo_or_arcname, data[, compress_type])

   Write the string *data* to the archive; *zinfo_or_arcname* is either the file
   name it will be given in the archive, or a :class:`ZipInfo` instance. If it's
   an instance, at least the filename, date, and time must be given. If it's a
   name, the date and time are set to the current date and time.
   The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.

   If given, *compress_type* overrides the value given for the *compression*
   parameter to the constructor for the new entry, or in the *zinfo_or_arcname*
   (if that is a :class:`ZipInfo` instance).

   .. note::

      When passing a :class:`ZipInfo` instance as the *zinfo_or_arcname* parameter,
      the compression method used will be that specified in the *compress_type*
      member of the given :class:`ZipInfo` instance. By default, the
      :class:`ZipInfo` constructor sets this member to :const:`ZIP_STORED`.

   .. versionchanged:: 3.2
      The *compress_type* argument.

   .. versionchanged:: 3.6
      Calling :meth:`writestr` on a ZipFile created with mode ``'r'`` or
      a closed ZipFile will raise a :exc:`ValueError`. Previously,
      a :exc:`RuntimeError` was raised.


The following data attributes are also available:


.. attribute:: ZipFile.debug

   The level of debug output to use. This may be set from ``0`` (the default, no
   output) to ``3`` (the most output). Debugging information is written to
   ``sys.stdout``.

.. attribute:: ZipFile.comment

   The comment text associated with the ZIP file. If assigning a comment to a
   :class:`ZipFile` instance created with mode ``'w'``, ``'x'`` or ``'a'``,
   this should be a
   string no longer than 65535 bytes. Comments longer than this will be
   truncated in the written archive when :meth:`close` is called.


.. _pyzipfile-objects:

PyZipFile Objects
-----------------

The :class:`PyZipFile` constructor takes the same parameters as the
:class:`ZipFile` constructor, and one additional parameter, *optimize*.

.. class:: PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, \
                     optimize=-1)

   .. versionadded:: 3.2
      The *optimize* parameter.

   .. versionchanged:: 3.4
      ZIP64 extensions are enabled by default.

   Instances have one method in addition to those of :class:`ZipFile` objects:

   .. method:: PyZipFile.writepy(pathname, basename='', filterfunc=None)

      Search for files :file:`\*.py` and add the corresponding file to the
      archive.

      If the *optimize* parameter to :class:`PyZipFile` was not given or ``-1``,
      the corresponding file is a :file:`\*.pyc` file, compiling if necessary.

      If the *optimize* parameter to :class:`PyZipFile` was ``0``, ``1`` or
      ``2``, only files with that optimization level (see :func:`compile`) are
      added to the archive, compiling if necessary.
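
      A minimal sketch of the default (``optimize=-1``) behaviour, assuming a
      throwaway module named ``hello.py`` in a temporary directory (both are
      illustrative and not part of the module):

      .. code-block:: python

         import os
         import tempfile
         import zipfile

         workdir = tempfile.mkdtemp()
         module_path = os.path.join(workdir, "hello.py")
         with open(module_path, "w") as source:
             source.write("def greet():\n    return 'hi'\n")

         archive = os.path.join(workdir, "lib.zip")
         # The archive must be writable, so it is opened with mode 'w'.
         with zipfile.PyZipFile(archive, "w") as zf:
             # A single .py file is compiled if necessary and the result is
             # stored at the top level of the archive.
             zf.writepy(module_path)
             names = zf.namelist()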

      If *pathname* is a file, the filename must end with :file:`.py`, and
      just the (corresponding :file:`\*.py[co]`) file is added at the top level
      (no path information). If *pathname* is a file that does not end with
      :file:`.py`, a :exc:`RuntimeError` will be raised. If it is a directory,
      and the directory is not a package directory, then all the files
      :file:`\*.py[co]` are added at the top level. If the directory is a
      package directory, then all :file:`\*.py[co]` are added under the package
      name as a file path, and if any subdirectories are package directories,
      all of these are added recursively.

      *basename* is intended for internal use only.

      *filterfunc*, if given, must be a function taking a single string
      argument. It will be passed each path (including each individual full
      file path) before it is added to the archive. If *filterfunc* returns a
      false value, the path will not be added, and if it is a directory its
      contents will be ignored. For example, if our test files are all either
      in ``test`` directories or start with the string ``test_``, we can use a
      *filterfunc* to exclude them::

         >>> import os
         >>> from zipfile import PyZipFile
         >>> zf = PyZipFile('myprog.zip', 'w')
         >>> def notests(s):
         ...     fn = os.path.basename(s)
         ...     return (not (fn == 'test' or fn.startswith('test_')))
         >>> zf.writepy('myprog', filterfunc=notests)
         >>> zf.close()

      The :meth:`writepy` method makes archives with file names like
      this::

         string.pyc                   # Top level name
         test/__init__.pyc            # Package directory
         test/testall.pyc             # Module test.testall
         test/bogus/__init__.pyc      # Subpackage directory
         test/bogus/myfile.pyc        # Submodule test.bogus.myfile

      .. versionadded:: 3.4
         The *filterfunc* parameter.


.. _zipinfo-objects:

ZipInfo Objects
---------------

Instances of the :class:`ZipInfo` class are returned by the :meth:`.getinfo` and
:meth:`.infolist` methods of :class:`ZipFile` objects.
Each object stores
information about a single member of the ZIP archive.

There is one classmethod to make a :class:`ZipInfo` instance for a filesystem
file:

.. classmethod:: ZipInfo.from_file(filename, arcname=None)

   Construct a :class:`ZipInfo` instance for a file on the filesystem, in
   preparation for adding it to a zip file.

   *filename* should be the path to a file or directory on the filesystem.

   If *arcname* is specified, it is used as the name within the archive.
   If *arcname* is not specified, the name will be the same as *filename*, but
   with any drive letter and leading path separators removed.

   .. versionadded:: 3.6

Instances have the following methods and attributes:

.. method:: ZipInfo.is_dir()

   Return ``True`` if this archive member is a directory.

   This uses the entry's name: directories should always end with ``/``.

   .. versionadded:: 3.6


.. attribute:: ZipInfo.filename

   Name of the file in the archive.


.. attribute:: ZipInfo.date_time

   The time and date of the last modification to the archive member. This is a
   tuple of six values:

   +-------+--------------------------+
   | Index | Value                    |
   +=======+==========================+
   | ``0`` | Year (>= 1980)           |
   +-------+--------------------------+
   | ``1`` | Month (one-based)        |
   +-------+--------------------------+
   | ``2`` | Day of month (one-based) |
   +-------+--------------------------+
   | ``3`` | Hours (zero-based)       |
   +-------+--------------------------+
   | ``4`` | Minutes (zero-based)     |
   +-------+--------------------------+
   | ``5`` | Seconds (zero-based)     |
   +-------+--------------------------+

   .. note::

      The ZIP file format does not support timestamps before 1980.


.. attribute:: ZipInfo.compress_type

   Type of compression for the archive member.


.. attribute:: ZipInfo.comment

   Comment for the individual archive member.


.. attribute:: ZipInfo.extra

   Expansion field data. The `PKZIP Application Note`_ contains
   some comments on the internal structure of the data contained in this string.


.. attribute:: ZipInfo.create_system

   System which created ZIP archive.


.. attribute:: ZipInfo.create_version

   PKZIP version which created ZIP archive.


.. attribute:: ZipInfo.extract_version

   PKZIP version needed to extract archive.


.. attribute:: ZipInfo.reserved

   Must be zero.


.. attribute:: ZipInfo.flag_bits

   ZIP flag bits.


.. attribute:: ZipInfo.volume

   Volume number of file header.


.. attribute:: ZipInfo.internal_attr

   Internal attributes.


.. attribute:: ZipInfo.external_attr

   External file attributes.


.. attribute:: ZipInfo.header_offset

   Byte offset to the file header.


.. attribute:: ZipInfo.CRC

   CRC-32 of the uncompressed file.


.. attribute:: ZipInfo.compress_size

   Size of the compressed data.


.. attribute:: ZipInfo.file_size

   Size of the uncompressed file.


.. _zipfile-commandline:
.. program:: zipfile

Command-Line Interface
----------------------

The :mod:`zipfile` module provides a simple command-line interface to interact
with ZIP archives.

If you want to create a new ZIP archive, specify its name after the :option:`-c`
option and then list the filename(s) that should be included:

.. code-block:: shell-session

    $ python -m zipfile -c monty.zip spam.txt eggs.txt

Passing a directory is also acceptable:

.. code-block:: shell-session

    $ python -m zipfile -c monty.zip life-of-brian_1979/

If you want to extract a ZIP archive into the specified directory, use
the :option:`-e` option:

.. code-block:: shell-session

    $ python -m zipfile -e monty.zip target-dir/

For a list of the files in a ZIP archive, use the :option:`-l` option:

.. code-block:: shell-session

    $ python -m zipfile -l monty.zip


Command-line options
~~~~~~~~~~~~~~~~~~~~

.. cmdoption:: -l <zipfile>

   List files in a zipfile.

.. cmdoption:: -c <zipfile> <source1> ... <sourceN>

   Create zipfile from source files.

.. cmdoption:: -e <zipfile> <output_dir>

   Extract zipfile into target directory.

.. cmdoption:: -t <zipfile>

   Test whether the zipfile is valid or not.


.. _PKZIP Application Note: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
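
The command-line operations above correspond roughly to the following API
calls. This is a sketch under that assumption rather than the module's actual
command-line implementation, and the file and directory names are illustrative:

.. code-block:: python

   import os
   import tempfile
   import zipfile

   workdir = tempfile.mkdtemp()
   source = os.path.join(workdir, "spam.txt")
   with open(source, "w") as f:
       f.write("spam")
   archive = os.path.join(workdir, "monty.zip")
   target = os.path.join(workdir, "target-dir")

   # Roughly "-c": create an archive from source files.
   with zipfile.ZipFile(archive, "w") as zf:
       zf.write(source, arcname="spam.txt")

   with zipfile.ZipFile(archive) as zf:
       zf.printdir()             # roughly "-l": list the contents
       first_bad = zf.testzip()  # roughly "-t": None means no bad member
       zf.extractall(target)     # roughly "-e": extract into a directory

   extracted = os.path.exists(os.path.join(target, "spam.txt"))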