      1 :mod:`pickle` --- Python object serialization
      2 =============================================
      3 
      4 .. module:: pickle
      5    :synopsis: Convert Python objects to streams of bytes and back.
      6 
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@python.org>
      9 
     10 **Source code:** :source:`Lib/pickle.py`
     11 
     12 .. index::
     13    single: persistence
     14    pair: persistent; objects
     15    pair: serializing; objects
     16    pair: marshalling; objects
     17    pair: flattening; objects
     18    pair: pickling; objects
     19 
     20 --------------
     21 
     22 The :mod:`pickle` module implements binary protocols for serializing and
     23 de-serializing a Python object structure.  *"Pickling"* is the process
     24 whereby a Python object hierarchy is converted into a byte stream, and
     25 *"unpickling"* is the inverse operation, whereby a byte stream
     26 (from a :term:`binary file` or :term:`bytes-like object`) is converted
     27 back into an object hierarchy.  Pickling (and unpickling) is alternatively
     28 known as "serialization", "marshalling," [#]_ or "flattening"; however, to
     29 avoid confusion, the terms used here are "pickling" and "unpickling".
     30 
     31 .. warning::
     32 
     33    The :mod:`pickle` module is not secure against erroneous or maliciously
     34    constructed data.  Never unpickle data received from an untrusted or
     35    unauthenticated source.
     36 
     37 
     38 Relationship to other Python modules
     39 ------------------------------------
     40 
     41 Comparison with ``marshal``
     42 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     43 
Python has a more primitive serialization module called :mod:`marshal`, but
:mod:`pickle` should generally be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.
     48 
     49 The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:
     50 
     51 * The :mod:`pickle` module keeps track of the objects it has already serialized,
     52   so that later references to the same object won't be serialized again.
     53   :mod:`marshal` doesn't do this.
     54 
     55   This has implications both for recursive objects and object sharing.  Recursive
     56   objects are objects that contain references to themselves.  These are not
     57   handled by marshal, and in fact, attempting to marshal recursive objects will
     58   crash your Python interpreter.  Object sharing happens when there are multiple
     59   references to the same object in different places in the object hierarchy being
     60   serialized.  :mod:`pickle` stores such objects only once, and ensures that all
     61   other references point to the master copy.  Shared objects remain shared, which
     62   can be very important for mutable objects.
     63 
* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module
  as when the object was stored.
     68 
* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases, provided that a compatible pickle protocol is chosen
  and that the pickling and unpickling code handles the Python 2 to Python 3
  type differences if the data crosses that language boundary.
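
The memo behaviour described in the first bullet can be observed with a small
round trip.  This is only a sketch; the variable names are illustrative and not
part of the module.

```python
import pickle

shared = [1, 2, 3]
container = {"a": shared, "b": shared}   # two references to one list

restored = pickle.loads(pickle.dumps(container))
# After the round trip both keys still refer to a single list object.
same_object = restored["a"] is restored["b"]

recursive = []
recursive.append(recursive)              # a list that contains itself
restored_recursive = pickle.loads(pickle.dumps(recursive))
# The self-reference survives instead of recursing forever.
self_reference_kept = restored_recursive[0] is restored_recursive
```

Because the list is stored once, a mutation made through ``restored["a"]`` is
also visible through ``restored["b"]``.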
     77 
     78 Comparison with ``json``
     79 ^^^^^^^^^^^^^^^^^^^^^^^^
     80 
     81 There are fundamental differences between the pickle protocols and
     82 `JSON (JavaScript Object Notation) <http://json.org>`_:
     83 
     84 * JSON is a text serialization format (it outputs unicode text, although
     85   most of the time it is then encoded to ``utf-8``), while pickle is
     86   a binary serialization format;
     87 
     88 * JSON is human-readable, while pickle is not;
     89 
     90 * JSON is interoperable and widely used outside of the Python ecosystem,
     91   while pickle is Python-specific;
     92 
     93 * JSON, by default, can only represent a subset of the Python built-in
     94   types, and no custom classes; pickle can represent an extremely large
     95   number of Python types (many of them automatically, by clever usage
     96   of Python's introspection facilities; complex cases can be tackled by
     97   implementing :ref:`specific object APIs <pickle-inst>`).
     98 
     99 .. seealso::
    100    The :mod:`json` module: a standard library module allowing JSON
    101    serialization and deserialization.
    102 
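As a rough illustration of the last point, the sketch below feeds the same
dictionary to both modules.  The sample data is made up for the example.

```python
import json
import pickle

data = {"tags": {"a", "b"}, "blob": b"\x00\x01"}   # contains a set and bytes

try:
    json.dumps(data)
    json_ok = True
except TypeError:
    # The default JSON encoder rejects sets and bytes objects.
    json_ok = False

pickled = pickle.dumps(data)            # a bytes object (binary format)
restored = pickle.loads(pickled)        # the set and bytes come back intact
as_json = json.dumps({"answer": 42})    # a str object (text format)
```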
    103 
    104 .. _pickle-protocols:
    105 
    106 Data stream format
    107 ------------------
    108 
    109 .. index::
    110    single: External Data Representation
    111 
The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
JSON or XDR (which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python objects.
    116 
    117 By default, the :mod:`pickle` data format uses a relatively compact binary
    118 representation.  If you need optimal size characteristics, you can efficiently
    119 :doc:`compress <archiving>` pickled data.
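
One possible way to compress pickled data is shown below, assuming
:mod:`zlib` is an acceptable codec for the application; any of the other
compression modules could be substituted.

```python
import pickle
import zlib

payload = list(range(1000)) * 10          # repetitive sample data

raw = pickle.dumps(payload)
packed = zlib.compress(raw)               # compress the pickle bytes

# Reverse the two steps in the opposite order to get the object back.
restored = pickle.loads(zlib.decompress(packed))
```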
    120 
    121 The module :mod:`pickletools` contains tools for analyzing data streams
    122 generated by :mod:`pickle`.  :mod:`pickletools` source code has extensive
    123 comments about opcodes used by pickle protocols.
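
A short sketch of inspecting a pickle with :mod:`pickletools` follows.  The
disassembly is written to an in-memory text buffer here only so that it can be
examined programmatically; printing to standard output works as well.

```python
import io
import pickle
import pickletools

data = pickle.dumps({"answer": 42}, protocol=2)

listing_buffer = io.StringIO()
pickletools.dis(data, out=listing_buffer)   # symbolic opcode listing
listing = listing_buffer.getvalue()

# optimize() drops PUT opcodes whose memo entries are never read back.
optimized = pickletools.optimize(data)
```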
    124 
    125 There are currently 5 different protocols which can be used for pickling.
    126 The higher the protocol used, the more recent the version of Python needed
    127 to read the pickle produced.
    128 
    129 * Protocol version 0 is the original "human-readable" protocol and is
    130   backwards compatible with earlier versions of Python.
    131 
    132 * Protocol version 1 is an old binary format which is also compatible with
    133   earlier versions of Python.
    134 
    135 * Protocol version 2 was introduced in Python 2.3.  It provides much more
    136   efficient pickling of :term:`new-style class`\es.  Refer to :pep:`307` for
    137   information about improvements brought by protocol 2.
    138 
    139 * Protocol version 3 was added in Python 3.0.  It has explicit support for
    140   :class:`bytes` objects and cannot be unpickled by Python 2.x.  This is
    141   the default protocol, and the recommended protocol when compatibility with
    142   other Python 3 versions is required.
    143 
    144 * Protocol version 4 was added in Python 3.4.  It adds support for very large
    145   objects, pickling more kinds of objects, and some data format
    146   optimizations.  Refer to :pep:`3154` for information about improvements
    147   brought by protocol 4.
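
The protocol is selected per call.  The sketch below pickles one object with
every protocol the running interpreter supports; the sample object is
arbitrary.  Pickles written with protocol 2 or newer begin with a ``PROTO``
opcode (``0x80``) followed by the protocol number, which is how the version is
detected when loading.

```python
import pickle

obj = {"name": "spam", "values": [1, 2, 3]}

sizes = {}
round_trips = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(obj, protocol=protocol)
    sizes[protocol] = len(data)
    # No protocol argument is needed when loading.
    round_trips[protocol] = pickle.loads(data) == obj

header = pickle.dumps(obj, protocol=2)[:2]   # PROTO opcode + version byte
```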
    148 
    149 .. note::
    150    Serialization is a more primitive notion than persistence; although
    151    :mod:`pickle` reads and writes file objects, it does not handle the issue of
    152    naming persistent objects, nor the (even more complicated) issue of concurrent
    153    access to persistent objects.  The :mod:`pickle` module can transform a complex
    154    object into a byte stream and it can transform the byte stream into an object
    155    with the same internal structure.  Perhaps the most obvious thing to do with
    156    these byte streams is to write them onto a file, but it is also conceivable to
    157    send them across a network or store them in a database.  The :mod:`shelve`
    158    module provides a simple interface to pickle and unpickle objects on
    159    DBM-style database files.
    160 
    161 
    162 Module Interface
    163 ----------------
    164 
    165 To serialize an object hierarchy, you simply call the :func:`dumps` function.
    166 Similarly, to de-serialize a data stream, you call the :func:`loads` function.
    167 However, if you want more control over serialization and de-serialization,
    168 you can create a :class:`Pickler` or an :class:`Unpickler` object, respectively.
    169 
    170 The :mod:`pickle` module provides the following constants:
    171 
    172 
    173 .. data:: HIGHEST_PROTOCOL
    174 
    175    An integer, the highest :ref:`protocol version <pickle-protocols>`
    176    available.  This value can be passed as a *protocol* value to functions
    177    :func:`dump` and :func:`dumps` as well as the :class:`Pickler`
    178    constructor.
    179 
    180 .. data:: DEFAULT_PROTOCOL
    181 
    182    An integer, the default :ref:`protocol version <pickle-protocols>` used
    183    for pickling.  May be less than :data:`HIGHEST_PROTOCOL`.  Currently the
    184    default protocol is 3, a new protocol designed for Python 3.
    185 
    186 
    187 The :mod:`pickle` module provides the following functions to make the pickling
    188 process more convenient:
    189 
    190 .. function:: dump(obj, file, protocol=None, \*, fix_imports=True)
    191 
    192    Write a pickled representation of *obj* to the open :term:`file object` *file*.
    193    This is equivalent to ``Pickler(file, protocol).dump(obj)``.
    194 
    195    The optional *protocol* argument, an integer, tells the pickler to use
    196    the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
    197    If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
    198    number is specified, :data:`HIGHEST_PROTOCOL` is selected.
    199 
    200    The *file* argument must have a write() method that accepts a single bytes
    201    argument.  It can thus be an on-disk file opened for binary writing, an
    202    :class:`io.BytesIO` instance, or any other custom object that meets this
    203    interface.
    204 
    205    If *fix_imports* is true and *protocol* is less than 3, pickle will try to
    206    map the new Python 3 names to the old module names used in Python 2, so
    207    that the pickle data stream is readable with Python 2.
    208 
    209 .. function:: dumps(obj, protocol=None, \*, fix_imports=True)
    210 
    211    Return the pickled representation of the object as a :class:`bytes` object,
    212    instead of writing it to a file.
    213 
    214    Arguments *protocol* and *fix_imports* have the same meaning as in
    215    :func:`dump`.
    216 
    217 .. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
    218 
    219    Read a pickled object representation from the open :term:`file object`
    220    *file* and return the reconstituted object hierarchy specified therein.
    221    This is equivalent to ``Unpickler(file).load()``.
    222 
    223    The protocol version of the pickle is detected automatically, so no
    224    protocol argument is needed.  Bytes past the pickled object's
    225    representation are ignored.
    226 
    227    The argument *file* must have two methods, a read() method that takes an
    228    integer argument, and a readline() method that requires no arguments.  Both
    229    methods should return bytes.  Thus *file* can be an on-disk file opened for
    230    binary reading, an :class:`io.BytesIO` object, or any other custom object
    231    that meets this interface.
    232 
    233    Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
    234    which are used to control compatibility support for pickle stream generated
    235    by Python 2.  If *fix_imports* is true, pickle will try to map the old
    236    Python 2 names to the new names used in Python 3.  The *encoding* and
    237    *errors* tell pickle how to decode 8-bit string instances pickled by Python
    238    2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
    239    be 'bytes' to read these 8-bit string instances as bytes objects.
    240    Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
    241    instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
    242    :class:`~datetime.time` pickled by Python 2.
    243 
    244 .. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
    245 
    246    Read a pickled object hierarchy from a :class:`bytes` object and return the
    247    reconstituted object hierarchy specified therein.
    248 
    249    The protocol version of the pickle is detected automatically, so no
    250    protocol argument is needed.  Bytes past the pickled object's
    251    representation are ignored.
    252 
    253    Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
    254    which are used to control compatibility support for pickle stream generated
    255    by Python 2.  If *fix_imports* is true, pickle will try to map the old
    256    Python 2 names to the new names used in Python 3.  The *encoding* and
    257    *errors* tell pickle how to decode 8-bit string instances pickled by Python
    258    2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
    259    be 'bytes' to read these 8-bit string instances as bytes objects.
    260    Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
    261    instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
    262    :class:`~datetime.time` pickled by Python 2.
    263 
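The four functions above can be exercised as follows.  This is a minimal
sketch: the in-memory buffer stands in for a real binary file, and the
hand-written ``py2_str`` bytes are an assumed example of what Python 2 would
emit for an 8-bit string at protocol 0.

```python
import io
import pickle

record = {"id": 7, "tags": ("x", "y")}

# dump()/load() work on any binary file-like object.
buffer = io.BytesIO()
pickle.dump(record, buffer, protocol=pickle.HIGHEST_PROTOCOL)
buffer.seek(0)
from_stream = pickle.load(buffer)

# dumps()/loads() work on bytes objects directly.
data = pickle.dumps(record)
from_bytes = pickle.loads(data)

# Bytes past the end of the pickled object are ignored.
with_trailer = pickle.loads(data + b"trailing bytes")

# Decoding an 8-bit string pickled by Python 2 (protocol 0).
py2_str = b"S'caf\\xe9'\n."
as_text = pickle.loads(py2_str, encoding="latin1")
as_bytes = pickle.loads(py2_str, encoding="bytes")
```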
    264 
    265 The :mod:`pickle` module defines three exceptions:
    266 
    267 .. exception:: PickleError
    268 
    269    Common base class for the other pickling exceptions.  It inherits
    270    :exc:`Exception`.
    271 
    272 .. exception:: PicklingError
    273 
    274    Error raised when an unpicklable object is encountered by :class:`Pickler`.
    275    It inherits :exc:`PickleError`.
    276 
    277    Refer to :ref:`pickle-picklable` to learn what kinds of objects can be
    278    pickled.
    279 
    280 .. exception:: UnpicklingError
    281 
    282    Error raised when there is a problem unpickling an object, such as a data
    283    corruption or a security violation.  It inherits :exc:`PickleError`.
    284 
   Note that other exceptions may also be raised during unpickling, including
   (but not necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
    288 
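Because unpickling can fail with several exception types, callers that read
possibly corrupted data usually catch more than :exc:`UnpicklingError`.  The
helper below is hypothetical (``loads_or_default`` is not part of the module)
and only illustrates the exception handling; it does not make untrusted data
safe to load.

```python
import pickle

def loads_or_default(data, default=None):
    """Hypothetical helper: return *default* if *data* cannot be unpickled."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError):
        # The exception types listed in the UnpicklingError description.
        return default

empty = loads_or_default(b"")                          # no data at all
garbage = loads_or_default(b"not a pickle", "bad")     # invalid opcode
valid = loads_or_default(pickle.dumps(5))
```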
    289 
    290 The :mod:`pickle` module exports two classes, :class:`Pickler` and
    291 :class:`Unpickler`:
    292 
    293 .. class:: Pickler(file, protocol=None, \*, fix_imports=True)
    294 
    295    This takes a binary file for writing a pickle data stream.
    296 
    297    The optional *protocol* argument, an integer, tells the pickler to use
    298    the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
    299    If not specified, the default is :data:`DEFAULT_PROTOCOL`.  If a negative
    300    number is specified, :data:`HIGHEST_PROTOCOL` is selected.
    301 
    302    The *file* argument must have a write() method that accepts a single bytes
    303    argument.  It can thus be an on-disk file opened for binary writing, an
    304    :class:`io.BytesIO` instance, or any other custom object that meets this
    305    interface.
    306 
    307    If *fix_imports* is true and *protocol* is less than 3, pickle will try to
    308    map the new Python 3 names to the old module names used in Python 2, so
    309    that the pickle data stream is readable with Python 2.
    310 
    311    .. method:: dump(obj)
    312 
    313       Write a pickled representation of *obj* to the open file object given in
    314       the constructor.
    315 
    316    .. method:: persistent_id(obj)
    317 
    318       Do nothing by default.  This exists so a subclass can override it.
    319 
    320       If :meth:`persistent_id` returns ``None``, *obj* is pickled as usual.  Any
    321       other value causes :class:`Pickler` to emit the returned value as a
    322       persistent ID for *obj*.  The meaning of this persistent ID should be
    323       defined by :meth:`Unpickler.persistent_load`.  Note that the value
    324       returned by :meth:`persistent_id` cannot itself have a persistent ID.
    325 
    326       See :ref:`pickle-persistent` for details and examples of uses.
    327 
    328    .. attribute:: dispatch_table
    329 
    330       A pickler object's dispatch table is a registry of *reduction
    331       functions* of the kind which can be declared using
    332       :func:`copyreg.pickle`.  It is a mapping whose keys are classes
    333       and whose values are reduction functions.  A reduction function
    334       takes a single argument of the associated class and should
    335       conform to the same interface as a :meth:`__reduce__`
    336       method.
    337 
    338       By default, a pickler object will not have a
    339       :attr:`dispatch_table` attribute, and it will instead use the
    340       global dispatch table managed by the :mod:`copyreg` module.
    341       However, to customize the pickling for a specific pickler object
    342       one can set the :attr:`dispatch_table` attribute to a dict-like
    343       object.  Alternatively, if a subclass of :class:`Pickler` has a
    344       :attr:`dispatch_table` attribute then this will be used as the
    345       default dispatch table for instances of that class.
    346 
    347       See :ref:`pickle-dispatch` for usage examples.
    348 
    349       .. versionadded:: 3.3
    350 
    351    .. attribute:: fast
    352 
      Deprecated.  Enable fast mode if set to a true value.  Fast mode
      disables the use of the memo, thereby speeding up pickling by not
      generating superfluous PUT opcodes.  It should not be used with
      self-referential objects; doing so will cause :class:`Pickler` to
      recurse infinitely.
    358 
    359       Use :func:`pickletools.optimize` if you need more compact pickles.
    360 
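A per-pickler dispatch table might be set up as sketched below.  The
``Celsius`` class and ``reduce_celsius`` function are invented for the example;
the table is seeded from :data:`copyreg.dispatch_table` so that globally
registered reducers remain available.

```python
import copyreg
import io
import pickle

class Celsius:
    """Hypothetical class used only for this example."""
    def __init__(self, degrees):
        self.degrees = degrees

def reduce_celsius(obj):
    # Same interface as __reduce__: (callable, args).
    return Celsius, (obj.degrees,)

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Customize this pickler only; the global table is left untouched.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[Celsius] = reduce_celsius
pickler.dump(Celsius(21.5))

restored = pickle.loads(buffer.getvalue())
```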
    361 
    362 .. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
    363 
    364    This takes a binary file for reading a pickle data stream.
    365 
    366    The protocol version of the pickle is detected automatically, so no
    367    protocol argument is needed.
    368 
    369    The argument *file* must have two methods, a read() method that takes an
    370    integer argument, and a readline() method that requires no arguments.  Both
    371    methods should return bytes.  Thus *file* can be an on-disk file object
    372    opened for binary reading, an :class:`io.BytesIO` object, or any other
    373    custom object that meets this interface.
    374 
    375    Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
    376    which are used to control compatibility support for pickle stream generated
    377    by Python 2.  If *fix_imports* is true, pickle will try to map the old
    378    Python 2 names to the new names used in Python 3.  The *encoding* and
    379    *errors* tell pickle how to decode 8-bit string instances pickled by Python
    380    2; these default to 'ASCII' and 'strict', respectively.  The *encoding* can
    381    be 'bytes' to read these 8-bit string instances as bytes objects.
    382 
    383    .. method:: load()
    384 
    385       Read a pickled object representation from the open file object given in
    386       the constructor, and return the reconstituted object hierarchy specified
    387       therein.  Bytes past the pickled object's representation are ignored.
    388 
    389    .. method:: persistent_load(pid)
    390 
    391       Raise an :exc:`UnpicklingError` by default.
    392 
    393       If defined, :meth:`persistent_load` should return the object specified by
    394       the persistent ID *pid*.  If an invalid persistent ID is encountered, an
    395       :exc:`UnpicklingError` should be raised.
    396 
    397       See :ref:`pickle-persistent` for details and examples of uses.
    398 
    399    .. method:: find_class(module, name)
    400 
      Import *module* if necessary and return the object called *name* from it,
      where the *module* and *name* arguments are :class:`str` objects.  Note
      that, despite its name, :meth:`find_class` is also used for finding
      functions.
    405 
    406       Subclasses may override this to gain control over what type of objects and
    407       how they can be loaded, potentially reducing security risks. Refer to
    408       :ref:`pickle-restrict` for details.
    409 
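Overriding :meth:`Unpickler.find_class` could look like the sketch below.  The
allow-list and the names ``RestrictedUnpickler`` and ``restricted_loads`` are
assumptions made for the example; a real application would choose its own
list.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a few harmless names from the builtins module.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Forbid every other global, including functions such as os.system.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Analogous to pickle.loads(), using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

try:
    # Hand-written pickle that would call os.system() if loaded normally.
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```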
    410 
    411 .. _pickle-picklable:
    412 
    413 What can be pickled and unpickled?
    414 ----------------------------------
    415 
    416 The following types can be pickled:
    417 
    418 * ``None``, ``True``, and ``False``
    419 
    420 * integers, floating point numbers, complex numbers
    421 
    422 * strings, bytes, bytearrays
    423 
    424 * tuples, lists, sets, and dictionaries containing only picklable objects
    425 
    426 * functions defined at the top level of a module (using :keyword:`def`, not
    427   :keyword:`lambda`)
    428 
    429 * built-in functions defined at the top level of a module
    430 
    431 * classes that are defined at the top level of a module
    432 
    433 * instances of such classes whose :attr:`~object.__dict__` or the result of
    434   calling :meth:`__getstate__` is picklable  (see section :ref:`pickle-inst` for
    435   details).
    436 
Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth, in which case a
:exc:`RecursionError` will be raised.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.
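
A quick way to probe whether an object can be pickled is sketched below.  The
helper ``can_pickle`` is hypothetical.  It also catches :exc:`AttributeError`
and :exc:`TypeError`, on the assumption that some interpreter versions report
certain unpicklable objects with those exceptions rather than with
:exc:`PicklingError`.

```python
import pickle

def can_pickle(obj):
    """Hypothetical helper: report whether pickle.dumps() accepts *obj*."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

results = {
    "tuple of numbers": can_pickle((1, 2.0, 3j)),
    "built-in function": can_pickle(len),
    "lambda": can_pickle(lambda x: x),                # not defined with def
    "generator": can_pickle(x for x in range(3)),     # not a supported type
}
```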
    443 
    444 Note that functions (built-in and user-defined) are pickled by "fully qualified"
    445 name reference, not by value. [#]_  This means that only the function name is
    446 pickled, along with the name of the module the function is defined in.  Neither
    447 the function's code, nor any of its function attributes are pickled.  Thus the
    448 defining module must be importable in the unpickling environment, and the module
    449 must contain the named object, otherwise an exception will be raised. [#]_
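
Pickling by reference can be seen by pickling a built-in function.  In this
sketch protocol 3 is chosen explicitly so that the module name appears as
``builtins`` in the byte stream.

```python
import pickle

data = pickle.dumps(len, protocol=3)

# The stream holds the module and function names, not any code.
names_present = b"builtins" in data and b"len" in data

# Unpickling looks the name up again and returns the very same function.
restored = pickle.loads(data)
```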
    450 
    451 Similarly, classes are pickled by named reference, so the same restrictions in
    452 the unpickling environment apply.  Note that none of the class's code or data is
    453 pickled, so in the following example the class attribute ``attr`` is not
    454 restored in the unpickling environment::
    455 
   import pickle

   class Foo:
       attr = 'A class attribute'

   picklestring = pickle.dumps(Foo)
    460 
    461 These restrictions are why picklable functions and classes must be defined in
    462 the top level of a module.
    463 
    464 Similarly, when class instances are pickled, their class's code and data are not
    465 pickled along with them.  Only the instance data are pickled.  This is done on
    466 purpose, so you can fix bugs in a class or add methods to the class and still
    467 load objects that were created with an earlier version of the class.  If you
    468 plan to have long-lived objects that will see many versions of a class, it may
    469 be worthwhile to put a version number in the objects so that suitable
    470 conversions can be made by the class's :meth:`__setstate__` method.
    471 
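One way to version instance state is sketched below.  The ``Account`` class,
the ``_version`` key and the assumed older layout (a float ``balance`` field)
are all invented for illustration.

```python
import pickle

class Account:
    """Hypothetical class whose stored layout changed between versions."""
    VERSION = 2

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_version"] = self.VERSION      # stamp the layout version
        return state

    def __setstate__(self, state):
        version = state.pop("_version", 1)    # unstamped state is version 1
        if version < 2:
            # Assumed old layout: a float "balance" in currency units.
            state["balance_cents"] = round(state.pop("balance") * 100)
        self.__dict__.update(state)

round_trip = pickle.loads(pickle.dumps(Account("bob", 300)))

# Simulate state saved by the assumed version 1 of the class.
migrated = Account.__new__(Account)
migrated.__setstate__({"owner": "ann", "balance": 12.5})
```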
    472 
    473 .. _pickle-inst:
    474 
    475 Pickling Class Instances
    476 ------------------------
    477 
    478 .. currentmodule:: None
    479 
    480 In this section, we describe the general mechanisms available to you to define,
    481 customize, and control how class instances are pickled and unpickled.
    482 
    483 In most cases, no additional code is needed to make instances picklable.  By
    484 default, pickle will retrieve the class and the attributes of an instance via
    485 introspection. When a class instance is unpickled, its :meth:`__init__` method
    486 is usually *not* invoked.  The default behaviour first creates an uninitialized
    487 instance and then restores the saved attributes.  The following code shows an
    488 implementation of this behaviour::
    489 
    490    def save(obj):
    491        return (obj.__class__, obj.__dict__)
    492 
    493    def load(cls, attributes):
    494        obj = cls.__new__(cls)
    495        obj.__dict__.update(attributes)
    496        return obj
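
That :meth:`__init__` is bypassed can be checked with a counter, as in this
small sketch (the ``Tracker`` class is made up for the purpose):

```python
import pickle

class Tracker:
    init_calls = 0                      # counts __init__ invocations

    def __init__(self, name):
        Tracker.init_calls += 1
        self.name = name

original = Tracker("first")
clone = pickle.loads(pickle.dumps(original))
# Tracker.init_calls is still 1: unpickling restored the attributes
# without calling __init__ again.
```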
    497 
    498 Classes can alter the default behaviour by providing one or several special
    499 methods:
    500 
    501 .. method:: object.__getnewargs_ex__()
    502 
   In protocols 2 and newer, classes that implement the
   :meth:`__getnewargs_ex__` method can dictate the values passed to the
   :meth:`__new__` method upon unpickling.  The method must return a pair
   ``(args, kwargs)`` where *args* is a tuple of positional arguments
   and *kwargs* a dictionary of named arguments for constructing the
   object.
    510 
    511    You should implement this method if the :meth:`__new__` method of your
    512    class requires keyword-only arguments.  Otherwise, it is recommended for
    513    compatibility to implement :meth:`__getnewargs__`.
    514 
    515    .. versionchanged:: 3.6
    516       :meth:`__getnewargs_ex__` is now used in protocols 2 and 3.
    517 
    518 
    519 .. method:: object.__getnewargs__()
    520 
    521    This method serves a similar purpose as :meth:`__getnewargs_ex__`, but
    522    supports only positional arguments.  It must return a tuple of arguments
    523    ``args`` which will be passed to the :meth:`__new__` method upon unpickling.
    524 
    525    :meth:`__getnewargs__` will not be called if :meth:`__getnewargs_ex__` is
    526    defined.
    527 
    528    .. versionchanged:: 3.6
    529       Before Python 3.6, :meth:`__getnewargs__` was called instead of
    530       :meth:`__getnewargs_ex__` in protocols 2 and 3.
    531 
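A typical use of :meth:`__getnewargs__` is an immutable subclass whose
:meth:`__new__` takes extra arguments.  The ``Temperature`` class below is a
hypothetical example, and protocol 2 is passed explicitly because the method
is only consulted in protocols 2 and newer.

```python
import pickle

class Temperature(float):
    """Hypothetical float subclass whose __new__ needs two arguments."""

    def __new__(cls, value, unit):
        self = super().__new__(cls, value)
        self.unit = unit
        return self

    def __getnewargs__(self):
        # These values are passed to __new__ when the object is unpickled.
        return (float(self), self.unit)

reading = Temperature(36.6, "C")
restored = pickle.loads(pickle.dumps(reading, protocol=2))
```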
    532 
    533 .. method:: object.__getstate__()
    534 
    535    Classes can further influence how their instances are pickled; if the class
    536    defines the method :meth:`__getstate__`, it is called and the returned object
    537    is pickled as the contents for the instance, instead of the contents of the
    538    instance's dictionary.  If the :meth:`__getstate__` method is absent, the
    539    instance's :attr:`~object.__dict__` is pickled as usual.
    540 
    541 
    542 .. method:: object.__setstate__(state)
    543 
    544    Upon unpickling, if the class defines :meth:`__setstate__`, it is called with
    545    the unpickled state.  In that case, there is no requirement for the state
    546    object to be a dictionary.  Otherwise, the pickled state must be a dictionary
    547    and its items are assigned to the new instance's dictionary.
    548 
    549    .. note::
    550 
    551       If :meth:`__getstate__` returns a false value, the :meth:`__setstate__`
    552       method will not be called upon unpickling.
    553 
    554 
    555 Refer to the section :ref:`pickle-state` for more information about how to use
    556 the methods :meth:`__getstate__` and :meth:`__setstate__`.
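
As a brief sketch of the two methods working together, the hypothetical
``Counter`` class below leaves an unpicklable lock out of the saved state and
creates a fresh one when the instance is restored.

```python
import pickle
import threading

class Counter:
    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()     # lock objects cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()      # copy so the instance is untouched
        del state["_lock"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()     # recreate what was left out

counter = Counter()
counter.count = 5
restored = pickle.loads(pickle.dumps(counter))
```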
    557 
    558 .. note::
    559 
    560    At unpickling time, some methods like :meth:`__getattr__`,
    561    :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
    562    instance.  In case those methods rely on some internal invariant being
    563    true, the type should implement :meth:`__getnewargs__` or
    564    :meth:`__getnewargs_ex__` to establish such an invariant; otherwise,
    565    neither :meth:`__new__` nor :meth:`__init__` will be called.
    566 
    567 .. index:: pair: copy; protocol
    568 
As we shall see, pickle does not use the methods described above directly.  In
fact, these methods are part of the copy protocol, which implements the
:meth:`__reduce__` special method.  The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [#]_
    574 
    575 Although powerful, implementing :meth:`__reduce__` directly in your classes is
    576 error prone.  For this reason, class designers should use the high-level
    577 interface (i.e., :meth:`__getnewargs_ex__`, :meth:`__getstate__` and
    578 :meth:`__setstate__`) whenever possible.  We will show, however, cases where
    579 using :meth:`__reduce__` is the only option or leads to more efficient pickling
    580 or both.
    581 
    582 .. method:: object.__reduce__()
    583 
    584    The interface is currently defined as follows.  The :meth:`__reduce__` method
    585    takes no argument and shall return either a string or preferably a tuple (the
    586    returned object is often referred to as the "reduce value").
    587 
    588    If a string is returned, the string should be interpreted as the name of a
    589    global variable.  It should be the object's local name relative to its
    590    module; the pickle module searches the module namespace to determine the
    591    object's module.  This behaviour is typically useful for singletons.
    592 
    593    When a tuple is returned, it must be between two and five items long.
    594    Optional items can either be omitted, or ``None`` can be provided as their
    595    value.  The semantics of each item are in order:
    596 
    597    .. XXX Mention __newobj__ special-case?
    598 
    599    * A callable object that will be called to create the initial version of the
    600      object.
    601 
    602    * A tuple of arguments for the callable object.  An empty tuple must be given
    603      if the callable does not accept any argument.
    604 
   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as previously described.  If the object has no
     such method, then the value must be a dictionary and it will be added to
     the object's :attr:`~object.__dict__` attribute.
    609 
    610    * Optionally, an iterator (and not a sequence) yielding successive items.
    611      These items will be appended to the object either using
    612      ``obj.append(item)`` or, in batch, using ``obj.extend(list_of_items)``.
    613      This is primarily used for list subclasses, but may be used by other
    614      classes as long as they have :meth:`append` and :meth:`extend` methods with
    615      the appropriate signature.  (Whether :meth:`append` or :meth:`extend` is
    616      used depends on which pickle protocol version is used as well as the number
    617      of items to append, so both must be supported.)
    618 
    619    * Optionally, an iterator (not a sequence) yielding successive key-value
    620      pairs.  These items will be stored to the object using ``obj[key] =
    621      value``.  This is primarily used for dictionary subclasses, but may be used
    622      by other classes as long as they implement :meth:`__setitem__`.
    623 
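As an illustration of the tuple form, here is a minimal sketch of a list
subclass whose :meth:`__reduce__` returns four items: the callable, its
arguments, the state, and an iterator over the list items.  The class
``TaggedList`` and its ``tag`` attribute are hypothetical names chosen for this
example only.

```python
import pickle


class TaggedList(list):
    """A list subclass that carries an extra ``tag`` attribute."""

    def __init__(self, tag=None, items=()):
        super().__init__(items)
        self.tag = tag

    def __reduce__(self):
        return (
            TaggedList,         # callable that creates the initial object
            (),                 # arguments for the callable (none needed)
            {"tag": self.tag},  # state: added to __dict__, as the class
                                # defines no __setstate__
            iter(self),         # items restored via append() or extend()
        )


original = TaggedList("fruits", ["apple", "pear"])
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, ``restored`` is a new ``TaggedList`` with the same items
and the same ``tag``.  The fifth item (key-value pairs) is omitted here because
the class is not a mapping.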
    624 
    625 .. method:: object.__reduce_ex__(protocol)
    626 
   Alternatively, a :meth:`__reduce_ex__` method may be defined.  The only
   difference is that this method takes a single integer argument, the protocol
   version.  When defined, pickle will prefer it over the :meth:`__reduce__`
   method.  In addition, :meth:`__reduce__` automatically becomes a synonym for
   the extended version.  The main use for this method is to provide
   backwards-compatible reduce values for older Python releases.
    633 
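The following sketch shows that the pickler passes the protocol version to
:meth:`__reduce_ex__`.  The class ``Celsius`` and its ``pickled_with``
attribute are hypothetical; a real implementation would more likely use the
argument to choose between alternative reduce values.

```python
import pickle


class Celsius:
    """A temperature that remembers which protocol pickled it."""

    def __init__(self, degrees):
        self.degrees = degrees
        self.pickled_with = None  # filled in when the object is unpickled

    def __reduce_ex__(self, protocol):
        # 'protocol' is the integer protocol version in use by the pickler.
        # It is stored in the state here purely to make it observable.
        return (Celsius, (self.degrees,), {"pickled_with": protocol})


with_protocol_2 = pickle.loads(pickle.dumps(Celsius(21.5), protocol=2))
with_default = pickle.loads(pickle.dumps(Celsius(21.5)))
```

Here ``with_protocol_2.pickled_with`` is ``2``, while
``with_default.pickled_with`` equals ``pickle.DEFAULT_PROTOCOL``.
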
    634 .. currentmodule:: pickle
    635 
    636 .. _pickle-persistent:
    637 
    638 Persistence of External Objects
    639 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    640 
    641 .. index::
    642    single: persistent_id (pickle protocol)
    643    single: persistent_load (pickle protocol)
    644 
    645 For the benefit of object persistence, the :mod:`pickle` module supports the
    646 notion of a reference to an object outside the pickled data stream.  Such
    647 objects are referenced by a persistent ID, which should be either a string of
    648 alphanumeric characters (for protocol 0) [#]_ or just an arbitrary object (for
    649 any newer protocol).
    650 
The resolution of such persistent IDs is not defined by the :mod:`pickle`
module; it delegates this resolution to user-defined methods on the
pickler and unpickler, :meth:`~Pickler.persistent_id` and
:meth:`~Unpickler.persistent_load` respectively.
    655 
To pickle objects that have an external persistent ID, the pickler must have a
custom :meth:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent ID for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent ID is returned, the pickler will pickle that ID in place of
the object, along with a marker so that the unpickler will recognize it as a
persistent ID.
    662 
    663 To unpickle external objects, the unpickler must have a custom
    664 :meth:`~Unpickler.persistent_load` method that takes a persistent ID object and
    665 returns the referenced object.
    666 
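As a minimal in-memory sketch of this pair of methods, the code below pickles
``User`` objects by reference to a registry instead of by value.  The names
``User``, ``REGISTRY``, ``RegistryPickler`` and ``RegistryUnpickler`` are
hypothetical and exist only for this illustration.  A tuple is used as the
persistent ID, which assumes a protocol newer than 0.

```python
import io
import pickle


class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name


# Stand-in for an external store such as a database (an assumption).
REGISTRY = {1: User(1, "alice")}


class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a reference for User objects; pickle everything else as usual.
        if isinstance(obj, User):
            return ("User", obj.user_id)
        return None


class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the reference written by RegistryPickler.persistent_id().
        tag, key = pid
        if tag == "User":
            return REGISTRY[key]
        raise pickle.UnpicklingError("unsupported persistent ID: %r" % (pid,))


buffer = io.BytesIO()
RegistryPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(
    {"owner": REGISTRY[1]})
buffer.seek(0)
restored = RegistryUnpickler(buffer).load()
```

Because only the ID travels through the pickle, ``restored["owner"]`` is the
very object held in ``REGISTRY``, not a copy of it.
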
Here is a comprehensive example showing how persistent IDs can be used to
pickle external objects by reference.
    669 
    670 .. literalinclude:: ../includes/dbpickle.py
    671 
    672 .. _pickle-dispatch:
    673 
    674 Dispatch Tables
    675 ^^^^^^^^^^^^^^^
    676 
    677 If one wants to customize pickling of some classes without disturbing
    678 any other code which depends on pickling, then one can create a
    679 pickler with a private dispatch table.
    680 
    681 The global dispatch table managed by the :mod:`copyreg` module is
    682 available as :data:`copyreg.dispatch_table`.  Therefore, one may
    683 choose to use a modified copy of :data:`copyreg.dispatch_table` as a
    684 private dispatch table.
    685 
    686 For example ::
    687 
    688    f = io.BytesIO()
    689    p = pickle.Pickler(f)
    690    p.dispatch_table = copyreg.dispatch_table.copy()
    691    p.dispatch_table[SomeClass] = reduce_SomeClass
    692 
    693 creates an instance of :class:`pickle.Pickler` with a private dispatch
    694 table which handles the ``SomeClass`` class specially.  Alternatively,
    695 the code ::
    696 
    697    class MyPickler(pickle.Pickler):
    698        dispatch_table = copyreg.dispatch_table.copy()
    699        dispatch_table[SomeClass] = reduce_SomeClass
    700    f = io.BytesIO()
    701    p = MyPickler(f)
    702 
    703 does the same, but all instances of ``MyPickler`` will by default
    704 share the same dispatch table.  The equivalent code using the
    705 :mod:`copyreg` module is ::
    706 
    707    copyreg.pickle(SomeClass, reduce_SomeClass)
    708    f = io.BytesIO()
    709    p = pickle.Pickler(f)
    710 
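The snippets above leave ``SomeClass`` and ``reduce_SomeClass`` undefined.  A
self-contained sketch of the private dispatch table approach might look as
follows; ``Point``, ``reduce_point`` and the ``calls`` list are hypothetical
names, and ``calls`` exists only to make the reducer's use observable.

```python
import copyreg
import io
import pickle


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


calls = []  # records each object handled by the custom reducer


def reduce_point(point):
    calls.append(point)
    return (Point, (point.x, point.y))


buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
# Start from a copy so that the global table is left untouched.
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[Point] = reduce_point

pickler.dump(Point(1, 2))
restored = pickle.loads(buffer.getvalue())
calls_after_private_dump = len(calls)

# A plain dumps() call does not consult the private table.
pickle.dumps(Point(3, 4))
calls_after_plain_dumps = len(calls)
```

Only the pickler carrying the private table invokes ``reduce_point``; the
global :data:`copyreg.dispatch_table` never gains an entry for ``Point``.
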
    711 .. _pickle-state:
    712 
    713 Handling Stateful Objects
    714 ^^^^^^^^^^^^^^^^^^^^^^^^^
    715 
    716 .. index::
    717    single: __getstate__() (copy protocol)
    718    single: __setstate__() (copy protocol)
    719 
    720 Here's an example that shows how to modify pickling behavior for a class.
    721 The :class:`TextReader` class opens a text file, and returns the line number and
    722 line contents each time its :meth:`!readline` method is called. If a
    723 :class:`TextReader` instance is pickled, all attributes *except* the file object
    724 member are saved. When the instance is unpickled, the file is reopened, and
    725 reading resumes from the last location. The :meth:`__setstate__` and
    726 :meth:`__getstate__` methods are used to implement this behavior. ::
    727 
    728    class TextReader:
    729        """Print and number lines in a text file."""
    730 
    731        def __init__(self, filename):
    732            self.filename = filename
    733            self.file = open(filename)
    734            self.lineno = 0
    735 
    736        def readline(self):
    737            self.lineno += 1
    738            line = self.file.readline()
    739            if not line:
    740                return None
    741            if line.endswith('\n'):
    742                line = line[:-1]
    743            return "%i: %s" % (self.lineno, line)
    744 
    745        def __getstate__(self):
    746            # Copy the object's state from self.__dict__ which contains
    747            # all our instance attributes. Always use the dict.copy()
    748            # method to avoid modifying the original state.
    749            state = self.__dict__.copy()
    750            # Remove the unpicklable entries.
    751            del state['file']
    752            return state
    753 
    754        def __setstate__(self, state):
    755            # Restore instance attributes (i.e., filename and lineno).
    756            self.__dict__.update(state)
    757            # Restore the previously opened file's state. To do so, we need to
    758            # reopen it and read from it until the line count is restored.
    759            file = open(self.filename)
    760            for _ in range(self.lineno):
    761                file.readline()
    762            # Finally, save the file.
    763            self.file = file
    764 
    765 
    766 A sample usage might be something like this::
    767 
    768    >>> reader = TextReader("hello.txt")
    769    >>> reader.readline()
    770    '1: Hello world!'
    771    >>> reader.readline()
    772    '2: I am line number two.'
    773    >>> new_reader = pickle.loads(pickle.dumps(reader))
    774    >>> new_reader.readline()
    775    '3: Goodbye!'
    776 
    777 
    778 .. _pickle-restrict:
    779 
    780 Restricting Globals
    781 -------------------
    782 
    783 .. index::
    784    single: find_class() (pickle protocol)
    785 
    786 By default, unpickling will import any class or function that it finds in the
    787 pickle data.  For many applications, this behaviour is unacceptable as it
    788 permits the unpickler to import and invoke arbitrary code.  Just consider what
    789 this hand-crafted pickle data stream does when loaded::
    790 
    791     >>> import pickle
    792     >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    793     hello world
    794     0
    795 
In this example, the unpickler imports the :func:`os.system` function and then
applies it to the string argument "echo hello world".  Although this example is
harmless, it is not difficult to imagine one that could damage your system.
    799 
For this reason, you may want to control what gets unpickled by customizing
:meth:`Unpickler.find_class`.  Contrary to what its name suggests,
:meth:`Unpickler.find_class` is called whenever a global (i.e., a class or
a function) is requested.  Thus it is possible to either completely forbid
globals or restrict them to a safe subset.
    805 
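The first option, forbidding globals completely, can be sketched as follows.
The names ``NoGlobalsUnpickler`` and ``loads_without_globals`` are hypothetical.
Such an unpickler still loads containers and scalars that need no global
lookup, but rejects anything that requires importing a class or function.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Reject every request for a global, whatever its origin.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))


def loads_without_globals(data):
    """Helper function analogous to pickle.loads()."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


# Dicts, lists, tuples, strings and numbers need no global lookup.
plain = loads_without_globals(
    pickle.dumps({"a": [1, 2.5, "text"], "b": (None, True)}))

# A range object is pickled as a reference to builtins.range.
try:
    loads_without_globals(pickle.dumps(range(3)))
except pickle.UnpicklingError as exc:
    blocked = str(exc)
else:
    blocked = None
```
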
Here is an example of an unpickler allowing only a few safe classes from the
:mod:`builtins` module to be loaded::
    808 
    809    import builtins
    810    import io
    811    import pickle
    812 
    813    safe_builtins = {
    814        'range',
    815        'complex',
    816        'set',
    817        'frozenset',
    818        'slice',
    819    }
    820 
    821    class RestrictedUnpickler(pickle.Unpickler):
    822 
    823        def find_class(self, module, name):
    824            # Only allow safe classes from builtins.
    825            if module == "builtins" and name in safe_builtins:
    826                return getattr(builtins, name)
    827            # Forbid everything else.
    828            raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
    829                                         (module, name))
    830 
    831    def restricted_loads(s):
    832        """Helper function analogous to pickle.loads()."""
    833        return RestrictedUnpickler(io.BytesIO(s)).load()
    834 
A sample usage of our unpickler working as intended::
    836 
    837     >>> restricted_loads(pickle.dumps([1, 2, range(15)]))
    838     [1, 2, range(0, 15)]
    839     >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    840     Traceback (most recent call last):
    841       ...
    842     pickle.UnpicklingError: global 'os.system' is forbidden
    843     >>> restricted_loads(b'cbuiltins\neval\n'
    844     ...                  b'(S\'getattr(__import__("os"), "system")'
    845     ...                  b'("echo hello world")\'\ntR.')
    846     Traceback (most recent call last):
    847       ...
    848     pickle.UnpicklingError: global 'builtins.eval' is forbidden
    849 
    850 
.. XXX Add note about how extension codes could evade our protection
   mechanism (e.g. cached classes do not invoke find_class()).
    853 
As our examples show, you have to be careful with what you allow to be
unpickled.  Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in :mod:`xmlrpc.client` or
third-party solutions.
    858 
    859 
    860 Performance
    861 -----------
    862 
    863 Recent versions of the pickle protocol (from protocol 2 and upwards) feature
    864 efficient binary encodings for several common features and built-in types.
    865 Also, the :mod:`pickle` module has a transparent optimizer written in C.
    866 
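One rough way to observe the effect of the binary encodings is to compare the
size of the same object pickled with each available protocol.  This is only a
sketch: the sample data is arbitrary, and the exact sizes depend on the data
and the Python version.

```python
import pickle

# Arbitrary sample data: a list of small integers.
data = list(range(1000))

# Map each protocol number to the length of the resulting pickle.
sizes = {
    protocol: len(pickle.dumps(data, protocol=protocol))
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

for protocol, size in sizes.items():
    print("protocol %d: %d bytes" % (protocol, size))
```

For this data, the text-based protocol 0 produces a noticeably larger pickle
than protocol 2, which stores small integers in a compact binary form.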
    867 
    868 .. _pickle-example:
    869 
    870 Examples
    871 --------
    872 
    873 For the simplest code, use the :func:`dump` and :func:`load` functions. ::
    874 
    875    import pickle
    876 
    877    # An arbitrary collection of objects supported by pickle.
    878    data = {
    879        'a': [1, 2.0, 3, 4+6j],
    880        'b': ("character string", b"byte string"),
    881        'c': {None, True, False}
    882    }
    883 
    884    with open('data.pickle', 'wb') as f:
    885        # Pickle the 'data' dictionary using the highest protocol available.
    886        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
    887 
    888 
    889 The following example reads the resulting pickled data. ::
    890 
    891    import pickle
    892 
    893    with open('data.pickle', 'rb') as f:
    894        # The protocol version used is detected automatically, so we do not
    895        # have to specify it.
    896        data = pickle.load(f)
    897 
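When the size of the pickle matters, two standard library tools may help:
:func:`pickletools.optimize` removes unused memo opcodes from a pickle, and
the :mod:`gzip` module compresses the resulting bytes.  The following is a
sketch with arbitrary sample data; whether either step is worthwhile depends
on the data being pickled.

```python
import gzip
import pickle
import pickletools

# Arbitrary, repetitive sample data.
data = [{"name": "item", "value": i} for i in range(200)]

raw = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# Drop memo opcodes that are never referenced later in the pickle.
optimized = pickletools.optimize(raw)

# Compress the optimized pickle; it must be decompressed before loading.
compressed = gzip.compress(optimized)

from_optimized = pickle.loads(optimized)
from_compressed = pickle.loads(gzip.decompress(compressed))
```

An optimized pickle loads exactly like the original, whereas a compressed one
has to pass through :func:`gzip.decompress` first.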
    898 
    899 .. XXX: Add examples showing how to optimize pickles for size (like using
    900 .. pickletools.optimize() or the gzip module).
    901 
    902 
    903 .. seealso::
    904 
    905    Module :mod:`copyreg`
    906       Pickle interface constructor registration for extension types.
    907 
    908    Module :mod:`pickletools`
    909       Tools for working with and analyzing pickled data.
    910 
    911    Module :mod:`shelve`
    912       Indexed databases of objects; uses :mod:`pickle`.
    913 
    914    Module :mod:`copy`
    915       Shallow and deep object copying.
    916 
    917    Module :mod:`marshal`
    918       High-performance serialization of built-in types.
    919 
    920 
    921 .. rubric:: Footnotes
    922 
.. [#] Don't confuse this with the :mod:`marshal` module.
    924 
    925 .. [#] This is why :keyword:`lambda` functions cannot be pickled:  all
    926     :keyword:`!lambda` functions share the same name:  ``<lambda>``.
    927 
    928 .. [#] The exception raised will likely be an :exc:`ImportError` or an
    929    :exc:`AttributeError` but it could be something else.
    930 
    931 .. [#] The :mod:`copy` module uses this protocol for shallow and deep copying
    932    operations.
    933 
.. [#] The limitation on alphanumeric characters is due to the fact
   that persistent IDs, in protocol 0, are delimited by the newline
   character.  Therefore, if any kind of newline character occurs in
   a persistent ID, the resulting pickle will become unreadable.
    938