:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure.  "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy.  Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".
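
A minimal round trip through the :func:`dumps` and :func:`loads` functions
described below shows both directions.  The dictionary is made-up sample data,
and the sketch only uses calls found in both Python 2.7 and Python 3; in
Python 3 the pickle is a ``bytes`` object, in Python 2 a plain string::

   import pickle

   # Hypothetical sample data: a small object hierarchy.
   record = {'name': 'spam', 'sizes': (1, 2, 3), 'tags': ['a', 'b']}

   data = pickle.dumps(record)      # pickling: hierarchy -> byte string
   restored = pickle.loads(data)    # unpickling: byte string -> hierarchy

   print(restored == record)        # same contents ...
   print(restored is record)        # ... but a newly built object

The restored dictionary compares equal to the original but is a separate
object, because unpickling builds a new hierarchy instead of handing back the
old one.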

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data.  Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module.  As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`.  However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes.  Most applications have no need for this
functionality and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary.  In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by :mod:`marshal`, and in fact, attempting to marshal recursive objects
  will crash your Python interpreter.  Object sharing happens when there are
  multiple references to the same object in different places in the object
  hierarchy being serialized.  :mod:`pickle` stores such objects only once, and
  ensures that all other references point to the master copy.  Shared objects
  remain shared, which can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
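
The sharing and recursion behaviour described in the first item can be checked
directly.  This sketch, with illustrative variable names, pickles a list that
holds two references to one inner list together with a list that contains
itself, and then inspects the copies::

   import pickle

   inner = [1, 2]
   shared = [inner, inner]     # two references to one list object
   loop = []
   loop.append(loop)           # a recursive list: it contains itself

   shared2, loop2 = pickle.loads(pickle.dumps((shared, loop), 2))

   # The inner list was stored once, so both slots still name one object.
   shared2[0].append(3)
   print(shared2[1])           # prints [1, 2, 3]
   # The self-reference is restored instead of being expanded forever.
   print(loop2[0] is loop2)    # prints True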

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects.  The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure.  Perhaps the most obvious thing to do with
these byte streams is to write them to a file, but it is also conceivable to
send them across a network or store them in a database.  The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation.  The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently three different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
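
The sketch below pickles the same made-up dictionary with protocol 0, with
:const:`HIGHEST_PROTOCOL` and with a negative protocol number.  It checks that
the protocol 0 output is plain 7-bit ASCII and that every variant loads back to
an equal object; the printed pickle itself is not reproduced here because its
exact bytes vary between Python versions::

   import pickle

   data = {'spam': [1, 2.5, u'eggs']}      # made-up sample data

   ascii_pickle = pickle.dumps(data, 0)                          # ASCII protocol
   binary_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)   # newest binary one
   negative_pickle = pickle.dumps(data, -1)                      # negative: highest too

   # Protocol 0 output consists of 7-bit ASCII bytes only.
   print(all(byte < 128 for byte in bytearray(ascii_pickle)))
   print(repr(ascii_pickle))

   # Whatever the protocol, unpickling rebuilds an equal object.
   for blob in (ascii_pickle, binary_pickle, negative_pickle):
       assert pickle.loads(blob) == data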


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method.  The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available.  This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors that do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*.  This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy.  This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods: a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.
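
The following sketch combines :func:`dump`, :func:`load` and the binary-mode
advice from the note above.  The ``save`` and ``restore`` helpers are
hypothetical names, and the temporary file exists only for the demonstration;
since :const:`HIGHEST_PROTOCOL` is always a binary protocol, both ``open`` calls
use binary mode::

   import os
   import pickle
   import tempfile

   def save(obj, path):
       """Pickle obj into the file at path (hypothetical helper)."""
       # Binary protocols (>= 1) need a file opened in binary mode.
       with open(path, 'wb') as f:
           pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

   def restore(path):
       """Return the object pickled in the file at path (hypothetical helper)."""
       # load() works out which protocol was used on its own.
       with open(path, 'rb') as f:
           return pickle.load(f)

   fd, path = tempfile.mkstemp(suffix='.pickle')
   os.close(fd)
   try:
       save({'scores': [3, 1, 2]}, path)
       loaded = restore(path)
   finally:
       os.remove(path)
   print(loaded)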


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string.  Characters in the string past
   the pickled object's representation are ignored.
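
Because unpickling stops at the end of the first complete pickle, extra bytes
after it are ignored by :func:`loads`; for the same reason, several pickles
written back to back into one stream can be read one at a time with
:func:`load`.  A minimal sketch of both points, with made-up values::

   import io
   import pickle

   first = pickle.dumps([1, 2, 3], 2)

   # loads() stops at the end of the pickle; the trailing bytes are ignored.
   print(pickle.loads(first + b'trailing junk'))

   # Two pickles written back to back are read in order by successive load() calls.
   stream = io.BytesIO(first + pickle.dumps('second', 2))
   items = [pickle.load(stream), pickle.load(stream)]
   print(items)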

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below.  This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
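
Since unpickling can fail with several exception types, a caller that wants to
cope with corrupt or truncated input has to catch more than
:exc:`UnpicklingError`.  The ``try_loads`` helper below is a hypothetical sketch;
it does nothing to make untrusted data safe to load, it only turns unusable
pickles into ``None``::

   import pickle

   def try_loads(data):
       """Return the unpickled object, or None if data is not a usable pickle."""
       try:
           return pickle.loads(data)
       except (pickle.UnpicklingError, EOFError, AttributeError,
               ImportError, IndexError, ValueError):
           # UnpicklingError is only one of the exceptions unpickling can raise.
           return None

   good = pickle.dumps([1, 2, 3], 2)
   print(try_loads(good))        # a complete pickle loads normally
   print(try_loads(good[:-1]))   # truncated: the final STOP opcode is missing
   print(try_loads(b''))         # empty input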

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor.  Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo".  The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive
      objects are pickled by reference and not by value.  This method is useful
      when re-using picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`.  In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary.  So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance.  These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance.  If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
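
The following sketch shows that behaviour with one :class:`Pickler` and one
:class:`Unpickler` sharing an in-memory buffer; the dictionary is made-up data
and protocol 2 is passed explicitly.  The second :meth:`dump` of the same object
is written as a reference to the first, while the :meth:`clear_memo` call makes
the third :meth:`dump` store a fresh copy::

   import io
   import pickle

   settings = {'debug': True}    # made-up object pickled several times

   buf = io.BytesIO()
   p = pickle.Pickler(buf, 2)
   p.dump(settings)
   p.dump(settings)      # already in the memo: stored as a reference
   p.clear_memo()
   p.dump(settings)      # memo cleared: stored by value again

   up = pickle.Unpickler(io.BytesIO(buf.getvalue()))
   first = up.load()     # one load() per dump(), in the same order
   second = up.load()
   third = up.load()

   print(first is second)   # both refer to one reconstructed dict
   print(first is third)    # the copy stored after clear_memo() is separate
   print(first == third)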

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods: a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects.  This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream.  See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol`
  for details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth, in which case a
:exc:`RuntimeError` will be raised. You can raise this limit, with care, using
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value.  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code nor any of its function attributes is pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object; otherwise, an exception will be raised. [#]_
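
For example, pickling the standard library function :func:`os.path.join`
stores little more than a module name and a function name, and unpickling
looks that name up again, returning the very same function object.  The sketch
uses protocol 2, which Python 2.7 and Python 3 both support::

   import os.path
   import pickle

   data = pickle.dumps(os.path.join, 2)

   # Only the defining module and the function name end up in the pickle ...
   print(b'join' in data)
   # ... so loading imports that module and fetches the same function again.
   print(pickle.loads(data) is os.path.join)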

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.  Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'a class attr'

   # Only a reference to Foo is stored; attr is not part of the pickle.
   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined at
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized.  This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized.  The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked.  If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* of positional
   arguments to be passed to the class constructor (:meth:`__init__` for
   example).  Keyword arguments are not supported.  The :meth:`__getinitargs__`
   method is called at pickle time; the tuple it returns is incorporated in the
   pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2.  Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings).  Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.
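
As an illustration, the hypothetical ``Point`` class below is a :class:`tuple`
subclass whose :meth:`__new__` takes two arguments.  The :meth:`__getnewargs__`
inherited from :class:`tuple` would supply a single tuple argument, which this
:meth:`__new__` cannot accept, so ``Point`` overrides it to return the two
coordinates.  The sketch pickles with protocol 2::

   import pickle

   class Point(tuple):
       """Hypothetical immutable 2-D point stored as a tuple subclass."""

       def __new__(cls, x, y):
           # The tuple contents are fixed here, at creation time.
           return tuple.__new__(cls, (x, y))

       def __getnewargs__(self):
           # With protocol 2, unpickling calls Point.__new__(Point, x, y).
           return tuple(self)

   p = pickle.loads(pickle.dumps(Point(3, 4), 2))
   print(type(p).__name__)
   print(tuple(p))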

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the returned state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state. [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary.  If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.
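
A common use of these two methods is leaving out an attribute that cannot be
pickled and recreating it on load.  In this sketch the hypothetical
``LockedCounter`` class guards its count with a :class:`threading.Lock`, which
pickle refuses to serialize, so :meth:`__getstate__` drops the lock and
:meth:`__setstate__` creates a new one::

   import pickle
   import threading

   class LockedCounter(object):
       """Hypothetical counter whose updates are guarded by a lock."""

       def __init__(self):
           self.count = 0
           self._lock = threading.Lock()

       def increment(self):
           with self._lock:
               self.count += 1

       def __getstate__(self):
           # Pickle a copy of the instance dictionary without the lock.
           state = self.__dict__.copy()
           del state['_lock']
           return state

       def __setstate__(self, state):
           # Restore the saved attributes, then give the copy its own lock.
           self.__dict__.update(state)
           self._lock = threading.Lock()

   original = LockedCounter()
   original.increment()
   restored = pickle.loads(pickle.dumps(original, 2))
   restored.increment()
   print((original.count, restored.count))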

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  If those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it.  One alternative is for the object to implement a
   :meth:`__reduce__` method.  If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal.  The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.  The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time.  The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object.  The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment.  Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`.  If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's
     :attr:`~object.__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items.  These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``.  This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature.  (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``.  These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.
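
A minimal sketch of both return forms, with hypothetical class names:
``Temperature`` returns a two-element tuple (a callable and its arguments), and
the ``MISSING`` sentinel returns a string, so it is pickled as a reference to the
module-level global of that name and stays a singleton.  Both classes, and the
``MISSING`` global, must live at the top level of an importable module::

   import pickle

   class Temperature(object):
       """Hypothetical value type that reduces to a constructor call."""

       def __init__(self, celsius):
           self.celsius = celsius

       def __reduce__(self):
           # (callable, args): unpickling calls Temperature(self.celsius).
           return (Temperature, (self.celsius,))


   class _Missing(object):
       """Hypothetical sentinel type, pickled by global name."""

       def __reduce__(self):
           # A string names a module-level global; unpickling fetches that object.
           return 'MISSING'

   MISSING = _Missing()

   t = pickle.loads(pickle.dumps(Temperature(21.5), 2))
   print(t.celsius)
   print(pickle.loads(pickle.dumps(MISSING, 2)) is MISSING)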

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`.  This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility).  The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.
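
As an example, the hypothetical ``Version`` class below reduces to a constructor
call for protocol 2 and above, and to a string-parsing helper for protocols 0
and 1.  The helper, also hypothetical, is a module-level function so that it can
itself be pickled by name::

   import pickle

   def _version_from_string(text):
       """Hypothetical helper: rebuild a Version from "major.minor"."""
       major, minor = text.split('.')
       return Version(int(major), int(minor))

   class Version(object):
       """Hypothetical class whose reduction depends on the protocol."""

       def __init__(self, major, minor):
           self.major = major
           self.minor = minor

       def __reduce_ex__(self, protocol):
           if protocol >= 2:
               # Protocol 2 and later: call the constructor with both fields.
               return (Version, (self.major, self.minor))
           # Protocols 0 and 1: store a single "major.minor" string instead.
           return (_version_from_string, ('%d.%d' % (self.major, self.minor),))

   results = []
   for proto in (0, 1, 2):
       v = pickle.loads(pickle.dumps(Version(2, 7), proto))
       results.append((proto, v.major, v.minor))
   print(results)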

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module.  This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types.  Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
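
The sketch below registers a reduction function for a hypothetical ``Color``
class that we pretend cannot be edited.  The module is called :mod:`copy_reg` in
Python 2 and ``copyreg`` in Python 3, so the import tries both; its ``pickle()``
function is given the type and the reduction function::

   import pickle
   try:
       import copyreg                  # module name in Python 3
   except ImportError:
       import copy_reg as copyreg      # module name used in this document

   class Color(object):
       """Hypothetical slotted class that we treat as unmodifiable."""
       __slots__ = ('r', 'g', 'b')

       def __init__(self, r, g, b):
           self.r, self.g, self.b = r, g, b

   def reduce_color(color):
       # Same contract as __reduce__, but receives the object as an argument.
       return (Color, (color.r, color.g, color.b))

   copyreg.pickle(Color, reduce_color)

   c = pickle.loads(pickle.dumps(Color(255, 128, 0), 0))
   print((c.r, c.g, c.b))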
    556 unpickling as described above.
    557 
    558 
    559 Pickling and unpickling external objects
    560 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    561 
    562 .. index::
    563    single: persistent_id (pickle protocol)
    564    single: persistent_load (pickle protocol)
    565 
    566 For the benefit of object persistence, the :mod:`pickle` module supports the
    567 notion of a reference to an object outside the pickled data stream.  Such
    568 objects are referenced by a "persistent id", which is just an arbitrary string
    569 of printable ASCII characters. The resolution of such names is not defined by
    570 the :mod:`pickle` module; it will delegate this resolution to user defined
    571 functions on the pickler and unpickler. [#]_
    572 
    573 To define external persistent id resolution, you need to set the
    574 :attr:`~Pickler.persistent_id` attribute of the pickler object and the
    575 :attr:`~Unpickler.persistent_load` attribute of the unpickler object.
    576 
    577 To pickle objects that have an external persistent id, the pickler must have a
    578 custom :func:`~Pickler.persistent_id` method that takes an object as an
    579 argument and returns either ``None`` or the persistent id for that object.
    580 When ``None`` is returned, the pickler simply pickles the object as normal.
    581 When a persistent id string is returned, the pickler will pickle that string,
    582 along with a marker so that the unpickler will recognize the string as a
    583 persistent id.
    584 
    585 To unpickle external objects, the unpickler must have a custom
    586 :func:`~Unpickler.persistent_load` function that takes a persistent id string
    587 and returns the referenced object.
    588 
    589 Here's a silly example that *might* shed more light::
    590 
    591    import pickle
    592    from cStringIO import StringIO
    593 
    594    src = StringIO()
    595    p = pickle.Pickler(src)
    596 
    597    def persistent_id(obj):
    598        if hasattr(obj, 'x'):
    599            return 'the value %d' % obj.x
    600        else:
    601            return None
    602 
    603    p.persistent_id = persistent_id
    604 
    605    class Integer:
    606        def __init__(self, x):
    607            self.x = x
    608        def __str__(self):
    609            return 'My name is integer %d' % self.x
    610 
    611    i = Integer(7)
    612    print i
    613    p.dump(i)
    614 
    615    datastream = src.getvalue()
    616    print repr(datastream)
    617    dst = StringIO(datastream)
    618 
    619    up = pickle.Unpickler(dst)
    620 
    621    class FancyInteger(Integer):
    622        def __str__(self):
    623            return 'I am the integer %d' % self.x
    624 
    625    def persistent_load(persid):
    626        if persid.startswith('the value '):
    627            value = int(persid.split()[2])
    628            return FancyInteger(value)
    629        else:
    630            raise pickle.UnpicklingError, 'Invalid persistent id'
    631 
    632    up.persistent_load = persistent_load
    633 
    634    j = up.load()
    635    print j
    636 
In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
attribute can also be set to a Python list.  In that case, whenever the unpickler
reaches a persistent id, it simply appends the persistent id string to this
list.  This functionality exists so that a pickle data stream can be "sniffed"
for object references without actually instantiating all the objects in a
pickle. [#]_  Setting :attr:`~Unpickler.persistent_load` to a list is usually
used in conjunction with the :meth:`~Unpickler.noload` method on the Unpickler.

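The list form of :attr:`~Unpickler.persistent_load` is specific to :mod:`cPickle`.  As a rough alternative that works with either module (a different technique, not the mechanism described above), a protocol 0 pickle can be scanned with :func:`pickletools.genops`, which walks the opcodes without building any objects; each persistent id appears as a ``PERSID`` opcode whose argument is the id string.  This sketch assumes protocol 0 only (binary protocols emit ``BINPERSID`` instead and are not handled here), and ``IdPickler`` and ``sniff_persistent_ids`` are hypothetical names.

```python
import io
import pickle
import pickletools


class Integer(object):
    def __init__(self, x):
        self.x = x


class IdPickler(pickle.Pickler):
    """Hypothetical pickler: objects with an 'x' attribute become persistent ids."""
    def persistent_id(self, obj):
        if hasattr(obj, 'x'):
            return 'the value %d' % obj.x
        return None          # pickle everything else normally


def sniff_persistent_ids(data):
    """Collect PERSID arguments from a protocol 0 pickle without unpickling it."""
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name == 'PERSID']


buf = io.BytesIO()
IdPickler(buf, 0).dump([Integer(7), 'plain', Integer(42)])   # protocol 0
print(sniff_persistent_ids(buf.getvalue()))   # ['the value 7', 'the value 42']
```

The plain string is pickled normally, so it does not show up in the sniffed list.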
.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler.  Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream, where the
first line will be the name of the module containing the class and the second
line will be the name of the instance's class.  It then looks up the class,
possibly importing the module and digging out the attribute, and appends what it
finds to the unpickler's stack.  Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`.  Your job (should you
choose to accept it) would be to have :meth:`load_global` push onto the
unpickler's stack a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class.  Alternatively, you could raise an
error if you want to disallow all unpickling of instances.  If this sounds like
a hack, you're right.  Refer to the source code to make this work.

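A minimal sketch of this idea follows.  It assumes that, as in the :file:`pickle.py` source, :meth:`load_global` delegates the lookup of the module and class names to a :meth:`find_class` method, so a subclass can override that smaller hook rather than :meth:`load_global` itself.  The whitelist ``SAFE_CLASSES``, the ``RestrictedUnpickler`` class and the ``restricted_loads`` helper are hypothetical names; anything not whitelisted is rejected with :exc:`UnpicklingError`.

```python
import datetime
import fractions
import io
import pickle

# Hypothetical whitelist: (module name, class name) -> class object.
SAFE_CLASSES = {('datetime', 'date'): datetime.date}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called with the module and class names read from the pickle data.
        try:
            return SAFE_CLASSES[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError('%s.%s is not allowed' % (module, name))


def restricted_loads(data):
    """Unpickle data, resolving classes only through the whitelist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps(datetime.date(2000, 1, 1), -1)))   # 2000-01-01

try:
    restricted_loads(pickle.dumps(fractions.Fraction(1, 3), -1))
except pickle.UnpicklingError as exc:
    print(exc)    # fractions.Fraction is not allowed
```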
Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
attribute to a function or ``None``.  If it is ``None`` then any attempts to
unpickle instances will raise an :exc:`UnpicklingError`.  If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.  It is responsible for looking up the class and
performing any necessary imports, and it may raise an error to prevent
instances of the class from being unpickled.

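With :mod:`cPickle`, the same whitelist can be expressed as a plain function assigned to :attr:`~Unpickler.find_global`.  In this sketch ``SAFE_CLASSES`` and ``restricted_find_global`` are hypothetical names, and the assignment is only attempted when :mod:`cPickle` can be imported; setting the attribute to ``None`` instead would refuse all instances, as described above.

```python
import datetime
import io
import pickle

try:
    import cPickle               # C implementation; not available everywhere
except ImportError:
    cPickle = None

# Hypothetical whitelist: (module name, class name) -> class object.
SAFE_CLASSES = {('datetime', 'date'): datetime.date}


def restricted_find_global(module, name):
    """Look up a whitelisted class; raise to refuse anything else."""
    try:
        return SAFE_CLASSES[(module, name)]
    except KeyError:
        raise pickle.UnpicklingError('%s.%s is not allowed' % (module, name))


if cPickle is not None:
    data = cPickle.dumps(datetime.date(2000, 1, 1), 2)
    unpickler = cPickle.Unpickler(io.BytesIO(data))
    unpickler.find_global = restricted_find_global   # cPickle-only attribute
    print(unpickler.load())
```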
The moral of the story is that you should be really careful about the source of
the strings your application unpickles.


.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions.  Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data.  When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

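To confirm that the self-referencing list really comes back as a cycle rather than as a copy, a short in-memory round trip can check object identity.  This sketch uses :func:`dumps` and :func:`loads` instead of a file, purely for brevity.

```python
import pickle

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# -1 selects the highest protocol available, as in the example above.
restored = pickle.loads(pickle.dumps(selfref_list, -1))

print(restored[3] is restored)   # True: the cycle was rebuilt, not copied
print(restored[:3])              # [1, 2, 3]
```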
Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file and returns the line number and
line contents each time its :meth:`!readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior. ::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, state):
           fh = open(state['file'])     # reopen file
           count = state['lineno']      # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(state)  # update attributes
           self.fh = fh                 # save the file object

A sample usage, assuming the class above is saved in a file named
:file:`TextReader.py`, might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing.  What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim@zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module.  There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C.  Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses.  Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` use the
same format, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

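Because each module can read the other's pickles, a common idiom is to prefer :mod:`cPickle` and fall back to :mod:`pickle` when the C module cannot be imported.  In this sketch the names ``fast_pickle`` and ``record`` are illustrative; the data is written with one module and read back with the other, and where :mod:`cPickle` is missing both names simply refer to :mod:`pickle`.

```python
import pickle

try:
    import cPickle as fast_pickle   # C implementation, when available
except ImportError:
    fast_pickle = pickle            # fall back to the pickle module

record = {'name': 'spam', 'sizes': [1, 2, 3]}

# Dump with one module and load with the other.
data = fast_pickle.dumps(record, 2)
restored = pickle.loads(data)
print(restored == record)   # True
```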
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable.  More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior.  However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed.  One common reason
   to subclass is to control what objects can actually be unpickled.  See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts.  If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes.  Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`.  The description given here
   works the same for both implementations.  Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python.  We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects.  However, it is guaranteed that they will always be able to read each
   other's data streams.

