:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr@sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry@zope.com>

The :mod:`pickle` module implements a fundamental but powerful algorithm for
serializing and de-serializing a Python object structure.  "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy.  Pickling (and unpickling) is alternatively known as
"serialization", "marshalling", [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data.  Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module.  As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`.  However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes.  Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary.  In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects.  :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing.  Recursive
  objects are objects that contain references to themselves.  These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter.  Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized.  :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy.  Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances.  :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.
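
The first point above, about shared and recursive objects, can be illustrated
with a small sketch.  The variable names are made up for the illustration, and
the code is written so that it should run unchanged on Python 2.6+ and
Python 3.

```python
import pickle

# A list that is referenced twice inside a container (object sharing).
shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

# A list that contains a reference to itself (a recursive object).
recursive = ['payload']
recursive.append(recursive)

restored_container = pickle.loads(pickle.dumps(container))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# The two dictionary values are still one and the same list object.
same_object = restored_container['first'] is restored_container['second']

# The restored list still refers to itself.
still_recursive = restored_recursive[1] is restored_recursive
```

Because the sharing survives the round trip, a change made through
``restored_container['first']`` is also visible through
``restored_container['second']``.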

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects.  The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure.  Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database.  The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific.  This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however, it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation.  The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3.  It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used.  If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.
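
As a minimal sketch of protocol selection, the snippet below pickles the same
object with several *protocol* values.  The sample data and variable names are
made up for the illustration.

```python
import pickle

data = {'name': 'spam', 'values': list(range(10))}

ascii_pickle = pickle.dumps(data, 0)   # protocol 0, the ASCII format
binary_pickle = pickle.dumps(data, 2)  # protocol 2, a binary format

# Both spellings below select the highest protocol available.
highest_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
negative_pickle = pickle.dumps(data, -1)

# Every variant unpickles to an equal object; no protocol is passed to loads().
results = [pickle.loads(p) for p in
           (ascii_pickle, binary_pickle, highest_pickle, negative_pickle)]
```

Note that the unpickling side never needs to be told which protocol was used.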


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method.  To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method.  The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available.  This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*.  This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy.  This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string.  Characters in the string past
   the pickled object's representation are ignored.
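
The four convenience functions can be sketched together as follows.  This is
only an illustration: ``io.BytesIO`` is used here as a stand-in for a file
opened in binary mode (an assumption made so that the same code should run on
Python 2.6+ and Python 3), and the sample record is made up.

```python
import io
import pickle

record = {'id': 7, 'tags': ['a', 'b'], 'ratio': 0.5}

# dump()/load() work with any object that offers the required file methods.
buffer = io.BytesIO()
pickle.dump(record, buffer, 2)
buffer.seek(0)  # rewind before reading the pickle back
from_file = pickle.load(buffer)

# dumps()/loads() do the same without a file object.
pickled = pickle.dumps(record, 2)
from_string = pickle.loads(pickled)

# Data following the pickled object's representation is ignored by loads().
with_trailing = pickle.loads(pickled + b'trailing data')
```

In all three cases the result compares equal to ``record`` but is a new,
independent object.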

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below.  This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object.  Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.
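
Since unpickling can fail with several exception types, a caller that reads
possibly damaged (but trusted) data may want to catch them as a group.  The
helper ``try_loads`` below is a hypothetical name, not part of the module, and
the tuple only covers the exceptions listed above, which the text says is not
necessarily exhaustive.

```python
import pickle

# The exceptions named in the text as possible during unpickling.
UNPICKLING_ERRORS = (pickle.UnpicklingError, AttributeError, EOFError,
                     ImportError, IndexError)

def try_loads(data, default=None):
    """Return the unpickled object, or *default* if unpickling fails."""
    try:
        return pickle.loads(data)
    except UNPICKLING_ERRORS:
        return default

good = try_loads(pickle.dumps([1, 2, 3]))
# An empty stream ends before any object is read, which raises EOFError.
bad = try_loads(b'', default='unreadable')
```

Catching these exceptions only guards against damaged data.  It does not make
it safe to unpickle data from an untrusted source.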

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used.  If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor.  Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo".  The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value.  This method is useful when re-using
      picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`.  In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary.  So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance.  These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance.  If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_
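
A minimal sketch of matched :meth:`dump` and :meth:`load` calls follows.  It
assumes ``io.BytesIO`` as the file-like object and uses made-up sample data.

```python
import io
import pickle

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, 2)

settings = {'debug': True}
pickler.dump(settings)               # first dump() call
pickler.dump(['wrapper', settings])  # second call refers back to the same dict

# Two dump() calls are matched by two load() calls on one unpickler.
unpickler = pickle.Unpickler(io.BytesIO(buffer.getvalue()))
first = unpickler.load()
second = unpickler.load()

# The dictionary inside the second result is the object returned first.
shared_across_loads = second[1] is first
```

The identity is preserved only because a single pickler and a single
unpickler were used for both calls.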

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments.  Both
   methods should return a string.  Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects.  This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream.  See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable  (see section :ref:`pickle-protocol`
  for details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file.  Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case.  You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value.  This means that only the function name is
pickled, along with the name of the module the function is defined in.  Neither
the function's code, nor any of its function attributes are pickled.  Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_
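
Pickling by name reference can be observed with a built-in function.  The
following sketch uses :func:`len` only because it is available everywhere.

```python
import pickle

# Protocol 0 is used so that the stored name is easy to spot in the stream.
pickled_function = pickle.dumps(len, 0)

# The stream carries the function's name, not its code.
name_is_stored = b'len' in pickled_function

# Unpickling looks the name up again and returns the very same function.
restored_function = pickle.loads(pickled_function)
```

Here ``restored_function is len`` holds, because nothing but the reference was
stored.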

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply.  Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   import pickle

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them.  Only the instance data are pickled.  This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class.  If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.
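
One way to carry such a version number is sketched below.  Everything in it is
hypothetical: the ``Account`` class, its attributes, and the imagined
version 1 layout that stored a float ``balance`` instead of
``balance_cents``.  The class must live at the top level of an importable
module for the round trip to work.

```python
import pickle

class Account(object):
    VERSION = 2  # hypothetical layout version of this class

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store the version number alongside the instance data.
        state = self.__dict__.copy()
        state['version'] = self.VERSION
        return state

    def __setstate__(self, state):
        version = state.pop('version', 1)
        if version < 2:
            # Convert the imagined version 1 layout to the current one.
            state['balance_cents'] = int(round(state.pop('balance') * 100))
        self.__dict__.update(state)

# Simulate restoring state that was saved by the imagined version 1.
old = Account.__new__(Account)
old.__setstate__({'owner': 'ann', 'balance': 12.5})

# A normal round trip with the current layout.
current = pickle.loads(pickle.dumps(Account('bob', 300), 2))
```

The conversion lives entirely in :meth:`__setstate__`, so old pickles do not
have to be rewritten.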


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized.  This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized.  The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked.  If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example).  The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2.  Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings).  Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary.  If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state. [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary.  If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance.  In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.
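
The interplay of :meth:`__getstate__` and :meth:`__setstate__` can be sketched
with a class that keeps transient data out of its pickle.  The ``Counter``
class and its ``cache`` attribute are hypothetical, and the class is assumed to
be defined at the top level of an importable module.

```python
import pickle

class Counter(object):
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.cache = {}  # hypothetical transient data, cheap to rebuild

    def __getstate__(self):
        # Pickle everything except the transient cache.
        state = self.__dict__.copy()
        del state['cache']
        return state

    def __setstate__(self, state):
        # __init__ is not called on unpickling, so rebuild the cache here.
        self.__dict__.update(state)
        self.cache = {}

original = Counter('hits')
original.count = 3
original.cache['expensive'] = 'value'

restored = pickle.loads(pickle.dumps(original, 2))
```

The restored instance has the same ``name`` and ``count`` and starts with an
empty ``cache``.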


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it.  One alternative is for the object to implement a
   :meth:`__reduce__` method.  If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal.  The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.  The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time.  The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object.  The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value.  Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment.  Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`.  If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's
     :attr:`~object.__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items.  These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``.  This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature.  (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``.  These items
     will be pickled and stored to the object using ``obj[key] = value``.  This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`.  This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`.  :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility).  The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.
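
The simplest tuple form of :meth:`__reduce__`, a callable plus its argument
tuple, might look like the sketch below.  The ``Interval`` class is a
hypothetical example and is assumed to be defined at the top level of an
importable module.

```python
import pickle

class Interval(object):
    def __init__(self, low, high):
        if low > high:
            raise ValueError('low must not exceed high')
        self.low = low
        self.high = high

    def __reduce__(self):
        # (callable, argument tuple): unpickling calls Interval(low, high),
        # so the check in __init__ runs again for the restored object.
        return (Interval, (self.low, self.high))

restored = pickle.loads(pickle.dumps(Interval(1, 5), 2))
```

Because the callable here is the class itself, it satisfies the requirement
that the first tuple element be a class or a registered safe constructor.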

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module.  This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types.  Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.
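
Registering a reduction function could be sketched as follows.  The
``Celsius`` class, the ``reduce_celsius`` function and the ``calls`` list are
hypothetical names.  The import fallback is an assumption added so the sketch
also runs on Python 3, where the module is named ``copyreg``.

```python
import pickle

try:
    import copy_reg             # the module name used in this documentation
except ImportError:
    import copyreg as copy_reg  # assumption: the Python 3 name of the module

class Celsius(object):
    def __init__(self, degrees):
        self.degrees = degrees

calls = []  # records each object passed to the reduction function

def reduce_celsius(obj):
    # Called with a single argument, the object to be pickled.
    calls.append(obj)
    return (Celsius, (obj.degrees,))

# Register the reduction function for the type.
copy_reg.pickle(Celsius, reduce_celsius)

restored = pickle.loads(pickle.dumps(Celsius(21.5), 2))
```

This approach is useful when the class itself cannot be modified to add a
:meth:`__reduce__` method.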


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream.  Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters.  The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`~Pickler.persistent_id` attribute of the pickler object and the
:attr:`~Unpickler.persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent id for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent id string is returned, the pickler will pickle that string,
along with a marker so that the unpickler will recognize the string as a
persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`~Unpickler.persistent_load` function that takes a persistent id string
and returns the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   # Objects with an ``x`` attribute are stored by persistent id only.
   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   # Resolve a persistent id back into an object of our choosing.
   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
attribute can also be set to a Python list, in which case, when the unpickler
reaches a persistent id, the persistent id string will simply be appended to
this list.  This functionality exists so that a pickle data stream can be
"sniffed" for object references without actually instantiating all the objects
in a pickle. [#]_  Setting :attr:`~Unpickler.persistent_load` to a list is
usually used in conjunction with the :meth:`~Unpickler.noload` method on the
Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id.  Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.

    650 
    651 .. _pickle-sub:
    652 
    653 Subclassing Unpicklers
    654 ----------------------
    655 
    656 .. index::
    657    single: load_global() (pickle protocol)
    658    single: find_global() (pickle protocol)
    659 
    660 By default, unpickling will import any class that it finds in the pickle data.
    661 You can control exactly what gets unpickled and what gets called by customizing
    662 your unpickler.  Unfortunately, exactly how you do this is different depending
    663 on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_
    664 
In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream, where
the first line will be the name of the module containing the class and the
second line will be the name of the instance's class.  It then looks up the
class, possibly importing the module and digging out the attribute, and appends
what it finds to the unpickler's stack.  Later on, this class will be assigned
to the :attr:`__class__` attribute of an empty instance, as a way of magically
creating an instance without calling its class's :meth:`__init__`.  Your job
(should you choose to accept it) would be to have :meth:`load_global` push onto
the unpickler's stack a known safe version of any class you deem safe to
unpickle.  It is up to you to produce such a class.  Alternatively, you could
raise an error if you want to disallow all unpickling of instances.  If this
sounds like a hack, you're right.  Refer to the source code to make this work.
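
As a rough illustration, the sketch below restricts unpickling to a whitelist by
overriding :meth:`find_class`, the helper that the pure-Python
:meth:`load_global` calls to look up a class.  This hook is not described above
and, like the rest of this section, relies on implementation details.  The
names ``SAFE_CLASSES``, ``RestrictedUnpickler`` and ``restricted_loads`` are
made up for the example::

   import io
   import pickle
   from decimal import Decimal
   from fractions import Fraction

   # Hypothetical whitelist of (module name, class name) pairs.
   SAFE_CLASSES = set([('fractions', 'Fraction')])

   class RestrictedUnpickler(pickle.Unpickler):
       def find_class(self, module, name):
           # Refuse any global that is not explicitly whitelisted.
           if (module, name) not in SAFE_CLASSES:
               raise pickle.UnpicklingError(
                   'global %s.%s is forbidden' % (module, name))
           return pickle.Unpickler.find_class(self, module, name)

   def restricted_loads(data):
       """Like pickle.loads(), but only whitelisted classes are allowed."""
       return RestrictedUnpickler(io.BytesIO(data)).load()

   # A whitelisted class is restored normally.
   ok = restricted_loads(pickle.dumps(Fraction(1, 3)))

   # Any other class is rejected before it can be imported.
   try:
       restricted_loads(pickle.dumps(Decimal('1.5')))
       blocked = False
   except pickle.UnpicklingError:
       blocked = True

Plain integers, strings and lists are restored without a class lookup, so the
whitelist does not affect them.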
    679 
    680 Things are a little cleaner with :mod:`cPickle`, but not by much. To control
    681 what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
    682 attribute to a function or ``None``.  If it is ``None`` then any attempts to
    683 unpickle instances will raise an :exc:`UnpicklingError`.  If it is a function,
    684 then it should accept a module name and a class name, and return the
    685 corresponding class object.  It is responsible for looking up the class and
    686 performing any necessary imports, and it may raise an error to prevent
    687 instances of the class from being unpickled.
    688 
    689 The moral of the story is that you should be really careful about the source of
    690 the strings your application unpickles.
    691 
    692 
    693 .. _pickle-example:
    694 
    695 Example
    696 -------
    697 
    698 For the simplest code, use the :func:`dump` and :func:`load` functions.  Note
    699 that a self-referencing list is pickled and restored correctly. ::
    700 
    701    import pickle
    702 
    703    data1 = {'a': [1, 2.0, 3, 4+6j],
    704             'b': ('string', u'Unicode string'),
    705             'c': None}
    706 
    707    selfref_list = [1, 2, 3]
    708    selfref_list.append(selfref_list)
    709 
    710    output = open('data.pkl', 'wb')
    711 
    712    # Pickle dictionary using protocol 0.
    713    pickle.dump(data1, output)
    714 
    715    # Pickle the list using the highest protocol available.
    716    pickle.dump(selfref_list, output, -1)
    717 
    718    output.close()
    719 
    720 The following example reads the resulting pickled data.  When reading a
    721 pickle-containing file, you should open the file in binary mode because you
    722 can't be sure if the ASCII or binary format was used. ::
    723 
    724    import pprint, pickle
    725 
    726    pkl_file = open('data.pkl', 'rb')
    727 
    728    data1 = pickle.load(pkl_file)
    729    pprint.pprint(data1)
    730 
    731    data2 = pickle.load(pkl_file)
    732    pprint.pprint(data2)
    733 
    734    pkl_file.close()
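
The claim about the self-referencing list can be checked without a file.  The
sketch below, which is not part of the examples above, round-trips the list
through :func:`dumps` and :func:`loads` and tests that the restored list
contains itself::

   import pickle

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   # Serialize to a string and back using the highest protocol available.
   restored = pickle.loads(pickle.dumps(selfref_list, -1))

   # The last element is the restored list itself, not a copy of it.
   is_selfref = restored[3] is restored
   first_items = restored[:3]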
    735 
    736 Here's a larger example that shows how to modify pickling behavior for a class.
    737 The :class:`TextReader` class opens a text file, and returns the line number and
    738 line contents each time its :meth:`!readline` method is called. If a
    739 :class:`TextReader` instance is pickled, all attributes *except* the file object
    740 member are saved. When the instance is unpickled, the file is reopened, and
    741 reading resumes from the last location. The :meth:`__setstate__` and
    742 :meth:`__getstate__` methods are used to implement this behavior. ::
    743 
    744    #!/usr/local/bin/python
    745 
    746    class TextReader:
    747        """Print and number lines in a text file."""
    748        def __init__(self, file):
    749            self.file = file
    750            self.fh = open(file)
    751            self.lineno = 0
    752 
    753        def readline(self):
    754            self.lineno = self.lineno + 1
    755            line = self.fh.readline()
    756            if not line:
    757                return None
    758            if line.endswith("\n"):
    759                line = line[:-1]
    760            return "%d: %s" % (self.lineno, line)
    761 
    762        def __getstate__(self):
    763            odict = self.__dict__.copy() # copy the dict since we change it
    764            del odict['fh']              # remove filehandle entry
    765            return odict
    766 
    767        def __setstate__(self, dict):
    768            fh = open(dict['file'])      # reopen file
    769            count = dict['lineno']       # read from file...
    770            while count:                 # until line count is restored
    771                fh.readline()
    772                count = count - 1
    773            self.__dict__.update(dict)   # update attributes
    774            self.fh = fh                 # save the file object
    775 
    776 A sample usage might be something like this::
    777 
    778    >>> import TextReader
    779    >>> obj = TextReader.TextReader("TextReader.py")
    780    >>> obj.readline()
    781    '1: #!/usr/local/bin/python'
    782    >>> obj.readline()
    783    '2: '
    784    >>> obj.readline()
    785    '3: class TextReader:'
    786    >>> import pickle
    787    >>> pickle.dump(obj, open('save.p', 'wb'))
    788 
If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing.  What follows can happen from either
the same process or a new process. ::
    792 
    793    >>> import pickle
    794    >>> reader = pickle.load(open('save.p', 'rb'))
    795    >>> reader.readline()
    796    '4:     """Print and number lines in a text file."""'
    797 
    798 
    799 .. seealso::
    800 
    801    Module :mod:`copy_reg`
    802       Pickle interface constructor registration for extension types.
    803 
    804    Module :mod:`shelve`
    805       Indexed databases of objects; uses :mod:`pickle`.
    806 
    807    Module :mod:`copy`
    808       Shallow and deep object copying.
    809 
    810    Module :mod:`marshal`
    811       High-performance serialization of built-in types.
    812 
    813 
    814 :mod:`cPickle` --- A faster :mod:`pickle`
    815 =========================================
    816 
    817 .. module:: cPickle
    818    :synopsis: Faster version of pickle, but not subclassable.
    819 .. moduleauthor:: Jim Fulton <jim (a] zope.com>
    820 .. sectionauthor:: Fred L. Drake, Jr. <fdrake (a] acm.org>
    821 
    822 
    823 .. index:: module: pickle
    824 
    825 The :mod:`cPickle` module supports serialization and de-serialization of Python
    826 objects, providing an interface and functionality nearly identical to the
    827 :mod:`pickle` module.  There are several differences, the most important being
    828 performance and subclassability.
    829 
    830 First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
    831 the former is implemented in C.  Second, in the :mod:`cPickle` module the
    832 callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
    833 This means that you cannot use them to derive custom pickling and unpickling
    834 subclasses.  Most applications have no need for this functionality and should
    835 benefit from the greatly improved performance of the :mod:`cPickle` module.
    836 
The :mod:`pickle` and :mod:`cPickle` modules use the same pickle data format,
so it is possible to use them interchangeably with existing pickles. [#]_
    840 
There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; for most applications, however, they are interchangeable.  More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.
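
Because the two modules are interchangeable for most purposes, a common idiom
(shown here as a sketch, not a requirement) is to prefer :mod:`cPickle` and
fall back to :mod:`pickle` when the C module is unavailable::

   try:
       import cPickle as pickle   # fast C implementation, if available
   except ImportError:
       import pickle              # fall back to the pickle module

   data = {'a': [1, 2.0, 3], 'b': ('string', None)}

   # Either module can write and read protocol 2 pickles.
   roundtrip = pickle.loads(pickle.dumps(data, 2))

Code that needs to subclass :class:`Pickler` or :class:`Unpickler` should
import :mod:`pickle` directly, since the :mod:`cPickle` callables cannot be
subclassed.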
    845 
    846 .. rubric:: Footnotes
    847 
.. [#] Don't confuse this with the :mod:`marshal` module.
    849 
    850 .. [#] In the :mod:`pickle` module these callables are classes, which you could
    851    subclass to customize the behavior.  However, in the :mod:`cPickle` module these
    852    callables are factory functions and so cannot be subclassed.  One common reason
    853    to subclass is to control what objects can actually be unpickled.  See section
    854    :ref:`pickle-sub` for more details.
    855 
.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts.  If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; a reference to it is pickled instead, and the
   :class:`Unpickler` will return the old value, not the modified one.  There are
   two problems here: (1) detecting changes, and (2) marshalling a minimal set of
   changes.  Garbage collection may also become a problem here.
    863 
    864 .. [#] The exception raised will likely be an :exc:`ImportError` or an
    865    :exc:`AttributeError` but it could be something else.
    866 
    867 .. [#] These methods can also be used to implement copying class instances.
    868 
    869 .. [#] This protocol is also used by the shallow and deep copying operations defined in
    870    the :mod:`copy` module.
    871 
    872 .. [#] The actual mechanism for associating these user defined functions is slightly
    873    different for :mod:`pickle` and :mod:`cPickle`.  The description given here
    874    works the same for both implementations.  Users of the :mod:`pickle` module
    875    could also use subclassing to effect the same results, overriding the
    876    :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
    877    classes.
    878 
    879 .. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
    880    in their living rooms.
    881 
    882 .. [#] A word of caution: the mechanisms described here use internal attributes and
    883    methods, which are subject to change in future versions of Python.  We intend to
    884    someday provide a common interface for controlling this behavior, which will
    885    work in either :mod:`pickle` or :mod:`cPickle`.
    886 
    887 .. [#] Since the pickle data format is actually a tiny stack-oriented programming
    888    language, and some freedom is taken in the encodings of certain objects, it is
    889    possible that the two modules produce different data streams for the same input
    890    objects.  However it is guaranteed that they will always be able to read each
    891    other's data streams.
    892 
    893