:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr (a] sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry (a] zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.
  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.
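
As a minimal sketch of the string-based functions above, the following
round-trips a small structure through :func:`dumps` and :func:`loads`, once with
protocol 0 and once with :const:`HIGHEST_PROTOCOL`. The sample data and the
variable names (``record``, ``shared``, ``text_pickle``, ``binary_pickle``) are
illustrative assumptions, not part of the module. The protocol is passed
explicitly because the default may differ between Python versions.

```python
import pickle

# Hypothetical sample data: the same dictionary is referenced twice.
record = {'name': 'example', 'values': [1, 2.0, 3]}
shared = [record, record]

# Protocol 0 is the ASCII protocol; HIGHEST_PROTOCOL selects the newest
# binary protocol the running interpreter supports.
text_pickle = pickle.dumps(shared, 0)
binary_pickle = pickle.dumps(shared, pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol on its own, so no protocol argument is needed.
from_text = pickle.loads(text_pickle)
from_binary = pickle.loads(binary_pickle)

# Both copies compare equal to the original structure.
assert from_text == shared
assert from_binary == shared

# Object sharing survives the round trip: both list slots refer to one dict.
assert from_binary[0] is from_binary[1]
```

The identity check at the end illustrates the object-sharing behaviour
described earlier: the dictionary is stored once and both references are
restored to the same object.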

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value.
      This method is useful when re-using picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary. So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

It is possible to make multiple calls to the :meth:`dump` method of the same
:class:`Pickler` instance. These must then be matched to the same number of
calls to the :meth:`load` method of the corresponding :class:`Unpickler`
instance. If the same object is pickled by multiple :meth:`dump` calls, the
:meth:`load` calls will all yield references to the same object. [#]_

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol`
  for details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in.
Neither the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked. If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* of positional
   arguments to be passed to the class constructor (:meth:`__init__` for
   example). Keyword arguments are not supported. The :meth:`__getinitargs__`
   method is called at pickle time; the tuple it returns is incorporated in the
   pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2. Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings). Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state.
   [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary. If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it. One alternative is for the object to implement a
   :meth:`__reduce__` method. If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal. The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.
   The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time. The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object. The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment. Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's
     :attr:`~object.__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items. These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature. (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``. These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`. This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility). The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled, is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`~Pickler.persistent_id` attribute of the pickler object and the
:attr:`~Unpickler.persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent id for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent id string is returned, the pickler will pickle that string,
along with a marker so that the unpickler will recognize the string as a
persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`~Unpickler.persistent_load` function that takes a persistent id string
and returns the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError, 'Invalid persistent id'

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
attribute can also be set to a Python list, in which case, when the unpickler
reaches a persistent id, the persistent id string will simply be appended to
this list. This functionality exists so that a pickle data stream can be
"sniffed" for object references without actually instantiating all the objects
in a pickle.
[#]_ Setting :attr:`~Unpickler.persistent_load` to a list is usually used in
conjunction with the :meth:`~Unpickler.noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second line
will be the name of the instance's class. It then looks up the class, possibly
importing the module and digging out the attribute, then it appends what it
finds to the unpickler's stack. Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`. Your job (should you
choose to accept it), would be to have :meth:`load_global` push onto the
unpickler's stack, a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class. Or you could raise an error if you
want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
attribute to a function or ``None``. If it is ``None`` then any attempts to
unpickle instances will raise an :exc:`UnpicklingError`. If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.
It is responsible for looking up the class and
performing any necessary imports, and it may raise an error to prevent
instances of the class from being unpickled.

The moral of the story is that you should be really careful about the source of
the strings your application unpickles.


.. _pickle-example:

Example
-------

For the simplest code, use the :func:`dump` and :func:`load` functions. Note
that a self-referencing list is pickled and restored correctly. ::

   import pickle

   data1 = {'a': [1, 2.0, 3, 4+6j],
            'b': ('string', u'Unicode string'),
            'c': None}

   selfref_list = [1, 2, 3]
   selfref_list.append(selfref_list)

   output = open('data.pkl', 'wb')

   # Pickle dictionary using protocol 0.
   pickle.dump(data1, output)

   # Pickle the list using the highest protocol available.
   pickle.dump(selfref_list, output, -1)

   output.close()

The following example reads the resulting pickled data. When reading a
pickle-containing file, you should open the file in binary mode because you
can't be sure if the ASCII or binary format was used. ::

   import pprint, pickle

   pkl_file = open('data.pkl', 'rb')

   data1 = pickle.load(pkl_file)
   pprint.pprint(data1)

   data2 = pickle.load(pkl_file)
   pprint.pprint(data2)

   pkl_file.close()

Here's a larger example that shows how to modify pickling behavior for a class.
The :class:`TextReader` class opens a text file, and returns the line number and
line contents each time its :meth:`!readline` method is called. If a
:class:`TextReader` instance is pickled, all attributes *except* the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The :meth:`__setstate__` and
:meth:`__getstate__` methods are used to implement this behavior.
::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing. What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4: """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.
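
The restriction idea from :ref:`pickle-sub` can be sketched in a few lines. This
is a minimal illustration under stated assumptions, not a vetted security
boundary. It assumes the pure-Python :class:`Unpickler`, in which
:meth:`load_global` delegates the actual lookup to a :meth:`find_class` method
that the text above does not cover. The names ``SAFE_GLOBALS``,
``RestrictedUnpickler`` and ``restricted_loads`` are hypothetical and chosen
only for this example.

```python
import io
import pickle
from decimal import Decimal
from fractions import Fraction

# Hypothetical whitelist: the only (module, name) pairs allowed to load.
SAFE_GLOBALS = frozenset([("decimal", "Decimal")])


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not listed in SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            # Defer to the stock lookup for whitelisted names only.
            return pickle.Unpickler.find_class(self, module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Unpickle a byte string, rejecting globals that are not whitelisted."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# A whitelisted class inside an ordinary list loads normally.
ok = restricted_loads(pickle.dumps([1, 2, Decimal("1.5")]))

# Anything else that needs a global lookup is rejected.
try:
    restricted_loads(pickle.dumps(Fraction(1, 2)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

Lists, dicts, integers and strings are rebuilt by dedicated opcodes and never
reach the lookup, so they load regardless of the whitelist. With
:mod:`cPickle`, the comparable hook is the :attr:`~Unpickler.find_global`
attribute described earlier.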


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim (a] zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake (a] acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module. There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C. Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses. Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
compatible, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications, they are interchangeable. More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior. However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed.
   One common reason
   to subclass is to control what objects can actually be unpickled. See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts. If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again; only a reference to it is pickled, and the :class:`Unpickler`
   will return the old value, not the modified one. There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes. Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`. The description given here
   works the same for both implementations. Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python. We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

..
[#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects. However it is guaranteed that they will always be able to read each
   other's data streams.