:mod:`pickle` --- Python object serialization
=============================================

.. index::
   single: persistence
   pair: persistent; objects
   pair: serializing; objects
   pair: marshalling; objects
   pair: flattening; objects
   pair: pickling; objects

.. module:: pickle
   :synopsis: Convert Python objects to streams of bytes and back.
.. sectionauthor:: Jim Kerr <jbkerr (a] sr.hp.com>.
.. sectionauthor:: Barry Warsaw <barry (a] zope.com>

The :mod:`pickle` module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. "Pickling" is the
process whereby a Python object hierarchy is converted into a byte stream, and
"unpickling" is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
"serialization", "marshalling," [#]_ or "flattening"; however, to avoid
confusion, the terms used here are "pickling" and "unpickling".

This documentation describes both the :mod:`pickle` module and the
:mod:`cPickle` module.

.. warning::

   The :mod:`pickle` module is not secure against erroneous or maliciously
   constructed data. Never unpickle data received from an untrusted or
   unauthenticated source.


Relationship to other Python modules
------------------------------------

The :mod:`pickle` module has an optimized cousin called the :mod:`cPickle`
module. As its name implies, :mod:`cPickle` is written in C, so it can be up to
1000 times faster than :mod:`pickle`. However, it does not support subclassing
of the :func:`Pickler` and :func:`Unpickler` classes, because in :mod:`cPickle`
these are functions, not classes. Most applications have no need for this
functionality, and can benefit from the improved performance of :mod:`cPickle`.
Other than that, the interfaces of the two modules are nearly identical; the
common interface is described in this manual and differences are pointed out
where necessary. In the following discussions, we use the term "pickle" to
collectively describe the :mod:`pickle` and :mod:`cPickle` modules.

The data streams the two modules produce are guaranteed to be interchangeable.

Python has a more primitive serialization module called :mod:`marshal`, but in
general :mod:`pickle` should always be the preferred way to serialize Python
objects. :mod:`marshal` exists primarily to support Python's :file:`.pyc`
files.

The :mod:`pickle` module differs from :mod:`marshal` in several significant ways:

* The :mod:`pickle` module keeps track of the objects it has already serialized,
  so that later references to the same object won't be serialized again.
  :mod:`marshal` doesn't do this.

  This has implications both for recursive objects and object sharing. Recursive
  objects are objects that contain references to themselves. These are not
  handled by marshal, and in fact, attempting to marshal recursive objects will
  crash your Python interpreter. Object sharing happens when there are multiple
  references to the same object in different places in the object hierarchy being
  serialized. :mod:`pickle` stores such objects only once, and ensures that all
  other references point to the master copy. Shared objects remain shared, which
  can be very important for mutable objects.

* :mod:`marshal` cannot be used to serialize user-defined classes and their
  instances. :mod:`pickle` can save and restore class instances transparently;
  however, the class definition must be importable and live in the same module as
  when the object was stored.

* The :mod:`marshal` serialization format is not guaranteed to be portable
  across Python versions.
  Because its primary job in life is to support
  :file:`.pyc` files, the Python implementers reserve the right to change the
  serialization format in non-backwards compatible ways should the need arise.
  The :mod:`pickle` serialization format is guaranteed to be backwards compatible
  across Python releases.

Note that serialization is a more primitive notion than persistence; although
:mod:`pickle` reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The :mod:`pickle` module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
:mod:`shelve` provides a simple interface to pickle and unpickle objects on
DBM-style database files.


Data stream format
------------------

.. index::
   single: XDR
   single: External Data Representation

The data format used by :mod:`pickle` is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can't represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.

By default, the :mod:`pickle` data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big
advantage of using printable ASCII (and of some other characteristics of
:mod:`pickle`'s representation) is that for debugging or recovery purposes it is
possible for a human to read the pickled file with a standard text editor.

There are currently 3 different protocols which can be used for pickling.

* Protocol version 0 is the original ASCII protocol and is backwards compatible
  with earlier versions of Python.

* Protocol version 1 is the old binary format which is also compatible with
  earlier versions of Python.

* Protocol version 2 was introduced in Python 2.3. It provides much more
  efficient pickling of :term:`new-style class`\es.

Refer to :pep:`307` for more information.

If a *protocol* is not specified, protocol 0 is used. If *protocol* is specified
as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol version
available will be used.

.. versionchanged:: 2.3
   Introduced the *protocol* parameter.

A binary format, which is slightly more efficient, can be chosen by specifying a
*protocol* version >= 1.


Usage
-----

To serialize an object hierarchy, you first create a pickler, then you call the
pickler's :meth:`dump` method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler's :meth:`load` method. The
:mod:`pickle` module provides the following constant:


.. data:: HIGHEST_PROTOCOL

   The highest protocol version available. This value can be passed as a
   *protocol* value.

   .. versionadded:: 2.3

.. note::

   Be sure to always open pickle files created with protocols >= 1 in binary mode.
   For the old ASCII-based pickle protocol 0 you can use either text mode or binary
   mode as long as you stay consistent.

   A pickle file written with protocol 0 in binary mode will contain lone linefeeds
   as line terminators and therefore will look "funny" when viewed in Notepad or
   other editors which do not support this format.

The :mod:`pickle` module provides the following functions to make the pickling
process more convenient:


.. function:: dump(obj, file[, protocol])

   Write a pickled representation of *obj* to the open file object *file*. This is
   equivalent to ``Pickler(file, protocol).dump(obj)``.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be a file object opened for writing, a :mod:`StringIO` object, or
   any other custom object that meets this interface.


.. function:: load(file)

   Read a string from the open file object *file* and interpret it as a pickle data
   stream, reconstructing and returning the original object hierarchy. This is
   equivalent to ``Unpickler(file).load()``.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   This function automatically determines whether the data stream was written in
   binary mode or not.


.. function:: dumps(obj[, protocol])

   Return the pickled representation of the object as a string, instead of writing
   it to a file.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest protocol
   version will be used.

   .. versionchanged:: 2.3
      The *protocol* parameter was added.


.. function:: loads(string)

   Read a pickled object hierarchy from a string. Characters in the string past
   the pickled object's representation are ignored.
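
As a minimal sketch of the :func:`dumps` and :func:`loads` functions described above, the following round-trips a small structure through each protocol. The names ``shared`` and ``record`` are made up for illustration, and the snippet sticks to built-in types so that it should run unchanged on Python 2.6+ and Python 3. It also checks two behaviors stated in this documentation: shared objects remain shared after unpickling, and data past the end of the pickle is ignored by :func:`loads`.

```python
import pickle

# Two keys refer to the *same* list object.
shared = [1, 2, 3]
record = {'first': shared, 'second': shared}

# Round-trip through every protocol described above, plus the highest one.
for protocol in (0, 1, 2, pickle.HIGHEST_PROTOCOL):
    data = pickle.dumps(record, protocol)
    restored = pickle.loads(data)
    assert restored == record
    # Object sharing survives the round trip.
    assert restored['first'] is restored['second']

# Bytes after the pickled object's representation are ignored by loads().
padded = pickle.dumps(record, 2) + b'trailing junk'
from_padded = pickle.loads(padded)
```

Passing ``-1`` instead of :const:`HIGHEST_PROTOCOL` selects the same protocol, as noted for the *protocol* parameter.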

The :mod:`pickle` module also defines three exceptions:


.. exception:: PickleError

   A common base class for the other exceptions defined below. This inherits from
   :exc:`Exception`.


.. exception:: PicklingError

   This exception is raised when an unpicklable object is passed to the
   :meth:`dump` method.


.. exception:: UnpicklingError

   This exception is raised when there is a problem unpickling an object. Note that
   other exceptions may also be raised during unpickling, including (but not
   necessarily limited to) :exc:`AttributeError`, :exc:`EOFError`,
   :exc:`ImportError`, and :exc:`IndexError`.

The :mod:`pickle` module also exports two callables [#]_, :class:`Pickler` and
:class:`Unpickler`:


.. class:: Pickler(file[, protocol])

   This takes a file-like object to which it will write a pickle data stream.

   If the *protocol* parameter is omitted, protocol 0 is used. If *protocol* is
   specified as a negative value or :const:`HIGHEST_PROTOCOL`, the highest
   protocol version will be used.

   .. versionchanged:: 2.3
      Introduced the *protocol* parameter.

   *file* must have a :meth:`write` method that accepts a single string argument.
   It can thus be an open file object, a :mod:`StringIO` object, or any other
   custom object that meets this interface.

   :class:`Pickler` objects define one (or two) public methods:


   .. method:: dump(obj)

      Write a pickled representation of *obj* to the open file object given in the
      constructor. Either the binary or ASCII format will be used, depending on the
      value of the *protocol* argument passed to the constructor.


   .. method:: clear_memo()

      Clears the pickler's "memo". The memo is the data structure that remembers
      which objects the pickler has already seen, so that shared or recursive objects
      are pickled by reference and not by value.
      This method is useful when re-using
      picklers.

      .. note::

         Prior to Python 2.3, :meth:`clear_memo` was only available on the picklers
         created by :mod:`cPickle`. In the :mod:`pickle` module, picklers have an
         instance variable called :attr:`memo` which is a Python dictionary. So to clear
         the memo for a :mod:`pickle` module pickler, you could do the following::

            mypickler.memo.clear()

         Code that does not need to support older versions of Python should simply use
         :meth:`clear_memo`.

   It is possible to make multiple calls to the :meth:`dump` method of the same
   :class:`Pickler` instance. These must then be matched to the same number of
   calls to the :meth:`load` method of the corresponding :class:`Unpickler`
   instance. If the same object is pickled by multiple :meth:`dump` calls, the
   :meth:`load` calls will all yield references to the same object. [#]_

:class:`Unpickler` objects are defined as:


.. class:: Unpickler(file)

   This takes a file-like object from which it will read a pickle data stream.
   This class automatically determines whether the data stream was written in
   binary mode or not, so it does not need a flag as in the :class:`Pickler`
   factory.

   *file* must have two methods, a :meth:`read` method that takes an integer
   argument, and a :meth:`readline` method that requires no arguments. Both
   methods should return a string. Thus *file* can be a file object opened for
   reading, a :mod:`StringIO` object, or any other custom object that meets this
   interface.

   :class:`Unpickler` objects have one (or two) public methods:


   .. method:: load()

      Read a pickled object representation from the open file object given in
      the constructor, and return the reconstituted object hierarchy specified
      therein.

      This method automatically determines whether the data stream was written
      in binary mode or not.


   .. method:: noload()

      This is just like :meth:`load` except that it doesn't actually create any
      objects. This is useful primarily for finding what's called "persistent
      ids" that may be referenced in a pickle data stream. See section
      :ref:`pickle-protocol` below for more details.

      **Note:** the :meth:`noload` method is currently only available on
      :class:`Unpickler` objects created with the :mod:`cPickle` module.
      :mod:`pickle` module :class:`Unpickler`\ s do not have the :meth:`noload`
      method.


What can be pickled and unpickled?
----------------------------------

The following types can be pickled:

* ``None``, ``True``, and ``False``

* integers, long integers, floating point numbers, complex numbers

* normal and Unicode strings

* tuples, lists, sets, and dictionaries containing only picklable objects

* functions defined at the top level of a module

* built-in functions defined at the top level of a module

* classes that are defined at the top level of a module

* instances of such classes whose :attr:`~object.__dict__` or the result of
  calling :meth:`__getstate__` is picklable (see section :ref:`pickle-protocol`
  for details).

Attempts to pickle unpicklable objects will raise the :exc:`PicklingError`
exception; when this happens, an unspecified number of bytes may have already
been written to the underlying file. Trying to pickle a highly recursive data
structure may exceed the maximum recursion depth; a :exc:`RuntimeError` will be
raised in this case. You can carefully raise this limit with
:func:`sys.setrecursionlimit`.

Note that functions (built-in and user-defined) are pickled by "fully qualified"
name reference, not by value. This means that only the function name is
pickled, along with the name of the module the function is defined in.
Neither
the function's code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [#]_

Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class's code or data is
pickled, so in the following example the class attribute ``attr`` is not
restored in the unpickling environment::

   class Foo:
       attr = 'a class attr'

   picklestring = pickle.dumps(Foo)

These restrictions are why picklable functions and classes must be defined in
the top level of a module.

Similarly, when class instances are pickled, their class's code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class's :meth:`__setstate__` method.


.. _pickle-protocol:

The pickle protocol
-------------------

.. currentmodule:: None

This section describes the "pickling protocol" that defines the interface
between the pickler/unpickler and the objects that are being serialized. This
protocol provides a standard way for you to define, customize, and control how
your objects are serialized and de-serialized. The description in this section
doesn't cover specific customizations that you can employ to make the unpickling
environment slightly safer from untrusted pickle data streams; see section
:ref:`pickle-sub` for more details.


.. _pickle-inst:

Pickling and unpickling normal class instances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__getinitargs__()

   When a pickled class instance is unpickled, its :meth:`__init__` method is
   normally *not* invoked. If it is desirable that the :meth:`__init__` method
   be called on unpickling, an old-style class can define a method
   :meth:`__getinitargs__`, which should return a *tuple* containing the
   arguments to be passed to the class constructor (:meth:`__init__` for
   example). The :meth:`__getinitargs__` method is called at pickle time; the
   tuple it returns is incorporated in the pickle for the instance.

.. method:: object.__getnewargs__()

   New-style types can provide a :meth:`__getnewargs__` method that is used for
   protocol 2. Implementing this method is needed if the type establishes some
   internal invariants when the instance is created, or if the memory allocation
   is affected by the values passed to the :meth:`__new__` method for the type
   (as it is for tuples and strings). Instances of a :term:`new-style class`
   ``C`` are created using ::

      obj = C.__new__(C, *args)

   where *args* is the result of calling :meth:`__getnewargs__` on the original
   object; if there is no :meth:`__getnewargs__`, an empty tuple is assumed.

.. method:: object.__getstate__()

   Classes can further influence how their instances are pickled; if the class
   defines the method :meth:`__getstate__`, it is called and the return state is
   pickled as the contents for the instance, instead of the contents of the
   instance's dictionary. If there is no :meth:`__getstate__` method, the
   instance's :attr:`~object.__dict__` is pickled.

.. method:: object.__setstate__(state)

   Upon unpickling, if the class also defines the method :meth:`__setstate__`,
   it is called with the unpickled state.
   [#]_ If there is no
   :meth:`__setstate__` method, the pickled state must be a dictionary and its
   items are assigned to the new instance's dictionary. If a class defines both
   :meth:`__getstate__` and :meth:`__setstate__`, the state object needn't be a
   dictionary and these methods can do what they want. [#]_

   .. note::

      For :term:`new-style class`\es, if :meth:`__getstate__` returns a false
      value, the :meth:`__setstate__` method will not be called.

.. note::

   At unpickling time, some methods like :meth:`__getattr__`,
   :meth:`__getattribute__`, or :meth:`__setattr__` may be called upon the
   instance. In case those methods rely on some internal invariant being
   true, the type should implement either :meth:`__getinitargs__` or
   :meth:`__getnewargs__` to establish such an invariant; otherwise, neither
   :meth:`__new__` nor :meth:`__init__` will be called.


Pickling and unpickling extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. method:: object.__reduce__()

   When the :class:`Pickler` encounters an object of a type it knows nothing
   about --- such as an extension type --- it looks in two places for a hint of
   how to pickle it. One alternative is for the object to implement a
   :meth:`__reduce__` method. If provided, at pickling time :meth:`__reduce__`
   will be called with no arguments, and it must return either a string or a
   tuple.

   If a string is returned, it names a global variable whose contents are
   pickled as normal. The string returned by :meth:`__reduce__` should be the
   object's local name relative to its module; the pickle module searches the
   module namespace to determine the object's module.

   When a tuple is returned, it must be between two and five elements long.
   Optional elements can either be omitted, or ``None`` can be provided as their
   value.
   The contents of this tuple are pickled as normal and used to
   reconstruct the object at unpickling time. The semantics of each element
   are:

   * A callable object that will be called to create the initial version of the
     object. The next element of the tuple will provide arguments for this
     callable, and later elements provide additional state information that will
     subsequently be used to fully reconstruct the pickled data.

     In the unpickling environment this object must be either a class, a
     callable registered as a "safe constructor" (see below), or it must have an
     attribute :attr:`__safe_for_unpickling__` with a true value. Otherwise, an
     :exc:`UnpicklingError` will be raised in the unpickling environment. Note
     that as usual, the callable itself is pickled by name.

   * A tuple of arguments for the callable object.

     .. versionchanged:: 2.5
        Formerly, this argument could also be ``None``.

   * Optionally, the object's state, which will be passed to the object's
     :meth:`__setstate__` method as described in section :ref:`pickle-inst`. If
     the object has no :meth:`__setstate__` method, then, as above, the value
     must be a dictionary and it will be added to the object's
     :attr:`~object.__dict__`.

   * Optionally, an iterator (and not a sequence) yielding successive list
     items. These list items will be pickled, and appended to the object using
     either ``obj.append(item)`` or ``obj.extend(list_of_items)``. This is
     primarily used for list subclasses, but may be used by other classes as
     long as they have :meth:`append` and :meth:`extend` methods with the
     appropriate signature. (Whether :meth:`append` or :meth:`extend` is used
     depends on which pickle protocol version is used as well as the number of
     items to append, so both must be supported.)

   * Optionally, an iterator (not a sequence) yielding successive dictionary
     items, which should be tuples of the form ``(key, value)``. These items
     will be pickled and stored to the object using ``obj[key] = value``. This
     is primarily used for dictionary subclasses, but may be used by other
     classes as long as they implement :meth:`__setitem__`.

.. method:: object.__reduce_ex__(protocol)

   It is sometimes useful to know the protocol version when implementing
   :meth:`__reduce__`. This can be done by implementing a method named
   :meth:`__reduce_ex__` instead of :meth:`__reduce__`. :meth:`__reduce_ex__`,
   when it exists, is called in preference over :meth:`__reduce__` (you may
   still provide :meth:`__reduce__` for backwards compatibility). The
   :meth:`__reduce_ex__` method will be called with a single integer argument,
   the protocol version.

   The :class:`object` class implements both :meth:`__reduce__` and
   :meth:`__reduce_ex__`; however, if a subclass overrides :meth:`__reduce__`
   but not :meth:`__reduce_ex__`, the :meth:`__reduce_ex__` implementation
   detects this and calls :meth:`__reduce__`.

An alternative to implementing a :meth:`__reduce__` method on the object to be
pickled is to register the callable with the :mod:`copy_reg` module. This
module provides a way for programs to register "reduction functions" and
constructors for user-defined types. Reduction functions have the same
semantics and interface as the :meth:`__reduce__` method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a "safe constructor" for purposes of
unpickling as described above.


Pickling and unpickling external objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: persistent_id (pickle protocol)
   single: persistent_load (pickle protocol)

For the benefit of object persistence, the :mod:`pickle` module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a "persistent id", which is just an arbitrary string
of printable ASCII characters. The resolution of such names is not defined by
the :mod:`pickle` module; it will delegate this resolution to user defined
functions on the pickler and unpickler. [#]_

To define external persistent id resolution, you need to set the
:attr:`~Pickler.persistent_id` attribute of the pickler object and the
:attr:`~Unpickler.persistent_load` attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler must have a
custom :func:`~Pickler.persistent_id` method that takes an object as an
argument and returns either ``None`` or the persistent id for that object.
When ``None`` is returned, the pickler simply pickles the object as normal.
When a persistent id string is returned, the pickler will pickle that string,
along with a marker so that the unpickler will recognize the string as a
persistent id.

To unpickle external objects, the unpickler must have a custom
:func:`~Unpickler.persistent_load` function that takes a persistent id string
and returns the referenced object.

Here's a silly example that *might* shed more light::

   import pickle
   from cStringIO import StringIO

   src = StringIO()
   p = pickle.Pickler(src)

   def persistent_id(obj):
       if hasattr(obj, 'x'):
           return 'the value %d' % obj.x
       else:
           return None

   p.persistent_id = persistent_id

   class Integer:
       def __init__(self, x):
           self.x = x
       def __str__(self):
           return 'My name is integer %d' % self.x

   i = Integer(7)
   print i
   p.dump(i)

   datastream = src.getvalue()
   print repr(datastream)
   dst = StringIO(datastream)

   up = pickle.Unpickler(dst)

   class FancyInteger(Integer):
       def __str__(self):
           return 'I am the integer %d' % self.x

   def persistent_load(persid):
       if persid.startswith('the value '):
           value = int(persid.split()[2])
           return FancyInteger(value)
       else:
           raise pickle.UnpicklingError('Invalid persistent id')

   up.persistent_load = persistent_load

   j = up.load()
   print j

In the :mod:`cPickle` module, the unpickler's :attr:`~Unpickler.persistent_load`
attribute can also be set to a Python list, in which case, when the unpickler
reaches a persistent id, the persistent id string will simply be appended to
this list. This functionality exists so that a pickle data stream can be
"sniffed" for object references without actually instantiating all the objects
in a pickle.
[#]_ Setting :attr:`~Unpickler.persistent_load` to a list is usually used in
conjunction with the :meth:`~Unpickler.noload` method on the Unpickler.

.. BAW: Both pickle and cPickle support something called inst_persistent_id()
   which appears to give unknown types a second shot at producing a persistent
   id. Since Jim Fulton can't remember why it was added or what it's for, I'm
   leaving it undocumented.


.. _pickle-sub:

Subclassing Unpicklers
----------------------

.. index::
   single: load_global() (pickle protocol)
   single: find_global() (pickle protocol)

By default, unpickling will import any class that it finds in the pickle data.
You can control exactly what gets unpickled and what gets called by customizing
your unpickler. Unfortunately, exactly how you do this is different depending
on whether you're using :mod:`pickle` or :mod:`cPickle`. [#]_

In the :mod:`pickle` module, you need to derive a subclass from
:class:`Unpickler`, overriding the :meth:`load_global` method.
:meth:`load_global` should read two lines from the pickle data stream where the
first line will be the name of the module containing the class and the second line
will be the name of the instance's class. It then looks up the class, possibly
importing the module and digging out the attribute, then it appends what it
finds to the unpickler's stack. Later on, this class will be assigned to the
:attr:`__class__` attribute of an empty class, as a way of magically creating an
instance without calling its class's :meth:`__init__`. Your job (should you
choose to accept it), would be to have :meth:`load_global` push onto the
unpickler's stack, a known safe version of any class you deem safe to unpickle.
It is up to you to produce such a class. Or you could raise an error if you
want to disallow all unpickling of instances. If this sounds like a hack,
you're right. Refer to the source code to make this work.

Things are a little cleaner with :mod:`cPickle`, but not by much. To control
what gets unpickled, you can set the unpickler's :attr:`~Unpickler.find_global`
attribute to a function or ``None``. If it is ``None`` then any attempts to
unpickle instances will raise an :exc:`UnpicklingError`. If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.
It is responsible for looking up the class and 686 performing any necessary imports, and it may raise an error to prevent 687 instances of the class from being unpickled. 688 689 The moral of the story is that you should be really careful about the source of 690 the strings your application unpickles. 691 692 693 .. _pickle-example: 694 695 Example 696 ------- 697 698 For the simplest code, use the :func:`dump` and :func:`load` functions. Note 699 that a self-referencing list is pickled and restored correctly. :: 700 701 import pickle 702 703 data1 = {'a': [1, 2.0, 3, 4+6j], 704 'b': ('string', u'Unicode string'), 705 'c': None} 706 707 selfref_list = [1, 2, 3] 708 selfref_list.append(selfref_list) 709 710 output = open('data.pkl', 'wb') 711 712 # Pickle dictionary using protocol 0. 713 pickle.dump(data1, output) 714 715 # Pickle the list using the highest protocol available. 716 pickle.dump(selfref_list, output, -1) 717 718 output.close() 719 720 The following example reads the resulting pickled data. When reading a 721 pickle-containing file, you should open the file in binary mode because you 722 can't be sure if the ASCII or binary format was used. :: 723 724 import pprint, pickle 725 726 pkl_file = open('data.pkl', 'rb') 727 728 data1 = pickle.load(pkl_file) 729 pprint.pprint(data1) 730 731 data2 = pickle.load(pkl_file) 732 pprint.pprint(data2) 733 734 pkl_file.close() 735 736 Here's a larger example that shows how to modify pickling behavior for a class. 737 The :class:`TextReader` class opens a text file, and returns the line number and 738 line contents each time its :meth:`!readline` method is called. If a 739 :class:`TextReader` instance is pickled, all attributes *except* the file object 740 member are saved. When the instance is unpickled, the file is reopened, and 741 reading resumes from the last location. The :meth:`__setstate__` and 742 :meth:`__getstate__` methods are used to implement this behavior. 
::

   #!/usr/local/bin/python

   class TextReader:
       """Print and number lines in a text file."""
       def __init__(self, file):
           self.file = file
           self.fh = open(file)
           self.lineno = 0

       def readline(self):
           self.lineno = self.lineno + 1
           line = self.fh.readline()
           if not line:
               return None
           if line.endswith("\n"):
               line = line[:-1]
           return "%d: %s" % (self.lineno, line)

       def __getstate__(self):
           odict = self.__dict__.copy() # copy the dict since we change it
           del odict['fh']              # remove filehandle entry
           return odict

       def __setstate__(self, dict):
           fh = open(dict['file'])      # reopen file
           count = dict['lineno']       # read from file...
           while count:                 # until line count is restored
               fh.readline()
               count = count - 1
           self.__dict__.update(dict)   # update attributes
           self.fh = fh                 # save the file object

A sample usage might be something like this::

   >>> import TextReader
   >>> obj = TextReader.TextReader("TextReader.py")
   >>> obj.readline()
   '1: #!/usr/local/bin/python'
   >>> obj.readline()
   '2: '
   >>> obj.readline()
   '3: class TextReader:'
   >>> import pickle
   >>> pickle.dump(obj, open('save.p', 'wb'))

If you want to see that :mod:`pickle` works across Python processes, start
another Python session before continuing.  What follows can happen from either
the same process or a new process. ::

   >>> import pickle
   >>> reader = pickle.load(open('save.p', 'rb'))
   >>> reader.readline()
   '4:     """Print and number lines in a text file."""'


.. seealso::

   Module :mod:`copy_reg`
      Pickle interface constructor registration for extension types.

   Module :mod:`shelve`
      Indexed databases of objects; uses :mod:`pickle`.

   Module :mod:`copy`
      Shallow and deep object copying.

   Module :mod:`marshal`
      High-performance serialization of built-in types.
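
The note at the start of this section, that a self-referencing list is pickled
and restored correctly, can be checked without touching the file system.  The
following is a minimal sketch rather than part of the original examples: it
assumes the in-memory :func:`dumps` and :func:`loads` functions in place of
:func:`dump` and :func:`load`, and the ``shared`` and ``container`` names are
made up for illustration.

```python
import pickle

# A list that contains a reference to itself, as in the example above.
selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# Hypothetical data: one list referenced from two places in a dictionary.
shared = ['shared']
container = {'first': shared, 'second': shared}

# Round-trip both structures in memory using the highest protocol available.
restored_list = pickle.loads(pickle.dumps(selfref_list, -1))
restored = pickle.loads(pickle.dumps(container, -1))

# The recursion survives: the last element is the restored list itself.
print(restored_list[3] is restored_list)

# Sharing survives too: both keys refer to one and the same list object.
print(restored['first'] is restored['second'])
```

Both checks use identity (``is``) rather than equality, because the point is
that the restored structure keeps a single object where the original had one,
not merely equal copies.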


:mod:`cPickle` --- A faster :mod:`pickle`
=========================================

.. module:: cPickle
   :synopsis: Faster version of pickle, but not subclassable.
.. moduleauthor:: Jim Fulton <jim (a] zope.com>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake (a] acm.org>


.. index:: module: pickle

The :mod:`cPickle` module supports serialization and de-serialization of Python
objects, providing an interface and functionality nearly identical to the
:mod:`pickle` module.  There are several differences, the most important being
performance and subclassability.

First, :mod:`cPickle` can be up to 1000 times faster than :mod:`pickle` because
the former is implemented in C.  Second, in the :mod:`cPickle` module the
callables :func:`Pickler` and :func:`Unpickler` are functions, not classes.
This means that you cannot use them to derive custom pickling and unpickling
subclasses.  Most applications have no need for this functionality and should
benefit from the greatly improved performance of the :mod:`cPickle` module.

The pickle data streams produced by :mod:`pickle` and :mod:`cPickle` are
compatible, so it is possible to use :mod:`pickle` and :mod:`cPickle`
interchangeably with existing pickles. [#]_

There are additional minor differences in API between :mod:`cPickle` and
:mod:`pickle`; however, for most applications they are interchangeable.  More
documentation is provided in the :mod:`pickle` module documentation, which
includes a list of the documented differences.

.. rubric:: Footnotes

.. [#] Don't confuse this with the :mod:`marshal` module.

.. [#] In the :mod:`pickle` module these callables are classes, which you could
   subclass to customize the behavior.  However, in the :mod:`cPickle` module these
   callables are factory functions and so cannot be subclassed.
   One common reason
   to subclass is to control what objects can actually be unpickled.  See section
   :ref:`pickle-sub` for more details.

.. [#] *Warning*: this is intended for pickling multiple objects without intervening
   modifications to the objects or their parts.  If you modify an object and then
   pickle it again using the same :class:`Pickler` instance, the object is not
   pickled again --- a reference to it is pickled and the :class:`Unpickler` will
   return the old value, not the modified one.  There are two problems here: (1)
   detecting changes, and (2) marshalling a minimal set of changes.  Garbage
   collection may also become a problem here.

.. [#] The exception raised will likely be an :exc:`ImportError` or an
   :exc:`AttributeError` but it could be something else.

.. [#] These methods can also be used to implement copying class instances.

.. [#] This protocol is also used by the shallow and deep copying operations defined in
   the :mod:`copy` module.

.. [#] The actual mechanism for associating these user-defined functions is slightly
   different for :mod:`pickle` and :mod:`cPickle`.  The description given here
   works the same for both implementations.  Users of the :mod:`pickle` module
   could also use subclassing to effect the same results, overriding the
   :meth:`persistent_id` and :meth:`persistent_load` methods in the derived
   classes.

.. [#] We'll leave you with the image of Guido and Jim sitting around sniffing pickles
   in their living rooms.

.. [#] A word of caution: the mechanisms described here use internal attributes and
   methods, which are subject to change in future versions of Python.  We intend to
   someday provide a common interface for controlling this behavior, which will
   work in either :mod:`pickle` or :mod:`cPickle`.

.. [#] Since the pickle data format is actually a tiny stack-oriented programming
   language, and some freedom is taken in the encodings of certain objects, it is
   possible that the two modules produce different data streams for the same input
   objects.  However, it is guaranteed that they will always be able to read each
   other's data streams.