This document describes some caveats about the use of Valgrind with
Python.  Valgrind is used periodically by Python developers to try
to ensure there are no memory leaks or invalid memory reads/writes.

UPDATE: Python 3.6 and later support the PYTHONMALLOC=malloc environment
variable, which can be used to force the use of the C library's malloc()
allocator.
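
As a minimal sketch of setting that variable for a child process only,
the helper below copies an environment and adds PYTHONMALLOC=malloc.
The name env_with_system_malloc is hypothetical; only the PYTHONMALLOC
variable itself comes from Python.

```python
import os


def env_with_system_malloc(base_env=None):
    """Return a copy of an environment with PYTHONMALLOC=malloc set.

    env_with_system_malloc is a hypothetical helper name used only for
    illustration.  The caller's mapping is copied, never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = "malloc"
    return env
```

The returned mapping can be passed as the env argument of
subprocess.run() when launching Python under Valgrind, so the setting
does not leak into the parent process.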

If you don't want to read about the details of using Valgrind, there
are still two things you must do to suppress the warnings.  First,
you must use a suppressions file.  One is supplied in
Misc/valgrind-python.supp.  Second, you must do one of the following:

  * Uncomment Py_USING_MEMORY_DEBUGGER in Objects/obmalloc.c,
    then rebuild Python
  * Uncomment the lines in Misc/valgrind-python.supp that
    suppress the warnings for PyObject_Free and PyObject_Realloc
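
A rough sketch of passing the suppressions file to Valgrind follows.
The function name valgrind_command and its parameters are hypothetical,
and ./python is assumed to be an interpreter built in the source tree;
--tool=memcheck and --suppressions=FILE are standard Valgrind options.

```python
def valgrind_command(python="./python", script_args=(),
                     suppressions="Misc/valgrind-python.supp"):
    """Build an argv list for running Python under Valgrind's memcheck.

    valgrind_command and its parameter names are hypothetical; adjust
    the interpreter path and suppressions path to match your checkout.
    """
    return [
        "valgrind",
        "--tool=memcheck",
        f"--suppressions={suppressions}",
        python,
        *script_args,
    ]
```

For example, valgrind_command(script_args=["-m", "test", "test_dict"])
produces a list that can be handed to subprocess.run(); the test module
named here is only an example.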

If you want to use Valgrind more effectively and catch even more
memory leaks, you will need to configure python --without-pymalloc.
PyMalloc obtains a few large blocks from malloc, and most object
allocations don't call malloc at all; they use chunks doled out by
PyMalloc from those large blocks.  This means Valgrind can't detect
many allocations (and frees), except for those that are forwarded
to the system malloc.  Note: configuring python --without-pymalloc
makes Python run much slower, especially when running under Valgrind.
You may need to run the tests in batches under Valgrind to keep
the memory usage down to allow the tests to complete.  It seems to take
about 5 times longer to run --without-pymalloc.
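
One way to split a test run into batches is sketched below.  The
helper name batches and the batch size are assumptions made for
illustration; a suitable size depends on how much memory is available.

```python
def batches(test_names, size):
    """Yield consecutive lists of at most `size` test names.

    `batches` is a hypothetical helper; each yielded list is meant to
    be run in its own Valgrind session to bound memory usage.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    names = list(test_names)
    for start in range(0, len(names), size):
        yield names[start:start + size]
```

Each batch can then be run as a separate Valgrind invocation, so that
memory used by one batch is released before the next one starts.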

Apr 15, 2006:
  test_ctypes causes Valgrind 3.1.1 to fail (crash).
  test_socket_ssl should be skipped when running under Valgrind.
    The reason is that it purposely uses uninitialized memory.
    This causes many spurious warnings, so it's easier to just skip it.


Details:
--------
Python uses its own small-object allocation scheme on top of malloc,
called PyMalloc.

Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default.  You can disable
PyMalloc when configuring python by adding the --without-pymalloc option.
If you disable PyMalloc, most of the information in this document and
the supplied suppressions file will not be useful.  As discussed above,
disabling PyMalloc can catch more problems.

If you use Valgrind on a default build of Python, you will see
many errors like:

        ==6399== Use of uninitialised value of size 4
        ==6399== at 0x4A9BDE7E: PyObject_Free (obmalloc.c:711)
        ==6399== by 0x4A9B8198: dictresize (dictobject.c:477)

These are expected and not a problem.  Tim Peters explains
the situation:

        PyMalloc needs to know whether an arbitrary address is one
        that's managed by it, or is managed by the system malloc.
        The current scheme allows this to be determined in constant
        time, regardless of how many memory areas are under pymalloc's
        control.

        The memory pymalloc manages itself is in one or more "arenas",
        each a large contiguous memory area obtained from malloc.
        The base address of each arena is saved by pymalloc
        in a vector.  Each arena is carved into "pools", and a field at
        the start of each pool contains the index of that pool's arena's
        base address in that vector.

        Given an arbitrary address, pymalloc computes the pool base
        address corresponding to it, then looks at "the index" stored
        near there.  If the index read up is out of bounds for the
        vector of arena base addresses pymalloc maintains, then
        pymalloc knows for certain that this address is not under
        pymalloc's control.  Otherwise the index is in bounds, and
        pymalloc compares

            the arena base address stored at that index in the vector

        to

            the arbitrary address pymalloc is investigating

        pymalloc controls this arbitrary address if and only if it lies
        in the arena the address's pool's index claims it lies in.

        It doesn't matter whether the memory pymalloc reads up ("the
        index") is initialized.  If it's not initialized, then
        whatever trash gets read up will lead pymalloc to conclude
        (correctly) that the address isn't controlled by it, either
        because the index is out of bounds, or the index is in bounds
        but the arena it represents doesn't contain the address.

        This determination has to be made on every call to one of
        pymalloc's free/realloc entry points, so its speed is critical
        (Python allocates and frees dynamic memory at a ferocious rate
        -- everything in Python, from integers to "stack frames",
        lives in the heap).
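
The check described above can be modelled in a few lines of Python.
This is a simplified simulation, not the code in Objects/obmalloc.c:
the names address_in_range, POOL_SIZE, ARENA_SIZE and pool_headers are
chosen for illustration, the sizes are assumed values, and arena base
addresses are assumed to be aligned to the pool size.

```python
POOL_SIZE = 4 * 1024      # assumed pool size, must be a power of two
ARENA_SIZE = 256 * 1024   # assumed arena size


def address_in_range(addr, arena_bases, pool_headers):
    """Return True if `addr` lies in an arena listed in `arena_bases`.

    `pool_headers` stands in for raw memory: it maps a pool base
    address to whatever integer is stored in that pool's index field.
    For memory pymalloc does not manage, that value is arbitrary
    garbage, which is exactly the read Valgrind complains about.
    """
    # Round the address down to the start of its (possible) pool.
    pool_base = addr & ~(POOL_SIZE - 1)
    index = pool_headers[pool_base]   # may be uninitialised garbage

    # Garbage that is out of bounds cannot name one of our arenas.
    if not 0 <= index < len(arena_bases):
        return False

    # In bounds: accept only if the arena at that index holds `addr`.
    base = arena_bases[index]
    return base <= addr < base + ARENA_SIZE
```

Whatever value is read from the index field, the function returns True
only for addresses inside a known arena, which is why the
uninitialised read is harmless.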