      1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
      2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
      3           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
      4 
      5 
      6 <chapter id="mc-manual" xreflabel="Memcheck: a memory error detector">
      7 <title>Memcheck: a memory error detector</title>
      8 
      9 <para>To use this tool, you may specify <option>--tool=memcheck</option>
     10 on the Valgrind command line.  You don't have to, though, since Memcheck
     11 is the default tool.</para>
     12 
     13 
     14 <sect1 id="mc-manual.overview" xreflabel="Overview">
     15 <title>Overview</title>
     16 
     17 <para>Memcheck is a memory error detector.  It can detect the following
     18 problems that are common in C and C++ programs.</para>
     19 
     20 <itemizedlist>
     21   <listitem>
     22     <para>Accessing memory you shouldn't, e.g. overrunning and underrunning
     23     heap blocks, overrunning the top of the stack, and accessing memory after
     24     it has been freed.</para>
     25   </listitem>
     26 
     27   <listitem>
     28     <para>Using undefined values, i.e. values that have not been initialised,
     29     or that have been derived from other undefined values.</para>
     30   </listitem>
     31 
     32   <listitem>
     33     <para>Incorrect freeing of heap memory, such as double-freeing heap
     34     blocks, or mismatched use of
     35     <function>malloc</function>/<computeroutput>new</computeroutput>/<computeroutput>new[]</computeroutput>
     36     versus
    <function>free</function>/<computeroutput>delete</computeroutput>/<computeroutput>delete[]</computeroutput>.</para>
     38   </listitem>
     39 
     40   <listitem>
     41     <para>Overlapping <computeroutput>src</computeroutput> and
     42     <computeroutput>dst</computeroutput> pointers in
     43     <computeroutput>memcpy</computeroutput> and related
     44     functions.</para>
     45   </listitem>
     46 
     47   <listitem>
     48     <para>Memory leaks.</para>
     49   </listitem>
     50 </itemizedlist>
     51 
     52 <para>Problems like these can be difficult to find by other means,
     53 often remaining undetected for long periods, then causing occasional,
     54 difficult-to-diagnose crashes.</para>
     55 
     56 </sect1>
     57 
     58 
     59 
     60 <sect1 id="mc-manual.errormsgs"
     61        xreflabel="Explanation of error messages from Memcheck">
     62 <title>Explanation of error messages from Memcheck</title>
     63 
     64 <para>Memcheck issues a range of error messages.  This section presents a
     65 quick summary of what error messages mean.  The precise behaviour of the
     66 error-checking machinery is described in <xref
     67 linkend="mc-manual.machine"/>.</para>
     68 
     69 
     70 <sect2 id="mc-manual.badrw" 
     71        xreflabel="Illegal read / Illegal write errors">
     72 <title>Illegal read / Illegal write errors</title>
     73 
     74 <para>For example:</para>
     75 <programlisting><![CDATA[
     76 Invalid read of size 4
     77    at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
     78    by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
     79    by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
     80    by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
     81  Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
     82 ]]></programlisting>
     83 
     84 <para>This happens when your program reads or writes memory at a place
     85 which Memcheck reckons it shouldn't.  In this example, the program did a
     86 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied
     87 library libpng.so.2.1.0.9, which was called from somewhere else in the
     88 same library, called from line 326 of <filename>qpngio.cpp</filename>,
     89 and so on.</para>
     90 
     91 <para>Memcheck tries to establish what the illegal address might relate
     92 to, since that's often useful.  So, if it points into a block of memory
     93 which has already been freed, you'll be informed of this, and also where
     94 the block was freed.  Likewise, if it should turn out to be just off
the end of a heap block, a common result of off-by-one errors in
array subscripting, you'll be informed of this fact, and also where the
block was allocated.  If you use the <option><xref
linkend="opt.read-var-info"/></option> option, Memcheck will run more slowly
but may give a more detailed description of any illegal address.</para>
    100 
    101 <para>In this example, Memcheck can't identify the address.  Actually
    102 the address is on the stack, but, for some reason, this is not a valid
    103 stack address -- it is below the stack pointer and that isn't allowed.
    104 In this particular case it's probably caused by GCC generating invalid
    105 code, a known bug in some ancient versions of GCC.</para>
    106 
    107 <para>Note that Memcheck only tells you that your program is about to
    108 access memory at an illegal address.  It can't stop the access from
    109 happening.  So, if your program makes an access which normally would
result in a segmentation fault, your program will still suffer the same
    111 fate -- but you will get a message from Memcheck immediately prior to
    112 this.  In this particular example, reading junk on the stack is
    113 non-fatal, and the program stays alive.</para>
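
<para>As a hedged illustration of the off-by-one case mentioned above, the
following sketch contrasts a loop that reads one element past the end of a
heap block with a corrected version.  The names
<function>make_block</function>, <function>sum_buggy</function> and
<function>sum_fixed</function> are hypothetical and are not part of Memcheck
or its test suite.  Running a caller of <function>sum_buggy</function> under
Memcheck would be expected to produce an invalid read report for an address
just past the end of the block, together with the place where the block was
allocated.</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical helper: allocate n ints on the heap, set to 1..n. */
int *make_block(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a != NULL)
        for (size_t i = 0; i < n; i++)
            a[i] = (int)(i + 1);
    return a;
}

/* Buggy: "<=" makes the last iteration read a[n], just past the block. */
int sum_buggy(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i <= n; i++)
        total += a[i];
    return total;
}

/* Fixed: "<" keeps every access inside the block. */
int sum_fixed(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```
]]></programlisting>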
    114 
    115 </sect2>
    116 
    117 
    118 
    119 <sect2 id="mc-manual.uninitvals" 
    120        xreflabel="Use of uninitialised values">
    121 <title>Use of uninitialised values</title>
    122 
    123 <para>For example:</para>
    124 <programlisting><![CDATA[
    125 Conditional jump or move depends on uninitialised value(s)
    126    at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
    127    by 0x402E8476: _IO_printf (printf.c:36)
    128    by 0x8048472: main (tests/manuel1.c:8)
    129 ]]></programlisting>
    130 
    131 <para>An uninitialised-value use error is reported when your program
    132 uses a value which hasn't been initialised -- in other words, is
    133 undefined.  Here, the undefined value is used somewhere inside the
    134 <function>printf</function> machinery of the C library.  This error was
    135 reported when running the following small program:</para>
    136 <programlisting><![CDATA[
#include <stdio.h>

int main(void)
{
  int x;                  /* deliberately left uninitialised */
  printf ("x = %d\n", x); /* Memcheck complains when printf examines x */
  return 0;
}]]></programlisting>
    142 
    143 <para>It is important to understand that your program can copy around
    144 junk (uninitialised) data as much as it likes.  Memcheck observes this
    145 and keeps track of the data, but does not complain.  A complaint is
    146 issued only when your program attempts to make use of uninitialised
    147 data in a way that might affect your program's externally-visible behaviour.
    148 In this example, <varname>x</varname> is uninitialised.  Memcheck observes
    149 the value being passed to <function>_IO_printf</function> and thence to
    150 <function>_IO_vfprintf</function>, but makes no comment.  However,
    151 <function>_IO_vfprintf</function> has to examine the value of
    152 <varname>x</varname> so it can turn it into the corresponding ASCII string,
    153 and it is at this point that Memcheck complains.</para>
    154 
    155 <para>Sources of uninitialised data tend to be:</para>
    156 <itemizedlist>
    157   <listitem>
    158     <para>Local variables in procedures which have not been initialised,
    159     as in the example above.</para>
    160   </listitem>
    161   <listitem>
    162     <para>The contents of heap blocks (allocated with
    163     <function>malloc</function>, <function>new</function>, or a similar
    164     function) before you (or a constructor) write something there.
    165     </para>
    166   </listitem>
    167 </itemizedlist>
    168 
    169 <para>To see information on the sources of uninitialised data in your
    170 program, use the <option>--track-origins=yes</option> option.  This
    171 makes Memcheck run more slowly, but can make it much easier to track down
    172 the root causes of uninitialised value errors.</para>
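
<para>A minimal sketch of the second source listed above, heap memory that is
read before it is written, together with one way to avoid it.  The struct and
function names are hypothetical.  Merely computing with the junk field is not
reported; a complaint would be expected only once a caller branches on, or
prints, the result of <function>next_value</function> for a block made by
<function>new_counter_buggy</function>.  With
<option>--track-origins=yes</option>, the report would be expected to point
back at the <function>malloc</function> call.</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>

struct counter {
    int value;
    int step;
};

/* Buggy: malloc does not initialise memory, and step is never set. */
struct counter *new_counter_buggy(void)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

/* Fixed: calloc zero-fills the block, then step is set explicitly. */
struct counter *new_counter_fixed(void)
{
    struct counter *c = calloc(1, sizeof *c);
    if (c != NULL)
        c->step = 1;
    return c;
}

/* Reads both fields; with the buggy constructor the result depends on junk. */
int next_value(const struct counter *c)
{
    return c->value + c->step;
}
```
]]></programlisting>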
    173 
    174 </sect2>
    175 
    176 
    177 
    178 <sect2 id="mc-manual.bad-syscall-args" 
    179        xreflabel="Use of uninitialised or unaddressable values in system
    180        calls">
    181 <title>Use of uninitialised or unaddressable values in system
    182        calls</title>
    183 
    184 <para>Memcheck checks all parameters to system calls:
    185 <itemizedlist>
    186   <listitem>
    187     <para>It checks all the direct parameters themselves, whether they are
    188     initialised.</para>
    189   </listitem> 
    190   <listitem>
    191     <para>Also, if a system call needs to read from a buffer provided by
    192     your program, Memcheck checks that the entire buffer is addressable
    193     and its contents are initialised.</para>
    194   </listitem>
    195   <listitem>
    196     <para>Also, if the system call needs to write to a user-supplied
    197     buffer, Memcheck checks that the buffer is addressable.</para>
    198   </listitem>
    199 </itemizedlist>
    200 </para>
    201 
    202 <para>After the system call, Memcheck updates its tracked information to
    203 precisely reflect any changes in memory state caused by the system
    204 call.</para>
    205 
    206 <para>Here's an example of two system calls with invalid parameters:</para>
    207 <programlisting><![CDATA[
    208   #include <stdlib.h>
    209   #include <unistd.h>
    210   int main( void )
    211   {
    212     char* arr  = malloc(10);
    213     int*  arr2 = malloc(sizeof(int));
    214     write( 1 /* stdout */, arr, 10 );
    215     exit(arr2[0]);
    216   }
    217 ]]></programlisting>
    218 
    219 <para>You get these complaints ...</para>
    220 <programlisting><![CDATA[
    221   Syscall param write(buf) points to uninitialised byte(s)
    222      at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
    223      by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
    224      by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
    225    Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
    226      at 0x259852B0: malloc (vg_replace_malloc.c:130)
    227      by 0x80483F1: main (a.c:5)
    228 
    229   Syscall param exit(error_code) contains uninitialised byte(s)
    230      at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
    231      by 0x8048426: main (a.c:8)
    232 ]]></programlisting>
    233 
    234 <para>... because the program has (a) written uninitialised junk
    235 from the heap block to the standard output, and (b) passed an
    236 uninitialised value to <function>exit</function>.  Note that the first
    237 error refers to the memory pointed to by
    238 <computeroutput>buf</computeroutput> (not
    239 <computeroutput>buf</computeroutput> itself), but the second error
    240 refers directly to <computeroutput>exit</computeroutput>'s argument
    241 <computeroutput>arr2[0]</computeroutput>.</para>
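
<para>One possible way to remove both complaints, sketched under the
assumption that the program simply wants to emit a fixed number of known
bytes.  The helper names <function>make_filled_buffer</function> and
<function>write_filled</function> are hypothetical.</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: heap buffer of n bytes, every byte set to fill. */
char *make_filled_buffer(size_t n, char fill)
{
    char *buf = malloc(n);
    if (buf != NULL)
        memset(buf, fill, n);
    return buf;
}

/* Writes n initialised bytes to fd; returns write()'s result, or -1. */
ssize_t write_filled(int fd, size_t n)
{
    char *buf = make_filled_buffer(n, '.');
    if (buf == NULL)
        return -1;
    ssize_t written = write(fd, buf, n);
    free(buf);
    return written;
}
```
]]></programlisting>
<para>A <function>main</function> that calls
<computeroutput>write_filled(1, 10)</computeroutput> and then exits with a
constant status hands <function>write</function> a fully initialised buffer
and <function>exit</function> a defined argument.</para>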
    242 
    243 </sect2>
    244 
    245 
    246 <sect2 id="mc-manual.badfrees" xreflabel="Illegal frees">
    247 <title>Illegal frees</title>
    248 
    249 <para>For example:</para>
    250 <programlisting><![CDATA[
    251 Invalid free()
    252    at 0x4004FFDF: free (vg_clientmalloc.c:577)
    253    by 0x80484C7: main (tests/doublefree.c:10)
    254  Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
    255    at 0x4004FFDF: free (vg_clientmalloc.c:577)
    256    by 0x80484C7: main (tests/doublefree.c:10)
    257 ]]></programlisting>
    258 
    259 <para>Memcheck keeps track of the blocks allocated by your program
    260 with <function>malloc</function>/<computeroutput>new</computeroutput>,
so it can know exactly whether the argument to
<function>free</function>/<computeroutput>delete</computeroutput> is
legitimate.  Here, this test program has freed the same block
twice.  As with the illegal read/write errors, Memcheck attempts to
make sense of the address freed.  If, as here, the address is one
which has previously been freed, you will be told that -- making
    267 duplicate frees of the same block easy to spot.  You will also get this
    268 message if you try to free a pointer that doesn't point to the start of a
    269 heap block.</para>
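
<para>A common defensive idiom, shown here as a sketch rather than anything
Memcheck requires, is to clear a pointer as soon as its block has been freed.
The helper name <function>release</function> is hypothetical.  Because
<computeroutput>free(NULL)</computeroutput> is defined to do nothing, an
accidental second call through the same variable becomes harmless.</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical helper: free the block *p points to, then clear *p. */
void release(char **p)
{
    free(*p);   /* free(NULL) is a no-op, so a repeated call is safe */
    *p = NULL;
}
```
]]></programlisting>
<para>This does not help when a second copy of the pointer exists elsewhere;
a free through that copy would still be expected to be reported.</para>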
    270 
    271 </sect2>
    272 
    273 
    274 <sect2 id="mc-manual.rudefn" 
    275        xreflabel="When a heap block is freed with an inappropriate deallocation
    276 function">
    277 <title>When a heap block is freed with an inappropriate deallocation
    278 function</title>
    279 
    280 <para>In the following example, a block allocated with
    281 <function>new[]</function> has wrongly been deallocated with
    282 <function>free</function>:</para>
    283 <programlisting><![CDATA[
    284 Mismatched free() / delete / delete []
    285    at 0x40043249: free (vg_clientfuncs.c:171)
    286    by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
    287    by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
    288    by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
    289  Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
    290    at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
    291    by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
    292    by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
    293    by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
    294 ]]></programlisting>
    295 
    296 <para>In <literal>C++</literal> it's important to deallocate memory in a
    297 way compatible with how it was allocated.  The deal is:</para>
    298 <itemizedlist>
    299   <listitem>
    300     <para>If allocated with
    301     <function>malloc</function>,
    302     <function>calloc</function>,
    303     <function>realloc</function>,
    304     <function>valloc</function> or
    305     <function>memalign</function>, you must
    306     deallocate with <function>free</function>.</para>
    307   </listitem>
    308   <listitem>
    309    <para>If allocated with <function>new</function>, you must deallocate
    310    with <function>delete</function>.</para>
    311   </listitem>
    312   <listitem>
    313     <para>If allocated with <function>new[]</function>, you must
    314     deallocate with <function>delete[]</function>.</para>
    315   </listitem>
    316 </itemizedlist>
    317 
    318 <para>The worst thing is that on Linux apparently it doesn't matter if
    319 you do mix these up, but the same program may then crash on a
    320 different platform, Solaris for example.  So it's best to fix it
    321 properly.  According to the KDE folks "it's amazing how many C++
    322 programmers don't know this".</para>
    323 
    324 <para>The reason behind the requirement is as follows.  In some C++
    325 implementations, <function>delete[]</function> must be used for
    326 objects allocated by <function>new[]</function> because the compiler
    327 stores the size of the array and the pointer-to-member to the
    328 destructor of the array's content just before the pointer actually
    329 returned.  <function>delete</function> doesn't account for this and will get
    330 confused, possibly corrupting the heap.</para>
    331 
    332 </sect2>
    333 
    334 
    335 
    336 <sect2 id="mc-manual.overlap" 
    337        xreflabel="Overlapping source and destination blocks">
    338 <title>Overlapping source and destination blocks</title>
    339 
    340 <para>The following C library functions copy some data from one
    341 memory block to another (or something similar):
    342 <function>memcpy</function>,
    343 <function>strcpy</function>,
    344 <function>strncpy</function>,
    345 <function>strcat</function>,
    346 <function>strncat</function>. 
    347 The blocks pointed to by their <computeroutput>src</computeroutput> and
    348 <computeroutput>dst</computeroutput> pointers aren't allowed to overlap.
The POSIX standards have wording along the lines of "If copying takes place
    350 between objects that overlap, the behavior is undefined." Therefore,
    351 Memcheck checks for this.
    352 </para>
    353 
    354 <para>For example:</para>
    355 <programlisting><![CDATA[
    356 ==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
    357 ==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
    358 ==27492==    by 0x804865A: main (overlap.c:40)
    359 ]]></programlisting>
    360 
    361 <para>You don't want the two blocks to overlap because one of them could
    362 get partially overwritten by the copying.</para>
    363 
    364 <para>You might think that Memcheck is being overly pedantic reporting
    365 this in the case where <computeroutput>dst</computeroutput> is less than
    366 <computeroutput>src</computeroutput>.  For example, the obvious way to
    367 implement <function>memcpy</function> is by copying from the first
    368 byte to the last.  However, the optimisation guides of some
    369 architectures recommend copying from the last byte down to the first.
    370 Also, some implementations of <function>memcpy</function> zero
    371 <computeroutput>dst</computeroutput> before copying, because zeroing the
    372 destination's cache line(s) can improve performance.</para>
    373 
    374 <para>The moral of the story is: if you want to write truly portable
    375 code, don't make any assumptions about the language
    376 implementation.</para>
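
<para>When the source and destination may overlap,
<function>memmove</function> is the standard C function defined to handle it.
The following sketch uses a hypothetical helper,
<function>shift_left</function>, to show a copy within a single buffer where
<function>memcpy</function> would be incorrect.</para>
<programlisting><![CDATA[
```c
#include <string.h>

/* Hypothetical helper: drop the first `by` bytes of buf[0..len) by moving
   the remaining bytes to the front.  Source and destination overlap
   whenever by < len - by, so memmove is used instead of memcpy. */
void shift_left(char *buf, size_t len, size_t by)
{
    if (by >= len)
        return;             /* nothing left to move */
    memmove(buf, buf + by, len - by);
}
```
]]></programlisting>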
    377 
    378 </sect2>
    379 
    380 
    381 <sect2 id="mc-manual.leaks" xreflabel="Memory leak detection">
    382 <title>Memory leak detection</title>
    383 
    384 <para>Memcheck keeps track of all heap blocks issued in response to
    385 calls to
    386 <function>malloc</function>/<function>new</function> et al.
    387 So when the program exits, it knows which blocks have not been freed.
    388 </para>
    389 
    390 <para>If <option>--leak-check</option> is set appropriately, for each
    391 remaining block, Memcheck determines if the block is reachable from pointers
    392 within the root-set.  The root-set consists of (a) general purpose registers
    393 of all threads, and (b) initialised, aligned, pointer-sized data words in
    394 accessible client memory, including stacks.</para>
    395 
    396 <para>There are two ways a block can be reached.  The first is with a
    397 "start-pointer", i.e. a pointer to the start of the block.  The second is with
    398 an "interior-pointer", i.e. a pointer to the middle of the block.  There are
    399 three ways we know of that an interior-pointer can occur:</para>
    400 
    401 <itemizedlist>
    402   <listitem>
    403     <para>The pointer might have originally been a start-pointer and have been
    404     moved along deliberately (or not deliberately) by the program.  In
    405     particular, this can happen if your program uses tagged pointers, i.e.
    406     if it uses the bottom one, two or three bits of a pointer, which are
    407     normally always zero due to alignment, in order to store extra
    408     information.</para>
    409   </listitem>
    410     
    411   <listitem>
    412     <para>It might be a random junk value in memory, entirely unrelated, just
    413     a coincidence.</para>
    414   </listitem>
    415     
    416   <listitem>
    417     <para>It might be a pointer to an array of C++ objects (which possess
    418     destructors) allocated with <computeroutput>new[]</computeroutput>.  In
    419     this case, some compilers store a "magic cookie" containing the array
    420     length at the start of the allocated block, and return a pointer to just
    421     past that magic cookie, i.e. an interior-pointer.
    422     See <ulink url="http://theory.uwinnipeg.ca/gnu/gcc/gxxint_14.html">this
    423     page</ulink> for more information.</para>
    424   </listitem>
    425 </itemizedlist>
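
<para>A sketch of the tagged-pointer technique from the first item, assuming
heap blocks are aligned to at least 8 bytes so that the low three bits of a
start-pointer are zero (typical for <function>malloc</function> on common
platforms, but not guaranteed by the C standard).  All names are
hypothetical.  While a non-zero tag is stored, the only copy of the pointer
held by the program is an interior-pointer as far as the leak checker is
concerned.</para>
<programlisting><![CDATA[
```c
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK ((uintptr_t)7)   /* low three bits carry the tag */

/* Store a small tag (0..7) in the low bits of an aligned pointer. */
uintptr_t tag_ptr(void *p, unsigned tag)
{
    return (uintptr_t)p | ((uintptr_t)tag & TAG_MASK);
}

/* Recover the original start-pointer. */
void *untag_ptr(uintptr_t t)
{
    return (void *)(t & ~TAG_MASK);
}

/* Recover the tag. */
unsigned get_tag(uintptr_t t)
{
    return (unsigned)(t & TAG_MASK);
}
```
]]></programlisting>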
    426 
    427 <para>With that in mind, consider the nine possible cases described by the
    428 following figure.</para>
    429 
    430 <programlisting><![CDATA[
    431      Pointer chain            AAA Category    BBB Category
    432      -------------            ------------    ------------
    433 (1)  RRR ------------> BBB                    DR
    434 (2)  RRR ---> AAA ---> BBB    DR              IR
    435 (3)  RRR               BBB                    DL
    436 (4)  RRR      AAA ---> BBB    DL              IL
    437 (5)  RRR ------?-----> BBB                    (y)DR, (n)DL
    438 (6)  RRR ---> AAA -?-> BBB    DR              (y)IR, (n)DL
    439 (7)  RRR -?-> AAA ---> BBB    (y)DR, (n)DL    (y)IR, (n)IL
    440 (8)  RRR -?-> AAA -?-> BBB    (y)DR, (n)DL    (y,y)IR, (n,y)IL, (_,n)DL
    441 (9)  RRR      AAA -?-> BBB    DL              (y)IL, (n)DL
    442 
    443 Pointer chain legend:
    444 - RRR: a root set node or DR block
    445 - AAA, BBB: heap blocks
    446 - --->: a start-pointer
    447 - -?->: an interior-pointer
    448 
    449 Category legend:
    450 - DR: Directly reachable
    451 - IR: Indirectly reachable
    452 - DL: Directly lost
    453 - IL: Indirectly lost
    454 - (y)XY: it's XY if the interior-pointer is a real pointer
    455 - (n)XY: it's XY if the interior-pointer is not a real pointer
    456 - (_)XY: it's XY in either case
    457 ]]></programlisting>
    458 
    459 <para>Every possible case can be reduced to one of the above nine.  Memcheck
    460 merges some of these cases in its output, resulting in the following four
    461 categories.</para>
    462 
    463 
    464 <itemizedlist>
    465 
    466   <listitem>
    467     <para>"Still reachable". This covers cases 1 and 2 (for the BBB blocks)
    468     above.  A start-pointer or chain of start-pointers to the block is
    469     found.  Since the block is still pointed at, the programmer could, at
    470     least in principle, have freed it before program exit.  Because these
    471     are very common and arguably not a problem, Memcheck won't report such
    472     blocks individually unless <option>--show-reachable=yes</option> is
    473     specified.</para>
    474   </listitem>
    475 
    476   <listitem>
    477     <para>"Definitely lost".  This covers case 3 (for the BBB blocks) above.
    478     This means that no pointer to the block can be found.  The block is
    479     classified as "lost", because the programmer could not possibly have
    480     freed it at program exit, since no pointer to it exists.  This is likely
    481     a symptom of having lost the pointer at some earlier point in the
    482     program.  Such cases should be fixed by the programmer.</para>
    483     </listitem>
    484 
    485   <listitem>
    486     <para>"Indirectly lost".  This covers cases 4 and 9 (for the BBB blocks)
    487     above.  This means that the block is lost, not because there are no
    488     pointers to it, but rather because all the blocks that point to it are
    489     themselves lost.  For example, if you have a binary tree and the root
    490     node is lost, all its children nodes will be indirectly lost.  Because
    491     the problem will disappear if the definitely lost block that caused the
    492     indirect leak is fixed, Memcheck won't report such blocks individually
    493     unless <option>--show-reachable=yes</option> is specified.</para>
    494   </listitem>
    495 
    496   <listitem>
    497     <para>"Possibly lost".  This covers cases 5--8 (for the BBB blocks)
    498     above.  This means that a chain of one or more pointers to the block has
    499     been found, but at least one of the pointers is an interior-pointer.
    500     This could just be a random value in memory that happens to point into a
    501     block, and so you shouldn't consider this ok unless you know you have
    502     interior-pointers.</para>
    503   </listitem>
    504 
    505 </itemizedlist>
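
<para>The binary-tree example above can be sketched as follows.  The names
<function>mk</function>, <function>count_nodes</function> and
<function>free_tree</function> are hypothetical
(<function>mk</function> merely echoes the function name visible in the sample
stack traces later in this section).  If a global variable still pointed to
the root at exit, all three blocks would be expected to count as still
reachable; if the only pointer to the root were overwritten without calling
<function>free_tree</function>, the root would be expected to be reported as
definitely lost and its two children as indirectly lost.</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>

struct node {
    struct node *left;
    struct node *right;
};

/* Allocate one node that points to the given children (either may be NULL). */
struct node *mk(struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->left = left;
        n->right = right;
    }
    return n;
}

/* Number of nodes reachable from n through start-pointers. */
size_t count_nodes(const struct node *n)
{
    if (n == NULL)
        return 0;
    return 1 + count_nodes(n->left) + count_nodes(n->right);
}

/* Free children before the parent, so no block is left without a pointer. */
void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}
```
]]></programlisting>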
    506 
    507 <para>(Note: This mapping of the nine possible cases onto four categories is
    508 not necessarily the best way that leaks could be reported;  in particular,
    509 interior-pointers are treated inconsistently.  It is possible the
    510 categorisation may be improved in the future.)</para>
    511 
<para>Furthermore, if a suppression exists for a block, it will be reported
as "suppressed" no matter which of the above four categories it belongs
to.</para>
    515 
    516 
    517 <para>The following is an example leak summary.</para>
    518 
    519 <programlisting><![CDATA[
    520 LEAK SUMMARY:
    521    definitely lost: 48 bytes in 3 blocks.
    522    indirectly lost: 32 bytes in 2 blocks.
    523      possibly lost: 96 bytes in 6 blocks.
    524    still reachable: 64 bytes in 4 blocks.
    525         suppressed: 0 bytes in 0 blocks.
    526 ]]></programlisting>
    527 
    528 <para>If <option>--leak-check=full</option> is specified,
    529 Memcheck will give details for each definitely lost or possibly lost block,
    530 including where it was allocated.  (Actually, it merges results for all
    531 blocks that have the same category and sufficiently similar stack traces
    532 into a single "loss record".  The
<option>--leak-resolution</option> option lets you control the
    534 meaning of "sufficiently similar".)  It cannot tell you when or how or why
    535 the pointer to a leaked block was lost; you have to work that out for
    536 yourself.  In general, you should attempt to ensure your programs do not
    537 have any definitely lost or possibly lost blocks at exit.</para>
    538 
    539 <para>For example:</para>
    540 <programlisting><![CDATA[
    541 8 bytes in 1 blocks are definitely lost in loss record 1 of 14
    542    at 0x........: malloc (vg_replace_malloc.c:...)
    543    by 0x........: mk (leak-tree.c:11)
    544    by 0x........: main (leak-tree.c:39)
    545 
    546 88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost in loss record 13 of 14
    547    at 0x........: malloc (vg_replace_malloc.c:...)
    548    by 0x........: mk (leak-tree.c:11)
    549    by 0x........: main (leak-tree.c:25)
    550 ]]></programlisting>
    551 
<para>The first message describes a simple case of a single 8-byte block
that has been definitely lost.  The second case mentions another 8-byte
    554 block that has been definitely lost;  the difference is that a further 80
    555 bytes in other blocks are indirectly lost because of this lost block.
    556 The loss records are not presented in any notable order, so the loss record
    557 numbers aren't particularly meaningful.</para>
    558 
    559 <para>If you specify <option>--show-reachable=yes</option>,
    560 reachable and indirectly lost blocks will also be shown, as the following
    561 two examples show.</para>
    562 
    563 <programlisting><![CDATA[
    564 64 bytes in 4 blocks are still reachable in loss record 2 of 4
    565    at 0x........: malloc (vg_replace_malloc.c:177)
    566    by 0x........: mk (leak-cases.c:52)
    567    by 0x........: main (leak-cases.c:74)
    568 
    569 32 bytes in 2 blocks are indirectly lost in loss record 1 of 4
    570    at 0x........: malloc (vg_replace_malloc.c:177)
    571    by 0x........: mk (leak-cases.c:52)
    572    by 0x........: main (leak-cases.c:80)
    573 ]]></programlisting>
    574 
    575 <para>Because there are different kinds of leaks with different severities, an
    576 interesting question is this: which leaks should be counted as true "errors"
    577 and which should not?  The answer to this question affects the numbers printed
    578 in the <computeroutput>ERROR SUMMARY</computeroutput> line, and also the effect
    579 of the <option>--error-exitcode</option> option.  Memcheck uses the following
    580 criteria:</para>
    581 
    582 <itemizedlist>
    583   <listitem>
    584     <para>First, a leak is only counted as a true "error" if
    585     <option>--leak-check=full</option> is specified.  In other words, an
    586     unprinted leak is not considered a true "error".  If this were not the
    587     case, it would be possible to get a high error count but not have any
    588     errors printed, which would be confusing.</para>
    589   </listitem>
    590 
    591   <listitem>
    592     <para>After that, definitely lost and possibly lost blocks are counted as
    593     true "errors".  Indirectly lost and still reachable blocks are not counted
    594     as true "errors", even if <option>--show-reachable=yes</option> is
    595     specified and they are printed;  this is because such blocks don't need
    596     direct fixing by the programmer.
    597     </para>
    598   </listitem>
    599 </itemizedlist>
    600 
    601 </sect2>
    602 
    603 </sect1>
    604 
    605 
    606 
    607 <sect1 id="mc-manual.options" 
    608        xreflabel="Memcheck Command-Line Options">
    609 <title>Memcheck Command-Line Options</title>
    610 
    611 <!-- start of xi:include in the manpage -->
    612 <variablelist id="mc.opts.list">
    613 
    614   <varlistentry id="opt.leak-check" xreflabel="--leak-check">
    615     <term>
    616       <option><![CDATA[--leak-check=<no|summary|yes|full> [default: summary] ]]></option>
    617     </term>
    618     <listitem>
    619       <para>When enabled, search for memory leaks when the client
    620       program finishes.  If set to <varname>summary</varname>, it says how
    621       many leaks occurred.  If set to <varname>full</varname> or
    622       <varname>yes</varname>, it also gives details of each individual
    623       leak.</para>
    624     </listitem>
    625   </varlistentry>
    626 
    627   <varlistentry id="opt.show-possibly-lost" xreflabel="--show-possibly-lost">
    628     <term>
    629       <option><![CDATA[--show-possibly-lost=<yes|no> [default: yes] ]]></option>
    630     </term>
    631     <listitem>
    632       <para>When disabled, the memory leak detector will not show "possibly lost" blocks.  
    633       </para>
    634     </listitem>
    635   </varlistentry>
    636 
    637   <varlistentry id="opt.leak-resolution" xreflabel="--leak-resolution">
    638     <term>
    639       <option><![CDATA[--leak-resolution=<low|med|high> [default: high] ]]></option>
    640     </term>
    641     <listitem>
    642       <para>When doing leak checking, determines how willing
    643       Memcheck is to consider different backtraces to
    644       be the same for the purposes of merging multiple leaks into a single
    645       leak report.  When set to <varname>low</varname>, only the first
    646       two entries need match.  When <varname>med</varname>, four entries
    647       have to match.  When <varname>high</varname>, all entries need to
    648       match.</para>
    649 
    650       <para>For hardcore leak debugging, you probably want to use
    651       <option>--leak-resolution=high</option> together with
    652       <option>--num-callers=40</option> or some such large number.
    653       </para>
    654 
    655       <para>Note that the <option>--leak-resolution</option> setting
    656       does not affect Memcheck's ability to find
    657       leaks.  It only changes how the results are presented.</para>
    658     </listitem>
    659   </varlistentry>
    660 
    661   <varlistentry id="opt.show-reachable" xreflabel="--show-reachable">
    662     <term>
    663       <option><![CDATA[--show-reachable=<yes|no> [default: no] ]]></option>
    664     </term>
    665     <listitem>
    666       <para>When disabled, the memory leak detector only shows "definitely
    667       lost" and "possibly lost" blocks.  When enabled, the leak detector also
    668       shows "reachable" and "indirectly lost" blocks.  (In other words, it
    669       shows all blocks, except suppressed ones, so
    670       <option>--show-all</option> would be a better name for
    671       it.)</para>
    672     </listitem>
    673   </varlistentry>
    674 
    675   <varlistentry id="opt.undef-value-errors" xreflabel="--undef-value-errors">
    676     <term>
    677       <option><![CDATA[--undef-value-errors=<yes|no> [default: yes] ]]></option>
    678     </term>
    679     <listitem>
    680       <para>Controls whether Memcheck reports
    681       uses of undefined values.  Set this to
    682       <varname>no</varname> if you don't want to see undefined value
    683       errors.  Doing so also has the side effect of speeding up
    684       Memcheck somewhat.
    685       </para>
    686     </listitem>
    687   </varlistentry>
    688 
    689   <varlistentry id="opt.track-origins" xreflabel="--track-origins">
    690     <term>
    691       <option><![CDATA[--track-origins=<yes|no> [default: no] ]]></option>
    692     </term>
    693       <listitem>
    694         <para>Controls whether Memcheck tracks
    695         the origin of uninitialised values.  By default, it does not,
    696         which means that although it can tell you that an
    697         uninitialised value is being used in a dangerous way, it
    698         cannot tell you where the uninitialised value came from.  This
    699         often makes it difficult to track down the root problem.
    700         </para>
    701         <para>When set
    702         to <varname>yes</varname>, Memcheck keeps
    703         track of the origins of all uninitialised values.  Then, when
    704         an uninitialised value error is
    705         reported, Memcheck will try to show the
    706         origin of the value.  An origin can be one of the following
    707         four places: a heap block, a stack allocation, a client
    708         request, or miscellaneous other sources (e.g. a call
    709         to <varname>brk</varname>).
    710         </para>
    711         <para>For uninitialised values originating from a heap
    712         block, Memcheck shows where the block was
    713         allocated.  For uninitialised values originating from a stack
    714         allocation, Memcheck can tell you which
    715         function allocated the value, but no more than that -- typically
    716         it shows you the source location of the opening brace of the
    717         function.  So you should carefully check that all of the
    718         function's local variables are initialised properly.
    719         </para>
    720         <para>Performance overhead: origin tracking is expensive.  It
    721         halves Memcheck's speed and increases
    722         memory use by a minimum of 100MB, and possibly more.
    723         Nevertheless it can drastically reduce the effort required to
    724         identify the root cause of uninitialised value errors, and so
    725         is often a programmer productivity win, despite running
    726         more slowly.
    727         </para>
    728         <para>Accuracy: Memcheck tracks origins
    729         quite accurately.  To avoid very large space and time
    730         overheads, some approximations are made.  It is possible,
    731         although unlikely, that Memcheck will report an incorrect origin, or
    732         not be able to identify any origin.
    733         </para>
    734         <para>Note that the combination
    735         <option>--track-origins=yes</option>
    736         and <option>--undef-value-errors=no</option> is
    737         nonsensical.  Memcheck checks for and
    738         rejects this combination at startup.
    739         </para>
    740       </listitem>
    741   </varlistentry>
    742 
    743   <varlistentry id="opt.partial-loads-ok" xreflabel="--partial-loads-ok">
    744     <term>
    745       <option><![CDATA[--partial-loads-ok=<yes|no> [default: no] ]]></option>
    746     </term>
    747     <listitem>
    748       <para>Controls how Memcheck handles word-sized,
    749       word-aligned loads from addresses for which some bytes are
    750       addressable and others are not.  When <varname>yes</varname>, such
    751       loads do not produce an address error.  Instead, loaded bytes
    752       originating from illegal addresses are marked as uninitialised, and
    753       those corresponding to legal addresses are handled in the normal
    754       way.</para>
    755 
    756       <para>When <varname>no</varname>, loads from partially invalid
    757       addresses are treated the same as loads from completely invalid
    758       addresses: an illegal-address error is issued, and the resulting
    759       bytes are marked as initialised.</para>
    760 
    761       <para>Note that code that behaves in this way is in violation of
    762       the ISO C/C++ standards, and should be considered broken.  If
    763       at all possible, such code should be fixed.  This option should be
    764       used only as a last resort.</para>
    765     </listitem>
    766   </varlistentry>
    767 
    768   <varlistentry id="opt.freelist-vol" xreflabel="--freelist-vol">
    769     <term>
    770       <option><![CDATA[--freelist-vol=<number> [default: 20000000] ]]></option>
    771     </term>
    772     <listitem>
    773       <para>When the client program releases memory using
    774       <function>free</function> (in <literal>C</literal>) or
    775       <computeroutput>delete</computeroutput>
    776       (<literal>C++</literal>), that memory is not immediately made
    777       available for re-allocation.  Instead, it is marked inaccessible
    778       and placed in a queue of freed blocks.  The purpose is to defer as
    779       long as possible the point at which freed-up memory comes back
    780       into circulation.  This increases the chance that
    781       Memcheck will be able to detect invalid
    782       accesses to blocks for some significant period of time after they
    783       have been freed.</para>
    784 
    785       <para>This option specifies the maximum total size, in bytes, of the
    786       blocks in the queue.  The default value is twenty million bytes.
    787       Increasing this increases the total amount of memory used by
    788       Memcheck but may detect invalid uses of freed
    789       blocks which would otherwise go undetected.</para>
    790     </listitem>
    791   </varlistentry>
    792 
    793   <varlistentry id="opt.freelist-big-blocks" xreflabel="--freelist-big-blocks">
    794     <term>
    795       <option><![CDATA[--freelist-big-blocks=<number> [default: 1000000] ]]></option>
    796     </term>
    797     <listitem>
    798       <para>When making blocks from the queue of freed blocks available
    799       for re-allocation, Memcheck will preferentially re-circulate blocks
    800       whose size is greater than or equal to <option>--freelist-big-blocks</option>.
    801       This ensures that freeing big blocks (in particular, blocks bigger than
    802       <option>--freelist-vol</option>) does not immediately lead to the re-circulation
    803       of all (or a lot of) the small blocks in the free list.  In other words,
    804       this option increases the likelihood of discovering dangling pointers
    805       to the "small" blocks, even when big blocks are freed.</para>
    806       <para>Setting a value of 0 means that all the blocks are re-circulated
    807       in FIFO order.</para>
    808     </listitem>
    809   </varlistentry>
    810 
    811   <varlistentry id="opt.workaround-gcc296-bugs" xreflabel="--workaround-gcc296-bugs">
    812     <term>
    813       <option><![CDATA[--workaround-gcc296-bugs=<yes|no> [default: no] ]]></option>
    814     </term>
    815     <listitem>
    816       <para>When enabled, Memcheck assumes that reads and writes some small
    817       distance below the stack pointer are due to bugs in GCC 2.96, and
    818       does not report them.  The "small distance" is 256 bytes by
    819       default.  Note that GCC 2.96 is the default compiler on some ancient
    820       Linux distributions (RedHat 7.X) and so you may need to use this
    821       option.  Do not use it if you do not have to, as it can cause real
    822       errors to be overlooked.  A better alternative is to use a more
    823       recent GCC in which this bug is fixed.</para>
    824 
    825       <para>You may also need to use this option when working with
    826       GCC 3.X or 4.X on 32-bit PowerPC Linux.  This is because
    827       GCC generates code which occasionally accesses below the
    828       stack pointer, particularly for floating-point to/from integer
    829       conversions.  This is in violation of the 32-bit PowerPC ELF
    830       specification, which makes no provision for locations below the
    831       stack pointer to be accessible.</para>
    832     </listitem>
    833   </varlistentry>
    834 
    835   <varlistentry id="opt.ignore-ranges" xreflabel="--ignore-ranges">
    836     <term>
    837       <option><![CDATA[--ignore-ranges=0xPP-0xQQ[,0xRR-0xSS] ]]></option>
    838     </term>
    839     <listitem>
    840     <para>Any ranges listed in this option (and multiple ranges can be
    841     specified, separated by commas) will be ignored by Memcheck's
    842     addressability checking.</para>
    843     </listitem>
    844   </varlistentry>
    845 
    846   <varlistentry id="opt.malloc-fill" xreflabel="--malloc-fill">
    847     <term>
    848       <option><![CDATA[--malloc-fill=<hexnumber> ]]></option>
    849     </term>
    850     <listitem>
    851       <para>Fills blocks allocated
    852       by <computeroutput>malloc</computeroutput>,
    853          <computeroutput>new</computeroutput>, etc, but not
    854       by <computeroutput>calloc</computeroutput>, with the specified
    855       byte.  This can be useful when trying to shake out obscure
    856       memory corruption problems.  The allocated area is still
    857       regarded by Memcheck as undefined -- this option only affects its
    858       contents. Note that <option>--malloc-fill</option> does not
    859       affect a block of memory when it is used as an argument
    860       to the client requests VALGRIND_MEMPOOL_ALLOC or
    861       VALGRIND_MALLOCLIKE_BLOCK.
    862       </para>
    863     </listitem>
    864   </varlistentry>
    865 
    866   <varlistentry id="opt.free-fill" xreflabel="--free-fill">
    867     <term>
    868       <option><![CDATA[--free-fill=<hexnumber> ]]></option>
    869     </term>
    870     <listitem>
    871       <para>Fills blocks freed
    872       by <computeroutput>free</computeroutput>,
    873          <computeroutput>delete</computeroutput>, etc, with the
    874       specified byte value.  This can be useful when trying to shake out
    875       obscure memory corruption problems.  The freed area is still
    876       regarded by Memcheck as not valid for access -- this option only
    877       affects its contents. Note that <option>--free-fill</option> does not
    878       affect a block of memory when it is used as an argument to
    879       the client requests VALGRIND_MEMPOOL_FREE or VALGRIND_FREELIKE_BLOCK.
    880       </para>
    881     </listitem>
    882   </varlistentry>
    883 
    884 </variablelist>
    885 <!-- end of xi:include in the manpage -->
    886 
    887 </sect1>
    888 
    889 
    890 <sect1 id="mc-manual.suppfiles" xreflabel="Writing suppression files">
    891 <title>Writing suppression files</title>
    892 
    893 <para>The basic suppression format is described in 
    894 <xref linkend="manual-core.suppress"/>.</para>
    895 
    896 <para>The suppression-type (second) line should have the form:</para>
    897 <programlisting><![CDATA[
    898 Memcheck:suppression_type]]></programlisting>
    899 
    900 <para>The Memcheck suppression types are as follows:</para>
    901 
    902 <itemizedlist>
    903   <listitem>
    904     <para><varname>Value1</varname>, 
    905     <varname>Value2</varname>,
    906     <varname>Value4</varname>,
    907     <varname>Value8</varname>,
    908     <varname>Value16</varname>,
    909     meaning an uninitialised-value error when
    910     using a value of 1, 2, 4, 8 or 16 bytes.</para>
    911   </listitem>
    912 
    913   <listitem>
    914     <para><varname>Cond</varname> (or its old
    915     name, <varname>Value0</varname>), meaning use
    916     of an uninitialised CPU condition code.</para>
    917   </listitem>
    918 
    919   <listitem>
    920     <para><varname>Addr1</varname>,
    921     <varname>Addr2</varname>, 
    922     <varname>Addr4</varname>,
    923     <varname>Addr8</varname>,
    924     <varname>Addr16</varname>, 
    925     meaning an invalid address during a
    926     memory access of 1, 2, 4, 8 or 16 bytes respectively.</para>
    927   </listitem>
    928 
    929   <listitem>
    930     <para><varname>Jump</varname>, meaning a
    931     jump to an unaddressable location.</para>
    932   </listitem>
    933 
    934   <listitem>
    935     <para><varname>Param</varname>, meaning an
    936     invalid system call parameter error.</para>
    937   </listitem>
    938 
    939   <listitem>
    940     <para><varname>Free</varname>, meaning an
    941     invalid or mismatching free.</para>
    942   </listitem>
    943 
    944   <listitem>
    945     <para><varname>Overlap</varname>, meaning a
    946     <computeroutput>src</computeroutput> /
    947     <computeroutput>dst</computeroutput> overlap in
    948     <function>memcpy</function> or a similar function.</para>
    949   </listitem>
    950 
    951   <listitem>
    952     <para><varname>Leak</varname>, meaning
    953     a memory leak.</para>
    954   </listitem>
    955 
    956 </itemizedlist>
    957 
    958 <para><computeroutput>Param</computeroutput> errors have an extra
    959 information line at this point, which is the name of the offending
    960 system call parameter.  No other error kinds have this extra
    961 line.</para>
    962 
    963 <para>The first line of the calling context: for <varname>ValueN</varname>
    964 and <varname>AddrN</varname> errors, it is either the name of the function
    965 in which the error occurred, or, failing that, the full path of the
    966 <filename>.so</filename> file
    967 or executable containing the error location.  For <varname>Free</varname> errors, it is the name
    968 of the function doing the freeing (e.g. <function>free</function>,
    969 <function>__builtin_vec_delete</function>, etc).  For
    970 <varname>Overlap</varname> errors, it is the name of the function with the
    971 overlapping arguments (e.g. <function>memcpy</function>,
    972 <function>strcpy</function>, etc).</para>
    973 
    974 <para>Lastly, there's the rest of the calling context.</para>
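<para>As a minimal sketch of how these pieces fit together, a suppression
for a <varname>Param</varname> error might look like the following.  The
suppression name on the first line and the function
<function>flush_log</function> in the calling context are hypothetical,
chosen only for illustration.  <computeroutput>write(buf)</computeroutput>
is the extra information line naming the offending system call
parameter.</para>
<programlisting><![CDATA[
```text
{
   hypothetical-write-of-uninitialised-buffer
   Memcheck:Param
   write(buf)
   fun:write
   fun:flush_log
   fun:main
}
```
]]></programlisting>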
    975 
    976 </sect1>
    977 
    978 
    979 
    980 <sect1 id="mc-manual.machine" 
    981        xreflabel="Details of Memcheck's checking machinery">
    982 <title>Details of Memcheck's checking machinery</title>
    983 
    984 <para>Read this section if you want to know, in detail, exactly
    985 what and how Memcheck is checking.</para>
    986 
    987 
    988 <sect2 id="mc-manual.value" xreflabel="Valid-value (V) bit">
    989 <title>Valid-value (V) bits</title>
    990 
    991 <para>It is simplest to think of Memcheck as implementing a synthetic CPU
    992 which is identical to a real CPU, except for one crucial detail.  Every
    993 bit (literally) of data processed, stored and handled by the real CPU
    994 has, in the synthetic CPU, an associated "valid-value" bit, which says
    995 whether or not the accompanying bit has a legitimate value.  In the
    996 discussions which follow, this bit is referred to as the V (valid-value)
    997 bit.</para>
    998 
    999 <para>Each byte in the system therefore has 8 V bits which follow it
   1000 wherever it goes.  For example, when the CPU loads a word-size item (4
   1001 bytes) from memory, it also loads the corresponding 32 V bits from a
   1002 bitmap which stores the V bits for the process' entire address space.
   1003 If the CPU should later write the whole or some part of that value to
   1004 memory at a different address, the relevant V bits will be stored back
   1005 in the V-bit bitmap.</para>
   1006 
   1007 <para>In short, each bit in the system has (conceptually) an associated V
   1008 bit, which follows it around everywhere, even inside the CPU.  Yes, all the
   1009 CPU's registers (integer, floating point, vector and condition registers)
   1010 have their own V bit vectors.  For this to work, Memcheck uses a great deal
   1011 of compression to represent the V bits compactly.</para>
   1012 
   1013 <para>Copying values around does not cause Memcheck to check for, or
   1014 report on, errors.  However, when a value is used in a way which might
   1015 conceivably affect your program's externally-visible behaviour,
   1016 the associated V bits are immediately checked.  If any of these indicate
   1017 that the value is undefined (even partially), an error is reported.</para>
   1018 
   1019 <para>Here's an (admittedly nonsensical) example:</para>
   1020 <programlisting><![CDATA[
   1021 int i, j;
   1022 int a[10], b[10];
   1023 for ( i = 0; i < 10; i++ ) {
   1024   j = a[i];
   1025   b[i] = j;
   1026 }]]></programlisting>
   1027 
   1028 <para>Memcheck emits no complaints about this, since it merely copies
   1029 uninitialised values from <varname>a[]</varname> into
   1030 <varname>b[]</varname>, and doesn't use them in a way which could
   1031 affect the behaviour of the program.  However, if
   1032 the loop is changed to:</para>
   1033 <programlisting><![CDATA[
   1034 for ( i = 0; i < 10; i++ ) {
   1035   j += a[i];
   1036 }
   1037 if ( j == 77 ) 
   1038   printf("hello there\n");
   1039 ]]></programlisting>
   1040 
   1041 <para>then Memcheck will complain, at the
   1042 <computeroutput>if</computeroutput>, that the condition depends on
   1043 uninitialised values.  Note that it <command>doesn't</command> complain
   1044 at the <varname>j += a[i];</varname>, since at that point the
   1045 undefinedness is not "observable".  It's only when a decision has to be
   1046 made as to whether or not to do the <function>printf</function> -- an
   1047 observable action of your program -- that Memcheck complains.</para>
   1048 
   1049 <para>Most low level operations, such as adds, cause Memcheck to use the
   1050 V bits for the operands to calculate the V bits for the result.  Even if
   1051 the result is partially or wholly undefined, it does not
   1052 complain.</para>
   1053 
   1054 <para>Checks on definedness only occur in three places: when a value is
   1055 used to generate a memory address, when a control flow decision needs to
   1056 be made, and when a system call is detected, in which case Memcheck checks
   1057 the definedness of its parameters as required.</para>
   1058 
   1059 <para>If a check should detect undefinedness, an error message is
   1060 issued.  The resulting value is subsequently regarded as well-defined.
   1061 To do otherwise would give long chains of error messages.  In other
   1062 words, once Memcheck reports an undefined value error, it tries to
   1063 avoid reporting further errors derived from that same undefined
   1064 value.</para>
   1065 
   1066 <para>This sounds overcomplicated.  Why not just check all reads from
   1067 memory, and complain if an undefined value is loaded into a CPU
   1068 register?  Well, that doesn't work well, because perfectly legitimate C
   1069 programs routinely copy uninitialised values around in memory, and we
   1070 don't want endless complaints about that.  Here's the canonical example.
   1071 Consider a struct like this:</para>
   1072 <programlisting><![CDATA[
   1073 struct S { int x; char c; };
   1074 struct S s1, s2;
   1075 s1.x = 42;
   1076 s1.c = 'z';
   1077 s2 = s1;
   1078 ]]></programlisting>
   1079 
   1080 <para>The question to ask is: how large is <varname>struct S</varname>,
   1081 in bytes?  An <varname>int</varname> is 4 bytes and a
   1082 <varname>char</varname> one byte, so perhaps a <varname>struct
   1083 S</varname> occupies 5 bytes?  Wrong.  All non-toy compilers we know
   1084 of will round the size of <varname>struct S</varname> up to a whole
   1085 number of words, in this case 8 bytes.  Not doing this forces compilers
   1086 to generate truly appalling code for accessing arrays of
   1087 <varname>struct S</varname>'s on some architectures.</para>
   1088 
   1089 <para>So <varname>s1</varname> occupies 8 bytes, yet only 5 of them will
   1090 be initialised.  For the assignment <varname>s2 = s1</varname>, GCC
   1091 generates code to copy all 8 bytes wholesale into <varname>s2</varname>
   1092 without regard for their meaning.  If Memcheck simply checked values as
   1093 they came out of memory, it would yelp every time a structure assignment
   1094 like this happened.  So the more complicated behaviour described above
   1095 is necessary.  This allows GCC to copy
   1096 <varname>s1</varname> into <varname>s2</varname> any way it likes, and a
   1097 warning will only be emitted if the uninitialised values are later
   1098 used.</para>
   1099 
   1100 </sect2>
   1101 
   1102 
   1103 <sect2 id="mc-manual.vaddress" xreflabel=" Valid-address (A) bits">
   1104 <title>Valid-address (A) bits</title>
   1105 
   1106 <para>Notice that the previous subsection describes how the validity of
   1107 values is established and maintained without having to say whether the
   1108 program does or does not have the right to access any particular memory
   1109 location.  We now consider the latter question.</para>
   1110 
   1111 <para>As described above, every bit in memory or in the CPU has an
   1112 associated valid-value (V) bit.  In addition, all bytes in memory, but
   1113 not in the CPU, have an associated valid-address (A) bit.  This
   1114 indicates whether or not the program can legitimately read or write that
   1115 location.  It does not give any indication of the validity of the data
   1116 at that location -- that's the job of the V bits -- only whether or not
   1117 the location may be accessed.</para>
   1118 
   1119 <para>Every time your program reads or writes memory, Memcheck checks
   1120 the A bits associated with the address.  If any of them indicate an
   1121 invalid address, an error is emitted.  Note that the reads and writes
   1122 themselves do not change the A bits, only consult them.</para>
   1123 
   1124 <para>So how do the A bits get set/cleared?  Like this:</para>
   1125 
   1126 <itemizedlist>
   1127   <listitem>
   1128     <para>When the program starts, all the global data areas are
   1129     marked as accessible.</para>
   1130   </listitem>
   1131 
   1132   <listitem>
   1133     <para>When the program does
   1134     <function>malloc</function>/<computeroutput>new</computeroutput>,
   1135     the A bits for exactly the area allocated, and not a byte more,
   1136     are marked as accessible.  Upon freeing the area the A bits are
   1137     changed to indicate inaccessibility.</para>
   1138   </listitem>
   1139 
   1140   <listitem>
   1141     <para>When the stack pointer register (<literal>SP</literal>) moves
   1142     up or down, A bits are set.  The rule is that the area from
   1143     <literal>SP</literal> up to the base of the stack is marked as
   1144     accessible, and below <literal>SP</literal> is inaccessible.  (If
   1145     that sounds illogical, bear in mind that the stack grows down, not
   1146     up, on almost all Unix systems, including GNU/Linux.)  Tracking
   1147     <literal>SP</literal> like this has the useful side-effect that the
   1148     section of stack used by a function for local variables etc is
   1149     automatically marked accessible on function entry and inaccessible
   1150     on exit.</para>
   1151   </listitem>
   1152 
   1153   <listitem>
   1154     <para>When doing system calls, A bits are changed appropriately.
   1155     For example, <literal>mmap</literal>
   1156     magically makes files appear in the process'
   1157     address space, so the A bits must be updated if <literal>mmap</literal>
   1158     succeeds.</para>
   1159   </listitem>
   1160 
   1161   <listitem>
   1162     <para>Optionally, your program can tell Memcheck about such changes
   1163     explicitly, using the client request mechanism described
   1164     above.</para>
   1165   </listitem>
   1166 
   1167 </itemizedlist>
   1168 
   1169 </sect2>
   1170 
   1171 
   1172 <sect2 id="mc-manual.together" xreflabel="Putting it all together">
   1173 <title>Putting it all together</title>
   1174 
   1175 <para>Memcheck's checking machinery can be summarised as
   1176 follows:</para>
   1177 
   1178 <itemizedlist>
   1179   <listitem>
   1180     <para>Each byte in memory has 8 associated V (valid-value) bits,
   1181     saying whether or not the byte has a defined value, and a single A
   1182     (valid-address) bit, saying whether or not the program currently has
   1183     the right to read/write that address.  As mentioned above, heavy
   1184     use of compression means the overhead is typically around 25%.</para>
   1185   </listitem>
   1186 
   1187   <listitem>
   1188     <para>When memory is read or written, the relevant A bits are
   1189     consulted.  If they indicate an invalid address, Memcheck emits an
   1190     Invalid read or Invalid write error.</para>
   1191   </listitem>
   1192 
   1193   <listitem>
   1194     <para>When memory is read into the CPU's registers, the relevant V
   1195     bits are fetched from memory and stored in the simulated CPU.  They
   1196     are not consulted.</para>
   1197   </listitem>
   1198 
   1199   <listitem>
   1200     <para>When a register is written out to memory, the V bits for that
   1201     register are written back to memory too.</para>
   1202   </listitem>
   1203 
   1204   <listitem>
   1205     <para>When values in CPU registers are used to generate a memory
   1206     address, or to determine the outcome of a conditional branch, the V
   1207     bits for those values are checked, and an error emitted if any of
   1208     them are undefined.</para>
   1209   </listitem>
   1210 
   1211   <listitem>
   1212     <para>When values in CPU registers are used for any other purpose,
   1213     Memcheck computes the V bits for the result, but does not check
   1214     them.</para>
   1215   </listitem>
   1216 
   1217   <listitem>
   1218     <para>Once the V bits for a value in the CPU have been checked, they
   1219     are then set to indicate validity.  This avoids long chains of
   1220     errors.</para>
   1221   </listitem>
   1222 
   1223   <listitem>
   1224     <para>When values are loaded from memory, Memcheck checks the A bits
   1225     for that location and issues an illegal-address warning if needed.
   1226     In that case, the V bits loaded are forced to indicate Valid,
   1227     despite the location being invalid.</para>
   1228 
   1229     <para>This apparently strange choice reduces the amount of confusing
   1230     information presented to the user.  It avoids the unpleasant
   1231     phenomenon in which memory is read from a place which is both
   1232     unaddressable and contains invalid values, and, as a result, you get
   1233     not only an invalid-address (read/write) error, but also a
   1234     potentially large set of uninitialised-value errors, one for every
   1235     time the value is used.</para>
   1236 
   1237     <para>There is a hazy boundary case to do with multi-byte loads from
   1238     addresses which are partially valid and partially invalid.  See
   1239     the description of the option <option>--partial-loads-ok</option> for details.
   1240     </para>
   1241   </listitem>
   1242 
   1243 </itemizedlist>
   1244 
   1245 
   1246 <para>Memcheck intercepts calls to <function>malloc</function>,
   1247 <function>calloc</function>, <function>realloc</function>,
   1248 <function>valloc</function>, <function>memalign</function>,
   1249 <function>free</function>, <computeroutput>new</computeroutput>,
   1250 <computeroutput>new[]</computeroutput>,
   1251 <computeroutput>delete</computeroutput> and
   1252 <computeroutput>delete[]</computeroutput>.  The behaviour you get
   1253 is:</para>
   1254 
   1255 <itemizedlist>
   1256 
   1257   <listitem>
   1258     <para><function>malloc</function>/<computeroutput>new</computeroutput>/<computeroutput>new[]</computeroutput>:
   1259     the returned memory is marked as addressable but not having valid
   1260     values.  This means you have to write to it before you can read
   1261     it.</para>
   1262   </listitem>
   1263 
   1264   <listitem>
   1265     <para><function>calloc</function>: returned memory is marked both
   1266     addressable and valid, since <function>calloc</function> clears
   1267     the area to zero.</para>
   1268   </listitem>
   1269 
   1270   <listitem>
   1271     <para><function>realloc</function>: if the new size is larger than
   1272     the old, the new section is addressable but invalid, as with
   1273     <function>malloc</function>.  If the new size is smaller, the
   1274     dropped-off section is marked as unaddressable.  You may only pass to
   1275     <function>realloc</function> a pointer previously issued to you by
   1276     <function>malloc</function>/<function>calloc</function>/<function>realloc</function>.</para>
   1277   </listitem>
   1278 
   1279   <listitem>
   1280     <para><function>free</function>/<computeroutput>delete</computeroutput>/<computeroutput>delete[]</computeroutput>:
   1281     you may only pass to these functions a pointer previously issued
   1282     to you by the corresponding allocation function.  Otherwise,
   1283     Memcheck complains.  If the pointer is indeed valid, Memcheck
   1284     marks the entire area it points at as unaddressable, and places
   1285     the block in the freed-blocks-queue.  The aim is to defer as long
   1286     as possible reallocation of this block.  Until that happens, all
   1287     attempts to access it will elicit an invalid-address error, as you
   1288     would hope.</para>
   1289   </listitem>
   1290 
   1291 </itemizedlist>
   1292 
   1293 </sect2>
   1294 </sect1>
   1295 
   1296 <sect1 id="mc-manual.monitor-commands" xreflabel="Memcheck Monitor Commands">
   1297 <title>Memcheck Monitor Commands</title>
   1298 <para>The Memcheck tool provides monitor commands handled by Valgrind's
   1299 built-in gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
   1300 </para>
   1301 
   1302 <itemizedlist>
   1303   <listitem>
   1304     <para><varname>get_vbits &lt;addr&gt; [&lt;len&gt;]</varname>
   1305     shows the definedness (V) bits for &lt;len&gt; (default 1) bytes
   1306     starting at &lt;addr&gt;.  The definedness of each byte in the
   1307     range is given using two hexadecimal digits.  These hexadecimal
   1308     digits encode the validity of each bit of the corresponding byte,
   1309     using 0 if the bit is defined and 1 if the bit is undefined.
   1310     If a byte is not addressable, its validity bits are replaced
   1311     by <varname>__</varname> (a double underscore).
   1312     </para>
   1313     <para>
   1314     In the following example, <varname>string10</varname> is an array
   1315     of 10 characters, in which the even-numbered bytes are
   1316     undefined and the byte corresponding
   1317     to <varname>string10[5]</varname> is not addressable.
   1318     </para>
   1319 <programlisting><![CDATA[
   1320 (gdb) p &string10
   1321 $4 = (char (*)[10]) 0x8049e28
   1322 (gdb) monitor get_vbits 0x8049e28 10
   1323 ff00ff00 ff__ff00 ff00
   1324 (gdb) 
   1325 ]]></programlisting>
   1326 
   1327     <para> The command get_vbits cannot be used with registers. To get
   1328     the validity bits of a register, you must start Valgrind with the
   1329     option <option>--vgdb-shadow-registers=yes</option>. The validity
   1330     bits of a register can be obtained by printing the corresponding
   1331     'shadow 1' register.  In the x86 example below, the register
   1332     eax has all its bits undefined, while the register ebx is fully
   1333     defined.
   1334     </para>
   1335 <programlisting><![CDATA[
   1336 (gdb) p /x $eaxs1
   1337 $9 = 0xffffffff
   1338 (gdb) p /x $ebxs1
   1339 $10 = 0x0
   1340 (gdb) 
   1341 ]]></programlisting>
   1342 
   1343   </listitem>
   1344 
   1345   <listitem>
   1346     <para><varname>make_memory
   1347     [noaccess|undefined|defined|Definedifaddressable] &lt;addr&gt;
   1348     [&lt;len&gt;]</varname> marks the range of &lt;len&gt; (default 1)
   1349     bytes at &lt;addr&gt; as having the given status. Parameter
   1350     <varname>noaccess</varname> marks the range as non-accessible, so
   1351     Memcheck will report an error on any access to it.
   1352     <varname>undefined</varname> or <varname>defined</varname> mark
   1353     the area as accessible, but Memcheck regards the bytes in it
   1354     respectively as having undefined or defined values.
   1355     <varname>Definedifaddressable</varname> marks as defined the bytes in
   1356     the range that are already addressable, but makes no change to
   1357     the status of bytes in the range that are not addressable. Note
   1358     that the first letter of <varname>Definedifaddressable</varname>
   1359     is an uppercase D to avoid confusion with <varname>defined</varname>.
   1360     </para>
   1361 
   1362     <para>
   1363     In the following example, the first byte of the
   1364     <varname>string10</varname> is marked as defined:
   1365     </para>
   1366 <programlisting><![CDATA[
   1367 (gdb) monitor make_memory defined 0x8049e28  1
   1368 (gdb) monitor get_vbits 0x8049e28 10
   1369 0000ff00 ff00ff00 ff00
   1370 (gdb) 
   1371 ]]></programlisting>
   1372   </listitem>
   1373 
   1374   <listitem>
   1375     <para><varname>check_memory [addressable|defined] &lt;addr&gt;
   1376     [&lt;len&gt;]</varname> checks that the range of &lt;len&gt;
   1377     (default 1) bytes at &lt;addr&gt; has the specified accessibility.
   1378     It then outputs a description of &lt;addr&gt;. In the following
   1379     example, a detailed description is available because the
   1380     option <option>--read-var-info=yes</option> was given at Valgrind
   1381     startup:
   1382     </para>
   1383 <programlisting><![CDATA[
   1384 (gdb) monitor check_memory defined 0x8049e28  1
   1385 Address 0x8049E28 len 1 defined
   1386 ==14698==  Location 0x8049e28 is 0 bytes inside string10[0],
   1387 ==14698==  declared at prog.c:10, in frame #0 of thread 1
   1388 (gdb) 
   1389 ]]></programlisting>
   1390   </listitem>
   1391 
   1392   <listitem>
   1393     <para><varname>leak_check [full*|summary]
   1394                               [reachable|possibleleak*|definiteleak]
   1395                               [increased*|changed|any]
   1396                               [unlimited*|limited &lt;max_loss_records_output&gt;]
   1397           </varname>
   1398     performs a leak check. The <varname>*</varname> in the arguments
   1399     indicates the default values. </para>
   1400 
   1401     <para> If the first argument is <varname>summary</varname>, only a
   1402     summary of the leak search is given; otherwise a full leak report
   1403     is produced.  A full leak report gives detailed information for
   1404     each leak: the stack trace where the leaked blocks were allocated,
   1405     the number of blocks leaked and their total size.  When a full
   1406     report is requested, the next two arguments further specify what
   1407     kind of leaks to report.  A leak's details are shown if they match
   1408     both the second and third argument. A full leak report might
   1409     output detailed information for many leaks. The number of leaks for
   1410     which information is output can be controlled using
   1411     the <varname>limited</varname> argument followed by the maximum number
   1412     of leak records to output. If this maximum is reached, the leak
   1413     search outputs the records with the biggest number of bytes.
   1414     </para>
   1415 
   1416     <para>The second argument controls what kind of blocks are shown for
   1417     a <varname>full</varname> leak search.  The
   1418     value <varname>definiteleak</varname> specifies that only
   1419     definitely leaked blocks should be shown.  The
   1420     value <varname>possibleleak</varname> will also show possibly
   1421     leaked blocks (those for which only an interior pointer was
   1422     found).  The value
   1423     <varname>reachable</varname> will show all block categories
   1424     (reachable, possibly leaked, definitely leaked).
   1425     </para>
   1426 
   1427     <para>The third argument controls what kinds of changes are shown
   1428     for a <varname>full</varname> leak search. The
   1429     value <varname>increased</varname> specifies that only block
   1430     allocation stacks with an increased number of leaked bytes or
   1431     blocks since the previous leak check should be shown.  The
   1432     value <varname>changed</varname> specifies that allocation stacks
   1433     with any change since the previous leak check should be shown.
   1434     The value <varname>any</varname> specifies that all leak entries
   1435     should be shown, regardless of any increase or decrease.
   1436     If <varname>increased</varname> or <varname>changed</varname> is
   1437     specified, the leak report entries will show the delta relative to
   1438     the previous leak report.
   1439     </para>
   1440 
   1441     <para>The following example shows usage of the 
   1442     <varname>leak_check</varname> monitor command on
   1443     the <varname>memcheck/tests/leak-cases.c</varname> regression
   1444     test. The first command outputs one entry having an increase in
   1445     the leaked bytes.  The second command is the same as the first
   1446     command, but uses the abbreviated forms accepted by GDB and the
   1447     Valgrind gdbserver. It only outputs the summary information, as
   1448     there was no increase since the previous leak search.</para>
   1449 <programlisting><![CDATA[
   1450 (gdb) monitor leak_check full possibleleak increased
   1451 ==19520== 16 (+16) bytes in 1 (+1) blocks are possibly lost in loss record 9 of 12
   1452 ==19520==    at 0x40070B4: malloc (vg_replace_malloc.c:263)
   1453 ==19520==    by 0x80484D5: mk (leak-cases.c:52)
   1454 ==19520==    by 0x804855F: f (leak-cases.c:81)
   1455 ==19520==    by 0x80488E0: main (leak-cases.c:107)
   1456 ==19520== 
   1457 ==19520== LEAK SUMMARY:
   1458 ==19520==    definitely lost: 32 (+0) bytes in 2 (+0) blocks
   1459 ==19520==    indirectly lost: 16 (+0) bytes in 1 (+0) blocks
   1460 ==19520==      possibly lost: 32 (+16) bytes in 2 (+1) blocks
   1461 ==19520==    still reachable: 96 (+16) bytes in 6 (+1) blocks
   1462 ==19520==         suppressed: 0 (+0) bytes in 0 (+0) blocks
   1463 ==19520== Reachable blocks (those to which a pointer was found) are not shown.
   1464 ==19520== To see them, add 'reachable any' args to leak_check
   1465 ==19520== 
   1466 (gdb) mo l
   1467 ==19520== LEAK SUMMARY:
   1468 ==19520==    definitely lost: 32 (+0) bytes in 2 (+0) blocks
   1469 ==19520==    indirectly lost: 16 (+0) bytes in 1 (+0) blocks
   1470 ==19520==      possibly lost: 32 (+0) bytes in 2 (+0) blocks
   1471 ==19520==    still reachable: 96 (+0) bytes in 6 (+0) blocks
   1472 ==19520==         suppressed: 0 (+0) bytes in 0 (+0) blocks
   1473 ==19520== Reachable blocks (those to which a pointer was found) are not shown.
   1474 ==19520== To see them, add 'reachable any' args to leak_check
   1475 ==19520== 
   1476 (gdb) 
   1477 ]]></programlisting>
   1478     <para>Note that when using Valgrind's gdbserver, it is not
   1479     necessary to rerun
   1480     with <option>--leak-check=full</option>
   1481     <option>--show-reachable=yes</option> to see the reachable
   1482     blocks. You can obtain the same information without rerunning by
   1483     using the GDB command <computeroutput>monitor leak_check full
   1484     reachable any</computeroutput> (or, using
   1485     abbreviation: <computeroutput>mo l f r a</computeroutput>).
   1486     </para>
   1487   </listitem>
   1488 
   1489   <listitem>
   1490     <para><varname>block_list &lt;loss_record_nr&gt; </varname>
   1491     shows the list of blocks belonging to &lt;loss_record_nr&gt;.
   1492     </para>
   1493 
   1494     <para> A leak search merges the allocated blocks into loss records:
   1495     a loss record groups all blocks having the same state (for
   1496     example, Definitely Lost) and the same allocation backtrace.
   1497     Each loss record is identified in the leak search result 
   1498     by a loss record number.
   1499     The <varname>block_list</varname> command shows the loss record information
   1500     followed by the addresses and sizes of the blocks which have been
   1501     merged in the loss record.
   1502     </para>
   1503 
   1504     <para> If a directly lost block causes some other blocks to be indirectly
   1505     lost, the block_list command will also show these indirectly lost blocks.
   1506     The indirectly lost blocks will be indented according to the level of indirection
   1507     between the directly lost block and the indirectly lost block(s).
   1508     Each indirectly lost block is followed by a reference to its loss record.
   1509     </para>
   1510 
   1511     <para> The block_list command can be used on the results of a leak search as long
   1512     as no block has been freed after this leak search: as soon as the program frees
   1513     a block, a new leak search is needed before block_list can be used again.
   1514     </para>
   1515 
   1516     <para>
   1517     In the example below, the program leaks a tree structure by losing the pointer to
   1518     the block A (top of the tree).
   1519     So, the block A is directly lost, causing an indirect
   1520     loss of blocks B to G. The first block_list command shows the loss record of A
   1521     (a definitely lost block with address 0x4028028, size 16). The addresses and sizes
   1522     of the indirectly lost blocks due to block A are shown below the block A.
   1523     The second command shows the details of one of the indirect loss records output
   1524     by the first command.
   1525     </para>
   1526 <programlisting><![CDATA[
   1527            A
   1528          /   \
   1529         B     C
   1530        / \   / \ 
   1531       D   E F   G
   1532 ]]></programlisting>
   1533 
   1534 <programlisting><![CDATA[
   1535 (gdb) bt
   1536 #0  main () at leak-tree.c:69
   1537 (gdb) monitor leak_check full any
   1538 ==19552== 112 (16 direct, 96 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
   1539 ==19552==    at 0x40070B4: malloc (vg_replace_malloc.c:263)
   1540 ==19552==    by 0x80484D5: mk (leak-tree.c:28)
   1541 ==19552==    by 0x80484FC: f (leak-tree.c:41)
   1542 ==19552==    by 0x8048856: main (leak-tree.c:63)
   1543 ==19552== 
   1544 ==19552== LEAK SUMMARY:
   1545 ==19552==    definitely lost: 16 bytes in 1 blocks
   1546 ==19552==    indirectly lost: 96 bytes in 6 blocks
   1547 ==19552==      possibly lost: 0 bytes in 0 blocks
   1548 ==19552==    still reachable: 0 bytes in 0 blocks
   1549 ==19552==         suppressed: 0 bytes in 0 blocks
   1550 ==19552== 
   1551 (gdb) monitor block_list 7
   1552 ==19552== 112 (16 direct, 96 indirect) bytes in 1 blocks are definitely lost in loss record 7 of 7
   1553 ==19552==    at 0x40070B4: malloc (vg_replace_malloc.c:263)
   1554 ==19552==    by 0x80484D5: mk (leak-tree.c:28)
   1555 ==19552==    by 0x80484FC: f (leak-tree.c:41)
   1556 ==19552==    by 0x8048856: main (leak-tree.c:63)
   1557 ==19552== 0x4028028[16]
   1558 ==19552==   0x4028068[16] indirect loss record 1
   1559 ==19552==      0x40280E8[16] indirect loss record 3
   1560 ==19552==      0x4028128[16] indirect loss record 4
   1561 ==19552==   0x40280A8[16] indirect loss record 2
   1562 ==19552==      0x4028168[16] indirect loss record 5
   1563 ==19552==      0x40281A8[16] indirect loss record 6
   1564 (gdb) mo b 2
   1565 ==19552== 16 bytes in 1 blocks are indirectly lost in loss record 2 of 7
   1566 ==19552==    at 0x40070B4: malloc (vg_replace_malloc.c:263)
   1567 ==19552==    by 0x80484D5: mk (leak-tree.c:28)
   1568 ==19552==    by 0x8048519: f (leak-tree.c:43)
   1569 ==19552==    by 0x8048856: main (leak-tree.c:63)
   1570 ==19552== 0x40280A8[16]
   1571 ==19552==   0x4028168[16] indirect loss record 5
   1572 ==19552==   0x40281A8[16] indirect loss record 6
   1573 (gdb) 
   1574 
   1575 ]]></programlisting>
   1576 
   1577   </listitem>
   1578 
   1579   <listitem>
   1580     <para><varname>who_points_at &lt;addr&gt; [&lt;len&gt;]</varname> 
   1581     shows all the locations where a pointer to addr is found.
   1582     If len is equal to 1, the command only shows the locations pointing
   1583     exactly at addr (i.e. the "start pointers" to addr).
   1584     If len is &gt; 1, "interior pointers" pointing at the len first bytes
   1585     will also be shown.
   1586     </para>
   1587 
   1588     <para>The locations searched for are the same as the locations
   1589     used in the leak search. So, <varname>who_points_at</varname> can, among
   1590     other things, be used to show why the leak search can still reach a block, or to
   1591     search for dangling pointers to a freed block.
   1592     Each location pointing at addr (or pointing inside addr if interior pointers
   1593     are being searched for) will be described.
   1594     </para>
   1595 
   1596     <para>In the example below, the pointers to the 'tree block A' (see the example
   1597     for the <varname>block_list</varname> command) are shown before the tree was leaked.
   1598     The descriptions are detailed as the option <option>--read-var-info=yes</option> 
   1599     was given at Valgrind startup. The second call shows the pointers (start and interior
   1600     pointers) to block G. The block G (0x40281A8) is reachable via block C (0x40280a8)
   1601     and register ECX of tid 1 (tid is the Valgrind thread id).
   1602     It is "interior reachable" via the register EBX.
   1603     </para>
   1604 
   1605 <programlisting><![CDATA[
   1606 (gdb) monitor who_points_at 0x4028028
   1607 ==20852== Searching for pointers to 0x4028028
   1608 ==20852== *0x8049e20 points at 0x4028028
   1609 ==20852==  Location 0x8049e20 is 0 bytes inside global var "t"
   1610 ==20852==  declared at leak-tree.c:35
   1611 (gdb) monitor who_points_at 0x40281A8 16
   1612 ==20852== Searching for pointers pointing in 16 bytes from 0x40281a8
   1613 ==20852== *0x40280ac points at 0x40281a8
   1614 ==20852==  Address 0x40280ac is 4 bytes inside a block of size 16 alloc'd
   1615 ==20852==    at 0x40070B4: malloc (vg_replace_malloc.c:263)
   1616 ==20852==    by 0x80484D5: mk (leak-tree.c:28)
   1617 ==20852==    by 0x8048519: f (leak-tree.c:43)
   1618 ==20852==    by 0x8048856: main (leak-tree.c:63)
   1619 ==20852== tid 1 register ECX points at 0x40281a8
   1620 ==20852== tid 1 register EBX interior points at 2 bytes inside 0x40281a8
   1621 (gdb)
   1622 ]]></programlisting>
   1623   </listitem>
   1624 
   1625 
   1626 </itemizedlist>
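
<para>The encoding printed by <varname>get_vbits</varname> can be decoded
mechanically.  The following C sketch is illustrative only and is not part of
Valgrind: the names <varname>decode_vbits</varname>,
<varname>hex_value</varname> and <varname>VBITS_UNADDRESSABLE</varname> are
assumptions introduced here.  It turns a line of output into one entry per
byte, following the rules described above: two hexadecimal digits give the
mask of undefined bits, and a double underscore marks an unaddressable
byte.</para>

<programlisting><![CDATA[
```c
#include <ctype.h>

/* Marker stored for a byte that get_vbits printed as "__". */
#define VBITS_UNADDRESSABLE (-1)

/* Convert one hexadecimal digit to its value, or -1 if it is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode get_vbits output such as "ff00ff00 ff__ff00 ff00".
   out[i] receives the mask of undefined bits for byte i (0 = fully
   defined, 0xff = fully undefined), or VBITS_UNADDRESSABLE for "__".
   Returns the number of bytes decoded, or -1 if the text is malformed
   or describes more than max bytes. */
int decode_vbits(const char *text, int *out, int max)
{
    int n = 0;

    while (*text != '\0') {
        if (isspace((unsigned char)*text)) {   /* skip group separators */
            text++;
            continue;
        }
        if (text[1] == '\0' || n >= max)
            return -1;
        if (text[0] == '_' && text[1] == '_') {
            out[n++] = VBITS_UNADDRESSABLE;
        } else {
            int hi = hex_value(text[0]);
            int lo = hex_value(text[1]);
            if (hi < 0 || lo < 0)
                return -1;
            out[n++] = hi * 16 + lo;
        }
        text += 2;
    }
    return n;
}
```
]]></programlisting>

<para>Applied to the first <varname>get_vbits</varname> output shown above,
such a decoder would report ten bytes, with byte 5 unaddressable and the
even-numbered bytes fully undefined.</para>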
   1627 
   1628 </sect1>
   1629 
   1630 <sect1 id="mc-manual.clientreqs" xreflabel="Client requests">
   1631 <title>Client Requests</title>
   1632 
   1633 <para>The following client requests are defined in
   1634 <filename>memcheck.h</filename>.
   1635 See <filename>memcheck.h</filename> for exact details of their
   1636 arguments.</para>
   1637 
   1638 <itemizedlist>
   1639 
   1640   <listitem>
   1641     <para><varname>VALGRIND_MAKE_MEM_NOACCESS</varname>,
   1642     <varname>VALGRIND_MAKE_MEM_UNDEFINED</varname> and
   1643     <varname>VALGRIND_MAKE_MEM_DEFINED</varname>.
   1644     These mark address ranges as completely inaccessible,
   1645     accessible but containing undefined data, and accessible and
   1646     containing defined data, respectively.</para>
   1647   </listitem>
   1648 
   1649   <listitem>
   1650     <para><varname>VALGRIND_MAKE_MEM_DEFINED_IF_ADDRESSABLE</varname>.
   1651     This is just like <varname>VALGRIND_MAKE_MEM_DEFINED</varname> but only
   1652     affects those bytes that are already addressable.</para>
   1653   </listitem>
   1654 
   1655   <listitem>
   1656     <para><varname>VALGRIND_CHECK_MEM_IS_ADDRESSABLE</varname> and
   1657     <varname>VALGRIND_CHECK_MEM_IS_DEFINED</varname>: check immediately
   1658     whether or not the given address range has the relevant property,
   1659     and if not, print an error message.  For the convenience of
   1660     the client, they also return zero if the relevant property holds; otherwise,
   1661     the returned value is the address of the first byte for which the
   1662     property is not true.  They always return 0 when not run on
   1663     Valgrind.</para>
   1664   </listitem>
   1665 
   1666   <listitem>
   1667     <para><varname>VALGRIND_CHECK_VALUE_IS_DEFINED</varname>: a quick and easy
   1668     way to find out whether Valgrind thinks a particular value
   1669     (lvalue, to be precise) is addressable and defined.  Prints an error
   1670     message if not.  It has no return value.</para>
   1671   </listitem>
   1672 
   1673   <listitem>
   1674     <para><varname>VALGRIND_DO_LEAK_CHECK</varname>: does a full memory leak
   1675     check (like <option>--leak-check=full</option>) right now.
   1676     This is useful for incrementally checking for leaks between arbitrary
   1677     places in the program's execution.  It has no return value.</para>
   1678   </listitem>
   1679 
   1680   <listitem>
   1681     <para><varname>VALGRIND_DO_ADDED_LEAK_CHECK</varname>: same as
   1682     <varname>VALGRIND_DO_LEAK_CHECK</varname> but only shows the
   1683     entries for which there was an increase in leaked bytes or leaked
   1684     number of blocks since the previous leak search.  It has no return
   1685     value.</para>
   1686   </listitem>
   1687 
   1688   <listitem>
   1689     <para><varname>VALGRIND_DO_CHANGED_LEAK_CHECK</varname>: same as
   1690     <varname>VALGRIND_DO_LEAK_CHECK</varname> but only shows the
   1691     entries for which there was an increase or decrease in leaked
   1692     bytes or leaked number of blocks since the previous leak search. It
   1693     has no return value.</para>
   1694   </listitem>
   1695 
   1696   <listitem>
   1697     <para><varname>VALGRIND_DO_QUICK_LEAK_CHECK</varname>: like
   1698     <varname>VALGRIND_DO_LEAK_CHECK</varname>, except it produces only a leak
   1699     summary (like <option>--leak-check=summary</option>).
   1700     It has no return value.</para>
   1701   </listitem>
   1702 
   1703   <listitem>
   1704     <para><varname>VALGRIND_COUNT_LEAKS</varname>: fills in the four
   1705     arguments with the number of bytes of memory found by the previous
   1706     leak check to be leaked (i.e. the sum of direct leaks and indirect leaks),
   1707     dubious, reachable and suppressed.  This is useful in test harness code,
   1708     after calling <varname>VALGRIND_DO_LEAK_CHECK</varname> or
   1709     <varname>VALGRIND_DO_QUICK_LEAK_CHECK</varname>.</para>
   1710   </listitem>
   1711 
   1712   <listitem>
   1713     <para><varname>VALGRIND_COUNT_LEAK_BLOCKS</varname>: identical to
   1714     <varname>VALGRIND_COUNT_LEAKS</varname> except that it returns the
   1715     number of blocks rather than the number of bytes in each
   1716     category.</para>
   1717   </listitem>
   1718 
   1719   <listitem>
   1720     <para><varname>VALGRIND_GET_VBITS</varname> and
   1721     <varname>VALGRIND_SET_VBITS</varname>: allow you to get and set the
   1722     V (validity) bits for an address range.  You should probably only
   1723     set V bits that you have got with
   1724     <varname>VALGRIND_GET_VBITS</varname>.  Only for those who really
   1725     know what they are doing.</para>
   1726   </listitem>
   1727 
   1728   <listitem>
   1729     <para><varname>VALGRIND_CREATE_BLOCK</varname> and 
   1730     <varname>VALGRIND_DISCARD</varname>.  <varname>VALGRIND_CREATE_BLOCK</varname>
   1731     takes an address, a number of bytes and a character string.  The
   1732     specified address range is then associated with that string.  When
   1733     Memcheck reports an invalid access to an address in the range, it
   1734     will describe it in terms of this block rather than in terms of
   1735     any other block it knows about.  Note that the use of this macro
   1736     does not actually change the state of memory in any way -- it
   1737     merely gives a name for the range.
   1738     </para>
   1739 
   1740     <para>At some point you may want Memcheck to stop reporting errors
   1741     in terms of the block named
   1742     by <varname>VALGRIND_CREATE_BLOCK</varname>.  To make this
   1743     possible, <varname>VALGRIND_CREATE_BLOCK</varname> returns a
   1744     "block handle", which is a C <varname>int</varname> value.  You
   1745     can pass this block handle to <varname>VALGRIND_DISCARD</varname>.
   1746     After doing so, Valgrind will no longer relate addressing errors
   1747     in the specified range to the block.  Passing invalid handles to
   1748     <varname>VALGRIND_DISCARD</varname> is harmless.
   1749    </para>
   1750   </listitem>
   1751 
   1752 </itemizedlist>
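
<para>As an illustration of the memory-state requests, the following C sketch
recycles a scratch buffer and tells Memcheck about each state change.  It is
a minimal example under stated assumptions, not a recommended design: the
<varname>scratch_*</varname> functions are hypothetical, the include path
<filename>valgrind/memcheck.h</filename> is assumed to match your
installation, and the fallback macros exist only so that the sketch still
builds when the header is unavailable.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

/* Use the real header when it is installed (the path is an assumption). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif

#ifndef HAVE_MEMCHECK_H
/* Hypothetical no-op stand-ins; they do nothing and evaluate to 0. */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)    ((void)(addr), (void)(len), 0)
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len)   ((void)(addr), (void)(len), 0)
#  define VALGRIND_CHECK_MEM_IS_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical scratch buffer that is handed out and taken back. */
static unsigned char scratch[64];

/* Hand the buffer out: addressable, but its contents are not yet meaningful. */
unsigned char *scratch_acquire(void)
{
    (void)VALGRIND_MAKE_MEM_UNDEFINED(scratch, sizeof scratch);
    return scratch;
}

/* Take the buffer back: any later access should be reported by Memcheck. */
void scratch_release(void)
{
    (void)VALGRIND_MAKE_MEM_NOACCESS(scratch, sizeof scratch);
}

/* Fill the first len bytes, then ask Memcheck whether they are defined.
   Returns 1 if the check passes; outside Valgrind the request yields 0,
   so the check always passes there. */
int scratch_fill_and_check(unsigned char value, size_t len)
{
    unsigned char *buf = scratch_acquire();
    int ok;

    if (len > sizeof scratch)
        len = sizeof scratch;
    memset(buf, value, len);
    ok = (VALGRIND_CHECK_MEM_IS_DEFINED(buf, len) == 0);
    scratch_release();
    return ok;
}
```
]]></programlisting>

<para>If the <function>memset</function> call were removed, a run under
Memcheck would be expected to print an error at the check, because the bytes
would still be marked undefined.</para>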
   1753 
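
<para>The leak-checking requests can be combined in test harness code.  The
sketch below is one possible arrangement, not an established recipe:
<varname>leaked_bytes_during</varname> and
<varname>balanced_work</varname> are hypothetical names, the include path
<filename>valgrind/memcheck.h</filename> is an assumption, and the fallback
macros are stand-ins used only when the header is missing.  It runs a leak
summary before and after a function and reports the growth in leaked
bytes.</para>

<programlisting><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

/* Use the real header when it is installed (the path is an assumption). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif

#ifndef HAVE_MEMCHECK_H
/* Hypothetical stand-ins: no leak check is done and all counts are 0. */
#  define VALGRIND_DO_QUICK_LEAK_CHECK do { } while (0)
#  define VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed) \
      do { (leaked) = 0; (dubious) = 0;                                \
           (reachable) = 0; (suppressed) = 0; } while (0)
#endif

/* Run fn and return how many more bytes are leaked (direct plus indirect)
   afterwards than before.  The result is always 0 outside Valgrind. */
unsigned long leaked_bytes_during(void (*fn)(void))
{
    unsigned long before = 0, after = 0;
    unsigned long dubious = 0, reachable = 0, suppressed = 0;

    VALGRIND_DO_QUICK_LEAK_CHECK;
    VALGRIND_COUNT_LEAKS(before, dubious, reachable, suppressed);

    fn();

    VALGRIND_DO_QUICK_LEAK_CHECK;
    VALGRIND_COUNT_LEAKS(after, dubious, reachable, suppressed);

    /* Only the leaked count is used in this sketch. */
    (void)dubious;
    (void)reachable;
    (void)suppressed;

    return after >= before ? after - before : 0;
}

/* Example workload that frees everything it allocates. */
void balanced_work(void)
{
    char *p = malloc(32);
    if (p != NULL) {
        memset(p, 0, 32);
        free(p);
    }
}
```
]]></programlisting>

<para>A harness might then fail a test whenever
<computeroutput>leaked_bytes_during(balanced_work)</computeroutput> is
nonzero.</para>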
   1754 </sect1>
   1755 
   1756 
   1757 
   1758 
   1759 <sect1 id="mc-manual.mempools" xreflabel="Memory Pools">
   1760 <title>Memory Pools: describing and working with custom allocators</title>
   1761 
   1762 <para>Some programs use custom memory allocators, often for performance
   1763 reasons.  Left to itself, Memcheck is unable to understand the
   1764 behaviour of custom allocation schemes as well as it understands the
   1765 standard allocators, and so may miss errors and leaks in your program.  What
   1766 this section describes is a way to give Memcheck enough of a description of
   1767 your custom allocator that it can make at least some sense of what is
   1768 happening.</para>
   1769 
   1770 <para>There are many different sorts of custom allocator, so Memcheck
   1771 attempts to reason about them using a loose, abstract model.  We
   1772 use the following terminology when describing custom allocation
   1773 systems:</para>
   1774 
   1775 <itemizedlist>
   1776   <listitem>
   1777     <para>Custom allocation involves a set of independent "memory pools".
   1778     </para>
   1779   </listitem>
   1780   <listitem>
   1781     <para>Memcheck's notion of a memory pool consists of a single "anchor
   1782     address" and a set of non-overlapping "chunks" associated with the
   1783     anchor address.</para>
   1784   </listitem>
   1785   <listitem>
   1786     <para>Typically a pool's anchor address is the address of a 
   1787     book-keeping "header" structure.</para>
   1788   </listitem>
   1789   <listitem>
   1790     <para>Typically the pool's chunks are drawn from a contiguous
   1791     "superblock" acquired through the system
   1792     <function>malloc</function> or
   1793     <function>mmap</function>.</para>
   1794   </listitem>
   1795 
   1796 </itemizedlist>
   1797 
   1798 <para>Keep in mind that the last two points above say "typically": the
   1799 Valgrind mempool client request API is intentionally vague about the
   1800 exact structure of a mempool. There is no specific mention made of
   1801 headers or superblocks. Nevertheless, the following picture may help
   1802 elucidate the intention of the terms in the API:</para>
   1803 
   1804 <programlisting><![CDATA[
   1805    "pool"
   1806    (anchor address)
   1807    |
   1808    v
   1809    +--------+---+
   1810    | header | o |
   1811    +--------+-|-+
   1812               |
   1813               v                  superblock
   1814               +------+---+--------------+---+------------------+
   1815               |      |rzB|  allocation  |rzB|                  |
   1816               +------+---+--------------+---+------------------+
   1817                          ^              ^
   1818                          |              |
   1819                        "addr"     "addr"+"size"
   1820 ]]></programlisting>
   1821 
   1822 <para>
   1823 Note that the header and the superblock may be contiguous or
   1824 discontiguous, and there may be multiple superblocks associated with a
   1825 single header; such variations are opaque to Memcheck. The API
   1826 only requires that your allocation scheme can present sensible values
   1827 of "pool", "addr" and "size".</para>
   1828 
   1829 <para>
   1830 Typically, before making client requests related to mempools, a client
   1831 program will have allocated such a header and superblock for its
   1832 mempool, and marked the superblock NOACCESS using the
   1833 <varname>VALGRIND_MAKE_MEM_NOACCESS</varname> client request.</para>
   1834 
   1835 <para>
   1836 When dealing with mempools, the goal is to maintain a particular
   1837 invariant condition: that Memcheck believes the unallocated portions
   1838 of the pool's superblock (including redzones) are NOACCESS. To
   1839 maintain this invariant, the client program must ensure that the
   1840 superblock starts out in that state; Memcheck cannot make it so, since
   1841 Memcheck never explicitly learns about the superblock of a pool, only
   1842 the allocated chunks within the pool.</para>
   1843 
   1844 <para>
   1845 Once the header and superblock for a pool are established and properly
   1846 marked, there are a number of client requests programs can use to
   1847 inform Memcheck about changes to the state of a mempool:</para>
   1848 
   1849 <itemizedlist>
   1850 
   1851   <listitem>
   1852     <para>
   1853     <varname>VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed)</varname>:
   1854     This request registers the address <varname>pool</varname> as the anchor
   1855     address for a memory pool. It also provides a size
   1856     <varname>rzB</varname>, specifying how large the redzones placed around
   1857     chunks allocated from the pool should be. Finally, it provides an
   1858     <varname>is_zeroed</varname> argument that specifies whether the pool's
   1859     chunks are zeroed (more precisely: defined) when allocated.
   1860     </para>
   1861     <para>
   1862     Upon completion of this request, no chunks are associated with the
   1863     pool.  The request simply tells Memcheck that the pool exists, so that
   1864     subsequent calls can refer to it as a pool.
   1865     </para>
   1866   </listitem>
   1867 
   1868   <listitem>
   1869     <para><varname>VALGRIND_DESTROY_MEMPOOL(pool)</varname>:
   1870     This request tells Memcheck that a pool is being torn down. Memcheck
   1871     then removes all records of chunks associated with the pool, as well
   1872     as its record of the pool's existence. While destroying its records of
   1873     a mempool, Memcheck resets the redzones of any live chunks in the pool
   1874     to NOACCESS.
   1875     </para>
   1876   </listitem>
   1877 
   1878   <listitem>
   1879     <para><varname>VALGRIND_MEMPOOL_ALLOC(pool, addr, size)</varname>:
   1880     This request informs Memcheck that a <varname>size</varname>-byte chunk
   1881     has been allocated at <varname>addr</varname>, and associates the chunk with the
   1882     specified
   1883     <varname>pool</varname>. If the pool was created with nonzero
   1884     <varname>rzB</varname> redzones, Memcheck will mark the
   1885     <varname>rzB</varname> bytes before and after the chunk as NOACCESS. If
   1886     the pool was created with the <varname>is_zeroed</varname> argument set,
   1887     Memcheck will mark the chunk as DEFINED, otherwise Memcheck will mark
   1888     the chunk as UNDEFINED.
   1889     </para>
   1890   </listitem>
   1891 
   1892   <listitem>
   1893     <para><varname>VALGRIND_MEMPOOL_FREE(pool, addr)</varname>:
   1894     This request informs Memcheck that the chunk at <varname>addr</varname>
   1895     should no longer be considered allocated. Memcheck will mark the chunk
   1896     associated with <varname>addr</varname> as NOACCESS, and delete its
   1897     record of the chunk's existence.
   1898     </para>
   1899   </listitem>
   1900 
   1901   <listitem>
   1902     <para><varname>VALGRIND_MEMPOOL_TRIM(pool, addr, size)</varname>:
   1903     This request trims the chunks associated with <varname>pool</varname>.
   1904     The request only operates on chunks associated with
   1905     <varname>pool</varname>. Trimming is formally defined as:</para>
   1906     <itemizedlist>
   1907       <listitem>
   1908         <para> All chunks entirely inside the range
   1909         <varname>addr..(addr+size-1)</varname> are preserved.</para>
   1910       </listitem>
   1911       <listitem>
   1912         <para>All chunks entirely outside the range
   1913         <varname>addr..(addr+size-1)</varname> are discarded, as though
   1914         <varname>VALGRIND_MEMPOOL_FREE</varname> was called on them. </para>
   1915       </listitem>
   1916       <listitem>
   1917         <para>All other chunks must intersect with the range 
   1918         <varname>addr..(addr+size-1)</varname>; areas outside the
   1919         intersection are marked as NOACCESS, as though they had been
   1920         independently freed with
   1921         <varname>VALGRIND_MEMPOOL_FREE</varname>.</para>
   1922       </listitem>
   1923     </itemizedlist>
   1924     <para>This is a somewhat rare request, but can be useful in 
   1925     implementing the type of mass-free operations common in custom 
   1926     LIFO allocators.</para>
   1927   </listitem>
   1928 
   1929   <listitem>
   1930     <para><varname>VALGRIND_MOVE_MEMPOOL(poolA, poolB)</varname>: This
   1931     request informs Memcheck that the pool previously anchored at
   1932     address <varname>poolA</varname> has moved to anchor address
   1933     <varname>poolB</varname>.  This is a rare request, typically only needed
   1934     if you <function>realloc</function> the header of a mempool.</para>
   1935     <para>No memory-status bits are altered by this request.</para>
   1936   </listitem>
   1937 
   1938   <listitem>
   1939     <para>
   1940     <varname>VALGRIND_MEMPOOL_CHANGE(pool, addrA, addrB,
   1941     size)</varname>: This request informs Memcheck that the chunk
   1942     previously allocated at address <varname>addrA</varname> within
   1943     <varname>pool</varname> has been moved and/or resized, and should be
   1944     changed to cover the region <varname>addrB..(addrB+size-1)</varname>. This
   1945     is a rare request, typically only needed if you
   1946     <function>realloc</function> a superblock or wish to extend a chunk
   1947     without changing its memory-status bits.
   1948     </para>
   1949     <para>No memory-status bits are altered by this request.
   1950     </para>
   1951   </listitem>
   1952 
   1953   <listitem>
   1954     <para><varname>VALGRIND_MEMPOOL_EXISTS(pool)</varname>:
   1955     This request informs the caller whether or not Memcheck is currently 
   1956     tracking a mempool at anchor address <varname>pool</varname>. It
   1957     evaluates to 1 when there is a mempool associated with that address, 0
   1958     otherwise. This is a rare request, only useful in circumstances when
   1959     client code might have lost track of the set of active mempools.
   1960     </para>
   1961   </listitem>
   1962 
   1963 </itemizedlist>
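
<para>The three trimming rules given for
<varname>VALGRIND_MEMPOOL_TRIM</varname> above can be modelled in a few
lines of plain C.  This is only a sketch of the rules as stated: it does
not call into Valgrind, and the names <computeroutput>chunk</computeroutput>,
<computeroutput>trim_result</computeroutput> and
<computeroutput>trim_chunk</computeroutput> are hypothetical, not part of
any Valgrind header.  It assumes non-empty chunks and a non-empty trim
range.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: a model of the three trimming rules only. */
typedef enum { CHUNK_KEPT, CHUNK_DISCARDED, CHUNK_CLIPPED } trim_result;

typedef struct {
    uintptr_t start;  /* address of the first byte of the chunk */
    size_t    len;    /* chunk length in bytes, assumed > 0 */
} chunk;

/* Apply the trim range addr..(addr+size-1) to one chunk.
   On CHUNK_CLIPPED, *c is narrowed to the intersection; the bytes
   dropped from it are the ones that would become NOACCESS. */
static trim_result trim_chunk(chunk *c, uintptr_t addr, size_t size)
{
    uintptr_t c_lo = c->start, c_hi = c->start + c->len;  /* [c_lo, c_hi) */
    uintptr_t t_lo = addr,     t_hi = addr + size;        /* [t_lo, t_hi) */

    if (c_lo >= t_lo && c_hi <= t_hi)
        return CHUNK_KEPT;          /* entirely inside: preserved */
    if (c_hi <= t_lo || c_lo >= t_hi)
        return CHUNK_DISCARDED;     /* entirely outside: treated as freed */

    /* Partial overlap: keep only the intersection with the range. */
    uintptr_t lo = c_lo > t_lo ? c_lo : t_lo;
    uintptr_t hi = c_hi < t_hi ? c_hi : t_hi;
    c->start = lo;
    c->len   = (size_t)(hi - lo);
    return CHUNK_CLIPPED;
}
]]></programlisting>

<para>For example, trimming to the 50 bytes starting at address 100 keeps
a 10-byte chunk at 100, discards a 10-byte chunk at 200, and clips a
20-byte chunk at 140 down to its first 10 bytes.</para>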
   1964 
   1965 </sect1>
   1966 
   1967 
   1968 
   1969 
   1970 
   1971 
   1972 
   1973 <sect1 id="mc-manual.mpiwrap" xreflabel="MPI Wrappers">
   1974 <title>Debugging MPI Parallel Programs with Valgrind</title>
   1975 
   1976 <para>Memcheck supports debugging of distributed-memory applications
   1977 which use the MPI message passing standard.  This support consists of a
   1978 library of wrapper functions for the
   1979 <computeroutput>PMPI_*</computeroutput> interface.  When incorporated
   1980 into the application's address space, either by direct linking or by
   1981 <computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
   1982 calls to <computeroutput>PMPI_Send</computeroutput>,
   1983 <computeroutput>PMPI_Recv</computeroutput>, etc.  They then
   1984 use client requests to inform Memcheck of memory state changes caused
   1985 by the function being wrapped.  This reduces the number of false
   1986 positives that Memcheck otherwise typically reports for MPI
   1987 applications.</para>
   1988 
   1989 <para>The wrappers also take the opportunity to carefully check
   1990 size and definedness of buffers passed as arguments to MPI functions, hence
   1991 detecting errors such as passing undefined data to
   1992 <computeroutput>PMPI_Send</computeroutput>, or receiving data into a
   1993 buffer which is too small.</para>
   1994 
   1995 <para>Unlike most of the rest of Valgrind, the wrapper library is subject to a
   1996 BSD-style license, so you can link it into any code base you like.
   1997 See the top of <computeroutput>mpi/libmpiwrap.c</computeroutput>
   1998 for license details.</para>
   1999 
   2000 
   2001 <sect2 id="mc-manual.mpiwrap.build" xreflabel="Building MPI Wrappers">
   2002 <title>Building and installing the wrappers</title>
   2003 
   2004 <para> The wrapper library will be built automatically if possible.
   2005 Valgrind's configure script will look for a suitable
   2006 <computeroutput>mpicc</computeroutput> to build it with.  This must be
   2007 the same <computeroutput>mpicc</computeroutput> you use to build the
   2008 MPI application you want to debug.  By default, Valgrind tries
   2009 <computeroutput>mpicc</computeroutput>, but you can specify a
   2010 different one by using the configure-time option
   2011 <option>--with-mpicc</option>.  Currently the
   2012 wrappers are only buildable with
   2013 <computeroutput>mpicc</computeroutput>s which are based on GNU
   2014 GCC or Intel's C++ Compiler.</para>
   2015 
   2016 <para>Check that the configure script prints a line like this:</para>
   2017 
   2018 <programlisting><![CDATA[
   2019 checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
   2020 ]]></programlisting>
   2021 
   2022 <para>If it says <computeroutput>... no</computeroutput>, your
   2023 <computeroutput>mpicc</computeroutput> has failed to compile and link
   2024 a test MPI2 program.</para>
   2025 
   2026 <para>If the configure test succeeds, continue in the usual way with
   2027 <computeroutput>make</computeroutput> and <computeroutput>make
   2028 install</computeroutput>.  The final install tree should then contain
   2029 <computeroutput>libmpiwrap-&lt;platform&gt;.so</computeroutput>.
   2030 </para>
   2031 
<para>Compile a test MPI program (e.g. an MPI hello-world) and try
this:</para>
   2034 
   2035 <programlisting><![CDATA[
   2036 LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so   \
   2037            mpirun [args] $prefix/bin/valgrind ./hello
   2038 ]]></programlisting>
   2039 
<para>You should see something similar to the following:</para>
   2041 
   2042 <programlisting><![CDATA[
   2043 valgrind MPI wrappers 31901: Active for pid 31901
   2044 valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
   2045 ]]></programlisting>
   2046 
<para>repeated for every process in the group.  If you do not see
these, there is a build/installation problem of some kind.</para>
   2049 
   2050 <para> The MPI functions to be wrapped are assumed to be in an ELF
   2051 shared object with soname matching
   2052 <computeroutput>libmpi.so*</computeroutput>.  This is known to be
   2053 correct at least for Open MPI and Quadrics MPI, and can easily be
   2054 changed if required.</para> 
   2055 </sect2>
   2056 
   2057 
   2058 <sect2 id="mc-manual.mpiwrap.gettingstarted" 
   2059        xreflabel="Getting started with MPI Wrappers">
   2060 <title>Getting started</title>
   2061 
   2062 <para>Compile your MPI application as usual, taking care to link it
   2063 using the same <computeroutput>mpicc</computeroutput> that your
   2064 Valgrind build was configured with.</para>
   2065 
   2066 <para>
   2067 Use the following basic scheme to run your application on Valgrind with
   2068 the wrappers engaged:</para>
   2069 
   2070 <programlisting><![CDATA[
   2071 MPIWRAP_DEBUG=[wrapper-args]                                  \
   2072    LD_PRELOAD=$prefix/lib/valgrind/libmpiwrap-<platform>.so   \
   2073    mpirun [mpirun-args]                                       \
   2074    $prefix/bin/valgrind [valgrind-args]                       \
   2075    [application] [app-args]
   2076 ]]></programlisting>
   2077 
   2078 <para>As an alternative to
   2079 <computeroutput>LD_PRELOAD</computeroutput>ing
   2080 <computeroutput>libmpiwrap-&lt;platform&gt;.so</computeroutput>, you can
   2081 simply link it to your application if desired.  This should not disturb
   2082 native behaviour of your application in any way.</para>
   2083 </sect2>
   2084 
   2085 
   2086 <sect2 id="mc-manual.mpiwrap.controlling" 
   2087        xreflabel="Controlling the MPI Wrappers">
   2088 <title>Controlling the wrapper library</title>
   2089 
   2090 <para>Environment variable
   2091 <computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
   2092 startup.  The default behaviour is to print a starting banner</para>
   2093 
   2094 <programlisting><![CDATA[
   2095 valgrind MPI wrappers 16386: Active for pid 16386
   2096 valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
   2097 ]]></programlisting>
   2098 
   2099 <para> and then be relatively quiet.</para>
   2100 
   2101 <para>You can give a list of comma-separated options in
   2102 <computeroutput>MPIWRAP_DEBUG</computeroutput>.  These are</para>
   2103 
   2104 <itemizedlist>
   2105   <listitem>
   2106     <para><computeroutput>verbose</computeroutput>:
   2107     show entries/exits of all wrappers.  Also show extra
   2108     debugging info, such as the status of outstanding 
   2109     <computeroutput>MPI_Request</computeroutput>s resulting
   2110     from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
   2111   </listitem>
   2112   <listitem>
   2113     <para><computeroutput>quiet</computeroutput>: 
   2114     opposite of <computeroutput>verbose</computeroutput>, only print 
   2115     anything when the wrappers want
   2116     to report a detected programming error, or in case of catastrophic
   2117     failure of the wrappers.</para>
   2118   </listitem>
   2119   <listitem>
    <para><computeroutput>warn</computeroutput>: 
    by default, functions which lack proper wrappers
    are not commented on, just silently
    ignored.  With this option, a warning is printed for each unwrapped
    function used, up to a maximum of three warnings per function.</para>
   2125   </listitem>
   2126   <listitem>
   2127     <para><computeroutput>strict</computeroutput>: 
   2128     print an error message and abort the program if 
   2129     a function lacking a wrapper is used.</para>
   2130   </listitem>
   2131 </itemizedlist>
   2132 
   2133 <para> If you want to use Valgrind's XML output facility
   2134 (<option>--xml=yes</option>), you should pass
   2135 <computeroutput>quiet</computeroutput> in
   2136 <computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
   2137 extraneous printing from the wrappers.</para>
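
<para>As an illustration of the comma-separated format, the following
sketch turns an option string such as
<computeroutput>warn,quiet</computeroutput> into a set of flags.  It is
not taken from the wrapper library, which may parse the variable
differently; <computeroutput>wrap_opts</computeroutput> and
<computeroutput>parse_wrap_opts</computeroutput> are hypothetical
names, and only the four options listed above are recognised.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag set, one member per documented option. */
typedef struct {
    bool verbose, quiet, warn, strict;
} wrap_opts;

/* Parse a comma-separated option string such as "warn,quiet".
   Unknown names are ignored; NULL or "" leaves every flag unset. */
static wrap_opts parse_wrap_opts(const char *s)
{
    wrap_opts o = { false, false, false, false };
    while (s != NULL && *s != '\0') {
        size_t n = strcspn(s, ",");   /* length of the current token */
        if      (n == 7 && strncmp(s, "verbose", n) == 0) o.verbose = true;
        else if (n == 5 && strncmp(s, "quiet",   n) == 0) o.quiet   = true;
        else if (n == 4 && strncmp(s, "warn",    n) == 0) o.warn    = true;
        else if (n == 6 && strncmp(s, "strict",  n) == 0) o.strict  = true;
        s += n;
        if (*s == ',')
            s++;                      /* step over the separator */
    }
    return o;
}
]]></programlisting>

<para>A caller would typically pass the result of
<computeroutput>getenv("MPIWRAP_DEBUG")</computeroutput>, which is
<computeroutput>NULL</computeroutput> when the variable is not
set.</para>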
   2138 
   2139 </sect2>
   2140 
   2141 
   2142 <sect2 id="mc-manual.mpiwrap.limitations.functions" 
   2143        xreflabel="Functions: Abilities and Limitations">
   2144 <title>Functions</title>
   2145 
   2146 <para>All MPI2 functions except
   2147 <computeroutput>MPI_Wtick</computeroutput>,
   2148 <computeroutput>MPI_Wtime</computeroutput> and
   2149 <computeroutput>MPI_Pcontrol</computeroutput> have wrappers.  The
   2150 first two are not wrapped because they return a 
   2151 <computeroutput>double</computeroutput>, which Valgrind's
   2152 function-wrap mechanism cannot handle (but it could easily be
   2153 extended to do so).  <computeroutput>MPI_Pcontrol</computeroutput> cannot be
   2154 wrapped as it has variable arity: 
   2155 <computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
   2156 
   2157 <para>Most functions are wrapped with a default wrapper which does
   2158 nothing except complain or abort if it is called, depending on
   2159 settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
   2160 above.  The following functions have "real", do-something-useful
   2161 wrappers:</para>
   2162 
   2163 <programlisting><![CDATA[
   2164 PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
   2165 
   2166 PMPI_Recv PMPI_Get_count
   2167 
   2168 PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
   2169 
   2170 PMPI_Irecv
   2171 PMPI_Wait PMPI_Waitall
   2172 PMPI_Test PMPI_Testall
   2173 
   2174 PMPI_Iprobe PMPI_Probe
   2175 
   2176 PMPI_Cancel
   2177 
   2178 PMPI_Sendrecv
   2179 
   2180 PMPI_Type_commit PMPI_Type_free
   2181 
   2182 PMPI_Pack PMPI_Unpack
   2183 
   2184 PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
   2185 PMPI_Reduce PMPI_Allreduce PMPI_Op_create
   2186 
   2187 PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
   2188 
   2189 PMPI_Error_string
   2190 PMPI_Init PMPI_Initialized PMPI_Finalize
   2191 ]]></programlisting>
   2192 
   2193 <para> A few functions such as
   2194 <computeroutput>PMPI_Address</computeroutput> are listed as
   2195 <computeroutput>HAS_NO_WRAPPER</computeroutput>.  They have no wrapper
   2196 at all as there is nothing worth checking, and giving a no-op wrapper
   2197 would reduce performance for no reason.</para>
   2198 
<para> Note that the wrapper library can itself generate large
   2200 numbers of calls to the MPI implementation, especially when walking
   2201 complex types.  The most common functions called are
<computeroutput>PMPI_Type_extent</computeroutput>,
   2203 <computeroutput>PMPI_Type_get_envelope</computeroutput>,
   2204 <computeroutput>PMPI_Type_get_contents</computeroutput>, and
   2205 <computeroutput>PMPI_Type_free</computeroutput>.  </para>
   2206 </sect2>
   2207 
   2208 <sect2 id="mc-manual.mpiwrap.limitations.types" 
   2209        xreflabel="Types: Abilities and Limitations">
   2210 <title>Types</title>
   2211 
   2212 <para> MPI-1.1 structured types are supported, and walked exactly.
   2213 The currently supported combiners are
   2214 <computeroutput>MPI_COMBINER_NAMED</computeroutput>,
   2215 <computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
   2216 <computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
   2218 <computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
   2219 <computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
   2220 <computeroutput>MPI_COMBINER_STRUCT</computeroutput>.  This should
   2221 cover all MPI-1.1 types.  The mechanism (function
   2222 <computeroutput>walk_type</computeroutput>) should extend easily to
   2223 cover MPI2 combiners.</para>
   2224 
   2225 <para>MPI defines some named structured types
   2226 (<computeroutput>MPI_FLOAT_INT</computeroutput>,
   2227 <computeroutput>MPI_DOUBLE_INT</computeroutput>,
   2228 <computeroutput>MPI_LONG_INT</computeroutput>,
   2229 <computeroutput>MPI_2INT</computeroutput>,
   2230 <computeroutput>MPI_SHORT_INT</computeroutput>,
   2231 <computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
   2232 of some basic type and a C <computeroutput>int</computeroutput>.
   2233 Unfortunately the MPI specification makes it impossible to look inside
   2234 these types and see where the fields are.  Therefore these wrappers
   2235 assume the types are laid out as <computeroutput>struct { float val;
   2236 int loc; }</computeroutput> (for
   2237 <computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
   2238 accordingly.  This appears to be correct at least for Open MPI 1.0.2
   2239 and for Quadrics MPI.</para>
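
<para>The assumed layout can be written down directly in C.  The sketch
below is illustrative only: the type and function names are hypothetical,
and it shows the assumption for
<computeroutput>MPI_DOUBLE_INT</computeroutput>, where the compiler
usually inserts padding after the <computeroutput>int</computeroutput>.
Describing each field by its own offset and length keeps any padding
bytes out of the ranges that would be checked or marked.</para>

<programlisting><![CDATA[
#include <stddef.h>

/* Assumed layouts of two of the named pair types (an assumption,
   as the MPI specification does not expose the field positions). */
typedef struct { float  val; int loc; } float_int;   /* MPI_FLOAT_INT  */
typedef struct { double val; int loc; } double_int;  /* MPI_DOUBLE_INT */

/* Hypothetical helper type: one byte range inside a pair. */
typedef struct { size_t off; size_t len; } field_range;

/* Fill out[0] and out[1] with the ranges of val and loc in a
   double_int.  Padding bytes belong to neither range. */
static void double_int_fields(field_range out[2])
{
    out[0] = (field_range){ offsetof(double_int, val), sizeof(double) };
    out[1] = (field_range){ offsetof(double_int, loc), sizeof(int) };
}
]]></programlisting>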
   2240 
   2241 <para>If <computeroutput>strict</computeroutput> is an option specified 
   2242 in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
   2243 will abort if an unhandled type is encountered.  Otherwise, the 
   2244 application will print a warning message and continue.</para>
   2245 
   2246 <para>Some effort is made to mark/check memory ranges corresponding to
   2247 arrays of values in a single pass.  This is important for performance
   2248 since asking Valgrind to mark/check any range, no matter how small,
   2249 carries quite a large constant cost.  This optimisation is applied to
   2250 arrays of primitive types (<computeroutput>double</computeroutput>,
   2251 <computeroutput>float</computeroutput>,
   2252 <computeroutput>int</computeroutput>,
   2253 <computeroutput>long</computeroutput>, <computeroutput>long
   2254 long</computeroutput>, <computeroutput>short</computeroutput>,
   2255 <computeroutput>char</computeroutput>, and <computeroutput>long
   2256 double</computeroutput> on platforms where <computeroutput>sizeof(long
   2257 double) == 8</computeroutput>).  For arrays of all other types, the
   2258 wrappers handle each element individually and so there can be a very
   2259 large performance cost.</para>
   2260 
   2261 </sect2>
   2262 
   2263 
   2264 <sect2 id="mc-manual.mpiwrap.writingwrappers" 
   2265        xreflabel="Writing new MPI Wrappers">
   2266 <title>Writing new wrappers</title>
   2267 
   2268 <para>
   2269 For the most part the wrappers are straightforward.  The only
   2270 significant complexity arises with nonblocking receives.</para>
   2271 
<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
is told the receive buffer and returns immediately, giving a handle
(<computeroutput>MPI_Request</computeroutput>) for the transaction.
   2275 Later the user will have to poll for completion with
   2276 <computeroutput>MPI_Wait</computeroutput> etc, and when the
   2277 transaction completes successfully, the wrappers have to paint the
   2278 recv buffer.  But the recv buffer details are not presented to
   2279 <computeroutput>MPI_Wait</computeroutput> -- only the handle is.  The
   2280 library therefore maintains a shadow table which associates
   2281 uncompleted <computeroutput>MPI_Request</computeroutput>s with the
   2282 corresponding buffer address/count/type.  When an operation completes,
   2283 the table is searched for the associated address/count/type info, and
   2284 memory is marked accordingly.</para>
   2285 
   2286 <para>Access to the table is guarded by a (POSIX pthreads) lock, so as
   2287 to make the library thread-safe.</para>
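
<para>The idea of a lock-guarded shadow table can be sketched as follows.
This is not the code in
<computeroutput>mpi/libmpiwrap.c</computeroutput>; it is a minimal
model that compiles without <computeroutput>mpi.h</computeroutput>, so
<computeroutput>request_t</computeroutput> and
<computeroutput>datatype_t</computeroutput> are hypothetical stand-ins
for <computeroutput>MPI_Request</computeroutput> and
<computeroutput>MPI_Datatype</computeroutput>, and the function names
are invented for illustration.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MPI_Request and MPI_Datatype. */
typedef unsigned long request_t;
typedef int           datatype_t;

typedef struct {
    int        in_use;
    request_t  key;    /* handle returned by the nonblocking receive */
    void      *buf;    /* receive buffer given when the receive was posted */
    int        count;
    datatype_t type;
} shadow_entry;

static shadow_entry   *table      = NULL;
static size_t          table_size = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record the buffer details of an uncompleted request.
   Returns 0 on success, -1 if the table could not be grown. */
static int shadow_add(request_t key, void *buf, int count, datatype_t type)
{
    int    rc = -1;
    size_t i;

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < table_size && table[i].in_use; i++)
        ;                                   /* find a free slot */
    if (i == table_size) {                  /* none free: grow the table */
        size_t new_size = table_size ? 2 * table_size : 8;
        shadow_entry *t = realloc(table, new_size * sizeof *t);
        if (t == NULL)
            goto out;
        for (size_t j = table_size; j < new_size; j++)
            t[j].in_use = 0;
        table      = t;
        table_size = new_size;
    }
    table[i] = (shadow_entry){ 1, key, buf, count, type };
    rc = 0;
out:
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* On completion: find the request, copy its details to *out and
   release the slot.  Returns 1 if found, 0 otherwise. */
static int shadow_take(request_t key, shadow_entry *out)
{
    int found = 0;

    pthread_mutex_lock(&table_lock);
    for (size_t i = 0; i < table_size; i++) {
        if (table[i].in_use && table[i].key == key) {
            *out = table[i];
            table[i].in_use = 0;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}
]]></programlisting>

<para>In this model a wrapper for a nonblocking receive would call
<computeroutput>shadow_add</computeroutput> after the real call returns,
and a wrapper for a completion function would call
<computeroutput>shadow_take</computeroutput> and then mark the recorded
buffer as defined.</para>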
   2288 
   2289 <para>The table is allocated with
   2290 <computeroutput>malloc</computeroutput> and never
   2291 <computeroutput>free</computeroutput>d, so it will show up in leak
   2292 checks.</para>
   2293 
   2294 <para>Writing new wrappers should be fairly easy.  The source file is
   2295 <computeroutput>mpi/libmpiwrap.c</computeroutput>.  If possible,
   2296 find an existing wrapper for a function of similar behaviour to the
   2297 one you want to wrap, and use it as a starting point.  The wrappers
   2298 are organised in sections in the same order as the MPI 1.1 spec, to
   2299 aid navigation.  When adding a wrapper, remember to comment out the
   2300 definition of the default wrapper in the long list of defaults at the
   2301 bottom of the file (do not remove it, just comment it out).</para>
   2302 </sect2>
   2303 
   2304 <sect2 id="mc-manual.mpiwrap.whattoexpect" 
   2305        xreflabel="What to expect with MPI Wrappers">
   2306 <title>What to expect when using the wrappers</title>
   2307 
   2308 <para>The wrappers should reduce Memcheck's false-error rate on MPI
   2309 applications.  Because the wrapping is done at the MPI interface,
   2310 there will still potentially be a large number of errors reported in
   2311 the MPI implementation below the interface.  The best you can do is
   2312 try to suppress them.</para>
   2313 
   2314 <para>You may also find that the input-side (buffer
   2315 length/definedness) checks find errors in your MPI use, for example
   2316 passing too short a buffer to
   2317 <computeroutput>MPI_Recv</computeroutput>.</para>
   2318 
   2319 <para>Functions which are not wrapped may increase the false
   2320 error rate.  A possible approach is to run with
<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
   2322 <computeroutput>warn</computeroutput>.  This will show you functions
   2323 which lack proper wrappers but which are nevertheless used.  You can
   2324 then write wrappers for them.
   2325 </para>
   2326 
<para>A known source of potential false errors is the
   2328 <computeroutput>PMPI_Reduce</computeroutput> family of functions, when
   2329 using a custom (user-defined) reduction function.  In a reduction
   2330 operation, each node notionally sends data to a "central point" which
   2331 uses the specified reduction function to merge the data items into a
   2332 single item.  Hence, in general, data is passed between nodes and fed
   2333 to the reduction function, but the wrapper library cannot mark the
   2334 transferred data as initialised before it is handed to the reduction
   2335 function, because all that happens "inside" the
   2336 <computeroutput>PMPI_Reduce</computeroutput> call.  As a result you
   2337 may see false positives reported in your reduction function.</para>
   2338 
   2339 </sect2>
   2340 
   2341 </sect1>
   2342 
   2343 
   2344 
   2345 
   2346 
   2347 </chapter>
   2348