<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
  <title>Helgrind: a thread error detector</title>

<para>To use this tool, you must specify
<option>--tool=helgrind</option> on the Valgrind
command line.</para>


<sect1 id="hg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>Helgrind is a Valgrind tool for detecting synchronisation errors
in C, C++ and Fortran programs that use the POSIX pthreads
threading primitives.</para>

<para>The main abstractions in POSIX pthreads are: a set of threads
sharing a common address space, thread creation, thread joining,
thread exit, mutexes (locks), condition variables (inter-thread event
notifications), reader-writer locks, spinlocks, semaphores and
barriers.</para>

<para>Helgrind can detect three classes of errors, which are discussed
in detail in the next three sections:</para>

<orderedlist>
 <listitem>
  <para><link linkend="hg-manual.api-checks">
        Misuses of the POSIX pthreads API.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.lock-orders">
        Potential deadlocks arising from lock
        ordering problems.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.data-races">
        Data races -- accessing memory without adequate locking
                      or synchronisation</link>.
  </para>
 </listitem>
</orderedlist>

<para>Problems like these often result in unreproducible,
timing-dependent crashes, deadlocks and other misbehaviour, and
can be difficult to find by other means.</para>

<para>Helgrind is aware of all the pthread abstractions and tracks
their effects as accurately as it can.  On x86 and amd64 platforms, it
understands and partially handles implicit locking arising from the
use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
platforms, it partially handles implicit locking arising from
load-linked and store-conditional instruction pairs.
</para>

<para>Helgrind works best when your application uses only the POSIX
pthreads API.  However, if you want to use custom threading
primitives, you can describe their behaviour to Helgrind using the
<varname>ANNOTATE_*</varname> macros defined
in <varname>helgrind.h</varname>.</para>



<para>Following the three sections on detected errors is a section containing
<link linkend="hg-manual.effective-use">
hints and tips on how to get the best out of Helgrind.</link>
</para>

<para>Then there is a
<link linkend="hg-manual.options">summary of command-line
options.</link>
</para>

<para>Finally, there is
<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
could be improved.</link>
</para>

</sect1>




<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
<title>Detected errors: Misuses of the POSIX pthreads API</title>

<para>Helgrind intercepts calls to many POSIX pthreads functions, and
is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
program behaviour and hard-to-find bugs later on.  The detected errors
are:</para>

<itemizedlist>
 <listitem><para>unlocking an invalid mutex</para></listitem>
 <listitem><para>unlocking a not-locked mutex</para></listitem>
 <listitem><para>unlocking a mutex held by a different
                 thread</para></listitem>
 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
 <listitem><para>deallocation of memory that contains a
                 locked mutex</para></listitem>
 <listitem><para>passing mutex arguments to functions expecting
                 reader-writer lock arguments, and vice
                 versa</para></listitem>
 <listitem><para>when a POSIX pthread function fails with an
                 error code that must be handled</para></listitem>
 <listitem><para>when a thread exits whilst still holding locked
                 locks</para></listitem>
 <listitem><para>calling <function>pthread_cond_wait</function>
                 with a not-locked mutex, an invalid mutex,
                 or one locked by a different
                 thread</para></listitem>
 <listitem><para>inconsistent bindings between condition
                 variables and their associated mutexes</para></listitem>
 <listitem><para>invalid or duplicate initialisation of a pthread
                 barrier</para></listitem>
 <listitem><para>initialisation of a pthread barrier on which threads
                 are still waiting</para></listitem>
 <listitem><para>destruction of a pthread barrier object which was
                 never initialised, or on which threads are still
                 waiting</para></listitem>
 <listitem><para>waiting on an uninitialised pthread
                 barrier</para></listitem>
 <listitem><para>for all of the pthreads functions that Helgrind
                 intercepts, an error is reported, along with a stack
                 trace, if the system threading library routine returns
                 an error code, even if Helgrind itself detected no
                 error</para></listitem>
</itemizedlist>

<para>Checks pertaining to the validity of mutexes are generally also
performed for reader-writer locks.</para>

<para>Various kinds of this-can't-possibly-happen events are also
reported.  These usually indicate bugs in the system threading
library.</para>

<para>Reported errors always contain a primary stack trace indicating
where the error was detected.  They may also contain auxiliary stack
traces giving additional information.  In particular, most errors
relating to mutexes will also tell you where that mutex first came to
Helgrind's attention (the "<computeroutput>was first observed
at</computeroutput>" part), so you have a chance of figuring out which
mutex it is referring to.  For example:</para>

<programlisting><![CDATA[
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
]]></programlisting>

<para>Helgrind has a way of summarising thread identities, as
you see here with the text "<computeroutput>Thread
#1</computeroutput>".  This is so that it can speak about threads and
sets of threads without overwhelming you with details.  See
<link linkend="hg-manual.data-races.errmsgs">below</link>
for more information on interpreting error messages.</para>
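
<para>As a hedged illustration of one of the misuses listed above, the
following sketch deliberately unlocks a mutex that was never locked.
It is not the <computeroutput>tc09_bad_unlock.c</computeroutput> test;
the function name <function>unlock_without_lock</function> is
hypothetical, and an error-checking mutex is assumed so that the
threading library returns an error code rather than exhibiting
undefined behaviour.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Unlocks a mutex that was never locked and returns the error code
   reported by the threading library.  An error-checking mutex is
   assumed so that the misuse is reported instead of being undefined. */
int unlock_without_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mx;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mx, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_unlock(&mx);   /* misuse: mx is not locked */

    pthread_mutex_destroy(&mx);
    return rc;                        /* EPERM for an error-checking mutex */
}
```
]]></programlisting>

<para>When a program calling this function is run with
<option>--tool=helgrind</option>, a report of the "unlocked a
not-locked lock" kind would be expected, and possibly a further report
because the library call returned an error code.  The exact output
depends on the Valgrind version and platform.</para>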

</sect1>




<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
<title>Detected errors: Inconsistent Lock Orderings</title>

<para>In this section, and in general, to "acquire" a lock simply
means to lock that lock, and to "release" a lock means to unlock
it.</para>

<para>Helgrind monitors the order in which threads acquire locks.
This allows it to detect potential deadlocks which could arise from
the formation of cycles of locks.  Detecting such inconsistencies is
useful because, whilst actual deadlocks are fairly obvious, potential
deadlocks may never be discovered during testing and could later lead
to hard-to-diagnose in-service failures.</para>

<para>The simplest example of such a problem is as
follows.</para>

<itemizedlist>
 <listitem><para>Imagine some shared resource R, which, for whatever
  reason, is guarded by two locks, L1 and L2, which must both be held
  when R is accessed.</para>
 </listitem>
 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
  to access R.  The implication of this is that all threads in the
  program must acquire the two locks in the order first L1 then L2.
  Not doing so risks deadlock.</para>
 </listitem>
 <listitem><para>The deadlock could happen if two threads -- call them
  T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
  and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
  to acquire L1, but those locks are both already held.  So T1 and T2
  become deadlocked.</para>
 </listitem>
</itemizedlist>
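
<para>The scenario above can be sketched in C as follows.  This is a
minimal illustration, not one of Helgrind's own tests: the names
<varname>l1</varname>, <varname>l2</varname>, <varname>r</varname> and
<function>touch_r_in_both_orders</function> are hypothetical.  Both
orders are exercised from a single thread, so the code itself cannot
deadlock, but the two acquisition orders contradict each other.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

static pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t l2 = PTHREAD_MUTEX_INITIALIZER;
static int r;   /* the shared resource R, guarded by both locks */

/* Acquires the locks as L1 then L2, and later as L2 then L1.
   Run in a single thread this cannot deadlock, but the two
   acquisition orders are inconsistent with each other. */
int touch_r_in_both_orders(void)
{
    pthread_mutex_lock(&l1);      /* establishes "L1 before L2" */
    pthread_mutex_lock(&l2);
    r++;
    pthread_mutex_unlock(&l2);
    pthread_mutex_unlock(&l1);

    pthread_mutex_lock(&l2);      /* contradicts it: L2 before L1 */
    pthread_mutex_lock(&l1);
    r++;
    pthread_mutex_unlock(&l1);
    pthread_mutex_unlock(&l2);

    return r;
}
```
]]></programlisting>

<para>If the two halves of this function ran in different threads at
the same time, each could end up holding one lock while waiting for
the other, which is exactly the deadlock described above.</para>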

<para>Helgrind builds a directed graph indicating the order in which
locks have been acquired in the past.  When a thread acquires a new
lock, the graph is updated, and then checked to see if it now contains
a cycle.  The presence of a cycle indicates a potential deadlock involving
the locks in the cycle.</para>
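
<para>The graph check just described can be sketched in a few lines of
C.  This is an illustration only and makes no claim about Helgrind's
actual data structures: locks are identified by small integers, and
<varname>MAX_LOCKS</varname>, <function>reachable</function> and
<function>note_acquisition</function> are hypothetical names.  An edge
from A to B records that B was acquired while A was held; a new edge
closes a cycle if its source is already reachable from its
target.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>

#define MAX_LOCKS 8

/* edge[a][b] is true once lock b has been acquired while a was held */
static bool edge[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is there a path from lock 'from' to lock 'to'? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (edge[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Records that 'acquired' was taken while 'held' was already held.
   Returns true if the new edge closes a cycle, that is, if 'held'
   was already reachable from 'acquired'. */
bool note_acquisition(int held, int acquired)
{
    bool seen[MAX_LOCKS] = { false };
    bool cycle = reachable(acquired, held, seen);
    edge[held][acquired] = true;
    return cycle;
}
```
]]></programlisting>

<para>Recording the acquisitions (0 then 1), (1 then 2) and finally
(2 then 0) makes the third call return true, because lock 0 would then
transitively have to be acquired before itself.</para>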

<para>In general, Helgrind will choose two locks involved in the cycle
and show you how their acquisition ordering has become inconsistent.
It does this by showing the program points that first defined the
ordering, and the program points which later violated it.  Here is a
simple example involving just two locks:</para>

<programlisting><![CDATA[
Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated

Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400825: main (tc13_laog1.c:23)

 followed by a later acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400853: main (tc13_laog1.c:24)

Required order was established by acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40076D: main (tc13_laog1.c:17)

 followed by a later acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40079B: main (tc13_laog1.c:18)
]]></programlisting>

<para>When there are more than two locks in the cycle, the error is
equally serious.  However, at present Helgrind does not show the locks
involved, sometimes because that information is not available, but
also so as to avoid flooding you with information.  For example, a
naive implementation of the famous Dining Philosophers problem
involves a cycle of five locks
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>

<programlisting><![CDATA[
Thread #6: lock order "0x80499A0 before 0x8049A00" violated

Observed (incorrect) order is: acquisition of lock at 0x8049A00
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485B4: dine (tc14_laog_dinphils.c:18)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)

 followed by a later acquisition of lock at 0x80499A0
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485CD: dine (tc14_laog_dinphils.c:19)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)
]]></programlisting>
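
<para>One common way to remove such a cycle is to make every thread
acquire the locks in a single global order.  The sketch below applies
that idea to the philosophers: each one picks up the lower-numbered
fork first.  It is a hypothetical example, not the code of
<computeroutput>tc14_laog_dinphils.c</computeroutput>; the names
<function>dine_ordered</function>, <function>run_dinner</function>,
<varname>N_PHIL</varname> and <varname>N_MEALS</varname> are
assumptions made for illustration.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

#define N_PHIL  5
#define N_MEALS 1000

static pthread_mutex_t fork_mx[N_PHIL];
static int meals[N_PHIL];   /* meals[i] is written only by philosopher i */

/* Each philosopher picks up the lower-numbered fork first, so every
   thread acquires the fork locks in the same global order. */
static void *dine_ordered(void *arg)
{
    int id = (int)(long)arg;
    int left = id, right = (id + 1) % N_PHIL;
    int first = left < right ? left : right;
    int second = left < right ? right : left;

    for (int i = 0; i < N_MEALS; i++) {
        pthread_mutex_lock(&fork_mx[first]);
        pthread_mutex_lock(&fork_mx[second]);
        meals[id]++;                           /* eat */
        pthread_mutex_unlock(&fork_mx[second]);
        pthread_mutex_unlock(&fork_mx[first]);
    }
    return NULL;
}

/* Runs all philosophers to completion; returns the total meals eaten. */
int run_dinner(void)
{
    pthread_t th[N_PHIL];
    int total = 0;

    for (int i = 0; i < N_PHIL; i++) {
        pthread_mutex_init(&fork_mx[i], NULL);
        meals[i] = 0;
    }
    for (long i = 0; i < N_PHIL; i++)
        pthread_create(&th[i], NULL, dine_ordered, (void *)i);
    for (int i = 0; i < N_PHIL; i++)
        pthread_join(th[i], NULL);
    for (int i = 0; i < N_PHIL; i++) {
        total += meals[i];
        pthread_mutex_destroy(&fork_mx[i]);
    }
    return total;
}
```
]]></programlisting>

<para>With this ordering the last philosopher takes fork 0 before
fork 4, so the "everyone holds their left fork" situation cannot
arise, and the lock order graph should contain no cycle.</para>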

</sect1>




<sect1 id="hg-manual.data-races" xreflabel="Data Races">
<title>Detected errors: Data Races</title>

<para>A data race happens, or could happen, when two threads access a
shared memory location without using suitable locks or other
synchronisation to ensure single-threaded access.  Such missing
locking can cause obscure timing dependent bugs.  Ensuring programs
are race-free is one of the central difficulties of threaded
programming.</para>

<para>Reliably detecting races is a difficult problem, and most
of Helgrind's internals are devoted to dealing with it.
We begin with a simple example.</para>


<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
<title>A Simple Data Race</title>

<para>The following program is about the simplest possible example of a
race.  It is impossible to know what the value
of <computeroutput>var</computeroutput> will be at the end of the program.
Is it 2?  Or 1?</para>

<programlisting><![CDATA[
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
]]></programlisting>

<para>The problem is that there is nothing to
stop <varname>var</varname> being updated simultaneously
by both threads.  A correct program would
protect <varname>var</varname> with a lock of type
<type>pthread_mutex_t</type>, which is acquired
before each access and released afterwards.  Helgrind's output for
this program is:</para>

<programlisting><![CDATA[
Thread #1 is the program's root thread

Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x400605: main (simple_race.c:12)

Possible data race during read of size 4 at 0x601038 by thread #1
Locks held: none
   at 0x400606: main (simple_race.c:13)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x4005DC: child_fn (simple_race.c:6)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601038 is 0 bytes inside global var "var"
declared at simple_race.c:3
]]></programlisting>

<para>This is quite a lot of detail for an apparently simple error.
The main error message is the part beginning "<computeroutput>Possible
data race during read</computeroutput>".  It says there is a race as
a result of a read of size 4 (bytes), at 0x601038, which is the
address of <computeroutput>var</computeroutput>, happening in
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>

<para>Two important parts of the message are:</para>

<itemizedlist>
 <listitem>
  <para>Helgrind shows two stack traces for the error, not one.  By
   definition, a race involves two different threads accessing the
   same location in such a way that the result depends on the relative
   speeds of the two threads.</para>
  <para>
   The first stack trace follows the text "<computeroutput>Possible
   data race during read of size 4 ...</computeroutput>" and the
   second trace follows the text "<computeroutput>This conflicts with
   a previous write of size 4 ...</computeroutput>".  Helgrind is
   usually able to show both accesses involved in a race.  At least
   one of these will be a write (since two concurrent, unsynchronised
   reads are harmless), and they will of course be from different
   threads.</para>
  <para>By examining your program at the two locations, you should be
   able to get at least some idea of what the root cause of the
   problem is.  For each location, Helgrind shows the set of locks
   held at the time of the access.  This often makes it clear which
   thread, if any, failed to take a required lock.  In this example
   neither thread holds a lock during the access.</para>
 </listitem>
 <listitem>
  <para>For races which occur on global or stack variables, Helgrind
   tries to identify the name and defining point of the variable.
   Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
   global var "var" declared at simple_race.c:3</computeroutput>".</para>
  <para>Showing names of stack and global variables carries no
   run-time overhead once Helgrind has your program up and running.
   However, it does require Helgrind to spend considerable extra time
   and memory at program startup to read the relevant debug info.
   Hence this facility is disabled by default.  To enable it, you need
   to give the <varname>--read-var-info=yes</varname> option to
   Helgrind.</para>
 </listitem>
</itemizedlist>
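
<para>The fix mentioned earlier, protecting
<varname>var</varname> with a <type>pthread_mutex_t</type>, might look
like the sketch below.  The names <varname>var_mx</varname> and
<function>run_locked_increments</function> are hypothetical; the
structure otherwise follows the racy program, with every access to
<varname>var</varname> made while the lock is held.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

static int var = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER;

static void *child_fn(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&var_mx);
    var++;                         /* protected relative to parent */
    pthread_mutex_unlock(&var_mx);
    return NULL;
}

/* Same structure as the racy program, but every concurrent access to
   var is made with var_mx held.  Returns the final value of var. */
int run_locked_increments(void)
{
    pthread_t child;

    pthread_create(&child, NULL, child_fn, NULL);
    pthread_mutex_lock(&var_mx);
    var++;                         /* protected relative to child */
    pthread_mutex_unlock(&var_mx);
    pthread_join(child, NULL);

    return var;                    /* ordered after the child by the join */
}
```
]]></programlisting>

<para>Both increments now take effect regardless of scheduling, so the
final value is 2, and no race on <varname>var</varname> would be
expected in Helgrind's output.</para>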

<para>The following section explains Helgrind's race detection
algorithm in more detail.</para>

</sect2>



<sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
<title>Helgrind's Race Detection Algorithm</title>

<para>Most programmers think about threaded programming in terms of
the basic functionality provided by the threading library (POSIX
Pthreads): thread creation, thread joining, locks, condition
variables, semaphores and barriers.</para>

<para>The effect of using these functions is to impose
constraints upon the order in which memory accesses can
happen.  This implied ordering is generally known as the
"happens-before relation".  Once you understand the happens-before
relation, it is easy to see how Helgrind finds races in your code.
Fortunately, the happens-before relation is itself easy to understand,
and is by itself a useful tool for reasoning about the behaviour of
parallel programs.  We now introduce it using a simple example.</para>

<para>Consider first the following buggy program:</para>

<programlisting><![CDATA[
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;                              var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
]]></programlisting>

<para>The parent thread creates a child.  Both then write different
values to some variable <computeroutput>var</computeroutput>, and the
parent then waits for the child to exit.</para>

<para>What is the value of <computeroutput>var</computeroutput> at the
end of the program, 10 or 20?  We don't know.  The program is
considered buggy (it has a race) because the final value
of <computeroutput>var</computeroutput> depends on the relative rates
of progress of the parent and child threads.  If the parent is fast
and the child is slow, then the child's assignment may happen later,
so the final value will be 10; and vice versa if the child is faster
than the parent.</para>

<para>The relative rates of progress of parent and child are not something
the programmer can control, and will often change from run to run.
They depend on factors such as the load on the machine, what else is
running, and the kernel's scheduling strategy.</para>

<para>The obvious fix is to use a lock to
protect <computeroutput>var</computeroutput>.  It is however
instructive to consider a somewhat more abstract solution, which is to
send a message from one thread to the other:</para>

<programlisting><![CDATA[
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;
// send message to child
                                       // wait for message to arrive
                                       var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
]]></programlisting>

<para>Now the program reliably prints "10", regardless of the speed of
the threads.  Why?  Because the child's assignment cannot happen until
after it receives the message.  And the message is not sent until
after the parent's assignment is done.</para>

<para>The message transmission creates a "happens-before" dependency
between the two assignments: <computeroutput>var = 20;</computeroutput>
must now happen-before <computeroutput>var = 10;</computeroutput>.
And so there is no longer a race
on <computeroutput>var</computeroutput>.
</para>

<para>Note that it's not significant that the parent sends a message
to the child.  Sending a message from the child (after its assignment)
to the parent (before its assignment) would also fix the problem, causing
the program to reliably print "20".</para>
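
<para>The parent-to-child message in the diagram above could be
realised in several ways.  The sketch below assumes an unnamed POSIX
semaphore as the "message", on a platform where
<function>sem_init</function> is available; the names
<varname>msg</varname> and <function>run_message_passing</function>
are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int var;
static sem_t msg;   /* stands in for the "message" in the diagram */

static void *child_fn(void *arg)
{
    (void)arg;
    /* wait for message to arrive, retrying if interrupted by a signal */
    while (sem_wait(&msg) == -1 && errno == EINTR)
        continue;
    var = 10;
    return NULL;
}

/* Returns the final value of var. */
int run_message_passing(void)
{
    pthread_t child;

    sem_init(&msg, 0, 0);          /* initial count 0: no message yet */
    pthread_create(&child, NULL, child_fn, NULL);
    var = 20;
    sem_post(&msg);                /* send message to child */
    pthread_join(child, NULL);     /* wait for child */
    sem_destroy(&msg);
    return var;
}
```
]]></programlisting>

<para>Because the child cannot pass <function>sem_wait</function>
until the parent has called <function>sem_post</function>, the
assignment of 20 always precedes the assignment of 10, and the
function returns 10.</para>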

<para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
accesses to memory locations.  If a location (in this example,
<computeroutput>var</computeroutput>)
is accessed by two different threads, Helgrind checks to see if the
two accesses are ordered by the happens-before relation.  If so,
that's fine; if not, it reports a race.</para>

<para>It is important to understand that the happens-before relation
creates only a partial ordering, not a total ordering.  An example of
a total ordering is comparison of numbers: for any two numbers
<computeroutput>x</computeroutput> and
<computeroutput>y</computeroutput>,
<computeroutput>x</computeroutput> is either less than, equal to, or greater
than
<computeroutput>y</computeroutput>.  A partial ordering is like a
total ordering, but it can also express the concept that two elements
are neither equal, less nor greater, but merely unordered with respect
to each other.</para>

<para>In the fixed example above, we say that
<computeroutput>var = 20;</computeroutput> "happens-before"
<computeroutput>var = 10;</computeroutput>.  But in the original
version, they are unordered: we cannot say that either happens-before
the other.</para>
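
<para>One textbook way to make "ordered" versus "unordered" concrete
is a vector clock, which keeps one logical counter per thread.  The
sketch below uses that technique purely for illustration; nothing in
this manual states that Helgrind represents the relation this way, and
<type>vclock</type>, <function>happens_before</function> and
<function>unordered</function> are hypothetical names.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>

#define N_THREADS 2   /* index 0: parent, index 1: child */

/* A vector clock: one logical counter per thread.  Assumed here for
   illustration only. */
typedef struct { unsigned c[N_THREADS]; } vclock;

/* True if no component of a exceeds the matching component of b and
   the clocks differ, i.e. a happens-before b. */
bool happens_before(const vclock *a, const vclock *b)
{
    bool strictly_less = false;
    for (int i = 0; i < N_THREADS; i++) {
        if (a->c[i] > b->c[i])
            return false;
        if (a->c[i] < b->c[i])
            strictly_less = true;
    }
    return strictly_less;
}

/* True if, for two distinct events, neither happens-before the other. */
bool unordered(const vclock *a, const vclock *b)
{
    return !happens_before(a, b) && !happens_before(b, a);
}
```
]]></programlisting>

<para>Suppose the parent's write carries the clock (2,0).  In the
buggy program the child's write might carry (1,1), which is neither
below nor above (2,0), so the two writes are unordered.  In the fixed
program the child's clock absorbs the parent's when the message
arrives, giving for instance (3,2), and the parent's write then
happens-before the child's.</para>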

<para>What does it mean to say that two accesses from different
threads are ordered by the happens-before relation?  It means that
there is some chain of inter-thread synchronisation operations which
cause those accesses to happen in a particular order, irrespective of
the actual rates of progress of the individual threads.  This is a
required property for a reliable threaded program, which is why
Helgrind checks for it.</para>

<para>The happens-before relations created by standard threading
primitives are as follows:</para>

<itemizedlist>
 <listitem><para>When a mutex is unlocked by thread T1 and later (or
  immediately) locked by thread T2, then the memory accesses in T1
  prior to the unlock must happen-before those in T2 after it acquires
  the lock.</para>
 </listitem>
 <listitem><para>The same idea applies to reader-writer locks,
  although with some complication so as to allow correct handling of
  reads vs writes.</para>
 </listitem>
 <listitem><para>When a condition variable (CV) is signalled on by
  thread T1 and some other thread T2 is thereby released from a wait
  on the same CV, then the memory accesses in T1 prior to the
  signalling must happen-before those in T2 after it returns from the
  wait.  If no thread was waiting on the CV then there is no
  effect.</para>
 </listitem>
 <listitem><para>If instead T1 broadcasts on a CV, then all of the
  waiting threads, rather than just one of them, acquire a
  happens-before dependency on the broadcasting thread at the point it
  did the broadcast.</para>
 </listitem>
 <listitem><para>A thread T2 that continues after completing
  <function>sem_wait</function> on a semaphore that thread T1 posts on
  acquires a happens-before dependence on the posting thread, a bit like
  the dependencies caused by mutex unlock-lock pairs.  However, since a
  semaphore can be posted on many times, it is unspecified from which of
  the post calls the wait call gets its happens-before dependency.</para>
 </listitem>
 <listitem><para>For a group of threads T1 .. Tn which arrive at a
  barrier and then move on, each thread after the call has a
  happens-after dependency from all threads before the
  barrier.</para>
 </listitem>
 <listitem><para>A newly-created child thread acquires an initial
  happens-after dependency on the point where its parent created it.
  That is, all memory accesses performed by the parent prior to
  creating the child are regarded as happening-before all the accesses
  of the child.</para>
 </listitem>
 <listitem><para>Similarly, when an exiting thread is reaped via a
  call to <function>pthread_join</function>, once the call returns, the
  reaping thread acquires a happens-after dependency relative to all memory
  accesses made by the exiting thread.</para>
 </listitem>
</itemizedlist>
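
<para>The last two rules in the list above, thread creation and
<function>pthread_join</function>, are enough on their own to order
accesses without any lock.  The following minimal sketch shows this;
the names <varname>shared</varname> and
<function>run_create_join</function> are hypothetical.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>

static int shared;

static void *child_fn(void *arg)
{
    (void)arg;
    shared = shared + 1;   /* ordered after the parent's write by creation */
    return NULL;
}

/* No locks are used: thread creation and pthread_join alone order
   all accesses to 'shared'.  Returns its final value. */
int run_create_join(void)
{
    pthread_t child;

    shared = 41;                                   /* before creation */
    pthread_create(&child, NULL, child_fn, NULL);
    pthread_join(child, NULL);
    return shared;                                 /* after the join */
}
```
]]></programlisting>

<para>The parent writes before creating the child and reads only after
joining it, so every pair of accesses is ordered and the function
returns 42.  Moving the parent's read to before the join would leave
it unordered with respect to the child's write.</para>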

<para>In summary: Helgrind intercepts the events listed above, and builds a
directed acyclic graph representing the collective happens-before
dependencies.  It also monitors all memory accesses.</para>

<para>If a location is accessed by two different threads, but Helgrind
cannot find any path through the happens-before graph from one access
to the other, then it reports a race.</para>

<para>There are a couple of caveats:</para>

<itemizedlist>
 <listitem><para>Helgrind doesn't check for a race in the case where
  both accesses are reads.  That would be silly, since concurrent
  reads are harmless.</para>
 </listitem>
 <listitem><para>Two accesses are considered to be ordered by the
  happens-before dependency even through arbitrarily long chains of
  synchronisation events.  For example, if T1 accesses some location
  L, and then signals T2 using <function>pthread_cond_signal</function>,
  and T2 later signals T3 in the same way, and T3 then accesses L, then
  a suitable happens-before dependency exists between the two
  accesses to L, even though it involves two different inter-thread
  synchronisation events.</para>
 </listitem>
</itemizedlist>
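
<para>The path check and the two caveats above can be sketched as a
small reachability test over a graph of events.  This is a conceptual
illustration under stated assumptions, not Helgrind's implementation:
events are numbered in the order they are recorded, all accesses are
assumed to touch the same location, there is no bounds checking, and
the names <type>event</type>, <function>add_event</function>,
<function>add_sync</function> and <function>is_race</function> are
hypothetical.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>

#define MAX_EVENTS 16

typedef struct { int thread; bool is_write; } event;

static event events[MAX_EVENTS];
static bool hb_edge[MAX_EVENTS][MAX_EVENTS];  /* hb_edge[a][b]: a before b */
static int n_events;

/* Adds an event and links it after the previous event of the same
   thread (program order).  Returns the new event's index. */
int add_event(int thread, bool is_write)
{
    int id = n_events++;
    events[id].thread = thread;
    events[id].is_write = is_write;
    for (int prev = id - 1; prev >= 0; prev--)
        if (events[prev].thread == thread) {
            hb_edge[prev][id] = true;
            break;
        }
    return id;
}

/* Records a synchronisation edge such as signal -> return from wait.
   Assumes from < to, i.e. the earlier event was recorded first. */
void add_sync(int from, int to) { hb_edge[from][to] = true; }

/* Is there a chain of edges from 'from' to 'to'?  Edges only point
   from lower to higher indices, so only those are searched. */
static bool path(int from, int to)
{
    if (from == to)
        return true;
    for (int next = from + 1; next <= to; next++)
        if (hb_edge[from][next] && path(next, to))
            return true;
    return false;
}

/* Two accesses race if they come from different threads, at least
   one is a write, and neither reaches the other in the graph. */
bool is_race(int a, int b)
{
    if (events[a].thread == events[b].thread)
        return false;
    if (!events[a].is_write && !events[b].is_write)
        return false;
    return !path(a, b) && !path(b, a);
}
```
]]></programlisting>

<para>Modelling the chain from the second caveat (T1 writes and
signals T2, T2 signals T3, T3 writes) yields a path from the first
write to the last, so no race is reported.  A write from a further
thread with no synchronisation edge is reported as racing with T1's
write, while two unsynchronised reads are not.</para>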

</sect2>



<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
<title>Interpreting Race Error Messages</title>

<para>Helgrind's race detection algorithm collects a lot of
information, and tries to present it in a helpful way when a race is
detected.  Here's an example:</para>

<programlisting><![CDATA[
Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Thread #3 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Possible data race during read of size 4 at 0x601070 by thread #3
Locks held: none
   at 0x40087A: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x400883: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601070 is 0 bytes inside local var "unprotected2"
declared at tc21_pthonce.c:51, in frame #0 of thread 3
]]></programlisting>

<para>Helgrind first announces the creation points of any threads
referenced in the error message.  This is so it can speak concisely
about threads without repeatedly printing their creation point call
stacks.  Each thread is only ever announced once, the first time it
appears in any Helgrind error message.</para>

<para>The main error message begins at the text
"<computeroutput>Possible data race during read</computeroutput>".  At
the start is information you would expect to see -- address and size
of the racing access, whether a read or a write, and the call stack at
the point it was detected.</para>

    660 <para>A second call stack is presented starting at the text
    661 "<computeroutput>This conflicts with a previous
    662 write</computeroutput>".  This shows a previous access which also
    663 accessed the stated address, and which is believed to be racing
    664 against the access in the first call stack. Note that this second
    665 call stack is limited to a maximum of 8 entries in order to reduce
    666 memory usage.</para>
    667 
    668 <para>Finally, Helgrind may attempt to give a description of the
    669 raced-on address in source level terms.  In this example, it
    670 identifies it as a local variable, shows its name, declaration point,
    671 and in which frame (of the first call stack) it lives.  Note that this
    672 information is only shown when <varname>--read-var-info=yes</varname>
    673 is specified on the command line.  That's because reading the DWARF3
    674 debug information in enough detail to capture variable type and
    675 location information makes Helgrind much slower at startup, and also
    676 requires considerable amounts of memory, for large programs.
    677 </para>
    678 
    679 <para>Once you have your two call stacks, how do you find the root
    680 cause of the race?</para>
    681 
    682 <para>The first thing to do is examine the source locations referred
    683 to by each call stack.  They should both show an access to the same
    684 location, or variable.</para>
    685 
    686 <para>Now figure out how that location should have been made
    687 thread-safe:</para>
    688 
    689 <itemizedlist>
    690  <listitem><para>Perhaps the location was intended to be protected by
    691   a mutex?  If so, you need to lock and unlock the mutex at both
    692   access points, even if one of the accesses is reported to be a read.
    693   Did you perhaps forget the locking at one or other of the accesses?
    694   To help you do this, Helgrind shows the set of locks held by each
    695   thread at the time it accessed the raced-on location.</para>
    696  </listitem>
    697  <listitem><para>Alternatively, perhaps you intended to use some
    698   other scheme to make it safe, such as signalling on a condition
    699   variable.  In all such cases, try to find a synchronisation event
    700   (or a chain thereof) which separates the earlier-observed access (as
    701   shown in the second call stack) from the later-observed access (as
    702   shown in the first call stack).  In other words, try to find
    703   evidence that the earlier access "happens-before" the later access.
    704   See the previous subsection for an explanation of the happens-before
    705   relation.</para>
    706   <para>
    707   The fact that Helgrind is reporting a race means it did not observe
    708   any happens-before relation between the two accesses.  If
    709   Helgrind is working correctly, it should also be the case that you
    710   cannot find any such relation, even on detailed inspection
    711   of the source code.  Hopefully, though, your inspection of the code
    712   will show where the missing synchronisation operation(s) should have
    713   been.</para>
    714  </listitem>
    715 </itemizedlist>
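
<para>As a minimal sketch of the first case, the C fragment below
guards a shared counter with one mutex at both access points,
including the read-only one.  The names
<varname>shared_counter</varname>, <varname>counter_mx</varname> and
<function>run_locked_counter</function> are hypothetical and chosen
only for illustration; they are not taken from the example report
above.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: a counter guarded by one mutex. */
static int shared_counter = 0;
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;

/* Writer: takes the lock around every update. */
static void *writer(void *arg) {
   (void)arg;
   for (int i = 0; i < 1000; i++) {
      pthread_mutex_lock(&counter_mx);
      shared_counter++;
      pthread_mutex_unlock(&counter_mx);
   }
   return NULL;
}

/* Reader: takes the same lock, even though it only reads. */
static void *reader(void *arg) {
   int *last_seen = arg;
   for (int i = 0; i < 1000; i++) {
      pthread_mutex_lock(&counter_mx);
      *last_seen = shared_counter;
      pthread_mutex_unlock(&counter_mx);
   }
   return NULL;
}

/* Runs one writer and one reader, joins both, returns the final count. */
int run_locked_counter(void) {
   pthread_t w, r;
   int last_seen = 0;
   int rc;

   shared_counter = 0;
   rc = pthread_create(&w, NULL, writer, NULL);
   assert(rc == 0);
   rc = pthread_create(&r, NULL, reader, &last_seen);
   assert(rc == 0);
   rc = pthread_join(w, NULL);
   assert(rc == 0);
   rc = pthread_join(r, NULL);
   assert(rc == 0);
   return shared_counter;
}
```
]]></programlisting>

<para>Removing the lock and unlock calls from either thread function
would reintroduce an unsynchronised access of the kind reported
above.</para>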
    716 
    717 </sect2>
    718 
    719 
    720 </sect1>
    721 
    722 <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
    723 <title>Hints and Tips for Effective Use of Helgrind</title>
    724 
    725 <para>Helgrind can be very helpful in finding and resolving
    726 threading-related problems.  Like all sophisticated tools, it is most
    727 effective when you understand how to play to its strengths.</para>
    728 
    729 <para>Helgrind will be less effective when you merely throw an
    730 existing threaded program at it and try to make sense of any reported
    731 errors.  It will be more effective if you design threaded programs
    732 from the start in a way that helps Helgrind verify correctness.  The
    733 same is true for finding memory errors with Memcheck, but applies more
    734 here, because thread checking is a harder problem.  Consequently it is
    735 much easier to write a correct program for which Helgrind falsely
    736 reports (threading) errors than it is to write a correct program for
    737 which Memcheck falsely reports (memory) errors.</para>
    738 
    739 <para>With that in mind, here are some tips, listed most important first,
    740 for getting reliable results and avoiding false errors.  The first two
    741 are critical.  Any violations of them will swamp you with huge numbers
    742 of false data-race errors.</para>
    743 
    744 
    745 <orderedlist>
    746 
    747   <listitem>
    748     <para>Make sure your application, and all the libraries it uses,
    749     use the POSIX threading primitives.  Helgrind needs to be able to
    750     see all events pertaining to thread creation, exit, locking and
    751     other synchronisation events.  To do so it intercepts many POSIX
    752     pthreads functions.</para>
    753 
    754     <para>Do not roll your own threading primitives (mutexes, etc)
    755     from combinations of the Linux futex syscall, atomic counters, etc.
    756     These throw Helgrind's internal what's-going-on models
    757     way off course and will give bogus results.</para>
    758 
    759     <para>Also, do not reimplement existing POSIX abstractions using
    760     other POSIX abstractions.  For example, don't build your own
    761     semaphore routines or reader-writer locks from POSIX mutexes and
    762     condition variables.  Instead use POSIX reader-writer locks and
    763     semaphores directly, since Helgrind supports them directly.</para>
    764 
    765     <para>Helgrind directly supports the following POSIX threading
    766     abstractions: mutexes, reader-writer locks, condition variables
    767     (but see below), semaphores and barriers.  Currently spinlocks
    768     are not supported, although they could be in future.</para>
    769 
    770     <para>At the time of writing, the following popular Linux packages
    771     are known to implement their own threading primitives:</para>
    772 
    773     <itemizedlist>
    774      <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
    775       only uses POSIX pthreads primitives.  Unfortunately Qt 4.X 
    776       has its own implementation of mutexes (QMutex) and thread reaping.
    777       Helgrind 3.4.x contains direct support
    778       for Qt 4.X threading, which is experimental but is believed to
    779       work fairly well.  A side effect of supporting Qt 4 directly is
    780       that Helgrind can be used to debug KDE4 applications.  As this
    781       is an experimental feature, we would particularly appreciate
    782       feedback from folks who have used Helgrind to successfully debug
    783       Qt 4 and/or KDE4 applications.</para>
    784      </listitem>
    785      <listitem><para>Runtime support library for GNU OpenMP (part of
    786       GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
    787       library (<filename>libgomp.so</filename>) constructs its own
    788       synchronisation primitives using combinations of atomic memory
    789       instructions and the futex syscall, which causes total chaos in
    790       Helgrind since it cannot "see" those.</para>
    791      <para>Fortunately, this can be solved using a configuration-time
    792       option (for GCC).  Rebuild GCC from source, and configure using
    793       <varname>--disable-linux-futex</varname>.
    794       This makes libgomp.so use the standard
    795       POSIX threading primitives instead.  Note that this was tested
    796       using GCC 4.2.3 and has not been re-tested using more recent GCC
    797       versions.  We would appreciate hearing about any successes or
    798       failures with more recent versions.</para>
    799      </listitem>
    800     </itemizedlist>
    801 
    802     <para>If you must implement your own threading primitives, there
    803       is a set of client request macros
    804       in <computeroutput>helgrind.h</computeroutput> to help you
    805       describe your primitives to Helgrind.  You should be able to
    806       mark up mutexes, condition variables, etc, without difficulty.
    807     </para>
    808     <para>
    809       It is also possible to mark up the effects of thread-safe
    810       reference counting using the
    811       <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
    812       <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
    813       <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
    814       macros.  Thread-safe reference counting using an atomically
    815       incremented/decremented refcount variable causes Helgrind
    816       problems because a one-to-zero transition of the reference count
    817       means the accessing thread has exclusive ownership of the
    818       associated resource (normally, a C++ object) and can therefore
    819       access it (normally, to run its destructor) without locking.
    820       Helgrind doesn't understand this, and markup is essential to
    821       avoid false positives.
    822     </para>
    823 
    824     <para>
    825       Here are recommended guidelines for marking up thread safe
    826       reference counting in C++.  You only need to mark up your
    827       release methods -- the ones which decrement the reference count.
    828       Given a class like this:
    829     </para>
    830 
    831 <programlisting><![CDATA[
    832 class MyClass {
    833    unsigned int mRefCount;
    834 
    835    void Release ( void ) {
    836       unsigned int newCount = atomic_decrement(&mRefCount);
    837       if (newCount == 0) {
    838          delete this;
    839       }
    840    }
    841 }
    842 ]]></programlisting>
    843 
    844    <para>
    845      the release method should be marked up as follows:
    846    </para>
    847 
    848 <programlisting><![CDATA[
    849    void Release ( void ) {
    850       unsigned int newCount = atomic_decrement(&mRefCount);
    851       if (newCount == 0) {
    852          ANNOTATE_HAPPENS_AFTER(&mRefCount);
    853          ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
    854          delete this;
    855       } else {
    856          ANNOTATE_HAPPENS_BEFORE(&mRefCount);
    857       }
    858    }
    859 ]]></programlisting>
    860 
    861     <para>
    862       There are a number of complex, mostly-theoretical objections to
    863       this scheme.  From a theoretical standpoint it appears to be
    864       impossible to devise a markup scheme which is completely correct
    865       in the sense of guaranteeing to remove all false races.  The
    866       proposed scheme however works well in practice.
    867     </para>
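
    <para>
      The same markup can be applied to a plain C reference count.
      The following is only a sketch under stated assumptions: the type
      <varname>my_object</varname> and its functions are hypothetical,
      the C11 <function>atomic_fetch_sub</function> call stands in for
      the unspecified <function>atomic_decrement</function> above, and
      the header is assumed to be installed as
      <filename>valgrind/helgrind.h</filename>.  When the header is not
      found, the macros are replaced by do-nothing fallbacks so that the
      fragment still compiles.
    </para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Use the real macros if helgrind.h is available (assumed path). */
#ifdef __has_include
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
/* Fallbacks (an assumption of this sketch): expand to nothing useful. */
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#endif
#ifndef ANNOTATE_HAPPENS_AFTER
#  define ANNOTATE_HAPPENS_AFTER(obj) ((void)(obj))
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE_FORGET_ALL
#  define ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(obj) ((void)(obj))
#endif

/* Hypothetical reference-counted object. */
typedef struct {
   atomic_uint ref_count;
   int payload;
} my_object;

my_object *my_object_new(int payload) {
   my_object *obj = malloc(sizeof *obj);
   if (obj != NULL) {
      atomic_init(&obj->ref_count, 1);
      obj->payload = payload;
   }
   return obj;
}

void my_object_retain(my_object *obj) {
   atomic_fetch_add(&obj->ref_count, 1);
}

/* Returns 1 if this call freed the object, 0 otherwise. */
int my_object_release(my_object *obj) {
   /* atomic_fetch_sub returns the old value; subtract 1 for the new one. */
   unsigned int new_count = atomic_fetch_sub(&obj->ref_count, 1) - 1;
   if (new_count == 0) {
      ANNOTATE_HAPPENS_AFTER(&obj->ref_count);
      ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&obj->ref_count);
      free(obj);
      return 1;
   } else {
      ANNOTATE_HAPPENS_BEFORE(&obj->ref_count);
      return 0;
   }
}

/* Single-threaded walk-through: two owners, two releases. */
int refcount_demo(void) {
   my_object *obj = my_object_new(7);
   assert(obj != NULL);
   my_object_retain(obj);                  /* count is now 2 */
   int first = my_object_release(obj);     /* count 1, not freed */
   int second = my_object_release(obj);    /* count 0, freed */
   return first * 10 + second;
}
```
]]></programlisting>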
    868 
    869   </listitem>
    870 
    871   <listitem>
    872     <para>Avoid memory recycling.  If you can't avoid it, you must
    873     tell Helgrind what is going on via the
    874     <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
    875     <computeroutput>helgrind.h</computeroutput>).</para>
    876 
    877     <para>Helgrind is aware of standard heap memory allocation and
    878     deallocation that occurs via
    879     <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
    880     and from entry and exit of stack frames.  In particular, when memory is
    881     deallocated via <function>free</function>, <function>delete</function>,
    882     or function exit, Helgrind considers that memory clean, so when it is
    883     eventually reallocated, its history is irrelevant.</para>
    884 
    885     <para>However, it is common practice to implement memory recycling
    886     schemes.  In these, memory to be freed is not handed to
    887     <function>free</function>/<function>delete</function>, but instead put
    888     into a pool of free buffers to be handed out again as required.  The
    889     problem is that Helgrind has no
    890     way to know that such memory is logically no longer in use, and
    891     its history is irrelevant.  Hence you must make that explicit,
    892     using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
    893     to specify the relevant address ranges.  It's easiest to put these
    894     requests into the pool manager code, and use them either when memory is
    895     returned to the pool, or is allocated from it.</para>
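
    <para>A minimal sketch of such a pool manager is shown below, with
    the request issued when a buffer is returned to the pool.  The pool
    itself (<function>pool_get</function>,
    <function>pool_put</function>, the buffer count and size) is
    hypothetical, and the header is assumed to be installed as
    <filename>valgrind/helgrind.h</filename>; if it is not found, a
    do-nothing fallback macro is used so that the fragment still
    compiles.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Use the real client request if helgrind.h is available (assumed path). */
#ifdef __has_include
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
/* Fallback (an assumption of this sketch): evaluate arguments, do nothing. */
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define POOL_BUFS     4
#define POOL_BUF_SIZE 256

/* Hypothetical fixed-size pool with a LIFO free list. */
static unsigned char pool_storage[POOL_BUFS][POOL_BUF_SIZE];
static void *pool_free_list[POOL_BUFS];
static int pool_free_count = 0;
static int pool_ready = 0;
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;

/* Hand out a buffer, or NULL if the pool is exhausted. */
void *pool_get(void) {
   void *buf = NULL;
   pthread_mutex_lock(&pool_mx);
   if (!pool_ready) {
      /* First use: put every buffer on the free list. */
      for (int i = 0; i < POOL_BUFS; i++)
         pool_free_list[pool_free_count++] = pool_storage[i];
      pool_ready = 1;
   }
   if (pool_free_count > 0)
      buf = pool_free_list[--pool_free_count];
   pthread_mutex_unlock(&pool_mx);
   return buf;
}

/* Return a buffer to the pool and mark its address range as clean. */
void pool_put(void *buf) {
   VALGRIND_HG_CLEAN_MEMORY(buf, POOL_BUF_SIZE);
   pthread_mutex_lock(&pool_mx);
   assert(pool_free_count < POOL_BUFS);
   pool_free_list[pool_free_count++] = buf;
   pthread_mutex_unlock(&pool_mx);
}

/* Recycles one buffer; returns 1 if the same buffer was handed out again. */
int pool_demo(void) {
   void *a = pool_get();
   pool_put(a);
   void *b = pool_get();
   int same = (a == b);
   pool_put(b);
   return same;
}
```
]]></programlisting>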
    896   </listitem>
    897 
    898   <listitem>
    899     <para>Avoid POSIX condition variables.  If you can, use POSIX
    900     semaphores (<function>sem_t</function>, <function>sem_post</function>,
    901     <function>sem_wait</function>) to do inter-thread event signalling.
    902     Semaphores with an initial value of zero are particularly useful for
    903     this.</para>
    904 
    905     <para>Helgrind only partially correctly handles POSIX condition
    906     variables.  This is because Helgrind can see inter-thread
    907     dependencies between a <function>pthread_cond_wait</function> call and a
    908     <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
    909     call only if the waiting thread actually gets to the rendezvous first
    910     (so that it actually calls
    911     <function>pthread_cond_wait</function>).  It can't see dependencies
    912     between the threads if the signaller arrives first.  In the latter case,
    913     POSIX guidelines imply that the associated boolean condition still
    914     provides an inter-thread synchronisation event, but one which is
    915     invisible to Helgrind.</para>
    916 
    917     <para>The result of Helgrind missing some inter-thread
    918     synchronisation events is to cause it to report false positives.
    919     </para>
    920 
    921     <para>The root cause of this synchronisation lossage is
    922     particularly hard to understand, so an example is helpful.  It was
    923     discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
    924     in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
    925     canonical POSIX-recommended usage scheme for condition variables
    926     is as follows:</para>
    927 
    928 <programlisting><![CDATA[
    929 b   is a Boolean condition, which is False most of the time
    930 cv  is a condition variable
    931 mx  is its associated mutex
    932 
    933 Signaller:                             Waiter:
    934 
    935 lock(mx)                               lock(mx)
    936 b = True                               while (b == False)
    937 signal(cv)                                wait(cv,mx)
    938 unlock(mx)                             unlock(mx)
    939 ]]></programlisting>
    940 
    941     <para>Assume <computeroutput>b</computeroutput> is False most of
    942     the time.  If the waiter arrives at the rendezvous first, it
    943     enters its while-loop, waits for the signaller to signal, and
    944     eventually proceeds.  Helgrind sees the signal, notes the
    945     dependency, and all is well.</para>
    946 
    947     <para>If the signaller arrives
    948     first, <computeroutput>b</computeroutput> is set to true, and the
    949     signal disappears into nowhere.  When the waiter later arrives, it
    950     does not enter its while-loop and simply carries on.  But even in
    951     this case, the waiter code following the while-loop cannot execute
    952     until the signaller sets <computeroutput>b</computeroutput> to
    953     True.  Hence there is still the same inter-thread dependency, but
    954     this time it is through an arbitrary in-memory condition, and
    955     Helgrind cannot see it.</para>
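
    <para>For reference, the scheme above might be written in C roughly
    as follows.  This is only a sketch: <varname>b</varname>,
    <varname>cv</varname> and <varname>mx</varname> follow the
    pseudocode, while <varname>shared_data</varname> and
    <function>run_condvar_handoff</function> are hypothetical additions
    that give the waiter something to read.  Depending on scheduling,
    either thread may reach the rendezvous first.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool b = false;        /* the Boolean condition */
static int shared_data = 0;   /* hypothetical payload handed to the waiter */

static void *signaller(void *arg) {
   (void)arg;
   shared_data = 42;          /* prepared before b is set under mx */
   pthread_mutex_lock(&mx);
   b = true;
   pthread_cond_signal(&cv);
   pthread_mutex_unlock(&mx);
   return NULL;
}

static void *waiter(void *arg) {
   int *out = arg;
   pthread_mutex_lock(&mx);
   while (!b)                 /* skipped entirely if the signaller came first */
      pthread_cond_wait(&cv, &mx);
   pthread_mutex_unlock(&mx);
   *out = shared_data;        /* only reached once b has been set to true */
   return NULL;
}

/* Starts both threads, joins them, and returns what the waiter read. */
int run_condvar_handoff(void) {
   pthread_t s, w;
   int seen = 0;
   int rc;

   b = false;
   shared_data = 0;
   rc = pthread_create(&w, NULL, waiter, &seen);
   assert(rc == 0);
   rc = pthread_create(&s, NULL, signaller, NULL);
   assert(rc == 0);
   rc = pthread_join(s, NULL);
   assert(rc == 0);
   rc = pthread_join(w, NULL);
   assert(rc == 0);
   return seen;
}
```
]]></programlisting>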
    956 
    957     <para>By comparison, Helgrind's detection of inter-thread
    958     dependencies caused by semaphore operations is believed to be
    959     exactly correct.</para>
    960 
    961     <para>As far as I know, a solution to this problem that does not
    962     require source-level annotation of condition-variable wait loops
    963     is beyond the current state of the art.</para>
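
    <para>As a sketch of the recommended alternative, the same kind of
    hand-off can be expressed with a POSIX semaphore whose initial
    value is zero.  The names <varname>ready</varname>,
    <varname>message</varname> and
    <function>run_semaphore_handoff</function> are hypothetical.  The
    consumer proceeds correctly whichever thread arrives first, because
    a post that happens early is remembered in the semaphore's
    count.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;        /* initial value 0: "nothing to consume yet" */
static int message = 0;    /* hypothetical payload */

static void *producer(void *arg) {
   (void)arg;
   message = 42;
   sem_post(&ready);       /* signal: the message is now available */
   return NULL;
}

static void *consumer(void *arg) {
   int *out = arg;
   /* Retry if the wait is interrupted by a signal handler. */
   while (sem_wait(&ready) == -1 && errno == EINTR)
      continue;
   *out = message;
   return NULL;
}

/* Starts both threads, joins them, and returns what the consumer read. */
int run_semaphore_handoff(void) {
   pthread_t p, c;
   int seen = 0;
   int rc;

   message = 0;
   rc = sem_init(&ready, 0, 0);
   assert(rc == 0);
   rc = pthread_create(&c, NULL, consumer, &seen);
   assert(rc == 0);
   rc = pthread_create(&p, NULL, producer, NULL);
   assert(rc == 0);
   rc = pthread_join(p, NULL);
   assert(rc == 0);
   rc = pthread_join(c, NULL);
   assert(rc == 0);
   sem_destroy(&ready);
   return seen;
}
```
]]></programlisting>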
    964   </listitem>
    965 
    966   <listitem>
    967     <para>Make sure you are using a supported Linux distribution.  At
    968     present, Helgrind only properly supports glibc-2.3 or later.  This
    969     in turn means we only support glibc's NPTL threading
    970     implementation.  The old LinuxThreads implementation is not
    971     supported.</para>
    972   </listitem>
    973 
    974   <listitem>
    975     <para>Round up all finished threads using
    976     <function>pthread_join</function>.  Avoid
    977     detaching threads: don't create threads in the detached state, and
    978     don't call <function>pthread_detach</function> on existing threads.</para>
    979 
    980     <para>Using <function>pthread_join</function> to round up finished
    981     threads provides a clear synchronisation point that both Helgrind and
    982     programmers can see.  If you don't call
    983     <function>pthread_join</function> on a thread, Helgrind has no way to
    984     know when it finishes, relative to any
    985     significant synchronisation points for other threads in the program.  So
    986     it assumes that the thread lingers indefinitely and can potentially
    987     interfere indefinitely with the memory state of the program.  It
    988     has every right to assume that -- after all, it might really be
    989     the case that, for scheduling reasons, the exiting thread did run
    990     very slowly in the last stages of its life.</para>
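
    <para>A minimal sketch of this discipline follows.  The worker
    function, the <varname>results</varname> array and
    <function>run_and_join_workers</function> are hypothetical.  Every
    thread is created joinable (the default) and joined before its
    result is read, so the join is the synchronisation point that
    orders each worker's write before the main thread's read.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

static int results[NUM_WORKERS];

/* Worker i stores i + 1 into its own slot of the results array. */
static void *worker(void *arg) {
   int *slot = arg;
   *slot = (int)(slot - results) + 1;
   return NULL;
}

/* Creates the workers, joins every one of them, then sums the results. */
int run_and_join_workers(void) {
   pthread_t tids[NUM_WORKERS];
   int sum = 0;
   int rc;

   for (int i = 0; i < NUM_WORKERS; i++) {
      results[i] = 0;
      rc = pthread_create(&tids[i], NULL, worker, &results[i]);
      assert(rc == 0);
   }
   /* Round up all finished threads before touching their results. */
   for (int i = 0; i < NUM_WORKERS; i++) {
      rc = pthread_join(tids[i], NULL);
      assert(rc == 0);
   }
   for (int i = 0; i < NUM_WORKERS; i++)
      sum += results[i];
   return sum;
}
```
]]></programlisting>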
    991   </listitem>
    992 
    993   <listitem>
    994     <para>Perform thread debugging (with Helgrind) and memory
    995     debugging (with Memcheck) together.</para>
    996 
    997     <para>Helgrind tracks the state of memory in detail, and memory
    998     management bugs in the application are liable to cause confusion.
    999     In extreme cases, applications which do many invalid reads and
   1000     writes (particularly to freed memory) have been known to crash
   1001     Helgrind.  So, ideally, you should make your application
   1002     Memcheck-clean before using Helgrind.</para>
   1003 
   1004     <para>It may be impossible to make your application Memcheck-clean
   1005     unless you first remove threading bugs.  In particular, it may be
   1006     difficult to remove all reads and writes to freed memory in
   1007     multithreaded C++ destructor sequences at program termination.
   1008     So, ideally, you should make your application Helgrind-clean
   1009     before using Memcheck.</para>
   1010 
   1011     <para>Since this circularity is obviously unresolvable, at least
   1012     bear in mind that Memcheck and Helgrind are to some extent
   1013     complementary, and you may need to use them together.</para>
   1014   </listitem>
   1015 
   1016   <listitem>
   1017     <para>POSIX requires that implementations of standard I/O
   1018     (<function>printf</function>, <function>fprintf</function>,
   1019     <function>fwrite</function>, <function>fread</function>, etc) are thread
   1020     safe.  Unfortunately GNU libc implements this by using internal locking
   1021     primitives that Helgrind is unable to intercept.  Consequently Helgrind
   1022     generates many false race reports when you use these functions.</para>
   1023 
   1024     <para>Helgrind attempts to hide these errors using the standard
   1025     Valgrind error-suppression mechanism.  So, at least for simple
   1026     test cases, you don't see any.  Nevertheless, some may slip
   1027     through.  Just something to be aware of.</para>
   1028   </listitem>
   1029 
   1030   <listitem>
   1031     <para>Helgrind's error checks do not work properly inside the
   1032     system threading library itself
   1033     (<computeroutput>libpthread.so</computeroutput>), and it usually
   1034     observes large numbers of (false) errors in there.  Valgrind's
   1035     suppression system then filters these out, so you should not see
   1036     them.</para>
   1037 
   1038     <para>If you see any race errors reported
   1039     where <computeroutput>libpthread.so</computeroutput> or
   1040     <computeroutput>ld.so</computeroutput> is the object associated
   1041     with the innermost stack frame, please file a bug report at
   1042     <ulink url="&vg-url;">&vg-url;</ulink>.
   1043     </para>
   1044   </listitem>
   1045 
   1046 </orderedlist>
   1047 
   1048 </sect1>
   1049 
   1050 
   1051 
   1052 
   1053 <sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
   1054 <title>Helgrind Command-line Options</title>
   1055 
   1056 <para>The following end-user options are available:</para>
   1057 
   1058 <!-- start of xi:include in the manpage -->
   1059 <variablelist id="hg.opts.list">
   1060 
   1061   <varlistentry id="opt.free-is-write"
   1062                 xreflabel="--free-is-write">
   1063     <term>
   1064       <option><![CDATA[--free-is-write=no|yes
   1065       [default: no] ]]></option>
   1066     </term>
   1067     <listitem>
   1068       <para>When enabled (not the default), Helgrind treats freeing of
   1069         heap memory as if the memory was written immediately before
   1070         the free.  This exposes races where memory is referenced by
   1071         one thread, and freed by another, but there is no observable
   1072         synchronisation event to ensure that the reference happens
   1073         before the free.
   1074       </para>
   1075       <para>This functionality is new in Valgrind 3.7.0, and is
   1076         regarded as experimental.  It is not enabled by default
   1077         because its interaction with custom memory allocators is not
   1078         well understood at present.  User feedback is welcomed.
   1079       </para>
   1080     </listitem>
   1081   </varlistentry>
   1082 
   1083   <varlistentry id="opt.track-lockorders"
   1084                 xreflabel="--track-lockorders">
   1085     <term>
   1086       <option><![CDATA[--track-lockorders=no|yes
   1087       [default: yes] ]]></option>
   1088     </term>
   1089     <listitem>
   1090       <para>When enabled (the default), Helgrind performs lock order
   1091       consistency checking.  For some buggy programs, the large number
   1092       of lock order errors reported can become annoying, particularly
   1093       if you're only interested in race errors.  You may therefore find
   1094       it helpful to disable lock order checking.</para>
   1095     </listitem>
   1096   </varlistentry>
   1097 
   1098   <varlistentry id="opt.history-level"
   1099                 xreflabel="--history-level">
   1100     <term>
   1101       <option><![CDATA[--history-level=none|approx|full
   1102       [default: full] ]]></option>
   1103     </term>
   1104     <listitem>
   1105       <para><option>--history-level=full</option> (the default) causes
   1106         Helgrind to collect enough information about "old" accesses that
   1107         it can produce two stack traces in a race report -- both the
   1108         stack trace for the current access, and the trace for the
   1109         older, conflicting access.  To limit memory usage, stack traces
   1110         for "old" accesses are limited to a maximum of 8 entries, even if
   1111         the <option>--num-callers</option> value is larger.</para>
   1112       <para>Collecting such information is expensive in both speed and
   1113         memory, particularly for programs that do many inter-thread
   1114         synchronisation events (locks, unlocks, etc).  Without such
   1115         information, it is more difficult to track down the root
   1116         causes of races.  Nonetheless, you may not need it in
   1117         situations where you just want to check for the presence or
   1118         absence of races, for example, when doing regression testing
   1119         of a previously race-free program.</para>
   1120       <para><option>--history-level=none</option> is the opposite
   1121         extreme.  It causes Helgrind not to collect any information
   1122         about previous accesses.  This can be dramatically faster
   1123         than <option>--history-level=full</option>.</para>
   1124       <para><option>--history-level=approx</option> provides a
   1125         compromise between these two extremes.  It causes Helgrind to
   1126         show a full trace for the later access, and approximate
   1127         information regarding the earlier access.  This approximate
   1128         information consists of two stacks, and the earlier access is
   1129         guaranteed to have occurred somewhere between program points
   1130         denoted by the two stacks. This is not as useful as showing
   1131         the exact stack for the previous access
   1132         (as <option>--history-level=full</option> does), but it is
   1133         better than nothing, and it is almost as fast as
   1134         <option>--history-level=none</option>.</para>
   1135     </listitem>
   1136   </varlistentry>
   1137 
   1138   <varlistentry id="opt.conflict-cache-size"
   1139                 xreflabel="--conflict-cache-size">
   1140     <term>
   1141       <option><![CDATA[--conflict-cache-size=N
   1142       [default: 1000000] ]]></option>
   1143     </term>
   1144     <listitem>
   1145       <para>This flag only has any effect
   1146         at <option>--history-level=full</option>.</para>
   1147       <para>Information about "old" conflicting accesses is stored in
   1148         a cache of limited size, with LRU-style management.  This is
   1149         necessary because it isn't practical to store a stack trace
   1150         for every single memory access made by the program.
   1151         Historical information on not recently accessed locations is
   1152         periodically discarded, to free up space in the cache.</para>
   1153       <para>This option controls the size of the cache, in terms of the
   1154         number of different memory addresses for which
   1155         conflicting access information is stored.  If you find that
   1156         Helgrind is showing race errors with only one stack instead of
   1157         the expected two stacks, try increasing this value.</para>
   1158       <para>The minimum value is 10,000 and the maximum is 30,000,000
   1159         (thirty times the default value).  Increasing the value by 1
   1160         increases Helgrind's memory requirement by very roughly 100
   1161         bytes, so the maximum value will easily eat up three extra
   1162         gigabytes or so of memory.</para>
   1163     </listitem>
   1164   </varlistentry>
   1165 
   1166   <varlistentry id="opt.check-stack-refs"
   1167                 xreflabel="--check-stack-refs">
   1168     <term>
   1169       <option><![CDATA[--check-stack-refs=no|yes
   1170       [default: yes] ]]></option>
   1171     </term>
   1172     <listitem>
   1173       <para>
   1174         By default Helgrind checks all data memory accesses made by your
   1175         program.  This flag enables you to skip checking for accesses
   1176         to thread stacks (local variables).  This can improve
   1177         performance, but comes at the cost of missing races on
   1178         stack-allocated data.
   1179       </para>
   1180     </listitem>
   1181   </varlistentry>
   1182 
   1183 
   1184 </variablelist>
   1185 <!-- end of xi:include in the manpage -->
   1186 
   1187 <!-- start of xi:include in the manpage -->
   1188 <!--  commented out, because we don't document debugging options in the
   1189       manual.  Nb: all the double-dashes below had a space inserted in them
   1190       to avoid problems with premature closing of this comment.
   1191 <para>In addition, the following debugging options are available for
   1192 Helgrind:</para>
   1193 
   1194 <variablelist id="hg.debugopts.list">
   1195 
   1196   <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
   1197     <term>
   1198       <option><![CDATA[- -trace-malloc=no|yes [no]
   1199       ]]></option>
   1200     </term>
   1201     <listitem>
   1202       <para>Show all client <function>malloc</function> (etc) and
   1203       <function>free</function> (etc) requests.</para>
   1204     </listitem>
   1205   </varlistentry>
   1206 
   1207   <varlistentry id="opt.cmp-race-err-addrs" 
   1208                 xreflabel="- -cmp-race-err-addrs">
   1209     <term>
   1210       <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
   1211       ]]></option>
   1212     </term>
   1213     <listitem>
   1214       <para>Controls whether or not race (data) addresses should be
   1215         taken into account when removing duplicates of race errors.
   1216         With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
   1217         identical race errors will be considered to be the same even if
   1218         their race addresses differ.
   1219         With <varname>- -cmp-race-err-addrs=yes</varname> they will be
   1220         considered different.  This is provided to help make certain
   1221         regression tests work reliably.</para>
   1222     </listitem>
   1223   </varlistentry>
   1224 
   1225   <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
   1226     <term>
   1227       <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
   1228       ]]></option>
   1229     </term>
   1230     <listitem>
   1231       <para>Run extensive sanity checks on Helgrind's internal
   1232         data structures at events defined by the bitstring, as
   1233         follows:</para>
   1234       <para><computeroutput>010000 </computeroutput>after changes to
   1235         the lock order acquisition graph</para>
   1236       <para><computeroutput>001000 </computeroutput>after every client
   1237         memory access (NB: not currently used)</para>
   1238       <para><computeroutput>000100 </computeroutput>after every client
   1239         memory range permission setting of 256 bytes or greater</para>
   1240       <para><computeroutput>000010 </computeroutput>after every client
   1241         lock or unlock event</para>
   1242       <para><computeroutput>000001 </computeroutput>after every client
   1243         thread creation or joinage event</para>
   1244       <para>Note these will make Helgrind run very slowly, often to
   1245         the point of being completely unusable.</para>
   1246     </listitem>
   1247   </varlistentry>
   1248 
   1249 </variablelist>
   1250 -->
   1251 <!-- end of xi:include in the manpage -->
   1252 
   1253 
   1254 </sect1>
   1255 
   1256 
   1257 <sect1 id="hg-manual.monitor-commands" xreflabel="Helgrind Monitor Commands">
   1258 <title>Helgrind Monitor Commands</title>
   1259 <para>The Helgrind tool provides monitor commands handled by Valgrind's
   1260 built-in gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
   1261 </para>
   1262 <itemizedlist>
   1263   <listitem>
   1264     <para><varname>info locks</varname> shows the list of locks and their
   1265     status. </para>
   1266     <para>
   1267     In the following example, Helgrind knows about one lock.
   1268     This lock is located at the guest address <varname>ga 0x8049a20</varname>.
   1269     The lock kind is <varname>rdwr</varname>, indicating a reader-writer lock.
   1270     Other possible lock kinds are <varname>nonRec</varname> (simple mutex, non-recursive)
   1271     and <varname>mbRec</varname> (simple mutex, possibly recursive).
   1272     The lock kind is followed by the list of threads holding the lock.
   1273     In the example below, <varname>R1:thread #6 tid 3</varname> indicates that
   1274     Helgrind thread #6 has acquired the lock in read mode once (the counter following
   1275     the letter R is one). The Helgrind thread number is incremented for each started thread.
   1276     The presence of 'tid 3' indicates that thread #6 has not exited yet and is
   1277     Valgrind tid 3. If a thread has terminated, this is indicated with 'tid (exited)'.
   1278     </para>
   1279 <programlisting><![CDATA[
   1280 (gdb) monitor info locks
   1281 Lock ga 0x8049a20 {
   1282    kind   rdwr
   1283  { R1:thread #6 tid 3 }
   1284 }
   1285 (gdb) 
   1286 ]]></programlisting>
   1287 
   1288     <para> If you give the option <varname>--read-var-info=yes</varname>, then more
   1289     information will be provided about the lock location, such as the global variable
   1290     or the heap block that contains the lock:
   1291     </para>
   1292 <programlisting><![CDATA[
   1293 Lock ga 0x8049a20 {
   1294  Location 0x8049a20 is 0 bytes inside global var "s_rwlock"
   1295  declared at rwlock_race.c:17
   1296    kind   rdwr
   1297  { R1:thread #3 tid 3 }
   1298 }
   1299 ]]></programlisting>
   1300 
   1301   </listitem>
   1302 
   1303 </itemizedlist>
   1304 
   1305 </sect1>
   1306 
   1307 <sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
   1308 <title>Helgrind Client Requests</title>
   1309 
   1310 <para>The following client requests are defined in
   1311 <filename>helgrind.h</filename>.  See that file for exact details of their
   1312 arguments.</para>
   1313 
   1314 <itemizedlist>
   1315 
   1316   <listitem>
   1317     <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
   1318     <para>This makes Helgrind forget everything it knows about a
   1319     specified memory range.  This is particularly useful for memory
   1320     allocators that wish to recycle memory.</para>
   1321   </listitem>
   1322   <listitem>
   1323     <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
   1324   </listitem>
   1325   <listitem>
   1326     <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
   1327   </listitem>
   1328   <listitem>
   1329     <para><function>ANNOTATE_NEW_MEMORY</function></para>
   1330   </listitem>
   1331   <listitem>
   1332     <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
   1333   </listitem>
   1334   <listitem>
   1335     <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
   1336   </listitem>
   1337   <listitem>
   1338     <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
   1339   </listitem>
   1340   <listitem>
   1341     <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
   1342     <para>These are used to describe to Helgrind the behaviour of
   1343     custom (non-POSIX) synchronisation primitives, which it otherwise
   1344     has no way to understand.  See comments
   1345     in <filename>helgrind.h</filename> for further
   1346     documentation.</para>
   1347   </listitem>
   1348 
   1349 </itemizedlist>
   1350 
   1351 </sect1>
   1352 
   1353 
   1354 
   1355 <sect1 id="hg-manual.todolist" xreflabel="To Do List">
   1356 <title>A To-Do List for Helgrind</title>
   1357 
   1358 <para>The following is a list of loose ends which should be tidied up
   1359 some time.</para>
   1360 
   1361 <itemizedlist>
   1362   <listitem><para>For lock order errors, print the complete lock
   1363     cycle, rather than only for size-2 cycles, as at
   1364     present.</para>
   1365   </listitem>
   1366   <listitem><para>The conflicting access mechanism sometimes
   1367     mysteriously fails to show the conflicting access' stack, even
   1368     when provided with unbounded storage for conflicting access info.
   1369     This should be investigated.</para>
   1370   </listitem>
   1371   <listitem><para>Document races caused by GCC's thread-unsafe code
   1372     generation for speculative stores.  In the interim see
   1373     <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
   1374     </computeroutput>
   1375     and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
   1376     </para>
   1377   </listitem>
   1378   <listitem><para>Don't update the lock-order graph, and don't check
   1379     for errors, when a "try"-style lock operation happens (e.g.
   1380     <function>pthread_mutex_trylock</function>).  Such calls do not add any real
   1381     restrictions to the locking order, since they can always fail to
   1382     acquire the lock, resulting in the caller going off and doing Plan
   1383     B (presumably it will have a Plan B).  Doing such checks could
   1384     generate false lock-order errors and confuse users.</para>
   1385   </listitem>
   1386   <listitem><para> Performance can be very poor.  Slowdowns on the
   1387     order of 100:1 are not unusual.  There is limited scope for
   1388     performance improvements.
   1389     </para>
   1390   </listitem>
   1391 
   1392 </itemizedlist>
   1393 
   1394 </sect1>
   1395 
   1396 </chapter>
   1397