      1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
      2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
      3           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
      4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
      5 
      6 
      7 <chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
      8   <title>Helgrind: a thread error detector</title>
      9 
     10 <para>To use this tool, you must specify
     11 <option>--tool=helgrind</option> on the Valgrind
     12 command line.</para>
     13 
     14 
     15 <sect1 id="hg-manual.overview" xreflabel="Overview">
     16 <title>Overview</title>
     17 
     18 <para>Helgrind is a Valgrind tool for detecting synchronisation errors
     19 in C, C++ and Fortran programs that use the POSIX pthreads
     20 threading primitives.</para>
     21 
     22 <para>The main abstractions in POSIX pthreads are: a set of threads
     23 sharing a common address space, thread creation, thread joining,
     24 thread exit, mutexes (locks), condition variables (inter-thread event
     25 notifications), reader-writer locks, spinlocks, semaphores and
     26 barriers.</para>
     27 
     28 <para>Helgrind can detect three classes of errors, which are discussed
     29 in detail in the next three sections:</para>
     30 
     31 <orderedlist>
     32  <listitem>
     33   <para><link linkend="hg-manual.api-checks">
     34         Misuses of the POSIX pthreads API.</link></para>
     35  </listitem>
     36  <listitem>
     37   <para><link linkend="hg-manual.lock-orders">
     38         Potential deadlocks arising from lock
     39         ordering problems.</link></para>
     40  </listitem>
     41  <listitem>
     42   <para><link linkend="hg-manual.data-races">
     43         Data races -- accessing memory without adequate locking
     44                       or synchronisation</link>.
     45   </para>
     46  </listitem>
     47 </orderedlist>
     48 
     49 <para>Problems like these often result in unreproducible,
     50 timing-dependent crashes, deadlocks and other misbehaviour, and
     51 can be difficult to find by other means.</para>
     52 
     53 <para>Helgrind is aware of all the pthread abstractions and tracks
     54 their effects as accurately as it can.  On x86 and amd64 platforms, it
     55 understands and partially handles implicit locking arising from the
     56 use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
     57 platforms, it partially handles implicit locking arising from 
     58 load-linked and store-conditional instruction pairs.
     59 </para>
     60 
     61 <para>Helgrind works best when your application uses only the POSIX
     62 pthreads API.  However, if you want to use custom threading 
     63 primitives, you can describe their behaviour to Helgrind using the
     64 <varname>ANNOTATE_*</varname> macros defined
     65 in <varname>helgrind.h</varname>.</para>
     66 
     67 
     68 
<para>Following those three sections is a section containing 
     70 <link linkend="hg-manual.effective-use">
     71 hints and tips on how to get the best out of Helgrind.</link>
     72 </para>
     73 
     74 <para>Then there is a
     75 <link linkend="hg-manual.options">summary of command-line
     76 options.</link>
     77 </para>
     78 
     79 <para>Finally, there is 
     80 <link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
     81 could be improved.</link>
     82 </para>
     83 
     84 </sect1>
     85 
     86 
     87 
     88 
     89 <sect1 id="hg-manual.api-checks" xreflabel="API Checks">
     90 <title>Detected errors: Misuses of the POSIX pthreads API</title>
     91 
     92 <para>Helgrind intercepts calls to many POSIX pthreads functions, and
     93 is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
     95 program behaviour and hard-to-find bugs later on.  The detected errors
     96 are:</para>
     97 
     98 <itemizedlist>
     99  <listitem><para>unlocking an invalid mutex</para></listitem>
    100  <listitem><para>unlocking a not-locked mutex</para></listitem>
    101  <listitem><para>unlocking a mutex held by a different
    102                  thread</para></listitem>
    103  <listitem><para>destroying an invalid or a locked mutex</para></listitem>
    104  <listitem><para>recursively locking a non-recursive mutex</para></listitem>
    105  <listitem><para>deallocation of memory that contains a
    106                  locked mutex</para></listitem>
    107  <listitem><para>passing mutex arguments to functions expecting
    108                  reader-writer lock arguments, and vice
    109                  versa</para></listitem>
    110  <listitem><para>when a POSIX pthread function fails with an
    111                  error code that must be handled</para></listitem>
    112  <listitem><para>when a thread exits whilst still holding locked
    113                  locks</para></listitem>
    114  <listitem><para>calling <function>pthread_cond_wait</function>
    115                  with a not-locked mutex, an invalid mutex,
    116                  or one locked by a different
    117                  thread</para></listitem>
    118  <listitem><para>inconsistent bindings between condition
    119                  variables and their associated mutexes</para></listitem>
    120  <listitem><para>invalid or duplicate initialisation of a pthread
    121                  barrier</para></listitem>
    122  <listitem><para>initialisation of a pthread barrier on which threads
    123                  are still waiting</para></listitem>
    124  <listitem><para>destruction of a pthread barrier object which was
    125                  never initialised, or on which threads are still
    126                  waiting</para></listitem>
    127  <listitem><para>waiting on an uninitialised pthread
    128                  barrier</para></listitem>
    129  <listitem><para>for all of the pthreads functions that Helgrind
    130                  intercepts, an error is reported, along with a stack
    131                  trace, if the system threading library routine returns
    132                  an error code, even if Helgrind itself detected no
    133                  error</para></listitem>
    134 </itemizedlist>
    135 
    136 <para>Checks pertaining to the validity of mutexes are generally also
    137 performed for reader-writer locks.</para>
    138 
    139 <para>Various kinds of this-can't-possibly-happen events are also
    140 reported.  These usually indicate bugs in the system threading
    141 library.</para>
    142 
    143 <para>Reported errors always contain a primary stack trace indicating
    144 where the error was detected.  They may also contain auxiliary stack
    145 traces giving additional information.  In particular, most errors
    146 relating to mutexes will also tell you where that mutex first came to
    147 Helgrind's attention (the "<computeroutput>was first observed
    148 at</computeroutput>" part), so you have a chance of figuring out which
    149 mutex it is referring to.  For example:</para>
    150 
    151 <programlisting><![CDATA[
    152 Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
    153    at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
    154    by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
    155    by 0x40079B: main (tc09_bad_unlock.c:50)
    156   Lock at 0x7FEFFFA90 was first observed
    157    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
    158    by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
    159    by 0x40079B: main (tc09_bad_unlock.c:50)
    160 ]]></programlisting>
    161 
    162 <para>Helgrind has a way of summarising thread identities, as
    163 you see here with the text "<computeroutput>Thread
    164 #1</computeroutput>".  This is so that it can speak about threads and
    165 sets of threads without overwhelming you with details.  See 
    166 <link linkend="hg-manual.data-races.errmsgs">below</link>
    167 for more information on interpreting error messages.</para>
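
<para>The kind of misuse reported in the example above can be
reproduced with a very small program.  The following is a minimal
sketch, not taken from the Helgrind test suite: the function name
<function>unlock_unheld_mutex</function> is hypothetical, and an
error-checking mutex is assumed so that the erroneous unlock has a
defined result (<computeroutput>EPERM</computeroutput>) rather than
undefined behaviour.</para>

<programlisting><![CDATA[
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that the calling thread never
   locked, and returns the error code from pthread_mutex_unlock. */
int unlock_unheld_mutex ( void ) {
   pthread_mutexattr_t attr;
   pthread_mutex_t     mx;
   int                 r;

   /* Assumption: an error-checking mutex, so the bad unlock fails
      cleanly instead of invoking undefined behaviour. */
   pthread_mutexattr_init(&attr);
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   pthread_mutex_init(&mx, &attr);
   pthread_mutexattr_destroy(&attr);

   r = pthread_mutex_unlock(&mx);   /* misuse: mx is not locked */

   pthread_mutex_destroy(&mx);
   return r;
}
]]></programlisting>

<para>Called from <function>main</function> in a program run under
Helgrind, this would be expected to produce a report about unlocking a
not-locked lock, together with the point at which the mutex was first
observed.  The exact wording and addresses will differ from the
example above.</para>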
    168 
    169 </sect1>
    170 
    171 
    172 
    173 
    174 <sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
    175 <title>Detected errors: Inconsistent Lock Orderings</title>
    176 
    177 <para>In this section, and in general, to "acquire" a lock simply
    178 means to lock that lock, and to "release" a lock means to unlock
    179 it.</para>
    180 
    181 <para>Helgrind monitors the order in which threads acquire locks.
    182 This allows it to detect potential deadlocks which could arise from
    183 the formation of cycles of locks.  Detecting such inconsistencies is
    184 useful because, whilst actual deadlocks are fairly obvious, potential
    185 deadlocks may never be discovered during testing and could later lead
    186 to hard-to-diagnose in-service failures.</para>
    187 
    188 <para>The simplest example of such a problem is as
    189 follows.</para>
    190 
    191 <itemizedlist>
    192  <listitem><para>Imagine some shared resource R, which, for whatever
    193   reason, is guarded by two locks, L1 and L2, which must both be held
    194   when R is accessed.</para>
    195  </listitem>
    196  <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
    197   to access R.  The implication of this is that all threads in the
    198   program must acquire the two locks in the order first L1 then L2.
    199   Not doing so risks deadlock.</para>
    200  </listitem>
    201  <listitem><para>The deadlock could happen if two threads -- call them
    202   T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
    203   and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
    204   to acquire L1, but those locks are both already held.  So T1 and T2
    205   become deadlocked.</para>
    206  </listitem>
    207 </itemizedlist>
    208 
    209 <para>Helgrind builds a directed graph indicating the order in which
    210 locks have been acquired in the past.  When a thread acquires a new
    211 lock, the graph is updated, and then checked to see if it now contains
    212 a cycle.  The presence of a cycle indicates a potential deadlock involving
    213 the locks in the cycle.</para>
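
<para>The idea of a lock-order graph that is checked for cycles can be
sketched in a few lines of C.  This is an illustration only and makes
no claim about how Helgrind implements its graph: the type
<computeroutput>LockGraph</computeroutput>, the functions
<function>lock_graph_init</function> and
<function>lock_graph_note_order</function>, the fixed bound
<computeroutput>MAX_LOCKS</computeroutput>, and the use of small
integers to name locks are all assumptions made for the example.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8   /* arbitrary bound for this sketch */

/* edge[a][b] is true if lock a was held while lock b was acquired,
   that is, the order "a before b" has been observed. */
typedef struct {
   bool edge[MAX_LOCKS][MAX_LOCKS];
} LockGraph;

void lock_graph_init ( LockGraph* g ) {
   memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reachable ( const LockGraph* g, int from, int to,
                        bool* visited ) {
   if (from == to) return true;
   visited[from] = true;
   for (int next = 0; next < MAX_LOCKS; next++) {
      if (g->edge[from][next] && !visited[next]
          && reachable(g, next, to, visited))
         return true;
   }
   return false;
}

/* Records the order "held before acquired".  Returns true if the new
   edge closes a cycle, meaning that the opposite order was already
   established by earlier acquisitions. */
bool lock_graph_note_order ( LockGraph* g, int held, int acquired ) {
   bool visited[MAX_LOCKS] = { false };
   bool cycle = reachable(g, acquired, held, visited);
   g->edge[held][acquired] = true;
   return cycle;
}
]]></programlisting>

<para>For example, noting the orders "0 before 1" and "1 before 2"
returns false both times, whereas then noting "2 before 0" returns
true, because lock 2 is already reachable from lock 0.</para>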
    214 
    215 <para>In general, Helgrind will choose two locks involved in the cycle
    216 and show you how their acquisition ordering has become inconsistent.
    217 It does this by showing the program points that first defined the
    218 ordering, and the program points which later violated it.  Here is a
    219 simple example involving just two locks:</para>
    220 
    221 <programlisting><![CDATA[
    222 Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
    223 
    224 Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
    225    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    226    by 0x400825: main (tc13_laog1.c:23)
    227 
    228  followed by a later acquisition of lock at 0x7FF0006D0
    229    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    230    by 0x400853: main (tc13_laog1.c:24)
    231 
    232 Required order was established by acquisition of lock at 0x7FF0006D0
    233    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    234    by 0x40076D: main (tc13_laog1.c:17)
    235 
    236  followed by a later acquisition of lock at 0x7FF0006A0
    237    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    238    by 0x40079B: main (tc13_laog1.c:18)
    239 ]]></programlisting>
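
<para>A report of this shape can be provoked by code that takes the
same two mutexes in both orders.  The following is a minimal sketch
under stated assumptions: the names <varname>mx_a</varname>,
<varname>mx_b</varname> and
<function>acquire_in_both_orders</function> are hypothetical, and only
a single thread is used, so the code cannot actually deadlock.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t mx_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mx_b = PTHREAD_MUTEX_INITIALIZER;

/* Acquires the two mutexes first in the order a, b and then in the
   order b, a.  Returns 0 if every pthread call succeeded. */
int acquire_in_both_orders ( void ) {
   int r = 0;

   /* Establishes the order "mx_a before mx_b". */
   r |= pthread_mutex_lock(&mx_a);
   r |= pthread_mutex_lock(&mx_b);
   r |= pthread_mutex_unlock(&mx_b);
   r |= pthread_mutex_unlock(&mx_a);

   /* Violates it: mx_b is now taken before mx_a. */
   r |= pthread_mutex_lock(&mx_b);
   r |= pthread_mutex_lock(&mx_a);
   r |= pthread_mutex_unlock(&mx_a);
   r |= pthread_mutex_unlock(&mx_b);

   return r;
}
]]></programlisting>

<para>The function runs to completion and returns 0, which illustrates
why such problems are easy to miss in testing: the inconsistency only
becomes a real deadlock when two threads interleave the two
sequences.</para>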
    240 
    241 <para>When there are more than two locks in the cycle, the error is
    242 equally serious.  However, at present Helgrind does not show the locks
involved, sometimes because that information is not available, but
also so as to avoid flooding you with information.  Here
is an example involving a cycle of five locks from a naive
implementation of the famous Dining Philosophers problem
    247 (see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
    248 In this case Helgrind has detected that all 5 philosophers could
    249 simultaneously pick up their left fork and then deadlock whilst
    250 waiting to pick up their right forks.</para>
    251 
    252 <programlisting><![CDATA[
    253 Thread #6: lock order "0x6010C0 before 0x601160" violated
    254 
    255 Observed (incorrect) order is: acquisition of lock at 0x601160
    256    (stack unavailable)
    257 
    258  followed by a later acquisition of lock at 0x6010C0
    259    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    260    by 0x4007DE: dine (tc14_laog_dinphils.c:19)
    261    by 0x4C2CBE7: mythread_wrapper (hg_intercepts.c:219)
    262    by 0x4E369C9: start_thread (pthread_create.c:300)
    263 ]]></programlisting>
    264 
    265 </sect1>
    266 
    267 
    268 
    269 
    270 <sect1 id="hg-manual.data-races" xreflabel="Data Races">
    271 <title>Detected errors: Data Races</title>
    272 
    273 <para>A data race happens, or could happen, when two threads access a
    274 shared memory location without using suitable locks or other
    275 synchronisation to ensure single-threaded access.  Such missing
locking can cause obscure timing-dependent bugs.  Ensuring programs
    277 are race-free is one of the central difficulties of threaded
    278 programming.</para>
    279 
    280 <para>Reliably detecting races is a difficult problem, and most
    281 of Helgrind's internals are devoted to dealing with it.  
    282 We begin with a simple example.</para>
    283 
    284 
    285 <sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
    286 <title>A Simple Data Race</title>
    287 
    288 <para>About the simplest possible example of a race is as follows.  In
    289 this program, it is impossible to know what the value
    290 of <computeroutput>var</computeroutput> is at the end of the program.
Is it 2?  Or 1?</para>
    292 
    293 <programlisting><![CDATA[
    294 #include <pthread.h>
    295 
    296 int var = 0;
    297 
    298 void* child_fn ( void* arg ) {
    299    var++; /* Unprotected relative to parent */ /* this is line 6 */
    300    return NULL;
    301 }
    302 
    303 int main ( void ) {
    304    pthread_t child;
    305    pthread_create(&child, NULL, child_fn, NULL);
    306    var++; /* Unprotected relative to child */ /* this is line 13 */
    307    pthread_join(child, NULL);
    308    return 0;
    309 }
    310 ]]></programlisting>
    311 
    312 <para>The problem is there is nothing to
    313 stop <varname>var</varname> being updated simultaneously
    314 by both threads.  A correct program would 
    315 protect <varname>var</varname> with a lock of type
<type>pthread_mutex_t</type>, which is acquired
    317 before each access and released afterwards.  Helgrind's output for
    318 this program is:</para>
    319 
    320 <programlisting><![CDATA[
    321 Thread #1 is the program's root thread
    322 
    323 Thread #2 was created
    324    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    325    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    326    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    327    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    328    by 0x400605: main (simple_race.c:12)
    329 
    330 Possible data race during read of size 4 at 0x601038 by thread #1
    331 Locks held: none
    332    at 0x400606: main (simple_race.c:13)
    333 
    334 This conflicts with a previous write of size 4 by thread #2
    335 Locks held: none
    336    at 0x4005DC: child_fn (simple_race.c:6)
    337    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    338    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    339    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    340 
    341 Location 0x601038 is 0 bytes inside global var "var"
    342 declared at simple_race.c:3
    343 ]]></programlisting>
    344 
<para>This is quite a lot of detail for an apparently simple error.
The main error message is the clause beginning
"<computeroutput>Possible data race during read</computeroutput>".  It
says there is a race as a result of a read of size 4 (bytes), at
0x601038, which is the
    348 address of <computeroutput>var</computeroutput>, happening in
    349 function <computeroutput>main</computeroutput> at line 13 in the
    350 program.</para>
    351 
    352 <para>Two important parts of the message are:</para>
    353 
    354 <itemizedlist>
    355  <listitem>
    356   <para>Helgrind shows two stack traces for the error, not one.  By
    357    definition, a race involves two different threads accessing the
    358    same location in such a way that the result depends on the relative
    359    speeds of the two threads.</para>
    360   <para>
    361    The first stack trace follows the text "<computeroutput>Possible
    362    data race during read of size 4 ...</computeroutput>" and the
    363    second trace follows the text "<computeroutput>This conflicts with
    364    a previous write of size 4 ...</computeroutput>".  Helgrind is
    365    usually able to show both accesses involved in a race.  At least
    366    one of these will be a write (since two concurrent, unsynchronised
    367    reads are harmless), and they will of course be from different
    368    threads.</para>
    369   <para>By examining your program at the two locations, you should be
    370    able to get at least some idea of what the root cause of the
    371    problem is.  For each location, Helgrind shows the set of locks
    372    held at the time of the access.  This often makes it clear which
    373    thread, if any, failed to take a required lock.  In this example
    374    neither thread holds a lock during the access.</para>
    375  </listitem>
    376  <listitem>
    377   <para>For races which occur on global or stack variables, Helgrind
    378    tries to identify the name and defining point of the variable.
    379    Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
    380    global var "var" declared at simple_race.c:3</computeroutput>".</para>
    381   <para>Showing names of stack and global variables carries no
    382    run-time overhead once Helgrind has your program up and running.
    383    However, it does require Helgrind to spend considerable extra time
    384    and memory at program startup to read the relevant debug info.
    385    Hence this facility is disabled by default.  To enable it, you need
    386    to give the <varname>--read-var-info=yes</varname> option to
    387    Helgrind.</para>
    388  </listitem>
    389 </itemizedlist>
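
<para>The fix described earlier, protecting
<varname>var</varname> with a <computeroutput>pthread_mutex_t</computeroutput>
that is acquired before each access and released afterwards, can be
sketched as follows.  The names <varname>var_mx</varname> and
<function>run_locked_counter</function> are hypothetical, and the
parent/child logic is wrapped in a function that returns the final
value so that the result is easy to check.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

static int             var    = 0;
static pthread_mutex_t var_mx = PTHREAD_MUTEX_INITIALIZER;

static void* child_fn ( void* arg ) {
   (void)arg;
   pthread_mutex_lock(&var_mx);
   var++;                          /* protected relative to parent */
   pthread_mutex_unlock(&var_mx);
   return NULL;
}

/* Performs the parent and child increments under the mutex.  Returns
   the final value of var, or -1 if the child could not be created. */
int run_locked_counter ( void ) {
   pthread_t child;

   var = 0;   /* no other thread exists yet */
   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;

   pthread_mutex_lock(&var_mx);
   var++;                          /* protected relative to child */
   pthread_mutex_unlock(&var_mx);

   pthread_join(child, NULL);
   return var;                     /* child has exited: safe to read */
}
]]></programlisting>

<para>With both increments performed under the same mutex, the final
value is always 2, and neither access should be reported as
racing.</para>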
    390 
    391 <para>The following section explains Helgrind's race detection
    392 algorithm in more detail.</para>
    393 
    394 </sect2>
    395 
    396 
    397 
    398 <sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
    399 <title>Helgrind's Race Detection Algorithm</title>
    400 
    401 <para>Most programmers think about threaded programming in terms of
    402 the basic functionality provided by the threading library (POSIX
    403 Pthreads): thread creation, thread joining, locks, condition
    404 variables, semaphores and barriers.</para>
    405 
    406 <para>The effect of using these functions is to impose 
    407 constraints upon the order in which memory accesses can
    408 happen.  This implied ordering is generally known as the
    409 "happens-before relation".  Once you understand the happens-before
    410 relation, it is easy to see how Helgrind finds races in your code.
    411 Fortunately, the happens-before relation is itself easy to understand,
    412 and is by itself a useful tool for reasoning about the behaviour of
    413 parallel programs.  We now introduce it using a simple example.</para>
    414 
    415 <para>Consider first the following buggy program:</para>
    416 
    417 <programlisting><![CDATA[
    418 Parent thread:                         Child thread:
    419 
    420 int var;
    421 
    422 // create child thread
    423 pthread_create(...)                          
    424 var = 20;                              var = 10;
    425                                        exit
    426 
    427 // wait for child
    428 pthread_join(...)
    429 printf("%d\n", var);
    430 ]]></programlisting>
    431 
    432 <para>The parent thread creates a child.  Both then write different
    433 values to some variable <computeroutput>var</computeroutput>, and the
    434 parent then waits for the child to exit.</para>
    435 
    436 <para>What is the value of <computeroutput>var</computeroutput> at the
    437 end of the program, 10 or 20?  We don't know.  The program is
    438 considered buggy (it has a race) because the final value
    439 of <computeroutput>var</computeroutput> depends on the relative rates
    440 of progress of the parent and child threads.  If the parent is fast
    441 and the child is slow, then the child's assignment may happen later,
    442 so the final value will be 10; and vice versa if the child is faster
    443 than the parent.</para>
    444 
<para>The relative rates of progress of parent vs child are not something
    446 the programmer can control, and will often change from run to run.
    447 It depends on factors such as the load on the machine, what else is
    448 running, the kernel's scheduling strategy, and many other factors.</para>
    449 
    450 <para>The obvious fix is to use a lock to
    451 protect <computeroutput>var</computeroutput>.  It is however
    452 instructive to consider a somewhat more abstract solution, which is to
    453 send a message from one thread to the other:</para>
    454 
    455 <programlisting><![CDATA[
    456 Parent thread:                         Child thread:
    457 
    458 int var;
    459 
    460 // create child thread
    461 pthread_create(...)                          
    462 var = 20;
    463 // send message to child
    464                                        // wait for message to arrive
    465                                        var = 10;
    466                                        exit
    467 
    468 // wait for child
    469 pthread_join(...)
    470 printf("%d\n", var);
    471 ]]></programlisting>
    472 
    473 <para>Now the program reliably prints "10", regardless of the speed of
    474 the threads.  Why?  Because the child's assignment cannot happen until
    475 after it receives the message.  And the message is not sent until
    476 after the parent's assignment is done.</para>
    477 
    478 <para>The message transmission creates a "happens-before" dependency
    479 between the two assignments: <computeroutput>var = 20;</computeroutput>
    480 must now happen-before <computeroutput>var = 10;</computeroutput>.
    481 And so there is no longer a race
    482 on <computeroutput>var</computeroutput>.
    483 </para>
    484 
    485 <para>Note that it's not significant that the parent sends a message
    486 to the child.  Sending a message from the child (after its assignment)
    487 to the parent (before its assignment) would also fix the problem, causing
    488 the program to reliably print "20".</para>
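
<para>The message-passing version above is abstract.  One possible way
to realise it with pthreads, shown here as a sketch rather than the
only option, is to treat a flag guarded by a mutex and condition
variable as the "message".  The names <varname>msg_sent</varname>,
<varname>msg_mx</varname>, <varname>msg_cv</varname> and
<function>run_message_demo</function> are hypothetical.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

static int             var;
static int             msg_sent = 0;   /* the "message" */
static pthread_mutex_t msg_mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  msg_cv = PTHREAD_COND_INITIALIZER;

static void* child_fn ( void* arg ) {
   (void)arg;
   /* Wait for the message to arrive. */
   pthread_mutex_lock(&msg_mx);
   while (!msg_sent)
      pthread_cond_wait(&msg_cv, &msg_mx);
   pthread_mutex_unlock(&msg_mx);

   var = 10;   /* cannot run before the parent's var = 20 */
   return NULL;
}

/* Returns the final value of var, or -1 if the child could not be
   created. */
int run_message_demo ( void ) {
   pthread_t child;

   msg_sent = 0;   /* no other thread exists yet */
   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;

   var = 20;

   /* Send the message to the child. */
   pthread_mutex_lock(&msg_mx);
   msg_sent = 1;
   pthread_cond_signal(&msg_cv);
   pthread_mutex_unlock(&msg_mx);

   /* Wait for the child. */
   pthread_join(child, NULL);
   return var;
}
]]></programlisting>

<para>Note that <varname>var</varname> itself is never accessed while
holding the mutex.  The ordering comes entirely from the
synchronisation on the flag: the parent's assignment precedes its
unlock, and the child's assignment follows its own acquisition of the
same mutex after the flag was set.  The function therefore returns 10
regardless of scheduling.</para>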
    489 
    490 <para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
accesses to memory locations.  If a location -- in this example, 
<computeroutput>var</computeroutput> --
    493 is accessed by two different threads, Helgrind checks to see if the
    494 two accesses are ordered by the happens-before relation.  If so,
    495 that's fine; if not, it reports a race.</para>
    496 
    497 <para>It is important to understand that the happens-before relation
    498 creates only a partial ordering, not a total ordering.  An example of
    499 a total ordering is comparison of numbers: for any two numbers 
<computeroutput>y</computeroutput>, 
<computeroutput>x</computeroutput> is either less than, equal to, or
greater than
    504 <computeroutput>y</computeroutput>.  A partial ordering is like a
    505 total ordering, but it can also express the concept that two elements
    506 are neither equal, less or greater, but merely unordered with respect
    507 to each other.</para>
    508 
    509 <para>In the fixed example above, we say that 
    510 <computeroutput>var = 20;</computeroutput> "happens-before"
    511 <computeroutput>var = 10;</computeroutput>.  But in the original
    512 version, they are unordered: we cannot say that either happens-before
    513 the other.</para>
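
<para>One standard way to represent such a partial ordering in code is
with vector clocks, where each event carries one counter per thread.
The sketch below is illustrative only and is not presented as
Helgrind's internal representation; the names
<computeroutput>VCOrder</computeroutput>,
<function>vc_compare</function> and
<computeroutput>N_THREADS</computeroutput> are hypothetical.  The
point is that a comparison has four possible outcomes, not
three.</para>

<programlisting><![CDATA[
#include <stddef.h>

#define N_THREADS 3   /* arbitrary, for this sketch */

typedef enum { VC_EQUAL, VC_BEFORE, VC_AFTER, VC_UNORDERED } VCOrder;

/* Compares two vector clocks component by component.  'a' is before
   'b' if no component of 'a' exceeds the matching component of 'b'
   and at least one is smaller.  If some components are smaller and
   others larger, the two events are unordered. */
VCOrder vc_compare ( const unsigned a[N_THREADS],
                     const unsigned b[N_THREADS] ) {
   int some_less = 0, some_greater = 0;
   for (size_t i = 0; i < N_THREADS; i++) {
      if (a[i] < b[i]) some_less = 1;
      if (a[i] > b[i]) some_greater = 1;
   }
   if (some_less && some_greater) return VC_UNORDERED;
   if (some_less)                 return VC_BEFORE;
   if (some_greater)              return VC_AFTER;
   return VC_EQUAL;
}
]]></programlisting>

<para>For instance, the clocks {2,0,0} and {2,1,0} compare as
"before", whereas {2,0,0} and {1,2,0} are unordered.  Two conflicting
accesses whose clocks are unordered correspond to the racy situation
in the original version of the example.</para>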
    514 
    515 <para>What does it mean to say that two accesses from different
    516 threads are ordered by the happens-before relation?  It means that
    517 there is some chain of inter-thread synchronisation operations which
    518 cause those accesses to happen in a particular order, irrespective of
    519 the actual rates of progress of the individual threads.  This is a
    520 required property for a reliable threaded program, which is why
    521 Helgrind checks for it.</para>
    522 
    523 <para>The happens-before relations created by standard threading
    524 primitives are as follows:</para>
    525 
    526 <itemizedlist>
    527  <listitem><para>When a mutex is unlocked by thread T1 and later (or
    528   immediately) locked by thread T2, then the memory accesses in T1
    529   prior to the unlock must happen-before those in T2 after it acquires
    530   the lock.</para>
    531  </listitem>
    532  <listitem><para>The same idea applies to reader-writer locks,
    533   although with some complication so as to allow correct handling of
    534   reads vs writes.</para>
    535  </listitem>
    536  <listitem><para>When a condition variable (CV) is signalled on by
    537   thread T1 and some other thread T2 is thereby released from a wait
    538   on the same CV, then the memory accesses in T1 prior to the
    539   signalling must happen-before those in T2 after it returns from the
    540   wait.  If no thread was waiting on the CV then there is no
    541   effect.</para>
    542  </listitem>
    543  <listitem><para>If instead T1 broadcasts on a CV, then all of the
    544   waiting threads, rather than just one of them, acquire a
    545   happens-before dependency on the broadcasting thread at the point it
    546   did the broadcast.</para>
    547  </listitem>
 <listitem><para>A thread T2 that continues after completing
  <function>sem_wait</function> on a semaphore that thread T1 posts on
  acquires a happens-before dependence on the posting thread, a bit
  like the dependencies caused by mutex unlock-lock pairs.  However,
  since a semaphore can be posted
    552   on many times, it is unspecified from which of the post calls the
    553   wait call gets its happens-before dependency.</para>
    554  </listitem>
    555  <listitem><para>For a group of threads T1 .. Tn which arrive at a
    556   barrier and then move on, each thread after the call has a
    557   happens-after dependency from all threads before the
    558   barrier.</para>
    559  </listitem>
    560  <listitem><para>A newly-created child thread acquires an initial
    561   happens-after dependency on the point where its parent created it.
    562   That is, all memory accesses performed by the parent prior to
    563   creating the child are regarded as happening-before all the accesses
    564   of the child.</para>
    565  </listitem>
    566  <listitem><para>Similarly, when an exiting thread is reaped via a
    567   call to <function>pthread_join</function>, once the call returns, the
    568   reaping thread acquires a happens-after dependency relative to all memory
    569   accesses made by the exiting thread.</para>
    570  </listitem>
    571 </itemizedlist>
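
<para>The last two items in the list, thread creation and thread
joining, are enough on their own to make some unlocked accesses
race-free.  The following sketch, with the hypothetical names
<varname>shared</varname> and
<function>run_create_join_demo</function>, accesses a variable from
two threads without any lock, yet every pair of accesses is ordered by
the happens-before relation.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

static int shared;   /* deliberately not protected by any lock */

static void* child_fn ( void* arg ) {
   (void)arg;
   /* Ordered after the parent's write by pthread_create. */
   shared = shared + 1;
   return NULL;
}

/* Returns the final value of shared, or -1 if the child could not be
   created. */
int run_create_join_demo ( void ) {
   pthread_t child;

   shared = 41;   /* happens-before everything the child does */
   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;

   /* The parent must not touch 'shared' here: between create and
      join, its accesses would be unordered with the child's. */

   pthread_join(child, NULL);
   return shared;   /* ordered after the child's write by the join */
}
]]></programlisting>

<para>The function returns 42.  Moving the parent's read of
<varname>shared</varname> to before the call
to <function>pthread_join</function> would remove the ordering and
reintroduce a race.</para>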
    572 
    573 <para>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
    575 dependencies.  It also monitors all memory accesses.</para>
    576 
    577 <para>If a location is accessed by two different threads, but Helgrind
    578 cannot find any path through the happens-before graph from one access
    579 to the other, then it reports a race.</para>
    580 
    581 <para>There are a couple of caveats:</para>
    582 
    583 <itemizedlist>
    584  <listitem><para>Helgrind doesn't check for a race in the case where
    585   both accesses are reads.  That would be silly, since concurrent
    586   reads are harmless.</para>
    587  </listitem>
    588  <listitem><para>Two accesses are considered to be ordered by the
    589   happens-before dependency even through arbitrarily long chains of
  L, and then uses <function>pthread_cond_signal</function> to wake T2,
  which later uses <function>pthread_cond_signal</function> to wake T3,
  which then accesses L, then
    593   a suitable happens-before dependency exists between the first and second
    594   accesses, even though it involves two different inter-thread
    595   synchronisation events.</para>
    596  </listitem>
    597 </itemizedlist>
    598 
    599 </sect2>
    600 
    601 
    602 
    603 <sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
    604 <title>Interpreting Race Error Messages</title>
    605 
    606 <para>Helgrind's race detection algorithm collects a lot of
    607 information, and tries to present it in a helpful way when a race is
    608 detected.  Here's an example:</para>
    609 
    610 <programlisting><![CDATA[
    611 Thread #2 was created
    612    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    613    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    614    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    615    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    616    by 0x4008F2: main (tc21_pthonce.c:86)
    617 
    618 Thread #3 was created
    619    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    620    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    621    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    622    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    623    by 0x4008F2: main (tc21_pthonce.c:86)
    624 
    625 Possible data race during read of size 4 at 0x601070 by thread #3
    626 Locks held: none
    627    at 0x40087A: child (tc21_pthonce.c:74)
    628    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    629    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    630    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    631 
    632 This conflicts with a previous write of size 4 by thread #2
    633 Locks held: none
    634    at 0x400883: child (tc21_pthonce.c:74)
    635    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    636    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    637    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    638 
    639 Location 0x601070 is 0 bytes inside local var "unprotected2"
    640 declared at tc21_pthonce.c:51, in frame #0 of thread 3
    641 ]]></programlisting>
    642 
    643 <para>Helgrind first announces the creation points of any threads
    644 referenced in the error message.  This is so it can speak concisely
    645 about threads without repeatedly printing their creation point call
    646 stacks.  Each thread is only ever announced once, the first time it
    647 appears in any Helgrind error message.</para>
    648 
    649 <para>The main error message begins at the text
    650 "<computeroutput>Possible data race during read</computeroutput>".  At
    651 the start is information you would expect to see -- address and size
    652 of the racing access, whether a read or a write, and the call stack at
    653 the point it was detected.</para>
    654 
    655 <para>A second call stack is presented starting at the text
    656 "<computeroutput>This conflicts with a previous
    657 write</computeroutput>".  This shows a previous access which also
    658 accessed the stated address, and which is believed to be racing
    659 against the access in the first call stack. Note that this second
    660 call stack is limited to a maximum of 8 entries to limit the
    661 memory usage.</para>
    662 
    663 <para>Finally, Helgrind may attempt to give a description of the
    664 raced-on address in source level terms.  In this example, it
    665 identifies it as a local variable, shows its name, declaration point,
    666 and in which frame (of the first call stack) it lives.  Note that this
    667 information is only shown when <varname>--read-var-info=yes</varname>
    668 is specified on the command line.  That's because reading the DWARF3
    669 debug information in enough detail to capture variable type and
    670 location information makes Helgrind much slower at startup, and also
    671 requires considerable amounts of memory, for large programs.
    672 </para>
    673 
    674 <para>Once you have your two call stacks, how do you find the root
    675 cause of the race?</para>
    676 
    677 <para>The first thing to do is examine the source locations referred
    678 to by each call stack.  They should both show an access to the same
    679 location, or variable.</para>
    680 
<para>Now figure out how that location should have been made
thread-safe:</para>
    683 
    684 <itemizedlist>
    685  <listitem><para>Perhaps the location was intended to be protected by
    686   a mutex?  If so, you need to lock and unlock the mutex at both
    687   access points, even if one of the accesses is reported to be a read.
  Did you perhaps forget the locking at one or other of the accesses?
  To help you check this, Helgrind shows the set of locks held by each
  thread at the time it accessed the raced-on location.</para>
    691  </listitem>
 <listitem><para>Alternatively, perhaps you intended to use some
    693   other scheme to make it safe, such as signalling on a condition
    694   variable.  In all such cases, try to find a synchronisation event
    695   (or a chain thereof) which separates the earlier-observed access (as
    696   shown in the second call stack) from the later-observed access (as
    697   shown in the first call stack).  In other words, try to find
    698   evidence that the earlier access "happens-before" the later access.
    699   See the previous subsection for an explanation of the happens-before
    700   relation.</para>
    701   <para>
  The fact that Helgrind is reporting a race means it did not observe
  any happens-before relation between the two accesses.  If
  Helgrind is working correctly, you should not be able to find any
  such relation either, even on detailed inspection
    706   of the source code.  Hopefully, though, your inspection of the code
    707   will show where the missing synchronisation operation(s) should have
    708   been.</para>
    709  </listitem>
    710 </itemizedlist>
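
<para>As a minimal sketch of the mutex-based fix, assume a shared
counter that several threads increment.  Every access to the counter,
including the final read, is made with the same mutex held.  The names
<function>worker</function>, <function>run_counter_demo</function>,
<varname>shared_counter</varname> and <varname>counter_mx</varname>, and
the thread and iteration counts, are hypothetical and chosen only for
illustration; they are not taken from the test case shown above.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* hypothetical values for this sketch */
#define INCREMENTS  10000

static long shared_counter = 0;   /* the location that would be raced on */
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;

/* Each worker takes counter_mx around every access to shared_counter. */
static void *worker(void *arg)
{
   (void)arg;
   for (int i = 0; i < INCREMENTS; i++) {
      pthread_mutex_lock(&counter_mx);
      shared_counter++;
      pthread_mutex_unlock(&counter_mx);
   }
   return NULL;
}

/* Runs the workers to completion and returns the final count. */
long run_counter_demo(void)
{
   pthread_t tids[NUM_THREADS];
   long result;

   shared_counter = 0;   /* no other thread exists yet */
   for (int i = 0; i < NUM_THREADS; i++) {
      int rc = pthread_create(&tids[i], NULL, worker, NULL);
      assert(rc == 0);
   }
   for (int i = 0; i < NUM_THREADS; i++)
      pthread_join(tids[i], NULL);

   /* Reads are locked too, even though they do not modify the value. */
   pthread_mutex_lock(&counter_mx);
   result = shared_counter;
   pthread_mutex_unlock(&counter_mx);
   return result;
}
```
]]></programlisting>

<para>Removing the lock and unlock calls from
<function>worker</function> would leave the increments unsynchronised,
which is the kind of access pattern reported in the example
above.</para>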
    711 
    712 </sect2>
    713 
    714 
    715 </sect1>
    716 
    717 <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
    718 <title>Hints and Tips for Effective Use of Helgrind</title>
    719 
    720 <para>Helgrind can be very helpful in finding and resolving
    721 threading-related problems.  Like all sophisticated tools, it is most
    722 effective when you understand how to play to its strengths.</para>
    723 
    724 <para>Helgrind will be less effective when you merely throw an
    725 existing threaded program at it and try to make sense of any reported
    726 errors.  It will be more effective if you design threaded programs
    727 from the start in a way that helps Helgrind verify correctness.  The
    728 same is true for finding memory errors with Memcheck, but applies more
    729 here, because thread checking is a harder problem.  Consequently it is
    730 much easier to write a correct program for which Helgrind falsely
    731 reports (threading) errors than it is to write a correct program for
    732 which Memcheck falsely reports (memory) errors.</para>
    733 
    734 <para>With that in mind, here are some tips, listed most important first,
    735 for getting reliable results and avoiding false errors.  The first two
    736 are critical.  Any violations of them will swamp you with huge numbers
    737 of false data-race errors.</para>
    738 
    739 
    740 <orderedlist>
    741 
    742   <listitem>
    743     <para>Make sure your application, and all the libraries it uses,
    744     use the POSIX threading primitives.  Helgrind needs to be able to
    745     see all events pertaining to thread creation, exit, locking and
    746     other synchronisation events.  To do so it intercepts many POSIX
    747     pthreads functions.</para>
    748 
    749     <para>Do not roll your own threading primitives (mutexes, etc)
    750     from combinations of the Linux futex syscall, atomic counters, etc.
    751     These throw Helgrind's internal what's-going-on models
    752     way off course and will give bogus results.</para>
    753 
    754     <para>Also, do not reimplement existing POSIX abstractions using
    755     other POSIX abstractions.  For example, don't build your own
    756     semaphore routines or reader-writer locks from POSIX mutexes and
    757     condition variables.  Instead use POSIX reader-writer locks and
    758     semaphores directly, since Helgrind supports them directly.</para>
    759 
    760     <para>Helgrind directly supports the following POSIX threading
    761     abstractions: mutexes, reader-writer locks, condition variables
    762     (but see below), semaphores and barriers.  Currently spinlocks
    763     are not supported, although they could be in future.</para>
    764 
    765     <para>At the time of writing, the following popular Linux packages
    766     are known to implement their own threading primitives:</para>
    767 
    768     <itemizedlist>
    769      <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
    770       only uses POSIX pthreads primitives.  Unfortunately Qt 4.X 
    771       has its own implementation of mutexes (QMutex) and thread reaping.
    772       Helgrind 3.4.x contains direct support
    773       for Qt 4.X threading, which is experimental but is believed to
    774       work fairly well.  A side effect of supporting Qt 4 directly is
    775       that Helgrind can be used to debug KDE4 applications.  As this
    776       is an experimental feature, we would particularly appreciate
    777       feedback from folks who have used Helgrind to successfully debug
    778       Qt 4 and/or KDE4 applications.</para>
    779      </listitem>
    780      <listitem><para>Runtime support library for GNU OpenMP (part of
    781       GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
    782       library (<filename>libgomp.so</filename>) constructs its own
      synchronisation primitives using combinations of atomic memory
      instructions and the futex syscall, which causes total chaos in
      Helgrind since it cannot "see" those.</para>
    786      <para>Fortunately, this can be solved using a configuration-time
    787       option (for GCC).  Rebuild GCC from source, and configure using
    788       <varname>--disable-linux-futex</varname>.
    789       This makes libgomp.so use the standard
    790       POSIX threading primitives instead.  Note that this was tested
    791       using GCC 4.2.3 and has not been re-tested using more recent GCC
    792       versions.  We would appreciate hearing about any successes or
    793       failures with more recent versions.</para>
    794      </listitem>
    795     </itemizedlist>
    796 
    797     <para>If you must implement your own threading primitives, there
    798       are a set of client request macros
    799       in <computeroutput>helgrind.h</computeroutput> to help you
    800       describe your primitives to Helgrind.  You should be able to
    801       mark up mutexes, condition variables, etc, without difficulty.
    802     </para>
    803     <para>
    804       It is also possible to mark up the effects of thread-safe
    805       reference counting using the
    806       <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
    807       <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
      <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
      macros.  Thread-safe reference counting using an atomically
    810       incremented/decremented refcount variable causes Helgrind
    811       problems because a one-to-zero transition of the reference count
    812       means the accessing thread has exclusive ownership of the
    813       associated resource (normally, a C++ object) and can therefore
    814       access it (normally, to run its destructor) without locking.
    815       Helgrind doesn't understand this, and markup is essential to
    816       avoid false positives.
    817     </para>
    818 
    819     <para>
    820       Here are recommended guidelines for marking up thread safe
    821       reference counting in C++.  You only need to mark up your
    822       release methods -- the ones which decrement the reference count.
    823       Given a class like this:
    824     </para>
    825 
    826 <programlisting><![CDATA[
class MyClass {
   unsigned int mRefCount;

   void Release ( void ) {
      // atomic_decrement is taken to return the new (decremented) value
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         delete this;
      }
   }
};
    837 ]]></programlisting>
    838 
    839    <para>
    840      the release method should be marked up as follows:
    841    </para>
    842 
    843 <programlisting><![CDATA[
   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         ANNOTATE_HAPPENS_AFTER(&mRefCount);
         ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
         delete this;
      } else {
         ANNOTATE_HAPPENS_BEFORE(&mRefCount);
      }
   }
    854 ]]></programlisting>
    855 
    856     <para>
    857       There are a number of complex, mostly-theoretical objections to
    858       this scheme.  From a theoretical standpoint it appears to be
    859       impossible to devise a markup scheme which is completely correct
    860       in the sense of guaranteeing to remove all false races.  The
    861       proposed scheme however works well in practice.
    862     </para>
    863 
    864   </listitem>
    865 
    866   <listitem>
    <para>Avoid memory recycling.  If you can't avoid it, you must
    tell Helgrind what is going on via the
    869     <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
    870     <computeroutput>helgrind.h</computeroutput>).</para>
    871 
    872     <para>Helgrind is aware of standard heap memory allocation and
    873     deallocation that occurs via
    874     <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
    875     and from entry and exit of stack frames.  In particular, when memory is
    876     deallocated via <function>free</function>, <function>delete</function>,
    877     or function exit, Helgrind considers that memory clean, so when it is
    878     eventually reallocated, its history is irrelevant.</para>
    879 
    880     <para>However, it is common practice to implement memory recycling
    881     schemes.  In these, memory to be freed is not handed to
    882     <function>free</function>/<function>delete</function>, but instead put
    883     into a pool of free buffers to be handed out again as required.  The
    884     problem is that Helgrind has no
    885     way to know that such memory is logically no longer in use, and
    886     its history is irrelevant.  Hence you must make that explicit,
    887     using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
    888     to specify the relevant address ranges.  It's easiest to put these
    889     requests into the pool manager code, and use them either when memory is
    890     returned to the pool, or is allocated from it.</para>
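
    <para>The following is a rough sketch of where such a request
    might go in a pool manager, here at the point where a buffer is
    returned to the pool.  The pool itself
    (<function>pool_init</function>, <function>pool_get</function>,
    <function>pool_put</function>, the buffer size and count) is
    hypothetical and deliberately simplistic.  The fallback definition
    of <function>VALGRIND_HG_CLEAN_MEMORY</function> is an assumption
    made only so that the sketch compiles where the Valgrind headers
    are not installed; it does nothing.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Use the real client request if the Valgrind headers are installed;
   otherwise fall back to a no-op (an assumption for this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE  256   /* hypothetical pool geometry */
#define POOL_BUFS 8

typedef struct PoolBuf {
   struct PoolBuf *next;            /* free-list link */
   unsigned char   data[BUF_SIZE];  /* the part handed to callers */
} PoolBuf;

static PoolBuf  pool_storage[POOL_BUFS];
static PoolBuf *free_list = NULL;
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;

/* Call once, before any other thread uses the pool. */
void pool_init(void)
{
   free_list = NULL;
   for (int i = 0; i < POOL_BUFS; i++) {
      pool_storage[i].next = free_list;
      free_list = &pool_storage[i];
   }
}

/* Hands out a buffer, or NULL if the pool is empty. */
unsigned char *pool_get(void)
{
   PoolBuf *b;
   pthread_mutex_lock(&pool_mx);
   b = free_list;
   if (b != NULL)
      free_list = b->next;
   pthread_mutex_unlock(&pool_mx);
   return b != NULL ? b->data : NULL;
}

/* Returns a buffer to the pool instead of passing it to free(). */
void pool_put(unsigned char *data)
{
   PoolBuf *b = (PoolBuf *)(void *)(data - offsetof(PoolBuf, data));

   /* The buffer is logically dead from here on, so ask Helgrind to
      forget the access history of its data area. */
   VALGRIND_HG_CLEAN_MEMORY(data, BUF_SIZE);

   pthread_mutex_lock(&pool_mx);
   b->next = free_list;
   free_list = b;
   pthread_mutex_unlock(&pool_mx);
}
```
]]></programlisting>

    <para>Placing the request in <function>pool_get</function> instead
    would serve the same purpose; the point is that it lives in the
    pool manager rather than in every caller.</para>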
    891   </listitem>
    892 
    893   <listitem>
    894     <para>Avoid POSIX condition variables.  If you can, use POSIX
    895     semaphores (<function>sem_t</function>, <function>sem_post</function>,
    896     <function>sem_wait</function>) to do inter-thread event signalling.
    897     Semaphores with an initial value of zero are particularly useful for
    898     this.</para>
    899 
    900     <para>Helgrind only partially correctly handles POSIX condition
    901     variables.  This is because Helgrind can see inter-thread
    902     dependencies between a <function>pthread_cond_wait</function> call and a
    903     <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
    904     call only if the waiting thread actually gets to the rendezvous first
    905     (so that it actually calls
    906     <function>pthread_cond_wait</function>).  It can't see dependencies
    907     between the threads if the signaller arrives first.  In the latter case,
    908     POSIX guidelines imply that the associated boolean condition still
    909     provides an inter-thread synchronisation event, but one which is
    910     invisible to Helgrind.</para>
    911 
    912     <para>The result of Helgrind missing some inter-thread
    913     synchronisation events is to cause it to report false positives.
    914     </para>
    915 
    916     <para>The root cause of this synchronisation lossage is
    917     particularly hard to understand, so an example is helpful.  It was
    918     discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
    919     in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
    920     canonical POSIX-recommended usage scheme for condition variables
    921     is as follows:</para>
    922 
    923 <programlisting><![CDATA[
b   is a Boolean condition, which is False most of the time
cv  is a condition variable
mx  is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
    934 ]]></programlisting>
    935 
    936     <para>Assume <computeroutput>b</computeroutput> is False most of
    937     the time.  If the waiter arrives at the rendezvous first, it
    938     enters its while-loop, waits for the signaller to signal, and
    939     eventually proceeds.  Helgrind sees the signal, notes the
    940     dependency, and all is well.</para>
    941 
    942     <para>If the signaller arrives
    943     first, <computeroutput>b</computeroutput> is set to true, and the
    944     signal disappears into nowhere.  When the waiter later arrives, it
    945     does not enter its while-loop and simply carries on.  But even in
    946     this case, the waiter code following the while-loop cannot execute
    947     until the signaller sets <computeroutput>b</computeroutput> to
    948     True.  Hence there is still the same inter-thread dependency, but
    949     this time it is through an arbitrary in-memory condition, and
    950     Helgrind cannot see it.</para>
    951 
    952     <para>By comparison, Helgrind's detection of inter-thread
    953     dependencies caused by semaphore operations is believed to be
    954     exactly correct.</para>
    955 
    956     <para>As far as I know, a solution to this problem that does not
    957     require source-level annotation of condition-variable wait loops
    958     is beyond the current state of the art.</para>
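
    <para>For comparison, here is a minimal sketch of the same
    one-shot notification done with a POSIX semaphore whose initial
    value is zero.  The names <function>producer</function>,
    <function>run_semaphore_demo</function>,
    <varname>data_ready</varname> and <varname>shared_data</varname>
    are hypothetical.  The synchronisation is carried by the
    <function>sem_post</function>/<function>sem_wait</function> pair
    itself rather than by a separate in-memory condition, so it does
    not matter which thread reaches the rendezvous first.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t data_ready;    /* initial value zero: nothing posted yet */
static int   shared_data;   /* written by the producer before it posts */

static void *producer(void *arg)
{
   (void)arg;
   shared_data = 42;        /* plain write, no lock held */
   sem_post(&data_ready);   /* signal that the data is ready */
   return NULL;
}

/* Starts the producer, waits for its signal, returns the data seen. */
int run_semaphore_demo(void)
{
   pthread_t tid;
   int rc, seen;

   rc = sem_init(&data_ready, 0, 0);   /* not shared across processes */
   assert(rc == 0);
   rc = pthread_create(&tid, NULL, producer, NULL);
   assert(rc == 0);

   /* Returns only once the producer has posted, whichever thread
      arrives first.  Retry if interrupted by a signal handler. */
   while (sem_wait(&data_ready) == -1 && errno == EINTR)
      continue;
   seen = shared_data;

   pthread_join(tid, NULL);
   sem_destroy(&data_ready);
   return seen;
}
```
]]></programlisting>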
    959   </listitem>
    960 
    961   <listitem>
    962     <para>Make sure you are using a supported Linux distribution.  At
    963     present, Helgrind only properly supports glibc-2.3 or later.  This
    964     in turn means we only support glibc's NPTL threading
    965     implementation.  The old LinuxThreads implementation is not
    966     supported.</para>
    967   </listitem>
    968 
    969   <listitem>
    970     <para>Round up all finished threads using
    971     <function>pthread_join</function>.  Avoid
    972     detaching threads: don't create threads in the detached state, and
    973     don't call <function>pthread_detach</function> on existing threads.</para>
    974 
    975     <para>Using <function>pthread_join</function> to round up finished
    976     threads provides a clear synchronisation point that both Helgrind and
    977     programmers can see.  If you don't call
    978     <function>pthread_join</function> on a thread, Helgrind has no way to
    979     know when it finishes, relative to any
    980     significant synchronisation points for other threads in the program.  So
    981     it assumes that the thread lingers indefinitely and can potentially
    982     interfere indefinitely with the memory state of the program.  It
    983     has every right to assume that -- after all, it might really be
    984     the case that, for scheduling reasons, the exiting thread did run
    985     very slowly in the last stages of its life.</para>
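
    <para>A small sketch of this pattern, with hypothetical names
    (<type>Job</type>, <function>square_job</function>,
    <function>run_join_demo</function>): the worker writes its result
    without any locking, and the creating thread reads it only after
    <function>pthread_join</function> has returned.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
   int input;
   int output;   /* written by the worker, read by the creator after join */
} Job;

static void *square_job(void *arg)
{
   Job *job = arg;
   job->output = job->input * job->input;
   return NULL;
}

/* Runs one worker on n and returns the value it computed. */
int run_join_demo(int n)
{
   pthread_t tid;
   Job job = { n, 0 };
   int rc;

   /* NULL attributes give a joinable (not detached) thread. */
   rc = pthread_create(&tid, NULL, square_job, &job);
   assert(rc == 0);

   /* The join is the synchronisation point: everything the worker
      did is complete before the read below. */
   rc = pthread_join(tid, NULL);
   assert(rc == 0);

   return job.output;
}
```
]]></programlisting>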
    986   </listitem>
    987 
    988   <listitem>
    989     <para>Perform thread debugging (with Helgrind) and memory
    990     debugging (with Memcheck) together.</para>
    991 
    992     <para>Helgrind tracks the state of memory in detail, and memory
    993     management bugs in the application are liable to cause confusion.
    994     In extreme cases, applications which do many invalid reads and
    995     writes (particularly to freed memory) have been known to crash
    996     Helgrind.  So, ideally, you should make your application
    997     Memcheck-clean before using Helgrind.</para>
    998 
    999     <para>It may be impossible to make your application Memcheck-clean
   1000     unless you first remove threading bugs.  In particular, it may be
   1001     difficult to remove all reads and writes to freed memory in
   1002     multithreaded C++ destructor sequences at program termination.
   1003     So, ideally, you should make your application Helgrind-clean
   1004     before using Memcheck.</para>
   1005 
   1006     <para>Since this circularity is obviously unresolvable, at least
   1007     bear in mind that Memcheck and Helgrind are to some extent
   1008     complementary, and you may need to use them together.</para>
   1009   </listitem>
   1010 
   1011   <listitem>
   1012     <para>POSIX requires that implementations of standard I/O
   1013     (<function>printf</function>, <function>fprintf</function>,
   1014     <function>fwrite</function>, <function>fread</function>, etc) are thread
   1015     safe.  Unfortunately GNU libc implements this by using internal locking
   1016     primitives that Helgrind is unable to intercept.  Consequently Helgrind
   1017     generates many false race reports when you use these functions.</para>
   1018 
    <para>Helgrind attempts to hide these errors using the standard
    Valgrind error-suppression mechanism.  So, at least for simple
    test cases, you don't see any.  Nevertheless, some may slip
    through, so be prepared to see an occasional false report from
    these functions.</para>
   1023   </listitem>
   1024 
   1025   <listitem>
   1026     <para>Helgrind's error checks do not work properly inside the
   1027     system threading library itself
   1028     (<computeroutput>libpthread.so</computeroutput>), and it usually
   1029     observes large numbers of (false) errors in there.  Valgrind's
   1030     suppression system then filters these out, so you should not see
   1031     them.</para>
   1032 
   1033     <para>If you see any race errors reported
   1034     where <computeroutput>libpthread.so</computeroutput> or
   1035     <computeroutput>ld.so</computeroutput> is the object associated
   1036     with the innermost stack frame, please file a bug report at
   1037     <ulink url="&vg-url;">&vg-url;</ulink>.
   1038     </para>
   1039   </listitem>
   1040 
   1041 </orderedlist>
   1042 
   1043 </sect1>
   1044 
   1045 
   1046 
   1047 
   1048 <sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
   1049 <title>Helgrind Command-line Options</title>
   1050 
   1051 <para>The following end-user options are available:</para>
   1052 
   1053 <!-- start of xi:include in the manpage -->
   1054 <variablelist id="hg.opts.list">
   1055 
   1056   <varlistentry id="opt.free-is-write"
   1057                 xreflabel="--free-is-write">
   1058     <term>
   1059       <option><![CDATA[--free-is-write=no|yes
   1060       [default: no] ]]></option>
   1061     </term>
   1062     <listitem>
   1063       <para>When enabled (not the default), Helgrind treats freeing of
   1064         heap memory as if the memory was written immediately before
   1065         the free.  This exposes races where memory is referenced by
   1066         one thread, and freed by another, but there is no observable
   1067         synchronisation event to ensure that the reference happens
   1068         before the free.
   1069       </para>
   1070       <para>This functionality is new in Valgrind 3.7.0, and is
   1071         regarded as experimental.  It is not enabled by default
   1072         because its interaction with custom memory allocators is not
   1073         well understood at present.  User feedback is welcomed.
   1074       </para>
   1075     </listitem>
   1076   </varlistentry>
   1077 
   1078   <varlistentry id="opt.track-lockorders"
   1079                 xreflabel="--track-lockorders">
   1080     <term>
   1081       <option><![CDATA[--track-lockorders=no|yes
   1082       [default: yes] ]]></option>
   1083     </term>
   1084     <listitem>
   1085       <para>When enabled (the default), Helgrind performs lock order
   1086       consistency checking.  For some buggy programs, the large number
   1087       of lock order errors reported can become annoying, particularly
   1088       if you're only interested in race errors.  You may therefore find
   1089       it helpful to disable lock order checking.</para>
   1090     </listitem>
   1091   </varlistentry>
   1092 
   1093   <varlistentry id="opt.history-level"
   1094                 xreflabel="--history-level">
   1095     <term>
   1096       <option><![CDATA[--history-level=none|approx|full
   1097       [default: full] ]]></option>
   1098     </term>
   1099     <listitem>
      <para><option>--history-level=full</option> (the default) causes
        Helgrind to collect enough information about "old" accesses that
        it can produce two stack traces in a race report -- both the
        stack trace for the current access, and the trace for the
        older, conflicting access. To limit memory usage, stack traces
        for "old" accesses are limited to a maximum of 8 entries, even if
        the <option>--num-callers</option> value is bigger.</para>
   1107       <para>Collecting such information is expensive in both speed and
   1108         memory, particularly for programs that do many inter-thread
   1109         synchronisation events (locks, unlocks, etc).  Without such
   1110         information, it is more difficult to track down the root
   1111         causes of races.  Nonetheless, you may not need it in
   1112         situations where you just want to check for the presence or
   1113         absence of races, for example, when doing regression testing
   1114         of a previously race-free program.</para>
   1115       <para><option>--history-level=none</option> is the opposite
   1116         extreme.  It causes Helgrind not to collect any information
   1117         about previous accesses.  This can be dramatically faster
   1118         than <option>--history-level=full</option>.</para>
   1119       <para><option>--history-level=approx</option> provides a
   1120         compromise between these two extremes.  It causes Helgrind to
   1121         show a full trace for the later access, and approximate
   1122         information regarding the earlier access.  This approximate
   1123         information consists of two stacks, and the earlier access is
   1124         guaranteed to have occurred somewhere between program points
   1125         denoted by the two stacks. This is not as useful as showing
   1126         the exact stack for the previous access
   1127         (as <option>--history-level=full</option> does), but it is
   1128         better than nothing, and it is almost as fast as
   1129         <option>--history-level=none</option>.</para>
   1130     </listitem>
   1131   </varlistentry>
   1132 
   1133   <varlistentry id="opt.conflict-cache-size"
   1134                 xreflabel="--conflict-cache-size">
   1135     <term>
   1136       <option><![CDATA[--conflict-cache-size=N
   1137       [default: 1000000] ]]></option>
   1138     </term>
   1139     <listitem>
   1140       <para>This flag only has any effect
   1141         at <option>--history-level=full</option>.</para>
   1142       <para>Information about "old" conflicting accesses is stored in
   1143         a cache of limited size, with LRU-style management.  This is
   1144         necessary because it isn't practical to store a stack trace
   1145         for every single memory access made by the program.
   1146         Historical information on not recently accessed locations is
   1147         periodically discarded, to free up space in the cache.</para>
   1148       <para>This option controls the size of the cache, in terms of the
   1149         number of different memory addresses for which
   1150         conflicting access information is stored.  If you find that
   1151         Helgrind is showing race errors with only one stack instead of
   1152         the expected two stacks, try increasing this value.</para>
   1153       <para>The minimum value is 10,000 and the maximum is 30,000,000
   1154         (thirty times the default value).  Increasing the value by 1
   1155         increases Helgrind's memory requirement by very roughly 100
   1156         bytes, so the maximum value will easily eat up three extra
   1157         gigabytes or so of memory.</para>
   1158     </listitem>
   1159   </varlistentry>
   1160 
   1161   <varlistentry id="opt.check-stack-refs"
   1162                 xreflabel="--check-stack-refs">
   1163     <term>
   1164       <option><![CDATA[--check-stack-refs=no|yes
   1165       [default: yes] ]]></option>
   1166     </term>
   1167     <listitem>
   1168       <para>
   1169         By default Helgrind checks all data memory accesses made by your
   1170         program.  This flag enables you to skip checking for accesses
   1171         to thread stacks (local variables).  This can improve
   1172         performance, but comes at the cost of missing races on
   1173         stack-allocated data.
   1174       </para>
   1175     </listitem>
   1176   </varlistentry>
   1177 
   1178 
   1179 </variablelist>
   1180 <!-- end of xi:include in the manpage -->
   1181 
   1182 <!-- start of xi:include in the manpage -->
   1183 <!--  commented out, because we don't document debugging options in the
   1184       manual.  Nb: all the double-dashes below had a space inserted in them
   1185       to avoid problems with premature closing of this comment.
   1186 <para>In addition, the following debugging options are available for
   1187 Helgrind:</para>
   1188 
   1189 <variablelist id="hg.debugopts.list">
   1190 
   1191   <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
   1192     <term>
   1193       <option><![CDATA[- -trace-malloc=no|yes [no]
   1194       ]]></option>
   1195     </term>
   1196     <listitem>
   1197       <para>Show all client <function>malloc</function> (etc) and
   1198       <function>free</function> (etc) requests.</para>
   1199     </listitem>
   1200   </varlistentry>
   1201 
   1202   <varlistentry id="opt.cmp-race-err-addrs" 
   1203                 xreflabel="- -cmp-race-err-addrs">
   1204     <term>
   1205       <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
   1206       ]]></option>
   1207     </term>
   1208     <listitem>
   1209       <para>Controls whether or not race (data) addresses should be
        taken into account when removing duplicates of race errors.
        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
        identical race errors will be considered to be the same even if
        their race addresses differ.  With
        <varname>- -cmp-race-err-addrs=yes</varname> they will be
   1215         considered different.  This is provided to help make certain
   1216         regression tests work reliably.</para>
   1217     </listitem>
   1218   </varlistentry>
   1219 
   1220   <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
   1221     <term>
   1222       <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
   1223       ]]></option>
   1224     </term>
   1225     <listitem>
   1226       <para>Run extensive sanity checks on Helgrind's internal
   1227         data structures at events defined by the bitstring, as
   1228         follows:</para>
   1229       <para><computeroutput>010000 </computeroutput>after changes to
   1230         the lock order acquisition graph</para>
   1231       <para><computeroutput>001000 </computeroutput>after every client
   1232         memory access (NB: not currently used)</para>
   1233       <para><computeroutput>000100 </computeroutput>after every client
   1234         memory range permission setting of 256 bytes or greater</para>
   1235       <para><computeroutput>000010 </computeroutput>after every client
   1236         lock or unlock event</para>
   1237       <para><computeroutput>000001 </computeroutput>after every client
   1238         thread creation or joinage event</para>
   1239       <para>Note these will make Helgrind run very slowly, often to
   1240         the point of being completely unusable.</para>
   1241     </listitem>
   1242   </varlistentry>
   1243 
   1244 </variablelist>
   1245 -->
   1246 <!-- end of xi:include in the manpage -->
   1247 
   1248 
   1249 </sect1>



<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
<title>Helgrind Client Requests</title>

<para>The following client requests are defined in
<filename>helgrind.h</filename>.  See that file for exact details of their
arguments.</para>

<itemizedlist>

  <listitem>
    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
    <para>This makes Helgrind forget everything it knows about a
    specified memory range.  This is particularly useful for memory
    allocators that wish to recycle memory.</para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_NEW_MEMORY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>These <function>ANNOTATE_*</function> requests describe to
    Helgrind the behaviour of custom (non-POSIX) synchronisation
    primitives, which it otherwise has no way to understand.  See the
    comments in <filename>helgrind.h</filename> for further
    documentation.</para>
  </listitem>

</itemizedlist>
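
<para>As a rough illustration of how some of these requests might be
used, the sketch below hands a buffer from one thread to another by
writing a token into a pipe, a mechanism Helgrind is assumed here not to
treat as synchronisation.  The names <varname>payload</varname>,
<varname>handoff</varname>, <function>producer</function> and
<function>run_handoff_demo</function> are hypothetical, and the fallback
no-op macros exist only so that the sketch still compiles where
<filename>valgrind/helgrind.h</filename> is not installed.
<filename>helgrind.h</filename> remains the authoritative reference for
the arguments.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Use the real client requests when the header is available.  The
   no-op fallbacks are an assumption made for this sketch only. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#    define HAVE_HELGRIND_H 1
#  endif
#endif
#ifndef HAVE_HELGRIND_H
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#  define ANNOTATE_HAPPENS_BEFORE(obj)         ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)          ((void)(obj))
#endif

static char payload[32];   /* hypothetical shared buffer */
static int  handoff[2];    /* pipe used as a custom, non-pthread signal */

static void *producer(void *arg)
{
    char token = 'x';
    (void)arg;
    strcpy(payload, "hello from producer");
    /* Everything this thread has done so far is declared to precede
       whatever follows the matching HAPPENS_AFTER on the same address. */
    ANNOTATE_HAPPENS_BEFORE(payload);
    if (write(handoff[1], &token, 1) != 1)
        return (void *)1;
    return NULL;
}

/* Returns the payload length seen by the consumer, or -1 on failure. */
int run_handoff_demo(void)
{
    pthread_t tid;
    char token;
    int len = -1;

    if (pipe(handoff) != 0)
        return -1;
    if (pthread_create(&tid, NULL, producer, NULL) != 0) {
        close(handoff[0]);
        close(handoff[1]);
        return -1;
    }
    if (read(handoff[0], &token, 1) == 1) {
        /* The token has arrived, so the producer's write is complete. */
        ANNOTATE_HAPPENS_AFTER(payload);
        len = (int)strlen(payload);
    }
    pthread_join(tid, NULL);
    close(handoff[0]);
    close(handoff[1]);

    /* Before recycling the buffer, ask Helgrind to forget its history. */
    VALGRIND_HG_CLEAN_MEMORY(payload, sizeof payload);
    memset(payload, 0, sizeof payload);
    return len;
}

#ifdef HG_DEMO_MAIN
#include <stdio.h>
int main(void)
{
    printf("%d\n", run_handoff_demo());
    return 0;
}
#endif
```
]]></programlisting>

<para>Built with something like <computeroutput>cc -pthread
-DHG_DEMO_MAIN demo.c</computeroutput> (the file name is arbitrary) and
run with <option>--tool=helgrind</option>, the two annotations are
intended to give Helgrind an ordering between the write and the read of
<varname>payload</varname> that it could not otherwise infer.  Outside
Valgrind the client requests do no useful work.</para>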

</sect1>



<sect1 id="hg-manual.todolist" xreflabel="To Do List">
<title>A To-Do List for Helgrind</title>

<para>The following is a list of loose ends which should be tidied up
some time.</para>

<itemizedlist>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for size-2 cycles as at
    present.</para>
  </listitem>
  <listitem><para>The conflicting access mechanism sometimes
    mysteriously fails to show the conflicting access's stack, even
    when provided with unbounded storage for conflicting access info.
    This should be investigated.</para>
  </listitem>
  <listitem><para>Document races caused by GCC's thread-unsafe code
    generation for speculative stores.  In the interim see
    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html</computeroutput>
    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
    </para>
  </listitem>
  <listitem><para>Don't update the lock-order graph, and don't check
    for errors, when a "try"-style lock operation happens (e.g.
    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
    restrictions to the locking order, since they can always fail to
    acquire the lock, resulting in the caller going off and doing Plan
    B (presumably it will have a Plan B).  Doing such checks could
    generate false lock-order errors and confuse users.</para>
  </listitem>
  <listitem><para>Performance can be very poor.  Slowdowns on the
    order of 100:1 are not unusual.  There is limited scope for
    performance improvements.
    </para>
  </listitem>

</itemizedlist>
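
<para>The reasoning behind the "try"-style lock item can be sketched as
follows.  One thread takes two mutexes in the order A then B with
blocking calls, while the other takes B and then only tries A, backing
off if the attempt fails.  Since the second thread never blocks while
holding B, the apparent B-then-A order cannot by itself produce a
deadlock.  All names here (<varname>lock_a</varname>,
<varname>lock_b</varname>, <function>ordered_worker</function>,
<function>trying_worker</function>,
<function>run_trylock_demo</function>) are hypothetical, and the sketch
makes no claim about what any particular Helgrind version reports for
it.</para>

<programlisting><![CDATA[
```c
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;   /* only modified while holding both locks */

/* Takes A then B with blocking locks: this establishes the order A -> B. */
static void *ordered_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    shared_counter++;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Takes B, then only tries A.  On failure it drops B and retries (its
   "Plan B"), so it never waits for A while still holding B. */
static void *trying_worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock_b);
        if (pthread_mutex_trylock(&lock_a) == 0) {
            shared_counter++;
            pthread_mutex_unlock(&lock_a);
            pthread_mutex_unlock(&lock_b);
            return NULL;
        }
        pthread_mutex_unlock(&lock_b);
        sched_yield();   /* give the other thread a chance to finish */
    }
}

/* Runs both workers once and returns the final counter value,
   or -1 if a thread could not be created. */
int run_trylock_demo(void)
{
    pthread_t t1, t2;

    shared_counter = 0;
    if (pthread_create(&t1, NULL, ordered_worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, trying_worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```
]]></programlisting>

<para>If <function>trying_worker</function> used a blocking
<function>pthread_mutex_lock</function> on A instead, the two threads
could each end up holding one lock while waiting for the other, which is
the genuine lock-order problem that the checker is meant to
find.</para>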

</sect1>

</chapter>