      1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
      2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
      3           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
      4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
      5 
      6 
      7 <chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
      8   <title>Helgrind: a thread error detector</title>
      9 
     10 <para>To use this tool, you must specify
     11 <option>--tool=helgrind</option> on the Valgrind
     12 command line.</para>
     13 
     14 
     15 <sect1 id="hg-manual.overview" xreflabel="Overview">
     16 <title>Overview</title>
     17 
     18 <para>Helgrind is a Valgrind tool for detecting synchronisation errors
     19 in C, C++ and Fortran programs that use the POSIX pthreads
     20 threading primitives.</para>
     21 
     22 <para>The main abstractions in POSIX pthreads are: a set of threads
     23 sharing a common address space, thread creation, thread joining,
     24 thread exit, mutexes (locks), condition variables (inter-thread event
     25 notifications), reader-writer locks, spinlocks, semaphores and
     26 barriers.</para>
     27 
     28 <para>Helgrind can detect three classes of errors, which are discussed
     29 in detail in the next three sections:</para>
     30 
     31 <orderedlist>
     32  <listitem>
     33   <para><link linkend="hg-manual.api-checks">
     34         Misuses of the POSIX pthreads API.</link></para>
     35  </listitem>
     36  <listitem>
     37   <para><link linkend="hg-manual.lock-orders">
     38         Potential deadlocks arising from lock
     39         ordering problems.</link></para>
     40  </listitem>
     41  <listitem>
     42   <para><link linkend="hg-manual.data-races">
     43         Data races -- accessing memory without adequate locking
     44                       or synchronisation</link>.
     45   </para>
     46  </listitem>
     47 </orderedlist>
     48 
     49 <para>Problems like these often result in unreproducible,
     50 timing-dependent crashes, deadlocks and other misbehaviour, and
     51 can be difficult to find by other means.</para>
     52 
     53 <para>Helgrind is aware of all the pthread abstractions and tracks
     54 their effects as accurately as it can.  On x86 and amd64 platforms, it
     55 understands and partially handles implicit locking arising from the
     56 use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
     57 platforms, it partially handles implicit locking arising from 
     58 load-linked and store-conditional instruction pairs.
     59 </para>
     60 
     61 <para>Helgrind works best when your application uses only the POSIX
     62 pthreads API.  However, if you want to use custom threading 
     63 primitives, you can describe their behaviour to Helgrind using the
     64 <varname>ANNOTATE_*</varname> macros defined
     65 in <varname>helgrind.h</varname>.</para>
     66 
     67 
     68 
     69 <para>Following those is a section containing 
     70 <link linkend="hg-manual.effective-use">
     71 hints and tips on how to get the best out of Helgrind.</link>
     72 </para>
     73 
     74 <para>Then there is a
     75 <link linkend="hg-manual.options">summary of command-line
     76 options.</link>
     77 </para>
     78 
     79 <para>Finally, there is 
     80 <link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
     81 could be improved.</link>
     82 </para>
     83 
     84 </sect1>
     85 
     86 
     87 
     88 
     89 <sect1 id="hg-manual.api-checks" xreflabel="API Checks">
     90 <title>Detected errors: Misuses of the POSIX pthreads API</title>
     91 
     92 <para>Helgrind intercepts calls to many POSIX pthreads functions, and
     93 is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
     95 program behaviour and hard-to-find bugs later on.  The detected errors
     96 are:</para>
     97 
     98 <itemizedlist>
     99  <listitem><para>unlocking an invalid mutex</para></listitem>
    100  <listitem><para>unlocking a not-locked mutex</para></listitem>
    101  <listitem><para>unlocking a mutex held by a different
    102                  thread</para></listitem>
    103  <listitem><para>destroying an invalid or a locked mutex</para></listitem>
    104  <listitem><para>recursively locking a non-recursive mutex</para></listitem>
    105  <listitem><para>deallocation of memory that contains a
    106                  locked mutex</para></listitem>
    107  <listitem><para>passing mutex arguments to functions expecting
    108                  reader-writer lock arguments, and vice
    109                  versa</para></listitem>
    110  <listitem><para>when a POSIX pthread function fails with an
    111                  error code that must be handled</para></listitem>
    112  <listitem><para>when a thread exits whilst still holding locked
    113                  locks</para></listitem>
    114  <listitem><para>calling <function>pthread_cond_wait</function>
    115                  with a not-locked mutex, an invalid mutex,
    116                  or one locked by a different
    117                  thread</para></listitem>
    118  <listitem><para>inconsistent bindings between condition
    119                  variables and their associated mutexes</para></listitem>
    120  <listitem><para>invalid or duplicate initialisation of a pthread
    121                  barrier</para></listitem>
    122  <listitem><para>initialisation of a pthread barrier on which threads
    123                  are still waiting</para></listitem>
    124  <listitem><para>destruction of a pthread barrier object which was
    125                  never initialised, or on which threads are still
    126                  waiting</para></listitem>
    127  <listitem><para>waiting on an uninitialised pthread
    128                  barrier</para></listitem>
    129  <listitem><para>for all of the pthreads functions that Helgrind
    130                  intercepts, an error is reported, along with a stack
    131                  trace, if the system threading library routine returns
    132                  an error code, even if Helgrind itself detected no
    133                  error</para></listitem>
    134 </itemizedlist>
    135 
    136 <para>Checks pertaining to the validity of mutexes are generally also
    137 performed for reader-writer locks.</para>
    138 
    139 <para>Various kinds of this-can't-possibly-happen events are also
    140 reported.  These usually indicate bugs in the system threading
    141 library.</para>
    142 
    143 <para>Reported errors always contain a primary stack trace indicating
    144 where the error was detected.  They may also contain auxiliary stack
    145 traces giving additional information.  In particular, most errors
    146 relating to mutexes will also tell you where that mutex first came to
    147 Helgrind's attention (the "<computeroutput>was first observed
    148 at</computeroutput>" part), so you have a chance of figuring out which
    149 mutex it is referring to.  For example:</para>
    150 
    151 <programlisting><![CDATA[
    152 Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
    153    at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
    154    by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
    155    by 0x40079B: main (tc09_bad_unlock.c:50)
    156   Lock at 0x7FEFFFA90 was first observed
    157    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
    158    by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
    159    by 0x40079B: main (tc09_bad_unlock.c:50)
    160 ]]></programlisting>
    161 
    162 <para>Helgrind has a way of summarising thread identities, as
    163 you see here with the text "<computeroutput>Thread
    164 #1</computeroutput>".  This is so that it can speak about threads and
    165 sets of threads without overwhelming you with details.  See 
    166 <link linkend="hg-manual.data-races.errmsgs">below</link>
    167 for more information on interpreting error messages.</para>
    168 
    169 </sect1>
    170 
    171 
    172 
    173 
    174 <sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
    175 <title>Detected errors: Inconsistent Lock Orderings</title>
    176 
    177 <para>In this section, and in general, to "acquire" a lock simply
    178 means to lock that lock, and to "release" a lock means to unlock
    179 it.</para>
    180 
    181 <para>Helgrind monitors the order in which threads acquire locks.
    182 This allows it to detect potential deadlocks which could arise from
    183 the formation of cycles of locks.  Detecting such inconsistencies is
    184 useful because, whilst actual deadlocks are fairly obvious, potential
    185 deadlocks may never be discovered during testing and could later lead
    186 to hard-to-diagnose in-service failures.</para>
    187 
    188 <para>The simplest example of such a problem is as
    189 follows.</para>
    190 
    191 <itemizedlist>
    192  <listitem><para>Imagine some shared resource R, which, for whatever
    193   reason, is guarded by two locks, L1 and L2, which must both be held
    194   when R is accessed.</para>
    195  </listitem>
    196  <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
    197   to access R.  The implication of this is that all threads in the
    198   program must acquire the two locks in the order first L1 then L2.
    199   Not doing so risks deadlock.</para>
    200  </listitem>
    201  <listitem><para>The deadlock could happen if two threads -- call them
    202   T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
    203   and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
    204   to acquire L1, but those locks are both already held.  So T1 and T2
    205   become deadlocked.</para>
    206  </listitem>
    207 </itemizedlist>
    208 
    209 <para>Helgrind builds a directed graph indicating the order in which
    210 locks have been acquired in the past.  When a thread acquires a new
    211 lock, the graph is updated, and then checked to see if it now contains
    212 a cycle.  The presence of a cycle indicates a potential deadlock involving
    213 the locks in the cycle.</para>
    214 
    215 <para>In general, Helgrind will choose two locks involved in the cycle
    216 and show you how their acquisition ordering has become inconsistent.
    217 It does this by showing the program points that first defined the
    218 ordering, and the program points which later violated it.  Here is a
    219 simple example involving just two locks:</para>
    220 
    221 <programlisting><![CDATA[
    222 Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
    223 
    224 Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
    225    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    226    by 0x400825: main (tc13_laog1.c:23)
    227 
    228  followed by a later acquisition of lock at 0x7FF0006D0
    229    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    230    by 0x400853: main (tc13_laog1.c:24)
    231 
    232 Required order was established by acquisition of lock at 0x7FF0006D0
    233    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    234    by 0x40076D: main (tc13_laog1.c:17)
    235 
    236  followed by a later acquisition of lock at 0x7FF0006A0
    237    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    238    by 0x40079B: main (tc13_laog1.c:18)
    239 ]]></programlisting>
    240 
<para>When there are more than two locks in the cycle, the error is
equally serious.  However, at present Helgrind does not show the locks
involved, sometimes because that information is not available, but
also so as to avoid flooding you with information.  For example, here
is the report for a cycle of five locks from a naive
implementation of the famous Dining Philosophers problem
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>
    251 
    252 <programlisting><![CDATA[
    253 Thread #6: lock order "0x6010C0 before 0x601160" violated
    254 
    255 Observed (incorrect) order is: acquisition of lock at 0x601160
    256    (stack unavailable)
    257 
    258  followed by a later acquisition of lock at 0x6010C0
    259    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    260    by 0x4007DE: dine (tc14_laog_dinphils.c:19)
    261    by 0x4C2CBE7: mythread_wrapper (hg_intercepts.c:219)
    262    by 0x4E369C9: start_thread (pthread_create.c:300)
    263 ]]></programlisting>
    264 
    265 </sect1>
    266 
    267 
    268 
    269 
    270 <sect1 id="hg-manual.data-races" xreflabel="Data Races">
    271 <title>Detected errors: Data Races</title>
    272 
    273 <para>A data race happens, or could happen, when two threads access a
    274 shared memory location without using suitable locks or other
    275 synchronisation to ensure single-threaded access.  Such missing
locking can cause obscure timing-dependent bugs.  Ensuring programs
    277 are race-free is one of the central difficulties of threaded
    278 programming.</para>
    279 
    280 <para>Reliably detecting races is a difficult problem, and most
    281 of Helgrind's internals are devoted to dealing with it.  
    282 We begin with a simple example.</para>
    283 
    284 
    285 <sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
    286 <title>A Simple Data Race</title>
    287 
<para>The following is about the simplest possible example of a race.
In this program, it is impossible to know what the value
of <computeroutput>var</computeroutput> is at the end of the program.
Is it 2?  Or 1?</para>
    292 
    293 <programlisting><![CDATA[
    294 #include <pthread.h>
    295 
    296 int var = 0;
    297 
    298 void* child_fn ( void* arg ) {
    299    var++; /* Unprotected relative to parent */ /* this is line 6 */
    300    return NULL;
    301 }
    302 
    303 int main ( void ) {
    304    pthread_t child;
    305    pthread_create(&child, NULL, child_fn, NULL);
    306    var++; /* Unprotected relative to child */ /* this is line 13 */
    307    pthread_join(child, NULL);
    308    return 0;
    309 }
    310 ]]></programlisting>
    311 
<para>The problem is that there is nothing to
stop <varname>var</varname> being updated simultaneously
by both threads.  A correct program would
protect <varname>var</varname> with a lock of type
<type>pthread_mutex_t</type>, which is acquired
before each access and released afterwards.  Helgrind's output for
this program is:</para>
    319 
    320 <programlisting><![CDATA[
    321 Thread #1 is the program's root thread
    322 
    323 Thread #2 was created
    324    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    325    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    326    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    327    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    328    by 0x400605: main (simple_race.c:12)
    329 
    330 Possible data race during read of size 4 at 0x601038 by thread #1
    331 Locks held: none
    332    at 0x400606: main (simple_race.c:13)
    333 
    334 This conflicts with a previous write of size 4 by thread #2
    335 Locks held: none
    336    at 0x4005DC: child_fn (simple_race.c:6)
    337    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    338    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    339    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    340 
    341 Location 0x601038 is 0 bytes inside global var "var"
    342 declared at simple_race.c:3
    343 ]]></programlisting>
    344 
<para>This is quite a lot of detail for an apparently simple error.
The main error message is the clause beginning "<computeroutput>Possible
data race</computeroutput>".  It says there is a race as
a result of a read of size 4 (bytes), at 0x601038, which is the
address of <computeroutput>var</computeroutput>, happening in
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>
    351 
    352 <para>Two important parts of the message are:</para>
    353 
    354 <itemizedlist>
    355  <listitem>
    356   <para>Helgrind shows two stack traces for the error, not one.  By
    357    definition, a race involves two different threads accessing the
    358    same location in such a way that the result depends on the relative
    359    speeds of the two threads.</para>
    360   <para>
    361    The first stack trace follows the text "<computeroutput>Possible
    362    data race during read of size 4 ...</computeroutput>" and the
    363    second trace follows the text "<computeroutput>This conflicts with
    364    a previous write of size 4 ...</computeroutput>".  Helgrind is
    365    usually able to show both accesses involved in a race.  At least
    366    one of these will be a write (since two concurrent, unsynchronised
    367    reads are harmless), and they will of course be from different
    368    threads.</para>
    369   <para>By examining your program at the two locations, you should be
    370    able to get at least some idea of what the root cause of the
    371    problem is.  For each location, Helgrind shows the set of locks
    372    held at the time of the access.  This often makes it clear which
    373    thread, if any, failed to take a required lock.  In this example
    374    neither thread holds a lock during the access.</para>
    375  </listitem>
    376  <listitem>
    377   <para>For races which occur on global or stack variables, Helgrind
    378    tries to identify the name and defining point of the variable.
    379    Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
    380    global var "var" declared at simple_race.c:3</computeroutput>".</para>
    381   <para>Showing names of stack and global variables carries no
    382    run-time overhead once Helgrind has your program up and running.
    383    However, it does require Helgrind to spend considerable extra time
    384    and memory at program startup to read the relevant debug info.
    385    Hence this facility is disabled by default.  To enable it, you need
    386    to give the <varname>--read-var-info=yes</varname> option to
    387    Helgrind.</para>
    388  </listitem>
    389 </itemizedlist>
    390 
    391 <para>The following section explains Helgrind's race detection
    392 algorithm in more detail.</para>
    393 
    394 </sect2>
    395 
    396 
    397 
    398 <sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
    399 <title>Helgrind's Race Detection Algorithm</title>
    400 
    401 <para>Most programmers think about threaded programming in terms of
    402 the basic functionality provided by the threading library (POSIX
    403 Pthreads): thread creation, thread joining, locks, condition
    404 variables, semaphores and barriers.</para>
    405 
    406 <para>The effect of using these functions is to impose 
    407 constraints upon the order in which memory accesses can
    408 happen.  This implied ordering is generally known as the
    409 "happens-before relation".  Once you understand the happens-before
    410 relation, it is easy to see how Helgrind finds races in your code.
    411 Fortunately, the happens-before relation is itself easy to understand,
    412 and is by itself a useful tool for reasoning about the behaviour of
    413 parallel programs.  We now introduce it using a simple example.</para>
    414 
    415 <para>Consider first the following buggy program:</para>
    416 
    417 <programlisting><![CDATA[
    418 Parent thread:                         Child thread:
    419 
    420 int var;
    421 
    422 // create child thread
    423 pthread_create(...)                          
    424 var = 20;                              var = 10;
    425                                        exit
    426 
    427 // wait for child
    428 pthread_join(...)
    429 printf("%d\n", var);
    430 ]]></programlisting>
    431 
    432 <para>The parent thread creates a child.  Both then write different
    433 values to some variable <computeroutput>var</computeroutput>, and the
    434 parent then waits for the child to exit.</para>
    435 
    436 <para>What is the value of <computeroutput>var</computeroutput> at the
    437 end of the program, 10 or 20?  We don't know.  The program is
    438 considered buggy (it has a race) because the final value
    439 of <computeroutput>var</computeroutput> depends on the relative rates
    440 of progress of the parent and child threads.  If the parent is fast
    441 and the child is slow, then the child's assignment may happen later,
    442 so the final value will be 10; and vice versa if the child is faster
    443 than the parent.</para>
    444 
<para>The relative rates of progress of parent and child are not something
the programmer can control, and will often change from run to run.
They depend on factors such as the load on the machine, what else is
running, and the kernel's scheduling strategy.</para>
    449 
    450 <para>The obvious fix is to use a lock to
    451 protect <computeroutput>var</computeroutput>.  It is however
    452 instructive to consider a somewhat more abstract solution, which is to
    453 send a message from one thread to the other:</para>
    454 
    455 <programlisting><![CDATA[
    456 Parent thread:                         Child thread:
    457 
    458 int var;
    459 
    460 // create child thread
    461 pthread_create(...)                          
    462 var = 20;
    463 // send message to child
    464                                        // wait for message to arrive
    465                                        var = 10;
    466                                        exit
    467 
    468 // wait for child
    469 pthread_join(...)
    470 printf("%d\n", var);
    471 ]]></programlisting>
    472 
    473 <para>Now the program reliably prints "10", regardless of the speed of
    474 the threads.  Why?  Because the child's assignment cannot happen until
    475 after it receives the message.  And the message is not sent until
    476 after the parent's assignment is done.</para>
    477 
    478 <para>The message transmission creates a "happens-before" dependency
    479 between the two assignments: <computeroutput>var = 20;</computeroutput>
    480 must now happen-before <computeroutput>var = 10;</computeroutput>.
    481 And so there is no longer a race
    482 on <computeroutput>var</computeroutput>.
    483 </para>
    484 
    485 <para>Note that it's not significant that the parent sends a message
    486 to the child.  Sending a message from the child (after its assignment)
    487 to the parent (before its assignment) would also fix the problem, causing
    488 the program to reliably print "20".</para>
    489 
<para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
accesses to memory locations.  If a location (in this example,
<computeroutput>var</computeroutput>)
is accessed by two different threads, Helgrind checks to see if the
two accesses are ordered by the happens-before relation.  If so,
that's fine; if not, it reports a race.</para>
    496 
<para>It is important to understand that the happens-before relation
creates only a partial ordering, not a total ordering.  An example of
a total ordering is comparison of numbers: for any two numbers
<computeroutput>x</computeroutput> and
<computeroutput>y</computeroutput>,
<computeroutput>x</computeroutput> is either less than, equal to, or
greater than
<computeroutput>y</computeroutput>.  A partial ordering is like a
total ordering, but it can also express the concept that two elements
are neither equal, nor one less than the other, but merely unordered
with respect to each other.</para>
    508 
    509 <para>In the fixed example above, we say that 
    510 <computeroutput>var = 20;</computeroutput> "happens-before"
    511 <computeroutput>var = 10;</computeroutput>.  But in the original
    512 version, they are unordered: we cannot say that either happens-before
    513 the other.</para>
    514 
    515 <para>What does it mean to say that two accesses from different
    516 threads are ordered by the happens-before relation?  It means that
    517 there is some chain of inter-thread synchronisation operations which
    518 cause those accesses to happen in a particular order, irrespective of
    519 the actual rates of progress of the individual threads.  This is a
    520 required property for a reliable threaded program, which is why
    521 Helgrind checks for it.</para>
    522 
    523 <para>The happens-before relations created by standard threading
    524 primitives are as follows:</para>
    525 
    526 <itemizedlist>
    527  <listitem><para>When a mutex is unlocked by thread T1 and later (or
    528   immediately) locked by thread T2, then the memory accesses in T1
    529   prior to the unlock must happen-before those in T2 after it acquires
    530   the lock.</para>
    531  </listitem>
    532  <listitem><para>The same idea applies to reader-writer locks,
    533   although with some complication so as to allow correct handling of
    534   reads vs writes.</para>
    535  </listitem>
    536  <listitem><para>When a condition variable (CV) is signalled on by
    537   thread T1 and some other thread T2 is thereby released from a wait
    538   on the same CV, then the memory accesses in T1 prior to the
    539   signalling must happen-before those in T2 after it returns from the
    540   wait.  If no thread was waiting on the CV then there is no
    541   effect.</para>
    542  </listitem>
    543  <listitem><para>If instead T1 broadcasts on a CV, then all of the
    544   waiting threads, rather than just one of them, acquire a
    545   happens-before dependency on the broadcasting thread at the point it
    546   did the broadcast.</para>
    547  </listitem>
 <listitem><para>A thread T2 that continues after completing
  <function>sem_wait</function> on a semaphore that thread T1 posts on
  acquires a happens-before dependence on the posting thread, a bit
  like the dependencies caused by mutex unlock-lock pairs.  However,
  since a semaphore can be posted on many times, it is unspecified from
  which of the post calls the wait call gets its happens-before
  dependency.</para>
    554  </listitem>
    555  <listitem><para>For a group of threads T1 .. Tn which arrive at a
    556   barrier and then move on, each thread after the call has a
    557   happens-after dependency from all threads before the
    558   barrier.</para>
    559  </listitem>
    560  <listitem><para>A newly-created child thread acquires an initial
    561   happens-after dependency on the point where its parent created it.
    562   That is, all memory accesses performed by the parent prior to
    563   creating the child are regarded as happening-before all the accesses
    564   of the child.</para>
    565  </listitem>
    566  <listitem><para>Similarly, when an exiting thread is reaped via a
    567   call to <function>pthread_join</function>, once the call returns, the
    568   reaping thread acquires a happens-after dependency relative to all memory
    569   accesses made by the exiting thread.</para>
    570  </listitem>
    571 </itemizedlist>
    572 
<para>In summary: Helgrind intercepts the events listed above, and builds a
directed acyclic graph representing the collective happens-before
dependencies.  It also monitors all memory accesses.</para>
    576 
    577 <para>If a location is accessed by two different threads, but Helgrind
    578 cannot find any path through the happens-before graph from one access
    579 to the other, then it reports a race.</para>
    580 
    581 <para>There are a couple of caveats:</para>
    582 
    583 <itemizedlist>
    584  <listitem><para>Helgrind doesn't check for a race in the case where
    585   both accesses are reads.  That would be silly, since concurrent
    586   reads are harmless.</para>
    587  </listitem>
 <listitem><para>Two accesses are considered to be ordered by the
  happens-before dependency even through arbitrarily long chains of
  synchronisation events.  For example, if T1 accesses some location
  L, and then signals T2 using <function>pthread_cond_signal</function>,
  and T2 later signals T3 in the same way, and T3 then accesses L, then
  a suitable happens-before dependency exists between the first and second
  accesses, even though it involves two different inter-thread
  synchronisation events.</para>
    596  </listitem>
    597 </itemizedlist>
    598 
    599 </sect2>
    600 
    601 
    602 
    603 <sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
    604 <title>Interpreting Race Error Messages</title>
    605 
    606 <para>Helgrind's race detection algorithm collects a lot of
    607 information, and tries to present it in a helpful way when a race is
    608 detected.  Here's an example:</para>
    609 
    610 <programlisting><![CDATA[
    611 Thread #2 was created
    612    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    613    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    614    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    615    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    616    by 0x4008F2: main (tc21_pthonce.c:86)
    617 
    618 Thread #3 was created
    619    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    620    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    621    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    622    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    623    by 0x4008F2: main (tc21_pthonce.c:86)
    624 
    625 Possible data race during read of size 4 at 0x601070 by thread #3
    626 Locks held: none
    627    at 0x40087A: child (tc21_pthonce.c:74)
    628    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    629    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    630    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    631 
    632 This conflicts with a previous write of size 4 by thread #2
    633 Locks held: none
    634    at 0x400883: child (tc21_pthonce.c:74)
    635    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    636    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    637    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    638 
    639 Location 0x601070 is 0 bytes inside local var "unprotected2"
    640 declared at tc21_pthonce.c:51, in frame #0 of thread 3
    641 ]]></programlisting>
    642 
    643 <para>Helgrind first announces the creation points of any threads
    644 referenced in the error message.  This is so it can speak concisely
    645 about threads without repeatedly printing their creation point call
    646 stacks.  Each thread is only ever announced once, the first time it
    647 appears in any Helgrind error message.</para>
    648 
    649 <para>The main error message begins at the text
    650 "<computeroutput>Possible data race during read</computeroutput>".  At
    651 the start is information you would expect to see -- address and size
    652 of the racing access, whether a read or a write, and the call stack at
    653 the point it was detected.</para>
    654 
    655 <para>A second call stack is presented starting at the text
    656 "<computeroutput>This conflicts with a previous
    657 write</computeroutput>".  This shows a previous access which also
    658 accessed the stated address, and which is believed to be racing
    659 against the access in the first call stack.</para>
    660 
    661 <para>Finally, Helgrind may attempt to give a description of the
    662 raced-on address in source level terms.  In this example, it
    663 identifies it as a local variable, shows its name, declaration point,
    664 and in which frame (of the first call stack) it lives.  Note that this
    665 information is only shown when <varname>--read-var-info=yes</varname>
    666 is specified on the command line.  That's because reading the DWARF3
    667 debug information in enough detail to capture variable type and
    668 location information makes Helgrind much slower at startup, and also
    669 requires considerable amounts of memory, for large programs.
    670 </para>
    671 
    672 <para>Once you have your two call stacks, how do you find the root
    673 cause of the race?</para>
    674 
    675 <para>The first thing to do is examine the source locations referred
    676 to by each call stack.  They should both show an access to the same
    677 location, or variable.</para>
    678 
    679 <para>Now figure out how that location should have been made
    680 thread-safe:</para>
    681 
    682 <itemizedlist>
    683  <listitem><para>Perhaps the location was intended to be protected by
    684   a mutex?  If so, you need to lock and unlock the mutex at both
    685   access points, even if one of the accesses is reported to be a read.
    686   Did you perhaps forget the locking at one or other of the accesses?
    687   To help you check this, Helgrind shows the set of locks held by each
    688   thread at the time it accessed the raced-on location.</para>
    689  </listitem>
    690  <listitem><para>Alternatively, perhaps you intended to use some
    691   other scheme to make it safe, such as signalling on a condition
    692   variable.  In all such cases, try to find a synchronisation event
    693   (or a chain thereof) which separates the earlier-observed access (as
    694   shown in the second call stack) from the later-observed access (as
    695   shown in the first call stack).  In other words, try to find
    696   evidence that the earlier access "happens-before" the later access.
    697   See the previous subsection for an explanation of the happens-before
    698   relation.</para>
    699   <para>
    700   The fact that Helgrind is reporting a race means it did not observe
    701   any happens-before relation between the two accesses.  If
    702   Helgrind is working correctly, it should also be the case that you
    703   cannot find any such relation, even on detailed inspection
    704   of the source code.  Hopefully, though, your inspection of the code
    705   will show where the missing synchronisation operation(s) should have
    706   been.</para>
    707  </listitem>
    708 </itemizedlist>
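
<para>As a minimal sketch of the first case, the following C fragment
takes the same mutex at both access points, including the one that only
reads.  It is not taken from <filename>tc21_pthonce.c</filename>; the
names <varname>counter</varname>, <varname>counter_mx</varname> and
<function>run_locked_counter</function> are invented for
illustration.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: a counter guarded by counter_mx. */
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Writer: takes the lock around the update. */
static void *writer(void *arg)
{
   (void)arg;
   pthread_mutex_lock(&counter_mx);
   counter++;
   pthread_mutex_unlock(&counter_mx);
   return NULL;
}

/* Reader: takes the same lock, even though it only reads. */
static int read_counter(void)
{
   pthread_mutex_lock(&counter_mx);
   int seen = counter;
   pthread_mutex_unlock(&counter_mx);
   return seen;
}

/* Starts two writers, reads while they may still be running,
   joins them, and returns the final value. */
int run_locked_counter(void)
{
   pthread_t t[2];
   for (int i = 0; i < 2; i++) {
      int rc = pthread_create(&t[i], NULL, writer, NULL);
      assert(rc == 0);
      (void)rc;
   }
   int early = read_counter();   /* concurrent with the writers */
   assert(early >= 0 && early <= 2);
   (void)early;
   for (int i = 0; i < 2; i++)
      pthread_join(t[i], NULL);
   return read_counter();
}
```
]]></programlisting>

<para>If the locking in <function>read_counter</function> were dropped,
the read would no longer be ordered against the writes by the mutex,
which is the situation the first list item above asks you to look
for.</para>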
    709 
    710 </sect2>
    711 
    712 
    713 </sect1>
    714 
    715 <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
    716 <title>Hints and Tips for Effective Use of Helgrind</title>
    717 
    718 <para>Helgrind can be very helpful in finding and resolving
    719 threading-related problems.  Like all sophisticated tools, it is most
    720 effective when you understand how to play to its strengths.</para>
    721 
    722 <para>Helgrind will be less effective when you merely throw an
    723 existing threaded program at it and try to make sense of any reported
    724 errors.  It will be more effective if you design threaded programs
    725 from the start in a way that helps Helgrind verify correctness.  The
    726 same is true for finding memory errors with Memcheck, but applies more
    727 here, because thread checking is a harder problem.  Consequently it is
    728 much easier to write a correct program for which Helgrind falsely
    729 reports (threading) errors than it is to write a correct program for
    730 which Memcheck falsely reports (memory) errors.</para>
    731 
    732 <para>With that in mind, here are some tips, listed most important first,
    733 for getting reliable results and avoiding false errors.  The first two
    734 are critical.  Any violations of them will swamp you with huge numbers
    735 of false data-race errors.</para>
    736 
    737 
    738 <orderedlist>
    739 
    740   <listitem>
    741     <para>Make sure your application, and all the libraries it uses,
    742     use the POSIX threading primitives.  Helgrind needs to be able to
    743     see all events pertaining to thread creation, exit, locking and
    744     other synchronisation events.  To do so it intercepts many POSIX
    745     pthreads functions.</para>
    746 
    747     <para>Do not roll your own threading primitives (mutexes, etc)
    748     from combinations of the Linux futex syscall, atomic counters, etc.
    749     These throw Helgrind's internal what's-going-on models
    750     way off course and will give bogus results.</para>
    751 
    752     <para>Also, do not reimplement existing POSIX abstractions using
    753     other POSIX abstractions.  For example, don't build your own
    754     semaphore routines or reader-writer locks from POSIX mutexes and
    755     condition variables.  Instead use POSIX reader-writer locks and
    756     semaphores directly, since Helgrind supports them directly.</para>
    757 
    758     <para>Helgrind directly supports the following POSIX threading
    759     abstractions: mutexes, reader-writer locks, condition variables
    760     (but see below), semaphores and barriers.  Currently spinlocks
    761     are not supported, although they could be in future.</para>
    762 
    763     <para>At the time of writing, the following popular Linux packages
    764     are known to implement their own threading primitives:</para>
    765 
    766     <itemizedlist>
    767      <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
    768       only uses POSIX pthreads primitives.  Unfortunately Qt 4.X 
    769       has its own implementation of mutexes (QMutex) and thread reaping.
    770       Helgrind 3.4.x contains direct support
    771       for Qt 4.X threading, which is experimental but is believed to
    772       work fairly well.  A side effect of supporting Qt 4 directly is
    773       that Helgrind can be used to debug KDE4 applications.  As this
    774       is an experimental feature, we would particularly appreciate
    775       feedback from folks who have used Helgrind to successfully debug
    776       Qt 4 and/or KDE4 applications.</para>
    777      </listitem>
    778      <listitem><para>Runtime support library for GNU OpenMP (part of
    779       GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
    780       library (<filename>libgomp.so</filename>) constructs its own
    781       synchronisation primitives using combinations of atomic memory
    782       instructions and the futex syscall, which causes total chaos in
    783       Helgrind since it cannot "see" those.</para>
    784      <para>Fortunately, this can be solved using a configuration-time
    785       option (for GCC).  Rebuild GCC from source, and configure using
    786       <varname>--disable-linux-futex</varname>.
    787       This makes libgomp.so use the standard
    788       POSIX threading primitives instead.  Note that this was tested
    789       using GCC 4.2.3 and has not been re-tested using more recent GCC
    790       versions.  We would appreciate hearing about any successes or
    791       failures with more recent versions.</para>
    792      </listitem>
    793     </itemizedlist>
    794 
    795     <para>If you must implement your own threading primitives, there
    796       are a set of client request macros
    797       in <computeroutput>helgrind.h</computeroutput> to help you
    798       describe your primitives to Helgrind.  You should be able to
    799       mark up mutexes, condition variables, etc, without difficulty.
    800     </para>
    801     <para>
    802       It is also possible to mark up the effects of thread-safe
    803       reference counting using the
    804       <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
    805       <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
    806       <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
    807       macros.  Thread-safe reference counting using an atomically
    808       incremented/decremented refcount variable causes Helgrind
    809       problems because a one-to-zero transition of the reference count
    810       means the accessing thread has exclusive ownership of the
    811       associated resource (normally, a C++ object) and can therefore
    812       access it (normally, to run its destructor) without locking.
    813       Helgrind doesn't understand this, and markup is essential to
    814       avoid false positives.
    815     </para>
    816 
    817     <para>
    818       Here are recommended guidelines for marking up thread safe
    819       reference counting in C++.  You only need to mark up your
    820       release methods -- the ones which decrement the reference count.
    821       Given a class like this:
    822     </para>
    823 
    824 <programlisting><![CDATA[
    825 class MyClass {
    826    unsigned int mRefCount;
    827 
    828    void Release ( void ) {
    829       unsigned int newCount = atomic_decrement(&mRefCount);
    830       if (newCount == 0) {
    831          delete this;
    832       }
    833    }
    834 };
    835 ]]></programlisting>
    836 
    837    <para>
    838      the release method should be marked up as follows:
    839    </para>
    840 
    841 <programlisting><![CDATA[
    842    void Release ( void ) {
    843       unsigned int newCount = atomic_decrement(&mRefCount);
    844       if (newCount == 0) {
    845          ANNOTATE_HAPPENS_AFTER(&mRefCount);
    846          ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
    847          delete this;
    848       } else {
    849          ANNOTATE_HAPPENS_BEFORE(&mRefCount);
    850       }
    851    }
    852 ]]></programlisting>
    853 
    854     <para>
    855       There are a number of complex, mostly-theoretical objections to
    856       this scheme.  From a theoretical standpoint it appears to be
    857       impossible to devise a markup scheme which is completely correct
    858       in the sense of guaranteeing to remove all false races.  The
    859       proposed scheme however works well in practice.
    860     </para>
    861 
    862   </listitem>
    863 
    864   <listitem>
    865     <para>Avoid memory recycling.  If you can't avoid it, you must
    866     tell Helgrind what is going on via the
    867     <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
    868     <computeroutput>helgrind.h</computeroutput>).</para>
    869 
    870     <para>Helgrind is aware of standard heap memory allocation and
    871     deallocation that occurs via
    872     <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
    873     and from entry and exit of stack frames.  In particular, when memory is
    874     deallocated via <function>free</function>, <function>delete</function>,
    875     or function exit, Helgrind considers that memory clean, so when it is
    876     eventually reallocated, its history is irrelevant.</para>
    877 
    878     <para>However, it is common practice to implement memory recycling
    879     schemes.  In these, memory to be freed is not handed to
    880     <function>free</function>/<function>delete</function>, but instead put
    881     into a pool of free buffers to be handed out again as required.  The
    882     problem is that Helgrind has no
    883     way to know that such memory is logically no longer in use, and
    884     its history is irrelevant.  Hence you must make that explicit,
    885     using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
    886     to specify the relevant address ranges.  It's easiest to put these
    887     requests into the pool manager code, and use them either when memory is
    888     returned to the pool, or is allocated from it.</para>
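
    <para>The following is a rough sketch of where such a request could
    sit in a pool manager, here at the point where a buffer is handed
    out again.  Everything except
    <function>VALGRIND_HG_CLEAN_MEMORY</function> is an assumption made
    for illustration: the pool itself, its function names, the
    <computeroutput>USE_HELGRIND_H</computeroutput> build switch, and
    the <filename>valgrind/helgrind.h</filename> install path.  When the
    switch is not defined, the request is replaced by a no-op so that
    the fragment compiles on its own.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* HYPOTHETICAL build switch: define USE_HELGRIND_H when Valgrind's
   headers are installed.  Otherwise the request becomes a no-op. */
#ifdef USE_HELGRIND_H
#include <valgrind/helgrind.h>
#else
#define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BUF_SIZE  256
#define POOL_BUFS 4

/* Hypothetical fixed-size buffer pool; all names are illustrative. */
typedef union PoolBuf {
   union PoolBuf *next;             /* link while on the free list */
   char           bytes[BUF_SIZE];  /* payload while in use        */
} PoolBuf;

static PoolBuf         pool_storage[POOL_BUFS];
static PoolBuf        *pool_free = NULL;
static pthread_mutex_t pool_mx   = PTHREAD_MUTEX_INITIALIZER;

/* Puts every buffer on the free list. */
void pool_init(void)
{
   pthread_mutex_lock(&pool_mx);
   pool_free = NULL;
   for (int i = 0; i < POOL_BUFS; i++) {
      pool_storage[i].next = pool_free;
      pool_free = &pool_storage[i];
   }
   pthread_mutex_unlock(&pool_mx);
}

/* Hands out a buffer, or NULL if the pool is empty. */
void *pool_get(void)
{
   pthread_mutex_lock(&pool_mx);
   PoolBuf *b = pool_free;
   if (b != NULL)
      pool_free = b->next;
   pthread_mutex_unlock(&pool_mx);
   if (b != NULL) {
      /* The buffer starts a new life: ask Helgrind to discard
         whatever it recorded about the previous user. */
      VALGRIND_HG_CLEAN_MEMORY(b, sizeof *b);
   }
   return b;
}

/* Returns a buffer to the pool instead of calling free(). */
void pool_put(void *p)
{
   PoolBuf *b = p;
   pthread_mutex_lock(&pool_mx);
   b->next = pool_free;
   pool_free = b;
   pthread_mutex_unlock(&pool_mx);
}
```
]]></programlisting>

    <para>Placing the request in <function>pool_get</function> rather
    than <function>pool_put</function> is only one of the two
    placements mentioned above; either keeps the markup out of the
    code that uses the buffers.</para>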
    889   </listitem>
    890 
    891   <listitem>
    892     <para>Avoid POSIX condition variables.  If you can, use POSIX
    893     semaphores (<function>sem_t</function>, <function>sem_post</function>,
    894     <function>sem_wait</function>) to do inter-thread event signalling.
    895     Semaphores with an initial value of zero are particularly useful for
    896     this.</para>
    897 
    898     <para>Helgrind handles POSIX condition variables only partially
    899     correctly.  This is because Helgrind can see inter-thread
    900     dependencies between a <function>pthread_cond_wait</function> call and a
    901     <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
    902     call only if the waiting thread actually gets to the rendezvous first
    903     (so that it actually calls
    904     <function>pthread_cond_wait</function>).  It can't see dependencies
    905     between the threads if the signaller arrives first.  In the latter case,
    906     POSIX guidelines imply that the associated boolean condition still
    907     provides an inter-thread synchronisation event, but one which is
    908     invisible to Helgrind.</para>
    909 
    910     <para>The result of Helgrind missing some inter-thread
    911     synchronisation events is to cause it to report false positives.
    912     </para>
    913 
    914     <para>The root cause of this synchronisation lossage is
    915     particularly hard to understand, so an example is helpful.  It was
    916     discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
    917     in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
    918     canonical POSIX-recommended usage scheme for condition variables
    919     is as follows:</para>
    920 
    921 <programlisting><![CDATA[
    922 b   is a Boolean condition, which is False most of the time
    923 cv  is a condition variable
    924 mx  is its associated mutex
    925 
    926 Signaller:                             Waiter:
    927 
    928 lock(mx)                               lock(mx)
    929 b = True                               while (b == False)
    930 signal(cv)                                wait(cv,mx)
    931 unlock(mx)                             unlock(mx)
    932 ]]></programlisting>
    933 
    934     <para>Assume <computeroutput>b</computeroutput> is False most of
    935     the time.  If the waiter arrives at the rendezvous first, it
    936     enters its while-loop, waits for the signaller to signal, and
    937     eventually proceeds.  Helgrind sees the signal, notes the
    938     dependency, and all is well.</para>
    939 
    940     <para>If the signaller arrives
    941     first, <computeroutput>b</computeroutput> is set to true, and the
    942     signal disappears into nowhere.  When the waiter later arrives, it
    943     does not enter its while-loop and simply carries on.  But even in
    944     this case, the waiter code following the while-loop cannot execute
    945     until the signaller sets <computeroutput>b</computeroutput> to
    946     True.  Hence there is still the same inter-thread dependency, but
    947     this time it is through an arbitrary in-memory condition, and
    948     Helgrind cannot see it.</para>
    949 
    950     <para>By comparison, Helgrind's detection of inter-thread
    951     dependencies caused by semaphore operations is believed to be
    952     exactly correct.</para>
    953 
    954     <para>As far as I know, a solution to this problem that does not
    955     require source-level annotation of condition-variable wait loops
    956     is beyond the current state of the art.</para>
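
    <para>The semaphore-based signalling recommended at the start of
    this item can be sketched as follows.  This is an illustration
    only: the names <varname>ready</varname>,
    <varname>payload</varname> and
    <function>run_semaphore_handoff</function> are invented, and the
    fragment assumes a platform with unnamed POSIX semaphores
    (<function>sem_init</function>), such as Linux.  The semaphore
    starts at zero, so the order in which the two threads reach the
    rendezvous does not change which synchronisation calls are
    made.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;        /* initial value 0: nothing posted yet */
static int   payload = 0;  /* handed from signaller to waiter     */

static void *signaller(void *arg)
{
   (void)arg;
   payload = 42;           /* the write comes before the post */
   sem_post(&ready);
   return NULL;
}

static void *waiter(void *arg)
{
   int *out = arg;
   /* Retry if the wait is interrupted by a signal handler. */
   while (sem_wait(&ready) == -1 && errno == EINTR)
      continue;
   *out = payload;         /* the read comes after the wait */
   return NULL;
}

/* Runs one hand-off and returns the value the waiter received. */
int run_semaphore_handoff(void)
{
   pthread_t s, w;
   int received = 0;
   int rc = sem_init(&ready, 0, 0);
   assert(rc == 0);
   rc = pthread_create(&w, NULL, waiter, &received);
   assert(rc == 0);
   rc = pthread_create(&s, NULL, signaller, NULL);
   assert(rc == 0);
   (void)rc;
   pthread_join(s, NULL);
   pthread_join(w, NULL);
   sem_destroy(&ready);
   return received;
}
```
]]></programlisting>

    <para>Unlike the condition-variable scheme, the waiter always calls
    <function>sem_wait</function> and the signaller always calls
    <function>sem_post</function>, so there is no path on which the
    dependency is carried only by an in-memory flag.</para>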
    957   </listitem>
    958 
    959   <listitem>
    960     <para>Make sure you are using a supported Linux distribution.  At
    961     present, Helgrind only properly supports glibc-2.3 or later.  This
    962     in turn means we only support glibc's NPTL threading
    963     implementation.  The old LinuxThreads implementation is not
    964     supported.</para>
    965   </listitem>
    966 
    967   <listitem>
    968     <para>Round up all finished threads using
    969     <function>pthread_join</function>.  Avoid
    970     detaching threads: don't create threads in the detached state, and
    971     don't call <function>pthread_detach</function> on existing threads.</para>
    972 
    973     <para>Using <function>pthread_join</function> to round up finished
    974     threads provides a clear synchronisation point that both Helgrind and
    975     programmers can see.  If you don't call
    976     <function>pthread_join</function> on a thread, Helgrind has no way to
    977     know when it finishes, relative to any
    978     significant synchronisation points for other threads in the program.  So
    979     it assumes that the thread lingers indefinitely and can potentially
    980     interfere indefinitely with the memory state of the program.  It
    981     has every right to assume that -- after all, it might really be
    982     the case that, for scheduling reasons, the exiting thread did run
    983     very slowly in the last stages of its life.</para>
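
    <para>As a small sketch of this advice, the fragment below creates
    a joinable thread, joins it, and only then reads what the thread
    wrote.  The type <computeroutput>Job</computeroutput> and the
    functions <function>square_worker</function> and
    <function>run_joined_worker</function> are invented for the
    example.</para>

<programlisting><![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical unit of work passed to the thread. */
typedef struct {
   long input;
   long output;
} Job;

static void *square_worker(void *arg)
{
   Job *job = arg;
   job->output = job->input * job->input;
   return NULL;
}

/* Runs the worker in a joinable thread and returns its result. */
long run_joined_worker(long n)
{
   Job job = { n, 0 };
   pthread_t tid;
   /* Default attributes: the thread is joinable, not detached. */
   int rc = pthread_create(&tid, NULL, square_worker, &job);
   assert(rc == 0);
   /* The join is the synchronisation point; job.output is read
      only after it returns. */
   rc = pthread_join(tid, NULL);
   assert(rc == 0);
   (void)rc;
   return job.output;
}
```
]]></programlisting>

    <para>No mutex is needed around <varname>job.output</varname> here,
    because the only read follows the
    <function>pthread_join</function> call.</para>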
    984   </listitem>
    985 
    986   <listitem>
    987     <para>Perform thread debugging (with Helgrind) and memory
    988     debugging (with Memcheck) together.</para>
    989 
    990     <para>Helgrind tracks the state of memory in detail, and memory
    991     management bugs in the application are liable to cause confusion.
    992     In extreme cases, applications which do many invalid reads and
    993     writes (particularly to freed memory) have been known to crash
    994     Helgrind.  So, ideally, you should make your application
    995     Memcheck-clean before using Helgrind.</para>
    996 
    997     <para>It may be impossible to make your application Memcheck-clean
    998     unless you first remove threading bugs.  In particular, it may be
    999     difficult to remove all reads and writes to freed memory in
   1000     multithreaded C++ destructor sequences at program termination.
   1001     So, ideally, you should make your application Helgrind-clean
   1002     before using Memcheck.</para>
   1003 
   1004     <para>Since this circularity is obviously unresolvable, at least
   1005     bear in mind that Memcheck and Helgrind are to some extent
   1006     complementary, and you may need to use them together.</para>
   1007   </listitem>
   1008 
   1009   <listitem>
   1010     <para>POSIX requires that implementations of standard I/O
   1011     (<function>printf</function>, <function>fprintf</function>,
   1012     <function>fwrite</function>, <function>fread</function>, etc) are thread
   1013     safe.  Unfortunately GNU libc implements this by using internal locking
   1014     primitives that Helgrind is unable to intercept.  Consequently Helgrind
   1015     generates many false race reports when you use these functions.</para>
   1016 
   1017     <para>Helgrind attempts to hide these errors using the standard
   1018     Valgrind error-suppression mechanism.  So, at least for simple
   1019     test cases, you don't see any.  Nevertheless, some may slip
   1020     through.  Just something to be aware of.</para>
   1021   </listitem>
   1022 
   1023   <listitem>
   1024     <para>Helgrind's error checks do not work properly inside the
   1025     system threading library itself
   1026     (<computeroutput>libpthread.so</computeroutput>), and it usually
   1027     observes large numbers of (false) errors in there.  Valgrind's
   1028     suppression system then filters these out, so you should not see
   1029     them.</para>
   1030 
   1031     <para>If you see any race errors reported
   1032     where <computeroutput>libpthread.so</computeroutput> or
   1033     <computeroutput>ld.so</computeroutput> is the object associated
   1034     with the innermost stack frame, please file a bug report at
   1035     <ulink url="&vg-url;">&vg-url;</ulink>.
   1036     </para>
   1037   </listitem>
   1038 
   1039 </orderedlist>
   1040 
   1041 </sect1>
   1042 
   1043 
   1044 
   1045 
   1046 <sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
   1047 <title>Helgrind Command-line Options</title>
   1048 
   1049 <para>The following end-user options are available:</para>
   1050 
   1051 <!-- start of xi:include in the manpage -->
   1052 <variablelist id="hg.opts.list">
   1053 
   1054   <varlistentry id="opt.free-is-write"
   1055                 xreflabel="--free-is-write">
   1056     <term>
   1057       <option><![CDATA[--free-is-write=no|yes
   1058       [default: no] ]]></option>
   1059     </term>
   1060     <listitem>
   1061       <para>When enabled (not the default), Helgrind treats freeing of
   1062         heap memory as if the memory was written immediately before
   1063         the free.  This exposes races where memory is referenced by
   1064         one thread, and freed by another, but there is no observable
   1065         synchronisation event to ensure that the reference happens
   1066         before the free.
   1067       </para>
   1068       <para>This functionality is new in Valgrind 3.7.0, and is
   1069         regarded as experimental.  It is not enabled by default
   1070         because its interaction with custom memory allocators is not
   1071         well understood at present.  User feedback is welcomed.
   1072       </para>
   1073     </listitem>
   1074   </varlistentry>
   1075 
   1076   <varlistentry id="opt.track-lockorders"
   1077                 xreflabel="--track-lockorders">
   1078     <term>
   1079       <option><![CDATA[--track-lockorders=no|yes
   1080       [default: yes] ]]></option>
   1081     </term>
   1082     <listitem>
   1083       <para>When enabled (the default), Helgrind performs lock order
   1084       consistency checking.  For some buggy programs, the large number
   1085       of lock order errors reported can become annoying, particularly
   1086       if you're only interested in race errors.  You may therefore find
   1087       it helpful to disable lock order checking.</para>
   1088     </listitem>
   1089   </varlistentry>
   1090 
   1091   <varlistentry id="opt.history-level"
   1092                 xreflabel="--history-level">
   1093     <term>
   1094       <option><![CDATA[--history-level=none|approx|full
   1095       [default: full] ]]></option>
   1096     </term>
   1097     <listitem>
   1098       <para><option>--history-level=full</option> (the default) causes
   1099         Helgrind to collect enough information about "old" accesses that
   1100         it can produce two stack traces in a race report -- both the
   1101         stack trace for the current access, and the trace for the
   1102         older, conflicting access.</para>
   1103       <para>Collecting such information is expensive in both speed and
   1104         memory, particularly for programs that do many inter-thread
   1105         synchronisation events (locks, unlocks, etc).  Without such
   1106         information, it is more difficult to track down the root
   1107         causes of races.  Nonetheless, you may not need it in
   1108         situations where you just want to check for the presence or
   1109         absence of races, for example, when doing regression testing
   1110         of a previously race-free program.</para>
   1111       <para><option>--history-level=none</option> is the opposite
   1112         extreme.  It causes Helgrind not to collect any information
   1113         about previous accesses.  This can be dramatically faster
   1114         than <option>--history-level=full</option>.</para>
   1115       <para><option>--history-level=approx</option> provides a
   1116         compromise between these two extremes.  It causes Helgrind to
   1117         show a full trace for the later access, and approximate
   1118         information regarding the earlier access.  This approximate
   1119         information consists of two stacks, and the earlier access is
   1120         guaranteed to have occurred somewhere between program points
   1121         denoted by the two stacks. This is not as useful as showing
   1122         the exact stack for the previous access
   1123         (as <option>--history-level=full</option> does), but it is
   1124         better than nothing, and it is almost as fast as
   1125         <option>--history-level=none</option>.</para>
   1126     </listitem>
   1127   </varlistentry>
   1128 
   1129   <varlistentry id="opt.conflict-cache-size"
   1130                 xreflabel="--conflict-cache-size">
   1131     <term>
   1132       <option><![CDATA[--conflict-cache-size=N
   1133       [default: 1000000] ]]></option>
   1134     </term>
   1135     <listitem>
   1136       <para>This flag only has any effect
   1137         at <option>--history-level=full</option>.</para>
   1138       <para>Information about "old" conflicting accesses is stored in
   1139         a cache of limited size, with LRU-style management.  This is
   1140         necessary because it isn't practical to store a stack trace
   1141         for every single memory access made by the program.
   1142         Historical information on not recently accessed locations is
   1143         periodically discarded, to free up space in the cache.</para>
   1144       <para>This option controls the size of the cache, in terms of the
   1145         number of different memory addresses for which
   1146         conflicting access information is stored.  If you find that
   1147         Helgrind is showing race errors with only one stack instead of
   1148         the expected two stacks, try increasing this value.</para>
   1149       <para>The minimum value is 10,000 and the maximum is 30,000,000
   1150         (thirty times the default value).  Increasing the value by 1
   1151         increases Helgrind's memory requirement by very roughly 100
   1152         bytes, so the maximum value will easily eat up three extra
   1153         gigabytes or so of memory.</para>
   1154     </listitem>
   1155   </varlistentry>
   1156 
   1157   <varlistentry id="opt.check-stack-refs"
   1158                 xreflabel="--check-stack-refs">
   1159     <term>
   1160       <option><![CDATA[--check-stack-refs=no|yes
   1161       [default: yes] ]]></option>
   1162     </term>
   1163     <listitem>
   1164       <para>
   1165         By default Helgrind checks all data memory accesses made by your
   1166         program.  This flag enables you to skip checking for accesses
   1167         to thread stacks (local variables).  This can improve
   1168         performance, but comes at the cost of missing races on
   1169         stack-allocated data.
   1170       </para>
   1171     </listitem>
   1172   </varlistentry>
   1173 
   1174 
   1175 </variablelist>
   1176 <!-- end of xi:include in the manpage -->
   1177 
   1178 <!-- start of xi:include in the manpage -->
   1179 <!--  commented out, because we don't document debugging options in the
   1180       manual.  Nb: all the double-dashes below had a space inserted in them
   1181       to avoid problems with premature closing of this comment.
   1182 <para>In addition, the following debugging options are available for
   1183 Helgrind:</para>
   1184 
   1185 <variablelist id="hg.debugopts.list">
   1186 
   1187   <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
   1188     <term>
   1189       <option><![CDATA[- -trace-malloc=no|yes [no]
   1190       ]]></option>
   1191     </term>
   1192     <listitem>
   1193       <para>Show all client <function>malloc</function> (etc) and
   1194       <function>free</function> (etc) requests.</para>
   1195     </listitem>
   1196   </varlistentry>
   1197 
   1198   <varlistentry id="opt.cmp-race-err-addrs" 
   1199                 xreflabel="- -cmp-race-err-addrs">
   1200     <term>
   1201       <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
   1202       ]]></option>
   1203     </term>
   1204     <listitem>
   1205       <para>Controls whether or not race (data) addresses should be
   1206         taken into account when removing duplicates of race errors.
   1207         With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
   1208         identical race errors will be considered to be the same even if
   1209         their race addresses differ.  With
   1210         <varname>- -cmp-race-err-addrs=yes</varname> they will be
   1211         considered different.  This is provided to help make certain
   1212         regression tests work reliably.</para>
   1213     </listitem>
   1214   </varlistentry>
   1215 
   1216   <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
   1217     <term>
   1218       <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
   1219       ]]></option>
   1220     </term>
   1221     <listitem>
   1222       <para>Run extensive sanity checks on Helgrind's internal
   1223         data structures at events defined by the bitstring, as
   1224         follows:</para>
   1225       <para><computeroutput>010000 </computeroutput>after changes to
   1226         the lock order acquisition graph</para>
   1227       <para><computeroutput>001000 </computeroutput>after every client
   1228         memory access (NB: not currently used)</para>
   1229       <para><computeroutput>000100 </computeroutput>after every client
   1230         memory range permission setting of 256 bytes or greater</para>
   1231       <para><computeroutput>000010 </computeroutput>after every client
   1232         lock or unlock event</para>
   1233       <para><computeroutput>000001 </computeroutput>after every client
   1234         thread creation or joinage event</para>
   1235       <para>Note these will make Helgrind run very slowly, often to
   1236         the point of being completely unusable.</para>
   1237     </listitem>
   1238   </varlistentry>
   1239 
   1240 </variablelist>
   1241 -->
   1242 <!-- end of xi:include in the manpage -->
   1243 
   1244 
   1245 </sect1>
   1246 
   1247 
   1248 
<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
<title>Helgrind Client Requests</title>

<para>The following client requests are defined in
<filename>helgrind.h</filename>.  See that file for exact details of their
arguments.</para>

<itemizedlist>

  <listitem>
    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
    <para>This makes Helgrind forget everything it knows about a
    specified memory range.  This is particularly useful for memory
    allocators that wish to recycle memory.</para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_NEW_MEMORY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>The <function>ANNOTATE_*</function> requests listed above
    describe to Helgrind the behaviour of custom (non-POSIX)
    synchronisation primitives, which it otherwise has no way to
    understand.  See the comments in <filename>helgrind.h</filename>
    for further documentation.</para>
  </listitem>

</itemizedlist>
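<para>As an illustration of the allocator use case for
<function>VALGRIND_HG_CLEAN_MEMORY</function>, the following is a
minimal sketch of a fixed-size block pool that recycles freed blocks.
The pool itself (<function>pool_alloc</function>,
<function>pool_free</function> and
<computeroutput>BLOCK_SIZE</computeroutput>) is hypothetical and not
part of Valgrind.  The sketch assumes that the header is installed as
<filename>valgrind/helgrind.h</filename> and that the request takes a
start address and a length in bytes; check
<filename>helgrind.h</filename> for the exact arguments.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdlib.h>

/* Assumption: helgrind.h is installed as <valgrind/helgrind.h>.  When it
   is missing, a no-op stand-in keeps the sketch buildable. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BLOCK_SIZE 64   /* hypothetical fixed block size */

/* A free block stores the link to the next free block in its own bytes. */
typedef union block {
    union block *next;
    char         bytes[BLOCK_SIZE];
} block_t;

static block_t        *free_list;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out a block, recycling one from the free list when possible. */
void *pool_alloc(void)
{
    block_t *b;

    pthread_mutex_lock(&pool_lock);
    b = free_list;
    if (b != NULL)
        free_list = b->next;
    pthread_mutex_unlock(&pool_lock);

    if (b == NULL)
        return malloc(sizeof *b);   /* fresh memory, nothing to forget */

    /* Recycled memory: ask Helgrind to forget the previous owner's accesses. */
    VALGRIND_HG_CLEAN_MEMORY(b, sizeof *b);
    return b;
}

/* Return a block to the pool instead of calling free(). */
void pool_free(void *p)
{
    block_t *b = p;

    pthread_mutex_lock(&pool_lock);
    b->next = free_list;
    free_list = b;
    pthread_mutex_unlock(&pool_lock);
}
]]></programlisting>

<para>A recycled block may be used by a different thread under
different locking than its previous owner used, so without the request
Helgrind could relate the new owner's accesses to stale history for
the same addresses.</para>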
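<para>The happens-before annotations might be used as in the following
sketch, in which a hand-rolled flag built on C11 atomics hands a value
from one thread to another.  The flag, the
<function>producer</function> thread and
<function>run_handoff</function> are hypothetical.  The sketch assumes
that the header is installed as
<filename>valgrind/helgrind.h</filename> and that each macro takes the
address of an object identifying the synchronisation event; check
<filename>helgrind.h</filename> for the exact arguments.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: helgrind.h is installed as <valgrind/helgrind.h>.  When it
   is missing, no-op stand-ins keep the sketch buildable. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int        payload;  /* plain data handed from producer to consumer */
static atomic_int ready;    /* hypothetical hand-rolled "event" flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Everything written above is published through the event &ready. */
    ANNOTATE_HAPPENS_BEFORE(&ready);
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Returns the value handed over by the producer, or -1 if the thread
   could not be created. */
int run_handoff(void)
{
    pthread_t tid;
    int       value;

    atomic_store(&ready, 0);
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* spin until the producer raises the flag */

    /* Pair with the producer's annotation on the same address. */
    ANNOTATE_HAPPENS_AFTER(&ready);
    value = payload;

    pthread_join(tid, NULL);
    return value;
}
]]></programlisting>

<para>The two annotations use the same address, which is how the
ordering between the write and the read of
<computeroutput>payload</computeroutput> is described to Helgrind.
Accesses to the flag itself are not covered by these annotations and
may need separate treatment.</para>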
</sect1>



<sect1 id="hg-manual.todolist" xreflabel="To Do List">
<title>A To-Do List for Helgrind</title>

<para>The following is a list of loose ends which should be tidied up
some time.</para>

<itemizedlist>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only for cycles of size 2, as at
    present.</para>
  </listitem>
  <listitem><para>The conflicting access mechanism sometimes
    mysteriously fails to show the conflicting access's stack, even
    when provided with unbounded storage for conflicting access info.
    This should be investigated.</para>
  </listitem>
  <listitem><para>Document races caused by GCC's thread-unsafe code
    generation for speculative stores.  In the interim see
    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html</computeroutput>
    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
    </para>
  </listitem>
  <listitem><para>Don't update the lock-order graph, and don't check
    for errors, when a "try"-style lock operation happens (e.g.
    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
    restrictions to the locking order, since they can always fail to
    acquire the lock, resulting in the caller going off and doing Plan
    B (presumably it will have a Plan B).  Doing such checks could
    generate false lock-order errors and confuse users.</para>
  </listitem>
  <listitem><para>Performance can be very poor.  Slowdowns on the
    order of 100:1 are not unusual.  There is limited scope for
    performance improvements.
    </para>
  </listitem>

</itemizedlist>
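<para>To make the "try"-style item above concrete, the following
sketch shows the kind of code it has in mind.  One function takes two
locks in the order A then B with blocking calls.  The other holds B
and only tries A, backing off (its Plan B) when the attempt fails, so
the apparent B-then-A ordering cannot by itself cause a deadlock.  All
names here (<computeroutput>lock_a</computeroutput>,
<computeroutput>lock_b</computeroutput>,
<function>update_ab</function>, <function>update_ba</function>) are
hypothetical.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

int shared_counter;   /* protected by holding both locks */

/* Blocking acquisition in the order A, then B. */
void update_ab(void)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    shared_counter++;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
}

/* Holds B, then only *tries* A.  Returns 1 if the update was made and
   0 if the caller should retry later. */
int update_ba(void)
{
    pthread_mutex_lock(&lock_b);
    if (pthread_mutex_trylock(&lock_a) != 0) {
        /* Plan B: never block on A while holding B; release and back off. */
        pthread_mutex_unlock(&lock_b);
        return 0;
    }
    shared_counter++;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return 1;
}
]]></programlisting>

<para>If a thread in <function>update_ab</function> holds A and waits
for B, a thread in <function>update_ba</function> fails to get A,
releases B and returns, so the first thread can proceed.  Going by the
item above, a lock-order report for code of this shape would be a
false positive.</para>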

</sect1>

</chapter>