      1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
      2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
      3           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
      4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
      5 
      6 
      7 <chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
      8   <title>Helgrind: a thread error detector</title>
      9 
     10 <para>To use this tool, you must specify
     11 <option>--tool=helgrind</option> on the Valgrind
     12 command line.</para>
     13 
     14 
     15 <sect1 id="hg-manual.overview" xreflabel="Overview">
     16 <title>Overview</title>
     17 
     18 <para>Helgrind is a Valgrind tool for detecting synchronisation errors
     19 in C, C++ and Fortran programs that use the POSIX pthreads
     20 threading primitives.</para>
     21 
     22 <para>The main abstractions in POSIX pthreads are: a set of threads
     23 sharing a common address space, thread creation, thread joining,
     24 thread exit, mutexes (locks), condition variables (inter-thread event
     25 notifications), reader-writer locks, spinlocks, semaphores and
     26 barriers.</para>
     27 
     28 <para>Helgrind can detect three classes of errors, which are discussed
     29 in detail in the next three sections:</para>
     30 
     31 <orderedlist>
     32  <listitem>
     33   <para><link linkend="hg-manual.api-checks">
     34         Misuses of the POSIX pthreads API.</link></para>
     35  </listitem>
     36  <listitem>
     37   <para><link linkend="hg-manual.lock-orders">
     38         Potential deadlocks arising from lock
     39         ordering problems.</link></para>
     40  </listitem>
     41  <listitem>
     42   <para><link linkend="hg-manual.data-races">
     43         Data races -- accessing memory without adequate locking
     44                       or synchronisation</link>.
     45   </para>
     46  </listitem>
     47 </orderedlist>
     48 
     49 <para>Problems like these often result in unreproducible,
     50 timing-dependent crashes, deadlocks and other misbehaviour, and
     51 can be difficult to find by other means.</para>
     52 
     53 <para>Helgrind is aware of all the pthread abstractions and tracks
     54 their effects as accurately as it can.  On x86 and amd64 platforms, it
     55 understands and partially handles implicit locking arising from the
     56 use of the LOCK instruction prefix.  On PowerPC/POWER and ARM
     57 platforms, it partially handles implicit locking arising from 
     58 load-linked and store-conditional instruction pairs.
     59 </para>
     60 
     61 <para>Helgrind works best when your application uses only the POSIX
     62 pthreads API.  However, if you want to use custom threading 
     63 primitives, you can describe their behaviour to Helgrind using the
     64 <varname>ANNOTATE_*</varname> macros defined
     65 in <varname>helgrind.h</varname>.</para>
     66 
     67 
     68 
<para>Following those three sections is a section containing
     70 <link linkend="hg-manual.effective-use">
     71 hints and tips on how to get the best out of Helgrind.</link>
     72 </para>
     73 
     74 <para>Then there is a
     75 <link linkend="hg-manual.options">summary of command-line
     76 options.</link>
     77 </para>
     78 
     79 <para>Finally, there is 
     80 <link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
     81 could be improved.</link>
     82 </para>
     83 
     84 </sect1>
     85 
     86 
     87 
     88 
     89 <sect1 id="hg-manual.api-checks" xreflabel="API Checks">
     90 <title>Detected errors: Misuses of the POSIX pthreads API</title>
     91 
     92 <para>Helgrind intercepts calls to many POSIX pthreads functions, and
     93 is therefore able to report on various common problems.  Although
these are unglamorous errors, their presence can lead to undefined
     95 program behaviour and hard-to-find bugs later on.  The detected errors
     96 are:</para>
     97 
     98 <itemizedlist>
     99  <listitem><para>unlocking an invalid mutex</para></listitem>
    100  <listitem><para>unlocking a not-locked mutex</para></listitem>
    101  <listitem><para>unlocking a mutex held by a different
    102                  thread</para></listitem>
    103  <listitem><para>destroying an invalid or a locked mutex</para></listitem>
    104  <listitem><para>recursively locking a non-recursive mutex</para></listitem>
    105  <listitem><para>deallocation of memory that contains a
    106                  locked mutex</para></listitem>
    107  <listitem><para>passing mutex arguments to functions expecting
    108                  reader-writer lock arguments, and vice
    109                  versa</para></listitem>
    110  <listitem><para>when a POSIX pthread function fails with an
    111                  error code that must be handled</para></listitem>
    112  <listitem><para>when a thread exits whilst still holding locked
    113                  locks</para></listitem>
    114  <listitem><para>calling <function>pthread_cond_wait</function>
    115                  with a not-locked mutex, an invalid mutex,
    116                  or one locked by a different
    117                  thread</para></listitem>
    118  <listitem><para>inconsistent bindings between condition
    119                  variables and their associated mutexes</para></listitem>
    120  <listitem><para>invalid or duplicate initialisation of a pthread
    121                  barrier</para></listitem>
    122  <listitem><para>initialisation of a pthread barrier on which threads
    123                  are still waiting</para></listitem>
    124  <listitem><para>destruction of a pthread barrier object which was
    125                  never initialised, or on which threads are still
    126                  waiting</para></listitem>
    127  <listitem><para>waiting on an uninitialised pthread
    128                  barrier</para></listitem>
    129  <listitem><para>for all of the pthreads functions that Helgrind
    130                  intercepts, an error is reported, along with a stack
    131                  trace, if the system threading library routine returns
    132                  an error code, even if Helgrind itself detected no
    133                  error</para></listitem>
    134 </itemizedlist>
    135 
    136 <para>Checks pertaining to the validity of mutexes are generally also
    137 performed for reader-writer locks.</para>
    138 
    139 <para>Various kinds of this-can't-possibly-happen events are also
    140 reported.  These usually indicate bugs in the system threading
    141 library.</para>
    142 
    143 <para>Reported errors always contain a primary stack trace indicating
    144 where the error was detected.  They may also contain auxiliary stack
    145 traces giving additional information.  In particular, most errors
    146 relating to mutexes will also tell you where that mutex first came to
    147 Helgrind's attention (the "<computeroutput>was first observed
    148 at</computeroutput>" part), so you have a chance of figuring out which
    149 mutex it is referring to.  For example:</para>
    150 
    151 <programlisting><![CDATA[
    152 Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
    153    at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
    154    by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
    155    by 0x40079B: main (tc09_bad_unlock.c:50)
    156   Lock at 0x7FEFFFA90 was first observed
    157    at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
    158    by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
    159    by 0x40079B: main (tc09_bad_unlock.c:50)
    160 ]]></programlisting>
    161 
    162 <para>Helgrind has a way of summarising thread identities, as
    163 you see here with the text "<computeroutput>Thread
    164 #1</computeroutput>".  This is so that it can speak about threads and
    165 sets of threads without overwhelming you with details.  See 
    166 <link linkend="hg-manual.data-races.errmsgs">below</link>
    167 for more information on interpreting error messages.</para>
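
<para>As a hedged illustration of one of the misuses listed above, the
following sketch unlocks a mutex that was never locked.  The helper
name <function>unlock_without_lock</function> is hypothetical and is
not taken from Helgrind's test suite.  The sketch assumes an
error-checking mutex
(<computeroutput>PTHREAD_MUTEX_ERRORCHECK</computeroutput>), so that
the bad unlock fails with <computeroutput>EPERM</computeroutput>
instead of invoking undefined behaviour.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: unlocks a mutex that was never locked and
   returns the error code from pthread_mutex_unlock. */
int unlock_without_lock ( void ) {
   pthread_mutex_t     mx;
   pthread_mutexattr_t attr;
   int r;

   pthread_mutexattr_init(&attr);
   /* An error-checking mutex makes the mistake well defined: the
      unlock is required to fail rather than behave unpredictably. */
   pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
   pthread_mutex_init(&mx, &attr);
   pthread_mutexattr_destroy(&attr);

   r = pthread_mutex_unlock(&mx);   /* BUG: mx is not locked */

   pthread_mutex_destroy(&mx);
   return r;                        /* EPERM for this mutex type */
}
]]></programlisting>

<para>When a program calling this function is run under Helgrind, a
report of the same kind as the one shown above would be expected,
although the exact wording, addresses and line numbers will
differ.</para>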
    168 
    169 </sect1>
    170 
    171 
    172 
    173 
    174 <sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
    175 <title>Detected errors: Inconsistent Lock Orderings</title>
    176 
    177 <para>In this section, and in general, to "acquire" a lock simply
    178 means to lock that lock, and to "release" a lock means to unlock
    179 it.</para>
    180 
    181 <para>Helgrind monitors the order in which threads acquire locks.
    182 This allows it to detect potential deadlocks which could arise from
    183 the formation of cycles of locks.  Detecting such inconsistencies is
    184 useful because, whilst actual deadlocks are fairly obvious, potential
    185 deadlocks may never be discovered during testing and could later lead
    186 to hard-to-diagnose in-service failures.</para>
    187 
    188 <para>The simplest example of such a problem is as
    189 follows.</para>
    190 
    191 <itemizedlist>
    192  <listitem><para>Imagine some shared resource R, which, for whatever
    193   reason, is guarded by two locks, L1 and L2, which must both be held
    194   when R is accessed.</para>
    195  </listitem>
    196  <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
    197   to access R.  The implication of this is that all threads in the
    198   program must acquire the two locks in the order first L1 then L2.
    199   Not doing so risks deadlock.</para>
    200  </listitem>
    201  <listitem><para>The deadlock could happen if two threads -- call them
    202   T1 and T2 -- both want to access R.  Suppose T1 acquires L1 first,
    203   and T2 acquires L2 first.  Then T1 tries to acquire L2, and T2 tries
    204   to acquire L1, but those locks are both already held.  So T1 and T2
    205   become deadlocked.</para>
    206  </listitem>
    207 </itemizedlist>
    208 
    209 <para>Helgrind builds a directed graph indicating the order in which
    210 locks have been acquired in the past.  When a thread acquires a new
    211 lock, the graph is updated, and then checked to see if it now contains
    212 a cycle.  The presence of a cycle indicates a potential deadlock involving
    213 the locks in the cycle.</para>
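
<para>The scenario described above can be sketched in a few lines of
C.  This is only an illustration: the names
<computeroutput>l1</computeroutput>,
<computeroutput>l2</computeroutput>,
<computeroutput>resource_r</computeroutput> and the three functions
are hypothetical.  The sketch is deliberately single-threaded, so it
can never actually deadlock, but it acquires the two locks in both
orders, which is the kind of inconsistency the lock order graph is
meant to capture.</para>

<programlisting><![CDATA[
#include <pthread.h>

static pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t l2 = PTHREAD_MUTEX_INITIALIZER;
static int resource_r = 0;   /* stands in for the shared resource R */

/* Establishes the order "l1 before l2". */
void access_r_l1_first ( void ) {
   pthread_mutex_lock(&l1);
   pthread_mutex_lock(&l2);
   resource_r++;
   pthread_mutex_unlock(&l2);
   pthread_mutex_unlock(&l1);
}

/* Violates that order by taking l2 before l1. */
void access_r_l2_first ( void ) {
   pthread_mutex_lock(&l2);
   pthread_mutex_lock(&l1);
   resource_r++;
   pthread_mutex_unlock(&l1);
   pthread_mutex_unlock(&l2);
}

/* Runs both orderings one after the other in a single thread and
   returns the number of accesses made to the resource. */
int lock_order_demo ( void ) {
   access_r_l1_first();
   access_r_l2_first();
   return resource_r;
}
]]></programlisting>

<para>If two different threads ran
<function>access_r_l1_first</function> and
<function>access_r_l2_first</function> concurrently, they could
deadlock exactly as T1 and T2 do in the description above.  Making
both functions take <computeroutput>l1</computeroutput> first removes
the cycle.</para>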
    214 
    215 <para>In general, Helgrind will choose two locks involved in the cycle
    216 and show you how their acquisition ordering has become inconsistent.
    217 It does this by showing the program points that first defined the
    218 ordering, and the program points which later violated it.  Here is a
    219 simple example involving just two locks:</para>
    220 
    221 <programlisting><![CDATA[
    222 Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated
    223 
    224 Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
    225    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    226    by 0x400825: main (tc13_laog1.c:23)
    227 
    228  followed by a later acquisition of lock at 0x7FF0006D0
    229    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    230    by 0x400853: main (tc13_laog1.c:24)
    231 
    232 Required order was established by acquisition of lock at 0x7FF0006D0
    233    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    234    by 0x40076D: main (tc13_laog1.c:17)
    235 
    236  followed by a later acquisition of lock at 0x7FF0006A0
    237    at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
    238    by 0x40079B: main (tc13_laog1.c:18)
    239 ]]></programlisting>
    240 
    241 <para>When there are more than two locks in the cycle, the error is
    242 equally serious.  However, at present Helgrind does not show the locks
    243 involved, sometimes because that information is not available, but
    244 also so as to avoid flooding you with information.  For example, a
    245 naive implementation of the famous Dining Philosophers problem
    246 involves a cycle of five locks
    247 (see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
    248 In this case Helgrind has detected that all 5 philosophers could
    249 simultaneously pick up their left fork and then deadlock whilst
    250 waiting to pick up their right forks.</para>
    251 
    252 <programlisting><![CDATA[
    253 Thread #6: lock order "0x80499A0 before 0x8049A00" violated
    254 
    255 Observed (incorrect) order is: acquisition of lock at 0x8049A00
    256    at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
    257    by 0x80485B4: dine (tc14_laog_dinphils.c:18)
    258    by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
    259    by 0x39B924: start_thread (pthread_create.c:297)
    260    by 0x2F107D: clone (clone.S:130)
    261 
    262  followed by a later acquisition of lock at 0x80499A0
    263    at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
    264    by 0x80485CD: dine (tc14_laog_dinphils.c:19)
    265    by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
    266    by 0x39B924: start_thread (pthread_create.c:297)
    267    by 0x2F107D: clone (clone.S:130)
    268 ]]></programlisting>
    269 
    270 </sect1>
    271 
    272 
    273 
    274 
    275 <sect1 id="hg-manual.data-races" xreflabel="Data Races">
    276 <title>Detected errors: Data Races</title>
    277 
    278 <para>A data race happens, or could happen, when two threads access a
    279 shared memory location without using suitable locks or other
    280 synchronisation to ensure single-threaded access.  Such missing
locking can cause obscure timing-dependent bugs.  Ensuring programs
    282 are race-free is one of the central difficulties of threaded
    283 programming.</para>
    284 
    285 <para>Reliably detecting races is a difficult problem, and most
    286 of Helgrind's internals are devoted to dealing with it.  
    287 We begin with a simple example.</para>
    288 
    289 
    290 <sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
    291 <title>A Simple Data Race</title>
    292 
<para>The following is about the simplest possible example of a race.
In this program, it is impossible to know what the value
of <computeroutput>var</computeroutput> is at the end of the program.
Is it 2?  Or 1?</para>
    297 
    298 <programlisting><![CDATA[
    299 #include <pthread.h>
    300 
    301 int var = 0;
    302 
    303 void* child_fn ( void* arg ) {
    304    var++; /* Unprotected relative to parent */ /* this is line 6 */
    305    return NULL;
    306 }
    307 
    308 int main ( void ) {
    309    pthread_t child;
    310    pthread_create(&child, NULL, child_fn, NULL);
    311    var++; /* Unprotected relative to child */ /* this is line 13 */
    312    pthread_join(child, NULL);
    313    return 0;
    314 }
    315 ]]></programlisting>
    316 
    317 <para>The problem is there is nothing to
    318 stop <varname>var</varname> being updated simultaneously
    319 by both threads.  A correct program would 
    320 protect <varname>var</varname> with a lock of type
<type>pthread_mutex_t</type>, which is acquired
    322 before each access and released afterwards.  Helgrind's output for
    323 this program is:</para>
    324 
    325 <programlisting><![CDATA[
    326 Thread #1 is the program's root thread
    327 
    328 Thread #2 was created
    329    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    330    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    331    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    332    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    333    by 0x400605: main (simple_race.c:12)
    334 
    335 Possible data race during read of size 4 at 0x601038 by thread #1
    336 Locks held: none
    337    at 0x400606: main (simple_race.c:13)
    338 
    339 This conflicts with a previous write of size 4 by thread #2
    340 Locks held: none
    341    at 0x4005DC: child_fn (simple_race.c:6)
    342    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    343    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    344    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    345 
    346 Location 0x601038 is 0 bytes inside global var "var"
    347 declared at simple_race.c:3
    348 ]]></programlisting>
    349 
<para>This is quite a lot of detail for an apparently simple error.
The main error message is the part beginning with
"<computeroutput>Possible data race</computeroutput>".  It says there
is a race as a result of a read of size 4 (bytes), at 0x601038, which
is the address of <computeroutput>var</computeroutput>, happening in
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>
    356 
    357 <para>Two important parts of the message are:</para>
    358 
    359 <itemizedlist>
    360  <listitem>
    361   <para>Helgrind shows two stack traces for the error, not one.  By
    362    definition, a race involves two different threads accessing the
    363    same location in such a way that the result depends on the relative
    364    speeds of the two threads.</para>
    365   <para>
    366    The first stack trace follows the text "<computeroutput>Possible
    367    data race during read of size 4 ...</computeroutput>" and the
    368    second trace follows the text "<computeroutput>This conflicts with
    369    a previous write of size 4 ...</computeroutput>".  Helgrind is
    370    usually able to show both accesses involved in a race.  At least
    371    one of these will be a write (since two concurrent, unsynchronised
    372    reads are harmless), and they will of course be from different
    373    threads.</para>
    374   <para>By examining your program at the two locations, you should be
    375    able to get at least some idea of what the root cause of the
    376    problem is.  For each location, Helgrind shows the set of locks
    377    held at the time of the access.  This often makes it clear which
    378    thread, if any, failed to take a required lock.  In this example
    379    neither thread holds a lock during the access.</para>
    380  </listitem>
    381  <listitem>
    382   <para>For races which occur on global or stack variables, Helgrind
    383    tries to identify the name and defining point of the variable.
    384    Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
    385    global var "var" declared at simple_race.c:3</computeroutput>".</para>
    386   <para>Showing names of stack and global variables carries no
    387    run-time overhead once Helgrind has your program up and running.
    388    However, it does require Helgrind to spend considerable extra time
    389    and memory at program startup to read the relevant debug info.
    390    Hence this facility is disabled by default.  To enable it, you need
    391    to give the <varname>--read-var-info=yes</varname> option to
    392    Helgrind.</para>
    393  </listitem>
    394 </itemizedlist>
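
<para>For comparison, a corrected version of the example might look
like the following sketch, in which every access to
<computeroutput>var</computeroutput> is made while holding a mutex.
The names <computeroutput>var_lock</computeroutput> and
<function>run_locked_increment</function> are assumptions made for
this illustration; the body of <function>main</function> from the
original example has been moved into a function that returns the
final value.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

static int var = 0;
static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;

static void* child_fn ( void* arg ) {
   (void)arg;
   pthread_mutex_lock(&var_lock);
   var++; /* Protected by var_lock */
   pthread_mutex_unlock(&var_lock);
   return NULL;
}

/* Returns the final value of var, or -1 if the thread could not
   be created. */
int run_locked_increment ( void ) {
   pthread_t child;
   if (pthread_create(&child, NULL, child_fn, NULL) != 0)
      return -1;
   pthread_mutex_lock(&var_lock);
   var++; /* Protected by var_lock */
   pthread_mutex_unlock(&var_lock);
   pthread_join(child, NULL);
   return var; /* Read after the join, so the child has finished */
}
]]></programlisting>

<para>With the lock in place the two increments can no longer
overlap, so the final value is always 2, and both accesses would be
shown with a non-empty set of held locks.</para>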
    395 
    396 <para>The following section explains Helgrind's race detection
    397 algorithm in more detail.</para>
    398 
    399 </sect2>
    400 
    401 
    402 
    403 <sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
    404 <title>Helgrind's Race Detection Algorithm</title>
    405 
    406 <para>Most programmers think about threaded programming in terms of
    407 the basic functionality provided by the threading library (POSIX
    408 Pthreads): thread creation, thread joining, locks, condition
    409 variables, semaphores and barriers.</para>
    410 
    411 <para>The effect of using these functions is to impose 
    412 constraints upon the order in which memory accesses can
    413 happen.  This implied ordering is generally known as the
    414 "happens-before relation".  Once you understand the happens-before
    415 relation, it is easy to see how Helgrind finds races in your code.
    416 Fortunately, the happens-before relation is itself easy to understand,
    417 and is by itself a useful tool for reasoning about the behaviour of
    418 parallel programs.  We now introduce it using a simple example.</para>
    419 
    420 <para>Consider first the following buggy program:</para>
    421 
    422 <programlisting><![CDATA[
    423 Parent thread:                         Child thread:
    424 
    425 int var;
    426 
    427 // create child thread
    428 pthread_create(...)                          
    429 var = 20;                              var = 10;
    430                                        exit
    431 
    432 // wait for child
    433 pthread_join(...)
    434 printf("%d\n", var);
    435 ]]></programlisting>
    436 
    437 <para>The parent thread creates a child.  Both then write different
    438 values to some variable <computeroutput>var</computeroutput>, and the
    439 parent then waits for the child to exit.</para>
    440 
    441 <para>What is the value of <computeroutput>var</computeroutput> at the
    442 end of the program, 10 or 20?  We don't know.  The program is
    443 considered buggy (it has a race) because the final value
    444 of <computeroutput>var</computeroutput> depends on the relative rates
    445 of progress of the parent and child threads.  If the parent is fast
    446 and the child is slow, then the child's assignment may happen later,
    447 so the final value will be 10; and vice versa if the child is faster
    448 than the parent.</para>
    449 
<para>The relative rates of progress of parent vs child are not something
    451 the programmer can control, and will often change from run to run.
    452 It depends on factors such as the load on the machine, what else is
    453 running, the kernel's scheduling strategy, and many other factors.</para>
    454 
    455 <para>The obvious fix is to use a lock to
    456 protect <computeroutput>var</computeroutput>.  It is however
    457 instructive to consider a somewhat more abstract solution, which is to
    458 send a message from one thread to the other:</para>
    459 
    460 <programlisting><![CDATA[
    461 Parent thread:                         Child thread:
    462 
    463 int var;
    464 
    465 // create child thread
    466 pthread_create(...)                          
    467 var = 20;
    468 // send message to child
    469                                        // wait for message to arrive
    470                                        var = 10;
    471                                        exit
    472 
    473 // wait for child
    474 pthread_join(...)
    475 printf("%d\n", var);
    476 ]]></programlisting>
    477 
    478 <para>Now the program reliably prints "10", regardless of the speed of
    479 the threads.  Why?  Because the child's assignment cannot happen until
    480 after it receives the message.  And the message is not sent until
    481 after the parent's assignment is done.</para>
    482 
    483 <para>The message transmission creates a "happens-before" dependency
    484 between the two assignments: <computeroutput>var = 20;</computeroutput>
    485 must now happen-before <computeroutput>var = 10;</computeroutput>.
    486 And so there is no longer a race
    487 on <computeroutput>var</computeroutput>.
    488 </para>
    489 
    490 <para>Note that it's not significant that the parent sends a message
    491 to the child.  Sending a message from the child (after its assignment)
    492 to the parent (before its assignment) would also fix the problem, causing
    493 the program to reliably print "20".</para>
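
<para>The abstract "message" in the fixed example can be realised in
several ways.  The following sketch assumes an unnamed POSIX
semaphore is used: the parent posts it after its assignment and the
child waits on it before its own.  The names
<computeroutput>msg</computeroutput>,
<function>wait_for_message</function> and
<function>run_message_demo</function> are hypothetical.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static int   var;
static sem_t msg;   /* carries the "message" from parent to child */

/* Waits on the semaphore, retrying if interrupted by a signal. */
static void wait_for_message ( void ) {
   while (sem_wait(&msg) == -1 && errno == EINTR)
      continue;
}

static void* child_fn ( void* arg ) {
   (void)arg;
   wait_for_message();   /* wait for message to arrive */
   var = 10;
   return NULL;
}

/* Returns the final value of var, or -1 on a setup failure. */
int run_message_demo ( void ) {
   pthread_t child;
   if (sem_init(&msg, 0, 0) != 0)
      return -1;
   if (pthread_create(&child, NULL, child_fn, NULL) != 0) {
      sem_destroy(&msg);
      return -1;
   }
   var = 20;
   sem_post(&msg);       /* send message to child */
   pthread_join(child, NULL);
   sem_destroy(&msg);
   return var;
}
]]></programlisting>

<para>Because the child cannot pass its wait until the parent has
posted, the parent's assignment always comes first and the function
returns 10.</para>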
    494 
    495 <para>Helgrind's algorithm is (conceptually) very simple.  It monitors all
    496 accesses to memory locations.  If a location -- in this example, 
<computeroutput>var</computeroutput> --
    498 is accessed by two different threads, Helgrind checks to see if the
    499 two accesses are ordered by the happens-before relation.  If so,
    500 that's fine; if not, it reports a race.</para>
    501 
    502 <para>It is important to understand that the happens-before relation
    503 creates only a partial ordering, not a total ordering.  An example of
    504 a total ordering is comparison of numbers: for any two numbers 
    505 <computeroutput>x</computeroutput> and
<computeroutput>y</computeroutput>,
<computeroutput>x</computeroutput> is either less than, equal to, or
greater than
<computeroutput>y</computeroutput>.  A partial ordering is like a
total ordering, but it can also express the concept that two elements
are neither equal, less nor greater, but merely unordered with respect
    512 to each other.</para>
    513 
    514 <para>In the fixed example above, we say that 
    515 <computeroutput>var = 20;</computeroutput> "happens-before"
    516 <computeroutput>var = 10;</computeroutput>.  But in the original
    517 version, they are unordered: we cannot say that either happens-before
    518 the other.</para>
    519 
    520 <para>What does it mean to say that two accesses from different
    521 threads are ordered by the happens-before relation?  It means that
    522 there is some chain of inter-thread synchronisation operations which
    523 cause those accesses to happen in a particular order, irrespective of
    524 the actual rates of progress of the individual threads.  This is a
    525 required property for a reliable threaded program, which is why
    526 Helgrind checks for it.</para>
    527 
    528 <para>The happens-before relations created by standard threading
    529 primitives are as follows:</para>
    530 
    531 <itemizedlist>
    532  <listitem><para>When a mutex is unlocked by thread T1 and later (or
    533   immediately) locked by thread T2, then the memory accesses in T1
    534   prior to the unlock must happen-before those in T2 after it acquires
    535   the lock.</para>
    536  </listitem>
    537  <listitem><para>The same idea applies to reader-writer locks,
    538   although with some complication so as to allow correct handling of
    539   reads vs writes.</para>
    540  </listitem>
    541  <listitem><para>When a condition variable (CV) is signalled on by
    542   thread T1 and some other thread T2 is thereby released from a wait
    543   on the same CV, then the memory accesses in T1 prior to the
    544   signalling must happen-before those in T2 after it returns from the
    545   wait.  If no thread was waiting on the CV then there is no
    546   effect.</para>
    547  </listitem>
    548  <listitem><para>If instead T1 broadcasts on a CV, then all of the
    549   waiting threads, rather than just one of them, acquire a
    550   happens-before dependency on the broadcasting thread at the point it
    551   did the broadcast.</para>
    552  </listitem>
    553  <listitem><para>A thread T2 that continues after completing sem_wait
    554   on a semaphore that thread T1 posts on, acquires a happens-before
  dependence on the posting thread, a bit like dependencies caused
  by mutex unlock-lock pairs.  However, since a semaphore can be posted
    557   on many times, it is unspecified from which of the post calls the
    558   wait call gets its happens-before dependency.</para>
    559  </listitem>
    560  <listitem><para>For a group of threads T1 .. Tn which arrive at a
    561   barrier and then move on, each thread after the call has a
    562   happens-after dependency from all threads before the
    563   barrier.</para>
    564  </listitem>
    565  <listitem><para>A newly-created child thread acquires an initial
    566   happens-after dependency on the point where its parent created it.
    567   That is, all memory accesses performed by the parent prior to
    568   creating the child are regarded as happening-before all the accesses
    569   of the child.</para>
    570  </listitem>
    571  <listitem><para>Similarly, when an exiting thread is reaped via a
    572   call to <function>pthread_join</function>, once the call returns, the
    573   reaping thread acquires a happens-after dependency relative to all memory
    574   accesses made by the exiting thread.</para>
    575  </listitem>
    576 </itemizedlist>
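
<para>As one illustration of the list above, the following sketch
relies on the barrier rule.  Each worker writes its own slot before
the barrier and reads every slot after it, so all the writes are
ordered before all the reads.  The names
<computeroutput>slot</computeroutput>,
<computeroutput>total</computeroutput>,
<function>worker</function> and
<function>run_barrier_demo</function> are hypothetical, and the
sketch assumes the platform provides POSIX barriers.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NTHREADS 4

static pthread_barrier_t barrier;
static int slot[NTHREADS];    /* slot[i] is written only by worker i */
static int total[NTHREADS];   /* total[i] is written only by worker i */

static void* worker ( void* arg ) {
   int id = *(int*)arg;
   int i, sum = 0;
   slot[id] = id + 1;                /* write before the barrier */
   pthread_barrier_wait(&barrier);
   for (i = 0; i < NTHREADS; i++)    /* reads after the barrier */
      sum += slot[i];
   total[id] = sum;
   return NULL;
}

/* Returns the sum seen by the workers, or -1 if they disagree. */
int run_barrier_demo ( void ) {
   pthread_t th[NTHREADS];
   int ids[NTHREADS];
   int i;
   if (pthread_barrier_init(&barrier, NULL, NTHREADS) != 0)
      return -1;
   for (i = 0; i < NTHREADS; i++) {
      ids[i] = i;
      /* A missing worker would leave the others stuck at the
         barrier, so give up entirely if creation fails. */
      if (pthread_create(&th[i], NULL, worker, &ids[i]) != 0)
         abort();
   }
   for (i = 0; i < NTHREADS; i++)
      pthread_join(th[i], NULL);
   pthread_barrier_destroy(&barrier);
   for (i = 1; i < NTHREADS; i++)
      if (total[i] != total[0])
         return -1;
   return total[0];
}
]]></programlisting>

<para>The sketch also depends on two other rules from the list: each
worker reads its <computeroutput>ids</computeroutput> entry after the
parent wrote it before thread creation, and the parent reads
<computeroutput>total</computeroutput> only after joining every
worker.</para>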
    577 
    578 <para>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
    580 dependencies.  It also monitors all memory accesses.</para>
    581 
    582 <para>If a location is accessed by two different threads, but Helgrind
    583 cannot find any path through the happens-before graph from one access
    584 to the other, then it reports a race.</para>
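
<para>The path check can be illustrated with a toy model.  The
following sketch is not Helgrind's implementation; it is a minimal
illustration, with hypothetical names
(<computeroutput>HbGraph</computeroutput>,
<function>hb_ordered</function>,
<function>example_has_race</function>), that encodes the events of
the parent/child example as graph nodes and tests whether the two
assignments are connected in either direction.</para>

<programlisting><![CDATA[
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 8

typedef struct {
   int  n;                              /* number of events in use */
   bool edge[MAX_EVENTS][MAX_EVENTS];   /* edge[a][b]: a happens-before b */
} HbGraph;

void hb_init ( HbGraph* g, int n ) {
   memset(g, 0, sizeof *g);
   g->n = n;
}

void hb_add_edge ( HbGraph* g, int from, int to ) {
   g->edge[from][to] = true;
}

/* Depth-first search for a path from one event to another. */
static bool reach ( const HbGraph* g, int from, int to, bool* seen ) {
   int next;
   if (from == to) return true;
   seen[from] = true;
   for (next = 0; next < g->n; next++)
      if (g->edge[from][next] && !seen[next] && reach(g, next, to, seen))
         return true;
   return false;
}

/* Two events are ordered if a path exists in either direction. */
bool hb_ordered ( const HbGraph* g, int a, int b ) {
   bool seen[MAX_EVENTS];
   memset(seen, 0, sizeof seen);
   if (reach(g, a, b, seen)) return true;
   memset(seen, 0, sizeof seen);
   return reach(g, b, a, seen);
}

/* Events of the parent/child example.  C_RECV does nothing in the
   buggy version, where no message is sent. */
enum { P_CREATE, P_WRITE, P_SEND, C_START, C_RECV, C_WRITE, N_EVENTS };

/* Returns true if the two writes to var are unordered. */
bool example_has_race ( bool with_message ) {
   HbGraph g;
   hb_init(&g, N_EVENTS);
   hb_add_edge(&g, P_CREATE, P_WRITE);   /* program order, parent */
   hb_add_edge(&g, P_WRITE,  P_SEND);
   hb_add_edge(&g, C_START,  C_RECV);    /* program order, child */
   hb_add_edge(&g, C_RECV,   C_WRITE);
   hb_add_edge(&g, P_CREATE, C_START);   /* thread creation */
   if (with_message)
      hb_add_edge(&g, P_SEND, C_RECV);   /* message transmission */
   return !hb_ordered(&g, P_WRITE, C_WRITE);
}
]]></programlisting>

<para>In this model
<computeroutput>example_has_race(false)</computeroutput> is true,
because no path links the two writes, whereas
<computeroutput>example_has_race(true)</computeroutput> is false,
because the message edge completes a path from the parent's write to
the child's.</para>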
    585 
    586 <para>There are a couple of caveats:</para>
    587 
    588 <itemizedlist>
    589  <listitem><para>Helgrind doesn't check for a race in the case where
    590   both accesses are reads.  That would be silly, since concurrent
    591   reads are harmless.</para>
    592  </listitem>
    593  <listitem><para>Two accesses are considered to be ordered by the
    594   happens-before dependency even through arbitrarily long chains of
    595   synchronisation events.  For example, if T1 accesses some location
  L, and then signals T2 using <function>pthread_cond_signal</function>,
  which later signals T3 in the same way, which then accesses L, then
    598   a suitable happens-before dependency exists between the first and second
    599   accesses, even though it involves two different inter-thread
    600   synchronisation events.</para>
    601  </listitem>
    602 </itemizedlist>
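
<para>A chain of this kind can be sketched as follows.  Note that the
sketch swaps condition variables for semaphores: as stated earlier, a
signal on a condition variable has no effect when no thread is
waiting, so a correct condition-variable version would need extra
mutexes and predicates.  All names
(<computeroutput>loc</computeroutput>,
<computeroutput>s12</computeroutput>,
<computeroutput>s23</computeroutput>,
<function>run_chain_demo</function>) are hypothetical, and the
calling thread plays the role of T1.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdlib.h>

static int   loc;          /* the location "L" */
static int   seen_by_t3;   /* value of loc observed by T3 */
static sem_t s12, s23;     /* T1 to T2, and T2 to T3 */

/* Waits on a semaphore, retrying if interrupted by a signal. */
static void sem_wait_retry ( sem_t* s ) {
   while (sem_wait(s) == -1 && errno == EINTR)
      continue;
}

static void* t2_fn ( void* arg ) {
   (void)arg;
   sem_wait_retry(&s12);   /* ordered after T1's post */
   sem_post(&s23);         /* pass the dependency on to T3 */
   return NULL;
}

static void* t3_fn ( void* arg ) {
   (void)arg;
   sem_wait_retry(&s23);   /* ordered after T2's post */
   seen_by_t3 = loc;       /* second access to L */
   return NULL;
}

/* Returns the value of loc that T3 observed. */
int run_chain_demo ( void ) {
   pthread_t t2, t3;
   if (sem_init(&s12, 0, 0) != 0 || sem_init(&s23, 0, 0) != 0)
      abort();
   /* Both threads are created before the write, so thread creation
      alone does not order the write before T3's read. */
   if (pthread_create(&t2, NULL, t2_fn, NULL) != 0) abort();
   if (pthread_create(&t3, NULL, t3_fn, NULL) != 0) abort();
   loc = 42;               /* first access to L, by T1 */
   sem_post(&s12);
   pthread_join(t2, NULL);
   pthread_join(t3, NULL);
   sem_destroy(&s12);
   sem_destroy(&s23);
   return seen_by_t3;
}
]]></programlisting>

<para>T1 and T3 never synchronise with each other directly; the write
and the read are ordered only through the two-step chain via
T2.</para>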
    603 
    604 </sect2>
    605 
    606 
    607 
    608 <sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
    609 <title>Interpreting Race Error Messages</title>
    610 
    611 <para>Helgrind's race detection algorithm collects a lot of
    612 information, and tries to present it in a helpful way when a race is
    613 detected.  Here's an example:</para>
    614 
    615 <programlisting><![CDATA[
    616 Thread #2 was created
    617    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    618    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    619    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    620    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    621    by 0x4008F2: main (tc21_pthonce.c:86)
    622 
    623 Thread #3 was created
    624    at 0x511C08E: clone (in /lib64/libc-2.8.so)
    625    by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
    626    by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
    627    by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
    628    by 0x4008F2: main (tc21_pthonce.c:86)
    629 
    630 Possible data race during read of size 4 at 0x601070 by thread #3
    631 Locks held: none
    632    at 0x40087A: child (tc21_pthonce.c:74)
    633    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    634    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    635    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    636 
    637 This conflicts with a previous write of size 4 by thread #2
    638 Locks held: none
    639    at 0x400883: child (tc21_pthonce.c:74)
    640    by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
    641    by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
    642    by 0x511C0CC: clone (in /lib64/libc-2.8.so)
    643 
    644 Location 0x601070 is 0 bytes inside local var "unprotected2"
    645 declared at tc21_pthonce.c:51, in frame #0 of thread 3
    646 ]]></programlisting>
    647 
    648 <para>Helgrind first announces the creation points of any threads
    649 referenced in the error message.  This is so it can speak concisely
    650 about threads without repeatedly printing their creation point call
    651 stacks.  Each thread is only ever announced once, the first time it
    652 appears in any Helgrind error message.</para>
    653 
    654 <para>The main error message begins at the text
    655 "<computeroutput>Possible data race during read</computeroutput>".  At
    656 the start is information you would expect to see -- address and size
    657 of the racing access, whether a read or a write, and the call stack at
    658 the point it was detected.</para>
    659 
    660 <para>A second call stack is presented starting at the text
    661 "<computeroutput>This conflicts with a previous
    662 write</computeroutput>".  This shows a previous access which also
    663 accessed the stated address, and which is believed to be racing
    664 against the access in the first call stack. Note that this second
    665 call stack is limited to a maximum of 8 entries to limit the
    666 memory usage.</para>
    667 
    668 <para>Finally, Helgrind may attempt to give a description of the
    669 raced-on address in source level terms.  In this example, it
    670 identifies it as a local variable, shows its name, declaration point,
    671 and in which frame (of the first call stack) it lives.  Note that this
    672 information is only shown when <varname>--read-var-info=yes</varname>
    673 is specified on the command line.  That's because reading the DWARF3
    674 debug information in enough detail to capture variable type and
    675 location information makes Helgrind much slower at startup, and also
    676 requires considerable amounts of memory, for large programs.
    677 </para>
    678 
    679 <para>Once you have your two call stacks, how do you find the root
    680 cause of the race?</para>
    681 
    682 <para>The first thing to do is examine the source locations referred
    683 to by each call stack.  They should both show an access to the same
    684 location, or variable.</para>
    685 
<para>Now figure out how that location should have been made
thread-safe:</para>
    688 
    689 <itemizedlist>
    690  <listitem><para>Perhaps the location was intended to be protected by
    691   a mutex?  If so, you need to lock and unlock the mutex at both
    692   access points, even if one of the accesses is reported to be a read.
    693   Did you perhaps forget the locking at one or other of the accesses?
  To help you check this, Helgrind shows the set of locks held by each
  thread at the time it accessed the raced-on location.</para>
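  <para>As a minimal sketch of locking at both access points, the
  following C fragment guards a shared counter with one mutex in both
  the writing and the reading thread.  The names
  <computeroutput>counter</computeroutput>,
  <computeroutput>counter_mx</computeroutput> and
  <computeroutput>run_locked_demo</computeroutput> are hypothetical and
  chosen only for illustration.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: 'counter' is guarded by 'counter_mx'. */
static pthread_mutex_t counter_mx = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Writer: holds the mutex around the update. */
static void *writer(void *arg)
{
   (void)arg;
   pthread_mutex_lock(&counter_mx);
   counter++;
   pthread_mutex_unlock(&counter_mx);
   return NULL;
}

/* Reader: holds the same mutex, even though it only reads. */
static void *reader(void *arg)
{
   int *out = arg;
   pthread_mutex_lock(&counter_mx);
   *out = counter;
   pthread_mutex_unlock(&counter_mx);
   return NULL;
}

/* Runs one writer and one reader.  Returns 0 on success, -1 if a
   thread could not be created.  *seen receives the value the reader
   observed; *final_value receives the counter after both joins. */
int run_locked_demo(int *seen, int *final_value)
{
   pthread_t w, r;
   counter = 0;
   if (pthread_create(&w, NULL, writer, NULL) != 0)
      return -1;
   if (pthread_create(&r, NULL, reader, seen) != 0) {
      pthread_join(w, NULL);
      return -1;
   }
   pthread_join(w, NULL);
   pthread_join(r, NULL);
   *final_value = counter;
   return 0;
}
]]></programlisting>

  <para>With both accesses made under the same mutex, the "Locks held"
  line for each thread would be expected to name that mutex rather than
  "none".  Removing the lock/unlock pair from the reader recreates the
  kind of unprotected access discussed above.</para>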
    696  </listitem>
 <listitem><para>Alternatively, perhaps you intended to use some
  other scheme to make it safe, such as signalling on a condition
    699   variable.  In all such cases, try to find a synchronisation event
    700   (or a chain thereof) which separates the earlier-observed access (as
    701   shown in the second call stack) from the later-observed access (as
    702   shown in the first call stack).  In other words, try to find
    703   evidence that the earlier access "happens-before" the later access.
    704   See the previous subsection for an explanation of the happens-before
    705   relation.</para>
    706   <para>
  The fact that Helgrind is reporting a race means it did not observe
  any happens-before relation between the two accesses.  If
  Helgrind is working correctly, you should not be able to find
  any such relation either, even on detailed inspection
  of the source code.  Hopefully, though, your inspection of the code
  will show where the missing synchronisation operation(s) should have
  been.</para>
    714  </listitem>
    715 </itemizedlist>
    716 
    717 </sect2>
    718 
    719 
    720 </sect1>
    721 
    722 <sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
    723 <title>Hints and Tips for Effective Use of Helgrind</title>
    724 
    725 <para>Helgrind can be very helpful in finding and resolving
    726 threading-related problems.  Like all sophisticated tools, it is most
    727 effective when you understand how to play to its strengths.</para>
    728 
    729 <para>Helgrind will be less effective when you merely throw an
    730 existing threaded program at it and try to make sense of any reported
    731 errors.  It will be more effective if you design threaded programs
    732 from the start in a way that helps Helgrind verify correctness.  The
    733 same is true for finding memory errors with Memcheck, but applies more
    734 here, because thread checking is a harder problem.  Consequently it is
    735 much easier to write a correct program for which Helgrind falsely
    736 reports (threading) errors than it is to write a correct program for
    737 which Memcheck falsely reports (memory) errors.</para>
    738 
    739 <para>With that in mind, here are some tips, listed most important first,
    740 for getting reliable results and avoiding false errors.  The first two
    741 are critical.  Any violations of them will swamp you with huge numbers
    742 of false data-race errors.</para>
    743 
    744 
    745 <orderedlist>
    746 
    747   <listitem>
    748     <para>Make sure your application, and all the libraries it uses,
    749     use the POSIX threading primitives.  Helgrind needs to be able to
    750     see all events pertaining to thread creation, exit, locking and
    751     other synchronisation events.  To do so it intercepts many POSIX
    752     pthreads functions.</para>
    753 
    754     <para>Do not roll your own threading primitives (mutexes, etc)
    755     from combinations of the Linux futex syscall, atomic counters, etc.
    756     These throw Helgrind's internal what's-going-on models
    757     way off course and will give bogus results.</para>
    758 
    759     <para>Also, do not reimplement existing POSIX abstractions using
    760     other POSIX abstractions.  For example, don't build your own
    761     semaphore routines or reader-writer locks from POSIX mutexes and
    762     condition variables.  Instead use POSIX reader-writer locks and
    763     semaphores directly, since Helgrind supports them directly.</para>
    764 
    765     <para>Helgrind directly supports the following POSIX threading
    766     abstractions: mutexes, reader-writer locks, condition variables
    767     (but see below), semaphores and barriers.  Currently spinlocks
    768     are not supported, although they could be in future.</para>
    769 
    770     <para>At the time of writing, the following popular Linux packages
    771     are known to implement their own threading primitives:</para>
    772 
    773     <itemizedlist>
    774      <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
    775       only uses POSIX pthreads primitives.  Unfortunately Qt 4.X 
    776       has its own implementation of mutexes (QMutex) and thread reaping.
    777       Helgrind 3.4.x contains direct support
    778       for Qt 4.X threading, which is experimental but is believed to
    779       work fairly well.  A side effect of supporting Qt 4 directly is
    780       that Helgrind can be used to debug KDE4 applications.  As this
    781       is an experimental feature, we would particularly appreciate
    782       feedback from folks who have used Helgrind to successfully debug
    783       Qt 4 and/or KDE4 applications.</para>
    784      </listitem>
    785      <listitem><para>Runtime support library for GNU OpenMP (part of
    786       GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
    787       library (<filename>libgomp.so</filename>) constructs its own
    788       synchronisation primitives using combinations of atomic memory
      instructions and the futex syscall, which causes total chaos in
      Helgrind since it cannot "see" those.</para>
    791      <para>Fortunately, this can be solved using a configuration-time
    792       option (for GCC).  Rebuild GCC from source, and configure using
    793       <varname>--disable-linux-futex</varname>.
    794       This makes libgomp.so use the standard
    795       POSIX threading primitives instead.  Note that this was tested
    796       using GCC 4.2.3 and has not been re-tested using more recent GCC
    797       versions.  We would appreciate hearing about any successes or
    798       failures with more recent versions.</para>
    799      </listitem>
    800     </itemizedlist>
    801 
    802     <para>If you must implement your own threading primitives, there
    803       are a set of client request macros
    804       in <computeroutput>helgrind.h</computeroutput> to help you
    805       describe your primitives to Helgrind.  You should be able to
    806       mark up mutexes, condition variables, etc, without difficulty.
    807     </para>
    808     <para>
    809       It is also possible to mark up the effects of thread-safe
    810       reference counting using the
    811       <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
    812       <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
      <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
    814       macros.  Thread-safe reference counting using an atomically
    815       incremented/decremented refcount variable causes Helgrind
    816       problems because a one-to-zero transition of the reference count
    817       means the accessing thread has exclusive ownership of the
    818       associated resource (normally, a C++ object) and can therefore
    819       access it (normally, to run its destructor) without locking.
    820       Helgrind doesn't understand this, and markup is essential to
    821       avoid false positives.
    822     </para>
    823 
    824     <para>
    825       Here are recommended guidelines for marking up thread safe
    826       reference counting in C++.  You only need to mark up your
    827       release methods -- the ones which decrement the reference count.
    828       Given a class like this:
    829     </para>
    830 
    831 <programlisting><![CDATA[
class MyClass {
   unsigned int mRefCount;

public:
   void Release ( void ) {
      // atomic_decrement is assumed to return the new (decremented) value
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         delete this;
      }
   }
};
    842 ]]></programlisting>
    843 
    844    <para>
    845      the release method should be marked up as follows:
    846    </para>
    847 
    848 <programlisting><![CDATA[
    849    void Release ( void ) {
    850       unsigned int newCount = atomic_decrement(&mRefCount);
    851       if (newCount == 0) {
    852          ANNOTATE_HAPPENS_AFTER(&mRefCount);
    853          ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
    854          delete this;
    855       } else {
    856          ANNOTATE_HAPPENS_BEFORE(&mRefCount);
    857       }
    858    }
    859 ]]></programlisting>
    860 
    861     <para>
    862       There are a number of complex, mostly-theoretical objections to
    863       this scheme.  From a theoretical standpoint it appears to be
    864       impossible to devise a markup scheme which is completely correct
    865       in the sense of guaranteeing to remove all false races.  The
    866       proposed scheme however works well in practice.
    867     </para>
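
    <para>
      For C code, the same markup might look roughly like the sketch
      below, which uses C11 atomics.  The
      <computeroutput>Obj</computeroutput> type and the
      <computeroutput>obj_new</computeroutput>,
      <computeroutput>obj_retain</computeroutput> and
      <computeroutput>obj_release</computeroutput> functions are
      hypothetical.  The include path
      <computeroutput>valgrind/helgrind.h</computeroutput> is an
      assumption about where the header is installed, and the fallback
      no-op definitions exist only so that the sketch still compiles
      when the header is absent.
    </para>

<programlisting><![CDATA[
#include <stdatomic.h>
#include <stdlib.h>

/* Use the real annotations when the Valgrind headers are installed. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
/* Fallback no-ops (assumption: only needed without helgrind.h). */
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#endif
#ifndef ANNOTATE_HAPPENS_AFTER
#  define ANNOTATE_HAPPENS_AFTER(obj) ((void)(obj))
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE_FORGET_ALL
#  define ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(obj) ((void)(obj))
#endif

/* Hypothetical reference-counted object. */
typedef struct {
   atomic_uint refcount;
   int         value;
} Obj;

/* Creates an object holding one reference; returns NULL on failure. */
Obj *obj_new(int value)
{
   Obj *o = malloc(sizeof *o);
   if (o == NULL)
      return NULL;
   atomic_init(&o->refcount, 1);
   o->value = value;
   return o;
}

/* Takes an additional reference.  No markup is needed here. */
void obj_retain(Obj *o)
{
   atomic_fetch_add(&o->refcount, 1);
}

/* Drops one reference.  Returns 1 if this call freed the object. */
int obj_release(Obj *o)
{
   /* atomic_fetch_sub returns the old value, so subtract 1 again. */
   unsigned int new_count = atomic_fetch_sub(&o->refcount, 1) - 1;
   if (new_count == 0) {
      ANNOTATE_HAPPENS_AFTER(&o->refcount);
      ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&o->refcount);
      free(o);
      return 1;
   } else {
      ANNOTATE_HAPPENS_BEFORE(&o->refcount);
      return 0;
   }
}
]]></programlisting>

    <para>
      As in the C++ version, only the release path is annotated; the
      retain path is left unmarked.
    </para>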
    868 
    869   </listitem>
    870 
    871   <listitem>
    <para>Avoid memory recycling.  If you can't avoid it, you must
    tell Helgrind what is going on via the
    874     <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
    875     <computeroutput>helgrind.h</computeroutput>).</para>
    876 
    877     <para>Helgrind is aware of standard heap memory allocation and
    878     deallocation that occurs via
    879     <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
    880     and from entry and exit of stack frames.  In particular, when memory is
    881     deallocated via <function>free</function>, <function>delete</function>,
    882     or function exit, Helgrind considers that memory clean, so when it is
    883     eventually reallocated, its history is irrelevant.</para>
    884 
    885     <para>However, it is common practice to implement memory recycling
    886     schemes.  In these, memory to be freed is not handed to
    887     <function>free</function>/<function>delete</function>, but instead put
    888     into a pool of free buffers to be handed out again as required.  The
    889     problem is that Helgrind has no
    890     way to know that such memory is logically no longer in use, and
    891     its history is irrelevant.  Hence you must make that explicit,
    892     using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
    893     to specify the relevant address ranges.  It's easiest to put these
    894     requests into the pool manager code, and use them either when memory is
    895     returned to the pool, or is allocated from it.</para>
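
    <para>A minimal sketch of such a pool manager is shown below.  It
    issues the client request when a buffer is returned to the pool.
    The pool itself (<computeroutput>Buf</computeroutput>,
    <computeroutput>pool_get</computeroutput>,
    <computeroutput>pool_put</computeroutput>) is hypothetical, the
    include path <computeroutput>valgrind/helgrind.h</computeroutput> is
    an assumption about the installed header location, and the fallback
    no-op macro is there only so the sketch builds without it.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
/* Fallback no-op (assumption: only needed without helgrind.h). */
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(addr, len) ((void)0)
#endif

#define BUF_SIZE 256

/* Hypothetical pool entry: a free-list link plus the user payload. */
typedef struct Buf {
   struct Buf   *next;
   unsigned char data[BUF_SIZE];
} Buf;

static pthread_mutex_t pool_mx   = PTHREAD_MUTEX_INITIALIZER;
static Buf            *free_list = NULL;

/* Hands out a BUF_SIZE-byte buffer, recycling a pooled one if any. */
void *pool_get(void)
{
   Buf *b;
   pthread_mutex_lock(&pool_mx);
   b = free_list;
   if (b != NULL)
      free_list = b->next;
   pthread_mutex_unlock(&pool_mx);
   if (b == NULL)
      b = malloc(sizeof *b);
   return b != NULL ? b->data : NULL;
}

/* Returns a buffer obtained from pool_get() to the pool. */
void pool_put(void *p)
{
   Buf *b = (Buf *)((unsigned char *)p - offsetof(Buf, data));
   /* Tell Helgrind that the payload's access history is now irrelevant. */
   VALGRIND_HG_CLEAN_MEMORY(b->data, sizeof b->data);
   pthread_mutex_lock(&pool_mx);
   b->next = free_list;
   free_list = b;
   pthread_mutex_unlock(&pool_mx);
}
]]></programlisting>

    <para>The request could equally be issued in
    <computeroutput>pool_get</computeroutput> just before the buffer is
    handed out; the sketch picks the return path arbitrarily.</para>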
    896   </listitem>
    897 
    898   <listitem>
    899     <para>Avoid POSIX condition variables.  If you can, use POSIX
    900     semaphores (<function>sem_t</function>, <function>sem_post</function>,
    901     <function>sem_wait</function>) to do inter-thread event signalling.
    902     Semaphores with an initial value of zero are particularly useful for
    903     this.</para>
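
    <para>As a rough illustration of event signalling with a semaphore
    whose initial value is zero, the sketch below has one thread
    publish a value and post the semaphore, while the other waits on
    it before reading.  The names
    <computeroutput>ready</computeroutput>,
    <computeroutput>payload</computeroutput> and
    <computeroutput>run_semaphore_demo</computeroutput> are
    hypothetical.</para>

<programlisting><![CDATA[
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t ready;        /* initial value 0: nothing signalled yet */
static int   payload = 0;  /* data handed from producer to consumer  */

static void *producer(void *arg)
{
   (void)arg;
   payload = 42;           /* write the data first ...               */
   sem_post(&ready);       /* ... then signal that it is available   */
   return NULL;
}

/* Returns the value seen by the consumer, or -1 on setup failure. */
int run_semaphore_demo(void)
{
   pthread_t t;
   int seen;

   payload = 0;
   if (sem_init(&ready, 0, 0) != 0)
      return -1;
   if (pthread_create(&t, NULL, producer, NULL) != 0) {
      sem_destroy(&ready);
      return -1;
   }

   /* Wait for the event, retrying if interrupted by a signal. */
   while (sem_wait(&ready) == -1 && errno == EINTR)
      continue;
   seen = payload;         /* ordered after the producer's write     */

   pthread_join(t, NULL);
   sem_destroy(&ready);
   return seen;
}
]]></programlisting>

    <para>Because the post is counted, it does not matter whether the
    producer or the consumer reaches the rendezvous first.</para>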
    904 
    905     <para>Helgrind only partially correctly handles POSIX condition
    906     variables.  This is because Helgrind can see inter-thread
    907     dependencies between a <function>pthread_cond_wait</function> call and a
    908     <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
    909     call only if the waiting thread actually gets to the rendezvous first
    910     (so that it actually calls
    911     <function>pthread_cond_wait</function>).  It can't see dependencies
    912     between the threads if the signaller arrives first.  In the latter case,
    913     POSIX guidelines imply that the associated boolean condition still
    914     provides an inter-thread synchronisation event, but one which is
    915     invisible to Helgrind.</para>
    916 
    917     <para>The result of Helgrind missing some inter-thread
    918     synchronisation events is to cause it to report false positives.
    919     </para>
    920 
    <para>The root cause of this loss of synchronisation information is
    particularly hard to understand, so an example is helpful.  It was
    923     discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
    924     in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
    925     canonical POSIX-recommended usage scheme for condition variables
    926     is as follows:</para>
    927 
    928 <programlisting><![CDATA[
    929 b   is a Boolean condition, which is False most of the time
    930 cv  is a condition variable
    931 mx  is its associated mutex
    932 
    933 Signaller:                             Waiter:
    934 
    935 lock(mx)                               lock(mx)
    936 b = True                               while (b == False)
    937 signal(cv)                                wait(cv,mx)
    938 unlock(mx)                             unlock(mx)
    939 ]]></programlisting>
    940 
    941     <para>Assume <computeroutput>b</computeroutput> is False most of
    942     the time.  If the waiter arrives at the rendezvous first, it
    943     enters its while-loop, waits for the signaller to signal, and
    944     eventually proceeds.  Helgrind sees the signal, notes the
    945     dependency, and all is well.</para>
    946 
    947     <para>If the signaller arrives
    948     first, <computeroutput>b</computeroutput> is set to true, and the
    949     signal disappears into nowhere.  When the waiter later arrives, it
    950     does not enter its while-loop and simply carries on.  But even in
    951     this case, the waiter code following the while-loop cannot execute
    952     until the signaller sets <computeroutput>b</computeroutput> to
    953     True.  Hence there is still the same inter-thread dependency, but
    954     this time it is through an arbitrary in-memory condition, and
    955     Helgrind cannot see it.</para>
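
    <para>For reference, the scheme above might be written with real
    pthreads calls roughly as follows.  This is only a sketch:
    <computeroutput>shared_data</computeroutput> and
    <computeroutput>run_condvar_demo</computeroutput> are hypothetical
    names, and the comments mark the path on which the waiter never
    calls <function>pthread_cond_wait</function>.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static bool b = false;        /* the Boolean condition             */
static int  shared_data = 0;  /* hypothetical data published via b */

static void *signaller(void *arg)
{
   (void)arg;
   shared_data = 123;
   pthread_mutex_lock(&mx);
   b = true;
   pthread_cond_signal(&cv);
   pthread_mutex_unlock(&mx);
   return NULL;
}

/* Plays the waiter role.  Returns the value of shared_data seen
   after the rendezvous, or -1 if the signaller could not be started. */
int run_condvar_demo(void)
{
   pthread_t t;
   int seen;

   b = false;
   shared_data = 0;
   if (pthread_create(&t, NULL, signaller, NULL) != 0)
      return -1;

   pthread_mutex_lock(&mx);
   /* If the signaller got here first, b is already true and the loop
      body never runs, so pthread_cond_wait is never called. */
   while (!b)
      pthread_cond_wait(&cv, &mx);
   pthread_mutex_unlock(&mx);

   seen = shared_data;
   pthread_join(t, NULL);
   return seen;
}
]]></programlisting>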
    956 
    957     <para>By comparison, Helgrind's detection of inter-thread
    958     dependencies caused by semaphore operations is believed to be
    959     exactly correct.</para>
    960 
    961     <para>As far as I know, a solution to this problem that does not
    962     require source-level annotation of condition-variable wait loops
    963     is beyond the current state of the art.</para>
    964   </listitem>
    965 
    966   <listitem>
    967     <para>Make sure you are using a supported Linux distribution.  At
    968     present, Helgrind only properly supports glibc-2.3 or later.  This
    969     in turn means we only support glibc's NPTL threading
    970     implementation.  The old LinuxThreads implementation is not
    971     supported.</para>
    972   </listitem>
    973 
    974   <listitem>
    <para>If your application uses thread-local variables,
    Helgrind might report false positive race conditions on these
    variables, even though they are very probably race-free.  On Linux, you can
    978     use <option>--sim-hints=deactivate-pthread-stack-cache-via-hack</option>
    979     to avoid such false positive error messages
    980     (see <xref linkend="opt.sim-hints"/>).
    981     </para>
    982   </listitem>
    983 
    984   <listitem>
    985     <para>Round up all finished threads using
    986     <function>pthread_join</function>.  Avoid
    987     detaching threads: don't create threads in the detached state, and
    988     don't call <function>pthread_detach</function> on existing threads.</para>
    989 
    990     <para>Using <function>pthread_join</function> to round up finished
    991     threads provides a clear synchronisation point that both Helgrind and
    992     programmers can see.  If you don't call
    993     <function>pthread_join</function> on a thread, Helgrind has no way to
    994     know when it finishes, relative to any
    995     significant synchronisation points for other threads in the program.  So
    996     it assumes that the thread lingers indefinitely and can potentially
    997     interfere indefinitely with the memory state of the program.  It
    998     has every right to assume that -- after all, it might really be
    999     the case that, for scheduling reasons, the exiting thread did run
   1000     very slowly in the last stages of its life.</para>
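
    <para>A minimal sketch of this pattern follows: every thread is
    created joinable, and the results are read only after all threads
    have been joined.  The names
    <computeroutput>results</computeroutput>,
    <computeroutput>worker</computeroutput> and
    <computeroutput>run_join_demo</computeroutput> are
    hypothetical.</para>

<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4

static int results[NUM_WORKERS];   /* one slot per worker thread */

/* Each worker writes its 1-based index into its own slot. */
static void *worker(void *arg)
{
   int *slot = arg;
   *slot = (int)(slot - results) + 1;
   return NULL;
}

/* Starts the workers, joins them all, then sums the results.
   Returns the sum, or -1 if a thread could not be created. */
int run_join_demo(void)
{
   pthread_t tids[NUM_WORKERS];
   int i, started = 0, sum = 0;

   for (i = 0; i < NUM_WORKERS; i++) {
      results[i] = 0;
      /* Default attributes: the thread is joinable, not detached. */
      if (pthread_create(&tids[i], NULL, worker, &results[i]) != 0)
         break;
      started++;
   }

   /* Joining is the synchronisation point: each worker's write is
      complete before its slot is read below. */
   for (i = 0; i < started; i++)
      pthread_join(tids[i], NULL);

   if (started != NUM_WORKERS)
      return -1;
   for (i = 0; i < NUM_WORKERS; i++)
      sum += results[i];
   return sum;
}
]]></programlisting>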
   1001   </listitem>
   1002 
   1003   <listitem>
   1004     <para>Perform thread debugging (with Helgrind) and memory
   1005     debugging (with Memcheck) together.</para>
   1006 
   1007     <para>Helgrind tracks the state of memory in detail, and memory
   1008     management bugs in the application are liable to cause confusion.
   1009     In extreme cases, applications which do many invalid reads and
   1010     writes (particularly to freed memory) have been known to crash
   1011     Helgrind.  So, ideally, you should make your application
   1012     Memcheck-clean before using Helgrind.</para>
   1013 
   1014     <para>It may be impossible to make your application Memcheck-clean
   1015     unless you first remove threading bugs.  In particular, it may be
   1016     difficult to remove all reads and writes to freed memory in
   1017     multithreaded C++ destructor sequences at program termination.
   1018     So, ideally, you should make your application Helgrind-clean
   1019     before using Memcheck.</para>
   1020 
   1021     <para>Since this circularity is obviously unresolvable, at least
   1022     bear in mind that Memcheck and Helgrind are to some extent
   1023     complementary, and you may need to use them together.</para>
   1024   </listitem>
   1025 
   1026   <listitem>
   1027     <para>POSIX requires that implementations of standard I/O
   1028     (<function>printf</function>, <function>fprintf</function>,
   1029     <function>fwrite</function>, <function>fread</function>, etc) are thread
   1030     safe.  Unfortunately GNU libc implements this by using internal locking
   1031     primitives that Helgrind is unable to intercept.  Consequently Helgrind
   1032     generates many false race reports when you use these functions.</para>
   1033 
   1034     <para>Helgrind attempts to hide these errors using the standard
   1035     Valgrind error-suppression mechanism.  So, at least for simple
   1036     test cases, you don't see any.  Nevertheless, some may slip
   1037     through.  Just something to be aware of.</para>
   1038   </listitem>
   1039 
   1040   <listitem>
   1041     <para>Helgrind's error checks do not work properly inside the
   1042     system threading library itself
   1043     (<computeroutput>libpthread.so</computeroutput>), and it usually
   1044     observes large numbers of (false) errors in there.  Valgrind's
   1045     suppression system then filters these out, so you should not see
   1046     them.</para>
   1047 
   1048     <para>If you see any race errors reported
   1049     where <computeroutput>libpthread.so</computeroutput> or
   1050     <computeroutput>ld.so</computeroutput> is the object associated
   1051     with the innermost stack frame, please file a bug report at
   1052     <ulink url="&vg-url;">&vg-url;</ulink>.
   1053     </para>
   1054   </listitem>
   1055 
   1056 </orderedlist>
   1057 
   1058 </sect1>
   1059 
   1060 
   1061 
   1062 
   1063 <sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
   1064 <title>Helgrind Command-line Options</title>
   1065 
   1066 <para>The following end-user options are available:</para>
   1067 
   1068 <!-- start of xi:include in the manpage -->
   1069 <variablelist id="hg.opts.list">
   1070 
   1071   <varlistentry id="opt.free-is-write"
   1072                 xreflabel="--free-is-write">
   1073     <term>
   1074       <option><![CDATA[--free-is-write=no|yes
   1075       [default: no] ]]></option>
   1076     </term>
   1077     <listitem>
   1078       <para>When enabled (not the default), Helgrind treats freeing of
   1079         heap memory as if the memory was written immediately before
   1080         the free.  This exposes races where memory is referenced by
   1081         one thread, and freed by another, but there is no observable
   1082         synchronisation event to ensure that the reference happens
   1083         before the free.
   1084       </para>
   1085       <para>This functionality is new in Valgrind 3.7.0, and is
   1086         regarded as experimental.  It is not enabled by default
   1087         because its interaction with custom memory allocators is not
   1088         well understood at present.  User feedback is welcomed.
   1089       </para>
   1090     </listitem>
   1091   </varlistentry>
   1092 
   1093   <varlistentry id="opt.track-lockorders"
   1094                 xreflabel="--track-lockorders">
   1095     <term>
   1096       <option><![CDATA[--track-lockorders=no|yes
   1097       [default: yes] ]]></option>
   1098     </term>
   1099     <listitem>
   1100       <para>When enabled (the default), Helgrind performs lock order
   1101       consistency checking.  For some buggy programs, the large number
   1102       of lock order errors reported can become annoying, particularly
   1103       if you're only interested in race errors.  You may therefore find
   1104       it helpful to disable lock order checking.</para>
   1105     </listitem>
   1106   </varlistentry>
   1107 
   1108   <varlistentry id="opt.history-level"
   1109                 xreflabel="--history-level">
   1110     <term>
   1111       <option><![CDATA[--history-level=none|approx|full
   1112       [default: full] ]]></option>
   1113     </term>
   1114     <listitem>
      <para><option>--history-level=full</option> (the default) causes
        Helgrind to collect enough information about "old" accesses that
        it can produce two stack traces in a race report -- both the
        stack trace for the current access, and the trace for the
        older, conflicting access. To limit memory usage, stack traces
        for "old" accesses are limited to a maximum of 8 entries, even if
        the <option>--num-callers</option> value is larger.</para>
   1122       <para>Collecting such information is expensive in both speed and
   1123         memory, particularly for programs that do many inter-thread
   1124         synchronisation events (locks, unlocks, etc).  Without such
   1125         information, it is more difficult to track down the root
   1126         causes of races.  Nonetheless, you may not need it in
   1127         situations where you just want to check for the presence or
   1128         absence of races, for example, when doing regression testing
   1129         of a previously race-free program.</para>
   1130       <para><option>--history-level=none</option> is the opposite
   1131         extreme.  It causes Helgrind not to collect any information
   1132         about previous accesses.  This can be dramatically faster
   1133         than <option>--history-level=full</option>.</para>
   1134       <para><option>--history-level=approx</option> provides a
   1135         compromise between these two extremes.  It causes Helgrind to
   1136         show a full trace for the later access, and approximate
   1137         information regarding the earlier access.  This approximate
   1138         information consists of two stacks, and the earlier access is
   1139         guaranteed to have occurred somewhere between program points
   1140         denoted by the two stacks. This is not as useful as showing
   1141         the exact stack for the previous access
   1142         (as <option>--history-level=full</option> does), but it is
   1143         better than nothing, and it is almost as fast as
   1144         <option>--history-level=none</option>.</para>
   1145     </listitem>
   1146   </varlistentry>
   1147 
   1148   <varlistentry id="opt.conflict-cache-size"
   1149                 xreflabel="--conflict-cache-size">
   1150     <term>
   1151       <option><![CDATA[--conflict-cache-size=N
   1152       [default: 1000000] ]]></option>
   1153     </term>
   1154     <listitem>
   1155       <para>This flag only has any effect
   1156         at <option>--history-level=full</option>.</para>
   1157       <para>Information about "old" conflicting accesses is stored in
   1158         a cache of limited size, with LRU-style management.  This is
   1159         necessary because it isn't practical to store a stack trace
   1160         for every single memory access made by the program.
   1161         Historical information on not recently accessed locations is
   1162         periodically discarded, to free up space in the cache.</para>
   1163       <para>This option controls the size of the cache, in terms of the
   1164         number of different memory addresses for which
   1165         conflicting access information is stored.  If you find that
   1166         Helgrind is showing race errors with only one stack instead of
   1167         the expected two stacks, try increasing this value.</para>
   1168       <para>The minimum value is 10,000 and the maximum is 30,000,000
   1169         (thirty times the default value).  Increasing the value by 1
   1170         increases Helgrind's memory requirement by very roughly 100
   1171         bytes, so the maximum value will easily eat up three extra
   1172         gigabytes or so of memory.</para>
   1173     </listitem>
   1174   </varlistentry>
   1175 
   1176   <varlistentry id="opt.check-stack-refs"
   1177                 xreflabel="--check-stack-refs">
   1178     <term>
   1179       <option><![CDATA[--check-stack-refs=no|yes
   1180       [default: yes] ]]></option>
   1181     </term>
   1182     <listitem>
   1183       <para>
   1184         By default Helgrind checks all data memory accesses made by your
   1185         program.  This flag enables you to skip checking for accesses
   1186         to thread stacks (local variables).  This can improve
   1187         performance, but comes at the cost of missing races on
   1188         stack-allocated data.
   1189       </para>
   1190     </listitem>
   1191   </varlistentry>
   1192 
   1193   <varlistentry id="opt.ignore-thread-creation"
   1194                 xreflabel="--ignore-thread-creation">
   1195     <term>
   1196       <option><![CDATA[--ignore-thread-creation=<yes|no>
   1197       [default: no]]]></option>
   1198     </term>
   1199     <listitem>
   1200       <para>
   1201         Controls whether all activities during thread creation should be
   1202         ignored. By default enabled only on Solaris.
   1203         Solaris provides higher throughput, parallelism and scalability than
   1204         other operating systems, at the cost of more fine-grained locking
   1205         activity. This means for example that when a thread is created under
   1206         glibc, just one big lock is used for all thread setup. Solaris libc
   1207         uses several fine-grained locks and the creator thread resumes its
   1208         activities as soon as possible, leaving for example stack and TLS setup
   1209         sequence to the created thread.
   1210         This situation confuses Helgrind as it assumes there is some false
   1211         ordering in place between creator and created thread; and therefore many
   1212         types of race conditions in the application would not be reported.
   1213         To prevent such false ordering, this command line option is set to
   1214         <computeroutput>yes</computeroutput> by default on Solaris.
   1215         All activity (loads, stores, client requests) is therefore ignored
   1216         during:</para>
   1217       <itemizedlist>
   1218         <listitem>
   1219           <para>
   1220             pthread_create() call in the creator thread
   1221           </para>
   1222         </listitem>
   1223         <listitem>
   1224           <para>
   1225             thread creation phase (stack and TLS setup) in the created thread
   1226           </para>
   1227         </listitem>
   1228       </itemizedlist>
   1229       <para>
         Also, new memory allocated during thread creation is untracked;
         that is, race reporting is suppressed there. DRD does the same thing
   1232          implicitly. This is necessary because Solaris libc caches many objects
   1233          and reuses them for different threads and that confuses
   1234          Helgrind.</para>
   1235     </listitem>
   1236   </varlistentry>
   1237 
   1238 
   1239 </variablelist>
   1240 <!-- end of xi:include in the manpage -->
   1241 
   1242 <!-- start of xi:include in the manpage -->
   1243 <!--  commented out, because we don't document debugging options in the
   1244       manual.  Nb: all the double-dashes below had a space inserted in them
   1245       to avoid problems with premature closing of this comment.
   1246 <para>In addition, the following debugging options are available for
   1247 Helgrind:</para>
   1248 
   1249 <variablelist id="hg.debugopts.list">
   1250 
   1251   <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
   1252     <term>
   1253       <option><![CDATA[- -trace-malloc=no|yes [no]
   1254       ]]></option>
   1255     </term>
   1256     <listitem>
   1257       <para>Show all client <function>malloc</function> (etc) and
   1258       <function>free</function> (etc) requests.</para>
   1259     </listitem>
   1260   </varlistentry>
   1261 
   1262   <varlistentry id="opt.cmp-race-err-addrs" 
   1263                 xreflabel="- -cmp-race-err-addrs">
   1264     <term>
   1265       <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
   1266       ]]></option>
   1267     </term>
   1268     <listitem>
      <para>Controls whether or not race (data) addresses should be
        taken into account when removing duplicates of race errors.
        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
        identical race errors will be considered to be the same even if
        their race addresses differ.  With
        <varname>- -cmp-race-err-addrs=yes</varname> they will be
        considered different.  This is provided to help make certain
        regression tests work reliably.</para>
   1277     </listitem>
   1278   </varlistentry>
   1279 
   1280   <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
   1281     <term>
   1282       <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
   1283       ]]></option>
   1284     </term>
   1285     <listitem>
   1286       <para>Run extensive sanity checks on Helgrind's internal
   1287         data structures at events defined by the bitstring, as
   1288         follows:</para>
   1289       <para><computeroutput>010000 </computeroutput>after changes to
   1290         the lock order acquisition graph</para>
   1291       <para><computeroutput>001000 </computeroutput>after every client
   1292         memory access (NB: not currently used)</para>
   1293       <para><computeroutput>000100 </computeroutput>after every client
   1294         memory range permission setting of 256 bytes or greater</para>
   1295       <para><computeroutput>000010 </computeroutput>after every client
   1296         lock or unlock event</para>
      <para><computeroutput>000001 </computeroutput>after every client
        thread creation or join event</para>
   1299       <para>Note these will make Helgrind run very slowly, often to
   1300         the point of being completely unusable.</para>
   1301     </listitem>
   1302   </varlistentry>
   1303 
   1304 </variablelist>
   1305 -->
   1306 <!-- end of xi:include in the manpage -->
   1307 
   1308 
   1309 </sect1>
   1310 
   1311 
   1312 <sect1 id="hg-manual.monitor-commands" xreflabel="Helgrind Monitor Commands">
   1313 <title>Helgrind Monitor Commands</title>
   1314 <para>The Helgrind tool provides monitor commands handled by Valgrind's
   1315 built-in gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
   1316 </para>
   1317 <itemizedlist>
   1318   <listitem>
    <para><varname>info locks [lock_addr]</varname> shows the list of locks
    and their status. If <varname>lock_addr</varname> is given, only the
    lock located at this address is shown.</para>
    <para>
    In the following example, Helgrind knows about one lock.  This
    lock is located at the guest address <varname>ga
    0x8049a20</varname>.  The lock kind is <varname>rdwr</varname>,
    indicating a reader-writer lock.  Other possible lock kinds
    are <varname>nonRec</varname> (simple mutex, non-recursive)
    and <varname>mbRec</varname> (simple mutex, possibly recursive).
    The lock kind is followed by the list of threads holding the
    lock.  In the example below, <varname>R1:thread #6 tid 3</varname>
    indicates that Helgrind thread #6 has acquired the lock in read
    mode once (the counter following the letter R is one). The
    Helgrind thread number is incremented for each started thread.  The
    presence of 'tid 3' indicates that thread #6 has not exited
    yet and is Valgrind tid 3. If a thread has terminated, then
    this is indicated with 'tid (exited)'.
   1337     </para>
   1338 <programlisting><![CDATA[
   1339 (gdb) monitor info locks
   1340 Lock ga 0x8049a20 {
   1341    kind   rdwr
   1342  { R1:thread #6 tid 3 }
   1343 }
   1344 (gdb) 
   1345 ]]></programlisting>
   1346 
   1347     <para> If you give the option <varname>--read-var-info=yes</varname>,
   1348     then more information will be provided about the lock location, such as
   1349     the global variable or the heap block that contains the lock:
   1350     </para>
   1351 <programlisting><![CDATA[
   1352 Lock ga 0x8049a20 {
   1353  Location 0x8049a20 is 0 bytes inside global var "s_rwlock"
   1354  declared at rwlock_race.c:17
   1355    kind   rdwr
   1356  { R1:thread #3 tid 3 }
   1357 }
   1358 ]]></programlisting>
   1359 
   1360   </listitem>
   1361 
   1362   <listitem>
    <para><varname>accesshistory &lt;addr&gt; [&lt;len&gt;]</varname>
    shows the access history recorded for &lt;len&gt; (default 1) bytes
    starting at &lt;addr&gt;. For each recorded access that overlaps
    with the given range, <varname>accesshistory</varname> shows the operation
    type (read or write), the address and size read or written, the Helgrind
    thread number and Valgrind tid of the thread that did the operation, and
    the locks held by the thread at the time of the operation.
    The oldest access is shown first and the most recent access last.
   1371     </para>
   1372     <para>
    In the following example, the first recorded access is a write of 4 bytes
    by thread #7 that modified the given 2-byte range.
    The second recorded write is the most recent one: thread #9
    modified the same 2 bytes as part of a 4-byte write operation.
    The locks held by each thread at the time of the write operation
    are also shown.
   1379     </para>
   1380 <programlisting><![CDATA[
   1381 (gdb) monitor accesshistory 0x8049D8A 2
   1382 write of size 4 at 0x8049D88 by thread #7 tid 3
   1383 ==6319== Locks held: 2, at address 0x8049D8C (and 1 that can't be shown)
   1384 ==6319==    at 0x804865F: child_fn1 (locked_vs_unlocked2.c:29)
   1385 ==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
   1386 ==6319==    by 0x39B924: start_thread (pthread_create.c:297)
   1387 ==6319==    by 0x2F107D: clone (clone.S:130)
   1388 
   1389 write of size 4 at 0x8049D88 by thread #9 tid 2
   1390 ==6319== Locks held: 2, at addresses 0x8049DA4 0x8049DD4
   1391 ==6319==    at 0x804877B: child_fn2 (locked_vs_unlocked2.c:45)
   1392 ==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
   1393 ==6319==    by 0x39B924: start_thread (pthread_create.c:297)
   1394 ==6319==    by 0x2F107D: clone (clone.S:130)
   1395 
   1396 ]]></programlisting>
   1397 
   1398   </listitem>
   1399 
   1400 </itemizedlist>
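<para>As an illustrative sketch (not taken from a real session), the
monitor commands above might be reached as follows.  The program name
<filename>./myprog</filename> is hypothetical, and the address is the one
used in the <varname>accesshistory</varname> example above.
<option>--vgdb-error=0</option> asks Valgrind to wait for GDB before
running the program, and <computeroutput>target remote | vgdb</computeroutput>
attaches GDB to the built-in gdbserver.  Output is omitted because it
depends on the program being examined.</para>
<programlisting><![CDATA[
# Terminal 1: start the (hypothetical) program under Helgrind and
# wait for a debugger to attach before executing any client code.
valgrind --tool=helgrind --vgdb=yes --vgdb-error=0 ./myprog

# Terminal 2: attach GDB through vgdb, let the program run for a
# while, interrupt it, and then issue the Helgrind monitor commands.
gdb ./myprog
(gdb) target remote | vgdb
(gdb) continue
(gdb) monitor info locks
(gdb) monitor accesshistory 0x8049D8A 2
]]></programlisting>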
   1401 
   1402 </sect1>
   1403 
   1404 <sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
   1405 <title>Helgrind Client Requests</title>
   1406 
   1407 <para>The following client requests are defined in
   1408 <filename>helgrind.h</filename>.  See that file for exact details of their
   1409 arguments.</para>
   1410 
   1411 <itemizedlist>
   1412 
   1413   <listitem>
   1414     <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
   1415     <para>This makes Helgrind forget everything it knows about a
   1416     specified memory range.  This is particularly useful for memory
   1417     allocators that wish to recycle memory.</para>
   1418   </listitem>
   1419   <listitem>
   1420     <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
   1421   </listitem>
   1422   <listitem>
   1423     <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
   1424   </listitem>
   1425   <listitem>
   1426     <para><function>ANNOTATE_NEW_MEMORY</function></para>
   1427   </listitem>
   1428   <listitem>
   1429     <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
   1430   </listitem>
   1431   <listitem>
   1432     <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
   1433   </listitem>
   1434   <listitem>
   1435     <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
   1436   </listitem>
   1437   <listitem>
   1438     <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>The <function>ANNOTATE_</function> requests listed above
    describe to Helgrind the behaviour of
    custom (non-POSIX) synchronisation primitives, which it otherwise
    has no way to understand.  See the comments
    in <filename>helgrind.h</filename> for further
    documentation.</para>
   1444   </listitem>
   1445 
   1446 </itemizedlist>
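<para>The following is a minimal sketch of how a recycling allocator
might use <function>VALGRIND_HG_CLEAN_MEMORY</function>.  It assumes the
header is installed as
<filename>&lt;valgrind/helgrind.h&gt;</filename>; the pool itself
(<function>pool_alloc</function>, <function>pool_release</function>,
<varname>BLOCK_SIZE</varname>, <varname>BLOCK_COUNT</varname>) is
hypothetical and exists only for illustration.  The fallback macro is
also an assumption of this sketch: it lets the code compile where the
Valgrind headers are not available.</para>
<programlisting><![CDATA[
#include <pthread.h>
#include <stddef.h>

/* Use the real client request when the Valgrind headers are present;
   otherwise fall back to a no-op so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef VALGRIND_HG_CLEAN_MEMORY
#  define VALGRIND_HG_CLEAN_MEMORY(start, len) ((void)(start), (void)(len))
#endif

#define BLOCK_SIZE  64   /* hypothetical fixed block size */
#define BLOCK_COUNT 4    /* hypothetical pool capacity */

typedef union block {
    union block  *next;               /* link while on the free list */
    unsigned char bytes[BLOCK_SIZE];  /* payload while allocated */
} block_t;

static block_t storage[BLOCK_COUNT];
static block_t *free_list;
static int initialised;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out a block, or NULL if the pool is exhausted. */
void *pool_alloc(void)
{
    block_t *b;

    pthread_mutex_lock(&pool_lock);
    if (!initialised) {
        for (size_t i = 0; i < BLOCK_COUNT; i++) {
            storage[i].next = free_list;
            free_list = &storage[i];
        }
        initialised = 1;
    }
    b = free_list;
    if (b != NULL)
        free_list = b->next;
    pthread_mutex_unlock(&pool_lock);

    /* The block may carry access history from a previous owner.
       Ask Helgrind to forget it, so the block is treated like
       freshly allocated memory. */
    if (b != NULL)
        VALGRIND_HG_CLEAN_MEMORY(b, sizeof *b);
    return b;
}

/* Return a block to the pool for later reuse. */
void pool_release(void *p)
{
    block_t *b = p;

    pthread_mutex_lock(&pool_lock);
    b->next = free_list;
    free_list = b;
    pthread_mutex_unlock(&pool_lock);
}
]]></programlisting>
<para>Issuing the request at allocation time, rather than at release
time, is a design choice of this sketch: the thread that is about to
use the recycled block is the one that tells Helgrind to discard the
old history.</para>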
   1447 
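<para>As a further hedged sketch, the happens-before annotations might
be used to describe a hand-rolled "data is ready" flag.  All names here
(<varname>payload</varname>, <varname>ready</varname>,
<function>producer</function>, <function>consumer</function>,
<function>run_handoff</function>) are hypothetical, and the no-op
fallback macros are an assumption that lets the code compile without
the Valgrind headers.  The annotations only describe the ordering of
the accesses to <varname>payload</varname>; accesses to the flag itself
are not addressed by this sketch and may still be reported.</para>
<programlisting><![CDATA[
#include <pthread.h>
#include <stdatomic.h>

/* Use the real annotations when the Valgrind headers are present;
   otherwise fall back to no-ops so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

static int payload;       /* plain data handed from producer to consumer */
static atomic_int ready;  /* custom, non-POSIX synchronisation flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Everything this thread did so far is declared to happen before
       a later ANNOTATE_HAPPENS_AFTER on the same address. */
    ANNOTATE_HAPPENS_BEFORE(&ready);
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static int consumer(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the producer publishes the data */
    /* Pairs with the ANNOTATE_HAPPENS_BEFORE in producer(). */
    ANNOTATE_HAPPENS_AFTER(&ready);
    return payload;
}

/* Run one producer/consumer hand-off; returns the value received,
   or -1 if the producer thread could not be created. */
int run_handoff(void)
{
    pthread_t t;
    int v;

    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    v = consumer();
    pthread_join(t, NULL);
    return v;
}
]]></programlisting>
<para>Both annotations take the same address, here that of the flag.
The address serves only as a tag linking the two sides of the
hand-off.</para>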
   1448 </sect1>
   1449 
   1450 
   1451 
   1452 <sect1 id="hg-manual.todolist" xreflabel="To Do List">
   1453 <title>A To-Do List for Helgrind</title>
   1454 
   1455 <para>The following is a list of loose ends which should be tidied up
   1456 some time.</para>
   1457 
   1458 <itemizedlist>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for cycles of size 2, as at
    present.</para>
   1462   </listitem>
  <listitem><para>The conflicting access mechanism sometimes
    mysteriously fails to show the stack of the conflicting access, even
    when provided with unbounded storage for conflicting access info.
    This should be investigated.</para>
   1467   </listitem>
   1468   <listitem><para>Document races caused by GCC's thread-unsafe code
   1469     generation for speculative stores.  In the interim see
   1470     <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
   1471     </computeroutput>
   1472     and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
   1473     </para>
   1474   </listitem>
   1475   <listitem><para>Don't update the lock-order graph, and don't check
   1476     for errors, when a "try"-style lock operation happens (e.g.
   1477     <function>pthread_mutex_trylock</function>).  Such calls do not add any real
   1478     restrictions to the locking order, since they can always fail to
   1479     acquire the lock, resulting in the caller going off and doing Plan
   1480     B (presumably it will have a Plan B).  Doing such checks could
   1481     generate false lock-order errors and confuse users.</para>
   1482   </listitem>
   1483   <listitem><para> Performance can be very poor.  Slowdowns on the
   1484     order of 100:1 are not unusual.  There is limited scope for
   1485     performance improvements.
   1486     </para>
   1487   </listitem>
   1488 
   1489 </itemizedlist>
   1490 
   1491 </sect1>
   1492 
   1493 </chapter>
   1494