<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>


<chapter id="hg-manual" xreflabel="Helgrind: thread error detector">
  <title>Helgrind: a thread error detector</title>

<para>To use this tool, you must specify
<option>--tool=helgrind</option> on the Valgrind
command line.</para>


<sect1 id="hg-manual.overview" xreflabel="Overview">
<title>Overview</title>

<para>Helgrind is a Valgrind tool for detecting synchronisation errors
in C, C++ and Fortran programs that use the POSIX pthreads
threading primitives.</para>

<para>The main abstractions in POSIX pthreads are: a set of threads
sharing a common address space, thread creation, thread joining,
thread exit, mutexes (locks), condition variables (inter-thread event
notifications), reader-writer locks, spinlocks, semaphores and
barriers.</para>

<para>Helgrind can detect three classes of errors, which are discussed
in detail in the next three sections:</para>

<orderedlist>
 <listitem>
  <para><link linkend="hg-manual.api-checks">
        Misuses of the POSIX pthreads API.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.lock-orders">
        Potential deadlocks arising from lock
        ordering problems.</link></para>
 </listitem>
 <listitem>
  <para><link linkend="hg-manual.data-races">
        Data races -- accessing memory without adequate locking
        or synchronisation</link>.
  </para>
 </listitem>
</orderedlist>

<para>Problems like these often result in unreproducible,
timing-dependent crashes, deadlocks and other misbehaviour, and
can be difficult to find by other means.</para>

<para>Helgrind is aware of all the pthread abstractions and tracks
their effects as accurately as it can.
On x86 and amd64 platforms, it
understands and partially handles implicit locking arising from the
use of the LOCK instruction prefix. On PowerPC/POWER and ARM
platforms, it partially handles implicit locking arising from
load-linked and store-conditional instruction pairs.
</para>

<para>Helgrind works best when your application uses only the POSIX
pthreads API. However, if you want to use custom threading
primitives, you can describe their behaviour to Helgrind using the
<varname>ANNOTATE_*</varname> macros defined
in <varname>helgrind.h</varname>.</para>

<para>Helgrind also provides <xref linkend="manual-core.xtree"/> memory
profiling using the command line
option <computeroutput>--xtree-memory</computeroutput> and the monitor command
<computeroutput>xtmemory</computeroutput>.</para>



<para>After the three sections on detected errors, there is a section containing
<link linkend="hg-manual.effective-use">
hints and tips on how to get the best out of Helgrind.</link>
</para>

<para>Then there is a
<link linkend="hg-manual.options">summary of command-line
options.</link>
</para>

<para>Finally, there is
<link linkend="hg-manual.todolist">a brief summary of areas in which Helgrind
could be improved.</link>
</para>

</sect1>




<sect1 id="hg-manual.api-checks" xreflabel="API Checks">
<title>Detected errors: Misuses of the POSIX pthreads API</title>

<para>Helgrind intercepts calls to many POSIX pthreads functions, and
is therefore able to report on various common problems. Although
these are unglamorous errors, their presence can lead to undefined
program behaviour and hard-to-find bugs later on.
The detected errors
are:</para>

<itemizedlist>
 <listitem><para>unlocking an invalid mutex</para></listitem>
 <listitem><para>unlocking a not-locked mutex</para></listitem>
 <listitem><para>unlocking a mutex held by a different
                 thread</para></listitem>
 <listitem><para>destroying an invalid or a locked mutex</para></listitem>
 <listitem><para>recursively locking a non-recursive mutex</para></listitem>
 <listitem><para>deallocation of memory that contains a
                 locked mutex</para></listitem>
 <listitem><para>passing mutex arguments to functions expecting
                 reader-writer lock arguments, and vice
                 versa</para></listitem>
 <listitem><para>when a POSIX pthread function fails with an
                 error code that must be handled</para></listitem>
 <listitem><para>when a thread exits whilst still holding locked
                 locks</para></listitem>
 <listitem><para>calling <function>pthread_cond_wait</function>
                 with a not-locked mutex, an invalid mutex,
                 or one locked by a different
                 thread</para></listitem>
 <listitem><para>inconsistent bindings between condition
                 variables and their associated mutexes</para></listitem>
 <listitem><para>invalid or duplicate initialisation of a pthread
                 barrier</para></listitem>
 <listitem><para>initialisation of a pthread barrier on which threads
                 are still waiting</para></listitem>
 <listitem><para>destruction of a pthread barrier object which was
                 never initialised, or on which threads are still
                 waiting</para></listitem>
 <listitem><para>waiting on an uninitialised pthread
                 barrier</para></listitem>
 <listitem><para>for all of the pthreads functions that Helgrind
                 intercepts, an error is reported, along with a stack
                 trace, if the system threading library routine returns
                 an error code, even if Helgrind itself detected no
                 error</para></listitem>
</itemizedlist>

<para>Checks pertaining to the validity of mutexes are generally also
performed for reader-writer locks.</para>

<para>Various kinds of this-can't-possibly-happen events are also
reported. These usually indicate bugs in the system threading
library.</para>

<para>Reported errors always contain a primary stack trace indicating
where the error was detected. They may also contain auxiliary stack
traces giving additional information. In particular, most errors
relating to mutexes will also tell you where that mutex first came to
Helgrind's attention (the "<computeroutput>was first observed
at</computeroutput>" part), so you have a chance of figuring out which
mutex it is referring to. For example:</para>

<programlisting><![CDATA[
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
]]></programlisting>

<para>Helgrind has a way of summarising thread identities, as
you see here with the text "<computeroutput>Thread
#1</computeroutput>". This is so that it can speak about threads and
sets of threads without overwhelming you with details. See
<link linkend="hg-manual.data-races.errmsgs">below</link>
for more information on interpreting error messages.</para>

</sect1>




<sect1 id="hg-manual.lock-orders" xreflabel="Lock Orders">
<title>Detected errors: Inconsistent Lock Orderings</title>

<para>In this section, and in general, to "acquire" a lock simply
means to lock that lock, and to "release" a lock means to unlock
it.</para>

<para>Helgrind monitors the order in which threads acquire locks.
This allows it to detect potential deadlocks which could arise from
the formation of cycles of locks. Detecting such inconsistencies is
useful because, whilst actual deadlocks are fairly obvious, potential
deadlocks may never be discovered during testing and could later lead
to hard-to-diagnose in-service failures.</para>

<para>The simplest example of such a problem is as
follows.</para>

<itemizedlist>
 <listitem><para>Imagine some shared resource R, which, for whatever
  reason, is guarded by two locks, L1 and L2, which must both be held
  when R is accessed.</para>
 </listitem>
 <listitem><para>Suppose a thread acquires L1, then L2, and proceeds
  to access R. The implication of this is that all threads in the
  program must acquire the two locks in the order first L1 then L2.
  Not doing so risks deadlock.</para>
 </listitem>
 <listitem><para>The deadlock could happen if two threads -- call them
  T1 and T2 -- both want to access R. Suppose T1 acquires L1 first,
  and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries
  to acquire L1, but those locks are both already held. So T1 and T2
  become deadlocked.</para>
 </listitem>
</itemizedlist>

<para>Helgrind builds a directed graph indicating the order in which
locks have been acquired in the past. When a thread acquires a new
lock, the graph is updated, and then checked to see if it now contains
a cycle. The presence of a cycle indicates a potential deadlock involving
the locks in the cycle.</para>

<para>In general, Helgrind will choose two locks involved in the cycle
and show you how their acquisition ordering has become inconsistent.
It does this by showing the program points that first defined the
ordering, and the program points which later violated it.
Here is a
simple example involving just two locks:</para>

<programlisting><![CDATA[
Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated

Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400825: main (tc13_laog1.c:23)

 followed by a later acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400853: main (tc13_laog1.c:24)

Required order was established by acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40076D: main (tc13_laog1.c:17)

 followed by a later acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40079B: main (tc13_laog1.c:18)
]]></programlisting>

<para>When there are more than two locks in the cycle, the error is
equally serious. However, at present Helgrind does not show the locks
involved, sometimes because that information is not available, but
also so as to avoid flooding you with information. For example, a
naive implementation of the famous Dining Philosophers problem
involves a cycle of five locks
(see <computeroutput>helgrind/tests/tc14_laog_dinphils.c</computeroutput>).
In this case Helgrind has detected that all 5 philosophers could
simultaneously pick up their left fork and then deadlock whilst
waiting to pick up their right forks.</para>

<programlisting><![CDATA[
Thread #6: lock order "0x80499A0 before 0x8049A00" violated

Observed (incorrect) order is: acquisition of lock at 0x8049A00
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485B4: dine (tc14_laog_dinphils.c:18)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)

 followed by a later acquisition of lock at 0x80499A0
   at 0x40085BC: pthread_mutex_lock (hg_intercepts.c:495)
   by 0x80485CD: dine (tc14_laog_dinphils.c:19)
   by 0x400BDA4: mythread_wrapper (hg_intercepts.c:219)
   by 0x39B924: start_thread (pthread_create.c:297)
   by 0x2F107D: clone (clone.S:130)
]]></programlisting>

</sect1>




<sect1 id="hg-manual.data-races" xreflabel="Data Races">
<title>Detected errors: Data Races</title>

<para>A data race happens, or could happen, when two threads access a
shared memory location without using suitable locks or other
synchronisation to ensure single-threaded access. Such missing
locking can cause obscure timing-dependent bugs. Ensuring programs
are race-free is one of the central difficulties of threaded
programming.</para>

<para>Reliably detecting races is a difficult problem, and most
of Helgrind's internals are devoted to dealing with it.
We begin with a simple example.</para>


<sect2 id="hg-manual.data-races.example" xreflabel="Simple Race">
<title>A Simple Data Race</title>

<para>About the simplest possible example of a race is as follows. In
this program, it is impossible to know what the value
of <computeroutput>var</computeroutput> is at the end of the program.
Is it 2?
Or 1?</para>

<programlisting><![CDATA[
#include <pthread.h>

int var = 0;

void* child_fn ( void* arg ) {
   var++; /* Unprotected relative to parent */ /* this is line 6 */
   return NULL;
}

int main ( void ) {
   pthread_t child;
   pthread_create(&child, NULL, child_fn, NULL);
   var++; /* Unprotected relative to child */ /* this is line 13 */
   pthread_join(child, NULL);
   return 0;
}
]]></programlisting>

<para>The problem is there is nothing to
stop <varname>var</varname> being updated simultaneously
by both threads. A correct program would
protect <varname>var</varname> with a lock of type
<function>pthread_mutex_t</function>, which is acquired
before each access and released afterwards. Helgrind's output for
this program is:</para>

<programlisting><![CDATA[
Thread #1 is the program's root thread

Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x400605: main (simple_race.c:12)

Possible data race during read of size 4 at 0x601038 by thread #1
Locks held: none
   at 0x400606: main (simple_race.c:13)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x4005DC: child_fn (simple_race.c:6)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601038 is 0 bytes inside global var "var"
declared at simple_race.c:3
]]></programlisting>

<para>This is quite a lot of detail for an apparently simple error.
The main error message is the clause beginning
"<computeroutput>Possible data race during read</computeroutput>".
It says there is a race as
a result of a read of size 4 (bytes), at 0x601038, which is the
address of <computeroutput>var</computeroutput>, happening in
function <computeroutput>main</computeroutput> at line 13 in the
program.</para>

<para>Two important parts of the message are:</para>

<itemizedlist>
 <listitem>
  <para>Helgrind shows two stack traces for the error, not one. By
   definition, a race involves two different threads accessing the
   same location in such a way that the result depends on the relative
   speeds of the two threads.</para>
  <para>
   The first stack trace follows the text "<computeroutput>Possible
   data race during read of size 4 ...</computeroutput>" and the
   second trace follows the text "<computeroutput>This conflicts with
   a previous write of size 4 ...</computeroutput>". Helgrind is
   usually able to show both accesses involved in a race. At least
   one of these will be a write (since two concurrent, unsynchronised
   reads are harmless), and they will of course be from different
   threads.</para>
  <para>By examining your program at the two locations, you should be
   able to get at least some idea of what the root cause of the
   problem is. For each location, Helgrind shows the set of locks
   held at the time of the access. This often makes it clear which
   thread, if any, failed to take a required lock. In this example
   neither thread holds a lock during the access.</para>
 </listitem>
 <listitem>
  <para>For races which occur on global or stack variables, Helgrind
   tries to identify the name and defining point of the variable.
   Hence the text "<computeroutput>Location 0x601038 is 0 bytes inside
   global var "var" declared at simple_race.c:3</computeroutput>".</para>
  <para>Showing names of stack and global variables carries no
   run-time overhead once Helgrind has your program up and running.
   However, it does require Helgrind to spend considerable extra time
   and memory at program startup to read the relevant debug info.
   Hence this facility is disabled by default. To enable it, you need
   to give the <varname>--read-var-info=yes</varname> option to
   Helgrind.</para>
 </listitem>
</itemizedlist>

<para>The following section explains Helgrind's race detection
algorithm in more detail.</para>

</sect2>



<sect2 id="hg-manual.data-races.algorithm" xreflabel="DR Algorithm">
<title>Helgrind's Race Detection Algorithm</title>

<para>Most programmers think about threaded programming in terms of
the basic functionality provided by the threading library (POSIX
Pthreads): thread creation, thread joining, locks, condition
variables, semaphores and barriers.</para>

<para>The effect of using these functions is to impose
constraints upon the order in which memory accesses can
happen. This implied ordering is generally known as the
"happens-before relation". Once you understand the happens-before
relation, it is easy to see how Helgrind finds races in your code.
Fortunately, the happens-before relation is itself easy to understand,
and is by itself a useful tool for reasoning about the behaviour of
parallel programs. We now introduce it using a simple example.</para>

<para>Consider first the following buggy program:</para>

<programlisting><![CDATA[
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;                              var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
]]></programlisting>

<para>The parent thread creates a child.
Both then write different
values to some variable <computeroutput>var</computeroutput>, and the
parent then waits for the child to exit.</para>

<para>What is the value of <computeroutput>var</computeroutput> at the
end of the program, 10 or 20? We don't know. The program is
considered buggy (it has a race) because the final value
of <computeroutput>var</computeroutput> depends on the relative rates
of progress of the parent and child threads. If the parent is fast
and the child is slow, then the child's assignment may happen later,
so the final value will be 10; and vice versa if the child is faster
than the parent.</para>

<para>The relative rates of progress of parent vs child are not something
the programmer can control, and will often change from run to run.
They depend on factors such as the load on the machine, what else is
running, the kernel's scheduling strategy, and many other factors.</para>

<para>The obvious fix is to use a lock to
protect <computeroutput>var</computeroutput>. It is however
instructive to consider a somewhat more abstract solution, which is to
send a message from one thread to the other:</para>

<programlisting><![CDATA[
Parent thread:                         Child thread:

int var;

// create child thread
pthread_create(...)
var = 20;
// send message to child
                                       // wait for message to arrive
                                       var = 10;
                                       exit

// wait for child
pthread_join(...)
printf("%d\n", var);
]]></programlisting>

<para>Now the program reliably prints "10", regardless of the speed of
the threads. Why? Because the child's assignment cannot happen until
after it receives the message.
And the message is not sent until
after the parent's assignment is done.</para>

<para>The message transmission creates a "happens-before" dependency
between the two assignments: <computeroutput>var = 20;</computeroutput>
must now happen-before <computeroutput>var = 10;</computeroutput>.
And so there is no longer a race
on <computeroutput>var</computeroutput>.
</para>

<para>Note that it's not significant that the parent sends a message
to the child. Sending a message from the child (after its assignment)
to the parent (before its assignment) would also fix the problem, causing
the program to reliably print "20".</para>

<para>Helgrind's algorithm is (conceptually) very simple. It monitors all
accesses to memory locations. If a location -- in this example,
<computeroutput>var</computeroutput> --
is accessed by two different threads, Helgrind checks to see if the
two accesses are ordered by the happens-before relation. If so,
that's fine; if not, it reports a race.</para>

<para>It is important to understand that the happens-before relation
creates only a partial ordering, not a total ordering. An example of
a total ordering is comparison of numbers: for any two numbers
<computeroutput>x</computeroutput> and
<computeroutput>y</computeroutput>,
<computeroutput>x</computeroutput> is either less than, equal to, or greater
than
<computeroutput>y</computeroutput>. A partial ordering is like a
total ordering, but it can also express the concept that two elements
are neither equal, less nor greater, but merely unordered with respect
to each other.</para>

<para>In the fixed example above, we say that
<computeroutput>var = 20;</computeroutput> "happens-before"
<computeroutput>var = 10;</computeroutput>.
But in the original
version, they are unordered: we cannot say that either happens-before
the other.</para>

<para>What does it mean to say that two accesses from different
threads are ordered by the happens-before relation? It means that
there is some chain of inter-thread synchronisation operations which
causes those accesses to happen in a particular order, irrespective of
the actual rates of progress of the individual threads. This is a
required property for a reliable threaded program, which is why
Helgrind checks for it.</para>

<para>The happens-before relations created by standard threading
primitives are as follows:</para>

<itemizedlist>
 <listitem><para>When a mutex is unlocked by thread T1 and later (or
  immediately) locked by thread T2, then the memory accesses in T1
  prior to the unlock must happen-before those in T2 after it acquires
  the lock.</para>
 </listitem>
 <listitem><para>The same idea applies to reader-writer locks,
  although with some complication so as to allow correct handling of
  reads vs writes.</para>
 </listitem>
 <listitem><para>When a condition variable (CV) is signalled on by
  thread T1 and some other thread T2 is thereby released from a wait
  on the same CV, then the memory accesses in T1 prior to the
  signalling must happen-before those in T2 after it returns from the
  wait.
  If no thread was waiting on the CV then there is no
  effect.</para>
 </listitem>
 <listitem><para>If instead T1 broadcasts on a CV, then all of the
  waiting threads, rather than just one of them, acquire a
  happens-before dependency on the broadcasting thread at the point it
  did the broadcast.</para>
 </listitem>
 <listitem><para>A thread T2 that continues after completing sem_wait
  on a semaphore that thread T1 posts on, acquires a happens-before
  dependence on the posting thread, a bit like dependencies caused
  by mutex unlock-lock pairs. However, since a semaphore can be posted
  on many times, it is unspecified from which of the post calls the
  wait call gets its happens-before dependency.</para>
 </listitem>
 <listitem><para>For a group of threads T1 .. Tn which arrive at a
  barrier and then move on, each thread after the call has a
  happens-after dependency from all threads before the
  barrier.</para>
 </listitem>
 <listitem><para>A newly-created child thread acquires an initial
  happens-after dependency on the point where its parent created it.
  That is, all memory accesses performed by the parent prior to
  creating the child are regarded as happening-before all the accesses
  of the child.</para>
 </listitem>
 <listitem><para>Similarly, when an exiting thread is reaped via a
  call to <function>pthread_join</function>, once the call returns, the
  reaping thread acquires a happens-after dependency relative to all memory
  accesses made by the exiting thread.</para>
 </listitem>
</itemizedlist>

<para>In summary: Helgrind intercepts the above listed events, and builds a
directed acyclic graph representing the collective happens-before
dependencies.
It also monitors all memory accesses.</para>

<para>If a location is accessed by two different threads, but Helgrind
cannot find any path through the happens-before graph from one access
to the other, then it reports a race.</para>

<para>There are a couple of caveats:</para>

<itemizedlist>
 <listitem><para>Helgrind doesn't check for a race in the case where
  both accesses are reads. That would be silly, since concurrent
  reads are harmless.</para>
 </listitem>
 <listitem><para>Two accesses are considered to be ordered by the
  happens-before dependency even through arbitrarily long chains of
  synchronisation events. For example, if T1 accesses some location
  L, and then signals T2 using <function>pthread_cond_signal</function>,
  and T2 later signals T3 in the same way, and T3 then accesses L, then
  a suitable happens-before dependency exists between the first and second
  accesses, even though it involves two different inter-thread
  synchronisation events.</para>
 </listitem>
</itemizedlist>

</sect2>



<sect2 id="hg-manual.data-races.errmsgs" xreflabel="Race Error Messages">
<title>Interpreting Race Error Messages</title>

<para>Helgrind's race detection algorithm collects a lot of
information, and tries to present it in a helpful way when a race is
detected.
Here's an example:</para>

<programlisting><![CDATA[
Thread #2 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Thread #3 was created
   at 0x511C08E: clone (in /lib64/libc-2.8.so)
   by 0x4E333A4: do_clone (in /lib64/libpthread-2.8.so)
   by 0x4E33A30: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.8.so)
   by 0x4C299D4: pthread_create@* (hg_intercepts.c:214)
   by 0x4008F2: main (tc21_pthonce.c:86)

Possible data race during read of size 4 at 0x601070 by thread #3
Locks held: none
   at 0x40087A: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x400883: child (tc21_pthonce.c:74)
   by 0x4C29AFF: mythread_wrapper (hg_intercepts.c:194)
   by 0x4E3403F: start_thread (in /lib64/libpthread-2.8.so)
   by 0x511C0CC: clone (in /lib64/libc-2.8.so)

Location 0x601070 is 0 bytes inside local var "unprotected2"
declared at tc21_pthonce.c:51, in frame #0 of thread 3
]]></programlisting>

<para>Helgrind first announces the creation points of any threads
referenced in the error message. This is so it can speak concisely
about threads without repeatedly printing their creation point call
stacks. Each thread is only ever announced once, the first time it
appears in any Helgrind error message.</para>

<para>The main error message begins at the text
"<computeroutput>Possible data race during read</computeroutput>".
At
the start is information you would expect to see -- address and size
of the racing access, whether a read or a write, and the call stack at
the point it was detected.</para>

<para>A second call stack is presented starting at the text
"<computeroutput>This conflicts with a previous
write</computeroutput>". This shows a previous access which also
accessed the stated address, and which is believed to be racing
against the access in the first call stack. Note that this second
call stack is limited to a maximum of 8 entries to limit the
memory usage.</para>

<para>Finally, Helgrind may attempt to give a description of the
raced-on address in source level terms. In this example, it
identifies it as a local variable, shows its name, declaration point,
and in which frame (of the first call stack) it lives. Note that this
information is only shown when <varname>--read-var-info=yes</varname>
is specified on the command line. That's because reading the DWARF3
debug information in enough detail to capture variable type and
location information makes Helgrind much slower at startup, and also
requires considerable amounts of memory, for large programs.
</para>

<para>Once you have your two call stacks, how do you find the root
cause of the race?</para>

<para>The first thing to do is examine the source locations referred
to by each call stack. They should both show an access to the same
location, or variable.</para>

<para>Now figure out how that location should have been made
thread-safe:</para>

<itemizedlist>
 <listitem><para>Perhaps the location was intended to be protected by
  a mutex? If so, you need to lock and unlock the mutex at both
  access points, even if one of the accesses is reported to be a read.
  Did you perhaps forget the locking at one or other of the accesses?
  To help you do this, Helgrind shows the set of locks held by each
  thread at the time it accessed the raced-on location.</para>
 </listitem>
 <listitem><para>Alternatively, perhaps you intended to use some
  other scheme to make it safe, such as signalling on a condition
  variable. In all such cases, try to find a synchronisation event
  (or a chain thereof) which separates the earlier-observed access (as
  shown in the second call stack) from the later-observed access (as
  shown in the first call stack). In other words, try to find
  evidence that the earlier access "happens-before" the later access.
  See the previous subsection for an explanation of the happens-before
  relation.</para>
  <para>
  The fact that Helgrind is reporting a race means it did not observe
  any happens-before relation between the two accesses. If
  Helgrind is working correctly, it should also be the case that you
  cannot find any such relation, even on detailed inspection
  of the source code. Hopefully, though, your inspection of the code
  will show where the missing synchronisation operation(s) should have
  been.</para>
 </listitem>
</itemizedlist>

</sect2>


</sect1>

<sect1 id="hg-manual.effective-use" xreflabel="Helgrind Effective Use">
<title>Hints and Tips for Effective Use of Helgrind</title>

<para>Helgrind can be very helpful in finding and resolving
threading-related problems. Like all sophisticated tools, it is most
effective when you understand how to play to its strengths.</para>

<para>Helgrind will be less effective when you merely throw an
existing threaded program at it and try to make sense of any reported
errors. It will be more effective if you design threaded programs
from the start in a way that helps Helgrind verify correctness.
The
same is true for finding memory errors with Memcheck, but applies more
here, because thread checking is a harder problem.  Consequently it is
much easier to write a correct program for which Helgrind falsely
reports (threading) errors than it is to write a correct program for
which Memcheck falsely reports (memory) errors.</para>

<para>With that in mind, here are some tips, listed most important first,
for getting reliable results and avoiding false errors.  The first two
are critical.  Any violations of them will swamp you with huge numbers
of false data-race errors.</para>


<orderedlist>

  <listitem>
    <para>Make sure your application, and all the libraries it uses,
    use the POSIX threading primitives.  Helgrind needs to be able to
    see all events pertaining to thread creation, exit, locking and
    other synchronisation events.  To do so it intercepts many POSIX
    pthreads functions.</para>

    <para>Do not roll your own threading primitives (mutexes, etc)
    from combinations of the Linux futex syscall, atomic counters, etc.
    These throw Helgrind's internal what's-going-on models
    way off course and will give bogus results.</para>

    <para>Also, do not reimplement existing POSIX abstractions using
    other POSIX abstractions.  For example, don't build your own
    semaphore routines or reader-writer locks from POSIX mutexes and
    condition variables.  Instead use POSIX reader-writer locks and
    semaphores directly, since Helgrind supports them directly.</para>

    <para>Helgrind directly supports the following POSIX threading
    abstractions: mutexes, reader-writer locks, condition variables
    (but see below), semaphores and barriers.
    Currently spinlocks
    are not supported, although they could be in future.</para>

    <para>At the time of writing, the following popular Linux packages
    are known to implement their own threading primitives:</para>

    <itemizedlist>
      <listitem><para>Qt version 4.X.  Qt 3.X is harmless in that it
      only uses POSIX pthreads primitives.  Unfortunately Qt 4.X
      has its own implementation of mutexes (QMutex) and thread reaping.
      Helgrind 3.4.x contains direct support
      for Qt 4.X threading, which is experimental but is believed to
      work fairly well.  A side effect of supporting Qt 4 directly is
      that Helgrind can be used to debug KDE4 applications.  As this
      is an experimental feature, we would particularly appreciate
      feedback from folks who have used Helgrind to successfully debug
      Qt 4 and/or KDE4 applications.</para>
      </listitem>
      <listitem><para>Runtime support library for GNU OpenMP (part of
      GCC), at least for GCC versions 4.2 and 4.3.  The GNU OpenMP runtime
      library (<filename>libgomp.so</filename>) constructs its own
      synchronisation primitives using combinations of atomic memory
      instructions and the futex syscall, which causes total chaos in
      Helgrind since it cannot "see" those.</para>
      <para>Fortunately, this can be solved using a configuration-time
      option (for GCC).  Rebuild GCC from source, and configure using
      <varname>--disable-linux-futex</varname>.
      This makes <filename>libgomp.so</filename> use the standard
      POSIX threading primitives instead.  Note that this was tested
      using GCC 4.2.3 and has not been re-tested using more recent GCC
      versions.
      We would appreciate hearing about any successes or
      failures with more recent versions.</para>
      </listitem>
    </itemizedlist>

    <para>If you must implement your own threading primitives, there
      is a set of client request macros
      in <computeroutput>helgrind.h</computeroutput> to help you
      describe your primitives to Helgrind.  You should be able to
      mark up mutexes, condition variables, etc, without difficulty.
    </para>
    <para>
      It is also possible to mark up the effects of thread-safe
      reference counting using the
      <computeroutput>ANNOTATE_HAPPENS_BEFORE</computeroutput>,
      <computeroutput>ANNOTATE_HAPPENS_AFTER</computeroutput> and
      <computeroutput>ANNOTATE_HAPPENS_BEFORE_FORGET_ALL</computeroutput>
      macros.  Thread-safe reference counting using an atomically
      incremented/decremented refcount variable causes Helgrind
      problems because a one-to-zero transition of the reference count
      means the accessing thread has exclusive ownership of the
      associated resource (normally, a C++ object) and can therefore
      access it (normally, to run its destructor) without locking.
      Helgrind doesn't understand this, and markup is essential to
      avoid false positives.
    </para>

    <para>
      Here are recommended guidelines for marking up thread-safe
      reference counting in C++.  You only need to mark up your
      release methods -- the ones which decrement the reference count.
      Given a class like this:
    </para>

<programlisting><![CDATA[
class MyClass {
   unsigned int mRefCount;

public:
   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         delete this;
      }
   }
};
]]></programlisting>

    <para>
      the release method should be marked up as follows:
    </para>

<programlisting><![CDATA[
   void Release ( void ) {
      unsigned int newCount = atomic_decrement(&mRefCount);
      if (newCount == 0) {
         ANNOTATE_HAPPENS_AFTER(&mRefCount);
         ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
         delete this;
      } else {
         ANNOTATE_HAPPENS_BEFORE(&mRefCount);
      }
   }
]]></programlisting>

    <para>
      There are a number of complex, mostly-theoretical objections to
      this scheme.  From a theoretical standpoint it appears to be
      impossible to devise a markup scheme which is completely correct
      in the sense of guaranteeing to remove all false races.  The
      proposed scheme however works well in practice.
    </para>

  </listitem>

  <listitem>
    <para>Avoid memory recycling.  If you can't avoid it, you must
    tell Helgrind what is going on via the
    <function>VALGRIND_HG_CLEAN_MEMORY</function> client request (in
    <computeroutput>helgrind.h</computeroutput>).</para>

    <para>Helgrind is aware of standard heap memory allocation and
    deallocation that occurs via
    <function>malloc</function>/<function>free</function>/<function>new</function>/<function>delete</function>
    and from entry and exit of stack frames.  In particular, when memory is
    deallocated via <function>free</function>, <function>delete</function>,
    or function exit, Helgrind considers that memory clean, so when it is
    eventually reallocated, its history is irrelevant.</para>

    <para>However, it is common practice to implement memory recycling
    schemes.
    In these, memory to be freed is not handed to
    <function>free</function>/<function>delete</function>, but instead put
    into a pool of free buffers to be handed out again as required.  The
    problem is that Helgrind has no
    way to know that such memory is logically no longer in use, and
    that its history is irrelevant.  Hence you must make that explicit,
    using the <function>VALGRIND_HG_CLEAN_MEMORY</function> client request
    to specify the relevant address ranges.  It's easiest to put these
    requests into the pool manager code, and use them either when memory is
    returned to the pool, or is allocated from it.</para>
  </listitem>

  <listitem>
    <para>Avoid POSIX condition variables.  If you can, use POSIX
    semaphores (<function>sem_t</function>, <function>sem_post</function>,
    <function>sem_wait</function>) to do inter-thread event signalling.
    Semaphores with an initial value of zero are particularly useful for
    this.</para>

    <para>Helgrind handles POSIX condition variables only partially
    correctly.  This is because Helgrind can see inter-thread
    dependencies between a <function>pthread_cond_wait</function> call and a
    <function>pthread_cond_signal</function>/<function>pthread_cond_broadcast</function>
    call only if the waiting thread actually gets to the rendezvous first
    (so that it actually calls
    <function>pthread_cond_wait</function>).  It can't see dependencies
    between the threads if the signaller arrives first.  In the latter case,
    POSIX guidelines imply that the associated boolean condition still
    provides an inter-thread synchronisation event, but one which is
    invisible to Helgrind.</para>

    <para>The result of Helgrind missing some inter-thread
    synchronisation events is to cause it to report false positives.
    </para>

    <para>The root cause of this lost synchronisation is
    particularly hard to understand, so an example is helpful.
    It was
    discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
    in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
    canonical POSIX-recommended usage scheme for condition variables
    is as follows:</para>

<programlisting><![CDATA[
b   is a Boolean condition, which is False most of the time
cv  is a condition variable
mx  is its associated mutex

Signaller:                             Waiter:

lock(mx)                               lock(mx)
b = True                               while (b == False)
signal(cv)                                wait(cv,mx)
unlock(mx)                             unlock(mx)
]]></programlisting>

    <para>Assume <computeroutput>b</computeroutput> is False most of
    the time.  If the waiter arrives at the rendezvous first, it
    enters its while-loop, waits for the signaller to signal, and
    eventually proceeds.  Helgrind sees the signal, notes the
    dependency, and all is well.</para>

    <para>If the signaller arrives
    first, <computeroutput>b</computeroutput> is set to true, and the
    signal disappears into nowhere.  When the waiter later arrives, it
    does not enter its while-loop and simply carries on.  But even in
    this case, the waiter code following the while-loop cannot execute
    until the signaller sets <computeroutput>b</computeroutput> to
    True.  Hence there is still the same inter-thread dependency, but
    this time it is through an arbitrary in-memory condition, and
    Helgrind cannot see it.</para>

    <para>By comparison, Helgrind's detection of inter-thread
    dependencies caused by semaphore operations is believed to be
    exactly correct.</para>

    <para>As far as I know, a solution to this problem that does not
    require source-level annotation of condition-variable wait loops
    is beyond the current state of the art.</para>
  </listitem>

  <listitem>
    <para>Make sure you are using a supported Linux distribution.  At
    present, Helgrind only properly supports glibc-2.3 or later.
    This
    in turn means we only support glibc's NPTL threading
    implementation.  The old LinuxThreads implementation is not
    supported.</para>
  </listitem>

  <listitem>
    <para>If your application is using thread local variables,
    Helgrind might report false positive race conditions on these
    variables, even though they are very probably race-free.  On Linux,
    you can use
    <option>--sim-hints=deactivate-pthread-stack-cache-via-hack</option>
    to avoid such false positive error messages
    (see <xref linkend="opt.sim-hints"/>).
    </para>
  </listitem>

  <listitem>
    <para>Round up all finished threads using
    <function>pthread_join</function>.  Avoid
    detaching threads: don't create threads in the detached state, and
    don't call <function>pthread_detach</function> on existing threads.</para>

    <para>Using <function>pthread_join</function> to round up finished
    threads provides a clear synchronisation point that both Helgrind and
    programmers can see.  If you don't call
    <function>pthread_join</function> on a thread, Helgrind has no way to
    know when it finishes, relative to any
    significant synchronisation points for other threads in the program.  So
    it assumes that the thread lingers indefinitely and can potentially
    interfere indefinitely with the memory state of the program.  It
    has every right to assume that -- after all, it might really be
    the case that, for scheduling reasons, the exiting thread did run
    very slowly in the last stages of its life.</para>
  </listitem>

  <listitem>
    <para>Perform thread debugging (with Helgrind) and memory
    debugging (with Memcheck) together.</para>

    <para>Helgrind tracks the state of memory in detail, and memory
    management bugs in the application are liable to cause confusion.
    In extreme cases, applications which do many invalid reads and
    writes (particularly to freed memory) have been known to crash
    Helgrind.  So, ideally, you should make your application
    Memcheck-clean before using Helgrind.</para>

    <para>It may be impossible to make your application Memcheck-clean
    unless you first remove threading bugs.  In particular, it may be
    difficult to remove all reads and writes to freed memory in
    multithreaded C++ destructor sequences at program termination.
    So, ideally, you should make your application Helgrind-clean
    before using Memcheck.</para>

    <para>Since this circularity is obviously unresolvable, at least
    bear in mind that Memcheck and Helgrind are to some extent
    complementary, and you may need to use them together.</para>
  </listitem>

  <listitem>
    <para>POSIX requires that implementations of standard I/O
    (<function>printf</function>, <function>fprintf</function>,
    <function>fwrite</function>, <function>fread</function>, etc) are thread
    safe.  Unfortunately GNU libc implements this by using internal locking
    primitives that Helgrind is unable to intercept.  Consequently Helgrind
    generates many false race reports when you use these functions.</para>

    <para>Helgrind attempts to hide these errors using the standard
    Valgrind error-suppression mechanism.  So, at least for simple
    test cases, you don't see any.  Nevertheless, some may slip
    through.  Just something to be aware of.</para>
  </listitem>

  <listitem>
    <para>Helgrind's error checks do not work properly inside the
    system threading library itself
    (<computeroutput>libpthread.so</computeroutput>), and it usually
    observes large numbers of (false) errors in there.
    Valgrind's
    suppression system then filters these out, so you should not see
    them.</para>

    <para>If you see any race errors reported
    where <computeroutput>libpthread.so</computeroutput> or
    <computeroutput>ld.so</computeroutput> is the object associated
    with the innermost stack frame, please file a bug report at
    <ulink url="&vg-url;">&vg-url;</ulink>.
    </para>
  </listitem>

</orderedlist>

</sect1>




<sect1 id="hg-manual.options" xreflabel="Helgrind Command-line Options">
<title>Helgrind Command-line Options</title>

<para>The following end-user options are available:</para>

<!-- start of xi:include in the manpage -->
<variablelist id="hg.opts.list">

  <varlistentry id="opt.free-is-write"
                xreflabel="--free-is-write">
    <term>
      <option><![CDATA[--free-is-write=no|yes
      [default: no] ]]></option>
    </term>
    <listitem>
      <para>When enabled (not the default), Helgrind treats freeing of
        heap memory as if the memory was written immediately before
        the free.  This exposes races where memory is referenced by
        one thread, and freed by another, but there is no observable
        synchronisation event to ensure that the reference happens
        before the free.
      </para>
      <para>This functionality is new in Valgrind 3.7.0, and is
        regarded as experimental.  It is not enabled by default
        because its interaction with custom memory allocators is not
        well understood at present.  User feedback is welcomed.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.track-lockorders"
                xreflabel="--track-lockorders">
    <term>
      <option><![CDATA[--track-lockorders=no|yes
      [default: yes] ]]></option>
    </term>
    <listitem>
      <para>When enabled (the default), Helgrind performs lock order
      consistency checking.
      For some buggy programs, the large number
      of lock order errors reported can become annoying, particularly
      if you're only interested in race errors.  You may therefore find
      it helpful to disable lock order checking.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.history-level"
                xreflabel="--history-level">
    <term>
      <option><![CDATA[--history-level=none|approx|full
      [default: full] ]]></option>
    </term>
    <listitem>
      <para><option>--history-level=full</option> (the default) causes
        Helgrind to collect enough information about "old" accesses that
        it can produce two stack traces in a race report -- both the
        stack trace for the current access, and the trace for the
        older, conflicting access.  To limit memory usage, stack traces
        for "old" accesses are limited to a maximum of 8 entries, even if
        the <option>--num-callers</option> value is bigger.</para>
      <para>Collecting such information is expensive in both speed and
        memory, particularly for programs that do many inter-thread
        synchronisation events (locks, unlocks, etc).  Without such
        information, it is more difficult to track down the root
        causes of races.  Nonetheless, you may not need it in
        situations where you just want to check for the presence or
        absence of races, for example, when doing regression testing
        of a previously race-free program.</para>
      <para><option>--history-level=none</option> is the opposite
        extreme.  It causes Helgrind not to collect any information
        about previous accesses.  This can be dramatically faster
        than <option>--history-level=full</option>.</para>
      <para><option>--history-level=approx</option> provides a
        compromise between these two extremes.  It causes Helgrind to
        show a full trace for the later access, and approximate
        information regarding the earlier access.
        This approximate
        information consists of two stacks, and the earlier access is
        guaranteed to have occurred somewhere between program points
        denoted by the two stacks.  This is not as useful as showing
        the exact stack for the previous access
        (as <option>--history-level=full</option> does), but it is
        better than nothing, and it is almost as fast as
        <option>--history-level=none</option>.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.conflict-cache-size"
                xreflabel="--conflict-cache-size">
    <term>
      <option><![CDATA[--conflict-cache-size=N
      [default: 1000000] ]]></option>
    </term>
    <listitem>
      <para>This flag only has any effect
        at <option>--history-level=full</option>.</para>
      <para>Information about "old" conflicting accesses is stored in
        a cache of limited size, with LRU-style management.  This is
        necessary because it isn't practical to store a stack trace
        for every single memory access made by the program.
        Historical information on not recently accessed locations is
        periodically discarded, to free up space in the cache.</para>
      <para>This option controls the size of the cache, in terms of the
        number of different memory addresses for which
        conflicting access information is stored.  If you find that
        Helgrind is showing race errors with only one stack instead of
        the expected two stacks, try increasing this value.</para>
      <para>The minimum value is 10,000 and the maximum is 30,000,000
        (thirty times the default value).
        Increasing the value by 1
        increases Helgrind's memory requirement by very roughly 100
        bytes, so the maximum value will easily eat up three extra
        gigabytes or so of memory.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.check-stack-refs"
                xreflabel="--check-stack-refs">
    <term>
      <option><![CDATA[--check-stack-refs=no|yes
      [default: yes] ]]></option>
    </term>
    <listitem>
      <para>
        By default Helgrind checks all data memory accesses made by your
        program.  This flag enables you to skip checking for accesses
        to thread stacks (local variables).  This can improve
        performance, but comes at the cost of missing races on
        stack-allocated data.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.ignore-thread-creation"
                xreflabel="--ignore-thread-creation">
    <term>
      <option><![CDATA[--ignore-thread-creation=<yes|no>
      [default: no]]]></option>
    </term>
    <listitem>
      <para>
        Controls whether all activities during thread creation should be
        ignored.  By default enabled only on Solaris.
        Solaris provides higher throughput, parallelism and scalability than
        other operating systems, at the cost of more fine-grained locking
        activity.  This means for example that when a thread is created under
        glibc, just one big lock is used for all thread setup.  Solaris libc
        uses several fine-grained locks and the creator thread resumes its
        activities as soon as possible, leaving for example stack and TLS setup
        sequence to the created thread.
        This situation confuses Helgrind as it assumes there is some false
        ordering in place between creator and created thread; and therefore many
        types of race conditions in the application would not be reported.
        To prevent such false ordering, this command line option is set to
        <computeroutput>yes</computeroutput> by default on Solaris.
        All activity (loads, stores, client requests) is therefore ignored
        during:</para>
      <itemizedlist>
        <listitem>
          <para>
            pthread_create() call in the creator thread
          </para>
        </listitem>
        <listitem>
          <para>
            thread creation phase (stack and TLS setup) in the created thread
          </para>
        </listitem>
      </itemizedlist>
      <para>
        Also, new memory allocated during thread creation is untracked,
        that is, race reporting is suppressed there.  DRD does the same thing
        implicitly.  This is necessary because Solaris libc caches many objects
        and reuses them for different threads and that confuses
        Helgrind.</para>
    </listitem>
  </varlistentry>


</variablelist>
<!-- end of xi:include in the manpage -->

<!-- start of xi:include in the manpage -->
<!-- commented out, because we don't document debugging options in the
     manual.  Nb: all the double-dashes below had a space inserted in them
     to avoid problems with premature closing of this comment.
<para>In addition, the following debugging options are available for
Helgrind:</para>

<variablelist id="hg.debugopts.list">

  <varlistentry id="opt.trace-malloc" xreflabel="- -trace-malloc">
    <term>
      <option><![CDATA[- -trace-malloc=no|yes [no]
      ]]></option>
    </term>
    <listitem>
      <para>Show all client <function>malloc</function> (etc) and
      <function>free</function> (etc) requests.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.cmp-race-err-addrs"
                xreflabel="- -cmp-race-err-addrs">
    <term>
      <option><![CDATA[- -cmp-race-err-addrs=no|yes [no]
      ]]></option>
    </term>
    <listitem>
      <para>Controls whether or not race (data) addresses should be
        taken into account when removing duplicates of race errors.
        With <varname>- -cmp-race-err-addrs=no</varname>, two otherwise
        identical race errors will be considered to be the same if
        their race addresses differ.
        With <varname>- -cmp-race-err-addrs=yes</varname> they will be
        considered different.  This is provided to help make certain
        regression tests work reliably.</para>
    </listitem>
  </varlistentry>

  <varlistentry id="opt.hg-sanity-flags" xreflabel="- -hg-sanity-flags">
    <term>
      <option><![CDATA[- -hg-sanity-flags=<XXXXXX> (X = 0|1) [000000]
      ]]></option>
    </term>
    <listitem>
      <para>Run extensive sanity checks on Helgrind's internal
        data structures at events defined by the bitstring, as
        follows:</para>
      <para><computeroutput>010000 </computeroutput>after changes to
        the lock order acquisition graph</para>
      <para><computeroutput>001000 </computeroutput>after every client
        memory access (NB: not currently used)</para>
      <para><computeroutput>000100 </computeroutput>after every client
        memory range permission setting of 256 bytes or greater</para>
      <para><computeroutput>000010 </computeroutput>after every client
        lock or unlock event</para>
      <para><computeroutput>000001 </computeroutput>after every client
        thread creation or joinage event</para>
      <para>Note these will make Helgrind run very slowly, often to
        the point of being completely unusable.</para>
    </listitem>
  </varlistentry>

</variablelist>
-->
<!-- end of xi:include in the manpage -->


</sect1>


<sect1 id="hg-manual.monitor-commands" xreflabel="Helgrind Monitor Commands">
<title>Helgrind Monitor Commands</title>
<para>The Helgrind tool provides monitor commands handled by Valgrind's
built-in gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
</para>
<itemizedlist>
  <listitem>
    <para><varname>info locks [lock_addr]</varname> shows the list of locks
    and their status.  If <varname>lock_addr</varname> is given, only shows
    the lock located at this address.</para>
    <para>
    In the following example, Helgrind knows about one lock.  This
    lock is located at the guest address <varname>ga
    0x8049a20</varname>.  The lock kind is <varname>rdwr</varname>
    indicating a reader-writer lock.  Other possible lock kinds
    are <varname>nonRec</varname> (simple mutex, non recursive)
    and <varname>mbRec</varname> (simple mutex, possibly recursive).
    The lock kind is then followed by the list of threads holding the
    lock.  In the example below, <varname>R1:thread #6 tid 3</varname>
    indicates that the Helgrind thread #6 has acquired (once, as the
    counter following the letter R is one) the lock in read mode.  The
    Helgrind thread number is incremented for each started thread.  The
    presence of 'tid 3' indicates that the thread #6 has not exited
    yet and is the Valgrind tid 3.  If a thread has terminated, then
    this is indicated with 'tid (exited)'.
    </para>
<programlisting><![CDATA[
(gdb) monitor info locks
Lock ga 0x8049a20 {
   kind   rdwr
 { R1:thread #6 tid 3 }
}
(gdb)
]]></programlisting>

    <para>If you give the option <varname>--read-var-info=yes</varname>,
    then more information will be provided about the lock location, such as
    the global variable or the heap block that contains the lock:
    </para>
<programlisting><![CDATA[
Lock ga 0x8049a20 {
 Location 0x8049a20 is 0 bytes inside global var "s_rwlock"
 declared at rwlock_race.c:17
   kind   rdwr
 { R1:thread #3 tid 3 }
}
]]></programlisting>

  </listitem>

  <listitem>
    <para><varname>accesshistory &lt;addr&gt; [&lt;len&gt;]</varname>
    shows the access history recorded for &lt;len&gt; (default 1) bytes
    starting at &lt;addr&gt;.  For each recorded access that overlaps
    with the given range, <varname>accesshistory</varname> shows the operation
    type (read or write), the address and size read or written, the Helgrind
    thread number and Valgrind tid of the thread that did the operation, and
    the locks held by the thread at the time of the operation.
    The oldest access is shown first, the most recent access is shown last.
    </para>
    <para>
    In the following example, we see first a recorded write of 4 bytes by
    thread #7 that has modified the given 2-byte range.
    The second recorded write is the most recent recorded write: thread #9
    modified the same 2 bytes as part of a 4-byte write operation.
    The list of locks held by each thread at the time of the write operation
    is also shown.
    </para>
<programlisting><![CDATA[
(gdb) monitor accesshistory 0x8049D8A 2
write of size 4 at 0x8049D88 by thread #7 tid 3
==6319== Locks held: 2, at address 0x8049D8C (and 1 that can't be shown)
==6319==    at 0x804865F: child_fn1 (locked_vs_unlocked2.c:29)
==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
==6319==    by 0x39B924: start_thread (pthread_create.c:297)
==6319==    by 0x2F107D: clone (clone.S:130)

write of size 4 at 0x8049D88 by thread #9 tid 2
==6319== Locks held: 2, at addresses 0x8049DA4 0x8049DD4
==6319==    at 0x804877B: child_fn2 (locked_vs_unlocked2.c:45)
==6319==    by 0x400AE61: mythread_wrapper (hg_intercepts.c:234)
==6319==    by 0x39B924: start_thread (pthread_create.c:297)
==6319==    by 0x2F107D: clone (clone.S:130)

]]></programlisting>

  </listitem>

</itemizedlist>

</sect1>

<sect1 id="hg-manual.client-requests" xreflabel="Helgrind Client Requests">
<title>Helgrind Client Requests</title>

<para>The following client requests are defined in
<filename>helgrind.h</filename>.  See that file for exact details of their
arguments.</para>

<itemizedlist>

  <listitem>
    <para><function>VALGRIND_HG_CLEAN_MEMORY</function></para>
    <para>This makes Helgrind forget everything it knows about a
    specified memory range.
    This is particularly useful for memory
    allocators that wish to recycle memory.</para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_BEFORE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_HAPPENS_AFTER</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_NEW_MEMORY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_CREATE</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_DESTROY</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_ACQUIRED</function></para>
  </listitem>
  <listitem>
    <para><function>ANNOTATE_RWLOCK_RELEASED</function></para>
    <para>These are used to describe to Helgrind the behaviour of
    custom (non-POSIX) synchronisation primitives, which it otherwise
    has no way to understand.  See comments
    in <filename>helgrind.h</filename> for further
    documentation.</para>
  </listitem>

</itemizedlist>

</sect1>



<sect1 id="hg-manual.todolist" xreflabel="To Do List">
<title>A To-Do List for Helgrind</title>

<para>The following is a list of loose ends which should be tidied up
some time.</para>

<itemizedlist>
  <listitem><para>For lock order errors, print the complete lock
    cycle, rather than only doing so for size-2 cycles as at
    present.</para>
  </listitem>
  <listitem><para>The conflicting access mechanism sometimes
    mysteriously fails to show the conflicting access' stack, even
    when provided with unbounded storage for conflicting access info.
    This should be investigated.</para>
  </listitem>
  <listitem><para>Document races caused by GCC's thread-unsafe code
    generation for speculative stores.
    In the interim see
    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
    </computeroutput>
    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
    </para>
  </listitem>
  <listitem><para>Don't update the lock-order graph, and don't check
    for errors, when a "try"-style lock operation happens (e.g.
    <function>pthread_mutex_trylock</function>).  Such calls do not add any real
    restrictions to the locking order, since they can always fail to
    acquire the lock, resulting in the caller going off and doing Plan
    B (presumably it will have a Plan B).  Doing such checks could
    generate false lock-order errors and confuse users.</para>
  </listitem>
  <listitem><para>Performance can be very poor.  Slowdowns on the
    order of 100:1 are not unusual.  There is limited scope for
    performance improvements.
    </para>
  </listitem>

</itemizedlist>

</sect1>

</chapter>