<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>

<chapter id="cl-manual" xreflabel="Callgrind Manual">
<title>Callgrind: a call-graph generating cache and branch prediction profiler</title>


<para>To use this tool, you must specify
<option>--tool=callgrind</option> on the
Valgrind command line.</para>

<sect1 id="cl-manual.use" xreflabel="Overview">
<title>Overview</title>

<para>Callgrind is a profiling tool that records the call history among
functions in a program's run as a call-graph.
By default, the collected data consists of
the number of instructions executed, their relationship
to source lines, the caller/callee relationship between functions,
and the numbers of such calls.
Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
can produce further information about the runtime behavior of an application.
</para>

<para>The profile data is written out to a file at program
termination.
For presentation of the data, and interactive control
of the profiling, two command line tools are provided:</para>
<variablelist>
<varlistentry>
<term><command>callgrind_annotate</command></term>
<listitem>
<para>This command reads in the profile data, and prints a
sorted list of functions, optionally with source annotation.</para>

<para>For graphical visualization of the data, try
<ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
GUI that makes it easy to navigate the large amount of data that
Callgrind produces.</para>

</listitem>
</varlistentry>

<varlistentry>
<term><command>callgrind_control</command></term>
<listitem>
<para>This command enables you to interactively observe and control
the status of a program currently running under Callgrind's control,
without stopping the program. You can get statistics information as
well as the current stack trace, and you can request zeroing of counters
or dumping of profile data.</para>
</listitem>
</varlistentry>
</variablelist>

<sect2 id="cl-manual.functionality" xreflabel="Functionality">
<title>Functionality</title>

<para>Cachegrind collects flat profile data: event counts (data reads,
cache misses, etc.) are attributed directly to the function they
occurred in. This cost attribution mechanism is
called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
attribution.</para>

<para>Callgrind extends this functionality by propagating costs
across function call boundaries. If function <function>foo</function> calls
<function>bar</function>, the costs from <function>bar</function> are added into
<function>foo</function>'s costs.
When applied to the program as a whole,
this builds up a picture of so-called <emphasis>inclusive</emphasis>
costs, that is, where the cost of each function includes the costs of
all functions it called, directly or indirectly.</para>

<para>As an example, the inclusive cost of
<function>main</function> should be almost 100 percent
of the total program cost. Because of costs arising before
<function>main</function> is run, such as
initialization of the run time linker and construction of global C++
objects, the inclusive cost of <function>main</function>
is not exactly 100 percent of the total program cost.</para>

<para>Together with the call graph, this allows you to find the
specific call chains starting from
<function>main</function> in which the majority of the
program's costs occur. Caller/callee cost attribution is also useful
for profiling functions called from multiple call sites, and where
optimization opportunities depend on changing code in the callers, in
particular by reducing the call count.</para>

<para>Callgrind's cache simulation is based on that of Cachegrind.
Read the documentation for <xref linkend="&vg-cg-manual-id;"/> first. The material
below describes the features supported in addition to Cachegrind's
features.</para>

<para>Callgrind's ability to detect function calls and returns depends
on the instruction set of the platform it is run on. It works best on
x86 and amd64, and unfortunately currently does not work so well on
PowerPC, ARM, Thumb or MIPS code.
This is because there are no explicit
call or return instructions in these instruction sets, so Callgrind
has to rely on heuristics to detect calls and returns.</para>

</sect2>

<sect2 id="cl-manual.basics" xreflabel="Basic Usage">
<title>Basic Usage</title>

<para>As with Cachegrind, you probably want to compile with debugging info
(the <option>-g</option> option) and with optimization turned on.</para>

<para>To start a profile run for a program, execute:
<screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
</para>

<para>While the simulation is running, you can observe execution with:
<screen>callgrind_control -b</screen>
This will print out the current backtrace. To annotate the backtrace with
event counts, run
<screen>callgrind_control -e -b</screen>
</para>

<para>After program termination, a profile data file named
<computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
is generated, where <emphasis>pid</emphasis> is the process ID
of the program being profiled.
The data file contains information about the calls made in the
program among the functions executed, together with
<command>Instruction Read</command> (Ir) event counts.</para>

<para>To generate a function-by-function summary from the profile
data file, use
<screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
This summary is similar to the output you get from a Cachegrind
run with cg_annotate: the list
of functions is ordered by exclusive cost of functions, which also
are the ones that are shown.
Important for the additional features of Callgrind are
the following two options:</para>

<itemizedlist>
<listitem>
<para><option>--inclusive=yes</option>: Instead of using
exclusive cost of functions as sorting order, use and show
inclusive cost.</para>
</listitem>

<listitem>
<para><option>--tree=both</option>: Interleave into the
top level list of functions, information on the callers and the callees
of each function. In these lines, which represent executed
calls, the cost gives the number of events spent in the call.
Indented, above each function, there is the list of callers,
and below, the list of callees. The sum of events in calls to
a given function (caller lines), as well as the sum of events in
calls from the function (callee lines) together with the self
cost, gives the total inclusive cost of the function.</para>
</listitem>
</itemizedlist>

<para>Use <option>--auto=yes</option> to get annotated source code
for all relevant functions for which the source can be found. In
addition to source annotation as produced by
<computeroutput>cg_annotate</computeroutput>, you will see the
annotated call sites with call counts. For all other options,
consult the (Cachegrind) documentation for
<computeroutput>cg_annotate</computeroutput>.
</para>

<para>For a better call graph browsing experience, it is highly recommended
to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
If your code
has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
of functions calling each other in a recursive manner), you have to
use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
currently does not do any cycle detection, which is important to get correct
results in this case.</para>

<para>If you are additionally interested in measuring the
cache behavior of your program, use Callgrind with the option
<option><xref linkend="clopt.cache-sim"/>=yes</option>. For
branch prediction simulation, use <option><xref linkend="clopt.branch-sim"/>=yes</option>.
Expect a further slowdown by a factor of approximately 2.</para>

<para>If the program section you want to profile is somewhere in the
middle of the run, it is beneficial to
<emphasis>fast forward</emphasis> to this section without any
profiling, and then enable profiling. This is achieved by using
the command line option
<option><xref linkend="opt.instr-atstart"/>=no</option>
and running, in a shell:
<computeroutput>callgrind_control -i on</computeroutput> just before the
interesting code section is executed. To exactly specify
the code position where profiling should start, use the client request
<computeroutput><xref linkend="cr.start-instr"/></computeroutput>.</para>

<para>If you want to be able to see assembly code level annotation, specify
<option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
profile data at instruction granularity. Note that the resulting profile
data
can only be viewed with KCachegrind. For assembly annotation, it also is
interesting to see more details of the control flow inside of functions,
i.e. (conditional) jumps.
This will be collected by further specifying
<option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>

</sect2>

</sect1>

<sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
<title>Advanced Usage</title>

<sect2 id="cl-manual.dumps"
       xreflabel="Multiple dumps from one program run">
<title>Multiple profiling dumps from one program run</title>

<para>Sometimes you are not interested in characteristics of a full
program run, but only of a small part of it, for example execution of one
algorithm. If there are multiple algorithms, or one algorithm
running with different input data, it may even be useful to get different
profile information for different parts of a single program run.</para>

<para>Profile data files have names of the form
<screen>
callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
</screen>
</para>
<para>where <emphasis>pid</emphasis> is the PID of the running
program, <emphasis>part</emphasis> is a number incremented on each
dump (".part" is skipped for the dump at program termination), and
<emphasis>threadID</emphasis> is a thread identification
("-threadID" is only used if you request dumps of individual
threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>

<para>There are different ways to generate multiple profile dumps
while a program is running under Callgrind's supervision. Nevertheless,
all methods trigger the same action, which is "dump all profile
information since the last dump or program start, and zero cost
counters afterwards". To allow for zeroing cost counters without
dumping, there is a second action "zero all cost counters now".
The different methods are:</para>
<itemizedlist>

<listitem>
<para><command>Dump on program termination.</command>
This method is the standard way and doesn't need any special
action on your part.</para>
</listitem>

<listitem>
<para><command>Spontaneous, interactive dumping.</command> Use
<screen>callgrind_control -d [hint [PID/Name]]</screen> to
request the dumping of profile information of the supervised
application with PID or Name. <emphasis>hint</emphasis> is an
arbitrary string you can optionally specify to later be able to
distinguish profile dumps. The control program will not terminate
before the dump is completely written. Note that the application
must be actively running for detection of the dump command. So,
for a GUI application, resize the window, or for a server, send a
request.</para>
<para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
for browsing of profile information, you can use the toolbar
button <command>Force dump</command>. This will request a dump
and trigger a reload after the dump is written.</para>
</listitem>

<listitem>
<para><command>Periodic dumping after execution of a specified
number of basic blocks</command>. For this, use the command line
option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
</para>
</listitem>

<listitem>
<para><command>Dumping at enter/leave of specified functions.</command>
Use the
option <option><xref linkend="opt.dump-before"/>=function</option>
and <option><xref linkend="opt.dump-after"/>=function</option>.
To zero cost counters before entering a function, use
<option><xref linkend="opt.zero-before"/>=function</option>.</para>
<para>You can specify these options multiple times for different
functions. Function specifications support wildcards: e.g.
use
<option><xref linkend="opt.dump-before"/>='foo*'</option> to
generate dumps before entering any function starting with
<emphasis>foo</emphasis>.</para>
</listitem>

<listitem>
<para><command>Program controlled dumping.</command>
Insert
<computeroutput><xref linkend="cr.dump-stats"/>;</computeroutput>
at the position in your code where you want a profile dump to happen. Use
<computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput> to only
zero profile counters.
See <xref linkend="cl-manual.clientrequests"/> for more information on
Callgrind specific client requests.</para>
</listitem>
</itemizedlist>

<para>If you are running a multi-threaded application and specify the
command line option <option><xref linkend="opt.separate-threads"/>=yes</option>,
every thread will be profiled on its own and will create its own
profile dump. Thus, the last two methods will only generate one dump
of the currently running thread. With the other methods, you will get
multiple dumps (one for each thread) on a dump request.</para>

</sect2>



<sect2 id="cl-manual.limits"
       xreflabel="Limiting range of event collection">
<title>Limiting the range of collected events</title>

<para>By default, whenever events are happening (such as an
instruction execution or cache hit/miss), Callgrind is aggregating
them into event counters. However, you may be interested only in
what is happening within a given function or starting from a given
program phase. To this end, you can disable event aggregation for
uninteresting program parts. While attribution of events to
functions as well as producing separate output per program phase
can be done by other means (see previous section), there are two
benefits to disabling aggregation. First, this is very
fine-grained (e.g. just for a loop within a function).
Second,
disabling event aggregation for complete program phases allows you to
switch off time-consuming cache simulation and allows Callgrind to
progress at much higher speed with a slowdown of around a factor of 2
(identical to <computeroutput>valgrind --tool=none</computeroutput>).
</para>

<para>There are two aspects which influence whether Callgrind is
aggregating events at some point in time of program execution.
First, there is the <emphasis>collection state</emphasis>. If this
is off, no aggregation will be done. By changing the collection
state, you can control event aggregation at a very fine
granularity. However, there is not much difference in regard to
execution speed of Callgrind. By default, collection is switched
on, but can be disabled by different means (see below). Second,
there is the <emphasis>instrumentation mode</emphasis> in which
Callgrind is running. This mode either can be on or off. If
instrumentation is off, no observation of actions in the program
will be done and thus, no actions will be forwarded to the
simulator which could trigger events. In the end, no events will
be aggregated. The huge benefit is the much higher speed with
instrumentation switched off. However, this only should be used
with care and in a coarse fashion: every mode change resets the
simulator state (i.e. whether a memory block is cached or not) and
flushes Valgrind's internal cache of instrumented code blocks,
resulting in a latency penalty at switching time. Also, cache
simulator results directly after switching on instrumentation will
be skewed due to identified cache misses which would not happen in
reality (if you care about this warm-up effect, you should make
sure to temporarily have collection state switched off directly
after turning instrumentation mode on).
However, switching
instrumentation state is very useful to skip larger program phases
such as an initialization phase. By default, instrumentation is
switched on, but as with the collection state, can be changed by
various means.
</para>

<para>Callgrind can start with instrumentation mode switched off by
specifying
option <option><xref linkend="opt.instr-atstart"/>=no</option>.
Afterwards, instrumentation can be controlled in two ways: first,
interactively with: <screen>callgrind_control -i on</screen> (and
switching off again by specifying "off" instead of "on"). Second,
instrumentation state can be programmatically changed with the
macros <computeroutput><xref linkend="cr.start-instr"/>;</computeroutput>
and <computeroutput><xref linkend="cr.stop-instr"/>;</computeroutput>.
</para>

<para>Similarly, the collection state at program start can be
switched off
by <option><xref linkend="opt.collect-atstart"/>=no</option>. During
execution, it can be controlled programmatically with the
macro <computeroutput>CALLGRIND_TOGGLE_COLLECT;</computeroutput>.
Further, you can limit event collection to a specific function by
using <option><xref linkend="opt.toggle-collect"/>=function</option>.
This will toggle the collection state on entering and leaving the
specified function. When this option is in effect, the default
collection state at program start is "off". Only events happening
while running inside of the given function will be
collected. Recursive calls of the given function do not trigger
any action.
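As a minimal sketch (the function name <computeroutput>process_request</computeroutput>
is a hypothetical placeholder chosen for illustration), collection could be
limited to a single function with:
<screen>valgrind --tool=callgrind --toggle-collect=process_request your-program</screen>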
This option can be given multiple times to specify
different functions of interest.</para>
</sect2>

<sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
<title>Counting global bus events</title>

<para>For access to shared data among threads in multithreaded
code, synchronization is required to avoid race conditions.
Synchronization primitives are usually implemented via atomic instructions.
However, excessive use of such instructions can lead to performance
issues.</para>

<para>To enable analysis of this problem, Callgrind optionally can count
the number of atomic instructions executed. More precisely, for x86/x86_64,
these are instructions using a lock prefix. For architectures supporting
LL/SC, this is the number of SC instructions executed. For both, the term
"global bus events" is used.</para>

<para>The short name of the event type used for global bus events is "Ge".
To count global bus events, use <option><xref linkend="clopt.collect-bus"/>=yes</option>.
</para>
</sect2>

<sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
<title>Avoiding cycles</title>

<para>Informally speaking, a cycle is a group of functions which
call each other in a recursive way.</para>

<para>Formally speaking, a cycle is a nonempty set S of functions,
such that for every pair of functions F and G in S, it is possible
to call from F to G (possibly via intermediate functions) and also
from G to F. Furthermore, S must be maximal -- that is, be the
largest set of functions satisfying this property. For example, if
a third function H is called from inside S and calls back into S,
then H is also part of the cycle and should be included in S.</para>

<para>Recursion is quite usual in programs, and therefore, cycles
sometimes appear in the call graph output of Callgrind.
However,
the title of this section should raise two questions: What is bad
about cycles which makes you want to avoid them? And: How can
cycles be avoided without changing program code?</para>

<para>Cycles are not bad in themselves, but tend to make performance
analysis of your code harder. This is because inclusive costs
for calls inside of a cycle are meaningless. The definition of
inclusive cost, i.e. self cost of a function plus inclusive cost
of its callees, needs a topological order among functions. For
cycles, this does not hold true: callees of a function in a cycle include
the function itself. Therefore, KCachegrind does cycle detection
and skips visualization of any inclusive cost for calls inside
of cycles. Further, all functions in a cycle are collapsed into artificial
functions with names like <computeroutput>Cycle 1</computeroutput>.</para>

<para>Now, when a program exposes really big cycles (as is
true for some GUI code, or in general code using event or callback based
programming style), you lose the nice property that lets you pinpoint
the bottlenecks by following call chains from
<function>main</function>, guided via
inclusive cost. In addition, KCachegrind loses its ability to show
interesting parts of the call graph, as it uses inclusive costs to
cut off uninteresting areas.</para>

<para>Despite the meaninglessness of inclusive costs in cycles, the big
drawback for visualization motivates the possibility to temporarily
switch off cycle detection in KCachegrind, which can lead to
misleading visualization. However, often cycles appear because of
unlucky superposition of independent call chains in a way that
the profile result will see a cycle. Neglecting uninteresting
calls with very small measured inclusive cost would break these
cycles.
In such cases, incorrect handling of cycles by not detecting
them still gives meaningful profiling visualization.</para>

<para>It has to be noted that currently, <command>callgrind_annotate</command>
does not do any cycle detection at all. For program executions with function
recursion, it can, for example, print nonsense inclusive costs way above 100%.</para>

<para>After describing why cycles are bad for profiling, it is worth
talking about cycle avoidance. The key insight here is that symbols in
the profile data do not have to exactly match the symbols found in the
program. Instead, the symbol name could encode additional information
from the current execution context such as recursion level of the
current function, or even some part of the call chain leading to the
function. While encoding of additional information into symbols is
quite capable of avoiding cycles, it has to be used carefully to not cause
symbol explosion. The latter imposes a large memory requirement on Callgrind
with possible out-of-memory conditions, and big profile data files.</para>

<para>A further possibility to avoid cycles in Callgrind's profile data
output is to simply leave out given functions in the call graph. Of course, this
also skips any call information from and to an ignored function, and thus can
break a cycle. Candidates for this typically are dispatcher functions in event
driven code. The option to ignore calls to a function is
<option><xref linkend="opt.fn-skip"/>=function</option>. Aside from
possibly breaking cycles, this is used in Callgrind to skip
trampoline functions in the PLT sections
for calls to functions in shared libraries. You can see the difference
if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
If a call is ignored, its cost events will be propagated to the
enclosing function.</para>

<para>If you have a recursive function, you can distinguish the first
10 recursion levels by specifying
<option><xref linkend="opt.separate-recs-num"/>=function</option>.
Alternatively, use
<option><xref linkend="opt.separate-recs"/>=10</option> to do this for all
functions, but this will
give you much bigger profile data files. In the profile data, you will see
the recursion levels of "func" as the different functions with names
"func", "func'2", "func'3" and so on.</para>

<para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
<option><xref linkend="opt.separate-callers-num"/>=B</option>
<option><xref linkend="opt.separate-callers-num"/>=C</option>,
and functions "B" and "C" will be treated as different functions
depending on the direct caller. Using the apostrophe for appending
this "context" to the function name, you get "A &gt; B'A &gt; C'B"
and "A &gt; C'A &gt; B'C", and there will be no cycle. Use
<option><xref linkend="opt.separate-callers"/>=2</option> to get a 2-caller
dependency for all functions. Note that doing this will increase
the size of profile data files.</para>

</sect2>

<sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
<title>Forking Programs</title>

<para>If your program forks, the child will inherit all the profiling
data that has been gathered for the parent.
To start with empty profile
counter values in the child, the client request
<computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput>
can be inserted into code to be executed by the child, directly after
<computeroutput>fork</computeroutput>.</para>

<para>However, you will have to make sure that the output file format string
(controlled by <option>--callgrind-out-file</option>) does contain
<option>%p</option> (which is true by default). Otherwise, the
outputs from the parent and child will overwrite each other or will be
intermingled, which almost certainly is not what you want.</para>

<para>You will be able to control the new child independently from
the parent via callgrind_control.</para>

</sect2>

</sect1>


<sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
<title>Callgrind Command-line Options</title>

<para>
In the following, options are grouped into classes.
</para>
<para>
Some options allow the specification of a function/symbol name, such as
<option><xref linkend="opt.dump-before"/>=function</option>, or
<option><xref linkend="opt.fn-skip"/>=function</option>. All these options
can be specified multiple times for different functions.
In addition, the function specifications actually are patterns by supporting
the use of wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is important especially for C++, as without wildcard
usage, the function would have to be specified in full extent, including
parameter signature. </para>

<sect2 id="cl-manual.options.creation"
       xreflabel="Dump creation options">
<title>Dump creation options</title>

<para>
These options influence the name and format of the profile data files.
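As a sketch, a run whose output file name embeds the process ID might be
started as follows; the file name <computeroutput>profile.%p.out</computeroutput>
is an arbitrary example, not a recommended convention:
<screen>valgrind --tool=callgrind --callgrind-out-file=profile.%p.out your-program</screen>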
</para>

<!-- start of xi:include in the manpage -->
<variablelist id="cl.opts.list.creation">

<varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
<term>
<option><![CDATA[--callgrind-out-file=<file> ]]></option>
</term>
<listitem>
<para>Write the profile data to
<computeroutput>file</computeroutput> rather than to the default
output file,
<computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>. The
<option>%p</option> and <option>%q</option> format specifiers
can be used to embed the process ID and/or the contents of an
environment variable in the name, as is the case for the core
option <option><xref linkend="opt.log-file"/></option>.
When multiple dumps are made, the file name
is modified further; see below.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.dump-line" xreflabel="--dump-line">
<term>
<option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
</term>
<listitem>
<para>This specifies that event counting should be performed at
source line granularity. This allows source annotation for sources
which are compiled with debug information
(<option>-g</option>).</para>
</listitem>
</varlistentry>

<varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
<term>
<option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>This specifies that event counting should be performed at
per-instruction granularity.
This allows for assembly code
annotation. Currently the results can only be
displayed by KCachegrind.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
<term>
<option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
</term>
<listitem>
<para>This option influences the output format of the profile data.
It specifies whether strings (file and function names) should be
identified by numbers. This shrinks the file,
but makes it more difficult
for humans to read (which is not recommended in any case).</para>
</listitem>
</varlistentry>

<varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
<term>
<option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
</term>
<listitem>
<para>This option influences the output format of the profile data.
It specifies whether numerical positions are always specified as absolute
values or are allowed to be relative to previous numbers.
This shrinks the file size.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
<term>
<option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>When enabled and multiple profile data parts are to be
generated, these parts are appended to the same output file.
Not recommended.</para>
</listitem>
</varlistentry>

</variablelist>
</sect2>

<sect2 id="cl-manual.options.activity"
       xreflabel="Activity options">
<title>Activity options</title>

<para>
These options specify when actions relating to event counts are to
be executed. For interactive control use callgrind_control.
</para>

<!-- start of xi:include in the manpage -->
<variablelist id="cl.opts.list.activity">

<varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
<term>
<option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
</term>
<listitem>
<para>Dump profile data every <option>count</option> basic blocks.
Whether a dump is needed is only checked when Valgrind's internal
scheduler is run. Therefore, the minimum setting useful is about 100000.
The count is a 64-bit value to make long dump periods possible.
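For instance, a run that writes a dump roughly every ten million basic blocks
might be started as follows; this is only a sketch, and both the count and the
program name <computeroutput>your-program</computeroutput> are placeholders
chosen for illustration:
<screen>valgrind --tool=callgrind --dump-every-bb=10000000 your-program</screen>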
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.dump-before" xreflabel="--dump-before">
<term>
<option><![CDATA[--dump-before=<function> ]]></option>
</term>
<listitem>
<para>Dump when entering <option>function</option>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.zero-before" xreflabel="--zero-before">
<term>
<option><![CDATA[--zero-before=<function> ]]></option>
</term>
<listitem>
<para>Zero all costs when entering <option>function</option>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.dump-after" xreflabel="--dump-after">
<term>
<option><![CDATA[--dump-after=<function> ]]></option>
</term>
<listitem>
<para>Dump when leaving <option>function</option>.</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->
</sect2>

<sect2 id="cl-manual.options.collection"
xreflabel="Data collection options">
<title>Data collection options</title>

<para>
These options specify when events are to be aggregated into event counts.
Also see <xref linkend="cl-manual.limits"/>.</para>

<!-- start of xi:include in the manpage -->
<variablelist id="cl.opts.list.collection">

<varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
<term>
<option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
</term>
<listitem>
<para>Specify if you want Callgrind to start simulation and
profiling from the beginning of the program.
When set to <computeroutput>no</computeroutput>,
Callgrind will not be able
to collect any information, including calls, but it will have at
most a slowdown of around 4, which is the minimum Valgrind
overhead.
Instrumentation can be interactively enabled via
<computeroutput>callgrind_control -i on</computeroutput>.</para>
<para>Note that the resulting call graph will most probably not
contain <function>main</function>, but will contain all the
functions executed after instrumentation was enabled.
Instrumentation can also be programmatically enabled/disabled. See the
Callgrind include file
<computeroutput>callgrind.h</computeroutput> for the macro
you have to use in your source code.</para> <para>For cache
simulation, results will be less accurate when switching on
instrumentation later in the program run, as the simulator starts
with an empty cache at that moment. Switch on event collection
later to cope with this error.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
<term>
<option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
</term>
<listitem>
<para>Specify whether event collection is enabled at the beginning
of the profile run.</para>
<para>To only look at parts of your program, you have two
possibilities:</para>
<orderedlist>
<listitem>
<para>Zero event counters before entering the program part you
want to profile, and dump the event counters to a file after
leaving that program part.</para>
</listitem>
<listitem>
<para>Switch on/off collection state as needed to only see
event counters happening while inside of the program part you
want to profile.</para>
</listitem>
</orderedlist>
<para>The second option can be used if the program part you want to
profile is called many times. Option 1, i.e. creating a lot of
dumps, is not practical here.</para>
<para>Collection state can be
toggled at entry and exit of a given function with the
option <option><xref linkend="opt.toggle-collect"/></option>.
If you
use this option, collection
state should be disabled at the beginning. Note that the
specification of <option>--toggle-collect</option>
implicitly sets
<option>--collect-atstart=no</option>.</para>
<para>Collection state can also be toggled by inserting the client request
<computeroutput>
<!-- commented out because it causes broken links in the man page
<xref linkend="cr.toggle-collect"/>;
-->
CALLGRIND_TOGGLE_COLLECT
;</computeroutput>
at the needed code positions.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
<term>
<option><![CDATA[--toggle-collect=<function> ]]></option>
</term>
<listitem>
<para>Toggle collection on entry/exit of <option>function</option>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
<term>
<option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>This specifies whether information for (conditional) jumps
should be collected. As above, callgrind_annotate currently is not
able to show you the data. You have to use KCachegrind to get jump
arrows in the annotated code.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
<term>
<option><![CDATA[--collect-systime=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>This specifies whether information for system call times
should be collected.</para>
</listitem>
</varlistentry>

<varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
<term>
<option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>This specifies whether the number of global bus events executed
should be collected.
The event type "Ge" is used for these events.</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->
</sect2>

<sect2 id="cl-manual.options.separation"
xreflabel="Cost entity separation options">
<title>Cost entity separation options</title>

<para>
These options specify how event counts should be attributed to execution
contexts.
For example, they specify whether the recursion level or the
call chain leading to a function should be taken into account,
and whether the thread ID should be considered.
Also see <xref linkend="cl-manual.cycles"/>.</para>

<!-- start of xi:include in the manpage -->
<variablelist id="cmd-options.separation">

<varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
<term>
<option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
</term>
<listitem>
<para>This option specifies whether profile data should be generated
separately for every thread. If yes, the file names get "-threadID"
appended.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
<term>
<option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
</term>
<listitem>
<para>Separate contexts by at most &lt;callers&gt; functions in the
call chain. See <xref linkend="cl-manual.cycles"/>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
<term>
<option><![CDATA[--separate-callers<number>=<function> ]]></option>
</term>
<listitem>
<para>Separate <option>number</option> callers for <option>function</option>.
See <xref linkend="cl-manual.cycles"/>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
<term>
<option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
</term>
<listitem>
<para>Separate function recursions by at most <option>level</option> levels.
See <xref linkend="cl-manual.cycles"/>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
<term>
<option><![CDATA[--separate-recs<number>=<function> ]]></option>
</term>
<listitem>
<para>Separate <option>number</option> recursions for <option>function</option>.
See <xref linkend="cl-manual.cycles"/>.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
<term>
<option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
</term>
<listitem>
<para>Ignore calls to/from PLT sections.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
<term>
<option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
</term>
<listitem>
<para>Ignore direct recursions.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
<term>
<option><![CDATA[--fn-skip=<function> ]]></option>
</term>
<listitem>
<para>Ignore calls to/from a given function. E.g. if you have a
call chain A &gt; B &gt; C, and you specify function B to be
ignored, you will only see A &gt; C.</para>
<para>This is very convenient to skip functions handling callback
behaviour. For example, with the signal/slot mechanism in the
Qt graphics library, you only want
to see the function emitting a signal to call the slots connected
to that signal.
First, determine the real call chain to see which
functions need to be skipped, then use this option.</para>
</listitem>
</varlistentry>

<!--
commenting out as it is only enabled with CLG_EXPERIMENTAL. (Nb: I had to
insert a space between the double dash to avoid XML comment problems.)

<varlistentry id="opt.fn-group">
<term>
<option><![CDATA[- -fn-group<number>=<function> ]]></option>
</term>
<listitem>
<para>Put a function into a separate group. This influences the
context name for cycle avoidance. All functions inside such a
group are treated as being the same for context name building, which
resembles the call chain leading to a context. By specifying function
groups with this option, you can shorten the context name, as functions
in the same group will not appear in sequence in the name. </para>
</listitem>
</varlistentry>
-->

</variablelist>
<!-- end of xi:include in the manpage -->
</sect2>


<sect2 id="cl-manual.options.simulation"
xreflabel="Simulation options">
<title>Simulation options</title>

<!-- start of xi:include in the manpage -->
<variablelist id="cl.opts.list.simulation">

<varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
<term>
<option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Specify if you want to do full cache simulation. By default,
only instruction read accesses will be counted ("Ir").
With cache simulation, further event counters are enabled:
Cache misses on instruction reads ("I1mr"/"ILmr"),
data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
For more information, see <xref linkend="&vg-cg-manual-id;"/>.
</para>
</listitem>
</varlistentry>

<varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
<term>
<option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Specify if you want to do branch prediction simulation.
Further event counters are enabled: Number of executed conditional
branches and related predictor misses ("Bc"/"Bcm"), executed indirect
jumps and related misses of the jump address predictor ("Bi"/"Bim").
</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->
</sect2>


<sect2 id="cl-manual.options.cachesimulation"
xreflabel="Cache simulation options">
<title>Cache simulation options</title>

<!-- start of xi:include in the manpage -->
<variablelist id="cl.opts.list.cachesimulation">

<varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
<term>
<option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Specify whether write-back behavior should be simulated, which
makes it possible to distinguish LL cache misses with and without write backs.
The cache model of Cachegrind/Callgrind does not specify write-through
vs. write-back behavior, and this also is not relevant for the number
of generated miss counts. However, with explicit write-back simulation
it can be decided whether a miss triggers not only the loading of a new
cache line, but also whether a write back of a dirty cache line had to take
place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
for misses because of instruction read, data read, and data write,
respectively. As they produce two memory transactions, they should
account for a doubled time estimation in relation to a normal miss.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
<term>
<option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Specify whether simulation of a hardware prefetcher should be
added which is able to detect stream access in the second level cache
by comparing accesses separately for each page.
As the simulation cannot decide about any timing issues of prefetching,
it is assumed that any hardware prefetch triggered succeeds before a
real access is done. Thus, this gives a best-case scenario by covering
all possible stream accesses.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
<term>
<option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Specify whether cache line use should be collected. For every
cache line, from loading to it being evicted, the number of accesses
as well as the number of actually used bytes is determined. This
behavior is related to the code which triggered loading of the cache
line. In contrast to miss counters, which show the position where
the symptoms of bad cache behavior (i.e. latencies) happen, the
use counters try to pinpoint the reason (i.e. the code with the
bad access behavior). The new counters are defined in a way such
that worse behavior results in higher cost.
AcCost1 and AcCost2 are counters showing bad temporal locality
for L1 and LL caches, respectively. This is done by summing up
reciprocal values of the numbers of accesses of each cache line,
multiplied by 1000 (as only integer costs are allowed). E.g.
for
a given source line with 5 read accesses, a value of 5000 AcCost
means that for every access, a new cache line was loaded and directly
evicted afterwards without further accesses. Similarly, SpLoss1/2
shows bad spatial locality for L1 and LL caches, respectively. It
gives the <emphasis>spatial loss</emphasis> count of bytes which
were loaded into cache but never accessed. It pinpoints code
accessing data in a way such that cache space is wasted. This hints
at bad layout of data structures in memory. Assuming a cache line
size of 64 bytes and 100 L1 misses for a given source line, the
loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
value of 3200 for this line, this means that half of the loaded data was
never used, or, with a better data layout, only half of the cache
space would have been needed.
Please note that for cache line use counters, it currently is
not possible to provide meaningful inclusive costs. Therefore,
inclusive cost of these counters should be ignored.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.I1" xreflabel="--I1">
<term>
<option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
</term>
<listitem>
<para>Specify the size, associativity and line size of the level 1
instruction cache.
</para>
</listitem>
</varlistentry>

<varlistentry id="opt.D1" xreflabel="--D1">
<term>
<option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
</term>
<listitem>
<para>Specify the size, associativity and line size of the level 1
data cache.</para>
</listitem>
</varlistentry>

<varlistentry id="opt.LL" xreflabel="--LL">
<term>
<option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
</term>
<listitem>
<para>Specify the size, associativity and line size of the last-level
cache.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- end of xi:include in the manpage -->

</sect2>

</sect1>

<sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
<title>Callgrind Monitor Commands</title>
<para>The Callgrind tool provides monitor commands handled by the Valgrind
gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
</para>

<itemizedlist>
<listitem>
<para><varname>dump [&lt;dump_hint&gt;]</varname> requests to dump the
profile data. </para>
</listitem>

<listitem>
<para><varname>zero</varname> requests to zero the profile data
counters. </para>
</listitem>

<listitem>
<para><varname>instrumentation [on|off]</varname> requests to set
(if parameter on/off is given) or get the current instrumentation state.
</para>
</listitem>

<listitem>
<para><varname>status</varname> requests to print out some status
information.</para>
</listitem>

</itemizedlist>
</sect1>

<sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
<title>Callgrind specific client requests</title>

<para>Callgrind provides the following specific client requests in
<filename>callgrind.h</filename>.
See that file for the exact details of
their arguments.</para>

<variablelist id="cl.clientrequests.list">

<varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
<term>
<computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
</term>
<listitem>
<para>Force generation of a profile dump at the specified position
in code, for the current thread only. Written counters will be reset
to zero.</para>
</listitem>
</varlistentry>

<varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
<term>
<computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
</term>
<listitem>
<para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
but allows you to specify a string to distinguish profile
dumps.</para>
</listitem>
</varlistentry>

<varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
<term>
<computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
</term>
<listitem>
<para>Reset the profile counters for the current thread to zero.</para>
</listitem>
</varlistentry>

<varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
<term>
<computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
</term>
<listitem>
<para>Toggle the collection state. This allows events to be ignored
with regard to profile counters. See also options
<option><xref linkend="opt.collect-atstart"/></option> and
<option><xref linkend="opt.toggle-collect"/></option>.</para>
</listitem>
</varlistentry>

<varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
<term>
<computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
</term>
<listitem>
<para>Start full Callgrind instrumentation if not already enabled.
When cache simulation is done, this will flush the simulated cache
and lead to an artificial cache warmup phase afterwards with
cache misses which would not have happened in reality. See also
option <option><xref linkend="opt.instr-atstart"/></option>.</para>
</listitem>
</varlistentry>

<varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
<term>
<computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
</term>
<listitem>
<para>Stop full Callgrind instrumentation if not already disabled.
This flushes Valgrind's translation cache, and does no additional
instrumentation afterwards: it effectively will run at the same
speed as Nulgrind, i.e. at minimal slowdown. Use this to
speed up the Callgrind run for uninteresting code parts. Use
<computeroutput><xref linkend="cr.start-instr"/></computeroutput> to
enable instrumentation again. See also option
<option><xref linkend="opt.instr-atstart"/></option>.</para>
</listitem>
</varlistentry>

</variablelist>

</sect1>



<sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
<title>callgrind_annotate Command-line Options</title>

<!-- start of xi:include in the manpage -->
<variablelist id="callgrind_annotate.opts.list">

<varlistentry>
<term><option>-h --help</option></term>
<listitem>
<para>Show summary of options.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>--version</option></term>
<listitem>
<para>Show version of callgrind_annotate.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option>--show=A,B,C [default: all]</option>
</term>
<listitem>
<para>Only show figures for events A,B,C.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option>--sort=A,B,C</option>
</term>
<listitem>
<para>Sort columns by events A,B,C [event column order].</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
</term>
<listitem>
<para>Percentage of counts (of primary sort event) we are
interested in.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Annotate all source files containing functions that helped
reach the event count threshold.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option>--context=N [default: 8] </option>
</term>
<listitem>
<para>Print N lines of context before and after annotated
lines.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
</term>
<listitem>
<para>Add subroutine costs to function calls.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
</term>
<listitem>
<para>Print for each function its callers, the called functions
or both.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
<option><![CDATA[-I, --include=<dir> ]]></option>
</term>
<listitem>
<para>Add <option>dir</option> to the list of directories to search
for source files.</para>
</listitem>
</varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->


</sect1>




<sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
<title>callgrind_control
Command-line Options</title>

<para>By default, callgrind_control acts on all programs run by the
current user under Callgrind. It is possible to limit the actions to
specified Callgrind runs by providing a list of pids or program names as
argument. The default action is to give some brief information about the
applications being run under Callgrind.</para>

<!-- start of xi:include in the manpage -->
<variablelist id="callgrind_control.opts.list">

<varlistentry>
<term><option>-h --help</option></term>
<listitem>
<para>Show a short description, usage, and summary of options.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>--version</option></term>
<listitem>
<para>Show version of callgrind_control.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-l --long</option></term>
<listitem>
<para>Show also the working directory, in addition to the brief
information given by default.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-s --stat</option></term>
<listitem>
<para>Show statistics information about active Callgrind runs.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-b --back</option></term>
<listitem>
<para>Show stack/back traces of each thread in active Callgrind runs. For
each active function in the stack trace, also the number of invocations
since program start (or last dump) is shown. This option can be
combined with -e to show inclusive cost of active functions.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
<listitem>
<para>Show the current per-thread, exclusive cost values of event
counters.
If no explicit event names are given, figures for all event
types which are collected in the given Callgrind run are
shown. Otherwise, only figures for event types A, B, ... are shown. If
this option is combined with -b, inclusive cost for the functions of
each active stack frame is provided, too.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
<listitem>
<para>Request the dumping of profile information. Optionally, a
description can be specified which is written into the dump as part of
the information giving the reason which triggered the dump action. This
can be used to distinguish multiple dumps.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-z --zero</option></term>
<listitem>
<para>Zero all event counters.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>-k --kill</option></term>
<listitem>
<para>Force a Callgrind run to be terminated.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option><![CDATA[--instr=<on|off>]]></option></term>
<listitem>
<para>Switch instrumentation mode on or off. If a Callgrind run has
instrumentation disabled, no simulation is done and no events are
counted. This is useful to skip uninteresting program parts, as there
is much less slowdown (same as with the Valgrind tool "none"). See also
the Callgrind option <option>--instr-atstart</option>.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option><![CDATA[--vgdb-prefix=<prefix>]]></option></term>
<listitem>
<para>Specify the vgdb prefix to use by callgrind_control.
callgrind_control internally uses vgdb to find and control the active
Callgrind runs.
If the <option>--vgdb-prefix</option> option was used
for launching valgrind, then the same option must be given to
callgrind_control.</para>
</listitem>
</varlistentry>
</variablelist>
<!-- end of xi:include in the manpage -->

</sect1>

</chapter>