<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>6. Callgrind: a call-graph generating cache and branch prediction profiler</title>
<link rel="stylesheet" href="vg_basic.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="index.html" title="Valgrind Documentation">
<link rel="up" href="manual.html" title="Valgrind User Manual">
<link rel="prev" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">
<link rel="next" href="hg-manual.html" title="7. Helgrind: a thread error detector">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
<td width="22px" align="center" valign="middle"><a accesskey="p" href="cg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
<td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
<td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
<th align="center" valign="middle">Valgrind User Manual</th>
<td width="22px" align="center" valign="middle"><a accesskey="n" href="hg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
</tr></table></div>
<div class="chapter" title="6. Callgrind: a call-graph generating cache and branch prediction profiler">
<div class="titlepage"><div><div><h2 class="title">
<a name="cl-manual"></a>6. Callgrind: a call-graph generating cache and branch prediction profiler</h2></div></div></div>
<div class="toc">
<p><b>Table of Contents</b></p>
<dl>
<dt><span class="sect1"><a
href="cl-manual.html#cl-manual.use">6.1. Overview</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.functionality">6.1.1. Functionality</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.basics">6.1.2. Basic Usage</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.usage">6.2. Advanced Usage</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.dumps">6.2.1. Multiple profiling dumps from one program run</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.limits">6.2.2. Limiting the range of collected events</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.busevents">6.2.3. Counting global bus events</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.cycles">6.2.4. Avoiding cycles</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.forkingprograms">6.2.5. Forking Programs</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.options">6.3. Callgrind Command-line Options</a></span></dt>
<dd><dl>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.creation">6.3.1. Dump creation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.activity">6.3.2. Activity options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.collection">6.3.3. Data collection options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.separation">6.3.4. Cost entity separation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.simulation">6.3.5. Simulation options</a></span></dt>
<dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.cachesimulation">6.3.6.
Cache simulation options</a></span></dt>
</dl></dd>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.clientrequests">6.4. Callgrind specific client requests</a></span></dt>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_annotate-options">6.5. callgrind_annotate Command-line Options</a></span></dt>
<dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_control-options">6.6. callgrind_control Command-line Options</a></span></dt>
</dl>
</div>
<p>To use this tool, you must specify
<code class="option">--tool=callgrind</code> on the
Valgrind command line.</p>
<div class="sect1" title="6.1. Overview">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.use"></a>6.1. Overview</h2></div></div></div>
<p>Callgrind is a profiling tool that records the call history among
functions in a program's run as a call-graph.
By default, the collected data consists of
the number of instructions executed, their relationship
to source lines, the caller/callee relationship between functions,
and the numbers of such calls.
Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
can produce further information about the runtime behavior of an application.
</p>
<p>The profile data is written out to a file at program
termination.
For presentation of the data, and interactive control
of the profiling, two command line tools are provided:</p>
<div class="variablelist"><dl>
<dt><span class="term"><span class="command"><strong>callgrind_annotate</strong></span></span></dt>
<dd>
<p>This command reads in the profile data, and prints a
sorted list of functions, optionally with source annotation.</p>
<p>For graphical visualization of the data, try
<a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>, which is a KDE/Qt based
GUI that makes it easy to navigate the large amount of data that
Callgrind produces.</p>
</dd>
<dt><span class="term"><span class="command"><strong>callgrind_control</strong></span></span></dt>
<dd><p>This command enables you to interactively observe and control
the status of a program currently running under Callgrind's control,
without stopping the program. You can get statistics information as
well as the current stack trace, and you can request zeroing of counters
or dumping of profile data.</p></dd>
</dl></div>
<div class="sect2" title="6.1.1. Functionality">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.functionality"></a>6.1.1. Functionality</h3></div></div></div>
<p>Cachegrind collects flat profile data: event counts (data reads,
cache misses, etc.) are attributed directly to the function they
occurred in. This cost attribution mechanism is
called <span class="emphasis"><em>self</em></span> or <span class="emphasis"><em>exclusive</em></span>
attribution.</p>
<p>Callgrind extends this functionality by propagating costs
across function call boundaries. If function <code class="function">foo</code> calls
<code class="function">bar</code>, the costs from <code class="function">bar</code> are added into
<code class="function">foo</code>'s costs.
When applied to the program as a whole,
this builds up a picture of so-called <span class="emphasis"><em>inclusive</em></span>
costs, that is, where the cost of each function includes the costs of
all functions it called, directly or indirectly.</p>
<p>As an example, the inclusive cost of
<code class="function">main</code> should be almost 100 percent
of the total program cost. Because of costs arising before
<code class="function">main</code> is run, such as
initialization of the run time linker and construction of global C++
objects, the inclusive cost of <code class="function">main</code>
is not exactly 100 percent of the total program cost.</p>
<p>Together with the call graph, this allows you to find the
specific call chains starting from
<code class="function">main</code> in which the majority of the
program's costs occur. Caller/callee cost attribution is also useful
for profiling functions called from multiple call sites, and where
optimization opportunities depend on changing code in the callers, in
particular by reducing the call count.</p>
<p>Callgrind's cache simulation is based on that of Cachegrind.
Read the documentation for <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a> first. The material
below describes the features supported in addition to Cachegrind's
features.</p>
<p>Callgrind's ability to detect function calls and returns depends
on the instruction set of the platform it is run on. It works best
on x86 and amd64, and unfortunately currently does not work so well
on PowerPC code.
This is because there are no explicit call or return
instructions in the PowerPC instruction set, so Callgrind has to rely
on heuristics to detect calls and returns.</p>
</div>
<div class="sect2" title="6.1.2. Basic Usage">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.basics"></a>6.1.2. Basic Usage</h3></div></div></div>
<p>As with Cachegrind, you probably want to compile with debugging info
(the <code class="option">-g</code> option) and with optimization turned on.</p>
<p>To start a profile run for a program, execute:
</p>
<pre class="screen">valgrind --tool=callgrind [callgrind options] your-program [program options]</pre>
<p>
</p>
<p>While the simulation is running, you can observe execution with:
</p>
<pre class="screen">callgrind_control -b</pre>
<p>
This will print out the current backtrace. To annotate the backtrace with
event counts, run
</p>
<pre class="screen">callgrind_control -e -b</pre>
<p>
</p>
<p>After program termination, a profile data file named
<code class="computeroutput">callgrind.out.&lt;pid&gt;</code>
is generated, where <span class="emphasis"><em>pid</em></span> is the process ID
of the program being profiled.
The data file contains information about the calls made in the
program among the functions executed, together with
<span class="command"><strong>Instruction Read</strong></span> (Ir) event counts.</p>
<p>To generate a function-by-function summary from the profile
data file, use
</p>
<pre class="screen">callgrind_annotate [options] callgrind.out.&lt;pid&gt;</pre>
<p>
This summary is similar to the output you get from a Cachegrind
run with cg_annotate: the list
of functions is ordered by exclusive cost, which is also
the cost that is shown.
Two options are important for the additional features of
Callgrind:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><code class="option">--inclusive=yes</code>: Instead of using
exclusive cost of functions as sorting order, use and show
inclusive cost.</p></li>
<li class="listitem"><p><code class="option">--tree=both</code>: Interleave into the
top level list of functions, information on the callers and the callees
of each function. In these lines, which represent executed
calls, the cost gives the number of events spent in the call.
Indented, above each function, there is the list of callers,
and below, the list of callees. The sum of events in calls to
a given function (caller lines), as well as the sum of events in
calls from the function (callee lines) together with the self
cost, gives the total inclusive cost of the function.</p></li>
</ul></div>
<p>Use <code class="option">--auto=yes</code> to get annotated source code
for all relevant functions for which the source can be found. In
addition to source annotation as produced by
<code class="computeroutput">cg_annotate</code>, you will see the
annotated call sites with call counts. For all other options,
consult the (Cachegrind) documentation for
<code class="computeroutput">cg_annotate</code>.
</p>
<p>For a better call graph browsing experience, it is highly recommended
to use <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>.
If your code
has a significant fraction of its cost in <span class="emphasis"><em>cycles</em></span> (sets
of functions calling each other in a recursive manner), you have to
use KCachegrind, as <code class="computeroutput">callgrind_annotate</code>
currently does not do any cycle detection, which is important to get correct
results in this case.</p>
<p>If you are additionally interested in measuring the
cache behavior of your program, use Callgrind with the option
<code class="option"><a class="xref" href="cl-manual.html#clopt.cache-sim">--cache-sim</a>=yes</code>. For
branch prediction simulation, use <code class="option"><a class="xref" href="cl-manual.html#clopt.branch-sim">--branch-sim</a>=yes</code>.
Expect a further slowdown by a factor of approximately 2.</p>
<p>If the program section you want to profile is somewhere in the
middle of the run, it is beneficial to
<span class="emphasis"><em>fast forward</em></span> to this section without any
profiling, and then enable profiling. This is achieved by using
the command line option
<code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>
and running, in a shell:
<code class="computeroutput">callgrind_control -i on</code> just before the
interesting code section is executed. To exactly specify
the code position where profiling should start, use the client request
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code>.</p>
<p>If you want to be able to see assembly code level annotation, specify
<code class="option"><a class="xref" href="cl-manual.html#opt.dump-instr">--dump-instr</a>=yes</code>. This will produce
profile data at instruction granularity. Note that the resulting profile
data
can only be viewed with KCachegrind.
For assembly annotation, it is also
interesting to see more detail of the control flow inside functions,
i.e. (conditional) jumps. These are collected by additionally specifying
<code class="option"><a class="xref" href="cl-manual.html#opt.collect-jumps">--collect-jumps</a>=yes</code>.</p>
</div>
</div>
<div class="sect1" title="6.2. Advanced Usage">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.usage"></a>6.2. Advanced Usage</h2></div></div></div>
<div class="sect2" title="6.2.1. Multiple profiling dumps from one program run">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.dumps"></a>6.2.1. Multiple profiling dumps from one program run</h3></div></div></div>
<p>Sometimes you are not interested in characteristics of a full
program run, but only of a small part of it, for example execution of one
algorithm. If there are multiple algorithms, or one algorithm
running with different input data, it may even be useful to get different
profile information for different parts of a single program run.</p>
<p>Profile data files have names of the form
</p>
<pre class="screen">
callgrind.out.<span class="emphasis"><em>pid</em></span>.<span class="emphasis"><em>part</em></span>-<span class="emphasis"><em>threadID</em></span>
</pre>
<p>
</p>
<p>where <span class="emphasis"><em>pid</em></span> is the PID of the running
program, <span class="emphasis"><em>part</em></span> is a number incremented on each
dump (".part" is skipped for the dump at program termination), and
<span class="emphasis"><em>threadID</em></span> is a thread identification
("-threadID" is only used if you request dumps of individual
threads with <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>).</p>
<p>There are different ways to generate multiple profile dumps
while a program is running under Callgrind's supervision. Nevertheless,
all methods trigger the same action, which is "dump all profile
information since the last dump or program start, and zero cost
counters afterwards". To allow for zeroing cost counters without
dumping, there is a second action "zero all cost counters now".
The different methods are:</p>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem"><p><span class="command"><strong>Dump on program termination.</strong></span>
This method is the standard way and doesn't need any special
action on your part.</p></li>
<li class="listitem">
<p><span class="command"><strong>Spontaneous, interactive dumping.</strong></span> Use
</p>
<pre class="screen">callgrind_control -d [hint [PID/Name]]</pre>
<p> to
request the dumping of profile information of the supervised
application with PID or Name. <span class="emphasis"><em>hint</em></span> is an
arbitrary string you can optionally specify to later be able to
distinguish profile dumps. The control program will not terminate
before the dump is completely written. Note that the application
must be actively running for detection of the dump command. So,
for a GUI application, resize the window, or for a server, send a
request.</p>
<p>If you are using <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>
for browsing of profile information, you can use the toolbar
button <span class="command"><strong>Force dump</strong></span>. This will request a dump
and trigger a reload after the dump is written.</p>
</li>
<li class="listitem"><p><span class="command"><strong>Periodic dumping after execution of a specified
number of basic blocks</strong></span>.
For this, use the command line
option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>.
</p></li>
<li class="listitem">
<p><span class="command"><strong>Dumping at enter/leave of specified functions.</strong></span>
Use the
option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>
and <code class="option"><a class="xref" href="cl-manual.html#opt.dump-after">--dump-after</a>=function</code>.
To zero cost counters before entering a function, use
<code class="option"><a class="xref" href="cl-manual.html#opt.zero-before">--zero-before</a>=function</code>.</p>
<p>You can specify these options multiple times for different
functions. Function specifications support wildcards: e.g. use
<code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>='foo*'</code> to
generate dumps before entering any function starting with
<span class="emphasis"><em>foo</em></span>.</p>
</li>
<li class="listitem"><p><span class="command"><strong>Program controlled dumping.</strong></span>
Insert
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.dump-stats">CALLGRIND_DUMP_STATS</a>;</code>
at the position in your code where you want a profile dump to happen. Use
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> to only
zero profile counters.
See <a class="xref" href="cl-manual.html#cl-manual.clientrequests" title="6.4. Callgrind specific client requests">Client request reference</a> for more information on
Callgrind specific client requests.</p></li>
</ul></div>
<p>If you are running a multi-threaded application and specify the
command line option <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>,
every thread will be profiled on its own and will create its own
profile dump. Thus, the last two methods will only generate one dump
of the currently running thread. With the other methods, you will get
multiple dumps (one for each thread) on a dump request.</p>
</div>
<div class="sect2" title="6.2.2. Limiting the range of collected events">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.limits"></a>6.2.2. Limiting the range of collected events</h3></div></div></div>
<p>For aggregating events (function enter/leave,
instruction execution, memory access) into event numbers,
first, the events must be recognizable by Callgrind, and second,
the collection state must be enabled.</p>
<p>Event collection is only possible if <span class="emphasis"><em>instrumentation</em></span>
for program code is enabled. This is the default, but for faster
execution (identical to <code class="computeroutput">valgrind --tool=none</code>),
it can be disabled until the program reaches a state in which
you want to start collecting profiling data.
Callgrind can start without instrumentation
by specifying option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>.
Instrumentation can be enabled interactively
with: </p>
<pre class="screen">callgrind_control -i on</pre>
<p>
and off by specifying "off" instead of "on".
Furthermore, instrumentation state can be programmatically changed with
the macros <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a>;</code>
and <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</a>;</code>.
</p>
<p>In addition to enabling instrumentation, you must also enable
event collection for the parts of your program you are interested in.
By default, event collection is enabled everywhere.
You can limit collection to a specific function
by using
<code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=function</code>.
This will toggle the collection state on entering and leaving
the specified functions.
When this option is in effect, the default collection state
at program start is "off". Only events happening while running
inside of the given function will be collected. Recursive
calls of the given function do not trigger any action.</p>
<p>It is important to note that with instrumentation disabled, the
cache simulator cannot see any memory access events, and thus, any
simulated cache state will be frozen and wrong without instrumentation.
Therefore, to get useful cache events (hits/misses) after switching on
instrumentation, the cache first must warm up,
probably leading to many <span class="emphasis"><em>cold misses</em></span>
which would not have happened in reality.
If you do not want to see these,
start event collection a few million instructions after you have enabled
instrumentation.</p>
</div>
<div class="sect2" title="6.2.3. Counting global bus events">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.busevents"></a>6.2.3. Counting global bus events</h3></div></div></div>
<p>For access to shared data among threads in
multithreaded code, synchronization is required to avoid race conditions.
Synchronization primitives are usually implemented via atomic instructions.
However, excessive use of such instructions can lead to performance
issues.</p>
<p>To enable analysis of this problem, Callgrind optionally can count
the number of atomic instructions executed. More precisely, for x86/x86_64,
these are instructions using a lock prefix. For architectures supporting
LL/SC, these are the SC instructions executed. For both, the term
"global bus events" is used.</p>
<p>The short name of the event type used for global bus events is "Ge".
To count global bus events, use <code class="option"><a class="xref" href="cl-manual.html#clopt.collect-bus">--collect-bus</a>=yes</code>.
</p>
</div>
<div class="sect2" title="6.2.4. Avoiding cycles">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.cycles"></a>6.2.4. Avoiding cycles</h3></div></div></div>
<p>Informally speaking, a cycle is a group of functions which
call each other in a recursive way.</p>
<p>Formally speaking, a cycle is a nonempty set S of functions,
such that for every pair of functions F and G in S, it is possible
to call from F to G (possibly via intermediate functions) and also
from G to F. Furthermore, S must be maximal -- that is, be the
largest set of functions satisfying this property.
For example, if
a third function H is called from inside S and calls back into S,
then H is also part of the cycle and should be included in S.</p>
<p>Recursion is quite usual in programs, and therefore, cycles
sometimes appear in the call graph output of Callgrind. However,
the title of this section should raise two questions: What is bad
about cycles which makes you want to avoid them? And: How can
cycles be avoided without changing program code?</p>
<p>Cycles are not bad in themselves, but tend to make performance
analysis of your code harder. This is because inclusive costs
for calls inside of a cycle are meaningless. The definition of
inclusive cost, i.e. self cost of a function plus inclusive cost
of its callees, needs a topological order among functions. For
cycles, this does not hold true: callees of a function in a cycle include
the function itself. Therefore, KCachegrind does cycle detection
and skips visualization of any inclusive cost for calls inside
of cycles. Further, all functions in a cycle are collapsed into artificial
functions with names like <code class="computeroutput">Cycle 1</code>.</p>
<p>Now, when a program exposes really big cycles (as is
true for some GUI code, or in general code using event or callback based
programming style), you lose the ability to pinpoint
the bottlenecks by following call chains from
<code class="function">main</code>, guided via
inclusive cost. In addition, KCachegrind loses its ability to show
interesting parts of the call graph, as it uses inclusive costs to
cut off uninteresting areas.</p>
<p>Despite the meaninglessness of inclusive costs in cycles, the big
drawback for visualization motivates the possibility to temporarily
switch off cycle detection in KCachegrind, which can lead to
misleading visualization.
However, cycles often appear because of
an unlucky superposition of independent call chains, such that
the profile result shows a cycle. Neglecting uninteresting
calls with very small measured inclusive cost would break these
cycles. In such cases, incorrect handling of cycles by not detecting
them still gives meaningful profiling visualization.</p>
<p>It has to be noted that currently, <span class="command"><strong>callgrind_annotate</strong></span>
does not do any cycle detection at all. For program executions with function
recursion, it can, for example, print nonsensical inclusive costs way above 100%.</p>
<p>After describing why cycles are bad for profiling, it is worth
talking about cycle avoidance. The key insight here is that symbols in
the profile data do not have to exactly match the symbols found in the
program. Instead, the symbol name could encode additional information
from the current execution context such as recursion level of the
current function, or even some part of the call chain leading to the
function. While encoding of additional information into symbols is
quite capable of avoiding cycles, it has to be used carefully to not cause
symbol explosion. The latter imposes large memory requirements on Callgrind,
with possible out-of-memory conditions, and produces big profile data files.</p>
<p>A further possibility to avoid cycles in Callgrind's profile data
output is to simply leave out given functions in the call graph. Of course, this
also skips any call information from and to an ignored function, and thus can
break a cycle. Candidates for this typically are dispatcher functions in event
driven code. The option to ignore calls to a function is
<code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>.
Aside from
possibly breaking cycles, this is used in Callgrind to skip
trampoline functions in the PLT sections
for calls to functions in shared libraries. You can see the difference
if you profile with <code class="option"><a class="xref" href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>.
If a call is ignored, its cost events will be propagated to the
enclosing function.</p>
<p>If you have a recursive function, you can distinguish the first
10 recursion levels by specifying
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs-num">--separate-recs10</a>=function</code>.
To do this for all functions, use
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs">--separate-recs</a>=10</code>, but this will
give you much bigger profile data files. In the profile data, you will see
the recursion levels of "func" as the different functions with names
"func", "func'2", "func'3" and so on.</p>
<p>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
in your program, you usually get a "false" cycle "B &lt;&gt; C". Use
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=B</code>
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=C</code>,
and functions "B" and "C" will be treated as different functions
depending on the direct caller. Using the apostrophe for appending
this "context" to the function name, you get "A &gt; B'A &gt; C'B"
and "A &gt; C'A &gt; B'C", and there will be no cycle. Use
<code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers">--separate-callers</a>=2</code> to get a 2-caller
dependency for all functions.
Note that doing this will increase
the size of profile data files.</p>
</div>
<div class="sect2" title="6.2.5. Forking Programs">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.forkingprograms"></a>6.2.5. Forking Programs</h3></div></div></div>
<p>If your program forks, the child will inherit all the profiling
data that has been gathered for the parent. To start with empty profile
counter values in the child, the client request
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code>
can be inserted into code to be executed by the child, directly after
<code class="computeroutput">fork</code>.</p>
<p>However, you will have to make sure that the output file format string
(controlled by <code class="option">--callgrind-out-file</code>) contains
<code class="option">%p</code> (which is true by default). Otherwise, the
outputs from the parent and child will overwrite each other or will be
intermingled, which almost certainly is not what you want.</p>
<p>You will be able to control the new child independently from
the parent via callgrind_control.</p>
</div>
</div>
<div class="sect1" title="6.3. Callgrind Command-line Options">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.options"></a>6.3. Callgrind Command-line Options</h2></div></div></div>
<p>
In the following, options are grouped into classes.
</p>
<p>
Some options allow the specification of a function/symbol name, such as
<code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>, or
<code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. All these options
can be specified multiple times for different functions.
In addition, function specifications are patterns: they support
the wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is especially important for C++, because without
wildcards the function would have to be specified in full, including
its parameter signature. </p>
<div class="sect2" title="6.3.1.Dump creation options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.creation"></a>6.3.1. Dump creation options</h3></div></div></div>
<p>
These options influence the name and format of the profile data files.
</p>
<div class="variablelist">
<a name="cl.opts.list.creation"></a><dl>
<dt>
<a name="opt.callgrind-out-file"></a><span class="term">
<code class="option">--callgrind-out-file=&lt;file&gt; </code>
</span>
</dt>
<dd><p>Write the profile data to
<code class="computeroutput">file</code> rather than to the default
output file,
<code class="computeroutput">callgrind.out.&lt;pid&gt;</code>. The
<code class="option">%p</code> and <code class="option">%q</code> format specifiers
can be used to embed the process ID and/or the contents of an
environment variable in the name, as is the case for the core
option <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
When multiple dumps are made, the file name
is modified further; see below.</p></dd>
<dt>
<a name="opt.dump-line"></a><span class="term">
<code class="option">--dump-line=&lt;no|yes&gt; [default: yes] </code>
</span>
</dt>
<dd><p>This specifies that event counting should be performed at
source line granularity.
This allows source annotation for sources
which are compiled with debug information
(<code class="option">-g</code>).</p></dd>
<dt>
<a name="opt.dump-instr"></a><span class="term">
<code class="option">--dump-instr=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>This specifies that event counting should be performed at
per-instruction granularity.
This allows for assembly code
annotation. Currently the results can only be
displayed by KCachegrind.</p></dd>
<dt>
<a name="opt.compress-strings"></a><span class="term">
<code class="option">--compress-strings=&lt;no|yes&gt; [default: yes] </code>
</span>
</dt>
<dd><p>This option influences the output format of the profile data.
It specifies whether strings (file and function names) should be
identified by numbers. This shrinks the file,
but makes it more difficult
for humans to read (which is not recommended in any case).</p></dd>
<dt>
<a name="opt.compress-pos"></a><span class="term">
<code class="option">--compress-pos=&lt;no|yes&gt; [default: yes] </code>
</span>
</dt>
<dd><p>This option influences the output format of the profile data.
It specifies whether numerical positions are always specified as absolute
values or are allowed to be relative to previous numbers.
Allowing relative positions shrinks the file size.</p></dd>
<dt>
<a name="opt.combine-dumps"></a><span class="term">
<code class="option">--combine-dumps=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>When enabled, if multiple profile data parts are to be
generated, these parts are appended to the same output file.
Not recommended.</p></dd>
</dl>
</div>
</div>
<div class="sect2" title="6.3.2.Activity options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.activity"></a>6.3.2. Activity options</h3></div></div></div>
<p>
These options specify when actions relating to event counts are to
be executed. For interactive control use callgrind_control.
</p>
<div class="variablelist">
<a name="cl.opts.list.activity"></a><dl>
<dt>
<a name="opt.dump-every-bb"></a><span class="term">
<code class="option">--dump-every-bb=&lt;count&gt; [default: 0, never] </code>
</span>
</dt>
<dd><p>Dump profile data every <code class="option">count</code> basic blocks.
Whether a dump is needed is only checked when Valgrind's internal
scheduler is run. Therefore, the smallest useful setting is about 100000.
The count is a 64-bit value to make long dump periods possible.
</p></dd>
<dt>
<a name="opt.dump-before"></a><span class="term">
<code class="option">--dump-before=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Dump when entering <code class="option">function</code>.</p></dd>
<dt>
<a name="opt.zero-before"></a><span class="term">
<code class="option">--zero-before=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Zero all costs when entering <code class="option">function</code>.</p></dd>
<dt>
<a name="opt.dump-after"></a><span class="term">
<code class="option">--dump-after=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Dump when leaving <code class="option">function</code>.</p></dd>
</dl>
</div>
</div>
<div class="sect2" title="6.3.3.Data collection options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.collection"></a>6.3.3. Data collection options</h3></div></div></div>
<p>
These options specify when events are to be aggregated into event counts.
Also see <a class="xref" href="cl-manual.html#cl-manual.limits" title="6.2.2.Limiting the range of collected events">Limiting range of event collection</a>.</p>
<div class="variablelist">
<a name="cl.opts.list.collection"></a><dl>
<dt>
<a name="opt.instr-atstart"></a><span class="term">
<code class="option">--instr-atstart=&lt;yes|no&gt; [default: yes] </code>
</span>
</dt>
<dd>
<p>Specify if you want Callgrind to start simulation and
profiling from the beginning of the program.
When set to <code class="computeroutput">no</code>,
Callgrind will not be able
to collect any information, including calls, but the slowdown will be
at most around a factor of 4, which is the minimum Valgrind
overhead. Instrumentation can be interactively enabled via
<code class="computeroutput">callgrind_control -i on</code>.</p>
<p>Note that the resulting call graph will most probably not
contain <code class="function">main</code>, but will contain all the
functions executed after instrumentation was enabled.
Instrumentation can also be programmatically enabled/disabled. See the
Callgrind include file
<code class="computeroutput">callgrind.h</code> for the macro
you have to use in your source code.</p>
<p>For cache
simulation, results will be less accurate when switching on
instrumentation later in the program run, as the simulator starts
with an empty cache at that moment.
Switch on event collection
later to cope with this error.</p>
</dd>
<dt>
<a name="opt.collect-atstart"></a><span class="term">
<code class="option">--collect-atstart=&lt;yes|no&gt; [default: yes] </code>
</span>
</dt>
<dd>
<p>Specify whether event collection is enabled at the beginning
of the profile run.</p>
<p>To only look at parts of your program, you have two
possibilities:</p>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem"><p>Zero event counters before entering the program part you
want to profile, and dump the event counters to a file after
leaving that program part.</p></li>
<li class="listitem"><p>Switch on/off collection state as needed to only see
event counters happening while inside of the program part you
want to profile.</p></li>
</ol></div>
<p>The second option can be used if the program part you want to
profile is called many times. Option 1, i.e. creating a lot of
dumps, is not practical here.</p>
<p>Collection state can be
toggled at entry and exit of a given function with the
option <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>. If you
use this option, collection
state should be disabled at the beginning.
Note that the
specification of <code class="option">--toggle-collect</code>
implicitly sets
<code class="option">--collect-atstart=no</code>.</p>
<p>Collection state can also be toggled by inserting the client request
<code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>
at the needed code positions.</p>
</dd>
<dt>
<a name="opt.toggle-collect"></a><span class="term">
<code class="option">--toggle-collect=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Toggle collection on entry/exit of <code class="option">function</code>.</p></dd>
<dt>
<a name="opt.collect-jumps"></a><span class="term">
<code class="option">--collect-jumps=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>This specifies whether information for (conditional) jumps
should be collected. As above, callgrind_annotate currently is not
able to show you the data. You have to use KCachegrind to get jump
arrows in the annotated code.</p></dd>
<dt>
<a name="opt.collect-systime"></a><span class="term">
<code class="option">--collect-systime=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>This specifies whether information for system call times
should be collected.</p></dd>
<dt>
<a name="clopt.collect-bus"></a><span class="term">
<code class="option">--collect-bus=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>This specifies whether the number of global bus events executed
should be collected. The event type "Ge" is used for these events.</p></dd>
</dl>
</div>
</div>
<div class="sect2" title="6.3.4.Cost entity separation options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.separation"></a>6.3.4. Cost entity separation options</h3></div></div></div>
<p>
These options specify how event counts should be attributed to execution
contexts.
For example, they specify whether the recursion level or the
call chain leading to a function should be taken into account,
and whether the thread ID should be considered.
Also see <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p>
<div class="variablelist">
<a name="cmd-options.separation"></a><dl>
<dt>
<a name="opt.separate-threads"></a><span class="term">
<code class="option">--separate-threads=&lt;no|yes&gt; [default: no] </code>
</span>
</dt>
<dd><p>This option specifies whether profile data should be generated
separately for every thread. If yes, the file names get "-threadID"
appended.</p></dd>
<dt>
<a name="opt.separate-callers"></a><span class="term">
<code class="option">--separate-callers=&lt;callers&gt; [default: 0] </code>
</span>
</dt>
<dd><p>Separate contexts by at most &lt;callers&gt; functions in the
call chain. See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
<dt>
<a name="opt.separate-callers-num"></a><span class="term">
<code class="option">--separate-callers&lt;number&gt;=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Separate <code class="option">number</code> callers for <code class="option">function</code>.
See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
<dt>
<a name="opt.separate-recs"></a><span class="term">
<code class="option">--separate-recs=&lt;level&gt; [default: 2] </code>
</span>
</dt>
<dd><p>Separate function recursions by at most <code class="option">level</code> levels.
See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
<dt>
<a name="opt.separate-recs-num"></a><span class="term">
<code class="option">--separate-recs&lt;number&gt;=&lt;function&gt; </code>
</span>
</dt>
<dd><p>Separate <code class="option">number</code> recursions for <code class="option">function</code>.
See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
<dt>
<a name="opt.skip-plt"></a><span class="term">
<code class="option">--skip-plt=&lt;no|yes&gt; [default: yes] </code>
</span>
</dt>
<dd><p>Ignore calls to/from PLT sections.</p></dd>
<dt>
<a name="opt.skip-direct-rec"></a><span class="term">
<code class="option">--skip-direct-rec=&lt;no|yes&gt; [default: yes] </code>
</span>
</dt>
<dd><p>Ignore direct recursions.</p></dd>
<dt>
<a name="opt.fn-skip"></a><span class="term">
<code class="option">--fn-skip=&lt;function&gt; </code>
</span>
</dt>
<dd>
<p>Ignore calls to/from a given function. E.g. if you have a
call chain A &gt; B &gt; C, and you specify function B to be
ignored, you will only see A &gt; C.</p>
<p>This is very convenient for skipping functions that handle callback
behaviour. For example, with the signal/slot mechanism in the
Qt graphics library, you only want
to see the function emitting a signal to call the slots connected
to that signal.
First, determine the real call chain to see which
functions need to be skipped, then use this option.</p>
</dd>
</dl>
</div>
</div>
<div class="sect2" title="6.3.5.Simulation options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.simulation"></a>6.3.5. Simulation options</h3></div></div></div>
<div class="variablelist">
<a name="cl.opts.list.simulation"></a><dl>
<dt>
<a name="clopt.cache-sim"></a><span class="term">
<code class="option">--cache-sim=&lt;yes|no&gt; [default: no] </code>
</span>
</dt>
<dd><p>Specify if you want to do full cache simulation. By default,
only instruction read accesses will be counted ("Ir").
With cache simulation, further event counters are enabled:
Cache misses on instruction reads ("I1mr"/"ILmr"),
data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
For more information, see <a class="xref" href="cg-manual.html" title="5.Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a>.
</p></dd>
<dt>
<a name="clopt.branch-sim"></a><span class="term">
<code class="option">--branch-sim=&lt;yes|no&gt; [default: no] </code>
</span>
</dt>
<dd><p>Specify if you want to do branch prediction simulation.
Further event counters are enabled: Number of executed conditional
branches and related predictor misses ("Bc"/"Bcm"), executed indirect
jumps and related misses of the jump address predictor ("Bi"/"Bim").
</p></dd>
</dl>
</div>
</div>
<div class="sect2" title="6.3.6.Cache simulation options">
<div class="titlepage"><div><div><h3 class="title">
<a name="cl-manual.options.cachesimulation"></a>6.3.6. Cache simulation options</h3></div></div></div>
<div class="variablelist">
<a name="cl.opts.list.cachesimulation"></a><dl>
<dt>
<a name="opt.simulate-wb"></a><span class="term">
<code class="option">--simulate-wb=&lt;yes|no&gt; [default: no] </code>
</span>
</dt>
<dd><p>Specify whether write-back behavior should be simulated, which makes
it possible to distinguish LL cache misses with and without write backs.
The cache model of Cachegrind/Callgrind does not specify write-through
vs. write-back behavior, and this also is not relevant for the number
of generated miss counts. However, with explicit write-back simulation
it can be decided whether a miss triggers only the loading of a new
cache line, or whether a write back of a dirty cache line also had to take
place before. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
for misses because of instruction read, data read, and data write,
respectively. As they produce two memory transactions, they should
account for a doubled time estimation in relation to a normal miss.
</p></dd>
<dt>
<a name="opt.simulate-hwpref"></a><span class="term">
<code class="option">--simulate-hwpref=&lt;yes|no&gt; [default: no] </code>
</span>
</dt>
<dd><p>Specify whether simulation of a hardware prefetcher should be
added which is able to detect stream access in the second level cache
by comparing accesses separately for each page.
As the simulation cannot decide about any timing issues of prefetching,
it is assumed that any hardware prefetch triggered succeeds before a
real access is done.
Thus, this gives a best-case scenario by covering
all possible stream accesses.</p></dd>
<dt>
<a name="opt.cacheuse"></a><span class="term">
<code class="option">--cacheuse=&lt;yes|no&gt; [default: no] </code>
</span>
</dt>
<dd><p>Specify whether cache line use should be collected. For every
cache line, from loading to it being evicted, the number of accesses
as well as the number of actually used bytes is determined. This
behavior is attributed to the code which triggered loading of the cache
line. In contrast to miss counters, which show the position where
the symptoms of bad cache behavior (i.e. latencies) happen, the
use counters try to pinpoint the reason (i.e. the code with the
bad access behavior). The new counters are defined in a way such
that worse behavior results in higher cost.
AcCost1 and AcCost2 are counters showing bad temporal locality
for L1 and LL caches, respectively. This is done by summing up
reciprocal values of the numbers of accesses of each cache line,
multiplied by 1000 (as only integer costs are allowed). E.g. for
a given source line with 5 read accesses, a value of 5000 AcCost
means that for every access, a new cache line was loaded and directly
evicted afterwards without further accesses. Similarly, SpLoss1/2
shows bad spatial locality for L1 and LL caches, respectively. It
gives the <span class="emphasis"><em>spatial loss</em></span> count of bytes which
were loaded into cache but never accessed. It pinpoints code
accessing data in a way such that cache space is wasted. This hints
at bad layout of data structures in memory. Assuming a cache line
size of 64 bytes and 100 L1 misses for a given source line, the
loading of 6400 bytes into L1 was triggered.
If SpLoss1 shows a
value of 3200 for this line, this means that half of the loaded data was
never used, or that with a better data layout, only half of the cache
space would have been needed.
Please note that for cache line use counters, it currently is
not possible to provide meaningful inclusive costs. Therefore,
inclusive cost of these counters should be ignored.
</p></dd>
<dt>
<a name="opt.I1"></a><span class="term">
<code class="option">--I1=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
</span>
</dt>
<dd><p>Specify the size, associativity and line size of the level 1
instruction cache. </p></dd>
<dt>
<a name="opt.D1"></a><span class="term">
<code class="option">--D1=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
</span>
</dt>
<dd><p>Specify the size, associativity and line size of the level 1
data cache.</p></dd>
<dt>
<a name="opt.LL"></a><span class="term">
<code class="option">--LL=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
</span>
</dt>
<dd><p>Specify the size, associativity and line size of the last-level
cache.</p></dd>
</dl>
</div>
</div>
</div>
<div class="sect1" title="6.4.Callgrind specific client requests">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.clientrequests"></a>6.4. Callgrind specific client requests</h2></div></div></div>
<p>Callgrind provides the following specific client requests in
<code class="filename">callgrind.h</code>. See that file for the exact details of
their arguments.</p>
<div class="variablelist">
<a name="cl.clientrequests.list"></a><dl>
<dt>
<a name="cr.dump-stats"></a><span class="term">
<code class="computeroutput">CALLGRIND_DUMP_STATS</code>
</span>
</dt>
<dd><p>Force generation of a profile dump at specified position
in code, for the current thread only.
Written counters will be reset
to zero.</p></dd>
<dt>
<a name="cr.dump-stats-at"></a><span class="term">
<code class="computeroutput">CALLGRIND_DUMP_STATS_AT(string)</code>
</span>
</dt>
<dd><p>Same as <code class="computeroutput">CALLGRIND_DUMP_STATS</code>,
but allows a string to be specified so that profile dumps can be
distinguished.</p></dd>
<dt>
<a name="cr.zero-stats"></a><span class="term">
<code class="computeroutput">CALLGRIND_ZERO_STATS</code>
</span>
</dt>
<dd><p>Reset the profile counters for the current thread to zero.</p></dd>
<dt>
<a name="cr.toggle-collect"></a><span class="term">
<code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code>
</span>
</dt>
<dd><p>Toggle the collection state. This allows events to be ignored
with regard to profile counters. See also options
<code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a></code> and
<code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.</p></dd>
<dt>
<a name="cr.start-instr"></a><span class="term">
<code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code>
</span>
</dt>
<dd><p>Start full Callgrind instrumentation if not already enabled.
When cache simulation is done, this will flush the simulated cache
and lead to an artificial cache warmup phase afterwards with
cache misses which would not have happened in reality. See also
option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
<dt>
<a name="cr.stop-instr"></a><span class="term">
<code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION</code>
</span>
</dt>
<dd><p>Stop full Callgrind instrumentation if not already disabled.
This flushes Valgrind's translation cache, and does no additional
instrumentation afterwards: it effectively will run at the same
speed as Nulgrind, i.e. at minimal slowdown. Use this to
speed up the Callgrind run for uninteresting code parts. Use
<code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code> to
enable instrumentation again. See also option
<code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
</dl>
</div>
</div>
<div class="sect1" title="6.5.callgrind_annotate Command-line Options">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.callgrind_annotate-options"></a>6.5. callgrind_annotate Command-line Options</h2></div></div></div>
<div class="variablelist">
<a name="callgrind_annotate.opts.list"></a><dl>
<dt><span class="term"><code class="option">-h --help</code></span></dt>
<dd><p>Show summary of options.</p></dd>
<dt><span class="term"><code class="option">--version</code></span></dt>
<dd><p>Show version of callgrind_annotate.</p></dd>
<dt><span class="term">
<code class="option">--show=A,B,C [default: all]</code>
</span></dt>
<dd><p>Only show figures for events A,B,C.</p></dd>
<dt><span class="term">
<code class="option">--sort=A,B,C</code>
</span></dt>
<dd><p>Sort columns by events A,B,C [event column order].</p></dd>
<dt><span class="term">
<code class="option">--threshold=&lt;0--100&gt; [default: 99%] </code>
</span></dt>
<dd><p>Percentage of counts (of primary sort event) we are
interested in.</p></dd>
<dt><span class="term">
<code class="option">--auto=&lt;yes|no&gt; [default: no] </code>
</span></dt>
<dd><p>Annotate all source files containing functions that helped
reach the event count threshold.</p></dd>
<dt><span class="term">
<code class="option">--context=N [default: 8] </code>
</span></dt>
<dd><p>Print N lines of context before and after annotated
lines.</p></dd>
<dt><span class="term">
<code class="option">--inclusive=&lt;yes|no&gt; [default: no] </code>
</span></dt>
<dd><p>Add subroutine costs to function calls.</p></dd>
<dt><span class="term">
<code class="option">--tree=&lt;none|caller|calling|both&gt; [default: none] </code>
</span></dt>
<dd><p>For each function, print its callers, the functions it calls,
or both.</p></dd>
<dt><span class="term">
<code class="option">-I, --include=&lt;dir&gt; </code>
</span></dt>
<dd><p>Add <code class="option">dir</code> to the list of directories to search
for source files.</p></dd>
</dl>
</div>
</div>
<div class="sect1" title="6.6.callgrind_control Command-line Options">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="cl-manual.callgrind_control-options"></a>6.6. callgrind_control Command-line Options</h2></div></div></div>
<p>By default, callgrind_control acts on all programs run by the
current user under Callgrind. It is possible to limit the actions to
specified Callgrind runs by providing a list of pids or program names as
argument. The default action is to give some brief information about the
applications being run under Callgrind.</p>
<div class="variablelist">
<a name="callgrind_control.opts.list"></a><dl>
<dt><span class="term"><code class="option">-h --help</code></span></dt>
<dd><p>Show a short description, usage, and summary of options.</p></dd>
<dt><span class="term"><code class="option">--version</code></span></dt>
<dd><p>Show version of callgrind_control.</p></dd>
<dt><span class="term"><code class="option">-l --long</code></span></dt>
<dd><p>Show also the working directory, in addition to the brief
information given by default.
</p></dd>
<dt><span class="term"><code class="option">-s --stat</code></span></dt>
<dd><p>Show statistics information about active Callgrind runs.</p></dd>
<dt><span class="term"><code class="option">-b --back</code></span></dt>
<dd><p>Show stack/back traces of each thread in active Callgrind runs. For
each active function in the stack trace, the number of invocations
since program start (or last dump) is also shown. This option can be
combined with -e to show inclusive cost of active functions.</p></dd>
<dt><span class="term"><code class="option">-e [A,B,...] </code> (default: all)</span></dt>
<dd><p>Show the current per-thread, exclusive cost values of event
counters. If no explicit event names are given, figures for all event
types which are collected in the given Callgrind run are
shown. Otherwise, only figures for event types A, B, ... are shown. If
this option is combined with -b, inclusive cost for the functions of
each active stack frame is provided, too.
</p></dd>
<dt><span class="term"><code class="option">--dump[=&lt;desc&gt;] </code> (default: no description)</span></dt>
<dd><p>Request the dumping of profile information. Optionally, a
description can be specified which is written into the dump as part of
the information giving the reason which triggered the dump action. This
can be used to distinguish multiple dumps.</p></dd>
<dt><span class="term"><code class="option">-z --zero</code></span></dt>
<dd><p>Zero all event counters.</p></dd>
<dt><span class="term"><code class="option">-k --kill</code></span></dt>
<dd><p>Force a Callgrind run to be terminated.</p></dd>
<dt><span class="term"><code class="option">--instr=&lt;on|off&gt;</code></span></dt>
<dd><p>Switch instrumentation mode on or off. If a Callgrind run has
instrumentation disabled, no simulation is done and no events are
counted.
This is useful to skip uninteresting program parts, as there
is much less slowdown (same as with the Valgrind tool "none"). See also
the Callgrind option <code class="option">--instr-atstart</code>.</p></dd>
<dt><span class="term"><code class="option">-w=&lt;dir&gt;</code></span></dt>
<dd><p>Specify the startup directory of an active Callgrind run. On some
systems, active Callgrind runs cannot be detected. To be able to
control these, the failed auto-detection can be worked around by
specifying the directory where a Callgrind run was started.</p></dd>
</dl>
</div>
</div>
</div>
<div>
<br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
<tr>
<td rowspan="2" width="40%" align="left">
<a accesskey="p" href="cg-manual.html">&lt;&lt; 5. Cachegrind: a cache and branch-prediction profiler</a></td>
<td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
<td rowspan="2" width="40%" align="right"><a accesskey="n" href="hg-manual.html">7. Helgrind: a thread error detector &gt;&gt;</a>
</td>
</tr>
<tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
</table>
</div>
</body>
</html>