      1 <html>
      2 <head>
      3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
      4 <title>6. Callgrind: a call-graph generating cache and branch prediction profiler</title>
      5 <link rel="stylesheet" href="vg_basic.css" type="text/css">
      6 <meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
      7 <link rel="home" href="index.html" title="Valgrind Documentation">
      8 <link rel="up" href="manual.html" title="Valgrind User Manual">
      9 <link rel="prev" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">
     10 <link rel="next" href="hg-manual.html" title="7. Helgrind: a thread error detector">
     11 </head>
     12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
     13 <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
     14 <td width="22px" align="center" valign="middle"><a accesskey="p" href="cg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
     15 <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
     16 <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Home"></a></td>
     17 <th align="center" valign="middle">Valgrind User Manual</th>
     18 <td width="22px" align="center" valign="middle"><a accesskey="n" href="hg-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
     19 </tr></table></div>
     20 <div class="chapter" title="6. Callgrind: a call-graph generating cache and branch prediction profiler">
     21 <div class="titlepage"><div><div><h2 class="title">
     22 <a name="cl-manual"></a>6. Callgrind: a call-graph generating cache and branch prediction profiler</h2></div></div></div>
     23 <div class="toc">
     24 <p><b>Table of Contents</b></p>
     25 <dl>
     26 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.use">6.1. Overview</a></span></dt>
     27 <dd><dl>
     28 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.functionality">6.1.1. Functionality</a></span></dt>
     29 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.basics">6.1.2. Basic Usage</a></span></dt>
     30 </dl></dd>
     31 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.usage">6.2. Advanced Usage</a></span></dt>
     32 <dd><dl>
     33 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.dumps">6.2.1. Multiple profiling dumps from one program run</a></span></dt>
     34 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.limits">6.2.2. Limiting the range of collected events</a></span></dt>
     35 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.busevents">6.2.3. Counting global bus events</a></span></dt>
     36 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.cycles">6.2.4. Avoiding cycles</a></span></dt>
     37 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.forkingprograms">6.2.5. Forking Programs</a></span></dt>
     38 </dl></dd>
     39 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.options">6.3. Callgrind Command-line Options</a></span></dt>
     40 <dd><dl>
     41 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.creation">6.3.1. Dump creation options</a></span></dt>
     42 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.activity">6.3.2. Activity options</a></span></dt>
     43 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.collection">6.3.3. Data collection options</a></span></dt>
     44 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.separation">6.3.4. Cost entity separation options</a></span></dt>
     45 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.simulation">6.3.5. Simulation options</a></span></dt>
     46 <dt><span class="sect2"><a href="cl-manual.html#cl-manual.options.cachesimulation">6.3.6. Cache simulation options</a></span></dt>
     47 </dl></dd>
     48 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.clientrequests">6.4. Callgrind specific client requests</a></span></dt>
     49 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_annotate-options">6.5. callgrind_annotate Command-line Options</a></span></dt>
     50 <dt><span class="sect1"><a href="cl-manual.html#cl-manual.callgrind_control-options">6.6. callgrind_control Command-line Options</a></span></dt>
     51 </dl>
     52 </div>
     53 <p>To use this tool, you must specify
     54 <code class="option">--tool=callgrind</code> on the
     55 Valgrind command line.</p>
     56 <div class="sect1" title="6.1. Overview">
     57 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     58 <a name="cl-manual.use"></a>6.1. Overview</h2></div></div></div>
     59 <p>Callgrind is a profiling tool that records the call history among
     60 functions in a program's run as a call-graph.
     61 By default, the collected data consists of
     62 the number of instructions executed, their relationship
     63 to source lines, the caller/callee relationship between functions,
     64 and the numbers of such calls.
     65 Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
     66 can produce further information about the runtime behavior of an application.
     67 </p>
     68 <p>The profile data is written out to a file at program
     69 termination. For presentation of the data, and interactive control
     70 of the profiling, two command line tools are provided:</p>
     71 <div class="variablelist"><dl>
     72 <dt><span class="term"><span class="command"><strong>callgrind_annotate</strong></span></span></dt>
     73 <dd>
     74 <p>This command reads in the profile data, and prints a
     75     sorted list of functions, optionally with source annotation.</p>
     76 <p>For graphical visualization of the data, try
     77     <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>, which is a KDE/Qt based
     78     GUI that makes it easy to navigate the large amount of data that
     79     Callgrind produces.</p>
     80 </dd>
     81 <dt><span class="term"><span class="command"><strong>callgrind_control</strong></span></span></dt>
     82 <dd><p>This command enables you to interactively observe and control 
     83     the status of a program currently running under Callgrind's control,
     84     without stopping the program.  You can get statistics information as
     85     well as the current stack trace, and you can request zeroing of counters
     86     or dumping of profile data.</p></dd>
     87 </dl></div>
     88 <div class="sect2" title="6.1.1. Functionality">
     89 <div class="titlepage"><div><div><h3 class="title">
     90 <a name="cl-manual.functionality"></a>6.1.1. Functionality</h3></div></div></div>
     91 <p>Cachegrind collects flat profile data: event counts (data reads,
     92 cache misses, etc.) are attributed directly to the function they
     93 occurred in.  This cost attribution mechanism is
     94 called <span class="emphasis"><em>self</em></span> or <span class="emphasis"><em>exclusive</em></span>
     95 attribution.</p>
     96 <p>Callgrind extends this functionality by propagating costs
     97 across function call boundaries.  If function <code class="function">foo</code> calls
     98 <code class="function">bar</code>, the costs from <code class="function">bar</code> are added into
     99 <code class="function">foo</code>'s costs.  When applied to the program as a whole,
    100 this builds up a picture of so-called <span class="emphasis"><em>inclusive</em></span>
    101 costs, that is, where the cost of each function includes the costs of
    102 all functions it called, directly or indirectly.</p>
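<p>The relationship between self cost and inclusive cost can be sketched with a small
model. This is only an illustration, not Callgrind's implementation: the function names,
the numbers, and the helpers <code class="function">inclusive_cost</code> and
<code class="function">cost_from_callers</code> are hypothetical. Each call edge carries
the cost spent inside the calls made over that edge.</p>

```python
# Hypothetical profile: self cost per function, and cost spent in calls
# for each (caller, callee) edge. All numbers are made up for illustration.
self_cost = {"main": 5, "foo": 20, "bar": 75}
call_cost = {
    "main": {"foo": 60, "bar": 35},  # main calls foo and bar
    "foo": {"bar": 40},              # foo calls bar
}


def inclusive_cost(fn):
    """Self cost plus the cost of all calls made from fn."""
    return self_cost[fn] + sum(call_cost.get(fn, {}).values())


def cost_from_callers(fn):
    """Sum of the cost of all calls made to fn, over all of its callers."""
    return sum(callees.get(fn, 0) for callees in call_cost.values())


total = sum(self_cost.values())
main_share = 100 * inclusive_cost("main") / total
```

<p>In this made-up profile, the inclusive cost of <code class="function">main</code>
equals the total program cost, and for every called function the cost summed over its
callers matches its inclusive cost.</p>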
    103 <p>As an example, the inclusive cost of
    104 <code class="function">main</code> should be almost 100 percent
    105 of the total program cost.  Because of costs arising before 
    106 <code class="function">main</code> is run, such as
    107 initialization of the run time linker and construction of global C++
    108 objects, the inclusive cost of <code class="function">main</code>
    109 is not exactly 100 percent of the total program cost.</p>
    110 <p>Together with the call graph, this allows you to find the
    111 specific call chains starting from
    112 <code class="function">main</code> in which the majority of the
    113 program's costs occur.  Caller/callee cost attribution is also useful
    114 for profiling functions called from multiple call sites, and where
    115 optimization opportunities depend on changing code in the callers, in
    116 particular by reducing the call count.</p>
    117 <p>Callgrind's cache simulation is based on that of Cachegrind.
    118 Read the documentation for <a class="xref" href="cg-manual.html" title="5. Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a> first.  The material
    119 below describes the features supported in addition to Cachegrind's
    120 features.</p>
    121 <p>Callgrind's ability to detect function calls and returns depends
    122 on the instruction set of the platform it is run on.  It works best
    123 on x86 and amd64, and unfortunately currently does not work so well
    124 on PowerPC code.  This is because there are no explicit call or return
    125 instructions in the PowerPC instruction set, so Callgrind has to rely
    126 on heuristics to detect calls and returns.</p>
    127 </div>
    128 <div class="sect2" title="6.1.2. Basic Usage">
    129 <div class="titlepage"><div><div><h3 class="title">
    130 <a name="cl-manual.basics"></a>6.1.2. Basic Usage</h3></div></div></div>
    131 <p>As with Cachegrind, you probably want to compile with debugging info
    132   (the <code class="option">-g</code> option) and with optimization turned on.</p>
    133 <p>To start a profile run for a program, execute:
    134   </p>
    135 <pre class="screen">valgrind --tool=callgrind [callgrind options] your-program [program options]</pre>
    136 <p>
    137   </p>
    138 <p>While the simulation is running, you can observe execution with:
    139   </p>
    140 <pre class="screen">callgrind_control -b</pre>
    141 <p>
    142   This will print out the current backtrace. To annotate the backtrace with
    143   event counts, run
    144   </p>
    145 <pre class="screen">callgrind_control -e -b</pre>
    146 <p>
    147   </p>
    148 <p>After program termination, a profile data file named 
    149   <code class="computeroutput">callgrind.out.&lt;pid&gt;</code>
    150   is generated, where <span class="emphasis"><em>pid</em></span> is the process ID 
    151   of the program being profiled.
    152   The data file contains information about the calls made in the
    153   program among the functions executed, together with 
    154   <span class="command"><strong>Instruction Read</strong></span> (Ir) event counts.</p>
    155 <p>To generate a function-by-function summary from the profile
    156   data file, use
    157   </p>
    158 <pre class="screen">callgrind_annotate [options] callgrind.out.&lt;pid&gt;</pre>
    159 <p>
    160   This summary is similar to the output you get from a Cachegrind
    161   run with cg_annotate: the list of functions is ordered by
    162   exclusive cost, and the costs shown are exclusive costs as
    163   well.
    164   The following two options are important for the additional
    165   features of Callgrind:</p>
    166 <div class="itemizedlist"><ul class="itemizedlist" type="disc">
    167 <li class="listitem"><p><code class="option">--inclusive=yes</code>: Instead of using
    168       exclusive cost of functions as sorting order, use and show
    169       inclusive cost.</p></li>
    170 <li class="listitem"><p><code class="option">--tree=both</code>: Interleave into the
    171       top level list of functions, information on the callers and the callees
    172       of each function. In these lines, which represent executed
    173       calls, the cost gives the number of events spent in the call.
    174       Indented, above each function, there is the list of callers,
    175       and below, the list of callees. The sum of events in calls to
    176       a given function (caller lines), as well as the sum of events in
    177       calls from the function (callee lines) together with the self
    178       cost, gives the total inclusive cost of the function.</p></li>
    179 </ul></div>
    180 <p>Use <code class="option">--auto=yes</code> to get annotated source code
    181   for all relevant functions for which the source can be found. In
    182   addition to source annotation as produced by
    183   <code class="computeroutput">cg_annotate</code>, you will see the
    184   annotated call sites with call counts. For all other options, 
    185   consult the (Cachegrind) documentation for
    186   <code class="computeroutput">cg_annotate</code>.
    187   </p>
    188 <p>For a better call graph browsing experience, it is highly recommended
    189   to use <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>.
    190   If your code
    191   has a significant fraction of its cost in <span class="emphasis"><em>cycles</em></span> (sets
    192   of functions calling each other in a recursive manner), you have to
    193   use KCachegrind, as <code class="computeroutput">callgrind_annotate</code>
    194   currently does not do any cycle detection, which is important to get correct
    195   results in this case.</p>
    196 <p>If you are additionally interested in measuring the 
    197   cache behavior of your program, use Callgrind with the option
    198   <code class="option"><a class="xref" href="cl-manual.html#clopt.cache-sim">--cache-sim</a>=yes</code>. For
    199   branch prediction simulation, use <code class="option"><a class="xref" href="cl-manual.html#clopt.branch-sim">--branch-sim</a>=yes</code>.
    200   Expect a further slowdown by a factor of approximately 2.</p>
    201 <p>If the program section you want to profile is somewhere in the
    202   middle of the run, it is beneficial to 
    203   <span class="emphasis"><em>fast forward</em></span> to this section without any 
    204   profiling, and then enable profiling.  This is achieved by using
    205   the command line option
    206   <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code> 
    207   and running, in a shell:
    208   <code class="computeroutput">callgrind_control -i on</code> just before the 
    209   interesting code section is executed. To exactly specify
    210   the code position where profiling should start, use the client request
    211   <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code>.</p>
    212 <p>If you want to be able to see assembly code level annotation, specify
    213   <code class="option"><a class="xref" href="cl-manual.html#opt.dump-instr">--dump-instr</a>=yes</code>. This will produce
    214   profile data at instruction granularity. Note that the resulting profile
    215   data
    216   can only be viewed with KCachegrind. For assembly annotation, it also is
    217   interesting to see more details of the control flow inside of functions,
    218   i.e. (conditional) jumps. This will be collected by further specifying
    219   <code class="option"><a class="xref" href="cl-manual.html#opt.collect-jumps">--collect-jumps</a>=yes</code>.</p>
    220 </div>
    221 </div>
    222 <div class="sect1" title="6.2. Advanced Usage">
    223 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    224 <a name="cl-manual.usage"></a>6.2. Advanced Usage</h2></div></div></div>
    225 <div class="sect2" title="6.2.1. Multiple profiling dumps from one program run">
    226 <div class="titlepage"><div><div><h3 class="title">
    227 <a name="cl-manual.dumps"></a>6.2.1. Multiple profiling dumps from one program run</h3></div></div></div>
    228 <p>Sometimes you are not interested in characteristics of a full 
    229   program run, but only of a small part of it, for example execution of one
    230   algorithm.  If there are multiple algorithms, or one algorithm 
    231   running with different input data, it may even be useful to get different
    232   profile information for different parts of a single program run.</p>
    233 <p>Profile data files have names of the form
    234 </p>
    235 <pre class="screen">
    236 callgrind.out.<span class="emphasis"><em>pid</em></span>.<span class="emphasis"><em>part</em></span>-<span class="emphasis"><em>threadID</em></span>
    237 </pre>
    238 <p>
    239   </p>
    240 <p>where <span class="emphasis"><em>pid</em></span> is the PID of the running 
    241   program, <span class="emphasis"><em>part</em></span> is a number incremented on each
    242   dump (".part" is skipped for the dump at program termination), and 
    243   <span class="emphasis"><em>threadID</em></span> is a thread identification 
    244   ("-threadID" is only used if you request dumps of individual 
    245   threads with <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>).</p>
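<p>The naming scheme above can be sketched as a small helper. The function
<code class="function">dump_file_name</code> is hypothetical and only mirrors the pattern
described here. The exact formatting of the part and thread numbers (for example, any
zero padding) is an assumption and may differ in real output files.</p>

```python
def dump_file_name(pid, part=None, thread_id=None):
    """Build a profile data file name following the documented pattern.

    part is omitted for the dump at program termination; thread_id is only
    present when dumps of individual threads are requested.
    Number formatting is an assumption of this sketch.
    """
    name = f"callgrind.out.{pid}"
    if part is not None:
        name += f".{part}"
    if thread_id is not None:
        name += f"-{thread_id}"
    return name


final_dump = dump_file_name(1234)
second_dump = dump_file_name(1234, part=2)
threaded_dump = dump_file_name(1234, part=2, thread_id=3)
```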
    246 <p>There are different ways to generate multiple profile dumps 
    247   while a program is running under Callgrind's supervision.  Nevertheless,
    248   all methods trigger the same action, which is "dump all profile 
    249   information since the last dump or program start, and zero cost 
    250   counters afterwards".  To allow for zeroing cost counters without
    251   dumping, there is a second action "zero all cost counters now". 
    252   The different methods are:</p>
    253 <div class="itemizedlist"><ul class="itemizedlist" type="disc">
    254 <li class="listitem"><p><span class="command"><strong>Dump on program termination.</strong></span>
    255       This method is the standard way and doesn't need any special
    256       action on your part.</p></li>
    257 <li class="listitem">
    258 <p><span class="command"><strong>Spontaneous, interactive dumping.</strong></span> Use
    259       </p>
    260 <pre class="screen">callgrind_control -d [hint [PID/Name]]</pre>
    261 <p> to 
    262       request the dumping of profile information of the supervised
    263       application with PID or Name.  <span class="emphasis"><em>hint</em></span> is an
    264       arbitrary string you can optionally specify to later be able to
    265       distinguish profile dumps.  The control program will not terminate
    266       before the dump is completely written.  Note that the application
    267       must be actively running for detection of the dump command. So,
    268       for a GUI application, resize the window, or for a server, send a
    269       request.</p>
    270 <p>If you are using <a class="ulink" href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex" target="_top">KCachegrind</a>
    271       for browsing of profile information, you can use the toolbar
    272       button <span class="command"><strong>Force dump</strong></span>. This will request a dump
    273       and trigger a reload after the dump is written.</p>
    274 </li>
    275 <li class="listitem"><p><span class="command"><strong>Periodic dumping after execution of a specified
    276       number of basic blocks</strong></span>. For this, use the command line
    277       option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>.
    278       </p></li>
    279 <li class="listitem">
    280 <p><span class="command"><strong>Dumping at enter/leave of specified functions.</strong></span>
    281       Use the
    282       option <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>
    283       and <code class="option"><a class="xref" href="cl-manual.html#opt.dump-after">--dump-after</a>=function</code>.
    284       To zero cost counters before entering a function, use
    285       <code class="option"><a class="xref" href="cl-manual.html#opt.zero-before">--zero-before</a>=function</code>.</p>
    286 <p>You can specify these options multiple times for different
    287       functions. Function specifications support wildcards: e.g. use
    288       <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>='foo*'</code> to
    289       generate dumps before entering any function starting with 
    290       <span class="emphasis"><em>foo</em></span>.</p>
    291 </li>
    292 <li class="listitem"><p><span class="command"><strong>Program controlled dumping.</strong></span>
    293       Insert
    294       <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.dump-stats">CALLGRIND_DUMP_STATS</a>;</code>
    295       at the position in your code where you want a profile dump to happen. Use 
    296       <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code> to only 
    297       zero profile counters.
    298       See <a class="xref" href="cl-manual.html#cl-manual.clientrequests" title="6.4. Callgrind specific client requests">Client request reference</a> for more information on
    299       Callgrind specific client requests.</p></li>
    300 </ul></div>
    301 <p>If you are running a multi-threaded application and specify the
    302   command line option <code class="option"><a class="xref" href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>, 
    303   every thread will be profiled on its own and will create its own
    304   profile dump. Thus, the last two methods will only generate one dump
    305   of the currently running thread. With the other methods, you will get
    306   multiple dumps (one for each thread) on a dump request.</p>
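<p>The two actions that all dump methods come down to, "dump and zero" and "zero only",
can be modelled as follows. This is a conceptual sketch: the class
<code class="computeroutput">CostCounters</code> and its methods are hypothetical and say
nothing about how Callgrind stores its counters internally.</p>

```python
class CostCounters:
    """Toy model of per-function cost counters with dump and zero actions."""

    def __init__(self):
        self.counts = {}
        self.dumps = []  # each dump holds the costs since the previous dump

    def record(self, function, events=1):
        self.counts[function] = self.counts.get(function, 0) + events

    def zero(self):
        """Action 2: zero all cost counters now, without dumping."""
        self.counts = {}

    def dump(self):
        """Action 1: dump everything since the last dump, then zero."""
        self.dumps.append(dict(self.counts))
        self.zero()


counters = CostCounters()
counters.record("foo", 3)
counters.dump()            # first dump sees foo: 3
counters.record("foo", 2)
counters.record("bar", 5)
counters.zero()            # discard warm-up costs without writing a dump
counters.record("bar", 1)
counters.dump()            # second dump sees only bar: 1
```

<p>Because counters are zeroed after every dump, each dump covers a disjoint part of the
run rather than a running total.</p>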
    307 </div>
    308 <div class="sect2" title="6.2.2. Limiting the range of collected events">
    309 <div class="titlepage"><div><div><h3 class="title">
    310 <a name="cl-manual.limits"></a>6.2.2. Limiting the range of collected events</h3></div></div></div>
    311 <p>For aggregating events (function enter/leave,
    312   instruction execution, memory access) into event numbers,
    313   first, the events must be recognizable by Callgrind, and second,
    314   the collection state must be enabled.</p>
    315 <p>Event collection is only possible if <span class="emphasis"><em>instrumentation</em></span>
    316   for program code is enabled. This is the default, but for faster
    317   execution (identical to <code class="computeroutput">valgrind --tool=none</code>),
    318   it can be disabled until the program reaches a state in which
    319   you want to start collecting profiling data.  
    320   Callgrind can start without instrumentation
    321   by specifying option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>.
    322   Instrumentation can be enabled interactively
    323   with: </p>
    324 <pre class="screen">callgrind_control -i on</pre>
    325 <p>
    326   and off by specifying "off" instead of "on".
    327   Furthermore, instrumentation state can be programatically changed with
    328   the macros <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a>;</code>
    329   and <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.stop-instr">CALLGRIND_STOP_INSTRUMENTATION</a>;</code>.
    330   </p>
    331 <p>In addition to enabling instrumentation, you must also enable
    332   event collection for the parts of your program you are interested in.
    333   By default, event collection is enabled everywhere.
    334   You can limit collection to a specific function
    335   by using 
    336   <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=function</code>. 
    337   This will toggle the collection state on entering and leaving
    338   the specified functions.
    339   When this option is in effect, the default collection state
    340   at program start is "off".  Only events happening while running
    341   inside of the given function will be collected. Recursive
    342   calls of the given function do not trigger any action.</p>
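<p>The toggling behaviour described above can be illustrated with a small simulation over
a made-up execution trace. The trace format and the function
<code class="function">collected_events</code> are hypothetical. The sketch assumes a
single toggle function and a collection state that starts as "off".</p>

```python
def collected_events(trace, toggle_fn):
    """Count events seen while collection is on.

    trace is a list of ("enter", name), ("leave", name) or ("event", count).
    Collection starts off and is toggled only when toggle_fn is entered or
    left at recursion depth zero; recursive calls trigger no action.
    """
    collecting = False
    depth = 0
    total = 0
    for kind, value in trace:
        if kind == "enter" and value == toggle_fn:
            if depth == 0:
                collecting = not collecting
            depth += 1
        elif kind == "leave" and value == toggle_fn:
            depth -= 1
            if depth == 0:
                collecting = not collecting
        elif kind == "event" and collecting:
            total += value
    return total


trace = [
    ("event", 10),      # before work(): not collected
    ("enter", "work"),
    ("event", 5),
    ("enter", "work"),  # recursive call: state stays on
    ("event", 7),
    ("leave", "work"),
    ("event", 3),
    ("leave", "work"),
    ("event", 20),      # after work(): not collected
]
inside_work = collected_events(trace, "work")
```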
    343 <p>It is important to note that with instrumentation disabled, the
    344   cache simulator cannot see any memory access events, and thus, any
    345   simulated cache state will be frozen and wrong without instrumentation.
    346   Therefore, to get useful cache events (hits/misses) after switching on
    347   instrumentation, the cache first must warm up,
    348   probably leading to many <span class="emphasis"><em>cold misses</em></span>
    349   which would not have happened in reality. If you do not want to see these,
    350   start event collection a few million instructions after you have enabled
    351   instrumentation.</p>
    352 </div>
    353 <div class="sect2" title="6.2.3. Counting global bus events">
    354 <div class="titlepage"><div><div><h3 class="title">
    355 <a name="cl-manual.busevents"></a>6.2.3. Counting global bus events</h3></div></div></div>
    356 <p>For access to shared data among threads in multithreaded
    357   code, synchronization is required to avoid race conditions.
    358   Synchronization primitives are usually implemented via atomic instructions.
    359   However, excessive use of such instructions can lead to performance
    360   issues.</p>
    361 <p>To enable analysis of this problem, Callgrind optionally can count
    362   the number of atomic instructions executed. More precisely, for x86/x86_64,
    363   these are instructions using a lock prefix. For architectures supporting
    364   LL/SC, these are the SC instructions executed. For both, the term
    365   "global bus events" is used.</p>
    366 <p>The short name of the event type used for global bus events is "Ge".
    367   To count global bus events, use <code class="option"><a class="xref" href="cl-manual.html#clopt.collect-bus">--collect-bus</a>=yes</code>.
    368   </p>
    369 </div>
    370 <div class="sect2" title="6.2.4. Avoiding cycles">
    371 <div class="titlepage"><div><div><h3 class="title">
    372 <a name="cl-manual.cycles"></a>6.2.4. Avoiding cycles</h3></div></div></div>
    373 <p>Informally speaking, a cycle is a group of functions which
    374   call each other in a recursive way.</p>
    375 <p>Formally speaking, a cycle is a nonempty set S of functions,
    376   such that for every pair of functions F and G in S, it is possible
    377   to call from F to G (possibly via intermediate functions) and also
    378   from G to F.  Furthermore, S must be maximal -- that is, be the
    379   largest set of functions satisfying this property.  For example, if
    380   a third function H is called from inside S and calls back into S,
    381   then H is also part of the cycle and should be included in S.</p>
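<p>In graph terms, this definition describes the strongly connected components of the
call graph that contain at least one call path from a function back to itself. The
following sketch finds such sets for a small, made-up call graph by checking mutual
reachability. The function <code class="function">find_cycles</code> is hypothetical and
is not the algorithm used by KCachegrind. It is chosen for clarity rather than
speed.</p>

```python
def find_cycles(calls):
    """Return the maximal cycles of a call graph as a list of frozensets.

    calls maps each caller to the set of functions it calls directly.
    """
    functions = set(calls)
    for callees in calls.values():
        functions.update(callees)

    def reachable(start):
        """All functions callable from start via one or more calls."""
        seen = set()
        stack = list(calls.get(start, ()))
        while stack:
            fn = stack.pop()
            if fn not in seen:
                seen.add(fn)
                stack.extend(calls.get(fn, ()))
        return seen

    reach = {fn: reachable(fn) for fn in functions}
    cycles = set()
    for fn in functions:
        if fn in reach[fn]:  # fn can call itself, directly or indirectly
            # Maximal set: everything fn reaches that can also reach fn.
            cycles.add(frozenset(g for g in reach[fn] if fn in reach[g]))
    return sorted(cycles, key=sorted)


# F and G call each other via H, so H belongs to the same cycle.
calls = {"main": {"F"}, "F": {"G"}, "G": {"H", "leaf"}, "H": {"F"}}
cycles = find_cycles(calls)
```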
    382 <p>Recursion is quite usual in programs, and therefore, cycles
    383   sometimes appear in the call graph output of Callgrind. However,
    384   the title of this section should raise two questions: What is bad
    385   about cycles which makes you want to avoid them? And: How can
    386   cycles be avoided without changing program code?</p>
    387 <p>Cycles are not bad in themselves, but tend to make performance
    388   analysis of your code harder. This is because inclusive costs
    389   for calls inside of a cycle are meaningless. The definition of
    390   inclusive cost, i.e. self cost of a function plus inclusive cost
    391   of its callees, needs a topological order among functions. For
    392   cycles, this does not hold true: callees of a function in a cycle include
    393   the function itself. Therefore, KCachegrind does cycle detection
    394   and skips visualization of any inclusive cost for calls inside
    395   of cycles. Further, all functions in a cycle are collapsed into artificial
    396   functions with names like <code class="computeroutput">Cycle 1</code>.</p>
    397 <p>Now, when a program exposes really big cycles (as is
    398   true for some GUI code, or in general code using event or callback based
    399   programming style), you lose the ability to pinpoint
    400   the bottlenecks by following call chains from
    401   <code class="function">main</code>, guided via
    402   inclusive cost. In addition, KCachegrind loses its ability to show
    403   interesting parts of the call graph, as it uses inclusive costs to
    404   cut off uninteresting areas.</p>
    405 <p>Despite the meaninglessness of inclusive costs in cycles, the big
    406   drawback for visualization motivates the possibility to temporarily
    407   switch off cycle detection in KCachegrind, which can lead to
    408   misleading visualization. However, cycles often appear because of
    409   unlucky superposition of independent call chains in a way that
    410   the profile result will see a cycle. Neglecting uninteresting
    411   calls with very small measured inclusive cost would break these
    412   cycles. In such cases, incorrect handling of cycles by not detecting
    413   them still gives meaningful profiling visualization.</p>
    414 <p>It has to be noted that currently, <span class="command"><strong>callgrind_annotate</strong></span>
    415   does not do any cycle detection at all. For program executions with function
    416   recursion, it can, for example, print nonsensical inclusive costs well above 100%.</p>
    417 <p>After describing why cycles are bad for profiling, it is worth
    418   talking about cycle avoidance. The key insight here is that symbols in
    419   the profile data do not have to exactly match the symbols found in the
    420   program. Instead, the symbol name could encode additional information
    421   from the current execution context such as recursion level of the
    422   current function, or even some part of the call chain leading to the
    423   function. While encoding of additional information into symbols is
    424   quite capable of avoiding cycles, it has to be used carefully to not cause
    425   symbol explosion. The latter imposes large memory requirements on Callgrind,
    426   with possible out-of-memory conditions, and produces big profile data files.</p>
    427 <p>A further possibility to avoid cycles in Callgrind's profile data
    428   output is to simply leave out given functions in the call graph. Of course, this
    429   also skips any call information from and to an ignored function, and thus can
    430   break a cycle. Candidates for this typically are dispatcher functions in event
    431   driven code. The option to ignore calls to a function is
    432   <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. Aside from
    433   possibly breaking cycles, this is used in Callgrind to skip
    434   trampoline functions in the PLT sections
    435   for calls to functions in shared libraries. You can see the difference
    436   if you profile with <code class="option"><a class="xref" href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>.
    437   If a call is ignored, its cost events will be propagated to the
    438   enclosing function.</p>
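<p>The effect of ignoring a function can be pictured with a small model.
  The following Python sketch is not Callgrind code: the function
  <code class="computeroutput">attribute_costs</code> and its input format
  (call chains with a self cost for the innermost function) are assumptions
  made only for illustration. It drops skipped functions from each chain and
  charges their cost to the nearest remaining caller.</p>

```python
def attribute_costs(samples, skipped):
    """Return (visible call edges, self cost per function) after skipping.

    samples: list of (call_chain, cost) pairs, where call_chain is a list
             of function names from outermost to innermost and cost is
             the self cost of the innermost function.
    skipped: set of function names to ignore, as with --fn-skip.
    """
    edges = set()
    costs = {}
    for chain, cost in samples:
        # Remove ignored functions; their callers and callees become adjacent.
        visible = [fn for fn in chain if fn not in skipped]
        if not visible:
            continue  # nothing left to charge the cost to
        edges.update(zip(visible, visible[1:]))
        # Cost of a skipped innermost function lands on the enclosing function.
        owner = visible[-1]
        costs[owner] = costs.get(owner, 0) + cost
    return edges, costs


samples = [(["A"], 10), (["A", "B"], 5), (["A", "B", "C"], 7)]
edges, costs = attribute_costs(samples, skipped={"B"})
print(sorted(edges))  # [('A', 'C')]
print(costs)          # {'A': 15, 'C': 7}
```

<p>In this model the 5 cost units spent in "B" show up in "A", and the
  only remaining call edge is from "A" to "C".</p>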
<p>If you have a recursive function, you can distinguish the first
  10 recursion levels by specifying
  <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs-num">--separate-recs10</a>=function</code>.
  To do this for all functions, use
  <code class="option"><a class="xref" href="cl-manual.html#opt.separate-recs">--separate-recs</a>=10</code>, but this will
  give you much bigger profile data files.  In the profile data, you will see
  the recursion levels of "func" as different functions with the names
  "func", "func'2", "func'3" and so on.</p>
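<p>The naming scheme can be sketched in a few lines of Python. The helper
  <code class="computeroutput">recursion_symbol</code> is hypothetical, and
  the handling of levels beyond the configured maximum (they share the name
  of the last separated level) is an assumption of this sketch rather than
  something stated above.</p>

```python
def recursion_symbol(name, level, max_levels):
    """Name used for a function at a given recursion level (1 = outermost).

    Assumption for this sketch: levels deeper than max_levels share the
    name of the last separated level.
    """
    level = min(level, max_levels)
    return name if level == 1 else f"{name}'{level}"


print([recursion_symbol("func", n, 3) for n in range(1, 6)])
# ['func', "func'2", "func'3", "func'3", "func'3"]
```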
    447 <p>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
    448   in your program, you usually get a "false" cycle "B &lt;&gt; C". Use 
    449   <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=B</code> 
    450   <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers-num">--separate-callers2</a>=C</code>,
    451   and functions "B" and "C" will be treated as different functions 
    452   depending on the direct caller. Using the apostrophe for appending 
    453   this "context" to the function name, you get "A &gt; B'A &gt; C'B" 
    454   and "A &gt; C'A &gt; B'C", and there will be no cycle. Use 
    455   <code class="option"><a class="xref" href="cl-manual.html#opt.separate-callers">--separate-callers</a>=2</code> to get a 2-caller 
    456   dependency for all functions.  Note that doing this will increase
    457   the size of profile data files.</p>
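<p>The following Python sketch illustrates why appending caller context
  removes the false cycle. It is a model only: the functions
  <code class="computeroutput">call_edges</code> and
  <code class="computeroutput">has_cycle</code> and the
  <code class="computeroutput">depth</code> parameter (number of callers
  appended to a name) are hypothetical, and no claim is made about how the
  number given to the option maps onto this depth.</p>

```python
def call_edges(chains, depth):
    """Build the set of caller/callee edges for the given call chains.

    Each function is renamed to include up to `depth` of its callers,
    joined with an apostrophe (depth 0 keeps the plain name).
    """
    edges = set()
    for chain in chains:
        names = []
        for i, fn in enumerate(chain):
            callers = chain[max(0, i - depth):i]
            # Nearest caller first, e.g. C called from B becomes "C'B".
            names.append("'".join([fn] + callers[::-1]))
        edges.update(zip(names, names[1:]))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    done, active = set(), set()

    def visit(node):
        if node in active:
            return True   # reached a node that is still on the DFS stack
        if node in done:
            return False
        active.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        active.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


chains = [["A", "B", "C"], ["A", "C", "B"]]
print(has_cycle(call_edges(chains, depth=0)))  # True
print(has_cycle(call_edges(chains, depth=1)))  # False
```

<p>With plain names the edges "B" to "C" and "C" to "B" form a cycle. With
  one caller appended, the chains become "A", "B'A", "C'B" and "A", "C'A",
  "B'C", which share no edge that closes a loop.</p>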
    458 </div>
    459 <div class="sect2" title="6.2.5.Forking Programs">
    460 <div class="titlepage"><div><div><h3 class="title">
    461 <a name="cl-manual.forkingprograms"></a>6.2.5.Forking Programs</h3></div></div></div>
    462 <p>If your program forks, the child will inherit all the profiling
    463   data that has been gathered for the parent. To start with empty profile
    464   counter values in the child, the client request
    465   <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.zero-stats">CALLGRIND_ZERO_STATS</a>;</code>
    466   can be inserted into code to be executed by the child, directly after
    467   <code class="computeroutput">fork</code>.</p>
    468 <p>However, you will have to make sure that the output file format string
    469   (controlled by <code class="option">--callgrind-out-file</code>) does contain
    470   <code class="option">%p</code> (which is true by default). Otherwise, the
    471   outputs from the parent and child will overwrite each other or will be
    472   intermingled, which almost certainly is not what you want.</p>
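<p>To see why <code class="option">%p</code> matters, consider this
  simplified Python model of the file name expansion. It is not Valgrind's
  implementation: <code class="computeroutput">expand_out_file</code> is a
  hypothetical helper, the process IDs are made up, and only
  <code class="option">%p</code> and the <code class="option">%q{VAR}</code>
  form of the core option <code class="option">--log-file</code> are
  handled.</p>

```python
import re


def expand_out_file(template, pid, env):
    """Expand %p and %q{VAR} in an output file name template.

    Simplified model for illustration; not Valgrind's implementation.
    """
    def substitute(match):
        if match.group(0) == "%p":
            return str(pid)
        return env[match.group(1)]  # %q{VAR}: contents of variable VAR

    return re.sub(r"%p|%q\{(\w+)\}", substitute, template)


for pid in (4000, 4001):  # hypothetical parent and child process IDs
    print(expand_out_file("callgrind.out.%p", pid, {}),
          expand_out_file("callgrind.out", pid, {}))
```

<p>With <code class="option">%p</code> in the template, parent and child
  get distinct names. Without it, both expand to the same file name.</p>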
    473 <p>You will be able to control the new child independently from
    474   the parent via callgrind_control.</p>
    475 </div>
    476 </div>
    477 <div class="sect1" title="6.3.Callgrind Command-line Options">
    478 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    479 <a name="cl-manual.options"></a>6.3.Callgrind Command-line Options</h2></div></div></div>
    480 <p>
    481 In the following, options are grouped into classes.
    482 </p>
    483 <p>
    484 Some options allow the specification of a function/symbol name, such as
    485 <code class="option"><a class="xref" href="cl-manual.html#opt.dump-before">--dump-before</a>=function</code>, or
    486 <code class="option"><a class="xref" href="cl-manual.html#opt.fn-skip">--fn-skip</a>=function</code>. All these options
    487 can be specified multiple times for different functions.
In addition, function specifications are actually patterns: they support
the wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is especially important for C++, as without wildcards
the function would have to be specified in full, including its
parameter signature. </p>
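<p>
The wildcard semantics described above can be sketched in Python as follows.
This is an illustration, not Callgrind's matcher: the helper names and the
C++ symbol are hypothetical, and the sketch assumes that a pattern has to
match the whole symbol name.
</p>

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern with '*' and '?' wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more arbitrary characters
        elif ch == "?":
            parts.append(".")    # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, symbol):
    """True if the pattern matches the complete symbol name."""
    return pattern_to_regex(pattern).fullmatch(symbol) is not None


symbol = "Parser::parse(char const*, int)"  # hypothetical C++ symbol
print(matches("Parser::parse*", symbol))    # True
print(matches("Parser::parse", symbol))     # False
print(matches("Parser::pars?(*", symbol))   # True
```

<p>
The second call shows the C++ problem mentioned above: without a wildcard,
the bare name does not cover the parameter signature.
</p>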
    494 <div class="sect2" title="6.3.1.Dump creation options">
    495 <div class="titlepage"><div><div><h3 class="title">
    496 <a name="cl-manual.options.creation"></a>6.3.1.Dump creation options</h3></div></div></div>
    497 <p>
    498 These options influence the name and format of the profile data files.
    499 </p>
    500 <div class="variablelist">
    501 <a name="cl.opts.list.creation"></a><dl>
    502 <dt>
    503 <a name="opt.callgrind-out-file"></a><span class="term">
    504       <code class="option">--callgrind-out-file=&lt;file&gt; </code>
    505     </span>
    506 </dt>
    507 <dd><p>Write the profile data to
    508             <code class="computeroutput">file</code> rather than to the default
    509             output file,
    510             <code class="computeroutput">callgrind.out.&lt;pid&gt;</code>.  The
    511             <code class="option">%p</code> and <code class="option">%q</code> format specifiers
    512             can be used to embed the process ID and/or the contents of an
    513             environment variable in the name, as is the case for the core
    514             option <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
    515             When multiple dumps are made, the file name
    516             is modified further; see below.</p></dd>
    517 <dt>
    518 <a name="opt.dump-line"></a><span class="term">
    519       <code class="option">--dump-line=&lt;no|yes&gt; [default: yes] </code>
    520     </span>
    521 </dt>
    522 <dd><p>This specifies that event counting should be performed at
    523       source line granularity. This allows source annotation for sources
    524       which are compiled with debug information
    525       (<code class="option">-g</code>).</p></dd>
    526 <dt>
    527 <a name="opt.dump-instr"></a><span class="term">
    528       <code class="option">--dump-instr=&lt;no|yes&gt; [default: no] </code>
    529     </span>
    530 </dt>
    531 <dd><p>This specifies that event counting should be performed at
    532       per-instruction granularity.
    533       This allows for assembly code
    534       annotation.  Currently the results can only be 
    535       displayed by KCachegrind.</p></dd>
    536 <dt>
    537 <a name="opt.compress-strings"></a><span class="term">
    538       <code class="option">--compress-strings=&lt;no|yes&gt; [default: yes] </code>
    539     </span>
    540 </dt>
    541 <dd><p>This option influences the output format of the profile data.
    542       It specifies whether strings (file and function names) should be
    543       identified by numbers. This shrinks the file, 
    544       but makes it more difficult
    545       for humans to read (which is not recommended in any case).</p></dd>
    546 <dt>
    547 <a name="opt.compress-pos"></a><span class="term">
    548       <code class="option">--compress-pos=&lt;no|yes&gt; [default: yes] </code>
    549     </span>
    550 </dt>
    551 <dd><p>This option influences the output format of the profile data.
      It specifies whether numerical positions are always specified as absolute
      values or are allowed to be relative to previous numbers.
      Allowing relative positions shrinks the file.</p></dd>
    555 <dt>
    556 <a name="opt.combine-dumps"></a><span class="term">
    557       <code class="option">--combine-dumps=&lt;no|yes&gt; [default: no] </code>
    558     </span>
    559 </dt>
<dd><p>When enabled, multiple profile data parts are appended to the
      same output file instead of being written to separate files.
      Not recommended.</p></dd>
    563 </dl>
    564 </div>
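<p>
As an illustration of what identifying strings by numbers means for
<code class="option">--compress-strings</code>, the Python sketch below
replaces repeated names by numeric IDs. The helper
<code class="computeroutput">compress_names</code> and the sample lines are
hypothetical, and keeping one ID table per key is a simplification of this
sketch. See the profile data format description for the exact rules.
</p>

```python
def compress_names(lines):
    """Replace repeated names by numeric IDs in "key=name" lines.

    The first use of a name is written as "key=(id) name", later uses
    as "key=(id)". One ID table per key is a simplification.
    """
    ids = {}
    out = []
    for line in lines:
        key, name = line.split("=", 1)
        table = ids.setdefault(key, {})
        if name in table:
            out.append(f"{key}=({table[name]})")
        else:
            table[name] = len(table) + 1
            out.append(f"{key}=({table[name]}) {name}")
    return out


print(compress_names(["fl=file1.c", "fn=main", "fn=func1", "fn=main"]))
# ['fl=(1) file1.c', 'fn=(1) main', 'fn=(2) func1', 'fn=(1)']
```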
    565 </div>
    566 <div class="sect2" title="6.3.2.Activity options">
    567 <div class="titlepage"><div><div><h3 class="title">
    568 <a name="cl-manual.options.activity"></a>6.3.2.Activity options</h3></div></div></div>
    569 <p>
    570 These options specify when actions relating to event counts are to
    571 be executed. For interactive control use callgrind_control.
    572 </p>
    573 <div class="variablelist">
    574 <a name="cl.opts.list.activity"></a><dl>
    575 <dt>
    576 <a name="opt.dump-every-bb"></a><span class="term">
    577       <code class="option">--dump-every-bb=&lt;count&gt; [default: 0, never] </code>
    578     </span>
    579 </dt>
    580 <dd><p>Dump profile data every <code class="option">count</code> basic blocks.
    581       Whether a dump is needed is only checked when Valgrind's internal
      scheduler is run. Therefore, the minimum useful setting is about 100000.
    583       The count is a 64-bit value to make long dump periods possible.
    584       </p></dd>
    585 <dt>
    586 <a name="opt.dump-before"></a><span class="term">
    587       <code class="option">--dump-before=&lt;function&gt; </code>
    588     </span>
    589 </dt>
    590 <dd><p>Dump when entering <code class="option">function</code>.</p></dd>
    591 <dt>
    592 <a name="opt.zero-before"></a><span class="term">
    593       <code class="option">--zero-before=&lt;function&gt; </code>
    594     </span>
    595 </dt>
    596 <dd><p>Zero all costs when entering <code class="option">function</code>.</p></dd>
    597 <dt>
    598 <a name="opt.dump-after"></a><span class="term">
    599       <code class="option">--dump-after=&lt;function&gt; </code>
    600     </span>
    601 </dt>
    602 <dd><p>Dump when leaving <code class="option">function</code>.</p></dd>
    603 </dl>
    604 </div>
    605 </div>
    606 <div class="sect2" title="6.3.3.Data collection options">
    607 <div class="titlepage"><div><div><h3 class="title">
    608 <a name="cl-manual.options.collection"></a>6.3.3.Data collection options</h3></div></div></div>
    609 <p>
    610 These options specify when events are to be aggregated into event counts.
    611 Also see <a class="xref" href="cl-manual.html#cl-manual.limits" title="6.2.2.Limiting the range of collected events">Limiting range of event collection</a>.</p>
    612 <div class="variablelist">
    613 <a name="cl.opts.list.collection"></a><dl>
    614 <dt>
    615 <a name="opt.instr-atstart"></a><span class="term">
    616       <code class="option">--instr-atstart=&lt;yes|no&gt; [default: yes] </code>
    617     </span>
    618 </dt>
    619 <dd>
<p>Specify whether you want Callgrind to start simulation and
      profiling from the beginning of the program.
      When set to <code class="computeroutput">no</code>,
      Callgrind will not be able
      to collect any information, including calls, but its slowdown
      will be at most around a factor of 4, which is the minimum Valgrind
      overhead.  Instrumentation can be interactively enabled via
      <code class="computeroutput">callgrind_control -i on</code>.</p>
    628 <p>Note that the resulting call graph will most probably not
    629       contain <code class="function">main</code>, but will contain all the
    630       functions executed after instrumentation was enabled.
      Instrumentation can also be enabled/disabled programmatically. See the
    632       Callgrind include file
    633       <code class="computeroutput">callgrind.h</code> for the macro
    634       you have to use in your source code.</p>
    635 <p>For cache
    636       simulation, results will be less accurate when switching on
    637       instrumentation later in the program run, as the simulator starts
    638       with an empty cache at that moment.  Switch on event collection
    639       later to cope with this error.</p>
    640 </dd>
    641 <dt>
    642 <a name="opt.collect-atstart"></a><span class="term">
    643       <code class="option">--collect-atstart=&lt;yes|no&gt; [default: yes] </code>
    644     </span>
    645 </dt>
    646 <dd>
    647 <p>Specify whether event collection is enabled at beginning
    648       of the profile run.</p>
    649 <p>To only look at parts of your program, you have two
    650       possibilities:</p>
    651 <div class="orderedlist"><ol class="orderedlist" type="1">
    652 <li class="listitem"><p>Zero event counters before entering the program part you
    653         want to profile, and dump the event counters to a file after
    654         leaving that program part.</p></li>
    655 <li class="listitem"><p>Switch on/off collection state as needed to only see
    656           event counters happening while inside of the program part you
    657           want to profile.</p></li>
    658 </ol></div>
<p>The second option can be used if the program part you want to
      profile is called many times. Option 1, i.e. creating a lot of
      dumps, is not practical in that case.</p>
    662 <p>Collection state can be
    663       toggled at entry and exit of a given function with the
    664       option <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.  If you
    665       use this option, collection
    666       state should be disabled at the beginning.  Note that the
    667       specification of <code class="option">--toggle-collect</code>
    668       implicitly sets
      <code class="option">--collect-atstart=no</code>.</p>
<p>Collection state can also be toggled by inserting the client request
      <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT;</code>
      at the needed code positions.</p>
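<p>The two possibilities listed above can be written down as command
      lines. The Python helper below only assembles the argument list; it
      does not run anything. The program name
      <code class="computeroutput">./myprog</code>, the function name
      <code class="computeroutput">compute</code> and the helper itself are
      hypothetical.</p>

```python
def callgrind_argv(program, function, many_calls):
    """Build a Callgrind command line restricted to one function.

    many_calls=False: zero counters on entry, dump on exit (option 1).
    many_calls=True:  toggle collection on entry/exit (option 2).
    """
    argv = ["valgrind", "--tool=callgrind"]
    if many_calls:
        argv.append(f"--toggle-collect={function}")
    else:
        argv += [f"--zero-before={function}", f"--dump-after={function}"]
    return argv + [program]


print(" ".join(callgrind_argv("./myprog", "compute", many_calls=False)))
# valgrind --tool=callgrind --zero-before=compute --dump-after=compute ./myprog
print(" ".join(callgrind_argv("./myprog", "compute", many_calls=True)))
# valgrind --tool=callgrind --toggle-collect=compute ./myprog
```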
    676 </dd>
    677 <dt>
    678 <a name="opt.toggle-collect"></a><span class="term">
    679       <code class="option">--toggle-collect=&lt;function&gt; </code>
    680     </span>
    681 </dt>
    682 <dd><p>Toggle collection on entry/exit of <code class="option">function</code>.</p></dd>
    683 <dt>
    684 <a name="opt.collect-jumps"></a><span class="term">
    685       <code class="option">--collect-jumps=&lt;no|yes&gt; [default: no] </code>
    686     </span>
    687 </dt>
<dd><p>This specifies whether information for (conditional) jumps
      should be collected.  As with <code class="option">--dump-instr</code>,
      callgrind_annotate currently is not able to show you the data.
      You have to use KCachegrind to get jump
      arrows in the annotated code.</p></dd>
    692 <dt>
    693 <a name="opt.collect-systime"></a><span class="term">
    694       <code class="option">--collect-systime=&lt;no|yes&gt; [default: no] </code>
    695     </span>
    696 </dt>
    697 <dd><p>This specifies whether information for system call times
    698       should be collected.</p></dd>
    699 <dt>
    700 <a name="clopt.collect-bus"></a><span class="term">
    701       <code class="option">--collect-bus=&lt;no|yes&gt; [default: no] </code>
    702     </span>
    703 </dt>
    704 <dd><p>This specifies whether the number of global bus events executed
    705       should be collected. The event type "Ge" is used for these events.</p></dd>
    706 </dl>
    707 </div>
    708 </div>
    709 <div class="sect2" title="6.3.4.Cost entity separation options">
    710 <div class="titlepage"><div><div><h3 class="title">
    711 <a name="cl-manual.options.separation"></a>6.3.4.Cost entity separation options</h3></div></div></div>
    712 <p>
    713 These options specify how event counts should be attributed to execution
    714 contexts.
    715 For example, they specify whether the recursion level or the
    716 call chain leading to a function should be taken into account, 
    717 and whether the thread ID should be considered.
    718 Also see <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p>
    719 <div class="variablelist">
    720 <a name="cmd-options.separation"></a><dl>
    721 <dt>
    722 <a name="opt.separate-threads"></a><span class="term">
    723       <code class="option">--separate-threads=&lt;no|yes&gt; [default: no] </code>
    724     </span>
    725 </dt>
    726 <dd><p>This option specifies whether profile data should be generated
    727       separately for every thread. If yes, the file names get "-threadID"
    728       appended.</p></dd>
    729 <dt>
    730 <a name="opt.separate-callers"></a><span class="term">
    731       <code class="option">--separate-callers=&lt;callers&gt; [default: 0] </code>
    732     </span>
    733 </dt>
    734 <dd><p>Separate contexts by at most &lt;callers&gt; functions in the
    735       call chain. See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
    736 <dt>
    737 <a name="opt.separate-callers-num"></a><span class="term">
    738       <code class="option">--separate-callers&lt;number&gt;=&lt;function&gt; </code>
    739     </span>
    740 </dt>
    741 <dd><p>Separate <code class="option">number</code> callers for <code class="option">function</code>.
    742       See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
    743 <dt>
    744 <a name="opt.separate-recs"></a><span class="term">
    745       <code class="option">--separate-recs=&lt;level&gt; [default: 2] </code>
    746     </span>
    747 </dt>
    748 <dd><p>Separate function recursions by at most <code class="option">level</code> levels.
    749       See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
    750 <dt>
    751 <a name="opt.separate-recs-num"></a><span class="term">
    752       <code class="option">--separate-recs&lt;number&gt;=&lt;function&gt; </code>
    753     </span>
    754 </dt>
    755 <dd><p>Separate <code class="option">number</code> recursions for <code class="option">function</code>.
    756       See <a class="xref" href="cl-manual.html#cl-manual.cycles" title="6.2.4.Avoiding cycles">Avoiding cycles</a>.</p></dd>
    757 <dt>
    758 <a name="opt.skip-plt"></a><span class="term">
    759       <code class="option">--skip-plt=&lt;no|yes&gt; [default: yes] </code>
    760     </span>
    761 </dt>
    762 <dd><p>Ignore calls to/from PLT sections.</p></dd>
    763 <dt>
    764 <a name="opt.skip-direct-rec"></a><span class="term">
    765       <code class="option">--skip-direct-rec=&lt;no|yes&gt; [default: yes] </code>
    766     </span>
    767 </dt>
    768 <dd><p>Ignore direct recursions.</p></dd>
    769 <dt>
    770 <a name="opt.fn-skip"></a><span class="term">
    771       <code class="option">--fn-skip=&lt;function&gt; </code>
    772     </span>
    773 </dt>
    774 <dd>
    775 <p>Ignore calls to/from a given function.  E.g. if you have a
    776       call chain A &gt; B &gt; C, and you specify function B to be
    777       ignored, you will only see A &gt; C.</p>
<p>This is very convenient for skipping functions that handle callback
      behaviour.  For example, with the signal/slot mechanism in the
      Qt graphics library, you only want
      to see the function emitting a signal call the slots connected
      to that signal. First, determine the real call chain to see which
      functions need to be skipped, then use this option.</p>
    784 </dd>
    785 </dl>
    786 </div>
    787 </div>
    788 <div class="sect2" title="6.3.5.Simulation options">
    789 <div class="titlepage"><div><div><h3 class="title">
    790 <a name="cl-manual.options.simulation"></a>6.3.5.Simulation options</h3></div></div></div>
    791 <div class="variablelist">
    792 <a name="cl.opts.list.simulation"></a><dl>
    793 <dt>
    794 <a name="clopt.cache-sim"></a><span class="term">
    795       <code class="option">--cache-sim=&lt;yes|no&gt; [default: no] </code>
    796     </span>
    797 </dt>
    798 <dd><p>Specify if you want to do full cache simulation.  By default,
    799       only instruction read accesses will be counted ("Ir").
    800       With cache simulation, further event counters are enabled:
    801       Cache misses on instruction reads ("I1mr"/"ILmr"),
    802       data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
    803       data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
    804       For more information, see <a class="xref" href="cg-manual.html" title="5.Cachegrind: a cache and branch-prediction profiler">Cachegrind: a cache and branch-prediction profiler</a>.
    805       </p></dd>
    806 <dt>
    807 <a name="clopt.branch-sim"></a><span class="term">
    808       <code class="option">--branch-sim=&lt;yes|no&gt; [default: no] </code>
    809     </span>
    810 </dt>
    811 <dd><p>Specify if you want to do branch prediction simulation.
    812       Further event counters are enabled: Number of executed conditional
    813       branches and related predictor misses ("Bc"/"Bcm"), executed indirect
    814       jumps and related misses of the jump address predictor ("Bi"/"Bim").
    815       </p></dd>
    816 </dl>
    817 </div>
    818 </div>
    819 <div class="sect2" title="6.3.6.Cache simulation options">
    820 <div class="titlepage"><div><div><h3 class="title">
    821 <a name="cl-manual.options.cachesimulation"></a>6.3.6.Cache simulation options</h3></div></div></div>
    822 <div class="variablelist">
    823 <a name="cl.opts.list.cachesimulation"></a><dl>
    824 <dt>
    825 <a name="opt.simulate-wb"></a><span class="term">
    826       <code class="option">--simulate-wb=&lt;yes|no&gt; [default: no] </code>
    827     </span>
    828 </dt>
<dd><p>Specify whether write-back behavior should be simulated, making it
      possible to distinguish LL cache misses with and without write backs.
      The cache model of Cachegrind/Callgrind does not specify write-through
      vs. write-back behavior, and this is also not relevant for the number
      of generated miss counts. However, with explicit write-back simulation
      it can be decided whether a miss triggers not only the loading of a new
      cache line, but also a preceding write back of a dirty cache
      line. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
      for misses because of instruction read, data read, and data write,
      respectively. As they produce two memory transactions, they should
      account for a doubled time estimation in relation to a normal miss.
      </p></dd>
    841 <dt>
    842 <a name="opt.simulate-hwpref"></a><span class="term">
    843       <code class="option">--simulate-hwpref=&lt;yes|no&gt; [default: no] </code>
    844     </span>
    845 </dt>
<dd><p>Specify whether simulation of a hardware prefetcher should be
      added. The simulated prefetcher detects stream accesses in the second
      level cache by comparing accesses separately for each page.
      As the simulation cannot decide about any timing issues of prefetching,
      it is assumed that any hardware prefetch triggered succeeds before a
      real access is done. Thus, this gives a best-case scenario by covering
      all possible stream accesses.</p></dd>
    853 <dt>
    854 <a name="opt.cacheuse"></a><span class="term">
    855       <code class="option">--cacheuse=&lt;yes|no&gt; [default: no] </code>
    856     </span>
    857 </dt>
    858 <dd><p>Specify whether cache line use should be collected. For every
    859       cache line, from loading to it being evicted, the number of accesses
    860       as well as the number of actually used bytes is determined. This
    861       behavior is related to the code which triggered loading of the cache
      line. In contrast to miss counters, which show the position where
      the symptoms of bad cache behavior (i.e. latencies) happen, the
      use counters try to pinpoint the reason (i.e. the code with the
    865       bad access behavior). The new counters are defined in a way such
    866       that worse behavior results in higher cost.
    867       AcCost1 and AcCost2 are counters showing bad temporal locality
    868       for L1 and LL caches, respectively. This is done by summing up
    869       reciprocal values of the numbers of accesses of each cache line,
    870       multiplied by 1000 (as only integer costs are allowed). E.g. for
    871       a given source line with 5 read accesses, a value of 5000 AcCost
    872       means that for every access, a new cache line was loaded and directly
    873       evicted afterwards without further accesses. Similarly, SpLoss1/2
    874       shows bad spatial locality for L1 and LL caches, respectively. It
    875       gives the <span class="emphasis"><em>spatial loss</em></span> count of bytes which
      were loaded into cache but never accessed. It pinpoints code
    877       accessing data in a way such that cache space is wasted. This hints
    878       at bad layout of data structures in memory. Assuming a cache line
    879       size of 64 bytes and 100 L1 misses for a given source line, the
    880       loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
    881       value of 3200 for this line, this means that half of the loaded data was
      never used, or that with a better data layout, only half of the cache
    883       space would have been needed.
    884       Please note that for cache line use counters, it currently is
    885       not possible to provide meaningful inclusive costs. Therefore,
    886       inclusive cost of these counters should be ignored.
    887       </p></dd>
    888 <dt>
    889 <a name="opt.I1"></a><span class="term">
    890       <code class="option">--I1=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
    891     </span>
    892 </dt>
    893 <dd><p>Specify the size, associativity and line size of the level 1
    894       instruction cache.  </p></dd>
    895 <dt>
    896 <a name="opt.D1"></a><span class="term">
    897       <code class="option">--D1=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
    898     </span>
    899 </dt>
    900 <dd><p>Specify the size, associativity and line size of the level 1
    901       data cache.</p></dd>
    902 <dt>
    903 <a name="opt.LL"></a><span class="term">
    904       <code class="option">--LL=&lt;size&gt;,&lt;associativity&gt;,&lt;line size&gt; </code>
    905     </span>
    906 </dt>
    907 <dd><p>Specify the size, associativity and line size of the last-level
    908       cache.</p></dd>
    909 </dl>
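<p>The arithmetic behind the two worked examples given for
<code class="option">--cacheuse</code> above can be reproduced with a short
Python sketch. The function names are hypothetical, and the use of integer
division is an assumption: the description only says that costs have to be
integers, not how Callgrind rounds.</p>

```python
def ac_cost(accesses_per_line):
    """Sum of 1000/n over cache lines, n being accesses before eviction.

    Integer division is an assumption of this sketch; how Callgrind
    rounds is not stated in the description.
    """
    return sum(1000 // n for n in accesses_per_line)


def unused_fraction(misses, line_size, sp_loss):
    """Fraction of the bytes loaded into the cache that were never accessed."""
    loaded_bytes = misses * line_size
    return sp_loss / loaded_bytes


print(ac_cost([1, 1, 1, 1, 1]))        # 5000
print(ac_cost([5]))                    # 200
print(unused_fraction(100, 64, 3200))  # 0.5
```

<p>Five cache lines with one access each give the AcCost of 5000 from the
text, whereas a single line serving all five accesses gives 200. The last
call reproduces the SpLoss1 example: 3200 of 6400 loaded bytes unused.</p>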
    910 </div>
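<p>The three values given to <code class="option">--I1</code>,
<code class="option">--D1</code> and <code class="option">--LL</code>
determine the number of cache lines and sets by standard cache arithmetic.
The Python sketch below assumes that size and line size are given in bytes;
the helper <code class="computeroutput">cache_geometry</code> and the example
values are hypothetical.</p>

```python
def cache_geometry(spec):
    """Parse "size,associativity,line size" and derive line and set counts."""
    size, assoc, line_size = (int(field) for field in spec.split(","))
    lines = size // line_size          # number of cache lines
    return {"size": size, "assoc": assoc, "line_size": line_size,
            "lines": lines, "sets": lines // assoc}


print(cache_geometry("65536,2,64"))
# {'size': 65536, 'assoc': 2, 'line_size': 64, 'lines': 1024, 'sets': 512}
```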
    911 </div>
    912 </div>
    913 <div class="sect1" title="6.4.Callgrind specific client requests">
    914 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    915 <a name="cl-manual.clientrequests"></a>6.4.Callgrind specific client requests</h2></div></div></div>
    916 <p>Callgrind provides the following specific client requests in
    917 <code class="filename">callgrind.h</code>.  See that file for the exact details of
    918 their arguments.</p>
    919 <div class="variablelist">
    920 <a name="cl.clientrequests.list"></a><dl>
    921 <dt>
    922 <a name="cr.dump-stats"></a><span class="term">
    923       <code class="computeroutput">CALLGRIND_DUMP_STATS</code>
    924     </span>
    925 </dt>
    926 <dd><p>Force generation of a profile dump at specified position
    927       in code, for the current thread only. Written counters will be reset
    928       to zero.</p></dd>
    929 <dt>
    930 <a name="cr.dump-stats-at"></a><span class="term">
    931       <code class="computeroutput">CALLGRIND_DUMP_STATS_AT(string)</code>
    932     </span>
    933 </dt>
<dd><p>Same as <code class="computeroutput">CALLGRIND_DUMP_STATS</code>,
      but lets you specify a string to distinguish profile
      dumps.</p></dd>
    937 <dt>
    938 <a name="cr.zero-stats"></a><span class="term">
    939       <code class="computeroutput">CALLGRIND_ZERO_STATS</code>
    940     </span>
    941 </dt>
    942 <dd><p>Reset the profile counters for the current thread to zero.</p></dd>
    943 <dt>
    944 <a name="cr.toggle-collect"></a><span class="term">
    945       <code class="computeroutput">CALLGRIND_TOGGLE_COLLECT</code>
    946     </span>
    947 </dt>
<dd><p>Toggle the collection state. This makes it possible to ignore events
      with regard to profile counters. See also options
    950       <code class="option"><a class="xref" href="cl-manual.html#opt.collect-atstart">--collect-atstart</a></code> and
    951       <code class="option"><a class="xref" href="cl-manual.html#opt.toggle-collect">--toggle-collect</a></code>.</p></dd>
    952 <dt>
    953 <a name="cr.start-instr"></a><span class="term">
    954       <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION</code>
    955     </span>
    956 </dt>
    957 <dd><p>Start full Callgrind instrumentation if not already enabled.
    958       When cache simulation is done, this will flush the simulated cache
      and lead to an artificial cache warmup phase afterwards with
    960       cache misses which would not have happened in reality.  See also
    961       option <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
    962 <dt>
    963 <a name="cr.stop-instr"></a><span class="term">
    964       <code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION</code>
    965     </span>
    966 </dt>
    967 <dd><p>Stop full Callgrind instrumentation if not already disabled.
      This flushes Valgrind's translation cache, and does no additional
      instrumentation afterwards: it effectively will run at the same
    970       speed as Nulgrind, i.e. at minimal slowdown. Use this to
    971       speed up the Callgrind run for uninteresting code parts. Use
    972       <code class="computeroutput"><a class="xref" href="cl-manual.html#cr.start-instr">CALLGRIND_START_INSTRUMENTATION</a></code> to
    973       enable instrumentation again.  See also option
    974       <code class="option"><a class="xref" href="cl-manual.html#opt.instr-atstart">--instr-atstart</a></code>.</p></dd>
    975 </dl>
    976 </div>
    977 </div>
    978 <div class="sect1" title="6.5.callgrind_annotate Command-line Options">
    979 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    980 <a name="cl-manual.callgrind_annotate-options"></a>6.5.callgrind_annotate Command-line Options</h2></div></div></div>
    981 <div class="variablelist">
    982 <a name="callgrind_annotate.opts.list"></a><dl>
    983 <dt><span class="term"><code class="option">-h --help</code></span></dt>
    984 <dd><p>Show summary of options.</p></dd>
    985 <dt><span class="term"><code class="option">--version</code></span></dt>
    986 <dd><p>Show version of callgrind_annotate.</p></dd>
    987 <dt><span class="term">
    988       <code class="option">--show=A,B,C [default: all]</code>
    989     </span></dt>
    990 <dd><p>Only show figures for events A,B,C.</p></dd>
    991 <dt><span class="term">
    992       <code class="option">--sort=A,B,C</code>
    993     </span></dt>
<dd><p>Sort columns by events A,B,C. The default is the event column order.</p></dd>
    995 <dt><span class="term">
    996       <code class="option">--threshold=&lt;0--100&gt; [default: 99%] </code>
    997     </span></dt>
    998 <dd><p>Percentage of counts (of primary sort event) we are 
    999       interested in.</p></dd>
   1000 <dt><span class="term">
   1001       <code class="option">--auto=&lt;yes|no&gt; [default: no] </code>
   1002     </span></dt>
   1003 <dd><p>Annotate all source files containing functions that helped 
   1004       reach the event count threshold.</p></dd>
   1005 <dt><span class="term">
   1006       <code class="option">--context=N [default: 8] </code>
   1007     </span></dt>
   1008 <dd><p>Print N lines of context before and after annotated 
   1009       lines.</p></dd>
   1010 <dt><span class="term">
   1011       <code class="option">--inclusive=&lt;yes|no&gt; [default: no] </code>
   1012     </span></dt>
   1013 <dd><p>Add subroutine costs to function calls.</p></dd>
   1014 <dt><span class="term">
   1015       <code class="option">--tree=&lt;none|caller|calling|both&gt; [default: none] </code>
   1016     </span></dt>
   1017 <dd><p>For each function, print its callers, the functions it calls,
   1018       or both.</p></dd>
   1019 <dt><span class="term">
   1020       <code class="option">-I, --include=&lt;dir&gt; </code>
   1021     </span></dt>
   1022 <dd><p>Add <code class="option">dir</code> to the list of directories to search
   1023       for source files.</p></dd>
   1024 </dl>
   1025 </div>
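<p>As a rough illustration, the options above might be combined as shown
  below. This is a sketch rather than a recommended workflow: the profile
  file name <code class="computeroutput">callgrind.out.12345</code> and the
  directory <code class="computeroutput">src</code> are placeholders, and
  <code class="computeroutput">Ir</code> (instructions executed) is used
  because Callgrind collects it by default.</p>
<pre class="screen">
# Inclusive costs, with the callers of each function listed
callgrind_annotate --inclusive=yes --tree=caller callgrind.out.12345

# Show and sort by Ir only, covering 90% of the Ir counts
callgrind_annotate --show=Ir --sort=Ir --threshold=90 callgrind.out.12345

# Annotate source files automatically, with 4 lines of context,
# also searching the placeholder directory src for the sources
callgrind_annotate --auto=yes --context=4 --include=src callgrind.out.12345
</pre>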
   1026 </div>
   1027 <div class="sect1" title="6.6.callgrind_control Command-line Options">
   1028 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
   1029 <a name="cl-manual.callgrind_control-options"></a>6.6.callgrind_control Command-line Options</h2></div></div></div>
   1030 <p>By default, callgrind_control acts on all programs run by the
   1031   current user under Callgrind.  To limit the actions to specific
   1032   Callgrind runs, provide a list of pids or program names as
   1033   arguments.  The default action is to give some brief information about the
   1034   applications being run under Callgrind.</p>
   1035 <div class="variablelist">
   1036 <a name="callgrind_control.opts.list"></a><dl>
   1037 <dt><span class="term"><code class="option">-h --help</code></span></dt>
   1038 <dd><p>Show a short description, usage, and summary of options.</p></dd>
   1039 <dt><span class="term"><code class="option">--version</code></span></dt>
   1040 <dd><p>Show version of callgrind_control.</p></dd>
   1041 <dt><span class="term"><code class="option">-l --long</code></span></dt>
   1042 <dd><p>Also show the working directory, in addition to the brief
   1043       information given by default.
   1044       </p></dd>
   1045 <dt><span class="term"><code class="option">-s --stat</code></span></dt>
   1046 <dd><p>Show statistics information about active Callgrind runs.</p></dd>
   1047 <dt><span class="term"><code class="option">-b --back</code></span></dt>
   1048 <dd><p>Show stack/back traces of each thread in active Callgrind runs. For
   1049       each active function in the stack trace, the number of invocations
   1050       since program start (or the last dump) is also shown. This option can be
   1051       combined with <code class="option">-e</code> to show the inclusive cost of active functions.</p></dd>
   1052 <dt><span class="term"><code class="option">-e [A,B,...] </code> (default: all)</span></dt>
   1053 <dd><p>Show the current per-thread, exclusive cost values of event
   1054       counters. If no explicit event names are given, figures for all event
   1055       types collected in the given Callgrind run are
   1056       shown. Otherwise, only figures for event types A, B, ... are shown. If
   1057       this option is combined with <code class="option">-b</code>, the inclusive cost for the function of
   1058       each active stack frame is shown as well.
   1059       </p></dd>
   1060 <dt><span class="term"><code class="option">--dump[=&lt;desc&gt;] </code> (default: no description)</span></dt>
   1061 <dd><p>Request the dumping of profile information. Optionally, a
   1062       description can be given; it is written into the dump as part of
   1063       the information on what triggered the dump. This
   1064       can be used to distinguish multiple dumps.</p></dd>
   1065 <dt><span class="term"><code class="option">-z --zero</code></span></dt>
   1066 <dd><p>Zero all event counters.</p></dd>
   1067 <dt><span class="term"><code class="option">-k --kill</code></span></dt>
   1068 <dd><p>Force a Callgrind run to be terminated.</p></dd>
   1069 <dt><span class="term"><code class="option">--instr=&lt;on|off&gt;</code></span></dt>
   1070 <dd><p>Switch instrumentation mode on or off. If a Callgrind run has
   1071       instrumentation disabled, no simulation is done and no events are
   1072       counted. This is useful to skip uninteresting program parts, as there
   1073       is much less slowdown (same as with the Valgrind tool "none"). See also
   1074       the Callgrind option <code class="option">--instr-atstart</code>.</p></dd>
   1075 <dt><span class="term"><code class="option">-w=&lt;dir&gt;</code></span></dt>
   1076 <dd><p>Specify the startup directory of an active Callgrind run. On some
   1077       systems, active Callgrind runs cannot be detected automatically. To
   1078       control such a run, work around the failed auto-detection by
   1079       specifying the directory where the Callgrind run was started.</p></dd>
   1080 </dl>
   1081 </div>
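<p>The following sketch shows how these options might be used to profile
  only one phase of a long-running program. The program name
  <code class="computeroutput">./myprog</code> and the dump description
  <code class="computeroutput">phase1</code> are hypothetical; the
  callgrind_control commands are assumed to be issued from a second
  terminal while the program is running, and they act on all Callgrind
  runs of the current user because no pid is given.</p>
<pre class="screen">
# Terminal 1: start the program with instrumentation switched off
valgrind --tool=callgrind --instr-atstart=no ./myprog

# Terminal 2: list the active Callgrind runs (the default action)
callgrind_control

# Switch instrumentation on when the phase of interest begins
callgrind_control --instr=on

# Show stack traces together with inclusive costs
callgrind_control -b -e

# Write a dump tagged with a description, then switch instrumentation off
callgrind_control --dump=phase1
callgrind_control --instr=off
</pre>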
   1082 </div>
   1083 </div>
   1096 </body>
   1097 </html>
   1098