      1 <?xml version="1.0"?> <!-- -*- sgml -*- -->
      2 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
      3   "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
      4 [ <!ENTITY % vg-entities SYSTEM "../../docs/xml/vg-entities.xml"> %vg-entities; ]>
      5 
      6 <chapter id="cl-manual" xreflabel="Callgrind Manual">
      7 <title>Callgrind: a call-graph generating cache and branch prediction profiler</title>
      8 
      9 
     10 <para>To use this tool, you must specify
     11 <option>--tool=callgrind</option> on the
     12 Valgrind command line.</para>
     13 
     14 <sect1 id="cl-manual.use" xreflabel="Overview">
     15 <title>Overview</title>
     16 
     17 <para>Callgrind is a profiling tool that records the call history among
     18 functions in a program's run as a call-graph.
     19 By default, the collected data consists of
     20 the number of instructions executed, their relationship
     21 to source lines, the caller/callee relationship between functions,
     22 and the numbers of such calls.
     23 Optionally, cache simulation and/or branch prediction (similar to Cachegrind)
     24 can produce further information about the runtime behavior of an application.
     25 </para>
     26 
     27 <para>The profile data is written out to a file at program
     28 termination. For presentation of the data, and interactive control
     29 of the profiling, two command line tools are provided:</para>
     30 <variablelist>
     31   <varlistentry>
     32   <term><command>callgrind_annotate</command></term>
     33   <listitem>
    <para>This command reads in the profile data, and prints a
    sorted list of functions, optionally with source annotation.</para>
     36 
     37     <para>For graphical visualization of the data, try
     38     <ulink url="&cl-gui-url;">KCachegrind</ulink>, which is a KDE/Qt based
     39     GUI that makes it easy to navigate the large amount of data that
     40     Callgrind produces.</para>
     41 
     42   </listitem>
     43   </varlistentry>
     44 
     45   <varlistentry>
     46   <term><command>callgrind_control</command></term>
     47   <listitem>
     48     <para>This command enables you to interactively observe and control 
     49     the status of a program currently running under Callgrind's control,
     50     without stopping the program.  You can get statistics information as
     51     well as the current stack trace, and you can request zeroing of counters
     52     or dumping of profile data.</para>
     53   </listitem>
     54   </varlistentry>
     55 </variablelist>
     56 
     57   <sect2 id="cl-manual.functionality" xreflabel="Functionality">
     58   <title>Functionality</title>
     59 
     60 <para>Cachegrind collects flat profile data: event counts (data reads,
     61 cache misses, etc.) are attributed directly to the function they
     62 occurred in.  This cost attribution mechanism is
     63 called <emphasis>self</emphasis> or <emphasis>exclusive</emphasis>
     64 attribution.</para>
     65 
     66 <para>Callgrind extends this functionality by propagating costs
     67 across function call boundaries.  If function <function>foo</function> calls
     68 <function>bar</function>, the costs from <function>bar</function> are added into
<function>foo</function>'s costs.  When applied to the program as a whole,
this builds up a picture of so-called <emphasis>inclusive</emphasis>
costs, in which the cost of each function includes the costs of
all functions it called, directly or indirectly.</para>
     73 
     74 <para>As an example, the inclusive cost of
     75 <function>main</function> should be almost 100 percent
     76 of the total program cost.  Because of costs arising before 
     77 <function>main</function> is run, such as
     78 initialization of the run time linker and construction of global C++
     79 objects, the inclusive cost of <function>main</function>
     80 is not exactly 100 percent of the total program cost.</para>
     81 
     82 <para>Together with the call graph, this allows you to find the
     83 specific call chains starting from
     84 <function>main</function> in which the majority of the
     85 program's costs occur.  Caller/callee cost attribution is also useful
     86 for profiling functions called from multiple call sites, and where
     87 optimization opportunities depend on changing code in the callers, in
     88 particular by reducing the call count.</para>
     89 
     90 <para>Callgrind's cache simulation is based on that of Cachegrind.
     91 Read the documentation for <xref linkend="&vg-cg-manual-id;"/> first.  The material
     92 below describes the features supported in addition to Cachegrind's
     93 features.</para>
     94 
     95 <para>Callgrind's ability to detect function calls and returns depends
     96 on the instruction set of the platform it is run on.  It works best on
     97 x86 and amd64, and unfortunately currently does not work so well on
     98 PowerPC, ARM, Thumb or MIPS code.  This is because there are no explicit
     99 call or return instructions in these instruction sets, so Callgrind
    100 has to rely on heuristics to detect calls and returns.</para>
    101 
    102   </sect2>
    103 
    104   <sect2 id="cl-manual.basics" xreflabel="Basic Usage">
    105   <title>Basic Usage</title>
    106 
    107   <para>As with Cachegrind, you probably want to compile with debugging info
    108   (the <option>-g</option> option) and with optimization turned on.</para>
    109 
    110   <para>To start a profile run for a program, execute:
    111   <screen>valgrind --tool=callgrind [callgrind options] your-program [program options]</screen>
    112   </para>
    113 
    114   <para>While the simulation is running, you can observe execution with:
    115   <screen>callgrind_control -b</screen>
    116   This will print out the current backtrace. To annotate the backtrace with
    117   event counts, run
    118   <screen>callgrind_control -e -b</screen>
    119   </para>
    120 
    121   <para>After program termination, a profile data file named 
    122   <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>
    123   is generated, where <emphasis>pid</emphasis> is the process ID 
    124   of the program being profiled.
    125   The data file contains information about the calls made in the
    126   program among the functions executed, together with 
    127   <command>Instruction Read</command> (Ir) event counts.</para>
    128 
    129   <para>To generate a function-by-function summary from the profile
    130   data file, use
    131   <screen>callgrind_annotate [options] callgrind.out.&lt;pid&gt;</screen>
  This summary is similar to the output you get from a Cachegrind
  run with cg_annotate: the functions are sorted by exclusive cost,
  and the exclusive cost is what is shown for each function.
  The following two options are important for the additional
  features of Callgrind:</para>
    138 
    139   <itemizedlist>
    140     <listitem>
    141       <para><option>--inclusive=yes</option>: Instead of using
    142       exclusive cost of functions as sorting order, use and show
    143       inclusive cost.</para>
    144     </listitem>
    145 
    146     <listitem>
    147       <para><option>--tree=both</option>: Interleave into the
    148       top level list of functions, information on the callers and the callees
      of each function. In these lines, which represent executed
    150       calls, the cost gives the number of events spent in the call.
    151       Indented, above each function, there is the list of callers,
    152       and below, the list of callees. The sum of events in calls to
    153       a given function (caller lines), as well as the sum of events in
    154       calls from the function (callee lines) together with the self
    155       cost, gives the total inclusive cost of the function.</para>
    156      </listitem>
    157   </itemizedlist>
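  <para>As a rough illustration of the two options above, the following
  invocations could be run on the data file of a finished profile run. The
  file name <computeroutput>callgrind.out.12345</computeroutput> is a
  placeholder; substitute the name of your own data file.</para>
<screen>
# Sort by and show inclusive cost instead of exclusive cost
callgrind_annotate --inclusive=yes callgrind.out.12345

# Additionally list the callers above and the callees below each function
callgrind_annotate --inclusive=yes --tree=both callgrind.out.12345
</screen>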
    158 
    159   <para>Use <option>--auto=yes</option> to get annotated source code
    160   for all relevant functions for which the source can be found. In
    161   addition to source annotation as produced by
    162   <computeroutput>cg_annotate</computeroutput>, you will see the
    163   annotated call sites with call counts. For all other options, 
    164   consult the (Cachegrind) documentation for
    165   <computeroutput>cg_annotate</computeroutput>.
    166   </para>
    167 
  <para>For a better call graph browsing experience, it is highly recommended
  to use <ulink url="&cl-gui-url;">KCachegrind</ulink>.
    170   If your code
    171   has a significant fraction of its cost in <emphasis>cycles</emphasis> (sets
    172   of functions calling each other in a recursive manner), you have to
    173   use KCachegrind, as <computeroutput>callgrind_annotate</computeroutput>
    174   currently does not do any cycle detection, which is important to get correct
    175   results in this case.</para>
    176 
    177   <para>If you are additionally interested in measuring the 
    178   cache behavior of your program, use Callgrind with the option
    179   <option><xref linkend="clopt.cache-sim"/>=yes</option>. For
    180   branch prediction simulation, use <option><xref linkend="clopt.branch-sim"/>=yes</option>.
  Expect a further slowdown by a factor of approximately 2.</para>
    182 
    183   <para>If the program section you want to profile is somewhere in the
    184   middle of the run, it is beneficial to 
    185   <emphasis>fast forward</emphasis> to this section without any 
    186   profiling, and then enable profiling.  This is achieved by using
    187   the command line option
    188   <option><xref linkend="opt.instr-atstart"/>=no</option> 
    189   and running, in a shell:
    190   <computeroutput>callgrind_control -i on</computeroutput> just before the 
    191   interesting code section is executed. To exactly specify
    192   the code position where profiling should start, use the client request
    193   <computeroutput><xref linkend="cr.start-instr"/></computeroutput>.</para>
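  <para>A possible session for the interactive variant is sketched below.
  The program name <computeroutput>./your-program</computeroutput> is a
  placeholder, and the second command is assumed to be issued from another
  shell while the program is running.</para>
<screen>
# Shell 1: start without instrumentation, so the program runs quickly
valgrind --tool=callgrind --instr-atstart=no ./your-program

# Shell 2: switch instrumentation on shortly before the interesting section
callgrind_control -i on
</screen>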
    194 
    195   <para>If you want to be able to see assembly code level annotation, specify
    196   <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
  profile data at instruction granularity. Note that the resulting profile
  data can only be viewed with KCachegrind. For assembly annotation, it is
  also useful to see more details of the control flow inside functions,
  i.e. (conditional) jumps. These are collected if you additionally specify
  <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para>
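  <para>Both options can be combined in a single run. A minimal sketch,
  with <computeroutput>./your-program</computeroutput> as a placeholder:</para>
<screen>
# Collect per-instruction costs together with jump information
valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes ./your-program
</screen>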
    203 
    204   </sect2>
    205 
    206 </sect1>
    207 
    208 <sect1 id="cl-manual.usage" xreflabel="Advanced Usage">
    209 <title>Advanced Usage</title>
    210 
    211   <sect2 id="cl-manual.dumps" 
    212          xreflabel="Multiple dumps from one program run">
    213   <title>Multiple profiling dumps from one program run</title>
    214 
    215   <para>Sometimes you are not interested in characteristics of a full 
    216   program run, but only of a small part of it, for example execution of one
    217   algorithm.  If there are multiple algorithms, or one algorithm 
    218   running with different input data, it may even be useful to get different
    219   profile information for different parts of a single program run.</para>
    220 
    221   <para>Profile data files have names of the form
    222 <screen>
    223 callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis>
    224 </screen>
    225   </para>
    226   <para>where <emphasis>pid</emphasis> is the PID of the running 
    227   program, <emphasis>part</emphasis> is a number incremented on each
    228   dump (".part" is skipped for the dump at program termination), and 
    229   <emphasis>threadID</emphasis> is a thread identification 
    230   ("-threadID" is only used if you request dumps of individual 
    231   threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para>
    232 
    233   <para>There are different ways to generate multiple profile dumps 
    234   while a program is running under Callgrind's supervision.  Nevertheless,
    235   all methods trigger the same action, which is "dump all profile 
    236   information since the last dump or program start, and zero cost 
    237   counters afterwards".  To allow for zeroing cost counters without
    238   dumping, there is a second action "zero all cost counters now". 
    239   The different methods are:</para>
    240   <itemizedlist>
    241 
    242     <listitem>
    243       <para><command>Dump on program termination.</command>
    244       This method is the standard way and doesn't need any special
    245       action on your part.</para>
    246     </listitem>
    247 
    248     <listitem>
    249       <para><command>Spontaneous, interactive dumping.</command> Use
    250       <screen>callgrind_control -d [hint [PID/Name]]</screen> to 
    251       request the dumping of profile information of the supervised
    252       application with PID or Name.  <emphasis>hint</emphasis> is an
    253       arbitrary string you can optionally specify to later be able to
    254       distinguish profile dumps.  The control program will not terminate
    255       before the dump is completely written.  Note that the application
    256       must be actively running for detection of the dump command. So,
    257       for a GUI application, resize the window, or for a server, send a
    258       request.</para>
    259       <para>If you are using <ulink url="&cl-gui-url;">KCachegrind</ulink>
    260       for browsing of profile information, you can use the toolbar
    261       button <command>Force dump</command>. This will request a dump
    262       and trigger a reload after the dump is written.</para>
    263     </listitem>
    264 
    265     <listitem>
    266       <para><command>Periodic dumping after execution of a specified
    267       number of basic blocks</command>. For this, use the command line
    268       option <option><xref linkend="opt.dump-every-bb"/>=count</option>.
    269       </para>
    270     </listitem>
    271 
    272     <listitem>
    273       <para><command>Dumping at enter/leave of specified functions.</command>
    274       Use the
    275       option <option><xref linkend="opt.dump-before"/>=function</option>
    276       and <option><xref linkend="opt.dump-after"/>=function</option>.
    277       To zero cost counters before entering a function, use
    278       <option><xref linkend="opt.zero-before"/>=function</option>.</para>
    279       <para>You can specify these options multiple times for different
    280       functions. Function specifications support wildcards: e.g. use
    281       <option><xref linkend="opt.dump-before"/>='foo*'</option> to
    282       generate dumps before entering any function starting with 
    283       <emphasis>foo</emphasis>.</para>
    284     </listitem>
    285 
    286     <listitem>
    287       <para><command>Program controlled dumping.</command>
    288       Insert
    289       <computeroutput><xref linkend="cr.dump-stats"/>;</computeroutput>
    290       at the position in your code where you want a profile dump to happen. Use 
    291       <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput> to only 
    292       zero profile counters.
    293       See <xref linkend="cl-manual.clientrequests"/> for more information on
    294       Callgrind specific client requests.</para>
    295     </listitem>
    296   </itemizedlist>
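  <para>The command-line driven methods above might look as follows in
  practice. The hint <computeroutput>phase1</computeroutput>, the PID
  <computeroutput>12345</computeroutput>, the basic block count, the function
  names and the program name are all placeholders chosen for
  illustration.</para>
<screen>
# Interactive dump of the process with PID 12345, tagged with a hint
callgrind_control -d phase1 12345

# Periodic dump after every 10000000 executed basic blocks
valgrind --tool=callgrind --dump-every-bb=10000000 ./your-program

# Dump before entering any function whose name starts with foo,
# and zero the cost counters before entering bar
valgrind --tool=callgrind --dump-before='foo*' --zero-before=bar ./your-program
</screen>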
    297 
    298   <para>If you are running a multi-threaded application and specify the
    299   command line option <option><xref linkend="opt.separate-threads"/>=yes</option>, 
    300   every thread will be profiled on its own and will create its own
    301   profile dump. Thus, the last two methods will only generate one dump
    302   of the currently running thread. With the other methods, you will get
    303   multiple dumps (one for each thread) on a dump request.</para>
    304 
    305   </sect2>
    306 
    307 
    308 
    309   <sect2 id="cl-manual.limits" 
    310          xreflabel="Limiting range of event collection">
    311   <title>Limiting the range of collected events</title>
    312 
  <para>By default, whenever events happen (such as an
    instruction execution or cache hit/miss), Callgrind aggregates
    them into event counters. However, you may be interested only in
    what is happening within a given function or starting from a given
    program phase. To this end, you can disable event aggregation for
    uninteresting program parts. While attribution of events to
    functions as well as producing separate output per program phase
    can be done by other means (see previous section), there are two
    benefits of disabling aggregation. First, it is very
    fine-grained (e.g. just for a loop within a function).  Second,
    disabling event aggregation for complete program phases makes it
    possible to switch off time-consuming cache simulation, so that
    Callgrind progresses at much higher speed, with a slowdown of
    around a factor of 2 (identical to <computeroutput>valgrind
    --tool=none</computeroutput>).
    328   </para>
    329 
    330   <para>There are two aspects which influence whether Callgrind is
    331     aggregating events at some point in time of program execution.
    332     First, there is the <emphasis>collection state</emphasis>. If this
    333     is off, no aggregation will be done.  By changing the collection
    334     state, you can control event aggregation at a very fine
    335     granularity.  However, there is not much difference in regard to
    336     execution speed of Callgrind.  By default, collection is switched
    337     on, but can be disabled by different means (see below).  Second,
    338     there is the <emphasis>instrumentation mode</emphasis> in which
    339     Callgrind is running. This mode either can be on or off. If
    340     instrumentation is off, no observation of actions in the program
    341     will be done and thus, no actions will be forwarded to the
    342     simulator which could trigger events. In the end, no events will
    be aggregated.  The huge benefit is the much higher speed with
    instrumentation switched off.  However, this should only be used
    with care and in a coarse fashion: every mode change resets the
    simulator state (i.e. whether a memory block is cached or not) and
    flushes Valgrind's internal cache of instrumented code blocks,
    resulting in a latency penalty at switching time. Also, cache
    simulator results directly after switching on instrumentation will
    be skewed due to identified cache misses which would not happen in
    reality (if you care about this warm-up effect, you should make
    sure to temporarily have collection state switched off directly
    after turning instrumentation mode on). However, switching
    354     instrumentation state is very useful to skip larger program phases
    355     such as an initialization phase. By default, instrumentation is
    356     switched on, but as with the collection state, can be changed by
    357     various means.
    358   </para>
    359 
    360   <para>Callgrind can start with instrumentation mode switched off by
    361     specifying
    362     option <option><xref linkend="opt.instr-atstart"/>=no</option>.
    363     Afterwards, instrumentation can be controlled in two ways: first,
    364     interactively with: <screen>callgrind_control -i on</screen> (and
    365     switching off again by specifying "off" instead of "on").  Second,
    instrumentation state can be programmatically changed with the
    367     macros <computeroutput><xref linkend="cr.start-instr"/>;</computeroutput>
    368     and <computeroutput><xref linkend="cr.stop-instr"/>;</computeroutput>.
    369   </para>
    370 
  <para>Similarly, the collection state at program start can be
    switched off
    by <option><xref linkend="opt.collect-atstart"/>=no</option>. During
    execution, it can be controlled programmatically with the
    macro <computeroutput>CALLGRIND_TOGGLE_COLLECT;</computeroutput>.
    376     Further, you can limit event collection to a specific function by
    377     using <option><xref linkend="opt.toggle-collect"/>=function</option>.
    378     This will toggle the collection state on entering and leaving the
    379     specified function.  When this option is in effect, the default
    380     collection state at program start is "off".  Only events happening
    381     while running inside of the given function will be
    382     collected. Recursive calls of the given function do not trigger
    383     any action. This option can be given multiple times to specify
    384     different functions of interest.</para>
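  <para>For instance, to restrict event collection to a single function,
  an invocation along the following lines could be used. The function name
  <computeroutput>compute</computeroutput> and the program name are
  hypothetical.</para>
<screen>
# Collect events only while execution is inside compute
valgrind --tool=callgrind --toggle-collect=compute ./your-program
</screen>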
    385   </sect2>
    386 
    387   <sect2 id="cl-manual.busevents" xreflabel="Counting global bus events">
    388   <title>Counting global bus events</title>
    389 
  <para>For access to shared data among threads in
  multithreaded code, synchronization is required to avoid race conditions.
    392   Synchronization primitives are usually implemented via atomic instructions.
    393   However, excessive use of such instructions can lead to performance
    394   issues.</para>
    395 
    396   <para>To enable analysis of this problem, Callgrind optionally can count
    397   the number of atomic instructions executed. More precisely, for x86/x86_64,
    398   these are instructions using a lock prefix. For architectures supporting
    399   LL/SC, these are the number of SC instructions executed. For both, the term
    400   "global bus events" is used.</para>
    401 
    402   <para>The short name of the event type used for global bus events is "Ge".
    403   To count global bus events, use <option><xref linkend="clopt.collect-bus"/>=yes</option>.
    404   </para>
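  <para>A minimal invocation, assuming a multithreaded program at the
  placeholder path
  <computeroutput>./your-threaded-program</computeroutput>:</para>
<screen>
# Count global bus events (Ge) in addition to the default events
valgrind --tool=callgrind --collect-bus=yes ./your-threaded-program
</screen>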
    405   </sect2>
    406 
    407   <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles">
    408   <title>Avoiding cycles</title>
    409 
    410   <para>Informally speaking, a cycle is a group of functions which
    411   call each other in a recursive way.</para>
    412 
    413   <para>Formally speaking, a cycle is a nonempty set S of functions,
    414   such that for every pair of functions F and G in S, it is possible
    415   to call from F to G (possibly via intermediate functions) and also
    416   from G to F.  Furthermore, S must be maximal -- that is, be the
    417   largest set of functions satisfying this property.  For example, if
    418   a third function H is called from inside S and calls back into S,
    419   then H is also part of the cycle and should be included in S.</para>
    420 
  <para>Recursion is quite common in programs, and therefore, cycles
  sometimes appear in the call graph output of Callgrind. However,
  the title of this section should raise two questions: What is bad
  about cycles which makes you want to avoid them? And: How can
  cycles be avoided without changing program code?</para>
    426 
  <para>Cycles are not bad in themselves, but tend to make performance
  analysis of your code harder. This is because inclusive costs
  for calls inside of a cycle are meaningless. The definition of
  inclusive cost, i.e. self cost of a function plus inclusive cost
  of its callees, needs a topological order among functions. For
  cycles, this does not hold true: callees of a function in a cycle include
  the function itself. Therefore, KCachegrind does cycle detection
  and skips visualization of any inclusive cost for calls inside
  of cycles. Further, all functions in a cycle are collapsed into artificial
  functions with names such as <computeroutput>Cycle 1</computeroutput>.</para>
    437 
  <para>Now, when a program exposes really big cycles (as is
  true for some GUI code, or in general code using event or callback based
  programming style), you lose the ability to pinpoint
  the bottlenecks by following call chains from
  <function>main</function>, guided by
    443   inclusive cost. In addition, KCachegrind loses its ability to show
    444   interesting parts of the call graph, as it uses inclusive costs to
    445   cut off uninteresting areas.</para>
    446 
  <para>Despite the meaninglessness of inclusive costs in cycles, the big
  drawback for visualization motivates the possibility to temporarily
  switch off cycle detection in KCachegrind, which can lead to
  misleading visualization. However, cycles often appear because of
  an unlucky superposition of independent call chains in a way that
  the profile result will see a cycle. Neglecting uninteresting
  calls with very small measured inclusive cost would break these
  cycles. In such cases, incorrect handling of cycles by not detecting
  them still gives meaningful profiling visualization.</para>
    456 
  <para>Note that currently, <command>callgrind_annotate</command>
  does not do any cycle detection at all. For program executions with function
  recursion, it can, for example, print nonsensical inclusive costs far above 100%.</para>
    460 
    461   <para>After describing why cycles are bad for profiling, it is worth
    462   talking about cycle avoidance. The key insight here is that symbols in
    463   the profile data do not have to exactly match the symbols found in the
    464   program. Instead, the symbol name could encode additional information
    465   from the current execution context such as recursion level of the
    466   current function, or even some part of the call chain leading to the
    467   function. While encoding of additional information into symbols is
    468   quite capable of avoiding cycles, it has to be used carefully to not cause
    469   symbol explosion. The latter imposes large memory requirement for Callgrind
    470   with possible out-of-memory conditions, and big profile data files.</para>
    471 
    472   <para>A further possibility to avoid cycles in Callgrind's profile data
    473   output is to simply leave out given functions in the call graph. Of course, this
    474   also skips any call information from and to an ignored function, and thus can
    475   break a cycle. Candidates for this typically are dispatcher functions in event
    476   driven code. The option to ignore calls to a function is
    477   <option><xref linkend="opt.fn-skip"/>=function</option>. Aside from
    478   possibly breaking cycles, this is used in Callgrind to skip
    479   trampoline functions in the PLT sections
    480   for calls to functions in shared libraries. You can see the difference
    481   if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>.
    482   If a call is ignored, its cost events will be propagated to the
    483   enclosing function.</para>
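  <para>As a sketch, assuming an event-driven program with a hypothetical
  dispatcher function named
  <computeroutput>dispatch_event</computeroutput>, the two options
  mentioned above could be used like this:</para>
<screen>
# Leave dispatch_event out of the call graph; the cost of ignored calls
# is propagated to the enclosing function
valgrind --tool=callgrind --fn-skip=dispatch_event ./your-program

# For comparison: keep the PLT trampoline functions visible
valgrind --tool=callgrind --skip-plt=no ./your-program
</screen>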
    484 
  <para>If you have a recursive function, you can distinguish the first
  10 recursion levels by specifying
  <option><xref linkend="opt.separate-recs-num"/>=function</option>.
  To do the same for all functions, use
  <option><xref linkend="opt.separate-recs"/>=10</option>, but this will
  give you much bigger profile data files.  In the profile data, you will see
  the recursion levels of "func" as the different functions with names
  "func", "func'2", "func'3" and so on.</para>
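  <para>On the command line, the number of levels takes the place of the
  N in the per-function option name. A short sketch, with the hypothetical
  recursive function <computeroutput>fib</computeroutput> and a placeholder
  program name:</para>
<screen>
# Distinguish the first 10 recursion levels of fib only
valgrind --tool=callgrind --separate-recs10=fib ./your-program

# Distinguish 10 recursion levels for every function (larger data files)
valgrind --tool=callgrind --separate-recs=10 ./your-program
</screen>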
    493 
    494   <para>If you have call chains "A &gt; B &gt; C" and "A &gt; C &gt; B"
    495   in your program, you usually get a "false" cycle "B &lt;&gt; C". Use 
    496   <option><xref linkend="opt.separate-callers-num"/>=B</option> 
    497   <option><xref linkend="opt.separate-callers-num"/>=C</option>,
    498   and functions "B" and "C" will be treated as different functions 
    499   depending on the direct caller. Using the apostrophe for appending 
    500   this "context" to the function name, you get "A &gt; B'A &gt; C'B" 
    501   and "A &gt; C'A &gt; B'C", and there will be no cycle. Use 
    502   <option><xref linkend="opt.separate-callers"/>=2</option> to get a 2-caller 
    503   dependency for all functions.  Note that doing this will increase
    504   the size of profile data files.</para>
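  <para>The invocations below sketch both variants for the example with
  functions "B" and "C". They assume that a value of 1 for the N in the
  per-function option name selects a dependency on the direct caller only;
  the program name is a placeholder.</para>
<screen>
# Treat B and C as different functions depending on their direct caller
valgrind --tool=callgrind --separate-callers1=B --separate-callers1=C ./your-program

# Use a 2-caller dependency for all functions (larger data files)
valgrind --tool=callgrind --separate-callers=2 ./your-program
</screen>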
    505 
    506   </sect2>
    507 
    508   <sect2 id="cl-manual.forkingprograms" xreflabel="Forking Programs">
    509   <title>Forking Programs</title>
    510 
    511   <para>If your program forks, the child will inherit all the profiling
    512   data that has been gathered for the parent. To start with empty profile
    513   counter values in the child, the client request
    514   <computeroutput><xref linkend="cr.zero-stats"/>;</computeroutput>
    515   can be inserted into code to be executed by the child, directly after
    516   <computeroutput>fork</computeroutput>.</para>
    517 
    518   <para>However, you will have to make sure that the output file format string
  (controlled by <option>--callgrind-out-file</option>) contains
    520   <option>%p</option> (which is true by default). Otherwise, the
    521   outputs from the parent and child will overwrite each other or will be
    522   intermingled, which almost certainly is not what you want.</para>
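  <para>If you set a custom output file name, a format string along the
  following lines keeps the files of parent and child apart. The file name
  pattern and the program name are placeholders.</para>
<screen>
# %p is replaced by the process ID, so each process writes its own file
valgrind --tool=callgrind --callgrind-out-file=profile.%p.out ./your-forking-program
</screen>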
    523 
    524   <para>You will be able to control the new child independently from
    525   the parent via callgrind_control.</para>
    526 
    527   </sect2>
    528 
    529 </sect1>
    530 
    531 
    532 <sect1 id="cl-manual.options" xreflabel="Callgrind Command-line Options">
    533 <title>Callgrind Command-line Options</title>
    534 
    535 <para>
    536 In the following, options are grouped into classes.
    537 </para>
    538 <para>
    539 Some options allow the specification of a function/symbol name, such as
    540 <option><xref linkend="opt.dump-before"/>=function</option>, or
    541 <option><xref linkend="opt.fn-skip"/>=function</option>. All these options
    542 can be specified multiple times for different functions.
In addition, the function specifications are actually patterns: they support
the wildcards '*' (zero or more arbitrary characters) and '?'
(exactly one arbitrary character), similar to file name globbing in the
shell. This feature is especially important for C++, as without wildcards
the function would have to be specified in full,
including its parameter signature. </para>
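<para>
Two illustrative patterns are sketched below. The names
<computeroutput>MyClass::process</computeroutput> and
<computeroutput>handler_</computeroutput> are hypothetical. The patterns
are quoted so that the shell does not try to expand the wildcards itself.
</para>
<screen>
# Matches MyClass::process with any parameter signature
valgrind --tool=callgrind --dump-before='MyClass::process*' ./your-program

# Matches handler_ followed by exactly one character, e.g. handler_1
valgrind --tool=callgrind --fn-skip='handler_?' ./your-program
</screen>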
    549 
    550 <sect2 id="cl-manual.options.creation" 
    551        xreflabel="Dump creation options">
    552 <title>Dump creation options</title>
    553 
    554 <para>
    555 These options influence the name and format of the profile data files.
    556 </para>
    557 
    558 <!-- start of xi:include in the manpage -->
    559 <variablelist id="cl.opts.list.creation">
    560 
    561   <varlistentry id="opt.callgrind-out-file" xreflabel="--callgrind-out-file">
    562     <term>
    563       <option><![CDATA[--callgrind-out-file=<file> ]]></option>
    564     </term>
    565     <listitem>
    566       <para>Write the profile data to
    567             <computeroutput>file</computeroutput> rather than to the default
    568             output file,
    569             <computeroutput>callgrind.out.&lt;pid&gt;</computeroutput>.  The
    570             <option>%p</option> and <option>%q</option> format specifiers
    571             can be used to embed the process ID and/or the contents of an
    572             environment variable in the name, as is the case for the core
    573             option <option><xref linkend="opt.log-file"/></option>.
    574             When multiple dumps are made, the file name
    575             is modified further; see below.</para> 
    576     </listitem>
    577   </varlistentry>
    578 
    579   <varlistentry id="opt.dump-line" xreflabel="--dump-line">
    580     <term>
    581       <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option>
    582     </term>
    583     <listitem>
    584       <para>This specifies that event counting should be performed at
    585       source line granularity. This allows source annotation for sources
    586       which are compiled with debug information
    587       (<option>-g</option>).</para>
    588   </listitem>
    589   </varlistentry>
    590 
    591   <varlistentry id="opt.dump-instr" xreflabel="--dump-instr">
    592     <term>
    593       <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option>
    594     </term>
    595     <listitem>
    596       <para>This specifies that event counting should be performed at
    597       per-instruction granularity.
    598       This allows for assembly code
    599       annotation.  Currently the results can only be 
    600       displayed by KCachegrind.</para>
    601   </listitem>
    602   </varlistentry>
    603 
    604   <varlistentry id="opt.compress-strings" xreflabel="--compress-strings">
    605     <term>
    606       <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option>
    607     </term>
    608     <listitem>
    609       <para>This option influences the output format of the profile data.
    610       It specifies whether strings (file and function names) should be
    611       identified by numbers. This shrinks the file, 
    612       but makes it more difficult
    613       for humans to read (which is not recommended in any case).</para>
    614     </listitem>
    615   </varlistentry>
    616 
    617   <varlistentry id="opt.compress-pos" xreflabel="--compress-pos">
    618     <term>
    619       <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option>
    620     </term>
    621     <listitem>
    622       <para>This option influences the output format of the profile data.
    623       It specifies whether numerical positions are always specified as absolute
    624       values or are allowed to be relative to previous numbers.
    625       Allowing relative positions shrinks the file.</para>
    626     </listitem>
    627   </varlistentry>
    628 
    629   <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps">
    630     <term>
    631       <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option>
    632     </term>
    633     <listitem>
    634       <para>When enabled and multiple profile data parts are to be
    635       generated, these parts are appended to the same output file.
    636       Not recommended.</para>
    637   </listitem>
    638   </varlistentry>
    639 
    640 </variablelist>
    641 </sect2>
    642 
    643 <sect2 id="cl-manual.options.activity" 
    644        xreflabel="Activity options">
    645 <title>Activity options</title>
    646 
    647 <para>
    648 These options specify when actions relating to event counts are to
    649 be executed. For interactive control use callgrind_control.
    650 </para>
    651 
    652 <!-- start of xi:include in the manpage -->
    653 <variablelist id="cl.opts.list.activity">
    654 
    655   <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb">
    656     <term>
    657       <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option>
    658     </term>
    659     <listitem>
    660       <para>Dump profile data every <option>count</option> basic blocks.
    661       Whether a dump is needed is only checked when Valgrind's internal
    662       scheduler is run. Therefore, the minimum useful setting is about 100000.
    663       The count is a 64-bit value to make long dump periods possible.
    664       </para>
    665     </listitem>
    666   </varlistentry>
    667 
    668   <varlistentry id="opt.dump-before" xreflabel="--dump-before">
    669     <term>
    670       <option><![CDATA[--dump-before=<function> ]]></option>
    671     </term>
    672     <listitem>
    673       <para>Dump when entering <option>function</option>.</para>
    674     </listitem>
    675   </varlistentry>
    676 
    677   <varlistentry id="opt.zero-before" xreflabel="--zero-before">
    678     <term>
    679       <option><![CDATA[--zero-before=<function> ]]></option>
    680     </term>
    681     <listitem>
    682       <para>Zero all costs when entering <option>function</option>.</para>
    683     </listitem>
    684   </varlistentry>
    685 
    686   <varlistentry id="opt.dump-after" xreflabel="--dump-after">
    687     <term>
    688       <option><![CDATA[--dump-after=<function> ]]></option>
    689     </term>
    690     <listitem>
    691       <para>Dump when leaving <option>function</option>.</para>
    692     </listitem>
    693   </varlistentry>
    694 
    695 </variablelist>
    696 <!-- end of xi:include in the manpage -->
    697 </sect2>
    698 
    699 <sect2 id="cl-manual.options.collection"
    700        xreflabel="Data collection options">
    701 <title>Data collection options</title>
    702 
    703 <para>
    704 These options specify when events are to be aggregated into event counts.
    705 Also see <xref linkend="cl-manual.limits"/>.</para>
    706 
    707 <!-- start of xi:include in the manpage -->
    708 <variablelist id="cl.opts.list.collection">
    709 
    710   <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart">
    711     <term>
    712       <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option>
    713     </term>
    714     <listitem>
    715       <para>Specify if you want Callgrind to start simulation and
    716       profiling from the beginning of the program.  
    717       When set to <computeroutput>no</computeroutput>, 
    718       Callgrind will not be able
    719       to collect any information, including calls, but it will have at
    720       most a slowdown factor of around 4, which is the minimum Valgrind
    721       overhead.  Instrumentation can be interactively enabled via
    722       <computeroutput>callgrind_control -i on</computeroutput>.</para>
    723       <para>Note that the resulting call graph will most probably not
    724       contain <function>main</function>, but will contain all the
    725       functions executed after instrumentation was enabled.
    726       Instrumentation can also be enabled/disabled programmatically. See the
    727       Callgrind include file
    728       <computeroutput>callgrind.h</computeroutput> for the macro
    729       you have to use in your source code.</para> <para>For cache
    730       simulation, results will be less accurate when switching on
    731       instrumentation later in the program run, as the simulator starts
    732       with an empty cache at that moment.  Switch on event collection
    733       later to cope with this error.</para>
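      <para>A minimal sketch of enabling instrumentation from source code,
      intended for a run with <option>--instr-atstart=no</option>. The
      include path <filename>valgrind/callgrind.h</filename>, the no-op
      fallback definitions, and the functions
      <function>build_input</function>, <function>process</function> and
      <function>run_profiled_phase</function> are assumptions made for
      illustration.</para>
<programlisting><![CDATA[
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define SKETCH_HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef SKETCH_HAVE_CALLGRIND_H
/* Fallback so the sketch also builds where the header is not installed. */
#  define CALLGRIND_START_INSTRUMENTATION ((void)0)
#  define CALLGRIND_STOP_INSTRUMENTATION  ((void)0)
#endif

/* Hypothetical setup phase that is not of interest for profiling. */
long build_input(void)
{
    long sum = 0;
    for (long i = 1; i <= 100; i++)
        sum += i;
    return sum;
}

/* Hypothetical phase that should be profiled. */
long process(long n)
{
    long odd = 0;
    for (long i = 0; i < n; i++)
        odd += i & 1;
    return odd;
}

long run_profiled_phase(void)
{
    long input = build_input();        /* not instrumented if started off */

    CALLGRIND_START_INSTRUMENTATION;   /* enable instrumentation */
    long result = process(input);
    CALLGRIND_STOP_INSTRUMENTATION;    /* disable it again */

    return result;
}
]]></programlisting>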
    734     </listitem>
    735   </varlistentry>
    736   
    737   <varlistentry id="opt.collect-atstart" xreflabel="--collect-atstart">
    738     <term>
    739       <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option>
    740     </term>
    741     <listitem>
    742       <para>Specify whether event collection is enabled at beginning
    743       of the profile run.</para>
    744       <para>To only look at parts of your program, you have two
    745       possibilities:</para>
    746       <orderedlist>
    747       <listitem>
    748         <para>Zero event counters before entering the program part you
    749         want to profile, and dump the event counters to a file after
    750         leaving that program part.</para>
    751         </listitem>
    752         <listitem>
    753           <para>Switch on/off collection state as needed to only see
    754           event counters happening while inside of the program part you
    755           want to profile.</para>
    756         </listitem>
    757       </orderedlist>
    758       <para>The second option can be used if the program part you want to
    759       profile is called many times. Option 1, i.e. creating a lot of
    760       dumps, is not practical here.</para>
    761       <para>Collection state can be
    762       toggled at entry and exit of a given function with the
    763       option <option><xref linkend="opt.toggle-collect"/></option>.  If you
    764       use this option, collection
    765       state should be disabled at the beginning.  Note that the
    766       specification of <option>--toggle-collect</option>
    767       implicitly sets
    768       <option>--collect-atstart=no</option>.</para>
    769       <para>Collection state can also be toggled by inserting the client request
    770       <computeroutput>
    771       <!-- commented out because it causes broken links in the man page
    772       <xref linkend="cr.toggle-collect"/>;
    773       -->
    774       CALLGRIND_TOGGLE_COLLECT
    775       ;</computeroutput>
    776       at the needed code positions.</para>
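      <para>A minimal sketch of toggling the collection state around a code
      region, assuming the run is started with
      <option>--collect-atstart=no</option> so that collection is off when
      the function is entered. The include path
      <filename>valgrind/callgrind.h</filename>, the no-op fallback
      definition and the function <function>sum_squares_upto</function> are
      assumptions made for illustration.</para>
<programlisting><![CDATA[
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define SKETCH_HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef SKETCH_HAVE_CALLGRIND_H
/* Fallback so the sketch also builds where the header is not installed. */
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif

/* Hypothetical routine; only the loop should contribute event counts. */
long sum_squares_upto(long n)
{
    long total = 0;

    CALLGRIND_TOGGLE_COLLECT;          /* off -> on */
    for (long i = 1; i <= n; i++)
        total += i * i;
    CALLGRIND_TOGGLE_COLLECT;          /* on -> off */

    return total;
}
]]></programlisting>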
    777     </listitem>
    778   </varlistentry>
    779 
    780   <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect">
    781     <term>
    782       <option><![CDATA[--toggle-collect=<function> ]]></option>
    783     </term>
    784     <listitem>
    785       <para>Toggle collection on entry/exit of <option>function</option>.</para>
    786     </listitem>
    787   </varlistentry>
    788 
    789   <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps">
    790     <term>
    791       <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option>
    792     </term>
    793     <listitem>
    794       <para>This specifies whether information for (conditional) jumps
    795       should be collected.  As with <option>--dump-instr</option>, callgrind_annotate
    796       currently cannot show this data.  You have to use KCachegrind to get jump
    797       arrows in the annotated code.</para>
    798     </listitem>
    799   </varlistentry>
    800 
    801   <varlistentry id="opt.collect-systime" xreflabel="--collect-systime">
    802     <term>
    803       <option><![CDATA[--collect-systime=<no|yes> [default: no] ]]></option>
    804     </term>
    805     <listitem>
    806       <para>This specifies whether information for system call times
    807       should be collected.</para>
    808     </listitem>
    809   </varlistentry>
    810 
    811   <varlistentry id="clopt.collect-bus" xreflabel="--collect-bus">
    812     <term>
    813       <option><![CDATA[--collect-bus=<no|yes> [default: no] ]]></option>
    814     </term>
    815     <listitem>
    816       <para>This specifies whether the number of global bus events executed
    817       should be collected. The event type "Ge" is used for these events.</para>
    818     </listitem>
    819   </varlistentry>
    820 
    821 </variablelist>
    822 <!-- end of xi:include in the manpage -->
    823 </sect2>
    824 
    825 <sect2 id="cl-manual.options.separation"
    826        xreflabel="Cost entity separation options">
    827 <title>Cost entity separation options</title>
    828 
    829 <para>
    830 These options specify how event counts should be attributed to execution
    831 contexts.
    832 For example, they specify whether the recursion level or the
    833 call chain leading to a function should be taken into account, 
    834 and whether the thread ID should be considered.
    835 Also see <xref linkend="cl-manual.cycles"/>.</para>
    836 
    837 <!-- start of xi:include in the manpage -->
    838 <variablelist id="cmd-options.separation">
    839 
    840   <varlistentry id="opt.separate-threads" xreflabel="--separate-threads">
    841     <term>
    842       <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option>
    843     </term>
    844     <listitem>
    845       <para>This option specifies whether profile data should be generated
    846       separately for every thread. If yes, the file names get "-threadID"
    847       appended.</para>
    848     </listitem>
    849   </varlistentry>
    850 
    851   <varlistentry id="opt.separate-callers" xreflabel="--separate-callers">
    852     <term>
    853       <option><![CDATA[--separate-callers=<callers> [default: 0] ]]></option>
    854     </term>
    855     <listitem>
    856       <para>Separate contexts by at most &lt;callers&gt; functions in the
    857       call chain. See <xref linkend="cl-manual.cycles"/>.</para>
    858     </listitem>
    859   </varlistentry>
    860 
    861   <varlistentry id="opt.separate-callers-num" xreflabel="--separate-callers2">
    862     <term>
    863       <option><![CDATA[--separate-callers<number>=<function> ]]></option>
    864     </term>
    865     <listitem>
    866       <para>Separate <option>number</option> callers for <option>function</option>.
    867       See <xref linkend="cl-manual.cycles"/>.</para>
    868     </listitem>
    869   </varlistentry>
    870 
    871   <varlistentry id="opt.separate-recs" xreflabel="--separate-recs">
    872     <term>
    873       <option><![CDATA[--separate-recs=<level> [default: 2] ]]></option>
    874     </term>
    875     <listitem>
    876       <para>Separate function recursions by at most <option>level</option> levels.
    877       See <xref linkend="cl-manual.cycles"/>.</para>
    878     </listitem>
    879   </varlistentry>
    880 
    881   <varlistentry id="opt.separate-recs-num" xreflabel="--separate-recs10">
    882     <term>
    883       <option><![CDATA[--separate-recs<number>=<function> ]]></option>
    884     </term>
    885     <listitem>
    886       <para>Separate <option>number</option> recursions for <option>function</option>.
    887       See <xref linkend="cl-manual.cycles"/>.</para>
    888     </listitem>
    889   </varlistentry>
    890 
    891   <varlistentry id="opt.skip-plt" xreflabel="--skip-plt">
    892     <term>
    893       <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option>
    894     </term>
    895     <listitem>
    896       <para>Ignore calls to/from PLT sections.</para>
    897     </listitem>
    898   </varlistentry>
    899   
    900   <varlistentry id="opt.skip-direct-rec" xreflabel="--skip-direct-rec">
    901     <term>
    902       <option><![CDATA[--skip-direct-rec=<no|yes> [default: yes] ]]></option>
    903     </term>
    904     <listitem>
    905       <para>Ignore direct recursions.</para>
    906     </listitem>
    907   </varlistentry>
    908   
    909   <varlistentry id="opt.fn-skip" xreflabel="--fn-skip">
    910     <term>
    911       <option><![CDATA[--fn-skip=<function> ]]></option>
    912     </term>
    913     <listitem>
    914       <para>Ignore calls to/from a given function.  E.g. if you have a
    915       call chain A &gt; B &gt; C, and you specify function B to be
    916       ignored, you will only see A &gt; C.</para>
    917       <para>This is very convenient to skip functions handling callback
    918       behaviour.  For example, with the signal/slot mechanism in the
    919       Qt graphics library, you only want
    920       to see the function emitting a signal to call the slots connected
    921       to that signal. First, determine the real call chain to see the
    922       functions needed to be skipped, then use this option.</para>
    923     </listitem>
    924   </varlistentry>
    925   
    926 <!-- 
    927     commenting out as it is only enabled with CLG_EXPERIMENTAL.  (Nb: I had to
    928     insert a space between the double dash to avoid XML comment problems.)
    929 
    930   <varlistentry id="opt.fn-group">
    931     <term>
    932       <option><![CDATA[- -fn-group<number>=<function> ]]></option>
    933     </term>
    934     <listitem>
    935       <para>Put a function into a separate group. This influences the
    936       context name for cycle avoidance. All functions inside such a
    937       group are treated as being the same for context name building, which
    938       resembles the call chain leading to a context. By specifying function
    939       groups with this option, you can shorten the context name, as functions
    940       in the same group will not appear in sequence in the name. </para>
    941     </listitem>
    942   </varlistentry>
    943 --> 
    944 
    945 </variablelist>
    946 <!-- end of xi:include in the manpage -->
    947 </sect2>
    948 
    949 
    950 <sect2 id="cl-manual.options.simulation"
    951        xreflabel="Simulation options">
    952 <title>Simulation options</title>
    953 
    954 <!-- start of xi:include in the manpage -->
    955 <variablelist id="cl.opts.list.simulation">
    956 
    957   <varlistentry id="clopt.cache-sim" xreflabel="--cache-sim">
    958     <term>
    959       <option><![CDATA[--cache-sim=<yes|no> [default: no] ]]></option>
    960     </term>
    961     <listitem>
    962       <para>Specify if you want to do full cache simulation.  By default,
    963       only instruction read accesses will be counted ("Ir").
    964       With cache simulation, further event counters are enabled:
    965       Cache misses on instruction reads ("I1mr"/"ILmr"),
    966       data read accesses ("Dr") and related cache misses ("D1mr"/"DLmr"),
    967       data write accesses ("Dw") and related cache misses ("D1mw"/"DLmw").
    968       For more information, see <xref linkend="&vg-cg-manual-id;"/>.
    969       </para>
    970     </listitem>
    971   </varlistentry>
    972 
    973   <varlistentry id="clopt.branch-sim" xreflabel="--branch-sim">
    974     <term>
    975       <option><![CDATA[--branch-sim=<yes|no> [default: no] ]]></option>
    976     </term>
    977     <listitem>
    978       <para>Specify if you want to do branch prediction simulation.
    979       Further event counters are enabled: Number of executed conditional
    980       branches and related predictor misses ("Bc"/"Bcm"), executed indirect
    981       jumps and related misses of the jump address predictor ("Bi"/"Bim").
    982       </para>
    983     </listitem>
    984   </varlistentry>
    985 
    986 </variablelist>
    987 <!-- end of xi:include in the manpage -->
    988 </sect2>
    989 
    990 
    991 <sect2 id="cl-manual.options.cachesimulation"
    992        xreflabel="Cache simulation options">
    993 <title>Cache simulation options</title>
    994 
    995 <!-- start of xi:include in the manpage -->
    996 <variablelist id="cl.opts.list.cachesimulation">
    997 
    998   <varlistentry id="opt.simulate-wb" xreflabel="--simulate-wb">
    999     <term>
   1000       <option><![CDATA[--simulate-wb=<yes|no> [default: no] ]]></option>
   1001     </term>
   1002     <listitem>
   1003       <para>Specify whether write-back behavior should be simulated, which
   1004       allows LL cache misses with and without write-backs to be distinguished.
   1005       The cache model of Cachegrind/Callgrind does not specify write-through
   1006       vs. write-back behavior, and this is also not relevant for the
   1007       resulting miss counts. However, with explicit write-back simulation
   1008       it can be determined whether a miss not only triggers the loading of a new
   1009       cache line, but also requires a dirty cache line to be written back
   1010       first. The new dirty miss events are ILdmr, DLdmr, and DLdmw,
   1011       for misses because of instruction read, data read, and data write,
   1012       respectively. As they produce two memory transactions, they should
   1013       be given twice the estimated time of a normal miss.
   1014       </para>
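      <para>As a rough sketch of this weighting, the following function
      charges a miss with write-back twice the time of a miss without one.
      The function name, the split into two input counts and the latency
      parameter are assumptions for illustration. How the two counts are
      derived from the reported events is left open here.</para>
<programlisting><![CDATA[
/* Estimated time spent on LL misses.
 * misses_without_wb: misses that only load a new cache line
 * misses_with_wb:    misses that also write back a dirty line
 * miss_latency:      assumed cost of one memory transaction */
unsigned long ll_miss_time(unsigned long misses_without_wb,
                           unsigned long misses_with_wb,
                           unsigned long miss_latency)
{
    /* A miss with write-back causes two memory transactions. */
    return (misses_without_wb + 2UL * misses_with_wb) * miss_latency;
}
]]></programlisting>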
   1015     </listitem>
   1016   </varlistentry>
   1017 
   1018   <varlistentry id="opt.simulate-hwpref" xreflabel="--simulate-hwpref">
   1019     <term>
   1020       <option><![CDATA[--simulate-hwpref=<yes|no> [default: no] ]]></option>
   1021     </term>
   1022     <listitem>
   1023       <para>Specify whether simulation of a hardware prefetcher should be
   1024       added which is able to detect stream access in the second level cache
   1025       by comparing accesses separately for each page.
   1026       As the simulation cannot decide about any timing issues of prefetching,
   1027       it is assumed that any hardware prefetch triggered succeeds before a
   1028       real access is done. Thus, this gives a best-case scenario by covering
   1029       all possible stream accesses.</para>
   1030     </listitem>
   1031   </varlistentry>
   1032 
   1033   <varlistentry id="opt.cacheuse" xreflabel="--cacheuse">
   1034     <term>
   1035       <option><![CDATA[--cacheuse=<yes|no> [default: no] ]]></option>
   1036     </term>
   1037     <listitem>
   1038       <para>Specify whether cache line use should be collected. For every
   1039       cache line, from loading to it being evicted, the number of accesses
   1040       as well as the number of actually used bytes is determined. This
   1041       behavior is related to the code which triggered loading of the cache
   1042       line. In contrast to miss counters, which show the position where
   1043       the symptoms of bad cache behavior (i.e. latencies) happen, the
   1044       use counters try to pinpoint the reason (i.e. the code with the
   1045       bad access behavior). The new counters are defined in a way such
   1046       that worse behavior results in higher cost.
   1047       AcCost1 and AcCost2 are counters showing bad temporal locality
   1048       for L1 and LL caches, respectively. This is done by summing up
   1049       reciprocal values of the numbers of accesses of each cache line,
   1050       multiplied by 1000 (as only integer costs are allowed). E.g. for
   1051       a given source line with 5 read accesses, a value of 5000 AcCost
   1052       means that for every access, a new cache line was loaded and directly
   1053       evicted afterwards without further accesses. Similarly, SpLoss1/2
   1054       show bad spatial locality for L1 and LL caches, respectively. They
   1055       give the <emphasis>spatial loss</emphasis> count of bytes which
   1056       were loaded into cache but never accessed. They pinpoint code
   1057       accessing data in a way such that cache space is wasted. This hints
   1058       at bad layout of data structures in memory. Assuming a cache line
   1059       size of 64 bytes and 100 L1 misses for a given source line, the
   1060       loading of 6400 bytes into L1 was triggered. If SpLoss1 shows a
   1061       value of 3200 for this line, this means that half of the loaded data was
   1062       never used or, with a better data layout, only half of the cache
   1063       space would have been needed.
   1064       Please note that for cache line use counters, it currently is
   1065       not possible to provide meaningful inclusive costs. Therefore,
   1066       inclusive cost of these counters should be ignored.
   1067       </para>
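      <para>The two worked examples above can be reproduced with a few lines
      of C. This is a simplified sketch that assumes every cache line sees
      the same number of accesses and that the reciprocal is computed with
      integer division. The function names are hypothetical.</para>
<programlisting><![CDATA[
/* AcCost contribution of line_loads cache line lifetimes that each saw
 * accesses_per_line accesses (must be at least 1) before eviction. */
unsigned long ac_cost(unsigned long line_loads,
                      unsigned long accesses_per_line)
{
    return line_loads * (1000UL / accesses_per_line);
}

/* Bytes brought into the cache by the given number of misses. */
unsigned long bytes_loaded(unsigned long misses, unsigned long line_size)
{
    return misses * line_size;
}

/* Percentage of loaded bytes that were never accessed. */
unsigned long unused_percent(unsigned long sp_loss, unsigned long loaded)
{
    return loaded == 0 ? 0 : (100UL * sp_loss) / loaded;
}
]]></programlisting>
      <para>Five accesses that each load a fresh line give
      <computeroutput>ac_cost(5, 1)</computeroutput>, i.e. 5000, whereas one
      line reused for all five accesses gives
      <computeroutput>ac_cost(1, 5)</computeroutput>, i.e. 200. For 100
      misses with 64-byte lines and an SpLoss1 of 3200,
      <computeroutput>unused_percent(3200, bytes_loaded(100, 64))</computeroutput>
      yields 50.</para>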
   1068     </listitem>
   1069   </varlistentry>
   1070 
   1071   <varlistentry id="opt.I1" xreflabel="--I1">
   1072     <term>
   1073       <option><![CDATA[--I1=<size>,<associativity>,<line size> ]]></option>
   1074     </term>
   1075     <listitem>
   1076       <para>Specify the size, associativity and line size of the level 1
   1077       instruction cache.  </para>
   1078     </listitem>
   1079   </varlistentry>
   1080 
   1081   <varlistentry id="opt.D1" xreflabel="--D1">
   1082     <term>
   1083       <option><![CDATA[--D1=<size>,<associativity>,<line size> ]]></option>
   1084     </term>
   1085     <listitem>
   1086       <para>Specify the size, associativity and line size of the level 1
   1087       data cache.</para>
   1088     </listitem>
   1089   </varlistentry>
   1090 
   1091   <varlistentry id="opt.LL" xreflabel="--LL">
   1092     <term>
   1093       <option><![CDATA[--LL=<size>,<associativity>,<line size> ]]></option>
   1094     </term>
   1095     <listitem>
   1096       <para>Specify the size, associativity and line size of the last-level
   1097       cache.</para>
   1098     </listitem>
   1099   </varlistentry>
   1100 </variablelist>
   1101 <!-- end of xi:include in the manpage -->
   1102 
   1103 </sect2>
   1104 
   1105 </sect1>
   1106 
   1107 <sect1 id="cl-manual.monitor-commands" xreflabel="Callgrind Monitor Commands">
   1108 <title>Callgrind Monitor Commands</title>
   1109 <para>The Callgrind tool provides monitor commands handled by the Valgrind
   1110 gdbserver (see <xref linkend="manual-core-adv.gdbserver-commandhandling"/>).
   1111 </para>
   1112 
   1113 <itemizedlist>
   1114   <listitem>
   1115     <para><varname>dump [&lt;dump_hint&gt;]</varname> requests to dump the
   1116     profile data. </para>
   1117   </listitem>
   1118 
   1119   <listitem>
   1120     <para><varname>zero</varname> requests to zero the profile data
   1121     counters. </para>
   1122   </listitem>
   1123 
   1124   <listitem>
   1125     <para><varname>instrumentation [on|off]</varname> requests to set 
   1126     (if parameter on/off is given) or get the current instrumentation state.
   1127     </para>
   1128   </listitem>
   1129 
   1130   <listitem>
   1131     <para><varname>status</varname> requests to print out some status
   1132     information.</para>
   1133   </listitem>
   1134 
   1135 </itemizedlist>
   1136 </sect1>
   1137 
   1138 <sect1 id="cl-manual.clientrequests" xreflabel="Client request reference">
   1139 <title>Callgrind specific client requests</title>
   1140 
   1141 <para>Callgrind provides the following specific client requests in
   1142 <filename>callgrind.h</filename>.  See that file for the exact details of
   1143 their arguments.</para>
   1144 
   1145 <variablelist id="cl.clientrequests.list">
   1146   
   1147   <varlistentry id="cr.dump-stats" xreflabel="CALLGRIND_DUMP_STATS">
   1148     <term>
   1149       <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>
   1150     </term>
   1151     <listitem>
   1152       <para>Force generation of a profile dump at specified position
   1153       in code, for the current thread only. Written counters will be reset
   1154       to zero.</para>
   1155     </listitem>
   1156   </varlistentry>
   1157 
   1158   <varlistentry id="cr.dump-stats-at" xreflabel="CALLGRIND_DUMP_STATS_AT">
   1159     <term>
   1160       <computeroutput>CALLGRIND_DUMP_STATS_AT(string)</computeroutput>
   1161     </term>
   1162     <listitem>
   1163       <para>Same as <computeroutput>CALLGRIND_DUMP_STATS</computeroutput>,
   1164       but takes a string that can be used to distinguish profile
   1165       dumps.</para>
   1166     </listitem>
   1167   </varlistentry>
   1168 
   1169   <varlistentry id="cr.zero-stats" xreflabel="CALLGRIND_ZERO_STATS">
   1170     <term>
   1171       <computeroutput>CALLGRIND_ZERO_STATS</computeroutput>
   1172     </term>
   1173     <listitem>
   1174       <para>Reset the profile counters for the current thread to zero.</para>
   1175     </listitem>
   1176   </varlistentry>
   1177 
   1178   <varlistentry id="cr.toggle-collect" xreflabel="CALLGRIND_TOGGLE_COLLECT">
   1179     <term>
   1180       <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput>
   1181     </term>
   1182     <listitem>
   1183       <para>Toggle the collection state. This allows events to be ignored
   1184       with regard to profile counters. See also options
   1185       <option><xref linkend="opt.collect-atstart"/></option> and
   1186       <option><xref linkend="opt.toggle-collect"/></option>.</para>
   1187     </listitem>
   1188   </varlistentry>
   1189 
   1190   <varlistentry id="cr.start-instr" xreflabel="CALLGRIND_START_INSTRUMENTATION">
   1191     <term>
   1192       <computeroutput>CALLGRIND_START_INSTRUMENTATION</computeroutput>
   1193     </term>
   1194     <listitem>
   1195       <para>Start full Callgrind instrumentation if not already enabled.
   1196       When cache simulation is done, this will flush the simulated cache
   1197       and lead to an artificial cache warmup phase afterwards with
   1198       cache misses which would not have happened in reality.  See also
   1199       option <option><xref linkend="opt.instr-atstart"/></option>.</para>
   1200     </listitem>
   1201   </varlistentry>
   1202 
   1203   <varlistentry id="cr.stop-instr" xreflabel="CALLGRIND_STOP_INSTRUMENTATION">
   1204     <term>
   1205       <computeroutput>CALLGRIND_STOP_INSTRUMENTATION</computeroutput>
   1206     </term>
   1207     <listitem>
   1208       <para>Stop full Callgrind instrumentation if not already disabled.
   1209       This flushes Valgrind's translation cache, and does no additional
   1210       instrumentation afterwards: it effectively will run at the same
   1211       speed as Nulgrind, i.e. at minimal slowdown. Use this to
   1212       speed up the Callgrind run for uninteresting code parts. Use
   1213       <computeroutput><xref linkend="cr.start-instr"/></computeroutput> to
   1214       enable instrumentation again.  See also option
   1215       <option><xref linkend="opt.instr-atstart"/></option>.</para>
   1216     </listitem>
   1217   </varlistentry>
   1218 
   1219 </variablelist>
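<para>A minimal sketch of profiling a single code region with these
requests: the counters are zeroed before the region and a labelled dump is
written after it. The include path
<filename>valgrind/callgrind.h</filename>, the no-op fallback definitions,
the dump label, and the functions <function>checksum</function> and
<function>profile_one_region</function> are assumptions made for
illustration.</para>
<programlisting><![CDATA[
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define SKETCH_HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef SKETCH_HAVE_CALLGRIND_H
/* Fallback so the sketch also builds where the header is not installed. */
#  define CALLGRIND_ZERO_STATS            ((void)0)
#  define CALLGRIND_DUMP_STATS_AT(label)  ((void)(label))
#endif

/* Hypothetical region of interest. */
long checksum(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

long profile_one_region(void)
{
    CALLGRIND_ZERO_STATS;                        /* drop costs so far */
    long result = checksum(1000);                /* region to profile */
    CALLGRIND_DUMP_STATS_AT("after checksum");   /* write a labelled dump */
    return result;
}
]]></programlisting>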
   1220 
   1221 </sect1>
   1222 
   1223 
   1224 
   1225 <sect1 id="cl-manual.callgrind_annotate-options" xreflabel="callgrind_annotate Command-line Options">
   1226 <title>callgrind_annotate Command-line Options</title>
   1227 
   1228 <!-- start of xi:include in the manpage -->
   1229 <variablelist id="callgrind_annotate.opts.list">
   1230 
   1231   <varlistentry>
   1232     <term><option>-h --help</option></term>
   1233     <listitem>
   1234       <para>Show summary of options.</para>
   1235     </listitem>
   1236   </varlistentry>
   1237 
   1238   <varlistentry>
   1239     <term><option>--version</option></term>
   1240     <listitem>
   1241       <para>Show version of callgrind_annotate.</para>
   1242     </listitem>
   1243   </varlistentry>
   1244 
   1245   <varlistentry>
   1246     <term>
   1247       <option>--show=A,B,C [default: all]</option>
   1248     </term>
   1249     <listitem>
   1250       <para>Only show figures for events A,B,C.</para>
   1251     </listitem>
   1252   </varlistentry>
   1253 
   1254   <varlistentry>
   1255     <term>
   1256       <option>--sort=A,B,C</option>
   1257     </term>
   1258     <listitem>
   1259       <para>Sort columns by events A,B,C [default: event column order].</para>
   1260     </listitem>
   1261   </varlistentry>
   1262 
   1263   <varlistentry>
   1264     <term>
   1265       <option><![CDATA[--threshold=<0--100> [default: 99%] ]]></option>
   1266     </term>
   1267     <listitem>
   1268       <para>Percentage of counts (of primary sort event) we are 
   1269       interested in.</para>
   1270     </listitem>
   1271   </varlistentry>

  <varlistentry>
    <term>
      <option><![CDATA[--auto=<yes|no> [default: no] ]]></option>
    </term>
    <listitem>
      <para>Annotate all source files containing functions that helped
      reach the event count threshold.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term>
      <option>--context=N [default: 8] </option>
    </term>
    <listitem>
      <para>Print N lines of context before and after annotated
      lines.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term>
      <option><![CDATA[--inclusive=<yes|no> [default: no] ]]></option>
    </term>
    <listitem>
      <para>Add the costs of called subroutines to the cost of each
      function, giving inclusive costs.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term>
      <option><![CDATA[--tree=<none|caller|calling|both> [default: none] ]]></option>
    </term>
    <listitem>
      <para>For each function, print its callers, the functions it calls,
      or both.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term>
      <option><![CDATA[-I, --include=<dir> ]]></option>
    </term>
    <listitem>
      <para>Add <option>dir</option> to the list of directories to search
      for source files.</para>
    </listitem>
  </varlistentry>

</variablelist>
<!-- end of xi:include in the manpage -->
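
<para>As a rough illustration of how these options combine, the
invocations below are a sketch only. The file name
<filename>callgrind.out.12345</filename> and the directory
<filename>src</filename> are hypothetical placeholders. Substitute the
profile data file written by your own Callgrind run and your own source
directory.</para>

<screen>
# Function list with inclusive costs, plus the callers and callees of
# each function
callgrind_annotate --inclusive=yes --tree=both callgrind.out.12345

# Show and sort by instruction reads (Ir) only, and annotate the relevant
# source files with 3 lines of context, looking for sources in src
callgrind_annotate --show=Ir --sort=Ir --auto=yes --context=3 \
    --include=src callgrind.out.12345
</screen>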


</sect1>




<sect1 id="cl-manual.callgrind_control-options" xreflabel="callgrind_control Command-line Options">
<title>callgrind_control Command-line Options</title>

<para>By default, callgrind_control acts on all programs run by the
  current user under Callgrind.  It is possible to limit the actions to
  specific Callgrind runs by providing a list of pids or program names as
  arguments.  The default action is to give some brief information about
  the applications being run under Callgrind.</para>

<!-- start of xi:include in the manpage -->
<variablelist id="callgrind_control.opts.list">

  <varlistentry>
    <term><option>-h --help</option></term>
    <listitem>
      <para>Show a short description, usage, and summary of options.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>--version</option></term>
    <listitem>
      <para>Show version of callgrind_control.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>-l --long</option></term>
    <listitem>
      <para>Also show the working directory, in addition to the brief
      information given by default.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>-s --stat</option></term>
    <listitem>
      <para>Show statistics about active Callgrind runs.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>-b --back</option></term>
    <listitem>
      <para>Show stack/back traces of each thread in active Callgrind runs.
      For each active function in the stack trace, the number of
      invocations since program start (or the last dump) is also shown.
      This option can be combined with <option>-e</option> to show the
      inclusive cost of active functions.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option><![CDATA[-e [A,B,...] ]]></option> (default: all)</term>
    <listitem>
      <para>Show the current per-thread, exclusive cost values of event
      counters. If no explicit event names are given, figures are shown
      for all event types that are collected in the given Callgrind run.
      Otherwise, only figures for event types A, B, ... are shown. If
      this option is combined with <option>-b</option>, the inclusive cost
      for the functions of each active stack frame is provided, too.
      </para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option><![CDATA[--dump[=<desc>] ]]></option> (default: no description)</term>
    <listitem>
      <para>Request the dumping of profile information. Optionally, a
      description can be specified. It is written into the dump as part of
      the information on what triggered the dump action, and can be used
      to distinguish multiple dumps.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>-z --zero</option></term>
    <listitem>
      <para>Zero all event counters.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option>-k --kill</option></term>
    <listitem>
      <para>Force a Callgrind run to be terminated.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option><![CDATA[--instr=<on|off>]]></option></term>
    <listitem>
      <para>Switch instrumentation mode on or off. If a Callgrind run has
      instrumentation disabled, no simulation is done and no events are
      counted. This is useful to skip uninteresting program parts, as there
      is much less slowdown (same as with the Valgrind tool "none"). See also
      the Callgrind option <option>--instr-atstart</option>.</para>
    </listitem>
  </varlistentry>

  <varlistentry>
    <term><option><![CDATA[--vgdb-prefix=<prefix>]]></option></term>
    <listitem>
      <para>Specify the vgdb prefix to be used by callgrind_control.
      callgrind_control internally uses vgdb to find and control the active
      Callgrind runs. If the <option>--vgdb-prefix</option> option was used
      for launching valgrind, then the same option must be given to
      callgrind_control.</para>
    </listitem>
  </varlistentry>
</variablelist>
<!-- end of xi:include in the manpage -->
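
<para>The following commands are a sketch of how these options might be
used while a profiled program is running. The pid
<computeroutput>12345</computeroutput> and the description
<computeroutput>phase1</computeroutput> are hypothetical
placeholders.</para>

<screen>
# Brief information, plus the working directory, for all active
# Callgrind runs of the current user
callgrind_control -l

# Current event counter values, combined with back traces so that
# inclusive costs of the active functions are shown
callgrind_control -e -b

# Zero the counters of the run with pid 12345, then later request a
# dump described as "phase1"
callgrind_control -z 12345
callgrind_control --dump=phase1 12345

# Switch instrumentation off for an uninteresting program part, then
# on again
callgrind_control --instr=off 12345
callgrind_control --instr=on 12345
</screen>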

</sect1>

</chapter>
