<?xml version="1.0" encoding='ISO-8859-1'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

<book id="oprofile-guide">
<bookinfo>
<title>OProfile manual</title>

<authorgroup>
<author>
<firstname>John</firstname>
<surname>Levon</surname>
<affiliation>
<address><email>levon@movementarian.org</email></address>
</affiliation>
</author>
</authorgroup>

<copyright>
<year>2000-2004</year>
<holder>Victoria University of Manchester, John Levon and others</holder>
</copyright>
</bookinfo>

<toc></toc>

<chapter id="introduction">
<title>Introduction</title>

<para>
This manual applies to OProfile version <oprofileversion />.
OProfile is a profiling system for Linux 2.2/2.4/2.6 systems on a number of architectures. It is capable of profiling
all parts of a running system, from the kernel (including modules and interrupt handlers) to shared libraries
to binaries. It runs transparently in the background, collecting information at a low overhead. These
features make it ideal for profiling entire systems to determine bottlenecks in real-world systems.
</para>
<para>
Many CPUs provide "performance counters", hardware registers that can count "events"; for example,
cache misses, or CPU cycles. OProfile provides profiles of code based on the number of these events that occur:
repeatedly, every time a certain (configurable) number of events has occurred, the PC value is recorded.
This information is aggregated into profiles for each binary image.</para>
<para>
Some hardware setups do not allow OProfile to use performance counters: in these cases, no
events are available, and OProfile operates in timer/RTC mode, as described in later chapters.
</para>
<sect1 id="applications">
<title>Applications of OProfile</title>
<para>
OProfile is useful in a number of situations.
You might want to use OProfile when you:
</para>
<itemizedlist>
<listitem><para>need low overhead</para></listitem>
<listitem><para>cannot use highly intrusive profiling methods</para></listitem>
<listitem><para>need to profile interrupt handlers</para></listitem>
<listitem><para>need to profile an application and its shared libraries</para></listitem>
<listitem><para>need to profile dynamically compiled code of supported virtual machines (see <xref linkend="jitsupport"/>)</para></listitem>
<listitem><para>need to capture the performance behaviour of the entire system</para></listitem>
<listitem><para>want to examine hardware effects such as cache misses</para></listitem>
<listitem><para>want detailed source annotation</para></listitem>
<listitem><para>want instruction-level profiles</para></listitem>
<listitem><para>want call-graph profiles</para></listitem>
</itemizedlist>
<para>
OProfile is not a panacea. OProfile might not be a complete solution when you:
</para>
<itemizedlist>
<listitem><para>require call-graph profiles on platforms other than 2.6/x86</para></listitem>
<listitem><para>don't have root permissions</para></listitem>
<listitem><para>require 100% instruction-accurate profiles</para></listitem>
<listitem><para>need function call counts or an interstitial profiling API</para></listitem>
<listitem><para>cannot tolerate any disturbance to the system whatsoever</para></listitem>
<listitem><para>need to profile interpreted or dynamically compiled code of non-supported virtual machines</para></listitem>
</itemizedlist>
<sect2 id="jitsupport">
<title>Support for dynamically compiled (JIT) code</title>
<para>
Older versions of OProfile were not capable of attributing samples to symbols from dynamically
compiled code, i.e. "just-in-time (JIT) code". Typical JIT compilers load the JIT code into
anonymous memory regions.
OProfile reported the samples from such code, but the attribution
provided was simply:
<screen>"anon: &lt;tgid&gt;&lt;address range&gt;"</screen>
Due to this limitation, it wasn't possible to profile applications executed by virtual machines (VMs)
like the Java Virtual Machine. OProfile now contains an infrastructure to support JITed code.
A development library is provided to allow developers
to add support for any VM that produces dynamically compiled code (see the <emphasis>OProfile JIT agent
developer guide</emphasis>).
In addition, built-in support is included for the following:</para>
<itemizedlist><listitem><para>JVMTI agent library for Java (1.5 and higher)</para></listitem>
<listitem><para>JVMPI agent library for Java (1.5 and lower)</para></listitem>
</itemizedlist>
<para>
For information on how to use OProfile's JIT support, see <xref linkend="setup-jit"/>.
</para>
</sect2>
</sect1>

<sect1 id="requirements">
<title>System requirements</title>

<variablelist>
<varlistentry>
<term>Linux kernel 2.2/2.4/2.6</term>
<listitem><para>
OProfile uses a kernel module that can be compiled for
2.2.11 or later and 2.4. 2.4.10 or above is required if you use the
boot-time kernel option <option>nosmp</option>. 2.6 kernels are supported with the in-kernel
OProfile driver. Note that only 32-bit x86 and IA64 are supported on 2.2/2.4 kernels.
</para>

<para>
2.6 kernels are strongly recommended. Under 2.4, OProfile may cause system crashes if power
management is used, or if the BIOS does not correctly deal with local APICs.
</para>

<para>
To use OProfile's JIT support, a kernel version 2.6.13 or later is required.
In earlier kernel versions, the anonymous memory regions are not reported to OProfile, which results
in profiling reports without any samples in these regions.
</para>

<para>
PPC64 processors (Power4/Power5/PPC970, etc.)
require a recent (> 2.6.5) kernel with the line
<constant>#define PV_970</constant> present in <filename>include/asm-ppc64/processor.h</filename>.
<!-- FIXME: do we require always gte 2.4.10 for nosmp ? -->
</para>
<para>
Profiling the Cell Broadband Engine PowerPC Processing Element (PPE) requires a kernel version
of 2.6.18 or more recent.
Profiling the Cell Broadband Engine Synergistic Processing Element (SPE) requires a kernel version
of 2.6.22 or more recent. Additionally, full support of SPE profiling requires a BFD library
from binutils code dated January 2007 or later. To ensure the proper BFD support exists, run
the <code>configure</code> utility with <code>--with-target=cell-be</code>.

Profiling the Cell Broadband Engine using SPU events requires a kernel version of 2.6.29-rc1
or more recent.

<note><para>Attempting to profile SPEs with kernel versions older than 2.6.22 may cause the
system to crash.</para></note>
</para>

<para>
Instruction-Based Sampling (IBS) profiling on AMD family10h processors requires
kernel version 2.6.28-rc2 or later.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>modutils 2.4.6 or above</term>
<listitem><para>
You should have installed modutils 2.4.6 or higher (in fact, earlier versions work well in almost all
cases).
</para></listitem>
</varlistentry>
<varlistentry>
<term>Supported architecture</term>
<listitem><para>
For Intel IA32, a CPU with either a P6 generation or Pentium 4 core is
required. In marketing terms this translates to anything
between an Intel Pentium Pro (not Pentium Classics) and
a Pentium 4 / Xeon, including all Celerons. The AMD
Athlon, Opteron, Phenom, and Turion CPUs are also supported. Other IA32
CPU types only support the RTC mode of OProfile; please
see later in this manual for details. Hyper-threaded Pentium IVs
are not supported in 2.4.
For 2.4 kernels, the Intel
IA-64 CPUs are also supported. For 2.6 kernels, there is additionally
support for Alpha processors, MIPS, ARM, x86-64, sparc64, ppc64, AVR32, and,
in timer mode, PA-RISC and s390.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Uniprocessor or SMP</term>
<listitem><para>
SMP machines are fully supported.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Required libraries</term>
<listitem><para>
These libraries are required: <filename>popt</filename>, <filename>bfd</filename>,
<filename>liberty</filename> (Debian users: libiberty is provided in the binutils-dev package), <filename>dl</filename>,
plus the standard C++ libraries.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Required user account</term>
<listitem><para>
For secure processing of sample data from JIT virtual machines (e.g., Java),
the special user account "oprofile" must exist on the system. The 'configure'
and 'make install' operations will print warning messages if this
account is not found. If you intend to profile JITed code, you must create
a group account named 'oprofile' and then create the 'oprofile' user account,
setting its default group to 'oprofile'. If this special user account cannot
be found, a runtime error message is printed to the OProfile daemon log when
JIT samples are processed.
</para></listitem>
</varlistentry>
<varlistentry>
<term>OProfile GUI</term>
<listitem><para>
Using the GUI to start the profiler requires the <filename>Qt 2</filename> library. <filename>Qt 3</filename> should
also work.
</para></listitem>
</varlistentry>
<varlistentry>
<term><acronym>ELF</acronym></term>
<listitem><para>
Probably not too strenuous a requirement, but older <acronym>A.OUT</acronym> binaries/libraries are not supported.
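A quick way to check whether a binary or library is in ELF format is to compare its first four
bytes with the ELF magic number (byte 0x7f followed by "ELF"); the <command>file</command> utility
reports the same information. The following is a minimal POSIX shell sketch; the function name
<function>is_elf</function> is hypothetical and not part of OProfile:
<screen>
# Hypothetical helper (not part of OProfile): succeeds if the file
# begins with the 4-byte ELF magic number 0x7f 'E' 'L' 'F'.
is_elf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "$(printf '\177ELF')" ]
}

# Example usage: check a binary you intend to profile.
if is_elf /bin/ls; then
    echo "/bin/ls is an ELF file"
else
    echo "/bin/ls is not an ELF file, so OProfile does not support it"
fi
</screen>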
</para></listitem>
</varlistentry>
<varlistentry>
<term>K&amp;R coding style</term>
<listitem><para>
OK, so it's not really a requirement, but I wish it were...
</para></listitem>
</varlistentry>
</variablelist>


</sect1>

<sect1 id="resources">
<title>Internet resources</title>

<variablelist>
<varlistentry>
<term>Web page</term>
<listitem><para>
There is a web page (which you may be reading now) at
<ulink url="http://oprofile.sf.net/">http://oprofile.sf.net/</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Download</term>
<listitem><para>
You can download a source tarball or get anonymous CVS at the SourceForge page,
<ulink url="http://sf.net/projects/oprofile/">http://sf.net/projects/oprofile/</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Mailing list</term>
<listitem><para>
There is a low-traffic OProfile-specific mailing list, details at
<ulink url="http://sf.net/mail/?group_id=16191">http://sf.net/mail/?group_id=16191</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>Bug tracker</term>
<listitem><para>
There is a bug tracker for OProfile at SourceForge,
<ulink url="http://sf.net/tracker/?group_id=16191&amp;atid=116191">http://sf.net/tracker/?group_id=16191&amp;atid=116191</ulink>.
</para></listitem>
</varlistentry>
<varlistentry>
<term>IRC channel</term>
<listitem><para>
Several OProfile developers and users sometimes hang out on channel <command>#oprofile</command>
on the <ulink url="http://oftc.net">OFTC</ulink> network.
</para></listitem>
</varlistentry>
</variablelist>

</sect1>

<sect1 id="install">
<title>Installation</title>

<para>
First you need to build OProfile and install it.
<command>./configure</command>, <command>make</command>, <command>make install</command>
is often all you need, but note these arguments to <command>./configure</command>:
</para>
<variablelist>
<varlistentry>
<term><option>--with-linux</option></term>
<listitem><para>
Use this option to specify the location of the kernel source tree you wish
to compile against. The kernel module is built against this source and
will only work with a running kernel built from the same source with
exactly the same options, so be sure to specify this option if the kernel
source found by default is not the one your running kernel was built from.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--with-java</option></term>
<listitem>
<para>
Use this option if you need to profile Java applications. Also, see
<xref linkend="requirements"/>, "Required user account". This option
is used to specify the location of the Java Development Kit (JDK)
source tree you wish to use. This is necessary to get the interface description
of the JVMPI (or JVMTI) interface to compile the JIT support code successfully.
</para>
<note>
<para>
The Java Runtime Environment (JRE) does not include the development
files that are required to compile the JIT support code, so the full
JDK must be installed in order to use this option.
</para>
</note>
<para>
By default, the OProfile JIT support libraries will be installed in
<filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>.
To build
and install OProfile and the JIT support libraries as 64-bit, you can
do something like the following:
<screen>
# CFLAGS="-m64" CXXFLAGS="-m64" ./configure \
--with-kernel-support --with-java={my_jdk_installdir} \
--libdir=/usr/local/lib64
</screen>
</para>
<note>
<para>
If you encounter errors building 64-bit, you should
install libtool 1.5.26 or later, since that release of
libtool fixes known problems for certain platforms.
If you install libtool into a non-standard location,
you'll need to edit the invocation of 'aclocal' in
OProfile's autogen.sh as follows (assuming an install
location of /usr/local):
</para>
<para>
<code>aclocal -I m4 -I /usr/local/share/aclocal</code>
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--with-kernel-support</option></term>
<listitem><para>
Use this option with 2.6 and above kernels to indicate that the
kernel provides the OProfile device driver.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--with-qt-dir/includes/libraries</option></term>
<listitem><para>
Specify the location of Qt headers and libraries. If these are not specified,
the search defaults to <constant>$QTDIR</constant>.
</para></listitem>
</varlistentry>
<varlistentry id="disable-werror">
<term><option>--disable-werror</option></term>
<listitem><para>
Development versions of OProfile build by
default with <option>-Werror</option>. This option turns
<option>-Werror</option> off.
</para></listitem>
</varlistentry>
<varlistentry id="disable-optimization">
<term><option>--disable-optimization</option></term>
<listitem><para>
Disable the <option>-O2</option> compiler flag
(useful if you discover an OProfile bug and want to give a useful
back-trace etc.)
</para></listitem>
</varlistentry>
</variablelist>
<para>
To build the module for 2.4 kernels, you'll need a configured kernel source tree for the
running kernel. Since every distribution ships its own kernels, it is unlikely that the
running kernel matches the configured source you installed; the safest approach is to
recompile your own kernel, boot it, and then compile OProfile.
</para>
<para>
If you have a uniprocessor machine, it is also recommended that you enable the local APIC / IO_APIC
support for your kernel (this is automatically enabled for SMP kernels). With many BIOSes, on a
uniprocessor kernel of version 2.6.9 or later, enabling the local APIC is not sufficient: you must
also turn it on explicitly at boot time by passing the "lapic" option to the kernel.
</para>
<para>
On machines with power management, such as laptops, the power management
must be turned off when using OProfile with 2.4 kernels. The power management software
in the BIOS cannot handle the non-maskable interrupts (NMIs) used by
OProfile for data collection. If you use the NMI watchdog, be aware that
the watchdog is disabled when profiling starts, and not re-enabled until the
OProfile module is removed (or, in 2.6, when OProfile is not running). If you compile OProfile for
a 2.2 kernel, you must be root to compile the module. If you are using
2.6 kernels or higher, you do not need kernel source, as long as the
OProfile driver is enabled; additionally, you should not need to disable
power management.
</para>
<para>
Please note that you must save or have available the <filename>vmlinux</filename> file
generated during a kernel compile, as OProfile needs it (you can use
<option>--no-vmlinux</option>, but this will prevent kernel profiling).
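If you script your setup, a small check can pick the appropriate option automatically. The
following is a minimal POSIX shell sketch: the function name <function>vmlinux_option</function> is
hypothetical, the <filename>/boot/vmlinux-`uname -r`</filename> location simply follows the
example in <xref linkend="getting-started"/> (your system may keep the file elsewhere, or not
at all), and the script only prints the resulting <command>opcontrol</command> command rather
than running it:
<screen>
# Hypothetical helper: print the opcontrol option to use for a given
# vmlinux path, falling back to --no-vmlinux (no kernel symbols) if the
# file does not exist.
vmlinux_option() {
    if [ -f "$1" ]; then
        printf '%s\n' "--vmlinux=$1"
    else
        printf '%s\n' "--no-vmlinux"
    fi
}

# Assumed location, matching the "Getting started" example.
opt=$(vmlinux_option "/boot/vmlinux-$(uname -r)")
echo "opcontrol $opt"
</screen>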
</para>

</sect1>

<sect1 id="uninstall">
<title>Uninstalling OProfile</title>
<para>
You must have the source tree available to uninstall OProfile; a <command>make uninstall</command> will
remove all installed files except your configuration file in the directory <filename>~/.oprofile</filename>.
</para>
</sect1>

</chapter>

<chapter id="overview">
<title>Overview</title>

<sect1 id="getting-started">
<title>Getting started</title>
<para>
Before you can use OProfile, you must set it up. The minimum setup required for this
is to tell OProfile where the <filename>vmlinux</filename> file corresponding to the
running kernel is, for example:
</para>
<screen>opcontrol --vmlinux=/boot/vmlinux-`uname -r`</screen>
<para>
If you don't want to profile the kernel itself,
you can tell OProfile you don't have a <filename>vmlinux</filename> file:
</para>
<screen>opcontrol --no-vmlinux</screen>
<para>
Now we are ready to start the daemon (<command>oprofiled</command>) which collects
the profile data:
</para>
<screen>opcontrol --start</screen>
<para>
When you want to stop profiling, you can do so with:
</para>
<screen>opcontrol --shutdown</screen>
<para>
Note that unlike <command>gprof</command>, no instrumentation (<option>-pg</option>
and <option>-a</option> options to <command>gcc</command>)
is necessary.
</para>
<para>
Periodically (or on <command>opcontrol --shutdown</command> or <command>opcontrol --dump</command>)
the profile data is written out into the <filename>$SESSION_DIR/samples</filename> directory (by default <filename>/var/lib/oprofile/samples</filename>).
These profile files cover shared libraries, applications, the kernel (vmlinux), and kernel modules.
You can clear the profile data (at any time) with <command>opcontrol --reset</command>.
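For example, to look at the data collected so far without stopping the profiler, you can force a
flush with <option>--dump</option> and then run <command>opreport</command>. In this sketch,
<command>my_favourite_benchmark</command> is a placeholder for your own workload, as in the
examples later in this manual:
<screen>
# opcontrol --vmlinux=/boot/vmlinux-`uname -r`
# opcontrol --start
# my_favourite_benchmark --run
# opcontrol --dump
# opreport
# opcontrol --shutdown
</screen>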
</para>
<para>
To place these sample database files in a specific directory instead of the default location (<filename>/var/lib/oprofile</filename>), use the <option>--session-dir=dir</option> option. You must also pass <option>--session-dir</option> to the other tools so that they continue to use this directory. (In the future, we should allow this to be specified in an environment variable.) For example:
</para>
<screen>opcontrol --no-vmlinux --session-dir=/home/me/tmpsession</screen>
<screen>opcontrol --start --session-dir=/home/me/tmpsession</screen>
<para>
You can get summaries of this data in a number of ways at any time. To get a summary of
data across the entire system for all of these profiles, you can do:
</para>
<screen>opreport [--session-dir=dir]</screen>
<para>
Or, to get a more detailed summary for a particular image, you can do something like:
</para>
<screen>opreport -l /boot/vmlinux-`uname -r`</screen>
<para>
There are also a number of other ways of presenting the data, as described later in this manual.
Note that OProfile will choose a default profiling setup for you. However, there are a number
of options you can pass to <command>opcontrol</command> if you need to change something,
also detailed later.
</para>

</sect1>

<sect1 id="tools-overview">
<title>Tools summary</title>
<para>
This section gives a brief description of the available OProfile utilities and their purpose.
</para>
<variablelist>
<varlistentry>
<term><filename>ophelp</filename></term>
<listitem><para>
This utility lists the available events and short descriptions.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opcontrol</filename></term>
<listitem><para>
Used for controlling the OProfile data collection, discussed in <xref linkend="controlling" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>agent libraries</filename></term>
<listitem><para>
Used by virtual machines (like the Java VM) to record information about JITed code being profiled. See <xref linkend="setup-jit" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opreport</filename></term>
<listitem><para>
This is the main tool for retrieving useful profile data, described in
<xref linkend="opreport" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opannotate</filename></term>
<listitem><para>
This utility can be used to produce annotated source, assembly or mixed source/assembly.
Source-level annotation is available only if the application was compiled with
debugging symbols. See <xref linkend="opannotate" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opgprof</filename></term>
<listitem><para>
This utility can output gprof-style data files for a binary, for use with
<command>gprof -p</command>. See <xref linkend="opgprof" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>oparchive</filename></term>
<listitem><para>
This utility can be used to collect executables, debuginfo,
and sample files and copy the files into an archive.
The archive is self-contained and can be moved to another
machine for further analysis.
See <xref linkend="oparchive" />.
</para></listitem>
</varlistentry>

<varlistentry>
<term><filename>opimport</filename></term>
<listitem><para>
This utility converts sample database files from a foreign binary format (ABI) to
the native format. This is useful only when moving sample files between hosts,
for analysis on platforms other than the one used for collection.
See <xref linkend="opimport" />.
</para></listitem>
</varlistentry>

</variablelist>
</sect1>

</chapter>

<chapter id="controlling">
<title>Controlling the profiler</title>

<sect1 id="controlling-daemon">
<title>Using <command>opcontrol</command></title>
<para>
In this section we describe the configuration and control of the profiling system
with <command>opcontrol</command> in more depth.
The <command>opcontrol</command> script has a default setup, but you
can alter this with the options given below. In particular,
if your hardware supports performance counters, you can configure them.
There are a number of counters (for example, counter 0 and counter 1
on the Pentium III). Each of these counters can be programmed with
an event to count, such as cache misses or MMX operations. The event
chosen for each counter is reflected in the profile data collected
by OProfile: functions and binaries at the top of the profiles reflect
that most of the chosen events happened within that code.
</para>
<para>
Additionally, each counter has a "count" value: this corresponds to how
detailed the profile is. The lower the value, the more frequently profile
samples are taken. A counter can choose to sample only kernel code, user-space code,
or both (both is the default). Finally, some events have a "unit mask";
this is a value that further restricts the types of event that are counted.
The event types and unit masks for your CPU are listed by <command>opcontrol
--list-events</command>.
</para>
<para>
The <command>opcontrol</command> script provides the following actions:
</para>
<variablelist>
<varlistentry>
<term><option>--init</option></term>
<listitem><para>
Loads the OProfile module if required and makes the OProfile driver
interface available.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--setup</option></term>
<listitem><para>
Followed by a list of arguments for the profiling setup. The arguments are
saved in <filename>/root/.oprofile/daemonrc</filename>.
Giving this option is not necessary; you can just directly pass one
of the setup options, e.g. <command>opcontrol --no-vmlinux</command>.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--status</option></term>
<listitem><para>
Show configuration information.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--start-daemon</option></term>
<listitem><para>
Start the OProfile daemon without starting actual profiling. The profiling
can then be started using <option>--start</option>. This is useful to avoid
measuring the cost of daemon startup, as <option>--start</option> is a simple
write to a file in oprofilefs. Not available in 2.2/2.4 kernels.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--start</option></term>
<listitem><para>
Start data collection with either arguments provided by <option>--setup</option>
or information saved in <filename>/root/.oprofile/daemonrc</filename>. Specifying
the additional option <option>--verbose</option> makes the daemon generate lots of debug data
whilst it is running.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--dump</option></term>
<listitem><para>
Force a flush of the collected profiling data to the daemon.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--stop</option></term>
<listitem><para>
Stop data collection (this separate step is not possible with 2.2 or 2.4 kernels).
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--shutdown</option></term>
<listitem><para>
Stop data collection and kill the daemon.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--reset</option></term>
<listitem><para>
Clears out data from the current session, but leaves saved sessions.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--save=</option>session_name</term>
<listitem><para>
Save data from the current session to session_name.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--deinit</option></term>
<listitem><para>
Shuts down the daemon and unloads the OProfile module and oprofilefs.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--list-events</option></term>
<listitem><para>
List event types and unit masks.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--help</option></term>
<listitem><para>
Generate usage messages.
</para></listitem>
</varlistentry>
</variablelist>

<para>
There are a number of possible settings, of which only
<option>--vmlinux</option> (or <option>--no-vmlinux</option>)
is required. These settings are stored in <filename>~/.oprofile/daemonrc</filename>.
</para>
<variablelist>
<varlistentry>
<term><option>--buffer-size=</option>num</term>
<listitem><para>
Number of samples in the kernel buffer. When using a 2.6 kernel, the buffer
watershed needs to be adjusted whenever you change this value.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--buffer-watershed=</option>num</term>
<listitem><para>
Set the kernel buffer watershed to num samples (2.6 only). Once the kernel buffer
holds buffer-size - buffer-watershed samples (that is, once only buffer-watershed
free entries remain), its data is flushed to the daemon. The most useful values
are in the range [0.25 - 0.5] * buffer-size.
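As a worked example of the suggested range, the following POSIX shell sketch computes a
watershed for a given buffer size and prints the corresponding <command>opcontrol</command>
command. The buffer size of 262144 samples is an arbitrary illustration, not a recommended
default, and taking the midpoint of the range is just one reasonable choice:
<screen>
# Example buffer size (an arbitrary illustration, not a default).
BUFFER_SIZE=262144

# Suggested watershed range: 0.25 to 0.5 times the buffer size.
WATERSHED_MIN=$((BUFFER_SIZE / 4))
WATERSHED_MAX=$((BUFFER_SIZE / 2))

# Take the midpoint of that range.
WATERSHED=$(( (WATERSHED_MIN + WATERSHED_MAX) / 2 ))

# Print the setup command rather than running it.
echo "opcontrol --buffer-size=$BUFFER_SIZE --buffer-watershed=$WATERSHED"
</screen>
With these values the script prints
<command>opcontrol --buffer-size=262144 --buffer-watershed=98304</command>.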
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--cpu-buffer-size=</option>num</term>
<listitem><para>
Number of samples in the kernel per-CPU buffer (2.6 only). If you
profile at a high rate, it can help to increase this value if the log
file shows an excessive count of samples lost due to CPU buffer overflow.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--event=</option>[eventspec]</term>
<listitem><para>
Use the given performance counter event to profile.
See <xref linkend="eventspec" /> below.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--session-dir=</option>dir_path</term>
<listitem><para>
Create or use the sample database in directory <filename>dir_path</filename> instead of
the default location (<filename>/var/lib/oprofile</filename>).
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--separate=</option>[none,lib,kernel,thread,cpu,all]</term>
<listitem><para>
By default, every profile is stored in a single file. Thus, for example,
samples in the C library are all credited to the <filename>/lib/libc.o</filename>
profile. However, you can choose to create separate sample files by specifying
one of the options below.
</para>
<informaltable frame="all">
<tgroup cols='2'>
<tbody>
<row><entry><option>none</option></entry><entry>No profile separation (default)</entry></row>
<row><entry><option>lib</option></entry><entry>Create per-application profiles for libraries</entry></row>
<row><entry><option>kernel</option></entry><entry>Create per-application profiles for the kernel and kernel modules</entry></row>
<row><entry><option>thread</option></entry><entry>Create profiles for each thread and each task</entry></row>
<row><entry><option>cpu</option></entry><entry>Create profiles for each CPU</entry></row>
<row><entry><option>all</option></entry><entry>All of the above options</entry></row>
</tbody>
</tgroup>
</informaltable>
<para>
Note that <option>--separate=kernel</option> also turns on <option>--separate=lib</option>.
<!-- FIXME: update if this change -->
When using <option>--separate=kernel</option>, samples in hardware interrupts, soft-irqs, or other
asynchronous kernel contexts are credited to the task currently running. This means you will see
seemingly nonsensical profiles such as <filename>/bin/bash</filename> showing samples for the PPP modules,
etc.
</para>
<para>
On 2.2/2.4, only kernel threads already started when profiling begins are correctly profiled;
samples from newly started kernel threads are credited to the vmlinux (kernel) profile.
</para>
<para>
Using <option>--separate=thread</option> creates a lot
of sample files if you leave OProfile running for a while; it's most
useful when used for short sessions, or when using image filtering.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--callgraph=</option>#depth</term>
<listitem><para>
Enable call-graph sample collection with a maximum depth. Use 0 to disable
call-graph profiling.
NOTE: Call-graph support is available on a limited
number of platforms at this time; for example:
</para>
<itemizedlist>
<listitem><para>x86 with recent 2.6 kernel</para></listitem>
<listitem><para>ARM with recent 2.6 kernel</para></listitem>
<listitem><para>PowerPC with 2.6.17 kernel</para></listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--image=</option>image,[images]|"all"</term>
<listitem><para>
Image filtering. If you specify one or more absolute
paths to binaries, OProfile will only produce profile results for those
binary images. This is useful for restricting the sometimes voluminous
output you may get otherwise, especially with
<option>--separate=thread</option>. Note that if you are using
<option>--separate=lib</option> or
<option>--separate=kernel</option>, then if you specify an
application binary, the shared libraries and kernel code
<emphasis>are</emphasis> included. Specify the value
"all" to profile everything (the default).
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--vmlinux=</option>file</term>
<listitem><para>
The vmlinux kernel image file.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--no-vmlinux</option></term>
<listitem><para>
Use this when you don't have a kernel vmlinux file, and you don't want
to profile the kernel. This still counts the total number of kernel samples,
but can't give symbol-based results for the kernel or any modules.
</para></listitem>
</varlistentry>
</variablelist>

<sect2 id="opcontrolexamples">
<title>Examples</title>

<sect3 id="examplesperfctr">
<title>Intel performance counter setup</title>
<para>
Here, we have a Pentium III running at 800MHz, and we want to see where most data memory
references happen, and also get results for CPU time.
</para>
<screen>
# opcontrol --event=CPU_CLK_UNHALTED:400000 --event=DATA_MEM_REFS:10000
# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start
</screen>
</sect3>

<sect3 id="examplesrtc">
<title>RTC mode</title>
<para>
Here, we have an Intel laptop without support for performance counters, running a 2.4 kernel.
</para>
<screen>
# ophelp -r
CPU with RTC device
# opcontrol --vmlinux=/boot/2.4.13/vmlinux --event=RTC_INTERRUPTS:1024
# opcontrol --start
</screen>
</sect3>

<sect3 id="examplesstartdaemon">
<title>Starting the daemon separately</title>
<para>
If we're running a 2.6 kernel, we can use <option>--start-daemon</option> to prevent
the profiler's startup from affecting the results.
</para>
<screen>
# opcontrol --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start-daemon
# my_favourite_benchmark --init
# opcontrol --start ; my_favourite_benchmark --run ; opcontrol --stop
</screen>
</sect3>

<sect3 id="exampleseparate">
<title>Separate profiles for libraries and the kernel</title>
<para>
Here, we want to see a profile of the OProfile daemon itself, including when
it was running inside the kernel driver, and its use of shared libraries.
</para>
<screen>
# opcontrol --separate=kernel --vmlinux=/boot/2.6.0/vmlinux
# opcontrol --start
# my_favourite_stress_test --run
# opreport -l -p /lib/modules/2.6.0/kernel /usr/local/bin/oprofiled
</screen>
</sect3>

<sect3 id="examplessessions">
<title>Profiling sessions</title>
<para>
It can often be useful to split up profiling data into several different
time periods. For example, you may want to collect data on an application's
startup separately from the normal runtime data. You can use the
command <command>opcontrol --save</command> to do this. Running
</para>
<screen>
# opcontrol --save=blah
</screen>
<para>
creates a sub-directory in <filename>$SESSION_DIR/samples</filename> containing the samples
up to that point (the current session's sample files are moved into this
directory). You can then pass this session name as a parameter to the post-profiling
analysis tools to get only the data collected up to the point at which you saved the
session. If you do not want to keep a session, you can run
<command>rm -rf $SESSION_DIR/samples/sessionname</command> or, for the
current session, <command>opcontrol --reset</command>.
</para>
</sect3>
</sect2>

<sect2 id="eventspec">
<title>Specifying performance counter events</title>
<para>
The <option>--event</option> option to <command>opcontrol</command>
takes a specification that indicates how each
hardware performance counter should be set up. If you want to
revert to OProfile's default setting (<option>--event</option>
is strictly optional), use <option>--event=default</option>. Using this
option overrides all previous event selections.
</para>
<para>
You can pass multiple event specifications. OProfile will allocate
hardware counters as necessary.
Note that some combinations are not
allowed by the CPU; running <command>opcontrol --list-events</command> gives the details
of each event. The event specification is a colon-separated string
of the form <option><emphasis>name</emphasis>:<emphasis>count</emphasis>:<emphasis>unitmask</emphasis>:<emphasis>kernel</emphasis>:<emphasis>user</emphasis></option>, as described in this table:
</para>
<informaltable frame="all">
<tgroup cols='2'>
<tbody>
<row><entry><option>name</option></entry><entry>The symbolic event name, e.g. <constant>CPU_CLK_UNHALTED</constant></entry></row>
<row><entry><option>count</option></entry><entry>The counter reset value, e.g. 100000</entry></row>
<row><entry><option>unitmask</option></entry><entry>The unit mask, as given in the events list, e.g. 0x0f</entry></row>
<row><entry><option>kernel</option></entry><entry>Whether to profile kernel code</entry></row>
<row><entry><option>user</option></entry><entry>Whether to profile userspace code</entry></row>
</tbody>
</tgroup>
</informaltable>
<para>
The last three values are optional; if you omit them (e.g. <option>--event=DATA_MEM_REFS:30000</option>),
they are set to their default values (a unit mask of 0, and profiling of both kernel and
userspace code). Note that some events require a unit mask.
</para>
<note><para>
For the PowerPC platforms, all events specified must be in the same group; i.e., the group number
appended to the event name (e.g. <constant>&lt;<emphasis>some-event-name</emphasis>&gt;_GRP9</constant>) must be the same.
</para></note>
<para>
If OProfile is using RTC mode, and you want to alter the default counter value,
you can use something like <option>--event=RTC_INTERRUPTS:2048</option>. In RTC mode the last
three values are ignored.
If OProfile is using timer-interrupt mode, no configuration is possible.
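</para>
<para>
To make the <option>name:count:unitmask:kernel:user</option> format and its defaults concrete,
the following Python sketch splits a specification string into its five fields. It is an
illustration only, not the parser used by <command>opcontrol</command>: the
<function>parse_event_spec</function> and <function>parse_flag</function> functions and the
<classname>EventSpec</classname> class are hypothetical names, and the sketch assumes that the unit
mask may be written in decimal or <literal>0x</literal>-prefixed hexadecimal and that the kernel and
user fields take the values 0 or 1, as in the default-event table. It does not check event names,
unit masks, or the PowerPC group rule against the CPU's event list.
</para>
<programlisting>
```python
from dataclasses import dataclass


@dataclass
class EventSpec:
    name: str
    count: int
    unitmask: int = 0    # default: a unit mask of 0
    kernel: bool = True  # default: profile kernel code
    user: bool = True    # default: profile userspace code


def parse_flag(text, field):
    """Interpret a kernel/user field; assumed here to be '0' or '1'."""
    if text not in ("0", "1"):
        raise ValueError(field + " must be 0 or 1, got " + repr(text))
    return text == "1"


def parse_event_spec(spec):
    """Split 'name:count[:unitmask[:kernel[:user]]]' into an EventSpec.

    Hypothetical helper for illustration; it does not consult the CPU's event list.
    """
    fields = spec.split(":")
    if len(fields) not in (2, 3, 4, 5):
        raise ValueError("expected name:count[:unitmask[:kernel[:user]]], got " + repr(spec))
    name = fields[0]
    if not name:
        raise ValueError("empty event name in " + repr(spec))
    count = int(fields[1])
    if not count > 0:
        raise ValueError("count must be positive, got " + str(count))
    event = EventSpec(name, count)
    # The last three fields are optional; omitted ones keep the defaults above.
    if len(fields) > 2:
        event.unitmask = int(fields[2], 0)  # base 0 accepts "15" or "0x0f"
    if len(fields) > 3:
        event.kernel = parse_flag(fields[3], "kernel")
    if len(fields) > 4:
        event.user = parse_flag(fields[4], "user")
    return event
```
</programlisting>
<para>
For instance, <literal>parse_event_spec("DATA_MEM_REFS:30000")</literal> fills in a unit mask of 0
with both kernel and userspace profiling enabled, while parsing
<literal>CPU_CLK_UNHALTED:100000:0:0:1</literal> gives <literal>kernel=False</literal> and
<literal>user=True</literal>, i.e. userspace-only profiling.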
</para>
<para>
The table below lists the events selected by default
(<option>--event=default</option>) for the various computer architectures:
</para>
<informaltable frame="all">
<tgroup cols='3'>
<thead>
<row><entry>Processor</entry><entry>cpu_type</entry><entry>Default event</entry></row>
</thead>
<tbody>
<row><entry>Alpha EV4</entry><entry>alpha/ev4</entry><entry>CYCLES:100000:0:1:1</entry></row>
<row><entry>Alpha EV5</entry><entry>alpha/ev5</entry><entry>CYCLES:100000:0:1:1</entry></row>
<row><entry>Alpha PCA56</entry><entry>alpha/pca56</entry><entry>CYCLES:100000:0:1:1</entry></row>
<row><entry>Alpha EV6</entry><entry>alpha/ev6</entry><entry>CYCLES:100000:0:1:1</entry></row>
<row><entry>Alpha EV67</entry><entry>alpha/ev67</entry><entry>CYCLES:100000:0:1:1</entry></row>
<row><entry>ARM/XScale PMU1</entry><entry>arm/xscale1</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>ARM/XScale PMU2</entry><entry>arm/xscale2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>ARM/MPCore</entry><entry>arm/mpcore</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>AVR32</entry><entry>avr32</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>Athlon</entry><entry>i386/athlon</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Pentium Pro</entry><entry>i386/ppro</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Pentium II</entry><entry>i386/pii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Pentium III</entry><entry>i386/piii</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Pentium M (P6 core)</entry><entry>i386/p6_mobile</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Pentium 4 (non-HT)</entry><entry>i386/p4</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row>
<row><entry>Pentium 4 (HT)</entry><entry>i386/p4-ht</entry><entry>GLOBAL_POWER_EVENTS:100000:1:1:1</entry></row>
<row><entry>Hammer</entry><entry>x86-64/hammer</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Family10h</entry><entry>x86-64/family10</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Family11h</entry><entry>x86-64/family11h</entry><entry>CPU_CLK_UNHALTED:100000:0:1:1</entry></row>
<row><entry>Itanium</entry><entry>ia64/itanium</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>Itanium 2</entry><entry>ia64/itanium2</entry><entry>CPU_CYCLES:100000:0:1:1</entry></row>
<row><entry>TIMER_INT</entry><entry>timer</entry><entry>None selectable</entry></row>
<row><entry>IBM iseries</entry><entry>PowerPC 4/5/970</entry><entry>CYCLES:10000:0:1:1</entry></row>
<row><entry>IBM pseries</entry><entry>PowerPC 4/5/970/Cell</entry><entry>CYCLES:10000:0:1:1</entry></row>
<row><entry>IBM s390</entry><entry>timer</entry><entry>None selectable</entry></row>
<row><entry>IBM s390x</entry><entry>timer</entry><entry>None selectable</entry></row>
</tbody>
</tgroup>
</informaltable>

</sect2>

</sect1>

<sect1 id="setup-jit">
<title>Setting up the JIT profiling feature</title>
<para>
To gather information about JITed code from a virtual machine,
the virtual machine needs to be instrumented with an agent library. We use the
agent libraries for Java in the following example. To use the
Java profiling feature, you must build OProfile with the <option>--with-java</option> option
(<xref linkend="install" />).
</para>

<sect2 id="setup-jit-jvm">
<title>JVM instrumentation</title>
<para>
Add this to the startup parameters of the JVM (for JVMTI):

<screen><option>-agentpath:&lt;libdir&gt;/libjvmti_oprofile.so[=&lt;options&gt;]</option></screen>
or
<screen><option>-agentlib:jvmti_oprofile[=&lt;options&gt;]</option></screen>
</para>
<para>
The JVMPI agent implementation is enabled with the command line option
<screen><option>-Xrunjvmpi_oprofile[:&lt;options&gt;]</option></screen>
</para>
<para>
Currently, just one option is available: <option>debug</option>. For JVMPI,
the convention for specifying an option is <option>option_name=[yes|no]</option>.
For JVMTI, the option specification is simply the option name, implying
"yes"; not specifying the option implies "no".
</para>
<para>
The agent library (installed in <filename>&lt;oprof_install_dir&gt;/lib/oprofile</filename>)
needs to be in the library search path (e.g. add the library directory
to <constant>LD_LIBRARY_PATH</constant>). The JVM command line may not be directly
accessible, for example because it is buried within shell scripts or a
launcher program. In that case, it may be possible to add the instrumentation
through an environment variable; for Sun JVMs this is
<constant>JAVA_TOOL_OPTIONS</constant>. Please check your JVM documentation for
further information on the agent startup options.
</para>

</sect2>
</sect1>

<sect1 id="oprofile-gui">
<title>Using <command>oprof_start</command></title>
<para>
The <command>oprof_start</command> application provides a convenient way to start the profiler.
Note that <command>oprof_start</command> is just a wrapper around the <command>opcontrol</command> script,
so it provides no functionality beyond that of the script itself.
</para>
<para>
Once <command>oprof_start</command> is running, you can select the event type for each counter;
the sampling rate and other related parameters are explained in <xref linkend="controlling-daemon" />.
The "Configuration" section allows you to set general parameters such as the buffer size and the
kernel filename. The counter setup interface should be self-explanatory; <xref linkend="hardware-counters" /> and related
links contain information on using unit masks.
</para>
<para>
A status line shows the current status of the profiler: how long it has been running, and both the
average number of interrupts received per second and the total number received, summed over all processors.
Note that quitting <command>oprof_start</command> does not stop the profiler.
</para>
<para>
Your configuration is saved in the same file that <command>opcontrol</command> uses, that is,
<filename>~/.oprofile/daemonrc</filename>.
</para>

</sect1>

<sect1 id="detailed-parameters">
<title>Configuration details</title>

<sect2 id="hardware-counters">
<title>Hardware performance counters</title>
<note>
<para>
Your CPU type may not include the requisite support for hardware performance counters, in which case
you must use OProfile in RTC mode in 2.4 (see <xref linkend="rtc" />), or timer mode in 2.6 (see <xref linkend="timer" />).
You do not really need to read this section unless you are interested in using
events other than the default event chosen by OProfile.
</para>
</note>
<para>
The Intel hardware performance counters are detailed in the Intel IA-32 Architecture Manual, Volume 3, available
from <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink>.
The AMD Athlon/Opteron/Phenom/Turion implementation is detailed in <ulink
url="http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf">
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf</ulink>.
For PowerPC64 processors in IBM iSeries, pSeries, and blade server systems, processor documentation
is available at <ulink url="http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC/">
http://www-01.ibm.com/chips/techlib/techlib.nsf/productfamilies/PowerPC</ulink>. (For example, the
specific publication containing information on the performance monitor unit for the PowerPC970 is
"IBM PowerPC 970FX RISC Microprocessor User's Manual.")
These processors are capable of delivering an interrupt when a counter overflows.
This is the basic mechanism on which OProfile is based. The delivery mode is <acronym>NMI</acronym>,
so blocking interrupts in the kernel does not prevent profiling. When the interrupt handler is called,
the current <acronym>PC</acronym> value and the current task are recorded into the profiling structure.
This allows the overflow event to be attached to a specific assembly instruction in a binary image.
The daemon receives this data from the kernel and writes it to the sample files.
</para>
<para>
If we use an event such as <constant>CPU_CLK_UNHALTED</constant> or <constant>INST_RETIRED</constant>
(<constant>GLOBAL_POWER_EVENTS</constant> or <constant>INSTR_RETIRED</constant>, respectively, on the Pentium 4), we can
use the overflow counts as an estimate of the actual time spent in each part of the code. Alternatively, we can use
the other available counters to profile other interesting data, such as the cache behaviour of routines.
</para>
<para>
However, there are several caveats. First, there are the issues listed in the Intel manual.
There is a delay
between the counter overflow and the interrupt delivery that can skew results on a small scale; this means
you cannot rely on the profiles being perfectly accurate at the instruction level.
If you are using an "event-mode" counter such as the cache counters, a count registered against an
instruction doesn't mean that the instruction is responsible for that event. However, it implies that the
counter overflowed in the dynamic vicinity of that instruction, to within a few instructions. Further
details on this problem can be found in
<xref linkend="interpreting" /> and also in the Digital paper "ProfileMe: A Hardware Performance Counter".
</para>
<para>
Each counter has several configuration parameters.
First, there is the unit mask, which further specifies what to count.
Second, there is the counter value, discussed below. Third, there are parameters that control whether
counts are incremented while in kernel space, in user space, or both. You can configure these separately
for each counter.
</para>
<para>
After each overflow event, the counter is re-initialized
so that another overflow occurs once that many further events (the counter value) have been counted. Thus, higher
values mean less-detailed profiling, and lower values mean more detail, but higher overhead.
Picking a good value for this
parameter is, unfortunately, somewhat of a black art, and the right value naturally depends on the event
you have chosen.
Specifying too large a value means that not enough interrupts are generated
to give a realistic profile (though this problem can be ameliorated by profiling for <emphasis>longer</emphasis>).
Specifying too small a value can lead to higher performance overhead.
</para>

</sect2>

<sect2 id="rtc">
<title>OProfile in RTC mode</title>
<note><para>
This section applies to 2.2/2.4 kernels only.
</para></note>
<para>
Some CPU types do not provide the hardware support needed to use the hardware performance counters. This includes
some laptops, classic Pentiums, and other CPU types not yet supported by OProfile (such as Cyrix).
On these machines, OProfile falls
back to using the real-time clock interrupt to collect samples. This interrupt is also used by the <command>rtc</command>
module, so you cannot have both the OProfile and rtc modules loaded, nor can you have rtc support compiled into the kernel.
</para>
<para>
RTC mode is less capable than hardware counter mode; in particular, it is unable to profile sections of
the kernel where interrupts are disabled. There is just one available event, "RTC interrupts", and its value
corresponds to the number of interrupts generated per second (that is, a higher number means better profiling
resolution, and higher overhead). The current implementation of the real-time clock supports only power-of-two
sampling rates from 2 to 4096 per second. Other values within this range are rounded to the nearest power of
two.
</para>
<para>
You can force use of the RTC interrupt with the <option>force_rtc=1</option> module parameter.
</para>
<para>
Setting the value from the GUI should be straightforward. On the command line, you need to specify the
event to <command>opcontrol</command>, for example:
</para>
<para><command>opcontrol --event=RTC_INTERRUPTS:256</command></para>
</sect2>

<sect2 id="timer">
<title>OProfile in timer interrupt mode</title>
<note><para>
This section applies to 2.6 kernels and above only.
</para></note>
<para>
In 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver
falls back to using the timer interrupt for profiling.
Like RTC mode on 2.4 kernels, timer mode cannot
profile code that runs with interrupts disabled. Note that, unlike the RTC and hardware performance
counter setups, timer mode has no configuration parameters.
</para>
<para>
You can force use of the timer interrupt by using the <option>timer=1</option> module
parameter (or <option>oprofile.timer=1</option> on the boot command line if OProfile is
built-in).
</para>
</sect2>

<sect2 id="p4">
<title>Pentium 4 support</title>
<para>
The Pentium 4 / Xeon performance counters are organized around 3 types of model specific registers (MSRs): 45 event
selection control registers (ESCRs), 18 counter configuration control registers (CCCRs) and 18 counters. ESCRs describe a
particular set of events which are to be recorded, and CCCRs bind ESCRs to counters and configure their
operation. Unfortunately, the relationship between these registers is quite complex; not every combination of them
can be used at the same time. There is, however, a subset of 8 counters, 8 ESCRs, and 8 CCCRs which can be used independently of
one another, so OProfile only accesses those registers, treating them as a bank of 8 "normal" counters, similar
to those in the P6 or Athlon/Opteron/Phenom/Turion families of CPU.
</para>
<para>
There is currently no support for Precision Event-Based Sampling (PEBS), nor for any advanced uses of the Debug Store
(DS). Current support is limited to the conservative extension of OProfile's existing interrupt-based model described
above. Performance monitoring hardware on Pentium 4 / Xeon processors with Hyper-Threading enabled (multiple logical
processors on a single die) is not supported in 2.4 kernels; you can still use OProfile on such systems if you
disable Hyper-Threading.
</para>
</sect2>

<sect2 id="ia64">
<title>Intel Itanium 2 support</title>
<para>
The Itanium 2 performance monitoring unit (PMU) organizes the counters as four
pairs of performance event monitoring registers. Each pair is composed of a
Performance Monitoring Configuration (PMC) register and a Performance Monitoring
Data (PMD) register. The PMC selects the performance event being monitored and
the PMD determines the sampling interval. The PMU triggers sampling with
maskable interrupts. Thus, samples will not occur in sections of the IA64
kernel where interrupts are disabled.
</para>
<para>
None of the advanced features of the Itanium 2 performance monitoring unit,
such as opcode matching, address range matching, or precise event sampling, are
supported by this version of OProfile. The Itanium 2 support only maps OProfile's
existing interrupt-based model to the PMU hardware.
</para>
</sect2>

<sect2 id="ppc64">
<title>PowerPC64 support</title>
<para>
The performance monitoring unit (PMU) for the IBM PowerPC 64-bit processors
consists of between 4 and 8 counters (depending on the model), plus three
special purpose registers used for programming the counters: MMCR0, MMCR1,
and MMCRA. Advanced features such as instruction matching and thresholding are
not supported by this version of OProfile.
<note><para>Later versions of the IBM POWER5+ processor (beginning with revision 3.0)
run the performance monitor unit in POWER6 mode, effectively removing OProfile's
access to counters 5 and 6. These two counters are dedicated to counting
instructions completed and cycles, respectively. In POWER6 mode, however, these
counters do not generate an interrupt on overflow and so are unusable by
OProfile. Kernel versions 2.6.23 and higher recognize this mode
and export "ppc64/power5++" as the cpu_type to the oprofilefs pseudo filesystem.
OProfile userspace responds to this cpu_type by removing these counters from
the list of potential events to count. Without this kernel support, attempts
to profile using an event from one of these counters will yield incorrect
results; typically, zero (or near zero) samples in the generated report.
</para></note>
</para>

</sect2>

<sect2 id="cell-be">
<title>Cell Broadband Engine support</title>
<para>
The Cell Broadband Engine (CBE) processor core consists of a PowerPC Processing
Element (PPE) and 8 Synergistic Processing Elements (SPE). PPEs and SPEs each
consist of a processing unit (PPU and SPU, respectively) and other hardware
components, such as memory controllers.
</para>
<para>
A PPU has two hardware threads (also known as "virtual CPUs"). The performance monitor
unit of the CBE collects event information on one hardware thread at a time.
Therefore, when profiling PPE events,
OProfile collects the profile based on the selected events by time slicing the
performance counter hardware between the two threads. The user must ensure the
collection interval is long enough that the time spent collecting data for
each PPU is sufficient to obtain a good profile.
</para>
<para>
To profile an SPU application, the user should specify the <constant>SPU_CYCLES</constant> event.
When starting OProfile with <constant>SPU_CYCLES</constant>, the <command>opcontrol</command> script enforces certain
separation parameters (<option>--separate=cpu,lib</option>) to ensure that the sample data contains
enough information to generate a complete report. The
<option>--merge=cpu</option> option can be used to obtain a more readable report if analyzing
the performance of each separate SPU is not necessary.
</para>
<para>
Profiling with an SPU event (events 4100 through 4163) is not compatible with any other
event. Furthermore, only one SPU event can be specified at a time. The hardware only
supports profiling on one SPU per node at a time, so the OProfile kernel code time slices
between the eight SPUs to collect data on all SPUs.
</para>
<para>
SPU profile reports have some unique characteristics compared to reports for
standard architectures:
</para>
<itemizedlist>
<listitem><para>Typically there is no "app name" column. This is really standard OProfile behavior
when the report contains samples for just a single application, which is
commonly the case when profiling SPUs.</para></listitem>
<listitem><para>"CPU" equates to "SPU".</para></listitem>
<listitem><para>Specifying <option>--long-filenames</option> on the <command>opreport</command> command does not always result
in long filenames. This happens when the SPU application code is embedded in
the PPE executable or shared library. The embedded SPU ELF data contains only the
short filename (i.e., no path information) for the SPU binary file that was used as
the source for embedding. Only the short filename is used because
the original SPU binary file may not exist or be accessible at runtime. The performance
analyst must have sufficient knowledge of the application to be able to correlate the
SPU binary image names found in the report to the application's source files.</para>
<note><para>
Compile the application with <option>-g</option> and generate the OProfile report
with <option>-g</option> to make it easier to find the right source file(s) on which to focus.
</para></note>
</listitem>
</itemizedlist>

</sect2>

<sect2 id="amd-ibs-support">
<title>AMD64 (x86_64) Instruction-Based Sampling (IBS) support</title>

<para>
Instruction-Based Sampling (IBS) is a new performance measurement technique
available on AMD Family 10h processors.
Traditional performance counter
sampling is not precise enough to isolate performance issues to individual
instructions. IBS, however, precisely identifies instructions which are not
making the best use of the processor pipeline and memory hierarchy.
For more information, please refer to the paper "Instruction-Based Sampling:
A New Performance Analysis Technique for AMD Family 10h Processors"
(<ulink url="http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf">
http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf</ulink>).
There are two types of IBS profile, described in the following sections.
</para>

<sect3 id="ibs-fetch">
<title>IBS Fetch</title>

<para>
IBS fetch sampling is a statistical sampling method which counts completed
fetch operations. When the number of completed fetch operations reaches the
maximum fetch count (the sampling period), IBS tags the fetch operation and
monitors that operation until it either completes or aborts. When a tagged
fetch completes or aborts, a sampling interrupt is generated and an IBS fetch
sample is taken. An IBS fetch sample contains a timestamp, the identifier of
the interrupted process, the virtual fetch address, and several event flags
and values that describe what happened during the fetch operation.
</para>

</sect3>

<sect3 id="ibs-op">
<title>IBS Op</title>

<para>
IBS op sampling selects, tags, and monitors macro-ops as issued from AMD64
instructions. Two options are available for selecting ops for sampling:
</para>

<itemizedlist>
<listitem><para>
Cycles-based selection counts CPU clock cycles. The op is tagged and monitored
when the count reaches a threshold (the sampling period) and a valid op is
available.
</para></listitem>

<listitem><para>
Dispatched op-based selection counts dispatched macro-ops.
When the count reaches a threshold, the next valid op is tagged and monitored.
</para></listitem>
</itemizedlist>

<para>
In both cases, an IBS sample is generated only if the tagged op retires.
Thus, IBS op event information does not measure speculative execution activity.
The execution stages of the pipeline monitor the tagged macro-op. When the
tagged macro-op retires, a sampling interrupt is generated and an IBS op
sample is taken. An IBS op sample contains a timestamp, the identifier of
the interrupted process, the virtual address of the AMD64 instruction from
which the op was issued, and several event flags and values that describe
what happened when the macro-op executed.
</para>

</sect3>

<para>
IBS profiling is enabled simply by specifying IBS performance events
through the <option>--event=</option> options. These events are listed in the output of
<command>opcontrol --list-events</command>.
</para>

<screen>
opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
</screen>
<note><para>
All IBS fetch events must have the same event count and unit mask; the same
applies to all IBS op events.
</para></note>

</sect2>


<sect2 id="misuse">
<title>Dangerous counter settings</title>
<para>
OProfile is a low-level profiler which allows continuous profiling at a low overhead.
If too low a count reset value is set for a counter, the system can become overloaded with counter
interrupts and appear to have frozen. Whilst some validation is done, it
is not foolproof.
</para>
<note><para>
This can happen as follows: when the profiler count
reaches zero, an NMI handler is called which stores the sample values in an internal buffer, then resets the counter
to its original value.
If the count is very low, a pending NMI can be sent before the NMI handler has
completed. Due to the priority of the NMI, the local APIC delivers the pending interrupt immediately after
the previous interrupt handler completes, and control never returns to other parts of the system.
In this way, the system appears to be frozen.
</para></note>
<para>If this happens, it will be impossible to bring the system back to a workable state.
There is no real safeguard against this, other than making sure to use a reasonable value
for the counter reset. For example, setting the <constant>CPU_CLK_UNHALTED</constant> event with a ridiculously low reset count (e.g. 500)
is likely to freeze the system.
</para>
<para>
In short: <command>Don't try a foolish sample count value</command>. Unfortunately, what counts as a foolish value
really depends on the event type; if in doubt, e-mail
<email>oprofile-list@lists.sf.net</email>.
</para>
</sect2>

</sect1>

</chapter>

<chapter id="results">
<title>Obtaining results</title>
<para>
OK, so the profiler has been running, but it's not much use unless we can get some data out. Fairly often,
OProfile does a little <emphasis>too</emphasis> good a job of keeping overhead low, and no data reaches
the profiler. This can happen on lightly-loaded machines. Remember that you can force a dump at any time with:
</para>
<para><command>opcontrol --dump</command></para>
<para>Remember to do this before complaining that there is no profiling data!
Now that we've got some data, it has to be processed. That's the job of <command>opreport</command>,
<command>opannotate</command>, or <command>opgprof</command>.
</para>

<sect1 id="profile-spec">
<title>Profile specifications</title>

<para>
All of the analysis tools take a <emphasis>profile specification</emphasis>.
This is a set of definitions that describe which actual profiles should be
examined. The simplest profile specification is empty: this will match all
the available profile files for the current session (this is what happens
when you run <command>opreport</command> with no arguments).
</para>
<para>
Specification parameters are of the form <option>name:value[,value]</option>.
For example, if I wanted to get a combined symbol summary for
<filename>/bin/myprog</filename> and <filename>/bin/myprog2</filename>,
I could do <command>opreport -l image:/bin/myprog,/bin/myprog2</command>.
As a special case, you don't actually need to specify the <option>image:</option>
part here: anything left on the command line is assumed to be an
<option>image:</option> name. Similarly, if no <option>session:</option>
is specified, then <option>session:current</option> is assumed ("current"
is a special name for the current / last profiling session).
</para>
<para>
In addition to the comma-separated list shown above, some of the
specification parameters can take <command>glob</command>-style
values. For example, if I want to see image summaries for all
binaries profiled in <filename>/usr/bin/</filename>, I could do
<command>opreport image:/usr/bin/\*</command>. Note that the special
character must be escaped from the shell.
</para>
<para>
For <command>opreport</command>, profile specifications can be used to
define two profiles, giving differential output. This is done by
enclosing each of the two specifications within curly braces, as shown
in the examples below. Any specifications outside of curly braces are
shared across both.
</para>

<sect2 id="profile-spec-examples">
<title>Examples</title>

<para>
Image summaries for all profiles with <constant>DATA_MEM_REFS</constant>
samples in the saved session called "stresstest":
</para>
<screen>
# opreport session:stresstest event:DATA_MEM_REFS
</screen>

<para>
Symbol summary for the application called "test_sym53c8xx,9xx". Note that the
escaping is necessary because <option>image:</option> takes a comma-separated list.
</para>
<screen>
# opreport -l ./test/test_sym53c8xx\,9xx
</screen>

<para>
Image summaries for all binaries in the <filename>test</filename> directory,
excluding <filename>boring-test</filename>:
</para>
<screen>
# opreport image:./test/\* image-exclude:./test/boring-test
</screen>

<para>
Differential profile of a binary stored in two archives:
</para>
<screen>
# opreport -l /bin/bash { archive:./orig } { archive:./new }
</screen>

<para>
Differential profile of an archived binary with the current session:
</para>
<screen>
# opreport -l /bin/bash { archive:./orig } { }
</screen>

</sect2> <!-- profile spec examples -->

<sect2 id="profile-spec-details">
<title>Profile specification parameters</title>

<variablelist>
<varlistentry>
<term><option>archive:</option><emphasis>archivepath</emphasis></term>
<listitem><para>
A path to an archive made with <command>oparchive</command>.
Absence of this tag, unlike others, means "the current system",
equivalent to specifying "archive:".
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>session:</option><emphasis>sessionlist</emphasis></term>
<listitem><para>
A comma-separated list of session names to resolve in.
Absence of this
tag, unlike others, means "the current session", equivalent to
specifying "session:current".
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>session-exclude:</option><emphasis>sessionlist</emphasis></term>
<listitem><para>
A comma-separated list of sessions to exclude.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>image:</option><emphasis>imagelist</emphasis></term>
<listitem><para>
A comma-separated list of image names to resolve. Each entry may be a relative
path, a <command>glob</command>-style name, or a full path, e.g.</para>
<screen>opreport 'image:/usr/bin/oprofiled,*op*,./opreport'</screen>
</listitem>
</varlistentry>

<varlistentry>
<term><option>image-exclude:</option><emphasis>imagelist</emphasis></term>
<listitem><para>
Same as <option>image:</option>, but the matching images are excluded.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>lib-image:</option><emphasis>imagelist</emphasis></term>
<listitem><para>
Same as <option>image:</option>, but only for images that belong to
a particular primary binary image (namely, an application). This only
makes sense if you're using <option>--separate</option>.
This includes kernel modules and the kernel when using
<option>--separate=kernel</option>.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>lib-image-exclude:</option><emphasis>imagelist</emphasis></term>
<listitem><para>
Same as <option>lib-image:</option>, but the matching images
are excluded.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>event:</option><emphasis>eventlist</emphasis></term>
<listitem><para>
The symbolic event name to match on, e.g. <option>event:DATA_MEM_REFS</option>.
You can pass a list of events for side-by-side comparison with <command>opreport</command>.
When using the timer interrupt, the event is always "TIMER".
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>count:</option><emphasis>eventcountlist</emphasis></term>
<listitem><para>
The event count to match on, e.g. <option>event:DATA_MEM_REFS count:30000</option>.
Note that this value refers to the setting used for <command>opcontrol</command>
only, and has nothing to do with the sample counts in the profile data
itself.
You can pass a list of counts for side-by-side comparison with <command>opreport</command>.
When using the timer interrupt, the count is always 0 (indicating it cannot be set).
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>unit-mask:</option><emphasis>masklist</emphasis></term>
<listitem><para>
The unit mask value of the event to match on, e.g. <option>unit-mask:1</option>.
You can pass a list of unit masks for side-by-side comparison with <command>opreport</command>.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>cpu:</option><emphasis>cpulist</emphasis></term>
<listitem><para>
Only consider profiles for the given numbered CPU (starting from zero).
This is only useful when using CPU profile separation.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>tgid:</option><emphasis>pidlist</emphasis></term>
<listitem><para>
Only consider profiles for the given task groups. Unless some program
is using threads, the task group ID of a process is the same
as its process ID. This option corresponds to the POSIX
notion of a thread group.
This is only useful when using per-process profile separation.
</para></listitem>
</varlistentry>

<varlistentry>
<term><option>tid:</option><emphasis>tidlist</emphasis></term>
<listitem><para>
Only consider profiles for the given threads. When using
recent thread libraries, all threads in a process share the
same task group ID, but have different thread IDs. You can
use this option in combination with <option>tgid:</option> to
restrict the results to particular threads within a process.
This is only useful when using per-process profile separation.
</para></listitem>
</varlistentry>
</variablelist>

</sect2>

<sect2 id="locating-and-managing-binary-images">
<title>Locating and managing binary images</title>
<para>
Each session's sample files can be found in the <filename>$SESSION_DIR/samples/</filename> directory (default: <filename>/var/lib/oprofile/samples/</filename>).
These are used, along with the binary image files, to produce human-readable data.
In some circumstances (kernel modules in an initrd, or modules on 2.6 kernels), OProfile
will not be able to find the binary images. All the tools have an <option>--image-path</option>
option to which you can pass a comma-separated list of alternate paths to search. For example,
I can let OProfile find my 2.6 modules by using <command>--image-path /lib/modules/2.6.0/kernel/</command>.
It is your responsibility to ensure that the correct images are found when using this
option.
</para>
<para>
Note that if a binary image changes after the sample file was created, you won't be able to get useful
symbol-based data out. This situation is detected for you. If you replace a binary, you should
make sure to save the old binary if you need to do comparative profiles.
</para>

</sect2>

<sect2 id="no-results">
<title>What to do when you don't get any results</title>
<para>
When attempting to get output, you may see the error:
</para>
<screen>
error: no sample files found: profile specification too strict ?
</screen>
<para>
This means that the profile specification you passed in,
when matched against the available sample files, resulted in no matches.
There are a number of reasons this might happen:
</para>
<variablelist>
<varlistentry><term>spelling</term><listitem><para>
You specified a binary name, but spelt it wrongly. Check your spelling!
</para></listitem></varlistentry>
<varlistentry><term>profiler wasn't running</term><listitem><para>
Make very sure that OProfile was actually up and running when you ran
the binary.
</para></listitem></varlistentry>
<varlistentry><term>binary didn't run long enough</term><listitem><para>
Remember that OProfile is a statistical profiler: you're not guaranteed to
get samples for short-running programs. You can help this by using a
lower count for the performance counter, so that many more samples are
taken per second.
</para></listitem></varlistentry>
<varlistentry><term>binary spent most of its time in libraries</term><listitem><para>
Similarly, if the binary spends little time in the main binary image
itself, with most of it spent in shared libraries it uses, you might
not see any samples for the binary image itself. You can check this
by using <command>opcontrol --separate=lib</command> before the
profiling session, so <command>opreport</command> and friends show
the library profiles on a per-application basis.
</para></listitem></varlistentry>
<varlistentry><term>specification was really too strict</term><listitem><para>
For example, you specified something like <option>tgid:3433</option>,
but no task with that group ID ever ran the code.
</para></listitem></varlistentry>
<varlistentry><term>binary didn't generate any events</term><listitem><para>
If you're using a particular event counter, for example counting MMX
operations, the code might simply not have generated any events in the
first place. Verify that the code you're profiling does what you expect it
to.
</para></listitem></varlistentry>
<varlistentry><term>you didn't specify the kernel module name correctly</term><listitem><para>
If you're using 2.6 kernels and trying to get reports for a kernel
module, make sure to use the <option>-p</option> option, and specify the
module name <emphasis>with</emphasis> the <filename>.ko</filename>
extension. Also check whether the module was loaded from an initrd.
</para></listitem></varlistentry>
</variablelist>

</sect2>

</sect1> <!-- profile-spec -->

<sect1 id="opreport">
<title>Image summaries and symbol summaries (<command>opreport</command>)</title>
<para>
The <command>opreport</command> utility is the primary utility you will use for
getting formatted data out of OProfile. It produces two types of data: image summaries
and symbol summaries. An image summary lists the number of samples for individual
binary images such as libraries or applications. Symbol summaries provide per-symbol
profile data.
In the following example, we're getting an image summary for the whole
system:
</para>
<screen>
$ opreport --long-filenames
CPU: PIII, speed 863.195 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
905898 59.7415 /usr/lib/gcc-lib/i386-redhat-linux/3.2/cc1plus
214320 14.1338 /boot/2.6.0/vmlinux
103450  6.8222 /lib/i686/libc-2.3.2.so
 60160  3.9674 /usr/local/bin/madplay
 31769  2.0951 /usr/local/oprofile-pp/bin/oprofiled
 26550  1.7509 /usr/lib/libartsflow.so.1.0.0
 23906  1.5765 /usr/bin/as
 18770  1.2378 /oprofile
 15528  1.0240 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
 11979  0.7900 /usr/X11R6/bin/XFree86
 11328  0.7471 /bin/bash
...
</screen>
<para>
If we had specified <option>--symbols</option> in the previous command, we would have
gotten a symbol summary of all the images across the entire system. We can restrict this to only
part of the system profile; for example,
below is a symbol summary of the OProfile daemon. Note that as we used
<command>opcontrol --separate=kernel</command>, symbols from images that <command>oprofiled</command>
has used are also shown.
</para>
<screen>
$ opreport -l `which oprofiled` 2>/dev/null | more
CPU: PIII, speed 863.195 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 23150
vma      samples  %           image name  symbol name
0804be10 14971    28.1993     oprofiled   odb_insert
0804afdc 7144     13.4564     oprofiled   pop_buffer_value
c01daea0 6113     11.5144     vmlinux     __copy_to_user_ll
0804b060 2816      5.3042     oprofiled   opd_put_sample
0804b4a0 2147      4.0441     oprofiled   opd_process_samples
0804acf4 1855      3.4941     oprofiled   opd_put_image_sample
0804ad84 1766      3.3264     oprofiled   opd_find_image
0804a5ec 1084      2.0418     oprofiled   opd_find_module
0804ba5c 741       1.3957     oprofiled   odb_hash_add_node
...
</screen>

<para>
These are the two basic ways you are most likely to use regularly, but <command>opreport</command>
can do a lot more than that, as described below.
</para>

<sect2 id="opreport-merging">
<title>Merging separate profiles</title>
<para>
If you have used one of the <option>--separate=</option> options
whilst profiling, there can be several separate profiles for
a single binary image within a session. Normally the output
will keep these images separated (so, for example, the image summary
output shows library image summaries on a per-application basis,
when using <option>--separate=lib</option>).
Sometimes it can be useful to merge these results back together
before generating output. The <option>--merge</option> option allows
you to do that.
</para>
</sect2>

<sect2 id="opreport-comparison">
<title>Side-by-side multiple results</title>
If you have used multiple events when profiling, by default you get
side-by-side results of each event's sample values from <command>opreport</command>.
You can restrict which events to list by appropriate use of the
<option>event:</option> profile specifications, etc.
</sect2>

<sect2 id="opreport-callgraph">
<title>Callgraph output</title>
<para>
This section provides details on how to use the OProfile callgraph feature.
</para>
<sect3 id="op-cg1">
<title>Callgraph details</title>
<para>
When using the <option>opcontrol --callgraph</option> option, you can see which
functions are calling other functions in the output. Consider the
following program:
</para>
<screen>
#define _GNU_SOURCE	/* for strfry() */
#include &lt;string.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdio.h&gt;

#define SIZE 500000

/* qsort() passes pointers to the array elements, i.e. char ** here */
static int compare(const void *s1, const void *s2)
{
	return strcmp(*(char * const *)s1, *(char * const *)s2);
}

static void repeat(void)
{
	int i;
	char *strings[SIZE];
	char str[] = "abcdefghijklmnopqrstuvwxyz";

	for (i = 0; i &lt; SIZE; ++i) {
		strings[i] = strdup(str);
		strfry(strings[i]);
	}

	qsort(strings, SIZE, sizeof(char *), compare);

	/* release the copies so the endless loop does not exhaust memory */
	for (i = 0; i &lt; SIZE; ++i)
		free(strings[i]);
}

int main(void)
{
	while (1)
		repeat();
}
</screen>
<para>
When running with the call-graph option, OProfile will
record the function stack every time it takes a sample.
<command>opreport --callgraph</command> outputs an entry for each
function, where each entry looks similar to:
</para>
<screen>
samples  %        image name               symbol name
  197       0.1548  cg                       main
  127036   99.8452  cg                       repeat
84590    42.5084  libc-2.3.2.so            strfry
  84590    66.4838  libc-2.3.2.so            strfry [self]
  39169    30.7850  libc-2.3.2.so            random_r
  3475      2.7312  libc-2.3.2.so            __i686.get_pc_thunk.bx
-------------------------------------------------------------------------------
</screen>
<para>
Here the non-indented line is the function we're focussing upon
(<function>strfry()</function>).
This
line is the same as you'd get from a normal <command>opreport</command>
output.
</para>
<para>
Above the non-indented line we find the functions that called this
function (for example, <function>repeat()</function> calls
<function>strfry()</function>). The samples and percentage values here
refer to the number of times we took a sample where this call was found
in the stack; the percentage is relative to all callers of the
function we're focussing on. Note that these values are
<emphasis>not</emphasis> call counts; they only reflect the call stack
at the moment each sample is taken: if a call is found in the stack
at the time of a sample, it is recorded in this count.
</para>
<para>
Below the line are functions that are called by
<function>strfry()</function> (called <emphasis>callees</emphasis>).
It's clear here that <function>strfry()</function> calls
<function>random_r()</function>. We also see a special entry with a
"[self]" marker. This records the normal samples for the function, but
the percentage becomes relative to all callees. This allows you to
compare the time spent in the function itself with the time spent in the functions it
calls. Note that if a function calls itself, then it will appear in the
list of callees of itself, but without the "[self]" marker; so recursive
calls are still clearly separable.
</para>
<para>
You may have noticed that the output lists <function>main()</function>
as calling <function>strfry()</function>, but it's clear from the source
that this doesn't actually happen. See <xref
linkend="interpreting-callgraph" /> for an explanation.
</para>
</sect3>
<sect3 id="cg-with-jitsupport">
<title>Callgraph and JIT support</title>
<para>
Callgraph output where anonymously mapped code is in the callstack can sometimes be misleading.
For all such code, the samples for the anonymously mapped code are stored in a samples subdirectory
named <filename>{anon:anon}/&lt;tgid&gt;.&lt;begin_addr&gt;.&lt;end_addr&gt;</filename>.
As stated earlier, if this anonymously mapped code is JITed code from a supported VM like Java,
OProfile creates an ELF file to provide a (somewhat) permanent backing file for the code.
However, when viewing callgraph output, any anonymously mapped code in the callstack
will be attributed to <filename>anon (tgid:&lt;tgid&gt; range:&lt;begin_addr&gt;-&lt;end_addr&gt;)</filename>,
even if a <filename>.jo</filename> ELF file had been created for it. See the example below.
</para>
<screen>
-------------------------------------------------------------------------------
  1         2.2727  libj9ute23.so            java.bin                 traceV
  2         4.5455  libj9ute23.so            java.bin                 utsTraceV
  4         9.0909  libj9trc23.so            java.bin                 fillInUTInterfaces
  37       84.0909  libj9trc23.so            java.bin                 twGetSequenceCounter
8         0.0154  libj9prt23.so            java.bin                 j9time_hires_clock
  27       61.3636  anon (tgid:10014 range:0x100000-0x103000) java.bin   (no symbols)
  9        20.4545  libc-2.4.so              java.bin                 gettimeofday
  8        18.1818  libj9prt23.so            java.bin                 j9time_hires_clock [self]
-------------------------------------------------------------------------------
</screen>
<para>
The output shows that "anon (tgid:10014 range:0x100000-0x103000)" was a callee of
<function>j9time_hires_clock</function>, even though the ELF file <filename>10014.jo</filename> was
created for this profile run. Unfortunately, there is currently no way to correlate
that anonymous callgraph entry with its corresponding <filename>.jo</filename> file.
</para>
</sect3>


</sect2> <!-- opreport-callgraph -->

<sect2 id="opreport-diff">
<title>Differential profiles with <command>opreport</command></title>

<para>
Often, we'd like to be able to compare two profiles.
For example, when
analysing the performance of an application, we'd like to make code
changes and examine the effect of the change. This is supported in
<command>opreport</command> by giving a profile specification that
identifies two different profiles. The general form is:
</para>
<screen>
$ opreport &lt;shared-spec&gt; { &lt;first-profile&gt; } { &lt;second-profile&gt; }
</screen>
<note><para>
We lost our Dragon book down the back of the sofa, so you have to be
careful to have spaces around those braces, or things will get
hopelessly confused. We can only apologise.
</para></note>
<para>
For each of the profiles, the shared section is prefixed, and then the
specification is analysed. The usual parameters work both within the
shared section, and in the sub-specification within the curly braces.
</para>
<para>
A typical way to use this feature is with archives created with
<command>oparchive</command>. Let's look at an example:
</para>
<screen>
$ ./a
$ oparchive -o orig ./a
$ opcontrol --reset
  # edit and recompile a
$ ./a
  # now compare the current profile of a with the archived profile
$ opreport -xl ./a { archive:./orig } { }
CPU: PIII, speed 863.233 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        diff %    symbol name
92435    48.5366  +0.4999   a
54226    ---      ---       c
49222    25.8459  +++       d
48787    25.6175  -2.2e-01  b
</screen>
<para>
Note that we specified an empty second profile in the curly braces, as
we wanted to use the current session; alternatively, we could
have specified another archive, or a tgid etc. We specified the binary
<command>a</command> in the shared section, so we matched that in both
the profiles we're diffing.
</para>
<para>
As in the normal output, the results are sorted by the number of
samples, and the percentage field represents the relative percentage of
the symbol's samples in the second profile.
</para>
<para>
Notice the new column in the output. This value represents the
percentage change of the relative percent between the first and the
second profile: roughly, "how much more important this symbol is".
Looking at the symbol <function>a()</function>, we can see that it took
roughly the same amount of the total profile in both the first and the
second profile. The function <function>c()</function> was not in the new
profile, so has been marked with <literal>---</literal>. Note that the
sample value is the number of samples in the first profile; since we're
displaying results for the second profile, we don't list a percentage
value for it, as it would be meaningless. <function>d()</function> is
new in the second profile, and consequently marked with
<literal>+++</literal>.
</para>
<para>
When comparing profiles between different binaries, it should be clear
that functions can change in terms of VMA and size. To avoid this
problem, <command>opreport</command> considers a symbol to be the same
if the symbol name, image name, and owning application name all match;
any other factors are ignored. Note that the check for application name
means that trying to compare library profiles between two different
applications will not work as you might expect: each symbol will be
considered different.
</para>

</sect2> <!-- opreport-diff -->

<sect2 id="opreport-anon">
<title>Anonymous executable mappings</title>
<para>
Many applications, typically ones involving dynamic compilation into
machine code (just-in-time, or "JIT", compilation), have executable mappings that
are not backed by an ELF file. <command>opreport</command> has basic support for showing the
samples taken in these regions; for example:
<screen>
$ opreport /usr/bin/mono -l
CPU: ppc64 POWER5, speed 1654.34 MHz (estimated)
Counted CYCLES events (Processor Cycles using continuous sampling) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name                                     symbol name
47       58.7500  mono                                           (no symbols)
14       17.5000  anon (tgid:3189 range:0xf72aa000-0xf72fa000)   (no symbols)
9        11.2500  anon (tgid:3189 range:0xf6cca000-0xf6dd9000)   (no symbols)
.        .        .                                              .
</screen>
</para>
<para>
Note that, since such mappings are dependent upon individual invocations of
a binary, these mappings are always listed as a dependent image,
even when using <option>--separate=none</option>.
Equally, the results are not affected by the <option>--merge</option>
option.
</para>
<para>
As shown in the <command>opreport</command> output above, OProfile is unable to attribute the samples to any
symbol(s) because there is no ELF file for this code.
Enhanced support for JITed code is now available for some virtual machines;
e.g., the Java Virtual Machine. For details about OProfile output for
JITed code, see <xref linkend="getting-jit-reports" />.
</para>
<para>For more information about JIT support in OProfile, see <xref linkend="jitsupport"/>.
</para>
</sect2> <!-- opreport-anon -->

<sect2 id="opreport-xml">
<title>XML formatted output</title>
<para>
The <option>--xml</option> option can be used to generate XML instead of the usual
text format.
This allows opreport to eliminate some of the constraints
dictated by the two-dimensional text format. For example, it is possible
to separate the sample data across multiple events, CPUs and threads. The XML
schema implemented by opreport is found in <filename>doc/opreport.xsd</filename>. It contains
more detailed comments about the structure of the XML generated by opreport.
</para>
<para>
Since XML is consumed by a client program rather than a user, its structure
is fairly static. In particular, the <option>--sort</option> option is incompatible with the
<option>--xml</option> option. Percentages are not displayed in the XML, so the options related
to percentages will have no effect. Full pathnames are always displayed in
the XML, so <option>--long-filenames</option> is not necessary. The <option>--details</option> option will cause
all of the individual sample data to be included in the XML as well as the
instruction byte stream for each symbol (for doing disassembly), and can result
in very large XML files.
</para>
</sect2> <!-- opreport-xml -->

<sect2 id="opreport-options">
<title>Options for <command>opreport</command></title>

<variablelist>
<varlistentry><term><option>--accumulated / -a</option></term><listitem><para>
Accumulate sample and percentage counts in the symbol list.
</para></listitem></varlistentry>
<varlistentry><term><option>--callgraph / -c</option></term><listitem><para>
Show callgraph information.
</para></listitem></varlistentry>
<varlistentry><term><option>--debug-info / -g</option></term><listitem><para>
Show source file and line for each symbol.
</para></listitem></varlistentry>
<varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para>
none: no demangling. normal: use the default demangler (default). smart: use
pattern matching to make C++ symbol demangling more readable.
</para></listitem></varlistentry>
<varlistentry><term><option>--details / -d</option></term><listitem><para>
Show per-instruction details for all selected symbols. Note that, for
binaries without symbol information, the VMA values shown are raw file
offsets for the image binary.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
Do not include application-specific images for libraries, kernel modules
and the kernel. This option only makes sense if the profile session
used <option>--separate</option>.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para>
Exclude all the symbols in the given comma-separated list.
</para></listitem></varlistentry>
<varlistentry><term><option>--global-percent / -%</option></term><listitem><para>
Make all percentages relative to the whole profile.
</para></listitem></varlistentry>
<varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
Show help message.
</para></listitem></varlistentry>
<varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
Comma-separated list of additional paths to search for binaries.
This is needed to find modules in kernels 2.6 and upwards.
</para></listitem></varlistentry>
<varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
A path to a filesystem to search for additional binaries.
</para></listitem></varlistentry>
<varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para>
Only include symbols in the given comma-separated list.
</para></listitem></varlistentry>
<varlistentry><term><option>--long-filenames / -f</option></term><listitem><para>
Output full paths instead of basenames.
</para></listitem></varlistentry>
<varlistentry><term><option>--merge / -m [lib,cpu,tid,tgid,unitmask,all]</option></term><listitem><para>
Merge any profiles separated in a <option>--separate</option> session.
</para></listitem></varlistentry>
<varlistentry><term><option>--no-header</option></term><listitem><para>
Don't output a header detailing profiling parameters.
</para></listitem></varlistentry>
<varlistentry><term><option>--output-file / -o [file]</option></term><listitem><para>
Output to the given file instead of stdout.
</para></listitem></varlistentry>
<varlistentry><term><option>--reverse-sort / -r</option></term><listitem><para>
Reverse the sort from the default.
</para></listitem></varlistentry>
<varlistentry><term><option>--session-dir=</option>dir_path</term><listitem><para>
Use the sample database in the directory <filename>dir_path</filename>
instead of the default location (<filename>/var/lib/oprofile</filename>).
</para></listitem></varlistentry>
<varlistentry><term><option>--show-address / -w</option></term><listitem><para>
Show the VMA address of each symbol (off by default).
</para></listitem></varlistentry>
<varlistentry><term><option>--sort / -s [vma,sample,symbol,debug,image]</option></term><listitem><para>
Sort the list of symbols by, respectively, symbol address,
number of samples, symbol name, debug filename and line number,
binary image filename.
</para></listitem></varlistentry>
<varlistentry><term><option>--symbols / -l</option></term><listitem><para>
List per-symbol information instead of a binary image summary.
</para></listitem></varlistentry>
<varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
Only output data for symbols that have more than the given percentage
of total samples.
</para></listitem></varlistentry>
<varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
Give verbose debugging output.
</para></listitem></varlistentry>
<varlistentry><term><option>--version / -v</option></term><listitem><para>
Show version.
</para></listitem></varlistentry>
<varlistentry><term><option>--xml / -X</option></term><listitem><para>
Generate XML output.
</para></listitem></varlistentry>
</variablelist>

</sect2>

</sect1> <!-- opreport -->

<sect1 id="opannotate">
<title>Outputting annotated source (<command>opannotate</command>)</title>
<para>
The <command>opannotate</command> utility generates annotated source files or assembly listings, optionally
mixed with source.
If you want to see the source file, the profiled application needs to have debug information, and the source
must be available through this debug information. For GCC, you must use the <option>-g</option> option
when you are compiling.
If the binary doesn't contain sufficient debug information, you can still
use <command>opannotate <option>--assembly</option></command> to get annotated assembly.
</para>
<para>
Note that, for the reasons explained in <xref linkend="hardware-counters" />, the results can be
inaccurate. The debug information itself can add other problems; for example, the line number for a symbol can be
incorrect. Assembly instructions can be re-ordered and moved by the compiler, and this can lead to
crediting source lines with samples not really "owned" by that line. Also see
<xref linkend="interpreting" />.
</para>
<para>
You can output the annotation to a single file, containing all the source found, using the
<option>--source</option> option. You can use this in conjunction with <option>--assembly</option>
to get combined source/assembly output.
</para>
<para>
You can also output a directory of annotated source files that maintains the structure of
the original sources. Each line in the annotated source is prepended with the samples
for that line. Additionally, each symbol is annotated giving details for the symbol
as a whole. An example:
</para>
<screen>
$ opannotate --source --output-dir=annotated /usr/local/oprofile-pp/bin/oprofiled
$ ls annotated/home/moz/src/oprofile-pp/daemon/
opd_cookie.h  opd_image.c  opd_kernel.c  opd_sample_files.c  oprofiled.c
</screen>
<para>
Line numbers are maintained in the source files, but each file has
a footer appended describing the profiling details. The actual annotation
looks something like this:
</para>
<screen>
...
               :static uint64_t pop_buffer_value(struct transient * trans)
 11510  1.9661 :{ /* pop_buffer_value total:  89901 15.3566 */
               :        uint64_t val;
               :
 10227  1.7469 :        if (!trans->remaining) {
               :                fprintf(stderr, "BUG: popping empty buffer !\n");
               :                exit(EXIT_FAILURE);
               :        }
               :
               :        val = get_buffer_value(trans->buffer, 0);
  2281  0.3896 :        trans->remaining--;
  2296  0.3922 :        trans->buffer += kernel_pointer_size;
               :        return val;
 10454  1.7857 :}
...
</screen>

<para>
The first number on each line is the number of samples, whilst the second is
the relative percentage of total samples.
</para>

<sect2 id="opannotate-finding-source">
<title>Locating source files</title>
<para>
Of course, <command>opannotate</command> needs to be able to locate the source files
for the binary image(s) in order to produce output. Some binary images have debug
information where the given source file paths are relative, not absolute.
You can
specify search paths to look for these files (similar to <command>gdb</command>'s
<option>dir</option> command) with the <option>--search-dirs</option> option.
</para>
<para>
Sometimes you may have a binary image which gives absolute paths for the source files,
but you have the actual sources elsewhere (commonly, you've installed an SRPM for
a binary on your system and you want annotation from an existing profile). You can
use the <option>--base-dirs</option> option to redirect OProfile to look somewhere
else for source files. For example, suppose you have a binary generated from a source
file that is given in the debug information as <filename>/tmp/build/libfoo/foo.c</filename>,
and you have the source tree matching that binary installed in <filename>/home/user/libfoo/</filename>.
You can redirect OProfile to find <filename>foo.c</filename> correctly like this:
</para>
<screen>
$ opannotate --source --base-dirs=/tmp/build/libfoo/ --search-dirs=/home/user/libfoo/ --output-dir=annotated/ /lib/libfoo.so
</screen>
<para>
You can specify multiple (comma-separated) paths to both options.
</para>
</sect2>

<sect2 id="opannotate-details">
<title>Usage of <command>opannotate</command></title>

<variablelist>
<varlistentry><term><option>--assembly / -a</option></term><listitem><para>
Output annotated assembly. If this is combined with --source, then mixed
source / assembly annotations are output.
</para></listitem></varlistentry>
<varlistentry><term><option>--base-dirs / -b [paths]</option></term><listitem><para>
Comma-separated list of path prefixes. This can be used to point OProfile to a
different location for source files when the debug information specifies an
absolute path on your system for the source that does not exist.
The prefix
is stripped from the source file paths in the debug information, and the remainder
is then looked up in the directories specified by <option>--search-dirs</option>.
</para></listitem></varlistentry>
<varlistentry><term><option>--demangle / -D none|normal|smart</option></term><listitem><para>
none: no demangling. normal: use the default demangler (default). smart: use
pattern-matching to make C++ symbol demangling more readable.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
Do not include application-specific images for libraries, kernel modules
and the kernel. This option only makes sense if the profile session
used --separate.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-file [files]</option></term><listitem><para>
Exclude all files in the given comma-separated list of glob patterns.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-symbols / -e [symbols]</option></term><listitem><para>
Exclude all the symbols in the given comma-separated list.
</para></listitem></varlistentry>
<varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
Show help message.
</para></listitem></varlistentry>
<varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
Comma-separated list of additional paths to search for binaries.
This is needed to find modules in kernels 2.6 and upwards.
</para></listitem></varlistentry>
<varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
A path to a filesystem to search for additional binaries.
</para></listitem></varlistentry>
<varlistentry><term><option>--include-file [files]</option></term><listitem><para>
Only include files in the given comma-separated list of glob patterns.
</para></listitem></varlistentry>
<varlistentry><term><option>--include-symbols / -i [symbols]</option></term><listitem><para>
Only include symbols in the given comma-separated list.
</para></listitem></varlistentry>
<varlistentry><term><option>--objdump-params [params]</option></term><listitem><para>
Pass the given parameters as extra values when calling objdump.
</para></listitem></varlistentry>
<varlistentry><term><option>--output-dir / -o [dir]</option></term><listitem><para>
Output directory. This makes opannotate output one annotated file for each
source file. This option can't be used in conjunction with --assembly.
</para></listitem></varlistentry>
<varlistentry><term><option>--search-dirs / -d [paths]</option></term><listitem><para>
Comma-separated list of paths to search for source files. This is useful to find
source files when the debug information only contains relative paths.
</para></listitem></varlistentry>
<varlistentry><term><option>--source / -s</option></term><listitem><para>
Output annotated source. This requires debugging information to be available
for the binaries.
</para></listitem></varlistentry>
<varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
Only output data for symbols that have more than the given percentage
of total samples.
</para></listitem></varlistentry>
<varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
Give verbose debugging output.
</para></listitem></varlistentry>
<varlistentry><term><option>--version / -v</option></term><listitem><para>
Show version.
</para></listitem></varlistentry>
</variablelist>


</sect2> <!-- opannotate-details -->

</sect1> <!-- opannotate -->

<sect1 id="getting-jit-reports">
<title>OProfile results with JIT samples</title>
<para>
After profiling a Java (or other supported VM) application, the command
<screen><command>opcontrol --dump</command></screen>
flushes the sample buffers and creates ELF binaries from the
intermediate files that were written by the agent library.
The ELF binaries are named <filename>&lt;tgid&gt;.jo</filename>.
With the symbol information stored in these ELF files, it is
possible to map samples to the appropriate symbols.
</para>
<para>
The usual analysis tools (<command>opreport</command> and/or
<command>opannotate</command>) can now be used
to get symbols and assembly code for the instrumented VM processes.
</para>
<para>
Below is an example of a profile report of a Java application that has been
instrumented with the provided agent library.
<screen>
$ opreport -l /usr/lib/jvm/jre-1.5.0-ibm/bin/java
CPU: Core Solo / Duo, speed 2167 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
186020   50.0523  no-vmlinux               no-vmlinux               (no symbols)
34333     9.2380  7635.jo                  java                     void test.f1()
19022     5.1182  libc-2.5.so              libc-2.5.so              _IO_file_xsputn@@GLIBC_2.1
18762     5.0483  libc-2.5.so              libc-2.5.so              vfprintf
16408     4.4149  7635.jo                  java                     void test$HelloThread.run()
16250     4.3724  7635.jo                  java                     void test$test_1.f2(int)
15303     4.1176  7635.jo                  java                     void test.f2(int, int)
13252     3.5657  7635.jo                  java                     void test.f2(int)
5165      1.3897  7635.jo                  java                     void test.f4()
955       0.2570  7635.jo                  java                     void test$HelloThread.run()~

</screen>
</para>
<note><para>
Depending on the JVM that is used, certain options of opreport and opannotate
do NOT work, since they rely on debug information (e.g. source code line numbers)
that is not always available. The Sun JVM does provide the necessary debug
information via the JVMTI or JVMPI interface,
but other JVMs do not.
</para></note>
<para>
As you can see in the opreport output, the JIT support agent for Java
generates symbol names that include the class name and method signature.
A symbol with the suffix ~&lt;n&gt; (e.g.
<code>void test$HelloThread.run()~1</code>) means that this is
the &lt;n&gt;th occurrence of the identical name. This happens if a method is re-JITed.
A symbol with the suffix %&lt;n&gt; means that the address space of this symbol
was reused during the sample session (see <xref linkend="overlapping-symbols" />).
The value &lt;n&gt; is the percentage of time that this symbol/code was present in
relation to the total lifetime of all the other symbols overlapping it. A symbol of the form
<code>&lt;return_val&gt; &lt;class_name&gt;$&lt;method_sig&gt;</code> denotes a method of an
inner class.
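The suffix convention above can be illustrated with a short C sketch. The helper
<function>split_jit_suffix()</function> below is hypothetical and is not part of
OProfile or its agent libraries; it assumes that a suffix is always a single
<literal>~</literal> or <literal>%</literal> character followed only by decimal
digits at the very end of the symbol name, and it merely separates such a suffix
from the base name:
<screen><![CDATA[
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of OProfile: split a JIT symbol name such as
 * "void test$HelloThread.run()~1" into a base name and an optional suffix.
 * Assumes a suffix is a single '~' (re-JITed occurrence) or '%' (overlap
 * percentage) followed only by decimal digits at the end of the name.
 *
 * Returns the suffix character ('~' or '%'), or 0 if there is no suffix.
 * *base_len receives the length of the base name; *value receives the
 * number after the suffix character (0 if there is no suffix).
 */
int split_jit_suffix(const char *sym, size_t *base_len, long *value)
{
	size_t len = strlen(sym);
	size_t i = len;

	/* Walk backwards over any trailing decimal digits. */
	while (i > 0 && isdigit((unsigned char)sym[i - 1]))
		i--;

	/* A suffix needs at least one digit, directly preceded by '~' or '%'. */
	if (i == len || i == 0 || (sym[i - 1] != '~' && sym[i - 1] != '%')) {
		*base_len = len;
		*value = 0;
		return 0;
	}

	*base_len = i - 1;
	*value = strtol(sym + i, NULL, 10);
	return sym[i - 1];
}
]]></screen>
Under these assumptions, <literal>void test$HelloThread.run()~1</literal> splits
into the base name <literal>void test$HelloThread.run()</literal>, suffix
<literal>~</literal> and value 1, while <literal>void test.f1()</literal> has no suffix.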
</para>
</sect1>

<sect1 id="opgprof">
<title><command>gprof</command>-compatible output (<command>opgprof</command>)</title>
<para>
If you're familiar with the output produced by <command>GNU gprof</command>,
you may find <command>opgprof</command> useful. It takes a single binary
as an argument, and produces a <filename>gmon.out</filename> file for use
with <command>gprof -p</command>. If call-graph profiling is enabled,
the call-graph information is also included.
</para>
<screen>
$ opgprof `which oprofiled` # generates gmon.out file
$ gprof -p `which oprofiled` | head
Flat profile:

Each sample counts as 1 samples.
  %   cumulative   self              self     total
 time   samples   samples    calls  T1/call  T1/call  name
 33.13 206237.00 206237.00                             odb_insert
 22.67 347386.00 141149.00                             pop_buffer_value
  9.56 406881.00  59495.00                             opd_put_sample
  7.34 452599.00  45718.00                             opd_find_image
  7.19 497327.00  44728.00                             opd_process_samples
</screen>

<sect2 id="opgprof-details">
<title>Usage of <command>opgprof</command></title>

<variablelist>
<varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
Show help message.
</para></listitem></varlistentry>
<varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
Comma-separated list of additional paths to search for binaries.
This is needed to find modules in kernels 2.6 and upwards.
</para></listitem></varlistentry>
<varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
A path to a filesystem to search for additional binaries.
</para></listitem></varlistentry>
<varlistentry><term><option>--output-filename / -o [file]</option></term><listitem><para>
Output to the given file instead of the default, <filename>gmon.out</filename>.
</para></listitem></varlistentry>
<varlistentry><term><option>--threshold / -t [percentage]</option></term><listitem><para>
Only output data for symbols that have more than the given percentage
of total samples.
</para></listitem></varlistentry>
<varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
Give verbose debugging output.
</para></listitem></varlistentry>
<varlistentry><term><option>--version / -v</option></term><listitem><para>
Show version.
</para></listitem></varlistentry>
</variablelist>

</sect2> <!-- opgprof-details -->

</sect1> <!-- opgprof -->

<sect1 id="oparchive">
<title>Archiving measurements (<command>oparchive</command>)</title>
<para>
The <command>oparchive</command> utility generates a directory populated
with executable, debug, and oprofile sample files. This directory can be
moved to another machine via <command>tar</command> and analyzed without
further use of the data collection machine.
</para>

<para>
The following command would collect the sample files, the executables
associated with the sample files, and the debuginfo files associated
with the executables, and copy them into
<filename>/tmp/current_data</filename>:
</para>

<screen>
# oparchive -o /tmp/current_data
</screen>

<sect2 id="oparchive-details">
<title>Usage of <command>oparchive</command></title>

<variablelist>
<varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
Show help message.
</para></listitem></varlistentry>
<varlistentry><term><option>--exclude-dependent / -x</option></term><listitem><para>
Do not include application-specific images for libraries, kernel modules
and the kernel. This option only makes sense if the profile session
used --separate.
</para></listitem></varlistentry>
<varlistentry><term><option>--image-path / -p [paths]</option></term><listitem><para>
Comma-separated list of additional paths to search for binaries.
This is needed to find modules in kernels 2.6 and upwards.
</para></listitem></varlistentry>
<varlistentry><term><option>--root / -R [path]</option></term><listitem><para>
A path to a filesystem to search for additional binaries.
</para></listitem></varlistentry>
<varlistentry><term><option>--output-directory / -o [directory]</option></term><listitem><para>
Output to the given directory. There is no default. This must be specified.
</para></listitem></varlistentry>
<varlistentry><term><option>--list-files / -l</option></term><listitem><para>
Only list the files that would be archived, don't copy them.
</para></listitem></varlistentry>
<varlistentry><term><option>--verbose / -V [options]</option></term><listitem><para>
Give verbose debugging output.
</para></listitem></varlistentry>
<varlistentry><term><option>--version / -v</option></term><listitem><para>
Show version.
</para></listitem></varlistentry>
</variablelist>

</sect2> <!-- oparchive-details -->

</sect1> <!-- oparchive -->

<sect1 id="opimport">
<title>Converting sample database files (<command>opimport</command>)</title>
<para>
This utility converts sample database files from a foreign binary format (abi) to
the native format. This is useful only when moving sample files between hosts,
for analysis on platforms other than the one used for collection. The abi format
of the file to be imported is described in a text file located in <filename>$SESSION_DIR/abi</filename>.
</para>

<para>
The following command would convert the input sample files to the
output sample files, using the given abi file as a binary description
of the input file and the current platform abi as a binary description
of the output file.
</para>

<screen>
# opimport -a /var/lib/oprofile/abi -o /tmp/current/.../GLOBAL_POWER_EVENTS.200000.1.all.all.all /var/lib/.../mprime/GLOBAL_POWER_EVENTS.200000.1.all.all.all
</screen>

<sect2 id="opimport-details">
<title>Usage of <command>opimport</command></title>

<variablelist>
<varlistentry><term><option>--help / -? / --usage</option></term><listitem><para>
Show help message.
</para></listitem></varlistentry>
<varlistentry><term><option>--abi / -a [filename]</option></term><listitem><para>
Input abi file description location.
</para></listitem></varlistentry>
<varlistentry><term><option>--force / -f</option></term><listitem><para>
Force conversion even if the input and output abi are identical.
</para></listitem></varlistentry>
<varlistentry><term><option>--output / -o [filename]</option></term><listitem><para>
Specify the output filename. If the output file already exists, the file is
not overwritten; instead, the data are accumulated into it. Sample filenames carry
information used by the post-profiling tools and must be kept identical; in other words,
the pathname from the first path component containing a '{' must be kept as is in the
output filename.
</para></listitem></varlistentry>
<varlistentry><term><option>--verbose / -V</option></term><listitem><para>
Give verbose debugging output.
</para></listitem></varlistentry>
<varlistentry><term><option>--version / -v</option></term><listitem><para>
Show version.
</para></listitem></varlistentry>
</variablelist>

</sect2> <!-- opimport-details -->

</sect1> <!-- opimport -->

</chapter>

<chapter id="interpreting">
<title>Interpreting profiling results</title>
<para>
The standard caveats of profiling apply in interpreting the results from OProfile:
profile realistic situations, profile different scenarios, profile
for as long a time as possible, avoid system-specific artifacts, don't trust
the profile data too much. Also bear in mind the comments on the performance
counters above: you <emphasis>cannot</emphasis> rely on totally accurate
instruction-level profiling. However, in almost all circumstances the data
can be useful. Ideally a utility such as Intel's VTUNE would be available to
allow careful instruction-level analysis; go hassle Intel for this, not me ;)
</para>
<sect1 id="irq-latency">
<title>Profiling interrupt latency</title>
<para>
This is an example of how the latency of delivery of profiling interrupts
can impact the reliability of the profiling data. This is pretty much a
worst-case scenario: these problems are fairly rare.
</para>
<screen>
double fun(double a, double b, double c)
{
        double result = 0;
        for (int i = 0 ; i &lt; 10000; ++i) {
                result += a;
                result *= b;
                result /= c;
        }
        return result;
}
</screen>
<para>
Here the last instruction of the loop is very costly, and you would expect the profile
to reflect that, but (showing only the instructions inside the loop):
</para>
<screen>
$ opannotate -a -t 10 ./a.out

     88 15.38% : 8048337:       fadd   %st(3),%st
     48 8.391% : 8048339:       fmul   %st(2),%st
     68 11.88% : 804833b:       fdiv   %st(1),%st
    368 64.33% : 804833d:       inc    %eax
               : 804833e:       cmp    $0x270f,%eax
               : 8048343:       jle    8048337
</screen>
<para>
The problem comes from the x86 hardware: when the counter overflows, the IRQ
is asserted, but several hardware features can delay delivery of the NMI.
x86 hardware is synchronous (i.e. it cannot interrupt in the middle of an instruction);
there is also a latency between the IRQ being asserted and the interrupt being taken;
and the multiple execution units and the out-of-order model of modern x86 CPUs
add further problems. This is the same function, with source annotation:
</para>
<screen>
$ opannotate -s -t 10 ./a.out

               :double fun(double a, double b, double c)
               :{ /* _Z3funddd total:    572 100.0% */
               :        double result = 0;
    368 64.33% :        for (int i = 0 ; i &lt; 10000; ++i) {
     88 15.38% :                result += a;
     48 8.391% :                result *= b;
     68 11.88% :                result /= c;
               :        }
               :        return result;
               :}
</screen>
<para>
The conclusion: don't trust samples coming at the end of a loop,
particularly if the last instruction generated by the compiler is costly. This
case can also occur for branches. Always bear in mind that samples
can be delayed by a few cycles from their real position. That's a hardware
problem and OProfile can do nothing about it.
</para>
</sect1>
<sect1 id="kernel-profiling">
<title>Kernel profiling</title>
<sect2 id="irq-masking">
<title>Interrupt masking</title>
<para>
OProfile uses non-maskable interrupts (NMI) on the P6 generation, Pentium 4,
Athlon, Opteron, Phenom, and Turion processors. These interrupts can occur even in sections of the
Linux kernel where interrupts are disabled, allowing collection of samples in virtually
all executable code. The RTC, timer interrupt mode, and Itanium 2 collection mechanisms
use maskable interrupts. Thus, these data collection mechanisms have "sample
shadows", or blind spots: regions where no samples will be collected. Typically, the samples
will be attributed to the code immediately after the interrupts are re-enabled.
</para>
</sect2>
<sect2 id="idle">
<title>Idle time</title>
<para>
Your kernel is likely to support halting the processor when a CPU is idle. As
the typical hardware events like <constant>CPU_CLK_UNHALTED</constant> do not
count when the CPU is halted, the kernel profile will not reflect the actual
amount of time spent idle. You can change this behaviour by booting with
the <option>idle=poll</option> option, which uses a different idle routine. This
will appear as <function>poll_idle()</function> in your kernel profile.
</para>
</sect2>
<sect2 id="kernel-modules">
<title>Profiling kernel modules</title>
<para>
OProfile profiles kernel modules by default. However, there are a couple of problems
you may have when trying to get results. First, you may have booted via an initrd;
this means that the actual path for the module binaries cannot be determined automatically.
To get around this, you can use the <option>-p</option> option to the profiling tools
to specify where to look for the kernel modules.
</para>
<para>
In 2.6 kernels, the information on where kernel module binaries are located is no longer available.
This means OProfile needs guiding with the <option>-p</option> option to find your
modules. Normally, you can just use your standard module top-level directory for this.
Note that due to this problem, OProfile cannot check that the modification times match;
it is your responsibility to make sure you do not modify a binary after a profile
has been created.
</para>
<para>
If you have run <command>insmod</command> or <command>modprobe</command> to insert a module
from a particular directory, it is important that you specify this directory with the
<option>-p</option> option first, so that it overrides an older module binary that might
exist in other directories you've specified with <option>-p</option>. It is up to you
to make sure that these values are correct: 2.6 kernels simply do not provide enough
information for OProfile to determine this itself.
</para>
</sect2>
</sect1>

<sect1 id="interpreting-callgraph">
<title>Interpreting call-graph profiles</title>
<para>
Sometimes the results from call-graph profiles may be different from what
you expect to see. The first thing to check is whether the target
binaries were compiled with frame pointers enabled (if the binary was
compiled using <command>gcc</command>'s
<option>-fomit-frame-pointer</option> option, you will not get
meaningful results). Note that as of this writing, the GCC developers
plan to disable frame pointers by default. The Linux kernel is built
without frame pointers by default; there is a configuration option you
can use to turn them on under the "Kernel Hacking" menu.
</para>
<para>
Often you may see a caller of a function that does not actually directly
call the function you're looking at (e.g.
if <function>a()</function>
calls <function>b()</function>, which in turn calls
<function>c()</function>, you may see an entry for
<function>a()->c()</function>). What's actually occurring is that we
are taking samples at the very start (or the very end) of
<function>c()</function>; at these few instructions, we haven't yet
created the new function's frame, so it appears as if
<function>a()</function> is calling directly into
<function>c()</function>. Be careful not to be misled by these
entries.
</para>
<para>
Like the rest of OProfile, call-graph profiling uses a statistical
approach; this means that sometimes a backtrace sample is truncated, or
even partially wrong. Bear this in mind when examining results.
</para>
<!-- FIXME: what do we need here ? -->
</sect1>

<sect1 id="debug-info">
<title>Inaccuracies in annotated source</title>
<sect2 id="effect-of-optimizations">
<title>Side effects of optimizations</title>
<para>
The compiler can introduce some pitfalls in the annotated source output.
The optimizer can move pieces of code in such a manner that two lines of code
are interleaved (instruction scheduling). The debug information generated by the compiler
can also behave strangely. This is especially true for complex expressions, e.g. inside
an if statement:
</para>
<screen>
        if (a &amp;&amp; ..
            b &amp;&amp; ..
            c &amp;&amp;)
</screen>
<para>
Here the problem comes from the position of the line number information. The available debug
information does not give enough detail for the if condition, so all samples are
accumulated at the position of the right brace of the expression. Using
<command>opannotate <option>-a</option></command> can help to show the real
samples at an assembly level.
</para>
</sect2>
<sect2 id="prologues">
<title>Prologues and epilogues</title>
<para>
The compiler generally needs to generate "glue" code across function calls, dependent
on the particular function call conventions used. Additionally other things
need to happen, like stack pointer adjustment for the local variables; this
code is known as the function prologue. Similar code is needed at function return,
and is known as the function epilogue. This will show up in annotations as
samples at the very start and end of a function, where there is no apparent
executable code in the source.
</para>
</sect2>
<sect2 id="inlined-function">
<title>Inlined functions</title>
<para>
You may see that a function is credited with a certain number of samples, but
the listing does not add up to the correct total. To pick a real example:
</para>
<screen>
                 :internal_sk_buff_alloc_security(struct sk_buff *skb)
    353 2.342%   :{ /* internal_sk_buff_alloc_security total:   1882 12.48% */
                 :
                 :        sk_buff_security_t *sksec;
     15 0.0995%  :        int rc = 0;
                 :
     10 0.06633% :        sksec = skb->lsm_security;
    468 3.104%   :        if (sksec &amp;&amp; sksec->magic == DSI_MAGIC) {
                 :                goto out;
                 :        }
                 :
                 :        sksec = (sk_buff_security_t *) get_sk_buff_memory(skb);
      3 0.0199%  :        if (!sksec) {
     38 0.2521%  :                rc = -ENOMEM;
                 :                goto out;
     10 0.06633% :        }
                 :        memset(sksec, 0, sizeof (sk_buff_security_t));
     44 0.2919%  :        sksec->magic = DSI_MAGIC;
     32 0.2123%  :        sksec->skb = skb;
     45 0.2985%  :        sksec->sid = DSI_SID_NORMAL;
     31 0.2056%  :        skb->lsm_security = sksec;
                 :
                 :    out:
                 :
    146 0.9685%  :        return rc;
                 :
     98 0.6501%  :}
</screen>
<para>
Here, the function is credited with 1,882 samples, but the per-line annotations
below add up to only 1,293 samples.
This is usually because of inline functions:
the compiler marks inlined code with debug entries for the inline function's
definition, and this is where <command>opannotate</command> annotates
such samples. In the case above, <function>memset</function> is the most
likely candidate for this problem. Examining the mixed source/assembly
output can help identify such results.
</para>
<para>
This problem is more visible when there is no source file available. In the
following example, the per-symbol totals add up to 125 samples (84 + 21 + 1 + 1 + 18),
while only 109 samples are credited to source lines in this file. The difference
of 16 samples must be attributed to inline functions defined elsewhere.
</para>
<screen>
/*
 * Total samples for file : "arch/i386/kernel/process.c"
 *
 *    109  2.4616
 */

/* default_idle total:     84  1.8970 */
/* cpu_idle total:         21  0.4743 */
/* flush_thread total:      1  0.0226 */
/* prepare_to_copy total:   1  0.0226 */
/* __switch_to total:      18  0.4065 */
</screen>
<para>
The missing samples are not lost: they are credited to the source
location where the inlined function is defined. Samples from all the call sites
of an inlined function are merged in one place in the annotated
source file, so there is no way to see which call site the samples for an
inlined function came from.
</para>
<para>
When running <command>opannotate</command>, you may get a warning
"some functions compiled without debug information may have incorrect source line attributions".
In some rare cases, OProfile is not able to verify that the derived source line
is correct (when some parts of the binary image are compiled without debugging
information). Be wary of results if this warning appears.
</para>
<para>
Furthermore, for some languages the compiler can implicitly generate functions,
such as default copy constructors.
Such functions are labelled by the compiler
as having a line number of 0, which means the source annotation can be confusing.
</para>
<!-- FIXME so what *actually* happens to those samples ? ignored ? -->
</sect2>
<sect2 id="wrong-linenr-info">
<title>Inaccuracy in line number information</title>
<para>
Depending on your compiler, you may run into the following problem:
</para>
<screen>
struct big_object { int a[500]; };

int main()
{
    big_object a, b;
    for (int i = 0 ; i != 1000 * 1000; ++i)
        b = a;
    return 0;
}

</screen>
<para>
Compiled with <command>gcc</command> 3.0.4, the annotated source is clearly inaccurate:
</para>
<screen>
                :int main()
                :{ /* main total: 7871 100% */
                :    big_object a, b;
                :    for (int i = 0 ; i != 1000 * 1000; ++i)
                :        b = a;
 7871      100% :    return 0;
                :}
</screen>
<para>
This problem is distinct from the IRQ latency problem: here the debug line number
information itself is not precise enough. Again, looking at the output of <command>opannotate -as</command> can help.
</para>
<screen>
                :int main()
                :{
                :    big_object a, b;
                :    for (int i = 0 ; i != 1000 * 1000; ++i)
                :  80484c0: push %ebp
                :  80484c1: mov %esp,%ebp
                :  80484c3: sub $0xfac,%esp
                :  80484c9: push %edi
                :  80484ca: push %esi
                :  80484cb: push %ebx
                :        b = a;
                :  80484cc: lea 0xfffff060(%ebp),%edx
                :  80484d2: lea 0xfffff830(%ebp),%eax
                :  80484d8: mov $0xf423f,%ebx
                :  80484dd: lea 0x0(%esi),%esi
                :    return 0;
    3  0.03811% :  80484e0: mov %edx,%edi
                :  80484e2: mov %eax,%esi
    1   0.0127% :  80484e4: cld
    8   0.1016% :  80484e5: mov $0x1f4,%ecx
 7850    99.73% :  80484ea: repz movsl %ds:(%esi),%es:(%edi)
    9   0.1143% :  80484ec: dec %ebx
                :  80484ed: jns 80484e0
                :  80484ef: xor %eax,%eax
                :  80484f1: pop %ebx
                :  80484f2: pop %esi
                :  80484f3: pop %edi
                :  80484f4: leave
                :  80484f5: ret
</screen>
<para>
So here it is clear that the copy loop is correctly credited with all the samples, but the
line number information is misplaced. <command>objdump -dS</command> exposes the
same problem. Note that maintaining accurate debug information for optimized code is difficult for compilers, so this problem is not surprising.
The accuracy of debug information
also depends on the binutils version used; some BFD library versions
contain a work-around for known problems of <command>gcc</command>, others do not. This is unfortunate, but we must live with it,
since profiling is pointless when you disable optimisation (which would give better debugging entries).
</para>
</sect2>
</sect1>
<sect1 id="symbol-without-debug-info">
<title>Assembly functions</title>
<para>
Often the assembler cannot generate debug information automatically.
This means that you cannot get a source report unless
you manually define the necessary debug information; read your assembler documentation for how you might
do that. The only
debugging information OProfile currently needs is the association between line number/filename and VMA. When profiling assembly
without debugging information, you can always get a report for symbols, and optionally for individual VMAs, through <command>opreport -l</command>
or <command>opreport -d</command>, but this works only for symbols with the right attributes.
For <command>gas</command> you can get this with
</para>
<screen>
.globl foo
.type foo,@function
</screen>
<para>
whilst for <command>nasm</command> you must use
</para>
<screen>
GLOBAL foo:function ; [1]
</screen>
<para>
Note that OProfile does not need the global attribute, only the function attribute.
</para>
</sect1>
<!--

FIXME: I commented this bit out until we've written something ...

improve this ? but look first why this file is special
<sect2 id="small-functions">
<title>Small functions</title>
<para>
Very small functions can show strange behavior. The file in your source
directory of OProfile <filename>$SRC/test-oprofile/understanding/puzzle.c</filename>
show such example
</para>
</sect2>
-->

<sect1 id="overlapping-symbols">
<title>Overlapping symbols in JITed code</title>
<para>
Some virtual machines (e.g., Java) may re-JIT a method, so that space previously
allocated for a piece of compiled code is reused. This means that, at a single
code address, multiple symbols/methods may be present during the run time of the application.
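</para>
<para>
As an illustration only, the following C sketch models that situation with a
hypothetical record type, <structname>jit_sym</structname>, holding each method's code
address range and the time interval during which it was loaded there. When several
records cover one address, <function>pick_owner</function> chooses the one that stayed
resident the longest, in the spirit of the "present most of the time" rule described
below. Measuring "most of the time" by load-to-unload duration is an assumption of this
sketch, not a description of OProfile's internals, and the truncation of the other
symbols is not modelled.
</para>
<programlisting>
/* Hypothetical record (not an OProfile data structure): one JIT-compiled
   method whose code occupied addresses [start, end) while it was loaded,
   from load_time until unload_time (unload_time must be >= load_time). */
struct jit_sym {
    const char *name;
    unsigned long start, end;
    unsigned long load_time, unload_time;
};

/* Returns the index of the record covering addr that stayed loaded the
   longest, or -1 if no record covers addr.  On a tie, the earlier record
   in the array wins. */
long pick_owner(const struct jit_sym *syms, unsigned long n, unsigned long addr)
{
    long best = -1;
    unsigned long best_life = 0;

    for (unsigned long i = 0; i != n; ++i) {
        const struct jit_sym *s = syms + i;
        unsigned long life;

        if (s->start > addr || addr >= s->end)
            continue;               /* this record does not cover addr */
        life = s->unload_time - s->load_time;
        if (best == -1 || life > best_life) {
            best = (long)i;
            best_life = life;
        }
    }
    return best;
}
</programlisting>
<para>
For example, if <literal>foo</literal> occupied 0x1000-0x10ff for 10 time units before
being discarded and <literal>bar</literal> was then JITed into 0x1000-0x107f for 90 units,
address 0x1040 is attributed to <literal>bar</literal>, while 0x10a0, covered only by
<literal>foo</literal>, stays with <literal>foo</literal>.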
</para>
<para>
Since OProfile samples are buffered and don't have timing information, there is no way
to correlate samples with the (possibly) varying address ranges in which the code for a symbol
may reside.
An alternative would be flushing the OProfile sampling buffer when we get an unload event,
but this could result in high overhead.
</para>
<para>
To mitigate the problem of overlapping symbols, OProfile tries to select the symbol that was
present at this address range most of the time. Additionally, other overlapping symbols
are truncated in the overlapping area.
This gives reasonable results, because in practice address reuse typically takes place
during phase changes of the application, in particular during application startup.
Thus, for optimum profiling results, start the sampling session after application startup
and burn-in.
</para>
</sect1>

<sect1 id="hidden-cost">
<title>Other discrepancies</title>
<para>
Another cause of apparent problems is the hidden cost of instructions. A very
common example is two memory reads, one served from the L1 cache and the other from main memory:
the second read is likely to have more samples.
There are many other causes of hidden instruction cost. A non-exhaustive
list: mispredicted branches, TLB misses, partial register stalls,
partial register dependencies, memory mismatch stalls, and re-executed ops. If you want to write
programs at the assembly level, be sure to take a look at the Intel and
AMD documentation at <ulink url="http://developer.intel.com/">http://developer.intel.com/</ulink>
and <ulink url="http://developer.amd.com/devguides.jsp">http://developer.amd.com/devguides.jsp</ulink>.
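</para>
<para>
A minimal C sketch of the L1-versus-memory case, written for illustration (the function
names are hypothetical): both functions below perform one load and one addition per array
element, so they execute a similar number of instructions. The strided walk, however, jumps
<literal>stride</literal> elements at a time; assuming 4-byte <type>int</type> and 64-byte
cache lines, a stride of 16 puts consecutive accesses on different cache lines, so for arrays
much larger than the cache many of its loads are likely to miss, unless hardware prefetching
hides the latency. A profile will then typically credit more samples to the load in
<function>sum_strided</function> than to the one in <function>sum_sequential</function>.
</para>
<programlisting>
/* Sums a[0..n-1] front to back: consecutive loads share cache lines. */
long sum_sequential(const int *a, unsigned long n)
{
    long sum = 0;
    for (unsigned long i = 0; i != n; ++i)
        sum += a[i];
    return sum;
}

/* Sums the same elements, visiting them stride apart.  Each index i is
   reached exactly once, from start == i % stride; stride must be >= 1. */
long sum_strided(const int *a, unsigned long n, unsigned long stride)
{
    long sum = 0;
    for (unsigned long start = 0; start != stride; ++start)
        for (unsigned long i = start; n > i; i += stride)
            sum += a[i];
    return sum;
}
</programlisting>
<para>
Both functions return the same sum for any stride of at least 1; only the memory access
pattern, and therefore the hidden cost of each load, differs.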
</para>
</sect1>
</chapter>


<chapter id="ack">
<title>Acknowledgments</title>
<para>
Thanks to (in no particular order): Arjan van de Ven, Rik van Riel, Juan Quintela, Philippe Elie,
Phillipp Rumpf, Tigran Aivazian, Alex Brown, Alisdair Rawsthorne, Bob Montgomery, Ray Bryant, H.J. Lu,
Jeff Esper, Will Cohen, Graydon Hoare, Cliff Woolley, Alex Tsariounov, Al Stone, Jason Yeh,
Randolph Chung, Anton Blanchard, Richard Henderson, Andries Brouwer, Bryan Rittmeyer,
Maynard P. Johnson,
Richard Reich (rreich@rdrtech.com), Zwane Mwaikambo, Dave Jones, Charles Filtness; and finally Pulp, for "Intro".
</para>
</chapter>

</book>