      1 <?xml version="1.0" encoding='ISO-8859-1'?>
      2 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
      3 
      4 <book id="oprofile-internals">
      5 <bookinfo>
      6 	<title>OProfile Internals</title>
      7  
      8 	<authorgroup>
      9 		<author>
     10 			<firstname>John</firstname>
     11 			<surname>Levon</surname>
     12 			<affiliation>
     13 				<address><email>levon (a] movementarian.org</email></address>
     14 			</affiliation>
     15 		</author>
     16 	</authorgroup>
     17 
     18 	<copyright>
     19 		<year>2003</year>
     20 		<holder>John Levon</holder>
     21 	</copyright>
     22 </bookinfo>
     23 
     24 <toc></toc>
     25 
     26 <chapter id="introduction">
     27 <title>Introduction</title>
     28 
     29 <para>
     30 This document is current for OProfile version <oprofileversion />.
     31 This document provides some details on the internal workings of OProfile for the
interested hacker. It assumes a strong knowledge of C, a working knowledge of C++, and some
knowledge of kernel internals and CPU hardware.
     34 </para>
     35 <note>
     36 <para>
Only the "new" implementation associated with kernel 2.6 and above is covered here. Kernel 2.4
uses a very different kernel module implementation and daemon to produce the sample files.
     39 </para>
     40 </note>
     41 
     42 <sect1 id="overview">
     43 <title>Overview</title>
     44 <para>
     45 OProfile is a statistical continuous profiler. In other words, profiles are generated by
     46 regularly sampling the current registers on each CPU (from an interrupt handler, the
     47 saved PC value at the time of interrupt is stored), and converting that runtime PC
     48 value into something meaningful to the programmer.
     49 </para>
     50 <para>
     51 OProfile achieves this by taking the stream of sampled PC values, along with the detail
of which task was running at the time of the interrupt, and converting it into a file offset
     53 against a particular binary file. Because applications <function>mmap()</function>
     54 the code they run (be it <filename>/bin/bash</filename>, <filename>/lib/libfoo.so</filename>
     55 or whatever), it's possible to find the relevant binary file and offset by walking
     56 the task's list of mapped memory areas. Each PC value is thus converted into a tuple
     57 of binary-image,offset. This is something that the userspace tools can use directly
     58 to reconstruct where the code came from, including the particular assembly instructions,
     59 symbol, and source line (via the binary's debug information if present).
     60 </para>
     61 <para>
     62 Regularly sampling the PC value like this approximates what actually was executed and
     63 how often - more often than not, this statistical approximation is good enough to
     64 reflect reality. In common operation, the time between each sample interrupt is regulated
     65 by a fixed number of clock cycles. This implies that the results will reflect where
     66 the CPU is spending the most time; this is obviously a very useful information source
     67 for performance analysis.
     68 </para>
     69 <para>
Sometimes though, an application programmer needs different kinds of information: for example,
"which of the source routines cause the most cache misses?". The rise in importance of
such metrics in recent years has led many CPU manufacturers to provide hardware performance
counters capable of measuring these events at the hardware level. Typically, these counters
increment once per event, and generate an interrupt on reaching some pre-defined
number of events. OProfile can use these interrupts to generate samples: the
profile results are then a statistical approximation of how many occurrences of the
given event each piece of code caused.
     78 </para>
     79 <para>
     80 Consider a simplified system that only executes two functions A and B. A
     81 takes one cycle to execute, whereas B takes 99 cycles. Imagine we run at
     82 100 cycles a second, and we've set the performance counter to create an
     83 interrupt after a set number of "events" (in this case an event is one
     84 clock cycle). It should be clear that the chances of the interrupt
occurring in function A are 1/100, and 99/100 for function B. Thus, we
statistically approximate the actual relative performance features of
the two functions over time. This same analysis works for other types of
events, provided that the interrupt is tied to the number of events
     89 occurring (that is, after N events, an interrupt is generated).
     90 </para>
     91 <para>
There is typically more than one of these counters, so it's possible to set up profiling
     93 for several different event types. Using these counters gives us a powerful, low-overhead
     94 way of gaining performance metrics. If OProfile, or the CPU, does not support performance
     95 counters, then a simpler method is used: the kernel timer interrupt feeds samples
     96 into OProfile itself.
     97 </para>
     98 <para>
     99 The rest of this document concerns itself with how we get from receiving samples at
    100 interrupt time to producing user-readable profile information.
    101 </para>
    102 </sect1>
    103 
    104 <sect1 id="components">
    105 <title>Components of the OProfile system</title>
    106 
    107 <sect2 id="arch-specific-components">
    108 <title>Architecture-specific components</title>
    109 <para>
    110 If OProfile supports the hardware performance counters found on
a particular architecture, the code for setting
up and managing these counters can be found in the kernel source
tree in the relevant <filename>arch/<emphasis>arch</emphasis>/oprofile/</filename>
directory. The architecture-specific implementation works by
filling in the <varname>oprofile_operations</varname> structure at init time. This
    116 provides a set of operations such as <function>setup()</function>,
    117 <function>start()</function>, <function>stop()</function>, etc.
    118 that manage the hardware-specific details of fiddling with the
    119 performance counter registers.
    120 </para>
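<para>
As a rough sketch, the structure filled in by the architecture code looks
something like the following (member names follow 2.6-era kernels and may vary
between kernel versions; see <filename>include/linux/oprofile.h</filename> in
the kernel tree for the authoritative definition):
</para>
<screen>
struct oprofile_operations {
	/* create architecture-specific files in oprofilefs (optional) */
	int (*create_files)(struct super_block * sb, struct dentry * root);
	/* program the counters from the configured values (optional) */
	int (*setup)(void);
	/* undo any counter setup (optional) */
	void (*shutdown)(void);
	/* start and stop counter interrupts */
	int (*start)(void);
	void (*stop)(void);
	/* CPU identification string, e.g. "i386/athlon" */
	char * cpu_type;
};

/* the architecture init code fills in ops and returns 0 on success */
int oprofile_arch_init(struct oprofile_operations * ops);
</screen>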
    121 <para>
    122 The other important facility available to the architecture code is
    123 <function>oprofile_add_sample()</function>.  This is where a particular sample
    124 taken at interrupt time is fed into the generic OProfile driver code.
    125 </para>
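<para>
In 2.6-era kernels the prototype is essentially the following (again, check
<filename>include/linux/oprofile.h</filename> for the authoritative version).
The interrupt handler passes in the saved register set and the number of the
counter that overflowed:
</para>
<screen>
/* log one sample: "regs" gives the interrupted context (and hence the PC
 * and kernel/user mode), "event" is the hardware counter number */
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event);
</screen>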
    126 </sect2>
    127 
    128 <sect2 id="filesystem">
    129 <title>oprofilefs</title>
    130 <para>
    131 OProfile implements a pseudo-filesystem known as "oprofilefs", mounted from
    132 userspace at <filename>/dev/oprofile</filename>. This consists of small
    133 files for reporting and receiving configuration from userspace, as well
    134 as the actual character device that the OProfile userspace receives samples
from. At <function>setup()</function> time, the architecture-specific code may
    136 add further configuration files related to the details of the performance
    137 counters. For example, on x86, one numbered directory for each hardware
    138 performance counter is added, with files in each for the event type,
    139 reset value, etc.
    140 </para>
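<para>
The generic driver exports helpers for creating these files. As a hedged
sketch (the real helpers and their exact signatures are in
<filename>drivers/oprofile/oprofilefs.c</filename>; the <varname>ctr0_*</varname>
variables here are purely illustrative), a <function>create_files()</function>
implementation might do something like:
</para>
<screen>
/* one value file per counter attribute; reads and writes on the file
 * are converted to and from the unsigned long it points at */
int oprofilefs_create_ulong(struct super_block * sb, struct dentry * root,
                            char const * name, unsigned long * val);

static int example_create_files(struct super_block * sb, struct dentry * root)
{
	struct dentry * dir = oprofilefs_mkdir(sb, root, "0");
	oprofilefs_create_ulong(sb, dir, "event", &amp;ctr0_event);
	oprofilefs_create_ulong(sb, dir, "count", &amp;ctr0_count);
	oprofilefs_create_ulong(sb, dir, "enabled", &amp;ctr0_enabled);
	return 0;
}
</screen>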
    141 <para>
    142 The filesystem also contains a <filename>stats</filename> directory with
    143 a number of useful counters for various OProfile events.
    144 </para>
    145 </sect2>
    146 
    147 <sect2 id="driver">
    148 <title>Generic kernel driver</title>
    149 <para>
    150 This lives in <filename>drivers/oprofile/</filename>, and forms the core of
    151 how OProfile works in the kernel. Its job is to take samples delivered
    152 from the architecture-specific code (via <function>oprofile_add_sample()</function>),
and buffer this data, in a transformed form as described later, until it is released
to the userspace daemon via the <filename>/dev/oprofile/buffer</filename>
    155 character device.
    156 </para>
    157 </sect2>
    158 
    159 <sect2 id="daemon">
    160 <title>The OProfile daemon</title>
    161 <para>
    162 The OProfile userspace daemon's job is to take the raw data provided by the
    163 kernel and write it to the disk. It takes the single data stream from the
    164 kernel and logs sample data against a number of sample files (found in
    165 <filename>$SESSION_DIR/samples/current/</filename>, by default located at 
<filename>/var/lib/oprofile/samples/current/</filename>). For the benefit
    167 of the "separate" functionality, the names/paths of these sample files
    168 are mangled to reflect where the samples were from: this can include
    169 thread IDs, the binary file path, the event type used, and more.
    170 </para>
    171 <para>
    172 After this final step from interrupt to disk file, the data is now
    173 persistent (that is, changes in the running of the system do not invalidate
    174 stored data). So the post-profiling tools can run on this data at any
    175 time (assuming the original binary files are still available and unchanged,
    176 naturally).
    177 </para>
    178 </sect2>
    179 
    180 <sect2 id="post-profiling">
    181 <title>Post-profiling tools</title>
<para>
So far, we've collected data, but we've yet to present it in a useful form
to the user. This is the job of the post-profiling tools. In general form,
they collate a subset of the available sample files, load and process each one
correlated against the relevant binary file, and finally produce user-readable
information.
</para>
    187 </sect2>
    188 
    189 </sect1>
    190 
    191 </chapter>
    192 
    193 <chapter id="performance-counters">
    194 <title>Performance counter management</title>
    195 
    196 <sect1 id ="performance-counters-ui">
    197 <title>Providing a user interface</title>
    198 
    199 <para>
    200 The performance counter registers need programming in order to set the
    201 type of event to count, etc. OProfile uses a standard model across all
CPUs for defining these events as follows:
    203 </para>
    204 <informaltable frame="all">
    205 <tgroup cols='2'> 
    206 <tbody>
    207 <row><entry><option>event</option></entry><entry>The event type e.g. DATA_MEM_REFS</entry></row>
    208 <row><entry><option>unit mask</option></entry><entry>The sub-events to count (more detailed specification)</entry></row>
    209 <row><entry><option>counter</option></entry><entry>The hardware counter(s) that can count this event</entry></row>
    210 <row><entry><option>count</option></entry><entry>The reset value (how many events before an interrupt)</entry></row>
    211 <row><entry><option>kernel</option></entry><entry>Whether the counter should increment when in kernel space</entry></row>
    212 <row><entry><option>user</option></entry><entry>Whether the counter should increment when in user space</entry></row>
    213 </tbody>
    214 </tgroup>
    215 </informaltable>
    216 <para>
    217 The term "unit mask" is borrowed from the Intel architectures, and can
    218 further specify exactly when a counter is incremented (for example,
    219 cache-related events can be restricted to particular state transitions
    220 of the cache lines).
    221 </para>
    222 <para>
    223 All of the available hardware events and their details are specified in
    224 the textual files in the <filename>events</filename> directory. The
    225 syntax of these files should be fairly obvious. The user specifies the
    226 names and configuration details of the chosen counters via
    227 <command>opcontrol</command>. These are then written to the kernel
    228 module (in numerical form) via <filename>/dev/oprofile/N/</filename>
    229 where N is the physical hardware counter (some events can only be used
    230 on specific counters; OProfile hides these details from the user when
    231 possible). On IA64, the perfmon-based interface behaves somewhat
    232 differently, as described later.
    233 </para>
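<para>
There is no special ioctl interface here: configuring a counter is just a
matter of writing ASCII numbers into the relevant oprofilefs files. A minimal
illustration follows (this is not how <command>opcontrol</command> is actually
implemented - it is a shell script - but the effect is the same; the file
names follow the x86 layout described above and the event code is
hypothetical):
</para>
<screen>
#include &lt;stdio.h&gt;

static void write_value(char const * file, char const * value)
{
	FILE * fp = fopen(file, "w");
	if (!fp)
		return;
	fprintf(fp, "%s\n", value);
	fclose(fp);
}

int main(void)
{
	write_value("/dev/oprofile/0/event", "60");     /* hypothetical event code */
	write_value("/dev/oprofile/0/count", "100000"); /* reset value */
	write_value("/dev/oprofile/0/kernel", "1");
	write_value("/dev/oprofile/0/user", "1");
	write_value("/dev/oprofile/0/enabled", "1");
	return 0;
}
</screen>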
    234 
    235 </sect1>
    236 
    237 <sect1 id="performance-counters-programming">
    238 <title>Programming the performance counter registers</title>
    239 
    240 <para>
    241 We have described how the user interface fills in the desired
    242 configuration of the counters and transmits the information to the
    243 kernel. It is the job of the <function>-&gt;setup()</function> method
    244 to actually program the performance counter registers. Clearly, the
details of how this is done are architecture-specific; they are also
model-specific on many architectures. For example, i386 provides methods
for each model type that program the counter registers correctly
    248 (see the <filename>op_model_*</filename> files in
    249 <filename>arch/i386/oprofile</filename> for the details). The method
    250 reads the values stored in the virtual oprofilefs files and programs
    251 the registers appropriately, ready for starting the actual profiling
    252 session.
    253 </para>
    254 <para>
    255 The architecture-specific drivers make sure to save the old register
    256 settings before doing OProfile setup. They are restored when OProfile
    257 shuts down. This is useful, for example, on i386, where the NMI watchdog
    258 uses the same performance counter registers as OProfile; they cannot
    259 run concurrently, but OProfile makes sure to restore the setup it found
    260 before it was running.
    261 </para>
    262 <para>
    263 In addition to programming the counter registers themselves, other setup
    264 is often necessary. For example, on i386, the local APIC needs
    265 programming in order to make the counter's overflow interrupt appear as
    266 an NMI (non-maskable interrupt). This allows sampling (and therefore
    267 profiling) of regions where "normal" interrupts are masked, enabling
    268 more reliable profiles.
    269 </para>
    270 
    271 <sect2 id="performance-counters-start">
    272 <title>Starting and stopping the counters</title>
    273 <para>
    274 Initiating a profiling session is done via writing an ASCII '1'
    275 to the file <filename>/dev/oprofile/enable</filename>. This sets up the
    276 core, and calls into the architecture-specific driver to actually
    277 enable each configured counter. Again, the details of how this is
done are model-specific (for example, the Athlon models can disable
    279 or enable on a per-counter basis, unlike the PPro models).
    280 </para>
    281 </sect2>
    282 
    283 <sect2>
    284 <title>IA64 and perfmon</title>
    285 <para>
    286 The IA64 architecture provides a different interface from the other
    287 architectures, using the existing perfmon driver. Register programming
    288 is handled entirely in user-space (see
    289 <filename>daemon/opd_perfmon.c</filename> for the details). A process
    290 is forked for each CPU, which creates a perfmon context and sets the
    291 counter registers appropriately via the
    292 <function>sys_perfmonctl</function> interface. In addition, the actual
    293 initiation and termination of the profiling session is handled via the
    294 same interface using <constant>PFM_START</constant> and
    295 <constant>PFM_STOP</constant>. On IA64, then, there are no oprofilefs
    296 files for the performance counters, as the kernel driver does not
    297 program the registers itself.
    298 </para>
    299 <para>
    300 Instead, the perfmon driver for OProfile simply registers with the
    301 OProfile core with an OProfile-specific UUID. During a profiling
    302 session, the perfmon core calls into the OProfile perfmon driver and
    303 samples are registered with the OProfile core itself as usual (with
    304 <function>oprofile_add_sample()</function>).
    305 </para>
    306 </sect2>
    307 
    308 </sect1>
    309 
    310 </chapter>
    311 
    312 <chapter id="collecting-samples">
    313 <title>Collecting and processing samples</title>
    314 
    315 <sect1 id="receiving-interrupts">
    316 <title>Receiving interrupts</title>
    317 <para>
    318 Naturally, how the overflow interrupts are received is specific
    319 to the hardware architecture, unless we are in "timer" mode, where the
    320 logging routine is called directly from the standard kernel timer
    321 interrupt handler.
    322 </para>
    323 <para>
    324 On the i386 architecture, the local APIC is programmed such that when a
    325 counter overflows (that is, it receives an event that causes an integer
    326 overflow of the register value to zero), an NMI is generated. This calls
    327 into the general handler <function>do_nmi()</function>; because OProfile
    328 has registered itself as capable of handling NMI interrupts, this will
    329 call into the OProfile driver code in
    330 <filename>arch/i386/oprofile</filename>. Here, the saved PC value (the
CPU saves the register set at the time of the interrupt on the stack,
    332 available for inspection) is extracted, and the counters are examined to
    333 find out which one generated the interrupt. Also determined is whether
    334 the system was inside kernel or user space at the time of the interrupt.
    335 These three pieces of information are then forwarded onto the OProfile
    336 core via <function>oprofile_add_sample()</function>. Finally, the
    337 counter values are reset to the chosen count value, to ensure another
    338 interrupt happens after another N events have occurred. Other
    339 architectures behave in a similar manner.
    340 </para>
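<para>
In outline, the model-specific overflow handler does something like the
following. This is a simplified sketch of the flow in
<filename>arch/i386/oprofile</filename>, not the literal code; the helper
names (<varname>counter_msr</varname>, <function>counter_overflowed()</function>,
<function>write_counter_reset()</function>) are illustrative:
</para>
<screen>
/* called from the NMI handler with the saved register set */
static int example_check_ctrs(struct pt_regs * const regs)
{
	unsigned long low, high;
	int i;

	for (i = 0; i &lt; NUM_COUNTERS; ++i) {
		/* read the model-specific counter register */
		rdmsr(counter_msr[i], low, high);
		if (counter_overflowed(low, high)) {
			/* regs carries the PC and kernel/user mode */
			oprofile_add_sample(regs, i);
			/* re-arm: write -(reset count) so the counter
			 * overflows again after another N events */
			write_counter_reset(i);
		}
	}
	return 1;
}
</screen>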
    341 </sect1>
    342  
    343 <sect1 id="core-structure">
    344 <title>Core data structures</title>
    345 <para>
    346 Before considering what happens when we log a sample, we shall digress
    347 for a moment and look at the general structure of the data collection
    348 system.
    349 </para>
    350 <para>
    351 OProfile maintains a small buffer for storing the logged samples for
    352 each CPU on the system. Only this buffer is altered when we actually log
    353 a sample (remember, we may still be in an NMI context, so no locking is
    354 possible). The buffer is managed by a two-handed system; the "head"
    355 iterator dictates where the next sample data should be placed in the
    356 buffer. Of course, overflow of the buffer is possible, in which case
    357 the sample is discarded.
    358 </para>
    359 <para>
    360 It is critical to remember that at this point, the PC value is an
    361 absolute value, and is therefore only meaningful in the context of which
    362 task it was logged against. Thus, these per-CPU buffers also maintain
    363 details of which task each logged sample is for, as described in the
    364 next section. In addition, we store whether the sample was in kernel
    365 space or user space (on some architectures and configurations, the address
    366 space is not sub-divided neatly at a specific PC value, so we must store
    367 this information).
    368 </para>
    369 <para>
    370 As well as these small per-CPU buffers, we have a considerably larger
    371 single buffer. This holds the data that is eventually copied out into
    372 the OProfile daemon. On certain system events, the per-CPU buffers are
    373 processed and entered (in mutated form) into the main buffer, known in
    374 the source as the "event buffer". The "tail" iterator indicates the
point from which the CPU buffer may be read, up to the position of the "head"
    376 iterator. This provides an entirely lock-free method for extracting data
    377 from the CPU buffers. This process is described in detail later in this chapter.
    378 </para>
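<para>
A hedged sketch of the data structures involved (based on the 2.6-era
<filename>drivers/oprofile/cpu_buffer.h</filename>; field names may differ
between kernel versions):
</para>
<screen>
/* one entry in a CPU buffer: the sampled PC and the counter number */
struct op_sample {
	unsigned long eip;
	unsigned long event;
};

/* one of these per CPU */
struct oprofile_cpu_buffer {
	volatile unsigned long head_pos;    /* written by the interrupt handler */
	volatile unsigned long tail_pos;    /* advanced by the sync code */
	unsigned long buffer_size;
	struct task_struct * last_task;     /* for detecting task switches */
	int last_is_kernel;                 /* for detecting kernel/user switches */
	struct op_sample * buffer;
	unsigned long sample_lost_overflow; /* statistics */
};
</screen>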
    379 <figure><title>The OProfile buffers</title>
    380 <graphic fileref="buffers.png" />
    381 </figure>
    382 </sect1>
    383 
    384 <sect1 id="logging-sample">
    385 <title>Logging a sample</title>
    386 <para>
    387 As mentioned, the sample is logged into the buffer specific to the
    388 current CPU. The CPU buffer is a simple array of pairs of unsigned long
    389 values; for a sample, they hold the PC value and the counter for the
sample. (The counter number is later used to translate back into the relevant
event type the counter was programmed to count.)
    392 </para>
    393 <para>
    394 In addition to logging the sample itself, we also log task switches.
    395 This is simply done by storing the address of the last task to log a
    396 sample on that CPU in a data structure, and writing a task switch entry
into the buffer if <function>current()</function> has changed since the last
sample. Note that later we will directly dereference this pointer;
    399 this imposes certain restrictions on when and how the CPU buffers need
    400 to be processed.
    401 </para>
    402 <para>
    403 Finally, as mentioned, we log whether we have changed between kernel and
    404 userspace using a similar method. Both of these variables
    405 (<varname>last_task</varname> and <varname>last_is_kernel</varname>) are
    406 reset when the CPU buffer is read.
    407 </para>
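<para>
Putting the above together, the logging path looks roughly like the sketch
below. It follows the shape of <function>oprofile_add_sample()</function> as
handled in <filename>drivers/oprofile/cpu_buffer.c</filename>, but the
<function>add_code_entry()</function> and <function>add_sample_entry()</function>
helpers are illustrative names:
</para>
<screen>
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
{
	struct oprofile_cpu_buffer * cpu_buf = &amp;cpu_buffer[smp_processor_id()];
	unsigned long pc = instruction_pointer(regs);
	int is_kernel = !user_mode(regs);

	/* note a kernel/user transition, if any */
	if (cpu_buf->last_is_kernel != is_kernel) {
		cpu_buf->last_is_kernel = is_kernel;
		add_code_entry(cpu_buf, is_kernel);
	}

	/* note a task switch, if any */
	if (cpu_buf->last_task != current) {
		cpu_buf->last_task = current;
		add_code_entry(cpu_buf, (unsigned long)current);
	}

	/* finally, the sample itself; dropped if the buffer is full */
	add_sample_entry(cpu_buf, pc, event);
}
</screen>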
    408 </sect1>
    409 
    410 <sect1 id="logging-stack">
    411 <title>Logging stack traces</title>
    412 <para>
    413 OProfile can also provide statistical samples of call chains (on x86). To
    414 do this, at sample time, the frame pointer chain is traversed, recording
    415 the return address for each stack frame. This will only work if the code
    416 was compiled with frame pointers, but we're careful to abort the
    417 traversal if the frame pointer appears bad. We store the set of return
    418 addresses straight into the CPU buffer. Note that, since this traversal
    419 is keyed off the standard sample interrupt, the number of times a
    420 function appears in a stack trace is not an indicator of how many times
    421 the call site was executed: rather, it's related to the number of
    422 samples we took where that call site was involved. Thus, the results for
    423 stack traces are not necessarily proportional to the call counts:
    424 typical programs will have many <function>main()</function> samples.
    425 </para>
    426 </sect1>
    427 
    428 <sect1 id="synchronising-buffers">
    429 <title>Synchronising the CPU buffers to the event buffer</title>
    430 <!-- FIXME: update when percpu patch goes in -->
    431 <para>
    432 At some point, we have to process the data in each CPU buffer and enter
    433 it into the main (event) buffer. The file
    434 <filename>buffer_sync.c</filename> contains the relevant code. We
    435 periodically (currently every <constant>HZ</constant>/4 jiffies) start
    436 the synchronisation process. In addition, we process the buffers on
    437 certain events, such as an application calling
    438 <function>munmap()</function>. This is particularly important for
    439 <function>exit()</function> - because the CPU buffers contain pointers
    440 to the task structure, if we don't process all the buffers before the
    441 task is actually destroyed and the task structure freed, then we could
    442 end up trying to dereference a bogus pointer in one of the CPU buffers.
    443 </para>
    444 <para>
    445 We also add a notification when a kernel module is loaded; this is so
    446 that user-space can re-read <filename>/proc/modules</filename> to
    447 determine the load addresses of kernel module text sections. Without
    448 this notification, samples for a newly-loaded module could get lost or
    449 be attributed to the wrong module.
    450 </para>
    451 <para>
    452 The synchronisation itself works in the following manner: first, mutual
    453 exclusion on the event buffer is taken. Remember, we do not need to do
that for each CPU buffer, as we only read from the tail iterator (interrupts
might be arriving at the same buffer, but they will write to
the position of the head iterator, leaving previously written entries
    457 intact). Then, we process each CPU buffer in turn. A CPU switch
    458 notification is added to the buffer first (for
    459 <option>--separate=cpu</option> support). Then the processing of the
    460 actual data starts.
    461 </para>
    462 <para>
    463 As mentioned, the CPU buffer consists of task switch entries and the
    464 actual samples. When the routine <function>sync_buffer()</function> sees
    465 a task switch, the process ID and process group ID are recorded into the
    466 event buffer, along with a dcookie (see below) identifying the
    467 application binary (e.g. <filename>/bin/bash</filename>). The
    468 <varname>mmap_sem</varname> for the task is then taken, to allow safe
iteration across the task's list of mapped areas. Each sample is then
    470 processed as described in the next section.
    471 </para>
    472 <para>
    473 After a buffer has been read, the tail iterator is updated to reflect
    474 how much of the buffer was processed. Note that when we determined how
    475 much data there was to read in the CPU buffer, we also called
    476 <function>cpu_buffer_reset()</function> to reset
    477 <varname>last_task</varname> and <varname>last_is_kernel</varname>, as
    478 we've already mentioned. During the processing, more samples may have
    479 been arriving in the CPU buffer; this is OK because we are careful to
    480 only update the tail iterator to how much we actually read - on the next
    481 buffer synchronisation, we will start again from that point.
    482 </para>
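<para>
Schematically, the synchronisation looks like the following. This is a sketch
of the logic in <filename>buffer_sync.c</filename>, using illustrative helper
names rather than the exact functions:
</para>
<screen>
void sync_buffer(int cpu)
{
	struct oprofile_cpu_buffer * cpu_buf = &amp;cpu_buffer[cpu];
	unsigned long available;

	down(&amp;buffer_sem);               /* mutual exclusion on the event buffer */

	add_cpu_switch_entry(cpu);       /* for --separate=cpu */

	/* snapshot how much is available; also resets last_task/last_is_kernel */
	available = cpu_buffer_entries(cpu_buf);

	while (available--) {
		struct op_sample * s = read_at_tail(cpu_buf);
		if (is_task_switch(s))
			add_task_switch_entry(s);    /* pid, pgid, app dcookie */
		else
			add_translated_sample(s);    /* dcookie + offset, see below */
		increment_tail(cpu_buf);     /* only ever advanced here */
	}

	up(&amp;buffer_sem);
}
</screen>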
    483 </sect1>
    484 
    485 <sect1 id="dentry-cookies">
    486 <title>Identifying binary images</title>
    487 <para>
    488 In order to produce useful profiles, we need to be able to associate a
    489 particular PC value sample with an actual ELF binary on the disk. This
    490 leaves us with the problem of how to export this information to
    491 user-space. We create unique IDs that identify a particular directory
    492 entry (dentry), and write those IDs into the event buffer. Later on,
    493 the user-space daemon can call the <function>lookup_dcookie</function>
    494 system call, which looks up the ID and fills in the full path of
    495 the binary image in the buffer user-space passes in. These IDs are
    496 maintained by the code in <filename>fs/dcookies.c</filename>; the
    497 cache lasts for as long as the daemon has the event buffer open.
    498 </para>
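<para>
There is no glibc wrapper for this system call, so the daemon invokes it via
<function>syscall()</function>. A minimal sketch of how user-space turns a
cookie back into a path is shown below (error handling omitted; note that on
32-bit platforms the 64-bit cookie is actually passed to the kernel as two
arguments, a detail handled in <filename>daemon/opd_cookie.c</filename>):
</para>
<screen>
#include &lt;sys/syscall.h&gt;
#include &lt;unistd.h&gt;

/* fills "buf" with the full path of the image identified by "cookie"
 * and returns the length of the name, or a negative error */
static int resolve_cookie(unsigned long long cookie, char * buf, size_t size)
{
	return syscall(SYS_lookup_dcookie, cookie, buf, size);
}
</screen>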
    499 </sect1>
    500 
    501 <sect1 id="finding-dentry">
    502 <title>Finding a sample's binary image and offset</title>
    503 <para>
    504 We haven't yet described how we process the absolute PC value into
    505 something usable by the user-space daemon. When we find a sample entered
    506 into the CPU buffer, we traverse the list of mappings for the task
    507 (remember, we will have seen a task switch earlier, so we know which
    508 task's lists to look at). When a mapping is found that contains the PC
    509 value, we look up the mapped file's dentry in the dcookie cache. This
    510 gives the dcookie ID that will uniquely identify the mapped file. Then
    511 we alter the absolute value such that it is an offset from the start of
    512 the file being mapped (the mapping need not start at the start of the
    513 actual file, so we have to consider the offset value of the mapping). We
    514 store this dcookie ID into the event buffer; this identifies which
    515 binary the samples following it are against.
    516 In this manner, we have converted a PC value, which has transitory
    517 meaning only, into a static offset value for later processing by the
    518 daemon.
    519 </para>
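<para>
A sketch of this lookup, following the shape of the code in
<filename>buffer_sync.c</filename> (2.6-era names such as
<varname>f_dentry</varname> are used here; later kernels differ):
</para>
<screen>
/* find the file and offset for "addr" in the given mm */
for (vma = mm->mmap; vma; vma = vma->vm_next) {
	if (addr &lt; vma->vm_start || addr >= vma->vm_end)
		continue;

	if (vma->vm_file) {
		/* the unique ID for the mapped file's dentry */
		cookie = fast_get_dcookie(vma->vm_file->f_dentry,
		                          vma->vm_file->f_vfsmnt);
		/* turn the absolute PC into an offset within the file,
		 * accounting for where the mapping starts in the file */
		offset = (vma->vm_pgoff &lt;&lt; PAGE_SHIFT) + addr - vma->vm_start;
	}
	break;
}
</screen>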
    520 <para>
    521 We also attempt to avoid the relatively expensive lookup of the dentry
    522 cookie value by storing the cookie value directly into the dentry
    523 itself; then we can simply derive the cookie value immediately when we
    524 find the correct mapping.
    525 </para>
    526 </sect1>
    527 
    528 </chapter>
    529 
    530 <chapter id="sample-files">
    531 <title>Generating sample files</title>
    532 
    533 <sect1 id="processing-buffer">
    534 <title>Processing the buffer</title>
    535 
    536 <para>
    537 Now we can move onto user-space in our description of how raw interrupt
    538 samples are processed into useful information. As we described in
    539 previous sections, the kernel OProfile driver creates a large buffer of
    540 sample data consisting of offset values, interspersed with
    541 notification of changes in context. These context changes indicate how
    542 following samples should be attributed, and include task switches, CPU
    543 changes, and which dcookie the sample value is against. By processing
    544 this buffer entry-by-entry, we can determine where the samples should
be attributed to. This is particularly important when using the
<option>--separate</option> option.
    547 </para>
    548 <para>
    549 The file <filename>daemon/opd_trans.c</filename> contains the basic routine
    550 for the buffer processing. The <varname>struct transient</varname>
    551 structure is used to hold changes in context. Its members are modified
    552 as we process each entry; it is passed into the routines in
    553 <filename>daemon/opd_sfile.c</filename> for actually logging the sample
    554 to a particular sample file (which will be held in
    555 <filename>$SESSION_DIR/samples/current</filename>).
    556 </para>
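<para>
An illustrative subset of the context carried in <varname>struct
transient</varname> (see <filename>daemon/opd_trans.h</filename> for the
complete definition; several fields are omitted here):
</para>
<screen>
struct transient {
	char const * buffer;    /* current read position in the event buffer */
	size_t remaining;       /* entries left to process */
	cookie_t cookie;        /* dcookie of the binary the samples are against */
	cookie_t app_cookie;    /* dcookie of the application binary */
	vma_t pc;               /* the sample's offset (or absolute kernel PC) */
	unsigned long event;    /* hardware counter number */
	int in_kernel;          /* kernel or user space? */
	unsigned long cpu;      /* CPU the sample came from */
	pid_t tid;              /* task ID */
	pid_t tgid;             /* task group ID */
	/* ... */
};
</screen>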
    557 <para>
    558 The buffer format is designed for conciseness, as high sampling rates
    559 can easily generate a lot of data. Thus, context changes are prefixed
    560 by an escape code, identified by <function>is_escape_code()</function>.
    561 If an escape code is found, the next entry in the buffer identifies
    562 what type of context change is being read. These are handed off to
    563 various handlers (see the <varname>handlers</varname> array), which
    564 modify the transient structure as appropriate. If it's not an escape
    565 code, then it must be a PC offset value, and the very next entry will
    566 be the numeric hardware counter. These values are read and recorded
    567 in the transient structure; we then do a lookup to find the correct
    568 sample file, and log the sample, as described in the next section.
    569 </para>
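<para>
In outline, the processing loop looks like the sketch below. The dispatch
helpers follow <filename>daemon/opd_trans.c</filename>, but the
<function>log_sample()</function> call is named illustratively here:
</para>
<screen>
while (trans.remaining) {
	unsigned long long code = pop_buffer_value(&amp;trans);

	if (!is_escape_code(code)) {
		/* an ordinary sample: PC offset, then the counter number */
		trans.pc = code;
		trans.event = pop_buffer_value(&amp;trans);
		log_sample(&amp;trans);    /* find the sample file and log, see below */
		continue;
	}

	/* an escape code: the next value says what kind of context change;
	 * the real code bounds-checks it against the handlers table */
	code = pop_buffer_value(&amp;trans);
	handlers[code](&amp;trans);
}
</screen>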
    570 
    571 <sect2 id="handling-kernel-samples">
    572 <title>Handling kernel samples</title>
    573 
    574 <para>
    575 Samples from kernel code require a little special handling. Because
    576 the binary text which the sample is against does not correspond to
    577 any file that the kernel directly knows about, the OProfile driver
    578 stores the absolute PC value in the buffer, instead of the file offset.
    579 Of course, we need an offset against some particular binary. To handle
    580 this, we keep a list of loaded modules by parsing
    581 <filename>/proc/modules</filename> as needed. When a module is loaded,
    582 a notification is placed in the OProfile buffer, and this triggers a
    583 re-read. We store the module name, and the loading address and size.
    584 This is also done for the main kernel image, as specified by the user.
    585 The absolute PC value is matched against each address range, and
    586 modified into an offset when the matching module is found. See 
    587 <filename>daemon/opd_kernel.c</filename> for the details.
    588 </para>
    589 
    590 </sect2>
    591 
    592 
    593 </sect1>
    594 
    595 <sect1 id="sample-file-generation">
    596 <title>Locating and creating sample files</title>
    597 
    598 <para>
    599 We have a sample value and its satellite data stored in a
    600 <varname>struct transient</varname>, and we must locate an
    601 actual sample file to store the sample in, using the context
    602 information in the transient structure as a key. The transient data to
    603 sample file lookup is handled in
    604 <filename>daemon/opd_sfile.c</filename>. A hash is taken of the
    605 transient values that are relevant (depending upon the setting of
    606 <option>--separate</option>, some values might be irrelevant), and the
hash value is used to look up the list of currently open sample files.
    608 Of course, the sample file might not be found, in which case we need
    609 to create and open it.
    610 </para>
    611 <para>
    612 OProfile uses a rather complex scheme for naming sample files, in order
    613 to make selecting relevant sample files easier for the post-profiling
    614 utilities. The exact details of the scheme are given in
    615 <filename>oprofile-tests/pp_interface</filename>, but for now it will
    616 suffice to remember that the filename will include only relevant
    617 information for the current settings, taken from the transient data. A
fully-specified filename looks something like:
    619 </para>
    620 <computeroutput>
    621 /var/lib/oprofile/samples/current/{root}/usr/bin/xmms/{dep}/{root}/lib/tls/libc-2.3.2.so/CPU_CLK_UNHALTED.100000.0.28082.28089.0
    622 </computeroutput>
    623 <para>
    624 It should be clear that this identifies such information as the
    625 application binary, the dependent (library) binary, the hardware event,
    626 and the process and thread ID. Typically, not all this information is
needed, in which case some values may be replaced with the token
    628 <filename>all</filename>.
    629 </para>
    630 <para>
    631 The code that generates this filename and opens the file is found in
    632 <filename>daemon/opd_mangling.c</filename>. You may have realised that
    633 at this point, we do not have the binary image file names, only the
    634 dcookie values. In order to determine a file name, a dcookie value is
    635 looked up in the dcookie cache. This is to be found in
    636 <filename>daemon/opd_cookie.c</filename>. Since dcookies are both
    637 persistent and unique during a sampling session, we can cache the
    638 values. If the value is not found in the cache, then we ask the kernel
    639 to do the lookup from value to file name for us by calling
    640 <function>lookup_dcookie()</function>. This looks up the value in a
    641 kernel-side cache (see <filename>fs/dcookies.c</filename>) and returns
    642 the fully-qualified file name to userspace.
    643 </para>
    644 
    645 </sect1>
    646 
    647 <sect1 id="sample-file-writing">
    648 <title>Writing data to a sample file</title>
    649 
    650 <para>
    651 Each specific sample file is a hashed collection, where the key is
    652 the PC offset from the transient data, and the value is the number of
    653 samples recorded against that offset. The files are
    654 <function>mmap()</function>ed into the daemon's memory space. The code
that actually logs the sample against the sample file can be found in
    656 <filename>libdb/</filename>.
    657 </para>
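<para>
Conceptually, logging one sample against an open sample file amounts to
"increment the value stored under this offset". The exact libdb entry points
have changed over time, so the names in this tiny sketch are illustrative
only:
</para>
<screen>
/* illustrative: bump the sample count for this PC offset by one */
static void log_hit(odb_t * file, unsigned long offset)
{
	odb_update_node(file, (odb_key_t)offset);
}
</screen>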
    658 <para>
    659 For recording stack traces, we have a more complicated sample filename
    660 mangling scheme that allows us to identify cross-binary calls. We use
    661 the same sample file format, where the key is a 64-bit value composed
    662 from the from,to pair of offsets.
    663 </para>
    664 
    665 </sect1>
    666 
    667 </chapter>
    668 
    669 <chapter id="output">
    670 <title>Generating useful output</title>
    671 
    672 <para>
    673 All of the tools used to generate human-readable output have to take
    674 roughly the same steps to collect the data for processing. First, the
    675 profile specification given by the user has to be parsed. Next, a list
of sample files matching the specification has to be obtained. Using this
    677 list, we need to locate the binary file for each sample file, and then
    678 use them to extract meaningful data, before a final collation and
    679 presentation to the user.
    680 </para>
    681 
    682 <sect1 id="profile-specification">
    683 <title>Handling the profile specification</title>
    684 
    685 <para>
    686 The profile specification presented by the user is parsed in
    687 the function <function>profile_spec::create()</function>. This
    688 creates an object representing the specification. Then we
    689 use <function>profile_spec::generate_file_list()</function>
    690 to search for all sample files and match them against the
    691 <varname>profile_spec</varname>.
    692 </para>
    693 
    694 <para>
    695 To enable this matching process to work, the attributes of
each sample file are encoded in its filename. This is a low-tech
approach to matching specifications against candidate sample
files, but it works reasonably well. Typical sample file names
look like these:
    700 </para>
    701 <screen>
    702 /var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/{cg}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.all.all.all
    703 /var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.all.all.all
    704 /var/lib/oprofile/samples/current/{root}/bin/ls/{dep}/{root}/bin/ls/CPU_CLK_UNHALTED.100000.0.7423.7424.0
    705 /var/lib/oprofile/samples/current/{kern}/r128/{dep}/{kern}/r128/CPU_CLK_UNHALTED.100000.0.all.all.all
    706 </screen>
    707 <para>
    708 This looks unnecessarily complex, but it's actually fairly simple. First
we have the session of the sample, by default located at
<filename>/var/lib/oprofile/samples/current</filename>. This location
can be changed by specifying the <option>--session-dir</option> option on the command line.
    712 This session could equally well be inside an archive from <command>oparchive</command>.
    713 Next we have one of the tokens <filename>{root}</filename> or
    714 <filename>{kern}</filename>. <filename>{root}</filename> indicates
    715 that the binary is found on a file system, and we will encode its path
    716 in the next section (e.g. <filename>/bin/ls</filename>).
    717 <filename>{kern}</filename> indicates a kernel module - on 2.6 kernels
    718 the path information is not available from the kernel, so we have to
    719 special-case kernel modules like this; we encode merely the name of the
    720 module as loaded.
    721 </para>
    722 <para>
    723 Next there is a <filename>{dep}</filename> token, indicating another
    724 token/path which identifies the dependent binary image. This is used even for
    725 the "primary" binary (i.e. the one that was
    726 <function>execve()</function>d), as it simplifies processing. Finally,
    727 if this sample file is a normal flat profile, the actual file is next in
    728 the path. If it's a call-graph sample file, we need one further
    729 specification, to allow us to identify cross-binary arcs in the call
    730 graph.
    731 </para>
    732 <para>
    733 The actual sample file name is dot-separated, where the fields are, in
    734 order: event name, event count, unit mask, task group ID, task ID, and
    735 CPU number.
    736 </para>
    737 <para>
    738 This sample file can be reliably parsed (with
    739 <function>parse_filename()</function>) into a
    740 <varname>filename_spec</varname>. Finally, we can check whether to
    741 include the sample file in the final results by comparing this
    742 <varname>filename_spec</varname> against the
    743 <varname>profile_spec</varname> the user specified (for the interested,
    744 see <function>valid_candidate()</function> and
    745 <function>profile_spec::match</function>). Then comes the really
    746 complicated bit...
    747 </para>
    748 
    749 </sect1>
    750 
    751 <sect1 id="sample-file-collating">
    752 <title>Collating the candidate sample files</title>
    753 
    754 <para>
    755 At this point we have a duplicate-free list of sample files we need
    756 to process. But first we need to do some further arrangement: we
    757 need to classify each sample file, and we may also need to "invert"
    758 the profiles.
    759 </para>
    760 
    761 <sect2 id="sample-file-classifying">
    762 <title>Classifying sample files</title>
    763 
    764 <para>
    765 It's possible for utilities like <command>opreport</command> to show 
    766 data in columnar format: for example, we might want to show the results
    767 of two threads within a process side-by-side. To do this, we need
    768 to classify each sample file into classes - the classes correspond
to each <command>opreport</command> column. The function that handles
    770 this is <function>arrange_profiles()</function>. Each sample file
    771 is added to a particular class. If the sample file is the first in
    772 its class, a template is generated from the sample file. Each template
    773 describes a particular class (thus, in our example above, each template
    774 will have a different thread ID, and this uniquely identifies each
    775 class).
    776 </para>
    777 
    778 <para>
    779 Each class has a list of "profile sets" matching that class's template.
    780 A profile set is either a profile of the primary binary image, or any of
    781 its dependent images. After all sample files have been listed in one of
    782 the profile sets belonging to the classes, we have to name each class and
    783 perform error-checking. This is done by
    784 <function>identify_classes()</function>; each class is checked to ensure
    785 that its "axis" is the same as all the others. This is needed because
    786 <command>opreport</command> can't produce results in 3D format: we can
    787 only differ in one aspect, such as thread ID or event name.
    788 </para>
    789 
    790 </sect2>
    791 
    792 <sect2 id="sample-file-inverting">
    793 <title>Creating inverted profile lists</title>
    794 
    795 <para>
    796 Remember that if we're using certain profile separation options, such as
    797 "--separate=lib", a single binary could be a dependent image to many
    798 different binaries. For example, the C library image would be a
    799 dependent image for most programs that have been profiled. As it
    800 happens, this can cause severe performance problems: without some
    801 re-arrangement, these dependent binary images would be opened each
    802 time we need to process sample files for each program.
    803 </para>
    804 
    805 <para>
    806 The solution is to "invert" the profiles via
    807 <function>invert_profiles()</function>. We create a new data structure
    808 where the dependent binary is first, and the primary binary images using
    809 that dependent binary are listed as sub-images. This helps our
    810 performance problem, as now we only need to open each dependent image
    811 once, when we process the list of inverted profiles.
    812 </para>
    813 
    814 </sect2>
    815 
    816 </sect1>
    817 
    818 <sect1 id="generating-profile-data">
    819 <title>Generating profile data</title>
    820 
    821 <para>
Things don't get any simpler at this point, unfortunately. By now
    823 we've collected and classified the sample files into the set of inverted
    824 profiles, as described in the previous section. Now we need to process
    825 each inverted profile and make something of the data. The entry point
    826 for this is <function>populate_for_image()</function>.
    827 </para>
    828 
    829 <sect2 id="bfd">
    830 <title>Processing the binary image</title>
    831 <para>
    832 The first thing we do with an inverted profile is attempt to open the
    833 binary image (remember each inverted profile set is only for one binary
    834 image, but may have many sample files to process). The
    835 <varname>op_bfd</varname> class provides an abstracted interface to
    836 this; internally it uses <filename>libbfd</filename>. The main purpose
    837 of this class is to process the symbols for the binary image; this is
    838 also where symbol filtering happens. This is actually quite tricky, but
    839 should be clear from the source.
    840 </para>
    841 </sect2>
    842 
    843 <sect2 id="processing-sample-files">
    844 <title>Processing the sample files</title>
    845 <para>
    846 The class <varname>profile_container</varname> is a hold-all that
    847 contains all the processed results. It is a container of
    848 <varname>profile_t</varname> objects. The
    849 <function>add_sample_files()</function> method uses
    850 <filename>libdb</filename> to open the given sample file and add the
    851 key/value types to the <varname>profile_t</varname>. Once this has been
    852 done, <function>profile_container::add()</function> is passed the
    853 <varname>profile_t</varname> plus the <varname>op_bfd</varname> for
    854 processing.
    855 </para>
    856 <para>
    857 <function>profile_container::add()</function> walks through the symbols
    858 collected in the <varname>op_bfd</varname>.
    859 <function>op_bfd::get_symbol_range()</function> gives us the start and
    860 end of the symbol as an offset from the start of the binary image,
    861 then we interrogate the <varname>profile_t</varname> for the relevant samples
    862 for that offset range. We create a <varname>symbol_entry</varname>
    863 object for this symbol and fill it in. If needed, here we also collect
    864 debug information from the <varname>op_bfd</varname>, and possibly
    865 record the detailed sample information (as used by <command>opreport
    866 -d</command> and <command>opannotate</command>).
    867 Finally the <varname>symbol_entry</varname> is added to
    868 a private container of <varname>profile_container</varname> - this
    869 <varname>symbol_container</varname> holds all such processed symbols.
    870 </para>
    871 </sect2>
    872 
    873 </sect1>
    874 
    875 <sect1 id="generating-output">
    876 <title>Generating output</title>
    877 
    878 <para>
    879 After the processing described in the previous section, we've now got
    880 full details of what we need to output stored in the
    881 <varname>profile_container</varname> on a symbol-by-symbol basis. To
    882 produce output, we need to replay that data and format it suitably.
    883 </para>
    884 <para>
    885 <command>opreport</command> first asks the
    886 <varname>profile_container</varname> for a
    887 <varname>symbol_collection</varname> (this is also where thresholding
    888 happens).
This is sorted, then an
    890 <varname>opreport_formatter</varname> is initialised.
    891 This object initialises a set of field formatters as requested. Then
    892 <function>opreport_formatter::output()</function> is called. This
    893 iterates through the (sorted) <varname>symbol_collection</varname>;
    894 for each entry, the selected fields (as set by the
    895 <varname>format_flags</varname> options) are output by calling the
    896 field formatters, with the <varname>symbol_entry</varname> passed in.
    897 </para>
    898 
    899 </sect1>
    900 
    901 </chapter>
    902 
    903 <chapter id="ext">
    904 <title>Extended Feature Interface</title>
    905 
    906 <sect1 id="ext-intro">
    907 <title>Introduction</title>
    908 
    909 <para>
    910 The Extended Feature Interface is a standard callback interface 
    911 designed to allow extension to the OProfile daemon's sample processing. 
    912 Each feature defines a set of callback handlers which can be enabled or 
    913 disabled through the OProfile daemon's command-line option.
    914 This interface can be used to implement support for architecture-specific
    915 features or features not commonly used by general OProfile users. 
    916 </para>
    917 
    918 </sect1>
    919 
    920 <sect1 id="ext-name-and-handlers">
    921 <title>Feature Name and Handlers</title>
    922 
    923 <para>
    924 Each extended feature has an entry in the <varname>ext_feature_table</varname>
    925 in <filename>opd_extended.cpp</filename>. Each entry contains a feature name,
and a corresponding set of handlers. The feature name is a unique string, which is
    927 used to identify a feature in the table. Each feature provides a set
    928 of handlers, which will be executed by the OProfile daemon from pre-determined
    929 locations to perform certain tasks. At runtime, the OProfile daemon calls a feature
    930 handler wrapper from one of the predetermined locations to check whether
    931 an extended feature is enabled, and whether a particular handler exists.
    932 Only the handlers of the enabled feature will be executed.
    933 </para>
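<para>
As a rough sketch (the real definitions live alongside
<filename>opd_extended.cpp</filename>; names here may not match exactly),
a table entry pairs a feature name with its handlers:
</para>
<screen>
struct opd_ext_handlers {
	/* parse the "args" string from --ext-feature */
	int (*ext_init)(char const * args);
	/* append feature statistics to the daemon's report */
	int (*ext_print_stats)(void);
	/* operations on extended sample files */
	struct opd_ext_sfile_handlers * ext_sfile;
};

struct opd_ext_feature {
	char const * feature;                 /* unique feature name, e.g. "ibs" */
	struct opd_ext_handlers * handlers;
};

static struct opd_ext_feature ext_feature_table[] = {
	{ "ibs", &amp;ibs_handlers },
	{ NULL, NULL }
};
</screen>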
    934 
    935 </sect1>
    936 
    937 <sect1 id="ext-enable">
    938 <title>Enabling Features</title>
    939 
    940 <para>
    941 Each feature is enabled using the OProfile daemon (oprofiled) command-line
    942 option "--ext-feature=&lt;extended-feature-name&gt;:[args]". The
    943 "extended-feature-name" is used to determine the feature to be enabled.
    944 The optional "args" is passed into the feature-specific initialization handler
    945 (<function>ext_init</function>). Currently, only one extended feature can be
    946 enabled at a time.
    947 </para>
    948 
    949 </sect1>
    950 
    951 <sect1 id="ext-types-of-handlers">
    952 <title>Type of Handlers</title>
    953 
    954 <para>
    955 Each feature is responsible for providing its own set of handlers.
    956 Types of handler are:
    957 </para>
    958 
    959 <sect2 id="ext_init">
    960 <title>ext_init Handler</title>
    961 
    962 <para>
    963 "ext_init" handles initialization of an extended feature. It takes
an "args" parameter which is passed in through the "oprofiled --ext-feature=&lt;
    965 extended-feature-name&gt;:[args]". This handler is executed in the function
    966 <function>opd_options()</function> in the file <filename>daemon/oprofiled.c
    967 </filename>.
    968 </para>
    969 
    970 <note>
    971 <para>
    972 The ext_init handler is required for all features.
    973 </para>
    974 </note>
    975 
    976 </sect2>
    977 
    978 <sect2 id="ext_print_stats">
    979 <title>ext_print_stats Handler</title>
    980 
    981 <para>
    982 "ext_print_stats" handles the extended feature statistics report. It adds
    983 a new section in the OProfile daemon statistics report, which is normally
output to the file
    985 <filename>/var/lib/oprofile/samples/oprofiled.log</filename>.
    986 This handler is executed in the function <function>opd_print_stats()</function>
    987 in the file <filename>daemon/opd_stats.c</filename>.
    988 </para>
    989 
    990 </sect2>
    991 
    992 <sect2 id="ext_sfile_handlers">
    993 <title>ext_sfile Handler</title>
    994 
    995 <para>
    996 "ext_sfile" contains a set of handlers related to operations on the extended
sample files (sample files for events related to an extended feature).
    998 These operations include <function>create_sfile()</function>,
    999 <function>sfile_dup()</function>, <function>close_sfile()</function>,
   1000 <function>sync_sfile()</function>, and <function>get_file()</function>
   1001 as defined in <filename>daemon/opd_sfile.c</filename>.
   1002 An additional field, <varname>odb_t * ext_file</varname>, is added to the 
   1003 <varname>struct sfile</varname> for storing extended sample files
   1004 information. 
   1005 
   1006 </para>
   1007 
   1008 </sect2>
   1009 
   1010 </sect1>
   1011 
   1012 <sect1 id="ext-implementation">
   1013 <title>Extended Feature Reference Implementation</title>
   1014 
   1015 <sect2 id="ext-ibs">
   1016 <title>Instruction-Based Sampling (IBS)</title>
   1017 
   1018 <para>
   1019 An example of extended feature implementation can be seen by
   1020 examining the AMD Instruction-Based Sampling support.
   1021 </para>
   1022 
   1023 <sect3 id="ibs-init">
   1024 <title>IBS Initialization</title>
   1025 
   1026 <para>
   1027 Instruction-Based Sampling (IBS) is a new performance measurement technique
   1028 available on AMD Family 10h processors. Enabling IBS profiling is done simply
   1029 by specifying IBS performance events through the "--event=" options.
   1030 </para>
   1031 
   1032 <screen>
   1033 opcontrol --event=IBS_FETCH_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
   1034 opcontrol --event=IBS_OP_XXX:&lt;count&gt;:&lt;um&gt;:&lt;kernel&gt;:&lt;user&gt;
   1035 
   1036 Note: * Count and unitmask for all IBS fetch events must be the same,
	as must those for IBS op events.
   1038 </screen>
   1039 
   1040 <para>
IBS performance events are listed in the output of <command>opcontrol --list-events</command>.
   1042 When users specify these events, opcontrol verifies them using ophelp, which
   1043 checks for the <varname>ext:ibs_fetch</varname> or <varname>ext:ibs_op</varname>
tag in the <filename>events/x86-64/family10/events</filename> file.
   1045 Then, it configures the driver interface (/dev/oprofile/ibs_fetch/... and
   1046 /dev/oprofile/ibs_op/...) and starts the OProfile daemon as follows.
   1047 </para>
   1048 
   1049 <screen>
   1050 oprofiled \
   1051     --ext-feature=ibs:\
   1052 	fetch:&lt;IBS_FETCH_EVENT1&gt;,&lt;IBS_FETCH_EVENT2&gt;,...,:&lt;IBS fetch count&gt;:&lt;IBS Fetch um&gt;|\
   1053 	op:&lt;IBS_OP_EVENT1&gt;,&lt;IBS_OP_EVENT2&gt;,...,:&lt;IBS op count&gt;:&lt;IBS op um&gt;
   1054 </screen>
   1055 
   1056 <para>
   1057 Here, the OProfile daemon parses the <varname>--ext-feature</varname>
   1058 option and checks the feature name ("ibs") before calling the 
initialization function to handle the string
   1060 containing IBS events, counts, and unitmasks.
   1061 Then, it stores each event in the IBS virtual-counter table
   1062 (<varname>struct opd_event ibs_vc[OP_MAX_IBS_COUNTERS]</varname>) and
   1063 stores the event index in the IBS Virtual Counter Index (VCI) map
   1064 (<varname>ibs_vci_map[OP_MAX_IBS_COUNTERS]</varname>) with IBS event value
   1065 as the map key.
   1066 </para>
   1067 </sect3>
   1068 
   1069 <sect3 id="ibs-data-processing">
   1070 <title>IBS Data Processing</title>
   1071 
   1072 <para>
   1073 During a profile session, the OProfile daemon identifies IBS samples in the 
   1074 event buffer using the <varname>"IBS_FETCH_CODE"</varname> or 
   1075 <varname>"IBS_OP_CODE"</varname>. These codes trigger the handlers 
   1076 <function>code_ibs_fetch_sample()</function> or 
   1077 <function>code_ibs_op_sample()</function> listed in the
   1078 <varname>handler_t handlers[]</varname> vector in 
<filename>daemon/opd_trans.c</filename>. These handlers are responsible for
processing IBS samples and translating them into IBS performance events.
   1081 </para>
   1082 
   1083 <para>
Unlike traditional performance events, each IBS sample can be translated into
   1085 multiple IBS performance events. For each event that the user specifies,
   1086 a combination of bits from Model-Specific Registers (MSR) are checked
   1087 against the bitmask defining the event. If the condition is met, the event
   1088 will then be recorded. The derivation logic is in the files
   1089 <filename>daemon/opd_ibs_macro.h</filename> and
   1090 <filename>daemon/opd_ibs_trans.[h,c]</filename>. 
   1091 </para>
   1092 
   1093 </sect3>
   1094 
   1095 <sect3 id="ibs-sample-file">
   1096 <title>IBS Sample File</title>
   1097 
   1098 <para>
   1099 Traditionally, sample file information <varname>(odb_t)</varname> is stored
   1100 in the <varname>struct sfile::odb_t file[OP_MAX_COUNTER]</varname>.
   1101 Currently, <varname>OP_MAX_COUNTER</varname> is 8 on non-alpha, and 20 on
alpha-based systems. The event index (the counter number on which the event
is configured) is used to access the corresponding entry in the array.
Unlike traditional performance events, IBS does not use the actual
counter registers (i.e. <filename>/dev/oprofile/0,1,2,3</filename>).
Also, the number of performance events generated by IBS could be larger than
<varname>OP_MAX_COUNTER</varname> (currently up to 13 IBS-fetch and 46 IBS-op
   1108 events). Therefore IBS requires a special data structure and sfile
   1109 handlers (<varname>struct opd_ext_sfile_handlers</varname>) for managing
IBS sample files. IBS sample file information is stored in memory
allocated by the handler <function>ibs_sfile_create()</function>, which can
   1112 be accessed through <varname>struct sfile::odb_t * ext_files</varname>.
   1113 </para>
   1114 
   1115 </sect3>
   1116 
   1117 </sect2>
   1118 
   1119 </sect1>
   1120 
   1121 </chapter>
   1122 
   1123 <glossary id="glossary">
   1124 <title>Glossary of OProfile source concepts and types</title>
   1125 
   1126 <glossentry><glossterm>application image</glossterm>
   1127 <glossdef><para>
   1128 The primary binary image used by an application. This is derived
   1129 from the kernel and corresponds to the binary started upon running
   1130 an application: for example, <filename>/bin/bash</filename>.
   1131 </para></glossdef></glossentry>
   1132 
   1133 <glossentry><glossterm>binary image</glossterm>
   1134 <glossdef><para>
   1135 An ELF file containing executable code: this includes kernel modules,
   1136 the kernel itself (a.k.a. <filename>vmlinux</filename>), shared libraries,
   1137 and application binaries.
   1138 </para></glossdef></glossentry>
   1139 
   1140 <glossentry><glossterm>dcookie</glossterm>
   1141 <glossdef><para>
   1142 Short for "dentry cookie". A unique ID that can be looked up to provide
   1143 the full path name of a binary image.
   1144 </para></glossdef></glossentry>
   1145 
   1146 <glossentry><glossterm>dependent image</glossterm>
   1147 <glossdef><para>
   1148 A binary image that is dependent upon an application, used with
   1149 per-application separation. Most commonly, shared libraries. For example,
   1150 if <filename>/bin/bash</filename> is running and we take
   1151 some samples inside the C library itself due to <command>bash</command>
   1152 calling library code, then the image <filename>/lib/libc.so</filename>
   1153 would be dependent upon <filename>/bin/bash</filename>.
   1154 </para></glossdef></glossentry>
   1155 
   1156 <glossentry><glossterm>merging</glossterm>
   1157 <glossdef><para>
   1158 This refers to the ability to merge several distinct sample files
   1159 into one set of data at runtime, in the post-profiling tools. For example,
   1160 per-thread sample files can be merged into one set of data, because
   1161 they are compatible (i.e. the aggregation of the data is meaningful),
   1162 but it's not possible to merge sample files for two different events,
   1163 because there would be no useful meaning to the results.
   1164 </para></glossdef></glossentry>
   1165 
   1166 <glossentry><glossterm>profile class</glossterm>
   1167 <glossdef><para>
   1168 A collection of profile data that has been collected under the same
   1169 class template. For example, if we're using <command>opreport</command>
   1170 to show results after profiling with two performance counters enabled
   1171 profiling <constant>DATA_MEM_REFS</constant> and <constant>CPU_CLK_UNHALTED</constant>,
   1172 there would be two profile classes, one for each event. Or if we're on
   1173 an SMP system and doing per-cpu profiling, and we request
   1174 <command>opreport</command> to show results for each CPU side-by-side,
   1175 there would be a profile class for each CPU.
   1176 </para></glossdef></glossentry>
   1177 
   1178 <glossentry><glossterm>profile specification</glossterm>
   1179 <glossdef><para>
   1180 The parameters the user passes to the post-profiling tools that limit
   1181 what sample files are used. This specification is matched against
   1182 the available sample files to generate a selection of profile data.
   1183 </para></glossdef></glossentry>
   1184 
   1185 <glossentry><glossterm>profile template</glossterm>
   1186 <glossdef><para>
   1187 The parameters that define what goes in a particular profile class.
   1188 This includes a symbolic name (e.g. "cpu:1") and the code-usable
   1189 equivalent.
   1190 </para></glossdef></glossentry>
   1191 
   1192 </glossary>
   1193 
   1194 </book>
   1195