Home | History | Annotate | Download | only in html
      1 <html>
      2 <head>
      3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
      4 <title>12.BBV: an experimental basic block vector generation tool</title>
      5 <link rel="stylesheet" href="vg_basic.css" type="text/css">
      6 <meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
      7 <link rel="home" href="index.html" title="Valgrind Documentation">
      8 <link rel="up" href="manual.html" title="Valgrind User Manual">
      9 <link rel="prev" href="pc-manual.html" title="11.Ptrcheck: an experimental heap, stack and global array overrun detector">
     10 <link rel="next" href="lk-manual.html" title="13.Lackey: an example tool">
     11 </head>
     12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
     13 <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr>
     14 <td width="22px" align="center" valign="middle"><a accesskey="p" href="pc-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td>
     15 <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td>
     16 <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td>
     17 <th align="center" valign="middle">Valgrind User Manual</th>
     18 <td width="22px" align="center" valign="middle"><a accesskey="n" href="lk-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td>
     19 </tr></table></div>
     20 <div class="chapter" title="12.BBV: an experimental basic block vector generation tool">
     21 <div class="titlepage"><div><div><h2 class="title">
     22 <a name="bbv-manual"></a>12.BBV: an experimental basic block vector generation tool</h2></div></div></div>
     23 <div class="toc">
     24 <p><b>Table of Contents</b></p>
     25 <dl>
     26 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.overview">12.1. Overview</a></span></dt>
     27 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.quickstart">12.2. Using Basic Block Vectors to create SimPoints</a></span></dt>
     28 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.usage">12.3. BBV Command-line Options</a></span></dt>
     29 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.fileformat">12.4. Basic Block Vector File Format</a></span></dt>
     30 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.implementation">12.5. Implementation</a></span></dt>
     31 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.threadsupport">12.6. Threaded Executable Support</a></span></dt>
     32 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.validation">12.7. Validation</a></span></dt>
     33 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.performance">12.8. Performance</a></span></dt>
     34 </dl>
     35 </div>
     36 <p>To use this tool, you must specify
     37 <code class="option">--tool=exp-bbv</code> on the Valgrind
     38 command line.</p>
     39 <div class="sect1" title="12.1.Overview">
     40 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     41 <a name="bbv-manual.overview"></a>12.1.Overview</h2></div></div></div>
     42 <p>
     43    A basic block is a linear section of code with one entry point and one exit
     44    point.  A <span class="emphasis"><em>basic block vector</em></span> (BBV) is a list of all
     45    basic blocks entered during program execution, and a count of how many
     46    times each basic block was run.
     47 </p>
     48 <p>
     49    BBV is a tool that generates basic block vectors for use with the 
     50    <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint</a>
     51    analysis tool. 
     52    The SimPoint methodology enables speeding up architectural 
     53    simulations by only running a small portion of a program
     54    and then extrapolating total behavior from this
     55    small portion.  Most programs exhibit phase-based behavior, which
     56    means that at various times during execution a program will encounter 
     57    intervals of time where the code behaves similarly to a previous
     58    interval.  If you can detect these intervals and group them together, 
     59    an approximation of the total program behavior can be obtained
     60    by only simulating a bare minimum number of intervals, and then scaling 
     61    the results.
     62 </p>
     63 <p>
     64   In computer architecture research, running a 
     65   benchmark on a cycle-accurate simulator can cause slowdowns on the order
     66   of 1000 times, making it take days, weeks, or even longer to run full
     67   benchmarks.  By utilizing SimPoint this can be reduced significantly,
     68   usually by 90-95%, while still retaining reasonable accuracy.
     69 </p>
     70 <p>
     71    A more complete introduction to how SimPoint works can be 
     72    found in the paper "Automatically Characterizing Large Scale 
     73    Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and 
     74    B. Calder.  
     75 </p>
     76 </div>
     77 <div class="sect1" title="12.2.Using Basic Block Vectors to create SimPoints">
     78 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
     79 <a name="bbv-manual.quickstart"></a>12.2.Using Basic Block Vectors to create SimPoints</h2></div></div></div>
     80 <p>
     81    To quickly create a basic block vector file, you will call Valgrind
     82    like this:
     83 
     84    </p>
     85 <pre class="programlisting">valgrind --tool=exp-bbv /bin/ls</pre>
     86 <p>
     87 
     88    In this case we are running on <code class="filename">/bin/ls</code>,
     89    but this can be any program.  By default a file called
     90    <code class="computeroutput">bb.out.PID</code> will be created,
     91    where PID is replaced by the process ID of the running process.
     92    This file contains the basic block vector.  For long-running programs
     93    this file can be quite large, so it might be wise to compress
     94    it with gzip or some other compression program.
     95 </p>
     96 <p>
     97    To create actual SimPoint results, you will need the SimPoint utility,
     98    available from the 
     99    <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint webpage</a>.
    100    Assuming you have downloaded SimPoint 3.2 and compiled it,
    101    create SimPoint results with a command like the following:
    102       
    103    </p>
    104 <pre class="programlisting">
    105 ./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \
    106     -loadFVFile bb.out.1234.gz \
    107     -k 5 -saveSimpoints results.simpts \
    108     -saveSimpointWeights results.weights</pre>
    109 <p>
    110 
    111    where bb.out.1234.gz is your compressed basic block vector file
    112    generated by BBV.
    113 </p>
    114 <p>   
    115    The SimPoint utility does random linear projection using 15-dimensions,
    116    then does k-mean clustering to calculate which intervals are 
    117    of interest.  In this example we specify 5 intervals with the 
    118    -k 5 option.   
    119 </p>
    120 <p>   
    121    The outputs from the SimPoint run are the 
    122    <code class="computeroutput">results.simpts</code>
    123    and <code class="computeroutput">results.weights</code> files.
    124    The first holds the 5 most relevant intervals of the program.
    125    The seconds holds the weight to scale each interval by when
    126    extrapolating full-program behavior.  The intervals and the weights
    127    can be used in conjunction with a simulator that supports
    128    fast-forwarding; you fast-forward to the interval of interest,
    129    collect stats for the desired interval length, then use
    130    statistics gathered in conjunction with the weights to 
    131    calculate your results.
    132 </p>
    133 </div>
    134 <div class="sect1" title="12.3.BBV Command-line Options">
    135 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    136 <a name="bbv-manual.usage"></a>12.3.BBV Command-line Options</h2></div></div></div>
    137 <p> BBV-specific command-line options are:</p>
    138 <div class="variablelist">
    139 <a name="bbv.opts.list"></a><dl>
    140 <dt>
    141 <a name="opt.bb-out-file"></a><span class="term">
    142         <code class="option">--bb-out-file=&lt;name&gt; [default: bb.out.%p] </code>
    143      </span>
    144 </dt>
    145 <dd><p>
    146            This option selects the name of the basic block vector file.  The
    147            <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
    148            used to embed the process ID and/or the contents of an environment
    149            variable in the name, as is the case for the core option
    150            <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
    151         </p></dd>
    152 <dt>
    153 <a name="opt.pc-out-file"></a><span class="term">
    154         <code class="option">--pc-out-file=&lt;name&gt; [default: pc.out.%p] </code>
    155      </span>
    156 </dt>
    157 <dd><p>
    158            This option selects the name of the PC file.  
    159            This file holds program counter addresses
    160            and function name info for the various basic blocks.
    161            This can be used in conjunction
    162            with the basic block vector file to fast-forward via function names
    163            instead of just instruction counts.  The 
    164            <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be
    165            used to embed the process ID and/or the contents of an environment
    166            variable in the name, as is the case for the core option
    167            <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>.
    168         </p></dd>
    169 <dt>
    170 <a name="opt.interval-size"></a><span class="term">
    171         <code class="option">--interval-size=&lt;number&gt; [default: 100000000] </code>
    172       </span>
    173 </dt>
    174 <dd><p>
    175          This option selects the size of the interval to use.  
    176          The default is 100 
    177          million instructions, which is a commonly used value.  
    178          Other sizes can be used; smaller intervals can help programs
    179          with finer-grained phases.  However smaller interval size
    180          can lead to accuracy issues due to warm-up effects 
    181          (When fast-forwarding the various architectural features
    182          will be un-initialized, and it will take some number
    183          of instructions before they "warm up" to the state a 
    184          full simulation would be at without the fast-forwarding.
    185          Large interval sizes tend to mitigate this.)
    186       </p></dd>
    187 <dt>
    188 <a name="opt.instr-count-only"></a><span class="term">
    189         <code class="option">--instr-count-only [default: no] </code>
    190      </span>
    191 </dt>
    192 <dd><p>
    193            This option tells the tool to only display instruction count
    194            totals, and to not generate the actual basic block vector file.
    195            This is useful for debugging, and for gathering instruction count
    196            info without generating the large basic block vector files.
    197         </p></dd>
    198 </dl>
    199 </div>
    200 </div>
    201 <div class="sect1" title="12.4.Basic Block Vector File Format">
    202 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    203 <a name="bbv-manual.fileformat"></a>12.4.Basic Block Vector File Format</h2></div></div></div>
    204 <p>  
    205   The Basic Block Vector is dumped at fixed intervals.  This
    206   is commonly done every 100 million instructions; the 
    207   <code class="option">--interval-size</code> option can be 
    208   used to change this.
    209 </p>
    210 <p>
    211   The output file looks like this:
    212 </p>
    213 <pre class="programlisting">
    214 T:45:1024 :189:99343
    215 T:11:78573 :15:1353  :56:1
    216 T:18:45 :12:135353 :56:78 314:4324263</pre>
    217 <p>
    218   Each new interval starts with a T.   This is followed on the same line
    219   by a series of basic block and frequency pairs, one for each
    220   basic block that was entered during the interval.  The format for
    221   each block/frequency pair is a colon, followed by a number that
    222   uniquely identifies the basic block, another colon, and then
    223   the frequency (which is the number of times the block was entered,
    224   multiplied by the number of instructions in the block).  The
    225   pairs are separated from each other by a space.
    226 </p>
    227 <p>
    228   The frequency count is multiplied by the number of instructions that are 
    229   in the basic block, in order to weigh the count so that instructions in 
    230   small basic blocks aren't counted as more important than instructions 
    231   in large basic blocks.
    232 </p>
    233 <p>
    234   The SimPoint program only processes lines that start with a "T".  All
    235   other lines are ignored.  Traditionally comments are indicated by
    236   starting a line with a "#" character.  Some other BBV generation tools,
    237   such as PinPoints, generate lines beginning with letters other than "T"
    238   to indicate more information about the program being run.  We do
    239   not generate these, as the SimPoint utility ignores them.
    240 </p>
    241 </div>
    242 <div class="sect1" title="12.5.Implementation">
    243 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    244 <a name="bbv-manual.implementation"></a>12.5.Implementation</h2></div></div></div>
    245 <p>
    246    Valgrind provides all of the information necessary to create
    247    BBV files.  In the current implementation, all instructions
    248    are instrumented.  This is slower (by approximately a factor
    249    of two) than a method that instruments at the basic block level, 
    250    but there are some complications (especially with rep prefix
    251    detection) that make that method more difficult.
    252 </p>
    253 <p>
    254    Valgrind actually provides instrumentation at a superblock level.
    255    A superblock has one entry point but unlike basic blocks can
    256    have multiple exit points.  Once a branch occurs into the middle
    257    of a block, it is split into a new basic block.  Because
    258    Valgrind cannot produce "true" basic blocks, the generated
    259    BBV vectors will be different than those generated by other tools.
    260    In practice this does not seem to affect the accuracy of the
    261    SimPoint results.  We do internally force the
    262    <code class="option">--vex-guest-chase-thresh=0</code>
    263    option to Valgrind which forces a more basic-block-like
    264    behavior.
    265 </p>
    266 <p>
    267    When a superblock is run for the first time, it is instrumented
    268    with our BBV routine.  A block info (bbInfo) structure is allocated
    269    which holds the various information and statistics for the block.
    270    A unique block ID is assigned to the block, and then the
    271    structure is placed into an ordered set.
    272    Then each native instruction in the block is instrumented to
    273    call an instruction counting routine with a pointer to the block
    274    info structure as an argument.
    275 </p>
    276 <p>
    277    At run-time, our instruction counting routines are called once
    278    per native instruction.  The relevant block info structure is accessed
    279    and the block count and total instruction count is updated.   
    280    If the total instruction count overflows the interval size 
    281    then we walk the ordered set, writing out the statistics for
    282    any block that was accessed in the interval, then resetting the
    283    block counters to zero.
    284 </p>
    285 <p>
    286    On the x86 and amd64 architectures the counting code has extra
    287    code to handle rep-prefixed string instructions.  This is because 
    288    actual hardware counts a rep-prefixed instruction 
    289    as one instruction, while a naive Valgrind implementation
    290    would count it as many (possibly hundreds, thousands or even millions)
    291    of instructions.  We handle rep-prefixed instructions specially,
    292    in order to make the results match those obtained with hardware performance
    293    counters.
    294 </p>
    295 <p>
    296    BBV also counts the fldcw instruction.  This instruction is used on 
    297    x86 machines in various ways; it is most commonly found when converting 
    298    floating point values into integers.
    299    On Pentium 4 systems the retired instruction performance
    300    counter counts this instruction as two instructions (all other 
    301    known processors only count it as one).
    302    This can affect results when using SimPoint on Pentium 4 systems.
    303    We provide the fldcw count so that users can evaluate whether it
    304    will impact their results enough to avoid using Pentium 4 machines
    305    for their experiments.  It would be possible to add an option to 
    306    this tool that mimics the double-counting so that the generated BBV
    307    files would be usable for experiments using hardware performance
    308    counters on Pentium 4 systems.
    309 </p>
    310 </div>
    311 <div class="sect1" title="12.6.Threaded Executable Support">
    312 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    313 <a name="bbv-manual.threadsupport"></a>12.6.Threaded Executable Support</h2></div></div></div>
    314 <p>
    315    BBV supports threaded programs.  When a program has multiple threads,
    316    an additional basic block vector file is created for each thread (each
    317    additional file is the specified filename with the thread number
    318    appended at the end).
    319 </p>
    320 <p>
    321    There is no official method of using SimPoint with
    322    threaded workloads.  The most common method is to run
    323    SimPoint on each thread's results independently, and use 
    324    some method of deterministic execution to try to match the
    325    original workload.  This should be possible with the current
    326    BBV.
    327 </p>
    328 </div>
    329 <div class="sect1" title="12.7.Validation">
    330 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    331 <a name="bbv-manual.validation"></a>12.7.Validation</h2></div></div></div>
    332 <p>
    333    BBV has been tested on x86, amd64, and ppc32 platforms.
    334    An earlier version of BBV was tested in detail using
    335    hardware performance counters, this work is described in a paper 
    336    from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation 
    337    to Generate Multi-Platform SimPoints: Methodology and Accuracy" by
    338    V.M. Weaver and S.A. McKee.
    339 </p>
    340 </div>
    341 <div class="sect1" title="12.8.Performance">
    342 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
    343 <a name="bbv-manual.performance"></a>12.8.Performance</h2></div></div></div>
    344 <p>
    345   Using this program slows down execution by roughly a factor of 40
    346   over native execution.  This varies depending on the machine
    347   used and the benchmark being run.
    348   On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D 
    349   processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2).
    350 </p>
    351 </div>
    352 </div>
    353 <div>
    354 <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer">
    355 <tr>
    356 <td rowspan="2" width="40%" align="left">
    357 <a accesskey="p" href="pc-manual.html">&lt;&lt;11.Ptrcheck: an experimental heap, stack and global array overrun detector</a></td>
    358 <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td>
    359 <td rowspan="2" width="40%" align="right"><a accesskey="n" href="lk-manual.html">13.Lackey: an example tool&gt;&gt;</a>
    360 </td>
    361 </tr>
    362 <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr>
    363 </table>
    364 </div>
    365 </body>
    366 </html>
    367