1 <html> 2 <head> 3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 4 <title>12.BBV: an experimental basic block vector generation tool</title> 5 <link rel="stylesheet" type="text/css" href="vg_basic.css"> 6 <meta name="generator" content="DocBook XSL Stylesheets V1.79.1"> 7 <link rel="home" href="index.html" title="Valgrind Documentation"> 8 <link rel="up" href="manual.html" title="Valgrind User Manual"> 9 <link rel="prev" href="sg-manual.html" title="11.SGCheck: an experimental stack and global array overrun detector"> 10 <link rel="next" href="lk-manual.html" title="13.Lackey: an example tool"> 11 </head> 12 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> 13 <div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr> 14 <td width="22px" align="center" valign="middle"><a accesskey="p" href="sg-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td> 15 <td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td> 16 <td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td> 17 <th align="center" valign="middle">Valgrind User Manual</th> 18 <td width="22px" align="center" valign="middle"><a accesskey="n" href="lk-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td> 19 </tr></table></div> 20 <div class="chapter"> 21 <div class="titlepage"><div><div><h1 class="title"> 22 <a name="bbv-manual"></a>12.BBV: an experimental basic block vector generation tool</h1></div></div></div> 23 <div class="toc"> 24 <p><b>Table of Contents</b></p> 25 <dl class="toc"> 26 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.overview">12.1. Overview</a></span></dt> 27 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.quickstart">12.2. Using Basic Block Vectors to create SimPoints</a></span></dt> 28 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.usage">12.3. BBV Command-line Options</a></span></dt> 29 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.fileformat">12.4. Basic Block Vector File Format</a></span></dt> 30 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.implementation">12.5. Implementation</a></span></dt> 31 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.threadsupport">12.6. Threaded Executable Support</a></span></dt> 32 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.validation">12.7. Validation</a></span></dt> 33 <dt><span class="sect1"><a href="bbv-manual.html#bbv-manual.performance">12.8. Performance</a></span></dt> 34 </dl> 35 </div> 36 <p>To use this tool, you must specify 37 <code class="option">--tool=exp-bbv</code> on the Valgrind 38 command line.</p> 39 <div class="sect1"> 40 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 41 <a name="bbv-manual.overview"></a>12.1.Overview</h2></div></div></div> 42 <p> 43 A basic block is a linear section of code with one entry point and one exit 44 point. A <span class="emphasis"><em>basic block vector</em></span> (BBV) is a list of all 45 basic blocks entered during program execution, and a count of how many 46 times each basic block was run. 47 </p> 48 <p> 49 BBV is a tool that generates basic block vectors for use with the 50 <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint</a> 51 analysis tool. 52 The SimPoint methodology enables speeding up architectural 53 simulations by only running a small portion of a program 54 and then extrapolating total behavior from this 55 small portion. Most programs exhibit phase-based behavior, which 56 means that at various times during execution a program will encounter 57 intervals of time where the code behaves similarly to a previous 58 interval. If you can detect these intervals and group them together, 59 an approximation of the total program behavior can be obtained 60 by only simulating a bare minimum number of intervals, and then scaling 61 the results. 62 </p> 63 <p> 64 In computer architecture research, running a 65 benchmark on a cycle-accurate simulator can cause slowdowns on the order 66 of 1000 times, making it take days, weeks, or even longer to run full 67 benchmarks. By utilizing SimPoint this can be reduced significantly, 68 usually by 90-95%, while still retaining reasonable accuracy. 69 </p> 70 <p> 71 A more complete introduction to how SimPoint works can be 72 found in the paper "Automatically Characterizing Large Scale 73 Program Behavior" by T. Sherwood, E. Perelman, G. Hamerly, and 74 B. Calder. 75 </p> 76 </div> 77 <div class="sect1"> 78 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 79 <a name="bbv-manual.quickstart"></a>12.2.Using Basic Block Vectors to create SimPoints</h2></div></div></div> 80 <p> 81 To quickly create a basic block vector file, you will call Valgrind 82 like this: 83 84 </p> 85 <pre class="programlisting">valgrind --tool=exp-bbv /bin/ls</pre> 86 <p> 87 88 In this case we are running on <code class="filename">/bin/ls</code>, 89 but this can be any program. By default a file called 90 <code class="computeroutput">bb.out.PID</code> will be created, 91 where PID is replaced by the process ID of the running process. 92 This file contains the basic block vector. For long-running programs 93 this file can be quite large, so it might be wise to compress 94 it with gzip or some other compression program. 95 </p> 96 <p> 97 To create actual SimPoint results, you will need the SimPoint utility, 98 available from the 99 <a class="ulink" href="http://www.cse.ucsd.edu/~calder/simpoint/" target="_top">SimPoint webpage</a>. 100 Assuming you have downloaded SimPoint 3.2 and compiled it, 101 create SimPoint results with a command like the following: 102 103 </p> 104 <pre class="programlisting"> 105 ./SimPoint.3.2/bin/simpoint -inputVectorsGzipped \ 106 -loadFVFile bb.out.1234.gz \ 107 -k 5 -saveSimpoints results.simpts \ 108 -saveSimpointWeights results.weights</pre> 109 <p> 110 111 where bb.out.1234.gz is your compressed basic block vector file 112 generated by BBV. 113 </p> 114 <p> 115 The SimPoint utility does random linear projection using 15-dimensions, 116 then does k-mean clustering to calculate which intervals are 117 of interest. In this example we specify 5 intervals with the 118 -k 5 option. 119 </p> 120 <p> 121 The outputs from the SimPoint run are the 122 <code class="computeroutput">results.simpts</code> 123 and <code class="computeroutput">results.weights</code> files. 124 The first holds the 5 most relevant intervals of the program. 125 The seconds holds the weight to scale each interval by when 126 extrapolating full-program behavior. The intervals and the weights 127 can be used in conjunction with a simulator that supports 128 fast-forwarding; you fast-forward to the interval of interest, 129 collect stats for the desired interval length, then use 130 statistics gathered in conjunction with the weights to 131 calculate your results. 132 </p> 133 </div> 134 <div class="sect1"> 135 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 136 <a name="bbv-manual.usage"></a>12.3.BBV Command-line Options</h2></div></div></div> 137 <p> BBV-specific command-line options are:</p> 138 <div class="variablelist"> 139 <a name="bbv.opts.list"></a><dl class="variablelist"> 140 <dt> 141 <a name="opt.bb-out-file"></a><span class="term"> 142 <code class="option">--bb-out-file=<name> [default: bb.out.%p] </code> 143 </span> 144 </dt> 145 <dd><p> 146 This option selects the name of the basic block vector file. The 147 <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be 148 used to embed the process ID and/or the contents of an environment 149 variable in the name, as is the case for the core option 150 <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>. 151 </p></dd> 152 <dt> 153 <a name="opt.pc-out-file"></a><span class="term"> 154 <code class="option">--pc-out-file=<name> [default: pc.out.%p] </code> 155 </span> 156 </dt> 157 <dd><p> 158 This option selects the name of the PC file. 159 This file holds program counter addresses 160 and function name info for the various basic blocks. 161 This can be used in conjunction 162 with the basic block vector file to fast-forward via function names 163 instead of just instruction counts. The 164 <code class="option">%p</code> and <code class="option">%q</code> format specifiers can be 165 used to embed the process ID and/or the contents of an environment 166 variable in the name, as is the case for the core option 167 <code class="option"><a class="xref" href="manual-core.html#opt.log-file">--log-file</a></code>. 168 </p></dd> 169 <dt> 170 <a name="opt.interval-size"></a><span class="term"> 171 <code class="option">--interval-size=<number> [default: 100000000] </code> 172 </span> 173 </dt> 174 <dd><p> 175 This option selects the size of the interval to use. 176 The default is 100 177 million instructions, which is a commonly used value. 178 Other sizes can be used; smaller intervals can help programs 179 with finer-grained phases. However smaller interval size 180 can lead to accuracy issues due to warm-up effects 181 (When fast-forwarding the various architectural features 182 will be un-initialized, and it will take some number 183 of instructions before they "warm up" to the state a 184 full simulation would be at without the fast-forwarding. 185 Large interval sizes tend to mitigate this.) 186 </p></dd> 187 <dt> 188 <a name="opt.instr-count-only"></a><span class="term"> 189 <code class="option">--instr-count-only [default: no] </code> 190 </span> 191 </dt> 192 <dd><p> 193 This option tells the tool to only display instruction count 194 totals, and to not generate the actual basic block vector file. 195 This is useful for debugging, and for gathering instruction count 196 info without generating the large basic block vector files. 197 </p></dd> 198 </dl> 199 </div> 200 </div> 201 <div class="sect1"> 202 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 203 <a name="bbv-manual.fileformat"></a>12.4.Basic Block Vector File Format</h2></div></div></div> 204 <p> 205 The Basic Block Vector is dumped at fixed intervals. This 206 is commonly done every 100 million instructions; the 207 <code class="option">--interval-size</code> option can be 208 used to change this. 209 </p> 210 <p> 211 The output file looks like this: 212 </p> 213 <pre class="programlisting"> 214 T:45:1024 :189:99343 215 T:11:78573 :15:1353 :56:1 216 T:18:45 :12:135353 :56:78 314:4324263</pre> 217 <p> 218 Each new interval starts with a T. This is followed on the same line 219 by a series of basic block and frequency pairs, one for each 220 basic block that was entered during the interval. The format for 221 each block/frequency pair is a colon, followed by a number that 222 uniquely identifies the basic block, another colon, and then 223 the frequency (which is the number of times the block was entered, 224 multiplied by the number of instructions in the block). The 225 pairs are separated from each other by a space. 226 </p> 227 <p> 228 The frequency count is multiplied by the number of instructions that are 229 in the basic block, in order to weigh the count so that instructions in 230 small basic blocks aren't counted as more important than instructions 231 in large basic blocks. 232 </p> 233 <p> 234 The SimPoint program only processes lines that start with a "T". All 235 other lines are ignored. Traditionally comments are indicated by 236 starting a line with a "#" character. Some other BBV generation tools, 237 such as PinPoints, generate lines beginning with letters other than "T" 238 to indicate more information about the program being run. We do 239 not generate these, as the SimPoint utility ignores them. 240 </p> 241 </div> 242 <div class="sect1"> 243 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 244 <a name="bbv-manual.implementation"></a>12.5.Implementation</h2></div></div></div> 245 <p> 246 Valgrind provides all of the information necessary to create 247 BBV files. In the current implementation, all instructions 248 are instrumented. This is slower (by approximately a factor 249 of two) than a method that instruments at the basic block level, 250 but there are some complications (especially with rep prefix 251 detection) that make that method more difficult. 252 </p> 253 <p> 254 Valgrind actually provides instrumentation at a superblock level. 255 A superblock has one entry point but unlike basic blocks can 256 have multiple exit points. Once a branch occurs into the middle 257 of a block, it is split into a new basic block. Because 258 Valgrind cannot produce "true" basic blocks, the generated 259 BBV vectors will be different than those generated by other tools. 260 In practice this does not seem to affect the accuracy of the 261 SimPoint results. We do internally force the 262 <code class="option">--vex-guest-chase-thresh=0</code> 263 option to Valgrind which forces a more basic-block-like 264 behavior. 265 </p> 266 <p> 267 When a superblock is run for the first time, it is instrumented 268 with our BBV routine. A block info (bbInfo) structure is allocated 269 which holds the various information and statistics for the block. 270 A unique block ID is assigned to the block, and then the 271 structure is placed into an ordered set. 272 Then each native instruction in the block is instrumented to 273 call an instruction counting routine with a pointer to the block 274 info structure as an argument. 275 </p> 276 <p> 277 At run-time, our instruction counting routines are called once 278 per native instruction. The relevant block info structure is accessed 279 and the block count and total instruction count is updated. 280 If the total instruction count overflows the interval size 281 then we walk the ordered set, writing out the statistics for 282 any block that was accessed in the interval, then resetting the 283 block counters to zero. 284 </p> 285 <p> 286 On the x86 and amd64 architectures the counting code has extra 287 code to handle rep-prefixed string instructions. This is because 288 actual hardware counts a rep-prefixed instruction 289 as one instruction, while a naive Valgrind implementation 290 would count it as many (possibly hundreds, thousands or even millions) 291 of instructions. We handle rep-prefixed instructions specially, 292 in order to make the results match those obtained with hardware performance 293 counters. 294 </p> 295 <p> 296 BBV also counts the fldcw instruction. This instruction is used on 297 x86 machines in various ways; it is most commonly found when converting 298 floating point values into integers. 299 On Pentium 4 systems the retired instruction performance 300 counter counts this instruction as two instructions (all other 301 known processors only count it as one). 302 This can affect results when using SimPoint on Pentium 4 systems. 303 We provide the fldcw count so that users can evaluate whether it 304 will impact their results enough to avoid using Pentium 4 machines 305 for their experiments. It would be possible to add an option to 306 this tool that mimics the double-counting so that the generated BBV 307 files would be usable for experiments using hardware performance 308 counters on Pentium 4 systems. 309 </p> 310 </div> 311 <div class="sect1"> 312 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 313 <a name="bbv-manual.threadsupport"></a>12.6.Threaded Executable Support</h2></div></div></div> 314 <p> 315 BBV supports threaded programs. When a program has multiple threads, 316 an additional basic block vector file is created for each thread (each 317 additional file is the specified filename with the thread number 318 appended at the end). 319 </p> 320 <p> 321 There is no official method of using SimPoint with 322 threaded workloads. The most common method is to run 323 SimPoint on each thread's results independently, and use 324 some method of deterministic execution to try to match the 325 original workload. This should be possible with the current 326 BBV. 327 </p> 328 </div> 329 <div class="sect1"> 330 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 331 <a name="bbv-manual.validation"></a>12.7.Validation</h2></div></div></div> 332 <p> 333 BBV has been tested on x86, amd64, and ppc32 platforms. 334 An earlier version of BBV was tested in detail using 335 hardware performance counters, this work is described in a paper 336 from the HiPEAC'08 conference, "Using Dynamic Binary Instrumentation 337 to Generate Multi-Platform SimPoints: Methodology and Accuracy" by 338 V.M. Weaver and S.A. McKee. 339 </p> 340 </div> 341 <div class="sect1"> 342 <div class="titlepage"><div><div><h2 class="title" style="clear: both"> 343 <a name="bbv-manual.performance"></a>12.8.Performance</h2></div></div></div> 344 <p> 345 Using this program slows down execution by roughly a factor of 40 346 over native execution. This varies depending on the machine 347 used and the benchmark being run. 348 On the SPEC CPU 2000 benchmarks running on a 3.4GHz Pentium D 349 processor, the slowdown ranges from 24x (mcf) to 340x (vortex.2). 350 </p> 351 </div> 352 </div> 353 <div> 354 <br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"> 355 <tr> 356 <td rowspan="2" width="40%" align="left"> 357 <a accesskey="p" href="sg-manual.html"><<11.SGCheck: an experimental stack and global array overrun detector</a></td> 358 <td width="20%" align="center"><a accesskey="u" href="manual.html">Up</a></td> 359 <td rowspan="2" width="40%" align="right"><a accesskey="n" href="lk-manual.html">13.Lackey: an example tool>></a> 360 </td> 361 </tr> 362 <tr><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td></tr> 363 </table> 364 </div> 365 </body> 366 </html> 367