benchmark
=========
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit-tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark
     12 
     13 Example usage
     14 -------------
     15 Define a function that executes the code to be measured a
     16 specified number of times:
     17 
     18 ```c++
     19 static void BM_StringCreation(benchmark::State& state) {
     20   while (state.KeepRunning())
     21     std::string empty_string;
     22 }
     23 // Register the function as a benchmark
     24 BENCHMARK(BM_StringCreation);
     25 
     26 // Define another benchmark
     27 static void BM_StringCopy(benchmark::State& state) {
     28   std::string x = "hello";
     29   while (state.KeepRunning())
     30     std::string copy(x);
     31 }
     32 BENCHMARK(BM_StringCopy);
     33 
     34 BENCHMARK_MAIN();
     35 ```
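
To make the shape of the `KeepRunning()` loop more concrete, here is a toy stand-in written purely for illustration. `ToyState`, its fixed iteration budget, and `ToyStringCopy` are hypothetical names; this is not the library's implementation, which chooses the iteration count itself. The sketch only shows the idea of a loop condition that starts a timer on its first call and stops it once the budget is used up.

```c++
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical stand-in for benchmark::State, for illustration only.
class ToyState {
 public:
  explicit ToyState(std::int64_t budget)
      : remaining_(budget), iterations_(0), started_(false),
        finished_(false), elapsed_ns_(0) {
    assert(budget > 0);
  }

  // Returns true while iterations remain. The clock starts on the first
  // call and the elapsed time is recorded on the first call returning false.
  bool KeepRunning() {
    if (!started_) {
      started_ = true;
      start_ = Clock::now();
    }
    if (remaining_ > 0) {
      --remaining_;
      ++iterations_;
      return true;
    }
    if (!finished_) {
      finished_ = true;
      elapsed_ns_ = std::chrono::duration_cast<std::chrono::nanoseconds>(
                        Clock::now() - start_).count();
    }
    return false;
  }

  std::int64_t iterations() const { return iterations_; }
  std::int64_t elapsed_ns() const { return elapsed_ns_; }

 private:
  typedef std::chrono::steady_clock Clock;
  std::int64_t remaining_;
  std::int64_t iterations_;
  bool started_;
  bool finished_;
  std::int64_t elapsed_ns_;
  Clock::time_point start_;
};

// Same shape as BM_StringCopy, but run against the toy state.
void ToyStringCopy(ToyState& state) {
  std::string x = "hello";
  while (state.KeepRunning())
    std::string copy(x);
}
```

Calling `ToyStringCopy` with a `ToyState` constructed with a budget of 1000 runs the loop body 1000 times. Dividing `elapsed_ns()` by `iterations()` then gives a rough per-iteration time.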

Sometimes a family of microbenchmarks can be implemented with
just one routine that takes an extra argument to specify which
one of the family of benchmarks to run.  For example, the following
code defines a family of microbenchmarks for measuring the speed
of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range_x()];
  char* dst = new char[state.range_x()];
  memset(src, 'x', state.range_x());
  while (state.KeepRunning())
    memcpy(dst, src, state.range_x());
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range_x()));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the
following short-hand.  The following invocation will pick a few
appropriate arguments in the specified range and will generate a
microbenchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

You might have a microbenchmark that depends on two inputs.  For
example, the following code defines a family of microbenchmarks for
measuring the speed of set insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  while (state.KeepRunning()) {
    state.PauseTiming();
    std::set<int> data = ConstructRandomSet(state.range_x());
    state.ResumeTiming();
    for (int j = 0; j < state.range_y(); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->ArgPair(1<<10, 1)
    ->ArgPair(1<<10, 8)
    ->ArgPair(1<<10, 64)
    ->ArgPair(1<<10, 512)
    ->ArgPair(8<<10, 1)
    ->ArgPair(8<<10, 8)
    ->ArgPair(8<<10, 64)
    ->ArgPair(8<<10, 512);
```

The preceding code is quite repetitive, and can be replaced with
the following short-hand.  The following invocation will pick a few
appropriate arguments in the product of the two specified ranges
and will generate a microbenchmark for each such pair.

```c++
BENCHMARK(BM_SetInsert)->RangePair(1<<10, 8<<10, 1, 512);
```

For more complex patterns of inputs, passing a custom function
to `Apply` allows programmatic specification of an
arbitrary set of arguments to run the microbenchmark on.
The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->ArgPair(i, j);
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```
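
To see which pairs the `CustomArguments` loops produce without running any benchmark, the same loops can be pointed at a small recorder. `ArgRecorder`, `RecordCustomArguments`, and `EnumerateCustomArguments` are hypothetical names introduced only for this sketch; `ArgRecorder` merely imitates the `ArgPair` call and is not a library type.

```c++
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for benchmark::internal::Benchmark that only
// records the argument pairs it is given.
struct ArgRecorder {
  std::vector<std::pair<int, int> > pairs;

  ArgRecorder* ArgPair(int x, int y) {
    assert(x >= 0 && y >= 0);
    pairs.push_back(std::make_pair(x, y));
    return this;
  }
};

// Same loops as CustomArguments, aimed at the recorder: a dense range
// 0..10 on the first parameter, and a sparse range 32, 256, ...,
// 1024*1024 (multiplying by 8) on the second.
static void RecordCustomArguments(ArgRecorder* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024 * 1024; j *= 8)
      b->ArgPair(i, j);
}

// Convenience wrapper returning every recorded pair.
std::vector<std::pair<int, int> > EnumerateCustomArguments() {
  ArgRecorder recorder;
  RecordCustomArguments(&recorder);
  return recorder.pairs;
}
```

The dense parameter contributes 11 values and the sparse one 6, so the loops request 66 argument pairs in total.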

Templated microbenchmarks work the same way. The following example
produces and then consumes `size` messages `iters` times (here `size` is
`state.range_x()` and `iters` is the number of benchmark iterations), and
measures throughput in the absence of multiprogramming:

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  while (state.KeepRunning()) {
    for (int i = state.range_x(); i--; )
      q.push(v);
    for (int e = state.range_x(); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range_x());
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#if __cplusplus >= 201103L // C++11 and greater.
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have called
`KeepRunning`, and all will have finished before `KeepRunning` returns false. As
such, any global setup or teardown you want to do can be
wrapped in a check against the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  while (state.KeepRunning()) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.
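
The difference between the two kinds of measurement can be illustrated outside the library. The sketch below times a sleep with `std::chrono::steady_clock` (wall-clock time) and with `std::clock` (CPU time on POSIX systems; some platforms report wall-clock time from `std::clock` instead). `Elapsed` and `MeasureSleep` are hypothetical names, and this is not how the library takes its measurements.

```c++
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Wall-clock and CPU time, in milliseconds, for one measured region.
struct Elapsed {
  double wall_ms;
  double cpu_ms;
};

// Times a sleep of `sleep_ms` milliseconds with both kinds of clock.
Elapsed MeasureSleep(int sleep_ms) {
  assert(sleep_ms >= 0);
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  // Sleeping consumes wall-clock time but almost no CPU time.
  std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Elapsed e;
  e.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  e.cpu_ms =
      1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return e;
}
```

On a POSIX system, `MeasureSleep(50)` would be expected to report a wall-clock time of roughly 50 ms or more and a CPU time close to zero. Code that spends time blocked or waiting on other threads can look much faster under CPU time than it does to a caller, which is why wall-clock time is the more meaningful figure for latency comparisons.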

To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` function can be used:

```c++
static void BM_test(benchmark::State& state) {
  while (state.KeepRunning()) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

Benchmark Fixtures
------------------
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

Output Formats
--------------
The library supports multiple output formats. Use the
`--benchmark_format=<tabular|json|csv>` flag to set the format type. `tabular` is
the default format.

The Tabular format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```
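
Because the CSV goes to stdout, it is easy to post-process. The sketch below reads one data row of the shape shown above. `SplitCsvLine`, `CsvRow`, and `ParseCsvRow` are hypothetical helpers rather than library functions, and the sketch assumes that benchmark names contain no commas or embedded quotes.

```c++
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Splits a line on commas, keeping empty fields (including a trailing one).
// ASSUMES no field contains a comma or an escaped quote.
std::vector<std::string> SplitCsvLine(const std::string& line) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type comma = line.find(',', start);
    if (comma == std::string::npos) {
      fields.push_back(line.substr(start));
      break;
    }
    fields.push_back(line.substr(start, comma - start));
    start = comma + 1;
  }
  return fields;
}

// The first four columns of a data row.
struct CsvRow {
  std::string name;
  long iterations;
  double real_time;
  double cpu_time;
};

// Parses one data row (not the header line).
CsvRow ParseCsvRow(const std::string& line) {
  const std::vector<std::string> f = SplitCsvLine(line);
  assert(f.size() >= 4);
  CsvRow row;
  // Strip the surrounding double quotes from the name, if present.
  row.name = f[0];
  if (row.name.size() >= 2 && row.name[0] == '"' &&
      row.name[row.name.size() - 1] == '"') {
    row.name = row.name.substr(1, row.name.size() - 2);
  }
  row.iterations = std::strtol(f[1].c_str(), nullptr, 10);
  row.real_time = std::strtod(f[2].c_str(), nullptr);
  row.cpu_time = std::strtod(f[3].c_str(), nullptr);
  return row;
}
```

Each sample row ends with a comma, so the empty `label` column still counts as a field: `SplitCsvLine` returns seven fields for every data row shown above.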

Debug vs Release
----------------
By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

Linking against the library
---------------------------
When using gcc, it is necessary to link against pthread to avoid runtime exceptions. This is due to how gcc implements `std::thread`. See [issue #67](https://github.com/google/benchmark/issues/67) for more details.