# benchmark
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)
[![slackin](https://slackin-iqtfqnpzxd.now.sh/badge.svg)](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit tests.

[Discussion group](https://groups.google.com/d/forum/benchmark-discuss)

IRC channel: [freenode](https://freenode.net) #googlebenchmark

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided two ways:

* Check out the Google Test sources into `benchmark/googletest` as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.
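
For example, a configuration that skips building the tests (and therefore does not
require Google Test) might look like the following; the generator choice is
illustrative:

```bash
$ cmake -G "Unix Makefiles" -DBENCHMARK_ENABLE_GTEST_TESTS=OFF ../benchmark
$ make
```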


## Installation Guide

For Ubuntu and Debian based systems:

First, make sure you have git and cmake installed (if not, please install them):

```
sudo apt-get install git cmake
```

Now, let's clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
# If you want to build tests and don't use BENCHMARK_DOWNLOAD_DEPENDENCIES, then
# git clone https://github.com/google/googletest.git
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

If you need to install the library globally:

```
sudo make install
```

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library;
its API can be considered largely stable, with source-breaking changes being
made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Further knowledge

It may help to read the [Google Test documentation](https://github.com/google/googletest/blob/master/googletest/docs/primer.md)
as some of the structural aspects of the APIs are similar.

## Example usage
### Basic usage
Define a function that executes the code to be measured, register it as a
benchmark function using the `BENCHMARK` macro, and ensure an appropriate `main`
function is available:

```c++
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to tell your linker to link against the benchmark library, e.g.
through the `-lbenchmark` compilation flag. Alternatively, you may leave out the
`BENCHMARK_MAIN();` at the end of the source file and link against
`-lbenchmark_main` to get the same default behavior.

The benchmark library will measure and report the timing for code within the
`for(...)` loop.
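
As a rough sketch, compiling and linking the example above with a GCC-style
toolchain might look like this (the source file name and any include or library
paths are placeholders for your own setup):

```bash
$ g++ mybenchmark.cc -std=c++11 -pthread -lbenchmark -o mybenchmark
```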

#### Platform-specific libraries
When the library is built using GCC it is necessary to link with the pthread
library due to how GCC implements `std::thread`. Failing to link to pthread will
lead to runtime exceptions (unless you're using libc++), not linker errors. See
[issue #67](https://github.com/google/benchmark/issues/67) for more details. You
can link to pthread by adding `-pthread` to your linker command. Note, you can
also use `-lpthread`, but there are potential issues with the ordering of
command-line parameters if you use that.

If you're running benchmarks on Windows, the shlwapi library (`-lshlwapi`) is
also required.

If you're running benchmarks on Solaris, you'll want the kstat library linked in
too (`-lkstat`).
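
Those extra libraries are added to the link line the same way; a minimal sketch,
assuming a GCC-style driver and a source file named `mybenchmark.cc`:

```bash
# Windows (e.g. a MinGW toolchain)
$ g++ mybenchmark.cc -std=c++11 -lbenchmark -lshlwapi -o mybenchmark
# Solaris
$ g++ mybenchmark.cc -std=c++11 -pthread -lbenchmark -lkstat -o mybenchmark
```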

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default the arguments in the range are generated in multiples of eight and
the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
range multiplier is changed to multiples of two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```

Now the generated arguments are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity might be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity might also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code will specify asymptotic complexity with a lambda function
that can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int64_t n)->double{return n; });
```

### Templated benchmarks
Templated benchmarks work the same way. This example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput
in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```
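
For instance, the queue benchmark above could equally be registered with the
single-argument form, which behaves the same as the earlier `BENCHMARK_TEMPLATE`
call and is also available before C++11:

```c++
BENCHMARK_TEMPLATE1(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```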

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The range-based for loop is faster than using `KeepRunning` because
`KeepRunning` requires a memory load and store of the iteration count
every iteration, whereas the range-based variant is able to keep the iteration
count in a register.

For example, an empty inner loop using the range-based for method looks like this:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the range-based variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```

Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.
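
Thread counts do not have to be listed one at a time; `ThreadRange(min, max)`
registers the benchmark for a range of thread counts, and it can be combined
with `UseRealTime()` (the 1 to 8 range here is illustrative):

```c++
BENCHMARK(BM_MultiThreaded)->ThreadRange(1, 8)->UseRealTime();
```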

## Controlling timers
Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes it is necessary to do some work inside
that loop, every iteration, without counting that time toward the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

## Manual timing
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end   = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimisation
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i=0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /*  Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the below example
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds it may be hard to visually compare the
measured times, since the output data is given in nanoseconds by default. In
order to set the time unit manually, you can specify it as follows:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

### Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per-benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.

Additionally the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median and standard deviation, plus complexity
measurements if they were requested) of the runs are reported, to both
reporters: the standard output (console) and the file.
When only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.
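
A minimal sketch of a per-benchmark setup using the calls named above (the
repetition count is arbitrary):

```c++
BENCHMARK(BM_StringCreation)->Repetitions(10)->ReportAggregatesOnly(true);
```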

## User-defined statistics for repeated benchmarks
While having mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know the largest
observation, e.g. because you have some real-time constraints. This is easy.
The following code will specify a custom statistic to be calculated, defined
by a lambda function.

```c++
void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {
public:
  void SetUp(const ::benchmark::State& state) {
  }

  void TearDown(const ::benchmark::State& state) {
  }
};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
   for (auto _ : st) {
     ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants;
and a flag specifying the 'unit', i.e. whether 1k means 1000 (the default,
`benchmark::Counter::OneK::kIs1000`) or 1024
(`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] = Counter(state.range(0), benchmark::Counter::kIsIterationInvariantRate, benchmark::Counter::OneK::kIs1024);
```

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of
`KeepRunning()` are skipped. For the range-based version of the benchmark loop,
users must explicitly exit the loop, otherwise all iterations will be performed.
Users may explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
which match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```

## Runtime and reporting considerations
When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
ultimate result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is thus reported.

In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the number of
iterations is at least one, not more than 1e9, until CPU time is greater than
the minimum time, or the wall-clock time is 5x the minimum time. The minimum
time is set per benchmark by calling `MinTime` on the registered benchmark
object.
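
For example, a benchmark registered earlier could be given a larger time budget
per run (the 2-second value is illustrative):

```c++
BENCHMARK(BM_memcpy)->Arg(8<<10)->MinTime(2.0);
```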

Average timings are then reported over the iterations run. If multiple
repetitions are requested using the `--benchmark_repetitions` command-line
option, or at registration time, the benchmark function will be run several
times and statistical results across these repetitions will also be reported.

As well as the per-benchmark entries, a preamble in the report will include
information about the machine on which the benchmarks are run.

### Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.
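
For example (the binary name is a placeholder):

```bash
$ ./mybench --benchmark_format=json
```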

The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

### Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.
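
For example, to keep the console output while also writing JSON results to a
file (the binary and file names are placeholders):

```bash
$ ./mybench --benchmark_out=results.json --benchmark_out_format=json
```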

## Result comparison

It is possible to compare the benchmarking results. See the
[Additional Tooling Documentation](docs/tools.md).

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the
output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc, you might need to set the `GCC_AR` and `GCC_RANLIB` CMake
cache variables if autodetection fails.

If you are using clang, you may need to set the `LLVMAR_EXECUTABLE`,
`LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` CMake cache variables.
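
A sketch of such configurations; the tool paths are placeholders for wherever
your toolchain installs them:

```
# GCC
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true \
      -DGCC_AR=/usr/bin/gcc-ar -DGCC_RANLIB=/usr/bin/gcc-ranlib
# Clang
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true \
      -DLLVMAR_EXECUTABLE=/usr/bin/llvm-ar -DLLVMNM_EXECUTABLE=/usr/bin/llvm-nm \
      -DLLVMRANLIB_EXECUTABLE=/usr/bin/llvm-ranlib
```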

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this error:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable the CPU frequency scaling while running the benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```
   1006