# benchmark
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)
[![slackin](https://slackin-iqtfqnpzxd.now.sh/badge.svg)](https://slackin-iqtfqnpzxd.now.sh/)

A library to support the benchmarking of functions, similar to unit-tests.

[Discussion group](https://groups.google.com/d/forum/benchmark-discuss)

IRC channel: [freenode](https://freenode.net) #googlebenchmark

[Additional Tooling Documentation](docs/tools.md)

[Assembly Testing Documentation](docs/AssemblyTests.md)


## Building

The basic steps for configuring and building the library look like this:

```bash
$ git clone https://github.com/google/benchmark.git
# Benchmark requires Google Test as a dependency. Add the source tree as a subdirectory.
$ git clone https://github.com/google/googletest.git benchmark/googletest
$ mkdir build && cd build
$ cmake -G <generator> [options] ../benchmark
# Assuming a makefile generator was used
$ make
```

Note that Google Benchmark requires Google Test to build and run the tests. This
dependency can be provided in two ways:

* Check out the Google Test sources into `benchmark/googletest` as above.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.


## Installation Guide

For Ubuntu and Debian based systems:

First, make sure you have git and cmake installed. If not, install them:

```
sudo apt-get install git cmake
```

Now clone the repository and build it:

```
git clone https://github.com/google/benchmark.git
cd benchmark
# If you want to build tests and don't use BENCHMARK_DOWNLOAD_DEPENDENCIES, then
# git clone https://github.com/google/googletest.git
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
```

If you need to install the library globally:

```
sudo make install
```

## Stable and Experimental Library Versions

The main branch contains the latest stable version of the benchmarking library,
the API of which can be considered largely stable, with source-breaking changes
being made only upon the release of a new major version.

Newer, experimental features are implemented and tested on the
[`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
to use, test, and provide feedback on the new features are encouraged to try
this branch. However, this branch provides no stability guarantees and reserves
the right to change and break the API at any time.

## Further knowledge

It may help to read the [Google Test documentation](https://github.com/google/googletest/blob/master/googletest/docs/primer.md)
as some of the structural aspects of the APIs are similar.

## Example usage
### Basic usage
Define a function that executes the code to be measured, register it as a
benchmark function using the `BENCHMARK` macro, and ensure an appropriate `main`
function is available:

```c++
#include <benchmark/benchmark.h>

#include <string>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Don't forget to tell your linker to add the benchmark library, e.g. through the
`-lbenchmark` flag. Alternatively, you may leave out `BENCHMARK_MAIN();` at the
end of the source file and link against `-lbenchmark_main` to get the same
default behavior.

The benchmark library will measure and report the timing for code within the
`for(...)` loop.

#### Platform-specific libraries
When the library is built using GCC it is necessary to link with the pthread
library due to how GCC implements `std::thread`. Failing to link to pthread will
lead to runtime exceptions (unless you're using libc++), not linker errors. See
[issue #67](https://github.com/google/benchmark/issues/67) for more details. You
can link to pthread by adding `-pthread` to your linker command. Note that you
can also use `-lpthread`, but there are potential issues with the ordering of
command line parameters if you use that.

If you're running benchmarks on Windows, the shlwapi library (`-lshlwapi`) is
also required.

If you're running benchmarks on Solaris, you'll want the kstat library linked in
too (`-lkstat`).

### Passing arguments
Sometimes a family of benchmarks can be implemented with just one routine that
takes an extra argument to specify which one of the family of benchmarks to
run. For example, the following code defines a family of benchmarks for
measuring the speed of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range(0)];
  char* dst = new char[state.range(0)];
  memset(src, 'x', state.range(0));
  for (auto _ : state)
    memcpy(dst, src, state.range(0));
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range(0)));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following invocation will pick a few appropriate arguments in
the specified range and will generate a benchmark for each such argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

By default each generated argument is eight times the previous one, with the
upper bound of the range always included, so the command above selects
[ 8, 64, 512, 4k, 8k ]. In the following code the range multiplier is changed
to two.

```c++
BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
```
The generated arguments are now [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].
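
Both argument lists above are consistent with a simple rule: take the lower bound, then every power of the multiplier that lies strictly between the two bounds, then the upper bound. The sketch below reproduces that rule in standalone C++ so the expansion can be checked without running a benchmark. `ExpandRange` is a hypothetical helper, not part of the library, and the library's own implementation may differ in edge cases.

```c++
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the benchmark library): expands [lo, hi]
// in the way the argument lists quoted above suggest.
// Assumes 0 < lo <= hi, multiplier >= 2, and values small enough that the
// running power does not overflow.
std::vector<std::int64_t> ExpandRange(std::int64_t lo, std::int64_t hi,
                                      std::int64_t multiplier) {
  std::vector<std::int64_t> args;
  args.push_back(lo);  // The lower bound is always included.
  // Add every power of the multiplier strictly between the two bounds.
  for (std::int64_t power = 1; power < hi; power *= multiplier) {
    if (power > lo) args.push_back(power);
  }
  if (hi != lo) args.push_back(hi);  // The upper bound is always included.
  return args;
}
```

Under these assumptions, `ExpandRange(8, 8<<10, 8)` yields `{8, 64, 512, 4096, 8192}`, and `ExpandRange(8, 8<<10, 2)` yields the eleven powers of two from 8 to 8192.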

You might have a benchmark that depends on two or more inputs. For example, the
following code defines a family of benchmarks for measuring the speed of set
insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming();
    data = ConstructRandomSet(state.range(0));
    state.ResumeTiming();
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->Args({1<<10, 128})
    ->Args({2<<10, 128})
    ->Args({4<<10, 128})
    ->Args({8<<10, 128})
    ->Args({1<<10, 512})
    ->Args({2<<10, 512})
    ->Args({4<<10, 512})
    ->Args({8<<10, 512});
```

The preceding code is quite repetitive, and can be replaced with the following
short-hand. The following macro will pick a few appropriate arguments in the
product of the two specified ranges and will generate a benchmark for each such
pair.

```c++
BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

For more complex patterns of inputs, passing a custom function to `Apply` allows
programmatic specification of an arbitrary set of arguments on which to run the
benchmark. The following example enumerates a dense range on one parameter,
and a sparse range on the second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->Args({i, j});
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

### Calculate asymptotic complexity (Big O)
Asymptotic complexity can be calculated for a family of benchmarks. The
following code will calculate the coefficient for the high-order term in the
running time and the normalized root-mean-square error of string comparison.

```c++
static void BM_StringCompare(benchmark::State& state) {
  std::string s1(state.range(0), '-');
  std::string s2(state.range(0), '-');
  for (auto _ : state) {
    benchmark::DoNotOptimize(s1.compare(s2));
  }
  state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
```

As shown in the following invocation, asymptotic complexity can also be
calculated automatically.

```c++
BENCHMARK(BM_StringCompare)
    ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
```

The following code specifies asymptotic complexity with a lambda function,
which can be used to customize the high-order term calculation.

```c++
BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    ->Range(1<<10, 1<<18)->Complexity([](int64_t n) -> double { return n; });
```
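
To make the two reported quantities concrete, the sketch below fits `time = coef * f(n)` by least squares and computes a root-mean-square error normalized by the mean measured time. It illustrates the idea under those assumptions and is not the library's implementation; `ComplexityFit` and `FitComplexity` are hypothetical names.

```c++
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: coefficient of the high-order term and the
// RMS error normalized by the mean of the measured times.
struct ComplexityFit {
  double coef;
  double rms;
};

// Least-squares fit of time[i] ~= coef * f(n[i]).
// Assumes n and time have the same non-zero length, that f(n) is not zero
// for every n, and that the mean measured time is not zero.
ComplexityFit FitComplexity(const std::vector<double>& n,
                            const std::vector<double>& time,
                            const std::function<double(double)>& f) {
  double sum_time = 0.0, sum_time_f = 0.0, sum_f_sq = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double fn = f(n[i]);
    sum_time += time[i];
    sum_time_f += time[i] * fn;
    sum_f_sq += fn * fn;
  }
  // Minimizing sum((time - coef * f)^2) over coef gives this closed form.
  const double coef = sum_time_f / sum_f_sq;

  double sq_err = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double diff = time[i] - coef * f(n[i]);
    sq_err += diff * diff;
  }
  const double count = static_cast<double>(n.size());
  const double mean_time = sum_time / count;
  return {coef, std::sqrt(sq_err / count) / mean_time};
}
```

For measurements that are exactly `2 * n`, fitting with `f(n) = n` returns a coefficient of 2 and an error of 0. Noisier data, or a poorly chosen `f`, shows up as a larger error.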

### Templated benchmarks
Templated benchmarks work the same way. This example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput
in the absence of multiprogramming.

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#ifdef BENCHMARK_HAS_CXX11
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```

### A Faster KeepRunning loop

In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:

```c++
static void BM_Fast(benchmark::State &state) {
  for (auto _ : state) {
    FastOperation();
  }
}
BENCHMARK(BM_Fast);
```

The range-based for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count on every iteration,
whereas the range-based variant is able to keep the iteration count in a
register.

For example, an empty inner loop using the range-based for method looks like:

```asm
# Loop Init
  mov rbx, qword ptr [r14 + 104]
  call benchmark::State::StartKeepRunning()
  test rbx, rbx
  je .LoopEnd
.LoopHeader: # =>This Inner Loop Header: Depth=1
  add rbx, -1
  jne .LoopHeader
.LoopEnd:
```

Compared to an empty `KeepRunning` loop, which looks like:

```asm
.LoopHeader: # in Loop: Header=BB0_3 Depth=1
  cmp byte ptr [rbx], 1
  jne .LoopInit
.LoopBody: # =>This Inner Loop Header: Depth=1
  mov rax, qword ptr [rbx + 8]
  lea rcx, [rax + 1]
  mov qword ptr [rbx + 8], rcx
  cmp rax, qword ptr [rbx + 104]
  jb .LoopHeader
  jmp .LoopEnd
.LoopInit:
  mov rdi, rbx
  call benchmark::State::StartKeepRunning()
  jmp .LoopBody
.LoopEnd:
```

Unless C++03 compatibility is required, the range-based for variant of writing
the benchmark loop should be preferred.

## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.

```c++
template <class ...ExtraArgs>
void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
  // [...]
}
// Registers a benchmark named "BM_takes_args/int_string_test" that passes
// the specified values to `extra_args`.
BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
```
Note that elements of `...args` may refer to global variables. Users should
avoid modifying global state inside of a benchmark.

## Using RegisterBenchmark(name, fn, args...)

The `RegisterBenchmark(name, func, args...)` function provides an alternative
way to create and register benchmarks.
`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
pointer to a new benchmark with the specified `name` that invokes
`func(st, args...)` where `st` is a `benchmark::State` object.

Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
```c++
auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };

int main(int argc, char** argv) {
  for (auto& test_input : { /* ... */ })
      benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
  benchmark::Initialize(&argc, argv);
  benchmark::RunSpecifiedBenchmarks();
}
```

### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against
the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  for (auto _ : state) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.
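
The difference between the two clocks is easy to see outside the library. In the sketch below a thread sleeps: wall-clock time advances by at least the sleep duration, while the process accumulates almost no CPU time. `Timings` and `MeasureSleep` are hypothetical names, and `std::clock` is used here only as a stand-in for a CPU-time clock; on some platforms (notably with Microsoft's C runtime) it tracks wall-clock time instead.

```c++
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type: elapsed wall-clock and CPU time in seconds.
struct Timings {
  double wall_seconds;
  double cpu_seconds;
};

// Sleeps for the given duration and reports how much wall-clock time and how
// much process CPU time elapsed in the meantime.
Timings MeasureSleep(std::chrono::milliseconds duration) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  std::this_thread::sleep_for(duration);  // Blocks without using the CPU.

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  const std::chrono::duration<double> wall = wall_end - wall_start;
  const double cpu = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return {wall.count(), cpu};
}
```

Code that waits on other threads behaves like the sleep here from the point of view of the calling thread, which is why real time is usually the more meaningful number for latency comparisons.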

## Controlling timers
Normally, the entire duration of the work loop (`for (auto _ : state) {}`)
is measured. But sometimes it is necessary to do some work inside of
that loop, every iteration, without counting that time toward the benchmark time.
That is possible, although it is not recommended, since it has high overhead.

```c++
static void BM_SetInsert_With_Timer_Control(benchmark::State& state) {
  std::set<int> data;
  for (auto _ : state) {
    state.PauseTiming(); // Stop timers. They will not count until they are resumed.
    data = ConstructRandomSet(state.range(0)); // Do something that should not be measured
    state.ResumeTiming(); // And resume timers. They are now counting again.
    // The rest will be measured.
    for (int j = 0; j < state.range(1); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert_With_Timer_Control)->Ranges({{1<<10, 8<<10}, {128, 512}});
```

## Manual timing
For benchmarking something for which neither CPU time nor real-time are
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.

When `UseManualTime` is used, the benchmarked code must call
`SetIterationTime` once per iteration of the benchmark loop to
report the manually measured time.

An example use case for this is benchmarking GPU execution (e.g. OpenCL
or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
be accurately measured using CPU time or real-time. Instead, they can be
measured accurately using a dedicated API, and these measurement results
can be reported back with `SetIterationTime`.

```c++
#include <chrono>
#include <thread>

static void BM_ManualTiming(benchmark::State& state) {
  int microseconds = state.range(0);
  std::chrono::duration<double, std::micro> sleep_duration {
    static_cast<double>(microseconds)
  };

  for (auto _ : state) {
    auto start = std::chrono::high_resolution_clock::now();
    // Simulate some useful workload with a sleep
    std::this_thread::sleep_for(sleep_duration);
    auto end   = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    // Report the manually measured time, in seconds, for this iteration.
    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
```

### Preventing optimization
To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
functions can be used.

```c++
static void BM_test(benchmark::State& state) {
  for (auto _ : state) {
      int x = 0;
      for (int i=0; i < 64; ++i) {
        benchmark::DoNotOptimize(x += i);
      }
  }
}
```

`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.

Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
in any way. `<expr>` may even be removed entirely when the result is already
known. For example:

```c++
  /* Example 1: `<expr>` is removed entirely. */
  int foo(int x) { return x + 42; }
  while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);

  /* Example 2: Result of '<expr>' is only reused */
  int bar(int) __attribute__((const));
  while (...) DoNotOptimize(bar(0)); // Optimized to:
  // int __result__ = bar(0);
  // while (...) DoNotOptimize(__result__);
```

The second tool for preventing optimizations is `ClobberMemory()`. In essence
`ClobberMemory()` forces the compiler to perform all pending writes to global
memory. Memory managed by block scope objects must be "escaped" using
`DoNotOptimize(...)` before it can be clobbered. In the example below,
`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
away.

```c++
static void BM_vector_push_back(benchmark::State& state) {
  for (auto _ : state) {
    std::vector<int> v;
    v.reserve(1);
    benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    v.push_back(42);
    benchmark::ClobberMemory(); // Force 42 to be written to memory.
  }
}
```

Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.

### Set time unit manually
If a benchmark runs for a few milliseconds it may be hard to visually compare
the measured times, since the output data is given in nanoseconds by default.
To change this, you can specify the time unit manually:

```c++
BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
```

### Reporting the mean, median and standard deviation by repeated benchmarks
By default each benchmark is run once and that single result is reported.
However, benchmarks are often noisy and a single result may not be representative
of the overall behavior. For this reason it's possible to repeatedly rerun the
benchmark.

The number of runs of each benchmark is specified globally by the
`--benchmark_repetitions` flag or on a per benchmark basis by calling
`Repetitions` on the registered benchmark object. When a benchmark is run more
than once, the mean, median and standard deviation of the runs will be reported.
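
For reference, the three aggregates can be computed from the per-repetition times as in the sketch below. It assumes the sample (n - 1) form of the standard deviation; `Aggregates` and `Summarize` are hypothetical names, and the library's exact formulas may differ.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type holding the three aggregates named above.
struct Aggregates {
  double mean;
  double median;
  double stddev;
};

// Computes the aggregates of a set of run times. Assumes `runs` is non-empty.
// The vector is taken by value because it is sorted to find the median.
Aggregates Summarize(std::vector<double> runs) {
  const std::size_t count = runs.size();

  double sum = 0.0;
  for (double r : runs) sum += r;
  const double mean = sum / static_cast<double>(count);

  std::sort(runs.begin(), runs.end());
  const double median =
      (count % 2 == 1) ? runs[count / 2]
                       : (runs[count / 2 - 1] + runs[count / 2]) / 2.0;

  // Sample standard deviation (divides by n - 1); 0 for a single run.
  double sq_dev = 0.0;
  for (double r : runs) sq_dev += (r - mean) * (r - mean);
  const double stddev =
      count > 1 ? std::sqrt(sq_dev / static_cast<double>(count - 1)) : 0.0;

  return {mean, median, stddev};
}
```

A median far from the mean, or a standard deviation that is large relative to the mean, is a hint that a few repetitions were disturbed and that more repetitions may be needed.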

Additionally, the `--benchmark_report_aggregates_only={true|false}` and
`--benchmark_display_aggregates_only={true|false}` flags, or the
`ReportAggregatesOnly(bool)` and `DisplayAggregatesOnly(bool)` functions, can be
used to change how repeated tests are reported. By default the result of each
repeated run is reported. When the `report aggregates only` option is `true`,
only the aggregates (i.e. mean, median and standard deviation, plus complexity
measurements if they were requested) of the runs are reported, to both
reporters: standard output (console) and the file.
However, when only the `display aggregates only` option is `true`,
only the aggregates are displayed in the standard output, while the file
output still contains everything.
Calling `ReportAggregatesOnly(bool)` / `DisplayAggregatesOnly(bool)` on a
registered benchmark object overrides the value of the appropriate flag for that
benchmark.

## User-defined statistics for repeated benchmarks
While having the mean, median and standard deviation is nice, this may not be
enough for everyone. For example, you may want to know the largest
observation, e.g. because you have some real-time constraints.
The following code specifies a custom statistic to be calculated, defined
by a lambda function.

```c++
#include <algorithm>
#include <vector>

void BM_spin_empty(benchmark::State& state) {
  for (auto _ : state) {
    for (int x = 0; x < state.range(0); ++x) {
      benchmark::DoNotOptimize(x);
    }
  }
}

BENCHMARK(BM_spin_empty)
  ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    return *(std::max_element(std::begin(v), std::end(v)));
  })
  ->Arg(512);
```

## Fixtures
Fixture tests are created by
first defining a type that derives from `::benchmark::Fixture` and then
creating/registering the tests using the following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

### Templated fixtures
You can also create templated fixtures by using the following macros:

* `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
* `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`

For example:
```c++
template<typename T>
class MyFixture : public benchmark::Fixture {};

BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
  for (auto _ : st) {
    ...
  }
}

BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
```

## User-defined counters

You can add your own counters with user-defined names. The example below
will add columns "Foo", "Bar" and "Baz" in its output:

```c++
static void UserCountersExample1(benchmark::State& state) {
  double numFoos = 0, numBars = 0, numBazs = 0;
  for (auto _ : state) {
    // ... count Foo,Bar,Baz events
  }
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
}
```

The `state.counters` object is a `std::map` with `std::string` keys
and `Counter` values. The latter is a `double`-like class, via an implicit
conversion to `double&`. Thus you can use all of the standard arithmetic
assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.

In multithreaded benchmarks, each counter is set on the calling thread only.
When the benchmark finishes, the counters from each thread will be summed;
the resulting sum is the value which will be shown for the benchmark.

The `Counter` constructor accepts three parameters: the value as a `double`;
a bit flag which allows you to show counters as rates, and/or as per-thread
iteration, and/or as per-thread averages, and/or iteration invariants;
and a flag specifying the 'unit', i.e. whether 1k means 1000 (default,
`benchmark::Counter::OneK::kIs1000`) or 1024
(`benchmark::Counter::OneK::kIs1024`).

```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] =
      benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] =
      benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] =
      benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);

  // This says that we process with the rate of state.range(0) bytes every iteration:
  state.counters["BytesProcessed"] =
      benchmark::Counter(state.range(0),
                         benchmark::Counter::kIsIterationInvariantRate,
                         benchmark::Counter::OneK::kIs1024);
```
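
The arithmetic behind the rate and thread-average flags can be summarized in a few lines. The sketch below is not the library's code; it assumes only what the comments in the example above state, namely that a rate counter is divided by the duration of the benchmark and a thread-average counter by the number of threads. `PresentCounter`, `kRate` and `kAvgThreads` are hypothetical names local to this sketch.

```c++
#include <cstdint>

// Hypothetical flag bits for this sketch only; not the library's constants.
const std::uint32_t kRate = 1u << 0;
const std::uint32_t kAvgThreads = 1u << 1;

// Returns the value as it would be presented under the rules described above.
// Assumes seconds > 0 and threads > 0.
double PresentCounter(double value, std::uint32_t flags, double seconds,
                      int threads) {
  if (flags & kRate) value /= seconds;        // Shown per second.
  if (flags & kAvgThreads) value /= threads;  // Shown per thread.
  return value;
}
```

With a summed value of 1000, a 2 second run and 4 threads, the plain counter stays at 1000, the rate is 500, the thread average is 250, and combining both flags gives 125.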

When you're compiling in C++11 mode or later you can use `insert()` with
`std::initializer_list`:

```c++
  // With C++11, this can be done:
  state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
  // ... instead of:
  state.counters["Foo"] = numFoos;
  state.counters["Bar"] = numBars;
  state.counters["Baz"] = numBazs;
```

### Counter reporting

When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:

```
------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------
BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
BM_Factorial                    26 ns         26 ns   26608979 40320
BM_Factorial/real_time          26 ns         26 ns   26587936 40320
BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of the corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
    771 BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
    772 BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
    773 BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
    774 --------------------------------------------------------------
    775 Benchmark                        Time           CPU Iterations
    776 --------------------------------------------------------------
    777 BM_Factorial                    26 ns         26 ns   26392245 40320
    778 BM_Factorial/real_time          26 ns         26 ns   26494107 40320
    779 BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
    780 BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
    781 BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
    782 BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
    783 BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
    784 BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
    785 BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
    786 BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
    787 BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
    788 ```
    789 Note above the additional header printed when the benchmark changes from
    790 ``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
    791 not have the same counter set as ``BM_UserCounter``.
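
The header-reprinting rule can be illustrated with a small sketch. This is not the library's reporter code; `CounterNames` and `HeaderPositions` are hypothetical names, and the sketch assumes a header is printed before the first run and again whenever the set of counter names differs from that of the previous run.

```c++
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: the set of counter names reported by one benchmark run.
using CounterNames = std::set<std::string>;

// Hypothetical sketch: returns the indices of the runs that are preceded by a
// table header. A header is assumed to be printed before the first run and
// whenever the counter set differs from the previous run's counter set.
std::vector<std::size_t> HeaderPositions(const std::vector<CounterNames>& runs) {
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < runs.size(); ++i) {
    if (i == 0 || runs[i] != runs[i - 1]) {
      positions.push_back(i);
    }
  }
  return positions;
}
```

For two runs with counters `Bar`, `Bat`, `Baz`, `Foo` followed by two runs without counters, this yields positions 0 and 2, which mirrors the two headers in the tabular output above.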
    792 
    793 ## Exiting Benchmarks in Error
    794 
When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. With the ranged-for version of the benchmark
loop, users must explicitly exit the loop; otherwise all iterations will be
performed. Users may also explicitly return to exit the benchmark immediately.
    802 
    803 The `SkipWithError(...)` function may be used at any point within the benchmark,
    804 including before and after the benchmark loop.
    805 
    806 For example:
    807 
```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break;  // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break;  // REQUIRED to prevent all further iterations.
  }
}
```
    833 
    834 ## Running a subset of the benchmarks
    835 
    836 The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
    837 which match the specified `<regex>`. For example:
    838 
    839 ```bash
    840 $ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
    841 Run on (1 X 2300 MHz CPU )
    842 2016-06-25 19:34:24
    843 Benchmark              Time           CPU Iterations
    844 ----------------------------------------------------
    845 BM_memcpy/32          11 ns         11 ns   79545455
    846 BM_memcpy/32k       2181 ns       2185 ns     324074
    847 BM_memcpy/32          12 ns         12 ns   54687500
    848 BM_memcpy/32k       1834 ns       1837 ns     357143
    849 ```
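
Note that `BM_memcpy/32k` is also selected above, because the pattern only needs to match somewhere in the benchmark name. The sketch below illustrates that behaviour with `std::regex`. It is not the library's filtering code: `SelectBenchmarks` is a hypothetical helper, and the regular expression flavour used by the real flag may differ from the `std::regex` default assumed here.

```c++
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the names that a filter pattern selects,
// assuming the pattern may match anywhere in the name (search, not full match).
std::vector<std::string> SelectBenchmarks(const std::vector<std::string>& names,
                                          const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const std::string& name : names) {
    if (std::regex_search(name, re)) {
      selected.push_back(name);
    }
  }
  return selected;
}
```

In this sketch, the pattern `BM_memcpy/32` selects both `BM_memcpy/32` and `BM_memcpy/32k`, while the anchored pattern `BM_memcpy/32$` selects only `BM_memcpy/32`.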
    850 
    851 ## Runtime and reporting considerations
When the benchmark binary is executed, each benchmark function is run serially.
The number of iterations to run is determined dynamically by running the
benchmark a few times, measuring the time taken, and ensuring that the
final result will be statistically stable. As such, faster benchmark
functions will be run for more iterations than slower benchmark functions, and
the number of iterations is therefore reported alongside the timings.
    858 
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one and at most 1e9 iterations, and the iteration count is
increased until either the CPU time is greater than the minimum time or the
wall-clock time reaches 5x the minimum time. The minimum time is set per
benchmark by calling `MinTime` on the registered benchmark object.
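
The stopping rule in this paragraph can be written down as a small predicate. This is only a paraphrase of the text, not the library's implementation; `IsRunLongEnough` and its parameters are hypothetical names introduced for illustration.

```c++
#include <cstdint>

// Hypothetical sketch of the stopping rule described above; not library code.
// Returns true when a trial run of `iterations` iterations is long enough
// for its timings to be reported.
bool IsRunLongEnough(std::int64_t iterations, double cpu_seconds,
                     double wall_seconds, double min_time_seconds) {
  const std::int64_t kMaxIterations = 1000000000;  // upper bound of 1e9
  if (iterations < 1) return false;                // at least one iteration
  if (iterations >= kMaxIterations) return true;   // never exceed the cap
  if (cpu_seconds > min_time_seconds) return true; // enough CPU time spent
  return wall_seconds >= 5.0 * min_time_seconds;   // or enough wall-clock time
}
```

With a minimum time of 0.5 seconds, a trial that used 0.1 s of CPU time and 0.2 s of wall-clock time would be repeated with more iterations, while one that used 0.6 s of CPU time would be accepted.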
    864 
    865 Average timings are then reported over the iterations run. If multiple
    866 repetitions are requested using the `--benchmark_repetitions` command-line
    867 option, or at registration time, the benchmark function will be run several
    868 times and statistical results across these repetitions will also be reported.
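
As a rough illustration of what statistics across repetitions can look like, the sketch below summarizes a list of per-repetition timings. The paragraph above does not say which statistics are reported, so the choice of mean, median, and sample standard deviation is an assumption, and `RepetitionStats` and `Summarize` are hypothetical names rather than library API.

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of the timings measured over several repetitions.
struct RepetitionStats {
  double mean;
  double median;
  double stddev;  // sample standard deviation (n - 1 in the denominator)
};

// Hypothetical helper: summarizes per-repetition timings.
// Assumes `times` is non-empty.
RepetitionStats Summarize(std::vector<double> times) {
  std::sort(times.begin(), times.end());
  const std::size_t n = times.size();
  double sum = 0.0;
  for (double t : times) sum += t;
  const double mean = sum / n;
  const double median =
      (n % 2 == 1) ? times[n / 2] : (times[n / 2 - 1] + times[n / 2]) / 2.0;
  double squared_deviation = 0.0;
  for (double t : times) squared_deviation += (t - mean) * (t - mean);
  const double stddev = (n > 1) ? std::sqrt(squared_deviation / (n - 1)) : 0.0;
  return {mean, median, stddev};
}
```

A low standard deviation relative to the mean suggests that the repetitions agree with each other; a high one is a hint that the measurements are noisy.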
    869 
    870 As well as the per-benchmark entries, a preamble in the report will include
    871 information about the machine on which the benchmarks are run.
    872 
    873 ### Output Formats
    874 The library supports multiple output formats. Use the
    875 `--benchmark_format=<console|json|csv>` flag to set the format type. `console`
    876 is the default format.
    877 
The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
    881 ```
    882 Benchmark                               Time(ns)    CPU(ns) Iterations
    883 ----------------------------------------------------------------------
    884 BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
    885 BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
    886 BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
    887 ```
    888 
The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
    894 ```json
    895 {
    896   "context": {
    897     "date": "2015/03/17-18:40:25",
    898     "num_cpus": 40,
    899     "mhz_per_cpu": 2801,
    900     "cpu_scaling_enabled": false,
    901     "build_type": "debug"
    902   },
    903   "benchmarks": [
    904     {
    905       "name": "BM_SetInsert/1024/1",
    906       "iterations": 94877,
    907       "real_time": 29275,
    908       "cpu_time": 29836,
    909       "bytes_per_second": 134066,
    910       "items_per_second": 33516
    911     },
    912     {
    913       "name": "BM_SetInsert/1024/8",
    914       "iterations": 21609,
    915       "real_time": 32317,
    916       "cpu_time": 32429,
    917       "bytes_per_second": 986770,
    918       "items_per_second": 246693
    919     },
    920     {
    921       "name": "BM_SetInsert/1024/10",
    922       "iterations": 21393,
    923       "real_time": 32724,
    924       "cpu_time": 33355,
    925       "bytes_per_second": 1199226,
    926       "items_per_second": 299807
    927     }
    928   ]
    929 }
    930 ```
    931 
    932 The CSV format outputs comma-separated values. The `context` is output on stderr
    933 and the CSV itself on stdout. Example CSV output looks like:
    934 ```
    935 name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
    936 "BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
    937 "BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
    938 "BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
    939 ```
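
Rows in this format can be post-processed with a small amount of code. The following is a minimal sketch of a row splitter, not something the library provides: `SplitCsvRow` is a hypothetical helper, and the handling of doubled quotes inside quoted fields follows common CSV practice, which is an assumption about the output rather than a documented guarantee.

```c++
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV row into fields. Handles double-quoted
// fields and doubled quotes ("") inside them; embedded newlines are not
// supported.
std::vector<std::string> SplitCsvRow(const std::string& row) {
  std::vector<std::string> fields;
  std::string current;
  bool in_quotes = false;
  for (std::size_t i = 0; i < row.size(); ++i) {
    const char c = row[i];
    if (in_quotes) {
      if (c == '"' && i + 1 < row.size() && row[i + 1] == '"') {
        current += '"';  // doubled quote inside a quoted field
        ++i;
      } else if (c == '"') {
        in_quotes = false;  // closing quote
      } else {
        current += c;
      }
    } else if (c == '"') {
      in_quotes = true;  // opening quote
    } else if (c == ',') {
      fields.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  fields.push_back(current);  // last field; may be empty, e.g. the label
  return fields;
}
```

Applied to the first data row above, this returns seven fields: the unquoted benchmark name, five numeric columns that can be converted with `std::stod`, and an empty `label`.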
    940 
    941 ### Output Files
    942 The library supports writing the output of the benchmark to a file specified
    943 by `--benchmark_out=<filename>`. The format of the output can be specified
    944 using `--benchmark_out_format={json|console|csv}`. Specifying
    945 `--benchmark_out` does not suppress the console output.
    946 
    947 ## Result comparison
    948 
Benchmark results can be compared with each other. See the [Additional Tooling Documentation](docs/tools.md).
    950 
    951 ## Debug vs Release
    952 By default, benchmark builds as a debug library. You will see a warning in the
    953 output when this is the case. To build it as a release library instead, use:
    954 
    955 ```
    956 cmake -DCMAKE_BUILD_TYPE=Release
    957 ```
    958 
To enable link-time optimisation, use:
    960 
    961 ```
    962 cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
    963 ```
    964 
    965 If you are using gcc, you might need to set `GCC_AR` and `GCC_RANLIB` cmake
    966 cache variables, if autodetection fails.
    967 
    968 If you are using clang, you may need to set `LLVMAR_EXECUTABLE`,
    969 `LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` cmake cache variables.
    970 
    971 ## Compiler Support
    972 
    973 Google Benchmark uses C++11 when building the library. As such we require
    974 a modern C++ toolchain, both compiler and standard library.
    975 
The following minimum versions are strongly recommended to build the library:
    977 
    978 * GCC 4.8
    979 * Clang 3.4
    980 * Visual Studio 2013
    981 * Intel 2015 Update 1
    982 
    983 Anything older *may* work.
    984 
    985 Note: Using the library and its headers in C++03 is supported. C++11 is only
    986 required to build the library.
    987 
    988 ## Disable CPU frequency scaling
If you see this warning:
    990 ```
    991 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
    992 ```
you might want to disable CPU frequency scaling while running the benchmark:
    994 ```bash
    995 sudo cpupower frequency-set --governor performance
    996 ./mybench
    997 sudo cpupower frequency-set --governor powersave
    998 ```
    999