| Name | Date | Size |
|---|---|---|
| AUTHORS | 21-Aug-2018 | 1.5K |
| cmake/ | 21-Aug-2018 | |
| CMakeLists.txt | 21-Aug-2018 | 8.6K |
| CONTRIBUTING.md | 21-Aug-2018 | 2.4K |
| CONTRIBUTORS | 21-Aug-2018 | 2.4K |
| docs/ | 21-Aug-2018 | |
| include/ | 21-Aug-2018 | |
| LICENSE | 21-Aug-2018 | 11.1K |
| README.LLVM | 21-Aug-2018 | 184 |
| README.md | 21-Aug-2018 | 32.3K |
| src/ | 21-Aug-2018 | |
| test/ | 21-Aug-2018 | |
| tools/ | 21-Aug-2018 | |

README.LLVM

LLVM notes
----------

This directory contains the Google Benchmark source code with some unnecessary
files removed. Note that this directory is under a different license than
libc++.

README.md

      1 # benchmark
      2 [![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
      3 [![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
      4 [![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)
      5 [![slackin](https://slackin-iqtfqnpzxd.now.sh/badge.svg)](https://slackin-iqtfqnpzxd.now.sh/)
      6 
      7 A library to support the benchmarking of functions, similar to unit-tests.
      8 
      9 Discussion group: https://groups.google.com/d/forum/benchmark-discuss
     10 
     11 IRC channel: https://freenode.net #googlebenchmark
     12 
     13 [Known issues and common problems](#known-issues)
     14 
     15 [Additional Tooling Documentation](docs/tools.md)
     16 
     17 
     18 ## Building
     19 
     20 The basic steps for configuring and building the library look like this:
     21 
     22 ```bash
     23 $ git clone https://github.com/google/benchmark.git
     24 # Benchmark requires GTest as a dependency. Add the source tree as a subdirectory.
     25 $ git clone https://github.com/google/googletest.git benchmark/googletest
     26 $ mkdir build && cd build
     27 $ cmake -G <generator> [options] ../benchmark
     28 # Assuming a makefile generator was used
     29 $ make
     30 ```
     31 
Note that Google Benchmark requires GTest to build and run the tests. This
dependency can be provided in one of three ways:

* Check out the GTest sources into `benchmark/googletest`.
* Otherwise, if `-DBENCHMARK_DOWNLOAD_DEPENDENCIES=ON` is specified during
  configuration, the library will automatically download and build any required
  dependencies.
* Otherwise, if nothing is done, CMake will use `find_package(GTest REQUIRED)`
  to resolve the required GTest dependency.

If you do not wish to build and run the tests, add `-DBENCHMARK_ENABLE_GTEST_TESTS=OFF`
to `CMAKE_ARGS`.
     44 
     45 
     46 ## Installation Guide
     47 
For Ubuntu and Debian-based systems:

First make sure you have git and cmake installed. If not, install them:
     51 
     52 ```
     53 sudo apt-get install git
     54 sudo apt-get install cmake
     55 ```
     56 
     57 Now, let's clone the repository and build it
     58 
     59 ```
     60 git clone https://github.com/google/benchmark.git
     61 cd benchmark
     62 mkdir build
     63 cd build
     64 cmake .. -DCMAKE_BUILD_TYPE=RELEASE
     65 make
     66 ```
     67 
     68 We need to install the library globally now
     69 
     70 ```
     71 sudo make install
     72 ```
     73 
Now you have google/benchmark installed on your machine.

Note: Don't forget to link against the pthread library when building.
     76 
     77 ## Stable and Experimental Library Versions
     78 
     79 The main branch contains the latest stable version of the benchmarking library;
     80 the API of which can be considered largely stable, with source breaking changes
     81 being made only upon the release of a new major version.
     82 
     83 Newer, experimental, features are implemented and tested on the
     84 [`v2` branch](https://github.com/google/benchmark/tree/v2). Users who wish
     85 to use, test, and provide feedback on the new features are encouraged to try
     86 this branch. However, this branch provides no stability guarantees and reserves
     87 the right to change and break the API at any time.
     88 
     89 
     90 ## Example usage
     91 ### Basic usage
     92 Define a function that executes the code to be measured.
     93 
```c++
#include <benchmark/benchmark.h>

#include <string>  // std::string is used by both benchmarks below

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```
    114 
Don't forget to tell your linker to link against the benchmark library, e.g. through the `-lbenchmark` flag.

The benchmark library will report the timing for the code within the `for(...)` loop.
    118 
    119 ### Passing arguments
    120 Sometimes a family of benchmarks can be implemented with just one routine that
    121 takes an extra argument to specify which one of the family of benchmarks to
    122 run. For example, the following code defines a family of benchmarks for
    123 measuring the speed of `memcpy()` calls of different lengths:
    124 
    125 ```c++
    126 static void BM_memcpy(benchmark::State& state) {
    127   char* src = new char[state.range(0)];
    128   char* dst = new char[state.range(0)];
    129   memset(src, 'x', state.range(0));
    130   for (auto _ : state)
    131     memcpy(dst, src, state.range(0));
    132   state.SetBytesProcessed(int64_t(state.iterations()) *
    133                           int64_t(state.range(0)));
    134   delete[] src;
    135   delete[] dst;
    136 }
    137 BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
    138 ```
    139 
    140 The preceding code is quite repetitive, and can be replaced with the following
    141 short-hand. The following invocation will pick a few appropriate arguments in
    142 the specified range and will generate a benchmark for each such argument.
    143 
    144 ```c++
    145 BENCHMARK(BM_memcpy)->Range(8, 8<<10);
    146 ```
    147 
    148 By default the arguments in the range are generated in multiples of eight and
    149 the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
    150 range multiplier is changed to multiples of two.
    151 
    152 ```c++
    153 BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
    154 ```
    155 Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].
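
To make the two argument lists above concrete, here is a minimal sketch of one way such a list could be produced. `SketchRange` is a hypothetical helper written for this illustration; it is not part of the benchmark API and is not claimed to be the library's implementation. It keeps the lower bound, every power of the multiplier strictly between the bounds, and the upper bound.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not a library function): expands [lo, hi] into
// lo, the powers of `multiplier` strictly between lo and hi, and hi.
std::vector<std::int64_t> SketchRange(std::int64_t lo, std::int64_t hi,
                                      int multiplier) {
  assert(lo >= 0 && lo <= hi && multiplier >= 2);
  std::vector<std::int64_t> args;
  args.push_back(lo);
  for (std::int64_t p = 1; p < hi; p *= multiplier) {
    if (p > lo) args.push_back(p);
    // Stop before the next multiplication would overflow.
    if (p > std::numeric_limits<std::int64_t>::max() / multiplier) break;
  }
  if (hi != lo) args.push_back(hi);
  return args;
}
```

Under these assumptions, `SketchRange(8, 8<<10, 8)` yields 8, 64, 512, 4096, 8192 and `SketchRange(8, 8<<10, 2)` yields the eleven values from 8 to 8192 listed above.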
    156 
    157 You might have a benchmark that depends on two or more inputs. For example, the
    158 following code defines a family of benchmarks for measuring the speed of set
    159 insertion.
    160 
    161 ```c++
    162 static void BM_SetInsert(benchmark::State& state) {
    163   std::set<int> data;
    164   for (auto _ : state) {
    165     state.PauseTiming();
    166     data = ConstructRandomSet(state.range(0));
    167     state.ResumeTiming();
    168     for (int j = 0; j < state.range(1); ++j)
    169       data.insert(RandomNumber());
    170   }
    171 }
    172 BENCHMARK(BM_SetInsert)
    173     ->Args({1<<10, 128})
    174     ->Args({2<<10, 128})
    175     ->Args({4<<10, 128})
    176     ->Args({8<<10, 128})
    177     ->Args({1<<10, 512})
    178     ->Args({2<<10, 512})
    179     ->Args({4<<10, 512})
    180     ->Args({8<<10, 512});
    181 ```
    182 
    183 The preceding code is quite repetitive, and can be replaced with the following
    184 short-hand. The following macro will pick a few appropriate arguments in the
    185 product of the two specified ranges and will generate a benchmark for each such
    186 pair.
    187 
    188 ```c++
    189 BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {128, 512}});
    190 ```
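
The "product of the two specified ranges" can be pictured as a Cartesian product of two already expanded argument lists. The sketch below assumes the lists are given as input; `SketchProduct` is a hypothetical name, and the ordering of the resulting pairs is an arbitrary choice made for this illustration rather than a statement about the library.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): pairs every value of
// `first` with every value of `second`.
std::vector<std::vector<std::int64_t>> SketchProduct(
    const std::vector<std::int64_t>& first,
    const std::vector<std::int64_t>& second) {
  std::vector<std::vector<std::int64_t>> pairs;
  pairs.reserve(first.size() * second.size());
  for (std::int64_t b : second)
    for (std::int64_t a : first)
      pairs.push_back({a, b});  // one {range(0), range(1)} combination
  return pairs;
}
```

Each inner vector corresponds to one `Args({...})` call, so two lists of sizes 3 and 2 would register six benchmarks.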
    191 
    192 For more complex patterns of inputs, passing a custom function to `Apply` allows
    193 programmatic specification of an arbitrary set of arguments on which to run the
    194 benchmark. The following example enumerates a dense range on one parameter,
    195 and a sparse range on the second.
    196 
    197 ```c++
    198 static void CustomArguments(benchmark::internal::Benchmark* b) {
    199   for (int i = 0; i <= 10; ++i)
    200     for (int j = 32; j <= 1024*1024; j *= 8)
    201       b->Args({i, j});
    202 }
    203 BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
    204 ```
    205 
    206 ### Calculate asymptotic complexity (Big O)
    207 Asymptotic complexity might be calculated for a family of benchmarks. The
    208 following code will calculate the coefficient for the high-order term in the
    209 running time and the normalized root-mean square error of string comparison.
    210 
    211 ```c++
    212 static void BM_StringCompare(benchmark::State& state) {
    213   std::string s1(state.range(0), '-');
    214   std::string s2(state.range(0), '-');
    215   for (auto _ : state) {
    216     benchmark::DoNotOptimize(s1.compare(s2));
    217   }
    218   state.SetComplexityN(state.range(0));
    219 }
    220 BENCHMARK(BM_StringCompare)
    221     ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
    222 ```
    223 
    224 As shown in the following invocation, asymptotic complexity might also be
    225 calculated automatically.
    226 
    227 ```c++
    228 BENCHMARK(BM_StringCompare)
    229     ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
    230 ```
    231 
    232 The following code will specify asymptotic complexity with a lambda function,
    233 that might be used to customize high-order term calculation.
    234 
    235 ```c++
    236 BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
    237     ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
    238 ```
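
To illustrate what "the coefficient for the high-order term" and "normalized root-mean square error" mean, the following is a minimal sketch of a least-squares fit of measured times against a chosen complexity curve. `FitResult` and `FitComplexity` are hypothetical names introduced here, and the exact formulas the library uses may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct FitResult {
  double coef;  // coefficient of the high-order term
  double rms;   // RMS error of the fit, divided by the mean time
};

// Hypothetical helper: fits time[i] ~ coef * curve(n[i]) by least squares.
FitResult FitComplexity(const std::vector<double>& n,
                        const std::vector<double>& time,
                        const std::function<double(double)>& curve) {
  assert(!n.empty() && n.size() == time.size());
  double sum_time = 0.0, sum_time_g = 0.0, sum_g_sq = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double g = curve(n[i]);
    sum_time += time[i];
    sum_time_g += time[i] * g;
    sum_g_sq += g * g;
  }
  const double coef = sum_time_g / sum_g_sq;

  double sq_err = 0.0;
  for (std::size_t i = 0; i < n.size(); ++i) {
    const double diff = time[i] - coef * curve(n[i]);
    sq_err += diff * diff;
  }
  const double count = static_cast<double>(n.size());
  const double mean_time = sum_time / count;
  return {coef, std::sqrt(sq_err / count) / mean_time};
}
```

Passing `[](double x) { return x; }` as the curve corresponds to the `benchmark::oN` case above. Times that are exactly proportional to `n` give an RMS of zero.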
    239 
    240 ### Templated benchmarks
Templated benchmarks work the same way. This example produces and consumes
messages of size `sizeof(v)` `state.range(0)` times. It also outputs throughput in the
absence of multiprogramming.

```c++
// A benchmark function returns void; the State object carries the results.
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  for (auto _ : state) {
    for (int i = state.range(0); i--; )
      q.push(v);
    for (int e = state.range(0); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range(0));
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```
    261 
    262 Three macros are provided for adding benchmark templates.
    263 
    264 ```c++
    265 #ifdef BENCHMARK_HAS_CXX11
    266 #define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
    267 #else // C++ < C++11
    268 #define BENCHMARK_TEMPLATE(func, arg1)
    269 #endif
    270 #define BENCHMARK_TEMPLATE1(func, arg1)
    271 #define BENCHMARK_TEMPLATE2(func, arg1, arg2)
    272 ```
    273 
    274 ### A Faster KeepRunning loop
    275 
In C++11 mode, a range-based for loop should be used in preference to
the `KeepRunning` loop for running the benchmarks. For example:
    278 
    279 ```c++
    280 static void BM_Fast(benchmark::State &state) {
    281   for (auto _ : state) {
    282     FastOperation();
    283   }
    284 }
    285 BENCHMARK(BM_Fast);
    286 ```
    287 
The ranged-for loop is faster than `KeepRunning` because `KeepRunning`
requires a memory load and store of the iteration count every iteration,
whereas the ranged-for variant is able to keep the iteration count
in a register.

For example, an empty inner loop using the range-based for method looks like:
    294 
    295 ```asm
    296 # Loop Init
    297   mov rbx, qword ptr [r14 + 104]
    298   call benchmark::State::StartKeepRunning()
    299   test rbx, rbx
    300   je .LoopEnd
    301 .LoopHeader: # =>This Inner Loop Header: Depth=1
    302   add rbx, -1
    303   jne .LoopHeader
    304 .LoopEnd:
    305 ```
    306 
    307 Compared to an empty `KeepRunning` loop, which looks like:
    308 
    309 ```asm
    310 .LoopHeader: # in Loop: Header=BB0_3 Depth=1
    311   cmp byte ptr [rbx], 1
    312   jne .LoopInit
    313 .LoopBody: # =>This Inner Loop Header: Depth=1
    314   mov rax, qword ptr [rbx + 8]
    315   lea rcx, [rax + 1]
    316   mov qword ptr [rbx + 8], rcx
    317   cmp rax, qword ptr [rbx + 104]
    318   jb .LoopHeader
    319   jmp .LoopEnd
    320 .LoopInit:
    321   mov rdi, rbx
    322   call benchmark::State::StartKeepRunning()
    323   jmp .LoopBody
    324 .LoopEnd:
    325 ```
    326 
    327 Unless C++03 compatibility is required, the ranged-for variant of writing
    328 the benchmark loop should be preferred.  
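
The difference between the two loop styles can be sketched with a small stand-in class. `MiniState` below is hypothetical and is not `benchmark::State`; it only illustrates why one form updates a counter stored in the object on every call, while the other counts down in a local iterator that a compiler is free to keep in a register.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only (NOT benchmark::State).
class MiniState {
 public:
  explicit MiniState(std::uint64_t max_iters) : max_iters_(max_iters) {}

  // KeepRunning style: the counter is a member, read and written per call.
  bool KeepRunning() { return done_++ < max_iters_; }

  // Ranged-for style: the iterator owns a local countdown.
  struct Iterator {
    std::uint64_t remaining;
    struct Unused {};
    Unused operator*() const { return Unused{}; }
    Iterator& operator++() { --remaining; return *this; }
    bool operator!=(const Iterator&) const { return remaining != 0; }
  };
  Iterator begin() const { return Iterator{max_iters_}; }
  Iterator end() const { return Iterator{0}; }

 private:
  std::uint64_t max_iters_;
  std::uint64_t done_ = 0;
};

std::uint64_t CountWithKeepRunning(std::uint64_t n) {
  MiniState state(n);
  std::uint64_t count = 0;
  while (state.KeepRunning()) ++count;
  return count;
}

std::uint64_t CountWithRangeFor(std::uint64_t n) {
  MiniState state(n);
  std::uint64_t count = 0;
  for (auto _ : state) { (void)_; ++count; }
  return count;
}
```

Both helpers run the body the same number of times; only the place where the iteration count lives differs.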
    329 
    330 ## Passing arbitrary arguments to a benchmark
In C++11 it is possible to define a benchmark that takes an arbitrary number
of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
macro creates a benchmark that invokes `func` with the `benchmark::State` as
the first argument followed by the specified `args...`.
The `test_case_name` is appended to the name of the benchmark and
should describe the values passed.
    337 
    338 ```c++
    339 template <class ...ExtraArgs>
    340 void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
    341   [...]
    342 }
    343 // Registers a benchmark named "BM_takes_args/int_string_test" that passes
    344 // the specified values to `extra_args`.
    345 BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
    346 ```
    347 Note that elements of `...args` may refer to global variables. Users should
    348 avoid modifying global state inside of a benchmark.
    349 
    350 ## Using RegisterBenchmark(name, fn, args...)
    351 
    352 The `RegisterBenchmark(name, func, args...)` function provides an alternative
    353 way to create and register benchmarks.
    354 `RegisterBenchmark(name, func, args...)` creates, registers, and returns a
    355 pointer to a new benchmark with the specified `name` that invokes
    356 `func(st, args...)` where `st` is a `benchmark::State` object.
    357 
Unlike the `BENCHMARK` registration macros, which can only be used at the global
scope, `RegisterBenchmark` can be called anywhere. This allows for
benchmark tests to be registered programmatically.

Additionally, `RegisterBenchmark` allows any callable object to be registered
as a benchmark, including capturing lambdas and function objects.

For example:
    366 ```c++
    367 auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };
    368 
    369 int main(int argc, char** argv) {
    370   for (auto& test_input : { /* ... */ })
    371       benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
    372   benchmark::Initialize(&argc, argv);
    373   benchmark::RunSpecifiedBenchmarks();
    374 }
    375 ```
    376 
    377 ### Multithreaded benchmarks
In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have reached
the start of the benchmark loop, and all will have finished before any thread
exits the benchmark loop. (This behavior is also provided by the `KeepRunning()`
API.) As such, any global setup or teardown can be wrapped in a check against the
thread index:
    384 
    385 ```c++
    386 static void BM_MultiThreaded(benchmark::State& state) {
    387   if (state.thread_index == 0) {
    388     // Setup code here.
    389   }
    390   for (auto _ : state) {
    391     // Run the test as normal.
    392   }
    393   if (state.thread_index == 0) {
    394     // Teardown code here.
    395   }
    396 }
    397 BENCHMARK(BM_MultiThreaded)->Threads(2);
    398 ```
    399 
    400 If the benchmarked code itself uses threads and you want to compare it to
    401 single-threaded code, you may want to use real-time ("wallclock") measurements
    402 for latency comparisons:
    403 
    404 ```c++
    405 BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
    406 ```
    407 
    408 Without `UseRealTime`, CPU time is used by default.
    409 
    410 
    411 ## Manual timing
For benchmarking something for which neither CPU time nor real time is
correct or accurate enough, completely manual timing is supported using
the `UseManualTime` function.
    415 
    416 When `UseManualTime` is used, the benchmarked code must call
    417 `SetIterationTime` once per iteration of the benchmark loop to
    418 report the manually measured time.
    419 
    420 An example use case for this is benchmarking GPU execution (e.g. OpenCL
    421 or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
    422 be accurately measured using CPU time or real-time. Instead, they can be
    423 measured accurately using a dedicated API, and these measurement results
    424 can be reported back with `SetIterationTime`.
    425 
    426 ```c++
    427 static void BM_ManualTiming(benchmark::State& state) {
    428   int microseconds = state.range(0);
    429   std::chrono::duration<double, std::micro> sleep_duration {
    430     static_cast<double>(microseconds)
    431   };
    432 
    433   for (auto _ : state) {
    434     auto start = std::chrono::high_resolution_clock::now();
    435     // Simulate some useful workload with a sleep
    436     std::this_thread::sleep_for(sleep_duration);
    437     auto end   = std::chrono::high_resolution_clock::now();
    438 
    439     auto elapsed_seconds =
    440       std::chrono::duration_cast<std::chrono::duration<double>>(
    441         end - start);
    442 
    443     state.SetIterationTime(elapsed_seconds.count());
    444   }
    445 }
    446 BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
    447 ```
    448 
    449 ### Preventing optimisation
    450 To prevent a value or expression from being optimized away by the compiler
    451 the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
    452 functions can be used.
    453 
    454 ```c++
    455 static void BM_test(benchmark::State& state) {
    456   for (auto _ : state) {
    457       int x = 0;
    458       for (int i=0; i < 64; ++i) {
    459         benchmark::DoNotOptimize(x += i);
    460       }
    461   }
    462 }
    463 ```
    464 
`DoNotOptimize(<expr>)` forces the *result* of `<expr>` to be stored in either
memory or a register. For GNU-based compilers it acts as a read/write barrier
for global memory. More specifically, it forces the compiler to flush pending
writes to memory and reload any other values as necessary.
    469 
    470 Note that `DoNotOptimize(<expr>)` does not prevent optimizations on `<expr>`
    471 in any way. `<expr>` may even be removed entirely when the result is already
    472 known. For example:
    473 
    474 ```c++
    475   /* Example 1: `<expr>` is removed entirely. */
    476   int foo(int x) { return x + 42; }
    477   while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);
    478 
    479   /*  Example 2: Result of '<expr>' is only reused */
    480   int bar(int) __attribute__((const));
    481   while (...) DoNotOptimize(bar(0)); // Optimized to:
    482   // int __result__ = bar(0);
    483   // while (...) DoNotOptimize(__result__);
    484 ```
    485 
    486 The second tool for preventing optimizations is `ClobberMemory()`. In essence
    487 `ClobberMemory()` forces the compiler to perform all pending writes to global
    488 memory. Memory managed by block scope objects must be "escaped" using
    489 `DoNotOptimize(...)` before it can be clobbered. In the below example
    490 `ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
    491 away.
    492 
    493 ```c++
    494 static void BM_vector_push_back(benchmark::State& state) {
    495   for (auto _ : state) {
    496     std::vector<int> v;
    497     v.reserve(1);
    498     benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
    499     v.push_back(42);
    500     benchmark::ClobberMemory(); // Force 42 to be written to memory.
    501   }
    502 }
    503 ```
    504 
    505 Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.
    506 
    507 ### Set time unit manually
If a benchmark runs for a few milliseconds, it may be hard to visually compare the
measured times, since the output data is given in nanoseconds by default. To
change this, you can set the time unit explicitly:
    511 
    512 ```c++
    513 BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
    514 ```
    515 
    516 ## Controlling number of iterations
In all cases, the number of iterations for which the benchmark is run is
governed by the amount of time the benchmark takes. Concretely, the benchmark
runs at least one and at most 1e9 iterations, and stops once the CPU time is
greater than the minimum time or the wallclock time is 5x the minimum time. The
minimum time is set with the `--benchmark_min_time` flag, or per benchmark by
calling `MinTime` on the registered benchmark object.
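
The stopping rule described above can be written as a small predicate. This is a sketch of the stated rule only; `EnoughIterations` and `kMaxIterations` are hypothetical names, and the treatment of the exact boundary values is an assumption.

```cpp
#include <cstdint>

// Upper bound on the iteration count, as stated in the text (1e9).
constexpr std::int64_t kMaxIterations = 1000000000;

// Hypothetical predicate: true once a run has gathered enough data.
// `min_time` is the value of --benchmark_min_time or MinTime(), in seconds.
bool EnoughIterations(std::int64_t iterations, double cpu_seconds,
                      double real_seconds, double min_time) {
  return iterations >= kMaxIterations ||   // hard cap on iterations
         cpu_seconds >= min_time ||        // enough CPU time measured
         real_seconds >= 5.0 * min_time;   // wallclock safety limit
}
```

With a minimum time of 0.5 seconds, a run that used 0.1 s of CPU time but 2.5 s of wallclock time would stop, because the wallclock limit has been reached.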
    523 
    524 ## Reporting the mean, median and standard deviation by repeated benchmarks
    525 By default each benchmark is run once and that single result is reported.
    526 However benchmarks are often noisy and a single result may not be representative
    527 of the overall behavior. For this reason it's possible to repeatedly rerun the
    528 benchmark.
    529 
    530 The number of runs of each benchmark is specified globally by the
    531 `--benchmark_repetitions` flag or on a per benchmark basis by calling
    532 `Repetitions` on the registered benchmark object. When a benchmark is run more
    533 than once the mean, median and standard deviation of the runs will be reported.
    534 
Additionally, the `--benchmark_report_aggregates_only={true|false}` flag or
`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
are reported. By default the result of each repeated run is reported. When this
option is `true`, only the mean, median and standard deviation of the runs are reported.
Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides
the value of the flag for that benchmark.
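
As a rough illustration of the three aggregates, the sketch below computes them from the per-repetition times. `Summary` and `Summarize` are hypothetical names, and the use of the sample standard deviation (dividing by n - 1) is an assumption rather than a documented guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Summary {
  double mean;
  double median;
  double stddev;
};

// Hypothetical helper: aggregates the times of repeated runs.
// `runs` is taken by value because it is sorted locally for the median.
Summary Summarize(std::vector<double> runs) {
  assert(!runs.empty());
  const double n = static_cast<double>(runs.size());
  const double mean = std::accumulate(runs.begin(), runs.end(), 0.0) / n;

  std::sort(runs.begin(), runs.end());
  const std::size_t mid = runs.size() / 2;
  const double median = (runs.size() % 2 != 0)
                            ? runs[mid]
                            : (runs[mid - 1] + runs[mid]) / 2.0;

  double sq_dev = 0.0;
  for (double r : runs) sq_dev += (r - mean) * (r - mean);
  // Assumption: sample standard deviation; zero for a single run.
  const double stddev = runs.size() > 1 ? std::sqrt(sq_dev / (n - 1.0)) : 0.0;
  return {mean, median, stddev};
}
```

A single repetition has no spread to report, which is one reason aggregates only appear when a benchmark is run more than once.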
    541 
    542 ## User-defined statistics for repeated benchmarks
The mean, median and standard deviation may not be enough for everyone. For
example, you may want to know the largest observation because you have
real-time constraints. The following code specifies a custom statistic to be
calculated, defined by a lambda function.
    548 
    549 ```c++
    550 void BM_spin_empty(benchmark::State& state) {
    551   for (auto _ : state) {
    552     for (int x = 0; x < state.range(0); ++x) {
    553       benchmark::DoNotOptimize(x);
    554     }
    555   }
    556 }
    557 
    558 BENCHMARK(BM_spin_empty)
    559   ->ComputeStatistics("max", [](const std::vector<double>& v) -> double {
    560     return *(std::max_element(std::begin(v), std::end(v)));
    561   })
    562   ->Arg(512);
    563 ```
    564 
    565 ## Fixtures
    566 Fixture tests are created by
    567 first defining a type that derives from `::benchmark::Fixture` and then
    568 creating/registering the tests using the following macros:
    569 
    570 * `BENCHMARK_F(ClassName, Method)`
    571 * `BENCHMARK_DEFINE_F(ClassName, Method)`
    572 * `BENCHMARK_REGISTER_F(ClassName, Method)`
    573 
For example:
    575 
    576 ```c++
    577 class MyFixture : public benchmark::Fixture {};
    578 
    579 BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
    580    for (auto _ : st) {
    581      ...
    582   }
    583 }
    584 
    585 BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
    586    for (auto _ : st) {
    587      ...
    588   }
    589 }
    590 /* BarTest is NOT registered */
    591 BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
    592 /* BarTest is now registered */
    593 ```
    594 
    595 ### Templated fixtures
You can also create templated fixtures by using the following macros:
    597 
    598 * `BENCHMARK_TEMPLATE_F(ClassName, Method, ...)`
    599 * `BENCHMARK_TEMPLATE_DEFINE_F(ClassName, Method, ...)`
    600 
    601 For example:
    602 ```c++
    603 template<typename T>
    604 class MyFixture : public benchmark::Fixture {};
    605 
    606 BENCHMARK_TEMPLATE_F(MyFixture, IntTest, int)(benchmark::State& st) {
    607    for (auto _ : st) {
    608      ...
    609   }
    610 }
    611 
    612 BENCHMARK_TEMPLATE_DEFINE_F(MyFixture, DoubleTest, double)(benchmark::State& st) {
    613    for (auto _ : st) {
    614      ...
    615   }
    616 }
    617 
    618 BENCHMARK_REGISTER_F(MyFixture, DoubleTest)->Threads(2);
    619 ```
    620 
    621 ## User-defined counters
    622 
    623 You can add your own counters with user-defined names. The example below
    624 will add columns "Foo", "Bar" and "Baz" in its output:
    625 
    626 ```c++
    627 static void UserCountersExample1(benchmark::State& state) {
    628   double numFoos = 0, numBars = 0, numBazs = 0;
    629   for (auto _ : state) {
    630     // ... count Foo,Bar,Baz events
    631   }
    632   state.counters["Foo"] = numFoos;
    633   state.counters["Bar"] = numBars;
    634   state.counters["Baz"] = numBazs;
    635 }
    636 ```
    637 
    638 The `state.counters` object is a `std::map` with `std::string` keys
    639 and `Counter` values. The latter is a `double`-like class, via an implicit
    640 conversion to `double&`. Thus you can use all of the standard arithmetic
    641 assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.
    642 
    643 In multithreaded benchmarks, each counter is set on the calling thread only.
    644 When the benchmark finishes, the counters from each thread will be summed;
    645 the resulting sum is the value which will be shown for the benchmark.
    646 
    647 The `Counter` constructor accepts two parameters: the value as a `double`
    648 and a bit flag which allows you to show counters as rates and/or as
    649 per-thread averages:
    650 
```c++
  // sets a simple counter
  state.counters["Foo"] = numFoos;

  // Set the counter as a rate. It will be presented divided
  // by the duration of the benchmark.
  state.counters["FooRate"] = benchmark::Counter(numFoos, benchmark::Counter::kIsRate);

  // Set the counter as a thread-average quantity. It will
  // be presented divided by the number of threads.
  state.counters["FooAvg"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreads);

  // There's also a combined flag:
  state.counters["FooAvgRate"] = benchmark::Counter(numFoos, benchmark::Counter::kAvgThreadsRate);
```
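
How the flags change the presented value can be sketched as follows. The enum and `PresentCounter` are hypothetical stand-ins that only mirror the flag names used above; they are not the library's types, and which duration the library divides by is left as an assumption.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical mirror of the flags discussed above (not the library's enum).
enum CounterFlags : unsigned {
  kDefaults = 0u,
  kIsRate = 1u,
  kAvgThreads = 2u,
  kAvgThreadsRate = kIsRate | kAvgThreads
};

// Hypothetical helper: sums the per-thread values, then applies the flags.
double PresentCounter(const std::vector<double>& per_thread_values,
                      unsigned flags, double seconds) {
  assert(!per_thread_values.empty() && seconds > 0.0);
  double value = std::accumulate(per_thread_values.begin(),
                                 per_thread_values.end(), 0.0);
  if (flags & kIsRate) value /= seconds;  // shown per second
  if (flags & kAvgThreads)                // shown per thread
    value /= static_cast<double>(per_thread_values.size());
  return value;
}
```

For four threads that each counted one event over two seconds, this sketch would present 4 with no flags, 2 as a rate, 1 as a thread average, and 0.5 with the combined flag.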
    666 
    667 When you're compiling in C++11 mode or later you can use `insert()` with
    668 `std::initializer_list`:
    669 
    670 ```c++
    671   // With C++11, this can be done:
    672   state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
    673   // ... instead of:
    674   state.counters["Foo"] = numFoos;
    675   state.counters["Bar"] = numBars;
    676   state.counters["Baz"] = numBazs;
    677 ```
    678 
    679 ### Counter reporting
    680 
When using the console reporter, by default, user counters are printed at
the end after the table, the same way as ``bytes_processed`` and
``items_processed``. This is best for cases in which there are few counters,
or where there are only a couple of lines per benchmark. Here's an example of
the default output:
    686 
    687 ```
    688 ------------------------------------------------------------------------------
    689 Benchmark                        Time           CPU Iterations UserCounters...
    690 ------------------------------------------------------------------------------
    691 BM_UserCounter/threads:8      2248 ns      10277 ns      68808 Bar=16 Bat=40 Baz=24 Foo=8
    692 BM_UserCounter/threads:1      9797 ns       9788 ns      71523 Bar=2 Bat=5 Baz=3 Foo=1024m
    693 BM_UserCounter/threads:2      4924 ns       9842 ns      71036 Bar=4 Bat=10 Baz=6 Foo=2
    694 BM_UserCounter/threads:4      2589 ns      10284 ns      68012 Bar=8 Bat=20 Baz=12 Foo=4
    695 BM_UserCounter/threads:8      2212 ns      10287 ns      68040 Bar=16 Bat=40 Baz=24 Foo=8
    696 BM_UserCounter/threads:16     1782 ns      10278 ns      68144 Bar=32 Bat=80 Baz=48 Foo=16
    697 BM_UserCounter/threads:32     1291 ns      10296 ns      68256 Bar=64 Bat=160 Baz=96 Foo=32
    698 BM_UserCounter/threads:4      2615 ns      10307 ns      68040 Bar=8 Bat=20 Baz=12 Foo=4
    699 BM_Factorial                    26 ns         26 ns   26608979 40320
    700 BM_Factorial/real_time          26 ns         26 ns   26587936 40320
    701 BM_CalculatePiRange/1           16 ns         16 ns   45704255 0
    702 BM_CalculatePiRange/8           73 ns         73 ns    9520927 3.28374
    703 BM_CalculatePiRange/64         609 ns        609 ns    1140647 3.15746
    704 BM_CalculatePiRange/512       4900 ns       4901 ns     142696 3.14355
    705 ```

If this doesn't suit you, you can print each counter as a table column by
passing the flag `--benchmark_counters_tabular=true` to the benchmark
application. This is best for cases in which there are a lot of counters, or
a lot of lines per individual benchmark. Note that this will trigger a
reprinting of the table header any time the counter set changes between
individual benchmarks. Here's an example of corresponding output when
`--benchmark_counters_tabular=true` is passed:

```
---------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations    Bar   Bat   Baz   Foo
---------------------------------------------------------------------------------------
BM_UserCounter/threads:8      2198 ns       9953 ns      70688     16    40    24     8
BM_UserCounter/threads:1      9504 ns       9504 ns      73787      2     5     3     1
BM_UserCounter/threads:2      4775 ns       9550 ns      72606      4    10     6     2
BM_UserCounter/threads:4      2508 ns       9951 ns      70332      8    20    12     4
BM_UserCounter/threads:8      2055 ns       9933 ns      70344     16    40    24     8
BM_UserCounter/threads:16     1610 ns       9946 ns      70720     32    80    48    16
BM_UserCounter/threads:32     1192 ns       9948 ns      70496     64   160    96    32
BM_UserCounter/threads:4      2506 ns       9949 ns      70332      8    20    12     4
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_Factorial                    26 ns         26 ns   26392245 40320
BM_Factorial/real_time          26 ns         26 ns   26494107 40320
BM_CalculatePiRange/1           15 ns         15 ns   45571597 0
BM_CalculatePiRange/8           74 ns         74 ns    9450212 3.28374
BM_CalculatePiRange/64         595 ns        595 ns    1173901 3.15746
BM_CalculatePiRange/512       4752 ns       4752 ns     147380 3.14355
BM_CalculatePiRange/4k       37970 ns      37972 ns      18453 3.14184
BM_CalculatePiRange/32k     303733 ns     303744 ns       2305 3.14162
BM_CalculatePiRange/256k   2434095 ns    2434186 ns        288 3.1416
BM_CalculatePiRange/1024k  9721140 ns    9721413 ns         71 3.14159
BM_CalculatePi/threads:8      2255 ns       9943 ns      70936
```

Note above the additional header printed when the benchmark changes from
``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
not have the same counter set as ``BM_UserCounter``.

## Exiting Benchmarks in Error

When errors caused by external influences, such as file I/O and network
communication, occur within a benchmark, the
`State::SkipWithError(const char* msg)` function can be used to skip that run
of the benchmark and report the error. Note that only future iterations of the
`KeepRunning()` loop are skipped. With the ranged-for version of the benchmark
loop, users must explicitly exit the loop; otherwise all iterations will be
performed. Users may also explicitly return to exit the benchmark immediately.

The `SkipWithError(...)` function may be used at any point within the benchmark,
including before and after the benchmark loop.

For example:

```c++
static void BM_test(benchmark::State& state) {
  auto resource = GetResource();
  if (!resource.good()) {
    state.SkipWithError("Resource is not good!");
    // KeepRunning() loop will not be entered.
  }
  while (state.KeepRunning()) {
    auto data = resource.read_data();
    if (!resource.good()) {
      state.SkipWithError("Failed to read data!");
      break; // Needed to skip the rest of the iteration.
    }
    do_stuff(data);
  }
}

static void BM_test_ranged_for(benchmark::State& state) {
  state.SkipWithError("test will not be entered");
  for (auto _ : state) {
    state.SkipWithError("Failed!");
    break; // REQUIRED to prevent all further iterations.
  }
}
```

## Running a subset of the benchmarks

The `--benchmark_filter=<regex>` option can be used to only run the benchmarks
which match the specified `<regex>`. For example:

```bash
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143
```
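
In the output above, the filter `BM_memcpy/32` selects both `BM_memcpy/32` and `BM_memcpy/32k`, which suggests that the pattern is searched for anywhere in the benchmark name rather than matched against the whole name. The following is a minimal sketch of that behavior using `std::regex`. It assumes search semantics and ECMAScript syntax; `FilterNames` is a hypothetical helper, not part of the library, and the regex engine the library actually uses may differ.

```c++
#include <regex>
#include <string>
#include <vector>

// Returns the names that contain a match for `pattern` anywhere in the string.
// Assumption: search (substring) semantics, as the sample output suggests.
std::vector<std::string> FilterNames(const std::vector<std::string>& names,
                                     const std::string& pattern) {
  const std::regex re(pattern);
  std::vector<std::string> selected;
  for (const std::string& name : names) {
    if (std::regex_search(name, re)) selected.push_back(name);
  }
  return selected;
}
```

Under these assumptions, anchoring the pattern (for example `BM_memcpy/32$`) would select only the name that ends in `/32`.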

## Output Formats
The library supports multiple output formats. Use the
`--benchmark_format=<console|json|csv>` flag to set the format type. `console`
is the default format.

The Console format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s   33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s   237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s   290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level attributes.
The `context` attribute contains information about the run in general, including
information about the CPU and the date.
The `benchmarks` attribute contains a list of every benchmark run. Example JSON
output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```
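
The rate fields in the sample above appear consistent, up to rounding, with dividing the work done per iteration by the reported `cpu_time` in nanoseconds. The sketch below reproduces that arithmetic. It assumes, from the sample figures only, that `BM_SetInsert/1024/N` processes N items of 4 bytes each per iteration; `RatePerSecond` is a hypothetical helper, not a library function.

```c++
// Units (items or bytes) processed per second, given the CPU time of one
// iteration in nanoseconds and the units processed in that iteration.
double RatePerSecond(double cpu_time_ns, double units_per_iteration) {
  return units_per_iteration * 1e9 / cpu_time_ns;
}
```

For the first entry, `RatePerSecond(29836, 1)` gives roughly 33516.6 items per second and `RatePerSecond(29836, 4)` roughly 134066.2 bytes per second, close to the reported `items_per_second` and `bytes_per_second`.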

The CSV format outputs comma-separated values. The `context` is output on stderr
and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```
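
When post-processing this output, note that the name is quoted, the numeric fields may use exponent notation, and the trailing `label` field may be empty. A minimal sketch of splitting one such row, assuming quoted fields never contain escaped quotes; `SplitCsvRow` is a hypothetical helper, not part of the library:

```c++
#include <string>
#include <vector>

// Splits one CSV row into fields. Double quotes toggle a "quoted" state in
// which commas are kept as data; the quote characters themselves are dropped.
// An empty trailing field (such as an empty label) is preserved.
std::vector<std::string> SplitCsvRow(const std::string& row) {
  std::vector<std::string> fields;
  std::string current;
  bool in_quotes = false;
  for (char c : row) {
    if (c == '"') {
      in_quotes = !in_quotes;
    } else if (c == ',' && !in_quotes) {
      fields.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  fields.push_back(current);  // Last field, possibly empty.
  return fields;
}
```

Numeric fields can then be converted with `std::stod`, which accepts values such as `3.27646e+06`.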

## Output Files
The library supports writing the output of the benchmark to a file specified
by `--benchmark_out=<filename>`. The format of the output can be specified
using `--benchmark_out_format={json|console|csv}`. Specifying
`--benchmark_out` does not suppress the console output.

## Debug vs Release
By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

If you are using gcc, you might need to set `GCC_AR` and `GCC_RANLIB` cmake cache variables, if autodetection fails.
If you are using clang, you may need to set `LLVMAR_EXECUTABLE`, `LLVMNM_EXECUTABLE` and `LLVMRANLIB_EXECUTABLE` cmake cache variables.

## Linking against the library
When using gcc, it is necessary to link against pthread to avoid runtime exceptions.
This is due to how gcc implements std::thread.
See [issue #67](https://github.com/google/benchmark/issues/67) for more details.

## Compiler Support

Google Benchmark uses C++11 when building the library. As such we require
a modern C++ toolchain, both compiler and standard library.

The following minimum versions are strongly recommended to build the library:

* GCC 4.8
* Clang 3.4
* Visual Studio 2013
* Intel 2015 Update 1

Anything older *may* work.

Note: Using the library and its headers in C++03 is supported. C++11 is only
required to build the library.

## Disable CPU frequency scaling
If you see this warning:
```
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
```
you might want to disable the CPU frequency scaling while running the benchmark:
```bash
sudo cpupower frequency-set --governor performance
./mybench
sudo cpupower frequency-set --governor powersave
```

# Known Issues

### Windows

* Users must manually link `shlwapi.lib`. Failure to do so may result
in unresolved symbols.