benchmark
=========
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

Example usage
-------------
Define a function that executes the code to be measured a
specified number of times:

```c++
static void BM_StringCreation(benchmark::State& state) {
  while (state.KeepRunning())
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  while (state.KeepRunning())
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Sometimes a family of microbenchmarks can be implemented with
just one routine that takes an extra argument to specify which
member of the family to run. For example, the following
code defines a family of microbenchmarks for measuring the speed
of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range_x()];
  char* dst = new char[state.range_x()];
  memset(src, 'x', state.range_x());
  while (state.KeepRunning())
    memcpy(dst, src, state.range_x());
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range_x()));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive, and can be replaced with the
following shorthand, which will pick a few appropriate arguments in
the specified range and generate a microbenchmark for each such
argument.

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```
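If a benchmark needs to run once for every value in a range, rather than
the sampled few that `Range` picks, the library's `DenseRange(start, limit)`
method can be used. A minimal sketch (this method is not otherwise described
in this document, so verify that your version provides it):

```c++
// Runs BM_memcpy once for each length from 8 to 64 inclusive,
// instead of the handful of sampled points Range would choose.
BENCHMARK(BM_memcpy)->DenseRange(8, 64);
```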
You might have a microbenchmark that depends on two inputs. For
example, the following code defines a family of microbenchmarks for
measuring the speed of set insertion.

```c++
static void BM_SetInsert(benchmark::State& state) {
  while (state.KeepRunning()) {
    state.PauseTiming();
    std::set<int> data = ConstructRandomSet(state.range_x());
    state.ResumeTiming();
    for (int j = 0; j < state.range_y(); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->ArgPair(1<<10, 1)
    ->ArgPair(1<<10, 8)
    ->ArgPair(1<<10, 64)
    ->ArgPair(1<<10, 512)
    ->ArgPair(8<<10, 1)
    ->ArgPair(8<<10, 8)
    ->ArgPair(8<<10, 64)
    ->ArgPair(8<<10, 512);
```

The preceding code is quite repetitive, and can be replaced with the
following shorthand, which will pick a few appropriate arguments in
the product of the two specified ranges and generate a microbenchmark
for each such pair.

```c++
BENCHMARK(BM_SetInsert)->RangePair(1<<10, 8<<10, 1, 512);
```

For more complex patterns of inputs, passing a custom function to
`Apply` allows programmatic specification of an arbitrary set of
arguments on which to run the microbenchmark. The following example
enumerates a dense range on one parameter and a sparse range on the
second.

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->ArgPair(i, j);
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

Templated microbenchmarks work the same way. The following example
produces and then consumes 'size' messages per iteration, and measures
throughput in the absence of multiprogramming:

```c++
template <class Q> void BM_Sequential(benchmark::State& state) {
  Q q;
  typename Q::value_type v;
  while (state.KeepRunning()) {
    for (int i = state.range_x(); i--; )
      q.push(v);
    for (int e = state.range_x(); e--; )
      q.Wait(&v);
  }
  // actually messages, not bytes:
  state.SetBytesProcessed(
      static_cast<int64_t>(state.iterations())*state.range_x());
}
BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```

Three macros are provided for adding benchmark templates.

```c++
#if __cplusplus >= 201103L // C++11 and greater.
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```
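As an illustration of the fixed-arity forms, the following sketch
registers a benchmark templated on two types with `BENCHMARK_TEMPLATE2`;
the `BM_PushBack` function and its arguments are illustrative, not part
of the library:

```c++
#include <vector>

// A benchmark templated on a container type and its element type.
// Under a pre-C++11 compiler it must be registered with the
// fixed-arity BENCHMARK_TEMPLATE2; under C++11 the variadic
// BENCHMARK_TEMPLATE works as well.
template <class Container, class Value>
static void BM_PushBack(benchmark::State& state) {
  while (state.KeepRunning()) {
    Container c;
    for (int i = 0; i < state.range_x(); ++i)
      c.push_back(Value());
  }
}
BENCHMARK_TEMPLATE2(BM_PushBack, std::vector<int>, int)->Range(1<<0, 1<<10);
```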
In a multithreaded test (benchmark invoked by multiple threads
simultaneously), it is guaranteed that none of the threads will start
until all have called `KeepRunning`, and all will have finished before
`KeepRunning` returns false. As such, any global setup or teardown you
want to do can be wrapped in a check against the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  while (state.KeepRunning()) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it
to single-threaded code, you may want to use real-time ("wallclock")
measurements for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.

To prevent a value or expression from being optimized away by the
compiler, the `benchmark::DoNotOptimize(...)` function can be used.

```c++
static void BM_test(benchmark::State& state) {
  while (state.KeepRunning()) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

Benchmark Fixtures
------------------
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

Output Formats
--------------
The library supports multiple output formats. Use the
`--benchmark_format=<tabular|json|csv>` flag to set the format type.
`tabular` is the default format.

The tabular format is intended to be human-readable. By default the
format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                               Time(ns)    CPU(ns) Iterations
----------------------------------------------------------------------
BM_SetInsert/1024/1                        28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8                        32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10                       33157      33648      21431  1.13369MB/s  290.225k items/s
```

The JSON format outputs human-readable JSON split into two top-level
attributes. The `context` attribute contains information about the run
in general, including information about the CPU and the date. The
`benchmarks` attribute contains a list of every benchmark run. Example
JSON output looks like:
```json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output
on stderr and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

Debug vs Release
----------------
By default, benchmark builds as a debug library. You will see a warning
in the output when this is the case. To build it as a release library
instead, use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimisation, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

Linking against the library
---------------------------
When using gcc, it is necessary to link against pthread to avoid runtime
exceptions. This is due to how gcc implements `std::thread`. See
[issue #67](https://github.com/google/benchmark/issues/67) for more
details.
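For example, building a single-file benchmark with gcc might look like
the following; the file and output names are illustrative, and the
command assumes the library and its headers are installed where the
compiler can find them:

```
g++ -std=c++11 mybenchmark.cc -lbenchmark -lpthread -o mybenchmark
```

Note that `-lpthread` comes after `-lbenchmark`, since the benchmark
library itself depends on pthreads.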
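For reference, a minimal self-contained benchmark file combining the
pieces shown above, assuming the library's usual `benchmark/benchmark.h`
public header:

```c++
#include <string>

#include "benchmark/benchmark.h"

// Measures the cost of constructing an empty std::string.
static void BM_StringCreation(benchmark::State& state) {
  while (state.KeepRunning())
    std::string empty_string;
}
BENCHMARK(BM_StringCreation);

BENCHMARK_MAIN();
```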