benchmark
=========
[![Build Status](https://travis-ci.org/google/benchmark.svg?branch=master)](https://travis-ci.org/google/benchmark)
[![Build status](https://ci.appveyor.com/api/projects/status/u0qsyp7t1tk7cpxs/branch/master?svg=true)](https://ci.appveyor.com/project/google/benchmark/branch/master)
[![Coverage Status](https://coveralls.io/repos/google/benchmark/badge.svg)](https://coveralls.io/r/google/benchmark)

A library to support the benchmarking of functions, similar to unit tests.

Discussion group: https://groups.google.com/d/forum/benchmark-discuss

IRC channel: https://freenode.net #googlebenchmark

Example usage
-------------
Define a function that executes the code to be measured a
specified number of times:

```c++
static void BM_StringCreation(benchmark::State& state) {
  while (state.KeepRunning())
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  while (state.KeepRunning())
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();
```

Sometimes a family of microbenchmarks can be implemented with
just one routine that takes an extra argument to specify which
one of the family of benchmarks to run. For example, the following
code defines a family of microbenchmarks for measuring the speed
of `memcpy()` calls of different lengths:

```c++
static void BM_memcpy(benchmark::State& state) {
  char* src = new char[state.range_x()];
  char* dst = new char[state.range_x()];
  memset(src, 'x', state.range_x());
  while (state.KeepRunning())
    memcpy(dst, src, state.range_x());
  state.SetBytesProcessed(int64_t(state.iterations()) *
                          int64_t(state.range_x()));
  delete[] src;
  delete[] dst;
}
BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
```

The preceding code is quite repetitive and can be replaced with the
following shorthand, which picks a few appropriate arguments in the
specified range and generates a microbenchmark for each of them:

```c++
BENCHMARK(BM_memcpy)->Range(8, 8<<10);
```

You might have a microbenchmark that depends on two inputs. For
example, the following code defines a family of microbenchmarks for
measuring the speed of set insertion:

```c++
static void BM_SetInsert(benchmark::State& state) {
  while (state.KeepRunning()) {
    state.PauseTiming();
    std::set<int> data = ConstructRandomSet(state.range_x());
    state.ResumeTiming();
    for (int j = 0; j < state.range_y(); ++j)
      data.insert(RandomNumber());
  }
}
BENCHMARK(BM_SetInsert)
    ->ArgPair(1<<10, 1)
    ->ArgPair(1<<10, 8)
    ->ArgPair(1<<10, 64)
    ->ArgPair(1<<10, 512)
    ->ArgPair(8<<10, 1)
    ->ArgPair(8<<10, 8)
    ->ArgPair(8<<10, 64)
    ->ArgPair(8<<10, 512);
```

The preceding code is quite repetitive and can be replaced with the
following shorthand. The `RangePair` method picks a few appropriate
argument pairs from the product of the two specified ranges and
generates a microbenchmark for each such pair:

```c++
BENCHMARK(BM_SetInsert)->RangePair(1<<10, 8<<10, 1, 512);
```

For more complex patterns of inputs, passing a custom function to
`Apply` allows programmatic specification of an arbitrary set of
arguments on which to run the microbenchmark. The following example
enumerates a dense range on one parameter and a sparse range on the
second:

```c++
static void CustomArguments(benchmark::internal::Benchmark* b) {
  for (int i = 0; i <= 10; ++i)
    for (int j = 32; j <= 1024*1024; j *= 8)
      b->ArgPair(i, j);
}
BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
```

Templated microbenchmarks work the same way. The following example
produces and then consumes `size` messages each iteration, measuring
throughput in the absence of multiprogramming:

118
119 ```c++
120 template <class Q> int BM_Sequential(benchmark::State& state) {
121 Q q;
122 typename Q::value_type v;
123 while (state.KeepRunning()) {
124 for (int i = state.range_x(); i--; )
125 q.push(v);
126 for (int e = state.range_x(); e--; )
127 q.Wait(&v);
128 }
129 // actually messages, not bytes:
130 state.SetBytesProcessed(
131 static_cast<int64_t>(state.iterations())*state.range_x());
132 }
133 BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
134 ```

Three macros are provided for adding benchmark templates:

```c++
#if __cplusplus >= 201103L // C++11 and greater.
#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
#else // C++ < C++11
#define BENCHMARK_TEMPLATE(func, arg1)
#endif
#define BENCHMARK_TEMPLATE1(func, arg1)
#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
```
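
If you need to support pre-C++11 compilers, the fixed-arity forms can be
used instead. A minimal sketch, re-registering `BM_Sequential` from above
with `BENCHMARK_TEMPLATE1`:

```c++
// Equivalent to the BENCHMARK_TEMPLATE registration above, but also
// usable with pre-C++11 compilers, which lack variadic macros.
BENCHMARK_TEMPLATE1(BM_Sequential, WaitQueue<int>)->Range(1<<0, 1<<10);
```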

In a multithreaded test (benchmark invoked by multiple threads simultaneously),
it is guaranteed that none of the threads will start until all have called
`KeepRunning`, and that all will have finished before `KeepRunning` returns
false. As such, any global setup or teardown you want to do can be wrapped
in a check against the thread index:

```c++
static void BM_MultiThreaded(benchmark::State& state) {
  if (state.thread_index == 0) {
    // Setup code here.
  }
  while (state.KeepRunning()) {
    // Run the test as normal.
  }
  if (state.thread_index == 0) {
    // Teardown code here.
  }
}
BENCHMARK(BM_MultiThreaded)->Threads(2);
```

If the benchmarked code itself uses threads and you want to compare it to
single-threaded code, you may want to use real-time ("wallclock") measurements
for latency comparisons:

```c++
BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
```

Without `UseRealTime`, CPU time is used by default.
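
This matters most when the code under test spends its time waiting rather
than computing. A minimal sketch (the 10ms sleep is an arbitrary stand-in
for blocking work):

```c++
#include <chrono>
#include <thread>

static void BM_Sleep(benchmark::State& state) {
  while (state.KeepRunning()) {
    // Burns almost no CPU time, so the default CPU-time measurement
    // would report a misleadingly small number for this benchmark.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}
BENCHMARK(BM_Sleep)->UseRealTime();
```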

To prevent a value or expression from being optimized away by the compiler,
the `benchmark::DoNotOptimize(...)` function can be used:

```c++
static void BM_test(benchmark::State& state) {
  while (state.KeepRunning()) {
    int x = 0;
    for (int i = 0; i < 64; ++i) {
      benchmark::DoNotOptimize(x += i);
    }
  }
}
```

Benchmark Fixtures
------------------
Fixture tests are created by first defining a type that derives from
`::benchmark::Fixture` and then creating/registering the tests using the
following macros:

* `BENCHMARK_F(ClassName, Method)`
* `BENCHMARK_DEFINE_F(ClassName, Method)`
* `BENCHMARK_REGISTER_F(ClassName, Method)`

For example:

```c++
class MyFixture : public benchmark::Fixture {};

BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}

BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
  while (st.KeepRunning()) {
    ...
  }
}
/* BarTest is NOT registered */
BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
/* BarTest is now registered */
```

Output Formats
--------------
The library supports multiple output formats. Use the
`--benchmark_format=<tabular|json|csv>` flag to set the format type. `tabular`
is the default format.
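
For example, to emit JSON instead of the default tabular output (the binary
name `mybenchmark` is illustrative):

```
./mybenchmark --benchmark_format=json
```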

The Tabular format is intended to be a human-readable format. By default
the format generates color output. Context is output on stderr and the
tabular data on stdout. Example tabular output looks like:
```
Benchmark                          Time(ns)    CPU(ns) Iterations
-----------------------------------------------------------------
BM_SetInsert/1024/1                   28928      29349      23853  133.097kB/s  33.2742k items/s
BM_SetInsert/1024/8                   32065      32913      21375  949.487kB/s  237.372k items/s
BM_SetInsert/1024/10                  33157      33648      21431  1.13369MB/s  290.225k items/s
```
240
241 The JSON format outputs human readable json split into two top level attributes.
242 The `context` attribute contains information about the run in general, including
243 information about the CPU and the date.
244 The `benchmarks` attribute contains a list of ever benchmark run. Example json
245 output looks like:
``` json
{
  "context": {
    "date": "2015/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    },
    {
      "name": "BM_SetInsert/1024/8",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    {
      "name": "BM_SetInsert/1024/10",
      "iterations": 21393,
      "real_time": 32724,
      "cpu_time": 33355,
      "bytes_per_second": 1199226,
      "items_per_second": 299807
    }
  ]
}
```

The CSV format outputs comma-separated values. The `context` is output on
stderr and the CSV itself on stdout. Example CSV output looks like:
```
name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
```

Debug vs Release
----------------
By default, benchmark builds as a debug library. You will see a warning in
the output when this is the case. To build it as a release library instead,
use:

```
cmake -DCMAKE_BUILD_TYPE=Release
```

To enable link-time optimization, use:

```
cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
```

Linking against the library
---------------------------
When using gcc, it is necessary to link against pthread to avoid runtime
exceptions. This is due to how gcc implements std::thread. See
[issue #67](https://github.com/google/benchmark/issues/67) for more details.
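
For example, a direct gcc invocation might look like the following (the file
names are illustrative):

```
g++ -std=c++11 mybenchmark.cc -o mybenchmark -lbenchmark -pthread
```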