# TFLite Model Benchmark Tool

## Description

A simple C++ binary to benchmark a TFLite model and its individual operators,
both on desktop machines and on Android. The binary takes a TFLite model,
generates random inputs, and then repeatedly runs the model for a specified
number of runs. Aggregate latency statistics are reported after running the
benchmark.

The instructions below are for running the binary on desktop and Android. For
iOS, please use the
[iOS benchmark app](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark/ios).

An experimental Android APK wrapper for the benchmark model utility offers more
faithful execution behavior on Android (via a foreground Activity). It is
located
[here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark/android).

## Parameters

The binary takes the following required parameters:

* `graph`: `string` \
    The path to the TFLite model file.

and the following optional parameters:

* `num_threads`: `int` (default=1) \
    The number of threads to use for running the TFLite interpreter.
* `warmup_runs`: `int` (default=1) \
    The number of warmup runs to do before starting the benchmark.
* `num_runs`: `int` (default=50) \
    The number of runs. Increase this to reduce variance.
* `run_delay`: `float` (default=-1.0) \
    The delay in seconds between subsequent benchmark runs. Non-positive values
    mean no delay is used.
* `use_nnapi`: `bool` (default=false) \
    Whether to use the [Android NNAPI](https://developer.android.com/ndk/guides/neuralnetworks/).
    This API is available on recent Android devices.
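For illustration, a desktop invocation that combines several of the optional
parameters above might look like the following sketch; the model path is a
placeholder to substitute with your own `.tflite` file:

```shell
# Hypothetical invocation: my_model.tflite is a placeholder path.
bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  --graph=my_model.tflite \
  --num_threads=2 \
  --warmup_runs=2 \
  --num_runs=200 \
  --run_delay=0.5
```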
## To build/install/run

### On Android:

(0) Refer to https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android and edit the `WORKSPACE` to configure the Android NDK/SDK.

(1) Build for your specific platform, e.g.:

```
bazel build -c opt \
  --config=android_arm \
  --cxxopt='--std=c++11' \
  tensorflow/lite/tools/benchmark:benchmark_model
```

(2) Connect your phone. Push the binary to your phone with `adb push`
(make the directory if required):

```
adb push bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model /data/local/tmp
```

(3) Make the binary executable.

```
adb shell chmod +x /data/local/tmp/benchmark_model
```

(4) Push the compute graph that you need to test. For example:

```
adb push mobilenet_quant_v1_224.tflite /data/local/tmp
```

(5) Run the benchmark. For example:

```
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_quant_v1_224.tflite \
  --num_threads=4
```

### On desktop:

(1) Build the binary.

```
bazel build -c opt tensorflow/lite/tools/benchmark:benchmark_model
```

(2) Run on your compute graph, similar to the Android case, but without the
need for `adb shell`. For example:

```
bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  --graph=mobilenet_quant_v1_224.tflite \
  --num_threads=4
```

The MobileNet graph used as an example here may be downloaded from
[here](https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_224_android_quant_2017_11_08.zip).
## Reducing variance between runs on Android

Most modern Android phones use the [ARM big.LITTLE](https://en.wikipedia.org/wiki/ARM_big.LITTLE)
architecture, where some cores are faster but more power hungry than the
others. When running benchmarks on these phones, there can be significant
variance between different runs of the benchmark. One way to reduce the
variance between runs is to set the [CPU affinity](https://en.wikipedia.org/wiki/Processor_affinity)
before running the benchmark. On Android, this can be done using the `taskset`
command. E.g., to run the benchmark on the big cores of a Pixel 2 with a
single thread, one can use the following command:

```
adb shell taskset f0 /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_quant_v1_224.tflite \
  --num_threads=1
```

where `f0` is the affinity mask for the big cores on a Pixel 2.
Note: The affinity mask varies with the device.
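The mask is simply a hex bitmap with bit *i* set for each core *i* you want to
pin to. A minimal sketch of deriving it, assuming cores 4-7 form the big
cluster (true of the Pixel 2, but device-specific; verify your device's core
layout first, e.g. under `/sys/devices/system/cpu/` on the device):

```shell
# Build a taskset-style affinity mask from a list of core IDs.
# Assumption: cores 4-7 are the big cluster (device-specific).
mask=0
for core in 4 5 6 7; do
  mask=$((mask | (1 << core)))   # set the bit for this core
done
printf '%x\n' "$mask"            # prints: f0
```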
## Profiling model operators

The benchmark model binary also allows you to profile operators and report the
execution time of each operator. To do this, pass
`--copt=-DTFLITE_PROFILING_ENABLED` when building the binary so that profiling
support is compiled in. For example, to compile with profiling support on
Android, add this flag to the previous build command:

```
bazel build -c opt \
  --config=android_arm \
  --cxxopt='--std=c++11' \
  --copt=-DTFLITE_PROFILING_ENABLED \
  tensorflow/lite/tools/benchmark:benchmark_model
```

This compiles TFLite with profiling enabled. You can now run the benchmark
binary as before; it will produce detailed statistics for each operation,
similar to those shown below:
```

============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
CONV_2D 0.000 4.269 4.269 0.107% 0.107% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_0/Relu6]
DEPTHWISE_CONV_2D 4.270 2.150 2.150 0.054% 0.161% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu6]
CONV_2D 6.421 6.107 6.107 0.153% 0.314% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6]
DEPTHWISE_CONV_2D 12.528 1.366 1.366 0.034% 0.348% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_2_depthwise/Relu6]
CONV_2D 13.895 4.195 4.195 0.105% 0.454% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_2_pointwise/Relu6]
DEPTHWISE_CONV_2D 18.091 1.260 1.260 0.032% 0.485% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6]
CONV_2D 19.352 6.652 6.652 0.167% 0.652% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6]
DEPTHWISE_CONV_2D 26.005 0.698 0.698 0.018% 0.670% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_4_depthwise/Relu6]
CONV_2D 26.703 3.344 3.344 0.084% 0.754% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_4_pointwise/Relu6]
DEPTHWISE_CONV_2D 30.047 0.646 0.646 0.016% 0.770% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_5_depthwise/Relu6]
CONV_2D 30.694 5.800 5.800 0.145% 0.915% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6]
DEPTHWISE_CONV_2D 36.495 0.331 0.331 0.008% 0.924% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_6_depthwise/Relu6]
CONV_2D 36.826 2.838 2.838 0.071% 0.995% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_6_pointwise/Relu6]
DEPTHWISE_CONV_2D 39.665 0.439 0.439 0.011% 1.006% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_7_depthwise/Relu6]
CONV_2D 40.105 5.293 5.293 0.133% 1.139% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6]
DEPTHWISE_CONV_2D 45.399 0.352 0.352 0.009% 1.147% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_8_depthwise/Relu6]
CONV_2D 45.752 5.322 5.322 0.133% 1.281% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_8_pointwise/Relu6]
DEPTHWISE_CONV_2D 51.075 0.357 0.357 0.009% 1.290% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_9_depthwise/Relu6]
CONV_2D 51.432 5.693 5.693 0.143% 1.433% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_9_pointwise/Relu6]
DEPTHWISE_CONV_2D 57.126 0.366 0.366 0.009% 1.442% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_10_depthwise/Relu6]
CONV_2D 57.493 5.472 5.472 0.137% 1.579% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6]
DEPTHWISE_CONV_2D 62.966 0.364 0.364 0.009% 1.588% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_11_depthwise/Relu6]
CONV_2D 63.330 5.404 5.404 0.136% 1.724% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6]
DEPTHWISE_CONV_2D 68.735 0.155 0.155 0.004% 1.728% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_12_depthwise/Relu6]
CONV_2D 68.891 2.970 2.970 0.074% 1.802% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_12_pointwise/Relu6]
DEPTHWISE_CONV_2D 71.862 0.206 0.206 0.005% 1.807% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_13_depthwise/Relu6]
CONV_2D 72.069 5.888 5.888 0.148% 1.955% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6]
AVERAGE_POOL_2D 77.958 0.036 0.036 0.001% 1.956% 0.000 0 [MobilenetV1/Logits/AvgPool_1a/AvgPool]
CONV_2D 77.994 1.445 1.445 0.036% 1.992% 0.000 0 [MobilenetV1/Logits/Conv2d_1c_1x1/BiasAdd]
RESHAPE 79.440 0.002 0.002 0.000% 1.992% 0.000 0 [MobilenetV1/Predictions/Reshape]
SOFTMAX 79.443 0.029 0.029 0.001% 1.993% 0.000 0 [MobilenetV1/Predictions/Softmax]

============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
CONV_2D 19.352 6.652 6.652 0.167% 0.167% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6]
CONV_2D 6.421 6.107 6.107 0.153% 0.320% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_1_pointwise/Relu6]
CONV_2D 72.069 5.888 5.888 0.148% 0.468% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_13_pointwise/Relu6]
CONV_2D 30.694 5.800 5.800 0.145% 0.613% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_5_pointwise/Relu6]
CONV_2D 51.432 5.693 5.693 0.143% 0.756% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_9_pointwise/Relu6]
CONV_2D 57.493 5.472 5.472 0.137% 0.893% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_10_pointwise/Relu6]
CONV_2D 63.330 5.404 5.404 0.136% 1.029% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_11_pointwise/Relu6]
CONV_2D 45.752 5.322 5.322 0.133% 1.162% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_8_pointwise/Relu6]
CONV_2D 40.105 5.293 5.293 0.133% 1.295% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_7_pointwise/Relu6]
CONV_2D 0.000 4.269 4.269 0.107% 1.402% 0.000 0 [MobilenetV1/MobilenetV1/Conv2d_0/Relu6]

Number of nodes executed: 31
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
CONV_2D 15 1.406 89.270% 89.270% 0.000 0
DEPTHWISE_CONV_2D 13 0.169 10.730% 100.000% 0.000 0
SOFTMAX 1 0.000 0.000% 100.000% 0.000 0
RESHAPE 1 0.000 0.000% 100.000% 0.000 0
AVERAGE_POOL_2D 1 0.000 0.000% 100.000% 0.000 0

Timings (microseconds): count=50 first=79449 curr=81350 min=77385 max=88213 avg=79732 std=1929
Memory (bytes): count=0
31 nodes observed


Average inference timings in us: Warmup: 83235, Init: 38467, no stats: 79760.9
```
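When post-processing benchmark logs, the aggregate figures in the `Timings`
line can be pulled out with standard shell tools. A minimal sketch; the sample
line below is copied from the output above, whereas in practice it would come
from a captured log file:

```shell
# Extract the average latency (microseconds) from the "Timings" summary line.
line='Timings (microseconds): count=50 first=79449 curr=81350 min=77385 max=88213 avg=79732 std=1929'
echo "$line" | sed -n 's/.*avg=\([0-9]*\).*/\1/p'   # prints: 79732
```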
203