## Command Line Interface Tutorials

* [Command Line Inputs](#command-line-inputs)
* [Start `tfprof`](#start-tfprof)
* [Examples](#examples)
  * [Profile Python Time](#profile-python-time)
  * [Profile Graph Time](#profile-graph-time)
  * [Profile Checkpoint Value](#profile-checkpoint-value)
  * [Profile Model Parameter](#profile-model-parameter)
  * [Profile Device Placement](#profile-device-placement)
  * [Define Customized Operation Type](#define-customized-operation-type)
  * [Non-interactive Mode](#non-interactive-mode)


### Command Line Inputs

The tfprof command line tool uses the following inputs:

<b>--profile_path:</b> A ProfileProto binary proto file.
See the QuickStart on generating the file.

<b>THE OLD WAY BELOW IS DEPRECATED:</b>

<b>--graph_path:</b> GraphDef proto file (optional in eager execution).
Used to build the in-memory data structure of the model. For example,
the graph.pbtxt written by tf.Supervisor can be passed to --graph_path.
You can also easily get a GraphDef using
tf.get_default_graph().as_graph_def(add_shapes=True) or other APIs.

<b>--run_meta_path:</b> RunMetadata proto file (optional).
Used to get the memory consumption and execution time of
each op of the model.

The following code snippet writes a RunMetadata file:

```python
run_options = config_pb2.RunOptions(trace_level=config_pb2.RunOptions.FULL_TRACE)
run_metadata = config_pb2.RunMetadata()
_ = self._sess.run(..., options=run_options, run_metadata=run_metadata)
with tf.gfile.Open(os.path.join(output_dir, "run_meta"), "wb") as f:
  f.write(run_metadata.SerializeToString())
```

<b>--op_log_path:</b>
tensorflow.tfprof.OpLogProto (optional). A proto used to provide extra
operation information: 1) float operations; 2) code traces; 3) customized
operation types for the `-account_type_regexes` option.

The following code snippet writes an OpLogProto file:

```python
tf.profiler.write_op_log(graph, log_dir, op_log=None)
```

<b>--checkpoint_path:</b> TensorFlow checkpoint (optional).
It defines the _checkpoint_variables op type. It also provides the values of
checkpointed tensors.
Note: this feature is not well maintained at the moment.


### Start `tfprof`

#### Build `tfprof`

```shell
# Build the tool.
bazel build --config opt tensorflow/core/profiler:profiler

# Help information, including detailed 'option' instructions.
bazel-bin/tensorflow/core/profiler/profiler help
```

#### Start `tfprof` Interactive Mode

```shell
# The following commands start tfprof in interactive mode.
#
# Recommended:
#
# The file contains the binary string of ProfileProto.
# It contains all needed information in one file.
bazel-bin/tensorflow/core/profiler/profiler \
    --profile_path=profile_xxx
#
# Alternatively, the user can pass separate files.
#
# --graph_path contains the model architecture and tensor shapes.
# --run_meta_path contains the memory and time information.
# --op_log_path contains float operations and code traces.
# --checkpoint_path contains the model checkpoint data.
#
# Only includes model architecture, parameters and shapes.
bazel-bin/tensorflow/core/profiler/profiler \
    --graph_path=graph.pbtxt

# For profiling eager execution, the user can specify only run_meta_path
# and profile the execution info of each operation.
bazel-bin/tensorflow/core/profiler/profiler \
    --run_meta_path=run_meta
#
# Additionally profile ops' memory and timing.
bazel-bin/tensorflow/core/profiler/profiler \
    --graph_path=graph.pbtxt \
    --run_meta_path=run_meta
#
# tfprof_log is used to define customized op types, float ops and code traces.
# Use tfprof_logger.write_op_log() to create tfprof_log.
bazel-bin/tensorflow/core/profiler/profiler \
    --graph_path=graph.pbtxt \
    --run_meta_path=run_meta \
    --op_log_path=tfprof_log
#
# Additionally profile checkpoint statistics and values.
# Use '-account_type_regexes _checkpoint_variables' to select
# checkpoint tensors.
bazel-bin/tensorflow/core/profiler/profiler \
    --graph_path=graph.pbtxt \
    --run_meta_path=run_meta \
    --op_log_path=tfprof_log \
    --checkpoint_path=model.ckpt
```

#### Start `tfprof` Non-interactive Mode

```shell
# Runs tfprof in one shot.
bazel-bin/tensorflow/core/profiler/profiler scope \
    --graph_path=graph.pbtxt \
    --max_depth=3
```

#### Press enter to show the default options

Refer to [Options](options.md) for option instructions.

```shell
tfprof>
-max_depth 4
-min_bytes 0
-min_micros 0
-min_params 0
-min_float_ops 0
-min_occurrence 0
-step -1
-order_by name
-account_type_regexes Variable,VariableV2
-start_name_regexes .*
-trim_name_regexes
-show_name_regexes .*
-hide_name_regexes IsVariableInitialized_[0-9]+,save\/.*,^zeros[0-9_]*
-account_displayed_op_only false
# Supported select fields. Availability depends on --[run_meta|checkpoint|op_log]_path.
# [bytes|micros|params|float_ops|occurrence|tensor_value|device|op_types]
-select params
# format: output_type:key=value,key=value...
# output_types: stdout (default), timeline, file.
# key=value pairs:
#   1. timeline: outfile=<filename>
#   2. file: outfile=<filename>
#   3. stdout: None.
# E.g. timeline:outfile=/tmp/timeline.json
-output
```

### Examples

#### Profile Python Time

```shell
# Requires --graph_path --op_log_path
tfprof> code -max_depth 1000 -show_name_regexes .*model_analyzer.*py.* -select micros -account_type_regexes .* -order_by micros
_TFProfRoot (0us/22.44ms)
  model_analyzer_test.py:149:run_filename_as_m...:none (0us/22.44ms)
    model_analyzer_test.py:33:_run_code_in_main:none (0us/22.44ms)
      model_analyzer_test.py:208:<module>:test.main() (0us/22.44ms)
        model_analyzer_test.py:132:testComplexCodeView:x = lib.BuildFull... (0us/22.44ms)
          model_analyzer_testlib.py:63:BuildFullModel:return sgd_op.min... (0us/21.83ms)
          model_analyzer_testlib.py:58:BuildFullModel:cell, array_ops.c... (0us/333us)
          model_analyzer_testlib.py:54:BuildFullModel:seq.append(array_... (0us/254us)
            model_analyzer_testlib.py:42:BuildSmallModel:x = nn_ops.conv2d... (0us/134us)
            model_analyzer_testlib.py:46:BuildSmallModel:initializer=init_... (0us/40us)
            ...
          model_analyzer_testlib.py:61:BuildFullModel:loss = nn_ops.l2_... (0us/28us)
          model_analyzer_testlib.py:60:BuildFullModel:target = array_op... (0us/0us)
        model_analyzer_test.py:134:testComplexCodeView:sess.run(variable... (0us/0us)
```

Set `-output timeline:outfile=<filename>` to generate a timeline instead of stdout.
<left>
![CodeTimeline](code_timeline.png)
</left>

#### Profile Graph Time

```shell
# I defined an op named cost to calculate the loss. I want to know which ops
# it depends on take a long time to run.

# Requires --graph_path, --run_meta_path.
tfprof> graph -start_name_regexes cost.* -max_depth 100 -min_micros 10000 -select micros -account_type_regexes .*
_TFProfRoot (0us/3.61sec)
  init/init_conv/Conv2D (11.75ms/3.10sec)
    random_shuffle_queue_DequeueMany (3.09sec/3.09sec)
  unit_1_0/sub2/conv2/Conv2D (74.14ms/3.19sec)
  unit_1_3/sub2/conv2/Conv2D (60.75ms/3.34sec)
  unit_2_4/sub2/conv2/Conv2D (73.58ms/3.54sec)
  unit_3_3/sub2/conv2/Conv2D (10.26ms/3.60sec)
```

#### Profile Checkpoint Value

```shell
# Requires --graph_path, --checkpoint_path.
tfprof> scope -show_name_regexes unit_1_0.*gamma -select tensor_value -max_depth 5
_TFProfRoot ()
  unit_1_0/shared_activation/init_bn/gamma ()
[1.80 2.10 2.06 1.91 2.26 1.86 1.81 1.37 1.78 1.85 1.96 1.54 2.04 2.34 2.22 1.99 ],
  unit_1_0/sub2/bn2/gamma ()
[1.57 1.83 1.30 1.25 1.59 1.14 1.26 0.82 1.19 1.10 1.48 1.01 0.82 1.23 1.21 1.14 ],
```

#### Profile Model Parameter

```shell
# Show the number of parameters of all `tf.trainable_variables()` in the model.
# Requires --graph_path --op_log_path.
# Store the option for future commands.
tfprof> set -account_type_regexes _trainable_variables
tfprof> scope -max_depth 4 -select params
_TFProfRoot (--/464.15k params)
  init/init_conv/DW (3x3x3x16, 432/432 params)
  pool_logit/DW (64x10, 640/640 params)
  pool_logit/biases (10, 10/10 params)
  unit_last/final_bn/beta (64, 64/64 params)
  unit_last/final_bn/gamma (64, 64/64 params)
```

Where does `_trainable_variables` come from? It is a customized operation type
defined through the OpLogProto file. Users can
[Define Customized Operation Type](#define-customized-operation-type) themselves.

<b>The following example shows the importance of defining a customized
operation type.</b> In this example, extra `Variables` are created
implicitly by TensorFlow, and /Momentum is appended to their names.
They shouldn't be included in your model capacity calculation.

```shell
tfprof> scope -account_type_regexes VariableV2 -max_depth 4 -select params
_TFProfRoot (--/930.58k params)
  global_step (1/1 params)
  init/init_conv/DW (3x3x3x16, 432/864 params)
  pool_logit/DW (64x10, 640/1.28k params)
    pool_logit/DW/Momentum (64x10, 640/640 params)
  pool_logit/biases (10, 10/20 params)
    pool_logit/biases/Momentum (10, 10/10 params)
  unit_last/final_bn/beta (64, 64/128 params)
  unit_last/final_bn/gamma (64, 64/128 params)
  unit_last/final_bn/moving_mean (64, 64/64 params)
  unit_last/final_bn/moving_variance (64, 64/64 params)
```

#### Profile Device Placement

In this tutorial, a model is split across several GPUs on worker machines
and several parameter servers.

In tfprof, 'device' is an op_type. For example, if op1 and op2 are placed on
gpu:0, they share an operation type.

```shell
bazel-bin/tensorflow/core/profiler/profiler \
    --graph_path=/tmp/graph.pbtxt \
    --run_meta_path=/tmp/run_meta

# It looks like ps task 1 is holding twice as many parameters as task 0.
tfprof> scope -select device,params -account_type_regexes .*ps.*task:0.* -max_depth 1
_TFProfRoot (--/25.81m params)
tfprof> scope -select device,params -account_type_regexes .*ps.*task:1.* -max_depth 1
_TFProfRoot (--/58.84m params)
```

#### Define Customized Operation Type

First, in Python code, create an `OpLogProto` proto and add op type
information to it:

```python
op_log = tfprof_log_pb2.OpLogProto()
entry = op_log.log_entries.add()
entry.name = 'pool_logit/DW'
entry.types.append('pool_logit')
entry = op_log.log_entries.add()
entry.name = 'pool_logit/biases'
entry.types.append('pool_logit')
```

Second, call write_op_log to write the OpLogProto proto:

```python
tf.profiler.write_op_log(
    sess.graph, '/tmp/my_op_log_dir', op_log)

# Get run-time shape information in order to fill shapes and get flops.
tf.profiler.write_op_log(
    sess.graph, '/tmp/my_op_log_dir', op_log, run_meta)
```

Third, when starting the tfprof tool, specify
"--op_log_path=/tmp/my_op_log_dir/op_log".

```shell
tfprof> scope -account_type_regexes pool_logit -max_depth 4 -select params
_TFProfRoot (--/650 params)
  pool_logit/DW (64x10, 640/640 params)
  pool_logit/biases (10, 10/10 params)
```

Note that `tf.profiler.write_op_log(...)` automatically
assigns all `Variables` inside `tf.trainable_variables()` a customized
operation type: `_trainable_variables`.


#### Non-interactive Mode

Run tfprof in one-shot mode and dump the result to a file.

```shell
# By default, output goes to stdout. Use the -output option to change the output type.
tfprof scope --graph_path=graph.pbtxt \
    --max_depth=3 \
    --output="file:outfile=/tmp/dump"
Reading Files...
Parsing GraphDef...
Preparing Views...

cat /tmp/dump
_TFProfRoot (--/930.58k params)
  global_step (0/0 params)
  pool_logit/DW (64x10, 640/1.28k params)
  pool_logit/biases (10, 10/20 params)
```