# Graph Transform Tool

## Table of Contents

*   [Introduction](#introduction)
*   [Using the Graph Transform Tool](#using-the-graph-transform-tool)
*   [Inspecting Graphs](#inspecting-graphs)
*   [Common Use Cases](#common-use-cases)
    *   [Optimizing for Deployment](#optimizing-for-deployment)
    *   [Fixing Missing Kernel Errors on Mobile](#fixing-missing-kernel-errors-on-mobile)
    *   [Shrinking File Size](#shrinking-file-size)
    *   [Eight-bit Calculations](#eight-bit-calculations)
*   [Transform Reference](#transform-reference)
    *   [add_default_attributes](#add_default_attributes)
    *   [backport_concatv2](#backport_concatv2)
    *   [flatten_atrous_conv](#flatten_atrous_conv)
    *   [fold_batch_norms](#fold_batch_norms)
    *   [fold_constants](#fold_constants)
    *   [fold_old_batch_norms](#fold_old_batch_norms)
    *   [freeze_requantization_ranges](#freeze_requantization_ranges)
    *   [fuse_convolutions](#fuse_convolutions)
    *   [insert_logging](#insert_logging)
    *   [merge_duplicate_nodes](#merge_duplicate_nodes)
    *   [obfuscate_names](#obfuscate_names)
    *   [quantize_nodes](#quantize_nodes)
    *   [quantize_weights](#quantize_weights)
    *   [remove_attribute](#remove_attribute)
    *   [remove_device](#remove_device)
    *   [remove_control_dependencies](#remove_control_dependencies)
    *   [remove_nodes](#remove_nodes)
    *   [rename_attribute](#rename_attribute)
    *   [rename_op](#rename_op)
    *   [round_weights](#round_weights)
    *   [sparsify_gather](#sparsify_gather)
    *   [set_device](#set_device)
    *   [sort_by_execution_order](#sort_by_execution_order)
    *   [strip_unused_nodes](#strip_unused_nodes)
*   [Writing Your Own Transforms](#writing-your-own-transforms)
    *   [Transform Functions](#transform-functions)
    *   [Pattern Syntax](#pattern-syntax)
    *   [ReplaceMatchingOpTypes](#replacematchingoptypes)
    *   [Parameters](#parameters)
    *   [Function Libraries](#function-libraries)
    *   [Registering](#registering)

## Introduction

When you have finished training a model and want to deploy it in production,
you'll often want to modify it to better run in its final environment. For
example, if you're targeting a phone you might want to shrink the file size by
quantizing the weights, or optimize away batch normalization or other
training-only features. The Graph Transform framework offers a suite of tools
for modifying computational graphs, and a framework to make it easy to write
your own modifications.

This guide is structured into three main parts: first, tutorials on how to
perform common tasks; second, a reference covering all of the included
transforms, together with the options that apply to them; and third, a guide to
creating your own transforms.

## Using the Graph Transform Tool

The Graph Transform tool is designed to work on models that are saved as
GraphDef files, usually in a binary protobuf format. This is the low-level
definition of a TensorFlow computational graph, including a list of nodes and
the input and output connections between them. If you're using a Python API to
train your model, this will usually be saved out in the same directory as your
checkpoints, and usually has a `.pb` suffix.

If you want to work with the values of your trained parameters, for example to
quantize weights, you'll need to run
[tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
to convert the checkpoint values into embedded constants within the graph file
itself.
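
A typical freeze run looks something like this (a sketch; the file paths and
the output node name are placeholders for your own model's):

```bash
# Paths and the output node name below are placeholders for your own model.
python tensorflow/python/tools/freeze_graph.py \
--input_graph=/tmp/model/graph.pb \
--input_binary=true \
--input_checkpoint=/tmp/model/model.ckpt \
--output_node_names=softmax \
--output_graph=/tmp/model/frozen_graph.pb
```
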
You call the Graph Transform tool itself like this:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_old_batch_norms
'
```

The arguments here specify where to read the graph from, where to write the
transformed version to, what the input and output layers are, and which
transforms to modify the graph with. The transforms are given as a list of
names, and each can have arguments of its own. These transforms define the
pipeline of modifications that are applied in order to produce the output.
Sometimes you need some transforms to happen before others, and the ordering
within the list lets you specify which happen first.

Note that `remove_nodes(op=Identity, op=CheckNumerics)` will break models that
contain control flow operations, such as `tf.cond`, `tf.map_fn`, and
`tf.while_loop`.

## Inspecting Graphs

Many of the transforms that the tool supports need to know what the input and
output layers of the model are. The best source for these is the model training
process, where for a classifier the inputs will be the nodes that receive the
data from the training set, and the output will be the predictions. If you're
unsure, the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc)
tool can inspect the model and provide guesses about likely input and output
nodes, as well as other information that's useful for debugging. Here's an
example of how to use it on the [Inception V3
graph](http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz):

```bash
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
```

## Common Use Cases

This section has small guides for some of the most frequently-used
transformation pipelines, aimed at users who want to quickly accomplish one of
these tasks. A lot of them will use the Inception V3 model for their examples,
which can be downloaded from
[http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz).

### Optimizing for Deployment

If you've finished training your model and want to deploy it on a server or a
mobile device, you'll want it to run as fast as possible, and with as few
non-essential dependencies as possible. This recipe removes all of the nodes
that aren't called during inference, shrinks expressions that are always
constant into single nodes, and optimizes away some multiply operations used
during batch normalization by pre-multiplying the weights for convolutions.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

The batch norm folding is included twice because there are two different
flavors of batch normalization used in TensorFlow. The older version was
implemented with a single BatchNormWithGlobalNormalization op, but it was
deprecated in favor of a more recent approach using individual ops to implement
the same computation. The two transforms are in there so that both styles are
recognized and optimized.

### Fixing Missing Kernel Errors on Mobile

The mobile version of TensorFlow is focused on inference, and so by default the
list of supported ops (defined in
[tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD)
for Bazel and
[tensorflow/contrib/makefile/tf_op_files.txt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/tf_op_files.txt)
for make builds) doesn't include many that are only used during training. This
can cause `No OpKernel was registered to support Op` errors when a GraphDef is
loaded, even if the op isn't going to be executed.

If you see this error and it's an op that you do actually want to run on
mobile, then you'll need to make local modifications to the build files to
include the right .cc file that defines it. In a lot of cases though, the op is
just a vestigial remnant of the training process, and if that's true then you
can run the [strip_unused_nodes](#strip_unused_nodes) transform, specifying the
inputs and outputs of your inference usage, to remove those unnecessary nodes:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

### Shrinking File Size

If you're looking to deploy your model as part of a mobile app, then keeping
the download size as small as possible is important. For most TensorFlow
models, the largest contributors to the file size are the weights passed in to
convolutional and fully-connected layers, so anything that can reduce the
storage size for those is very useful. Luckily most neural networks are
resistant to noise, so it's possible to change the representation of those
weights in a lossy way without losing very much accuracy overall.

On both iOS and Android, app packages are compressed before download, so the
simplest way to reduce the bandwidth your users need to receive your app is to
provide raw data that compresses more easily.
By default the weights are stored as floating-point values, and even tiny
differences between numbers result in very different bit patterns, so these
don't compress very well. If you round the weights so that nearby numbers are
stored as exactly the same values, the resulting bit stream has a lot more
repetition and so compresses down a lot more effectively. To try this technique
on your model, run the [round_weights](#round_weights) transform.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  round_weights(num_steps=256)'
```

You should see that the `optimized_inception_graph.pb` output file is the same
size as the input, but if you run zip on it to compress it, it's almost 70%
smaller than if you zip the original! The nice thing about this transform is
that it doesn't change the structure of the graph at all, so it's running
exactly the same operations and should have the same latency and memory usage
as before. You can adjust the `num_steps` parameter to control how many values
each weight buffer is rounded to, so lower numbers will increase the
compression at the cost of accuracy.
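
To see the effect for yourself, compress the graphs from before and after the
transform and compare the results (a sketch using gzip, though any zip-style
compressor should show a similar pattern):

```bash
# --keep preserves the original files, --best uses maximum compression.
gzip --keep --best tensorflow_inception_graph.pb optimized_inception_graph.pb
ls -lh tensorflow_inception_graph.pb.gz optimized_inception_graph.pb.gz
```
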
As a further step, you can store the weights into eight-bit values directly.
Here's the recipe for that:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
```

You should see that the size of the output graph is about a quarter of the
original. The downside to this approach compared to round_weights is that extra
decompression ops are inserted to convert the eight-bit values back into
floating point, but optimizations in TensorFlow's runtime should ensure these
results are cached and so you shouldn't see the graph run any more slowly.

So far we've been concentrating on weights because those generally take up the
most space. If you have a graph with a lot of small nodes in it, the names of
those nodes can start to take up a noticeable amount of space too. To shrink
those down, you can run the [obfuscate_names](#obfuscate_names) transform,
which replaces all the names (except for inputs and outputs) with short,
cryptic but unique ids:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
obfuscate_names'
```

### Eight-bit Calculations

For some platforms it's very helpful to be able to do as many calculations as
possible in eight-bit, rather than floating-point. The support for this in
TensorFlow is still experimental and evolving, but you can convert models into
quantized form using the graph transform tool:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'
```

This process converts all the operations in the graph that have eight-bit
quantized equivalents, and leaves the rest in floating point. Only a subset of
ops are supported, and on many platforms the quantized code may actually be
slower than the float equivalents, but this is a way of increasing performance
substantially when all the circumstances are right.

A full guide to optimizing for quantization is beyond the scope of this guide,
but one thing that can help is using the FakeQuantWithMinMaxVars op after
Conv2D or similar operations during training. This trains the min/max variables
that control the range used for quantization, so that the range doesn't have to
be calculated dynamically by RequantizationRange during inference.

## Transform Reference

The `--transforms` string is parsed as a series of transform names, each of
which can have multiple named arguments inside parentheses. Arguments are
separated by commas, and double-quotes (") can be used to hold argument values
if they themselves contain commas (for example shape definitions).

The `--inputs` and `--outputs` flags are shared across all transforms, since
it's common to need to know what the ingoing and outgoing nodes in the graph
are. You should make sure you set these correctly before calling the graph
transform tool, and if you're in doubt check with the model's author, or use
the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs)
tool to examine likely inputs and outputs.

All transforms can be passed the `ignore_errors` flag, with the value set to
either true or false. By default any errors that happen within a transform will
abort the whole process, but if you enable this then an error will just be
logged and the transform skipped. This is especially useful for optional
transforms where version errors or other unimportant problems may trigger an
error.

### add_default_attributes

Args: None

When attributes are added to ops in new versions of TensorFlow, they often have
defaults to ensure backwards-compatible behavior with their original versions.
These defaults usually get added when the graph is loaded by the runtime, but
if your model is going to be processed outside of the main TensorFlow
framework, it can be useful to run this update process as a transform. This
process finds any op attributes that are defined in the current TensorFlow list
of ops but not within the saved model, and sets them to the defined default for
that attribute.
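
For example, a minimal invocation looks like this (a sketch; the graph and
layer names are placeholders for your own model's):

```bash
# Graph and layer names below are placeholders.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=updated_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='add_default_attributes'
```
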
### backport_concatv2

Args: None

If you have a GraphDef file that has been produced by a newer version of the
TensorFlow framework and includes ConcatV2, and you want to run it on an older
version that only supports Concat, this transform will take care of converting
those newer ops to the equivalent older form.

### flatten_atrous_conv

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform flattens atrous convolution, corresponding to a sequence of
SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular
Conv2D op with upsampled filters. This transform should only be used in order
to run graphs having atrous convolution on platforms that do not yet natively
support SpaceToBatchND and BatchToSpaceND operations. You will need to make
sure you run [fold_constants](#fold_constants) after this transform. If
applicable, you should run this transform before
[fold_batch_norms](#fold_batch_norms).
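
A pipeline that respects both of those ordering constraints might look like
this (a sketch; the graph and layer names are placeholders):

```bash
# Placeholder graph and layer names; fold_constants runs after
# flatten_atrous_conv, and fold_batch_norms runs after both.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=atrous_graph.pb \
--out_graph=flattened_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  flatten_atrous_conv
  fold_constants(ignore_errors=true)
  fold_batch_norms'
```
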
### fold_batch_norms

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform tries to optimize away the Mul that's introduced after a Conv2D
(or a MatMul) when batch normalization has been used during training. It scans
the graph for any channel-wise multiplies immediately after convolutions, and
multiplies the convolution's (or matrix multiplication's) weights with the Mul
instead so this can be omitted at inference time. You'll need to make sure you
run [fold_constants](#fold_constants) first, since the pattern can only be
spotted if the normal complex expression that's produced by training for the
Mul input is collapsed down into a simple constant.

### fold_constants

Args:

*   clear_output_shapes: Clears tensor shape information saved as attributes.
    Some older graphs contain out-of-date information that may cause import
    errors. Defaults to true.

Prerequisites: None

Looks for any sub-graphs within the model that always evaluate to constant
expressions, and replaces them with those constants. This optimization is
always executed at run-time after the graph is loaded, so running it offline
first won't help latency, but it can simplify the graph and so make further
processing easier. It's often useful to call this with
`fold_constants(ignore_errors=true)` to continue on past transient errors,
since this is just an optimization phase.

### fold_old_batch_norms

Args: None \
Prerequisites: None

In the early days of TensorFlow, batch normalization was implemented using a
single monolithic `BatchNormWithGlobalNormalization` op. In modern versions,
adding batch normalization from Python will give you a series of smaller math
ops instead, to achieve the same effect without special-purpose code. If you
have a graph that uses the older style, this transform will recognize and
optimize those ops for inference, in the same way that the
[fold_batch_norms](#fold_batch_norms) transform does for the new approach.

### freeze_requantization_ranges

Args:

*   min_max_log_file: Path to a log file containing ranges for ops.
*   min_percentile: Percentage cutoff to use to calculate an overall min.
    Defaults to 5.
*   max_percentile: Percentage cutoff to use to calculate an overall max.
    Defaults to 5.

Quantized operations like convolution or matrix multiplies take their inputs as
8-bit, but produce 32-bit results. To do further operations on these, they need
to be converted back down to the lower bit depth. To make the most of those
eight bits, you need to scale the 32-bit original data down using a scale that
matches the range that's actually being used.

Because that range information isn't stored in the original graph, the
[quantization process](#eight-bit-calculations) inserts RequantizationRange ops
before each conversion from 32 to 8 bits. This op looks at the 32-bit output
and calculates the current min and max every time it's run.

This isn't incredibly time-consuming, but it is extra work that's nice to avoid
if possible. One way of optimizing that away is replacing those
RequantizationRange ops with a pair of Const nodes holding known min/max
values, so the scaling down can be done without having to inspect the output
every time.

That's what this transform does. It's usually used in conjunction with a copy
of the graph that's had [insert_logging](#insert_logging) run on it to
instrument it to record the min/max values to stderr. Why is logging used
rather than writing to a normal file? As you'll see later, to get the best
results you want to collect data from a lot of runs on real data, and for
mobile apps especially it's a lot easier to do this by copying log files. As an
example, here's how you'd add the logging operations for a quantized version of
the Inception v3 graph:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/logged_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")\
'
```

Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write
out log statements that show the value of the min and max calculated by each
RequantizationRange op. Here's an example of running label_image and saving the
log:

```bash
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--image=${HOME}/Downloads/grace_hopper.jpg \
--input_layer=Mul \
--output_layer=softmax \
--graph=/tmp/logged_quantized_inception.pb \
--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \
2>/tmp/min_max_log_small.txt
```

If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like
this:

```
I0108 21:45:42.261883 1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715]
```

This is a simple way of serializing the name of the RequantizationRange op and
its min/max values every time it's run. It's a file like this that you pass
into the transform as the `min_max_log_file` argument. The transform will
attempt to extract all of the min/max values associated with ops, ignoring any
irrelevant lines in the log, and replace the RequantizationRange ops with two
Const nodes containing the found values.
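
For example, once you have a log file you can apply the captured ranges like
this (a sketch following on from the commands above; the output graph name is a
placeholder):

```bash
# Uses the log collected by the label_image run above; the output graph
# name is a placeholder.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/fixed_range_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
freeze_requantization_ranges(min_max_log_file="/tmp/min_max_log_small.txt")'
```
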
This isn't the whole story though. The min/max values can vary a lot depending
on what the particular inputs to the graph are on any given run, which means
picking ranges based on just one run can lead to clipping of values and a loss
of accuracy. To get better results, you need to run your network against a
range of different inputs. In Inception's case, I often use a thousand
different images from the training set. You can then pass the whole
concatenated log from all of the runs into the transform, and it will pick
ranges based on the aggregate of the values found for each RequantizationRange
op.

To ensure that outliers don't increase the range too much, and so decrease the
accuracy by putting too many bits into rare extreme values, the
`min_percentile` and `max_percentile` arguments control how the overall min and
max are chosen. At their default values of 5, the lowest 5% of the minimum
values will be discarded and the minimum of the remainder taken, with the
equivalent applied to the maximum.

### fuse_convolutions

Args: None \
Prerequisites: None

For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g.
to scale up in the later stages of an image style transfer model), it can
improve memory usage and latency to combine the spatial transformations with
the convolution's im2col patch generation. This transform looks out for that
particular pattern of ops and replaces them with a fused version that combines
the resizing and padding with the convolution.

### insert_logging

Args:

*   op: Insert a Print after every occurrence of this op type. Can be repeated
    to cover multiple types. If not present, all op types will be instrumented.
*   prefix: Insert a Print after every node whose name starts with this value.
    Can be repeated to cover multiple nodes. If not present, all node names
    will be matched.
*   show_op: If true, the op type will be prepended to all log messages.
*   show_name: If true, the node's name will be prepended to all log messages.
*   message: Arbitrary text to log before the values.
*   first_n: How many times to print before suppressing. Defaults to -1, which
    means never stop.
*   summarize: How long numerical results can be before they're truncated.
    Defaults to 1024.

The Print operator writes strings to stderr when it's run inside a graph, and
prints out the numerical results of the node that it's reading from. This can
be very useful when you're debugging and want to follow particular internal
values while a graph is running. This transform allows you to insert those ops
at particular points in the graph, and customize the message that's displayed.
It's also used in conjunction with the
[freeze_requantization_ranges](#freeze_requantization_ranges) transform to
output information that it needs.
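
As a quick illustration, here's how you might print the first ten sets of
values flowing out of every Relu op (a sketch; pick whichever op types you're
actually debugging):

```bash
# Relu and first_n=10 are just example choices; the graph names are
# placeholders.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=logged_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='insert_logging(op=Relu, show_name=true, first_n=10)'
```
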
### merge_duplicate_nodes

Args: None \
Prerequisites: None

If there are Const nodes with the same types and contents, or nodes with the
same inputs and attributes, this transform will merge them together. It can be
useful when you want to cut down the number of nodes in a graph that has a lot
of redundancy (e.g. this transform is always run as part of
[quantize_nodes](#quantize_nodes), since the processing there can introduce
duplicates of constants that are used in the quantize/dequantize process).

### obfuscate_names

Args: None \
Prerequisites: None

Replaces all nodes' names with short generated ids, other than the inputs and
outputs. This also updates all references within the graph so that the
structure is preserved. This can be useful if you want to shrink the file size,
or if you want to make it harder to understand the architecture of your model
before releasing it.

### quantize_nodes

Args:

*   input_min: The lowest float value for any quantized placeholder inputs.
*   input_max: The highest float value for any quantized placeholder inputs. If
    both input_min and input_max are set, then any float placeholders in the
    graph will be replaced with quantized versions, and consts will be created
    to pass the range to subsequent operations.
*   fallback_min: The lowest float value to use for requantizing activation
    layers.
*   fallback_max: The highest float value to use for requantizing activation
    layers. If both fallback_min and fallback_max are set, then instead of
    using RequantizationRange ops to figure out the useful range dynamically
    when converting the 32-bit output of ops like QuantizedConv2D and
    QuantizedBiasAdd, hardwired consts with these values will be used instead.
    This can help performance if you know the range of your activation layers
    ahead of time.

Prerequisites: [quantize_weights](#quantize_weights)

Replaces any calculation nodes with their eight-bit equivalents (if available),
and adds in conversion layers to allow remaining float operations to
interoperate. This is one of the most complex transforms, and involves multiple
passes and a lot of rewriting. It's also still an active area of research, so
results may vary depending on the platform and operations you're using in your
model. You should run quantize_weights first to ensure your Const ops are in
eight-bit form.
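
For example, if you've already measured typical activation ranges for your
model, you can hardwire them and skip the dynamic range calculation (a sketch;
the -10.0 to 10.0 range here is made up and should come from your own
measurements):

```bash
# The fallback range below is a made-up example; measure your own model's
# activation ranges before using this.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=quantized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  quantize_weights
  quantize_nodes(fallback_min=-10.0, fallback_max=10.0)'
```
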
### quantize_weights

Args:

*   minimum_size: Tensors with fewer elements than this won't be quantized
    (defaults to 1024).

Prerequisites: None

Converts any large (more than minimum_size) float Const op into an eight-bit
equivalent, followed by a float conversion op so that the result is usable by
subsequent nodes. This is mostly useful for [shrinking file
sizes](#shrinking-file-size), but also helps with the more advanced
[quantize_nodes](#quantize_nodes) transform. Even though there are no
prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances
down to zero may cause significant loss of precision.

### remove_attribute

Args:

*   attribute_name: Name of the attribute you want to remove.
*   op_name: Optional name of a single op to restrict the removal to.

Prerequisites: None

Deletes the given attribute from either all nodes, or just the one specified in
`op_name`. This can be a dangerous transform since it's easy to leave your
graph in an invalid state if you remove a required attribute. It can be useful
in special circumstances though.

### remove_device

Args: None \
Prerequisites: None

All ops can have a hardware device specified. This can be a problem when you're
loading a graph on a different system than the model was trained on, since some
specified devices may not be available. In order to work with graphs like
these, you can run this transform to wipe the slate clean and delete the device
specifier from all ops.

### remove_control_dependencies

Args: None \
Prerequisites: None

Removes all control dependencies from the graph.

### remove_nodes

Args:

*   op: The name of the op you want to remove. Can be repeated to remove
    multiple ops.

Prerequisites: None

This is a potentially dangerous transform that looks for single-input,
single-output ops with the given names, removes them from the graph, and
rewires all inputs that used to pull from them to pull from the preceding node
instead. This is most useful for getting rid of ops like `CheckNumerics` that
are useful during training but just complicate the graph and increase latency
during inference. It's dangerous because it's possible that removing some ops
may change the output of your graph, so make sure you check the overall
accuracy after using this.

### rename_attribute

Args:

*   old_attribute_name: Current name of the attribute you want to rename.
*   new_attribute_name: Name that you want the attribute to have now.
*   op_name: If this is set, only change attributes for a given op type,
    otherwise apply to all nodes with attribute names that match.

Prerequisites: None

Changes the name of the given attribute. This is often useful for upgrading
graph files as op definitions change over versions, since the renaming is often
enough to deal with minor changes.

### rename_op

Args:

*   old_op_name: Current name of the operation.
*   new_op_name: Name to change to.

Prerequisites: None

Finds all ops with the given name, and changes them to the new one. This can be
useful for version upgrading if the changes between ops are minor apart from
the name.

### round_weights

Args:

*   num_steps: How many unique values to use in each buffer.

Prerequisites: None

Rounds all float values in large Const ops (more than 15 elements) to the given
number of steps. The unique values are chosen per buffer by linearly allocating
between the largest and smallest values present. This is useful when you'll be
deploying on mobile, and you want a model that will compress effectively. See
[shrinking file size](#shrinking-file-size) for more details. Even though there
are no prerequisites, it is advisable to run
[fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances
down to zero may cause significant loss of precision.

### sparsify_gather

Args: None \
Prerequisites: None

Transforms 'Gather' ops into a sparsified version, where the 'params' input of
each 'Gather' is changed from a dense 'Const' to a 'HashTable', and the
'Gather' op itself is replaced by a hashtable lookup. This is mostly useful for
reducing the memory footprint of sparse TF.learn linear models.

### set_device

Args:

*   device: What device to assign to ops.
*   if_default: If this is true, only assign to ops with empty existing
    devices.

Updates nodes to use the specified device. A device is a way to tell the code
that executes the graph which piece of hardware it should run particular nodes
on. The right assignment to use may change between training and deployment, so
this transform (and [remove_device](#remove_device)) provides a way of updating
the placement. If the `if_default` parameter is set, then only ops that don't
have a device assigned already will be updated. This is mostly useful for
preprocessing graphs for other stages that expect all ops to have an explicit
device assigned.
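
For example, to pin every op that doesn't already have an assignment onto the
CPU (a sketch; use whatever device string matches your setup):

```bash
# "/cpu:0" is an example device string; graph names are placeholders.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=cpu_pinned_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='set_device(device="/cpu:0", if_default=true)'
```
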
### sort_by_execution_order

Args: None \
Prerequisites: None

Arranges the nodes in the GraphDef in topological order, so that the inputs of
any given node are always earlier than the node itself. This is especially
useful when you're targeting a minimal inference engine, since you can just
execute the nodes in the given order knowing that the inputs will be computed
before they're needed.

### strip_unused_nodes

Args:

*   type: Default type for any new Placeholder nodes generated, for example
    int32, float, quint8.
*   shape: Default shape for any new Placeholder nodes generated, as
    comma-separated dimensions. For example shape="1,299,299,3". The double
    quotes are important, since otherwise the commas will be taken as argument
    separators.
*   name: Identifier for the placeholder arguments.
*   type_for_name: What type to use for the previously-given name.
*   shape_for_name: What shape to use for the previously-given name.

Prerequisites: None

Removes all nodes not used in calculating the layers given in `--outputs`, fed
by `--inputs`. This is often useful for removing training-only nodes like
save-and-restore or summary ops. It's also handy for solving the [missing
kernel errors problem](#fixing-missing-kernel-errors-on-mobile) when there are
decode or other ops you don't need in the inference path.

The biggest complication is that it sometimes has to create new Placeholder
ops, so there are options to control their characteristics. This will happen if
you bypass a DecodeJpeg op by specifying an input layer deeper in the network,
for example, so you can pass in a raw image array instead of an encoded string
as an input. The decode op will be removed, together with the Placeholder that
fed it, but a new Placeholder is needed for the input layer you specify. The
type and shape arguments let you control the attributes of any new Placeholders
that are created. Plain `type` and `shape` set global defaults, but if you have
different inputs with varying characteristics, you'll need to pass in a list of
arguments where the preceding name specifies what layer each applies to. For
example, if you had two inputs in1 and in2, you could call
`strip_unused_nodes(name=in1, type_for_name=int32, shape_for_name="2,3",
name=in2, type_for_name=float, shape_for_name="1,10,10,3")`, as shown in the
full command below.
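
Spelled out as a full command, that call might look like this (a sketch; `in1`
and `in2` are the hypothetical inputs from the example above, and the graph
names are placeholders):

```bash
# in1/in2 are the hypothetical input layers from the text above.
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=graph_with_two_inputs.pb \
--out_graph=stripped_graph.pb \
--inputs='in1,in2' \
--outputs='softmax' \
--transforms='strip_unused_nodes(name=in1, type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float, shape_for_name="1,10,10,3")'
```
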
## Writing Your Own Transforms

The Graph Transform Tool is designed to make it as easy as possible to create
your own optimization, modification, and pre-processing transforms. At their
heart, all of the transforms take in a valid GraphDef, make some changes, and
output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining
one node in the graph and its connections. You can find more information on the
format at [this guide to TensorFlow model
files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html),
but for a simple example take a look at
[tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc),
which implements the [rename_op](#rename_op) transform:

```C++
Status RenameOp(const GraphDef& input_graph_def,
                const TransformFuncContext& context,
                GraphDef* output_graph_def) {
  if (!context.params.count("old_op_name") ||
      (context.params.at("old_op_name").size() != 1) ||
      !context.params.count("new_op_name") ||
      (context.params.at("new_op_name").size() != 1)) {
    return errors::InvalidArgument(
        "rename_op expects exactly one 'old_op_name' and 'new_op_name' "
        "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)");
  }

  const string old_op_name = context.params.at("old_op_name")[0];
  const string new_op_name = context.params.at("new_op_name")[0];
  output_graph_def->Clear();
  for (const NodeDef& node : input_graph_def.node()) {
    NodeDef* new_node = output_graph_def->mutable_node()->Add();
    new_node->CopyFrom(node);
    if (node.op() == old_op_name) {
      new_node->set_op(new_op_name);
    }
  }

  return Status::OK();
}

REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp);
```

The heart of this transform is the loop through the input_graph_def's nodes. We
go through each op, add a new one to the output, copy the original's contents,
and then change the op over if it matches the parameters. There's a standard
set of parameters for every transform, so they all take in a GraphDef and
context, and write out into a new GraphDef. The registration macro at the
bottom lets the tool know what function to call when it finds the `rename_op`
string in a transforms list.

### Transform Functions

The standard signature that all transform functions have is defined as
`TransformFunc`, which takes in an input GraphDef, a `TransformFuncContext`
containing environment information, writes to an output GraphDef, and returns a
Status indicating whether the transform succeeded.

The `TransformFuncContext` has a list of the inputs and outputs for the graph,
and the [parameter arguments](#parameters) that were passed into the transform
by the user.

If you write a function that matches this signature, and [register
it](#registering), the graph transform tool will take care of calling it.

### Pattern Syntax

The `rename_op` example only needs to look at a single node at a time, but one
of the most common needs is to modify small sub-graphs within a model. To make
this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This
is a simple and compact way to specify patterns of nodes that you want to look
for. The format is:

```
OP_TYPE_PATTERN ::= "{" OP "," INPUTS "}"
INPUTS ::= OP_TYPE_PATTERN ("," OP_TYPE_PATTERN)*
```

The `OP` field can either contain a single "*", which means match any op type,
one op type (for example "Const"), or a set of op types separated by `|`
symbols (for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not
supported, just these special cases.

You can think of these patterns as very limited regular expressions designed to
pick out sub-trees in graphs.
They are deliberately very constrained to the
kinds of things we commonly find ourselves needing to do, to make creating and
debugging as straightforward as possible.

For example, if you want all Conv2D nodes that have a constant as their second
input, you would set up a pattern like this, using C++ initializer lists to
populate the structure:

```C++
OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}});
```

It can be easier to visualize these initializers using indentation to show the
tree structure more clearly:

```C++
OpTypePattern conv_pattern({
  "Conv2D",
  {
    {"*"},
    {"Const"}
  }
});
```

In plain English this says: a Conv2D op with two inputs, the first of which can
be any op type, and the second of which is a Const op.

Here's a much more complex example, from the [quantize_nodes](#quantize_nodes)
transform:

```C++
{"QuantizeV2",
  {
    {"Dequantize"},
    {"Min",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
    {"Max",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
  }
}
```

This is looking for QuantizeV2 nodes, with three inputs, the first of which is
a Dequantize, the second is a Min that ultimately pulls from a Dequantize, and
the third is a Max which does the same. Assuming we know the Dequantize ops are
pulling from the same eight-bit buffer, the end result of this sub-graph is a
no-op, since it's just turning the eight-bit buffer into float, and then
immediately converting it back to eight bits, so if we look for this pattern
and remove it we can optimize the graph without changing the result.

### ReplaceMatchingOpTypes

It's very common to want to find all occurrences of a particular sub-graph in a
model, and replace them all with a different sub-graph that keeps the same
local input and output connections. For example with
[fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that
read their inputs from ResizeBilinear ops, and replace those combinations with
a single FusedResizeAndPadConv2D op, but without affecting other ops.

To make that sort of transformation easy, we created the
`ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern`
defining the sub-graph to look for, and a callback function to run for every
occurrence it finds. The job of this callback function is to look at the
`NodeMatch` that contains information about the current sub-graph, and return a
new sub-graph in the new_nodes list that will be used to replace the old
sub-graph.

You can see how it's used in practice in the
[fuse_convolutions](#fuse_convolutions) code:

```C++
TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes(
    input_graph_def,  // clang-format off
    {"Conv2D",
        {
            {"ResizeBilinear"},
            {"*"}
        }
    },  // clang-format on
    [](const NodeMatch& match, const std::set<string>& input_nodes,
       const std::set<string>& output_nodes,
       std::vector<NodeDef>* new_nodes) {
      // Find all the nodes we expect in the subgraph.
      const NodeDef& conv_node = match.node;
      const NodeDef& resize_node = match.inputs[0].node;
      const NodeDef& weights_node = match.inputs[1].node;

      // We'll be reusing the old weights.
      new_nodes->push_back(weights_node);

      // Create a 'no-op' mirror padding node that has no effect.
      NodeDef pad_dims_node;
      pad_dims_node.set_op("Const");
      pad_dims_node.set_name(conv_node.name() + "_dummy_paddings");
      SetNodeAttr("dtype", DT_INT32, &pad_dims_node);
      SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0},
                               &pad_dims_node);
      new_nodes->push_back(pad_dims_node);

      // Set up the new fused version of the convolution op.
      NodeDef fused_conv;
      fused_conv.set_op("FusedResizeAndPadConv2D");
      fused_conv.set_name(match.node.name());
      AddNodeInput(resize_node.input(0), &fused_conv);
      AddNodeInput(resize_node.input(1), &fused_conv);
      AddNodeInput(pad_dims_node.name(), &fused_conv);
      AddNodeInput(conv_node.input(1), &fused_conv);
      CopyNodeAttr(resize_node, "align_corners", "resize_align_corners",
                   &fused_conv);
      SetNodeAttr("mode", "REFLECT", &fused_conv);
      CopyNodeAttr(conv_node, "T", "T", &fused_conv);
      CopyNodeAttr(conv_node, "padding", "padding", &fused_conv);
      CopyNodeAttr(conv_node, "strides", "strides", &fused_conv);
      new_nodes->push_back(fused_conv);

      return Status::OK();
    },
    {}, &replaced_graph_def));
```

Here you can see we define the pattern to look for, and in the callback
function use information from each of the nodes in the old sub-graph to create
a new fused node. We also copy over the old weights input node so that isn't
lost.

There are a few things to know about the `ReplaceMatchingOpTypes` function:

*   All of the nodes in any matching sub-graphs are removed from the new graph
    created by the function. If any of them are needed, it's the callback
    function's responsibility to add them back in. There's a
    `CopyOriginalMatch` convenience call that will copy over all of the
    original nodes if you decide you don't actually want to modify a particular
    sub-graph.

*   It is assumed that the same nodes will never appear in more than one
    matched sub-graph. This is to ensure that sub-trees are only replaced once,
    but it may mean that some sub-graphs aren't spotted if they overlap with
    earlier matches.

*   The calling framework tries to ensure that the graph remains sane, by
    looking at the new_nodes that are returned and making sure that no nodes
    which are needed as inputs by nodes outside the sub-graph are removed.
    These important nodes are listed in the `output_nodes` argument that's
    passed into each replacement function call. You can disable this checking
    by setting `allow_inconsistencies` to true in the options, but otherwise
    any replacements that break the graph constraints will be canceled. If you
    do allow inconsistencies, it's your transform's responsibility to fix them
    up before you return your final result. Functions like `RenameNodeInputs`
    can be useful if you are doing wholesale node renaming, for example.

### Parameters

The arguments that are in parentheses after the transform name when the tool is
called are parsed and placed into the params member of the TransformFuncContext
that's given to each transform. For every named argument, there's a vector of
strings containing all the values that it was given, in the order they were
given.
These are treated a bit like command-line parameters, and it's the
transform's responsibility to parse them into the data types it needs, and
raise errors by returning a bad Status if any of them are ill-formed.

As an example, here's a hypothetical transform call:

```
some_transform(foo=a, foo=b, bar=2, bob="1,2,3")
```

Here's what the std::map of strings looks like in the params member:

```
{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}}
```

The double quotes around the comma-separated argument to `bob` are important
because otherwise they'll be treated as separate arguments, and the parsing
will fail.

Here's an example of how [round_weights](#round_weights) reads its `num_steps`
parameter:

```C++
TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps));
```

If the conversion fails or the parameter occurs more than once, the helper
function will raise a meaningful error through the status result of the
transform. If the parameter isn't specified at all, the default will be used.

### Function Libraries

A newer feature of TensorFlow is the ability to create libraries of functions
as part of graphs. These are a bit like templates, which define macro
operations in terms of smaller components, which can then be instantiated with
different input and output connections inside the graph just like regular ops.
Right now the graph transform tool just copies these libraries between the
input and output graphs, but it's likely that more complex operations will be
supported on them in the future.

### Registering

The Graph Transform Tool associates names of transforms with the code to
implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a
string and a function, and automagically registers the transform with the tool.
You will need to watch out for a few things though:

*   Because it's using global C++ objects in each file under the hood, the
    linker can sometimes strip them out and lose the registration. In Bazel you
    need to make sure you're linking any new transforms in as libraries, and
    set the `alwayslink` flag on those `cc_library` rules so the registration
    isn't discarded.

*   You should be able to create your own copy of the transform_graph tool by
    linking against the transform_graph_main_lib library in
    tensorflow/tools/graph_transforms/BUILD. This contains all the `main()`
    logic to parse command line arguments and call transforms.