# Graph Transform Tool

## Table of Contents

*   [Introduction](#introduction)
*   [Using the Graph Transform Tool](#using-the-graph-transform-tool)
*   [Inspecting Graphs](#inspecting-graphs)
*   [Common Use Cases](#common-use-cases)
    *   [Optimizing for Deployment](#optimizing-for-deployment)
    *   [Fixing Missing Kernel Errors on Mobile](#fixing-missing-kernel-errors-on-mobile)
    *   [Shrinking File Size](#shrinking-file-size)
    *   [Eight-bit Calculations](#eight-bit-calculations)
*   [Transform Reference](#transform-reference)
    *   [add_default_attributes](#add_default_attributes)
    *   [backport_concatv2](#backport_concatv2)
    *   [flatten_atrous_conv](#flatten_atrous_conv)
    *   [fold_batch_norms](#fold_batch_norms)
    *   [fold_constants](#fold_constants)
    *   [fold_old_batch_norms](#fold_old_batch_norms)
    *   [freeze_requantization_ranges](#freeze_requantization_ranges)
    *   [fuse_convolutions](#fuse_convolutions)
    *   [insert_logging](#insert_logging)
    *   [merge_duplicate_nodes](#merge_duplicate_nodes)
    *   [obfuscate_names](#obfuscate_names)
    *   [quantize_nodes](#quantize_nodes)
    *   [quantize_weights](#quantize_weights)
    *   [remove_attribute](#remove_attribute)
    *   [remove_device](#remove_device)
    *   [remove_control_dependencies](#remove_control_dependencies)
    *   [remove_nodes](#remove_nodes)
    *   [rename_attribute](#rename_attribute)
    *   [rename_op](#rename_op)
    *   [round_weights](#round_weights)
    *   [sparsify_gather](#sparsify_gather)
    *   [set_device](#set_device)
    *   [sort_by_execution_order](#sort_by_execution_order)
    *   [strip_unused_nodes](#strip_unused_nodes)
*   [Writing Your Own Transforms](#writing-your-own-transforms)
    *   [Transform Functions](#transform-functions)
    *   [Pattern Syntax](#pattern-syntax)
    *   [ReplaceMatchingOpTypes](#replacematchingoptypes)
    *   [Parameters](#parameters)
    *   [Function Libraries](#function-libraries)
    *   [Registering](#registering)

## Introduction

When you have finished training a model and want to deploy it in production,
you'll often want to modify it to better run in its final environment. For
example, if you're targeting a phone you might want to shrink the file size by
quantizing the weights, or optimize away batch normalization or other
training-only features. The Graph Transform framework offers a suite of tools
for modifying computational graphs, and a framework that makes it easy to write
your own modifications.

This guide is structured into three main parts: first, tutorials on performing
common tasks; second, a reference covering all of the included transforms,
together with the options that apply to them; and third, a guide to creating
your own transforms.

## Using the Graph Transform Tool

The Graph Transform tool is designed to work on models that are saved as
GraphDef files, usually in a binary protobuf format. This is the low-level
definition of a TensorFlow computational graph, including a list of nodes and
the input and output connections between them. If you're using a Python API to
train your model, this will usually be saved out in the same directory as your
checkpoints, and typically has a '.pb' suffix.

If you want to work with the values of your trained parameters, for example to
quantize weights, you'll need to run
[tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
to convert the checkpoint values into embedded constants within the graph file
itself.

You call the Graph Transform tool itself like this:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_old_batch_norms
'
```

The arguments here specify where to read the graph from, where to write the
transformed version to, what the input and output layers are, and what
transforms to modify the graph with. The transforms are given as a list of
names, and each can have arguments of its own. These transforms define the
pipeline of modifications that are applied in order to produce the output.
Sometimes you need some transforms to happen before others, and the ordering
within the list lets you specify which happen first.
Note that `remove_nodes(op=Identity, op=CheckNumerics)` will break models that
contain control flow operations, such as `tf.cond`, `tf.map_fn`, and
`tf.while`.

## Inspecting Graphs

Many of the transforms that the tool supports need to know what the input and
output layers of the model are. The best source for these is the model training
process, where for a classifier the inputs will be the nodes that receive the
data from the training set, and the output will be the predictions. If you're
unsure, the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc)
tool can inspect the model and provide guesses about likely input and output
nodes, as well as other information that's useful for debugging. Here's an
example of how to use it on the [Inception V3
graph](http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz):

```bash
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
```

## Common Use Cases

This section has small guides for some of the most frequently-used
transformation pipelines, aimed at users who want to quickly accomplish one of
these tasks. A lot of them will use the Inception V3 model for their examples,
which can be downloaded from
[http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz).

### Optimizing for Deployment

If you've finished training your model and want to deploy it on a server or a
mobile device, you'll want it to run as fast as possible, and with as few
non-essential dependencies as you can. This recipe removes all of the nodes that
aren't called during inference, shrinks expressions that are always constant
into single nodes, and optimizes away some multiply operations used during batch
normalization by pre-multiplying the weights for convolutions.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

The batch norm folding is included twice because there are two different flavors
of batch normalization used in TensorFlow. The older version was implemented
with a single BatchNormWithGlobalNormalization op, but it was deprecated in
favor of a more recent approach using individual ops to implement the same
computation. The two transforms are in there so that both styles are recognized
and optimized.

### Fixing Missing Kernel Errors on Mobile

The mobile version of TensorFlow is focused on inference, and so by default the
list of supported ops (defined in
[tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD)
for Bazel and
[tensorflow/contrib/makefile/tf_op_files.txt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/tf_op_files.txt)
for make builds) doesn't include many ops that are only used during training.
This can cause `No OpKernel was registered to support Op` errors when a
GraphDef is loaded, even if the op isn't going to be executed.

If you see this error and it's an op that you do actually want to run on mobile,
then you'll need to make local modifications to the build files to include the
right .cc file that defines it. In a lot of cases, though, the op is just a
vestigial remnant of the training process, and if that's true then you can run
the [strip_unused_nodes](#strip_unused_nodes) transform, specifying the inputs
and outputs of your inference usage, to remove those unnecessary nodes:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

### Shrinking File Size

If you're looking to deploy your model as part of a mobile app, then keeping the
download size as small as possible is important. For most TensorFlow models, the
largest contributors to the file size are the weights passed in to convolutional
and fully-connected layers, so anything that can reduce the storage size for
those is very useful. Luckily most neural networks are resistant to noise, so
it's possible to change the representation of those weights in a lossy way
without losing very much accuracy overall.

On both iOS and Android, app packages are compressed before download, so the
simplest way to reduce the bandwidth your users need to receive your app is to
provide raw data that compresses more easily. By default the weights are stored
as floating-point values, and even tiny differences between numbers result in
very different bit patterns, so the data doesn't compress very well. If you
round the weights so that nearby numbers are stored as exactly the same values,
the resulting bit stream has a lot more repetition and so compresses down a lot
more effectively. To try this technique on your model, run the
[round_weights](#round_weights) transform.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  round_weights(num_steps=256)'
```

You should see that the `optimized_inception_graph.pb` output file is the same
size as the input, but if you run zip on it to compress it, it's almost 70%
smaller than if you zip the original! The nice thing about this transform is
that it doesn't change the structure of the graph at all, so it's running
exactly the same operations and should have the same latency and memory usage as
before. You can adjust the `num_steps` parameter to control how many unique
values each weight buffer is rounded to; lower numbers increase the compression
at the cost of accuracy.

As a further step, you can store the weights as eight-bit values directly.
Here's the recipe for that:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
```

You should see that the size of the output graph is about a quarter of the
original. The downside to this approach compared to round_weights is that extra
decompression ops are inserted to convert the eight-bit values back into
floating point, but optimizations in TensorFlow's runtime should ensure these
results are cached, so you shouldn't see the graph run any more slowly.

So far we've been concentrating on weights because those generally take up the
most space. If you have a graph with a lot of small nodes in it, the names of
those nodes can start to take up a noticeable amount of space too. To shrink
those down, you can run the [obfuscate_names](#obfuscate_names) transform, which
replaces all the names (except for inputs and outputs) with short, cryptic but
unique ids:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
  obfuscate_names'
```

### Eight-bit Calculations

For some platforms it's very helpful to be able to do as many calculations as
possible in eight-bit, rather than floating-point. The support for this in
TensorFlow is still experimental and evolving, but you can convert models into
quantized form using the graph transform tool:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'
```

This process converts all the operations in the graph that have eight-bit
quantized equivalents, and leaves the rest in floating point. Only a subset of
ops are supported, and on many platforms the quantized code may actually be
slower than the float equivalents, but this is a way of increasing performance
substantially when all the circumstances are right.

A full guide to optimizing for quantization is beyond the scope of this guide,
but one thing that can help is using the FakeQuantWithMinMaxVars op after Conv2D
or similar operations during training. This trains the min/max variables that
control the range used for quantization, so that the range doesn't have to be
calculated dynamically by RequantizationRange during inference.

## Transform Reference

The --transforms string is parsed as a series of transform names, each of which
can have multiple named arguments inside parentheses. Arguments are separated by
commas, and double-quotes (") can be used to hold argument values if they
themselves contain commas (for example shape definitions).

The --inputs and --outputs are shared across all transforms, since it's common
to need to know what the ingoing and outgoing nodes in the graph are. You should
make sure you set these correctly before calling the graph transform tool, and
if you're in doubt check with the model's author, or use the [`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs) tool
to examine likely inputs and outputs.

All transforms can be passed the `ignore_errors` flag, with the value set to
either true or false. By default any errors that happen within a transform will
abort the whole process, but if you enable this then an error will just be
logged and the transform skipped. This is especially useful for optional
transforms where version errors or other unimportant problems may trigger an
error.
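
For example, to let the constant folding continue past problems while still
treating failures in the other transforms as fatal, you can pass the flag to
just that one transform:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  fold_constants(ignore_errors=true)
  fold_batch_norms'
```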

### add_default_attributes

Args: None

When attributes are added to ops in new versions of TensorFlow, they often have
defaults to ensure backwards compatible behavior with their original versions.
These defaults usually get added when the graph is loaded by the runtime, but if
your model is going to be processed outside of the main TensorFlow framework it
can be useful to run this update process as a transform. This process finds any
op attributes that are defined in the current TensorFlow list of ops but not
within the saved model, and sets them to the defined default for that attribute.

### backport_concatv2

Args: None

If you have a GraphDef file that has been produced by a newer version of the
TensorFlow framework and includes ConcatV2, and you want to run it on an older
version that only supports Concat, this transform will take care of converting
those newer ops to the equivalent older form.

### flatten_atrous_conv

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform flattens atrous convolution, corresponding to a sequence of
SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular
Conv2D op with upsampled filters. This transform should only be used in order
to run graphs having atrous convolution on platforms that do not yet natively
support SpaceToBatchND and BatchToSpaceND operations. You will need to make
sure you run [fold_constants](#fold_constants) after this transform. If
applicable, you should run this transform before
[fold_batch_norms](#fold_batch_norms).
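
As a sketch of that ordering (the graph file and layer names here are
placeholders for your own model's), a pipeline for a graph containing atrous
convolutions might look like:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=atrous_graph.pb \
--out_graph=flattened_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  fold_constants(ignore_errors=true)
  flatten_atrous_conv
  fold_constants(ignore_errors=true)
  fold_batch_norms'
```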

### fold_batch_norms

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform tries to optimize away the Mul that's introduced after a Conv2D
(or a MatMul) when batch normalization has been used during training. It scans
the graph for any channel-wise multiplies immediately after convolutions, and
multiplies the convolution's (or matrix multiplication's) weights by the Mul's
constant instead, so the Mul can be omitted at inference time. You'll need to
make sure you run [fold_constants](#fold_constants) first, since the pattern can
only be spotted if the normal complex expression that's produced by training for
the Mul input is collapsed down into a simple constant.

### fold_constants

Args:

*   clear_output_shapes: Clears tensor shape information saved as attributes.
    Some older graphs contain out-of-date information and may cause import
    errors. Defaults to true.

Prerequisites: None

Looks for any sub-graphs within the model that always evaluate to constant
expressions, and replaces them with those constants. This optimization is always
executed at run-time after the graph is loaded, so running it offline first
won't help latency, but it can simplify the graph and so make further processing
easier. It's often useful to call this with `fold_constants(ignore_errors=true)`
to continue on past transient errors, since this is just an optimization phase.
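
If you know the shape attributes saved in your graph are still accurate and you
want to keep them, you can turn the clearing off. Here's a minimal sketch (the
graph and layer names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=folded_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  fold_constants(clear_output_shapes=false, ignore_errors=true)'
```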

### fold_old_batch_norms

Args: None \
Prerequisites: None

In the early days of TensorFlow, batch normalization was implemented using a
single monolithic `BatchNormWithGlobalNormalization` op. In modern versions,
adding batch normalization from Python will give you a series of smaller math
ops instead, to achieve the same effect without special-purpose code. If you
have a graph that uses the older style, this transform will recognize and
optimize those ops for inference, in the same way that the
[fold_batch_norms](#fold_batch_norms) transform does for the new approach.

### freeze_requantization_ranges

Args:

*   min_max_log_file: Path to a log file containing ranges for ops.
*   min_percentile: Percentage cutoff to use to calculate an overall min.
    Defaults to 5.
*   max_percentile: Percentage cutoff to use to calculate an overall max.
    Defaults to 5.

Quantized operations like convolution or matrix multiplies take their inputs as
8-bit, but produce 32-bit results. To do further operations on these, they need
to be converted back down to the lower bit depth. To make the most of those
eight bits, you need to scale the 32-bit output down using a scale that matches
the range of values that's actually being used.

Because that range information isn't stored in the original graph, the
[quantization process](#eight-bit-calculations) inserts RequantizationRange ops
before each conversion from 32 to 8 bits. This op looks at the 32-bit output and
calculates the current min and max every time it's run.

This isn't incredibly time-consuming, but it is extra work that's nice to avoid
if possible. One way of optimizing it away is to replace those
RequantizationRange ops with a pair of Const nodes holding known min/max values,
so the scaling down can be done without having to inspect the output every time.

That's what this transform does. It's usually used in conjunction with a copy of
the graph that's had [insert_logging](#insert_logging) run on it to instrument
it to record the min/max values to stderr. Why is logging used rather than
writing to a normal file? As you'll see later, to get the best results you want
to collect data from a lot of runs on real data, and for mobile apps especially
it's a lot easier to do this by copying log files. As an example, here's how
you'd add the logging operations for a quantized version of the Inception v3
graph:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/logged_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")\
'
```

Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write
out log statements that show the value of the min and max calculated by each
RequantizationRange op. Here's an example of running label_image and saving the
log:

```bash
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--image=${HOME}/Downloads/grace_hopper.jpg \
--input_layer=Mul \
--output_layer=softmax \
--graph=/tmp/logged_quantized_inception.pb \
--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \
2>/tmp/min_max_log_small.txt
```

If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like
this:

```
I0108 21:45:42.261883    1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715]
```

This is a simple way of serializing the name of the RequantizationRange op and
its min/max values every time it's run. It's a file like this that you pass into
the transform as the `min_max_log_file` argument. The transform will attempt to
extract all of the min/max values associated with ops, ignoring any irrelevant
lines in the log, and replace the RequantizationRange ops with two Const nodes
containing the found values.

This isn't the whole story though. The min/max values can vary a lot depending
on what the particular inputs to the graph are on any given run, which means
picking ranges based on just one run can lead to clipping of values and a loss
of accuracy. To get better results, you need to run your network against a range
of different inputs. In Inception's case, I often use a thousand different
images from the training set. You can then pass the whole concatenated log from
all of the runs into the transform, and it will pick ranges based on the
aggregate of the values found for each RequantizationRange op.

To ensure that outliers don't increase the range too much, and so decrease the
accuracy by putting too many bits into rare extreme values, the `min_percentile`
and `max_percentile` arguments control how the overall min and max are chosen.
At their default values of 5, the lowest 5% of the minimum values are discarded
and the minimum of the remainder is used, and the equivalent is done for the
maximum.
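
Once you've collected a log (ideally concatenated from many runs), you can
apply the transform itself. Here's a sketch using the files from the example
above; the output graph name is a placeholder:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/frozen_range_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
  freeze_requantization_ranges(min_max_log_file="/tmp/min_max_log_small.txt")'
```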

### fuse_convolutions

Args: None \
Prerequisites: None

For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g. to
scale up in the later stages of an image style transfer model), it can improve
memory usage and latency to combine the spatial transformations with the
convolution's im2col patch generation. This transform looks out for that
particular pattern of ops and replaces them with a fused version that combines
the resizing and padding with the convolution.

### insert_logging

Args:

*   op: Insert a Print after every occurrence of this op type. Can be repeated
    to cover multiple types. If not present, all op types will be instrumented.
*   prefix: Insert a Print after every node whose name starts with this value.
    Can be repeated to cover multiple nodes. If not present, all node names will
    be matched.
*   show_op: If true, the op type will be prepended to all log messages.
*   show_name: If true, the node's name will be prepended to all log messages.
*   message: Arbitrary text to log before the values.
*   first_n: How many times to print before suppressing. Defaults to -1, which
    means never stop.
*   summarize: How long numerical results can be before they're truncated.
    Defaults to 1024.

The Print operator writes strings to stderr when it's run inside a graph, and
prints out the numerical results of the node that it's reading from. This can be
very useful when you're debugging and want to follow particular internal values
while a graph is running. This transform allows you to insert those ops at
particular points in the graph, and customize the message that's displayed. It's
also used in conjunction with the
[freeze_requantization_ranges](#freeze_requantization_ranges) transform to
output information that it needs.
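
For instance, to log just the first ten results of any node whose name starts
with a particular prefix, you could run something like this sketch (the graph,
layer, and prefix names here are placeholders for your own):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=logged_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  insert_logging(prefix=softmax, first_n=10, show_name=true, message="__debug:")'
```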

### merge_duplicate_nodes

Args: None \
Prerequisites: None

If there are Const nodes with the same types and contents, or nodes with the
same inputs and attributes, this transform will merge them together. It can be
useful when you want to cut down the number of nodes in a graph that has a lot
of redundancy (e.g. this transform is always run as part of
[quantize_nodes](#quantize_nodes), since the processing there can introduce
duplicates of constants that are used in the quantize/dequantize process).

### obfuscate_names

Args: None \
Prerequisites: None

Replaces all nodes' names with short generated ids, other than the inputs and
outputs. This also updates all references within the graph so that the structure
is preserved. This can be useful if you want to shrink the file size, or if you
want to make it harder to understand the architecture of your model before
releasing it.

### quantize_nodes

Args:

*   input_min: The lowest float value for any quantized placeholder inputs.
*   input_max: The highest float value for any quantized placeholder inputs. If
    both input_min and input_max are set, then any float placeholders in the
    graph will be replaced with quantized versions, and consts will be created
    to pass the range to subsequent operations.
*   fallback_min: The lowest float value to use for requantizing activation
    layers.
*   fallback_max: The highest float value to use for requantizing activation
    layers. If both fallback_min and fallback_max are set, then instead of using
    RequantizationRange ops to figure out the useful range dynamically when
    converting the 32-bit output of ops like QuantizedConv2D and
    QuantizedBiasAdd, hardwired consts with these values will be used instead.
    This can help performance, if you know the range of your activation layers
    ahead of time.

Prerequisites: [quantize_weights](#quantize_weights)

Replaces any calculation nodes with their eight-bit equivalents (if available),
and adds in conversion layers to allow remaining float operations to
interoperate. This is one of the most complex transforms, and involves multiple
passes and a lot of rewriting. It's also still an active area of research, so
results may vary depending on the platform and operations you're using in your
model. You should run quantize_weights first to ensure your Const ops are in
eight-bit form.
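
As a sketch of how the fallback range arguments are passed (the graph names are
placeholders and the numeric values purely illustrative, not recommendations):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=quantized_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  quantize_weights
  quantize_nodes(fallback_min=-10.0, fallback_max=10.0)'
```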

### quantize_weights

Args:

*   minimum_size: Tensors with fewer elements than this won't be quantized
    (defaults to 1024).

Prerequisites: None

Converts any large (more than minimum_size) float Const op into an eight-bit
equivalent, followed by a float conversion op so that the result is usable by
subsequent nodes. This is mostly useful for [shrinking file
sizes](#shrinking-file-size), but also helps with the more advanced
[quantize_nodes](#quantize_nodes) transform. Even though there are no
prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down
to zero may cause significant loss of precision.
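
If you want to leave more of the smaller tensors in float, you can raise the
threshold. A sketch (the graph names are placeholders and the value is
illustrative):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=quantized_weights_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  quantize_weights(minimum_size=4096)'
```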

### remove_attribute

Args:

*   attribute_name: Name of the attribute you want to remove.
*   op_name: Optional name of a single op to restrict the removal to.

Prerequisites: None

Deletes the given attribute from either all nodes, or just the one specified in
`op_name`. This can be a dangerous transform since it's easy to leave your graph
in an invalid state if you remove a required attribute. It can be useful in
special circumstances though.
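
As an illustrative sketch, here's how you might strip a stale `_output_shapes`
attribute from every node (treat the attribute choice as a hypothetical
example; substitute whatever attribute is actually causing you trouble, and the
graph names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=cleaned_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  remove_attribute(attribute_name=_output_shapes)'
```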

### remove_device

Args: None \
Prerequisites: None

All ops can have a hardware device specified. This can be a problem when you're
loading a graph on a different system than the model was trained on, since some
specified devices may not be available. In order to work with graphs like these,
you can run this transform to wipe the slate clean and delete the device
specifier from all ops.

### remove_control_dependencies

Args: None \
Prerequisites: None

Removes all control dependencies from the graph.

### remove_nodes

Args:

*   op: The name of the op you want to remove. Can be repeated to remove
    multiple ops.

Prerequisites: None

This is a potentially dangerous transform that looks for single-input,
single-output ops with the given names, removes them from the graph, and rewires
all inputs that used to pull from them to pull from the preceding node instead.
This is most useful for getting rid of ops like `CheckNumerics` that are useful
during training but just complicate the graph and increase latency during
inference. It's dangerous because removing some ops may change the output of
your graph, so make sure you check the overall accuracy after using this.

### rename_attribute

Args:

*   old_attribute_name: Current name of the attribute you want to rename.
*   new_attribute_name: Name that you want the attribute to have now.
*   op_name: If this is set, only change attributes for a given op type,
    otherwise apply to all nodes with attribute names that match.

Prerequisites: None

Changes the name of the given attribute. This is often useful for upgrading
graph files as op definitions change over versions, since the renaming is often
enough to deal with minor changes.
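
The call shape looks like this sketch; the graph names, attribute names, and op
name here are entirely hypothetical placeholders:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=renamed_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  rename_attribute(old_attribute_name=SomeOldName, new_attribute_name=SomeNewName, op_name=SomeOp)'
```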

### rename_op

Args:

*   old_op_name: Current name of the operation.
*   new_op_name: Name to change to.

Prerequisites: None

Finds all ops with the given name, and changes them to the new one. This can be
useful for version upgrading if the changes between ops are minor apart from the
name.

### round_weights

Args:

*   num_steps: How many unique values to use in each buffer.

Prerequisites: None

Rounds all float values in large Const ops (more than 15 elements) to the given
number of steps. The unique values are chosen per buffer by linearly allocating
between the largest and smallest values present. This is useful when you'll be
deploying on mobile, and you want a model that will compress effectively. See
[shrinking file size](#shrinking-file-size) for more details. Even though there
are no prerequisites, it is advisable to run
[fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down
to zero may cause significant loss of precision.

### sparsify_gather

Args: None \
Prerequisites: None

Transforms 'Gather' ops into a sparsified version, where the dense 'Const' that
feeds the 'params' input of the 'Gather' is replaced with a 'HashTable', and the
'Gather' op itself is replaced by a hashtable lookup. This is mostly useful for
reducing the memory footprint of sparse TF.learn linear models.

### set_device

Args:

*   device: What device to assign to ops.
*   if_default: If this is true, only assign to ops with empty existing devices.

Updates nodes to use the specified device. A device is a way to tell the code
that executes the graph which piece of hardware it should run particular nodes
on. The right assignment to use may change between training and deployment, so
this transform (and [remove_device](#remove_device)) provides a way of updating
the placement. If the `if_default` parameter is set, then only ops that don't
have a device assigned already will be updated. This is mostly useful for
preprocessing of graphs for other stages that expect all ops to have an explicit
device assigned.
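
For example, to pin every node that doesn't already have an assignment onto the
CPU (a sketch; the graph and layer names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=cpu_graph.pb \
--inputs='input' \
--outputs='softmax' \
--transforms='
  set_device(device="/cpu:0", if_default=true)'
```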

### sort_by_execution_order

Args: None \
Prerequisites: None

Arranges the nodes in the GraphDef in topological order, so that the inputs of
any given node are always earlier than the node itself. This is especially
useful when you're targeting a minimal inference engine, since you can just
execute the nodes in the given order knowing that the inputs will be computed
before they're needed.

### strip_unused_nodes

Args:

*   type: Default type for any new Placeholder nodes generated, for example
    int32, float, quint8.
*   shape: Default shape for any new Placeholder nodes generated, as
    comma-separated dimensions. For example shape="1,299,299,3". The double
    quotes are important, since otherwise the commas will be taken as argument
    separators.
*   name: Identifier for the placeholder arguments.
*   type_for_name: What type to use for the previously-given name.
*   shape_for_name: What shape to use for the previously-given name.

Prerequisites: None

Removes all nodes not used in calculating the layers given in `--outputs`, fed
by `--inputs`. This is often useful for removing training-only nodes like
save-and-restore or summary ops. It's also handy for solving the [missing kernel
errors problem](#fixing-missing-kernel-errors-on-mobile) when there are decode
or other ops you don't need in the inference path.

The biggest complication is that it sometimes has to create new Placeholder ops,
so there are options to control their characteristics. This will happen if you
bypass a DecodeJpeg op by specifying an input layer deeper in the network, for
example, so you can pass in a raw image array instead of an encoded string as an
input. The decode op will be removed, together with the Placeholder that fed it,
but a new Placeholder is needed for the input layer you specify. The type and
shape arguments let you control the attributes of any new Placeholders that are
created. Plain `type` and `shape` set global defaults, but if you have different
inputs with varying characteristics, you'll need to pass in a list of arguments
where the preceding name specifies what layer each applies to. For example, if
you had two inputs in1 and in2, you could call `strip_unused_nodes(name=in1,
type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float,
shape_for_name="1,10,10,3")`.

## Writing Your Own Transforms

The Graph Transform Tool is designed to make it as easy as possible to create
your own optimization, modification, and pre-processing transforms. At their
heart, all of the transforms take in a valid GraphDef, make some changes, and
output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining
one node in the graph and its connections. You can find more information on the
format at [this guide to TensorFlow model
files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html),
but for a simple example take a look at
[tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc),
which implements the [rename_op](#rename_op) transform:

```C++
Status RenameOp(const GraphDef& input_graph_def,
                const TransformFuncContext& context,
                GraphDef* output_graph_def) {
  // Both parameters must be supplied exactly once.
  if (!context.params.count("old_op_name") ||
      (context.params.at("old_op_name").size() != 1) ||
      !context.params.count("new_op_name") ||
      (context.params.at("new_op_name").size() != 1)) {
    return errors::InvalidArgument(
        "rename_op expects exactly one 'old_op_name' and 'new_op_name' "
        "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)");
  }

  const string old_op_name = context.params.at("old_op_name")[0];
  const string new_op_name = context.params.at("new_op_name")[0];
  output_graph_def->Clear();
  // Copy every node across, renaming the op type where it matches.
  for (const NodeDef& node : input_graph_def.node()) {
    NodeDef* new_node = output_graph_def->mutable_node()->Add();
    new_node->CopyFrom(node);
    if (node.op() == old_op_name) {
      new_node->set_op(new_op_name);
    }
  }

  return Status::OK();
}

REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp);
```

The heart of this transform is the loop through the input_graph_def's nodes. We
go through each op, add a new one to the output, copy the original's contents,
and then change the op over if it matches the parameters. There's a standard set
of parameters for every transform, so they all take in a GraphDef and context,
and write out into a new GraphDef. The registration macro at the bottom lets the
tool know what function to call when it finds the `rename_op` string in a
transforms list.

### Transform Functions

The standard signature that all transform functions have is defined as
`TransformFunc`, which takes in an input GraphDef and a `TransformFuncContext`
containing environment information, writes to an output GraphDef, and returns a
Status indicating whether the transform succeeded.

The `TransformFuncContext` has a list of the inputs and outputs for the graph,
and the [parameter arguments](#parameters) that were passed into the transform
by the user.

If you write a function that matches this signature, and [register
it](#registering), the graph transform tool will take care of calling it.

### Pattern Syntax

The `rename_op` example only needs to look at a single node at a time, but one
of the most common needs is to modify small sub-graphs within a model. To make
this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This is
a simple and compact way to specify patterns of nodes that you want to look for.
The format is:

```
OP_TYPE_PATTERN ::= "{" OP ["," INPUTS] "}"
INPUTS ::= "{" OP_TYPE_PATTERN ("," OP_TYPE_PATTERN)* "}"
```

The `OP` field can either contain a single "*", which means match any op type,
one op type (for example "Const"), or a set of op types separated by `|` symbols
(for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not supported,
just these special cases.

You can think of these patterns as very limited regular expressions designed to
pick out sub-trees in graphs. They are deliberately very constrained to the kind
of things we commonly find ourselves needing to do, to make creating and
debugging as straightforward as possible.

For example, if you want all Conv2D nodes that have a constant as their second
input, you would set up a pattern like this, using C++ initializer lists to
populate the structure:

```C++
OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}});
```

It can be easier to visualize these initializers using indentation to show the
tree structure more clearly:

```C++
OpTypePattern conv_pattern({
  "Conv2D",
  {
    {"*"},
    {"Const"}
  }
});
```

In plain English this says: match a Conv2D op with two inputs, the first of
which can be any op type, and the second of which must be a Const op.

Here's a much more complex example, from the [quantize_nodes](#quantize_nodes)
transform:

```C++
{"QuantizeV2",
  {
    {"Dequantize"},
    {"Min",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
    {"Max",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
  }
}
```

This is looking for QuantizeV2 nodes with three inputs: the first is a
Dequantize, the second is a Min that ultimately pulls from a Dequantize, and the
third is a Max that does the same. Assuming we know the Dequantize ops are
pulling from the same eight-bit buffer, this sub-graph is a no-op overall, since
it just converts the eight-bit buffer to float and then immediately back to
eight bits. If we look for this pattern and remove it, we can optimize the graph
without changing the result.

### ReplaceMatchingOpTypes

It's very common to want to find all occurrences of a particular sub-graph in a
model, and replace them all with a different sub-graph that keeps the same local
input and output connections. For example with
[fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that
read their inputs from ResizeBilinear ops, and replace those combinations with a
single FusedResizeAndPadConv2D op, but without affecting other ops.

To make that sort of transformation easy, we created the
`ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern`
defining the sub-graph to look for, and a callback function to run for every
occurrence it finds. The job of this callback function is to look at the
`NodeMatch` that contains information about the current sub-graph, and return a
new sub-graph in the new_nodes list that will be used to replace the old
sub-graph.

You can see how it's used in practice in the
[fuse_convolutions](#fuse_convolutions) code:

```C++
TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes(
    input_graph_def,  // clang-format off
    {"Conv2D",
        {
            {"ResizeBilinear"},
            {"*"}
        }
    },  // clang-format on
    [](const NodeMatch& match, const std::set<string>& input_nodes,
       const std::set<string>& output_nodes,
       std::vector<NodeDef>* new_nodes) {
      // Find all the nodes we expect in the subgraph.
      const NodeDef& conv_node = match.node;
      const NodeDef& resize_node = match.inputs[0].node;
      const NodeDef& weights_node = match.inputs[1].node;

      // We'll be reusing the old weights.
      new_nodes->push_back(weights_node);

      // Create a 'no-op' mirror padding node that has no effect.
      NodeDef pad_dims_node;
      pad_dims_node.set_op("Const");
      pad_dims_node.set_name(conv_node.name() + "_dummy_paddings");
      SetNodeAttr("dtype", DT_INT32, &pad_dims_node);
      SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0},
                               &pad_dims_node);
      new_nodes->push_back(pad_dims_node);

      // Set up the new fused version of the convolution op.
      NodeDef fused_conv;
      fused_conv.set_op("FusedResizeAndPadConv2D");
      fused_conv.set_name(match.node.name());
      AddNodeInput(resize_node.input(0), &fused_conv);
      AddNodeInput(resize_node.input(1), &fused_conv);
      AddNodeInput(pad_dims_node.name(), &fused_conv);
      AddNodeInput(conv_node.input(1), &fused_conv);
      CopyNodeAttr(resize_node, "align_corners", "resize_align_corners",
                   &fused_conv);
      SetNodeAttr("mode", "REFLECT", &fused_conv);
      CopyNodeAttr(conv_node, "T", "T", &fused_conv);
      CopyNodeAttr(conv_node, "padding", "padding", &fused_conv);
      CopyNodeAttr(conv_node, "strides", "strides", &fused_conv);
      new_nodes->push_back(fused_conv);

      return Status::OK();
    },
    {}, &replaced_graph_def));
```

Here you can see we define the pattern to look for, and in the callback function
use information from each of the nodes in the old sub-graph to create a new
fused node. We also copy over the old weights input node so that it isn't lost.

There are a few things to know about the `ReplaceMatchingOpTypes` function:

*   All of the nodes in any matching sub-graphs are removed from the new graph
    created by the function. If any of them are needed, it's the callback
    function's responsibility to add them back in. There's a `CopyOriginalMatch`
    convenience call that will copy over all of the original nodes if you decide
    you don't actually want to modify a particular sub-graph.

*   It is assumed that the same nodes will never appear in more than one matched
    sub-graph. This is to ensure that sub-trees are only replaced once, but it
    may mean that some sub-graphs aren't spotted if they overlap with earlier
    matches.

*   The calling framework tries to ensure that the graph remains sane, by
    looking at the new_nodes that are returned and making sure that no nodes
    which are needed as inputs by nodes outside the sub-graph are removed. These
    important nodes are listed in the `output_nodes` argument that's passed into
    each replacement function call. You can disable this checking by setting
    `allow_inconsistencies` to true in the options, but otherwise any
    replacements that break the graph constraints will be canceled. If you do
    allow inconsistencies, it's your transform's responsibility to fix them up
    before you return your final result. Functions like `RenameNodeInputs` can
    be useful if you are doing wholesale node renaming, for example.

### Parameters

The arguments that are in parentheses after the transform name when the tool is
called are parsed and placed into the params member of the TransformFuncContext
that's given to each transform. For every named argument, there's a vector of
strings containing all the values that it was given, in the order they were
given. These are treated a bit like command-line parameters, and it's the
transform's responsibility to parse them into the data types it needs, and raise
errors by returning a bad Status if any of them are ill-formed.

As an example, here's a hypothetical transform call:

```
some_transform(foo=a, foo=b, bar=2, bob="1,2,3")
```

Here's what the std::map of strings looks like in the params member:

```
{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}}
```

The double quotes around the comma-separated argument to `bob` are important
because otherwise it will be treated as three separate arguments, and the
parsing will fail.

Here's an example of how [round_weights](#round_weights) reads its `num_steps`
parameter:

```C++
TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps));
```

If the conversion fails or the parameter occurs more than once, the helper
function will raise a meaningful error through the status result of the
transform. If the parameter isn't specified at all then the default will be
used.

### Function Libraries

A newer feature of TensorFlow is the ability to create libraries of functions as
part of graphs. These are a bit like templates, which define macro operations in
terms of smaller components, which can then be instantiated with different input
and output connections inside the graph just like regular ops. Right now the
graph transform tool just copies these libraries between the input and output
graphs, but it's likely that more complex operations will be supported on them
in the future.

### Registering

The Graph Transform Tool associates names of transforms with the code to
implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a string
and a function, and automagically registers the transform with the tool. You
will need to watch out for a few things though:

*   Because it's using global C++ objects in each file under the hood, the
    linker can sometimes strip them out and lose the registration. In Bazel you
    need to make sure you're linking any new transforms in as libraries, and set
    the `alwayslink` flag on the `cc_library` rule that contains them.

*   You should be able to create your own copy of the transform_graph tool by
    linking against the transform_graph_main_lib library in
    tensorflow/tools/graph_transforms/BUILD. This contains all the `main()`
    logic to parse command line arguments and call transforms.