## Options

### Overview

For all tfprof views, the profiles are processed as follows:

1) An in-memory data structure is built to represent the view.

   *  graph view: a graph. Each profiler node corresponds to a
      TensorFlow graph node.
   *  scope view: a tree. Each profiler node corresponds to a
      TensorFlow graph node.
   *  code view: a tree. Each profiler node represents Python code and
      includes all TensorFlow graph nodes created by that code.
   *  op view: a list. Each profiler node includes all TensorFlow
      graph nodes belonging to one operation type.

2) `-account_type_regexes` is applied first to select the nodes that include
   the specified operation types. Each operation has a default type
   (e.g. MatMul, Conv2D). `tfprof` also treats the device as an operation type,
   and users can define custom operation types. Hence, an operation can have
   multiple types. Profiler nodes with matching types are selected for display,
   and their statistics are aggregated by their parents in the in-memory data
   structure.

3) Various `-xxx_name_regexes`, `-min_xxx`, `-max_depth`, etc. options are then
   applied to further filter the nodes based on their names and statistics.
   The name is not limited to the operation name: in code view, it is the code
   string; in op view, it is the operation type name. Unlike
   `-account_type_regexes`, these options do not change the accounting:
   statistics are still aggregated even if a profiler node is not displayed.
   For example, in code view, a callee might be hidden, but its statistics are
   still aggregated by its caller. `-account_displayed_op_only`, however,
   breaks this rule and only aggregates the statistics of displayed nodes.

4) Finally, the filtered data structure is output in the format specified by
   the `-output` option.
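
The accounting and display steps above can be sketched in Python. This is a minimal model, not tfprof's implementation. `Node`, `profile`, and the example tree are hypothetical. Regexes are assumed to match whole names and type strings, and only one statistic (micros) is tracked. Among the name filters, only `-show_name_regexes` is modelled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical profiler node: a name, its operation types, and self time."""
    name: str
    types: set
    micros: int
    children: list = field(default_factory=list)


def matches_any(text, regexes):
    # Assumption: a regex must match the whole string.
    return any(re.fullmatch(r, text) for r in regexes)


def profile(node, account_types, show_names, displayed_only=False):
    """Return (accounted micros, displayed rows) for the subtree rooted at node."""
    # Step 2: a node is accounted if any of its types matches an account regex.
    accounted = any(matches_any(t, account_types) for t in node.types)
    # Step 3: name filters decide display; only accounted nodes can be shown.
    displayed = accounted and matches_any(node.name, show_names)
    total, rows = 0, []
    for child in node.children:
        child_total, child_rows = profile(child, account_types, show_names, displayed_only)
        total += child_total  # parents aggregate their children's statistics
        rows += child_rows
    # Hidden nodes still count unless displayed_only (-account_displayed_op_only) is set.
    if accounted and (displayed or not displayed_only):
        total += node.micros
    if displayed:
        rows.insert(0, f"{node.name}: {total}us")
    return total, rows


tree = Node("dense", {"Scope"}, 0, [
    Node("dense/MatMul", {"MatMul", "/gpu:0"}, 50),
    Node("dense/BiasAdd", {"BiasAdd", "/gpu:0"}, 10),
    Node("dense/Identity", {"Identity", "/cpu:0"}, 3),
])

print(profile(tree, [".*"], ["dense", ".*MatMul"]))
print(profile(tree, [".*"], ["dense", ".*MatMul"], displayed_only=True))
```

In the first call, the hidden `BiasAdd` and `Identity` nodes still add to the `dense` total (63 microseconds). With `displayed_only=True`, which mirrors `-account_displayed_op_only`, the total drops to 50.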

#### Option Semantics In Different Views
Options usually have the same semantics in different views. However, some
can vary. For example, `-max_depth` in scope view means the depth of the
name scope <b>tree</b>. In op view, it means the length of the operation <b>list</b>.
In graph view, it means the number of hops in the <b>graph</b>.
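
The three meanings of `-max_depth` can be made concrete with a small sketch. The helpers and example data are hypothetical. The sketch assumes the starting node sits at depth 0 and the op view list is already sorted; tfprof may count differently.

```python
from collections import deque


def tree_within(children, node, max_depth, depth=0):
    """Scope/code view: nodes at most max_depth levels below node (node is depth 0)."""
    if depth > max_depth:
        return []
    found = [node]
    for child in children.get(node, []):
        found += tree_within(children, child, max_depth, depth + 1)
    return found


def list_within(ops, max_depth):
    """Op view: max_depth bounds the length of the (already sorted) list."""
    return ops[:max_depth]


def graph_within(edges, start, max_depth):
    """Graph view: nodes reachable from start in at most max_depth hops (BFS)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in edges.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


children = {"root": ["a", "b"], "a": ["a/x"], "a/x": ["a/x/y"]}
edges = {"input": ["conv"], "conv": ["relu", "pool"], "relu": ["loss"]}
print(tree_within(children, "root", 1))
print(list_within(["MatMul", "Conv2D", "Add"], 2))
print(graph_within(edges, "input", 2))
```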

### Times

Most machines have multi-core CPUs, and some have one or more accelerators
installed. Each accelerator usually performs massively parallel processing.
The profiler tracks accumulated processing time, summing the time of every
operation even when operations run concurrently. Hence, the accumulated
processing time is likely to be larger than the wall-clock time of each step.

`micros`: The sum of CPU and accelerator time.

`accelerator_micros`: The accelerator time.

`cpu_micros`: The CPU time.
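
The gap between accumulated time and step time can be shown with a few hypothetical op records from one step. The names, devices, and timings are made up, and the helpers are illustrative only. Summing per-op durations, as the accumulated statistics do, counts overlapping work in full. The step's wall-clock span does not.

```python
# Hypothetical (op name, device, start_us, end_us) records from a single step.
records = [
    ("MatMul", "gpu", 0, 400),
    ("Conv2D", "gpu", 0, 300),        # overlaps with MatMul (e.g. another stream)
    ("Preprocess", "cpu", 100, 350),  # overlaps with both GPU ops
]


def accumulate(records):
    """Sum per-op durations by device class, like the micros statistics."""
    cpu = sum(end - start for _, dev, start, end in records if dev == "cpu")
    accel = sum(end - start for _, dev, start, end in records if dev != "cpu")
    return {"cpu_micros": cpu, "accelerator_micros": accel, "micros": cpu + accel}


def wall_time(records):
    """Elapsed time of the step: first start to last end."""
    return max(r[3] for r in records) - min(r[2] for r in records)


print(accumulate(records))
print(wall_time(records))
```

Here the accumulated `micros` is 950 while the step itself spans only 400 microseconds.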

### Memory

Tensor memory is usually reference-counted: it is released when no references
to it remain, which makes the release of memory difficult to track. Currently,
the profiler only tracks memory allocation. As a result, the accumulated memory
requested is usually larger than the peak memory of the overall model.

To see the allocator's memory usage over time, it is recommended to generate a
timeline.

`bytes`: The memory allocations requested by the operation.

`peak_bytes`: The peak memory requested by the operation that has not yet been
de-allocated.

`residual_bytes`: The memory requested by the operation and not de-allocated
when Compute finishes.

`output_bytes`: The memory output by the operation. It is not necessarily
requested by the current operation; for example, it can be a tensor forwarded
from input to output with in-place mutation.
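
The four memory statistics can be sketched over a hypothetical allocation log for a single operation's Compute call. The event format, buffer names, and the `memory_stats` helper are assumptions for illustration only. They are not TensorFlow's own data structures.

```python
def memory_stats(events, outputs):
    """events: ordered ("alloc", buf, size) / ("free", buf, 0) tuples seen during Compute.
    outputs: output tensor name -> size in bytes (may include forwarded inputs)."""
    live = {}
    requested = peak = current = 0
    for kind, buf, size in events:
        if kind == "alloc":
            live[buf] = size
            requested += size          # bytes: every allocation counts
            current += size
            peak = max(peak, current)  # peak_bytes: most memory held at once
        elif kind == "free":
            current -= live.pop(buf)
    return {
        "bytes": requested,
        "peak_bytes": peak,
        "residual_bytes": current,     # still held when Compute finishes
        "output_bytes": sum(outputs.values()),
    }


events = [
    ("alloc", "scratch", 1024),
    ("alloc", "out", 256),
    ("free", "scratch", 0),
    ("alloc", "scratch2", 512),
    ("free", "scratch2", 0),
]
# "forwarded_input" was not allocated by this op but still counts as output.
outputs = {"out": 256, "forwarded_input": 4096}
print(memory_stats(events, outputs))
```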

### Docs

`-max_depth`: Show nodes that are at most this number of hops from the starting node in the data structure.

`-min_bytes`: Show nodes that request at least this number of bytes.

`-min_peak_bytes`: Show nodes that use at least this number of bytes during peak memory usage.

`-min_residual_bytes`: Show nodes that have at least this number of bytes not de-allocated after Compute.

`-min_output_bytes`: Show nodes that output at least this number of bytes (not necessarily allocated by the nodes themselves).

`-min_micros`: Show nodes that spend at least this number of microseconds to run. The value is the sum of
accelerator_micros and cpu_micros. Note: CPU and accelerator can run in parallel, so this sum can exceed wall-clock time.

`-min_accelerator_micros`: Show nodes that spend at least this number of microseconds to run on an accelerator (e.g. GPU).

`-min_cpu_micros`: Show nodes that spend at least this number of microseconds to run on CPU.

`-min_params`: Show nodes that contain at least this number of parameters.

`-min_float_ops`: Show nodes that contain at least this number of float operations. Only available if a node has op.RegisterStatistics() defined and OpLogProto is provided.

`-min_occurrence`: Show nodes that appear at least this number of times.

`-step`: Show the stats of this step when multiple steps of RunMetadata were added. By default, show the average of all steps.

`-order_by`: Order the results by one of [name|depth|bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence].
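
The threshold, `-step`, and `-order_by` options can be sketched together. The data layout and the `select_nodes` helper are assumptions. Sorting numeric attributes in descending order is also an assumption. tfprof's own ordering and averaging details may differ.

```python
from statistics import mean


def stats_for_step(per_step, step=None):
    """per_step maps step id -> {stat: value}; step=None averages over all steps."""
    if step is not None:
        return dict(per_step[step])
    steps = list(per_step.values())
    return {key: mean(s[key] for s in steps) for key in steps[0]}


def select_nodes(nodes, step=None, min_values=None, order_by="name"):
    """Apply -min_xxx style thresholds, then sort like -order_by (hypothetical)."""
    min_values = min_values or {}
    rows = []
    for name, per_step in nodes.items():
        stats = stats_for_step(per_step, step)
        if all(stats[key] >= minimum for key, minimum in min_values.items()):
            rows.append((name, stats))
    if order_by == "name":
        rows.sort(key=lambda row: row[0])
    else:
        # Assumption: numeric attributes are listed largest first.
        rows.sort(key=lambda row: row[1][order_by], reverse=True)
    return [name for name, _ in rows]


nodes = {
    "MatMul": {1: {"micros": 200, "bytes": 4096}, 2: {"micros": 300, "bytes": 4096}},
    "Add": {1: {"micros": 50, "bytes": 512}, 2: {"micros": 10, "bytes": 512}},
    "Conv2D": {1: {"micros": 150, "bytes": 8192}, 2: {"micros": 450, "bytes": 8192}},
}
print(select_nodes(nodes, min_values={"micros": 100}, order_by="micros"))
print(select_nodes(nodes, step=1, min_values={"micros": 100}, order_by="micros"))
```

Averaged over both steps, `Conv2D` ranks first. In step 1 alone, `MatMul` ranks first. `Add` falls below the 100-microsecond threshold in both cases.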

`-account_type_regexes`: Account and display the nodes whose types match one of the specified type regexes. tfprof allows users to define extra operation types for graph nodes through the tensorflow.tfprof.OpLogProto proto. Regexes are comma-separated.

`-start_name_regexes`: Show nodes starting from the nodes that match the regexes, recursively. Regexes are comma-separated.

`-trim_name_regexes`: Hide nodes starting from the nodes that match the regexes, recursively. Regexes are comma-separated.

`-show_name_regexes`: Show nodes that match the regexes. Regexes are comma-separated.

`-hide_name_regexes`: Hide nodes that match the regexes. Regexes are comma-separated.
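
One reading of the four name filters is sketched below on a hypothetical name-scope tree. It makes three assumptions. Regexes must match the whole name. A flag value is split on every comma, so a regex containing a comma would be broken apart. The sketch only decides display; per the overview, hidden nodes would still be accounted.

```python
import re


def split_regexes(flag_value):
    """Split a comma-separated regex flag, dropping empty entries."""
    return [r for r in flag_value.split(",") if r]


def fullmatch_any(name, regexes):
    return any(re.fullmatch(r, name) for r in regexes)


def visible_names(children, node, start, trim, show, hide, started=False):
    """Return the names displayed under hypothetical start/trim/show/hide filters."""
    if fullmatch_any(node, trim):
        return []  # trim: hide this node and everything below it
    started = started or fullmatch_any(node, start)  # start: display from here down
    names = []
    if started and fullmatch_any(node, show) and not fullmatch_any(node, hide):
        names.append(node)
    for child in children.get(node, []):
        names += visible_names(children, child, start, trim, show, hide, started)
    return names


children = {
    "model": ["model/encoder", "model/decoder"],
    "model/encoder": ["model/encoder/MatMul", "model/encoder/Relu"],
    "model/decoder": ["model/decoder/MatMul"],
}
print(visible_names(children, "model", split_regexes("model/encoder"), [],
                    split_regexes(".*"), split_regexes(".*Relu")))
print(visible_names(children, "model", [".*"], ["model/decoder"], [".*"], []))
```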

`-account_displayed_op_only`: If true, only account the statistics of ops eventually displayed. If false, account the statistics of all ops matching `-account_type_regexes`, recursively.

Note: See the <b>Overview</b> section for how the above options interact to determine the output and the accounting.

`-select`: Comma-separated list of attributes to show. Supported attributes:
[bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence|tensor_value|device|op_types|input_shapes].

`-output`: Output results to stdout, a file, or a timeline.
The format is `output_type:key=value,key=value`.
For example: `-output timeline:outfile=<filename>`.

```shell
timeline: key=outfile, value=<filename>.
stdout: none.
file: key=outfile, value=<filename>.
```
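
A parser for the `-output` value format could look like the sketch below. `parse_output` is a hypothetical helper, not part of tfprof. The required keys per output type follow the listing above.

```python
def parse_output(spec):
    """Parse 'output_type:key=value,key=value' into (output_type, options)."""
    output_type, _, rest = spec.partition(":")
    options = {}
    for pair in filter(None, rest.split(",")):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    # Required keys per output type, taken from the listing above.
    required = {"timeline": {"outfile"}, "file": {"outfile"}, "stdout": set()}
    if output_type not in required:
        raise ValueError(f"unknown output type {output_type!r}")
    missing = required[output_type] - set(options)
    if missing:
        raise ValueError(f"{output_type} requires {sorted(missing)}")
    return output_type, options


print(parse_output("timeline:outfile=/tmp/timeline.json"))
print(parse_output("stdout"))
```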