## Profile Memory

It is generally a good idea to visualize memory usage in a timeline.
It allows you to see the memory consumption of each GPU over time.

```shell
# To get memory information, you need --graph_path and --run_meta_path.
tfprof> graph -max_depth 10000000 -step 0 -account_type_regexes .* -output timeline:outfile=<filename>
generating trace file.

******************************************************
Timeline file is written to <filename>
Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
******************************************************
```
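The file is written in Chrome trace format, so besides loading it in chrome://tracing you can also inspect it programmatically. A minimal sketch, assuming memory shows up as Chrome trace counter events (`"ph": "C"`); the exact counter names vary by tfprof version, so treat them as an assumption:

```python
import json

def peak_counter_values(trace_path):
    """Scan a Chrome trace-format file and return the peak value seen
    for each counter event ('ph' == 'C'), keyed by counter/arg name.

    Counter names depend on the tfprof version that wrote the file;
    treat the exact names as an assumption.
    """
    with open(trace_path) as f:
        trace = json.load(f)
    peaks = {}
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "C":  # 'C' marks counter events
            continue
        for arg_name, value in event.get("args", {}).items():
            key = "%s/%s" % (event.get("name", ""), arg_name)
            peaks[key] = max(peaks.get(key, 0), value)
    return peaks
```

Printing the returned dict sorted by value gives a quick per-counter peak-memory summary without opening a browser.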

<left>
![Timeline](graph_timeline.png)
</left>

```shell
# You can also visualize the memory information through other methods.

# The op view shows the aggregated output tensor bytes of each
# operation type.
tfprof> op -select bytes -order_by bytes
node name | requested bytes
Identity                     32515.37MB (100.00%, 27.02%)
FusedBatchNormGrad           10802.14MB (72.98%, 8.98%)
FusedBatchNorm               10517.52MB (64.01%, 8.74%)
Conv2D                       10509.25MB (55.27%, 8.73%)
Conv2DBackpropInput           9701.39MB (46.54%, 8.06%)
ReluGrad                      9206.45MB (38.48%, 7.65%)
Relu                          8462.80MB (30.83%, 7.03%)
DepthwiseConv2dNativeBackpropInput     7899.35MB (23.80%, 6.56%)
DepthwiseConv2dNative         7425.17MB (17.23%, 6.17%)
MaxPoolGrad                   3015.44MB (11.06%, 2.51%)
AddN                          2741.49MB (8.56%, 2.28%)

# With scope view, you can see the operations that output the largest
# tensors.
tfprof> scope -order_by bytes -select bytes -min_bytes 100000000
node name | requested bytes
_TFProfRoot (--/120356.38MB)
  tower_3/SepConv2d_2b_3x3/separable_conv2d (346.85MB/854.00MB)
    tower_3/SepConv2d_2b_3x3/separable_conv2d/depthwise (507.15MB/507.15MB)
  tower_0/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_0/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_2/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_2/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_1/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_1/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_3/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_3/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  tower_2/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_2/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  tower_0/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_0/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  ...

# The code view attributes the requested bytes to Python call stacks.
tfprof> code -max_depth 10 -select bytes -order_by bytes -start_name_regexes .*seq2seq.* -min_bytes 1
node name | requested bytes
_TFProfRoot (--/74148.60MB)
  seq2seq_attention.py'>:168:run_filename_from...:none (0B/74148.60MB)
    seq2seq_attention.py'>:33:_run_code_in_main:none (0B/74148.60MB)
      seq2seq_attention.py:316:<module>:app.run() (0B/74148.60MB)
        app.py:432:run:_run_main(main or... (0B/74148.60MB)
          app.py:352:_run_main:sys.exit(main(arg... (0B/74148.60MB)
            seq2seq_attention.py:270:main:_Train(model, bat... (0B/74148.60MB)
              seq2seq_attention.py:128:_Train:model.build_graph() (0B/74148.60MB)
                seq2seq_attention_model.py:363:build_graph:self._add_train_o... (0B/48931.86MB)
                  seq2seq_attention_model.py:307:_add_train_op:tf.gradients(self... (0B/46761.06MB)
                  seq2seq_attention_model.py:322:_add_train_op:zip(grads, tvars)... (0B/2170.80MB)
                  seq2seq_attention_model.py:312:_add_train_op:tf.train.exponent... (0B/2.56KB)
                  seq2seq_attention_model.py:308:_add_train_op:tf.summary.scalar... (0B/64B)
                  seq2seq_attention_model.py:320:_add_train_op:tf.summary.scalar... (0B/64B)
                seq2seq_attention_model.py:360:build_graph:self._add_seq2seq() (0B/25216.74MB)
                  seq2seq_attention_model.py:192:_add_seq2seq:sequence_length=a... (0B/21542.55MB)
```
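In the op view output above, each row reads `requested_bytes (remaining%, share%)`: the second percentage is the op type's share of all requested bytes, and the first is 100% minus the shares of all rows above it (Identity's 27.02% leaves 72.98% for the FusedBatchNormGrad row). A small pure-Python sketch of that bookkeeping, illustrative only and not tfprof's actual implementation:

```python
def format_op_view(mb_by_op):
    """Mimic the op view columns: for each op type (sorted by size),
    emit '<MB> (<remaining>%, <share>%)', where <share> is the op
    type's fraction of total requested megabytes and <remaining> is
    100% minus the shares of all rows printed before it.
    Illustrative only; not tfprof's actual implementation.
    """
    total = float(sum(mb_by_op.values()))
    remaining = 100.0
    lines = []
    for name, mb in sorted(mb_by_op.items(), key=lambda kv: -kv[1]):
        share = 100.0 * mb / total
        lines.append("%-28s %10.2fMB (%.2f%%, %.2f%%)"
                     % (name, mb, remaining, share))
        remaining -= share  # what later rows still have to account for
    return lines
```

Feeding it three op types whose sizes sum to 1000MB reproduces the two-percentage pattern of the listing above.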