## Profile Memory

It is generally a good idea to visualize the memory usage in a timeline.
It allows you to see the memory consumption of each GPU over time.

```shell
# To get memory information, you need --graph_path and --run_meta_path.
tfprof> graph -max_depth 10000000 -step 0 -account_type_regexes .* -output timeline:outfile=<filename>
generating trace file.

******************************************************
Timeline file is written to <filename>
Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
******************************************************
```

<left>
![Timeline](graph_timeline.png)
</left>


```shell
# You can also visualize the memory information through other views.

# With op view, it shows you the aggregated output tensor bytes of each
# operation type.
tfprof> op -select bytes -order_by bytes
node name | requested bytes
Identity 32515.37MB (100.00%, 27.02%)
FusedBatchNormGrad 10802.14MB (72.98%, 8.98%)
FusedBatchNorm 10517.52MB (64.01%, 8.74%)
Conv2D 10509.25MB (55.27%, 8.73%)
Conv2DBackpropInput 9701.39MB (46.54%, 8.06%)
ReluGrad 9206.45MB (38.48%, 7.65%)
Relu 8462.80MB (30.83%, 7.03%)
DepthwiseConv2dNativeBackpropInput 7899.35MB (23.80%, 6.56%)
DepthwiseConv2dNative 7425.17MB (17.23%, 6.17%)
MaxPoolGrad 3015.44MB (11.06%, 2.51%)
AddN 2741.49MB (8.56%, 2.28%)

# With scope view, you can see the operations that output the largest tensors.
tfprof> scope -order_by bytes -select bytes -min_bytes 100000000
node name | requested bytes
_TFProfRoot (--/120356.38MB)
  tower_3/SepConv2d_2b_3x3/separable_conv2d (346.85MB/854.00MB)
    tower_3/SepConv2d_2b_3x3/separable_conv2d/depthwise (507.15MB/507.15MB)
  tower_0/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_0/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_2/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_2/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_1/SepConv2d_2b_3x3/separable_conv2d (346.85MB/693.71MB)
    tower_1/SepConv2d_2b_3x3/separable_conv2d/depthwise (346.85MB/346.85MB)
  tower_3/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_3/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  tower_2/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_2/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  tower_0/SepConv2d_2a_3x3/separable_conv2d (346.85MB/520.28MB)
    tower_0/SepConv2d_2a_3x3/separable_conv2d/depthwise (173.43MB/173.43MB)
  ...

# With code view, requested bytes are attributed to the Python call stack
# that created the operations.
tfprof> code -max_depth 10 -select bytes -order_by bytes -start_name_regexes .*seq2seq.* -min_bytes 1
node name | requested bytes
_TFProfRoot (--/74148.60MB)
  seq2seq_attention.py'>:168:run_filename_from...:none (0B/74148.60MB)
    seq2seq_attention.py'>:33:_run_code_in_main:none (0B/74148.60MB)
      seq2seq_attention.py:316:<module>:app.run() (0B/74148.60MB)
        app.py:432:run:_run_main(main or... (0B/74148.60MB)
          app.py:352:_run_main:sys.exit(main(arg... (0B/74148.60MB)
            seq2seq_attention.py:270:main:_Train(model, bat... (0B/74148.60MB)
              seq2seq_attention.py:128:_Train:model.build_graph() (0B/74148.60MB)
                seq2seq_attention_model.py:363:build_graph:self._add_train_o... (0B/48931.86MB)
                  seq2seq_attention_model.py:307:_add_train_op:tf.gradients(self... (0B/46761.06MB)
                  seq2seq_attention_model.py:322:_add_train_op:zip(grads, tvars)... (0B/2170.80MB)
                  seq2seq_attention_model.py:312:_add_train_op:tf.train.exponent... (0B/2.56KB)
                  seq2seq_attention_model.py:308:_add_train_op:tf.summary.scalar... (0B/64B)
                  seq2seq_attention_model.py:320:_add_train_op:tf.summary.scalar... (0B/64B)
                seq2seq_attention_model.py:360:build_graph:self._add_seq2seq() (0B/25216.74MB)
                  seq2seq_attention_model.py:192:_add_seq2seq:sequence_length=a... (0B/21542.55MB)
```
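When you want to post-process these reports rather than read them by eye, the op-view rows have a regular shape (`<op type> <MB>MB (<cumulative %>, <share %>)`). The sketch below is a hypothetical stdlib-only helper, not part of tfprof: it parses that textual output into tuples so you can, for example, sort or sum the per-op-type memory yourself.

```python
import re

# Matches one row of tfprof's op view, e.g.
#   "Identity 32515.37MB (100.00%, 27.02%)"
# Groups: op type, requested MB, cumulative %, share of total %.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<mb>[\d.]+)MB\s+"
    r"\((?P<cum>[\d.]+)%,\s*(?P<share>[\d.]+)%\)$"
)

def parse_op_view(text):
    """Return a list of (op_type, requested_mb, cumulative_pct, share_pct)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and blank lines simply don't match
            rows.append((m.group("name"), float(m.group("mb")),
                         float(m.group("cum")), float(m.group("share"))))
    return rows

sample = """node name | requested bytes
Identity 32515.37MB (100.00%, 27.02%)
FusedBatchNormGrad 10802.14MB (72.98%, 8.98%)
"""

for name, mb, _cum, share in parse_op_view(sample):
    print(f"{name}: {mb:.2f} MB ({share:.2f}% of total)")
```

Note the format assumed here is the one shown in the op view above; other views (scope, code) nest node names and mix units (`B`, `KB`, `MB`), so they would need a slightly different pattern.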