page.title=Analyzing with Profile GPU Rendering
page.metaDescription=Use the Profile GPU Rendering tool to help you optimize your app's rendering performance.

meta.tags="power"
page.tags="power"

@jd:body

<div id="qv-wrapper">
<div id="qv">

<h2>In this document</h2>
    <ol>
      <li>
        <a href="#visrep">Visual Representation</a>
      </li>

      <li>
       <a href="#sam">Stages and Their Meanings</a>

      <ul>
         <li>
           <a href="#ih">Input Handling</a>
         </li>
         <li>
           <a href="#at">Animation</a>
         </li>
         <li>
           <a href="#ml">Measurement/Layout</a>
         </li>
         <li>
           <a href="#draw">Drawing</a>
         </li>
         <li>
           <a href="#su">Sync/Upload</a>
         </li>
         <li>
           <a href="#ic">Issuing Commands</a>
         </li>
         <li>
           <a href="#psb">Processing/Swapping Buffers</a>
         </li>
         <li>
           <a href="#mt">Miscellaneous</a>
         </li>
      </ul>
      </li>
     </ol>
  </div>
</div>

<p>
The <a href="/studio/profile/dev-options-rendering.html">
Profile GPU Rendering</a> tool indicates the relative time that each stage of
the rendering pipeline takes to render the previous frame. This knowledge
can help you identify bottlenecks in the pipeline, so that you
know what to optimize to improve your app's rendering performance.
</p>

<p>
This page briefly explains what happens during each pipeline stage, and
discusses issues that can cause bottlenecks there. Before reading
this page, you should be familiar with the information presented in the
<a href="/studio/profile/dev-options-rendering.html">Profile GPU
Rendering Walkthrough</a>. In addition, to understand how all of the
stages fit together, it may be helpful to review
<a href="https://www.youtube.com/watch?v=we6poP0kw6E&index=64&list=PLWz5rJ2EKKc9CBxr3BVjPTPoDPLdPIFCE">
how the rendering pipeline works</a>.
</p>

<h2 id="visrep">Visual Representation</h2>

<p>
The Profile GPU Rendering tool displays stages and their relative times in the
form of a graph: a color-coded histogram. Figure 1 shows an example of
such a display.
</p>

  <img src="{@docRoot}topic/performance/images/bars.png">
  <p class="img-caption">
<strong>Figure 1.</strong> Profile GPU Rendering Graph
  </p>

<p>
Each vertical bar in the Profile GPU Rendering graph represents one frame, and
each colored segment of a bar represents one stage of the pipeline. Figure 2
shows a key to the meaning of each color.
</p>

  <img src="{@docRoot}topic/performance/images/s-profiler-legend.png">
  <p class="img-caption">
<strong>Figure 2.</strong> Profile GPU Rendering Graph Legend
  </p>

<p>
Once you understand what each color signifies,
you can target specific aspects of your
app to try to optimize its rendering performance.
</p>

<h2 id="sam">Stages and Their Meanings</h2>

<p>
This section explains what happens during each stage corresponding
to a color in Figure 2, as well as common causes of bottlenecks to look out for.
</p>


<h3 id="ih">Input Handling</h3>

<p>
The input handling stage of the pipeline measures how long the app
spent executing code called as a result of input event callbacks.
</p>

<h4>When this segment is large</h4>

<p>
High values in this area are typically a result of too much work, or
too-complex work, occurring inside the input-handler event callbacks.
Since these callbacks always occur on the main thread, solutions to this
problem focus on optimizing the work directly, or offloading the work to a
different thread.
</p>

<p>
It's also worth noting that {@link android.support.v7.widget.RecyclerView}
scrolling can appear in this phase.
{@link android.support.v7.widget.RecyclerView} scrolls immediately when it
consumes the touch event, and the scroll can require it to inflate or
populate new item views. For this reason, it's important to
make this operation as fast as possible. Profiling tools like Traceview or
Systrace can help you investigate further.
</p>
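<p>
As a minimal sketch of the offloading approach, the following plain-Java
example keeps the input callback short: it hands the expensive work to a
background pool and delivers the result on a single-threaded executor that
stands in for the main thread. The class, the <code>onClick()</code> method,
the thread names, and the sum-of-squares workload are all hypothetical. On a
device, you would typically post the result back to the main thread with a
{@link android.os.Handler} rather than the stand-in executor used here.
</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadInputWork {
    // Creates a fixed pool of daemon threads so the JVM can exit without shutdown().
    static ExecutorService daemonPool(int threads, String name) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        });
    }

    // Stand-in for the main (UI) thread: a single thread named "ui".
    static final ExecutorService UI = daemonPool(1, "ui");
    // Worker threads for anything too slow to run inside an input callback.
    static final ExecutorService BACKGROUND = daemonPool(2, "worker");

    // Hypothetical expensive computation: the sum of squares from 1 to n.
    static long expensiveWork(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Hypothetical input callback: it returns immediately instead of blocking
    // the UI thread, and the result is handled back on the "ui" thread.
    static CompletableFuture<String> onClick(int n) {
        return CompletableFuture
                .supplyAsync(() -> expensiveWork(n), BACKGROUND)
                .thenApplyAsync(
                        result -> Thread.currentThread().getName() + " got " + result, UI);
    }

    public static void main(String[] args) {
        // join() is only for this demo; real UI code must never block on the result.
        System.out.println(onClick(10).join());
    }
}
```

<p>
Running <code>main()</code> prints <code>ui got 385</code>, which shows that the
work ran elsewhere but the result arrived on the stand-in UI thread.
</p>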

<h3 id="at">Animation</h3>

<p>
The Animation phase shows how long it took to evaluate all the
animators that were running in that frame. The most common animators are
{@link android.animation.ObjectAnimator},
{@link android.view.ViewPropertyAnimator}, and
<a href="/training/transitions/overview.html">Transitions</a>.
</p>

<h4>When this segment is large</h4>

<p>
High values in this area are typically a result of work that executes in
response to a property change that the animation makes. For example, a fling
animation, which scrolls your {@link android.widget.ListView} or
{@link android.support.v7.widget.RecyclerView}, can cause large amounts of view
inflation and population.
</p>
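<p>
To see why evaluation itself is rarely the expensive part, consider what an
animator computes each frame. The following simplified model is not the
framework's implementation. It maps the elapsed time to a fraction, eases that
fraction with the same cosine curve that
{@link android.view.animation.AccelerateDecelerateInterpolator} uses, and
interpolates a property value. The class and method names are hypothetical.
The arithmetic costs almost nothing. Large Animation segments usually come from
what happens when the new value is applied, such as a scroll that inflates new
item views.
</p>

```java
class AnimatorFrameModel {
    // Cosine ease-in/ease-out curve: starts and ends slowly, fastest in the middle.
    static float accelerateDecelerate(float input) {
        return (float) (Math.cos((input + 1) * Math.PI) / 2.0) + 0.5f;
    }

    // Value of a hypothetical animated property 'elapsedMs' into an animation
    // that runs from 'from' to 'to' over 'durationMs'.
    static float valueAt(long elapsedMs, long durationMs, float from, float to) {
        // Clamp the raw fraction to [0, 1] so frames after the end hold the final value.
        float fraction = Math.min(1f, Math.max(0f, (float) elapsedMs / durationMs));
        float eased = accelerateDecelerate(fraction);
        return from + (to - from) * eased;
    }

    public static void main(String[] args) {
        // Sample a 300 ms animation roughly once per 60 Hz frame (every ~16 ms).
        for (long t = 0; t <= 300; t += 16) {
            System.out.printf("t=%3d ms  translationX=%.1f%n", t, valueAt(t, 300, 0f, 100f));
        }
    }
}
```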

<h3 id="ml">Measurement/Layout</h3>

<p>
To draw your view items on the screen, Android performs
two operations across the layouts and views in your view hierarchy.
</p>

<p>
First, the system measures the view items. Every view and layout has
specific data that describes the size of the object on the screen. Some views
can have a specific size; others have a size that adapts to the size
of the parent layout container.
</p>

<p>
Second, the system lays out the view items. Once the system calculates
the sizes of child views, the system can proceed with layout, sizing
and positioning the views on the screen.
</p>

<p>
The system performs measurement and layout not only for the views to be drawn,
but also for the parent hierarchies of those views, all the way up to the root
view.
</p>
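<p>
The two passes can be modeled in a few lines of plain Java. In the following
sketch, <code>resolveSize()</code> follows the same rules as
{@link android.view.View#resolveSize(int, int) View.resolveSize()}. A parent's
constraint is either an exact size, an upper bound, or no constraint, matching
the modes of {@link android.view.View.MeasureSpec}. The
<code>layoutTops()</code> method then positions the measured children in a
hypothetical vertical stack. The class, the <code>Mode</code> enum, and the
stacking rule are illustrative assumptions, not framework code.
</p>

```java
import java.util.Arrays;

class MeasureLayoutModel {
    // The three kinds of constraint a parent can pass to a child.
    enum Mode { EXACTLY, AT_MOST, UNSPECIFIED }

    // Measure pass: pick a size from what the view wants and what the parent allows.
    static int resolveSize(int desired, Mode mode, int specSize) {
        switch (mode) {
            case EXACTLY:
                return specSize;                     // the parent dictates the size
            case AT_MOST:
                return Math.min(desired, specSize);  // as large as wanted, up to the limit
            default:
                return desired;                      // UNSPECIFIED: no constraint
        }
    }

    // Layout pass for a hypothetical vertical stack: each child's top edge is
    // the sum of the measured heights of the children above it.
    static int[] layoutTops(int[] measuredHeights) {
        int[] tops = new int[measuredHeights.length];
        int y = 0;
        for (int i = 0; i < measuredHeights.length; i++) {
            tops[i] = y;
            y += measuredHeights[i];
        }
        return tops;
    }

    public static void main(String[] args) {
        int parentHeight = 500;
        int[] heights = {
            resolveSize(120, Mode.AT_MOST, parentHeight),
            resolveSize(200, Mode.AT_MOST, parentHeight),
            // This child asks for 999 px, but its parent requires exactly 100 px.
            resolveSize(999, Mode.EXACTLY, 100)
        };
        System.out.println(Arrays.toString(layoutTops(heights)));
    }
}
```

<p>
Running <code>main()</code> prints <code>[0, 120, 320]</code>: the measure pass
settles on heights of 120, 200, and 100 pixels, and the layout pass stacks the
children at those offsets.
</p>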

<h4>When this segment is large</h4>

<p>
If your app spends a lot of time per frame in this area, it is
usually either because of the sheer volume of views that need to be
laid out, or because of problems such as
<a href="/topic/performance/optimizing-view-hierarchies.html#double">
double taxation</a> at the wrong spot in your
hierarchy. In either of these cases, addressing performance involves
<a href="/topic/performance/optimizing-view-hierarchies.html">improving
the performance of your view hierarchies</a>.
</p>
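<p>
Double taxation hurts most when it is nested. Each layout that measures its
children twice doubles the work for everything below it. The following recursive
sketch counts how many times a leaf view is measured under a chain of such
layouts. It is a hypothetical model that assumes every layout in the chain makes
the same number of measure passes. Real layouts vary, as the linked page on view
hierarchies explains.
</p>

```java
class DoubleTaxationModel {
    // Returns how many times a leaf view is measured when it sits under
    // 'layoutsAbove' nested layouts that each measure their child
    // 'passesPerLayout' times.
    static long leafMeasureCount(int layoutsAbove, int passesPerLayout) {
        if (layoutsAbove == 0) {
            return 1; // reaching the leaf counts as one measurement
        }
        long total = 0;
        for (int pass = 0; pass < passesPerLayout; pass++) {
            // Each pass re-measures the entire subtree below this layout.
            total += leafMeasureCount(layoutsAbove - 1, passesPerLayout);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 5; depth++) {
            System.out.println(depth + " nested layouts: "
                    + leafMeasureCount(depth, 1) + " vs "
                    + leafMeasureCount(depth, 2) + " measures");
        }
    }
}
```

<p>
With two passes per layout, five nested layouts measure the leaf 32 times
instead of once, which is why flattening the hierarchy or moving the
double-measuring layout lower in the tree pays off.
</p>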

<p>
Code that you've added to
{@link android.view.View#onLayout(boolean, int, int, int, int)} or
{@link android.view.View#onMeasure(int, int)}
can also cause performance
issues. <a href="/studio/profile/traceview.html">Traceview</a> and
<a href="/studio/profile/systrace.html">Systrace</a> can help you examine
the call stacks to identify problems your code may have.
</p>

<h3 id="draw">Drawing</h3>

<p>
The draw stage translates a view's rendering operations, such as drawing
a background or drawing text, into a sequence of native drawing commands.
The system captures these commands into a display list.
</p>

<p>
The Draw bar records how much time it takes to complete capturing the commands
into the display list, for all the views that needed to be updated on the screen
this frame. The measured time applies to any code that you have added to the UI
objects in your app. Examples of such code include
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()} and
{@link android.view.View#dispatchDraw(android.graphics.Canvas) dispatchDraw()},
as well as the various <code>draw()</code> methods belonging to subclasses of the
{@link android.graphics.drawable.Drawable} class.
</p>

<h4>When this segment is large</h4>

<p>
In simplified terms, you can understand this metric as showing how long it took
to run all of the calls to
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()}
for each invalidated view. This
measurement includes any time spent dispatching draw commands to children and
drawables that may be present. For this reason, when you see this bar spike, the
cause could be that many views suddenly became invalidated. Invalidation
makes it necessary to regenerate those views' display lists. Alternatively, a
lengthy time may be the result of a few custom views that have some extremely
complex logic in their
{@link android.view.View#onDraw(android.graphics.Canvas) onDraw()} methods.
</p>

<h3 id="su">Sync/Upload</h3>

<p>
The Sync/Upload metric represents the time it takes to transfer
bitmap objects from CPU memory to GPU memory during the current frame.
</p>

<p>
Because the CPU and the GPU are different processors, they have different RAM
areas dedicated to processing. When you draw a bitmap on Android, the system
transfers the bitmap to GPU memory before the GPU can render it to the
screen. Then, the GPU caches the bitmap so that the system doesn't need to
transfer the data again unless the texture gets evicted from the GPU texture
cache.
</p>

<p class="note"><strong>Note:</strong> On Lollipop devices, this stage is
purple.
</p>

<h4>When this segment is large</h4>

<p>
All resources for a frame need to reside in GPU memory before they can be
used to draw a frame. This means that a high value for this metric could mean
either a large number of small resource loads or a small number of very large
resources. A common case is when an app displays a single bitmap that's
close to the size of the screen. Another case is when an app displays a
large number of thumbnails.
</p>

<p>
To shrink this bar, you can employ techniques such as:
</p>

<ul>
   <li>
Ensuring your bitmap resolutions are not much larger than the size at which they
will be displayed. For example, your app should avoid displaying a 1024x1024
image as a 48x48 image.
   </li>

   <li>
Taking advantage of {@link android.graphics.Bitmap#prepareToDraw()}
to asynchronously pre-upload a bitmap before the next sync phase.
   </li>
</ul>
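<p>
For the first technique, a common approach is to decode the bitmap at a reduced
size by setting {@link android.graphics.BitmapFactory.Options#inSampleSize}.
The following plain-Java sketch is modeled on the
<code>calculateInSampleSize()</code> helper from the Android training guide on
loading large bitmaps efficiently. It picks the largest power of two that keeps
both decoded dimensions at or above the requested size. It also estimates the
upload size, assuming the ARGB_8888 format at 4 bytes per pixel. The class name
is hypothetical.
</p>

```java
class SampleSizeModel {
    // Largest power-of-two sample size that keeps both decoded dimensions
    // at or above the requested width and height.
    static int calculateInSampleSize(int width, int height, int reqWidth, int reqHeight) {
        int inSampleSize = 1;
        if (height > reqHeight || width > reqWidth) {
            int halfHeight = height / 2;
            int halfWidth = width / 2;
            while ((halfHeight / inSampleSize) >= reqHeight
                    && (halfWidth / inSampleSize) >= reqWidth) {
                inSampleSize *= 2;
            }
        }
        return inSampleSize;
    }

    // Pixel data size for a bitmap stored at 4 bytes per pixel (ARGB_8888).
    static long argb8888Bytes(int width, int height) {
        return 4L * width * height;
    }

    public static void main(String[] args) {
        int sample = calculateInSampleSize(1024, 1024, 48, 48);
        int decoded = 1024 / sample;
        System.out.println("inSampleSize = " + sample);
        System.out.println("full size: " + argb8888Bytes(1024, 1024) + " bytes");
        System.out.println("decoded:   " + argb8888Bytes(decoded, decoded) + " bytes");
    }
}
```

<p>
For the 1024x1024 image from the example above, the sketch picks an
<code>inSampleSize</code> of 16. The decoded bitmap is therefore 64x64, which
means 16,384 bytes to upload instead of 4,194,304.
</p>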

<h3 id="ic">Issuing Commands</h3>

<p>
The <em>Issue Commands</em> segment represents the time it takes to issue all
of the commands necessary for drawing display lists to the screen.
</p>

<p>
For the system to draw display lists to the screen, it sends the
necessary commands to the GPU. Typically, it performs this action through the
<a href="/guide/topics/graphics/opengl.html">OpenGL ES</a> API.
</p>

<p>
This process takes some time, as the system performs final transformation
and clipping for each command before sending the command to the GPU. Additional
overhead then arises on the GPU side, which computes the final commands,
including final transformations and additional clipping.
</p>

<h4>When this segment is large</h4>

<p>
The time spent in this stage is a direct measure of the complexity and
quantity of display lists that the system renders in a given
frame. Having many draw operations, especially in cases where
there's a small inherent cost to each draw primitive, could inflate this time.
For example, if <code>mThousandPointArray</code> is a <code>float[]</code>
holding 1,000 points as consecutive x,y pairs and <code>mPaint</code> is a
{@link android.graphics.Paint}, the following loop:
</p>

<pre>
for (int i = 0; i &lt; 1000; i++) {
    canvas.drawPoint(mThousandPointArray[i * 2], mThousandPointArray[i * 2 + 1], mPaint);
}
</pre>

<p>
is a lot more expensive to issue than:
</p>

<pre>
canvas.drawPoints(mThousandPointArray, mPaint);
</pre>
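<p>
{@link android.graphics.Canvas#drawPoints(float[], android.graphics.Paint) drawPoints()}
expects a flat array in which each point occupies two consecutive entries: x
followed by y. If your points start out as separate coordinate arrays, a small
helper like the following hypothetical <code>pack()</code> method can build that
array once, outside the draw path, so that each frame issues a single call.
</p>

```java
import java.util.Arrays;

class PointBatch {
    // Interleaves separate x and y arrays into [x0, y0, x1, y1, ...].
    static float[] pack(float[] xs, float[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        float[] pts = new float[xs.length * 2];
        for (int i = 0; i < xs.length; i++) {
            pts[2 * i] = xs[i];
            pts[2 * i + 1] = ys[i];
        }
        return pts;
    }

    public static void main(String[] args) {
        float[] pts = pack(new float[] {10f, 20f, 30f}, new float[] {1f, 2f, 3f});
        System.out.println(Arrays.toString(pts)); // [10.0, 1.0, 20.0, 2.0, 30.0, 3.0]
    }
}
```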

<p>
There isn't always a 1:1 correlation between issuing commands and
actually drawing display lists. Unlike <em>Issue Commands</em>,
which captures the time it takes to send drawing commands to the GPU,
the <em>Draw</em> metric represents the time that it took to capture the issued
commands into the display list.
</p>

<p>
This difference arises because the system caches display lists
wherever possible. As a result, there are situations where a
scroll, transform, or animation requires the system to re-send a display
list without having to rebuild it (that is, recapture the drawing
commands) from scratch. Consequently, you can see a high <em>Issue
Commands</em> bar without seeing a high <em>Draw</em> bar.
</p>
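<p>
The following toy model in plain Java illustrates that asymmetry. Each
hypothetical <code>ModelView</code> records its commands only when it has been
invalidated, which corresponds to the Draw stage. Every frame then re-sends every
cached command, which corresponds to the Issue Commands stage. The class names
and the two-command display list are assumptions for illustration. The real
renderer is far more involved.
</p>

```java
import java.util.ArrayList;
import java.util.List;

class DisplayListModel {
    static class ModelView {
        private List<String> displayList = new ArrayList<>();
        private boolean invalid = true; // a new view has never been recorded
        int recordCount = 0;

        void invalidate() {
            invalid = true;
        }

        // Draw stage: re-record commands only if the view was invalidated.
        List<String> getDisplayList() {
            if (invalid) {
                displayList = new ArrayList<>();
                displayList.add("drawRect(background)");
                displayList.add("drawText(label)");
                recordCount++;
                invalid = false;
            }
            return displayList;
        }
    }

    // Runs 'frames' frames over 'viewCount' views and returns
    // {commandsIssued, displayListsRecorded}.
    static int[] simulate(int viewCount, int frames, boolean invalidateFirstViewEachFrame) {
        List<ModelView> views = new ArrayList<>();
        for (int i = 0; i < viewCount; i++) {
            views.add(new ModelView());
        }
        int issued = 0;
        for (int f = 0; f < frames; f++) {
            if (invalidateFirstViewEachFrame && !views.isEmpty()) {
                views.get(0).invalidate();
            }
            // Issue Commands stage: every cached command is sent every frame.
            for (ModelView v : views) {
                issued += v.getDisplayList().size();
            }
        }
        int recorded = 0;
        for (ModelView v : views) {
            recorded += v.recordCount;
        }
        return new int[] {issued, recorded};
    }

    public static void main(String[] args) {
        int[] scrolling = simulate(3, 10, false);
        int[] invalidating = simulate(3, 10, true);
        System.out.println("no invalidation: issued=" + scrolling[0] + " recorded=" + scrolling[1]);
        System.out.println("invalidations:   issued=" + invalidating[0] + " recorded=" + invalidating[1]);
    }
}
```

<p>
Over ten frames with three views, the model issues 60 commands but records only
3 display lists. Invalidating one view every frame raises the recordings to 12,
while the issued commands stay at 60.
</p>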

<h3 id="psb">Processing/Swapping Buffers</h3>

<p>
Once Android finishes submitting all of its display lists to the GPU,
the system issues one final command to tell the graphics driver that it's
done with the current frame. At this point, the driver can finally present
the updated image to the screen.
</p>

<h4>When this segment is large</h4>

<p>
It's important to understand that the GPU executes work in parallel with the
CPU. The Android system issues draw commands to the GPU, and then moves on to
the next task. The GPU reads those draw commands from a queue and processes
them.
</p>

<p>
In situations where the CPU issues commands faster than the GPU
consumes them, the communications queue between the processors can become
full. When this occurs, the CPU blocks and waits until there is space in the
queue to place the next command. This full-queue state often arises during the
<em>Swap Buffers</em> stage, because at that point, a whole frame's worth of
commands has been submitted.
</p>
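<p>
The blocking behavior described above is the classic bounded producer-consumer
pattern. The following plain-Java simulation is a hypothetical model, not how the
graphics driver is implemented. A producer thread, standing in for the CPU, puts
commands into a small <code>ArrayBlockingQueue</code>, and a slower consumer
thread, standing in for the GPU, takes them. The queue capacity, command count,
and per-command delay are made-up values. Once the queue fills,
<code>put()</code> blocks, and the time the producer spends waiting grows with
the consumer's per-command cost.
</p>

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class CommandQueueModel {
    private static final int END_OF_FRAME = -1; // sentinel, like the final swap command

    // Returns {commandsConsumed, millisecondsTheProducerSpentBlocked}.
    static long[] run(int commands, int capacity, long gpuMillisPerCommand) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        long[] consumed = new long[1];

        // Consumer ("GPU"): drains the queue slowly until it sees the sentinel.
        Thread gpu = new Thread(() -> {
            try {
                while (queue.take() != END_OF_FRAME) {
                    Thread.sleep(gpuMillisPerCommand);
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        gpu.start();

        // Producer ("CPU"): submits commands as fast as it can.
        long blockedNanos = 0;
        try {
            for (int i = 0; i <= commands; i++) {
                int command = (i == commands) ? END_OF_FRAME : i;
                long start = System.nanoTime();
                queue.put(command); // blocks while the queue is full
                blockedNanos += System.nanoTime() - start;
            }
            gpu.join(); // also makes consumed[0] safely visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return new long[] {consumed[0], blockedNanos / 1_000_000};
    }

    public static void main(String[] args) {
        long[] result = run(20, 2, 5);
        System.out.println("consumed " + result[0] + " commands; CPU blocked ~" + result[1] + " ms");
    }
}
```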

<p>
The key to mitigating this problem is to reduce the complexity of work occurring
on the GPU, in similar fashion to what you would do for the Issue Commands
phase.
</p>


<h3 id="mt">Miscellaneous</h3>

<p>
In addition to the time it takes the rendering system to perform its work,
there's an additional set of work that occurs on the main thread and has
nothing to do with rendering. Time that this work consumes is reported as
<em>misc time</em>. Misc time generally represents work that might be occurring
on the UI thread between two consecutive frames of rendering.
</p>

<h4>When this segment is large</h4>

<p>
If this value is high, it is likely that your app has callbacks, intents, or
other work that should be happening on another thread. Tools such as
<a href="/studio/profile/traceview.html">Method
Tracing</a> or <a href="/studio/profile/systrace.html">Systrace</a> can provide
visibility into the tasks that are running on
the main thread. This information can help you target performance improvements.
</p>
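<p>
Before reaching for those tools, a quick way to confirm a suspicion is to time
individual tasks that you run on the main thread. The following plain-Java
sketch is a hypothetical helper, not a replacement for Method Tracing or
Systrace. It runs each named task, measures it with <code>System.nanoTime()</code>,
and reports the tasks that take longer than a 60 Hz frame interval (about
16.7 ms). The task names and the 30 ms sleep that simulates slow work are made up.
</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MainThreadAudit {
    // One frame at 60 Hz lasts 1/60 s, about 16.7 ms.
    static final long FRAME_BUDGET_NANOS = 1_000_000_000L / 60;

    // Runs each task in order and returns the names of those that overran a frame.
    static List<String> slowTasks(Map<String, Runnable> tasks) {
        List<String> slow = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : tasks.entrySet()) {
            long start = System.nanoTime();
            entry.getValue().run();
            long elapsed = System.nanoTime() - start;
            if (elapsed > FRAME_BUDGET_NANOS) {
                slow.add(entry.getKey());
            }
        }
        return slow;
    }

    // Hypothetical main-thread work: one quick task, and one that simulates
    // blocking work (for example, parsing a large file) with a 30 ms sleep.
    static Map<String, Runnable> sampleTasks() {
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        tasks.put("updateLabel", () -> { });
        tasks.put("parseSettingsFile", () -> {
            try {
                Thread.sleep(30);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println("Move off the main thread: " + slowTasks(sampleTasks()));
    }
}
```

<p>
Any task this helper flags is a candidate for moving to a background thread.
</p>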