<html devsite>
  <head>
    <title>Evaluating Performance</title>
    <meta name="project_path" value="/_project.yaml" />
    <meta name="book_path" value="/_book.yaml" />
  </head>
  <body>
  <!--
      Copyright 2017 The Android Open Source Project

      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
  -->


<p>There are two user-visible indicators of performance:</p>

<ul>
<li><strong>Predictable, perceptible performance</strong>. Does the user
interface (UI) drop frames or consistently render at 60FPS? Does audio play
without artifacts or popping? How long is the delay between the user touching
the screen and the effect showing on the display?</li>
<li><strong>Length of time required for longer operations</strong> (such as
opening applications).</li>
</ul>

<p>The first is more noticeable than the second. Users typically notice jank,
but they won't be able to tell the difference between a 500ms and a 600ms
application startup time unless they are looking at two devices side-by-side.
Touch latency is immediately noticeable and significantly contributes to the
perception of a device.</p>

<p>As a result, in a fast device, the UI pipeline is the most important thing in
the system other than what is necessary to keep the UI pipeline functional. This
means that the UI pipeline should preempt any other work that is not necessary
for fluid UI. To maintain a fluid UI, background syncing, notification delivery,
and similar work must all be delayed whenever UI work is ready to run. It is
acceptable to trade the performance of longer operations (HDR+ runtime,
application startup, etc.) to maintain a fluid UI.</p>

<h2 id="capacity_vs_jitter">Capacity vs jitter</h2>
<p>When considering device performance, <em>capacity</em> and <em>jitter</em>
are two meaningful metrics.</p>

<h3 id="capacity">Capacity</h3>
<p>Capacity is the total amount of some resource that the device possesses over
some amount of time. This can be CPU resources, GPU resources, I/O resources,
network resources, memory bandwidth, or any similar metric. When examining
whole-system performance, it can be useful to abstract the individual components
and assume a single metric that determines performance (especially when tuning a
new device because the workloads run on that device are likely fixed).</p>

<p>The capacity of a system varies based on the computing resources online.
Changing CPU/GPU frequency is the primary means of changing capacity, but there
are others such as changing the number of CPU cores online. Accordingly, the
capacity of a system corresponds with power consumption; <strong>changing
capacity always results in a similar change in power consumption.</strong></p>
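
<p>Both capacity knobs described above are visible through standard Linux CPU
sysfs files. The following minimal POSIX shell sketch uses a hypothetical helper
named <code>show_capacity</code> to print the online CPU list and each CPU's
current frequency. It assumes the usual cpufreq layout
(<code>/sys/devices/system/cpu/online</code> and
<code>cpuN/cpufreq/scaling_cur_freq</code>, reported in kHz), which some kernels
may not expose in full. The sysfs root is a parameter, so the same function also
works on a copied tree.</p>

```shell
# show_capacity [CPU_SYSFS_ROOT]: print the online CPU list and each CPU's
# current frequency. Hypothetical helper; assumes the standard Linux cpufreq
# sysfs layout. Pass a different root to read a copied tree.
show_capacity() {
    root=${1:-/sys/devices/system/cpu}
    # "online" holds a range list such as "0-3" or "0-1,4-7".
    echo "online: $(cat "$root/online")"
    for dir in "$root"/cpu[0-9]*; do
        cur="$dir/cpufreq/scaling_cur_freq"
        # Offline CPUs (and CPUs without cpufreq) may not expose this file.
        [ -r "$cur" ] || continue
        echo "$(basename "$dir"): $(cat "$cur") kHz"
    done
}
```

<p>For example, you can define the function in an <code>adb shell</code>
session and run <code>show_capacity</code> before and after a governor or
hotplug change to see which of the two knobs moved.</p>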

<p>The capacity required at a given time is overwhelmingly determined by the
running application. As a result, the platform can do little to adjust the
capacity required for a given workload, and the means to do so are limited to
runtime improvements (Android framework, ART, Bionic, GPU compiler/drivers,
kernel).</p>

<h3 id="jitter">Jitter</h3>
<p>While the required capacity for a workload is easy to see, jitter is a more
nebulous concept. For a good introduction to jitter as an impediment to fast
systems, refer to
<em><a href="http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-03-3116">The
Case of the Missing Supercomputer Performance: Achieving Optimal Performance on
the 8,192 Processors of ASCI Q</a></em>. (It's an investigation of why the ASCI
Q supercomputer did not achieve its expected performance and is a great
introduction to optimizing large systems.)</p>

<p>This page uses the term jitter to describe what the ASCI Q paper calls
<em>noise</em>. Jitter is the random system behavior that prevents perceptible
work from running. It is often work that must run eventually but has no strict
timing requirement that forces it to run at any particular time. Because it is
random, it is extremely difficult to disprove the existence of jitter for a
given workload. It is also extremely difficult to prove that a known source of
jitter was the cause of a particular performance issue. The tools most commonly
used for diagnosing causes of jitter (such as tracing or logging) can introduce
their own jitter.</p>

<p>Sources of jitter experienced in real-world implementations of Android
include:</p>
<ul>
<li>Scheduler delay</li>
<li>Interrupt handlers</li>
<li>Driver code running for too long with preemption or interrupts disabled</li>
<li>Long-running softirqs</li>
<li>Lock contention (application, framework, kernel driver, binder lock, mmap
lock)</li>
<li>File descriptor contention where a low-priority thread holds the lock on a
file, preventing a high-priority thread from running</li>
<li>Running UI-critical code in workqueues where it could be delayed</li>
<li>CPU idle transitions</li>
<li>Logging</li>
<li>I/O delays</li>
<li>Unnecessary process creation (e.g., <code>CONNECTIVITY_CHANGE</code>
broadcasts)</li>
<li>Page cache thrashing caused by insufficient free memory</li>
</ul>

<p>The duration of a given source of jitter may or may not decrease as capacity
increases. For example, if a driver leaves interrupts disabled while waiting for
a read from across an i2c bus, the wait takes a fixed amount of time regardless
of whether the CPU is at 384MHz or 2GHz. Increasing capacity cannot shorten that
kind of delay, so <strong>faster processors will not usually improve
performance in jitter-constrained situations.</strong></p>
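
<p>A rough model makes this concrete. Treat a frame as clock-dependent CPU work
(cycles divided by frequency) plus a clock-independent stall. The sketch below
uses hypothetical numbers chosen only for illustration: 4 million cycles of UI
work per frame and an 8ms interrupts-disabled i2c wait. It ignores the GPU,
memory, and scheduling time that real frames also include.</p>

```shell
# frame_ms CYCLES MHZ STALL_MS: model one frame as CYCLES of CPU work at MHZ
# plus a fixed stall that does not depend on the clock (for example, waiting
# on an i2c read with interrupts disabled). Illustrative numbers only.
frame_ms() {
    awk -v cycles="$1" -v mhz="$2" -v stall="$3" 'BEGIN {
        compute = cycles / (mhz * 1000)   # MHz * 1000 = cycles per ms
        printf "compute=%.1fms stall=%.1fms total=%.1fms\n",
            compute, stall, compute + stall
    }'
}

frame_ms 4000000 384 8    # compute=10.4ms stall=8.0ms total=18.4ms
frame_ms 4000000 2000 8   # compute=2.0ms stall=8.0ms total=10.0ms
```

<p>The compute term shrinks by roughly 5x between the two clocks, but the 8ms
stall does not change. At 2GHz the stall is 80% of the frame, and any stall
longer than about 14.7ms would miss a 16.7ms deadline at either clock.</p>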

<p>Finally, unlike capacity, jitter is almost entirely within the domain of the
system vendor.</p>

<h3 id="memory_consumption">Memory consumption</h3>
<p>Memory consumption is traditionally blamed for poor performance. While
consumption itself is not a performance issue, it can cause jitter via
lowmemorykiller overhead, service restarts, and page cache thrashing. Reducing
memory consumption can avoid the direct causes of poor performance, but there
may be other targeted improvements that avoid those causes as well (for example,
pinning the framework to prevent it from being paged out when it will be paged
in soon after).</p>

<h2 id="analyze_initial">Analyzing initial device performance</h2>
<p>Starting from a functional but poorly-performing system and attempting to fix
the system's behavior by looking at individual cases of user-visible poor
performance is <strong>not</strong> a sound strategy. Because poor performance
is usually either not easily reproducible (that is, caused by jitter) or an
application issue, the full system has too many variables for this strategy to
be effective. As a result, it's very easy to misidentify causes and make minor
improvements while missing systemic opportunities for fixing performance across
the system.</p>

<p>Instead, use the following general approach when bringing up a new
device:</p>
<ol>
<li>Get the system booting to UI with all drivers running and some basic
frequency governor settings (if you change the frequency governor settings,
repeat all steps below).</li>
<li>Ensure the kernel supports the <code>sched_blocked_reason</code> tracepoint
as well as other tracepoints in the display pipeline that denote when the frame
is delivered to the display.</li>
<li>Take long traces of the entire UI pipeline (from receiving input via an IRQ
to final scanout) while running a lightweight and consistent workload (e.g.,
<a href="https://android.googlesource.com/platform/frameworks/base.git/+/master/tests/UiBench/">UiBench</a>
or the ball test in <a href="#touchlatency">TouchLatency</a>).</li>
<li>Fix the frame drops detected in the lightweight and consistent
workload.</li>
<li>Repeat steps 3-4 until you can run with zero dropped frames for 20+ seconds
at a time.</li>
<li>Move on to other user-visible sources of jank.</li>
</ol>
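
<p>You can check step 2 from a shell on the device. The sketch below uses a
hypothetical helper, <code>check_tracepoints</code>, to report whether each named
tracepoint exists under the ftrace <code>events</code> directory. It assumes the
tracing directory is at <code>/sys/kernel/debug/tracing</code>. On many Android
builds <code>/d</code> is a symbolic link to <code>/sys/kernel/debug</code>, so
this is the same tree as <code>/d/tracing</code>, and newer kernels may also
mount it at <code>/sys/kernel/tracing</code>. Display-pipeline event names are
SOC-specific, so add your vendor's events to the list.</p>

```shell
# check_tracepoints EVENTS_DIR TRACEPOINT...: report whether each tracepoint,
# given as "subsystem/event", exists under EVENTS_DIR. Returns 1 if any is
# missing. Hypothetical helper.
check_tracepoints() {
    events_dir=$1
    shift
    status=0
    for tp in "$@"; do
        if [ -d "$events_dir/$tp" ]; then
            echo "present: $tp"
        else
            echo "missing: $tp"
            status=1
        fi
    done
    return "$status"
}
```

<p>For example, <code>check_tracepoints /sys/kernel/debug/tracing/events
sched/sched_blocked_reason irq/softirq_entry</code> checks for the tracepoint
from step 2 and for the mainline softirq tracepoint, which helps attribute
long-running softirqs.</p>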

<p>Other simple things you can do early on in device bringup include:</p>

<ul>
<li>Ensure your kernel has the
<a href="https://android.googlesource.com/kernel/msm/+/c9f00aa0e25e397533c198a0fcf6246715f99a7b%5E!/">sched_blocked_reason
tracepoint patch</a>. This tracepoint is enabled with the sched trace category
in systrace and provides the function responsible for sleeping when that
thread enters uninterruptible sleep. It is critical for performance analysis
because uninterruptible sleep is a very common indicator of jitter.</li>
<li>Ensure you have sufficient tracing for the GPU and display pipelines. On
recent Qualcomm SOCs, tracepoints are enabled using:
<pre class="devsite-click-to-copy">
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/kgsl/enable"</code>
<code class="devsite-terminal">adb shell "echo 1 &gt; /d/tracing/events/mdss/enable"</code>
</pre>

<p>These events remain enabled when you run systrace, so you can see additional
information in the trace about the display pipeline (MDSS) in the
<code>mdss_fb0</code> section. On Qualcomm SOCs, you won't see any additional
information about the GPU in the standard systrace view, but the results are
present in the trace itself (for details, see
<a href="/devices/tech/debug/systrace.html">Understanding
systrace</a>).</p>

<p>What you want from this kind of display tracing is a single event that
directly indicates a frame has been delivered to the display. From there, you
can determine if you've hit your frame time successfully; if event X<em>n</em>
occurs less than 16.7ms after event X<em>n-1</em> (assuming a 60Hz display),
then you know you did not jank. If your SOC does not provide such signals, work
with your vendor to get them. Debugging jitter is extremely difficult without a
definitive signal of frame completion.</p></li>
</ul>
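
<p>Once you have such a signal, checking for jank is mechanical. The sketch
below uses a hypothetical helper, <code>count_janks</code>. It reads
frame-delivery timestamps in milliseconds, one per line, and flags each interval
longer than one refresh period. This is the X<em>n</em> versus X<em>n-1</em>
comparison described above. Extracting the timestamps from your display
tracepoint is SOC-specific and not shown. Real timestamps carry a little noise,
so you may want a threshold slightly above the nominal period.</p>

```shell
# count_janks [PERIOD_MS] < TIMESTAMPS: read frame-delivery timestamps in ms
# (one per line, ascending) and report every gap longer than PERIOD_MS,
# followed by a summary. PERIOD_MS defaults to 16.7 (a 60Hz display).
count_janks() {
    awk -v period="${1:-16.7}" '
        BEGIN { period += 0; janks = 0; intervals = 0 }
        NF == 0 { next }                  # ignore blank lines
        seen {
            gap = $1 - prev
            intervals++
            if (gap > period) {
                janks++
                printf "jank: %.1fms gap before frame at %.1fms\n", gap, $1
            }
        }
        { prev = $1; seen = 1 }
        END { printf "%d of %d intervals janked\n", janks, intervals }
    '
}
```

<p>For example, <code>count_janks 11.1 &lt; frames.txt</code> would apply a
90Hz refresh period to timestamps saved in a hypothetical
<code>frames.txt</code>.</p>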

<h3 id="synthetic_benchmarks">Using synthetic benchmarks</h3>
<p>Synthetic benchmarks are useful for ensuring a device's basic functionality
is present. However, treating benchmarks as a proxy for perceived device
performance is not useful.</p>

<p>Based on experience with SOCs, differences in synthetic benchmark
performance between SOCs are not correlated with a similar difference in
perceptible UI performance (number of dropped frames, 99th percentile frame
time, etc.). Synthetic benchmarks are capacity-only benchmarks; jitter impacts
the measured performance of these benchmarks only by stealing time from the bulk
operation of the benchmark. As a result, synthetic benchmark scores are mostly
irrelevant as a metric of user-perceived performance.</p>

<p>Consider two SOCs running Benchmark X, which renders 1000 frames of UI and
reports the total rendering time (lower score is better).</p>

<ul>
<li>SOC 1 renders each frame of Benchmark X in 10ms and scores 10,000.</li>
<li>SOC 2 renders 99% of frames in 1ms but 1% of frames in 100ms and scores
1,990, a dramatically better score.</li>
</ul>

<p>If the benchmark is indicative of actual UI performance, SOC 2 would be
unusable. Assuming a 60Hz refresh rate, SOC 2 would have a janky frame roughly
every 1.7s of operation (one frame in every 100). Meanwhile, SOC 1 (the slower
SOC according to Benchmark X) would be perfectly fluid.</p>
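
<p>You can reproduce the arithmetic with a few lines of awk. This sketch
rebuilds the hypothetical frame times from the example above and prints each
SOC's Benchmark X score next to the number of frames that miss a 16.7ms (60Hz)
deadline. The distributions are the illustrative ones from the example, not
measurements.</p>

```shell
# benchmark_vs_jank: rebuild the hypothetical Benchmark X frame times for both
# SOCs and print the score (total ms, lower is better) next to the number of
# frames that miss a 16.7ms (60Hz) deadline.
benchmark_vs_jank() {
    awk 'BEGIN {
        deadline = 16.7
        s1 = s2 = j1 = j2 = 0
        for (i = 0; i < 1000; i++) {
            t = 10                          # SOC 1: every frame takes 10ms
            s1 += t; if (t > deadline) j1++
            t = (i % 100 == 99) ? 100 : 1   # SOC 2: 1% of frames take 100ms
            s2 += t; if (t > deadline) j2++
        }
        printf "SOC 1: score %d, %d missed deadlines\n", s1, j1
        printf "SOC 2: score %d, %d missed deadlines\n", s2, j2
    }'
}

benchmark_vs_jank
```

<p>SOC 2 posts the better score (1990 versus 10000) while missing its deadline
on 10 of the 1,000 frames. SOC 1 misses none.</p>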

<h3 id="bug_reports">Using bug reports</h3>
<p>Bug reports are sometimes useful for performance analysis, but because they
are so heavyweight, they are rarely useful for debugging sporadic jank issues.
They may provide some hints on what the system was doing at a given time,
especially if the jank was around an application transition (which is logged in
a bug report). Bug reports can also indicate when something is more broadly
wrong with the system that could reduce its effective capacity (such as thermal
throttling or memory fragmentation).</p>

<h3 id="touchlatency">Using TouchLatency</h3>
<p>Several examples of bad behavior come from TouchLatency, which is the
preferred periodic workload used for the Pixel and Pixel XL. It's available at
<code>frameworks/base/tests/TouchLatency</code> and has two modes: touch latency
and bouncing ball (to switch modes, click the button in the upper-right
corner).</p>

<p>The bouncing ball test is exactly as simple as it appears: A ball bounces
around the screen forever, regardless of user input. It is usually also
<strong>by far</strong> the hardest test to run perfectly, but the closer it
comes to running without any dropped frames, the better your device will be. The
bouncing ball test is difficult because it is a trivial but perfectly consistent
workload that runs at a very low clock (this assumes the device has a frequency
governor; if the device is instead running with fixed clocks, downclock the
CPU/GPU to near-minimum when running the bouncing ball test for the first time).
As the system quiesces and the clocks drop closer to idle, the required CPU/GPU
time per frame increases. You can watch the ball and see things jank, and you'll
be able to see missed frames in systrace as well.</p>
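
<p>For the fixed-clocks case described above, one way to run the CPUs near
their minimum is through the cpufreq sysfs interface. The sketch below uses a
hypothetical helper, <code>cap_cpus_to_min</code>, which sets each CPU's
<code>scaling_min_freq</code> and <code>scaling_max_freq</code> to that CPU's
<code>cpuinfo_min_freq</code>. It assumes root access and the standard Linux
cpufreq sysfs files. Vendor daemons or thermal services may rewrite these
limits. GPU clock controls are vendor-specific and are not covered.</p>

```shell
# cap_cpus_to_min [CPU_SYSFS_ROOT]: pin every CPU to its hardware minimum
# frequency by setting both scaling limits to cpuinfo_min_freq.
# Hypothetical helper; needs root on a real device.
cap_cpus_to_min() {
    root=${1:-/sys/devices/system/cpu}
    for freqdir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq (or an unmatched glob).
        [ -r "$freqdir/cpuinfo_min_freq" ] || continue
        min=$(cat "$freqdir/cpuinfo_min_freq")
        # Lower the floor first: some kernels may reject a max below the
        # current min.
        echo "$min" > "$freqdir/scaling_min_freq"
        echo "$min" > "$freqdir/scaling_max_freq"
        echo "$(basename "$(dirname "$freqdir")"): limited to $min kHz"
    done
}
```

<p>CPUs in the same cluster usually share one cpufreq policy, so several lines
may report the same limit, and writing it more than once is harmless. To lift
the cap afterward, write each CPU's <code>cpuinfo_max_freq</code> back to
<code>scaling_max_freq</code>.</p>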

<p>Because the workload is so consistent, you can identify most sources of
jitter much more easily than in most user-visible workloads by tracking exactly
what, other than the UI pipeline, is running on the system during each missed
frame. <strong>The lower clocks amplify the effects of jitter by making it
more likely that any jitter causes a dropped frame.</strong> As a result, the
closer TouchLatency is to 60FPS, the less likely you are to have bad system
behaviors that cause sporadic, hard-to-reproduce jank in larger
applications.</p>

<p>Although jitter is often (but not always) clockspeed-invariant, use a test
that runs at very low clocks to diagnose jitter for the following reasons:</p>
<ul>
<li>Not all jitter is clockspeed-invariant; many sources just consume CPU
time.</li>
<li>The governor should get the average frame time close to the deadline by
clocking down, so time spent running non-UI work can push it over the edge to
dropping a frame.</li>
</ul>

</body>
</html>