page.title=Contributors to Audio Latency
@jd:body

<!--
Copyright 2013 The Android Open Source Project

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<div id="qv-wrapper">
<div id="qv">
<h2>In this document</h2>
<ol id="auto-toc">
</ol>
</div>
</div>

<p>
This page focuses on the contributors to output latency,
but a similar discussion applies to input latency.
</p>
<p>
Assuming the analog circuitry does not contribute significantly, the major
surface-level contributors to audio latency are the following:
</p>

<ul>
<li>Application</li>
<li>Total number of buffers in pipeline</li>
<li>Size of each buffer, in frames</li>
<li>Additional latency after the app processor, such as from a DSP</li>
</ul>

<p>
As accurate as the above list of contributors may be, it is also misleading.
The reason is that buffer count and buffer size are more of an
<em>effect</em> than a <em>cause</em>. What usually happens is that
a given buffer scheme is implemented and tested, but during testing, an audio
underrun or overrun is heard as a "click" or "pop." To compensate, the
system designer then increases buffer sizes or buffer counts.
This has the desired result of eliminating the underruns or overruns, but it also
has the undesired side effect of increasing latency.
For more information about buffer sizes, see the video
<a href="https://youtu.be/PnDK17zP9BI">Audio latency: buffer sizes</a>.

</p>

<p>
A better approach is to understand the causes of the
underruns and overruns, and then correct those. This eliminates the
audible artifacts and may permit even smaller or fewer buffers
and thus reduce latency.
</p>

<p>
In our experience, the most common causes of underruns and overruns include:
</p>
<ul>
<li>Linux CFS (Completely Fair Scheduler)</li>
<li>high-priority threads with SCHED_FIFO scheduling</li>
<li>priority inversion</li>
<li>long scheduling latency</li>
<li>long-running interrupt handlers</li>
<li>long interrupt disable time</li>
<li>power management</li>
<li>security kernels</li>
</ul>

<h3 id="linuxCfs">Linux CFS and SCHED_FIFO scheduling</h3>
<p>
The Linux CFS is designed to be fair to competing workloads sharing a common CPU
resource. This fairness is represented by a per-thread <em>nice</em> parameter.
The nice value ranges from -20 (least nice, or most CPU time allocated)
to 19 (nicest, or least CPU time allocated). In general, all threads with a given
nice value receive approximately equal CPU time, and threads with a
numerically lower nice value should expect to
receive more CPU time. However, CFS is "fair" only over relatively long
periods of observation. Over short-term observation windows,
CFS may allocate the CPU resource in unexpected ways. For example, it
may take the CPU away from a thread with numerically low niceness
and give it to a thread with numerically high niceness. In the case of audio,
this can result in an underrun or overrun.
</p>

<p>
The obvious solution is to avoid CFS for high-performance audio
threads.
Beginning with Android 4.1, such threads now use the
<code>SCHED_FIFO</code> scheduling policy rather than the <code>SCHED_NORMAL</code> (also called
<code>SCHED_OTHER</code>) scheduling policy implemented by CFS.
</p>

<h3 id="schedFifo">SCHED_FIFO priorities</h3>
<p>
Though the high-performance audio threads now use <code>SCHED_FIFO</code>, they
are still susceptible to other higher priority <code>SCHED_FIFO</code> threads.
These are typically kernel worker threads, but there may also be a few
non-audio user threads with policy <code>SCHED_FIFO</code>. The available <code>SCHED_FIFO</code>
priorities range from 1 to 99. The audio threads run at priority
2 or 3. This leaves priority 1 available for lower priority threads,
and priorities 4 to 99 for higher priority threads. We recommend
you use priority 1 whenever possible, and reserve priorities 4 to 99 for
those threads that are guaranteed to complete within a bounded amount
of time, execute with a period shorter than the period of audio threads,
and are known to not interfere with scheduling of audio threads.
</p>

<h3 id="rms">Rate-monotonic scheduling</h3>
<p>
For more information on the theory of assignment of fixed priorities,
see the Wikipedia article
<a href="http://en.wikipedia.org/wiki/Rate-monotonic_scheduling">Rate-monotonic scheduling</a> (RMS).
A key point is that fixed priorities should be allocated strictly based on period,
with higher priorities assigned to threads of shorter periods, not based on perceived "importance."
Non-periodic threads may be modeled as periodic threads, using the maximum frequency of execution
and maximum computation per execution.
If a non-periodic thread cannot be modeled as
a periodic thread (for example, it could execute with unbounded frequency or unbounded computation
per execution), then it should not be assigned a fixed priority, as that would be incompatible
with the scheduling of true periodic threads.
</p>

<h3 id="priorityInversion">Priority inversion</h3>
<p>
<a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
is a classic failure mode of real-time systems,
where a higher-priority task is blocked for an unbounded time waiting
for a lower-priority task to release a resource such as (shared
state protected by) a
<a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
See the article "<a href="avoiding_pi.html">Avoiding priority inversion</a>" for techniques to
mitigate it.
</p>

<h3 id="schedLatency">Scheduling latency</h3>
<p>
Scheduling latency is the time between when a thread becomes
ready to run and when the resulting context switch completes so that the
thread actually runs on a CPU. The shorter the latency the better, and
anything over two milliseconds causes problems for audio. Long scheduling
latency is most likely to occur during mode transitions, such as
bringing up or shutting down a CPU, switching between a security kernel
and the normal kernel, switching from full power to low-power mode,
or adjusting the CPU clock frequency and voltage.
</p>

<h3 id="interrupts">Interrupts</h3>
<p>
In many designs, CPU 0 services all external interrupts. As a result, a
long-running interrupt handler may delay other interrupts, in particular
audio direct memory access (DMA) completion interrupts. Design interrupt handlers
to finish quickly and defer lengthy work to a thread (preferably
a CFS thread or <code>SCHED_FIFO</code> thread of priority 1).
</p>

<p>
Similarly, disabling interrupts on CPU 0 for a long period
delays the servicing of audio interrupts.
Long interrupt disable times typically happen while waiting for a kernel
<i>spin lock</i>. Review these spin locks to ensure that the time spent
waiting on them is bounded.
</p>

<h3 id="power">Power, performance, and thermal management</h3>
<p>
<a href="http://en.wikipedia.org/wiki/Power_management">Power management</a>
is a broad term that encompasses efforts to monitor
and reduce power consumption while optimizing performance.
<a href="http://en.wikipedia.org/wiki/Thermal_management_of_electronic_devices_and_systems">Thermal management</a>
and <a href="http://en.wikipedia.org/wiki/Computer_cooling">computer cooling</a>
are similar but seek to measure and control heat to avoid damage due to excess heat.
In the Linux kernel, the CPU
<a href="http://en.wikipedia.org/wiki/Governor_%28device%29">governor</a>
is responsible for low-level policy, while user mode configures high-level policy.
Techniques used include:
</p>

<ul>
<li>dynamic voltage scaling</li>
<li>dynamic frequency scaling</li>
<li>dynamic core enabling</li>
<li>cluster switching</li>
<li>power gating</li>
<li>hotplug (hotswap)</li>
<li>various sleep modes (halt, stop, idle, suspend, etc.)</li>
<li>process migration</li>
<li><a href="http://en.wikipedia.org/wiki/Processor_affinity">processor affinity</a></li>
</ul>

<p>
Some management operations can result in "work stoppages," or
times during which there is no useful work performed by the application processor.
These work stoppages can interfere with audio, so such management should be designed
for an acceptable worst-case work stoppage while audio is active.
Of course, when thermal runaway is imminent, avoiding permanent damage
is more important than audio!
</p>

<h3 id="security">Security kernels</h3>
<p>
A <a href="http://en.wikipedia.org/wiki/Security_kernel">security kernel</a> for
<a href="http://en.wikipedia.org/wiki/Digital_rights_management">Digital rights management</a>
(DRM) may run on the same application processor core(s) as those used
for the main operating system kernel and application code. Any time
during which a security kernel operation is active on a core is effectively a
stoppage of ordinary work that would normally run on that core.
In particular, this may include audio work. By its nature, the internal
behavior of a security kernel is inscrutable from higher-level layers, and thus
any performance anomalies caused by a security kernel are especially
pernicious. For example, security kernel operations do not typically appear in
context switch traces. We call this "dark time": time that elapses
yet cannot be observed. Security kernels should be designed for an
acceptable worst-case work stoppage while audio is active.
</p>
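<p>
Scheduling latency and the work stoppages described above cannot always be traced
directly, but their combined effect can be estimated from user space. The following is
a minimal sketch, not part of the Android platform, that sleeps until a periodic
deadline and records how late each wakeup is. The function names
<code>measure_wakeup_lateness</code> and <code>summarize</code>, the 5 ms period, and the
use of Python are illustrative assumptions. The 2 ms threshold is the figure given
under Scheduling latency. The measurement cannot attribute lateness to a specific
cause, and it includes interpreter overhead.
</p>

```python
import time


def measure_wakeup_lateness(period_ms=5.0, iterations=200):
    """Sleep until periodic deadlines and return each wakeup's lateness in ms.

    Hypothetical helper: lateness is the gap between the intended deadline
    and the moment this thread actually resumed running.
    """
    period_ns = int(period_ms * 1_000_000)
    lateness_ms = []
    deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        lateness_ms.append((now - deadline) / 1e6)
        # Keep a fixed schedule so one late wakeup does not shift later deadlines.
        deadline += period_ns
    return lateness_ms


def summarize(lateness_ms, threshold_ms=2.0):
    """Report the worst lateness and how many wakeups exceeded the threshold."""
    return {
        "max_ms": max(lateness_ms),
        "over_threshold": sum(1 for x in lateness_ms if x > threshold_ms),
    }


if __name__ == "__main__":
    print(summarize(measure_wakeup_lateness()))
```

<p>
Running the sketch while toggling a suspected cause (for example, a CPU frequency
change or a DRM operation) may show whether the worst-case lateness grows, which can
help narrow down a source that does not appear in context switch traces.
</p>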