<html devsite>
  <head>
    <title>Contributors to Audio Latency</title>
    <meta name="project_path" value="/_project.yaml" />
    <meta name="book_path" value="/_book.yaml" />
  </head>
  <body>
  <!--
      Copyright 2017 The Android Open Source Project

      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
  -->

<p>
  This page focuses on the contributors to output latency,
  but a similar discussion applies to input latency.
</p>
<p>
  Assuming the analog circuitry does not contribute significantly, the major
  surface-level contributors to audio latency are the following:
</p>
     34 
     35 <ul>
     36   <li>Application</li>
     37   <li>Total number of buffers in pipeline</li>
     38   <li>Size of each buffer, in frames</li>
     39   <li>Additional latency after the app processor, such as from a DSP</li>
     40 </ul>
     41 
<p>
  As accurate as the above list of contributors may be, it is also misleading.
  The reason is that buffer count and buffer size are more of an
  <em>effect</em> than a <em>cause</em>.  What usually happens is that
  a given buffer scheme is implemented and tested, but during testing, an audio
  underrun or overrun is heard as a "click" or "pop."  To compensate, the
  system designer then increases buffer sizes or buffer counts.
  This has the desired result of eliminating the underruns or overruns, but it also
  has the undesired side effect of increasing latency.
  For more information about buffer sizes, see the video
  <a href="https://youtu.be/PnDK17zP9BI">Audio latency: buffer sizes</a>.
</p>

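<p>
  The tradeoff can be sketched with simple arithmetic: the latency contributed by
  the buffer pipeline is roughly the total number of buffered frames divided by
  the sample rate. The sketch below uses illustrative values (48 kHz, 240-frame
  buffers), not values mandated by Android.
</p>

```python
# Illustrative: how buffer count and buffer size translate into latency.

def buffering_latency_ms(num_buffers: int, frames_per_buffer: int,
                         sample_rate_hz: int) -> float:
    """Latency contributed by the buffer pipeline, in milliseconds."""
    total_frames = num_buffers * frames_per_buffer
    return total_frames * 1000.0 / sample_rate_hz

# Doubling either the buffer count or the buffer size doubles this contribution.
print(buffering_latency_ms(2, 240, 48000))  # 2 x 240 frames at 48 kHz -> 10.0 ms
print(buffering_latency_ms(4, 240, 48000))  # padding against underruns -> 20.0 ms
```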
<p>
  A better approach is to understand the causes of the
  underruns and overruns, and then correct those.  This eliminates the
  audible artifacts and may permit even smaller or fewer buffers
  and thus reduce latency.
</p>

<p>
  In our experience, the most common causes of underruns and overruns include:
</p>
<ul>
  <li>Linux CFS (Completely Fair Scheduler)</li>
  <li>high-priority threads with SCHED_FIFO scheduling</li>
  <li>priority inversion</li>
  <li>long scheduling latency</li>
  <li>long-running interrupt handlers</li>
  <li>long interrupt disable time</li>
  <li>power management</li>
  <li>security kernels</li>
</ul>

<h3 id="linuxCfs">Linux CFS and SCHED_FIFO scheduling</h3>
<p>
  The Linux CFS is designed to be fair to competing workloads sharing a common CPU
  resource. This fairness is represented by a per-thread <em>nice</em> parameter.
  The nice value ranges from -20 (least nice, or most CPU time allocated)
  to 19 (nicest, or least CPU time allocated). In general, all threads with a given
  nice value receive approximately equal CPU time, and threads with a
  numerically lower nice value should expect to
  receive more CPU time. However, CFS is "fair" only over relatively long
  periods of observation. Over short-term observation windows,
  CFS may allocate the CPU resource in unexpected ways. For example, it
  may take the CPU away from a thread with numerically low niceness
  and give it to a thread with numerically high niceness.  In the case of audio,
  this can result in an underrun or overrun.
</p>

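<p>
  As a minimal illustration of the nice parameter, the <code>nice(2)</code>
  syscall (exposed in Python as <code>os.nice</code>) adds its argument to the
  calling process's niceness and returns the new value. An unprivileged process
  may only raise its niceness; lowering it requires <code>CAP_SYS_NICE</code>.
</p>

```python
import os

# An increment of 0 simply reads the current niceness without changing it.
current = os.nice(0)
print("current niceness:", current)

# Politely request less CPU time under CFS (always allowed unprivileged).
raised = os.nice(5)
print("after nice(5):", raised)
```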
<p>
  The obvious solution is to avoid CFS for high-performance audio
  threads. Beginning with Android 4.1, such threads use the
  <code>SCHED_FIFO</code> scheduling policy rather than the <code>SCHED_NORMAL</code> (also called
  <code>SCHED_OTHER</code>) scheduling policy implemented by CFS.
</p>

<h3 id="schedFifo">SCHED_FIFO priorities</h3>
<p>
  Though the high-performance audio threads now use <code>SCHED_FIFO</code>, they
  are still susceptible to other higher-priority <code>SCHED_FIFO</code> threads.
  These are typically kernel worker threads, but there may also be a few
  non-audio user threads with policy <code>SCHED_FIFO</code>. The available <code>SCHED_FIFO</code>
  priorities range from 1 to 99.  The audio threads run at priority
  2 or 3.  This leaves priority 1 available for lower-priority threads,
  and priorities 4 to 99 for higher-priority threads.  We recommend that
  you use priority 1 whenever possible, and reserve priorities 4 to 99 for
  those threads that are guaranteed to complete within a bounded amount
  of time, execute with a period shorter than the period of audio threads,
  and are known to not interfere with scheduling of audio threads.
</p>

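<p>
  The 1 to 99 range can be confirmed through the scheduler API. The sketch below
  queries the valid <code>SCHED_FIFO</code> priority range and then attempts to
  move the calling thread to priority 2; because changing policy requires
  <code>CAP_SYS_NICE</code>, an unprivileged process should expect the attempt
  to fail.
</p>

```python
import os

# Valid static priority range for SCHED_FIFO (1..99 on Linux).
lo = os.sched_get_priority_min(os.SCHED_FIFO)
hi = os.sched_get_priority_max(os.SCHED_FIFO)
print(f"SCHED_FIFO priorities: {lo}..{hi}")

# Android's fast audio threads run at SCHED_FIFO priority 2 or 3.
try:
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(2))
    print("now running SCHED_FIFO at priority 2")
except PermissionError:
    print("CAP_SYS_NICE required; still running under CFS")
```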
<h3 id="rms">Rate-monotonic scheduling</h3>
<p>
  For more information on the theory of assignment of fixed priorities,
  see the Wikipedia article
  <a href="http://en.wikipedia.org/wiki/Rate-monotonic_scheduling">Rate-monotonic scheduling</a> (RMS).
  A key point is that fixed priorities should be allocated strictly based on period,
  with higher priorities assigned to threads of shorter periods, not based on perceived "importance."
  Non-periodic threads may be modeled as periodic threads, using the maximum frequency of execution
  and maximum computation per execution.  If a non-periodic thread cannot be modeled as
  a periodic thread (for example, it could execute with unbounded frequency or unbounded computation
  per execution), then it should not be assigned a fixed priority, as that would be incompatible
  with the scheduling of true periodic threads.
</p>

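<p>
  A classic RMS result (due to Liu and Layland) is that a set of <i>n</i>
  periodic tasks is schedulable under fixed priorities if total CPU utilization
  stays below <i>n</i>(2<sup>1/<i>n</i></sup> &minus; 1). The sketch below
  applies this bound to a hypothetical task set; the task parameters are
  illustrative only.
</p>

```python
# Rate-monotonic sketch: priorities go strictly by period (shorter period ->
# higher priority), and the Liu & Layland bound gives a sufficient
# schedulability test.

def rms_bound(n: int) -> float:
    """Utilization bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)

def rms_schedulable(tasks) -> bool:
    """tasks: list of (computation_time, period) pairs in any consistent unit."""
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rms_bound(len(tasks))

# Hypothetical set: a 2 ms audio callback every 10 ms, plus a 10 ms worker
# every 50 ms.  Utilization 0.4 is under the 2-task bound of ~0.828.
tasks = [(2, 10), (10, 50)]
print(rms_schedulable(tasks))  # -> True
```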
<h3 id="priorityInversion">Priority inversion</h3>
<p>
  <a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a>
  is a classic failure mode of real-time systems,
  where a higher-priority task is blocked for an unbounded time waiting
  for a lower-priority task to release a resource such as (shared
  state protected by) a
  <a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>.
  See the article "<a href="avoiding_pi.html">Avoiding priority inversion</a>" for techniques to
  mitigate it.
</p>

<h3 id="schedLatency">Scheduling latency</h3>
<p>
  Scheduling latency is the time between when a thread becomes
  ready to run and when the resulting context switch completes so that the
  thread actually runs on a CPU. The shorter the latency the better, and
  anything over two milliseconds causes problems for audio. Long scheduling
  latency is most likely to occur during mode transitions, such as
  bringing up or shutting down a CPU, switching between a security kernel
  and the normal kernel, switching from full-power to low-power mode,
  or adjusting the CPU clock frequency and voltage.
</p>

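<p>
  A rough user-space probe of scheduling latency is to request a short sleep and
  measure how far past the deadline the thread actually wakes; tools such as
  <code>cyclictest</code> do this rigorously. The sketch below is only
  illustrative, and the interval and iteration count are arbitrary.
</p>

```python
import time

def worst_wakeup_overshoot_ms(sleep_ms: float = 2.0,
                              iterations: int = 50) -> float:
    """Worst observed overshoot past the requested sleep interval, in ms."""
    worst = 0.0
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(sleep_ms / 1000.0)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        worst = max(worst, elapsed_ms - sleep_ms)
    return worst

# Overshoot consistently above ~2 ms would be a problem for audio.
print(f"worst observed overshoot: {worst_wakeup_overshoot_ms():.3f} ms")
```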
<h3 id="interrupts">Interrupts</h3>
<p>
  In many designs, CPU 0 services all external interrupts.  So a
  long-running interrupt handler may delay other interrupts, in particular
  audio direct memory access (DMA) completion interrupts. Design interrupt handlers
  to finish quickly and defer lengthy work to a thread (preferably
  a CFS thread or <code>SCHED_FIFO</code> thread of priority 1).
</p>

<p>
  Similarly, disabling interrupts on CPU 0 for a long period
  delays the servicing of audio interrupts.
  Long interrupt disable times typically happen while waiting for a kernel
  <i>spin lock</i>.  Review these spin locks to ensure they are bounded.
</p>

<h3 id="power">Power, performance, and thermal management</h3>
<p>
  <a href="http://en.wikipedia.org/wiki/Power_management">Power management</a>
  is a broad term that encompasses efforts to monitor
  and reduce power consumption while optimizing performance.
  <a href="http://en.wikipedia.org/wiki/Thermal_management_of_electronic_devices_and_systems">Thermal management</a>
  and <a href="http://en.wikipedia.org/wiki/Computer_cooling">computer cooling</a>
  are similar but seek to measure and control heat to avoid damage due to excess heat.
  In the Linux kernel, the CPU
  <a href="http://en.wikipedia.org/wiki/Governor_%28device%29">governor</a>
  is responsible for low-level policy, while user mode configures high-level policy.
  Techniques used include:
</p>

<ul>
  <li>dynamic voltage scaling</li>
  <li>dynamic frequency scaling</li>
  <li>dynamic core enabling</li>
  <li>cluster switching</li>
  <li>power gating</li>
  <li>hotplug (hotswap)</li>
  <li>various sleep modes (halt, stop, idle, suspend, etc.)</li>
  <li>process migration</li>
  <li><a href="http://en.wikipedia.org/wiki/Processor_affinity">processor affinity</a></li>
</ul>

<p>
  Some management operations can result in "work stoppages," or
  times during which no useful work is performed by the application processor.
  These work stoppages can interfere with audio, so such management should be designed
  for an acceptable worst-case work stoppage while audio is active.
  Of course, when thermal runaway is imminent, avoiding permanent damage
  is more important than audio!
</p>

<h3 id="security">Security kernels</h3>
<p>
  A <a href="http://en.wikipedia.org/wiki/Security_kernel">security kernel</a> for
  <a href="http://en.wikipedia.org/wiki/Digital_rights_management">Digital rights management</a>
  (DRM) may run on the same application processor core(s) as those used
  for the main operating system kernel and application code.  Any time
  during which a security kernel operation is active on a core is effectively a
  stoppage of ordinary work that would normally run on that core.
  In particular, this may include audio work.  By its nature, the internal
  behavior of a security kernel is inscrutable from higher-level layers, and thus
  any performance anomalies caused by a security kernel are especially
  pernicious.  For example, security kernel operations do not typically appear in
  context switch traces.  We call this "dark time" &mdash; time that elapses
  yet cannot be observed.  Security kernels should be designed for an
  acceptable worst-case work stoppage while audio is active.
</p>

  </body>
</html>