page.title=Data Formats
@jd:body

<!--
    Copyright 2015 The Android Open Source Project

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-->

<div id="qv-wrapper">
  <div id="qv">
    <h2>In this document</h2>
    <ol id="auto-toc">
    </ol>
  </div>
</div>

<p>
Android uses a wide variety of audio
<a href="http://en.wikipedia.org/wiki/Data_format">data formats</a>
internally, and exposes a subset of these in public APIs,
<a href="http://en.wikipedia.org/wiki/Audio_file_format">file formats</a>,
and the
<a href="https://en.wikipedia.org/wiki/Hardware_abstraction">Hardware Abstraction Layer</a> (HAL).
</p>

<h2 id="properties">Properties</h2>

<p>
The audio data formats are classified by their properties:
</p>

<dl>

  <dt><a href="https://en.wikipedia.org/wiki/Data_compression">Compression</a></dt>
  <dd>
    <a href="http://en.wikipedia.org/wiki/Raw_data">Uncompressed</a>,
    <a href="http://en.wikipedia.org/wiki/Lossless_compression">lossless compressed</a>, or
    <a href="http://en.wikipedia.org/wiki/Lossy_compression">lossy compressed</a>.
    PCM is the most common uncompressed audio format. FLAC is a lossless compressed
    format, while MP3 and AAC are lossy compressed formats.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Audio_bit_depth">Bit depth</a></dt>
  <dd>
    Number of significant bits per audio sample.
  </dd>

  <dt><a href="https://en.wikipedia.org/wiki/Sizeof">Container size</a></dt>
  <dd>
    Number of bits used to store or transmit a sample. Usually
    this is the same as the bit depth, but sometimes additional
    padding bits are allocated for alignment. For example, a
    24-bit sample could be contained within a 32-bit word.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">Alignment</a></dt>
  <dd>
    If the container size is exactly equal to the bit depth, the
    representation is called <em>packed</em>. Otherwise the representation is
    <em>unpacked</em>. The significant bits of the sample are typically
    aligned with either the leftmost (most significant) or rightmost
    (least significant) bit of the container. It is conventional to use
    the terms <em>packed</em> and <em>unpacked</em> only when the bit
    depth is not a
    <a href="http://en.wikipedia.org/wiki/Power_of_two">power of two</a>.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Signedness">Signedness</a></dt>
  <dd>
    Whether samples are signed or unsigned.
  </dd>

  <dt>Representation</dt>
  <dd>
    Either fixed point or floating point; see below.
  </dd>

</dl>
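
<p>
As an illustration of container size and alignment, the following Python sketch
places a signed 24-bit sample into a packed 3-byte container and into an unpacked
32-bit container, both left-justified and right-justified. The function names are
hypothetical and the little-endian byte order is an assumption made for the example.
</p>

```python
def pack_24(sample):
    """Packed: a signed 24-bit sample stored in exactly three bytes (little-endian assumed)."""
    return sample.to_bytes(3, "little", signed=True)


def left_justify_24_in_32(sample):
    """Unpacked: significant bits aligned with the MSB, zero fill toward the LSB."""
    assert -(1 << 23) <= sample < (1 << 23)
    return sample << 8


def right_justify_24_in_32(sample):
    """Unpacked: significant bits aligned with the LSB, sign extended toward the MSB."""
    assert -(1 << 23) <= sample < (1 << 23)
    return sample  # as a signed integer, sign extension leaves the value unchanged


def as_u32(value):
    """Raw 32-bit pattern of a signed value, for inspecting the padding bits."""
    return value & 0xFFFFFFFF


print(pack_24(-2))                              # three bytes, no padding
print(hex(as_u32(left_justify_24_in_32(-2))))   # low 8 bits are zero fill
print(hex(as_u32(right_justify_24_in_32(-2))))  # high 8 bits repeat the sign
```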

<h2 id="fixed">Fixed point representation</h2>

<p>
<a href="http://en.wikipedia.org/wiki/Fixed-point_arithmetic">Fixed point</a>
is the most common representation for uncompressed PCM audio data,
especially at hardware interfaces.
</p>

<p>
A fixed-point number has a fixed (constant) number of digits
before and after the <a href="https://en.wikipedia.org/wiki/Radix_point">radix point</a>.
All of our representations use
<a href="https://en.wikipedia.org/wiki/Binary_number">base 2</a>,
so we substitute <em>bit</em> for <em>digit</em>,
and <em>binary point</em> or simply <em>point</em> for <em>radix point</em>.
The bits to the left of the point are the integer part,
and the bits to the right of the point are the
<a href="https://en.wikipedia.org/wiki/Fractional_part">fractional part</a>.
</p>

<p>
We speak of <em>integer PCM</em>, because fixed-point values
are usually stored and manipulated as integer values.
The interpretation as fixed-point is implicit.
</p>

<p>
We use <a href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>
for all signed fixed-point representations,
so the following holds where all values are in units of one
<a href="https://en.wikipedia.org/wiki/Least_significant_bit">LSB</a>:
</p>
<pre>
|largest negative value| = |largest positive value| + 1
</pre>
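
<p>
The relation above can be checked for any container width. A minimal Python
sketch, with a hypothetical helper name:
</p>

```python
def signed_range(bits):
    """Return (most negative, most positive) value, in LSB units, for two's complement."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


lowest, highest = signed_range(16)
print(lowest, highest)
print(abs(lowest) == highest + 1)
```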

<h3 id="q">Q and U notation</h3>

<p>
There are various
<a href="https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation">notations</a>
for fixed-point representation in an integer.
We use <a href="https://en.wikipedia.org/wiki/Q_(number_format)">Q notation</a>:
Q<em>m</em>.<em>n</em> means <em>m</em> integer bits and <em>n</em> fractional bits.
The "Q" counts as one bit for the sign, though the value is expressed in
two's complement rather than sign-magnitude.
The total number of bits is <em>m</em> + <em>n</em> + 1.
</p>

<p>
U<em>m</em>.<em>n</em> is for unsigned numbers:
<em>m</em> integer bits and <em>n</em> fractional bits,
and the "U" counts as zero bits.
The total number of bits is <em>m</em> + <em>n</em>.
</p>
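
<p>
To make the bit counting concrete, the following Python sketch derives the total
width, the range, and the value of one LSB from <em>m</em> and <em>n</em>.
The helper names are hypothetical, and exact fractions are used so that no
rounding enters the picture.
</p>

```python
from fractions import Fraction


def q_format(m, n):
    """Signed Qm.n: (total bits, minimum, maximum, value of one LSB)."""
    lsb = Fraction(1, 1 << n)
    total_bits = m + n + 1  # the "Q" contributes the sign bit
    minimum = -(1 << (m + n)) * lsb
    maximum = ((1 << (m + n)) - 1) * lsb
    return total_bits, minimum, maximum, lsb


def u_format(m, n):
    """Unsigned Um.n: (total bits, minimum, maximum, value of one LSB)."""
    lsb = Fraction(1, 1 << n)
    total_bits = m + n  # the "U" contributes no bits
    return total_bits, Fraction(0), ((1 << (m + n)) - 1) * lsb, lsb


print(q_format(0, 15))  # 16 bits, range -1 to 1 minus one LSB
print(q_format(4, 27))  # 32 bits, range -16 to 16 minus one LSB
print(u_format(8, 0))   # 8 bits, range 0 to 255
```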

<p>
The integer part may be used in the final result, or be temporary.
In the latter case, the bits that make up the integer part are called
<em>guard bits</em>. The guard bits permit an intermediate calculation to overflow,
as long as the final value is within range or can be clamped to be within range.
Note that fixed-point guard bits are at the left, while floating-point unit
<a href="https://en.wikipedia.org/wiki/Guard_digit">guard digits</a>
are used to reduce roundoff error and are on the right.
</p>
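
<p>
A minimal Python sketch of guard bits in use: several Q0.15 samples are summed
in a wider accumulator (conceptually Q16.15 in a 32-bit word) and the result is
clamped once at the end. The function names are hypothetical, and because Python
integers are unbounded, the 32-bit container is modeled with an explicit range check.
</p>

```python
Q15_MIN, Q15_MAX = -32768, 32767


def clamp_q15(value):
    """Clamp an integer to the Q0.15 range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def mix_q15(samples):
    """Sum Q0.15 samples with 16 guard bits, then clamp the final value."""
    acc = 0  # conceptually Q16.15: same point position, 16 extra integer bits
    for sample in samples:
        acc += sample
        # The intermediate sum may leave the Q0.15 range but must fit in 32 bits.
        assert -(1 << 31) <= acc < (1 << 31)
    return clamp_q15(acc)


print(mix_q15([30000, 30000, -30000]))  # intermediate 60000 overflows Q0.15, final value fits
print(mix_q15([30000, 30000]))          # final value is out of range and is clamped
```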

<h2 id="floating">Floating point representation</h2>

<p>
<a href="https://en.wikipedia.org/wiki/Floating_point">Floating point</a>
is an alternative to fixed point, in which the location of the point can vary.
The primary advantages of floating-point include:
</p>

<ul>
  <li>Greater <a href="https://en.wikipedia.org/wiki/Headroom_(audio_signal_processing)">headroom</a>
      and <a href="https://en.wikipedia.org/wiki/Dynamic_range">dynamic range</a>;
      floating-point arithmetic tolerates exceeding nominal ranges
      during intermediate computation, and only clamps values at the end
  </li>
  <li>Support for special values such as infinities and NaN</li>
  <li>Easier to use in many cases</li>
</ul>

<p>
Historically, floating-point arithmetic was slower than integer or fixed-point
arithmetic, but now it is common for floating-point to be faster,
provided control flow decisions aren't based on the value of a computation.
</p>

<h2 id="androidFormats">Android formats for audio</h2>

<p>
The major Android formats for audio are listed in the table below:
</p>

<table>

<tr>
  <th></th>
  <th colspan="5"><center>Notation</center></th>
</tr>

<tr>
  <th>Property</th>
  <th>Q0.15</th>
  <th>Q0.7 <sup>1</sup></th>
  <th>Q0.23</th>
  <th>Q0.31</th>
  <th>float</th>
</tr>

<tr>
  <td>Container<br />bits</td>
  <td>16</td>
  <td>8</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>32</td>
  <td>32</td>
</tr>

<tr>
  <td>Significant bits<br />including sign</td>
  <td>16</td>
  <td>8</td>
  <td>24</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>25 <sup>3</sup></td>
</tr>

<tr>
  <td>Headroom<br />in dB</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>126 <sup>4</sup></td>
</tr>

<tr>
  <td>Dynamic range<br />in dB</td>
  <td>90</td>
  <td>42</td>
  <td>138</td>
  <td>138 to 186</td>
  <td>900 <sup>5</sup></td>
</tr>

</table>

<p>
All fixed-point formats above have a nominal range of -1.0 to +1.0 minus one LSB.
There is one more negative value than positive value due to the
two's complement representation.
</p>

<p>
Footnotes:
</p>

<ol>

<li>
All formats above express signed sample values.
The 8-bit format is commonly called "unsigned", but
it is actually a signed value with bias of <code>0.10000000</code>.
</li>

<li>
Q0.23 may be packed into 24 bits (three 8-bit bytes), or unpacked
in 32 bits. If unpacked, the significant bits are either right-justified
towards the LSB with sign extension padding towards the MSB (Q8.23),
or left-justified towards the MSB with zero fill towards the LSB
(Q0.31). Q0.31 theoretically permits up to 32 significant bits,
but hardware interfaces that accept Q0.31 rarely use all the bits.
</li>

<li>
Single-precision floating point has 23 explicit significand bits plus one hidden bit
and a sign bit, resulting in 25 significant bits total.
<a href="https://en.wikipedia.org/wiki/Denormal_number">Denormal numbers</a>
have fewer significant bits.
</li>

<li>
Single-precision floating point can express values up to approximately &plusmn;3.4e+38,
which explains the large headroom.
</li>

<li>
The dynamic range shown is for denormals up to the nominal maximum
value &plusmn;1.0.
Note that some architecture-specific floating point implementations such as
<a href="https://en.wikipedia.org/wiki/ARM_architecture#NEON">NEON</a>
don't support denormals.
</li>

</ol>
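
<p>
As a rough check on the dynamic range row, the following Python sketch computes
the ratio of full scale to one LSB in decibels for a signed fixed-point format,
and the ratio of the nominal maximum 1.0 to the smallest single-precision denormal
(assumed here to be 2<sup>-149</sup>). The helper names are hypothetical. The
fixed-point results agree with the table when rounded down, and the floating-point
result is close to the rounded figure in the table.
</p>

```python
import math


def fixed_dynamic_range_db(significant_bits):
    """Full scale relative to one LSB, in dB, for a signed format (one bit is the sign)."""
    return 20 * math.log10(2 ** (significant_bits - 1))


def float32_dynamic_range_db():
    """Nominal maximum 1.0 relative to the smallest denormal 2**-149, in dB."""
    return 20 * math.log10(1.0 / 2.0 ** -149)


for bits in (16, 8, 24, 32):
    print(bits, round(fixed_dynamic_range_db(bits), 1))
print("float", round(float32_dynamic_range_db(), 1))
```

<p>
This prints approximately 90.3, 42.1, 138.5, and 186.6 dB for the fixed-point
widths, and approximately 897 dB for single-precision floating point.
</p>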

<h2 id="conversions">Conversions</h2>

<p>
This section discusses
<a href="https://en.wikipedia.org/wiki/Data_conversion">data conversions</a>
between various representations.
</p>

<h3 id="floatConversions">Floating point conversions</h3>

<p>
To convert a value from Q<em>m</em>.<em>n</em> format to floating point:
</p>

<ol>
  <li>Convert the value to floating point as if it were an integer (by ignoring the point).</li>
  <li>Multiply by 2<sup>-<em>n</em></sup>.</li>
</ol>

<p>
For example, to convert a Q4.27 internal value to floating point, use:
</p>
<pre>
float = integer * (2 ^ -27)
</pre>
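
<p>
The two steps above can be sketched in Python as follows. The function name is
hypothetical, and note that a Python <code>float</code> is double precision, so
this sketch does not model single-precision rounding.
</p>

```python
def q_to_float(value, n):
    """Convert an integer holding a Qm.n value to floating point."""
    as_float = float(value)      # step 1: treat the bits as a plain integer
    return as_float * 2.0 ** -n  # step 2: scale by 2^-n to restore the point


print(q_to_float(1 << 27, 27))     # Q4.27 integer for 1.0
print(q_to_float(-(1 << 26), 27))  # Q4.27 integer for -0.5
print(q_to_float(-32768, 15))      # most negative Q0.15 value
```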

<p>
Conversions from floating point to fixed point follow these rules:
</p>

<ul>

<li>
Single-precision floating point has a nominal range of &plusmn;1.0,
but the full range for intermediate values is approximately &plusmn;3.4e+38.
Conversion between floating point and fixed point for external representation
(such as output to audio devices) will consider only the nominal range, with
clamping for values that exceed that range.
In particular, when +1.0 is converted
to a fixed-point format, it is clamped to +1.0 minus one LSB.
</li>

<li>
Denormals (subnormals) and both +/- 0.0 are allowed in representation,
but may be silently converted to 0.0 during processing.
</li>

<li>
Infinities will either pass through operations or will be silently hard-limited
to +/- 1.0. Generally the latter is for conversion to a fixed-point format.
</li>

<li>
NaN behavior is undefined: a NaN may propagate as an identical NaN, be
converted to a default NaN, be silently hard-limited to +/- 1.0, be
silently converted to 0.0, or result in an error.
</li>

</ul>
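
<p>
A minimal Python sketch of a conversion to Q0.15 that follows the rules above.
The function name is hypothetical. Two choices here are assumptions of the sketch
rather than requirements: NaN is mapped to 0, which is only one of the outcomes
the rules permit, and in-range values are rounded to the nearest integer.
</p>

```python
import math


def float_to_q15(x):
    """Convert a float to a Q0.15 integer, clamping to the nominal range."""
    if math.isnan(x):
        return 0  # assumption: NaN behavior is undefined, this sketch picks 0
    scaled = x * 32768.0  # multiply by 2^15 to move the point
    if scaled >= 32767.0:
        return 32767      # +1.0 and above (including +infinity) clamp to 1.0 minus one LSB
    if scaled <= -32768.0:
        return -32768     # -1.0 and below (including -infinity) clamp to -1.0
    return int(round(scaled))


print(float_to_q15(1.0), float_to_q15(-1.0), float_to_q15(0.5))
print(float_to_q15(float("inf")), float_to_q15(float("nan")))
```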

<h3 id="fixedConversion">Fixed point conversions</h3>

<p>
Conversions between different Q<em>m</em>.<em>n</em> formats follow these rules:
</p>

<ul>

<li>
When <em>m</em> is increased, sign extend the integer part at left.
</li>

<li>
When <em>m</em> is decreased, clamp the integer part.
</li>

<li>
When <em>n</em> is increased, zero extend the fractional part at right.
</li>

<li>
When <em>n</em> is decreased, either dither, round, or truncate the excess fractional bits at right.
</li>

</ul>

<p>
For example, to convert a Q4.27 value to Q0.15 (without dither or
rounding), right shift the Q4.27 value by 12 bits, and clamp any results
that exceed the 16-bit signed range. This aligns the point of the
Q representation.
</p>
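
<p>
The Q4.27 to Q0.15 example can be sketched in Python as follows. The function
name is hypothetical. Python's <code>&gt;&gt;</code> on a negative integer is an
arithmetic shift, so the excess fractional bits are truncated toward negative infinity.
</p>

```python
def q4_27_to_q0_15(value):
    """Convert a Q4.27 integer to Q0.15 by truncation, clamping out-of-range results."""
    shifted = value >> 12                    # drop 27 - 15 = 12 fractional bits
    return max(-32768, min(32767, shifted))  # reduce m from 4 to 0 by clamping


print(q4_27_to_q0_15(1 << 26))     # 0.5 stays 0.5
print(q4_27_to_q0_15(1 << 27))     # 1.0 clamps to 1.0 minus one LSB
print(q4_27_to_q0_15(1 << 28))     # 2.0 clamps as well
print(q4_27_to_q0_15(-(1 << 27)))  # -1.0 is representable
```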

<p>To convert Q7.24 to Q7.23, do a signed divide by 2,
or equivalently add the sign bit to the Q7.24 integer quantity, and then signed right shift by 1.
Note that a simple signed right shift is <em>not</em> equivalent to a signed divide by 2:
for negative odd values, the shift rounds toward negative infinity, whereas the divide
rounds toward zero.
</p>
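
<p>
The difference shows up only for negative odd values. A minimal Python sketch
with a hypothetical function name follows. Python's <code>//</code> operator rounds
toward negative infinity, so it is not used here as the "signed divide".
</p>

```python
def halve_toward_zero(value):
    """Divide by 2 rounding toward zero: add the sign bit, then arithmetic shift right by 1."""
    sign_bit = 1 if value < 0 else 0
    return (value + sign_bit) >> 1


for v in (3, -3, -4, -1):
    # Columns: input, divide by 2 rounding toward zero, plain arithmetic right shift.
    print(v, halve_toward_zero(v), v >> 1)
```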

<h3 id="lossyConversion">Lossy and lossless conversions</h3>

<p>
A conversion is <em>lossless</em> if it is
<a href="https://en.wikipedia.org/wiki/Inverse_function">invertible</a>:
a conversion from <code>A</code> to <code>B</code> to
<code>C</code> results in <code>A = C</code>.
Otherwise the conversion is <a href="https://en.wikipedia.org/wiki/Lossy_data_conversion">lossy</a>.
</p>

<p>
Lossless conversions permit
<a href="https://en.wikipedia.org/wiki/Round-trip_format_conversion">round-trip format conversion</a>.
</p>

<p>
Conversions from fixed point representation with 25 or fewer significant bits to floating point are lossless.
Conversions from floating point to any common fixed point representation are lossy.
</p>
    406