<html devsite>
  <head>
    <title>Data Formats</title>
    <meta name="project_path" value="/_project.yaml" />
    <meta name="book_path" value="/_book.yaml" />
  </head>
  <body>
  <!--
      Copyright 2017 The Android Open Source Project

      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
  -->

<p>
Android uses a wide variety of audio
<a href="http://en.wikipedia.org/wiki/Data_format">data formats</a>
internally, and exposes a subset of these in public APIs,
<a href="http://en.wikipedia.org/wiki/Audio_file_format">file formats</a>,
and the
<a href="https://en.wikipedia.org/wiki/Hardware_abstraction">Hardware Abstraction Layer</a> (HAL).
</p>

<h2 id="properties">Properties</h2>

<p>
The audio data formats are classified by their properties:
</p>

<dl>

  <dt><a href="https://en.wikipedia.org/wiki/Data_compression">Compression</a></dt>
  <dd>
    <a href="http://en.wikipedia.org/wiki/Raw_data">Uncompressed</a>,
    <a href="http://en.wikipedia.org/wiki/Lossless_compression">lossless compressed</a>, or
    <a href="http://en.wikipedia.org/wiki/Lossy_compression">lossy compressed</a>.
    PCM is the most common uncompressed audio format. FLAC is a lossless compressed
    format, while MP3 and AAC are lossy compressed formats.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Audio_bit_depth">Bit depth</a></dt>
  <dd>
    Number of significant bits per audio sample.
  </dd>

  <dt><a href="https://en.wikipedia.org/wiki/Sizeof">Container size</a></dt>
  <dd>
    Number of bits used to store or transmit a sample. Usually
    this is the same as the bit depth, but sometimes additional
    padding bits are allocated for alignment. For example, a
    24-bit sample could be contained within a 32-bit word.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">Alignment</a></dt>
  <dd>
    If the container size is exactly equal to the bit depth, the
    representation is called <em>packed</em>. Otherwise the representation is
    <em>unpacked</em>. The significant bits of the sample are typically
    aligned with either the leftmost (most significant) or rightmost
    (least significant) bit of the container. It is conventional to use
    the terms <em>packed</em> and <em>unpacked</em> only when the bit
    depth is not a
    <a href="http://en.wikipedia.org/wiki/Power_of_two">power of two</a>.
  </dd>

  <dt><a href="http://en.wikipedia.org/wiki/Signedness">Signedness</a></dt>
  <dd>
    Whether samples are signed or unsigned.
  </dd>

  <dt>Representation</dt>
  <dd>
    Either fixed point or floating point; see below.
  </dd>

</dl>
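
<p>
As a rough illustration of container size and alignment, the following Python
sketch stores one 24-bit sample packed in three bytes, and unpacked in a 32-bit
container either right-justified with sign extension or left-justified with zero
fill. Little-endian byte order is an assumption, and the function names are
hypothetical rather than part of any Android API.
</p>

```python
import struct

# Sketch only: hypothetical helper names, little-endian byte order assumed.

def pack_24(sample):
    """Packed: a 24-bit two's complement sample stored in exactly 3 bytes."""
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample does not fit in 24 bits")
    return (sample & 0xFFFFFF).to_bytes(3, "little")

def unpack_24(data):
    """Read 3 bytes back as a signed 24-bit value."""
    return int.from_bytes(data, "little", signed=True)

def right_justified_32(sample):
    """Unpacked in 32 bits: significant bits at the LSB end, sign extension at the MSB end."""
    return struct.pack("<i", sample)

def left_justified_32(sample):
    """Unpacked in 32 bits: significant bits at the MSB end, zero fill at the LSB end."""
    return struct.pack("<i", sample << 8)

print(pack_24(-2).hex())             # feffff
print(right_justified_32(-2).hex())  # feffffff
print(left_justified_32(-2).hex())   # 00feffff
```

<p>
The right-justified layout corresponds to what the table footnotes further down call
Q8.23, and the left-justified layout to Q0.31.
</p>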

<h2 id="fixed">Fixed point representation</h2>

<p>
<a href="http://en.wikipedia.org/wiki/Fixed-point_arithmetic">Fixed point</a>
is the most common representation for uncompressed PCM audio data,
especially at hardware interfaces.
</p>

<p>
A fixed-point number has a fixed (constant) number of digits
before and after the <a href="https://en.wikipedia.org/wiki/Radix_point">radix point</a>.
All of our representations use
<a href="https://en.wikipedia.org/wiki/Binary_number">base 2</a>,
so we substitute <em>bit</em> for <em>digit</em>,
and <em>binary point</em> or simply <em>point</em> for <em>radix point</em>.
The bits to the left of the point are the integer part,
and the bits to the right of the point are the
<a href="https://en.wikipedia.org/wiki/Fractional_part">fractional part</a>.
</p>

<p>
We speak of <em>integer PCM</em>, because fixed-point values
are usually stored and manipulated as integer values.
The interpretation as fixed-point is implicit.
</p>

<p>
We use <a href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>
for all signed fixed-point representations,
so the following holds where all values are in units of one
<a href="https://en.wikipedia.org/wiki/Least_significant_bit">LSB</a>:
</p>
<pre class="devsite-click-to-copy">
|largest negative value| = |largest positive value| + 1
</pre>
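
<p>
This relation can be checked by computing the signed range for a few common
container widths. The Python sketch below passes the width explicitly because Python
integers are unbounded; <code>twos_complement_range</code> is a hypothetical helper.
</p>

```python
# Sketch only: hypothetical helper name.

def twos_complement_range(bits):
    """Return (most negative, most positive) values of a signed two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

for bits in (8, 16, 24, 32):
    lo, hi = twos_complement_range(bits)
    # |largest negative value| = |largest positive value| + 1, in units of one LSB
    assert abs(lo) == abs(hi) + 1
    print(f"{bits}-bit: {lo} to {hi}")
```

<p>
For 16 bits this prints a range of -32768 to 32767.
</p>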

<h3 id="q">Q and U notation</h3>

<p>
There are various
<a href="https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation">notations</a>
for fixed-point representation in an integer.
We use <a href="https://en.wikipedia.org/wiki/Q_(number_format)">Q notation</a>:
Q<em>m</em>.<em>n</em> means <em>m</em> integer bits and <em>n</em> fractional bits.
The "Q" counts as one bit, the sign bit, even though the value as a whole is
expressed in two's complement rather than as a separate sign and magnitude.
The total number of bits is therefore <em>m</em> + <em>n</em> + 1.
</p>

<p>
U<em>m</em>.<em>n</em> is for unsigned numbers:
<em>m</em> integer bits and <em>n</em> fractional bits,
and the "U" counts as zero bits.
The total number of bits is <em>m</em> + <em>n</em>.
</p>

<p>
The integer part may be used in the final result, or be temporary.
In the latter case, the bits that make up the integer part are called
<em>guard bits</em>. The guard bits permit an intermediate calculation to overflow,
as long as the final value is within range or can be clamped to be within range.
Note that fixed-point guard bits are at the left, while floating-point unit
<a href="https://en.wikipedia.org/wiki/Guard_digit">guard digits</a>
are used to reduce roundoff error and are on the right.
</p>
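
<p>
The following Python sketch contrasts averaging two Q0.15 samples in a 16-bit
register with averaging them in a wider accumulator. A 32-bit accumulator holding
Q0.15 data is effectively Q16.15, so its 16 integer bits act as guard bits. Python
integers are unbounded, so 16-bit wraparound is emulated explicitly; the helper names
are hypothetical.
</p>

```python
# Sketch only: hypothetical helper names.

INT16_MIN, INT16_MAX = -32768, 32767

def wrap_int16(x):
    """Emulate storing x in a 16-bit two's complement register (overflow wraps around)."""
    return ((x + 32768) & 0xFFFF) - 32768

def clamp_int16(x):
    """Clamp to the Q0.15 range."""
    return max(INT16_MIN, min(INT16_MAX, x))

def average_q15(samples, use_guard_bits):
    """Average Q0.15 samples, with or without guard bits in the accumulator."""
    acc = 0
    for s in samples:
        acc += s
        if not use_guard_bits:
            acc = wrap_int16(acc)     # a 16-bit register has nowhere to keep the carry
    # The final value is back in range (or is clamped into range).
    return clamp_int16(acc // len(samples))

# Two samples of 0.75 (24576 in Q0.15): the intermediate sum 1.5 exceeds the Q0.15 range.
print(average_q15([24576, 24576], use_guard_bits=True))   # 24576, i.e. 0.75
print(average_q15([24576, 24576], use_guard_bits=False))  # -8192, i.e. -0.25
```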

<h2 id="floating">Floating point representation</h2>

<p>
<a href="https://en.wikipedia.org/wiki/Floating_point">Floating point</a>
is an alternative to fixed point, in which the location of the point can vary.
The primary advantages of floating point include:
</p>

<ul>
  <li>Greater <a href="https://en.wikipedia.org/wiki/Headroom_(audio_signal_processing)">headroom</a>
      and <a href="https://en.wikipedia.org/wiki/Dynamic_range">dynamic range</a>;
      floating-point arithmetic tolerates exceeding nominal ranges
      during intermediate computation, and clamps values only at the end
  </li>
  <li>Support for special values such as infinities and NaN</li>
  <li>Easier to use in many cases</li>
</ul>
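
<p>
The headroom advantage can be sketched with a mix whose intermediate sum exceeds the
nominal &plusmn;1.0 range. In floating point the intermediate value simply exceeds
1.0, and clamping is applied only to the final result; clamping the intermediate
value instead changes the outcome. Python floats are double precision rather than
single precision, which does not matter for these values; the function names are
hypothetical.
</p>

```python
# Sketch only: hypothetical helper names.

def clamp_nominal(x):
    """Hard-limit a sample to the nominal floating-point range of -1.0 to +1.0."""
    return max(-1.0, min(1.0, x))

def mix(a, b, gain):
    """Sum two samples, apply a gain, and clamp only the final result."""
    return clamp_nominal((a + b) * gain)

def mix_clamping_early(a, b, gain):
    """Same mix, but the intermediate sum is clamped as a format without headroom requires."""
    return clamp_nominal(clamp_nominal(a + b) * gain)

print(mix(0.75, 0.75, 0.5))                 # 0.75: the intermediate 1.5 was kept
print(mix_clamping_early(0.75, 0.75, 0.5))  # 0.5: the intermediate was clipped to 1.0
```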

<p>
Historically, floating-point arithmetic was slower than integer or fixed-point
arithmetic, but now it is common for floating-point to be faster,
provided control flow decisions aren't based on the value of a computation.
</p>

<h2 id="androidFormats">Android formats for audio</h2>

<p>
The major Android formats for audio are listed in the table below:
</p>

<table>

<tr>
  <th></th>
  <th colspan="5"><center>Notation</center></th>
</tr>

<tr>
  <th>Property</th>
  <th>Q0.15</th>
  <th>Q0.7 <sup>1</sup></th>
  <th>Q0.23</th>
  <th>Q0.31</th>
  <th>float</th>
</tr>

<tr>
  <td>Container<br />bits</td>
  <td>16</td>
  <td>8</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>32</td>
  <td>32</td>
</tr>

<tr>
  <td>Significant bits<br />including sign</td>
  <td>16</td>
  <td>8</td>
  <td>24</td>
  <td>24 or 32 <sup>2</sup></td>
  <td>25 <sup>3</sup></td>
</tr>

<tr>
  <td>Headroom<br />in dB</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>0</td>
  <td>770 <sup>4</sup></td>
</tr>

<tr>
  <td>Dynamic range<br />in dB</td>
  <td>90</td>
  <td>42</td>
  <td>138</td>
  <td>138 to 186</td>
  <td>900 <sup>5</sup></td>
</tr>

</table>

<p>
All fixed-point formats above have a nominal range of -1.0 to +1.0 minus one LSB.
There is one more negative value than positive value due to the
two's complement representation.
</p>
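
<p>
The fixed-point dynamic range values in the table follow from the number of
fractional bits: the ratio of the nominal maximum (1.0) to one LSB
(2<sup>-<em>n</em></sup>) is 2<sup><em>n</em></sup>, about 6.02 dB per bit. The
Python sketch below reproduces those values, truncated to whole decibels, and
computes the corresponding single-precision figures from the IEEE 754 limits (largest
finite value about 3.4e+38, smallest denormal about 1.4e-45). Reading "dynamic range"
as full scale relative to the smallest step is an interpretation inferred from the
table, and the helper names are hypothetical.
</p>

```python
import math

# Sketch only: hypothetical helper names; "dynamic range" is read as
# full scale (1.0) relative to the smallest step, as the table suggests.

def fixed_point_dynamic_range_db(n):
    """Ratio of the nominal maximum (1.0) to one LSB (2**-n), in dB."""
    return 20 * math.log10(2 ** n)

def nominal_range(n):
    """Nominal range of a Q0.n format: -1.0 up to +1.0 minus one LSB."""
    return -1.0, 1.0 - 2.0 ** -n

# IEEE 754 single-precision limits, written out so no float32 library is needed.
FLT_MAX = (2 - 2 ** -23) * 2 ** 127  # largest finite value, about 3.4e+38
FLT_TRUE_MIN = 2.0 ** -149           # smallest positive denormal, about 1.4e-45

float_headroom_db = 20 * math.log10(FLT_MAX / 1.0)            # above nominal 1.0
float_dynamic_range_db = 20 * math.log10(1.0 / FLT_TRUE_MIN)  # smallest denormal to 1.0

for name, n in (("Q0.15", 15), ("Q0.7", 7), ("Q0.23", 23), ("Q0.31", 31)):
    lo, hi = nominal_range(n)
    print(f"{name}: {lo} to {hi}, dynamic range {int(fixed_point_dynamic_range_db(n))} dB")
print(f"float: headroom {int(float_headroom_db)} dB, "
      f"dynamic range {int(float_dynamic_range_db)} dB")
```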

<p>
Footnotes:
</p>

<ol>

<li>
All formats above express signed sample values.
The 8-bit format is commonly called "unsigned", but
it is actually a signed value with a bias of <code>0.10000000</code> (binary),
which corresponds to an offset of 128 (0x80) in the stored byte.
</li>

<li>
Q0.23 may be packed into 24 bits (three 8-bit bytes), or unpacked
in 32 bits. If unpacked, the significant bits are either right-justified
towards the LSB with sign extension padding towards the MSB (Q8.23),
or left-justified towards the MSB with zero fill towards the LSB
(Q0.31). Q0.31 theoretically permits up to 32 significant bits,
but hardware interfaces that accept Q0.31 rarely use all the bits.
</li>

<li>
Single-precision floating point has 23 explicit bits plus one hidden bit and one sign bit,
resulting in 25 significant bits total.
<a href="https://en.wikipedia.org/wiki/Denormal_number">Denormal numbers</a>
have fewer significant bits.
</li>

<li>
Single-precision floating point can express values up to &plusmn;3.4e+38,
which explains the large headroom.
</li>

<li>
The dynamic range shown is for denormals up to the nominal maximum
value &plusmn;1.0.
Note that some architecture-specific floating-point implementations such as
<a href="https://en.wikipedia.org/wiki/ARM_architecture#NEON">NEON</a>
don't support denormals.
</li>

</ol>
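
<p>
Footnote 1 can be made concrete: removing the bias of the 8-bit format means
subtracting 128 (0x80) from the stored byte, which gives the same result as flipping
its most significant bit and reading it as a signed 8-bit value. The Python sketch
below shows both forms and a scaling to the nominal floating-point range; the
function names are hypothetical.
</p>

```python
# Sketch only: hypothetical helper names.

def u8_to_q0_7(byte):
    """Remove the 0x80 bias: stored 0..255 becomes the signed Q0.7 integer -128..127."""
    return byte - 0x80

def u8_to_q0_7_xor(byte):
    """Same result by flipping the top bit and reading the byte as signed 8-bit."""
    flipped = byte ^ 0x80
    return flipped - 0x100 if flipped & 0x80 else flipped

def q0_7_to_u8(raw):
    """Re-apply the bias for output in the 8-bit format."""
    return raw + 0x80

def u8_to_float(byte):
    """Scale to the nominal range -1.0 to +1.0 minus one LSB."""
    return (byte - 0x80) / 128.0

assert all(u8_to_q0_7(b) == u8_to_q0_7_xor(b) for b in range(256))
print(u8_to_float(0x00), u8_to_float(0x80), u8_to_float(0xFF))  # -1.0 0.0 0.9921875
```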

<h2 id="conversions">Conversions</h2>

<p>
This section discusses
<a href="https://en.wikipedia.org/wiki/Data_conversion">data conversions</a>
between various representations.
</p>

<h3 id="floatConversions">Floating point conversions</h3>

<p>
To convert a value from Q<em>m</em>.<em>n</em> format to floating point:
</p>

<ol>
  <li>Convert the value to floating point as if it were an integer (by ignoring the point).</li>
  <li>Multiply by 2<sup>-<em>n</em></sup>.</li>
</ol>

<p>
For example, to convert a Q4.27 internal value to floating point, use:
</p>
<pre class="devsite-click-to-copy">
float = integer * (2 ^ -27)
</pre>

<p>
Conversions from floating point to fixed point follow these rules:
</p>

<ul>

<li>
Single-precision floating point has a nominal range of &plusmn;1.0,
but the full range for intermediate values is &plusmn;3.4e+38.
Conversion between floating point and fixed point for external representation
(such as output to audio devices) considers only the nominal range, with
clamping for values that exceed that range.
In particular, when +1.0 is converted
to a fixed-point format, it is clamped to +1.0 minus one LSB.
</li>

<li>
Denormals (subnormals) and both &plusmn;0.0 are allowed in representation,
but may be silently converted to 0.0 during processing.
</li>

<li>
Infinities either pass through operations or are silently hard-limited
to &plusmn;1.0. Hard-limiting is generally applied when converting to a fixed-point format.
</li>

<li>
NaN behavior is undefined: a NaN may propagate as an identical NaN, be
converted to a Default NaN, be silently hard-limited to &plusmn;1.0,
be silently converted to 0.0, or result in an error.
</li>

</ul>
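
<p>
The following Python sketch converts a floating-point sample to Q0.15 for external
output according to the rules above: values beyond the nominal range, including
+1.0 and infinities, are hard-limited, so +1.0 becomes +1.0 minus one LSB. Because
NaN behavior is undefined, mapping NaN to 0 is simply one permitted choice made here,
and rounding to nearest is an assumption, since the rules do not specify a rounding
mode. <code>float_to_q0_15</code> is a hypothetical name.
</p>

```python
import math

# Sketch only: hypothetical name; NaN -> 0 and round-to-nearest are choices, not requirements.

def float_to_q0_15(x):
    """Convert a float sample to Q0.15 (a 16-bit integer) for external output."""
    if math.isnan(x):
        return 0                    # NaN behavior is undefined; this sketch picks 0.0
    scaled = x * 32768.0            # multiply by 2**15
    # Clamp to the nominal range: +1.0 and +inf become +1.0 minus one LSB.
    if scaled >= 32767.0:
        return 32767
    if scaled <= -32768.0:
        return -32768
    return round(scaled)            # round half to even (Python's round); denormals become 0

for x in (1.0, 0.5, -1.0, float("inf"), float("-inf"), float("nan")):
    print(x, float_to_q0_15(x))
```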

<h3 id="fixedConversion">Fixed point conversions</h3>

<p>
Conversions between different Q<em>m</em>.<em>n</em> formats follow these rules:
</p>

<ul>

<li>
When <em>m</em> is increased, sign extend the integer part at left.
</li>

<li>
When <em>m</em> is decreased, clamp the integer part.
</li>

<li>
When <em>n</em> is increased, zero extend the fractional part at right.
</li>

<li>
When <em>n</em> is decreased, either dither, round, or truncate the excess fractional bits at right.
</li>

</ul>
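
<p>
The four rules can be combined into a single conversion function. In this Python
sketch (a hypothetical <code>convert_q</code> helper), sign extension happens
implicitly because Python integers are unbounded, truncation uses an arithmetic right
shift (which rounds toward negative infinity), and rounding adds half an output LSB
before shifting. Dither is omitted for brevity.
</p>

```python
# Sketch only: hypothetical helper; dither is omitted.

def convert_q(raw, m_in, n_in, m_out, n_out, mode="truncate"):
    """Convert a Q(m_in).(n_in) integer to Q(m_out).(n_out).

    m_in is not needed for the arithmetic; it is kept to mirror the notation.
    """
    shift = n_out - n_in
    if shift >= 0:
        value = raw << shift                            # n increased: zero extend at right
    elif mode == "round":
        value = (raw + (1 << (-shift - 1))) >> -shift   # n decreased: round half up
    else:
        value = raw >> -shift                           # n decreased: truncate toward -infinity
    # m increased: sign extension is implicit for Python's unbounded integers.
    # m decreased: clamp to the output range of m_out + n_out + 1 bits.
    lo = -(1 << (m_out + n_out))
    hi = (1 << (m_out + n_out)) - 1
    return max(lo, min(hi, value))

print(convert_q(0x4000, 0, 15, 4, 27))               # 0.5 in Q4.27: 67108864
print(convert_q(2 << 27, 4, 27, 0, 15))              # 2.0 clamps to 32767
print(convert_q(0x800, 4, 27, 0, 15))                # 0 (truncated)
print(convert_q(0x800, 4, 27, 0, 15, mode="round"))  # 1 (rounded)
```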

<p>
For example, to convert a Q4.27 value to Q0.15 (without dither or
rounding), right shift the Q4.27 value by 12 bits (27 - 15), and clamp any
results that exceed the 16-bit signed range. The shift aligns the binary points
of the two Q representations.
</p>

<p>To convert Q7.24 to Q7.23, do a signed divide by 2 (which truncates toward zero),
or equivalently add the sign bit (1 for negative values, 0 otherwise) to the Q7.24
integer quantity, and then signed right shift by 1.
Note that a simple signed right shift is <em>not</em> equivalent to a signed divide by 2,
because it rounds negative odd values toward negative infinity.
</p>
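
<p>
The difference between a signed divide and a plain signed right shift shows up only
for negative odd values. The Python sketch below emulates C-style division, which
truncates toward zero, and checks that adding the sign bit before shifting gives the
same result. Python's <code>&gt;&gt;</code> on integers is an arithmetic shift; the
function names are hypothetical.
</p>

```python
# Sketch only: hypothetical helper names.

def c_divide(a, b):
    """Integer division that truncates toward zero, like C's / on signed integers."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def q7_24_to_q7_23_divide(x):
    return c_divide(x, 2)

def q7_24_to_q7_23_add_sign_bit(x):
    sign_bit = 1 if x < 0 else 0    # the MSB of the 32-bit two's complement value
    return (x + sign_bit) >> 1      # arithmetic right shift

def q7_24_to_q7_23_shift_only(x):
    return x >> 1                   # rounds toward negative infinity

for x in (-3, -2, -1, 0, 1, 2, 3):
    assert q7_24_to_q7_23_divide(x) == q7_24_to_q7_23_add_sign_bit(x)
print(q7_24_to_q7_23_divide(-3), q7_24_to_q7_23_shift_only(-3))  # -1 -2
```

<p>
In C, right-shifting a negative signed value is implementation-defined, although
common compilers perform an arithmetic shift, so code that relies on the shift form
should confirm that behavior for its toolchain.
</p>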

<h3 id="lossyConversion">Lossy and lossless conversions</h3>

<p>
A conversion is <em>lossless</em> if it is
<a href="https://en.wikipedia.org/wiki/Inverse_function">invertible</a>:
converting a value <code>A</code> to another format, giving <code>B</code>,
and then converting <code>B</code> back to the original format, giving
<code>C</code>, results in <code>A = C</code>.
Otherwise the conversion is <a href="https://en.wikipedia.org/wiki/Lossy_data_conversion">lossy</a>.
</p>

<p>
Lossless conversions permit
<a href="https://en.wikipedia.org/wiki/Round-trip_format_conversion">round-trip format conversion</a>.
</p>

<p>
Conversions to single-precision floating point from fixed-point representations with
25 or fewer significant bits (including the sign bit) are lossless.
Conversions from floating point to any common fixed-point representation are lossy.
</p>
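
<p>
Both claims can be checked numerically. This Python sketch mimics single precision
with the <code>struct</code> module: every Q0.15 value survives a round trip through
float, integers with up to 25 significant bits convert exactly, and a float such as
0.1 does not survive a trip through Q0.15. The helper names are hypothetical, and
the edge-value check for 25 and 26 bits is illustrative rather than exhaustive.
</p>

```python
import struct

# Sketch only: hypothetical helper names.

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def q0_15_round_trip(raw):
    """Q0.15 -> float -> Q0.15."""
    f = to_float32(raw / 32768.0)
    return int(f * 32768.0)

def edges_survive_float32(bits):
    """Check the extreme values of a signed bits-wide integer after int -> float -> int."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return all(int(to_float32(float(v))) == v for v in (lo, lo + 1, hi - 1, hi))

def float_round_trip(x):
    """float -> Q0.15 (truncate, clamp to +1.0 minus one LSB) -> float."""
    raw = max(-32768, min(32767, int(x * 32768.0)))
    return raw / 32768.0

print(all(q0_15_round_trip(r) == r for r in range(-32768, 32768)))  # True: lossless
print(edges_survive_float32(25), edges_survive_float32(26))         # True False
print(float_round_trip(0.1), float_round_trip(1.0))                 # neither equals its input
```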

  </body>
</html>