<html devsite>
<head>
<title>Data Formats</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->



<p>
Android uses a wide variety of audio
<a href="http://en.wikipedia.org/wiki/Data_format">data formats</a>
internally, and exposes a subset of these in public APIs,
<a href="http://en.wikipedia.org/wiki/Audio_file_format">file formats</a>,
and the
<a href="https://en.wikipedia.org/wiki/Hardware_abstraction">Hardware Abstraction Layer</a> (HAL).
</p>

<h2 id="properties">Properties</h2>

<p>
The audio data formats are classified by their properties:
</p>

<dl>

<dt><a href="https://en.wikipedia.org/wiki/Data_compression">Compression</a></dt>
<dd>
<a href="http://en.wikipedia.org/wiki/Raw_data">Uncompressed</a>,
<a href="http://en.wikipedia.org/wiki/Lossless_compression">lossless compressed</a>, or
<a href="http://en.wikipedia.org/wiki/Lossy_compression">lossy compressed</a>.
PCM is the most common uncompressed audio format. FLAC is a lossless compressed
format, while MP3 and AAC are lossy compressed formats.
</dd>

<dt><a href="http://en.wikipedia.org/wiki/Audio_bit_depth">Bit depth</a></dt>
<dd>
Number of significant bits per audio sample.
</dd>

<dt><a href="https://en.wikipedia.org/wiki/Sizeof">Container size</a></dt>
<dd>
Number of bits used to store or transmit a sample. Usually
this is the same as the bit depth, but sometimes additional
padding bits are allocated for alignment. For example, a
24-bit sample could be contained within a 32-bit word.
</dd>

<dt><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">Alignment</a></dt>
<dd>
If the container size is exactly equal to the bit depth, the
representation is called <em>packed</em>. Otherwise the representation is
<em>unpacked</em>. The significant bits of the sample are typically
aligned with either the leftmost (most significant) or rightmost
(least significant) bit of the container. It is conventional to use
the terms <em>packed</em> and <em>unpacked</em> only when the bit
depth is not a
<a href="http://en.wikipedia.org/wiki/Power_of_two">power of two</a>.
</dd>

<dt><a href="http://en.wikipedia.org/wiki/Signedness">Signedness</a></dt>
<dd>
Whether samples are signed or unsigned.
</dd>

<dt>Representation</dt>
<dd>
Either fixed point or floating point; see below.
</dd>

</dl>

<h2 id="fixed">Fixed point representation</h2>

<p>
<a href="http://en.wikipedia.org/wiki/Fixed-point_arithmetic">Fixed point</a>
is the most common representation for uncompressed PCM audio data,
especially at hardware interfaces.
</p>

<p>
A fixed-point number has a fixed (constant) number of digits
before and after the <a href="https://en.wikipedia.org/wiki/Radix_point">radix point</a>.
All of our representations use
<a href="https://en.wikipedia.org/wiki/Binary_number">base 2</a>,
so we substitute <em>bit</em> for <em>digit</em>,
and <em>binary point</em> or simply <em>point</em> for <em>radix point</em>.
The bits to the left of the point are the integer part,
and the bits to the right of the point are the
<a href="https://en.wikipedia.org/wiki/Fractional_part">fractional part</a>.
</p>

<p>
We speak of <em>integer PCM</em>, because fixed-point values
are usually stored and manipulated as integer values.
The interpretation as fixed-point is implicit.
</p>

<p>
We use <a href="https://en.wikipedia.org/wiki/Two%27s_complement">two's complement</a>
for all signed fixed-point representations,
so the following holds where all values are in units of one
<a href="https://en.wikipedia.org/wiki/Least_significant_bit">LSB</a>:
</p>
<pre class="devsite-click-to-copy">
|largest negative value| = |largest positive value| + 1
</pre>

<h3 id="q">Q and U notation</h3>

<p>
There are various
<a href="https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Notation">notations</a>
for fixed-point representation in an integer.
We use <a href="https://en.wikipedia.org/wiki/Q_(number_format)">Q notation</a>:
Q<em>m</em>.<em>n</em> means <em>m</em> integer bits and <em>n</em> fractional bits.
The "Q" counts as one bit, the sign bit, and the value is expressed in two's complement.
The total number of bits is <em>m</em> + <em>n</em> + 1.
</p>

<p>
U<em>m</em>.<em>n</em> is for unsigned numbers:
<em>m</em> integer bits and <em>n</em> fractional bits,
and the "U" counts as zero bits.
The total number of bits is <em>m</em> + <em>n</em>.
</p>

<p>
The integer part may be used in the final result, or be temporary.
In the latter case, the bits that make up the integer part are called
<em>guard bits</em>. The guard bits permit an intermediate calculation to overflow,
as long as the final value is within range or can be clamped to be within range.
Note that fixed-point guard bits are at the left, while floating-point unit
<a href="https://en.wikipedia.org/wiki/Guard_digit">guard digits</a>
are used to reduce roundoff error and are on the right.
</p>

<h2 id="floating">Floating point representation</h2>

<p>
<a href="https://en.wikipedia.org/wiki/Floating_point">Floating point</a>
is an alternative to fixed point, in which the location of the point can vary.
The primary advantages of floating point include:
</p>

<ul>
<li>Greater <a href="https://en.wikipedia.org/wiki/Headroom_(audio_signal_processing)">headroom</a>
and <a href="https://en.wikipedia.org/wiki/Dynamic_range">dynamic range</a>;
floating-point arithmetic tolerates exceeding nominal ranges
during intermediate computation, and only clamps values at the end
</li>
<li>Support for special values such as infinities and NaN</li>
<li>Easier to use in many cases</li>
</ul>

<p>
Historically, floating-point arithmetic was slower than integer or fixed-point
arithmetic, but now it is common for floating-point to be faster,
provided control flow decisions aren't based on the value of a computation.
</p>

<h2 id="androidFormats">Android formats for audio</h2>

<p>
The major Android formats for audio are listed in the table below:
</p>

<table>

<tr>
<th></th>
<th colspan="5"><center>Notation</center></th>
</tr>

<tr>
<th>Property</th>
<th>Q0.15</th>
<th>Q0.7 <sup>1</sup></th>
<th>Q0.23</th>
<th>Q0.31</th>
<th>float</th>
</tr>

<tr>
<td>Container<br />bits</td>
<td>16</td>
<td>8</td>
<td>24 or 32 <sup>2</sup></td>
<td>32</td>
<td>32</td>
</tr>

<tr>
<td>Significant bits<br />including sign</td>
<td>16</td>
<td>8</td>
<td>24</td>
<td>24 or 32 <sup>2</sup></td>
<td>25 <sup>3</sup></td>
</tr>

<tr>
<td>Headroom<br />in dB</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>126 <sup>4</sup></td>
</tr>

<tr>
<td>Dynamic range<br />in dB</td>
<td>90</td>
<td>42</td>
<td>138</td>
<td>138 to 186</td>
<td>900 <sup>5</sup></td>
</tr>

</table>

<p>
All fixed-point formats above have a nominal range of -1.0 to +1.0 minus one LSB.
There is one more negative value than positive value due to the
two's complement representation.
</p>

<p>
Footnotes:
</p>

<ol>

<li>
All formats above express signed sample values.
The 8-bit format is commonly called "unsigned", but
it is actually a signed value with bias of <code>0.10000000</code>.
</li>

<li>
Q0.23 may be packed into 24 bits (three 8-bit bytes), or unpacked
in 32 bits. If unpacked, the significant bits are either right-justified
towards the LSB with sign extension padding towards the MSB (Q8.23),
or left-justified towards the MSB with zero fill towards the LSB
(Q0.31). Q0.31 theoretically permits up to 32 significant bits,
but hardware interfaces that accept Q0.31 rarely use all the bits.
</li>

<li>
Single-precision floating point has 23 explicit bits plus one hidden bit and one sign bit,
resulting in 25 significant bits total.
<a href="https://en.wikipedia.org/wiki/Denormal_number">Denormal numbers</a>
have fewer significant bits.
</li>

<li>
Single-precision floating point can express values up to approximately ±3.4e+38,
which explains the large headroom.
</li>

<li>
The dynamic range shown is for denormals up to the nominal maximum
value ±1.0.
Note that some architecture-specific floating point implementations such as
<a href="https://en.wikipedia.org/wiki/ARM_architecture#NEON">NEON</a>
don't support denormals.
</li>

</ol>

<h2 id="conversions">Conversions</h2>

<p>
This section discusses
<a href="https://en.wikipedia.org/wiki/Data_conversion">data conversions</a>
between various representations.
</p>

<h3 id="floatConversions">Floating point conversions</h3>

<p>
To convert a value from Q<em>m</em>.<em>n</em> format to floating point:
</p>

<ol>
<li>Convert the value to floating point as if it were an integer (by ignoring the point).</li>
<li>Multiply by 2<sup>-<em>n</em></sup>.</li>
</ol>

<p>
For example, to convert a Q4.27 internal value to floating point, use:
</p>
<pre class="devsite-click-to-copy">
float = integer * (2 ^ -27)
</pre>

<p>
Conversions from floating point to fixed point follow these rules:
</p>

<ul>

<li>
Single-precision floating point has a nominal range of ±1.0,
but the full range for intermediate values is approximately ±3.4e+38.
Conversion between floating point and fixed point for external representation
(such as output to audio devices) will consider only the nominal range, with
clamping for values that exceed that range.
In particular, when +1.0 is converted
to a fixed-point format, it is clamped to +1.0 minus one LSB.
</li>

<li>
Denormals (subnormals) and both +/- 0.0 are allowed in representation,
but may be silently converted to 0.0 during processing.
</li>

<li>
Infinities will either pass through operations or will be silently hard-limited
to +/- 1.0. Generally the latter is for conversion to a fixed-point format.
</li>

<li>
NaN behavior is undefined: a NaN may propagate as an identical NaN,
be converted to a Default NaN, be silently hard-limited to +/- 1.0,
be silently converted to 0.0, or result in an error.
</li>

</ul>

<h3 id="fixedConversion">Fixed point conversions</h3>

<p>
Conversions between different Q<em>m</em>.<em>n</em> formats follow these rules:
</p>

<ul>

<li>
When <em>m</em> is increased, sign extend the integer part at left.
</li>

<li>
When <em>m</em> is decreased, clamp the integer part.
</li>

<li>
When <em>n</em> is increased, zero extend the fractional part at right.
</li>

<li>
When <em>n</em> is decreased, either dither, round, or truncate the excess fractional bits at right.
</li>

</ul>

<p>
For example, to convert a Q4.27 value to Q0.15 (without dither or
rounding), right shift the Q4.27 value by 12 bits, and clamp any results
that exceed the 16-bit signed range. This aligns the point of the
Q representation.
</p>

<p>To convert Q7.24 to Q7.23, do a signed divide by 2,
or equivalently add the sign bit to the Q7.24 integer quantity, and then signed right shift by 1.
Note that a simple signed right shift is <em>not</em> equivalent to a signed divide by 2:
for negative odd values, an arithmetic right shift rounds toward negative infinity,
whereas a signed divide rounds toward zero.
</p>

<h3 id="lossyConversion">Lossy and lossless conversions</h3>

<p>
A conversion is <em>lossless</em> if it is
<a href="https://en.wikipedia.org/wiki/Inverse_function">invertible</a>:
a conversion from <code>A</code> to <code>B</code> to
<code>C</code> results in <code>A = C</code>.
Otherwise the conversion is <a href="https://en.wikipedia.org/wiki/Lossy_data_conversion">lossy</a>.
</p>

<p>
Lossless conversions permit
<a href="https://en.wikipedia.org/wiki/Round-trip_format_conversion">round-trip format conversion</a>.
</p>

<p>
Conversions from fixed point representation with 25 or fewer significant bits to floating point are lossless.
Conversions from floating point to any common fixed point representation are lossy.
</p>

</body>
</html>