1 /****************************************************************************** 2 * 3 * Copyright (C) 2014 The Android Open Source Project 4 * Copyright 2003 - 2004 Open Interface North America, Inc. All rights 5 * reserved. 6 * 7 * Licensed under the Apache License, Version 2.0 (the "License"); 8 * you may not use this file except in compliance with the License. 9 * You may obtain a copy of the License at: 10 * 11 * http://www.apache.org/licenses/LICENSE-2.0 12 * 13 * Unless required by applicable law or agreed to in writing, software 14 * distributed under the License is distributed on an "AS IS" BASIS, 15 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 16 * See the License for the specific language governing permissions and 17 * limitations under the License. 18 * 19 ******************************************************************************/ 20 21 /******************************************************************************* 22 $Revision: #1 $ 23 ******************************************************************************/ 24 25 /** @file 26 27 This file, along with synthesis-generated.c, contains the synthesis 28 filterbank routines. The operations performed correspond to the 29 operations described in A2DP Appendix B, Figure 12.3. Several 30 mathematical optimizations are performed, particularly for the 31 8-subband case. 32 33 One important optimization is to note that the "matrixing" operation 34 can be decomposed into the product of a type II discrete cosine kernel 35 and another, sparse matrix. 36 37 According to Fig 12.3, in the 8-subband case, 38 @code 39 N[k][i] = cos((i+0.5)*(k+4)*pi/8), k = 0..15 and i = 0..7 40 @endcode 41 42 N can be factored as R * C2, where C2 is an 8-point type II discrete 43 cosine kernel given by 44 @code 45 C2[k][i] = cos((i+0.5)*k*pi/8)), k = 0..7 and i = 0..7 46 @endcode 47 48 R turns out to be a sparse 16x8 matrix with the following non-zero 49 entries: 50 @code 51 R[k][k+4] = 1, k = 0..3 52 R[k][abs(12-k)] = -1, k = 5..15 53 @endcode 54 55 The spec describes computing V[0..15] as N * R. 56 @code 57 V[0..15] = N * R = (R * C2) * R = R * (C2 * R) 58 @endcode 59 60 C2 * R corresponds to computing the discrete cosine transform of R, so 61 V[0..15] can be computed by taking the DCT of R followed by assignment 62 and selective negation of the DCT result into V. 63 64 Although this was derived empirically using GNU Octave, it is 65 formally demonstrated in, e.g., Liu, Chi-Min and Lee, 66 Wen-Chieh. "A Unified Fast Algorithm for Cosine Modulated 67 Filter Banks in Current Audio Coding Standards." Journal of 68 the AES 47 (December 1999): 1061. 69 70 Given the shift operation performed prior to computing V[0..15], it is 71 clear that V[0..159] represents a rolling history of the 10 most 72 recent groups of blocks input to the synthesis operation. Interpreting 73 the matrix N in light of its factorization into C2 and R, R's 74 sparseness has implications for interpreting the values in V. In 75 particular, there is considerable redundancy in the values stored in 76 V. Furthermore, since R[4][0..7] are all zeros, one out of every 16 77 values in V will be zero regardless of the input data. Within each 78 block of 16 values in V, fully half of them are redundant or 79 irrelevant: 80 81 @code 82 V[ 0] = DCT[4] 83 V[ 1] = DCT[5] 84 V[ 2] = DCT[6] 85 V[ 3] = DCT[7] 86 V[ 4] = 0 87 V[ 5] = -DCT[7] = -V[3] (redundant) 88 V[ 6] = -DCT[6] = -V[2] (redundant) 89 V[ 7] = -DCT[5] = -V[1] (redundant) 90 V[ 8] = -DCT[4] = -V[0] (redundant) 91 V[ 9] = -DCT[3] 92 V[10] = -DCT[2] 93 V[11] = -DCT[1] 94 V[12] = -DCT[0] 95 V[13] = -DCT[1] = V[11] (redundant) 96 V[14] = -DCT[2] = V[10] (redundant) 97 V[15] = -DCT[3] = V[ 9] (redundant) 98 @endcode 99 100 Since the elements of V beyond 15 were originally computed the same 101 way during a previous run, what holds true for V[x] also holds true 102 for V[x+16]. Thus, so long as care is taken to maintain the mapping, 103 we need only actually store the unique values, which correspond to the 104 output of the DCT, in some cases inverted. In fact, instead of storing 105 V[0..159], we could store DCT[0..79] which would contain a history of 106 DCT results. More on this in a bit. 107 108 Going back to figure 12.3 in the spec, it should be clear that the 109 vector U need not actually be explicitly constructed, but that with 110 suitable indexing into V during the window operation, the same end can 111 be accomplished. In the same spirit of the pseudocode shown in the 112 figure, the following is the construction of W without using U: 113 114 @code 115 for i=0 to 79 do 116 W[i] = D[i]*VSIGN(i)*V[remap_V(i)] where remap_V(i) = 32*(int(i/16)) + 117 (i % 16) + (i % 16 >= 8 ? 16 : 0) 118 and VSIGN(i) maps i%16 into {1, 1, 119 1, 1, 0, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1 } 120 These values correspond to the 121 signs of the redundant values as 122 shown in the explanation three 123 paragraphs above. 124 @endcode 125 126 We saw above how V[4..8,13..15] (and by extension 127 V[(4..8,13..15)+16*n]) can be defined in terms of other elements 128 within the subblock of V. V[0..3,9..12] correspond to DCT elements. 129 130 @code 131 for i=0 to 79 do 132 W[i] = D[i]*DSIGN(i)*DCT[remap_DCT(i)] 133 @endcode 134 135 The DCT is calculated using the Arai-Agui-Nakajima factorization, 136 which saves some computation by producing output that needs to be 137 multiplied by scaling factors before being used. 138 139 @code 140 for i=0 to 79 do 141 W[i] = D[i]*SCALE[i%8]*AAN_DCT[remap_DCT(i)] 142 @endcode 143 144 D can be premultiplied with the DCT scaling factors to yield 145 146 @code 147 for i=0 to 79 do 148 W[i] = DSCALED[i]*AAN_DCT[remap_DCT(i)] where DSCALED[i] = 149 D[i]*SCALE[i%8] 150 @endcode 151 152 The output samples X[0..7] are defined as sums of W: 153 154 @code 155 X[j] = sum{i=0..9}(W[j+8*i]) 156 @endcode 157 158 @ingroup codec_internal 159 */ 160 161 /** 162 @addtogroup codec_internal 163 @{ 164 */ 165 166 #include "oi_codec_sbc_private.h" 167 168 const int32_t dec_window_4[21] = { 169 0, /* +0.00000000E+00 */ 170 97, /* +5.36548976E-04 */ 171 270, /* +1.49188357E-03 */ 172 495, /* +2.73370904E-03 */ 173 694, /* +3.83720193E-03 */ 174 704, /* +3.89205149E-03 */ 175 338, /* +1.86581691E-03 */ 176 -554, /* -3.06012286E-03 */ 177 1974, /* +1.09137620E-02 */ 178 3697, /* +2.04385087E-02 */ 179 5224, /* +2.88757392E-02 */ 180 5824, /* +3.21939290E-02 */ 181 4681, /* +2.58767811E-02 */ 182 1109, /* +6.13245186E-03 */ 183 -5214, /* -2.88217274E-02 */ 184 -14047, /* -7.76463494E-02 */ 185 24529, /* +1.35593274E-01 */ 186 35274, /* +1.94987841E-01 */ 187 44618, /* +2.46636662E-01 */ 188 50984, /* +2.81828203E-01 */ 189 53243, /* +2.94315332E-01 */ 190 }; 191 192 #define DCTII_4_K06_FIX (11585) /* S1.14 11585 0.707107*/ 193 194 #define DCTII_4_K08_FIX (21407) /* S1.14 21407 1.306563*/ 195 196 #define DCTII_4_K09_FIX (-15137) /* S1.14 -15137 -0.923880*/ 197 198 #define DCTII_4_K10_FIX (-8867) /* S1.14 -8867 -0.541196*/ 199 200 /** Scales x by y bits to the right, adding a rounding factor. 201 */ 202 #ifndef SCALE 203 #define SCALE(x, y) (((x) + (1 << ((y)-1))) >> (y)) 204 #endif 205 206 #ifndef CLIP_INT16 207 #define CLIP_INT16(x) \ 208 do { \ 209 if ((x) > OI_INT16_MAX) { \ 210 (x) = OI_INT16_MAX; \ 211 } else if ((x) < OI_INT16_MIN) { \ 212 (x) = OI_INT16_MIN; \ 213 } \ 214 } while (0) 215 #endif 216 217 /** 218 * Default C language implementation of a 16x32->32 multiply. This function may 219 * be replaced by a platform-specific version for speed. 220 * 221 * @param u A signed 16-bit multiplicand 222 * @param v A signed 32-bit multiplier 223 224 * @return A signed 32-bit value corresponding to the 32 most significant bits 225 * of the 48-bit product of u and v. 226 */ 227 INLINE int32_t default_mul_16s_32s_hi(int16_t u, int32_t v) { 228 uint16_t v0; 229 int16_t v1; 230 231 int32_t w, x; 232 233 v0 = (uint16_t)(v & 0xffff); 234 v1 = (int16_t)(v >> 16); 235 236 w = v1 * u; 237 x = u * v0; 238 239 return w + (x >> 16); 240 } 241 242 #define MUL_16S_32S_HI(_x, _y) default_mul_16s_32s_hi(_x, _y) 243 244 #define LONG_MULT_DCT(K, sample) (MUL_16S_32S_HI(K, sample) << 2) 245 246 PRIVATE void SynthWindow80_generated(int16_t* pcm, 247 SBC_BUFFER_T const* RESTRICT buffer, 248 OI_UINT strideShift); 249 PRIVATE void SynthWindow112_generated(int16_t* pcm, 250 SBC_BUFFER_T const* RESTRICT buffer, 251 OI_UINT strideShift); 252 PRIVATE void dct2_8(SBC_BUFFER_T* RESTRICT out, int32_t const* RESTRICT x); 253 254 typedef void (*SYNTH_FRAME)(OI_CODEC_SBC_DECODER_CONTEXT* context, int16_t* pcm, 255 OI_UINT blkstart, OI_UINT blkcount); 256 257 #ifndef COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS 258 #define COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(dest, src) \ 259 do { \ 260 shift_buffer(dest, src, 72); \ 261 } while (0) 262 #endif 263 264 #ifndef DCT2_8 265 #define DCT2_8(dst, src) dct2_8(dst, src) 266 #endif 267 268 #ifndef SYNTH80 269 #define SYNTH80 SynthWindow80_generated 270 #endif 271 272 #ifndef SYNTH112 273 #define SYNTH112 SynthWindow112_generated 274 #endif 275 276 PRIVATE void OI_SBC_SynthFrame_80(OI_CODEC_SBC_DECODER_CONTEXT* context, 277 int16_t* pcm, OI_UINT blkstart, 278 OI_UINT blkcount) { 279 OI_UINT blk; 280 OI_UINT ch; 281 OI_UINT nrof_channels = context->common.frameInfo.nrof_channels; 282 OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1; 283 OI_UINT offset = context->common.filterBufferOffset; 284 int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart; 285 OI_UINT blkstop = blkstart + blkcount; 286 287 for (blk = blkstart; blk < blkstop; blk++) { 288 if (offset == 0) { 289 COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS( 290 context->common.filterBuffer[0] + context->common.filterBufferLen - 291 72, 292 context->common.filterBuffer[0]); 293 if (nrof_channels == 2) { 294 COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS( 295 context->common.filterBuffer[1] + context->common.filterBufferLen - 296 72, 297 context->common.filterBuffer[1]); 298 } 299 offset = context->common.filterBufferLen - 80; 300 } else { 301 offset -= 1 * 8; 302 } 303 304 for (ch = 0; ch < nrof_channels; ch++) { 305 DCT2_8(context->common.filterBuffer[ch] + offset, s); 306 SYNTH80(pcm + ch, context->common.filterBuffer[ch] + offset, 307 pcmStrideShift); 308 s += 8; 309 } 310 pcm += (8 << pcmStrideShift); 311 } 312 context->common.filterBufferOffset = offset; 313 } 314 315 PRIVATE void OI_SBC_SynthFrame_4SB(OI_CODEC_SBC_DECODER_CONTEXT* context, 316 int16_t* pcm, OI_UINT blkstart, 317 OI_UINT blkcount) { 318 OI_UINT blk; 319 OI_UINT ch; 320 OI_UINT nrof_channels = context->common.frameInfo.nrof_channels; 321 OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1; 322 OI_UINT offset = context->common.filterBufferOffset; 323 int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart; 324 OI_UINT blkstop = blkstart + blkcount; 325 326 for (blk = blkstart; blk < blkstop; blk++) { 327 if (offset == 0) { 328 COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS( 329 context->common.filterBuffer[0] + context->common.filterBufferLen - 330 72, 331 context->common.filterBuffer[0]); 332 if (nrof_channels == 2) { 333 COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS( 334 context->common.filterBuffer[1] + context->common.filterBufferLen - 335 72, 336 context->common.filterBuffer[1]); 337 } 338 offset = context->common.filterBufferLen - 80; 339 } else { 340 offset -= 8; 341 } 342 for (ch = 0; ch < nrof_channels; ch++) { 343 cosineModulateSynth4(context->common.filterBuffer[ch] + offset, s); 344 SynthWindow40_int32_int32_symmetry_with_sum( 345 pcm + ch, context->common.filterBuffer[ch] + offset, pcmStrideShift); 346 s += 4; 347 } 348 pcm += (4 << pcmStrideShift); 349 } 350 context->common.filterBufferOffset = offset; 351 } 352 353 #ifdef SBC_ENHANCED 354 355 PRIVATE void OI_SBC_SynthFrame_Enhanced(OI_CODEC_SBC_DECODER_CONTEXT* context, 356 int16_t* pcm, OI_UINT blkstart, 357 OI_UINT blkcount) { 358 OI_UINT blk; 359 OI_UINT ch; 360 OI_UINT nrof_channels = context->common.frameInfo.nrof_channels; 361 OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1; 362 OI_UINT offset = context->common.filterBufferOffset; 363 int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart; 364 OI_UINT blkstop = blkstart + blkcount; 365 366 for (blk = blkstart; blk < blkstop; blk++) { 367 if (offset == 0) { 368 COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS( 369 context->common.filterBuffer[0] + context->common.filterBufferLen - 370 104, 371 context->common.filterBuffer[0]); 372 if (nrof_channels == 2) { 373 COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS( 374 context->common.filterBuffer[1] + context->common.filterBufferLen - 375 104, 376 context->common.filterBuffer[1]); 377 } 378 offset = context->common.filterBufferLen - 112; 379 } else { 380 offset -= 8; 381 } 382 for (ch = 0; ch < nrof_channels; ++ch) { 383 DCT2_8(context->common.filterBuffer[ch] + offset, s); 384 SYNTH112(pcm + ch, context->common.filterBuffer[ch] + offset, 385 pcmStrideShift); 386 s += 8; 387 } 388 pcm += (8 << pcmStrideShift); 389 } 390 context->common.filterBufferOffset = offset; 391 } 392 393 static const SYNTH_FRAME SynthFrameEnhanced[] = { 394 NULL, /* invalid */ 395 OI_SBC_SynthFrame_Enhanced, /* mono */ 396 OI_SBC_SynthFrame_Enhanced /* stereo */ 397 }; 398 399 #endif 400 401 static const SYNTH_FRAME SynthFrame8SB[] = { 402 NULL, /* invalid */ 403 OI_SBC_SynthFrame_80, /* mono */ 404 OI_SBC_SynthFrame_80 /* stereo */ 405 }; 406 407 static const SYNTH_FRAME SynthFrame4SB[] = { 408 NULL, /* invalid */ 409 OI_SBC_SynthFrame_4SB, /* mono */ 410 OI_SBC_SynthFrame_4SB /* stereo */ 411 }; 412 413 PRIVATE void OI_SBC_SynthFrame(OI_CODEC_SBC_DECODER_CONTEXT* context, 414 int16_t* pcm, OI_UINT start_block, 415 OI_UINT nrof_blocks) { 416 OI_UINT nrof_subbands = context->common.frameInfo.nrof_subbands; 417 OI_UINT nrof_channels = context->common.frameInfo.nrof_channels; 418 419 OI_ASSERT(nrof_subbands == 4 || nrof_subbands == 8); 420 if (nrof_subbands == 4) { 421 SynthFrame4SB[nrof_channels](context, pcm, start_block, nrof_blocks); 422 #ifdef SBC_ENHANCED 423 } else if (context->common.frameInfo.enhanced) { 424 SynthFrameEnhanced[nrof_channels](context, pcm, start_block, nrof_blocks); 425 #endif /* SBC_ENHANCED */ 426 } else { 427 SynthFrame8SB[nrof_channels](context, pcm, start_block, nrof_blocks); 428 } 429 } 430 431 void SynthWindow40_int32_int32_symmetry_with_sum(int16_t* pcm, 432 SBC_BUFFER_T buffer[80], 433 OI_UINT strideShift) { 434 int32_t pa; 435 int32_t pb; 436 437 /* These values should be zero, since out[2] of the 4-band cosine modulation 438 * is always zero. */ 439 OI_ASSERT(buffer[2] == 0); 440 OI_ASSERT(buffer[10] == 0); 441 OI_ASSERT(buffer[18] == 0); 442 OI_ASSERT(buffer[26] == 0); 443 OI_ASSERT(buffer[34] == 0); 444 OI_ASSERT(buffer[42] == 0); 445 OI_ASSERT(buffer[50] == 0); 446 OI_ASSERT(buffer[58] == 0); 447 OI_ASSERT(buffer[66] == 0); 448 OI_ASSERT(buffer[74] == 0); 449 450 pa = dec_window_4[4] * (buffer[12] + buffer[76]); 451 pa += dec_window_4[8] * (buffer[16] - buffer[64]); 452 pa += dec_window_4[12] * (buffer[28] + buffer[60]); 453 pa += dec_window_4[16] * (buffer[32] - buffer[48]); 454 pa += dec_window_4[20] * buffer[44]; 455 pa = SCALE(-pa, 15); 456 CLIP_INT16(pa); 457 pcm[0 << strideShift] = (int16_t)pa; 458 459 pa = dec_window_4[1] * buffer[1]; 460 pb = dec_window_4[1] * buffer[79]; 461 pb += dec_window_4[3] * buffer[3]; 462 pa += dec_window_4[3] * buffer[77]; 463 pa += dec_window_4[5] * buffer[13]; 464 pb += dec_window_4[5] * buffer[67]; 465 pb += dec_window_4[7] * buffer[15]; 466 pa += dec_window_4[7] * buffer[65]; 467 pa += dec_window_4[9] * buffer[17]; 468 pb += dec_window_4[9] * buffer[63]; 469 pb += dec_window_4[11] * buffer[19]; 470 pa += dec_window_4[11] * buffer[61]; 471 pa += dec_window_4[13] * buffer[29]; 472 pb += dec_window_4[13] * buffer[51]; 473 pb += dec_window_4[15] * buffer[31]; 474 pa += dec_window_4[15] * buffer[49]; 475 pa += dec_window_4[17] * buffer[33]; 476 pb += dec_window_4[17] * buffer[47]; 477 pb += dec_window_4[19] * buffer[35]; 478 pa += dec_window_4[19] * buffer[45]; 479 pa = SCALE(-pa, 15); 480 CLIP_INT16(pa); 481 pcm[1 << strideShift] = (int16_t)(pa); 482 pb = SCALE(-pb, 15); 483 CLIP_INT16(pb); 484 pcm[3 << strideShift] = (int16_t)(pb); 485 486 pa = dec_window_4[2] * 487 (/*buffer[ 2] + */ buffer[78]); /* buffer[ 2] is always zero */ 488 pa += dec_window_4[6] * 489 (buffer[14] /* + buffer[66]*/); /* buffer[66] is always zero */ 490 pa += dec_window_4[10] * 491 (/*buffer[18] + */ buffer[62]); /* buffer[18] is always zero */ 492 pa += dec_window_4[14] * 493 (buffer[30] /* + buffer[50]*/); /* buffer[50] is always zero */ 494 pa += dec_window_4[18] * 495 (/*buffer[34] + */ buffer[46]); /* buffer[34] is always zero */ 496 pa = SCALE(-pa, 15); 497 CLIP_INT16(pa); 498 pcm[2 << strideShift] = (int16_t)(pa); 499 } 500 501 /** 502 This routine implements the cosine modulation matrix for 4-subband 503 synthesis. This is called "matrixing" in the SBC specification. This 504 matrix, M4, can be factored into an 8-point Type II Discrete Cosine 505 Transform, DCTII_4 and a matrix S4, given here: 506 507 @code 508 __ __ 509 | 0 0 1 0 | 510 | 0 0 0 1 | 511 | 0 0 0 0 | 512 | 0 0 0 -1 | 513 S4 = | 0 0 -1 0 | 514 | 0 -1 0 0 | 515 | -1 0 0 0 | 516 |__ 0 -1 0 0 __| 517 518 M4 * in = S4 * (DCTII_4 * in) 519 @endcode 520 521 (DCTII_4 * in) is computed using a Fast Cosine Transform. The algorithm 522 here is based on an implementation computed by the SPIRAL computer 523 algebra system, manually converted to fixed-point arithmetic. S4 can be 524 implemented using only assignment and negation. 525 */ 526 PRIVATE void cosineModulateSynth4(SBC_BUFFER_T* RESTRICT out, 527 int32_t const* RESTRICT in) { 528 int32_t f0, f1, f2, f3, f4, f7, f8, f9, f10; 529 int32_t y0, y1, y2, y3; 530 531 f0 = (in[0] - in[3]); 532 f1 = (in[0] + in[3]); 533 f2 = (in[1] - in[2]); 534 f3 = (in[1] + in[2]); 535 536 f4 = f1 - f3; 537 538 y0 = -SCALE(f1 + f3, DCT_SHIFT); 539 y2 = -SCALE(LONG_MULT_DCT(DCTII_4_K06_FIX, f4), DCT_SHIFT); 540 f7 = f0 + f2; 541 f8 = LONG_MULT_DCT(DCTII_4_K08_FIX, f0); 542 f9 = LONG_MULT_DCT(DCTII_4_K09_FIX, f7); 543 f10 = LONG_MULT_DCT(DCTII_4_K10_FIX, f2); 544 y3 = -SCALE(f8 + f9, DCT_SHIFT); 545 y1 = -SCALE(f10 - f9, DCT_SHIFT); 546 547 out[0] = (int16_t)-y2; 548 out[1] = (int16_t)-y3; 549 out[2] = (int16_t)0; 550 out[3] = (int16_t)y3; 551 out[4] = (int16_t)y2; 552 out[5] = (int16_t)y1; 553 out[6] = (int16_t)y0; 554 out[7] = (int16_t)y1; 555 } 556 557 /** 558 @} 559 */ 560