Home | History | Annotate | Download | only in srce
      1 /******************************************************************************
      2  *
      3  *  Copyright (C) 2014 The Android Open Source Project
      4  *  Copyright 2003 - 2004 Open Interface North America, Inc. All rights
      5  *                        reserved.
      6  *
      7  *  Licensed under the Apache License, Version 2.0 (the "License");
      8  *  you may not use this file except in compliance with the License.
      9  *  You may obtain a copy of the License at:
     10  *
     11  *  http://www.apache.org/licenses/LICENSE-2.0
     12  *
     13  *  Unless required by applicable law or agreed to in writing, software
     14  *  distributed under the License is distributed on an "AS IS" BASIS,
     15  *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     16  *  See the License for the specific language governing permissions and
     17  *  limitations under the License.
     18  *
     19  ******************************************************************************/
     20 
     21 /*******************************************************************************
     22   $Revision: #1 $
     23  ******************************************************************************/
     24 
     25 /** @file
     26 
     27 This file, along with synthesis-generated.c, contains the synthesis
     28 filterbank routines. The operations performed correspond to the
     29 operations described in A2DP Appendix B, Figure 12.3. Several
     30 mathematical optimizations are performed, particularly for the
     31 8-subband case.
     32 
     33 One important optimization is to note that the "matrixing" operation
     34 can be decomposed into the product of a type II discrete cosine kernel
     35 and another, sparse matrix.
     36 
     37 According to Fig 12.3, in the 8-subband case,
     38 @code
     39     N[k][i] = cos((i+0.5)*(k+4)*pi/8), k = 0..15 and i = 0..7
     40 @endcode
     41 
     42 N can be factored as R * C2, where C2 is an 8-point type II discrete
     43 cosine kernel given by
     44 @code
     45     C2[k][i] = cos((i+0.5)*k*pi/8)), k = 0..7 and i = 0..7
     46 @endcode
     47 
     48 R turns out to be a sparse 16x8 matrix with the following non-zero
     49 entries:
     50 @code
     51     R[k][k+4]        =  1,   k = 0..3
     52     R[k][abs(12-k)]  = -1,   k = 5..15
     53 @endcode
     54 
     55 The spec describes computing V[0..15] as N * R.
     56 @code
     57     V[0..15] = N * R = (R * C2) * R = R * (C2 * R)
     58 @endcode
     59 
     60 C2 * R corresponds to computing the discrete cosine transform of R, so
     61 V[0..15] can be computed by taking the DCT of R followed by assignment
     62 and selective negation of the DCT result into V.
     63 
     64         Although this was derived empirically using GNU Octave, it is
     65         formally demonstrated in, e.g., Liu, Chi-Min and Lee,
     66         Wen-Chieh. "A Unified Fast Algorithm for Cosine Modulated
     67         Filter Banks in Current Audio Coding Standards." Journal of
     68         the AES 47 (December 1999): 1061.
     69 
     70 Given the shift operation performed prior to computing V[0..15], it is
     71 clear that V[0..159] represents a rolling history of the 10 most
     72 recent groups of blocks input to the synthesis operation. Interpreting
     73 the matrix N in light of its factorization into C2 and R, R's
     74 sparseness has implications for interpreting the values in V. In
     75 particular, there is considerable redundancy in the values stored in
     76 V. Furthermore, since R[4][0..7] are all zeros, one out of every 16
     77 values in V will be zero regardless of the input data. Within each
     78 block of 16 values in V, fully half of them are redundant or
     79 irrelevant:
     80 
     81 @code
     82     V[ 0] =  DCT[4]
     83     V[ 1] =  DCT[5]
     84     V[ 2] =  DCT[6]
     85     V[ 3] =  DCT[7]
     86     V[ 4] = 0
     87     V[ 5] = -DCT[7] = -V[3] (redundant)
     88     V[ 6] = -DCT[6] = -V[2] (redundant)
     89     V[ 7] = -DCT[5] = -V[1] (redundant)
     90     V[ 8] = -DCT[4] = -V[0] (redundant)
     91     V[ 9] = -DCT[3]
     92     V[10] = -DCT[2]
     93     V[11] = -DCT[1]
     94     V[12] = -DCT[0]
     95     V[13] = -DCT[1] = V[11] (redundant)
     96     V[14] = -DCT[2] = V[10] (redundant)
     97     V[15] = -DCT[3] = V[ 9] (redundant)
     98 @endcode
     99 
    100 Since the elements of V beyond 15 were originally computed the same
    101 way during a previous run, what holds true for V[x] also holds true
    102 for V[x+16]. Thus, so long as care is taken to maintain the mapping,
    103 we need only actually store the unique values, which correspond to the
    104 output of the DCT, in some cases inverted. In fact, instead of storing
    105 V[0..159], we could store DCT[0..79] which would contain a history of
    106 DCT results. More on this in a bit.
    107 
    108 Going back to figure 12.3 in the spec, it should be clear that the
    109 vector U need not actually be explicitly constructed, but that with
    110 suitable indexing into V during the window operation, the same end can
    111 be accomplished. In the same spirit of the pseudocode shown in the
    112 figure, the following is the construction of W without using U:
    113 
    114 @code
    115     for i=0 to 79 do
    116         W[i] = D[i]*VSIGN(i)*V[remap_V(i)] where remap_V(i) = 32*(int(i/16)) +
    117 (i % 16) + (i % 16 >= 8 ? 16 : 0)
    118                                              and VSIGN(i) maps i%16 into {1, 1,
    119 1, 1, 0, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1 }
    120                                              These values correspond to the
    121                                              signs of the redundant values as
    122                                              shown in the explanation three
    123                                              paragraphs above.
    124 @endcode
    125 
    126 We saw above how V[4..8,13..15] (and by extension
    127 V[(4..8,13..15)+16*n]) can be defined in terms of other elements
    128 within the subblock of V. V[0..3,9..12] correspond to DCT elements.
    129 
    130 @code
    131     for i=0 to 79 do
    132         W[i] = D[i]*DSIGN(i)*DCT[remap_DCT(i)]
    133 @endcode
    134 
    135 The DCT is calculated using the Arai-Agui-Nakajima factorization,
    136 which saves some computation by producing output that needs to be
    137 multiplied by scaling factors before being used.
    138 
    139 @code
    140     for i=0 to 79 do
    141         W[i] = D[i]*SCALE[i%8]*AAN_DCT[remap_DCT(i)]
    142 @endcode
    143 
    144 D can be premultiplied with the DCT scaling factors to yield
    145 
    146 @code
    147     for i=0 to 79 do
    148         W[i] = DSCALED[i]*AAN_DCT[remap_DCT(i)] where DSCALED[i] =
    149 D[i]*SCALE[i%8]
    150 @endcode
    151 
    152 The output samples X[0..7] are defined as sums of W:
    153 
    154 @code
    155         X[j] = sum{i=0..9}(W[j+8*i])
    156 @endcode
    157 
    158 @ingroup codec_internal
    159 */
    160 
    161 /**
    162 @addtogroup codec_internal
    163 @{
    164 */
    165 
    166 #include "oi_codec_sbc_private.h"
    167 
    168 const int32_t dec_window_4[21] = {
    169     0,      /* +0.00000000E+00 */
    170     97,     /* +5.36548976E-04 */
    171     270,    /* +1.49188357E-03 */
    172     495,    /* +2.73370904E-03 */
    173     694,    /* +3.83720193E-03 */
    174     704,    /* +3.89205149E-03 */
    175     338,    /* +1.86581691E-03 */
    176     -554,   /* -3.06012286E-03 */
    177     1974,   /* +1.09137620E-02 */
    178     3697,   /* +2.04385087E-02 */
    179     5224,   /* +2.88757392E-02 */
    180     5824,   /* +3.21939290E-02 */
    181     4681,   /* +2.58767811E-02 */
    182     1109,   /* +6.13245186E-03 */
    183     -5214,  /* -2.88217274E-02 */
    184     -14047, /* -7.76463494E-02 */
    185     24529,  /* +1.35593274E-01 */
    186     35274,  /* +1.94987841E-01 */
    187     44618,  /* +2.46636662E-01 */
    188     50984,  /* +2.81828203E-01 */
    189     53243,  /* +2.94315332E-01 */
    190 };
    191 
    192 #define DCTII_4_K06_FIX (11585) /* S1.14      11585   0.707107*/
    193 
    194 #define DCTII_4_K08_FIX (21407) /* S1.14      21407   1.306563*/
    195 
    196 #define DCTII_4_K09_FIX (-15137) /* S1.14     -15137  -0.923880*/
    197 
    198 #define DCTII_4_K10_FIX (-8867) /* S1.14      -8867  -0.541196*/
    199 
    200 /** Scales x by y bits to the right, adding a rounding factor.
    201  */
    202 #ifndef SCALE
    203 #define SCALE(x, y) (((x) + (1 << ((y)-1))) >> (y))
    204 #endif
    205 
    206 #ifndef CLIP_INT16
    207 #define CLIP_INT16(x)                \
    208   do {                               \
    209     if ((x) > OI_INT16_MAX) {        \
    210       (x) = OI_INT16_MAX;            \
    211     } else if ((x) < OI_INT16_MIN) { \
    212       (x) = OI_INT16_MIN;            \
    213     }                                \
    214   } while (0)
    215 #endif
    216 
    217 /**
    218  * Default C language implementation of a 16x32->32 multiply. This function may
    219  * be replaced by a platform-specific version for speed.
    220  *
    221  * @param u A signed 16-bit multiplicand
    222  * @param v A signed 32-bit multiplier
    223 
    224  * @return  A signed 32-bit value corresponding to the 32 most significant bits
    225  * of the 48-bit product of u and v.
    226  */
    227 INLINE int32_t default_mul_16s_32s_hi(int16_t u, int32_t v) {
    228   uint16_t v0;
    229   int16_t v1;
    230 
    231   int32_t w, x;
    232 
    233   v0 = (uint16_t)(v & 0xffff);
    234   v1 = (int16_t)(v >> 16);
    235 
    236   w = v1 * u;
    237   x = u * v0;
    238 
    239   return w + (x >> 16);
    240 }
    241 
    242 #define MUL_16S_32S_HI(_x, _y) default_mul_16s_32s_hi(_x, _y)
    243 
    244 #define LONG_MULT_DCT(K, sample) (MUL_16S_32S_HI(K, sample) << 2)
    245 
    246 PRIVATE void SynthWindow80_generated(int16_t* pcm,
    247                                      SBC_BUFFER_T const* RESTRICT buffer,
    248                                      OI_UINT strideShift);
    249 PRIVATE void SynthWindow112_generated(int16_t* pcm,
    250                                       SBC_BUFFER_T const* RESTRICT buffer,
    251                                       OI_UINT strideShift);
    252 PRIVATE void dct2_8(SBC_BUFFER_T* RESTRICT out, int32_t const* RESTRICT x);
    253 
    254 typedef void (*SYNTH_FRAME)(OI_CODEC_SBC_DECODER_CONTEXT* context, int16_t* pcm,
    255                             OI_UINT blkstart, OI_UINT blkcount);
    256 
    257 #ifndef COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS
    258 #define COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(dest, src) \
    259   do {                                                      \
    260     shift_buffer(dest, src, 72);                            \
    261   } while (0)
    262 #endif
    263 
    264 #ifndef DCT2_8
    265 #define DCT2_8(dst, src) dct2_8(dst, src)
    266 #endif
    267 
    268 #ifndef SYNTH80
    269 #define SYNTH80 SynthWindow80_generated
    270 #endif
    271 
    272 #ifndef SYNTH112
    273 #define SYNTH112 SynthWindow112_generated
    274 #endif
    275 
    276 PRIVATE void OI_SBC_SynthFrame_80(OI_CODEC_SBC_DECODER_CONTEXT* context,
    277                                   int16_t* pcm, OI_UINT blkstart,
    278                                   OI_UINT blkcount) {
    279   OI_UINT blk;
    280   OI_UINT ch;
    281   OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    282   OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1;
    283   OI_UINT offset = context->common.filterBufferOffset;
    284   int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart;
    285   OI_UINT blkstop = blkstart + blkcount;
    286 
    287   for (blk = blkstart; blk < blkstop; blk++) {
    288     if (offset == 0) {
    289       COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(
    290           context->common.filterBuffer[0] + context->common.filterBufferLen -
    291               72,
    292           context->common.filterBuffer[0]);
    293       if (nrof_channels == 2) {
    294         COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(
    295             context->common.filterBuffer[1] + context->common.filterBufferLen -
    296                 72,
    297             context->common.filterBuffer[1]);
    298       }
    299       offset = context->common.filterBufferLen - 80;
    300     } else {
    301       offset -= 1 * 8;
    302     }
    303 
    304     for (ch = 0; ch < nrof_channels; ch++) {
    305       DCT2_8(context->common.filterBuffer[ch] + offset, s);
    306       SYNTH80(pcm + ch, context->common.filterBuffer[ch] + offset,
    307               pcmStrideShift);
    308       s += 8;
    309     }
    310     pcm += (8 << pcmStrideShift);
    311   }
    312   context->common.filterBufferOffset = offset;
    313 }
    314 
    315 PRIVATE void OI_SBC_SynthFrame_4SB(OI_CODEC_SBC_DECODER_CONTEXT* context,
    316                                    int16_t* pcm, OI_UINT blkstart,
    317                                    OI_UINT blkcount) {
    318   OI_UINT blk;
    319   OI_UINT ch;
    320   OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    321   OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1;
    322   OI_UINT offset = context->common.filterBufferOffset;
    323   int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart;
    324   OI_UINT blkstop = blkstart + blkcount;
    325 
    326   for (blk = blkstart; blk < blkstop; blk++) {
    327     if (offset == 0) {
    328       COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(
    329           context->common.filterBuffer[0] + context->common.filterBufferLen -
    330               72,
    331           context->common.filterBuffer[0]);
    332       if (nrof_channels == 2) {
    333         COPY_BACKWARD_32BIT_ALIGNED_72_HALFWORDS(
    334             context->common.filterBuffer[1] + context->common.filterBufferLen -
    335                 72,
    336             context->common.filterBuffer[1]);
    337       }
    338       offset = context->common.filterBufferLen - 80;
    339     } else {
    340       offset -= 8;
    341     }
    342     for (ch = 0; ch < nrof_channels; ch++) {
    343       cosineModulateSynth4(context->common.filterBuffer[ch] + offset, s);
    344       SynthWindow40_int32_int32_symmetry_with_sum(
    345           pcm + ch, context->common.filterBuffer[ch] + offset, pcmStrideShift);
    346       s += 4;
    347     }
    348     pcm += (4 << pcmStrideShift);
    349   }
    350   context->common.filterBufferOffset = offset;
    351 }
    352 
    353 #ifdef SBC_ENHANCED
    354 
    355 PRIVATE void OI_SBC_SynthFrame_Enhanced(OI_CODEC_SBC_DECODER_CONTEXT* context,
    356                                         int16_t* pcm, OI_UINT blkstart,
    357                                         OI_UINT blkcount) {
    358   OI_UINT blk;
    359   OI_UINT ch;
    360   OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    361   OI_UINT pcmStrideShift = context->common.pcmStride == 1 ? 0 : 1;
    362   OI_UINT offset = context->common.filterBufferOffset;
    363   int32_t* s = context->common.subdata + 8 * nrof_channels * blkstart;
    364   OI_UINT blkstop = blkstart + blkcount;
    365 
    366   for (blk = blkstart; blk < blkstop; blk++) {
    367     if (offset == 0) {
    368       COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS(
    369           context->common.filterBuffer[0] + context->common.filterBufferLen -
    370               104,
    371           context->common.filterBuffer[0]);
    372       if (nrof_channels == 2) {
    373         COPY_BACKWARD_32BIT_ALIGNED_104_HALFWORDS(
    374             context->common.filterBuffer[1] + context->common.filterBufferLen -
    375                 104,
    376             context->common.filterBuffer[1]);
    377       }
    378       offset = context->common.filterBufferLen - 112;
    379     } else {
    380       offset -= 8;
    381     }
    382     for (ch = 0; ch < nrof_channels; ++ch) {
    383       DCT2_8(context->common.filterBuffer[ch] + offset, s);
    384       SYNTH112(pcm + ch, context->common.filterBuffer[ch] + offset,
    385                pcmStrideShift);
    386       s += 8;
    387     }
    388     pcm += (8 << pcmStrideShift);
    389   }
    390   context->common.filterBufferOffset = offset;
    391 }
    392 
    393 static const SYNTH_FRAME SynthFrameEnhanced[] = {
    394     NULL,                       /* invalid */
    395     OI_SBC_SynthFrame_Enhanced, /* mono */
    396     OI_SBC_SynthFrame_Enhanced  /* stereo */
    397 };
    398 
    399 #endif
    400 
    401 static const SYNTH_FRAME SynthFrame8SB[] = {
    402     NULL,                 /* invalid */
    403     OI_SBC_SynthFrame_80, /* mono */
    404     OI_SBC_SynthFrame_80  /* stereo */
    405 };
    406 
    407 static const SYNTH_FRAME SynthFrame4SB[] = {
    408     NULL,                  /* invalid */
    409     OI_SBC_SynthFrame_4SB, /* mono */
    410     OI_SBC_SynthFrame_4SB  /* stereo */
    411 };
    412 
    413 PRIVATE void OI_SBC_SynthFrame(OI_CODEC_SBC_DECODER_CONTEXT* context,
    414                                int16_t* pcm, OI_UINT start_block,
    415                                OI_UINT nrof_blocks) {
    416   OI_UINT nrof_subbands = context->common.frameInfo.nrof_subbands;
    417   OI_UINT nrof_channels = context->common.frameInfo.nrof_channels;
    418 
    419   OI_ASSERT(nrof_subbands == 4 || nrof_subbands == 8);
    420   if (nrof_subbands == 4) {
    421     SynthFrame4SB[nrof_channels](context, pcm, start_block, nrof_blocks);
    422 #ifdef SBC_ENHANCED
    423   } else if (context->common.frameInfo.enhanced) {
    424     SynthFrameEnhanced[nrof_channels](context, pcm, start_block, nrof_blocks);
    425 #endif /* SBC_ENHANCED */
    426   } else {
    427     SynthFrame8SB[nrof_channels](context, pcm, start_block, nrof_blocks);
    428   }
    429 }
    430 
    431 void SynthWindow40_int32_int32_symmetry_with_sum(int16_t* pcm,
    432                                                  SBC_BUFFER_T buffer[80],
    433                                                  OI_UINT strideShift) {
    434   int32_t pa;
    435   int32_t pb;
    436 
    437   /* These values should be zero, since out[2] of the 4-band cosine modulation
    438    * is always zero. */
    439   OI_ASSERT(buffer[2] == 0);
    440   OI_ASSERT(buffer[10] == 0);
    441   OI_ASSERT(buffer[18] == 0);
    442   OI_ASSERT(buffer[26] == 0);
    443   OI_ASSERT(buffer[34] == 0);
    444   OI_ASSERT(buffer[42] == 0);
    445   OI_ASSERT(buffer[50] == 0);
    446   OI_ASSERT(buffer[58] == 0);
    447   OI_ASSERT(buffer[66] == 0);
    448   OI_ASSERT(buffer[74] == 0);
    449 
    450   pa = dec_window_4[4] * (buffer[12] + buffer[76]);
    451   pa += dec_window_4[8] * (buffer[16] - buffer[64]);
    452   pa += dec_window_4[12] * (buffer[28] + buffer[60]);
    453   pa += dec_window_4[16] * (buffer[32] - buffer[48]);
    454   pa += dec_window_4[20] * buffer[44];
    455   pa = SCALE(-pa, 15);
    456   CLIP_INT16(pa);
    457   pcm[0 << strideShift] = (int16_t)pa;
    458 
    459   pa = dec_window_4[1] * buffer[1];
    460   pb = dec_window_4[1] * buffer[79];
    461   pb += dec_window_4[3] * buffer[3];
    462   pa += dec_window_4[3] * buffer[77];
    463   pa += dec_window_4[5] * buffer[13];
    464   pb += dec_window_4[5] * buffer[67];
    465   pb += dec_window_4[7] * buffer[15];
    466   pa += dec_window_4[7] * buffer[65];
    467   pa += dec_window_4[9] * buffer[17];
    468   pb += dec_window_4[9] * buffer[63];
    469   pb += dec_window_4[11] * buffer[19];
    470   pa += dec_window_4[11] * buffer[61];
    471   pa += dec_window_4[13] * buffer[29];
    472   pb += dec_window_4[13] * buffer[51];
    473   pb += dec_window_4[15] * buffer[31];
    474   pa += dec_window_4[15] * buffer[49];
    475   pa += dec_window_4[17] * buffer[33];
    476   pb += dec_window_4[17] * buffer[47];
    477   pb += dec_window_4[19] * buffer[35];
    478   pa += dec_window_4[19] * buffer[45];
    479   pa = SCALE(-pa, 15);
    480   CLIP_INT16(pa);
    481   pcm[1 << strideShift] = (int16_t)(pa);
    482   pb = SCALE(-pb, 15);
    483   CLIP_INT16(pb);
    484   pcm[3 << strideShift] = (int16_t)(pb);
    485 
    486   pa = dec_window_4[2] *
    487        (/*buffer[ 2] + */ buffer[78]); /* buffer[ 2] is always zero */
    488   pa += dec_window_4[6] *
    489         (buffer[14] /* + buffer[66]*/); /* buffer[66] is always zero */
    490   pa += dec_window_4[10] *
    491         (/*buffer[18] + */ buffer[62]); /* buffer[18] is always zero */
    492   pa += dec_window_4[14] *
    493         (buffer[30] /* + buffer[50]*/); /* buffer[50] is always zero */
    494   pa += dec_window_4[18] *
    495         (/*buffer[34] + */ buffer[46]); /* buffer[34] is always zero */
    496   pa = SCALE(-pa, 15);
    497   CLIP_INT16(pa);
    498   pcm[2 << strideShift] = (int16_t)(pa);
    499 }
    500 
    501 /**
    502   This routine implements the cosine modulation matrix for 4-subband
    503   synthesis. This is called "matrixing" in the SBC specification. This
    504   matrix, M4,  can be factored into an 8-point Type II Discrete Cosine
    505   Transform, DCTII_4 and a matrix S4, given here:
    506 
    507   @code
    508         __               __
    509        |   0   0   1   0   |
    510        |   0   0   0   1   |
    511        |   0   0   0   0   |
    512        |   0   0   0  -1   |
    513   S4 = |   0   0  -1   0   |
    514        |   0  -1   0   0   |
    515        |  -1   0   0   0   |
    516        |__ 0  -1   0   0 __|
    517 
    518   M4 * in = S4 * (DCTII_4 * in)
    519   @endcode
    520 
    521   (DCTII_4 * in) is computed using a Fast Cosine Transform. The algorithm
    522   here is based on an implementation computed by the SPIRAL computer
    523   algebra system, manually converted to fixed-point arithmetic. S4 can be
    524   implemented using only assignment and negation.
    525   */
    526 PRIVATE void cosineModulateSynth4(SBC_BUFFER_T* RESTRICT out,
    527                                   int32_t const* RESTRICT in) {
    528   int32_t f0, f1, f2, f3, f4, f7, f8, f9, f10;
    529   int32_t y0, y1, y2, y3;
    530 
    531   f0 = (in[0] - in[3]);
    532   f1 = (in[0] + in[3]);
    533   f2 = (in[1] - in[2]);
    534   f3 = (in[1] + in[2]);
    535 
    536   f4 = f1 - f3;
    537 
    538   y0 = -SCALE(f1 + f3, DCT_SHIFT);
    539   y2 = -SCALE(LONG_MULT_DCT(DCTII_4_K06_FIX, f4), DCT_SHIFT);
    540   f7 = f0 + f2;
    541   f8 = LONG_MULT_DCT(DCTII_4_K08_FIX, f0);
    542   f9 = LONG_MULT_DCT(DCTII_4_K09_FIX, f7);
    543   f10 = LONG_MULT_DCT(DCTII_4_K10_FIX, f2);
    544   y3 = -SCALE(f8 + f9, DCT_SHIFT);
    545   y1 = -SCALE(f10 - f9, DCT_SHIFT);
    546 
    547   out[0] = (int16_t)-y2;
    548   out[1] = (int16_t)-y3;
    549   out[2] = (int16_t)0;
    550   out[3] = (int16_t)y3;
    551   out[4] = (int16_t)y2;
    552   out[5] = (int16_t)y1;
    553   out[6] = (int16_t)y0;
    554   out[7] = (int16_t)y1;
    555 }
    556 
    557 /**
    558 @}
    559 */
    560