Home | History | Annotate | Download | only in docs
      1 HOW AUDIO EMULATION WORKS IN QEMU:
      2 ==================================
      3 
      4 Things are a bit tricky, but here's a rough description:
      5 
      6   QEMUSoundCard: models a given emulated sound card
      7   SWVoiceOut:    models an audio output from a QEMUSoundCard
      8   SWVoiceIn:     models an audio input from a QEMUSoundCard
      9 
     10   HWVoiceOut:    models an audio output (backend) on the host.
     11   HWVoiceIn:     models an audio input (backend) on the host.
     12 
     13 Each voice can have its own settings in terms of sample size, endianess, rate, etc...
     14 
     15 
     16 Emulation for a given soundcard typically does:
     17 
     18   1/ Create a QEMUSoundCard object and register it with AUD_register_card()
     19   2/ For each emulated output, call AUD_open_out() to create a SWVoiceOut object.
     20   3/ For each emulated input, call AUD_open_in() to create a SWVoiceIn object.
     21 
     22   Note that you must pass a callback function to AUD_open_out() and AUD_open_in();
     23   more on this later.
     24 
     25   Each SWVoiceOut is associated to a single HWVoiceOut, each SWVoiceIn is
     26   associated to a single HWVoiceIn.
     27 
     28   However you can have several SWVoiceOut associated to the same HWVoiceOut
     29   (same thing for SWVoiceIn/HWVoiceIn).
     30 
     31 SOUND PLAYBACK DETAILS:
     32 =======================
     33 
     34 Each HWVoiceOut has the following too:
     35 
     36   - A fixed-size circular buffer of stereo samples (for stereo).
     37     whose format is either floats or int64_t per sample (depending on build
     38     configuration).
     39 
     40   - A 'samples' field giving the (constant) number of sample pairs in the stereo buffer.
     41 
     42   - A target conversion function, called 'clip()' that is used to read from the stereo
     43     buffer and write into a platform-specific sound buffers (e.g. WinWave-managed buffers
     44     on Windows).
     45 
     46   - A 'rpos' offset into the circular buffer which tells where to read the next samples
     47     from the stereo buffer for the next conversion through 'clip'.
     48 
     49 
     50             |<----------------- samples ----------------------->|
     51 
     52             |                                                   |
     53 
     54             |       rpos                                        |
     55                     |
     56             |_______v___________________________________________|
     57             |       |                                           |
     58             |       |                                           |
     59             |_______|___________________________________________|
     60 
     61 
     62   - A 'run_out' method that is called each time to tell the output backend to
     63     send samples from the stereo buffer to the host sound card/server. This method
     64     shall also modify 'rpos' and returns the number of samples 'played'. A more detailed
     65     description of this process appears below.
     66 
     67   - A 'write' method callback used to write a buffer of emulated sound samples from
     68     a SWVoiceOut into the stereo buffer. Currently all backends simply call the generic
     69     function audio_pcm_sw_write() to implement this.
     70 
     71     According to malc, the audio sub-system's original author, this is to allow
     72     a backend to use a platform-specific function to do the same thing if available.
     73 
     74     (Similarly, all input backends have a 'read' methods which simply calls 'audio_pcm_sw_read')
     75 
     76 Each SWVoiceOut has the following:
     77 
     78   - a 'conv()' function used to read sound samples from the emulated sound card and
     79     copy/mix them to the corresponding HWVoiceOut's stereo buffer.
     80 
     81   - a 'total_hw_samples_mixed' which correspond to the number of samples that have
     82     already been mixed into the target HWVoiceOut stereo buffer (starting from the
     83     HWVoiceOut's 'rpos' offset). NOTE: this is a count of samples in the HWVoiceOut
     84     stereo buffer, not emulated hardware sound samples, which can have different
     85     properties (frequency, size, endianess).
     86                                          ______________
     87                                         |              |
     88                                         |  SWVoiceOut2 |
     89                                         |______________|
     90                   ______________           |
     91                  |              |          |
     92                  |  SWVoiceOut1 |          |     thsm<N> := total_hw_samples_mixed
     93                  |______________|          |                for SWVoiceOut<N>
     94                            |               |
     95                            |               |
     96                     |<-----|------------thsm2-->|
     97                     |      |                    |
     98                     |<---thsm1-------->|        |
     99              _______|__________________v________|_______________ 
    100             |       |111111111111111111|        v               |
    101             |       |222222222222222222222222222|               |
    102             |_______|___________________________________________|
    103                     ^
    104                     |         HWVoiceOut stereo buffer
    105                     rpos
    106 
    107 
    108   - a 'ratio' value, which is the ratio of the target HWVoiceOut's frequency by
    109     the SWVoiceOut's frequency, multiplied by (1 << 32), as a 64-bit integer.
    110 
    111     So, if the HWVoiceOut has a frequency of 44kHz, and the SWVoiceOut has a frequency
    112     of 11kHz, then ratio will be (44/11*(1 << 32)) = 0x4_0000_0000
    113 
    114   - a callback provided by the emulated hardware when the SWVoiceOut is created.
    115     This function is used to mix the SWVoiceOut's samples into the target
    116     HWVoiceOut stereo buffer (it must also perform frequency interpolation,
    117     volume adjustment, etc..).
    118 
    119     This callback normally calls another helper functions in the audio subsystem
    120     (AUD_write()) to to the mixing/volume-adjustment from emulated hardware sample
    121     buffers.
    122 
    123 Here's a small graphics that explains it better:
    124 
    125    SWVoiceOut:  emulated hardware sound buffers:
    126           |
    127           |   (mixed through AUD_write() called from user-provided
    128           |    callback which is itself called on each audio timer
    129           |    tick).
    130           v
    131    HWVoiceOut: stereo sample circular buffer
    132           |
    133           |   (sent through HWVoiceOut's 'clip' function, which is
    134           |    invoked from the 'run_out' method, also called on each
    135           |    audio timer tick)
    136           v
    137    backend-specific sound buffers
    138 
    139 
    140 The function audio_timer() in audio/audio.c is called periodically and it is used as
    141 a pulse to perform sound buffer transfers and mixing. More specifically for audio
    142 output voices:
    143 
    144 - For each HWVoiceOut, find the number of active SWVoiceOut, and the minimum number
    145   of 'total_hw_samples_mixed' that have already been written to the buffer. We will
    146   call this value the number of 'live' samples in the stereo buffer.
    147 
    148 - if 'live' is 0, call the callback of each active SWVoiceOut to fill the stereo
    149   buffer, if needed, then exit.
    150 
    151 - otherwise, call the 'run_out' method of the HWVoiceOut object. This will change
    152   the value of 'rpos' and return the number of samples played. Then the
    153   'total_hw_samples_mixed' field of all active SWVoiceOuts is decremented by
    154   'played', and the callback is called to re-fill the stereo buffer.
    155 
    156 It's important to note that the SWVoiceOut callback:
    157 
    158 - takes a 'free' parameter which is the number of stereo sound samples that can
    159   be sent to the hardware stereo buffer (before rate adjustment, i.e. not the number
    160   of sound samples in the SWVoiceOut emulated hardware sound buffer).
    161 
    162 - must call AUD_write(sw, buff, count), where 'buff' points to emulated sound
    163   samples, and their 'count', which must be <= the 'free' parameter.
    164 
    165 - the implementation of AUD_write() will call the 'write' method of the target
    166   HWVoiceOut, which in turns calls the function audio_pcm_sw_write() which does
    167   standard rate/volume adjustment before mixing the conversion into the target
    168   stereo buffer. It also increases the 'total_hw_samples_mixed' value of the
    169   SWVoiceOut.
    170 
    171 - audio_pcm_sw_write() returns the number of sound sample *bytes* that have
    172   been mixed into the stereo buffer, and so does AUD_write().
    173 
    174 So, in the end, we have the pseudo-code:
    175 
    176     every sound timer ticks:
    177       for hw in list_HWVoiceOut:
    178          live = MIN([sw.total_hw_samples_mixed for sw in hw.list_SWVoiceOut ])
    179          if live > 0:
    180             played = hw.run_out(live)
    181             for sw in hw.list_SWVoiceOut:
    182                 sw.total_hw_samples_mixed -= played
    183 
    184         for sw in hw.list_SWVoiceOut:
    185             free = hw.samples - sw.total_hw_samples_mixed
    186             if free > 0:
    187                 sw.callback(sw, free)
    188 
    189 SOUND RECORDING DETAILS:
    190 ========================
    191 
    192 Things are similar but in reverse order. I.e. the HWVoiceIn acquires sound samples
    193 in its stereo sound buffer, and the SWVoiceIn objects must consume them as soon as
    194 they can.
    195 
    196