.. _devguide-coding-audio:

#####
Audio
#####

.. contents::
  :local:
  :backlinks: none
  :depth: 2

This chapter describes how to use the Pepper audio API to play an audio
stream. The Pepper audio API provides a low-level means of playing a stream of
audio samples generated by a Native Client module. The API generally works as
follows: A Native Client module creates an audio resource that represents an
audio stream, and tells the browser to start or stop playing the audio
resource. The browser calls a function in the Native Client module to fill a
buffer with audio samples every time it needs data to play from the audio
stream.

The code examples in this chapter describe a simple Native Client module that
generates audio samples using a sine wave with a frequency of 440 Hz. The module
starts playing the audio samples as soon as it is loaded into the browser. For a
slightly more sophisticated example, see the ``audio`` example (source code in
the SDK directory ``examples/api/audio``), which lets users specify a frequency
for the sine wave and click buttons to start and stop audio playback.

Reference information
=====================

For reference information related to the Pepper audio API, see the following
documentation:

* `pp::AudioConfig class
  </native-client/pepper_stable/cpp/classpp_1_1_audio_config>`_

* `pp::Audio class </native-client/pepper_stable/cpp/classpp_1_1_audio>`_

* `audio_config.h </native-client/pepper_stable/cpp/audio__config_8h>`_

* `audio.h </native-client/pepper_stable/cpp/audio_8h>`_

* `PP_AudioSampleRate
  </native-client/pepper_stable/c/group___enums#gaee750c350655f2fb0fe04c04029e0ff8>`_

About the Pepper audio API
==========================

The Pepper audio API lets Native Client modules play audio streams in a
browser. To play an audio stream, a module generates audio samples and writes
them into a buffer. The browser reads the audio samples from the buffer and
plays them using an audio device on the client computer.

.. image:: /images/pepper-audio-buffer.png

This mechanism is simple but low-level. If you want to play plain sound files in
a web application, you may want to consider higher-level alternatives such as
using the HTML ``<audio>`` tag, JavaScript, or the `Web Audio API
<http://chromium.googlecode.com/svn/trunk/samples/audio/index.html>`_.

The Pepper audio API is a good option for playing audio data if you want to do
audio processing in your web application. You might use the audio API, for
example, if you want to apply audio effects to sounds, synthesize your own
sounds, or do any other type of CPU-intensive processing of audio
samples. Another likely use case is gaming applications: you might use a gaming
library to process audio data, and then simply use the audio API to output the
processed data.

The Pepper audio API is straightforward to use:

#. Your module creates an audio configuration resource and an audio resource.

#. Your module implements a callback function that fills an audio buffer with
   data.

#. Your module invokes the ``StartPlayback`` and ``StopPlayback`` methods of
   the audio resource (e.g., when certain events occur).

#. The browser invokes your callback function whenever it needs audio data to
   play. Your callback function can generate the audio data in a number of
   ways---e.g., it can generate new data, or it can copy pre-mixed data into the
   audio buffer.

This basic interaction is illustrated below, and described in detail in the
sections that follow.

.. image:: /images/pepper-audio-api.png

Digital audio concepts
======================

Before you use the Pepper audio API, it's helpful to understand a few concepts
that are fundamental to how digital audio is recorded and played back:

sample rate
  the number of times an input sound source is sampled per second;
  correspondingly, the number of samples that are played back per second

bit depth
  the number of bits used to represent a sample

channels
  the number of input sources recorded in each sampling interval;
  correspondingly, the number of outputs that are played back simultaneously
  (typically using different speakers)

The higher the sample rate and bit depth used to record a sound wave, the more
accurately the sound wave can be reproduced, since it will have been sampled
more frequently and stored using a higher level of quantization. Common sampling
rates include 44,100 Hz (44,100 samples/second, the sample rate used on CDs),
and 48,000 Hz (the sample rate used on DVDs and Digital Audio Tapes). A common
bit depth is 16 bits per sample, and a common number of channels is 2 (left and
right channels for stereo sound).

.. _pepper_audio_configurations:

The Pepper audio API currently lets Native Client modules play audio streams
with the following configurations:

* **sample rate**: 44,100 Hz or 48,000 Hz
* **bit depth**: 16
* **channels**: 2 (stereo)

Setting up the module
=====================

The code examples below describe a simple Native Client module that generates
audio samples using a sine wave with a frequency of 440 Hz. The module starts
playing the audio samples as soon as it is loaded into the browser.

The Native Client module is set up by implementing subclasses of the
``pp::Module`` and ``pp::Instance`` classes, as normal.

.. naclcode::

  class SineSynthInstance : public pp::Instance {
   public:
    explicit SineSynthInstance(PP_Instance instance);
    virtual ~SineSynthInstance() {}

    // Called by the browser once the NaCl module is loaded and ready to
    // initialize.  Creates a Pepper audio context and initializes it. Returns
    // true on success.  Returning false causes the NaCl module to be deleted
    // and no other functions to be called.
    virtual bool Init(uint32_t argc, const char* argn[], const char* argv[]);

   private:
    // Function called by the browser when it needs more audio samples.
    static void SineWaveCallback(void* samples,
                                 uint32_t buffer_size,
                                 void* data);

    // Audio resource.
    pp::Audio audio_;

    ...

  };

  class SineSynthModule : public pp::Module {
   public:
    SineSynthModule() : pp::Module() {}
    ~SineSynthModule() {}

    // Create and return a SineSynthInstance object.
    virtual pp::Instance* CreateInstance(PP_Instance instance) {
      return new SineSynthInstance(instance);
    }
  };

Creating an audio configuration resource
========================================

Resources
---------

Before the module can play an audio stream, it must create two resources: an
audio configuration resource and an audio resource. Resources are handles to
objects that the browser provides to module instances. An audio resource is an
object that represents the state of an audio stream, including whether the
stream is paused or being played back, and which callback function to invoke
when the samples in the stream's buffer run out. An audio configuration resource
is an object that stores configuration data for an audio resource, including the
sampling frequency of the audio samples, and the number of samples that the
callback function must provide when the browser invokes it.

Sample frame count
------------------

Prior to creating an audio configuration resource, the module should call
``RecommendSampleFrameCount`` to obtain a *sample frame count* from the
browser. The sample frame count is the number of samples that the callback
function must provide per channel each time the browser invokes the callback
function. For example, if the sample frame count is 4096 for a stereo audio
stream, the callback function must provide 8192 samples (4096 for the left
channel and 4096 for the right channel).

The module can request a specific sample frame count, but the browser may return
a different sample frame count depending on the capabilities of the client
device. At present, ``RecommendSampleFrameCount`` simply bound-checks the
requested sample frame count (see ``include/ppapi/c/ppb_audio_config.h`` for the
minimum and maximum sample frame counts, currently 64 and 32768). In the future,
``RecommendSampleFrameCount`` may perform a more sophisticated calculation,
particularly if there is an intrinsic buffer size for the client device.

Selecting a sample frame count for an audio stream involves a tradeoff between
latency and CPU usage. If you want your module to have short audio latency so
that it can rapidly change what's playing in the audio stream, you should
request a small sample frame count. That could be useful in gaming applications,
for example, where sounds have to change frequently in response to game
action. However, a small sample frame count results in higher CPU usage, since
the browser must invoke the callback function frequently to refill the audio
buffer. Conversely, a large sample frame count results in higher latency but
lower CPU usage. You should request a large sample frame count if your module
will play long, uninterrupted audio segments.

Supported audio configurations
------------------------------

After the module obtains a sample frame count, it can create an audio
configuration resource. Currently the Pepper audio API supports audio streams
with the configuration settings shown :ref:`above<pepper_audio_configurations>`.
C++ modules can create a configuration resource by instantiating a
``pp::AudioConfig`` object. Check ``audio_config.h`` for the latest
configurations that are supported.

.. naclcode::

  bool SineSynthInstance::Init(uint32_t argc,
                               const char* argn[],
                               const char* argv[]) {

    // Ask the browser/device for an appropriate sample frame count size.
    sample_frame_count_ =
        pp::AudioConfig::RecommendSampleFrameCount(PP_AUDIOSAMPLERATE_44100,
                                                   kSampleFrameCount);

    // Create an audio configuration resource.
    pp::AudioConfig audio_config = pp::AudioConfig(this,
                                                   PP_AUDIOSAMPLERATE_44100,
                                                   sample_frame_count_);

    // Create an audio resource.
    audio_ = pp::Audio(this,
                       audio_config,
                       SineWaveCallback,
                       this);

    // Start playback when the module instance is initialized.
    return audio_.StartPlayback();
  }

Creating an audio resource
==========================

Once the module has created an audio configuration resource, it can create an
audio resource. To do so, it instantiates a ``pp::Audio`` object, passing in a
pointer to the module instance, the audio configuration resource, a callback
function, and a pointer to user data (data that is used in the callback
function).  See the example above.

Implementing a callback function
================================

The browser calls the callback function associated with an audio resource every
time it needs more samples to play. The callback function can generate new
samples (e.g., by applying sound effects), or copy pre-mixed samples into the
audio buffer. The example below generates new samples by computing values of a
sine wave.

The last parameter passed to the callback function is generic user data that the
function can use in processing samples. In the example below, the user data is a
pointer to the module instance, which includes member variables
``sample_frame_count_`` (the sample frame count obtained from the browser) and
``theta_`` (the last angle that was used to compute a sine value in the previous
callback; this lets the function generate a smooth sine wave by starting at that
angle plus a small delta).

.. naclcode::

  class SineSynthInstance : public pp::Instance {
   public:
    ...

   private:
    static void SineWaveCallback(void* samples,
                                 uint32_t buffer_size,
                                 void* data) {

      // The user data in this example is a pointer to the module instance.
      SineSynthInstance* sine_synth_instance =
          reinterpret_cast<SineSynthInstance*>(data);

      // Delta by which to increase theta_ for each sample.
      const double delta = kTwoPi * kFrequency / PP_AUDIOSAMPLERATE_44100;
      // Amount by which to scale up the computed sine value.
      const int16_t max_int16 = std::numeric_limits<int16_t>::max();

      int16_t* buff = reinterpret_cast<int16_t*>(samples);

      // Make sure we can't write outside the buffer.
      assert(buffer_size >= (sizeof(*buff) * kChannels *
                             sine_synth_instance->sample_frame_count_));

      for (size_t sample_i = 0;
           sample_i < sine_synth_instance->sample_frame_count_;
           ++sample_i, sine_synth_instance->theta_ += delta) {

        // Keep theta_ from going beyond 2*Pi.
        if (sine_synth_instance->theta_ > kTwoPi) {
          sine_synth_instance->theta_ -= kTwoPi;
        }

        // Compute the sine value for the current theta_, scale it up,
        // and write it into the buffer once for each channel.
        double sin_value(std::sin(sine_synth_instance->theta_));
        int16_t scaled_value = static_cast<int16_t>(sin_value * max_int16);
        for (size_t channel = 0; channel < kChannels; ++channel) {
          *buff++ = scaled_value;
        }
      }
    }

    ...
  };

Application threads and real-time requirements
----------------------------------------------

The callback function runs in a background application thread. This allows audio
processing to continue even when the application is busy doing something
else. If the main application thread and the callback thread access the same
data, you may be tempted to use a lock to control access to that data. You
should avoid the use of locks in the callback thread, however, as attempting to
acquire a lock may cause the thread to get swapped out, resulting in audio
dropouts.

In general, you must program the callback thread carefully, as the Pepper audio
API is a very low-level API that needs to meet hard real-time requirements. If
the callback thread spends too much time processing, it can easily miss the
real-time deadline, resulting in audio dropouts. One way the callback thread can
miss the deadline is by taking too much time doing computation. Another way the
callback thread can miss the deadline is by executing a function call that swaps
out the callback thread. Unfortunately, such function calls include just about
all C Run-Time (CRT) library calls and Pepper API calls. The callback thread
should therefore avoid calls to ``malloc``, ``gettimeofday``, mutexes, condition
variables, critical sections, and so forth; any such calls could attempt to take
a lock and swap out the callback thread, which would be disastrous for audio
playback. Similarly, the callback thread should avoid Pepper API calls. Audio
dropouts due to thread swapping can be very rare and very hard to track down and
debug---it's best to avoid making system/Pepper calls in the first place. In
short, the audio (callback) thread should use "lock-free" techniques and avoid
making CRT library calls.

One other issue to be aware of is that the ``StartPlayback`` function (discussed
below) is an asynchronous RPC; i.e., it does not block. That means that the
callback function may not be called immediately after the call to
``StartPlayback``. If it's important to synchronize the callback thread with
another thread so that the audio stream starts playing simultaneously with
another action in your application, you must handle such synchronization
manually.

Starting and stopping playback
==============================

To start and stop audio playback, the module simply reacts to JavaScript
messages.

.. naclcode::

  const char* const kPlaySoundId = "playSound";
  const char* const kStopSoundId = "stopSound";

  void SineSynthInstance::HandleMessage(const pp::Var& var_message) {
    if (!var_message.is_string()) {
      return;
    }
    std::string message = var_message.AsString();
    if (message == kPlaySoundId) {
      audio_.StartPlayback();
    } else if (message == kStopSoundId) {
      audio_.StopPlayback();
    } else if (...) {
      ...
    }
  }
    387