# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demonstrates sonic's capabilities.  You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast!  Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech.  While similar to
other algorithms that came before, Sonic is optimized for speed ups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java.  Both are
designed to easily be integrated into streaming voice applications, like TTS
back ends.  While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted.  For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
under the Apache 2.0 license.  Feel free to contact me at
<waywardgeek@gmail.com>.  One user was concerned about patents.  I believe the
sonic algorithms do not violate any patents, as most of them are very old, based
on [PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch.  SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music.  No WSOLA based program performs well
for speech (contrary to the inventor's estimate of WSOLA).  Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav).  Both are sped up by 2X.  WSOLA
introduces unacceptable levels of distortion, making speech impossible to
understand at high speed (over 2.5X) by blind speed listeners.

However, there are decent free software algorithms for speeding up speech.  They
are all in the TD-PSOLA family.  For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available.  A slightly buggy
implementation of PICOLA is available in the spandsp library.  I find the one in
RockBox quite good, though it's limited to 2X speed up.  So far as I know, only
sonic is optimized for speed factors needed by the blind, up to 6X.

Sonic does all of its CPU-intensive work with integer math, and works well on
ARM CPUs without FPUs.  It supports multiple channels (stereo), and is also able
to change the pitch of a voice.  It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats.  The source code is in plain ANSI C.  In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.  It will take a while
for it to filter out into all the other distros.  For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line application.  It
is meant to be useful as example code.  Feel free to copy directly from main.c
into your application, as main.c is in the public domain.  Dependencies listed
in debian/control like libsndfile are there to compile the sonic command-line
application.  Libsonic has no external dependencies.

There are basically two ways to use sonic: batch or stream mode.  The simplest
is batch mode where you pass an entire sound sample to sonic.  All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers.  Stereo mode is supported, as
is any arbitrary number of channels.  Samples for each channel should be
interleaved, that is, adjacent in the input array.  Because the samples are
modified in-place, be sure that there is room in the samples array for the
speed-changed samples.  In general, if you are speeding up, rather than slowing
down, it will be safe to have no extra padding.  If your sound samples are mono,
and you don't want to scale volume or playback rate, and if you want normal
pitch scaling, then call it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);
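Since batch mode writes its result back into the same array, it can help to estimate the needed capacity before calling it.  The sketch below is only a rough guide: `estimateBufferSamples` is a hypothetical helper, not part of libsonic, and it assumes that `numSamples` counts samples per channel and that the output length is roughly `numSamples / (speed * rate)`.  The padding is whatever margin the caller chooses.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libsonic.  Estimates how many shorts the
 * samples array should hold so that the in-place result fits.
 * Assumptions: numSamples counts samples per channel, and the output length
 * is roughly numSamples / (speed * rate).  padSamples is a caller-chosen
 * safety margin per channel. */
size_t estimateBufferSamples(size_t numSamples, float speed, float rate,
                             int numChannels, size_t padSamples)
{
    double scale = 1.0 / ((double)speed * (double)rate);
    size_t outSamples = (size_t)((double)numSamples * scale) + padSamples;

    /* Never report less than the input size: the input has to fit too. */
    if (outSamples < numSamples) {
        outSamples = numSamples;
    }
    return outSamples * (size_t)numChannels;
}
```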

The other way to use libsonic is in stream mode.  This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency.  The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz.  In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.
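The latency figures follow directly from the "two pitch periods" rule.  As a small worked example (the function name `latencyMs` is made up for illustration and is not part of libsonic), a 65 Hz voice gives 2 / 65 s, or about 30.8 ms, and a 100 Hz voice gives 20 ms:

```c
/* Illustrative only, not part of libsonic: latency in milliseconds,
 * assuming it equals two pitch periods of a voice at pitchHz. */
double latencyMs(double pitchHz)
{
    /* One pitch period is 1000 / pitchHz milliseconds. */
    return 2.0 * 1000.0 / pitchHz;
}
```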

To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic.  Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time.  You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
no change at all to the sound stream.  Sonic detects this case, and simply
copies the input to the output to reduce CPU load.  To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers.  A speed of 2.0 means to
double the speed of speech.  A pitch of 0.95 means to lower the pitch by about 5%,
and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
exceed the maximum range of a 16-bit integer.  Speech rate scales how fast
speech is played.  A 2.0 value will make you sound like a chipmunk talking very
fast.  A 0.7 value will make you sound like a giant talking slowly.
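To make the volume behavior concrete, here is a minimal sketch of scaling with clipping to the 16-bit range.  It is not sonic's own code: `scaleVolume` is a hypothetical name, and it uses floating point for readability, whereas sonic does its CPU-intensive work with integer math.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only, not sonic's implementation: multiply each sample by
 * volume and clip the result to the range of a 16-bit signed integer. */
void scaleVolume(int16_t *samples, size_t numSamples, float volume)
{
    size_t i;

    for (i = 0; i < numSamples; i++) {
        float scaled = (float)samples[i] * volume;

        if (scaled > 32767.0f) {
            scaled = 32767.0f;      /* clip positive overflow */
        } else if (scaled < -32768.0f) {
            scaled = -32768.0f;     /* clip negative overflow */
        }
        samples[i] = (int16_t)scaled;
    }
}
```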

By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal.  This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too.  If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal chord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.
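The default pitch method can be pictured as simple arithmetic: the pitch factor is folded into the rate, and the speed is divided by the same factor so that the overall duration does not change.  The sketch below only illustrates that reasoning; `EffectiveParams` and `effectiveParams` are hypothetical names, and the exact computation inside sonic may differ.

```c
/* Illustrative only: how a pitch factor could be split into a rate change
 * plus a compensating speed change.  Not taken from sonic's source. */
typedef struct {
    float speed;  /* time-scale factor applied without changing pitch */
    float rate;   /* playback-rate factor, which shifts pitch and duration */
} EffectiveParams;

EffectiveParams effectiveParams(float speed, float pitch, float rate)
{
    EffectiveParams p;

    p.rate = rate * pitch;   /* playing faster raises the pitch ... */
    p.speed = speed / pitch; /* ... so slow down by the same factor */
    return p;                /* p.rate * p.speed == rate * speed */
}
```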

After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time, without
having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers.  To force sonic to process those samples use:

    sonicFlushStream(stream);

Then, read those samples as above.  That's about all there is to using libsonic.
There are some more functions as a convenience for the user, like
sonicGetSpeed.  Other sound data formats are supported: unsigned char and float.
If float, the sound data should be between -1.0 and 1.0.  Internally, all sound
data is converted to 16-bit integers for processing.
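As a rough picture of what converting to 16-bit integers involves, here is a minimal sketch for the float and 8-bit unsigned formats.  The function names and the scale factors (32767 for float, a shift by 128 and a multiply by 256 for 8-bit unsigned) are assumptions for illustration, not a statement of what sonic does internally.

```c
#include <stdint.h>

/* Illustrative only: map a float sample in [-1.0, 1.0] to a 16-bit value.
 * Out-of-range input is clamped first. */
int16_t floatToShort(float sample)
{
    if (sample > 1.0f) {
        sample = 1.0f;
    } else if (sample < -1.0f) {
        sample = -1.0f;
    }
    return (int16_t)(sample * 32767.0f);
}

/* Illustrative only: map an 8-bit unsigned sample (128 is silence) to a
 * 16-bit signed value by re-centering and scaling. */
int16_t unsignedCharToShort(uint8_t sample)
{
    return (int16_t)(((int)sample - 128) * 256);
}
```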