# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demos capabilities. You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast! Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech. While similar to
other algorithms that came before, Sonic is optimized for speed ups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java. Both are
designed to be easily integrated into streaming voice applications, like TTS
back ends. While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted. For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved. It is released
under the Apache 2.0 license. Feel free to contact me at
<waywardgeek (a] gmail.com>. One user was concerned about patents.
I believe the sonic algorithms do not violate any patents, as most of them are
very old, based on
[PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch. SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music. No WSOLA-based program performs well
for speech (contrary to the inventor's estimate of WSOLA). Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav). Both are sped up by 2X. WSOLA
introduces unacceptable levels of distortion, making speech impossible for
blind speed listeners to understand at high speed (over 2.5X).

However, there are decent free software algorithms for speeding up speech. They
are all in the TD-PSOLA family. For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available. A slightly buggy
implementation of PICOLA is available in the spandsp library. I find the one in
RockBox quite good, though it's limited to 2X speed up. So far as I know, only
sonic is optimized for the speed factors needed by the blind, up to 6X.

Sonic does all of its CPU-intensive work with integer math, and works well on
ARM CPUs without FPUs. It supports multiple channels (stereo), and is also able
to change the pitch of a voice. It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats. The source code is in plain ANSI C. In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.
It will take a while
for it to filter out into all the other distros. For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line
application. It is meant to be useful as example code. Feel free to copy
directly from main.c into your application, as main.c is in the public domain.
Dependencies listed in debian/control like libsndfile are there to compile the
sonic command-line application. Libsonic has no external dependencies.

There are basically two ways to use sonic: batch or stream mode. The simplest
is batch mode where you pass an entire sound sample to sonic. All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers. Stereo mode is supported, as
is any arbitrary number of channels. Samples for each channel should be
adjacent in the input array. Because the samples are modified in-place, be sure
that there is room in the samples array for the speed-changed samples. In
general, if you are speeding up, rather than slowing down, it will be safe to
have no extra padding. If your sound samples are mono, and you don't want to
scale volume or playback rate, and if you want normal pitch scaling, then call
it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

The other way to use libsonic is in stream mode. This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency.
The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz. In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.

To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic. Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time. You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which
means no change at all to the sound stream. Sonic detects this case, and simply
copies the input to the output to reduce CPU load. To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers. A speed of 2.0 means
doubling the speed of speech. A pitch of 0.95 means lowering the pitch by about
5%, and a volume of 1.4 means multiplying the sound samples by 1.4, clipping if
we exceed the maximum range of a 16-bit integer. Speech rate scales how fast
speech is played. A 2.0 value will make you sound like a chipmunk talking very
fast. A 0.7 value will make you sound like a giant talking slowly.

By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal. This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too.
If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal cord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.

After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time,
without having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers. To force sonic to process those samples use:

    sonicFlushStream(stream);

Then, read those samples as above. That's about all there is to using libsonic.
There are some more functions as a convenience for the user, like
sonicGetSpeed. Other sound data formats are supported: 8-bit unsigned char and
float. If float, the sound data should be between -1.0 and 1.0. Internally, all
sound data is converted to 16-bit integers for processing.
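
To make the sample formats and the volume clipping described above concrete,
here is a minimal sketch in plain C. It is not taken from sonic.c: the helper
names `floatToShort` and `scaleVolume` are hypothetical, and the scale factor
of 32767 is an assumption about how a float in the range -1.0 to 1.0 might be
mapped onto a 16-bit integer. Treat it as an illustration of the idea, not as
what libsonic does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libsonic: convert one float sample,
 * nominally between -1.0 and 1.0, to a 16-bit signed sample. Values outside
 * that range are clipped instead of wrapping around. */
int16_t floatToShort(float sample)
{
    float scaled = sample * 32767.0f; /* assumed scale factor */

    if (scaled > 32767.0f) {
        return INT16_MAX;
    }
    if (scaled < -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)scaled; /* the cast truncates toward zero */
}

/* Hypothetical helper, not part of libsonic: multiply 16-bit samples by a
 * volume factor in place, clipping to the 16-bit range on overflow. */
void scaleVolume(int16_t *samples, size_t numSamples, float volume)
{
    size_t i;

    assert(samples != NULL || numSamples == 0);
    for (i = 0; i < numSamples; i++) {
        float scaled = (float)samples[i] * volume;

        if (scaled > 32767.0f) {
            samples[i] = INT16_MAX;
        } else if (scaled < -32768.0f) {
            samples[i] = INT16_MIN;
        } else {
            samples[i] = (int16_t)scaled;
        }
    }
}
```

With these helpers, a float sample of 2.0 clips to 32767, and doubling the
volume of a 16-bit sample of 30000 also clips to 32767 rather than wrapping to
a negative value. Clipping is audible as distortion, so volume values well
above 1.0 are best used only on quiet input.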