page.title=Using Text-to-Speech
@jd:body

<p>Starting with Android 1.6 (API Level 4), the Android platform includes a new
Text-to-Speech (TTS) capability. Also known as "speech synthesis", TTS enables
your Android device to "speak" text in different languages.</p>

<p>Before we explain how to use the TTS API itself, let's first review a few
aspects of the engine that will be important to your TTS-enabled application. We
will then show how to make your Android application talk and how to configure
the way it speaks.</p>

<h3>Languages and resources</h3>

<p>The TTS engine that ships with the Android platform supports a number of
languages: English, French, German, Italian and Spanish. Also, depending on
which side of the Atlantic you are on, American and British accents for English
are both supported.</p>

<p>The TTS engine needs to know which language to speak, as a word like "Paris",
for example, is pronounced differently in French and English. So the voice and
dictionary are language-specific resources that need to be loaded before the
engine can start to speak.</p>

<p>Although all Android-powered devices that support the TTS functionality ship
with the engine, some devices have limited storage and may lack the
language-specific resource files. The TTS API enables an application to query
the platform for the availability of the language files and, if the user wants
to install them, to initiate their download and installation. So upon
creating your activity, a good first step is to check for the presence of the
TTS resources with the corresponding intent:</p>

<pre>Intent checkIntent = new Intent();
checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(checkIntent, MY_DATA_CHECK_CODE);</pre>

<p>A successful check is marked by a <code>CHECK_VOICE_DATA_PASS</code>
result code, indicating that this device is ready to speak once we have created
our {@link android.speech.tts.TextToSpeech} object. If the check fails, we need
to let the user know to install the data that's required for the device to
become a multi-lingual talking machine! Downloading and installing the data is
accomplished by firing off the <code>ACTION_INSTALL_TTS_DATA</code> intent,
which takes the user to Android Market and lets them initiate the download.
Installation of the data happens automatically once the download completes.
Here is an example of what your implementation of
<code>onActivityResult()</code> would look like:</p>

<pre>private TextToSpeech mTts;
protected void onActivityResult(
        int requestCode, int resultCode, Intent data) {
    if (requestCode == MY_DATA_CHECK_CODE) {
        if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
            // success, create the TTS instance
            mTts = new TextToSpeech(this, this);
        } else {
            // missing data, install it
            Intent installIntent = new Intent();
            installIntent.setAction(
                TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installIntent);
        }
    }
}</pre>

<p>In the constructor of the <code>TextToSpeech</code> instance we pass a
reference to the <code>Context</code> to be used (here the current Activity),
and to an <code>OnInitListener</code> (here our Activity as well). This listener
enables our application to be notified when the Text-To-Speech engine is fully
loaded, so we can start configuring it and using it.</p>

<h4>Languages and Locale</h4>

<p>At Google I/O 2009, we showed an <a title="Google I/O 2009, TTS
demonstration" href="http://www.youtube.com/watch?v=uX9nt8Cpdqg#t=6m17s"
id="rnfd">example of TTS</a> where it was used to speak the result of a
translation from and to one of the five languages the Android TTS engine
currently supports. Loading a language is as simple as calling, for instance:</p>

<pre>mTts.setLanguage(Locale.US);</pre>

<p>to load and set the language to
English, as spoken in the country "US". A locale is the preferred way to specify
a language because it accounts for the fact that the same language can vary from
one country to another. To query whether a specific Locale is supported, you can
use <code>isLanguageAvailable()</code>, which returns the level of support for
the given Locale. For instance the calls:</p>

<pre>mTts.isLanguageAvailable(Locale.UK);
mTts.isLanguageAvailable(Locale.FRANCE);
mTts.isLanguageAvailable(new Locale("spa", "ESP"));</pre>

<p>will return <code>TextToSpeech.LANG_COUNTRY_AVAILABLE</code> to indicate that
both the language and the country described by the Locale parameter are
supported (and the data is correctly installed). But the calls:</p>

<pre>mTts.isLanguageAvailable(Locale.CANADA_FRENCH);
mTts.isLanguageAvailable(new Locale("spa"));</pre>

<p>will return <code>TextToSpeech.LANG_AVAILABLE</code>. In the first example,
French is supported, but not the given country. And in the second, only the
language was specified for the Locale, so that's what the match was made on.</p>

<p>Also note that besides the <code>ACTION_CHECK_TTS_DATA</code> intent to check
the availability of the TTS data, you can also use
<code>isLanguageAvailable()</code> once you have created your
<code>TextToSpeech</code> instance, which will return
<code>TextToSpeech.LANG_MISSING_DATA</code> if the required resources are not
installed for the queried language.</p>

<p>Making the engine speak an Italian string while the engine is set to the
French language will produce some pretty <i>interesting</i> results, but it will
not exactly be something your user would understand. So try to match the
language of your application's content and the language that you loaded in your
<code>TextToSpeech</code> instance. Also, if you are using
<code>Locale.getDefault()</code> to query the current Locale, make sure that at
least the default language is supported.</p>
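<p>The support levels and the default-locale advice above can be sketched in
plain Java, without the Android classes. This is only a rough model, not the
engine's implementation: the voice list, the class and method names, and the
fallback policy are assumptions made for illustration, and the constants are
local stand-ins named after the <code>TextToSpeech</code> ones.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleSupportSketch {
    // Local stand-ins named after the TextToSpeech constants.
    static final int LANG_COUNTRY_AVAILABLE = 1;
    static final int LANG_AVAILABLE = 0;
    static final int LANG_NOT_SUPPORTED = -2;

    // Hypothetical voice list modeled on the languages named in this article.
    static final List<Locale> VOICES = Arrays.asList(
            Locale.US, Locale.UK, Locale.FRANCE, Locale.GERMANY, Locale.ITALY,
            Locale.forLanguageTag("es-ES"));

    // Returns how closely the requested locale matches one of the voices.
    static int levelOfSupport(Locale requested) {
        boolean languageMatches = false;
        for (Locale voice : VOICES) {
            if (!voice.getLanguage().equals(requested.getLanguage())) {
                continue;
            }
            languageMatches = true;
            if (!requested.getCountry().isEmpty()
                    && voice.getCountry().equals(requested.getCountry())) {
                return LANG_COUNTRY_AVAILABLE;
            }
        }
        return languageMatches ? LANG_AVAILABLE : LANG_NOT_SUPPORTED;
    }

    // Keeps the preferred locale if at least its language is supported,
    // otherwise returns the caller's fallback.
    static Locale pickLocale(Locale preferred, Locale fallback) {
        return levelOfSupport(preferred) >= LANG_AVAILABLE ? preferred : fallback;
    }
}
```

<p>In this model, <code>levelOfSupport(Locale.CANADA_FRENCH)</code> yields the
language-only level, and <code>pickLocale(Locale.getDefault(), Locale.US)</code>
falls back to US English when the default language has no voice. In a real
application the check would go through <code>isLanguageAvailable()</code>
instead.</p>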

<h3>Making your application speak</h3>

<p>Now that our <code>TextToSpeech</code> instance is properly initialized and
configured, we can start to make your application speak. The simplest way to do
so is to use the <code>speak()</code> method. Let's iterate on the following
example to make a talking alarm clock:</p>

<pre>String myText1 = "Did you sleep well?";
String myText2 = "I hope so, because it's time to wake up.";
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, null);
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, null);</pre>

<p>The TTS engine manages a global queue of all the entries to synthesize, which
are also known as "utterances". Each <code>TextToSpeech</code> instance can
manage its own queue in order to control which utterance will interrupt the
current one and which one is simply queued. Here the first <code>speak()</code>
request would interrupt whatever was currently being synthesized: the queue is
flushed and the new utterance is queued, which places it at the head of the
queue. The second utterance is queued and will be played after
<code>myText1</code> has completed.</p>
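<p>The two queue modes can be pictured with a small plain-Java model. It is a
simplification for illustration only: the class name is hypothetical, the
constants are local stand-ins named after <code>QUEUE_FLUSH</code> and
<code>QUEUE_ADD</code>, and nothing is actually synthesized.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class UtteranceQueueSketch {
    // Local stand-ins named after TextToSpeech.QUEUE_FLUSH and QUEUE_ADD.
    static final int QUEUE_FLUSH = 0;
    static final int QUEUE_ADD = 1;

    private final Deque<String> pending = new ArrayDeque<>();

    // Models speak(): a flush drops every pending entry, then the new
    // utterance is appended, so after a flush it sits at the head.
    void speak(String text, int queueMode) {
        if (queueMode == QUEUE_FLUSH) {
            pending.clear();
        }
        pending.addLast(text);
    }

    // Returns the utterances in the order they would be played.
    List<String> snapshot() {
        return new ArrayList<>(pending);
    }
}
```

<p>Queuing an older utterance, then <code>myText1</code> with the flush mode and
<code>myText2</code> with the add mode, leaves only the last two entries in the
model, in that order.</p>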

<h4>Using optional parameters to change the playback stream type</h4>

<p>On Android, each audio stream that is played is associated with one stream
type, as defined in
{@link android.media.AudioManager android.media.AudioManager}. For a talking
alarm clock, we would like our text to be played on the
<code>AudioManager.STREAM_ALARM</code> stream type so that it respects the alarm
settings the user has chosen on the device. The last parameter of the
<code>speak()</code> method allows you to pass optional parameters to the TTS
engine, specified as key/value pairs in a <code>HashMap</code>. Let's use that
mechanism to change the stream type of our utterances:</p>

<pre>HashMap&lt;String, String&gt; myHashAlarm = new HashMap&lt;String, String&gt;();
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_STREAM,
        String.valueOf(AudioManager.STREAM_ALARM));
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, myHashAlarm);
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>

<h4>Using optional parameters for playback completion callbacks</h4>

<p>Note that <code>speak()</code> calls are asynchronous, so they will return
well before the text is done being synthesized and played by Android, regardless
of the use of <code>QUEUE_FLUSH</code> or <code>QUEUE_ADD</code>. But you might
need to know when a particular utterance is done playing. For instance you might
want to start playing some annoying music after <code>myText2</code> has
finished playing (remember, we're trying to wake up the user). We will again use
an optional parameter, this time to tag our utterance as one we want to
identify. We also need to make sure our activity implements the
<code>TextToSpeech.OnUtteranceCompletedListener</code> interface:</p>

<pre>mTts.setOnUtteranceCompletedListener(this);
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_STREAM,
        String.valueOf(AudioManager.STREAM_ALARM));
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, myHashAlarm);
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID,
        "end of wakeup message ID");
// myHashAlarm now contains two optional parameters
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>
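<p>The snippet above reuses one map for both utterances. As an alternative, a
small helper can build a fresh map for each call so that every utterance carries
exactly the parameters intended for it. The sketch below is plain Java; the
helper name is hypothetical, and the key strings and stream value are local
stand-ins whose literal values are assumptions, so an application should use the
platform constants instead.</p>

```java
import java.util.HashMap;

final class SpeakParamsSketch {
    // Stand-ins for TextToSpeech.Engine.KEY_PARAM_STREAM,
    // TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID and AudioManager.STREAM_ALARM.
    // The literal values are assumptions made for this sketch.
    static final String KEY_PARAM_STREAM = "streamType";
    static final String KEY_PARAM_UTTERANCE_ID = "utteranceId";
    static final int STREAM_ALARM = 4;

    // Builds a new parameter map; pass null when no completion
    // callback is wanted for the utterance.
    static HashMap<String, String> paramsFor(int streamType, String utteranceId) {
        HashMap<String, String> params = new HashMap<>();
        // Parameter values are passed as strings, hence String.valueOf().
        params.put(KEY_PARAM_STREAM, String.valueOf(streamType));
        if (utteranceId != null) {
            params.put(KEY_PARAM_UTTERANCE_ID, utteranceId);
        }
        return params;
    }
}
```

<p>The return type is <code>HashMap</code> rather than <code>Map</code> only
because the <code>speak()</code> calls shown in this article take a
<code>HashMap</code>.</p>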

<p>And the Activity gets notified of the completion in the implementation
of the listener:</p>

<pre>public void onUtteranceCompleted(String uttId) {
    // compare string contents with equals(); == would compare references
    if ("end of wakeup message ID".equals(uttId)) {
        playAnnoyingMusic();
    }
}</pre>
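<p>When several utterances need their own follow-up action, the listener can
look the identifier up in a map instead of chaining comparisons. The following
plain-Java sketch shows one way to do that; the class and method names are
hypothetical, and <code>dispatch()</code> is meant to be called from
<code>onUtteranceCompleted()</code>. A map lookup matches on the string's
contents, so it does not matter whether the engine hands back the same
<code>String</code> object that was passed in.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UtteranceCallbacksSketch {
    // Utterance ID -> action to run once that utterance has completed.
    private final Map<String, Runnable> actions = new ConcurrentHashMap<>();

    void register(String utteranceId, Runnable action) {
        actions.put(utteranceId, action);
    }

    // Runs and forgets the action registered for this ID.
    // Returns false when nothing was registered for it.
    boolean dispatch(String utteranceId) {
        Runnable action = actions.remove(utteranceId);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }
}
```

<p>Each action here is one-shot: it is removed when it runs, so an identifier
that is reused for a later utterance has to be registered again.</p>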

<h4>File rendering and playback</h4>

<p>While the <code>speak()</code> method is used to make Android speak the text
right away, there are cases where you would want the result of the synthesis to
be recorded in an audio file instead. This would be the case if, for instance,
there is text your application will speak often; you could avoid the synthesis
CPU-overhead by rendering only once to a file, and then playing back that audio
file whenever needed. Just like for <code>speak()</code>, you can use an
optional utterance identifier to be notified on the completion of the synthesis
to the file:</p>

<pre>HashMap&lt;String, String&gt; myHashRender = new HashMap&lt;String, String&gt;();
String wakeUpText = "Are you up yet?";
String destFileName = "/sdcard/myAppCache/wakeUp.wav";
myHashRender.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, wakeUpText);
mTts.synthesizeToFile(wakeUpText, myHashRender, destFileName);</pre>

<p>Once you are notified of the synthesis completion, you can play the output
file just like any other audio resource with
{@link android.media.MediaPlayer android.media.MediaPlayer}.</p>
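<p>The "render once, play many times" idea above implies some bookkeeping: where
the file for a given text lives, and whether it still has to be rendered. A
minimal plain-Java sketch of that bookkeeping follows. The class, the method
names, and the naming scheme are assumptions made for illustration; in
particular, a name derived from <code>hashCode()</code> can collide and ignores
the selected language, so a real cache would need a more careful key.</p>

```java
import java.io.File;

final class SpeechFileCacheSketch {
    // Derives a stable file name for a text so that repeated requests
    // for the same string map to the same cached audio file.
    static File cacheFileFor(File cacheDir, String text) {
        String name = "tts_" + Integer.toHexString(text.hashCode()) + ".wav";
        return new File(cacheDir, name);
    }

    // True when there is no usable file yet and the text still has
    // to be rendered with synthesizeToFile().
    static boolean needsRendering(File audioFile) {
        return !audioFile.isFile() || audioFile.length() == 0;
    }
}
```

<p>Because the rendering is asynchronous, a file that exists may still be
incomplete; it seems safest to treat it as ready only after the completion
notification for its utterance has arrived.</p>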

<p>But the <code>TextToSpeech</code> class offers other ways of associating
audio resources with speech. At this point we have a WAV file that contains
the result of the synthesis of "Are you up yet?" in the previously selected
language. We can tell our TTS instance to associate the contents of the string
"Are you up yet?" with an audio resource, which can be accessed through its
path, or through the package it's in and its resource ID, using one of the two
<code>addSpeech()</code> methods:</p>

<pre>mTts.addSpeech(wakeUpText, destFileName);</pre>

<p>This way any call to <code>speak()</code> for the same string content as
<code>wakeUpText</code> will result in the playback of
<code>destFileName</code>. If the file is missing, then <code>speak()</code>
will behave as if no audio file had been associated with the string, and will
synthesize and play the given string. But you can also take advantage of that
feature to provide an option to the user to customize how "Are you up yet?"
sounds, by recording their own version if they choose to. Regardless of where
that audio file comes from, you can still use the same line in your Activity
code to ask repeatedly "Are you up yet?":</p>

<pre>mTts.speak(wakeUpText, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>
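<p>The fallback behavior described above can be modeled in a few lines of plain
Java. This is a rough model of the described behavior, not the engine's code:
the class and its methods are hypothetical, and the only decision it makes is
whether a recording should be played or the text synthesized.</p>

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

final class SpeechOverridesSketch {
    // Text -> recording that should be played instead of synthesized speech.
    private final Map<String, File> recordings = new HashMap<>();

    // Models addSpeech(text, filename).
    void addSpeech(String text, File recording) {
        recordings.put(text, recording);
    }

    // Returns the recording to play for this text, or null when the
    // text has no recording (or the file is missing) and should be
    // synthesized instead.
    File recordingFor(String text) {
        File recording = recordings.get(text);
        return (recording != null && recording.isFile()) ? recording : null;
    }
}
```

<p>Swapping in a recording made by the user then only changes which file is
registered for the string; the calling code stays the same.</p>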

<h4>When not in use...</h4>

<p>The text-to-speech functionality relies on a
dedicated service shared across all applications that use that feature. When you
are done using TTS, be a good citizen and tell it that you won't be needing its
services anymore by calling <code>mTts.shutdown()</code>, in your Activity's
<code>onDestroy()</code> method for instance.</p>

<h3>Conclusion</h3>

<p>Android now talks, and so can your apps. Remember that in order for
synthesized speech to be intelligible, you need to match the language you select
to that of the text to synthesize. Text-to-speech can help you push your app in
new directions. Whether you use TTS to help users with disabilities, to enable
the use of your application while looking away from the screen, or simply to
make it cool, we hope you'll enjoy this new feature.</p>