page.title=Using Text-to-Speech
parent.title=Articles
parent.link=../browser.html?tag=article
@jd:body

<p>Starting with Android 1.6 (API Level 4), the Android platform includes a new
Text-to-Speech (TTS) capability. Also known as "speech synthesis", TTS enables
your Android device to "speak" text in different languages.</p>

<p>Before we explain how to use the TTS API itself, let's first review a few
aspects of the engine that will be important to your TTS-enabled application. We
will then show how to make your Android application talk and how to configure
the way it speaks.</p>

<h3>Languages and resources</h3>

<p>The TTS engine that ships with the Android platform supports a number of
languages: English, French, German, Italian and Spanish. Also, depending on
which side of the Atlantic you are on, American and British accents for English
are both supported.</p>

<p>The TTS engine needs to know which language to speak, as a word like "Paris",
for example, is pronounced differently in French and English. So the voice and
dictionary are language-specific resources that need to be loaded before the
engine can start to speak.</p>

<p>Although all Android-powered devices that support the TTS functionality ship
with the engine, some devices have limited storage and may lack the
language-specific resource files. If a user wants to install those resources,
the TTS API enables an application to query the platform for the availability of
language files and to initiate their download and installation. So upon
creating your activity, a good first step is to check for the presence of the
TTS resources with the corresponding intent:</p>

<pre>Intent checkIntent = new Intent();
checkIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(checkIntent, MY_DATA_CHECK_CODE);</pre>

<p>A successful check will be marked by a <code>CHECK_VOICE_DATA_PASS</code>
result code, indicating that this device is ready to speak once we create our
{@link android.speech.tts.TextToSpeech} object. If not, we need to let the user
know to install the data that's required for the device to become a
multi-lingual talking machine! Downloading and installing the data is
accomplished by firing off the <code>ACTION_INSTALL_TTS_DATA</code> intent,
which will take the user to Android Market and let them initiate the download.
Installation of the data will happen automatically once the download completes.
Here is an example of what your implementation of
<code>onActivityResult()</code> would look like:</p>

<pre>private TextToSpeech mTts;
protected void onActivityResult(
        int requestCode, int resultCode, Intent data) {
    if (requestCode == MY_DATA_CHECK_CODE) {
        if (resultCode == TextToSpeech.Engine.CHECK_VOICE_DATA_PASS) {
            // success, create the TTS instance
            mTts = new TextToSpeech(this, this);
        } else {
            // missing data, install it
            Intent installIntent = new Intent();
            installIntent.setAction(
                TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
            startActivity(installIntent);
        }
    }
}</pre>

<p>In the constructor of the <code>TextToSpeech</code> instance we pass a
reference to the <code>Context</code> to be used (here the current Activity),
and to an <code>OnInitListener</code> (here our Activity as well). This listener
enables our application to be notified when the Text-to-Speech engine is fully
loaded, so we can start configuring it and using it.</p>
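
<p>As a minimal sketch, assuming the Activity itself implements
<code>OnInitListener</code>, the <code>onInit()</code> callback is where we
know the engine is ready to be configured:</p>

<pre>// Sketch: called once the engine has finished initializing.
public void onInit(int status) {
    if (status == TextToSpeech.SUCCESS) {
        // The engine is ready; it is now safe to configure it,
        // for instance by setting the language (see below).
    }
}</pre>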

<h4>Languages and Locale</h4>

<p>At Google I/O 2009, we showed an <a title="Google I/O 2009, TTS
demonstration" href="http://www.youtube.com/watch?v=uX9nt8Cpdqg#t=6m17s"
id="rnfd">example of TTS</a> where it was used to speak the result of a
translation from and to one of the 5 languages the Android TTS engine currently
supports. Loading a language is as simple as calling, for instance:</p>

<pre>mTts.setLanguage(Locale.US);</pre><p>to load and set the language to
English, as spoken in the country "US". A locale is the preferred way to specify
a language because it accounts for the fact that the same language can vary from
one country to another. To query whether a specific Locale is supported, you can
use <code>isLanguageAvailable()</code>, which returns the level of support for
the given Locale. For instance the calls:</p>

<pre>mTts.isLanguageAvailable(Locale.UK)
mTts.isLanguageAvailable(Locale.FRANCE)
mTts.isLanguageAvailable(new Locale("spa", "ESP"))</pre>

<p>will return <code>TextToSpeech.LANG_COUNTRY_AVAILABLE</code> to indicate that
the language AND country as described by the Locale parameter are supported (and
the data is correctly installed). But the calls:</p>

<pre>mTts.isLanguageAvailable(Locale.CANADA_FRENCH)
mTts.isLanguageAvailable(new Locale("spa"))</pre>

<p>will return <code>TextToSpeech.LANG_AVAILABLE</code>. In the first example,
French is supported, but not the given country. And in the second, only the
language was specified for the Locale, so that's what the match was made on.</p>

<p>Also note that besides the <code>ACTION_CHECK_TTS_DATA</code> intent to check
the availability of the TTS data, you can also use
<code>isLanguageAvailable()</code> once you have created your
<code>TextToSpeech</code> instance, which will return
<code>TextToSpeech.LANG_MISSING_DATA</code> if the required resources are not
installed for the queried language.</p>
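
<p>As a short sketch of that post-creation check (the fallback here simply
reuses the install intent shown earlier):</p>

<pre>// Sketch: check for missing resources after creating the engine.
if (mTts.isLanguageAvailable(Locale.FRANCE)
        == TextToSpeech.LANG_MISSING_DATA) {
    // Resources are missing; offer to install them.
    Intent installIntent = new Intent();
    installIntent.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
    startActivity(installIntent);
}</pre>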

<p>Making the engine speak an Italian string while the engine is set to the
French language will produce some pretty <i>interesting</i> results, but it will
not exactly be something your user would understand. So try to match the
language of your application's content and the language that you loaded in your
<code>TextToSpeech</code> instance. Also, if you are using
<code>Locale.getDefault()</code> to query the current Locale, make sure that at
least the default language is supported.</p>
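
<p>A minimal sketch of that defensive check, with English assumed here as the
fallback language:</p>

<pre>// Sketch: only load the device's default Locale if it is supported.
Locale defaultLocale = Locale.getDefault();
if (mTts.isLanguageAvailable(defaultLocale) >= TextToSpeech.LANG_AVAILABLE) {
    mTts.setLanguage(defaultLocale);
} else {
    mTts.setLanguage(Locale.US); // assumed fallback
}</pre>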

<h3>Making your application speak</h3>

<p>Now that our <code>TextToSpeech</code> instance is properly initialized and
configured, we can start making the application speak. The simplest way to do
so is to use the <code>speak()</code> method. Let's iterate on the following
example to make a talking alarm clock:</p>

<pre>String myText1 = "Did you sleep well?";
String myText2 = "I hope so, because it's time to wake up.";
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, null);
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, null);</pre>

<p>The TTS engine manages a global queue of all the entries to synthesize, which
are also known as "utterances". Each <code>TextToSpeech</code> instance can
manage its own queue in order to control which utterance will interrupt the
current one and which one is simply queued. Here the first <code>speak()</code>
request would interrupt whatever was currently being synthesized: the queue is
flushed and the new utterance is queued, which places it at the head of the
queue. The second utterance is queued and will be played after
<code>myText1</code> has completed.</p>
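
<p>If you simply want to interrupt playback without queueing a new utterance,
there is a one-line alternative:</p>

<pre>// Stops the current utterance and discards the rest of the queue.
mTts.stop();</pre>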

<h4>Using optional parameters to change the playback stream type</h4>

<p>On Android, each audio stream that is played is associated with one stream
type, as defined in
{@link android.media.AudioManager android.media.AudioManager}. For a talking
alarm clock, we would like our text to be played on the
<code>AudioManager.STREAM_ALARM</code> stream type so that it respects the alarm
settings the user has chosen on the device. The last parameter of the
<code>speak()</code> method allows you to pass the TTS engine optional
parameters, specified as key/value pairs in a HashMap. Let's use that mechanism
to change the stream type of our utterances:</p>

<pre>HashMap&lt;String, String&gt; myHashAlarm = new HashMap&lt;String, String&gt;();
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_STREAM,
        String.valueOf(AudioManager.STREAM_ALARM));
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, myHashAlarm);
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>

<h4>Using optional parameters for playback completion callbacks</h4>

<p>Note that <code>speak()</code> calls are asynchronous, so they will return
well before the text is done being synthesized and played by Android, regardless
of the use of <code>QUEUE_FLUSH</code> or <code>QUEUE_ADD</code>. But you might
need to know when a particular utterance is done playing. For instance you might
want to start playing some annoying music after <code>myText2</code> has
finished synthesizing (remember, we're trying to wake up the user). We will
again use an optional parameter, this time to tag our utterance as one we want
to identify. We also need to make sure our activity implements the
<code>TextToSpeech.OnUtteranceCompletedListener</code> interface:</p>

<pre>mTts.setOnUtteranceCompletedListener(this);
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_STREAM,
        String.valueOf(AudioManager.STREAM_ALARM));
mTts.speak(myText1, TextToSpeech.QUEUE_FLUSH, myHashAlarm);
myHashAlarm.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID,
        "end of wakeup message ID");
// myHashAlarm now contains two optional parameters
mTts.speak(myText2, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>

<p>And the Activity gets notified of the completion in the implementation
of the listener (note that strings must be compared with
<code>equals()</code>, not <code>==</code>):</p>

<pre>public void onUtteranceCompleted(String uttId) {
    if ("end of wakeup message ID".equals(uttId)) {
        playAnnoyingMusic();
    }
}</pre>
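
<p>One caveat, stated here as an assumption to verify for your target platform
version: this callback is not guaranteed to run on the main thread, so a
defensive sketch posts any UI work back to it:</p>

<pre>// Sketch: inside onUtteranceCompleted(), hop back to the UI thread.
runOnUiThread(new Runnable() {
    public void run() {
        playAnnoyingMusic();
    }
});</pre>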

<h4>File rendering and playback</h4>

<p>While the <code>speak()</code> method is used to make Android speak the text
right away, there are cases where you would want the result of the synthesis to
be recorded in an audio file instead. This would be the case if, for instance,
there is text your application will speak often; you could avoid the synthesis
CPU overhead by rendering only once to a file, and then playing back that audio
file whenever needed. Just like for <code>speak()</code>, you can use an
optional utterance identifier to be notified on the completion of the synthesis
to the file:</p>

<pre>HashMap&lt;String, String&gt; myHashRender = new HashMap&lt;String, String&gt;();
String wakeUpText = "Are you up yet?";
String destFileName = "/sdcard/myAppCache/wakeUp.wav";
myHashRender.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, wakeUpText);
mTts.synthesizeToFile(wakeUpText, myHashRender, destFileName);</pre>

<p>Once you are notified of the synthesis completion, you can play the output
file just like any other audio resource with
{@link android.media.MediaPlayer android.media.MediaPlayer}.</p>
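
<p>As a brief sketch of that playback step, with the same
<code>destFileName</code> as above and only minimal error handling:</p>

<pre>// Sketch: play the rendered WAV file with MediaPlayer.
MediaPlayer player = new MediaPlayer();
try {
    player.setDataSource(destFileName);
    player.prepare();
    player.start();
} catch (IOException e) {
    // Fall back to synthesizing the text directly.
    mTts.speak(wakeUpText, TextToSpeech.QUEUE_ADD, null);
}</pre>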

<p>But the <code>TextToSpeech</code> class offers other ways of associating
audio resources with speech. So at this point we have a WAV file that contains
the result of the synthesis of "Are you up yet?" in the previously selected
language. We can tell our TTS instance to associate the contents of the string
"Are you up yet?" with an audio resource, which can be accessed through its
path, or through the package it's in and its resource ID, using one of the two
<code>addSpeech()</code> methods:</p>

<pre>mTts.addSpeech(wakeUpText, destFileName);</pre>

<p>This way any call to <code>speak()</code> for the same string content as
<code>wakeUpText</code> will result in the playback of
<code>destFileName</code>. If the file is missing, then speak will behave as if
the audio file wasn't there, and will synthesize and play the given string. But
you can also take advantage of that feature to provide an option to the user to
customize how "Are you up yet?" sounds, by recording their own version if they
choose to. Regardless of where that audio file comes from, you can still use the
same line in your Activity code to ask repeatedly "Are you up yet?":</p>

<pre>mTts.speak(wakeUpText, TextToSpeech.QUEUE_ADD, myHashAlarm);</pre>

<h4>When not in use...</h4>

<p>The text-to-speech functionality relies on a dedicated service shared across
all applications that use that feature. When you are done using TTS, be a good
citizen and tell it "you won't be needing its services anymore" by calling
<code>mTts.shutdown()</code>, in your Activity's <code>onDestroy()</code>
method for instance.</p>
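
<p>A minimal sketch of that cleanup:</p>

<pre>@Override
protected void onDestroy() {
    // Release the connection to the shared TTS service.
    if (mTts != null) {
        mTts.shutdown();
    }
    super.onDestroy();
}</pre>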

<h3>Conclusion</h3>

<p>Android now talks, and so can your apps. Remember that in order for
synthesized speech to be intelligible, you need to match the language you select
to that of the text to synthesize. Text-to-speech can help you push your app in
new directions. Whether you use TTS to help users with disabilities, to enable
the use of your application while looking away from the screen, or simply to
make it cool, we hope you'll enjoy this new feature.</p>