Text-To-Speech (TTS)


Aculab Cloud supports Cepstral and Ivona Text To Speech (TTS).

Choosing the voice to use

In the APIs, the TTS say() functions support the Speech Synthesis Markup Language (SSML), which lets you change how your text is spoken, for example by choosing which voice says it. SSML also lets you choose the TTS engine through the optional acu-engine tag which, if provided, must be the outermost tag in the string. If you don't provide this tag, Cepstral is used.

Examples:

  • <acu-engine name="Cepstral">This is Cepstral TTS speaking.</acu-engine>
  • <acu-engine name="Ivona">I'm the Ivona TTS.</acu-engine>
  • And this is the Cepstral TTS speaking, again!
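
The examples above can also be built programmatically. The following is a minimal Python sketch, assuming you assemble the string yourself before passing it to a say function; the helper name with_engine is hypothetical and is not part of the Aculab Cloud API.

```python
from typing import Optional

# Engine names taken from the examples above.
SUPPORTED_ENGINES = ("Cepstral", "Ivona")


def with_engine(text: str, engine: Optional[str] = None) -> str:
    """Wrap text in an outermost acu-engine tag (hypothetical helper).

    When engine is None the text is returned unchanged, so the
    default engine (Cepstral) is used.
    """
    if engine is None:
        return text
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported TTS engine: {engine!r}")
    return f'<acu-engine name="{engine}">{text}</acu-engine>'


print(with_engine("I'm the Ivona TTS.", "Ivona"))
print(with_engine("And this is the Cepstral TTS speaking, again!"))
```

Because the acu-engine tag must be outermost, a helper like this would be applied last, after any other SSML tags have been added to the text.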

Voice examples

To hear a sample of each of the supported voices, click here.

Ivona

Ivona's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Ivona demos.

Ivona TTS supports a subset of SSML, which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, see the W3C SSML 1.1 recommendation.

We support the following voices:

  • 'Kimberly' - US English female, default
  • 'Naja' - Danish female
  • 'Mads' - Danish male
  • 'Lotte' - Dutch female
  • 'Ruben' - Dutch male
  • 'Emma' - UK English female
  • 'Amy' - UK English female
  • 'Brian' - UK English male
  • 'Geraint' - Welsh English male
  • 'Gwyneth' - Welsh female
  • 'Nicole' - Australian English female
  • 'Russell' - Australian English male
  • 'Raveena' - Indian English female
  • 'Salli' - US English female
  • 'Ivy' - US English female
  • 'Kendra' - US English female
  • 'Joey' - US English male
  • 'Celine' - French female
  • 'Mathieu' - French male
  • 'Chantal' - Canadian French female
  • 'Marlene' - German female
  • 'Hans' - German male
  • 'Dora' - Icelandic female
  • 'Karl' - Icelandic male
  • 'Giorgio' - Italian male
  • 'Carla' - Italian female
  • 'Maja' - Polish female
  • 'Ewa' - Polish female
  • 'Jacek' - Polish male
  • 'Jan' - Polish male
  • 'Ricardo' - Brazilian Portuguese male
  • 'Vitoria' - Brazilian Portuguese female
  • 'Cristiano' - Portuguese male
  • 'Carmen' - Romanian female
  • 'Tatyana' - Russian female
  • 'Conchita' - Spanish female
  • 'Enrique' - Spanish male
  • 'Penelope' - US Spanish female
  • 'Miguel' - US Spanish male
  • 'Astrid' - Swedish female
  • 'Filiz' - Turkish female

Cepstral

Cepstral's website has a demo which allows you to select a voice and immediately hear how different text will sound - see Cepstral demos.

Cepstral TTS supports a subset of the Speech Synthesis Markup Language (SSML), which can optionally be embedded within the text you supply to the say function. For a summary of the SSML tags which may be used, see Common SSML tags below. For more detailed information, go to the Cepstral SSML FAQ and scroll down to the 'Common Usage Examples'. With reference to that page, please bear in mind the following:

We support the following voices:

  • 'Callie-8kHz' - US English female, default
  • 'Marta-8kHz' - American Spanish female
  • 'Vittoria' - Italian female

We don't support:

  • Inserting recorded audio files (our APIs' play functions already allow file replay)
  • Applying Cepstral special effects
  • Inserting bookmarks

Reserved characters

Some characters are reserved for use in SSML so, if the text you need to say contains any of these, replace them as shown:

  • Less than (<) -> &lt;
  • Greater than (>) -> &gt;
  • Ampersand (&) -> &amp;

For example, "Bill & Ben played in the garden" would become "Bill &amp; Ben played in the garden".
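
The replacements above can be applied automatically. This is a small Python sketch using the standard library function xml.sax.saxutils.escape, which replaces exactly these three characters; the wrapper name escape_for_tts is hypothetical. It should be applied only to the plain text you want spoken, not to a string that already contains SSML tags, or the tags themselves would be escaped.

```python
from xml.sax.saxutils import escape


def escape_for_tts(text: str) -> str:
    """Replace the SSML reserved characters &, < and > in plain text."""
    return escape(text)


print(escape_for_tts("Bill & Ben played in the garden"))
# prints: Bill &amp; Ben played in the garden
```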

Common SSML tags

Cepstral and Ivona both support a subset of SSML. Details of common tags can be found below. It is highly recommended that you test your application before deploying it with a different TTS engine.

Tag    Description
break

Inserts a break or pause in the speech.

Optional arguments are time and strength.

time sets an absolute value for the pause. For example, <break time="3s"/> and <break time="3ms"/> set the break time to three seconds and three milliseconds respectively.

strength sets the relative value of the pause. Valid values are none, x-weak, weak, medium, strong and x-strong.

Examples:

This is a <break /> sentence break.
This is a <break time="2s"/> two-second break.
This is a dramatic <break strength="x-strong"/> break.
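
As an illustration, break tags can be generated with a small helper that checks the strength value against the list above. This is a Python sketch only; break_tag is a hypothetical name, not an Aculab Cloud API function, and the time value is passed through without validation.

```python
from typing import Optional

# Strength values as listed in the description of the break tag.
BREAK_STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def break_tag(time: Optional[str] = None, strength: Optional[str] = None) -> str:
    """Build a self-closing break tag with optional time and strength."""
    attrs = ""
    if time is not None:
        attrs += f' time="{time}"'
    if strength is not None:
        if strength not in BREAK_STRENGTHS:
            raise ValueError(f"unknown break strength: {strength!r}")
        attrs += f' strength="{strength}"'
    return f"<break{attrs}/>"


print("This is a " + break_tag(time="2s") + " two-second break.")
print("This is a dramatic " + break_tag(strength="x-strong") + " break.")
```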
voice

Allows the user to specify the voice used. The name parameter is required. The supported voices for each TTS engine are listed above.

Example:


<acu-engine name='Ivona'><voice name='Amy'>I'm using Amy instead of the default voice.</voice></acu-engine>
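
A string like the one above can be assembled and checked in code. The Python sketch below is illustrative only: the function name say_with_voice is hypothetical, and the VOICES table holds just a few of the voices listed earlier for each engine, so it would need extending before real use.

```python
# A small subset of the supported voices, for illustration only.
VOICES = {
    "Ivona": ("Kimberly", "Amy", "Brian"),
    "Cepstral": ("Callie-8kHz", "Marta-8kHz", "Vittoria"),
}


def say_with_voice(text: str, engine: str, voice: str) -> str:
    """Wrap text in a voice tag, with the acu-engine tag outermost."""
    if engine not in VOICES:
        raise ValueError(f"unknown TTS engine: {engine!r}")
    if voice not in VOICES[engine]:
        raise ValueError(f"voice {voice!r} is not listed for {engine}")
    return (
        f'<acu-engine name="{engine}">'
        f'<voice name="{voice}">{text}</voice>'
        f"</acu-engine>"
    )


print(say_with_voice("I'm using Amy instead of the default voice.", "Ivona", "Amy"))
```

Checking the voice against the selected engine up front catches a mismatch, such as asking Cepstral for an Ivona voice, before the string reaches the TTS engine.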


prosody

Allows the user to change the pitch, speed and volume of a segment of speech.

Common optional parameters are: pitch, rate and volume.

pitch sets the pitch of speech. Options are: x-low, low, medium, high, x-high, a relative change measured in Hz (e.g. +50Hz), or a percentage change (e.g. +50%).

rate sets the rate of speech. Options are: x-slow, slow, medium, fast, x-fast, or a percentage change (e.g. +50%).

volume sets the volume of speech. Options are: silent, x-soft, soft, medium, loud, x-loud, or a percentage change (e.g. +50%).

Examples:

<prosody rate="x-fast">I'm using a very fast rate.</prosody>
This is normal volume. <prosody volume="soft">This is a soft volume.</prosody>
I can talk very <prosody rate="slow" pitch="low">deeply and slowly.</prosody>
Today's date is the <prosody rate="-50%">15th April, 2012.</prosody>
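
The prosody examples above could be produced by a helper along these lines. This is a Python sketch under the assumption that only the three common parameters are needed; the name prosody is hypothetical and attribute values are passed through unchecked.

```python
PROSODY_ATTRIBUTES = ("pitch", "rate", "volume")


def prosody(text: str, **attributes: str) -> str:
    """Wrap text in a prosody tag, keeping attributes in the order given."""
    for name in attributes:
        if name not in PROSODY_ATTRIBUTES:
            raise ValueError(f"unsupported prosody attribute: {name!r}")
    attr_text = "".join(f' {name}="{value}"' for name, value in attributes.items())
    return f"<prosody{attr_text}>{text}</prosody>"


print("I can talk very " + prosody("deeply and slowly.", rate="slow", pitch="low"))
print("Today's date is the " + prosody("15th April, 2012.", rate="-50%"))
```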
emphasis

Reads the enclosed text with emphasis.

Required parameter: level. Options are: reduced, moderate and strong.

Examples:

This is a <emphasis level="strong">level of emphasis</emphasis>, which can be used to highlight important information.
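
To tie the emphasis tag together with the reserved-character rule, here is a short Python sketch that escapes the text before wrapping it. The function name emphasis is hypothetical; xml.sax.saxutils.escape is from the Python standard library.

```python
from xml.sax.saxutils import escape

EMPHASIS_LEVELS = ("reduced", "moderate", "strong")


def emphasis(text: str, level: str) -> str:
    """Escape plain text, then wrap it in an emphasis tag."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


print("Please read the " + emphasis("terms & conditions", "strong") + " carefully.")
# prints: Please read the <emphasis level="strong">terms &amp; conditions</emphasis> carefully.
```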