Amazon Polly Documentation

Overview

Amazon Polly provides an API that enables you to integrate speech synthesis into your application. You send the text you want converted into speech to the Amazon Polly API, and Amazon Polly returns the audio stream to your application so your application can begin streaming it directly or store it in a standard audio file format.
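
The request and response flow described above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes the AWS SDK for Python (boto3), which this page does not mention, and the helper names `build_request` and `save_speech`, the voice `Joanna`, and the output path are illustrative choices.

```python
from typing import Any, Dict


def build_request(text: str, voice_id: str = "Joanna",
                  output_format: str = "mp3") -> Dict[str, Any]:
    """Assemble keyword arguments for the SynthesizeSpeech operation."""
    if not text:
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def save_speech(text: str, path: str = "speech.mp3") -> None:
    """Call Amazon Polly and store the returned audio stream in a file.

    Assumes boto3 is installed and that AWS credentials and a Region
    are already configured in the environment.
    """
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        # AudioStream is a streaming body; read() returns all audio bytes.
        audio_file.write(response["AudioStream"].read())
```

Calling `save_speech("Hello from Amazon Polly")` would write the synthesized audio to `speech.mp3`, provided the account has access to the service.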

Wide Selection of Voices and Languages

Amazon Polly offers many lifelike voices and support for a variety of languages, so you can select the ideal voice and distribute your speech-enabled applications in many countries.
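
One way to pick a voice is to list what the service offers and filter by language. The sketch below assumes boto3 and its `describe_voices` call. The helper names `fetch_voices` and `voices_for_language` are hypothetical, and the voices available in your Region may differ from any sample data you test with.

```python
from typing import Any, Dict, List


def voices_for_language(voices: List[Dict[str, Any]], language_code: str,
                        engine: str = "standard") -> List[str]:
    """Return sorted voice IDs that match a language code and engine."""
    return sorted(
        voice["Id"] for voice in voices
        if voice.get("LanguageCode") == language_code
        and engine in voice.get("SupportedEngines", [])
    )


def fetch_voices() -> List[Dict[str, Any]]:
    """Collect every voice description, following NextToken pagination.

    Assumes boto3 is installed and credentials are configured.
    """
    import boto3

    polly = boto3.client("polly")
    voices: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        token = page.get("NextToken")
        if not token:
            return voices
        kwargs["NextToken"] = token
```

For example, `voices_for_language(fetch_voices(), "en-US", engine="neural")` would return the IDs of US English voices that report neural engine support.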

Synchronize Speech for an Enhanced Visual Experience

Amazon Polly allows you to request an additional stream of metadata that provides information about when particular sentences, words and sounds are being pronounced. Using this metadata stream alongside the synthesized speech audio stream, you can build your applications with an enhanced visual experience, such as speech-synchronized facial animation or karaoke-style word highlighting.
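
The metadata stream (speech marks) is requested by setting `OutputFormat` to `json` and listing the desired `SpeechMarkTypes`, and it arrives as one JSON object per line. The following sketch shows how such a payload could be turned into a word timeline for highlighting. The function names are hypothetical, and the sample values in the usage note are made up for illustration rather than captured from the service.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_speech_marks(payload: str) -> List[Dict[str, Any]]:
    """Parse line-delimited JSON speech marks into a list of dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_timeline(marks: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (offset in milliseconds, word) pairs for word-type marks."""
    return [(mark["time"], mark["value"])
            for mark in marks if mark["type"] == "word"]
```

Given a payload with one sentence mark followed by word marks for "Hello" at 6 ms and "world" at 373 ms, `word_timeline(parse_speech_marks(payload))` yields `[(6, "Hello"), (373, "world")]`, which a player could use to highlight each word as the audio reaches that offset.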

Optimize Your Streaming Audio

Amazon Polly is designed to allow you to stream information through your application to users in near real time. You can also choose from various sampling rates to optimize bandwidth and audio quality for your application.
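
To make the bandwidth trade-off concrete, the sketch below validates a sampling rate and estimates the raw data rate of PCM output. The table of accepted rates reflects the SynthesizeSpeech API reference as understood at the time of writing and should be treated as an assumption to verify for your Region. The helper names are hypothetical, and boto3 is assumed for the streaming call.

```python
from typing import Iterator

# Assumed valid SampleRate values per output format; verify against the
# API reference before relying on them.
VALID_SAMPLE_RATES = {
    "mp3": ("8000", "16000", "22050", "24000"),
    "ogg_vorbis": ("8000", "16000", "22050", "24000"),
    "pcm": ("8000", "16000"),
}


def check_sample_rate(output_format: str, sample_rate: str) -> str:
    """Return sample_rate if it is accepted for output_format, else raise."""
    allowed = VALID_SAMPLE_RATES.get(output_format, ())
    if sample_rate not in allowed:
        raise ValueError(f"{sample_rate} Hz is not valid for {output_format}")
    return sample_rate


def pcm_bytes_per_second(sample_rate: str) -> int:
    """PCM output is 16-bit mono, so each sample occupies two bytes."""
    return int(sample_rate) * 2


def stream_speech(text: str, sample_rate: str = "16000") -> Iterator[bytes]:
    """Yield PCM audio chunks as they arrive (requires boto3 and credentials)."""
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text, VoiceId="Joanna", OutputFormat="pcm",
        SampleRate=check_sample_rate("pcm", sample_rate),
    )
    yield from response["AudioStream"].iter_chunks(chunk_size=4096)
```

Halving the PCM rate from 16000 Hz to 8000 Hz halves the raw data rate from 32000 to 16000 bytes per second, at the cost of audio quality.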

Adjust Speaking Style, Speech Rate, Pitch, and Loudness

Amazon Polly supports Speech Synthesis Markup Language (SSML), a W3C standard markup language for speech synthesis applications, which lets you adjust speech rate, pitch, and loudness as well as phrasing, emphasis, and intonation. Amazon Polly also offers additional options, such as a Newscaster speaking style for certain voices.
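
As an illustration, the helpers below build marked-up input that adjusts rate, pitch, and volume through the `prosody` tag, and wrap text in the `amazon:domain` tag used for the Newscaster style. The function names are hypothetical. The resulting string would be sent with `TextType` set to `ssml`, and not every tag or attribute is supported by every voice or engine (the Newscaster style, for instance, needs a voice and engine that support it).

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, rate: str = "medium", pitch: str = "medium",
                 volume: str = "medium") -> str:
    """Wrap text in a prosody tag, escaping XML special characters."""
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody></speak>"
    )


def newscaster_ssml(text: str) -> str:
    """Wrap text in the tag that requests the Newscaster speaking style."""
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'
```

For example, `prosody_ssml("Q&A", rate="slow")` produces `<speak><prosody rate="slow" pitch="medium" volume="medium">Q&amp;A</prosody></speak>`.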

Adjust the Maximum Duration of Speech

Amazon Polly enables you to adjust the speech rate based on a maximum allotted amount of time you define. This is useful whenever speech must fit a fixed time slot, especially in localization, where translated text is often longer or shorter than the original.
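
A minimal sketch of how a time limit could be expressed, assuming the `amazon:max-duration` attribute of the SSML `prosody` tag. The helper name `max_duration_ssml` is hypothetical, and the limit is given in milliseconds here purely as a design choice.

```python
from xml.sax.saxutils import escape


def max_duration_ssml(text: str, max_ms: int) -> str:
    """Wrap text so the speech is sped up, if needed, to fit within max_ms."""
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    return (
        f'<speak><prosody amazon:max-duration="{max_ms}ms">'
        f"{escape(text)}</prosody></speak>"
    )
```

The returned string would be passed as the text of a synthesis request with `TextType` set to `ssml`. Very short limits may not be honored exactly, because there are bounds on how far speech can be accelerated.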

Custom Lexicons

With Amazon Polly’s custom lexicons, or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words and neologisms.
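
Lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format. The sketch below generates a small alias-based lexicon from a dictionary and shows how it might be uploaded with boto3. The helper names, the lexicon name `acronyms`, and the sample entry are illustrative assumptions.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def build_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Return a PLS document mapping each written form to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}</lexicon>\n"
    )


def upload_lexicon(name: str, content: str) -> None:
    """Store the lexicon in the current Region (requires boto3, credentials)."""
    import boto3

    boto3.client("polly").put_lexicon(Name=name, Content=content)
```

After `upload_lexicon("acronyms", build_lexicon({"W3C": "World Wide Web Consortium"}))`, a synthesis request that passes `LexiconNames=["acronyms"]` would apply the substitution for voices in the matching language.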

Brand Voice

Brand Voice is a custom engagement where you can work with the Amazon Polly team to build a Neural Text-to-Speech (NTTS) voice for the exclusive use of your organization. Brand Voice allows you to differentiate your products and applications with a unique vocal identity in a wide variety of use cases. We work with you throughout the process to identify the persona, identify an actor or actress and record their speech, and build and train a model to produce the voice. The voice is then made available to your Amazon Web Services account ID(s).

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.amazonaws.cn/en_us/. This additional information does not form part of the Documentation for purposes of the Sinnet Customer Agreement for Amazon Web Services (Beijing Region), Western Cloud Data Customer Agreement for Amazon Web Services (Ningxia Region) or other agreement between you and Sinnet or NWCD governing your use of services of Amazon Web Services China Regions.