Install the AWS SDK for Python (boto3)
- To interact with Amazon Polly from Python, you need the AWS SDK for Python, commonly known as boto3. Install it with pip.
pip install boto3
Set Up AWS Credentials
- boto3 needs AWS credentials, typically an Access Key ID and a Secret Access Key. You can supply them through environment variables, the shared AWS credentials file, or directly in code. Avoid hard-coding real keys in source files that may be shared or committed.
export AWS_ACCESS_KEY_ID='your-access-key-id'
export AWS_SECRET_ACCESS_KEY='your-secret-access-key'
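Before creating a client, it can help to confirm that the two variables exported above are actually visible to Python. The following is a minimal sketch; the helper name `missing_credential_vars` is hypothetical. It checks only the environment, so an empty result here does not prove the keys are valid, and a non-empty one does not rule out credentials supplied some other way (for example through the credentials file).

```python
import os

# The two variables exported in the step above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credential_vars(env=None):
    """Return the names of credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credential_vars()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("Both credential variables are set.")
```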
Initialize a Polly Client
- Use boto3 to create a client for Amazon Polly. This client will be used to make requests to Polly.
import boto3
polly_client = boto3.Session(
    aws_access_key_id='your-access-key-id',
    aws_secret_access_key='your-secret-access-key',
    region_name='your-region'
).client('polly')
Convert Text to Speech
- To synthesize speech, use the `synthesize_speech` method. Specify the text, output format (such as MP3), and your preferred voice ID.
response = polly_client.synthesize_speech(
    Text='Hello, welcome to using Amazon Polly!',
    OutputFormat='mp3',
    VoiceId='Joanna'
)
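A single `synthesize_speech` request accepts only a limited amount of text, so longer input usually has to be split first. The sketch below breaks text at sentence boundaries. The helper name `split_text` is hypothetical, and the default of 3000 characters is an assumption to check against the current Polly quotas.

```python
import re


def split_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `Text` in its own `synthesize_speech` call.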
Save the Synthetic Speech to a File
- Extract the audio stream from the response and save it to a file for playback.
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
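For longer audio, reading the whole stream into memory at once may be undesirable. A minimal sketch of a chunked copy follows; the helper name `save_audio_stream` and the 8 KiB chunk size are illustrative choices, and the function assumes only that the stream object offers `read(size)` and `close()`, as file-like objects do.

```python
from contextlib import closing


def save_audio_stream(stream, path, chunk_size=8192):
    """Copy a readable binary stream to path in chunks; return bytes written."""
    written = 0
    # closing() ensures the stream is closed even if writing fails.
    with closing(stream), open(path, "wb") as out:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            out.write(chunk)
            written += len(chunk)
    return written
```

With the response from the previous step, the call would look like `save_audio_stream(response['AudioStream'], 'speech.mp3')`.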
Select Different Voices and Languages
- Polly supports many voices and languages. To use a different voice, change the `VoiceId` parameter. You can list all available voices programmatically.
voices = polly_client.describe_voices()
for voice in voices['Voices']:
    print(f"Voice Name: {voice['Name']}, Language: {voice['LanguageName']}")
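The voice list can be narrowed to one language, and the response may be split across pages. The sketch below passes a `LanguageCode` filter and follows `NextToken` until no further page is reported. The helper name `list_voices` is hypothetical; it takes the client as an argument so it can be tried with a stand-in object.

```python
def list_voices(client, language_code=None):
    """Collect voices from describe_voices, following NextToken if present."""
    params = {}
    if language_code:
        params["LanguageCode"] = language_code  # e.g. "en-US"
    voices = []
    while True:
        page = client.describe_voices(**params)
        voices.extend(page.get("Voices", []))
        token = page.get("NextToken")
        if not token:
            return voices
        # Request the next page on the following iteration.
        params["NextToken"] = token
```

For example, `list_voices(polly_client, "en-US")` would return only the US English voices.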
Handle Error Responses
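- Synthesis requests can fail, for example because of invalid credentials, an unsupported voice, or a network problem. Wrap the call in a try-except block and catch the botocore exceptions that boto3 raises.
from botocore.exceptions import BotoCoreError, ClientError

try:
    response = polly_client.synthesize_speech(
        Text='Hello, welcome to using Amazon Polly!',
        OutputFormat='mp3',
        VoiceId='Joanna'
    )
except (BotoCoreError, ClientError) as error:
    # ClientError: the service rejected the request.
    # BotoCoreError: client-side failures such as missing credentials or connection problems.
    print(f"An error occurred: {error}")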
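To take different actions for different failures, it helps to look at the error code rather than only printing the exception. A `ClientError` carries a `response` dictionary with an `Error` entry holding `Code` and `Message`. The sketch below reads those fields defensively; the helper name `describe_error` is hypothetical.

```python
def describe_error(error):
    """Return (code, message) for an exception, using ClientError-style details if present."""
    details = getattr(error, "response", None) or {}
    info = details.get("Error", {})
    # Fall back to generic values for exceptions without service details.
    return info.get("Code", "Unknown"), info.get("Message", str(error))
```

Inside the `except` block, `code, message = describe_error(error)` would then allow a branch on codes such as `TextLengthExceededException` or `InvalidSsmlException`.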
Stream Audio Directly for Real-Time Use
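- If you want to play the audio without writing it to disk, you can read the stream into memory and hand it to a playback library. PyDub, which relies on ffmpeg to decode MP3, is one option. Note that this approach reads the full response before playback starts.
import io
from pydub import AudioSegment
from pydub.playback import play
# Read the stream into a seekable in-memory buffer before decoding
audio_bytes = io.BytesIO(response['AudioStream'].read())
audio = AudioSegment.from_file(audio_bytes, format="mp3")
play(audio)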
Optimize and Customize Speech
- Speech Marks, Lexicons, and SSML (Speech Synthesis Markup Language) give you finer control over the voice output.
- With SSML you can adjust prosody (rate, pitch, volume), specify pronunciation, and insert pauses for more natural-sounding speech.
ssml_text = """<speak><prosody rate="medium" pitch="high">Hello, welcome to using Amazon Polly!</prosody></speak>"""
response = polly_client.synthesize_speech(
    TextType='ssml',
    Text=ssml_text,
    OutputFormat='mp3',
    VoiceId='Joanna'
)
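When the SSML is assembled from arbitrary text, characters such as `&` and `<` need to be escaped, or the request is likely to be rejected as invalid SSML. The following sketch builds a document with a pause between sentences. The helper name `build_ssml` and its parameters are hypothetical, and the 300 ms pause is an arbitrary choice.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, rate="medium", pause_ms=300):
    """Wrap plain-text sentences in SSML, escaping XML-reserved characters."""
    # A <break> element inserts a pause of the given length.
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'
```

The returned string can be passed as `Text` together with `TextType='ssml'`, as in the request above.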
Together, these steps cover the main ways to use Amazon Polly from Python, from basic text-to-speech synthesis to SSML-controlled speech generation.