Synthesize speech with bidirectional streaming
This document walks you through the process of synthesizing audio using bidirectional streaming.
Bidirectional streaming lets you send text input and receive audio data simultaneously. This means that you can start synthesizing speech before the complete input text is sent, which reduces latency and enables real-time interactions. Voice assistants and interactive games use bidirectional streaming to create more dynamic and responsive applications.
To learn more about the fundamental concepts in Cloud Text-to-Speech, read Cloud Text-to-Speech Basics.
Before you begin
Before you can send a request to the Cloud Text-to-Speech API, you must have completed the following actions.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Create or select a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
  - Create a Google Cloud project:
    gcloud projects create PROJECT_ID
    Replace PROJECT_ID with a name for the Google Cloud project you are creating.
  - Select the Google Cloud project that you created:
    gcloud config set project PROJECT_ID
    Replace PROJECT_ID with your Google Cloud project name.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Cloud Text-to-Speech API:
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
  gcloud services enable texttospeech.googleapis.com
Synthesize speech with bidirectional streaming
Install the client library
Python
Before installing the library, make sure you've prepared your environment for Python development.
pip install --upgrade google-cloud-texttospeech
Send a stream of text and receive a stream of audio
The API accepts a stream of StreamingSynthesizeRequest messages, each of which
contains either a StreamingSynthesisInput or a StreamingSynthesizeConfig.
Send exactly one StreamingSynthesizeRequest with a StreamingSynthesizeConfig
first. After that, send the stream of StreamingSynthesizeRequest messages with
StreamingSynthesisInput, which provide the text input.
Streaming Cloud Text-to-Speech is only compatible with Chirp 3: HD voices.
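Because only Chirp 3: HD voices work with streaming, it can help to filter the voice list before choosing one. The following is a minimal sketch, not an official sample. It assumes that Chirp 3: HD voice names contain the segment "-Chirp3-HD-" (for example, en-US-Chirp3-HD-Charon); that naming pattern is an assumption, so verify it against the list of available voices. The helper names is_chirp3_hd and list_streaming_voices are hypothetical.

```python
from typing import List


def is_chirp3_hd(voice_name: str) -> bool:
    """Return True if the name looks like a Chirp 3: HD voice.

    Assumption: Chirp 3: HD voices follow "<locale>-Chirp3-HD-<name>".
    """
    return "-Chirp3-HD-" in voice_name


def list_streaming_voices(language_code: str) -> List[str]:
    """Return names of voices that match the assumed Chirp 3: HD pattern."""
    # Imported lazily so is_chirp3_hd can be used without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return [voice.name for voice in response.voices if is_chirp3_hd(voice.name)]
```

Calling list_streaming_voices("en-US") requires the client library and application credentials; is_chirp3_hd runs on its own.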
Python
Before running the example, make sure you've prepared your environment for Python development.
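The request ordering described in this section can be sketched as follows. This is an illustrative sketch rather than the official sample: the helper names config_first and synthesize_streaming are hypothetical, and the voice name in the usage note is an assumed example of a Chirp 3: HD voice. The sketch sends one config request, then one request per text chunk, and collects the audio bytes from the streamed responses.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def config_first(config_request: T, text_requests: Iterable[T]) -> Iterator[T]:
    """Yield the single config request, then every text request in order."""
    yield config_request
    yield from text_requests


def synthesize_streaming(
    text_chunks: Iterable[str], voice_name: str, language_code: str
) -> bytes:
    """Stream text chunks to the API and return the concatenated audio bytes."""
    # Imported lazily so config_first stays usable without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # The first request carries only the streaming configuration.
    config_request = texttospeech.StreamingSynthesizeRequest(
        streaming_config=texttospeech.StreamingSynthesizeConfig(
            voice=texttospeech.VoiceSelectionParams(
                name=voice_name,
                language_code=language_code,
            )
        )
    )

    # Every later request carries one chunk of text input.
    text_requests = (
        texttospeech.StreamingSynthesizeRequest(
            input=texttospeech.StreamingSynthesisInput(text=chunk)
        )
        for chunk in text_chunks
    )

    audio = bytearray()
    responses = client.streaming_synthesize(
        config_first(config_request, text_requests)
    )
    for response in responses:
        # Each response holds the next piece of synthesized audio.
        audio.extend(response.audio_content)
        print(f"Received {len(response.audio_content)} bytes of audio")
    return bytes(audio)
```

For example, synthesize_streaming(["Hello there. ", "How are you ", "today?"], "en-US-Chirp3-HD-Charon", "en-US") would stream three text chunks. Because text_chunks can be any iterable, a generator that yields text as it is produced (for instance, from a streaming language model) fits the same shape.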
Clean up
To avoid unnecessary Google Cloud charges, use the Google Cloud console to delete your project if you no longer need it.
What's next
- Learn more about Cloud Text-to-Speech by reading the basics.
- Review the list of available voices you can use for synthetic speech.