Chirp 3 is the latest generation of Google's multilingual Automatic Speech Recognition (ASR)-specific generative models, offered through Google Cloud's Speech-to-Text (STT) API v2, and is available for voice transcription.
Set up
Follow these steps to enable transcription with Speech-to-Text Chirp 3.
Console
When you create or update a conversation profile using the Agent Assist console, follow these steps to configure Speech-to-Text settings to use the Chirp 3 model.
- Click Conversation profiles.
- Click the name of your profile.
- Navigate to the Speech to Text Config section.
- Choose Chirp 3 for the model.
- (Optional) Select Use Long Form Model for AA Telephony SipRec Integration if the audio is transmitted through Telephony Integration.
- (Optional) Configure Language Code and up to one Alternative Language Code for multi-language transcription.
- (Optional) Configure auto as the language code for language-agnostic transcription.
- (Optional) Configure Phrases for speech adaptation to improve transcription accuracy through model adaptation.
REST API
You can call the API directly to create or update a conversation profile. Enable STT V2 with the ConversationProfile.sttConfig.useSttV2 field and set ConversationProfile.sttConfig.model to chirp_3, as shown in the following example.
Example Configuration:
{ "name": "projects/PROJECT_ID/locations/global/conversationProfiles/CONVERSATION_PROFILE_ID",f "displayName": "CONVERSATION_PROFILE_NAME", "automatedAgentConfig": { }, "humanAgentAssistantConfig": { "notificationConfig": { "topic": "projects/PROJECT_ID/topics/FEATURE_SUGGESTION_TOPIC_ID", "messageFormat": "JSON" }, "humanAgentSuggestionConfig": { "featureConfigs": [{ "enableEventBasedSuggestion": true, "suggestionFeature": { "type": "ARTICLE_SUGGESTION" }, "conversationModelConfig": { } }] }, "messageAnalysisConfig": { } }, "sttConfig": { "model": "chirp_3", "useSttV2": true, }, "languageCode": "en-US" }
Best practices
Follow these suggestions to get the most from voice transcription with the Chirp 3 model.
Audio streaming
To maximize Chirp 3 performance, send audio in near real time. This means if you have X seconds of audio, stream it in roughly X seconds. Break your audio into small chunks, each with a frame size of 100 ms. For more audio streaming best practices, see the Speech-to-Text documentation.
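As a rough illustration of this pacing, the following Python sketch reads a WAV file and hands out 100 ms chunks at approximately real-time speed. The send_chunk callback is a hypothetical stand-in for whatever pushes audio onto your streaming transcription request.

import time
import wave

CHUNK_MS = 100  # frame size recommended above

def stream_in_real_time(wav_path, send_chunk):
    # Read the file in 100 ms frames and pace delivery so that
    # X seconds of audio take roughly X seconds to send.
    with wave.open(wav_path, "rb") as wav_file:
        frames_per_chunk = int(wav_file.getframerate() * CHUNK_MS / 1000)
        while True:
            chunk = wav_file.readframes(frames_per_chunk)
            if not chunk:
                break
            send_chunk(chunk)            # forward ~100 ms of audio
            time.sleep(CHUNK_MS / 1000)  # keep the stream near real time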
Use speech adaptation
With Chirp 3, use speech adaptation only with inline phrases configured in the conversation profile.
Regional and language support
Chirp 3 is available for all Speech-to-Text languages, at varying levels of launch readiness, and in all Agent Assist regions except northamerica-northeast1, northamerica-northeast2, and asia-south1.
Quotas
The number of transcription requests using the Chirp 3 model is limited by the SttV2StreamingRequestsPerMinutePerResourceTypePerRegion quota with chirp_3 labeled as the resource type. See the Google Cloud quotas guide for information on quota usage and how to request a quota increase.
For quota purposes, transcription requests sent to the global Dialogflow endpoint count against the us-central1 region.