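This section describes how to transcribe streaming audio, such as input from a microphone, to text.
Streaming speech recognition lets you stream audio to Cloud Speech-to-Text and receive recognition results in real time as the audio is processed. See also the audio limits for streaming speech recognition requests. Streaming speech recognition is available only through gRPC.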
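Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.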
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Speech-to-Text APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
- Make sure that you have the following role or roles on the project: Cloud Speech Administrator
  Check for the roles
  - In the Google Cloud console, go to the IAM page.
  - Select the project.
  - In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
  - For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
  Grant the roles
  - In the Google Cloud console, go to the IAM page.
  - Select the project.
  - Click Grant access.
  - In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
  - Click Select a role, then search for the role.
  - To grant additional roles, click Add another role and add each additional role.
  - Click Save.
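- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command: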
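gcloud init
- If you're using a local shell, create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
If an authentication error is returned and you're using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.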
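Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see "Authenticate for using client libraries".
Also make sure that you have installed the client library.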
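As a quick sanity check before running the sample, the sketch below looks for a local credentials file of the kind that `gcloud auth application-default login` writes. It is a simplified illustration, not the client library's actual lookup logic: the helper name `find_adc_file` is hypothetical, it assumes the default gcloud configuration directory, and it ignores other credential sources such as an attached service account.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# File name used by the gcloud CLI for locally stored Application Default Credentials.
ADC_FILENAME = "application_default_credentials.json"


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return a local credentials file that ADC would likely use, or None.

    Hypothetical helper for illustration; assumes the default gcloud
    configuration directory has not been relocated.
    """
    # An explicit GOOGLE_APPLICATION_CREDENTIALS path takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # Otherwise look in the default gcloud configuration directory.
    if os.name == "nt":
        base = Path(env.get("APPDATA", str(home))) / "gcloud"
    else:
        base = home / ".config" / "gcloud"
    candidate = base / ADC_FILENAME
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_adc_file(os.environ, Path.home())
    print(found if found else "No local ADC file found")
```

If the script reports no file on a local shell, running `gcloud auth application-default login` is the likely next step.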
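Perform streaming speech recognition on a local file
The following code sample performs streaming speech recognition on a local audio file. Audio sent in a streaming request is limited to 25 KB. This limit applies to both the initial StreamingRecognize request and the size of each individual message in the stream. Exceeding this limit results in an error.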
Python
import os
from typing import Iterable, Iterator

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech as cloud_speech_types

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_streaming_v2(
    stream_file: str,
) -> list[cloud_speech_types.StreamingRecognizeResponse]:
    """Transcribes audio from an audio file stream using Google Cloud Speech-to-Text API.

    Args:
        stream_file (str): Path to the local audio file to be transcribed.
            Example: "resources/audio.wav"

    Returns:
        list[cloud_speech_types.StreamingRecognizeResponse]: A list of objects.
            Each response includes the transcription results for the corresponding audio segment.
    """
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(stream_file, "rb") as f:
        audio_content = f.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    # max(1, ...) avoids a zero step for files shorter than 5 bytes. For larger
    # files, each chunk must also stay within the 25 KB per-message limit.
    chunk_length = max(1, len(audio_content) // 5)
    stream = [
        audio_content[start : start + chunk_length]
        for start in range(0, len(audio_content), chunk_length)
    ]
    audio_requests = (
        cloud_speech_types.StreamingRecognizeRequest(audio=audio) for audio in stream
    )

    recognition_config = cloud_speech_types.RecognitionConfig(
        auto_decoding_config=cloud_speech_types.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )
    streaming_config = cloud_speech_types.StreamingRecognitionConfig(
        config=recognition_config
    )
    # The first request carries only the recognizer and configuration, no audio
    config_request = cloud_speech_types.StreamingRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        streaming_config=streaming_config,
    )

    def requests(
        config: cloud_speech_types.StreamingRecognizeRequest,
        audio: Iterable[cloud_speech_types.StreamingRecognizeRequest],
    ) -> Iterator[cloud_speech_types.StreamingRecognizeRequest]:
        # Sends the configuration request first, then the audio requests
        yield config
        yield from audio

    # Transcribes the audio into text
    responses_iterator = client.streaming_recognize(
        requests=requests(config_request, audio_requests)
    )
    responses = []
    for response in responses_iterator:
        responses.append(response)
        for result in response.results:
            print(f"Transcript: {result.alternatives[0].transcript}")

    return responses
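
The sample splits the file into five equal parts, which can exceed the 25 KB per-message limit for larger files. A minimal sketch of a size-capped chunk generator follows. The name `iter_audio_chunks` and the 16 KiB default are illustrative assumptions, chosen only to leave headroom under the documented limit; neither comes from the Speech-to-Text documentation.

```python
from typing import Iterator


def iter_audio_chunks(audio: bytes, max_chunk_bytes: int = 16 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `max_chunk_bytes` long.

    Hypothetical helper; the default size is an assumption, not an API constant.
    """
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    # Step through the buffer in fixed-size strides; the last slice may be shorter.
    for start in range(0, len(audio), max_chunk_bytes):
        yield audio[start : start + max_chunk_bytes]
```

In the sample above, the `stream` list could be replaced with `iter_audio_chunks(audio_content)`, so that every audio request stays below the cap regardless of file size.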
Although you can stream a local audio file to the Speech-to-Text API, we recommend that you perform synchronous audio recognition instead.
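The list of responses returned by the streaming sample can be reduced to plain text with a small helper. The sketch below is one possible approach under stated assumptions: `collect_transcripts` is a hypothetical name, it relies only on the `results`, `alternatives`, `transcript`, and `is_final` attributes of the response objects, and it assumes the service marks completed results with `is_final`. Set `final_only=False` if that assumption does not hold for your model.

```python
from typing import Iterable, List


def collect_transcripts(responses: Iterable, final_only: bool = True) -> List[str]:
    """Return the top transcript of each result across streaming responses.

    Hypothetical helper; works on any objects exposing `results`,
    `alternatives`, `transcript`, and (optionally) `is_final`.
    """
    transcripts = []
    for response in responses:
        for result in response.results:
            # Optionally skip interim hypotheses that may still change.
            if final_only and not getattr(result, "is_final", True):
                continue
            # Guard against results that carry no alternatives.
            if not result.alternatives:
                continue
            transcripts.append(result.alternatives[0].transcript)
    return transcripts
```

For example, `" ".join(collect_transcripts(transcribe_streaming_v2("resources/audio.wav")))` would produce a single string for the whole file, and the empty-alternatives guard avoids the `IndexError` that direct indexing can raise.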
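Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- Optional: Revoke the authentication credentials that you created, and delete the local credential file.
gcloud auth application-default revoke
- Optional: Revoke credentials from the gcloud CLI.
gcloud auth revoke
- Delete a Google Cloud project:
gcloud projects delete PROJECT_ID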