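This page shows how to transcribe long audio files (longer than 1 minute) to text by using the Speech-to-Text API and asynchronous speech recognition.
About asynchronous speech recognition
Batch speech recognition starts a long-running audio processing operation. Use asynchronous speech recognition to transcribe audio that is longer than 60 seconds. For shorter audio, synchronous speech recognition is faster and simpler. Asynchronous speech recognition accepts audio of up to 480 minutes (8 hours).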
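The thresholds above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the Speech-to-Text client library: the function name and constants are hypothetical, and the audio duration is assumed to be known in advance.

```python
# Thresholds taken from the text above; the names are illustrative only.
SYNC_MAX_SECONDS = 60           # batch recognition is for audio longer than 60 seconds
BATCH_MAX_SECONDS = 480 * 60    # upper limit of 480 minutes (8 hours)


def choose_recognition_method(duration_seconds: float) -> str:
    """Return "sync" or "batch" for audio of the given duration in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > BATCH_MAX_SECONDS:
        raise ValueError("audio is longer than the 480-minute limit")
    # Audio of 60 seconds or less is treated as a synchronous request.
    return "batch" if duration_seconds > SYNC_MAX_SECONDS else "sync"
```

For example, `choose_recognition_method(45)` returns `"sync"` and `choose_recognition_method(600)` returns `"batch"`.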
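Batch speech recognition can transcribe only audio that is stored in Cloud Storage. The transcript output can be returned inline in the response (for batch recognition requests with a single file) or written to Cloud Storage.
A batch recognition request returns an Operation that contains information about the recognition processing that is in progress for the request. You can poll the operation to find out when it is complete and when the transcript is available.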
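Polling can be sketched as a simple loop. The helper below is a hypothetical illustration, not a library function: it assumes only that the operation object exposes `done()` and `result()` methods, as the object returned by `client.batch_recognize` in the Python samples on this page does. Calling `operation.result(timeout=...)` directly, as those samples do, is usually enough; an explicit loop is useful mainly when you want to log progress or do other work between checks.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    operation: Any,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll operation.done() until it returns True, then return operation.result()."""
    deadline = time.monotonic() + timeout
    while not operation.done():
        # Give up once the overall time budget is spent.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation not finished after {timeout} seconds")
        sleep(poll_interval)
    return operation.result()
```

The `poll_interval` and `timeout` defaults are arbitrary placeholders; choose values that match the length of your audio.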
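Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.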
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Speech-to-Text APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
- Make sure that you have the following role or roles on the project: Cloud Speech Administrator
  Check for the roles
  - In the Google Cloud console, go to the IAM page.
  - Select the project.
  - In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
  - For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
  Grant the roles
  - In the Google Cloud console, go to the IAM page.
  - Select the project.
  - Click Grant access.
  - In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
  - Click Select a role, then search for the role.
  - To grant additional roles, click Add another role and add each additional role.
  - Click Save.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
  gcloud init
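- If you're using a local shell, create local authentication credentials for your user account:
  gcloud auth application-default login
  You don't need to do this if you're using Cloud Shell.
  If an authentication error is returned and you're using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see "Authenticate for using client libraries".
Also make sure that you have installed the client library.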
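Enable access to Cloud Storage
Speech-to-Text uses a service account to access your files in Cloud Storage. By default, the service account can access Cloud Storage files in the same project.
The service account email address is the following:
  service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com
To transcribe Cloud Storage files in another project, grant this service account the [Speech-to-Text Service Agent][speech-service-agent] role in the other project:
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/speech.serviceAgent
For more information about project IAM policies, see [Manage access to projects, folders, and organizations][manage-access].
You can also give the service account more granular access by granting it permission on a specific Cloud Storage bucket:
  gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/storage.admin
For more information about managing access to Cloud Storage, see [Create and manage access control lists][buckets-manage-acl] in the Cloud Storage documentation.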
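When you script these bindings, the `--member` value can be built from the project number. The following is a minimal sketch with a hypothetical helper name. It only formats the string shown above and rejects values that are not numeric, because the address uses the project number rather than the project ID.

```python
def speech_service_agent_member(project_number: str) -> str:
    """Build the --member value for the Speech-to-Text service account."""
    number = str(project_number).strip()
    # The address is keyed by the numeric project number, not the project ID.
    if not number.isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return f"serviceAccount:service-{number}@gcp-sa-speech.iam.gserviceaccount.com"
```

The returned string can be passed as the `--member` flag in either of the gcloud commands above.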
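Run batch recognition and get inline results
The following sample shows how to run batch speech recognition on an audio file in Cloud Storage and read the transcription results inline from the response: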
Python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_inline_output_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are returned inline in the response.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            Such as gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript
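The sample reads the project ID with `os.getenv`, which returns `None` when the `GOOGLE_CLOUD_PROJECT` variable is not set, so the recognizer name would then contain the literal text `None`. The following is a minimal sketch of a guard. The helper name and its parameters are hypothetical; only the path format is taken from the sample above.

```python
import os
from typing import Optional


def recognizer_path(
    project_id: Optional[str] = None,
    location: str = "global",
    recognizer_id: str = "_",
) -> str:
    """Build the recognizer resource name used in the batch recognition request."""
    # Fall back to the environment variable that the sample reads.
    project = project_id or os.getenv("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError("set GOOGLE_CLOUD_PROJECT or pass project_id explicitly")
    return f"projects/{project}/locations/{location}/recognizers/{recognizer_id}"
```

The result can be passed as the `recognizer` argument of `cloud_speech.BatchRecognizeRequest` in place of the inline f-string.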
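Run batch recognition and write results to Cloud Storage
The following sample shows how to run batch speech recognition on an audio file in Cloud Storage and read the transcription results from the output file in Cloud Storage. Note that the file written to Cloud Storage is a BatchRecognizeResults message in JSON format: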
Python
import os
import re

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_gcs_input_gcs_output_v2(
    audio_uri: str,
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]

    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the URI of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    file_results = response.results[audio_uri]

    print(f"Operation finished. Fetching results from {file_results.uri}...")
    output_bucket, output_object = re.match(
        r"gs://([^/]+)/(.*)", file_results.uri
    ).group(1, 2)

    # Instantiates a Cloud Storage client
    storage_client = storage.Client()

    # Fetch results from Cloud Storage
    bucket = storage_client.bucket(output_bucket)
    blob = bucket.blob(output_object)
    results_bytes = blob.download_as_bytes()
    batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
        results_bytes, ignore_unknown_fields=True
    )

    for result in batch_recognize_results.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return batch_recognize_results
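Because the output file is JSON, it can also be inspected without the client library types. The following is a minimal sketch that uses only the standard library. The function name is hypothetical, and the keys `results`, `alternatives`, and `transcript` are an assumption based on the message fields that the sample above reads; check them against an actual output file before relying on them.

```python
import json
from typing import List, Union


def transcripts_from_results_json(payload: Union[str, bytes]) -> List[str]:
    """Return the first alternative's transcript for each result in the payload."""
    document = json.loads(payload)
    transcripts = []
    for result in document.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:  # skip results that carry no alternatives
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts
```

In the sample above, the bytes returned by `blob.download_as_bytes()` could be passed directly as `payload`.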
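Run batch recognition on multiple files
The following sample shows how to run batch speech recognition on multiple audio files in Cloud Storage and read the transcription results from the output files in Cloud Storage: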
Python
import os
import re
from typing import List

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_multiple_files_v2(
    audio_uris: List[str],
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResponse:
    """Transcribes audio from multiple Google Cloud Storage URIs using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.

    Args:
        audio_uris (List[str]): The list of Google Cloud Storage URIs of the input audio files.
            Such as ["gs://[BUCKET]/[FILE]", "gs://[BUCKET]/[FILE]"]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript is stored.
            Such as gs://[BUCKET]

    Returns:
        cloud_speech.BatchRecognizeResponse: The response containing the URIs of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    files = [cloud_speech.BatchRecognizeFileMetadata(uri=uri) for uri in audio_uris]

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=files,
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    print("Operation finished. Fetching results from:")
    for uri in audio_uris:
        file_results = response.results[uri]
        print(f"  {file_results.uri}...")
        output_bucket, output_object = re.match(
            r"gs://([^/]+)/(.*)", file_results.uri
        ).group(1, 2)

        # Instantiates a Cloud Storage client
        storage_client = storage.Client()

        # Fetch results from Cloud Storage
        bucket = storage_client.bucket(output_bucket)
        blob = bucket.blob(output_object)
        results_bytes = blob.download_as_bytes()
        batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
            results_bytes, ignore_unknown_fields=True
        )

        for result in batch_recognize_results.results:
            print(f"     Transcript: {result.alternatives[0].transcript}")

    return response
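The samples split each output URI with `re.match(...).group(1, 2)`, which raises an `AttributeError` if a URI does not have the expected shape. The following is a minimal sketch of a stricter helper with a clearer error. The name `split_gcs_uri` is hypothetical, and unlike the pattern in the samples it also rejects a URI with an empty object name.

```python
import re
from typing import Tuple

# gs://BUCKET/OBJECT, where OBJECT may itself contain slashes.
_GCS_URI = re.compile(r"gs://([^/]+)/(.+)")


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs:// URI into (bucket, object_name)."""
    match = _GCS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a gs://BUCKET/OBJECT URI: {uri!r}")
    return match.group(1), match.group(2)
```

Inside the loop above, `output_bucket, output_object = split_gcs_uri(file_results.uri)` would be a drop-in replacement for the `re.match` call.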
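Enable dynamic batching on batch recognition
Dynamic batching lowers the cost of transcription in exchange for higher latency. This feature is available only for batch recognition.
The following sample shows how to run batch recognition on an audio file in Cloud Storage with dynamic batching enabled: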
Python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_dynamic_batching_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using dynamic batching.

    Args:
        audio_uri (str): The Cloud Storage URI of the input audio.
            E.g., gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
        processing_strategy=cloud_speech.BatchRecognizeRequest.ProcessingStrategy.DYNAMIC_BATCHING,
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript
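Override recognition features per file
By default, batch recognition uses the same recognition configuration for every file in the batch recognition request. If different files need different configuration or features, you can override the configuration per file by using the config field in the BatchRecognizeFileMetadata message. For an example of overriding recognition features, see the recognizers documentation.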
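The shape of a request with per-file overrides can be sketched with plain dictionaries. This is an illustration only: the helper name and the bucket paths are hypothetical, and the dictionaries mirror the `uri` and `config` fields of `BatchRecognizeFileMetadata` and the `language_codes` and `model` fields of `RecognitionConfig` used in the samples on this page rather than constructing the client library types. It does not model how the service combines a per-file configuration with the request-level one.

```python
from typing import Any, Dict, List, Optional


def build_file_entries(
    audio_uris: List[str],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Build one entry per URI, attaching a config only where an override exists."""
    overrides = overrides or {}
    entries = []
    for uri in audio_uris:
        entry: Dict[str, Any] = {"uri": uri}
        if uri in overrides:
            # Only files listed in `overrides` get their own configuration.
            entry["config"] = overrides[uri]
        entries.append(entry)
    return entries


# Hypothetical example: the second file is transcribed with a different language.
files = build_file_entries(
    ["gs://BUCKET/english.wav", "gs://BUCKET/spanish.wav"],
    overrides={
        "gs://BUCKET/spanish.wav": {"language_codes": ["es-US"], "model": "chirp_3"},
    },
)
```

With the client library, each entry corresponds to a `cloud_speech.BatchRecognizeFileMetadata` message, and an entry's `config` corresponds to a `cloud_speech.RecognitionConfig`.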
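Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- Optional: Revoke the authentication credentials that you created, and delete the local credential file.
  gcloud auth application-default revoke
- Optional: Revoke credentials from the gcloud CLI.
  gcloud auth revoke
Delete a Google Cloud project:
  gcloud projects delete PROJECT_ID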