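Transcribe long audio files into text

This page describes how to use the Speech-to-Text API and asynchronous speech recognition to transcribe long audio files (longer than 1 minute) to text.

About asynchronous speech recognition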

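Batch speech recognition, the asynchronous method that this page describes, starts a long-running audio processing operation. Use it to transcribe audio that is longer than 60 seconds. For shorter audio, synchronous speech recognition is faster and simpler. Asynchronous speech recognition accepts audio up to 480 minutes (8 hours) long.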
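The duration limits above can be turned into a small routing check. The following is a minimal sketch, not part of the API: the function name and constants are hypothetical, and treating a clip of exactly 60 seconds as eligible for synchronous recognition is an assumption drawn from the phrase "longer than 60 seconds".

```python
# Limits taken from the paragraph above; the names are hypothetical.
SYNC_LIMIT_SECONDS = 60          # audio longer than this needs batch recognition
BATCH_LIMIT_SECONDS = 480 * 60   # 480 minutes (8 hours)


def choose_recognition_method(duration_seconds: float) -> str:
    """Return 'synchronous' or 'batch' for a clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    if duration_seconds <= BATCH_LIMIT_SECONDS:
        return "batch"
    raise ValueError("audio exceeds the 480-minute limit; split it into shorter files")


print(choose_recognition_method(45))    # a 45-second clip
print(choose_recognition_method(3600))  # a one-hour recording
```

Running the check before uploading lets a caller reject or split oversized recordings early instead of waiting for the service to refuse them.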

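Batch speech recognition can transcribe only audio that is stored in Cloud Storage. The transcription output can be returned inline in the response (for single-file batch recognition requests) or written to Cloud Storage.

A batch recognition request returns an Operation that contains information about the ongoing recognition processing for the request. You can poll the operation to learn when it completes and when the transcript is available.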
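Polling can be sketched as a simple loop. The helper below is a hypothetical illustration, not a library function. It assumes the operation object exposes `done()` and `result()`; the samples on this page call `result()` on the object returned by `client.batch_recognize`, while `done()` is an assumption about that object.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    operation: Any,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll operation.done() until it finishes, then return operation.result()."""
    deadline = time.monotonic() + timeout
    while not operation.done():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still running after {timeout} seconds")
        sleep(poll_interval)  # wait before asking again
    return operation.result()
```

Because long recordings can take a while to process, the timeout here is deliberately generous; both defaults are illustrative values, not recommendations from the service.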

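Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.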

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
Go to IAM
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
Click Select a role, then search for the role.
To grant additional roles, click Add another role and add each additional role.
Click Save.

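Install the Google Cloud CLI.

Note: If you previously installed the gcloud CLI, run gcloud components update to make sure that you have the latest version.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: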

gcloud init

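Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see Authenticate for using client libraries.

If you're using a local shell, create local authentication credentials for your user account: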
```
gcloud auth application-default login
```
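You don't need to do this if you're using Cloud Shell.

If an authentication error is returned and you're using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Also make sure that you have installed the client library.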

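Enable access to Cloud Storage

Speech-to-Text uses a service account to access your files in Cloud Storage. By default, the service account has access to Cloud Storage files in the same project.

The service account email address is the following:

service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com

To transcribe Cloud Storage files in another project, grant this service account the Speech-to-Text Service Agent role in the other project: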

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/speech.serviceAgent

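For more information about project IAM policy, see Manage access to projects, folders, and organizations.

You can also give the service account more granular access by granting it permission to a specific Cloud Storage bucket: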

gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member=serviceAccount:service-PROJECT_NUMBER@gcp-sa-speech.iam.gserviceaccount.com \
    --role=roles/storage.admin

For more information about managing access to Cloud Storage, see Create and manage access control lists in the Cloud Storage documentation.
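The service account address and the `--member` value used in the commands above follow a fixed pattern, so they can be assembled from the project number. The helpers below are a hypothetical convenience sketch; only the address format comes from this page.

```python
def speech_service_agent_email(project_number: str) -> str:
    """Build the Speech-to-Text service account address for a project number."""
    if not project_number.isdigit():
        # The address uses the numeric project number, not the project ID.
        raise ValueError("expected the numeric project number, not the project ID")
    return f"service-{project_number}@gcp-sa-speech.iam.gserviceaccount.com"


def speech_service_agent_member(project_number: str) -> str:
    """Build the value passed to --member in the gcloud commands above."""
    return f"serviceAccount:{speech_service_agent_email(project_number)}"


# 123456789012 is a made-up project number used only for illustration.
print(speech_service_agent_member("123456789012"))
```

Rejecting non-numeric input catches the common mistake of passing the project ID where the project number is required.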

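Perform batch recognition with inline results

The following example shows how to perform batch speech recognition on an audio file in Cloud Storage and read the inline transcription results from the response: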

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_batch_gcs_input_inline_output_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are returned inline in the response.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            Such as gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript
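The loop in the sample above reads `alternatives[0]` on every result. As a defensive variation, the sketch below skips any result that carries no alternatives; whether such results occur for a given model is an assumption, not something this page states. The function is hypothetical and works on any objects with an `alternatives` attribute, so stand-in objects are used here instead of a live response.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collect_transcripts(results: Iterable[Any]) -> List[str]:
    """Return the top alternative of each result, skipping results without one."""
    transcripts = []
    for result in results:
        if result.alternatives:  # guard against results with no alternatives
            transcripts.append(result.alternatives[0].transcript)
    return transcripts


# Hypothetical stand-ins for recognition results, for illustration only.
sample_results = [
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello world")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="goodbye")]),
]
print(" ".join(collect_transcripts(sample_results)))  # prints "hello world goodbye"
```

With a real response, the same function could be called as `collect_transcripts(response.results[audio_uri].transcript.results)`.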

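Perform batch recognition and write results to Cloud Storage

The following example shows how to perform batch speech recognition on an audio file in Cloud Storage and read the transcription results from the output file in Cloud Storage. Note that the file written to Cloud Storage is a BatchRecognizeResults message in JSON format: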

Python

import os
import re

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_batch_gcs_input_gcs_output_v2(
    audio_uri: str,
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uri (str): The Google Cloud Storage URI of the input audio file.
            E.g., gs://[BUCKET]/[FILE]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript will be stored.
            E.g., gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the URI of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    file_results = response.results[audio_uri]

    print(f"Operation finished. Fetching results from {file_results.uri}...")
    output_bucket, output_object = re.match(
        r"gs://([^/]+)/(.*)", file_results.uri
    ).group(1, 2)

    # Instantiates a Cloud Storage client
    storage_client = storage.Client()

    # Fetch results from Cloud Storage
    bucket = storage_client.bucket(output_bucket)
    blob = bucket.blob(output_object)
    results_bytes = blob.download_as_bytes()
    batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
        results_bytes, ignore_unknown_fields=True
    )

    for result in batch_recognize_results.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return batch_recognize_results
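Because the output file is JSON, it can also be inspected with the standard library alone, for example in a script that does not depend on the Speech client. The sketch below is a hypothetical helper; it assumes the file uses the field names `results`, `alternatives`, and `transcript`, and the sample payload is invented for illustration.

```python
import json
from typing import List, Union


def transcripts_from_json(payload: Union[str, bytes]) -> List[str]:
    """Extract the top transcript of each result from a JSON results document."""
    document = json.loads(payload)
    transcripts = []
    for result in document.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:  # skip results that carry no alternatives
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


# Invented payload shaped like the assumed output format.
sample_payload = b"""
{
  "results": [
    {"alternatives": [{"transcript": "first segment", "confidence": 0.9}]},
    {"alternatives": []},
    {"alternatives": [{"transcript": "second segment"}]}
  ]
}
"""
print(transcripts_from_json(sample_payload))
```

In the sample above, the bytes returned by `blob.download_as_bytes()` could be passed to this function directly.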

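Perform batch recognition on multiple files

The following example shows how to perform batch speech recognition on multiple audio files in Cloud Storage and read the transcription results from the output files in Cloud Storage: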

Python

import os
import re
from typing import List

from google.cloud import storage
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_batch_multiple_files_v2(
    audio_uris: List[str],
    gcs_output_path: str,
) -> cloud_speech.BatchRecognizeResponse:
    """Transcribes audio from multiple Google Cloud Storage URIs using the Google Cloud Speech-to-Text API.
    The transcription results are stored in another Google Cloud Storage bucket.
    Args:
        audio_uris (List[str]): The list of Google Cloud Storage URIs of the input audio files.
            Such as ["gs://[BUCKET]/[FILE]", "gs://[BUCKET]/[FILE]"]
        gcs_output_path (str): The Google Cloud Storage bucket URI where the output transcript is stored.
            Such as gs://[BUCKET]
    Returns:
        cloud_speech.BatchRecognizeResponse: The response containing the URIs of the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    files = [cloud_speech.BatchRecognizeFileMetadata(uri=uri) for uri in audio_uris]

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=files,
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            gcs_output_config=cloud_speech.GcsOutputConfig(
                uri=gcs_output_path,
            ),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    # Instantiates a Cloud Storage client once and reuses it for every file
    storage_client = storage.Client()

    print("Operation finished. Fetching results from:")
    for uri in audio_uris:
        file_results = response.results[uri]
        print(f"  {file_results.uri}...")
        output_bucket, output_object = re.match(
            r"gs://([^/]+)/(.*)", file_results.uri
        ).group(1, 2)

        # Fetch results from Cloud Storage
        bucket = storage_client.bucket(output_bucket)
        blob = bucket.blob(output_object)
        results_bytes = blob.download_as_bytes()
        batch_recognize_results = cloud_speech.BatchRecognizeResults.from_json(
            results_bytes, ignore_unknown_fields=True
        )

        for result in batch_recognize_results.results:
            print(f"     Transcript: {result.alternatives[0].transcript}")

    return response
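The samples above split each result URI with `re.match(...).group(1, 2)`, which raises an `AttributeError` if the URI does not match. A stricter variation is sketched below; the helper name is hypothetical, and requiring a non-empty object name is a design choice made here rather than a rule stated on this page.

```python
import re
from typing import Tuple

_GCS_URI = re.compile(r"gs://([^/]+)/(.+)")


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://BUCKET/OBJECT into (bucket, object), with a clear error on bad input."""
    match = _GCS_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a gs://BUCKET/OBJECT URI: {uri!r}")
    return match.group(1), match.group(2)


# example-bucket and the object path are made-up names.
print(parse_gcs_uri("gs://example-bucket/transcripts/audio_1.json"))
```

Raising a `ValueError` that includes the offending URI makes failures easier to trace when many files are processed in one request.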

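Enable dynamic batching on batch recognition

Dynamic batching lowers the cost of transcription in exchange for higher latency. This feature is available only for batch recognition.

The following example shows how to perform batch recognition on an audio file in Cloud Storage with dynamic batching enabled: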

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")

def transcribe_batch_dynamic_batching_v2(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes audio from a Google Cloud Storage URI using dynamic batching.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio.
            E.g., gs://[BUCKET]/[FILE]
    Returns:
        cloud_speech.BatchRecognizeResults: The response containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="chirp_3",
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
        processing_strategy=cloud_speech.BatchRecognizeRequest.ProcessingStrategy.DYNAMIC_BATCHING,
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response.results[audio_uri].transcript

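Override recognition features per file

By default, batch recognition uses the same recognition configuration for each file in the batch recognition request. If different files require different configurations or features, you can override the configuration per file by using the config field in the BatchRecognizeFileMetadata message. For an example of overriding recognition features, see the recognizers documentation.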
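To make the shape of a per-file override concrete, the sketch below builds a request body as plain dictionaries, with a request-level `config` and a `config` attached to one file only. It is an illustration under assumptions: the camelCase keys follow the usual JSON naming of the message fields, the helper name is hypothetical, and the bucket, file names, and second language are invented. It does not show how the service combines the two configurations; the recognizers documentation covers that.

```python
import json
from typing import Any, Dict, List, Optional


def build_file_entries(
    audio_uris: List[str],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Build the files list, attaching a per-file config only where one is given."""
    overrides = overrides or {}
    entries = []
    for uri in audio_uris:
        entry: Dict[str, Any] = {"uri": uri}
        if uri in overrides:
            entry["config"] = overrides[uri]  # per-file override
        entries.append(entry)
    return entries


# Hypothetical bucket, file names, and language choice, for illustration only.
request_body = {
    "config": {"autoDecodingConfig": {}, "languageCodes": ["en-US"], "model": "chirp_3"},
    "files": build_file_entries(
        ["gs://example-bucket/english.wav", "gs://example-bucket/spanish.wav"],
        overrides={
            "gs://example-bucket/spanish.wav": {
                "autoDecodingConfig": {},
                "languageCodes": ["es-ES"],
                "model": "chirp_3",
            }
        },
    ),
}
print(json.dumps(request_body, indent=2))
```

With the Python client, the equivalent step would be passing `config=` to `cloud_speech.BatchRecognizeFileMetadata` for the files that need different settings.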

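Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.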
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

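Console

In the Google Cloud console, go to the Manage resources page.

Go to Manage resources

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.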

gcloud

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

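What's next

For details about batch recognition, see the reference documentation.

Learn how to transcribe streaming audio.

Learn how to transcribe short audio files.

For best performance, accuracy, and other tips, see the best practices documentation.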
