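Migrate to the latest Cloud Speech-to-Text API

Cloud Speech-to-Text API V2 follows the latest Google Cloud API design and meets enterprise security and regulatory requirements out of the box.

It meets these requirements in the following ways: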

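Data residency: Cloud STT V2 offers the full range of existing transcription models in Google Cloud regions such as Belgium or Singapore. This lets you invoke the transcription models through a fully regionalized service.
Recognizer resources: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features.
Logging: Resource creation and transcription operations generate logs that you can view in the Google Cloud console for telemetry and debugging.
Encryption: Cloud Speech-to-Text V2 supports customer-managed encryption keys for all resources and for batch transcription.
Audio auto-detect: Cloud Speech-to-Text V2 can automatically detect the sample rate, channel count, and format of your audio files, so you don't need to provide that information in the request configuration.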

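Migrating from V1 to V2

Migration from the V1 API to the V2 API does not happen automatically. Taking advantage of the V2 feature set requires only minimal implementation changes.

Migrating in the API

As in Cloud STT V1, to transcribe audio you create a RecognitionConfig by selecting the language of your audio and the recognition model of your choice: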

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request, containing
        the transcription results
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
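
For readers starting from a V1 request, the following is a minimal sketch of how a few common V1 config fields line up with the V2 config shown above. The helper name `v1_config_to_v2_kwargs` and the plain-dictionary input are hypothetical and chosen only for illustration; the sketch covers a small subset of fields and assumes that V1 `language_code` and `alternative_language_codes` merge into V2 `language_codes`, and that automatic punctuation moves under `features`. Model names can differ between versions, so the V2 model is passed in explicitly rather than copied over.

```python
from typing import Any


def v1_config_to_v2_kwargs(
    v1_config: dict[str, Any], v2_model: str = "long"
) -> dict[str, Any]:
    """Map a few V1-style config fields to V2-style keyword arguments.

    Hypothetical helper for illustration only; it is not part of any
    Google client library and handles only a handful of fields.
    """
    # V1 has one primary language plus optional alternatives;
    # V2 takes a single list of language codes.
    languages = [
        v1_config["language_code"],
        *v1_config.get("alternative_language_codes", []),
    ]
    v2_kwargs: dict[str, Any] = {"language_codes": languages, "model": v2_model}

    # Assumption: this V1 top-level option lives under `features` in V2.
    if v1_config.get("enable_automatic_punctuation"):
        v2_kwargs["features"] = {"enable_automatic_punctuation": True}

    # encoding, sample_rate_hertz, and audio_channel_count are intentionally
    # not carried over: with auto_decoding_config, V2 detects them itself.
    return v2_kwargs


if __name__ == "__main__":
    legacy = {
        "language_code": "en-US",
        "encoding": "LINEAR16",
        "sample_rate_hertz": 16000,
        "enable_automatic_punctuation": True,
    }
    print(v1_config_to_v2_kwargs(legacy))
```

The returned dictionary is intended to be unpacked into `cloud_speech.RecognitionConfig` together with `auto_decoding_config`, as in the sample above; check the reference documentation before relying on any field mapping.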

If needed, select the region in which you want to use the Cloud Speech-to-Text API, and check language and model availability in that region:

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. It allows for specifying the location
        to potentially reduce latency and meet data residency requirements.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    return response
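
The sample above uses the same `location` value in two places: the API endpoint and the recognizer path. A small helper can keep the two in sync. This is a sketch, not part of the client library: the function name `regional_targets` is hypothetical, and it assumes the `global` location is served from the default `speech.googleapis.com` endpoint while regions use the `{location}-speech.googleapis.com` pattern shown above.

```python
def regional_targets(project_id: str, location: str = "global") -> tuple[str, str]:
    """Return (api_endpoint, recognizer_path) for one location.

    Hypothetical helper for illustration; it only builds strings and
    does not check that the location actually exists.
    """
    if location == "global":
        # Assumption: the global location uses the default endpoint.
        endpoint = "speech.googleapis.com"
    else:
        # Regional pattern taken from the sample above.
        endpoint = f"{location}-speech.googleapis.com"

    # "_" selects the implicit recognizer, as in the samples above.
    recognizer = f"projects/{project_id}/locations/{location}/recognizers/_"
    return endpoint, recognizer


if __name__ == "__main__":
    endpoint, recognizer = regional_targets("my-project", "europe-west3")
    print(endpoint)
    print(recognizer)
```

The first value would go into `ClientOptions(api_endpoint=...)` and the second into the `recognizer` field of the request.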

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and default recognition configuration.
    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.
    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )
    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer
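
Once a recognizer exists, later requests can refer to it by its full resource name instead of the `_` placeholder used in the earlier samples. The sketch below builds that name. The helper `recognizer_name` is hypothetical, and the ID rule it enforces (4 to 63 characters drawn from lowercase letters, digits, and hyphens) is an assumption to verify against the reference documentation.

```python
import re

# Assumed ID rule: 4-63 characters, lowercase letters, digits, or hyphens.
_RECOGNIZER_ID = re.compile(r"[a-z0-9-]{4,63}")


def recognizer_name(
    project_id: str, recognizer_id: str, location: str = "global"
) -> str:
    """Build the full resource name of a saved recognizer.

    Hypothetical helper for illustration; it validates the ID locally
    and does not call the API.
    """
    if not _RECOGNIZER_ID.fullmatch(recognizer_id):
        raise ValueError(f"Unexpected recognizer ID format: {recognizer_id!r}")
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


if __name__ == "__main__":
    print(recognizer_name("my-project", "my-recognizer"))
```

The returned string would be passed as the `recognizer` field of a `RecognizeRequest`. With a saved recognizer, the per-request `config` can presumably be left out so that the recognizer's `default_recognition_config` applies.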

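The requests and responses in the new V2 API differ in other ways as well. For details, see the reference documentation.

Migrating in the UI

To migrate through the Speech section of the Google Cloud console, follow these steps:

Go to the Speech section of the Google Cloud console.
Go to the Transcriptions page.
Click New Transcription, then select your audio in the Audio configuration tab.
In the Transcription options tab, select V2.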

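What's next

Use client libraries to transcribe audio in your preferred programming language.

Learn how to transcribe short audio files.

Learn how to transcribe streaming audio.

Learn how to transcribe long audio files.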
