Migrate to the latest version of the Cloud Speech-to-Text API

Cloud Speech-to-Text V2 builds on the latest Google Cloud API design, so you can meet enterprise security and regulatory requirements out of the box.

These requirements are met through the following capabilities:

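Data residency: Cloud STT V2 offers a broad range of the existing transcription models in Google Cloud regions such as Belgium and Singapore. This lets you invoke transcription models through a fully regionalized service.
Recognizer resourcefulness: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features.
Logging: Resource creation and transcription generate logs that are available in the Google Cloud console, which makes telemetry and debugging easier.
Encryption: Cloud Speech-to-Text V2 supports customer-managed encryption keys (CMEK) for all resources and for batch transcription.
Audio auto-detect: Cloud Speech-to-Text V2 can automatically detect the sample rate, channel count, and format of your audio files, so you don't need to provide that information in the request configuration.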

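Migrate from V1 to V2

Migration from the V1 API to the V2 API does not happen automatically. Minimal implementation changes are required to take advantage of the V2 feature set.

Migrate in the API

As in Cloud STT V1, to transcribe audio you create a RecognitionConfig by selecting the language of your audio and the recognition model to use.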

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request, containing
        the transcription results
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
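
To make the size of the change concrete, the following sketch converts a V1-style REST request body into the V2 shape used in the sample above. It is an illustration under stated assumptions, not an official migration tool: the helper name `v1_body_to_v2` is hypothetical, only the basic fields are handled, and the V2 model has to be chosen explicitly because V1 and V2 model names do not map one to one.

```python
from typing import Any


def v1_body_to_v2(v1_body: dict[str, Any], model: str = "long") -> dict[str, Any]:
    """Convert a basic V1 recognize JSON body into a V2 recognize JSON body.

    Hypothetical helper for illustration; it covers only the basic fields.
    """
    v1_config = v1_body.get("config", {})
    v1_audio = v1_body.get("audio", {})

    v2_body: dict[str, Any] = {
        "config": {
            # V2 takes a list of language codes instead of a single code.
            "languageCodes": [v1_config["languageCode"]],
            # Assumption: the caller picks a V2 model name explicitly.
            "model": model,
            # V1 encoding / sampleRateHertz are dropped in favor of auto-detection.
            "autoDecodingConfig": {},
        }
    }

    # In V2 the audio source sits at the top level of the request.
    if "content" in v1_audio:
        v2_body["content"] = v1_audio["content"]
    elif "uri" in v1_audio:
        v2_body["uri"] = v1_audio["uri"]
    else:
        raise ValueError("V1 body has neither audio.content nor audio.uri")

    return v2_body


v1_request = {
    "config": {"encoding": "LINEAR16", "sampleRateHertz": 16000, "languageCode": "en-US"},
    "audio": {"uri": "gs://my-bucket/audio.wav"},  # placeholder URI
}
print(v1_body_to_v2(v1_request))
```

The main differences this highlights are the plural `languageCodes`, the explicit model choice, and the fact that decoding parameters no longer need to be supplied when auto-detection is used.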

Optionally, choose a region in which to use Cloud Speech-to-Text, and check the language and model availability in that region.

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. It allows for specifying the location
        to potentially reduce latency and meet data residency requirements.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    return response
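
One detail worth noting in the sample above: the endpoint is built as `{location}-speech.googleapis.com`. That pattern fits regions such as `europe-west3`, but, as far as we know, the `global` location is served from `speech.googleapis.com` instead. A minimal sketch of helpers that cover both cases follows; the function names `speech_endpoint` and `recognizer_name` are hypothetical and are not part of the client library.

```python
def speech_endpoint(location: str) -> str:
    """Return the API endpoint host for a Speech-to-Text V2 location."""
    # Assumption: "global" uses the default host; regions use a prefixed host.
    if location == "global":
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


def recognizer_name(project_id: str, location: str, recognizer_id: str = "_") -> str:
    """Build a recognizer resource name; "_" selects the implicit default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


# "my-project" is a placeholder project ID.
for loc in ("global", "europe-west3"):
    print(loc, speech_endpoint(loc), recognizer_name("my-project", loc))
```

The returned host can be passed as `api_endpoint` to `ClientOptions`, and the resource name as `recognizer` in the request, which keeps the endpoint and the resource location consistent.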

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and default recognition configuration.
    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.
    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )
    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer
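
Once a recognizer exists, later requests can refer to it by name instead of repeating the configuration. The sketch below only assembles the request fields and assumes that omitting a per-request config makes the recognizer's `default_recognition_config` apply. The helper name `reuse_recognizer_request` is hypothetical, and the project and recognizer IDs are placeholders.

```python
def reuse_recognizer_request(
    project_id: str, recognizer_id: str, audio_content: bytes, location: str = "global"
) -> dict:
    """Build keyword arguments for a RecognizeRequest that reuses a stored recognizer.

    Hypothetical helper: no per-request config is included, so the recognizer's
    default_recognition_config is expected to apply.
    """
    if not recognizer_id or recognizer_id == "_":
        raise ValueError("Pass the ID of a recognizer created beforehand, not '_'")
    return {
        "recognizer": (
            f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"
        ),
        "content": audio_content,
    }


kwargs = reuse_recognizer_request("my-project", "my-recognizer", b"\x00\x01")
print(kwargs["recognizer"])
```

With the client library installed, these fields could be expanded into a request with `cloud_speech.RecognizeRequest(**kwargs)` and sent with `client.recognize(request=...)`, as in the earlier samples.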

There are other differences in the requests and responses of the new V2 API. For details, see the reference documentation.

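Migrate in the UI

To migrate by using [Speech] in the Google Cloud console, follow these steps:

Go to [Speech] in the Google Cloud console.
Go to the [Transcriptions] page.
Click [New Transcription], and select your audio in the [Audio configuration] tab.
In the [Transcription options] tab, select [V2].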

Next steps

Use the client libraries to transcribe audio in your preferred programming language.

Learn how to transcribe short audio files.

Learn how to transcribe streaming audio.

Learn how to transcribe long audio files.
