Improve transcription results with model adaptation

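Model adaptation lets you improve the accuracy of the transcription results you get from Cloud Speech-to-Text. With it, you specify words and phrases that Cloud STT should recognize more often in your audio data than other alternatives that might otherwise be suggested. Model adaptation is particularly useful for improving transcription accuracy in the following use cases: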

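  • Your audio contains words or phrases that are likely to occur frequently.
  • Your audio is likely to contain rare words (such as proper names) or words that are uncommon in everyday use.
  • Your audio contains noise or is otherwise not very clear.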

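Before reading this document, read the introduction to model adaptation for an overview of how the feature works. For the phrase and character limits that apply to each model adaptation request, see Quotas and limits.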

Code sample

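Model adaptation is an optional Cloud STT configuration that you can use to customize your transcription results as needed. For details about configuring the recognition request body, see the RecognitionConfig documentation.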

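The following code sample shows how to improve transcription accuracy by using a SpeechAdaptation resource: PhraseSet, CustomClass, and model adaptation boost. To use a PhraseSet or CustomClass in future requests, record the resource name that is returned in the response when you create the resource.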
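As a rough illustration of what those resource names look like, the sketch below assembles them by hand. The helper names (`custom_class_name`, `phrase_set_name`, `class_token`) are hypothetical and not part of the client library. The `customClasses` path mirrors the one built in the Python sample on this page. The `phraseSets` path is assumed to follow the same pattern. In practice, prefer the `name` returned in the creation response over a hand-built string.

```python
def custom_class_name(project_id: str, location: str, custom_class_id: str) -> str:
    """Build a CustomClass resource name (same layout as the sample on this page)."""
    return f"projects/{project_id}/locations/{location}/customClasses/{custom_class_id}"


def phrase_set_name(project_id: str, location: str, phrase_set_id: str) -> str:
    """Build a PhraseSet resource name (assumed to mirror the CustomClass layout)."""
    return f"projects/{project_id}/locations/{location}/phraseSets/{phrase_set_id}"


def class_token(resource_name: str) -> str:
    """Wrap a custom class resource name as a ${...} token for use inside a phrase."""
    return "${" + resource_name + "}"


# Hypothetical project and IDs, for illustration only.
restaurants = custom_class_name("my-project", "global", "restaurants")
phrase = f"Visit restaurants like {class_token(restaurants)}"
print(phrase)
print(phrase_set_name("my-project", "global", "restaurant-phrases"))
```

Storing the returned name (for example in configuration) means later recognition requests can reference the existing PhraseSet without recreating it.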

For a list of the prebuilt classes available for your language, see Supported class tokens.
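Prebuilt class tokens are written directly inside a phrase's text. The sketch below builds a phrase set payload in the same dictionary shape that the Python sample on this page passes to `create_phrase_set`. The helper `build_phrase_set` is hypothetical. The tokens `$MONTH` and `$OOV_CLASS_DIGIT_SEQUENCE` are used on the assumption that they are supported for your language, so check them against the supported class tokens list. The positive-boost check is also an assumption about sensible input, not a documented client-side rule.

```python
from typing import Any


def build_phrase_set(phrases: list[str], boost: float) -> dict[str, Any]:
    """Return a dict shaped like the `phrase_set` field of a create request."""
    if boost <= 0:
        # Assumption: only positive boost values are meaningful.
        raise ValueError("boost must be positive")
    return {"boost": boost, "phrases": [{"value": p} for p in phrases]}


# Phrases that embed prebuilt class tokens (assumed to be supported for en-US).
phrase_set = build_phrase_set(
    [
        "My appointment is in $MONTH",
        "The confirmation code is $OOV_CLASS_DIGIT_SEQUENCE",
    ],
    boost=10,
)
print(phrase_set)
```

Because the helper returns a plain dictionary, the payload can be inspected or unit-tested without calling the API.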

Python

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.

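To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.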

import os

from google.cloud import speech_v1p1beta1 as speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_with_model_adaptation(
    audio_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> str:
    """Create `PhraseSet` and `CustomClasses` for custom item lists in input data.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio. e.g. gs://[BUCKET]/[FILE]
        custom_class_id (str): The unique ID of the custom class to create.
        phrase_set_id (str): The unique ID of the PhraseSet to create.
    Returns:
        The transcript of the input audio.
    """
    # Specifies the location where the Speech API will be accessed.
    location = "global"

    # Audio object
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{PROJECT_ID}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{PROJECT_ID}/locations/{location}/customClasses/{custom_class_id}"
    )
    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [
                    {"value": f"Visit restaurants like ${{{custom_class_name}}}"}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name
    # The next section shows how to use the newly created custom
    # class and phrase set to send a transcription request with speech adaptation

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(phrase_set_references=[phrase_set_name])

    # speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

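    # Print the top alternative of each result, then return the joined
    # transcript so the function matches its declared `-> str` return type.
    transcripts = [result.alternatives[0].transcript for result in response.results]
    for transcript in transcripts:
        print(f"Transcript: {transcript}")

    return " ".join(transcripts)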