Improve transcription results with model adaptation

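Overview

You can use model adaptation to help Speech-to-Text recognize specific words or phrases more often than the other options it might otherwise suggest. For example, suppose that your audio data often includes the word "weather". When Speech-to-Text encounters the word "weather", you want it to transcribe the word as "weather" more often than as "whether". In this case, you can use model adaptation to bias Speech-to-Text toward recognizing "weather".

Model adaptation is particularly helpful in the following use cases:

- Improving the accuracy of words and phrases that occur frequently in your audio data. For example, you can alert the recognition model to voice commands that your users typically speak.
- Expanding the vocabulary of words that Speech-to-Text recognizes. Speech-to-Text includes a very large vocabulary. However, if your audio data often contains words that are rare in general language use (such as proper names or domain-specific words), you can add them by using model adaptation.
- Improving the accuracy of speech transcription when the supplied audio contains noise or is not very clear.

You can also fine-tune the biasing of the recognition model by using the model adaptation boost feature.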

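Improve recognition of words and phrases

To increase the probability that Speech-to-Text recognizes the word "weather" when it transcribes your audio data, you can pass the single word "weather" in the PhraseSet object of a SpeechAdaptation resource.

When you provide a multi-word phrase, Speech-to-Text is more likely to recognize those words in sequence. Providing a phrase also increases the probability of recognizing portions of the phrase, including individual words. For limits on the number and size of these phrases, see the content limits page.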
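
As a rough illustration of the idea above, the following sketch builds the adaptation part of a recognition config as plain Python dictionaries, so it runs without any client library. The field names (phrase_sets, inline_phrase_set, phrases, value) mirror the ones used by the Python samples later on this page. The helper name build_adaptation is hypothetical and is not part of any library.

```python
import json


def build_adaptation(phrases):
    """Return a SpeechAdaptation-shaped dict with one inline phrase set.

    `build_adaptation` is a hypothetical helper for illustration only.
    """
    return {
        "phrase_sets": [
            {"inline_phrase_set": {"phrases": [{"value": p} for p in phrases]}}
        ]
    }


# A single word and a multi-word phrase, passed without boost values.
adaptation = build_adaptation(["weather", "weather forecast for tomorrow"])
print(json.dumps(adaptation, indent=2))
```

A dictionary like this only shows the shape of the data. The samples later on this page show how the same structure is expressed with the client library types.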

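Improve recognition using classes

Classes represent common concepts that occur in natural language, such as monetary units and calendar dates. A class lets you improve transcription accuracy for large groups of words that map to a common concept but that don't always include identical words or phrases.

For example, suppose that your audio data includes recordings of people saying their street address. You might have a recording of someone saying "My house is 123 Main Street, the fourth house on the left." In this case, you want Speech-to-Text to recognize the first sequence of numerals ("123") as an address rather than as an ordinal ("one-hundred twenty-third"). However, not everyone lives at "123 Main Street", and it's impractical to list every possible street address in a PhraseSet resource. Instead, you can use a class to indicate that a street number should be recognized no matter what the number actually is. In this example, Speech-to-Text can then more accurately transcribe phrases like "123 Main Street" and "987 Grand Boulevard" because both are recognized as address numbers.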

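Class tokens

To use a class in model adaptation, include a class token in the phrases field of a PhraseSet resource. To see which tokens are available for your language, refer to the list of supported class tokens. For example, to improve the transcription of address numbers from your source audio, provide the value $ADDRESSNUM in a phrase in a PhraseSet.

You can use classes either as stand-alone items in the phrases array or embed one or more class tokens in longer multi-word phrases. For example, you can indicate an address number in a larger phrase by including the class token in a string: ["my address is $ADDRESSNUM"]. However, this phrase doesn't help when the audio contains a similar but non-identical phrase, such as "I am at 123 Main Street". To aid recognition of similar phrases, it's important to also include the class token by itself: ["my address is $ADDRESSNUM", "$ADDRESSNUM"]. If you use an invalid or malformed class token, Speech-to-Text ignores the token without triggering an error and still uses the rest of the phrase for context.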
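
The advice to list each class token on its own can be automated. The sketch below, which is one possible approach and not part of the API, scans a list of phrases for embedded class tokens and appends each token as a stand-alone entry. The regular expression is an assumption about what a pre-built token looks like ($ followed by uppercase letters, digits, or underscores), and with_standalone_tokens is a hypothetical helper name.

```python
import re

# Assumed shape of a pre-built class token such as $ADDRESSNUM.
# This pattern is a guess made for this sketch, not an official grammar.
CLASS_TOKEN = re.compile(r"\$[A-Z][A-Z0-9_]*")


def with_standalone_tokens(phrases):
    """Return the phrases plus each embedded class token as its own entry."""
    result = list(phrases)
    for phrase in phrases:
        for token in CLASS_TOKEN.findall(phrase):
            if token not in result:
                result.append(token)
    return result


phrases = with_standalone_tokens(["my address is $ADDRESSNUM"])
print(phrases)  # ['my address is $ADDRESSNUM', '$ADDRESSNUM']
```

The helper does not check tokens against the supported list for your language, so an unsupported token would still be passed through and then ignored by the service, as described above.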

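Custom classes

You can also create your own CustomClass, a class composed of your own list of related items or values. For example, suppose that you want to transcribe audio data that is likely to include the name of any one of several hundred regional restaurants. Restaurant names are relatively rare in general speech, so the recognition model is less likely to choose them as the "correct" answer. By using a custom class, you can bias the recognition model toward correctly identifying these names when they appear in your audio.

To use a custom class, create a CustomClass resource that includes each restaurant name as a ClassItem. Custom classes function in the same way as the pre-built class tokens. A phrase can include both pre-built class tokens and custom classes.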
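
To make the restaurant scenario concrete, here is a minimal sketch that assembles a CustomClass-shaped dictionary from a list of names and builds a phrase that mixes a pre-built token with a reference to the custom class. The restaurant names and the class name are made up, build_custom_class is a hypothetical helper, and the "${...}" reference form is borrowed from the resource-reference sample later on this page, so treat it as an assumption rather than a confirmed rule for every case.

```python
def build_custom_class(name, values):
    """Return a CustomClass-shaped dict with one item per distinct value."""
    seen = set()
    items = []
    for value in values:
        cleaned = value.strip()
        # Skip blanks and case-insensitive duplicates, keeping first-seen order.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            items.append({"value": cleaned})
    return {"name": name, "items": items}


restaurants = build_custom_class(
    "regional-restaurants",  # hypothetical class name
    ["Luigi's Trattoria", "Golden Wok", "golden wok", "  "],  # made-up names
)

# Assumed reference syntax: "${" + class name + "}".
class_reference = "${" + restaurants["name"] + "}"
phrase = {"value": f"deliver to $ADDRESSNUM from {class_reference}"}
print(phrase["value"])  # deliver to $ADDRESSNUM from ${regional-restaurants}
```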

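Fine-tune transcription results using boost

By default, model adaptation has a relatively small effect, especially for one-word phrases. The model adaptation boost feature lets you increase the recognition model bias by assigning more weight to some phrases than to others. We recommend that you implement boost if all of the following are true:

- You have already implemented model adaptation.
- You want to further adjust the strength of the model adaptation effect on your transcription results.

To see whether the boost feature is available for your language, see the language support page.

For example, suppose that you have many recordings of people asking about the "fare to get into the county fair", with the word "fair" occurring more frequently than "fare". In this case, you can use model adaptation to increase the probability that the model recognizes both "fair" and "fare" by adding them as phrases in a PhraseSet resource. This tells Speech-to-Text to recognize "fair" and "fare" more often than, for example, "hare" or "lair".

However, "fair" should be recognized more often than "fare" because it appears more frequently in the audio. You might have already transcribed your audio with the Speech-to-Text API and found that it often fails to recognize the correct word ("fair"). In this case, you can use the boost feature to assign a higher boost value to "fair" than to "fare". The higher weight assigned to "fair" biases the Speech-to-Text API toward picking "fair" rather than "fare". Without boost values, the recognition model recognizes "fair" and "fare" with equal probability.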
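
The "fair" versus "fare" case might look like the following when written out as data. This is only a sketch: the dictionary mirrors the value and boost fields used by the samples later on this page, the boost numbers are arbitrary illustrative values, and boosted_phrase_set is a hypothetical helper.

```python
def boosted_phrase_set(boosts):
    """Return a PhraseSet-shaped dict from a {phrase: boost} mapping."""
    return {
        "phrases": [
            {"value": value, "boost": float(boost)}
            for value, boost in boosts.items()
        ]
    }


# "fair" gets more weight than "fare" because it occurs more often in the audio.
# The numbers 15 and 5 are illustrative, not recommended settings.
phrase_set = boosted_phrase_set({"fair": 15, "fare": 5})

strongest = max(phrase_set["phrases"], key=lambda p: p["boost"])
print(strongest["value"])  # fair
```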

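Boost basics

When you use boost, you assign a weighted value to phrase items in a PhraseSet resource. Speech-to-Text refers to this weighted value when it selects a possible transcription for words in your audio data. The higher the boost value, the more likely Speech-to-Text is to choose that word or phrase from the possible alternatives.

If you assign a boost value to a multi-word phrase, boost is applied to the entire phrase and only to the entire phrase. For example, suppose that you want to assign a boost value to the phrase "My favorite exhibit at the American Museum of Natural History is the blue whale". If you add that phrase to a phrase object and assign a boost value, the recognition model is more likely to recognize that phrase in its entirety, word for word.

If boosting a multi-word phrase doesn't give you the results you're looking for, we suggest that you add all bigrams (two words, in order) that make up the phrase as additional phrase items and assign a boost value to each. Continuing the previous example, you could investigate adding further bigrams and longer word sequences (more than two words), such as "my favorite", "my favorite exhibit", "favorite exhibit", "my favorite exhibit at the American Museum of Natural History", "American Museum of Natural History", and "blue whale". The Speech-to-Text recognition model is then more likely to recognize related phrases in your audio that contain parts of the original boosted phrase but don't match it word for word.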
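
The bigram suggestion above can be sketched in a few lines of Python. The function below keeps the full phrase and adds every pair of adjacent words as its own entry. The helper name bigram_phrases is hypothetical, splitting on whitespace is a simplification, and giving every entry the same boost is just one possible choice.

```python
def bigram_phrases(phrase, boost):
    """Return phrase entries for the full phrase plus each of its bigrams."""
    words = phrase.split()
    entries = [{"value": phrase, "boost": boost}]
    # zip(words, words[1:]) pairs each word with the word that follows it.
    for first, second in zip(words, words[1:]):
        bigram = f"{first} {second}"
        if all(entry["value"] != bigram for entry in entries):
            entries.append({"value": bigram, "boost": boost})
    return entries


entries = bigram_phrases("the blue whale", 10.0)
for entry in entries:
    print(entry["value"])
# the blue whale
# the blue
# blue whale
```

Longer sub-phrases such as "American Museum of Natural History" are not generated by this sketch. You would add them by hand where they matter.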

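Set boost values

Boost values must be floating-point values greater than 0. The practical maximum for boost values is 20. For best results, experiment by adjusting your boost values until the transcription results are accurate.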
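
If you generate boost values programmatically, a small guard can catch values outside the range described above before they reach a request. The following sketch rejects values that are not greater than 0 and caps anything above 20. Capping is a choice made for this example, not documented API behavior, and normalize_boost is a hypothetical helper.

```python
import math

# Practical upper limit for boost values, as stated in the text above.
PRACTICAL_MAX_BOOST = 20.0


def normalize_boost(value):
    """Validate a boost value and cap it at the practical maximum."""
    boost = float(value)
    if math.isnan(boost) or boost <= 0:
        raise ValueError(f"boost must be a float greater than 0, got {value!r}")
    return min(boost, PRACTICAL_MAX_BOOST)


print(normalize_boost(10))  # 10.0
print(normalize_boost(35))  # 20.0
```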

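Higher boost values can result in fewer false negatives, which are cases where a word or phrase occurred in the audio but Speech-to-Text didn't recognize it correctly. However, boost can also increase the likelihood of false positives, which are cases where a word or phrase appears in the transcription even though it didn't occur in the audio.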
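
When you experiment with boost values, it can help to count both kinds of error for the word you are boosting. The sketch below compares reference transcripts with recognized transcripts and counts, per utterance, how often a target word was missed (false negative) or wrongly introduced (false positive). The transcripts are made up for illustration, count_errors is a hypothetical helper, and whole-word matching per utterance is a deliberate simplification.

```python
def count_errors(target, pairs):
    """Count false negatives and false positives for `target`.

    `pairs` is an iterable of (reference_transcript, recognized_transcript).
    """
    false_negatives = 0
    false_positives = 0
    for reference, recognized in pairs:
        in_reference = target in reference.lower().split()
        in_recognized = target in recognized.lower().split()
        if in_reference and not in_recognized:
            false_negatives += 1
        elif in_recognized and not in_reference:
            false_positives += 1
    return false_negatives, false_positives


pairs = [
    ("the word is fare", "the word is fair"),          # "fare" was missed
    ("the fair opens today", "the fare opens today"),  # "fare" was wrongly introduced
    ("pay the fare", "pay the fare"),                  # correct
]
print(count_errors("fare", pairs))  # (1, 1)
```

Running the same count after each change to a boost value gives a rough picture of whether a higher boost is removing misses faster than it adds spurious matches.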

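Example use case using model adaptation

The following example walks you through using model adaptation to transcribe an audio recording of someone saying "The word is fare". In this case, without speech adaptation, Speech-to-Text identifies the word "fair". With speech adaptation, Speech-to-Text can identify the word "fare" instead.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.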

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Verify that billing is enabled for your Google Cloud project.

Enable the Speech-to-Text APIs.

Roles required to enable APIs

To enable APIs, you need the serviceusage.services.enable permission. If you created the project, then you likely already have this permission through the Owner role (roles/owner). Otherwise, you can get this permission through the Service Usage Admin role (roles/serviceusage.serviceUsageAdmin). Learn how to grant roles.

Make sure that you have the following role or roles on the project: Cloud Speech Administrator

Check for the roles

In the Google Cloud console, go to the IAM page.
Select the project.
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.

Grant the roles

In the Google Cloud console, go to the IAM page.
Select the project.
Click Grant access.
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
Click Select a role, then search for the role.
To grant additional roles, click Add another role and add each additional role.
Click Save.

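Install the Google Cloud CLI.

Note: If you installed the gcloud CLI previously, make sure that you have the latest version by running gcloud components update.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

```
gcloud init
```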

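Client libraries can use Application Default Credentials to authenticate with Google APIs and send requests to those APIs. With Application Default Credentials, you can test your application locally and deploy it without changing the underlying code. For more information, see "Authenticate for using client libraries".

If you're using a local shell, create local authentication credentials for your user account: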
```
gcloud auth application-default login
```
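You don't need to do this if you're using Cloud Shell.

If an authentication error is returned and you're using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Also make sure that you have installed the client library.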

Improve transcription using a `PhraseSet`

The following sample builds a PhraseSet that contains the phrase "fare" and adds it as an inline_phrase_set in a recognition request:

Python

```python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def adaptation_v2_inline_phrase_set(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Enhances speech recognition accuracy using an inline phrase set.
    The inline custom phrase set helps the recognizer produce more accurate transcriptions for specific terms.
    Phrases are given a boost to increase their chances of being recognized correctly.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """

    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Build inline phrase set to produce a more accurate transcript
    phrase_set = cloud_speech.PhraseSet(
        phrases=[{"value": "fare", "boost": 10}, {"value": "word", "boost": 20}]
    )
    adaptation = cloud_speech.SpeechAdaptation(
        phrase_sets=[
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                inline_phrase_set=phrase_set
            )
        ]
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        adaptation=adaptation,
        language_codes=["en-US"],
        model="short",
    )

    # Prepare the request which includes specifying the recognizer, configuration, and the audio content
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```

This sample creates a PhraseSet resource with the same phrase and then references it in a recognition request:

Python

```python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def adaptation_v2_phrase_set_reference(
    audio_file: str,
    phrase_set_id: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe audio files using a PhraseSet.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        phrase_set_id (str): The unique ID of the PhraseSet to use.
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """

    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Start the operation that creates the PhraseSet resource, then wait for its result
    operation = client.create_phrase_set(
        parent=f"projects/{PROJECT_ID}/locations/global",
        phrase_set_id=phrase_set_id,
        phrase_set=cloud_speech.PhraseSet(phrases=[{"value": "fare", "boost": 10}]),
    )
    phrase_set = operation.result()

    # Add a reference of the PhraseSet into the recognition request
    adaptation = cloud_speech.SpeechAdaptation(
        phrase_sets=[
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                phrase_set=phrase_set.name
            )
        ]
    )

    # Automatically detect audio encoding. Use "short" model for short utterances.
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        adaptation=adaptation,
        language_codes=["en-US"],
        model="short",
    )

    # Prepare the request which includes specifying the recognizer, configuration, and the audio content
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```

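Improve transcription results using a `CustomClass`

The following sample builds a CustomClass with the item "fare" and the name "fare". It then references the CustomClass within an inline_phrase_set in a recognition request: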

Python

```python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def adaptation_v2_inline_custom_class(
    audio_file: str,
) -> cloud_speech.RecognizeResponse:
    """Transcribe audio file using inline custom class.
    The inline custom class helps the recognizer produce more accurate transcriptions for specific terms.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response object which includes the transcription results.
    """
    # Instantiates a client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Define an inline custom class to enhance recognition accuracy with specific items like "fare" etc.
    custom_class_name = "your-class-name"
    custom_class = cloud_speech.CustomClass(
        name=custom_class_name,
        items=[{"value": "fare"}],
    )

    # Build inline phrase set to produce a more accurate transcript
    phrase_set = cloud_speech.PhraseSet(
        phrases=[{"value": custom_class_name, "boost": 20}]
    )
    adaptation = cloud_speech.SpeechAdaptation(
        phrase_sets=[
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                inline_phrase_set=phrase_set
            )
        ],
        custom_classes=[custom_class],
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        adaptation=adaptation,
        language_codes=["en-US"],
        model="short",
    )

    # Prepare the request which includes specifying the recognizer, configuration, and the audio content
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```

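This sample creates a CustomClass resource with the same item. It then creates a PhraseSet resource with a phrase that references the CustomClass resource name. Finally, it references the PhraseSet resource in a recognition request: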

Python

```python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def adaptation_v2_custom_class_reference(
    audio_file: str, phrase_set_id: str, custom_class_id: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe audio file using a custom class.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        phrase_set_id (str): The unique ID of the phrase set to use.
        custom_class_id (str): The unique ID of the custom class to use.
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """
    # Instantiates a speech client
    client = SpeechClient()

    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Create a custom class to improve recognition accuracy for specific terms
    custom_class = cloud_speech.CustomClass(items=[{"value": "fare"}])
    operation = client.create_custom_class(
        parent=f"projects/{PROJECT_ID}/locations/global",
        custom_class_id=custom_class_id,
        custom_class=custom_class,
    )
    custom_class = operation.result()

    # Create a persistent PhraseSet to reference in a recognition request
    created_phrase_set = cloud_speech.PhraseSet(
        phrases=[
            {
                "value": f"${{{custom_class.name}}}",
                "boost": 20,
            },  # Using custom class reference
        ]
    )
    operation = client.create_phrase_set(
        parent=f"projects/{PROJECT_ID}/locations/global",
        phrase_set_id=phrase_set_id,
        phrase_set=created_phrase_set,
    )
    phrase_set = operation.result()

    # Add a reference of the PhraseSet into the recognition request
    adaptation = cloud_speech.SpeechAdaptation(
        phrase_sets=[
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                phrase_set=phrase_set.name
            )
        ]
    )
    # Automatically detect the audio's encoding with short audio model
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        adaptation=adaptation,
        language_codes=["en-US"],
        model="short",
    )

    # Prepare the request which includes specifying the recognizer, configuration, and the audio content
    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
```

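Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Optional: Revoke the authentication credentials that you created, and delete the local credential file.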
```
gcloud auth application-default revoke
```
Optional: Revoke credentials from the gcloud CLI.
```
gcloud auth revoke
```

Console

In the Google Cloud console, go to the Manage resources page.

In the project list, select the project that you want to delete, and then click Delete.

In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete a Google Cloud project:

```
gcloud projects delete PROJECT_ID
```
