指定字詞層級準確度信賴水準

您可指定讓「Cloud 語音轉文字」針對語音轉錄中的個別字詞,標明準確率或信賴度的值。

字詞層級信賴度

「Cloud 語音轉文字」對音訊剪輯執行語音轉錄時,也會測量回應的準確度。從 Cloud STT 傳送的回應會以 0.0 至 1.0 的數字,表明整個語音轉錄要求的信賴度。下列程式碼範例示範了 Cloud STT 傳回的信賴度。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}

除了整個語音轉錄的信賴度外,Cloud 語音轉文字亦可提供語音轉錄中個別字詞的信賴度資訊。回覆隨後會在語音轉錄中加入 WordInfo 詳細資料,指出個別字詞的信賴度,如下列範例所示。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startOffset": "0s",
              "endOffset": "0.300s",
              "word": "how",
              "confidence": SOME NUMBER
            },
            ...
          ]
        }
      ]
    }
  ]
}

在要求中啟用字詞層級信心值

下列程式碼片段示範如何使用本機和遠端檔案,在 Cloud Speech-to-Text 的語音轉錄要求中啟用字詞層級的信賴度。

使用本機檔案

REST

如需完整資訊,請參閱 speech:recognize API 端點。

如要執行同步語音辨識,請提出 POST 要求並提供適當的要求內容。以下為使用 curlPOST 要求範例,這個範例使用 Google Cloud CLI 產生存取權杖。如需 gcloud CLI 的安裝操作說明,請參閱快速入門導覽課程

以下範例說明如何使用 curl 傳送 POST 要求,其中要求主體會啟用字詞層級的信賴度。

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v2/projects/{project}/locations/global/recognizers/{recognizers}:recognize \
    --data '{
    "config": {
        "features": {
            "enableWordTimeOffsets": true,
            "enableWordConfidence": true
        }
    },
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
}' > word-level-confidence.txt

如果要求成功,伺服器會傳回 200 OK HTTP 狀態碼與 JSON 格式的回應,並另存成名為 word-level-confidence.txt 的檔案。

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startOffset": "0s",
              "endOffset": "0.300s",
              "word": "how",
              "confidence": 0.98762906
            },
            {
              "startOffset": "0.300s",
              "endOffset": "0.600s",
              "word": "old",
              "confidence": 0.96929157
            },
            {
              "startOffset": "0.600s",
              "endOffset": "0.800s",
              "word": "is",
              "confidence": 0.98271006
            },
            {
              "startOffset": "0.800s",
              "endOffset": "0.900s",
              "word": "the",
              "confidence": 0.98271006
            },
            {
              "startOffset": "0.900s",
              "endOffset": "1.100s",
              "word": "Brooklyn",
              "confidence": 0.98762906
            },
            {
              "startOffset": "1.100s",
              "endOffset": "1.500s",
              "word": "Bridge",
              "confidence": 0.98762906
            }
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}

Python

如要瞭解如何安裝及使用 Cloud STT 的用戶端程式庫,請參閱「Cloud STT 用戶端程式庫」。詳情請參閱「Cloud STT Python API 參考文件」。

如要向 Cloud STT 進行驗證,請設定應用程式預設憑證。詳情請參閱「為本機開發環境設定驗證機制」。

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/Google_Gnome.wav"

with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_confidence=True,
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")
    print(
        "First Word and Confidence: ({}, {})".format(
            alternative.words[0].word, alternative.words[0].confidence
        )
    )

return response.results