단어 수준 정확성 신뢰도 수준 지정

Cloud Speech-to-Text가 변환 텍스트에 있는 개별 단어의 정확도 또는 신뢰도 수준 값을 나타내도록 지정할 수 있습니다.

단어 수준의 신뢰도

Cloud Speech-to-Text가 오디오 클립을 텍스트로 변환하는 경우 응답의 정확도 수준도 측정합니다. Cloud STT에서 전송된 응답에는 전체 스크립트 작성 요청의 신뢰도 수준이 0.0에서 1.0 사이의 숫자로 표시됩니다. 다음 코드 샘플은 Cloud STT에서 반환된 신뢰도 수준 값의 예시를 보여줍니다.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.96748614
        }
      ]
    }
  ]
}

Cloud STT는 전체 스크립트 작성의 신뢰도 수준뿐 아니라 스크립트 내 개별 단어의 신뢰도 수준도 제공할 수 있습니다. 다음 예시와 같이 개별 단어의 신뢰도 수준을 나타내는 변환 텍스트의 WordInfo 세부정보가 응답에 포함됩니다.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startOffset": "0s",
              "endOffset": "0.300s",
              "word": "how",
              "confidence": SOME NUMBER
            },
            ...
          ]
        }
      ]
    }
  ]
}

요청에 단어 수준의 신뢰도 사용 설정

다음 코드 스니펫은 로컬 파일과 원격 파일을 사용하여 Cloud Speech-to-Text에 대한 스크립트 작성 요청에서 단어 수준의 신뢰도를 사용 설정하는 방법을 보여줍니다.

로컬 파일 사용

REST

자세한 내용은 speech:recognize API 엔드포인트를 참조하세요.

동기 음성 인식을 수행하려면 POST 요청을 하고 적절한 요청 본문을 제공합니다. 다음은 curl을 사용한 POST 요청의 예시입니다. 이 예시에서는 Google Cloud CLI를 사용하여 액세스 토큰을 생성합니다. gcloud CLI 설치에 대한 안내는 빠른 시작을 참조하세요.

다음 예시에서는 curl을 사용하여 POST 요청을 보내는 방법을 보여줍니다. 이 예시에서는 요청의 본문에서 단어 수준의 신뢰도를 사용하도록 설정합니다.

curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v2/projects/{project}/locations/global/recognizers/{recognizers}:recognize \
    --data '{
    "config": {
        "features": {
            "enableWordTimeOffsets": true,
            "enableWordConfidence": true
        }
    },
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
}' > word-level-confidence.txt

요청이 성공하면 서버는 200 OK HTTP 상태 코드와 응답을 JSON 형식으로 반환하여 word-level-confidence.txt라는 파일에 저장합니다.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98360395,
          "words": [
            {
              "startOffset": "0s",
              "endOffset": "0.300s",
              "word": "how",
              "confidence": 0.98762906
            },
            {
              "startOffset": "0.300s",
              "endOffset": "0.600s",
              "word": "old",
              "confidence": 0.96929157
            },
            {
              "startOffset": "0.600s",
              "endOffset": "0.800s",
              "word": "is",
              "confidence": 0.98271006
            },
            {
              "startOffset": "0.800s",
              "endOffset": "0.900s",
              "word": "the",
              "confidence": 0.98271006
            },
            {
              "startOffset": "0.900s",
              "endOffset": "1.100s",
              "word": "Brooklyn",
              "confidence": 0.98762906
            },
            {
              "startOffset": "1.100s",
              "endOffset": "1.500s",
              "word": "Bridge",
              "confidence": 0.98762906
            }
          ]
        }
      ],
      "languageCode": "en-us"
    }
  ]
}

Python

Cloud STT용 클라이언트 라이브러리를 설치하고 사용하는 방법은 Cloud STT 클라이언트 라이브러리를 참고하세요. 자세한 내용은 Cloud STT Python API 참고 문서를 확인하세요.

Cloud STT에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

speech_file = "resources/Google_Gnome.wav"

with open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_confidence=True,
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")
    print(
        "First Word and Confidence: ({}, {})".format(
            alternative.words[0].word, alternative.words[0].confidence
        )
    )

return response.results