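Chirp 3 Transcription: Enhanced multilingual accuracy

Chirp 3 is the latest generation of Google's multilingual generative models dedicated to automatic speech recognition (ASR), designed to meet user needs based on feedback and experience. Chirp 3 is more accurate and faster than previous Chirp models, and provides speaker diarization and automatic language detection.

Model details

Chirp 3: Transcription is available only in the Speech-to-Text API V2.

Model identifiers

You can use Chirp 3: Transcription like any other model: specify the appropriate model identifier in your recognition request when using the API, or select the model name in the Google Cloud console.

Model	Model identifier
Chirp 3	`chirp_3`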

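API methods

Not all recognition methods support the same set of languages. Because Chirp 3 is available in the Speech-to-Text API V2, it supports the following recognition methods:

API version	API method	Support
V2	Speech.StreamingRecognize (suited to streaming and real-time audio)	Supported
V2	Speech.Recognize (suited to audio shorter than one minute)	Supported
V2	Speech.BatchRecognize (suited to long audio, from 1 minute to 1 hour)	Supported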
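
The duration guidance in the table above can be expressed as a small routing helper. This is only an illustrative sketch: the function name `choose_recognition_method` and the `live` flag are hypothetical and not part of the Speech-to-Text client library, and the cut-offs simply mirror the table (under one minute, then up to one hour).

```python
from datetime import timedelta


def choose_recognition_method(duration: timedelta, live: bool = False) -> str:
    """Returns the name of the V2 method suggested by the table above.

    Hypothetical helper: thresholds are copied from the table, not queried
    from the API.
    """
    if live:
        # Streaming and real-time audio.
        return "StreamingRecognize"
    if duration < timedelta(minutes=1):
        # Short audio, under one minute.
        return "Recognize"
    if duration <= timedelta(hours=1):
        # Long audio, from one minute up to one hour.
        return "BatchRecognize"
    raise ValueError("Audio longer than 1 hour is outside the ranges listed above.")
```

For example, a 30-second clip maps to `Recognize`, while a 10-minute recording maps to `BatchRecognize`.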

Regional availability

Chirp 3 is available in the following Google Cloud regions, with more planned:

Google Cloud region	Launch readiness
`us` (multi-region)	GA
`eu` (multi-region)	GA
`asia-southeast1`	GA
`asia-northeast1`	GA
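
The samples later on this page build the API endpoint and the recognizer path from the region. A minimal sketch of that pattern, limited to the regions in the table above, might look like the following. The helper names and the `SUPPORTED_REGIONS` tuple are assumptions made for illustration, and the hard-coded list can go stale as regions are added.

```python
# Regions copied from the table above; treat this as a snapshot, not a source of truth.
SUPPORTED_REGIONS = ("us", "eu", "asia-southeast1", "asia-northeast1")


def regional_endpoint(region: str) -> str:
    """Builds the regional endpoint in the same form the samples on this page use."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Region {region!r} is not in the Chirp 3 table above.")
    return f"{region}-speech.googleapis.com"


def default_recognizer_path(project_id: str, region: str) -> str:
    """Builds the path of the default ("_") recognizer for a project and region."""
    return f"projects/{project_id}/locations/{region}/recognizers/_"
```

The two values must use the same region: a client pointed at the `eu` endpoint should reference a recognizer under `locations/eu`.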

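To find the latest list of supported Google Cloud regions, languages and locales, and features for each transcription model, use the Locations API.

Language availability for transcription

Chirp 3 supports transcription in StreamingRecognize, Recognize, and BatchRecognize in the following languages: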

Language	BCP-47 code	Launch readiness
Catalan (Spain)	`ca-ES`	GA
Chinese (Simplified, China)	`cmn-Hans-CN`	GA
Croatian (Croatia)	`hr-HR`	GA
Danish (Denmark)	`da-DK`	GA
Dutch (Netherlands)	`nl-NL`	GA
English (Australia)	`en-AU`	GA
English (United Kingdom)	`en-GB`	GA
English (India)	`en-IN`	GA
English (United States)	`en-US`	GA
Finnish (Finland)	`fi-FI`	GA
French (Canada)	`fr-CA`	GA
French (France)	`fr-FR`	GA
German (Germany)	`de-DE`	GA
Greek (Greece)	`el-GR`	GA
Hindi (India)	`hi-IN`	GA
Italian (Italy)	`it-IT`	GA
Japanese (Japan)	`ja-JP`	GA
Korean (Korea)	`ko-KR`	GA
Polish (Poland)	`pl-PL`	GA
Portuguese (Brazil)	`pt-BR`	GA
Portuguese (Portugal)	`pt-PT`	GA
Romanian (Romania)	`ro-RO`	GA
Russian (Russia)	`ru-RU`	GA
Spanish (Spain)	`es-ES`	GA
Spanish (United States)	`es-US`	GA
Swedish (Sweden)	`sv-SE`	GA
Turkish (Turkey)	`tr-TR`	GA
Ukrainian (Ukraine)	`uk-UA`	GA
Vietnamese (Vietnam)	`vi-VN`	GA
Arabic	`ar-XA`	Preview
Arabic (Algeria)	`ar-DZ`	Preview
Arabic (Bahrain)	`ar-BH`	Preview
Arabic (Egypt)	`ar-EG`	Preview
Arabic (Israel)	`ar-IL`	Preview
Arabic (Jordan)	`ar-JO`	Preview
Arabic (Kuwait)	`ar-KW`	Preview
Arabic (Lebanon)	`ar-LB`	Preview
Arabic (Mauritania)	`ar-MR`	Preview
Arabic (Morocco)	`ar-MA`	Preview
Arabic (Oman)	`ar-OM`	Preview
Arabic (Qatar)	`ar-QA`	Preview
Arabic (Saudi Arabia)	`ar-SA`	Preview
Arabic (State of Palestine)	`ar-PS`	Preview
Arabic (Syria)	`ar-SY`	Preview
Arabic (Tunisia)	`ar-TN`	Preview
Arabic (United Arab Emirates)	`ar-AE`	Preview
Arabic (Yemen)	`ar-YE`	Preview
Armenian (Armenia)	`hy-AM`	Preview
Bengali (Bangladesh)	`bn-BD`	Preview
Bengali (India)	`bn-IN`	Preview
Bulgarian (Bulgaria)	`bg-BG`	Preview
Burmese (Myanmar)	`my-MM`	Preview
Central Kurdish (Iraq)	`ar-IQ`	Preview
Chinese, Cantonese (Traditional, Hong Kong)	`yue-Hant-HK`	Preview
Chinese, Mandarin (Traditional, Taiwan)	`cmn-Hant-TW`	Preview
Czech (Czech Republic)	`cs-CZ`	Preview
English (Philippines)	`en-PH`	Preview
Estonian (Estonia)	`et-EE`	Preview
Filipino (Philippines)	`fil-PH`	Preview
Gujarati (India)	`gu-IN`	Preview
Hebrew (Israel)	`iw-IL`	Preview
Hungarian (Hungary)	`hu-HU`	Preview
Indonesian (Indonesia)	`id-ID`	Preview
Kannada (India)	`kn-IN`	Preview
Khmer (Cambodia)	`km-KH`	Preview
Lao (Laos)	`lo-LA`	Preview
Latvian (Latvia)	`lv-LV`	Preview
Lithuanian (Lithuania)	`lt-LT`	Preview
Malay (Malaysia)	`ms-MY`	Preview
Malayalam (India)	`ml-IN`	Preview
Marathi (India)	`mr-IN`	Preview
Nepali (Nepal)	`ne-NP`	Preview
Norwegian (Norway)	`no-NO`	Preview
Persian (Iran)	`fa-IR`	Preview
Serbian (Serbia)	`sr-RS`	Preview
Slovak (Slovakia)	`sk-SK`	Preview
Slovenian (Slovenia)	`sl-SI`	Preview
Spanish (Mexico)	`es-MX`	Preview
Swahili	`sw`	Preview
Tamil (India)	`ta-IN`	Preview
Telugu (India)	`te-IN`	Preview
Thai (Thailand)	`th-TH`	Preview
Uzbek (Uzbekistan)	`uz-UZ`	Preview

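Language availability for diarization

Chirp 3 supports transcription with speaker diarization only in BatchRecognize and Recognize, in the following languages:

Language	BCP-47 code
Chinese (Simplified, China)	`cmn-Hans-CN`
German (Germany)	`de-DE`
English (United Kingdom)	`en-GB`
English (India)	`en-IN`
English (United States)	`en-US`
Spanish (Spain)	`es-ES`
Spanish (United States)	`es-US`
French (Canada)	`fr-CA`
French (France)	`fr-FR`
Hindi (India)	`hi-IN`
Italian (Italy)	`it-IT`
Japanese (Japan)	`ja-JP`
Korean (Korea)	`ko-KR`
Portuguese (Brazil)	`pt-BR`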
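
Because diarization covers fewer languages than transcription, it can help to check a language code before requesting diarization. The sketch below hard-codes the codes from the table above. The names `DIARIZATION_LANGUAGE_CODES` and `supports_diarization` are hypothetical, and the list is a snapshot that should be refreshed if the table changes.

```python
# Language codes copied from the diarization table above (snapshot, may go stale).
DIARIZATION_LANGUAGE_CODES = frozenset({
    "cmn-Hans-CN", "de-DE", "en-GB", "en-IN", "en-US", "es-ES", "es-US",
    "fr-CA", "fr-FR", "hi-IN", "it-IT", "ja-JP", "ko-KR", "pt-BR",
})


def supports_diarization(language_code: str) -> bool:
    """Returns True if the code appears in the diarization table above."""
    return language_code in DIARIZATION_LANGUAGE_CODES
```

A caller could use this check to decide whether to add a diarization configuration to the request or to fall back to plain transcription.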

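Feature support and limitations

Chirp 3 supports the following features:

Feature	Description	Launch stage
Automatic punctuation	Generated automatically by the model; can optionally be disabled.	GA
Automatic capitalization	Generated automatically by the model; can optionally be disabled.	GA
Utterance-level timestamps	Generated automatically by the model.	GA
Speaker diarization	Automatically identifies the different speakers in a single-channel audio sample. Available only in `BatchRecognize`.	GA
Speech adaptation (biasing)	Provides hints to the model in the form of phrases or words to improve recognition accuracy for specific terms or proper nouns.	GA
Language-agnostic audio transcription	Automatically infers the most prevalent language and transcribes in it.	GA

Chirp 3 doesn't support the following features:

Feature	Description
Word-level timestamps	Generated automatically by the model and can optionally be enabled, but some degradation in transcription quality is expected.
Word-level confidence scores	The API returns a value, but it isn't a true confidence score.

Transcribe using Chirp 3

Learn how to use Chirp 3 for transcription tasks.

Perform streaming speech recognition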

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_streaming_chirp3(
   audio_file: str
) -> list[cloud_speech.StreamingRecognizeResponse]:
   """Transcribes audio from audio file stream using the Chirp 3 model of Google Cloud Speech-to-Text v2 API.

   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"

   Returns:
       list[cloud_speech.StreamingRecognizeResponse]: The streaming responses
       from the Speech-to-Text API V2, in the order they were received.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       content = f.read()

   # In practice, stream should be a generator yielding chunks of audio data
   # max(1, ...) keeps the range step valid for files shorter than 5 bytes
   chunk_length = max(1, len(content) // 5)
   stream = [
       content[start : start + chunk_length]
       for start in range(0, len(content), chunk_length)
   ]
   audio_requests = (
       cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
   )

   recognition_config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )
   streaming_config = cloud_speech.StreamingRecognitionConfig(
       config=recognition_config
   )
   config_request = cloud_speech.StreamingRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       streaming_config=streaming_config,
   )

   def requests(config: cloud_speech.StreamingRecognizeRequest, audio):
       # Generator: the configuration request goes first, then the audio requests
       yield config
       yield from audio

   # Transcribes the audio into text
   responses_iterator = client.streaming_recognize(
       requests=requests(config_request, audio_requests)
   )
   responses = []
   for response in responses_iterator:
       responses.append(response)
       for result in response.results:
           print(f"Transcript: {result.alternatives[0].transcript}")

   return responses
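
The sample above loads the whole file and slices it into five pieces, and its own comment notes that a real stream should be a generator yielding chunks of audio. One possible shape for such a generator is sketched below. The name `iter_audio_chunks` and the default `chunk_size` of 8192 bytes are arbitrary choices made for illustration, not values taken from the API, so check the service's streaming limits before choosing a size.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yields successive chunks of at most chunk_size bytes until the stream ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        yield chunk


# Usage with an in-memory stream standing in for a file or microphone source.
example_chunks = list(iter_audio_chunks(io.BytesIO(b"a" * 10), chunk_size=4))
```

Each yielded chunk could then be wrapped in a `cloud_speech.StreamingRecognizeRequest(audio=chunk)`, as the sample above does for its list of slices.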

Perform synchronous speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Perform batch speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text v2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input audio file.
           E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The transcript for the input file,
       returned inline by the Speech-to-Text API.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Transcribes the audio into text
   operation = client.batch_recognize(request=request)

   print("Waiting for operation to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response.results[audio_uri].transcript

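Use Chirp 3 features

Explore how to use the latest features, with code examples:

Perform a language-agnostic transcription

Chirp 3 can automatically identify the dominant language spoken in the audio and transcribe it, which is essential for multilingual applications. To do this, set language_codes=["auto"] as shown in the code example: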

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file and auto-detects the spoken language using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["auto"],  # Set language code to auto to detect language.
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response
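
When the language is detected automatically, each result carries its own `language_code`, as the print statement in the sample above shows. A short, hedged sketch of how those per-result codes could be tallied follows. The helper name `count_detected_languages` is hypothetical, and the `SimpleNamespace` objects only stand in for real results so that the snippet runs without calling the API.

```python
from collections import Counter
from types import SimpleNamespace


def count_detected_languages(results) -> Counter:
    """Counts how many results were attributed to each detected language code."""
    return Counter(r.language_code for r in results if r.language_code)


# Stand-in objects with the same attribute the sample above reads from each result.
fake_results = [
    SimpleNamespace(language_code="en-US"),
    SimpleNamespace(language_code="fr-FR"),
    SimpleNamespace(language_code="en-US"),
    SimpleNamespace(language_code=""),
]
detected = count_detected_languages(fake_results)
```

With a real response, passing `response.results` would give a quick view of which languages the model reported across the audio.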

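Perform a language-restricted transcription

Chirp 3 can automatically identify and transcribe the dominant language in an audio file. You can also condition it on the specific locales you expect, for example ["en-US", "fr-FR"]. This focuses the model's resources on the most probable languages and gives more reliable results, as shown in the code example: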

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using Chirp 3, restricting language detection to a set of expected locales.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US", "fr-FR"],  # Set language codes of the expected spoken locales
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

Perform transcription and speaker diarization

Use Chirp 3 for transcription and diarization tasks.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI with speaker diarization, using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input
         audio file. E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The transcript for the input
         file, returned inline by the Speech-to-Text API.
   """

   # Instantiates a client.
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],  # Use "auto" to detect language.
       model="chirp_3",
       features=cloud_speech.RecognitionFeatures(
           # Enable diarization by setting empty diarization configuration.
           diarization_config=cloud_speech.SpeakerDiarizationConfig(),
       ),
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Creates audio transcription job.
   operation = client.batch_recognize(request=request)

   print("Waiting for transcription job to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")
       print(f"Speakers per word: {result.alternatives[0].words}")

   return response.results[audio_uri].transcript
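
The sample above prints the raw `words` list for each result. To make the diarization output easier to read, consecutive words from the same speaker can be merged into turns. The sketch below assumes each word entry exposes `word` and `speaker_label` attributes, as the V2 `WordInfo` message does. The helper name `speaker_turns` is hypothetical, and `SimpleNamespace` objects stand in for real words so the snippet runs offline.

```python
from itertools import groupby
from types import SimpleNamespace


def speaker_turns(words) -> list[tuple[str, str]]:
    """Merges consecutive words that share a speaker label into (label, text) turns."""
    return [
        (label, " ".join(w.word for w in group))
        for label, group in groupby(words, key=lambda w: w.speaker_label)
    ]


# Stand-in word entries; a real call would pass result.alternatives[0].words.
fake_words = [
    SimpleNamespace(word="hello", speaker_label="1"),
    SimpleNamespace(word="there", speaker_label="1"),
    SimpleNamespace(word="hi", speaker_label="2"),
    SimpleNamespace(word="again", speaker_label="1"),
]
turns = speaker_turns(fake_words)
```

`groupby` only merges adjacent items, which is the desired behavior here: a speaker who talks twice produces two separate turns.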

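Improve accuracy with model adaptation

Chirp 3 can improve transcription accuracy for your specific audio through model adaptation. You provide a list of specific words and phrases, which increases the likelihood that the model recognizes them. This is especially useful for domain-specific terms, proper nouns, or unique vocabulary.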

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_model_adaptation(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       # Use model adaptation
       adaptation=cloud_speech.SpeechAdaptation(
         phrase_sets=[
             cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                 inline_phrase_set=cloud_speech.PhraseSet(phrases=[
                   {
                       "value": "alphabet",
                   },
                   {
                         "value": "cell phone service",
                   }
                 ])
             )
         ]
       )
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response
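
The sample above writes the phrase list by hand as dictionaries with a `value` key. If the terms come from a glossary or a file, a small helper can build that list. The sketch below is one way to do it: `build_inline_phrases` is a hypothetical name, and the trimming and case-insensitive de-duplication are illustrative choices rather than API requirements.

```python
def build_inline_phrases(terms: list[str]) -> list[dict[str, str]]:
    """Turns raw terms into the {"value": ...} dictionaries used in the sample above.

    Blank entries are skipped and duplicates are dropped case-insensitively,
    keeping the first spelling seen.
    """
    seen = set()
    phrases = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            phrases.append({"value": cleaned})
    return phrases


example_phrases = build_inline_phrases(
    ["alphabet", " cell phone service ", "Alphabet", ""]
)
```

The resulting list could be passed as `phrases=` when constructing the inline `cloud_speech.PhraseSet`, in place of the hand-written dictionaries.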

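Enable the denoiser and SNR filtering

Chirp 3 can improve audio quality by reducing background noise and filtering out unwanted sounds before transcription. To get better results in noisy environments, enable the built-in denoiser and signal-to-noise ratio (SNR) filtering.

Setting denoise_audio=true can effectively reduce background music or noise such as rain and street traffic.

You can set snr_threshold=X to control the minimum loudness of speech required for transcription. This helps filter out non-speech audio or background noise and prevents unwanted text in your results. A higher snr_threshold means the user needs to speak louder for the model to transcribe the utterance.

In real-time streaming use cases, SNR filtering can keep unnecessary sounds from being sent to the model for transcription. A higher value for this setting means that the speech must be louder relative to the background noise to be sent to the transcription model.

The effect of snr_threshold depends on whether denoise_audio is true or false. When denoise_audio=true, background noise is removed and speech becomes clearer, so the overall SNR of the audio increases.

If your use case involves only the user's voice, with no one else speaking, set denoise_audio=true to increase the sensitivity of SNR filtering, which helps filter out non-speech noise. If your use case involves people speaking in the background and you want to avoid transcribing background speech, consider setting denoise_audio=false and lowering the SNR threshold.

The following are recommended SNR threshold values. A reasonable snr_threshold value ranges from 0 to 1000. A value of 0 means nothing is filtered, and 1000 means everything is filtered. If the recommended settings don't work for you, fine-tune the value.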

Denoise audio	SNR threshold	Speech sensitivity
true	10.0	High
true	20.0	Medium
true	40.0	Low
true	100.0	Very low
false	0.5	High
false	1.0	Medium
false	2.0	Low
false	5.0	Very low
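
The recommended values in the table above can be kept in a lookup so that application code selects a sensitivity level rather than a raw number. This is a sketch under stated assumptions: `RECOMMENDED_SNR_THRESHOLDS` and `denoiser_settings` are hypothetical names, the numbers are copied from the table, and the returned dictionary uses the `denoise_audio` and `snr_threshold` field names described in this section.

```python
# (denoise_audio, sensitivity) -> recommended snr_threshold, copied from the table above.
RECOMMENDED_SNR_THRESHOLDS = {
    (True, "high"): 10.0,
    (True, "medium"): 20.0,
    (True, "low"): 40.0,
    (True, "very low"): 100.0,
    (False, "high"): 0.5,
    (False, "medium"): 1.0,
    (False, "low"): 2.0,
    (False, "very low"): 5.0,
}


def denoiser_settings(denoise_audio: bool, sensitivity: str) -> dict:
    """Returns a settings dictionary with the recommended threshold for a sensitivity level."""
    key = (denoise_audio, sensitivity.lower())
    if key not in RECOMMENDED_SNR_THRESHOLDS:
        raise ValueError(f"Unknown sensitivity level: {sensitivity!r}")
    return {
        "denoise_audio": denoise_audio,
        "snr_threshold": RECOMMENDED_SNR_THRESHOLDS[key],
    }
```

Note that higher speech sensitivity corresponds to a lower threshold, and that the thresholds with denoising enabled are on a larger scale because denoising raises the overall SNR of the audio.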

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_with_denoiser(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text v2 API, with the denoiser and SNR filtering enabled.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       denoiser_config={
           # Keys must be strings; bare names would raise NameError
           "denoise_audio": True,
           # Medium snr threshold
           "snr_threshold": 20.0,
       }
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

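Use Chirp 3 in the Google Cloud console

Sign up for a Google Cloud account and create a project.
Go to Speech in the Google Cloud console.
If the API isn't enabled, enable it.
Make sure that you have an STT console workspace. If you don't have one, create a workspace:
1. Go to the Transcriptions page and click New Transcription.
2. Open the Workspace drop-down and click New Workspace to create a workspace for transcription.
3. In the Create a new workspace navigation sidebar, click Browse.
4. Click to create a new bucket.
5. Enter a name for your bucket and click Continue.
6. Click Create to create your Cloud Storage bucket.
7. After the bucket is created, click Select to select the bucket to use.
8. Click Create to finish creating your workspace for the Speech-to-Text API V2 console.
Perform a transcription on your actual audio.

On the New Transcription page, select your audio file either by uploading it (Local upload) or by specifying an existing Cloud Storage file (Cloud storage).
Click Continue to move to the Transcription options.
1. Select the Spoken language that you plan to use for recognition with Chirp from your previously created recognizer.
2. In the model drop-down, select chirp_3.
3. In the Recognizer drop-down, select your newly created recognizer.
4. Click Submit to run your first recognition request using chirp_3.
View your Chirp 3 transcription result.
1. On the Transcriptions page, click the name of the transcription to view its result.
2. On the Transcription details page, view your transcription result and optionally play back the audio in the browser.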

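What's next

Learn how to transcribe short audio files.
Learn how to transcribe streaming audio.
Learn how to transcribe long audio files.
For best performance, accuracy, and other tips, see the best practices documentation.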
