Enable the profanity filter

This page describes how to use Cloud Speech-to-Text to automatically detect profane words in your audio data and censor them in the transcript.

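You can enable the profanity filter by setting profanityFilter=true in RecognitionFeatures. When the filter is enabled, Cloud Speech-to-Text attempts to detect profane words and returns only the first letter of each one, followed by asterisks, in the transcript (for example, f***). If this field is set to false or is not set, Cloud Speech-to-Text does not attempt to filter profanity. The Python sample on this page sets the equivalent profanity_filter field on RecognitionConfig.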
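
To make the masked format concrete, here is a minimal sketch that scans a transcript for censored tokens. The helper name `find_masked_words` and the pattern are assumptions for illustration, not part of the Cloud Speech-to-Text API. The pattern assumes a masked word is a single letter followed by one or more asterisks, as in the f*** example, and the service's exact masking rules may differ.

```python
import re

# Assumption: a masked word is one letter followed by one or more asterisks,
# matching the "f***" example. The service's exact rules may differ.
MASKED_WORD = re.compile(r"\b[A-Za-z]\*+")


def find_masked_words(transcript: str) -> list[str]:
    """Return the masked tokens found in a transcript (hypothetical helper)."""
    return MASKED_WORD.findall(transcript)


print(find_masked_words("well f*** that was close"))  # prints ['f***']
```

A check like this could be used to flag transcripts that contained filtered words without storing the words themselves.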

The following sample demonstrates how to enable the profanity filter when recognizing audio stored in a Cloud Storage bucket.

Python

To learn how to install and use the client library for Cloud STT, see Cloud STT client libraries. For more information, see the Cloud STT Python API reference documentation.

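To authenticate to Cloud STT, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.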

from google.cloud import speech
from google.cloud.speech import RecognizeResponse


def sync_recognize_with_profanity_filter_gcs(audio_uri: str) -> RecognizeResponse:
    """Recognizes speech from an audio file in Cloud Storage and filters out profane language.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio, e.g., gs://[BUCKET]/[FILE]
    Returns:
        RecognizeResponse: The full response object which includes the transcription results.
    """
    # Define the audio source
    audio = {"uri": audio_uri}

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,  # Audio format
        sample_rate_hertz=16000,
        language_code="en-US",
        # Enable profanity filter
        profanity_filter=True,
    )

    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        alternative = result.alternatives[0]
        print(f"Transcript: {alternative.transcript}")

    # Return the full response so the value matches the RecognizeResponse annotation
    return response
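
Given a RecognizeResponse such as the one `client.recognize` produces in the sample above, the top alternatives can be joined into a single transcript. The following is a minimal sketch. `join_transcripts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like a response so that the example runs without credentials or network access.

```python
from types import SimpleNamespace


def join_transcripts(response) -> str:
    """Concatenate the top alternative of each result (hypothetical helper)."""
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives  # skip results that carry no alternatives
    )


# Stand-in object shaped like a RecognizeResponse, used so the sketch runs offline.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(alternatives=[SimpleNamespace(transcript="what the f***")]),
        SimpleNamespace(alternatives=[SimpleNamespace(transcript=" is going on")]),
    ]
)

print(join_transcripts(fake_response))  # prints: what the f*** is going on
```

The helper relies only on the `results`, `alternatives`, and `transcript` attributes, the same ones the sample reads when it prints each transcript.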