
Migrate to the latest Cloud Speech-to-Text API version

Cloud Speech-to-Text API V2 brings customers the latest Google Cloud API design so that they can meet enterprise security and regulatory requirements out of the box.

These requirements are met through the following:

Data residency: Cloud STT V2 offers the broad range of existing transcription models in Google Cloud regions such as Belgium or Singapore. This lets you invoke the transcription models through a fully regionalized service.
Recognizer resourcefulness: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features.
Logging: Resource creation and transcriptions generate logs that are available in the Google Cloud console, which allows for better telemetry and debugging.
Encryption: Cloud Speech-to-Text V2 supports customer-managed encryption keys for all resources, as well as for batch transcription.
Audio auto-detect: Cloud Speech-to-Text V2 can automatically detect the sample rate, channel count, and format of your audio files, so you don't need to provide that information in the request configuration.

Migrate from V1 to V2

Migration from the V1 API to the V2 API doesn't happen automatically. To take advantage of the new features, you need to make minimal changes to your implementation.

Migrate in the API

As in Cloud STT V1, to transcribe audio you create a RecognitionConfig by selecting the language of your audio and the recognition model of your choice:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request, containing
        the transcription results
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response

If needed, select a region in which you want to use the Cloud Speech-to-Text API, and check the language and model availability in that region:

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. It allows for specifying the location
        to potentially reduce latency and meet data residency requirements.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"
    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    return response
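
In the regional sample above, the endpoint and the recognizer path both depend on the same location, and they need to agree. The following is a minimal sketch of a helper that derives both from one value. `SpeechTarget` and `speech_target` are hypothetical names introduced here for illustration and are not part of the client library; `my-project` is a placeholder project ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechTarget:
    """Endpoint and recognizer path that belong to the same location."""

    api_endpoint: str
    recognizer: str


def speech_target(project_id: str, location: str = "global") -> SpeechTarget:
    """Derive the endpoint and default-recognizer path for a location."""
    # "global" uses the default endpoint; a region follows the
    # "{location}-speech.googleapis.com" pattern used in the sample above.
    if location == "global":
        endpoint = "speech.googleapis.com"
    else:
        endpoint = f"{location}-speech.googleapis.com"
    # "_" refers to the implicit default recognizer, as in the samples above.
    recognizer = f"projects/{project_id}/locations/{location}/recognizers/_"
    return SpeechTarget(api_endpoint=endpoint, recognizer=recognizer)


# Placeholder project ID, for illustration only.
target = speech_target("my-project", "europe-west3")
print(target.api_endpoint)
print(target.recognizer)
```

The `api_endpoint` value would be passed to `ClientOptions` and the `recognizer` value to `RecognizeRequest`, so a request is never sent to one region while naming a recognizer in another.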

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and default recognition configuration.
    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.
    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )
    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer
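
After a recognizer exists, a request can reference it by ID instead of repeating the configuration. The following is a minimal sketch, assuming the recognizer was created in the global location as in the sample above. `recognizer_name` and `transcribe_with_recognizer` are illustrative names chosen here, not client library functions.

```python
import os

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def recognizer_name(project_id: str, location: str, recognizer_id: str) -> str:
    """Build the full resource name of a recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


def transcribe_with_recognizer(
    audio_file: str, recognizer_id: str
) -> "cloud_speech.RecognizeResponse":
    """Transcribe a local file with a previously created recognizer."""
    # Imported here so that recognizer_name() stays usable without the
    # client library installed (a choice made for this sketch only).
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Reads the file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    client = SpeechClient()

    # No config is passed: the recognizer's default_recognition_config applies.
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_name(PROJECT_ID, "global", recognizer_id),
        content=audio_content,
    )
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
    return response
```

A config passed in the request would override the recognizer's defaults, so omitting it is what makes the stored configuration take effect.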

There are other differences in the requests and responses in the new V2 API. For more details, see the reference documentation.
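
To give a feel for the kind of change involved, the following sketch translates a few V1 `RecognitionConfig` fields, written as a plain dictionary, into the V2 field layout. It covers only a handful of fields and is an illustration under stated assumptions, not a complete or official mapping. `v1_config_to_v2` is a hypothetical helper, and the V2 model is passed explicitly because model names are not assumed to carry over between versions.

```python
from typing import Any

# Fields that sit directly on the V1 config and under `features` in V2.
_FEATURE_FIELDS = (
    "enable_automatic_punctuation",
    "enable_word_time_offsets",
    "profanity_filter",
    "max_alternatives",
)


def v1_config_to_v2(v1_config: dict[str, Any], model: str) -> dict[str, Any]:
    """Translate a V1-style config dict into V2 field names (partial sketch)."""
    # V1 has one primary language plus optional alternatives;
    # V2 takes a single list of language codes.
    language_codes = [v1_config["language_code"]]
    language_codes += v1_config.get("alternative_language_codes", [])

    v2_config: dict[str, Any] = {
        # V1's encoding, sample_rate_hertz and audio_channel_count are dropped
        # here in favor of V2 audio auto-detection (an assumption of this sketch).
        "auto_decoding_config": {},
        "language_codes": language_codes,
        "model": model,
    }

    features = {k: v1_config[k] for k in _FEATURE_FIELDS if k in v1_config}
    if features:
        v2_config["features"] = features
    return v2_config


v1 = {
    "encoding": "LINEAR16",
    "sample_rate_hertz": 16000,
    "language_code": "en-US",
    "enable_automatic_punctuation": True,
}
print(v1_config_to_v2(v1, model="long"))
```

Audio that auto-detection cannot handle would need an explicit decoding configuration instead, which this sketch does not attempt.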

Migrate in the UI

To migrate through the Speech Google Cloud console, follow these steps:

Go to the Speech Google Cloud console.
Navigate to the Transcriptions page.
Click New Transcription and select your audio in the Audio configuration tab.
In the Transcription options tab, select V2.

What's next

Use client libraries to transcribe audio using your preferred programming language.

Learn how to transcribe short audio files.

Learn how to transcribe streaming audio.

Learn how to transcribe long audio files.
