‫Google משתמשת בטכנולוגיית AI כדי לתרגם תוכן לשפה המועדפת עליך. בתרגומים כאלו עשויות להיות שגיאות.

תמלול דיבור לטקסט באמצעות ספריות לקוח

בדף הזה מוסבר איך לשלוח בקשה לזיהוי דיבור ל-Cloud Speech-to-Text בשפת התכנות המועדפת באמצעותGoogle Cloud ספריות הלקוח.

‫Cloud Speech-to-Text מאפשר לכם לשלב בקלות את הטכנולוגיות של Google לזיהוי דיבור באפליקציות של מפתחים. אפשר לשלוח נתוני אודיו אל Cloud Speech-to-Text API, ואז הממשק יחזיר תמלול טקסט של קובץ האודיו הזה. מידע נוסף על השירות זמין במאמר היסודות של Cloud STT.

לפני שמתחילים

לפני ששולחים בקשה ל-Cloud Speech-to-Text API, צריך לבצע את הפעולות הבאות. פרטים נוספים מופיעים בדף לפני שמתחילים.

הפעלת Cloud Speech-to-Text בפרויקט ב- Google Cloud .
מוודאים שהחיוב מופעל עבור Cloud Speech-to-Text.
התקינו את ה-CLI של Google Cloud. אחר כך, אתחלו את ה-CLI של Google Cloud באמצעות הפקודה הבאה:
```
gcloud init
```
אם אתם משתמשים בספק זהויות חיצוני (IdP), קודם אתם צריכים להיכנס ל-CLI של gcloud באמצעות המאגר המאוחד לניהול זהויות.
אם אתם משתמשים במעטפת מקומית, אתם צריכים ליצור פרטי כניסה לאימות מקומי עבור חשבון המשתמש:
```
gcloud auth application-default login
```
אם אתם משתמשים ב-Cloud Shell, אין צורך לבצע את הפעולה הזו.

אם מוחזרת שגיאת אימות ואתם משתמשים בספק זהויות חיצוני (IdP), ודאו ש נכנסתם ל-CLI של gcloud באמצעות המאגר המאוחד לניהול זהויות.
מוודאים שיש את ההרשאות הנדרשות כדי להשלים את ההדרכה. אם משתמשים בפרויקט חדש, לא צריך לוודא כי כבר יש את ההרשאות הנדרשות.
(אופציונלי) יוצרים קטגוריה חדשה ב-Cloud Storage כדי לאחסן את נתוני האודיו.

התפקידים הנדרשים

כדי לקבל את ההרשאות שנדרשות לתמלול דיבור לטקסט, צריך לבקש מהאדמין להקצות לכם את תפקיד ה-IAM‏ Service Usage Consumer (roles/serviceusage.serviceUsageConsumer) בפרויקט. כדי לקרוא הסבר על מתן תפקידים, ראו איך מנהלים את הגישה ברמת הפרויקט, התיקייה והארגון.

יכול להיות שאפשר לקבל את ההרשאות הנדרשות גם באמצעות תפקידים בהתאמה אישית או תפקידים מוגדרים מראש.

התקנת ספריית הלקוח

Go

go get cloud.google.com/go/speech/apiv1

Java

אם משתמשים ב-Maven, צריך להוסיף את הקוד הבא לקובץ pom.xml. במאמר העוסק בספריות BOM ל-Google Cloud Platform תוכלו לקרוא מידע נוסף על עצי מוצרים (BOM).

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.83.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
  </dependency>
</dependencies>

אם משתמשים ב-Gradle, צריך להוסיף את הקוד הבא ליחסי התלות:

implementation platform('com.google.cloud:libraries-bom:26.83.0')

implementation 'com.google.cloud:google-cloud-speech'

אם משתמשים ב-sbt, צריך להוסיף את הקוד הבא ליחסי התלות:

libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "4.89.0"

אם משתמשים ב-Visual Studio Code או ב-IntelliJ, אפשר להוסיף את ספריות הלקוח לפרויקט באמצעות יישומי הפלאגין הבאים של IDE:

באמצעות יישומי הפלאגין תוכלו להשתמש בפונקציות נוספות, כמו ניהול מפתחות לחשבונות שירות. לפרטים נוספים, קראו את מאמרי העזרה של כל אחד מיישומי הפלאגין.

Node.js

לפני שמתקינים את הספרייה, חשוב לוודא שהכנתם את הסביבה לפיתוח Node.js.

npm install @google-cloud/speech

Python

לפני שמתקינים את הספרייה, צריך לוודא שהכנתם את הסביבה לפיתוח בשפת Python.

pip install --upgrade google-cloud-speech

שליחת בקשה לתמלול אודיו

עכשיו אפשר להשתמש ב-Cloud STT כדי לתמלל קובץ אודיו לטקסט. משתמשים בקוד הבא כדי לשלוח בקשת recognize ל-Cloud Speech-to-Text API.

Go


// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"context"
	"fmt"
	"log"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}
	defer client.Close()

	// The path to the remote audio file to transcribe.
	fileURI := "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: fileURI},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}

Java

// Imports the Google Cloud client library
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import java.util.List;

public class QuickstartSample {

  /** Demonstrates using the Speech API to transcribe an audio file. */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String gcsUri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw";

      // Builds the sync recognize request
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(AudioEncoding.LINEAR16)
              .setSampleRateHertz(16000)
              .setLanguageCode("en-US")
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}

Node.js

לפני שמריצים את הדוגמה, חשוב לוודא שהכנתם את הסביבה לפיתוח ב-Node.js.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

async function quickstart() {
  // The path to the remote LINEAR16 file
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    uri: gcsUri,
  };
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}
quickstart();

Python

לפני שמריצים את הדוגמה, חשוב לוודא שהכנתם את הסביבה לפיתוח בשפת Python.


# Imports the Google Cloud client library


from google.cloud import speech



def run_quickstart() -> speech.RecognizeResponse:
    # Instantiates a client
    client = speech.SpeechClient()

    # The name of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

    audio = speech.RecognitionAudio(uri=gcs_uri)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

כל הכבוד! שלחת את הבקשה הראשונה ל-Cloud STT.

אם מתקבלת שגיאה או תגובה ריקה מ-Cloud STT, כדאי לעיין בשלבים לפתרון בעיות ולצמצום שגיאות.

הסרת המשאבים

כדי לא לצבור חיובים לחשבון Google Cloud על המשאבים שבהם השתמשתם בדף הזה, פועלים לפי השלבים הבאים:

אם אתם לא צריכים את הפרויקט, אתם יכולים להשתמש באפשרות Google Cloud console כדי למחוק אותו.

המאמרים הבאים

מתרגלים תמלול של קובצי אודיו קצרים.
איך מעבדים קבוצות של קובצי אודיו ארוכים לזיהוי דיבור
כך מתמללים אודיו בסטרימינג, למשל ממיקרופון.
כדי להתחיל להשתמש ב-Cloud STT בשפה הרצויה, אפשר להשתמש בספריית לקוח של Cloud STT.
עוברים על אפליקציות לדוגמה.
לשיפור הביצועים והדיוק, ולקבלת טיפים נוספים, אפשר לעיין במאמר בנושא שיטות מומלצות.