Trascrizione asincrona di un file audio in Cloud Storage

Esegui la trascrizione asincrona di un file audio archiviato in Cloud Storage.

Per saperne di più

Per la documentazione dettagliata che include questo esempio di codice, vedi quanto segue:

Esempio di codice

Go

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT. Per saperne di più, consulta la documentazione di riferimento dell'API Cloud STT Go.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.


func sendGCS(w io.Writer, client *speech.Client, gcsURI string) error {
	ctx := context.Background()

	// Send the contents of the audio file with the encoding and
	// and sample rate information to be transcripted.
	req := &speechpb.LongRunningRecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: gcsURI},
		},
	}

	op, err := client.LongRunningRecognize(ctx, req)
	if err != nil {
		return err
	}
	resp, err := op.Wait(ctx)
	if err != nil {
		return err
	}

	// Print the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Fprintf(w, "\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
	return nil
}

Java

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT. Per saperne di più, consulta la documentazione di riferimento dell'API Cloud STT Java.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.

/**
 * Performs non-blocking speech recognition on remote FLAC file and prints the transcription.
 *
 * @param gcsUri the path to the remote LINEAR16 audio file to transcribe.
 */
public static void asyncRecognizeGcs(String gcsUri) throws Exception {
  // Configure polling algorithm
  SpeechSettings.Builder speechSettings = SpeechSettings.newBuilder();
  TimedRetryAlgorithm timedRetryAlgorithm =
      OperationTimedPollAlgorithm.create(
          RetrySettings.newBuilder()
              .setInitialRetryDelay(Duration.ofMillis(500L))
              .setRetryDelayMultiplier(1.5)
              .setMaxRetryDelay(Duration.ofMillis(5000L))
              .setInitialRpcTimeout(Duration.ZERO) // ignored
              .setRpcTimeoutMultiplier(1.0) // ignored
              .setMaxRpcTimeout(Duration.ZERO) // ignored
              .setTotalTimeout(Duration.ofHours(24L)) // set polling timeout to 24 hours
              .build());
  speechSettings.longRunningRecognizeOperationSettings().setPollingAlgorithm(timedRetryAlgorithm);

  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  try (SpeechClient speech = SpeechClient.create(speechSettings.build())) {

    // Configure remote file request for FLAC
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speech.longRunningRecognizeAsync(config, audio);
    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    List<SpeechRecognitionResult> results = response.get().getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s\n", alternative.getTranscript());
    }
  }
}

Node.js

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT. Per saperne di più, consulta la documentazione di riferimento dell'API Cloud STT Node.js.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
};

const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);
// Get a Promise representation of the final result of the job
const [response] = await operation.promise();
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log(`Transcription: ${transcription}`);

PHP

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.

use Google\Cloud\Speech\V2\Client\SpeechClient;
use Google\Cloud\Speech\V2\BatchRecognizeRequest;
use Google\Cloud\Speech\V2\RecognitionConfig;
use Google\Cloud\Speech\V2\BatchRecognizeFileMetadata;
use Google\Cloud\Speech\V2\ExplicitDecodingConfig;
use Google\Cloud\Speech\V2\ExplicitDecodingConfig\AudioEncoding;
use Google\Cloud\Speech\V2\RecognitionOutputConfig;
use Google\Cloud\Speech\V2\InlineOutputConfig;

/**
 * @param string $projectId The Google Cloud project ID.
 * @param string $location The location of the recognizer.
 * @param string $recognizerId The ID of the recognizer to use.
 * @param string $uri The Cloud Storage object to transcribe (other than global)
 *                    e.x. gs://cloud-samples-data/speech/brooklyn_bridge.raw
 */
function transcribe_async_gcs(string $projectId, string $location, string $recognizerId, string $uri)
{
    $apiEndpoint = $location === 'global' ? null : sprintf('%s-speech.googleapis.com', $location);
    $speech = new SpeechClient(['apiEndpoint' => $apiEndpoint]);
    $recognizerName = SpeechClient::recognizerName($projectId, $location, $recognizerId);

    $config = (new RecognitionConfig())
        // Can also use {@see Google\Cloud\Speech\V2\ExplicitDecodingConfig}
        // ->setExplicitDecodingConfig(new ExplicitDecodingConfig([...]);

        ->setExplicitDecodingConfig(new ExplicitDecodingConfig([
            // change these variables if necessary
            'encoding' => AudioEncoding::LINEAR16,
            'sample_rate_hertz' => 16000,
            'audio_channel_count' => 1,
        ]));

    $outputConfig = (new RecognitionOutputConfig())
        ->setInlineResponseConfig(new InlineOutputConfig());

    $file = new BatchRecognizeFileMetadata();
    $file->setUri($uri);

    $request = (new BatchRecognizeRequest())
        ->setRecognizer($recognizerName)
        ->setConfig($config)
        ->setFiles([$file])
        ->setRecognitionOutputConfig($outputConfig);

    try {
        $operation = $speech->batchRecognize($request);
        $operation->pollUntilComplete();

        if ($operation->operationSucceeded()) {
            $response = $operation->getResult();
            foreach ($response->getResults() as $result) {
                if ($result->getError()) {
                    print('Error: ' . $result->getError()->getMessage());
                }
                // get the most likely transcription
                $transcript = $result->getInlineResult()->getTranscript();
                foreach ($transcript->getResults() as $transacriptResult) {
                    $alternatives = $transacriptResult->getAlternatives();
                    $mostLikely = $alternatives[0];
                    $transcript = $mostLikely->getTranscript();
                    $confidence = $mostLikely->getConfidence();
                    printf('Transcript: %s' . PHP_EOL, $transcript);
                    printf('Confidence: %s' . PHP_EOL, $confidence);
                }
            }
        } else {
            print_r($operation->getError());
        }
    } finally {
        $speech->close();
    }
}

Python

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT. Per saperne di più, consulta la documentazione di riferimento dell'API Cloud STT Python.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.

from google.cloud import speech


def transcribe_gcs(gcs_uri: str) -> str:
    """Asynchronously transcribes the audio file from Cloud Storage
    Args:
        gcs_uri: The Google Cloud Storage path to an audio file.
            E.g., "gs://storage-bucket/file.flac".
    Returns:
        The generated transcript from the audio file provided.
    """
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code="en-US",
    )

    operation = client.long_running_recognize(config=config, audio=audio)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=90)

    transcript_builder = []
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        transcript_builder.append(f"\nTranscript: {result.alternatives[0].transcript}")
        transcript_builder.append(f"\nConfidence: {result.alternatives[0].confidence}")

    transcript = "".join(transcript_builder)
    print(transcript)

    return transcript

Ruby

Per scoprire come installare e utilizzare la libreria client per Cloud STT, consulta la sezione Librerie client Cloud STT.

Per eseguire l'autenticazione in Cloud STT, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.

# storage_path = "Path to file in Cloud Storage, eg. gs://bucket/audio.raw"

require "google/cloud/speech"

speech = Google::Cloud::Speech.speech version: :v1

config = { encoding:          :LINEAR16,
           sample_rate_hertz: 16_000,
           language_code:     "en-US" }
audio = { uri: storage_path }

operation = speech.long_running_recognize config: config, audio: audio

puts "Operation started"

operation.wait_until_done!

raise operation.results.message if operation.error?

results = operation.response.results

alternatives = results.first.alternatives
alternatives.each do |alternative|
  puts "Transcription: #{alternative.transcript}"
end

Passaggi successivi

Per cercare e filtrare gli esempi di codice per altri prodotti Google Cloud , consulta il browser degli esempi diGoogle Cloud .