使用用戶端程式庫將語音轉錄為文字

本頁說明如何使用Google Cloud 用戶端程式庫，以您偏好的程式設計語言將語音辨識要求傳送至 Cloud Speech-to-Text。

Cloud Speech-to-Text 可讓您將 Google 語音辨識技術輕鬆整合至開發人員應用程式。您可以將音訊資料傳送至 Cloud Speech-to-Text API，該 API 接著會傳回音訊檔案的文字轉錄稿。如要進一步瞭解這項服務，請參閱「Cloud STT 基礎知識」。

事前準備

您必須先完成下列動作，才能向 Cloud Speech-to-Text API 傳送要求。詳情請參閱「事前準備」頁面。

為 Google Cloud 專案啟用 Cloud Speech-to-Text。
確認已啟用 Cloud Speech-to-Text 的計費功能。
安裝 Google Cloud CLI。完成後，執行下列指令來初始化 Google Cloud CLI：
```
gcloud init
```
若您採用的是外部識別資訊提供者 (IdP)，請先使用聯合身分登入 gcloud CLI。
If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
確認您具備完成本指南所需的權限。如果您是為本指南建立新專案，則已具備必要權限。
(選用) 建立新的 Cloud Storage bucket 來儲存音訊資料。

必要的角色

如要取得語音轉文字所需的權限，請要求管理員授予您專案的服務使用情形消費者 (roles/serviceusage.serviceUsageConsumer) IAM 角色。如要進一步瞭解如何授予角色，請參閱「管理專案、資料夾和組織的存取權」。

您或許也能透過自訂角色或其他預先定義的角色，取得必要權限。

安裝用戶端程式庫

Go

go get cloud.google.com/go/speech/apiv1

Java

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.75.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
  </dependency>
</dependencies>

If you are using Gradle, add the following to your dependencies:

implementation 'com.google.cloud:google-cloud-speech:4.78.0'

If you are using sbt, add the following to your dependencies:

libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "4.78.0"

If you're using Visual Studio Code or IntelliJ, you can add client libraries to your project using the following IDE plugins:

The plugins provide additional functionality, such as key management for service accounts. Refer to each plugin's documentation for details.

Node.js

安裝程式庫前，請確認您已設定適當的 Node.js 開發環境。

npm install @google-cloud/speech

Python

安裝程式庫前，請確認您已設定適當的 Python 開發環境。

pip install --upgrade google-cloud-speech

提出音訊轉錄要求

您現在可以使用 Cloud STT 將音訊檔案轉錄成文字。請使用下列程式碼，將 recognize 要求傳送至 Cloud Speech-to-Text API。

Go


// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"context"
	"fmt"
	"log"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}
	defer client.Close()

	// The path to the remote audio file to transcribe.
	fileURI := "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: fileURI},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}

Java

// Imports the Google Cloud client library
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import java.util.List;

public class QuickstartSample {

  /** Demonstrates using the Speech API to transcribe an audio file. */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String gcsUri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw";

      // Builds the sync recognize request
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(AudioEncoding.LINEAR16)
              .setSampleRateHertz(16000)
              .setLanguageCode("en-US")
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}

Node.js

執行範例前，請確認已設定適當的 Node.js 開發環境。

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

async function quickstart() {
  // The path to the remote LINEAR16 file
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    uri: gcsUri,
  };
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}
quickstart();

Python

執行範例前，請確認已設定適當的 Python 開發環境。


# Imports the Google Cloud client library


from google.cloud import speech



def run_quickstart() -> speech.RecognizeResponse:
    # Instantiates a client
    client = speech.SpeechClient()

    # The name of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

    audio = speech.RecognitionAudio(uri=gcs_uri)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

恭喜！您已將第一個要求傳送至 Cloud STT。

如果您收到來自 Cloud STT 的錯誤或空白回應，請查看疑難排解和錯誤緩解措施步驟。

清除所用資源

為了避免系統向您的 Google Cloud 帳戶收取本頁面所用資源的費用，請按照下列步驟操作。

使用 Google Cloud console 刪除不需要的專案。

後續步驟

練習轉錄短音訊檔案。
瞭解如何批次處理長音訊檔案以進行語音辨識。
瞭解如何轉錄串流音訊，例如從麥克風轉錄。
使用 Cloud STT 用戶端程式庫，並以您選擇的語言開始使用 Cloud STT。
逐步演練範例應用程式。
如要獲得最佳效能、準確率與其他提示，請參閱最佳做法說明文件。