Create long-form audio

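This document walks you through synthesizing long-form audio. Long-form audio synthesis runs asynchronously and accepts input of up to 1 million bytes. To learn more about the fundamentals of Text-to-Speech, see "Text-to-Speech basics".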

Before you begin

Complete the following actions before you send a request to the Text-to-Speech API. For details, see the "Before you begin" page.

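Enable Text-to-Speech on a Google Cloud project.
1. Make sure billing is enabled for Text-to-Speech.
2. Make sure you have the following Identity and Access Management (IAM) roles on the output Google Cloud bucket:
  - Storage Object Creator
  - Storage Object Viewer
Install the Google Cloud CLI. After installation, run the following command to initialize the Google Cloud CLI: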
```
gcloud init
```
If you use an external identity provider (IdP), first sign in to the gcloud CLI with your federated identity.

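Synthesize long-form audio from text using the command line

To convert long-form text to audio, send an HTTP POST request to the https://texttospeech.googleapis.com/v1beta1/projects/{$project_number}/locations/global:synthesizeLongAudio endpoint. Specify the following fields in the request body: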

• voice: the voice to use for synthesis.

• input.text: the text to synthesize.

• audio_config: the type of audio to create.

• output_gcs_uri: the Cloud Storage output path, in the form "gs://bucket_name/file_name.wav".

• parent: the parent resource, in the form "projects/{YOUR_PROJECT_NUMBER}/locations/{YOUR_PROJECT_LOCATION}".
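The following is a minimal sketch of how the fields listed above combine into a request body. The helper name `build_long_audio_body` and its format checks are hypothetical and are not part of any Google library. The field names follow the JSON example on this page. The default voice values are copied from the client-library sample later on this page.

```python
import re

# Hypothetical format checks for the two path-like fields described above.
_GCS_URI = re.compile(r"^gs://[^/]+/.+$")
_PARENT = re.compile(r"^projects/[^/]+/locations/[^/]+$")


def build_long_audio_body(
    parent: str,
    text: str,
    output_gcs_uri: str,
    language_code: str = "en-US",
    voice_name: str = "en-US-Standard-A",
    audio_encoding: str = "LINEAR16",
) -> dict:
    """Return a dict that serializes to a synthesizeLongAudio request body."""
    if not _PARENT.match(parent):
        raise ValueError(f"parent must look like projects/P/locations/L: {parent!r}")
    if not _GCS_URI.match(output_gcs_uri):
        raise ValueError(f"output_gcs_uri must look like gs://bucket/object: {output_gcs_uri!r}")
    return {
        "parent": parent,
        "audio_config": {"audio_encoding": audio_encoding},
        "input": {"text": text},
        "voice": {"language_code": language_code, "name": voice_name},
        "output_gcs_uri": output_gcs_uri,
    }
```

The checks only catch obviously malformed values early. They do not verify that the bucket exists or that you can write to it.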

The input can contain up to 1 MB of text. The exact limit can vary depending on the input.
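The limit is expressed in bytes, so non-ASCII text uses up the budget faster than ASCII text. Most Chinese characters, for example, take 3 bytes each in UTF-8. The sketch below is only a local pre-check. It assumes the limit is 1,000,000 bytes of UTF-8 text, as stated at the top of this page. The service's exact limit can differ, so a passing check is a hint, not a guarantee.

```python
# Assumption: the limit is 1,000,000 bytes of UTF-8-encoded text.
MAX_INPUT_BYTES = 1_000_000


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_long_audio_limit(text: str, limit: int = MAX_INPUT_BYTES) -> bool:
    """Local pre-check only; the service may still reject borderline input."""
    return utf8_size(text) <= limit
```

For example, "hello" is 5 bytes, but the two characters "長篇" are 6 bytes.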

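Create a Cloud Storage bucket in the project that you use to run the synthesis. Make sure the service account that runs the synthesis has read and write access to the output bucket.

Run the following REST request at the command line to synthesize audio from text with Text-to-Speech. The commands use gcloud auth print-access-token to retrieve an authorization token for the request. In the examples, replace 12345 with your project number and gs://bucket_name/file_name.wav with your output path.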

HTTP method and URL:

```
POST https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio
```
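The URL above is the request's parent resource name with the custom method `:synthesizeLongAudio` appended after a colon. The helper names below are illustrative only. The host and API version are taken from this page's examples.

```python
# Host and API version as used in the examples on this page.
BASE_URL = "https://texttospeech.googleapis.com/v1beta1"


def parent_path(project: str, location: str = "global") -> str:
    """Resource name used both in the URL and in the body's "parent" field."""
    return f"projects/{project}/locations/{location}"


def synthesize_long_audio_url(project: str, location: str = "global") -> str:
    # Custom methods are appended to the resource name with a colon (":verb").
    return f"{BASE_URL}/{parent_path(project, location)}:synthesizeLongAudio"
```

Building the URL and the `parent` field from the same helper keeps the two from drifting apart.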

JSON request body:

```
{
  "parent": "projects/12345/locations/global",
  "audio_config": {
    "audio_encoding": "LINEAR16"
  },
  "input": {
    "text": "hello"
  },
  "voice": {
    "language_code": "en-US",
    "name": "en-US-Standard-A"
  },
  "output_gcs_uri": "gs://bucket_name/file_name.wav"
}
```
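The curl and PowerShell commands expect the body in a file named request.json. You can write that file from Python instead of by hand, as in the sketch below. It assumes Python 3 and the same body as the example, with the voice name in the en-US-Standard-A casing used by the Python sample on this page. `write_request_file` is a hypothetical helper.

```python
import json
from pathlib import Path

# The example request body as a Python dict.
REQUEST_BODY = {
    "parent": "projects/12345/locations/global",
    "audio_config": {"audio_encoding": "LINEAR16"},
    "input": {"text": "hello"},
    "voice": {"language_code": "en-US", "name": "en-US-Standard-A"},
    "output_gcs_uri": "gs://bucket_name/file_name.wav",
}


def write_request_file(body: dict, path: str = "request.json") -> Path:
    """Write the body as UTF-8 JSON for curl (-d @request.json) or PowerShell (-InFile)."""
    out = Path(path)
    # ensure_ascii=False keeps non-ASCII input readable; the file is encoded as
    # UTF-8, matching the "charset=utf-8" Content-Type the commands send.
    out.write_text(json.dumps(body, ensure_ascii=False, indent=2), encoding="utf-8")
    return out
```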

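Send the request using one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which signs you in to the gcloud CLI automatically. To check the active account, run gcloud auth list.

Save the request body in a file named request.json, and then run the following command: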

```
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio"
```
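Where curl isn't available, the same POST can be expressed with Python's standard library. This is a sketch, not an official client. `get_access_token` runs the same `gcloud auth print-access-token` command that the curl example uses, and `build_synthesize_request` only constructs the request. The `urlopen` call that would send it is commented out because it needs network access and valid credentials.

```python
import json
import subprocess
import urllib.request


def get_access_token() -> str:
    """Run the same gcloud command as the curl example (requires the gcloud CLI)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_synthesize_request(url: str, body: dict, access_token: str) -> urllib.request.Request:
    """Build, but do not send, a POST equivalent to the curl command above."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Sending it (network access and valid credentials required):
# with urllib.request.urlopen(req) as resp:
#     operation = json.load(resp)
```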

PowerShell (Windows)

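Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the active account, run gcloud auth list.

Save the request body in a file named request.json, and then run the following command: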

```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global:synthesizeLongAudio" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```
{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}
```
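The initial response is a long-running operation resource. The minimal sketch below extracts the fields you would typically check, assuming the JSON shape shown above. Under the proto3 JSON mapping, fields at their default value (such as `done: false`) may be omitted. The hypothetical helper therefore falls back to defaults instead of indexing directly.

```python
import json

# A copy of the sample response above.
SAMPLE = """{
  "name": "23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 0,
    "startTime": "2022-12-20T00:46:56.296191037Z",
    "lastUpdateTime": "2022-12-20T00:46:56.296191037Z"
  },
  "done": false
}"""


def summarize_operation(op: dict) -> tuple:
    """Return (name, progress_percentage, done) from an operation resource."""
    meta = op.get("metadata", {})
    # Missing fields are treated as their defaults (an assumption, see above).
    return op["name"], float(meta.get("progressPercentage", 0)), bool(op.get("done", False))
```

Keep the `name` value: you need it for the status query that follows.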

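The JSON output of the REST command contains the long-running operation name in the name field. Run the following REST request at the command line to query the status of the long-running operation.

Make sure the service account that runs the GET operation belongs to the same project as the service account used for synthesis.

HTTP method and URL: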
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456
```
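In this page's examples, the initial response returns a bare operation ID ("23456"), while the GET response reports the full resource name. The hypothetical helper below builds the GET URL from either form. It assumes operations live under `projects/{project}/locations/{location}/operations`, as in the URL above.

```python
# Host and API version as used in the examples on this page.
BASE_URL = "https://texttospeech.googleapis.com/v1beta1"


def operation_url(name: str, project: str, location: str = "global") -> str:
    """Build the GET URL for a long-running operation.

    Accepts a bare ID ("23456") or a full resource name
    ("projects/12345/locations/global/operations/23456").
    """
    if name.startswith("projects/"):
        return f"{BASE_URL}/{name}"
    return f"{BASE_URL}/projects/{project}/locations/{location}/operations/{name}"
```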
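Send the request using one of the following options:
curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which signs you in to the gcloud CLI automatically. To check the active account, run gcloud auth list.

Run the following command: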
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456"
```
PowerShell (Windows)

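Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the active account, run gcloud auth list.

Run the following command: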
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations/23456" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "name": "projects/12345/locations/global/operations/23456",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.texttospeech.v1beta1.SynthesizeLongAudioMetadata",
    "progressPercentage": 100
  },
  "done": true
}
```
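Instead of rerunning the GET command by hand, you can poll it. The sketch below accepts any `fetch` function, for example one that sends the GET request above and parses the JSON, and repeats it until `done` is true. Several details are choices made for this sketch:

- The 300-second default mirrors the timeout in the Python sample on this page.
- The 10-second interval is arbitrary.
- The `error` check follows the google.longrunning.Operation convention, where a finished operation carries either `response` or `error`.

```python
import time
from typing import Callable


class OperationFailed(RuntimeError):
    """Raised when a finished operation reports an error."""


def wait_for_operation(
    fetch: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until the operation is done, failed, or timed out."""
    deadline = clock() + timeout_s
    while True:
        op = fetch()
        if op.get("done"):
            # A finished operation carries either "response" or "error".
            if "error" in op:
                raise OperationFailed(op["error"].get("message", "unknown error"))
            return op
        if clock() >= deadline:
            raise TimeoutError(f"operation not done after {timeout_s} s")
        sleep(interval_s)
```

`sleep` and `clock` are injectable so that the loop can be exercised without real waiting.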
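To list all operations running in a given project, run the following REST request.

Make sure the service account that runs the LIST operation belongs to the same project as the service account used for synthesis.

HTTP method and URL: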
```
GET https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations
```
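Send the request using one of the following options:
curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which signs you in to the gcloud CLI automatically. To check the active account, run gcloud auth list.

Run the following command: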
```
curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations"
```
PowerShell (Windows)

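Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the active account, run gcloud auth list.

Run the following command: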
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://texttospeech.googleapis.com/v1beta1/projects/12345/locations/global/operations" | Select-Object -Expand Content
```
You should receive a JSON response similar to the following:
```
{
  "operations": [
    {
      "name": "12345",
      "done": false
    },
    {
      "name": "23456",
      "done": false
    }
  ],
  "nextPageToken": ""
}
```
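The response above has an empty nextPageToken, which means all operations fit on one page. When the token is non-empty, request the next page with it. This sketch assumes the standard ListOperations `pageToken` query parameter. `list_url` and `iter_operations` are hypothetical helpers. `fetch_json` stands in for whatever function sends the GET request and parses the JSON.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Host and API version as used in the examples on this page.
BASE_URL = "https://texttospeech.googleapis.com/v1beta1"


def list_url(project: str, location: str = "global", page_token: str = "") -> str:
    url = f"{BASE_URL}/projects/{project}/locations/{location}/operations"
    # Assumption: the standard ListOperations "pageToken" query parameter.
    return f"{url}?{urlencode({'pageToken': page_token})}" if page_token else url


def iter_operations(fetch_json: Callable[[str], dict], project: str) -> Iterator[dict]:
    """Yield operations across pages; an empty or missing nextPageToken ends the listing."""
    token = ""
    while True:
        page = fetch_json(list_url(project, page_token=token))
        yield from page.get("operations", [])
        token = page.get("nextPageToken", "")
        if not token:
            return
```

To find operations that are still running, filter the results with `not op.get("done")`.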
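After the long-running operation completes successfully, find the output audio file at the bucket URI that you specified in the output_gcs_uri field. If the operation doesn't complete successfully, use the GET REST command to look up the error, fix it, and then issue the RPC again.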
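The output location is the output_gcs_uri value you sent, so splitting that value gives the bucket and object name to look for. `split_gcs_uri` is a hypothetical helper. With the result, you could download the file with, for example, `gcloud storage cp`.

```python
PREFIX = "gs://"


def split_gcs_uri(uri: str) -> tuple:
    """Split gs://bucket_name/object_name into (bucket_name, object_name)."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Split at the first slash only, so object names may contain "/".
    bucket, sep, obj = uri[len(PREFIX):].partition("/")
    if not bucket or not sep or not obj:
        raise ValueError(f"expected gs://bucket/object, got {uri!r}")
    return bucket, obj
```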

Synthesize long-form audio from text using client libraries

Follow these steps to synthesize long-form audio.

Install the client library

Python

Before installing the library, make sure you have set up an appropriate Python development environment.

```
pip install --upgrade google-cloud-texttospeech
```
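To confirm the installation from Python, query the installed distribution's version with the standard library's `importlib.metadata` (Python 3.8 or later). This is a convenience sketch. The distribution name is the one passed to pip above, and `installed_texttospeech_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_texttospeech_version() -> Optional[str]:
    """Return the installed google-cloud-texttospeech version, or None if absent."""
    try:
        return version("google-cloud-texttospeech")
    except PackageNotFoundError:
        return None
```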

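Create audio data

You can use Text-to-Speech to create a long-form audio file of synthesized human speech. Use the following code to create a long-form audio file in a Cloud Storage bucket.

Python

Before running the sample, make sure you have set up an appropriate Python development environment.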

```
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.cloud import texttospeech


def synthesize_long_audio(project_id: str, output_gcs_uri: str) -> None:
    """
    Synthesizes long input, writing the resulting audio to `output_gcs_uri`.

    Args:
        project_id: ID or number of the Google Cloud project you want to use.
        output_gcs_uri: Specifies a Cloud Storage URI for the synthesis results.
            Must be specified in the format:
            ``gs://bucket_name/object_name``, and the bucket must
            already exist.
    """

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    # Named synthesis_input to avoid shadowing Python's built-in input().
    synthesis_input = texttospeech.SynthesisInput(
        text="Test input. Replace this with any text you want to synthesize, up to 1 million bytes long!"
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Standard-A"
    )

    # This sample uses the us-central1 location; the REST examples above use global.
    parent = f"projects/{project_id}/locations/us-central1"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=synthesis_input,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    # Starts a long-running operation (LRO); the audio is written to Cloud Storage.
    operation = client.synthesize_long_audio(request=request)
    # Set a deadline for the LRO to finish. 300 seconds is reasonable, but adjust it
    # to the length of the input. A timeout likely means there was an error; in that
    # case, inspect the error and try again.
    result = operation.result(timeout=300)
    print(
        "\nFinished processing, check your GCS bucket to find your audio file! "
        "Printing what should be an empty result: ",
        result,
    )
```

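Clean up

To avoid unnecessary Google Cloud charges, use the Google Cloud console to delete any projects you don't need.

What's next

To learn more about Cloud Text-to-Speech, see "Text-to-Speech basics".
See the list of voices available for speech synthesis.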