Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

優先即付即用

即付即用優先級 (即付即用優先級) 是一種用量選項，可提供比即付即用標準級更穩定的效能，且不必預先承諾佈建輸送量。

使用 Priority PayGo 時，系統會依較高的費率，按權杖用量收費。如要瞭解定價，請參閱定價頁面。

使用「優先 PayGo」的時機

Priority PayGo 非常適合重要業務工作負載，這類工作負載的流量模式起伏不定或無法預測。以下是用途範例：

客戶服務虛擬助理
代理工作流程和跨代理互動
研究模擬

使用 Priority PayGo

如要使用 Priority PayGo 將要求傳送至 Gemini API，您必須在要求中加入 X-Vertex-AI-LLM-Shared-Request-Type 標頭。您可以使用 Priority PayGo 執行下列操作：

使用佈建輸送量配額 (如有)，並溢出至優先 PayGo。
僅使用 Priority PayGo。

使用 PT 做為預設值時，請使用 Priority PayGo

如要在使用 Priority PayGo 前先用完所有可用的 PT 配額，請在要求中加入 X-Vertex-AI-LLM-Shared-Request-Type: priority 標頭，如下列範例所示。

Python

安裝

pip install --upgrade google-genai

詳情請參閱 SDK 參考文件。

設定環境變數，透過 Vertex AI 使用 Google Gen AI SDK：

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_ENTERPRISE=True

初始化 GenAI 用戶端，即可使用 Priority PayGo。完成這個步驟後，您不需要進一步調整程式碼，就能在同一個用戶端上，使用 Priority PayGo 與 Gemini API 互動。

from google import genai
from google.genai.types import HttpOptions
client = genai.Client(
  vertexai=True, project='your_project_id', location='global',
  http_options=HttpOptions(
    api_version="v1",
      headers={
        "X-Vertex-AI-LLM-Shared-Request-Type": "priority"
      },
  )
)

REST

使用任何要求資料之前，請先替換以下項目：

PROJECT_ID：您的專案 ID。。
MODEL_ID：要初始化 Priority PayGo 的模型 ID。
PROMPT_TEXT：要納入提示中的文字指令。 JSON。

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "X-Vertex-AI-LLM-Shared-Request-Type: priority" \
  "https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/MODEL_ID:generateContent" -d \
  $'{
      "contents": {
        "role": "model",
        "parts": { "text": "PROMPT_TEXT" }
    }
  }'

您應該會收到類似如下的 JSON 回應。

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Response to sample request."
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 3,
    "candidatesTokenCount": 900,
    "totalTokenCount": 1957,
    "trafficType": "ON_DEMAND_PRIORITY",
    "thoughtsTokenCount": 1054
  }
}

使用 generateContent 方法要求在完整生成回覆後傳回。如要減少人類觀眾的延遲感，請使用 streamGenerateContent 方法，在生成回覆時串流傳回。
多模態模型 ID 位於網址尾端，方法之前 (例如 gemini-2.5-flash)。這個範例也可能支援其他模型。

僅使用 Priority PayGo

如要只使用 Priority PayGo，請在要求中加入 X-Vertex-AI-LLM-Request-Type: shared 和 X-Vertex-AI-LLM-Shared-Request-Type: priority 標頭，如下列範例所示。

Python

安裝

pip install --upgrade google-genai

詳情請參閱 SDK 參考文件。

設定環境變數，透過 Vertex AI 使用 Google Gen AI SDK：

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_ENTERPRISE=True

初始化 GenAI 用戶端，即可使用 Priority PayGo。完成這個步驟後，您不需要進一步調整程式碼，就能在同一個用戶端上，使用 Priority PayGo 與 Gemini API 互動。

from google import genai
from google.genai.types import HttpOptions
client = genai.Client(
  vertexai=True, project='your_project_id', location='global',
  http_options=HttpOptions(
    api_version="v1",
      headers={
        "X-Vertex-AI-LLM-Request-Type": "shared",
        "X-Vertex-AI-LLM-Shared-Request-Type": "priority"
      },
  )
)

REST

使用任何要求資料之前，請先替換以下項目：

PROJECT_ID：您的專案 ID。。
MODEL_ID：要初始化 Priority PayGo 的模型 ID。
PROMPT_TEXT：要納入提示中的文字指令。 JSON。

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -H "X-Vertex-AI-LLM-Request-Type: shared" \
  -H "X-Vertex-AI-LLM-Shared-Request-Type: priority" \
  "https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/google/models/MODEL_ID:generateContent" -d \
  $'{
      "contents": {
        "role": "model",
        "parts": { "text": "PROMPT_TEXT" }
    }
  }'

您應該會收到類似如下的 JSON 回應。

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Response to sample request."
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 3,
    "candidatesTokenCount": 900,
    "totalTokenCount": 1957,
    "trafficType": "ON_DEMAND_PRIORITY",
    "thoughtsTokenCount": 1054
  }
}

使用 generateContent 方法要求在完整生成回覆後傳回。如要減少人類觀眾的延遲感，請使用 streamGenerateContent 方法，在生成回覆時串流傳回。
多模態模型 ID 位於網址尾端，方法之前 (例如 gemini-2.5-flash)。這個範例也可能支援其他模型。

驗證 Priority PayGo 用量

如要確認要求是否使用了 Priority PayGo，請查看回應中的流量類型，如下列範例所示。

Python

您可以透過回應中的 traffic_type 欄位，確認要求是否使用 Priority PayGo。如果要求是透過 Priority PayGo 處理，traffic_type 欄位會設為 ON_DEMAND_PRIORITY。

sdk_http_response=HttpResponse(
  headers=
) candidates=[Candidate(
  avg_logprobs=-0.539712212302468,
  content=Content(
    parts=[
      Part(
        text="""Response to sample request.
        """
      ),
    ],
    role='model'
  ),
  finish_reason=nishReason.STOP: 'STOP'>
)] create_time=datetime.datetime(2025, 12, 3, 20, 32, 55, 916498, tzinfo=TzInfo(0)) model_version='gemini-2.5-flash' prompt_feedback=None response_id='response_id' usage_metadata=GenerateContentResponseUsageMetadata(
  candidates_token_count=1408,
  candidates_tokens_details=[
    ModalityTokenCount(
      modality=ty.TEXT: 'TEXT'>,
      token_count=1408
    ),
  ],
  prompt_token_count=5,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=ty.TEXT: 'TEXT'>,
      token_count=5
    ),
  ],
  thoughts_token_count=1356,
  total_token_count=2769,
  traffic_type=fficType.ON_DEMAND_PRIORITY: 'ON_DEMAND_PRIORITY'>
) automatic_function_calling_history=[] parsed=None

REST

您可以透過回應中的 trafficType 欄位，確認要求是否使用 Priority PayGo。如果要求是透過 Priority PayGo 處理，trafficType 欄位會設為 ON_DEMAND_PRIORITY。

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Response to sample request."
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 3,
    "candidatesTokenCount": 900,
    "totalTokenCount": 1957,
    "trafficType": "ON_DEMAND_PRIORITY",
    "thoughtsTokenCount": 1054
  }
}

斜坡限制

即付即用優先方案會在機構層級設定斜坡限制。斜坡限制有助於提供可預測且一致的效能。起始限制取決於模型，如下所示：

Gemini Flash 和 Flash-Lite 模型：每分鐘 400 萬個權杖。
Gemini Pro 模型：每分鐘 100 萬個符記。

每持續使用 10 分鐘，升速上限就會增加 50%。

如果要求超過升速限制，且系統因流量過高而超出容量上限，要求就會降級為標準隨用隨付方案，並按標準隨用隨付方案費率計費。

為盡量避免降級，請逐步擴大用量，確保不超過上限。如果仍需要提升效能，建議購買額外的 PT 配額。

您可以從回應中確認要求是否遭到降級。如果要求降級為標準即付即用，流量類型會設為 ON_DEMAND。詳情請參閱「驗證 Priority PayGo 使用情形」。

優先即付即用 透過集合功能整理內容 你可以依據偏好儲存及分類內容。

使用「優先 PayGo」的時機

使用 Priority PayGo

使用 PT 做為預設值時，請使用 Priority PayGo

Python

安裝

REST

僅使用 Priority PayGo

Python

安裝

REST

驗證 Priority PayGo 用量

Python

REST

斜坡限制

優先即付即用