Deploy models with custom weights

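Deploying models with custom weights is a Preview offering. You can fine-tune models based on a predefined set of base models and deploy your customized models on Vertex AI Model Garden. To do so, upload the model artifacts to a Cloud Storage bucket in your project, and then deploy the model through the custom weights import, which takes a single click in Vertex AI.

VPC Service Controls is supported for custom weights.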

Supported models

The public preview of deploying models with custom weights supports the following base models:

Model name	Versions
Llama	Llama-2: 7B, 13B; Llama-3.1: 8B, 70B; Llama-3.2: 1B, 3B; Llama-4: Scout-17B, Maverick-17B; CodeLlama-13B
Gemma	Gemma-2: 9B, 27B; Gemma-3: 1B, 4B, 3B to 12B, 27B; Medgemma: 4B, 27B text
Qwen	Qwen2: 1.5B; Qwen2.5: 0.5B, 1.5B, 7B, 32B; Qwen3: 0.6B, 1.7B, 4B, 8B, 32B, Qwen3-Coder-480B-A35B-Instruct, Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking
Deepseek	Deepseek-R1; Deepseek-V3; DeepSeek-V3.1
Mistral and Mixtral	Mistral-7B-v0.1; Mixtral-8x7B-v0.1; Mistral-Nemo-Base-2407
Phi-4	Phi-4-reasoning
OpenAI OSS	gpt-oss: 20B, 120B

Limitations

Custom weights don't support importing quantized models.
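
One way to catch a quantized checkpoint before you upload it is to inspect its `config.json`. The sketch below is a heuristic only: it assumes the checkpoint was saved with Hugging Face Transformers, which typically records quantization settings under a `quantization_config` key. The function name `looks_quantized` and both sample configurations are hypothetical, and a checkpoint quantized with other tooling might not carry this key.

```python
import json


def looks_quantized(config_text: str) -> bool:
    """Heuristic: report whether a config.json declares quantization settings."""
    config = json.loads(config_text)
    # Assumption: quantized checkpoints carry a "quantization_config" entry.
    return config.get("quantization_config") is not None


# Made-up configurations, for illustration only.
plain = '{"model_type": "llama", "torch_dtype": "bfloat16"}'
quantized = '{"model_type": "llama", "quantization_config": {"quant_method": "gptq"}}'

for label, text in (("plain", plain), ("quantized", quantized)):
    print(label, "->", "quantized" if looks_quantized(text) else "not quantized")
```

A check like this can run before the upload to Cloud Storage, so that an unsupported checkpoint is rejected early instead of at deployment time.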

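Model files

You must provide the model files in the Hugging Face weights format. For more information about the Hugging Face weights format, see "Use Hugging Face Models".

If the required files aren't provided, the model deployment might fail.

The following table lists the types of model files, which depend on the model's architecture:

Model file content	File type
Model configuration	`config.json`
Model weights	`.safetensors` `.bin`
Weights index	`*.index.json`
Tokenizer files	`tokenizer.model` `tokenizer.json` `tokenizer_config.json`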
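
Because a missing file can make the deployment fail, it can help to check the file list before uploading. The following is a minimal sketch based on the table above. The function name `missing_model_files` is hypothetical, and two rules in it are assumptions rather than documented requirements: it expects a weights index only when the weights are split across several files, and it treats any one tokenizer file as sufficient. The files that are actually required depend on the model architecture.

```python
from pathlib import PurePosixPath

TOKENIZER_FILES = {"tokenizer.model", "tokenizer.json", "tokenizer_config.json"}


def missing_model_files(paths):
    """Return the file categories that appear to be missing from a checkpoint."""
    names = [PurePosixPath(p).name for p in paths]
    weights = [n for n in names if n.endswith((".safetensors", ".bin"))]
    missing = []
    if "config.json" not in names:
        missing.append("model configuration (config.json)")
    if not weights:
        missing.append("model weights (.safetensors or .bin)")
    # Assumption: only sharded checkpoints need a weights index.
    if len(weights) > 1 and not any(n.endswith(".index.json") for n in names):
        missing.append("weights index (*.index.json)")
    # Assumption: one tokenizer file is enough for this check.
    if not TOKENIZER_FILES.intersection(names):
        missing.append("tokenizer files")
    return missing


# Hypothetical listing of a sharded checkpoint without its index file.
listing = [
    "my-model/config.json",
    "my-model/model-00001-of-00002.safetensors",
    "my-model/model-00002-of-00002.safetensors",
    "my-model/tokenizer.json",
]
print(missing_model_files(listing))
```

The function takes plain path strings, so the same check works on a local directory listing or on object names listed from a bucket.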

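Locations

You can deploy custom models in all regions through Model Garden services.

Prerequisites

This section describes what you need to set up before you deploy your custom model.

Before you begin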

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

In the Google Cloud console, activate Cloud Shell.

Activate Cloud Shell

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

This tutorial assumes that you use Cloud Shell to interact with Google Cloud. To use a shell other than Cloud Shell, perform the following additional configuration:

Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
```
gcloud init
```

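Deploy the custom model

This section demonstrates how to deploy your custom model.

If you use the command-line interface (CLI), Python, or JavaScript, replace the following variables with your own values so that the code samples work:

REGION: Your region. For example, us-central1.
MODEL_GCS: The Cloud Storage URI of your model. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
PROJECT_ID: Your project ID.
MODEL_ID: Your model ID.
MACHINE_TYPE: Your machine type. For example, g2-standard-12.
ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
ACCELERATOR_COUNT: Your accelerator count.
PROMPT: Your text prompt.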
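
A mistyped value for one of these variables usually surfaces only when the deployment request is rejected. The sketch below shows one way to sanity-check a few of the values locally. The function name `check_deploy_variables` and the region pattern are assumptions made for illustration; the pattern matches names shaped like us-central1 and doesn't verify that the region exists.

```python
import re

# Assumption: region names look like "us-central1" (letters, hyphen, letters, digits).
REGION_PATTERN = re.compile(r"^[a-z]+-[a-z]+[0-9]+$")


def check_deploy_variables(region, model_gcs, accelerator_count):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not REGION_PATTERN.match(region):
        problems.append(f"REGION {region!r} doesn't look like a region name")
    if not model_gcs.startswith("gs://"):
        problems.append(f"MODEL_GCS {model_gcs!r} should start with gs://")
    if not isinstance(accelerator_count, int) or accelerator_count < 1:
        problems.append("ACCELERATOR_COUNT should be a positive integer")
    return problems


print(check_deploy_variables("us-central1", "gs://my-bucket/my-model", 1))
print(check_deploy_variables("uscentral1", "my-bucket/my-model", "1"))
```

The first call reports no problems, and the second reports one problem for each of the three values.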

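Console

The following steps show you how to use the Google Cloud console to deploy your model with custom weights.

In the Google Cloud console, go to the "Model Garden" page.

Go to Model Garden
Click "Deploy model with custom weights". The "Deploy a model with custom weights" pane appears.
In the "Model source" section, do the following:
1. Click "Browse", choose the bucket where your model is stored, and then click "Select".
2. Optional: Enter a name for your model in the "Model name" field.
In the "Deployment settings" section, do the following:
1. From the "Region" field, select your region, and then click "OK".
2. In the "Machine Spec" field, select the machine specification to use for deploying your model.
3. Optional: The "Endpoint name" field shows your model's endpoint by default. You can enter a different endpoint name in the field.
4. If your project enforces VPC-SC or you prefer private access, select "Private (Private Service Connect)" from the "Endpoint access" field. Otherwise, select "Public".
5. If you use Private Service Connect, enter the IDs of the projects where your query clients run in the "Project IDs" field, or click "Select project IDs" to open a dialog that lists project IDs.
  
  If you click "Select project IDs", do the following:
  1. Find the project that contains the code that is trying to access the model.
  2. Select the checkbox for the project.
  3. Click "Select".
Click "Deploy".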

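gcloud CLI

This command demonstrates how to deploy the model to a specific region.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}

This command demonstrates how to deploy the model to a specific region with a machine type, accelerator type, and accelerator count. To select a specific machine configuration, you must set all three fields.

gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}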
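
The rule that the three machine fields must be set together can be enforced before the command runs. The following is a minimal sketch that builds the argument list for the commands above and rejects a partial machine configuration. The function name `build_deploy_command` is hypothetical, and the sketch only assembles the arguments; it doesn't call gcloud.

```python
def build_deploy_command(model_gcs, region, machine_type=None,
                         accelerator_type=None, accelerator_count=None):
    """Build the gcloud argument list, requiring all machine fields or none."""
    machine_fields = (machine_type, accelerator_type, accelerator_count)
    provided = [field is not None for field in machine_fields]
    if any(provided) and not all(provided):
        raise ValueError(
            "Set machine_type, accelerator_type, and accelerator_count together"
        )
    command = [
        "gcloud", "ai", "model-garden", "models", "deploy",
        f"--model={model_gcs}",
        f"--region={region}",
    ]
    if all(provided):
        command += [
            f"--machine-type={machine_type}",
            f"--accelerator-type={accelerator_type}",
            f"--accelerator-count={accelerator_count}",
        ]
    return command


# Hypothetical values, for illustration only.
print(build_deploy_command("gs://my-bucket/my-model", "us-central1",
                           "g2-standard-12", "NVIDIA_L4", 1))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where the gcloud CLI is installed and authenticated.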

Python

import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden

vertexai.init(project="${PROJECT_ID}", location="${REGION}")
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)
endpoint = custom_model.deploy(
  machine_type="${MACHINE_TYPE}",
  accelerator_type="${ACCELERATOR_TYPE}",
  accelerator_count=${ACCELERATOR_COUNT},  # an integer, for example 1
  model_display_name="custom-model",
  endpoint_display_name="custom-model-endpoint")

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)

Alternatively, you can call the custom_model.deploy() method without passing any parameters.

import vertexai
from google.cloud import aiplatform
from vertexai.preview import model_garden

vertexai.init(project="${PROJECT_ID}", location="${REGION}")
custom_model = model_garden.CustomModel(
  gcs_uri="${MODEL_GCS}",
)
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "${PROMPT}"}], use_dedicated_endpoint=True)

curl


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    }
  }'

Alternatively, you can use the API to explicitly set the machine type.


curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
  -d '{
    "custom_model": {
      "gcs_uri": "'"${MODEL_GCS}"'"
    },
    "destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
    "model_config": {
      "model_user_id": "'"${MODEL_ID}"'"
    },
    "deploy_config": {
      "dedicated_resources": {
        "machine_spec": {
          "machine_type": "'"${MACHINE_TYPE}"'",
          "accelerator_type": "'"${ACCELERATOR_TYPE}"'",
          "accelerator_count": '"${ACCELERATOR_COUNT}"'
        },
        "min_replica_count": 1
      }
    }
  }'
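
Splicing shell variables into a JSON string by hand is easy to get wrong, so one option is to build the request body in Python and let the `json` module serialize it. The sketch below mirrors the field names and URL from the curl samples above. The function name `build_deploy_request` and the sample values are hypothetical, and the sketch only prepares the URL and body; it doesn't send the request.

```python
import json


def build_deploy_request(project_id, region, model_gcs, model_id, machine_spec=None):
    """Return the deploy URL and a JSON body that mirror the curl samples."""
    body = {
        "custom_model": {"gcs_uri": model_gcs},
        "destination": f"projects/{project_id}/locations/{region}",
        "model_config": {"model_user_id": model_id},
    }
    if machine_spec is not None:
        # Optional explicit machine configuration, as in the second curl sample.
        body["deploy_config"] = {
            "dedicated_resources": {
                "machine_spec": machine_spec,
                "min_replica_count": 1,
            }
        }
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{region}:deploy"
    )
    return url, json.dumps(body, indent=2)


# Hypothetical values, for illustration only.
url, payload = build_deploy_request(
    "my-project", "us-central1", "gs://my-bucket/my-model", "my-model",
    machine_spec={
        "machine_type": "g2-standard-12",
        "accelerator_type": "NVIDIA_L4",
        "accelerator_count": 1,
    },
)
print(url)
print(payload)
```

Because the body is produced by `json.dumps`, it is always valid JSON, and the accelerator count stays a number instead of a quoted string.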

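Deploy using the API

VPC Service Controls works only with private dedicated endpoints. Therefore, you must set private_service_connect_config in the following code sample, which demonstrates how to deploy using the API:

curl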

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/YOUR_PROJECT/locations/us-central1:deploy" \
    -d '{
      "custom_model": {
        "model_id": "test-mg-deploy-092301",
        "gcs_uri": "gs://YOUR_GCS_BUCKET"
      },
      "destination": "projects/YOUR_PROJECT/locations/us-central1",
      "endpoint_config": {
        "endpoint_display_name": "psc-ep1",
        "private_service_connect_config": {
          "enablePrivateServiceConnect": true,
          "projectAllowlist": ["YOUR_PROJECT"]
        }
      },
      "deploy_config": {
        "dedicated_resources": {
          "machine_spec": {
            "machine_type": "g2-standard-24",
            "accelerator_type": "NVIDIA_L4",
            "accelerator_count": 2
          },
          "min_replica_count": 1,
          "max_replica_count": 1
        }
      }
    }'

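Get the endpoint ID and model ID using the Google Cloud console

After the deployment is complete, follow these steps:

In the Google Cloud console, go to the "Model Garden" page.

Go to Model Garden
Click "View my endpoints & models".
In the "My endpoints" table, click the endpoint that you just deployed in the "Name" column. The "Details" page appears.
In the "Deployed models" table, click the model.
Select the "Version details" page. The model ID is displayed in the "Environment variables" row of the table.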

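Set up Private Service Connect

You're adding a new endpoint to access Google APIs. This endpoint can be used in all regions of the VPC network that you select. Also consider the following:

Clients in networks that are connected to the endpoint's VPC network through hybrid connectivity can access the endpoint. For more information, see "Access Google APIs through endpoints".
Clients in peered VPC networks can't access the endpoint.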

List endpoints to get the service attachment

This code sample demonstrates how to list an endpoint to get its service attachment.

curl

curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/YOUR_PROJECT/locations/us-central1/endpoints/YOUR_ENDPOINT_ID"

The following is the response from the list endpoint request.

  {
  "name": "projects/440968033208/locations/us-central1/endpoints/mg-endpoint-2c6ae2be-1491-43fe-b179-cb5a63e2c955",
  "displayName": "psc-ep1",
  "deployedModels": [
    {
      "id": "4026753529031950336",
      "model": "projects/440968033208/locations/us-central1/models/mg-custom-1758645924",
      "displayName": "null-null-null-1758645933",
      "createTime": "2025-09-23T16:45:45.169195Z",
      "dedicatedResources": {
        "machineSpec": {
          "machineType": "g2-standard-24",
          "acceleratorType": "NVIDIA_L4",
          "acceleratorCount": 2
        },
        "minReplicaCount": 1,
        "maxReplicaCount": 1
      },
      "enableContainerLogging": true,
      "privateEndpoints": {
        "serviceAttachment": "projects/qdb392d34e2a11149p-tp/regions/us-central1/serviceAttachments/gkedpm-fbbc4061323c91c14ab4d961a2f8b0"
      },
      "modelVersionId": "1",
      "status": {
        "lastUpdateTime": "2025-09-23T17:26:10.031652Z",
        "availableReplicaCount": 1
      }
    }
  ],
  "trafficSplit": {
    "4026753529031950336": 100
  },
  "etag": "AMEw9yPIWQYdbpHu6g6Mhpu1_10J062_oR9Jw9txrp8dFFbel7odLgSK8CGIogAUkR_r",
  "createTime": "2025-09-23T16:45:45.169195Z",
  "updateTime": "2025-09-23T17:13:36.320873Z",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": [
      "ucaip-vpc-s-1605069239-dut-24"
    ]
  }
}
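
The service attachment sits inside each entry of `deployedModels`, under `privateEndpoints`. As a rough sketch, it can be pulled out of the response like this. The function name `service_attachments` is hypothetical, and the sample is a trimmed copy of the response above.

```python
import json


def service_attachments(endpoint_json: str):
    """Collect the service attachment of every deployed model in an endpoint response."""
    endpoint = json.loads(endpoint_json)
    attachments = []
    for deployed in endpoint.get("deployedModels", []):
        attachment = deployed.get("privateEndpoints", {}).get("serviceAttachment")
        if attachment:
            attachments.append(attachment)
    return attachments


# Trimmed copy of the response shown above.
sample_response = """
{
  "displayName": "psc-ep1",
  "deployedModels": [
    {
      "id": "4026753529031950336",
      "privateEndpoints": {
        "serviceAttachment": "projects/qdb392d34e2a11149p-tp/regions/us-central1/serviceAttachments/gkedpm-fbbc4061323c91c14ab4d961a2f8b0"
      }
    }
  ]
}
"""
print(service_attachments(sample_response))
```

The function returns an empty list when the endpoint has no deployed models or when none of them expose a private endpoint.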

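Create a Private Service Connect endpoint

To create a Private Service Connect (PSC) endpoint, follow these steps:

In the Google Cloud console, go to the "Private Service Connect" page. The "Connected endpoints" page appears.

Go to Private Service Connect
Click "+ Connect endpoint". The "Connect endpoint" page appears.
Select an option from the "Target" field. You can choose "Google APIs", which provides access to most Google APIs and services, or "Published service", which provides access to a published service.
In the "Target details" section, select a value from the "Scope" list, and then select a value from the "Bundle type" list.
In the "Endpoint details" section, do the following:
1. Enter a name in the "Endpoint name" field.
2. Select a value from the "Network" list. Select a VPC network that is located in your project. To create a PSC endpoint in a service project that uses a Shared VPC network in a host project, use the Google Cloud CLI or send an API request instead.
3. Select a value from the "IP address" list.
Expand the "Service Directory" section.
Select a region from the "Region" list.
Select a namespace from the "Namespace" list.
Click "Add endpoint". The "Endpoints" table is updated with a row for your new endpoint.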

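Make a query

This section explains how to query a public endpoint and a private endpoint.

Make a query to a public endpoint

After your model is deployed, custom weights support a public dedicated endpoint. You can send queries by using the API or the SDK.

Before you send queries, you must get your endpoint URL, endpoint ID, and model ID, all of which are available in the Google Cloud console.

To get this information, follow these steps:

In the Google Cloud console, go to the "Model Garden" page.

Go to Model Garden
Click "View my endpoints & models".
Select your region from the "Region" list.
To get the endpoint ID and endpoint URL, click your endpoint in the "My endpoints" section.

The endpoint ID is displayed in the "Endpoint ID" field.

The public endpoint URL is displayed in the "Dedicated endpoint" field.
To get the model ID, find your model in the "Deployed models" section, and then do the following:
1. Click your deployed model's name in the "Model" field.
2. Click "Version details". The model ID is displayed in the "Model ID" field.

After you get your endpoint and deployed model information, see the following code samples for how to send an inference request, or see "Send an online inference request to a dedicated public endpoint".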

API

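The following code samples demonstrate different ways to use the API, depending on your use case.

Chat completion (unary)

This sample request sends a complete chat message to the model and gets the response in a single chunk after the entire response has been generated. This is similar to sending a text message and getting a single full reply.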

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}/chat/completions" \
    -d '{
    "model": "'"${MODEL_ID}"'",
    "temperature": 0,
    "top_p": 1,
    "max_tokens": 154,
    "ignore_eos": true,
    "messages": [
      {
        "role": "user",
        "content": "How to tell the time by looking at the sky?"
      }
    ]
  }'
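
The same unary request can be prepared from Python with only the standard library. This is a sketch under stated assumptions: the function name `build_chat_request` and the sample values are hypothetical, the access token is expected to come from a command such as `gcloud auth print-access-token`, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_chat_request(endpoint_url, project_id, location, endpoint_id,
                       model_id, prompt, access_token):
    """Build (but don't send) a unary chat completion request."""
    url = (
        f"https://{endpoint_url}/v1beta1/projects/{project_id}"
        f"/locations/{location}/endpoints/{endpoint_id}/chat/completions"
    )
    body = {
        "model": model_id,
        "temperature": 0,
        "max_tokens": 154,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_chat_request(
    "example.invalid", "my-project", "us-central1", "1234567890",
    "my-model", "How to tell the time by looking at the sky?", "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` and decode the JSON response body.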

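Chat completion (streaming)

This request is the streaming version of the unary chat completion request. When you add "stream": true to the request, the model sends its response piece by piece as it's being generated. This is useful for creating a real-time, typewriter-like effect in a chat application.

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}/chat/completions" \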
    -d '{
    "model": "'"${MODEL_ID}"'",
    "stream": true,
    "temperature": 0,
    "top_p": 1,
    "max_tokens": 154,
    "ignore_eos": true,
    "messages": [
      {
        "role": "user",
        "content": "How to tell the time by looking at the sky?"
      }
    ]
  }'
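
A client has to reassemble the streamed pieces into text. The sketch below shows one way to do that, assuming the endpoint streams OpenAI-style server-sent events: lines of the form `data: {...}` with the text under `choices[].delta.content`, ending with `data: [DONE]`. This page doesn't describe the stream format, so treat that layout as an assumption and check it against a real response. The function name `iter_stream_text` and the sample lines are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separator lines and anything that isn't a data event.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # Assumed end-of-stream marker.
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up event lines, for illustration only.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Look "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "at the sun."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))
```

Because the function is a generator, a chat application can display each fragment as soon as it arrives instead of waiting for the full reply.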

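Predict

This request sends a direct prompt to get an inference from a model. It's often used for tasks that aren't necessarily conversational, such as text summarization or classification.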

  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
  "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict" \
    -d '{
    "instances": [
      {
        "prompt": "How to tell the time by looking at the sky?",
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 154,
        "ignore_eos": true
      }
    ]
  }'

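Raw predict

This request is a streaming version of the Predict request. By using the :streamRawPredict endpoint and including "stream": true, this request sends a direct prompt and receives the model's output as a continuous stream of data as it's being generated, similar to the streaming chat completion request.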

  curl -X POST \
    -N \
    --output - \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:streamRawPredict" \
    -d '{
    "instances": [
      {
        "prompt": "How to tell the time by looking at the sky?",
        "temperature": 0,
        "top_p": 1,
        "max_tokens": 154,
        "ignore_eos": true,
        "stream": true
      }
    ]
  }'

SDK

This code sample uses the SDK to send a query to a model and get a response back from that model.

  from google.cloud import aiplatform

  project_id = ""
  location = ""
  endpoint_id = "" # Use the short ID here

  aiplatform.init(project=project_id, location=location)

  endpoint = aiplatform.Endpoint(endpoint_id)

  prompt = "How to tell the time by looking at the sky?"
  instances=[{"text": prompt}]
  response = endpoint.predict(instances=instances, use_dedicated_endpoint=True)
  print(response.predictions)

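Make a query to a private endpoint

You can test your query by using a notebook or a VM in an allowed project.

In this sample query, replace the variables with your IP address, project ID, endpoint ID, and model ID (obtained from the preceding deployment steps).

curl

Chat completion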

curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" 'https://YOUR_IP/v1beta1/projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/endpoints/YOUR_ENDPOINT_ID/chat/completions' -d '{ "model": "YOUR_MODEL_ID", "max_tokens": 300, "messages": [{ "role": "user", "content": "how to tell the time by looking at sky?" }]}'

Predict

curl -k -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" 'https://YOUR_IP/v1beta1/projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/endpoints/YOUR_ENDPOINT_ID:predict' -d '{
  "instances": [
    {
      "prompt": "Summarize Goog stock performance",
      "temperature": 0,
      "top_p": 1,
      "max_tokens": 154,
      "ignore_eos": true
    }
  ]
}'

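For more examples of how to use the API, see the Import Custom Weights notebook.

Learn more about self-deployed models in Vertex AI

For more information about Vertex AI online prediction dedicated private endpoints, see "Use dedicated private endpoints based on Private Service Connect for online inference".
For more information about self-deployed models, see "Overview of self-deployed models".
For more information about Model Garden, see "Overview of Model Garden".
For more information about deploying models, see "Use models in Model Garden".
Use Gemma open models
Use Llama open models
Use Hugging Face open models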
