Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

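Get inferences from an image classification model

This page describes how to get online (real-time) inferences and batch inferences from image classification models by using the Google Cloud console or the Agent Platform API.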

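Differences between online and batch inference

Online inference is a synchronous request made to a model endpoint. Use online inference when you are making requests in response to application input, or in situations that require timely inference.

Batch inference is an asynchronous request. You request batch inferences directly from the model resource, without needing to deploy the model to an endpoint. For image data, use batch inference when you don't need an immediate response and want to process accumulated data in a single request.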

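Get online inferences

Deploy a model to an endpoint

You must deploy a model to an endpoint before that model can be used to serve online inferences. Deploying a model associates physical resources with the model so that it can serve online inferences with low latency.

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see Overview of model deployment.

Deploy a model by using one of the following methods: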

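Google Cloud console

In the Google Cloud console, in the Agent Platform section, go to the Models page.

Go to Models
Click the name of the model you want to deploy to open its details page.
In the Version ID column, click the ID of the model version you want to deploy.
Click Deploy & Test.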

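If your model is already deployed to any endpoints, they are listed in the Deploy your model section.
Click Deploy to endpoint.
To deploy your model to a new endpoint, click Create new endpoint and enter a name for the new endpoint. To deploy your model to an existing endpoint, click Add to existing endpoint and select the endpoint in the Endpoint name field.

You can add more than one model to an endpoint, and you can add a model to more than one endpoint. Learn more.

If you deploy to a new endpoint, choose how the endpoint is accessed:
- Click Standard to make the endpoint available for inference through a REST API.
- Click Private to use a private connection for the endpoint.
If you deploy to an existing endpoint that already has one or more models deployed to it, update the traffic split percentages for the model you are deploying and for the already deployed models so that all of the percentages add up to 100%.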
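Select AutoML Image and configure it as follows:
1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so that they add up to 100.
2. Enter the number of compute nodes you want to provide for your model.
  
  This is the number of nodes available to this model at all times. You are charged for the nodes even if there is no inference traffic. See the pricing page.
3. Learn how to change the default settings for inference logging.
4. Classification models only (optional): In the Explainability options section, select Enable feature attributions for this model to enable Vertex Explainable AI. Accept the existing visualization settings or choose new values, and then click Done.
  You can optionally deploy an AutoML image classification model with Vertex Explainable AI configured and perform inferences with explanations. Enabling Vertex Explainable AI at deployment time incurs additional costs based on the deployed node count and deployment time. See Pricing for more information.
5. Click Done for your model, and when all the Traffic split percentages are correct, click Continue.
  The region where your model deploys is displayed. This must be the region where you created your model.
6. Click Deploy to deploy your model to the endpoint.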
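When you add a model to an endpoint that already serves traffic, the traffic split values have to be adjusted so they add up to 100. The following Python sketch shows one way to compute such a split: the new model (temporary ID "0") gets a chosen share, and the existing shares are scaled proportionally with largest-remainder rounding so the whole numbers still total 100. The helper `rebalance_traffic` and its proportional policy are hypothetical; they illustrate the constraint only and are not how the console calculates values.

```python
from typing import Dict


def rebalance_traffic(existing: Dict[str, int], new_share: int) -> Dict[str, int]:
    """Give the model being deployed `new_share` percent of traffic and scale
    the existing DeployedModel shares so that all values add up to 100.

    Hypothetical helper: the proportional policy is an illustration only.
    """
    if not 0 <= new_share <= 100:
        raise ValueError("new_share must be between 0 and 100")
    if not existing:
        # An endpoint with no other models must route everything to the new one.
        if new_share != 100:
            raise ValueError("with no other models, the new model must get 100")
        return {"0": 100}
    old_total = sum(existing.values())
    if old_total <= 0:
        raise ValueError("existing shares must add up to a positive number")
    remaining = 100 - new_share
    # Exact proportional shares, then rounded down to whole percentages.
    exact = {k: share * remaining / old_total for k, share in existing.items()}
    result = {k: int(value) for k, value in exact.items()}
    # Largest-remainder rounding: hand the points lost by rounding down to the
    # models with the largest fractional parts so the total is exactly 100.
    leftover = remaining - sum(result.values())
    by_fraction = sorted(exact, key=lambda k: exact[k] - result[k], reverse=True)
    for k in by_fraction[:leftover]:
        result[k] += 1
    # "0" is the temporary ID that refers to the model being deployed.
    result["0"] = new_share
    return result
```

For example, `rebalance_traffic({"1234567890": 100}, 20)` returns `{"1234567890": 80, "0": 20}`, the same 20/80 split used in the gcloud traffic-splitting example on this page.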

API

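To deploy a model by using the Agent Platform API, complete the following steps:

Create an endpoint if needed.
Get the endpoint ID.
Deploy the model to the endpoint.

Create an endpoint

If you are deploying a model to an existing endpoint, you can skip this step.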

gcloud

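The following example uses the gcloud ai endpoints create command:

gcloud ai endpoints create \
  --region=LOCATION_ID \
  --display-name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: The region where you are using Agent Platform.
ENDPOINT_NAME: The display name for the endpoint.

The Google Cloud CLI tool might take a few seconds to create the endpoint.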

REST

Before using any of the request data, make the following replacements:

LOCATION_ID: Your region.
PROJECT_ID: Your [project ID](/resource-manager/docs/creating-managing-projects#identifiers).
ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

Request JSON body:

{
  "display_name": "ENDPOINT_NAME"
}

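To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: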

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

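PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: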

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

You can poll for the status of the operation until the response includes "done": true.
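Polling means repeating a GET on the operation until its JSON contains "done": true. A minimal Python sketch using only the standard library follows. It assumes that the operation can be read at `https://LOCATION_ID-aiplatform.googleapis.com/v1/` followed by the operation `name` from the response, and that a finished operation carries either a `response` or an `error` field. The function names, the 1.5x backoff, and the timeout values are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict


def http_get_json(url: str, access_token: str) -> Dict[str, Any]:
    """Send an authenticated GET request and decode the JSON response."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_operation(
    operation_name: str,
    location_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    initial_delay: float = 5.0,
    max_delay: float = 60.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a long-running operation until its response contains "done": true.

    `fetch` receives the operation URL and returns the decoded JSON. The
    delay between polls grows by 1.5x, capped at `max_delay`.
    """
    url = f"https://{location_id}-aiplatform.googleapis.com/v1/{operation_name}"
    delay, waited = initial_delay, 0.0
    while True:
        operation = fetch(url)
        if operation.get("done"):
            # A finished operation carries either "response" or "error".
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if waited >= timeout:
            raise TimeoutError(f"{operation_name} not done after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 1.5, max_delay)
```

In practice, `fetch` could be `lambda url: http_get_json(url, token)`, where `token` is the output of `gcloud auth print-access-token`. Passing `fetch` and `sleep` in as arguments keeps the polling logic testable without network access.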


Java

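Before trying this sample, follow the Java setup instructions in the Agent Platform quickstart using client libraries.

To authenticate to Agent Platform, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.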


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

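Before trying this sample, follow the Node.js setup instructions in the Agent Platform quickstart using client libraries.

To authenticate to Agent Platform, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.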

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Create the endpoint; this starts a long-running operation
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

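To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def create_endpoint_sample(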
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

Retrieve the endpoint ID

You need the endpoint ID to deploy the model.

gcloud

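The following example uses the gcloud ai endpoints list command:

gcloud ai endpoints list \
  --region=LOCATION_ID \
  --filter=display_name=ENDPOINT_NAME

Replace the following:

LOCATION_ID: The region where you are using Agent Platform.
ENDPOINT_NAME: The display name for the endpoint.

Note the number that appears in the ENDPOINT_ID column. Use this ID in the following step.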

REST

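Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Agent Platform.
PROJECT_ID: Your project ID.
ENDPOINT_NAME: The display name for the endpoint.

HTTP method and URL: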

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME
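If the endpoint's display name contains spaces or other reserved characters, the filter in the query string has to be percent-encoded. This Python sketch builds the list URL with `urllib.parse`. Wrapping the display name in double quotes inside the filter is an assumption based on the AIP-160 filtering syntax, and `list_endpoints_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def list_endpoints_url(project_id: str, location_id: str, display_name: str) -> str:
    """Build an endpoints.list URL filtered by display name.

    The display name is quoted inside the filter (assumed AIP-160 string
    literal syntax), and the whole filter is percent-encoded so that
    spaces, quotes, and '=' survive the query string.
    """
    base = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/endpoints"
    )
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"filter": f'display_name="{display_name}"'}, quote_via=quote)
    return f"{base}?{query}"
```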

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

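PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: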

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

Note the ENDPOINT_ID.
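The ENDPOINT_ID is the last segment of each endpoint's `name` in the list response. A small Python sketch that extracts it, assuming the `projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID` form shown in the sample response; `endpoint_ids` is a hypothetical helper name.

```python
import json
from typing import List


def endpoint_ids(list_response: str) -> List[str]:
    """Extract ENDPOINT_ID from each entry of an endpoints.list response."""
    data = json.loads(list_response)
    ids = []
    # An empty result can omit the "endpoints" key entirely.
    for endpoint in data.get("endpoints", []):
        # Expected: projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID
        parts = endpoint["name"].split("/")
        if len(parts) != 6 or parts[4] != "endpoints":
            raise ValueError(f"unexpected endpoint name: {endpoint['name']}")
        ids.append(parts[5])
    return ids
```

Display names are not required to be unique, so the filter can match more than one endpoint; the function returns every ID it finds rather than picking one.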

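Deploy the model

Select the tab for your language or environment:

gcloud

The following examples use the gcloud ai endpoints deploy-model command.

The following example deploys a Model to an Endpoint without splitting traffic between multiple DeployedModel resources:

Before using any of the command data below, make the following replacements:

ENDPOINT_ID: The ID for the endpoint.
LOCATION_ID: The region where you are using Agent Platform.
MODEL_ID: The ID of the model to deploy.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes. If you omit the --max-replica-count flag, the maximum number of nodes is set to the value of --min-replica-count.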

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100
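The replica-count rules described for MIN_REPLICA_COUNT and MAX_REPLICA_COUNT (the maximum is never below the minimum, and it defaults to the minimum when --max-replica-count is omitted) can be checked before you run the command. A minimal Python sketch; `replica_flags` is hypothetical, and requiring at least one node is an assumption of this sketch rather than a documented limit.

```python
from typing import List, Optional


def replica_flags(min_count: int, max_count: Optional[int] = None) -> List[str]:
    """Build the replica-count flags for gcloud ai endpoints deploy-model.

    Mirrors the documented default: an omitted max_count equals min_count.
    The min_count >= 1 check is an assumption of this sketch.
    """
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    if max_count is None:
        max_count = min_count
    if max_count < min_count:
        raise ValueError("max_count must not be lower than min_count")
    return [f"--min-replica-count={min_count}", f"--max-replica-count={max_count}"]
```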

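Traffic splitting

In the preceding example, the --traffic-split=0=100 flag sends 100% of the prediction traffic that the Endpoint receives to the new DeployedModel, which is represented by the temporary ID 0. If your Endpoint already has other DeployedModel resources, you can split traffic between the new DeployedModel and the old ones. For example, to send 20% of traffic to the new DeployedModel and 80% to an older one, run the following command.

Before using any of the command data below, make the following replacement:

OLD_DEPLOYED_MODEL_ID: The ID of the existing DeployedModel.

Execute the gcloud ai endpoints deploy-model command:

Linux, macOS, or Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \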
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID `
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows (cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID ^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80
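The --traffic-split value is a comma-separated list of `DEPLOYED_MODEL_ID=PERCENT` pairs, where `0` stands for the model being deployed. The following hypothetical Python helper formats that value and rejects splits that do not add up to 100 before gcloud sees them.

```python
from typing import Dict


def traffic_split_flag(split: Dict[str, int]) -> str:
    """Format a traffic split for the --traffic-split flag.

    Key "0" refers to the model being deployed; any other key is the ID of
    a DeployedModel that is already on the endpoint.
    """
    if any(percent < 0 for percent in split.values()):
        raise ValueError("percentages must not be negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic split adds up to {total}, not 100")
    return "--traffic-split=" + ",".join(f"{key}={pct}" for key, pct in split.items())
```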

REST

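Deploy the model.

Before using any of the request data, make the following replacements:

LOCATION_ID: The region where you are using Agent Platform.
PROJECT_ID: Your project ID.
ENDPOINT_ID: The ID for the endpoint.
MODEL_ID: The ID of the model to deploy.
DEPLOYED_MODEL_NAME: A name for the DeployedModel. You can also use the display name of the Model for the DeployedModel.
MIN_REPLICA_COUNT: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes.
MAX_REPLICA_COUNT: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
TRAFFIC_SPLIT_THIS_MODEL: The percentage of the prediction traffic to this endpoint that is routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. Learn more about traffic splits.
DEPLOYED_MODEL_ID_N: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
TRAFFIC_SPLIT_MODEL_N: The traffic split percentage value for the deployed model ID key.
PROJECT_NUMBER: Your project's automatically generated project number.

HTTP method and URL: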

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

Request JSON body:

{
  "deployedModel": {
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "automaticResources": {
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
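The request body must be valid JSON: placeholders such as MIN_REPLICA_COUNT are numbers, and trailing commas are not allowed. Generating request.json in code avoids those mistakes. A minimal Python sketch that builds the same body; the function name `deploy_model_body` and its defaults (one replica, 100% of traffic to the new model) are illustrative assumptions. An empty split is accepted because the client-library samples on this page note that an empty map means the endpoint should not accept any traffic.

```python
import json
from typing import Dict, Optional


def deploy_model_body(
    project_id: str,
    location_id: str,
    model_id: str,
    deployed_model_name: str,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    traffic_split: Optional[Dict[str, int]] = None,
) -> str:
    """Return a JSON body for endpoints.deployModel using automaticResources."""
    # Default: send all traffic to the new model (temporary ID "0").
    split = {"0": 100} if traffic_split is None else dict(traffic_split)
    if split and sum(split.values()) != 100:
        raise ValueError("trafficSplit percentages must add up to 100")
    body = {
        "deployedModel": {
            "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
            "displayName": deployed_model_name,
            "automaticResources": {
                "minReplicaCount": min_replica_count,
                "maxReplicaCount": max_replica_count,
            },
        },
        "trafficSplit": split,
    }
    return json.dumps(body, indent=2)
```

For example, write the returned string to request.json and send it with the curl or PowerShell command for this step.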

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

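PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: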

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

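Before trying this sample, follow the Java setup instructions in the Agent Platform quickstart using client libraries.

To authenticate to Agent Platform, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.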


import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.longrunning.OperationTimedPollAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.aiplatform.v1.AutomaticResources;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import com.google.cloud.aiplatform.v1.stub.EndpointServiceStubSettings;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.threeten.bp.Duration;

public class DeployModelSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String deployedModelDisplayName = "YOUR_DEPLOYED_MODEL_DISPLAY_NAME";
    String endpointId = "YOUR_ENDPOINT_ID";
    String modelId = "YOUR_MODEL_ID";
    int timeout = 900;
    deployModelSample(project, deployedModelDisplayName, endpointId, modelId, timeout);
  }

  static void deployModelSample(
      String project,
      String deployedModelDisplayName,
      String endpointId,
      String modelId,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {

    // Set long-running operations (LROs) timeout
    final OperationTimedPollAlgorithm operationTimedPollAlgorithm =
        OperationTimedPollAlgorithm.create(
            RetrySettings.newBuilder()
                .setInitialRetryDelay(Duration.ofMillis(5000L))
                .setRetryDelayMultiplier(1.5)
                .setMaxRetryDelay(Duration.ofMillis(45000L))
                .setInitialRpcTimeout(Duration.ZERO)
                .setRpcTimeoutMultiplier(1.0)
                .setMaxRpcTimeout(Duration.ZERO)
                .setTotalTimeout(Duration.ofSeconds(timeout))
                .build());

    EndpointServiceStubSettings.Builder endpointServiceStubSettingsBuilder =
        EndpointServiceStubSettings.newBuilder();
    endpointServiceStubSettingsBuilder
        .deployModelOperationSettings()
        .setPollingAlgorithm(operationTimedPollAlgorithm);
    EndpointServiceStubSettings endpointStubSettings = endpointServiceStubSettingsBuilder.build();
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.create(endpointStubSettings);
    endpointServiceSettings =
        endpointServiceSettings.toBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      EndpointName endpointName = EndpointName.of(project, location, endpointId);
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      ModelName modelName = ModelName.of(project, location, modelId);
      AutomaticResources automaticResourcesInput =
          AutomaticResources.newBuilder().setMinReplicaCount(1).setMaxReplicaCount(1).build();
      DeployedModel deployedModelInput =
          DeployedModel.newBuilder()
              .setModel(modelName.toString())
              .setDisplayName(deployedModelDisplayName)
              .setAutomaticResources(automaticResourcesInput)
              .build();

      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> deployModelResponseFuture =
          endpointServiceClient.deployModelAsync(endpointName, deployedModelInput, trafficSplit);
      System.out.format(
          "Operation name: %s\n", deployModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      DeployModelResponse deployModelResponse = deployModelResponseFuture.get(20, TimeUnit.MINUTES);

      System.out.println("Deploy Model Response");
      DeployedModel deployedModel = deployModelResponse.getDeployedModel();
      System.out.println("\tDeployed Model");
      System.out.format("\t\tid: %s\n", deployedModel.getId());
      System.out.format("\t\tmodel: %s\n", deployedModel.getModel());
      System.out.format("\t\tDisplay Name: %s\n", deployedModel.getDisplayName());
      System.out.format("\t\tCreate Time: %s\n", deployedModel.getCreateTime());

      DedicatedResources dedicatedResources = deployedModel.getDedicatedResources();
      System.out.println("\t\tDedicated Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", dedicatedResources.getMinReplicaCount());

      MachineSpec machineSpec = dedicatedResources.getMachineSpec();
      System.out.println("\t\t\tMachine Spec");
      System.out.format("\t\t\t\tMachine Type: %s\n", machineSpec.getMachineType());
      System.out.format("\t\t\t\tAccelerator Type: %s\n", machineSpec.getAcceleratorType());
      System.out.format("\t\t\t\tAccelerator Count: %s\n", machineSpec.getAcceleratorCount());

      AutomaticResources automaticResources = deployedModel.getAutomaticResources();
      System.out.println("\t\tAutomatic Resources");
      System.out.format("\t\t\tMin Replica Count: %s\n", automaticResources.getMinReplicaCount());
      System.out.format("\t\t\tMax Replica Count: %s\n", automaticResources.getMaxReplicaCount());
    }
  }
}

Node.js

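Before trying this sample, follow the Node.js setup instructions in the Agent Platform quickstart using client libraries.

To authenticate to Agent Platform, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.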

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */

// const modelId = "YOUR_MODEL_ID";
// const endpointId = 'YOUR_ENDPOINT_ID';
// const deployedModelDisplayName = 'YOUR_DEPLOYED_MODEL_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

const modelName = `projects/${project}/locations/${location}/models/${modelId}`;
const endpoint = `projects/${project}/locations/${location}/endpoints/${endpointId}`;
// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint:
const clientOptions = {
  apiEndpoint: `${location}-aiplatform.googleapis.com`,
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function deployModel() {
  // Configure the parent resource
  // key '0' assigns traffic for the newly deployed model
  // Traffic percentage values must add up to 100
  // Leave dictionary empty if endpoint should not accept any traffic
  const trafficSplit = {0: 100};
  const deployedModel = {
    // format: 'projects/{project}/locations/{location}/models/{model}'
    model: modelName,
    displayName: deployedModelDisplayName,
    automaticResources: {minReplicaCount: 1, maxReplicaCount: 1},
  };
  const request = {
    endpoint,
    deployedModel,
    trafficSplit,
  };

  // Deploy the model; this starts a long-running operation
  const [response] = await endpointServiceClient.deployModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Deploy model response');
  const modelDeployed = result.deployedModel;
  console.log('\tDeployed model');
  if (!modelDeployed) {
    console.log('\t\tId : {}');
    console.log('\t\tModel : {}');
    console.log('\t\tDisplay name : {}');
    console.log('\t\tCreate time : {}');

    console.log('\t\tDedicated resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMachine spec {}');
    console.log('\t\t\t\tMachine type : {}');
    console.log('\t\t\t\tAccelerator type : {}');
    console.log('\t\t\t\tAccelerator count : {}');

    console.log('\t\tAutomatic resources');
    console.log('\t\t\tMin replica count : {}');
    console.log('\t\t\tMax replica count : {}');
  } else {
    console.log(`\t\tId : ${modelDeployed.id}`);
    console.log(`\t\tModel : ${modelDeployed.model}`);
    console.log(`\t\tDisplay name : ${modelDeployed.displayName}`);
    console.log(`\t\tCreate time : ${modelDeployed.createTime}`);

    const dedicatedResources = modelDeployed.dedicatedResources;
    console.log('\t\tDedicated resources');
    if (!dedicatedResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMachine spec {}');
      console.log('\t\t\t\tMachine type : {}');
      console.log('\t\t\t\tAccelerator type : {}');
      console.log('\t\t\t\tAccelerator count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${dedicatedResources.minReplicaCount}`
      );
      const machineSpec = dedicatedResources.machineSpec;
      console.log('\t\t\tMachine spec');
      console.log(`\t\t\t\tMachine type : ${machineSpec.machineType}`);
      console.log(
        `\t\t\t\tAccelerator type : ${machineSpec.acceleratorType}`
      );
      console.log(
        `\t\t\t\tAccelerator count : ${machineSpec.acceleratorCount}`
      );
    }

    const automaticResources = modelDeployed.automaticResources;
    console.log('\t\tAutomatic resources');
    if (!automaticResources) {
      console.log('\t\t\tMin replica count : {}');
      console.log('\t\t\tMax replica count : {}');
    } else {
      console.log(
        `\t\t\tMin replica count : \
          ${automaticResources.minReplicaCount}`
      );
      console.log(
        `\t\t\tMax replica count : \
          ${automaticResources.maxReplicaCount}`
      );
    }
  }
}
deployModel();

Python

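To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Dict, Optional, Sequence, Tuple

from google.cloud import aiplatform


def deploy_model_with_automatic_resources_sample(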
    project,
    location,
    model_name: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

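Learn how to change the default settings for inference logging.

Get operation status

Some requests start long-running operations that take time to complete. These requests return an operation name, which you can use to view the operation's status or to cancel the operation. Agent Platform provides helper methods for making calls against long-running operations. For more information, see Working with long-running operations.

Get online inferences by using a deployed model

To make an online inference, submit one or more test items to a model for analysis, and the model returns results that are based on your model's objective. For more information about inference results, see the Interpret results page.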

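Console

Use the Google Cloud console to request an online inference. Your model must be deployed to an endpoint.

In the Google Cloud console, in the Agent Platform section, go to the Models page.

Go to Models
In the list of models, click the name of the model to request inferences from.
Select the Deploy & test tab.
In the Test your model section, add test items to request an inference.

AutoML models for image objectives require you to upload an image to request an inference.

For information about local feature importance, see Get explanations.

After the inference is complete, Gemini Enterprise Agent Platform returns the results in the console.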

API

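Use the Agent Platform API to request an online inference. Your model must be deployed to an endpoint. For a more detailed example, see the tutorial notebook on online prediction with an AutoML image classification model.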
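An online request to a deployed image classification model carries the image bytes in the request body. The following Python sketch builds a body for the endpoint's `:predict` method. The instance shape (`{"content": <base64-encoded image>}`) and the `confidenceThreshold` and `maxPredictions` parameter names are assumptions based on the AutoML image classification online prediction format; confirm them against the API reference before relying on them. Both helper names are hypothetical.

```python
import base64
import json
from typing import Optional


def predict_url(project_id: str, location_id: str, endpoint_id: str) -> str:
    """URL of the endpoint's :predict method."""
    return (
        f"https://{location_id}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location_id}/endpoints/{endpoint_id}:predict"
    )


def image_predict_body(
    image_bytes: bytes,
    confidence_threshold: Optional[float] = None,
    max_predictions: Optional[int] = None,
) -> str:
    """Build a request body with one base64-encoded image instance (assumed schema)."""
    body = {"instances": [{"content": base64.b64encode(image_bytes).decode("ascii")}]}
    # Only include parameters that were set; otherwise the service defaults apply.
    parameters = {}
    if confidence_threshold is not None:
        parameters["confidenceThreshold"] = confidence_threshold
    if max_predictions is not None:
        parameters["maxPredictions"] = max_predictions
    if parameters:
        body["parameters"] = parameters
    return json.dumps(body)
```

You could save the output to request.json and POST it to the URL from `predict_url` with the same curl pattern used elsewhere on this page.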

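Get batch inferences

To make a batch inference request, you specify an input source and an output destination where Gemini Enterprise Agent Platform stores the inference results. Batch inferences for the AutoML image model type require an input JSON Lines file and the name of a Cloud Storage bucket to store the output.

Input data requirements

The input for batch requests specifies the items to send to your model for inference. For image classification models, you can use a JSON Lines file to specify a list of images to make inferences about, and then store the JSON Lines file in a Cloud Storage bucket. The following sample shows a single line in an input JSON Lines file: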

{"content": "gs://sourcebucket/datasets/images/source_image.jpg", "mimeType": "image/jpeg"}
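With many images, the input file is easier to generate than to write by hand. This Python sketch produces one line per image in the format of the sample line, inferring `mimeType` from the file extension with the standard `mimetypes` module. `build_input_jsonl` is a hypothetical helper, and inferring the type from the extension is an assumption that only holds when file names carry accurate extensions.

```python
import json
import mimetypes
from typing import Iterable


def build_input_jsonl(gcs_uris: Iterable[str]) -> str:
    """Return JSON Lines text with one {"content", "mimeType"} object per image."""
    lines = []
    for uri in gcs_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
        # Infer the MIME type from the extension, for example .jpg -> image/jpeg.
        mime_type, _ = mimetypes.guess_type(uri)
        if mime_type is None or not mime_type.startswith("image/"):
            raise ValueError(f"cannot infer an image MIME type for {uri}")
        lines.append(json.dumps({"content": uri, "mimeType": mime_type}))
    return "".join(line + "\n" for line in lines)
```

Write the returned text to a file and copy it to your bucket, for example with `gcloud storage cp`.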

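Request a batch inference

For batch inference requests, you can use the Google Cloud console or the Agent Platform API. Depending on the number of input items that you've submitted, a batch inference task can take some time to complete.

Google Cloud console

Use the Google Cloud console to request a batch inference.

In the Google Cloud console, in the Agent Platform section, go to the Batch predictions page.

Go to Batch predictions
Click Create to open the New batch prediction window, and complete the following steps:
1. Enter a name for the batch inference.
2. For Model name, select the name of the model to use for this batch inference.
3. For Source path, specify the Cloud Storage location where your JSON Lines input file is located.
4. For Destination path, specify a Cloud Storage location where the batch inference results are stored. The Output format is determined by your model's objective. AutoML models for image objectives output JSON Lines files.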

API

Use the Agent Platform API to send batch inference requests.

REST

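Before using any of the request data, make the following replacements:

LOCATION_ID: The region where the model is stored and the batch inference job is executed, for example, us-central1.
PROJECT_ID: Your project ID.
BATCH_JOB_NAME: A display name for the batch job.
MODEL_ID: The ID of the model to use for making inferences.
THRESHOLD_VALUE (optional): Gemini Enterprise Agent Platform returns only inferences that have confidence scores of at least this value. The default is 0.0.
MAX_PREDICTIONS (optional): Gemini Enterprise Agent Platform returns up to this many inferences, starting with the inferences that have the highest confidence scores. The default is 10.
URI: The Cloud Storage URI where your input JSON Lines file is located.
BUCKET: The Cloud Storage location (a gs:// URI prefix) where the output is written.
PROJECT_NUMBER: Your project's automatically generated project number.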
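Taken together, THRESHOLD_VALUE and MAX_PREDICTIONS act as a filter followed by a top-N cut. The following Python sketch reproduces those documented semantics on a list of (label, confidence) pairs, which can help when choosing values. It is a client-side illustration under that reading of the parameters, not the service's implementation; `apply_prediction_params` is a hypothetical name.

```python
from typing import List, Tuple


def apply_prediction_params(
    labels: List[Tuple[str, float]],
    confidence_threshold: float = 0.0,
    max_predictions: int = 10,
) -> List[Tuple[str, float]]:
    """Keep labels whose confidence is at least `confidence_threshold`, then
    return at most `max_predictions` of them, highest confidence first.

    The defaults (0.0 and 10) match the documented defaults.
    """
    kept = [(name, score) for name, score in labels if score >= confidence_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:max_predictions]
```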

HTTP method and URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs

Request JSON body:

{
    "displayName": "BATCH_JOB_NAME",
    "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
    "modelParameters": {
        "confidenceThreshold": THRESHOLD_VALUE,
        "maxPredictions": MAX_PREDICTIONS
    },
    "inputConfig": {
        "instancesFormat": "jsonl",
        "gcsSource": {
            "uris": ["URI"]
        }
    },
    "outputConfig": {
        "predictionsFormat": "jsonl",
        "gcsDestination": {
            "outputUriPrefix": "BUCKET"
        }
    }
}
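As with the deploy request, generating the body in code avoids invalid JSON such as trailing commas. This minimal Python sketch builds the batchPredictionJobs request body from the fields listed for this request and includes modelParameters only when a value is set. The helper name `batch_job_body` and the gs:// check are illustrative assumptions.

```python
import json
from typing import List, Optional


def batch_job_body(
    project_id: str,
    location_id: str,
    model_id: str,
    job_name: str,
    input_uris: List[str],
    output_uri_prefix: str,
    confidence_threshold: Optional[float] = None,
    max_predictions: Optional[int] = None,
) -> str:
    """Return a JSON body for batchPredictionJobs.create with JSON Lines I/O."""
    for uri in [*input_uris, output_uri_prefix]:
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri}")
    body = {
        "displayName": job_name,
        "model": f"projects/{project_id}/locations/{location_id}/models/{model_id}",
        "inputConfig": {"instancesFormat": "jsonl", "gcsSource": {"uris": list(input_uris)}},
        "outputConfig": {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri_prefix},
        },
    }
    # Send only the parameters that were set; the documented defaults apply otherwise.
    parameters = {}
    if confidence_threshold is not None:
        parameters["confidenceThreshold"] = confidence_threshold
    if max_predictions is not None:
        parameters["maxPredictions"] = max_predictions
    if parameters:
        body["modelParameters"] = parameters
    return json.dumps(body, indent=2)
```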

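To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: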

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs"

PowerShell

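Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: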

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/batchPredictionJobs/BATCH_JOB_ID",
  "displayName": "BATCH_JOB_NAME",
  "model": "projects/PROJECT_ID/locations/LOCATION_ID/models/MODEL_ID",
  "inputConfig": {
    "instancesFormat": "jsonl",
    "gcsSource": {
      "uris": [
        "URI"
      ]
    }
  },
  "outputConfig": {
    "predictionsFormat": "jsonl",
    "gcsDestination": {
      "outputUriPrefix": "BUCKET"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "2020-05-30T02:58:44.341643Z",
  "updateTime": "2020-05-30T02:58:44.341643Z",
  "modelDisplayName": "MODEL_NAME",
  "modelObjective": "MODEL_OBJECTIVE"
}

You can poll for the status of the batch job by using the BATCH_JOB_ID until the job state is JOB_STATE_SUCCEEDED.
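A batch job reports its progress in its `state` field. This Python sketch polls until the state is JOB_STATE_SUCCEEDED and stops early on JOB_STATE_FAILED or JOB_STATE_CANCELLED. Treating only those two as failures is an assumption (check the JobState reference for the complete list), and `get_state` is a caller-supplied function, for example one that sends a GET request for the job's `name` and returns its `state`.

```python
import time
from typing import Callable

# Assumption: unsuccessful terminal states handled by this sketch. Other
# states simply keep polling until the timeout.
FAILED_STATES = {"JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_batch_job(
    get_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout: float = 24 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a batch job's state until it reaches JOB_STATE_SUCCEEDED."""
    waited = 0.0
    while True:
        state = get_state()
        if state == "JOB_STATE_SUCCEEDED":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"batch job finished in state {state}")
        if waited >= timeout:
            raise TimeoutError(f"batch job still in {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```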

Python

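To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from typing import Sequence, Union

from google.cloud import aiplatform


def create_batch_prediction_job_sample(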
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    sync: bool = True,
):
    aiplatform.init(project=project, location=location)

    my_model = aiplatform.Model(model_resource_name)

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

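Retrieve batch inference results

Gemini Enterprise Agent Platform sends batch inference output to your specified destination.

When a batch inference task is complete, the output of the inference is stored in the Cloud Storage bucket that you specified in your request.

Example batch inference results

The following is an example of batch inference results from an image classification model.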

{
  "instance": {"content": "gs://bucket/image.jpg", "mimeType": "image/jpeg"},
  "prediction": {
    "ids": [1, 2],
    "displayNames": ["cat", "dog"],
    "confidences": [0.7, 0.5]
  }
}
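Each output line pairs an input `instance` with a `prediction` whose `displayNames` and `confidences` lists line up by index. The following Python sketch reads such lines (for example, after downloading the result files from your bucket with `gcloud storage cp`) and returns the highest-confidence label for each image. The helper name `top_labels` is hypothetical, and it assumes every line follows the structure of the sample result.

```python
import json
from typing import Iterable, List, Optional, Tuple


def top_labels(result_lines: Iterable[str]) -> List[Tuple[str, Optional[str], float]]:
    """Return (image URI, top label, confidence) for each batch output line.

    Images with no returned labels (for example, all below the confidence
    threshold) are reported as (uri, None, 0.0).
    """
    results = []
    for line in result_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        uri = record["instance"]["content"]
        prediction = record.get("prediction", {})
        names = prediction.get("displayNames", [])
        scores = prediction.get("confidences", [])
        if not scores:
            results.append((uri, None, 0.0))
            continue
        # displayNames[i] corresponds to confidences[i].
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((uri, names[best], scores[best]))
    return results
```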
