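Data objects

In Agent Retrieval (formerly Vector Search 2.0), a collection stores data as individual JSON objects called data objects. This page describes the validation rules that data objects must satisfy, and how to create, read, update, import, export, and delete data objects individually or in batches.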

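Data validation

Every data object that Agent Retrieval ingests is checked against a fixed set of rules. If a record violates any rule, the record is rejected and sent to the error sink with code = INVALID_ARGUMENT; no further checks run on that record. To avoid a "fix one error, re-ingest, hit the next error" loop, validate your dataset against all of the data validation rules before you start an ingestion or index build job.

The pipeline applies validation in the following order: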

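Stage	What is checked
1. Parsing	The JSON is well-formed; required top-level fields are present; field types are correct
2. ID validation	The `id` field is present and matches the documented ID format
3. Data field schema	The `data` payload conforms to the JSON schema declared in the collection's `CollectionConfig`
4. Embedding validation	Each dense/sparse vector conforms to the collection's vector schema (name, type, dimensions, finite values, sparse parity, and uniqueness)
5. Searchable field population	Fields marked as searchable in the schema can be extracted successfully from `data`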

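1. Parse validation

Parse validation runs first, as each input line is converted into an internal data object. These rules apply to both supported JSON formats: the default format (with top-level vectors/data objects) and the v1 format (with embedding, sparse_embedding, restricts, or numeric_restricts). The format is detected automatically for each record.

JSON must be parseable. Every line must parse as a JSON object. If a line's top-level keys match neither the default nor the v1 format, the line is rejected with Unknown JSON format for string: <line>.
id is required. Every record must contain a non-null id. Otherwise: 'id' field is missing or null.
An embedding must be present (v1 format only). A v1-format record must contain at least one of embedding or sparse_embedding. Otherwise: 'embedding' or 'sparse_embedding' fields are missing.
Dense embedding type check. A dense embedding field (embedding in v1, or any array value under vectors in the default format) must be a JSON array of numbers. Values that can't be coerced to float are rejected with '<field>' field contains non-float values.
Sparse embedding structure check. For each sparse vector:
- It must be a JSON object.
- It must contain two arrays: values (floats) and indices (long integers); in v1, these are values and dimensions.
- values must not be empty.
- Indices must not be negative.
- values.length must equal indices.length (or dimensions.length in v1).
data field type. If present, data must be a JSON object, not an array, string, or scalar. Otherwise: 'data' field is not a JSON object.
numeric_restricts shape (v1 format). numeric_restricts must be a JSON array of objects. Each entry must have a string namespace and exactly one of value_int, value_float, or value_double set.

Most "first failure" problems occur at this stage. Common errors include stringified numbers in embedding arrays, a missing id, and values and indices arrays of different lengths.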
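
The parse-stage rules can be approximated on the client before upload. The following is a minimal sketch, not the service's implementation. It assumes default-format records in which each entry under vectors is either a JSON array (treated as dense) or an object with values and indices (treated as sparse). The function name check_record_line and every message not quoted in the rules are hypothetical. Unlike the pipeline, it collects every problem instead of stopping at the first one.

```python
import json


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_record_line(line):
    """Return a list of parse-stage problems for one default-format line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"line is not parseable JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    problems = []
    if record.get("id") is None:
        problems.append("'id' field is missing or null")
    if "data" in record and not isinstance(record["data"], dict):
        problems.append("'data' field is not a JSON object")

    vectors = record.get("vectors", {})
    if not isinstance(vectors, dict):
        return problems + ["'vectors' field is not a JSON object"]
    for name, vector in vectors.items():
        if isinstance(vector, list):  # assumed dense: a plain array of numbers
            if not all(_is_number(v) for v in vector):
                problems.append(f"'{name}' field contains non-float values")
        elif isinstance(vector, dict):  # assumed sparse: values + indices
            values = vector.get("values")
            indices = vector.get("indices")
            if not isinstance(values, list) or not isinstance(indices, list):
                problems.append(f"'{name}': values and indices must both be arrays")
            elif not values:
                problems.append(f"'{name}': values must not be empty")
            elif len(values) != len(indices):
                problems.append(f"'{name}': values and indices differ in length")
            elif any(not _is_number(i) or isinstance(i, float) or i < 0 for i in indices):
                problems.append(f"'{name}': indices must be non-negative integers")
        else:
            problems.append(f"'{name}': vector must be an array or an object")
    return problems
```

Running this over every line of an input file and printing the line number alongside each non-empty result gives a single report of all parse-stage problems.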

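2. ID validation

Data object IDs must conform to RFC 1035. In practice, this means:

1 to 63 characters long.
Only lowercase letters, digits, and hyphens (-). No uppercase letters, underscores, spaces, symbols, or Unicode.
Must start with a lowercase letter (a-z).
Must end with a lowercase letter or a digit (no trailing -).

Regular expression: [a-z]([-a-z0-9]{0,61}[a-z0-9])?

The following table lists common examples:

ID	Valid?	Reason
`doc-123`	Yes	Starts with a letter; uses only lowercase letters, digits, and hyphens; ends with a digit
`a`	Yes	A single lowercase letter is the minimum
`product-sku-42`	Yes	Satisfies all rules
`Doc-123`	No	Uppercase `D` is not allowed
`123-doc`	No	Must start with a letter, not a digit
`doc_123`	No	Underscores are not allowed
`doc-123-`	No	Must not end with a hyphen
`my doc`	No	Spaces are not allowed
64 or more characters	No	Maximum length is 63 characters

This shape is the DNS label rule (the part of a hostname between dots, for example my-service in my-service.example.com). It is reused here so that IDs round-trip safely through URLs, file names, logs, and CLIs without escaping.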
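
The ID rule is easy to check locally with the documented regular expression. The sketch below uses Python's standard re module; the helper name is_valid_id is an illustrative choice, not part of any client library.

```python
import re

# Pattern copied from the rule above; fullmatch anchors it to the whole string.
ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_id(data_object_id):
    """Return True if the ID satisfies the documented RFC 1035 label rule."""
    return ID_PATTERN.fullmatch(data_object_id) is not None


for candidate in ["doc-123", "a", "Doc-123", "123-doc", "doc_123", "doc-123-", "my doc"]:
    print(f"{candidate!r}: {is_valid_id(candidate)}")
```

Using fullmatch rather than match avoids accepting a valid prefix followed by disallowed characters, such as a trailing hyphen or newline.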

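3. Data field validation (JSON schema)

The data field validation stage runs only when a dataSchema is declared in the collection's CollectionConfig. If no schema is set, this stage is skipped.

Schema conformance. The data object's data payload is serialized to JSON and validated against the configured JSON schema (draft 7). The validator reports one error per schema violation, so a record with three invalid fields produces three error messages.
- Message: DataObject with id <id> failed schema validation: <error>.
Schema processing errors. If the schema validator itself throws (for example, on an unsupported draft feature), the record is rejected with DataObject with id <id> failed schema validation processing: <exception>.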

4. Embedding field validation

The embedding field validation stage iterates over the dense vectors first, then the sparse vectors. Both lists share one set of already-seen vector names, so a name can't be reused, even across the dense/sparse boundary.

Shared rules (apply to both dense and sparse)

Rule	Why it matters	Error message
Vector names must be unique across the dense and sparse vectors of a data object	Two entries with the same vector name target the same storage key, which produces undefined last-writer-wins behavior	`... has duplicate embedding field '<name>' across its dense/sparse vectors; each vector name must appear at most once`
The vector name must be declared in the `CollectionConfig` vector schema	An unknown vector name can't be routed to a column	`... has dense/sparse embedding field '<name>' but this field is not defined in CollectionConfig vector schema`

Dense-only rules

Rule	Error message
The field must be configured as dense in the collection schema.	`... has dense embedding field '<name>' but CollectionConfig defines it as non-dense`
The dimension must match the configured dimension.	`... field '<name>': expected dense embedding dimension <expected>, but got <actual>`
All values must be finite: no `NaN`, `+Infinity`, or `-Infinity`. Non-finite values break distance computations.	`... field '<name>': dense embedding contains non-finite value <v> at index <i> (NaN/Infinity values are not allowed)`

Sparse-only rules

Rule	Error message
The field must be configured as sparse in the collection schema.	`... has sparse embedding field '<name>' but CollectionConfig defines it as non-sparse`
Index/value length parity: `indicesCount == valuesCount`.	`... field '<name>': sparse embedding has <n> indices but <m> values; indices and values must have the same length`
Non-negative indices: every index >= 0.	`... field '<name>': sparse embedding contains negative index <i> at position <p> (indices must be non-negative)`
No duplicate indices within a single sparse vector.	`... field '<name>': sparse embedding contains duplicate index <i> (each index must appear at most once)`
All values must be finite.	`... field '<name>': sparse embedding contains non-finite value <v> at position <p> (NaN/Infinity values are not allowed)`
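
The embedding rules in this section can be mirrored in a client-side pre-check. This is a rough sketch under stated assumptions: the schema argument is a hypothetical plain dict (not the real CollectionConfig type), vectors are passed as simple tuples, and the messages are shortened paraphrases of the ones in the tables. It reports all problems for a record rather than only the first.

```python
import math


def check_embeddings(dense, sparse, schema):
    """Collect embedding-stage problems for one record.

    dense:  list of (name, values) pairs
    sparse: list of (name, indices, values) triples
    schema: hypothetical dict mapping vector name to
            {"type": "dense" | "sparse", "dimensions": int}
            ("dimensions" is only read for dense vectors)
    """
    problems = []
    seen = set()  # shared across dense and sparse vectors

    def declared_as(name, expected_type):
        if name in seen:
            problems.append(f"duplicate embedding field '{name}'")
            return False
        seen.add(name)
        spec = schema.get(name)
        if spec is None:
            problems.append(f"'{name}' is not defined in the vector schema")
            return False
        if spec["type"] != expected_type:
            problems.append(f"'{name}' is defined as non-{expected_type}")
            return False
        return True

    for name, values in dense:
        if not declared_as(name, "dense"):
            continue
        expected = schema[name]["dimensions"]
        if len(values) != expected:
            problems.append(f"'{name}': expected dimension {expected}, but got {len(values)}")
        for i, v in enumerate(values):
            if not math.isfinite(v):
                problems.append(f"'{name}': non-finite value {v} at index {i}")

    for name, indices, values in sparse:
        if not declared_as(name, "sparse"):
            continue
        if len(indices) != len(values):
            problems.append(f"'{name}': {len(indices)} indices but {len(values)} values")
        if any(i < 0 for i in indices):
            problems.append(f"'{name}': negative index")
        if len(set(indices)) != len(indices):
            problems.append(f"'{name}': duplicate index")
        for p, v in enumerate(values):
            if not math.isfinite(v):
                problems.append(f"'{name}': non-finite value {v} at position {p}")
    return problems
```

Dense vectors are processed before sparse ones, matching the order described in this section, so a name that appears in both lists is reported against the sparse entry.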

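5. Searchable field population

After embedding validation succeeds, the pipeline walks the data payload using the collection's dataSchema and copies the schema-declared fields (strings, integers/numbers, booleans, string arrays, and nested objects) into the searchable field index. Two failure modes at this stage also reject the record:

A path is expected to be a struct but contains a scalar (for example, the schema says author.name is a string, but author itself is a string in the document).
A string array field contains a non-string element.

These failures usually mean that the document's shape has drifted from the declared schema, and they aren't always caught at the JSON schema stage.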
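
Both failure modes can be detected ahead of time by walking the payload alongside the schema. The sketch below assumes the dataSchema is available as a JSON-schema-style Python dict with properties, type, and items keys; the function name check_searchable_shape is hypothetical, and the traversal is an illustration rather than the pipeline's actual algorithm.

```python
def check_searchable_shape(data, schema, path=""):
    """Return problems where data's shape disagrees with a JSON-schema-like dict."""
    problems = []
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue  # absent fields are not a shape mismatch in this sketch
        value = data[key]
        here = f"{path}.{key}" if path else key
        declared = spec.get("type")
        if declared == "object":
            if isinstance(value, dict):
                # Recurse into nested objects, extending the dotted path.
                problems.extend(check_searchable_shape(value, spec, here))
            else:
                problems.append(f"{here}: expected an object, found {type(value).__name__}")
        elif declared == "array" and spec.get("items", {}).get("type") == "string":
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                problems.append(f"{here}: expected an array of strings")
    return problems
```

For the author.name example, a document with "author": "Jane" yields a problem at path author, because the schema expects an object there.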

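Readiness checklist

Before you start ingestion or an index build, validate the entire dataset against the following rules. These are the same checks that the pipeline applies, listed in order so that a single client-side pass can surface every problem:

Format: every line parses as JSON and matches the default or v1 shape.
IDs: every ID matches the RFC 1035 regular expression [a-z]([-a-z0-9]{0,61}[a-z0-9])? and is unique within the dataset.
Embedding presence: every record has at least one vector; vector names are listed in the CollectionConfig vector schema with the correct dense or sparse type.
Dense vectors: the dimension is correct, and there are no NaN, +Inf, or -Inf values.
Sparse vectors: values.length == indices.length; all indices are greater than or equal to 0 and are not duplicated; there are no non-finite values.
Vector names: no name is repeated across the dense and sparse vectors of a record.
Data schema: if a dataSchema is configured, the data payload validates against it (draft 7), and the actual JSON type of each field matches the declared type (especially nested objects and string arrays).
v1 numeric_restricts: every entry has a string namespace and exactly one of value_int, value_float, or value_double.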
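
Most checklist items apply to one record at a time, but ID uniqueness needs a pass over the whole dataset. The following sketch shows one way to find repeated IDs in newline-delimited JSON; the function name find_duplicate_ids is hypothetical, and it assumes every non-blank line has already passed the parse check.

```python
import json
from collections import Counter


def find_duplicate_ids(lines):
    """Return the sorted list of string IDs that appear more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        record_id = record.get("id") if isinstance(record, dict) else None
        if isinstance(record_id, str):
            counts[record_id] += 1
    return sorted(i for i, n in counts.items() if n > 1)
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, which keeps memory use proportional to the number of distinct IDs rather than the file size.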

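Create a data object

The following example shows how to add a single data object to the collection with ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects?dataObjectId=DATA_OBJECT_ID

JSON request body: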

{
  "data": {
    "director": "Frank Darabont",
    "genre": "Drama",
    "title": "The Shawshank Redemption",
    "year": 1994
  },
  "vectors":{
    "genre_embedding": {
      "dense": {
        "values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ]
      }
    },
    "sparse_embedding": {
      "sparse": {
        "indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ],
        "values": [ 1, 6, 3, 2, 8, 5, 2 ]
      }
    }
  }
}

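To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: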

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects?dataObjectId=DATA_OBJECT_ID"

PowerShell (Windows)

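Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: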

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects?dataObjectId=DATA_OBJECT_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "data": {
    "director": "Frank Darabont",
    "title": "The Shawshank Redemption",
    "year": 1994,
    "genre": "Drama"
  },
  "vectors": {
    "genre_embedding": {
      "dense": {
        "values": [
          0.3863801,
          0.73934346,
          0.16189057,
          0.5271367
        ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [
          0.47520825,
          0.090267465,
          0.8752308
        ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [
          0.5920452,
          0.08301644,
          0.12647335,
          0.619643,
          0.49258286
        ]
      }
    },
    "sparse_embedding": {
      "sparse": {
        "values": [
          1,
          6,
          3,
          2,
          8,
          5,
          2
        ],
        "indices": [
          4065,
          13326,
          17377,
          25918,
          28105,
          32683,
          42998
        ]
      }
    }
  }
}

gcloud

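Before using any of the command data below, make the following replacements:

DATA_FILE: The local path to a JSON file that contains the data portion of the data object.

Example file contents: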

{
  "director": "Frank Darabont",
  "genre": "Drama",
  "title": "The Shawshank Redemption",
  "year": 1994
}

VECTORS_FILE: The local path to a JSON file that contains the vectors portion of the data object.

Example file contents:

{
  "genre_embedding": {
    "dense": {
      "values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ]
    }
  },
  "plot_embedding": {
    "dense": {
      "values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ]
    }
  },
  "soundtrack_embedding": {
    "dense": {
      "values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ]
    }
  },
  "sparse_embedding": {
    "sparse": {
      "indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ],
      "values": [ 1, 6, 3, 2, 8, 5, 2 ]
    }
  }
}

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell

gcloud vector-search collections data-objects create DATA_OBJECT_ID \
  --data=DATA_FILE \
  --vectors=VECTORS_FILE \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects create DATA_OBJECT_ID `
  --data=DATA_FILE `
  --vectors=VECTORS_FILE `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects create DATA_OBJECT_ID ^
  --data=DATA_FILE ^
  --vectors=VECTORS_FILE ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

Created dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
data_object = vectorsearch_v1.DataObject(
    data={
        "title": "The Shawshank Redemption",
        "genre": "Drama",
        "year": 1994,
        "director": "Frank Darabont",
    },
    vectors={
        "plot_embedding": {
            "dense": {"values": [0.1, 0.2, 0.3]}
        },
        "genre_embedding": {
            "dense": {"values": [0.4, 0.5, 0.6, 0.7]}
        },
        "soundtrack_embedding": {
            "dense": {"values": [0.8, 0.9, 1.0, 1.1, 1.2]}
        },
        "sparse_embedding": {
            "sparse": {"values": [1.0, 2.0], "indices": [10, 20]}
        },
    },
)
request = vectorsearch_v1.CreateDataObjectRequest(
    parent="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    data_object_id="DATA_OBJECT_ID",
    data_object=data_object,
)

# Make the request
response = data_object_service_client.create_data_object(request=request)

# Handle the response
print(response)

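Fields that are configured for auto-embedding in the collection schema are populated automatically. You can also bring your own embeddings (BYOE) by setting values for vector fields that aren't auto-populated.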

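Batch create data objects

To bulk ingest a small number of records efficiently (up to 1000 data objects per request), use batchCreate. The whole batch is atomic: either all of the data objects are created, or the entire request fails. For larger datasets, consider importing data objects from Cloud Storage.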
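
When a dataset is somewhat larger than one batch, it has to be split into groups that respect the 1000-object limit. The helper below is a small standard-library sketch (the name chunked is an illustrative choice); each yielded group could then be sent as one batchCreate request.

```python
from itertools import islice

MAX_BATCH_SIZE = 1000  # per-request limit stated above


def chunked(items, size=MAX_BATCH_SIZE):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: 2500 items become batches of 1000, 1000, and 500.
print([len(batch) for batch in chunked(range(2500))])
```

Atomicity applies to each request separately, so a failure in one batch does not undo batches that were already created.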

REST

Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchCreate

JSON request body:

{
  "requests": [
    {
      "parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId": "movie-1",
      "dataObject": {
        "data": {
          "title": "The Shawshank Redemption",
          "year": 1994
        },
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.47, 0.09, 0.87] }
          }
        }
      }
    },
    {
      "parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId": "movie-2",
      "dataObject": {
        "data": {
          "title": "The Godfather",
          "year": 1972
        },
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.12, 0.55, 0.31] }
          }
        }
      }
    }
  ]
}

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchCreate"

PowerShell (Windows)

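Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: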

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchCreate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "dataObjects": [
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
      "data": {
        "title": "The Shawshank Redemption",
        "year": 1994
      },
      "vectors": {
        "plot_embedding": {
          "dense": { "values": [0.47, 0.09, 0.87] }
        }
      }
    },
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
      "data": {
        "title": "The Godfather",
        "year": 1972
      },
      "vectors": {
        "plot_embedding": {
          "dense": { "values": [0.12, 0.55, 0.31] }
        }
      }
    }
  ]
}

gcloud

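Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell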

gcloud vector-search collections data-objects batch-create \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-create `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Windows (cmd.exe)

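Note: If this command uses ' quotes to quote content, replace those single quotes with double quotes. If the quoted content is nested, escape the inner quotes with \".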

gcloud vector-search collections data-objects batch-create ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Build per-DataObject create requests.
requests = [
    vectorsearch_v1.CreateDataObjectRequest(
        parent=parent,
        data_object_id="movie-1",
        data_object=vectorsearch_v1.DataObject(
            data={"title": "The Shawshank Redemption", "year": 1994},
            vectors={
                "plot_embedding": {"dense": {"values": [0.47, 0.09, 0.87]}},
            },
        ),
    ),
    vectorsearch_v1.CreateDataObjectRequest(
        parent=parent,
        data_object_id="movie-2",
        data_object=vectorsearch_v1.DataObject(
            data={"title": "The Godfather", "year": 1972},
            vectors={
                "plot_embedding": {"dense": {"values": [0.12, 0.55, 0.31]}},
            },
        ),
    ),
]

request = vectorsearch_v1.BatchCreateDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
response = data_object_service_client.batch_create_data_objects(request=request)

# Handle the response
for data_object in response.data_objects:
    print(data_object.name)

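Get a data object

The following example shows how to get the data object with ID DATA_OBJECT_ID from the collection with ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

GET https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Run the following command: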

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID"

PowerShell (Windows)

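Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Run the following command: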

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "createTime": "2026-01-31T20:05:06Z",
  "updateTime": "2026-01-31T20:05:06Z",
  "data": {
    "title": "The Shawshank Redemption",
    "director": "Frank Darabont",
    "year": 1994,
    "genre": "Drama"
  },
  "vectors": {
    "sparse_embedding": {
      "sparse": {
        "values": [
          1,
          6,
          3,
          2,
          8,
          5,
          2
        ],
        "indices": [
          4065,
          13326,
          17377,
          25918,
          28105,
          32683,
          42998
        ]
      }
    },
    "genre_embedding": {
      "dense": {
        "values": [
          0.3863801,
          0.73934346,
          0.16189057,
          0.5271367
        ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [
          0.47520825,
          0.090267465,
          0.8752308
        ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [
          0.5920452,
          0.08301644,
          0.12647335,
          0.619643,
          0.49258286
        ]
      }
    }
  }
}

gcloud

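Before using any of the command data below, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell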

gcloud vector-search collections data-objects describe DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects describe DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects describe DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

name: projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
data:
  director: Frank Darabont
  genre: Drama
  title: The Shawshank Redemption
  year: 1994
vectors:
  genre_embedding:
    dense:
      values:
      - 0.3863801
      - 0.73934346
      - 0.16189057
      - 0.5271367
  plot_embedding:
    dense:
      values:
      - 0.47520825
      - 0.090267465
      - 0.8752308
  soundtrack_embedding:
    dense:
      values:
      - 0.5920452
      - 0.08301644
      - 0.12647335
      - 0.619643
      - 0.49258286
  sparse_embedding:
    sparse:
      indices:
      - 4065
      - 13326
      - 17377
      - 25918
      - 28105
      - 32683
      - 42998
      values:
      - 1.0
      - 6.0
      - 3.0
      - 2.0
      - 8.0
      - 5.0
      - 2.0

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
request = vectorsearch_v1.GetDataObjectRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)

# Make the request
response = data_object_service_client.get_data_object(request=request)

# Handle the response
print(response)

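Update a data object

The following example shows how to update the title data field and the plot_embedding vector values of the data object with ID DATA_OBJECT_ID in the collection with ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

PATCH https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

JSON request body: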

{
  "data": {
    "title": "The Shawshank Redemption (updated)"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [
          1.0,
          1.0,
          1.0
        ]
      }
    }
  }
}

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X PATCH \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID"

PowerShell (Windows)

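Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: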

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method PATCH `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "data": {
    "title": "The Shawshank Redemption (updated)"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [
          1,
          1,
          1
        ]
      }
    }
  }
}

gcloud

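Before using any of the command data below, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell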

gcloud vector-search collections data-objects update DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --data='{"title": "The Shawshank Redemption (updated)"}' \
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

Windows (PowerShell)

gcloud vector-search collections data-objects update DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --data='{"title": "The Shawshank Redemption (updated)"}' `
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

Windows (cmd.exe)

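Note: If this command uses ' quotes to quote content, replace those single quotes with double quotes. If the quoted content is nested, escape the inner quotes with \".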

gcloud vector-search collections data-objects update DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --data='{"title": "The Shawshank Redemption (updated)"}' ^
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

You should receive a response similar to the following:

Updated dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
data_object = vectorsearch_v1.DataObject(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
    data={"title": "The Shawshank Redemption (updated)"},
    vectors={
        "plot_embedding": {
            "dense": {"values": [1., 1., 1.]}
        },
    },
)
request = vectorsearch_v1.UpdateDataObjectRequest(
    data_object=data_object,
)

# Make the request
response = data_object_service_client.update_data_object(request=request)

# Handle the response
print(response)

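Batch update data objects

To update multiple data objects at once, use batchUpdate. A single batch can update up to 1000 data objects. Each per-record request specifies a dataObject (which must include the full resource name and the fields to change) and an updateMask that lists the fields to overwrite. Fields that aren't listed in the mask are left unchanged.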
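
The request shape described here can be assembled programmatically before it is sent. The sketch below builds a JSON-style body for changes to data fields only. It assumes that each mask path has the form data.<key> and that multiple paths are joined with commas, as in the standard JSON encoding of a field mask; the helper name build_batch_update_body is hypothetical and not part of any client library.

```python
MAX_BATCH_SIZE = 1000  # per-batch limit stated above


def build_batch_update_body(parent, changes):
    """Build a batchUpdate JSON body from {data_object_id: {field: new_value}}."""
    if len(changes) > MAX_BATCH_SIZE:
        raise ValueError(f"a batch can update at most {MAX_BATCH_SIZE} data objects")
    requests = []
    for object_id, fields in changes.items():
        requests.append({
            "dataObject": {
                # The full resource name is required for each entry.
                "name": f"{parent}/dataObjects/{object_id}",
                "data": dict(fields),
            },
            # Only the listed paths are overwritten; other fields stay unchanged.
            "updateMask": ",".join(f"data.{key}" for key in fields),
        })
    return {"requests": requests}
```

Deriving the mask from the keys being changed keeps the two in sync, so a field cannot be sent without also being listed in the mask.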

REST

Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchUpdate

JSON request body:

{
  "requests": [
    {
      "dataObject": {
        "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
        "data": { "genre": "Thriller" }
      },
      "updateMask": "data.genre"
    },
    {
      "dataObject": {
        "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.21, 0.34, 0.55] }
          }
        }
      },
      "updateMask": "vectors.plot_embedding"
    }
  ]
}

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchUpdate"

PowerShell (Windows)

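Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: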

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchUpdate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{}

gcloud

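Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell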

gcloud vector-search collections data-objects batch-update \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-update `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Windows (cmd.exe)

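Note: If this command uses ' quotes to quote content, replace those single quotes with double quotes. If the quoted content is nested, escape the inner quotes with \".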

gcloud vector-search collections data-objects batch-update ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Python

from google.cloud import vectorsearch_v1
from google.protobuf import field_mask_pb2

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Each entry specifies the DataObject to update (with its full resource
# name) and an update_mask listing the fields to overwrite. Fields not
# listed in the mask are left unchanged.
requests = [
    vectorsearch_v1.UpdateDataObjectRequest(
        data_object=vectorsearch_v1.DataObject(
            name=f"{parent}/dataObjects/movie-1",
            data={"genre": "Thriller"},
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["data.genre"]),
    ),
    vectorsearch_v1.UpdateDataObjectRequest(
        data_object=vectorsearch_v1.DataObject(
            name=f"{parent}/dataObjects/movie-2",
            vectors={
                "plot_embedding": {"dense": {"values": [0.21, 0.34, 0.55]}},
            },
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["vectors.plot_embedding"]),
    ),
]

request = vectorsearch_v1.BatchUpdateDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
data_object_service_client.batch_update_data_objects(request=request)

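Import data objects

The following example shows how to import data objects from Cloud Storage into the collection with ID COLLECTION_ID. Use import for large datasets; to bulk ingest smaller datasets (up to 1000 records), consider batch creating data objects instead.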

REST

Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects

JSON request body:

{
  "gcsImport": {
    "contentsUri": "gs://your-bucket/path/to/your-data.json",
    "errorUri": "gs://your-bucket/path/to/import-errors/",
    "outputUri": "gs://your-bucket/path/to/import-output/"
  }
}

To send your request, choose one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects"

PowerShell (Windows)

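Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: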

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ImportDataObjectsMetadata",
    "createTime": "2026-02-02T13:30:43.874527852Z"
  },
  "done": false
}

gcloud

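Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell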

gcloud vector-search collections import-data-objects COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" \
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" \
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" \
  --async

Windows (PowerShell)

gcloud vector-search collections import-data-objects COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" `
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" `
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" `
  --async

Windows (cmd.exe)

gcloud vector-search collections import-data-objects COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" ^
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" ^
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" ^
  --async

Python

from google.cloud import vectorsearch_v1

# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()

# Initialize request
request = vectorsearch_v1.ImportDataObjectsRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    gcs_import={
      "contents_uri": "gs://your-bucket/path/to/your-data/",
      "error_uri": "gs://your-bucket/path/to/import-errors/",
    },
)

# Make the request
operation = vector_search_service_client.import_data_objects(request=request)

# Wait for the result (note this may take up to several minutes)
operation.result()

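The folder gs://your-bucket/path/to/your-data/ can contain one or more files, each holding multiple data objects. Use this layout when a large dataset is split across several files. Agent Retrieval supports the following file formats:

JSONL: Each line is one JSON object with three top-level properties: id, data, and vectors. Use this format for new Agent Retrieval datasets that you want to inspect and edit by hand.
AVRO: A compact, schema-validated binary format. Use this format for new Agent Retrieval datasets, typically large ones produced by data pipeline tools such as Dataflow, Beam, or Spark.
Vector Search JSON: Use this format only when you are migrating an existing Vector Search (Vector Search 1.0) JSON dataset and want to reuse it as is.
Vector Search AVRO: Use this format only when you are migrating an existing Vector Search (Vector Search 1.0) AVRO dataset and want to reuse it as is.

The following JSONL example includes the required properties. The record is spread over several lines for readability; in a JSONL file, each record occupies a single line.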

{
  "id": "movie-789",
  "data": {
    "title":"The Shawshank Redemption",
    "plot": "...",
    "year":1994,
    "avg_rating": 8.5,
    "movie_runtime_info": {
        "hours": 2,
        "minutes": 5
    }
  },
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11, ...],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27, ...],
      "indices": [23, 83, 131, ...]
    }
  }
}
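
A minimal sketch of producing such a file with the Python standard library follows. The helper name to_jsonl is hypothetical, and the vector values are shortened placeholders. The checks cover only the id rules from the "Identity validation" section and the rule that data, when present, must be a JSON object; they do not replace the full validation pipeline.

```python
import json
import re

# ID pattern quoted in the "Identity validation" section of this page.
ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    lines = []
    for position, record in enumerate(records):
        record_id = record.get("id")
        if not isinstance(record_id, str) or not ID_PATTERN.fullmatch(record_id):
            raise ValueError(f"record {position} has a missing or invalid id: {record_id!r}")
        if "data" in record and not isinstance(record["data"], dict):
            raise ValueError(f"record {position}: 'data' must be a JSON object")
        # Without `indent`, json.dumps emits no raw newlines, so each
        # record stays on exactly one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


records = [
    {
        "id": "movie-789",
        "data": {"title": "The Shawshank Redemption", "year": 1994},
        "vectors": {
            "title_embedding": [-0.23, 0.88, 0.11],
            "sparse_embedding": {
                "values": [0.01, -0.93, 0.27],
                "indices": [23, 83, 131],
            },
        },
    }
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Writing jsonl_text to a file and uploading it to the Cloud Storage path used as contentsUri is left out of the sketch.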

AVRO

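For AVRO files, every record must conform to the DataObject Avro schema shown below. The fields mirror the JSONL format:

id (required string).
vectors (map, defaults to {}). Each entry is keyed by the vector name; its value is either an array of float (dense vector) or a SparseVector record with values (an array of float) and indices (an array of long).
data (nullable map, defaults to null). Keys are data field names. Each value is a DataValue record whose value field is a union of the supported primitive types (boolean, int, long, float, double, string) plus array of DataValue and map of string to DataValue for nested structures.
etag (nullable string, defaults to null).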

{
  "namespace": "com.google.cloud.ai.vectorsearch",
  "type": "record",
  "name": "DataObject",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "vectors",
      "type": {
        "type": "map",
        "values": [
          {
            "type": "array",
            "items": "float"
          },
          {
            "type": "record",
            "name": "SparseVector",
            "fields": [
              {
                "name": "values",
                "type": { "type": "array", "items": "float" }
              },
              {
                "name": "indices",
                "type": { "type": "array", "items": "long" }
              }
            ]
          }
        ]
      },
      "default": {}
    },
    {
      "name": "data",
      "type": [
        "null",
        {
          "type": "map",
          "values": {
            "type": "record",
            "name": "DataValue",
            "fields": [
              {
                "name": "value",
                "type": [
                  "boolean",
                  "int",
                  "long",
                  "float",
                  "double",
                  "string",
                  {
                    "type": "array",
                    "items": "DataValue"
                  },
                  {
                    "type": "map",
                    "values": "DataValue"
                  }
                ]
              }
            ]
          }
        }
      ],
      "default": null
    },
    {
      "name": "etag",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

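The following snippet shows the conceptual contents of a single AVRO record that matches the preceding JSONL example. Under this schema, every entry in data is wrapped in a DataValue record with a single value field; this is how AVRO represents heterogeneous types in data: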

{
  "id": "movie-789",
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27],
      "indices": [23, 83, 131]
    }
  },
  "data": {
    "title": { "value": "The Shawshank Redemption" },
    "plot": { "value": "..." },
    "year": { "value": 1994 },
    "avg_rating": { "value": 8.5 },
    "movie_runtime_info": {
      "value": {
        "hours":   { "value": 2 },
        "minutes": { "value": 5 }
      }
    }
  }
}
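
The wrapping shown above can be derived mechanically from a JSONL-style data object. The following is a minimal sketch with hypothetical helper names (wrap_data_value, wrap_data). It produces only the conceptual dictionary layout; encoding that layout into an actual AVRO file would still require an Avro library, which is not shown here.

```python
def wrap_data_value(value):
    """Wrap a plain Python value in the DataValue shape: {"value": ...}."""
    if value is None:
        # The DataValue union in the schema has no "null" branch.
        raise ValueError("null is not a supported DataValue type")
    if isinstance(value, dict):
        # Nested object: map of string to DataValue.
        return {"value": {key: wrap_data_value(item) for key, item in value.items()}}
    if isinstance(value, list):
        # Array of DataValue.
        return {"value": [wrap_data_value(item) for item in value]}
    # Primitive: boolean, int/long, float/double, or string.
    return {"value": value}


def wrap_data(data: dict) -> dict:
    """Wrap every entry of a JSONL-style `data` object for the AVRO layout."""
    return {key: wrap_data_value(item) for key, item in data.items()}


jsonl_data = {
    "title": "The Shawshank Redemption",
    "plot": "...",
    "year": 1994,
    "avg_rating": 8.5,
    "movie_runtime_info": {"hours": 2, "minutes": 5},
}

avro_data = wrap_data(jsonl_data)
print(avro_data["movie_runtime_info"])
# {'value': {'hours': {'value': 2}, 'minutes': {'value': 5}}}
```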

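Export data objects

The following example shows how to export every data object in a collection to Cloud Storage in JSONL format. The destination bucket must be in the same region as the collection. Export is a long-running operation.

REST

Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL: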

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:exportDataObjects

Request JSON body:

{
  "gcsDestination": {
    "exportUri": "gs://your-bucket/path/to/export-dir/",
    "format": "JSONL"
  }
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:exportDataObjects"

PowerShell (Windows)

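Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: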

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:exportDataObjects" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ExportDataObjectsMetadata",
    "createTime": "2026-02-02T13:30:43.874527852Z"
  },
  "done": false
}

gcloud

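Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell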

gcloud vector-search collections export-data-objects COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" \
  --gcs-destination-format="jsonl" \
  --async

Windows (PowerShell)

gcloud vector-search collections export-data-objects COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" `
  --gcs-destination-format="jsonl" `
  --async

Windows (cmd.exe)

gcloud vector-search collections export-data-objects COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" ^
  --gcs-destination-format="jsonl" ^
  --async

Python

from google.cloud import vectorsearch_v1

# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()

# Initialize request
request = vectorsearch_v1.ExportDataObjectsRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    gcs_destination={
        "export_uri": "gs://your-bucket/path/to/export-dir/",
        "format": vectorsearch_v1.ExportDataObjectsRequest.GcsExportDestination.Format.JSONL,
    },
)

# Make the request
operation = vector_search_service_client.export_data_objects(request=request)

# Wait for the result (note this may take up to several minutes)
operation.result()

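Delete data objects

The following example shows how to delete a single data object, DATA_OBJECT_ID, from the collection with the ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL: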

DELETE https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X DELETE \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID"

PowerShell (Windows)

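Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: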

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method DELETE `
    -Headers $headers `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{}

gcloud

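Before using any of the command data below, make the following replacements:

DATA_OBJECT_ID: The ID of the data object.
COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell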

gcloud vector-search collections data-objects delete DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects delete DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects delete DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

Deleted dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
request = vectorsearch_v1.DeleteDataObjectRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)

# Make the request
data_object_service_client.delete_data_object(request=request)

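Batch delete data objects

To delete several data objects at once, use batchDelete and provide a list of fully qualified data object resource names. A single batch can delete up to 1,000 data objects.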
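
When more than 1,000 data objects need to be deleted, the resource names have to be split across several batchDelete calls. The following is a minimal sketch that builds one request body per batch, using the body shape from the REST example in this section. The helper name build_batch_delete_bodies and the generated movie-N IDs are hypothetical, and sending each body is left out.

```python
MAX_BATCH_SIZE = 1000  # Per-batch limit stated in this section.


def build_batch_delete_bodies(parent: str, data_object_ids: list[str]) -> list[dict]:
    """Split IDs into batchDelete request bodies of at most 1,000 entries each."""
    bodies = []
    for start in range(0, len(data_object_ids), MAX_BATCH_SIZE):
        chunk = data_object_ids[start:start + MAX_BATCH_SIZE]
        bodies.append(
            {
                "requests": [
                    # Each entry carries the fully qualified resource name.
                    {"name": f"{parent}/dataObjects/{object_id}"}
                    for object_id in chunk
                ]
            }
        )
    return bodies


parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
ids = [f"movie-{n}" for n in range(1, 2502)]  # 2,501 hypothetical IDs

bodies = build_batch_delete_bodies(parent, ids)
print([len(body["requests"]) for body in bodies])  # [1000, 1000, 501]
```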

REST

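Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL: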

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchDelete

Request JSON body:

{
  "requests": [
    { "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1" },
    { "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2" }
  ]
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchDelete"

PowerShell (Windows)

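Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: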

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchDelete" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{}

gcloud

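Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell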

gcloud vector-search collections data-objects batch-delete \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-delete `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Windows (cmd.exe)

Note: If this command quotes content with single quotes ('), replace them with double quotes. If the quoted content is nested, escape the inner quotes with \".

gcloud vector-search collections data-objects batch-delete ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

requests = [
    vectorsearch_v1.DeleteDataObjectRequest(
        name=f"{parent}/dataObjects/movie-1",
    ),
    vectorsearch_v1.DeleteDataObjectRequest(
        name=f"{parent}/dataObjects/movie-2",
    ),
]

request = vectorsearch_v1.BatchDeleteDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
data_object_service_client.batch_delete_data_objects(request=request)

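Count data objects

To count the data objects in a collection, use the aggregate operation with the COUNT aggregation method. The same call accepts an optional JSON filter expression, so you can count only the data objects that match a predicate (for example, genre == "sci-fi").

To count every data object in the collection, omit the filter.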
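
The two cases differ only in whether the request body carries a filter key. The following is a minimal sketch that builds the body for either case, matching the body shape used in the REST example in this section. The helper name build_count_body is hypothetical.

```python
import json
from typing import Optional


def build_count_body(json_filter: Optional[dict] = None) -> dict:
    """Build an aggregate request body; leave out `filter` to count everything."""
    body = {"aggregate": "COUNT"}
    if json_filter is not None:
        body["filter"] = json_filter
    return body


# Count only the data objects whose genre equals "sci-fi".
filtered_body = build_count_body({"genre": {"$eq": "sci-fi"}})
# Count every data object in the collection.
unfiltered_body = build_count_body()

print(json.dumps(filtered_body))
print(json.dumps(unfiltered_body))
```

Either dictionary can be saved as request.json for the REST commands in this section.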

REST

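Before using any of the request data, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL: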

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate

Request JSON body:

{
  "aggregate": "COUNT",
  "filter": { "genre": { "$eq": "sci-fi" } }
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate"

PowerShell (Windows)

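Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: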

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "aggregateResults": [
    { "count": "42" }
  ]
}
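
In the sample response, count arrives as a JSON string ("42") rather than a number, so client code that needs arithmetic has to convert it. The following is a minimal sketch for parsing a REST response of the shape shown above. The helper name extract_count is hypothetical, and the sketch assumes that the count is in the first entry of aggregateResults.

```python
import json


def extract_count(response_text: str) -> int:
    """Return the count from an aggregate response as a Python int."""
    response = json.loads(response_text)
    results = response.get("aggregateResults", [])
    if not results:
        raise ValueError("response contains no aggregateResults")
    # The sample response encodes the count as a string, so convert it.
    return int(results[0]["count"])


sample_response = '{"aggregateResults": [{"count": "42"}]}'
print(extract_count(sample_response))  # 42
```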

gcloud

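Before using any of the command data below, make the following replacements:

COLLECTION_ID: The ID of the collection.
LOCATION: The region where you are using Agent Platform.
PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell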

gcloud vector-search collections data-objects aggregate \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --aggregation-method=count \
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Windows (PowerShell)

gcloud vector-search collections data-objects aggregate `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --aggregation-method=count `
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Windows (cmd.exe)

Note: If this command quotes content with single quotes ('), replace them with double quotes. If the quoted content is nested, escape the inner quotes with \".

gcloud vector-search collections data-objects aggregate ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --aggregation-method=count ^
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Python

from google.cloud import vectorsearch_v1
from google.protobuf import struct_pb2
from google.protobuf import json_format

# Create the client
search_client = vectorsearch_v1.DataObjectSearchServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Optional: build a JSON filter. Omit `filter=` to count everything.
filter_struct = json_format.ParseDict(
    {"genre": {"$eq": "sci-fi"}}, struct_pb2.Struct()
)

request = vectorsearch_v1.AggregateDataObjectsRequest(
    parent=parent,
    aggregate=vectorsearch_v1.AggregationMethod.COUNT,
    filter=filter_struct,
)

# Make the request
response = search_client.aggregate_data_objects(request=request)

# The count value is returned in aggregate_results[0].
for result in response.aggregate_results:
    print(result)

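What's next

Learn how to use ETags to control concurrent operations on data objects.
Learn about collection indexes.
Learn how to query and search data objects.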
