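Data objects

In Agent Retrieval (formerly Vector Search 2.0), a collection stores data as individual JSON objects called data objects. This page describes the validation rules that data objects must satisfy, and how to create, read, update, import, export, and delete data objects individually or in batches.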

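Data validation

Every data object that Agent Retrieval ingests is checked against a fixed set of rules. If a record violates any rule, the system rejects the record and sends it to the error sink with code = INVALID_ARGUMENT; no further checks run on that record. To avoid a loop of fixing one error, re-ingesting, and hitting the next error, validate your dataset against all of the data validation rules before you start an ingestion or index build job.

The pipeline applies validation in the following order: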

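Stage | What is checked
1. Parse | The JSON is well formed; the required top-level fields are present; field types are correct
2. ID validation | The id field is present and matches the ID format for records
3. Data field schema | The data payload conforms to the JSON schema declared in the collection's CollectionConfig
4. Embedding validation | Every dense or sparse vector conforms to the collection's vector schema (name, type, dimension, finite values, sparse parity, and uniqueness)
5. Searchable field population | Fields marked as searchable in the schema can be extracted from data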

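1. Parse validation

The system runs parse validation first, and then converts each input line into an internal data object. These rules apply to both supported JSON formats: the default format (with top-level vectors/data objects) and the v1 format (with embedding, sparse_embedding, restricts, and numeric_restricts). The format of each record is detected automatically.

  • The JSON must be parseable. Every line must parse as a JSON object. If the top-level keys of a line match neither the default nor the v1 format, the line is rejected with Unknown JSON format for string: <line>
  • id is required. Every record must contain a non-null id. Otherwise: 'id' field is missing or null
  • An embedding must be present (v1 format only). A v1 record must contain at least one of embedding or sparse_embedding. Otherwise: 'embedding' or 'sparse_embedding' fields are missing
  • Dense embedding type check. A dense embedding field (embedding in v1, or any array value under vectors in the default format) must be a JSON array of numbers. Values that cannot be coerced to float are rejected with '<field>' field contains non-float values
  • Sparse embedding structure check. For every sparse vector:
    • It must be a JSON object.
    • It must contain two arrays: values (floats) and indices (longs); in the v1 format, these are values and dimensions.
    • values must not be empty.
    • Indices must not be negative.
    • values.length must equal indices.length (or dimensions.length in v1).
  • data field type. If present, data must be a JSON object, not an array, string, or scalar. Otherwise: 'data' field is not a JSON object
  • numeric_restricts shape (v1 format). numeric_restricts must be a JSON array of objects. Every entry must have a string namespace and exactly one of value_int, value_float, or value_double set.

Most first-failure problems occur at this stage. Common mistakes include stringified numbers in an embedding array, a missing id, and values/indices arrays of different lengths.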
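As an illustration only, the following sketch shows how a few of the parse-stage rules could be checked on the client before ingestion. The function names and the key-based format detection are assumptions made for this example; the service's actual detection logic is not documented here, and only the quoted error strings come from the rules above.

```python
import json

# Top-level keys named in this section; used here as a guessed detection heuristic.
DEFAULT_KEYS = {"vectors", "data"}
V1_KEYS = {"embedding", "sparse_embedding", "restricts", "numeric_restricts"}


def detect_format(record):
    """Return "default", "v1", or None (hypothetical helper, not a service API)."""
    keys = set(record)
    if keys & DEFAULT_KEYS:
        return "default"
    if keys & V1_KEYS:
        return "v1"
    return None


def parse_errors(line):
    """Collect parse-stage problems for one input line instead of stopping at the first."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["line is not parseable JSON"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    errors = []
    if detect_format(record) is None:
        errors.append("unknown JSON format")
    if record.get("id") is None or record.get("id") == "":
        errors.append("'id' field is missing or null")
    # Assumption: a null data value is treated the same as an absent one.
    data = record.get("data")
    if data is not None and not isinstance(data, dict):
        errors.append("'data' field is not a JSON object")
    return errors


print(parse_errors('{"vectors": {}, "data": []}'))
```

Unlike the pipeline, which rejects a record at the first failed rule, this helper returns every problem it finds for a line.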

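2. ID validation

Data object IDs must comply with RFC 1035. In practice, this means:

  • The length is between 1 and 63 characters.
  • Only lowercase letters, digits, and hyphens (-) are allowed. Uppercase letters, underscores, spaces, symbols, and Unicode are not allowed.
  • The ID must start with a lowercase letter (a-z).
  • The ID must end with a lowercase letter or a digit (no trailing -).

Regular expression: [a-z]([-a-z0-9]{0,61}[a-z0-9])?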

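The following table lists common examples:

ID | Valid? | Reason
doc-123 | Yes | Starts with a letter, uses only lowercase letters, digits, and hyphens, and ends with a digit
a | Yes | A single lowercase letter is the minimum
product-sku-42 | Yes | Satisfies all rules
Doc-123 | No | The uppercase letter D is not allowed
123-doc | No | Must start with a letter, not a digit
doc_123 | No | Underscores are not allowed
doc-123- | No | Must not end with a hyphen
my doc | No | Spaces are not allowed
64 or more characters | No | The maximum length is 63 characters

This shape is the DNS label rule (the part between the dots in a host name such as my-service.example.com). The rule is reused here so that IDs round-trip safely through URLs, file names, logs, and the CLI without escaping.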
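The regular expression above can be applied directly on the client. The following is a minimal sketch; the helper name is_valid_id is made up for this example. It uses fullmatch so that the whole string, not just a prefix, has to satisfy the pattern.

```python
import re

# The ID pattern quoted in this section.
ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def is_valid_id(candidate: str) -> bool:
    """Return True if the whole string matches the data object ID pattern."""
    return ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["doc-123", "a", "Doc-123", "123-doc", "doc_123", "doc-123-", "my doc"]:
    print(f"{candidate!r}: {is_valid_id(candidate)}")
```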

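3. Data field validation (JSON schema)

The data field validation stage runs only when a dataSchema is declared in the collection's CollectionConfig. If no schema is set, this stage is skipped.

  • Schema conformance. The data payload of the data object is serialized to JSON and validated against the configured JSON schema (draft 7). The validator reports one error per schema violation, so a record with three invalid fields produces three error messages.
    • Message: DataObject with id <id> failed schema validation: <error>
  • Schema processing errors. If the schema validator itself throws (for example, on an unsupported draft feature), the record is rejected with DataObject with id <id> failed schema validation processing: <exception>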

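4. Embedding field validation

The embedding field validation stage iterates over dense vectors first, and then over sparse vectors. Both lists share a single set of already-seen vector names, so a name cannot be reused, even across the dense/sparse boundary.

Shared rules (apply to both dense and sparse)

Rule | Why it matters | Error message
Vector names must be unique across the dense and sparse vectors of a data object | Two entries with the same vector name target the same storage key, which results in undefined last-writer-wins behavior | ... has duplicate embedding field '<name>' across its dense/sparse vectors; each vector name must appear at most once
The vector name must be declared in the CollectionConfig vector schema | An unknown vector name cannot be routed to a column | ... has dense/sparse embedding field '<name>' but this field is not defined in CollectionConfig vector schema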
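The shared rules can be sketched as a single pass over both name lists with one shared "seen" set, mirroring the dense-then-sparse iteration order described above. Everything in this example is hypothetical: check_vector_names is not a library function, and declared_names stands in for the vector names declared in your CollectionConfig.

```python
def check_vector_names(dense_names, sparse_names, declared_names):
    """Apply the two shared rules: unique names, and names declared in the schema."""
    seen = set()  # shared across dense and sparse, as described in this section
    errors = []
    for kind, names in (("dense", dense_names), ("sparse", sparse_names)):
        for name in names:
            if name in seen:
                errors.append(f"duplicate embedding field '{name}'")
                continue
            seen.add(name)
            if name not in declared_names:
                errors.append(
                    f"{kind} embedding field '{name}' is not defined in the vector schema"
                )
    return errors


print(check_vector_names(
    dense_names=["plot_embedding", "genre_embedding"],
    sparse_names=["plot_embedding", "keyword_embedding"],
    declared_names={"plot_embedding", "genre_embedding"},
))
```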

Dense-only rules

Rule | Error message
The field must be configured as dense in the collection schema | ... has dense embedding field '<name>' but CollectionConfig defines it as non-dense
The dimension must match the configured dimension | ... field '<name>': expected dense embedding dimension <expected>, but got <actual>
All values must be finite: no NaN, +Infinity, or -Infinity. Non-finite values corrupt distance calculations. | ... field '<name>': dense embedding contains non-finite value <v> at index <i> (NaN/Infinity values are not allowed)
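As a rough client-side counterpart to the dimension and finite-value rules, the following sketch checks one dense vector. The function name and the expected_dim parameter are assumptions for this example; the expected dimension would come from your own collection configuration.

```python
import math


def check_dense(name, values, expected_dim):
    """Return dimension and finite-value problems for one dense vector."""
    errors = []
    if len(values) != expected_dim:
        errors.append(
            f"field '{name}': expected dense embedding dimension "
            f"{expected_dim}, but got {len(values)}"
        )
    for i, v in enumerate(values):
        # math.isfinite is False for NaN, +Infinity, and -Infinity.
        if not math.isfinite(v):
            errors.append(
                f"field '{name}': dense embedding contains non-finite value {v} at index {i}"
            )
    return errors


print(check_dense("plot_embedding", [0.1, float("nan")], expected_dim=3))
```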

Sparse-only rules

Rule | Error message
The field must be configured as sparse in the collection schema | ... has sparse embedding field '<name>' but CollectionConfig defines it as non-sparse
Index/value length parity: indicesCount == valuesCount | ... field '<name>': sparse embedding has <n> indices but <m> values; indices and values must have the same length
Non-negative indices: every index >= 0 | ... field '<name>': sparse embedding contains negative index <i> at position <p> (indices must be non-negative)
Indices must be unique within a sparse vector | ... field '<name>': sparse embedding contains duplicate index <i> (each index must appear at most once)
All values must be finite | ... field '<name>': sparse embedding contains non-finite value <v> at position <p> (NaN/Infinity values are not allowed)
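The sparse-only rules (other than the schema type check) can be sketched the same way. This is an illustrative helper, not part of any client library; the name check_sparse and the shortened message wording are made up for the example.

```python
import math


def check_sparse(name, indices, values):
    """Return length-parity, index, and finite-value problems for one sparse vector."""
    errors = []
    if len(indices) != len(values):
        errors.append(
            f"field '{name}': sparse embedding has {len(indices)} indices "
            f"but {len(values)} values"
        )
    seen = set()
    for position, index in enumerate(indices):
        if index < 0:
            errors.append(f"field '{name}': negative index {index} at position {position}")
        if index in seen:
            errors.append(f"field '{name}': duplicate index {index}")
        seen.add(index)
    for position, value in enumerate(values):
        if not math.isfinite(value):
            errors.append(f"field '{name}': non-finite value {value} at position {position}")
    return errors


print(check_sparse("sparse_embedding", indices=[10, 10, -1], values=[1.0, 2.0]))
```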

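5. Searchable field population

After embedding validation succeeds, the pipeline walks the data payload using the collection's dataSchema and copies the fields that the schema declares (strings, integers/numbers, booleans, string arrays, and nested objects) into the searchable field index. Two failure modes at this stage also reject the record:

  • A path is expected to be a struct but contains a scalar (for example, the schema says author.name is a string, but author itself is a string in the document).
  • A string array field contains a non-string element.

Either failure usually means that the shape of the document has drifted from the declared schema, and it is not necessarily caught at the JSON schema stage.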
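The two failure modes above can be approximated on the client by walking the data payload alongside the schema. The sketch below assumes that the schema is a plain JSON-schema-style dict with "type", "properties", and "items" keys; it is not the service's implementation, and population_errors is a hypothetical name.

```python
def population_errors(data, schema, path=""):
    """Walk data alongside a JSON-schema-style dict and report shape drift."""
    errors = []
    for field, spec in schema.get("properties", {}).items():
        if field not in data:
            continue
        value = data[field]
        here = f"{path}.{field}" if path else field
        if spec.get("type") == "object":
            if not isinstance(value, dict):
                # Failure mode 1: the schema expects a struct at this path.
                errors.append(f"{here}: expected an object")
            else:
                errors.extend(population_errors(value, spec, here))
        elif spec.get("type") == "array" and spec.get("items", {}).get("type") == "string":
            # Failure mode 2: a string array holds a non-string element.
            if isinstance(value, list) and not all(isinstance(x, str) for x in value):
                errors.append(f"{here}: string array contains a non-string element")
    return errors


schema = {
    "properties": {
        "author": {"type": "object", "properties": {"name": {"type": "string"}}},
        "tags": {"type": "array", "items": {"type": "string"}},
    }
}
print(population_errors({"author": "Jane Doe", "tags": ["drama", 1994]}, schema))
```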

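Preparation checklist

Before you start ingestion or an index build, validate your entire dataset against the following rules. These are the same checks that the pipeline applies, listed in order, so a single client-side pass surfaces every issue:

  1. Format: every line parses as JSON and matches the default or v1 shape.
  2. ID: matches the RFC 1035 regular expression [a-z]([-a-z0-9]{0,61}[a-z0-9])? and is unique within the dataset.
  3. Embedding presence: every record has at least one vector; vector names are listed in the CollectionConfig vector schema with the correct dense or sparse type.
  4. Dense vectors: the dimension is correct, and there are no NaN, +Inf, or -Inf values.
  5. Sparse vectors: values.length == indices.length; every index is >= 0 and unique; no non-finite values.
  6. Vector names: no name is repeated across the dense and sparse vectors of a record.
  7. Data schema: if dataSchema is set, the data payload validates against it (draft 7), and the actual JSON type of every field matches the declared type (especially nested objects and string arrays).
  8. v1 numeric_restricts: every entry has a string namespace and exactly one of value_int / value_float / value_double.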
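Item 2 of the checklist includes a dataset-wide condition (ID uniqueness) that cannot be checked one record at a time. The following sketch shows one way to run a single pass over a JSONL dataset that reports every offending line instead of stopping at the first. The function name audit_ids and the message wording are invented for this example.

```python
import json
import re

ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def audit_ids(lines):
    """Return {line_number: [problems]} covering format, ID shape, and ID uniqueness."""
    problems = {}
    first_seen = {}  # id -> line number where it first appeared
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems[number] = ["not parseable JSON"]
            continue
        record_id = record.get("id") if isinstance(record, dict) else None
        if not isinstance(record_id, str) or not ID_PATTERN.fullmatch(record_id):
            problems[number] = ["id is missing or does not match the ID pattern"]
        elif record_id in first_seen:
            problems[number] = [f"duplicate id (first seen on line {first_seen[record_id]})"]
        else:
            first_seen[record_id] = number
    return problems


sample = ['{"id": "movie-1"}', '{"id": "movie-1"}', '{"id": "Movie_2"}', "oops"]
for number, issues in audit_ids(sample).items():
    print(number, issues)
```

The same loop is a natural place to call per-record checks for the remaining checklist items, so that one pass over the file yields a complete report.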

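Create a data object

The following example shows how to add a single data object to the collection with the ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects?dataObjectId=DATA_OBJECT_ID

Request JSON body: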

{
  "data": {
    "director": "Frank Darabont",
    "genre": "Drama",
    "title": "The Shawshank Redemption",
    "year": 1994
  },
  "vectors":{
    "genre_embedding": {
      "dense": {
        "values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ]
      }
    },
    "sparse_embedding": {
      "sparse": {
        "indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ],
        "values": [ 1, 6, 3, 2, 8, 5, 2 ]
      }
    }
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "data": {
    "director": "Frank Darabont",
    "title": "The Shawshank Redemption",
    "year": 1994,
    "genre": "Drama"
  },
  "vectors": {
    "genre_embedding": {
      "dense": {
        "values": [
          0.3863801,
          0.73934346,
          0.16189057,
          0.5271367
        ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [
          0.47520825,
          0.090267465,
          0.8752308
        ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [
          0.5920452,
          0.08301644,
          0.12647335,
          0.619643,
          0.49258286
        ]
      }
    },
    "sparse_embedding": {
      "sparse": {
        "values": [
          1,
          6,
          3,
          2,
          8,
          5,
          2
        ],
        "indices": [
          4065,
          13326,
          17377,
          25918,
          28105,
          32683,
          42998
        ]
      }
    }
  }
}

gcloud

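Before using any of the command data below, make the following replacements:

  • DATA_FILE: the local path to a JSON file that contains the data portion of the data object.

    Example file contents: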

    {
      "director": "Frank Darabont",
      "genre": "Drama",
      "title": "The Shawshank Redemption",
      "year": 1994
    }
  • VECTORS_FILE: the local path to a JSON file that contains the vectors portion of the data object.

    Example file contents:

    {
      "genre_embedding": {
        "dense": {
          "values": [ 0.38638010860523064, 0.739343471733759, 0.16189056837017107, 0.5271366865924485 ]
        }
      },
      "plot_embedding": {
        "dense": {
          "values": [ 0.4752082440607731, 0.09026746166854707, 0.8752307753619009 ]
        }
      },
      "soundtrack_embedding": {
        "dense": {
          "values": [ 0.5920451749052875, 0.08301644173787519, 0.1264733498775969, 0.6196429624200321, 0.4925828581737443 ]
        }
      },
      "sparse_embedding": {
        "sparse": {
          "indices": [ 4065, 13326, 17377, 25918, 28105, 32683, 42998 ],
          "values": [ 1, 6, 3, 2, 8, 5, 2 ]
        }
      }
    }
  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell

gcloud vector-search collections data-objects create DATA_OBJECT_ID \
  --data=DATA_FILE \
  --vectors=VECTORS_FILE \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects create DATA_OBJECT_ID `
  --data=DATA_FILE `
  --vectors=VECTORS_FILE `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects create DATA_OBJECT_ID ^
  --data=DATA_FILE ^
  --vectors=VECTORS_FILE ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

Created dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
data_object = vectorsearch_v1.DataObject(
    data={
        "title": "The Shawshank Redemption",
        "genre": "Drama",
        "year": 1994,
        "director": "Frank Darabont",
    },
    vectors={
        "plot_embedding": {
            "dense": {"values": [0.1, 0.2, 0.3]}
        },
        "genre_embedding": {
            "dense": {"values": [0.4, 0.5, 0.6, 0.7]}
        },
        "soundtrack_embedding": {
            "dense": {"values": [0.8, 0.9, 1.0, 1.1, 1.2]}
        },
        "sparse_embedding": {
            "sparse": {"values": [1.0, 2.0], "indices": [10, 20]}
        },
    },
)
request = vectorsearch_v1.CreateDataObjectRequest(
    parent="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    data_object_id="DATA_OBJECT_ID",
    data_object=data_object,
)

# Make the request
response = data_object_service_client.create_data_object(request=request)

# Handle the response
print(response)

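Fields that the collection schema designates for auto-embedding are populated automatically. You can also bring your own embeddings (BYOE) by setting values for vector fields that are not auto-populated.

Batch create data objects

To efficiently ingest a small number of records in bulk (up to 1000 data objects per request), use batchCreate. The entire batch is atomic: either all data objects are created, or the whole request fails. For larger datasets, we recommend importing data objects from Cloud Storage.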
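Because a single request accepts at most 1000 data objects, a longer list has to be split before it is sent. The helper below is a minimal, library-independent sketch of that splitting step; chunked is a hypothetical name, and each resulting chunk would become the requests list of one batch call. Since every batch is atomic, a failure affects only the chunk that contains the invalid object.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in list; in practice these would be per-object create requests.
pending = list(range(2500))
print([len(batch) for batch in chunked(pending)])
```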

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchCreate

Request JSON body:

{
  "requests": [
    {
      "parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId": "movie-1",
      "dataObject": {
        "data": {
          "title": "The Shawshank Redemption",
          "year": 1994
        },
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.47, 0.09, 0.87] }
          }
        }
      }
    },
    {
      "parent": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId": "movie-2",
      "dataObject": {
        "data": {
          "title": "The Godfather",
          "year": 1972
        },
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.12, 0.55, 0.31] }
          }
        }
      }
    }
  ]
}

You should receive a JSON response similar to the following:

{
  "dataObjects": [
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
      "data": {
        "title": "The Shawshank Redemption",
        "year": 1994
      },
      "vectors": {
        "plot_embedding": {
          "dense": { "values": [0.47, 0.09, 0.87] }
        }
      }
    },
    {
      "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
      "data": {
        "title": "The Godfather",
        "year": 1972
      },
      "vectors": {
        "plot_embedding": {
          "dense": { "values": [0.12, 0.55, 0.31] }
        }
      }
    }
  ]
}

gcloud

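Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell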

gcloud vector-search collections data-objects batch-create \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-create `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Windows (cmd.exe)

gcloud vector-search collections data-objects batch-create ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-1",
      "dataObject":{"data":{"title":"The Shawshank Redemption","year":1994},"vectors":{"plot_embedding":{"dense":{"values":[0.47,0.09,0.87]}}}}
    },
    {
      "parent":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
      "dataObjectId":"movie-2",
      "dataObject":{"data":{"title":"The Godfather","year":1972},"vectors":{"plot_embedding":{"dense":{"values":[0.12,0.55,0.31]}}}}
    }
  ]'

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Build per-DataObject create requests.
requests = [
    vectorsearch_v1.CreateDataObjectRequest(
        parent=parent,
        data_object_id="movie-1",
        data_object=vectorsearch_v1.DataObject(
            data={"title": "The Shawshank Redemption", "year": 1994},
            vectors={
                "plot_embedding": {"dense": {"values": [0.47, 0.09, 0.87]}},
            },
        ),
    ),
    vectorsearch_v1.CreateDataObjectRequest(
        parent=parent,
        data_object_id="movie-2",
        data_object=vectorsearch_v1.DataObject(
            data={"title": "The Godfather", "year": 1972},
            vectors={
                "plot_embedding": {"dense": {"values": [0.12, 0.55, 0.31]}},
            },
        ),
    ),
]

request = vectorsearch_v1.BatchCreateDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
response = data_object_service_client.batch_create_data_objects(request=request)

# Handle the response
for data_object in response.data_objects:
    print(data_object.name)

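Get a data object

The following example shows how to get the data object with the ID DATA_OBJECT_ID from the collection with the ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL: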

GET https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "createTime": "2026-01-31T20:05:06Z",
  "updateTime": "2026-01-31T20:05:06Z",
  "data": {
    "title": "The Shawshank Redemption",
    "director": "Frank Darabont",
    "year": 1994,
    "genre": "Drama"
  },
  "vectors": {
    "sparse_embedding": {
      "sparse": {
        "values": [
          1,
          6,
          3,
          2,
          8,
          5,
          2
        ],
        "indices": [
          4065,
          13326,
          17377,
          25918,
          28105,
          32683,
          42998
        ]
      }
    },
    "genre_embedding": {
      "dense": {
        "values": [
          0.3863801,
          0.73934346,
          0.16189057,
          0.5271367
        ]
      }
    },
    "plot_embedding": {
      "dense": {
        "values": [
          0.47520825,
          0.090267465,
          0.8752308
        ]
      }
    },
    "soundtrack_embedding": {
      "dense": {
        "values": [
          0.5920452,
          0.08301644,
          0.12647335,
          0.619643,
          0.49258286
        ]
      }
    }
  }
}

gcloud

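Before using any of the command data below, make the following replacements:

  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell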

gcloud vector-search collections data-objects describe DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects describe DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects describe DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

name: projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID
data:
  director: Frank Darabont
  genre: Drama
  title: The Shawshank Redemption
  year: 1994
vectors:
  genre_embedding:
    dense:
      values:
      - 0.3863801
      - 0.73934346
      - 0.16189057
      - 0.5271367
  plot_embedding:
    dense:
      values:
      - 0.47520825
      - 0.090267465
      - 0.8752308
  soundtrack_embedding:
    dense:
      values:
      - 0.5920452
      - 0.08301644
      - 0.12647335
      - 0.619643
      - 0.49258286
  sparse_embedding:
    sparse:
      indices:
      - 4065
      - 13326
      - 17377
      - 25918
      - 28105
      - 32683
      - 42998
      values:
      - 1.0
      - 6.0
      - 3.0
      - 2.0
      - 8.0
      - 5.0
      - 2.0

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
request = vectorsearch_v1.GetDataObjectRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)

# Make the request
response = data_object_service_client.get_data_object(request=request)

# Handle the response
print(response)

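Update a data object

The following example shows how to update the title data field and the plot_embedding vector values of the data object with the ID DATA_OBJECT_ID in the collection with the ID COLLECTION_ID.

REST

Before using any of the request data, make the following replacements:

  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL:

PATCH https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

Request JSON body: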

{
  "data": {
    "title": "The Shawshank Redemption (updated)"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [
          1.0,
          1.0,
          1.0
        ]
      }
    }
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
  "data": {
    "title": "The Shawshank Redemption (updated)"
  },
  "vectors": {
    "plot_embedding": {
      "dense": {
        "values": [
          1,
          1,
          1
        ]
      }
    }
  }
}

gcloud

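Before using any of the command data below, make the following replacements:

  • DATA_OBJECT_ID: the ID of the data object.
  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell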

gcloud vector-search collections data-objects update DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --data='{"title": "The Shawshank Redemption (updated)"}' \
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

Windows (PowerShell)

gcloud vector-search collections data-objects update DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --data='{"title": "The Shawshank Redemption (updated)"}' `
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

Windows (cmd.exe)

gcloud vector-search collections data-objects update DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --data='{"title": "The Shawshank Redemption (updated)"}' ^
  --update-vectors='{"plot_embedding": {"dense": {"values": [1.0, 1.0, 1.0]}}}'

You should receive a response similar to the following:

Updated dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
data_object = vectorsearch_v1.DataObject(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
    data={"title": "The Shawshank Redemption (updated)"},
    vectors={
        "plot_embedding": {
            "dense": {"values": [1., 1., 1.]}
        },
    },
)
request = vectorsearch_v1.UpdateDataObjectRequest(
    data_object=data_object,
)

# Make the request
response = data_object_service_client.update_data_object(request=request)

# Handle the response
print(response)

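Batch update data objects

To update multiple data objects at once, use batchUpdate. A single batch can update up to 1000 data objects. Each request entry specifies a dataObject (which must include the full resource name and the fields to change) and an updateMask that lists the fields to overwrite. Fields that are not listed in the mask are left unchanged.

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchUpdate

Request JSON body: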

{
  "requests": [
    {
      "dataObject": {
        "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1",
        "data": { "genre": "Thriller" }
      },
      "updateMask": "data.genre"
    },
    {
      "dataObject": {
        "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2",
        "vectors": {
          "plot_embedding": {
            "dense": { "values": [0.21, 0.34, 0.55] }
          }
        }
      },
      "updateMask": "vectors.plot_embedding"
    }
  ]
}

You should receive a JSON response similar to the following:

{}

gcloud

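Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell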

gcloud vector-search collections data-objects batch-update \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-update `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Windows (cmd.exe)

gcloud vector-search collections data-objects batch-update ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1","data":{"genre":"Thriller"}},
      "updateMask":"data.genre"
    },
    {
      "dataObject":{"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2","vectors":{"plot_embedding":{"dense":{"values":[0.21,0.34,0.55]}}}},
      "updateMask":"vectors.plot_embedding"
    }
  ]'

Python

from google.cloud import vectorsearch_v1
from google.protobuf import field_mask_pb2

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Each entry specifies the DataObject to update (with its full resource
# name) and an update_mask listing the fields to overwrite. Fields not
# listed in the mask are left unchanged.
requests = [
    vectorsearch_v1.UpdateDataObjectRequest(
        data_object=vectorsearch_v1.DataObject(
            name=f"{parent}/dataObjects/movie-1",
            data={"genre": "Thriller"},
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["data.genre"]),
    ),
    vectorsearch_v1.UpdateDataObjectRequest(
        data_object=vectorsearch_v1.DataObject(
            name=f"{parent}/dataObjects/movie-2",
            vectors={
                "plot_embedding": {"dense": {"values": [0.21, 0.34, 0.55]}},
            },
        ),
        update_mask=field_mask_pb2.FieldMask(paths=["vectors.plot_embedding"]),
    ),
]

request = vectorsearch_v1.BatchUpdateDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
data_object_service_client.batch_update_data_objects(request=request)

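Import data objects

The following example shows how to import data objects from Cloud Storage into the collection with the ID COLLECTION_ID. Use import for large datasets; to bulk-ingest smaller datasets (up to 1000 records), consider batch creating data objects.

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects

Request JSON body: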

{
  "gcsImport": {
    "contentsUri": "gs://your-bucket/path/to/your-data.json",
    "errorUri": "gs://your-bucket/path/to/import-errors/",
    "outputUri": "gs://your-bucket/path/to/import-output/"
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ImportDataObjectsMetadata",
    "createTime": "2026-02-02T13:30:43.874527852Z"
  },
  "done": false
}

gcloud

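Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: the ID of the collection.
  • LOCATION: the region where you use Agent Platform.
  • PROJECT_ID: your Google Cloud project ID.

Run the following command:

Linux, macOS, or Cloud Shell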

gcloud vector-search collections import-data-objects COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" \
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" \
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" \
  --async

Windows (PowerShell)

gcloud vector-search collections import-data-objects COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" `
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" `
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" `
  --async

Windows (cmd.exe)

gcloud vector-search collections import-data-objects COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --gcs-import-contents-uri="gs://your-bucket/path/to/your-data.json" ^
  --gcs-import-error-uri="gs://your-bucket/path/to/import-errors/" ^
  --gcs-import-output-uri="gs://your-bucket/path/to/import-output/" ^
  --async

Python

from google.cloud import vectorsearch_v1

# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()

# Initialize request
request = vectorsearch_v1.ImportDataObjectsRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    gcs_import={
      "contents_uri": "gs://your-bucket/path/to/your-data/",
      "error_uri": "gs://your-bucket/path/to/import-errors/",
    },
)

# Make the request
operation = vector_search_service_client.import_data_objects(request=request)

# Wait for the result (note this may take up to several minutes)
operation.result()

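The folder gs://your-bucket/path/to/your-data/ can contain one or more files, each of which contains multiple data objects. Use this structure when a large dataset is spread across multiple files. Agent Retrieval supports the following file formats:

  • JSONL: each line is a JSON object with three top-level properties: id, data, and vectors. Use this format for new Agent Retrieval datasets that you want to inspect and edit by hand.
  • AVRO: use this format for new Agent Retrieval datasets when you want a compact, schema-validated binary format, typically for large datasets produced by data pipeline tools such as Dataflow, Beam, or Spark.
  • Vector Search JSON: use this format only when you are migrating an existing Vector Search (Vector Search 1.0) JSON dataset and want to reuse it directly.
  • Vector Search AVRO: use this format only when you are migrating an existing Vector Search (Vector Search 1.0) AVRO dataset and want to reuse it directly.

The following JSONL example contains the required properties.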

{
  "id": "movie-789",
  "data": {
    "title":"The Shawshank Redemption",
    "plot": "...",
    "year":1994,
    "avg_rating": 8.5,
    "movie_runtime_info": {
        "hours": 2,
        "minutes": 5
    }
  },
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11, ...],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27, ...],
      "indices": [23, 83, 131, ...]
    }
  }
}
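The example above is spread over several lines for readability, but in a JSONL file each data object occupies exactly one line. The following sketch shows one way to serialize records into that form with the standard json module; to_jsonl is a hypothetical helper, and the sample record is shortened. Passing allow_nan=False makes serialization fail early on NaN or Infinity, which the embedding validation rules reject anyway.

```python
import json


def to_jsonl(records):
    """Serialize each record as one compact JSON object per line."""
    return "".join(
        json.dumps(record, separators=(",", ":"), allow_nan=False) + "\n"
        for record in records
    )


records = [
    {
        "id": "movie-789",
        "data": {"title": "The Shawshank Redemption", "year": 1994},
        "vectors": {"title_embedding": [-0.23, 0.88, 0.11]},
    }
]
print(to_jsonl(records), end="")
```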

AVRO

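For AVRO files, every record must conform to the DataObject Avro schema shown below. The fields mirror the JSONL format:

  • id (required string).
  • vectors (map, default {}). Each entry is keyed by vector name, and its value is either an array of float (dense vector) or a SparseVector record with values (array of float) and indices (array of long).
  • data (nullable map, default null). Keys are data field names. Each value is a DataValue record whose value field is a union of the supported primitive types (boolean, int, long, float, double, string) plus an array of DataValue and a map from string to DataValue, for nested structures.
  • etag (nullable string, default null).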
{
  "namespace": "com.google.cloud.ai.vectorsearch",
  "type": "record",
  "name": "DataObject",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "vectors",
      "type": {
        "type": "map",
        "values": [
          {
            "type": "array",
            "items": "float"
          },
          {
            "type": "record",
            "name": "SparseVector",
            "fields": [
              {
                "name": "values",
                "type": { "type": "array", "items": "float" }
              },
              {
                "name": "indices",
                "type": { "type": "array", "items": "long" }
              }
            ]
          }
        ]
      },
      "default": {}
    },
    {
      "name": "data",
      "type": [
        "null",
        {
          "type": "map",
          "values": {
            "type": "record",
            "name": "DataValue",
            "fields": [
              {
                "name": "value",
                "type": [
                  "boolean",
                  "int",
                  "long",
                  "float",
                  "double",
                  "string",
                  {
                    "type": "array",
                    "items": "DataValue"
                  },
                  {
                    "type": "map",
                    "values": "DataValue"
                  }
                ]
              }
            ]
          }
        }
      ],
      "default": null
    },
    {
      "name": "etag",
      "type": [
        "null",
        "string"
      ],
      "default": null
    }
  ]
}

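The following snippet shows the conceptual content of a single AVRO record that matches the JSONL example above. Note that under this schema, each entry in data is wrapped in a DataValue record with a single value field. This is how AVRO represents the heterogeneous types in data: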

{
  "id": "movie-789",
  "vectors": {
    "title_embedding": [-0.23, 0.88, 0.11],
    "sparse_embedding": {
      "values": [0.01, -0.93, 0.27],
      "indices": [23, 83, 131]
    }
  },
  "data": {
    "title": { "value": "The Shawshank Redemption" },
    "plot": { "value": "..." },
    "year": { "value": 1994 },
    "avg_rating": { "value": 8.5 },
    "movie_runtime_info": {
      "value": {
        "hours":   { "value": 2 },
        "minutes": { "value": 5 }
      }
    }
  }
}
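
The DataValue wrapping described above can be sketched as a small recursive helper. This is a minimal illustration under stated assumptions, not part of any client library: the names wrap_data_value and wrap_data are hypothetical, and the output mirrors the conceptual layout shown above rather than Avro's binary or tagged-union JSON encoding.

```python
from typing import Any


def wrap_data_value(value: Any) -> dict:
    """Wrap a plain JSON value in the conceptual DataValue layout.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, dict):
        # Nested objects become a map of field name -> DataValue.
        return {"value": {key: wrap_data_value(item) for key, item in value.items()}}
    if isinstance(value, list):
        # Arrays become an array of DataValue records.
        return {"value": [wrap_data_value(item) for item in value]}
    # Primitives (boolean, int, long, float, double, string) are wrapped as-is.
    return {"value": value}


def wrap_data(data: dict) -> dict:
    """Wrap every top-level entry of a JSONL-style data object."""
    return {key: wrap_data_value(item) for key, item in data.items()}


jsonl_data = {
    "title": "The Shawshank Redemption",
    "year": 1994,
    "movie_runtime_info": {"hours": 2, "minutes": 5},
}
avro_data = wrap_data(jsonl_data)
print(avro_data["movie_runtime_info"])
```

Writing the result to an actual AVRO file would still require an Avro library and the DataObject schema shown earlier. This sketch covers only the reshaping step.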

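Export data objects

The following example shows how to export every data object in a collection to Cloud Storage in JSONL format. The destination bucket must be in the same region as the collection. Export is a long-running operation.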

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:exportDataObjects

Request JSON body:

{
  "gcsDestination": {
    "exportUri": "gs://your-bucket/path/to/export-dir/",
    "format": "JSONL"
  }
}

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ExportDataObjectsMetadata",
    "createTime": "2026-02-02T13:30:43.874527852Z"
  },
  "done": false
}

gcloud

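Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell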

gcloud vector-search collections export-data-objects COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" \
  --gcs-destination-format="jsonl" \
  --async

Windows (PowerShell)

gcloud vector-search collections export-data-objects COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" `
  --gcs-destination-format="jsonl" `
  --async

Windows (cmd.exe)

gcloud vector-search collections export-data-objects COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --gcs-destination-export-uri="gs://your-bucket/path/to/export-dir/" ^
  --gcs-destination-format="jsonl" ^
  --async

Python

from google.cloud import vectorsearch_v1

# Create the client
vector_search_service_client = vectorsearch_v1.VectorSearchServiceClient()

# Initialize request
request = vectorsearch_v1.ExportDataObjectsRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    gcs_destination={
        "export_uri": "gs://your-bucket/path/to/export-dir/",
        "format": vectorsearch_v1.ExportDataObjectsRequest.GcsExportDestination.Format.JSONL,
    },
)

# Make the request
operation = vector_search_service_client.export_data_objects(request=request)

# Wait for the result (note this may take up to several minutes)
operation.result()
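
As a complement to the samples above, the REST call can be assembled programmatically. The following is a minimal sketch, assuming only the URL pattern and request body shown in the REST sample. The helper name build_export_request and the gs:// prefix check are illustrative assumptions, and the sketch does not send the request or handle authentication.

```python
import json


def build_export_request(project_id: str, location: str, collection_id: str,
                         export_uri: str) -> tuple[str, str]:
    """Return (url, json_body) for an exportDataObjects REST call.

    Hypothetical helper: it only formats strings and performs no network I/O.
    """
    if not export_uri.startswith("gs://"):
        raise ValueError("export_uri must be a Cloud Storage URI starting with gs://")
    url = (
        "https://vectorsearch.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"collections/{collection_id}:exportDataObjects"
    )
    # Same body as the REST sample, with JSONL as the export format.
    body = {"gcsDestination": {"exportUri": export_uri, "format": "JSONL"}}
    return url, json.dumps(body)


url, body = build_export_request(
    "PROJECT_ID", "LOCATION", "COLLECTION_ID", "gs://your-bucket/path/to/export-dir/"
)
print(url)
print(body)
```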

Delete data objects

The following example shows how to delete a single data object, DATA_OBJECT_ID, from the collection with ID COLLECTION_ID.

REST

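Before using any of the request data, make the following replacements:

  • DATA_OBJECT_ID: The ID of the data object.
  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL: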

DELETE https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/operation-1770039043815-649d75471f76e-08de3049-276a02be",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.vectorsearch.v1.ExportDataObjectsMetadata",
    "createTime": "2026-02-02T13:30:43.874527852Z"
  },
  "done": false
}

gcloud

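Before using any of the command data below, make the following replacements:

  • DATA_OBJECT_ID: The ID of the data object.
  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell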

gcloud vector-search collections data-objects delete DATA_OBJECT_ID \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID

Windows (PowerShell)

gcloud vector-search collections data-objects delete DATA_OBJECT_ID `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID

Windows (cmd.exe)

gcloud vector-search collections data-objects delete DATA_OBJECT_ID ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID

You should receive a response similar to the following:

Deleted dataObject [DATA_OBJECT_ID].

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

# Initialize request
request = vectorsearch_v1.DeleteDataObjectRequest(
    name="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/DATA_OBJECT_ID",
)

# Make the request
data_object_service_client.delete_data_object(request=request)
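
Because the delete request takes a fully qualified resource name, it can help to build and check that name before calling the API. The following is a minimal sketch, assuming the resource name pattern used in the samples above and the ID pattern given in the identity validation rules earlier on this page. The helper name data_object_name is hypothetical.

```python
import re

# ID pattern from the identity validation rules on this page:
# 1 to 63 characters, starts with a lowercase letter, ends with a letter or digit.
_ID_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


def data_object_name(project_id: str, location: str, collection_id: str,
                     data_object_id: str) -> str:
    """Return the full resource name, rejecting IDs that break the ID rules.

    Hypothetical helper for illustration only.
    """
    if not _ID_PATTERN.fullmatch(data_object_id):
        raise ValueError(f"invalid data object ID: {data_object_id!r}")
    return (
        f"projects/{project_id}/locations/{location}/"
        f"collections/{collection_id}/dataObjects/{data_object_id}"
    )


name = data_object_name("PROJECT_ID", "LOCATION", "COLLECTION_ID", "movie-789")
print(name)
```

The returned string can be passed as the name field of a DeleteDataObjectRequest, as in the Python sample above.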

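Batch delete data objects

To delete multiple data objects at once, use batchDelete and provide a list of fully qualified data object resource names. A single batch can delete up to 1,000 data objects.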

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:batchDelete

Request JSON body:

{
  "requests": [
    { "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1" },
    { "name": "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2" }
  ]
}

You should receive a JSON response similar to the following:

{}

gcloud

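Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell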

gcloud vector-search collections data-objects batch-delete \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Windows (PowerShell)

gcloud vector-search collections data-objects batch-delete `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Windows (cmd.exe)

gcloud vector-search collections data-objects batch-delete ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --requests='[
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-1"},
    {"name":"projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects/movie-2"}
  ]'

Python

from google.cloud import vectorsearch_v1

# Create the client
data_object_service_client = vectorsearch_v1.DataObjectServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

requests = [
    vectorsearch_v1.DeleteDataObjectRequest(
        name=f"{parent}/dataObjects/movie-1",
    ),
    vectorsearch_v1.DeleteDataObjectRequest(
        name=f"{parent}/dataObjects/movie-2",
    ),
]

request = vectorsearch_v1.BatchDeleteDataObjectsRequest(
    parent=parent,
    requests=requests,
)

# Make the request
data_object_service_client.batch_delete_data_objects(request=request)
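
Since a single batch is limited to 1,000 data objects, a longer list of resource names has to be split before it is sent. The following is a minimal sketch of that chunking step. The helper name chunk_names and the 2,500-item sample list are illustrative assumptions, and no API call is made here.

```python
from typing import Iterator

MAX_BATCH_SIZE = 1000  # batch limit stated in this section


def chunk_names(names: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of names, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"
# Hypothetical IDs, used only to demonstrate the chunking.
names = [f"{parent}/dataObjects/movie-{i}" for i in range(2500)]

batches = list(chunk_names(names))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch could then be turned into a list of DeleteDataObjectRequest messages and sent with one batch_delete_data_objects call per batch, following the Python sample above.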

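Count data objects

To count the data objects in a collection, use the aggregate operation with the COUNT aggregation method. The same call accepts an optional JSON filter expression, so you can count only the data objects that match a predicate (for example, genre == "sci-fi").

To count every data object in the collection, omit the filter.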

REST

Before using any of the request data, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

HTTP method and URL:

POST https://vectorsearch.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate

Request JSON body:

{
  "aggregate": "COUNT",
  "filter": { "genre": { "$eq": "sci-fi" } }
}

You should receive a JSON response similar to the following:

{
  "aggregateResults": [
    { "count": "42" }
  ]
}

gcloud

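Before using any of the command data below, make the following replacements:

  • COLLECTION_ID: The ID of the collection.
  • LOCATION: The region where you are using Agent Platform.
  • PROJECT_ID: Your Google Cloud project ID.

Execute the following command:

Linux, macOS, or Cloud Shell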

gcloud vector-search collections data-objects aggregate \
  --collection=COLLECTION_ID \
  --location=LOCATION \
  --project=PROJECT_ID \
  --aggregation-method=count \
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Windows (PowerShell)

gcloud vector-search collections data-objects aggregate `
  --collection=COLLECTION_ID `
  --location=LOCATION `
  --project=PROJECT_ID `
  --aggregation-method=count `
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Windows (cmd.exe)

gcloud vector-search collections data-objects aggregate ^
  --collection=COLLECTION_ID ^
  --location=LOCATION ^
  --project=PROJECT_ID ^
  --aggregation-method=count ^
  --json-filter='{"genre": {"$eq": "sci-fi"}}'

Python

from google.cloud import vectorsearch_v1
from google.protobuf import struct_pb2
from google.protobuf import json_format

# Create the client
search_client = vectorsearch_v1.DataObjectSearchServiceClient()

parent = "projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID"

# Optional: build a JSON filter. Omit `filter=` to count everything.
filter_struct = json_format.ParseDict(
    {"genre": {"$eq": "sci-fi"}}, struct_pb2.Struct()
)

request = vectorsearch_v1.AggregateDataObjectsRequest(
    parent=parent,
    aggregate=vectorsearch_v1.AggregationMethod.COUNT,
    filter=filter_struct,
)

# Make the request
response = search_client.aggregate_data_objects(request=request)

# The count value is returned in aggregate_results[0].
for result in response.aggregate_results:
    print(result)
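
For the REST variant, the request body and the response shown earlier in this section can be handled with two small helpers. The following is a minimal sketch, assuming only the JSON shapes from the REST sample. The names build_count_body and parse_count are hypothetical, and raising an error on an empty result list is an illustrative choice rather than documented behavior.

```python
import json
from typing import Optional


def build_count_body(json_filter: Optional[dict] = None) -> dict:
    """Build the aggregate request body; omit the filter to count everything."""
    body = {"aggregate": "COUNT"}
    if json_filter is not None:
        body["filter"] = json_filter
    return body


def parse_count(response: dict) -> int:
    """Read the count from a response shaped like the REST sample.

    The sample returns the count as a JSON string, so it is converted to int.
    """
    results = response.get("aggregateResults", [])
    if not results:
        raise ValueError("response contains no aggregate results")
    return int(results[0]["count"])


print(json.dumps(build_count_body({"genre": {"$eq": "sci-fi"}})))
print(json.dumps(build_count_body()))
print(parse_count({"aggregateResults": [{"count": "42"}]}))
```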

What's next