為簡化從向量搜尋 1.0 遷移的過程,ImportDataObjects API 新增了這項功能。
遷移程序包含三個主要步驟:
建立結構定義相符的集合。 匯入前,請務必先建立集合。資料結構定義必須經過調整,才能容納轉換後的 Vector Search 1.0 資料。
啟動匯入程序。 呼叫
ImportDataObjectsAPI,指定 Vector Search 1.0 資料的 Cloud Storage 位置,並啟用轉換標記detect_and_convert_vs1_json。瞭解資料轉換。 瞭解 Vector Search 1.0 資料欄位如何對應至新的資料物件結構。
建立集合
首先,請建立集合,並使用與 Vector Search 1.0 資料結構相符的資料結構定義。
curl -X POST \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections?collection_id=movies' \
-H 'Bearer $(gcloud auth print-access-token)' \
-H 'Content-Type: application/json' \
-d '{ \
"data_schema": { \
"$schema": "http://json-schema.org/draft-07/schema#", \
"type": "object", \
"properties": { \
"restricts": { \
"type": "object", \
"properties": { \
"genres": { \
"type": "array", \
"items": { \
"type": "string" \
} \
}, \
"director": { \
"type": "array", \
"items": { \
"type": "string" \
} \
} \
} \
}, \
"restricts_deny": { \
"type": "object", \
"properties": { \
"genres": { \
"type": "array", \
"items": { \
"type": "string" \
} \
} \
} \
}, \
"numeric_restricts": { \
"type": "object", \
"properties": { \
"year": { \
"type": "integer" \
}, \
"imdb_rating": { \
"type": "number", \
"format": "float" \
} \
} \
}, \
"embedding_metadata": { \
"type": "object", \
"properties": { \
"plot": { \
"type": "string" \
}, \
"customers_review_summary": { \
"type": "string" \
}, \
"critics_review_summary": { \
"type": "string" \
} \
}, \
} \
} \
}, \
"vector_schema": { \
"embedding": { \
"dense_vector": { \
"dimensions": 768 \
} \
}, \
"sparse_embedding": { \
"sparse_vector": {} \
} \
} \
}'
匯入 Vector Search 1.0 資料
接著,請對新建立的集合使用 ImportDataObjects API。
將其指向包含 Vector Search 1.0 資料的 Cloud Storage bucket。
curl -X POST \
"https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{ \
"gcs_import": { \
"contents_uri": "gs://your-bucket/path/to/your-data.jsonl", \
"error_uri": "gs://your-bucket/path/to/import-errors/" \
} \
}'
資料轉換
匯入期間,系統會將 Vector Search 1.0 資料轉換為 Vector Search 2.0 資料物件。以下範例說明如何對應欄位。
Vector Search 1.0 Cloud Storage 檔案格式
{
"id": "movie-789",
"embedding": [-0.23, 0.88, 0.11, ...],
"sparse_embedding": {"values": [0.1, 0.2], "dimensions": [1, 4]},
"restricts": [
{"namespace": "genres", "allow": ["science-fiction", "action"], "deny": ["horror"]},
{"namespace": "director", "allow": ["Christopher Nolan"]}
],
"numeric_restricts": [
{"namespace": "year", "value_int": 2010},
{"namespace": "imdb_rating", "value_float": 8.8}
],
"embedding_metadata": {
"plot": "...",
"customers_review_summary": "...",
"critics_review_summary": "..."
}
}
轉換後的 Vector Search 2.0 資料物件
DataObject(
name="/.../movie-789",
data={
"restricts": {
"genres": ["science-fiction", "action"],
"director": ["Christopher Nolan"],
},
"restricts_deny": {
"genres": ["horror"]
},
"numeric_restricts": {
"year": 2010,
"imdb_rating": 8.8,
},
"embedding_metadata": {
"plot": "...",
"customers_review_summary": "...",
"critics_review_summary": "...",
}
},
vectors={
"embedding": {"dense_vector": {"values": [-0.23, 0.88, 0.11, ...]}},
"sparse_embedding": {"sparse_vector": {"values": [0.1, 0.2], "indices": [1, 4]}},
}
)