为了简化从 Vector Search 1.0 的过渡,ImportDataObjects API 中引入了一项新功能。
迁移过程涉及三个关键步骤:
创建具有匹配架构的集合。 在导入之前,您必须先创建集合。其数据架构必须经过结构化处理,以适应转换后的 Vector Search 1.0 数据。
启动导入流程。 调用
ImportDataObjectsAPI,指定 Vector Search 1.0 数据的 Cloud Storage 位置,并启用转换标志detect_and_convert_vs1_json。了解数据转换。 熟悉 Vector Search 1.0 数据字段如何映射到新的数据对象结构。
创建集合
首先,创建一个集合,其数据架构应与 Vector Search 1.0 数据结构相符。
curl -X POST \
'https://vectorsearch.googleapis.com/v1alpha/projects/PROJECT_ID/locations/LOCATION/collections?collection_id=movies' \
-H 'Bearer $(gcloud auth print-access-token)' \
-H 'Content-Type: application/json' \
-d '{ \
"data_schema": { \
"$schema": "http://json-schema.org/draft-07/schema#", \
"type": "object", \
"properties": { \
"restricts": { \
"type": "object", \
"properties": { \
"genres": { \
"type": "array", \
"items": { \
"type": "string" \
} \
}, \
"director": { \
"type": "array", \
"items": { \
"type": "string" \
} \
} \
} \
}, \
"restricts_deny": { \
"type": "object", \
"properties": { \
"genres": { \
"type": "array", \
"items": { \
"type": "string" \
} \
} \
} \
}, \
"numeric_restricts": { \
"type": "object", \
"properties": { \
"year": { \
"type": "integer" \
}, \
"imdb_rating": { \
"type": "number", \
"format": "float" \
} \
} \
}, \
"embedding_metadata": { \
"type": "object", \
"properties": { \
"plot": { \
"type": "string" \
}, \
"customers_review_summary": { \
"type": "string" \
}, \
"critics_review_summary": { \
"type": "string" \
} \
}, \
} \
} \
}, \
"vector_schema": { \
"embedding": { \
"dense_vector": { \
"dimensions": 768 \
} \
}, \
"sparse_embedding": { \
"sparse_vector": {} \
} \
} \
}'
导入 Vector Search 1.0 数据
接下来,对新创建的集合使用 ImportDataObjects API。
将其指向包含 Vector Search 1.0 数据的 Cloud Storage 存储桶。
curl -X POST \
"https://vectorsearch.googleapis.com/v1main/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{ \
"gcs_import": { \
"contents_uri": "gs://your-bucket/path/to/your-data.jsonl", \
"error_uri": "gs://your-bucket/path/to/import-errors/" \
} \
}'
数据转换
在导入过程中,Vector Search 1.0 数据将转换为 Vector Search 2.0 数据对象。以下示例展示了字段的映射方式。
Vector Search 1.0 Cloud Storage 文件格式
{
"id": "movie-789",
"embedding": [-0.23, 0.88, 0.11, ...],
"sparse_embedding": {"values": [0.1, 0.2], "dimensions": [1, 4]},
"restricts": [
{"namespace": "genres", "allow": ["science-fiction", "action"], "deny": ["horror"]},
{"namespace": "director", "allow": ["Christopher Nolan"]}
],
"numeric_restricts": [
{"namespace": "year", "value_int": 2010},
{"namespace": "imdb_rating", "value_float": 8.8}
],
"embedding_metadata": {
"plot": "...",
"customers_review_summary": "...",
"critics_review_summary": "..."
}
}
转换后的 Vector Search 2.0 数据对象
DataObject(
name="/.../movie-789",
data={
"restricts": {
"genres": ["science-fiction", "action"],
"director": ["Christopher Nolan"],
},
"restricts_deny": {
"genres": ["horror"]
},
"numeric_restricts": {
"year": 2010,
"imdb_rating": 8.8,
},
"embedding_metadata": {
"plot": "...",
"customers_review_summary": "...",
"critics_review_summary": "...",
}
},
vectors={
"embedding": {"dense_vector": {"values": [-0.23, 0.88, 0.11, ...]}},
"sparse_embedding": {"sparse_vector": {"values": [0.1, 0.2], "indices": [1, 4]}},
}
)