從 Vector Search 1.0 遷移

為簡化從向量搜尋 1.0 遷移的過程,ImportDataObjects API 新增了這項功能。

遷移程序包含三個主要步驟:

  1. 建立結構定義相符的集合。 匯入前,請務必先建立集合。資料結構定義必須經過調整,才能容納轉換後的 Vector Search 1.0 資料。

  2. 啟動匯入程序。 呼叫 ImportDataObjects API,指定 Vector Search 1.0 資料的 Cloud Storage 位置,並啟用轉換標記 detect_and_convert_vs1_json

  3. 瞭解資料轉換。 瞭解 Vector Search 1.0 資料欄位如何對應至新的資料物件結構。

建立集合

首先,請建立集合,並使用與 Vector Search 1.0 資料結構相符的資料結構定義。

curl -X POST \
  'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections?collection_id=movies' \
  -H 'Bearer $(gcloud auth print-access-token)' \
  -H 'Content-Type: application/json' \
  -d '{ \
    "data_schema": { \
      "$schema": "http://json-schema.org/draft-07/schema#", \
      "type": "object", \
      "properties": { \
        "restricts": { \
          "type": "object", \
          "properties": { \
            "genres": { \
              "type": "array", \
              "items": { \
                "type": "string" \
              } \
            }, \
            "director": { \
              "type": "array", \
              "items": { \
                "type": "string" \
              } \
            } \
          } \
        }, \
        "restricts_deny": { \
          "type": "object", \
          "properties": { \
            "genres": { \
              "type": "array", \
              "items": { \
                "type": "string" \
              } \
            } \
          } \
        }, \
        "numeric_restricts": { \
          "type": "object", \
          "properties": { \
            "year": { \
              "type": "integer" \
            }, \
            "imdb_rating": { \
              "type": "number", \
              "format": "float" \
            } \
          } \
        }, \
        "embedding_metadata": { \
          "type": "object", \
          "properties": { \
            "plot": { \
              "type": "string" \
            }, \
            "customers_review_summary": { \
              "type": "string" \
            }, \
            "critics_review_summary": { \
              "type": "string" \
            } \
          }, \
        } \
      } \
    }, \
    "vector_schema": { \
      "embedding": { \
        "dense_vector": { \
          "dimensions": 768 \
        } \
      }, \
      "sparse_embedding": { \
        "sparse_vector": {} \
      } \
    } \
  }'

匯入 Vector Search 1.0 資料

接著,請對新建立的集合使用 ImportDataObjects API。 將其指向包含 Vector Search 1.0 資料的 Cloud Storage bucket。

curl -X POST \
"https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID:importDataObjects" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{ \
    "gcs_import": { \
      "contents_uri": "gs://your-bucket/path/to/your-data.jsonl", \
      "error_uri": "gs://your-bucket/path/to/import-errors/" \
    } \
  }'

資料轉換

匯入期間,系統會將 Vector Search 1.0 資料轉換為 Vector Search 2.0 資料物件。以下範例說明如何對應欄位。

Vector Search 1.0 Cloud Storage 檔案格式

{
    "id": "movie-789",
    "embedding": [-0.23, 0.88, 0.11, ...],
    "sparse_embedding": {"values": [0.1, 0.2], "dimensions": [1, 4]},
    "restricts": [
        {"namespace": "genres", "allow": ["science-fiction", "action"], "deny": ["horror"]},
        {"namespace": "director", "allow": ["Christopher Nolan"]}
    ],
    "numeric_restricts": [
        {"namespace": "year", "value_int": 2010},
        {"namespace": "imdb_rating", "value_float": 8.8}
    ],
    "embedding_metadata": {
        "plot": "...",
        "customers_review_summary": "...",
        "critics_review_summary": "..."
    }
}

轉換後的 Vector Search 2.0 資料物件

DataObject(
    name="/.../movie-789",
    data={
        "restricts": {
            "genres": ["science-fiction", "action"],
            "director": ["Christopher Nolan"],
        },
        "restricts_deny": {
            "genres": ["horror"]
        },
        "numeric_restricts": {
            "year": 2010,
            "imdb_rating": 8.8,
        },
        "embedding_metadata": {
            "plot": "...",
            "customers_review_summary": "...",
            "critics_review_summary": "...",
        }
    },
    vectors={
        "embedding": {"dense_vector": {"values": [-0.23, 0.88, 0.11, ...]}},
        "sparse_embedding": {"sparse_vector": {"values": [0.1, 0.2], "indices": [1, 4]}},
    }
)