Querying Collections for Data Objects

The Query API retrieves the Data Objects in a Collection that match a filter, much like a SQL WHERE clause applied to a database table. You can also use aggregation to count the Data Objects that match a filter.

Filter expression language

In addition to KNN/ANN search, Vector Search 2.0 supports filter-based queries written in a custom query language. The following table describes the supported filter operators.

Filter	Description	Supported Types	Example
$eq	Matches Data Objects with field values that are equal to a specified value.	Number, string, boolean	`{"genre": {"$eq": "documentary"}}`
$ne	Matches Data Objects with field values that are not equal to a specified value.	Number, string, boolean	`{"genre": {"$ne": "drama"}}`
$gt	Matches Data Objects with field values that are greater than a specified value.	Number	`{"year": {"$gt": 2019}}`
$gte	Matches Data Objects with field values that are greater than or equal to a specified value.	Number	`{"year": {"$gte": 2020}}`
$lt	Matches Data Objects with field values that are less than a specified value.	Number	`{"year": {"$lt": 2020}}`
$lte	Matches Data Objects with field values that are less than or equal to a specified value.	Number	`{"year": {"$lte": 2020}}`
$in	Matches Data Objects with field values that are in a specified array.	String	`{"genre": {"$in": ["comedy", "documentary"]}}`
$nin	Matches Data Objects with field values that are not in a specified array.	String	`{"genre": {"$nin": ["comedy", "documentary"]}}`
$and	Joins query clauses with a logical AND.	-	`{"$and": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]}`
$or	Joins query clauses with a logical OR.	-	`{"$or": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]}`
$all	Matches Data Objects where the array value of a field contains all specified values.	-	`{"colors": {"$all": ["red", "blue"]}}`
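
To make the operator semantics in the table concrete, the following is a minimal local sketch that evaluates a filter against a plain Python dict. It is an illustration only, not how the service evaluates filters. The `matches` function and the sample `movie` object are hypothetical, and the handling of missing fields and of multiple top-level keys is an assumption that the table does not specify.

```python
from typing import Any, Mapping

# Field-level operators from the table, expressed as plain Python predicates.
_COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
    "$all": lambda value, target: all(item in value for item in target),
}


def matches(data: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if `data` satisfies every clause of `flt`."""
    # Assumption: several keys in one filter dict are combined with AND.
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(data, clause) for clause in condition):
                return False
        elif key == "$or":
            if not any(matches(data, clause) for clause in condition):
                return False
        else:
            # Assumption: an object that lacks the field never matches.
            if key not in data:
                return False
            for op, target in condition.items():
                if not _COMPARATORS[op](data[key], target):
                    return False
    return True


# Hypothetical sample object, used only for illustration.
movie = {"genre": "drama", "year": 2021, "colors": ["red", "blue", "green"]}

recent_drama = {"$and": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]}
print(matches(movie, recent_drama))  # True
```

Running filters through a local check like this can help when reasoning about nested `$and` and `$or` clauses before sending a query.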

Querying Collections

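The following example demonstrates how to use a filter to query for Data Objects in a Collection with the ID COLLECTION_ID. The filter matches Data Objects whose director is Akira Kurosawa, or whose director is David Fincher and whose genre is not Thriller.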

REST

# Query Data Objects
curl -X POST \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:query' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json' \
  -d '{
    "page_size": 10,
    "page_token": "",
    "filter": {
      "$or": [
        {
          "director": {
            "$eq": "Akira Kurosawa"
          }
        },
        {
          "$and": [
            {
              "director": {
                "$eq": "David Fincher"
              }
            },
            {
              "genre": {
                "$ne": "Thriller"
              }
            }
          ]
        }
      ]
    },
    "output_fields": {
      "data_fields": "*",
      "vector_fields": "*",
      "metadata_fields": "*"
    }
  }'

Python

from google.cloud import vectorsearch_v1beta

# Create the client
data_object_search_service_client = vectorsearch_v1beta.DataObjectSearchServiceClient()

# Initialize request
request = vectorsearch_v1beta.QueryDataObjectsRequest(
    parent="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    filter={
        "$or": [
            {"director": {"$eq": "Akira Kurosawa"}},
            {
                "$and": [
                    {"director": {"$eq": "David Fincher"}},
                    {"genre": {"$ne": "Thriller"}},
                ]
            },
        ]
    },
)

# Make the request
page_result = data_object_search_service_client.query_data_objects(request=request)

# Handle the response
for response in page_result:
    print(response)
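
Nested filters are easy to get wrong when written by hand. As a sketch, small helper functions can build the same filter dict used in the query example above. The helper names `field`, `all_of`, and `any_of` are hypothetical and are not part of the client library; they only produce plain dicts in the filter expression language.

```python
import json
from typing import Any


def field(name: str, op: str, value: Any) -> dict:
    """Build a single-field clause such as {"genre": {"$ne": "Thriller"}}."""
    return {name: {op: value}}


def all_of(*clauses: dict) -> dict:
    """Join clauses with a logical AND."""
    return {"$and": list(clauses)}


def any_of(*clauses: dict) -> dict:
    """Join clauses with a logical OR."""
    return {"$or": list(clauses)}


# Same filter as the query example: Kurosawa, or Fincher outside Thriller.
query_filter = any_of(
    field("director", "$eq", "Akira Kurosawa"),
    all_of(
        field("director", "$eq", "David Fincher"),
        field("genre", "$ne", "Thriller"),
    ),
)

# Print the JSON form, which can be compared against the REST request body.
print(json.dumps(query_filter, indent=2))
```

The resulting `query_filter` dict can be passed as the `filter` argument of `QueryDataObjectsRequest`, exactly like the literal dict in the example.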

To perform an aggregation, call the aggregate endpoint and specify the aggregation type in the request body.

The following example demonstrates how to count all Data Objects in a Collection with the ID COLLECTION_ID.

REST

curl -X POST \
'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID/dataObjects:aggregate' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json' \
  -d '{
    "aggregate": "COUNT"
  }'

Python

from google.cloud import vectorsearch_v1beta

# Create the client
data_object_search_service_client = vectorsearch_v1beta.DataObjectSearchServiceClient()

# Initialize request
request = vectorsearch_v1beta.AggregateDataObjectsRequest(
    parent="projects/PROJECT_ID/locations/LOCATION/collections/COLLECTION_ID",
    aggregate="COUNT",
)

# Make the request
response = data_object_search_service_client.aggregate_data_objects(request=request)

# Handle the response
print(response)
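
The introduction mentions counting only the Data Objects that match a filter, but the example above counts everything. The following sketch builds a request body for that case. It assumes that the aggregate request accepts a `filter` field with the same shape as the query request; this page does not confirm that field name, so verify it against the API reference before relying on it. The `build_count_body` helper is hypothetical, and `"COUNT"` is the value used in the Python example above.

```python
import json
from typing import Optional


def build_count_body(flt: Optional[dict] = None) -> dict:
    """Build a JSON body for a count aggregation, optionally filtered."""
    body = {"aggregate": "COUNT"}
    if flt is not None:
        # Assumption: the aggregate endpoint takes a "filter" field,
        # mirroring the query endpoint. Not confirmed by this page.
        body["filter"] = flt
    return body


# Count Data Objects whose year is 2020 or later.
body = build_count_body({"year": {"$gte": 2020}})
print(json.dumps(body, indent=2))
```

If the assumption holds, the printed JSON can be sent as the `-d` payload of the aggregate request shown in the REST example.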

What's next?

Learn how to search for Data Objects.
