The AI.CLASSIFY function
This document describes the AI.CLASSIFY function, which uses a Vertex AI Gemini model to classify inputs into categories that you provide. BigQuery automatically structures your input to improve the quality of the classification.
The following are common use cases:
- Retail: Classify reviews by sentiment or classify products by categories.
- Text analysis: Classify support tickets or emails by topic.
Input
AI.CLASSIFY accepts the following types of input:
- Text data from standard tables.
This function passes your input to a Gemini model and incurs charges in Vertex AI each time it's called.
Syntax
AI.CLASSIFY( [ input => ] 'INPUT', [ categories => ] 'CATEGORIES', connection_id => 'CONNECTION' )
Arguments
AI.CLASSIFY takes the following arguments:
INPUT: a STRING or STRUCT value that specifies the input to classify. The input must be the first argument that you specify. You can provide the input value in the following ways:
- Specify a STRING value. For example, 'apple'.
- Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:
  - STRING: a string literal, or the name of a STRING column. Examples: 'apple' (string literal), my_string_column (string column name).
  - ARRAY<STRING>: you can only use string literals in the array. Example: ['red ', 'apples'] (array of string literals).
  The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following examples show STRUCT input values and how they are interpreted:
  - STRUCT<STRING>: ('apples') is equivalent to 'apples'.
  - STRUCT<STRING, STRING>: ('red', ' apples') is equivalent to 'red apples'.
  - STRUCT<STRING, ARRAY<STRING>>: ('crisp ', ['red', ' apples']) is equivalent to 'crisp red apples'.
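The concatenation rule for STRUCT inputs can be modeled locally, for example to preview the text that a STRUCT input stands for. The following Python sketch is an illustration under stated assumptions, not BigQuery's implementation: struct_to_text is a hypothetical name, a STRUCT is represented as a Python tuple, an ARRAY<STRING> field as a list of strings, and column references and NULL handling are not modeled.

```python
from typing import Sequence, Union

# Hypothetical local model: a STRUCT field is either a STRING value (str)
# or an ARRAY<STRING> of string literals (a list of str).
Field = Union[str, Sequence[str]]


def struct_to_text(struct: Sequence[Field]) -> str:
    """Return the string that a STRUCT input is documented to be equivalent to.

    Fields are concatenated in order, CONCAT-style, with no separator, and
    ARRAY<STRING> fields contribute their elements in order.
    """
    parts = []
    for field in struct:
        # Check str first: a str is itself a Sequence and must not be split.
        if isinstance(field, str):
            parts.append(field)
        else:
            parts.extend(field)
    return "".join(parts)


# The documented STRUCT examples, expressed with this model.
print(struct_to_text(("apples",)))                     # apples
print(struct_to_text(("red", " apples")))              # red apples
print(struct_to_text(("crisp ", ["red", " apples"])))  # crisp red apples
```

Because no separator is inserted, any spaces between words must already be part of the field values, as in ' apples'.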
CATEGORIES: the categories by which to classify the input. You can specify categories with or without descriptions:
- With descriptions: use an ARRAY<STRUCT<STRING, STRING>> value where each struct contains the category name, followed by a description of the category. The array can only contain string literals. For example, you could use colors to classify sentiment:
  [('green', 'positive'), ('yellow', 'neutral'), ('red', 'negative')]
  You can optionally name the fields of the struct for your own readability, but the function doesn't use the field names:
  [STRUCT('green' AS label, 'positive' AS description), STRUCT('yellow' AS label, 'neutral' AS description), STRUCT('red' AS label, 'negative' AS description)]
- Without descriptions: use an ARRAY<STRING> value. The array can only contain string literals. This works well when your categories are self-explanatory. For example, you could use the following categories to classify sentiment:
  ['positive', 'neutral', 'negative']
To handle input that doesn't closely match a category, consider including an 'Other' category.

CONNECTION: a STRING value that specifies the Cloud resource connection to use. The following forms are accepted:
- Connection name: [PROJECT_ID].LOCATION.CONNECTION_ID
  For example, myproject.us.myconnection.
- Fully qualified connection ID: projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID
  For example, projects/myproject/locations/us/connections/myconnection.
Replace the following:
- PROJECT_ID: the project ID of the project that contains the connection.
- LOCATION: the location used by the connection.
- CONNECTION_ID: the connection ID, for example, myconnection. You can get this value by viewing the connection details in the Google Cloud console and copying the last section of the fully qualified connection ID that is shown in Connection ID. For example, in projects/myproject/locations/connection_location/connections/myconnection, the connection ID is myconnection.
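If you keep connection references in application configuration, a small helper can normalize the two accepted CONNECTION forms before you build a query. The following Python sketch uses hypothetical names (parse_connection, ConnectionRef) and assumes that no part contains a dot or a slash. Because PROJECT_ID is optional in a connection name, the sketch requires the caller to supply a default project in that case instead of assuming which project BigQuery uses.

```python
import re
from typing import NamedTuple, Optional

# Fully qualified form: projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID
_FULLY_QUALIFIED = re.compile(r"projects/([^/]+)/locations/([^/]+)/connections/([^/]+)")


class ConnectionRef(NamedTuple):
    project_id: str
    location: str
    connection_id: str

    def fully_qualified(self) -> str:
        return (f"projects/{self.project_id}/locations/{self.location}"
                f"/connections/{self.connection_id}")

    def name(self) -> str:
        return f"{self.project_id}.{self.location}.{self.connection_id}"


def parse_connection(value: str, default_project: Optional[str] = None) -> ConnectionRef:
    """Parse either accepted CONNECTION form into its three parts."""
    match = _FULLY_QUALIFIED.fullmatch(value)
    if match:
        return ConnectionRef(*match.groups())
    # Connection name: [PROJECT_ID].LOCATION.CONNECTION_ID
    parts = value.split(".")
    if "/" in value or not all(parts):
        raise ValueError(f"unrecognized connection reference: {value!r}")
    if len(parts) == 3:
        return ConnectionRef(*parts)
    if len(parts) == 2 and default_project:
        # PROJECT_ID omitted: fall back to the caller-supplied default (an assumption).
        return ConnectionRef(default_project, *parts)
    raise ValueError(f"unrecognized connection reference: {value!r}")


ref = parse_connection("myproject.us.myconnection")
print(ref.fully_qualified())  # projects/myproject/locations/us/connections/myconnection
print(parse_connection("projects/myproject/locations/us/connections/myconnection").connection_id)
print(parse_connection("us.example_connection", default_project="myproject").name())
```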
Output
AI.CLASSIFY returns a STRING value containing the provided category that best fits the input.
If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function returns NULL.
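Because a NULL result indicates a failed Vertex AI call rather than a classification, you might set those rows aside and rerun them later instead of treating NULL as a category. The following Python sketch assumes that the query results have already been fetched as hypothetical (key, category) pairs, with None standing for NULL; split_results is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def split_results(
    rows: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Dict[str, str], List[str]]:
    """Separate classified rows from rows whose AI.CLASSIFY result was NULL.

    Each row is a hypothetical (key, category) pair; None stands for SQL NULL.
    """
    classified: Dict[str, str] = {}
    to_retry: List[str] = []
    for key, category in rows:
        if category is None:
            to_retry.append(key)  # failed call, not a classification
        else:
            classified[key] = category
    return classified, to_retry


classified, to_retry = split_results([("a1", "tech"), ("a2", None), ("a3", "sport")])
print(classified)  # {'a1': 'tech', 'a3': 'sport'}
print(to_retry)    # ['a2']
```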
Examples
The following examples show how to use the AI.CLASSIFY function to classify text into predefined categories.
Classify text by topic
The following query categorizes BBC news articles into high-level categories:
SELECT
  title,
  body,
  AI.CLASSIFY(
    body,
    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
    connection_id => 'us.example_connection') AS category
FROM
  `bigquery-public-data.bbc_news.fulltext`
LIMIT 100;
Classify reviews by sentiment
The following query classifies movie reviews of The English Patient by sentiment according to a custom color scheme. For example, a positive review is classified as 'green'.
SELECT
  AI.CLASSIFY(
    ('Classify the review by sentiment: ', review),
    categories =>
      [('green', 'The review is positive.'),
       ('yellow', 'The review is neutral.'),
       ('red', 'The review is negative.')],
    connection_id => 'us.example_connection') AS ai_review_rating,
  reviewer_rating AS human_provided_rating,
  review
FROM
  `bigquery-public-data.imdb.reviews`
WHERE
  title = 'The English Patient';
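Because the categories array can only contain string literals, an application that keeps its labels in code has to inline them into the query text, as in the categories argument of the preceding query. The following Python sketch renders (name, description) pairs as such an array literal. The helper names are hypothetical, and it assumes GoogleSQL's single-quoted string escapes, handling only backslashes, single quotes, and newlines rather than every escape sequence.

```python
from typing import Iterable, Tuple


def sql_string_literal(text: str) -> str:
    """Quote text as a single-quoted GoogleSQL string literal.

    Backslashes are escaped first so that later escapes aren't doubled.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")
    return f"'{escaped}'"


def categories_literal(categories: Iterable[Tuple[str, str]]) -> str:
    """Render (name, description) pairs as an array of struct literals."""
    items = [
        f"({sql_string_literal(name)}, {sql_string_literal(description)})"
        for name, description in categories
    ]
    return "[" + ", ".join(items) + "]"


print(categories_literal([
    ("green", "The review is positive."),
    ("yellow", "The review is neutral."),
    ("red", "The review is negative."),
]))
```

Escaping matters here because a description such as "It's positive." would otherwise end the literal early and break the query.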
Locations
You can run AI.CLASSIFY in all of the regions that support Gemini models, and also in the US and EU multi-regions.
Quotas
See Generative AI functions quotas and limits.
What's next
- For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
- For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.
- For more information about supported SQL statements and functions for generative AI models, see End-to-end user journeys for generative AI models.