Analyze images
This tutorial shows you how to gain insights from unstructured image data by integrating BigQuery ML with Gemini. In the tutorial, you
create a
remote model
based on
gemini-2.5-flash
and use the
AI.GENERATE_TEXT
function to automatically extract metadata, such as titles and release years,
from a collection of movie posters.
Objectives
- Create a BigQuery object table over image data in a Cloud Storage bucket.
- Create a BigQuery ML remote model that targets the
Vertex AI
gemini-2.5-flashmodel. - Use the remote model with the
AI.GENERATE_TEXTfunction to identify the movies associated with a set of movie posters.
Costs
This tutorial uses the following billable components of Google Cloud:
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
-
Create a project: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles.
-
Verify that billing is enabled for your Google Cloud project.
-
Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles.
Required roles
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, and models:
BigQuery Admin (
roles/bigquery.admin). - Grant permissions to the connection's service account: Project IAM Admin
(
roles/resourcemanager.projectIamAdmin).
These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
- Create a dataset:
bigquery.datasets.create - Create, delegate, and use a connection:
bigquery.connections.* - Set the default connection:
bigquery.config.* - Set service account permissions:
resourcemanager.projects.getIamPolicyandresourcemanager.projects.setIamPolicy - Create an object table:
bigquery.tables.createandbigquery.tables.update - Create a model and run inference:
bigquery.jobs.createbigquery.models.createbigquery.models.getDatabigquery.models.updateDatabigquery.models.updateMetadata
You might also be able to get these permissions with custom roles or other predefined roles.
Prepare the environment
To perform BigQuery ML inference on object tables using gemini-2.5-flash, you must assign a BigQuery reservation to your project. If there's already a reservation assigned to your project, you can skip this step.
Create a reservation
In the Google Cloud console, go to the BigQuery page.
In the navigation menu, click Capacity management.
Click Create Reservation.
On the Create reservation page, do the following:
- For Reservation name, enter
bqml-tutorial-reservation. - For Location, select us (multiple regions in United States).
- Leave the remaining default settings as they are, and click Save.
- For Reservation name, enter
Assign the reservation
In the Google Cloud console, go to the BigQuery page.
In the navigation menu, click Capacity management.
In the Slot Reservations table, find the reservation that you want to assign to your project.
Click View actions > Create assignment.
In Create an assignment, click Browse and select your project.
For Job type, select QUERY. This selection ensures that your SQL queries use this reservation's slots.
Click Create.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click View actions > Create dataset
On the Create dataset page, do the following:
For Dataset ID, enter
bqml_tutorial.For Location type, select Multi-region, and then select US.
Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the
bq mk --dataset command.
Create a dataset named
bqml_tutorialwith the data location set toUS.bq mk --dataset \ --location=US \ --description "BigQuery ML tutorial dataset." \ bqml_tutorial
Confirm that the dataset was created:
bq ls
API
Call the datasets.insert
method with a defined dataset resource.
{ "datasetReference": { "datasetId": "bqml_tutorial" } }
Create the object table
Create an object table over the movie poster images in the public Cloud Storage bucket. The object table lets you analyze the images without moving them from Cloud Storage.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to create the object table:
CREATE OR REPLACE EXTERNAL TABLE `bqml_tutorial.movie_posters` WITH CONNECTION DEFAULT OPTIONS ( object_metadata = 'SIMPLE', uris = ['gs://cloud-samples-data/vertex-ai/dataset-management/datasets/classic-movie-posters/*']);
Create the remote model
Create a remote model that represents a Vertex AI
gemini-2.5-flash model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to create the remote model:
CREATE OR REPLACE MODEL `bqml_tutorial.gemini-vision` REMOTE WITH CONNECTION DEFAULT OPTIONS (ENDPOINT = 'gemini-2.5-flash');
The query might take a few minutes to complete, after which the
gemini-visionmodel appears in thebqml_tutorialdataset in the Explorer pane. Because the query uses aCREATE MODELstatement to create a model, there are no query results.
Analyze the movie posters
Use the remote model to analyze the movie posters and determine what movie each poster represents, and then write this data to a table.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to analyze the movie poster images:
CREATE OR REPLACE TABLE `bqml_tutorial.movie_posters_results` AS ( SELECT uri, result FROM AI.GENERATE_TEXT( MODEL `bqml_tutorial.gemini-vision`, TABLE `bqml_tutorial.movie_posters`, STRUCT( 0.2 AS temperature, 'For the movie represented by this poster, what is the movie title and year of release? Answer in JSON format with two keys: title, year. title should be string, year should be integer.' AS PROMPT)));
In the query editor, run the following statement to view the table data:
SELECT * FROM `bqml_tutorial.movie_posters_results`;
The output is similar to the following:
+--------------------------------------------+----------------------------------+ | uri | result | +--------------------------------------------+----------------------------------+ | gs://cloud-samples-data/vertex-ai/dataset- |
json | | management/datasets/classic-movie- | { | | posters/little_annie_rooney.jpg | "title": "Little Annie Rooney", | | | "year": 1912 | | | } | | || +--------------------------------------------+----------------------------------+ | gs://cloud-samples-data/vertex-ai/dataset- |json | | management/datasets/classic-movie- | { | | posters/mighty_like_a_mouse.jpg | "title": "Mighty Like a Moose", | | | "year": 1926 | | | } | | || +--------------------------------------------+----------------------------------+ | gs://cloud-samples-data/vertex-ai/dataset- |json | | management/datasets/classic-movie- | { | | posters/brown_of_harvard.jpeg | "title": "Brown of Harvard", | | | "year": 1926 | | | } | | || +--------------------------------------------+----------------------------------+
Format the model output
To make the movie title and year data easier to read, format the data returned by the model.
In the Google Cloud console, go to the BigQuery page.
In the query editor, run the following query to format the data:
CREATE OR REPLACE TABLE `bqml_tutorial.movie_posters_results_formatted` AS ( SELECT uri, JSON_QUERY(RTRIM(LTRIM(results.result, " ```json"), "```"), "$.title") AS title, JSON_QUERY(RTRIM(LTRIM(results.result, " ```json"), "```"), "$.year") AS year FROM `bqml_tutorial.movie_posters_results` results );
In the query editor, run the following statement to view the table data:
SELECT * FROM `bqml_tutorial.movie_posters_results_formatted`;
The output is similar to the following:
+--------------------------------------------+----------------------------+------+ | uri | title | year | +--------------------------------------------+----------------------------+------+ | gs://cloud-samples-data/vertex-ai/dataset- | "Barque sortant du port" | 1895 | | management/datasets/classic-movie- | | | | posters/barque_sortant_du_port.jpeg | | | +--------------------------------------------+----------------------------+------+ | gs://cloud-samples-data/vertex-ai/dataset- | "The Great Train Robbery" | 1903 | | management/datasets/classic-movie- | | | | posters/the_great_train_robbery.jpg | | | +--------------------------------------------+----------------------------+------+ | gs://cloud-samples-data/vertex-ai/dataset- | "Little Annie Rooney" | 1912 | | management/datasets/classic-movie- | | | | posters/little_annie_rooney.jpg | | | +--------------------------------------------+----------------------------+------+
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete individual resources
If you want to reuse the project, then delete the resources that you created for the tutorial.
Delete the dataset
Console
Delete the entire bqml_tutorial dataset and all its contents by running the
following SQL command:
DROP SCHEMA IF EXISTS `bqml_tutorial` CASCADE;
bq
Delete the entire bqml_tutorial dataset and all its contents:
bq rm -r bqml_tutorial
Delete the reservation
Console
If you created a BigQuery reservation as part of this tutorial, you should remove it to avoid continued slot charges.
In the Google Cloud console, go to the BigQuery page.
In the navigation menu, click Capacity management.
In the Slot Reservations table, find
bqml-tutorial-reservation.Click View actions > Delete.
bq
If you created a BigQuery reservation named
bqml-tutorial-reservation in the us location, use the following command to
remove it:
bq rm --reservation --location=us bqml-tutorial-reservation
Delete the connection
Console
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer, locate your project, and then click Connections.
In the table, find your connection.
Click View actions > Delete.
bq
Delete the connection:
bq rm --connection --location=us CONNECTION_ID
Replace CONNECTION_ID with the actual ID of your connection.
What's next
- Learn more about generative AI functions in BigQuery.
- Learn how to tune a model using your data.
- Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.