Work with ObjectRef values

This document describes ObjectRef values and how to create and use them in BigQuery.

An ObjectRef value is a STRUCT type with a predefined schema that references Cloud Storage objects for multimodal analysis. It can be processed by OBJ functions, AI functions, or Python user-defined functions.

Schema

An ObjectRef value has the following fields:

Name Type Mode Description Example
uri STRING REQUIRED The Cloud Storage object's URI. "gs://cloud-samples-data/vision/demo-img.jpg"
version STRING NULLABLE The object generation. "1560286006357632"
authorizer STRING NULLABLE A BigQuery connection ID for delegated access or NULL for direct access. The ID can have the following formats:
"region.connection"
or
"project.region.connection"
"myproject.us.myconnection"
details JSON NULLABLE The object metadata or errors from processing the object. It can include the fields content_type, md5_hash, size, and updated for the object. {"gcs_metadata":{"content_type":"image/png","md5_hash":"dfbbb5cf034af026d89f2dc16930be15","size":915052,"updated":1560286006000000}}

The content_type field in the gcs_metadata field from the details column is fetched from Cloud Storage. You can set an object's content type in Cloud Storage. If you omit it in Cloud Storage, then BigQuery infers the content type from the suffix of the URI.

Create ObjectRef values

You can create ObjectRef values by using object tables, the OBJ.MAKE_REF function, or Cloud Storage Insights datasets.

Use object tables

Use an object table if you don't have URIs stored in a table and want to list all the objects from a Cloud Storage prefix. An object table stores the reference to an object in each row, and has a ref column that contains ObjectRef values. The following query uses the CREATE EXTERNAL TABLE statement to create an object table:

CREATE EXTERNAL TABLE mydataset.images
WITH CONNECTION `us.myconnection`
OPTIONS (uris=["gs://mybucket/images/*"], object_metadata="SIMPLE");

SELECT ref AS image_ref FROM mydataset.images;

ObjectRef values from an object table must have an authorizer for delegated access. The authorizer connection is the same connection that you use to create the object table.

Use the OBJ.MAKE_REF function

Use the OBJ.MAKE_REF function if you already have URIs stored in a table and want to create ObjectRef values from those URIs. The following queries show how to create ObjectRef values in the image_ref column from the uri column that contains Cloud Storage URIs:

-- Specify only the URI
SELECT *, OBJ.MAKE_REF(uri) AS image_ref FROM mydataset.images;
-- Specify the URI and the connection
SELECT *, OBJ.MAKE_REF(uri, "us.myconnection") AS image_ref FROM mydataset.images;

To modify the authorizers of an existing ObjectRef value, you can use the OBJ.MAKE_REF function:

-- Remove the authorizer
SELECT *, OBJ.MAKE_REF(ref, authorizer=>NULL) AS image_ref FROM mydataset.images;
-- Change the authorizer
SELECT *, OBJ.MAKE_REF(ref, authorizer=>"us.myconnection2") AS image_ref FROM mydataset.images;

The OBJ.MAKE_REF function accepts a nullable authorizer to support direct access and delegated access.

Use Cloud Storage Insights datasets

If you have a Storage Insights dataset configured, then the dataset already includes a ref column that contains ObjectRef values. Any ObjectRef values created in Storage Insights datasets don't have an authorizer. To query these objects, you must either have direct access to the object or add an authorizer to the ObjectRef to use delegated access.

Authorizer and permissions

When you pass an ObjectRef value to ObjectRef functions, AI functions, or Python UDFs, those functions need to access the object stored in Cloud Storage. You can authorize this access based on the value of the authorizer field in two ways: direct access and delegated access.

Direct access

With direct access, the user who runs the query accesses the object directly by using their own credentials. Direct access is used when the ObjectRef value has no authorizer.

Direct access has the following restrictions:

  • The user must have permission to access the objects.
  • A query job using the AI.GENERATE, AI.IF, AI.SCORE, or AI.CLASSIFY functions without a connection requires the user to have additional permissions. The query can only access Cloud Storage buckets and objects from the same project in which the job is executed.

For example, if you call the AI.GENERATE function on an ObjectRef value that doesn't have an authorizer, then the function reads the object as you. If you don't have permission to read the object, the function writes a "permission denied" error to the status column in the result.

The following example shows a query that uses direct access:

-- Requires that the end user can read the object "gs://cloud-samples-data/vision/demo-img.jpg" and use the Vertex AI model.
SELECT AI.GENERATE(
  ("Describe this image:",
  OBJ.GET_ACCESS_URL(OBJ.MAKE_REF("gs://cloud-samples-data/vision/demo-img.jpg"), 'r')));

Delegated access

With delegated access, the user who runs the query delegates object access to a BigQuery Cloud resource connection, which is specified in the authorizer field of the ObjectRef value. Delegated access can enable cross-project data access.

To use delegated access, your data administrator must follow these steps to set up the connection and permissions:

For example, if a user passes ObjectRef values that have an authorizer to an AI.GENERATE function, then the function verifies that the user has the bigquery.objectRefs.read permission, and then reads the objects by using the connection's service account. If the user or the service account has insufficient permissions, then the function writes a "permission denied" error to the status column in the result.

The following example shows a query that uses delegated access. It requires the following:

  • The user has the bigquery.objectRefs.read permission on connection1.
  • The service account for connection1 has the storage.objects.get permission on the object.
  • The service account for connection2 has the Vertex AI User role.
SELECT AI.GENERATE(
  ("Describe this image:",
    OBJ.GET_ACCESS_RUL(OBJ.MAKE_REF("gs://cloud-samples-data/vision/demo-img.jpg", "us.connection1"), 'r')),
  connection_id => "us.connection2");

Best practices

Consider the following best practices when you decide whether to use direct or delegated access:

  • Use direct access for a small team operating in a single project for both data storage and analysis. The data administrator uses Identity and Access Management to grant users access to BigQuery data and Cloud Storage data. Users can create ObjectRef values on demand without an authorizer to analyze objects by using their own credentials.
  • Use delegated access for a large team operating across multiple projects, especially when data storage and analysis are decoupled. The data administrator can set up connections and create ObjectRef values for analysis ahead of time with a connection as their authorizer. This approach works with object tables or by using OBJ.MAKE_REF on a list of URIs. Then, the data administrator can share the table storing the ObjectRef values with analysts. The analysts don't need to access the original bucket to analyze the objects.

Errors

Functions that consume ObjectRef values report errors in two ways:

  • Query failure: the query might fail with an error message and no result.
  • Returned error values: the query succeeds, but the function might write errors as a part of the return value. For information about the format of the return value, see the reference page for the function you are using.

When a function returns an ObjectRef value, the details field of that value might contain an errors field. If it does, the value of that field is an array of errors. Each error has the following schema:

Name Type Mode Description Example
code INT64 REQUIRED Standard HTTP error code. 400
message STRING REQUIRED A descriptive, user-friendly error message. "Connection credential for myproject.us.nonexistent_connection cannot be used. Either the connection does not exist, or the user does not have sufficient permissions (bigquery.objectRefs.read)"
source STRING REQUIRED The name of the function that triggered the error. "OBJ.MAKE_REF"

These are two common types of errors:

  • Object error: the object URI or version provided doesn't exist.
  • Authorizer error: the connection doesn't exist or the user has no permission to use it for delegated access.

The following query shows how to select ObjectRef values that contain errors from an Objectref column:

SELECT ref
FROM mydataset.images
WHERE ref.details.errors IS NOT NULL;

What's next