Specify ObjectRef columns in table schemas

This document describes how to define a BigQuery standard table schema with columns that can store ObjectRef values.

ObjectRef values provide metadata and connection information for objects in Cloud Storage. Use ObjectRef values when you need to integrate unstructured data into a standard table. For example, in a products table, you could store product images in the same row with the rest of the product information by adding a column containing ObjectRef values. You can store ObjectRef values in STRUCT columns that use the ObjectRef format, which is STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>.

For more information about working with multimodal data, see Analyze multimodal data. For a tutorial that shows how to work with ObjectRef data, see Analyze multimodal data with SQL.

Prerequisites

To populate and update ObjectRef values in a standard table, the table must have a STRING column that contains URI information for the related Cloud Storage objects.

You must have a Cloud Storage bucket that contains the same objects that are identified in the URI data of the target standard table.

Maintaining ObjectRef values

Any object tables that you create have a ref column that contains an ObjectRef value for the given object. If you have an existing object table, then you can join it with your standard table on the object URI column to populate and update ObjectRef values. This is more efficient because it avoids re-fetching metadata from Cloud Storage to create a new ObjectRef value.

Similarly, if you already have a Storage Insights dataset for object metadata, then you can use the ref.uri or selfLink column to join the standard table with the Storage Insights dataset to populate and update ObjectRef values. Any ObjectRef values created in Storage Insights datasets don't have an authorizer. To query these objects, you must either have direct access to the object or add an authorizer to the ObjectRef to use delegated access.

If you don't have an existing object table or Storage Insights dataset, you can use the OBJ.MAKE_REF function to populate and update ObjectRef values by fetching object metadata directly from Cloud Storage. This approach might be less scalable, because it requires the retrieval of object metadata from Cloud Storage.

Create an ObjectRef column

To create and populate an ObjectRef column in a standard table, select one of the following options:

SQL functions

Create and populate an ObjectRef column based on output from the OBJ.MAKE_REF function:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE OR REPLACE TABLE PROJECT_ID.DATASET_ID.TABLE_NAME
    AS
    SELECT TABLE_NAME.*,
      OBJ.MAKE_REF(uri, 'CONNECTION_ID') AS objectrefcolumn
    FROM DATASET_ID.TABLE_NAME;

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • CONNECTION_ID: A STRING value that contains a Cloud resource connection that the service can use to access the objects in Cloud Storage, in the format location.connection_id. For example, us-west1.myconnection. You can get the connection ID by viewing the connection details in the Google Cloud console and copying the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.

      You must grant the Storage Object User (roles/storage.objectUser) role to the connection's service account on any Cloud Storage bucket where you are using it to access objects.

      The connection must be in the same project and region as the query where you are calling the function.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Object table

Create and populate an ObjectRef column based on data from an object table ref column:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE OR REPLACE TABLE PROJECT_ID.DATASET_ID.TABLE_NAME
    AS
    SELECT TABLE_NAME.*, OBJECT_TABLE.ref AS objectrefcolumn
    FROM DATASET_ID.TABLE_NAME
    INNER JOIN DATASET_ID.OBJECT_TABLE
    ON OBJECT_TABLE.uri = TABLE_NAME.uri;

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • OBJECT_TABLE: the name of the object table that contains the object data that you want to integrate into the standard table.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Create an ARRAY<ObjectRef> column

You can create an ARRAY<STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>> column to contain arrays of ObjectRef values. For example, you could chunk a video into separate images, and then store these images as an array of ObjectRef values.

You can use the ARRAY_AGG function to aggregate arrays of ObjectRef values, including using the ORDER BY clause preserve object order if necessary. You can use the UNNEST operator to parse an array of ObjectRef values into individual ObjectRef values, including using the WITH OFFSET clause to preserve object order if necessary. You can use object metadata, like the URI path and object filename, to map ObjectRef values that represent object chunks to an ObjectRef value that represents the original object.

To see an example of how to work with arrays of ObjectRef values, see the Process ordered multimodal data using ARRAY<ObjectRef> values section of the Analyze multimodal data with SQL tutorial.

Update an ObjectRef column

To update an ObjectRef column in a standard table, select one of the following options:

Object table

Update an ObjectRef column by using data from an object table ref column:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    UPDATE PROJECT_ID.DATASET_ID.TABLE_NAME
    SET objectrefcolumn = (SELECT ref FROM DATASET_ID.OBJECT_TABLE WHERE OBJECT_TABLE.uri = TABLE_NAME.uri)
    WHERE uri != "";

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • OBJECT_TABLE: the name of the object table that contains the same object data as the standard table ObjectRef column.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

SQL functions

Update an ObjectRef column by using output from the OBJ.FETCH_METADATA and OBJ.MAKE_REF functions:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    UPDATE PROJECT_ID.DATASET_ID.TABLE_NAME
    SET objectrefcolumn = (SELECT OBJ.MAKE_REF(uri, 'CONNECTION_ID'))
    WHERE uri != "";

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • CONNECTION_ID: A STRING value that contains a Cloud resource connection that the service can use to access the objects in Cloud Storage, in the format location.connection_id. For example, us-west1.myconnection. You can get the connection ID by viewing the connection details in the Google Cloud console and copying the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.

      You must grant the Storage Object User (roles/storage.objectUser) role to the connection's service account on any Cloud Storage bucket where you are using it to access objects.

      The connection must be in the same project and region as the query where you are calling the function.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

What's next