Sync BigQuery and Iceberg data to AlloyDB

This page describes how to use Lakehouse Federation for AlloyDB for PostgreSQL to sync data to AlloyDB from BigQuery native tables, materialized views, BigQuery views, BigLake external tables (such as Apache Iceberg managed tables), and standard external tables. Iceberg is an open table format for managing and exchanging data.

This page assumes that you have an AlloyDB cluster and primary instance, and that you have a BigQuery dataset and tables. For more information, see Create datasets and Create and use tables.

Before you begin

  1. Ensure the bigquery_fdw.enabled flag is configured on the AlloyDB instance.
  2. Familiarize yourself with Supported BigQuery data types and column mappings.
  3. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  4. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  5. Verify that billing is enabled for your Google Cloud project.

  6. Enable the AlloyDB, Compute Engine, Resource Manager, and BigQuery APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

  7. Enable the Cloud APIs necessary to create and connect to AlloyDB for PostgreSQL.

    Enable the APIs

  8. In the Confirm project step, click Next to confirm the name of the project you are going to make changes to.

  9. In the Enable APIs step, click Enable to enable the following:

    • AlloyDB API
    • Compute Engine API
    • Cloud Resource Manager API
    • Service Networking API
    • BigQuery Storage API

    The Service Networking API is required if you plan to configure network connectivity to AlloyDB using a VPC network that resides in the same Google Cloud project as AlloyDB.

    The Compute Engine API and Cloud Resource Manager API are required if you plan to configure network connectivity to AlloyDB using a VPC network that resides in a different Google Cloud project.
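
    If you prefer the command line, a minimal gcloud sketch that enables the same APIs follows; the service names are assumed from the list above:

    # Enable the APIs listed in this step from the command line.
    gcloud services enable alloydb.googleapis.com \
        compute.googleapis.com \
        cloudresourcemanager.googleapis.com \
        servicenetworking.googleapis.com \
        bigquerystorage.googleapis.com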

Required roles

To grant the AlloyDB cluster service account read access to the BigQuery dataset, you need the following roles:

  • BigQuery Data Viewer (roles/bigquery.dataViewer), or any custom role with the bigquery.tables.get and bigquery.tables.getData permissions. When granted on a table or view, this role provides permissions to read data and metadata from the table or view.
  • BigQuery Read Session User (roles/bigquery.readSessionUser), or any custom role with the bigquery.readsessions.create and bigquery.readsessions.getData permissions. This role provides the ability to create and use read sessions.

Grant AlloyDB access to the BigQuery dataset

After the Lakehouse Federation feature is enabled on your AlloyDB cluster, grant the AlloyDB cluster service account access to the BigQuery dataset.

  1. Open the gcloud CLI. If you don't have the gcloud CLI installed, install and initialize the gcloud CLI, or use Cloud Shell.

  2. Run the gcloud beta alloydb clusters describe command:

    gcloud beta alloydb clusters describe CLUSTER --region=REGION

    Replace the following:

    • CLUSTER: the AlloyDB cluster ID.
    • REGION: the location of the AlloyDB cluster, for example, asia-east1, us-east1. See the full list of regions at Manage instance locations.

    The output contains a serviceAccountEmail field, which is the service account for this cluster. You can also find the service account on the Cluster details page.

  3. Grant the required roles to the service account; a gcloud sketch follows at the end of this step. For more information, see Control access to resources with IAM.

    If the cluster service account doesn't have the required permissions, the following errors appear when a query is executed against the BigQuery table:

    • The user does not have bigquery.readsessions.create permissions
    • Permission bigquery.tables.get denied on table
    • Permission bigquery.tables.getData denied on table
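
    For example, the following sketch looks up the cluster service account and grants both required roles at the project level. A project-level grant is broader than necessary; you can instead grant BigQuery Data Viewer on a specific dataset or table. Placeholder values are illustrative:

    # Capture the cluster's service account email.
    SA=$(gcloud beta alloydb clusters describe CLUSTER --region=REGION \
        --format="value(serviceAccountEmail)")

    # Grant read access to BigQuery data and to read sessions.
    gcloud projects add-iam-policy-binding BIGQUERY_PROJECT_ID \
        --member="serviceAccount:${SA}" \
        --role="roles/bigquery.dataViewer"
    gcloud projects add-iam-policy-binding BIGQUERY_PROJECT_ID \
        --member="serviceAccount:${SA}" \
        --role="roles/bigquery.readSessionUser"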

Configure the extension

  1. Create the extension.

    1. Connect to the AlloyDB instance using the psql client by following the instructions in Connect a psql client to an instance. Or, you can use AlloyDB Studio. For more information, see Manage your data using the Google Cloud console.
    2. Run the following command:

      CREATE EXTENSION bigquery_fdw;
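
      To confirm that the extension is installed, you can query the pg_extension catalog; this is an optional sanity check:

      -- Returns one row if bigquery_fdw is installed.
      SELECT extname, extversion FROM pg_extension WHERE extname = 'bigquery_fdw';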
      
  2. Create a foreign server to define the connection parameters for the remote BigQuery dataset.

    CREATE SERVER BIGQUERY_SERVER_NAME FOREIGN DATA WRAPPER bigquery_fdw;
    

    Replace the following:

    • BIGQUERY_SERVER_NAME: a unique identifier for the foreign server. Define it only once in a given database.
  3. Create the user mapping by running the CREATE USER MAPPING command, which specifies the credentials to use when you connect to the foreign server.

    CREATE USER MAPPING FOR USERNAME SERVER BIGQUERY_SERVER_NAME;
    

    Replace the following:

    • USERNAME: a database username or an IAM user that accesses the foreign table.
    • BIGQUERY_SERVER_NAME: unique identifier for the foreign server that you created.
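
    For example, with an illustrative server name bigquery_server and a mapping for the current database user, the two statements might look like the following sketch:

    -- The server name is illustrative; define it once per database.
    CREATE SERVER bigquery_server FOREIGN DATA WRAPPER bigquery_fdw;
    CREATE USER MAPPING FOR CURRENT_USER SERVER bigquery_server;
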
  4. Define foreign tables that correspond to the BigQuery tables that you want to access by using the CREATE FOREIGN TABLE command. This command defines the structure of a remote table. The foreign table can include all or a subset of the columns in the BigQuery source table.

    CREATE FOREIGN TABLE TABLENAME (
      COLUMNX_NAME DATA_TYPE,
      COLUMNX_NAME DATA_TYPE,
      ...
    ) SERVER BIGQUERY_SERVER_NAME OPTIONS (project 'BIGQUERY_PROJECT_ID', dataset 'BIGQUERY_DATASET_NAME', table 'BIGQUERY_TABLE_NAME' [, mode 'EXECUTION_MODE']);
    

    Replace the following:

    • TABLENAME: name of the foreign table in the local database.
    • COLUMNX_NAME: the AlloyDB column name; X indicates that the table can be created with multiple columns. Each name must exactly match the name and case of the corresponding column in the BigQuery source table. If the BigQuery column name contains uppercase letters (for example, employeeID), enclose the AlloyDB identifier in double quotes ("employeeID") to preserve the mixed or uppercase letters.
    • DATA_TYPE: the data type of the column.
    • BIGQUERY_SERVER_NAME: unique identifier for the foreign server that you created.
    • BIGQUERY_PROJECT_ID: ID of the project where the BigQuery dataset resides.
    • BIGQUERY_DATASET_NAME: name of the BigQuery dataset for the table.
    • BIGQUERY_TABLE_NAME: name of the BigQuery table.
    • EXECUTION_MODE: Optional. The mode option supports the following settings:

      • query: Use the BigQuery API for complex queries.
      • storage: Use the BigQuery Storage API for faster bulk reads.
      • auto: Automatically choose between the modes. This is the default.

      For more information, see BigQuery foreign data wrapper execution modes.

    After the foreign table is created, you can query this table the same way that you query any table in AlloyDB.
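
    For example, the following sketch defines a foreign table over a hypothetical BigQuery table orders in dataset sales of project my-project, using the illustrative server name from the earlier example, and then queries it like a local table. All names are illustrative:

    -- Column names and types must match the BigQuery source table.
    CREATE FOREIGN TABLE bq_orders (
      order_id bigint,
      customer_name text,
      order_date date,
      order_total numeric
    ) SERVER bigquery_server OPTIONS (project 'my-project', dataset 'sales', table 'orders', mode 'auto');

    -- Query the foreign table like any local table.
    SELECT customer_name, order_total
    FROM bq_orders
    WHERE order_date >= DATE '2024-01-01';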

Sync data

To sync BigQuery data or BigLake Iceberg data to AlloyDB, follow these steps:

  1. Identify an existing data source, or create a new native BigQuery table or Iceberg managed table.

  2. Use psql to create local_table by running the following command:

    CREATE TABLE local_table AS (SELECT * FROM foreign_table);
    

    This command creates a copy of the BigQuery table as a local, native AlloyDB table.
    Depending on your application workflow, you can configure the PostgreSQL pg_cron extension to refresh the AlloyDB table at periodic intervals.
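
    A quick sanity check, assuming the table names above, compares row counts between the local copy and the foreign table:

    -- The two counts should match right after the copy completes.
    SELECT (SELECT count(*) FROM local_table) AS local_rows,
           (SELECT count(*) FROM foreign_table) AS source_rows;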

Set up a schedule to sync data to AlloyDB

To set up a schedule to sync BigQuery data or BigLake Iceberg data to AlloyDB, follow these steps:

  1. Configure the bigquery_fdw extension.
  2. Enable the pg_cron extension on the AlloyDB instance. For more information, see Supported database extensions.
    1. Set the alloydb.enable_pg_cron flag to on. For more information, see alloydb.enable_pg_cron.
    2. Set the cron.database_name flag to the database name where you installed the bigquery_fdw extension, and where you want to execute the SQL queries for sync. For more information, see Supported database flags.
  3. To periodically refresh a local copy of the foreign table, run the following commands in the database where you installed the bigquery_fdw extension:

    CREATE EXTENSION pg_cron;
    SELECT cron.schedule('JOB_NAME', 'SCHEDULE', 'CREATE TABLE IF NOT EXISTS local_table_copy AS (SELECT * FROM foreign_table); DROP TABLE IF EXISTS local_table; ALTER TABLE local_table_copy RENAME TO local_table;');
    

    Replace the following:

    • JOB_NAME: the name of the job.
    • SCHEDULE: the schedule for the job, in standard cron syntax, for example, 0 * * * * to run at the start of every hour.

    For more information, see What is pg_cron?.
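
    For example, the following sketch schedules an hourly refresh under an illustrative job name, using dollar quoting so the command string doesn't need escaped single quotes:

    -- Refresh local_table from foreign_table at the start of every hour.
    SELECT cron.schedule(
      'refresh_local_table',
      '0 * * * *',
      $$
      CREATE TABLE IF NOT EXISTS local_table_copy AS (SELECT * FROM foreign_table);
      DROP TABLE IF EXISTS local_table;
      ALTER TABLE local_table_copy RENAME TO local_table;
      $$
    );

    To inspect or remove the job later, query the cron.job metadata table or call cron.unschedule:

    SELECT jobid, jobname, schedule FROM cron.job;
    SELECT cron.unschedule('refresh_local_table');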

What's next