Access BigQuery from AlloyDB

This page describes how to use Lakehouse Federation to access data that is stored in or accessible through BigQuery from an AlloyDB for PostgreSQL interface.

The foreign data wrapper supports a wide range of BigQuery resources, which lets you query BigQuery tables, views, and external tables.

By using this integration, you can treat BigQuery datasets as local tables within your PostgreSQL environment to perform cross-engine analytics. For more information, see Lakehouse Federation in AlloyDB overview.
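
For example, after you configure a foreign table as described later on this page, you can join it with local tables in a single query. In the following sketch, bq_orders is a hypothetical foreign table that maps a BigQuery table, and customers is a local AlloyDB table:

    SELECT c.customer_name, SUM(o.order_total) AS total_spend
    FROM customers c
    JOIN bq_orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_name;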

This page assumes that you have an AlloyDB cluster and primary instance, and that you have a BigQuery dataset and tables. For more information, see Create datasets and Create and use tables.

Before you begin

  1. Ensure that the bigquery_fdw.enabled flag is configured on the AlloyDB for PostgreSQL instance. For an example of setting this flag with the gcloud CLI, see the sketch after this list.
  2. Familiarize yourself with Supported BigQuery data types and column mappings.
  3. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  4. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

    Go to project selector

  5. Verify that billing is enabled for your Google Cloud project.

  6. Enable the AlloyDB, Compute Engine, Resource Manager, and BigQuery APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

  7. Enable the Cloud APIs necessary to create and connect to AlloyDB for PostgreSQL.

    Enable the APIs

  8. In the Confirm project step, click Next to confirm the name of the project you are going to make changes to.

  9. In the Enable APIs step, click Enable to enable the following:

    • AlloyDB API
    • Compute Engine API
    • Cloud Resource Manager API
    • Service Networking API
    • BigQuery Storage API
    • BigQuery API

    The Service Networking API is required if you plan to configure network connectivity to AlloyDB using a VPC network that resides in the same Google Cloud project as AlloyDB.

    The Compute Engine API and Cloud Resource Manager API are required if you plan to configure network connectivity to AlloyDB using a VPC network that resides in a different Google Cloud project.
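
To set the bigquery_fdw.enabled flag from step 1, you can update the instance's database flags with the gcloud CLI. The following is a minimal sketch with hypothetical names (my-instance, my-cluster, us-central1), and it assumes the flag takes an on value; note that --database-flags sets the complete list of flags on the instance, so include any existing flags that you want to keep:

    gcloud alloydb instances update my-instance \
      --cluster=my-cluster \
      --region=us-central1 \
      --database-flags=bigquery_fdw.enabled=on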

Required roles

To grant the AlloyDB cluster service account read access to the BigQuery dataset, grant it the following roles. For more information, see Grant AlloyDB access to the BigQuery dataset.

  • BigQuery Data Viewer (roles/bigquery.dataViewer), or any custom role with the bigquery.tables.get and bigquery.tables.getData permissions. When granted on a table or view, this role provides permissions to read data and metadata from that table or view.
  • BigQuery Read Session User (roles/bigquery.readSessionUser), or any custom role with the bigquery.readsessions.create and bigquery.readsessions.getData permissions. Provides the ability to create and use read sessions.
  • BigQuery Job User (roles/bigquery.jobUser), or any custom role with the bigquery.jobs.create permission. Provides permissions to run jobs, including queries, within the project using the BigQuery API. This role can only be granted on Resource Manager resources (projects, folders, and organizations).
  • Storage Object Viewer (roles/storage.objectViewer), or any custom role with the storage.objects.get permission. Provides permissions to access BigQuery external tables. Must be granted at the project or bucket level.

Grant AlloyDB access to the BigQuery dataset

After the Lakehouse Federation feature is enabled on your AlloyDB cluster, you must grant the AlloyDB cluster service account access to the BigQuery dataset.

The Google Cloud console automatically grants the required permissions to the cluster service account when you connect a BigQuery table using AlloyDB Studio.

To grant access using the gcloud CLI, follow these steps:

gcloud

  1. Open the gcloud CLI. If you don't have the gcloud CLI installed, install and initialize the gcloud CLI, or use Cloud Shell.

  2. Run the gcloud beta alloydb clusters describe command:

    gcloud beta alloydb clusters describe CLUSTER --region=REGION

    Replace the following:

    • CLUSTER: the AlloyDB cluster ID.
    • REGION: the location of the AlloyDB cluster, for example, asia-east1 or us-east1. For the full list of regions, see Manage instance locations.

    The output contains a serviceAccountEmail field, which is the service account for this cluster. You can also find the service account on the Cluster overview page.

  3. Grant the required permissions. For more information, see Control access to resources with IAM. For a combined example of steps 2 and 3, see the sketch after this list.

    If the cluster service account doesn't have the required permissions, the following errors appear when a query is executed against the BigQuery table:

    • The user does not have bigquery.readsessions.create permissions
    • Permission bigquery.tables.get denied on table
    • Permission bigquery.tables.getData denied on table
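
For example, the following sketch combines steps 2 and 3: it extracts the serviceAccountEmail field with a --format filter, then grants the BigQuery Job User role at the project level. The names my-cluster, us-central1, and my-project are placeholders; roles that you want scoped to a single dataset or table (such as BigQuery Data Viewer) can be granted in the Google Cloud console instead:

    SA_EMAIL=$(gcloud beta alloydb clusters describe my-cluster \
        --region=us-central1 \
        --format="value(serviceAccountEmail)")

    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:${SA_EMAIL}" \
        --role="roles/bigquery.jobUser"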

Configure the extension

To configure the extension, follow these steps:

Console

  1. Go to the Clusters page.

    Go to Clusters

  2. Click the ID of the cluster you want to use.

  3. In the navigation menu, click AlloyDB Studio.

  4. Log in to your database.

  5. In the Explorer pane, expand the relevant schema.

  6. Click the Actions menu next to BigQuery Tables and click Connect BigQuery table.

  7. In the Connect BigQuery table pane, choose a source project, source dataset, and a table.

  8. The Review and select columns table displays the columns from the selected table. Select the columns you want to map.

  9. In the Table name field, enter a name for the foreign table.

  10. Optional: Click View SQL Command to view the generated command.

  11. Click Connect table. A dialog shows the progress. After the process completes, you can query the table like you query any table in AlloyDB.

psql

  1. Create the extension.

    1. Connect to the AlloyDB instance using the psql client by following the instructions in Connect a psql client to an instance. Or, you can use AlloyDB Studio. For more information, see Manage your data using the Google Cloud console.
    2. Run the following command:

      CREATE EXTENSION bigquery_fdw;
      
  2. Create a foreign server to define the connection parameters for the remote BigQuery dataset.

    CREATE SERVER BIGQUERY_SERVER_NAME FOREIGN DATA WRAPPER bigquery_fdw;
    

    Replace the following:

    • BIGQUERY_SERVER_NAME: a unique identifier for the foreign server. Define this once in a given database.
  3. Create the user mapping by running the CREATE USER MAPPING command, which maps the local PostgreSQL user that you want to connect to the foreign server.

    CREATE USER MAPPING FOR USERNAME SERVER BIGQUERY_SERVER_NAME;
    

    Replace the following:

    • USERNAME: a database username or an IAM user that accesses the foreign table.
    • BIGQUERY_SERVER_NAME: unique identifier for the foreign server that you created.
  4. Define foreign tables that correspond to the tables that you want to access in BigQuery using the CREATE FOREIGN TABLE command. This command lets you define the structure of a remote table. The foreign table can either have all or a subset of the columns in the source table in BigQuery.

    CREATE FOREIGN TABLE TABLENAME (
      COLUMN1_NAME DATA_TYPE,
      COLUMN2_NAME DATA_TYPE,
      ...
    ) SERVER BIGQUERY_SERVER_NAME OPTIONS (
      project 'BIGQUERY_PROJECT_ID',
      dataset 'BIGQUERY_DATASET_NAME',
      table 'BIGQUERY_TABLE_NAME'
      [, mode 'EXECUTION_MODE']
    );
    

    Replace the following:

    • TABLENAME: the name of the foreign table in the local AlloyDB database.
    • COLUMNX_NAME: the AlloyDB column name. X indicates that the table can be created with multiple columns. The column name must exactly match the name, including the casing, of the corresponding column in the BigQuery source table. If the BigQuery column name contains uppercase letters (for example, employeeID), enclose the AlloyDB identifier in double quotes (for example, "employeeID") to preserve the mixed or uppercase letters.
    • DATA_TYPE: the data type of the column.
    • BIGQUERY_SERVER_NAME: unique identifier for the foreign server that you created.
    • BIGQUERY_PROJECT_ID: ID of the project where the BigQuery dataset resides.
    • BIGQUERY_DATASET_NAME: name of the BigQuery dataset for the table.
    • BIGQUERY_TABLE_NAME: name of the BigQuery table.
    • EXECUTION_MODE: Optional. The mode option can be query to use the BigQuery API for complex queries, storage to use the BigQuery Storage API for faster bulk reads, or auto to automatically choose between the modes; auto is the default. For more information, see BigQuery foreign data wrapper execution modes.

    After the foreign table is created, you can query this table the same way that you query any table in AlloyDB. For a filled-in example of steps 2 through 4, see the sketch that follows.
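
As a concrete sketch of steps 2 through 4, the following uses hypothetical names: a server bq_server, a local user analyst, and a BigQuery table orders in dataset sales of project my-project, with columns assumed to map to the bigint and numeric PostgreSQL types:

    CREATE SERVER bq_server FOREIGN DATA WRAPPER bigquery_fdw;

    CREATE USER MAPPING FOR analyst SERVER bq_server;

    CREATE FOREIGN TABLE bq_orders (
      order_id bigint,
      customer_id bigint,
      order_total numeric
    ) SERVER bq_server OPTIONS (
      project 'my-project',
      dataset 'sales',
      table 'orders'
    );

    -- The foreign table now behaves like a local table in queries.
    SELECT count(*) FROM bq_orders;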

BigQuery foreign data wrapper execution modes

The execution mode determines how AlloyDB for PostgreSQL interacts with BigQuery to retrieve data. The BigQuery foreign data wrapper supports two execution modes, query and storage, plus an auto mode that chooses between them. Choosing the right mode matters because each mode has different performance characteristics and pricing. For more information, see BigQuery pricing.

Query mode

This mode uses the BigQuery API to retrieve data from BigQuery. It uses BigQuery's compute engine to run complex queries by pushing down filters and aggregations. That is, WHERE clauses, GROUP BY clauses, and aggregations are executed in BigQuery before the data is sent back to PostgreSQL. This mode also supports querying BigQuery views and external tables.

Because this API returns structured, paginated row responses that suit small result sets, reading large datasets has throughput limitations and higher latency than the streaming alternative, the BigQuery Storage API.
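
For instance, assuming a hypothetical foreign table bq_orders created with mode 'query', a filtered aggregation like the following is pushed down so that BigQuery computes the result and only the aggregated rows are returned to AlloyDB:

    SELECT customer_id, SUM(order_total)
    FROM bq_orders
    WHERE order_total > 100
    GROUP BY customer_id;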

Storage mode

This mode uses the BigQuery Storage API to retrieve data from BigQuery. It enables high-throughput reads by sending structured data over the wire in a binary serialization format. This is the preferred mode for scanning large tables in BigQuery.

However, this mode has some limitations. Not all complex SQL operations can be pushed down to the BigQuery Storage API. For example, aggregations cannot be pushed down to BigQuery and must be executed in AlloyDB. This mode also doesn't support querying BigQuery views and external tables.
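
To opt a table into this mode, set the mode option when you create the foreign table, as in this sketch (which reuses the hypothetical bq_server and names from the earlier example):

    CREATE FOREIGN TABLE bq_orders_bulk (
      order_id bigint,
      order_total numeric
    ) SERVER bq_server OPTIONS (
      project 'my-project',
      dataset 'sales',
      table 'orders',
      mode 'storage'
    );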

Auto mode

If you don't set the mode option in the CREATE FOREIGN TABLE command, it defaults to auto. In auto mode, AlloyDB selects the underlying API to balance performance and maximize the SQL operations that are pushed down to BigQuery.

What's next