Create a stream using the automated flow

This page describes how to create a stream using the automated setup flow.

The automated stream setup simplifies the process of moving data from Cloud SQL for PostgreSQL managed databases to BigQuery by reducing the number of steps that you need perform. The flow lets you create a stream directly from the overview page of your Cloud SQL for PostgreSQL instance. Datastream automates securing the VPC connection between the stream and the source database, creating database configurations and stream connection resources.

Before you begin

  1. Enable the Datastream, Network Connectivity, and Compute Engine APIs.
  2. Ensure that you have the required Identity and Access Management (IAM) permissions to create and manage Datastream resources. See the section that follows for more details.
  3. Create and configure a source Cloud SQL for PostgreSQL database for replication. For more information, see Configure a Cloud SQL for PostgreSQL database for CDC.
  4. We recommend that you enable logical replication for your instance prior to creating the stream. If you don't enable logical replication, Datastream enables it for you, which restarts your source instance.

    For information about enabling logical replication for a Cloud SQL for PostgreSQL instance, see Configure an Cloud SQL for PostgreSQL database for CDC.

  5. Make sure that your source database is configured to use private services access.

Required permissions

To use the automated stream creation flow, you need the following Identity and Access Management (IAM) roles or permissions:

  • serviceusage.services.enable, compute.networkAdmin for enabling the required APIs and performing network configuration tasks.
  • cloudsql.admin for instance configuration tasks.
  • datastream.admin for administration tasks that Datastream performs on your behalf.

Your database administrator user also needs GRANT permissions for the schema that you want to replicate. Connect to your source database and run the following:

  GRANT cloudsqlsuperuser TO "USER_NAME";
  ALTER ROLE "USER_NAME" CREATEROLE;
  GRANT SELECT on ALL TABLES IN SCHEMA "SCHEMA_NAME" to "USER_NAME" WITH GRANT OPTION;
  ALTER DEFAULT PRIVILEGES
  IN SCHEMA "SCHEMA_NAME"
  GRANT SELECT ON TABLES TO "USER_NAME" WITH GRANT OPTION;
  

Create and start the stream

To create and start a stream, perform the following steps:

Console

Get started

  1. Go to your source Cloud SQL for PostgreSQL database instance overview page.

    Go to Cloud SQL Instances

  2. In the Stream data into BigQuery section, click Create stream.

  3. On the Get started page, provide the name for your stream in the Stream name field. A unique identifier is auto-populated.

  4. In the Provide the instance user account details section, select the authentication method:

    • IAM database authentication: this option is available if your user has the Identity and Access Management (IAM) principal identity assigned. For more information, see IAM principals. If you select this option, you must manually grant the cloudsqlsuperuser role and the CREATEROLE permission to your user:

      1. In Google Cloud console, go to the Cloud SQL Instances page.

      Go to Cloud SQL Instances

      1. Select your Cloud SQL instance.
      2. In the navigation menu, click Cloud SQL Studio and sign in.
      3. In the Explorer pane, run the following query for your user:
    GRANT cloudsqlsuperuser TO "USER_NAME";
    ALTER ROLE "USER_NAME" CREATEROLE;
    • Built-in database authentication: provide the username and the password of a user with the cloudsqlsuperuser role. If you select this option, make sure that the user has GRANT permissions on the tables that they want to replicate.
  5. Review additional stream details, such as region, encryption and labels. Expand the Additional stream details section to apply changes, if required.

  6. Click Continue.

Configure source

  1. On the Configure stream source page, select the database from which you want to replicate data.
  2. The Objects to include list selects all available objects by default. The objects on the list are the objects for which you have the necessary permissions to stream. To change the selected objects, modify the selections in the list.
  3. Review advanced stream configurations, such as backfill mode and the maximum number of concurrent backfill connections. Expand the Advanced stream configurations section to apply changes, if required.
  4. Click Continue.

Configure destination

  1. On the Configure destination page, adjust the BigQuery destination settings, if required. For more information, see Configure information about the destination for the stream.

Create and start the stream

  1. Click Create and start later to create your stream and start it later in Datastream, or Start to create and start your stream right away.

  2. You're notified about the tasks that are automatically performed for you:

    • Datastream creates the required Virtual Private Cloud resources, such as internal IP address range, subnetwork and network attachment.
    • Datastream configures tables for CDC, sets up replication slots, sets up a publication for all tables in the database, and creates a dedicated Datastream user.
    • Datastream creates a private connectivity configuration and source and destination connection profiles.
  3. Confirm that you want to create or create and start your stream.

Monitor the stream

You can monitor basic stream details from your source instance overview page in the Google Cloud console. The page displays information such as stream status, stream name, destination BigQuery dataset, and destination project identifier.

From the overview page, you can also perform actions such as starting, stopping, or pausing the stream. For more granular monitoring information, click the stream name to navigate to Datastream.

Delete the stream

When you delete a stream that you created using the automated flow, some resources, such as the PostgreSQL replication slot are deleted automatically. However, there are resources that you need to manually delete:

  • The publication. The publication is created using the database administrator user, and can be deleted only by its owner, so by default, the user who created it.
  • The Datastream reader user.
  • The Datastream source and destination connection profiles
  • The private connectivity resources.
  • All the network resources created during the automated flow, such as the subnetwork and the network attachment.

What's next