Configure a source Spanner database

This page describes how to configure change data capture (CDC) to stream data from a Spanner database to a supported destination, such as BigQuery or Cloud Storage.

Before you begin

If your Spanner instance is in a different Google Cloud project from the one where Datastream runs, grant the Datastream service agent the spanner.databaseReader IAM role. If you plan to use Data Boost, also grant it the spanner.databaseReaderWithDataBoost role.

If you'd rather use a fine-grained access control database role, see Create a Spanner connection profile and stream for the individual permissions that the role requires.
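For the cross-project case described above, the role grants can be sketched as IAM policy bindings. This is a minimal sketch, not an official procedure: the helper name datastream_reader_bindings is hypothetical, and the service agent address format used here is an assumption that you should confirm on the IAM page of the project where Datastream runs.

```python
from typing import Dict, List


def datastream_reader_bindings(project_number: str,
                               use_data_boost: bool = False) -> List[Dict]:
    """Build IAM policy bindings for the Datastream service agent.

    project_number is the number of the project where Datastream runs.
    """
    # Assumed address format for the Datastream service agent; verify it
    # in your own project before applying any binding.
    member = (
        "serviceAccount:"
        f"service-{project_number}@gcp-sa-datastream.iam.gserviceaccount.com"
    )
    roles = ["roles/spanner.databaseReader"]
    if use_data_boost:
        # Only needed when the stream is configured to use Data Boost.
        roles.append("roles/spanner.databaseReaderWithDataBoost")
    return [{"role": role, "members": [member]} for role in roles]
```

The returned list has the same shape as the bindings field of an IAM policy, so it can be merged into the policy of the project that hosts the Spanner instance with whichever tool you already use to manage IAM.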

Create a Spanner database

To start replicating change data from Spanner, you first need to create a Spanner instance and a Spanner database.

Create a change stream

Spanner uses change streams to track and stream data changes such as inserts, updates, and deletes. To configure your Spanner source for replication in Datastream, you need to create a Spanner change stream and set its value capture type to NEW_ROW.

For more information, see Change streams overview.
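As an illustration, the following sketch builds a GoogleSQL DDL statement that creates a change stream with the NEW_ROW value capture type. The helper name change_stream_ddl and the stream name in the usage note are hypothetical, and the sketch assumes a GoogleSQL-dialect database. It only produces the statement text; applying it to your database (for example, with the Google Cloud console or a Spanner client library) is left to you.

```python
import re
from typing import List, Optional

# Simple identifier check so that names can be interpolated into DDL safely.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def change_stream_ddl(stream_name: str,
                      tables: Optional[List[str]] = None) -> str:
    """Return DDL for a change stream that uses the NEW_ROW capture type.

    With no tables given, the stream watches the whole database (FOR ALL).
    """
    names = [stream_name] + list(tables or [])
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    target = ", ".join(tables) if tables else "ALL"
    return (
        f"CREATE CHANGE STREAM {stream_name} FOR {target} "
        "OPTIONS (value_capture_type = 'NEW_ROW')"
    )
```

For example, change_stream_ddl("OrdersStream") returns a statement that watches every table in the database, while passing tables=["Orders", "Customers"] limits the stream to those two tables.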

Create a Spanner connection profile and stream

When you create a new Spanner connection profile, you need to specify the Spanner database that you created. The database name needs to have the following format:

projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE_ID
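The format above can be assembled and checked with a small helper before you paste it into a connection profile. This is a sketch only: the function names and the sample values in the usage note are hypothetical, and the check covers the path structure, not whether the database exists.

```python
import re
from typing import Dict

# Matches projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE_ID.
_DATABASE_PATH = re.compile(
    r"^projects/([^/]+)/instances/([^/]+)/databases/([^/]+)$"
)


def database_path(project_id: str, instance: str, database_id: str) -> str:
    """Assemble the fully qualified database name."""
    for part in (project_id, instance, database_id):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"projects/{project_id}/instances/{instance}/databases/{database_id}"


def parse_database_path(path: str) -> Dict[str, str]:
    """Split a fully qualified database name into its three components."""
    match = _DATABASE_PATH.match(path)
    if match is None:
        raise ValueError(f"not a fully qualified database name: {path!r}")
    project_id, instance, database_id = match.groups()
    return {
        "project_id": project_id,
        "instance": instance,
        "database_id": database_id,
    }
```

For example, database_path("my-project", "my-instance", "my-db") produces projects/my-project/instances/my-instance/databases/my-db, and parse_database_path rejects a bare database ID such as my-db.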

When you create a stream, you can optionally supply:

  • The objects to be included and excluded.
  • The maximum number of concurrent reads for backfill or change stream queries.
  • Whether to have Datastream use Data Boost when querying Spanner.
  • A fine-grained access control database role for Datastream to use when querying Spanner. The database role needs at least the following permissions:

    • spanner.sessions.create
    • spanner.sessions.delete
    • spanner.sessions.get
    • spanner.databases.read
    • spanner.databases.select
    • spanner.databases.partitionQuery
    • spanner.databases.partitionRead
    • spanner.databases.beginReadOnlyTransaction
    • spanner.databases.getDdl
    • spanner.databases.useDataBoost (if you choose to use Spanner Data Boost)
    • spanner.databases.useRoleBasedAccess
  • The Spanner remote procedure call (RPC) priority for Datastream to use.
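The permission list above lends itself to a quick pre-flight check. The following sketch compares a set of permissions that you supply against that list and reports what is missing. The helper name missing_permissions is hypothetical, and how you obtain the granted permissions for a database role is outside the scope of this sketch.

```python
from typing import Iterable, List

# Permissions listed on this page as the minimum for the database role.
REQUIRED_PERMISSIONS = frozenset({
    "spanner.sessions.create",
    "spanner.sessions.delete",
    "spanner.sessions.get",
    "spanner.databases.read",
    "spanner.databases.select",
    "spanner.databases.partitionQuery",
    "spanner.databases.partitionRead",
    "spanner.databases.beginReadOnlyTransaction",
    "spanner.databases.getDdl",
    "spanner.databases.useRoleBasedAccess",
})

# Required only when the stream is configured to use Spanner Data Boost.
DATA_BOOST_PERMISSION = "spanner.databases.useDataBoost"


def missing_permissions(granted: Iterable[str],
                        use_data_boost: bool = False) -> List[str]:
    """Return the required permissions absent from granted, sorted by name."""
    required = set(REQUIRED_PERMISSIONS)
    if use_data_boost:
        required.add(DATA_BOOST_PERMISSION)
    return sorted(required - set(granted))
```

An empty result means the supplied permissions cover the minimum set. Enabling use_data_boost adds spanner.databases.useDataBoost to the check.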

What's next