This page describes how to configure change data capture (CDC) to stream data from a Spanner database to a supported destination, such as BigQuery or Cloud Storage.
Before you begin
If your Spanner instance is in a different Google Cloud project from
the one where Datastream runs, grant the Datastream service agent
the spanner.databaseReader IAM role. If you plan to use Data Boost,
also grant it the spanner.databaseReaderWithDataBoost role.
If you'd rather use a fine-grained access control database role, see Create a Spanner connection profile and stream for the individual permissions that the role requires.
Create a Spanner database
To start replicating change data from Spanner, you first need to create a Spanner instance and a Spanner database.
Create a change stream
Spanner uses change streams to track and stream data changes such as
inserts, updates, and deletes. To configure your Spanner source for
replication in Datastream, you need to create and configure a
Spanner change stream. You need to specify the NEW_ROW value capture
type for your change stream.
For more information, see Change streams overview.
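As a minimal sketch, the DDL for such a change stream can be assembled as shown below. It assumes a GoogleSQL-dialect database; the helper name change_stream_ddl and the stream name datastream_changes are hypothetical, chosen only for illustration. The resulting statement still has to be applied to your database as a schema update.

```python
import re
from typing import Optional, Sequence

# Conservative identifier check: a letter or underscore, followed by
# letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def change_stream_ddl(stream_name: str, tables: Optional[Sequence[str]] = None) -> str:
    """Build a CREATE CHANGE STREAM statement that uses the NEW_ROW capture type.

    With no tables given, the stream watches the whole database (FOR ALL).
    """
    for name in [stream_name, *(tables or [])]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a valid identifier: {name!r}")
    watched = ", ".join(tables) if tables else "ALL"
    return (
        f"CREATE CHANGE STREAM {stream_name} FOR {watched} "
        "OPTIONS (value_capture_type = 'NEW_ROW')"
    )


# Hypothetical stream name, for illustration only.
print(change_stream_ddl("datastream_changes"))
```

Passing a list of table names narrows the stream to those tables instead of the whole database.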
Create a Spanner connection profile and stream
When you create a new Spanner connection profile, you need to specify the Spanner database that you created. The database name needs to have the following format:
projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE_ID
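The format above can be built and checked with a small helper before you paste the value into a connection profile. This is an illustrative sketch: the function names and the sample IDs are hypothetical, and the check only verifies the overall shape of the path, not whether the resources exist.

```python
import re
from typing import Tuple

# Matches projects/PROJECT_ID/instances/INSTANCE/databases/DATABASE_ID.
_DATABASE_PATH = re.compile(
    r"projects/(?P<project>[^/]+)/instances/(?P<instance>[^/]+)"
    r"/databases/(?P<database>[^/]+)"
)


def database_path(project_id: str, instance: str, database_id: str) -> str:
    """Join the three IDs into the fully qualified database name."""
    for part in (project_id, instance, database_id):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"projects/{project_id}/instances/{instance}/databases/{database_id}"


def parse_database_path(path: str) -> Tuple[str, str, str]:
    """Split a fully qualified database name into (project, instance, database)."""
    match = _DATABASE_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"not a fully qualified database name: {path!r}")
    return match.group("project"), match.group("instance"), match.group("database")


# Hypothetical IDs, for illustration only.
print(database_path("my-project", "my-instance", "my-database"))
```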
When you create a stream, you can optionally supply:
- The objects to be included and excluded.
- The maximum number of concurrent reads for backfill or change stream queries.
- Whether to have Datastream use Data Boost when querying Spanner.
- A fine-grained access control database role for Datastream to use when querying Spanner. The database role needs to have the following permissions at minimum:
  - spanner.sessions.create
  - spanner.sessions.delete
  - spanner.sessions.get
  - spanner.databases.read
  - spanner.databases.select
  - spanner.databases.partitionQuery
  - spanner.databases.partitionRead
  - spanner.databases.beginReadOnlyTransaction
  - spanner.databases.getDdl
  - spanner.databases.useDataBoost (if you choose to use Spanner Data Boost)
  - spanner.databases.useRoleBasedAccess
- The Spanner remote procedure call (RPC) priority for Datastream to use.
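To check a database role against the minimum permissions listed above, a simple set comparison is enough. The sketch below is illustrative only: the permission names come from the list on this page, while the function names and the idea of supplying the granted permissions as a plain collection of strings are assumptions. It doesn't query Spanner or IAM.

```python
from typing import FrozenSet, Iterable, List

# Minimum permissions from the list above, without the Data Boost permission.
BASE_PERMISSIONS: FrozenSet[str] = frozenset({
    "spanner.sessions.create",
    "spanner.sessions.delete",
    "spanner.sessions.get",
    "spanner.databases.read",
    "spanner.databases.select",
    "spanner.databases.partitionQuery",
    "spanner.databases.partitionRead",
    "spanner.databases.beginReadOnlyTransaction",
    "spanner.databases.getDdl",
    "spanner.databases.useRoleBasedAccess",
})

# Needed only if the stream uses Spanner Data Boost.
DATA_BOOST_PERMISSION = "spanner.databases.useDataBoost"


def required_permissions(use_data_boost: bool = False) -> FrozenSet[str]:
    """Return the minimum permission set for the chosen stream settings."""
    if use_data_boost:
        return BASE_PERMISSIONS | {DATA_BOOST_PERMISSION}
    return BASE_PERMISSIONS


def missing_permissions(granted: Iterable[str], use_data_boost: bool = False) -> List[str]:
    """List the required permissions that are absent from `granted`, sorted."""
    return sorted(required_permissions(use_data_boost) - set(granted))


# Hypothetical role that has only the base permissions.
print(missing_permissions(BASE_PERMISSIONS, use_data_boost=True))
```

An empty result means the role covers the minimum set for the settings you passed in.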
What's next
- Learn more about Spanner as a source.