Import from Cloud SQL to Spanner

This page describes how to import data from Cloud SQL for MySQL into Spanner.

The process uses Cloud Shell in the Google Cloud console to run commands that configure and start a Dataflow job, which imports a database from Cloud SQL into Spanner.

Process overview

The import process involves the following:

You complete a Google Cloud console workflow where you provide information about your source and target databases:
- Source database details: Cloud SQL instance name, database name, and your credentials.
- Spanner details: Spanner instance name and database name. The command creates the database if it doesn't already exist.
- Output storage: A Cloud Storage bucket name to store output files.
Spanner opens Cloud Shell and populates a command. The command performs the following actions:
- Migrates the schema: The command migrates the schema using the Spanner migration tool. This migration runs in Cloud Shell and connects to your Cloud SQL instance through its public IP address. Because Cloud Shell runs on its own network, it must reach Cloud SQL through the public IP address; however, you don't need to allowlist any subnets for that address.
- Starts a data migration: After the tool migrates the schema, the command starts a Dataflow job for data migration. The job reads from the source database directly through its private IP address and writes to Spanner. This job runs using the default Compute Engine service account. Finally, the command prints the Dataflow job URL.

Limitations

The following limitations apply:

This data import only supports a single Cloud SQL for MySQL instance.
Schema conversion is automated; you can't make adjustments to the schema during this import.
This data import is a one-time bulk load; it doesn't support continuous replication.

Before you begin

Before you import your database, complete the following prerequisites:

Ensure that your Cloud SQL instance has a public IP address and a private IP address enabled. For more information, see Configuring public IP connectivity and Configure private IP.
Create a user, with a password, on your Cloud SQL instance that can query the database.
Store the password in Secret Manager. You need the version ID of the secret version. For more information, see Create a secret.
Ensure you have a Cloud Storage bucket. Dataflow uses this bucket to store configuration files and outputs of the Dataflow jobs.
Ensure that Spanner and Cloud SQL are in the same Google Cloud project.
Enable the Dataflow, Cloud Storage, Spanner, Cloud SQL, and Secret Manager APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
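
The API-enablement prerequisite can also be handled from a terminal. The following is a minimal sketch, not the official procedure: it only builds a `gcloud services enable` command string and doesn't run it. The service names are the commonly used names for the five APIs listed above and should be confirmed in the API Library; `my-project` is a hypothetical project ID.

```python
import shlex

# Assumed service names for the Dataflow, Cloud Storage, Spanner,
# Cloud SQL (Admin), and Secret Manager APIs. Confirm them for your project.
REQUIRED_SERVICES = [
    "dataflow.googleapis.com",
    "storage.googleapis.com",
    "spanner.googleapis.com",
    "sqladmin.googleapis.com",
    "secretmanager.googleapis.com",
]


def build_enable_command(project_id: str) -> str:
    """Return a shell-safe gcloud command that enables every required service."""
    args = [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES,
        f"--project={project_id}",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID used for illustration.
    print(build_enable_command("my-project"))
```

Printing the command instead of executing it lets you review it before pasting it into Cloud Shell.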

Required roles

To ensure that the default Compute Engine service account has the necessary permissions to run the Dataflow job, ask your administrator to grant the following IAM roles to the default Compute Engine service account on your project:

Secret Manager Secret Accessor (roles/secretmanager.secretAccessor)
Cloud SQL Client (roles/cloudsql.client)
Cloud Spanner Database Admin (roles/spanner.databaseAdmin)
Storage Object Admin (roles/storage.objectAdmin)
Dataflow Worker (roles/dataflow.worker)
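
As an illustration of how an administrator might grant these roles, the following sketch generates one `gcloud projects add-iam-policy-binding` command per role. It assumes the default Compute Engine service account uses the usual `PROJECT_NUMBER-compute@developer.gserviceaccount.com` address; the project ID and project number shown are hypothetical placeholders, and the commands are only printed, not run.

```python
import shlex

# Roles listed above for the default Compute Engine service account.
SERVICE_ACCOUNT_ROLES = [
    "roles/secretmanager.secretAccessor",
    "roles/cloudsql.client",
    "roles/spanner.databaseAdmin",
    "roles/storage.objectAdmin",
    "roles/dataflow.worker",
]


def default_compute_service_account(project_number: str) -> str:
    """Build the default Compute Engine service account email (assumed format)."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def build_grant_commands(project_id: str, project_number: str) -> list[str]:
    """Return one add-iam-policy-binding command per required role."""
    member = f"serviceAccount:{default_compute_service_account(project_number)}"
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ])
        for role in SERVICE_ACCOUNT_ROLES
    ]


if __name__ == "__main__":
    # Hypothetical project ID and project number.
    for command in build_grant_commands("my-project", "123456789012"):
        print(command)
```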

To get the permissions that you need to configure the import, ask your administrator to grant you the following IAM roles on your project:

Cloud SQL Client (roles/cloudsql.client)
Cloud Spanner Database Admin (roles/spanner.databaseAdmin)
Secret Manager Secret Accessor (roles/secretmanager.secretAccessor)
Storage Admin (roles/storage.admin)
Dataflow Developer (roles/dataflow.developer)
Service Account User (roles/iam.serviceAccountUser)

These predefined roles contain the permissions required to configure the import. The exact permissions are listed in the following Required permissions section.

Required permissions

The following permissions are required to configure the import:

cloudsql.instances.connect
cloudsql.instances.get
cloudsql.instances.login
spanner.instances.list
spanner.instances.get
spanner.databases.create
spanner.databases.list
spanner.databases.get
spanner.databases.getDdl
spanner.databases.updateDdl
spanner.databases.read
spanner.databases.write
spanner.databases.select
secretmanager.versions.access
storage.objects.create
storage.objects.get
storage.buckets.get
dataflow.jobs.create
dataflow.jobs.get
dataflow.jobs.list
iam.serviceAccounts.actAs
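
If you manage access through custom roles rather than the predefined ones, it can help to compare the permissions you hold against this list. The sketch below is a plain set comparison under that assumption; how you obtain the granted set (for example, from an IAM policy export) is left open, and the helper name `missing_permissions` is hypothetical.

```python
# Permissions listed above as required to configure the import.
REQUIRED_PERMISSIONS = frozenset({
    "cloudsql.instances.connect",
    "cloudsql.instances.get",
    "cloudsql.instances.login",
    "spanner.instances.list",
    "spanner.instances.get",
    "spanner.databases.create",
    "spanner.databases.list",
    "spanner.databases.get",
    "spanner.databases.getDdl",
    "spanner.databases.updateDdl",
    "spanner.databases.read",
    "spanner.databases.write",
    "spanner.databases.select",
    "secretmanager.versions.access",
    "storage.objects.create",
    "storage.objects.get",
    "storage.buckets.get",
    "dataflow.jobs.create",
    "dataflow.jobs.get",
    "dataflow.jobs.list",
    "iam.serviceAccounts.actAs",
})


def missing_permissions(granted) -> list[str]:
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example: a principal that holds everything except two permissions.
    granted = REQUIRED_PERMISSIONS - {"dataflow.jobs.create", "iam.serviceAccounts.actAs"}
    print(missing_permissions(granted))
```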

Quota requirements

The quota requirements are as follows:

Spanner: You must have enough compute capacity to support the amount of data that you are importing. We recommend starting with a minimum of one Spanner node. You might need to add more compute capacity so that your job finishes in a reasonable amount of time. No additional compute capacity is required to import a database schema. For more information, see Autoscaling overview.
Dataflow: Import jobs are subject to the same CPU, disk usage, and IP address Compute Engine quotas as other Dataflow jobs.
Compute Engine: Before running your import job, you must set up initial quotas for Compute Engine, which Dataflow uses. These quotas represent the maximum number of resources that you allow Dataflow to use for your job. Recommended starting values are:
- CPUs: 200
- In-use IP addresses: 200
- Standard persistent disk: 50 TB
Generally, you don't have to make any other adjustments. Dataflow provides autoscaling so that you only pay for the actual resources used during the import. If your job can make use of more resources, the Dataflow UI displays a warning icon. The job can finish even if there is a warning icon.
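
To check a project against the recommended starting values above, a simple comparison is enough. The following sketch assumes you have already looked up your current quota limits; the dictionary keys are hypothetical labels chosen for this example, not Compute Engine quota metric names.

```python
# Recommended starting values from the list above.
# Keys are illustrative labels, not official quota metric names.
RECOMMENDED_QUOTAS = {
    "cpus": 200,
    "in_use_ip_addresses": 200,
    "standard_persistent_disk_tb": 50,
}


def quota_shortfalls(current: dict) -> dict:
    """Return how far each quota falls below its recommended starting value.

    Quotas that already meet the recommendation are omitted. A quota missing
    from `current` is treated as zero.
    """
    shortfalls = {}
    for name, recommended in RECOMMENDED_QUOTAS.items():
        have = current.get(name, 0)
        if have < recommended:
            shortfalls[name] = recommended - have
    return shortfalls


if __name__ == "__main__":
    # Hypothetical current limits for a project.
    current_limits = {
        "cpus": 72,
        "in_use_ip_addresses": 200,
        "standard_persistent_disk_tb": 20,
    }
    print(quota_shortfalls(current_limits))
```

An empty result means every quota already meets the recommended starting value.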