This launch checklist provides a list of considerations to review before launching a production application on Spanner. It isn't intended to be exhaustive; rather, it highlights key considerations that minimize risk, optimize performance, and align your deployment with business and operational goals, offering a systematic approach to a reliable Spanner launch.
This checklist is broken down into the following sections:
- Design, development, testing, and optimization
- Migration (optional)
- Deployment
- Query optimizer and statistics management
- Disaster recovery
- Security
- Logging and monitoring
- Client library
- Support
- Cost management
Design, development, testing, and optimization
Optimizing schema design, transactions, and queries is essential to take advantage of Spanner's distributed architecture for high performance and scalability. Rigorous production-scale and end-to-end testing ensures the system can handle real-world workloads, peak loads, and concurrent operations, while minimizing the risk of bottlenecks or failures in production.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Design the schema with scalability and Spanner's
        distributed architecture in mind. Follow best practices such as
        selecting appropriate primary keys and indexes to avoid hotspots and
        consider optimizations like table interleaving for related data. Review
        Schema design best practices
        to ensure the schema supports both high performance and scalability
        under expected workloads.
       | 
| ❑   | 
        Optimize transactions and queries for minimal locking and maximum
        performance. Use Spanner's transaction modes, such as
        locking read-write, strong read-only, and partitioned DML
        statements, to balance consistency, throughput, and latency. Minimize
        locking scopes by using
        read-only transactions
        for queries, batching
        for maximum DML throughput or
        partitioned DML statements for
        large-scale updates and deletes. When migrating from systems with
        different isolation levels (for example, PostgreSQL or MySQL),
        choose transaction modes deliberately to avoid performance
        bottlenecks. For more information, see Transactions.
       | 
| ❑   | 
        Conduct rigorous at-scale load testing to validate schema design,
        transaction behavior, and query performance. Simulate peak and
        high-concurrency scenarios that mimic real-world application loads,
        including diverse transaction shapes and query patterns. Evaluate
        latency and throughput under these conditions to confirm that the
        database design and instance topology meet performance requirements.
        Use load testing iteratively during development to optimize and
        refine implementation.
       | 
| ❑   | 
        Extend load testing to encompass all interacting services, not just
        isolated applications. Simulate comprehensive user journeys
        alongside parallel processes, such as batch loads or administration
        tasks that access the database. Run tests on the production
        Spanner instance configuration, ensuring load test
        drivers and services are geographically aligned with the intended
        production deployment topology. This holistic approach identifies
        potential conflicts in advance and ensures smooth database performance
        during real-world operations.
       | 
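The schema guidance in this section (hotspot-free primary keys, interleaving related data, secondary indexes) can be made concrete with a minimal sketch. The DDL below uses real Spanner GoogleSQL syntax, but the table, column, and index names are invented for illustration; in practice you would submit these statements in a single schema update (for example, through the client library or `gcloud spanner databases ddl update`).

```python
# Hypothetical DDL batch illustrating two schema best practices:
# a non-sequential (UUID) primary key to avoid hotspots, and
# interleaving child rows with their parent row.

SINGERS_DDL = """
CREATE TABLE Singers (
  SingerId  STRING(36) NOT NULL,  -- UUIDv4, not a monotonically increasing ID
  FirstName STRING(1024),
  LastName  STRING(1024),
) PRIMARY KEY (SingerId)
"""

# Albums rows are stored physically with their parent Singers row, so
# parent-child reads avoid cross-split operations.
ALBUMS_DDL = """
CREATE TABLE Albums (
  SingerId   STRING(36) NOT NULL,
  AlbumId    STRING(36) NOT NULL,
  AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
  INTERLEAVE IN PARENT Singers ON DELETE CASCADE
"""

# A secondary index to serve lookups by last name without a table scan.
INDEX_DDL = "CREATE INDEX SingersByLastName ON Singers(LastName)"

def ddl_statements():
    """Return the DDL statements to submit as one schema update batch."""
    return [SINGERS_DDL, ALBUMS_DDL, INDEX_DDL]
```

Submitting the statements as one batch lets Spanner apply the schema change atomically rather than index-by-index.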
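The transaction-mode guidance above can be sketched using the Spanner Python client's documented methods (`run_in_transaction`, `snapshot`, `execute_partitioned_dml`). The `database` argument is assumed to be a `google.cloud.spanner` Database object, and the `Orders` table and SQL are invented; this illustrates matching each workload to a mode, not a drop-in implementation.

```python
def close_order(database, order_id):
    """Locking read-write transaction: reserve it for work that mutates data."""
    def txn_fn(transaction):
        # Use query parameters instead of string formatting in real code.
        return transaction.execute_update(
            "UPDATE Orders SET Status = 'CLOSED' "
            "WHERE OrderId = '{}'".format(order_id))
    return database.run_in_transaction(txn_fn)

def count_open_orders(database):
    """Strong read-only snapshot: takes no locks and never blocks writers."""
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT COUNT(*) FROM Orders WHERE Status = 'OPEN'")
        return list(rows)[0][0]

def purge_stale_orders(database):
    """Partitioned DML: large-scale deletes without one huge transaction."""
    return database.execute_partitioned_dml(
        "DELETE FROM Orders WHERE CreatedAt < "
        "TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)")
```

The split matters because read-only snapshots never conflict with writes, and partitioned DML shards the work into many small transactions instead of one lock-heavy mutation.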
Migration (optional)
Database migration is a comprehensive process that requires a deep dive into the specifics of each individual migration journey. Consider the following in your migration strategy:
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Develop a detailed standard operating procedure (SOP) for the
        migration cutover. This includes steps for application rollout,
        database switchover, and automation to minimize manual intervention.
        Identify and communicate potential downtime windows to stakeholders well
        in advance. Implement robust monitoring and alerting mechanisms to track
        the migration process in real-time and detect any anomalies promptly.
        Ensure the switchover process includes validation checks to confirm data
        integrity and application functionality post-migration.
       | 
| ❑   | 
        Prepare a detailed fallback plan to revert to the source system in
        the case of critical issues during the migration. Test the fallback
        procedures in a staging environment to ensure that they are reliable,
        and can be executed with minimal downtime. Clearly define conditions
        that would trigger a fallback and ensure the team is trained to execute
        this plan swiftly and efficiently.
       | 
Deployment
Proper deployment planning ensures that Spanner configurations meet workload requirements for availability, latency, and scalability, while accounting for geographic and operational considerations. Aligning sizing, resource management, failover scenarios, and automation minimizes risks, ensures optimal performance, and prevents resource constraints or outages during critical operations.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Ensure your Spanner instance configuration
        (whether regional, dual-region, or multi-regional) aligns with your
        application's workload availability and latency requirements, while also
        taking geographic considerations into account. Calculate the target
        compute capacity based on expected storage sizes, traffic patterns, and
        recommended utilization limits,
        ensuring sufficient capacity for zonal or regional outages. Plan for
        traffic peaks by enabling autoscaling.
        You can set an upper limit for compute capacity to establish cost
        safeguards. For more information, see
        Compute capacity, nodes, and processing units.
       | 
| ❑   | 
        If you're using a dual-region or multi-region instance configuration,
        choose a leader region that minimizes latency for application writes
        from services deployed in your most latency-sensitive locations.
        Test the implications of different leader regions on operation latency,
        and adjust to optimize application performance. Plan for failover
        scenarios by ensuring that the application topology is able to adapt to
        leader region changes during regional outages. For more information, see
        Modify the leader region of a database.
       | 
| ❑   | 
        Configure tags and labels appropriately for operational clarity and
        Google Cloud resource tracking. Use tags to group instances by
        environment or workload type. Use labels for metadata that aids in cost
        analysis and permissions management. For more information, see
        Control access and organize instances with tags.
       | 
| ❑   | 
        Evaluate whether Spanner warm up is necessary,
        especially for services expecting sudden and high traffic upon launch.
        Testing latency under high initial loads might reveal the need for
        pre-launch warm up to ensure optimal performance. If warm up is
        required, generate artificial load. For more information, see
        Warm up the database before application launch.
       | 
| ❑   | 
        Review Spanner limits and quotas before deployment.
        If necessary, request quota increases in the Google Cloud console to avoid
        constraints during peak periods. Be mindful of hard limits (for example,
        maximum tables per database) to prevent issues post-deployment. For more
        information, see
        Quotas and limits.
       | 
| ❑   | 
        Use automation tools like Terraform to provision and manage your
        Spanner instances, ensuring configurations are efficient
        and error-proof. For schema management, consider using tools like
        Liquibase
        to avoid accidental schema drops during updates. For more information,
        see Use Terraform with Spanner.
       | 
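As a back-of-the-envelope companion to the capacity planning above, the sketch below turns a load-test measurement into a target node count. The constants (roughly 10 TB of storage per node; 65% regional and 45% multi-region recommended maximum high-priority CPU utilization) reflect published guidance at the time of writing; verify them against the current limits before relying on the result.

```python
import math

def nodes_for_storage(storage_tb, tb_per_node=10.0):
    """Nodes needed just to hold the data (assumed ~10 TB per node)."""
    return math.ceil(storage_tb / tb_per_node)

def nodes_for_cpu(measured_peak_cpu, measured_at_nodes, multi_region=False):
    """Scale a load-test CPU measurement up to the recommended ceiling."""
    target = 0.45 if multi_region else 0.65
    return math.ceil(measured_at_nodes * measured_peak_cpu / target)

def target_nodes(storage_tb, measured_peak_cpu, measured_at_nodes,
                 multi_region=False):
    """Provision for whichever dimension (storage or CPU) needs more."""
    return max(nodes_for_storage(storage_tb),
               nodes_for_cpu(measured_peak_cpu, measured_at_nodes,
                             multi_region))

# Example: 12 TB of data, and a load test that drove 5 nodes to 80% CPU.
# Storage needs ceil(12/10) = 2 nodes; CPU needs ceil(5*0.8/0.65) = 7 nodes.
```

With autoscaling enabled, this calculation informs the upper and lower limits rather than a fixed node count.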
Disaster recovery
Establishing a robust disaster recovery (DR) strategy is essential to protect data, minimize downtime, and ensure business continuity during unexpected failures. Regular testing of restore procedures and automated backups help ensure operational readiness, compliance with recovery objectives, and reliable data protection tailored to organizational needs.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Define a comprehensive disaster recovery strategy for
        Spanner that includes data protection, recovery
        objectives and failure scenarios. Establish clear recovery time
        objectives (RTO) and recovery point objectives (RPO) that align with
        business continuity requirements. Specify backup frequency, retention
        policies, and use point-in-time recovery (PITR)
        to minimize data loss in case of failures. Review the
        Disaster recovery overview
        to identify the right tools and techniques to ensure compliance with
        availability, reliability, and security for your application. For more
        information, see the
        Data protection and recovery solutions in Spanner
        whitepaper.
       | 
| ❑   | 
        Create detailed documentation for backup and restore procedures,
        including step-by-step guides for various recovery scenarios.
        Regularly test these procedures to ensure operational readiness and
        validate RTO and RPO requirements. Testing should simulate real-world
        failure conditions and scenarios to identify gaps and improve the
        recovery process. For more information, see Restore overview.
       | 
| ❑   | 
        Implement automated backup schedules to ensure consistent and
        reliable data protection. Configure frequency and retention settings to
        match business needs and regulatory obligations. Use
        Spanner's backup scheduling features to automate the
        creation, management, and monitoring of backups. For more information,
        see Create and manage backup schedules.
       | 
| ❑   | 
        Align failover procedures with your application's
        instance configuration topology
        to minimize latency impacts in the case of an outage. Test disaster
        recovery scenarios, ensuring the application can operate efficiently
        when the leader region is moved to a failover region. For more
        information, see Modify the leader region of a database.
       | 
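To ground the RPO discussion above: point-in-time recovery is controlled by the database's `version_retention_period` option (a real DDL option, ranging from one hour up to seven days), and backups bound the data loss for anything older. The database name below is a placeholder, and the helper is a deliberately rough worst-case model, not a substitute for testing your actual recovery procedures.

```python
# Real GoogleSQL option; 'orders_db' is a placeholder database name.
RETENTION_DDL = (
    "ALTER DATABASE orders_db "
    "SET OPTIONS (version_retention_period = '7d')"
)

def worst_case_loss_hours(incident_age_hours, pitr_window_hours=168,
                          backup_interval_hours=24):
    """Rough worst-case data loss: near zero if the bad write is still
    inside the PITR window (restore to any microsecond within it),
    otherwise bounded by the backup cadence."""
    if incident_age_hours <= pitr_window_hours:
        return 0.0
    return float(backup_interval_hours)
```

Comparing this worst case against your stated RPO is a quick sanity check when choosing retention and backup frequency.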
Query optimizer and statistics management
Managing query optimizer versions and statistics is important for maintaining predictable and efficient query performance. Using tested versions and keeping statistics up-to-date ensures stability, prevents unexpected performance changes, and optimizes query execution plans, especially during significant data or schema modifications.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        By default, Spanner databases use the latest query
        optimizer version. To ensure predictable query performance, use the
        optimizer version on which the workload has been tested. Regularly
        evaluate new optimizer versions
        in a controlled environment, and update the default version only after
        confirming compatibility and performance improvements. For more
        information, see the
        Query optimizer overview.
       | 
| ❑   | 
        Ensure that query optimizer statistics
        are up-to-date to support efficient query execution plans.
        Although statistics are updated automatically, consider manually
        constructing a new statistics package
        in scenarios such as large-scale data modifications (for example, bulk
        inserts, updates or deletes), addition of new indexes, or schema changes.
        Keeping the query optimizer statistics current is critical for
        maintaining optimal query performance.
       | 
| ❑   | 
        In certain scenarios, such as after bulk deletes or when new
        statistics generation might unpredictably impact query performance,
        pinning a specific statistics package is advisable. This provides
        consistent query performance until a new package can be generated and
        tested. Regularly review the need to pin statistics and unpin once
        updated packages are validated. For more information, see
        Query optimizer statistics packages.
       | 
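Both knobs in this section map to real database options and statements; the sketch below collects them as DDL strings. The database name, optimizer version number, and statistics package name are placeholders (actual package names come from `INFORMATION_SCHEMA.SPANNER_STATISTICS`).

```python
# Pin the optimizer to the version your workload was tested against.
PIN_OPTIMIZER_VERSION = (
    "ALTER DATABASE orders_db SET OPTIONS (optimizer_version = 6)"
)

# ANALYZE kicks off construction of a fresh statistics package, for
# example after a bulk load or large-scale delete.
REFRESH_STATISTICS = "ANALYZE"

# Pin a validated statistics package for consistent query plans.
PIN_STATISTICS = (
    "ALTER DATABASE orders_db SET OPTIONS "
    "(optimizer_statistics_package = 'auto_20240101_12_00_00UTC')"
)

# Unpin by setting the option back to NULL once a newer package is tested.
UNPIN_STATISTICS = (
    "ALTER DATABASE orders_db SET OPTIONS "
    "(optimizer_statistics_package = NULL)"
)
```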
Security
Implementing access control measures is essential to protect sensitive data and prevent unauthorized access in Spanner. By enforcing least-privilege access, fine-grained access control (FGAC), and database deletion protection, you can minimize risk, ensure compliance, and safeguard critical assets against accidental or malicious actions.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Review and implement Identity and Access Management (IAM) policies
        following the least-privilege principle for all users and service
        accounts accessing your database. Assign only the necessary
        permissions required to perform specific tasks and regularly audit
        access control permissions to ensure adherence to this model. Use
        service accounts with minimal privileges for automated processes to
        reduce the risk of unauthorized access. For more information, see the
        IAM overview.
       | 
| ❑   | 
        If the application requires restricted access to specific rows,
        columns, or cells within a table, implement fine-grained access control
        (FGAC). Design and apply conditional access policies based on user
        attributes or data values to enforce granular access rules. Regularly
        review and update these policies to align with evolving security and
        compliance requirements. For more information, see the
        Fine-grained access control overview.
       | 
| ❑   | 
        Enable database deletion protection to prevent accidental or
        unauthorized deletions. Combine this with strict IAM
        controls to limit deletion privileges to a small, trusted set of users
        or service accounts. Additionally, configure infrastructure automation
        tools like Terraform to include safeguards against unintentional
        deletion of your databases. This layered approach minimizes risks to
        critical data assets. For more information, see
        Prevent accidental database deletion.
       | 
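The fine-grained access control item above can be sketched with Spanner's real FGAC DDL (`CREATE ROLE` and column-level `GRANT` statements); the role, table, and column names below are invented. A principal then needs the `spanner.databaseRoles.use` IAM permission on the role to access data through it.

```python
# Hypothetical FGAC policy: a role that can read order metadata but not
# customer PII columns, applied as a DDL batch.
FGAC_DDL = [
    "CREATE ROLE order_reader",
    # Column-level grant: only non-sensitive columns are readable.
    "GRANT SELECT(OrderId, Status, CreatedAt) ON TABLE Orders "
    "TO ROLE order_reader",
    "GRANT SELECT ON TABLE OrderItems TO ROLE order_reader",
]
```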
Logging and monitoring
Effective logging and monitoring are critical for maintaining visibility into database operations, detecting anomalies, and ensuring system health. By using audit logs, distributed tracing, dashboards, and proactive alerts, you can quickly identify and resolve issues, optimize performance, and meet compliance requirements.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Enable audit logging to capture detailed information about database
        activities. Configure audit log levels appropriately based on
        compliance and operational requirements to monitor access patterns and
        detect anomalies effectively. Be aware that audit logs might grow
        large, especially for DATA_READ and DATA_WRITE requests, because all
        SQL and DML statements are logged for these request types. For more
        information, see Spanner audit logging. Routing these logs to a
        user-defined log bucket lets you optimize log retention costs (the
        first 30 days aren't charged) and granularly control log access using
        log views. | 
| ❑   | 
        Collect client-side metrics by instrumenting your application logic
        with OpenTelemetry for distributed tracing and observability. Set up
        OpenTelemetry instrumentation to capture traces and metrics from 
        Spanner, ensuring end-to-end visibility into application
        performance and database interactions. For more information, see
        Capture custom client-side metrics using OpenTelemetry.
       | 
| ❑   | 
        Create and configure monitoring metrics to visualize query
        performance, latency, CPU utilization, and storage usage.
        Use these metrics for real-time tracking and historical analysis of
        database performance. For more information, see
        Monitor instances with Cloud Monitoring.
       | 
| ❑   | 
        Define threshold-based monitoring alerts for critical metrics to
        proactively detect and address issues. Configure alerts for
        conditions like high query latency, low storage availability, or
        unexpected spikes in traffic. Integrate these alerts with incident
        response tools for prompt action. For more information, see
        Create alerts for Spanner metrics.
       | 
Client library
Configuring operation tagging, session pools, and retry policies is vital for optimizing performance, debugging issues, and maintaining resilience in Spanner. These measures enhance observability, reduce latency, and ensure efficient handling of workload demands and transient errors, aligning system behavior with application requirements.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Configure the client library to use meaningful query request and
        transaction tags. You can use request and transaction tags to
        develop an understanding of your queries, reads, and transactions.
        As a best practice, use contextual metadata such as application
        component, request type, or user context, in your tags to enable
        enhanced debugging and introspection. Ensure tags are visible in query
        statistics and logs to facilitate performance analysis and
        troubleshooting. For more information, see
        Troubleshoot with request tags and transaction tags.
       | 
| ❑   | 
        Optimize session management by enabling session pooling in the client
        library. Configure pool settings, such as minimum and maximum
        sessions, to match workload demands while minimizing latency. Regularly
        monitor session usage to fine-tune these parameters and ensure that the
        session pool provides consistent performance benefits. For more
        information, see Sessions.
       | 
| ❑   | 
        In rare scenarios, the default client library retry parameters,
        including maximum attempts and exponential backoff intervals, might
        need to be adjusted to balance resilience with performance. Test these
        policies thoroughly to ensure that they align with application needs.
        For more information, see
        Configure custom timeouts and retries.
       | 
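The tagging guidance above can be sketched with the Spanner Python client's documented hooks: `request_options={"request_tag": ...}` on a query, and the `transaction_tag` keyword on `run_in_transaction`. The tag values, table, and SQL are illustrative; the `database` argument is assumed to be a `google.cloud.spanner` Database object.

```python
def fetch_open_orders(database):
    """Query with a request tag for per-statement introspection."""
    with database.snapshot() as snapshot:
        return list(snapshot.execute_sql(
            "SELECT OrderId FROM Orders WHERE Status = 'OPEN'",
            request_options={
                "request_tag": "app=shop,component=orders,action=list-open"},
        ))

def archive_closed_orders(database):
    """Read-write transaction with a tag covering all of its work."""
    def txn_fn(transaction):
        return transaction.execute_update(
            "UPDATE Orders SET Archived = TRUE WHERE Status = 'CLOSED'")
    return database.run_in_transaction(
        txn_fn, transaction_tag="app=shop,component=orders,action=archive")
```

Tags formatted as consistent key-value pairs like these are easy to filter on in query statistics and lock statistics tables.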
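To illustrate the retry parameters mentioned above (initial delay, multiplier, maximum delay, attempt count), here is a generic exponential-backoff sketch. It shows the shape of the policy the client libraries expose, not their actual implementation; real configurations should also add jitter and honor an overall deadline.

```python
def backoff_delays(initial=0.25, multiplier=1.3, maximum=32.0, attempts=5):
    """Yield capped, exponentially growing base delays (seconds, pre-jitter)."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, maximum)
        delay *= multiplier
```

Walking through the generator with your chosen parameters is a quick way to check that total retry time stays within your latency budget before changing the client configuration.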
Support
To minimize downtime and impact, define clear incident roles and responsibilities to ensure prompt and coordinated responses to Spanner-related issues. For more information, see Get support.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Establish a clear incident response framework, defining roles and
        responsibilities for all team members involved in managing
        Spanner-related incidents. Designate incident roles such
        as Incident Commander, Communications Lead, and Subject Matter Experts
        (SMEs) to ensure efficient coordination and communication during
        incidents. Develop and document processes for identifying, escalating,
        mitigating and resolving issues. Follow best practices outlined in the
        Google SRE Workbook on Incident Response
        and Managing Incidents.
        Conduct regular incident response training and simulations to ensure
        readiness and improve the team's ability to manage high-pressure
        scenarios effectively.
       | 
Cost management
Implementing cost management strategies such as autoscaling and incremental backups ensures efficient resource utilization and can deliver significant cost savings. Aligning resource provisioning with workload demands and optimizing non-production environments further reduces expenses while maintaining performance and flexibility.
| Checkbox | Activity | 
|---|---|
| ❑   | 
        Evaluate and purchase committed use discounts (CUDs) for
        Spanner to lower costs on predictable workloads. These
        commitments might provide
        significant savings compared to on-demand pricing. Analyze historical
        usage patterns to determine optimal CUD commitments. For more
        information, see Committed use discounts
        and Spanner pricing.
       | 
| ❑   | 
        Monitor compute capacity utilization and adjust provisioned resources
        to maintain recommended CPU utilization levels. Over-provisioning
        compute resources might lead to unnecessary costs, while
        under-provisioning might impact performance. Follow the recommended
        maximum Spanner CPU utilization guidelines to ensure cost-effective resource alignment.
       | 
| ❑   | 
        Enable autoscaling to dynamically adjust compute capacity based on
        workload demands. This ensures optimal performance during peak loads
        while reducing costs during periods of low activity. Configure scaling
        policies with upper and lower limits to control cost and avoid
        over-scaling. For more information, see
        Autoscaling overview.
       | 
| ❑   | 
        Use incremental backups to reduce backup storage costs.
        Incremental backups only store data changes since the last backup. This
        significantly lowers storage requirements compared to full backups.
        Incorporate incremental backups into your backup strategy. For more
        information, see
        Incremental backups.
       | 
| ❑   | 
        Optimize costs for non-production environments by selecting the most
        cost-effective instance configuration and deprovisioning resources when
        environments aren't in use. For example, downsize non-critical
        environments after hours or automate resource scaling for development
        and testing scenarios. This approach minimizes costs while maintaining
        operational flexibility.
       |
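The autoscaling item above recommends upper and lower limits as a cost safeguard. The sketch below shows the decision such limits constrain; the thresholds and scaling steps are invented for illustration, and in practice the managed autoscaler makes this decision for you.

```python
def next_node_count(current, cpu_utilization, min_nodes=3, max_nodes=20,
                    high_water=0.65, low_water=0.40):
    """Toy threshold-based scaling decision with hard cost bounds."""
    if cpu_utilization > high_water:
        proposed = current + max(1, current // 4)  # scale out ~25%
    elif cpu_utilization < low_water:
        proposed = current - 1                     # scale in gently
    else:
        proposed = current                         # hold steady
    # min_nodes/max_nodes are the cost and availability safeguards.
    return max(min_nodes, min(max_nodes, proposed))
```

The clamping at the end is the point: whatever the policy proposes, capacity never exceeds the budgeted maximum or drops below the availability floor.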