BigLake metastore offers cross-region replication and disaster recovery for Apache Iceberg tables that use an Iceberg REST catalog.
Together, these capabilities improve your catalog's availability and resilience by ensuring continuous access, protecting against regional outages, preventing data loss, and enabling failover.
How it works
BigLake metastore automatically selects primary and secondary regions for catalog metadata. The primary region processes all table commit metadata and then replicates it to the secondary region for backup.
At any time, and especially during a disaster, you can swap the primary and secondary regions for the catalog by using the failover operation. This action switches the primary region for the catalog and for all namespaces and tables it contains.
Cross-region replication
Cross-region replication involves two main components: data replication and metastore replication. The disaster recovery feature builds upon cross-region replication to enable failover.
Data replication: Cloud Storage automatically replicates your catalog data across multiple regions when you use a dual-region or multi-region bucket. If a regional outage occurs, your data remains accessible without changes to storage paths.
Metastore replication: For Iceberg REST catalogs, BigLake metastore automatically replicates your metastore when you use a dual-region (or custom dual-region) bucket. Metastore replication begins when you create the catalog. BigLake metastore selects a primary and a secondary region from the regions defined in your Cloud Storage configuration. The primary region serves all table commit metadata and replicates it to the secondary region for backup.
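The replication flow can be pictured with a small in-memory model. This is a conceptual sketch in Python, not the BigLake metastore API: `RegionReplica`, `ReplicatedCatalog`, `commit`, `replicate`, and `lag` are hypothetical names, the region names are placeholders, and replication is assumed to run after the commit (asynchronously), so the secondary can trail the primary.

```python
from dataclasses import dataclass, field


@dataclass
class RegionReplica:
    """Hypothetical stand-in for one region's copy of the catalog metadata."""
    region: str
    commits: list[str] = field(default_factory=list)


@dataclass
class ReplicatedCatalog:
    """Toy model (assumption): the primary accepts every commit, then ships it to the secondary."""
    primary: RegionReplica
    secondary: RegionReplica

    def commit(self, metadata: str) -> None:
        # All table commit metadata is written in the primary region first.
        self.primary.commits.append(metadata)

    def replicate(self) -> int:
        # Copy the commits the secondary has not seen yet; return how many were shipped.
        # Only the primary accepts writes, so the secondary's log is a prefix of the primary's.
        pending = self.primary.commits[len(self.secondary.commits):]
        self.secondary.commits.extend(pending)
        return len(pending)

    def lag(self) -> int:
        # Number of commits that currently exist only in the primary region.
        return len(self.primary.commits) - len(self.secondary.commits)


catalog = ReplicatedCatalog(RegionReplica("us-central1"), RegionReplica("us-east1"))
catalog.commit("orders@v1")
catalog.commit("orders@v2")
print(catalog.lag())        # 2: neither commit has reached the secondary yet
print(catalog.replicate())  # 2
print(catalog.lag())        # 0
```

The gap reported by `lag` is the window in which the secondary region's backup is behind the primary.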
Disaster recovery with failover
The disaster recovery feature lets you switch the primary and secondary regions for a catalog. The failover operation switches the primary region for the catalog and all its namespaces and tables. Failovers have two modes: soft failover and hard failover.
Soft failover: A soft failover prevents data loss. In this mode, the new primary region begins to accept writes only after all data from the previous primary region has been synchronized to it. Use a soft failover for disaster recovery testing or other planned scenarios.
Hard failover: A hard failover prioritizes availability over data consistency and is designed to restore service. In this mode, the new primary region always takes over and accepts write traffic, regardless of the previous primary region's current state. For example, with a hard failover, the new primary region can take over even if the previous primary region is unreachable.
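The difference between the two modes can be sketched with a toy model. This is not the BigLake metastore API: `Catalog`, `Replica`, `FailoverMode`, and `failover` are hypothetical, and the model assumes each region's metadata is an ordered commit log in which the secondary holds a prefix of the primary's commits. In this sketch a soft failover raises an error when the old primary is unreachable, standing in for a failover that cannot finish synchronizing.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailoverMode(Enum):
    SOFT = "soft"  # wait until the new primary has every commit: no data loss
    HARD = "hard"  # take over immediately: availability first


@dataclass
class Replica:
    """Hypothetical model of one region's commit log."""
    region: str
    commits: list[str] = field(default_factory=list)
    reachable: bool = True


@dataclass
class Catalog:
    primary: Replica
    secondary: Replica

    def failover(self, mode: FailoverMode) -> list[str]:
        """Swap primary and secondary; return commits the new primary is missing."""
        old, new = self.primary, self.secondary
        if mode is FailoverMode.SOFT:
            if not old.reachable:
                # Modeling choice: the sync cannot complete, so this sketch refuses to swap.
                raise RuntimeError("soft failover needs the old primary to finish syncing")
            # Finish synchronizing before the new primary accepts writes.
            new.commits.extend(old.commits[len(new.commits):])
            lost: list[str] = []
        else:
            # Hard failover: anything not yet replicated is left behind. (The model can
            # still inspect the old log; a real unreachable region could not be read.)
            lost = old.commits[len(new.commits):]
        self.primary, self.secondary = new, old
        return lost


cat = Catalog(Replica("us-central1", ["v1", "v2", "v3"]), Replica("us-east1", ["v1"]))
cat.primary.reachable = False  # simulate a regional outage
lost = cat.failover(FailoverMode.HARD)
print(cat.primary.region, lost)  # us-east1 ['v2', 'v3']
```

Because the hard failover does not wait for synchronization, the commits that had not reached the secondary region (here `v2` and `v3`) are absent from the new primary.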
Limitations
While this feature is in Preview, REPLICATION_TIMESTAMP tracks only catalog metadata, not Cloud Storage files. To limit potential data loss for the files themselves, see Data availability and durability in the Cloud Storage documentation.
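Assuming REPLICATION_TIMESTAMP marks the latest commit time whose catalog metadata has reached the secondary region (an interpretation, not a documented definition), a sketch like the following lists the table commits that a hard failover could lose. `commits_at_risk` and the commit names are hypothetical, and the check says nothing about Cloud Storage data files, which this timestamp does not track.

```python
from datetime import datetime, timezone


def commits_at_risk(
    commit_times: dict[str, datetime], replication_timestamp: datetime
) -> list[str]:
    """Return commits newer than the replication timestamp, oldest first.

    Assumption: every commit at or before replication_timestamp already has its
    catalog metadata in the secondary region. Cloud Storage data files are NOT
    covered by this check.
    """
    newer = [(t, c) for c, t in commit_times.items() if t > replication_timestamp]
    return [c for _, c in sorted(newer)]


utc = timezone.utc
commits = {
    "orders@v7": datetime(2025, 1, 1, 12, 0, tzinfo=utc),
    "orders@v8": datetime(2025, 1, 1, 12, 5, tzinfo=utc),
    "orders@v9": datetime(2025, 1, 1, 12, 9, tzinfo=utc),
}
print(commits_at_risk(commits, datetime(2025, 1, 1, 12, 5, tzinfo=utc)))  # ['orders@v9']
```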
What's next
- Use cross-region replication and disaster recovery with BigLake metastore.