As of April 20th, 2026, BigLake is now called Lakehouse for Apache Iceberg. BigLake metastore is now called the Lakehouse runtime catalog. Lakehouse APIs, client libraries, CLI commands, and IAM names remain unchanged and still reference BigLake.

Set up cross-cloud Lakehouse for Databricks Unity Catalog

This document describes how to set up a cross-cloud Lakehouse to query data from a Databricks Unity Catalog catalog directly within Google Cloud. This capability unifies your data analytics by integrating your external data sources with your existing Google Cloud environment.

Afterward, you can use Lakehouse for Apache Iceberg to manage access to your federated data.

Before you begin

Review the Lakehouse overview to understand how Lakehouse manages access to data.
Read About cross-cloud Lakehouse to understand how it works.
Review the supported catalogs to verify external location requirements and supported configurations.
Understand how to use regional Secret Manager secrets. This is required to set up a cross-cloud Lakehouse with Databricks Unity Catalog using secret-based authentication.
Generate an OAuth Service Principal (client ID and optionally, client secret) within your remote catalog provider that has read access to the target catalog. This process is outside the scope of this documentation.
Optional: If you plan to route queries over a private interconnect between your Google Cloud VPC and your remote cloud provider's VPC (for example, AWS), ensure that you have an active account with your remote provider, provision a Dedicated Cross-Cloud Interconnect or Partner Cross-Cloud Interconnect, establish BGP sessions with your Cloud Router, and verify that you have the required IAM permissions in both cloud environments.

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Verify that billing is enabled for your Google Cloud project.

Enable the BigLake and Secret Manager APIs.

Roles required to enable APIs

To enable APIs, you need the serviceusage.services.enable permission. If you created the project, then you likely already have this permission through the Owner role (roles/owner). Otherwise, you can get this permission through the Service Usage Admin role (roles/serviceusage.serviceUsageAdmin). Learn how to grant roles.

Required roles

To get the permissions that you need to set up cross-cloud Lakehouse, ask your administrator to grant you the following IAM roles on your project:

Manage Lakehouse catalogs: BigLake Admin (roles/biglake.admin)
Manage secrets: Secret Manager Admin (roles/secretmanager.admin) (Required only if using secret-based authentication)
Route traffic over private interconnect (User): Compute Network Admin (roles/compute.networkAdmin)
Route traffic over private interconnect (Catalog service account):
- Service Directory Viewer (roles/servicedirectory.viewer)
- Service Directory PSC Authorized Service (roles/servicedirectory.pscAuthorizedService)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Supported catalog details

This guide provides instructions for setting up cross-cloud Lakehouse with a Databricks Unity Catalog catalog on Amazon Web Services (AWS) or Google Cloud. For detailed information regarding external location requirements and supported configurations, see Supported catalogs.

Limitations and considerations

This section lists the limitations and considerations for using cross-cloud Lakehouse.

Supported cloud providers: Using a private interconnect with your cross-cloud Lakehouse is supported only when the remote cloud provider is Amazon Web Services (AWS). You can use either a Dedicated Cross-Cloud Interconnect or a Partner Cross-Cloud Interconnect.
Only Databricks Unity Catalog catalogs that use an external location on AWS or an external location on Google Cloud are supported. Unity Catalog catalogs that use default storage on AWS or default storage on Google Cloud are not supported.
You must enable external data access on the metastore used by Unity Catalog, which is disabled by default.
Network routing: If a private interconnect (such as Dedicated CCI or Partner CCI) is not configured, queries route over the public internet. This can result in higher egress fees from your remote cloud provider and less predictable performance.
Data freshness: The --refresh-interval flag for the federated catalog determines how often metadata is synchronized. A shorter interval provides fresher data but can incur additional API costs from the remote catalog provider.
Iceberg Metrics Reporting: Iceberg Metrics Reporting is not available for federated catalogs. Set the rest-metrics-reporting-enabled property to false in your Iceberg client when accessing a federated catalog.
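
As a minimal sketch of the Iceberg Metrics Reporting setting for a Spark client: Iceberg catalog properties are passed to Spark as spark.sql.catalog.ALIAS.PROPERTY. The catalog alias federated below is a hypothetical name, not something this guide defines, and other Iceberg clients set the same property through their own configuration.

```shell
# Hypothetical Spark catalog alias; use the alias from your own Spark configuration.
CATALOG_ALIAS="federated"

# Build the Spark option that turns off Iceberg REST metrics reporting for that catalog.
METRICS_CONF="spark.sql.catalog.${CATALOG_ALIAS}.rest-metrics-reporting-enabled=false"

# Print the option so that it can be appended to a spark-sql or spark-submit command.
echo "--conf ${METRICS_CONF}"
```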

General workflow

To set up and use cross-cloud Lakehouse, follow these general steps:

Set up Cross-Cloud Interconnect (Optional): Configure a private connection between your Google Cloud VPC and your remote cloud provider.
Set up federation: Configure authentication and create a federated catalog in Lakehouse.
- OpenID Connect (OIDC) (Recommended): Create the federated catalog in Lakehouse specifying the remote Service Principal Application ID, then configure a federation policy in Databricks using the Lakehouse catalog service account's unique ID.
- Secret-based authentication: Create a secret in Secret Manager with your remote catalog credentials. Then, create a federated catalog in Lakehouse and grant the catalog service account access to the secret.
Verify the connection: Verify that Lakehouse can successfully connect to your remote catalog.
Query data: Run queries against your federated data using BigQuery or Managed Service for Apache Spark. For more information, see Use cross-cloud Lakehouse.
Configure permissions: Use Identity and Access Management (IAM) to manage who can view and query the federated data.

Set up Cross-Cloud Interconnect (Optional)

Queries to your remote catalog travel over the public internet by default. To help enhance security and compliance, provide predictable performance, and reduce data transfer costs, use a private interconnect. This establishes a dedicated, private network connection between your Google Cloud Virtual Private Cloud (VPC) and your remote cloud provider's network (for example, AWS).

You can provision and configure either of the following private interconnect options between your Google Cloud VPC and your remote cloud provider's VPC (for example, AWS):

Dedicated Cross-Cloud Interconnect: A dedicated physical connection.
Partner Cross-Cloud Interconnect: A connection through a supported service provider.

Establish BGP sessions between your Cloud Router in Google Cloud and your remote cloud provider's VPC to ensure route exchange.

To enable private querying, you must configure a path from Lakehouse to your remote storage bucket (for example, an AWS Amazon S3 bucket) through your private interconnect. There are two architectural flows you can follow to configure this routing:

Internal regional proxy Network Load Balancer routing: This flow uses a Google Cloud internal regional proxy Network Load Balancer to distribute requests across hybrid connectivity network endpoint groups (NEGs) that point to multiple AWS Elastic Network Interfaces (ENIs), which provides load balancing, scalability, and high availability. It is required for Partner CCI and recommended for Dedicated CCI.
Direct endpoint routing: This flow connects Service Directory directly to a single AWS Interface VPC Endpoint IP address. This flow only works for Dedicated CCI and is not supported for Partner CCI.

Select the configuration flow that matches your architecture requirements:

Internal regional proxy Network Load Balancer

To configure an internal regional proxy Network Load Balancer to distribute requests across multiple AWS ENIs for high availability and load balancing, follow these steps:

Configure AWS networking

First, create an Amazon S3 VPC Interface Endpoint (AWS PrivateLink):

In the AWS VPC console, create an Interface Endpoint for Amazon S3.
For the service name, specify com.amazonaws.AWS_REGION.s3.
Select the VPC and subnets that are connected through Direct Connect to your Google Cloud VPC.
Attach Security Groups to the endpoint to control inbound access.
This provisions Elastic Network Interfaces (ENIs) in each selected subnet. Note the private IP addresses of these ENIs.

Next, configure Security Groups:

Ensure that the Security Group or groups attached to the Amazon S3 Endpoint ENIs allow inbound TCP traffic on port 443 from your Google Cloud VPC. This must include the CIDR range of your Google Cloud proxy-only subnet to allow health checks and forwarded traffic.
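
The inbound rule described above could be added with the AWS CLI roughly as follows. This is a sketch that assumes the AWS CLI is installed and authenticated; allow_proxy_subnet_https is a hypothetical helper name, and SECURITY_GROUP_ID is a placeholder that this guide does not otherwise define.

```shell
# Allow inbound HTTPS from the Google Cloud proxy-only subnet to the endpoint ENIs.
# Arguments: security group ID, CIDR range of the proxy-only subnet.
allow_proxy_subnet_https() {
  aws ec2 authorize-security-group-ingress \
      --group-id "$1" \
      --protocol tcp \
      --port 443 \
      --cidr "$2"
}

# Example call (placeholders, replace before running):
# allow_proxy_subnet_https SECURITY_GROUP_ID PROXY_SUBNET_RANGE
```

If other ranges in your Google Cloud VPC also send traffic to the endpoint, call the helper once per CIDR range.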

Configure Google Cloud networking

To simplify setup, run the following commands to configure the internal load balancer. For advanced configurations or more details, see Set up an Internal regional proxy Network Load Balancer for hybrid endpoints.

Create a proxy-only subnet:
```
gcloud compute networks subnets create PROXY_SUBNET_NAME \
    --purpose=REGIONAL_MANAGED_PROXY \
    --role=ACTIVE \
    --region=REGION \
    --network=VPC_NETWORK \
    --range=PROXY_SUBNET_RANGE
```
Replace the following:
- PROXY_SUBNET_NAME: a name for the proxy-only subnet.
- REGION: the Google Cloud region (for example, us-east4).
- VPC_NETWORK: the name of your VPC network.
- PROXY_SUBNET_RANGE: an unused CIDR range within your VPC network (for example, 10.129.0.0/23).

Create a regional health check:
```
gcloud compute health-checks create tcp HEALTH_CHECK_NAME \
    --region=REGION \
    --port=443
```
Replace the following:
- HEALTH_CHECK_NAME: a name for the health check.
- REGION: the Google Cloud region (for example, us-east4).

Create hybrid connectivity network endpoint groups (NEGs) and add endpoints:

Create a hybrid NEG (NON_GCP_PRIVATE_IP_PORT) for each zone:
```
gcloud compute network-endpoint-groups create NEG_NAME \
    --network-endpoint-type=NON_GCP_PRIVATE_IP_PORT \
    --zone=ZONE \
    --network=VPC_NETWORK
```
Add the private IP address of your AWS ENI to the corresponding hybrid NEG:
```
gcloud compute network-endpoint-groups update NEG_NAME \
    --zone=ZONE \
    --add-endpoint="ip=AWS_S3_IP,port=443"
```
Replace the following:
- NEG_NAME: a name for the hybrid NEG.
- ZONE: the Google Cloud zone (for example, us-east4-a). This zone must reside within the region of your Cross-Cloud Interconnect VLAN attachment.
- VPC_NETWORK: the name of your VPC network.
- AWS_S3_IP: the private IP address of the AWS Amazon S3 VPC Endpoint (ENI) in that zone.
Repeat these commands to create NEGs and add endpoints for other zones if your AWS ENIs are distributed across multiple zones.
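
When the ENIs span several zones, the two commands above can be repeated in a loop. The following sketch assumes one ENI per zone; create_hybrid_negs and the s3-neg- naming scheme are hypothetical and only serve the example.

```shell
# Create one hybrid NEG per zone and register the AWS ENI address for that zone.
# Arguments: VPC network name, then one or more ZONE=IP pairs.
create_hybrid_negs() {
  network="$1"
  shift
  for pair in "$@"; do
    zone="${pair%%=*}"    # text before the first "="
    ip="${pair#*=}"       # text after the first "="
    neg="s3-neg-${zone}"  # hypothetical NEG naming scheme
    gcloud compute network-endpoint-groups create "$neg" \
        --network-endpoint-type=NON_GCP_PRIVATE_IP_PORT \
        --zone="$zone" \
        --network="$network" || return 1
    gcloud compute network-endpoint-groups update "$neg" \
        --zone="$zone" \
        --add-endpoint="ip=${ip},port=443" || return 1
  done
}

# Example call (placeholders, replace before running):
# create_hybrid_negs VPC_NETWORK us-east4-a=AWS_S3_IP_A us-east4-b=AWS_S3_IP_B
```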

Create and configure the backend service:

Create a regional backend service with internal managed load balancing:

```
gcloud compute backend-services create BACKEND_SERVICE_NAME \
    --load-balancing-scheme=INTERNAL_MANAGED \
    --protocol=TCP \
    --region=REGION \
    --health-checks=HEALTH_CHECK_NAME \
    --health-checks-region=REGION
```

Add your hybrid NEGs to the backend service:

```
gcloud compute backend-services add-backend BACKEND_SERVICE_NAME \
    --region=REGION \
    --network-endpoint-group=NEG_NAME \
    --network-endpoint-group-zone=ZONE \
    --balancing-mode=CONNECTION \
    --max-connections=MAX_CONNECTIONS
```

Replace the following:

BACKEND_SERVICE_NAME: a name for the backend service.
NEG_NAME: the name of the hybrid NEG you created in the previous step.
ZONE: the Google Cloud zone (for example, us-east4-a).
MAX_CONNECTIONS: the maximum concurrent connections the backend should handle (for example, 100).

Repeat the add-backend command for each hybrid NEG you created.

Configure the load balancer frontend:

Create a target TCP proxy:

```
gcloud compute target-tcp-proxies create TARGET_PROXY_NAME \
    --backend-service=BACKEND_SERVICE_NAME \
    --region=REGION
```

Create a forwarding rule to route traffic to the target proxy:

```
gcloud compute forwarding-rules create FORWARDING_RULE_NAME \
    --load-balancing-scheme=INTERNAL_MANAGED \
    --network=VPC_NETWORK \
    --subnet=VPC_SUBNET \
    --ports=443 \
    --region=REGION \
    --target-tcp-proxy=TARGET_PROXY_NAME \
    --target-tcp-proxy-region=REGION \
    --allow-global-access
```

Replace the following:

TARGET_PROXY_NAME: a name for the target proxy.
FORWARDING_RULE_NAME: a name for the forwarding rule.
VPC_SUBNET: the name of your VPC subnetwork.

After creating the forwarding rule for the load balancer, note the internal IP address assigned to it. This is your ILB_IP_ADDRESS.
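
One way to read the assigned address is to describe the forwarding rule and print only its IPAddress field. This is a sketch; get_ilb_ip is a hypothetical helper name.

```shell
# Print the internal IP address assigned to a regional forwarding rule.
# Arguments: forwarding rule name, region.
get_ilb_ip() {
  gcloud compute forwarding-rules describe "$1" \
      --region="$2" \
      --format='value(IPAddress)'
}

# Example (placeholders, replace before running):
# ILB_IP_ADDRESS="$(get_ilb_ip FORWARDING_RULE_NAME REGION)"
#
# Optionally check that the hybrid NEG endpoints pass the health check:
# gcloud compute backend-services get-health BACKEND_SERVICE_NAME --region=REGION
```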

Configure Service Directory

Create a namespace for your remote cloud:
```
gcloud service-directory namespaces create NAMESPACE \
    --project=PROJECT_ID \
    --location=REGION
```
Replace the following:
- NAMESPACE: a unique identifier for your namespace.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the Google Cloud region. For example, us-east4. This must be the same region as the federated catalog.

Create a service in the Service Directory namespace:

```
gcloud service-directory services create SERVICE_NAME \
    --namespace=NAMESPACE \
    --project=PROJECT_ID \
    --location=REGION
```

Replace the following:

SERVICE_NAME: a unique identifier for your service.

Create an endpoint for the ILB in the service:

```
gcloud service-directory endpoints create ENDPOINT_NAME \
    --project=PROJECT_ID \
    --namespace=NAMESPACE \
    --service=SERVICE_NAME \
    --location=REGION \
    --network=projects/PROJECT_NUMBER/global/networks/VPC_NETWORK \
    --address=ILB_IP_ADDRESS \
    --port=443
```

Replace the following:

ENDPOINT_NAME: a unique identifier for your endpoint.
PROJECT_NUMBER: your Google Cloud project number. Use your project number in the --network flag.
ILB_IP_ADDRESS: the internal IP address of your ILB forwarding rule.
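
Because the --network flag takes the project number rather than the project ID, it can help to look the number up and assemble the value in one place. A small sketch, with get_project_number and network_resource as hypothetical helper names:

```shell
# Look up the project number for a project ID.
get_project_number() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# Build the value for the --network flag from a project number and a network name.
network_resource() {
  printf 'projects/%s/global/networks/%s\n' "$1" "$2"
}

# Example (placeholders, replace before running):
# NETWORK="$(network_resource "$(get_project_number PROJECT_ID)" VPC_NETWORK)"
```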

Direct endpoint

To configure Service Directory to route traffic directly to a single AWS Interface VPC Endpoint IP address, follow these steps:

Create an Interface VPC Endpoint for Amazon S3 inside your AWS VPC. Note the IP address and port of this endpoint.
Create a namespace for your remote cloud:
```
gcloud service-directory namespaces create NAMESPACE \
    --project=PROJECT_ID \
    --location=REGION
```
Replace the following:
- NAMESPACE: a unique identifier for your namespace.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the Google Cloud region. For example, us-east4. This must be the same region as the federated catalog.

Create a service in the Service Directory namespace:

```
gcloud service-directory services create SERVICE_NAME \
    --namespace=NAMESPACE \
    --project=PROJECT_ID \
    --location=REGION
```

Replace the following:

SERVICE_NAME: a unique identifier for your service.

Create an endpoint in the service containing the routing information for your Amazon S3 Interface VPC Endpoint:
```
gcloud service-directory endpoints create ENDPOINT_NAME \
    --service=SERVICE_NAME \
    --namespace=NAMESPACE \
    --project=PROJECT_ID \
    --location=REGION \
    --address=S3_VPCE_IP_ADDRESS \
    --port=S3_VPCE_PORT \
    --network=projects/PROJECT_NUMBER/global/networks/VPC_NETWORK
```
Replace the following:
- ENDPOINT_NAME: a unique identifier for your endpoint.
- S3_VPCE_IP_ADDRESS: the IP address of your Amazon S3 Interface VPC Endpoint. For example, 10.0.1.45.
- S3_VPCE_PORT: the port number of your Amazon S3 Interface VPC Endpoint. For example, 443.
- PROJECT_NUMBER: your Google Cloud project number. Use your project number in the --network flag.
- VPC_NETWORK: the Google Cloud VPC network name associated with your private interconnect.

Set up federation

To query your data, you must set up a Lakehouse federated catalog that connects to your remote catalog.

Configure authentication

Federation requires authenticating to the remote catalog as a Databricks service principal. Choose one of the following authentication options:

OIDC

OpenID Connect (OIDC) allows secretless authentication through Databricks OAuth token federation.

Create a Service Principal in your Databricks account that has read access to the target Unity Catalog catalog.
Save the Application ID (UUID) for the Service Principal. You will need this for a later step.

Secret-based

This option uses an OAuth client ID and client secret stored securely by using regional Secret Manager secrets.

Create a JSON file named credentials.json with your payload:
```
{
"client_id": "CLIENT_ID",
"client_secret": "CLIENT_SECRET"
}
```
Replace the following:
- CLIENT_ID: The OAuth client ID for your Databricks Service Principal.
- CLIENT_SECRET: The OAuth client secret for your Databricks Service Principal.

Configure the regional endpoint for Secret Manager:

By default, Secret Manager uses a global endpoint. However, cross-cloud Lakehouse requires that your secret be stored in the same region as your Lakehouse catalog. To work with regional secrets using the gcloud CLI, override the default API endpoint for your current session or profile. For example, the endpoint for us-east4 is secretmanager.us-east4.rep.googleapis.com.
```
gcloud config set api_endpoint_overrides/secretmanager https://secretmanager.REGION.rep.googleapis.com/
```
Replace the following:
- REGION: The Google Cloud region where your Secret Manager secret is stored. For example, us-east4. To avoid connectivity issues, create your secret and your catalog in the same region.

Upload the payload to Secret Manager:

```
gcloud secrets create DATABRICKS_SECRET_NAME \
    --location="REGION" \
    --project="PROJECT_ID" \
    --data-file=credentials.json
```

Replace the following:

DATABRICKS_SECRET_NAME: A name for your Databricks secret.
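
After the upload succeeds, the local copy of the client secret is no longer needed, and the endpoint override stays in your gcloud configuration until you remove it. The following cleanup is a suggestion rather than a required step; cleanup_secret_setup is a hypothetical helper name. Later commands in this guide set the override again where they need it.

```shell
# Remove the local credentials file and restore the default Secret Manager endpoint.
# Argument: path to the credentials file (defaults to credentials.json).
cleanup_secret_setup() {
  rm -f "${1:-credentials.json}"
  gcloud config unset api_endpoint_overrides/secretmanager
}

# Example:
# cleanup_secret_setup credentials.json
```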

Create a federated catalog

Create the federated catalog using the Google Cloud console or the gcloud CLI.

Considerations:

Only Databricks Unity Catalog catalogs that use an external location on AWS or an external location on Google Cloud are supported. Unity Catalog catalogs that use default storage on AWS or default storage on Google Cloud are not supported.
You must enable external data access on the metastore used by Unity Catalog, which is disabled by default.

Console

To create a federated catalog:

In the Google Cloud console, go to Lakehouse.

Click Create catalog.
Click Federated catalog.

The Catalog configuration details appear.
For Federated catalog source, select Unity (Databricks).
For Data location, select the Lakehouse region where you want to create the federated catalog. For example, us-east4. To minimize latency (even over public internet) do the following when selecting a region:
- If your Unity Catalog catalog is on AWS, select the Google Cloud region closest to your AWS region.
- If your Unity Catalog catalog is on Google Cloud, select the exact same region.
Click Continue.

The Connection details appear.
In the Remote catalog details section, in the Unity instance name field, enter your target Databricks instance name. For example: abcd.cloud.databricks.com.
In the Unity catalog name field, enter the name of the target Databricks Unity Catalog catalog to federate to.
Select an option for Authentication Method:
- For OIDC, enter the Application ID for the Service Principal.
- For Secret, enter the name of your secret. Use the following format: projects/PROJECT_ID/locations/REGION/secrets/DATABRICKS_SECRET_NAME.
Optional: In the Service directory name field, enter the path to your Service Directory service. For example: projects/PROJECT_ID/locations/REGION/namespaces/NAMESPACE/services/SERVICE_NAME. This is only required if you are configuring a Cross-Cloud Interconnect.
Click Create.

gcloud CLI

OIDC

If you use OIDC, you must specify the --unity-service-principal-application-id flag.

Public internet (no CCI)

```
gcloud biglake iceberg catalogs create FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID" \
    --primary-location="REGION" \
    --catalog-type="federated" \
    --federated-catalog-type="unity" \
    --unity-service-principal-application-id="UNITY_SERVICE_PRINCIPAL_APPLICATION_ID" \
    --unity-instance-name="UNITY_INSTANCE_NAME" \
    --unity-catalog-name="UNITY_CATALOG_NAME" \
    --refresh-interval="REFRESH_INTERVAL" \
    --namespace-filters="NAMESPACE_FILTERS"
```

Replace the following:

FEDERATED_CATALOG_NAME: a name for your federated catalog.
PROJECT_ID: your Google Cloud project ID.
REGION: the Lakehouse region where the federated catalog is created. For example, us-east4. To minimize latency, do the following when selecting a region:
- If your Unity Catalog catalog is on AWS, select the Google Cloud region closest to your AWS region.
- If your Unity Catalog catalog is on Google Cloud, select the exact same region.
UNITY_SERVICE_PRINCIPAL_APPLICATION_ID: The Application ID (UUID) of your Databricks Service Principal.
UNITY_INSTANCE_NAME: your target Databricks instance name. For example: abcd.cloud.databricks.com.
UNITY_CATALOG_NAME: the name of the target Databricks Unity Catalog catalog to federate to.
REFRESH_INTERVAL: Optional: Specifies how often to update the catalog's information. Set this value as a duration, for example, 330s or 5m30s. Shorter intervals update data more often but can cost more in API calls. Longer intervals can cost less, but the queried data might not reflect your most current dataset. If omitted or if the value is set to 0s, then background metadata refresh will not start. It will remain disabled until the refresh interval is updated to a positive value.
NAMESPACE_FILTERS: Optional: A comma-separated list of namespaces to federate. For example, ns1,ns2. If omitted, all namespaces will be included.
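
The two example values for REFRESH_INTERVAL, 330s and 5m30s, describe the same interval. The following sketch works through that arithmetic. It assumes the duration uses only h, m, and s units without leading zeros, which may be narrower than what the flag accepts, and duration_to_seconds is a hypothetical helper name.

```shell
# Convert a duration such as 5m30s to a number of seconds.
duration_to_seconds() {
  # Accept only digits followed by h, m, or s, repeated (for example, 5m30s).
  if ! printf '%s\n' "$1" | grep -Eq '^([0-9]+[hms])+$'; then
    echo "unsupported duration: $1" >&2
    return 1
  fi
  # Rewrite each unit as a multiplier, for example 5m30s becomes 5*60+30*1+
  expr_text="$(printf '%s\n' "$1" | sed -e 's/h/*3600+/g' -e 's/m/*60+/g' -e 's/s/*1+/g')"
  # Append a final 0 to close the trailing "+" and evaluate the sum.
  echo $(( ${expr_text}0 ))
}

duration_to_seconds 5m30s   # prints 330
duration_to_seconds 330s    # prints 330
```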

Customer-owned (CCI)

If you configured a private interconnect (such as Dedicated CCI or Partner CCI), provide the Service Directory service reference so that Lakehouse routes traffic privately.

```
gcloud biglake iceberg catalogs create FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID" \
    --primary-location="REGION" \
    --catalog-type="federated" \
    --federated-catalog-type="unity" \
    --unity-service-principal-application-id="UNITY_SERVICE_PRINCIPAL_APPLICATION_ID" \
    --unity-instance-name="UNITY_INSTANCE_NAME" \
    --unity-catalog-name="UNITY_CATALOG_NAME" \
    --refresh-interval="REFRESH_INTERVAL" \
    --namespace-filters="NAMESPACE_FILTERS" \
    --service-directory-name="projects/PROJECT_ID/locations/REGION/namespaces/NAMESPACE/services/SERVICE_NAME"
```

Replace the following:

FEDERATED_CATALOG_NAME: a name for your federated catalog.
PROJECT_ID: your Google Cloud project ID.
REGION: the Lakehouse region where the federated catalog is created. For example, us-east4. This must be the same region as the Service Directory namespace and regional secret. To minimize latency, do the following when selecting a region:
- If your Unity Catalog catalog is on AWS, select the Google Cloud region closest to your AWS region.
- If your Unity Catalog catalog is on Google Cloud, select the exact same region.
UNITY_SERVICE_PRINCIPAL_APPLICATION_ID: The Application ID (UUID) of your Databricks Service Principal.
UNITY_INSTANCE_NAME: your target Databricks instance name. For example: abcd.cloud.databricks.com.
UNITY_CATALOG_NAME: the name of the target Databricks Unity Catalog catalog to federate.
REFRESH_INTERVAL: Optional: Specifies how often to update the catalog's information. Set this value as a duration, for example, 330s or 5m30s. Shorter intervals update data more often but can cost more in API calls. Longer intervals can cost less, but the queried data might not reflect your most current dataset. If omitted or if the value is set to 0s, then background metadata refresh will not start. It will remain disabled until the refresh interval is updated to a positive value.
NAMESPACE_FILTERS: Optional: A comma-separated list of namespaces to federate. For example, ns1,ns2. If omitted, all namespaces will be included.
NAMESPACE: the Service Directory namespace you created during private interconnect setup.
SERVICE_NAME: the Service Directory service name you created during private interconnect setup.

Secret-based

If you use secret-based authentication, you must specify the --secret-name flag.

Public internet (no CCI)

```
gcloud biglake iceberg catalogs create FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID" \
    --primary-location="REGION" \
    --catalog-type="federated" \
    --federated-catalog-type="unity" \
    --secret-name="projects/PROJECT_ID/locations/REGION/secrets/DATABRICKS_SECRET_NAME" \
    --unity-instance-name="UNITY_INSTANCE_NAME" \
    --unity-catalog-name="UNITY_CATALOG_NAME" \
    --refresh-interval="REFRESH_INTERVAL" \
    --namespace-filters="NAMESPACE_FILTERS"
```

Replace the following:

FEDERATED_CATALOG_NAME: a name for your federated catalog.
PROJECT_ID: your Google Cloud project ID.
REGION: the Lakehouse region where the federated catalog is created. For example, us-east4. To minimize latency, do the following when selecting a region:
- If your Unity Catalog catalog is on AWS, select the Google Cloud region closest to your AWS region.
- If your Unity Catalog catalog is on Google Cloud, select the exact same region.
DATABRICKS_SECRET_NAME: the name of your Databricks secret.
UNITY_INSTANCE_NAME: your target Databricks instance name. For example: abcd.cloud.databricks.com.
UNITY_CATALOG_NAME: the name of the target Databricks Unity Catalog catalog to federate to.
REFRESH_INTERVAL: Optional: Specifies how often to update the catalog's information. Set this value as a duration, for example, 330s or 5m30s. Shorter intervals update data more often but can cost more in API calls. Longer intervals can cost less, but the queried data might not reflect your most current dataset. If omitted or if the value is set to 0s, then background metadata refresh will not start. It will remain disabled until the refresh interval is updated to a positive value.
NAMESPACE_FILTERS: Optional: A comma-separated list of namespaces to federate. For example, ns1,ns2. If omitted, all namespaces will be included.

Customer-owned (CCI)

```
gcloud biglake iceberg catalogs create FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID" \
    --primary-location="REGION" \
    --catalog-type="federated" \
    --federated-catalog-type="unity" \
    --secret-name="projects/PROJECT_ID/locations/REGION/secrets/DATABRICKS_SECRET_NAME" \
    --unity-instance-name="UNITY_INSTANCE_NAME" \
    --unity-catalog-name="UNITY_CATALOG_NAME" \
    --refresh-interval="REFRESH_INTERVAL" \
    --namespace-filters="NAMESPACE_FILTERS" \
    --service-directory-name="projects/PROJECT_ID/locations/REGION/namespaces/NAMESPACE/services/SERVICE_NAME"
```

Replace the following:

FEDERATED_CATALOG_NAME: a name for your federated catalog.
PROJECT_ID: your Google Cloud project ID.
REGION: the Lakehouse region where the federated catalog is created. For example, us-east4. This must be the same region as the Service Directory namespace and regional secret. To minimize latency, do the following when selecting a region:
- If your Unity Catalog catalog is on AWS, select the Google Cloud region closest to your AWS region.
- If your Unity Catalog catalog is on Google Cloud, select the exact same region.
DATABRICKS_SECRET_NAME: the name of your Databricks secret.
UNITY_INSTANCE_NAME: your target Databricks instance name. For example: abcd.cloud.databricks.com.
UNITY_CATALOG_NAME: the name of the target Databricks Unity Catalog catalog to federate.
REFRESH_INTERVAL: Optional: Specifies how often to update the catalog's information. Set this value as a duration, for example, 330s or 5m30s. Shorter intervals update data more often but can cost more in API calls. Longer intervals can cost less, but the queried data might not reflect your most current dataset. If omitted or if the value is set to 0s, then background metadata refresh will not start. It will remain disabled until the refresh interval is updated to a positive value.
NAMESPACE_FILTERS: Optional: A comma-separated list of namespaces to federate. For example, ns1,ns2. If omitted, all namespaces will be included.
NAMESPACE: the Service Directory namespace you created during private interconnect setup.
SERVICE_NAME: the Service Directory service name you created during private interconnect setup.
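
Because the catalog, the regional secret, and the Service Directory namespace must share one region, a quick check of the resource names before running the create command can catch a mismatch early. This is a sketch; resource_location and check_same_region are hypothetical helper names, and the check only inspects the names, not the resources themselves.

```shell
# Print the location segment of a resource name such as
# projects/PROJECT_ID/locations/REGION/secrets/NAME.
resource_location() {
  printf '%s\n' "$1" | cut -d/ -f4
}

# Succeed only if every resource name after the first argument is in that region.
# Arguments: catalog region, then one or more resource names.
check_same_region() {
  region="$1"
  shift
  for name in "$@"; do
    if [ "$(resource_location "$name")" != "$region" ]; then
      echo "region mismatch: ${name} is not in ${region}" >&2
      return 1
    fi
  done
}

# Example (placeholders, replace before running):
# check_same_region REGION \
#     projects/PROJECT_ID/locations/REGION/secrets/DATABRICKS_SECRET_NAME \
#     projects/PROJECT_ID/locations/REGION/namespaces/NAMESPACE/services/SERVICE_NAME
```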

Complete authentication setup

Complete the setup based on your chosen authentication option:

OIDC

When the catalog is created, Lakehouse provisions a unique service account for it. You must retrieve the service account's numeric unique ID and use it to configure a federation policy for the Databricks Service Principal.

Retrieve the Lakehouse catalog service account's numeric unique ID:

```
gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID" \
    --format='value(biglake-service-account-id)'
```

In your Databricks account, configure a federation policy for the Service Principal you created earlier. Use the following values:
- Issuer URL: https://accounts.google.com
- Subject: the numeric unique ID of the Lakehouse catalog service account retrieved in the previous step.
- Audience: https://accounts.cloud.databricks.com
Ensure that the Databricks Service Principal has read access to the target catalog and workspace.
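
If you manage the federation policy as a file rather than through the Databricks account console, the three values above could be captured as follows. The oidc_policy, issuer, audiences, and subject field names are an assumption about the Databricks federation policy payload and should be checked against the Databricks documentation; write_federation_policy is a hypothetical helper name.

```shell
# Write a federation policy document for the Databricks service principal.
# Arguments: numeric unique ID of the catalog service account, output file.
# The JSON field names are assumed, not taken from this guide.
write_federation_policy() {
  cat > "$2" <<EOF
{
  "oidc_policy": {
    "issuer": "https://accounts.google.com",
    "audiences": ["https://accounts.cloud.databricks.com"],
    "subject": "$1"
  }
}
EOF
}

# Example (replace the placeholder with the ID retrieved in the previous step):
# write_federation_policy SERVICE_ACCOUNT_UNIQUE_ID federation-policy.json
```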

Secret-based

When the catalog is created, Lakehouse provisions a unique service account for it (returned as biglake-service-account in the resource description).

You must grant this service account permission to access the secret you created earlier. Note that propagating IAM policies can take a few minutes.

Grant the catalog's service account permission to access the secret.

```
gcloud config set api_endpoint_overrides/secretmanager https://secretmanager.REGION.rep.googleapis.com/
gcloud secrets add-iam-policy-binding DATABRICKS_SECRET_NAME \
    --project="PROJECT_ID" \
    --location="REGION" \
    --member="serviceAccount:$(gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
        --project="PROJECT_ID" \
        --format='value(biglake-service-account)')" \
    --role="roles/secretmanager.secretAccessor"
```

To verify that the federated catalog service account has access to the secret, run the following command:

```
gcloud config set api_endpoint_overrides/secretmanager https://secretmanager.REGION.rep.googleapis.com/
gcloud secrets get-iam-policy DATABRICKS_SECRET_NAME \
    --project="PROJECT_ID" \
    --location="REGION"
```

In the output, verify that the catalog's service account (the biglake-service-account value from the catalog description) has the roles/secretmanager.secretAccessor role.
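
To avoid reading the full policy by eye, the output can be narrowed to the members that hold the accessor role by using the standard gcloud --flatten, --filter, and --format flags. This is a sketch that assumes the regional endpoint override from the previous command is still set; list_secret_accessors is a hypothetical helper name.

```shell
# List the members that hold Secret Manager Secret Accessor on a regional secret.
# Arguments: secret name, project ID, region.
list_secret_accessors() {
  gcloud secrets get-iam-policy "$1" \
      --project="$2" \
      --location="$3" \
      --flatten='bindings[].members' \
      --filter='bindings.role=roles/secretmanager.secretAccessor' \
      --format='value(bindings.members)'
}

# Example (placeholders, replace before running):
# list_secret_accessors DATABRICKS_SECRET_NAME PROJECT_ID REGION
```

The catalog's service account should appear in the printed list, prefixed with serviceAccount:.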

Complete private interconnect setup (Cross-Cloud Interconnect only)

If you are routing traffic over a private interconnect (Dedicated or Partner CCI), you must grant the Lakehouse catalog service account permissions to discover and authorize connections through Service Directory.

Console

In the Google Cloud console, go to the IAM page.

Click Grant Access (or Add).
In the New principals field, enter the Lakehouse catalog service account email. You can retrieve this email by describing the catalog (see the gcloud tab).
In the Role dropdown, select Service Directory Viewer (roles/servicedirectory.viewer).
Click Add another role and select Service Directory PSC Authorized Service (roles/servicedirectory.pscAuthorizedService).
Click Save.

gcloud CLI

Grant the required roles to the catalog's service account:

# Grant Service Directory Viewer
```
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:$(gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
        --project="PROJECT_ID" \
        --format='value(biglake-service-account)')" \
    --role="roles/servicedirectory.viewer"

# Grant Service Directory PSC Authorized Service
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:$(gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
        --project="PROJECT_ID" \
        --format='value(biglake-service-account)')" \
    --role="roles/servicedirectory.pscAuthorizedService"
```
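
Both bindings target the same service account, so the email can be looked up once and reused. A sketch of that variant, with grant_service_directory_roles as a hypothetical helper name:

```shell
# Grant both Service Directory roles to the catalog service account.
# Arguments: project ID, service account email.
grant_service_directory_roles() {
  for role in roles/servicedirectory.viewer roles/servicedirectory.pscAuthorizedService; do
    gcloud projects add-iam-policy-binding "$1" \
        --member="serviceAccount:$2" \
        --role="$role" || return 1
  done
}

# Example (placeholders, replace before running):
# CATALOG_SA="$(gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
#     --project=PROJECT_ID --format='value(biglake-service-account)')"
# grant_service_directory_roles PROJECT_ID "$CATALOG_SA"
```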

Verify the connection

Verify that the catalog background metadata refresh cycle completed successfully and namespaces are synchronized.

Verify that the refresh status indicates success:

```
gcloud biglake iceberg catalogs describe FEDERATED_CATALOG_NAME \
    --project="PROJECT_ID"
```

Confirm that remote schemas appear as synchronized namespaces:

```
gcloud biglake iceberg namespaces list \
    --project="PROJECT_ID" \
    --catalog="FEDERATED_CATALOG_NAME"
```
