A Connect cluster provides an environment for connectors that help move data from existing Kafka deployments into a Google Cloud Managed Service for Apache Kafka cluster, or from a Managed Service for Apache Kafka cluster to another Google Cloud service or another Kafka cluster. The secondary Kafka cluster can be another Google Cloud Managed Service for Apache Kafka cluster, a self-managed cluster, or an on-premises cluster.
Before you begin
Ensure that you have already created a Managed Service for Apache Kafka cluster. You need the name of the Managed Service for Apache Kafka cluster to which the Connect cluster is going to be attached.
Each Connect cluster is associated with a Managed Service for Apache Kafka cluster. This cluster stores the state of the connectors running on the Connect cluster.
Required roles and permissions to create a Connect cluster
To get the permissions that you need to create a Connect cluster, ask your administrator to grant you the Managed Kafka Connect Cluster Editor (roles/managedkafka.connectClusterEditor) IAM role on your project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a Connect cluster. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a Connect cluster:
- Permission to create a Connect cluster, granted on the specified location: managedkafka.connectClusters.create
You might also be able to get these permissions with custom roles or other predefined roles.
For more information about this role, see Managed Service for Apache Kafka predefined roles.
Required ACL principals
By default, Managed Service for Apache Kafka clusters let the Connect cluster access resources if no ACLs are configured. This behavior comes from allow.everyone.if.no.acl.found being set to true, which is the default setting.
However, if the Managed Service for Apache Kafka cluster has ACLs configured, the Connect cluster doesn't automatically have read and write permissions to the resources. You have to grant them manually.
The Connect cluster service account that is used as the principal in ACLs follows this format: User:service-{consumer project number}@gcp-sa-managedkafka.iam.gserviceaccount.com.
If you have configured ACLs on your Kafka cluster, grant the Connect cluster read and write permissions to topics and read permissions to consumer groups using the following commands:
/bin/kafka-acls.sh \
    --bootstrap-server BOOTSTRAP_ADDR \
    --command-config PATH_TO_CLIENT_PROPERTIES \
    --add \
    --allow-principal User:service-{consumer project number}@gcp-sa-managedkafka.iam.gserviceaccount.com \
    --operation READ --operation WRITE --topic '*'
/bin/kafka-acls.sh \
    --bootstrap-server BOOTSTRAP_ADDR \
    --command-config PATH_TO_CLIENT_PROPERTIES \
    --add \
    --allow-principal User:service-{consumer project number}@gcp-sa-managedkafka.iam.gserviceaccount.com \
    --operation READ --group '*'
For more information about these commands, see Configure Apache Kafka ACLs for granular access control.
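The principal format and the two grant commands above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function names, the sample project number, and the sample paths are hypothetical. Building each command as an argument list (for example, to pass to subprocess.run) keeps the shell from expanding the * wildcard.

```python
def connect_acl_principal(project_number: int) -> str:
    """Return the ACL principal for the Connect cluster service account."""
    if project_number <= 0:
        raise ValueError("project_number must be a positive integer")
    return f"User:service-{project_number}@gcp-sa-managedkafka.iam.gserviceaccount.com"


def acl_grant_commands(kafka_acls_path, bootstrap_addr, client_properties, project_number):
    """Build the two kafka-acls.sh invocations as argument lists.

    kafka_acls_path is the location of kafka-acls.sh in your Kafka
    installation (hypothetical parameter; adjust to your environment).
    """
    base = [
        kafka_acls_path,
        "--bootstrap-server", bootstrap_addr,
        "--command-config", client_properties,
        "--add",
        "--allow-principal", connect_acl_principal(project_number),
    ]
    # READ and WRITE on every topic.
    topics = base + ["--operation", "READ", "--operation", "WRITE", "--topic", "*"]
    # READ on every consumer group.
    groups = base + ["--operation", "READ", "--group", "*"]
    return [topics, groups]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    for cmd in acl_grant_commands("bin/kafka-acls.sh", "BOOTSTRAP_ADDR",
                                  "client.properties", 123456789012):
        print(" ".join(cmd))
```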
Create a Connect cluster in a different project
When you create a Connect cluster, it shares the same service agent with the Managed Service for Apache Kafka cluster that is in the same project. If this Managed Service for Apache Kafka cluster is designated as the primary Kafka cluster attached to the Connect cluster, no additional permissions are required.
The service agent has the format service-<project_number>@gcp-sa-managedkafka.iam.gserviceaccount.com, where the project number is that of the project containing the Connect cluster and the Managed Service for Apache Kafka cluster.
If your Connect cluster is in project A and the associated
Managed Service for Apache Kafka cluster is in project B, follow these
steps:
- Ensure that the Managed Kafka API is enabled for both project A and project B.
- Identify the service agent of the Connect cluster in project A. The service agent is of the format service-<project_number>@gcp-sa-managedkafka.iam.gserviceaccount.com.
- In project B, grant the Connect cluster's service account the Managed Kafka Client role (roles/managedkafka.client). This role grants the necessary permissions to connect to the Managed Service for Apache Kafka cluster and perform operations like reading and writing data. For more information about how to grant the role, see Create and grant roles to service agents.
Always follow the principle of least privilege when granting permissions. Grant only the necessary permissions to ensure security and prevent unauthorized access.
Properties of a Connect cluster
This section describes the properties of a Connect cluster.
Connect cluster name
The name of the Connect cluster that you are creating. For guidelines on how to name a Connect cluster, see Guidelines to name a Managed Service for Apache Kafka resource. The name of a cluster is immutable.
Primary Kafka cluster
The Managed Service for Apache Kafka cluster associated with your Connect cluster. This associated cluster (primary cluster) stores the state of the connectors running on the Connect cluster. Generally, the primary Managed Service for Apache Kafka cluster also serves as the destination for all source connectors and the input for all sink connectors running on the Connect cluster.
A single Managed Service for Apache Kafka cluster can have multiple Connect clusters. If you choose a Managed Service for Apache Kafka cluster in a different project, ensure appropriate permissions are configured.
You cannot update to a different Kafka cluster after you create the Connect cluster.
Region colocation benefits for latency and network costs
Co-locating your Managed Service for Apache Kafka and Connect clusters in
the same region reduces latency and network costs. For example, assume your
Managed Service for Apache Kafka cluster is in region-a and you're using a
sink connector to write data from this Managed Service for Apache Kafka
cluster (source) to a BigQuery table (sink) that is also in
region-a. If you deploy your Connect cluster in region-a, this deployment
choice minimizes latency for the BigQuery write operation and
eliminates inter-region network transfer costs between the
Managed Service for Apache Kafka cluster and the Connect cluster.
Multi-system latency and cost considerations
Kafka Connect uses connectors to move data between systems. One side of the connector always interacts with a Managed Service for Apache Kafka cluster. A single Kafka Connect cluster can run multiple connectors, each acting as either a source (pulling data from a system) or a sink (pushing data to a system).
While a Connect cluster in the same region as the Managed Service for Apache Kafka cluster benefits from lower communication latency between them, each connector also interacts with another system, such as a BigQuery table or another Kafka cluster. Even if the Connect cluster and the Managed Service for Apache Kafka cluster are co-located, that other system could be in a different region. This leads to higher latency and cost. The overall pipeline latency depends on the locations of all three systems: the Managed Service for Apache Kafka cluster, the Connect cluster, and the source or sink system.
For example, if your Managed Service for Apache Kafka cluster is in
region-a, your Connect cluster in region-b, and you're using a
Cloud Storage connector for a bucket in region-c, you'll be charged for two
network hops (region-a to region-b and then region-b to region-c, or the
reverse depending on the connector direction).
Carefully consider all involved regions when planning your Connect cluster placement to optimize for both latency and cost.
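To compare placement options, it can help to count how many hops in the pipeline cross a region boundary. The following sketch models only the three-system path described above (Kafka cluster, Connect cluster, external source or sink). The function name is hypothetical, and the count is a planning aid, not a billing calculation.

```python
def inter_region_hops(kafka_region: str, connect_region: str, external_region: str) -> int:
    """Count hops that cross regions along Kafka -> Connect -> external system.

    The count is the same for source and sink connectors, because the
    direction of data flow doesn't change which boundaries are crossed.
    """
    path = [kafka_region, connect_region, external_region]
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


if __name__ == "__main__":
    # The example from the text: three different regions means two hops.
    print(inter_region_hops("region-a", "region-b", "region-c"))
    # Co-locating all three systems removes inter-region hops entirely.
    print(inter_region_hops("region-a", "region-a", "region-a"))
```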
Capacity configuration
Capacity configuration requires you to configure the number of vCPUs and the amount of memory for each vCPU for your Connect cluster. You can update the capacity of a Connect cluster after you create it. The following are the properties for capacity configuration:
- vCPUs: The number of vCPUs assigned to a Connect cluster. The minimum value is 3 vCPUs. 
- Memory: The amount of memory that is assigned for each vCPU. You must provision between 1 GiB and 8 GiB per vCPU. The amount of memory can be increased or decreased within these limits after the cluster is created. For example, if you create a cluster with 6 vCPUs, the minimum memory you can allocate to the cluster is 6 GiB (1 GiB per vCPU), and the maximum is 48 GiB (8 GiB per vCPU).
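The capacity limits above reduce to simple arithmetic. The following sketch checks a proposed configuration against them before you submit it. The function names are hypothetical, and the service performs its own validation.

```python
MIN_VCPUS = 3          # minimum vCPUs for a Connect cluster
MIN_GIB_PER_VCPU = 1   # lower memory bound per vCPU
MAX_GIB_PER_VCPU = 8   # upper memory bound per vCPU


def memory_bounds_gib(vcpus: int) -> tuple:
    """Return (minimum, maximum) total memory in GiB for a vCPU count."""
    if vcpus < MIN_VCPUS:
        raise ValueError(f"a Connect cluster needs at least {MIN_VCPUS} vCPUs")
    return vcpus * MIN_GIB_PER_VCPU, vcpus * MAX_GIB_PER_VCPU


def is_valid_capacity(vcpus: int, memory_gib: float) -> bool:
    """Check that total memory falls within the per-vCPU limits."""
    low, high = memory_bounds_gib(vcpus)
    return low <= memory_gib <= high


if __name__ == "__main__":
    print(memory_bounds_gib(6))      # the 6-vCPU example from the text
    print(is_valid_capacity(6, 50))  # exceeds 8 GiB per vCPU
```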
The vCPU and memory allocated to each worker in a Connect cluster have a significant impact on the cluster's performance, capacity, and cost. Here's a breakdown of how vCPU and memory affect a Connect cluster.
vCPU count
- Kafka Connect divides the work of a connector into tasks. Each task can process data in parallel. More vCPUs mean more tasks can run simultaneously, leading to higher throughput. 
- More vCPUs increase the cost of your Connect cluster.
Memory
- Kafka Connect uses memory for buffering data as it flows between connectors and Managed Service for Apache Kafka. More memory allows for larger buffers, which can improve throughput, especially for high-volume data streams. Connectors dealing with very large messages or records require sufficient memory to process them without running into OutOfMemoryError exceptions.
- More memory increases the cost of your Connect cluster.
- Heavy transformation logic requires a larger memory allocation.
Your goal is to pick the right capacity configuration for your Connect cluster. To do this, you must understand the throughput your Connect cluster can handle.
Network configuration
A Virtual Private Cloud (VPC) is a logically isolated section of the Google Cloud network that provides networking for your cloud resources. Subnets are subdivisions of a VPC network, letting you segment your network and control how resources communicate with each other and with external networks.
To ensure your Connect cluster can effectively communicate with your Managed Service for Apache Kafka cluster and other systems, you must configure its network settings. This involves specifying the following properties:
- Worker or primary subnet: The subnet used for allocating IP addresses to the Connect cluster workers (IP consumption). 
- Accessible subnets: The subnets where the Connect cluster workers can access IP addresses of other resources (IP accessibility). 
- Resolvable DNS Domains: DNS addresses that the Connect cluster can resolve to access other services, including other Kafka clusters. 
Worker or primary subnet
The worker subnet, also known as the primary subnet, is used to create a Private Service Connect interface for the Connect cluster workers. Workers send traffic through IP addresses in this subnet, and those IP addresses are allocated solely from the worker subnet's IP range. This subnet also enables Connect clusters to access IP addresses of network objects, such as servers and endpoints, located within it.
Here are some requirements for configuring the worker subnet:
- The worker subnet is required. 
- The worker subnet must reside within a VPC. Only one VPC is supported at this time. 
- The worker subnet must be located in the same region as the Connect cluster. 
We recommend that you select the subnet where your primary Kafka Private Service Connect endpoints exist as the primary subnet.
Accessible subnets
Accessible subnets are the subnets within the VPC network where the Connect cluster workers can access IP addresses. The primary purpose of additional subnets is to extend the Connect cluster worker connectivity to other subnets within the same VPC network. These subnets enable workers to access IP addresses of network objects such as servers and endpoints within those subnets.
The full resource URI path of the subnet is in the
format projects/{project}/regions/{region}/subnetworks/{subnet_id}.
Key guidelines for your subnet configuration:
- Additional subnets are optional. However, if your worker subnet is not the subnet that contains your primary Kafka Private Service Connect endpoints, you must include the Private Service Connect endpoint subnet as an additional subnet to ensure connectivity.
- All subnets must reside within the same VPC. Only one VPC is supported at this time. 
- Additional subnets can be located in regions different from the worker subnet. 
- The maximum number of additional subnets that you can configure is 10. This brings the total number of accessible subnets (worker plus additional) to 11.
- The worker subnet is automatically included in the list of accessible subnets. 
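Several of the guidelines above can be checked from the subnet URI paths alone. The following sketch validates the URI format, the worker subnet's region, and the limit of 10 additional subnets. The function names and sample values are hypothetical. The single-VPC requirement can't be verified from a subnet URI, so this sketch doesn't check it.

```python
import re

# projects/{project}/regions/{region}/subnetworks/{subnet_id}
SUBNET_URI = re.compile(r"projects/([^/]+)/regions/([^/]+)/subnetworks/([^/]+)")
MAX_ADDITIONAL_SUBNETS = 10


def parse_subnet_uri(uri: str) -> dict:
    """Split a full subnet resource URI path into its components."""
    match = SUBNET_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a full subnet URI path: {uri!r}")
    project, region, subnet_id = match.groups()
    return {"project": project, "region": region, "subnet_id": subnet_id}


def accessible_subnets(connect_region, worker_subnet, additional_subnets=()):
    """Validate the subnets and return the resulting accessible list."""
    worker = parse_subnet_uri(worker_subnet)
    if worker["region"] != connect_region:
        raise ValueError("the worker subnet must be in the Connect cluster's region")
    if len(additional_subnets) > MAX_ADDITIONAL_SUBNETS:
        raise ValueError(f"at most {MAX_ADDITIONAL_SUBNETS} additional subnets are allowed")
    for uri in additional_subnets:
        parse_subnet_uri(uri)  # additional subnets may be in other regions
    # The worker subnet is always part of the accessible list.
    return [worker_subnet] + [u for u in additional_subnets if u != worker_subnet]


if __name__ == "__main__":
    worker = "projects/test-project/regions/us-east1/subnetworks/workers"
    extra = "projects/test-project/regions/us-west1/subnetworks/psc-endpoints"
    print(accessible_subnets("us-east1", worker, [extra]))
```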
Resolvable DNS domains
Resolvable DNS domains, also known as DNS domain names, allow DNS addresses in the consumer VPC network to be made available to the tenant VPC. This enables the Connect cluster to resolve DNS names to IP addresses, facilitating communication with other services, including other Kafka clusters for MirrorMaker connectors.
For resolvable DNS domains, you can select a Managed Service for Apache Kafka cluster. You don't need to configure the DNS domain name for the primary Managed Service for Apache Kafka cluster, because its bootstrap address is automatically included in the list of resolvable DNS domains.
You can also specify a DNS domain manually, which is necessary if you use an external Kafka cluster. Every Kafka cluster other than the primary one still requires a configured DNS domain.
Secret Manager resources
Specify the Secret Manager secrets to load into workers. These secrets are stored securely in Secret Manager and made available to your Connect cluster.
You can optionally use secrets in connector configurations. For example, you can load a key file into your Connect cluster and have your connector read the file. Secrets are mounted as files in workers.
Connect clusters integrate directly with Secret Manager. You must use Secret Manager to store and manage your secrets.
The format for specifying a secret is: projects/{PROJECT_ID}/secrets/{SECRET_NAME}/versions/{VERSION_ID}
- PROJECT_ID: The ID of the project where your Secret Manager secret resides.
- SECRET_NAME: The name of the secret in Secret Manager.
- VERSION_ID: The specific version number of the secret. This is a number such as "1", "2", "3".
You can load up to 32 secrets into a single Connect cluster.
Ensure that the service agent running your Connect workers has the Secret Manager Secret Accessor role (roles/secretmanager.secretAccessor) on the secrets you want to use. This role allows the Connect cluster to retrieve the secret values from Secret Manager.
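Secret resource paths can be checked against the format and limit described above before you configure the cluster. This is a minimal sketch with hypothetical function names and sample values. It assumes version IDs are purely numeric, as the VERSION_ID description states.

```python
import re

# projects/{PROJECT_ID}/secrets/{SECRET_NAME}/versions/{VERSION_ID}
SECRET_VERSION = re.compile(r"projects/([^/]+)/secrets/([^/]+)/versions/([0-9]+)")
MAX_SECRETS = 32


def parse_secret_version(path: str) -> dict:
    """Split a secret version path into project, secret name, and version."""
    match = SECRET_VERSION.fullmatch(path)
    if match is None:
        raise ValueError(f"expected an exact numbered secret version: {path!r}")
    project_id, secret_name, version_id = match.groups()
    return {"project_id": project_id, "secret_name": secret_name,
            "version_id": int(version_id)}


def check_secrets(paths) -> list:
    """Validate every path and enforce the per-cluster secret limit."""
    paths = list(paths)
    if len(paths) > MAX_SECRETS:
        raise ValueError(f"a Connect cluster can load at most {MAX_SECRETS} secrets")
    return [parse_secret_version(p) for p in paths]


if __name__ == "__main__":
    # "test-project" and "db-password" are placeholder names.
    print(parse_secret_version("projects/test-project/secrets/db-password/versions/2"))
```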
Labels
Labels are key-value pairs that help you organize and identify Connect clusters. You can attach labels to each Connect cluster, then filter the resources based on their labels. Examples of labels are environment:prod and application:web-app.
Create a Connect cluster
Before you create a cluster, review the documentation for Connect cluster properties.
Creating a Connect cluster takes 20 to 30 minutes.
Console
- In the Google Cloud console, go to the Connect Clusters page. 
- Click Create. The Create a Connect cluster page opens.
- For the Connect cluster name, enter a string. For more information about how to name a Connect cluster, see Guidelines to name a Managed Service for Apache Kafka resource.
- For Primary Kafka cluster, select a Managed Service for Apache Kafka cluster from the menu. For more information about the functions that this Managed Service for Apache Kafka cluster performs, see Primary Kafka cluster.
- For Location, select a supported location from the Region menu or retain the default value. For more information about how to select the right location, see Primary Kafka cluster.
- For Capacity configuration, enter values for vCPUs and Memory or retain the default values.
  - For vCPUs, enter the number of virtual CPUs for the cluster.
  - For Memory, enter the amount of memory per CPU in GiB. An error message is displayed if the memory per CPU is greater than 8 GiB.
  For more information about how to size a Managed Service for Apache Kafka cluster, see Capacity configuration.
- For Network configuration, from the Network menu, select or retain the network of the primary Managed Service for Apache Kafka cluster.
- For Worker subnet, select or retain the subnet from the menu. The Subnet URI path field is automatically populated.
- For Accessible subnets, the worker subnet is automatically added as an accessible subnet. To add additional subnets, click Add a connected subnet and then do the following:
  - Subnet: Select the subnet from the menu.
  - Subnet URI path: This field is automatically filled, or you can enter a full subnet resource URI path here. The format of the subnet is projects/{PROJECT_ID}/regions/{REGION}/subnetworks/{SUBNET_ID}.
  - Click Done.
- (Optional) Add additional subnets by clicking Add a connected subnet. You can add up to ten additional subnets. For more information about the subnets, see Worker subnet and Accessible subnets.
- For Resolvable DNS domains, the DNS domain of the primary Kafka cluster is automatically added as a resolvable DNS domain. To add additional DNS domains, expand the section if needed.
  - Click Add a DNS domain.
  - Select a Kafka cluster from the menu. The DNS domain is automatically filled. You can also type in the DNS domain name for an external Kafka cluster.
  - Click Done.
- For Secret manager resources, expand the section if needed.
  - Click Add secret resource.
  - Select a secret from the Secret menu and a version from the Secret version menu. You can also create a new secret. Ensure that the service agent running your Connect workers has the Secret Manager Secret Accessor role on the secrets you want to use. For more information about Secret Manager, see Secret Manager resources.
  - Click Done.
  - Click Add secret resource if you need to add more secrets.
- For Labels, expand the section if needed. To organize your project, add arbitrary labels as key/value pairs to your resources. Click Add Label to include different environments, services, owners, teams, and so on.
- Click Create. 
gcloud
- In the Google Cloud console, activate Cloud Shell. At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
- Run the gcloud managed-kafka connect-clusters create command:
  gcloud managed-kafka connect-clusters create CONNECT_CLUSTER_ID \
      --location=LOCATION \
      --cpu=CPU \
      --memory=MEMORY \
      --primary-subnet=PRIMARY_SUBNET \
      --kafka-cluster=KAFKA_CLUSTER \
      [--additional-subnet=ADDITIONAL_SUBNET] \
      [--project=PROJECT_ID] \
      [--secret=SECRET] \
      [--dns-name=DNS_DOMAIN_NAME] \
      [--config-file=CONFIG_FILE] \
      [--labels=LABELS] [--async]
  Replace the following:
  - CONNECT_CLUSTER_ID: The ID or name of the Connect cluster. For guidelines on how to name a Connect cluster, see Guidelines to name a Managed Service for Apache Kafka resource. The name of a Connect cluster is immutable.
  - LOCATION: The location where you create the Connect cluster. This must be a supported Google Cloud region. You cannot change a Connect cluster's location after creation. For a list of available locations, see Managed Service for Apache Kafka locations. For more information about location recommendations, see Primary Kafka cluster.
  - CPU: The number of vCPUs for the Connect cluster. The minimum value is 3 vCPUs. See vCPU count.
  - MEMORY: The amount of memory for the Connect cluster. Use "MB", "MiB", "GB", "GiB", "TB", or "TiB" units. For example, "3GiB". You must provision between 1 GiB and 8 GiB per vCPU. See Memory.
  - PRIMARY_SUBNET: The primary subnet for the Connect cluster. The format of the subnet is projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_ID. The primary subnet must be in the same region as the Connect cluster. See Connected subnets.
  - ADDITIONAL_SUBNET: (Optional) Additional subnets for the Connect cluster. The other subnets can be in a different region than the Connect cluster, but must be in the same VPC network. See Connected subnets.
  - PROJECT_ID: (Optional) The ID of the Google Cloud project. If not provided, the current project is used.
  - KAFKA_CLUSTER: The ID or fully qualified name of the primary Managed Service for Apache Kafka cluster associated with the Connect cluster. See Kafka cluster. The format of the Kafka cluster is projects/PROJECT_ID/locations/LOCATION/clusters/CLUSTER_ID. You cannot update to a different Kafka cluster after you create the Connect cluster.
  - SECRET: (Optional) Secrets to load into workers. You must provide exact secret versions from Secret Manager; aliases are not supported. Up to 32 secrets may be loaded into one cluster. Format: projects/PROJECT_ID/secrets/SECRET_NAME/versions/VERSION_ID
  - DNS_DOMAIN_NAME: (Optional) DNS domain names from the subnet to be made visible to the Connect cluster. The Connect cluster can access resources using domain names instead of relying on IP addresses. See DNS peering.
  - LABELS: (Optional) A list of label KEY=VALUE pairs to associate with the cluster. Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers. For more information about the format for labels, see Labels.
  - CONFIG_FILE: (Optional) The path to the JSON or YAML file containing the configuration that is overridden from the cluster or connector defaults. This flag also supports inline JSON or YAML.
  - --async: (Optional) Return immediately, without waiting for the operation in progress to complete. With the --async flag, you can continue with other tasks while the cluster creation happens in the background. If you don't use the flag, the system waits for the operation to complete before returning a response, so you have to wait until the cluster is fully created before you can continue with other tasks.
  You get a response similar to the following:
  Create request issued for: [sample-connectcluster]
  Check operation [projects/test-project/locations/us-east1/operations/operation-1753590328249-63ae19098cc06-64300a0a-06512d02] for status.
  Store the OPERATION_ID to track progress. For example, the value here is operation-1753590328249-63ae19098cc06-64300a0a-06512d02.
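If you script cluster creation, you can assemble the command from the flags described above and validate labels up front. The following is a sketch under stated assumptions: the function name and sample values are hypothetical, only a subset of the optional flags is covered, and joining multiple labels with commas follows the usual gcloud convention rather than anything stated on this page.

```python
import re

LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]*")   # starts with a lowercase character
LABEL_VALUE = re.compile(r"[a-z0-9_-]*")      # hyphens, underscores, lowercase, numbers


def build_create_command(cluster_id, location, cpu, memory, primary_subnet,
                         kafka_cluster, labels=None, run_async=False):
    """Assemble the gcloud argument list for creating a Connect cluster."""
    args = [
        "gcloud", "managed-kafka", "connect-clusters", "create", cluster_id,
        f"--location={location}",
        f"--cpu={cpu}",
        f"--memory={memory}",
        f"--primary-subnet={primary_subnet}",
        f"--kafka-cluster={kafka_cluster}",
    ]
    if labels:
        for key, value in labels.items():
            if not LABEL_KEY.fullmatch(key):
                raise ValueError(f"invalid label key: {key!r}")
            if not LABEL_VALUE.fullmatch(value):
                raise ValueError(f"invalid label value: {value!r}")
        # Assumption: multiple KEY=VALUE pairs are comma-separated.
        args.append("--labels=" + ",".join(f"{k}={v}" for k, v in labels.items()))
    if run_async:
        args.append("--async")
    return args


if __name__ == "__main__":
    # Placeholder values for illustration; pass the list to subprocess.run to execute.
    print(build_create_command(
        "sample-connectcluster", "us-east1", 3, "3GiB",
        "projects/test-project/regions/us-east1/subnetworks/workers",
        "sample-cluster", labels={"environment": "prod"}, run_async=True))
```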
Terraform
You can use a Terraform resource to create a Connect cluster.
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
Go
Before trying this sample, follow the Go setup instructions in Install the client libraries. For more information, see the Managed Service for Apache Kafka Go API reference documentation.
To authenticate to Managed Service for Apache Kafka, set up Application Default Credentials (ADC). For more information, see Set up ADC for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in Install the client libraries. For more information, see the Managed Service for Apache Kafka Java API reference documentation.
To authenticate to Managed Service for Apache Kafka, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in Install the client libraries. For more information, see the Managed Service for Apache Kafka Python API reference documentation.
To authenticate to Managed Service for Apache Kafka, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Monitor the cluster creation operation
You can run the following command only if you ran the gcloud CLI for creating the Connect cluster.
Creating a Connect cluster usually takes 20-30 minutes. To track progress of the cluster creation, the gcloud managed-kafka connect-clusters create command uses a long-running operation (LRO), which you can monitor using the following command:
gcloud managed-kafka operations describe OPERATION_ID \
    --location=LOCATION
Replace the following:
- OPERATION_ID with the value of the operation ID from the previous section.
- LOCATION with the value of the location from the previous section.
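In a script, you can pull the operation ID and location out of the create response instead of copying them by hand. This sketch assumes that the response keeps the "Check operation [...]" shape shown in the sample output on this page. The function names are hypothetical.

```python
import re

# Matches the bracketed operation path in the sample create response.
OPERATION_PATH = re.compile(
    r"Check operation \[projects/([^/]+)/locations/([^/]+)/operations/([^\]\s]+)\]"
)


def parse_operation(output: str) -> dict:
    """Extract the project, location, and operation ID from the create response."""
    match = OPERATION_PATH.search(output)
    if match is None:
        raise ValueError("no operation path found in the output")
    project, location, operation_id = match.groups()
    return {"project": project, "location": location, "operation_id": operation_id}


def describe_command(operation: dict) -> list:
    """Build the argument list for the describe command shown above."""
    return ["gcloud", "managed-kafka", "operations", "describe",
            operation["operation_id"], f"--location={operation['location']}"]


if __name__ == "__main__":
    sample = ("Create request issued for: [sample-connectcluster]\n"
              "Check operation [projects/test-project/locations/us-east1/operations/"
              "operation-1753590328249-63ae19098cc06-64300a0a-06512d02] for status.")
    print(" ".join(describe_command(parse_operation(sample))))
```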