Replicate Kafka topics with MirrorMaker 2.0

This tutorial shows how to replicate data between Managed Service for Apache Kafka clusters by using the MirrorMaker 2.0 Source connector.

The clusters can be in the same project or in different projects, and they can be in the same region or in different regions. In this tutorial, you set up replication across regions within the same project, but the steps are the same for other combinations.

Architecture

For this scenario, there are three clusters:

  • A Kafka cluster whose data is being replicated. This cluster is called the source cluster, because it's the source of the data.

  • A Kafka cluster where the replicated data is written. This cluster is called the target cluster.

  • A Connect cluster, which lets you create and manage the MirrorMaker 2.0 Source connector.

A Connect cluster has a primary Kafka cluster, which is the Kafka cluster associated with the Connect cluster. To minimize latency for write operations, it's recommended to designate the target cluster as the primary cluster, and to put the Connect cluster in the same region as the target cluster.

The following diagram shows these components:

Architecture of the Kafka data replication scenario

Data replication is performed by the MirrorMaker 2.0 Source connector. Two other MirrorMaker 2.0 connectors are optional for this scenario:

  • MirrorMaker 2.0 Checkpoint connector: Enables seamless failover by ensuring that consumers on the target cluster can resume processing from the same point that they reached on the source cluster.

  • MirrorMaker 2.0 Heartbeat connector: Generates periodic heartbeat messages on the source Kafka cluster, which lets you monitor the health and status of the data replication.

This tutorial does not use these optional connectors. For more information, see When to use MirrorMaker 2.0.
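
Although this tutorial doesn't use the optional connectors, they follow the same creation pattern as the Source connector that you create later in this tutorial; only the connector class and a few settings differ. The following gcloud command is a minimal sketch of a Checkpoint connector, using the same placeholders that are defined later in this tutorial; the connector name and the groups filter are illustrative assumptions:

gcloud managed-kafka connectors create mm2-checkpoint-connector \
  --location=TARGET_REGION \
  --connect-cluster=CONNECT_CLUSTER \
  --configs=connector.class=org.apache.kafka.connect.mirror.MirrorCheckpointConnector,\
source.cluster.alias=source,\
source.cluster.bootstrap.servers=bootstrap.SOURCE_KAFKA_CLUSTER.SOURCE_REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
target.cluster.alias=target,\
target.cluster.bootstrap.servers=bootstrap.TARGET_KAFKA_CLUSTER.TARGET_REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
groups=.*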

Create the source Kafka cluster

In this step, you create a Managed Service for Apache Kafka cluster. This cluster is the source cluster, which contains the data to replicate.

Console

  1. Go to the Managed Service for Apache Kafka > Clusters page.

    Go to Clusters

  2. Click Create.

  3. In the Cluster name box, enter a name for the cluster.

  4. In the Region list, select a location for the cluster.

  5. For Network configuration, configure the subnet where the cluster is accessible.

    1. For Project, select your project.
    2. For Network, select a VPC network in your project.
    3. For Subnet, select a subnet.
    4. Click Done.
  6. Click Create.

While the cluster is being created, the cluster state is Creating. When creation is complete, the state changes to Active.

gcloud

To create the source Kafka cluster, run the gcloud managed-kafka clusters create command.

gcloud managed-kafka clusters create SOURCE_KAFKA_CLUSTER \
  --location=SOURCE_REGION \
  --cpu=3 \
  --memory=3GiB \
  --subnets=projects/PROJECT_ID/regions/SOURCE_REGION/subnetworks/SOURCE_SUBNET \
  --async

Replace the following:

  • SOURCE_KAFKA_CLUSTER: a name for the Kafka cluster
  • SOURCE_REGION: the location of the cluster

    For information about supported locations, see Managed Service for Apache Kafka locations.

  • PROJECT_ID: your project ID

  • SOURCE_SUBNET: the subnet where you want to deploy the cluster; for example, default

The command runs asynchronously and returns an operation ID:

Check operation [projects/PROJECT_ID/locations/SOURCE_REGION/operations/OPERATION_ID] for status.

To track the progress of the create operation, use the gcloud managed-kafka operations describe command:

gcloud managed-kafka operations describe OPERATION_ID \
  --location=SOURCE_REGION

For more information, see Monitor the cluster creation operation.
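
When the operation completes, you can optionally confirm that the cluster is ready before you continue. The following check is a minimal sketch; it assumes that the cluster resource exposes its status as the state field of the describe output, and it prints ACTIVE when the cluster is ready:

gcloud managed-kafka clusters describe SOURCE_KAFKA_CLUSTER \
  --location=SOURCE_REGION \
  --format="value(state)"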

Create a Kafka topic

When the source cluster is ready, create a topic as follows:

Console

  1. Go to the Managed Service for Apache Kafka > Clusters page.

    Go to Clusters

  2. Click the name of the cluster.

  3. In the cluster details page, click Create Topic.

  4. In the Topic name box, enter a name for the topic.

  5. Click Create.

gcloud

To create a Kafka topic, run the gcloud managed-kafka topics create command.

gcloud managed-kafka topics create TOPIC_NAME \
  --cluster=SOURCE_KAFKA_CLUSTER \
  --location=SOURCE_REGION \
  --partitions=10 \
  --replication-factor=3

Replace the following:

  • TOPIC_NAME: the name of the Kafka topic to create
  • SOURCE_KAFKA_CLUSTER: the name of the source cluster
  • SOURCE_REGION: the region where you created the source cluster

Create the target Kafka cluster

In this step, you create a second Managed Service for Apache Kafka cluster. This cluster is the target cluster, to which MirrorMaker 2.0 replicates data.

Console

  1. Go to the Managed Service for Apache Kafka > Clusters page.

    Go to Clusters

  2. Click Create.

  3. In the Cluster name box, enter a name for the cluster.

  4. In the Region list, select a location for the cluster. Choose a different region from the one where the source cluster is located.

  5. For Network configuration, configure the subnet where the target cluster is accessible. The subnet must be in the same VPC network as the source cluster's subnet.

    1. For Project, select your project.
    2. For Network, select the same VPC network as the source cluster's subnet.
    3. For Subnet, select the subnet.
    4. Click Done.
  6. Click Create.

While the cluster is being created, the cluster state is Creating. When creation is complete, the state changes to Active.

gcloud

To create the target Kafka cluster, run the gcloud managed-kafka clusters create command.

gcloud managed-kafka clusters create TARGET_KAFKA_CLUSTER \
  --location=TARGET_REGION \
  --cpu=3 \
  --memory=3GiB \
  --subnets=projects/PROJECT_ID/regions/TARGET_REGION/subnetworks/TARGET_SUBNET \
  --async

Replace the following:

  • TARGET_KAFKA_CLUSTER: a name for the Kafka cluster
  • TARGET_REGION: the location of the cluster; choose a different region from the one where the source cluster is located.

    For information about supported locations, see Managed Service for Apache Kafka locations.

  • PROJECT_ID: your project ID

  • TARGET_SUBNET: the subnet where you want to deploy the cluster; for example, default

    Select a subnet in the same VPC as the source cluster.
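
As with the source cluster, the command runs asynchronously, and you can track the returned operation with the gcloud managed-kafka operations describe command, this time with --location=TARGET_REGION. Because the next step requires the target cluster to be fully created, you might want to wait until the cluster state is Active. The following loop is a minimal sketch that assumes the cluster's status is exposed as the state field of the describe output:

while [ "$(gcloud managed-kafka clusters describe TARGET_KAFKA_CLUSTER \
    --location=TARGET_REGION --format='value(state)')" != "ACTIVE" ]; do
  echo "Waiting for the target cluster to become active..."
  sleep 60
done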

Create a Connect cluster

In this step, you create a Connect cluster. Creating a Connect cluster usually takes 20-30 minutes.

Before you start this step, make sure the target Kafka cluster from the previous step is fully created.

Console

  1. Go to the Managed Service for Apache Kafka > Connect Clusters page.

    Go to Connect Clusters

  2. Click Create.

  3. For the Connect cluster name, enter a string. Example: my-connect-cluster.

  4. For Primary Kafka cluster, select the target Kafka cluster that you created in the previous step. (Do not select your source Kafka cluster.)

  5. For Location, Network configuration, and Worker subnet, you can use the default values or customize them for your requirements.

  6. To enable the Connect cluster to access the subnet of the source cluster, perform the following steps:

    1. Expand Accessible subnets.

    2. Click Add a connected subnet.

    3. For Subnet URI path, enter the subnet of the source Kafka cluster. Use the following format:

    projects/PROJECT_ID/regions/SOURCE_REGION/subnetworks/SOURCE_SUBNET
    

    Replace the following:

    • PROJECT_ID: your project ID
    • SOURCE_REGION: the region where you created the source Kafka cluster
    • SOURCE_SUBNET: the name of the subnet for the source Kafka cluster
  7. To enable the Connect cluster to resolve the DNS domain of the source cluster, perform the following steps:

    1. Expand Resolvable DNS domains.

    2. Click Add a DNS domain.

    3. In the Kafka cluster list, select the source Kafka cluster.

  8. Click Create.

While the cluster is being created, the cluster state is Creating. When creation is complete, the state changes to Active.

gcloud

To create a Connect cluster, run the gcloud managed-kafka connect-clusters create command.

gcloud managed-kafka connect-clusters create CONNECT_CLUSTER \
  --location=TARGET_REGION \
  --cpu=12 \
  --memory=12GiB \
  --primary-subnet=projects/PROJECT_ID/regions/TARGET_REGION/subnetworks/TARGET_SUBNET \
  --kafka-cluster=TARGET_KAFKA_CLUSTER \
  --dns-name=SOURCE_KAFKA_CLUSTER.SOURCE_REGION.managedkafka.PROJECT_ID.cloud.goog. \
  --additional-subnet=projects/PROJECT_ID/regions/SOURCE_REGION/subnetworks/SOURCE_SUBNET \
  --async

Replace the following:

  • CONNECT_CLUSTER: a name for the Connect cluster
  • TARGET_REGION: the region where you created the target Kafka cluster
  • PROJECT_ID: your project ID
  • TARGET_SUBNET: the subnet where you created the target Kafka cluster
  • TARGET_KAFKA_CLUSTER: the name of the target Kafka cluster
  • SOURCE_KAFKA_CLUSTER: the name of the source Kafka cluster
  • SOURCE_REGION: the region where you created the source Kafka cluster
  • SOURCE_SUBNET: the name of the subnet for the source Kafka cluster

The command runs asynchronously and returns an operation ID:

Check operation [projects/PROJECT_ID/locations/TARGET_REGION/operations/OPERATION_ID] for status.

To track the progress of the create operation, use the gcloud managed-kafka operations describe command:

gcloud managed-kafka operations describe OPERATION_ID \
  --location=TARGET_REGION

For more information, see Monitor the cluster creation operation.
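
Because Connect cluster creation can take 20-30 minutes, you might want to confirm that the cluster has reached the Active state before you create the connector. The following check is a minimal sketch that assumes the connect-clusters command group exposes the cluster status through a describe command, similar to Kafka clusters:

gcloud managed-kafka connect-clusters describe CONNECT_CLUSTER \
  --location=TARGET_REGION \
  --format="value(state)"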

Create a MirrorMaker 2.0 Source connector

In this step, you create a MirrorMaker 2.0 Source connector. This connector replicates messages from the source Kafka cluster to the target Kafka cluster.

Console

  1. Go to the Managed Service for Apache Kafka > Connect Clusters page.

    Go to Connect Clusters

  2. Click the name of the Connect cluster.

  3. Click Create connector.

  4. For the Connector name, enter a string. Example: mm2-connector.

  5. In the Connector plugin list, select MirrorMaker 2.0 Source.

  6. Select Use primary Kafka cluster as target cluster.

  7. For Source cluster, select Managed Service for Apache Kafka Cluster.

  8. In the Kafka cluster list, select your source cluster.

  9. In the Comma-separated topic names or topic regex box, enter the name of the Kafka topic or topics that you want to replicate.

  10. Click Create.

gcloud

To create a MirrorMaker 2.0 Source connector, run the gcloud managed-kafka connectors create command.

gcloud managed-kafka connectors create CONNECTOR_NAME \
  --location=TARGET_REGION \
  --connect-cluster=CONNECT_CLUSTER \
  --configs=connector.class=org.apache.kafka.connect.mirror.MirrorSourceConnector,\
source.cluster.alias=source,\
source.cluster.bootstrap.servers=bootstrap.SOURCE_KAFKA_CLUSTER.SOURCE_REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
target.cluster.alias=target,\
target.cluster.bootstrap.servers=bootstrap.TARGET_KAFKA_CLUSTER.TARGET_REGION.managedkafka.PROJECT_ID.cloud.goog:9092,\
tasks.max=3,\
topics=TOPIC_NAME

Replace the following:

  • CONNECTOR_NAME: a name for the connector, such as mm2-connector
  • TARGET_REGION: the region where you created the Connect cluster and the target Kafka cluster
  • CONNECT_CLUSTER: the name of your Connect cluster
  • SOURCE_KAFKA_CLUSTER: the name of the source Kafka cluster
  • SOURCE_REGION: the region where you created the source Kafka cluster
  • PROJECT_ID: your project ID
  • TARGET_KAFKA_CLUSTER: the name of the target Kafka cluster
  • TOPIC_NAME: the name of the topic to replicate. This parameter can also specify a comma-separated list of topic names or a regular expression.

The MirrorMaker 2.0 Source connector creates a new topic in the target cluster named source.TOPIC_NAME, where the source. prefix comes from the source.cluster.alias setting and TOPIC_NAME is the name of the topic in the source cluster.
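
After the connector is created, you can optionally inspect it before producing test data. The following is a minimal sketch that assumes the connectors command group provides a describe command that takes the same --connect-cluster and --location flags as the create command:

gcloud managed-kafka connectors describe CONNECTOR_NAME \
  --connect-cluster=CONNECT_CLUSTER \
  --location=TARGET_REGION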

View results

To verify that messages are being replicated, you can use the Kafka command line tools. For information about setting up the Kafka CLI, see Set up a client machine in the document Produce and consume messages with the CLI.

For example, to produce test messages to the topic on the source cluster, enter the following at the command line:

export BOOTSTRAP=bootstrap.SOURCE_KAFKA_CLUSTER.SOURCE_REGION.managedkafka.PROJECT_ID.cloud.goog:9092

for msg in {1..10}; do
  echo "message $msg"
done | kafka-console-producer.sh --topic TOPIC_NAME \
       --bootstrap-server $BOOTSTRAP --producer.config client.properties

To read the replicated messages from the target cluster, enter the following at the command line:

export BOOTSTRAP=bootstrap.TARGET_KAFKA_CLUSTER.TARGET_REGION.managedkafka.PROJECT_ID.cloud.goog:9092

kafka-console-consumer.sh --topic source.TOPIC_NAME --from-beginning \
 --bootstrap-server $BOOTSTRAP --consumer.config client.properties

The output looks like the following:

message 1
message 2
message 3
message 4
[...]
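
You can also confirm that the replicated topic exists on the target cluster by listing its topics with the Kafka CLI. The following sketch reuses the target cluster's bootstrap address exported previously and assumes that the same client.properties file works for the admin client:

kafka-topics.sh --list --bootstrap-server $BOOTSTRAP \
  --command-config client.properties

The output should include the source.TOPIC_NAME topic.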