Kafka Backup and Restore

To back up and restore Kafka topic data, the Adobe S3 Kafka connector is used. It periodically polls data from Kafka and uploads it to S3. Each chunk of data is stored as an S3 object. More details about the connector can be found here.

During restore, the messages in each topic are purged before the restore operation is performed. This ensures that the topic configuration remains the same after restoration.

Assumptions

A ConfigMap containing the parameters for the connector is present in the cluster.
Topics should be present in the Kafka cluster before taking the backup.
No consumer should be consuming messages from the topic during restore.

Set Up Kafka Cluster

If you have not already done so, add the Strimzi Helm repository to your local configuration:

# Add strimzi helm repo
$ helm repo add strimzi https://strimzi.io/charts/
$ helm repo update

Install the Strimzi Cluster Operator from the strimzi Helm repository:

# create namespace
$ kubectl create namespace kafka-test
$ helm install kafka-release strimzi/strimzi-kafka-operator --namespace kafka-test

Set up a Kafka cluster with one ZooKeeper and one Kafka broker instance:

$ kubectl --namespace kafka-test apply -f \
     https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/kafka-cluster.yaml
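
The brokers take some time to start after the manifest is applied, so it can help to wait for the cluster before creating topics. The following is a minimal sketch, not part of the original example. It assumes the Kafka resource is named my-cluster (inferred from the my-cluster-kafka-bootstrap service name used in this guide), and the 300-second timeout is an arbitrary choice. The helper name wait_for_kafka is hypothetical.

```shell
# Sketch: block until the Strimzi-managed Kafka resource reports Ready.
# Assumptions: resource name "my-cluster" (inferred from the bootstrap
# service name) and an arbitrary 300s timeout.
wait_for_kafka() {
  namespace="${1:-kafka-test}"
  cluster="${2:-my-cluster}"
  kubectl --namespace "$namespace" wait "kafka/$cluster" \
    --for=condition=Ready --timeout=300s
}

# Usage: wait_for_kafka kafka-test my-cluster
```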

Add some data to the Kafka topic blogs using the Kafka image strimzi/kafka:0.20.0-kafka-2.6.0 provided by Strimzi:

# Create a topic on Kafka server
$ kubectl -n kafka-test run kafka-topic -ti --image=strimzi/kafka:0.20.0-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-topics.sh --create --topic blogs --bootstrap-server my-cluster-kafka-bootstrap:9092
# Create a producer to push events to blogs topic
$ kubectl -n kafka-test run kafka-producer -ti --image=strimzi/kafka:0.20.0-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic blogs
>{"userId": 1,"id": 1,"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit"}
>{"userId": 1,"id": 2,"title": "qui est esse"}
>{"userId": 1,"id": 3,"title": "ea molestias quasi exercitationem repellat qui ipsa sit aut"}

Note

To back up multiple topics, add the comma-separated topic names to adobe-s3-sink.properties.
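
As an illustration of the note above, the topic list could be edited with a small helper once adobe-s3-sink.properties has been downloaded. This is a sketch under assumptions: it assumes the file uses the standard Kafka Connect sink key topics, and both the helper name set_backup_topics and the second topic feeds are hypothetical.

```shell
# Sketch: set the comma-separated topic list in a sink properties file.
# Assumption: the file uses the standard Kafka Connect "topics" key.
set_backup_topics() {
  file="$1"
  topics="$2"   # e.g. "blogs,feeds" ("feeds" is a hypothetical topic)
  if grep -q '^topics=' "$file"; then
    # Rewrite the existing line; a temp file avoids GNU/BSD "sed -i" differences.
    sed "s|^topics=.*|topics=${topics}|" "$file" > "$file.tmp" \
      && mv "$file.tmp" "$file"
  else
    # No topics line yet: append one (a leading blank line is harmless).
    printf '\ntopics=%s\n' "$topics" >> "$file"
  fi
}

# Usage: set_backup_topics ./adobe-s3-sink.properties "blogs,feeds"
```

Every topic listed must already exist in the Kafka cluster before the backup is taken.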

Create ConfigMap

A ConfigMap with the following configuration should be provided to the Kafka connector:

Details of the S3 bucket and the Kafka broker address
adobe-s3-sink.properties file containing properties related to the S3 sink connector
adobe-s3-source.properties file containing properties related to the S3 source connector
adobe-kafkaConfiguration.properties file containing properties related to the Kafka server

$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-s3-sink.properties
$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-kafkaConfiguration.properties
$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-s3-source.properties
$ kubectl create configmap --namespace kafka-test s3config --from-file=adobe-s3-sink.properties=./adobe-s3-sink.properties \
     --from-file=adobe-kafkaConfiguration.properties=./adobe-kafkaConfiguration.properties --from-file=adobe-s3-source.properties=./adobe-s3-source.properties \
     --from-literal=timeinSeconds=1800
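
Before moving on, it may be worth confirming that the ConfigMap contains every key passed to the create command above. The following is a sketch only. The helper name configmap_missing_keys is hypothetical, and the expected key names are taken from the --from-file and --from-literal arguments in this guide.

```shell
# Sketch: report any expected key that is absent from the ConfigMap.
# Returns 0 if all keys are present, 1 if any are missing, 2 if kubectl fails.
configmap_missing_keys() {
  namespace="${1:-kafka-test}"
  name="${2:-s3config}"
  # The go-template prints one data key per line.
  keys="$(kubectl --namespace "$namespace" get configmap "$name" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')" || return 2
  missing=0
  for expected in adobe-s3-sink.properties adobe-s3-source.properties \
      adobe-kafkaConfiguration.properties timeinSeconds; do
    if ! printf '%s\n' "$keys" | grep -Fxq "$expected"; then
      echo "missing key: $expected"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: configmap_missing_keys kafka-test s3config && echo "ConfigMap looks complete"
```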

Create Blueprint

$ kubectl --namespace kasten-io apply -f \
    https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/kafka-blueprint.yaml

Once the Blueprint is created, annotate the ConfigMap with the following annotation to instruct K10 to use this Blueprint when performing backup and restore operations on the Kafka instance.

$ kubectl -n kafka-test annotate configmaps/s3config kanister.kasten.io/blueprint=kafka-blueprint
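
To check that the annotation was applied, the value can be read back with a JSONPath query. This is a minimal sketch. The helper name blueprint_annotation is hypothetical, and the dots in the annotation key are escaped with backslashes as kubectl's JSONPath syntax requires.

```shell
# Sketch: print the Blueprint annotation stored on the ConfigMap.
# Dots inside the annotation key must be escaped in the JSONPath expression.
blueprint_annotation() {
  namespace="${1:-kafka-test}"
  name="${2:-s3config}"
  kubectl --namespace "$namespace" get configmap "$name" \
    -o jsonpath='{.metadata.annotations.kanister\.kasten\.io/blueprint}'
}

# Usage: [ "$(blueprint_annotation)" = "kafka-blueprint" ] && echo "annotation set"
```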

Finally, use K10 to back up and restore the application.