Kafka Backup and Restore
To back up and restore Kafka topic data, the Adobe S3 Kafka connector is used. It periodically polls data from Kafka and uploads it to S3. Each chunk of data is stored as an S3 object. More details are available in the connector's documentation.
During restore, topic messages are purged before the restore operation is performed. This ensures that the topic configuration remains the same after restoration.
Assumptions
A ConfigMap containing the parameters for the connector is present in the cluster.
Topics should be present in the Kafka cluster before taking the backup.
No consumer should be consuming messages from the topic during restore.
Setup Kafka Cluster
If it hasn't been done already, the strimzi Helm repository needs to be added to your local configuration:
# Add strimzi helm repo
$ helm repo add strimzi https://strimzi.io/charts/
$ helm repo update
Install the Strimzi Cluster Operator from the strimzi Helm repository:
# create namespace
$ kubectl create namespace kafka-test
$ helm install kafka-release strimzi/strimzi-kafka-operator --namespace kafka-test
Set up a Kafka cluster with one ZooKeeper and one Kafka broker instance:
$ kubectl --namespace kafka-test apply -f \
https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/kafka-cluster.yaml
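Applying the manifest returns immediately, while the brokers take some time to start. The following is a minimal sketch of how to wait for the cluster before creating topics. It assumes the Kafka resource is named my-cluster (inferred from the my-cluster-kafka-bootstrap service used in the commands on this page) and that a 300 second timeout is sufficient; both are assumptions, not values confirmed by the manifest. The function name wait_for_kafka is hypothetical.

```shell
# Block until the Strimzi Kafka resource reports the Ready condition.
# Defaults are assumptions: namespace kafka-test, cluster my-cluster, 300s timeout.
wait_for_kafka() {
  local namespace="${1:-kafka-test}"
  local cluster="${2:-my-cluster}"
  local timeout="${3:-300s}"
  kubectl wait "kafka/${cluster}" --for=condition=Ready \
    --timeout="${timeout}" --namespace "${namespace}"
}
```

Calling wait_for_kafka with no arguments uses the defaults above; a non-zero exit status means the cluster did not become ready within the timeout.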
Add some data to the Kafka topic blogs using the Kafka image strimzi/kafka:0.20.0-kafka-2.6.0 provided by Strimzi:
# Create a topic on Kafka server
$ kubectl -n kafka-test run kafka-topic -ti --image=strimzi/kafka:0.20.0-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-topics.sh --create --topic blogs --bootstrap-server my-cluster-kafka-bootstrap:9092
# Create a producer to push events to blogs topic
$ kubectl -n kafka-test run kafka-producer -ti --image=strimzi/kafka:0.20.0-kafka-2.6.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic blogs
>{"userId": 1,"id": 1,"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit"}
>{"userId": 1,"id": 2,"title": "qui est esse"}
>{"userId": 1,"id": 3,"title": "ea molestias quasi exercitationem repellat qui ipsa sit aut"}
Note
To back up multiple topics, add comma-separated topic names in adobe-s3-sink.properties.
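As an illustration, the topic list can be updated with a small helper once adobe-s3-sink.properties has been downloaded (see Create ConfigMap). This sketch assumes the file uses the standard Kafka Connect sink property topics= to list the topics; check the downloaded file to confirm. The function name set_sink_topics and the second topic name feeds are hypothetical.

```shell
# Replace the value of the "topics=" line in a sink properties file.
# Assumption: the file contains a line starting with "topics=".
set_sink_topics() {
  local file="$1"
  local topics="$2"
  # Stop early if there is no topics= line to replace.
  if ! grep -q '^topics=' "${file}"; then
    echo "no topics= line found in ${file}" >&2
    return 1
  fi
  # Write to a temporary file first so the original is replaced only on success.
  sed "s/^topics=.*/topics=${topics}/" "${file}" > "${file}.tmp" &&
    mv "${file}.tmp" "${file}"
}
```

For example, set_sink_topics ./adobe-s3-sink.properties "blogs,feeds" would configure the sink to back up both topics. All listed topics must already exist in the Kafka cluster.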
Create ConfigMap
A ConfigMap with the following configuration should be provided to the Kafka connector:
Details of the S3 bucket and Kafka broker address
adobe-s3-sink.properties: file containing properties related to the S3 sink connector
adobe-s3-source.properties: file containing properties related to the S3 source connector
adobe-kafkaConfiguration.properties: file containing properties related to the Kafka server
$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-s3-sink.properties
$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-kafkaConfiguration.properties
$ wget https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/adobe-s3-source.properties
$ kubectl create configmap --namespace kafka-test s3config --from-file=adobe-s3-sink.properties=./adobe-s3-sink.properties \
--from-file=adobe-kafkaConfiguration.properties=./adobe-kafkaConfiguration.properties --from-file=adobe-s3-source.properties=./adobe-s3-source.properties \
--from-literal=timeinSeconds=1800
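Before moving on, it can help to confirm that the ConfigMap holds all four entries created by the command above. The sketch below searches the output of kubectl describe for each key name. The function name check_configmap_keys is hypothetical, and the check only looks for the key names, not for valid values.

```shell
# Verify that the ConfigMap contains the keys created above.
# Defaults: namespace kafka-test, ConfigMap s3config.
check_configmap_keys() {
  local namespace="${1:-kafka-test}"
  local name="${2:-s3config}"
  local description key
  local missing=0
  description="$(kubectl describe configmap "${name}" --namespace "${namespace}")" || return 1
  for key in adobe-s3-sink.properties adobe-s3-source.properties \
             adobe-kafkaConfiguration.properties timeinSeconds; do
    # -F treats the key as a literal string, so the dots are not wildcards.
    if ! printf '%s\n' "${description}" | grep -q -F -- "${key}"; then
      echo "missing key: ${key}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

The function exits with status 0 when every key is present and prints the name of each missing key otherwise.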
Create Blueprint
$ kubectl --namespace kasten-io apply -f \
https://raw.githubusercontent.com/kanisterio/kanister/0.69.0/examples/kafka/adobe-s3-connector/kafka-blueprint.yaml
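To confirm the Blueprint was created, it can be looked up by name. This sketch assumes the Blueprint defined in the manifest is named kafka-blueprint (the name used in the annotation on this page) and uses the Kanister custom resource type blueprints.cr.kanister.io. The function name check_blueprint is hypothetical.

```shell
# Look up the Kanister Blueprint; kubectl exits non-zero if it does not exist.
# Defaults: Blueprint kafka-blueprint in namespace kasten-io.
check_blueprint() {
  local name="${1:-kafka-blueprint}"
  local namespace="${2:-kasten-io}"
  kubectl get blueprints.cr.kanister.io "${name}" --namespace "${namespace}"
}
```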
Once the Blueprint is created, annotate the ConfigMap with the following annotation to instruct K10 to use this Blueprint when performing backup and restore operations on the Kafka instance.
$ kubectl -n kafka-test annotate configmaps/s3config kanister.kasten.io/blueprint=kafka-blueprint
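The annotation can be read back to make sure it was applied. The following sketch uses a kubectl JSONPath expression, with the dots in the annotation key escaped, and compares the result with the expected Blueprint name. The function name check_blueprint_annotation is hypothetical.

```shell
# Read the kanister.kasten.io/blueprint annotation from the ConfigMap
# and compare it with the expected Blueprint name.
check_blueprint_annotation() {
  local expected="${1:-kafka-blueprint}"
  local actual
  actual="$(kubectl --namespace kafka-test get configmap s3config \
    -o jsonpath='{.metadata.annotations.kanister\.kasten\.io/blueprint}')" || return 1
  if [ "${actual}" != "${expected}" ]; then
    echo "expected blueprint '${expected}', found '${actual}'" >&2
    return 1
  fi
}
```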
Finally, use K10 to back up and restore the application.
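After a restore has finished, the topic contents can be read back to confirm that the messages are present. This is a sketch that reuses the image and bootstrap address from the producer step; the pod name kafka-consumer, the function name read_topic, and the 10 second idle timeout are assumptions. Run it only after the restore completes, since no consumer should be reading from the topic during restore.

```shell
# Print all messages in a topic from the beginning, then exit once no new
# message has arrived for 10 seconds (--timeout-ms).
read_topic() {
  local topic="${1:-blogs}"
  kubectl -n kafka-test run kafka-consumer -ti \
    --image=strimzi/kafka:0.20.0-kafka-2.6.0 --rm=true --restart=Never -- \
    bin/kafka-console-consumer.sh \
    --bootstrap-server my-cluster-kafka-bootstrap:9092 \
    --topic "${topic}" --from-beginning --timeout-ms 10000
}
```

Calling read_topic with no arguments reads the blogs topic; pass another topic name as the first argument to check a different topic.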