etcd Backup (Kubeadm)

Assuming the Kubernetes cluster was set up with kubeadm, etcd runs as static pods in the kube-system namespace.

Before taking a backup of the etcd cluster, a Secret containing details about the authentication mechanism used by etcd needs to be created, either in a new temporary namespace or in an existing one. With kubeadm, etcd is most likely deployed with TLS-based authentication. A temporary namespace and a Secret to access etcd can be created by running the following commands:

$ kubectl create namespace etcd-backup
  namespace/etcd-backup created

$ kubectl create secret generic etcd-details \
     --from-literal=cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --from-literal=cert=/etc/kubernetes/pki/etcd/server.crt \
     --from-literal=endpoints=https://127.0.0.1:2379 \
     --from-literal=key=/etc/kubernetes/pki/etcd/server.key \
     --from-literal=etcdns=kube-system \
     --from-literal=labels=component=etcd,tier=control-plane \
     --namespace etcd-backup

Note

If the correct paths of the server key and certificate are not provided, backups will fail. These paths can be discovered from the command that runs inside the etcd pod, either by describing the pod or by looking at the static pod manifest. The values for the etcdns and labels keys should be the namespace in which the etcd pods are running and the labels of the etcd pods, respectively.
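On a kubeadm control plane node, the static pod manifest (by default /etc/kubernetes/manifests/etcd.yaml) lists these paths as etcd command-line flags. A minimal sketch of reading them is shown below. etcd_flag and etcd_secret_literals are hypothetical helpers, not part of K10 or kubeadm; they assume each flag sits on its own `- --flag=value` line, as in kubeadm's manifest, and they take the first URL of --listen-client-urls as the endpoint. They only print values, so the output can be compared against the Secret before it is created.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read the TLS paths and client endpoint for the
# etcd-details Secret from a kubeadm etcd static pod manifest.
# Assumes one "- --flag=value" argument per line (kubeadm's layout).

# etcd_flag MANIFEST FLAG
# Prints the value of the first "- --FLAG=value" line in MANIFEST.
etcd_flag() {
  sed -n "s|^[[:space:]]*- --$2=||p" "$1" | head -n 1
}

# etcd_secret_literals MANIFEST
# Prints the --from-literal arguments for cacert, cert, endpoints and key.
etcd_secret_literals() {
  local manifest="$1"
  printf '%s\n' "--from-literal=cacert=$(etcd_flag "$manifest" trusted-ca-file)"
  printf '%s\n' "--from-literal=cert=$(etcd_flag "$manifest" cert-file)"
  # Only the first listen URL is used (127.0.0.1 in kubeadm's default manifest).
  printf '%s\n' "--from-literal=endpoints=$(etcd_flag "$manifest" listen-client-urls | cut -d, -f1)"
  printf '%s\n' "--from-literal=key=$(etcd_flag "$manifest" key-file)"
}
```

For example, after sourcing the helpers on a control plane node:

$ etcd_secret_literals /etc/kubernetes/manifests/etcd.yaml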

To prevent other workloads in the etcd-backup namespace from being backed up, the etcd-details Secret can be labeled so that only this Secret is included in the backup. Run the following command to label the Secret:

$ kubectl label secret -n etcd-backup etcd-details include=true

Backup

To create the Blueprint resource that K10 will use to back up etcd, run the following command:

$ kubectl --namespace kasten-io apply -f \
    https://raw.githubusercontent.com/kanisterio/kanister/0.82.0/examples/etcd/etcd-in-cluster/k8s/etcd-incluster-blueprint.yaml

Alternatively, the Blueprint resource can be created from the Blueprints page of the K10 dashboard.

Once the Blueprint is created, the Secret created above needs to be annotated to instruct K10 to use the Blueprint to back up the etcd pod. The following command annotates the Secret with the name of the Blueprint created earlier:

$ kubectl annotate secret -n etcd-backup etcd-details kanister.kasten.io/blueprint='etcd-blueprint'
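Before creating the policy, it can be useful to confirm that the Secret carries both the include=true label and the Blueprint annotation. The sketch below assumes kubectl is already configured for the target cluster; check_etcd_secret is a hypothetical helper, not a K10 command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the etcd-details Secret is labeled and
# annotated as described above. Assumes kubectl points at the cluster.

# check_etcd_secret NAMESPACE SECRET BLUEPRINT
# Returns 0 when the Secret has include=true and the expected Blueprint
# annotation; prints the mismatch to stderr and returns 1 otherwise.
check_etcd_secret() {
  local ns="$1" secret="$2" blueprint="$3" label annotation

  # Dots inside the annotation key must be escaped in kubectl's JSONPath.
  annotation=$(kubectl get secret "$secret" --namespace "$ns" \
    -o jsonpath='{.metadata.annotations.kanister\.kasten\.io/blueprint}') || return 1
  label=$(kubectl get secret "$secret" --namespace "$ns" \
    -o jsonpath='{.metadata.labels.include}') || return 1

  if [ "$annotation" != "$blueprint" ]; then
    echo "$ns/$secret: blueprint annotation is '$annotation', expected '$blueprint'" >&2
    return 1
  fi
  if [ "$label" != "true" ]; then
    echo "$ns/$secret: label include is '$label', expected 'true'" >&2
    return 1
  fi
}
```

For example:

$ check_etcd_secret etcd-backup etcd-details etcd-blueprint && echo ok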

Once the Secret is annotated, use K10 to back up etcd by creating a policy for the etcd-backup namespace. If the Secret was labeled as described in a previous step, only that Secret can be included in the backup by adding a resource filter to the policy that includes resources with the label include=true.

Note

The backup location of etcd can be found by looking at the Kanister artifact of the created restore point.

Restore

To restore the etcd backup, log in to the host where the etcd pod is running (most likely a Kubernetes control plane node). Obtain the restore path from the artifact details of the backup action on the K10 dashboard, and download the snapshot to a location on that host (e.g., /tmp/etcd-snapshot.db). How the snapshot is downloaded depends on the backup storage target in use. For example, if AWS S3 was used as object storage, the AWS CLI is needed to retrieve the backup.
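For an S3 target, the download step can be sketched as below. fetch_snapshot_from_s3 is a hypothetical helper; the S3 URI passed to it is a placeholder for the restore path shown in the artifact details, and the AWS CLI is assumed to be installed and configured with credentials that can read the bucket. The helper refuses to continue if the downloaded file is missing or empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download an etcd snapshot from S3 with the AWS CLI.
# Assumes "aws" is installed and has read access to the bucket.

# fetch_snapshot_from_s3 S3_URI [DEST]
# DEST defaults to /tmp/etcd-snapshot.db.
fetch_snapshot_from_s3() {
  local s3_uri="$1" dest="${2:-/tmp/etcd-snapshot.db}"

  aws s3 cp "$s3_uri" "$dest" || return 1

  # Refuse to continue with a missing or empty snapshot file.
  if [ ! -s "$dest" ]; then
    echo "snapshot $dest is missing or empty" >&2
    return 1
  fi
}
```

For example, with the URI taken from the K10 artifact details:

$ fetch_snapshot_from_s3 "s3://<bucket>/<restore-path>" /tmp/etcd-snapshot.db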

Once the snapshot has been downloaded from the backup target, use the etcdctl CLI tool to restore it to a new directory on the host, for example /var/lib/etcd-from-backup. The following command restores the etcd backup:

$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      --data-dir="/var/lib/etcd-from-backup" \
      --initial-cluster="master=https://127.0.0.1:2380" \
      --name="master" \
      --initial-advertise-peer-urls="https://127.0.0.1:2380" \
      --initial-cluster-token="etcd-cluster-1" \
      snapshot restore /tmp/etcd-snapshot.db

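All the values provided for the flags above can be discovered from the etcd static pod manifest. The two most important flags are --data-dir and --initial-cluster-token: --data-dir is the directory into which the snapshot is restored and from which etcd will read its data, and --initial-cluster-token defines the token that new members use to join this etcd cluster. The --endpoints, --cacert, --cert, and --key flags are not used by snapshot restore, which works on the local snapshot file without contacting a running etcd member.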
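Because these values come from the static pod manifest, they can be read from it rather than typed by hand. The sketch below assumes kubeadm's default manifest layout (one `- --flag=value` line per flag, by default in /etc/kubernetes/manifests/etcd.yaml); etcd_flag and print_restore_command are hypothetical helpers. The helper only prints the command so it can be reviewed before running it, and it omits the endpoint and TLS flags that snapshot restore does not need.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the snapshot restore command from the values in
# a kubeadm etcd static pod manifest. Nothing is executed; the command is printed.

# etcd_flag MANIFEST FLAG
# Prints the value of the first "- --FLAG=value" line in MANIFEST.
etcd_flag() {
  sed -n "s|^[[:space:]]*- --$2=||p" "$1" | head -n 1
}

# print_restore_command MANIFEST SNAPSHOT DATA_DIR TOKEN
print_restore_command() {
  local manifest="$1" snapshot="$2" data_dir="$3" token="$4"
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s \\\n' "$snapshot"
  printf '    --data-dir=%s \\\n' "$data_dir"
  printf '    --name=%s \\\n' "$(etcd_flag "$manifest" name)"
  printf '    --initial-cluster=%s \\\n' "$(etcd_flag "$manifest" initial-cluster)"
  printf '    --initial-advertise-peer-urls=%s \\\n' \
    "$(etcd_flag "$manifest" initial-advertise-peer-urls)"
  printf '    --initial-cluster-token=%s\n' "$token"
}
```

For example:

$ print_restore_command /etc/kubernetes/manifests/etcd.yaml /tmp/etcd-snapshot.db /var/lib/etcd-from-backup etcd-cluster-1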

Once the backup is restored into a new directory (e.g., /var/lib/etcd-from-backup), the etcd static pod manifest needs to be updated so that its data directory points to this new directory, and --initial-cluster-token=etcd-cluster-1 needs to be added to the etcd command arguments. In addition, the volumes and volumeMounts fields should be changed to point to the new data directory that the backup was restored into.
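These manifest changes can also be scripted. The sketch below is one possible approach, not part of K10: update_etcd_manifest is a hypothetical helper that rewrites the --data-dir argument and the mountPath and hostPath entries ending in the old directory, and inserts --initial-cluster-token right after --data-dir. It assumes kubeadm's default layout (data in /var/lib/etcd, one argument per line). Keep a copy of the original manifest outside /etc/kubernetes/manifests before running it, because the kubelet treats files in that directory as static pod definitions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point a kubeadm etcd static pod manifest at a restored
# data directory. Assumes one "- --flag=value" argument per line.

# update_etcd_manifest MANIFEST OLD_DIR NEW_DIR TOKEN
update_etcd_manifest() {
  local manifest="$1" old_dir="$2" new_dir="$3" token="$4" tmp rc
  tmp=$(mktemp) || return 1

  awk -v olddir="$old_dir" -v newdir="$new_dir" -v token="$token" '
    # True when string s ends with suffix.
    function ends_with(s, suffix) {
      return length(s) >= length(suffix) &&
             substr(s, length(s) - length(suffix) + 1) == suffix
    }
    ends_with($0, "- --data-dir=" olddir) {
      # Reuse the indentation of the existing argument line.
      indent = substr($0, 1, index($0, "-") - 1)
      print indent "- --data-dir=" newdir
      print indent "- --initial-cluster-token=" token
      next
    }
    ends_with($0, "mountPath: " olddir) || ends_with($0, "path: " olddir) {
      # volumeMounts[].mountPath and volumes[].hostPath.path
      print substr($0, 1, length($0) - length(olddir)) newdir
      next
    }
    { print }
  ' "$manifest" > "$tmp"
  rc=$?

  # Overwrite in place (keeps owner and mode) only if awk succeeded.
  if [ "$rc" -eq 0 ]; then
    cat "$tmp" > "$manifest"
    rc=$?
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example:

$ update_etcd_manifest /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd /var/lib/etcd-from-backup etcd-cluster-1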

Multi-Member etcd Cluster

When the cluster runs a multi-member etcd cluster, it can be restored by following the same steps, with a few changes. As mentioned in the official etcd documentation, all members of an etcd cluster can be restored from the same snapshot.

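Among the control plane nodes, choose one to be used as the restore node, and stop the etcd static pods on all other control plane nodes (for example, by temporarily moving etcd.yaml out of /etc/kubernetes/manifests). After making sure that the static pods have stopped on the other control plane nodes, follow the restore steps above on the restore node and then on each of the other control plane nodes, one at a time.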

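The restore command needs to be changed from the single-member example before it is run on each control plane node: every member gets its own --name and --initial-advertise-peer-urls, while --initial-cluster lists all members (shown here for a three-member cluster) and is identical on every node. The placeholders in angle brackets stand for each member's etcd name and IP address, which can be found in that node's etcd static pod manifest:

$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      --data-dir="/var/lib/etcd-from-backup" \
      --initial-cluster="<member-1>=https://<member-1-ip>:2380,<member-2>=https://<member-2-ip>:2380,<member-3>=https://<member-3-ip>:2380" \
      --name="<this-member>" \
      --initial-advertise-peer-urls="https://<this-member-ip>:2380" \
      --initial-cluster-token="etcd-cluster-1" \
      snapshot restore /tmp/etcd-snapshot.db

The values for --name and --initial-advertise-peer-urls should be changed based on the host (control plane node) on which the command is being run, and the same --initial-cluster-token must be used on every member.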
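Since --initial-cluster has to list every member and be identical on all nodes, it can help to generate it once from the list of control plane nodes. initial_cluster is a hypothetical helper; it assumes each argument is a member's etcd --name followed by its peer IP, and that peers listen on etcd's default peer port 2380, as in the restore commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --initial-cluster value shared by all members.
# Each argument is NAME=PEER_IP; the peer port is assumed to be 2380.
initial_cluster() {
  local pair out=""
  for pair in "$@"; do
    # ${pair%%=*} is the member name, ${pair#*=} is the peer IP.
    out="${out:+$out,}${pair%%=*}=https://${pair#*=}:2380"
  done
  printf '%s\n' "$out"
}
```

For example, with hypothetical member names and addresses, the following prints cp-1=https://10.0.0.11:2380,cp-2=https://10.0.0.12:2380,cp-3=https://10.0.0.13:2380:

$ initial_cluster cp-1=10.0.0.11 cp-2=10.0.0.12 cp-3=10.0.0.13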

To learn more about how etcd backup and restore work, refer to the Kubernetes documentation on operating etcd clusters.

In both the single-member and multi-member cases, the kubelet reacts to the change in the static pod manifest by automatically recreating the etcd pod, which comes up with the cluster state captured when the etcd backup was taken.