Veeam Kasten Disaster Recovery

Because Veeam Kasten is a stateful application running on the cluster, it must back up its own data to enable recovery in the event of a disaster. This is enabled by the Veeam Kasten Disaster Recovery (KDR) policy. In particular, KDR provides the ability to recover the Veeam Kasten platform from a variety of disasters, such as the unintended deletion of Veeam Kasten or its restore points, the failure of the underlying storage used by Veeam Kasten, or even the accidental destruction of the Kubernetes cluster on which Veeam Kasten is deployed.

Configuring Veeam Kasten Disaster Recovery Mode

The KDR mode specifies how internal Veeam Kasten resources are protected. The mode can be set either before or after enabling the KDR policy. Changes to the KDR mode only apply to future KDR policy runs.

All installations default to Legacy DR mode. Quick DR mode is available and recommended for installations using snapshot-capable storage.

Warning

Quick DR mode should only be enabled if the storage provisioner used for Veeam Kasten PVCs supports both the creation of snapshots and the ability to restore the existing volume from a snapshot.

  • To enable Quick DR mode, install or upgrade Veeam Kasten with the --set kastenDisasterRecovery.quickMode.enabled=true Helm value.

  • To enable Legacy DR mode, install or upgrade Veeam Kasten with the --set kastenDisasterRecovery.quickMode.enabled=false Helm value.
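As a minimal sketch of the two options above, the helper below prints the corresponding helm upgrade command rather than running it, so it can be reviewed first. The release name (k10), chart (kasten/k10), and namespace (kasten-io) are assumptions about the installation and not values confirmed by this page; --reuse-values is a standard Helm flag that keeps the previously supplied values.

```shell
# Assumed defaults (hypothetical); adjust to match the actual installation.
RELEASE="${RELEASE:-k10}"
CHART="${CHART:-kasten/k10}"
NAMESPACE="${NAMESPACE:-kasten-io}"

# Print the helm command that switches the KDR mode.
# Argument: "quick" or "legacy".
kdr_mode_command() {
  case "$1" in
    quick)  enabled=true ;;
    legacy) enabled=false ;;
    *) echo "usage: kdr_mode_command quick|legacy" >&2; return 1 ;;
  esac
  echo "helm upgrade $RELEASE $CHART --namespace $NAMESPACE --reuse-values --set kastenDisasterRecovery.quickMode.enabled=$enabled"
}

# Usage: print the command, review it, then run it manually.
#   kdr_mode_command quick
```

Because changes to the KDR mode only apply to future KDR policy runs, the new mode takes effect the next time the KDR policy runs.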

Comparing Legacy DR and Quick DR

Refer to the details below to understand the key differences between each mode.

Quick DR

  • Snapshot-capable storage for Veeam Kasten PVCs required

  • Incrementally exports only necessary data from the catalog database and creates a local snapshot of the catalog PVC on each policy run

  • Enables recovery of exported restore points on any cluster

  • Enables recovery of local restore points, exported restore points, and action history only where the local catalog snapshot is available (i.e. in-place recovery on the original cluster)

  • Faster KDR backup and recovery versus Legacy DR

  • Consumes less location profile storage versus Legacy DR

  • Protects additional Veeam Kasten resource types versus Legacy DR

Legacy DR

  • No dependency on snapshot-capable storage for Veeam Kasten PVCs

  • Exports a full dump of the catalog database on each policy run

  • Enables recovery of local restore points, exported restore points, and action history

KDR Protected Resource Matrix

Quick DR vs. Legacy DR Support Matrix

Veeam Kasten Resource            Quick DR   Legacy DR
-------------------------------  ---------  ---------
Actions                          Yes(1)     Yes
Local Restore Points             Yes(1)     Yes
Exported Restore Points          Yes        Yes
Policies                         Yes        Yes
Basic User Policies              Yes        No
Profiles                         Yes        Yes
Blueprints                       Yes        Yes
Blueprint Bindings               Yes        No
Policy Presets                   Yes        No
Transform Sets                   Yes        No
Multi-Cluster Primary            Yes        No
Multi-Cluster Secondary          Yes        No
Reports                          No         No
ActionPodSpecs                   No         No
AuditConfig                      No         No
StorageSecurityContext           Yes        No
StorageSecurityContextBinding    Yes        No

Note

For Quick DR, resources marked with (1) can only be restored if a local KDR snapshot is available.

Enabling Veeam Kasten Disaster Recovery

Enabling Veeam Kasten Disaster Recovery (KDR) creates a dedicated policy within Veeam Kasten to back up its resources and catalog data to an external location profile.

Note

Veeam Repository location profiles cannot be used as a destination for KDR backups.

Note

It is strongly recommended to use a location profile that supports immutable backups to ensure restore point catalog data can be recovered in the event of incidents including ransomware and accidental deletion.

The Veeam Kasten Disaster Recovery settings are accessible via the Setup Kasten DR page under the Settings menu in the navigation sidebar. For new installations, these settings are also accessible using the link located within the alerts panel.

../_images/dr_disabled_notification.png

Select the Setup Kasten DR page under the Settings menu in the navigation sidebar.

Enabling KDR requires selecting a Location Profile for the exported KDR backups and providing a passphrase to encrypt the data using AES-256-GCM.

The passphrase can be provided as a raw string or as a reference to a secret in HashiCorp Vault or AWS Secrets Manager.

Enable KDR by selecting a valid location profile and providing either a raw passphrase or secret management credentials, then clicking the Enable Kasten DR button.

Note

If providing a raw passphrase, save it securely outside the cluster.

../_images/dr_setup_passphrase.png

Note

Using HashiCorp Vault requires that Kasten is configured to access Vault.

../_images/dr_setup_vault.png

Note

Using AWS Secrets Manager requires that an AWS Infrastructure Profile exists with adequate permissions.

../_images/dr_setup_aws.png

A confirmation message with the cluster ID will be displayed when KDR is enabled. This ID is used as a prefix to the object storage or NFS file storage location where Veeam Kasten saves its exported backup data.

../_images/dr_setup_cluster_id.png

Warning

After enabling Veeam Kasten Disaster Recovery, it is essential to retain the following to successfully recover Veeam Kasten from a disaster:

  1. The source cluster ID

  2. The KDR passphrase (or external secret manager details)

  3. The KDR location profile details and credentials

Without this information, restore point catalog recovery will not be possible.

The cluster ID value can also be accessed by using the following kubectl command.

# Extract UUID of the `default` namespace
$ kubectl get namespace default -o jsonpath="{.metadata.uid}{'\n'}"
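Since the cluster ID must be retained outside the cluster, it can help to check the value before storing it. The sketch below assumes only that the ID is the UID of the default namespace, as shown above, and therefore has the canonical UUID shape. The function name is_uuid and the file name kdr-cluster-id.txt are hypothetical.

```shell
# Succeed if the argument looks like a canonical UUID (8-4-4-4-12 hex digits).
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Usage (requires access to the source cluster):
#   CLUSTER_ID="$(kubectl get namespace default -o jsonpath='{.metadata.uid}')"
#   is_uuid "$CLUSTER_ID" && printf '%s\n' "$CLUSTER_ID" > kdr-cluster-id.txt
```

Keep the resulting file with the passphrase and location profile details, somewhere that does not depend on the cluster being recoverable.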

Managing the Veeam Kasten Disaster Recovery Policy

A policy named k10-disaster-recovery-policy that implements Veeam Kasten Disaster Recovery (KDR) will automatically be created when KDR is enabled. This policy can be viewed through the Policies page in the navigation sidebar.

Click Run Once on the k10-disaster-recovery-policy to start a manual backup.

../_images/dr_enabled_policy.png

Click Edit to modify the frequency and retention settings. It is recommended that the KDR policy match the frequency of the lowest RPO policy on the cluster.

Disabling Veeam Kasten Disaster Recovery

Veeam Kasten Disaster Recovery can be disabled by clicking the Disable Kasten DR button on the Setup Kasten DR page, which is found under the Settings menu in the navigation sidebar.

Warning

It is not recommended to run Veeam Kasten without KDR enabled.

../_images/dr_setup_cluster_id.png

Recovering Veeam Kasten from a Disaster via UI

To recover from a KDR backup using the UI, follow these steps:

  1. On a new cluster, install a fresh Veeam Kasten instance in the same namespace as the original Veeam Kasten instance.

  2. On the new cluster, create a location profile by providing the bucket information and credentials for the object storage location or NFS file storage location where previous Veeam Kasten backups are stored.

  3. On the new cluster, navigate to the Restore Kasten page under the Settings menu in the navigation sidebar.

  4. In the Profile drop-down, select the location profile created in step 2.

../_images/dr_restore_choose_profile.png

  5. For Cluster ID, provide the ID of the original cluster with Veeam Kasten Disaster Recovery enabled. This ID can be found on the Setup Kasten DR page of the original cluster that has Veeam Kasten Disaster Recovery enabled.

../_images/dr_restore_cluster_id_input.png

  6. Provide the passphrase using the method that was chosen when enabling Disaster Recovery:

    • Raw passphrase: Provide the passphrase used when enabling Disaster Recovery.

../_images/dr_restore_passphrase.png

    • HashiCorp Vault: Provide the Key Value Secrets Engine Version, Mount, Path, and Passphrase Key stored in a HashiCorp Vault secret.

../_images/dr_restore_vault.png

    • AWS Secrets Manager: Provide the secret name, its associated region, and the key.

../_images/dr_restore_aws.png

Note

For immutable location profiles, a previous point in time can be provided to filter out any restore points newer than the specified time in the next step. If no specific date is chosen, all available restore points are displayed, with the most recent ones appearing first.

../_images/dr_restore_point_in_time.png

  7. Click the Next button to start the validation process. If validation succeeds, a drop-down containing the available restore points will be displayed.

../_images/dr_restore_select_restore_point.png

Note

All times are displayed in the local timezone of the client's browser.

  8. Select the desired restore point and click the Next button.

  9. Review the summary and click the Start Restore button to begin the restore process.

../_images/dr_restore_summary.png

  10. When the restore completes successfully, an option to navigate to the dashboard is displayed, along with information about ownership and deletion of the configmap.

../_images/dr_restore_complete.png

Following recovery of the Veeam Kasten restore point catalog, restore cluster-scoped resources and applications as required.

Recovering Veeam Kasten from a Disaster via CLI

In Veeam Kasten v7.5.0 and above, KDR recoveries can be performed via API or CLI using DR API Resources.

Recovering from a KDR backup using CLI involves the following sequence of steps:

  1. Create a Kubernetes Secret, k10-dr-secret, using the passphrase provided while enabling Disaster Recovery as described in Specifying a Disaster Recovery Passphrase.

  2. Install a fresh Veeam Kasten instance in the same namespace as the above Secret.

  3. Provide bucket information and credentials for the object storage location or NFS file storage location where previous Veeam Kasten backups are stored.

  4. Create a KastenDRReview resource providing the source cluster information.

  5. Create a KastenDRRestore resource that refers to the KastenDRReview resource and selects one of the restore points listed in the KastenDRReview status.

  6. Alternatively, instead of steps 4 and 5, create the KastenDRRestore resource directly with the source cluster information.

  7. Delete the KastenDRReview and KastenDRRestore resources after restore completes.

Following recovery of the Veeam Kasten restore point catalog, restore cluster-scoped resources and applications as required.

Recovering Veeam Kasten From a Disaster via Helm

Note

The k10restore Helm chart is deprecated with Veeam Kasten v7.5.0 release and will be removed in a future release.

Recovering from a KDR backup using k10restore involves the following sequence of actions:

  1. Create a Kubernetes Secret, k10-dr-secret, using the passphrase provided while enabling Disaster Recovery.

  2. Install a fresh Veeam Kasten instance in the same namespace as the above Secret.

  3. Provide bucket information and credentials for the object storage location or NFS file storage location where previous Veeam Kasten backups are stored.

  4. Restore the Veeam Kasten backup.

  5. Uninstall the Veeam Kasten restore instance after recovery (recommended).

Note

If Kasten was previously installed in FIPS mode, ensure the fresh Veeam Kasten instance is also installed in FIPS mode.

Note

If the Veeam Kasten backup is stored using an NFS File Storage Location, it is important that the same NFS share is reachable from the recovery cluster and is mounted on all nodes where Veeam Kasten is installed.

Following recovery of the Veeam Kasten restore point catalog, restore cluster-scoped resources and applications as required.

Specifying a Disaster Recovery Passphrase

Currently, Veeam Kasten Disaster Recovery encrypts all artifacts using the AES-256-GCM algorithm. The passphrase entered while enabling Disaster Recovery is used for this encryption. Therefore, on the cluster used for Veeam Kasten recovery, the Secret k10-dr-secret must be created with that same passphrase in the Veeam Kasten namespace (default kasten-io).

The passphrase can be provided as a raw string or as a reference to a secret in HashiCorp Vault or AWS Secrets Manager.

Specifying the passphrase as a raw string:

$ kubectl create secret generic k10-dr-secret \
   --namespace kasten-io \
   --from-literal key=<passphrase>
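Passing the passphrase with --from-literal can leave it in shell history. As a hedged alternative, the sketch below stores the passphrase in a file and uses kubectl's standard --from-file=<key>=<path> form. Note that kubectl stores the file content verbatim, so a trailing newline in the file would become part of the passphrase; the first helper strips it. Both function names are hypothetical, and the namespace defaults to kasten-io as in the command above.

```shell
# Write the passphrase read from stdin to the file named by $1,
# without a trailing newline. A newly created file is owner-only (umask 077).
write_passphrase_file() {
  ( umask 077 && printf '%s' "$(cat)" > "$1" )
}

# Create k10-dr-secret from that file. "key" is the Secret key used
# by the --from-literal form above. $2 optionally overrides the namespace.
create_dr_secret_from_file() {
  kubectl create secret generic k10-dr-secret \
    --namespace "${2:-kasten-io}" \
    --from-file=key="$1"
}

# Usage (requires cluster access):
#   write_passphrase_file ./kdr-passphrase    # type the passphrase, then Ctrl-D
#   create_dr_secret_from_file ./kdr-passphrase
#   rm ./kdr-passphrase
```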

Specifying the passphrase as a HashiCorp Vault secret:

$ kubectl create secret generic k10-dr-secret \
   --namespace kasten-io \
   --from-literal source=vault \
   --from-literal vault-kv-version=<version-of-key-value-secrets-engine> \
   --from-literal vault-mount-path=<path-where-key-value-engine-is-mounted> \
   --from-literal vault-secret-path=<path-from-mount-to-passphrase-key> \
   --from-literal key=<name-of-passphrase-key>

# Example
$ kubectl create secret generic k10-dr-secret \
   --namespace kasten-io \
   --from-literal source=vault \
   --from-literal vault-kv-version=KVv1 \
   --from-literal vault-mount-path=secret \
   --from-literal vault-secret-path=k10 \
   --from-literal key=passphrase

The supported values for vault-kv-version are KVv1 and KVv2.

Note

Using a passphrase from HashiCorp Vault also requires enabling HashiCorp Vault authentication when installing the kasten/k10restore helm chart. Refer to Enabling HashiCorp Vault using Token Auth or Enabling HashiCorp Vault using Kubernetes Auth.

Specifying the passphrase as an AWS Secrets Manager secret:

$ kubectl create secret generic k10-dr-secret \
   --namespace kasten-io \
   --from-literal source=aws \
   --from-literal aws-region=<aws-region-for-secret> \
   --from-literal key=<aws-secret-name>

# Example
$ kubectl create secret generic k10-dr-secret \
   --namespace kasten-io \
   --from-literal source=aws \
   --from-literal aws-region=us-east-1 \
   --from-literal key=k10/dr/passphrase

Reinstalling Veeam Kasten

Note

When reinstalling Veeam Kasten on the same cluster, it is important to clean up the namespace in which Veeam Kasten was previously installed before creating the k10-dr-secret Secret described above.

# Delete the kasten-io namespace.
$ kubectl delete namespace kasten-io

Veeam Kasten must be reinstalled before recovery. Please follow the instructions here.

Configuring Location Profile

Create a Location Profile with the object storage location or NFS file storage location where Veeam Kasten KDR backups are stored.

Restoring Veeam Kasten with k10restore

Requirements:

  • Source cluster ID

  • Name of Location Profile from the previous step

# Install the helm chart that creates the Kasten restore job and wait for completion of the `k10-restore` job
# Assumes that Kasten is installed in the 'kasten-io' namespace.
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name>

If Veeam Kasten Quick Disaster Recovery is enabled, the Veeam Kasten restore helm chart should be installed with the following helm values:

--set quickMode.enabled=true \
--set quickMode.overrideResources=true

Note

The overrideResources flag must be set to true when using Quick Disaster Recovery. Since the Disaster Recovery operation involves creating or replacing resources, confirmation should be provided by setting this flag.
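To show how the Quick DR values combine with the base install command, here is a minimal sketch that prints the full helm install line for review. It uses only the flags shown in this section; the function name k10restore_command is hypothetical, and the release name and namespace are the ones used in the examples above.

```shell
# Print a helm install command for the k10restore chart.
# Arguments: <source-cluster-id> <location-profile-name> [quick]
k10restore_command() {
  cmd="helm install k10-restore kasten/k10restore --namespace=kasten-io"
  cmd="$cmd --set sourceClusterID=$1 --set profile.name=$2"
  # Quick DR needs both values; overrideResources confirms that
  # resources may be created or replaced during recovery.
  if [ "${3:-}" = "quick" ]; then
    cmd="$cmd --set quickMode.enabled=true --set quickMode.overrideResources=true"
  fi
  printf '%s\n' "$cmd"
}

# Usage: print the command, review it, then run it manually.
#   k10restore_command "<source-clusterID>" "<location-profile-name>" quick
```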

Veeam Kasten provides the ability to apply labels and annotations to all temporary worker pods created during Veeam Kasten recovery as part of its operation. The labels and annotations can be set through the podLabels and podAnnotations Helm flags, respectively. For example, if using a values.yaml file:

podLabels:
   app.kubernetes.io/component: "database"
   topology.kubernetes.io/region: "us-east-1"
podAnnotations:
   config.kubernetes.io/local-config: "true"
   kubernetes.io/description: "Description"

Alternatively, the Helm parameters can be configured using the --set flag:

--set podLabels.labelKey1=value1 --set podLabels.labelKey2=value2 \
--set podAnnotations.annotationKey1="Example annotation" --set podAnnotations.annotationKey2=value2
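Label and annotation keys such as those in the values.yaml example contain dots, which Helm's --set parser would otherwise treat as nested keys. Helm expects such dots to be escaped with a backslash. The sketch below builds an escaped --set argument; the function names are hypothetical.

```shell
# Escape dots in a label or annotation key so Helm does not split it
# into nested keys.
escape_helm_key() {
  printf '%s\n' "$1" | sed 's/\./\\./g'
}

# Print a --set argument for a pod label. Arguments: <key> <value>
pod_label_arg() {
  printf '%s%s=%s\n' '--set podLabels.' "$(escape_helm_key "$1")" "$2"
}

# Usage: single-quote the result so the shell keeps the backslashes.
#   helm install ... --set 'podLabels.app\.kubernetes\.io/component=database'
```

For keys with many dots or values with special characters, a values.yaml file is usually less error-prone than --set.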

The restore job always restores the restore point catalog and artifact information. If the restore of other resources (options include profiles, policies, secrets) needs to be skipped, the skipResource flag can be used.

# e.g. to skip restore of profiles and policies, helm install command will be as follows:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name> \
    --set skipResource="profiles\,policies"
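The backslash before the comma in the command above is needed because Helm's --set parser uses unescaped commas to separate multiple assignments. As a small sketch, the hypothetical helper below joins resource names into a single, correctly escaped skipResource value.

```shell
# Join resource names into one skipResource value, escaping the
# separating commas for Helm's --set parser.
skip_resource_value() {
  out=""
  for r in "$@"; do
    if [ -z "$out" ]; then
      out="$r"
    else
      out="$out\\,$r"
    fi
  done
  printf '%s\n' "$out"
}

# Usage:
#   helm install ... --set skipResource="$(skip_resource_value profiles policies)"
```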

The timeout of the entire restore process can be configured by the helm field restore.timeout. The type of this field is int and the value is in minutes.

# e.g. to specify the restore timeout, helm install command will be as follows:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name> \
    --set restore.timeout=<timeout-in-minutes>

If the Disaster Recovery Location Profile was configured for Immutable Backups, Veeam Kasten can be restored to an earlier point in time. The protection period chosen when creating the profile determines how far in the past the point-in-time can be. Set the pointInTime helm value to the desired time stamp.

# e.g. to restore Kasten to 15:04:05 UTC on Jan 2, 2022:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name> \
    --set pointInTime="2022-01-02T15:04:05Z"
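The pointInTime value in the example is a UTC timestamp of the form YYYY-MM-DDTHH:MM:SSZ. The sketch below checks that a value has this shape before it is passed to helm, and prints the current time in the same format. It checks the shape only, not whether the date is valid or within the profile's protection period; the function names are hypothetical.

```shell
# Succeed if the argument has the form YYYY-MM-DDTHH:MM:SSZ.
is_utc_timestamp() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

# Print the current UTC time in the same format.
now_utc() {
  date -u '+%Y-%m-%dT%H:%M:%SZ'
}

# Usage:
#   PIT="2022-01-02T15:04:05Z"
#   is_utc_timestamp "$PIT" || echo "unexpected pointInTime format: $PIT" >&2
```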

See Immutable Backups Workflow for additional information.

Restoring Veeam Kasten Backup with Iron Bank Kasten Images

The general instructions found in Restoring Veeam Kasten with k10restore can be used for restoring Veeam Kasten using Iron Bank hardened images with a few changes.

Specific helm values are used to ensure that the Veeam Kasten restore helm chart only uses Iron Bank images. The values file must be downloaded by running:

$ curl -sO https://docs.kasten.io/ironbank/k10restore-ironbank-values.yaml

Note

This file is protected and should not be modified. It is necessary to specify all other values using the corresponding helm flags, such as --set, --values, etc.

Credentials for Registry1 must be provided in order to successfully pull the images. These should already have been created as part of re-deploying a new Veeam Kasten instance; therefore, only the name of the secret should be used here.

The following set of flags should be added to the instructions found in Restoring Veeam Kasten with k10restore to use Iron Bank images for Veeam Kasten recovery:

...
--values=<PATH TO DOWNLOADED k10restore-ironbank-values.yaml> \
--set-json 'imagePullSecrets=[{"name": "k10-ecr"}]' \
...

Restoring Veeam Kasten Backup in FIPS Mode

The general instructions found in Restoring Veeam Kasten with k10restore can be used for restoring Veeam Kasten in FIPS mode with a few changes.

To ensure that certified cryptographic modules are utilized, you must install the k10restore chart with additional Helm values that can be found here: FIPS values. These should be added to the instructions found in Restoring Veeam Kasten with k10restore for Veeam Kasten disaster recovery:

...
--values=https://docs.kasten.io/latest/fips/fips-restore-values.yaml
...

Restoring Veeam Kasten Backup in Air-Gapped environment

For air-gapped installations, it is assumed that the k10offline tool was used to push the images to a private container registry. The command below instructs k10restore to run in air-gapped mode.

# Install the helm chart that creates the Kasten restore job and wait for completion of the `k10-restore` job.
# Assume that Kasten is installed in the 'kasten-io' namespace.
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set airgapped.repository=repo.example.com \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name>

Restoring Veeam Kasten Backup with Google Workload Identity Federation

Veeam Kasten can be restored from a Google Cloud Storage bucket using the Google Workload Identity Federation. Please follow the instructions provided here to restore Veeam Kasten with this option.

Uninstalling k10restore

The K10restore instance can be uninstalled with the helm uninstall command.

# e.g. to uninstall K10restore from the kasten-io namespace
$ helm uninstall k10-restore --namespace=kasten-io

Enabling HashiCorp Vault using Token Auth

Create a Kubernetes secret with the Vault token.

$ kubectl create secret generic vault-creds \
    --namespace kasten-io \
    --from-literal vault_token=<vault-token>

Warning

This may cause the token to be stored in shell history.

Use these additional parameters when installing the kasten/k10restore helm chart.

--set vault.enabled=true \
--set vault.address=<vault-server-address> \
--set vault.secretName=<name-of-secret-with-vault-creds>

Enabling HashiCorp Vault using Kubernetes Auth

Refer to Configuring Vault Server For Kubernetes Auth prior to installing the kasten/k10restore helm chart.

Use these additional parameters when installing the kasten/k10restore helm chart.

--set vault.enabled=true \
--set vault.address=<vault-server-address> \
--set vault.role=<vault-kubernetes-authentication-role_name> \
--set vault.serviceAccountTokenPath=<service-account-token-path> # optional

vault.role is the name of the Vault Kubernetes authentication role binding the Veeam Kasten service account and namespace to the Vault policy.

vault.serviceAccountTokenPath is optional and defaults to /var/run/secrets/kubernetes.io/serviceaccount/token.

Recovering with the Operator

If you have deployed Veeam Kasten via the OperatorHub on an OpenShift cluster, the k10restore tool can be deployed via the Operator as described below. However, it is recommended to use either the Recovering Veeam Kasten from a Disaster via UI or Recovering Veeam Kasten from a Disaster via CLI process.

Recovering from a Veeam Kasten backup involves the following sequence of actions:

  1. Install a fresh Veeam Kasten instance.

  2. Configure a Location Profile from where the Veeam Kasten backup will be restored.

  3. Create a Kubernetes Secret named k10-dr-secret in the same namespace as the Veeam Kasten install, with the passphrase given when disaster recovery was enabled on the previous Veeam Kasten instance. The commands are detailed here.

  4. Create a K10restore instance. The required values are

    • Cluster ID - value given when disaster recovery was enabled on the previous Veeam Kasten instance.

    • Profile name - name of the Location Profile configured in Step 2.

    and the optional values are

    • Point in time - time (RFC3339) at which to evaluate restore data. Example "2022-01-02T15:04:05Z".

    • Resources to skip - can be used to skip restore of specific resources. Example "profiles,policies".

    After recovery, deleting the k10restore instance is recommended.

Operator K10restore form view with Enable HashiCorp Vault set to False

../_images/dr_operator_passphrase.png

Operator K10restore form view with Enable HashiCorp Vault set to True

../_images/dr_operator_vault.png

Using the Restored Veeam Kasten in Place of the Original

The newly restored Veeam Kasten includes a safety mechanism to prevent it from performing critical background maintenance operations on backup data in storage. These operations are exclusive, meaning that only one Veeam Kasten instance should perform them at a time. The DR-restored Veeam Kasten initially assumes that it does not have permission to perform these maintenance tasks. This assumption is made in case the original source Veeam Kasten instance is still running, for example when testing the DR restore procedure in a secondary test cluster while the primary production Veeam Kasten is still active.

If no other Veeam Kasten instances are accessing the same sets of backup data (i.e., the original Veeam Kasten has been uninstalled and only the new DR-restored Veeam Kasten remains), it can be signaled that the new Veeam Kasten is now eligible to take over the maintenance duties by deleting the following resource:

# Delete the k10-dr-remove-to-get-ownership configmap in the Kasten namespace.
$ kubectl delete configmap --namespace=kasten-io k10-dr-remove-to-get-ownership

Warning

It is critical that you delete this resource only when you are prepared to make the permanent cutover to the new DR-restored Veeam Kasten instance. Running multiple Veeam Kasten instances simultaneously, each assuming ownership, can corrupt backup data.