Veeam Kasten Disaster Recovery
Veeam Kasten Disaster Recovery (DR) aims to protect Veeam Kasten from failures of the underlying infrastructure. In particular, this feature provides the ability to recover the Veeam Kasten platform from a variety of disasters, such as the accidental deletion of Veeam Kasten, the failure of the underlying storage that Veeam Kasten uses for its catalog, or even the accidental destruction of the Kubernetes cluster on which Veeam Kasten is deployed.
Overview
Veeam Kasten enables Disaster Recovery with the help of an internal policy to back up its data stores, storing them in either an object storage bucket or an NFS file storage location configured through a Location Profile.
External Storage Configuration
To enable Veeam Kasten Disaster Recovery, a Location Profile needs to be configured. This profile will use an object storage bucket or an NFS file storage location to store data from Veeam Kasten's internal data stores. The cluster must have write permissions for this location.
Note
A VBR location profile cannot be used as a destination for DR backups.
Enabling Veeam Kasten Disaster Recovery
Note
Veeam Kasten Quick Disaster Recovery can be enabled for faster Kasten Disaster Recovery.
The Veeam Kasten Disaster Recovery settings are accessible via the Disaster Recovery page under the Settings menu in the navigation sidebar. For new installations, these settings are also accessible using the link located within the alerts panel.
Select the Disaster Recovery page under the Settings menu in the navigation sidebar.
Enabling Veeam Kasten Disaster Recovery requires selecting a Location Profile for the exported Kasten Disaster Recovery backups and providing a passphrase for encrypting the snapshot data.
The passphrase can be provided as a raw string or as a reference to a secret in HashiCorp Vault or AWS Secrets Manager.
Enable Disaster Recovery by selecting a valid location profile and providing either a raw passphrase or secret management credentials, then clicking the Enable Kasten DR button.
Note
If providing a raw passphrase, save it securely outside the cluster.
Note
Using HashiCorp Vault requires that Kasten is configured to access Vault.
Note
Using AWS Secrets Manager requires that an AWS Infrastructure Profile exists with adequate permissions.
Cluster ID
A confirmation message with the cluster ID will be displayed when Disaster Recovery is enabled. This ID is used as a prefix to the object storage or NFS file storage location where Veeam Kasten's data store saves its exported backups.
Note
Save the cluster ID safely; it is required to recover Veeam Kasten from a disaster.
The cluster ID value can also be accessed by using the following kubectl command:
# Extract UUID of the `default` namespace
$ kubectl get namespace default -o jsonpath="{.metadata.uid}{'\n'}"
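As a minimal sketch, the command above can be wrapped so that the cluster ID is checked and written to a file for safekeeping. The function name is_uuid and the file name cluster-id.txt are hypothetical and not part of Veeam Kasten; the check only assumes that a Kubernetes UID has the canonical UUID form.

```shell
# Return success if the argument has the canonical UUID form (8-4-4-4-12 hex digits).
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Query the cluster only when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  cluster_id=$(kubectl get namespace default -o jsonpath='{.metadata.uid}')
  if is_uuid "$cluster_id"; then
    # cluster-id.txt is a hypothetical file name; move it somewhere outside the cluster.
    printf '%s\n' "$cluster_id" > cluster-id.txt
  else
    echo "Unexpected cluster ID value: '$cluster_id'" >&2
  fi
fi
```

The saved value is what the recovery steps later refer to as the source cluster ID.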
Veeam Kasten Disaster Recovery Policy
A policy called k10-disaster-recovery-policy that implements Veeam Kasten Disaster Recovery will automatically be created when Disaster Recovery is enabled. This policy can be viewed through the Policies page in the navigation sidebar.
Click Run Once on the k10-disaster-recovery-policy to start a backup. The data exported by Veeam Kasten for Disaster Recovery purposes will be encrypted via AES-256-GCM.
Warning
After enabling Veeam Kasten Disaster Recovery, it is essential to copy and save the following to successfully recover Veeam Kasten from a disaster:
The cluster ID displayed on the disaster recovery page
The Disaster Recovery passphrase provided above
The credentials and object storage bucket or the NFS file storage information (used in the Location Profile configuration above)
Without this information, Veeam Kasten Disaster Recovery will not be possible.
Veeam Kasten Quick Disaster Recovery
Veeam Kasten Quick Disaster Recovery aims to improve the Veeam Kasten Disaster Recovery workflow. The necessary metadata is extracted and backed up to facilitate faster recovery in the event of a disaster. To enable this feature, install or upgrade Kasten with the --set kastenDisasterRecovery.quickMode.enabled=true Helm value.
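The same setting can also be kept in a values file instead of being passed on the command line. The sketch below writes such a file; the file name quick-dr-values.yaml, the release name k10, and the chart reference kasten/k10 in the commented command are assumptions and should be adjusted to match the actual installation.

```shell
# Write a values file equivalent to
#   --set kastenDisasterRecovery.quickMode.enabled=true
# The file name is hypothetical.
cat > quick-dr-values.yaml <<'EOF'
kastenDisasterRecovery:
  quickMode:
    enabled: true
EOF

# Possible usage (assumes a release named "k10" in the kasten-io namespace):
#   helm upgrade k10 kasten/k10 --namespace kasten-io --reuse-values -f quick-dr-values.yaml
```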
Advantages of Veeam Kasten Quick Disaster Recovery
Metadata backup instead of full Catalog backup.
The backed up metadata includes Veeam Kasten resources and data necessary to restore all exported restore points of applications. In addition to these, other Veeam Kasten resources, such as policies and profiles, are also included.
When a disaster occurs, in most cases, an application's local restore points are lost along with the Kubernetes cluster. This workflow provides a path for reliable recovery by restoring the exported application restore points.
Limitation
An application's local restore points are not backed up.
Disabling Veeam Kasten Disaster Recovery
Veeam Kasten Disaster Recovery can be disabled by clicking the Disable Kasten DR button on the Setup Kasten DR page, which is found under the Settings menu in the navigation sidebar.
Recovering Veeam Kasten from a Disaster via UI
To recover from a Veeam Kasten backup using the UI, follow these steps:
On a new cluster, install a fresh Veeam Kasten instance in the same namespace as the original Veeam Kasten instance.
On the new cluster, create a location profile by providing the bucket information and credentials for the object storage location or NFS file storage location where previous Veeam Kasten backups are stored.
On the new cluster, navigate to the Restore Kasten page under the Settings menu in the navigation sidebar.
In the Profile drop-down, select the location profile created in step 2.
For Cluster ID, provide the ID of the original cluster with Veeam Kasten Disaster Recovery enabled. This ID can be found on the Setup Kasten DR page of the original cluster that currently has Veeam Kasten Disaster Recovery enabled.
Raw passphrase: Provide the passphrase used when enabling Disaster Recovery.
HashiCorp Vault: Provide the Key Value Secrets Engine Version, Mount, Path, and Passphrase Key stored in a HashiCorp Vault secret.
AWS Secrets Manager: Provide the secret name, its associated region, and the key.
Note
For immutable location profiles, a previous point in time can be provided to filter out any restore points newer than the specified time in the next step. If no specific date is chosen, it will display all available restore points, with the most recent ones appearing first.
Click the Next button to start the validation process. If validation succeeds, a drop-down containing the available restore points will be displayed.
Note
All times are displayed in the local timezone of the client's browser.
Select the desired restore point and click the Next button.
Review the summary and click the Start Restore button to begin the restore process.
After a successful restore, an option to navigate to the dashboard is displayed, along with information about ownership and deletion of the configmap.
Recovering Veeam Kasten from a Disaster via Helm
Recovering from a Veeam Kasten backup involves the following sequence of actions:
Create a Kubernetes Secret, k10-dr-secret, using the passphrase provided while enabling Disaster Recovery
Install a fresh Veeam Kasten instance in the same namespace as the above Secret
Provide bucket information and credentials for the object storage location or NFS file storage location where previous Veeam Kasten backups are stored
Restore the Veeam Kasten backup
Uninstall the Veeam Kasten restore instance after recovery (recommended)
Note
If Kasten was previously installed in FIPS mode, ensure the fresh Veeam Kasten instance is also installed in FIPS mode.
Note
If the Veeam Kasten backup is stored using an NFS File Storage Location, it is important that the same NFS share is reachable from the recovery cluster and is mounted on all nodes where Veeam Kasten is installed.
Specifying a Disaster Recovery Passphrase
Currently, Veeam Kasten Disaster Recovery encrypts all artifacts using the AES-256-GCM algorithm. The passphrase entered while enabling Disaster Recovery is used for this encryption. On the cluster used for Veeam Kasten recovery, the Secret k10-dr-secret must therefore be created using that same passphrase in the Veeam Kasten namespace (default kasten-io).
The passphrase can be provided as a raw string or as a reference to a secret in HashiCorp Vault or AWS Secrets Manager.
Specifying the passphrase as a raw string:
$ kubectl create secret generic k10-dr-secret \
--namespace kasten-io \
--from-literal key=<passphrase>
Specifying the passphrase as a HashiCorp Vault secret:
$ kubectl create secret generic k10-dr-secret \
--namespace kasten-io \
--from-literal source=vault \
--from-literal vault-kv-version=<version-of-key-value-secrets-engine> \
--from-literal vault-mount-path=<path-where-key-value-engine-is-mounted> \
--from-literal vault-secret-path=<path-from-mount-to-passphrase-key> \
--from-literal key=<name-of-passphrase-key>
# Example
$ kubectl create secret generic k10-dr-secret \
--namespace kasten-io \
--from-literal source=vault \
--from-literal vault-kv-version=KVv1 \
--from-literal vault-mount-path=secret \
--from-literal vault-secret-path=k10 \
--from-literal key=passphrase
The supported values for vault-kv-version are KVv1 and KVv2.
Note
Using a passphrase from HashiCorp Vault also requires enabling HashiCorp Vault authentication when installing the kasten/k10restore helm chart. Refer to Enable HashiCorp Vault using Token Auth or Enable HashiCorp Vault using Kubernetes Auth.
Specifying the passphrase as an AWS Secrets Manager secret:
$ kubectl create secret generic k10-dr-secret \
--namespace kasten-io \
--from-literal source=aws \
--from-literal aws-region=<aws-region-for-secret> \
--from-literal key=<aws-secret-name>
# Example
$ kubectl create secret generic k10-dr-secret \
--namespace kasten-io \
--from-literal source=aws \
--from-literal aws-region=us-east-1 \
--from-literal key=k10/dr/passphrase
Reinstalling Veeam Kasten
Note
When reinstalling Veeam Kasten on the same cluster, it is important to clean up the namespace in which Veeam Kasten was previously installed before creating the passphrase Secret described above.
# Delete the kasten-io namespace.
$ kubectl delete namespace kasten-io
Veeam Kasten must be reinstalled before recovery. Please follow the instructions here.
Provide External Storage Configuration
Create a Location Profile with the object storage location or NFS file storage location where Veeam Kasten backups are stored.
Restoring Veeam Kasten Backup with Iron Bank Kasten Images
The general instructions found in Restore Kasten Backup can be used for restoring Veeam Kasten using Iron Bank hardened images with a few changes.
Specific helm values are used to ensure that the Veeam Kasten restore helm chart only uses Iron Bank images. The values file must be downloaded by running:
$ curl -sO https://docs.kasten.io/ironbank/k10restore-ironbank-values.yaml
Note
This file is protected and should not be modified. It is necessary to specify all other values using the corresponding helm flags, such as --set, --values, etc.
Credentials for Registry1 must be provided in order to successfully pull the images. These should already have been created as part of re-deploying a new Veeam Kasten instance; therefore, only the name of the secret should be used here.
The following set of flags should be added to the instructions found in Restore Kasten Backup to use Iron Bank images for Veeam Kasten disaster recovery:
...
--values=<PATH TO DOWNLOADED k10restore-ironbank-values.yaml> \
--set-json 'imagePullSecrets=[{"name": "k10-ecr"}]' \
...
Restoring Veeam Kasten Backup in FIPS Mode
The general instructions found in Restore Kasten Backup can be used for restoring Veeam Kasten in FIPS mode with a few changes.
To ensure that certified cryptographic modules are utilized, you must install the k10restore chart with additional Helm values that can be found here: FIPS values. These should be added to the instructions found in Restore Kasten Backup for Veeam Kasten disaster recovery:
...
--values=https://docs.kasten.io/latest/fips/fips-restore-values.yaml
...
Restoring Veeam Kasten Backup
Requirements:
Source cluster ID
Name of Location Profile from the previous step
# Install the helm chart that creates the Kasten restore job and wait for completion of the `k10-restore` job
# Assumes that Kasten is installed in the 'kasten-io' namespace.
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name>
If Veeam Kasten Quick Disaster Recovery is enabled, the Veeam Kasten restore helm chart should be installed with the following helm values:
--set quickMode.enabled=true \
--set quickMode.overrideResources=true
Note
The overrideResources flag must be set to true when using Quick Disaster Recovery. Since the Disaster Recovery operation involves creating or replacing resources, confirmation should be provided by setting this flag.
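For a Quick Disaster Recovery restore, the required flags and the two quickMode flags can be collected in one values file. This is only a sketch: the file name k10restore-values.yaml is hypothetical, the keys are the ones named in this section, and the angle-bracket placeholders must be replaced with real values before use.

```shell
# Write a values file for the kasten/k10restore chart (hypothetical file name).
# Replace the placeholders with the saved cluster ID and the Location Profile name.
cat > k10restore-values.yaml <<'EOF'
sourceClusterID: "<source-clusterID>"
profile:
  name: "<location-profile-name>"
quickMode:
  enabled: true
  overrideResources: true
EOF

# Possible usage, mirroring the install command above:
#   helm install k10-restore kasten/k10restore --namespace=kasten-io -f k10restore-values.yaml
```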
Veeam Kasten provides the ability to apply labels and annotations to all temporary worker pods created during Veeam Kasten recovery as part of its operation. The labels and annotations can be set through the podLabels and podAnnotations Helm flags, respectively. For example, if using a values.yaml file:
podLabels:
  app.kubernetes.io/component: "database"
  topology.kubernetes.io/region: "us-east-1"
podAnnotations:
  config.kubernetes.io/local-config: "true"
  kubernetes.io/description: "Description"
Alternatively, the Helm parameters can be configured using the --set
flag:
--set podLabels.labelKey1=value1 --set podLabels.labelKey2=value2 \
--set podAnnotations.annotationKey1="Example annotation" --set podAnnotations.annotationKey2=value2
The restore job always restores the restore point catalog and artifact information. If the restore of other resources (options include profiles, policies, secrets) needs to be skipped, the skipResource flag can be used.
# e.g. to skip restore of profiles and policies, helm install command will be as follows:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name> \
--set skipResource="profiles\,policies"
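The backslash in the example above is needed because Helm's --set parser treats an unescaped comma as a separator between key=value pairs. A small helper can apply the escaping to a plain comma-separated list; the function name escape_commas is hypothetical.

```shell
# Escape every comma with a backslash so the list survives Helm's --set parsing.
escape_commas() {
  printf '%s' "$1" | sed 's/,/\\,/g'
}

# Build the flag value from a plain list of resources to skip.
skip_list='profiles,policies'
skip_flag="skipResource=$(escape_commas "$skip_list")"
printf '%s\n' "--set $skip_flag"
```

The printed string can then be appended to the helm install command shown above.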
The timeout of the entire restore process can be configured by the helm field restore.timeout. The type of this field is int and the value is in minutes.
# e.g. to specify the restore timeout, helm install command will be as follows:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name> \
--set restore.timeout=<timeout-in-minutes>
If the Disaster Recovery Location Profile was configured for Immutable Backups, Veeam Kasten can be restored to an earlier point in time. The protection period chosen when creating the profile determines how far in the past the point-in-time can be. Set the pointInTime helm value to the desired timestamp.
# e.g. to restore Kasten to 15:04:05 UTC on Jan 2, 2022:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name> \
--set pointInTime="2022-01-02T15:04:05Z"
See Immutable Backups Workflow for additional information.
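Because a malformed timestamp is easy to mistype, the pointInTime value can be checked before it is passed to helm. The sketch below accepts only the UTC form used in the example above (ending in Z), which is a stricter subset of RFC 3339; the function name is_rfc3339_utc is hypothetical.

```shell
# Return success if the argument looks like YYYY-MM-DDThh:mm:ssZ.
# This checks the shape only, not whether the date itself is valid.
is_rfc3339_utc() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

# Produce the current time in the same format, for reference.
now_utc=$(date -u +%Y-%m-%dT%H:%M:%SZ)

point_in_time='2022-01-02T15:04:05Z'
if is_rfc3339_utc "$point_in_time"; then
  printf '%s\n' "--set pointInTime=\"$point_in_time\""
else
  echo "pointInTime is not in the expected format: $point_in_time" >&2
fi
```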
Enable HashiCorp Vault using Token Auth
Create a Kubernetes secret with the Vault token.
kubectl create secret generic vault-creds \
--namespace kasten-io \
--from-literal vault_token=<vault-token>
Warning
This may cause the token to be stored in shell history.
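One way to keep the token out of shell history is to stage it in a private temporary file and let kubectl read it with --from-file. The following is a minimal sketch under that assumption; the function name stage_secret_value and the path /path/to/vault-token are hypothetical.

```shell
# Copy a secret value from stdin into a temporary file and print the file's path.
stage_secret_value() {
  staged=$(mktemp) || return 1   # mktemp creates the file readable only by its owner
  # Command substitution strips trailing newlines and printf '%s' adds none,
  # so the stored value matches what --from-literal would have stored.
  printf '%s' "$(cat)" > "$staged"
  printf '%s\n' "$staged"
}

# Usage sketch (commented out because it needs cluster access):
#   token_file=$(stage_secret_value < /path/to/vault-token)
#   kubectl create secret generic vault-creds \
#     --namespace kasten-io \
#     --from-file=vault_token="$token_file"
#   rm -f "$token_file"
```

Stripping the trailing newline matters because --from-file stores the file contents byte for byte.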
Use these additional parameters when installing the kasten/k10restore helm chart.
--set vault.enabled=true \
--set vault.address=<vault-server-address> \
--set vault.secretName=<name-of-secret-with-vault-creds>
Enable HashiCorp Vault using Kubernetes Auth
Refer to Configuring Vault Server For Kubernetes Auth prior to installing the kasten/k10restore helm chart.
Use these additional parameters when installing the kasten/k10restore helm chart.
--set vault.enabled=true \
--set vault.address=<vault-server-address> \
--set vault.role=<vault-kubernetes-authentication-role_name> \
--set vault.serviceAccountTokenPath=<service-account-token-path> # optional
vault.role is the name of the Vault Kubernetes authentication role binding the Veeam Kasten service account and namespace to the Vault policy.
vault.serviceAccountTokenPath is optional and defaults to /var/run/secrets/kubernetes.io/serviceaccount/token.
Restoring Veeam Kasten Backup in Air-Gapped environment
For air-gapped installations, it is assumed that the k10offline tool was used to push the images to a private container registry. The following command instructs k10restore to run in air-gapped mode.
# Install the helm chart that creates the Kasten restore job and wait for completion of the `k10-restore` job.
# Assume that Kasten is installed in the 'kasten-io' namespace.
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set airgapped.repository=repo.example.com \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name>
Restoring Veeam Kasten Backup with Google Workload Identity Federation
Veeam Kasten can be restored from a Google Cloud Storage bucket using Google Workload Identity Federation. Please follow the instructions provided here to restore Veeam Kasten with this option.
Using the Restored Veeam Kasten in Place of the Original
The newly restored Veeam Kasten includes a safety mechanism to prevent it from performing critical background maintenance operations on backup data in storage. These operations are exclusive, meaning that only one Veeam Kasten instance should perform them at a time. The DR-restored Veeam Kasten initially assumes that it does not have permission to perform these maintenance tasks. This assumption is made in case the original source Veeam Kasten is still running, for example when the DR restore procedure is being tested in a secondary test cluster while the primary production Veeam Kasten is still active.
If no other Veeam Kasten instances are accessing the same sets of backup data (i.e., the original Veeam Kasten has been uninstalled and only the new DR-restored Veeam Kasten remains), signal that the new Veeam Kasten is now eligible to take over the maintenance duties by deleting the following resource:
# Delete the k10-dr-remove-to-get-ownership configmap in the Kasten namespace.
$ kubectl delete configmap --namespace=kasten-io k10-dr-remove-to-get-ownership
Important
It is critical that you delete this resource only when you are prepared to make the permanent cutover to the new DR-restored Veeam Kasten instance. Running multiple Veeam Kasten instances simultaneously, each assuming ownership, can corrupt backup data.
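Since the deletion is a one-way cutover, it may help to put an explicit confirmation in front of it. The sketch below is one possible guard, not part of Veeam Kasten; the function names confirm_cutover and take_ownership are hypothetical, and the kubectl command is the one shown above.

```shell
# Ask the operator to retype the namespace before the cutover proceeds.
confirm_cutover() {
  printf 'Original Veeam Kasten uninstalled? Retype "%s" to continue: ' "$1" >&2
  read -r reply
  [ "$reply" = "$1" ]
}

# Delete the ownership configmap only after the confirmation succeeds.
take_ownership() {
  ns=${1:-kasten-io}
  if confirm_cutover "$ns"; then
    kubectl delete configmap --namespace="$ns" k10-dr-remove-to-get-ownership
  else
    echo "Cutover aborted; configmap left in place." >&2
    return 1
  fi
}

# Usage: take_ownership kasten-io
```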
Cluster-Scoped Resource Recovery
Prior to recovering applications, it may be desirable to restore cluster-scoped resources. Cluster-scoped resources may be needed for cluster configuration or as part of application recovery.
Upon completion of the Disaster Recovery Restore job, go to the Applications card, hover on the Cluster-Scoped Resources card, click on the restore icon, and select a cluster restore point to recover from.
Application Recovery
Upon completion of the Disaster Recovery Restore job, go to the Applications card, select Removed under the Filter by status drop-down menu.
Click restore under the application and select a restore point to recover from.
Uninstall k10restore
The K10restore instance can be uninstalled with the helm uninstall command.
# e.g. to uninstall K10restore from the kasten-io namespace
$ helm uninstall k10-restore --namespace=kasten-io
Recovering with the Operator
Recovering from a Veeam Kasten backup involves the following sequence of actions:
Install a fresh Veeam Kasten instance.
Configure a Location Profile from where the Veeam Kasten backup will be restored.
Create a Kubernetes Secret named k10-dr-secret in the same namespace as the Veeam Kasten install, with the passphrase given when disaster recovery was enabled on the previous Veeam Kasten instance. The commands are detailed here.
Create a K10restore instance. The required values are
Cluster ID - value given when disaster recovery was enabled on the previous Veeam Kasten instance.
Profile name - name of the Location Profile configured in Step 2.
and the optional values are
Point in time - time (RFC3339) at which to evaluate restore data. Example "2022-01-02T15:04:05Z".
Resources to skip - can be used to skip restore of specific resources. Example "profiles,policies".
After recovery, deleting the k10restore instance is recommended.
Operator K10restore form view with Enable HashiCorp Vault set to False
Operator K10restore form view with Enable HashiCorp Vault set to True