Kasten Disaster Recovery (DR) aims to protect Kasten from failures of the
underlying infrastructure. In particular, this feature provides the
ability to recover the Kasten platform from a variety of disasters,
such as the accidental deletion of Kasten, the failure of the
underlying storage that Kasten uses for its catalog, or even the
accidental destruction of the Kubernetes cluster on which Kasten
is deployed.
Kasten enables Disaster Recovery with the help of an internal policy
to back up its data stores, storing them in either an object storage
bucket or an NFS file storage location configured through a
Location Profile.
To enable Kasten Disaster Recovery, a Location Profile
needs to be configured. This profile will use an object storage bucket or an
NFS file storage location to store data from Kasten's internal data
stores. The cluster must have write permissions for this location.
Note
A VBR location profile cannot be used as a destination for DR
backups.
The Kasten Disaster Recovery settings are accessible via the
Disaster Recovery page under the Settings menu in the
navigation sidebar. For new installations, these settings are
also accessible using the link located within the alerts panel.
Select the Disaster Recovery page under the Settings menu in the
navigation sidebar.
Enabling Kasten Disaster Recovery requires selecting a Location Profile for the
exported Kasten Disaster Recovery backups and providing a passphrase for
encrypting the snapshot data.
The passphrase can be provided as a raw string
or as a reference to a secret in HashiCorp Vault or AWS Secrets Manager.
Enable Disaster Recovery by selecting a valid location profile and providing
either a raw passphrase or secret management credentials, then clicking
the Enable Kasten DR button.
Note
If providing a raw passphrase,
save it securely outside the cluster.
A confirmation message with the cluster ID will be displayed when Disaster
Recovery is enabled. This ID is used as a prefix to the object storage or NFS
file storage location where Kasten's data store saves its exported backups.
Note
Save the cluster ID in a safe place;
it is required to recover Kasten from a disaster.
The cluster ID value can also be retrieved with the
following kubectl command.
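As a hedged sketch: Kasten's published documentation derives the cluster ID from the UID of the default namespace, so, assuming that still holds for the installed version, the value can be read as shown here. The function name get_cluster_id is illustrative and not part of Kasten.

```shell
# Print the UID of the `default` namespace, which is assumed here to be
# the value Kasten reports as the cluster ID.
get_cluster_id() {
  kubectl get namespace default -o jsonpath="{.metadata.uid}{'\n'}"
}
```

Running get_cluster_id against the original cluster should print the same value shown in the confirmation message; if the two differ, trust the value shown in the UI.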
A policy called k10-disaster-recovery-policy that implements
Kasten Disaster Recovery will automatically be created when Disaster
Recovery is enabled. This policy can be viewed through the Policies
page in the navigation sidebar.
Click Run Once on the k10-disaster-recovery-policy to start a
backup. The data exported by Kasten for Disaster Recovery purposes
will be encrypted via AES-256-GCM.
Warning
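After enabling Kasten Disaster Recovery, it is essential
to copy and save the following to successfully recover Kasten
from a disaster: the cluster ID, the Disaster Recovery passphrase,
and the credentials and bucket or NFS details used in the Location Profile.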
Kasten Quick Disaster Recovery aims to improve the
K10 Disaster Recovery workflow. The necessary metadata is
extracted and backed up to facilitate faster recovery in the event of
a disaster. To enable this feature, install or upgrade Kasten
with the --set kastenDisasterRecovery.quickMode.enabled=true Helm value.
The backed up metadata includes Kasten resources and data
necessary to restore all exported restore points of applications.
In addition to these, other Kasten resources, such as
policies and profiles, are also included.
When a disaster occurs, in most cases an application's local restore points
are lost along with the Kubernetes cluster. This workflow provides a path for
reliable recovery by restoring the exported application restore points.
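Enabling Quick Disaster Recovery on an existing installation might look like the sketch below. The release name k10, the chart name kasten/k10, and the namespace kasten-io are assumptions based on a default installation; only the kastenDisasterRecovery.quickMode.enabled value comes from the text above. The function name is illustrative.

```shell
# Upgrade an existing Kasten release in place and switch on Quick DR.
# $1: namespace of the Kasten release (assumed default: kasten-io)
enable_quick_dr() {
  helm upgrade k10 kasten/k10 \
    --namespace "${1:-kasten-io}" \
    --reuse-values \
    --set kastenDisasterRecovery.quickMode.enabled=true
}
```

The --reuse-values flag keeps the values from the previous release so that only the Quick DR setting changes.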
Kasten Disaster Recovery can be disabled by clicking the Disable Kasten DR
button on the Setup Kasten DR page, which is found under
the Settings menu in the navigation sidebar.
To recover from a Kasten backup using the UI, follow these steps:
On a new cluster, install a fresh Kasten instance in the same
namespace as the original Kasten instance.
On the new cluster, create a location profile by providing the
bucket information and credentials for the object storage
location or NFS file storage location where previous Kasten
backups are stored.
On the new cluster, navigate to the Restore Kasten
page under the Settings menu in the navigation sidebar.
In the Profile drop-down, select the location profile created
in step 2.
For Cluster ID, provide the ID of the original cluster with Kasten
Disaster Recovery enabled. This ID can be found on the
Setup Kasten DR page of the original cluster that currently
has Kasten Disaster Recovery enabled.
Raw passphrase: Provide the passphrase used when enabling
Disaster Recovery.
HashiCorp Vault: Provide the Key Value Secrets Engine Version,
Mount, Path, and Passphrase Key stored in a HashiCorp Vault secret.
AWS Secrets Manager: Provide the secret name, its associated region,
and the key.
Note
For immutable location profiles, a previous
point in time can be provided to filter out any restore points
newer than the specified time in the next step. If no specific
date is chosen, it will display all available restore points,
with the most recent ones appearing first.
Click the Next button to start the validation process.
If validation succeeds, a drop-down containing the available
restore points will be displayed.
Note
All times are displayed in the local timezone of the
client's browser.
Select the desired restore point and click the Next button.
Review the summary and click the Start Restore button to
begin the restore process.
After a successful restore, a prompt to navigate to the
dashboard is displayed, along with information about ownership
and deletion of the configmap.
Recovering from a Kasten backup involves the following sequence of actions:
Create a Kubernetes Secret, k10-dr-secret, using the passphrase
provided while enabling Disaster Recovery
Install a fresh Kasten instance in the same namespace as the above Secret
Provide bucket information and credentials for the object storage
location or NFS file storage location where previous Kasten backups
are stored
Restore the Kasten backup
Uninstall the Kasten restore instance after recovery (recommended)
Note
If Kasten was previously installed in FIPS mode, ensure the fresh Kasten
instance is also installed in FIPS mode.
Note
If the Kasten backup is stored using an
NFS File Storage Location, it is
important that the same NFS share is reachable from the recovery cluster
and is mounted on all nodes where Kasten is installed.
Currently, Kasten Disaster Recovery encrypts all artifacts using the
AES-256-GCM algorithm. The passphrase entered while enabling Disaster Recovery
is used for this encryption. On the cluster used for Kasten recovery, the
Secret k10-dr-secret therefore needs to be created using that same
passphrase in the Kasten namespace (default kasten-io).
The passphrase can be provided as a raw string or as a reference
to a secret in HashiCorp Vault or AWS Secrets Manager.
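Specifying the passphrase as a raw string could follow the same pattern as the Vault and AWS Secrets Manager commands, with the passphrase stored directly under key. This is a sketch under that assumption; the function name create_dr_secret_raw is illustrative.

```shell
# Create k10-dr-secret holding the raw DR passphrase.
# $1: passphrase used when Disaster Recovery was enabled
# $2: Kasten namespace (default: kasten-io)
create_dr_secret_raw() {
  kubectl create secret generic k10-dr-secret \
    --namespace "${2:-kasten-io}" \
    --from-literal key="$1"
}
```

Quoting "$1" keeps passphrases that contain spaces or shell metacharacters intact.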
Specifying the passphrase as a HashiCorp Vault secret:
$ kubectl create secret generic k10-dr-secret \
    --namespace kasten-io \
    --from-literal source=vault \
    --from-literal vault-kv-version=<version-of-key-value-secrets-engine> \
    --from-literal vault-mount-path=<path-where-key-value-engine-is-mounted> \
    --from-literal vault-secret-path=<path-from-mount-to-passphrase-key> \
    --from-literal key=<name-of-passphrase-key>
# Example
$ kubectl create secret generic k10-dr-secret \
    --namespace kasten-io \
    --from-literal source=vault \
    --from-literal vault-kv-version=KVv1 \
    --from-literal vault-mount-path=secret \
    --from-literal vault-secret-path=k10 \
    --from-literal key=passphrase
The supported values for vault-kv-version are KVv1 and KVv2.
Note
Using a passphrase from HashiCorp Vault also requires enabling
HashiCorp Vault authentication when installing the kasten/k10restore
helm chart. See Enabling HashiCorp Vault using
Token Auth or
Kubernetes Auth.
Specifying the passphrase as an AWS Secrets Manager secret:
$ kubectl create secret generic k10-dr-secret \
    --namespace kasten-io \
    --from-literal source=aws \
    --from-literal aws-region=<aws-region-for-secret> \
    --from-literal key=<aws-secret-name>
# Example
$ kubectl create secret generic k10-dr-secret \
    --namespace kasten-io \
    --from-literal source=aws \
    --from-literal aws-region=us-east-1 \
    --from-literal key=k10/dr/passphrase
When reinstalling Kasten on the same cluster, it is important to
clean up the namespace in which Kasten was previously installed before
creating the passphrase Secret described above.
This file is protected and should not be modified. It is necessary
to specify all other values using the corresponding helm flags, such as
--set, --values, etc.
Credentials for Registry1 must be provided in order to successfully pull
the images. These should already have been created as part of re-deploying a
new Kasten instance; therefore, only the name of the secret should be
used here.
The following set of flags should be added to the instructions found in
Restore Kasten Backup to use Iron Bank
images for Kasten disaster recovery:
The general instructions found in Restore Kasten
Backup can be used for restoring
Kasten in FIPS mode with a few changes.
To ensure that certified cryptographic modules are utilized, you must install
the k10restore chart with additional Helm values that can be found here: FIPS
values. These should be added to the
instructions found in Restore Kasten Backup
for Kasten disaster recovery:
The overrideResources flag must be set to true when using
Quick Disaster Recovery. Since the Disaster Recovery operation involves
creating or replacing resources, confirmation should be provided
by setting this flag.
The restore job always restores the restore point catalog and artifact
information. If the restore of other resources (options include profiles,
policies, secrets) needs to be skipped, the skipResource flag can be used.
The timeout of the entire restore process can be configured by the helm field
restore.timeout. The type of this field is int and the value is
in minutes.
If the Disaster Recovery Location Profile was configured for
Immutable Backups, Kasten can be restored
to an earlier point in time. The protection period chosen when
creating the profile determines how far in the past the point-in-time
can be. Set the pointInTime helm value to the desired time stamp.
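The optional values described above (skipResource, restore.timeout, and pointInTime) could be combined in a single k10restore install, roughly as sketched here. The release name k10-restore and the sourceClusterID and profile.name values are assumptions not stated in this section; the 90-minute timeout and the timestamp are illustrative only.

```shell
# Install the k10restore chart with optional restore settings.
# $1: cluster ID of the original cluster
# $2: name of the Location Profile holding the DR backups
restore_kasten_with_options() {
  # Commas inside a --set value must be escaped so Helm does not
  # split the value into separate key=value pairs.
  helm install k10-restore kasten/k10restore \
    --namespace kasten-io \
    --set sourceClusterID="$1" \
    --set profile.name="$2" \
    --set skipResource='profiles\,policies' \
    --set restore.timeout=90 \
    --set pointInTime='2022-01-02T15:04:05Z'
}
```

Each optional --set line can be dropped independently; pointInTime only applies when the Location Profile was configured for immutable backups.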
vault.role is the name of the Vault Kubernetes authentication role binding
the Kasten service account and namespace to the Vault policy.
vault.serviceAccountTokenPath is optional and defaults to
/var/run/secrets/kubernetes.io/serviceaccount/token.
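A restore that uses Vault Kubernetes authentication might pass these values as sketched below. Only vault.role and vault.serviceAccountTokenPath are named in the text above; vault.enabled, vault.address, sourceClusterID, profile.name, and the release name k10-restore are assumptions that should be checked against the chart's values before use.

```shell
# Install k10restore with HashiCorp Vault Kubernetes authentication.
# $1: cluster ID of the original cluster
# $2: Location Profile name
# $3: Vault server address (assumed value name: vault.address)
# $4: Vault Kubernetes authentication role
restore_kasten_with_vault() {
  helm install k10-restore kasten/k10restore \
    --namespace kasten-io \
    --set sourceClusterID="$1" \
    --set profile.name="$2" \
    --set vault.enabled=true \
    --set vault.address="$3" \
    --set vault.role="$4" \
    --set vault.serviceAccountTokenPath=/var/run/secrets/kubernetes.io/serviceaccount/token
}
```

The last line only restates the default token path and can be omitted.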
Restoring Kasten Backup in Air-Gapped environment
For air-gapped installations, it is assumed that the k10offline tool has been
used to push the images to a private container registry.
The command below can be used to instruct k10restore to run in air-gapped mode.
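A sketch of such a command follows. The airgapped.repository value name, along with sourceClusterID, profile.name, and the release name k10-restore, is an assumption and should be confirmed against the k10restore chart; repo.example.com is a placeholder registry.

```shell
# Install k10restore so that it pulls images from a private registry.
# $1: cluster ID of the original cluster
# $2: Location Profile name
# $3: private registry that k10offline pushed the images to
restore_kasten_airgapped() {
  helm install k10-restore kasten/k10restore \
    --namespace kasten-io \
    --set airgapped.repository="$3" \
    --set sourceClusterID="$1" \
    --set profile.name="$2"
}
```

For example, restore_kasten_airgapped "<source-clusterID>" "<location-profile-name>" repo.example.com.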
Restoring Kasten Backup with Google Workload Identity Federation
Kasten can be restored from a Google Cloud Storage bucket using the
Google Workload Identity Federation. Please follow the instructions
provided here to restore Kasten with
this option.
Using the Restored Kasten in Place of the Original
The newly restored Kasten includes a safety mechanism to prevent it
from performing critical background maintenance operations on backup
data in storage. These operations are exclusive, meaning that only
one Kasten instance should perform them at a time.
The DR-restored Kasten initially assumes that it does not have
permission to perform these maintenance tasks. This assumption is
made in case the original source Kasten is still running,
especially during scenarios like testing the DR restore procedure in a
secondary test cluster while the primary production Kasten is still
active.
If no other Kasten instances are accessing the same sets of backup
data (i.e., the original Kasten has been uninstalled and only the new
DR-restored Kasten remains), it can be signaled that the new Kasten is
now eligible to take over the maintenance duties by deleting the
following resource:
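The resource in question is the configmap mentioned at the end of the UI restore flow. Assuming it is named k10-dr-remove-to-get-ownership, as in Kasten's published documentation, and lives in the Kasten namespace, the deletion could be sketched as follows; verify the name on the restored cluster first. The function name is illustrative.

```shell
# Signal that the DR-restored Kasten may take over maintenance duties
# by deleting the ownership guard configmap (name is an assumption).
# $1: Kasten namespace (default: kasten-io)
take_maintenance_ownership() {
  kubectl delete configmap k10-dr-remove-to-get-ownership \
    --namespace "${1:-kasten-io}"
}
```

Listing configmaps in the Kasten namespace with kubectl get configmap beforehand is a cheap way to confirm the name.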
It is critical that you delete this resource only when you are prepared
to make the permanent cutover to the new DR-restored Kasten instance. Running
multiple Kasten instances simultaneously, each assuming ownership, can
corrupt backup data.
Prior to recovering applications, it may be desirable to restore
cluster-scoped resources. Cluster-scoped resources may be needed
for cluster configuration or as part of application recovery.
Upon completion of the Disaster Recovery Restore job, go to the Applications
card, hover over the Cluster-Scoped Resources card, click the
restore icon, and select a cluster restore point to recover from.
Upon completion of the Disaster Recovery Restore job, go to the Applications
card and select Removed under the Filter by status drop-down menu.
Click restore under the application and select a restore point
to recover from.
Recovering from a Kasten backup involves the following sequence of actions:
Install a fresh Kasten instance.
Configure a Location Profile from
where the Kasten backup will be restored.
Create a Kubernetes Secret named k10-dr-secret in the same namespace
as the Kasten install, with the passphrase given when disaster recovery
was enabled on the previous Kasten instance.
The commands are detailed here.
Create a K10restore instance. The required values are
Cluster ID - value given when disaster recovery was enabled
on the previous Kasten instance.
Profile name - name of the Location Profile configured in Step 2.
and the optional values are
Point in time - time (RFC3339) at which to evaluate restore data.
Example "2022-01-02T15:04:05Z".
Resources to skip - can be used to skip restore of specific resources.
Example "profile,policies".
After recovery, deleting the k10restore instance is recommended.
Operator K10restore form view with Enable HashiCorp Vault set to False
Operator K10restore form view with Enable HashiCorp Vault set to True