Immutable Backups Workflow
K10 can leverage the object-locking capability of object stores to make backups immutable. This guards against catastrophic scenarios such as ransomware attacks and keeps the backups recoverable in those situations.
This feature is currently available for use with AWS S3 and any S3-compatible object store that supports object locking.
Disaster Scenarios
Vulnerabilities can arise from many sources, such as a lack of privilege separation caused by overly permissive credentials, or a sophisticated attack.
Consider a comprehensive breach of all secured systems in a Kubernetes cluster and its ancillary infrastructure. Assume that a malicious agent has compromised all of the following:
the Kubernetes cluster - can inspect and control applications running in all namespaces, read secrets, manipulate snapshots, and locate backups.
the object store - can tamper with or destroy application backup data exported by K10.
the K10 deployment - can force retirement of K10 snapshots and backups, compelling K10 to delete the associated data and metadata, including application backups and K10 Disaster Recovery self-backups.
the production application - can tamper with or encrypt vital data, demanding a ransom in exchange for resumed access.
In such a scenario, an attacker might first render the backups useless for restoration, then encrypt live application data and demand a ransom.
K10 Immutable Backups
In the face of such a comprehensive attack, K10 has the ability to turn back the clock.
K10 can export backups to object store buckets with object locking enabled, which renders the data written there immutable: even users with administrative privileges cannot delete or tamper with it. Each repository blob remains immutable and secure for an extendable period of time.
K10 uses these immutable blobs to go back in time and retrieve the backups as they were before the attack.
Protection Period
How far back in time K10 can see is a tunable parameter called the protection period. The protection period is chosen by the user when creating a profile with immutable backups enabled.
The protection period represents the estimated amount of time required by your organization to recover following [impact](https://attack.mitre.org/tactics/TA0040/). Recovery must begin within that protection period window in order to guarantee a successful restoration from immutable backups. The longer the protection period, the more time can be afforded for the recovery from an impact event like ransomware, and the farther back in time K10 can go to find consistent, unadulterated restore points.
The protection period can be any whole number of days, from a minimum of 1 day up to a maximum set by the default retention configuration of the object store location, as discussed below. Longer protection periods may increase storage costs, because object versions must be retained longer.
Default Retention Configuration
To keep K10 in compliance with the protection period, every new blob written to the object store must automatically receive a sufficiently long retention period. For this reason, a default retention policy must be set on the bucket.
The default retention must always be longer than the requested protection period: K10 requires a default retention period at least 20 days longer than the protection period.
This requirement caps the longest protection period K10 can guarantee. For example, if an S3 bucket has a default retention of 60 days, K10 will accept a protection period of at most 40 days for that bucket.
If you need a longer protection period than K10 currently accepts, reconfigure the default retention to be at least 20 days longer than the desired protection period.
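The 20-day margin reduces to simple arithmetic. The following is a minimal Bash sketch (the function names are hypothetical and not part of K10) that computes the longest protection period a given default retention allows, and the default retention needed for a desired protection period:

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the rule described above:
#   default retention (days) >= protection period (days) + 20
RETENTION_MARGIN_DAYS=20
MIN_PROTECTION_DAYS=1

# Longest protection period accepted for a given default retention (days).
# Returns 1 if the retention is too short to allow even a 1-day period.
max_protection_period() {
  local retention_days=$1
  local max=$(( retention_days - RETENTION_MARGIN_DAYS ))
  if (( max < MIN_PROTECTION_DAYS )); then
    echo "default retention of ${retention_days} days is too short" >&2
    return 1
  fi
  echo "$max"
}

# Minimum default retention (days) required for a desired protection period.
min_default_retention() {
  local protection_days=$1
  echo $(( protection_days + RETENTION_MARGIN_DAYS ))
}
```

For example, `max_protection_period 60` prints `40`, matching the example above, and `min_default_retention 1` prints `21`, which is the 21-day minimum listed in the bucket criteria.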
Comprehensive Protection
Individual applications can be protected with immutable backups by exporting them to a location with object locking enabled. This ensures those application restore points cannot be tampered with, and that K10 knows how to access them at the appropriate points in time.
However, if the cluster has been completely compromised, K10 itself may be tampered with. For example, if an attacker manually retires all restore points, those backups no longer exist from K10's perspective.
To comprehensively secure an application's backups even in the face of an attack on K10, it is necessary to activate a K10 Disaster Recovery policy that also makes use of immutable backups. This means that all backups of K10 itself are also stored immutably in an object-locked object store location.
By combining immutable backup policy runs for an application with a subsequent immutable Disaster Recovery policy run for K10, you preserve both the application's data and the ability to restore it.
Each time the Disaster Recovery policy runs, it "locks in" the ability to recover from any restore point active at that moment. Because multiple applications may be protected by different policies with different scheduling frequencies, schedule the Disaster Recovery policy to run at least as frequently as the most frequent policy that performs immutable backups. Additionally, the protection period chosen for the Disaster Recovery profile should be at least as long as the longest protection period used by any immutable backups policy.
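The two recommendations above can be expressed as a simple check. In this hypothetical Bash sketch, each immutable backup policy is encoded as `INTERVAL_HOURS:PROTECTION_DAYS` (an encoding invented for the example, not a K10 format), and `check_dr_settings` reports any policy that runs more often than the Disaster Recovery policy or has a longer protection period than it:

```shell
#!/usr/bin/env bash
# Hypothetical check of the DR scheduling recommendations.
# Usage: check_dr_settings DR_INTERVAL_HOURS DR_PROTECTION_DAYS POLICY...
# where each POLICY is "INTERVAL_HOURS:PROTECTION_DAYS" (illustrative encoding).
check_dr_settings() {
  local dr_interval=$1 dr_protection=$2
  shift 2
  local ok=0 policy interval protection
  for policy in "$@"; do
    interval=${policy%%:*}
    protection=${policy##*:}
    # DR should run at least as often as every immutable backup policy,
    # so its interval must not exceed any policy's interval.
    if (( dr_interval > interval )); then
      echo "DR runs every ${dr_interval}h, less often than a policy running every ${interval}h" >&2
      ok=1
    fi
    # DR protection period should cover the longest policy protection period.
    if (( dr_protection < protection )); then
      echo "DR protection period ${dr_protection}d is shorter than a policy's ${protection}d" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

For instance, `check_dr_settings 1 30 24:30 1:7` succeeds, while `check_dr_settings 24 30 1:7` fails because the DR policy would run daily while an immutable backup policy runs hourly.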
Setting Up Immutable Backup Protection
Begin by setting up one or more location profiles, each pointing to an object store destination with object-locking enabled.
The selected object store destination must meet certain criteria:
Must be a bucket on AWS S3 or an S3-compatible object store.
Must be reachable with the credentials provided in the profile form.
Must already exist.
Must reside in the region provided on the profile form.
Must have object locking enabled.
Must have a default retention configuration.
Default retention mode must be compliance mode.
Default retention period must be at least 21 days.
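On AWS S3, a bucket meeting these criteria can be prepared with the AWS CLI. The sketch below is illustrative only: `BUCKET`, `REGION`, and the 60-day `RETENTION_DAYS` are placeholder assumptions, and `object_lock_config` is a hypothetical helper that builds the JSON accepted by `aws s3api put-object-lock-configuration`. The AWS calls run only when `BUCKET` is set.

```shell
#!/usr/bin/env bash
# Sketch: prepare an S3 bucket for K10 immutable backups.
# BUCKET, REGION and RETENTION_DAYS are placeholder assumptions.

# Build an object lock configuration with a COMPLIANCE-mode default retention.
# Refuses retention periods below the 21-day minimum listed above.
object_lock_config() {
  local days=$1
  if (( days < 21 )); then
    echo "default retention must be at least 21 days" >&2
    return 1
  fi
  printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":%d}}}\n' "$days"
}

# Only call AWS when BUCKET is set, so the helper can be tried safely.
if [[ -n "${BUCKET:-}" ]]; then
  REGION=${REGION:-us-west-2}
  RETENTION_DAYS=${RETENTION_DAYS:-60}
  config=$(object_lock_config "$RETENTION_DAYS") || exit 1
  # Enable object locking at creation time (S3 also enables versioning).
  # In us-east-1, omit --create-bucket-configuration.
  aws s3api create-bucket --bucket "$BUCKET" --region "$REGION" \
    --create-bucket-configuration "LocationConstraint=$REGION" \
    --object-lock-enabled-for-bucket
  # Apply the default retention rule (compliance mode).
  aws s3api put-object-lock-configuration --bucket "$BUCKET" \
    --object-lock-configuration "$config"
  # Show the resulting configuration before clicking "Validate Bucket" in K10.
  aws s3api get-object-lock-configuration --bucket "$BUCKET"
fi
```

With a 60-day default retention, the profile's protection period could then be set to at most 40 days.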
Select the checkbox for "Enable Immutable Backups".
Click the "Validate Bucket" button. This will initiate a pre-flight check of the above criteria.
If all of the checks succeed, a slider appears for selecting the desired protection period. Its range runs from 1 day to an upper bound dictated by the default retention period currently configured on the object store location. If the slider does not reach the desired protection period, reconfigure the location's default retention to be at least 20 days longer than that period.
Click the "Save Profile" button to submit this profile configuration.
Each profile that references an object-locked location and has immutable backups enabled shows this on its profile card: an "Object Immutability" field set to "enabled", along with the chosen protection period.
Next, follow the standard Backups workflow, selecting one of the immutable backup profiles for each policy. This protects each covered application with immutable backups on every policy run.
Finally, follow the K10 Disaster Recovery workflow, selecting the desired immutable backup profile as the "Cloud Location Profile" in the DR form.
Recovering from a Worst-Case Scenario Attack
Stage 1: Recover K10
If an attack has been detected, it is often safer to assume that K10 has been compromised and must be recovered to a point in time before the attack. If you are absolutely certain that K10 has not been affected, you can still restore applications directly from K10's current restore points; in that case, skip to Stage 2. When in doubt, discuss with your CISO whether to restore K10 to a time before the attack.
Restoring K10 is straightforward. Follow the standard Recovering K10 From a Disaster workflow:
Reinstall K10, deleting the K10 namespace if reinstalling on the same cluster.
Create a secret to contain the DR passphrase.
Provide the external storage configuration by adding a location profile referencing the object store location where backups are stored. This should resemble the profile that was created when Setting Up Immutable Backup Protection, but it is not necessary to create it with "Immutable Backups" enabled.
Ascertain when the disaster most likely occurred, and pick a timestamp corresponding to a point in time before the disaster event. The chosen time should be no further in the past than the protection period of the immutable profile used by the K10 Disaster Recovery policy.
Install the helm chart that creates the K10 restore job. Provide the source cluster ID, the name of the location profile just created, and the point in time chosen in the previous step, formatted as an RFC 3339 timestamp.
# e.g. to restore K10 to 15:04:05 UTC on Jan 2, 2022:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
--set sourceClusterID=<source-clusterID> \
--set profile.name=<location-profile-name> \
--set pointInTime="2022-01-02T15:04:05Z"
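The point-in-time rule and the restore command above can be combined into a small script. This is a hypothetical Bash sketch, not part of K10: `pit_within_window`, `restore_k10`, and the `DRY_RUN` switch are illustrative names, and GNU `date` is assumed for parsing RFC 3339 timestamps. It refuses a point in time older than the Disaster Recovery profile's protection period (or one in the future) and, by default, only prints the `helm install` command it would run.

```shell
#!/usr/bin/env bash
# Sketch: validate the chosen point in time, then install the restore chart.
# Requires GNU date. Function names and DRY_RUN are hypothetical.

# Print commands instead of running them unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Succeeds if RFC 3339 time PIT lies within the PROTECTION_DAYS days before
# NOW (epoch seconds; defaults to the current time).
pit_within_window() {
  local pit=$1 protection_days=$2 now=${3:-$(date -u +%s)}
  local pit_epoch
  pit_epoch=$(date -u -d "$pit" +%s) || return 1
  (( pit_epoch <= now && pit_epoch >= now - protection_days * 86400 ))
}

# Usage: restore_k10 SOURCE_CLUSTER_ID PROFILE_NAME POINT_IN_TIME DR_PROTECTION_DAYS
restore_k10() {
  local cluster_id=$1 profile=$2 pit=$3 protection_days=$4
  if ! pit_within_window "$pit" "$protection_days"; then
    echo "point in time $pit is outside the ${protection_days}-day protection period" >&2
    return 1
  fi
  run helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set "sourceClusterID=$cluster_id" \
    --set "profile.name=$profile" \
    --set "pointInTime=$pit"
}
```

For example, `restore_k10 <source-clusterID> <location-profile-name> 2022-01-02T15:04:05Z 30` prints the same `helm install` command shown above if that timestamp falls within the last 30 days; set `DRY_RUN=0` to execute it instead.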
Upon successful recovery, K10 will be in the state captured by the last Disaster Recovery policy run before the chosen point in time. This includes references to restore points that have since been retired, even if they were retired as part of routine policy retention management.
Restore points referring to snapshots or non-immutable backups may not be recoverable in this state, because local storage snapshots or non-immutable exported backup data may have been permanently deleted, either during the attack or as part of routine operation. However, any restore points corresponding to immutable backups should still be fully recoverable.
Stage 2: Restore Applications
The process for restoring applications from immutable backups is identical to the standard restoration workflow. Make sure to select the Exported restore point instance, which refers to the backup stored in the object-locked bucket.
K10 will do all point-in-time management based on its knowledge of the time the backup took place. Initiating a restore is as simple as selecting the desired backup (exported) restore point and clicking "Restore".
When the restore action completes, the application will be running in the same state, with the same persistent data, as it was at the time the backup took place.