Immutable Backups Workflow

K10 can leverage the object-locking capability available in object stores to make backups immutable. This guards against catastrophic disaster scenarios such as ransomware attacks and allows recovering the backups in those situations.

This feature is currently available for use with AWS S3 and any S3-compatible object store that supports object locking.

Warning

The generic storage and shareable volume backup and restore workflows are not compatible with the protections afforded by immutable backups. A location profile enabled for immutable backups can still be used for backup and restore in these workflows, but the protection period is ignored and the profile is treated as a non-immutability-enabled location. Note that using an object-locking bucket for such use cases can amplify storage usage without any additional benefit. Please contact support for any inquiries.

Disaster Scenarios

Vulnerabilities can arise from many sources, such as lack of privilege separation due to credentials with permissive access, or from sophisticated attacks.

Consider a comprehensive breach of all secured systems in a Kubernetes cluster and ancillary infrastructure. Assume that a malicious agent has compromised all of the following:

  • the Kubernetes cluster - can inspect and control applications running in all namespaces, read secrets, manipulate snapshots, and locate backups.

  • the object store - can tamper with or destroy application backup data that had been exported by K10.

  • the K10 deployment - can force retirement of K10 snapshots and backups, compelling K10 to delete the associated data and metadata, including application backups and K10 Disaster Recovery self-backups.

  • the production application - can tamper with or encrypt vital data, demanding a ransom in exchange for resumed access.

In such a sophisticated scenario, an attacker might attempt to render restores from backups useless, before encrypting live application data and demanding a ransom.

K10 Immutable Backups

In the face of such a comprehensive attack, K10 has the ability to turn back the clock.

K10 is capable of exporting backups to object store containers with object locking enabled. Doing so renders the data written there immutable. Even users with administrative privileges are prevented from deleting or tampering with the backup data. Each repository blob is immutable and secure for an extendable period of time.

K10 uses these immutable blobs to go back in time, and retrieve the backups as they were prior to the time of an attack.

Protection Period

How far back in time K10 can see is a tunable parameter called the protection period. The protection period is chosen by the user when creating a profile with immutable backups enabled.

The protection period represents the estimated amount of time required by your organization to recover following impact. Recovery must begin within that protection period window in order to guarantee a successful restoration from immutable backups. The longer the protection period, the more time can be afforded for the recovery from an impact event like ransomware, and the farther back in time K10 can go to find consistent, unadulterated restore points.

The protection period can be any length of time greater than or equal to 1 day, with a granularity of days. This range is subject to the default retention configuration of the object store location, as discussed below. Longer protection periods may have increased storage costs associated with them, as object versions must be kept around longer.

Default Retention Configuration

To ensure K10 is always in compliance with this protection period window, all new blobs written to the object store are required to be written with a sufficient retention policy automatically applied to them. For this reason, a default blob retention policy must be set on the bucket.

The default retention must always be longer than the requested protection window to fully ensure compliance. How much longer? K10 requires a default retention period that is at least 20 days longer than the protection period.

This requirement puts an upper bound on the longest protection period that K10 can guarantee. For example, if an S3 bucket has a default retention of 60 days, the maximum protection period that K10 will accept for that bucket is 40 days (60 - 20).

If a longer protection period is desired than K10 is currently accepting, it's a simple matter of reconfiguring the default retention to be at least 20 days longer than the desired protection period.
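The relationship between default retention and protection period is simple arithmetic, and a minimal sketch of it follows. The function and variable names are hypothetical helpers for illustration and are not part of K10; only the 20-day margin and the 1-day minimum come from the text above.

```shell
# Margin K10 requires between the bucket's default retention and the
# protection period, in days (taken from the text above).
RETENTION_MARGIN_DAYS=20

# Largest protection period (days) accepted for a given default retention
# (days). Prints 0 when the retention is too short for any protection period.
max_protection_days() {
    retention_days=$1
    max=$((retention_days - RETENTION_MARGIN_DAYS))
    if [ "$max" -lt 1 ]; then
        max=0
    fi
    echo "$max"
}

# Smallest default retention (days) needed for a desired protection period.
required_retention_days() {
    protection_days=$1
    echo $((protection_days + RETENTION_MARGIN_DAYS))
}

max_protection_days 60        # prints 40
required_retention_days 1     # prints 21
```

The second call shows why the smallest usable default retention is 21 days: the minimum protection period of 1 day plus the 20-day margin.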

Active Monitoring

K10 has a background service that monitors repositories containing immutable backups. Each time a restore point is exported to a repository in a locked bucket, some new blobs may be written, and some blobs may be reused. Therefore the retain-until date applied to each blob (the property that renders it immutable) may need to be refreshed, or pushed out to a later date, as time passes.

Once an immutable backups repository is created, K10 will begin periodically querying its data blobs in the background. The frequency of this operation is determined by the chosen protection period; a longer protection period doesn't require checking as frequently, because the retain-until date can be pushed out further when needed. That said, the longest the service will wait without performing a refresh check is 2 days.

In order for this background operation to run smoothly, you must maintain a profile referencing the location containing that repository. Additionally, that profile must follow the criteria prescribed for an immutable backups profile.

An alert notification will appear in the upper right corner of the K10 dashboard when any of the following conditions is encountered:

  • the profile no longer meets these criteria

  • the background monitoring service has trouble connecting to the store

  • the background monitoring service is otherwise unable to perform the refresh procedure

Clicking the alert icon will open a side-pane containing a list of outstanding errors and warnings, each with a description of what went wrong. If you are unable to determine the appropriate fix from the information provided, please refer to Kasten K10 Community Support or open a case via https://my.veeam.com.

Comprehensive Protection

Individual applications can be protected with immutable backups by exporting to a location with object locking enabled. This ensures those application restore points cannot be tampered with, and that K10 knows how to access them at the appropriate points in time.

However, if the cluster has been completely compromised, K10 may also be susceptible to tampering. For example, if an attacker manually retires all restore points, then from K10's perspective those backups no longer exist.

To comprehensively secure an application's backups even in the face of an attack on K10, it is necessary to activate a K10 Disaster Recovery policy that also makes use of immutable backups. This means that all backups of K10 itself are also stored immutably in an object-locked object store location.

By combining immutable backup policy runs for an application with a subsequent immutable Disaster Recovery policy run for K10, you preserve both the application's data and the ability to restore it.

Each time the Disaster Recovery policy runs, it will "lock in" the ability to recover from any of the active restore points at that point in time. Multiple applications may be simultaneously protected by different policies with differing scheduling frequencies. Therefore it is recommended to schedule the Disaster Recovery policy to run at least as frequently as, if not more frequently than, the most frequent policy schedule that performs immutable backups. Additionally, the protection period chosen for the Disaster Recovery profile should be at least as long as the longest protection period in use by an immutable backups policy.
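The protection period recommendation can be checked mechanically when several profiles are in use. The following is only a sketch: `dr_protection_covers` is a hypothetical helper, not a K10 command, and it assumes the protection periods (in days) have been collected by hand from the profile cards.

```shell
# Succeeds (exit status 0) when the Disaster Recovery profile's protection
# period is at least as long as every application profile's protection period.
# Usage: dr_protection_covers DR_DAYS APP_DAYS...
dr_protection_covers() {
    dr_days=$1
    shift
    for app_days in "$@"; do
        if [ "$app_days" -gt "$dr_days" ]; then
            echo "DR protection period (${dr_days}d) is shorter than an application's (${app_days}d)" >&2
            return 1
        fi
    done
    return 0
}

# Example: DR profile at 30 days; application profiles at 7, 14, and 30 days.
dr_protection_covers 30 7 14 30 && echo "ok"   # prints ok
```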

Setting Up Immutable Backup Protection

Begin by setting up one or more location profiles, each pointing to an object store destination with object-locking enabled.

The selected object store destination must meet certain criteria:

  • Must be a bucket on AWS S3 or an S3-compatible object store.

  • Must be reachable with the credentials provided in the profile form.

  • Must already exist.

  • The region provided on the profile form must match the region in which the bucket resides.

  • Must have object locking enabled.

  • Must have a default retention configuration.

  • Default retention mode must be compliance mode.

  • Default retention period must be at least 21 days.
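As an illustration of the last three criteria, the sketch below prints an object-lock configuration document in the shape used by the S3 API, with compliance mode and a default retention in days. `object_lock_config` is a hypothetical helper name; check the exact procedure for applying such a configuration against your object store's own documentation.

```shell
# Print an S3 object-lock configuration document with compliance-mode
# default retention for the given number of days. Rejects values below
# the 21-day minimum listed above.
object_lock_config() {
    days=$1
    if [ "$days" -lt 21 ]; then
        echo "default retention must be at least 21 days" >&2
        return 1
    fi
    printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":%d}}}\n' "$days"
}

object_lock_config 60
```

On AWS, a document of this shape can be supplied to `aws s3api put-object-lock-configuration` through its `--object-lock-configuration` option. Because compliance-mode retention cannot be shortened or removed for objects already written, choose the number of days deliberately.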

Select the checkbox for "Enable Immutable Backups".

Click the "Validate Bucket" button. This will initiate a pre-flight check of the above criteria.

If all of the checks succeed, a slider bar will appear for selecting the desired protection period. As discussed earlier, the bounds of this bar are from 1 day to an upper value dictated by the default retention period currently configured at the object store location. If the slider does not include the desired protection period value, it's a simple matter of reconfiguring the location's default retention property: it must be at least 20 days longer than the desired protection period.

Click the "Save Profile" button to submit this profile configuration.

A profile that references an object-locked object store location and has immutable backups enabled displays this on its profile card: an "Object Immutability" field marked "enabled", along with the chosen protection period.

Next, follow the standard Backups workflow, selecting one of the immutable backup profiles for each policy. This will protect each covered application with immutable backups for all policy runs.

Finally, follow the K10 Disaster Recovery workflow, selecting the desired immutable backup profile for the "Cloud Location Profile" in the DR form.

Setting Up Immutable Backup Protection Using Veeam Backup Repository

To use the K10 backup immutability feature with VBR, immutability settings should be set up consistently as follows.

  • Set the immutability window length in VBR to the desired immutability period.

  • Set an appropriate protection period in K10 (see K10 immutability documentation for details). This period must be less than or equal to the immutability window in VBR.

  • Configure K10 backup frequency and retention settings in a way that prevents deleting restore points inside the VBR immutability window (otherwise, it will lead to orphaned restore points that will not be automatically retired).

Note

The integrity of the restore points outside the immutability window in VBR is not guaranteed. It is recommended to set up the retention policy to automatically delete such restore points when they are out of the immutability window.

This approach keeps only the last N backups at the chosen backup frequency. If, for example, N daily backups and M yearly backups must be kept, create several policies with different backup frequencies and retention settings.

Example

A customer wants to guarantee immutability for 30 days for their daily backups. With the immutability window in VBR set to 30 days, they can set the K10 protection period to any value up to and including 30 days, but not more.

Then they create a policy with a daily snapshot (one snapshot per day) and exported snapshot retention as in the screenshot:

Note

Weekly, monthly and yearly snapshots are intentionally set to 0.

If a user needs another backup frequency, it can be done via another (separate) policy with separate S3 and VBR repositories, with corresponding immutability settings.

Recovering from a Worst-Case Scenario Attack

Stage 1: Recover K10

In many disaster circumstances, it may be safer to assume that K10 has been compromised if an attack has been detected. If so, K10 may need to be recovered to a point-in-time prior to the attack. If it is absolutely certain that K10 has not been affected, directly restoring the application from K10's current known restore points is still possible, in which case skip directly to Stage 2. When in doubt, discuss with your CISO whether it makes sense to restore K10 to a time before the attack.

Restoring K10 is straightforward. Follow the standard Recovering K10 From a Disaster workflow:

  1. Reinstall K10, deleting the K10 namespace if reinstalling on the same cluster.

  2. Create a secret to contain the DR passphrase.

  3. Provide the external storage configuration by adding a location profile referencing the object store location where backups are stored. This should resemble the profile that was created when Setting Up Immutable Backup Protection, but it is not necessary to create it with "Immutable Backups" enabled.

  4. Ascertain when the disaster was likely to have occurred. Pick a timestamp corresponding to a point-in-time prior to the disaster event. The chosen time should be no further in the past than the protection period associated with the K10 Disaster Recovery immutable profile.

  5. Install the helm chart that creates the K10 restore job. Provide the source cluster ID, the name of the location profile just created, and the point-in-time chosen in the previous step, formatted as an RFC 3339 timestamp.

# e.g. to restore K10 to 15:04:05 UTC on Jan 2, 2022:
$ helm install k10-restore kasten/k10restore --namespace=kasten-io \
    --set sourceClusterID=<source-clusterID> \
    --set profile.name=<location-profile-name> \
    --set pointInTime="2022-01-02T15:04:05Z"
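When choosing the point-in-time in step 4, it can help to compute how far back the protection window reaches. The sketch below does that calculation; `earliest_point_in_time` is a hypothetical helper, and it assumes GNU `date` (for the `-d "@epoch"` form) and that the protection period of the Disaster Recovery profile is known in days.

```shell
# Print the earliest point-in-time still inside the protection window,
# as an RFC 3339 UTC timestamp. Assumes GNU date.
# Usage: earliest_point_in_time NOW_EPOCH_SECONDS PROTECTION_DAYS
earliest_point_in_time() {
    now_epoch=$1
    protection_days=$2
    earliest_epoch=$((now_epoch - protection_days * 86400))
    date -u -d "@$earliest_epoch" +%Y-%m-%dT%H:%M:%SZ
}

# Example: with a 20-day protection period, how far back can pointInTime go?
earliest_point_in_time "$(date -u +%s)" 20
```

Any timestamp between the printed value and the estimated time of the attack is a candidate for the `pointInTime` setting.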

Upon successful recovery, K10 will be in the same state as it was the last time the Disaster Recovery policy ran prior to the chosen point-in-time. This includes references to restore points that have since been retired, even if that retirement happened as part of routine policy retention management.

Restore points referring to snapshots or non-immutable backups may or may not be recoverable in this state; local storage snapshots or non-immutable exported backup data may have been permanently deleted, either during an attack or as part of routine operation. However, any restore points corresponding to immutable backups should still be fully recoverable.

Stage 2: Restore Applications

The process for restoring applications from immutable backups is identical to the standard restoration workflow. Make sure to select the Exported restore point instance, which refers to the backup residing in the locked object store bucket.

K10 will do all point-in-time management based on its knowledge of the time the backup took place. Initiating a restore is as simple as selecting the desired backup (exported) restore point and clicking "Restore".

Note

The point-in-time chosen for the restore will be different for each PVC, and is a function of the time when the data upload completed for each. The upload completion time can be viewed for each data artifact by querying the RestorePointContent Details. K10 will use a point-in-time 30 seconds after the uploadEndTime timestamp for the restore.
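To make the 30-second offset concrete, the following sketch derives the restore point-in-time from an uploadEndTime value. `restore_point_in_time` is a hypothetical helper for illustration only (K10 performs this calculation itself), and it assumes GNU `date` and a UTC RFC 3339 input.

```shell
# Print the point-in-time described above: 30 seconds after uploadEndTime.
# Assumes GNU date and an input such as 2022-01-02T15:04:05Z.
restore_point_in_time() {
    upload_end_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((upload_end_epoch + 30))" +%Y-%m-%dT%H:%M:%SZ
}

restore_point_in_time "2022-01-02T15:04:05Z"   # prints 2022-01-02T15:04:35Z
```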

When the restore action completes, the application will be running in the same state, with the same persistent data, as it was at the time the backup took place.