Protecting Applications

Protecting an application with K10, usually accomplished by creating a policy, requires the understanding and use of three concepts:

  • Snapshots and Backups: Depending on your environment and requirement, you might need just one or both of these data capture mechanisms

  • Scheduling: Specification of application capture frequency and snapshot/backup retention objectives

  • Selection: This defines not just which applications are protected by a policy but, whenever finer-grained control is needed, resource filtering can be used to restrict what is captured on a per-application basis

This section demonstrates how to use these concepts in the context of a K10 policy to protect applications. Today, an application for K10 is defined as a collection of namespaced Kubernetes resources (e.g., ConfigMaps, Secrets), relevant non-namespaced resources used by the application (e.g., StorageClasses), Kubernetes workloads (i.e., Deployments, StatefulSets, OpenShift DeploymentConfigs, and standalone Pods), deployment and release information available from Helm v3, and all persistent storage resources (e.g., PersistentVolumeClaims and PersistentVolumes) associated with the workloads.

While you can always create a policy from scratch from the policies page, the easiest way to define policies for unprotected applications is to click on the Applications card on the main dashboard. This will take you to a page where you can see all applications in your Kubernetes cluster.

../_images/overview_apps.png

To protect any unmanaged application, simply click Create New Policy and, as shown below, that will take you to the policy creation section with an auto-populated policy name that you can change. The concepts highlighted above will be described in the below sections in the context of the policy creation workflow.

../_images/policies_create.png

Snapshots and Backups

All policies center around the execution of actions and, for protecting applications, you start by selecting the snapshot action with an optional backup (currently called export) option to that action.

Snapshots

../_images/policies_snapshot.png

Note

A number of public cloud providers (e.g., AWS, Azure, Google Cloud) actually store snapshots in object storage and they are retained independent of the lifecycle of the primary volume. However, this is not true of all public clouds (e.g., IBM Cloud) and you might also need to enable backups in public clouds for safety. Please check with your cloud provider's documentation for more information.

Snapshots are the basis of persistent data capture in K10. They are usually used in the context of disk volumes (PVC/PVs) used by the application but can also apply to application-level data capture (e.g., with Kanister).

Snapshots, in most storage systems, are very efficient in terms of having a very low performance impact on the primary workload, requiring no downtime, supporting fast restore times, and implementing incremental data capture.

However, storage snapshots usually also suffer from constraints such as having relatively low limits on the maximum number of snapshots per volume or per storage array. Most importantly, snapshots are not always durable. First, catastrophic storage system failure will destroy your snapshots along with your primary data. Further, in a number of storage systems, a snapshot's lifecycle is tied to the source volume. So, if the volume is deleted, all related snapshots might automatically be garbage collected at the same time. It is therefore highly recommended that you create backups of your application snapshots too.

Backups

../_images/policies_export_backup.png

Note

In most cases, when application-level capture mechanisms (e.g., logical database dumps via Kanister) are used, these artifacts are directly sent to an object store or an NFS file store. Backups should not be needed in those scenarios unless a mix of application and volume-level data is being captured or in case of a more specific use case.

Given the limitations of snapshots, it is often advisable to set up backups of your application stack. However, even if your snapshots are durable, backups might still be useful in a variety of use cases including lowering costs with K10's data deduplication or backing your snapshots up in a different infrastructure provider for cross-cloud resiliency.

Backup operations convert application and volume snapshots into backups by transforming them into an infrastructure-independent format and then storing them in a target location. To convert your snapshots into backups, select Enable Backups via Snapshot Exports during policy creation. Additional settings for the destination location and control over the export of snapshot data versus just a reference will also be visible here. These are primarily used for migrating applications across clusters and more information on them can be found in the Exporting Applications section. These settings are available when creating a policy, and when manually exporting a restore point.

The data exported in a Backup operation consists of metadata on the application and snapshot data for the application volumes. The destination for the metadata export is an Object Storage Location or an NFS File Storage Location that is specified in the Export Location Profile field. Users without list permissions on profiles can manually enter the name of the profile to export to if they have been given that information and have permissions to create policies.

../_images/profile_textfield.png

There are two mechanisms by which snapshot data get exported: Filesystem Mode Export or Block Mode Export. Each mechanism defines the process of uploading, downloading, and managing snapshot data in a specific destination location. The export mechanism is normally selected automatically on a per-volume basis, based on the Volume Mode used to mount the volume in a Pod:

With automatic selection, both the metadata and the snapshot data are sent to the same destination. In a VMware vSphere environment, there is an option to use the Block Mode Export mechanism for all volume snapshots, regardless of the Volume Mode. This option enables the specification of a Veeam Repository Location as the destination for snapshot data (only), along with a separate location for the associated metadata.

The two export mechanisms are described below:

Filesystem Mode Export

This is the default mode of export for a volume mounted in Filesystem Volume Mode. Such volumes are attached via the volumeMounts property of a Pod container specification.

A filesystem mode export assumes that the format of the data on the application disk is a filesystem. During upload, this export mechanism creates another PersistentVolume from the volume snapshot; this is then mounted in a pod using filesystem Volume Mode and the mounted filesystem is copied with implicit support to deduplicate, compress and encrypt the data. During download the target volume is mounted in a pod using filesystem Volume Mode and the filesystem restored.

The destination for both snapshot data and metadata is the Object Storage Location or NFS File Storage Location that is specified in the Export Location Profile field, though the two types of data are stored separately.

Block Mode Export

This is the default mode of export for a volume mounted in Block Volume Mode. Such volumes are attached via the volumeDevices property of a Pod container specification.

A block mode export accesses the content of the disk snapshot at the block level. If changed block tracking (CBT) data is available for the disk volume and is supported in K10 for the provisioner concerned, then K10 will export just the incremental changes to the export location if possible. If an incremental upload is not possible then the entire content of the volume snapshot will be uploaded each export.

During the upload process, the export mechanism will use storage provisioner specific network data transfer APIs if available and supported by K10, to directly fetch the volume snapshot data from the upload pod. Alternatively, if these APIs are not available, the export mechanism will create a PersistentVolume with the volume snapshot data. This PersistentVolume is then mounted in the upload pod using Block Volume Mode, and the data will be read from the raw volume.

Similarly, during the download process, the mechanism will use storage provisioner specific network data transfer APIs if available and supported by K10, to directly write the data to the target PersistentVolume. Alternatively, if these APIs are not available, the target PersistentVolume will be mounted in the download pod using Block Volume Mode and the data written to the raw volume.

Note

K10 will export snapshot data of a volume mounted with a Block VolumeMode, or import/restore downloaded block mode snapshot data to a volume, only if the following annotation is present on the volume's StorageClass:

$ kubectl annotate storageclass ${STORAGE_CLASS_NAME} k10.kasten.io/sc-supports-block-mode-exports=true

An exception is made for snapshots of PersistentVolumes provisioned by the csi.vsphere.vmware.com provisioner. Volumes of this provisioner are already recognized internally to support block mode exports and do not need this annotation in their StorageClasses.

See K10 Primer Block Mount Check for a checker to verify if a StorageClass can be annotated in this manner.

The presence of this annotation in a StorageClass does not affect volumes of that StorageClass that are mounted with the Filesystem Volume Mode.

When the Block Mode Export mechanism is used, the organization of snapshot data in the destination location is based on the type of location:

  • When the destination is an Object Storage Location or an NFS File Storage Location, snapshot data will be uploaded in a K10 specific format which provides deduplication, compression and encryption support in the specified destination.

    Automatic compaction will be performed periodically on snapshot data, to ensure that the chain of incremental backups that follows a full backup will not grow too long. Compaction synthesizes a full backup by applying the chain of incremental backups to the base full backup and saving the result as a new full backup; no block data is uploaded during compaction as only references to data blocks are manipulated by the operation.

    Metadata will also be sent to the same destination location, though it is stored separately from the snapshot data.

  • When the destination is a Veeam Repository Location then snapshot data is uploaded to a Veeam Repository in its specific format.

    A Veeam Repository Location does not provide metadata storage, which must be specified separately as described below.

In VMware vSphere environments it is possible to optionally enable the use of the Block Mode Export mechanism for all of the volume snapshots of the applications affected by a Backup Policy, regardless of Volume Mode, as it is the preferred way to export snapshot data in a VMware cluster. This setting is mandatory in order to specify a Veeam Repository Location profile as the destination for snapshot data.

Using the Block Mode Export mechanism for all volume snapshots is enabled by explicitly selecting the Export snapshot data in block mode option in the export properties of the Backup Policy and then selecting an Object Storage Location, an NFS File Storage or a Veeam Repository Location profile as the destination for snapshot data in the Location Profile Supporting Block Mode field.

The location for metadata is specified by the Export Location Profile field, a required field in the dialog. When the Location Profile Supporting Block Mode field is an Object Storage Location or an NFS File Storage Location, then both values are required to be the same to ensure that both metadata and snapshot data are sent to the same location; the policy will become invalid if this is not the case. The values can be different only in the case of a Veeam Repository Location as it cannot be used for metadata storage.

../_images/export_block_mode.png

The policy card provides an indication that the block mode export mechanism is used for all exported snapshots (see Export snapshot data in block mode ... below):

../_images/block_mode_export_card.png

If such an indication is not present then snapshot data is exported with a mechanism based on the VolumeMode of each individual volume.

In the particular case of migrating snapshot data from a Veeam Repository Location, an identically named location profile as used as the export block mode destination must exist in the importing cluster.

Scheduling

There are four components to scheduling:

  • How frequently the primary snapshot action should be performed

  • How often snapshots should be exported into backups

  • Retention schedule of snapshots and backups

  • When the primary snapshot action should be performed

Action Frequency

../_images/policies_action_frequency.png

Actions can be set to execute at an hourly, daily, weekly, monthly, or yearly granularity, or on demand. By default, actions set to hourly will execute at the top of the hour and other actions will execute at midnight UTC.

It is also possible to select the time at which scheduled actions will execute and sub-frequencies that execute multiple actions per frequency. See Advanced Schedule Options below.

Sub-hourly actions are useful when you are protecting mostly Kubernetes objects or small data sets. Care should be taken with more general-purpose workloads because of the risk of stressing underlying storage infrastructure or running into storage API rate limits. Further, sub-frequencies will also interact with retention (described below). For example, retaining 24 hourly snapshots at 15-minute intervals would only retain 6 hours of snapshots.

Snapshot Exports to Backups

../_images/policies_backup_selection.png

Backups performed via exports, by default, will be set up to export every snapshot into a backup. However, it is also possible to select a subset of snapshots for exports (e.g., only convert every daily snapshot into a backup).

Note

To maintain backup recovery points, once the policy is saved the export location profile can only be changed to a compatible location profile. The UI will enforce compatibility when editing a policy, but no compatibility enforcement is performed when editing the policy CR directly.

Retention Schedules

../_images/policies_snapshot_retention.png

A powerful scheduling feature in K10 is the ability to use a GFS retention scheme for cost savings and compliance reasons. With this backup rotation scheme, hourly snapshots and backups are rotated on an hourly basis with one graduating to daily every day and so on. It is possible to set the number of hourly, daily, weekly, monthly, and yearly copies that need to be retained and K10 will take care of both cleanup at every retention tier as well as graduation to the next one. For on demand policies it is not possible to set a retention schedule.

../_images/policies_backup_retention.png

By default, backup retention schedules will be set to be the same as snapshot retention schedules but these can be set to independent schedules if needed. This allows users to create policies where a limited number of snapshots are retained for fast recovery from accidental outages while a larger number of backups will be stored for long-term recovery needs. This separate retention schedule is also valuable when limited number of snapshots are supported on the volume but a larger backup retention count is needed for compliance reasons.

The retention schedule for a policy does not apply to snapshots and backups produced by manual policy runs. Any artifacts created by a manual policy run will need to be manually cleaned up.

Advanced Schedule Options

../_images/policies_advanced_scheduling.png

By default, actions set to hourly will execute at the top of the hour and other actions will execute at midnight UTC.

The Advanced Options settings enable picking how many times and when actions are executed within the interval of the frequency. For example, for a daily frequency, what hour or hours within each day and what minute within each hour can be set.

../_images/policies_advanced_retention.png

The retention schedule for the policy can be customized to select which snapshots and backups will graduate and be retained according to the longer period retention counts.

By default, hourly retention counts apply to the hourly at the top of the hour, daily retention counts apply to the action at midnight, weekly retention counts refer to midnight Sunday, monthly retention counts refer to midnight on the 1st of each month, and yearly retention counts refer to midnight on the 1st of January (all UTC).

When using sub-frequencies with multiple actions per period, all of the actions are retained according to the retention count for that frequency.

The Advanced Options settings allows a user to display and enter times in either local time or UTC. All times are converted to UTC and K10 policy schedules do not change for daylight savings time.

Backup Window

../_images/policies_backup_window.png

The Backup Window settings allow a user to select a time interval within which the policy will run. The policy is scheduled to run once at the Backup Window start time. If the selected time interval is too short, the policy run will not finish and will be canceled.

If the policy has an hourly frequency and the duration of the Backup Window exceeds 1 hour, the policy is also scheduled to run every 60 minutes thereafter within the Backup Window.

Advanced Frequency Options can be used with Backup Window but with some limitations. The Backup Window settings override time settings (hours and minutes) selected in the Advanced Frequency Options. Advanced Frequency Options are not available for Hourly and Daily frequencies and only partially available for the other frequency options.

Staggering

With staggering enabled, K10 will automatically find an optimal start time and run the policy within the selected interval. Staggering allows K10 the flexibility to stagger runs of multiple policies and reduce the peak load on the overall system.

Application Selection and Exceptions

This section describes how policies can be bound to applications, how namespaces can be excluded from policies, how policies can protect cluster-scoped resources, and how exceptions can be handled.

Application Selection

You can select applications by two specific methods:

  • Application Names

  • Labels

Selecting By Application Name

../_images/policies_select_name.png

The most straightforward way to apply a policy to an application is to use its name (which is derived from the namespace name). Note that you can select multiple application names in the same policy.

Selecting By Application Name Wildcard

../_images/policies_select_wildcard.png

For policies that need to span similar applications, you can select applications by an application name wildcard. Wildcard selection will match all application that start with the wildcard specified.

../_images/policies_select_all_wildcard.png

For policies that need to span all applications, you can select all applications with a * wildcard.

Selecting No Applications

../_images/policies_select_none.png

For policies that protect only cluster-scoped resources and do not target any applications, you can select "None". For more information about protecting cluster-scoped resources, see Cluster-Scoped Resources.

Selecting By Labels

../_images/policies_select_label.png

For policies that need to span multiple applications (e.g., protect all applications that use MongoDB or applications that have been annotated with the gold label), you can select applications by label. Any application (namespace) that has a matching label as defined in the policy will be selected. Matching occurs on labels applied to namespaces, deployments, and statefulsets. If multiple labels are selected, a union (logical OR) will be performed when deciding to which applications the policy will be applied. All applications with at least one matching label will be selected.

Note that label-based selection can be used to create forward-looking policies as the policy will automatically apply to any future application that has the matching label. For example, using the heritage: Tiller (Helm v2) or heritage: Helm (Helm v3) selector will apply the policy you are creating to any new Helm-deployed applications as the Helm package manager automatically adds that label to any Kubernetes workload it creates.

Namespace Exclusion

Even if a namespace is covered by a policy, it is possible to have the namespace be ignored by the policy. You can add the k10.kasten.io/ignorebackuppolicy annotation to the namespace(s) you want to be ignored. Namespaces that are tagged with the k10.kasten.io/ignorebackuppolicy annotation will be skipped during scheduled backup operations.

Exceptions

Normally K10 retries when errors occur and then fails the action or policy run if errors persist. In some circumstances it is desirable to treat errors as exceptions and continue the action if possible.

Examples of when K10 does this automatically include:

  • When a Snapshot policy selects multiple applications by label and creates durable backups by exporting snapshots, K10 treats failures across applications independently.

    • If the snapshot for an application fails after all retries, that application is not exported. If snapshots for some applications succeed and snapshots for other applications fail, the failures are reported as exceptions in the policy run. If snapshots for all applications fail, the policy run fails and the export is skipped.

    • If the export of a snapshot for an application fails after retries, that application is not exported. If the export of snapshots for some applications succeed and others fail, the failures are reported as exceptions in the export action and policy run. If no application is successfully exported, the export action and policy run fail.

In some cases, treating errors as exceptions is optional:

../_images/ignore_exceptions.png
  • K10 normally waits for workloads (e.g., Deployments or StatefulSets) to become ready before taking snapshots and fails the action if a workload does not become ready after retries. In some cases the desired path for a backup action or policy might be to ignore such timeouts and to proceed to capture whatever it can in a best-effort manner and store that as a restore point.

When an exception occurs, the job will be completed with exception(s):

../_images/ignore_exceptions_job.png

Details of the exceptions can be seen with job details:

../_images/ignore_exceptions_jobdetails.png

Any exception(s) ignored when creating a restore point are noted in the restore point:

../_images/ignore_exceptions_restorepoint.png

Resource Filtering

This section describes how specific application resources can either be included or excluded from capture or restoration.

Warning

Filters should be used with care. It is easy to accidentally define a policy that might leave out essential components of your application.

Resource filtering is supported for both backup policies and restore actions. The recommended best practice is to create backup policies that capture all resources to future-proof restores and to use filters to limit what is restored.

In K10, filters describe which Kubernetes resources should be included or excluded in the backup. If no filters are specified, all the API resources in a namespace are captured by the BackupActions created by this Policy.

Filtering Resources by GVRN

Resource types are identified by group, version, and resource type names, or GVR (e.g., networking.k8s.io/v1/networkpolicies). Core Kubernetes types do not have a group name and are identified by just a version and resource type name (e.g., v1/configmaps). Individual resources are identified by their resource type and resource name, or GVRN. In a filter, an empty or omitted group, version, resource type or resource name matches any value. For example, if you set Group: apps and Resource: deployments, it will capture all Deployments no matter the API Version (e.g., v1 or v1beta1).

../_images/policies_add_gvrn_resource_filter.png

Filters reduce the resources in the backup by first selectively including and then excluding resources:

  • If no include or exclude filters are specified, all the API resources belonging to an application are included in the set of resources to be backed up

  • If only include filters are specified, resources matching any GVRN entry in the include filter are included in the set of resources to be backed up

  • If only exclude filters are specified, resources matching any GVRN entry in the exclude filter are excluded from the set of resources to be backed up

  • If both include and exclude filters are specified, the include filters are applied first and then exclude filters will be applied only on the GVRN resources selected by the include filter

For a full list of API resources in your cluster, run kubectl api-resources.

Filtering Resources by Labels

Note

Non-namespaced objects, such as StorageClasses, may not be affected by label filters.

K10 also supports filtering resources by labels when taking a backup. This is particularly useful when there are multiple apps running in a single namespace. By leveraging label filters, it is possible to selectively choose which application to backup. The rules from the previous section describing the use of include and exclude filters apply to label filters as well.

../_images/policies_add_label_resource_filter.png

Multiple labels can be provided as a part of the same filter if they are to be applied together. Conversely, multiple filters each with their own label can be provided together signifying that any of the labels should match.

../_images/policies_and_label_filter.png

This filter only includes resources with both app:mysql2 and app:mysql1 labels

../_images/policies_or_label_filter.png

This filter includes resources with both app:mysql2 or app:mysql1 label

Safe Backup

For safety, K10 automatically includes namespaced and non-namespaced resources such as associated volumes (PVCs and PVs) and StorageClasses when a StatefulSet, Deployment, or DeploymentConfig is included by filters. Such auto-included resources can be omitted by specifying an exclude filter.

Similarly, given the strict dependency between the objects, K10 protects Custom Resource Definitions (CRDs) if a Custom Resource (CR) is included in a backup. However, it is not possible today to exclude a CRD via exclude filters and any failure to exclude will be silently ignored.

Working With Policies

Using Policy Presets

Operations teams can define multiple protection policy presets that specify parameters such as schedule, retention, location and infrastructure. A catalog of organizational policy presets and SLAs can be provided to the development teams with an intimate knowledge of application requirements, without disclosing credential and storage infrastructure implementation. This ensures separations of concerns while scaling operations in a cloud-native environment.

To create a policy preset, navigate to the Presets page under the Policies menu in the navigation sidebar. Then simply click Create New Preset and, as shown below, the policy preset creation section will be shown.

../_images/policypreset_create.png

As can be seen, the workflow of the policy preset creation is quite similar to the policy creation. The major difference here is that the policy preset does not contain any application specific settings, which must be specified directly in the policy.

While creating (or editing) a policy the user can opt in to "Use a Preset". Users without list permissions on policy presets can manually enter the name of the policy preset to use, if they have been given that information.

../_images/policy_usepreset.png

Note

Each policy created using a preset does not copy its configuration but refers to it. This means that every preset change also entails a change in the corresponding policies.

Viewing Policy Activity

Once you have created a policy and have navigated back to the main dashboard, you will see the selected applications quickly switch from unmanaged to non-compliant (i.e., a policy covers the objects but no action has been taken yet). They will switch to compliant as snapshots and backups are run as scheduled or manually and the application enters a protected state. You can also scroll down on the page to see the activity, how long each snapshot took, and the generated artifacts. Your page will now look similar to this:

../_images/dashboard_compliant.png

More detailed job information can be obtained by clicking on the in-progress or completed jobs.

Manual Policy Runs

../_images/policies_manual_run.png

It is possible to manually create a policy run by going to the policy page and clicking the run once button on the desired policy. Note that any artifacts created by this action will not be eligible for automatic retirement and will need to be manually cleaned up.

Pausing Policies

../_images/policies_paused.png

It is possible to pause new runs of a policy by going to the policy page and clicking the pause button on the desired policy. Once a policy is resumed by clicking the resume button, the next policy run will be at the next scheduled time after the policy is resumed.

Paused policies do not generate skipped jobs and are ignored for the purposes of compliance. Applications that are only protected by paused policies are marked as unmanaged.

Revalidating Policies

Revalidation is useful when a Policy becomes invalid. Policies that are invalid will not run and can result in a breach of compliance.

Editing Policies

It is also possible to edit created policies by clicking the edit button on the policies page.

../_images/policies_edit.png

Policies, a Kubernetes Custom Resource (CR), can also be edited directly by manually modifying the CR's YAML through the dashboard or command line.

Changes made to the policy (e.g., new labels added or resource filtering applied) will take effect during the next scheduled policy run.

Careful attention should be paid to changing a policy's retention schedule as that action will automatically retire and delete restore points that no longer fall under the new retention scheme.

Editing retention counts can change the number of restore points retained at each tier during the next scheduled policy run. The retention counts at the start of a policy run apply to all restore points created by the policy, including those created when the policy had different retention counts. Editing Advanced Schedule Options can change when a policy runs in the future and which restore points created by future policy runs will graduate and be retained by which retention tiers.

Restore points graduate to higher retention tiers according to the retention schedule in effect when the restore point is created. This protects previous restore points when the retention schedule changes.

For example, consider a policy that runs hourly at 20 minutes after the hour and retains 1 hourly and 7 daily snapshots with the daily coming at 22:20. At steady state that policy will have 7 or 8 restore points. If that policy is edited to run at 30 minutes after the hour and retain the 23:30 snapshot as a daily, when the policy next runs at 23:30 it will retain the newly created snapshot as both an hourly and a daily. The 6 most recent snapshots created at 22:20 will be retained, and the oldest snapshot from 22:20 will be retired.

Note

When editing the export location profile on a policy, the updated location profile should only be changed to a profile that references the same file or object store as the previous location profile. Failing to do so will result in the previously exported backups being inaccessible by K10.

Disabling Backup Exports In A Policy

Care should be taken when disabling backup/exports from a policy. If no independent export retention schedule existed, no new exports will be created and the prior exports will be retired as before. The exported artifacts will be retired in the future at the same time as the snapshot artifacts from each policy run.

If an independent retention schedule existed for export, editing the policy to remove exports will remove the independent export retention counts from the policy. Upon the next successful policy run, the snapshot retirement schedule will determine which previous artifacts to retain and which to retire based upon the policy’s retention table. Retiring a policy run will retire both snapshot and export artifacts. Either snapshot or export artifacts for a retiring policy run may already have been retired if the prior export retention values were higher or lower than the policy retention values.

Deleting A Policy

You can easily delete a policy from the policies page or using the API. However, in the interests of safety, deleting a policy will not delete all the restore points that were generated by it. Restore points from deleted policies can be manually deleted from the Application restore point view or via the API.