Resource Requirements

K10's resource requirements are driven primarily by the number of applications in your Kubernetes cluster and the kind of data management operations being performed (e.g., snapshots vs. backups).

Some of the resource requirements are static (base resource requirements) while other resources are only required when certain work is done (dynamic resource requirements). The auto-scaling nature of K10 ensures that resources consumed by dynamic requirements will always scale down to zero when no work is being performed.

While the recommendations below for both requests and limits should be applicable to most clusters, it is important to note that the final requirements will be a function of your cluster and application scale, the total amount of data, file size distribution, and data churn rate. You can always use Prometheus or the Kubernetes Vertical Pod Autoscaler (VPA) with updates disabled to check your particular requirements.
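
For example, a minimal VPA manifest in recommendation-only mode might look like the following sketch. The object name is illustrative, and it assumes K10 is installed in the default kasten-io namespace with the catalog-svc deployment as the observed target:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: k10-catalog-vpa        # illustrative name
  namespace: kasten-io         # assumes the default K10 install namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: catalog-svc          # the K10 deployment to observe
  updatePolicy:
    updateMode: "Off"          # recommend only; never evict or resize pods

With updateMode set to "Off", the VPA only publishes recommendations (visible via kubectl describe vpa k10-catalog-vpa), which you can compare against the guidelines below.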

Requirement Types

  • Base Requirements: These are the resources needed for K10's internal scheduling and cleanup services, which are mostly driven by monitoring and catalog scale requirements. The resource footprint for these base requirements is usually static and generally does not noticeably grow with either a growth in catalog size (number of Kubernetes resources protected) or number of applications protected.

  • Disaster Recovery: These are the resources needed to perform a DR of the K10 install and are predominantly used to compress, deduplicate, encrypt, and transfer the K10 catalog to object storage. Providing additional resources can also speed up the DR operation. The DR resource footprint is dynamic and scales down to zero when a DR is not being performed.

  • Backup Requirements: Resources for backup are required when data is transferred from volume snapshots to object storage or NFS file storage. While the backup requirements depend on your data, churn rate, and file system layout, the requirements are not unbounded and can easily fit in a relatively narrow band. Providing additional resources can also speed up backup operations. To prevent unbounded parallelism when protecting a large number of workloads, K10 bounds the number of simultaneous backup jobs (default 9). The backup resource footprint is dynamic and scales down to zero when a backup is not being performed.

Requirement Guidelines

The table below lists the resource requirements for a K10 install protecting 100 applications or namespaces.

Note that DR jobs also count toward the maximum parallelism limit (N, default 9), so you can run either N simultaneous backup jobs, or N-1 simultaneous backup jobs concurrently with 1 DR job. With the default limit, for example, that means at most 9 backup jobs, or 8 backup jobs alongside 1 DR job.

K10 Resource Guidelines

  Type                         Requested CPU   Limit CPU   Requested Memory   Limit Memory
                               (Cores)         (Cores)     (GB)               (GB)
  Base                         1               2           1                  4
  DR                           1               1           0.3                0.3
  Dynamic (per parallel job)   1               1           0.4                0.4
  Total                        3               4           1.7                4.7

The Total row is the sum of the rows above, i.e., the footprint with the base services, one DR job, and one dynamic job running concurrently.
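
To compare these guidelines with actual consumption on your cluster, you can check live usage with kubectl; this sketch assumes the Kubernetes metrics-server is installed and K10 runs in the default kasten-io namespace:

# Show current CPU/memory usage for all K10 pods
kubectl top pods -n kasten-io

# Break usage down per container
kubectl top pods -n kasten-io --containers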

Configuring K10 Resource Usage With Helm Values

Resource usage requests and limits can be specified for the containers running in K10 through Helm values. Kubernetes resource management is at the container level, so in order to set resource values, you will need to provide both the pod and container names. Custom resource usage can be set through Helm in two ways:

  • Providing the path to one or more YAML files during helm install or helm upgrade with the --values flag:

    resources:
      <pod-name>:
        <container-name>:
          requests:
            memory: <value>
            cpu: <value>
          limits:
            memory: <value>
            cpu: <value>
    

    For example, this file will modify the settings for the kanister-sidecar container, which runs in the catalog-svc pod:

    resources:
      catalog-svc:
        kanister-sidecar:
          requests:
            memory: 800Mi
            cpu: 250m
          limits:
            memory: 950Mi
            cpu: 900m
    
  • Modifying the resource values one at a time with the --set flag during helm install or helm upgrade:

    --set=resources.<pod-name>.<container-name>.[requests|limits].[memory|cpu]=<value>
    

    For the equivalent behavior of the example above, the following values can be provided:

    --set=resources.catalog-svc.kanister-sidecar.requests.memory=800Mi \
    --set=resources.catalog-svc.kanister-sidecar.requests.cpu=250m \
    --set=resources.catalog-svc.kanister-sidecar.limits.memory=950Mi \
    --set=resources.catalog-svc.kanister-sidecar.limits.cpu=900m
    

When adjusting a container's resource limits or requests, any setting that is left empty is treated as unspecified by the Helm chart. Likewise, providing empty settings for a container results in no limits or requests being applied to it.

For example, the following Helm values file will yield no specified resource requests/limits for the kanister-sidecar container, and only a CPU limit for the jobs-svc container:

resources:
  catalog-svc:
    kanister-sidecar:
  jobs-svc:
    jobs-svc:
      limits:
        cpu: 50m
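
To confirm what the chart actually applied, you can read the rendered container resources back from the cluster. This is a sketch that assumes K10 is installed in the default kasten-io namespace and that the jobs-svc container runs in a deployment of the same name:

# Print the resources stanza of the jobs-svc container
kubectl -n kasten-io get deployment jobs-svc \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="jobs-svc")].resources}'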

Configuring Generic Volume Backup and Restore Resource Usage With Helm Values

Resource requests and limits can be added to the injected Kanister sidecar container, to the pod launched during the restore phase of a Generic Volume Backup and Restore operation, and to other temporary pods (see the full list below) through common Helm values.

Note

If namespace-level resource requests and limits have been configured using a LimitRange or ResourceQuota, those constraints can prevent the values below from being applied or can cause the operation to fail. Make sure the resources you specify take those restrictions into account.
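
You can check for such constraints ahead of time; a quick sketch, where <app-namespace> stands in for the namespace being protected:

# List any namespace-level constraints that could conflict with the values below
kubectl get limitrange,resourcequota -n <app-namespace>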

Custom resource requirements can be applied through Helm in two ways:

  • Providing the path to one or more YAML files during helm install or helm upgrade with the --values flag:

    genericVolumeSnapshot:
      resources:
        requests:
          memory: <value>
          cpu: <value>
        limits:
          memory: <value>
          cpu: <value>
    
  • Modifying the resource values one at a time with the --set flag during helm install or helm upgrade:

    --set=genericVolumeSnapshot.resources.[requests|limits].[memory|cpu]=<value>
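
    For example, the following illustrative values (assumptions for the sketch, not sizing recommendations) mirror the kanister-sidecar example above:

    --set=genericVolumeSnapshot.resources.requests.memory=800Mi \
    --set=genericVolumeSnapshot.resources.requests.cpu=250m \
    --set=genericVolumeSnapshot.resources.limits.memory=950Mi \
    --set=genericVolumeSnapshot.resources.limits.cpu=900m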
    

List of pods affected by this setting:

  • copy-vol-data-: Uploads snapshot data to the export location, such as exporting a snapshot to S3 or NFS.

  • data-mover-: An API server used for Generic Volume backups and exporting metadata.

  • backup-data-stats-: Gathers statistics from finished Generic backups.

  • kanister-job-: Used for custom operations, usually created by user blueprints.

  • affinity-pvc-group-: A temporary pod required for Generic Volume restore, used to place grouped PVCs on the same node before restoring data on the pods. It does not perform any actions itself.

  • restore-data-: Restores data from an exported restore point.

  • create-repo-: Initializes a backup repository in an export location.

  • delete-data-: Deletes a backup from an export location.

  • kopia-maintenance-: Runs maintenance operations for Kopia repositories.

  • prepare-data-job-: Used for operations on volumes, such as moving data for some types of backups during restores.

  • -owner: Used to upgrade repositories in an export location for Generic Volume backups and exports created by an older version of K10.

See Resource units in Kubernetes for options on how to specify valid memory and CPU values.
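
For quick reference, CPU values can be given in whole cores or millicores, and memory in binary (Ki/Mi/Gi) or decimal (K/M/G) units. A short illustrative snippet:

resources:
  requests:
    cpu: 250m      # 250 millicores = 0.25 of a CPU core
    memory: 800Mi  # binary units; 800Mi = 800 * 1024^2 bytes (~839 MB)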