Version: 8.5.6 (latest)

Red Hat ACM Observability

Veeam Kasten can be integrated with Red Hat Advanced Cluster Management (ACM) Observability Service to provide centralized monitoring across your Red Hat OpenShift fleet. This integration leverages Prometheus remote_write to push metrics from the Kasten cluster to the ACM Hub.

ACM Observability uses Observatorium — an open-source, multi-tenant metrics and logs platform built on Thanos — as its API gateway. The Observatorium API handles authentication, tenant routing, and forwards metrics to the underlying Thanos Receive component for storage.

Overview

Kasten supports two connectivity modes for pushing metrics to the ACM Observatorium API:

| Mode | When to use | Auth | Protocol |
|---|---|---|---|
| Same-cluster | Kasten is installed on the ACM Hub cluster | THANOS-TENANT header (auto-injected) | HTTP |
| Cross-cluster (mTLS) | Kasten is on a managed cluster, writing to the Hub | Client certificate | HTTPS |
info

The Observatorium API uses the URL path to identify the tenant and routes requests internally. In same-cluster mode, Kasten writes directly to Thanos Receive (bypassing the Observatorium API), so a THANOS-TENANT header is required. In cross-cluster mode, Kasten writes through the Observatorium API, which extracts the tenant from the URL path and handles routing automatically.
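For illustration, the same-cluster behavior corresponds to a Prometheus remote_write entry along these lines. This is only a sketch of what the generated configuration is equivalent to; Kasten injects the header automatically, so you never write it yourself. The tenant value shown is a placeholder for your Hub Cluster ID:

```yaml
remote_write:
  - url: "http://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/v1/receive"
    headers:
      # Injected automatically by Kasten in same-cluster mode;
      # the value is the Hub Cluster ID (see Step 1)
      THANOS-TENANT: "<hub-cluster-id>"
```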

Prerequisites

  • Red Hat ACM installed on the Hub cluster.
  • MultiClusterObservability enabled and configured on the ACM Hub.
  • Veeam Kasten installed (or ready to be installed) on the target cluster.
  • On OpenShift: Kasten must be installed with scc.create: true so that the Prometheus pod can run. Without this, the pod fails with SCC errors (runAsUser / fsGroup not in the allowed range).
  • For cross-cluster (mTLS) mode only:
    • A client certificate signed by the ACM Observability client CA.
    • The client certificate must include a Subject Alternative Name (SAN) and have OU=acm in the subject.
    • See Step 2 below.

Step 1: Gather ACM Configuration

Ensure that the ACM Observability service is running and gather the necessary connection details. You can verify service availability and retrieve configuration details either through the Red Hat ACM UI or via the CLI as described below.

Retrieve Configuration via Web Console

  1. Verify Installation:

    • Log in to your OpenShift Console on the Hub Cluster.
    • Ensure you are in the local-cluster view.
    • Click Search in the left navigation menu.
    • In the Resources dropdown, type and select MultiClusterObservability.
    • Click on the observability resource instance.
    • Ensure the status is Ready.
  2. Find the Tenant ID:

    • Navigate to Infrastructure > Clusters.
    • Select the local-cluster (the Hub cluster).
    • On the Overview tab, locate the Cluster ID. This is your Tenant ID.

Retrieve Configuration via CLI

Hub Cluster

All commands in Step 1 must be run on the ACM Hub cluster.

  1. Verify the Observability Service:

    oc get multiclusterobservability -n open-cluster-management-observability

    Expected output (verify the status is Ready):

    NAME            STATUS   AGE
    observability   Ready    14d
  2. Identify the Remote Write URL — choose the URL that matches your deployment:

    • Same-cluster (HTTP) — writes directly to Thanos Receive: http://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/v1/receive

    • Cross-cluster (HTTPS / mTLS) — writes through the Observatorium API: https://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/metrics/v1/default/api/v1/receive

      If Kasten cannot reach the internal service, use the external route instead:

      ROUTE_HOST=$(oc get route observatorium-api -n open-cluster-management-observability -o jsonpath='{.spec.host}')
      echo "https://${ROUTE_HOST}/api/metrics/v1/default/api/v1/receive"
    warning

    Do not use /api/v1/receive for cross-cluster mode — that path bypasses authentication.

  3. Find the Tenant ID: The Tenant ID is the Cluster ID of the ACM Hub Cluster, captured automatically in the environment variables block (Step 3).

Step 2: Prepare mTLS Client Certificate

Skip this step

If Kasten is on the same cluster as the ACM Hub (same-cluster mode), skip to Step 3.

This step applies only to cross-cluster deployments where Kasten communicates with the ACM Hub over HTTPS with mutual TLS. For detailed instructions on preparing mTLS certificates, refer to the Red Hat ACM documentation:

Exporting metrics to external endpoints — Red Hat ACM 2.15

This section documents the tls_config format, certificate requirements, and how to configure external metric endpoints with mTLS.

Key points for Kasten integration:

  • The client certificate must be signed by the ACM client CA (observability-client-ca-certs Secret on the Hub cluster). Certificates from other CAs are rejected.
  • The certificate must include at least one Subject Alternative Name (SAN) — DNS or IP. Certificates with only a Common Name (CN) are rejected.
  • The certificate subject must include OU=acm (Organizational Unit) to map to the RBAC group with write permissions.
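Assuming an OpenSSL 1.1.1+ toolchain, the sketch below generates a key and a certificate signing request (CSR) that satisfies both requirements. The subject and SAN values are placeholders to adapt for your environment, and the resulting CSR must still be signed by the Hub's client CA as described above:

```shell
# Generate a 2048-bit key and a CSR with OU=acm in the subject and at
# least one SAN, as required by the Observatorium API. The O, CN, and
# DNS values below are placeholders -- replace them for your environment.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout k10-client.key -out k10-client.csr \
  -subj "/O=example/OU=acm/CN=kasten-metrics" \
  -addext "subjectAltName=DNS:kasten-metrics.example.com"

# Confirm the CSR carries OU=acm and the SAN before submitting it for signing
openssl req -in k10-client.csr -noout -text | grep -E "OU ?= ?acm|DNS:"
```

Submit the CSR for signing against the `observability-client-ca-certs` CA on the Hub, then proceed with the Secret and ConfigMap below.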

Once you have the signed client certificate, client key, and server CA, create the Kubernetes resources in the Kasten namespace:

Target Cluster

Run these commands on the cluster where Kasten is (or will be) installed.

# Create the client certificate Secret (must use 'tls' type for tls.crt / tls.key keys)
kubectl create secret tls prometheus-client-cert \
  -n kasten-io \
  --cert=k10-client.crt \
  --key=k10-client.key

# Create the server CA ConfigMap (key must be 'ca.crt')
kubectl create configmap observability-ca-cert \
  -n kasten-io \
  --from-file=ca.crt=ca.crt
Certificate Rotation

When the client certificate expires, Prometheus remote_write will fail with TLS handshake errors. Update the Secret with the renewed certificate and restart the Prometheus pod:

kubectl create secret tls prometheus-client-cert \
  -n kasten-io \
  --cert=k10-client.crt \
  --key=k10-client.key \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl rollout restart deployment/prometheus-server -n kasten-io

Monitor prometheus_remote_storage_samples_failed_total for early detection of certificate expiry.
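If you load custom alerting rules into a Prometheus instance that scrapes this counter, a rule along these lines can catch an expiring certificate early. This is a sketch; the group name, thresholds, and windows are illustrative, not prescribed values:

```yaml
groups:
  - name: kasten-acm-remote-write
    rules:
      - alert: KastenRemoteWriteFailing
        # Fires when remote_write has been dropping samples for 30 minutes,
        # which is a common symptom of an expired client certificate
        expr: increase(prometheus_remote_storage_samples_failed_total[15m]) > 0
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Kasten remote_write to ACM is dropping samples (check certificate expiry)"
```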

Step 3: Configure Kasten for Remote Write

Target Cluster

Run all commands in this step on the cluster where Kasten is (or will be) installed.

Configure Kasten's embedded Prometheus to push metrics to the ACM Hub. This is done by updating the Kasten Helm values.

Advanced Configuration

You can fine-tune the Prometheus remote write behavior by adding standard Prometheus configuration options under prometheus.server.remote_write[0]. Parameters such as queue_config, remote_timeout, and write_relabel_configs are supported and passed directly to the Prometheus server configuration. See the Prometheus remote_write configuration reference for details.
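As a sketch, a values fragment tuning the send queue and dropping a metric before it leaves the cluster might look like this. The numbers and the relabel regex are illustrative, not recommendations:

```yaml
prometheus:
  server:
    remote_write:
      - url: "${REMOTE_WRITE_URL}"
        queue_config:
          capacity: 10000          # samples buffered per shard
          max_shards: 10
          max_samples_per_send: 2000
        write_relabel_configs:
          # Illustrative: drop a noisy series before sending it to the Hub
          - source_labels: [__name__]
            regex: "go_gc_duration_seconds.*"
            action: drop
```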

1. Set Your Environment Variables

Copy-paste the block below into your terminal and edit the values as needed:

# ── Required ─────────────────────────────────────────────────────────────────
# Hub Cluster ID — run on the Hub cluster (see Step 1)
export ACM_TENANT_ID="$(oc get clusterversion version -o jsonpath='{.spec.clusterID}')"

# Remote Write URL — keep exactly ONE of the following lines uncommented,
# matching your deployment (see Step 1):
# Same-cluster (HTTP):
export REMOTE_WRITE_URL="http://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/v1/receive"
# Cross-cluster (HTTPS / mTLS):
# export REMOTE_WRITE_URL="https://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/metrics/v1/default/api/v1/receive"

# ── Required on non-OpenShift (auto-detected on OpenShift) ───────────────────
export CLUSTER_NAME="us-east-prod-01"
export CLUSTER_ID="$(oc get clusterversion version -o jsonpath='{.spec.clusterID}' 2>/dev/null || echo 'set-me')"

# ── Optional ─────────────────────────────────────────────────────────────────
# Deep-link from the ACM dashboard back to this Kasten instance
export K10_DASHBOARD_URL="https://k10.apps.example.com/k10/"

2. Prepare the Helm Values

cat > k10-values.yaml <<EOF
clusterName: "${CLUSTER_NAME}"

global:
  acm:
    enabled: true
    hubThanosTenantId: "${ACM_TENANT_ID}"
    managedClusterId: "${CLUSTER_ID}"
    ## Cross-cluster only — uncomment to enable mTLS
    ## (see "Prepare mTLS Client Certificate" below):
    # tls:
    #   clientCertSecretName: "prometheus-client-cert"
    #   serverCAConfigMapName: "observability-ca-cert"
    #   # insecureSkipVerify: false

prometheus:
  server:
    remote_write:
      - url: "${REMOTE_WRITE_URL}"
    dashboardUrl: "${K10_DASHBOARD_URL}"
    metricsRegex: "(backup|restore|import|export|job|policy|action|catalog|process).*"
    resources:
      requests:
        cpu: 750m
        memory: 1.5Gi

## Required on OpenShift — allows Prometheus to run with the correct security context
scc:
  create: true
EOF
tip

On OpenShift, you can omit clusterName and managedClusterId — Kasten auto-detects both. For cross-cluster mode, uncomment the tls: block, ensure the required Secret and ConfigMap exist (see Step 2), and set REMOTE_WRITE_URL to the HTTPS endpoint.

TLS Helm Values Reference

| Helm Value | Description |
|---|---|
| global.acm.tls.clientCertSecretName | Name of a kubernetes.io/tls Secret in the Kasten namespace containing tls.crt and tls.key. Setting this activates mTLS and disables the THANOS-TENANT header injection (the tenant is identified via the URL path instead). |
| global.acm.tls.serverCAConfigMapName | Name of a ConfigMap in the Kasten namespace containing the server CA certificate (key: ca.crt). Used for TLS server verification. |
| global.acm.tls.insecureSkipVerify | When true, skips server certificate verification. Use only when connecting to an internal service URL with a self-signed or unverifiable certificate. Default: false. |
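Put together, a cross-cluster tls block referencing the Secret and ConfigMap created in Step 2 would look like this sketch (the tenant ID is a placeholder for your Hub Cluster ID):

```yaml
global:
  acm:
    enabled: true
    hubThanosTenantId: "<hub-cluster-id>"
    tls:
      clientCertSecretName: "prometheus-client-cert"
      serverCAConfigMapName: "observability-ca-cert"
      # insecureSkipVerify: true   # only for self-signed internal endpoints
```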

3. Apply the Configuration

Install or Upgrade Kasten to apply the changes. Run these commands on the target cluster.

# For new installations
helm install k10 kasten/k10 -n kasten-io --create-namespace -f k10-values.yaml

# For existing installations
helm upgrade k10 kasten/k10 --reuse-values -n kasten-io -f k10-values.yaml
Inline Flags

You can also pass individual values using --set flags instead of a values file. See the Helm documentation for details.

Step 4: Verification

Target Cluster

Verification steps 1–2 run on the cluster where Kasten is installed. Steps 3–4 run on the Hub cluster.

1. Verify Prometheus Remote Write Counters

Port-forward to the Kasten Prometheus and check the remote storage counters:

kubectl port-forward -n kasten-io deployment/prometheus-server 9090:9090 &

# Check samples sent (should increase over time)
curl -s "http://localhost:9090/k10/prometheus/api/v1/query?query=prometheus_remote_storage_samples_total"

# Check for failures (should be 0)
curl -s "http://localhost:9090/k10/prometheus/api/v1/query?query=prometheus_remote_storage_samples_failed_total"

A healthy integration shows samples_total increasing steadily and samples_failed_total at 0:

{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1740000000,"13469"]}]}}

If samples_failed_total returns a non-zero value, check the Troubleshooting section.

2. Verify Kasten Prometheus Logs

Check the Prometheus logs to ensure remote_write is active and not encountering sustained errors:

kubectl logs -n kasten-io -l app=prometheus -c prometheus-server --tail=50 | grep -E "WARN|ERR|remote"
note

Occasional Failed to send batch, retrying warnings during startup (WAL replay) are normal and resolve automatically. Sustained errors indicate a configuration problem.

Hub Cluster

Steps 3–4 must be run on the ACM Hub cluster.

3. Verify Metrics in ACM Thanos

Query the Thanos query-frontend for a Kasten metric:

kubectl port-forward -n open-cluster-management-observability \
  svc/observability-thanos-query-frontend 9090:9090 &

curl -s "http://localhost:9090/api/v1/query?query=catalog_actions_count" | python3 -m json.tool

Look for results with your cluster name and "application": "k10":

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "catalog_actions_count",
          "application": "k10",
          "cluster_name": "us-east-prod-01"
        },
        "value": [1740000000, "221"]
      }
    ]
  }
}

If no results appear, allow 2–5 minutes for metrics to propagate, then recheck.

4. Verify Metrics in ACM Grafana

  1. Open the ACM Grafana dashboard on the Hub cluster.
  2. Navigate to Explore (or Drilldown in newer versions).
  3. Select the Thanos (or observatorium) datasource.
  4. Query for a Kasten metric, for example: catalog_actions_count.
  5. Verify that the metric is visible and tagged with your cluster name.
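When several clusters report into the Hub, you can scope a Grafana query to a single Kasten instance by filtering on the labels shown in the Thanos output above, for example:

```
catalog_actions_count{cluster_name="us-east-prod-01", application="k10"}
```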

Troubleshooting

Common Installation Errors

Prometheus pod stuck in FailedCreate on OpenShift

The Prometheus server ReplicaSet reports FailedCreate with a message like:

unable to validate against any security context constraint: [...] .spec.securityContext.fsGroup: Invalid value: []int64{65534}: 65534 is not an allowed group

  • Cause: Kasten was installed without scc.create: true. The Prometheus pod runs as UID/GID 65534, which is outside the default OpenShift SCC allowed range.

  • Fix: Upgrade Kasten with scc.create: true:

    helm upgrade k10 kasten/k10 --reuse-values -n kasten-io --set scc.create=true

"A valid .Values.global.acm.hubThanosTenantId is required"

This error occurs during helm install or helm upgrade if the Tenant ID is missing.

  • Cause: global.acm.hubThanosTenantId is not set in your values file.
  • Fix: Retrieve the Hub Cluster ID (see Step 1) and add it to your k10-values.yaml.

"global.acm.managedClusterId is required when global.acm.enabled is true"

This error occurs if Kasten cannot auto-detect the Cluster ID (e.g., on non-OpenShift platforms) and it was not provided.

  • Cause: You are installing on a platform where Cluster ID auto-detection is not supported, or RBAC permissions prevent lookup.
  • Fix: Add global.acm.managedClusterId to your k10-values.yaml (use the value from $CLUSTER_ID).

"clusterName is required when prometheus.server.remote_write is configured"

This error occurs if Kasten cannot determine a name for the cluster.

  • Cause: You are installing on a non-OpenShift platform (or auto-detection failed) and did not provide a clusterName.
  • Fix: Add clusterName: "my-cluster-name" to your k10-values.yaml.

Runtime Errors (Prometheus Logs)

"server returned HTTP status 400 Bad Request: Client sent an HTTP request to an HTTPS server"

  • Cause: The remote write URL uses http:// but the Observatorium API expects https://.
  • Fix: Change the URL scheme to https:// and configure the TLS settings (global.acm.tls).

"server returned HTTP status 401 Unauthorized"

  • Cause: The Observatorium API requires authentication. This typically occurs when using the authenticated URL path (/api/metrics/v1/{tenant}/...) without a valid client certificate.
  • Fix: Configure mTLS by setting global.acm.tls.clientCertSecretName and ensuring the client certificate meets the requirements (SAN present, OU=acm).

"server returned HTTP status 400 ... could not determine subject"

  • Cause: The client certificate does not contain a Subject Alternative Name (SAN). The Observatorium API extracts the client identity from SANs, not from the Common Name (CN).
  • Fix: Regenerate the client certificate with at least one SAN (DNS or IP). See Step 2.

"server returned HTTP status 500 ... no matching hashring to handle tenant"

This error appears in the Prometheus logs (kubectl logs ... -c prometheus-server).

  • Cause: The hubThanosTenantId provided is incorrect, or the tenant name in the URL path does not match a configured tenant on the Observatorium API.
  • Fix: Verify that the Tenant ID matches the Hub Cluster ID exactly. For cross-cluster mode, also verify the tenant name in the URL path (typically default).

"EOF" or "use of closed network connection"

  • Cause: Transient TLS connection issues, often seen during Prometheus startup (WAL replay) when it sends a burst of samples.
  • Fix: These are typically self-resolving — Prometheus retries automatically. Check prometheus_remote_storage_samples_failed_total; if it stays at 0, the retries are succeeding. If errors persist, verify network connectivity to the Observatorium API service.

Connectivity Reference

| Deployment | URL | Auth |
|---|---|---|
| Kasten on Hub cluster | http://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/v1/receive | THANOS-TENANT header (auto-injected by Kasten) |
| Kasten on managed cluster (internal) | https://observability-observatorium-api.open-cluster-management-observability.svc:8080/api/metrics/v1/default/api/v1/receive | mTLS client certificate |
| Kasten on managed cluster (external route) | https://&lt;route-host&gt;/api/metrics/v1/default/api/v1/receive | mTLS client certificate |