If you have searched for the kube-prometheus-stack Helm chart documentation, you have probably already seen the official GitHub README, the Artifact Hub page, and a handful of blog posts that cover only the basics. What most of those resources miss is the depth that platform engineers, site reliability engineers, and DevOps teams actually need — the full picture from first install all the way through production-grade configuration, CRD management, upgrading, and real-world troubleshooting.

This guide fills that gap. It is written for engineers who want to understand not just how to run the commands, but why the chart works the way it does, what every critical configuration option actually controls, and how to build a monitoring stack that survives the real pressures of production Kubernetes clusters.

What Is the Kube-Prometheus-Stack Helm Chart?

The kube-prometheus-stack is an official Helm chart maintained by the prometheus-community organization on GitHub. It packages the entire kube-prometheus monitoring stack — Prometheus, Grafana, Alertmanager, Prometheus Operator, Node Exporter, and kube-state-metrics — into a single installable unit.

The chart was originally called the prometheus-operator chart. It was renamed to kube-prometheus-stack to better reflect what it actually deploys: not just the Prometheus Operator, but the entire upstream kube-prometheus project stack. This distinction matters because the Prometheus Operator is just one component among many.

The chart is distributed in two ways. The first is via the traditional Helm repository at https://prometheus-community.github.io/helm-charts. The second, and increasingly preferred, method is via the OCI registry at oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack. OCI distribution aligns with modern Helm 3 standards and lets you pull charts by immutable digest as well as by version.

As of 2026, the latest stable release is version 84.5.0, reflecting years of continuous development and close alignment with upstream Prometheus Operator releases.

Prerequisites Before Installing the Helm Chart

Before running a single Helm command, your environment needs to meet several requirements. Skipping this verification step is the most common cause of failed installations and confusing errors.

  • Kubernetes v1.20+ — Earlier versions are unsupported and may produce unexpected behavior around CRD API versions.
  • Helm v3 — Helm 2 is not supported. Ensure helm version returns a v3.x release.
  • kubectl configured with cluster access and cluster-admin RBAC — required for creating ClusterRoles, ClusterRoleBindings, and cluster-scoped CRDs.
  • A default StorageClass — Without persistent storage, Prometheus and Alertmanager lose all data on pod restart. AWS, GCP, Azure, and DigitalOcean provide default storage classes automatically. Bare-metal clusters need Rook-Ceph, Longhorn, or OpenEBS.
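
You can sanity-check all four requirements before touching Helm:

terminal — bash
# Kubernetes server version (needs v1.20+)
kubectl version
# Helm must report a v3.x release
helm version --short
# Confirm cluster-admin-level access
kubectl auth can-i create clusterroles
# Look for a StorageClass marked "(default)"
kubectl get storageclass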

Adding the Prometheus Community Helm Repository

Before installing, register the prometheus-community repository in your local Helm configuration:

terminal — bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

The helm repo update command fetches the latest chart index. Run it every time before installing or upgrading — Helm caches the repository index locally and that cache can become stale. To see all available chart versions:

terminal — bash
helm search repo kube-prometheus-stack --versions

Installing the Kube-Prometheus-Stack Helm Chart

The simplest installation command creates a dedicated monitoring namespace and installs the chart with all default values:

terminal — bash
# Install via Helm repository
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

# Or install via OCI registry (modern approach)
helm install kube-prometheus-stack \
  oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

After installation, verify all pods are running:

terminal — bash
# Check all pods are Running
kubectl get pods -n monitoring

# Check services created
kubectl get svc -n monitoring

A healthy installation shows pods for the Prometheus Operator, Prometheus server, Alertmanager, Grafana, Node Exporter (DaemonSet on every node), and kube-state-metrics — all in Running state within two to three minutes.

Understanding the Chart's Default Values

Every configuration option in the kube-prometheus-stack Helm chart is controlled by the values.yaml file. The default values.yaml is one of the most comprehensive in the entire Helm ecosystem — it runs to thousands of lines and controls every aspect of every component.

terminal — bash
# View full default values
helm show values prometheus-community/kube-prometheus-stack

# Save to file for inspection and customization
helm show values prometheus-community/kube-prometheus-stack > default-values.yaml

The best practice is to create your own custom values file that overrides only the settings you want to change, and pass it to Helm with the -f flag. This keeps customizations clean, version-controllable, and easy to review.
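
For example, assuming a custom-values.yaml in your working directory and the chart version noted earlier, a pinned install looks like this:

terminal — bash
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --version 84.5.0 \
  -f custom-values.yaml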

Key Configuration Sections in values.yaml

Prometheus Configuration

The prometheus.prometheusSpec section controls the Prometheus server. The most critical settings for production are storage, retention, resources, and scrape intervals.

custom-values.yaml
prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: 45GB
    scrapeInterval: 30s
    evaluationInterval: 30s
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: 2000m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

Memory sizing is one of the most commonly misconfigured settings. As a practical rule, every 100,000 active time series requires approximately 2–4 GB of RAM. A medium-sized cluster with 50 nodes and several hundred pods might generate 200,000–500,000 active series. Plan your resource requests accordingly.
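
To see where your cluster actually lands, you can query Prometheus's own head-series gauge once the stack is running (a quick check, assuming Prometheus is port-forwarded to localhost:9090 as shown later in this guide):

terminal — bash
curl -s 'http://localhost:9090/api/v1/query?query=prometheus_tsdb_head_series'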

Alertmanager Configuration

The alertmanager.alertmanagerSpec section controls the Alertmanager deployment. The Alertmanager routing configuration — defining which alerts go to which channels — is managed through a Kubernetes Secret or an AlertmanagerConfig CRD.

custom-values.yaml
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 512Mi
        cpu: 500m
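
The routing tree itself can be supplied through the chart's alertmanager.config key, which is rendered into the Secret mentioned above. A minimal sketch for a single Slack receiver; the webhook URL and channel are placeholders:

custom-values.yaml
alertmanager:
  config:
    route:
      receiver: "slack-critical"
      group_by: ["alertname", "namespace"]
    receivers:
      - name: "slack-critical"
        slack_configs:
          - api_url: "https://hooks.slack.com/services/REPLACE/ME"
            channel: "#alerts"
            send_resolved: true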

Grafana Configuration

Key Grafana settings include enabling persistence, changing default credentials, and configuring Ingress. The default admin password is prom-operator; change this immediately in production.

custom-values.yaml
grafana:
  enabled: true
  adminPassword: "your-secure-password"
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  ingress:
    enabled: true
    hosts:
      - grafana.your-domain.com

Node Exporter Configuration

Node Exporter runs as a DaemonSet, deployed through the prometheus-node-exporter subchart. Without the appropriate tolerations, it will not run on tainted control plane nodes, leaving gaps in your infrastructure monitoring coverage. The top-level nodeExporter.enabled flag toggles the component, while DaemonSet settings such as tolerations are passed through to the subchart:

custom-values.yaml
nodeExporter:
  enabled: true

prometheus-node-exporter:
  tolerations:
    - operator: "Exists"

Understanding Custom Resource Definitions (CRDs)

CRD management is the most important aspect of the kube-prometheus-stack, and the one most guides underexplain. Getting CRDs wrong is the most common cause of painful upgrade failures. The chart installs these CRDs:

  • alertmanagerconfigs.monitoring.coreos.com
  • alertmanagers.monitoring.coreos.com
  • podmonitors.monitoring.coreos.com
  • probes.monitoring.coreos.com
  • prometheusagents.monitoring.coreos.com
  • prometheuses.monitoring.coreos.com
  • prometheusrules.monitoring.coreos.com
  • scrapeconfigs.monitoring.coreos.com
  • servicemonitors.monitoring.coreos.com
  • thanosrulers.monitoring.coreos.com
Critical: With Helm v3, CRDs are not updated automatically during helm upgrade. This is deliberate — CRD updates can be destructive. You must apply CRD updates manually before upgrading when a new chart version changes CRD schemas.

terminal — bash
# Apply updated CRDs manually before helm upgrade.
# raw.githubusercontent.com serves individual files, not directories, so each CRD
# is applied one at a time. Replace v0.XX.0 with the Prometheus Operator version
# bundled in the target chart release (listed in the chart's release notes).
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.XX.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml

# Repeat for each of the CRDs listed above (alertmanagerconfigs, alertmanagers, ...)
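
You can check which monitoring CRDs are currently installed with:

terminal — bash
kubectl get crd | grep monitoring.coreos.com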

ServiceMonitor and PodMonitor: Monitoring Your Own Applications

ServiceMonitor and PodMonitor CRDs allow you to add monitoring for your own applications beyond Kubernetes system components. A ServiceMonitor tells Prometheus to scrape metrics from a Kubernetes Service:

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-webapp-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # Required label
spec:
  selector:
    matchLabels:
      app: my-webapp
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - production
The release: kube-prometheus-stack label is required, and its value must match your Helm release name (kube-prometheus-stack in the installs above). By default, Prometheus only picks up ServiceMonitors carrying this label; without it, Prometheus will silently ignore your ServiceMonitor. This behavior is configurable via prometheus.prometheusSpec.serviceMonitorSelector.
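
For the ServiceMonitor above to find anything, the target Service needs matching labels and a named port. A hypothetical my-webapp Service that lines up with the example:

service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-webapp
  namespace: production
  labels:
    app: my-webapp        # matched by spec.selector.matchLabels above
spec:
  selector:
    app: my-webapp
  ports:
    - name: http          # referenced by the ServiceMonitor's endpoint port
      port: 80
      targetPort: 8080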

PrometheusRule: Defining Alerting Rules

PrometheusRule CRDs allow you to define alerting and recording rules as native Kubernetes objects. The Prometheus Operator picks them up automatically — no restart required.

prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # Required label
spec:
  groups:
    - name: my-app.rules
      rules:
        - alert: MyAppCrashLooping
          expr: rate(kube_pod_container_status_restarts_total{namespace="production"}[5m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod is crash looping"
            runbook_url: "https://your-runbooks.com/crash-loop"
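
After applying the manifest, confirm the rule object exists, then check Status → Rules in the Prometheus UI:

terminal — bash
kubectl get prometheusrules -n monitoring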

Upgrading the Kube-Prometheus-Stack Helm Chart

Upgrading must be done carefully, especially for major version bumps that include CRD changes. The basic upgrade command is:

terminal — bash
# Preview changes with helm-diff plugin first
helm diff upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f your-custom-values.yaml

# Run the upgrade
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f your-custom-values.yaml

# Rollback if something goes wrong
helm rollback kube-prometheus-stack --namespace monitoring

Upgrade checklist: Check release notes for breaking changes → apply CRD updates manually if required → test in staging first → use helm diff to preview changes → upgrade production.
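
Note that helm diff is a plugin, not a built-in subcommand; install it once with:

terminal — bash
helm plugin install https://github.com/databus23/helm-diff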

Accessing Prometheus, Grafana, and Alertmanager

By default, all three UIs are only accessible within the cluster via ClusterIP services. Use kubectl port-forward for local access:

terminal — bash
# Prometheus — http://localhost:9090
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Grafana — http://localhost:3000 (admin / prom-operator)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Alertmanager — http://localhost:9093
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

For persistent production access, configure Ingress in values.yaml for each component, or change the service type to LoadBalancer for a cloud provider IP.

Production Best Practices

  1. Enable persistent storage for all stateful components. Configure storageSpec for Prometheus and Alertmanager, and persistence for Grafana.
  2. Set resource requests and limits for every component. Prometheus can become very memory-hungry as your cluster grows. Starting without limits is a common path to destabilizing your cluster during metrics spikes.
  3. Configure Pod Disruption Budgets (PDBs) to ensure at least one replica remains running during node maintenance, preventing monitoring blackouts (see the sketch after this list).
  4. Use a dedicated monitoring namespace. Isolate all monitoring workloads from application workloads for cleaner RBAC and to prevent resource contention.
  5. Pin chart versions in production. Never deploy latest. Specify an exact chart version and upgrade deliberately after testing in staging.
  6. Integrate with GitOps tooling (ArgoCD or Flux). Store your custom values.yaml in Git alongside other infrastructure configuration for a full audit trail.
  7. Plan for high-cardinality metrics. Labels with very high unique values (user_id, request_id) cause Prometheus memory to grow dramatically. Use recording rules to pre-aggregate high-cardinality queries.
  8. Consider Thanos for scale. For large clusters or multi-cluster scenarios, the chart integrates cleanly with Thanos for long-term metric storage in S3/GCS and global querying.
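
As a sketch for item 3, a minimal PodDisruptionBudget might look like the following. It assumes prometheus.prometheusSpec.replicas is set to 2 or more (with a single replica, minAvailable: 1 would block node drains) and that your Prometheus pods carry the app.kubernetes.io/name: prometheus label; verify with kubectl get pods --show-labels.

pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-pdb
  namespace: monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # assumed pod label; check your cluster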

Uninstalling the Helm Chart

terminal — bash
# Uninstall the chart (CRDs are NOT deleted automatically)
helm uninstall kube-prometheus-stack --namespace monitoring

# Only if you want to fully remove ALL CRDs (destructive — removes all custom resources)
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
Helm does not delete CRDs on uninstall — this is a deliberate safety measure. Deleting CRDs manually destroys all custom resources of those types. Only do this if you are certain you want to remove all monitoring configuration from the cluster.

Common Troubleshooting Scenarios

Pods stuck in Pending state

Almost always caused by an unbound PersistentVolumeClaim (no default StorageClass available) or insufficient cluster resources. Run kubectl describe pod <pod-name> -n monitoring for the specific reason.
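
Useful follow-up checks:

terminal — bash
# Are any PVCs stuck in Pending?
kubectl get pvc -n monitoring

# Recent scheduling and storage events
kubectl get events -n monitoring --sort-by=.lastTimestamp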

Grafana shows no data

Confirm Prometheus is listed as a data source in Grafana (Configuration → Data Sources). If the source exists but shows errors, verify the Prometheus service URL matches the actual service name and port in your monitoring namespace.

ServiceMonitor not being picked up

The most common cause is a missing or incorrect release: kube-prometheus-stack label on the ServiceMonitor object. Check that the label matches prometheus.prometheusSpec.serviceMonitorSelector.matchLabels.
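
To see the selector your Prometheus object is actually using:

terminal — bash
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.serviceMonitorSelector}'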

Helm upgrade failing with CRD errors

Apply CRD updates manually before upgrading when the new chart version changes CRD schemas. Follow the CRD management section above.

High Prometheus memory usage

Investigate high-cardinality metrics via the Prometheus UI: navigate to Status → TSDB Status to see which labels consume the most memory. Address by relabeling, dropping, or aggregating high-cardinality time series.
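
The same data is available from the HTTP API if you prefer the terminal (assuming the port-forward shown earlier):

terminal — bash
curl -s http://localhost:9090/api/v1/status/tsdb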

Conclusion

The kube-prometheus-stack Helm chart is the most comprehensive and production-proven monitoring solution available for Kubernetes in 2026. Understanding the chart deeply — beyond the basic install command — is what separates a monitoring setup that just runs from one that genuinely serves your team when things go wrong.

Proper CRD management, persistent storage configuration, meaningful resource sizing, ServiceMonitor and PrometheusRule best practices, and integration with GitOps workflows are all the difference between a fragile monitoring installation and a truly production-grade observability platform.

Ready to Deploy?

Get your full Kubernetes observability stack running in minutes with the official Helm chart.
