Production-Ready Monitoring Stack

Kube Prometheus Stack

Deploy Prometheus, Grafana, Alertmanager, and essential exporters as a unified Helm chart. Enterprise-grade monitoring for your Kubernetes clusters — configured in minutes, not days.


Six Pillars of Full-Stack Monitoring

Every component works in concert to deliver end-to-end observability — from infrastructure metrics to intelligent alerting and beautiful dashboards.

Prometheus Operator

The orchestration brain. Manages Prometheus instances using native Kubernetes CRDs — ServiceMonitors, PodMonitors, and PrometheusRules for declarative configuration.

Core Engine

Prometheus Server

Industry-standard time-series database. Scrapes, stores, and evaluates metrics from your entire cluster with powerful PromQL query language support.

TSDB

Grafana

Rich visualization layer with pre-built dashboards for cluster health, node performance, and workload metrics. Customizable and extensible for any use case.

Visualization

Alertmanager

Intelligent alert routing with de-duplication, grouping, and silencing. Route alerts to Slack, PagerDuty, Email, MS Teams, and custom webhooks.

Alerting

Node Exporter

Deployed as a DaemonSet on every node, it exposes CPU, memory, disk I/O, and network metrics — giving you full visibility into host-level infrastructure.

Infrastructure

Kube-State-Metrics

Monitors Kubernetes API objects — deployments, pods, replica sets, services — tracking the desired vs. actual state of all your workloads.

K8s State

How the Stack Works Together

A unified data pipeline from metric collection through intelligent alerting and rich visualization.

[Architecture diagram: data flow from metric exporters through Prometheus to Grafana dashboards and Alertmanager notifications]

Orchestration

Prometheus Operator watches for ServiceMonitor and PodMonitor CRDs to auto-discover scrape targets.
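For illustration, a ServiceMonitor for a hypothetical application called my-app might look like the following; the release label value is assumed to match the Helm release name used in the install command in this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                      # hypothetical application
  namespace: monitoring
  labels:
    release: prometheus-stack       # assumed Helm release name; the Operator selects on this
spec:
  selector:
    matchLabels:
      app: my-app                   # must match the target Service's labels
  namespaceSelector:
    matchNames: [default]
  endpoints:
    - port: metrics                 # named port on the Service
      interval: 30s
      path: /metrics
```

Once applied, the Operator regenerates the Prometheus scrape configuration automatically, with no Prometheus restart required.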

Metric Collection

Node Exporter gathers hardware metrics; kube-state-metrics captures Kubernetes object states from the API server.

Scraping & Storage

Prometheus pulls metrics from all discovered endpoints and stores them as time-series data with configurable retention.

Rule Evaluation

PrometheusRule objects define alerting conditions. Triggered alerts are forwarded to Alertmanager for routing.
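As a sketch, a PrometheusRule carrying a hypothetical high-CPU alert built on Node Exporter metrics:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-cpu-alerts             # hypothetical rule name
  namespace: monitoring
  labels:
    release: prometheus-stack       # assumed Helm release name; the Operator selects on this
spec:
  groups:
    - name: node.rules
      rules:
        - alert: HighNodeCPU
          # CPU busy percentage per node, averaged over 5 minutes
          expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
          for: 10m                  # must hold for 10 minutes before firing
          labels:
            severity: warning
          annotations:
            summary: "CPU on {{ $labels.instance }} above 90% for 10 minutes"
```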

Alerting & Notification

Alertmanager de-duplicates, groups, and routes alerts to Slack, PagerDuty, email, or any webhook receiver.

Visualization

Grafana queries Prometheus to render real-time dashboards — pre-built for cluster health and fully customizable.

Deploy in Three Commands

From zero to full-stack monitoring in under five minutes using the official Helm chart from the Prometheus community.

```bash
# Add the Prometheus community Helm repository
helm repo add prometheus-community \
    https://prometheus-community.github.io/helm-charts
helm repo update

# Create a dedicated monitoring namespace
kubectl create namespace monitoring

# Install the full kube-prometheus-stack
helm install prometheus-stack \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --values values.yaml
```
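The values.yaml passed to the install is yours to write. A minimal sketch with placeholder values (all other settings keep chart defaults):

```yaml
# values.yaml — minimal overrides for kube-prometheus-stack
grafana:
  adminPassword: changeme        # placeholder; prefer an existing Secret in production
prometheus:
  prometheusSpec:
    retention: 15d               # how long local time-series data is kept
```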

Built for Enterprise Scale

Follow battle-tested patterns to run a reliable, performant, and secure monitoring stack in production.

Persistent Storage

Configure Persistent Volumes for Prometheus and Grafana to survive pod restarts without losing metrics or dashboards.
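A sketch of the relevant values.yaml overrides, assuming a StorageClass named standard and illustrative sizes:

```yaml
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard   # assumption: use your cluster's StorageClass
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi            # illustrative size; scale with retention and churn
grafana:
  persistence:
    enabled: true                      # keeps dashboards across pod restarts
    size: 10Gi
```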

Cardinality Control

Monitor and manage time-series cardinality to prevent memory explosions. Avoid high-cardinality labels like unique IDs.
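One common technique is dropping known-noisy series at scrape time with metricRelabelings on a ServiceMonitor endpoint; the metric name pattern here is a hypothetical example:

```yaml
# Fragment of a ServiceMonitor spec: drop a hypothetical per-user metric
# before it is ever stored, keeping cardinality in check.
endpoints:
  - port: metrics
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: 'myapp_request_duration_by_user_.*'
        action: drop
```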

High Availability

Run multiple Prometheus replicas with pod anti-affinity for zero-downtime monitoring across failure domains.
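A values.yaml sketch for an HA setup; the pod label used in the anti-affinity selector is an assumption and may differ by Operator version:

```yaml
prometheus:
  prometheusSpec:
    replicas: 2                        # two identical Prometheus instances
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: prometheus   # assumed Operator-set pod label
            topologyKey: kubernetes.io/hostname      # never co-locate on one node
alertmanager:
  alertmanagerSpec:
    replicas: 3                        # Alertmanager gossips to de-duplicate across replicas
```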

Long-Term Storage

Integrate with Thanos, Cortex, or Grafana Cloud via remote_write for historical data retention beyond 30 days.
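A remote_write sketch in values.yaml, with a hypothetical Thanos Receive endpoint:

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: https://thanos-receive.example.com/api/v1/receive   # hypothetical endpoint
```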

Security & RBAC

Enforce network policies, OIDC/OAuth authentication for Grafana, and strict Kubernetes RBAC for the monitoring namespace.

ServiceMonitor Discovery

Use ServiceMonitor and PodMonitor CRDs for automatic, service-based metric target discovery — no manual config needed.

Resource Governance

Set CPU/memory requests and limits for every component to prevent resource starvation and OOM kills in production.
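Illustrative values.yaml settings; the numbers are starting-point assumptions, not tuned recommendations:

```yaml
prometheus:
  prometheusSpec:
    resources:
      requests: { cpu: "1", memory: 4Gi }
      limits: { memory: 6Gi }      # memory limit only; omitting a CPU limit avoids throttling
grafana:
  resources:
    requests: { cpu: 100m, memory: 256Mi }
    limits: { memory: 512Mi }
```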

Full Observability Pillars

Extend beyond metrics: add Loki for logs and Tempo + OpenTelemetry for distributed tracing in a unified Grafana stack.

Frequently Asked Questions

Everything you need to know about deploying and managing the kube-prometheus-stack.

What is the kube-prometheus-stack?

It is a comprehensive Helm chart that deploys a full Kubernetes monitoring and alerting stack — including Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics, and the Prometheus Operator. It provides production-ready observability out of the box.

How is it different from installing Prometheus on its own?

Standalone Prometheus requires manual configuration of scrape targets, alerting rules, and dashboards. The kube-prometheus-stack automates all of this using the Prometheus Operator pattern with CRDs like ServiceMonitor and PrometheusRule, plus it includes pre-configured Grafana dashboards and Alertmanager routing.

What are the resource requirements?

For a small cluster (1–5 nodes), Prometheus typically needs 2 CPU cores and 4–8 GB RAM. Grafana requires around 0.5 CPU and 512 MB RAM. For production clusters, scale resources based on the number of time series and scrape frequency. Always configure persistent volumes for data retention.

Can I add my own dashboards and alerting rules?

Absolutely. Custom Grafana dashboards can be provisioned via ConfigMaps or the Grafana UI. Custom alerting rules are defined using PrometheusRule CRDs, which the Operator automatically syncs with the Prometheus configuration. The stack is fully extensible.

How long are metrics retained?

Local Prometheus storage is recommended for 15–30 days of retention. For longer-term storage, configure remote_write to send metrics to solutions like Thanos, Cortex, Grafana Mimir, or managed services like Amazon Managed Prometheus and Grafana Cloud.

Is the default configuration production-ready?

The default configuration is an excellent starting point, but production deployments should customize the values.yaml to enable persistent storage, set resource limits, configure HA replicas, define alert routing destinations (Slack/PagerDuty), and apply network policies for security.

What are the prerequisites?

You need a running Kubernetes cluster (v1.19+), Helm v3.x installed, and kubectl configured with cluster access. Ensure your cluster has sufficient resources — at minimum 2 vCPU and 4 GB RAM for the monitoring namespace. A default StorageClass is recommended for persistent volumes.

Which Kubernetes and Helm versions are supported?

kube-prometheus-stack supports Kubernetes v1.19 and above, including all current EKS, GKE, and AKS managed versions. Helm v3.2+ is required. The chart is regularly tested against the latest Kubernetes releases and updated within days of new minor versions.

How do I upgrade the stack?

Run: helm repo update && helm upgrade --reuse-values prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring. Always review the chart changelog before upgrading, as CRD changes may require manual steps. Back up your Grafana dashboards and Prometheus data before major version upgrades.

What are the known limitations?

CRDs are not managed by Helm on upgrades — you may need to update them manually for major chart versions. High-cardinality metrics can cause memory issues; always set cardinality limits. Prometheus does not natively support multi-tenancy — use Thanos or Cortex for multi-tenant setups. Windows nodes are not supported by Node Exporter.

How do I route alerts to Slack or other receivers?

Edit the alertmanager.config section in your values.yaml. Define receivers for Slack, PagerDuty, email, or webhook, then set up route rules to match alert labels to the correct receiver. Example: for Slack, provide the webhook URL and channel name under the slack_configs block of your receiver definition.
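Putting that together, a sketch of the alertmanager.config section with hypothetical Slack and PagerDuty credentials:

```yaml
alertmanager:
  config:
    route:
      receiver: slack-default                  # fallback receiver for everything
      group_by: ['alertname', 'namespace']
      routes:
        - matchers: ['severity="critical"']    # critical alerts page on-call
          receiver: pagerduty
    receivers:
      - name: slack-default
        slack_configs:
          - api_url: https://hooks.slack.com/services/T000/B000/XXXX   # hypothetical webhook
            channel: '#alerts'
      - name: pagerduty
        pagerduty_configs:
          - routing_key: REPLACE_WITH_INTEGRATION_KEY   # placeholder Events API v2 key
```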

Ready to Monitor Your Kubernetes Cluster?

Deploy the industry-standard observability stack in minutes. Open-source, battle-tested, and trusted by thousands of engineering teams worldwide.

View on GitHub · Artifact Hub