Deploy Prometheus, Grafana, Alertmanager, and essential exporters as a unified Helm chart. Enterprise-grade monitoring for your Kubernetes clusters — configured in minutes, not days.

Every component works in concert to deliver end-to-end observability — from infrastructure metrics to intelligent alerting and beautiful dashboards.

Prometheus Operator (Core Engine): The orchestration brain. Manages Prometheus instances using native Kubernetes CRDs — ServiceMonitors, PodMonitors, and PrometheusRules — for declarative configuration.

Prometheus (TSDB): Industry-standard time-series database. Scrapes, stores, and evaluates metrics from your entire cluster, with powerful PromQL query language support.

Grafana (Visualization): Rich visualization layer with pre-built dashboards for cluster health, node performance, and workload metrics. Customizable and extensible for any use case.

Alertmanager (Alerting): Intelligent alert routing with de-duplication, grouping, and silencing. Routes alerts to Slack, PagerDuty, email, MS Teams, and custom webhooks.

Node Exporter (Infrastructure): Deployed as a DaemonSet on every node, it exposes CPU, memory, disk I/O, and network metrics — giving you full visibility into host-level infrastructure.

kube-state-metrics (K8s State): Monitors Kubernetes API objects — deployments, pods, replica sets, services — tracking the desired vs. actual state of all your workloads.

A unified data pipeline from metric collection through intelligent alerting and rich visualization:
Prometheus Operator watches for ServiceMonitor and PodMonitor CRDs to auto-discover scrape targets.
Node Exporter gathers hardware metrics; kube-state-metrics captures Kubernetes object states from the API server.
Prometheus pulls metrics from all discovered endpoints and stores them as time-series data with configurable retention.
PrometheusRule objects define alerting conditions. Triggered alerts are forwarded to Alertmanager for routing.
Alertmanager de-duplicates, groups, and routes alerts to Slack, PagerDuty, email, or any webhook receiver.
Grafana queries Prometheus to render real-time dashboards — pre-built for cluster health and fully customizable.
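The discovery step above can be sketched with a ServiceMonitor manifest. Names and labels here are illustrative, not part of the chart; the key idea is that the Operator selects the object by label and generates the matching Prometheus scrape configuration.

```yaml
# Hypothetical ServiceMonitor for an application exposing /metrics.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # illustrative name
  namespace: monitoring
  labels:
    release: prometheus-stack  # must match the Operator's ruleSelector/serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app              # targets Services carrying this label
  endpoints:
    - port: http-metrics       # named port on the Service
      path: /metrics
      interval: 30s
```

With kube-prometheus-stack defaults, the `release` label typically needs to match the Helm release name for the Operator to pick the object up; verify against your chart values.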
From zero to full-stack monitoring in under five minutes using the official Helm chart from the Prometheus community.
# Add the Prometheus community Helm repository
$ helm repo add prometheus-community \
    https://prometheus-community.github.io/helm-charts
$ helm repo update

# Create a dedicated monitoring namespace
$ kubectl create namespace monitoring

# Install the full kube-prometheus-stack
$ helm install prometheus-stack \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --values values.yaml
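A minimal values.yaml to pair with the install command might look like the sketch below. Retention, replica counts, and volume sizes are illustrative assumptions, not chart defaults; consult the chart's values reference for the full schema.

```yaml
# Illustrative values.yaml for kube-prometheus-stack
prometheus:
  prometheusSpec:
    retention: 15d                     # example retention window
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi            # example PV size
grafana:
  persistence:
    enabled: true                      # survive pod restarts
    size: 10Gi
alertmanager:
  alertmanagerSpec:
    replicas: 2                        # basic HA for alert routing
```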
Follow battle-tested patterns to run a reliable, performant, and secure monitoring stack in production.
Configure Persistent Volumes for Prometheus and Grafana to survive pod restarts without losing metrics or dashboards.
Monitor and manage time-series cardinality to prevent memory explosions. Avoid high-cardinality labels like unique IDs.
Run multiple Prometheus replicas with pod anti-affinity for zero-downtime monitoring across failure domains.
Integrate with Thanos, Cortex, or Grafana Cloud via remote_write for historical data retention beyond 30 days.
Enforce network policies, OIDC/OAuth authentication for Grafana, and strict Kubernetes RBAC for the monitoring namespace.
Use ServiceMonitor and PodMonitor CRDs for automatic, service-based metric target discovery — no manual config needed.
Set CPU/memory requests and limits for every component to prevent resource starvation and OOM kills in production.
Extend beyond metrics: add Loki for logs and Tempo + OpenTelemetry for distributed tracing in a unified Grafana stack.
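Alerting conditions from the pipeline above are declared as PrometheusRule objects. The rule below is a hedged sketch: the alert name, threshold, and labels are made up for illustration, though the node_exporter metrics it queries are standard.

```yaml
# Hypothetical PrometheusRule firing on sustained node memory pressure.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts              # illustrative name
  namespace: monitoring
  labels:
    release: prometheus-stack    # so the Operator selects the rule
spec:
  groups:
    - name: node.rules
      rules:
        - alert: NodeMemoryPressure
          expr: |
            (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          for: 10m               # must hold for 10 minutes before firing
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} memory usage above 90%"
```

Triggered alerts flow to Alertmanager, where the grouping, silencing, and receiver routing described above apply.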
Everything you need to know about deploying and managing the kube-prometheus-stack.