When you deploy kube-prometheus-stack, Prometheus automatically scrapes metrics from Kubernetes internals — the API server, kubelet, node-exporter, and the control plane. But the moment you deploy your own application, Prometheus has no idea it exists. That is where ServiceMonitor and PodMonitor come in — the Prometheus Operator's custom resource definitions (CRDs) that let you declaratively tell Prometheus exactly how to discover and scrape metrics from your services.

This guide covers everything you need to know about ServiceMonitor and PodMonitor resources: how they work, how to write them correctly, how to debug them when they fail, and how to combine them with PrometheusRule to build a complete custom monitoring pipeline for any application running in Kubernetes.

What is the Prometheus Operator?

The Prometheus Operator is a Kubernetes operator that manages Prometheus instances, Alertmanager clusters, and related monitoring resources using Kubernetes-native custom resource definitions. It was originally created by CoreOS (now part of Red Hat) and is maintained under the prometheus-operator GitHub organization. When you install the kube-prometheus-stack Helm chart, the Prometheus Operator is the core component that orchestrates everything.

Without the Prometheus Operator, configuring Prometheus to scrape a new target requires editing the prometheus.yml configuration file, adding a new scrape_config block, and then reloading or restarting Prometheus. In a Kubernetes environment where services scale dynamically and pods come and go, this static approach becomes unmanageable.
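For contrast, here is a minimal sketch of a hand-maintained scrape job in prometheus.yml. The job name and pod IPs are hypothetical. The point is that every address is hard-coded and goes stale as soon as a pod is rescheduled or the Deployment scales.

prometheus.yml (excerpt, hypothetical static job)
scrape_configs:
  - job_name: my-web-app
    metrics_path: /metrics
    scrape_interval: 30s
    static_configs:
      - targets:
          # Hypothetical pod IPs; these change whenever pods are recreated
          - 10.0.1.15:9090
          - 10.0.1.16:9090

After each edit, Prometheus has to reload its configuration. You can trigger this by sending the process a SIGHUP or, if it runs with --web.enable-lifecycle, by sending a POST to its /-/reload endpoint.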

The Prometheus Operator solves this by introducing several CRDs that translate Kubernetes-native resource definitions into Prometheus configuration:

  • Prometheus — Defines a Prometheus server instance, including its retention, storage, replicas, and which ServiceMonitors and PodMonitors it should watch.
  • ServiceMonitor — Declares how Prometheus should discover and scrape metrics from Kubernetes Services.
  • PodMonitor — Declares how Prometheus should discover and scrape metrics directly from pods, without requiring a Service.
  • PrometheusRule — Defines recording rules and alerting rules that Prometheus should evaluate.
  • Alertmanager — Defines an Alertmanager instance for routing and deduplicating alert notifications.
  • AlertmanagerConfig — Allows namespaced, fine-grained configuration of alert routing and receivers.

The operator watches for changes to these CRDs and automatically regenerates the Prometheus configuration. When you create a ServiceMonitor, the operator detects it and generates the corresponding scrape_config. It writes the updated configuration to a Secret mounted by the Prometheus pods, and a config-reloader sidecar then triggers a configuration reload. None of this requires restarting Prometheus or any manual intervention.

Understanding ServiceMonitor CRDs

A ServiceMonitor is the most common way to tell Prometheus how to scrape your application. It works by targeting Kubernetes Services: you select Services by their labels and specify which ports and paths to scrape. The Prometheus Operator turns this into a scrape job. Prometheus then discovers the Endpoints behind the matching Services and scrapes each pod individually.

Here is the anatomy of a ServiceMonitor resource:

servicemonitor-anatomy.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # Must match Prometheus serviceMonitorSelector
spec:
  selector:                                    # Which Services to target
    matchLabels:
      app: my-app
  namespaceSelector:                           # Which namespaces to search
    matchNames:
      - default
  endpoints:                                   # How to scrape each Service port
    - port: metrics                             # Must match a named port in the Service
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

The flow is as follows. The Prometheus Operator watches for ServiceMonitor resources. When it finds one, it generates a Prometheus scrape job that uses Kubernetes service discovery (the endpoints role by default) together with relabeling rules derived from the ServiceMonitor. Those rules keep only Services whose labels match the selector, in the selected namespaces, on the named port. The operator injects this job into the configuration of the running Prometheus instance. Prometheus itself then watches the Endpoints of the matching Services and scrapes each pod on the specified port and path.

Under the hood, this is still Prometheus's kubernetes_sd_config mechanism; what changes is who writes the configuration. Instead of hand-editing scrape_config blocks, you declare your intent in a Kubernetes resource and the operator renders the equivalent configuration. This separation of concerns makes the system more maintainable and allows fine-grained RBAC control over who can create monitoring configurations.
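As a rough illustration, the scrape job the operator renders for the anatomy example above resembles the following. This is a simplified, hand-written sketch, not the operator's exact output. The real output contains many more relabeling rules and varies between operator versions. The sketch assumes the default endpoints discovery role.

generated-scrape-config.yaml (simplified sketch)
scrape_configs:
  - job_name: serviceMonitor/monitoring/my-app-monitor/0
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      # Prometheus itself watches Endpoints in the selected namespaces
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      # spec.selector: keep targets whose Service carries app=my-app
      - source_labels: [__meta_kubernetes_service_label_app, __meta_kubernetes_service_labelpresent_app]
        regex: my-app;true
        action: keep
      # endpoints[].port: keep only the Service port named "metrics"
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
      # By default the job label is set to the Service name
      - source_labels: [__meta_kubernetes_service_name]
        target_label: job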

Key Fields Explained

  • spec.selector.matchLabels — Label selector that must match the labels on the target Kubernetes Service (not the pods, not the Deployment — the Service itself).
  • spec.endpoints[].port — The name of the port in the Service spec to scrape. This is the port name, not the port number. The name must exactly match a named port in the Service definition.
  • spec.endpoints[].path — The HTTP path where metrics are exposed. Defaults to /metrics if omitted.
  • spec.endpoints[].interval — How often Prometheus scrapes this target. Overrides the global scrape interval.
  • spec.namespaceSelector — Restricts which namespaces the selector searches. If omitted, only the ServiceMonitor's own namespace is searched.
  • metadata.labels — Critical for discoverability. The Prometheus resource has a serviceMonitorSelector that filters which ServiceMonitors it watches. Your ServiceMonitor must have labels that match this selector.

Creating Your First ServiceMonitor

Let us walk through a complete, working example. Suppose you have a Go application that serves HTTP traffic on port 8080 and exposes Prometheus metrics on a separate port, 9090, at /metrics. Here is the full setup, from Deployment to ServiceMonitor.

Step 1: Application Deployment

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-web-app
  template:
    metadata:
      labels:
        app: my-web-app
        version: v1.2.0
    spec:
      containers:
        - name: app
          image: my-registry/my-web-app:v1.2.0
          ports:
            - name: http
              containerPort: 8080
            - name: metrics
              containerPort: 9090

Step 2: Service Definition

service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-web-app
  namespace: default
  labels:
    app: my-web-app         # ServiceMonitor selector matches THIS label
spec:
  selector:
    app: my-web-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: metrics              # ServiceMonitor endpoints[].port matches THIS name
      port: 9090
      targetPort: 9090

Step 3: ServiceMonitor

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-web-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-web-app
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
      honorLabels: false

Apply all three resources, then verify the ServiceMonitor was created:

terminal
kubectl apply -f deployment.yaml -f service.yaml -f servicemonitor.yaml
kubectl get servicemonitor -n monitoring
kubectl get endpoints my-web-app -n default

Typically within a minute, the Prometheus Operator detects the new ServiceMonitor, regenerates the scrape configuration, and Prometheus reloads it. To verify, port-forward to the Prometheus service and open Status > Targets in the Prometheus UI. You should see one target per replica, which is three in this example.

The critical label here is release: kube-prometheus-stack on the ServiceMonitor. By default, the kube-prometheus-stack Helm chart configures the Prometheus resource so that serviceMonitorSelector.matchLabels.release equals the Helm release name, which is kube-prometheus-stack in this guide. If your ServiceMonitor does not carry that label, Prometheus will never pick it up. To make Prometheus watch all ServiceMonitors regardless of labels, set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues: false in your Helm values.
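A minimal values.yaml excerpt for that change looks like this. The two additional keys apply the same behaviour to PodMonitors and PrometheusRules. Include them only if you want that, and check the key names against the values.yaml of the chart version you run.

values.yaml (excerpt)
prometheus:
  prometheusSpec:
    # Select every ServiceMonitor, not only those labelled release: <release name>
    serviceMonitorSelectorNilUsesHelmValues: false
    # Optional: the equivalent switches for PodMonitors and PrometheusRules
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false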

PodMonitor: When to Use It Instead

A PodMonitor works like a ServiceMonitor, except that it targets pods directly rather than going through a Service. The Prometheus Operator turns each PodMonitor into a scrape job that uses Prometheus's pod-based service discovery, and Prometheus then scrapes every matching pod.

Use a PodMonitor when:

  • The application has no Service — DaemonSets that expose node-level metrics, batch Jobs that run periodically, or standalone pods without a Service definition.
  • Metrics are on a different port than the Service exposes — If your application's main Service exposes port 8080 for HTTP but metrics are on a sidecar container's port 9090 that is not included in the Service spec.
  • You need pod-level label selection — When you want to scrape pods based on pod labels rather than Service labels, PodMonitor gives you direct access to pod metadata.

podmonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-daemonset-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: node-metrics-agent
  namespaceSelector:
    matchNames:
      - kube-system
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 60s
      scrapeTimeout: 15s

Notice the key difference: PodMonitor uses podMetricsEndpoints instead of endpoints, and the selector matches pod labels directly instead of Service labels. The port name must match a named containerPort on the pod spec. Everything else — namespace selection, label matching, scrape intervals — works identically to ServiceMonitor.
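To make the port-name rule concrete, here is a minimal sketch of the pod side of that PodMonitor. It is a DaemonSet whose pod template carries the app: node-metrics-agent label and a containerPort named metrics. The image name and the port number 9100 are hypothetical placeholders.

daemonset.yaml (hypothetical)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-metrics-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-metrics-agent
  template:
    metadata:
      labels:
        app: node-metrics-agent          # PodMonitor selector matches THIS pod label
    spec:
      containers:
        - name: agent
          image: my-registry/node-metrics-agent:v0.1.0   # hypothetical image
          ports:
            - name: metrics              # PodMonitor podMetricsEndpoints[].port matches THIS name
              containerPort: 9100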

In practice, ServiceMonitor is the right choice for most workloads. Most applications in Kubernetes already have a Service, and ServiceMonitor aligns with how the Prometheus Operator is designed to work. Reserve PodMonitor for the edge cases where a Service either does not exist or does not expose the metrics port.

Label Matching and Selector Configuration

Label selectors are where most ServiceMonitor configuration errors occur. There are three distinct layers of label matching that must all align for monitoring to work:

Layer 1: Prometheus to ServiceMonitor

The Prometheus custom resource has a serviceMonitorSelector field that determines which ServiceMonitors the operator picks up. In kube-prometheus-stack, this is typically configured as:

prometheus-resource.yaml (excerpt)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack    # ServiceMonitor must have this label
  serviceMonitorNamespaceSelector: {}    # Empty = all namespaces

Your ServiceMonitor's metadata.labels must include release: kube-prometheus-stack (or whatever your Prometheus resource expects). To check what your Prometheus expects:

terminal
kubectl get prometheus -n monitoring -o jsonpath='{.items[*].spec.serviceMonitorSelector}'

Layer 2: ServiceMonitor to Service

The spec.selector inside the ServiceMonitor must match labels on the target Service. This is the label on the Service's metadata.labels, not the Service's spec.selector (which selects pods).
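The distinction matters when a Service's metadata.labels and spec.selector diverge. In this hypothetical pair of manifests, the Service is labelled app.kubernetes.io/name: my-web-app but selects pods with app: my-web-app. The ServiceMonitor must therefore select on app.kubernetes.io/name.

layer2-example.yaml (hypothetical)
apiVersion: v1
kind: Service
metadata:
  name: my-web-app
  namespace: default
  labels:
    app.kubernetes.io/name: my-web-app     # ServiceMonitor spec.selector must match THIS
spec:
  selector:
    app: my-web-app                        # selects pods; ignored by the ServiceMonitor
  ports:
    - name: metrics
      port: 9090
      targetPort: 9090
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-web-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      # Matches the Service's metadata.labels. Selecting app: my-web-app here
      # would find nothing, because that label exists only in spec.selector.
      app.kubernetes.io/name: my-web-app
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics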

Layer 3: Port Name Matching

The endpoints[].port value must exactly match a named port in the Service's spec.ports[]. If the Service defines the port as http-metrics but the ServiceMonitor references metrics, it will silently fail with no targets discovered.

For more complex selection logic, you can use matchExpressions instead of matchLabels:

advanced-selector.yaml
spec:
  selector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
          - my-app
          - my-app-canary
      - key: monitoring
        operator: NotIn
        values:
          - disabled

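This selector targets Services whose app.kubernetes.io/name label is either my-app or my-app-canary, provided their monitoring label is not set to disabled. NotIn also matches Services that have no monitoring label at all. The requirements in matchExpressions are ANDed, so a Service must satisfy both. The matchExpressions syntax supports four operators: In, NotIn, Exists, and DoesNotExist.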
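The Exists and DoesNotExist operators test only whether a key is present and take no values list. The sketch below assumes a hypothetical monitoring-opt-out label that teams add to Services they want excluded:

exists-selector.yaml
spec:
  selector:
    matchExpressions:
      - key: app.kubernetes.io/name     # any Service carrying this label, whatever its value
        operator: Exists
      - key: monitoring-opt-out         # hypothetical opt-out label
        operator: DoesNotExist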

Namespace Selectors and Cross-Namespace Monitoring

By default, a ServiceMonitor only looks for Services in its own namespace. In production environments, you typically place all ServiceMonitors in the monitoring namespace while your applications run in different namespaces. The namespaceSelector field controls this behavior.

Monitor Specific Namespaces

namespace-specific.yaml
spec:
  namespaceSelector:
    matchNames:
      - production
      - staging
      - backend-services

Monitor All Namespaces

namespace-all.yaml
spec:
  namespaceSelector:
    any: true    # Discovers Services across ALL namespaces

Using any: true is convenient but has implications. First, Prometheus performs the discovery itself, so the Prometheus server's ServiceAccount needs RBAC permissions to get, list, and watch Services, Endpoints, and Pods in every namespace. The kube-prometheus-stack chart grants these permissions by default. Second, in large clusters with many namespaces, a broadly scoped ServiceMonitor can increase Prometheus load as it discovers more targets. Be intentional about which namespaces you monitor, and prefer specific namespace selectors when possible.
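If you run Prometheus without the chart's defaults, the required permissions look roughly like the following sketch. The ClusterRole and binding names are hypothetical. The ServiceAccount name assumes a release named kube-prometheus-stack, so check spec.serviceAccountName on your Prometheus resource for the real one. The endpointslices rule is needed only if discovery uses EndpointSlices.

prometheus-rbac.yaml (sketch)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery                 # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery                 # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: kube-prometheus-stack-prometheus   # assumed; check spec.serviceAccountName
    namespace: monitoring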

The Prometheus resource itself also has a serviceMonitorNamespaceSelector. It controls the namespaces in which the operator looks for ServiceMonitor objects, as opposed to the namespaces in which a ServiceMonitor searches for Services. An empty selector ({}) means the operator watches all namespaces for ServiceMonitors. Leaving the field unset restricts it to the Prometheus resource's own namespace.
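For example, to have the operator pick up ServiceMonitors only from namespaces you explicitly opt in, you could use a label selector instead of {}. The monitoring: enabled namespace label below is a hypothetical convention. You would then label each namespace, for example with kubectl label namespace production monitoring=enabled. With kube-prometheus-stack, set this selector through prometheus.prometheusSpec.serviceMonitorNamespaceSelector in your Helm values rather than editing the Prometheus resource directly, because the chart manages that resource.

prometheus-namespace-selector.yaml (excerpt)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus
  namespace: monitoring
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
  serviceMonitorNamespaceSelector:   # where to look for ServiceMonitor objects
    matchLabels:
      monitoring: enabled            # hypothetical opt-in namespace label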

Configuring Scrape Intervals and Timeouts

Every endpoint in a ServiceMonitor can specify its own scrape interval and timeout, overriding the global Prometheus defaults. Choosing the right values requires balancing data granularity against Prometheus resource consumption.

scrape-config-example.yaml
spec:
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s            # Scrape every 15 seconds
      scrapeTimeout: 10s      # Must not exceed the interval
      honorLabels: false      # Prevent target from overriding job/instance labels
      honorTimestamps: true   # Use timestamps from the target if present
      scheme: http            # Use https for TLS-enabled endpoints
      metricRelabelings:      # Drop or rename metrics after scraping
        - sourceLabels: [__name__]
          regex: 'go_.*'
          action: drop           # Drop all Go runtime metrics to save storage

Choosing the Right Interval

  • 15s — Suitable for critical production services where you need near-real-time alerting. Increases Prometheus CPU, memory, and storage consumption proportionally.
  • 30s — The default for most applications. Provides good granularity for dashboards and alerting without excessive overhead. This is the recommended starting point.
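  • 60s — Appropriate for infrastructure metrics, batch jobs, or lower-priority services. Halves the number of samples ingested compared with a 30s interval.
  • 300s (5m) — Use for metrics that change slowly, such as disk capacity, certificate expiry, or configuration drift metrics. Be aware that this matches Prometheus's default 5-minute query lookback window, so a single delayed or failed scrape can make the series drop out of queries. Many teams cap intervals at about 2 minutes for this reason. Not suitable for latency or error rate alerting.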

The scrapeTimeout must not exceed the interval; a configuration with a longer timeout is rejected as invalid. If a scrape takes longer than the timeout, Prometheus marks the target as down and its up metric becomes 0. For most applications, a timeout of 10s with a 30s interval works well. If your application exposes thousands of metrics and scrapes are slow, increase both values proportionally.

The metricRelabelings field is extremely useful for controlling storage costs. For example, applications instrumented with the Prometheus Go client expose dozens of Go runtime series (go_gc_*, go_memstats_*) by default that you may not need. Dropping them at scrape time prevents them from ever entering the time-series database, which saves storage and keeps queries fast.
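If you only want to discard the families named above rather than every go_ series, a narrower regex is enough. This sketch relies on the fact that Prometheus relabeling regexes are fully anchored, so the pattern must match the entire metric name:

metric-relabel-drop.yaml
spec:
  endpoints:
    - port: metrics
      interval: 30s
      metricRelabelings:
        # Drop go_gc_* and go_memstats_* series; other go_* series are kept
        - sourceLabels: [__name__]
          regex: 'go_(gc|memstats)_.*'
          action: drop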

Debugging ServiceMonitor Issues

When a ServiceMonitor is not working, the symptom is usually the same: the target does not appear in Prometheus, and no metrics are collected. Debugging requires checking each layer systematically.

Step 1: Verify the ServiceMonitor exists

terminal
kubectl get servicemonitor -n monitoring
kubectl describe servicemonitor my-web-app-monitor -n monitoring

Step 2: Check the Prometheus operator logs

terminal
kubectl logs -n monitoring deployment/kube-prometheus-stack-operator --tail=50

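The operator logs show whether it detected the ServiceMonitor and whether it hit errors while generating the scrape config. Common errors include RBAC permission issues and invalid ServiceMonitor specs. Resource names such as kube-prometheus-stack-operator and kube-prometheus-stack-prometheus in these commands assume a Helm release named kube-prometheus-stack, so adjust them to match your release.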

Step 3: Verify label selectors match

terminal
# Check what labels the Prometheus resource expects
kubectl get prometheus -n monitoring -o yaml | grep -A5 serviceMonitorSelector

# Check what labels your ServiceMonitor has
kubectl get servicemonitor my-web-app-monitor -n monitoring --show-labels

# Verify the target Service exists and has matching labels
kubectl get svc -n default --show-labels | grep my-web-app

# Confirm Endpoints exist (pods are running and selected by the Service)
kubectl get endpoints my-web-app -n default

Step 4: Check the Prometheus targets directly

terminal
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

Open http://localhost:9090/targets in your browser. If the target is listed but marked as DOWN, the issue is with the scrape itself (wrong port, wrong path, network policy blocking access, or the application is not exposing metrics correctly). If the target is not listed at all, the issue is with the label selectors or namespace selectors.

Step 5: Test the metrics endpoint directly

terminal
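# Run a throwaway curl pod and test the metrics endpoint from within the cluster
# (head runs inside the pod, so the pod exits cleanly and --rm can delete it)
kubectl run curl-test --rm -i --restart=Never --image=curlimages/curl -- \
  sh -c 'curl -s http://my-web-app.default.svc:9090/metrics | head -20'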

This confirms whether the application is actually exposing valid Prometheus metrics. The output should be plain text in the Prometheus exposition format, with lines like http_requests_total{method="GET",status="200"} 1234.

Common Issues Checklist

  1. Missing release label — The ServiceMonitor does not have release: kube-prometheus-stack (or the label your Prometheus expects).
  2. Port name mismatch — The endpoints[].port in the ServiceMonitor does not match any named port in the Service spec.
  3. Wrong namespace selector — The ServiceMonitor's namespaceSelector does not include the namespace where the Service lives.
  4. Service selector mismatch — The ServiceMonitor's selector.matchLabels does not match the Service's metadata.labels.
  5. No Endpoints — The Service has no Endpoints because pods are not running, are not ready, or the Service's spec.selector does not match the pod labels.
  6. Network policies — A NetworkPolicy is blocking Prometheus from reaching the target pods on the metrics port.
  7. RBAC permissions — The Prometheus server's ServiceAccount lacks permissions to list and watch Services, Endpoints, or Pods in the target namespace.
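For item 6, a NetworkPolicy that admits Prometheus might look like the sketch below. It makes three assumptions: Prometheus runs in the monitoring namespace, its pods carry the app.kubernetes.io/name: prometheus label the operator normally applies, and your cluster sets the kubernetes.io/metadata.name namespace label (Kubernetes 1.21 and later). Because the policy selects the my-web-app pods for ingress, any other traffic to them, such as requests on port 8080, must be allowed by additional policies.

allow-prometheus-scrape.yaml (sketch)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: my-web-app                  # pods being scraped
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Same list item: namespace AND pod selector must both match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus   # assumed Prometheus pod label
      ports:
        - protocol: TCP
          port: 9090                   # the metrics containerPort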

PrometheusRule for Custom Alerts

Once your ServiceMonitor is collecting metrics, the next step is to create alerting rules using the PrometheusRule CRD. This gives you a complete monitoring pipeline: your application exposes metrics, ServiceMonitor tells Prometheus how to scrape them, and PrometheusRule defines what conditions should trigger alerts that are sent to Alertmanager for notification routing.

prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-web-app-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: my-web-app.rules
      rules:
        # Alert: High error rate
        - alert: MyWebAppHighErrorRate
          expr: |
            sum(rate(http_requests_total{job="my-web-app",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{job="my-web-app"}[5m]))
            > 0.05
          for: 5m
          labels:
            severity: critical
            team: backend
          annotations:
            summary: "High error rate on my-web-app"
            description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"
            runbook_url: "https://wiki.internal/runbooks/my-web-app-errors"

        # Alert: High latency on P99
        - alert: MyWebAppHighLatency
          expr: |
            histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="my-web-app"}[5m])) by (le))
            > 2.0
          for: 10m
          labels:
            severity: warning
            team: backend
          annotations:
            summary: "P99 latency above 2s on my-web-app"
            description: "P99 latency is {{ $value | humanizeDuration }}"

        # Recording rule: Pre-compute request rate for dashboards
        - record: my_web_app:http_requests:rate5m
          expr: sum(rate(http_requests_total{job="my-web-app"}[5m])) by (status, method)

The PrometheusRule resource follows the same label convention as ServiceMonitor — it must have the release: kube-prometheus-stack label (or whatever your Prometheus resource's ruleSelector expects). The Prometheus Operator detects the PrometheusRule, injects the rules into the Prometheus configuration, and reloads Prometheus automatically.

A few best practices for PrometheusRule definitions:

  • Always include a for duration — This prevents transient spikes from triggering alerts. A for: 5m clause means the condition must be true for 5 consecutive minutes before the alert fires.
  • Use severity labels — Standardize on severity: critical, severity: warning, and severity: info. Alertmanager can route alerts differently based on severity.
  • Add runbook URLs — Include an annotations.runbook_url in every alert. When an engineer gets paged at 3am, a runbook link is the most valuable thing you can provide.
  • Use recording rules for dashboards — Pre-compute expensive PromQL expressions as recording rules. Dashboards that query recording rules load significantly faster than those computing aggregations on the fly.
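  • Test rules before deploying — Use promtool check rules to validate rule syntax locally before applying to the cluster. promtool expects a plain Prometheus rules file, so run it against the groups section extracted from the PrometheusRule's spec rather than against the full manifest.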
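Beyond syntax checks, promtool test rules can unit-test an alert against synthetic series. The sketch below assumes you have extracted the groups from the PrometheusRule's spec into a file named my-web-app-rules.yaml (a hypothetical name). It feeds in 10 failing and 90 successful requests per minute, which is a 10% error rate. It then expects MyWebAppHighErrorRate to be firing once its 5-minute for period has elapsed. Run it with promtool test rules my-web-app-rules-test.yaml.

my-web-app-rules-test.yaml (hypothetical)
rule_files:
  - my-web-app-rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # 10 server errors per minute
      - series: 'http_requests_total{job="my-web-app",status="500"}'
        values: '0+10x20'
      # 90 successful requests per minute
      - series: 'http_requests_total{job="my-web-app",status="200"}'
        values: '0+90x20'
    alert_rule_test:
      # By 15m the 10% error rate has held well past the 5m "for" window
      - eval_time: 15m
        alertname: MyWebAppHighErrorRate
        exp_alerts:
          - exp_labels:
              severity: critical
              team: backend
            exp_annotations:
              summary: "High error rate on my-web-app"
              description: "Error rate is 10% (threshold: 5%)"
              runbook_url: "https://wiki.internal/runbooks/my-web-app-errors"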

Conclusion

The Prometheus Operator's ServiceMonitor and PodMonitor CRDs are the standard way to extend Prometheus monitoring to your own applications in Kubernetes. They replace manual scrape configuration with a declarative, Kubernetes-native approach that integrates seamlessly with the kube-prometheus-stack.

The key principles to remember are: always check the three layers of label matching (Prometheus to ServiceMonitor, ServiceMonitor to Service, port name matching); use namespaceSelector to enable cross-namespace monitoring; choose scrape intervals that balance data granularity against resource consumption; and always pair your ServiceMonitors with PrometheusRule definitions to create a complete monitoring pipeline from metrics collection through alerting.

Once metrics are flowing, build custom Grafana dashboards to visualize your application-specific data. For fine-grained control over scrape behavior, resource limits, and storage, review the values.yaml configuration guide. For clusters that need metric retention beyond what local Prometheus storage provides, integrate Thanos for long-term historical data.

When something is not working, debug systematically from the operator logs through the label selectors to the metrics endpoint itself. The most common issues are label mismatches that silently prevent the operator from picking up your ServiceMonitor — a problem that is easy to fix once you know where to look.
