Configure Alertmanager
Alertmanager is a Prometheus component that ships with StreamNative Platform. It handles alerts sent by StreamNative Platform components, such as the Prometheus server. It deduplicates and groups alerts and routes them to the correct receiver integration, such as email, PagerDuty, or OpsGenie. It also handles silencing and inhibition of alerts.
By default, Alertmanager is enabled with StreamNative Platform. To disable it, set monitoring.alert_manager: false in the Pulsar cluster configuration YAML file.
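As a minimal sketch, the dotted path monitoring.alert_manager corresponds to the following nesting, assuming the cluster configuration file follows the usual Helm values layout:

```yaml
# Disable Alertmanager for this cluster.
# Assumes the dotted path maps to nested YAML keys, as in a Helm values file.
monitoring:
  alert_manager: false
```

Setting the value back to true, or removing the key, should restore the default behavior of running Alertmanager.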
Configure CPU and memory resources
You can configure the requested CPU and memory, the resolve timeout, and the alert rules for Alertmanager in the Pulsar cluster configuration YAML file. Then, run the helm upgrade command to apply the changes to StreamNative Platform.
alert_manager:
  resources:
    requests:
      memory:
      cpu:
  config:
    global:
      resolve_timeout:
  rules:
    groups:
      - name:
        rules:
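For illustration, a filled-in version of this skeleton might look like the following. The resource values and the resolve timeout are assumptions chosen for the example, not recommended or default settings. The single rule is borrowed from the ZooKeeper example in this document.

```yaml
alert_manager:
  resources:
    requests:
      memory: 256Mi        # illustrative value; Kubernetes memory quantity
      cpu: 100m            # illustrative value; 100 millicores = 0.1 CPU
  config:
    global:
      resolve_timeout: 5m  # illustrative value; how long to wait before marking an alert resolved
  rules:
    groups:
      - name: zookeeper
        rules:
          - alert: HighConnections
            expr: zookeeper_server_connections{job="zookeeper"} > 10000
            for: 5m
            labels:
              severity: page
```

Size the requests for your own cluster. A resolve timeout that is too short can cause alerts to flap between firing and resolved.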
Configure alerting rules
Alerting rules allow you to define alert conditions based on Prometheus expression language expressions and to send notifications about firing alerts to an external service. Whenever the alert expression returns one or more vector elements at a given point in time, the alert counts as active for the label sets of those elements.
For more information about alert rules, see the Prometheus documentation on alerting rules.
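To make the parts of a rule concrete, here is a minimal annotated sketch. The group name, alert name, and threshold are hypothetical. The up metric is the standard series Prometheus records for every scrape target.

```yaml
- name: example              # hypothetical group name
  rules:
    - alert: TargetDown      # hypothetical alert name
      # The alert becomes active for each label set this expression returns.
      expr: up{job="zookeeper"} == 0
      # The condition must hold this long before the alert moves from pending to firing.
      for: 5m
      # Extra labels attached to the alert; Alertmanager can route on them.
      labels:
        severity: page
      # Human-readable text; templates can reference $labels and $value.
      annotations:
        summary: "Target {{ $labels.instance }} is down."
```

Without the for clause, the alert fires on the first evaluation in which the expression returns a result.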
This example shows how to configure alert rules for ZooKeeper.
- name: zookeeper
  rules:
    - alert: HighWatchers
      expr: zookeeper_server_watches_count{job="zookeeper"} > 1000000
      for: 5m
      labels:
        status: warning
      annotations:
        summary: "Watcher count on ZooKeeper server exceeds 1000k."
        description: "Watcher count on ZooKeeper server {{ $labels.kubernetes_pod_name }} exceeds 1000k, current value is {{ $value }}."
    - alert: HighEphemerals
      expr: zookeeper_server_ephemerals_count{job="zookeeper"} > 10000
      for: 5m
      labels:
        status: warning
      annotations:
        summary: "Ephemeral node count on ZooKeeper server exceeds 10k."
        description: "Ephemeral node count on ZooKeeper server {{ $labels.kubernetes_pod_name }} exceeds 10k, current value is {{ $value }}."
    - alert: HighConnections
      expr: zookeeper_server_connections{job="zookeeper"} > 10000
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Connection count on ZooKeeper server exceeds 10k."
        description: "Connection count on ZooKeeper server {{ $labels.kubernetes_pod_name }} exceeds 10k, current value is {{ $value }}."
    - alert: HighDataSize
      expr: zookeeper_server_data_size_bytes{job="zookeeper"} > 2147483648
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Data size of ZooKeeper server exceeds 2GB."
        description: "Data size of ZooKeeper server {{ $labels.instance }} exceeds 2GB, current value is {{ $value }}."
    - alert: HighRequestThroughput
      expr: sum(irate(zookeeper_server_requests{job="zookeeper"}[30s])) by (type) > 1000
      for: 5m
      labels:
        status: warning
      annotations:
        summary: "Request throughput on ZooKeeper server has been over 1000 for 5m."
        description: "Request throughput of {{ $labels.type }} on ZooKeeper server {{ $labels.instance }} exceeds 1k, current value is {{ $value }}."
    - alert: HighRequestLatency
      expr: zookeeper_server_requests_latency_ms{job="zookeeper", quantile="0.99"} > 100
      for: 5m
      labels:
        severity: page
      annotations:
        summary: "Request latency on ZooKeeper server exceeds 100ms."
        description: "p99 latency of {{ $labels.type }} requests on ZooKeeper server {{ $labels.instance }} exceeds 100ms, current value is {{ $value }} ms."
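Several of the rules above set the label severity: page. As a rough sketch of how such a label could drive routing, the following uses standard Alertmanager configuration keys (route, receivers, email_configs). It assumes that the alert_manager.config block is passed through to Alertmanager unchanged, which this document does not confirm. The receiver names, addresses, and SMTP host are placeholders, and the matchers syntax requires Alertmanager 0.22 or later.

```yaml
alert_manager:
  config:
    global:
      resolve_timeout: 5m
      # Email receivers need SMTP settings; these values are placeholders.
      smtp_smarthost: "smtp.example.com:587"
      smtp_from: "alertmanager@example.com"
    route:
      receiver: ops-email          # default receiver for alerts that match no child route
      group_by: ["alertname"]      # alerts with the same name are batched into one notification
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      routes:
        - matchers:
            - 'severity="page"'    # matches the label set by the rules above
          receiver: oncall-email
    receivers:
      - name: ops-email            # hypothetical receiver
        email_configs:
          - to: "ops@example.com"
      - name: oncall-email         # hypothetical receiver
        email_configs:
          - to: "oncall@example.com"
```

Rules labeled status: warning do not match the child route in this sketch, so they fall through to the default receiver.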