Grafana Dashboard
A pre-built Grafana dashboard is available as CompactionScheduler.json in the apache-pulsar-grafana-dashboard repository. Import it into your Grafana instance for comprehensive monitoring.
How to Import
- Download CompactionScheduler.json from the repository.
- Open Grafana -> Dashboards -> Import.
- Upload CompactionScheduler.json or paste the JSON content.
- Select your Prometheus data source.
- Click Import.
Dashboard Overview
The dashboard is organized into the following sections:
| Section | Description |
|---|---|
| Overview | Topic count, task count, publish/compact/commit failed tasks, commit batch size |
| Compaction Write | Compaction lag, task publish lag, task stats, non-committable tasks, throughput (bytes/messages), latencies for compaction duration, WAL read, Parquet write, task commit, lakehouse commit, end-to-end pipeline |
| Persistent API | Read throughput, read latencies (index+data, message, Oxia index, Oxia metadata) |
| WAL | Read cache eviction/loading rate, WAL read latency, S3 cache loading latency |
| S3 | S3 read throughput, request rate, S3 read latency |
| Compaction Read | Lakehouse read bytes/messages, read latency |
| Compaction Write Details | Lakehouse write/encode/before-write/write-record latencies, Parquet write-record/write-metadata latencies |
| DLQ Tasks | Dead Letter Queue task statistics |
Key Alerts
These metrics should be monitored with alerting rules:
| Metric | Alert Condition | Severity |
|---|---|---|
| pulsar_storage_compact_lag | Compaction lag exceeds threshold per topic | Warning |
| compaction_cluster_leaders_ratio | Sum across cluster is not exactly 1 | Critical |
| pulsar_storage_compact_quarantined_topics_count | Greater than 0 | Warning |
| pulsar_storage_compact_topics_in_dlq | Greater than 0 | Critical |
| pulsar_storage_compact_tasks_in_dlq | Greater than 0 | Critical |
| pulsar_storage_compact_publish_task_failed_count_total | Increasing | Warning |
| pulsar_storage_compact_failed_task_count_total | Increasing | Warning |
| pulsar_storage_compact_task_commit_duration_seconds_count{pulsar_response_status="failed"} | Increasing | Critical |
| pulsar_subscription_back_log | Backlog exceeds threshold | Warning |
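The conditions in the table above can be sketched as PromQL expressions. The helper below is only an illustration: the alert names, the default thresholds, and the 5-minute window are assumptions that do not come from this page, and "increasing" is interpreted here as `increase(...) > 0` over that window. Tune all of them for your deployment.

```python
def build_alert_expressions(lag_threshold=100_000, backlog_threshold=100_000, window="5m"):
    """Return {alert_name: (promql_expression, severity)} for the Key Alerts table.

    Alert names, thresholds, and the window are hypothetical defaults.
    """
    def increasing(selector):
        # A counter is "increasing" if it grew at all within the window.
        return f"increase({selector}[{window}]) > 0"

    commit_failed = (
        "pulsar_storage_compact_task_commit_duration_seconds_count"
        '{pulsar_response_status="failed"}'
    )
    return {
        "CompactionLagHigh": (f"pulsar_storage_compact_lag > {lag_threshold}", "warning"),
        "CompactionLeaderCountInvalid": ("sum(compaction_cluster_leaders_ratio) != 1", "critical"),
        "CompactionTopicsQuarantined": ("pulsar_storage_compact_quarantined_topics_count > 0", "warning"),
        "CompactionTopicsInDlq": ("pulsar_storage_compact_topics_in_dlq > 0", "critical"),
        "CompactionTasksInDlq": ("pulsar_storage_compact_tasks_in_dlq > 0", "critical"),
        "CompactionPublishTaskFailures": (
            increasing("pulsar_storage_compact_publish_task_failed_count_total"), "warning"),
        "CompactionTaskFailures": (
            increasing("pulsar_storage_compact_failed_task_count_total"), "warning"),
        "CompactionCommitFailures": (increasing(commit_failed), "critical"),
        "SubscriptionBacklogHigh": (f"pulsar_subscription_back_log > {backlog_threshold}", "warning"),
    }


if __name__ == "__main__":
    for name, (expr, severity) in build_alert_expressions().items():
        print(f"{name} [{severity}]: {expr}")
```

Each expression can be pasted into a Prometheus alerting rule or a Grafana alert query once the thresholds are chosen.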
Compaction Service Metrics
The compaction service has three stages: task publishing (leader), WAL-to-Parquet conversion (worker), and commit to lakehouse (leader).
Task Lifecycle
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_compact_ongoing_topic_count | Gauge | Number of topics currently undergoing compaction |
| pulsar_storage_compact_ongoing_task_count | Gauge | Number of active compaction tasks in progress |
| pulsar_storage_compact_tasks_in_init_state | Gauge | Tasks in initialization state |
| pulsar_storage_compact_tasks_in_compacted_state | Gauge | Tasks in compacted state |
| pulsar_storage_compact_tasks_in_prepared_commit_state | Gauge | Tasks in prepared commit state |
| pulsar_storage_compact_tasks_in_committed_state | Gauge | Tasks in committed state |
Throughput
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_compact_bytes_total | Counter | Total bytes processed during compaction |
| pulsar_storage_compact_messages_total | Counter | Total messages processed during compaction |
| pulsar_storage_compact_published_task_bytes | Gauge | Size in bytes of messages batched in one compaction task |
| pulsar_storage_compact_committed_parquet_file_bytes | Gauge | Size in bytes of committed Parquet files |
| pulsar_storage_compact_commit_task_batch_size | Gauge | Number of Parquet files in a single commit batch |
Offset Tracking
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_compact_latest_message_offset | Gauge | Latest message offset for each topic |
| pulsar_storage_compact_latest_published_offset | Gauge | Latest published task’s message offset |
| pulsar_storage_compact_last_compacted_offset | Gauge | Latest offset confirmed as fully committed to lakehouse |
| pulsar_storage_compact_lag | Gauge | Difference between latest message offset and last compacted offset |
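As a worked example of how these offsets relate, the sketch below recomputes the lag from the two offsets named in its description. Splitting the lag into an "unpublished" part and an "in-flight" part using the published offset is an assumption made here for illustration, not a metric the service is stated to export, and the function names are hypothetical.

```python
def compaction_lag(latest_message_offset: int, last_compacted_offset: int) -> int:
    """Lag as described for pulsar_storage_compact_lag."""
    return latest_message_offset - last_compacted_offset


def split_lag(latest_message_offset: int, latest_published_offset: int,
              last_compacted_offset: int) -> dict:
    """Hypothetical breakdown of the lag around the published offset."""
    return {
        # Messages written but not yet covered by a published task.
        "unpublished": latest_message_offset - latest_published_offset,
        # Messages covered by a published task but not yet committed.
        "in_flight": latest_published_offset - last_compacted_offset,
        "total": compaction_lag(latest_message_offset, last_compacted_offset),
    }


if __name__ == "__main__":
    print(split_lag(latest_message_offset=1500,
                    latest_published_offset=1200,
                    last_compacted_offset=1000))
```

With the sample offsets 1500, 1200, and 1000, the total lag is 500, of which 300 is not yet published and 200 is published but not yet committed.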
Latency
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_compact_duration_seconds_bucket | Histogram | Total latency of a compaction task |
| pulsar_storage_compact_read_messages_duration_seconds_bucket | Histogram | Latency for reading messages from WAL files |
| pulsar_storage_compact_write_messages_duration_seconds_bucket | Histogram | Latency for decoding, converting, and writing to Parquet |
| pulsar_storage_compact_task_commit_duration_seconds_bucket | Histogram | Latency for committing a task (includes Oxia index + catalog snapshot) |
| pulsar_storage_compact_commit_to_lakehouse_duration_seconds_bucket | Histogram | Latency for committing snapshot to catalog service only |
| pulsar_storage_compact_message_from_ursa_to_parquet_duration_seconds_bucket | Histogram | End-to-end latency: message write to Parquet file write |
| pulsar_storage_compact_message_end_to_end_duration_seconds_bucket | Histogram | End-to-end latency: message write to lakehouse commit |
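The `_bucket` series above are cumulative histogram buckets, so percentiles have to be estimated rather than read directly. The sketch below shows the usual estimation approach, linear interpolation inside the bucket that contains the target rank, which is the same idea PromQL's `histogram_quantile` applies. The bucket boundaries and counts in the example are made-up sample data, not values from the service.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must include the +Inf bucket. Values inside a bucket are
    assumed to be uniformly distributed, and the lowest bucket is assumed
    to start at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite bound, since no upper limit is known.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Sample data: cumulative counts per `le` bound, in seconds.
    sample = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
    print("p50:", histogram_quantile(0.5, sample))
    print("p95:", histogram_quantile(0.95, sample))
```

Because the estimate depends on bucket boundaries, a percentile that lands in the `+Inf` bucket only tells you the latency exceeded the largest finite bound.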
Failures
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_compact_publish_task_failed_count_total | Counter | Total failed task publications |
| pulsar_storage_compact_failed_task_count_total | Counter | Total failed WAL-to-Parquet conversions |
| pulsar_storage_compact_quarantined_topics_count | Gauge | Topics quarantined due to compaction failures |
| pulsar_storage_compact_topics_in_dlq | Gauge | Topics in Dead Letter Queue |
| pulsar_storage_compact_tasks_in_dlq | Gauge | Tasks in Dead Letter Queue |
| pulsar_storage_compact_non_committable_task_count | Counter | Non-committable tasks exceeding threshold |
| pulsar_storage_compact_non_committable_task_histogram_bytes_bucket | Histogram | Size distribution of non-committable tasks |
WAL Storage Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_wal_putEntry_count_total | Counter | Total entries written to WAL |
| pulsar_storage_wal_putEntry_rejected_count_total | Counter | Total entries rejected during WAL write |
| pulsar_storage_wal_putEntry_duration_seconds_bucket | Histogram | WAL write latency |
| pulsar_storage_wal_putEntry_pending_duration_seconds_bucket | Histogram | Time entries wait in WAL buffer |
| pulsar_storage_wal_putEntry_cache_duration_seconds_bucket | Histogram | Write cache write latency |
| pulsar_storage_wal_getEntries_duration_seconds_bucket | Histogram | Batch read latency (cache or backend) |
| pulsar_storage_wal_getEntry_duration_seconds_bucket | Histogram | Single entry read latency |
| pulsar_storage_wal_writeCache_flush_duration_seconds_bucket | Histogram | Write cache flush latency |
| pulsar_storage_wal_readCache_loading_count_total | Counter | Read cache loads from backend |
| pulsar_storage_wal_readCache_eviction_count_total | Counter | Read cache evictions |
| pulsar_storage_wal_readCache_loading_duration_seconds_bucket | Histogram | Cache loading latency |
| pulsar_storage_wal_read_cache_missed_total | Counter | Read cache misses |
| pulsar_storage_wal_putEntry_pending_count | Gauge | Entries queued in WAL pending buffer |
| pulsar_storage_wal_writeCache_flushCallback_pending_count | Gauge | Pending flush acknowledgments |
| pulsar_storage_wal_readCache_size_bytes | Gauge | Current read cache size |
Write Cache Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_wal_writeCache_used_bytes | Gauge | Write cache utilization |
| pulsar_storage_wal_writeCache_bufferSegment_used | Gauge | Buffer segments in use |
| pulsar_storage_wal_writeCache_cacheSegment_used | Gauge | Cache segments in use |
| pulsar_storage_wal_writeCache_segment_count | Gauge | Total allocated segments |
| pulsar_storage_wal_writeCache_capacity_bytes | Gauge | Max capacity per segment |
File Storage Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_backend_storage_request_total | Counter | Total backend storage operations |
| pulsar_storage_backend_write_duration_seconds_bucket | Histogram | Backend write latency |
| pulsar_storage_backend_read_duration_seconds_bucket | Histogram | Backend read latency |
| pulsar_storage_backend_metadata_read_duration_seconds_bucket | Histogram | Metadata read latency |
| pulsar_storage_backend_crc_duration_seconds_bucket | Histogram | CRC calculation latency |
| pulsar_storage_backend_delete_duration_seconds_bucket | Histogram | Object deletion latency |
| pulsar_storage_backend_write_bytes_count_bytes_total | Counter | Total bytes written to backend |
| pulsar_storage_backend_read_bytes_count_bytes_total | Counter | Total bytes read from backend |
Lakehouse Read Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_lakehouse_read_messages_total | Counter | Total messages read from lakehouse (Parquet files) |
| pulsar_storage_lakehouse_read_bytes_bytes_total | Counter | Total bytes read from lakehouse |
| pulsar_storage_lakehouse_read_request_total | Counter | Total read requests processed |
| pulsar_storage_lakehouse_read_cache_hit_total | Counter | Parquet prefetch cache hits |
| pulsar_storage_lakehouse_read_cache_miss_total | Counter | Parquet prefetch cache misses |
| pulsar_storage_lakehouse_read_latency_seconds_bucket | Histogram | Read latency |
| pulsar_storage_lakehouse_read_request_queued_latency_seconds_bucket | Histogram | Queue wait time before processing |
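The hit and miss counters above are most useful when combined into a hit ratio. A minimal sketch follows, assuming you have already taken the increase of each counter over the same time window (for example from two scrapes); the function name and sample numbers are illustrative only.

```python
from typing import Optional


def prefetch_cache_hit_ratio(hits_delta: float, misses_delta: float) -> Optional[float]:
    """Hit ratio from counter increases over one window.

    hits_delta:   increase of pulsar_storage_lakehouse_read_cache_hit_total
    misses_delta: increase of pulsar_storage_lakehouse_read_cache_miss_total
    Returns None when there were no lookups in the window.
    """
    lookups = hits_delta + misses_delta
    if lookups == 0:
        return None
    return hits_delta / lookups


if __name__ == "__main__":
    # Sample data: 900 hits and 100 misses in the window.
    print(prefetch_cache_hit_ratio(900, 100))
```

Returning `None` for an idle window avoids reporting a misleading 0% ratio when no reads happened.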
Lakehouse Writer Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_lakehouse_writer_before_write_duration | Histogram | Pre-write operation latency |
| pulsar_storage_lakehouse_writer_write_all_duration | Histogram | Batch write latency |
| pulsar_storage_lakehouse_writer_write_record_duration | Histogram | Individual record write latency |
| pulsar_storage_lakehouse_writer_encode_duration | Histogram | Record encoding latency |
Lakehouse Reader Metrics
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_lakehouse_reader_seek_duration | Histogram | Seek operation latency |
| pulsar_storage_lakehouse_reader_read_all_duration | Histogram | Batch read latency |
| pulsar_storage_lakehouse_reader_read_record_duration | Histogram | Individual record read latency |
| pulsar_storage_lakehouse_reader_decode_duration | Histogram | Record decoding latency |
Parquet File Metrics
Writer
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_lakehouse_parquet_write_record_duration | Histogram | Parquet record write latency |
| pulsar_storage_lakehouse_parquet_write_metadata_duration | Histogram | Parquet metadata write latency |
Reader
| Metric | Type | Description |
|---|---|---|
| pulsar_storage_lakehouse_parquet_read_record_duration | Histogram | Parquet record read latency |
| pulsar_storage_lakehouse_parquet_read_metadata_duration | Histogram | Parquet metadata read latency |
| pulsar_storage_lakehouse_parquet_seek_by_offset_duration | Histogram | Seek by offset latency |
| pulsar_storage_lakehouse_parquet_seek_by_secondary_index_duration | Histogram | Seek by secondary index latency |