Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.streamnative.io/llms.txt

Use this file to discover all available pages before exploring further.

Prerequisites

Before you can visualize Lakehouse metrics in your own Grafana instance, enable Metrics Remote Write on your Cloud Environment to forward StreamNative Cloud metrics to your Prometheus-compatible monitoring system or Datadog.

Grafana Dashboard

A pre-built Grafana dashboard is available as CompactionScheduler.json in the apache-pulsar-grafana-dashboard repository. Import it into your Grafana instance for comprehensive monitoring.

How to Import

  1. Download CompactionScheduler.json from the repository.
  2. Open Grafana -> Dashboards -> Import.
  3. Upload CompactionScheduler.json or paste the JSON content.
  4. Select your Prometheus data source.
  5. Click Import.

Dashboard Overview

The dashboard is organized into the following sections:
SectionDescription
OverviewTopic count, task count, publish/compact/commit failed tasks, commit batch size
Compaction WriteCompaction lag, task publish lag, task stats, non-committable tasks, throughput (bytes/messages), latencies for compaction duration, WAL read, Parquet write, task commit, lakehouse commit, end-to-end pipeline
Persistent APIRead throughput, read latencies (index+data, message, Oxia index, Oxia metadata)
WALRead cache eviction/loading rate, WAL read latency, S3 cache loading latency
S3S3 read throughput, request rate, S3 read latency
Compaction ReadLakehouse read bytes/messages, read latency
Compaction Write DetailsLakehouse write/encode/before-write/write-record latencies, Parquet write-record/write-metadata latencies
DLQ TasksDead Letter Queue task statistics

Key Alerts

These metrics should be monitored with alerting rules:
MetricAlert ConditionSeverity
pulsar_storage_compact_lagCompaction lag exceeds threshold per topicWarning
compaction_cluster_leaders_ratioSum across cluster is not exactly 1Critical
pulsar_storage_compact_quarantined_topics_countGreater than 0Warning
pulsar_storage_compact_topics_in_dlqGreater than 0Critical
pulsar_storage_compact_tasks_in_dlqGreater than 0Critical
pulsar_storage_compact_publish_task_failed_count_totalIncreasingWarning
pulsar_storage_compact_failed_task_count_totalIncreasingWarning
pulsar_storage_compact_task_commit_duration_seconds_count{pulsar_response_status="failed"}IncreasingCritical
pulsar_subscription_back_logBacklog exceeds thresholdWarning

Compaction Service Metrics

The compaction service has three stages: task publishing (leader), WAL-to-Parquet conversion (worker), and commit to lakehouse (leader).

Task Lifecycle

MetricTypeDescription
pulsar_storage_compact_ongoing_topic_countGaugeNumber of topics currently undergoing compaction
pulsar_storage_compact_ongoing_task_countGaugeNumber of active compaction tasks in progress
pulsar_storage_compact_tasks_in_init_stateGaugeTasks in initialization state
pulsar_storage_compact_tasks_in_compacted_stateGaugeTasks in compacted state
pulsar_storage_compact_tasks_in_prepared_commit_stateGaugeTasks in prepared commit state
pulsar_storage_compact_tasks_in_committed_stateGaugeTasks in committed state

Throughput

MetricTypeDescription
pulsar_storage_compact_bytes_totalCounterTotal bytes processed during compaction
pulsar_storage_compact_messages_totalCounterTotal messages processed during compaction
pulsar_storage_compact_published_task_bytesGaugeSize in bytes of messages batched in one compaction task
pulsar_storage_compact_committed_parquet_file_bytesGaugeSize in bytes of committed Parquet files
pulsar_storage_compact_commit_task_batch_sizeGaugeNumber of Parquet files in a single commit batch

Offset Tracking

MetricTypeDescription
pulsar_storage_compact_latest_message_offsetGaugeLatest message offset for each topic
pulsar_storage_compact_latest_published_offsetGaugeLatest published task’s message offset
pulsar_storage_compact_last_compacted_offsetGaugeLatest offset confirmed as fully committed to lakehouse
pulsar_storage_compact_lagGaugeDifference between latest message offset and last compacted offset

Latency

MetricTypeDescription
pulsar_storage_compact_duration_seconds_bucketHistogramTotal latency of a compaction task
pulsar_storage_compact_read_messages_duration_seconds_bucketHistogramLatency for reading messages from WAL files
pulsar_storage_compact_write_messages_duration_seconds_bucketHistogramLatency for decoding, converting, and writing to Parquet
pulsar_storage_compact_task_commit_duration_seconds_bucketHistogramLatency for committing a task (includes Oxia index + catalog snapshot)
pulsar_storage_compact_commit_to_lakehouse_duration_seconds_bucketHistogramLatency for committing snapshot to catalog service only
pulsar_storage_compact_message_from_ursa_to_parquet_duration_seconds_bucketHistogramEnd-to-end latency: message write to Parquet file write
pulsar_storage_compact_message_end_to_end_duration_seconds_bucketHistogramEnd-to-end latency: message write to lakehouse commit

Failures

MetricTypeDescription
pulsar_storage_compact_publish_task_failed_count_totalCounterTotal failed task publications
pulsar_storage_compact_failed_task_count_totalCounterTotal failed WAL-to-Parquet conversions
pulsar_storage_compact_quarantined_topics_countGaugeTopics quarantined due to compaction failures
pulsar_storage_compact_topics_in_dlqGaugeTopics in Dead Letter Queue
pulsar_storage_compact_tasks_in_dlqGaugeTasks in Dead Letter Queue
pulsar_storage_compact_non_committable_task_countCounterNon-committable tasks exceeding threshold
pulsar_storage_compact_non_committable_task_histogram_bytes_bucketHistogramSize distribution of non-committable tasks

WAL Storage Metrics

MetricTypeDescription
pulsar_storage_wal_putEntry_count_totalCounterTotal entries written to WAL
pulsar_storage_wal_putEntry_rejected_count_totalCounterTotal entries rejected during WAL write
pulsar_storage_wal_putEntry_duration_seconds_bucketHistogramWAL write latency
pulsar_storage_wal_putEntry_pending_duration_seconds_bucketHistogramTime entries wait in WAL buffer
pulsar_storage_wal_putEntry_cache_duration_seconds_bucketHistogramWrite cache write latency
pulsar_storage_wal_getEntries_duration_seconds_bucketHistogramBatch read latency (cache or backend)
pulsar_storage_wal_getEntry_duration_seconds_bucketHistogramSingle entry read latency
pulsar_storage_wal_writeCache_flush_duration_seconds_bucketHistogramWrite cache flush latency
pulsar_storage_wal_readCache_loading_count_totalCounterRead cache loads from backend
pulsar_storage_wal_readCache_eviction_count_totalCounterRead cache evictions
pulsar_storage_wal_readCache_loading_duration_seconds_bucketHistogramCache loading latency
pulsar_storage_wal_read_cache_missed_totalCounterRead cache misses
pulsar_storage_wal_putEntry_pending_countGaugeEntries queued in WAL pending buffer
pulsar_storage_wal_writeCache_flushCallback_pending_countGaugePending flush acknowledgments
pulsar_storage_wal_readCache_size_bytesGaugeCurrent read cache size

Write Cache Metrics

MetricTypeDescription
pulsar_storage_wal_writeCache_used_bytesGaugeWrite cache utilization
pulsar_storage_wal_writeCache_bufferSegment_usedGaugeBuffer segments in use
pulsar_storage_wal_writeCache_cacheSegment_usedGaugeCache segments in use
pulsar_storage_wal_writeCache_segment_countGaugeTotal allocated segments
pulsar_storage_wal_writeCache_capacity_bytesGaugeMax capacity per segment

File Storage Metrics

MetricTypeDescription
pulsar_storage_backend_storage_request_totalCounterTotal backend storage operations
pulsar_storage_backend_write_duration_seconds_bucketHistogramBackend write latency
pulsar_storage_backend_read_duration_seconds_bucketHistogramBackend read latency
pulsar_storage_backend_metadata_read_duration_seconds_bucketHistogramMetadata read latency
pulsar_storage_backend_crc_duration_seconds_bucketHistogramCRC calculation latency
pulsar_storage_backend_delete_duration_seconds_bucketHistogramObject deletion latency
pulsar_storage_backend_write_bytes_count_bytes_totalCounterTotal bytes written to backend
pulsar_storage_backend_read_bytes_count_bytes_totalCounterTotal bytes read from backend

Lakehouse Read Metrics

MetricTypeDescription
pulsar_storage_lakehouse_read_messages_totalCounterTotal messages read from lakehouse (Parquet files)
pulsar_storage_lakehouse_read_bytes_bytes_totalCounterTotal bytes read from lakehouse
pulsar_storage_lakehouse_read_request_totalCounterTotal read requests processed
pulsar_storage_lakehouse_read_cache_hit_totalCounterParquet prefetch cache hits
pulsar_storage_lakehouse_read_cache_miss_totalCounterParquet prefetch cache misses
pulsar_storage_lakehouse_read_latency_seconds_bucketHistogramRead latency
pulsar_storage_lakehouse_read_request_queued_latency_seconds_bucketHistogramQueue wait time before processing

Lakehouse Writer Metrics

MetricTypeDescription
pulsar_storage_lakehouse_writer_before_write_durationHistogramPre-write operation latency
pulsar_storage_lakehouse_writer_write_all_durationHistogramBatch write latency
pulsar_storage_lakehouse_writer_write_record_durationHistogramIndividual record write latency
pulsar_storage_lakehouse_writer_encode_durationHistogramRecord encoding latency

Lakehouse Reader Metrics

MetricTypeDescription
pulsar_storage_lakehouse_reader_seek_durationHistogramSeek operation latency
pulsar_storage_lakehouse_reader_read_all_durationHistogramBatch read latency
pulsar_storage_lakehouse_reader_read_record_durationHistogramIndividual record read latency
pulsar_storage_lakehouse_reader_decode_durationHistogramRecord decoding latency

Parquet File Metrics

Writer

MetricTypeDescription
pulsar_storage_lakehouse_parquet_write_record_durationHistogramParquet record write latency
pulsar_storage_lakehouse_parquet_write_metadata_durationHistogramParquet metadata write latency

Reader

MetricTypeDescription
pulsar_storage_lakehouse_parquet_read_record_durationHistogramParquet record read latency
pulsar_storage_lakehouse_parquet_read_metadata_durationHistogramParquet metadata read latency
pulsar_storage_lakehouse_parquet_seek_by_offset_durationHistogramSeek by offset latency
pulsar_storage_lakehouse_parquet_seek_by_secondary_index_durationHistogramSeek by secondary index latency