Administration

Monitoring, backup and restore, retention, cluster operations, and background services.

Monitoring

Prometheus Metrics

HyperbyteDB exposes a Prometheus-compatible metrics endpoint at GET /metrics on the same port as the API (default 8086). There is no separate metrics port.

Key metrics:

| Metric | Type | Description |
|---|---|---|
| `hyperbytedb_write_requests_total` | counter | Total write requests received |
| `hyperbytedb_write_errors_total` | counter | Failed write requests |
| `hyperbytedb_write_payload_bytes` | histogram | Raw payload size in bytes |
| `hyperbytedb_write_duration_seconds` | histogram | Write handler latency |
| `hyperbytedb_query_requests_total` | counter | Total query requests received |
| `hyperbytedb_query_errors_total` | counter | Failed queries |
| `hyperbytedb_query_duration_seconds` | histogram | Query execution latency |
| `hyperbytedb_ingestion_points_total` | counter | Total points ingested |
| `hyperbytedb_flush_runs_total` | counter | Flush cycles completed |
| `hyperbytedb_flush_errors_total` | counter | Failed flush cycles |
| `hyperbytedb_flush_points_total` | counter | Points flushed to chDB |
| `hyperbytedb_flush_duration_seconds` | histogram | Flush cycle duration |
| `hyperbytedb_wal_last_sequence` | gauge | Last flushed WAL sequence |
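For a quick check without a full Prometheus stack, the sketch below parses the plain-text output of `GET /metrics` and derives a write error ratio and a mean write latency. It assumes the histograms follow the standard Prometheus naming convention (`_bucket`, `_sum` and `_count` series); the sample values are invented for illustration.

```python
# Hypothetical scrape output: metric names come from the table above,
# the values are made up for illustration.
SAMPLE = """\
# HELP hyperbytedb_write_requests_total Total write requests received
# TYPE hyperbytedb_write_requests_total counter
hyperbytedb_write_requests_total 200
hyperbytedb_write_errors_total 5
hyperbytedb_write_duration_seconds_bucket{le="0.1"} 190
hyperbytedb_write_duration_seconds_sum 4.0
hyperbytedb_write_duration_seconds_count 200
"""


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Collect unlabelled samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # Skip labelled samples such as histogram buckets; this sketch only
        # needs plain counters and the histogram _sum/_count series.
        if "{" in line:
            continue
        # A sample is "name value [timestamp]"; the timestamp is optional.
        name, value, *_timestamp = line.split()
        samples[name] = float(value)
    return samples


def write_health(samples: dict[str, float]) -> tuple[float, float]:
    """Return (write error ratio, mean write latency in seconds)."""
    requests = samples.get("hyperbytedb_write_requests_total", 0.0)
    errors = samples.get("hyperbytedb_write_errors_total", 0.0)
    latency_sum = samples.get("hyperbytedb_write_duration_seconds_sum", 0.0)
    latency_count = samples.get("hyperbytedb_write_duration_seconds_count", 0.0)
    error_ratio = errors / requests if requests else 0.0
    mean_latency = latency_sum / latency_count if latency_count else 0.0
    return error_ratio, mean_latency


print(write_health(parse_prometheus_text(SAMPLE)))
```

Counters are cumulative since the process started, so these are lifetime figures. In Prometheus itself, `rate(hyperbytedb_write_errors_total[5m]) / rate(hyperbytedb_write_requests_total[5m])` gives the current error ratio instead.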

Cluster-specific metrics:

| Metric | Type | Description |
|---|---|---|
| `hyperbytedb_replication_writes_total` | counter | Write replication attempts |
| `hyperbytedb_replication_errors_total` | counter | Failed write replications |
| `hyperbytedb_replication_duration_seconds` | histogram | Replication latency |
| `hyperbytedb_cluster_node_state` | gauge | Node state (0=Joining through 5=Leaving) |
| `hyperbytedb_cluster_peers_active` | gauge | Number of active peers |
| `hyperbytedb_uptime_seconds` | gauge | Node uptime |

Prometheus scrape configuration

```yaml
scrape_configs:
  - job_name: 'hyperbytedb'
    static_configs:
      - targets: ['hyperbytedb:8086']
    metrics_path: /metrics
    scrape_interval: 15s
```

For clusters, scrape each node individually.
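Because each node serves its own `/metrics`, a cluster needs one target per node. This minimal sketch renders the scrape configuration above for a list of hosts; the hostnames `node1` to `node3` are hypothetical.

```python
def render_scrape_config(hosts: list[str], port: int = 8086) -> str:
    """Render a scrape_configs block with one target per HyperbyteDB node."""
    # Every node is listed explicitly: each one exposes only its own /metrics.
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return (
        "scrape_configs:\n"
        "  - job_name: 'hyperbytedb'\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
        "    metrics_path: /metrics\n"
        "    scrape_interval: 15s\n"
    )


# Hypothetical three-node cluster.
print(render_scrape_config(["node1", "node2", "node3"]))
```

For large or frequently changing memberships, Prometheus service discovery (for example `file_sd_configs` or `kubernetes_sd_configs`) avoids maintaining the target list by hand.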

Logging

Logs are written to stderr. Control verbosity with the `level` key in the [logging] config section:

| Level | Use case |
|---|---|
| `error` | Production: errors only |
| `warn` | Production: errors + warnings |
| `info` | Default: startup, shutdown, periodic summaries |
| `debug` | Development: query details, flush activity |
| `trace` | Deep debugging: all internal operations |

Set format = "json" for structured output compatible with log aggregation (Loki, Elasticsearch, and similar).

Environment variable equivalents:

```bash
HYPERBYTEDB__LOGGING__LEVEL=info
HYPERBYTEDB__LOGGING__FORMAT=json
```
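Judging from the two examples above, an environment variable name is the `HYPERBYTEDB` prefix followed by the upper-cased config path, with segments joined by a double underscore. The helper below encodes that pattern; that keys outside `[logging]` follow it is an assumption, so confirm against Configuration.

```python
def env_var_for(*config_path: str, prefix: str = "HYPERBYTEDB") -> str:
    """Map a config key path to its environment variable name.

    Pattern inferred from the documented logging examples: the prefix and
    each upper-cased path segment, joined by "__". Other sections are
    ASSUMED, not confirmed, to follow the same rule.
    """
    return "__".join([prefix, *(part.upper() for part in config_path)])


print(env_var_for("logging", "level"))        # HYPERBYTEDB__LOGGING__LEVEL
print(env_var_for("flush", "interval_secs"))  # HYPERBYTEDB__FLUSH__INTERVAL_SECS (assumed)
```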

Statement summary

When statement_summary.enabled = true, recently executed TimeseriesQL statements are available at GET /api/v1/statements. Each entry includes the normalized query text, digest, execution time, and error status.
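Since entries carry a digest of the normalized text, queries that differ only in literal values typically share a digest, which makes it a natural grouping key. The sketch below totals execution time and errors per digest. The JSON key names (`digest`, `query`, `exec_seconds`, `error`) and the top-level array shape are hypothetical; the text lists what an entry contains, not its exact fields.

```python
import json
import urllib.request
from collections import defaultdict

# HYPOTHETICAL JSON keys: the text lists what each entry contains, not the
# field names or response shape. Inspect a real response and adjust.
DIGEST, QUERY, EXEC_SECONDS, ERROR = "digest", "query", "exec_seconds", "error"


def fetch_statements(base_url: str) -> list[dict]:
    """GET /api/v1/statements; ASSUMES the body is a JSON array of entries."""
    with urllib.request.urlopen(f"{base_url}/api/v1/statements", timeout=5) as resp:
        return json.load(resp)


def cost_by_digest(entries: list[dict]) -> list[tuple[str, float, int]]:
    """Total execution time and error count per digest, most expensive first."""
    seconds: dict[str, float] = defaultdict(float)
    errors: dict[str, int] = defaultdict(int)
    for entry in entries:
        seconds[entry[DIGEST]] += entry[EXEC_SECONDS]
        errors[entry[DIGEST]] += 1 if entry.get(ERROR) else 0
    rows = [(digest, total, errors[digest]) for digest, total in seconds.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up entries standing in for a real response.
sample = [
    {DIGEST: "a1", QUERY: "SELECT mean(value) FROM cpu WHERE time > ?", EXEC_SECONDS: 0.40, ERROR: False},
    {DIGEST: "b2", QUERY: "SELECT * FROM mem", EXEC_SECONDS: 0.05, ERROR: True},
    {DIGEST: "a1", QUERY: "SELECT mean(value) FROM cpu WHERE time > ?", EXEC_SECONDS: 0.35, ERROR: False},
]
for digest, total, failed in cost_by_digest(sample):
    print(digest, round(total, 3), failed)
```

Against a live node, `cost_by_digest(fetch_statements("http://hyperbytedb:8086"))` would produce the same report from real data once the key names are adjusted.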

Health endpoint

GET /health returns:

```json
{"status": "pass", "message": "ready for queries and writes"}
```

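Always returns 200 as long as the HTTP server is running, so it is a liveness check rather than a readiness check. In cluster mode, a node in Draining or Leaving state still responds to /health but rejects writes; check the node state via GET /cluster/metrics instead.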
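A minimal liveness probe using only the standard library. It treats any connection failure, timeout, or non-200 answer as "not alive". Because a Draining or Leaving node still answers 200, a `True` here does not mean the node accepts writes.

```python
import urllib.request


def is_alive(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts all subclass OSError.
        return False


print(is_alive("http://hyperbytedb:8086"))
```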

Backup and Restore

Create a backup

```bash
hyperbytedb backup --output /backups/hyperbytedb-$(date +%Y%m%d)
```

The backup directory contains:

| Directory | Contents |
|---|---|
| `wal/` | RocksDB checkpoint of the WAL |
| `meta/` | RocksDB checkpoint of metadata |
| `data/` | Copy of the chDB session data directory (`chdb.session_data_path`) |
| `manifest.json` | Timestamp, WAL sequence, engine data paths |

Backups can run while HyperbyteDB is serving traffic. RocksDB checkpoints are consistent point-in-time snapshots. For off-node copies, use your operator backup CRD or object storage tooling.
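Before shipping a backup off-node, a cheap structural check can catch a truncated copy. This sketch only verifies that the entries listed in the table exist and that `manifest.json` parses as JSON; it does not validate the RocksDB checkpoints or chDB data, and it does not inspect manifest fields because their names are not documented here.

```python
import json
import tempfile
from pathlib import Path

# Layout from the table above: three directories plus the manifest file.
EXPECTED_DIRS = ("wal", "meta", "data")
MANIFEST = "manifest.json"


def backup_problems(backup_dir: str) -> list[str]:
    """List anything missing or malformed in a backup directory."""
    root = Path(backup_dir)
    problems = [f"missing {name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    manifest = root / MANIFEST
    if not manifest.is_file():
        problems.append(f"missing {MANIFEST}")
    else:
        try:
            json.loads(manifest.read_text())
        except json.JSONDecodeError:
            problems.append(f"{MANIFEST} is not valid JSON")
    return problems


# Demo on a scratch directory that lacks data/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("wal", "meta"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / MANIFEST).write_text("{}")
    print(backup_problems(tmp))  # ['missing data/']
```

For a real backup, call `backup_problems("/backups/hyperbytedb-20240115")` and treat a non-empty list as a failed backup.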

Restore

```bash
# 1. Stop HyperbyteDB
# 2. Restore (overwrites configured directories)
hyperbytedb restore --input /backups/hyperbytedb-20240115
# 3. Start HyperbyteDB
hyperbytedb serve
```

Restore overwrites the configured wal_dir, meta_dir, and chDB session data directory.
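When backups follow the `hyperbytedb-YYYYMMDD` naming used above, the newest one can be picked mechanically, because `YYYYMMDD` strings sort chronologically. The helper below builds the restore command for it but does not run it; stopping and starting the server (steps 1 and 3) stay with the operator.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the naming used above: hyperbytedb-$(date +%Y%m%d).
BACKUP_NAME = re.compile(r"hyperbytedb-(\d{8})")


def latest_backup(names: list[str]) -> Optional[str]:
    """Return the newest dated backup name, or None if none match."""
    # YYYYMMDD strings sort chronologically, so max() picks the newest.
    dated = [(m.group(1), n) for n in names if (m := BACKUP_NAME.fullmatch(n))]
    return max(dated)[1] if dated else None


def restore_argv(backup_root: str) -> list[str]:
    """Build the restore command for the newest backup under backup_root.

    Run it only while HyperbyteDB is stopped: restore overwrites wal_dir,
    meta_dir and the chDB session data directory.
    """
    names = [p.name for p in Path(backup_root).iterdir() if p.is_dir()]
    newest = latest_backup(names)
    if newest is None:
        raise FileNotFoundError(f"no hyperbytedb-YYYYMMDD backups in {backup_root}")
    return ["hyperbytedb", "restore", "--input", str(Path(backup_root) / newest)]


print(latest_backup(["hyperbytedb-20240115", "hyperbytedb-20240201", "notes.txt"]))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once the server is stopped. If you take more than one backup per day, add a time component (for example `%H%M`) to the backup name and widen the pattern.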

Retention

Retention policies are enforced by a background loop that runs ALTER TABLE … DELETE against expired rows in each measurement's MergeTree table. Tune frequency with [retention].interval in config. See Configuration.
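A rough model of what a periodic retention loop implies, assuming each pass deletes rows older than `now - duration` (the exact predicate is not documented here). A row that expires just after a pass waits up to one full interval for the next one, so rows can outlive their policy by about `[retention].interval` (12h by default). The 7-day policy below is a hypothetical example.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, duration: timedelta) -> datetime:
    """Rows with a timestamp before this instant are past their retention."""
    return now - duration


def worst_case_age(duration: timedelta, interval: timedelta) -> timedelta:
    """Approximate oldest age a row reaches before a pass deletes it.

    A row expiring just after a pass waits up to one interval for the next.
    """
    return duration + interval


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(retention_cutoff(now, timedelta(days=7)))                # 2024-01-08 00:00:00+00:00
print(worst_case_age(timedelta(days=7), timedelta(hours=12)))  # 7 days, 12:00:00
```

In ClickHouse, `ALTER TABLE … DELETE` runs as an asynchronous mutation by default, so disk space may be reclaimed later still; chDB is assumed to behave the same way. Shortening the interval tightens the bound at the cost of more frequent mutations.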

Cluster Operations

Cluster inspection (HTTP)

Use the built-in HTTP endpoints for on-call inspection:

| Endpoint | Description |
|---|---|
| `GET /cluster/metrics` | Node id, state, membership version, peer counts |
| `GET /cluster/nodes` | All nodes with health and addresses |
| `GET /internal/sync/manifest` | WAL watermark and measurement catalog used for sync |
| `GET /metrics` | Prometheus metrics |

```bash
curl -s http://node1:8086/cluster/metrics | jq .
curl -s http://node2:8086/internal/sync/manifest | jq .
```

Compare manifests across nodes to spot replication lag or catalog drift.
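The comparison can be scripted. This sketch fetches each node's manifest, reports how far each watermark trails the highest one, and lists measurements a node lacks relative to the union of all catalogs. The keys `wal_watermark` and `measurements` are hypothetical, and the watermark gap is only meaningful if watermarks are comparable across nodes, as the comparison in the text implies.

```python
import json
import urllib.request

# HYPOTHETICAL keys: inspect a real manifest (curl command above) and adjust.
WATERMARK = "wal_watermark"
CATALOG = "measurements"


def fetch_manifest(base_url: str) -> dict:
    """GET /internal/sync/manifest from one node."""
    with urllib.request.urlopen(f"{base_url}/internal/sync/manifest", timeout=5) as resp:
        return json.load(resp)


def compare_manifests(manifests: dict[str, dict]) -> dict[str, dict]:
    """Per node: watermark gap to the highest node, and measurements it lacks."""
    highest = max(m[WATERMARK] for m in manifests.values())
    every_measurement = set().union(*(m[CATALOG] for m in manifests.values()))
    return {
        node: {
            "behind_by": highest - m[WATERMARK],
            "missing": sorted(every_measurement - set(m[CATALOG])),
        }
        for node, m in manifests.items()
    }


# Live usage (hostnames are examples):
#   compare_manifests({h: fetch_manifest(f"http://{h}:8086") for h in ("node1", "node2")})
sample = {
    "node1": {WATERMARK: 120, CATALOG: ["cpu", "mem"]},
    "node2": {WATERMARK: 100, CATALOG: ["cpu"]},
}
print(compare_manifests(sample))
```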

Graceful drain

To remove a node from the cluster without data loss:

```bash
curl -sS -XPOST 'http://node-to-remove:8086/internal/drain'
```

The drain procedure:

1. Sets node state to Draining (rejects new writes with 503).
2. Flushes all WAL entries into chDB MergeTree tables.
3. Waits for replication acks from all peers (up to 60 seconds).
4. Notifies peers of departure.
5. Sets state to Leaving.
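The text does not say whether the drain request blocks until the procedure finishes, so a script that removes a node can poll `GET /cluster/metrics` until the state reaches Leaving. In this sketch the JSON key `state` and the string value `"Leaving"` are assumptions; `wait_for_state` takes the state reader, sleep and clock as parameters so it can be exercised without a live node.

```python
import json
import time
import urllib.request


def start_drain(base_url: str) -> int:
    """POST /internal/drain and return the HTTP status code."""
    req = urllib.request.Request(f"{base_url}/internal/drain", data=b"", method="POST")
    # Generous timeout in case the call blocks while waiting up to 60 s for acks.
    with urllib.request.urlopen(req, timeout=90) as resp:
        return resp.status


def node_state(base_url: str) -> str:
    """Read the node state from GET /cluster/metrics (key name is HYPOTHETICAL)."""
    with urllib.request.urlopen(f"{base_url}/cluster/metrics", timeout=5) as resp:
        return json.load(resp)["state"]


def wait_for_state(get_state, target="Leaving", timeout=120.0, poll=2.0,
                   sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll get_state() until it returns target or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll)


# Live usage (hostname from the example above):
#   start_drain("http://node-to-remove:8086")
#   ok = wait_for_state(lambda: node_state("http://node-to-remove:8086"))
```

Because the draining node answers new writes with 503, point clients at another node before starting the drain.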

Cluster sync

In cluster mode, the Raft leader periodically compares /internal/sync/manifest responses from peers and may mark peers as needing sync. New nodes pull metadata and WAL deltas via the sync APIs. See Deep Dive: Clustering.

Background Services

HyperbyteDB runs several background services as Tokio tasks:

| Service | Interval | Purpose |
|---|---|---|
| Flush | `flush.interval_secs` (10s) | WAL → chDB MergeTree |
| Retention | `retention.interval` (12h) | `ALTER TABLE … DELETE` for expired rows |
| Continuous Query | 10s (fixed) | Execute CQ schedules |
| Heartbeat | `heartbeat_interval_secs` (2s, cluster) | Peer liveness detection |
| Leader sync monitor | 30s (cluster) | Compare peer manifests and trigger sync when needed |

All services shut down gracefully on Ctrl+C (SIGINT): the flush service performs a final flush, then all service handles are awaited.
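HyperbyteDB implements these services as Tokio tasks in Rust; the following is only an analogue of the shutdown pattern in Python's asyncio, not the actual implementation. Each service sleeps on a shared shutdown event with a timeout, so it wakes immediately on shutdown instead of finishing a long sleep (such as retention's 12h). Only the flush service runs one last pass before exiting, and `main` awaits every handle.

```python
import asyncio


async def flush_service(flush, interval: float, shutdown: asyncio.Event) -> None:
    """Flush every `interval` seconds; flush once more when shutdown is set."""
    while not shutdown.is_set():
        try:
            # Sleep for one interval, but wake early if shutdown is requested.
            await asyncio.wait_for(shutdown.wait(), timeout=interval)
        except asyncio.TimeoutError:
            await flush()  # periodic flush
    await flush()  # final flush before the task completes


async def periodic_service(task, interval: float, shutdown: asyncio.Event) -> None:
    """Run task every `interval` seconds until shutdown; no final pass."""
    while not shutdown.is_set():
        try:
            await asyncio.wait_for(shutdown.wait(), timeout=interval)
        except asyncio.TimeoutError:
            await task()


async def main() -> dict[str, int]:
    shutdown = asyncio.Event()
    calls = {"flush": 0, "retention": 0}

    async def flush() -> None:
        calls["flush"] += 1

    async def retention() -> None:
        calls["retention"] += 1

    handles = [
        asyncio.create_task(flush_service(flush, 0.01, shutdown)),
        asyncio.create_task(periodic_service(retention, 3600.0, shutdown)),
    ]
    await asyncio.sleep(0.05)  # let a few flush intervals elapse
    shutdown.set()  # stands in for the Ctrl+C handler
    await asyncio.gather(*handles)  # await every service handle
    return calls


calls = asyncio.run(main())
print(calls)
```

In a real asyncio program on Unix, `loop.add_signal_handler(signal.SIGINT, shutdown.set)` would connect Ctrl+C to the shutdown event.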
