Open
Description
Summary
The backend (API + indexer) currently has only plain-text logs to stdout via tracing. There are no metrics, no distributed tracing, no structured log output, and minimal health checks. This issue tracks adding proper observability instrumentation.
Current State
- Logging: tracing + tracing-subscriber with fmt::layer() (text). The json feature is compiled but unused.
- Metrics: None. No Prometheus, no /metrics endpoint.
- Distributed tracing: None. No OpenTelemetry, no span propagation.
- Health checks: GET /health returns "OK" with no dependency verification. Indexer has no health endpoint at all.
- #[instrument]: Not used, so there are no request-scoped spans and no correlation IDs.
- tower_http::TraceLayer: Configured but effectively silent at production log levels (info).
Proposed Work
1. Prometheus Metrics (/metrics endpoint)
API server:
- Request count by route, method, status code
- Request latency histogram by route
- Active connections gauge
- Database query latency histogram
- Database connection pool stats (active, idle, max)
Indexer:
- blocks_indexed_total counter
- blocks_per_second gauge
- batch_duration_seconds histogram
- failed_blocks_total counter
- rpc_requests_total counter (by status: success/failure)
- rpc_request_duration_seconds histogram
- indexer_head_block gauge (current indexed height)
- chain_head_block gauge (latest chain height)
- indexer_lag_blocks gauge (chain head - indexed head)
- db_insert_duration_seconds histogram
Crate candidates: metrics + metrics-exporter-prometheus or prometheus-client.
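To make the indexer gauges above concrete, here is a minimal std-only sketch of how a few of them could be tracked and rendered in the Prometheus text exposition format. The IndexerMetrics type and its methods are hypothetical names for illustration; in practice one of the crate candidates would own registration and rendering, and histograms are left out for brevity.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process store for a subset of the proposed indexer metrics.
#[derive(Default)]
pub struct IndexerMetrics {
    pub blocks_indexed_total: AtomicU64,
    pub failed_blocks_total: AtomicU64,
    pub indexer_head_block: AtomicU64,
    pub chain_head_block: AtomicU64,
}

impl IndexerMetrics {
    /// Record one successfully indexed block at `height`.
    pub fn record_block(&self, height: u64) {
        self.blocks_indexed_total.fetch_add(1, Ordering::Relaxed);
        // fetch_max keeps the head gauge from moving backwards if
        // blocks complete out of order.
        self.indexer_head_block.fetch_max(height, Ordering::Relaxed);
    }

    /// Update the latest chain height observed via RPC.
    pub fn set_chain_head(&self, height: u64) {
        self.chain_head_block.store(height, Ordering::Relaxed);
    }

    /// indexer_lag_blocks = chain head - indexed head, clamped at zero.
    pub fn lag_blocks(&self) -> u64 {
        let chain = self.chain_head_block.load(Ordering::Relaxed);
        let indexed = self.indexer_head_block.load(Ordering::Relaxed);
        chain.saturating_sub(indexed)
    }

    /// Render the metrics as Prometheus text exposition lines.
    pub fn render(&self) -> String {
        let mut out = String::new();
        let mut line = |name: &str, kind: &str, value: u64| {
            out.push_str(&format!("# TYPE {0} {1}\n{0} {2}\n", name, kind, value));
        };
        line("blocks_indexed_total", "counter",
             self.blocks_indexed_total.load(Ordering::Relaxed));
        line("failed_blocks_total", "counter",
             self.failed_blocks_total.load(Ordering::Relaxed));
        line("indexer_head_block", "gauge",
             self.indexer_head_block.load(Ordering::Relaxed));
        line("chain_head_block", "gauge",
             self.chain_head_block.load(Ordering::Relaxed));
        line("indexer_lag_blocks", "gauge", self.lag_blocks());
        out
    }
}
```

Deriving the lag at render time, rather than storing it as a separate value, keeps it consistent with the two head gauges in every scrape.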
2. Structured JSON Logging
- Activate tracing-subscriber's json formatter behind a config flag (e.g., LOG_FORMAT=json)
- Ensure batch-complete stats are emitted as named tracing fields, not embedded in format strings
- Add #[instrument] to API handler functions and key indexer methods for automatic span context
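One way the config flag could be read is sketched below, using only the standard library. The LogFormat enum and its fallback-to-text behavior are assumptions, not existing code; the comments indicate where the tracing-subscriber calls and named fields would go, and the field names shown there are hypothetical.

```rust
/// Output format selected by the LOG_FORMAT environment variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFormat {
    Text,
    Json,
}

impl LogFormat {
    /// Parse a raw flag value. Anything other than "json" (case-insensitive)
    /// falls back to text, so a typo never prevents startup.
    pub fn parse(raw: Option<&str>) -> LogFormat {
        match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
            Some("json") => LogFormat::Json,
            _ => LogFormat::Text,
        }
    }

    /// Read the format from the process environment.
    pub fn from_env() -> LogFormat {
        LogFormat::parse(std::env::var("LOG_FORMAT").ok().as_deref())
    }
}

// At startup the result would pick the subscriber, roughly:
//
//   match LogFormat::from_env() {
//       LogFormat::Json => tracing_subscriber::fmt().json().init(),
//       LogFormat::Text => tracing_subscriber::fmt().init(),
//   }
//
// Named fields versus a format string (field names are illustrative):
//
//   info!("batch complete: {} blocks in {} ms", blocks, elapsed_ms); // embedded
//   info!(blocks, elapsed_ms, "batch complete");                     // named fields
```

With named fields, the JSON formatter emits blocks and elapsed_ms as separate keys that a log aggregator can filter and graph without parsing the message text.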
3. Improved Health Checks
API:
- GET /health (liveness): keep as-is
- GET /health/ready (readiness): verify DB connectivity + indexer_state freshness (e.g., last update < 5 min)
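The readiness decision could be isolated in a pure function so it is easy to test apart from the HTTP handler. This is a sketch under assumptions: the Readiness enum, the function names, and the choice of 503 for a failed check are illustrative, and the 5-minute threshold is the example value from the bullet above.

```rust
use std::time::Duration;

/// Outcome of the readiness check behind GET /health/ready.
#[derive(Debug, PartialEq, Eq)]
pub enum Readiness {
    Ready,
    DbUnreachable,
    IndexerStale { age_secs: u64 },
}

/// Maximum allowed age of the last indexer_state update (example value).
pub const MAX_STATE_AGE: Duration = Duration::from_secs(5 * 60);

/// Decide readiness from the DB ping result and the age of the
/// most recent indexer_state update.
pub fn readiness(db_reachable: bool, state_age: Duration) -> Readiness {
    if !db_reachable {
        return Readiness::DbUnreachable;
    }
    if state_age >= MAX_STATE_AGE {
        return Readiness::IndexerStale { age_secs: state_age.as_secs() };
    }
    Readiness::Ready
}

/// Map the outcome to an HTTP status code: 200 when ready, 503 otherwise.
pub fn http_status(r: &Readiness) -> u16 {
    match r {
        Readiness::Ready => 200,
        _ => 503,
    }
}
```

The handler would gather the two inputs (a DB ping and a timestamp query) and pass them in, which keeps the threshold logic free of I/O.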
Indexer:
- Add a lightweight HTTP server (separate port) with /health that reports:
  - Process is alive
  - Last successful block indexed + timestamp
  - Current lag from chain head
  - failed_blocks table row count
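As a rough illustration of the report the indexer /health endpoint might return, the sketch below builds a JSON body and a raw HTTP/1.1 response from a snapshot struct. The IndexerHealth type, the JSON key names, and the hand-rolled response are all assumptions; a real implementation would more likely use the HTTP framework the API already depends on plus a JSON serializer.

```rust
/// Hypothetical snapshot of indexer state served by the health endpoint.
pub struct IndexerHealth {
    pub last_indexed_block: u64,
    /// Unix timestamp (seconds) of the last successful block.
    pub last_indexed_unix_secs: u64,
    pub chain_head_block: u64,
    /// Row count of the failed_blocks table.
    pub failed_blocks: u64,
}

impl IndexerHealth {
    /// Current lag from chain head, clamped at zero.
    pub fn lag_blocks(&self) -> u64 {
        self.chain_head_block.saturating_sub(self.last_indexed_block)
    }

    /// JSON body. All values are integers, so no string escaping is needed.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"status":"alive","last_indexed_block":{},"last_indexed_unix_secs":{},"lag_blocks":{},"failed_blocks":{}}}"#,
            self.last_indexed_block,
            self.last_indexed_unix_secs,
            self.lag_blocks(),
            self.failed_blocks
        )
    }

    /// Complete HTTP/1.1 response, suitable for writing to a TcpStream.
    pub fn http_response(&self) -> String {
        let body = self.to_json();
        format!(
            "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        )
    }
}
```

Because the endpoint answers at all, the "process is alive" item is covered implicitly; the remaining fields come from state the indexer loop already tracks.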
4. OpenTelemetry Integration (optional / future)
- Wire tracing spans to OTLP exporter via tracing-opentelemetry
- Propagate trace context through RPC calls
- Export to Jaeger/Tempo/etc.
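For a sense of what propagating trace context through RPC calls involves on the wire, the sketch below formats and parses a W3C Trace Context traceparent header by hand. The TraceContext struct is a hypothetical stand-in; with OpenTelemetry in place, its propagator would normally inject and extract this header instead.

```rust
/// Minimal trace context carried in a `traceparent` header (version 00).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TraceContext {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

/// True if `s` has exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

impl TraceContext {
    /// Format as `00-<32 hex trace id>-<16 hex span id>-<2 hex flags>`.
    pub fn to_traceparent(&self) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, self.span_id, self.sampled as u8)
    }

    /// Parse a version-00 header; returns None for anything malformed.
    pub fn from_traceparent(header: &str) -> Option<TraceContext> {
        let mut parts = header.trim().split('-');
        let version = parts.next()?;
        let trace = parts.next()?;
        let span = parts.next()?;
        let flags = parts.next()?;
        if version != "00" || parts.next().is_some() {
            return None;
        }
        if !is_lower_hex(trace, 32) || !is_lower_hex(span, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace, 16).ok()?;
        let span_id = u64::from_str_radix(span, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero ids are invalid per the specification.
        if trace_id == 0 || span_id == 0 {
            return None;
        }
        Some(TraceContext { trace_id, span_id, sampled: flags & 0x01 == 0x01 })
    }
}
```

An outgoing RPC request would carry the header produced by to_traceparent, and the receiving side would parse it to continue the same trace.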
Priority
Prometheus metrics and structured logging are the highest priority — they unblock dashboards, alerting, and log aggregation. Health check improvements are a close second. OTEL tracing is a nice-to-have for later.